• Text Mining & its Applications

    Unearthing the intelligence hidden in free form data

    Text Mining – What does it add to transaction data

    Text mining refers to extracting textual information and converting it into quantitative form in order to derive information and garner insights from it.

    There are many industry applications of text mining:

    • Market research surveys use text mining to make sense of open-ended questions in surveys.
    • CRM data analysis uses text mining to add value to customer churn modeling by combining customer feedback data with transaction data.
    • The entertainment business uses text mining for ‘sentiment analysis’ to gauge whether new movie releases garner favorable or unfavorable word-of-mouth reviews.
    • Publishers use text mining to get access to information in large databases via indexing and retrieval.

    In retail, text mining or text analytics in conjunction with transaction data analytics helps retailers:

    • Look deeper at real customer, product and service issues
    • Enhance value from market research and may even cut the cost of large-scale market research studies
    • Improve customer service by cutting lead times to address common issues
    • Create better products

    The process by which retailers can extract value from text data is:

    1. Identifying where text data is collected

    The three sources where text mining data is available and can be leveraged are:

    • Surveys – These are usually customer satisfaction surveys that a retailer initiates with a customer. The open-ended responses in these surveys contain valuable text that should be mined for a deeper look at customer issues.
    • Contact centre data – This data consists of e-mails, phone-in transcripts and web chats or submissions by customers who are communicating an issue. Analysis of this data can yield a lot of very valuable information.
    • Internet data – Data on the internet in blogs, product review sites and expert groups contains a wealth of information that is not gleaned by satisfaction surveys or customer feedback via phone.

    2. Changing text data to structured form
    The next step in the process is to change unstructured data into a more manageable form of structured data. This involves several small steps:

    • Identification of the sources from where text data needs to be extracted
    • Decision on which unstructured data to analyze i.e. product related, sentiment related, time period related, particular promotion related etc.
    • Use of software that can extract the relevant information from various places
    • Creation of theme or concept buckets to be able to take a closer look at extracted information and link it to transaction data
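
    As a minimal sketch of the theme-bucket step, free-form comments can be mapped to buckets with simple keyword lists. The bucket names and keywords below are illustrative only, not a prescribed taxonomy:

```python
import re

# Illustrative theme buckets: each maps a theme name to keywords
# that signal the theme in free-form customer text.
THEMES = {
    "delivery": ["late", "delivery", "shipping", "delayed"],
    "quality": ["broken", "defect", "poor quality", "damaged"],
    "service": ["rude", "helpful", "support", "staff"],
}

def bucket_text(comment):
    """Return the set of themes whose keywords appear in the comment."""
    text = comment.lower()
    return {theme for theme, words in THEMES.items()
            if any(re.search(r"\b" + re.escape(w) + r"\b", text)
                   for w in words)}
```

    In practice the buckets would come from the theme/concept definition step above and the matching would be more sophisticated (stemming, phrase detection), but the link to transaction data only needs the resulting theme labels.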

    3. Analyzing text data
    Once the unstructured data has been made manageable, reports can be generated from it. These help the retailer focus on addressing key metrics as they come up and resolve the relevant issues. Keeping cleaned text data as a separate entity thus allows retailers to focus on data that would otherwise not be looked at.

    4. Integrating text data with transaction data
    A lot of actionable insights can be generated if text data that is cleaned up is then integrated with the larger transaction data warehouse. The linking of these two complementary data sets generates added value for retail organizations. It helps answer questions like:

    • What is the reason for higher returns in a particular town/city/region?
    • Why are customers calling in regarding a particular SKU?
    • Which offer will a customer be most likely to accept?
    • Why did a particular promotion not do well?
    • What are the real reasons why customers have lapsed?
    • Which competitor is doing better in terms of product and quality and price?
    • Is a certain customer group adopting a new product more than others?
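
    A sketch of such an integration in Python with pandas, assuming both data sets share a customer identifier; all column names and figures below are invented purely for illustration:

```python
import pandas as pd

# Themed text records and transaction records, linked by a shared
# customer_id (a hypothetical key; real warehouses vary).
text_df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "theme": ["delivery", "billing", "delivery"],
})
txn_df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "North"],
    "returns": [4, 0, 5],
})

# Join the two complementary data sets on the customer key ...
merged = txn_df.merge(text_df, on="customer_id", how="left")

# ... and ask: where are returns high, and what are customers saying?
by_region = merged.groupby(["region", "theme"])["returns"].sum()
print(by_region)
```
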

    While most retailers have the text information they need to improve their knowledge of their customers, products and service, very few presently mine this information. Retailers thus need to unlock the value lying in unstructured data with a clear vision of how they will clean and integrate this data into larger quantitative data sets. They can then use the insights generated from this data to improve customer experience through better service, products, quality and process.

    Using text data to capture and add value to the voice of the customer



  • Python in Data Science

    “The joy of coding Python should be in seeing short, concise, readable classes that express a lot of action in a small amount of clear code — not in reams of trivial code that bores the reader to death” – Guido van Rossum (Creator of Python).

    Data Science is an emerging and extremely popular function in companies. Since the volume of data generated has increased significantly, a new array of tools and techniques is deployed to make decisions out of raw big data. Python is among the most popular tools used by Data Analysts and Data Scientists. It is a very powerful programming language with custom libraries for Data Science.

    Python is a widely used general-purpose, high-level programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than would be possible in languages such as C++ or Java. The language provides constructs intended to enable clear programs on both a small and large scale.

    Python has long been one of the premier general-purpose scripting languages and a major web development language. Numerical analysis, data analysis and scientific programming developed through the packages NumPy and SciPy, which, along with the visualization package Matplotlib, formed the basis for an open-source alternative to MATLAB. NumPy provides array objects, cross-language integration, linear algebra and other functionality. SciPy builds on this and provides optimization, statistics, signal processing and basic image analysis capabilities.

    “One Python to Rule Them All”

    Beyond tapping into a ready-made Python developer pool, however, one of the biggest benefits of doing data science in Python is the added efficiency of using one programming language across different applications.

    It turns out that the benefits of doing all of your development and analysis in one language are quite substantial. For one thing, when you can do everything in the same language, you don’t have to suffer the constant cognitive switch cost between languages and analysis.

    Also, you no longer need to worry about interfacing between different languages used for different parts of a project. Nothing is more annoying than parsing some text data in Python, finally getting it into the format you want internally, and then realizing you have to write it out to disk in a different format so that you can hand it off to R or MATLAB for some other set of analyses. In isolation, this kind of thing is not a big deal. It doesn’t take very long to write out a CSV or JSON file from Python and then read it into R. But it does add up. All of this overhead vanishes as soon as you move to a single language.

    Python’s powerful statistical and numerical packages include:

    • NumPy and pandas allow you to read/manipulate data efficiently and easily
    • Matplotlib allows you to create useful and powerful data visualizations
    • scikit-learn allows you to train and apply machine learning algorithms to your data and make predictions
    • Cython allows you to compile your code to C to greatly reduce runtime
    • pymysql allows you to easily connect to a MySQL database, execute queries and extract data
    • Beautiful Soup allows you to easily parse XML and HTML data, which are quite common nowadays
    • IPython for interactive programming

    Python as Part of Data Science


    Python, as part of the ecosystem, can be broadly divided into 4 parts:
    1) DATA
    2) ETL
    3) Analysis and Presentation
    4) Technologies and Utilities

    Data, as the word suggests, can come in any form: structured or unstructured. Structured data is annotated in a standard way so that machines can understand it; it can be in a SQL database, a CSV file etc. Structured data is always a piece of cake in the data science industry.

    The actual problem starts when we see unstructured data. Unstructured data is a generic label for data that is not contained in a database or some other type of data structure. Unstructured data can be textual or non-textual. Textual unstructured data is generated in media like email messages, PowerPoint presentations and instant messages. Python is very useful for reading all kinds of data formats and bringing them into a structured form.

    Extraction, Transformation and Loading (ETL) is the most costly part of data science. A data scientist spends about 80% of the time on data exploration, summarization, extraction and transformation, 8% on modeling and 12% on visualization. This can vary from project to project.

    Extraction: the desired data is identified and extracted from many different sources, including database systems and applications.
    Transformation: a set of rules is applied to transform the data from the source to the target. This includes converting any measured data to the same dimension (i.e. a conformed dimension) using the same units so that the data can later be joined.
    Loading: it is necessary to ensure that the load is performed correctly and with as few resources as possible. The target of the load process is often a database.
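
    The three steps above can be sketched with the Python standard library alone. The product records, units and table layout here are assumptions made purely for illustration:

```python
import csv
import io
import sqlite3

# Extract: read raw records (here from an in-memory CSV for illustration).
raw = io.StringIO("product,weight,unit\nsoap,500,g\nrice,2,kg\n")
rows = list(csv.DictReader(raw))

# Transform: conform the weight dimension to a single unit (grams),
# so records from different sources can later be joined.
TO_GRAMS = {"g": 1, "kg": 1000}
for r in rows:
    r["weight_g"] = float(r["weight"]) * TO_GRAMS[r["unit"]]

# Load: write the conformed records into a target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, weight_g REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(r["product"], r["weight_g"]) for r in rows])
conn.commit()
```
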

    Let’s take an example: we need Twitter data for social media sentiment analysis.

    We need to follow a few basic steps to get clean, structured data.

    1) Reading all the tweets in one language (encoding into UTF-8)
    2) Removing apostrophes by expanding contractions, e.g. “’re” should be replaced by “ are”, etc.
    3) Removing punctuation from sentences, e.g. !()-[]{}’”,.^&*_~ should be removed.
    4) Removing hyperlinks
    5) Removing repeated characters, e.g. “I’m happppyyyyy!!!” should become “I am happy” after steps 1 to 4 have been applied to the sentence.
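
    A rough implementation of steps 2–5 using simple regular expressions. The contraction map is a hypothetical starting point, and collapsing repeated characters to a single letter is a naive heuristic: “happppyyyyy” becomes “hapy”, which would still need spelling correction to reach “happy”:

```python
import re

# Hypothetical contraction map for step 2 (extend as needed).
CONTRACTIONS = {"'re": " are", "'m": " am", "n't": " not"}

# Punctuation characters to strip in step 3.
PUNCT = "!()-[]{}'\",.^&*_~"

def clean_tweet(tweet):
    text = tweet
    # Step 2: expand contractions (must run before punctuation removal,
    # since stripping apostrophes first would break the matching).
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    # Step 4: remove hyperlinks.
    text = re.sub(r"https?://\S+", "", text)
    # Step 3: remove punctuation.
    text = re.sub("[" + re.escape(PUNCT) + "]", "", text)
    # Step 5: collapse runs of 3+ repeated characters (naive heuristic).
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    return text.strip()
```
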

    Analysis and Presentation: analysis with Python can broadly be defined as analysis with packages like pandas.

    Package Highlights of Pandas:

    • A fast and efficient DataFrame object for data manipulation with integrated indexing.
    • Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format.
    • Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form.
    • Flexible reshaping and pivoting of data sets.
    • Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.
    • Columns can be inserted and deleted from data structures for size mutability.
    • Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets.
    • High performance merging and joining of data sets.
    • Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure.
    • Time-series functionality: date-range generation and frequency conversion, moving-window statistics, moving-window linear regressions, date shifting and lagging. You can even create domain-specific time offsets and join time series without losing data.
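
    For instance, the split-apply-combine group-by engine mentioned above works like this (toy data, invented for illustration):

```python
import pandas as pd

# A small frame to demonstrate the split-apply-combine group-by engine.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100, 120, 90, 110],
})

# Split by store, apply aggregations, combine into one result frame.
summary = sales.groupby("store")["revenue"].agg(["sum", "mean"])
print(summary)
```
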

    Scikit-learn: scikit-learn (formerly scikits.learn) is an open-source machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
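
    A small example of the scikit-learn workflow, training one of the algorithms named above (a random forest) on the library’s built-in iris data set:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a built-in data set and hold out a test split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a random forest and score it on the held-out data.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```

    The same fit/predict/score pattern applies across the library’s classifiers, regressors and clusterers, which is what makes it easy to swap algorithms.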

    For further plotting in Python, we can use packages like Matplotlib and its pyplot module.

    Matplotlib: Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like wxPython, Qt, or GTK+. There is also a procedural “pylab” interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB. SciPy makes use of Matplotlib.
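
    A minimal example using the object-oriented API to plot a NumPy array; the Agg backend is selected so the script runs headless, and the output filename is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np

# Plot one period of a sine wave using the object-oriented API.
x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()
fig.savefig("sine.png")
```
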

    Technologies and Utilities: when we say Technologies and Utilities, we mean automating the repeated work that has been done in the past to get a result.

    NumPy plays an important role in automation.
    NumPy is the fundamental package for scientific computing with Python. It contains among other things:

    • a powerful N-dimensional array object
    • sophisticated (broadcasting) functions
    • tools for integrating C/C++ and Fortran code
    • useful linear algebra, Fourier transform, and random number capabilities

    Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
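
    The array object and broadcasting behaviour can be seen in a few lines:

```python
import numpy as np

# An N-dimensional array with broadcasting: the 1-D row is stretched
# across each row of the 2-D array without copying data.
a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])
print(a + row)

# Linear algebra and random-number capabilities live in subpackages.
identity = np.linalg.inv(np.eye(3))
sample = np.random.default_rng(0).normal(size=5)
```
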

    IPython Notebook: The IPython Notebook is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media.


    The IPython notebook with embedded rich text, code, mathematics and figures.
    It aims to be an agile tool for both exploratory computation and data analysis, and provides a platform to support reproducible research, since all inputs and outputs may be stored in a one-to-one way in notebook documents.

    There are two components:
    1) The IPython Notebook web application, for interactive authoring of literate computations, in which explanatory text, mathematics, computations and rich media output may be combined. Input and output are stored in persistent cells that may be edited in-place.
    2) Plain text documents, called notebooks, for recording and distributing the results of the rich computations.
    The Notebook app automatically saves the current state of the computation in the web browser to the corresponding notebook, which is just a standard text file with the extension .ipynb, stored in a working directory on your computer. This file can be easily put under version control and shared with colleagues.

    Although the notebook documents are plain text files, they use the JSON format in order to store a complete, reproducible copy of the current state of the computation inside the Notebook app.

    Thus, Python has a great future in the data science industry. There is a large community of developers who continually build new functionality into Python. A good rule of thumb is: if you are thinking about implementing a numerical routine in your code, check the documentation website first and you will likely have your model ready in Python code. Happy learning!

  • Data Visualisation

    Today we are trapped amidst tons of cryptic data. We continuously strive to understand and draw inferences from this data. Data mining is the order of the day, but the perception of the data is, we believe, the end result.

    Why are weather reports more appealing to us when presented on a map than in bland tables? Why do we find the infographic images in news articles more captivating? Be it the Sensex points, a stakeholder’s share or earnings/turnovers, we inherently focus on the graphs and the charts. We need to admit that all those images are outcomes of tons of data, and yet they are highly attractive to us.


    The secret behind this is the power of Visualization. Visualization can be called the art and science of data, in the way that it captures our attention and projects the data in a simplified way. Right from our childhood, we have been taught to perceive alphabets as visual images; we remember people when we see them rather than when we hear from them. Such is the power of Visualization on us that there is no doubt those infographic images are more appealing to us!

    This projection of data into pictorial or graphical form, for ease of understanding by the common man, is what we call the Data Visualisation technique. Data Visualisation is making life easier in several ways. Let us understand its vitality by citing some critical scenarios.

    A Sales Manager of a company works across umpteen sales figures on a weekly/monthly/quarterly/yearly basis. His past sales track record guides him towards his future sales projections. So important is this vast data to him that projecting it in chart form simply eases his life. The data can be presented across any timeline, inferences can be drawn from the past or for future projections, and more importantly the totality of available data can be viewed in one go.

    At other times, we might come across scenarios where limited data needs to be mined to draw several hidden inferences. Marketing Managers need to walk their way through three-dimensional data, say Market Share, Share of Voice and YoY Profit for their brands. The raw data alone would just drive them crazy. On the contrary, a simple bubble chart with Market Share and Share of Voice data on the axes and Profit as the size of the bubble would work wonders for them! The entire data set can thus be projected on a common platform and hidden inferences can be drawn. With these visuals in place, brand equity, competitiveness of the brands and much more can be derived easily and effectively.
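
    Such a bubble chart takes only a few lines with a plotting library like Matplotlib; the brand names and figures below are entirely hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Hypothetical brand data: market share (%), share of voice (%), profit.
brands = ["Brand A", "Brand B", "Brand C"]
market_share = [25, 40, 15]
share_of_voice = [30, 35, 20]
yoy_profit = [12, 20, 5]

fig, ax = plt.subplots()
# Profit drives the bubble size; scaled up so the bubbles are visible.
ax.scatter(market_share, share_of_voice,
           s=[p * 40 for p in yoy_profit], alpha=0.5)
for name, x, y in zip(brands, market_share, share_of_voice):
    ax.annotate(name, (x, y))
ax.set_xlabel("Market Share (%)")
ax.set_ylabel("Share of Voice (%)")
fig.savefig("bubble.png")
```
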

    There are challenging occasions when managers have to work with tons of data and arrive at concise and compelling findings. Working on such cumbersome data and projecting it in a presentable way would not have been possible but for data visualisation. With its aid, data can be drilled down into charts and graphs; these can be well integrated to take viewers on an interactive journey to grab insights out of the data.

    Thus by projecting data in visual forms, we not only draw the attention of the viewers, but also gain their confidence. As all the data is available in one place, authenticity and credibility are established on both sides.

    Data Visualisation has a lot of scope for the future. As the vast data becomes presentable and readable, Data Visualisation paves way for further research on the data. Based on the present trends or emerging patterns, several new insights can be drawn. The more complex the data, the more is the scope to ponder on the data with these handy visualisation tools.
    Thus Data Visualisation is the universal language of Data Science. It is easily comprehensible and concise, and is a vital tool for the analysis of unexplored data!

    Being factually accurate, Visualization helps viewers to make conclusions based on the data by offering important context for understanding the underlying information.


    “Formal education will make you a living. Self-education will make you a fortune”.
    -Jim Rohn

    Well, let’s face it: we always tend to learn more when we’re thoroughly involved in a task than when we are given a lecture on it by a third person. For example, I could either give you lecture after lecture on how to make a sandwich, or I could drop you into a kitchen and say “It’s all yours. Make me a sandwich.” In an age when the internet can give us information about anything and everything under the sun, when learning about the surface of Pluto has become easier than finding your lost bike keys, it should not be too difficult to make your first sandwich, or your first dashboard for that matter. This is the idea that gave birth to the concept of the ‘Hackathon’ at TEG Analytics.

    The Casus belli:

    The plan is to encourage self-learning and competition, while another benefit is that it initiates inter-team communication and knowledge sharing, besides providing a great opportunity to the participants to showcase their talent in front of the biggest brains of the company. That way, the company is also able to identify its employees’ talents and weaknesses. Needless to say, you end up learning a lot in the entire process. So yes, it is a win-win situation for all!

    It is important to mention here that hackathons in TEG are not like the regular hackathons as per the dictionary definition of the word. It’s actually even better! It’s not limited to coding and logical thinking skills of the person alone, but involves data visualization and business understanding.


    How to Train Your Dragon?

    I mean… employees.

    Well let me give you a brief idea about how hackathons are conducted here and how they support the concept of self-learning. First, the organisers make sure that all the employees have had at least one official training on the basics of the particular skill that they are going to be tested on. Then, they are divided into teams of two. These teams are built in such a way, that the most skilled person is partnered with a lesser skilled person and so on. The organisers then provide them with a common business problem that needs to be solved using a certain soft skill and presented before a panel of judges within a specified time frame, which is generally 15-20 days. The business problem is created in a way that gives the participants the feel of a real-life client handling process. They can Google as much as they need to and learn all about the problem or the tool, besides taking help from the organisers to clear their doubts. To motivate the participants further, incentives in the form of monetary benefits are provided.


    The Battles of Tableau and Excel:

    The first ever such competition held at TEG Analytics was the Tableau hackathon. The second was the Excel hackathon, which concluded recently. In both events, the enthusiasm of the participants was extraordinary and the competition tough. It is worth mentioning that the second hackathon witnessed more than twice the number of participants as the first. The competitive spirit among the teams was incredible, with each team leaving no stone unturned to prove it was better than all the others. A week before the final day, one could find participants spending late nights in the office and even working on weekends to make sure they had used every last fragment of grey matter available to make their dashboards absolutely perfect. On the day of the presentation, their morale was sky-high and the passion almost contagious, as the teams, armed with their codes, calculations and charts, battled for the title of the ‘Best Dashboard’.


    And the Victor is..?

    Everybody! Because everybody wins. To conclude, I can say that conducting such events within the organization is a brilliant idea to encourage learning, team-spirit, healthy competition and improvement of one’s own soft skills. You could say it’s like pushing a bird off a tree and leaving it with two options: learning to fly or preparing to fall. And at the end of it, whether you fly or fall, you definitely learn the use of your wings and will probably be confident enough to flap them the next time you have to save yourself.

    Adwitiya Borah
    Data Analyst, TEG Analytics

  • Internet of Things Analytics


    Tony Stark has J.A.R.V.I.S.; we have IoTA

    Most of us must have seen the movie Terminator, in which the artificially intelligent operating system SKYNET becomes a lot more intelligent and decides to take over the world by spreading itself into all the systems across the world. Or we have seen our favorite billionaire brainiac Tony Stark working with his faithful companion J.A.R.V.I.S., who does everything for him, from making a cup of coffee to saving his life. In both these examples, one notices the significance of what an artificially intelligent system could do and the scale of revolution it can bring about in our lives.

    Similar is the case with IoT – the Internet of Things. IoT is a network of physical objects or “things” embedded with electronics, software, sensors, and connectivity to enable objects to exchange data with the manufacturer, operator and/or other connected devices, based on the infrastructure of the International Telecommunication Union’s Global Standards Initiative. The Internet of Things allows objects to be sensed and controlled remotely across existing network infrastructure, creating opportunities for more direct integration between the physical world and computer-based systems, and resulting in improved efficiency, accuracy and economic benefit. In simple words, if you want something to be done, tell your device and it will do it for you! Sounds quite stereotypical, doesn’t it? One could argue that our mobile devices are already 50% voice operated, so what difference would IoT make? Well, they would be surprised to know that the ramifications of the global application of IoT would spark the beginning of a new era, especially when it comes to Analytics.

    The crux of the discussion is IoT-enabled devices. They would collect data from all over the world, so your data would be geographically vast and demographically omniscient. With data at such a level, IoT analytics tools will improve real-time decision making and customer experience. A coffee maker with a bunch of buttons to make a good coffee is just a simple coffee maker. But one which is network-connected and can be accessed from a mobile phone is advanced or, in better words, a “smart” coffee maker. The manufacturer could gather data on the type of coffee you regularly drink and make changes so that the maker adjusts itself to your preferences. Various enterprises benefit from IoT by monetizing their data assets, providing visibility to their customers and understanding their needs much better.

    • Compelling visualizations, interactive reporting, ad hoc analysis and tailored dashboards can be embedded into applications.
    • Highly customizable web-based user interface to match the branding, look and feel
    • Gathering competitor’s information and getting more insights based on merger, acquisitions, partnerships and pricing strategies.
    • Breaking down of markets into sub-segments to get a more comprehensive picture of the customer activities and buying patterns.

    IoT analytics tools have an unprecedented role in major industries like manufacturing, healthcare, energy and utilities, retail, transportation and logistics.

    Future cities are likely to include smart transport services for journey planning, adapting to travelers’ journey patterns, etc. to reduce expenditure and make transport more affordable. Smart buildings will be able to react to information derived from sensor networks across the city, adjusting ventilation and window settings based on cross-referencing of pollution levels and weather. Imagine how amazing it would be if the systems alerted you to an open parking space when you entered a building. They can also be integrated into security systems to monitor the identities of inhabitants and check whether any unauthorized personnel enter the place.

    A world where devices or “things” connected through networks and servers all across the world are doing analysis on not only your business or your competitor’s profit but integrating accurate decision making into daily lives of people is something which is most palpable right now. It is analytics at its best and what it should actually be like. This is just like having J.A.R.V.I.S or SKYNET everywhere in the world.

    Business leaders are busy thinking about a better future for the world; well I’d say that –

    ‘Internet of Things Analytics (IoTA) is the FUTURE’


    Driving efficiencies for marketers from brand-tracking studies

    Business Problem

    Brand tracks & reviews provide hindsight: like report cards, they evaluate past activities. How do you take action from them? Can they provide you with foresight on how consumers make choices in a category? What are the relevant needs, how do consumers decode image, how do they respond to it, and how do these integrate to determine overall brand perception and guide purchase decisions?


    Structural Equation Modeling (SEM) is an advanced analytics technique that models the complex relationship of product attributes, brand image and consumer response that determines consumer choice. It has helped our clients – leading global marketers – in answering questions like:

    1. Optimizing product formulation. Identify relevant consumer met/ unmet needs and how consumers perceive them. Plan scenarios around how product attributes can drive choice.
    2. What is the hierarchy of needs/benefits, brand image and the responses they evoke; how do these ladder up to impact overall brand perception and guide purchase decisions?
    3. What is the competitive landscape: who is an immediate threat? In which segment? What are the opportunities – are we exploiting the brand associations?
    4. Providing strategic direction to building the brand identity system
      • What should be the optimal strategy for my brand- what image and product attributes evoke the best emotional, rational response from the target segment.
    5. Optimizing brand portfolio holistically, from a consumer relevance & response standpoint.
    Behind the Scenes

    SEM is based on the simple fact that different brand attributes have varying degrees of influence on product purchase/ brand equity. For any category, overall equity is built by multiple associations. SEM models the effect of attributes simultaneously influencing the ultimate dependent variable chosen. An understanding of the direct and indirect influences of product attributes- hierarchy of relevance- combined with competitive brand strength on these attributes helps deliver actionable insight.
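
    The idea of direct and indirect influences can be illustrated with a toy path model estimated by ordinary least squares. This is a simplified stand-in for a full SEM, and every variable name and coefficient below is an assumption made up purely for the illustration:

```python
import numpy as np

# Toy path model: attribute -> image -> equity, plus a direct path
# attribute -> equity. True coefficients (0.6, 0.5, 0.3) are invented.
rng = np.random.default_rng(1)
attribute = rng.normal(size=1000)
image = 0.6 * attribute + rng.normal(scale=0.5, size=1000)
equity = 0.3 * attribute + 0.5 * image + rng.normal(scale=0.5, size=1000)

def ols(y, X):
    """Least-squares coefficients for y ~ X (centered data, no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(image, attribute[:, None])[0]                      # attribute -> image
b, c = ols(equity, np.column_stack([image, attribute]))    # image -> equity, direct
indirect = a * b        # influence routed through the image theme
total = c + indirect    # direct plus indirect effect on equity
print(f"direct={c:.2f} indirect={indirect:.2f} total={total:.2f}")
```

    The hierarchy of relevance in Figure 1 corresponds to such total effects, computed simultaneously across all themes rather than one path at a time.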

    As can be observed from Figure 1, there are 13 consumer themes in boxes which build overall equity here. These themes are framed from 56 variables in the tracking study. The themes on the top are consumer-response themes while the lower ones are brand-trigger attributes and imagery. Note that the thickness of the lines shows the strength of association, and the percentages mentioned in the boxes represent the total effect of each theme on overall equity.


    Figure 2 shows ranking of equity themes in order of relevance juxtaposed with brand competitiveness on each theme. Trust, Reliable Protection and Popularity of brand are the top 3 relevant themes in this scenario. Client’s Brand B has strength on the themes of being One with nature, Friendly and Odor protection, while competitor Brand A has a higher strength on Being traditional and Having a heritage.
    Figure 3 indicates that ‘reliable protection to popularity’ and ‘confident to trust’ are pathways which impact overall equity. It can therefore be concluded that Brand B needs to own Reliable Protection and a Popular Image to evoke Confidence & Trust, thus having an impact on overall equity.

    Key Advantages

    The key advantages that such an approach offers are –

    1. Integrates the supplier side dimensions (product attributes, brand image) and consumer side dimensions (rational, emotional responses)
    2. Overlays brand scorecard onto the category model enabling the marketer to identify opportunities & threats.
    3. Can split model by segments, for additional insights.
    4. SEM can be conducted off any regular consumer tracking survey.
  • Latest Analytics Opportunities in US Healthcare – An update


    Over the past few years there has been a lot of buzz around the Obamacare health reform (the Affordable Care Act), which was implemented in 2014. The reform mandates that every individual buy health insurance, irrespective of which health bracket they fall in. In a way, it is a mandate for employers too: those with 50 or more full-time employees must provide health insurance to their staff. While the reform was gaining ground, several states filed lawsuits against the federal government, claiming that it was unconstitutional to force citizens to buy health insurance.

    Companies spend billions of dollars every year on health insurance. Yet we see very limited initiatives to organize healthcare data and do analytics around it. The major hurdle for healthcare companies is deciding what kind of health plan or deal to offer small and medium-sized employers so that those employers become more interested in providing comprehensive healthcare to their employees. Companies like Blue Cross & Blue Shield, Kaiser Permanente, Highmark, United Health Group etc. have spent heavily on setting up their IT infrastructure, but their investment in exploratory and predictive analytics lags far behind.

    Exploratory data analysis has proved to be a great starting point in the analysis of B2B healthcare relations. It has enabled healthcare firms to help companies of all sizes provide comprehensive health insurance. Analyses like classification and segmentation help in strategizing plans for small companies (even those with fewer than 5 employees), giving them the option to be part of a pool or consortium and avail of healthcare like a mid-sized company. Individually, these companies may not be in a position to buy healthcare for their employees at all, but joining a bigger umbrella (a consortium or a pool) helps them afford the healthcare plan.

    For companies that are mid-sized and larger, proper predictive analytics can help healthcare firms estimate the amount of claims that might arise from employees. This helps them set the right premiums and other costs, such as co-pays and deductibles, for the insured.
    With proper analysis of an individual’s health, premium and claims history, data scientists might be able to suggest a suitable plan for individuals (HMO vs. PPO vs. Consumer-Directed Health Plan, or CDHP).
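    As a minimal sketch of the premium-setting idea (the member-level fields and figures below are hypothetical, not actual insurance data), expected claims can be estimated from history with a simple linear fit, and a premium set as expected claims plus a loading:

```python
import numpy as np

# Hypothetical member data: [age, prior-year claims ($), risk score]
X = np.array([
    [25, 1200, 0.8],
    [40, 3500, 1.1],
    [55, 8000, 1.6],
    [35, 2000, 0.9],
    [60, 9500, 1.8],
], dtype=float)
y = np.array([1500, 4000, 9000, 2300, 11000], dtype=float)  # next-year claims

# Ordinary least squares with an intercept term
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def expected_claims(age, prior_claims, risk_score):
    return float(coef @ np.array([1.0, age, prior_claims, risk_score]))

# Premium = expected claims plus a 20% loading for admin cost and margin
member = (45, 4200, 1.2)
premium = 1.20 * expected_claims(*member)
print(f"Suggested annual premium: ${premium:,.0f}")
```

    A production model would of course use far richer features (risk scores, diagnoses, plan design) and a distribution-aware technique rather than a plain linear fit; this only sketches the premium-from-expected-claims logic.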

    There is a wealth of data available in the healthcare system which requires extensive research and analysis. These include –

    • DxCG Health Risk Scores Data
    • Claims Data
      • Inpatient Claims
      • Out-Patient Claims
      • Denials Data
      • Resubmissions of Claims
    • Premiums (Co-Pay and Deductible)
    • Dental Insurance
    • Eye Care Insurance
    • TPA Data

    The points mentioned above are just the tip of the iceberg. Data scientists have become very interested in the use of big data in healthcare insurance. About 70% of the data in healthcare is unstructured. By using big data techniques, data scientists expect to learn trends from the data so that important information can be extracted and used to serve healthcare firms, employers and brokers alike.

  • Fortune in the cookies – maximizing online customer acquisition

    A cry in the dark

    Consider a person who has just walked into a Macy’s in a mall. Why is she in the store, and what is she looking for? Has she been to other stores, or other Macy’s locations, looking for the same item(s) she now wishes to purchase?

    In the traditional world, Macy’s can never know any of the above, and that precisely has been, and always shall be, the Achilles’ heel of traditional marketing. It constitutes a 2-player game (Player 1: Buyer; Player 2: Seller), where Player 1 has a distinct advantage due to the incomplete information at Player 2’s disposal. The Buyer is looking to maximize her utility from the purchase, and the Seller is looking to make a sale and maximize his margins from it. In the traditional setup, the Buyer generally knows what products are being offered, their price, the potential cost of those products to the Seller, and similar stats even for the Seller’s competitors. The Seller, on the other hand, has little to no information on what the Buyer has on her mind: her tastes and preferences, purchase behavior, prior purchase attempts, the urgency of her need, or the trigger for her purchase decision. The best guess the Seller can make about the Buyer is her purchasing ability and her intent to buy. This is what we would call an incomplete information set, and an asymmetric one at that (given that the Buyer knows more). Any system that improves this information set for the Seller improves his ability to maximize his objective function.

    “I know what you did last summer…and even 5 minutes ago”
    Now if you take our current e-commerce environment, chances are all the activities of the Buyer are recorded in what we call cookies. This includes how many times she has viewed the product, on how many sites, for what length of time, how many times she has shown the intent to purchase by adding it to her shopping cart, what related products she has viewed or purchased, and what related searches she has conducted. It is these cookies that hold the key to unlocking the utility function of the consumer, by revealing her tastes and preferences, purchase behavior, and the whole nine yards. The question is: how do you wield that key? There are terabytes of data to trudge through before you get to something meaningful and actionable. Every Buyer has her own length of history and search pattern for one single purchase. Multiply that by the few dozen purchases she makes in a year and the few million customers a single Seller is dealing with.

    In my opinion, the most amazing gift endowed upon us marketers by digital media is this ability to deconstruct the drivers, needs, and aspirations of buyers down to atomic levels. Thanks to marketing organizations religiously farming SERP keywords, cookies, and site navigation data, we are in the lucrative position of unlocking the buyer’s utility function provided we are able to eliminate the noise in the signals by applying advanced quantitative methods on big data. Before we get into the ‘geek talk’ overdrive let us define the fundamental questions we are seeking to answer in order to arrive at individual buyer specific targeting strategies.

    It all begins with a search: every search is nothing but an ‘expression’ of intent which offers the key to unraveling the buyer’s need of the hour.

    Understanding the long and short of the buyer’s recent web-trail creates an opportunity for the digital marketer to define customized strategies empowered by contextual and behavioral targeting.

    Combining search intent and cookie trail with site navigation (i.e. Pathing) helps understand which acquisition journeys lead to conversion and which paths are, well, roads to nowhere.

    Search (Organic + Paid) traffic coming into an established e-commerce website often comprises hundreds of thousands of unique keywords. However, these seemingly distinct searches can be assigned to a finite set of ‘intent groups’ through logical classification of ‘semantic’ and ‘thematic’ similarity. Consider these two keywords: “best credit card for small business” vs. “top small business credit cards”. Clearly, these two searches are lexically different, but evidently they express very similar, if not the same, intent on the part of the searchers. Essentially, the intent here can be categorized as falling in the segment COMPARATIVE. Needless to say, by applying various text mining techniques, we can capture the massive number of searches in distinct intent groups such as INFORMATIONAL (“what is…”, “how to…”), CALL TO ACTION (“apply for…”, “buy online…”), and so on.
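    As a minimal sketch (the patterns and thresholds here are illustrative assumptions, not a production taxonomy), such intent grouping can start with simple rule-based matching before graduating to fuller text mining:

```python
import re

# Rule-based assignment of search keywords to intent groups.
# Group names follow the article; the patterns are illustrative only.
INTENT_PATTERNS = {
    "CALL TO ACTION": [r"\bapply\b", r"\bbuy\b", r"\binstant approval\b"],
    "COMPARATIVE":    [r"\bbest\b", r"\btop\b", r"\bvs\b", r"\bcompare\b"],
    "INFORMATIONAL":  [r"^what is\b", r"^how to\b", r"\bguide\b"],
}

def intent_group(keyword: str) -> str:
    kw = keyword.lower()
    for group, patterns in INTENT_PATTERNS.items():
        if any(re.search(p, kw) for p in patterns):
            return group
    return "UNCLASSIFIED"

print(intent_group("best credit card for small business"))  # COMPARATIVE
print(intent_group("top small business credit cards"))      # COMPARATIVE
print(intent_group("apply for credit card"))                # CALL TO ACTION
print(intent_group("what is a balance transfer"))           # INFORMATIONAL
```

    In practice the rules would be replaced or augmented by clustering and semantic-similarity models, but the output is the same: every keyword mapped to one intent group.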

    The objective of the entire exercise is to reduce the dimensionality of the massive keyword data into actionable, logical, accurate intent groups. This, when achieved, enables the digital marketer to rank site visitors from search channels in terms of ‘purchase propensity’. For instance, a visitor coming into the site having searched “apply today” or “instant approval” is far lower in the sales funnel (i.e. closer to conversion) than one who arrived searching “low interest cards”. You can therefore understand how robust search intent segmentation can create a definitive early advantage for the e-marketer as far as addressing ‘WHO IS THE BUYER?’ is concerned.
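    Ranking by purchase propensity then reduces to an ordinal score per intent group (the scores and visitor names below are illustrative assumptions):

```python
# Illustrative ordinal propensity scores per intent group
# (higher = lower in the funnel, i.e. closer to conversion).
PROPENSITY = {
    "CALL TO ACTION": 3,
    "COMPARATIVE": 2,
    "INFORMATIONAL": 1,
    "UNCLASSIFIED": 0,
}

# Each visitor inherits the score of their entry keyword's intent group.
visitors = [
    ("visitor_a", "INFORMATIONAL"),
    ("visitor_b", "CALL TO ACTION"),
    ("visitor_c", "COMPARATIVE"),
]

ranked = sorted(visitors, key=lambda v: PROPENSITY[v[1]], reverse=True)
print([name for name, _ in ranked])  # ['visitor_b', 'visitor_c', 'visitor_a']
```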

    All this is great, but we also know that not everyone in the ‘CALL TO ACTION’ group converts, and that a few in the weaker intent segments actually do. This is often a function of how visitors interact with the site (‘pathing’). A smart e-commerce site can actually steer the visitor’s site navigation based on knowledge of their search intent group as well as cookie trail, thus maximizing the likelihood of ‘site navigation’ culminating in ‘acquisition journeys.’

    Once we are able to define concrete intent groups and the recent purchase priorities or needs (i.e., search history) of the visitor, we can serve customized page/content displays that keep the visitor on the ‘conversion path’.

    The analytical techniques here get far more complex than standard regression models. This is because one cannot make the oversimplified assumption of identifying the triggers of conversion based on site navigation on the day the conversion happens. Why so? Let’s illustrate with an example: a visitor comes to a luxury watch retailer’s site, lands on the Homepage, and takes the following path:

    Homepage–>Products–>Add to Cart–>Checkout

    If our dependent variable is ‘conversion’ (Y/N) and the independent variables (predictors) are visit/no-visit flags for the website pages, then a traditional logistic regression model would tend to imply that the pages ‘Products’, ‘Add to Cart’ and ‘Checkout’ are the strongest influencers of conversion. But as logical minds we know these are ‘self-selected’ pages for people who convert, inasmuch as they cannot select the product of choice without clicking on the ‘Products’ page, and cannot complete the purchase without going through the ritual of visiting the subsequent two pages. This constitutes a classic situation where ‘correlation’ does not imply ‘causality’.
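    The self-selection problem can be seen even without fitting a model: in any clickstream where checkout is a prerequisite of purchase, the ‘Checkout’ flag predicts conversion perfectly by construction. A toy illustration (sessions invented for the example):

```python
# Toy clickstream sessions: pages visited and whether the session converted.
sessions = [
    ({"Homepage", "Products", "Add to Cart", "Checkout"}, 1),
    ({"Homepage", "Products", "Add to Cart", "Checkout"}, 1),
    ({"Homepage", "Products"}, 0),
    ({"Homepage", "Blog", "Products"}, 0),
    ({"Homepage"}, 0),
]

def conversion_rate(page):
    """Conversion rate among sessions that visited the given page."""
    visited = [conv for pages, conv in sessions if page in pages]
    return sum(visited) / len(visited) if visited else 0.0

# 'Checkout' is visited only by converters, so its conversion rate is 100% --
# a regression on these flags would credit it entirely, even though visiting
# checkout is a consequence of the decision to buy, not its cause.
print(conversion_rate("Checkout"))   # 1.0
print(conversion_rate("Products"))   # 0.5
```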

    So where did the model fail?
    It failed because it ignored the buyer’s entire acquisition journey through the sales window. On the day of the purchase, the decision to buy has most likely already been made in the buyer’s mind. The ‘pathing’ on that day is a mere execution of a foregone decision.

    The real journey of awareness-to-interest-to-decision, hidden away in the buyer’s prior visits to the site or related sites when she was mulling over the idea of whether or not to commit to the sale, holds the key to unlocking which pages/features on the site actually influenced her decision. These, mis amigos, are the real ‘foot soldiers’, the ‘movers & shakers’ that cradled the visitor to conversion.

    Mathematically, one therefore has to estimate a panel-data-based mixed-effects model, where the pathing of each visitor, whether converted or not, on each visit is accounted for. One needs to understand the critical importance of integrating search intent and pathing-based insights into e-commerce strategies. The digital marketing world is a double-edged sword: while on the one hand it offers tremendous opportunity to decode the buyer’s utility function, it also creates a perilous situation where substituting the seller with one of its competitors is a mere mouse click away, with the buyer never having to move an inch and able to compare offerings across multiple competing sellers in real time. The marketing campaigns that often fail are those where the seller puts his ‘brand’ above the buyer’s ‘needs’. Digital marketing should not be afflicted by the seller’s ego, which makes him self-assured of the footprint of his brand, because the buyer is simply interested in her own best interest. If, by leveraging intelligent big data analytics, you can weave yourself into her scheme of things so that she resonates with your brand as “THIS IS WHAT I WANT!”, you will convert a site visitor into a customer; otherwise, your competitor surely will.
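    As a sketch of the data shape such a panel model consumes (the field names and visits are illustrative), the clickstream is arranged as one row per visitor per visit, with page flags and the outcome of that visit, so that a random effect per visitor can absorb individual-level heterogeneity:

```python
# Long-format panel: one row per (visitor, visit), suitable for a
# mixed-effects model with a random intercept per visitor.
raw_visits = [
    # (visitor_id, visit_no, pages_viewed, converted_on_this_visit)
    ("v1", 1, {"Homepage", "Reviews"}, 0),
    ("v1", 2, {"Homepage", "Products", "Reviews"}, 0),
    ("v1", 3, {"Products", "Add to Cart", "Checkout"}, 1),
    ("v2", 1, {"Homepage", "Blog"}, 0),
    ("v2", 2, {"Homepage", "Products"}, 0),
]

PAGES = ["Homepage", "Products", "Reviews", "Blog", "Add to Cart", "Checkout"]

panel = [
    {"visitor": vid, "visit": n, "converted": conv,
     **{p: int(p in pages) for p in PAGES}}
    for vid, n, pages, conv in raw_visits
]

for row in panel:
    print(row)
```

    With the data in this shape, pre-conversion visits (v1’s first two rows above) enter the estimation, so pages browsed while the buyer was still deciding can receive credit, not just the self-selected checkout pages on the day of purchase.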

  • Paid Search Analytics


    Ad Position Analysis

    There are many benefits to being in the top position in Google AdWords – higher click-through rate (CTR), more impressions, greater share of search, and a greater likelihood of increased conversions. Unfortunately, along with these benefits come additional costs for the advertiser.

    Assuming an ad with an average quality score, the cost at an average ad position of 1 is 30% more than at position 2, and the cost at position 2 is 20% more than at position 3. Consequently, brands may want to economize ad campaigns by reducing their bids and settling for a lower ad position. However, this is a mistake that can cost them dearly, because lowering your cost per click is not useful if you’re paying low prices for irrelevant clicks. By discovering new, relevant and valuable clicks, the distribution of your budget will improve substantially.
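    Taking those stated ratios at face value, the implied relative cost per click across the top three positions works out as follows (position 3 indexed at 1.0):

```python
# Relative CPC implied by the stated ratios:
# position 1 costs 30% more than position 2; position 2 costs 20% more than 3.
cpc_pos3 = 1.00
cpc_pos2 = 1.20 * cpc_pos3          # 1.20
cpc_pos1 = 1.30 * cpc_pos2          # 1.56

print(f"Position 1 vs position 3: {cpc_pos1 / cpc_pos3:.0%}")  # 156%
```

    In other words, under these assumptions the top position costs roughly 56% more per click than position 3, which is the premium the CTR gains have to justify.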

    Here’s something interesting!

    Click-through rates fall by 80% between average positions 2 and 3, and by a further 18% between positions 3 and 4. For generic keywords, brands can target an average position of up to 4-5, but for brand-specific terms, accumulating clicks beyond an average position of 3-4 is highly unlikely.
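    Chaining those stated declines gives an implied CTR index (with the CTR at position 2 set to 100):

```python
# CTR index implied by the stated declines (position 2 indexed at 100).
ctr_pos2 = 100.0
ctr_pos3 = ctr_pos2 * 0.20          # 80% drop from position 2 to 3
ctr_pos4 = ctr_pos3 * (1 - 0.18)    # further 18% drop from 3 to 4

print(ctr_pos3, round(ctr_pos4, 1))  # 20.0 16.4
```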

    For a global manufacturer and marketer of consumer and professional products, we analyzed paid search data from Google AdWords for the 6 months ending September 2014 to find the ‘sweet spot’ of average position for their brands. For all brands, we observed a maximum CTR at an average position of 1-2, followed by a steep decline of 80% at an average position of 2-3. The image on the left shows a comparison of CTR with average ad position across brands.

    [Charts: CTR by Ad Position (left); Change in CTR (right)]

    As depicted by the chart on the left, ads beyond position 3 are hardly clicked. Brands need to make sure their ads are placed in the top 3 positions to increase clicks. Since the top 3 positions are more expensive to bid for, one needs to prioritize across campaigns and brands. A strong Quality Score also reduces the cost of bidding for the top position.


    The image on the right quantifies the fall in CTR at positions 3 and 4. We found that at position 3, CTR for branded keywords falls by 65%, whereas CTR for generic keywords falls by 40%. CTR for branded keywords falls to zero within fewer ad positions than for generic keywords.

    More than 90% of clicks are associated with generic keywords for products in general categories, while close to 30% of clicks are associated with branded keywords for personal products. Branded keywords see a steeper fall in both CPC and CTR, compared to a more gradual fall for generic keywords.

    It is clear that CTR at the top positions is considerably higher than at lower positions. This suggests that most consumers conduct limited search and have small consideration sets. Clicks at lower positions suggest that those consumers may be evaluating more ads before making their purchase decisions; these consumers likely have higher purchase intent. In this case, placing ads at intermediate positions may be an effective way to reach higher-purchase-intent consumers without paying more for the top positions.

    We conclude that generic keywords are more contextual than branded keywords and require careful campaign design, depending on whether the category is personal or general, to attain maximum clicks.

  • Matt Gennone

    Matt Gennone recently became part of the TEG Leadership team for North America. With over a decade of experience in the outsourcing/analytics industry, Matt brings a unique perspective to the TEG business. With infectious enthusiasm and a razor-sharp focus on the industry, Matt is a valuable addition to the Genies at TEG. A few excerpts from a recent interview with him…

    1. How did you start your career in analytics?

    I started my career in analytics while working at a small process consulting firm, with a Fortune 500 insurance company looking to establish an analytics center of excellence. Those were very early days, before terms such as big data existed.

    2. How do you see Analytics evolving today in the industry as a whole? What are the most important contemporary trends that you see emerging in the Analytics space across the globe?

    Analytics is finding its way into everyday conversations. I used to have trouble explaining what I do… the average person now knows terms like big data and analytics. I feel analytics is moving towards SaaS and learning algorithms. Knowledge workers are not immune to automation! That said, companies that can meld man and machine will win the day.

    3. What are some of the main customer tenets (philosophies, goals, attributes) that will be a differentiator for the KPO industry going forward?

    Those companies that can solve real business problems and create new business models will be the ones that thrive. Basic problems around data connectivity and BI are being solved.

    4. What are the most significant challenges you face when selling analytics to organizations?

    There is no shortage of analytics projects and ideas. The general challenge is how to consume the analytics and do business differently at the street level. In some instances we are talking about 100-year-old companies that have been successful doing what they are doing. It takes time to realize that the fundamental business environment has changed for these companies.

    5. What is the most interesting thing that you have noticed in the India Offshoring model in your experience?

    I am always humbled by how determined and passionate the workforce in India is. The level of education is top notch, and the focus on hard sciences has created a very strong global advantage in the new economy. I see entrepreneurship taking hold, and once it creates a Silicon Valley culture… WATCH OUT WORLD!

    6. Any interesting client experiences that you would like to share?

    Oh boy, where to start… Having worked for a leading analytics firm for 8 years, they have all been interesting! The most fun to watch are companies that have a roadmap for analytics, and seeing them execute that vision over the years.

    7. Any interesting India experience that you would like to share?

    It might be hard for my Italian relatives, and those in the US to believe, but India has the best ice cream!
