• Medicare : Demystifying Consumer Preference

    Science behind Medicare Modelling



    What is the data and what can it do?

    Ample public data, available through CMS and Medicare, is used in a machine-learning-based tool that mimics consumer choice in Medicare Advantage. By comparing plan features including benefits, MOOP, drug deductibles, star ratings and other attributes across Medicare Advantage plans at the county level, firms can identify the top attributes that determine plan competitiveness, predict enrolments, create marketing strategies and design better products.

    How to leverage data and associated challenges?

    Models can be built to be flexible yet robust, and advanced ensemble techniques and bagging algorithms are used to predict Medicare Advantage enrolments for every single plan in each county in the country. Data from various sources, spread across various files, has to be harmonized, merged and maintained to build the database. The models need to ingest a large volume of data – literally 4,000 attributes for each plan. One has to find an effective way to enable people to use it. And all this needs to be done with a high degree of accuracy and, given the short duration of the AEP (Annual Enrollment Period), within a very limited amount of time.
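    As a rough illustration of the bagging idea (not TEG's actual model), the sketch below fits a bagged ensemble of regression trees on synthetic plan-attribute data; the attribute values and enrolment figures are invented, not the real CMS feed.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)

# Synthetic training data: rows = plan-county combinations,
# columns = plan attributes (premium, MOOP, star rating, ...).
n_plans, n_attrs = 500, 20          # the real data runs to ~4000 attributes
X = rng.normal(size=(n_plans, n_attrs))
y = 1000 + 50 * X[:, 0] - 30 * X[:, 1] + rng.normal(scale=10, size=n_plans)

# Bagging: fit many trees on bootstrap resamples and average their
# predictions, which reduces variance relative to a single tree.
model = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)

pred = model.predict(X[:5])         # one enrolment prediction per plan
print(pred.round(0))
```

    Averaging over bootstrap resamples is what makes the ensemble robust to the noisy, high-dimensional attribute space described above.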

    Is there a reliable and efficient way to do this?

    TEG Analytics has created a holistic solution for this problem: HealthWorks, a platform where all CMS information is available in a single easy-to-use and intuitive dashboard. The models have been honed over the years to give over 99% accuracy in enrolment predictions for plans at county-level granularity within 72 hours of the release of CMS data. The findings can further be used to generate insights about the factors affecting the performance of each plan.

    To achieve this, various data sources are mashed up together: demographic information, the eligible population, market penetration and growth over time, and income levels of the Medicare-eligible; plan-level features including costs, MOOP, benefits, deductibles, drug information, etc.; county-level competitive features such as the number of new entrants and new plans rolled out; and changes in market conditions due to increased costs, MOOP, momentum, etc. Robustness of the models is ensured through hold-out validations done within and across sequential years, and our metrics minimise prediction errors at three different levels – within a county, within organizations, and across large, medium and small plans.
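    The hold-out idea can be sketched as follows: score a hold-out year and summarise absolute percentage error within counties and within organizations. The column names and data below are illustrative assumptions, not the actual CMS schema.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "year":   np.repeat([2016, 2017], 200),
    "county": rng.choice(["A", "B", "C"], size=400),
    "org":    rng.choice(["OrgX", "OrgY"], size=400),
    "actual": rng.integers(100, 5000, size=400).astype(float),
})
# Stand-in for model output; a real run would use predictions from a model
# fitted on the earlier year only.
df["pred"] = df["actual"] * rng.normal(1.0, 0.05, size=len(df))

holdout = df[df["year"] == 2017].copy()     # train on 2016, validate on 2017
holdout["ape"] = (holdout["pred"] - holdout["actual"]).abs() / holdout["actual"]

# Error summarised at two of the three levels described above.
by_county = holdout.groupby("county")["ape"].mean()
by_org = holdout.groupby("org")["ape"].mean()
print(by_county.round(3))
```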

  • FB Workplace buzzes at TEG

    Why FB Workplace – An Ideal Enterprise Networking Platform for Startups?

    Monica woke up on a Monday morning in her cosy double bed. Already running behind her usual schedule, she hurriedly got ready, ate her breakfast and rushed to her workplace. On her way, she realized that she had to update the company employees on the newest trend in Artificial Intelligence and the categorical shift the analytics industry would be taking in automating processes in the future. Drafting a mail and sending it across to all the employees at various locations, including those at client locations, was a cumbersome task that would require the intervention of the centralized mail desk… blah… blah… blah… But then it occurred to her that her company, TEG Analytics, was a Workplace user. She opened the Workplace app on her phone and put up a new post with the link to her article and BINGO!! Within 15 minutes there were more than 25 views.

    4PM and she was sorted for the day. TEG’s annual Cricket Tournament was supposed to start in the next 10 minutes.

    “Arvind wanted to watch the tournament”, she jumped off her seat as she got ready to post a live video in TEG workplace the moment the match started.

    “Damn! Girish dropped yet another catch?” gasped a retired hurt Satya, Captain of Team Andhra Chargers, from his workstation, as he watched the video at his workstation at the other end of the office.

    Welcome to the new age of Enterprise Networking, better known as “Workplace”. After 20 months in a closed beta under the working title Facebook at Work, Facebook has finally brought its enterprise-focused messaging and networking service to market under a new name, Workplace – a platform which connects everyone at your company to turn ideas into action.


    Workplace – which is launching as a desktop and mobile app with News Feed, Groups (both for your own company and with others), Chat direct messaging, Live video, Reactions, translation features, and video and audio calling – is now opening up for anyone to use, and the operative word here is “anyone”. This means that Workplace won’t cater only to the desk-dwelling “researchers” of the company who brainstorm every day for industry insights in their air-conditioned cabins, but also to the more “naive” machine handlers, people whose work involves travel, and everyone who has rarely been included in an organization’s greater digital collaborations. TEG has been very active in using all the features extensively, be it viewing the profile of a recently joined employee, planning its marketing strategy in a closed Marketing group, or creating events with calendar (date/time) details.

    What’s more, Workplace wants to build itself “the Facebook way”, with a unique twist. As explained by Julien Codorniou, director of Workplace, in an interview in London, “we had to build this totally separate from Facebook, and we had to test and get all the possible certifications to be a SaaS vendor”. Workplace has been tested in every milieu, ranging from the most dynamic MNCs to the rather conservative government agencies. In such a scenario, it provides the perfect enterprise social networking platform for the Indian start-up market. What’s better is that Workplace is an ad-free space, separate from your personal Facebook account – hence nothing to distract you. Also, with Workplace designed on the same model as Facebook, people without much exposure to enterprise networking find it interactive and easy to use.

    Facebook has signed up around 800 clients in India including Bharti Airtel and Jet Airways for its workplace version, making the country one of the top 5 in the world for the enterprise communication app. It counts Godrej, Pidilite, MakeMyTrip, StoreKing and Jugnoo as some of its top clients in India.

    “We see it as a different way of running the company by giving everyone a voice, even people who have never had email or a desktop before,” said Julien Codorniou, VP-Workplace by Facebook, which competes with Google, Microsoft and Slack in the office-communication segment. “Every company where you see desk-less workers, mobile-only workers is perfect for us. That is why I think there is a strong appetite for Workplace in India compared to other regions. It is a huge market,” said Codorniou. “Mobile first is a global strategy, but it resonates well with Indian companies.” Facebook says that the Indian workforce below the age of 25 years prefers using mobile applications to communicate rather than emails. Here’s where Workplace wins.

    Another innovative aspect of Workplace is its pricing model, compared to competitors like Yammer, Slack, etc. Unlike its competitors, which charge different rates for low-end basic features and high-end features, Workplace provides all features to its users at the same rate. It charges monthly depending on the number of active users a company has in that month, “active” meaning having opened and used the Workplace account at least once in the month.

    Most enterprise social platforms fail to achieve broad traction because they don’t offer ready answers to the “how” and “how much” questions. If Facebook’s announcement about integrations with Box, Microsoft, Dropbox or even Quip/Salesforce turns out to be true, Workplace will be the all-you-need Enterprise Networking platform. At the end of the day, if you don’t integrate with the tools your customers use, you’re going to lose those customers – and that’s not a very positive payoff.

    Certainly, with a brand like Facebook – which has over the years captured people’s imagination with its innovative approaches – endorsing Workplace, this seems an interesting concept. It remains to be seen how it fares on a completely different platform, the Enterprise Social Network, and the way TEG is using it will help figure out its drawbacks and potential.

  • Madalasa Venkataraman (Madhu)


    1) How did you get interested in working with data?

    I think it’s a personality defect. I am sure my parents despaired of me listening to anything without a sound logically constructed argument.
    I was never one to work on gut feel and was more of a ‘rationalist’ in my college days – I would never accept anything anyone said without proof, or at least without a debate backed by numbers.

    Somewhere along the way, I got into analyzing data just for the heck of it. Cost/benefit analysis, the heuristic optimizations that we do on an everyday basis – these fascinated me. And then I discovered microeconomics and finance – there was a whole world out there that discussed rational decision making in terms of utility functions!  Suddenly, when I learnt statistics, things sort of fell into place, the inherent conflicts in my data analysis and methodologies started having a name and a theory behind them. That was a moment of revelation (as much as passing the first stats course was :)
    To me, data represents a move towards a single truth – a unified view that just ‘is’, the layers and stories it reveals and hides is simply fascinating. Everything that happens, that bugs us, that needs solving, the tools are just there to help us solve, if we have the data. Data science is the medley of statistics meets business meets urgent problems that need to be solved, and that calls out to me.

    I didn’t set out to be a data scientist, and I didn’t set out to be a geek (honestly!). But when training meets passion, the possibilities are endless. Add belief to the mix – the relevance of data sciences and its ability to influence policy, business and I think that’s a winning combination.

    2) What are your principal responsibilities as a data scientist?

    I lead the Stats team at TEG Analytics. My role is to build the team and to make sure we build TEG’s competence in information storage and retrieval, statistical analysis, visualization and business insights. I get involved in projects; we brainstorm and innovate, and come up with amazing solutions that are state-of-the-art, cutting-edge – and relevant to the business context and the business issue/case we are trying to solve.

    3) What innovations have you brought into this role?

    The way I perceive my role is probably a little different to the traditional data scientist role. I am also here to invite our talent into a world of wonderful global innovations in machine learning, in AI, in building the next generation or suite of products and solutions that will solve real world business problems, to inspire them to reach beyond their current projects, to read and to upskill with ravenous hunger. I come from a teaching background. I have been a professor in business studies, and I work together with our teams to build a consulting perspective to our solutions across domains.

    4) Can you share examples of any interesting projects where data science played a crucial role?

    Some recent ones that have been interesting and challenging:
    1. A brand juice sentiment analysis project. This was interesting because of the complexities in the data and in the interpretation of sentiment scores.
    2. A Medicare plan competitiveness analysis based on publicly available CMS data, using which we predicted enrollments in Medicare plans by mimicking customer choice models.

    5) Any words of wisdom for Data Science students or practitioners starting out?

    More often than not, data science is seen entirely as a statistical/analytics effort, or as a business problem where numbers are incidental to the story. Data science is cross-disciplinary in nature – we need both the stats acumen and the business insights. Domain knowledge is essential – be willing to invest in it, for as long as it takes. Knowing the right program and package is cool; stitching the story together and influencing budgetary allocations is more so.

    6) What Data Science methods have you found most helpful?

    Common sense, but that’s not really a data science method. I can’t call out a specific method – I personally like to use a judicious mix of parametric and model-free techniques, depending on the case. On a more serious note, irrespective of the method, or the machine learning, or the neural network package, there is merit in covering the basics. A data dictionary, good foundations, EDA and good design of experiments are mandatory. The rest is really going to change based on the task at hand.

    7) What are your favorite tools / applications to work with?

    I have used a variety of tools. I like Stata quite a lot. I am often asked if R is a better bet than SAS. SAS is a very powerful, accurate tool – its advantage is, if the program runs, the results are pretty much what you are looking for. In R, due to the multitude of packages, it’s easy for beginners to get confused, and the results are more dependent on the programmer’s skill levels.

    8) With data science permeating nearly every industry, what are you most excited to see in the future?

    IOT and AI are converging in a big way. There is tremendous potential, it’s an exciting field. Geo-spatial data is already big, it will get bigger with drone technology and geo-spatial visualization is a great field to look forward to.
    In the sales and marketing analytics field AI/NN models for relevant 1:1 personalization, multi-touch attribution in media efficiencies, hidden Markov models/LSTM for sequence learning in text analysis – these are some of the things to look out for.

    9) What lessons have you learned during your career that you would share with aspiring data scientists entering the field?

    Three things I believe are important: First – business trumps statistics, and that’s the natural order of this world. Second – the solution should be as complex as necessary, and no more; it’s important to embrace Occam’s razor. Fast failure is more important than the perfect model.
    Third – and most important: there are principles and theories in statistics, information modelling and databases – and there are tools and techniques. It is imperative to keep oneself updated on the tools, programs and applications, but always to relate them back to the fundamentals, the principles and the theory.

  • Retail Demand Forecasting

    How to develop an Effective Scientific Retail Demand Forecast?
    Purpose of the Forecast
    The ability to effectively forecast demand is critical to the success of a retailer. Demand forecasting is especially important in the retail industry because it leads to lower inventory costs, faster cash turnover cycles and quicker response to trends. Retailers require forecasts that can direct the organisation through a minefield of capacity constraints, multiple sales geographies and a multi-tier distribution channel. A robust demand forecast engine significantly impacts both the top and bottom lines.

    Demand forecasting helps answer key questions such as: which market will place demand for which specific type of product, which manufacturing unit should cater to which retailer, and how many product units are required in a given season? Given the sophisticated tools and techniques available today, all retailers should replace gut-based decision making with scientific forecasts. The benefits, throughout the lifecycle of the analysis, will far outweigh the one-time set-up and ongoing maintenance costs. There is a lot of value in answering these questions through scientific methodologies as opposed to educated guesses or judgmental forecasts.

    Business Benefits
    Scientific forecasting generates demand forecasts which are more realistic, accurate and tailored to the specific retail business area. It facilitates optimal decision-making at the headquarters, regional and local levels, leading to much lower costs, higher revenues, and better customer service and loyalty.

    Range of Business Users
    Traditionally, only the sales department has used forecasts, but in evolved markets the usage of forecasts is now pan-organizational. Sales revenue forecasting, marketing and promotion planning, operations planning, inventory management, etc. all make extensive use of sales forecasts. Indian retail needs to imbibe this discipline as its scale of operations grows larger and it can no longer cope with the entrepreneurial style of functioning that was the key to its success in the start-up phase.


    Typical Challenges Faced!
    Though demand forecasting is an important aspect of a retail business, more often than not, it is laced with multiple challenges. Some of them could be:

    Level/Scope of the Forecasts
    A large retailer may have thousands of SKUs. A conscious decision has to be made regarding the product hierarchy level at which forecasts are needed, as it is very challenging to produce forecasts for all existing SKUs, nor does it make sound financial sense in most cases. Another concern is the number of stores a typical large retailer operates, and whether a separate forecast is needed for each store.

    In order to optimise the cost-benefit, TEG recommends creation of forecasts at the “Store-Cluster” & “SKU-Cluster” levels. The store clusters are created using store characteristics, like past demand patterns and local/ regional demand factors. The SKU clusters are determined by the category type, life cycle etc.
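    A minimal sketch of the store-clustering step, assuming each store is described by a few numeric demand features (the features here are invented): scale the features, then group stores with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# rows = stores; columns = avg weekly sales, seasonality index, footfall
# (invented features), with three deliberately separated groups of stores.
stores = rng.normal(size=(60, 3)) + np.repeat([[0.0], [5.0], [10.0]], 20, axis=0)

X = StandardScaler().fit_transform(stores)  # scale before distance-based clustering
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                  # store count per cluster
```

    The same recipe applies to SKU clusters, swapping in category-type and life-cycle features.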

    New Product Forecasts
    A retailer typically launches new products every month/season. Using past data to forecast is not feasible, as past data does not exist. TEG tackles this situation by considering complementary products, based on key characteristics like target segment, product category, price level, features, etc. A rapidly emerging methodology is the estimation of future demand using advanced Bayesian models (Fig. 3).

    Bizarre/Missing Historic Sales Pattern
    The erratic sales figures for many items in the store often pose a lot of issues for scientific methods of forecasting. In these situations, we need to resort to extensive statistical data cleaning exercises.

    Non-availability of True Historic Demand
    Historic sales are used to estimate future demand, as they are the only reliable quantitative indicator of customer demand available. However, sales data can end up biased because of stockouts (inventory rupture) or temporary promotional activities. These situations require corrections to the sales history so that it reflects true demand. Since demand bias is very business-specific, such corrections usually require in-depth domain expertise to interpolate/extrapolate the sales figures.
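    One simple way to implement such a correction, assuming out-of-stock weeks are flagged in the data: mask the censored weeks and interpolate demand from the neighbouring weeks. The sales series below is invented.

```python
import pandas as pd

sales = pd.Series([100.0, 110.0, 105.0, 0.0, 0.0, 115.0, 120.0])
in_stock = pd.Series([True, True, True, False, False, True, True])

demand = sales.where(in_stock)              # stockout weeks become NaN
demand = demand.interpolate(method="linear")  # fill from neighbouring weeks
print(demand.tolist())
```

    Real corrections are rarely this mechanical; as noted above, domain expertise drives how promotional lifts and stockouts are adjusted.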

    Forecasting Techniques
    Demand forecasting techniques are broadly divided into two categories: Judgmental and Statistical.


    The Scientific (Statistical) Forecast Models
    Scientific models are divided into two categories: extrapolation models and causal models (Fig. 2). Extrapolation models are based exclusively on past/historic sales data, where the trend, seasonality and cyclicity prevalent in the historic sales are examined to project future sales. However, it is intuitive that future sales depend not only on past sales but also on other factors, viz. economic trends, competitors’ moves, festive events, promotional activities, etc. To incorporate such external factors into forecasting, a variety of causal models are available. In the absence of data on such external factors, extrapolation models provide decent forecasts in most situations.
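    To make the extrapolation idea concrete, here is a minimal hand-rolled version of Holt's linear-trend smoothing, one of the classic extrapolation methods; the sales series is invented.

```python
def holt_forecast(y, alpha=0.5, beta=0.3, horizon=4):
    """Holt's linear-trend exponential smoothing, projected `horizon` steps."""
    level, trend = y[0], y[1] - y[0]
    for x in y[1:]:
        prev_level = level
        # Update the smoothed level, then the smoothed trend.
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

history = [100, 104, 109, 115, 118, 124, 129]   # steadily trending sales
print([round(f, 1) for f in holt_forecast(history)])
```

    A causal model would instead regress sales on external drivers; this purely historical projection is what the extrapolation family does.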

    Key Comparisons of Various Scientific Models


    Implementing Forecasts

    There are two aspects to forecasting implementation: technical and functional. The challenges in each are different. While the technical challenges are easy to solve given the profusion of tools available in the market today, the functional challenges involve significant business process re-engineering, and this is where organizations most often fail to capture the impact of forecasts.

    Technological implementation can be done via modelling tools like SAS, EViews, etc., or via forecasting simulators like TEG’s proprietary FutureWorks™ tool. Given the forecasting model equation, the tools just need the forecasting inputs in order to generate the forecasts. In the case of pure time series models, the inputs are simply past values of the forecasted metric, while in the case of causal forecasting models, we need the forecasted values of the input variables as well. This requires multiple models to be created.
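    The point about causal models needing forecasted inputs can be sketched with a toy ordinary-least-squares model: future values of the driver (promo spend here, an invented example) must themselves be supplied before the sales forecast can be produced.

```python
import numpy as np

spend = np.array([10.0, 12, 11, 15, 14, 18])            # historical driver
sales = 50 + 4 * spend + np.array([1.0, -2, 0.5, 1, -1, 0.5])

# Fit sales = b0 + b1 * spend by ordinary least squares.
X = np.column_stack([np.ones_like(spend), spend])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# The causal forecast needs future values of the driver itself;
# here they are simply assumed, in practice they come from another model.
future_spend = np.array([16.0, 17, 20])
forecast = coef[0] + coef[1] * future_spend
print(forecast.round(1))
```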

    Organisationally, the forecasts need to become essential inputs to key decisions on supply chain, future media spend, inventory reallocation, etc. It should be in the organisation’s DNA that none of these decisions is taken without studying how they would impact future demand. Traditionally, this has been the hardest part of implementation, as organizations used to operating in a quick, informal, entrepreneurial culture often fail to see the benefit of the extra discipline and rigor.

    TEG Scientific Forecasting Process
    TEG follows the CRISP-DM process for all modelling processes, including forecasting.


    A TEG Case Study
    A leading Indian sports goods retailer wanted to develop a scientific forecasting system to forecast future sales across various product hierarchy levels, irrespective of supply-side constraints, to facilitate various short and medium-to-long term business plans. Additionally, the system was to provide an early warning of potential slack across chains/stores to enable course correction for full resource utilisation.

    Methodology & Results
    After setting up the forecasting objective and scope, a list of potential factors (Fig. 5) was considered to build the forecast model for each channel and SKU-cluster combination. A rigorous data treatment phase followed, and various families of statistical models (specified in Fig. 3) were tested for each combination. A single model that produced accuracy at a satisfactory level was finalized. Fig. 6 depicts one such model, which was used to produce forecasts for 12 weeks into the future. As is evident, the model does a good job of anticipating demand around interventions such as ICC events and seasonal promotions, where demand is expected to spike.



    Key Take-Aways

    The deployment of the scientific models in its forecasting process helped the Retailer in the following ways:

    1. Improved Forecasts – The forecasts were improved in the range of ~2-15% across different store-cluster & SKU-cluster combinations.
    2. Better Stock Management – The key achievement was to accurately pinpoint the slack periods for some of the SKU-clusters which were eating up the rack space in those time periods earlier. The retailer was also able to identify the unfulfilled demand for some of the SKU-clusters which was not getting captured with the traditional judgement forecasting approach. Identification of these gaps helped the retailer to better manage the stocks across different store-clusters by relocating them from low demand stores to high demand stores.
    3. Early Warning of Lull Periods – Knowledge of low-sales periods well in advance (12-24 weeks) helped the retailer frame the promotion calendar so that sales could be boosted to meet targets.
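    As a toy illustration of how improvement figures like the ~2-15% above are typically measured, the snippet below compares the MAPE of a judgmental forecast with that of a statistical one; all numbers are invented.

```python
import numpy as np

actual      = np.array([120.0, 130, 110, 150])
judgmental  = np.array([100.0, 145, 125, 130])   # old gut-based forecast
statistical = np.array([115.0, 128, 113, 145])   # scientific forecast

def mape(pred, act):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs(pred - act) / act)) * 100

gain = mape(judgmental, actual) - mape(statistical, actual)
print(round(gain, 1))                            # percentage-point improvement
```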
  • TEG at Cypher 2017

    Madalasa Venkataraman, Chief Data Scientist at TEG Analytics, is a researcher at heart and a contrarian by nature. Madhu brings a unique perspective in leveraging her vast knowledge of statistical concepts and analytical techniques to solve complex business problems. A Fellow from IIM Bangalore, she has worked in academia and the corporate sector. Madhu drives the culture of innovation at TEG and encourages the team to challenge the status quo. She is often seen huddling with project teams to develop solutions through brainstorming and whiteboarding.

    Madhu has 18+ years of experience across marketing, finance, insurance and urban governance. Her areas of interest and involvement at TEG include marketing mix models, semantic/text analytics, recommender systems, forecasting, fraud analytics and pricing analytics. Madhu is also an avid columnist; her publications frequently appear in academic and policy journals.

    Given that TEG has immense expertise in Sales and Marketing Analytics, Madhu is speaking at Cypher 2017 on how Big Data and AI in Sales and Marketing Analytics are driving micro-segmentation and personalized campaigns.

    As companies learn to process the flood of data from all sides, traditional models of marketing are slowly giving way to smarter, niche strategies. Firms are using big data analytics to uncover highly profitable niche segments, changing their channel management strategy and sales plays. While Big data helps in identifying these micro-segmentation layers, AI is used to personalize their campaign efforts to reach out to this targeted audience.

    Look forward to seeing you at the conference on 21st September from 11.20 – 12.15 pm. Feel free to reach out to us for more details.

    Who should attend: “geeks” who are interested in AI and Big Data, “suits” who are trying to understand its implications for sales and marketing, and “genies”, as we like to call them – a combination of both profiles – will reap the most benefit from this session.

  • Offshore Analytics COE

    Offshore Analytics COE – cracking the code

    What is an ACOE?

    Increasingly, companies rely on their information systems to provide critical data on their markets, customers and business performance in order to understand what has happened, what is happening – and to predict what might happen. They are often challenged, however, by the lack of common analytics knowledge, standards and methods across the organization. To solve this problem, some leading organizations are extending the concept of Centers of Expertise (COE) to enterprise analytics.

    With these COEs, they have realized benefits such as reduced costs, enhanced performance, more timely service delivery and a streamlining of processes and policies. An Analytics COE (ACOE) brings together a community of highly skilled analysts and supporting functions, to engage in complex problem solving vis-à-vis analytics challenges facing the organization. The analytics COE fosters enterprise-wide knowledge sharing and supports C-level decision making with consistent, detailed and multifaceted analysis functionality.

    The eternal debate – in-house versus outsource

    On scanning the market, it is evident that the in-house and outsourced models are equally prevalent, at least in India-based ACOEs. Most financial institutions like Citicorp, HSBC, Barclays, etc. have chosen to go in-house, primarily due to data sensitivity issues. Firms in industries where data security concerns are not as high, like CPG, pharma, etc., typically choose third-party specialized analytics shops to set up an ACOE for them. While deciding between in-house and outsourced, some points to keep in mind are:

    1. External consultants can be utilized for the heavy lifting i.e. data cleansing & harmonisation / modeling / reporting work. Internal resources with their better understanding of the competitive scenario, internal business realities and management goals can concentrate on using the insights generated from the analysis / reporting to formulate winning strategies/tactics
    2. External consultants provide you the flexibility of ramping up / down at short notice based on fluctuations in demand
    3. Analytics resources span a wide variety of skill sets across data warehousing / BI / modeling / strategy. It’s difficult to find folks with skills/interests across all these areas. Often you do not need a skill set full time, e.g. a modeler might be needed only 50% of the time. If you hire internally, you have to utilize him/her sub-optimally for the remaining 50%. An external team gives you the flexibility to alter the skill mix depending on demand while keeping the headcount constant, e.g. a modeler can be swapped for a DW/BI resource if the need arises
    4. Possibility of leveraging experience across clients / domains.

    Initiating the engagement

    As with any outsourcing arrangement, setting up an ACOE is a 3 step process



    Ongoing governance of the relationship


    At TEG we recommend a 3 tier governance structure as described in the figure above for all ACOE relationships.

    1. The execution level relationship between analysts on both sides that takes decisions on the day to day deliverables
    2. The Project Manager – Client Team lead level relationship that works to provide prioritization and resolve any execution issues
    3. Client Sponsor – Consultant senior management level relationship that works on relationship issues, contractual matters, account expansion, etc.

    Projects executed under ACOE

    Typically, any project/process that needs to be done on a regular, repeated basis is ideal for an ACOE. Building out an ACOE ensures a high level of data and business understanding, as the same analysts work across multiple projects. This set-up is not suitable for situations where the analytical work happens in spurts, with periods of inactivity in between.

    TEG runs ACOE for several Fortune 500 clients, and the analysts are engaged in a variety of tasks

    1. Apparel & Sports goods retailer
      • Maintain an Analytical Datamart of all sales, sell-through, sell-in and pricing data across multiple franchisee stores and accounts
      • Maintain the entire suite of sell-through reporting for retail operations, merchandising & sales teams. This set of reports includes sales & inventory tracking, SKU performance and promotion tracking at various levels
      • Formulate promotion pricing strategy for factory outlet stores using sell through data
    2. Beauty products major
      • Survey analytics, identifying key trends from the survey results and drivers analysis
      • Market Basket Analysis, analyse past purchase history to identify the product combinations that have a natural affinity towards each other. Insights based on this analysis are used for cross-promotions, brochure layout, discount plans, promotions, and inventory management
      • ETL on the sales and marketing data to create an Analytical Data Mart that can be used as a DSS tool for strategic pricing & product management decisions
      • Online competitor price tracking, create a link extractor that scrapes price aggregator and competitor websites and creates a database of competitor product prices. This database is used by our client to perform price comparison studies and take strategic decisions on pricing
      • Generate Executive Management Workbooks to track market share of Top 100 products & provide analytical insights
    3. Credit card and personal finance firm
      • Creation of basic customer marketing, risk & collections reports with multiple slicers for extensive deep-dive analysis of customer transaction data
      • Collection queue analysis, ensuring equitable distribution of collection calls amongst different collections agents
      • Customer life time value analysis
      • Customer product switching analysis
      • Acquisition & active customer model scoring & refresh
    4. Nutritional & consumer products MLM firm
      • Campaign management using SAS, SQL & Siebel: complete campaign management including propensity model creation, audience selection for specific campaigns, campaign design using DOE methodology, control group creation, campaign loading in the CRM system, and post-campaign analysis
      • Customer segmentation
      • Distributor profitability analysis
      • Customer segment migration analysis using Markov chain based models
    5. CPG major in household cleaning products
      • Creation of a digital analytics DataMart using data from 18+ sources across 11 marketing channels
      • Creation and maintenance of complete reporting and dashboard suite for digital marketing analysis and reporting
      • Price and promotion analysis, price elasticity modeling, and a pricing tool to determine the revenue and profitability impact of key pricing decisions
      • Market share reporting across 25 countries in LATAM & APAC
      • Creation of data feeds for MMX modeling
      • Shipment, inventory & consumption analysis with a view to optimizing inventory and shipping costs
      • SharePoint dashboard creation to track usage of corporate help resources
    6. Consulting company focused on automobile sector
      • Demand forecasting of automotive sales based on variations in marketing spend across DMAs
      • Propensity modeling to determine the ideal prospects for direct sale of a customized electric vehicle
      • Customer segmentation to determine the ideal customer profile for relaunch of a key model
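    The market basket work mentioned for the beauty products client above rests on simple pairwise affinity measures: support (how often two products appear in the same basket) and lift (how much more often than chance). A minimal sketch in Python, using hypothetical basket data and product names:

```python
from collections import Counter
from itertools import combinations

def pair_lift(transactions):
    """Support and lift for every product pair across a list of baskets."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    stats = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        # lift > 1: the pair co-occurs more often than independence would predict
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        stats[(a, b)] = (support, lift)
    return stats

# Hypothetical baskets; product names are illustrative only
baskets = [
    ["lipstick", "mascara"],
    ["lipstick", "mascara", "cleanser"],
    ["cleanser", "toner"],
    ["lipstick", "mascara"],
]
print(pair_lift(baskets)[("lipstick", "mascara")])  # support 0.75, lift ~1.33
```

    Pairs with lift well above 1 are the candidates for cross-promotions and adjacent brochure placement; a production version would run on the full transaction history and consider larger itemsets.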

    Key takeaways

    The ACOE model has been successfully deployed by clients across a variety of industries to beef up their analytical capabilities.

    In some cases the requirement is tactical, for a limited period of time, but mostly clients use it strategically to harness best-of-breed capabilities that are difficult to build in-house. The critical success factors in an ACOE relationship are:

    1. Strong business understanding of client processes by the consultant team. This is usually achieved by posting key resources onsite, either permanently or on a rotational basis
    2. Strong governance at multiple levels
    3. Tight adherence to business and communication processes by both parties
    4. Well defined scope of services for the consultant teams
  • Improving Marketing Effectiveness (Using Performance Pointer)

    Improving Marketing Effectiveness (Using Performance Pointer)

    Retail and consumer goods companies run multiple campaigns, promotions, and incentives to lure customers to buy more. A great deal of time, energy, and resources is deployed to execute these promotion programs. In most organisations the allocation of funds to the various programs is based on gut feel or past experience. If the promotions do not pan out the way they were intended, or perform better than expected, the decision maker is unable to explain the phenomenon or repeat the performance. Moreover, promotions may be targeted at a macro segment while being effective only in a micro segment, thereby reducing the overall effectiveness of the program.


    Using analytics, gut-based decisions can be supported by facts. This helps the business make better decisions and stay ahead of competitors who rely purely on gut feel or past experience. We have chosen an MLM (Multi-Level Marketing) company as an example, as the problem of allocating funds to various incentives is further amplified by the large sales forces engaged by MLM companies. The retail and consumer goods industries can draw a parallel between these incentives and the promotions run through the year, targeted at various segments, to improve sales.

    According to Philip Kotler, one of the distribution channels through which marketers deliver products and services to their target market is Direct Selling where companies use their own sales force to identify prospects and develop them into customers, and grow the business. Most Direct Selling companies employ a multi-level compensation plan where the sales personnel, who are not company employees, are paid not only for sales they personally generate, but also for the sales of other promoters they introduce to the company, creating a down line of distributors and a hierarchy of multiple levels of compensation in the form of a pyramid. This is what we commonly refer to as Multi-Level Marketing (MLM) or network marketing. Myriad companies like Amway, Oriflame, Herbalife, etc. have successfully centered their selling operations on it. As part of Sales Promotion activities MLM companies run Incentive programs for their Sales Representatives who are rewarded for their superior sales performance and introducing other people to the company as Sales Representatives.


    Business Challenge
    Although Incentives play a major role in sales lift, many other environmental factors, such as advertisement spend, the economic cycle, seasonality, company policies, and competitor policies, also affect sales, and it becomes increasingly difficult to isolate the impact of incentives on sales. MLM companies usually run multiple, overlapping incentive programs, i.e. at any given time more than one incentive program runs simultaneously (see figure below). Rewards could be monetary, or include non-monetary items like jewelry, electronic items, travel, cars, etc., and are offered on a market-by-market basis. A key question that arises is: “how do we understand the effectiveness of these multiple incentive programs?”

    Since the success of any MLM company is largely dependent on the performance of its Sales Representatives, Incentive programs are of paramount importance for realising the company’s marketing objectives and hence form a vital component of its marketing mix. Some of the large Direct Selling corporations end up spending millions of dollars on Incentive programs in every market. It is important for incentive managers to understand the effectiveness of the incentive programs, often measured in terms of Lift in Sales, in order to drive a higher Return on Incentive Investment, as against making gut-based investment decisions. To summarise, it is not easy to measure the ROI of Incentive programs for two broad reasons. First, it is difficult to separate the Lift in Sales due to Incentive programs from that due to other concurrent communication and marketing-mix actions. Secondly, in most cases there may be no “Silence Period”, i.e. one or more incentive programs are running at all points of time. This makes the task of baseline sales estimation virtually impossible by conventional methods.
Usage of Analytics – the science of making data-driven decisions – becomes indispensable in order to address the above constraints while at the same time statistically quantifying individual Incentive ROI and making sales forecasts with sound accuracy. While doing so, a systematic approach using best practices is followed in order to obtain reliable results in a consistent and predictable manner.

    It is very tempting to jump straight into the data exploration exercise. However, a structured approach ensures the outcome will be aligned with the business objectives and the process is repeatable.

    Objective Setting
    The first step towards building an analytics-based solution is to list the desired outcomes of the endeavor prior to analysing data. This means thoroughly understanding, from a business perspective, what the company really wants to accomplish. For instance, it may be important for one MLM company to evaluate the relative effectiveness of the various components of its marketing mix, which includes Incentives along with pricing, distribution, advertising, etc., while another company may be interested in tracking the ROI of its past incentive programs in order to plan future ones. In addition to the primary business objectives, there are typically other related business questions that the Incentive Manager would like to address. For example, while the primary goal of the MLM company could be ROI estimation of Incentive programs, the incentive manager may also want to know which segments of Representatives are more responsive to a particular type of Incentive, or whether incentives are more effective in driving the sales of a particular product category. Moreover, it may also be prudent to design the process so that it can be deployed repeatedly across multiple countries, rapidly and cost-effectively. This is feasible where a direct selling company has uniform data encapsulation practices across all markets. A good practice while setting objectives is to identify the potential challenges at the outset. The biggest challenge is the volume of operational data: in the case of retail and direct sellers it runs into billions of observations over a period of a few years.

    Data Study
    Most Direct Sellers maintain sales data at granular levels, aggregated over time and across categories, along with Incentive attributes, measures and performance indicators. Hence, we can safely assume that most companies will have industry-specific attributes like a multi-level compensation system. However, every MLM company will also have its own specific set of attributes which differentiate it from its competitors. It is therefore vital to develop a sound understanding of historical data in the given business context before using it for model building. This exercise also entails accurately understanding the semantics of various data fields. For instance, every MLM company will associate a leadership title with its Representatives; however, the meaning of a title and the business logic used to arrive at the leadership status of a Representative will vary from company to company. A good understanding of other Representative population attributes like age, duration of association with the company, activity levels, down-line counts, etc. also leads to robust population segmentation. Depending on business objectives, other data related to media spend, competitor activity, macroeconomic variables, etc. should also be used. A potential issue of data fragmentation might arise here, as voluminous data is broken up into smaller parts for ease of storage, which needs to be recombined logically and accurately using special techniques while reading raw data.
    Moreover, raw data coming from a data warehouse usually contains errors and missing values. Hence, it becomes important to identify them through a comprehensive data review exercise so that they may be suitably corrected once the findings have been validated by the client. In extreme cases the errors and inconsistencies might warrant a fresh extraction of data. This may lead to an iterative data review exercise, which is also used to validate the entire data understanding process. Any lapse in data understanding before preparing data for modeling might lead to bias and estimation errors later. A data report card helps clients understand the gaps in their data and establish procedures to fill them. A sample scorecard is shown for reference.


    Data Preparation
    The modeling phase requires clean data with specific information in a suitable format. Data received from the company cannot be used as modeling input as-is; data preparation is needed to transform input data from various sources into the desired shape. Not all information available in the raw data may be needed. For example, variables like the name of the Incentive program, the source of placing orders, or the educational qualifications of Representatives may not be needed for building a model. Key variables that must go into model building are identified and redundancies removed from the data. Observations with incorrect data are deleted, and missing values may be ignored or suitably estimated. Using the derived Representative attributes, the population is segmented into logically separate strata. Data from different sources is combined into a single table, and new variables are derived to add more relevant information. In the final step of the data preparation exercise, measures are aggregated across periods, segments, geographies and product categories. The aggregated data is used as the input for modeling.



    Data Modeling
    A statistical model is a set of equations and assumptions that form a conceptual representation of a real-world situation. In the case of an MLM company, a model could be a relationship between Sales and other variables like Incentive costs, number of Representatives, media spend, Incentive attributes and Representative attributes. Before commencing the modeling exercise, the level at which the model should be built needs to be ascertained.

    A Top-Down approach builds the model at the highest level, and the results are proportionally disseminated down to the population segment level and then on to the individual level. The resultant model may not properly account for the variation in the dependent variable and may introduce bias in estimates, as the existence of separate strata in the population is ignored during model building. A Bottom-Up approach, on the other hand, builds the model at the individual level, and the results are aggregated up to the segment level and then on to the top level. This is exhaustive but, at the same time, very tedious, as Sales data usually runs into millions of observations and not all Representatives may be actively contributing to Sales at all times. Moreover, if the project objective revolves around estimating national figures, this exercise may become redundant. The Middle-Out approach may be the best way to model the data: the model is built at the segment level and, depending on the requirement, the results may be aggregated up to the top level or proportionately disseminated down to the individual level.

    The first step of the model building exercise is to specify the model equation. This requires determining the Dependent variable, the Independent variables and the Control variables. Control variables are those variables that condition the relationship between the dependent variable and the independent variables.
In a baseline estimation scenario, the Sales measure is the dependent variable; Incentive cost and other Incentive attributes form the independent variables; segmentation variables, time series, geography, inflation and other variables like media spend act as Control variables. Usually the model is non-linear, i.e. the dependent variable is not directly proportional to one or more independent variables. A non-linear model may be transformed into a linear model by applying appropriate data transformations. For example, the relationship between Sales and Incentives is non-linear: Representative Incentives behave like consumer coupons, where there is an initial spike in Sales at the start of the Incentive followed by a rapid decline, and the impact returns at the end of the incentive as Representatives try to beat the deadline. Applying a coupon transformation to the Incentive variables therefore produces a linear relationship between Sales and Incentives. Model coefficients are then estimated using advanced statistical techniques like Factor Analysis, Regression and Unobserved Component Modeling. The common practice across the industry is to use Regression Analysis for explaining the relationship between the dependent variable and the independent variables, and to separately employ time series ARIMA (Auto Regressive Integrated Moving Average) models for forecasting, as the data invariably has a time component. To solve the Regression models with all incentive attributes accounted for, the attributes are first condensed into a few underlying factors accounting for most of the variance. These factors then enter the regression along with the control variables, and the final coefficients are a combination of factor loadings and model coefficients. The Regression model equation allows us to understand how the expected value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.
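    The coupon-style transformation described above can be sketched in a few lines. The shape parameters below (launch decay, deadline boost) are illustrative assumptions, not values from any actual model:

```python
def coupon_transform(flags, decay=0.5, end_boost=0.8):
    """Turn a 0/1 incentive-active series into a U-shaped effect curve:
    a spike at launch that decays, and a second rise near the deadline."""
    out = [0.0] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1  # find the end of this incentive window
            length = j - i
            for k in range(length):
                # launch effect decays forward; deadline effect decays backward
                out[i + k] = decay ** k + end_boost * decay ** (length - 1 - k)
            i = j
        else:
            i += 1
    return out

# A four-period incentive: high at launch, dip mid-window, rise at the deadline
print(coupon_transform([0, 1, 1, 1, 1, 0]))
```

    The transformed series, rather than the raw on/off flag, then enters the regression, making the Sales-Incentive relationship approximately linear in the transformed variable.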
The time series model is developed by reducing the non-stationary data to stationary data, removing the seasonal and cyclic components and estimating the coefficients of the ARIMA model. This approach often leads to discordant answers from the Regression and ARIMA models, as Regression analysis will miss the trend while ARIMA forecasting may fail to account for causal effects. Unobserved Component Modeling may be employed if very high accuracy is desired from the model. It leverages the core concept of time series analysis that observations close together tend to behave similarly, while patterns of correlation (and regression errors) break down as observations get farther apart in time. Hence, regression coefficients are allowed to vary over time. The usual observed components representing the regression variables are estimated alongside unobserved components such as trend, seasonality, and cycles. These components capture the salient features of the data series that are useful in both explaining and predicting series behavior. Once the model coefficients are determined, it is essential to validate the model before using it for forecasting.
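    Reducing a non-stationary series to a stationary one, as the ARIMA step above requires, is typically done by differencing. A minimal illustration on a synthetic trending series:

```python
def difference(series, lag=1):
    """Difference a series: x'[t] = x[t] - x[t - lag].
    lag=1 removes a linear trend; lag=12 would remove monthly seasonality."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# A synthetic series with a pure linear trend is non-stationary...
trend = [2 * t + 5 for t in range(12)]
# ...but its first differences are constant, hence stationary
print(difference(trend))
```

    In practice both first and seasonal differencing may be applied before the ARIMA coefficients are estimated.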


    Model Validation
    The validity of the model is contingent on certain assumptions that must be met. First, the prediction errors should be normally distributed about the predicted values, with a mean of zero. If the errors have unequal variances, a condition called heteroscedasticity, the Weighted Least Squares method should be used in place of Ordinary Least Squares Regression. A plot of residuals against the predicted values of the dependent variable, any independent variable, or time can detect violations of this assumption. Another assumption made for time series data is that prediction errors should not be correlated through time, i.e. errors should not be autocorrelated. This may be checked using the Durbin-Watson test. If errors are found to be autocorrelated, then Generalized Least Squares Regression should be used. It is also important to check for correlation among the independent variables, a condition called multicollinearity. It can induce errors in coefficient estimates and inflate their observed variances, indicated by a variable’s Variance Inflation Factor (VIF). Multicollinearity can be easily detected in a multiple regression model using a correlation matrix of all the independent variables: high values of correlation coefficients indicate multicollinearity. The simplest way to solve this problem is to remove collinear variables from the model equation. However, it may not always be feasible to remove variables; for example, the cost of an Incentive program is an important variable that cannot be removed even if it is found to have a high Variance Inflation Factor. In such cases Ridge regression may be used in place of OLS Regression, though some bias may sneak into the coefficient estimates. The goodness of model fit may be judged by the value of R², the model’s coefficient of determination, which ranges between 0 and 1. A good fit will have an R² value greater than 0.9.
However, any value of R² very close to 1 must be treated with caution, as it could indicate over-fitting; such a model would give inaccurate forecasts. A low value of the Mean Absolute Percentage Error (MAPE) of the predicted values is also indicative of a good fit. Once the model assumptions are validated and goodness of fit established, the model equation can be used for reporting and deployment purposes.
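    Two of the diagnostics above, MAPE and the Durbin-Watson statistic, are simple to compute. A minimal sketch (the sample actuals and predictions are illustrative, not taken from any real model):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def durbin_watson(residuals):
    """Durbin-Watson statistic: ~2 suggests no autocorrelation,
    near 0 positive autocorrelation, near 4 negative autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e ** 2 for e in residuals)

actual = [100, 110, 120, 130]
predicted = [98, 112, 118, 133]
residuals = [a - p for a, p in zip(actual, predicted)]
print(round(mape(actual, predicted), 2), round(durbin_watson(residuals), 2))
```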


    Reporting & Deployment
    Depending on the dependent variable chosen for the incentive modeling exercise, baseline measures like Sales, Volume, Representative count, etc. can be estimated using the model equation. These estimates, along with other variables and derived values, can be used to obtain insights about Incentive performance through dashboards with KPIs and other predefined reports, such as annual Lift in Sales vs. Incentive Cost or Baseline Sales vs. Sales Representative Count. The key to realizing the business objectives and deriving value from the modeling outcomes is to capture and present the findings in the form that best enables the end user to understand the business implications, and to flexibly slice and dice the data in any convenient way, without having to make costly investments in acquiring and maintaining system resources. For example, an incentive manager could look at the average ROI of a particular type of Incentive program as a pre-built report, and be given the flexibility to compare the cost of that Incentive with that of another type over an online hosted analytics platform that presents pre-canned reports along with user-customizable reports and multi-dimensional data analysis capability. Such a system gives the end user the freedom to access the reports and analyse the data anytime, anywhere, using an internet browser. Once deployed, it may be refreshed with additional data in the future and may also be used for multiple markets with minor region-specific customisations.




    The insight-based approach will significantly increase the confidence of incentive managers while planning incentive programs for MLM activities. They will be able to identify the Incentive programs which deliver high, medium and low paybacks, and hence optimize investment in them. It will also help the Direct Seller check whether any product categories are more responsive to Incentives than others. The endeavor can make a significant impact where counter-intuitive facts surface. For example, a particular event or holiday that influences the design of incentive programs during a certain time of the year may actually turn out to be an insignificant contributor to company sales. Incentive Managers can simulate various scenarios by assigning different values to the contributors and macroeconomic variables and forecast the ROI of upcoming incentive programs. This will enable regional incentive managers to drive efficiency and effectiveness in incentive planning and realise the company objective of enhanced Sales and ROI. The share of Incentive programs in the marketing budget of most Direct Sellers has been progressively increasing, and the expenditure incurred is steadily going up in the face of competition in emerging markets like India and China, which are fast becoming the engines of growth for global Direct Sellers. Investment in analytics-based decision support systems will prove to be the difference maker for Direct Sellers.


  • Random Forest in Tableau using R

    Random Forest in Tableau using R

    I have been using Tableau for some time to explore and visualize the data in a beautiful and meaningful way. Quite recently, I have learned that there is a way to connect Tableau with R-language, an open source environment for advanced Statistical analysis. Marrying data mining and analytical capabilities of R with the user-friendly visualizations of Tableau would give us the ability to view and optimize the models in real-time with a few clicks.

    As soon as I discovered this, I tried to run the machine learning algorithm Random forest from Tableau. Random forest is a machine learning technique to identify features (independent variables) that are more discerning than others in explaining changes in a dependent variable. It achieves that by ensembling multiple decision trees that are constructed by randomizing the combination and order of variables used.

    The prediction accuracy of Random forest depends on the set of explanatory variables used in the formula. To arrive at the set of variables that makes the best prediction, one often needs to try multiple combinations of explanatory variables and then analyze the results to assess the accuracy of the model. Connecting R with Tableau will help you save a lot of time that would have otherwise gone into the tedious task of importing the data into Tableau every time you add/remove a variable.

    Tableau has a function script_real() that lets you run R scripts from Tableau. To use this function in any calculated field, you need to set up the connection by following these steps:

    1. Open R Studio and install the package ‘Rserve’


    2. Run the function Rserve()


    3. Once you see the message “Starting Rserve…”, open Tableau and follow the steps below to set up the connection


    When you click on “Manage External Service Connection” or “Manage R Connection” depending on the version of Tableau, you’ll see the following window.


    Click OK to complete the connection between Tableau and R on your machine.

    Let’s take a simple example to understand how to leverage the connection with R to run Random Forest. In this example, I need to predict the enrollments for an insurance plan based on its features (say costs and benefits) and the past performance of similar plans.

    After importing the dataset into Tableau, we need to create a calculated field using the function script_real() to run the random forest script. As a standalone R script, it looks like this:

    library(randomForest)
    Data <- read.csv("C:/Tableau/Test 1.csv")
    Data15 <- Data[Data$Year == 2015, ]   # model-building year
    Data16 <- Data[Data$Year == 2016, ]   # prediction year
    attach(Data15)
    rf <- randomForest(Enrollments ~ ., data = Data15,
                       ntree = 1000, importance = TRUE, do.trace = 100)
    yhat <- predict(rf, Data16)
    Data16$Enrollments <- yhat

    To run the same script in Tableau using the function script_real(), we need to create a dataframe using only the required columns of the imported dataset. This must be done through the arguments .arg1 to .arg6 instead of the actual column names, since R can access only the data that is passed in through these arguments.

    The values for these arguments should be passed at the end of the R-script in the respective order i.e., .arg1 will take the values of the first mentioned field, .arg2 will take the values of the second mentioned field and so on.

    After making these changes, the code will look like the following:

    SCRIPT_REAL('
      Data <- data.frame(.arg1, .arg2, .arg3, .arg4, .arg5, .arg6)
      Data15 <- Data[Data$.arg1 == 2015, ]
      Data16 <- Data[Data$.arg1 == 2016, ]
      formula <- .arg2 ~ .arg3 + .arg4 + .arg5 + .arg6
      rf <- randomForest(formula, data = Data15, ntree = 1000,
                         importance = TRUE, do.trace = 100, na.action = na.omit)
      yhat <- predict(rf, Data16)
      Data16$.arg2 <- yhat
      testdata <- rbind(Data15, Data16)
      testdata$.arg2
    ', ATTR([Year]), SUM([Enrollments]), SUM([Plan feature 1]),
      SUM([Plan feature 2]), SUM([Plan feature 3]), SUM([Plan feature 4]))

    The calculation must be set to “Plan ID” level to get the predictions for each plan ID.

    Although this approach achieves the objective of predicting enrollments for each plan, it doesn’t offer the flexibility to run multiple iterations without changing the code manually. To make running the model easier, we can create parameters as shown below to choose the variables that go into the model.


    Then, we can create calculated fields (as shown below) whose values change based on the variables selected in the parameters.

    case [Parameter1]
    when "Plan Feature 1" then [Plan feature 1]
    when "Plan Feature 2" then [Plan feature 2]
    when "Plan Feature 3" then [Plan feature 3]
    when "Plan Feature 4" then [Plan feature 4]
    else 0
    end

    After replacing the variables in code with parameters, the code will look like below:

    SCRIPT_REAL('
      Data <- data.frame(.arg1, .arg2, .arg3, .arg4, .arg5, .arg6)
      Data15 <- Data[Data$.arg1 == 2015, ]
      Data16 <- Data[Data$.arg1 == 2016, ]
      formula <- .arg2 ~ .arg3 + .arg4 + .arg5 + .arg6
      rf <- randomForest(formula, data = Data15, ntree = 1000,
                         importance = TRUE, do.trace = 100, na.action = na.omit)
      yhat <- predict(rf, Data16)
      Data16$.arg2 <- yhat
      testdata <- rbind(Data15, Data16)
      testdata$.arg2
    ', ATTR([Year]), SUM([Enrollments]), SUM([var 1]), SUM([var 2]),
      SUM([var 3]), SUM([var 4]))

    This will let us run multiple iterations of random forest very easily as compared to manually adding and deleting variables in the code in R for every iteration. But, as you might have observed, this code takes exactly 4 variables only. This might be a problem since having a fixed number of variables in the model is a privilege you rarely (read as never) have.

    To keep the number of variables dynamic, a simple way in this case is to select “None” in the parameter which will make the corresponding variable 0 in the data. Random forest will ignore a column in the data if all the values are zero.

    As long as the number of variables is not too high, you can create as many parameters as needed and select “None” in a parameter when you don’t want to add any more variables.
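    The same effect can be made explicit outside Tableau: a constant (zero-variance) column carries no information for a tree split, so it can simply be dropped before modelling. A small sketch in Python for brevity, with hypothetical column names:

```python
def drop_constant_columns(rows, names):
    """Drop zero-variance columns, e.g. the all-zero placeholders produced
    when 'None' is selected in a parameter."""
    keep = [j for j in range(len(names)) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], [names[j] for j in keep]

rows = [[2015, 0, 5.0], [2016, 0, 6.5]]   # second column is a "None" placeholder
cols = ["Year", "None", "Plan feature 1"]
print(drop_constant_columns(rows, cols))
```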

  • What happens when the big boys of the insurance industry meet under one roof?

    What happens when the big boys of the insurance industry meet under one roof?

    Apart from assessing how deep everyone’s pockets are, they discuss what they can do to make them deeper. The last 10 years have been incredibly profitable for the insurance industry as a whole – Personal, Commercial, Property, Life, Annuity, Healthcare – you name it! However, the landscape is ever-changing: social networking is shifting the balance of power to consumers, environmental pressures need to be addressed, economic power is rising in emerging markets, geo-political issues loom and, last but not least, the explosion of data and technology poses great risks to the insurance industry. And, typically, insurers take risk seriously!


    It was a pleasant summer morning in a big conference hall in Chicago where the leaders of the insurance industry descended. All of these leaders were Chief Data/Insurance/Integration/Digital/Analytics/Innovation/Customer/Data Science Officers, or basically anyone who had anything remotely to do with data in insurance. A key theme of the day was how insurance firms can move from a merely data-centric approach to one that uses data to solve business problems. The issues discussed ranged from leveraging big data technologies and retiring legacy systems, to creating a culture of analytics in the organization, hiring the right people to work with data, using advanced machine learning, and what can be done with the Internet of Things. As an analytics-as-a-service company, TEG Analytics fueled passionate discussions on how to leverage advanced analytical techniques to drive business value from data. Here’s a summary of what was discussed.

    As users of technology, insurers are typically laggards compared to technologically progressive industries. In an environment of data proliferation and inexpensive computing and storage, as one CTO put it, there is little excuse for not embracing a technology ecosystem that can drive gains from automation, operational efficiencies and improved customer experience. Historically, insurers have used structured data to make tactical and operational decisions around customer targeting, risk pricing, loss estimation, etc. However, with the advent of the Internet of Things, massive volumes of unstructured and sensor data are becoming available. According to one CDO, this is giving rise to a new generation of consumers who demand speed, transparency and convenience, reversing the age-old wisdom that ‘insurance is sold and not bought’. Choices are becoming harder to comprehend in a digital world of multiple interactions, as they depend more on trust defined by social networks than on agents and intermediaries. To harness this ‘Big Data’ trend and complement structured data with unstructured data, financial and intellectual investments are being made to allow insurers to make strategic, forward-looking decisions from data. The unanimous view was to add new types of information, integrate external data sources, and incorporate more granularity into the data. With the NAIC, the standard-setting regulatory support body, present in the room, the importance of adhering to data governance policies was underscored.


    Some gasps were let out when a head of data science said that 75% of models never make it to implementation. A theme that pervaded the entire conference was developing a culture of analytics within an organization. Data has always been a key ingredient in the different functions of the insurance industry: risk, underwriting, pricing, campaigns, claims and so on all use data and key metrics in some way or other. The problem arises when executives are unwilling to operationalize insights from data into decisions. This is where leaders in data, information and analytics must work together to inculcate a data-driven decision-making process in the organization. Seamless integration of these three divisions’ processes can transform the business without losing sight of feasibility and risk. Harmonizing the analytics and business functions is imperative to capitalizing on the tactical and strategic benefits of data. With new competitive pressures, risks and opportunities in the market, the CAO must build a case for change with other business leaders. TEG Analytics believes that analytics teams must work collaboratively with business leaders to define a clear, well-defined goal rooted in business strategy. Undertaking an analytics project with a business sponsor, driven by a desired outcome, with insights delivered at the speed of business, can create immediate, implementable value for the business function. Arvind (CEO of TEG Analytics) noted that data science teams are sometimes infamous for interacting with business teams in a language only they understand. They should engage project owners more holistically and take them on the journey from start to finish, ending with a plan or recommendation that is implementable.

    Sophisticated analytics matures to a point where no more useful information can be extracted and key decision-making has been automated to provide sharp, quick insights. Different functions in the insurance domain have historically used data; however, there is a big gap between using data and using data to make decisions swiftly.

    Underwriting in insurance can be automated and made intelligent by combining structured data and sensor/IoT information with unstructured data. Using process mining techniques, NLP and deep learning algorithms, insurers can build personalized underwriting systems that take into account unique behaviors and circumstances.

    With the onset of internet, mobile and social channels, the way consumers interact has changed. This has led to the disappearance of two things: distributor sales channels and the concept of ‘advice’ before buying an insurance product. Insurers must track the entire consumer journey to understand customers’ needs and sentiments and design personalized products. Advanced machine learning techniques can be leveraged to infer customer behavior from this data. This machine-advisor evolution will offer intelligence based on customer needs, building recommender systems to advise on products.
    Analytics will also help improve profitability through operational efficiency. Multiple staffing models can be built and tested to increase resource utilization while improving underwriting throughput and sales performance. A machine-learning-based claims insights platform can accurately model and update the frequency and severity of losses over different economic and insurance cycles. Insurers can apply claims insights to product design, distribution and marketing to improve the overall lifetime profitability of customers. To determine repair costs, deep learning techniques can automatically categorize the severity of damage to vehicles involved in accidents. Decision trees, SVMs and Bayesian networks can be used to build claims predictive models on telematics data, and graph or social network analysis can identify patterns of fraud in claims. These predictive models improve effectiveness by identifying the best customers, refining risk assessments and enhancing claims adjustment.
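    The frequency/severity framing mentioned above can be sketched in a few lines. This is a minimal illustration only; the function name and all the figures are invented stand-ins, not drawn from any real claims model:

```python
# Minimal frequency/severity sketch: expected annual claims cost per policy.
# All figures are illustrative placeholders, not real insurance data.

def expected_claims_cost(claim_count, exposure_years, total_loss):
    """Expected annual cost = frequency (claims per policy-year) * severity (avg loss per claim)."""
    frequency = claim_count / exposure_years   # claims per policy-year
    severity = total_loss / claim_count        # average loss per claim
    return frequency * severity                # equals total_loss / exposure_years

# 125 claims over 8,000 policy-years, 500,000 of total losses
print(expected_claims_cost(claim_count=125, exposure_years=8_000, total_loss=500_000.0))  # 62.5
```

    A real claims platform would of course fit frequency and severity as separate statistical models and update them over economic cycles; this sketch only shows how the two quantities combine.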

    All in all, the Chief Data Officer conference was an insightful discussion on the current state of the insurance industry, its evolution in a world of massive data propagation and how firms must evolve with the changing landscape of the industry. Various players from different domains within the insurance vertical discussed key themes like abolishing legacy systems, moving to technologically advanced ecosystems capable of handling data from every sphere and leveraging advanced analytical techniques to derive business value for various functions of the industry.

  • Digital Marketing

    Impact Measurement & Key Performance Indicators

    What is Digital Marketing? What are its key components? How do we know whether our marketing works? This paper talks about the important KPIs that every marketer should measure, why these are important & what they say about your digital marketing performance.

    Digital marketing – Impact Measurement & Key Performance Indicators  

    The importance of advertising online!
    “The Internet is becoming the town square for the global village of tomorrow”
    – Bill Gates

    It’s become a cliché to say we live in a 24x7 networked world, but some clichés are true. We spend more and more of our lives online, using the ‘net’ to book plane tickets, move money across bank accounts and read restaurant reviews, to the extent that we would be severely handicapped if our broadband stopped working tomorrow.

    As audiences spend more and more time online, just as predators follow prey on a migratory route, advertisers have started allocating greater parts of their budgets to digital media. According to research by Zenith Optimedia, while all advertising is likely to increase at a rate of ~5% between 2009 and 2013, online advertising will rise three times as fast.

    The charts below show how the Internet is grabbing a larger share of the advertising pie over time, at the cost of traditional media. If the projected growth rates continue, Internet advertising will overtake print in 2016 and TV in 2025, to become the single largest advertising channel.


    Understanding the online advertising beast

    “Rather than seeing digital marketing as an “add on”, marketers need to view it as a discipline that complements the communication mix and should be used to generate leads, get registrations or drive sales, rather than simply generating awareness.”
    – Charisse Tabak, of Acceleration Media

    Digital advertising is increasing in importance, even for heavy users of traditional media like CPG firms. Among TEG clients, we are seeing up to 1% of revenue being allocated to digital advertising. Consequently, management is asking questions of the digital marketing group about the tangible value being generated. These are still early days, and most companies are still years away from getting true ROI numbers and optimizing digital spend with cross-channel impacts taken into account. Mostly, our clients are deciding on the KPIs that are relevant and meaningful, and on the mechanisms that need to be set up to track and measure them.

    Before moving into the details, it is important to understand the key components of online marketing. Broadly, all online marketing channels fall into 3 overarching categories: Paid, Owned & Earned. The definition first came into the public domain in March 2009, when Dan Goodall of Nokia wrote in his blog about Nokia’s digital marketing strategy.


    At a high level, paid is media you buy – you get total control over messaging, reach and frequency, as much as your budget allows; earned is what others say about your brand – you get no control, but you can influence outcomes if you’re smart; and owned is content you create – you control the messaging, but not so much whether anyone reads or views it.
    The three media types are best suited to achieving different marketing objectives, and each has its own pros and cons. A very succinct summary of the advantages and drawbacks of all three types of media has been produced by Forrester Research’s Sean Corcoran, as illustrated below.



    Evaluating the impact of your digital strategy

    Evaluation of the ‘true impact’ of your digital marketing strategy & spend, is a multi-phase journey. At TEG Analytics, we break the journey into five distinct phases

    1. Data Warehousing & Reporting: Get all your digital data under one roof
    2. Dashboarding: Identify key KPIs and interactions and create meaningful dashboards
    3. Statistical Analysis: Determine the past impact of your marketing inputs on business KPIs like Sales, Profit etc. This utilises Market & Media Mix modelling techniques
    4. Predictive Analysis: Use historical analysis to identify likely future scenarios for your business
    5. Optimization: Use inputs from predictive analysis to run an optimal marketing strategy by maximising ROI subject to budgetary constraints


    To begin with, one needs to decide what impact is desired from each digital channel, e.g. e-mail campaigns should get new customers to sign up on my site. This leads directly to the KPI that needs to be tracked on an ongoing basis, and industry research can provide the relevant benchmarks. The rest of the article focuses on those metrics that TEG Analytics believes need to be captured and tracked to get a holistic and detailed understanding of digital marketing performance. These metrics have been arrived at from numerous digital analytics projects that TEG Analytics has completed for clients across the globe.

    Paid Media

    Display Banner Advertising

    Display banners, or banner ads, are online display ads delivered by an ad server on a web page. They are intended to attract traffic to a website by linking to the site of the advertiser. Viewers can click on these ads to either interact with them on the page itself or get routed to the advertiser’s website.

    Display Banners typically account for a lion’s share (40-55%) of digital advertising budget, based on TEG Analytics’ experience.
    There are 2 types of display banner advertising

    1. Flash/Static: These are simple banner ads with one or two frames. Approximately 85% of all impressions served in FY11 were flash/static ads.
    2. Rich media: These are richer ads that allow people to expand and interact with further content within the banner itself. About 15% of impressions served in FY11 were rich media impressions.

    The intention of display banner advertising is to drive traffic to the advertiser’s own website and to create brand awareness. To determine if the strategy is delivering these goals, TEG Analytics recommends that all advertisers track the following metrics

    • Impressions: This is an exposure metric that counts how many times an ad was shown
    • Clicks: Response to an ad, measured through clicks
    • Click-rate: Clicks as a % of impressions. Click-rate is declining across the industry as consumers tend to prefer getting all the relevant information within the banner itself
    • Rich media interactions: A counter of all interactions that take place within the rich media unit (e.g. expanding, clicking within the multiple parts of the banner etc.)
    • Floodlight Metrics: This is a DoubleClick-specific term used to track actions that visitors take once they arrive on the website. Each campaign and brand may track specific on-site actions, so there could be many floodlight metrics across all brands.

    NOTE: All figures for the proportion of digital ad spend by channel are approximations based on TEG Analytics’ digital analytics project experience and on transaction data analysis performed by TEG Analytics on client data.
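    As a toy illustration of how the display metrics above relate to one another, the helper below derives click-rate and an interactions-per-impression ratio from raw counts. The function name and all campaign numbers are invented for demonstration:

```python
# Derive display-banner KPIs from raw counts (all numbers are made up).

def display_kpis(impressions, clicks, rich_media_interactions):
    """Return the derived display KPIs: click-rate (% of impressions)
    and rich media interactions per impression."""
    return {
        "impressions": impressions,
        "clicks": clicks,
        "click_rate_pct": round(100.0 * clicks / impressions, 2),
        "interactions_per_impression": round(rich_media_interactions / impressions, 4),
    }

kpis = display_kpis(impressions=250_000, clicks=500, rich_media_interactions=1_200)
print(kpis["click_rate_pct"])  # 0.2 (i.e. 0.20% click-rate)
```

    In practice these counts would come from the ad server’s reporting feed; the point here is only how the derived metrics are computed from the raw ones.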

    Paid Search

    Paid Search, also known as Search Engine Marketing, is used by advertisers to show relevant ads on search engines. For instance, a search for “Bleach” on a search engine would throw up sponsored links at the top and on the right-hand side of the page. These are Paid Search advertising links. Paid Search typically accounts for around 10 – 15% of the total digital advertising budget.

    Typically, there are two types of Paid Search or Search Engine Marketing ads:

    • Paid Search Ads: Text links that show up on a search engine
    • Content Targeting or Content network buy: Provided by Google and Yahoo. The ad is shown not on the search engine itself but within a network of partner sites. For example, if a Gmail account contains a lot of conversations around travel to Africa, a relevant travel agency ad would show within the Gmail environment as a text link.

    Paid Search advertising is generally bought on a cost-per-click basis: Google or Bing is paid only if someone clicks on the ad. This means the impressions on Paid Search advertising are technically free. Cost-per-click is decided through a bid engine; many companies bid on the same keyword, and those with the highest bids get the top positions within the search engine.

    Content Targeting is bought on a Cost per thousand Impressions (CPM) basis.

    NOTE: (DoubleClick is a subsidiary of Google that develops and provides Internet ad serving services. Its clients include agencies and marketers (Universal McCann Interactive, AKQA etc.) and publishers who serve customers like Microsoft, General Motors, Coca-Cola, Motorola, L’Oréal, Palm, Inc., Apple Inc., Visa USA, Nike and Carlsberg, among others. DoubleClick’s headquarters is in New York City, United States.)

    Paid Search is primarily intended to drive traffic to owned websites, and e-commerce links to induce purchase. TEG Analytics recommends that the following KPIs should be tracked and measured to evaluate paid search impact.

    • Impressions: This is an exposure metric that counts how many times an ad was seen/shown
    • Clicks: Response to an ad, measured through clicks
    • Click-rate: Clicks as a % of impressions
    • Cost per Click: The total spend on Paid Search divided by the total clicks the advertiser received. It is factored in through the bid management tool that the agency handles
    • Average Position: Specific to search advertising, this indicates where the ad appears within the paid search results of a search engine results page. Industry best practice is to be in the top 3 spots. Anything lower means the ad shows up on the right-hand side of the page, where both visibility of and response to the ad are minimal.
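    The paid search KPIs above can be rolled up along these lines. This is a sketch under stated assumptions: the impression-weighted averaging of position is a common convention but is not specified in the text, and all spend, click and position figures are hypothetical:

```python
# Hypothetical paid-search roll-up: cost per click plus an impression-weighted
# average position, flagging whether the ad sits in the top-3 spots on average.

def paid_search_kpis(spend, clicks, positions_by_impressions):
    """positions_by_impressions: list of (position, impressions) pairs served."""
    cpc = spend / clicks
    total_impr = sum(impr for _, impr in positions_by_impressions)
    avg_pos = sum(pos * impr for pos, impr in positions_by_impressions) / total_impr
    return {"cpc": round(cpc, 2), "avg_position": round(avg_pos, 2), "in_top_3": avg_pos <= 3}

kpis = paid_search_kpis(spend=1_200.0, clicks=800,
                        positions_by_impressions=[(1, 5_000), (4, 5_000)])
print(kpis)  # {'cpc': 1.5, 'avg_position': 2.5, 'in_top_3': True}
```

    In a real setup these inputs would come from the bid management tool’s reports rather than hand-entered tuples.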

    Streaming TV

    Streaming TV is streaming video advertising executed on a digital platform. When advertising videos are shown on websites such as abc.com, google.com and others, it is called Streaming TV. Simplistically speaking, it is an extension of television viewership: as viewership moves to digital channels, advertisers are moving money from television to digital media.

    Streaming TV accounts for around 15 – 20% of the total digital advertising budget.
    Every Streaming TV ad is accompanied by a free companion banner in the same environment. For instance, if a streaming video is shown on www.abc.com, right next to it would be a companion banner, similar to a display banner, for the period the streaming video is playing. The online video ad stream together with the companion banner is termed a Streaming TV execution.

    Streaming TV executions are typically bought on a CPM or Cost per Thousand Impressions basis which implies the payment is based on the exposure that the advertiser gets in the Streaming Video space. In some rare cases, Streaming TV can also be bought on Cost per Video View.

    The desired actions from Streaming TV ads are very similar to those from TV advertising. TEG Analytics recommends that clients track the metrics that most closely approximate TRPs.

    • Video Impressions: This is an exposure metric that counts how many times the video ad was seen/shown
    • Video Clicks: Response to the video ad, measured through clicks
    • Companion Banner Impressions: This is an exposure metric that counts how many times a companion banner ad was seen/shown
    • Companion Banner Clicks: Response to the companion banner ad, measured through clicks
    • Video Midpoint or 50% Completion Metric: The number of people who have viewed at least 50% of the video ad

    Digital Coupons

    Digital coupons are the online counterpart of regular print coupons and are heavily used by CPG and other Consumer Products marketers, as a price discounting medium. Advertisers want to initiate trial & re-purchase by enticing consumers through price discounting. They account for approximately 2-5% of all digital advertising.

    • Digital coupons are of the following types.
      • Print – The coupon is printed on paper. The majority of redemptions come from this type of coupon.
      • Save 2 Card – This has come up especially in the last 2 years. It allows customers to save the coupon onto their loyalty card (such as a Kroger or CVS loyalty card). When customers go into the store and scan the loyalty card, these coupons are redeemed at the point of sale. Redemption volume on this type is quite low.
      • Print and Mail – Coupons are printed and mailed back to the advertiser along with the product purchase bill for redemption. This type of redemption is also quite low in volume.
    • Distribution of coupons happens in the following ways.
      • DFSI Network – This is very similar to FSI. Multiple coupons from different companies and products are available to customers, and all of these can be clicked and downloaded in one go. There is a high volume of coupon prints but a low volume of redemptions on this network
      • Banner Advertising – Display banners carry coupons that can be clicked on and printed from the banner itself
      • E-Mail Program – Coupons are included in some of the regular e-mails sent by the advertiser to its database of loyal customers, who can click on and download them
      • Websites – People can come to the advertiser’s websites and download the coupons available

    Coupons are an extremely direct method of marketing, with the straightforward purpose of redemption by the customer. To ensure the redemption number being tracked is normalised for the size of the campaign itself, TEG Analytics recommends that redemptions be tracked as a proportion of prints. The KPIs that all coupon distributors should track are

    • Prints – Total number of coupons printed or saved to card from the digital environment
    • Redemptions – Number of printed coupons redeemed in store
    • Redemption Rate – Number of redemptions divided by prints. All coupons have an expiration date; however, due to the lag between actual redemptions and retailers reporting the redemption numbers, redemptions will be seen trickling in even after the expiration date
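    The redemption rate definition above translates directly into code. The print and redemption counts here are invented for illustration:

```python
# Redemption rate = redemptions as a % of prints, per the KPI definition above.

def redemption_rate(prints, redemptions):
    """Redemptions as a percentage of prints; guards against a zero-print campaign."""
    if prints == 0:
        return 0.0
    return 100.0 * redemptions / prints

# e.g. 40,000 coupons printed or saved to card, 1,000 redeemed in store
print(redemption_rate(40_000, 1_000))  # 2.5
```

    Because redemptions trickle in after the expiration date, this rate should be recomputed as late retailer reports arrive rather than frozen at campaign close.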

    Owned Media

    Company Websites

    Company websites are one of the most often used media for communicating the company and brand vision to the customer. They are also used as a tool to enable e-commerce for the advertiser’s products. Most digital marketing ultimately induces the viewer to visit the company website, hence it is probably THE most important part of your digital armoury. Websites typically consume 10-15% of the digital marketing budget.

    Google Analytics is typically used to track website performance, and is highly recommended by TEG Analytics, as it has a very solid back end as well as an evolved reporting interface. Google Analytics helps track where visitors come from and the actions they perform once they arrive on the website – the time, date and day of the visit, whether the person is a first-time or a returning visitor, and all content consumed and actions taken all the way through to the point of exit. Google Analytics is a free online tool, and there is no cost associated with implementing and using it. On the flip side, there is no direct customer support and little customization is possible.

    TEG Analytics recommends that the advertiser extract and maintain multiple metrics for tracking website performance.

    • Visits: A visit is a session that expires after 30 minutes of inactivity; after that, further activity counts as a new session and hence a new visit. If a person visits the site 4 times in a day, with more than 30 minutes between visits, it counts as 4 visits.
    • Average time spent: Gives the time spent in seconds per visit to the website. Due to some tracking challenges, the time spent on the last page before exit does not get captured. For this reason, this metric should be used as a relative but not as an absolute metric. This implies that it is a good measure for comparison between websites but by itself can be used only for directional purposes.
    • Average pages per visit: Shows how many pages were consumed per visit. This is an engagement metric for most CPG firms. In other cases such as for e-commerce sites the objective would be to push people down the funnel as quickly as possible.
    • Return Visitors: This metric refers to visits from a browser that has already been exposed to the website. If a visit to www.example.com has already happened from a browser in a certain system, and the same browser is exposed again to this site, it is counted as a return visit. Return visit is true for the time period for which the report is selected. If it is for 1 month, then it is a return visit for that 1 month.
    • Unique Visitors: Unique visitors refer to the unique number of cookies on the browser that was exposed to the site. If the browser was exposed once, it would be 1 unique visitor. The unique visitor is true by default for a period of 2 years, unlike the return visitor which would be true for the period for which the report is being generated such as a month, a quarter or a year.
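    The 30-minute session rule behind the Visits metric above can be sketched as a small sessionization routine. The timestamps here are hypothetical hit times in minutes since midnight for a single browser:

```python
# Sketch of 30-minute sessionization: consecutive hits from the same browser
# more than 30 minutes apart start a new visit. Timestamps are invented.

SESSION_TIMEOUT_MIN = 30

def count_visits(hit_times_min):
    """Count visits given hit timestamps (in minutes) for one browser."""
    visits = 0
    last_hit = None
    for t in sorted(hit_times_min):
        if last_hit is None or t - last_hit > SESSION_TIMEOUT_MIN:
            visits += 1  # the inactivity gap was exceeded: a new session begins
        last_hit = t
    return visits

# Four bursts of activity spread across a day -> four visits
print(count_visits([9*60, 9*60+5, 12*60, 15*60, 15*60+20, 20*60]))  # 4
```

    Analytics tools apply essentially this grouping per cookie before aggregating visits across all browsers.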

    Social Media

    Social Media is a part of owned digital media marketing. TEG Analytics has primarily worked with Facebook data. Overall, clients spend about 6 – 10% of their digital media budget on Facebook and other social media sites.
    As Facebook is an owned vehicle, execution is typically handled by the community managers in the Marketing Communications team within the advertiser itself. Some PR agencies help community managers handle pages as well, but TEG Analytics recommends that the Marketing Department build this up as an internal core strength, as its importance is likely to increase over time. There is no cost associated with building Facebook pages other than the cost of having the community managers on board.
    The cost associated with increasing exposure and driving fans to the Facebook page should be measured as part of paid media and not as part of the owned social media bucket.
    The main aim of Facebook pages is to turn loyal customers into advocates and to keep the advertiser’s brands on top of customers’ minds. Keeping this in mind, the key metrics captured and measured from Facebook should be:

    • Fans to Date – This is a magnitude metric that gives the cumulative lifetime fans of the page. It is derived as previous day’s fans + total likes – total unsubscribes. Unsubscribes are the number of fans who have decided to unlike the page
    • Monthly Interactions per Fan – This metric captures the quality of the fans on the Facebook page. As the objective is to drive engagement, the hope is to have higher interactions per fan for each of the pages. Interactions are calculated as Likes + Comments + Discussion Posts + Wall Posts + Videos
    • Monthly Impressions per Fan – This captures the social reach of Facebook: for every fan of a brand page, what additional reach do they provide for the Facebook content?
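    The two derived formulas above (Fans to Date, and Interactions per Fan) are simple enough to show directly; every number below is invented for the example:

```python
# Facebook page KPIs exactly as defined above, with made-up counts.

def fans_to_date(previous_fans, likes, unsubscribes):
    """Cumulative fans = previous day's fans + total likes - total unsubscribes."""
    return previous_fans + likes - unsubscribes

def interactions_per_fan(likes, comments, discussion_posts, wall_posts, videos, fans):
    """Interactions = Likes + Comments + Discussion Posts + Wall Posts + Videos, per fan."""
    total = likes + comments + discussion_posts + wall_posts + videos
    return total / fans

fans = fans_to_date(previous_fans=10_000, likes=250, unsubscribes=50)
print(fans)                                                      # 10200
print(round(interactions_per_fan(400, 120, 30, 40, 10, fans), 3))  # 0.059
```

    Running fans_to_date day over day, feeding each day’s result back in as previous_fans, reproduces the cumulative lifetime count.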

    Relationship Marketing

    A lot of our clients have been capturing and maintaining a database of loyal customers over time. A partner of choice has been a company called Merkle, whose system talks to another system that allows the advertiser to activate and communicate with the loyal customer database. The most popular vendor of the email communication system is a company called Responsys. Merkle captures the personally identifiable information (PII) of customers, scrubs it for validity and ensures the privacy of each member coming into the loyalty database. Responsys sends out emails to the customers in the Merkle database of loyal customers. The execution is primarily around sending out emails using the email ids in the database.

    Typically, we have seen that clients spend about 3-5% of their total budget on relationship marketing.

    The key metrics captured from this data source are primarily around the effectiveness of the email program.

    • Sent – The total number of email addresses to which the email was sent.
    • Delivered – The total number of email addresses to which the email was delivered. The difference between Sent and Delivered is called Bounces and is a separate metric.
    • Open Rate – Of the total email addresses the email was sent to, how many actually opened it.
    • Click Rate – A derived metric that tells how many clicks on the email happened as a proportion of the total delivered emails. It is calculated as Clicks/Delivered
    • Effective Rate – A derived metric that tells, of the total number of email addresses that opened the email, how many clicked on it. It is calculated as Clicks/Opens
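    The derived email KPIs above can be computed from the four raw counts as follows; the counts themselves are illustrative, not client data:

```python
# Email program KPIs per the definitions above (illustrative counts only).

def email_kpis(sent, delivered, opens, clicks):
    return {
        "bounces": sent - delivered,                             # Sent - Delivered
        "open_rate_pct": round(100.0 * opens / sent, 2),         # opened, out of sent
        "click_rate_pct": round(100.0 * clicks / delivered, 2),  # Clicks / Delivered
        "effective_rate_pct": round(100.0 * clicks / opens, 2),  # Clicks / Opens
    }

print(email_kpis(sent=50_000, delivered=48_000, opens=9_600, clicks=1_200))
# {'bounces': 2000, 'open_rate_pct': 19.2, 'click_rate_pct': 2.5, 'effective_rate_pct': 12.5}
```

    Note that the Open Rate here divides by sent, following the definition in the list above; many email platforms instead report opens over delivered, so the denominator should be checked against the vendor’s convention.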

    Earned Media

    Buzz Marketing

    Buzz monitoring, or social media monitoring, is a fairly new discipline at most of our clients. Most companies use an outside tool/service like Sysomos, Radian6, peerFluence, Scout Labs, Artiklz etc. These typically provide information on brand- and competitor-related chatter in the social media space. Buzz technology can capture this chatter across blogs, news feeds, Twitter and video scraping, and build holistic tracking systems.

    The tools need to be trained to capture relevant “keywords”. Companies monitor buzz and chatter on both their own and competitors’ brands, as it gives them a fairly good idea of their comparative positioning in the social media universe.

    There are also tools, like Map from Sysomos, that the brand team can use to understand broader consumer trends, develop communication and source new product ideas and innovations.
    The key information from buzz that is critical for your brand is the extent of chatter about your brands, and the extent to which that chatter is ‘positive’. The KPIs that TEG Analytics recommends tracking for this channel are

    • Mentions – This is a magnitude metric that captures the level of chatter about a particular tag defined by the community manager. For example, if information is being captured for the tag “iPad”, this metric tells how many mentions of that chatter the buzz tracking tool found in the overall social universe.
    • %Positive/Neutral – This gets at the sentiment behind the chatter. Based on an algorithm developed by the buzz tracking tool, every single post in the social environment is categorized as positive, negative or neutral. These algorithms are based on Natural Language Processing and are learning programs. This is a derived metric, calculated as (total number of positive + neutral mentions)/(total number of mentions).
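    The %Positive/Neutral derived metric above reduces to a simple ratio once each mention has a sentiment label. In practice the labels come from the buzz tool’s NLP classifier; the hand-made labels below are stand-ins:

```python
# %Positive/Neutral = (positive + neutral mentions) / total mentions * 100.
# Sentiment labels would come from the buzz tool's classifier; these are fake.
from collections import Counter

def pct_positive_neutral(labels):
    counts = Counter(labels)
    favourable = counts["positive"] + counts["neutral"]
    return 100.0 * favourable / len(labels)

mentions = ["positive", "neutral", "negative", "positive",
            "neutral", "negative", "positive", "neutral"]
print(pct_positive_neutral(mentions))  # 6 of 8 favourable -> 75.0
```

    Since the underlying classifiers are learning programs, this percentage is only as reliable as the labels feeding it, and spot-checking a sample of categorized posts is advisable.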

    Life after metric tracking

    Tracking the relevant metrics is the first and most important step in creating a completely data-driven decision system. It is the basic building block, and your analytics suite cannot proceed without it. However, to make the best of your data and answer the “why” and “optimization” questions, we need to go further.

    During the creation of dashboards & reports, a lot of data is collected, curated and harmonised and goes through a lot of ETL activities. This data can be used for a variety of advanced analytics, which will help the company truly determine the ultimate impact of digital marketing on sales or brand equity. TEG analytics uses proprietary methodologies, to calculate “True Impact” of digital marketing by marrying digital data with traditional marketing data like TV, Print etc. and calculating Cross Channel Impact on Sales & Revenue. We have developed models using Bayesian hierarchical techniques, which eliminate all the noise and narrow down on the true impact of marketing.

    TEG Analytics also has a product for Digital Analytics called DigitalWorksTM that provides clients end to end digital analytics services. It has modules that address all the phases in analytics, as shown below.


    End Note
    To conclude, the world of digital marketing is new and exciting, and a lot of the spend on digital marketing is currently being done to ‘keep up with the Joneses’; the discipline present in creating traditional media marketing plans is largely absent. However, this need not be the case, as most of the data, as well as the tools to extract actionable insights from it, are available with consultancy firms like TEG Analytics. Once the power of this data is harnessed, companies will see a vast improvement in the efficacy and ROI of their digital marketing programs.
