• Medicare Plan Finder 2019

    Medicare Plan Finder: Scenario evaluation of plan placement and
    annual costs

    Executive Summary

    The Medicare Plan Finder (MPF) from the Centers for Medicare & Medicaid Services (CMS) is used extensively by beneficiaries to compare plans based on their individual profiles. A personalized search lets beneficiaries enter the drugs they take to get more accurate cost estimates and coverage information…



  • Medicare Advantage 2019

    Landscape Report: First Glimpse

    * This report is based on data released by CMS on 28 September 2018.

    Executive Summary

    More than 20 million Medicare beneficiaries (33%) are enrolled in Medicare Advantage plans in 2018, which are offered by private payors as an alternative to Original Medicare. According to the Centers for Medicare & Medicaid Services (CMS), enrollment in Medicare Advantage (MA) is projected to reach an all-time high in 2019 with 22.6 million beneficiaries. This represents a projected increase of 2.4 million (11.5%) from 20.2 million in 2018. Based on this projection, 36.7% of Medicare beneficiaries will be enrolled in Medicare Advantage in 2019.



    The Landscape files reveal the MA plans, along with stand-alone Prescription Drug Plans (PDPs), that become available to nearly 60 million beneficiaries looking for plans that best fit their needs. Medicare payor organizations struggle to analyze their competitive position in their markets based on the massive amount of Plan and Benefits data released by CMS in the first week of October. This first glimpse at the Medicare Advantage plans available for 2019 analyzes the Landscape files to review the plans offered during the Annual Enrollment Period (AEP). It describes MA plan choices and availability, and how these compare to last year. Findings include:

    Premiums Spread:

    CMS officially announced a 6% decrease in average Medicare Advantage premiums for 2019, from $29.81 in 2018 to $28.00, to improve health plan affordability for beneficiaries. 83% of MA enrollees are expected to have either the same or a lower premium in 2019, and CMS estimates that 46% of MA beneficiaries staying in their current plan will have a $0 premium.

    Market Stability:

    Market stability across years is an important component in estimating plan performance in a market, since it captures the effects of changes in market dynamics. If costs and deductibles have changed significantly across years, prediction uncertainty may be higher. The TEG StabilityIndex™ measures whether markets have changed significantly across six parameters: (a) the number of new plans launched in the county, (b) the age mix of plans, (c) changes in plan types (HMO/PPO/SNP etc.), (d) changes in premium, (e) changes in drug costs, and (f) changes in the number of parent organizations.

    Texas and Georgia rank highest on the instability index, because of greater changes in the competitive landscape, while Idaho and Indiana are relatively stable markets.
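    As an illustration only (the actual StabilityIndex™ methodology is proprietary and not described here), a composite instability score could combine normalized year-over-year changes across parameters like these. All parameter names and values below are hypothetical:

```python
def instability_score(prev, curr, weights=None):
    """Score in [0, 1]: 0 = unchanged market, 1 = maximal change.

    `prev` and `curr` map parameter names (e.g. 'n_plans', 'avg_premium',
    'n_parent_orgs') to numeric values for two consecutive years.
    """
    params = sorted(set(prev) & set(curr))
    weights = weights or {p: 1.0 for p in params}
    total = sum(weights[p] for p in params)
    score = 0.0
    for p in params:
        base = max(abs(prev[p]), 1e-9)
        rel_change = min(abs(curr[p] - prev[p]) / base, 1.0)  # cap at 100%
        score += weights[p] / total * rel_change
    return score

# Hypothetical county-level parameters for two consecutive years
county_2018 = {"n_plans": 12, "avg_premium": 29.81, "n_parent_orgs": 5}
county_2019 = {"n_plans": 18, "avg_premium": 24.00, "n_parent_orgs": 8}
print(round(instability_score(county_2018, county_2019), 3))
```

    An unchanged market scores 0, so states whose counties score high on average would rank as unstable, in the spirit of the Texas/Georgia versus Idaho/Indiana comparison above.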


    Plans Availability:

    A total of 2,981 MA-PD* plans are offered this year, up from 2,565 last year – a 16% increase. The growth is predominantly among HMOs and local PPOs, with local PPOs growing faster (26%) than local HMOs (9%). Growth in local PPOs is markedly higher among the big players (53%) than among the others. HMOs continue to be the majority of plans available, accounting for about two-thirds (68%) of all plans offered. In addition, 706 Special Needs Plans (SNPs) are offered, up from 640 last year, and over 900 stand-alone Prescription Drug Plans are also available for beneficiaries to choose from.


    Plan Accessibility

    The average Medicare beneficiary will have access to 23 MA plans in 2019, compared to 20 last year. Between 2013 and 2017, the average number of plans available to Medicare beneficiaries was relatively stable. At the county level, the average number of plans available has increased from 12 to 13. However, the number of plans available varies widely across the country, and beneficiaries have more plans to choose from in metropolitan than in non-metropolitan areas.


    Variation in number of Plans

    On average, 14 Medicare Advantage plans are available per county in 2019, up from 12 plans in 2018, but this varies greatly across the country.

    • In 13% of counties, beneficiaries can choose from 21 to 30 plans, and in 6% of counties from more than 31 plans. Most of these counties are in key Medicare states with large eligible populations, such as New York, California, Florida, Ohio, Michigan and Minnesota. Some counties, like Medina and Mahoning (Ohio) and Bucks and Dauphin (Pennsylvania), have up to 50 plan options.
    • 11 counties in Florida and 32 counties in New York average 37 plan options for beneficiaries. In contrast, in 43% of counties (1,352), beneficiaries choose from fewer than 10 plans – though this share has decreased from 55% in 2018.
    • Compared to last year, 27% more counties offer 11 or more plans, implying an increase in plan options for beneficiaries. However, there are still 72 counties, accounting for 0.4% of Medicare beneficiaries, that offer one plan option or none.


    Market competition

    The 2019 Medicare Advantage market comprises national health plans, Blue Cross Blue Shield organizations, prominent regional health plans and specialized Medicare companies. Based on the 2019 CMS Landscape reports, Humana competes with more MA plans nationwide than any other organization. UnitedHealth continued to increase its MA plan offerings for the 2019 calendar year, with 377 distinct plans identified, up 33 plans from last year. The Blue Cross Blue Shield group, with its 36 subsidiaries, ranks third in plan density, followed by Aetna (including Coventry and other affiliates). Geisinger has seen the largest growth in number of plans from last year (75%), followed by WellCare (61%) and Aetna (55%).


    Plan availability across states:


    About TEG Analytics HealthWorks

    TEG’s HealthWorks™ is a comprehensive solution that mimics consumer choice in Medicare Advantage. By comparing plan features – including benefits, MOOP, drug deductibles, star ratings and other attributes – across Medicare Advantage plans at the county level, HealthWorks™ helps firms identify the top attributes that lead to plan competitiveness, predict enrollments based on current plan and competitor plan features, design better products by simulating plan attribute levels, and obtain all CMS (Centers for Medicare and Medicaid Services) information in a single easy-to-use and intuitive dashboard.

  • Medicare: Demystifying Consumer Preference

    Science behind Medicare Modelling



    What is the data and what can it do?

    Ample public data, available through CMS and Medicare, is used in a machine-learning-based tool that mimics consumer choice in Medicare Advantage. By comparing plan features – including benefits, MOOP, drug deductibles, star ratings and other attributes – across Medicare Advantage plans at the county level, firms can identify the top attributes that determine plan competitiveness, predict enrollments, create marketing strategies and design better products.

    How to leverage data and associated challenges?

    Models can be built to be flexible yet robust, and advanced ensemble techniques and bagging algorithms are used to predict Medicare Advantage enrollments for every single plan in each county in the country. Data from various sources, spread across various files, has to be harmonized, merged and maintained to build the database. The models need to ingest a large volume of data – nearly 4,000 attributes for each plan – and an effective way must be found to enable people to use it. And all this needs to be done with a high degree of accuracy and, given the short duration of the AEP, within a very limited amount of time.
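    The bagging idea mentioned above can be sketched in miniature. This is purely illustrative – the actual HealthWorks models, features and base learners are not described here. The sketch fits simple regression stumps relating one hypothetical plan feature (premium) to enrollment, each on a bootstrap resample, and averages their predictions:

```python
import random

def fit_stump(data):
    """Fit a one-split regression stump on (x, y) pairs: find the x-threshold
    minimizing squared error, predicting the mean of y on each side."""
    xs = sorted({x for x, _ in data})
    ys = [y for _, y in data]
    overall_mean = sum(ys) / len(ys)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:                       # degenerate sample: constant x
        return lambda x: overall_mean
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def bagged_predict(data, x, n_models=25, seed=7):
    """Average predictions of stumps fit on bootstrap resamples (bagging)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        preds.append(fit_stump(sample)(x))
    return sum(preds) / len(preds)

# Hypothetical (premium, enrollment) pairs for plans in one county
plans = [(0, 900), (10, 700), (20, 650), (30, 400), (40, 380), (50, 200)]
print(bagged_predict(plans, 5), bagged_predict(plans, 45))
```

    Averaging over resamples smooths out the instability of any single stump, which is the property that makes bagging attractive for noisy plan-level data.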

    Is there a reliable and efficient way to do this?

    TEG Analytics has created a holistic solution for this problem: HealthWorks, a platform where all CMS information is available in a single easy-to-use and intuitive dashboard. The models have been honed over the years to give over 99% accuracy in enrollment predictions for plans at county-level granularity within 72 hours of the release of CMS data. The findings can further be used to generate insights about the factors affecting the performance of each plan.

    To achieve this, various data sources are combined: demographic information, eligible population, market penetration and growth over time, and income levels of the Medicare-eligible; plan-level features including costs, MOOP, benefits, deductibles and drug information; county-level competitive features such as the number of new entrants and new plans rolled out; and changes in market conditions due to increased costs, MOOP, momentum, etc. Robustness of the models is ensured through hold-out validations within and across sequential years, and our metrics minimise prediction errors at three different levels – within a county, within organizations, and across large, medium and small plans.

  • FB Workplace buzzes at TEG

    Why FB Workplace – An Ideal Enterprise Networking Platform for Startups?

    Monica woke up on a Monday morning in her cosy double bed. Already running behind her usual schedule, she hurriedly got ready, ate her breakfast and rushed to her workplace. On her way, she realized that she had to update the company's employees on the newest trend in Artificial Intelligence and the categorical shift the analytics industry will be taking in automating processes in the future. Drafting a mail and sending it across to all the employees at various locations, including those at client locations, would have been a cumbersome task requiring the intervention of the centralized mail desk… blah… blah… blah… But what occurred to her was that her company, TEG Analytics, was a Workplace user. She opened the Workplace app on her phone and put up a new post with the link to her article and BINGO!! Within 15 minutes there were more than 25 views.

    4PM and she was sorted for the day. TEG’s annual Cricket Tournament was supposed to start in the next 10 minutes.

    “Arvind wanted to watch the tournament”, she jumped off her seat as she got ready to post a live video in TEG workplace the moment the match started.

    “Damn! Girish dropped yet another catch?” gasped a retired-hurt Satya, captain of Team Andhra Chargers, as he watched the video from his workstation at the other end of the office.

    Welcome to the new age of Enterprise Networking, better known as “Workplace”. After 20 months in a closed beta under the working title Facebook at Work, Facebook has finally brought its enterprise-focused messaging and networking service to market under a new name, Workplace – a platform which connects everyone at your company to turn ideas into action.


    Workplace – which is launching as a desktop and mobile app with News Feed, Groups both within your own company and with others, Chat direct messaging, Live video, Reactions, translation features, and video and audio calling – is now opening up for anyone to use, and the operative word here is “anyone”. This means that Workplace won't cater only to the desk-dwelling “researchers” of the company, brainstorming every day for industry insights in their air-conditioned cabins, but also to the more “naive” machine handlers, people whose work involves travel, and everyone who has rarely been included in an organization's greater digital collaborations. TEG has been very active in using all the features extensively – be it reading the profile of a recently joined employee, planning marketing strategy in a closed Marketing group, or creating events with calendar (date/time) details.

    What's more, Workplace wants to build itself “the Facebook way”, with a unique twist. As explained by Julien Codorniou, director of Workplace, in an interview in London, “we had to build this totally separate from Facebook, and we had to test and get all the possible certifications to be a SaaS vendor”. Workplace has been tested in every milieu, ranging from the most dynamic MNCs to rather conservative government agencies. In such a scenario, it provides the perfect enterprise social networking platform for the Indian start-up market. What's better is that Workplace is an ad-free space, separate from your personal Facebook account – hence nothing to distract you. Also, with Workplace designed on the same model as Facebook, people without much exposure to enterprise networking find it interactive and easy to handle.

    Facebook has signed up around 800 clients in India including Bharti Airtel and Jet Airways for its workplace version, making the country one of the top 5 in the world for the enterprise communication app. It counts Godrej, Pidilite, MakeMyTrip, StoreKing and Jugnoo as some of its top clients in India.

    “We see it as a different way of running the company by giving everyone a voice, even people who have never had email or a desktop before,” said Julien Codorniou, VP-Workplace by Facebook, which competes with Google, Microsoft and Slack in the office-communication segment. “Every company where you see desk-less workers, mobile-only workers is perfect for us. That is why I think there is a strong appetite for Workplace in India compared to other regions. It is a huge market,” said Codorniou. “Mobile first is a global strategy, but it resonates well with Indian companies.” Facebook says that the Indian workforce below the age of 25 prefers using mobile applications to communicate rather than emails. Here's where Workplace wins.

    Another innovative aspect of Workplace is its pricing model compared to competitors like Yammer and Slack. Unlike its competitors, which charge different rates for low-end basic features and high-end features, Workplace provides all features to its users at a single rate. It charges monthly depending on the number of active users a company has in that month, an active user being one who has opened and used their Workplace account at least once that month.

    Most enterprise social platforms fail to achieve broad traction because they don't offer ready answers to the “how” and “how much” questions. If Facebook's announced integrations with Box, Microsoft, Dropbox and even Quip/Salesforce come true, Workplace will be the all-you-need enterprise networking platform. At the end of the day, if you don't integrate with the tools your customers use, you're going to lose the customer – and that's not a very positive payoff.

    Certainly, with a brand like Facebook – which has over the years captured people's imagination with its innovative approaches – endorsing Workplace, this is an interesting concept. It remains to be seen how it fares on a completely different kind of platform, the enterprise social network, and the way TEG is using it will help reveal both the drawbacks and the potential.

  • Madalasa Venkataraman (Madhu)


    1) How did you get interested in working with data?

    I think it's a personality defect. I am sure my parents despaired of getting me to listen to anything without a sound, logically constructed argument.
    I was never one to work on gut feel and was more of a 'rationalist' in my college days – I would never accept anything anyone said without proof, or at least without a debate backed by numbers.

    Somewhere along the way, I got into analyzing data just for the heck of it. Cost/benefit analysis, the heuristic optimizations that we do on an everyday basis – these fascinated me. And then I discovered microeconomics and finance – there was a whole world out there that discussed rational decision making in terms of utility functions!  Suddenly, when I learnt statistics, things sort of fell into place, the inherent conflicts in my data analysis and methodologies started having a name and a theory behind them. That was a moment of revelation (as much as passing the first stats course was :)
    To me, data represents a move towards a single truth – a unified view that just ‘is’, the layers and stories it reveals and hides is simply fascinating. Everything that happens, that bugs us, that needs solving, the tools are just there to help us solve, if we have the data. Data science is the medley of statistics meets business meets urgent problems that need to be solved, and that calls out to me.

    I didn’t set out to be a data scientist, and I didn’t set out to be a geek (honestly!). But when training meets passion, the possibilities are endless. Add belief to the mix – the relevance of data sciences and its ability to influence policy, business and I think that’s a winning combination.

    2) What are your principal responsibilities as a data scientist?

    I lead the Stats team at TEG Analytics. My role is to build the team and to make sure we build TEG's competence in information storage and retrieval, statistical analysis, visualization and business insights. I get involved in projects; we brainstorm and innovate, and come up with amazing solutions that are state of the art, cutting edge – and relevant to the business context and the business issue/case we are trying to solve.

    3) What innovations have you brought into this role?

    The way I perceive my role is probably a little different to the traditional data scientist role. I am also here to invite our talent into a world of wonderful global innovations in machine learning, in AI, in building the next generation or suite of products and solutions that will solve real world business problems, to inspire them to reach beyond their current projects, to read and to upskill with ravenous hunger. I come from a teaching background. I have been a professor in business studies, and I work together with our teams to build a consulting perspective to our solutions across domains.

    4) Can you share examples of any interesting projects where data science played a crucial role?

    Some recent ones that have been interesting and challenging:
    1. A brand juice sentiment analysis project. This was interesting because of the complexities in the data and in the interpretation of sentiment scores.
    2. A Medicare plan competitiveness analysis based on publicly available CMS data, using which we predicted enrollments in Medicare plans by mimicking customer choice models.

    5) Any words of wisdom for Data Science students or practitioners starting out?

    More often than not, data science is seen entirely as a statistical/analytics effort, or as a business problem where numbers are incidental to the story. Data science is cross-disciplinary in nature – we need the stats acumen and the business insights. Domain knowledge is essential – be willing to invest in it, as long as it takes. Knowing the right program and package is cool; stitching the story together and influencing budgetary allocations is more so.

    6) What Data Science methods have you found most helpful?

    Common sense, but that’s not really a data science method. I can’t call out a specific method – I personally like to use a judicious mix of parametric and model-free techniques, depending on the case. On a more serious note, irrespective of the method, or the machine learning, or the neural network package, there is merit in covering the basics. A data dictionary, good foundations, EDA and good design of experiments are mandatory. The rest is really going to change based on the task at hand.

    7) What are your favorite tools / applications to work with?

    I have used a variety of tools. I like Stata quite a lot. I am often asked if R is a better bet than SAS. SAS is a very powerful, accurate tool – its advantage is, if the program runs, the results are pretty much what you are looking for. In R, due to the multitude of packages, it’s easy for beginners to get confused, and the results are more dependent on the programmer’s skill levels.

    8) With data science permeating nearly every industry, what are you most excited to see in the future?

    IOT and AI are converging in a big way. There is tremendous potential, it’s an exciting field. Geo-spatial data is already big, it will get bigger with drone technology and geo-spatial visualization is a great field to look forward to.
    In the sales and marketing analytics field AI/NN models for relevant 1:1 personalization, multi-touch attribution in media efficiencies, hidden Markov models/LSTM for sequence learning in text analysis – these are some of the things to look out for.

    9) What lessons have you learned during your career that you would share with aspiring data scientists entering the field?

    Three things I believe are important. First: business trumps statistics, and that's the natural order of this world. Second: the solution should be as complex as necessary, and no more – it's important to embrace Occam's razor. Fast failure is more important than the perfect model.
    Third, and most important: there are principles and theories in statistics, information modelling and databases – and there are tools and techniques. It is imperative to keep oneself updated on the tools, programs and applications, but always to relate them back to the fundamentals, the principles and the theory.

  • Retail Demand Forecasting

    How to develop an Effective Scientific Retail Demand Forecast?
    Purpose of the Forecast
    The ability to forecast demand effectively is critical to the success of a retailer. Demand forecasting is especially important in the retail industry because it leads to lower inventory costs, faster cash turnover cycles and quicker response to trends. Retailers require forecasts to steer the organisation through a minefield of capacity constraints, multiple sales geographies and a multi-tier distribution channel. A robust demand forecast engine will significantly and positively impact both the top and bottom lines.

    Demand forecasting helps answer key questions: which market will place demand for which specific type of product, which manufacturing unit should cater to which retailer, how many product units are required in a given season, and so on. Given the sophisticated tools and techniques available today, all retailers should replace gut-based decision making with scientific forecasts. The benefits throughout the lifecycle of the analysis will far outweigh the one-time setup and ongoing maintenance costs. There is a lot of value in answering these questions through scientific methodologies as compared to educated guesses or judgmental forecasts.

    Business Benefits
    Scientific forecasting generates demand forecasts that are more realistic, accurate and tailored to the specific retail business area. It facilitates optimal decision-making at the headquarters, regional and local levels, leading to lower costs, higher revenues, and better customer service and loyalty.

    Range of Business Users
    Traditionally, only the sales department has used forecasts, but in evolved markets the usage of forecasts is now pan-organizational: sales revenue forecasting, marketing and promotion planning, operations planning, inventory management and more all make extensive use of them. Indian retail needs to imbibe this discipline as its scale of operations grows larger and the entrepreneurial style of functioning, which was the key to success in the start-up phase, no longer copes.


    Typical Challenges Faced!
    Though demand forecasting is an important aspect of a retail business, more often than not, it is laced with multiple challenges. Some of them could be:

    Level/Scope of the Forecasts
    A large retailer may have thousands of SKUs. A conscious decision has to be made about the product-hierarchy level at which forecasts are needed, as it is very challenging to produce forecasts for all existing SKUs, nor does it make sound financial sense in most cases. Another concern is the number of stores a typical large retailer possesses, and whether a separate forecast is needed for each store.

    In order to optimise the cost-benefit, TEG recommends creation of forecasts at the “Store-Cluster” & “SKU-Cluster” levels. The store clusters are created using store characteristics, like past demand patterns and local/ regional demand factors. The SKU clusters are determined by the category type, life cycle etc.
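    As an illustration of the clustering step (the store features and data below are hypothetical, and this is not TEG's actual clustering methodology), a minimal k-means can group stores by their demand characteristics:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means: cluster equal-length numeric feature vectors."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # recompute centers as cluster means (keep old center if empty)
        new = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
               for i, cl in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters

# Hypothetical stores described by (avg weekly demand, seasonality strength)
stores = [(100, 0.10), (110, 0.15), (105, 0.12),
          (500, 0.80), (520, 0.75), (480, 0.85)]
centers, clusters = kmeans(stores, k=2)
print([len(c) for c in clusters])
```

    With well-separated demand profiles like these, the algorithm recovers the low-volume and high-volume store groups; real store clustering would use many more features, as described above.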

    New Product Forecasts
    A retailer typically launches new products every month or season. Using past data to forecast is not feasible, as past data does not exist. TEG tackles this situation by considering complementary products, based on key characteristics like target segment, product category, price level and features. A rapidly emerging methodology is the estimation of future demand using advanced Bayesian models (Fig. 3).
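    A simple version of the analogous-product idea (much simpler than the Bayesian models referred to above; the attribute names and data are hypothetical) is a similarity-weighted average of launch demand across comparable SKUs:

```python
def analog_forecast(new_attrs, history):
    """Estimate launch demand for a new SKU as a similarity-weighted average
    of the launch demand of previously launched, comparable SKUs."""
    def similarity(a, b):
        keys = (set(a) & set(b)) - {"launch_demand"}
        return sum(a[k] == b[k] for k in keys) / len(keys)

    pairs = [(similarity(new_attrs, h), h["launch_demand"]) for h in history]
    total = sum(w for w, _ in pairs)
    if total == 0:                    # nothing comparable: fall back to mean
        return sum(d for _, d in pairs) / len(pairs)
    return sum(w * d for w, d in pairs) / total

# Hypothetical launch history for a sports goods retailer
history = [
    {"segment": "pro", "category": "bat", "price_band": "high", "launch_demand": 120},
    {"segment": "amateur", "category": "bat", "price_band": "mid", "launch_demand": 300},
    {"segment": "amateur", "category": "ball", "price_band": "mid", "launch_demand": 500},
]
new_sku = {"segment": "amateur", "category": "bat", "price_band": "mid"}
print(round(analog_forecast(new_sku, history), 1))
```

    The closest analog (same segment, category and price band) dominates the estimate, while less similar products still contribute proportionally.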

    Bizarre/Missing Historic Sales Pattern
    The erratic sales figures for many items in the store often pose a lot of issues for scientific methods of forecasting. In these situations, we need to resort to extensive statistical data cleaning exercises.

    Non-availability of True Historic Demand
    Historic sales are used to estimate future demand, as they are the only reliable quantitative indicator of customer demand available. However, sales data can end up biased because of stockouts (inventory rupture) or temporary promotional activities. These situations require corrections to the sales history so that it reflects true demand. Since demand bias is very business-specific, such corrections usually require in-depth domain expertise to interpolate/extrapolate the sales figures.
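    One crude, purely illustrative correction (real corrections, as noted above, need domain expertise) is to treat sales in known stockout weeks as missing and linearly interpolate between the nearest unaffected weeks:

```python
def correct_for_stockouts(sales, stockout_weeks):
    """Replace sales in known stockout weeks with a linear interpolation
    between the nearest surrounding non-stockout weeks, as a crude proxy
    for the unobserved true demand."""
    corrected = list(sales)
    n = len(sales)
    for i in sorted(stockout_weeks):
        lo = i - 1
        while lo in stockout_weeks:      # skip back past other stockout weeks
            lo -= 1
        hi = i + 1
        while hi in stockout_weeks:      # skip forward likewise
            hi += 1
        if lo < 0 and hi >= n:
            continue                     # no clean neighbours at all
        if lo < 0:
            corrected[i] = corrected[hi]
        elif hi >= n:
            corrected[i] = corrected[lo]
        else:
            frac = (i - lo) / (hi - lo)
            corrected[i] = corrected[lo] + frac * (corrected[hi] - corrected[lo])
    return corrected

# Weeks 2-3 had a stockout, so recorded sales collapse to near zero
weekly_sales = [100, 110, 5, 0, 120, 115]
print(correct_for_stockouts(weekly_sales, {2, 3}))
```

    The interpolated values stand in for demand the stockout made invisible; promotional spikes would need the opposite adjustment, damping sales back toward baseline.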

    Forecasting Techniques
    Demand forecasting techniques are broadly divided into two categories: Judgmental and Statistical.


    The Scientific (Statistical) Forecast Models
    Scientific models are divided into two categories: extrapolation models and causal models (Fig. 2). Extrapolation models are based exclusively on past/historic sales data: the trend, seasonality and cyclicity in the historic sales are examined to project future sales. Intuitively, however, future sales depend not only on past sales but also on other factors, viz. economic trends, competitors' movements, festive events, promotional activities etc. To incorporate such external factors in forecasting, a variety of causal models are available. In the absence of data on such external factors, extrapolation models provide decent forecasts in most situations.
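    A toy extrapolation model along these lines (illustrative only; production systems would use established methods such as Holt-Winters or ARIMA) fits a least-squares linear trend plus additive seasonal indices, then projects both forward:

```python
def extrapolate(sales, season_len, horizon):
    """Fit a least-squares linear trend plus additive seasonal indices to a
    sales history, then project `horizon` periods ahead."""
    n = len(sales)
    # least-squares trend y = a + b*t
    t_mean = (n - 1) / 2
    y_mean = sum(sales) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(sales))
         / sum((t - t_mean) ** 2 for t in range(n)))
    a = y_mean - b * t_mean
    # average detrended residual per position within the season
    seasonal = [0.0] * season_len
    counts = [0] * season_len
    for t, y in enumerate(sales):
        seasonal[t % season_len] += y - (a + b * t)
        counts[t % season_len] += 1
    seasonal = [s / c for s, c in zip(seasonal, counts)]
    return [a + b * t + seasonal[t % season_len]
            for t in range(n, n + horizon)]

# Six periods of history with an upward trend and a high/low seasonal swing
history = [15, 7, 19, 11, 23, 15]
print([round(f, 2) for f in extrapolate(history, season_len=2, horizon=2)])
```

    The projection continues both the upward trend and the alternating seasonal pattern; a causal model would add regressors for the external factors listed above.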

    Key Comparisons of Various Scientific Models


    Implementing Forecasts

    There are two aspects to implementing forecasts: technical and functional. The challenges differ: while the technical challenges are easy to solve given the profusion of tools available in the market today, the functional challenges involve significant business-process re-engineering, and this is where organizations most often fail to capture the impact of forecasts.

    Technological implementation can be done via modelling tools like SAS or EViews, or via forecasting simulators like TEG's proprietary FutureWorks™ tool. Given the forecasting model equation, the tools need only the forecasting inputs to generate the forecasts. In the case of pure time-series models, the inputs are simply past values of the forecasted metric, while in the case of causal forecasting models, forecasted values of the input variables are needed as well, which requires multiple models to be created.

    Organisationally, the forecasts need to become essential inputs to key decisions on supply chain, future media spend, inventory reallocation and the like. It should be in the organisation's DNA that none of these decisions is taken without a study of how it would impact future demand. Traditionally, this has been the hardest part of implementation, as organizations used to operating in a quick, informal, entrepreneurial culture often fail to see the benefit of the extra discipline and rigour.

    TEG Scientific Forecasting Process
    TEG follows the CRISP-DM process for all modelling processes, including forecasting.


    A TEG Case Study
    A leading Indian sports goods retailer wanted to develop a scientific forecasting system to foresee future sales across various product-hierarchy levels, irrespective of supply-side constraints, to facilitate various short- and medium-to-long-term business plans. Additionally, the system was to provide an early warning of potential slack across chains/stores to enable course correction for full resource utilisation.

    Methodology & Results
    After setting up the forecasting objective and scope, a list of potential factors (Fig. 5) was considered to build the forecast model across the various channel and SKU-cluster combinations. A rigorous data-treatment phase followed, and various families of statistical models (specified in Fig. 3) were tested for each channel and SKU-cluster combination. A single model was finalized, which produced satisfactory accuracy. Fig. 6 depicts one such model, used to produce forecasts 12 weeks into the future. As is evident, the model does a good job of anticipating demand around certain types of interventions, like ICC events and seasonal promotions, where demand is expected to shoot upwards.



    Key Take-Aways

    The deployment of the Scientific Models to their forecasting process helped the Retailer in the following ways:

    1. Improved Forecasts – The forecasts were improved in the range of ~2-15% across different store-cluster & SKU-cluster combinations.
    2. Better Stock Management – The key achievement was to accurately pinpoint the slack periods for some of the SKU-clusters which were eating up the rack space in those time periods earlier. The retailer was also able to identify the unfulfilled demand for some of the SKU-clusters which was not getting captured with the traditional judgement forecasting approach. Identification of these gaps helped the retailer to better manage the stocks across different store-clusters by relocating them from low demand stores to high demand stores.
    3. Early Warning of Lull Periods – The knowledge of low sales regime well in advance (12-24 weeks) helped the retailer to frame the promotion calendar so that the sales could be hiked up to meet the targets.
  • TEG at Cypher 2017

    Madalasa Venkataraman, Chief Data Scientist at TEG Analytics, is a researcher at heart and a contrarian by nature. Madhu brings a unique perspective, leveraging a vast knowledge of statistical concepts and analytical techniques to solve complex business problems. A Fellow of IIM Bangalore, she has worked in both academia and the corporate sector. Madhu drives the culture of innovation at TEG and encourages the team to challenge the status quo. She is often seen huddling with project teams to develop solutions through brainstorming and whiteboarding.

    Madhu has 18+ years of experience across marketing, finance, insurance and urban governance. Her areas of interest and involvement at TEG include marketing mix models, semantic/text analytics, recommender systems, forecasting, fraud analytics and pricing analytics. Madhu is also an avid columnist; her publications frequently appear in academic and policy journals.

    Given that TEG has immense expertise in Sales and Marketing Analytics, Madhu is speaking at Cypher 2017 on how Big Data and AI in Sales and Marketing Analytics are driving micro-segmentation and personalized campaigns.

    As companies learn to process the flood of data from all sides, traditional models of marketing are slowly giving way to smarter, niche strategies. Firms are using big data analytics to uncover highly profitable niche segments, changing their channel management strategies and sales plays. While big data helps identify these micro-segmentation layers, AI is used to personalize campaign efforts to reach the targeted audience.

    We look forward to seeing you at the conference on 21st September, from 11:20 to 12:15 pm. Feel free to reach out to us for more details.

    Who should attend: “geeks” who are interested in AI and Big Data, “suits” who are trying to understand their implications for sales and marketing, and “genies”, as we like to call them (a combination of both profiles), will reap the most benefit from this session.

  • Offshore Analytics COE

    Offshore Analytics COE – cracking the code

    What is an ACOE?

    Increasingly, companies rely on their information systems to provide critical data on their markets, customers and business performance in order to understand what has happened, what is happening – and to predict what might happen. They are often challenged, however, by the lack of common analytics knowledge, standards and methods across the organization. To solve this problem, some leading organizations are extending the concept of Centers of Expertise (COE) to enterprise analytics.

    With these COEs, organizations have realized benefits such as reduced costs, enhanced performance, timelier service delivery, and streamlined processes and policies. An Analytics COE (ACOE) brings together a community of highly skilled analysts and supporting functions to engage in complex problem solving vis-à-vis the analytics challenges facing the organization. The analytics COE fosters enterprise-wide knowledge sharing and supports C-level decision making with consistent, detailed and multifaceted analysis functionality.

    The eternal debate – in-house versus outsource

    On scanning the market, it is evident that the in-house and outsourced models are about equally prevalent, at least among India-based ACOEs. Most financial institutions, such as Citicorp, HSBC, and Barclays, have chosen to go in-house, primarily due to data sensitivity concerns. Firms in industries where data security concerns are not as high, such as CPG and Pharma, typically choose third-party specialized analytics shops to set up an ACOE for them. While deciding between in-house and outsourced models, some points to keep in mind are:

    1. External consultants can be utilized for the heavy lifting, i.e. data cleansing & harmonisation / modeling / reporting work. Internal resources, with their better understanding of the competitive scenario, internal business realities and management goals, can concentrate on using the insights generated from the analysis / reporting to formulate winning strategies and tactics
    2. External consultants provide the flexibility of ramping up / down at short notice based on fluctuations in demand
    3. Analytics resources span a wide variety of skill sets across data warehousing / BI / modeling / strategy. It is difficult to find people with skills and interests across all these areas, and often a skill set is not needed full time, e.g. a modeler might be needed only 50% of the time. If you hire internally, you have to utilize him / her suboptimally for the remaining 50%. An external team gives you the flexibility to alter the skill mix with demand while keeping the headcount constant, e.g. a modeler can be swapped for a DW/BI resource if the need arises
    4. The possibility of leveraging experience across clients and domains.

    Initiating the engagement

    As with any outsourcing arrangement, setting up an ACOE is a three-step process



    Ongoing governance of the relationship


    At TEG we recommend a 3 tier governance structure as described in the figure above for all ACOE relationships.

    1. The execution-level relationship between analysts on both sides, which takes decisions on day-to-day deliverables
    2. The Project Manager – Client Team Lead relationship, which provides prioritization and resolves any execution issues
    3. The Client Sponsor – Consultant Senior Management relationship, which works on relationship issues, contractual matters, account expansion, etc.

    Projects executed under ACOE

    Typically, any project or process that needs to be done on a regular, repeated basis is ideal for an ACOE. Building out an ACOE ensures a high level of data and business understanding, as the same analysts work across multiple projects. This setup is not suitable for situations where the analytical work happens in spurts, with periods of inactivity in between.

    TEG runs ACOEs for several Fortune 500 clients, where analysts are engaged in a variety of tasks:

    1. Apparel & Sports goods retailer
      • Maintain an Analytical Datamart of all sales, sell-through, sell-in and pricing data across multiple franchisee stores and accounts
      • Maintain the entire suite of sell-through reporting for the retail operations, merchandising & sales teams. This set of reports includes sales & inventory tracking, SKU performance and promotion tracking at various levels
      • Formulate promotion pricing strategy for factory outlet stores using sell-through data
    2. Beauty products major
      • Survey analytics: identifying key trends from survey results, plus driver analysis
      • Market Basket Analysis: analyse past purchase history to identify product combinations that have a natural affinity towards each other. Insights based on this analysis are used for cross-promotions, brochure layout, discount plans, promotions, and inventory management
      • ETL on the sales and marketing data to create an Analytical Data Mart that can be used as a DSS tool for strategic pricing & product management decisions
      • Online competitor price tracking: a link extractor scrapes price aggregator and competitor websites and builds a database of competitor product prices. This database is used by our client to perform price comparison studies and take strategic pricing decisions
      • Generate Executive Management Workbooks to track market share of the Top 100 products & provide analytical insights
    3. Credit card and personal finance firm
      • Creation of basic customer marketing, risk & collections reports with multiple slicers for extensive deep-dive analysis of customer transaction data
      • Collection queue analysis, ensuring equitable distribution of collection calls amongst different collections agents
      • Customer lifetime value analysis
      • Customer product switching analysis
      • Acquisition & active customer model scoring & refresh
    4. Nutritional & consumer products MLM firm
      • Campaign management using SAS, SQL & Siebel: complete campaign management including propensity model creation, audience selection for specific campaigns, design of the campaign using DOE methodology, control group creation, campaign loading in the CRM system, and post-campaign analysis
      • Customer segmentation
      • Distributor profitability analysis
      • Customer segment migration analysis using Markov chain based models
    5. CPG major in household cleaning products
      • Creation of a digital analytics DataMart using data from 18+ sources across 11 marketing channels
      • Creation and maintenance of the complete reporting and dashboard suite for digital marketing analysis and reporting
      • Price and promotion analysis, price elasticity modeling, and a pricing tool to determine the revenue and profitability impact of key pricing decisions
      • Market share reporting across 25 countries in LATAM & APAC
      • Creation of data feeds for MMX modeling
      • Shipment, inventory & consumption analysis with a view to optimizing inventory and shipping costs
      • SharePoint dashboard creation to track usage of corporate help resources
    6. Consulting company focused on automobile sector
      • Demand forecasting of automotive sales based on variations in marketing spend across DMAs
      • Propensity modeling to determine the ideal prospects for direct sale of customized electric vehicle
      • Customer segmentation to determine the ideal customer profile for relaunch of a key model

    Key takeaways

    The ACOE model has been successfully deployed by clients across a variety of industries to beef up their analytical capabilities.

    In some cases the requirement is tactical, for a limited period of time, but mostly clients use it strategically to harness best-of-breed capabilities that are difficult to build in-house. The critical success factors in an ACOE relationship are:

    1. Strong business understanding of client processes by the consultant team. This is usually done by posting key resources onsite on a permanent basis or on a rotational basis
    2. Strong governance at multiple levels
    3. Tight adherence to business and communication processes by both parties
    4. Well defined scope of services for the consultant teams
  • Improving Marketing Effectiveness (Using Performance Pointer)

    Improving Marketing Effectiveness (Using Performance Pointer)

    Retail and consumer goods companies run multiple campaigns, promotions, and incentives to entice customers to buy more. A great deal of time, energy, and resources is deployed to execute these promotion programs. In most organisations, the allocation of funds to the various programs is based on gut feel or past experience. If the promotions do not pan out the way they were intended, or perform better than expected, the decision maker is unable to explain the phenomenon or repeat the performance. Promotions may also be targeted at a macro segment while being effective only in a micro segment, reducing the overall effectiveness of the program.


    Using analytics, gut-based decisions can be supported by facts, helping the business make better decisions and stay ahead of competitors who rely purely on gut feel or past experience. We have chosen an MLM (Multi-Level Marketing) company as an example because the problem of allocating funds to various incentives is amplified by the large sales forces engaged by MLM companies. The retail and consumer goods industry can draw a parallel with the incentives and promotions run through the year, targeted at various segments to improve sales.

    According to Philip Kotler, one of the distribution channels through which marketers deliver products and services to their target market is Direct Selling where companies use their own sales force to identify prospects and develop them into customers, and grow the business. Most Direct Selling companies employ a multi-level compensation plan where the sales personnel, who are not company employees, are paid not only for sales they personally generate, but also for the sales of other promoters they introduce to the company, creating a down line of distributors and a hierarchy of multiple levels of compensation in the form of a pyramid. This is what we commonly refer to as Multi-Level Marketing (MLM) or network marketing. Myriad companies like Amway, Oriflame, Herbalife, etc. have successfully centered their selling operations on it. As part of Sales Promotion activities MLM companies run Incentive programs for their Sales Representatives who are rewarded for their superior sales performance and introducing other people to the company as Sales Representatives.


    Business Challenge
    Although incentives play a major role in sales lift, many other environmental factors, such as advertisement spend, the economic cycle, seasonality, company policies, and competitor policies, also affect sales, making it increasingly difficult to isolate the impact of incentives on sales. MLM companies usually run multiple, overlapping incentive programs, i.e. at any given time more than one incentive program runs simultaneously (see figure below). Rewards can be monetary, or include non-monetary items like jewelry, electronics, travel, cars, etc., and are offered on a market-by-market basis. A key question that arises is: “How do we understand the effectiveness of these multiple incentive programs?”

    Because the success of any MLM company depends largely on the performance of its Sales Representatives, incentive programs are of paramount importance in realising the company’s marketing objectives and hence form a vital component of its marketing mix. Some large Direct Selling corporations spend millions of dollars on incentive programs in every market. It is important for incentive managers to understand the effectiveness of these programs, often measured in terms of lift in sales, in order to drive higher Return on Incentive Investment rather than making gut-based investment decisions.

    To summarise, it is not easy to measure the ROI of incentive programs for two broad reasons. First, it is difficult to separate the lift in sales due to incentive programs from that due to other concurrent communication and marketing mix actions. Secondly, in most cases there may be no “silence period”, i.e. one or more incentive programs are active at all points of time, which makes baseline sales estimation virtually impossible by conventional methods.

    The use of analytics – the science of making data-driven decisions – becomes indispensable in order to address the above constraints while statistically quantifying individual incentive ROI and making sales forecasts with sound accuracy. While doing so, a systematic approach using best practices is followed in order to obtain reliable results in a consistent and predictable manner.

    It is very tempting to jump straight into the data exploration exercise. However, a structured approach ensures the outcome will be aligned with the business objectives and the process is repeatable.

    Objective Setting
    The first step towards building an analytics-based solution is to list the desired outcomes of the endeavor prior to analysing data – that is, to thoroughly understand, from a business perspective, what the company really wants to accomplish. For instance, one MLM company may want to evaluate the relative effectiveness of the various components of its marketing mix, which includes incentives along with pricing, distribution, advertising, etc., while another may be interested in tracking the ROI of past incentive programs in order to plan future ones. In addition to the primary business objectives, there are typically related business questions that the incentive manager would like to address. For example, while the primary goal could be ROI estimation of incentive programs, the incentive manager may also want to know which segment of Representatives is more responsive to a particular type of incentive, or whether incentives are more effective in driving the sales of a particular product category. It may also be prudent to design the process so that it can be deployed repeatedly across multiple countries, rapidly and cost-effectively; this is feasible where a direct selling company has uniform data encapsulation practices across all markets. A good practice while setting objectives is to identify potential challenges at the outset. The biggest challenge is the volume of operational data: for retailers and direct sellers it runs into billions of observations over a period of a few years.

    Data Study
    Most Direct Sellers maintain sales data at granular levels, aggregated over time and across categories, along with incentive attributes, measures and performance indicators. Hence, we can safely assume that most companies will have industry-specific attributes like a multi-level compensation system. However, every MLM company will also have its own specific set of attributes that differentiate it from its competitors. It is therefore vital to develop a sound understanding of the historical data in the given business context before using it for model building. This exercise also entails accurately understanding the semantics of various data fields. For instance, every MLM company will associate a leadership title with its Representatives, but the meaning of a title and the business logic used to arrive at the leadership status of a Representative will vary from company to company. A good understanding of other Representative population attributes like age, duration of association with the company, activity levels, downline counts, etc. also leads to robust population segmentation. Depending on business objectives, other data related to media spend, competitor activity, macroeconomic variables, etc. should also be used. A potential issue of data fragmentation might arise here, as voluminous data is broken up into smaller parts for ease of storage and needs to be recombined logically and accurately using special techniques while reading the raw data.
    Moreover, raw data coming from a data warehouse usually contains errors and missing values. It is therefore important to identify these through a comprehensive data review exercise so that they may be suitably corrected once the findings have been validated by the client. In extreme cases, the errors and inconsistencies might warrant a fresh extraction of data. This may lead to an iterative data review exercise, which is also used to validate the entire data understanding process. Any lapse in data understanding before preparing data for modeling might introduce bias and estimation errors later. A data report card helps clients understand the gaps in their data and establish procedures to fill them. A sample scorecard is shown for reference.


    Data Preparation
    The modeling phase requires clean data with specific information in a suitable format; data received from the company cannot be used as modeling input as is. Data preparation is needed to transform input data from various sources into the desired shape. Not all information available in the raw data may be needed: for example, variables like the name of the incentive program, the source of placed orders, or the educational qualifications of Representatives may not be needed for building a model. Key variables that must go into model building are identified and redundancies are removed from the data. Observations with incorrect data are deleted, and missing values may be ignored or suitably estimated. Using the derived Representative attributes, the population is segmented into logically separate strata. Data from different sources is combined into a single table, and new variables are derived to add more relevant information. In the final step of the data preparation exercise, measures are aggregated across periods, segments, geographies and product categories. The aggregated data is used as the input for modeling.
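As an illustration of the final aggregation step, the Python sketch below uses pandas on a tiny hypothetical order table; the column names, segments and values are invented for the example, and real inputs would come from the client's warehouse at far larger scale.

```python
import pandas as pd

# Hypothetical raw order lines (invented columns and values for illustration).
orders = pd.DataFrame({
    "rep_id":   [1, 1, 2, 2, 3],
    "period":   ["2013-01", "2013-01", "2013-01", "2013-02", "2013-02"],
    "segment":  ["Leader", "Leader", "New", "New", "Inactive"],
    "category": ["Skincare", "Color", "Skincare", "Skincare", "Color"],
    "sales":    [120.0, 80.0, 60.0, 90.0, 30.0],
})

# Drop obviously bad rows, then aggregate measures across period, segment
# and product category -- the level at which the model is built.
clean = orders[orders["sales"] > 0]
model_input = (clean
               .groupby(["period", "segment", "category"], as_index=False)
               .agg(sales=("sales", "sum"),
                    active_reps=("rep_id", "nunique")))
```

Each row of `model_input` is one period-segment-category cell, carrying both the aggregated sales measure and a derived variable (the count of active Representatives).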



    Data Modeling
    A statistical model is a set of equations and assumptions that form a conceptual representation of a real-world situation. In the case of an MLM company, a model could be a relationship between sales and other variables like incentive costs, number of Representatives, media spend, incentive attributes and Representative attributes. Before commencing the modeling exercise, the level at which the model should be built needs to be ascertained. A top-down approach builds the model at the highest level, and the results are proportionally disseminated down to the population segment level and then to the individual level; the resultant model may not properly account for the variation in the dependent variable, and may introduce bias into the estimates, because the existence of separate strata in the population is ignored during model building. A bottom-up approach, on the other hand, builds the model at the individual level and aggregates the results up to the segment level and then to the top level. This is exhaustive but very tedious, as sales data usually runs into millions of observations and not all Representatives may be actively contributing to sales at all times; moreover, if the project objective revolves around estimating national figures, this exercise may be redundant. The middle-out approach may be the best way to model the data: the model is built at the segment level and, depending on the requirement, the results may be aggregated up to the top level or proportionately disseminated down to the individual level.

    The first step of the model building exercise is to specify the model equation. This requires determining the dependent variable, the independent variables and the control variables – the variables that condition the relationship between the dependent variable and the independent variables. In a baseline estimation scenario, the sales measure is the dependent variable; incentive cost and other incentive attributes form the independent variables; and segmentation variables, time series, geography, inflation and other variables like media spend act as control variables.

    Usually the model is non-linear, i.e. the dependent variable is not directly proportional to one or more independent variables. A non-linear model may be transformed into a linear model through appropriate data transformations. For example, the relationship between sales and incentives is non-linear: Representative incentives behave like consumer coupons, where there is an initial spike in sales at the start of the incentive followed by a rapid decline, and the impact returns at the end of the incentive as Representatives try to beat the deadline. Applying a coupon transformation to the incentive variables therefore produces a linear relationship between sales and incentives.

    Model coefficients are then estimated using advanced statistical techniques like Factor Analysis, Regression and Unobserved Component Modeling. The common practice across the industry is to use regression analysis to explain the relationship between the dependent variable and the independent variables, and to separately employ time series ARIMA (Auto Regressive Integrated Moving Average) models for forecasting, as the data invariably has a time component. To solve the regression models with all incentive attributes accounted for, the attributes are first condensed into a few underlying factors accounting for most of the variance. These factors then enter the regression along with the control variables, and the final coefficients are a combination of factor loadings and model coefficients. The regression model equation allows us to understand how the expected value of the dependent variable changes when any one of the independent variables is varied while the other independent variables are held fixed.

    The time series model is developed by reducing the non-stationary data to stationary data, removing the seasonal and cyclic components, and estimating the coefficients of the ARIMA model. This approach often leads to discordant answers from the regression and ARIMA models, as regression analysis will miss the trend while ARIMA forecasting may fail to account for causal effects. Unobserved Component Modeling may be employed if very high accuracy is desired from the model. It leverages the concepts of typical time series analysis, where observations close together tend to behave similarly while patterns of correlation (and regression errors) break down as observations get farther apart in time; hence, regression coefficients are allowed to vary over time. The usual observed components representing the regression variables are estimated alongside unobserved components such as trend, seasonality, and cycles. These components capture the salient features of the data series that are useful in both explaining and predicting series behavior. Once the model coefficients are determined, it is essential to validate the model before using it for forecasting.
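The factor-condensation step can be sketched with a small example. The Python code below uses synthetic data, and PCA stands in for the factor analysis mentioned above: correlated incentive attributes are condensed into two factors, sales is regressed on the factors plus a control, and the factor coefficients are mapped back to the original attributes via the loadings. All variable names and values are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 200

# Hypothetical incentive attributes: the first two are nearly collinear.
base = rng.normal(size=(n, 2))
incentives = np.column_stack([base[:, 0],
                              base[:, 0] + 0.1 * rng.normal(size=n),
                              base[:, 1]])
control = rng.normal(size=n)                     # e.g. media spend
sales = (5 + incentives @ np.array([2.0, 2.0, 1.0])
         + 0.5 * control + rng.normal(0, 0.1, n))

# Step 1: condense the collinear incentive attributes into a few factors.
pca = PCA(n_components=2)
factors = pca.fit_transform(incentives)

# Step 2: regress sales on the factors plus the control variable.
X = np.column_stack([factors, control])
reg = LinearRegression().fit(X, sales)

# Step 3: final attribute-level coefficients are a combination of the
# factor loadings and the regression coefficients.
attr_coefs = pca.components_.T @ reg.coef_[:2]
```

Because the factors are (near-)orthogonal, the regression in step 2 avoids the instability that the raw, collinear attributes would cause, while step 3 recovers interpretable per-attribute effects.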


    Model Validation
    The validity of the model is contingent on certain assumptions that must be met. First, the prediction errors should be normally distributed about the predicted values with a mean of zero. If the errors have unequal variances, a condition called heteroscedasticity, the Weighted Least Squares method should be used in place of Ordinary Least Squares regression; a plot of residuals against the predicted values of the dependent variable, any independent variable, or time can detect a violation of this assumption. Another assumption made for time series data is that prediction errors should not be correlated through time, i.e. the errors should not be autocorrelated. This may be checked using the Durbin-Watson test; if errors are found to be autocorrelated, Generalized Least Squares regression should be used.

    It is also important to check for correlation among the independent variables, a condition called multicollinearity. It can induce errors in the coefficient estimates and inflate their observed variances, as indicated by a variable’s Variance Inflation Factor (VIF). Multicollinearity can be easily detected in a multiple regression model using a correlation matrix of all the independent variables: high correlation coefficients indicate multicollinearity. The simplest remedy is to remove collinear variables from the model equation, but this may not always be feasible; for example, the cost of an incentive program is an important variable that cannot be removed even if it has a high Variance Inflation Factor. In such cases Ridge regression may be used in place of OLS regression, though some bias may sneak into the coefficient estimates.

    The goodness of the model fit may be judged by the value of R², the model’s coefficient of determination, which ranges between 0 and 1. A good fit will have an R² value greater than 0.9, but any value of R² very close to 1 should be treated with caution, as it may indicate over-fitting; such a model would give inaccurate forecasts. A low Mean Absolute Percentage Error (MAPE) of the predicted values is also indicative of a good fit. Once the model assumptions are validated and goodness of fit established, the model equation can be used for reporting and deployment.


    Reporting & Deployment
    Depending on the dependent variable chosen, based on the scope of the incentive modeling exercise, baseline measures like sales, volume, Representative count, etc. can be estimated using the model equation. These estimates, along with other variables and derived values, can be used to obtain insights about incentive performance through dashboards with KPIs and other predefined reports, such as annual lift in sales vs. incentive cost, or baseline sales vs. Sales Representative count. The key to realizing the business objectives and deriving value from the modeling outcomes is to capture and present the findings in the form that best enables the end user to understand the business implications, and to flexibly slice and dice the data conveniently without having to make costly investments in acquiring and maintaining system resources. For example, an incentive manager could look at the average ROI of a particular type of incentive program in a pre-built report, and also compare the cost of that incentive with that of another type over an online hosted analytics platform that presents pre-canned reports along with user-customizable reports and multi-dimensional data analysis capability. Such a system gives the end user the freedom to access the reports and analyse the data anytime, anywhere, using an internet browser. Once deployed, it may be refreshed with additional data in future, and may also be used for multiple markets with minor region-specific customisations.




    The insight-based approach will significantly increase the confidence of incentive managers while planning incentive programs for MLM activities. They will be able to identify the incentive programs that deliver high, medium and low paybacks, and hence optimize investment in them. It will also help the Direct Seller check whether any product categories are more responsive to incentives than others. The endeavor can make a significant impact where counter-intuitive facts surface: for example, a particular event or holiday that influences the design of incentive programs during a certain time of the year may actually turn out to be an insignificant contributor to company sales. Incentive managers can simulate various scenarios by assigning different values to the contributors and macroeconomic variables, and forecast the ROI of upcoming incentive programs. This will enable regional incentive managers to drive efficiency and effectiveness in incentive planning and realise the company objective of enhanced sales and ROI. The share of incentive programs in the marketing budgets of most Direct Sellers has been progressively increasing, and the expenditure incurred is steadily going up in the face of competition in emerging markets like India and China, which are fast becoming the engines of growth for global Direct Sellers. Investment in analytics-based decision support systems will prove to be the difference maker for Direct Sellers.


  • Random Forest in Tableau using R

    Random Forest in Tableau using R

    I have been using Tableau for some time to explore and visualize the data in a beautiful and meaningful way. Quite recently, I have learned that there is a way to connect Tableau with R-language, an open source environment for advanced Statistical analysis. Marrying data mining and analytical capabilities of R with the user-friendly visualizations of Tableau would give us the ability to view and optimize the models in real-time with a few clicks.

    As soon as I discovered this, I tried to run the machine learning algorithm Random Forest from Tableau. Random Forest is a machine learning technique for identifying the features (independent variables) that are more discerning than others in explaining changes in a dependent variable. It achieves this by ensembling multiple decision trees, each constructed by randomizing the combination and order of the variables used.

    The prediction accuracy of Random forest depends on the set of explanatory variables used in the formula. To arrive at the set of variables that makes the best prediction, one often needs to try multiple combinations of explanatory variables and then analyze the results to assess the accuracy of the model. Connecting R with Tableau will help you save a lot of time that would have otherwise gone into the tedious task of importing the data into Tableau every time you add/remove a variable.

    Tableau has a function script_real() that lets you run R scripts from Tableau. To use this function in a calculated field, you first need to set up the connection by following these steps:

    1. Open R Studio and install the package ‘Rserve’


    2. Run the function Rserve()


    3. Once you see the message “Starting Rserve…”, open Tableau and follow the steps below to set up the connection


    When you click on “Manage External Service Connection” or “Manage R Connection” depending on the version of Tableau, you’ll see the following window.


    Click OK to complete the connection between Tableau and R on your machine.

    Let's take a simple example to understand how to leverage the R connection to run random forest. In this example, I need to predict the enrollments for an insurance plan based on its features (say, costs and benefits) and the past performance of similar plans.

    After importing the dataset into Tableau, we will create a calculated field that uses SCRIPT_REAL() to run the random forest. The standalone R script we want to reproduce looks like the following (reconstructed; the year filter values 2015 and 2016 are illustrative):

    library(randomForest)
    Data <- read.csv("C:/Tableau/Test 1.csv")
    Data15 <- Data[Data$Year == 2015, ]   # training data
    Data16 <- Data[Data$Year == 2016, ]   # data to predict
    rf <- randomForest(Enrollments ~ ., data = Data15,
                       ntree = 1000, importance = TRUE, do.trace = 100)
    yhat <- predict(rf, Data16)
    Data16$Enrollments <- yhat

    To run the same script in Tableau using SCRIPT_REAL(), we need to create a data frame from only the required columns of the imported dataset. This should be done using the arguments .arg1 through .arg6 instead of the actual column names, since R can access only the data that is referred to through these arguments.

    The values for these arguments are passed at the end of the SCRIPT_REAL() call, in order: .arg1 takes the values of the first field listed, .arg2 the values of the second field, and so on.
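    As a minimal illustration of this ordering (the field names Sales and Profit here are hypothetical), a calculated field might look like:

```
SCRIPT_REAL('
.arg1 / .arg2      # .arg1 = SUM([Sales]), .arg2 = SUM([Profit])
',
SUM([Sales]), SUM([Profit]))
```

    The R expression between the quotes is sent to Rserve, and the trailing Tableau aggregates are bound to .arg1 and .arg2 in the order they appear.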

    After making these changes, the calculated field looks like the following (the year filter values 2015 and 2016 are illustrative):

    SCRIPT_REAL('
    library(randomForest)
    Data <- data.frame(.arg1, .arg2, .arg3, .arg4, .arg5, .arg6)
    Data15 <- Data[Data$.arg1 == 2015, ]
    Data16 <- Data[Data$.arg1 == 2016, ]
    formula <- .arg2 ~ .arg3 + .arg4 + .arg5 + .arg6
    rf <- randomForest(formula, data = Data15,
                       ntree = 1000, importance = TRUE, do.trace = 100,
                       na.action = na.omit)
    yhat <- predict(rf, Data16)
    Data16$.arg2 <- yhat
    testdata <- rbind(Data15, Data16)
    testdata$.arg2
    ',
    ATTR([Year]), SUM([Enrollments]), SUM([Plan feature 1]),
    SUM([Plan feature 2]), SUM([Plan feature 3]), SUM([Plan feature 4]))

    The calculation must be computed along "Plan ID" (i.e., the table calculation's addressing set to Plan ID) to get a prediction for each plan.

    Although this approach achieves the objective of predicting enrollments for each plan, it doesn't offer the flexibility to run multiple iterations without changing the code manually. To make running the model easier, we can create Tableau parameters to choose the variables that go into the model.


    Then, we can create calculated fields (as shown below) whose values change based on the variables selected in the parameters.

    CASE [Parameter1]
    WHEN "Plan Feature 1" THEN [Plan feature 1]
    WHEN "Plan Feature 2" THEN [Plan feature 2]
    WHEN "Plan Feature 3" THEN [Plan feature 3]
    WHEN "Plan Feature 4" THEN [Plan feature 4]
    ELSE 0
    END

    After replacing the fields in the code with these parameter-driven calculated fields ([var 1] through [var 4]), the calculated field looks like this:

    SCRIPT_REAL('
    library(randomForest)
    Data <- data.frame(.arg1, .arg2, .arg3, .arg4, .arg5, .arg6)
    Data15 <- Data[Data$.arg1 == 2015, ]
    Data16 <- Data[Data$.arg1 == 2016, ]
    formula <- .arg2 ~ .arg3 + .arg4 + .arg5 + .arg6
    rf <- randomForest(formula, data = Data15,
                       ntree = 1000, importance = TRUE, do.trace = 100,
                       na.action = na.omit)
    yhat <- predict(rf, Data16)
    Data16$.arg2 <- yhat
    testdata <- rbind(Data15, Data16)
    testdata$.arg2
    ',
    ATTR([Year]), SUM([Enrollments]), SUM([var 1]),
    SUM([var 2]), SUM([var 3]), SUM([var 4]))

    This lets us run multiple iterations of random forest far more easily than manually adding and deleting variables in the R code for every iteration. But, as you might have observed, this code takes exactly four explanatory variables. That can be a problem, since a fixed number of variables in the model is a privilege you rarely (read: never) have.

    To keep the number of variables dynamic, a simple workaround is to select "None" in a parameter, which makes the corresponding column all zeros. A constant column has no variance and can never produce a useful split, so the random forest effectively ignores it.
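    You can check this behaviour directly in R (again assuming the randomForest package, and using mtcars as stand-in data): a constant column ends up with essentially zero importance.

```r
library(randomForest)

set.seed(1)
d <- mtcars
d$none <- 0   # the all-zero column a "None" parameter selection would produce

rf <- randomForest(mpg ~ ., data = d, ntree = 500, importance = TRUE)
importance(rf)["none", ]   # the constant column is never used for a split
```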

    As long as the number of variables is not too high, you can create as many parameters as you need and select "None" for the ones you don't want to use.
