Very Big Data

The field of analytics has fallen into a few big holes lately that represent both its promise and its peril.  These holes pertain to privacy, policy, and predictions.

Policy. 2.2/7. The biggest analytics project in recent history is the $6 billion federal investment in the health exchanges. The goals of the health exchanges are to enroll people in the health insurance plans of their choice, determine insurance subsidies for individuals, and inform insurance companies so that they can issue policies and bills.

The project touches on all the requisites of analytics, including big data collection, multiple sources, integration, embedded algorithms, real-time reporting, and state-of-the-art software and hardware. As everyone knows, the implementation was a terrible failure.

The CBO’s conservative estimate was that 7 million individuals would enroll in the exchanges.  Only 2.2 million did so by the end of 2013.  (This does not include Medicaid enrollment which had its own projections.)  The big federal vendor, CGI, is being blamed for the mess.

Note that CGI was also the vendor for the Commonwealth of Massachusetts which had the worst performance of all states in meeting enrollment numbers despite its long head start as the Romney reform state and its groundbreaking exchange called the Connector. New analytics vendors, including Accenture and Optum, have been brought in for the rescue.

Was it really a result of bad software, hardware, and coding? Was it that the design to enroll and determine subsidies had “complexity built-in” because the legislation cobbled together existing cumbersome systems, e.g. private health insurance systems? Was it because the incessant politics of repeal distracted from policy implementation? Yes, all of the above.

The big “hole”, in my view, was the lack of communication between the policy makers (the business) and the technology people. The technologists complained that the business could not make decisions and provide clear guidance. The business expected the technology companies to know all about the complicated analytics and get the job done, on time.

The ensuing rift, in which neither group knew how to talk with the other, is recognized as a critical failure point. In fact, those who are stepping into the rescue role have emphasized that there will be management status checks daily “at 9 AM and 5 PM” to bring people together, know the plan, manage the project, stay focused, and solve problems.

Walking around the hole will require a better understanding of why the business and the technology folks do not communicate well, and a recognition that soft people skills can avert hard technical catastrophes.

Predictions: 43-8. The great hope for demonstrating the value of analytics is (advanced) prediction, which uses the full breadth and depth of big data to go beyond reporting on the past to predicting the future. So, how could the predictions about the 2014 Super Bowl game between the Seahawks and the Broncos be so far off?

The point spread was 3 points, but the actual spread was more than 10 times that as the Seahawks routed the Broncos and Peyton Manning from the first (mis)play of the game. Perhaps there is a tribe of analytics “sharps” who are making it big in sports wagering, but the fact is that the best of them win only about 53% of the time.
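To put those two numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes a typical point-spread bet at -110 odds (risk $110 to win $100); those odds are my assumption for illustration and are not taken from any particular sportsbook. The function names are likewise hypothetical.

```python
# Illustrative arithmetic only. The -110 odds (risk $110 to win $100) are an
# assumption about a typical point-spread bet; they are not from the post.

def break_even_win_rate(risk: float, payout: float) -> float:
    """Win rate at which expected profit per bet is exactly zero."""
    return risk / (risk + payout)


def expected_profit(win_rate: float, risk: float, payout: float) -> float:
    """Average profit per bet for a bettor who wins `win_rate` of the time."""
    return win_rate * payout - (1.0 - win_rate) * risk


RISK, PAYOUT = 110.0, 100.0   # assumed -110 odds
SHARP_WIN_RATE = 0.53         # "about 53%" from the text

break_even = break_even_win_rate(RISK, PAYOUT)
edge = expected_profit(SHARP_WIN_RATE, RISK, PAYOUT)

# Size of the Super Bowl miss: final score 43-8 against a 3-point spread.
actual_margin = 43 - 8
miss_ratio = actual_margin / 3

print(f"break-even win rate: {break_even:.1%}")
print(f"expected profit per $110 risked at 53%: ${edge:.2f}")
print(f"actual margin was {miss_ratio:.1f}x the point spread")
```

Under those assumed odds, a bettor has to win a little over 52% of the time just to break even, so a 53% win rate is a thin edge rather than a triumph of prediction.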

The irony, perhaps, is that football, like baseball and basketball, is a fully digitized industry, unlike most others, including healthcare, which still struggles to use electronic medical records to capture its key transaction information. In sports, every play on the field is captured, recorded, and discussed, resulting in a rich performance database of players in almost every conceivable context, e.g. how a baseball hitter performs relative to a specific pitcher, playing field, regular or post-season game, and so forth.

But it is clear from the big-miss prediction of the Super Bowl game that some important data that would improve the precision of the model are missing. The “squares”, who rely on softer data (intuition), think they know this reality of the shortcomings of quant data, although their win rate is no better than that of the sharps. My personal insight on this comes from when I was 16 years old and worked as a dog handler at a greyhound racing park. I took a dog from its pen, to the viewing stand, into the starting gate, and picked it up at the conclusion of the race. I knew when a dog was nervous, sick, or hyped up. And I knew that when a dog hit its head going into the gate, it would not recover to win the race.

The “hole” here is the reliance on the big data that is under the lamppost. In this case, it is the big sports data, most of which is collected because it can be, without a model in mind and mostly for its entertainment value. The big data presumption is that if you build it (the database), the predictions will come. That ain’t necessarily so, even if one runs zillions of simulations on all the yottabytes of big data. The data have to be right for the model to work. In the case of sports, there are lots of (“soft”) untapped personal data such as health, resilience, and response to certain threats (and more) that may be important factors in big game performance. It’s a real short-circuiting of predictive modeling to be carried away with the technologies of the yottabytes while avoiding a full understanding of the phenomena under study.

Privacy.  $1B. Target, the retailer, was the poster child for using big data for customer analytics to pump up sales.  It unabashedly collected lots of data on its customers, from a variety of sources, integrated it, and used it for predictive modeling to identify segments that are experiencing “moments that matter” when habits can be influenced to buy new products.  Target touts that “we’ll be sending you coupons for things you want before you even know you want them.”

For example, it developed algorithms about the probability of pregnancy and the delivery date to sell specific products that women buy at different times during their pregnancy.  It identified the women, sent them coupons, and opened its cash registers to amazing profits.

However, as we have learned, it also opened its cash registers, credit card machines, and databases to cybercriminals who stole the personal data of tens of millions of customers.  It is estimated that this error will cost Target over $1B in fraud claims.  Its stock price has fallen over 25% since the incident.

The “hole” is a comfortable one for analytics. The habit is to uncork technology before its time. For example, the NSA exploited the technology to tap telephone calls and scrape people’s metadata into a database before it confronted the likelihood that world leaders and the public at large would condemn the practice and that it could not be defended in terms of averting terrorism.

Similarly, there was a lot of talk about the “creepiness” of retailers collecting personal data on customers by whatever means possible. The big appetite for the data to improve sales may have blinded companies to the consequences and led them to “forget” the basic responsibility to protect it. In the Target case, there are known credit card technology safeguards, including the use of a security microchip, that were ignored. Additionally, there must be encryption protocols and firewalls to decouple data so that cybercriminals cannot find personal identity information.

The simple lesson is that just because the technology exists does not mean that it should be used. Perhaps one route around the “hole” is to “count to ten” before technology genies are let out of the bottle.

In summary, these three holes in analytics are recurrent themes and threats to fulfilling the promise of analytics. First, the technology cannot zoom ahead of the sociology. The need for business results cannot err on the side of the creepy use of personal data to increase sales without full respect for the need to protect privacy and to honor customers.
Second, big data is not the answer if it is not the right data.  The full potential of predictive modeling requires more thinking and less data processing.
And lastly, the big failures in analytics have less to do with bad machines and buggy software and much more to do with people on either side of the business and technology fence just not talking with one another.
Dwight McNeill is a health policy expert who focuses on analytics for population health, patient engagement, and provider performance. This post originally appeared at his personal blog.

9 replies

  1. Makes sense. I would imagine however the Fed to be more inept at it than others, though.

  2. Any entity, Perry. Imagine trying to get myriad for-profit private healthcare entities to truly and fully collaborate outside of government by voluntarily contributing their clinical and administrative data in standardized format to some research entity — their competitive “business intelligence.”

    Good luck with that one. There will be stabs at it (e.g., perhaps the fruits of the CommonWell interop alliance), but nothing fully scaled up.

  3. “The methodological problems will be legion.”

    So was this even a realistic project to have been undertaken by the Federal government, or would any entity have unsurmountable problems in this type of implementation?

  4. “If you look at the ACA in a broad historical sense as an analytics project with an emphasis on collecting, measuring and using data for policy and decision-making purposes, this is clearly one of the most ambitious efforts in recent history by government (or anybody else).”

    Indeed. That would in fact be so. Particularly once you add in genomics data to the wildly varying lengths and breadths of patient medical histories, and the equally disparate heterogeneity of clinical entities. The methodological problems will be legion.

  5. Ok. What if we substitute the words Affordable Care Act for the words “insurance exchanges?”

    If you look at the ACA in a broad historical sense as an analytics project with an emphasis on collecting, measuring and using data for policy and decision-making purposes, this is clearly one of the most ambitious efforts in recent history by government (or anybody else).

    Not sure, but I suspect that’s what Dwight is getting at here…

  6. I love the hyperbolic hype. Biggest in history. One of my close friends is a corporate analyst with Starbucks, up at HQ in Seattle. They have more than 300 million Starbucks cards in circulation (~ the size of the U.S. population). They grind those account data productively every day (Joey is a SAS whiz).

    I can guarantee you they didn’t spend $6 billion on that capacity. Nor anywhere microscopically close to it.

    Biggest in history. Seriously?

    Really swell how people get played.

  7. The biggest analytics project in recent history is the $6 billion federal investment in the health exchanges.

    biggest in recent history?

    or biggest in history?