Speak Softly and Carry (Good) Data

Dale Sanders

After a recent talk, a client came up to me with a puzzled expression.

We made small talk. We talked about the weather. We talked about sports. Finally, he got to the point.

“When are you going to talk about Big Data?” he asked somewhat impatiently.

“I’m not,” I responded.

It transpired that he was expecting to hear about all of the miraculous things Big Data was going to do for his healthcare system. He had come expecting to hear my Big Data talk.

Apparently, this was something he had been looking forward to all week. He was to be disappointed.

As a matter of fact, I almost never talk about Big Data.

And for the most part, nobody at my company, Health Catalyst, does either.

Which might seem a little strange for a company in the data and analytics business. You’d think we’d be singing the praises of Big Data from morning till night. But we aren’t. There’s a reason for that, which I think is important.

Earlier in my career, at Intermountain Healthcare, I led a team that pioneered the use of analytics and process improvement to dramatically improve outcomes. Many of the concepts that are key to value-based care and the accountable care model originated – at least in part – in the work we did at Intermountain. I will forever be indebted to that experience.

So you’d think I’d be the first to be talking about Big Data.

But I’m not.

For one thing, I find the concept of Big Data to be not terribly helpful to the work I do. The data sets we’re dealing with typically don’t meet the definition of Big Data. Sure, in many cases the hospital and health system data warehouses we’re analyzing include millions and millions of records. But Silicon Valley is dealing with enormous volumes, velocities, and varieties of data. Those data scientists are typically talking about extremely large data sets when they talk about Big Data.

So, strictly speaking, healthcare is not really dealing with Big Data. We’re dealing with just average data. I’m convinced that the folks and vendors who hype Big Data the most are suffering from some sort of Freudian insufficiency complex.

The other problem is more fundamental: everybody is talking about Big Data.

Name a problem and we’re throwing Big Data at it. Cancer. Alzheimer’s Disease. New drug discovery. ER wait times. Google recently announced an initiative to use Big Data to cure death.

You may have noticed that people in our industry have a habit of picking up on the next new thing. Two years ago it was social networking. Last year it was the cloud.

This year it’s Big Data. Next year, it’s going to be — well, we’ll just have to see what it’s going to be.

For a company in the analytics business, this period presents a strategic challenge. How do you talk about yourself when everybody else is talking about the same thing and your customers’ heads are starting to spin?

Luckily, in Health Catalyst’s case, the answer is quite straightforward: you talk about reality.

That’s generally how you combat hype. You talk about solid facts. You talk about pragmatism. You give a structure to the chaos. At Health Catalyst, we understand that every organization is at a different point in the progression from Healthcare 1.0 – the old way of doing things – to Healthcare 2.0 – the point where an organization is consistently able to scientifically leverage technology to improve outcomes and cut costs.

We’ve developed a comprehensive model to help organizations with that transition. We call it the Healthcare Analytics Adoption Model: a core set of foundational principles that acknowledges the central challenges facing most organizations.

Good Data / Bad Data

It drives me crazy when I hear people talking about the things they’re doing with data that have little or no grounding in reality.

A recent story in the news made me stop and think about this and wonder if we’re reaching a teachable moment.

A news organization called ProPublica did something that sounded quite original and potentially groundbreaking. A team of reporters took CMS readmissions data and used it to develop an innovative-sounding scorecard for surgeons. (You’ll find the original story and the follow-up here.)

My reaction was, “Huh. They can tell all this from CMS readmission data? That’s extremely impressive.”

Without going into a whole lot of detail, the ProPublica researchers made it look simple. Surgeons had been scored using a simple system. If a patient was readmitted to the hospital within 30 days, that counted as a ding. If a patient died, that counted as a ding. The number had been risk-adjusted, although exactly what that meant wasn’t explained. And just to be sure, they had checked with “experts.”
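Stripped to its essentials, the scoring method as described boils down to something like the sketch below. To be clear, the field names, the equal weighting of the two “dings,” and the single-divisor risk adjustment are my illustrative assumptions; ProPublica’s actual computation was more involved and was not published in this detail.

```python
# A hypothetical reconstruction of a naive "ding count" surgeon score.
# Field names and the risk-adjustment step are illustrative assumptions.

def surgeon_score(cases, risk_factor=1.0):
    """Count 'dings' (30-day readmission or death) and normalize to a rate."""
    dings = 0
    for case in cases:
        if case.get("readmitted_within_30_days"):
            dings += 1
        if case.get("died"):
            dings += 1
    # Crude "risk adjustment": divide by an externally supplied factor.
    return (dings / len(cases)) / risk_factor

cases = [
    {"readmitted_within_30_days": False, "died": False},
    {"readmitted_within_30_days": True, "died": False},
    {"readmitted_within_30_days": False, "died": False},
    {"readmitted_within_30_days": False, "died": True},
]
print(surgeon_score(cases))  # 2 dings over 4 cases -> 0.5
```

Notice where the trouble lives: every confounder – patient mix, the care team, the patients’ own behavior – has to hide inside that single risk_factor divisor, and a consumer looking at the final number sees none of it.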

This score was presented to consumers as a reliable indicator of surgeon quality and safety.

There was no mention of potential limitations, which are considerable: the impact of patient population, the role of the care team, the behavior of individual patients that contributed to their readmission. Worst of all, there was no mention of the potential problems involved when data used to bill the government for patient services is used as a proxy for patient safety.

In short, there was no context. An ordinary consumer looking at the numbers wouldn’t know what to think and would almost certainly conclude the wrong thing.

The truth is, it’s complicated.

It’s always more complicated.

Now, a word about my background: before moving to lead the informatics team at Intermountain Healthcare, I spent ten years working in the Air Force as a “command, control, communications, and intelligence officer” and with the National Security Agency. That experience shaped the way I look at information management.

My job involved collecting massive amounts of data from every source that might contribute to better military and national security decision making, and then trying to make that data sensible and useful in situations where the cost of a wrong decision is measured at the international level.

To help us track the reliability of information, we would assign a very carefully calculated credibility score to the sources of data. Every intelligence report, every force status dashboard, every sensor warning system included some form of data quality, context, and credibility summary along with it, so that decision makers — the consumers of the data — could make the most informed decision possible.
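The practice is easy to sketch in code. This is a hypothetical illustration of the idea — never publish a bare number; carry its source, credibility, and context with it — and the fields and values below are invented, not any actual scoring scheme we used.

```python
# Hypothetical sketch: a datum never travels without its provenance,
# a credibility score, and a plain-language statement of its context.
from dataclasses import dataclass

@dataclass
class ScoredDatum:
    value: float
    source: str
    credibility: float  # 0.0 (unvetted) to 1.0 (independently corroborated)
    context: str        # what decisions the number can and cannot support

readmission_rate = ScoredDatum(
    value=0.14,
    source="CMS billing claims",
    credibility=0.4,  # billing data is a weak proxy for patient safety
    context="Useful for watching trends; not for ranking individual surgeons.",
)

print(f"{readmission_rate.value:.0%} "
      f"[source: {readmission_rate.source}, "
      f"credibility: {readmission_rate.credibility}]")
```

The point of the structure is that the consumer of the number sees the caveats in the same glance as the number itself.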

Today, when I look at a data set, like the ProPublica report, my first question is: “What’s the source of this data and is it a credible source of data for the decisions that it’s supposed to improve?” Those early years of experience as a military intelligence officer, along with an undergraduate philosophy class called, “What is Truth?” taught me to pause and challenge the data until it either crumbles or prevails.

I learned to look for credible numbers. To look for red flags that indicated that information that told me one thing might not actually mean what I thought it did. And one of the most important things we learned in the Air Force when it came to data analysis and situational assessment: When in doubt, under-react.

The ProPublica incident suggests that those of us who produce and analyze healthcare data have a moral obligation to describe the context and degree of certainty that exists in the data we produce. We are obligated to ensure that the consumers of the data understand the decisions and conclusions that they should and shouldn’t make, based upon the data that they are consuming. The average American does not have a sophisticated understanding of data-driven analysis. We’re going to have to make data literacy a goal for ordinary Americans and perhaps even more importantly for the politicians in Washington and the managers who are using data as a policy tool and as a driver for their business.

We’re going to need to give folks the tools to evaluate the credibility of the data they use and train them to ask tough questions about what numbers can and cannot do for our decisions.

Thirty years after entering this business, I am as convinced as ever that technology and data combined can help drive the transformation of healthcare and American business. We are entering the Human Era of Data, one that will dramatically impact our evolution as a species.

So no, you won’t hear me talking much about Big Data.

Dale Sanders is Senior Vice President, Strategy, at Health Catalyst.

8 replies

  1. The “Standard Error of the Mean” (SE) problem, too: SE = SD/sqrt(n). As n increases, the SE decreases toward zero. That goes to nominal p-value “statistical significance” vs. actionable CLINICAL significance, which is more likely to become known via reproducibility: well-designed, smaller repeated sample runs that consistently show useful differences are how you know you’re on to something.

    Friend of mine is a corporate analyst for Starbucks up at HQ in Seattle. He once said “nobody samples any more, you just hit against the universe,” which, in their case, is 400 million active Starbucks cards. They have SAS and a huge, hyperspeed multiprocessor hardware architecture.

    That’s all fine if you’re selling no-fat mocha frappes by the acre-feet, and minuscule differentials still yield significant profits.
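    The shrinking-SE point above can be made concrete with a few lines of arithmetic. The standard deviation and effect size here are arbitrary, assumed numbers chosen only to show the mechanism.

```python
# As n grows, the standard error SD/sqrt(n) collapses, so even a clinically
# trivial difference in means yields an enormous z-score (a tiny p-value).
import math

sd = 10.0    # assumed standard deviation of the outcome
diff = 0.1   # an assumed, clinically trivial difference in means

for n in (100, 10_000, 400_000_000):  # up to the "whole universe" of cards
    se = sd / math.sqrt(n)
    z = diff / se
    print(f"n={n:>11,}  SE={se:.5f}  z={z:,.1f}")
```

    At n = 100 the difference is statistical noise; at 400 million, the same trivial difference looks wildly “significant” — which is exactly the statistical-vs.-clinical distinction being made.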

  2. Welcome to my world, my friend. In wellness, there is no “good data.” Companies are told that participants reduced BMI and therefore money was saved. There are about 16 things wrong with that one sentence, and yet the Mercers of the world are very quick to endorse this nonsense.

  3. Yes, we need to look beyond the jargon and the hype. Are we talking good data about the hospital, the surgeon, or the patient? In all three cases, the context is very important and in all three cases, business and privacy are the reality that Dale refers to. The business of healthcare is not driven to adopt the kinds of practices that would provide “good” context for the hospital and doctor. The hospitals, including the not-for-profit ones, guard their privacy in every interaction with the doctors and the patients. The doctors, understandably, also guard their privacy and we need a better discussion about how to balance that.

    The patient’s privacy may be the most important component of “good” data and the reality here is not good. The best outcome and context data for patients comes from surveillance across institutions and across physicians. Surveillance and aggregation of patient data is now occurring on a massive scale. Surescripts boasts of “230 Million Patients connected”. Optum’s numbers and aggregation across payer and provider are almost as large and the amount of aggregated data per patient is even larger than Surescripts. And then we have IMS Health… Whether we label the hundreds of universal surveillance businesses in healthcare Big Data or not is, and here I agree with Dale, unimportant. What is important is how “good” is this data about each of us. Why are we not allowed to see it and benefit from it or delete it? Why is the context around our personal data always determined by the institutions and not by ourselves?

    Technology now gives us almost unlimited ability to create “good” data on our institutions, doctors, and patients. We need to begin asking the question of where does the hidden collection of data end and how do we balance business and civil society interests in this new information technology age. My recent post on universal patient identifiers as a way to get “good” data is all about this: https://thehealthcareblog.com/blog/2015/09/01/universal-patient-identifiers-for-the-21st-century/

  4. Great piece Dale. Those of us who have spent years developing and applying machine learning and natural language processing (NLP) methods to solve real world health data challenges are hugely frustrated with hyped up “Big Data” claims. Hyperbolic claims of big data end up minimizing some really important developments in machine learning and natural language processing – those same methods often associated with making sense of truly big data.

    Several data “competitions” and published studies have proven that complementing traditional business intelligence approaches with machine learning + natural language processing is the most effective way to identify and predict some of healthcare’s most relevant challenges including: identification of potential readmissions; identification of actual problems, treatments, and tests; determining what actually happened in complex cases; and monitoring for important conditions such as pneumonia.

    We built software that sits on top of traditional data warehouse / BI products and combines structured + unstructured data to make predictions such as who’s likely to be a high utilizer, who’s likely to acquire an infection, who’s eligible for a different risk stratification, etc. This is real world, targeted stuff, based on the needs of health organizations. Yet, all it takes is one “there’s nothing big data can’t solve” pitch and I find myself having to separate fact from fiction.

    You’ve got it exactly right. To make data work in healthcare, understand the context, consider the intended uses, and pick the right tool for the job.


  5. I just think Dale can’t handle being trendy. If we look into Health Catalyst’s big data sets I’m sure we can pinpoint a cost-effective generic medication we can get him!

  6. I also agree. “Big data” succors actuaries and economists much more than those of us concerned with clinical issues. For us, “big data” falls victim to what Alfred North Whitehead termed “the fallacy of misplaced concreteness.” It’s not that the numbers are totally irrelevant or that the scientific inferences are simply fallacious. It’s that they are incomplete, often sorely incomplete windows into health and the experience of illness.

  7. I completely agree. Hyperventilation about Big Data is rivaled only by personalized medicine.