I have read with interest the ongoing conversation about the ProPublica Surgeon Scorecard in THCB and beyond, not because I believe this latest effort at measuring quality will have a significant effect on patient care, but because behind the latest public metric debate – in fact behind all healthcare metric debates – is a major systemic problem. This problem somehow always seems to remain unseen. We acknowledge that measuring healthcare quality is difficult and that using medical data is challenging, but I’m not convinced that people completely understand why or how measurement and data are so difficult in healthcare…nor am I certain that everyone understands the repercussions of those challenges.
As I wrote here, the most promising recent development in medicine is the emphasis on learning from our data. We are finally digitizing records of clinician and patient interactions via the adoption of EMRs. Data warehousing technologies are connecting healthcare’s disparate systems and making data accessible to decision makers. Data will be the foundation for healthcare improvement. However, it is dangerous to assume that accessing raw data is equivalent to accessing relevant information.
All of today’s widely adopted EMR systems were designed to fulfill three purposes: financial reimbursement, narrative communication among clinicians, and legal protection. Now, we aim to use that data for very different purposes: the improvement of health and the discovery of process efficiencies. The impact of this mismatch between design and use cannot be overstated, and yet hardly anyone seems to be stating it at all.
I was introduced to this dilemma early, when I was searching for a dissertation topic for my PhD in informatics. A urologist colleague suggested that I help him discover differences in the way various surgeons execute one of the most widely performed surgeries: the radical retropubic prostatectomy (RRP), or surgical removal of the prostate.
The prostate sits in a tight space with sensitive structures on nearly all sides. Some urologists take as wide a margin as possible to decrease the likelihood of leaving any cancer behind. Toward this end, they may remove one or both of the nerves that cause erections, leading to impotence and, in some cases, incontinence. Other surgeons excise a smaller margin, leaving more structures intact but risking leaving a small amount of cancer in the body. In all RRPs, the surgeon also runs the risk of nicking the bladder or related structures.
We know little about a surgeon’s decision making. The characteristics of the patient and his disease may play a role in which approach the surgeon takes. Or the surgeon’s teacher and their beliefs might be the dominant factor. We’re not sure. If we could count what surgeons did, why they did it, and whether it worked, we could make suggestions to all surgeons about what worked best in different situations. At the very least, we would know which surgeons tended to leave more cancer behind and which surgeons were more likely to leave a patient impotent. In theory, it sounds straightforward enough. Unfortunately, actually answering these basic questions was complicated enough to be considered the topic of a dissertation…for an informatics graduate student.
First, to find which patients had prostate cancer, we relied on ICD-9 codes. These codes were of questionable accuracy: a large number of patients carrying the code for prostate cancer had merely been checked for the disease. They were so inaccurate, in fact, that we decided to build software to classify who actually had it. Second, despite the use of “standard” measures in pathology reports, pathologists entered important measures (the staging of the tumor and the intermediate outcome of whether any cancer was left behind after surgery) as free text. Third, the surgeons described their surgical approaches (nerve-sparing or not) in unstructured free text in the operative report. Fourth, we found that neither impotence nor incontinence was consistently documented in the record, making those outcomes impossible to count at all. Fifth, we might infer accidental injury to the bladder by reading between the lines of the operative report, but even that strategy was far from reliable.
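To give a flavor of the third problem, here is a toy sketch, in Python, of rule-based extraction of nerve-sparing status from operative-report text. The report snippets are invented and our actual software was considerably more involved; real notes are riddled with negation, abbreviation, and dictation errors that patterns like these will miss.

import re

# Invented operative-report snippets; real notes are far messier.
reports = {
    "pt_001": "Bilateral nerve-sparing radical retropubic prostatectomy performed.",
    "pt_002": "Given extracapsular extension, a non nerve sparing approach was taken.",
    "pt_003": "Prostate removed without complication.",  # approach never stated
}

# Naive patterns for nerve-sparing status.
NON_SPARING = re.compile(r"\bnon[\s-]?nerve[\s-]?sparing\b", re.IGNORECASE)
SPARING = re.compile(r"\bnerve[\s-]?sparing\b", re.IGNORECASE)

def classify_approach(text):
    # Check the negated form first, since it contains the positive form.
    if NON_SPARING.search(text):
        return "non-nerve-sparing"
    if SPARING.search(text):
        return "nerve-sparing"
    return "undocumented"  # the common, and most troubling, case

for pt, note in reports.items():
    print(pt, classify_approach(note))

The hard part is not the pattern matching; it is that the third case, “undocumented,” turns out to be everywhere.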
What we really wanted to know was whether each patient went on to live a long, healthy life, suffered a recurrence of his cancer, or died. We soon found that we could measure long life and recurrence only for patients who chose to remain within the hospital system whose data we were studying. If a patient went elsewhere, the only way for the hospital system to learn his fate was to purchase claims data from a third party. Similarly, if the patient left its system, the hospital would only know he had died by purchasing the National Death Index from the Centers for Disease Control and Prevention. But the NDI is available only to those who intend to use it for research, and the information contained within is 12 to 24 months old (assuming you applied for it 2–3 months before it was available).
The good news: I had chosen a field with good job security prospects. The bad news: healthcare doesn’t have insight into the most basic and important information required to improve care.
I have since worked with data from 1500+ clinics, 265+ government, community, and academically affiliated hospitals, and 10+ health insurers on efforts to learn from medical data how to better deliver patient care across many sub-disciplines of medicine. In my experience, the story above is not exceptional in any way. It is a story that would not surprise clinical researchers, health economics outcomes researchers, epidemiologists, QI specialists, or anyone who has attempted to tease truth from clinical data. Outside of these highly specialized fields, however, few people are familiar with what, exactly, makes measuring healthcare quality so difficult.
I do not mean to imply that learning from healthcare data is not possible. As Mark Friedberg stated in his thoughtful THCB piece about the ProPublica Scorecard, “Scoring the Surgeon Scorecard,” there are proper, scientifically credible methods of validating claims and other types of medical data: validating measures, checking the accuracy of the source data, optimizing choices of statistical methods, and calculating the reliability and risk of measurement error. We need these methods to overcome the limitations of what data we currently can capture and how we can capture it. As an example of why this is necessary, he points to inconsistencies discovered in the assignment of surgeons to surgeries performed in Part A versus Part B claims – a rather important detail.
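To make one of those checks concrete: reliability, in this context, asks how much of the variation in a surgeon’s measured rate reflects true differences between surgeons rather than chance. A common formulation (a sketch of the general idea, not necessarily the exact model any one study used) is

\[ R = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{within}}/n} \]

where \(\sigma^2_{\text{between}}\) is the variance in true performance across surgeons, \(\sigma^2_{\text{within}}\) is the chance variation in outcomes for a single surgeon, and \(n\) is that surgeon’s caseload. For a low-volume surgeon the noise term dominates and the score tells you very little – exactly the kind of limitation Dr. Friedberg argues must be quantified before scores are published.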
Such checks and balances have a cost, though. The Surgeon Scorecard and most epidemiology, economics, and health services research efforts have the advantages of budgets and time allocated to perform the type of validation Dr. Friedberg describes. Furthermore, the results of such projects are typically papers that can be debated in the larger medical community, where errors can be brought to light quickly and safely.
But what happens when this same data is used to inform thousands of healthcare decisions in organizations and institutions across the US? Data warehousing and business intelligence tools play an increasingly important role in facilitating organizational leadership’s decision making. These same clinical data, with all of their limitations, are being used by such systems to assess clinical performance, identify patients in need of escalated care, and direct resources. The signing of the Medicare Access and CHIP Reauthorization Act (MACRA), and the alignment of Meaningful Use Stage 3 with it, will substantially increase the number of medical decisions made based on aggregated clinical data. Many of these data-driven inferences will appear as guidance in electronic medical records. Some already do.
The decisions made from these conclusions on a daily basis are far more impactful – both to individual patients and in aggregate – than a ProPublica report or even the results of most multi-million-dollar randomized controlled trials (which rest on extremely vigilant data collection and validation). And yet, thus far, this question of how to use data not designed for quality improvement, for quality improvement, has been largely ignored.
And just what are the repercussions? Ultimately, we don’t know that, either. Each year, our healthcare system kills an estimated 400,000 people by mistake. However, that number might be closer to 200,000. We are fairly certain it’s at least 98,000 – or at least that’s what the last study said, 16 years ago. We accidentally maim a lot more than that. Millions, maybe.
Lost in our understandable disappointment with the size of those figures and our determination to improve them is the threat posed by our inability to quantify and understand something so critical. These deaths are a tragedy – the results of past healthcare failures. Our inability to even count the number of deaths (i.e., failures) reliably should be an outrage. And still there is little attention paid to the limitations our data infrastructure imposes on any legitimate attempt at understanding, let alone improving, healthcare.
So what can be done to improve our current situation?
Innovations offering incremental improvement in data collection, management, and use will certainly help. We will improve technologies capable of translating a clinician’s words to billing codes. We will see increased use of machine learning and natural language processing to seek patterns in noisy, sparse data, helping us to understand what did happen in past healthcare encounters and what should happen in the future. Hospitals will consider hiring data professionals to work alongside doctors to capture evidence of what’s happening and why in a more useful manner. We will form new companies to track down actual patient outcomes across systems. Done right, these technologies and techniques could offer insights into not only better ways to take care of patients but also how confident we are in those decisions and what evidence exists to support them.
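As a sketch of what such machine learning and natural language processing might look like in practice (a toy example assuming scikit-learn and a handful of invented, hand-labeled pathology snippets; a real system would need thousands of labeled reports and the validation discussed above):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented pathology snippets with hand-assigned labels (1 = cancer documented).
texts = [
    "adenocarcinoma of the prostate, Gleason 3+4, margins negative",
    "no evidence of malignancy in submitted tissue",
    "focal carcinoma present at the inked margin",
    "benign prostatic hyperplasia, no tumor identified",
]
labels = [1, 0, 1, 0]

# TF-IDF features feeding a logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Classify an unseen snippet.
print(model.predict(["carcinoma identified, margins positive"]))

The model is the easy part. The hard parts are assembling trustworthy labels and knowing when the model’s guesses are good enough to act on.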
Additionally, we have the option – already mentioned – of validating our data. Should the same best practices of data validation employed by researchers be applied to the growing number of suggested care decisions delivered by clinical-data-dependent technologies? As a patient, I hope so. As an industry insider, I know that this will rarely be the case; in fact, I question whether it is even feasible. Keep in mind that, in addition to paying for validation in time and money, we face variation in how medical conditions and care are documented across institutions and organizations. As a result, the validation of any one measure at any one facility is unlikely to transfer simply to the next. And this month’s long-awaited switch from the 13,000 diagnostic codes of ICD-9 to the 68,000 codes of ICD-10 is unlikely to improve the validity or reliability of assignments.
Finally, we can acknowledge our limitations. A consistent theme in the ProPublica debate is this notion of “good enough.” In science and reporting we tend to accept “good enough” results if the researchers disclose the limitations of the conclusions. These insights are useful in helping potential users of information determine what to make of the results, how far to trust them, how to apply them to their own facilities, etc. Business intelligence and decision support tools that present the results of automated analyses based on clinical data could, similarly, disclose known data and methodological limitations, so clinicians could judge their utility for each patient.
We should consider all of these and probably more. None, however, will address the larger problem. One of my favorite quotes is written on the walls of the Institute for Healthcare Improvement, where our organization is currently housed: “Every system is perfectly designed to get the results it gets.” Today’s systems of data collection and analysis were designed to meet their goals of financial, legal, and narrative documentation, and they do so admirably. Nowhere in the design of today’s widely adopted systems did we insert the requirement that we learn from our data, or that we adapt care to those learnings. Until that requirement permeates the design of our information systems, we will continue to have to guess the answers to many important questions, including how many people we kill each year by accident.
—
Leonard D’Avolio, PhD, is the CEO and co-founder of Cyft, an assistant professor at Harvard Medical School, and an advisor to Ariadne Labs and the Helmsley Charitable Trust. He can be followed on Twitter @ldavolio, and his writings and bio appear at http://scholar.harvard.edu/len
Great Blog
Data blocking and lack of transparency are two of the biggest detriments not only to patient empowerment and engagement, but also to patient care, safety, the patient experience, and consumerism.
Eye opening! I am reminded of the year-old decision by the FDA to reschedule hydrocodone from Schedule III to the more tightly controlled Schedule II in hopes of stemming its abuse (i.e., seen as part of the war on drugs). In no way was it a unanimous decision (the final vote was 2:1), with strong advocates among physicians on both sides of the issue. News reports suggest that physicians treating patients with unremitting pain were largely for NOT rescheduling, as hydrocodone is inexpensive and can be easily managed with minimal side effects by responsible people. Those in favor of the rescheduling seemed to be fighting not for their patients, but for the honor of medicine, to improve physician image, withhold hydrocodone from “pill mills”, thwart the criminal element, and other non-patient interests. Noteworthy in the entire discussion is the FDA’s dependence on gross statistics on production and distribution as well as the general effects of overall drug abuse. There was a complete lack of relevant medical data or consideration of the impact on physicians, their ability to treat chronic pain, or their patients. The resulting decision to take such “blunt” action as to reschedule hydrocodone virtually cut off many chronic pain patients from this low-cost, effective treatment option. Again, going only from news reports (unfortunately, there is no better data available), many physicians and patients (particularly those who could afford more costly options with increased side effects) initially adjusted to the additional inconvenience, costs, and discomfort. Since then, however, it seems that physician behavior is being examined and monitored based on hydrocodone prescribing patterns (with no detail or understanding of practice discipline, patient population, etc.), such that some or even many physicians (again, no real data available), in order to preserve their reputations and avoid confrontation with the FDA, are simply discontinuing their prescription of hydrocodone. While some patients can absorb the higher cost and worse side effects of alternative treatments for chronic pain, many cannot… such as those patients trying to maintain a career or otherwise live a routine life, plus indigent patients who lack resources and must rely on clinics for care. Making matters worse, few people on either side of the argument had much hope that the rescheduling would keep hydrocodone out of the hands of the bad guys, while effectively (though perhaps unintentionally) assuring that the responsible people who were legitimately benefiting from the treatment no longer have access. Seems like another case of reliance on readily available data driving an ineffective decision.
Terrific and sobering post. Thanks for it. We need this kind of constant reminder from the front lines of the limitations of the current data infrastructure, sources, tools, and analytics. In part, what I hear you saying, or what I’m reading into it, is that the rhetoric of Big Data today is outstripping the reality – at least insofar as it’s yielding the significant systemic change and improvement we so desperately need. Because God knows there’s a lotta buzz about Big Data nowadays. “Revelations” from data seem to be the new prayer to invoke… and the promised land. We’ve seen this movie before. And yet. And yet. Much of the promise seems legit. Hope is offered, and we need hope. On the consumer use front, we are years away still… but maybe not more than 10. Things there ARE starting to pick up. Interest and use are growing. The literacy and numeracy problems are serious, though, as we all know. I’ve long thought that every high school student in junior or senior year should be required to take two courses: one on how money actually works in the real world (how you save, invest, budget, the stock market, etc.), and a second on sex – the full story on how to have and do it responsibly and joyfully (subject to community norms if you like). I’m starting to believe that a third course should be required… or at least a few seminars: how the health system works, how to navigate it, and how to search online for reliable medical and healthcare information, including studies and evidence-based provider performance ratings and treatment comparisons. Money, sex, health. Our society suffers from ignorance and weirdness on all three critical fronts.
Thanks John. Honestly, the challenge with writing this piece was not coming off as too hyperbolic. But the reality sure doesn’t lend itself to subtlety. It’s like we lose two jumbo jets full of Americans a day and don’t bother to look for the black boxes! (see – it’s hard:)
I don’t think most execs, administrators, clinicians are aware of the extent of it. In one multi-hospital study I was a part of we discovered almost 80% of patients with at least 1 ICD-9 code for colorectal cancer never had any evidence of the disease (most had colonoscopies performed). My oncologist friend bet me her career that this was wrong. We triple checked. She owes me.
Not all that big a deal when the only thing the data is used for is making sure we’re getting paid. Quite a big deal when you’re using the same data to determine who actually has cancer, who needs escalated care, or critical follow up.
As to how it might be addressed – analytics companies that facilitate the discovery of previously uncoded complications, treatments, and patterns shed light on the gap between what’s structured and what really happened. The good news is, you can’t “un-see” such results, and they change the way you think about medical data.
We as vendors serving up analytics and/or decision support have a responsibility here. Those supplying the ability to identify patients at risk, or populations that need attention, should be very clear as to how they arrived at those suggestions – and it’s important to acknowledge that, no matter how well-informed, they are suggestions.
Consumers of such services have a responsibility as well. They should take ownership of the evaluation process. I wrote about why this is important and how they might do so a couple years ago. The article’s a bit more NLP-oriented, but mostly applies to evaluation of any analytics solution: http://ubm.io/1jPB8tw
Sometimes the data just isn’t there. Nissan knows more about my car-buying “experience” than my doctors do about my last operation. Nissan emailed me twice and called when I didn’t respond. When I accidentally answered during a busy time, they scheduled a more appropriate time to get my feedback. I wasn’t re-admitted within 60 days following my last hospital encounter, so I’m not worth a follow-up. We’re seeing companies move into this space, and for good reason. It’s really hard to argue the value of a healthcare procedure without knowing the outcome. No, today’s “satisfaction” surveys don’t count as outcomes.
Thanks again for the question and the chance to draw attention to the issue
Great post Leonard.
A couple of points and a question:
If anything, I think you may be overestimating the general public’s level of data literacy. And also the level of curiosity. We need to know more. I think the data on data is likely to be shocking. My sense is the same is true for CEOs and policy makers. Not enough people look at a number and stop for a minute to ask the simple question “Hey, wait a second – does this make sense?” That’s a pretty freaking impressive number. Or that’s a pretty godawful number. How did they get here? Most people assume that the numbers are good. I think that is changing and will continue to change, which is important.
I’m going to take a wild educated guess and predict that things are going to change in the next few years as we see a further democratization of the data landscape. People may not be off doing their own clinical trials and 10 year quantified self projects, but they’re going to start playing with the numbers and calling out the truly bogus results.
I’d love to see some startups come along that give people the tools to do more with numbers. I have no idea what that looks like, but I’m pretty sure there will be a demand. Any thoughts from your perspective on how this could be accomplished?
Thanks @RogueRad
Good piece. Views from the trenches are always important. People underestimate how difficult it is to do high quality health services research, and despite mastery of statistics and attention to detail, how limited the data can still be.