On Feb 18, IBM announced its purchase of Truven Health Analytics for $2.6 billion. Truven collects and crunches payer data on medical costs and treatments. IBM will combine Truven’s data with other recent data acquisitions from the Cleveland Clinic’s “Explorys” and from Phytel, a software company that manages patient data. These data sets will be fed to Watson’s artificial intelligence engine in the hope of helping doctors and administrators improve care and reduce costs. Truven’s data reflect more than 200 million patients’ payment records. Collectively, Watson will now have access to healthcare data on about 300 million patients.
Our question is whether healthcare payer data are so inaccurate and, worse, biased, that they are more likely to mislead than guide. Will the supercomputer’s semiconductors’ digestion of junk and contradictory information produce digital flatulence or digital decisiveness? On the other hand, despite our cautions, we also encourage IBM and Watson to continue their explorations with these data sets. There is much to learn and little to lose in trying, even if the incoming data are unusually messy, biased, and fragmented.
Will Watson’s diet deliver more noise than knowledge?
First, as noted above, the best data we have, from electronic health records (EHRs), are often seriously flawed, incomplete, and inaccurate. The reasons for this are known: patients are seen in many different facilities that can’t communicate with each other because of proprietary data standards and the government’s laissez-faire non-insistence on interoperability. Also, patients (as Dr. House reminded us) often lie, use other people’s insurance cards, have confusing names, or have names that healthcare institutions mangle in fascinating and intricate ways. (Hospitals have up to 35 different ways of recording the same name, e.g., Ross Koppel, R Koppel, R J Koppel, Koppel R, Koppel R J, and, mistyped or confused, R Koppell, Ross K, etcetera.) There are myriad other reasons EHRs misrepresent reality: we often don’t know what’s wrong with the patient until many tests are concluded (and sometimes not even then); patient memories are faulty or the information is embarrassing; most elderly patients have many medications with confusing names and dosages; and doctors often want to avoid diagnoses that may prematurely prevent patients from returning to work or, conversely, want diagnoses that allow some time off work. In addition, although discomforting for patients to realize, there’s massive ambiguity in medicine. Physicians often don’t know what the heck is going on but are forced to enter specific diagnoses in the EHRs, which can’t handle probabilities or ambiguity. The EHRs don’t accept “probably a heart attack but possibly just a muscle tear near the ribs,” even though the symptoms of the two are so similar.
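To make the name problem concrete, here is a minimal Python sketch using the name variants listed above. The matching rule (`crude_key`) is purely our hypothetical illustration, not a method used by any hospital, payer, or IBM: it treats the longest token as the surname and the first letter of the next token as the first initial.

```python
from collections import defaultdict

# The variants quoted in the text, all referring to one person.
variants = [
    "Ross Koppel", "R Koppel", "R J Koppel", "Koppel R",
    "Koppel R J", "R Koppell", "Ross K",
]

def crude_key(name):
    """Hypothetical heuristic: longest token = surname,
    first letter of the first remaining token = initial."""
    tokens = name.replace(",", " ").replace(".", " ").lower().split()
    surname = max(tokens, key=len)
    rest = [t for t in tokens if t != surname]
    initial = rest[0][0] if rest else ""
    return (surname, initial)

# Exact string matching treats every variant as a different patient.
distinct_by_exact_match = len(set(variants))

# The crude key merges some variants but still splits the same person.
groups = defaultdict(list)
for v in variants:
    groups[crude_key(v)].append(v)

for key, names in groups.items():
    print(key, names)
```

Exact matching sees seven "patients." The crude key merges five of the variants, yet the misspelling (R Koppell) and the truncated entry (Ross K) still land in separate groups, which is the kind of residual fragmentation described above.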
Payer data are even worse:
To the incomplete and unreliable data in the EHR, we now add the data from insurance claims and payments, the “payer data” that IBM just bought for Watson. Payer data offer even more opportunities for misdirection than EHR data because:
- If patients are often not forthcoming about their lives for reasons of embarrassment, privacy concerns, etcetera, they have myriad understandable and primarily economic reasons to deceive about their health insurance. They may be using the name of a friend or relative who has health insurance, they may have a spouse’s or ex-spouse’s insurance, or they may wish not to have certain procedures or conditions shown on their insurance records.
- The USA is unique in not having a unique patient ID. This creates massive havoc, and we probably harm and kill a thousand Americans each week from wrong-patient errors. Naturally, the ID errors are compounded by the forms submitted to and collected by payers.
- The aforementioned chaos of how even accurate names are recorded carries through in payment data, which are often worse.
- The EHRs have differing data standards and proprietary software that often makes a hash of patient data even before the information is coded for insurance payments.
- When the EHR data are then transubstantiated into payment codes, the ambiguity and uncertainties are “resolved” by algorithms that pick the codes that pay the most to the submitting hospitals and providers (see below for specific examples). Thus, if it’s either a muscle tear or a heart attack, you can be assured that the claim will be for a heart attack. Of course, the payer may demand additional information, which the medical institutions are obliged to submit. But the clarity of the initial vs. the subsequent claims is often as murky as you might suspect given the hundreds of millions of healthcare visits and procedures.
- Patients are often seen by many providers, each with a differing perspective of the patient’s problems and treatments. These diagnoses and treatments are all sent to the payers as claims. The resulting mishmash of information seldom produces unequivocal clarity.
- Even the list of one patient’s medications in one institution in just one episode of care is often more than 30% wrong. When linked with payer information associated with different medication suppliers (e.g., various local drug stores, drugs via mail, in-hospital prescribed and charged drugs, many from different prescribers, plus OTC drugs), the payment information usually bears scant resemblance to reality. And this is even before we consider patients who split medications to save money by cutting one 200 mg tablet into two 100 mg pills, or those sharing medication with spouses or friends, information that may be shared with the clinician but certainly not with the insurance company.
- The coding system for payments is at best Byzantine. Recently, the US version of the International Classification of Diseases codes (ICD-10) was expanded from 16,000 to some 68,000 codes. The US also has the ICD-10 Procedure Coding System (ICD-10-PCS), a coding system that contains 76,000 codes not used by other countries. Opportunities for confusion and misclassification abound.
- The EHRs themselves are often usability nightmares. Drop-down menus of medical conditions or medications may continue onto other screens without warning the clinician that the list continues. Hurried doctors and nurses may just select the best available option displayed on the one screen in front of them.
In addition to the above problems, there’s the distortion created by the extraordinary time constraints imposed on even the most idealistic medical practitioner. Even more pernicious is the gaming of diagnoses, by an ever-increasing group of hospital-employed physicians, to justify the highest DRG (Diagnosis-Related Group) payments.
The diagnosis-related group is a payment method used by payers to compensate hospitals for their care. The process specifies the number of days a typical patient with that diagnosis takes before discharge, and the hospital is then compensated for that typical stay. Obviously, for a respiratory failure patient who requires intubation and mechanical ventilation for days, weeks, or sometimes even months, the hospital will be compensated at a much higher rate than for a patient with mild pneumonia requiring observation for 24 hours to ensure his/her antibiotics are effective. Thus, it’s rare to find a patient with chest pain who is admitted to a hospital under any diagnosis other than Acute Coronary Syndrome (ACS), that is, under a less expensive diagnosis.
There are four reasons for this DRG inflation. The first, as noted above, is that it guarantees the highest DRG payment for the hospital.
Second, it means not having to justify and defend yourself for not giving guideline-mandated best practice therapy if you diagnose the chest pain as anything other than ACS and it later emerges that you were wrong. Increasingly, emergency medicine groups are owned directly by the hospital. If the group is not owned outright, then it is beholden to the good graces of the hospital, because the hospital’s dissatisfaction with the emergency department (ED) physicians translates into a lost contract and a new ED group in its place. While this is not universally normative behavior, it’s not uncommon for facilities to track how the ED physicians categorize the Medicare-age patients they admit to the hospital, comparing admissions against targets set (formally or informally) by hospital administration to ensure that the hospital beds are “adequately” used.
The current payment system rewards some clinicians for making the numbers come out right; that process is sometimes more important than making the ‘right’ diagnoses. That is, errors in which the diagnosis is wrong and the condition is felt in retrospect to be much more benign are more readily accepted than a case in which the patient is later found, with further study, to have had a misclassified acute coronary syndrome. There are reasons for this other than just money. Even one mischaracterized patient who was felt to have a benign chest pain syndrome, but who is later found to have ACS, will potentially make the hospital’s quality statistics fail to meet current standards, since the rate of compliance for specified measures for ACS has become essentially 100%. That 100% comes at the price of, at times, massive overtreatment of benign chest pain syndromes.
Third, it justifies a cardiac catheterization lab study as the first strategy, oftentimes directly from the emergency department, as well as lucrative outpatient advanced imaging studies. This strategy is enhanced by the fact that these ED studies are fully reimbursed. Needless to say, the payment data reflect the suspected condition and treatment.
Fourth, since 25% of all malpractice claim payouts in Emergency Medicine are related to missed myocardial infarction, most emergency physicians assume the worst and admit all chest pain patients for a formal ‘rule-out’ admission. As a consequence, this force also dilutes the mix of real to non-real coronary syndromes.
The problem is so pandemic that many clinicians now question every diagnosis not made by that clinician him- or herself.
Few criticize referring emergency medicine MDs when they make a diagnosis of acute coronary syndrome while admitting to the hospital obvious cases of costochondritis (rib-sternum arthritis), hyperventilation syndrome, or panic attack. The guidelines-based treatment of these mischaracterized cases makes its contribution to the “evidence-based” treatment statistics and thus contributes to the hospital’s quality metrics, even though in a more deliberative age a more accurate diagnosis would not be at an economic disadvantage. The clinical rule has become “not all diagnoses are created equal”: the higher the risk of catastrophic outcome and the more serious the prognostic implications of the diagnosis, the more likely that diagnosis will be utilized for the reasons enumerated above.
Systematic and Unsystematic Distortions, Bias, Inaccurate and Confusing Payer Data:
In sum, payer data are often distorted in both directions (providers seeking more money and payers seeking to reduce payments and treatment costs); made unreliable by both systematic and unsystematic biases introduced by inaccurate and often incomplete information; skewed by clinicians seeking to avoid sicker patients who will have poor outcomes; and coded in ways that appear certain when the reality is anything but. We feed Watson payer data at our peril if we assume that they correctly reflect real medical conditions, all treatments, and the succession of treatments that ultimately resulted in cure, improvement, or even greater certainty.
But Perilous Does Not Mean Useless; We Encourage Watson to Continue Researching:
Sometimes lousy data can yield helpful insights. We can’t be certain that Watson plus Truven and friends can’t build a better model of, say, “metabolic syndrome” than we have now. Perhaps we can use the IBM supercomputer to help build longitudinal models, even from “incomplete and inaccurate” data.
Also, because the EHR data are so messy and incomplete, maybe the payer data, when combined with other information, can help disambiguate parts of the picture. Those of us in medical informatics and medicine know that it’s often hard for experienced physicians to agree on the data describing a patient; it takes time, even when economic incentives are not pressuring a decision in one or another direction. As Keith Campbell puts it, “understandable, reproducible and useful” is the goal, but getting there may be incremental. Or, as the famous line about mathematical modeling goes: “all models are wrong, some are useful.” Moreover, as we said earlier, the truth about most medical practice is that we are still learning. There’s easy stuff, like a broken finger, but the real diagnostic skill comes when there are multiple systems involved and possibly multiple etiologies, e.g., almost any old person or anyone who is very sick.
Why not try?
Crunching data is relatively safe and dramatically cheaper than clinical trials. And even more promising, if Watson can help combine the vast oceans of data from the EHRs, currently maintained in isolated data silos that have differing data standards and proprietary software, it may accomplish what the federal government would not tackle in its sycophantic response to EHR vendors’ demand for non-regulation. That is, the EHR vendors didn’t want the federal government to establish data standards or requirements for interoperability because they feared it would inhibit sales of their families (or suites) of systems. Better to keep the customers locked within their existing systems. The result has been the metaphorical and real Tower of Babel with which we currently struggle. While the government has finally shown willingness to request data standards and interoperability, the effort is still feeble. We hope that the good folks feeding Watson will address the need for crosswalks and other means of combining the data so that we can learn something from this rich trove of information. Certainly Watson’s ability to use natural language processing will be extraordinarily valuable in reading progress notes and may help to resolve contradictions in and among patients’ records.
More reasons to give it a shot:
Sufficiently aggregated, EHR and payer data are a record of the ongoing national experiment we call healthcare: namely, different interventions in various contexts. Watson may help to analyze these data. Watson may also disentangle problems of over-diagnosis and reimbursement “up-coding.”
All data are abstractions and the “bad data” also reflect a reality. We may not like it, but it’s there. Thus, these data will predict things within this reality that we don’t care about (e.g., redheads are slightly more likely to have heart attacks on Thursdays than would be expected by chance). But we can’t rule out the possibility that Watson will predict something we do care about. Science is full of serendipity, and Watson’s logical crunching may lead to some discoveries that are useful.
To Conclude: As Hippocrates wisely said: Life is short, the art is long, opportunity fleeting, experience delusive, judgement difficult.
This most foundational physician captures precisely the crux and flux of modern medical practice. In spite of 2000 years of continuous and relentless advances, medicine remains more art than science because of the intricate nature of human biology and psychology, the interplay of complex and poorly understood social forces, the enigmatic dimension of spirit, and the variegated interplay of human cultural forces and beliefs which act in concert to make each ‘patient’ bewilderingly unique.
The promise of scientific medicine, the optimal care for individual patients based on the analysis of groups of behaviors, remains frustratingly unfulfilled. However, we are on the cusp of a paradigmatic shift in clinical medicine. The electronic capture of increasing amounts of clinical data results in expanding opportunities for enhancing individual patient care through the analysis of these data by enhanced computing systems of previously unimagined power and depth. Watson offers this opportunity.
There will be false starts and hype as well as hope. Only by exploring these tools, and bringing the results to the crucible of real-world practice, can we hope to determine if Watson and its progeny will be able to sift and winnow the clinically useful out of the messy, detritus-filled, error-laden information that we feed it.
Knowledge discovery through the use of machine learning makes for ‘black box’ types of intellectual constructs. We can’t predict what insights such a system will yield, but we should give it a try. In the end, only by testing the findings against the problems of practiced medicine, and keeping the connections that make sense, can we determine if Watson helps patients and medical knowledge. We may have doubts, but we owe it to ourselves to see if Watson can help Dr. Watsons’ many patients.
Ross Koppel, Ph.D., FACMI is a sociologist at the University of Pennsylvania, where he is also a Senior Fellow at the Leonard Davis Institute of Health Economics (Wharton), and affiliate faculty at the School of Medicine.
Frank Meissner is a cardiologist in El Paso, Texas. He served as a Flight Surgeon in the USAF for 25 years.