On Feb 18, IBM announced its purchase of Truven Health Analytics for $2.6 billion. Truven collects and crunches payer data on medical costs and treatments. IBM will combine Truven’s data with data from its other recent acquisitions: the Cleveland Clinic’s “Explorys” and Phytel, a software company that manages patient data. These data sets will be fed to Watson’s artificial intelligence engine in the hope of helping doctors and administrators improve care and reduce costs. Truven’s data cover the payment records of more than 200 million patients. In total, Watson will now have access to healthcare data on about 300 million patients.
Our question is whether healthcare payer data are so inaccurate and, worse, so biased that they are more likely to mislead than to guide. Will the supercomputer’s semiconductors, digesting junk and contradictory information, produce digital flatulence or digital decisiveness? Despite our cautions, however, we encourage IBM and Watson to continue their explorations with these data sets. There is much to learn and little to lose in trying, even if the incoming data are unusually messy, biased, and fragmented.
Will Watson’s diet deliver more noise than knowledge?
First, as noted above, the best data we have, those from electronic health records (EHRs), are often seriously flawed, incomplete, and inaccurate. The reasons are well known. Patients are seen in many different facilities that can’t communicate with each other because of proprietary data standards and the government’s laissez-faire failure to insist on interoperability. Also, patients (as Dr. House reminded us) often lie, use other people’s insurance cards, have confusing names, or have names that healthcare institutions mangle in fascinating and intricate ways. (Hospitals may have up to 35 different ways of recording the same name, e.g., Ross Koppel, R Koppel, R J Koppel, Koppel R, Koppel R J, and, when mistyped or confused, R Koppell, Ross K, and so on.) There are myriad other reasons EHRs misrepresent reality. We often don’t know what’s wrong with a patient until many tests are completed (and sometimes not even then). Patients’ memories are faulty, or the information is embarrassing. Most elderly patients take many medications with confusing names and dosages. Doctors often want to avoid diagnoses that might prematurely bar patients from returning to work or, conversely, that might allow them time off.

In addition, though patients may find it discomforting to realize, there is massive ambiguity in medicine. Physicians often don’t know what the heck is going on but are forced to enter specific diagnoses in the EHRs, which can’t handle probabilities or ambiguity. An EHR won’t accept “probably a heart attack but possibly just a muscle tear near the ribs,” even though the symptoms can be so similar that this uncertainty is the honest answer.