One of the computer applications that has received the most attention in healthcare is Watson, the IBM system that achieved fame by beating humans at the television game show, Jeopardy!. Sometimes it seems there is such hype around Watson that people do not realize what the system actually does. Watson is a type of computer application known as a “question-answering system.” It works similarly to a search engine, but instead of retrieving “documents” (e.g., articles, Web pages, images, etc.), it outputs “answers” (or at least short snippets of text that are likely to contain answers to questions posed to it).
As one who has done research in information retrieval (IR, also sometimes called “search”) for over two decades, I am interested in how Watson works and how well it performs on the tasks for which it is used. As someone also interested in IR applied to health and biomedicine, I am even more curious about its healthcare applications. Since winning at Jeopardy!, Watson has “graduated medical school” and “started its medical career”. The latter reference touts Watson as an alternative to the “meaningful use” program providing incentives for electronic health record (EHR) adoption, but I see Watson as a very different application, and one potentially benefitting from the growing quantity of clinical data, especially the standards-based data we will hopefully see in Stage 2 of the program. (I also have skepticism for some of these proposed uses of Watson, such as its “crunching” through EHR data to “learn” medicine. Those advocating Watson performing this task need to understand the limits to observational studies in medicine.)
One concern I have had about Watson is that the publicity around it has been mostly news articles and press releases. As an evidence-based informatician, I would like to see more scientific analysis, i.e., what does Watson do to improve healthcare and how successful is it at doing so? I was therefore pleased to come across a journal article evaluating Watson . In this first evaluation in the medical domain, Watson was trained using several resources from internal medicine, such as ACP Medicine, PIER, Merck Manual, and MKSAP. Watson was applied, and further trained with 5000 questions, in Doctor’s Dilemma, a competition somewhat like Jeopardy! that is run by American College of Physicians and in which medical trainees participate each year. A sample question from the paper is, Familial adenomatous polyposis is caused by mutations of this gene, with the answer being, APC Gene. (Googling the text of the question gives the correct answer at the top of its ranking to this and the two other sample questions provided in the paper).
Watson was evaluated on an additional 188 unseen questions . The primary outcome measure was recall (number of correct answers) at 10 results shown, and performance varied from 0.49 for the baseline system to 0.77 for the fully adapted and trained system. In other words, looking at the top ten answers for these 188 questions, 77% of those Watson provided were correct.
A recent blog posting calls for a “universal EMR” for the entire healthcare system. The author provides an example and correctly laments how lack of access to the complete data about a patient impedes optimal clinical care. I would add that quality improvement, clinical research, and public health are impeded by this situation as well.
However, I do not agree that a “universal EMR” is the best way to solve this problem. Instead, I would advocate that we need universal access to underlying clinical data, from which many different types of electronic health records (EHRs), personal health records (PHRs), and other applications can emerge.
What we really need for optimal use of health information is not an application but a platform. This notion has been advanced by many, perhaps most eloquently by Drs. Kenneth Mandl and Isaac Kohane of Boston Children’s Hospital [1,2]. Their work is being manifested in the SMART platform that is being funded by an ONC SHARP Award.
Several email lists I am on were abuzz last week about the publication of a paper that was described in a press release from Indiana University to demonstrate that “machine learning — the same computer science discipline that helped create voice recognition systems, self-driving cars and credit card fraud detection systems — can drastically improve both the cost and quality of health care in the United States.” The press release referred to a study published by an Indiana faculty member in the journal, Artificial Intelligence in Medicine .
While I am a proponent of computer applications that aim to improve the quality and cost of healthcare, I also believe we must be careful about the claims being made for them, especially those derived from results from scientific research.
After reading and analyzing the paper, I am skeptical of the claims made not only by the press release but also by the authors themselves. My concern is less about their research methods, although I have some serious qualms about them I will describe below, but more so with the press release that was issued by their university public relations office. Furthermore, as always seems to happen when technology is hyped, the press release was picked up and echoed across the Internet, followed by the inevitable conflation of its findings. Sure enough, one high-profile blogger wrote, “physicians who used an AI framework to make patient care decisions had patient outcomes that were 50 percent better than physicians who did not use AI.” It is clear from the paper that physicians did not actually use such a framework, which was only applied retrospectively to clinical data.
What exactly did the study show? Basically, the researchers obtained a small data set for one clinical condition in one institution’s electronic health record and applied some complex data mining techniques to show that lower cost and better outcomes could be achieved by following the options suggested by the machine learning algorithm instead of what the clinicians actually did. The claim, therefore, is that if the data mining were followed by the clinicians instead of their own decision-making, then better and cheaper care would ensue.
Most tools used in medicine require knowledge and skills of both those who develop them and use them. Even tools that are themselves innocuous can lead to patient harm.
For example, while it is difficult to directly harm a patient with a stethoscope, patients can be harmed when improper use of the stethoscope leads to them having tests and/or treatments they do not need (or not having tests and treatments they do need). More directly harmful interventions, such as invasive tests and treatments, can harm patients through their use as well.
To this end, health information technology (HIT) can harm patients. The direct harm from computer use in the care of patients is minimal, but the indirect harm can potentially be extraordinary. HIT usage can, for example, store results in an electronic health record (EHR) incompletely or incorrectly. Clinical decision support may lead clinician astray or may distract them with unnecessary excessive information. Medical imaging may improperly render findings.
Search engines may lead clinicians or patients to incorrect information. The informatics professionals who oversee implementation of HIT may not follow best practices to maximize successful use and minimize negative consequences. All of these harms and more were well-documented in the Institute of Medicine (IOM) report published last year on HIT and patient safety .
One aspect of HIT safety was brought to our attention when a critical care physician at our medical center, Dr. Jeffery Gold, noted that clinical trainees were increasingly not seeing the big picture of a patient’s care due to information being “hidden in plain sight,” i.e., behind a myriad of computer screens and not easily aggregated into a single picture. This is especially problematic where he works, in the intensive care unit (ICU), where the generation of data is vast, i.e., found to average about 1300 data points per 24 hours . This led us to perform an experiment where physicians in training were provided a sample case and asked to review an ICU case for sign-out to another physician . Our results found that for 14 clinical issues, only an average of 41% of issues (range 16-68% for individual issues) were uncovered.
Everyone, including this blog writer, has been touting the virtues of the vast troves of data already or soon to be available in the electronic health record (EHR), which will usher in the learning healthcare system [1, 2]. There is sometimes unbridled enthusiasm that the data captured in clinical systems, perhaps combined with research data such as gene sequencing, will effortlessly provide us knowledge of what works in healthcare and how new treatments can be developed [3, 4]. The data is unstructured? No problem, just apply natural language processing .
I honestly share in this enthusiasm, but I also realize that it needs to be tempered, or at least given a dose of reality. In particular, we must remember that our great data analytics and algorithms will only get us so far. If we have poor underlying data, the analyses may end up misleading us. We must be careful for problems of data incompleteness and incorrectness.
There are all sorts of reasons for inadequate data in EHR systems. Probably the main one is that those who enter data, i.e., physicians and other clinicians, are usually doing so for reasons other than data analysis. I have often said that clinical documentation can be what stands between a busy clinician and going home for dinner, i.e., he or she has to finish charting before ending the work day.
I also know of many clinicians whose enthusiasm for entering correct and complete data is tempered by their view of the entry of it as a data blackhole. That is, they enter data in but never derive out its benefits. I like to think that most clinicians would relish the opportunity to look at aggregate views of their patients in their practices and/or be able to identify patients who are outliers in one measure or another. Yet a common complaint I hear from clinicians is that data capture priorities are more driven by the hospital or clinic trying to maximize their reimbursement than to aid clinicians in providing better patient care.