Several email lists I am on were abuzz last week about a paper described in a press release from Indiana University as demonstrating that “machine learning — the same computer science discipline that helped create voice recognition systems, self-driving cars and credit card fraud detection systems — can drastically improve both the cost and quality of health care in the United States.” The press release referred to a study published by an Indiana faculty member in the journal Artificial Intelligence in Medicine.
While I am a proponent of computer applications that aim to improve the quality and cost of healthcare, I also believe we must be careful about the claims being made for them, especially those derived from the results of scientific research.
After reading and analyzing the paper, I am skeptical of the claims made not only by the press release but also by the authors themselves. My concern is less with their research methods, although I have some serious qualms about them that I will describe below, than with the press release issued by their university's public relations office. Furthermore, as always seems to happen when technology is hyped, the press release was picked up and echoed across the Internet, followed by the inevitable distortion of its findings. Sure enough, one high-profile blogger wrote, “physicians who used an AI framework to make patient care decisions had patient outcomes that were 50 percent better than physicians who did not use AI.” It is clear from the paper that physicians did not actually use such a framework, which was only applied retrospectively to clinical data.
What exactly did the study show? Basically, the researchers obtained a small data set for one clinical condition in one institution’s electronic health record and applied some complex data mining techniques to show that, according to the model, lower cost and better outcomes could have been achieved by following the options suggested by the machine learning algorithm instead of what the clinicians actually did. The claim, therefore, is that if clinicians had followed the algorithm's recommendations instead of their own decision-making, better and cheaper care would have ensued.
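To make the retrospective nature of such a claim concrete, here is a purely illustrative sketch of that kind of analysis: a learned model's recommended options are scored against historical records, but no clinician ever acted on the model, so the "savings" are model estimates, not observed results. All record fields and numbers below are hypothetical, not taken from the paper.

```python
from statistics import mean

# Each hypothetical record: (cost of care actually delivered,
#                            observed outcome score,
#                            model's estimated cost for its recommended option,
#                            model's estimated outcome for that option)
records = [
    (4800.0, 0.55, 3900.0, 0.62),
    (1200.0, 0.70, 1100.0, 0.71),
    (2100.0, 0.65, 1700.0, 0.68),
]

# A retrospective comparison simply averages the two columns; the model's
# column is a prediction, which is exactly the limitation discussed above.
actual_cost = mean(r[0] for r in records)
model_cost = mean(r[2] for r in records)
print(f"actual mean cost: {actual_cost:.0f}")
print(f"model-estimated mean cost: {model_cost:.0f}")
```

The gap between the two averages is what a press release might call "savings," even though the second number was never tested in actual care.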
How many nurses does it take to care for a hospitalized patient? No, that’s not a bad version of a light bulb joke; it’s a serious question, with thousands of lives and billions of dollars resting on the answer. Several studies (such as here and here) published over the last decade have shown that having more nurses per patient is associated with fewer complications and lower mortality. It makes sense.
Yet these studies have been criticized on several grounds. First, they examined staffing levels for hospitals as a whole, not at the level of individual units. Second, they compared well-staffed hospitals against poorly staffed ones, raising the possibility that staffing levels were a mere marker for other aspects of quality such as leadership commitment or funding. Finally, they based their findings on average patient load, failing to take into account patient turnover.
Last week’s NEJM contains the best study to date on this crucial issue. It examined nearly 200,000 admissions to 43 units in a “high quality hospital.” While the authors don’t name the hospital, they do tell us that the institution is a U.S. News top-rated medical center, has achieved nursing “Magnet” status, and, during the study period, had a mortality rate nearly 40 percent below that predicted for its case-mix. In other words, it was no laggard.
As one could guess from its pedigree and outcomes, the hospital’s approach to nurse staffing was not stingy. Of 176,000 nursing shifts during the study period, only 16 percent were significantly below the established target (the targets are presumably based on patient volume and acuity, but are not well described in the paper). The authors found that patients who experienced a single understaffed shift had a 2 percent higher mortality rate than those who didn’t. Each additional understaffed shift carried a similar, and additive, risk. This means that the one-in-three patients who experienced three such shifts during their hospital stay had a 6 percent higher mortality than the few patients who didn’t experience any. If the FDA discovered that a new medication was associated with a 2 percent excess mortality rate, you can bet that the agency would withdraw it from the market faster than you could say “Sidney Wolfe.”
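The additive-risk arithmetic above can be sketched in a few lines; the 2-percent-per-shift figure comes from the study as summarized here, and additivity is the stated assumption, not a law of nature:

```python
def excess_mortality(understaffed_shifts, per_shift_excess=0.02):
    """Excess mortality under the simple additive-risk assumption
    described in the text: each understaffed shift adds the same
    per-shift excess (0.02, i.e., 2 percent, per the study)."""
    return understaffed_shifts * per_shift_excess

print(excess_mortality(1))           # prints 0.02: one shift, 2 percent
print(round(excess_mortality(3), 2)) # prints 0.06: three shifts, 6 percent
```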
The effects of high patient turnover were even more striking. Exposure to a shift with unusually high turnover (7 percent of all shifts met this definition) was associated with a 4 percent increased odds of death. Apparently, patient turnover – admissions, discharges, and transfers – is to hospital units and nurses as takeoffs and landings are to airplanes and flight crews: a single 5-hour flight (one takeoff/landing) is far less stressful, and much safer, than five hour-long flights (5 takeoffs/landings).
A terrific article in The New York Times Magazine this summer described the decade-long effort on the part of IBM artificial intelligence researchers to build a computer that can beat humans in the game of “Jeopardy!” Since I’m not a computer scientist, their pursuit struck me at first as, well, trivial. But as I read the story, I came to understand that the advance may herald the birth of truly usable artificial intelligence for clinical decision-making.
And that is a big deal.
I’ve lamented, including in an article in this month’s Health Affairs, the curious omission of diagnostic errors from the patient safety radar screen. Part of the problem is that diagnostic errors are awfully hard to fix. The best we’ve been able to do is improve information flow to try to prevent handoff errors, and teach ourselves to perform meta-cognition: that is, we can think about our own thinking, so that we are aware of common pitfalls and catch them before we pull our diagnostic trigger.
These solutions are fine, but they go only so far. In the age of Google, you’d think we’d be on the cusp of developing a computer that is a better diagnostician than the average doctor. Unfortunately, computer scientists have thought we were close to this same breakthrough for the past 40 years, and both they and practicing clinicians have always come away disappointed. Before getting to the Jeopardy-playing computer, I’ll start by recounting the generally sad history of artificial intelligence (AI) in medicine, some of it drawn from our chapter on diagnostic errors in Internal Bleeding: