One big theme in AI research has been the idea of interpretability. How should AI systems explain their decisions to engender trust in their human users? Can we trust a decision if we don’t understand the factors that informed it?
The latter question is philosophical rather than technical, and I'll have a lot more to say on it some other time. Today I want to share some of our research into the first question: can our models explain their decisions in a way that convinces humans to trust them?
I am a radiologist, which makes me something of an expert in the field of human image analysis. We are often asked to explain our assessment of an image to colleagues, other doctors, or patients. In general, we express two things:
What part of the image we are looking at.
What specific features we are seeing in the image.
This is partially what a radiology report is. We describe a feature, give a location, and then synthesise a conclusion. For example:
There is an irregular mass with microcalcification in the upper outer quadrant of the breast. Findings are consistent with malignancy.
You don't need to understand the words I used here. The point is that the features (irregular mass, microcalcification) are consistent with the diagnosis (breast cancer, malignancy). A doctor reading this report immediately sees that internal consistency, and it reassures them that the report isn't wrong. A common example of a wrong report could be:
Today on THCB Spotlights, Matthew speaks with Jeremy Orr, CEO of Medial EarlySign. Medial EarlySign does complex algorithmic detection of elevated risk trajectories for high-burden serious diseases, and the progression towards chronic diseases such as diabetes. Tune in to hear more about this AI/ML company that has been working on their algorithms since before many had even heard about machine learning, what they’ve been doing with Kaiser Permanente and Geisinger, and where they are going next.
Filmed at the HLTH Conference in Las Vegas, October 2019.
Medical AI testing is unsafe, and that isn’t likely to change anytime soon.
No regulator is seriously considering implementing “pharmaceutical style” clinical trials for AI prior to marketing approval, and evidence strongly suggests that pre-clinical testing of medical AI systems is not enough to ensure that they are safe to use. As discussed in a previous post, factors ranging from the laboratory effect to automation bias can contribute to substantial disconnects between pre-clinical performance of AI systems and downstream medical outcomes. As a result, we urgently need mechanisms to detect and mitigate the dangers that under-tested medical AI systems may pose in the clinic.
In a recent preprint co-authored with Jared Dunnmon from Chris Ré’s group at Stanford, we offer a new explanation for the discrepancy between pre-clinical testing and downstream outcomes: hidden stratification. Before explaining what this means, we want to set the scene by saying that this effect appears to be pervasive, underappreciated, and could lead to serious patient harm even in AI systems that have been approved by regulators.
But there is an upside here as well. Looking at the failures of pre-clinical testing through the lens of hidden stratification may offer us a way to make regulation more effective, without overturning the entire system and without dramatically increasing the compliance burden on developers.
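To make the idea concrete, here is a minimal Python sketch. It assumes that "hidden stratification" refers to a model performing poorly on a clinically meaningful subgroup that is not separately labelled in the test data, so a single aggregate metric hides the failure. All cases, subgroup names, and counts below are hypothetical and are not taken from the preprint.

```python
from collections import defaultdict

# Hypothetical test set of (true_label, predicted_label, subgroup) tuples.
# In real hidden stratification the subgroup tag would NOT be recorded,
# which is exactly why the weak subgroup goes unnoticed.
cases = (
    [(1, 1, "common subtype")] * 90
    + [(1, 0, "common subtype")] * 5
    + [(1, 1, "rare subtype")] * 2
    + [(1, 0, "rare subtype")] * 3
    + [(0, 0, "normal")] * 95
    + [(0, 1, "normal")] * 5
)


def accuracy(rows):
    """Fraction of rows whose prediction matches the true label."""
    return sum(y == p for y, p, _ in rows) / len(rows)


def accuracy_by_subgroup(rows):
    """Accuracy computed separately for each subgroup tag."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)
    return {name: accuracy(group) for name, group in groups.items()}


overall = accuracy(cases)
per_group = accuracy_by_subgroup(cases)

print(f"overall accuracy: {overall:.3f}")
for name, acc in per_group.items():
    print(f"  {name}: {acc:.3f}")
```

In this toy example the overall accuracy looks reassuring, while the small "rare subtype" group is misclassified most of the time. Only stratifying the evaluation by subgroup exposes it.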
Despite an area under the ROC curve of 1, Cassandra's prophecies were never believed. She neither hedged nor relied on retrospective data – her predictions, such as the Trojan War, were prospectively validated. In medicine, a new type of Cassandra has emerged – one who speaks in a probabilistic tongue, forked unevenly between the probability of being right and the possibility of being wrong. One who, by conceding that she may be categorically wrong, is technically never wrong. We call these new Minervas "predictions." The Owl of Minerva flies above its denominator.
Deep learning (DL) promises to transform the prediction industry from a stepping stone for academic promotion and tenure to something vaguely useful for clinicians at the patient's bedside. Economists studying AI believe that AI is revolutionary, revolutionary like the steam engine and the internet, because it better predicts.
In a study recently published in Nature, a sophisticated DL algorithm predicted acute kidney injury (AKI) continuously in hospitalized patients by extracting data from their electronic health records (EHRs). The algorithm interrogated nearly million EHRs of patients in Veterans Affairs hospitals. As intriguing as the methodology is, it's less interesting than the results. For every correct prediction of AKI, there were two false positives. The false alarms would have made Cassandra blush, but they're not bad for prognostic medicine. The DL-generated ROC curve stands head and shoulders above the diagonal representing randomness.
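The two numbers in this paragraph measure different things, and a short Python sketch makes the distinction explicit. "Two false positives for every correct prediction" means only one alert in three is a real case (a positive predictive value of about 0.33). The ROC curve instead measures how well scores rank positives above negatives, where the diagonal corresponds to an AUC of 0.5. The function names and the risk scores below are hypothetical, chosen only for illustration.

```python
from itertools import product


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of alerts that turn out to be real cases (precision)."""
    return true_positives / (true_positives + false_positives)


def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative case (ties count as half).
    0.5 corresponds to the diagonal (random guessing); 1.0 is perfect."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# One correct AKI prediction for every two false alarms.
ppv = positive_predictive_value(true_positives=1, false_positives=2)
print(f"PPV: {ppv:.2f}")

# Hypothetical risk scores, for illustration only.
perfect = roc_auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])  # Cassandra's AUC of 1
uninformative = roc_auc([0.5, 0.5], [0.5, 0.5])      # the diagonal, AUC 0.5
print(f"perfect AUC: {perfect}, random AUC: {uninformative}")
```

A model can sit well above the diagonal on the ROC plot and still generate mostly false alarms at the bedside, because PPV also depends on how rare the condition is.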
The researchers used a technique called "ablation analysis." I have no idea how that works, but it sounds clever. Let me make a humble prophecy of my own: if deployed at the bedside, the AKI-specific, DL-augmented Cassandra could wreak havoc on a scale one struggles to comprehend.
Leaving aside that the accuracy of algorithms trained retrospectively falls in the real world – as doctors know, there's a difference between book knowledge and practical knowledge – the major problem is the effect that the availability of information has on decision-making. Prediction is fundamentally information. Information changes us.
Two years ago we wouldn't have believed it, but the U.S. Congress is considering broad privacy and data protection legislation in 2019. There is some bipartisan support and a strong possibility that legislation will be passed. Two recent articles in The Washington Post and AP News will help you get up to speed.
Federal privacy legislation would have a huge impact on all healthcare stakeholders, including patients. Here’s an overview of the ground we’ll cover in this post:
Six Key Issues for Healthcare
We are aware of at least 5 proposed Congressional bills and 16 Privacy Frameworks/Principles. These are listed in the Appendix below; please feel free to update these lists in your comments. In this post we’ll focus on providing background and describing issues. In a future post we will compare and contrast specific legislative proposals.