By LUKE OAKDEN-RAYNER MD, JARED DUNNMON, PhD
Medical AI testing is unsafe, and that isn’t likely to change anytime soon.
No regulator is seriously considering implementing “pharmaceutical style” clinical trials for AI prior to marketing approval, and evidence strongly suggests that pre-clinical testing of medical AI systems is not enough to ensure that they are safe to use. As discussed in a previous post, factors ranging from the laboratory effect to automation bias can contribute to substantial disconnects between pre-clinical performance of AI systems and downstream medical outcomes. As a result, we urgently need mechanisms to detect and mitigate the dangers that under-tested medical AI systems may pose in the clinic.
In a recent preprint co-authored with Jared Dunnmon from Chris Ré’s group at Stanford, we offer a new explanation for the discrepancy between pre-clinical testing and downstream outcomes: hidden stratification. Before explaining what this means, we want to set the scene by saying that this effect appears to be pervasive, underappreciated, and could lead to serious patient harm even in AI systems that have been approved by regulators.
But there is an upside here as well. Looking at the failures of pre-clinical testing through the lens of hidden stratification may offer us a way to make regulation more effective, without overturning the entire system and without dramatically increasing the compliance burden on developers.Continue reading…