The Rise and Rise of Quantitative Cassandras


Despite an area under the ROC curve of 1, Cassandra’s prophecies were never believed. She neither hedged nor relied on retrospective data – her predictions, such as the Trojan War, were prospectively validated. In medicine, a new type of Cassandra has emerged – one who speaks in probabilistic tongue, forked unevenly between the probability of being right and the possibility of being wrong. One who, by conceding that she may be categorically wrong, is technically never wrong. We call these new Minervas “predictions.” The Owl of Minerva flies above its denominator.

Deep learning (DL) promises to transform the prediction industry from a stepping stone for academic promotion and tenure to something vaguely useful for clinicians at the patient’s bedside. Economists studying AI believe that AI is revolutionary, revolutionary like the steam engine and the internet, because it better predicts.

Recently published in Nature, a sophisticated DL algorithm was able to predict acute kidney injury (AKI), continuously, in hospitalized patients by extracting data from their electronic health records (EHRs). The algorithm interrogated nearly a million EHRs of patients in Veterans Affairs hospitals. As intriguing as their methodology is, it’s less interesting than their results. For every correct prediction of AKI, there were two false positives. The false alarms would have made Cassandra blush, but they’re not bad for prognostic medicine. The DL-generated ROC curve stands head and shoulders above the diagonal representing randomness.
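The two-false-alarms-per-hit figure maps directly onto the positive predictive value of an alert. A minimal sketch of that arithmetic (the 1:2 ratio is from the passage above; the function name is my own, purely illustrative):

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: the fraction of alerts that are correct."""
    return true_positives / (true_positives + false_positives)

# One correct AKI prediction for every two false alarms:
# only a third of the algorithm's alerts flag a real impending AKI.
print(round(ppv(1, 2), 3))  # 0.333
```

In other words, a bedside clinician hearing this alarm should expect to be chasing a phantom two times out of three, however impressive the ROC curve looks in aggregate.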

The researchers used a technique called “ablation analysis.” I have no idea how that works but it sounds clever. Let me make a humble prophecy of my own – if unleashed at the bedside, the AKI-specific, DL-augmented Cassandra could wreak havoc on a scale one struggles to comprehend.

Leaving aside that the accuracy of algorithms trained retrospectively falls in the real world – as doctors know, there’s a difference between book knowledge and practical knowledge – the major problem is the effect availability of information has on decision making. Prediction is fundamentally information. Information changes us.

For starters, the availability of information encourages the quest for that information. We seek because we can seek. To not know when we can know is to wallow in the regret we may have of knowing that we could have known. The cheaper the information, the more unforgiving the anticipatory regret. Ignorance is bliss but deliberate ignorance is a curse. It is better to have never known that we never knew than to know that we could have known.

Once we know, we have to decide what to do with that information. The algorithm predicts AKI before the cardinal marker of AKI – the serum creatinine – is elevated. Thus, we’ll know before we might have known and will be obliged to do something about that knowing. With information, things get both promising and perilous – information is seldom an unalloyed good.

The knowledge that someone who is evidently hypovolemic, and on nephrotoxic medication which could be stopped, is heading towards AKI is useful information. If you know which butterfly in Rio de Janeiro will cause a tornado in Kansas by flapping its wings, you can save Kansas from a tornado. Similarly, a patient can be saved from AKI, renal replacement therapy, lifelong immunosuppressants, and premature death – if only we knew who they were and could reverse their renal decline. Information is most potent when it steers the Titanic from the iceberg it is destined to hit. Actionable information leads to a more favorable counterfactual – literally a parallel universe.

Action makes stochastic processes deterministic but action is imprecise when it responds bluntly to imprecise information. There is also the patient never destined to develop AKI – a post-operative patient whose pain was nicely controlled by NSAIDs. The information-philic doctors ran the AKI algorithm on the patient’s EHR, because they could. The algorithm predicted AKI. The doctors then stopped the NSAID, because it is nephrotoxic, and prescribed the patient oxycodone to both control the pain and save the kidney. The patient became dependent on opioids, then developed a fetish for cheap Chinese Fentanyl, then died from euphoric respiratory failure. In a parallel universe the patient lived happily ever after developing neither AKI nor opioid addiction. Turning the butterfly around so that it doesn’t flap its wings towards Kansas saves Kansas from a tornado, but could lead to an even more catastrophic tornado in Santiago.

Then there’s that patient with malignant hypertension who develops back pain and has a mildly elevated creatinine. The team suspects aortic dissection but the algorithm predicts AKI, seemingly plausible because uncontrolled hypertension toughens the hide of the arterioles supplying the kidney. The team wants to rule out aortic dissection with CT, but doesn’t want to give iodinated contrast because it’s allegedly nephrotoxic. The patient has an unenhanced CT – the radiologist can’t exclude dissection and recommends MRI. The MRI, performed three hours later, catches the dissection but the aorta, tired of waiting for reinforcements, ruptures in protest. The patient is rushed to the operating room. The aorta is repaired but the kidneys, exposed to genuine hypovolemia, are toast. The patient needs a renal transplant. Prediction is both an allegation and a self-fulfilling prophecy.

Information confuses. Information also clarifies – but not before it lures doctors to get more information, which can confuse even more. To be certain that the probabilistic AKI is real, the team orders a chest radiograph to look for subtle pulmonary edema, a beacon of the failing kidney. “Subtle” pulmonary edema is, well, subtle. The radiologist smears another probabilistic layer on the prediction. Two probabilities don’t make a certainty but they certainly make a possibility. The patient is caged by the possibility of doom and the doctors frozen in a state of pre-action hypervigilance.

Patient – “nurse, who are those people outside my room? I think I’m being stalked.”

Nurse – “don’t worry, they’re not stalkers, they’re nephrologists – your personal Nephrology SEALs. They’re on standby in case your kidneys fail.”

Hypervigilance won’t be cheap. Blood will be drawn frequently from the patient to check serum urea, creatinine and electrolytes – when the kidney fails the potassium rises and a high potassium, unchecked, could be fatal. The anticipatory regret of missing an imminently correctable high potassium is both hyperacute and sustained. The patient will be stabbed a thousand times. Impending renal failure, like eschatology, is jam tomorrow – that the kidneys haven’t failed on Monday doesn’t mean they won’t fail on Friday. The algorithm boasts a lead time which would make Nostradamus envious – that is, it predicts doom long before the doom. Long lead times extend the window for prevention but also stretch the horizon separating safety from potential doom.

The patient with impending AKI will get an ultrasound to exclude a post-obstructive cause of renal failure with the taut “let’s avoid the iceberg rather than respond to hitting it” logic of relieving the obstruction before the obstruction relieves the kidney of its nephrons. The lucky few will have a nephrostomy to save the kidney from permanent renal failure from obstructive nephropathy. Most will simply be shuttled to radiology to have daily measurements of their renal pelvis by a burnt-out radiologist daydreaming that AI will relieve them of their mundaneness.

Algorithms are reductionists. They can’t possibly know all the counterfactuals. The AKI-predictor knows about renal failure. But how good is its reductionism? Renal failure is defined by an elevated creatinine. Creatinine is a surrogate. Surrogates are partial truths – if they were whole truths they’d not be called surrogates – meaning they’re neither necessary nor sufficient for the outcome of interest.

The algorithm is largely accurate in predicting the elevation of a biomarker, whose elevation is largely accurate about the disease it is charged with flagging. Imagine Cassandra’s shame if she were 80% accurate in predicting death – but on the 80% of occasions she correctly predicted death the person was dead only 80% of the time. If this sounds nonsensical – welcome to the world of surrogates. Even if an algorithm has 100% fidelity with the surrogate which defines its success it’ll still fall short of speaking the whole truth.
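The shame compounds multiplicatively. A minimal sketch of the arithmetic, using the 80% figures from the thought experiment above (the variable names are mine, purely illustrative):

```python
# Probabilities compound: an algorithm that is 80% accurate at predicting
# a surrogate, which is itself right about the disease only 80% of the
# time, is right about the disease just 64% of the time.
p_algorithm_correct_about_surrogate = 0.80
p_surrogate_correct_about_disease = 0.80

p_correct_about_disease = (p_algorithm_correct_about_surrogate
                           * p_surrogate_correct_about_disease)
print(round(p_correct_about_disease, 2))  # 0.64
```

This is why perfect fidelity with the surrogate is not the same as perfect fidelity with the truth – the surrogate’s own error rate sets a ceiling the algorithm can never clear.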

Since one doesn’t know which patient is a true positive or a false positive, one must assume that the algorithm is telling the truth. But the problem with prediction isn’t just the false positives – it’s also the true positives. The degree of creatinine rise which defines renal failure overdiagnoses renal failure. Furthermore, as the literature in contrast-induced nephropathy attests, random fluctuations in creatinine can meet the criterion for AKI.

Statisticians reproach doctors for using creatinine cut-offs – a phenomenon called “dichotomania.” They say that creatinine should be used as a continuous variable; if nature doesn’t obey thresholds, why should its custodians? But the mere act of making a decision draws lines even if nature is boundless. To act is to pause the continuous variable. The rage against dichotomania is a phenotype of dichotomania.

The problem, though, isn’t dichotomania; it isn’t the drawing of lines but where the line is drawn. The liberal definition of AKI means that many patients will be conscripted to the ranks of disease. The mild, self-resolving, AKI will not only be conflated with severe AKI but overwhelm severe AKI by its sheer number. AKI will be uncovered left, right and center. Just like podcasts have been devalued by the fact that nearly everyone has a podcast, the incidence of mild AKI will rise remorselessly and devalue severe AKI.

Cassandra, who knew for sure but couldn’t act, knew in vain. The quantitative Cassandras, who don’t know for sure but act, act in vain. Prediction is medicine’s Greek Tragedy. Machine learning could make acute kidney injury quotidian. The mathematical elegance of the quantitative Cassandra doesn’t compensate for her overall futility.

Saurabh Jha (aka @RogueRad) is a radiologist and a seer. Like Cassandra, no one believes his prophecies, until it’s too late. This post originally appeared in NephJC.