By SAURABH JHA
Sequels generally disappoint. Jason couldn’t match the fear he generated in the original Friday the 13th. But the sequel to the Parachute, a satirical piece that canvassed PubMed for randomized controlled trials (RCTs) comparing parachutes to placebo, matched the original’s brilliance, and even exceeded it, though the margin can’t be confirmed with statistical significance. The Parachute, published in BMJ’s Christmas edition, will go down in history with Jonathan Swift’s A Modest Proposal and Frederic Bastiat’s Candlemakers’ Petition as timeless satire whose pedagogy punched above its weight because of, indeed depended on, its absurdity.
In the Parachute, researchers concluded, deadpan, that since no RCT has tested the efficacy of parachutes when jumping from a plane, there is insufficient evidence to recommend them. At first glance, the joke was on RCTs and those who have an unmoored zeal for them. But that’d be a satirical conclusion. Sure, some want RCTs for everything, for whom absence of evidence means no evidence. But that’s because of a bigger problem, which is that we refuse to acknowledge that causality has degrees, shades of gray, yet causality can sometimes be black and white. Some things are self-evident.
In medicine, causation, even when it’s not correlation, is often probabilistic. Even the dreaded cerebral malaria doesn’t kill everyone. If you jump from a plane at 10,000 feet without a parachute, death isn’t probabilistic, it is certain. And we know this despite the absence of rigorous empiricism. It’s common sense. We need sound science to tease apart probabilities, and the grayer the causality, the sounder the empiricism must be to accord the treatment its correct quantitative benefit, the apotheosis of this sound science being an RCT. When empiricism ventures into certainties, it’s no longer sound science. It is parody.
If the femoral artery is nicked and blood spurts to the ceiling more forcefully than Bellagio’s fountains, you don’t need an RCT to make a case for stopping the bleeding, even though all bleeding stops, eventually. But you do need an RCT if you’re testing which of the fine sutures at your disposal is better at sewing the femoral artery. The key point is the treatment effect – the mere act of stopping the bleed is a parachute, a huge treatment effect, which’d be idiotic to test in an RCT. Improving on the high treatment effect, even or particularly modestly, needs an RCT. The history of medicine is the history of parachutes and finer parachutes. RCTs became important when newer parachutes allegedly became better than their predecessors.
The point of the parachute satire is that the obvious doesn’t need empirical evidence. It is a joke on non-judgmentalism, or egalitarianism of judgment, on the objectively sincere but willfully naïve null hypothesis where all things remain equally possible until we have data.
There has been no RCT showing that cleaning one’s posterior after expulsion of detritus improves outcomes over placebo. This is our daily parachute. Yet some in the East may justifiably protest the superiority of the Occidental method of cleaning over their method of using hand and water without a well-designed RCT. Okay, that’s too much information. Plus, I’m unsure such an RCT would even be feasible, as the crossover rate would be so high that no propensity matching could adjust for the intention to wipe, but you get my drift.
The original Parachute satire is now folklore with an impressive H-index to boot. That it has been cited over a thousand times is also satirical – the joke is on the H-index, a seriously flawed metric which is taken very seriously by serious academics. But it also means that to get a joke into a peer-reviewed publication you need to have a citation for your joke! The joke is also on the criminally unfunny Reviewer 2.
The problem with the parachute metaphor is that many physicians want their pet treatment, believing it to be a parachute, to be exempt from an RCT. This, too, is a consequence of non-judgmentalism, a scientific relativism where every shade of gray thinks it is black and white. One physician’s parachute is another physician’s umbrella. This is partly a result of the problem RCTs are trying to solve – treatment effects are probabilistic and when the added margins are so small, parachutes become difficult to disprove with certainty. You can’t rule out a parachute.
Patient: Was it God who I should thank for saving me from cardiogenic shock?
Cardiologist: In hindsight, I think it was a parachute.
Patient: Does this parachute have a name?
Cardiologist: We call it Impella.
Patient: Praise be to the Impella.
Cardiologist: Wait, it may have been the Swan-Ganz catheter. Perhaps two parachutes saved you. Or maybe three, if we include Crestor.
The problem with RCTs is agreeing on equipoise – a state of genuine uncertainty that an intervention has net benefits. Equipoise is a tricky beast which exposes the parachute problem. If two dogmatic cardiac imagers are both certain that cardiac CT and SPECT, respectively, are the best first-line test for suspected ischemia, then there’s equipoise. That they’re both certain about their respective modality doesn’t lessen the equipoise. That they disagree so vehemently with each other merely confirms equipoise. The key point is that when one physician thinks an intervention is a parachute and the other believes it’s an umbrella, there’s equipoise.
Equipoise, a zone of maximum uncertainty, is a war zone. We disagree most passionately about the smallest effect sizes. No one argues about the efficacy of parachutes. To do an RCT you need consensus that there is equipoise. But the first rule of equipoise is that some believe there’s no equipoise – this is the crux of the tension. You can’t recruit cardiac imagers to a multi-center RCT comparing cardiac CT to SPECT if they believe SPECT is a parachute.
Consensus inevitably drifts to the lowest common denominator. As an example, when my family plans to eat out there’s fierce disagreement between my wife – who likes the finer taste of French cuisine, my kids – whose Americanized palate favors pizza, and me – whose Neanderthalic palate craves goat curry. We argue and then we end up eating rice and lentils at home. Consensus is an equal opportunity spoilsport.
Equipoise has become bland and RCTs, instead of being daring, often recruit the lowest-risk patients for an intervention. RCTs have become contrived showrooms with the generalizability of Potemkin villages. Parachute’s sequel was a multi-center RCT in which people jumping from an aircraft were randomized to a parachute or a backpack. There was no crossover. Protocol violation was nil but there was a cheeky catch. The aircraft was on the ground. Thus, the first RCT of parachutes, powered to make us laugh, was a null trial.
Point taken. But what was their point? Simply put, parachutes are useless if not needed. The pedagogy delivered was resounding precisely because of the absurdity of the trial. If you want to generalize an RCT you must choose the right patients, sick patients, patients on whom you’d actually use the treatment you’re testing. You must get your equipoise right. That was their point, made brilliantly. The joke wasn’t on RCTs; the joke was on equipoise. Equipoise is now the safest of safe spaces; joke-phobic college millennials would be envious. Equipoise is bollux.
The “Parachute Returns” satire had a mixed reception with audible consternation in some quarters. Though it may just be me and, admittedly, I find making Germans laugh easier than Americans, I was surprised by the provenance of the researchers, who hailed from Boston, better known for serious quantitative social engineers than stand-up quantitative comedians. Satire is best when it mocks your biases.
The quantitative sciences have become parody even, or particularly, when they don’t intend satire. An endlessly cited study concluded that medical errors are the third leading cause of death. The researchers estimated the national burden of medical errors from a mere thirty-five patients; it was the empirical version of feeding the multitude – the story from the New Testament of the feeding of the 5,000 with five loaves and two fish. How can one take such researchers seriously? I couldn’t. I had no rebuttal except satire.
In the age of unprecedented data-driven rationalism, satire keeps judgment alive. To be fair, the statisticians, the gatekeepers of the quantitative sciences, have a stronger handle on satire than doctors. The Gaussian distribution has in-built absurdity. For example, because height follows a normal distribution, and the tails of the bell-shaped curve go on and on, a quantitative purist may conclude there’s a non-zero chance that an adult can be taller than a street light; it’s our judgment which says that this isn’t just improbable but impossible. Gauss might have pleaded – don’t take me literally, I mean statistically, I’m only an approximation.
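For the quantitative purist, here is a rough sketch of how small that non-zero chance is. The mean height, standard deviation, and street-light height below are illustrative assumptions, not measurements, and the tail is estimated with the standard large-z approximation Q(z) ≈ φ(z)/z, worked in logarithms because the number is too small for ordinary floating point.

```python
from math import log, pi
from statistics import NormalDist

# All three numbers are illustrative assumptions, not measurements.
MEAN_CM = 170.0          # assumed mean adult height
SD_CM = 8.0              # assumed standard deviation
STREET_LIGHT_CM = 600.0  # assumed street-light height

# How many standard deviations above the mean the street light sits
z = (STREET_LIGHT_CM - MEAN_CM) / SD_CM

# Naive upper tail: this far out the CDF rounds to exactly 1.0,
# so double precision reports the event as impossible.
naive_tail = 1 - NormalDist().cdf(z)

# Large-z approximation of the upper tail, Q(z) ~ pdf(z) / z, kept in
# logs: ln Q ~ -z^2/2 - ln(z) - 0.5*ln(2*pi), then converted to base 10.
log10_tail = (-z * z / 2 - log(z) - 0.5 * log(2 * pi)) / log(10)

print(f"z = {z:.2f}")
print(f"naive tail = {naive_tail}")
print(f"log10(tail) is roughly {log10_tail:.0f}")
```

Under these assumptions the model insists on a probability with hundreds of zeros after the decimal point, while the computer, showing better judgment, rounds it to zero.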
A statistician once showed that the myth of storks delivering babies can’t empirically be falsified. There is, indeed, a correlation in Europe between live births and storks. The correlation coefficient was 0.62 with a p-value of 0.008. Radiologists would love to have that degree of correlation with each other when reading chest radiographs. The joke wasn’t on storks but on simple linear regression, and for all the “correlation isn’t causation” wisdom, the pedagogic value of “storks deliver babies” is priceless.
If faith began where our scientific understanding ended, satire marks the boundaries of statistical certainty. Satire marks no-go areas where judgment still reigns supreme; a real estate larger than many believe. The irony of uncertainty is that we’re most uncertain of the true nature of treatment differences when the differences are the smallest. It’s easy seeing that Everest is taller than the Matterhorn. But it takes more sophisticated measuring to confirm that Lhotse is taller than Makalu. The sophistication required of the quantitative sciences is inversely proportional to the effect size it seeks to prove. It’s as if mathematics is asking us to take a chill pill.
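The Everest-versus-Lhotse point can be put in numbers with a minimal sketch, assuming a simple two-arm trial comparing means and the textbook normal-approximation formula for sample size. The function name `n_per_arm` and the effect sizes are hypothetical, chosen only for illustration. Under this formula the required sample grows with the inverse square of the effect size, so "inversely proportional" is, if anything, an understatement.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed per arm in a two-arm trial of means.

    effect_size is the standardized difference (difference / SD).
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = std_normal.inv_cdf(power)          # desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# From parachute-sized effects down to finer-parachute-sized effects
for d in (2.0, 0.5, 0.1, 0.02):
    print(f"effect size {d:<4} -> about {n_per_arm(d):,} patients per arm")
```

A parachute-sized effect needs a handful of patients; a finer parachute needs tens of thousands.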
The penumbra of uncertainty is an eternal flame. Though the conventional wisdom is that a large enough sample size can douse uncertainty, even large n’s create problems. The renowned psychologist and uber researcher Paul Meehl conjectured that as the sample size approaches infinity there’s a 50% chance that we’ll reject the null hypothesis when we shouldn’t. With large sample sizes everything becomes statistically significant. Small n increases uncertainty and large n increases irrelevance. What a poetic trade-off! If psychology research has reproducibility problems, epidemiology is one giant shruggie.
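The claim that large samples make everything statistically significant can be sketched with a two-sample z-test, assuming the observed difference is fixed at a clinically trivial 0.01 standard deviations (a hypothetical figure) while only the sample size grows. This illustrates the large-n problem in general, not Meehl's specific 50% conjecture.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect_size, n_per_arm):
    """Two-sided p-value of a two-sample z-test when the observed
    standardized difference equals effect_size exactly."""
    z = effect_size * sqrt(n_per_arm / 2)
    return 2 * (1 - NormalDist().cdf(z))

tiny = 0.01  # hypothetical, clinically trivial difference in SD units
for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n per arm = {n:>9,}   p = {two_sided_p(tiny, n):.3g}")
```

The difference never changes and never matters, yet somewhere between ten thousand and a hundred thousand patients per arm it crosses p < 0.05 and becomes a finding.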
When our endeavors become too big for their boots, satire rears its absurd head. Satire is our check and balance. We’re trying to get too much out of the quantitative sciences. Satire marks the territory empiricism should stay clear of. If empiricism befriended satire it could be even greater, because satire keeps us humble.
The absurd coexists with the serious and, like the pigs and farmers resembling each other in the closing scene of Animal Farm, it’s no longer possible to tell apart the deservedly serious from the blithering nonsense. And that’s why we need satire more than ever.
Congratulations to the BMJ for keeping satire alive.
About the Author
Saurabh Jha is a frequent author of satire, and sometimes its subject. He can be reached on Twitter @RogueRad