I enjoyed Agatha Christie’s Hercule Poirot. Not only does the ingenious Belgian solve the murder artfully, but someone identifiable is killed and someone identifiable is the killer.
Epidemiological studies are whodunits, too, except that you don’t know who has been killed, what the murder weapon is, or who the killer is. You only know that a murder may have happened.
A study found a higher incidence of breast cancer in women with false-positive mammograms than in women with true-negative mammograms. The implication is that false-positive findings, findings thought to be cancer that turn out not to be, should prompt vigilance, not celebration.
Here’s an image to help put the absolute difference in perspective. Imagine a hall with 600 women with false-positive mammograms in the right aisle and 600 women with true-negative mammograms in the left aisle. Over 10 years, one more woman in the right aisle than in the left will develop breast cancer. Once we factor in lead time and overdiagnosis, that extra cancer will probably not reduce longevity.
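For readers who like to see the arithmetic, here is a minimal Python sketch of the hall image. It assumes only the figure above, one extra cancer per 600 women over 10 years; the function names are hypothetical, and the sketch says nothing about baseline incidence, which the image doesn’t give.

```python
from fractions import Fraction


def absolute_risk_difference(extra_cases: int, group_size: int) -> Fraction:
    """Extra cases per woman in one group compared with an equal-sized comparison group."""
    return Fraction(extra_cases, group_size)


def number_needed_to_harm(ard: Fraction) -> Fraction:
    """Women who must carry the marker for one additional case: the reciprocal of the ARD."""
    return 1 / ard


# Assumption: one extra cancer among 600 women over 10 years, as in the hall image.
ard = absolute_risk_difference(extra_cases=1, group_size=600)
print(f"absolute risk difference: {float(ard):.2%}")  # 0.17%
print(f"number needed to harm: {number_needed_to_harm(ard)}")  # 600
```

Exact fractions keep the numbers clean until the final printout; the takeaway is that the absolute difference is well under a fifth of a percentage point.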
Whether it is the tiny benefit of statins or a tiny absolute risk increase in epidemiological studies, no effect is too small to fret about. The authors, to their credit, handled the results modestly and merely suggested that false-positive status be used in predicting the risk of cancer, not that the false-positive result itself somehow increases cancer risk.
Effect size correlates poorly with media sensationalism. Coverage was extensive, partly because the idea of false positives increasing cancer risk is Twilight Zone-ish: just when you thought it was safe to go outside.
News stories about the study, dozens of them from major media outlets, mostly got their headlines right. They said that false-positive results were “linked to” or “tied to” or “associated with” increased cancer risk. That’s the responsible approach to reporting on the results of observational studies, no matter how big the data sample or how sophisticated the analysis. But a few stories made leaps of logic that just aren’t supported by the evidence.
How can a false-positive mammogram increase the risk of cancer? If all true-negative mammograms were reread in a parallel universe where radiologists called them positive, would any of those women develop cancer as a result? Merely calling a mammogram positive shouldn’t cause cancer, unless radiologists have magical powers I’m unaware of.
Here’s one explanation. Findings that raise concern for cancer are themselves a risk marker for cancer. In the mammography lexicon, the “probably benign” category (BI-RADS 3) includes findings that aren’t always benign; remember, it’s “probably” benign, not “certainly” benign. A small fraction of “probably benign” findings (less than 2%) turn out to be cancer.
The study’s “false-positive mammograms” included BI-RADS 3, but also BI-RADS 0, the category used when radiologists ask for more images because of something they see or can’t quite see. Suffice it to say that “false-positive mammogram” isn’t a homogeneous group. Mammograms are falsely positive for many reasons.
Furthermore, radiologists have different operating characteristics: they vary in sensitivity (the ability to find cancer) and specificity (the ability to correctly dismiss what isn’t cancer). Some radiologists call more false positives than others.
Another, more tenuous, explanation is that radiologists scrutinize the mammograms of women at higher risk of breast cancer more carefully. The extra scrutiny increases sensitivity and reduces specificity (think airport security on steroids), leading to more false positives. High risk leads to the false positive, and high risk also leads to more cancers.
It is tempting to conclude that, because false positives are so common, false-positive mammograms will create an epidemic of cancers at the population level. Let’s reason this through. Remember, false-positive mammograms don’t cause cancer. The probability of having at least one false-positive mammogram over ten years is about 60%. So if most women have a false-positive mammogram at some point, and if a false-positive mammogram signals a higher-than-average risk of breast cancer, then most women would be at above-average risk. This doesn’t make sense. How can most women be above average? Unless this is Lake Wobegon, where everyone is above average?
This study is a taste of big data – strange associations with small effect sizes, amplified by the media, confusing us and diminishing our pleasure in our finite time on this warming planet.
We have two choices: defer unconditionally to the statistics, or defer to plausibility and common sense. We can explain the results of this study, or question its methodology. Questioning a methodology is not questioning the researchers’ competence or integrity; a weak instrument can remain weak despite a yeoman effort.
The researchers studied a large database that did not lack for sample size: there were over 3 million in the original cohort. About 40% of patients were excluded because of missing data.
The researchers rightly excluded true-positive and false-negative mammograms from the analysis. There were four times as many false-negative as true-positive mammograms in the original cohort, which seems odd because it implies a sensitivity of mammography far lower than the well-established figure of 84–90%. But in a secondary database, the sensitivity and specificity of mammography, and the baseline rate of cancer, can’t be reliably computed. What other conclusions can’t be reliably drawn from the database?
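The oddity is easy to check. Sensitivity is the share of cancers a test catches: true positives divided by true positives plus false negatives. Here is a minimal Python sketch, assuming hypothetical counts in the 1:4 ratio described above; the absolute counts and the function name are illustrative, not taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of cancers that screening flags: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: only the 1:4 ratio of true positives to
# false negatives reflects the study; the absolute numbers are made up.
tp = 1_000
fn = 4 * tp

implied = sensitivity(tp, fn)
print(f"implied sensitivity: {implied:.0%}")  # 20%
print("below the 84-90% range:", implied < 0.84)  # True
```

Whatever the absolute counts, a 1:4 ratio works out to 1/(1 + 4), or 20%, which is why the database’s numbers look odd.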
The increased cancer risk associated with false-positive mammograms persisted across the categories of breast density. This is important because breast density affects the sensitivity and specificity of mammography.
The relative risk of cancer associated with false-positive mammograms was highest in the least dense breasts: the hazard ratio was 1.7 for the least dense breasts and 1.4 for the most dense. This finding is both odd and not odd. I can explain it: the less dense the breast, the more likely an abnormality the radiologist sees is real, whether it is cancer or a risk marker for cancer.
My point is this: if you try hard enough, you can explain tiny effects in a study such as this one after the fact. It’s harder to know the limitations of the methodology, the instrument that produced the tiny effect. All that needs to be said is “we adjusted for” and the case is closed.
There is an asymmetry between two unknowns: the validity of what we think we know about the world, and the validity of the methodologies that challenge what we think we know. Multiplied by big data, this asymmetry could accumulate fallacies at a scale one struggles to comprehend.
Will this study change clinical practice? Will radiologists call fewer false positives for fear of their association with cancer, or more false positives for fear of missing a risk marker for cancer? It’s hard to say, but this study is unlikely to reduce anxiety in this anxiety-rich corner of medicine.
I never could predict the plot twists at the end of Poirot’s investigations. Spoiler alert: watch Death on the Nile carefully! But even Hercule Poirot would have been ill prepared for the ultimate plot twist in epidemiological whodunits: no one was murdered.
Saurabh Jha is skeptical by nature, not because he hates you. He can be reached on Twitter @RogueRad. This post was first published in Health News Review.