Last week, a study in the New England Journal of Medicine called into question the effectiveness of surgical checklists for preventing harm.
Atul Gawande—one of the original researchers demonstrating the effectiveness of such checklists and author of a book on the subject—quickly wrote a rebuttal on The Incidental Economist.
He writes, “I wish the Ontario study were better,” and I join him in that assessment, but I want to take it a step further.
Gawande first criticizes the study for being underpowered. I had a hard time swallowing this argument given that they looked at over 200,000 cases from 100 hospitals, so I had to do the math. A quick calculation shows that, given the rates of death in their sample, they had only about 40% power.
Then I became curious about Gawande’s original study. They achieved better than 80% power with just over 7,500 cases. How is this possible?!?
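As a sanity check, here is a rough sketch of both power calculations, using the normal approximation to a two-sided, two-sample test of proportions. The event rates and per-group sizes below are my own ballpark readings of the two studies, not their exact figures.

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def two_proportion_power(p1, p2, n_per_group, z_alpha=1.96):
    """Approximate power of a two-sided, two-sample proportion test
    at alpha = 0.05, via the normal approximation."""
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    return norm_cdf(abs(p1 - p2) / se - z_alpha)

# Ballpark figures (my assumptions): an Ontario-style comparison with a
# rare outcome and a tiny difference, versus a Gawande-style comparison
# with a higher baseline rate and a bigger drop.
ontario = two_proportion_power(0.0071, 0.0065, 100_000)
gawande = two_proportion_power(0.015, 0.008, 3_750)
print(f"Ontario-like power: {ontario:.2f}")
print(f"Gawande-like power: {gawande:.2f}")
```

With these assumed inputs, the Ontario-like scenario lands well under 50% power despite having more than twenty times the cases, while the Gawande-like scenario clears 80%—which is the puzzle the next paragraph resolves.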
The most important thing I keep in mind when I think about statistical significance—other than the importance of clinical significance—is that it depends not only on the sample size, but also on the baseline prevalence and the magnitude of the difference you are looking for. In Gawande’s original study, the baseline prevalence of death was 1.5%.
This is substantially higher than the 0.7% in the Ontario study. When your baseline prevalence approaches 0%, a given relative difference corresponds to a tiny absolute difference, so you have to pump up the sample size to achieve statistical significance.
So, Gawande’s study achieved adequate power because their baseline rate was higher and the difference they found was bigger. The Ontario study would have needed a little over twice as many cases to achieve 80% power.
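One way to see this is to invert the calculation: for a fixed relative reduction, solve for the per-group sample size that yields 80% power. Here is a sketch using the standard normal-approximation sample-size formula; the baseline rates and the 10% relative reduction are illustrative values I chose, not figures from either study.

```python
from math import ceil, sqrt

def n_per_group_for_power(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Per-group sample size for a two-sided, two-sample proportion
    test at alpha = 0.05 with 80% power (normal approximation)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# The same relative reduction (10%) at progressively rarer baselines:
# the rarer the outcome, the more cases you need.
for baseline in (0.015, 0.007, 0.003):
    n = n_per_group_for_power(baseline, baseline * 0.9)
    print(f"baseline {baseline:.1%}: ~{n:,} per group")
```

The required sample size grows rapidly as the baseline rate falls, which is exactly the bind the Ontario study was in.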
This raises an important question: why didn’t the Ontario study look at more cases?