U.S. health care policies should be based on solid evidence, especially those policies with life-and-death consequences. All too often, though, they are not. Consider the recommendation by congressional advisors that the government should favor basic ambulances with only minimal equipment and less-trained staff over advanced ambulances with more life-saving equipment and better-trained staff. A poorly controlled study claimed that patients were more likely to die during or after riding in the advanced ambulances than in the basic (but cheaper) ambulances.
Why would “basic” ambulances (with less life-saving equipment and less-trained staff) be better than the more advanced ambulances? They probably were not, and we’ll show how the data supporting the benefits of “basic” ambulances are unreliable and often confuse cause and effect. Worse perhaps, the study offers yet another example of economic research devoid of context generating dubious national policy.
The Study
Researchers at the University of Chicago and Harvard Medical School used insurance data to examine how well a large sample of Medicare beneficiaries fared after ambulance transport for out-of-hospital emergencies. They compared those sent in basic life support ambulances vs. people transported in advanced life support ambulances.
The results, published in the Annals of Internal Medicine, are of course counterintuitive: patients transported to the hospital in Advanced Life Support ambulances were more likely to die than those riding in the simpler, basic ambulances.
Questioning the Study’s Continuing Impact
The study is continuing to reverberate in research and policy circles. Almost a year after its publication, it was vigorously debated at the prestigious conference of the American College of Emergency Physicians (ACEP). Most emergency medicine leaders there questioned the veracity of the study and wondered about the absence of collaboration with emergency medicine specialists, who are trained in ambulance dispatch decisions that match the severely ill or dying patients with the most effective emergency transport. The emergency physicians also expressed worry that Medicare might reduce reimbursement for advanced ambulances on the basis of this unreliable study.
More troubling, in a recent interview, one national expert, Dr. Howard Mell, worried that the study represented a “ticking time bomb” (personal communication) given the new president’s emphasis on cuts to health care.
Most severely ill Medicare patients are currently transported by advanced life support ambulances, so it would be great if the cheaper ones were better. But there are several reasons why it’s premature for policymakers to act on this finding—yet another untrustworthy study with poor research methods, but much media hype.
The Evidence: Confusing Cause and Effect
First, the study seems to assume that severely ill patients are assigned to either an advanced or a basic ambulance by the flip of a coin. But this is not the case. Policy demands that advanced ambulances be sent to sicker patients, those farther from the hospital, and those more likely to die en route. It’s not random selection (as in gold-standard randomized trials); it’s often carefully considered selection.
Second, the study is almost devoid of data on how sick the patients were in each type of ambulance—exactly the sort of information emergency services often consider in assigning advanced life support ambulances in an effort to save critically ill patients.
This study wrongly creates the impression that advanced ambulances cause more deaths. In fact, they transport patients who are already more likely to die.
One large study shows that advanced ambulance teams are twice as likely as basic ambulance teams to pick up people in respiratory distress (serious breathing conditions), which results in more deaths. In other words, people who are barely breathing are 100% more likely to get advanced ambulances, making it appear that advanced ambulances “cause” more deaths when the opposite is true. People who can’t breathe, and are therefore more likely to die, are sent advanced ambulances in an effort to save their lives (the very definition of triage).
As can be seen in the above figure, patients transported in advanced ambulances were far more likely to be suffering life-threatening conditions. They were almost twice as likely to have supplemental oxygen, were a third more likely to be admitted to the hospital, and had 12 times the rate of electrocardiograph monitoring. Moreover, only patients in the advanced support ambulances had intravenous lines.
The graph above likewise illustrates the more severe clinical conditions of patients transported in advanced ambulances. They were a fifth more likely to have very low blood pressure, and a third more likely to have very high blood pressure. They were almost twice as likely to have asthma or emphysema, and almost four times more likely to suffer from respiratory depression.
This is hardly the result of a flip of the coin. Remember also that all of these conditions occurred before hospitalization. The advanced ambulance patients were more likely to die from a large number of severe conditions that were already present when they were picked up. It is virtually impossible to claim that the health of patients transported by basic and advanced life support ambulances was the same.
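To make the cause-and-effect confusion concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the Annals study; the severity labels, the counts, and the helper names (COUNTS, crude_mortality, stratum_mortality) are all assumptions. It supposes that advanced life support (ALS) ambulances lower the death rate for both severely and mildly ill patients, but that dispatchers send most severe cases to ALS. A comparison that ignores severity then makes ALS look worse.

```python
# Hypothetical counts, invented for illustration only; NOT data from the study.
# (severity, ambulance type) -> (patients transported, deaths)
COUNTS = {
    ("severe", "ALS"): (4500, 1350),  # 30% die
    ("severe", "BLS"): (500, 200),    # 40% die
    ("mild", "ALS"): (1000, 40),      # 4% die
    ("mild", "BLS"): (4000, 200),     # 5% die
}


def crude_mortality(ambulance):
    """Death rate for one ambulance type, ignoring how sick the patients were."""
    patients = sum(n for (_, a), (n, _) in COUNTS.items() if a == ambulance)
    deaths = sum(d for (_, a), (_, d) in COUNTS.items() if a == ambulance)
    return deaths / patients


def stratum_mortality(severity, ambulance):
    """Death rate for one ambulance type among patients of equal severity."""
    patients, deaths = COUNTS[(severity, ambulance)]
    return deaths / patients


# Like-for-like comparison: within each severity group, ALS has fewer deaths.
for severity in ("severe", "mild"):
    print(severity,
          "ALS:", round(stratum_mortality(severity, "ALS"), 3),
          "BLS:", round(stratum_mortality(severity, "BLS"), 3))

# Crude comparison: ALS looks worse, only because it carries the sicker patients.
print("crude",
      "ALS:", round(crude_mortality("ALS"), 3),
      "BLS:", round(crude_mortality("BLS"), 3))
```

In this made-up example the crude ALS death rate is roughly 25% against roughly 9% for BLS, even though ALS does better within each severity group. The reversal comes entirely from who gets sent which ambulance, which is the kind of selection that insurance claims data with little information on patient severity cannot rule out.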
What’s more, basic ambulances are more likely to pick up patients from skilled nursing facilities, where they are used as taxis to move healthy elderly to routine medical procedures—not usually near-death events.
Yet another problem with the study is that it is correlational: it compares patients during the same period of time. There is no calculation of any change in (or “effect” on) deaths. This type of design is so weak that the international scientific body that reviews tens of thousands of medical studies (Cochrane) immediately rejects it as evidence, even before examining any other weaknesses.
Not surprisingly, in an online forum, emergency medical technicians who attempt to resuscitate seriously ill patients reacted skeptically to the study’s findings.
One EMT responded, “We don’t send basic life support ambulance to a head-on car crash on a freeway.”
Another stated, “A basic unit probably won’t be activated for an elderly person who’s difficult to arouse, complaining of chest pain.”
And another: “… if you are receiving an advanced life support ambulance from the start, it is because the dispatch center infers your situation is severe enough to require ALS [an advanced life support] ambulance. So you are already more likely to experience poorer outcomes if you are being sent an ALS [the advanced life support] ambulance.”
Despite these limitations, the article’s authors proclaimed the salutary implications of their study, even calculating that the country would save many millions by abandoning more expensive advanced ambulances.
And, as usual, media outlets, including the Washington Post, trumpeted the findings under exaggerated headlines: “Need an ambulance? Why you might not want the more sophisticated version.”
Health policies with life-and-death consequences deserve strong evidence. At present, too much of the research in this field is weak. The National Institutes of Health have identified a disturbing phenomenon, the non-reproducibility of science. Unreliable studies confuse doctors, scientists, local politicians, and policymakers. They can also harm patients. Less expensive medical care is sometimes more effective, wiser, and more humane. But if the federal government overgeneralizes that message from unreliable studies on emergency care, we may see fewer patients survive their journeys to the hospital. More generally, we must not let weak research dictate policy that appears to be politically attractive but is foolish or harmful.
Stephen Soumerai is Professor of Population Medicine and teaches research methods at Harvard Medical School and the Harvard Pilgrim Health Care Institute.
Professor Ross Koppel teaches research methods and statistics in the Sociology department at the University of Pennsylvania and is a Senior Fellow at the Leonard Davis Institute (Wharton) of Health Economics. He is also affiliate professor of medicine at Penn’s medical school.
This is a wonderful representation of the misapplication of research. Although we are inundated daily with the focus on data-driven healthcare decisions and data-driven policy formation, we are truly rudimentary in our interpretation of results as a society.
Interestingly, I had to laugh to myself when in the body of this article you spoke of the Cochrane database. I recently reviewed a particular surgical technique that the Cochrane Database of Systematic Reviews published. Unfortunately, Milliman Guidelines used Cochrane information to decide whether or not they’re going to cover the particular surgical intervention. Imagine my surprise when I dug deeper into the Cochrane Database and found that their decision to deny the procedure was based on ONE STUDY!
The point here is, evidence-based medicine as well as evidence-based policy has a long way to go to making genuine differences.
I’m not terribly persuaded by the “correlation doesn’t equal causation” argument in any context, including the Annals of Int Med paper on ambulances that Steve and Ross write about. If a correlation is tight, then it’s reasonable to infer causation and to act accordingly. If we could trust the correlation between higher mortality rates and transportation by advanced life support (ALS) ambulances reported by the Annals paper, I for one would prefer to be picked up by a basic life support (BLS) ambulance for non-heart attack emergencies even if I didn’t know exactly what was causing the alleged higher mortality in the ALS ambulances.
But I don’t trust the alleged correlation. The lesson I draw from Steve and Ross’s essay is that the correlation reported by the Annals study is not only not tight, it is probably bogus. The correlation is almost certainly bogus because the patients transported by the ALS ambulances are closer to death than those picked up by BLS ambulances, and the authors were unable to adjust their mortality data accurately. They could not adjust accurately because they were working only with administrative data; that data does not include information on numerous confounders; and the two methods they used to eliminate confounders (propensity scoring and instrumental variable analysis) were based on sweeping, vaguely articulated assumptions for which the authors offered no evidence.
The authors noted that their propensity scoring method “is susceptible to confounding by any unobserved patient characteristics associated with survival and ALS use; however, because ambulance dispatch protocols prioritize ALS for the conditions we studied, such individual-level confounding is plausibly minimal.” I have no idea what that means. I’m especially annoyed by “plausibly minimal.” I don’t know what “minimal” means or why I should agree, and the weasel word “plausibly” only adds to my bafflement.
The authors noted their use of “the instrumental variable of county-level variation in overall ALS prevalence to predict the likelihood that a patient would receive ALS … is less susceptible to confounding by unobserved patient characteristics but is subject to confounding by associations between rates of ALS use and other county characteristics that affect mortality.” Again, what on earth does this mean? They say the instrumental variable method “is less susceptible to confounding.” Less susceptible than what? Their dubious propensity scoring method? Why is that the gold standard? And how much “less”?
If the authors had clearly identified a mechanism or a set of mechanisms that could explain why ALS patients would suffer higher mortality rates long after they left their ambulance, this paper might be a little more believable. But they don’t. They don’t even lay out a clearly stated hypothesis.
This paper illustrates the cavalier attitude toward evidence that has characterized the managed care movement over the last half century.
“our growing refusal to distinguish between correlation and causation when it suits our purposes”
Jason, it is good to hear you say this. This is something that frequently needs more attention. Add to it the proper use of data and statistics and you might find improvement in some of your views.
Excellent. For more examples of accepted (though harmful) current practice based on bad research see any of Nortin Hadler’s books. And in the world of nutrition (influencing diabetes and cardiology), see Nina Teicholz’s The Big Fat Surprise.
This is a great article and speaks to a real issue in our data-driven society – our growing refusal to distinguish between correlation and causation when it suits our purposes.
Statistics and metrics are just a tool and can be used to justify anything. While people bemoan liberal arts degrees in favor of STEM, we need to have more people who can properly and critically analyze data as the authors have done.
I think this adverse selection dilemma seeps into the most elegant and well-designed clinical trials. Imagine you are a newly diagnosed rheumatoid arthritic and your doc wants you to enroll in a RDBCT. You are told that you will be randomized into two groups. One group will get the latest cutting-edge experimental hope for a disease-modifying medicine... but it might not be as good as our current treatment and there might be surprising side effects. The other group will receive the best status quo drug that we have at the present time. Maybe you have talked to friends, some of whom have been treated with present-day drugs and are satisfied with their therapy. Or you have read the NYT and understand the pretty-good outcomes we are achieving at present with the treatment of RA.
What type of patient is going to agree to enter the clinical trial?
Is each arm of the trial going to be enriched with certain types of patients? What if you were young and adventuresome and knew little about RA and wanted to please your doctor?
The National Institutes of Health have identified a disturbing phenomenon, the non-reproducibility of science. …
There was an article in The New Yorker in 2010 entitled THE TRUTH WEARS OFF in which the author discusses the fact that a positive finding in a scientific trial becomes less strong/significant when the trial is serially repeated. Maybe the “non-reproducibility” is the result of our failure to fully understand the implications of the design of a scientific experiment or our failure to believe that “97% certainty” means that 3/100 times a different result will occur.
You have to see the logic here. If a study shows that you can save money, of course they’ll go for it, validity be damned.