The Dartmouth Team Responds (Again)

Reed Abelson and Gardiner Harris, the authors of the June 4th New York Times article critical of the Dartmouth Atlas and our research, have acknowledged the concerns raised by Elliott Fisher and me and clarified the record in their posting on the New York Times website. They originally claimed that we failed to price-adjust any of the Atlas measures. They now acknowledge that we do, but say the price-adjusted measures are hard to find on the Atlas website, a point we concede. They originally claimed that quality measures were not available on the Atlas website. They now acknowledge that quality measures are on the website, but they don’t like them. We agree quality measures can be better (the type of research we do is always open to improvement), and Dr. Fisher has recently co-chaired an NQF committee with precisely this goal. (See our more detailed response.)

But the primary purpose of this posting is to respond to Mr. Harris’s attack on the professional ethics of the Dartmouth researchers. The key issue seems to be whether the Dartmouth researchers misrepresented the research in two landmark 2003 Annals of Internal Medicine articles (here and here). In his posting Mr. Harris asserts:

In an aside, when was the last time you saw researchers so profoundly mischaracterize their own work? How is it possible that they could claim their annals pieces concluded something when they didn’t? I can’t remember ever seeing that happen.

We are disappointed by this accusation. We can understand Mr. Harris’s frustration in trying to understand the research, as it is often nuanced and tricky to follow. This difficulty is illustrated by their recent New York Times posting, where they state:

In statistical terms, [the Dartmouth researchers’] claim is referred to as a negative correlation between spending and health outcomes, which means that when spending goes up, the health of patients goes down.

They have confused the idea of a correlation (high spending hospitals on average do slightly worse on quality and outcomes) with causation (if a hospital spends more money, outcomes for those patients will get worse).

The more fundamental point, however, is their claim that we misrepresented the two 2003 Annals of Internal Medicine studies written by Dr. Fisher and others. Ms. Abelson and Mr. Harris state that

The Dartmouth work has long been cited as proving that regions and hospitals that spend less on health care provide better care than regions and hospitals that spend more…. As the article noted, [Dr. Fisher] asked in Congressional testimony last year, “Why are access and quality worse in high-spending regions?”

Now we come to their smoking gun(s):

Those [Annals] studies did conclude that there was no association between higher spending and better health, but they did not show any link between higher spending and worse health.

One of the paper’s arguments was summarized in the abstract this way: “Neither quality of care nor access to care appear to be better for Medicare enrollees in higher-spending regions.”

The second paper’s summary: “Medicare enrollees in higher-spending regions receive more care than those in lower-spending regions but do not have better health outcomes or satisfaction with care.”

Ms. Abelson and Mr. Harris are correct to note that the studies conclusively rule out the hypothesis that more spending is associated with better outcomes, the major point of the papers. But anyone who reads both articles will come away with more than that: a finding that outcome measures are worse on average in high-cost regions. We did a quick tabulation of the large number of outcome measures in the articles, a total of 42 different measures (these are reported in more detail in our background paper). Of the total, 23 showed significantly worse outcomes in high-spending regions, 14 showed no significant effects, and just 5 showed significantly better outcomes in high-spending regions.

In other words, if one were to construct an index of quality, it would show nearly 5 significantly negative quality measures in the high-cost regions for every significantly positive one (23 versus 5, a ratio of 4.6 to 1).
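The arithmetic behind this tally is simple enough to check directly. Below is a minimal Python sketch that uses only the counts reported above; the helper name `negative_per_positive` is hypothetical, and the sketch reproduces only the ratio, not the significance tests behind each individual measure.

```python
# Counts from the tabulation of the 42 outcome measures in the two 2003 Annals papers.
worse = 23      # significantly worse outcomes in high-spending regions
no_effect = 14  # no significant difference
better = 5      # significantly better outcomes in high-spending regions


def negative_per_positive(negative: int, positive: int) -> float:
    """Return the number of significantly negative findings per significantly positive one."""
    if positive == 0:
        raise ValueError("ratio is undefined when there are no positive findings")
    return negative / positive


total = worse + no_effect + better
ratio = negative_per_positive(worse, better)

print(f"total measures: {total}")             # total measures: 42
print(f"negative per positive: {ratio:.1f}")  # negative per positive: 4.6
```

The ratio of 4.6 is what the text rounds to "nearly 5"; the 14 measures with no significant effect do not enter the ratio.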

So when Abelson and Harris claim that Fisher and others are overstating the results of the papers, they are wrong. Perhaps this simply reflects a lack of experience in reading and interpreting scientific papers. But that is no excuse for making unfounded accusations against us.

We have also been disappointed at the adversarial nature of the process.

It began in February with Mr. Harris announcing at the beginning of an interview that he was going to “take down” the Dartmouth Atlas (as documented in Maggie Mahar’s blog).  And it is ending (we hope) with Mr. Harris’s posting on The Health Care Blog questioning our ethical standards.  This saddens us because of the missed opportunity to improve the dialogue in Washington and elsewhere about the strengths and limitations of research on regional variation.


  1. Jus10:
    The point is that using death as an endpoint to determine whether or not spending influences “outcomes” is only helpful when you measure all of the “outcomes.” Like survival. Otherwise you have an incomplete dataset that’s worse than useless when trying to evaluate what saves lives and what doesn’t. It’s actually dangerous when idiots who take it at face value use it to allocate scarce resources.
    To get back to the brake example. The point isn’t that ABS and a kite are equally effective ways to slow down a car. They aren’t. But if you only looked at incidents that ended in crashes, and not the rate at which drivers employing the respective braking systems crashed (hint: this is analogous to a data set that only includes people who died), there’d be people out there who are dumb enough to conclude that kites are just as effective as ABS. If they were as dumb as the clowns that compose the Dartmouth Atlas and their fans, they’d go further and argue that anything more expensive than a kite is a waste of money and propose a bureaucratic mechanism that would incentivise the adoption of kite-only braking systems.

  2. @Yaj: Following the logic of the previous example, if $3000 anti-lock brakes don’t work, why would you want to spend the money on them? If they are both equally ineffective, spend the $5 on the kite and have some fun.

  3. How about addressing the argument that the central metric by which your Atlas judges quality of care is terribly flawed?
    That is, you only look at spending and outcomes for patients that died, instead of looking at the impact that the said spending had on relative survival rates.
    Car A has an anti-lock braking system that costs $3000. Car B hangs a $5 kite out the window to increase air resistance. Both crash after the brakes were applied. If Dartmouth Atlas applied the same methods to this data set that they do to their analysis of medical spending and “outcomes,” they’d conclude that anti-lock brakes are no more effective than the kite-drag brake, and 600 times more expensive. Ergo we could achieve the same outcomes and spend 600 times less by getting rid of anti-lock brakes on all cars.
    Speaking of correlations, how strong is the evidence that there’s either a causal or reliable relationship between patient satisfaction and clinical efficacy? If anything, the level of satisfaction reported by people treated by, say, homeopaths is *higher* than that of persons treated by physicians. Unfortunately, there is also no credible evidence whatsoever that administering trillion-fold dilutions of substances that may or may not have any physiological effect is even a plausible modality for treating any illness, much less a clinically effective one.
    The Dartmouth Atlas is a fraud and a travesty that will significantly impair both the efficacy and the efficiency of medical care in the United States.

  4. Really?
    The question at hand is not with the conceptual findings, that there is variation, but with the root cause. And yes, since people are paying attention, the scientists should welcome critique such that they don’t become the same demons that they criticise for speaking up, or God forbid the ones that are over utilizing care! It isn’t fully transparent, it isn’t causal and it has lots of room for improvement, but it is worth the fight to improve it and make it even better!
    Everyone needs to pause and take a breath and remember why we do what we do.
    Communications is a 100/100 proposition, if people don’t understand it, you haven’t explained it. If that fails, find a new line of research because that is the ultimate in informed medical decision making!

  5. In reading this thread I am again struck by how little experience the writers and commentators actually have with the management of health care. Without actual experience, the arguments tend to devolve into petty nitpicking… none of which will lead to either understanding or solutions to our greatest health care problem. That is, costs and premiums continue to rise to levels we cannot afford. I can attest, from personal health care management experience, that the medical cost of unnecessary treatment (over-utilization) in the U.S. is well over 30%.
    That people are still arguing this point shows how little the so-called “experts” really understand about our health care system. It has been quite a while since I visited this blog. Unfortunately, the posts seem just as argumentative, silly, or poorly informed as they were six months ago. It’s sad really.

  6. I would encourage you to take this matter up with the new public editor at the New York Times, Arthur S. Brisbane, whose appointment was announced a few days ago. You’ll never win an argument with Abelson and Harris, but Brisbane might be less arrogant and more intelligent and therefore able to see things differently. He may not have begun work just yet, but probing the embarrassing work by Abelson and Harris certainly would get him off to a rousing start.

  7. First Richard Cooper goes after the Dartmouth Atlas. Then it’s Peter Bach and now the NYT. Each got taken to the woodshed by Skinner and Fisher for not reading the academic papers carefully. Each was motivated by a ‘search for sensationalism’ (an expression used by one of the posters above) and simplified the Dartmouth message to the point of getting it all wrong.
    All this said, there must be a more reasoned, careful critique of the Dartmouth work.

  8. I love you Dartmouth guys! Nice work and great responses.
    Too bad NYT just lost some major points on their smarts.

  9. Since there hasn’t been an RCT of health insurance in this country since the Rand Experiment in the 70s, there isn’t going to be a definitive answer on health care spending. Given that there likely isn’t going to be another RCT similar to that experiment ever again in the U.S., for several reasons, you have to look for other data sets to make inferences about cost and quality.
    The Dartmouth Atlas of Health Care is one potentially very interesting way of looking at the conundrum of cost and quality. Granted, it is limited in some respects (which the authors generally acknowledge), but to dismiss it entirely is incredibly foolhardy. If the authors wanted to present a more detailed picture, they should look at some of the research that is emerging on quality-based purchasing and its attempts to analyze costs/quality.
    It amazes me though that the press still generally does a terrible and inept job of conveying the point about causation vs. correlation. This happens in a number of fields.
    The best example I can think of offhand is among sports journalists who insist certain players are ‘more clutch’ despite ample statistical evidence that either debunks those claims or actually proves those players are worse than the norm.

  10. Barry,
    I am skeptical whether you will find enough docs “like at Mayo” to cover the entire US. There is nothing super special about them, and many academics from other institutions are better researchers and/or better clinicians than their respective Mayo counterparts. But again, we are talking about covering the entire US. One should try to enhance a patient- and EBM-centered culture everywhere, but I think one should be realistic that this is not possible everywhere.
    I agree with you on the need for real tort reform (an issue unfortunately not recognized as relevant by a lot of progressives, i.e. often unfairly confined to the conservative camp). No doctor making a reasonable effort should be sued (he/she might be subject to state board action if found to be incompetent, but that should be a different matter for repeat offenders). One could do even a small-scale reform that is infinitely more useful than caps, e.g. Haskel M: A proposal for addressing the effects of hindsight (…) Tort Trial and Insurance Practice Law Journal 2007:42:3

  11. rbar,
    I think there are probably plenty of doctors around the country as smart as most or all of those at Mayo. Mayo’s use of electronic records can and probably will spread more broadly over time. Their multi-specialty group practice model is far from unique and is likely to be more broadly replicated. What can’t be easily reproduced is their collaborative, collegial, PATIENT CENTERED culture. Even Rochester Mayo has not been able to completely replicate its culture at its other locations. Their large endowment and their niche in lucrative executive physicals, which help sustain their organization financially, can’t be easily reproduced either.
    I agree with you about the low hanging fruit, especially no MRI for an ordinary headache. However, doctors are likely to need robust tort reform before they will embrace more conservative practice patterns when it comes to diagnostic imaging and other testing. I define tort reform, not as damage caps, but as robust safe harbor protection from lawsuits when evidence based guidelines are followed where they exist and the use of special health courts instead of juries to resolve medical disputes.

  12. With large-scale statistical research, one should always try to square it with the intuition and common sense of people involved in the matter at hand.
    I think it is just common sense that greater resources can result in better outcomes if the money is spent wisely. Spending wisely can be achieved by 1) simple rules (no brain MRIs for typical migraines), 2) more complex rules (agent x prolongs survival in cancer z at stage TxNyMz), and in many cases, 3) complex considerations taking many factors into account.
    Obviously, if we want to spend money wisely, we would first pick the low hanging fruit at 1) and 2). Re. 3) that’s what you need excellent physicians for (and well informed patients can help as well). I personally have seen physicians smarter than I am making strange and suboptimal judgment calls … but for instance, I have never seen anything really bad coming out of Rochester Mayo, probably a result of optimal working conditions for smart physicians with good judgement and clinical skills. We should always try to educate better and better clinicians, but frankly, I am skeptical that we can get Mayo level care everywhere. That’s why our efforts have to be targeted towards 1) and 2).

  13. I’ve been a journalist for 26 years. Journalists have always been bad at reporting risk. Still are, as this episode proves.

  14. The troubling question for me is that, since journalists frequently misinterpret scientific research (note the recent complaints about their over-glamorizing the new melanoma research, for instance) and patients are out there actively looking at scientific papers, what do we in the profession do to avoid widespread misinterpretations? The average American is abysmally educated on this subject, and its nuances are even difficult for the average private practitioner M.D., such as myself.
    The journalistic bias and search for sensationalism noted by the Dartmouth authors is, unfortunately, somewhat endemic among the press. It’s difficult to combat and, ultimately, they are going to have to let the press have the last word just to satisfy their juvenility.

  15. To me, this is how this whole, messy, disappointing debate boils down: the NYT is criticizing the Dartmouth researchers for implying that high-spending regions provide worse care than low-spending regions. This isn’t really true; what the Dartmouth papers say is that despite spending lots of money, high-spending areas have more or less equally bad care as everywhere else in the US. Therefore, they are probably wasting a lot of money since they are not getting any more value, and in that way they provide worse care, “worse” meaning less efficient.
    This is my interpretation of the overall thesis of much of the Dartmouth team’s work and it seems like Abelson and Harris are trying to catch Fisher, Skinner et al falling into the semantic trap of saying that their findings imply that high-spending regions provide poor care, when they mean poor value care. This may be a lesson for the team at Dartmouth that when your research becomes influential and important enough that everyone is paying attention, you have to be very careful about what you say. In the end, I am still convinced that the Dartmouth researchers have made incredible contributions to health policy which stand on their own, no matter what the PowerPoint slides or congressional testimony says. But many people will interpret Abelson and Harris’ article to mean that this research is fundamentally flawed, which it certainly isn’t, at least no more so than any other policy papers. That’s the part of this little saga that I find most depressing.
