Paying Doctors for Outcomes Makes Sense in Theory. So Why Doesn’t It Work in the Real World?

For decades, the costs of health care in America have escalated without comparable improvements in quality. This is the central paradox of the American system, in which costs outstrip those everywhere else in the developed world, even though health outcomes are rarely better, and often worse.

In an effort to introduce more powerful incentives for improving care, recent federal and private policies have turned to a “pay-for-performance” model: Physicians get bonuses for meeting certain “quality of care standards.” These can range from demonstrating that they have done procedures that ought to be part of a thorough physical (taking blood pressure) to producing a positive health outcome (a performance target like lower cholesterol, for instance).

Economists argue that such financial incentives motivate physicians to improve their performance and increase their incomes. In theory, that should improve patient outcomes. But in practice, pay-for-performance simply doesn’t work. Even worse, the best evidence reveals that giving doctors extra cash to do what they are trained to do can backfire in ways that harm patients’ health.

The stakes are high. Britain, with a much different health system — single payer — has embraced pay-for-performance in a big way, spending well over $12 billion on such programs in 12 years. And pay for performance is a feature of virtually every major health program in the US.

While cost estimates are scarce, regulations intended to incentivize doctors for quality and efficiency cost physicians more than $15 billion just for documenting their actions. In yet another assault on common sense, Congress passed an enhanced pay-for-performance law (“MACRA”) that went live January 1.

Blame for the wasteful embrace of pay-for-performance measures can be traced to at least two sources. First, an overreliance on economic theory in the absence of empirical testing. (Of course performance will get better if you pay people for outcomes, an Econ 101 student might say.) Second, numerous studies have purported to show that health outcomes improve when doctors’ pay is pegged to performance outcomes, yet these studies have fatal flaws.

Many such studies suffer from what’s known as “history bias.” That is, they tend to treat any positive health trend after the introduction of performance pay as the result of that payment system. But it’s often the case that the positive trend predates the introduction of the treatment.
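
To make history bias concrete, here is a minimal Python sketch on made-up numbers. It assumes a quality score that climbs by half a point every month for 84 months, a policy introduced at month 40, and no policy effect at all. The function names (`simulate_scores`, `naive_before_after`, `trend_adjusted_effect`) and every number are illustrative assumptions, not data from any study discussed here. The second estimate projects the pre-policy trend forward, a simplified stand-in for the segmented regression used in interrupted time-series research.

```python
from statistics import mean


def simulate_scores(n_months=84, baseline=50.0, slope=0.5):
    """Hypothetical monthly quality scores that rise steadily with NO policy effect."""
    return [baseline + slope * t for t in range(n_months)]


def naive_before_after(scores, policy_month):
    """Average score after the policy minus average score before it (ignores the trend)."""
    return mean(scores[policy_month:]) - mean(scores[:policy_month])


def fit_line(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def trend_adjusted_effect(scores, policy_month):
    """Average gap between observed post-policy scores and the projected pre-policy trend."""
    pre_months = list(range(policy_month))
    intercept, slope = fit_line(pre_months, scores[:policy_month])
    gaps = [
        scores[t] - (intercept + slope * t)
        for t in range(policy_month, len(scores))
    ]
    return mean(gaps)


if __name__ == "__main__":
    scores = simulate_scores()
    print("naive before/after difference:", naive_before_after(scores, 40))
    print("trend-adjusted effect:", trend_adjusted_effect(scores, 40))
```

With these assumed numbers, the naive comparison credits the policy with a 21-point gain that is nothing but the pre-existing trend, while the trend-adjusted estimate is zero. Real analyses also have to handle noise, seasonality, and autocorrelation, which this sketch ignores.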

The failure of pay for performance has been demonstrated repeatedly in scientific studies. In a recent article in the CDC’s Preventing Chronic Disease, we showed that much of the early research on the supposed success of pay for performance was conducted with serious research design flaws. For example, in the UK, effective treatment of high blood pressure had been increasing for years, well before pay-for-performance measures designed to improve blood-pressure treatment began. Doctors had been getting better at identifying patients with high blood pressure, and drug treatment regimens had been improving. But the early research inappropriately credited pay for performance with all the improvements that followed its introduction.

Consider the following graph, from a major study evaluating the United Kingdom’s pay-for-performance policy where diabetes is concerned. It purported to find a major positive effect. The red dashed line shows where the rewards program began:

Figure 1. Mean clinical quality scores for diabetes treatment at 42 practices participating in a study evaluating pay-for-performance in the UK. The scale for scores ranges from 0 percent (no quality indicator was met) to 100 percent (all quality indicators were met for all patients). Campbell SM, Reeves D, Kontopantelis E, Sibbald B, Roland M. Effects of pay for performance on the quality of primary care in England. N Engl J Med 2009;361(4):368–78.

The key problem here is that the researchers use only two data points during the long period before the program was implemented, and two data points afterward. If anything, it appears that the improvements — to the extent any are detectable by examining only two data points — may have grown less quickly after implementation of pay-for-performance. We also don’t know if any small improvements resulted from pay-for-performance or from some other changes in physicians’ practice.

The next figure illustrates a result of one of the most convincingly negative studies of the UK’s pay-for-performance policy. In this case, the treatment question involved patients with hypertension. Using a strong long-term research design and seven years of monthly data for 400,000 patients before and after the program’s implementation (84 time points), the study showed that the pay-for-performance program was introduced in the middle of a slight rise in the percentage of patients who began blood pressure treatment.

It seems clear from the trend line that pay for performance did not cause the rise:

Figure 2. Percentage of study patients who began antihypertensive drug treatment from January 2001 through July 2006. The dashed line indicates when the UK’s pay-for-performance policy was implemented (April 2004). Serumaga, Ross-Degnan, Avery, Elliott, Majumdar, Zhang, et al. Effect of pay for performance on the management and outcomes of hypertension in the United Kingdom. BMJ 2011.

This is a big deal: a $12 billion program that links doctors’ incomes to measures of health-care quality had no effect.

The strongest design for evaluating policies is a randomized controlled trial (RCT). In such study designs, random allocation of participants into intervention and control groups increases the likelihood that the only difference between the groups is the pay-for-performance intervention. In a recent RCT, physicians in the pay-for-performance condition were eligible to receive up to $1,024 whenever a patient met target cholesterol levels. Physicians in the control groups received no economic incentives to hit those targets.
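
The logic of random allocation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used in the trial described above: the physician IDs, the fixed seed, and the helper names `randomize` and `difference_in_means` are all hypothetical.

```python
import random
from statistics import mean


def randomize(participants, seed=0):
    """Shuffle participants and split them into (intervention, control) groups.

    The seed is fixed only so the illustration is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(intervention_outcomes, control_outcomes):
    """Estimated effect: mean outcome under the incentive minus mean outcome without it."""
    return mean(intervention_outcomes) - mean(control_outcomes)


if __name__ == "__main__":
    physician_ids = list(range(100))  # hypothetical physician identifiers
    intervention, control = randomize(physician_ids)
    print(len(intervention), "physicians offered the incentive")
    print(len(control), "physicians in the control group")
```

Because chance alone decides who lands in which group, a sizable gap in outcomes between the groups can be attributed to the incentive rather than to which doctors chose to sign up.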

There was no real difference in improvements between the two groups.

No study is perfect, and it’s unlikely that a single study can determine the truth. But when you single out the most rigorous systematic reviews, empirical support for pay for performance evaporates.

Why doesn’t pay for performance work?

There are a few reasons why performance incentives fail. They reward doctors for things they already do, like prescribing antihypertensive drugs. What’s more, the programs often use lousy, unreliable quality measures: For example, they might penalize doctors for not prescribing antibiotics to patients who are allergic to them.

More troubling, there is evidence that such policies may even harm patients by encouraging unethical practice. One international systematic review found, in addition to no positive effects, that pay-for-performance programs had the unintended consequence of discouraging doctors from treating the sickest and most costly patients; there’s an incentive to cherry-pick the healthiest, most active, and wealthiest patients.

Health professionals do not respond to economic carrots and sticks like rats in mazes. As the leading health care economist Uwe Reinhardt said, “The idea that everyone’s professionalism and everyone’s good will has to be bought with tips is bizarre.”

Some health policy experts, like Harvard public health professor Ashish Jha, have argued that the awards in pay-for-performance programs simply ought to be increased: “Make the incentives big enough, and you’ll see change,” he has said. But there’s no evidence that the program has failed because doctors aren’t being paid enough. A pay-for-performance program in the UK paid an extra $40,000 per year on average to family doctors, but it still failed to improve care.

The pattern goes deeper than flawed study design and quality measures. Policymakers too often show unbridled confidence in economic theories and models that are unsupported by evidence. Health economists aim to predict how doctors will respond to incentives, but they do so without understanding the complex pressures that shape doctors’ behavior, including high patient loads, incomprehensible insurance rules, increasing time demands from more and more regulatory requirements, duplicative or conflicting regulations, and documentation of often unnecessary clinical data in different and noncommunicating electronic medical records systems.

In April 2015, ignorant of decades of research, a bipartisan Congress passed a huge new law (“MACRA”) that will tie even more funding to these questionable “quality scores” beginning this month — even amid the tumult of the Obamacare debate. The government’s MACRA rules took up almost 2,400 pages of text, and physicians are already balking at the additional paperwork and screen time.

Under MACRA, doctors who opt into pay for performance are allowed to choose for themselves, out of many possibilities, the six criteria on which their performance will be judged by the Centers for Medicare and Medicaid Services (CMS). Letting doctors choose their own criteria clearly lets them game the system for extra income, and it seems unlikely to provide any useful data, especially with almost every doctor choosing a different mix of standards.

We can do better. Researchers, policymakers, and journalists have a responsibility to understand the crucial role of robust research design. Academic journals should adopt the same research design standards used by Cochrane, the leading international medical research organization that conducts reviews of medical evidence. Cochrane weeds out the weakest studies.

Instead of a punitive incentive-and-penalty approach, policymakers should try to identify the reasons for poor performance. In contrast to numbers that can be gamed, doctors and nurses want concrete information they can use to improve care and save money. One of the most celebrated successes in American medicine involved the use of doctors, nurses, and pharmacists to counsel frail elderly people being discharged from hospitals and follow them at home to help them take their drugs and stay healthy. This program avoided costly and painful readmissions to the hospital.

We also must rethink the role of abstract economic theory and dubious economic models in policymaking. While much of human activity can be attributed to simple financial incentives, not all of it can or should be. This is not just an academic argument. America spends more on medical care than any other nation but gets second-rate results. We need better research and more realistic theory to guide our massive investments in health care.

Stephen Soumerai is professor of population medicine and research methods at Harvard Medical School and the Harvard Pilgrim Health Care Institute. Ross Koppel teaches research methods and statistics in the sociology department at the University of Pennsylvania, conducts research on health care IT, and is a senior fellow at the Wharton School’s Leonard Davis Institute of Health Economics.

This post first appeared in Vox.

lisaindfw

Another reason it doesn’t work? Patients. The doctor can do everything right but if the patient goes home and eats crap and doesn’t take their meds they have a bad outcome. Those are the patients that get fired from the practice. We avoid that by not taking any 3rd party payment — they pay us when they leave and if they have insurance can file on their own.

Michel Accad

I applaud the authors for bringing attention to the “history bias,” which is one of the most common means of self-deception (or trickery?) to justify policy interventions. I am puzzled by their recommendation to apply a corrective on the basis of “robust research design.” Who will ultimately judge the value and trustworthiness of empirical studies to guide policy? And by the time such research is conducted and analyzed, the health care environment is inevitably changed, making the study essentially moot. Rather, it is most important for economists and policy analysts to better reflect on the effect of triangulating the medical…

Joe Flower

Great, important discussion. Though there are some disagreements here, let me just emphasize a key point that everyone agrees on, that is indeed at a crisis point, that is in fact by all reports degrading the real productivity of physicians at a point when we need their productivity more than ever. That key point is the sheer complexity of these measures. Economics 101 believes in incentives. Advanced systems behavioral economics examines how and whether that incentive actually works. Any incentive is a communication: Do this, you get that. If you make this sale you get 10% of the sale amount,…

jstavene

This is fascinating! Best article I have read in quite a while, though I completely disagree! When I have a car motor rebuilt and the car dies 30 miles from the mechanic shop, I stop payment to the mechanic unless it’s repaired. When my father passed away less than 16 hours after release from the hospital, that hospital still expected to be paid? When my mother had a lung puncture done which later, within days, was determined not to have been needed, and which in fact damaged her so badly that, compounded with her other health issues, she now needs a heart-lung transplant, we…

J Citizen

This ignores the elephant in the room: American patients are fat, lazy, and stubborn know-it-alls. I toured Europe for 3 weeks in grad school for a neuro conference, and I saw 0 overweight Europeans. The roads are too small to drive everywhere, so they walk. They eat high-calorie food but very small portions, meat is expensive, and buffets were nonexistent. The only obese people I saw were all American tourists. Paying physicians, who are already overworked, underpaid, and burned out, based on patient behavior is idiotic, unless you can punish the patient. 1. Physicians are overworked/underpaid: EMR/EHR cut…

jstavene

I do agree with what you say. When a doctor says a patient needs a lifestyle change, the patient ignores it far too often! (I must admit I have been on a restricted diet since 16 and am 39 now; it took me years of annoying and angering my own doctors to finally get to a point where I could adapt.) I do think patients need a penalty for not exercising, or following diet restrictions. I worked for a company who yearly took BMI, and blood sugar, and many other analytics, and they pro-rated our insurance on that, giving a “preferred/cheaper” rate…

Allan

“I do think patients need a penalty for not exercising, or following diet restrictions”

They pay a very big penalty: their health. You might want to consider this anecdote. Years ago many people actually paid cash when they saw their doctor. I noticed that many of my uninsured diabetics were more likely to follow the regimen offered because it was less expensive. We would work things around so that the number of office visits could be reduced because they had better control over their blood sugars. The insured ones didn’t seem to worry as much.

Millenson

Well, yes and no. I couldn’t agree more about the dangers of simple financial incentives, and I am particularly concerned about the love affair with an oversimplified “consumerism.” I recently wrote in defense of the word “patient” in the BMJ. (The blog version had the better title, “Girls, Queers and Patients”) http://blogs.bmj.com/bmj/2016/08/18/michael-m-millenson-girls-queers-and-patients/ and I’ve written in the Journal of General Internal Medicine about the way in which three aspects of patient-centeredness (the clinical, economic and ethical) can be synergistic but also conflict. Indeed, back in the 1990s, I satirized the George W. Bush administration’s push towards consumerism by…

ashishkjha

This is an interesting piece and summarizes some of the literature on pay for performance. I am sympathetic to Steve and Ross’s conclusions and even honored that they would mention me. I just wish they would stop so obviously taking my comments out of context. If you want a quick summary of the literature, here’s what we know about pay for performance (P4P): 1. It tends to reward those who were already doing well. 2. It tends to have very small beneficial effects on processes of care. 3. It has no real impact on patient outcomes. 4. There is…

Ross Koppel

I very much thank Ashish for his comment. He’s right: I had not appreciated enough the differences between MDs and hospitals. (Note: Steve may have examined this in detail…. I’m speaking about my own shortcomings here.) Hospitals exist sui generis: they are systems, not people/individuals. It follows, therefore, that the carrots and sticks to motivate improvements might well be different from those for individuals. Alas, as Ashish points out, neither incentive program (for docs or for hospitals) seems to be very efficacious. The measures are often ill-considered, there are so many multi-co-linear factors that the mind boggles, and the…

Allan

“However, there has been the mounting evidence—even in multiple meta-analyses—that P4P programs were having little effect across a range of clinical services, from quality of ambulatory care to rates of breast cancer screening. Despite this, Congress created multiple P4P programs within the ACA to incentivize better care.”

An important issue is why Congress has passed programs that the studies indicate don’t work while not changing the incentives. Do you have an answer?

Ronald Graf

Stephen and Ross, thanks, and I agree with your thoroughly supported article. What if we created the same incentives that make many orders arrive in one day from Amazon or Ebay? It’s called anonymous public feedback ranking. With every provider and item having a code, each billing event could supply a chit to the patient that allows them to log in anonymously and leave feedback on code-relevant programmed questions regarding their care or care items. This could produce a statistical database showing rankings, volumes and pricing. Published pricing with quality transparency would revolutionize the market and apply…

meltoots

Thanks for the article. As a front line MD who has reported PQRS for 5 years and been the physician IT leader of our hospital for years, I can tell you that it has done nothing but add more administrative burden to our day, and it’s rife with errors. MACRA is even more of a burden and will fail without question. 1. It’s ALWAYS the DOCTOR’S fault: To assume that 1 physician is responsible for ALL the patient’s health measures/outcomes is ridiculous. An MD can try all they can to control blood sugar, but if the patient continuously ignores…