
Pay for Performance in Healthcare: Do We Need Less, More, or Different?

The debate over pay for performance in healthcare gets progressively more interesting, and confusing. And, with Medicare’s recent launch of its value-based purchasing and readmission penalty programs, the debate is no longer theoretical.

Just in the past several months, we’ve seen studies showing that pay for performance works, and others showing that it doesn’t. We’ve heard from some theorists who describe P4P as sapping intrinsic motivation and doing violence to professionalism, and others who feel that its effects are as natural and predictable as water running downhill. Some commentators beg us to stop it, while others denounce P4P’s current incarnations as too wimpy to work and recommend they be turbo-charged.

If we weren’t talking about the central policy question of a field as important as healthcare, we could call this a draw and move on. But the stakes are too high, so it’s worth taking a moment to review what we know.

In the U.S., the main test of P4P has been Medicare’s Hospital Quality Incentive Demonstration (HQID) program. A recent analysis of this program, which offered relatively small performance-based bonuses to a sample of 252 hospitals in the large Premier network, found that after 6 years, hospitals in the intervention group had no better outcomes than the 3,363 hospitals in the control arm. Prior papers from the HQID demonstrated mild improvements in adherence to some process measures, but, as in a disconcerting number of studies, this did not translate into meaningful improvements in hard outcomes such as mortality.

Contrast the HQID results with those of a recently published 24-hospital study in northwest England, in which P4P (with higher bonuses, up to 4%) was associated with significant improvements in risk-adjusted mortality rates for patients with pneumonia, acute myocardial infarction, and heart failure. In other studies, substantial bonuses to British GPs (up to 30%) have been associated with improved adherence to process measures and with better intermediate outcomes such as control of hypertension and cholesterol.

The competing ideologies are as interesting as the dueling p values. It has become clear that the world is sorting itself into two camps: people rooting for P4P to flourish and others hoping that it crashes and burns. In the former camp are individuals who view doctors as economic creatures, nothing more, nothing less. They see protestations by physicians that “we can’t be bought” as both unbelievable and haughty. Such individuals find succor in the history of fee-for-service medicine. “We already have pay for performance,” I’ve been told on several occasions. “We pay more for the performance of procedures, hospitalizations, and office visits, and so that’s precisely what medicine produces.” Even for those rooting for better angels, this argument is hard to ignore.

In the opposite camp are those who point to medicine’s history as a noble profession. They note that one of the defining characteristics of professions is that they place the needs of those they serve over their own. In addition to this social-good argument, they tout empirical evidence from the trendy field of behavioral economics, which highlights the tension between intrinsic (driven by purpose, altruism, mastery) and extrinsic (driven by money) motivation. This research has demonstrated that not only do financial incentives frequently fail to work as well as one might like, but they may even “crowd out” intrinsic motivation. The ever-present physician-gadflies Steffie Woolhandler and David Himmelstein, joined by behavioral economist Dan Ariely, highlight several of these arguments in a recent Health Affairs blog post. They cite one study that found that incentive payments decrease the frequency of blood donations (as compared with voluntary donations) and another that found that parents became more likely to pick up their kids late when an Israeli day care center imposed fines for late pickups. (“Fines had transformed promptness from a moral duty to a market transaction governed by price,” they write.)

This camp’s bible is Daniel Pink’s book Drive, and one of its prophets is my ABIM colleague Chris Cassel, who has argued, in two fine JAMA articles (both coauthored with Sachin Jain), that financial incentives can suppress motivation, turning physicians from “knights” (individuals motivated by professional values) into “pawns” (passive participants doing backflips in response to external incentives). Another prominent member of this camp is Don Berwick, who addressed this issue before he became, well, Don Berwick. Writing in 1995, he argued:

I find myself an extremist and therefore suspicious of my answer. But it is, nonetheless, the best answer I have yet found regarding merit pay for doctors or any group of workers; namely, “Stop it.” [Such pay] is destructive of what we need most in our healthcare industry – teamwork, continuous improvement, innovation, learning, pride, joy, mutual respect, and a focus of all of our energies on meeting the needs of those who come to us for help. We can find better ways to decide on how we pay each other and better uses for our energies than in the study and management of carrots and sticks.

So where does this gumbo of empirical evidence and exhortation leave us? As with most really complicated questions in life, the right answer will be as utterly unsatisfying to those rooting for their home team as it is predictable: it’ll lie somewhere between the poles. To me, while the evidence supporting P4P in healthcare is weak, it is far too early to pull the plug on a strategy with so much face validity, particularly with all that’s hanging in the balance.

But boy, do we have lots of details to sort out. How much money should be in P4P bonuses and penalties? (Medicare’s current value-based purchasing plan pegs bonuses at 1%, doubling in a few years, which is much lower than most experts recommend to catch the attention of doctors and hospitals.) What is the right mix of payments that go to best performers versus best improvers? (Early programs gave bonuses only to the former, but the correct answer must involve a Solomonic splitting of the baby, and Medicare’s current value-based purchasing plan does just that.)

What is the best blend of process and structural vs. outcome measures? (Medicare began with process measures, but value-based purchasing and other P4P programs are increasingly combining processes with risk-adjusted outcomes.) When does transparency get you far enough along that P4P is superfluous or simply not worth the hassle? (The biggest surprise in this area has been the power of simple transparency in driving change, particularly since it has been accompanied by relatively meager consumer-based changes in behavior. This is part of the reason why it has been hard to show benefit for hospital P4P in the U.S.: all “control” hospitals are participating in a vigorous transparency program, which has been strikingly successful.) How can we align everyone’s incentives? (P4P programs to date have mostly focused on hospitals or doctors, rarely both. Future programs should try to align these forces.) And – perhaps most important of all – is there a way to implement P4P without dousing the flame of intrinsic motivation and professionalism? (I have no clever answer to that one, but I suspect someone smarter than me will figure it out.)

In the final days of the presidential campaign, we saw the hazards of reading the evidence – in this case, the polls – through an ideological lens. Even as we debated the “Is-Nate-Silver-right?” question, we knew that the final answer was forthcoming, on the evening of November 6. With P4P, it won’t be quite that simple: there will not be any singular event that tells us that we have things structured just right to maximize benefit and minimize harm. So, for now, the best stance is to keep an open mind, listen to both sides of the argument, review the research in as unbiased a way as one can muster, and pray for more and better studies.

In the end, I’m guessing that the best solution will not be one that treats physicians as purely economic animals. They’re not. But – as much as we’d wish it to be so – it is equally unlikely to be one that relies completely on the kindness of strangers.

Robert Wachter, MD, professor of medicine at UCSF, is widely regarded as a leading figure in the patient safety and quality movements. He edits the federal government’s two leading safety websites, and the second edition of his book, “Understanding Patient Safety,” was recently published by McGraw-Hill. In addition, he coined the term “hospitalist” in an influential 1996 essay in The New England Journal of Medicine and is chair of the American Board of Internal Medicine. His posts appear semi-regularly on THCB and on his own blog, Wachter’s World.


  1. I am also in the Ariely camp, but I do believe that there are certain administrative tasks that indisputably contribute to poor outcomes and that can be measured objectively. Hospitals should not be rewarded for performing these tasks properly, which should be assumed, but they should be penalized for failing to do so.
    A good example would be having discharge instructions delivered to the primary care physician and/or long term facility, immediately upon discharge (not 4 days later, if at all).

    All current P4P measures are of questionable quality and fuzzy enough to be gamed, and they will most likely do more harm than good, because accountability starts where responsibility ends.
    That said, I wouldn’t be opposed to continuing the study of limited experiments if we think that 6 years and hundreds of hospitals are not indicative enough.

  2. Good discussion. Missing, however, is a recognition of the power of “gaming the system.” We already see it in coding (“down” by insurers, “up” by providers). Any complex reward system is an invitation to gaming, even making it a necessity. Medicare found that out when it instituted 75 “cost-saving” risk management changes and found that expenses went up instead of down. There are consulting companies that make their money by training providers how best to extract money from the system. The same will be true in P4P. More frightening is the observation that rigid enforcement can lead to avoidance of unique and appropriate medical care. Add to that the huge cost of trying to codify and administer a P4P system. We still have much to learn.

  3. Interesting post, Bob. I am in the Ariely camp and have enjoyed hearing him and Dan Pink speak. But I agree that there is no question that money acts as a driver.

    So, how do we assess whether we’ve hit the right measures? I understand how to measure process and outcome goals, such as the ones you referenced regarding cardiac mortality and blood pressure control in the UK. But how do we assess a sense of professionalism and responsibility?

    With the greatest respect and appreciation for hospitalists (and my desire to have them take care of my patients when in hospital), there is no question that the hospitalist model has worsened the shift-worker mindset of these providers. There is a very fragmented sense of accountability.

    Similarly, when looking at the benefits of work-hour regulations, I know that there is a real question as to whether they have actually reduced error rates. Anecdotally, and based on discussions with many other academic colleagues, they have unquestionably led to a shift-worker mentality: I work from 7am until 6pm and then I’m done, or, post-call, it’s more important for me to go home and nap than to stay and find out what the CT scan showed on Mr. Smith and what the outcome of his care was. Data are also slowly emerging about the errors arising on the other side of this, from communication and hand-off problems.

    How do we measure these softer outcomes of professionalism, responsibility, accountability, and communication? And, to the camp that feels that doctors are entirely about money and to claim otherwise is haughty, doesn’t pushing these measures dumb us down further into money-driven technicians?