QUALITY: Performance measures only have a little of the answer

(Hat-tip to Modern Healthcare for spotting this one.) While there has been lots of fuss recently about the IHI 100K Lives Campaign and whether or not it met its target (the NY Times gave it a pat on the back this morning in the Editorial section), there’s perhaps even more important news from a study published in JAMA today. A large multi-center team looked at the Medicare performance-measure data for patients treated after a heart attack to see how improvements in those processes related to outcomes. These measures are the bedrock of the “we know what to do, but we don’t know how to do it” meme of IHI and the quality movement. In other words, the theory is that if we just did everything as well as the literature says we should, there is potential for vast improvement. Unfortunately, the results are sobering for those of us who believe that applying relatively simple industrial processes to medicine can markedly improve outcomes (and lower costs too).

We found moderately strong correlations (correlation coefficients ≥0.40; P values <.001) for all pairwise comparisons between beta-blocker use at admission and discharge, aspirin use at admission and discharge, and angiotensin-converting enzyme inhibitor use, and weaker, but statistically significant, correlations between these medication measures and smoking cessation counseling and time to reperfusion therapy measures (correlation coefficients <0.40; P values <.001). Some process measures were significantly correlated with risk-standardized, 30-day mortality rates (P values <.001) but together explained only 6.0% of hospital-level variation in risk-standardized, 30-day mortality rates for patients with AMI.

In other words, even when the hospitals did well on the performance measures, it only explained a small fraction of the overall variation in outcomes. So there are to my mind only two possible conclusions. Either performance measurements and controlling process variation don’t matter too much, or we actually—in this case at least—don’t know what works. Neither one is a particularly satisfying explanation.
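
For readers who want the arithmetic behind “explained only 6.0%”: that figure is a share of the hospital-to-hospital variation in risk-standardized 30-day mortality (an R² value), not a change in the mortality rate itself. A rough sketch, assuming the standard regression definition of R² (the abstract does not spell out the exact model used):

$$R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}} = 0.060 \;\Rightarrow\; 1 - R^2 = 0.940, \qquad R = \sqrt{0.060} \approx 0.245$$

Here SS_tot stands for the total variation in hospitals’ mortality rates and SS_res for what is left after accounting for the process measures. Roughly 94% of the variation remains unexplained, and all the measures together track mortality only about as closely as a single correlation of roughly 0.25 would.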


16 replies

  1. I like this idea of avoiding what we know does not work. Can we have some dialogue on that? What is a good example that we can examine now? (Not gastric freezing!)

  2. I feel a bit like a broken record, but since the discussion has strayed somewhat from Matthew’s original post, I would like to try to get Maggie to agree with me on one point.
    Maggie- I agree with your concept of a bell curve (in fact, I regularly talk with patients about ‘the great bell curve of life’ that explains variations in outcomes, intelligence, running speed, etc.).
    Moving the curve ‘to the left’, as you say, is much easier if we focus on what SHOULD NOT be done for certain conditions, rather than trying to get agreement on ‘best treatments’ for conditions.
    For most things in medicine (as in the rest of life), there are generally multiple right ways to approach a problem; however, there are clearly wrong ways, and here is where even the stubborn can find common ground.
    Maggie and others- I suggest we first focus on the ‘don’ts’ when we are looking at quality improvement. The doctors who actually populate the far ‘right end’ of the curve generally do so by making wrong decisions as much as by failing to make right decisions.

  3. “In other words, if ‘the professionals’ can’t measure how to choose a ‘good’ healthcare provider, then how can we expect consumers to make wise choices…”
    That’s one way to say it!

  4. “Isn’t consumer-directed healthcare based on the notion of having providers compete on outcomes and cost through transparency? The problem is, we need the data and information systems to generate truly useful outcomes and cost data before we can rationally expect consumers to select providers intelligently.”
    In other words, if “the professionals” can’t measure how to choose a “good” healthcare provider, then how can we expect consumers to make wise choices…

  5. I agree that EMRs (EHRs) are an essential piece. But they must be applicable to all specialties and all clinical data sets, and, ideally, they should be designed to transmit the diagnostic, process, and outcomes data (stripped of identifiers) to researchers working in collaboration with the practitioners. In addition, they should be integrated with decision-support tools (including diagnostic assessment, medication management, basic alerts and reminders, and plan-of-care generation and execution management) if they are to help improve outcomes significantly. This means we need radical and affordable HIT innovation to evolve current-day applications into transformational tools.
    Concerning Porter: Isn’t consumer-directed healthcare based on the notion of having providers compete on outcomes and cost through transparency? The problem is, we need the data and information systems to generate truly useful outcomes and cost data before we can rationally expect consumers to select providers intelligently. Once we are serious about collecting the requisite data and developing the requisite tools, high-performing providers will welcome such transparency as a form of competitive advantage. I’d bet that these top performers would also be using good decision-support tools, be engaged in research activities, and be committed to learning continually through constructive feedback.

  6. Matt and Steve–
    I agree with both of you. If we want to measure quality, we need to be collecting and analyzing a much greater depth and breadth of data, and that is impossible without electronic medical records.
    This reminds me of Michael Porter’s new book on healthcare and his assertion that all we need to do is “make outcomes transparent” and ask health care providers to compete on quality.
    Does he address the lack of EMRs needed to collect the data, or the difficulty of “making outcomes transparent”?

  7. The current constraining factor for outcomes measurement is the ease (or burden) of measurement.
    We know that the measures we use are crude, but the burden of forcing MORE chart review, etc. (in a paper-based healthcare world) to measure P4P indicators can make the cost of measurement greater than the (economic) reward.
    I see lowering the burden/cost of indicator measurement (both for P4P and – God forbid – internal performance improvement efforts) as a key benefit of EMRs (of the not-so-distant future).

  8. Maggie’s response is excellent.
    I’ve always argued that it is foolish to measure care quality using only process measures, and this study validates my position. I’ve also argued that using only claims (administrative) data to measure care quality isn’t wise since comprehensive clinical (encounter) data is crucial. I am not at all surprised, therefore, that a weak correlation exists between a few generic processes and a few short-term claims-type outcomes measures (e.g., risk-adjusted 30-day mortality rates).
    The healthcare industry has to start collecting and analyzing lots of comprehensive, detailed, clinically-relevant data, including:
    • Diagnostic data about patients’ physical and psychological conditions (including lab test and imaging results, vital signs, patient-reported symptoms, mental health exams, etc.); environmental influences; genetic markers; family history; treatment history; patient preferences; etc.
    • Treatment data including compliance to standard guidelines, variances from the guidelines and the reasons/justifications, the timeliness and appropriateness of plan-of-care order execution, clinician variables, patient compliance, etc.
    • Outcomes data, both short term and long term, including changes in patients’ symptomatology and morbidity/quality of life, as well as mortality.
    • Preventative maintenance data.
    Only by studying this wealth and diversity of data can we begin to understand the relationship between diagnosis, treatment, and outcomes. If we did this, I bet we would discover all sorts of useful information that would improve care quality by, for example, realizing that we must revamp our diagnostic system to classify each patient more precisely; we ought to tailor treatment guidelines to patients’ individualized needs and preferences; we had better track patients’ data across their entire lifetimes; and we should pay attention to the mind-body connection.
    When we get serious about all this, we will see significant improvements in care. The incentives, therefore, should be on collecting and analyzing a much greater breadth and depth of patient and treatment-related data in order to increase our knowledge about the safest and most cost-effective care and preventive services for each person.

  9. Matthew writes:
    > So there are to my mind only two possible conclusions
    One of my former bosses and mentors taught me: if you didn’t measure it, you don’t know it. Bureaucracy driven process control and reporting has enabled the measurements outlined in the abstract. This is science, and is reason enough to continue bureaucracy driven process control and reporting.
    Apparently, relatively simple industrial processes can improve 30-day mortality rates for AMI by 6% in spite of the other apparent fact that “we” don’t know much about what drives 30-day mortality rates. This is nothing to sneeze at. Billion-dollar drugs are patented and marketed on less than this.
    I am personally more interested in 1,000-day mortality rates than I am in 30-day mortality rates, but for now it seems this is the best we can do. Let us therefore have even more bureaucracy driven process control and reporting. Errr…. I mean science.

  10. I swear I hadn’t read Maggie’s comments before I posted this. But now I remember who posted the article a few weeks ago! Thanks, Maggie!

  11. Disclosure: I haven’t read this study in full yet.
    But a few quick comments:
    – How can anyone knock Pay for Performance while it is still in its infancy? We’ve finally aligned hospitals’ incentives with care that should, in theory, result in less business for the hospital. Aligning hospitals’ incentives with patients’ is the Fermat’s Last Theorem of health care, and at least CMS is trying.
    – Someone on this blog referenced Atul Gawande’s New Yorker article entitled “The Bell Curve” (http://www.newyorker.com/fact/content/?041206fa_fact) about a month ago. I think this article is a great illustration of the (in the words of Nacho Libre) “neety greety” that goes on in hospitals and results in different outcomes of care. I do not see why this type of investigation can’t be done for AMI or any other outcome. Once investigators get down to the level of human interaction, they could look across the cohort for repeat behaviors. (In other words, I’m going with ‘we don’t know what works’…yet. But I trust that our ability to keep data will keep improving, and at some point we’ll have the right information to cover the other 94% of variation.)
    – Appropriately, the above article references the work of Don Berwick. A couple of things about the Donald. 1.) He really likes systems. He is famous for passing on a phrase that he heard somewhere that reads “every system produces outcomes that the system itself is designed to produce” or something like that (shows you how much I was paying attention in class). The point is, he believes that you can tweak inputs to produce desired outputs; that’s the whole point of the 100K Lives Campaign. 2.) My guess is he could have told you the results of the JAMA study before they happened. If anyone has respect for the multitude of variables that go into these outcomes, it is Berwick. But until we know more, why not incentivize what we know is good practice?
    It would be a shame to trivialize the current rubric for quality because not enough variation is explained. I see this study as a sign that the low-hanging fruit is still hanging in the search for explanations of outcome variation. If we’re willing to take a walk through the orchard, my guess is we won’t come back hungry.

  12. The JAMA study shows that outcomes research is still an infant science–which makes “pay for performance” premature, at best.
    Today, the measures that we use to rate performance are generally either too crude (did the patient live or die over the next six months?) or too narrow (did he receive an aspirin?).
    Measuring the quality of medical outcomes is far more complex than judging the quality of Toyotas as they roll off the assembly line. We need far subtler measures of quality.
    For example, if the patient died, was he in extreme pain? Did he undergo an unnecessary, stressful operation before he died? If he lived, did he go home to play with his grandchildren, or did he wind up warehoused in a nursing home for the next seven years?
    It will take time to find subtler ways of asking the questions that measure quality of care. In the meantime, there is a real danger that if we focus on the things that are easy to count, and train people to focus on a checklist of those 20 things, they will be distracted from paying attention to other, more important aspects of care. (Studies show that this has proved a problem in industries where workers have been paid for performance: they do the precise things they need to do to get the cheese, but while concentrating on the cheese, they neglect other things.)
    For example, it’s definitely good to give the aspirin. But in terms of quality of care, it’s at least as important that someone sits down with the patient suffering from congestive heart failure–and his family–to discuss the pros and cons of an operation that cannot cure a weak heart.
    But just because we’re in the early stages of learning how to measure quality does not mean that we should give up on trying to develop guidelines for “best practice” based on clinical evidence. It just means that this will be a long process.
    In a piece titled “The Bell Curve” (The New Yorker, 12/06/04), Dr. Atul Gawande reveals how long it can take to develop guidelines for just one disease: cystic fibrosis.
    “One small field in medicine has been far ahead of most others in measuring the performance of its practitioners: cystic-fibrosis care,” Gawande writes. “For forty years, the Cystic Fibrosis Foundation has gathered detailed data from the country’s cystic-fibrosis treatment centers. It did not begin doing so because it was more enlightened than everyone else. It did so because, in the nineteen-sixties, a pediatrician from Cleveland named LeRoy Matthews was driving people in the field crazy.
    Matthews was claiming to have an annual mortality rate at Babies and Children’s Hospital in Cleveland that was less than two per cent. To anyone treating CF at the time, it was a preposterous assertion. National mortality rates for the disease were estimated to be higher than twenty per cent a year, and the average patient died by the age of three.
    In 1964, the Cystic Fibrosis Foundation gave a University of Minnesota pediatrician named Warren Warwick a budget of ten thousand dollars to collect reports on every patient treated at the thirty-one CF centers in the United States that year—data that would test Matthews’s claim. Several months later, he had the results: the median estimated age at death for patients in Matthews’s center was twenty-one years, seven times the age of patients treated elsewhere. He had not had a single death among patients younger than six in at least five years.
    Matthews’ treatment was complex. “Unlike pediatricians elsewhere, Matthews viewed CF as a cumulative disease and provided aggressive treatment long before his patients became sick. He made his patients sleep each night in a plastic tent filled with a continuous, aerosolized water mist so dense you could barely see through it. This thinned the tenacious mucus that clogged their airways and enabled them to cough it up. Like British pediatricians, he also had family members clap on the children’s chests daily to help loosen the mucus. Finally, he devised a detailed rigorous daily regime for his patients and insisted that they follow it–day after day, year after year.
    Ultimately, Matthews’ treatment became the standard in this country. The American Thoracic Society endorsed his approach, and Warwick’s data registry on treatment centers proved to be so useful that the Cystic Fibrosis Foundation has continued it.
    As a result, many cystic fibrosis patients live much, much longer. (By 1966, mortality from CF nationally had dropped so much that the average life expectancy of CF patients had already reached ten years. By 1972, it was eighteen years.)
    Still, most cystic fibrosis centers are not as successful as Matthews. A few were extremely successful; most were moderately successful. This brings me to Barry’s question.
    Yes, the skill of the doctor, nurses and others in the team is key. As Gawande points out: “If you plotted a graph showing the results of all the centers treating cystic fibrosis—or any other disease, for that matter—people expected that the curve would look something like a shark fin, with most places clustered around the very best outcomes. But the evidence has begun to indicate otherwise. What you tend to find is a bell curve: a handful of teams with disturbingly poor outcomes for their patients, a handful with remarkably good results, and a great undistinguished middle.
    In other words, doctors, like electricians, money managers and all of the rest of us, live on a bell curve. There are very few Warren Buffetts.
    But Matthews cannot treat all of the children with cystic fibrosis. (This is another way in which the practice of medicine is different from manufacturing Toyotas. If Toyota hit upon a new “best” model that got 80 miles to the gallon, it could simply turn out tens of thousands of that “new improved” Toyota. But no one can give the most successful doctors on the far left of the bell curve another thousand hours in their day.)
    However, by studying outcomes, identifying Matthews’s success, and turning his treatment into guidelines that other doctors can use, we can MOVE THE ENTIRE BELL CURVE TO THE LEFT. That’s the goal.
    Also, we can begin to identify physicians on the far right side of the curve, and make sure that they do no harm. (For instance, surgeons say that some surgeons should only assist in operations, but never lead the team.)
    In the end, all of this will take time–years and years of collecting data (which we will be able to do much more easily when we have electronic records of patients’ histories), analyzing that data, learning to ask questions about the aspects of care that are hardest to measure, and putting clinical evidence into practice.
    Progress will be incremental (though with each increment, a dozen, two dozen, a hundred more people not only survive but thrive) and the project will always be open-ended. This, however, is no reason to abandon it.

  13. Maybe, Eric, but the truth is that the current lack of a system has led to tremendous variation in outcomes, and I find it more likely that we need more investigation of what works rather than simply accepting the status quo. Of course, that assumes that we care about outcomes. As you know, I’m more interested in constraining inputs.

  14. Barry- that is a fabulous question, and one that I would like to answer with a whole posting. I will try to find time today or tonight to give your question its due.

  15. Could it be that the most important variable here is the skill of the surgeon or interventional cardiologist, at least, with respect to those patients that required an intervention or surgery?

  16. Hmmm- could that actually be why so many people– doctors, nurses, etc.– are so skeptical of government-controlled, bureaucracy driven, ‘pay for performance’?
    Outcomes have so many variables- and patients are so different- that focusing on basic process measures like IHI, and rewarding both patient and doctor process measures like Bridges to Excellence, and making compensation decisions based upon real science like health courts (see my op-ed at http://www.azcentral.com/arizonarepublic/opinions/articles/1231satlet5-311.html) are a better blueprint for improving healthcare.
    To quote you, Matthew, ‘neither one is a particularly satisfying explanation.’ That has not prevented many at THCB from wanting so badly to believe that ‘beta blockers and aspirin’ and the like are the holy grail that reality is denied.