In God We Trust. All Others Must Bring Data.

I knew it would happen sooner or later, and earlier this week it finally did.

In 2003 US News & World Report pronounced my hospital, UCSF Medical Center, the 7th best in the nation. That same year, Medicare launched its Hospital Compare website. For the first time, quality measures for patients with pneumonia, heart failure, and heart attack were now instantly available on the Internet. While we performed well on many of the Medicare measures, we were mediocre on some. And on one of them – the percent of hospitalized pneumonia patients who received pneumococcal vaccination prior to discharge – we were abysmal, getting it right only 10% of the time.

Here we were, a billion dollar university hospital, one of healthcare’s true Meccas, and we couldn’t figure out how to give patients a simple vaccine. Trying to inspire my colleagues to tackle this and other QI projects with the passion they require, I appealed to both physicians’ duty to patients and our innate competitiveness. US News & World Report might now consider us one of the top ten hospitals in the country, I said, but that was largely a reputational contest. How long do you think it’ll be before these publicly reported quality measures factor heavily into the US News rankings? Or that our reputation will actually be determined by real performance data?

The late Israeli prime minister Golda Meir once quipped, “Don’t be humble, you’re not that great.” To me, the launch of the Hospital Compare website was our Golda Meir moment.

And yesterday, when USN&WR unveiled its annual “Honor Roll” of “America’s Best Hospitals” for 2012, our Golda Meir moment came home to roost. UCSF had fallen out of the top 10 for the first time in over a decade – dropping from number 7 to number 13. The headline, though, was that Johns Hopkins Hospital had been pushed off its Number One perch, the coveted (and widely advertised) spot it held for 21 years.

What happened? When US News launched its Best Hospitals list with its April 30, 1990 issue, the entire ranking (which, then and now, considers only large teaching hospitals with advanced technologies) was based on reputation – a survey of 400 experts in each specialty rated the best hospitals in their field. Was this a measure of quality and safety? Maybe a little. But I’d bet that the rankings had more to do with the prominence of each hospital’s senior physicians, its publications and NIH portfolio, the quality of its training programs, and its successes in philanthropy than with the quality of the care it delivered. While the magazine changed the methodology to include some non-reputational outcome and process data in 1993, the reputational survey remained the most important factor. A 2010 Annals of Internal Medicine study found that the reputational score explained the final overall ranking in more than 90% of cases.

For this year’s ranking, USN&WR changed its methodology again, dialing down the credit for reputation (to just under one-third) and increasing the weight given to other statistical measures of quality, such as risk-adjusted mortality (32.5%), patient safety (things like major post-operative bleeding, 5%), and “other care-related indicators” (such as nurse-staffing, 30%). The result is that the Top Ten list – which had been about as predictable as the one chiseled on Mount Sinai – now offered some drama. In addition to UCSF and Hopkins, others that dropped included Penn (10th to 15th) and Michigan (14th to 17th); Vanderbilt, Stanford, and University of Washington fell off the list (which ends at 17) entirely. Moving up were Mass General (now number one), Pitt (from 12th to 10th), and two hospitals that had previously been off the list (NYU, now 11th, and Northwestern, now 12th).
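
To make the arithmetic concrete, here is a minimal Python sketch of how a weighted composite of this kind can reorder hospitals. The 32.5% reputation weight is an assumption (it is the remainder after the three stated weights, consistent with "just under one-third"). The hospital names and component scores are invented for illustration, and the actual US News scoring is likely more involved than a simple weighted sum.

```python
# Weights for this year's methodology as described above.
# ASSUMPTION: the reputation weight is the remainder after the three stated weights.
WEIGHTS_2012 = {
    "reputation": 0.325,
    "mortality": 0.325,
    "patient_safety": 0.05,
    "other_care": 0.30,
}

# The original 1990 ranking was based entirely on reputation.
WEIGHTS_1990 = {
    "reputation": 1.0,
    "mortality": 0.0,
    "patient_safety": 0.0,
    "other_care": 0.0,
}

# HYPOTHETICAL hospitals with made-up component scores on a 0-100 scale.
HOSPITALS = {
    "Hospital A": {"reputation": 95, "mortality": 60, "patient_safety": 70, "other_care": 65},
    "Hospital B": {"reputation": 80, "mortality": 85, "patient_safety": 80, "other_care": 80},
}


def composite(scores, weights):
    """Weighted sum of a hospital's component scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rank(hospitals, weights):
    """Hospital names ordered from highest to lowest composite score."""
    return sorted(hospitals, key=lambda h: composite(hospitals[h], weights), reverse=True)


if __name__ == "__main__":
    print("Reputation only:", rank(HOSPITALS, WEIGHTS_1990))
    print("Blended weights:", rank(HOSPITALS, WEIGHTS_2012))
```

In this made-up case, the hospital with the bigger name wins when reputation is everything and loses once the outcome and process measures carry two-thirds of the weight.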

Americans love rankings, and the hospital ranking game has become big business. Ranking is a cash cow for the rankers: when US News publishes a rankings issue (not just of hospitals but professional schools, colleges, and more), it’s a guaranteed hot seller. Rankings are also a big deal for those who get ranked. Hospitals are now ranked by at least half a dozen organizations, including Solucient (recently renamed Truven) and Healthgrades. Even the Joint Commission, which used to limit its work to hospital accreditation, has joined the ranking business. Just in the past two months, two new rankings were released, one by the business coalition The Leapfrog Group and another by Consumer Reports. Add up all the top hospitals lists and you’ll see some 500 American hospitals that can (and do, in billboards large and small) claim to be “One of the Top 100 in the Nation.”

Importantly, the evidence that patients use these rankings to make choices about where to receive their care is limited (Bill Clinton famously chose a “poorly ranked,” on outcomes at least, heart surgeon for his heart bypass operation, based on the surgeon’s reputation). Yet there is no question that good results are touted widely and disappointing results lead to significant soul searching, changes in resource allocation, and even some real improvements. After our Golda Meir moment in 2003, my hospital utterly transformed its approach to quality, safety, and patient experience, and we have made amazing strides (including, you’ll be pleased to know, in our pneumovax rate). Without question, UCSF is a far better hospital today than it was then, and I don’t think that would have happened without public reporting and rankings. The fact that a few of our peer hospitals moved the needle even farther than we did, as reflected in this year’s USN&WR list, will motivate us to do still better.

While some of the energy that rankings create is healthy, there is also a dark side, mostly because today’s quality measures are far from perfect. As the skin in the quality game increases, so too will the unintended consequences. Extra energy and money will go into the problems that feed the rankings, much of it drawn from areas that are just as important but not measured. Just consider all of the attention being lavished on preventing hospital falls and central line infections, safety problems that are not nearly as consequential or common as diagnostic errors (which have received considerably less attention because they’re so hard to measure). Great performance on some measures – like ultra-tight glucose control or the four-hour door-to-antibiotics measure for pneumonia – was ultimately proven to be harmful to patients.

And, as long as many of the outcome measures (such as mortality and readmission rates) are judged based on “observed-to-expected” ratios, hospitals will find it a lot easier to improve their ranking by changing the “expected” number (through changing their documentation and coding) than by actually improving the quality of care. You can bet that every hospital vying for a Top Ten spot is working this angle vigorously (with the aid, of course, of pricey consultants), resulting in something of a coding arms race. Appropriate coding is important and it is worthwhile to truly document our patients’ severity of illness, say by writing “severe sepsis” rather than “sepsis” when it is clinically apt. But this effort to document every co-morbidity and to use words that will trigger higher expected mortality rates can border on Kafkaesque. One consultant recommends that clinicians chart “functional quadriplegia” (yes, it’s got its own ICD-9 code, 780.72) when describing a bedbound patient. I’m sorry, but that’s just silly.
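
A minimal sketch of why the denominator is the easier lever, assuming the "expected" count is simply the sum of each patient's predicted risk of death. The patient counts and risk values below are hypothetical, and real risk-adjustment models are far more elaborate than this.

```python
def oe_ratio(observed_deaths, predicted_risks):
    """Observed-to-expected ratio: actual deaths divided by the sum of predicted risks."""
    expected_deaths = sum(predicted_risks)
    return observed_deaths / expected_deaths


# HYPOTHETICAL cohort: 100 patients, 5 of whom died.
OBSERVED_DEATHS = 5

# As originally documented, every patient carries a 4% predicted risk (about 4 expected deaths).
baseline_risks = [0.04] * 100

# Same patients, same care, same deaths. The only change is that 20 charts now
# carry diagnoses that raise predicted risk to 9% (about 5 expected deaths).
recoded_risks = [0.04] * 80 + [0.09] * 20

if __name__ == "__main__":
    print(f"O/E before recoding: {oe_ratio(OBSERVED_DEATHS, baseline_risks):.2f}")
    print(f"O/E after recoding:  {oe_ratio(OBSERVED_DEATHS, recoded_risks):.2f}")
```

Under these assumptions the ratio falls from about 1.25 (worse than expected) to about 1.0 (exactly as expected) without a single patient receiving different care.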

It’s easy to point to the gaming and the potential for unfairness, and to dismiss rankings as a childish and wasteful enterprise, more Reality Show than science. To me, though, the upside far outweighs the downside. Ranking and public reporting do serve to motivate hospitals to take quality and safety seriously, and to invest in systems and people to improve them. The unintended consequences should become less prominent as we develop more robust measures and as we are all forced to report measures the same way – the latter should be a key goal of regulators and an important deliverable for IT vendors. At my hospital, this year’s dip will drive us to redouble our efforts to improve the care we deliver. For patients, that seems like a win.

And as for being #13, well, that still makes us the top ranked hospital within 300 miles of San Francisco. As I said, we healthcare folks are competitive souls.

Robert Wachter, MD, professor of medicine at UCSF, is widely regarded as a leading figure in the patient safety and quality movements. He edits the federal government’s two leading safety websites, and the second edition of his book, “Understanding Patient Safety,” was recently published by McGraw-Hill. In addition, he coined the term “hospitalist” in an influential 1996 essay in The New England Journal of Medicine and is chair-elect of the American Board of Internal Medicine.  His posts appear semi-regularly on THCB and on his own blog, Wachter’s World.

5 replies »

  1. I think the medical coding “arms race” as it were has existed in many different settings since the beginning. However, the hopes of the Medicare Shared Savings Program, ACOs, and alternative quality contracts being advanced by private payers are that we can at least somewhat correct the flawed incentives of FFS payment systems to allow providers to get off the FFS revenue treadmill, have more flexibility to redesign systems and processes, redeploy currently horrifically misplaced capacity in the system, and to bring quality incentives explicitly into the picture (instead of like, I don’t know, Medicare paying MORE for mistakes.)

  2. I don’t spend much time in hospitals, being a primary care physician, so this is more of a voyeuristic activity for me. My goal is for my patients to spend as little time in hospitals as possible as well, and when they are there I want them to stay for as short a time as possible, spending as little money as possible. I also want my patients to feel the staff cares about them and that the hospitalists see them as more than just a commodity.

    Hospital rankings are big money, as perception will drive business (even in a small market like ours). This means that, to me and to my patients, rankings are really about marketing and not about quality. The most financially successful hospitals will go for rankings that yield the best financial return and market the rankings they have in a way that gives the best ROI.

    So basically I think this is a vestige of a past system which emphasized what was best for doctors and hospitals, and in which the ultimate goal was consumption of resources. It gives little real guidance to patients who want good care. Yes, I do think it is important to make hospitals show their infection rates and quality numbers, but I’d say it’s more like a health department rating of a restaurant than it is a beauty pageant. It’s less “see how beautiful I am,” and more “I’m not as bad as the other guys.”

  3. I have spent time in a variety of settings where the healthcare quality measurement data has been highly suspect. On one particularly memorable occasion, I remember working through an interview case study for a fellowship at the Dartmouth Institute looking at outcomes and quality figures from some common and expensive procedures at some of the “top” systems and institutions in the country. I decided to run an ANOVA test on some of the relevant data they provided because something seemed suspect. We were looking at as little as 2-3% variance on some of the metrics collected, which was simply unbelievably consistent. I do realize that variance reduction is usually a major aspect of quality improvement efforts (the lean/six sigma black belt/ninja whatever approach), but this was just such an extreme case. I simply looked up at the interviewer and said, “This data has clearly been cleaned/scrubbed/tampered with before they sent it to us.” She simply nodded in agreement. Welcome to the future…

    I think the medical coding “arms race” as it were has existed in many different settings since the beginning. However, the hopes of the Medicare Shared Savings Program, ACOs, and alternative quality contracts being advanced by private payers are that we can at least somewhat correct the flawed incentives of FFS payment systems to allow providers to get off the FFS revenue treadmill, have more flexibility to redesign systems and processes, redeploy currently horrifically misplaced capacity in the system, and to bring quality incentives explicitly into the picture (instead of like, I don’t know, Medicare paying MORE for mistakes.)

    The beltway policy wonkish, technocrat, command and control explanation is basically this:

    1. Get facilities to use integrated, interoperable EMR systems (a herculean undertaking in itself)
    2. Get facilities used to doing consistent data capture, billing, pharmacy, etc.
    3. Get facilities to start using data to drive basic decisions and automate some care processes.
    4. Get facilities to use full-blown clinical decision intelligence systems to drive clinical decisions.

    That is the pipedream. I have been to Geisinger. I have seen what they can do and it is simply amazing. However, I honestly doubt many institutions could ever hold a candle to their facilities, for a variety of reasons: financial, managerial, or otherwise.

    Under any system short of “big box medicine” at integrated HMOs like Kaiser where both the financing and provision of care are housed within the same organization or single-payer explicit statutory rationing, providers will always have the ability to game separate payers in the system and transfer financial risk accordingly, since they are the ones collecting the administrative, billing, and financial information themselves. I still believe Kaiser was/is the future of American medicine, and we are about to see medical Armageddon unfold as big payers start reaching down and grabbing provider organizations while providers in new ACO agreements try to reach up and grab their own financing operations.

    However, I think at the very least this will just become part of a process to contain costs if providers end up gaming these quality based reimbursement systems as well. We are rapidly approaching a stage where the attitudes from all payers might well become:

    “Well, at least we eliminated FFS…the providers are obviously faking their quality information…cost shifting is getting under control…the incentives are much improved at least for those institutions that are actually trying…the quality (to the extent we can know) is not getting worse…but at least the maximum year-after-year increase will be general inflation + 1% now, etc.”

    We have to see how this all pans out, but it certainly is the end of the beginning rather than the beginning of the end.

  4. There is no question about the gaming of the system.

    I am a physician and have worked as a consultant (I don’t recall being ‘high-priced’ but then again, I was only an employee of the parent company) for one of the ratings organizations who also helped hospitals improve their quality.

    The outcome measures are really outdated in a fashion (30-day morbidity outcomes, readmission rates, etc., as opposed to much harder to obtain longer-term functional outcomes). But they are a first step for an industry that has not really had great outcomes accountability.

    That USNWR has finally changed from their 60% weighting on reputation is one of those no-brainers that used to have us gnashing our teeth (imagine trying to tell a hospital they had terrible objective quality outcomes when they were being told how great they are by a major magazine).

    No question about the whole complications/expected complication ratio issue. There is a lot of gaming that people try in managing a patient’s risk profile and how likely someone is to suffer a complication. All sorts of discussions about coding take place and it can really start to feel like it’s all about manipulating the system.

    The major comfort I could always take though was this – understanding what a patient’s pre-existing condition was, accurately capturing it, passing the information on, and most importantly, developing a system to use this information to intervene in care PREoperatively and then have a coordinated plan POSToperatively actually made changes in quality outcomes. And this was real improvement in care and communication, something that occasionally lagged in the numbers but that patient satisfaction numbers could actually show with greater immediacy and feedback.

    The system is far from perfect. I have heard physicians trash the ratings systems and point out that they wouldn’t take their dog to the higher ranked local hospital nearby because they knew intuitively that care was worse there. But there are plenty of big name, powerhouse academic centers with bad numbers who simply don’t believe that they are not as great as they thought and believe the solution is to dismiss the data (I’m not actually referring to UCSF but a few other facilities).

    For these hospitals, new and better systems for assessing, capturing, and communicating data are a great wake-up call.

  5. This post is Exhibit A for why my new book is entitled Why Nobody Believes the Numbers, and why it’s been endorsed by a founder of the Leapfrog Group and a past head of CMS, who agree that shocking amounts of data are somewhere on a spectrum between misleading and wrong.

    My only complaint about your post is that you’re too nice. This industry is plagued by a combination of bias amongst those who compile the data and mind-boggling innumeracy not just on the part of the average reader but also of the consultants the average reader relies on.

    While this posting was about hospital quality, the need to improve reporting is equally acute in population-based studies. For instance, Kaiser published a peer-reviewed journal study showing a 56-to-1 ROI and a dramatic reduction in non-cardiac mortality from pharmacist consults about using more cardiac medications. A careful read showed that the study was full of so many internal contradictions that it basically invalidated itself…but nobody noticed.

    I have stopped procuring wellness and disease management from companies that don’t have Letters of Validation from 3rd Parties in which the 3rd Party itself guarantees that the data is correct, accepting recourse against themselves if a significant mistake is found.