Quality is the new watchword in healthcare; it’s what we seek – and increasingly, what we try to measure. Medications, devices, care delivery, hospital services – all are now scrutinized as we seek to gauge their benefit, and justify their cost.
The idea of using metrics to evaluate quality makes sense, but only if we can trust the metrics themselves. Otherwise, we risk becoming party to an updated version of craniometry, systematized false precision that focuses on easily measurable parameters (such as head circumference) that may not represent meaningful proxies for the assessments we’re really after (i.e., intelligence).
The good news is that the science of testing, of developing evaluation instruments, has improved over time. We’re now better able to recognize the qualities and properties of good tests – and to identify where they’re likely to fall short.
We’re also getting more comfortable with demanding robust evaluation instruments. For example, the FDA’s approach to patient-reported outcomes places exceptional (and appropriate) emphasis on the assessment tool chosen, and requires that the tool demonstrate the appropriate properties before its results can be relied upon.
Unfortunately, one critically important area within our healthcare system that seems to have escaped such careful review is the way the competence of care providers is typically assessed and certified.
Whether you are an X-ray technician, a physical therapist, a registered nurse, or a transplant surgeon, you are required to pass through a gauntlet of costly certification exams. These tests, already significant, are assuming even greater importance as the healthcare system increasingly looks to them as proxies for quality. Certification can be required for employment and for admitting privileges, and frequently affects the reimbursement rate for healthcare providers.
All this makes complete sense – provided the certification tests themselves are sound.
Unfortunately, the world of healthcare worker certification remains a bit like the wild west, as medical organizations and professional societies approach certification testing with profoundly different degrees of rigor — and generally little-to-no transparency.
An assessment test, fundamentally, should be judged by two criteria: reliability and validity.
Reliability refers to the intrinsic quality of the measurement instrument itself: Does it yield consistent results, independent of who is scoring it? Would the same person, if administered the test on different occasions, likely receive the same result?
Validity requires reliability, but represents a higher bar, and speaks to the relevance of the test. Does the exam measure what you think it does? Does a healthcare worker certification test accurately predict competence?
With the right tape measure and technique, for example, head circumference might represent an extremely reliable instrument for the measurement of intelligence, but obviously would not represent a valid tool.
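The distinction can be made concrete with a small simulation. What follows is only an illustrative sketch, not a description of how any real certification exam is evaluated: every number is synthetic, the variable names are hypothetical, and a simple Pearson correlation stands in for the fuller toolkit that test developers actually use. It treats test-retest correlation as a rough stand-in for reliability and correlation with an independently generated outcome as a rough stand-in for validity.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the simulation is repeatable
n = 2000

# Synthetic, made-up data (an assumption for illustration only):
# a stable physical trait measured twice with a small measurement error,
# and an outcome drawn independently of that trait.
trait = [rng.gauss(56.0, 2.0) for _ in range(n)]
first_measurement = [t + rng.gauss(0.0, 0.2) for t in trait]
second_measurement = [t + rng.gauss(0.0, 0.2) for t in trait]
outcome = [rng.gauss(100.0, 15.0) for _ in range(n)]

# Reliability proxy: do repeated measurements of the same people agree?
test_retest_reliability = pearson(first_measurement, second_measurement)

# Validity proxy: does the measurement track the outcome we care about?
validity_coefficient = pearson(first_measurement, outcome)

print(f"test-retest reliability:  {test_retest_reliability:.2f}")
print(f"correlation with outcome: {validity_coefficient:.2f}")
```

Under these assumptions the two measurements agree almost perfectly with each other while telling us essentially nothing about the outcome, which is the tape-measure problem in miniature: consistency alone says nothing about relevance.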
High-stakes, widely administered standardized tests such as the SAT and MCAT are generally viewed as reliable instruments, although with limited validity; they are best at predicting first-year performance (generally less accurately than high school or college grades, respectively), as well as performance on subsequent standardized exams.
As you move into tests that are given to more specialized groups of people, not only is it even harder to find any meaningful evidence for relevant validity, but even the reliability of the instruments is often (though not always) uncertain, or inadequately substantiated.
As a result, professional societies administer – and pocket hefty fees from – tests that may not be reliable, that are used as proxies for quality yet may not forecast real-world performance better than chance, and that may not be able to meaningfully distinguish between examinees. The competent may be labeled as unqualified, while the dangerous may be accepted as competent.
Often, there is little incentive for those who administer the tests to put in the effort required to produce a reliable instrument. To this point, most stakeholders have been content to take the word of professional societies regarding who is and isn’t qualified, assuming these expert organizations are best positioned to make this determination.
As our healthcare system has come under increased scrutiny, however, one of the great lessons is that we no longer need to accept the word of authority figures, and should insist on transparency, on verifying the data for ourselves.
We now know enough about assessment instrument development that we can demand any professional society offering (and charging steep fees for) certification exams provide data demonstrating these exams are at least reliable instruments – and beyond that, correlate in some way with the outcome we seek. It can no longer be acceptable for certification organizations to hide behind a fortress of assumed authority.
Guilds such as medical societies fundamentally can select their members any way they choose, and can pick their own inclusion criteria, whether arbitrary or rigorously validated. Consequently, responsibility for ensuring the certification process uses reliable measures that robustly correlate with real-world performance likely rests outside the guilds, with those who must decide how much value to place upon guild certification.
At a minimum, no hospital or health system should use certification as a basis for employment for any healthcare worker unless there’s clear evidence the certification itself is meaningful; payors (including the government) should similarly decline to provide a pay differential for certified vs non-certified healthcare workers without clear evidence the certification process is robust, and demonstrably correlated with quality.
It’s concerning to contemplate how much federal money may flow to healthcare workers based on potentially flawed certification processes, and how many patients may be routinely exposed to workers who are not appropriately qualified.
Used properly, metrics – including, but in no way limited to, certification testing – can significantly improve the quality of our health, provided the measures used are rigorously developed and transparently validated.
But numbers for their own sake, without an appropriate foundation of support, are worthless. Actually, they’re worse than that: they’re harmful, as they obscure rather than illuminate, and make true quality even more difficult to discern.
David Shaywitz is co-founder of the Center for Assessment Technology and Continuous Health (CATCH) in Boston. He is a strategist at a biopharmaceutical company in South San Francisco. You can follow him at his personal website. This post originally appeared on Forbes.