By PRANAV PURI, PUNEET KAUR, and MARCUS WIGGINS, MBA
As current medical students, we are living through the most significant healthcare crisis of our lifetimes: the ongoing COVID-19 pandemic. COVID-19 has upended nearly every element of healthcare in the United States, including medical education. The pandemic has exposed shortcomings in healthcare delivery, ranging from the care of nursing home residents to the lack of interoperable health data. It has also exposed shortcomings in the residency match process.
Consider the United States Medical Licensing Examination (USMLE) Step 1. In a 2018 survey of residency program directors, USMLE Step 1 scores were cited as the most important factor in selecting candidates to interview. Moreover, program directors frequently apply numerical Step 1 score cutoffs to screen applicants for interviews. As a result, mean Step 1 scores vary markedly across clinical specialties. For example, in 2018, US medical graduates who matched into neurosurgery had a mean Step 1 score of 245, while those who matched into neurology had a mean Step 1 score of 231.
One would assume that, at a minimum, Step 1 scores are a standardized, objective measure that statistically distinguishes applicants. Unfortunately, this does not hold true. In its score interpretation guidelines, the National Board of Medical Examiners (NBME) provides Step 1's standard error of difference (SED) as an index for determining whether the difference between two scores is statistically meaningful. The NBME reports an SED of 8 for Step 1. Assuming Step 1 scores are normally distributed, the 95% confidence interval of a Step 1 score can thus be estimated as the score plus or minus 1.96 times the SED, or about 15.7 points (Figure 1).

For example, consider Student A, who is interested in pursuing neurosurgery and scores 231. The 95% confidence interval of this score would span from 215 to 247. Now consider Student B, who is also interested in neurosurgery and scores 245. The 95% confidence interval of this score would span from 229 to 261. The confidence intervals of these two scores clearly overlap. More directly, the 14-point gap between the two scores is smaller than 1.96 times the SED (about 15.7 points), so by the NBME's own index there is no statistically significant difference between Student A's and Student B's exam performance. If these exam scores represented the results of a clinical trial, we would describe the results as null and dismiss the difference in scores as mere chance.
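The arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming (as the text does) that the NBME's SED of 8 is used as the standard error and that errors are normally distributed; the function names `confidence_interval` and `meaningful_difference` are hypothetical and not part of any NBME tool.

```python
Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution
SED = 8      # Step 1 standard error of difference, as reported by the NBME


def confidence_interval(score, se=SED, z=Z_95):
    """Return the approximate 95% interval (low, high), rounded to whole points.

    Hypothetical helper: follows the article's approach of using the SED
    as the standard error of a single score.
    """
    margin = z * se
    return round(score - margin), round(score + margin)


def meaningful_difference(score_a, score_b, sed=SED, z=Z_95):
    """Return True if the gap between two scores exceeds z * SED.

    Hypothetical helper: compares the score difference directly against
    the SED, which is how the NBME index is meant to be applied.
    """
    return abs(score_a - score_b) > z * sed


if __name__ == "__main__":
    student_a, student_b = 231, 245
    print(confidence_interval(student_a))                # (215, 247)
    print(confidence_interval(student_b))                # (229, 261)
    print(meaningful_difference(student_a, student_b))   # False: 14 < 15.68
```

The second function is the more direct check. Rather than eyeballing whether two intervals overlap, it asks whether the 14-point gap exceeds 1.96 × 8 = 15.68 points. Under these assumptions, it does not.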