The Real Problem with Board Exams and How to Solve It

This week there’s been a debate brewing about why so many young doctors are failing their board exams. On one side John Schumann writes that young clinicians may not have the time or study habits to engage in lifelong learning, so they default to “lifelong googling.” On the other, David Shaywitz blames the tests themselves as being outmoded rites of passage administered by guild-like medical societies. He poses the question: Are young doctors failing their boards, or are we failing them?

The answer is: (C) All of the above.

I can say this with high confidence because as a young doctor-in-training who just completed my second year of medical school, I’ve become pretty good at answering test questions. Well before our White Coat Ceremonies, medical students have been honed into lean, mean, test-taking machines by a series of now-distant acronyms: AP, SAT, ACT, MCAT. Looming ahead are even more acronyms, only these are slightly longer and significantly more expensive: NBME, COMLEX, USMLE, ABIM. Even though their letters and demographics differ, what each of these acronyms shares is the ability to ideologically divide a room in less time than Limbaugh.

This controversy directly results from the clear dichotomy* between the theory behind the exams and their practical consequences. In theory these exams do serve necessary and even agreeable purposes, including:

1)     Ensuring a minimum body of knowledge or skill before advancing a student to the next level in her education,

2)     Providing an “objective” measure to compare applicants in situations where demand for positions exceeds supply.

So apart from the common, albeit inconvenient, side effects that students experience (fatigue, irritability, proctalgia), what are the problems with these tests in practice? These are five of the core issues that are cited as the basis for reformations to our current examination model:

1)     Lack of objectivity. Tests are created by humans and thus are inherently biased. While they aim to assess a broad base of knowledge or skills, performance can be underestimated not due to a lack of this base but due to issues with the testing format, such as duration, question types, and scoring procedure (e.g. the SAT penalizes guessers, whereas the ACT does not). Just as our current model of clinical trial testing is antithetical to personalized medicine (What is a standard dose? Or, more puzzlingly, a standard patient?), our current model of testing does not take into account these individual differences.

2)     Lack of correlation to future performance. Bad test-takers can be good doctors and good test-takers can be bad doctors. I imagine that an exam might have an identity crisis if it looked at itself in the mirror and asked, “How many good potential doctors am I screening out by mistake?” This is the basis for the MCAT’s fifth revision in its 84-year history, which the CEO of the AAMC, Darrel Kirch, explains has “been done with a very clear eye toward the changes that are occurring in health care and the kinds of physicians we will need.”

3)     The Stepping Stone Myth. Students view each exam as an obstacle that has to be overcome to get to the next step, rather than as an opportunity to assess their knowledge and skills so they can fill in gaps where needed. This perspective over-instrumentalizes the instrument and is the basis for counterintuitive findings, such as the observation that hyper-specialized physicians (e.g. dermatologists and orthopedic surgeons) have higher USMLE scores than primary care physicians, who arguably need a broader knowledge base.

4)     A bloated test-prep industry. Students are responsible for two curricula: the official curriculum offered by the institution at which they receive their degree, and the unofficial curriculum offered by thriving test-prep companies. For example, the refusal of many institutions to “teach to the test” has led to student reliance on dozens of companies selling overpriced review books, lecture videos, and question banks (it’s common for already debt-ridden medical students to spend $100 per month for a subscription to one question bank – six times more than a Netflix account!). As is the case with the publishing industry, the irony is that medical students and professors produce many of these test-prep resources and hand them over to companies that overcharge their classmates and students.

5)     Cycles of cramming and forgetting. During my first semester of medical school, a professor was referring to the binge-purge cycles of medical knowledge and introduced me to an accurate, if not-exactly-politically-correct, description of medical school: “academic bulimia.” This is perhaps the worst consequence of our test-filled path to becoming physicians. Monthly block exams and yearly Step exams do not promote lifelong learning, and instead lead to lifelong cramming – and forgetting.

Shaywitz and a number of THCB readers may discount the importance of this last point because, as he writes, the “ability to regurgitate information is less important than [the] ability to access data and intelligently process it.” After all, medical knowledge is not only growing exponentially but is also dynamic, so it would be an impossible task for any human to truly keep pace and absorb all of the information. But I don’t believe this is sufficient ground to discount the value of the tests, especially as they relate to securing a basic foundation of medical knowledge.

As an editor of Medgadget and a tech enthusiast (seriously: I recently wrote here about doing a physical exam with a smartphone), I certainly would like to see clinical decision support tools like IBM’s Watson develop to the point where they can supplant the need for human cognition, both with respect to memory and reasoning. But, as Shaywitz correctly points out, it’s important to understand the limitations of any technology, especially where lives are concerned.

Problems may result not only from the technology itself (remember the HAL computer** from 2001: A Space Odyssey?), but specifically from our reliance on the technology: in this case, being able to look up information such as drug side effects and physical findings anytime and anywhere. Overreliance on technology can lead to complacency, inefficiency, and errors. Take for example the basic skills of spelling, reading, and math. Regarding complacency, many students have a defeatist mentality when it comes to spelling because checkers are built into word processors. In terms of inefficiency, it would be incredibly time-consuming to read anything of substance without an essential foundation of vocabulary, even if you had the most advanced or automated dictionary. Back to medicine, THCB reader Dr. Leora Horwitz summarized the situation well in a comment on Shaywitz’s article: “At 20 min per patient (generously), I really need a pretty comprehensive and accurate fund of knowledge that I can access without doing a lot of real-time looking up.” And in terms of errors, students who grow used to calculators often do not double-check the output to make sure it makes sense (this error is still the basis for many drug dosing errors in health care). Technologies of convenience often become technologies of dependence.

Thus, even though board exams may seem to be outdated rites of passage, I believe they will remain important checkpoints that ensure young clinicians devote enough time to developing a fundamental body of medical knowledge. This foundation will be essential to the appropriate interpretation of outputs from any technologies they may use. Where technology can undeniably help clinicians, and the subject I have recently become obsessed with, is in improving the efficiency of learning and retention of key medical concepts – even after the board exams are over. While these exams may be outdated, even more anachronistic is the method by which we train our clinicians, which is not substantially different from the system Abraham Flexner espoused more than a century ago. It will be of critical importance that we develop new technologies and processes that improve lifelong learning and retention. The ability to solve this problem for medical students, where the stakes are arguably the highest, has ramifications for other areas of education as well.

Shiv Gaglani is co-founder of the medical education technology company Osmosis. An editor of Medgadget, he is currently an MD/MBA candidate at the Johns Hopkins School of Medicine and Harvard Business School.

* In theory given his perfect SAT score he should be more than ready for college, but in practice he can’t tie his shoelaces.

** HAL is an alphabetical shift of the IBM acronym; hopefully not a portent for health care technology.

8 replies »

  1. The boards say that they test judgment, factual knowledge, and how to use it. Many physicians feel what they test is irrelevant to their practices. There is a lack of hard data that board exams, and especially maintenance of certification, accomplish anything except revenue generation for the boards. It is likely that closed-book exams are anachronistic in the era of smartphones with immediate internet access.

  2. Prep tests don’t cause deficiencies in adaptability; rather, they make it more difficult for me to assess future hires. High test scores mostly reflect wanting to have high test scores. Being nosy, I have asked some of my guys with the test score/performance discrepancy, and they did take prep courses. Of course, I don’t know the denominator, so that should be taken with a grain of salt.

    What I hope people are learning, and I am not sure it can be learned at the med student level, is how to deal with ambiguity and uncertainty, especially coupled with time pressures. If you routinely take an extra standard deviation or two of time more to do stuff compared with other people, I cannot afford to have you on my staff.


  3. Thanks for the comment, Whatsen. Is your experience based on most EMRs or are there examples you’ve seen where the technology has been seamlessly and usefully integrated?

  4. Hi Bubba, I’m interested as to why that was the main take-away for you, but thanks for raising it because it inspired me to do some reading on bias in computer systems: http://vsdesign.org/publications/pdf/64_friedman.pdf. Algorithms can certainly be weighted to reflect the biases of their human creators (read: almost every dating website), but once a relatively unbiased and non-adapting algorithm is put into place it’ll consistently be unbiased. We know this because we understand exactly how the math we devise and the computers we build work. Conversely, we don’t understand our brains to this level, which is reflected by the depth of the literature about how we may reduce errors linked to human subjectivity.

  5. Thanks Steve. I’m interested in hearing your thoughts about how prep companies may be partly responsible for deficiencies in adaptability. Regarding the point about board scores not necessarily correlating with ability to function in new situations, I do see some positive signs that medicine is incorporating these. In my training we’ve had formative assessments that replicate crisis conditions, which have been really useful.

  6. The boards test judgment, factual knowledge, and how to use it in diagnosing and treating disease.

    The new EHR devices, displaying screens of legible gibberish, deceptive presentations of data, ordering which requires too many clicks, idiosyncratic and whimsical functionality, and errorgenic potential, absorb much time from the residents.

    But after all, this time is well spent learning how to treat the newest scourge on our patients, which is causing innumerable deaths and injuries either directly due to design defects, delays, interoperability failures, system unavailability, or cognitive disruption.

    It is no surprise that doctors and nurses are being dumbed down by the EHR devices.

  7. “Tests are created by humans and thus are inherently biased.”

    In the event that you are unaware of this fact, computers are created by humans, not some futuristic space intelligent-life form and thus can also be said to be biased …

  8. Given the wide disparity in training programs, I think we need some means to have grads demonstrate that they have achieved some minimally acceptable level of knowledge. The tests can serve that function. I will have to say that in my hires I don’t see that board scores are necessarily correlating with competence and ability. Some of my recent hires with very high board scores are completely unable to function in a crisis or in a situation they have never seen before. I have to wonder if this isn’t partly due to the prep companies.