Making a diagnosis is easy when the test we use to make the diagnosis defines the disease. These sorts of tests, called “reference-standard” tests, make the diagnosis whenever they are present, at any level of the test’s result. A spinal fluid culture growing Listeria, or opioids in the urine, are examples.
Using reference-standard tests in clinical medicine, however, is not the norm. Reference-standard tests often don’t exist, and when they do, they may be dangerous, difficult to obtain, and costly. Hence, we most often use non-reference-standard tests that can only raise or lower the likelihoods of diseases. There is nothing particularly new in these comments. Every reader will know concepts such as the “sensitivity and specificity” of a test. Every reader will remember hearing about, or be able to construct, 2 × 2 tables showing the sensitivity of a test and the corresponding false-negative percent, and the specificity of the test and the corresponding false-positive percent.
But, despite the ever-present teaching of how tests “work,” it is my experience that physicians and patients have difficulty using the measures of a test’s value in clinical care. This difficulty is manifest in the observation that diagnostic mistakes may be common, and a perceived diagnostic mistake is the inciting event in up to 40% of malpractice cases. If the conceptual ideas behind test characteristics are so clear and so well taught, why is there so much difficulty in using tests to make a correct diagnosis? I contend that the way we teach and understand testing has not allowed us to advance an ideal, numerate approach to making an accurate diagnosis. I claim, also, that the concept of a single “sensitivity and specificity” for a test is suspect, even incorrect.
Such a claim, given the ingrained history of teaching about tests, will take some clear arguing. First, the terms used to describe the value of a test result are based on designating the result as positive or negative. This dichotomy produces the familiar terms: sensitivity (the true-positive rate), specificity (the true-negative rate), 1 − sensitivity (the false-negative rate), and 1 − specificity (the false-positive rate). Hence, the dichotomous categorization of a test’s result sets up a 2 × 2 clinical world. Physicians are forced to truncate a differential diagnosis to two diagnoses in order to fit this dichotomous world of positive/negative test results. This dichotomy, in turn, has led to test characteristics being taught as a single set of numbers for an individual test. But clinicians face more than two potential disease conditions that may explain a patient’s symptoms and signs, and, in fact, there is no dichotomy for a test’s result. Test results are just what they are, a result, and every level of the test’s value carries a different set of numbers with it. There is no single set of numbers for a test’s result; there are as many sets of numbers as there are individual results of a test. Hence, the conceptual idea of a positive or negative test underestimates the information contained in a test result. A relevant analogy is the p-value, which we artificially deploy as indicating a positive or negative result.
Designating a test as positive/negative requires a “cut-off,” or threshold value, for the test. A test result was never intended to be “ordained” as positive or negative; it is merely a number that is open to interpretation. The idea that a test is either true or false took seed in signal-detection theory in the 1940s, in radar and psychometrics. A signal on a radar screen may be an enemy aircraft or a flock of birds, and training in radar detection aimed to increase true detections of the enemy while reducing retaliatory barrages at our feathered friends (falsely detected as the enemy). In diagnostic testing, values above the cut-off are positive (enemy) and values below are negative (birds). However, as in radar detection, some signals are stronger than others, and the ability to distinguish true from false depends on the strength of the signal. For example, prostate-specific antigen (PSA) values of 8 ng/mL and 60 ng/mL are both “positive” tests, as both exceed the cut-off value used to designate the test as positive/negative. But if the sensitivity and specificity assigned to the cut-off value of PSA are used to classify patients with these specific laboratory values, important information about the ability to detect cancer is lost [kattan].
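As a numerical sketch of this point, consider two result levels that both sit above a cut-off yet carry very different diagnostic weight. Every sensitivity below is invented purely for illustration; these are not real PSA test characteristics.

```python
# Hypothetical sketch: two "positive" results at different levels carry
# different information. The sensitivities here are invented for
# illustration; they are NOT real PSA test characteristics.

# How often each exact result level occurs in cancer and in a
# hypothetical benign condition (the sensitivity of that level).
sens = {
    8:  {"cancer": 0.30, "benign": 0.25},   # weak signal
    60: {"cancer": 0.08, "benign": 0.001},  # strong signal
}

# Ratio of sensitivities: how much each exact level multiplies the
# odds of cancer relative to the benign condition.
ratios = {level: s["cancer"] / s["benign"] for level, s in sens.items()}

print(round(ratios[8], 2))   # 1.2  -> barely moves the odds
print(round(ratios[60], 2))  # 80.0 -> moves the odds dramatically
```

Collapsing both values into a single “positive” category assigns them the same set of numbers, even though one barely changes the odds of cancer and the other changes them dramatically.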
There are, then, different sensitivity numbers for each level of a test’s result. And if we change our thinking about dichotomies, the term “specificity” is not needed: a false-positive test result is really just the sensitivity of that test result for a different clinical condition. No term other than sensitivity is needed when describing tests. The crucial information in clinical care is (1) the absolute level of the test result, whether low or high, not the cut-off value, and (2) the sensitivity of that particular laboratory value for each disease in the differential diagnosis.
The sensitivity of a test result will vary, then, by the level of the result and by the potential disease conditions a patient may have. Tables 1, 2, and 3 illustrate this. A hypothetical list of potential diseases appears in the first column of Table 1, and the sensitivities of a hypothetical test at two result levels (moderately high and very high) for each disease are in the second and third columns. For any disease, a lower test value will be seen more often in diseased people (higher sensitivity); a higher value will occur less frequently, that is, with lower sensitivity. But the important relationship between sensitivities, at any test level, is their ratio. We can construct a ratio of sensitivities for any of the competing disease entities at a specific test value. For example, in Table 2, since the moderately elevated result in the second column is seen in 60% of patients with disease A and in 10% of those with disease B, that result increases disease A’s odds sixfold (60%/10%) relative to disease B’s odds. The last column of the table shows how the percent chances of the disease conditions are altered by a result at that level. The other two hypothetical disease entities, C and D, are not re-ordered much by the test result, because the sensitivity of the test finding is nearly the same in each.
Table 3 tells a different story. The ratios are different for the test result in the third column, which is more abnormal. While only 5% of patients with disease A have the more abnormal result, that level is not seen in the other disease states. These tables show that some signals (test values) are definitive; others muddle.
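The arithmetic behind Tables 2 and 3 can be sketched in a few lines of code. The pre-test probabilities below are hypothetical, and the sensitivities are patterned on the text’s example (60% for disease A versus 10% for disease B at the moderately elevated level); the point is only that multiplying each disease’s prior probability by the sensitivity of the exact result, then renormalizing, re-orders the whole differential at once.

```python
# Hedged sketch of re-ordering a differential diagnosis using only
# the sensitivity of a specific test result for each disease.
# All numbers are hypothetical, patterned on Tables 1-2 of the text.

def update(priors, sensitivities):
    """Multiply each disease's prior probability by the sensitivity of
    the observed result for that disease, then renormalize to 100%."""
    weighted = {d: priors[d] * sensitivities[d] for d in priors}
    total = sum(weighted.values())
    return {d: w / total for d, w in weighted.items()}

# Hypothetical, equal pre-test probabilities for four competing diseases.
priors = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# Sensitivity of the moderately elevated result for each disease
# (60% for A vs. 10% for B, as in the text; C and D are nearly equal,
# so their relative order barely changes).
sens_moderate = {"A": 0.60, "B": 0.10, "C": 0.12, "D": 0.12}

post = update(priors, sens_moderate)
# Disease A's odds relative to B rise sixfold (0.60 / 0.10), and the
# posteriors for C and D remain equal to each other.
```

With these assumed equal priors, disease A rises to about 64% of the total probability after this single result. The same function handles a differential of any length, which is the essay’s point: one test parameter re-orders the entire list, with no positive/negative dichotomy anywhere in the calculation.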
Presently, however, it is difficult to find the sensitivity for every value of a diagnostic test, because doing so requires papers to report the distributions of test values in cohorts of patients with different, defined diseases. Instead, a single summary sensitivity/specificity pair is presented, which limits our understanding of the information contained in the actual test results we obtain in our patients. Even when researchers address this issue of variable test results, they often report “scores” rather than sensitivities for specific results [kattan]. Some report receiver operating characteristic (ROC) curves to depict the test’s characteristics at every cut-off point. But these curves do not tell us the specific test values for every point on the curve, so we cannot use them for the bedside care of a patient. We do not need to carry scores or summary curves for all values of a test when we see patients; we only need the sensitivity of the exact test result.
Another implication of this change in thinking is that, if we only have to know the sensitivity of a test result, we can rid ourselves of thinking of the diagnostic process as disease/no-disease. Importantly, knowing the sensitivity of the result for all diagnoses that may cause a patient’s complaints allows us to refine even a long list of diagnostic possibilities. Table 2 illustrates how using only the sensitivity of a test result (in this example, the moderately high result from the second column of Table 1) allows us to manage more than two disease possibilities at the same time. In fact, we can re-order any number of diagnoses with a single test parameter.
This change in thinking about diagnostic tests simplifies testing and diagnosis. Only one term is needed; clinicians would have to focus only on the specific test results in their patients; and they could more readily use a single test result to re-order and manage longer lists of potential diagnoses. If we accept that the only thing that matters in clinical care is the sensitivity of a specific test result for all competing diseases or conditions that may explain a patient’s complaint, we will think less about disease/no-disease and more about how a specific result changes the likelihoods of the underlying disease possibilities. If we accept this new convention of testing, researchers will follow the clinician’s lead and improve the way their studies evaluate and communicate test results. Ridding ourselves of the concept of a positive/negative test, and focusing instead on the information contained in the specific findings in our patients, may improve our ability to make diagnoses.
Reference: Elstein AS. Thinking about diagnostic thinking: a 30-year perspective. Adv Health Sci Educ. 2009;14:7-18.
Robert McNutt, MD, is a board-certified internist in Clarendon Hills, Illinois. He is a Professor at Rush Medical College of Rush University.