
Rethinking Our Thinking About Diagnostic Tests: There Is Nothing Positive or Negative About a Test Result

Making a diagnosis is easy if the test we use to make the diagnosis defines the disease. These tests, called "reference-standard" tests, make the diagnosis when present at any level of the test's result. A spinal fluid culture growing Listeria, or opioids in the urine, are examples.

Using reference-standard tests in clinical medicine, however, is not the norm. The reason is that reference-standard tests often don't exist, and when they do they may be dangerous, difficult to obtain, and costly. Hence, we most often use non-reference-standard tests, which can only raise or lower the likelihoods of diseases. There is nothing particularly new in these comments. Every reader will know concepts such as the "sensitivity and specificity" of a test. Every reader will remember hearing about, or be able to construct, 2x2 tables showing the sensitivity of a test, the corresponding false-negative percent, the specificity of the test, and the corresponding false-positive percent.

But, despite the ever-present teaching of how tests "work," it is my experience that physicians and patients have difficulty using the measures of a test's value in clinical care. This difficulty shows in the observation that diagnostic mistakes may be common, and the perceived mistake is the inciting event in up to 40% of malpractice cases. If the conceptual ideas behind test characteristics are so clear and well taught, why is there so much difficulty in using tests to make a correct diagnosis? I contend that the way we teach and understand testing has not allowed us to advance an ideal, numerate approach to accurately making a diagnosis. I claim, also, that the concept of a single "sensitivity and specificity" for a test is suspect, even incorrect.

Such a claim, given the ingrained history of teaching about tests, will take some clear arguing. First, the terms used to describe the value of a test result are based on a designation of the result as positive or negative. The dichotomy produces the following terms: sensitivity (true-positive rate), specificity (true-negative rate), 1 − sensitivity (false-negative rate), and 1 − specificity (false-positive rate). Hence, the dichotomous categorization of a test's result sets up a 2x2 clinical world. Physicians are forced to truncate a differential diagnosis to two diagnoses in order to fit this dichotomous world of positive/negative test results. This dichotomy, in turn, has led to test characteristics being taught as a single set of numbers for an individual test. But clinicians face more than two potential disease conditions that may explain a patient's symptoms and signs, and, in fact, there is no dichotomy in a test's result. A test result is just what it is, a result, and every level of the test's value carries a different set of numbers with it. There is no single set of numbers for a test's result; there are as many sets of numbers as there are individual results of a test. Hence, the concept of a positive or negative test underestimates the information contained in a test result. The relevant analogy is the p-value, which we artificially deploy as indicating a positive or negative result.

Designating a test as positive/negative requires a "cut-off," or threshold, value for the test. A test result was never intended to be "ordained" as positive or negative. A test result is merely a number that is open to interpretation. The idea that a test is either true or false took seed in signal-detection theory in the 1940s, in radar and psychometrics. A signal on a radar screen may be an enemy or a flock of birds, and training in radar detection aimed to increase true detections of the enemy while reducing retaliatory barrages at our feathered friends (falsely detected as enemy). In diagnostic testing, values above the cut-off value are positive (enemy) and values below are negative (birds). However, as in radar detection, some signals are stronger than others, and the ability to distinguish true from false depends on the strength of the signal. For example, prostate-specific antigen (PSA) values of 8 ng/mL and 60 ng/mL are both "positive" tests, as both are greater than the cut-off value used to designate the test as positive/negative. But if the sensitivity and specificity assigned to the cut-off value of PSA are used to classify patients with these specific laboratory values, important information about the ability to detect cancer is lost [Kattan].
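To make the loss concrete, here is a minimal numeric sketch (in Python, with entirely invented frequencies, since no real PSA distributions are given here) of how a single cut-off hides signal strength: if we keep the actual result band instead of collapsing everything above a 4 ng/mL cut-off into "positive," each band carries its own likelihood ratio for cancer.

```python
# Toy illustration of information lost by dichotomizing at a cut-off.
# The fractions below are invented for illustration, not real data:
# they are the assumed share of patients, with and without prostate
# cancer, whose PSA falls in each result band above a 4 ng/mL cut-off.
band_freq_cancer = {"4-10": 0.40, "10-20": 0.15, ">20": 0.10}
band_freq_no_cancer = {"4-10": 0.20, "10-20": 0.03, ">20": 0.005}

for band, sens in band_freq_cancer.items():
    lr = sens / band_freq_no_cancer[band]
    print(f"PSA {band} ng/mL: likelihood ratio ~{lr:.0f}")

# Output: LR ~2 for 4-10, ~5 for 10-20, ~20 for >20. All three bands
# are "positive" at the cut-off, yet they differ ten-fold in how
# strongly they shift the odds of cancer.
```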

There are, then, different sensitivity numbers for each level of a test's result. And if we change our thinking about dichotomies, the term specificity is not needed. A false-positive test result is really just the sensitivity of that test result for a different clinical condition. There is no need for any term other than sensitivity when describing tests. The crucial information in clinical care is 1) the absolute level of the test result, whether that value is low or high, not the cut-off value, and 2) the sensitivity of that particular laboratory value for each disease in the differential diagnosis.

The sensitivity of a test result will vary, then, by the level of the test result and the potential disease conditions a patient may have. Tables 1, 2, and 3 illustrate this. A hypothetical list of potential diseases appears in the first column of Table 1, and the sensitivities of the hypothetical test at two result levels (moderately-high and very-high) for each disease are in the second and third columns. For any disease, a lower value of a test result will be seen more often in diseased people (higher sensitivity); a higher value will occur less frequently, or, in other words, have a lower sensitivity. But the important relationship between sensitivities, at any test level, is their ratio. We can construct a ratio of sensitivities for any of the competing disease entities at the specific test value. For example, in Table 2, since the moderately-high test value in the second column is seen in 60% of patients with disease A and in 10% of those with disease B, that test result will increase disease A's odds 6-fold (60%/10%) relative to disease B's odds. The last column of the table shows how the percent chances of the disease conditions are altered by a test result at that level. The other two hypothetical disease entities, C and D, are not re-ordered much by the test result, because the sensitivity of the test finding is nearly the same in each.
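As a sketch of the arithmetic behind Table 2, the snippet below re-orders a four-disease differential using only the sensitivity of the observed result for each disease. Only the 60% and 10% figures come from the text; the priors and the values for diseases C and D are assumed for illustration, and the calculation assumes the listed diseases are mutually exclusive and cover the differential.

```python
def update_differential(priors, sensitivities):
    """Re-order a differential diagnosis by Bayes' rule: multiply each
    disease's pre-test probability by the sensitivity of the observed
    test result for that disease, then normalize over the differential."""
    unnormalized = {d: priors[d] * sensitivities[d] for d in priors}
    total = sum(unnormalized.values())
    return {d: p / total for d, p in unnormalized.items()}

# Assumed equal pre-test probabilities for the four hypothetical diseases.
priors = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# Sensitivity of the moderately-high result for each disease. The 0.60
# and 0.10 figures are from the text; 0.30 for C and D is assumed.
sens_moderate = {"A": 0.60, "B": 0.10, "C": 0.30, "D": 0.30}

for disease, p in update_differential(priors, sens_moderate).items():
    print(f"Disease {disease}: {p:.0%}")
# Disease A rises to ~46% (a 6-fold odds shift relative to B, 60%/10%),
# while C and D barely move because their sensitivities are equal.
```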

Table 3 tells a different story: the ratios are different for the more abnormal test result in the third column. While only 5% of patients with disease A have the very-high test result, that level is not seen in the other disease states. These tables show that some signals (test values) are definitive; others muddle.
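The same function shows the Table 3 arithmetic. Reusing update_differential from the sketch above with the very-high result (5% sensitivity for disease A from the text, and assumed near-zero values for the others, since that level is not seen in the other disease states), disease A comes to dominate the differential:

```python
# Table 3 scenario: the very-high result is seen in 5% of disease-A
# patients and essentially never in the others (tiny non-zero values
# are assumed rather than claiming the result is impossible).
sens_very_high = {"A": 0.05, "B": 0.001, "C": 0.001, "D": 0.001}

for disease, p in update_differential(priors, sens_very_high).items():
    print(f"Disease {disease}: {p:.0%}")
# Disease A jumps to ~94%: a result with low sensitivity can still be
# definitive when it is far more likely under one disease than the rest.
```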

Presently, however, it is difficult to find the sensitivity for every value of a diagnostic test, because doing so requires papers to report distributions of the same test values in cohorts of patients with different, defined diseases. Instead, a single summary sensitivity/specificity number set is presented, which limits our understanding of the information contained in the actual test results we find in our patients. Even when researchers address this issue of variable test results, they often report "scores" rather than sensitivities for specific test results [Kattan]. Some report receiver operating characteristic (ROC) curves to depict the differences in the characteristics of test results at every cut-off point of the test. But these curves do not tell us the specific test values for every point on the curve, so we cannot use them for the bedside care of a patient. We do not need to carry scores for test results or summary curves for all values of a test when we see patients; we need only the sensitivity of the exact test result.

Another implication of this change in diagnostic-test thinking is that, if we only have to know the sensitivity of a test result, we can rid ourselves of framing the diagnostic process as disease/no-disease. Importantly, knowing the sensitivity of the test result for all diagnoses that may cause a patient's complaints allows us to refine even a long list of diagnostic possibilities. Table 2 illustrates how using only the sensitivity of a test result (in this example, the moderately-high test result from column 2 of Table 1) allows us to manage more than two disease possibilities at the same time. In fact, we can re-order any number of diagnoses with a single test parameter.

This change in thinking about diagnostic tests simplifies testing and diagnosis. Only one term is needed; clinicians would have to focus only on the specific test results in their patients; and they could more readily use a single test result to re-order and manage longer lists of potential diagnoses. If we accept that what matters in clinical care is only the sensitivity of a specific test result for each of the competing diseases/conditions that may explain a patient's complaint, we will think less about disease/no-disease and more about how a specific test result changes the likelihoods of the underlying disease possibilities. If we accept this new convention of testing, researchers will follow the clinicians' lead and improve the way their studies evaluate and communicate test results. We may improve our ability to make diagnoses if we rid ourselves of the concept of a positive/negative test and focus, instead, on the information contained in the specific findings in our patients.

Table 1: [image] The hypothetical diseases (first column) and the sensitivities of a hypothetical test for each disease at a moderately-high and a very-high result level (second and third columns).

Table 2: [image] How the moderately-high test result re-orders the percent chances of the diseases.

Table 3: [image] How the very-high test result re-orders the percent chances of the diseases.

Reference: Elstein AS. Thinking about diagnostic thinking: a 30-year perspective. Adv Health Sci Educ. 2009;14:7-18.

Robert McNutt, MD is a board-certified internist in Clarendon Hills, Illinois. He is a Professor at Rush Medical College of Rush University.



4 replies

  1. Thank you for the thoughtful comments. The intent of this piece was simple: the absolute value of a test contains THE useful information. I have been lucky to be, often, the "last diagnostician in line," given my place in medicine. I note, too often, that diagnostic thinking is misguided by not paying attention to the information in the test value. Many times I hear quotes of likelihood ratios that do not match the clinical condition. The piece was to serve as a reminder.

    The comments, however, show that diagnosis and information management are not uniform concepts held singularly in the heads of physicians. A test value contains information; it does not tell you what to do with the information. Just like a statistic: it is a number, but it tells you nothing about what to do with the number. Learning what to do is decision making.

    Next, tests are both underused and uselessly overused. My favorite example is BNP. One commenter is correct; severity matters. Yet I see patients with frothing, florid pulmonary edema and low ejection fractions get this test ordered over and over. Tests that are co-dependent and redundant are useless regardless of their values.

    Last, if one commenter is correct that docs order tests for reasons beyond their rational use for individual patients, we have bigger problems than innumeracy.

  2. Good exposition of the fallacy of dichotomous test results, although the continuous model brings its own fallacies.

    A few come to mind instantly. First, numbers are continuous but decision making is dichotomous, or a trilemma or pentalemma if you prefer. This means a line has to be drawn somewhere. There is always going to be non-linearity (or lunacy) near thresholds.

    Second, disease is not a uniform state. There are varying degrees of badness and goodness. A strongly positive test may have a different sensitivity for severe disease vs. mild disease.

    Third, much of decision making is based not on ranked probabilities of disease but on the severity of diseases. This is certainly true in the emergency department, where the emergency physician may want to make sure that the patient's chest pain is not from an aortic dissection. Thus, the physician may approach decision making dichotomously rather than polychotomously (neologism). We may wish this were not the case, but it is, and it is a very rational approach given the constraints of the emergency department.

  3. This is not a better way, just another way, of orienting your thinking towards test ordering strategy. Some folks like cats. Some dogs. The same for chocolate and vanilla.

    I have always gravitated towards PPV/NPV in my approach. I know others who do the same. If a varying sensitivity threshold floats your boat, more power to you. But I don't think there is a best way. Just use something.

    I would also add: if docs don't use a rational approach to obtaining tests now, no post will make them believers (they won't grasp the above; 2x2 tables are a leap for the lapsed senior doc).

    You need an entry first on overcoming lazy thinking and components of good doctoring.

    Brad

  4. Robert, Thanks for this. We needed this discussion.
    Tests are just appendices and additions to the patient's history… the biochemical and hematologic and microbiologic history, etc. And we can't have too much history.*

    *Lots of testing is theoretically good if: 1) tests are cheap (most are, intrinsically, but are used as profit centers, unfortunately); 2) doctors know how to use them as guides only, not as biblical pronouncements, and understand their personalities; 3) they are safe and don't use too much blood or cause too many side effects (e.g., you wouldn't want to use IV glucose tolerance testing too often) or too much false-positive lead-following.

    Disparaging of testing has happened because of costs and because some docs take their results too literally and chase down everything under the sun.

    Arriving at the correct diagnosis fast by liberal testing will shorten lengths of stay, with their penumbra of benefits.