review article

Simple Statistical Measures for Assessing the Accuracy of a Diagnostic Test in Clinical Medicine

Daniela Leonardis1*, Graziella D’Arrigo1, Samar Abd ElHafeez2, Maria Fusaro1,3, Stefanos Roumeliotis4, Giovanni Tripepi1*

1National Research Council (CNR), Institute of Clinical Physiology (IFC) 56124 Pisa, Italy.

2Epidemiology Department, High Institute of Public Health, Alexandria University, Alexandria 21561, Egypt.

3Department of Medicine, University of Padua, 35129 Padua, Italy.

4Division of Nephrology and Hypertension, 1st Department of Internal Medicine, AHEPA Hospital, School of Medicine, Aristotle University of Thessaloniki, 54636 Thessaloniki, Greece.

*Corresponding author: Leonardis D & Tripepi G, National Research Council (CNR), Institute of Clinical Physiology (IFC) 56124 Pisa, Italy.

Received Date: 18 January, 2023

Accepted Date: 24 January, 2023

Published Date: 30 January, 2023

Citation: Leonardis D, D’Arrigo G, Abd ElHafeez S, Fusaro M, Roumeliotis S, et al. (2023) Simple Statistical Measures for Assessing the Accuracy of a Diagnostic Test in Clinical Medicine. Rep Glob Health Res 6: 148. DOI: https://doi.org/10.29011/2690-9480.100148

Abstract

Diagnosis plays a key role in the decision-making process in medicine. In simple situations, recognizing the clinical picture on the basis of experience may be sufficient; in more complex situations, however, it is very important to choose the most accurate diagnostic test. The simplest statistical measures used to assess the performance of a diagnostic test are sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios (LRs). The prevalence of disease does not affect the sensitivity and specificity of a test, whereas it influences its positive and negative predictive values. Finally, likelihood ratios allow physicians to interpret test results from a clinical perspective, because they compare how often a given result occurs in patients with and without the disease: the positive likelihood ratio relates the true-positive rate to the false-positive rate, and the negative likelihood ratio relates the false-negative rate to the true-negative rate. Diagnostic research is carried out within a cross-sectional study design. In this paper, using a series of examples from the literature, we explain how to calculate and interpret the simplest statistical indexes qualifying a diagnostic test.

Keywords: Diagnostic test, Accuracy, Oxidative stress, Sensitivity, Specificity, Positive and negative predictive values, Positive and negative likelihood ratios, Statistical test

Introduction

Diagnosis, together with prognosis and treatment, is one of the three decisional processes of clinical medicine. An accurate diagnostic evaluation is the first and essential step in optimizing the patient’s prognosis; therefore, the correct interpretation of the statistical methods applied to diagnostic research is essential to fully appreciate the results of diagnostic studies.

An ideal diagnostic test, one that perfectly discriminates between patients with and without a specific disease, is defined as the “gold standard” [1] and is also the reference standard for validating other diagnostic tests. Validation is necessary when a diagnostic test does not clearly distinguish between patients with and without a given disease, so that the test results overlap between the two groups. In this case, it is necessary to measure the degree of uncertainty of the judgment (affected/unaffected) expressed by the test result. If the result is a binary variable (positive/negative), we need to calculate sensitivity (i.e. the true-positive rate), specificity (i.e. the true-negative rate), positive predictive value, negative predictive value, positive likelihood ratio (+LR), negative likelihood ratio (-LR), and accuracy (i.e. the percentage of cases correctly classified) of the diagnostic test against the best test available at a given moment, i.e. the gold standard [2]. If the diagnostic test is quantitative, and therefore expressed as a continuous variable, its discriminatory power is assessed by receiver operating characteristic (ROC) curve analysis [3]. To understand the meaning of each of these indices, we consider the general example reported in Table 1, which cross-classifies the results of a diagnostic test (positive/negative) against the presence or absence of a given disease.

 

 

 

| Test result | Disease present | Disease absent | Total |
|---|---|---|---|
| Positive | a | b | a + b |
| Negative | c | d | c + d |
| Total | a + c | b + d | N |

Table 1: Contingency table of the agreement between test results and the presence/absence of the disease.

Data in Table 1 serve to calculate the various indexes of the diagnostic power of a test, that is:

- sensitivity, i.e. the percentage of patients with a positive test among those who have the disease [a/(a + c)] (true-positive rate);

- specificity, i.e. the percentage of patients with a negative test among those who do not have the disease [d/(b + d)] (true-negative rate).

The uncertainty around the estimates of sensitivity and specificity is expressed by the 95% confidence interval (95% CI), which indicates the precision of each estimate. The other indices derived from Table 1 are:

 


- positive predictive value, i.e. the percentage of sick patients among those who are test positive [a / (a + b)];

- negative predictive value, i.e. the percentage of healthy patients among those who are test-negative [d/(c + d)];

- accuracy, i.e. the percentage of patients correctly classified by the test [(a + d)/N];

- Positive LR (+LR): the ratio between the probability that the test is positive in sick people and the probability that the test is positive in healthy people [+LR = sensitivity / (100 - specificity)]. Thus, it is the ratio between the true-positive rate and the false-positive rate. From this perspective, the +LR expresses how many times more likely a positive test result is in a sick subject than in a healthy one. As a consequence, the greater the +LR, the greater the diagnostic performance of the test.

- Negative LR (-LR): the ratio between the probability that the test is negative in sick people and the probability that the test is negative in healthy people [-LR = (100 - sensitivity) / specificity]. Thus, it is the ratio between the false-negative rate and the true-negative rate. Therefore, the -LR expresses how likely a negative test result is in a sick subject relative to a healthy one. As a consequence, the lower the -LR, the greater the diagnostic performance of the test. Although there are no universally accepted criteria in the literature for interpreting the positive and negative LR, Table 2 relates some of the possible values of the two indices to the usefulness of the test in clinical practice.

| +LR | -LR | Impact of the test on the diagnosis |
|---|---|---|
| > 10 | < 0.1 | Conclusive |
| 5-10 | 0.1-0.2 | Moderately useful |
| 2-5 | 0.2-0.5 | Sometimes useful |
| 1-2 | 0.5-1 | Rarely useful |
| 1 | 1 | Useless |

Table 2: General interpretation of positive and negative LR in clinical practice.
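The definitions above can be gathered into a short Python sketch. It takes the four cells of Table 1 (a, b, c, d), returns every index as a proportion, and labels each likelihood ratio with the categories of Table 2. The class and function names are our own. Because adjacent Table 2 ranges share their end points (for example, 5 appears in both “5-10” and “2-5”), the way boundary values are assigned is an assumption of this sketch. For illustration, it is applied to the cTnT counts reported in Table 3 of Example 1.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticIndices:
    """Indices of Table 1, all expressed as proportions (0-1)."""
    sensitivity: float  # a / (a + c), true-positive rate
    specificity: float  # d / (b + d), true-negative rate
    ppv: float          # a / (a + b)
    npv: float          # d / (c + d)
    accuracy: float     # (a + d) / N
    lr_pos: float       # sensitivity / (1 - specificity)
    lr_neg: float       # (1 - sensitivity) / specificity


def indices_from_table(a: int, b: int, c: int, d: int) -> DiagnosticIndices:
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    sens = a / (a + c)
    spec = d / (b + d)
    # An LR is infinite when its denominator is zero (e.g. no false positives)
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return DiagnosticIndices(
        sensitivity=sens,
        specificity=spec,
        ppv=a / (a + b),
        npv=d / (c + d),
        accuracy=(a + d) / (a + b + c + d),
        lr_pos=lr_pos,
        lr_neg=lr_neg,
    )


def interpret_lr_pos(lr: float) -> str:
    """Table 2 label for a +LR (assignment of shared boundaries is assumed)."""
    if lr > 10:
        return "Conclusive"
    if lr > 5:
        return "Moderately useful"
    if lr > 2:
        return "Sometimes useful"
    if lr > 1:
        return "Rarely useful"
    if lr == 1:
        return "Useless"
    return "Outside Table 2"


def interpret_lr_neg(lr: float) -> str:
    """Table 2 label for a -LR (assignment of shared boundaries is assumed)."""
    if lr < 0.1:
        return "Conclusive"
    if lr < 0.2:
        return "Moderately useful"
    if lr < 0.5:
        return "Sometimes useful"
    if lr < 1:
        return "Rarely useful"
    if lr == 1:
        return "Useless"
    return "Outside Table 2"


# Usage with the cTnT counts of Table 3 (cut-off 55 ng/L)
idx = indices_from_table(a=105, b=16, c=44, d=34)
print(f"+LR = {idx.lr_pos:.1f} ({interpret_lr_pos(idx.lr_pos)})")
print(f"-LR = {idx.lr_neg:.2f} ({interpret_lr_neg(idx.lr_neg)})")
```

Because the code uses the unrounded sensitivity (105/149, i.e. 70.5%), it gives a -LR of 0.43 rather than the 0.44 obtained from the rounded value of 70%; both fall in the “Sometimes useful” band of Table 2.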

In the description of a commercially available diagnostic kit, we typically find the sensitivity and specificity, but not the positive and negative predictive values. This is because sensitivity and specificity, being independent of the prevalence (or pre-test probability) of the disease, are regarded as fixed properties of the test, i.e. they apply in any context, unlike the positive and negative predictive values, which are context-specific indices, i.e. they are affected by the prevalence of the disease of interest. The reference study design for assessing the diagnostic value of a given biomarker is the cross-sectional study. In fact, the cross-sectional design reflects the typical setting of “making a diagnosis”, that is, establishing at the time of the visit whether or not a patient has the disease of interest, based on the result of the diagnostic test.

In this paper, we explain how to calculate and interpret the simplest statistical indexes qualifying a diagnostic test through three examples.

The first example concerns the diagnostic value of troponin T for alterations in left ventricular mass in patients with end-stage kidney failure (ESKF) [4].

The second example is taken from a paper on cardiac allograft vasculopathy (CAV), a major threat to long-term survival after heart transplantation (HT) [5]. The role of oxidative stress in the pathogenesis of this vasculopathy is well recognized; for this reason, the authors sought a simpler and less expensive test than coronary angiography (CAG) for detecting CAV, and identified the oxidative stress index (OSI), defined as the ratio of the serum total oxidant status (TOS) to the serum total antioxidant capacity (TAC).

Finally, the third example concerns virology, specifically the diagnosis of SARS-CoV-2 infection, and shows the application of these simple statistical methods to both molecular and serological tests [6].

Example 1

Cardiac troponin T (cTnT) is a strong predictor of adverse clinical outcomes in patients with ESKF, and there is strong evidence that this peptide could serve as a biomarker of alterations in left ventricular mass and function in this patient population [4]. Studies on cardiac biomarkers are a growing research area because they aim to identify candidates that could replace echocardiography, at least in the screening of patients at risk.

In a cross-sectional study, 199 ESKF patients were enrolled to assess the overall accuracy of cTnT in discriminating patients with and without left ventricular hypertrophy (LVH), as assessed by echocardiography. Overall, 149 of the 199 patients had LVH. Thus, the pre-test probability, or prevalence, of LVH in the study sample is 75% (i.e. 149/199 = 0.75). In this study, the best cut-off of cTnT for identifying patients with LVH (i.e. the value of cTnT giving the best combination of sensitivity and specificity in the ROC curve analysis [3]) was 55 ng/L. The sensitivity and specificity of this threshold were about 70% and 68%, respectively (Table 3).

| Test result | LVH present | LVH absent | Total |
|---|---|---|---|
| Positive (cTnT > 55 ng/L) | 105 | 16 | 121 |
| Negative (cTnT ≤ 55 ng/L) | 44 | 34 | 78 |
| Total | 149 | 50 | 199 |

Table 3: Diagnostic value of the cTnT cut-off of 55 ng/L to identify ESKF patients with LVH at echocardiography.

The indices of the diagnostic value of cTnT were calculated as follows:

  • Sensitivity: 105/149 = 0.70 (70%);
  • Specificity: 34/50 = 0.68 (68%);
  • False-positive rate (1 - specificity): 100 - 68 = 32%;
  • Positive predictive value: 105/121 = 0.87 (87%);
  • Negative predictive value: 34/78 = 0.44 (44%);
  • Accuracy: (105 + 34)/199 = 0.70 (70%).
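The 95% confidence intervals mentioned earlier can be attached to these estimates. The article does not state which interval method it uses, so the sketch below assumes the simple Wald (normal-approximation) interval, p ± 1.96 × sqrt(p(1 - p)/n); `wald_ci` is a hypothetical helper name. For small samples or proportions close to 0 or 1, Wilson or exact (Clopper-Pearson) intervals are usually preferred.

```python
import math


def wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a proportion (Wald / normal approximation).

    The interval is clipped to [0, 1]; for small n or proportions near
    0 or 1, a Wilson or exact interval is preferable.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# cTnT > 55 ng/L, counts from Table 3
sens_lo, sens_hi = wald_ci(105, 149)  # true positives / all patients with LVH
spec_lo, spec_hi = wald_ci(34, 50)    # true negatives / all patients without LVH
print(f"sensitivity 95% CI: {sens_lo:.2f}-{sens_hi:.2f}")
print(f"specificity 95% CI: {spec_lo:.2f}-{spec_hi:.2f}")
```

With the counts of Table 3 this gives roughly 0.63-0.78 for sensitivity and 0.55-0.81 for specificity; the specificity interval is wider because it rests on only 50 patients without LVH, against 149 with LVH.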

Sensitivity and specificity can be combined into a single index for each type of test result: the “likelihood ratio”.

The +LR corresponding to the cTnT threshold of 55 ng/L is 2.2 [70/(100 - 68) = 2.2; that is: sensitivity/(1 - specificity)], while the -LR is 0.44 [(100 - 70)/68 = 0.44; that is: (1 - sensitivity)/specificity]. To interpret a +LR of about 2, we write it with 1 in the denominator (that is: 2/1): a positive result is about twice as frequent among patients with LVH as among patients without LVH. If patients with and without LVH were equally represented (pre-test odds of 1), then out of every 3 patients classified as having LVH by a cTnT value > 55 ng/L, 2 would be “true positives” and 1 a “false positive”. To interpret a -LR of 0.44, we first calculate its inverse (i.e. 1/0.44 = 2.3, about 2) and again consider 1 in the denominator: a negative result is about twice as frequent among patients without LVH as among patients with LVH. Under the same assumption of equal representation, out of every 3 patients classified as unaffected by LVH by a cTnT value ≤ 55 ng/L, 2 would be “true negatives” and 1 a “false negative”.

We previously introduced the concept of the pre-test probability of disease and clarified that this probability generally corresponds to the prevalence. Therefore, in the 199 patients the pre-test probability of LVH is 75% (i.e. 149/199; see Table 3).

We have already stated that the positive predictive value of a biomarker depends on disease prevalence, whereas sensitivity and specificity represent fixed properties of a test. To better understand this important concept, we calculate the positive predictive value of cTnT (> 55 ng/L) for LVH in two hospital wards, A and B, where the prevalence of LVH differs: 20% in ward A and 70% in ward B. By applying Bayes’ theorem, the positive predictive value of cTnT can be calculated from the prevalence and the +LR.

We start the calculation with ward A.

First, we calculate the pre-test odds:

pre-test odds = prevalence/(100 - prevalence) = 20.0/(100 - 20.0) = 0.25

Then, we calculate the post-test odds:

post-test odds = (pre-test odds) * (+LR) = 0.25 * 2.2 = 0.55

Therefore, the post-test probability, i.e. the positive predictive value of cTnT for LVH in ward A, is:

post-test probability = post-test odds/(post-test odds + 1) = 0.55/(0.55 + 1) = 0.55/1.55 = 0.35 (35%)

We now consider ward B and make the same calculations:

pre-test odds = 70.0/(100 - 70.0) = 2.33

post-test odds = 2.33 * 2.2 = 5.13

post-test probability = 5.13/(5.13 + 1) = 5.13/6.13 = 0.84 (84%)

It is evident that, with the +LR unchanged, the prevalence of LVH markedly affects the positive predictive value of the test: the higher the prevalence of LVH, the greater the positive predictive value of the cTnT threshold.
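The ward calculations can be reproduced with a small Python sketch of Bayes’ theorem in odds form. As a cross-check, it also computes the PPV directly from sensitivity, specificity and prevalence, which is algebraically the same quantity. Sensitivity and specificity are taken unrounded from Table 3; the ward prevalences (20% and 70%) are the illustrative values used in the text, and the function names are our own.

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (post_test_odds + 1)


def ppv_direct(prevalence: float, sens: float, spec: float) -> float:
    """PPV written directly in terms of sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sens, spec = 105 / 149, 34 / 50  # cTnT > 55 ng/L, Table 3
lr_pos = sens / (1 - spec)       # about 2.2

ppv_by_ward = {}
for ward, prevalence in [("A", 0.20), ("B", 0.70)]:
    ppv = post_test_probability(prevalence, lr_pos)
    # The odds route and the direct formula are algebraically identical
    assert abs(ppv - ppv_direct(prevalence, sens, spec)) < 1e-12
    ppv_by_ward[ward] = ppv
    print(f"ward {ward}: prevalence {prevalence:.0%}, PPV {ppv:.1%}")
```

With the unrounded +LR (about 2.20), the PPV is about 35.5% in ward A and 83.7% in ward B, matching the hand calculation up to rounding.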

Now we calculate the negative predictive value of the cTnT threshold (≤ 55 ng/L) in the two wards, again applying Bayes’ theorem.

In Ward A, we have:

pre-test odds = prevalence/(100 - prevalence) = 20.0/(100 - 20.0) = 0.25

post-test odds = (pre-test odds) * (-LR) = 0.25 * 0.44 = 0.11

post-test probability = post-test odds/(post-test odds + 1), which here is the post-test probability of having LVH given a negative test (i.e. cTnT ≤ 55 ng/L):

0.11/(0.11 + 1) = 0.11/1.11 = 0.10 (10%)

From this latter calculation, we can derive the post-test probability of not having LVH given a negative test (cTnT ≤ 55 ng/L), which corresponds to the negative predictive value: negative predictive value = 1 - 0.10 = 0.90 (90%)

Now we consider the ward B, by making the same calculations, that is:

pre-test odds =70.0/ (100-70.0)=2.33

post-test odds = (pre-test odds) * (-LR) = 2.33 * 0.44 = 1.03

post-test probability = post-test odds/(post-test odds + 1), which here is the post-test probability of having LVH given a negative test, i.e. 1.03/(1.03 + 1) = 1.03/2.03 = 0.51 (51%)

From this latter calculation, we again derive the post-test probability of not having LVH given a negative test (cTnT ≤ 55 ng/L), which corresponds to the negative predictive value:

negative predictive value = 1 - 0.51 = 0.49 (49%).

Thus, the higher the prevalence of LVH, the lower the negative predictive value of the cTnT threshold and, vice versa, the lower the prevalence of LVH, the higher the negative predictive value of the cTnT threshold.
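The same odds-form calculation gives the negative predictive values: the post-test probability of LVH after a negative cTnT result is computed with the -LR, and the NPV is its complement. As before, the helper name is our own, sensitivity and specificity are taken unrounded from Table 3, and the ward prevalences are the illustrative ones from the text.

```python
def post_test_probability(pre_test_prob: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds * LR."""
    pre_test_odds = pre_test_prob / (1 - pre_test_prob)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (post_test_odds + 1)


sens, spec = 105 / 149, 34 / 50  # cTnT threshold, Table 3
lr_neg = (1 - sens) / spec       # about 0.43 (0.44 with rounded inputs)

npv_by_ward = {}
for ward, prevalence in [("A", 0.20), ("B", 0.70)]:
    # Probability of LVH despite a negative test (cTnT <= 55 ng/L)
    p_lvh_given_negative = post_test_probability(prevalence, lr_neg)
    # NPV = probability of no LVH given a negative test
    npv_by_ward[ward] = 1 - p_lvh_given_negative
    print(f"ward {ward}: NPV {npv_by_ward[ward]:.1%}")
```

The unrounded -LR (about 0.43) gives NPVs of about 90.2% in ward A and 49.7% in ward B.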

Example 2

Cardiac allograft vasculopathy (CAV) is an accelerated form of coronary artery disease (CAD) which represents a major factor limiting long-term survival after heart transplantation (HT) showing a frequency ranging from 8% at 1 year after the surgery to 50% within 10 years after HT.

Coronary angiography is the gold standard for detecting focal plaques, but it requires contrast medium and exposure to ionizing radiation. Another sensitive tool for identifying CAV is intravascular ultrasonography, which can detect vasculopathy in the epicardial arteries but cannot evaluate the entire coronary tree. In one study, researchers sought a new noninvasive, sensitive and specific tool for detecting CAV early, assessing the role of disturbances in the oxidative-antioxidative balance in the onset and progression of cardiac allograft vasculopathy [5]. The total oxidant status (TOS) is used to estimate the oxidation state of the body, and the total antioxidant capacity (TAC) to assess the antioxidant status. The oxidative stress index (OSI), i.e. the ratio of TOS to TAC, could be a more precise index of oxidative stress in the body because it combines both measurements. A total of 194 consecutive patients after HT were enrolled in the study. The diagnosis of CAV was based on coronary angiography and classified as “early” or “late” according to the current International Society for Heart and Lung Transplantation criteria. Serum TOS and TAC were measured by the methods described by Erel [7,8], and the global oxidant-antioxidant balance was estimated by the OSI (TOS/TAC ratio). To identify risk factors for CAV, patients were classified as not having CAV, defined as the absence of any lesion in the coronary vessels, or as having CAV (from CAV 1 to CAV 3).

The overall accuracy of TAC, TOS, and OSI for CAV detection was evaluated by calculating the area under the curve (AUC) in a receiver operating characteristic (ROC) curve analysis [3]. The cut-off values of 1.08 for TAC and 4.94 for TOS, derived from the ROC curves, are the thresholds of the two biomarkers that maximize the difference between the true-positive rate and the false-positive rate. The two cut-offs reached good sensitivity (74% and 65%) and specificity (85% and 90%) for CAV detection. Both markers achieved high PPV (83% and 86%, respectively) and NPV (77% and 72%), as well as favourable likelihood ratios (+LR = 4.8 for TAC and 6.3 for TOS; -LR = 0.30 for TAC and 0.39 for TOS) and good accuracy (79% and 77% for TAC and TOS, respectively). Combining TOS and TAC into the OSI ratio improved the identification of CAV: the OSI cut-off of 4.17 reached a sensitivity of 89% and a specificity of 87%.
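The criterion used to select the TAC and TOS cut-offs, maximizing the true-positive rate minus the false-positive rate, is the Youden index, J = sensitivity + specificity - 1. The sketch below only compares the three markers at their reported cut-offs, using the sensitivities and specificities quoted above; it does not reproduce the ROC search itself, which would require patient-level data not reported here.

```python
# Sensitivity and specificity at the ROC-derived cut-offs, as quoted in
# the text for TAC (1.08), TOS (4.94) and OSI (4.17)
reported = {
    "TAC": (0.74, 0.85),
    "TOS": (0.65, 0.90),
    "OSI": (0.89, 0.87),
}

# Youden index: J = sensitivity + specificity - 1
#             = true-positive rate - false-positive rate
youden = {marker: sens + spec - 1 for marker, (sens, spec) in reported.items()}

for marker, j in sorted(youden.items(), key=lambda item: item[1], reverse=True):
    print(f"{marker}: J = {j:.2f}")

best = max(youden, key=youden.get)
```

The OSI has the highest index (J = 0.76, against 0.59 for TAC and 0.55 for TOS), consistent with the statement that combining TOS and TAC improved the identification of CAV.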

 

Table 4: Summary of the ROC curve analysis for TAC, TOS, and OSI.

Now we describe in detail the ability of the OSI ratio to differentiate patients with CAV from those without CAV at the cut-off value of 4.17 (Table 5).

| Diagnostic test | With CAV | Without CAV | Total |
|---|---|---|---|
| Positive (OSI ratio > 4.17) | 86 | 13 | 99 |
| Negative (OSI ratio < 4.17) | 11 | 84 | 95 |
| Total | 97 | 97 | 194 |

Table 5: Diagnostic value of OSI ratio for detecting CAV

The diagnostic performance of the OSI ratio is as follows: PPV = 86/99 = 0.87 (87%), NPV = 84/95 = 0.88 (88%), sensitivity = 86/97 = 0.89 (89%), specificity = 84/97 = 0.87 (87%), and accuracy = (86 + 84)/194 = 0.88 (88%). The authors conclude that their study highlights a role of oxidative stress in cardiac allograft vasculopathy in HT recipients. The results show that the oxidative-antioxidative balance is shifted toward the production of free radicals. The OSI ratio represents a new, simple, noninvasive and low-cost marker for CAV detection, opening the possibility of additional therapy with antioxidant substances in the management of patients after HT.
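The text reports predictive values, sensitivity, specificity and accuracy for the OSI, but not its likelihood ratios. They can be derived from Table 5, as in the sketch below; these values are our own calculation from the published counts, not figures reported by the original authors.

```python
a, b, c, d = 86, 13, 11, 84  # Table 5: TP, FP, FN, TN for OSI > 4.17

sens = a / (a + c)          # 86/97
spec = d / (b + d)          # 84/97
lr_pos = sens / (1 - spec)  # about 6.6
lr_neg = (1 - sens) / spec  # about 0.13

# With 97 patients in each group, the LRs reduce to ratios of counts
assert abs(lr_pos - a / b) < 1e-9
assert abs(lr_neg - c / d) < 1e-9

print(f"+LR = {lr_pos:.1f}, -LR = {lr_neg:.2f}")
```

This gives +LR ≈ 6.6 and -LR ≈ 0.13, both in the “Moderately useful” band of Table 2. Because the two groups are the same size (97 patients each), the +LR here equals the ratio of true to false positives (86/13), so the “1 in the denominator” reading used in Example 1 applies literally to this table.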

Example 3

The COVID-19 pandemic has demonstrated the importance of rapid and accurate diagnostics in the control of infectious diseases. Laboratory-based molecular assays for detecting SARS-CoV-2 in respiratory specimens are the current reference standard for COVID-19 diagnosis, but point-of-care technologies and serologic immunoassays are rapidly emerging. Although laboratory-based real-time reverse transcriptase polymerase chain reaction (RT-PCR) assays on respiratory specimens are the cornerstone of COVID-19 diagnostic testing, several novel or complementary diagnostic methods are being developed and evaluated. Bisoffi et al. [6] assessed the sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of three widely used molecular (RT-PCR) tests (with six different gene targets) and of six serologic tests [five IgG-IgM rapid diagnostic tests (RDT) and an IgA-IgG ELISA] for the diagnosis of SARS-CoV-2 infection. All consecutive patients presenting to the emergency room with clinical suspicion of COVID-19 and undergoing diagnostic tests were enrolled, for a total of 346 patients. Of these, 85 (24.6%) were classified as infected and 261 (75.4%) as not infected. Thus, the pre-test probability of the disease was 24.6%. As a first example, we report the results of the assay targeting the RNA-dependent RNA polymerase gene (“Target RdRP kit”, see Table 6). Sensitivity and specificity were calculated as 80/85 = 0.941 (94.1%) and 259/261 = 0.992 (99.2%), respectively. PPV and NPV were calculated as 80/82 = 0.976 (97.6%) and 259/264 = 0.981 (98.1%), respectively.

 

 

| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 80 | 2 | 82 |
| Negative | 5 | 259 | 264 |
| Total | 85 | 261 | 346 |

Table 6: Diagnostic value of the molecular test “Target RdRP (kit)”

Specificity and PPV reached 100% with an assay (Real Quality RQ-SARS-nCoV-2 assay) targeting two genes instead of one, namely the spike protein gene (S) and the RNA-dependent RNA polymerase gene (RdRp) (“Target S and RdRp kit”) (Table 7). Sensitivity was 78/85 = 0.918 (91.8%) and the NPV was 261/268 = 0.974 (97.4%).

 

 

| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 78 | 0 | 78 |
| Negative | 7 | 261 | 268 |
| Total | 85 | 261 | 346 |

Table 7: Diagnostic value of the Molecular test “Target S and RdRp (kit)”

We now consider the diagnostic value of the serological tests, reporting as an example only one of the assays used, i.e. the “Prima Professional IgM”. The sensitivity of this test was 39/85 = 0.459 (45.9%), the specificity 208/261 = 0.797 (79.7%), the PPV 39/92 = 0.424 (42.4%) and the NPV 208/254 = 0.819 (81.9%) (Table 8).

 

 

| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 39 | 53 | 92 |
| Negative | 46 | 208 | 254 |
| Total | 85 | 261 | 346 |

Table 8: Diagnostic value of the Serological test “Prima Professional IgM”
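Likelihood ratios make the contrast between the molecular and the serological assays explicit. The sketch below derives them from the counts in Tables 6 and 8; these LRs are our own calculation from the published counts rather than values reported in the text, and `likelihood_ratios` is a hypothetical helper.

```python
def likelihood_ratios(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (+LR, -LR) from TP = a, FP = b, FN = c, TN = d."""
    sens = a / (a + c)
    spec = d / (b + d)
    return sens / (1 - spec), (1 - sens) / spec


tests = {
    "Target RdRP kit (Table 6)": (80, 2, 5, 259),
    "Prima Professional IgM (Table 8)": (39, 53, 46, 208),
}

lrs = {name: likelihood_ratios(*cells) for name, cells in tests.items()}
for name, (lr_pos, lr_neg) in lrs.items():
    print(f"{name}: +LR = {lr_pos:.1f}, -LR = {lr_neg:.2f}")
```

The RdRP kit gives +LR ≈ 122.8 and -LR ≈ 0.06 (both “Conclusive” in Table 2), whereas the IgM test gives +LR ≈ 2.3 (“Sometimes useful”) and -LR ≈ 0.68 (“Rarely useful”), in line with the authors’ conclusion on the serologic tests.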

The authors conclude that, for molecular diagnostic purposes, accepting a positive result in any single gene target appears justified for patients with clinical suspicion of COVID-19 in an emergency room. Conversely, confirming the diagnosis through the positivity of multiple genomic regions might be more appropriate when the test is deployed for screening during a phase of low or very low viral circulation. The serologic tests included in this study did not show sensitivity suitable for clinical use in acutely ill patients. An overview of the critical appraisal methods and problem-solving skills needed for an accurate diagnosis of infectious diseases and the identification of infectious agents is reported elsewhere [9].
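The recommendation to confirm positives on multiple gene targets during low viral circulation follows from the prevalence dependence of the PPV. The sketch below compares the single-target and two-target kits at the study prevalence (85/346) and at a hypothetical screening prevalence of 1%, a value chosen here for illustration and not taken from the study.

```python
def ppv(prevalence: float, sens: float, spec: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


single = (80 / 85, 259 / 261)  # Target RdRP kit, Table 6: (sensitivity, specificity)
dual = (78 / 85, 261 / 261)    # Target S and RdRp kit, Table 7

study_prevalence = 85 / 346    # emergency-room sample, about 24.6%
screening_prevalence = 0.01    # hypothetical low-circulation setting

for label, (sens, spec) in [("single target", single), ("two targets", dual)]:
    print(
        f"{label}: PPV {ppv(study_prevalence, sens, spec):.1%} in the study, "
        f"{ppv(screening_prevalence, sens, spec):.1%} at 1% prevalence"
    )
```

At the study prevalence the single-target PPV is 97.6%, as in Table 6, but at 1% prevalence it drops to about 55%, whereas the two-target kit stays at 100%. That last figure should be read with caution: 0 false positives out of 261 uninfected patients is a point estimate and does not prove that specificity is exactly 100%.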

Summary and Conclusions

In everyday clinical practice, diagnosis is first of all a logical process that starts with an accurate evaluation of the patient’s signs and symptoms. Diagnosis is essential in clinical medicine. The ideal diagnostic test perfectly discriminates the sick from the healthy, i.e. it has a sensitivity and specificity of 100%. However, the reference standards that reflect “the truth” are generally expensive and time-consuming, and they require specific expertise, so their large-scale use is rather limited. For this reason, studies on the diagnostic value of specific biomarkers are a growing research area: they aim to identify cheap and relatively easy-to-measure biomarkers that can replace a given gold standard. Sensitivity and specificity are measures of the accuracy of a diagnostic test that do not depend on the prevalence of the disease. However, two tests with the same accuracy can display different true-positive and true-negative rates. The positive and negative predictive values, on the other hand, depend on the prevalence of the disease in the population, so values calculated in one population may not apply to a different group. Positive and negative likelihood ratios are useful because they combine the true-positive and false-positive rates (+LR) and the false-negative and true-negative rates (-LR). The study design to be used in diagnostic research is the cross-sectional one.

Author Contributions: Conceptualization, D.L., G.D., G.T.; investigation, D.L., S.A.E., G.T.; methodology, S.A.E., G.D., M.F.; supervision, D.L., S.R., G.T.; writing (original draft preparation), D.L., G.T.; writing (review and editing), D.L., G.T. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

References

  1. Cleophas TJ, Zwinderman AH, Cleophas TF, Cleophas EP (2009) Summary of Validation Procedures for Diagnostic Tests. Statistics Applied to Clinical Trials 4: 433-447.
  2. van Stralen KJ, Stel VS, Reitsma JB, Dekker FW, Zoccali C, Jager KJ (2009) Diagnostic methods I: sensitivity, specificity, and other measures of accuracy. Kidney Int 75: 1257-1263.
  3. Tripepi G, Jager KJ, Dekker FW, Zoccali C (2009) Diagnostic methods 2: receiver operating characteristic (ROC) curves. Kidney Int 76: 252-256.
  4. Mallamaci F, Zoccali C, Parlongo S, Tripepi G, Benedetto FA, et al.; Cardiovascular Risk Extended Evaluation in Dialysis Investigators (2002) Diagnostic value of troponin T for alterations in left ventricular mass and function in dialysis patients. Kidney Int 62: 1884-1890.
  5. Szczurek W, Gąsior M, Romuk E, Skrzypek M, Zembala M, et al. (2020) Investigation of the Role of Oxidative Stress and Factors Associated with Cardiac Allograft Vasculopathy in Patients after Heart Transplantation. Oxidative Medicine and Cellular Longevity 9: 9.
  6. Bisoffi Z, Pomari E, Deiana M, Piubelli C, Ronzoni N, et al. (2020) Sensitivity, Specificity and Predictive Values of Molecular and Serological Tests for COVID-19: A Longitudinal Study in Emergency Room. Diagnostics (Basel) 10: 669.
  7. Erel O (2005) A new automated colorimetric method for measuring total oxidant status. Clinical Biochemistry 38(12): 1103-1111.
  8. Erel O (2004) A novel automated direct measurement method for total antioxidant capacity using a new generation, more stable ABTS radical cation. Clinical Biochemistry 37(4): 277-285.
  9. Mahon C, Lehman D (2022) Textbook of Diagnostic Microbiology.

© by the Authors & Gavin Publishers. This is an Open Access Journal Article Published Under Attribution-Share Alike CC BY-SA: Creative Commons Attribution-Share Alike 4.0 International License. With this license, readers can share, distribute, download, even commercially, as long as the original source is properly cited.

Reports on Global Health Research