Simple Statistical Measures for Assessing the Accuracy of a Diagnostic Test in Clinical Medicine
Daniela Leonardis^{1*}, Graziella D’Arrigo^{1}, Samar Abd ElHafeez^{2}, Maria Fusaro^{1,3}, Stefanos Roumeliotis^{4}, Giovanni Tripepi^{1*}
^{1}National Research Council (CNR), Institute of Clinical Physiology (IFC) 56124 Pisa, Italy.
^{2}Epidemiology Department, High Institute of Public Health, Alexandria University, Alexandria 21561, Egypt.
^{3}Department of Medicine, University of Padua, 35129 Padua, Italy.
^{4}Division of Nephrology and Hypertension, 1st Department of Internal Medicine, AHEPA Hospital, School of Medicine, Aristotle University of Thessaloniki, 54636 Thessaloniki, Greece.
^{*}Corresponding author: Leonardis D & Tripepi G, National Research Council (CNR), Institute of Clinical Physiology (IFC) 56124 Pisa, Italy.
Received Date: 18 January, 2023
Accepted Date: 24 January, 2023
Published Date: 30 January, 2023
Citation: Leonardis D, D’Arrigo G, Abd ElHafeez S, Fusaro M, Roumeliotis S, et al. (2023) Simple Statistical Measures for Assessing the Accuracy of a Diagnostic Test in Clinical Medicine. Rep Glob Health Res 6: 148. DOI: https://doi.org/10.29011/26909480.100148
Abstract
Diagnosis plays a key role in medical decision-making. In simple situations it may be sufficient to recognize the clinical picture on the basis of experience, but in more complex situations it is very important to choose the most accurate diagnostic test. The simplest statistical measures used to assess the performance of a diagnostic test are sensitivity, specificity, positive and negative predictive values, and positive and negative likelihood ratios (LRs). The prevalence of disease does not affect the sensitivity and specificity of a test, whereas it does influence its positive and negative predictive values. Finally, likelihood ratios allow physicians to interpret test results from a clinical perspective because, in a group with equal numbers of affected and unaffected patients, they express the ratio of true positives to false positives among patients classified by the test as affected (positive likelihood ratio) and the ratio of false negatives to true negatives among patients classified as unaffected (negative likelihood ratio). Diagnostic research is carried out within a cross-sectional study design. In this paper, using a series of examples from the literature, we explain how to calculate and interpret the simplest statistical indices that characterize a diagnostic test.
Keywords: Diagnostic test, Accuracy, Oxidative stress, Sensitivity, Specificity, Positive and negative predictive values, Positive and negative likelihood ratios, Statistical test
Introduction
Diagnosis, together with prognosis and treatment, is one of the three decision-making processes of clinical medicine. An accurate diagnostic evaluation is the first and essential step in optimizing the patient's prognosis; therefore, the correct interpretation of the statistical methods applied in diagnostic research is essential to fully appreciate the results of diagnostic studies.
An ideal diagnostic test is defined as a "gold standard" because it perfectly discriminates between patients with and without a specific disease [1]; it is also the reference standard for validating other diagnostic tests. Validation is necessary when a diagnostic test does not clearly distinguish between patients with and without a given disease, so that test results overlap between the two groups. In this case, it is necessary to measure the degree of uncertainty of the judgment (affected/unaffected) expressed by the test result. If the result is a binary variable (positive/negative), we need to calculate the sensitivity (the true-positive rate), specificity (the true-negative rate), positive predictive value, negative predictive value, positive likelihood ratio (+LR), negative likelihood ratio (-LR), and accuracy (the percentage of cases correctly classified) of the diagnostic test against the best test available at a given time, i.e. the gold standard [2]. If the diagnostic test is quantitative, and therefore expressed as a continuous variable, its discriminatory power is assessed by receiver operating characteristic (ROC) curve analysis [3]. To understand the meaning of each of these indices, consider the general example reported in Table 1, which cross-tabulates the results of a diagnostic test (positive/negative) against the presence/absence of a given disease.
| Test result | Disease present | Disease absent | Total |
|---|---|---|---|
| Positive | a | b | a + b |
| Negative | c | d | c + d |
| Total | a + c | b + d | N |
Table 1: Contingency table of the agreement between test results and the presence/absence of the disease.
The data in Table 1 are used to calculate the indices of the diagnostic performance of a test, namely:
sensitivity, i.e. the percentage of patients with a positive test among those who have the disease [a/(a + c)] (true-positive rate);
specificity, i.e. the percentage of patients with a negative test among those who do not have the disease [d/(b + d)] (true-negative rate).
The degree of uncertainty around the estimates of sensitivity and specificity is expressed by the 95% confidence interval (95% CI), which conveys the precision of each estimate. The other indices are:
positive predictive value, i.e. the percentage of patients with the disease among those who test positive [a/(a + b)];
negative predictive value, i.e. the percentage of patients without the disease among those who test negative [d/(c + d)];
accuracy, i.e. the percentage of patients correctly classified by the test [(a + d)/N].
Positive LR (+LR): the ratio between the probability that the test is positive in sick people and the probability that the test is positive in healthy people [+LR = sensitivity/(100 - specificity)]. Thus, it is the ratio between the true-positive rate and the false-positive rate. From this perspective, the +LR expresses how many times more likely a positive test result is in a sick subject than in a healthy one. As a consequence, the greater the +LR, the greater the diagnostic performance of the test.
Negative LR (-LR): the ratio between the probability that the test is negative in sick people and the probability that the test is negative in healthy people [-LR = (100 - sensitivity)/specificity]. Thus, it is the ratio between the false-negative rate and the true-negative rate. Therefore, the -LR expresses how likely a negative test result is in a sick subject relative to a healthy one. As a consequence, the lower the -LR, the greater the diagnostic performance of the test. Although there are no universally accepted criteria in the literature for interpreting positive and negative LRs, one can refer to Table 2, which relates some possible values of the two indices to the usefulness of the test in clinical practice.
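The definitions above map directly onto the four cells of Table 1. As a minimal sketch (not the authors' code; the function name `diagnostic_indices` is hypothetical and the counts in the usage line are invented for illustration), all seven indices can be computed together:

```python
import math


def diagnostic_indices(a, b, c, d):
    """Indices of diagnostic performance from a 2x2 table laid out as in Table 1.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    Proportions are returned as fractions (multiply by 100 for percentages).
    """
    sens = a / (a + c)
    spec = d / (b + d)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
        # +LR has no finite value when there are no false positives (specificity = 1)
        "lr_pos": sens / (1 - spec) if b > 0 else math.inf,
        "lr_neg": (1 - sens) / spec,
    }


# Hypothetical counts, for illustration only
for name, value in diagnostic_indices(a=90, b=20, c=10, d=80).items():
    print(f"{name}: {value:.3f}")
```

Returning `math.inf` for the +LR when no false positives are observed is a design choice of this sketch; tables with an empty false-positive cell do occur in practice.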
| +LR | -LR | Impact of the test on the diagnosis |
|---|---|---|
| >10 | <0.1 | Conclusive |
| 5-10 | 0.1-0.2 | Moderately useful |
| 2-5 | 0.2-0.5 | Sometimes useful |
| 1-2 | 0.5-1 | Rarely useful |
| 1 | 1 | Useless |
Table 2: General interpretation of positive and negative LR in clinical practice.
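The bands of Table 2 can be encoded as a small lookup. This is only a sketch: adjacent bands in Table 2 share their endpoints (for example, 5 appears in both "5-10" and "2-5"), so the way boundary values are assigned here is our assumption, as are the hypothetical function names.

```python
def interpret_lr_pos(lr):
    """Usefulness of a positive likelihood ratio according to the bands of Table 2."""
    if lr > 10:
        return "Conclusive"
    if lr > 5:
        return "Moderately useful"
    if lr > 2:
        return "Sometimes useful"
    if lr > 1:
        return "Rarely useful"
    # Values below 1 are not listed in Table 2; they are lumped here as a simplification
    return "Useless"


def interpret_lr_neg(lr):
    """Usefulness of a negative likelihood ratio according to the bands of Table 2."""
    if lr < 0.1:
        return "Conclusive"
    if lr < 0.2:
        return "Moderately useful"
    if lr < 0.5:
        return "Sometimes useful"
    if lr < 1:
        return "Rarely useful"
    # Values above 1 are not listed in Table 2; they are lumped here as a simplification
    return "Useless"


for value in (12, 6.6, 2.2, 1.5):
    print(f"+LR {value}: {interpret_lr_pos(value)}")
for value in (0.05, 0.13, 0.44, 0.7):
    print(f"-LR {value}: {interpret_lr_neg(value)}")
```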
The description of a commercially available diagnostic kit typically reports sensitivity and specificity but not positive and negative predictive values. This is because sensitivity and specificity, being independent of the prevalence (or pre-test probability) of the disease, are considered fixed properties of the test, i.e. they are valid in any context, whereas positive and negative predictive values are context-specific indices, i.e. they are affected by the prevalence of the disease of interest. The reference study design for assessing the diagnostic value of a given biomarker is the cross-sectional study. Indeed, the cross-sectional design reflects the typical setting of "making a diagnosis", that is, establishing whether or not a patient is affected by the disease of interest, based on the result of the diagnostic test at the time of the visit.
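The contrast between fixed and context-specific indices can be checked with a toy calculation. In this sketch (hypothetical counts and a hypothetical helper `indices`, not data from any cited study), the number of patients without the disease is multiplied by nine, lowering the prevalence from 50% to 10% while leaving the test's behavior in each group unchanged:

```python
def indices(a, b, c, d):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table (Table 1 layout)."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts: 100 patients with and 100 without the disease (prevalence 50%)
a, b, c, d = 90, 20, 10, 80
k = 9  # hypothetical: nine times as many patients without the disease (prevalence 10%)

base = indices(a, b, c, d)
low_prev = indices(a, b * k, c, d * k)  # only the disease-absent column is scaled

for key in base:
    print(f"{key}: {base[key]:.3f} -> {low_prev[key]:.3f}")
```

Sensitivity and specificity stay the same, whereas the PPV falls from about 0.82 to 0.33 and the NPV rises from about 0.89 to 0.99.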
In this paper, we explain how to calculate and interpret the simplest statistical indices that characterize a diagnostic test through several worked examples.
The first example concerns the diagnostic value of troponin T for alterations in left ventricular mass in patients with end-stage kidney failure (ESKF) [4].
The second example is taken from a paper on cardiac allograft vasculopathy (CAV), a major threat to long-term survival after heart transplantation (HT) [5]. The role of oxidative stress in the pathogenesis of this vasculopathy is well established; for this reason, the authors sought a simpler and less expensive alternative to coronary angiography (CAG) for predicting CAV, and identified the oxidative stress index (OSI), defined as the ratio of the serum level of total oxidant status (TOS) to the serum level of total antioxidant capacity (TAC).
The third example concerns virology, specifically the diagnosis of SARS-CoV-2 infection, and shows the application of these simple statistical methods to both molecular and serological tests [6].
Example 1
Cardiac troponin T (cTnT) is a strong predictor of adverse clinical outcomes in patients with ESKF, and there is strong evidence that this peptide could serve as a biomarker of alterations in left ventricular mass and function in this patient population [4]. Studies on cardiac biomarkers are a growing research area because they aim to identify candidates that could replace echocardiography, at least in the screening of patients at risk.
In a cross-sectional study, 199 ESKF patients were enrolled to assess the overall accuracy of cTnT in discriminating patients with and without left ventricular hypertrophy (LVH), as assessed by echocardiography. Overall, 149 of the 199 patients had LVH. Thus, the pre-test probability, or prevalence, of LVH in the study sample is 75% (149/199 = 0.75). In this study, the best cutoff of cTnT for identifying patients with LVH (i.e. the value giving the best combination of sensitivity and specificity, as identified by ROC curve analysis [3]) was 55 ng/L. The sensitivity and specificity of this cTnT threshold were 70.0% and 68.0%, respectively (Table 3).
| Test result | LVH present | LVH absent | Total |
|---|---|---|---|
| Positive (cTnT >55 ng/L) | 105 | 16 | 121 |
| Negative (cTnT ≤55 ng/L) | 44 | 34 | 78 |
| Total | 149 | 50 | 199 |
Table 3: Diagnostic value of the cTnT cutoff of 55 ng/L to identify ESKF patients with LVH at echocardiography.
The indices of the diagnostic value of cTnT were calculated as follows:
 Sensitivity: 105/149=0.70 (70%)
 Specificity: 34/50=0.68 (68%).
False-positive rate (1 - specificity): 100 - 68 = 32.0%;
 Positive predictive value: 105/121=0.87 (87%)
 Negative predictive value: 34/78=0.44 (44%)
 Accuracy: (105+34) /199=0.70 (70%)
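The indices above are point estimates. As a hedged illustration of the 95% CI mentioned earlier, the sketch below computes Wilson score intervals for the sensitivity (105/149) and specificity (34/50) of the cTnT threshold. The Wilson method and the helper name `wilson_ci` are our assumptions; the cited study may have used a different interval method.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score confidence interval for a binomial proportion.

    z = 1.96 gives an approximate 95% interval.
    """
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Table 3: 105 of 149 patients with LVH test positive; 34 of 50 without LVH test negative
sens_ci = wilson_ci(105, 149)
spec_ci = wilson_ci(34, 50)
print(f"Sensitivity 70% (95% CI {sens_ci[0]:.0%} to {sens_ci[1]:.0%})")
print(f"Specificity 68% (95% CI {spec_ci[0]:.0%} to {spec_ci[1]:.0%})")
```

The interval for specificity comes out wider because it rests on only 50 patients without LVH, against 149 with LVH for sensitivity.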
Sensitivity and specificity can be combined into a single index: the “likelihood ratio”.
The +LR corresponding to the cTnT threshold of 55 ng/L is 2.2 [70/(100 - 68) = 2.2, that is, sensitivity/(100 - specificity)], while the -LR is 0.44 [(100 - 70)/68 = 0.44, that is, (100 - sensitivity)/specificity]. To interpret a +LR of about 2, we write it as a ratio with 1 in the denominator (2/1). In a population with equal numbers of patients with and without LVH (pre-test odds of 1), this means that, of every 3 patients classified as having LVH by a cTnT value >55 ng/L, 2 are "true positives" and 1 is a "false positive". To interpret a -LR of 0.44, we first calculate its inverse (1/0.44 = 2.3, i.e. about 2) and again place 1 in the denominator: under the same assumption, of every 3 patients classified as free of LVH by a cTnT value ≤55 ng/L, 2 are "true negatives" and 1 is a "false negative".
We previously introduced the concept of the pre-test probability of disease and clarified that this probability generally corresponds to the prevalence. Therefore, in the 199 patients the pre-test probability of LVH is 75% (149/199, see Table 3).
We have already stated that the positive predictive value of a biomarker depends on disease prevalence, whereas sensitivity and specificity are fixed properties of a test. To illustrate this important concept, we calculate the positive predictive value of cTnT (>55 ng/L) for LVH in two hospital wards, A and B, in which the prevalence of LVH is 20% and 70%, respectively. By applying Bayes' theorem, the positive predictive value of cTnT can be calculated from the prevalence and the +LR.
We start with ward A.
First, we calculate the pre-test odds:
$$\text{pre-test odds} = \frac{\text{prevalence}}{100 - \text{prevalence}} = \frac{20.0}{100 - 20.0} = 0.25$$
Then, we calculate the post-test odds:
$$\text{post-test odds} = \text{pre-test odds} \times (+\text{LR}) = 0.25 \times 2.2 = 0.55$$
Therefore, the post-test probability, or positive predictive value, of cTnT for LVH in ward A is:
$$\text{post-test probability} = \frac{\text{post-test odds}}{\text{post-test odds} + 1} = \frac{0.55}{0.55 + 1} = \frac{0.55}{1.55} = 0.35\ (35\%)$$
We now make the same calculations for ward B:
$$\text{pre-test odds} = \frac{70.0}{100 - 70.0} = 2.33, \qquad \text{post-test odds} = 2.33 \times 2.2 = 5.1$$
$$\text{post-test probability} = \frac{5.1}{5.1 + 1} = \frac{5.1}{6.1} = 0.84\ (84\%)$$
It is evident that, for an unchanged +LR, the prevalence of LVH strongly affects the positive predictive value of the test: the higher the prevalence of LVH, the greater the positive predictive value of the cTnT threshold.
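The odds form of Bayes' theorem used in this calculation can be sketched in a few lines (a minimal illustration, not the authors' code; function names are hypothetical). Here the +LR is computed from the rounded sensitivity (70%) and specificity (68%) of Example 1, i.e. 0.70/0.32 ≈ 2.19, and the direct form of Bayes' theorem is included as a cross-check.

```python
def post_test_probability(pre_test_prob, lr):
    """Odds form of Bayes' theorem: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (post_odds + 1)


def ppv_direct(prevalence, sens, spec):
    """Bayes' theorem written directly in terms of sensitivity and specificity."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sens, spec = 0.70, 0.68          # cTnT > 55 ng/L (Example 1)
lr_pos = sens / (1 - spec)       # about 2.19

for ward, prevalence in [("A", 0.20), ("B", 0.70)]:
    ppv = post_test_probability(prevalence, lr_pos)
    print(f"Ward {ward}: prevalence {prevalence:.0%}, PPV {ppv:.0%}")
```

Both forms give the same answer; the odds form is convenient when only the LR and the prevalence are known.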
Now we calculate the negative predictive value of the cTnT threshold (≤55 ng/L) in the two wards, again applying Bayes' theorem.
In ward A:
$$\text{pre-test odds} = \frac{\text{prevalence}}{100 - \text{prevalence}} = \frac{20.0}{100 - 20.0} = 0.25, \qquad \text{post-test odds} = \text{pre-test odds} \times (-\text{LR}) = 0.25 \times 0.44 = 0.11$$
The post-test probability of having LVH given a negative test (cTnT ≤55 ng/L) is then:
$$\text{post-test probability} = \frac{\text{post-test odds}}{\text{post-test odds} + 1} = \frac{0.11}{0.11 + 1} = \frac{0.11}{1.11} = 0.10\ (10\%)$$
From this, we derive the post-test probability of not having LVH given a negative test (cTnT ≤55 ng/L), which is the negative predictive value:
$$\text{negative predictive value} = 1 - 0.10 = 0.90\ (90\%)$$
For ward B, the same calculations give:
$$\text{pre-test odds} = \frac{70.0}{100 - 70.0} = 2.33, \qquad \text{post-test odds} = \text{pre-test odds} \times (-\text{LR}) = 2.33 \times 0.44 = 1.03$$
The post-test probability of having LVH given a negative test is:
$$\text{post-test probability} = \frac{1.03}{1.03 + 1} = \frac{1.03}{2.03} = 0.51\ (51\%)$$
and, again, the post-test probability of not having LVH given a negative test (cTnT ≤55 ng/L), i.e. the negative predictive value, is:
$$\text{negative predictive value} = 1 - 0.51 = 0.49\ (49\%)$$
Thus, the higher the prevalence of LVH, the lower the negative predictive value of the cTnT threshold and, vice versa, the lower the prevalence of LVH, the higher the negative predictive value of the cTnT threshold.
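The same odds-form calculation gives the negative predictive value once the -LR is used instead of the +LR. In this sketch (hypothetical names, rounded sensitivity and specificity from Example 1, so -LR = 0.30/0.68 ≈ 0.44), note that the NPV is 1 minus the post-test probability of disease, not 1 minus the post-test odds.

```python
def post_test_probability(pre_test_prob, lr):
    """Odds form of Bayes' theorem: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * lr
    return post_odds / (post_odds + 1)


sens, spec = 0.70, 0.68            # cTnT threshold of 55 ng/L (Example 1)
lr_neg = (1 - sens) / spec         # about 0.44

npv = {}
for ward, prevalence in [("A", 0.20), ("B", 0.70)]:
    # probability of LVH despite a negative test (cTnT <= 55 ng/L)
    p_disease_given_neg = post_test_probability(prevalence, lr_neg)
    npv[ward] = 1 - p_disease_given_neg
    print(f"Ward {ward}: P(LVH | negative) {p_disease_given_neg:.0%}, NPV {npv[ward]:.0%}")
```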
Example 2
Cardiac allograft vasculopathy (CAV) is an accelerated form of coronary artery disease (CAD) and a major factor limiting long-term survival after heart transplantation (HT), with a frequency ranging from 8% at 1 year to 50% at 10 years after HT.
Coronary angiography is the gold standard for detecting focal plaques, but it requires contrast media and ionizing radiation. Intravascular ultrasonography, the other sensitive tool for identifying CAV, can detect vasculopathy in the epicardial arteries but cannot evaluate the entire coronary tree. In one study, the researchers investigated a new non-invasive, sensitive and specific tool for the early detection of CAV, assessing the role of disturbances in the oxidative-antioxidative balance at the onset and during the progression of cardiac allograft vasculopathy [5]. The total oxidant status (TOS) is used to estimate the oxidation state of the body, and the total antioxidant capacity (TAC) to assess its antioxidant status. The oxidative stress index (OSI), i.e. the ratio of TOS to TAC, may be a more precise index of oxidative stress in the body because it integrates both measurements. A total of 194 consecutive patients after HT were enrolled in the study. The diagnosis of CAV was based on coronary angiography and classified as "early" or "late" according to the current International Society for Heart and Lung Transplantation criteria. Serum levels of TOS and TAC were measured by the methods described by Erel [7,8], and the overall oxidant-antioxidant balance was estimated by the OSI (TOS/TAC ratio). To identify risk factors for CAV, patients were classified as not having CAV, defined as the absence of any lesion in the coronary vessels, or as having CAV (grades CAV 1 to CAV 3).
The overall accuracy of TAC, TOS, and OSI for CAV detection was evaluated by calculating the area under the curve (AUC) in receiver operating characteristic (ROC) curve analysis [3]. The cutoff values of 1.08 for TAC and 4.94 for TOS, derived from the ROC curves, are the thresholds of the two biomarkers that maximize the difference between the true-positive rate and the false-positive rate. These two cutoffs achieved good sensitivity (74% and 65%, respectively) and specificity (85% and 90%) for CAV detection. Both markers achieved high PPV (83% and 86%, respectively) and NPV (77% and 72%), with favorable likelihood ratios (+LR = 4.8 for TAC and 6.3 for TOS; -LR = 0.30 for TAC and 0.39 for TOS) and good accuracy (79% and 77% for TAC and TOS, respectively). Combining TOS and TAC into the OSI ratio improved the identification of CAV: the OSI cutoff of 4.17 reached a sensitivity of 89% and a specificity of 87%.
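Choosing the cutoff that maximizes the difference between the true-positive rate and the false-positive rate amounts to maximizing Youden's index, J = sensitivity + specificity - 1. As a hedged sketch, the snippet below recomputes J and the likelihood ratios from the rounded percentages reported above. The results are close to, but not identical with, the published LRs (for example, about 4.9 rather than 4.8 for the +LR of TAC), presumably because the published figures were derived from unrounded estimates.

```python
# Reported sensitivity and specificity at each ROC-derived cutoff (Example 2)
markers = {
    "TAC (1.08)": (0.74, 0.85),
    "TOS (4.94)": (0.65, 0.90),
    "OSI (4.17)": (0.89, 0.87),
}

results = {}
for name, (sens, spec) in markers.items():
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    youden_j = sens + spec - 1   # true-positive rate minus false-positive rate
    results[name] = (lr_pos, lr_neg, youden_j)
    print(f"{name}: +LR {lr_pos:.1f}, -LR {lr_neg:.2f}, Youden J {youden_j:.2f}")
```

The OSI cutoff has the largest J of the three, consistent with the authors' observation that combining TOS and TAC improves CAV identification.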
Table 4: Summary of the ROC curve analysis for TAC, TOS, and OSI.
We now describe in detail the ability of the OSI ratio, at the cutoff value of 4.17, to differentiate patients with CAV from those without CAV (Table 5).
| Diagnostic test | With CAV | Without CAV | Total |
|---|---|---|---|
| Positive (OSI ratio >4.17) | 86 | 13 | 99 |
| Negative (OSI ratio ≤4.17) | 11 | 84 | 95 |
| Total | 97 | 97 | 194 |
Table 5: Diagnostic value of OSI ratio for detecting CAV
The diagnostic performance of the OSI ratio is as follows: PPV = 86/99 = 0.87 (87%), NPV = 84/95 = 0.88 (88%), sensitivity = 86/97 = 0.89 (89%), specificity = 84/97 = 0.87 (87%), and accuracy = (86 + 84)/194 = 0.88 (88%). The authors conclude that their study highlights a role of oxidative stress in cardiac allograft vasculopathy in HT recipients. The results show that the oxidative-antioxidative balance is shifted toward the production of free radicals. The OSI ratio represents a new, simple, non-invasive and low-cost marker for CAV detection, and it opens the possibility of adjunctive therapy with antioxidant substances in the management of patients after HT.
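The likelihood ratios of the OSI cutoff are not reported above, but they follow directly from the counts in Table 5. A minimal sketch (variable names are ours):

```python
# Table 5 counts (OSI ratio cutoff 4.17 vs coronary angiography)
tp, fp = 86, 13     # positive test: with CAV, without CAV
fn, tn = 11, 84     # negative test: with CAV, without CAV

sensitivity = tp / (tp + fn)
specificity = tn / (fp + tn)
lr_pos = sensitivity / (1 - specificity)
lr_neg = (1 - sensitivity) / specificity

print(f"+LR = {lr_pos:.1f}, -LR = {lr_neg:.2f}")
```

This gives a +LR of about 6.6 and a -LR of about 0.13, which would both fall in the "moderately useful" band of Table 2. Because both disease groups happen to contain 97 patients, the +LR here reduces to 86/13 and the -LR to 11/84.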
Example 3
The COVID-19 pandemic has demonstrated the importance of rapid and accurate diagnostics in the control of infectious diseases. Laboratory-based molecular assays for detecting SARS-CoV-2 in respiratory specimens are the current reference standard for COVID-19 diagnosis, but point-of-care technologies and serologic immunoassays are rapidly emerging. Although laboratory-based real-time reverse transcriptase polymerase chain reaction (RT-PCR) assays on respiratory specimens are the cornerstone of COVID-19 diagnostic testing, several novel or complementary diagnostic methods are being developed and evaluated. Bisoffi et al. [6] assessed the sensitivity, specificity, and positive and negative predictive values (PPV and NPV) of three widely used molecular (RT-PCR) tests (with six different gene targets) and of six serologic tests [five IgG-IgM rapid diagnostic tests (RDT) and an IgA-IgG ELISA] for the diagnosis of SARS-CoV-2 infection. All consecutive patients presenting to the emergency room with clinical suspicion of COVID-19 and undergoing diagnostic testing were enrolled, for a total of 346 patients. Of these, 85 (24.6%) were classified as infected and 261 (75.4%) as not infected. Thus, the pre-test probability of the disease was 24.6%. As a first example, we report the results of the assay targeting the RNA-dependent RNA polymerase gene ("Target RdRP kit", see Table 6). Sensitivity and specificity were 80/85 = 0.941 (94.1%) and 259/261 = 0.992 (99.2%), respectively. PPV and NPV were 80/82 = 0.976 (97.6%) and 259/264 = 0.981 (98.1%), respectively.
| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 80 | 2 | 82 |
| Negative | 5 | 259 | 264 |
| Total | 85 | 261 | 346 |
Table 6: Diagnostic value of the molecular test “Target RdRP (kit)”
Specificity and PPV reached 100% with an assay (Real Quality RQ-SARS-nCoV-2) targeting two genes instead of one, namely the spike protein gene (S) and the RNA-dependent RNA polymerase gene (RdRp) ("Target S and RdRp kit") (Table 7). Sensitivity was 78/85 = 0.918 (91.8%) and NPV was 261/268 = 0.974 (97.4%).
| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 78 | 0 | 78 |
| Negative | 7 | 261 | 268 |
| Total | 85 | 261 | 346 |
Table 7: Diagnostic value of the Molecular test “Target S and RdRp (kit)”
We now consider the diagnostic value of the serological tests, reporting as an example only one of the assays used, the "Prima Professional IgM". The sensitivity of this test was 39/85 = 0.459 (45.9%), the specificity 208/261 = 0.797 (79.7%), the PPV 39/92 = 0.424 (42.4%), and the NPV 208/254 = 0.819 (81.9%) (Table 8).
| Diagnostic test | With SARS-CoV-2 infection | Without SARS-CoV-2 infection | Total |
|---|---|---|---|
| Positive | 39 | 53 | 92 |
| Negative | 46 | 208 | 254 |
| Total | 85 | 261 | 346 |
Table 8: Diagnostic value of the Serological test “Prima Professional IgM”
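The three tests can be compared side by side by recomputing the indices from the counts in Tables 6 to 8. A short sketch (the dictionary layout is ours); when a test produces no false positives, as in Table 7, the +LR has no finite point estimate and is reported as infinite:

```python
import math

# (TP, FP, FN, TN) from Tables 6-8
tables = {
    "Target RdRP kit": (80, 2, 5, 259),
    "Target S and RdRp kit": (78, 0, 7, 261),
    "Prima Professional IgM": (39, 53, 46, 208),
}

summary = {}
for name, (tp, fp, fn, tn) in tables.items():
    sens = tp / (tp + fn)
    spec = tn / (fp + tn)
    ppv = tp / (tp + fp)
    npv = tn / (fn + tn)
    # with no false positives the +LR has no finite point estimate
    lr_pos = sens / (1 - spec) if fp > 0 else math.inf
    summary[name] = (sens, spec, ppv, npv, lr_pos)
    print(f"{name}: sens {sens:.1%}, spec {spec:.1%}, PPV {ppv:.1%}, "
          f"NPV {npv:.1%}, +LR {lr_pos:.1f}")
```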
The authors conclude that, for molecular diagnostic purposes, accepting a positive result in any single gene target appears justified for patients with clinical suspicion of COVID-19 in an emergency room. Conversely, confirming the diagnosis through positivity in multiple genomic regions might be more appropriate when the test is used for screening during a phase of low or very low viral circulation. The serologic tests included in this study did not show sufficient sensitivity for clinical use in acutely ill patients. An overview of the critical appraisal methods and problem-solving skills needed for the accurate diagnosis of infectious diseases and the identification of infectious agents is reported elsewhere [9].
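The recommendation to require multiple gene targets at low viral circulation follows from how the PPV behaves at low prevalence. In this sketch, the 1% prevalence is a hypothetical screening scenario, not a figure from the study, and `ppv` is a hypothetical helper implementing Bayes' theorem. With the study's own prevalence (85/346), the same function reproduces the PPV of Table 6 (80/82).

```python
def ppv(prevalence, sens, spec):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


low_prevalence = 0.01  # hypothetical screening setting, not from the study

single = ppv(low_prevalence, sens=80 / 85, spec=259 / 261)   # Table 6
dual = ppv(low_prevalence, sens=78 / 85, spec=261 / 261)     # Table 7

print(f"Single target (RdRP): PPV {single:.0%}")
print(f"Dual target (S and RdRp): PPV {dual:.0%}")
```

At 1% prevalence, only about 2 false positives among 261 uninfected patients are enough to pull the single-target PPV down to roughly 55%. The 100% specificity of the dual-target assay is a point estimate from 261 uninfected patients, so its real-world PPV at low prevalence would be lower than this sketch suggests if occasional false positives occur.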
Summary and Conclusions
In everyday clinical practice, diagnosis is first of all a logical process that starts with an accurate evaluation of the patient's signs and symptoms. Diagnosis is essential in clinical medicine. The ideal diagnostic test perfectly discriminates the sick from the healthy, i.e. it has a sensitivity and specificity of 100%. However, the reference standards that reflect 'the truth' are generally expensive and time-consuming and require specific expertise, so their large-scale use is rather limited. For this reason, studies on the diagnostic value of specific biomarkers are a growing research area, because they aim to identify cheap, relatively easy-to-measure biomarkers that can replace a specific gold standard. Sensitivity and specificity are measures of the accuracy of a diagnostic test that do not depend on the prevalence of the disease. However, two tests with the same accuracy can display different true-positive and true-negative rates. Positive and negative predictive values, on the other hand, depend on the prevalence of the disease in the population, so values calculated in one population may not apply to a different one. Positive and negative likelihood ratios are useful because they combine true positives and false positives (+LR) and true negatives and false negatives (-LR). The appropriate study design for diagnostic research is the cross-sectional design.
Author Contributions: Conceptualization, D.L., G.D., G.T.; investigation, D.L., S.A.E., G.T.; methodology, S.A.E., G.D., M.F.; supervision, D.L., S.R., G.T.; writing (original draft preparation), D.L., G.T.; writing (review and editing), D.L., G.T. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Cleophas TJ, Zwinderman AH, Cleophas TF, Cleophas EP (2009) Summary of Validation Procedures for Diagnostic Tests. Statistics Applied to Clinical Trials 4: 433-447.
2. van Stralen KJ, Stel VS, Reitsma JB, Dekker FW, Zoccali C, Jager KJ (2009) Diagnostic methods I: sensitivity, specificity, and other measures of accuracy. Kidney Int 75: 1257-1263.
3. Tripepi G, Jager KJ, Dekker FW, Zoccali C (2009) Diagnostic methods 2: receiver operating characteristic (ROC) curves. Kidney Int 76: 252-256.
4. Mallamaci F, Zoccali C, Parlongo S, Tripepi G, Benedetto FA, et al.; Cardiovascular Risk Extended Evaluation in Dialysis Investigators (2002) Diagnostic value of troponin T for alterations in left ventricular mass and function in dialysis patients. Kidney Int 62: 1884-1890.
5. Szczurek W, Gąsior M, Romuk E, Skrzypek M, Zembala M, et al. (2020) Investigation of the Role of Oxidative Stress and Factors Associated with Cardiac Allograft Vasculopathy in Patients after Heart Transplantation. Oxidative Medicine and Cellular Longevity 9: 9.
6. Bisoffi Z, Pomari E, Deiana M, Piubelli C, Ronzoni N, et al. (2020) Sensitivity, Specificity and Predictive Values of Molecular and Serological Tests for COVID-19: A Longitudinal Study in Emergency Room. Diagnostics (Basel) 10: 669.
7. Erel O (2005) A new automated colorimetric method for measuring total oxidant status. Clinical Biochemistry 12: 1103-1111.
8. Erel O (2004) A novel automated direct measurement method for total antioxidant capacity using a new generation, more stable ABTS radical cation. Clinical Biochemistry 4: 277-285.
9. Mahon C, Lehman D (2022) Textbook of Diagnostic Microbiology.