Samuel Chao1*, Tanya Pilcz1, Dimitri Stamatiou1, Jay Ying1, Robert Burakoff2, Leroy D. Mell3
1GeneNews Ltd, Richmond Hill, Ontario, Canada
2Weill Cornell Medical College, New York, New York, USA
3Innovative Diagnostic Laboratory, Richmond, Virginia, USA
*Corresponding author: Samuel Chao, GeneNews Ltd, Richmond Hill, Ontario, Canada. Tel: +19052092030; Email: firstname.lastname@example.org
Received Date: 03 January, 2019; Accepted Date: 10 January, 2019; Published Date: 18 January, 2019
ColonSentry® is a molecular test for assessing the risk of colorectal cancer and pre-malignant lesions in average-risk individuals. Initially developed from a clinical study involving approximately 10,000 subjects in North America, this test has been commercialized and administered to over 100,000 patients. We compare the real-life distribution of the results against the model that was initially developed, review the results in the context of measurement stability, and evaluate the validity of the assumptions made during the construction of the mathematical model. We confirm that the commercial application of the test falls well within the designed quality assurance limits and that stability was maintained over a period of multiple years. The model’s assumption of two subpopulations, one with colorectal cancer at 0.7% prevalence and the other without colorectal cancer at 99.3% prevalence, fits the data within the expected measurement tolerances. We discuss enhancement of the model to address a precancerous polyp phase subpopulation, and how the test results can be used to identify patients who should be referred directly for colonoscopy versus other modalities for colorectal cancer screening.
2. Keywords: Colon cancer; ColonSentry; Colorectal cancer; Diagnostic; Early detection
In 2018, approximately 97,000 new cases of colon cancer and 43,000 new cases of rectal cancer were anticipated to be diagnosed in the United States. Colorectal cancer (CRC) is the third most common cancer diagnosed in the US, with a lifetime risk for men and women of approximately 4%. Early diagnosis is critical to survival. The 5-year survival rate for stage I colon cancer is ~92%, whereas the survival rate for stage IIIB-IV varies from 69% to 11%, depending on the extent of disease. The United States Preventive Services Task Force recommends screening for colorectal cancer using stool-based tests (gFOBT, FIT, FIT-DNA) or direct visualization tests (sigmoidoscopy, colonoscopy, CT colonography) in adults, beginning at age 50 years and continuing until age 75. Compliance and risks associated with these procedures vary.
GeneNews developed and validated ColonSentry, a convenient blood-based colorectal cancer risk prediction test that determines an individual’s current risk of having colorectal cancer. Risk is determined by measuring the expression levels of 7 genes (ANXA3, CLEC4D, LMNB1, PRRG4, TNFAIP6, VNN1 and IL2RB) in the blood and inputting that information into a proprietary algorithm. Clinical validation results were published in 2009 in the International Journal of Cancer. The ColonSentry model was developed on a training set consisting of 112 CRC and 120 Controls, with an area under the curve (AUC) of 0.80 (95% confidence interval: 0.74 to 0.85), 64% specificity, 82% sensitivity and 73% accuracy. The predictive performance was validated on an age/gender/ethnicity-balanced test set consisting of 202 CRC and 208 Controls, with an AUC of 0.80 (95% confidence interval: 0.76 to 0.84), 70% specificity, 72% sensitivity and 71% accuracy. An analysis of the prediction distribution by location and stage of CRC shows equal sensitivity for left- and right-sided lesions and a progressive increase in sensitivity as the cancer progresses. ColonSentry has been commercially available in the US since 2012 and has been offered by the CLIA-accredited Innovative Diagnostic Laboratory (IDL), located in Richmond, VA, since 2014.
Subsequent to the launch of ColonSentry in 2008, many groups have independently validated gene expression from the 7-gene panel, individually or in combination, for use as CRC diagnostic markers. In 2010, Yip et al. validated ColonSentry in Malaysia on 99 CRC and 111 Controls, reporting an AUC of 0.76 (95% confidence interval: 0.70 to 0.82), 77% specificity, 61% sensitivity and 70% accuracy, comparable to the data obtained from the North American validation. Chang et al. developed a blood-based CRC detection assay that included the ANXA3, TNFAIP6 and IL2RB biomarkers. ColonSentry has been shown to detect left- and right-sided CRC with similar sensitivity, unlike colonoscopy, which can miss right-sided lesions [7-9]. To date, ColonSentry has been used to assess colon cancer risk in over 100,000 patients from the United States. We evaluate how well the model developed for ColonSentry approximated the general population and whether the assumptions that were made could be validated.
4.1 ColonSentry Logistic Regression (LogReg) Score and Relative Risk Determination
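The qPCR data of each sample is specified by a vector of delta Ct values, $x = (x_1, \ldots, x_n)$, one for each gene of interest measured relative to the denominator gene. The log-odds value of a sample being predicted as CRC (the LogReg score) was given by

$$s = \beta_0 + \sum_{i=1}^{n} \beta_i x_i,$$

where $\beta_0, \ldots, \beta_n$ are the fitted logistic regression coefficients.

Bayes’ Theorem was applied to calculate the current CRC risk using LogReg scores. The LogReg score distributions of CRC and controls in the dataset were used to determine the corresponding distributions in the average-risk population. More precisely, let $P(s \mid \mathrm{CRC})$ denote the conditional probability of CRC patients having LogReg score $s$, and $P(s \mid \mathrm{Control})$ the conditional probability of controls having LogReg score $s$. Then, given a subject’s LogReg score $s$, the probability of that subject having CRC was given by

$$P(\mathrm{CRC} \mid s) = \frac{p \, P(s \mid \mathrm{CRC})}{p \, P(s \mid \mathrm{CRC}) + (1 - p) \, P(s \mid \mathrm{Control})},$$

where the a priori probability p=0.007 was the CRC prevalence in the average-risk population.

An individual’s relative risk (RR) for CRC, reported as their “CRC Score” and defined as the probability of having CRC divided by the CRC prevalence, was given by

$$\mathrm{RR}(s) = \frac{P(\mathrm{CRC} \mid s)}{p}.$$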
At RR=1.0, a subject has the same CRC risk as the un-stratified average-risk population.
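The Bayes step and the relative risk calculation described in this section can be illustrated with a minimal Python sketch. The class-conditional score distributions are assumed here to be normal, and the means and standard deviations are hypothetical placeholders; the actual ColonSentry distributions and regression coefficients are not given in this text. Only the prevalence p=0.007 comes from the article.

```python
import math

PREVALENCE = 0.007  # a priori CRC prevalence in the average-risk population (from the text)


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def crc_probability(score, crc=(1.0, 1.0), control=(-1.0, 1.0), p=PREVALENCE):
    """Posterior probability of CRC given a LogReg score, via Bayes' theorem.

    `crc` and `control` are hypothetical (mean, sd) pairs standing in for
    the empirical score distributions of the two subgroups.
    """
    like_crc = normal_pdf(score, *crc)
    like_control = normal_pdf(score, *control)
    return p * like_crc / (p * like_crc + (1.0 - p) * like_control)


def relative_risk(score, crc=(1.0, 1.0), control=(-1.0, 1.0), p=PREVALENCE):
    """Relative risk: posterior probability of CRC divided by the prevalence."""
    return crc_probability(score, crc, control, p) / p
```

With these placeholder parameters, a score at which both subgroups are equally likely yields a relative risk of 1.0, which matches the interpretation of RR=1.0 given above; scores further toward the CRC distribution yield a relative risk above 1.0.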
4.2 ColonSentry Test Procedure and Data Collection from IDL
4.2.1 qPCR and Plate-to-plate calibration
For qRT‐PCR, blood collected in PAXgene™ tubes (PreAnalytiX) was processed according to PAXgene™ Blood RNA Kit protocol. RNA quantity was determined by absorbance at 260nm in a NanoDrop 8000 (Thermo Scientific™).
Approximately one microgram of RNA was reverse transcribed into single‐stranded complementary DNA (cDNA) using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) in a 20 μL reaction volume. For PCR, 8 ng cDNA was mixed with QuantiTect® Probe PCR Master Mix (Qiagen) and TaqMan® dual‐labeled probe and primers corresponding to the gene‐of‐interest and denominator in a 10 μL reaction volume. PCR amplification was performed using a ViiA 7 Real-Time PCR Instrument (Applied Biosystems). Quality assurance processes included verification that the negative template control showed no amplification, review of the amplification curve shape for adequate signal, review of the difference between duplicate wells, and review of the stability of the calibrator, positive and negative reference samples. Samples that failed these quality control checks were repeated. Samples that failed a second time were excluded from the analysis. To stabilize the qPCR measurements against variations from various sources (e.g., instrument, reagent lots), a known reference obtained from a qualified pooled RNA is placed on each plate and run alongside the patient samples.
Measured delta Ct values are then compared against the established reference values and these results are then used to calibrate the unknown samples. To evaluate the performance of this calibration procedure, two other known and qualified references are also measured on each plate: a “positive” reference known to generate a high CRC score and a “negative” reference known to generate a low CRC score. These two references are processed the same way as the unknown subjects. The “calibrated” delta Ct for these two references can then be monitored for deviations from expected values.
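The plate-to-plate calibration and reference monitoring described above can be sketched as follows. This is only an illustration under a stated assumption: the calibration is modeled as a simple per-gene additive offset, which the text does not confirm. The function names, the tolerance parameter and the numeric values in the usage lines are hypothetical.

```python
def plate_offsets(measured_calibrator, expected_calibrator):
    """Per-gene correction: established reference value minus the value measured on this plate."""
    return {gene: expected_calibrator[gene] - measured_calibrator[gene]
            for gene in expected_calibrator}


def calibrate(sample_delta_ct, offsets):
    """Apply the plate offsets to one sample's delta Ct values (assumed additive)."""
    return {gene: value + offsets[gene] for gene, value in sample_delta_ct.items()}


def within_tolerance(calibrated, expected, tol):
    """Check that a calibrated positive or negative reference stays near its expected values."""
    return all(abs(calibrated[gene] - expected[gene]) <= tol for gene in expected)


# Usage with made-up numbers: the calibrator reads 0.2 cycles high for ANXA3
# on this plate, so every sample on the plate is corrected by -0.2 for that gene.
offsets = plate_offsets({"ANXA3": 5.2, "VNN1": 3.9}, {"ANXA3": 5.0, "VNN1": 4.0})
calibrated_sample = calibrate({"ANXA3": 6.2, "VNN1": 2.9}, offsets)
```

The same `calibrate` call would be applied to the positive and negative references, and `within_tolerance` then flags a plate whose calibrated references deviate from their expected values.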
4.2.2 Data Collection
Since inception, more than 100,000 ColonSentry tests for clinical purposes have been performed in the U.S., 95,139 of which included a minimal set of clinical information to verify whether the patient would have qualified as “average risk” (i.e., no first-degree relative with CRC, no previous CRC or surgery for CRC). The age distribution by gender of these 95,139 patients is presented in Figure 2. ColonSentry scores from these 95,139 patients, collected and processed at IDL (Richmond, Virginia) as described in 4.2.1, were used in this analysis.
4.3 Model Development
ColonSentry scores from 95,139 patients, collected and processed at IDL (Richmond, Virginia) as described in 4.2.1, were plotted, and the distribution of these scores was compared to the model’s projected score distribution for an average-risk population with 0.7% CRC prevalence.
4.4 Model Comparison and Evaluation
The histogram of the distribution of accumulated patient scores was compared to the predicted distribution based on the model described above using Bayes’ theorem. The bin size was set to 0.1 units on the LogReg scale. The difference between the two distributions is quantified as the root mean square (RMS) error, defined as the square root of the mean of the squared differences between the two distributions, evaluated at each LogReg score bin along the horizontal axis of the distribution chart.
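The histogram comparison and RMS error defined above can be sketched in a few lines of Python. The function names and the choice to express bin heights as percentages of all scores are assumptions made for illustration; the 0.1 bin size is from the text.

```python
import math


def histogram(scores, lo, hi, bin_size=0.1):
    """Percentage of scores falling in each bin of width `bin_size` on [lo, hi)."""
    n_bins = int(round((hi - lo) / bin_size))
    counts = [0] * n_bins
    for s in scores:
        idx = math.floor((s - lo) / bin_size)
        if 0 <= idx < n_bins:  # scores outside [lo, hi) are ignored
            counts[idx] += 1
    total = len(scores)
    return [100.0 * c / total for c in counts]


def rms_error(observed, predicted):
    """Square root of the mean of the squared bin-by-bin differences."""
    squared = [(o - m) ** 2 for o, m in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

Both distributions must be evaluated on the same bins before calling `rms_error`, so the model curve would be sampled at the same bin positions as the observed histogram.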
5.1 Comparison and Evaluation of the Predictive Model to Observed Data
Using Bayes’ Theorem, the CRC samples from the clinical trial were scaled to the known 0.7% prevalence, with the non-CRC samples scaled to represent the remaining 99.3% of the average-risk population. The expected LogReg score distribution is presented in Figure 1.
More than 100,000 ColonSentry tests were performed, 95,139 of which included a minimal set of clinical information to verify whether the patient would have qualified as “average risk” (i.e., no first-degree relative with CRC, no previous CRC or surgery for CRC). The age distribution by gender of these 95,139 patients is presented in Figure 2.
The model generated scores for the patients from which CRC relative risk could be predicted. The distribution of these scores was compared to the model’s projected score distribution for an average-risk population with 0.7% CRC prevalence (Figure 3). There was a slight displacement to the right for the actual IDL distribution (blue) relative to the model's curve (red). The asymmetrical difference curve (green) suggests that the error is mainly a relative displacement between the two curves rather than a difference in standard deviation. The total root mean square (RMS) error was determined to be 0.051%.
One way to estimate the drift is to shift the results until the difference is minimized. The optimum shift was determined to be 0.1 units, which achieved near-perfect overlap throughout the range of the test results. A shift of 0.1 units is well within the allowed tolerance for the ColonSentry test, which specifies a window of +/- 0.6 units for the LogReg score at 95% confidence when all QC limits are met. Shifting the IDL lab data by 0.1 units reduced the overall RMS error to 0.033%, about 1.6 times smaller than for the un-shifted results (Figure 4).
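The drift estimate described above amounts to a one-dimensional search over candidate shifts. A minimal sketch follows, assuming the shift is applied in whole bins of 0.1 units and that bins shifted in from outside the range are empty; the function names and the search range are hypothetical.

```python
import math


def rms_error(observed, predicted):
    """Square root of the mean of the squared bin-by-bin differences."""
    squared = [(o - m) ** 2 for o, m in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def shift_bins(values, k):
    """Shift a histogram k bins to the right (negative k: to the left), padding with zeros."""
    n = len(values)
    if k >= 0:
        return [0.0] * min(k, n) + list(values[: max(n - k, 0)])
    return list(values[-k:]) + [0.0] * min(-k, n)


def best_shift(observed, predicted, max_bins=10, bin_size=0.1):
    """Return (shift in LogReg units, RMS error) for the shift that minimizes the RMS error."""
    best = min(range(-max_bins, max_bins + 1),
               key=lambda k: rms_error(shift_bins(observed, k), predicted))
    return best * bin_size, rms_error(shift_bins(observed, best), predicted)
```

In this sketch a negative result means the observed histogram had to be moved to the left to match the model, which corresponds to an observed distribution displaced to the right of the model curve.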
5.2 Model for Early Detection
Our original model only accounted for two subgroups in the average population: subjects with CRC and subjects who are confirmed by colonoscopy and pathology to be free from CRC, polyps or advanced adenoma. However, population statistics have determined that there is a significant additional subgroup with either polyps or advanced adenomas which are non-cancerous precursor stages of CRC.
We hypothesized that this additional subgroup would have prediction scores distributed between those of the CRC and Control groups, with the same spread, and a prevalence in the range of 9% to 37% (Telford 2010: 9% at age 50; Frazier 2000: 21%; Imperiale 2014: 37%) [10-12]. The three-subgroup model is presented in Figure 5.
The difference between the IDL data and the model is minimized for a shift of zero and 17% prevalence for the pre-cancer stage centered about one quarter of the distance between the CRC and the pathology-free subgroups (Figure 6). The RMS error at 0.031% is lower than the value from the unshifted 2-subgroup model and even slightly lower than the 0.1-shifted model. The zero-shift is consistent with the results of the positive and negative controls, so it is more likely that the 3-subgroup model is the more accurate representation of real-life data.
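The three-subgroup fit can be sketched as a mixture of three score distributions with a grid search over the pre-cancer prevalence. Everything numeric here except the 0.7% CRC prevalence is a placeholder: the subgroup distributions are assumed normal with a common spread, the means are hypothetical, and the "one quarter of the distance" is measured from the control mean toward the CRC mean, which is one possible reading of the text.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_density(score, pre_prev, pre_frac=0.25, crc_prev=0.007,
                    control_mean=-1.0, crc_mean=1.0, sd=1.0):
    """Three-subgroup score density: control, pre-cancer and CRC.

    The pre-cancer mean sits `pre_frac` of the way from the control mean
    to the CRC mean (assumed direction); all three share the spread `sd`.
    """
    pre_mean = control_mean + pre_frac * (crc_mean - control_mean)
    control_prev = 1.0 - crc_prev - pre_prev
    return (control_prev * normal_pdf(score, control_mean, sd)
            + pre_prev * normal_pdf(score, pre_mean, sd)
            + crc_prev * normal_pdf(score, crc_mean, sd))


def fit_pre_prevalence(observed, grid, candidates, pre_frac=0.25):
    """Pick the candidate pre-cancer prevalence with the lowest RMS error on `grid`."""
    def rms(pre_prev):
        model = [mixture_density(s, pre_prev, pre_frac) for s in grid]
        return math.sqrt(sum((o - m) ** 2 for o, m in zip(observed, model)) / len(grid))
    return min(candidates, key=rms)
```

A fuller search would also vary the shift and the position of the pre-cancer subgroup; a single-parameter grid is used here to keep the example short.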
The long-term results that we have accumulated confirm that the ColonSentry model represented the average-risk population with acceptable accuracy. While the 2-subgroup model is a minimal representation of the real-life surveillance population, its agreement with the observed population distribution was well within the pre-determined acceptable QC limits.
ColonSentry development began with a population of 9,199 patients recruited from multiple colorectal cancer surveillance clinics located in Canada and the US. Of these subjects, only 68 were subsequently diagnosed with colorectal cancer by colonoscopy and pathological analysis of the biopsy. This is equivalent to a prevalence of 0.74% (68/9,199), which is in agreement with sources such as US SEER. These data suggest that the study population is likely similar to the target US population.
Additional cancer samples were required to identify robust biomarkers, develop an algorithm and appropriately power statistical analysis. To achieve this, GeneNews began collecting additional samples from cancer clinics. All cancer cases were then carefully matched with subjects from the surveillance clinics for age, sex, BMI, ethnicity and cancer stage. The final cohort selected for the training set included 112 cancer cases matched to 120 pathology-free subjects. The model fitted to these data was then used to predict a test set with 202 cancer cases with a matching set of 208 control subjects.
The 7 genes included in the ColonSentry gene panel were initially selected based on microarray gene profiling of control and diseased patients [3,13]. At the time, it was unknown what role, if any, these genes had in colorectal cancer. In 2009, our analysis showed that ANXA3, CLEC4D, LMNB1, TNFAIP6, PRRG4 and VNN1 were upregulated and IL2RB was downregulated in patients with colorectal cancer, but no further information on those genes was available. Since then, 6 of the 7 ColonSentry biomarkers have been implicated in cancer, 5 of them specifically in colorectal cancer, supporting their use as robust biomarkers to predict colorectal cancer risk. Multiple groups have independently studied ColonSentry, or the biomarkers within it, and have validated our results [4,5,14,15].
Currently, ColonSentry can be used to identify patients at increased risk of CRC. Patients with a ColonSentry current risk score greater than or equal to 2 are advised to pursue further evaluation with recommended screening modalities such as colonoscopy. Additional studies are underway to identify biomarkers that can predict CRC earlier, at the advanced adenoma stage. Preliminary data suggest that the ColonSentry biomarkers may play a role in the detection of advanced adenoma. Studies are underway to determine how ColonSentry can be used, or redefined, to detect CRC at the advanced adenoma stage.
The original population model only included confirmed CRC cases and control cases which were confirmed free of CRC and polyps or advanced adenoma by colonoscopy and biopsy. The decision to exclude subjects with polyps or advanced adenoma was driven by the long wait for the pathology to be confirmed and the consequently small number for which confirmation was available by the time the cancer-branch development was nearing completion. Relative displacement between the model and actual results may be attributed to the error in the estimate based on the initial training set or analytical drifts over the period of several years.
ColonSentry can be considered for use as an adjunct method to colon cancer screening tests in non-compliant patient populations.
· Authors S.C., T.P., D.S. and J.Y. were employed by GeneNews Ltd. L.M. was employed by Innovative Diagnostic Laboratory. R.B. declares no competing interests.
· GeneNews developed and validated ColonSentry in Canada and it is commercially offered in the US by Innovative Diagnostic Laboratory. None of the authors have a financial interest in this product.
· Research presented here is covered by patent US8921074B2 where S.C. is listed as a co-inventor.
· S.C. and T.P. contributed equally to this work.
· This work was fully funded by GeneNews Ltd.
· The authors wish to thank Dr. Adam Dempsey and Dr. Joel Brill for their editorial assistance.
Figure 1: Expected LogReg Score Distribution for the ColonSentry Test. The distribution of LogReg scores is presented for both the non-colorectal cancer (non-CRC) and colorectal cancer (CRC) subgroups. The relative distribution of the non-CRC subgroup is indicated using the left vertical scale, while the CRC subgroup uses the right vertical scale. The bin size to determine the distribution was set to 0.1. Note that the secondary vertical axis for the CRC group is expanded 20X compared to the Control group.
Figure 2: Patient Age Distribution for ColonSentry Tests Performed at IDL. The distribution of patients’ age for the 95,139 ColonSentry tests performed at IDL separated by gender.
Figure 3: Comparison of LogReg Score Distribution of the ColonSentry Test between the Model and IDL Lab Results. The LogReg Score distribution was compared between the original model (red) and the IDL laboratory test results (blue) using the vertical scale on the left. The difference between the two distributions is presented as deviation (green) using the vertical scale on the right; the root mean square (RMS) error is shown in the legend as 0.051%. The bin size to determine the distribution was set to 0.1.
Figure 4: Optimization of IDL Lab Results. Shifting the LogReg score distribution of the IDL lab results (blue) by 0.1 units reduced the error (green) between the model (red) and the IDL lab results, lowering the RMS error from 0.051% to 0.033%. A 0.1 unit shift, which is within allowable QA tolerance, resulted in optimal overlap of the two distributions. The bin size to determine the distribution was set to 0.1.
Figure 5: Expected LogReg Score Distribution for a Three Sub-Group Cancer Risk Prediction Model. The distribution of LogReg scores is presented for the non-colorectal cancer (non-CRC), non-cancerous precursor stages of CRC (pre-CRC) and colorectal cancer (CRC) subgroups. The relative distribution of the non-CRC is indicated using the left vertical scale, while the pre-CRC and CRC use the right vertical scale. Note that the secondary vertical axis for CRC group is expanded 20X compared to the Control group to preserve the relative ordinal magnitudes of the Control, pre-CRC and CRC groups.
Figure 6: Comparison of LogReg Score Distribution of the Three Subgroup Model and the IDL Lab Results. The LogReg Score distribution was compared between the three subgroup model (red) and the IDL laboratory test results (blue) using the vertical scale on the left. The difference between the two distributions is presented as deviation (green) using the vertical scale on the right. The RMS error between the two distributions was determined to be 0.031%. The bin size to determine the distribution was set to 0.1.
2. US Preventive Services Task Force, Bibbins-Domingo K, Grossman DC, Curry SJ, Davidson KW, et al. (2016) Screening for Colorectal Cancer: US Preventive Services Task Force Recommendation Statement. JAMA 315: 2564-2575.
Citation: Chao S, Pilcz T, Stamatiou D, Ying J, Burakoff R, et al. (2019) Stability of The ColonSentry® Colon Cancer Risk Stratification Test. Int J Dis Markers: IJDM-101. DOI: 10.29011/IJDM-101.100001