Predicting the Stage of Colorectal Cancer Using Potential mRNA-Based Biomarker

Authors : Aman Goyal; Santushti Gandhi; Garima Arora

Volume/Issue : Volume 7 - 2022, Issue 12 - December


Objective: To use machine learning classification techniques to differentiate between early- and late-stage colorectal cancer based on individuals' mRNA profiles with clinically recorded data. Methods: A gene signature of 14 unique mRNAs was identified using a benchmark dataset extracted from The Cancer Genome Atlas (TCGA) [1]. The gene expression data were normalized using statistical methods, and the cancer stages were grouped into early and late stages. The most discriminative genes were selected using hypothesis testing. The gene set was then tested on an independent dataset obtained from the Gene Expression Omnibus (GEO) [2] under accession number GSE32323. A null hypothesis was also defined for the study. Results: Using an ensemble technique, a training accuracy of 75% and a ROC-AUC score of 0.81 were achieved on the TCGA dataset. With the same gene signature, a testing accuracy of 74% and a ROC-AUC score of 0.72 were achieved on the GEO dataset. The null hypothesis was rejected in favour of the alternative hypothesis. Conclusion: The results support the alternative hypothesis, and the study presents a set of 14 unique mRNAs that help predict the stage of colorectal cancer in an individual.

Keywords : Biomarker, Ensemble Technique, Hypothesis Testing, mRNA, Machine Learning.
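
The abstract describes selecting genes by hypothesis testing and scoring the result by ROC-AUC, without naming the exact test. The following is a minimal sketch of that general idea, not the authors' pipeline. It assumes Welch's t statistic as the test, ranks genes by its absolute value between early (0) and late (1) samples, and computes ROC-AUC as the fraction of correctly ordered (late, early) pairs. The gene names and expression values are synthetic and hypothetical, and a single-gene score stands in for the ensemble classifier, which is not reproduced here.

```python
import math
import random
from statistics import mean, variance


def welch_t(early, late):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(early) / len(early) + variance(late) / len(late))
    return (mean(late) - mean(early)) / se


def rank_genes(expr, stages, top_k):
    """Rank genes by |t| between early (0) and late (1) samples; return top_k names."""
    scores = {}
    for gene, values in expr.items():
        early = [v for v, s in zip(values, stages) if s == 0]
        late = [v for v, s in zip(values, stages) if s == 1]
        scores[gene] = abs(welch_t(early, late))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (late, early) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data, invented purely for illustration (not TCGA or GEO values).
rng = random.Random(0)
stages = [0] * 20 + [1] * 20
expr = {
    # Hypothetical gene whose expression is shifted upward in late-stage samples.
    "GENE_SHIFTED": [rng.gauss(0, 1) for _ in range(20)]
    + [rng.gauss(3, 1) for _ in range(20)],
    # Hypothetical genes with no stage-related difference.
    "GENE_NOISE_A": [rng.gauss(0, 1) for _ in range(40)],
    "GENE_NOISE_B": [rng.gauss(0, 1) for _ in range(40)],
}

selected = rank_genes(expr, stages, top_k=1)
auc = roc_auc(stages, expr[selected[0]])
print(selected, round(auc, 2))
```

In a study of this kind, the selection step would be fit on the training cohort only and the chosen genes then evaluated on the held-out cohort, so that the reported test score is not inflated by selecting on the test data.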

