A Machine Learned Classifier That Uses Gene Expression Data to Accurately Predict Estrogen Receptor Status |
| |
Authors: | Meysam Bastani, Larissa Vos, Nasimeh Asgarian, Jean Deschenes, Kathryn Graham, John Mackey, Russell Greiner |
| |
Affiliation: | 1. Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada; 2. Department of Oncology, University of Alberta, Edmonton, Alberta, Canada; 3. Alberta Innovates Centre for Machine Learning, Edmonton, Alberta, Canada; 4. Department of Pathology and Laboratory Medicine, University of Alberta, Edmonton, Alberta, Canada; 5. Cross Cancer Institute, Edmonton, Alberta, Canada; University of Glasgow, United Kingdom |
| |
Abstract: | Background: Selecting the appropriate treatment for breast cancer requires accurately determining the estrogen receptor (ER) status of the tumor. However, the standard method for determining this status, immunohistochemical (IHC) analysis of formalin-fixed, paraffin-embedded samples, suffers from numerous technical and reproducibility issues. Assessing ER status from RNA expression can provide more objective, quantitative, and reproducible test results. Methods: To learn a parsimonious RNA-based classifier of hormone receptor status, we applied a machine learning tool to a training dataset of gene expression microarray data obtained from 176 frozen breast tumors, whose ER status was determined by applying ASCO-CAP guidelines to standardized immunohistochemical testing of formalin-fixed tumor. Results: This produced a three-gene classifier that can predict the ER status of a novel tumor, with a cross-validation accuracy of 93.17 ± 2.44%. When applied to an independent validation set and to four other public databases, some on different platforms, this classifier obtained over 90% accuracy on each. In addition, we found that this prediction rule separated the patients' recurrence-free survival curves with a hazard ratio lower than the one based on the IHC analysis of ER status. Conclusions: Our efficient and parsimonious classifier lends itself to highly accurate, low-cost, high-throughput RNA-based assessment of ER status, suitable for routine clinical use. This analytic method provides a proof of principle that may be applicable to developing effective RNA-based tests for other biomarkers and conditions. |
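The abstract reports a cross-validation accuracy in the form mean ± spread but does not name the learning tool, the three genes, or how the ± term was computed. The following is only a minimal sketch, under stated assumptions, of one common way such a figure is produced: k-fold cross-validation of a small classifier, reporting the mean and sample standard deviation of per-fold accuracy. It deliberately substitutes a simple nearest-centroid classifier (not the authors' learner) and a synthetic, hypothetical three-gene cohort of 176 samples; the names `synthetic_cohort`, `nearest_centroid_fit`, and `k_fold_accuracy` are illustrative only.

```python
import random
import statistics


def nearest_centroid_fit(X, y):
    """Return the per-class mean expression vector (centroid) of the training samples."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def k_fold_accuracy(X, y, k=10, seed=0):
    """Return (mean, sample standard deviation) of per-fold test accuracy under k-fold CV."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, roughly equal-sized folds
    accs = []
    for test in folds:
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accs.append(correct / len(test))
    return statistics.mean(accs), statistics.stdev(accs)


def synthetic_cohort(n=176, seed=1):
    """Hypothetical 3-gene log-expression data; ER+ samples are shifted upward on every gene."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        er_pos = rng.random() < 0.7
        shift = 2.0 if er_pos else 0.0
        X.append([rng.gauss(shift, 1.0) for _ in range(3)])
        y.append("ER+" if er_pos else "ER-")
    return X, y


if __name__ == "__main__":
    X, y = synthetic_cohort()
    mean_acc, sd_acc = k_fold_accuracy(X, y)
    print(f"{100 * mean_acc:.2f} ± {100 * sd_acc:.2f}%")
```

Reporting the spread across folds is only one convention; the ± term could instead be a standard error or the spread over repeated cross-validation runs, and the abstract does not say which was used.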
| |
Keywords: | |