Multi-TGDR: A Regularization Method for Multi-Class Classification in Microarray Experiments |
| |
Authors: | Suyan Tian, Mayte Suárez-Fariñas |
| |
Affiliation: | 1. Division of Clinical Epidemiology, First Hospital of the Jilin University, Changchun, Jilin, China; 2. Center for Clinical and Translational Science, The Rockefeller University, New York, New York, United States of America; 3. Laboratory for Investigative Dermatology, The Rockefeller University, New York, New York, United States of America; University of Georgia, United States of America |
| |
Abstract: | Background: As microarray technology has matured and become widely used, selecting a small number of relevant genes for accurate classification of samples has become a prominent topic in biostatistics and bioinformatics. However, most existing algorithms cannot handle multiple classes, arguably a common application. Here, we propose an extension of an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), that specifically tackles multi-class classification of microarray data. When several microarray experiments address the same or similar objectives, one option is to use a meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers that share the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension did not offer a way to make predictions on independent samples. We therefore propose an explicit method for estimating the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows the predictive performance of Meta-TGDR and TGDR to be compared on an independent test set. Results: Using real-world applications, we demonstrated that the proposed multi-TGDR framework works well and that the number of genes it selects is smaller than the total selected by all the individual binary TGDRs. Additionally, Meta-TGDR and TGDR applied to the batch-effect-adjusted pooled data gave approximately the same results. Adding a bagging procedure in each application ensured stability and good predictive performance. Conclusions: Compared with Meta-TGDR, TGDR requires less computing time and does not require samples of all classes in each study. On the adjusted data, its predictive performance is approximately the same as that of Meta-TGDR. Thus, it is highly recommended. |
| |
Keywords: | |
|
|