Unsupervised feature selection via two-way ordering in gene expression analysis |
| |
Authors: | Ding Chris H Q |
| |
Affiliation: | NERSC Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA. chqding@lbl.gov |
| |
Abstract: | MOTIVATION: Selecting the genes most relevant and informative for certain phenotypes is an important aspect of gene expression analysis. Most current methods select genes based on known phenotype information. However, certain sets of genes may correspond to new phenotypes that are as yet unknown, and it is important to develop effective selection methods that can discover them without using any prior phenotype information. RESULTS: We propose and study a new method to select relevant genes based on their similarity information only. The method relies on a mechanism for discarding irrelevant genes. A two-way ordering of gene expression data can force irrelevant genes towards the middle of the ordering, where they can be discarded. Mechanisms based on variance and principal component analysis are also studied. When applied to expression profiles of colon cancer and leukemia, the unsupervised method outperforms the baseline algorithm that simply uses all genes, and it also selects relevant genes close to those selected using supervised methods. SUPPLEMENT: More results and software are online: http://www.nersc.gov/~cding/2way. |
| |
Keywords: | |
This article is indexed in PubMed, Oxford, and other databases. |
|
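
The abstract names variance as one of the mechanisms studied for discarding irrelevant genes, but it does not say how the variance criterion is applied. As a rough illustration only, the sketch below assumes the simplest reading: rank genes by the variance of their expression across samples and keep the top few, using no phenotype labels. The function name `select_genes_by_variance`, the `keep` parameter, and the toy matrix are all hypothetical; this is not the paper's two-way ordering method.

```python
from statistics import pvariance


def select_genes_by_variance(expression, keep):
    """Return the indices of the `keep` genes with the largest variance.

    `expression` is a list of rows, one row per gene and one column per
    sample. No phenotype labels are used, so the selection is unsupervised.
    This is an assumed, simplified criterion, not the paper's algorithm.
    """
    if not 0 < keep <= len(expression):
        raise ValueError("keep must be between 1 and the number of genes")
    # Population variance of each gene's expression across all samples.
    variances = [pvariance(row) for row in expression]
    # Rank gene indices from highest to lowest variance.
    ranked = sorted(range(len(expression)),
                    key=lambda g: variances[g], reverse=True)
    # Report the kept genes in their original order.
    return sorted(ranked[:keep])


# Toy data, made up for illustration: 4 genes x 4 samples.
toy = [
    [5.0, 5.1, 4.9, 5.0],  # nearly constant across samples
    [1.0, 9.0, 1.5, 8.5],  # varies strongly
    [3.0, 3.0, 3.0, 3.0],  # constant
    [2.0, 7.0, 2.5, 6.0],  # varies moderately
]
selected = select_genes_by_variance(toy, keep=2)
print(selected)
```

On the toy matrix, the two nearly constant genes are dropped and the two genes whose expression changes across samples are kept. A variance filter of this kind only measures how much a gene varies, not whether genes vary together, which is one reason the abstract treats similarity-based ordering as a separate mechanism.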