期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Minimum redundancy feature selection from microarray gene expression data 总被引：7，自引：0，他引：7

Ding C Peng H 《Journal of bioinformatics and computational biology》2005,3(2):185-205

How to selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naive Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines. SUPPLIMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/~cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/. 相似文献

2.

A survey on filter techniques for feature selection in gene expression microarray analysis 总被引：1，自引：0，他引：1

Lazar C Taminau J Meganck S Steenhoff D Coletta A Molter C de Schaetzen V Duque R Bersini H Nowé A 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(4):1106-1119

A plenitude of feature selection (FS) methods is available in the literature, most of them rising as a need to analyze data of very high dimension, usually hundreds or thousands of variables. Such data sets are now available in various application areas like combinatorial chemistry, text mining, multivariate imaging, or bioinformatics. As a general accepted rule, these methods are grouped in filters, wrappers, and embedded methods. More recently, a new group of methods has been added in the general framework of FS: ensemble techniques. The focus in this survey is on filter feature selection methods for informative feature discovery in gene expression microarray (GEM) analysis, which is also known as differentially expressed genes (DEGs) discovery, gene prioritization, or biomarker discovery. We present them in a unified framework, using standardized notations in order to reveal their technical details and to highlight their common characteristics as well as their particularities. 相似文献

3.

Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data 总被引：2，自引：0，他引：2

Lepre J Rice JJ Tu Y Stolovitzky G 《Bioinformatics (Oxford, England)》2004,20(7):1033-1044

MOTIVATION: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissues types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. RESULTS: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. AVAILABILITY: Our set of pattern discovery algorithms including a user interface is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen). 相似文献

4.

Effective feature selection framework for cluster analysis of microarray data

Pok G Liu JC Ryu KH 《Bioinformation》2010,4(8):385-389

The microarray technique has become a standard means in simultaneously examining expression of all genes measured in different circumstances. As microarray data are typically characterized by high dimensional features with a small number of samples, feature selection needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and accountable for the sample variation. In this article, we present a simple, yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results. 相似文献

5.

Robust feature selection for microarray data based on multicriterion fusion 总被引：1，自引：0，他引：1

Yang F Mao KZ 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(4):1080-1092

Feature selection often aims to select a compact feature subset to build a pattern classifier with reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing stable or robust solution is also a desired property of a feature selection algorithm. However, the issue of robustness is often overlooked in feature selection. In this study, we analyze the robustness issue existing in feature selection for high-dimensional and small-sized gene-expression data, and propose to improve robustness of feature selection algorithm by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both classification performance and stability of feature selection results. Experimental studies on five gene-expression data sets show that the MCF-RFE algorithm outperforms the commonly used benchmark feature selection algorithm SVM-RFE. 相似文献

6.

A gene expression bar code for microarray data

Zilliox MJ Irizarry RA 《Nature methods》2007,4(11):911-913

The ability to measure genome-wide expression holds great promise for characterizing cells and distinguishing diseased from normal tissues. Thus far, microarray technology has been useful only for measuring relative expression between two or more samples, which has handicapped its ability to classify tissue types. Here we present a method that can successfully predict tissue type based on data from a single hybridization. A preliminary web-tool is available online (http://rafalab.jhsph.edu/barcode/). 相似文献

7.

A stable gene selection in microarray data analysis

Kun Yang Zhipeng Cai Jianzhong Li Guohui Lin 《BMC bioinformatics》2006,7(1):228-16

Background

Microarray data analysis is notorious for involving a huge number of genes compared to a relatively small number of samples. Gene selection is to detect the most significantly differentially expressed genes under different conditions, and it has been a central research focus. In general, a better gene selection method can improve the performance of classification significantly. One of the difficulties in gene selection is that the numbers of samples under different conditions vary a lot. 相似文献

8.

Development of two-stage SVM-RFE gene selection strategy for microarray expression data analysis 总被引：1，自引：0，他引：1

Tang Y Zhang YQ Huang Z 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(3):365-381

Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Though many algorithms have been developed, the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. It assumes that a smaller "filter-out" factor in the SVM-RFE, which results in a smaller number of gene features eliminated in each recursion, should lead to extraction of a better gene subset. Because the SVM-RFE is highly sensitive to the "filter-out" factor, our simulations have shown that this assumption is not always correct and that the SVM-RFE is an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and other applications, a new two-stage SVM-RFE algorithm has been developed. It is designed to effectively eliminate most of the irrelevant, redundant and noisy genes while keeping information loss small at the first stage. A fine selection for the final gene subset is then performed at the second stage. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE to achieve better algorithm utility. We have demonstrated that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods based on our analysis of three publicly available microarray expression datasets. Furthermore, the two-stage SVM-RFE is computationally efficient because its time complexity is O(d*log(2)d}, where d is the size of the original gene set. 相似文献

9.

Regularization network-based gene selection for microarray data analysis

Zhou X Mao KZ 《International journal of neural systems》2006,16(5):341-352

Microarray data contains a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This presents problems to discriminant analysis of microarray data. One way to alleviate the problem is to reduce dimensionality of data by selecting important genes to the discriminant problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods. The wrapper method outperforms the filter method but at the cost of more intensive computation. In the present study, we proposed a wrapper-like gene selection algorithm based on the Regularization Network. Compared with classical wrapper method, the computational costs in our gene selection algorithm is significantly reduced, because the evaluation criterion we proposed does not demand repeated training in the leave-one-out procedure. 相似文献

10.

A combinational feature selection and ensemble neural network method for classification of gene expression data 总被引：1，自引：0，他引：1

Bing?Liu Qinghua?Cui Tianzi?Jiang Email author Songde?Ma 《BMC bioinformatics》2004,5(1):136

Background

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic for a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies also have been generally used and compared. However, most published articles on tumor classification have applied a certain technique to a certain dataset, and recently several researchers compared these techniques based on several public datasets. But, it has been verified that differently selected features reflect different aspects of the dataset and some selected features can obtain better solutions on some certain problems. At the same time, faced with a large amount of microarray data with little knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we attempt to introduce a combinational feature selection method in conjunction with ensemble neural networks to generally improve the accuracy and robustness of sample classification. 相似文献

11.

Cluster-Rasch models for microarray gene expression data 总被引：1，自引：0，他引：1

Li H Hong F 《Genome biology》2001,2(8):research0031.1-research003113

Background

We propose two different formulations of the Rasch statistical models to the problem of relating gene expression profiles to the phenotypes. One formulation allows us to investigate whether a cluster of genes with similar expression profiles is related to the observed phenotypes; this model can also be used for future prediction. The other formulation provides an alternative way of identifying genes that are over- or underexpressed from their expression levels in tissue or cell samples of a given tissue or cell type.

Results

We illustrate the methods on available datasets of a classification of acute leukemias and of 60 cancer cell lines. For tumor classification, the results are comparable to those previously obtained. For the cancer cell lines dataset, we found four clusters of genes that are related to drug response for many of the 90 drugs that we considered. In addition, for each type of cell line, we identified genes that are over- or underexpressed relative to other genes.

Conclusions

The cluster-Rasch model provides a probabilistic model for describing gene expression patterns across samples and can be used to relate gene expression profiles to phenotypes. 相似文献

12.

Clustering methods for microarray gene expression data 总被引：1，自引：0，他引：1

Belacel N Wang Q Cuperlovic-Culf M 《Omics : a journal of integrative biology》2006,10(4):507-531

Within the field of genomics, microarray technologies have become a powerful technique for simultaneously monitoring the expression patterns of thousands of genes under different sets of conditions. A main task now is to propose analytical methods to identify groups of genes that manifest similar expression patterns and are activated by similar conditions. The corresponding analysis problem is to cluster multi-condition gene expression data. The purpose of this paper is to present a general view of clustering techniques used in microarray gene expression data analysis. 相似文献

13.

Recursive SVM feature selection and sample classification for mass-spectrometry and microarray data 总被引：3，自引：0，他引：3

Xuegong Zhang Xin Lu Qian Shi Xiu-qin Xu Hon-chiu E Leung Lyndsay N Harris James D Iglehart Alexander Miron Jun S Liu Wing H Wong 《BMC bioinformatics》2006,7(1):197-13

Background

Like microarray-based investigations, high-throughput proteomics techniques require machine learning algorithms to identify biomarkers that are informative for biological classification problems. Feature selection and classification algorithms need to be robust to noise and outliers in the data. 相似文献

14.

LS Bound based gene selection for DNA microarray data

Zhou X Mao KZ 《Bioinformatics (Oxford, England)》2005,21(8):1559-1564

MOTIVATION: One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. RESULTS: We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. AVAILABILITY: A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). CONTACT: ekzmao@ntu.edu.sg. 相似文献

15.

Random forest for gene selection and microarray data classification

Moorthy K Mohamad MS 《Bioinformation》2011,7(3):142-146

A random forest method has been selected to perform both gene selection and classification of the microarray data. In this embedded method, the selection of smallest possible sets of genes with lowest error rates is the key factor in achieving highest classification accuracy. Hence, improved gene selection method using random forest has been proposed to obtain the smallest subset of genes as well as biggest subset of genes prior to classification. The option for biggest subset selection is done to assist researchers who intend to use the informative genes for further research. Enhanced random forest gene selection has performed better in terms of selecting the smallest subset as well as biggest subset of informative genes with lowest out of bag error rates through gene selection. Furthermore, the classification performed on the selected subset of genes using random forest has lead to lower prediction error rates compared to existing method and other similar available methods. 相似文献

16.

基于小波高频系数基因芯片数据的特征提取

刘玉杰刘毅慧《生物信息学》2011,9(4):339-343

结合小波分析理论与支持向量机理论,构造分类器模型,将前列腺癌基因芯片数据分成癌症和正常两种。本文着重研究小波高频系数基因芯片数据的特征提取,并通过实验对比小波高频系数和低频系数特征提取对分类器性能的影响。其中haar小波3层分解提取高频系数,送入分类器分类后,得到的正确分类率为93.31%。db1小波4层分解提取低频系数,送入分类器分类后,得到的正确分类率为93.53%。小波低频系数特征提取分类效果总体上好于高频系数,分类器性能稳定。相似文献

17.

Visualization of microarray gene expression data

下载免费PDF全文

Prasad TV Ahson SI 《Bioinformation》2006,1(4):141-145

Microarray gene expression data is used in various biological and medical investigations. Processing of gene expression data requires algorithms in data mining, process automation and knowledge discovery. Available data mining algorithms exploits various visualization techniques. Here, we describe the merits and demerits of various visualization parameters used in gene expression analysis. 相似文献

18.

GEPAS: A web-based resource for microarray gene expression data analysis

下载免费PDF全文

Herrero J Al-Shahrour F Díaz-Uriarte R Mateos A Vaquerizas JM Santoyo J Dopazo J 《Nucleic acids research》2003,31(13):3461-3467

We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es). 相似文献

19.

FiGS: a filter-based gene selection workbench for microarray data

Taeho Hwang Choong-Hyun Sun Taegyun Yun Gwan-Su Yi 《BMC bioinformatics》2010,11(1):50

Background

The selection of genes that discriminate disease classes from microarray data is widely used for the identification of diagnostic biomarkers. Although various gene selection methods are currently available and some of them have shown excellent performance, no single method can retain the best performance for all types of microarray datasets. It is desirable to use a comparative approach to find the best gene selection result after rigorous test of different methodological strategies for a given microarray dataset. 相似文献

20.

A temporal precedence based clustering method for gene expression microarray data

Ritesh Krishna Chang-Tsun Li Vicky Buchanan-Wollaston 《BMC bioinformatics》2010,11(1):68

Background

Time-course microarray experiments can produce useful data which can help in understanding the underlying dynamics of the system. Clustering is an important stage in microarray data analysis where the data is grouped together according to certain characteristics. The majority of clustering techniques are based on distance or visual similarity measures which may not be suitable for clustering of temporal microarray data where the sequential nature of time is important. We present a Granger causality based technique to cluster temporal microarray gene expression data, which measures the interdependence between two time-series by statistically testing if one time-series can be used for forecasting the other time-series or not. 相似文献