1.

A new regularized least squares support vector regression for gene selection

Pei-Chun Chen Su-Yun Huang Wei J Chen Chuhsing K Hsiao 《BMC bioinformatics》2009,10(1):44

Background

Selection of influential genes with microarray data often faces the difficulties of a large number of genes and a relatively small group of subjects. In addition to the curse of dimensionality, many gene selection methods weight the contribution from each individual subject equally. This equal-contribution assumption cannot account for the possible dependence among subjects who associate similarly to the disease, and may restrict the selection of influential genes.

2.

Gene selection based on multi-class support vector machines and genetic algorithms

Souza BF Carvalho AP 《Genetics and molecular research : GMR》2005,4(3):599-607

Microarrays are a new technology that allows biologists to better understand the interactions between diverse pathologic states at the gene level. However, the amount of data generated by these tools becomes problematic, even though data are supposed to be automatically analyzed (e.g., for diagnostic purposes). The issue becomes more complex when the expression data involve multiple states. We present a novel approach to the gene selection problem in multi-class gene expression-based cancer classification, which combines support vector machines and genetic algorithms. This new method is able to select small subsets and still improve the classification accuracy.

3.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (total citations: 1; self-citations: 0; citations by others: 1)

Huang HL Chang FL 《Bio Systems》2007,90(2):516-528

An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer a mechanism for automatically detecting relevant features internally, and the appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. ESVM is shown to achieve a high average accuracy of 96.88% with a small number of selected genes (10.0 on average) using 10-fold cross-validation across the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.

4.

Gene selection for classification of microarray data based on the Bayes error

Zhang JG Deng HW 《BMC bioinformatics》2007,8(1):370

Background

With DNA microarray data, selecting a compact subset of discriminative genes from thousands of genes is a critical step for accurate classification of phenotypes for, e.g., disease diagnosis. Several widely used gene selection methods often select top-ranked genes according to their individual discriminative power in classifying samples into distinct categories, without considering correlations among genes. A limitation of these gene selection methods is that they may result in gene sets with some redundancy and yield an unnecessarily large number of candidate genes for classification analyses. Some recent studies show that incorporating gene-to-gene correlations into gene selection can remove redundant genes and improve classification accuracy.

5.

Partial least squares regression, support vector machine regression, and transcriptome-based distances for prediction of maize hybrid performance with gene expression data

Fu J Falke KC Thiemann A Schrag TA Melchinger AE Scholten S Frisch M 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2012,124(5):825-833

6.

SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data

Pirooznia M Deng Y 《BMC bioinformatics》2006,7(Z4):S25

7.

Tumor classification by partial least squares using microarray gene expression data (total citations: 30; self-citations: 0; citations by others: 30)

Nguyen DV Rocke DM 《Bioinformatics (Oxford, England)》2002,18(1):39-50

MOTIVATION: One important application of gene expression microarray data is classification of samples into categories, such as the type of tumor. The use of microarrays allows simultaneous monitoring of thousands of gene expressions per sample. This ability to measure gene expression en masse has resulted in data with the number of variables p (genes) far exceeding the number of samples N. Standard statistical methodologies in classification and prediction do not work well or even at all when N < p. Modification of existing statistical methodologies or development of new methodologies is needed for the analysis of microarray data. RESULTS: We propose a novel analysis procedure for classifying (predicting) human tumor samples based on microarray gene expressions. This procedure involves dimension reduction using Partial Least Squares (PLS) and classification using Logistic Discrimination (LD) and Quadratic Discriminant Analysis (QDA). We compare PLS to the well known dimension reduction method of Principal Components Analysis (PCA). Under many circumstances PLS proves superior; we illustrate a condition when PCA particularly fails to predict well relative to PLS. The proposed methods were applied to five different microarray data sets involving various human tumor samples: (1) normal versus ovarian tumor; (2) Acute Myeloid Leukemia (AML) versus Acute Lymphoblastic Leukemia (ALL); (3) Diffuse Large B-cell Lymphoma (DLBCLL) versus B-cell Chronic Lymphocytic Leukemia (BCLL); (4) normal versus colon tumor; and (5) Non-Small-Cell-Lung-Carcinoma (NSCLC) versus renal samples. Stability of classification results and methods were further assessed by re-randomization studies.
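As a rough illustration of the dimension-reduction step this abstract describes, the sketch below computes the first PLS component for a toy expression matrix with fewer samples than genes. The function name, the toy data, and the 0/1 label coding are assumptions made for illustration, not the authors' procedure; the LD and QDA classification steps that follow in the paper are not shown.

```python
import math

def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS component after centering."""
    n, p = len(X), len(X[0])
    # Center each gene (column) and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # The weight vector is proportional to X^T y: each gene's covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: one number per sample, summarizing all p gene values.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t

# Hypothetical toy data: 4 samples, 6 genes (N < p); labels coded 0/1.
X = [
    [2.0, 0.1, 1.0, 5.0, 3.0, 0.5],
    [2.2, 0.0, 1.1, 4.8, 3.1, 0.4],
    [0.5, 0.2, 1.0, 1.0, 3.0, 0.6],
    [0.4, 0.1, 0.9, 1.2, 2.9, 0.5],
]
y = [1, 1, 0, 0]
w, t = first_pls_component(X, y)
```

Unlike a principal component, the weight vector here is driven by each gene's covariance with the class label, so the resulting scores separate the two toy classes by sign. Further components would require deflating `X` before repeating the step.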

8.

Kernelized partial least squares for feature reduction and classification of gene microarray data

WH Land X Qiao DE Margolis WS Ford CT Paquette JF Perez-Rogers JA Borgia JY Yang Y Deng 《BMC systems biology》2011,5(Z3):S13

9.

Partial least squares proportional hazard regression for application to DNA microarray survival data (total citations: 3; self-citations: 0; citations by others: 3)

Nguyen DV Rocke DM 《Bioinformatics (Oxford, England)》2002,18(12):1625-1632

10.

Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis (total citations: 3; self-citations: 0; citations by others: 3)

Kim H Park H 《Bioinformatics (Oxford, England)》2007,23(12):1495-1502

MOTIVATION: Many practical pattern recognition problems require non-negativity constraints. For example, pixels in digital images and chemical concentrations in bioinformatics are non-negative. Sparse non-negative matrix factorizations (NMFs) are useful when the degree of sparseness in the non-negative basis matrix or the non-negative coefficient matrix in an NMF needs to be controlled in approximating high-dimensional data in a lower dimensional space. RESULTS: In this article, we introduce a novel formulation of sparse NMF and show how the new formulation leads to a convergent sparse NMF algorithm via alternating non-negativity-constrained least squares. We apply our sparse NMF algorithm to cancer-class discovery and gene expression data analysis and offer biological analysis of the results obtained. Our experimental results illustrate that the proposed sparse NMF algorithm often achieves better clustering performance with shorter computing time compared to other existing NMF algorithms. AVAILABILITY: The software is available as supplementary material.

11.

Hybrid huberized support vector machines for microarray classification and gene selection (total citations: 1; self-citations: 0; citations by others: 1)

Wang L Zhu J Zou H 《Bioinformatics (Oxford, England)》2008,24(3):412-419

MOTIVATION: The standard L(2)-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select relevant genes for the classification. The L(1)-norm SVM is a variant of the standard L(2)-norm SVM that constrains the L(1)-norm of the fitted coefficients. Due to the singularity of the L(1)-norm, the L(1)-norm SVM has the property of automatically selecting relevant genes. On the other hand, the L(1)-norm SVM has two drawbacks: (1) the number of selected genes is upper bounded by the size of the training data; (2) when there are several highly correlated genes, the L(1)-norm SVM tends to pick only a few of them, and remove the rest. RESULTS: We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L(1)-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L(1)-norm SVM, especially when variables are highly correlated. AVAILABILITY: R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/.
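The abstract names the two ingredients of the HHSVM objective without writing them out. The sketch below shows one common way to express them; the piecewise form of the huberized hinge loss, the parameter names `delta`, `lam1`, and `lam2`, and the use of a plain sum over samples are assumptions for illustration and may differ from the paper's exact parameterization. It only evaluates the objective and does not implement the solution-path algorithm.

```python
def huberized_hinge(margin, delta=2.0):
    """Zero beyond the margin, quadratic just inside it, linear far below it."""
    if margin > 1.0:
        return 0.0
    if margin > 1.0 - delta:
        return (1.0 - margin) ** 2 / (2.0 * delta)
    return 1.0 - margin - delta / 2.0

def elastic_net_penalty(beta, lam1, lam2):
    """L1 term (sparsity) plus squared L2 term (grouping of correlated genes)."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam1 * l1 + 0.5 * lam2 * l2

def hhsvm_objective(X, y, beta0, beta, lam1, lam2, delta=2.0):
    """Total huberized hinge loss over samples plus the elastic-net penalty.

    Labels in y are assumed to be coded as -1 or +1.
    """
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (beta0 + sum(b * v for b, v in zip(beta, xi)))
        loss += huberized_hinge(margin, delta)
    return loss + elastic_net_penalty(beta, lam1, lam2)
```

The quadratic segment makes the loss differentiable everywhere, while the L2 part of the penalty is what encourages correlated genes to enter or leave the model together.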

12.

Missing value estimation for DNA microarray gene expression data: local least squares imputation (total citations: 9; self-citations: 0; citations by others: 9)

Kim H Golub GH Park H 《Bioinformatics (Oxford, England)》2005,21(2):187-198

MOTIVATION: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as the least squares optimization process. RESULTS: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method for LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. AVAILABILITY: The software is available at http://www.cs.umn.edu/~hskim/tools.html CONTACT: hpark@cs.umn.edu
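The core idea in this abstract, writing a target gene as a linear combination of its k most correlated genes, can be sketched as follows. This is a simplified illustration under stated assumptions: the target has exactly one missing entry (marked `None`), the candidate genes are complete, k does not exceed the number of observed positions, and the selected genes are linearly independent so that the normal equations can be solved directly. The helper names are hypothetical, and the automatic k-value estimator is not shown.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][j] * x[j] for j in range(r + 1, n))) / A[r][r]
    return x

def lls_impute(target, complete_genes, k):
    """Estimate the single missing entry (None) of target from k similar genes."""
    miss = target.index(None)
    obs = [j for j in range(len(target)) if j != miss]
    w = [target[j] for j in obs]
    # Keep the k genes with the largest |Pearson correlation| on observed positions.
    ranked = sorted(complete_genes,
                    key=lambda g: abs(pearson([g[j] for j in obs], w)),
                    reverse=True)[:k]
    A = [[g[j] for j in obs] for g in ranked]
    # Normal equations (A A^T) x = A w for the problem min ||A^T x - w||.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]
    rhs = [sum(a * b for a, b in zip(r, w)) for r in A]
    x = solve(gram, rhs)
    # Apply the same combination to the similar genes at the missing position.
    return sum(xi * g[miss] for xi, g in zip(x, ranked))

# Hypothetical toy data: the target is twice the first gene, last value missing.
genes = [[1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 1.0, 4.0, 2.0, 3.0], [2.0, 2.0, 1.0, 3.0, 9.0]]
target = [2.0, 4.0, 6.0, 8.0, None]
estimate = lls_impute(target, genes, k=2)
```

On this toy input the estimate comes out close to 10, because the target is an exact multiple of the first gene. Real expression data would call for a more robust least squares solver than the hand-written elimination used here.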

13.

Missing value estimation for DNA microarray gene expression data: local least squares imputation

Kim Hyunsoo; Golub Gene H.; Park Haesun 《Bioinformatics (Oxford, England)》2006,22(11):1410-1411

In our article, only a set of random positions of missing values was used for each dataset. However, imputation methods may

14.

Partial least squares dimension reduction for microarray gene expression data with a censored response

Nguyen DV 《Mathematical biosciences》2005,193(1):119-137

15.

Gene selection in microarray data: the elephant, the blind men and our algorithms (total citations: 1; self-citations: 0; citations by others: 1)

Stolovitzky G 《Current opinion in structural biology》2003,13(3):370-376

Gene expression array data provide shadows of intricate cellular processes. Learning how to make the most of the information present in expression arrays has become a discipline in itself. In recent years, there has been an explosion of methods that analyze gene expression arrays to produce long lists of genes that express differentially in distinct cellular states. These lists will have to be organized, and the algorithms that produced them combined, if we wish to piece together the rich cellular structures probed by this high-throughput technology. Researchers will have to understand the benefits and limitations of the many existing methods to produce the combination of algorithms that best suits their gene expression experiments.

16.

Stable feature selection based on the ensemble L₁-norm support vector machine for biomarker discovery

Moon Myungjin Nakai Kenta 《BMC genomics》2016,17(13):65-74

Background

Lately, biomarker discovery has become one of the most significant research issues in the biomedical field. Owing to the presence of high-throughput technologies, genomic data, such as microarray data and RNA-seq, have become widely available. Many kinds of feature selection techniques have been applied to retrieve significant biomarkers from these kinds of data. However, they tend to be noisy with high-dimensional features and consist of a small number of samples; thus, conventional feature selection approaches might be problematic in terms of reproducibility.

Results

In this article, we propose a stable feature selection method for high-dimensional datasets. We apply an ensemble L₁-norm support vector machine to efficiently reduce irrelevant features, considering the stability of features. We define the stability score for each feature by aggregating the ensemble results, and utilize backward feature elimination on a purified feature set based on this score; therefore, it is possible to acquire an optimal set of features for performance without the need to set a specific threshold. The proposed methodology is evaluated by classifying the binary stage of renal clear cell carcinoma with RNA-seq data.

Conclusion

A comparison with established algorithms, i.e., a fast correlation-based filter, random forest, and an ensemble version of an L₂-norm support vector machine-based recursive feature elimination, enabled us to prove the superior performance of our method in terms of classification as well as stability in general. It is also shown that the proposed approach performs moderately on high-dimensional datasets consisting of a very large number of features and a smaller number of samples. The proposed approach is expected to be applicable to many other studies aimed at biomarker discovery.
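The abstract does not spell out how the stability score is computed. One simple reading, assumed here purely for illustration, is the fraction of ensemble members whose L₁-norm SVM kept a given feature. The sketch below takes the per-member selections as given (no SVM is trained), and the function names and feature labels are hypothetical.

```python
from collections import Counter

def stability_scores(selected_sets):
    """Fraction of ensemble members that selected each feature."""
    counts = Counter(f for s in selected_sets for f in set(s))
    m = len(selected_sets)
    return {f: c / m for f, c in counts.items()}

def rank_by_stability(selected_sets):
    """Features ordered from most to least stable; ties broken by name."""
    scores = stability_scores(selected_sets)
    return sorted(scores, key=lambda f: (-scores[f], f))

# Hypothetical selections from four ensemble members (e.g., bootstrap resamples).
runs = [{"g1", "g2", "g7"}, {"g1", "g7"}, {"g1", "g3"}, {"g1", "g2", "g7"}]
scores = stability_scores(runs)
order = rank_by_stability(runs)
```

A ranking like `order` could then drive backward elimination, dropping the least stable feature at each step and keeping the subset with the best validation performance, which is consistent with the abstract's remark that no fixed threshold is needed.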

17.

Iterated local least squares microarray missing value imputation

Cai Z Heydari M Lin G 《Journal of bioinformatics and computational biology》2006,4(5):935-957

Microarray gene expression data often contain multiple missing values for various reasons. However, most gene expression data analysis algorithms require complete expression data. Therefore, accurate estimation of the missing values is critical to further data analysis. In this paper, an Iterated Local Least Squares Imputation (ILLSimpute) method is proposed for estimating missing values. The ILLSimpute method has two distinguishing features. First, it does not fix a common number of coherent genes for all target genes, but defines coherent genes as those within a distance threshold of the target gene. Second, the values estimated in one iteration are used for missing value estimation in the next iteration, and the method terminates after a certain number of iterations or when the imputed values converge. Experimental results on six real microarray datasets showed that the ILLSimpute method performed at least as well as, and most of the time much better than, the five most recent imputation methods.

18.

Robust feature selection for microarray data based on multicriterion fusion (total citations: 1; self-citations: 0; citations by others: 1)

Yang F Mao KZ 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(4):1080-1092

Feature selection often aims to select a compact feature subset to build a pattern classifier with reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing a stable or robust solution is also a desired property of a feature selection algorithm. However, the issue of robustness is often overlooked in feature selection. In this study, we analyze the robustness issue in feature selection for high-dimensional and small-sized gene-expression data, and propose to improve the robustness of feature selection algorithms by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both classification performance and stability of feature selection results. Experimental studies on five gene-expression data sets show that the MCF-RFE algorithm outperforms the commonly used benchmark feature selection algorithm SVM-RFE.

19.

Gene/protein name recognition based on support vector machine using dictionary as features

Mitsumori T Fation S Murata M Doi K Doi H 《BMC bioinformatics》2005,6(Z1):S8

20.

Reconstruction of genetic association networks from microarray data: a partial least squares approach

Pihur V Datta S Datta S 《Bioinformatics (Oxford, England)》2008,24(4):561-568

MOTIVATION: Gene association/interaction networks provide vast amounts of information about essential processes inside the cell. A complete picture of gene-gene associations/interactions would open new horizons for biologists, ranging from pure appreciation to successful manipulation of biological pathways for therapeutic purposes. Therefore, identification of important biological complexes whose members (genes and their protein products) interact with each other is of prime importance. Numerous experimental methods exist but, for the most part, they are costly and labor intensive. Computational techniques, such as the one proposed in this work, provide a quick 'budget' solution that can be used as a screening tool before more expensive techniques are attempted. Here, we introduce a novel computational method based on the partial least squares (PLS) regression technique for reconstruction of genetic networks from microarray data. RESULTS: The proposed PLS method is shown to be an effective screening procedure for the detection of gene-gene interactions from microarray data. Both simulated and real microarray experiments show that the PLS-based approach is superior to its competitors both in terms of performance and applicability. AVAILABILITY: R code is available from the supplementary web-site whose URL is given below.