
1.

Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines (total citations: 11; self-citations: 0; citations by others: 11)

Peng S Xu Q Ling XB Peng X Du W Chen L 《FEBS letters》2003,555(2):358-362

Simultaneous multiclass classification of tumor types is essential for future clinical implementations of microarray-based cancer diagnosis. In this study, we have combined genetic algorithms (GAs) and all-paired support vector machines (SVMs) for multiclass cancer identification. The predictive features have been selected through iterative SVMs/GAs and recursive feature elimination post-processing steps, leading to a very compact cancer-related predictive gene set. Leave-one-out cross-validations yielded accuracies of 87.93% for the eight-class and 85.19% for the fourteen-class cancer classifications, outperforming the results derived from previously published methods.
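The all-paired scheme mentioned in this abstract trains one binary classifier per pair of classes and combines their outputs. A minimal sketch of the combination step follows, assuming simple majority voting; the abstract does not state the voting rule, and the toy nearest-center classifiers and the names `all_pairs`, `predict_by_voting`, and `toy_centers` are hypothetical stand-ins for trained SVMs.

```python
from collections import Counter
from itertools import combinations


def all_pairs(classes):
    """Return every unordered pair of class labels (one binary classifier each)."""
    return list(combinations(classes, 2))


def predict_by_voting(sample, classes, binary_classifiers):
    """Majority vote over all pairwise classifiers.

    binary_classifiers maps a pair (a, b) to a function that returns
    either a or b for a sample. Ties go to the class listed first.
    """
    votes = Counter()
    for pair in all_pairs(classes):
        votes[binary_classifiers[pair](sample)] += 1
    return max(classes, key=lambda c: votes[c])


# Toy stand-in for trained SVMs (an assumption for illustration only):
# each pairwise classifier picks the class whose 1-D center is nearer.
toy_centers = {"A": 0.0, "B": 5.0, "C": 10.0}
toy_classes = list(toy_centers)
toy_classifiers = {
    (a, b): (lambda x, a=a, b=b:
             a if abs(x - toy_centers[a]) <= abs(x - toy_centers[b]) else b)
    for a, b in all_pairs(toy_classes)
}

predicted = predict_by_voting(4.0, toy_classes, toy_classifiers)
```

With k classes the scheme needs k(k-1)/2 binary classifiers, so 28 for the eight-class problem and 91 for the fourteen-class problem.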

2.

Hybrid huberized support vector machines for microarray classification and gene selection (total citations: 1; self-citations: 0; citations by others: 1)

Wang L Zhu J Zou H 《Bioinformatics (Oxford, England)》2008,24(3):412-419

MOTIVATION: The standard L(2)-norm support vector machine (SVM) is a widely used tool for microarray classification. Previous studies have demonstrated its superior performance in terms of classification accuracy. However, a major limitation of the SVM is that it cannot automatically select relevant genes for the classification. The L(1)-norm SVM is a variant of the standard L(2)-norm SVM that constrains the L(1)-norm of the fitted coefficients. Due to the singularity of the L(1)-norm, the L(1)-norm SVM has the property of automatically selecting relevant genes. On the other hand, the L(1)-norm SVM has two drawbacks: (1) the number of selected genes is upper bounded by the size of the training data; (2) when there are several highly correlated genes, the L(1)-norm SVM tends to pick only a few of them and remove the rest. RESULTS: We propose a hybrid huberized support vector machine (HHSVM). The HHSVM combines the huberized hinge loss function and the elastic-net penalty. By doing so, the HHSVM performs automatic gene selection in a way similar to the L(1)-norm SVM. In addition, the HHSVM encourages highly correlated genes to be selected (or removed) together. We also develop an efficient algorithm to compute the entire solution path of the HHSVM. Numerical results indicate that the HHSVM tends to provide better variable selection results than the L(1)-norm SVM, especially when variables are highly correlated. AVAILABILITY: R code is available at http://www.stat.lsa.umich.edu/~jizhu/code/hhsvm/.
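The two ingredients named in this abstract, a huberized hinge loss and an elastic-net penalty, can be written down compactly. The sketch below uses one common parameterization (a quadratic segment of width `delta` joining the flat and linear parts of the hinge, and a penalty of lambda1 times the L1 norm plus half of lambda2 times the squared L2 norm). The abstract gives no formulas, so the exact form, the default `delta`, and all function names are assumptions. It only evaluates the objective and does not reproduce the authors' solution-path algorithm.

```python
def huberized_hinge(margin, delta=2.0):
    """Huberized hinge loss of the margin t = y * f(x).

    Zero for t > 1, quadratic for 1 - delta < t <= 1, linear below that.
    The pieces meet continuously at t = 1 and t = 1 - delta.
    """
    if margin > 1.0:
        return 0.0
    if margin > 1.0 - delta:
        return (1.0 - margin) ** 2 / (2.0 * delta)
    return 1.0 - margin - delta / 2.0


def elastic_net_penalty(coefs, lambda1, lambda2):
    """lambda1 * ||b||_1 + (lambda2 / 2) * ||b||_2^2."""
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lambda1 * l1 + 0.5 * lambda2 * l2_squared


def hhsvm_objective(X, y, intercept, coefs, lambda1, lambda2, delta=2.0):
    """Total huberized hinge loss over the samples plus the elastic-net penalty.

    X is a list of feature rows and y a list of labels in {-1, +1}.
    """
    loss = 0.0
    for row, label in zip(X, y):
        score = intercept + sum(b * x for b, x in zip(coefs, row))
        loss += huberized_hinge(label * score, delta)
    return loss + elastic_net_penalty(coefs, lambda1, lambda2)
```

The L1 term drives some coefficients to exactly zero (gene selection), while the squared L2 term is what tends to keep correlated genes in or out of the model together.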

3.

Parallelization of multicategory support vector machines (PMC-SVM) for classifying microarray data

Zhang C Li P Rajendran A Deng Y Chen D 《BMC bioinformatics》2006,7(Z4):S15


4.

Data-dependent kernel machines for microarray data classification

Xiong H Zhang Y Chen XW 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):583-595

One important application of gene expression analysis is to classify tissue samples according to their gene expression levels. Gene expression data are typically characterized by high dimensionality and small sample size, which makes the classification task quite challenging. In this paper, we present a data-dependent kernel for microarray data classification. This kernel function is engineered so that the class separability of the training data is maximized. A bootstrapping-based resampling scheme is introduced to reduce the possible training bias. The effectiveness of this adaptive kernel for microarray data classification is illustrated with a k-Nearest Neighbor (KNN) classifier. Our experimental study shows that the data-dependent kernel leads to a significant improvement in the accuracy of KNN classifiers. Furthermore, this kernel-based KNN scheme has been demonstrated to be competitive with, if not better than, more sophisticated classifiers such as Support Vector Machines (SVMs) and the Uncorrelated Linear Discriminant Analysis (ULDA) for classifying gene expression data.

5.

Multiclass cancer classification by support vector machines with class-wise optimized genes and probability estimates

Ashish Anand 《Journal of theoretical biology》2009,259(3):533-229

We investigate the multiclass classification of cancer microarray samples. In contrast to classification of two cancer types from gene expression data, multiclass classification of more than two cancer types is a relatively hard and less studied problem. We used class-wise optimized genes with the corresponding one-versus-all support vector machine (OVA-SVM) classifier to maximize the utilization of selected genes. The final prediction was made by using probability scores from all classifiers. We used three different methods of estimating probability from the decision value. Among the three probability methods, Platt's approach was more consistent, whereas the isotonic approach performed better for datasets with unequal proportions of samples in different classes. A probability-based decision not only gives a true and fair comparison between different one-versus-all (OVA) classifiers but also makes it possible to use them in any post analysis. Several ensemble experiments, an example of post analysis, of the three probability methods were implemented to study their effect in improving the classification accuracy. We observe that the ensemble did help in improving the predictive accuracy on cancer datasets, especially those involving unbalanced samples. A four-fold external stratified cross-validation experiment was performed on the six multiclass cancer datasets to obtain unbiased estimates of prediction accuracies. Analysis of class-wise frequently selected genes on two cancer datasets demonstrated that the approach was able to select important and relevant genes consistent with the literature. This study demonstrates successful implementation of the framework of class-wise feature selection and multiclass classification for prediction of cancer subtypes on six datasets.
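The probability-based decision described in this abstract can be illustrated with Platt's sigmoid, which maps an SVM decision value f to 1 / (1 + exp(a*f + b)). The sketch below assumes the sigmoid parameters have already been fitted for each one-versus-all classifier. The class labels, decision values, and parameters are made-up illustrations, not data from the paper.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P(class | f) = 1 / (1 + exp(a * f + b))."""
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def predict_ova(decision_values, platt_params):
    """Pick the class whose one-versus-all classifier is most confident.

    decision_values maps class label -> raw SVM decision value.
    platt_params maps class label -> (a, b), fitted elsewhere.
    Returns the winning label and the per-class probabilities.
    """
    probs = {
        label: platt_probability(f, *platt_params[label])
        for label, f in decision_values.items()
    }
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical numbers: class "A" has the largest raw decision value,
# but after calibration class "B" has the highest probability.
example_values = {"A": 1.0, "B": 0.5, "C": -0.5}
example_params = {"A": (-1.0, 0.0), "B": (-4.0, 0.0), "C": (-2.0, 0.0)}
winner, class_probs = predict_ova(example_values, example_params)
```

Raw decision values from separately trained classifiers live on different scales, which is why comparing calibrated probabilities can change the winner, as in this toy case.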

6.

Oligonucleotide microarray identification of Bacillus anthracis strains using support vector machines

Doran M Raicu DS Furst JD Settimi R Schipma M Chandler DP 《Bioinformatics (Oxford, England)》2007,23(4):487-492

The capability of a custom microarray to discriminate between closely related DNA samples is demonstrated using a set of Bacillus anthracis strains. The microarray was developed as a universal fingerprint device consisting of 390 genome-independent 9mer probes. The genomes of B. anthracis strains are monomorphic and therefore typically difficult to distinguish using conventional molecular biology tools or microarray data clustering techniques. Using support vector machines (SVMs) as a supervised learning technique, we show that a low-density fingerprint microarray contains enough information to discriminate between B. anthracis strains with 90% sensitivity using a reference library constructed from six replicate arrays and three replicates for new isolates.

7.

Visualization-based cancer microarray data classification analysis

Mramor M Leban G Demsar J Zupan B 《Bioinformatics (Oxford, England)》2007,23(16):2147-2154

MOTIVATION: Methods for analyzing cancer microarray data often face two distinct challenges: the models they infer need to perform well when classifying new tissue samples while at the same time providing an insight into the patterns and gene interactions hidden in the data. State-of-the-art supervised data mining methods often cover well only one of these aspects, motivating the development of methods where predictive models with a solid classification performance would be easily communicated to the domain expert. RESULTS: Data visualization may provide an excellent approach to knowledge discovery and analysis of class-labeled data. We have previously developed an approach called VizRank that can score and rank point-based visualizations according to the degree of separation of data instances of different classes. We here extend VizRank with techniques to uncover outliers, score features (genes) and perform classification, as well as to demonstrate that the proposed approach is well suited for cancer microarray analysis. Using VizRank and radviz visualization on a set of previously published cancer microarray data sets, we were able to find simple, interpretable data projections that include only a small subset of genes yet do clearly differentiate among different cancer types. We also report that our approach to classification through visualization achieves performance that is comparable to state-of-the-art supervised data mining techniques. AVAILABILITY: VizRank and radviz are implemented as part of the Orange data mining suite (http://www.ailab.si/orange). SUPPLEMENTARY INFORMATION: Supplementary data are available from http://www.ailab.si/supp/bi-cancer.

8.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (total citations: 1; self-citations: 0; citations by others: 1)

Huang HL Chang FL 《Bio Systems》2007,90(2):516-528

An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer a mechanism for automatic internal detection of relevant features. The appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM obtains an average accuracy of 96.88% with an average of 10.0 selected genes using 10-fold cross-validation across the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.

9.

A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification (total citations: 2; self-citations: 0; citations by others: 2)

Alexander Statnikov Lily Wang Constantin F Aliferis

《BMC bioinformatics》

Background

Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain.

10.

SVM Classifier - a comprehensive java interface for support vector machine classification of microarray data

Pirooznia M Deng Y 《BMC bioinformatics》2006,7(Z4):S25


11.

Multiclass classification of microarray data with repeated measurements: application to cancer

Yeung KY Bumgarner RE 《Genome biology》2003,4(12):R83

Prediction of the diagnostic category of a tissue sample from its gene-expression profile and selection of relevant genes for class prediction have important applications in cancer research. We have developed the uncorrelated shrunken centroid (USC) and error-weighted, uncorrelated shrunken centroid (EWUSC) algorithms that are applicable to microarray data with any number of classes. We show that removing highly correlated genes typically improves classification results using a small set of genes.
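The idea of removing highly correlated genes can be sketched as a greedy filter: walk through genes in ranked order and keep a gene only if its absolute Pearson correlation with every gene already kept stays below a threshold. This is an illustrative stand-in, not the USC or EWUSC algorithm itself. The threshold of 0.9, the toy expression values, and the function names are all assumptions.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (norm_u * norm_v)


def drop_correlated(genes, ranked_names, threshold=0.9):
    """Greedy filter over genes in ranked order (most relevant first).

    genes maps gene name -> expression values across samples.
    A gene is kept only if |r| < threshold against every kept gene.
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(genes[name], genes[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy expression profiles: g2 is an exact multiple of g1, g3 is not.
toy_genes = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
selected = drop_correlated(toy_genes, ["g1", "g2", "g3"])
```

Because the filter processes genes in ranked order, the more relevant member of a correlated group is the one that survives.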

12.

Support vector machine classification and validation of cancer tissue samples using microarray expression data (total citations: 45; self-citations: 0; citations by others: 45)

Furey TS Cristianini N Duffy N Bednarski DW Schummer M Haussler D 《Bioinformatics (Oxford, England)》2000,16(10):906-914

MOTIVATION: DNA microarray experiments, generating thousands of gene expression measurements, are being used to gather information from tissue and cell samples regarding gene expression differences that will be useful in diagnosing disease. We have developed a new method to analyse this kind of data using support vector machines (SVMs). This analysis consists of both classification of the tissue samples and an exploration of the data for mis-labeled or questionable tissue results. RESULTS: We demonstrate the method in detail on samples consisting of ovarian cancer tissues, normal ovarian tissues, and other normal tissues. The dataset consists of expression experiment results for 97,802 cDNAs for each tissue. As a result of computational analysis, a tissue sample is discovered and confirmed to be wrongly labeled. Upon correction of this mistake and the removal of an outlier, perfect classification of tissues is achieved, but not with high confidence. We identify and analyse a subset of genes from the ovarian dataset whose expression is highly differentiated between the types of tissues. To show robustness of the SVM method, two previously published datasets from other types of tissues or cells are analysed. The results are comparable to those previously obtained. We show that other machine learning methods also perform comparably to the SVM on many of those datasets. AVAILABILITY: The SVM software is available at http://www.cs.columbia.edu/~bgrundy/svm.

13.

Classification of multiple cancer types by multicategory support vector machines using gene expression data (total citations: 11; self-citations: 0; citations by others: 11)

Lee Y Lee CK 《Bioinformatics (Oxford, England)》2003,19(9):1132-1139

MOTIVATION: High-density DNA microarray measures the activities of several thousand genes simultaneously and the gene expression profiles have been used for the cancer classification recently. This new approach promises to give better therapeutic measurements to cancer patients by diagnosing cancer types with improved accuracy. The Support Vector Machine (SVM) is one of the classification methods successfully applied to the cancer diagnosis problems. However, its optimal extension to more than two classes was not obvious, which might impose limitations in its application to multiple tumor types. We briefly introduce the Multicategory SVM, which is a recently proposed extension of the binary SVM, and apply it to multiclass cancer diagnosis problems. RESULTS: Its applicability is demonstrated on the leukemia data (Golub et al., 1999) and the small round blue cell tumors of childhood data (Khan et al., 2001). Comparable classification accuracy shown in the applications and its flexibility render the MSVM a viable alternative to other classification methods. SUPPLEMENTARY INFORMATION: http://www.stat.ohio-state.edu/~yklee/msvm.htm

14.

Using recurrence quantification analysis descriptors for protein sequence classification with support vector machines

Mitra J Mundra P Kulkarni BD Jayaraman VK 《Journal of biomolecular structure & dynamics》2007,25(3):289-298


15.

Multidimensional support vector machines for visualization of gene expression data

Komura D Nakamura H Tsutsumi S Aburatani H Ihara S 《Bioinformatics (Oxford, England)》2005,21(4):439-444

MOTIVATION: Since DNA microarray experiments provide us with a huge amount of gene expression data, they should be analyzed with statistical methods to extract the meaning of the experimental results. Some dimensionality reduction methods such as Principal Component Analysis (PCA) are used to roughly visualize the distribution of high dimensional gene expression data. However, in the case of binary classification of gene expression data, PCA does not utilize class information when choosing axes. Thus clearly separable data in the original space may not be so in the reduced space used in PCA. RESULTS: For visualization and class prediction of gene expression data, we have developed a new SVM-based method called multidimensional SVMs that generates multiple orthogonal axes. This method projects high dimensional data into a lower dimensional space to exhibit properties of the data clearly and to visualize a distribution of the data roughly. Furthermore, the multiple axes can be used for class prediction. The basic properties of conventional SVMs are retained in our method: solutions of mathematical programming are sparse, and nonlinear classification is implemented implicitly through the use of kernel functions. The application of our method to the experimentally obtained gene expression datasets for patients' samples indicates that our algorithm is efficient and useful for visualization and class prediction. CONTACT: komura@hal.rcast.u-tokyo.ac.jp.

16.

Modelling ecological niches with support vector machines (total citations: 2; self-citations: 1; citations by others: 2)

John M. Drake Christophe Randin Antoine Guisan 《Journal of Applied Ecology》2006,43(3):424-432


17.

Secondary structure prediction with support vector machines (total citations: 8; self-citations: 0; citations by others: 8)

Ward JJ McGuffin LJ Buxton BF Jones DT 《Bioinformatics (Oxford, England)》2003,19(13):1650-1655

MOTIVATION: A new method that uses support vector machines (SVMs) to predict protein secondary structure is described and evaluated. The study is designed to develop a reliable prediction method using an alternative technique and to investigate the applicability of SVMs to this type of bioinformatics problem. METHODS: Binary SVMs are trained to discriminate between two structural classes. The binary classifiers are combined in several ways to predict multi-class secondary structure. RESULTS: The average three-state prediction accuracy per protein (Q3) is estimated by cross-validation to be 77.07 ± 0.26% with a segment overlap (Sov) score of 73.32 ± 0.39%. The SVM performs similarly to the 'state-of-the-art' PSIPRED prediction method on a non-homologous test set of 121 proteins despite being trained on substantially fewer examples. A simple consensus of the SVM, PSIPRED and PROFsec achieves significantly higher prediction accuracy than the individual methods.
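The three-state accuracy reported in this abstract is the percentage of residues whose predicted state matches the observed one, averaged over proteins. A minimal sketch follows, assuming states are encoded as strings over H (helix), E (strand), and C (coil). The encoding, the function names, and the example strings are assumptions for illustration, and the segment overlap (Sov) score is not covered.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs):
    """Average per-protein Q3 over (predicted, observed) pairs."""
    scores = [q3_accuracy(p, o) for p, o in pairs]
    return sum(scores) / len(scores)


# Two made-up proteins: one predicted perfectly, one with a single error.
example_pairs = [("HHCC", "HHCC"), ("HECC", "HHCC")]
example_mean = mean_q3(example_pairs)
```

Averaging per protein, rather than pooling all residues, gives short and long proteins equal weight.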

18.

Erratum to: Multiclass classification of microarray data with repeated measurements: application to cancer

Ka Yee Yeung Roger E Bumgarner 《Genome biology》2005,6(13):405


19.

Statistical analysis of big data: an approach based on support vector machines for classification and regression problems

N. O. Kadyrova L. V. Pavlova 《Biophysics》2014,59(3):364-373

A new type of supervised learning algorithm for estimating multidimensional functions is considered. These methods, based on support vector machines, are widely used due to their ability to deal with high-dimensional and large datasets and their flexibility in modeling diverse sources of data. Support vector machines and related kernel methods are extremely good at solving prediction problems in computational biology. Background on statistical learning theory and kernel feature spaces is given, including practical and algorithmic considerations.

20.

Multi-channel surface EMG classification using support vector machines and signal-based wavelet optimization

Marie-Françoise Lucas Adrien Gaufriau Sylvain Pascual Christian Doncarli Dario Farina 《Biomedical signal processing and control》2008,3(2):169-174

The study proposes a method for supervised classification of multi-channel surface electromyographic signals with the aim of controlling myoelectric prostheses. The representation space is based on the discrete wavelet transform (DWT) of each recorded EMG signal using unconstrained parameterization of the mother wavelet. The classification is performed with a support vector machine (SVM) approach in a multi-channel representation space. The mother wavelet is optimized with the criterion of minimum classification error, as estimated from the learning signal set. The method was applied to the classification of six hand movements with recording of the surface EMG from eight locations over the forearm. The misclassification rate in six subjects using the eight channels was (mean ± S.D.) 4.7 ± 3.7% with the proposed approach, while it was 11.1 ± 10.0% without wavelet optimization (Daubechies wavelet). The DWT and SVM can be implemented with fast algorithms; thus, the method is suitable for real-time implementation.