
1.

Classification of gene microarrays by penalized logistic regression (total citations: 2; self-citations: 0; cited by others: 2)

Zhu J Hastie T 《Biostatistics (Oxford, England)》2004,5(3):427-443

Classification of patient samples is an important aspect of cancer diagnosis and treatment. The support vector machine (SVM) has been successfully applied to microarray cancer diagnosis problems. However, one weakness of the SVM is that given a tumor sample, it only predicts a cancer class label but does not provide any estimate of the underlying probability. We propose penalized logistic regression (PLR) as an alternative to the SVM for the microarray cancer diagnosis problem. We show that when using the same set of genes, PLR and the SVM perform similarly in cancer classification, but PLR has the advantage of additionally providing an estimate of the underlying probability. Often a primary goal in microarray cancer diagnosis is to identify the genes responsible for the classification, rather than class prediction. We consider two gene selection methods in this paper, univariate ranking (UR) and recursive feature elimination (RFE). Empirical results indicate that PLR combined with RFE tends to select fewer genes than other methods and also performs well in both cross-validation and test samples. A fast algorithm for solving PLR is also described.
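To illustrate why PLR yields a probability where an SVM yields only a label, here is a minimal sketch of logistic regression with a penalty on the coefficients. It assumes an L2 (quadratic) penalty and plain gradient descent; the abstract does not state the penalty form or the fast algorithm, so `fit_plr`, `lam`, `lr`, and the toy data are all hypothetical choices for illustration only.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_plr(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Fit logistic regression with an assumed L2 penalty by gradient descent.

    X: list of feature rows, y: list of 0/1 labels,
    lam: penalty strength (hypothetical default), lr: step size.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        # The penalty adds lam * w to the gradient; the intercept is not penalized.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict_proba(w, b, x):
    # Estimated probability of class 1 for a single sample x.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy data: two "genes", four samples (made up for illustration).
X = [[2.0, 0.1], [1.5, -0.2], [-1.8, 0.3], [-2.2, 0.0]]
y = [1, 1, 0, 0]
w, b = fit_plr(X, y)
p_pos = predict_proba(w, b, [1.0, 0.0])
p_neg = predict_proba(w, b, [-1.0, 0.0])
```

The fitted model returns a value strictly between 0 and 1 for each sample, which can be thresholded to recover a class label when one is needed.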

2.

Investigating the gene expression profiles of cells in seven embryonic stages with machine learning algorithms

《Genomics》2020,112(3):2524-2534

The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed through a powerful Monte Carlo feature selection method for the generation of a feature list. Second, incremental feature selection was applied to the list by incorporating two classification algorithms: support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through SVM, we extracted several latent gene biomarkers, indicating the stages of embryonic cells, and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, some interesting rules were accessed by the RIPPER algorithm, suggesting different expression patterns for different stages.

3.

Improved centroids estimation for the nearest shrunken centroid classifier

Wang S Zhu J 《Bioinformatics (Oxford, England)》2007,23(8):972-979

MOTIVATION: The nearest shrunken centroid (NSC) method has been successfully applied in many DNA-microarray classification problems. The NSC uses 'shrunken' centroids as prototypes for each class and identifies subsets of genes that best characterize each class. Classification is then made to the nearest (shrunken) centroid. The NSC is very easy to implement and very easy to interpret; however, it has drawbacks. RESULTS: We show that the NSC method can be interpreted in the framework of LASSO regression. Based on that, we consider two new methods, adaptive L(infinity)-norm penalized NSC (ALP-NSC) and adaptive hierarchically penalized NSC (AHP-NSC), with two different penalty functions for microarray classification, which improve over the NSC. Unlike the L(1)-norm penalty used in LASSO, the penalty terms that we consider make use of the fact that parameters belonging to one gene should be treated as a natural group. Numerical results indicate that the two new methods tend to remove irrelevant genes more effectively and provide better classification results than the L(1)-norm approach. AVAILABILITY: R code for the ALP-NSC and the AHP-NSC algorithms is available from the authors upon request.
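As a rough illustration of the 'shrunken' centroids mentioned above: in the standard NSC formulation, each gene's offset between a class centroid and the overall centroid is soft-thresholded, so small offsets become exactly zero and the gene drops out for that class. The sketch below shows only that shrinkage step. It omits the per-gene standardization, the offsets and `delta` are made-up values, and it does not implement the ALP-NSC or AHP-NSC penalties.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    if d > delta:
        return d - delta
    if d < -delta:
        return d + delta
    return 0.0

def shrink_centroid_offsets(offsets, delta):
    # offsets[i] is the (hypothetical) standardized difference between a
    # class centroid and the overall centroid for gene i.
    return [soft_threshold(d, delta) for d in offsets]

offsets = [2.5, -0.3, 0.8, -1.7]
shrunken = shrink_centroid_offsets(offsets, delta=1.0)

# Genes whose shrunken offset is non-zero still characterize the class.
active_genes = [i for i, d in enumerate(shrunken) if d != 0.0]
```

With these values only the first and last genes keep a non-zero offset, which is how the thresholding doubles as gene selection.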

4.

Combined feature selection and cancer prognosis using support vector machine regression

Sun BY Zhu ZH Li J Linghu B 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1671-1677

Prognostic prediction is important in the medical domain, because it can be used to select an appropriate treatment for a patient by predicting the patient's clinical outcomes. For high-dimensional data, a normal prognostic method undergoes two steps: feature selection and prognosis analysis. Recently, the L1-L2-norm Support Vector Machine (L1-L2 SVM) has been developed as an effective classification technique and shown good classification performance with automatic feature selection. In this paper, we extend the L1-L2 SVM for regression analysis with automatic feature selection. We further improve the L1-L2 SVM for prognostic prediction by utilizing the information of censored data as constraints. We design an efficient solution to the new optimization problem. The proposed method is compared with seven other prognostic prediction methods on three real-world data sets. The experimental results show that the proposed method performs consistently better than the medium performance. It is more efficient than other algorithms with similar performance.

5.

Recipe for uncovering predictive genes using support vector machines based on model population analysis

Li HD Liang YZ Xu QS Cao DS Tan BB Deng BC Lin CC 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1633-1641

Selecting a small number of informative genes for microarray-based tumor classification is central to cancer prediction and treatment. Based on model population analysis, here we present a new approach, called Margin Influence Analysis (MIA), designed to work with support vector machines (SVM) for selecting informative genes. The rationale for performing margin influence analysis lies in the fact that the margin of support vector machines is an important factor which underlies the generalization performance of SVM models. Briefly, MIA could reveal genes which have statistically significant influence on the margin by using the Mann-Whitney U test. The reason for using the Mann-Whitney U test rather than the two-sample t test is that the Mann-Whitney U test is a nonparametric test method without any distribution-related assumptions and is also a robust method. Using two publicly available cancerous microarray data sets, it is demonstrated that MIA could typically select a small number of margin-influencing genes and further achieves classification accuracy comparable to those reported in the literature. The distinguished features and outstanding performance may make MIA a good alternative for gene selection of high dimensional microarray data. (The source code in MATLAB with GNU General Public License Version 2.0 is freely available at http://code.google.com/p/mia2009/).
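The Mann-Whitney U statistic referred to above can be computed directly from pairwise comparisons, which is why it needs no distributional assumption. The sketch below is a minimal illustration, not the authors' MATLAB code. The two lists are hypothetical margins of SVM sub-models built with and without a given gene, and no p-value is computed.

```python
def mann_whitney_u(a, b):
    """U statistic for sample a: number of pairs (x in a, y in b) with x > y.

    Ties contribute 0.5 each, following the usual convention.
    """
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical margins of sub-models that include / exclude one gene.
margins_with_gene = [0.9, 1.1, 1.3]
margins_without_gene = [0.4, 0.6, 1.0]

u_with = mann_whitney_u(margins_with_gene, margins_without_gene)
u_without = mann_whitney_u(margins_without_gene, margins_with_gene)
```

The two statistics always sum to the number of pairs (here 3 × 3 = 9). A value of `u_with` close to that maximum suggests that including the gene tends to increase the margin.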

6.

Applications of support vector machines to cancer classification with microarray data

Chu F Wang L 《International journal of neural systems》2005,15(6):475-484

Microarray gene expression data usually have a large number of dimensions, e.g., over ten thousand genes, and a small number of samples, e.g., a few tens of patients. In this paper, we use the support vector machine (SVM) for cancer classification with microarray data. Dimensionality reduction methods, such as principal components analysis (PCA), class-separability measure, Fisher ratio, and t-test, are used for gene selection. A voting scheme is then employed to do multi-group classification by k(k - 1) binary SVMs. We are able to obtain the same classification accuracy but with far fewer features compared to other published results.

7.

Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data (total citations: 2; self-citations: 0; cited by others: 2)

Song S Zhan Z Long Z Zhang J Yao L 《PloS one》2011,6(2):e17191

Background

The support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for a non-linear (polynomial kernel) SVM over a linear one. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike previous studies, which focused merely on either the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes, in terms of classification accuracy and computation time.

Methodology/Principal Findings

Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low-dimensional feature space, RBF SVM outperformed linear SVM significantly; in a relatively high-dimensional space, linear SVM performed better than its counterpart; (2) considering classification accuracy and computation time together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) could achieve better accuracy in less time.

Conclusions/Significance

The present work provides the first empirical results for linear and RBF SVM in the classification of fMRI data combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if users are more concerned about computation time, RBF SVM with a relatively small set of voxels, with part of the principal components kept as features, is the better choice.

8.

Colon cancer prediction with genetic profiles using intelligent techniques

Subha Mahadevi Alladi Shinde Santosh P Vadlamani Ravi Upadhyayula Suryanarayana Murthy 《Bioinformation》2008,3(3):130-133

Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In our present study we have used the benchmark colon cancer data set for analysis. Feature selection is done using the t-statistic. A comparative study of the class prediction accuracy of 3 different classifiers, viz., support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the Multi Layer Perceptron (MLP). The top 10 genes selected by us for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.

9.

Epileptic Seizure Detection Based on New Hybrid Models with Electroencephalogram Signals

《IRBM》2020,41(6):331-353

Objectives: Epileptic seizures are one of the most common diseases in society and are difficult to detect. In this study, a new method was proposed to automatically detect and classify epileptic seizures from EEG (electroencephalography) signals. Methods: In the proposed method, five-class classification of EEG signals (eyes open, eyes closed, healthy, from the tumor region, and epileptic seizure) was carried out using the support vector machine (SVM) and normalization methods comprising the z-score, minimum-maximum, and MAD normalizations. To classify the EEG signals, support vector machine classifiers with different kernel functions, namely Linear, Cubic, and Medium Gaussian, were used. To evaluate the performance of the proposed hybrid models, the confusion matrix, ROC curves, and classification accuracy were used. Results: Without normalization, the obtained classification accuracies are 76.90%, 82.40%, and 81.70% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. After applying the z-score normalization to the multi-class EEG signals dataset, the obtained classification accuracies are 77.10%, 82.30%, and 81.70% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. With the minimum-maximum normalization, the obtained classification accuracies are 77.20%, 82.40%, and 81.50% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. Finally, after applying the MAD normalization to the multi-class EEG signals dataset, the obtained classification accuracies are 76.70%, 82.50%, and 81.40% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. Conclusion: The obtained results show that the best hybrid model for the five-class classification of EEG signals is the combination of Cubic SVM and MAD normalization.
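The three normalizations named above can be sketched in a few lines. This is a minimal illustration under stated assumptions: "MAD" is taken to mean the median absolute deviation, each function rescales a single feature vector, and the sample `signal` is made up. The abstract does not give the exact formulas used in the study.

```python
import statistics

def z_score(values):
    # Centre on the mean and scale by the (population) standard deviation.
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def min_max(values):
    # Map the smallest value to 0 and the largest to 1.
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mad_normalize(values):
    # Centre on the median and scale by the median absolute deviation
    # (assumed meaning of "MAD"); less sensitive to outliers than z-score.
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [(v - med) / mad for v in values]

# Note: all three divide by a spread measure, so a constant input
# (zero spread) would raise ZeroDivisionError in this sketch.
signal = [1.0, 2.0, 3.0, 4.0, 10.0]
z = z_score(signal)
mm = min_max(signal)
mad = mad_normalize(signal)  # [-2.0, -1.0, 0.0, 1.0, 7.0]
```

The outlier 10.0 shifts the mean and inflates the standard deviation used by the z-score, whereas the median and MAD are unaffected by how large that single value is.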

10.

Gene selection using support vector machines with non-convex penalty (total citations: 2; self-citations: 0; cited by others: 2)

Zhang HH Ahn J Lin X Park C 《Bioinformatics (Oxford, England)》2006,22(1):88-95

MOTIVATION: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in one single experiment. One current difficulty in interpreting microarray data comes from their innate nature of 'high-dimensional low sample size'. Therefore, robust and accurate gene selection methods are required to identify differentially expressed groups of genes across different samples, e.g. between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide a unified procedure for simultaneous gene selection and cancer classification, achieving high accuracy in both aspects. RESULTS: In this paper we develop a novel type of regularization in support vector machines (SVMs) to identify important genes for cancer classification. A special nonconvex penalty, called the smoothly clipped absolute deviation penalty, is imposed on the hinge loss function in the SVM. By systematically thresholding small estimates to zero, the new procedure eliminates redundant genes automatically and yields a compact and accurate classifier. A successive quadratic algorithm is proposed to convert the non-differentiable and non-convex optimization problem into easily solved linear equation systems. The method is applied to two real datasets and has produced very promising results. AVAILABILITY: MATLAB codes are available upon request from the authors.

11.

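Classification of mental tasks using support vector machines with optimized kernel parameters

Xue Jianzhong Yan Xiangguo Zheng Chongxun Wang Haojun 《Acta Biophysica Sinica》2003,19(3):322-326

Based on the basic principles of support vector machines, a criterion for estimating an upper bound on the generalization error is given, and this criterion is used to select the optimal kernel parameters automatically. Multivariate autoregressive model parameters are estimated from EEG signals recorded during three different mental tasks and used as the feature vectors of the tasks; a support vector machine is then trained and tested on them for classification. The classification results show that the SVM classifier with optimized kernel parameters achieved the best classification performance, with a classification accuracy clearly higher than that of a radial basis function neural network.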

12.

ESVM: evolutionary support vector machine for automatic feature selection and classification of microarray data (total citations: 1; self-citations: 0; cited by others: 1)

Huang HL Chang FL 《Bio Systems》2007,90(2):516-528

An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of SVM, and cross-validation methods. However, SVMs do not offer the mechanism of automatic internal relevant feature detection. The appropriate setting of their control parameters is often treated as another independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. By considering model uncertainty, a frequency-based technique by voting on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM can obtain a high accuracy of 96.88% with, on average, only 10.0 selected genes, using 10-fold cross-validation across the 11 datasets. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.

13.

Extreme Learning Machine-Based Classification of ADHD Using Brain Structural MRI Data

Xiaolong Peng Pan Lin Tongsheng Zhang Jue Wang 《PloS one》2013,8(11)

Background

Effective and accurate diagnosis of attention-deficit/hyperactivity disorder (ADHD) is currently of significant interest. ADHD has been associated with multiple cortical features from structural MRI data. However, most existing learning algorithms for ADHD identification contain obvious defects, such as time-consuming training, parameter selection, etc. The aims of this study were as follows: (1) Propose an ADHD classification model using the extreme learning machine (ELM) algorithm for automatic, efficient and objective clinical ADHD diagnosis. (2) Assess the computational efficiency and the effect of sample size on both ELM and support vector machine (SVM) methods and analyze which brain segments are involved in ADHD.

Methods

High-resolution three-dimensional MR images were acquired from 55 ADHD subjects and 55 healthy controls. Multiple brain measures (cortical thickness, etc.) were calculated using a fully automated procedure in the FreeSurfer software package. In total, 340 cortical features were automatically extracted from 68 brain segments with 5 basic cortical features. F-score and SFS methods were adopted to select the optimal features for ADHD classification. Both ELM and SVM were evaluated for classification accuracy using leave-one-out cross-validation.

Results

We achieved ADHD prediction accuracies of 90.18% for ELM using eleven combined features, 84.73% for SVM-Linear and 86.55% for SVM-RBF. Our results show that ELM has better computational efficiency and is more robust as sample size changes than is SVM for ADHD classification. The most pronounced differences between ADHD and healthy subjects were observed in the frontal lobe, temporal lobe, occipital lobe and insular.

Conclusion

Our ELM-based algorithm for ADHD diagnosis performs considerably better than the traditional SVM algorithm. This result suggests that ELM may be used for the clinical diagnosis of ADHD and the investigation of different brain diseases.

14.

PCP: a program for supervised classification of gene expression profiles (total citations: 1; self-citations: 0; cited by others: 1)

Buturović LJ 《Bioinformatics (Oxford, England)》2006,22(2):245-247

PCP (Pattern Classification Program) is an open-source machine learning program for supervised classification of patterns (vectors of measurements). The principal use of PCP in bioinformatics is the design and evaluation of classifiers for use in clinical diagnostic tests based on measurements of gene expression. PCP implements leading pattern classification and gene selection algorithms and incorporates cross-validation estimation of classifier performance. Importantly, the implementation integrates gene selection and class prediction stages, which is vital for computing reliable performance estimates in small-sample scenarios. Additionally, the program includes automated and efficient model selection (optimization of parameters) for the support vector machine (SVM) classifier. The distribution includes Linux and Windows/Cygwin binaries. The program can easily be ported to other platforms. AVAILABILITY: Free download at http://pcp.sourceforge.net

15.

Predicting Drug-Target Interactions Using Drug-Drug Interactions

Shinhyuk Kim Daeyong Jin Hyunju Lee 《PloS one》2013,8(11)

Computational methods for predicting drug-target interactions have become important in drug research because they can help to reduce the time, cost, and failure rates for developing new drugs. Recently, with the accumulation of drug-related data sets related to drug side effects and pharmacological data, it has become possible to predict potential drug-target interactions. In this study, we focus on drug-drug interactions (DDI), their adverse effects () and pharmacological information (), and investigate the relationship among chemical structures, side effects, and DDIs from several data sources. In this study, data from the STITCH database, from drugs.com, and drug-target pairs from ChEMBL and SIDER were first collected. Then, by applying two machine learning approaches, a support vector machine (SVM) and a kernel-based L1-norm regularized logistic regression (KL1LR), we showed that DDI is a promising feature in predicting drug-target interactions. Next, the accuracies of predicting drug-target interactions using DDI were compared to those obtained using the chemical structure and side effects based on the SVM and KL1LR approaches, showing that DDI was the data source contributing the most for predicting drug-target interactions.

16.

Identification of marker genes in Alzheimer's disease using a machine-learning model

Inamul Hasan Madar Ghazala Sultan Iftikhar Aslam Tayubi Atif Noorul Hasan Bandana Pahi Anjali Rai Pravitha Kasu Sivanandan Tamizhini Loganathan Mahamuda Begum Sneha Rai 《Bioinformation》2021,17(2):348

Alzheimer's disease (AD) is one of the most common causes of dementia, mostly affecting the elderly population. Currently, there is no proper diagnostic tool or method available for the detection of AD. The present study used two distinct data sets of AD genes, which could be potential biomarkers in the diagnosis. The differentially expressed genes (DEGs) curated from both datasets were used for machine learning classification, tissue expression annotation and co-expression analysis. Further, CNPY3, GPR84, HIST1H2AB, HIST1H2AE, IFNAR1, LMO3, MYO18A, N4BP2L1, PML, SLC4A4, ST8SIA4, TLE1 and N4BP2L1 were identified as highly significant DEGs and exhibited co-expression with other query genes. Moreover, a tissue expression study found that these genes are also expressed in the brain tissue. In addition to the earlier studies for marker gene identification, we have considered a different set of machine learning classifiers to improve the accuracy rate from the analysis. Amongst all the six classification algorithms, J48 emerged as the best classifier, which could be used for differentiating healthy and diseased samples. SMO/SVM and Logit Boost followed J48 in classification accuracy.

17.

Gene selection for sample classifications in microarray experiments

Tsai CA Chen CH Lee TC Ho IC Yang UC Chen JJ 《DNA and cell biology》2004,23(10):607-614

DNA microarray technology provides useful tools for profiling global gene expression patterns in different cell/tissue samples. One major challenge is the large number of genes relative to the number of samples. The use of all genes can suppress or reduce the performance of a classification rule due to the noise of nondiscriminatory genes. Selection of an optimal subset from the original gene set becomes an important prestep in sample classification. In this study, we propose a family-wise error (FWE) rate approach to selection of discriminatory genes for two-sample or multiple-sample classification. The FWE approach controls the probability of one or more false positives at a prespecified level. A public colon cancer data set is used to evaluate the performance of the proposed approach for the two classification methods: k nearest neighbors (k-NN) and support vector machine (SVM). The selected gene sets from the proposed procedure appear to perform better than or comparable to several results reported in the literature using the univariate analysis without performing multivariate search. In addition, we apply the FWE approach to a toxicogenomic data set with nine treatments (a control and eight metals, As, Cd, Ni, Cr, Sb, Pb, Cu, and AsV) for a total of 55 samples for a multisample classification. Two gene sets are considered: the gene set omegaF formed by the ANOVA F-test, and a gene set omegaT formed by the union of one-versus-all t-tests. The predicted accuracies are evaluated using the internal and external crossvalidation. Using the SVM classification, the overall accuracies to predict 55 samples into one of the nine treatments are above 80% for internal crossvalidation. OmegaF has slightly higher accuracy rates than omegaT. The overall predicted accuracies are above 70% for the external crossvalidation; the two gene sets omegaT and omegaF performed equally well.

18.

The osteogenic differentiation of human bone marrow stromal cells induced by nanofiber scaffolds using bioinformatics

《Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease》2021,1867(12):166245

This article aims to investigate the mechanism of behaviors of human bone marrow stromal cells (hBMSCs) affected by scaffold structure combining Monte Carlo feature selection (MFCS), incremental feature selection (IFS) and support vector machine (SVM). The specific differentially expressed genes (DEGs) of hBMSCs cultured on nanofiber (NF) scaffolds and freeform fabrication (FFF) scaffolds were obtained. Key genes were screened from common genes between osteogenic DEGs and NF specific DEGs with MFCS, IFS and SVM. The results demonstrated that NF scaffolds induced hBMSCs to express more genes related to osteogenic differentiation. Finally, 16 key genes were identified among the common genes. The common genes were significantly enriched in Rap1 signaling pathway, extracellular matrix and ossification. The results in this study suggested that the gene expression of hBMSCs was sensitive to NF scaffolds and FFF scaffolds, and the osteogenic differentiation of hBMSCs could be enhanced by NF scaffolds.

19.

LS Bound based gene selection for DNA microarray data

Zhou X Mao KZ 《Bioinformatics (Oxford, England)》2005,21(8):1559-1564

MOTIVATION: One problem with discriminant analysis of DNA microarray data is that each sample is represented by quite a large number of genes, and many of them are irrelevant, insignificant or redundant to the discriminant problem at hand. Methods for selecting important genes are, therefore, of much significance in microarray data analysis. In the present study, a new criterion, called the LS Bound measure, is proposed to address the gene selection problem. The LS Bound measure is derived from the leave-one-out procedure of LS-SVMs (least squares support vector machines), and as the upper bound for leave-one-out classification results it reflects to some extent the generalization performance of gene subsets. RESULTS: We applied this LS Bound measure for gene selection on two benchmark microarray datasets: colon cancer and leukemia. We also compared the LS Bound measure with other evaluation criteria, including the well-known Fisher's ratio and Mahalanobis class separability measure, and other published gene selection algorithms, including Weighting factor and SVM Recursive Feature Elimination. The strength of the LS Bound measure is that it provides gene subsets leading to more accurate classification results than the filter method while its computational complexity is at the level of the filter method. AVAILABILITY: A companion website can be accessed at http://www.ntu.edu.sg/home5/pg02776030/lsbound/. The website contains: (1) the source code of the gene selection algorithm; (2) the complete set of tables and figures regarding the experimental study; (3) proof of the inequality (9). CONTACT: ekzmao@ntu.edu.sg.

20.

Doubly penalized Buckley-James method for survival data with high-dimensional covariates

Sijian Wang Bin Nan Ji Zhu David G Beer 《Biometrics》2008,64(1):132-140

Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes can be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized crossvalidation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
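For reference, the elastic-net penalty mentioned above is commonly written as a weighted sum of an L1 and a squared L2 term. The notation below ($\beta$ for the coefficient vector of $p$ genes, $\lambda_1$ and $\lambda_2$ for the two tuning parameters) is an assumption, since the abstract does not give the formula, and the paper's exact scaling may differ.

```latex
P_{\lambda_1,\lambda_2}(\beta)
  = \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^{2},
  \qquad \lambda_1 \ge 0,\ \lambda_2 \ge 0 .
```

The L1 term can set coefficients exactly to zero (gene selection), while the L2 term encourages correlated genes to receive similar coefficients. The pair $(\lambda_1, \lambda_2)$ corresponds to the two-dimensional tuning parameter mentioned in the abstract.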