Found 16 similar documents (search time: 265 ms)
1.
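Feature extraction and classification are key problems in pattern recognition. Combining wavelet analysis with support vector machine (SVM) theory, a classifier model is constructed to separate prostate cancer gene-chip (microarray) data into cancer and normal classes. Wavelet low-frequency coefficients are extracted to represent the original data and fed to an SVM classifier. Experiments show that low-frequency coefficients from a 4-level db1 wavelet decomposition give a correct classification rate of 93.53%, while the Haar wavelet gives 92.94%. Extracting low-frequency coefficients with different wavelets therefore yields similar classification performance.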
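The low-frequency feature extraction described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a Haar decomposition, pads odd-length inputs by repeating the last value (an assumption), and uses a made-up 16-value expression profile. The function name `haar_approximation` is hypothetical.

```python
import math

def haar_approximation(signal, levels):
    """Return the low-frequency (approximation) coefficients left after
    `levels` rounds of Haar decomposition."""
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) < 2:
            break
        if len(coeffs) % 2:
            # Assumption: pad odd-length input by repeating the last value.
            coeffs.append(coeffs[-1])
        # Haar low-pass filter: scaled sum of each adjacent pair.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs

# Hypothetical 16-gene expression profile, for illustration only.
expression = [float(v) for v in range(1, 17)]
features = haar_approximation(expression, levels=4)
```

Each level halves the number of coefficients, so a 4-level decomposition reduces the feature vector to roughly 1/16 of its original length before it reaches the classifier.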
2.
3.
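Combining wavelet analysis with support vector machine (SVM) theory, a classifier model is constructed to separate prostate cancer gene-chip (microarray) data into cancer and normal classes. This paper focuses on feature extraction from gene-chip data using wavelet high-frequency coefficients, and experimentally compares how high-frequency and low-frequency coefficient features affect classifier performance. High-frequency coefficients from a 3-level Haar wavelet decomposition give a correct classification rate of 93.31%; low-frequency coefficients from a 4-level db1 wavelet decomposition give 93.53%. Overall, low-frequency coefficient features classify better than high-frequency ones, and the classifier performance is stable.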
4.
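Classification of protein homo-oligomers based on amino acid composition distribution (total citations: 7; self-citations: 0; citations by others: 7)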
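Using a new feature extraction method, amino acid composition distribution, with support vector machines as the member classifiers and a one-versus-one multi-class strategy, four classes of homo-oligomers are classified from protein primary sequences. Under 10-fold cross-validation (10-CV), the total classification accuracy and the accuracy index based on amino acid composition distribution reach 86.22% and 67.12%, respectively. These are 5.74 and 10.03 percentage points higher than the traditional amino acid composition features, and 3.12 and 5.63 percentage points higher than dipeptide composition features, indicating that amino acid composition distribution is a very effective feature extraction method for classifying protein homo-oligomers. Combining amino acid composition distribution with protein sequence length raises the total classification accuracy and accuracy index to 86.35% and 67.23%, respectively, suggesting that sequence length carries some spatial-structure information.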
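The two baseline feature sets that this abstract compares against, amino acid composition and dipeptide composition, can be sketched as below. The abstract does not define the proposed "composition distribution" feature, so it is not reproduced here. The function names are hypothetical, and dropping non-standard residues before counting is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """20-dimensional vector of residue frequencies."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dimensional vector of adjacent-residue-pair frequencies."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    pairs = [a + b for a, b in zip(residues, residues[1:])]
    n = len(pairs)
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    return [pairs.count(k) / n if n else 0.0 for k in keys]

# Hypothetical short sequence, for illustration only.
aac = amino_acid_composition("AACD")
dpc = dipeptide_composition("AACD")
```

Both vectors have a fixed length regardless of sequence length, which is what allows sequences of different sizes to be fed to the same SVM.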
5.
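A tumor recognition model for colon cancer gene expression data is built on wavelet denoising and support vector machines. The experimental data are wavelet-decomposed, and the average classification accuracy of the samples, computed by cross-validation, is used to choose the wavelet function and the number of decomposition levels. An energy-threshold method is applied to the wavelet decomposition coefficients to denoise the data. A method combining gene classification contribution rate with principal component analysis is proposed to extract features from the colon cancer samples. The strong nonlinear mapping ability of support vector machines is then used to classify the colon cancer samples nonlinearly. To reduce the influence of how the sample set is partitioned on classification accuracy, the support vector classifier is evaluated with the Jackknife test, giving a classification accuracy of 96.77%. The experimental results demonstrate the effectiveness of the method, which can serve as a reference for colon cancer recognition.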
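The Jackknife (leave-one-out) test mentioned in this abstract can be sketched as below. To keep the example dependency-free, a nearest-centroid rule stands in for the support vector classifier; that substitution, the function names, and the toy data are all assumptions for illustration.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (not an SVM): assign x to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: dist2(centroids[lab], x))

def jackknife_accuracy(xs, ys, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and count correct calls."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

# Hypothetical two-feature samples, for illustration only.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
accuracy = jackknife_accuracy(samples, labels)
```

Because every sample is held out exactly once, the estimate does not depend on any particular train/test partition, which is the motivation the abstract gives for using this test.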
6.
7.
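To address the limited feature extraction and low classification accuracy of current multi-class motor imagery EEG recognition, a multi-feature fusion method for four-class motor imagery EEG recognition is proposed to improve the recognition rate. From the preprocessed EEG signals, the Hilbert-Huang transform, one-versus-rest common spatial patterns, approximate entropy, fuzzy entropy, and sample entropy are used to extract an initial feature vector that combines time-frequency, spatial, and nonlinear-dynamics information. Principal component analysis reduces its dimensionality, and a support vector machine optimized by particle swarm optimization performs the classification. On the data of subject k3b from the international standard dataset BCI2005 Data set IIIa, simulated in MATLAB, the algorithm achieves a recognition rate of 93.30%, higher than that obtained with any single feature or with other feature combinations. Motor imagery EEG data were also collected from four experimental subjects; processing them with the proposed method gives an average recognition rate of 72.96%. The results indicate that multi-feature fusion represents motor imagery EEG signals better, that a particle swarm optimized support vector machine achieves high recognition accuracy, and that the approach offers a new recognition method for human cognitive activity.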
8.
9.
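Accurate classification of event-related potentials (ERPs) is valuable for many human cognition studies and clinical assessments. Because ERP signals are very high-dimensional and contain much information irrelevant to classification, extracting features from them is especially important. The principle and shortcomings of common spatial patterns (CSP) are analyzed, and a spatio-temporal feature extraction method for ERP classification is proposed that combines an autoregressive (AR) model with a whitening transform. A cognitive experiment is designed to validate the method: features are extracted from the experimental data with both the spatio-temporal method and CSP, the same classifier, a support vector machine (SVM), is trained on each, and the classification results are compared. The experiments show that, for ERP classification, the spatio-temporal method has a clear advantage over CSP and, with reasonably chosen parameters, reaches a classification accuracy above 90%.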
10.
11.
Sun BY Zhu ZH Li J Linghu B 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1671-1677
Prognostic prediction is important in the medical domain because it can be used to select an appropriate treatment for a patient by predicting the patient's clinical outcomes. For high-dimensional data, a normal prognostic method undergoes two steps: feature selection and prognosis analysis. Recently, the L1-L2-norm Support Vector Machine (L1-L2 SVM) has been developed as an effective classification technique and shown good classification performance with automatic feature selection. In this paper, we extend L1-L2 SVM for regression analysis with automatic feature selection. We further improve the L1-L2 SVM for prognostic prediction by utilizing the information of censored data as constraints. We design an efficient solution to the new optimization problem. The proposed method is compared with seven other prognostic prediction methods on three real-world data sets. The experimental results show that the proposed method performs consistently better than the medium performance. It is more efficient than other algorithms with similar performance.
12.
A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection (total citations: 16; self-citations: 0; citations by others: 16)
Yasui Y Pepe M Thompson ML Adam BL Wright GL Qu Y Potter JD Winget M Thornquist M Feng Z 《Biostatistics (Oxford, England)》2003,4(3):449-463
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
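The reduction of intensity values to binary peak indicators described in this abstract might look roughly like the sketch below. The abstract does not give the exact neighborhood rule or the correction for x-axis shift, so the window size, the strict-maximum rule, and the function name `peak_indicators` are assumptions for illustration only.

```python
def peak_indicators(intensities, half_window):
    """Return 1 where a point is the unique maximum among its
    `half_window` neighbours on each side, else 0."""
    n = len(intensities)
    flags = []
    for i, y in enumerate(intensities):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = intensities[lo:hi]
        is_peak = y == max(window) and window.count(y) == 1
        flags.append(1 if is_peak else 0)
    return flags

# Hypothetical intensity values along the mass-per-charge axis.
spectrum = [1.0, 3.0, 2.0, 2.5, 7.0, 4.0, 4.0, 1.0]
flags = peak_indicators(spectrum, half_window=2)
```

Working with 0/1 indicators rather than raw intensities is how the abstract proposes to sidestep the high coefficients of variation in the intensity measurements; the binary predictors are then combined by boosting.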
13.
Oh JH Nandi A Gurnani P Knowles L Schorge J Rosenblatt KP Gao JX 《Journal of bioinformatics and computational biology》2006,4(6):1159-1179
Ovarian cancer recurs at a rate of 75% within a few months to several years after therapy. Early recurrence, though responding better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown the potential to accurately identify disease biomarkers to help early diagnosis. A major challenge in the interpretation of SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of t-test, binning and backward feature selection. A new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE), is presented for the backward feature selection. This method is an integration of minimum weight feature elimination by SVM-RFE and information theory based redundant/irrelevant feature removal by Markov Blanket. Subsequently, SVM was used for classification. We conducted the biomarker selection algorithm on 113 serum samples to identify early relapse from ovarian cancer patients after primary therapy. To validate the performance of the proposed algorithm, experiments were carried out in comparison with several other feature selection and classification algorithms.
14.
Barla A Jurman G Riccadonna S Merler S Chierici M Furlanello C 《Briefings in bioinformatics》2008,9(2):119-128
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data needs caution like the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by 10^3 times, but it is easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers (e.g. Support Vector Machine, SVM), or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing stability and predictive value of the resulting biomarkers' list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.
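One way to picture the selection-bias point in this abstract: feature selection has to be repeated inside every training fold, so the held-out samples never influence which features are kept. The sketch below is not the DAP itself; the function names, the contiguous unshuffled folds, and the stand-in selector and classifier are all assumptions for illustration.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(xs, ys, k, select_features, fit_predict):
    """Accuracy estimate with feature selection redone in each fold."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Selection sees only the training fold: no leakage from test samples.
        cols = select_features(train_x, train_y)

        def project(row):
            return [row[c] for c in cols]

        reduced_train = [project(r) for r in train_x]
        for i in test_idx:
            pred = fit_predict(reduced_train, train_y, project(xs[i]))
            correct += pred == ys[i]
    return correct / len(xs)

# Hypothetical stand-ins, for illustration only.
def keep_first_column(train_x, train_y):
    return [0]

def threshold_rule(train_x, train_y, x):
    return 1 if x[0] > 0.5 else 0

xs = [[0.1, 9.0], [0.9, 9.0], [0.2, 9.0], [0.8, 9.0]]
ys = [0, 1, 0, 1]
accuracy = cross_validate(xs, ys, 2, keep_first_column, threshold_rule)
```

Selecting features once on the full dataset and then cross-validating only the classifier would let the test samples shape the feature list, which tends to produce optimistic accuracy estimates. A real protocol would also shuffle or stratify the folds, which this sketch omits.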
15.
Navdeep Jaitly Anoop Mayampurath Kyle Littlefield Joshua N Adkins Gordon A Anderson Richard D Smith 《BMC bioinformatics》2009,10(1):1-15
Background
The majority of ovarian cancer biomarker discovery efforts focus on the identification of proteins that can improve the predictive power of presently available diagnostic tests. We here show that metabolomics, the study of metabolic changes in biological systems, can also provide characteristic small molecule fingerprints related to this disease.
Results
In this work, new approaches to automatic classification of metabolomic data produced from sera of ovarian cancer patients and benign controls are investigated. The performance of support vector machines (SVM) for the classification of liquid chromatography/time-of-flight mass spectrometry (LC/TOF MS) metabolomic data focusing on recognizing combinations or "panels" of potential metabolic diagnostic biomarkers was evaluated. Utilizing LC/TOF MS, sera from 37 ovarian cancer patients and 35 benign controls were studied. Optimum panels of spectral features observed in positive or/and negative ion mode electrospray (ESI) MS with the ability to distinguish between control and ovarian cancer samples were selected using state-of-the-art feature selection methods such as recursive feature elimination and L1-norm SVM.
Conclusion
Three evaluation processes (leave-one-out-cross-validation, 12-fold-cross-validation, 52-20-split-validation) were used to examine the SVM models based on the selected panels in terms of their ability for differentiating control vs. disease serum samples. The statistical significance of these feature selection results was comprehensively investigated. Classification of the serum sample test set was over 90% accurate, indicating promise that the above approach may lead to the development of an accurate and reliable metabolomic-based approach for detecting ovarian cancer.
16.
Subha Mahadevi Alladi Shinde Santosh P Vadlamani Ravi Upadhyayula Suryanarayana Murthy 《Bioinformation》2008,3(3):130-133
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection is done using the t-statistic. A comparative study of the class prediction accuracy of three classifiers, namely support vector machine (SVM), neural nets, and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
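Ranking genes by a two-sample t-statistic, as this abstract describes, can be sketched as follows. The abstract does not say which form of the statistic was used, so Welch's unequal-variance version is an assumption here, as are the function names and the toy expression matrix.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic (unequal variances assumed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se else 0.0

def rank_genes(expr, labels, top_k):
    """expr: one row per sample, one column per gene; labels: 0 or 1 per sample.
    Returns the column indices of the top_k genes by absolute t-statistic."""
    scores = []
    for g in range(len(expr[0])):
        pos = [row[g] for row, y in zip(expr, labels) if y == 1]
        neg = [row[g] for row, y in zip(expr, labels) if y == 0]
        scores.append((abs(welch_t(pos, neg)), g))
    # Largest |t| first; ties broken by gene index for reproducibility.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]

# Hypothetical 6-sample, 3-gene matrix, for illustration only.
expr = [[5.0, 1.0, 2.0], [5.2, 1.2, 2.5], [4.9, 0.9, 3.0],
        [5.1, 3.0, 3.1], [5.0, 3.1, 2.4], [4.8, 2.9, 3.6]]
labels = [0, 0, 0, 1, 1, 1]
top_genes = rank_genes(expr, labels, top_k=2)
```

In the study's setting, `top_k` would be 10 and the selected columns would then be passed to each of the three classifiers being compared.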