Similar literature
 Found 16 similar articles; search took 265 ms
1.
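Liu Yujie (刘玉杰), Liu Yihui (刘毅慧). 《生物信息学》 (Chinese Journal of Bioinformatics), 2011, 9(3): 255-258, 262
Feature extraction and classification are key problems in pattern recognition. Combining wavelet analysis with support vector machine (SVM) theory, a classifier model is constructed to separate prostate cancer gene microarray data into cancer and normal classes. Low-frequency wavelet coefficients are extracted to represent the original data and fed to the SVM classifier. Experiments show that the low-frequency coefficients from a 4-level db1 wavelet decomposition yield a correct classification rate of 93.53%, while the Haar wavelet yields 92.94%. Low-frequency coefficients extracted with different wavelets therefore give similar classification performance.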
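As a rough illustration of the feature-extraction step described in entry 1, the sketch below computes the low-frequency (approximation) coefficients of a multi-level Haar decomposition in plain Python (db1 is the Daubechies wavelet of order 1, which coincides with the Haar wavelet). The function name `haar_lowpass` and the padding rule for odd-length inputs are assumptions made for this example, not details taken from the paper, and the SVM stage is omitted.

```python
import math

def haar_lowpass(signal, levels):
    """Return the approximation coefficients after `levels` Haar (db1) steps.

    Hypothetical helper: odd-length inputs are padded by repeating the last
    sample, which is an assumption of this sketch.
    """
    coeffs = [float(v) for v in signal]
    for _ in range(levels):
        if len(coeffs) < 2:
            break  # nothing left to decompose
        if len(coeffs) % 2 == 1:
            coeffs.append(coeffs[-1])
        # Orthonormal Haar low-pass filter: (a + b) / sqrt(2) on each pair.
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs

# A 16-point toy profile shrinks to a single coefficient after 4 levels.
features = haar_lowpass([1.0] * 16, levels=4)
```

Each level halves the length of the profile, so a 4-level decomposition reduces the feature count by roughly a factor of 16 before classification.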

2.
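In the analysis of high-dimensional protein spectral data, feature extraction has long been a problem that many researchers have focused on. This paper proposes a feature extraction method based on wavelet analysis with high-frequency coefficients and principal component analysis (PCA): wavelet analysis is first used to denoise the data and the high-frequency coefficients are extracted as features, after which PCA is applied for dimensionality reduction. Experiments show that the proposed method reaches recognition rates of 100% and 99.45% on the 8-7-02 and 4/3/02 datasets, respectively, and can effectively improve the classification recognition rate.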

3.
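Combining wavelet analysis with support vector machine (SVM) theory, a classifier model is constructed to separate prostate cancer gene microarray data into cancer and normal classes. This paper focuses on feature extraction from gene microarray data using high-frequency wavelet coefficients and compares experimentally how features from high-frequency and low-frequency coefficients affect classifier performance. With high-frequency coefficients from a 3-level haar wavelet decomposition, the correct classification rate is 93.31%; with low-frequency coefficients from a 4-level db1 wavelet decomposition, it is 93.53%. Overall, features from low-frequency coefficients classify better than those from high-frequency coefficients, and classifier performance is stable.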

4.
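Classification of protein homo-oligomers based on amino acid composition distribution   Total citations: 7, self-citations: 0, citations by others: 7
Based on a new feature extraction method, amino acid composition distribution, with support vector machines as the member classifiers and a one-versus-one multi-class strategy, four classes of protein homo-oligomers are classified from the primary sequence. Under 10-fold cross-validation (10-CV), amino acid composition distribution achieves a total classification accuracy of 86.22% and an accuracy index of 67.12%, which are 5.74 and 10.03 percentage points higher than the traditional amino acid composition features and 3.12 and 5.63 percentage points higher than dipeptide composition features. This indicates that amino acid composition distribution is a very effective feature extraction method for protein homo-oligomer classification. Combining it with the protein sequence length feature raises the total classification accuracy and accuracy index to 86.35% and 67.23%, indicating that sequence length carries some spatial structure information.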
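For orientation, the sketch below shows the traditional amino acid composition feature that entry 4 uses as a baseline, together with the enumeration of binary sub-problems in a one-versus-one scheme. It does not implement the paper's "amino acid composition distribution" method, whose details the abstract does not give; the function names, the example sequence, and the class labels are hypothetical.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in a primary sequence."""
    sequence = sequence.upper()
    counted = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counted)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counted]

def one_versus_one_pairs(class_labels):
    """Enumerate the binary sub-problems of a one-versus-one scheme."""
    return list(combinations(class_labels, 2))

# Example sequence invented for illustration only.
vector = amino_acid_composition("MKVLAAGA")
# Hypothetical labels for the four homo-oligomer classes.
pairs = one_versus_one_pairs(["class_1", "class_2", "class_3", "class_4"])
```

With four classes, the one-versus-one strategy trains one binary classifier per pair of classes, six in total, and combines their votes.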

5.
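A tumor recognition model for colon cancer gene expression data is built on wavelet denoising and the support vector machine (SVM). The experimental data are wavelet-decomposed, and the average classification accuracy of the samples, computed by cross-validation, is used to choose the wavelet function and the number of decomposition levels. An energy threshold method is applied to the wavelet decomposition coefficients to denoise the data. A method combining gene classification contribution rates with principal component analysis is proposed to extract features from the colon cancer samples. The strong nonlinear mapping ability of the SVM is then used to classify the samples nonlinearly. To reduce the effect of how the sample set is partitioned on classification accuracy, the support vector classifier is evaluated with the Jackknife test, giving a classification accuracy of 96.77%. The results demonstrate the effectiveness of the method, which may serve as a reference for colon cancer recognition.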
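The Jackknife test mentioned in entry 5 holds out each sample once and trains on the rest, so the result does not depend on any particular train/test split. The sketch below illustrates the procedure under loud assumptions: a nearest-centroid rule stands in for the SVM, and the function names and toy data are invented for this example.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not an SVM): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(features, labels, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Toy data invented for illustration only.
toy_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
toy_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
accuracy = jackknife_accuracy(toy_x, toy_y)
```

Any classifier with the same `predict(train_x, train_y, sample)` shape could be passed in place of the stand-in.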

6.
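The support vector machine (SVM) is a classification algorithm widely applied in many fields, including bioinformatics. This study applies the SVM to classify protein-protein interactions. The interaction data were downloaded from the Saccharomyces cerevisiae dataset of the Munich bioinformatics center and comprise 6736 proteins, with 4837 interacting pairs and 9674 non-interacting pairs. Charge and isoelectric point features of the protein primary structure were extracted and classified with the SVM. The results show a classification accuracy of about 60%, which is nonetheless higher than that of the phylogenetic profile method.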

7.
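To address the limited feature extraction and low classification accuracy of current multi-class motor imagery EEG recognition, a four-class motor imagery EEG recognition method based on multi-feature fusion is proposed to improve the recognition rate. From the preprocessed EEG signals, the Hilbert-Huang transform, one-versus-rest common spatial patterns, approximate entropy, fuzzy entropy, and sample entropy are used to extract an initial feature vector combining time-frequency, spatial, and nonlinear dynamics information; principal component analysis reduces its dimensionality, and a particle swarm optimized support vector machine performs the classification. On the data of subject k3b from the international standard dataset BCI2005 Data set IIIa, processed by MATLAB simulation, the algorithm achieves a recognition rate of 93.30%, higher than the rates obtained with any single feature or with other feature combinations. Motor imagery EEG data were also collected from four experimental subjects, and the proposed method achieved an average recognition rate of 72.96% on them. The results indicate that multi-feature fusion represents motor imagery EEG signals better and that the particle swarm optimized support vector machine achieves high recognition accuracy, providing a new recognition method for human cognitive activity.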

8.
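SVM-based diagnosis of hepatocellular carcinoma using 31P magnetic resonance spectroscopy   Total citations: 1, self-citations: 1, citations by others: 0
The support vector machine (SVM) is a new machine learning method developed on the basis of statistical learning theory and is widely applied in pattern recognition. An SVM model is applied to 31P magnetic resonance spectroscopy data to classify liver tissue, distinguishing hepatocellular carcinoma, cirrhosis, and normal liver. SVM classifiers with polynomial and radial basis function kernels are compared, and recognition rates for the three liver classes are obtained. Experiments show that an SVM classification model based on 31P magnetic resonance spectroscopy data can provide diagnostic predictions for the liver in vivo.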

9.
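Accurate classification of event-related potentials (ERPs) is highly valuable for research on human cognition and for clinical assessment. Because ERP signals are very high-dimensional and contain much information irrelevant to classification, extracting features from them is especially important. The principle and shortcomings of common spatial patterns (CSP) are analyzed, an autoregressive (AR) model is combined with a whitening transform, and a spatio-temporal feature extraction method for ERP classification is proposed. A cognitive experiment is designed to validate the method: features are extracted from the experimental data with both the spatio-temporal method and CSP, the same classifier, a support vector machine (SVM), is trained on each, and the classification results are compared. Experiments show that for ERP classification the spatio-temporal method has a clear advantage over CSP and, with reasonably chosen parameters, achieves a classification accuracy above 90%.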

10.
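The basic principle of the non-negative matrix factorization (NMF) algorithm is introduced, and a method for extracting EEG energy spectrum features with NMF is presented. In an experiment, features were extracted from the EEG signals of 10 subjects performing three different attention tasks, and an artificial neural network was used as the classifier. The results show that NMF has strong feature selection ability in high-dimensional feature spaces and that its classification accuracy is clearly higher than that of principal component analysis (PCA) and the direct method; the accuracies for the three mental tasks reach 84.5%, 88%, and 86.5%, respectively.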

11.
Prognostic prediction is important in the medical domain because predicting a patient's clinical outcomes helps in selecting an appropriate treatment. For high-dimensional data, a typical prognostic method involves two steps: feature selection and prognosis analysis. Recently, the L?-L?-norm Support Vector Machine (L?-L? SVM) has been developed as an effective classification technique and has shown good classification performance with automatic feature selection. In this paper, we extend L?-L? SVM to regression analysis with automatic feature selection. We further improve the L?-L? SVM for prognostic prediction by using the information in censored data as constraints. We design an efficient solution to the new optimization problem. The proposed method is compared with seven other prognostic prediction methods on three real-world data sets. The experimental results show that the proposed method performs consistently better than the medium performance. It is more efficient than other algorithms with similar performance.

12.
With recent advances in mass spectrometry techniques, it is now possible to investigate proteins over a wide range of molecular weights in small biological specimens. This advance has generated data-analytic challenges in proteomics, similar to those created by microarray technologies in genetics, namely, discovery of 'signature' protein profiles specific to each pathologic state (e.g. normal vs. cancer) or differential profiles between experimental conditions (e.g. treated by a drug of interest vs. untreated) from high-dimensional data. We propose a data-analytic strategy for discovering protein biomarkers based on such high-dimensional mass spectrometry data. A real biomarker-discovery project on prostate cancer is taken as a concrete example throughout the paper: the project aims to identify proteins in serum that distinguish cancer, benign hyperplasia, and normal states of prostate using the Surface Enhanced Laser Desorption/Ionization (SELDI) technology, a recently developed mass spectrometry technique. Our data-analytic strategy takes properties of the SELDI mass spectrometer into account: the SELDI output of a specimen contains about 48,000 (x, y) points where x is the protein mass divided by the number of charges introduced by ionization and y is the protein intensity of the corresponding mass per charge value, x, in that specimen. Given high coefficients of variation and other characteristics of protein intensity measures (y values), we reduce the measures of protein intensities to a set of binary variables that indicate peaks in the y-axis direction in the nearest neighborhoods of each mass per charge point in the x-axis direction. We then account for a shifting (measurement error) problem of the x-axis in SELDI output. After this pre-analysis processing of data, we combine the binary predictors to generate classification rules for cancer, benign hyperplasia, and normal states of prostate. 
Our approach is to apply the boosting algorithm to select binary predictors and construct a summary classifier. We empirically evaluate sensitivity and specificity of the resulting summary classifiers with a test dataset that is independent from the training dataset used to construct the summary classifiers. The proposed method performed nearly perfectly in distinguishing cancer and benign hyperplasia from normal. In the classification of cancer vs. benign hyperplasia, however, an appreciable proportion of the benign specimens were classified incorrectly as cancer. We discuss practical issues associated with our proposed approach to the analysis of SELDI output and its application in cancer biomarker discovery.
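The reduction of intensity values to binary peak indicators described in entry 12 can be sketched roughly as follows. The abstract does not specify how the neighborhood is defined or how the x-axis shift is handled, so the window rule, the `half_window` parameter, and the function name are assumptions of this example; boosting is not shown.

```python
def peak_indicators(intensities, half_window):
    """Return 1 where a point is the maximum of its local neighborhood, else 0.

    `half_window` (number of neighbors examined on each side) is a
    hypothetical tuning parameter of this sketch.
    """
    n = len(intensities)
    flags = []
    for i, value in enumerate(intensities):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        neighborhood = intensities[lo:hi]
        # Strict peak: the value is the largest in the window and occurs only once.
        is_peak = value == max(neighborhood) and neighborhood.count(value) == 1
        flags.append(1 if is_peak else 0)
    return flags

# Toy intensity trace invented for illustration only.
trace = [0.1, 0.4, 2.0, 0.5, 0.2, 0.3, 1.5, 0.3, 0.1]
binary = peak_indicators(trace, half_window=2)
```

Working with 0/1 indicators instead of raw intensities is one way to limit the influence of the high coefficients of variation that the abstract mentions.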

13.
Ovarian cancer recurs at a rate of 75% within a few months to several years after therapy. Early recurrence, though it responds better to treatment, is difficult to detect. Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry has shown the potential to accurately identify disease biomarkers to help early diagnosis. A major challenge in the interpretation of SELDI-TOF data is the high dimensionality of the feature space. To tackle this problem, we have developed a multi-step data processing method composed of t-test, binning and backward feature selection. A new algorithm, support vector machine-Markov blanket/recursive feature elimination (SVM-MB/RFE), is presented for the backward feature selection. This method integrates minimum weight feature elimination by SVM-RFE with information theory based redundant/irrelevant feature removal by Markov Blanket. Subsequently, SVM was used for classification. We applied the biomarker selection algorithm to 113 serum samples to identify early relapse in ovarian cancer patients after primary therapy. To validate the performance of the proposed algorithm, experiments were carried out in comparison with several other feature selection and classification algorithms.

14.
The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by 10^3 times, but it is also easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet based), classifiers, e.g. Support Vector Machine (SVM), or feature ranking methods (recursive feature elimination or I-Relief). A procedure for assessing stability and predictive value of the resulting biomarkers' list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and with publicly available datasets from cancer studies.

15.

Background

The majority of ovarian cancer biomarker discovery efforts focus on the identification of proteins that can improve the predictive power of presently available diagnostic tests. We here show that metabolomics, the study of metabolic changes in biological systems, can also provide characteristic small molecule fingerprints related to this disease.

Results

In this work, new approaches to automatic classification of metabolomic data produced from sera of ovarian cancer patients and benign controls are investigated. The performance of support vector machines (SVM) for the classification of liquid chromatography/time-of-flight mass spectrometry (LC/TOF MS) metabolomic data focusing on recognizing combinations or "panels" of potential metabolic diagnostic biomarkers was evaluated. Utilizing LC/TOF MS, sera from 37 ovarian cancer patients and 35 benign controls were studied. Optimum panels of spectral features observed in positive or/and negative ion mode electrospray (ESI) MS with the ability to distinguish between control and ovarian cancer samples were selected using state-of-the-art feature selection methods such as recursive feature elimination and L1-norm SVM.

Conclusion

Three evaluation processes (leave-one-out-cross-validation, 12-fold-cross-validation, 52-20-split-validation) were used to examine the SVM models based on the selected panels in terms of their ability to differentiate control vs. disease serum samples. The statistical significance of these feature selection results was comprehensively investigated. Classification of the serum sample test set was over 90% accurate, indicating that the above approach may lead to the development of an accurate and reliable metabolomic-based approach for detecting ovarian cancer.

16.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection is done using the t-statistic. A comparative study of the class prediction accuracy of 3 different classifiers, viz., support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes selected by us for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
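The t-statistic ranking step in entry 16 can be sketched as below. The abstract does not say which form of the t-statistic was used, so the Welch (unequal variance) form here is an assumption, as are the function names, gene names, and toy values; the classifier comparison is not shown.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Two-sample t-statistic with unequal variances (Welch form).

    Assumes each group has at least two values and nonzero overall variance.
    """
    se = math.sqrt(variance(group_a) / len(group_a) +
                   variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def top_genes_by_t(expression, labels, k):
    """Rank genes by |t| between the two classes and return the top-k names.

    `expression` maps a gene name to its per-sample values; `labels` holds
    0/1 class labels in the same sample order.
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = abs(welch_t(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy expression values and gene names invented for illustration only.
labels = [0, 0, 0, 1, 1, 1]
expression = {
    "gene_a": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "gene_b": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "gene_c": [0.5, 0.7, 0.6, 1.0, 1.4, 0.9],
}
selected = top_genes_by_t(expression, labels, k=2)
```

In a cross-validated evaluation, the ranking would normally be recomputed inside each training fold so that the held-out samples do not influence which genes are selected.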
