Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
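This study investigates the risk features and classification prediction models for hepatitis B virus (HBV) reactivation in patients with primary liver cancer after precise radiotherapy. A feature selection method based on a genetic algorithm is proposed to select the optimal feature subset for HBV reactivation from the initial feature set of primary liver cancer data. Bayesian and support vector machine classification models of HBV reactivation are built, and the classification performance of the optimal feature subset and of the initial feature set is evaluated. The experimental results show that genetic-algorithm-based feature selection improves HBV reactivation classification: the optimal feature subset clearly outperforms the initial feature set. The optimal feature subset affecting HBV reactivation comprises HBV DNA level, TNM tumor stage, Child-Pugh, outer margin, and maximum whole-liver dose. The highest classification accuracy is 82.89% for the Bayesian classifier and 83.34% for the support vector machine.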
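
The genetic-algorithm feature selection described above can be sketched as a search over bit masks, one bit per candidate feature. This is a minimal illustration, not the authors' implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, two-member elitism), the parameter values, and the `toy_fitness` function are all assumptions. In the study the fitness would be the accuracy of a Bayesian or SVM classifier trained on the masked features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Return (best_mask, history); history holds the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        next_pop = [pop[0][:], pop[1][:]]            # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            # tournament selection: best of three random masks, done twice
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)        # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability p_mut per bit
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history

# Hypothetical stand-in for classifier accuracy: reward three "informative"
# feature indices and charge a small cost per selected feature.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)

best_mask, history = ga_select(8, toy_fitness)
print(best_mask, history[-1])
```

Because the two best masks are copied unchanged into each new generation, the best fitness never decreases from one generation to the next.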

2.
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify thermophilic proteins would be helpful for protein engineering and design. In this work, we developed a support vector machine-based method to predict thermophilic proteins using information on amino acid distribution and selected amino acid pairs. A reliable benchmark dataset including 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% of thermophilic proteins and 92.7% of non-thermophilic proteins could be correctly predicted by jackknife cross-validation. The high prediction success rate suggests that this model can be applied to the design of stable proteins.
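
The per-class success rates quoted above come from jackknife (leave-one-out) cross-validation: each protein is held out once, the model is trained on the rest, and the held-out protein is predicted. A minimal sketch follows. To keep it dependency-free it swaps the SVM for a nearest-centroid classifier, which is an assumption made purely for illustration; the function names and toy data are hypothetical as well.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not an SVM): label of the closest class centroid."""
    sums, counts = {}, {}
    for features, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        # squared Euclidean distance from x to the centroid of `label`
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=dist2)

def jackknife_rates(X, y, predict=nearest_centroid_predict):
    """Hold each sample out once; return the fraction predicted correctly per class."""
    correct, total = {}, {}
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        total[y[i]] = total.get(y[i], 0) + 1
        if predict(train_X, train_y, X[i]) == y[i]:
            correct[y[i]] = correct.get(y[i], 0) + 1
    return {label: correct.get(label, 0) / total[label] for label in total}

# Toy two-dimensional feature vectors, invented for the example.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["non", "non", "non", "thermo", "thermo", "thermo"]
print(jackknife_rates(X, y))
```

Any classifier with the same `predict(train_X, train_y, x)` shape can be passed in place of the stand-in.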

3.
Ding S  Zhang S  Li Y  Wang T 《Biochimie》2012,94(5):1166-1171
Knowledge of structural classes plays an important role in understanding protein folding patterns. In this paper, features based on the predicted secondary structure sequence and the corresponding E–H sequence are extracted. Then, an 11-dimensional feature vector is selected based on a wrapper feature selection algorithm and a support vector machine (SVM). Among the 11 selected features, 4 novel features are newly designed to model the differences between the α/β class and the α + β class, and the other 7 rational features were proposed by previous researchers. To examine the performance of our method, a total of 5 datasets are used to design and test the proposed method. The results show that the proposed method achieves prediction accuracies competitive with existing methods (SCPRED, RKS-PPSC and MODAS), and the 4 new features are shown to be essential for differentiating the α/β and α + β classes. A standalone version of the proposed method is written in JAVA and can be downloaded from http://web.xidian.edu.cn/slzhang/paper.html.

4.
Microarrays have thousands to tens-of-thousands of gene features, but only a few hundred patient samples are available. The fundamental problem in microarray data analysis is identifying genes whose disruption causes congenital or acquired disease in humans. In this paper, we propose a new evolutionary method that can efficiently select a subset of potentially informative genes for support vector machine (SVM) classifiers. The proposed evolutionary method uses SVM with a given subset of gene features to evaluate the fitness function, and new subsets of features are selected based on the estimates of generalization error of SVMs and frequency of occurrence of the features in the evolutionary approach. Thus, in theory, selected genes reflect to some extent the generalization performance of SVM classifiers. We compare our proposed method with several existing methods and find that the proposed method can obtain better classification accuracy with a smaller number of selected genes than the existing methods.

5.
Yan X  Chao T  Tu K  Zhang Y  Xie L  Gong Y  Yuan J  Qiang B  Peng X 《FEBS letters》2007,581(8):1587-1593

6.
Structural class characterizes the overall folding type of a protein or its domain, and the prediction of protein structural class has become both an important and a challenging topic in protein science. Moreover, the prediction itself can stimulate the development of novel predictors that may be straightforwardly applied to many other related areas. In this paper, 10 frequently used sequence-derived structural and physicochemical features, which can be easily computed by the PROFEAT (Protein Features) web server, were taken as inputs of support vector machines to develop statistical learning models for predicting the protein structural class. More importantly, a strategy of merging different features, called best-first search, was developed. The rigorous jackknife cross-validation test showed that our method significantly improved the success rates. We anticipate that the present method may also have an important impact on boosting the predictive accuracies for a series of other protein attributes, such as subcellular localization, membrane types, enzyme family and subfamily classes, among many others.
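
One way to read the best-first merging strategy above is as a search over subsets of feature groups in which the most promising open subset is always expanded next. The sketch below is an interpretation under stated assumptions, not the authors' procedure: the stopping rule (`patience`), the group names, and the `toy_score` function are hypothetical, and in practice the score would be a cross-validated SVM success rate.

```python
import heapq

def best_first_merge(groups, score, patience=3):
    """Expand the best-scoring open subset first; stop after `patience`
    consecutive expansions that fail to improve on the best score seen."""
    start = frozenset()
    best, best_score = None, float("-inf")
    frontier = [(-score(start), sorted(start))]   # min-heap on negated score
    seen = {start}
    stale = 0
    while frontier and stale < patience:
        neg_score, members = heapq.heappop(frontier)
        current = frozenset(members)
        if -neg_score > best_score:
            best, best_score, stale = current, -neg_score, 0
        else:
            stale += 1
        for group in groups:                      # children: add one more group
            child = current | {group}
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child), sorted(child)))
    return best, best_score

# Hypothetical gains per feature group, with a small cost per merged group.
GAINS = {"composition": 0.30, "hydrophobicity": 0.20, "polarity": 0.02, "charge": 0.01}

def toy_score(subset):
    return sum(GAINS[g] for g in subset) - 0.05 * len(subset)

best, best_score = best_first_merge(list(GAINS), toy_score)
print(sorted(best), round(best_score, 2))
```

Unlike plain greedy forward selection, the heap lets the search return to an earlier subset when a branch stops improving.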

7.
8.
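Hepatitis B virus (HBV) reactivation is a common complication in patients with primary liver cancer (PLC) after precise radiotherapy, and timely prediction and prevention can reduce morbidity and mortality. Studies have shown that redundant feature variables degrade the accuracy of HBV reactivation prediction. A feature selection method based on neighborhood component analysis (NCA) is proposed to identify the risk factors and feature combinations for HBV reactivation. Support vector machine (SVM) models, before and after Bayesian optimization, are then built to classify these key feature subsets and the initial feature set. The experimental results show that HBV DNA level, KPS score, fractionation scheme, outer margin, V25, TNM tumor stage, and Child-Pugh are all risk factors for HBV reactivation. Among them, V25, found after NCA feature selection, is a risk factor proposed for the first time in HBV reactivation research. Under 10-fold cross-validation, the feature combination of HBV DNA level, outer margin, and V25 reaches a prediction accuracy of 86.11%. The SVM classifier is well suited to the study of HBV reactivation, and the key feature combinations obtained by feature selection have superior classification performance.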

9.
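HIV-1 protease inhibitors play an important role in anti-AIDS therapy, and studying the cleavage sites of HIV-1 protease helps identify new therapeutic targets. To predict HIV-1 protease specificity sites, this study directly characterized the structure of the peptide samples using 531 amino acid physicochemical property parameters from the Amino Acid Index database (AAIndex). Two-layer feature screening reduced the 4248 descriptors to 57. Support vector machine (SVM) models of HIV-1 protease specificity sites were built with four kernel functions, and their accuracy was verified by 10-fold cross-validation and an external test set. The results show that the NormalizePolyKernel kernel gave better SVM models than the other kernels (PolyKernel, PUK, RBFKernel); the resulting model reached a 10-fold cross-validation accuracy of 93.947% on the training set and a prediction accuracy of 93.684% on the external test set.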
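
The descriptor count above is consistent with each sample being an eight-residue peptide described position by position: 8 × 531 = 4248. The sketch below illustrates that encoding under this assumption. Only two property scales are used instead of 531: the Kyte-Doolittle hydropathy scale and a simplified charge indicator invented for the example. The function names and the example octapeptide are likewise illustrative.

```python
# Kyte-Doolittle hydropathy values, standing in for one AAindex entry.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified, made-up second scale: +1 for K/R, -1 for D/E, 0 otherwise.
TOY_CHARGE = {aa: 0 for aa in KYTE_DOOLITTLE}
TOY_CHARGE.update({"K": 1, "R": 1, "D": -1, "E": -1})

def encode_peptide(peptide, property_tables):
    """For each residue in order, append its value under every property table."""
    return [table[residue] for residue in peptide for table in property_tables]

def descriptor_count(peptide_length, n_properties):
    """Length of the encoded vector before any feature screening."""
    return peptide_length * n_properties

vector = encode_peptide("SQNYPIVQ", [KYTE_DOOLITTLE, TOY_CHARGE])
print(len(vector), descriptor_count(8, 531))
```

Feature screening would then operate on columns of this vector, keeping only the informative position-property pairs.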

10.
Qiu JD  Sun XY  Suo SB  Shi SP  Huang SY  Liang RP  Zhang L 《Biochimie》2011,93(7):1132-1138
Many proteins exist in vivo as oligomers with different quaternary structural attributes rather than as individual chains. These proteins are the structural components of various biological functions, including cooperative effects, allosteric mechanisms and ion-channel gating. With the dramatic increase in the number of protein sequences submitted to the public databank, it is important for both basic research and drug discovery research to acquire knowledge about the possible quaternary structural attributes of proteins of interest in a timely manner. A high-throughput method (DWT_SVM), fusing discrete wavelet transform (DWT) and support vector machine (SVM) classifier algorithm with various physicochemical features, has been developed to predict protein quaternary structure. The jackknife accuracies in distinguishing candidate proteins as homo-oligomer or hetero-oligomer using the dataset R2720 were 85.95% and 85.49%, respectively, showing that DWT_SVM is quite promising in predicting protein quaternary structures. The online service is available at http://bioinfo.ncu.edu.cn/Services.aspx. Protein sequences in FASTA format can be directly fed to the system OligoPred. The processed results will be presented in a diagram that includes the information of feature extraction and the classification error rate.
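
To make the DWT step concrete, the sketch below applies a Haar wavelet transform to a numeric profile, such as a protein sequence mapped residue by residue to a physicochemical value, and summarizes each decomposition level by the energy of its detail coefficients. The abstract does not say which wavelet, how many levels, or which summary statistics DWT_SVM uses, so all of those choices are assumptions here, as is the toy signal.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = signal + [signal[-1]]        # pad odd-length input with its last value
    root2 = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / root2 for a, b in pairs]
    detail = [(a - b) / root2 for a, b in pairs]
    return approx, detail

def dwt_energy_features(signal, levels=2):
    """Detail-coefficient energy per level, then the final approximation energy."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_dwt(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features

# Toy numeric profile, e.g. hydropathy values along a short sequence.
profile = [1.8, -4.5, 2.5, -3.5, 4.5, 3.8, -3.9, 1.9]
print(dwt_energy_features(profile))
```

For even-length inputs the Haar transform is orthonormal, so the features sum to the energy of the original signal; a fixed number of levels yields a fixed-length vector for any sequence long enough.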

11.
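To address the limited feature extraction and low classification accuracy of current multi-class motor imagery EEG recognition, a four-class motor imagery EEG recognition method based on multi-feature fusion is proposed to improve the recognition rate. From the preprocessed EEG signals, the Hilbert-Huang transform, one-versus-rest common spatial pattern, approximate entropy, fuzzy entropy, and sample entropy are used to extract an initial feature vector that combines time-frequency, spatial, and nonlinear dynamics information; principal component analysis then reduces its dimensionality, and a support vector machine optimized by particle swarm optimization performs the classification. In MATLAB simulation on the data of subject k3b from the international standard dataset BCI2005 Data set IIIa, the algorithm achieved a recognition rate of 93.30%, higher than that of any single feature or other feature combination. Motor imagery EEG data were also collected from four subjects, and the proposed method achieved an average recognition rate of 72.96% on them. The results show that multi-feature fusion characterizes motor imagery EEG signals better and that the particle-swarm-optimized support vector machine achieves high recognition accuracy, providing a new method for recognizing human cognitive activity.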

12.
13.
14.
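Surface electromyography (sEMG) is a one-dimensional, non-stationary bioelectric time series recorded by sensors on the surface of the relevant muscle groups. It reflects not only the activity of the neuromuscular system but also, equally importantly, the movement of the corresponding limb. Pattern recognition is the foundation and key of EMG applications. To help select suitable algorithms for sEMG-based pattern recognition, this paper reviews human motion recognition algorithms based on sEMG, mainly fuzzy pattern recognition, linear discriminant analysis, artificial neural networks, and support vector machines. Fuzzy pattern recognition can extract fuzzy rules adaptively, is insensitive to rule initialization, and is suited to bioelectric signals such as sEMG that never repeat exactly; linear discriminant analysis reduces the dimensionality of the data and is computationally simple, but is not suited to large datasets; artificial neural networks can describe both the linear and the nonlinear input-output mappings of the training samples, can solve complex classification problems, and have strong learning ability; support vector machines have clear advantages for small-sample, nonlinear, high-dimensional data and compute quickly. Comparing the strengths and weaknesses of these methods provides a reference and basis for choosing pattern recognition algorithms for such problems.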

15.
Zhang SW  Pan Q  Zhang HC  Shao ZC  Shi JY 《Amino acids》2006,30(4):461-468
Summary. The interaction of non-covalently bound monomeric protein subunits forms oligomers. The oligomeric proteins are superior to the monomers within the scope of functional evolution of biomacromolecules. Such complexes are involved in various biological processes, and play an important role. It is highly desirable to predict oligomer types automatically from their sequence. Here, based on the concept of pseudo amino acid composition, an improved feature extraction method of weighted auto-correlation function of amino acid residue index and a Naive Bayes multi-feature fusion algorithm are proposed and applied to predict protein homo-oligomer types. We used the support vector machine (SVM) as the base classifier in order to obtain better results. For example, the total accuracies of A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06% respectively in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of G set based on the conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method of incorporating quasi-sequence-order effect, our method can increase the total accuracy at a level of 3.51 to 1.01%. The total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show: 1) The improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) The Naive Bayes Feature Fusion algorithm and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.

16.
Development of glutamate non-competitive antagonists of mGluR1 (Metabotropic glutamate receptor subtype 1) has attracted increasing attention in recent years due to their potential therapeutic application for various nervous disorders. Since there is no crystal structure reported for mGluR1, ligand-based virtual screening (VS) methods, typically pharmacophore-based VS (PB-VS), are often used for the discovery of mGluR1 antagonists. Nevertheless, PB-VS usually suffers from a low hit rate and enrichment factor. In this investigation, we established a multistep ligand-based VS approach that is based on a support vector machine (SVM) classification model and a pharmacophore model. Performance evaluation of these methods in virtual screening against a large independent test set, M-MDDR, shows that the multistep VS approach significantly increases the hit rate and enrichment factor compared with the individual SVM-based (SB-VS) and PB-VS methods. The multistep VS approach was then used to screen several large chemical libraries including PubChem, Specs, and Enamine. Finally, a total of 20 compounds were selected from the top-ranking compounds and advanced to subsequent in vitro and in vivo studies, the results of which will be reported in the near future.
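
The two metrics compared above can be written down briefly. The sketch uses the commonly used definitions (hit rate = true actives among the selected compounds divided by the number selected; enrichment factor = hit rate divided by the fraction of actives in the whole library); the abstract does not spell out its own formulas, so treating these as the paper's definitions is an assumption. The `multistep_screen` helper, the compound records, and all numbers are invented for illustration.

```python
def hit_rate(n_hits, n_selected):
    """Fraction of the selected compounds that are true actives."""
    return n_hits / n_selected

def enrichment_factor(n_hits, n_selected, n_actives, n_total):
    """Hit rate relative to the fraction of actives in the whole library."""
    return hit_rate(n_hits, n_selected) / (n_actives / n_total)

def multistep_screen(compounds, filters):
    """Apply each filter in turn, keeping only compounds that pass all of them."""
    for keep in filters:
        compounds = [c for c in compounds if keep(c)]
    return compounds

# Hypothetical library records: an SVM score and a pharmacophore match flag.
library = [
    {"id": "c1", "svm_score": 0.91, "pharmacophore_match": True},
    {"id": "c2", "svm_score": 0.84, "pharmacophore_match": False},
    {"id": "c3", "svm_score": 0.12, "pharmacophore_match": True},
]
selected = multistep_screen(
    library,
    [lambda c: c["svm_score"] >= 0.5, lambda c: c["pharmacophore_match"]],
)
print([c["id"] for c in selected])
print(enrichment_factor(n_hits=25, n_selected=500, n_actives=100, n_total=100000))
```

Chaining the filters shrinks the selected set at each step, which is how a multistep screen can raise the hit rate over either filter used alone.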

17.
Gram-negative bacteria have five major subcellular localization sites: the cytoplasm, the periplasm, the inner membrane, the outer membrane, and the extracellular space. The subcellular location of a protein can provide valuable information about its function. With the rapid increase of sequenced genomic data, the need for an automated and accurate tool to predict subcellular localization becomes increasingly important. We present an approach to predict subcellular localization for Gram-negative bacteria. This method uses support vector machines trained on multiple feature vectors based on n-peptide compositions. For a standard data set comprising 1443 proteins, the overall prediction accuracy reaches 89%, which, to the best of our knowledge, is the highest prediction rate ever reported. Our prediction is 14% higher than that of the recently developed multimodular PSORT-B. Because of its simplicity, this approach can be easily extended to other organisms and should be a useful tool for the high-throughput and large-scale analysis of proteomic and genomic data.
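
An n-peptide composition turns a protein sequence into a fixed-length vector holding the relative frequency of every possible run of n consecutive residues: 20 values for n = 1, 400 for n = 2. The sketch below shows the plain version over the 20 standard amino acids. The abstract mentions multiple feature vectors without detailing them, so this covers only the basic idea, and the function name and example sequence are illustrative.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def n_peptide_composition(sequence, n):
    """Relative frequency of each of the 20**n possible n-peptides, in a fixed order."""
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(keys, 0)
    windows = 0
    for i in range(len(sequence) - n + 1):
        fragment = sequence[i:i + n]
        if fragment in counts:        # skip windows containing non-standard residues
            counts[fragment] += 1
            windows += 1
    return [counts[k] / windows if windows else 0.0 for k in keys]

single = n_peptide_composition("MKKLA", 1)   # amino acid composition
pairs = n_peptide_composition("MKKLA", 2)    # dipeptide composition
print(len(single), len(pairs))
```

Vectors for different n can be concatenated, or fed to separate classifiers, before training.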

18.
19.
20.
Long intergenic non-coding RNAs (lincRNAs) are a new type of non-coding RNA and are closely associated with the occurrence and development of diseases. In previous studies, most lincRNAs have been identified through next-generation sequencing. Because lincRNAs exhibit tissue-specific expression, the reproducibility of lincRNA discovery across studies is very poor. In this study, without using lincRNA expression, we used sequence, structural and protein-coding potential features as candidate features to construct a classifier that can distinguish lincRNAs from non-lincRNAs. The GA–SVM algorithm was applied to extract the optimized feature subset. Compared with several other feature subsets, the five-fold cross-validation results showed that this optimized feature subset exhibited the best performance for the identification of human lincRNAs. Moreover, the LincRNA Classifier based on Selected Features (linc-SF) was constructed by support vector machine (SVM) based on the optimized feature subset. The performance of this classifier was further evaluated by predicting lincRNAs from two independent lincRNA sets. Because the recognition rates for the two lincRNA sets were 100% and 99.8%, the linc-SF was found to be effective for the prediction of human lincRNAs.
