Found 20 similar articles (search time: 78 ms)
1.
Pugalenthi G Kumar KK Suganthan PN Gangal R 《Biochemical and biophysical research communications》2008,367(3):630-634
Identification of catalytic residues can provide valuable insights into protein function. With the increasing number of protein 3D structures solved by X-ray crystallography and NMR techniques, it is highly desirable to develop an efficient method to identify their catalytic sites. In this paper, we present an SVM method for the identification of catalytic residues using sequence and structural features. The algorithm was applied to the 2096 catalytic residues derived from the Catalytic Site Atlas database. We obtained an overall prediction accuracy of 88.6% from 10-fold cross validation and 95.76% from the resubstitution test. Testing on the 254 catalytic residues shows our method can correctly predict all 254 residues. This result suggests the usefulness of our approach for facilitating the identification of catalytic residues from protein structures.
2.
MOTIVATION: Fold recognition is a key step in the protein structure discovery process, especially when traditional sequence comparison methods fail to yield convincing structural homologies. Although many methods have been developed for protein fold recognition, their accuracies remain low. This can be attributed to insufficient exploitation of fold discriminatory features. RESULTS: We have developed a new method for protein fold recognition using structural information of amino acid residues and amino acid residue pairs. Since protein fold recognition can be treated as a protein fold classification problem, we have developed a Support Vector Machine (SVM) based classifier approach that uses secondary structural state and solvent accessibility state frequencies of amino acids and amino acid pairs as feature vectors. Among the individual properties examined, secondary structural state frequencies of amino acids gave an overall accuracy of 65.2% for fold discrimination, which is better than the accuracy of any method reported so far in the literature. Combining secondary structural state frequencies with solvent accessibility state frequencies of amino acids and amino acid pairs further improved the fold discrimination accuracy to more than 70%, which is approximately 8% higher than the best available method. In this study we have also tested, for the first time, an all-together multi-class method known as the Crammer and Singer method for protein fold classification. Our studies reveal that the three multi-class classification methods, namely one versus all, one versus one, and the Crammer and Singer method, yield similar predictions. AVAILABILITY: Dataset and stand-alone program are available upon request.
3.
4.
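Outer membrane proteins (OMPs) are a class of proteins with important biological functions. Predicting OMPs by bioinformatics methods can aid the prediction of their secondary and tertiary structures and the discovery of new OMPs in genomes. This paper computes three classes of features from protein sequences: amino acid composition, dipeptide composition, and weighted multi-order correlation coefficients of amino acid residue indices. The three classes are combined, and a support vector machine (SVM) is used to identify OMPs. Recognition results are reported for multiple feature combinations involving four residue indices, and the effects of the order and weight of the correlation coefficients on prediction performance are discussed. In 10-fold cross-validation and independent tests on the dataset, the combined-feature method reached maximum accuracies of 96.96% for OMPs and 97.33% for non-OMPs, outperforming several existing methods. In identifying OMPs in five bacterial genomes, the combined-feature method showed high specificity, and its accuracy on OMPs of known structure in the PDB exceeded 99%. These results indicate that the method can serve as an effective tool for screening OMPs within genomes.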
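The first two feature classes named in this abstract (amino acid composition and dipeptide composition) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the residue-index correlation features are omitted because the abstract does not define them, and non-standard residues are simply ignored.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def combined_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector built this way could then be passed to any SVM implementation; the choice of kernel and parameters is not stated in the abstract.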
6.
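Application of support vector machines to the prediction of pest occurrence   Total citations: 6, self-citations: 0, citations by others: 6
The amount of pest occurrence has a complex nonlinear and time-lagged relationship with its influencing factors. Traditional methods cannot adequately analyze and fit such highly nonlinear dynamics, so their prediction accuracy is unsatisfactory. To model this relationship effectively and improve prediction accuracy, a prediction method based on the support vector machine is proposed. The method first determines the optimal time-lag order of pest occurrence with an F-test and reconstructs the samples using that order. It then screens the influencing factors with a forward floating factor selection method, retaining those that contribute most to the prediction. Finally, 10-fold cross-validation is used to obtain the optimal prediction model. The method was tested and analyzed on the Matlab 7.0 platform using larval density data for the armyworm. The experimental results show that, compared with other prediction methods, the support vector machine improved the prediction accuracy, overcame the shortcomings of traditional methods, and is better suited to nonlinear, small-sample prediction of pest occurrence.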
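The sample reconstruction step described above can be illustrated with a small sketch. It assumes the lag order has already been chosen (the abstract uses an F-test, which is not reproduced here), and the function name and data layout are hypothetical.

```python
def build_lagged_samples(series, factors, lag):
    """Rebuild a time series into (input, target) pairs using `lag` past values.

    series:  pest occurrence values, one per time step.
    factors: one list of influencing-factor values per time step.
    lag:     number of past occurrence values used as inputs (assumed given).
    """
    if len(series) != len(factors):
        raise ValueError("series and factors must have the same length")
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    samples = []
    for t in range(lag, len(series)):
        past = list(series[t - lag:t])      # the `lag` most recent occurrence values
        inputs = past + list(factors[t])    # plus the factors at the predicted step
        samples.append((inputs, series[t]))
    return samples
```

Each pair holds the inputs for one time step and the value to predict; factor screening and model selection by 10-fold cross-validation would follow on top of these samples.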
7.
The β-turn is a type of protein secondary structure that plays an important role in protein configuration and function. Here, we introduce an approach to β-turn prediction that uses the support vector machine (SVM) algorithm combined with predicted secondary structure information. The secondary structure information was obtained by using E-SSpred, a new protein secondary structure prediction method. A 7-fold cross validation based on the benchmark dataset of 426 non-homologous protein chains was used to evaluate the performance of our method. The prediction results broke the 80% Q total barrier and achieved Q total = 80.9%, MCC = 0.44, and a Q predicted value 0.9% higher than that of the best method. The results of our research are consistent with the conclusion that β-turn prediction accuracy can be improved by inclusion of secondary structure information.
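For readers unfamiliar with the two headline metrics, the sketch below computes Q total (taken here to mean the percentage of residues predicted correctly) and the Matthews correlation coefficient (MCC) from binary labels. The function names are illustrative and the label encoding (1 for β-turn, 0 for non-turn) is an assumption.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def q_total(tp, fp, tn, fn):
    """Percentage of all residues that were predicted correctly."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Because β-turn residues are a minority class, MCC is usually reported alongside Q total: a predictor that labels everything as non-turn can still score a high Q total while its MCC stays at zero.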
8.
Background
We apply a new machine learning method, the Support Vector Machine (SVM), to predict protein structural class. The SVM is trained on a database derived from SCOP, in which protein domains are classified based on known structures, evolutionary relationships, and the principles that govern their 3-D structure.
9.
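A new method is proposed for predicting β-hairpins, a supersecondary structure motif, from protein sequences. Sequence information is represented by a vector composed of increments of diversity; six increments of diversity are fed into a support vector machine, which searches for the optimal hyperplane in the six-dimensional vector space to separate β-hairpins from non-β-hairpins. The results show that the algorithm has good predictive power for β-hairpins. On the training set, 5-fold cross-validation gave an overall accuracy of 81.24%, a correlation coefficient of 0.57, and a β-hairpin sensitivity of 83.06%; on the independent test set, the overall accuracy was 78.34%, the correlation coefficient 0.56, and the β-hairpin sensitivity 77.24%. Applying the model to 63 proteins from CASP6 gave good results.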
10.
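Comparative sequence analysis is the most reliable approach to RNA secondary structure prediction, and many algorithms have been developed for it. Here, structure prediction based on this approach is treated as a binary classification problem: given the information available from a sequence alignment, decide whether any two columns of the alignment can form a base pair. The classifier is a support vector machine, and the feature vector comprises covariation information, thermodynamic information, and the fraction of complementary bases. Because covariation information depends on sequence similarity, a sequence-similarity influence factor is introduced to adjust the relative influence of covariation and thermodynamic information at different levels of sequence similarity, which improves prediction accuracy. Validation on 49 Rfam-seed alignments demonstrates the effectiveness of the method; its prediction accuracy is better than that of most comparable algorithms, and it can predict simple pseudoknots.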
11.
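Promoter prediction is an important step in studying the transcriptional regulation of genes, but the accuracy of existing algorithms is low. Based on an in-depth analysis of the biological characteristics of promoters, a support vector machine based algorithm is proposed for predicting promoters in Bacillus subtilis. Nine typical features are selected from the compositional, signal, and structural features of promoter sequences as the basis for prediction. For the signal features, the distribution of spacer distances is considered in addition to the consensus sequences of the conserved motifs. A feature description model first computes the score of each feature in promoter and non-promoter sequences; the scores are combined into a 9-dimensional feature vector, and a support vector machine is then trained and used for discrimination on the set of feature vectors. Jackknife tests on real datasets confirmed the effectiveness of the algorithm. For σ promoters, the average accuracy reached 90.7%; for promoters of several other σ factors, the average accuracy also exceeded 80%. The algorithm is widely applicable and readily extensible: new features can be incorporated easily, allowing recognition performance to keep improving.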
12.
Background
Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues to understanding these activities and aid in functional annotation; however, insufficient work has been done on identifying heme binding residues from protein sequence information.
Methods
We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis.
Results
Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests.
Conclusions
The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.
13.
14.
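With the completion of genome sequencing for many organisms, large amounts of DNA sequence data have become available, greatly facilitating the search for horizontally transferred genes in genomes. Here, gene sequence feature analysis is combined with the support vector machine technique to detect horizontally transferred genes from differences in gene sequence features. Building on earlier work, the absolute codon usage frequency (FCU) is chosen as the sequence feature, mainly because it contains information on both the codon usage bias of a gene and the amino acid composition of the encoded protein; using this information, the support vector machine can analyze and predict horizontally transferred genes more accurately. In addition, a new strand-specific prediction method is proposed, in which genes on the leading and lagging strands of a bacterial genome are treated separately and horizontally transferred genes are predicted for each strand. The results show that the basic prediction method outperforms the scoring algorithm based on octanucleotide frequencies proposed by Tsirigos et al., currently the best-performing method, with a relative improvement in hit rate of up to 31.47%, and that the strand-specific method achieves still better results.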
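As a rough illustration of a codon usage feature, the sketch below counts each of the 64 codons in a coding sequence and divides by the total number of codons. This is one plausible reading of an absolute codon usage frequency; the abstract does not give the exact FCU definition, so the normalization and the function name are assumptions.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons, fixed order


def codon_usage_frequencies(cds):
    """Return the fraction of each of the 64 codons in a coding sequence.

    The sequence is read in frame from position 0; trailing bases that do
    not complete a codon, and codons with ambiguous bases, are ignored.
    """
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    usable = len(cds) - len(cds) % 3
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]
```

Because every codon maps to one amino acid (or a stop), a 64-value vector of this kind reflects both codon bias and amino acid composition, which is the reason the abstract gives for choosing it.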
15.
S. Parfait P.M. Walker G. Créhange X. Tizon J. Mitéran 《Biomedical signal processing and control》2012,7(5):499-508
Prostate cancer is the most common cancer in men over 50 years of age, and it has been shown that nuclear magnetic resonance spectra are sensitive enough to distinguish normal and cancer tissues. In this paper, we propose a technique for classifying spectra from magnetic resonance spectroscopy. We studied automatic classification with and without quantification of metabolite signals. The dataset is composed of 22 patient datasets with a biopsy-proven cancer, from which we extracted 2464 spectra from the whole prostate, of which 1062 were localised in the peripheral zone. The spectra were manually classed into 3 different categories by a spectroscopist with 4 years' experience in clinical spectroscopy of prostate cancer: undetermined, healthy and pathologic. We used different preprocessing methods (module, phase correction only, phase correction and baseline correction) as input for Support Vector Machine and for Multilayer Perceptron, and we compared the results with those from the expert. If we class only healthy and pathologic spectra we reach a total error rate of 4.51%. However, if we class all spectra (undetermined, healthy and pathologic) the total error rate rises to 11.49%. We have shown in this paper that the best results are obtained using the pre-processed spectra without quantification as input for the classifiers, and we confirm that Support Vector Machines are more efficient than Multilayer Perceptrons in processing high dimensional data.
16.
Cornelia Caragea Jivko Sinapov Adrian Silvescu Drena Dobbs Vasant Honavar 《BMC bioinformatics》2007,8(1):438
Background
Glycosylation is one of the most complex post-translational modifications (PTMs) of proteins in eukaryotic cells. Glycosylation plays an important role in biological processes ranging from protein folding and subcellular localization, to ligand recognition and cell-cell interactions. Experimental identification of glycosylation sites is expensive and laborious. Hence, there is significant interest in the development of computational methods for reliable prediction of glycosylation sites from amino acid sequences.
17.
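Prediction of membrane protein folding types based on a fuzzy support vector machine   Total citations: 1, self-citations: 0, citations by others: 1
Existing methods that predict membrane protein folding types with support vector machines (SVMs) do not make full use of protein sequence features, and they leave unclassifiable regions when handling multi-class protein classification. To address these two problems, the amino acid composition and dipeptide composition of the protein sequence are extracted, weighted multi-order correlation coefficients of amino acid residue indices are computed, and the three classes of features are fused into the input feature vector of the classifier. A fuzzy SVM (FSVM) algorithm is used to classify data that a conventional SVM cannot separate. Tests on a non-redundant dataset show that, with the same classification algorithm, the improved feature extraction method outperforms existing feature extraction methods, and that, with the same feature extraction method, FSVM outperforms the conventional SVM. The classification strategy combining the two reached a prediction accuracy of 96.6% on the independent test dataset, better than several existing prediction methods, and can serve as an effective tool for predicting the folding types of membrane proteins and other proteins.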
18.
Lipid–protein interactions play a vital role in various biological processes; they are involved in cellular functions and can affect the stability, folding and function of peptides and proteins. In this study, a sequence-based method using a support vector machine and the position specific scoring matrix (PSSM) was proposed to predict lipid-binding sites. To account for the influence of the residues surrounding each amino acid, a sliding window was used to encode the PSSM profiles. By incorporating the evolutionary information and the local features of residues surrounding a lipid-binding site, the method yielded an accuracy of 80.86% and a Matthews correlation coefficient of 0.58 in a fivefold cross validation test. This result indicates the applicability of the method.
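The sliding-window encoding mentioned above can be sketched as follows. The window size and the zero padding at the sequence ends are assumptions, since the abstract states neither, and the function name is hypothetical.

```python
def window_features(pssm, center, window_size):
    """Flatten the PSSM rows in a window centred on one residue.

    pssm:        list of rows, one per residue (20 scores each in a real PSSM).
    center:      index of the residue being classified.
    window_size: odd number of residues in the window (assumed parameter).
    Positions that fall outside the sequence are padded with zeros.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    width = len(pssm[0])
    features = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0] * width)  # zero padding past either end
    return features
```

With a 20-column PSSM and a window of w residues, each residue is described by 20 × w values, which form the input vector for the classifier.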
19.
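Computational prediction of intrinsic transcription terminators is an important part of research on the transcriptional regulation of genes, but the specificity of current methods is low. Based on an in-depth analysis of characteristic signals of Escherichia coli intrinsic terminators, such as the RNA hairpin structure and the poly-thymine region, a feature set of five variables covering sequence composition, local conformation, and energy distribution is constructed for intrinsic terminators, and a support vector machine based prediction method is implemented on this feature set. Six-fold cross-validation on a dataset of E. coli intrinsic terminators and a negative control set from coding regions confirmed the effectiveness of the method, with an average prediction accuracy of 99.4% on known data. In a search over restricted regions of the whole E. coli genome, the method identified the great majority of known intrinsic terminators; compared with several other commonly used methods, the total number of predictions was greatly reduced and the specificity markedly improved.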
20.
MOTIVATION: The prediction of ligand-binding residues or catalytically active residues of a protein may give important hints that can guide further genetic or biochemical studies. Existing sequence-based prediction methods mostly rank residue positions by evolutionary conservation calculated from a multiple sequence alignment of homologs. A problem hampering more widespread application of these methods is the low per-residue precision, which at 20% sensitivity is around 35% for ligand-binding residues and 20% for catalytic residues. RESULTS: We combine information from the conservation at each site, its amino acid distribution, as well as its predicted secondary structure (ss) and relative solvent accessibility (rsa). First, we measure conservation by how much the amino acid distribution at each site differs from the distribution expected for the predicted ss and rsa states. Second, we include the conservation of neighboring residues in a weighted linear score by analytically optimizing the signal-to-noise ratio of the total score. Third, we use conditional probability density estimation to calculate the probability of each site being functional given its conservation, the observed amino acid distribution, and the predicted ss and rsa states. We have constructed two large data sets, one based on the Catalytic Site Atlas and the other on PDB SITE records, to benchmark methods for predicting functional residues. The new method FRcons predicts ligand-binding and catalytic residues with higher precision than alternative methods over the entire sensitivity range, reaching 50% and 40% precision at 20% sensitivity, respectively. AVAILABILITY: Server: http://frpred.tuebingen.mpg.de. Data sets: ftp://ftp.tuebingen.mpg.de/pub/protevo/FRpred/.
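One common way to quantify how much an observed amino acid distribution differs from an expected one is the relative entropy (Kullback-Leibler divergence). The sketch below uses it purely as an illustration of the first step; the abstract does not say which divergence FRcons uses, and the pseudocount and function name are assumptions.

```python
import math


def relative_entropy(observed, expected, pseudocount=1e-6):
    """Relative entropy (in bits) of an observed distribution against an expected one.

    observed, expected: dicts mapping amino acid letters to probabilities or counts.
    A small pseudocount (assumed value) avoids division by zero and log(0).
    """
    letters = sorted(set(observed) | set(expected))
    p = [observed.get(a, 0.0) + pseudocount for a in letters]
    q = [expected.get(a, 0.0) + pseudocount for a in letters]
    p_sum, q_sum = sum(p), sum(q)
    score = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / p_sum, qi / q_sum  # renormalise after adding pseudocounts
        score += pi * math.log2(pi / qi)
    return score
```

A site whose alignment column matches the background expected for its predicted ss and rsa states scores near zero, while a column dominated by one unexpected residue scores high.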