1.
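Starting from the physicochemical properties of amino acids and borrowing the physics ideas of "coarse-graining" and "grouping", a new feature extraction method for protein sequences is proposed: grouped weight encoding. With the component-coupled algorithm as the classifier, the subcellular localization of apoptosis proteins is studied from the primary protein sequence. On the dataset used by Zhou and Doctor, the overall prediction accuracies under the re-substitution and jackknife tests were 98.0% and 85.7%, respectively, which are 7.2% and 13.2% higher than those obtained with amino acid composition and the component-coupled algorithm. On the dataset used by Chen Yingli and Li Qianzhong, the overall accuracies under the re-substitution and jackknife tests were 94.0% and 80.1%, which are 5.9% and 2.0% higher than those obtained with dipeptide composition and the increment of diversity algorithm. On our own newly compiled dataset, the overall accuracies under the re-substitution and jackknife tests were 97.33% and 75.11%. The results show that grouped weight encoding of protein sequences is an effective feature extraction method for studying the localization of apoptosis proteins.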
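The abstract does not spell out how the encoding works. A minimal sketch, assuming the grouped-weight scheme as it is commonly described: residues are split into four classes (nonpolar, polar, acidic, basic), three binary partitions of those classes turn the sequence into three 0/1 sequences, and each is summarized by the fraction of 1s in L nested prefixes. The class membership and the number of segments L are assumptions, not values taken from the paper.

```python
# Hypothetical grouped-weight encoding sketch; class assignments are assumptions.
C1 = set("AFGILMPVW")  # neutral, nonpolar
C2 = set("CNQSTY")     # neutral, polar
C3 = set("DE")         # acidic
C4 = set("HKR")        # basic

# Each partition maps a residue to 1 if it falls in the listed union of classes.
PARTITIONS = [
    C1 | C2,  # {C1, C2} vs {C3, C4}
    C1 | C3,  # {C1, C3} vs {C2, C4}
    C1 | C4,  # {C1, C4} vs {C2, C3}
]

def grouped_weight_features(seq: str, L: int = 5) -> list[float]:
    """Return a 3*L vector: for each partition, the weight of L nested prefixes."""
    seq = seq.upper()
    n = len(seq)
    if n < L:
        raise ValueError("sequence is shorter than the number of segments L")
    features = []
    for group in PARTITIONS:
        binary = [1 if aa in group else 0 for aa in seq]
        for i in range(1, L + 1):
            prefix_len = (i * n) // L  # length of the i-th nested prefix
            features.append(sum(binary[:prefix_len]) / prefix_len)
    return features
```

For example, grouped_weight_features("ADCK", L=2) yields six numbers, two per partition; these vectors would then be fed to whatever classifier is used.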
2.
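In the post-genomic era, with gene sequences deciphered and systems biology experiments developing rapidly, large amounts of protein-protein interaction data have been produced. Using these data to find functional modules and predict protein function is of great significance in functional genomics. Departing from the traditional clustering paradigm based on pairwise protein similarity, this work starts directly from protein functional groups, considers first- and second-order interactions between functional groups, and proposes a modular clustering method (MCM) that clusters the experimental data to predict the functions of uncharacterized proteins within modules. The predictive ability and stability of the clustering results are analyzed with the hypergeometric P-value method and by adding, deleting, and modifying interactions. The results show that MCM achieves high prediction accuracy and coverage, with good fault tolerance and stability. In addition, MCM produced high-accuracy predictions for some uncharacterized proteins, which should help guide biological experiments, and the algorithm is broadly applicable to other networks with similar structure.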
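The hypergeometric P-value used to assess the clusters can be computed directly. A minimal sketch, assuming the standard enrichment form: among N annotated proteins, F carry a given function; the probability that a module of n proteins contains at least k of them by chance is the upper tail of the hypergeometric distribution. The function and parameter names are illustrative.

```python
from math import comb

def hypergeometric_pvalue(N: int, F: int, n: int, k: int) -> float:
    """P(X >= k) when n of N proteins are drawn and F of the N carry the function."""
    if not (0 <= F <= N and 0 <= n <= N):
        raise ValueError("invalid population or module size")
    total = comb(N, n)
    # Sum the upper tail directly rather than computing 1 - CDF.
    # math.comb returns 0 when the lower argument exceeds the upper one.
    upper = sum(
        comb(F, i) * comb(N - F, n - i)
        for i in range(max(k, 0), min(F, n) + 1)
    )
    return upper / total
```

A small P-value means the function is over-represented in the module, which is what justifies transferring it to the module's uncharacterized members.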
3.
Interaction-based protein function prediction (total citations: 1; self-citations: 0; citations by others: 1)
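Protein function prediction is a hot topic in post-genomic research. Interaction-based protein function prediction methods are widely used, but their accuracy is low when the number k of interacting partners is small. Starting from the protein interaction network and exploiting its "small-world" properties, this work effectively addresses the low accuracy for small k. On the protein interaction network of yeast (Saccharomyces cerevisiae), the method achieves somewhat higher prediction accuracy for k ≤ 4 than the GO (global optimization) method under the same conditions. The results show that the method can be applied effectively to function prediction for proteins with few interacting partners.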
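The abstract does not give the algorithm itself. As a point of reference, here is a minimal sketch of the classic neighbor-counting (majority-vote) baseline, with a hypothetical extension that also lets second-order neighbors vote, at a lower weight, when a protein has few direct partners. The cutoff k_small=4 mirrors the k ≤ 4 regime in the abstract; the weight w2 and the whole fallback rule are assumptions, not the authors' method.

```python
from collections import Counter

def predict_functions(graph, annotations, protein, top=1, k_small=4, w2=0.5):
    """Rank candidate functions for a protein by (weighted) neighbor votes.

    graph: dict mapping protein -> set of interacting partners.
    annotations: dict mapping protein -> set of known function labels.
    """
    votes = Counter()
    partners = graph.get(protein, set())
    for p in partners:
        for f in annotations.get(p, ()):
            votes[f] += 1.0
    # Hypothetical fallback: with few partners, second-order neighbors
    # (partners of partners) also vote, with the smaller weight w2.
    if len(partners) <= k_small:
        second = set()
        for p in partners:
            second |= graph.get(p, set())
        second -= partners | {protein}
        for q in second:
            for f in annotations.get(q, ()):
                votes[f] += w2
    return [f for f, _ in votes.most_common(top)]
```

Using second-order neighbors is one way to exploit the short path lengths of a small-world network; the paper may use a different rule.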
4.
5.
6.
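Proteins are indispensable compounds in living organisms and play many important roles in life processes; understanding protein function supports research in medicine, drug development, and related fields. In addition, the use of enzymes in green synthesis has long attracted attention, but because enzymes are so diverse in type and function, obtaining an enzyme with a specific function is costly, which limits further application. At present, the specific functions of proteins are determined mainly by experimental characterization, which is laborious and time-consuming. Meanwhile, with the rapid development of bioinformatics and sequencing technology, the number of sequenced proteins far exceeds the number with functional annotations, so efficient prediction of protein function has become essential. With the rapid development of computing, data-driven machine learning methods have become an effective answer to these challenges. This article outlines protein function and its annotation methods, as well as the history and workflow of machine learning, focuses on applications of machine learning to enzyme function prediction, and offers an outlook on future directions for AI-assisted, efficient protein function research.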
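The general machine learning workflow the review describes (turn sequences into numerical features, fit a model on annotated examples, predict labels for unannotated sequences) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a method from the review: amino acid composition is the only feature, the classifier is k-nearest neighbors, and the training sequences and enzyme-class labels are toy, hypothetical data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """20-dimensional amino acid composition (fraction of each standard residue)."""
    counts = Counter(seq.upper())
    n = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / n for a in AMINO_ACIDS]

def knn_predict(train, query_seq, k=3):
    """train: list of (sequence, label) pairs.

    Returns the majority label among the k training sequences whose
    composition vectors are closest (Euclidean distance) to the query.
    """
    q = composition(query_seq)
    dists = sorted((math.dist(q, composition(s)), label) for s, label in train)
    top = Counter(label for _, label in dists[:k])
    return top.most_common(1)[0][0]

# Toy, hypothetical training data for illustration only.
train = [
    ("AAAAGG", "hydrolase"), ("AAAGGG", "hydrolase"),
    ("KKKRRE", "transferase"), ("KKRRRE", "transferase"),
]
```

With this toy set, knn_predict(train, "AAAAGA") returns "hydrolase". Real enzyme-function predictors replace both the features and the model with much richer choices, but the fit-then-predict shape is the same.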
7.
8.
9.
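With growing computing power and the rapid expansion of biological data, using bioinformatics to solve biological problems has gradually become mainstream. Protein function prediction is an important task in biomedical and drug research, and bioinformatics-based protein function prediction has become a research hotspot. This article groups bioinformatics-based protein function prediction methods into three categories: sequence-based methods, structure-based methods, and methods based on protein-protein interaction networks. It further analyzes and summarizes the specific algorithms and recent advances of these methods, providing a reference for deeper exploration of protein function prediction in biomedical and drug research.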
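The simplest sequence-based approach transfers the annotation of the most similar annotated sequence. A minimal sketch, assuming 3-mer Jaccard similarity as a cheap stand-in for alignment tools such as BLAST, and a hypothetical minimum-similarity cutoff below which no prediction is made.

```python
def kmers(seq, k=3):
    """Set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def transfer_annotation(query, annotated, k=3, min_sim=0.2):
    """Return (label, similarity) of the best hit, or (None, similarity) below min_sim.

    annotated: dict mapping sequence -> function label.
    min_sim is a hypothetical cutoff, not a value from the literature.
    """
    q = kmers(query, k)
    best_label, best_sim = None, 0.0
    for seq, label in annotated.items():
        sim = jaccard(q, kmers(seq, k))
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < min_sim:
        return None, best_sim
    return best_label, best_sim
```

Returning None below the cutoff reflects the usual caution with homology transfer: at low similarity, borrowed annotations become unreliable.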
10.
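Proteins are the most essential and most versatile macromolecules in living organisms, and understanding their functions is crucial for progress in science and agriculture. In the post-genomic era, large numbers of protein sequences of unknown structure and function have rapidly accumulated in the NCBI databases, and these sequences have even become a research hotspot. Over recent decades, protein function prediction methods have been steadily refined. The earliest methods, based only on protein sequence or 3D structure, have given rise to many newer ones based on sequence similarity, structural motifs, interaction networks, and more. These newer methods adopt new algorithms, research ideas, and techniques, aiming for prediction methods that are both accurate and general enough to be widely applied. This article reviews recent protein function prediction methods, classifies them, and sets out the strengths and weaknesses of each class.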
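Motif-based methods assign a function when a sequence contains a signature associated with it. A minimal sketch, assuming sequence motifs written in PROSITE-style pattern syntax, a simpler relative of the structural motifs the review mentions. The converter handles only the basic elements (x, [...], {...}, and repeat counts (n) or (n,m)) and does not support the < and > terminal anchors. The example pattern [AG]-x(4)-G-K-[ST] is the well-known P-loop (ATP/GTP-binding motif A).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE-style pattern to a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Core element, optionally followed by a repeat count (n) or (n,m).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            rx = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            rx = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            rx = core                       # single residue or [..] class
        if lo is not None:
            rx += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(rx)
    return "".join(parts)

def find_motif(seq: str, pattern: str) -> list[int]:
    """0-based start positions of non-overlapping motif matches."""
    rx = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in rx.finditer(seq.upper())]
```

For instance, find_motif("MGAGGVGKSALT", "[AG]-x(4)-G-K-[ST]") reports a single hit starting at position 1.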
11.
12.
It is well known in the literature that an ensemble of classifiers generally performs better than a stand-alone method. Hence, it is important to develop ensemble methods well suited to bioinformatics data. In this work, we propose combining the feature extraction method based on grouped weight with a set of amino-acid alphabets obtained by a genetic algorithm. The proposed method is applied to predicting DNA-binding proteins. As classifiers, a linear support vector machine and a radial basis function support vector machine are tested. As performance indicators, the accuracy and the Matthews correlation coefficient are reported. The Matthews correlation coefficient obtained by our ensemble method is approximately 0.97 under jackknife cross-validation. This result outperforms those reported in the literature on the same dataset with features extracted directly from the amino-acid sequence.
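Both reported indicators follow from the binary confusion matrix: accuracy is (TP + TN) / total, and MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A minimal sketch; returning 0 when a marginal total is zero is a common convention, assumed here.

```python
import math

def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) from confusion counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any row or column total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes are unbalanced, which is why it is often reported alongside accuracy.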
13.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological roles. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an online workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms, and protein-ligand binding sites. All predictions are tagged with a confidence score that estimates their accuracy without requiring experimental data. To accommodate special requests from end users, the server accepts user-specified inter-residue distance and contact maps that interactively guide the I-TASSER modeling; it also allows users to specify any proteins as templates, or to exclude any template proteins, during the structure assembly simulations. Users can collect such structural information from experimental evidence or biological insight to improve the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in recent community-wide CASP experiments. There are currently more than 20,000 registered scientists from over 100 countries using the online I-TASSER server.
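For I-TASSER structure models, the confidence score mentioned above is usually reported as the C-score. To the best of our knowledge it is defined as C-score = ln[(M / M_tot) · (1 / ⟨RMSD⟩) · ∏ Z(i) / Z0(i)], where M is the number of decoys in the model's cluster, M_tot the total number of decoys, ⟨RMSD⟩ the average RMSD of the cluster decoys to the cluster centroid, and Z(i) and Z0(i) the best threading Z-score and its cutoff for threading program i. Treat this definition, and the sketch below, as an assumption to be checked against the server's own documentation; the function is illustrative, not I-TASSER code.

```python
import math

def c_score(m: int, m_tot: int, mean_rmsd: float,
            z: list[float], z0: list[float]) -> float:
    """Illustrative C-score under the definition assumed above."""
    if len(z) != len(z0):
        raise ValueError("z and z0 must have one entry per threading program")
    # Cluster density term: larger, tighter clusters give higher confidence.
    ratio = (m / m_tot) * (1.0 / mean_rmsd)
    # Threading term: how far each program's best hit exceeds its cutoff.
    for zi, z0i in zip(z, z0):
        ratio *= zi / z0i
    return math.log(ratio)
```

The structure of the formula explains the intuition behind the score: confidence rises when many simulation decoys converge on the same fold and when threading templates are significant.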
14.
15.
16.
Bioinformatics methods for predicting protein-protein interactions (total citations: 8; self-citations: 0; citations by others: 8)
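In the post-genomic era, the research paradigm has shifted from sequence → structure → function toward gene expression → system dynamics → physiological function. Building a complete network of protein-protein interactions, the interactome, will help deepen the understanding of cell structure and function from a systems perspective and provide a theoretical basis for discovering new drug targets and for drug design. A series of experimental methods for systematically analyzing protein interactions has been established, and in recent years a variety of bioinformatics methods for predicting protein interactions have appeared. These methods are not only valuable complements to traditional experimental methods but can also extend their coverage; moreover, developing them has established some important concepts in molecular evolution and molecular biology. This article reviews the principles, evaluation, and open problems of nine bioinformatics methods and discusses the prospects of the field.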
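The abstract does not list the nine methods. As one illustration of how such predictors work, here is a minimal sketch of the phylogenetic-profile idea, a widely used approach: proteins whose homologs are present or absent together across many genomes are predicted to be functionally linked or interacting. The profiles, the distance cutoff, and the filter for uninformative profiles are assumptions for illustration.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of genomes in which two presence/absence profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))

def predict_links(profiles: dict[str, str], max_dist: int = 0) -> list[tuple[str, str]]:
    """Pairs of proteins whose profiles differ in at most max_dist genomes.

    profiles: protein -> string of '1' (homolog present) or '0' (absent),
    one character per reference genome.
    """
    links = []
    for p, q in combinations(sorted(profiles), 2):
        # Assumed filter: profiles present everywhere or nowhere carry no signal.
        if set(profiles[p]) != {"0", "1"} or set(profiles[q]) != {"0", "1"}:
            continue
        if hamming(profiles[p], profiles[q]) <= max_dist:
            links.append((p, q))
    return links
```

Real implementations typically use hundreds of genomes and probabilistic or mutual-information scores rather than a raw Hamming cutoff.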
17.
Lili Xi. Journal of Theoretical Biology, 2010, 264(4): 1159-1168
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates and to build a global predictive model. In addition, the local lazy regression (LLR) method was used to predict protein folding rates. The results indicate that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will be helpful for understanding the mechanism of protein folding. They also demonstrate that amino acid sequence autocorrelation features are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
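The features here are sequence autocorrelations weighted by residue properties. A minimal sketch, assuming the normalized Moreau-Broto form AC(d) = (1 / (N - d)) Σ_{i=1}^{N-d} P_i · P_{i+d}, with the Kyte-Doolittle hydrophobicity scale as the property P and lags d = 1..max_lag. The paper's actual properties, normalization, and lags may differ.

```python
# Kyte-Doolittle hydrophobicity scale (assumed property for illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def moreau_broto(seq: str, scale: dict = KD, max_lag: int = 5) -> list[float]:
    """Autocorrelation AC(d) of a residue property along seq, for d = 1..max_lag."""
    # Non-standard residues are skipped rather than raising an error.
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence too short for the requested lags")
    return [
        sum(values[i] * values[i + d] for i in range(n - d)) / (n - d)
        for d in range(1, max_lag + 1)
    ]
```

Repeating this over several property scales and concatenating the results gives the kind of large feature pool from which a GA can select a subset for regression.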
18.
Reinhard Lohmann, Gisbert Schneider, Dirk Behrens, Paul Wrede. Protein Science: A Publication of the Protein Society, 1994, 3(9): 1597-1601
The architecture and weights of an artificial neural network model that predicts putative transmembrane sequences have been developed and optimized by the algorithm of structure evolution. The resulting filter is able to classify membrane/nonmembrane transition regions in sequences of integral human membrane proteins with high accuracy. Similar results have been obtained for both training and test set data, indicating that the network has focused on general features of transmembrane sequences rather than specializing on the training data. Seven physicochemical amino acid properties have been used for sequence encoding. The predictions are compared to hydrophobicity plots.
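The network's predictions are compared to hydrophobicity plots. For reference, a minimal sketch of such a plot: a Kyte-Doolittle sliding-window average, with windows scoring above a threshold flagged as candidate transmembrane segments. The window of 19 residues and the threshold of 1.6 are the values commonly quoted for Kyte-Doolittle plots and are assumptions here, not parameters from the paper.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; element i covers seq[i:i + window].

    Raises KeyError for non-standard residue letters.
    """
    vals = [KD[aa] for aa in seq.upper()]
    return [sum(vals[i:i + window]) / window for i in range(len(vals) - window + 1)]

def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """Start indices of windows whose mean hydropathy exceeds the threshold."""
    return [i for i, v in enumerate(hydropathy_profile(seq, window)) if v > threshold]
```

A plot of this kind uses a single property with a fixed rule, whereas the network described above learns from seven properties at once, which is the point of the comparison.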
19.
Karthikeyan Pallipalayam Periyasamy Palaniswamy Thanga Velan Lakshmi Chinmay Kumar Dwibedi Arunachalam Annamalai. Bioinformation, 2009, 4(5): 184-186
The human Y chromosome is the sex-determining chromosome. It has 196 associated proteins, 107 of which have not yet been characterised. Here, we describe the analysis of these 107 proteins by computing various physicochemical properties from sequence and predicted structural data to elucidate their molecular function. We present the derived data as a database made freely available for download, review, refinement and update.
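Sequence-derived physicochemical properties of this kind are simple to compute. A minimal sketch of two common ones: average molecular weight (sum of average residue masses plus one water) and GRAVY (mean Kyte-Doolittle hydropathy). The mass table holds standard average residue masses in daltons; which properties the authors actually computed is not stated, so this selection is an assumption.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water for the free N- and C-termini

# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def molecular_weight(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER

def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KD[aa] for aa in seq) / len(seq)
```

Both functions raise KeyError on non-standard letters, which is a deliberate choice so that ambiguous residues are not silently ignored.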