Similar articles
1.
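In genomic data, 20%~30% of gene products are predicted to be transmembrane proteins. This paper analyzes methods for predicting membrane protein topology and evaluates their results, in order to guide the choice of a more suitable topology prediction method for membrane protein structure prediction. The evaluation of the currently available topology prediction methods offers an important reference for practical work. For example, for a transmembrane protein sequence of unknown topology, one can first predict whether it contains a signal peptide, consulting both Polyphobius and SignalP. If the two methods disagree, the Polyphobius result can be taken, since the evaluation found its overall predictive ability to be better. Once a signal peptide is confirmed, the N-terminus must lie on the outer side of the membrane. The sequence length is then used to judge whether the protein is single-pass or multi-pass, and a suitable topology prediction method is chosen according to the evaluation results.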
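The signal-peptide step of this workflow can be sketched in Python as below. This is only an illustration of the decision rule stated in the abstract: the function and type names are hypothetical, the two boolean inputs stand in for whatever the real Polyphobius and SignalP runs report, and the later length-based choice between single-pass and multi-pass methods is left out because the abstract gives no threshold for it.

```python
from typing import NamedTuple, Optional


class SignalPeptideCall(NamedTuple):
    has_signal_peptide: bool
    # True when a signal peptide is called; None when orientation is undetermined.
    n_terminus_outside: Optional[bool]
    # "agreement" if both tools agree, otherwise "polyphobius".
    decided_by: str


def resolve_signal_peptide(polyphobius_call: bool, signalp_call: bool) -> SignalPeptideCall:
    """Combine two signal-peptide predictions (hypothetical helper).

    Rule from the abstract: if the tools disagree, prefer Polyphobius;
    a confirmed signal peptide puts the N-terminus outside the membrane.
    """
    decided_by = "agreement" if polyphobius_call == signalp_call else "polyphobius"
    has_signal_peptide = polyphobius_call  # equals both calls when they agree
    n_terminus_outside = True if has_signal_peptide else None
    return SignalPeptideCall(has_signal_peptide, n_terminus_outside, decided_by)
```

Returning `None` for the orientation when no signal peptide is called keeps the sketch from implying more than the abstract says: the absence of a signal peptide does not by itself fix where the N-terminus lies.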

2.
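Objective: To predict the transmembrane structure of the Arabidopsis thaliana CAX1 protein and build a transmembrane structure model of CAX1. Methods: The primary sequence of Arabidopsis CAX1 was analyzed with the bioinformatics tools Signal 3.0, Conpred Ⅱ, TMHMM 2.0, HMMTOP, MEMSAT3, and ConPred2. Results: The model shows that CAX1 has 10 transmembrane regions, located at amino acid residues 73-93, 128-147, 163-185, 198-219, 236-256, 286-307, 322-344, 357-379, 387-407, and 414-432. Conclusion: The model is consistent with existing data and can serve as a reference model for functional studies of CAX1.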

3.
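A study of signal peptide-containing proteins in the whole genome of Schizosaccharomyces pombe   Cited by: 1 (self-citations: 0, citations by others: 1)
Liu Yuling, Liu Yunfan, Xie Jianping. 《遗传》 (Hereditas), 2007, 29(2): 250-256
A global analysis was performed on the 4,997 protein sequences encoded on the three chromosomes of the Schizosaccharomyces pombe genome. SignalP 3.0 was used to analyze N-terminal signal peptide sequences, predicting 196 proteins with an N-terminal secretory signal peptide; TMpred was used to analyze transmembrane structure, predicting 117 transmembrane proteins; PrositeScan was used to analyze the lipid-binding sites of membrane lipoproteins, predicting 13 membrane lipid-binding proteins. From these results, 66 secreted protein sequences were predicted. TargetP was then used to analyze these 66 sequences and study the cellular localization of the proteins. The functions of these secreted proteins span nutrition, reproduction, and communication between cells and between the cell and its environment in S. pombe; they are important for cell survival and reproduction and are a valuable reference for systems biology research. The study of the S. pombe secretome also lays a foundation for using S. pombe as a drug screening model and for developing it as a host for heterologous protein expression.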

4.
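Sequence analysis and prediction of membrane protein transmembrane segments based on wavelet analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
Membrane proteins are a class of structurally distinctive proteins that occur in all kinds of cells and carry out important physiological functions. So far the structures of only a few membrane proteins have been determined experimentally, so computational prediction of membrane protein structure is one of the main topics in protein structure prediction. Membrane proteins generally form conserved transmembrane helices in the membrane, with pronounced sequence features, which makes the positions of transmembrane helical segments well suited to prediction. Researchers internationally have attempted such predictions with artificial neural network methods, multiple sequence alignment methods, and statistical methods, with some success. For the protein sequence database, we …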

5.
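《生命科学研究》 (Life Science Research), 2017, (2): 95-100
Bioinformatics methods were used to predict and analyze the physicochemical properties, signal peptide, transmembrane regions, hydrophilicity and hydrophobicity, secondary and tertiary structure, protein-protein interactions, and GO annotation of the human P2X1 protein, a ligand-gated ion channel receptor. The results show that P2X1 is a hydrophilic protein with two transmembrane segments; it consists of 399 amino acids, has an isoelectric point of 8.75, and contains one nuclear localization sequence. Its secondary structure contains 6 α-helical regions and 15 β-sheet regions; the reliability of the tertiary structure prediction is 45.35%, and Ramachandran plot analysis indicates that the predicted structure is fairly stable. The proteins that interact with P2X1 are mainly signal-transducing purinergic receptors and ion channel proteins, and P2X1 may take part in physiological processes such as vascular coagulation and the inflammatory response. These predictions of the structure and function of human P2X1 provide an important theoretical basis for studying its role in biological processes and disease treatment.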

6.
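Objective: To predict the structure and function of the human mitochondrial transcription termination factor 3 (hMTERF3) protein using bioinformatics. Methods: hMTERF3 was studied systematically using the GenBank, Uniprot, ExPASy, and SWISS-PROT database resources and various bioinformatics software, covering its physicochemical properties, transmembrane regions and signal peptide, secondary-structure functional domains, subcellular localization, functional classification, multiple sequence alignment of homologous proteins, phylogenetic tree construction, and homology modeling of the tertiary structure. Results: The software predicted a relative molecular mass of 47.97×10^3 and an isoelectric point of 8.60 for hMTERF3, with no signal peptide or transmembrane region. Secondary structure analysis showed mainly helices and random coils, including 6 MTERF motifs, and the tertiary structure prediction agreed with the secondary structure prediction. Subcellular localization analysis placed the protein in human mitochondria. Functional classification predicted it to be a transport and binding protein involved in the regulation of gene transcription. Multiple sequence alignment of homologous proteins and evolutionary analysis showed that hMTERF3 is highly homologous to the MTERF3 proteins of mammals such as rat and mouse, and clusters with them in the phylogenetic tree. Conclusion: The bioinformatics analysis of hMTERF3 provides a theoretical basis for further experimental studies of the structure and function of this protein.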

7.
8.
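Bioinformatics methods were used to predict and analyze the physicochemical properties and homology of STC2 proteins across species, as well as the hydrophilicity, nuclear localization sequence, transmembrane regions, signal peptide, subcellular localization, secondary structure, tertiary structure, interacting proteins, and GO annotation of human STC2. The STC2 protein consists of 302 amino acids, has a theoretical isoelectric point of 6.93, is strongly hydrophilic, and is well conserved among mammals. It has no nuclear localization sequence or transmembrane structure, contains a signal peptide, and is mainly located in the endoplasmic reticulum or secreted outside the cell. The secondary structure prediction gives 11 α-helical regions and 1 β-sheet region, the Ramachandran plot indicates that the predicted tertiary structure is reliable, and the interacting proteins and GO annotation suggest that STC2 may take part in a variety of cellular biological processes. The prediction and analysis of the structure and function of human STC2 provide a theoretical basis for further research on the protein and new ideas for the diagnosis and treatment of STC2-related diseases.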

9.
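APEX1 is a key gene that affects gene expression and redox activity and encodes a multifunctional protein involved in base repair. An in-depth bioinformatics analysis of the APEX1 gene and its encoded protein helps to explain APEX1-related cancers and other topics in genetic epidemiology. Using bioinformatics methods, this study took the APEX1 gene sequences and encoded protein sequences of 11 species as its subject, performed a phylogenetic analysis of the human APEX1 gene, predicted its promoter and CpG islands, and predicted the physicochemical properties, structure, and function of its protein. Nucleic acid analysis showed that the human APEX1 gene contains 5 exons, with a predicted core promoter region at 158~208 bp. Evolutionary analysis showed a genetic distance of 0.003 between human and chimpanzee, the closest relationship among the species studied. Protein prediction software showed that the human APEX1 protein consists mainly of random coils and α-helices, with no suitable signal peptide and no obvious transmembrane region, indicating that the gene is not involved in transmembrane transport or signal transduction.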

10.
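Bioinformatics analysis of the house fly small heat shock protein (sHsp20.6)   Cited by: 1 (self-citations: 1, citations by others: 0)
Objective: To study the biological function of the house fly small heat shock protein sHsp20.6. Methods: Bioinformatics methods and tools were applied to analyze the physicochemical properties, hydrophobicity, transmembrane regions and signal peptide, motifs, secondary-structure functional domains, functional classification, multiple sequence alignment and phylogenetic tree construction, and tertiary structure modeling of house fly sHsp20.6. Results: House fly sHsp20.6 is a hydrophilic protein with a molecular weight of 20.64 kD and an isoelectric point of 5.66. It has no transmembrane region or signal peptide and contains one HSP20 domain, and its main structural elements are α-helices and random coils. Three-dimensional structure prediction shows a rod-shaped protein whose C-terminal domain has 7 sheet structures. Cluster analysis shows that house fly sHsp20.6 groups with the orthologous small heat shock proteins of insects.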

11.
In the present study, an attempt has been made to develop a method for predicting gamma-turns in proteins. First, we have implemented the commonly used statistical and machine-learning techniques in the field of protein structure prediction, for the prediction of gamma-turns. All the methods have been trained and tested on a set of 320 nonhomologous protein chains by a fivefold cross-validation technique. It has been observed that the performance of all methods is very poor, having a Matthews Correlation Coefficient (MCC) …

12.
Tertiary structure prediction of a protein from its amino acid sequence is one of the major challenges in the field of bioinformatics. The hierarchical approach is one of the persuasive techniques used for predicting protein tertiary structure, especially in the absence of homologous protein structures. In the hierarchical approach, intermediate states such as secondary structure, dihedral angles, and Cα-Cα distance bounds are predicted. These intermediate states are used to restrain the protein backbone and assist its correct folding. In recent years, several methods have been developed for predicting dihedral angles of a protein, but it is difficult to conclude which method is better than others. In this study, we benchmarked the performance of dihedral prediction methods ANGLOR and SPINE X on various datasets, including independent datasets. TANGLE dihedral prediction method was not benchmarked (due to unavailability of its standalone) and was compared with SPINE X and ANGLOR on only ANGLOR dataset on which TANGLE has reported its results. It was observed that SPINE X performed better than ANGLOR and TANGLE, especially in case of prediction of dihedral angles of glycine and proline residues. The analysis suggested that angle shifting was the foremost reason of better performance of SPINE X. We further evaluated the performance of the methods on independent ccPDB30 dataset and observed that SPINE X performed better than ANGLOR.

13.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
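The "two-class projection" this abstract refers to can be illustrated with a short Python sketch: real-valued RSA predictions are thresholded into "exposed" and "buried", and the agreement with thresholded observed values is scored. The function names and the default threshold of 0.25 are assumptions for illustration only; the abstract itself calls the threshold arbitrary and does not name a value.

```python
from typing import List, Sequence


def project_to_two_classes(rsa_values: Sequence[float], threshold: float = 0.25) -> List[str]:
    """Label a residue 'exposed' if its RSA exceeds the threshold, else 'buried'."""
    return ["exposed" if rsa > threshold else "buried" for rsa in rsa_values]


def two_class_accuracy(
    predicted_rsa: Sequence[float],
    observed_rsa: Sequence[float],
    threshold: float = 0.25,
) -> float:
    """Fraction of residues whose predicted and observed classes agree."""
    if len(predicted_rsa) != len(observed_rsa) or not predicted_rsa:
        raise ValueError("inputs must be non-empty and of equal length")
    predicted = project_to_two_classes(predicted_rsa, threshold)
    observed = project_to_two_classes(observed_rsa, threshold)
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)
```

Scoring a regression model this way makes it comparable with classifiers, at the cost of discarding how far each real-valued prediction is from the observed RSA.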

14.
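Drug development is very important but also highly demanding in labor and resources. Computer-aided prediction of the affinity between drugs and proteins can greatly accelerate it. The key to drug-target affinity prediction is an accurate and detailed representation of the drug and the protein. This work proposes a drug-target affinity prediction model based on deep learning and multi-level information fusion, aiming for better predictive performance by combining multi-level information on drugs and proteins. First, the drug is represented in two forms, a molecular graph and an extended-connectivity fingerprint, which are learned by a graph convolutional neural network module and fully connected layers, respectively. Next, the protein sequence and protein K-mer features are fed to a convolutional neural network module and fully connected layers, respectively, to learn latent protein features. The features learned by the four channels are then fused, and fully connected layers make the prediction. The method was validated on two benchmark drug-target affinity datasets and compared with other existing models. The results show that the proposed model predicts better than the baseline models, indicating that the strategy of combining multi-level drug and protein information for drug-target affinity prediction is effective.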
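As a rough illustration of what a protein K-mer feature vector can look like, the Python sketch below counts overlapping k-mers over the 20 standard amino acids and normalizes the counts. This is an assumption about the general technique, not the featurization used in the paper: the abstract does not state the value of k, the alphabet, or the normalization, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def kmer_frequencies(sequence: str, k: int = 2) -> List[float]:
    """Return normalized overlapping k-mer frequencies in a fixed order.

    The vector has 20**k entries, ordered lexicographically over AMINO_ACIDS.
    k-mers containing non-standard letters are ignored.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]
```

A fixed-length vector like this is what makes the feature suitable as input to fully connected layers, whereas the raw sequence of variable length is handled by the convolutional channel.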

15.
Prediction of protein secondary structure at 80% accuracy   总被引:11,自引:0,他引:11  
Secondary structure prediction involving up to 800 neural network predictions has been developed, by use of novel methods such as output expansion and a unique balloting procedure. An overall performance of 77.2%-80.2% (77.9%-80.6% mean per-chain) for three-state (helix, strand, coil) prediction was obtained when evaluated on a commonly used set of 126 protein chains. The method uses profiles made by position-specific scoring matrices as input, while at the output level it predicts on three consecutive residues simultaneously. The predictions arise from tenfold cross-validated training and testing of 1032 protein sequences, using a scheme with primary structure neural networks followed by structure filtering neural networks. With respect to blind prediction, this work is preliminary and awaits evaluation by CASP4.

16.
17.
Knowledge of the pathogen-host interactions between the species is essential in order to develop a solution strategy against infectious diseases. In vitro methods take extended periods of time to detect interactions and provide very few of the possible interaction pairs. Hence, modelling interactions between proteins has necessitated the development of computational methods. The main scope of this paper is integrating the known protein interactions between the host and pathogen organisms to improve the prediction success rate of unknown pathogen-host interactions. Thus, the true positive rate of the predictions was expected to increase. In order to perform this study extensively, several protein encoding methods and learning algorithms were tested. Along with human as the host organism, two different pathogen organisms were used in the experiments. For each combination of protein-encoding and prediction method, the original prediction algorithm was tested using only pathogen-host interactions, and the same method was tested again after integrating the known protein interactions within each organism. The effect of merging the networks of pathogen-host interactions of different species on the prediction performance of state-of-the-art methods was also observed. Success was measured in terms of Matthews correlation coefficient, precision, recall, F1 score, and accuracy metrics. Empirical results showed that integrating the host and pathogen interactions yields better performance consistently in almost all experiments.
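The five metrics named in this abstract all follow from a binary confusion matrix. The Python sketch below computes them using their standard definitions; the function name is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something the abstract specifies.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute MCC, precision, recall, F1 and accuracy from confusion counts."""

    def safe_div(numerator: float, denominator: float) -> float:
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "mcc": mcc,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }
```

Reporting MCC alongside accuracy matters for interaction prediction because non-interacting pairs usually far outnumber interacting ones, and accuracy alone can look high for a predictor that rarely calls an interaction.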

18.
Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.

19.
MOTIVATION: The beta-turn is an important element of protein structure. In the past three decades, numerous beta-turn prediction methods have been developed based on various strategies. For a detailed discussion about the importance of beta-turns and a systematic introduction of the existing prediction algorithms for beta-turns and their types, please see a recent review (Chou, Analytical Biochemistry, 286, 1-16, 2000). However at present, it is still difficult to say which method is better than the other. This is because of the fact that these methods were developed on different sets of data. Thus, it is important to evaluate the performance of beta-turn prediction methods. RESULTS: We have evaluated the performance of six methods of beta-turn prediction. All the methods have been tested on a set of 426 non-homologous protein chains. It has been observed that the performance of the neural network based method, BTPRED, is significantly better than the statistical methods. One of the reasons for its better performance is that it utilizes the predicted secondary structure information. We have also trained, tested and evaluated the performance of all methods except BTPRED and GORBTURN, on a new data set using a 7-fold cross-validation technique. There is a significant improvement in performance of all the methods when secondary structure information is incorporated. Moreover, after incorporating secondary structure information, the Sequence Coupled Model has yielded better results in predicting beta-turns as compared with other methods. In this study, both threshold dependent and independent (ROC) measures have been used for evaluation.

20.
MOTIVATION: We are motivated by the fast-growing number of protein structures in the Protein Data Bank with necessary information for prediction of protein-protein interaction sites to develop methods for identification of residues participating in protein-protein interactions. We would like to compare conditional random fields (CRFs)-based method with conventional classification-based methods that omit the relation between two labels of neighboring residues to show the advantages of CRFs-based method in predicting protein-protein interaction sites. RESULTS: The prediction of protein-protein interaction sites is solved as a sequential labeling problem by applying CRFs with features including protein sequence profile and residue accessible surface area. The CRFs-based method can achieve a comparable performance with state-of-the-art methods, when 1276 nonredundant hetero-complex protein chains are used as training and test set. Experimental result shows that CRFs-based method is a powerful and robust protein-protein interaction site prediction method and can be used to guide biologists to make specific experiments on proteins. AVAILABILITY: http://www.insun.hit.edu.cn/~mhli/site_CRFs/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
