20 similar articles found (search time: 171 ms)
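Membrane proteins are the principal carriers of biomembrane function and the material basis on which cells carry out their various functions, so they play a vital role in the cell. Predicting the class of membrane proteins of unknown type gives guidance to related biological research and is important groundwork for the study of membrane protein structure and function. To address membrane protein classification, the k-substring discrete source method is used to extract features from membrane protein sequences, and the minimum increment of diversity method is combined with a weighted K-nearest-neighbor algorithm to build a new classification model. Under three standard tests (self-consistency, jackknife, and independent test set), prediction accuracy is 99.95%, 86.16%, and 98.36%, respectively. The results indicate that the k-substring discrete source method extracts the feature information of membrane protein sequences effectively and that the model achieves a higher prediction success rate than existing methods.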
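
The feature extraction and minimum increment of diversity step described in this abstract can be illustrated with a minimal Python sketch. It assumes the standard definitions D = N log N - Σ n_i log n_i for the diversity of a count vector and ID(X, Y) = D(X + Y) - D(X) - D(Y) for the increment of diversity; the function names and the toy class sources are hypothetical, and the weighted K-nearest-neighbor stage of the paper's model is not shown.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count the overlapping length-k substrings of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def diversity(counts: Counter) -> float:
    """Diversity measure D = N*log(N) - sum(n_i*log(n_i))."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x: Counter, y: Counter) -> float:
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller values mean more similar sources."""
    return diversity(x + y) - diversity(x) - diversity(y)


def predict_min_id(query: str, class_sources: dict, k: int = 2) -> str:
    """Assign the query to the class whose pooled k-mer source gives the smallest ID."""
    q = kmer_counts(query, k)
    return min(class_sources, key=lambda c: increment_of_diversity(q, class_sources[c]))


# Toy usage with made-up sequences (not data from the paper).
sources = {"a": kmer_counts("AAAAAA", 1), "b": kmer_counts("BBBBBB", 1)}
label = predict_min_id("AAAB", sources, k=1)
```

In practice each class source would pool the k-mer counts of all training sequences of that class, and k would be chosen by validation.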

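Predicting membrane protein types with the random forest method   Total citations: 2, self-citations: 0, citations by others: 2
The type of a membrane protein is closely tied to its function, so predicting membrane protein type is an important means of studying function, and predicting it from the amino acid sequence is of considerable value. Starting from the amino acid sequence, this paper uses combined increment of diversity together with pseudo-amino acid composition as prediction parameters and applies a random forest classifier to predict eight types of membrane proteins. Prediction accuracy is 86.3% under the jackknife test and 93.8% under the independent test, better than previously reported results.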

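Membrane proteins are important drug targets, and knowing their types aids successful drug design, so accurate prediction of membrane protein type is essential for drug development. This paper uses a dataset of 274 mycobacterial membrane protein sequences with less than 40% sequence identity, takes optimized pseudo-amino acid composition as the feature, and applies a support vector machine classifier to predict mycobacterial membrane protein types. Under the jackknife test, it obtains an overall accuracy of 85.4% and an average accuracy of 72.2%. The results indicate that the method can identify mycobacterial membrane protein types and should aid the development of anti-mycobacterial drugs.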
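
The gap between the overall accuracy and the average accuracy reported here is typical of class-imbalanced data. The sketch below shows one way the two figures can differ; it assumes that "average accuracy" means the unweighted mean of per-class accuracies, which the abstract does not state explicitly, and the function name and labels are hypothetical.

```python
from collections import defaultdict


def overall_and_average_accuracy(y_true, y_pred):
    """Return (overall accuracy, unweighted mean of per-class accuracies)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(y_true)
    average = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, average


# Toy labels: a large class predicted perfectly and a small class predicted poorly.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 8 + ["a", "b"]
overall, average = overall_and_average_accuracy(y_true, y_pred)
```

Here the large class dominates the overall figure, while the small class pulls the per-class average down.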

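In recent years, mass spectrometry has been widely applied to the study of membrane protein structure and function. Because the transmembrane domains of membrane proteins contain many hydrophobic amino acids, sequence coverage in liquid chromatography-tandem mass spectrometry is often low, which limits the use of mass spectrometry in this field. Using the human integral membrane protein vitamin K epoxide reductase as a model, this work optimizes in-gel digestion conditions and establishes a chymotrypsin in-gel digestion method that reliably improves the mass spectrometric sequence coverage of membrane proteins. By examining how calcium ion concentration, pH, and buffer system affect sequence coverage, the total number and types of specific peptides detected, and the size of those peptides, the authors find that Tris-HCl buffer at pH 8.0-8.5 with 5-10 mmol/L calcium ions balances sequence coverage against peptide diversity. The method raises the mass spectrometric coverage of membrane proteins above 80% and should be broadly useful in studies of membrane protein structure and function, identification of membrane protein interaction sites, and identification of binding sites between membrane proteins and small-molecule drugs.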

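In genome data, 20% to 30% of gene products are predicted to be transmembrane proteins. This paper analyzes methods for predicting membrane protein topology and evaluates their results, with the aim of helping readers select a more suitable topology prediction method for membrane protein structure. The evaluation of existing topology prediction methods offers a practical reference. For example, given a transmembrane protein sequence of unknown topology, one can first predict whether it contains a signal peptide using both Polyphobius and SignalP. If the two methods disagree, the Polyphobius result can be taken, since the evaluation shows that Polyphobius has the better overall predictive ability. Once a signal peptide is confirmed, the N-terminus must lie on the outer side of the membrane. The sequence length is then used to judge whether the protein spans the membrane once or several times, and a suitable topology prediction method is chosen according to the evaluation results.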

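Prediction of membrane protein folding types based on fuzzy support vector machines   Total citations: 1, self-citations: 0, citations by others: 1
Existing methods that predict membrane protein folding types with support vector machines (SVM) do not make full use of protein sequence features and leave unclassifiable regions when handling multi-class protein classification. To address these two problems, the amino acid composition and dipeptide composition of the protein sequence are extracted, weighted multi-order correlation coefficients of amino acid residue indices are computed, and the three feature types are fused into the classifier's input feature vector. A fuzzy SVM (FSVM) algorithm is adopted to classify data that a conventional SVM cannot separate. Tests on a non-redundant dataset show that, with the same classification algorithm, the improved feature extraction outperforms existing feature extraction methods, and that, with the same feature extraction, FSVM outperforms the conventional SVM. The combined strategy reaches a prediction accuracy of 96.6% on an independent test set, better than several existing prediction methods, and can serve as an effective tool for predicting the folding types of membrane proteins and other proteins.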

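Sequence analysis and prediction of membrane protein transmembrane segments based on wavelet analysis   Total citations: 2, self-citations: 0, citations by others: 2
Membrane proteins are a structurally distinctive class of proteins that occur in all kinds of cells and perform important physiological functions. Only a few membrane protein structures have been determined experimentally, so computational prediction of membrane protein structure is one of the main topics in protein structure prediction. Membrane proteins generally form conserved transmembrane helices in the membrane with pronounced sequence features, which makes the positions of transmembrane helical segments well suited to prediction. Researchers internationally have attempted such prediction with artificial neural networks, multiple sequence alignment, and statistical methods, with some success. Working from the protein sequence database, we ...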

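Li Nan, Li Chun. 《生物信息学》, 2012, 10(4): 238-240
Based on 16 classification models of amino acids, derived sequences of a protein sequence are constructed, and weighted quasi-entropy and LZ complexity are then combined to build a 34-dimensional feature vector representing the protein sequence. Using a Bayes classifier to predict protein structural class on the 640 dataset, whose sequence homology does not exceed 25%, the accuracy reaches 71.28%.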

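Proteins are highly diverse in sequence, structure, and function. Many studies show that protein structure is related to the order of its amino acid sequence and that the local sequence environment influences structure to some degree. This paper proposes a new method for predicting protein structure type based on the statistical torsion-angle preferences of amino acids in 5-mers: it predicts structure type from the torsion-angle preferences of the central amino acid of 5-mers in the PDB database. The method enables fast computational prediction of structure type for new protein sequences, and its effectiveness is verified on two randomly drawn sets of CATH data.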

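Structure prediction for membrane proteins is currently difficult. Using an established pattern recognition method, this paper predicts the secondary structures of three typical membrane proteins, RC, BR, and RH. Agreement between the predictions and experimental data is similar to that obtained when the method is applied to globular proteins, so the prediction is considered successful. The paper further refines the pattern recognition method for protein secondary structure prediction and establishes a multi-class method for globular protein secondary structure prediction with accuracy above 60%. This proves to be a fairly good structure prediction method. Since pattern recognition is still rarely used for structure prediction in China or abroad, we intend to develop and refine it further.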

Discrimination of outer membrane proteins using support vector machines   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for dissecting outer membrane proteins (OMPs) from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have developed a method based on support vector machines using amino acid composition and residue pair information. Our approach with amino acid composition has correctly predicted the OMPs with a cross-validated accuracy of 94% in a set of 208 proteins. Further, this method has successfully excluded 633 of 673 globular proteins and 191 of 206 alpha-helical membrane proteins. We obtained an overall accuracy of 92% for correctly picking up the OMPs from a dataset of 1087 proteins belonging to all different types of globular and membrane proteins. Furthermore, residue pair information improved the accuracy from 92 to 94%. This accuracy of discriminating OMPs is higher than that of other methods in the literature, which could be used for dissecting OMPs from genomic sequences. AVAILABILITY: Discrimination results are available at http://tmbeta-svm.cbrc.jp.

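Outer membrane proteins (OMPs) are a class of proteins with important biological functions. Predicting OMPs with bioinformatics methods helps predict their secondary and tertiary structures and discover new OMPs in genomes. This paper computes amino acid composition, dipeptide composition, and weighted multi-order correlation coefficients of amino acid residue indices from protein sequences, combines the three feature types, and uses a support vector machine (SVM) to identify OMPs. Identification results are computed for multiple feature combinations covering four residue indices, and the effect of the order and weight of the correlation coefficients on prediction performance is discussed. Ten-fold cross-validation and independent tests on the dataset show that the combined-feature method reaches peak identification accuracies of 96.96% for OMPs and 97.33% for non-OMPs, better than several existing methods. Results on five bacterial genomes show that the combined-feature method is highly specific and identifies OMPs of known structure in the PDB database with accuracy above 99%, indicating that it can serve as an effective tool for screening OMPs within genomes.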
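
Two of the three feature types named in this abstract, amino acid composition and dipeptide composition, are simple to compute. The following is a rough sketch under the usual definitions (relative frequencies of the 20 standard residues and of the 400 ordered residue pairs); the function names are hypothetical, and the weighted residue-index correlation features and the SVM training step are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list:
    """20-dimensional vector of residue frequencies (non-standard residues ignored)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list:
    """400-dimensional vector of adjacent-pair frequencies.

    Pairs that contain a non-standard residue are skipped rather than bridged.
    """
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy usage: concatenate both feature types into one 420-dimensional vector.
features = amino_acid_composition("AAC") + dipeptide_composition("AAC")
```

A vector like `features` could then be passed to any standard classifier; which kernel and parameters the paper used is not stated in the abstract.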

Discriminating outer membrane proteins from globular proteins (GPs) and other types of membrane proteins in genomic sequences is an important and active topic. In this paper, a measure based on information discrepancy is proposed and applied to the discrimination of outer membrane proteins. It differs from previous methods, which are based on amino acid composition. Our approach focuses on the comparison of subsequence distributions and takes into account the effect of residue order in protein primary structures. As a result, the new approach outperforms all previous methods on the same benchmark datasets. In particular, we show that the proposed approach has correctly identified the outer membrane proteins at an accuracy of 99% for the training set of 337 proteins and has correctly excluded the GPs at an accuracy of 86% in a non-redundant dataset of 668 proteins. Furthermore, this method is able to correctly exclude alpha-helical membrane proteins at an accuracy of 100%.

We have performed an amino acid composition (AAC) analysis of the complete sequences for 235 secondary transport proteins from Escherichia coli, which have functions in the uptake and export of organic and inorganic metabolites, efflux of drugs and in controlling membrane potential. This revealed the trends in content for specific amino acid types and for combinations of amino acids with similar physicochemical properties. In certain proteins or groups of proteins, the so-called spikes of high content for a specific amino acid type or combination of amino acids were identified and confirmed statistically, which in some cases could be directly related to function and ligand specificity. This was prevalent in proteins with a function of multidrug or metal ion efflux. Any tool that can help in identifying bacterial multidrug efflux proteins is important for a better understanding of this mechanism of antibiotic resistance. Phylogenetic analysis based on sequence alignments and comparison of sequences at the N- and C-terminal ends confirmed transporter family classification. Locations of specific amino acid types in some of the proteins that have crystal structures (EmrE, LacY, AcrB) were also considered to help link amino acid content with protein function. Though there are limitations, this work has demonstrated that a basic analysis of AAC is a useful tool to use in combination with other computational and experimental methods for classifying and investigating function and ligand specificity in a large group of transport or other membrane proteins, including those that are molecular targets for development of new drugs.

Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the influence of physico-chemical, energetic and conformational properties of amino acid residues for discriminating outer membrane proteins using different machine learning algorithms, such as Bayes rules, logistic functions, neural networks, support vector machines and decision trees. We observed that most of the properties have discriminated the OMPs with similar accuracy. The neural network method using the free energy change property could discriminate the OMPs from other folding types of globular and membrane proteins at a 5-fold cross-validation accuracy of 94.4% in a dataset of 1,088 proteins, which is better than that obtained with amino acid composition. The accuracy of discriminating globular proteins is 94.3% and that of transmembrane helical (TMH) proteins is 91.8%. Further, the neural network method is tested with globular proteins belonging to 30 major folding types and it could successfully exclude 99.4% of the considered 1,612 non-redundant proteins. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

A new algorithm to predict the types of membrane proteins is proposed. Besides the amino acid composition of the query protein, the information within the amino acid sequence is taken into account. A formulation of the autocorrelation functions based on the hydrophobicity index of the 20 amino acids is adopted. The overall predictive accuracy is remarkably increased for the database of 2054 membrane proteins studied here. An improvement of about 13% in the resubstitution test and 8% in the jackknife test is achieved compared with those of algorithms based merely on the amino acid composition. Consequently, overall predictive accuracy is as high as 94% and 82% for the resubstitution and jackknife tests, respectively, for the prediction of the five types. Since the proposed algorithm is based on more parameters than those in the amino acid composition approach, the predictive accuracy would be further increased for a larger and more class-balanced database. The present algorithm should be useful in the determination of the types and functions of new membrane proteins. The computer program is available on request.
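
As an illustration of sequence-order features of this kind, the sketch below computes hydrophobicity autocorrelation values at several lags. It is only a sketch under stated assumptions: the abstract does not say which hydrophobicity index or which exact formulation was used, so the Kyte-Doolittle scale and the form r_n = (1/(L-n)) Σ h_i h_{i+n} are assumptions, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the paper's index is not specified).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_autocorrelation(seq: str, max_lag: int) -> list:
    """Return [r_1, ..., r_max_lag] with r_n = mean of h_i * h_(i+n) over the sequence."""
    h = [KYTE_DOOLITTLE[r] for r in seq.upper() if r in KYTE_DOOLITTLE]
    length = len(h)
    features = []
    for lag in range(1, max_lag + 1):
        if length <= lag:
            # Sequence too short for this lag; use 0.0 as a neutral placeholder.
            features.append(0.0)
            continue
        products = (h[i] * h[i + lag] for i in range(length - lag))
        features.append(sum(products) / (length - lag))
    return features


# Toy usage on a three-residue sequence.
r = hydrophobicity_autocorrelation("AIA", 2)
```

Such lag features would be appended to the 20 amino acid composition values so that the classifier sees some residue-order information in addition to composition.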

Predicting membrane protein type is a meaningful task because this kind of information is very useful to explain the function of membrane proteins. Due to the explosion of new protein sequences discovered, it is highly desirable to develop efficient computational tools for quickly and accurately predicting the membrane type for a given protein sequence. Even though several membrane predictors have been developed, they can only deal with membrane proteins that belong to a single membrane type. In fact, some membrane proteins belong to two or more types. To solve this problem, a system for predicting membrane protein sequences with single or multiple types is proposed. Pseudo-amino acid composition, which has proven to be a very efficient tool in representing protein sequences, and a multilabel KNN algorithm are used to compose this prediction engine. The results of this initial study are encouraging.

MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have systematically analyzed the amino acid composition of globular proteins from different structural classes and outer membrane proteins. We found that the residues Glu, His, Ile, Cys, Gln, Asn and Ser show a significant difference between globular and outer membrane proteins. Based on this information, we have devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly picked up the outer membrane proteins with an accuracy of 89% for the training set of 337 proteins. On the other hand, our method has correctly excluded the globular proteins at an accuracy of 79% in a non-redundant dataset of 674 proteins. Furthermore, the present method is able to correctly exclude alpha-helical membrane proteins up to an accuracy of 80%. These accuracy levels are comparable to other methods in the literature, and this is a simple method, which could be used for dissecting outer membrane proteins from genomic sequences. The influence of protein size, structural class and specific residues for discrimination is discussed.

Identifying the subcellular localization of proteins is particularly helpful in the functional annotation of gene products. In this study, we use Machine Learning and Exploratory Data Analysis (EDA) techniques to examine and characterize amino acid sequences of human proteins localized in nine cellular compartments. A dataset of 3,749 protein sequences representing human proteins was extracted from the SWISS-PROT database. Feature vectors were created to capture specific amino acid sequence characteristics. Relative to a Support Vector Machine, a Multi-layer Perceptron, and a Naive Bayes classifier, the C4.5 Decision Tree algorithm was the most consistent performer across all nine compartments in reliably predicting the subcellular localization of proteins based on their amino acid sequences (average Precision=0.88; average Sensitivity=0.86). Furthermore, EDA graphics characterized essential features of proteins in each compartment. As examples, proteins localized to the plasma membrane had higher proportions of hydrophobic amino acids; cytoplasmic proteins had higher proportions of neutral amino acids; and mitochondrial proteins had higher proportions of neutral amino acids and lower proportions of polar amino acids. These data showed that the C4.5 classifier and EDA tools can be effective for characterizing and predicting the subcellular localization of human proteins based on their amino acid sequences.
