Similar literature
 19 similar articles found; search took 250 ms
1.
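Based on protein sequences, a new method is proposed for predicting β-hairpins, a supersecondary structure motif. Sequence information is represented by a vector of increments of diversity: six increments of diversity are fed into a support vector machine, which searches for the optimal hyperplane in the six-dimensional vector space to separate β-hairpins from non-β-hairpins. The results show that the algorithm predicts β-hairpins well. On the training set, 5-fold cross-validation gave an overall accuracy of 81.24%, a correlation coefficient of 0.57, and a β-hairpin sensitivity of 83.06%; on the independent test set, the overall accuracy was 78.34%, the correlation coefficient 0.56, and the β-hairpin sensitivity 77.24%. The model was also tested on 63 proteins from CASP6, with good results.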
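
The abstract above does not spell out how an increment of diversity (ID) is computed. The sketch below assumes the commonly used definitions D(X) = N log N - Σ n_i log n_i for the diversity of a count vector and ID(X, S) = D(X + S) - D(X) - D(S) for the increment between a query and a reference source. The function names and the toy counts are hypothetical, and the support vector machine step is not shown.

```python
import math

def diversity(counts):
    """Measure of diversity D(X) = N*log(N) - sum(n_i*log(n_i)), taking 0*log(0) as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); smaller values mean X resembles S more closely."""
    if len(x) != len(s):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)

# Toy counts (assumed, not from the paper): three states in a query segment
# and in two reference sources built from training data.
query = [4, 1, 0]
hairpin_source = [40, 10, 2]
non_hairpin_source = [5, 20, 30]

# Each ID value is one feature; the method above would collect six such values.
features = [
    increment_of_diversity(query, hairpin_source),
    increment_of_diversity(query, non_hairpin_source),
]
```

With these toy counts the query is closer to the hairpin source, so the first feature is the smaller of the two.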

2.
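Protein supersecondary structure prediction is a very important intermediate step toward tertiary structure prediction. Starting from the primary sequence, this paper predicts four classes of simple supersecondary structures in 5793 proteins. With the amino acids at sequence positions as parameters and three segment-extraction schemes, prediction with the increment of diversity algorithm alone gave unsatisfactory results. Feeding the combined increment of diversity values into a support vector machine as features gave better results: the mean overall accuracy under 5-fold cross-validation reached 83.0%, with a Matthews correlation coefficient above 0.71.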
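
For readers unfamiliar with the evaluation measures quoted in these abstracts, the sketch below computes the Matthews correlation coefficient, sensitivity, and overall accuracy from a binary confusion matrix. For a four-class problem such as the one above, the coefficient would typically be computed per class in a one-vs-rest fashion; that choice is an assumption here, not something the abstract states.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal total is zero
    return (tp * tn - fp * fn) / denom

def sensitivity(tp, fn):
    """Fraction of true positives that were recovered."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0
```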

3.
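Recognition of protein subcellular localization   Total citations: 5 (self-citations: 2, citations by others: 3)
Proteins are divided into 12 classes according to their subcellular localization. Using the mathematical theory of the measure of diversity, the counts of the 400 amino acid dipeptides in a protein form the diversity source, and the subcellular localization is predicted by computing the increment of diversity. Both self-consistency and jackknife tests gave high success rates: 84.5% for the self-consistency test and 81.1% for the jackknife test.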

4.
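Predicting human pol II promoters with a support vector machine and the Mahalanobis discriminant   Total citations: 1 (self-citations: 0, citations by others: 1)
Lin Hao (林昊), Yang Keli (杨科利). 《生物信息学》 2009, 7(2): 117-119, 127
Different k-mers in human promoter and non-promoter sequences were selected as the basic features of the prediction algorithm. The 6-mer frequencies in three regions (-249 to -1; 0 to +50; -30 to +30) served as diversity-source parameters for building increments of diversity, and the 3-mer frequencies at 24 sites (-31 to -21; -4 to +2; +25 to +29) served as parameters of a position scoring function. Promoters were then predicted with a support vector machine and with the Mahalanobis discriminant as the discriminant function. Ten-fold cross-validation was used to assess both algorithms, giving success rates of 87.0% and 87.9%, respectively. On the independent test set, the sensitivities were 62.7% and 76.0% and the specificities 77.5% and 66.8%, respectively.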
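
The k-mer counting step described above can be sketched as follows. The helper names and the coordinate convention are assumptions: position p relative to the transcription start site is mapped to string index `tss_index + p`, with position 0 treated as a real position because the abstract lists a region starting at 0. The original paper may index positions differently.

```python
from collections import Counter

def count_kmers(seq, k):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def region(seq, tss_index, start, end):
    """Return positions start..end (inclusive) relative to a hypothetical TSS index."""
    lo = tss_index + start
    hi = tss_index + end + 1
    if lo < 0 or hi > len(seq):
        raise ValueError("region falls outside the sequence")
    return seq[lo:hi]

def region_kmer_counts(seq, tss_index, start, end, k):
    """k-mer counts for one region; these counts would form one diversity source."""
    return count_kmers(region(seq, tss_index, start, end), k)
```

For a sequence with at least 249 upstream bases, `region_kmer_counts(seq, tss_index, -249, -1, 6)` would give the 6-mer counts for the first region named in the abstract.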

5.
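Based on the amino acid sequence of human IL-32α, the secondary structure of IL-32α was predicted with the SOPMA method. The ProScale and Bcepred servers and the EMBOSS software were used to analyze the hydrophilicity, flexibility, accessibility, and antigenicity indices of human IL-32α, and these were combined with secondary structure features to predict its B-cell epitopes. The results show that the secondary structure of IL-32α consists of α-helix (61.07%) and random coil (38.93%), and that its B-cell epitopes may lie in N-terminal segments 87-94, 102-108, and 121-127. The predictions will help identify the B-cell epitopes of IL-32α and provide a theoretical basis for developing monoclonal antibodies against human IL-32.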

6.
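This paper presents a statistical analysis of the peptide bonds in protein sequences and computes the dipeptide conformational parameters P_α, P_β, P_c and the tripeptide conformational parameters Q_α, Q_β, Q_c. On this basis, rules are proposed for predicting secondary structure from the amino acid sequence. The prediction accuracy reaches 90%, better than the Chou-Fasman method. This result shows that dipeptide (tripeptide) correlations are clearly important in forming protein secondary structure.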

7.
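Methods for recognizing protein structural classes   Total citations: 2 (self-citations: 0, citations by others: 2)
The distribution of hexamers in the secondary structure main sequence of α, β, α/β, and multi-domain proteins is given, and a method is proposed for recognizing (classifying) protein structural classes from the secondary structure main sequence. With triplets of the secondary structure main sequence as parameters, the Mahalanobis distance method recognizes the four structural classes with an overall accuracy of 81%. With the hexamer frequencies in the secondary structure main sequence forming the diversity source of the protein structure, minimizing the increment of diversity recognizes the four classes with an overall accuracy of 83%. A route to recognizing compact domains is also given.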
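
A minimum-Mahalanobis-distance classifier of the kind mentioned above can be sketched as follows. To stay short and dependency-free, the sketch is restricted to two-dimensional feature vectors with a closed-form 2x2 covariance inverse; the feature values and class names are toy assumptions, whereas the paper uses triplet counts of the secondary structure main sequence.

```python
def mean_vector(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def covariance_2d(rows, mu):
    """Sample covariance (sxx, sxy, syy) of 2-D points around mean mu."""
    n = len(rows)
    sxx = sum((r[0] - mu[0]) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - mu[1]) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mu[0]) * (r[1] - mu[1]) for r in rows) / (n - 1)
    return sxx, sxy, syy

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("singular covariance matrix")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def classify(x, training):
    """Assign x to the class whose training set has the smallest distance."""
    best_name, best_d = None, float("inf")
    for name, rows in training.items():
        mu = mean_vector(rows)
        d = mahalanobis_sq(x, mu, covariance_2d(rows, mu))
        if d < best_d:
            best_name, best_d = name, d
    return best_name

# Toy training data (assumed): two hypothetical features per protein.
training = {
    "alpha": [(8, 1), (9, 2), (7, 1), (8, 3)],
    "beta": [(1, 8), (2, 9), (1, 7), (3, 8)],
}
```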

8.
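Using existing methods for sequence homology comparison, secondary structure prediction, and three-dimensional structure prediction and modeling, the three-dimensional structure of the PAP-specific phosphatase from Arabidopsis thaliana was obtained. It is a multi-domain protein similar to the yeast Hal2p protein, with an α+β domain at the N-terminus and an α/β domain at the C-terminus. Analysis of the predicted structure revealed binding sites for Mg2+ and other metal ions and suggested a structural basis for Na+ sensitivity. These binding sites are related to the enzyme's biochemical function. In addition, structure-function analysis is used to discuss why an existing theoretical structure of the same enzyme in the Protein Data Bank (PDB) is unreasonable.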

9.
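Predicting membrane protein types with the random forest method   Total citations: 2 (self-citations: 0, citations by others: 2)
The type of a membrane protein is closely related to its function, so predicting the type is an important means of studying the function, and predicting it from the amino acid sequence is of great significance. Based on the amino acid sequence, this article uses combined increments of diversity together with pseudo-amino acid composition information as prediction parameters, and a random forest classifier, to predict eight types of membrane proteins. The prediction accuracy was 86.3% under the jackknife test and 93.8% on the independent test, better than previously reported results.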

10.
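Recognizing protein supersecondary structures with the measure of diversity   Total citations: 1 (self-citations: 0, citations by others: 1)
Using the measure of diversity, four classes of supersecondary structures were recognized in 2208 high-precision protein structures with a resolution of 2.5 Å or better. Starting from the primary sequence, amino acids (the 20 amino acids plus one gap) and their nearest-neighbor correlations were used together as parameters. With a fixed sequence pattern length of 8 residues, the "822" sequence pattern gave a mean prediction accuracy of 78.1% under 3-fold cross-validation and 76.7% under the jackknife test. With a fixed length of 10 residues, the "1041" sequence pattern gave 83.1% under 3-fold cross-validation and 79.8% under the jackknife test.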

11.
Based on the concept that the structural class of a protein is mainly determined by its secondary structure sequence, a new algorithm for prediction of the structural class of a protein is proposed. By use of the numbers of alpha-helices, beta-strands, and beta-alpha-beta fragments, the structural class of a protein can be predicted by an algorithm based on the increment of diversity (ID), in which the sole prediction parameter, the increment of diversity, is used as the index for predicting the structural class. The results indicate that high rates of correct prediction are obtained both for the complete set (standard set) from the Brookhaven Protein Data Bank CD-ROM (PDB) published in October 1995 and for the test set newly released from the Brookhaven Protein Data Bank CD-ROM (PDB) before July 1998.

12.
Luo L, Li X. Proteins 2000, 39(1): 9-25
Based on the concept that the framework structure of a protein is determined by its secondary structure sequence, a new method for recognition and prediction of the structural class is suggested. By use of parameters N(alpha), N(beta), and N(beta(alpha)beta) (the number of alpha-helices, beta-strands, and beta(alpha)beta fragments), one can recognize the structural class with an accuracy higher than 90% when applied to the complete set (standard set) published in October 1995 and the structure data newly released before July 1998 (test set). Furthermore, the framework structures of beta, alpha, and alpha/beta protein are studied. It is found that these structures can be built from some basic units and that their architecture obeys some definite rules. Based on the packing of these basic units a set of rules for the recognition of topologies of the framework structure are worked out. When applied to the 1995 standard set and the 1998 test set the rates of correct recognition are higher than 77%. The simplicity and universality of framework structures are indicated which may be related to the evolutionary conservation of these folds. Proteins 2000;39:9-25.

13.
Jost (Ecology, 88:2427–2439, 2007) recently showed that the Shannon diversity is the only standard diversity measure that can be partitioned into meaningful independent alpha and beta components when plot weights are unequal. This conclusion is very disappointing if one wants to calculate the beta diversity of unequal weighted plots using a parametric measure with varying sensitivities to the occurrence of rare and abundant species. To overcome this impasse, at least partially, in this paper, I propose a parametric measure of beta diversity that is based on the combination of Shannon’s entropy with Hurlbert’s ‘expected species diversity’. Unlike most parametric measures of diversity, the proposed index has a clear probabilistic interpretation, allowing at the same time a multiplicative partition of diversity into independent alpha and beta components for unequally weighted plots.
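
The sketch below illustrates only the Shannon baseline that the abstract refers to, not the parametric index it proposes (which is not specified here). It assumes the usual multiplicative partition for weighted plots: alpha entropy is the weighted mean of the plot entropies, gamma entropy is the entropy of the weighted pooled community, and beta is exp(H_gamma - H_alpha). The function names are hypothetical.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (natural log) of a relative-abundance vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def shannon_partition(plots, weights):
    """Multiplicative alpha/beta/gamma partition for plots with given weights.

    plots: list of relative-abundance vectors over the same species list.
    weights: plot weights summing to 1 (they may be unequal).
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("plot weights must sum to 1")
    n_species = len(plots[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, plots)) for i in range(n_species)]
    h_alpha = sum(w * shannon_entropy(p) for w, p in zip(weights, plots))
    h_gamma = shannon_entropy(pooled)
    return {
        "alpha": math.exp(h_alpha),
        "beta": math.exp(h_gamma - h_alpha),
        "gamma": math.exp(h_gamma),
    }
```

Under this partition, beta is 1 when all plots share the same composition, and it equals the number of plots when equally weighted plots share no species.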

14.
A new method has been developed to compute the probability that each amino acid in a protein sequence is in a particular secondary structural element. Each of these probabilities is computed using the entire sequence and a set of predefined structural class models. This set of structural classes is patterned after Jane Richardson's taxonomy for the domains of globular proteins. For each structural class considered, a mathematical model is constructed to represent constraints on the pattern of secondary structural elements characteristic of that class. These are stochastic models having discrete state spaces (referred to as hidden Markov models by researchers in signal processing and automatic speech recognition). Each model is a mathematical generator of amino acid sequences; the sequence under consideration is modeled as having been generated by one model in the set of candidates. The probability that each model generated the given sequence is computed using a filtering algorithm. The protein is then classified as belonging to the structural class having the most probable model. The secondary structure of the sequence is then analyzed using a "smoothing" algorithm that is optimal for that structural class model. For each residue position in the sequence, the smoother computes the probability that the residue is contained within each of the defined secondary structural elements of the model. This method has two important advantages: (1) the probability of each residue being in each of the modeled secondary structural elements is computed using the totality of the amino acid sequence, and (2) these probabilities are consistent with prior knowledge of realizable domain folds as encoded in each model. As an example of the method's utility, we present its application to flavodoxin, a prototypical alpha/beta protein having a central beta-sheet, and to thioredoxin, which belongs to a similar structural class but shares no significant sequence similarity.
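
The filtering-and-classification step described above can be illustrated with the standard scaled forward algorithm for hidden Markov models. Everything concrete in the sketch is a toy assumption: the two-state models, the two-letter alphabet ("a" for a helix-favoring residue class, "b" for a strand-favoring one), and all probabilities. The smoothing step that yields per-residue probabilities is not shown.

```python
import math

def forward_loglik(obs, start, trans, emit):
    """Log-probability that an HMM generated obs (scaled forward algorithm)."""
    if not obs:
        return 0.0
    states = list(start)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate the normalized state distribution and weight by emission.
            alpha = {
                s: emit[s][obs[t]] * sum(alpha[r] * trans[r][s] for r in states)
                for s in states
            }
        norm = sum(alpha.values())
        if norm == 0:
            return float("-inf")
        loglik += math.log(norm)
        alpha = {s: a / norm for s, a in alpha.items()}
    return loglik

def classify(obs, models):
    """Pick the model with the highest likelihood (equal priors assumed)."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))

# Toy structural-class models (assumed): (start, transition, emission) tables.
models = {
    "alpha-like": (
        {"H": 0.5, "C": 0.5},
        {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.2, "C": 0.8}},
        {"H": {"a": 0.8, "b": 0.2}, "C": {"a": 0.5, "b": 0.5}},
    ),
    "beta-like": (
        {"E": 0.5, "C": 0.5},
        {"E": {"E": 0.9, "C": 0.1}, "C": {"E": 0.2, "C": 0.8}},
        {"E": {"a": 0.2, "b": 0.8}, "C": {"a": 0.5, "b": 0.5}},
    ),
}
```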

15.
16.
The interpretation of the circular dichroism (CD) spectra of proteins to date requires additional secondary structural information of the proteins to be analyzed, such as X-ray or NMR data. Therefore, these methods are inappropriate for a CD database whose secondary structures are unknown, as in the case of the membrane proteins. The convex constraint analysis algorithm (Perczel, A., Hollósi, M., Tusnády, G., & Fasman, G. D., 1991, Protein Eng. 4, 669-679), on the other hand, operates only on a collection of spectral data to extract the common spectral components with their spectral weights. The linear combinations of these derived "pure" CD curves can reconstruct the original data set with great accuracy. For a membrane protein data set, the five-component spectra so obtained from the deconvolution consisted of two different types of alpha helices (the alpha helix in the soluble domain and the alpha T helix, for the transmembrane alpha helix), a beta-pleated sheet, a class C-like spectrum related to beta turns, and a spectrum correlated with the unordered conformation. The deconvoluted CD spectrum for the alpha T helix was characterized by a positive red-shifted band in the range 195-200 nm (+95,000 deg cm2 dmol-1), with the intensity of the negative band at 208 nm being slightly less negative than that of the 222-nm band (-50,000 and -60,000 deg cm2 dmol-1, respectively) in comparison with the regular alpha helix, with a positive band at 190 nm and two negative bands at 208 and 222 nm with magnitudes of +70,000, -30,000, and -30,000 deg cm2 dmol-1, respectively.

17.
A significant number of protein sequences in a given proteome have no obvious evolutionarily related protein in the database of solved protein structures, the PDB. Under these conditions, ab initio or template-free modeling methods are the sole means of predicting protein structure. To assess its expected performance on proteomes, the TASSER structure prediction algorithm is benchmarked in the ab initio limit on a representative set of 1129 nonhomologous sequences ranging from 40 to 200 residues that cover the PDB at 30% sequence identity and which adopt alpha, alpha + beta, and beta secondary structures. For sequences in the 40-100 (100-200) residue range, as assessed by their root mean square deviation from native, RMSD, the best of the top five ranked models of TASSER has a global fold that is significantly close to the native structure for 25% (16%) of the sequences, and with a correct identification of the structure of the protein core for 59% (36%). In the absence of a native structure, the structural similarity among the top five ranked models is a moderately reliable predictor of folding accuracy. If we classify the sequences according to their secondary structure content, then 64% (36%) of alpha, 43% (24%) of alpha + beta, and 20% (12%) of beta sequences in the 40-100 (100-200) residue range have a significant TM-score (TM-score ≥ 0.4). TASSER performs best on helical proteins because there are fewer secondary structural elements to arrange in a helical protein than in a beta protein of equal length, since the average length of a helix is longer than that of a strand. In addition, helical proteins have shorter loops and dangling tails. If we exclude these flexible fragments, then TASSER has similar accuracy for sequences containing the same number of secondary structural elements, irrespective of whether they are helices and/or strands. Thus, it is the effective configurational entropy of the protein that dictates the average likelihood of correctly arranging the secondary structure elements.

18.
A protein is usually classified into one of the following four structural classes: all alpha, all beta, (alpha + beta) and alpha/beta. In this paper, based on the maximum correlation-coefficient principle, a new formulation is proposed for predicting the structural class of a protein according to its amino acid composition. Calculations have been made for a development set of proteins from which the amino acid compositions for the standard structural classes were derived, and an independent set of proteins which are outside the development set. The former can test the self consistency of a method and the latter can test its extrapolating effectiveness. In both cases, the results showed that the new method gave a considerably higher rate of correct prediction than any of the previous methods, implying that a significant improvement has been achieved by implementing the maximum-correlation-coefficient principle in the new method.
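
The abstract does not give the formulation itself, so the sketch below is only a stand-in for the general idea: compute the amino acid composition of a query, correlate it with a reference composition for each structural class, and pick the class with the largest coefficient. The use of the Pearson coefficient, the function names, and the toy reference sequences are all assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fractional amino acid composition over the 20 standard residues."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(AMINO_ACIDS)

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def predict_class(query_seq, class_compositions):
    """Return the class whose reference composition correlates best with the query."""
    q = composition(query_seq)
    return max(class_compositions, key=lambda name: pearson(q, class_compositions[name]))

# Toy reference compositions (assumed); real ones would be averaged over a development set.
class_compositions = {
    "all alpha": composition("AAELLKKAEL"),
    "all beta": composition("VVTTYIVSTV"),
}
```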

19.
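Mechanisms by which grazing affects α, β, and γ diversity in the typical steppe of Inner Mongolia   Total citations: 2 (self-citations: 0, citations by others: 2)
The mechanisms by which human disturbance affects biodiversity and ecosystem functioning have been a hot topic in ecology in recent years. Using the large-scale grazing manipulation experiment at the Inner Mongolia Xilingol Grassland Ecosystem National Field Scientific Observation and Research Station as a platform, this study systematically examined how grazing affects plant diversity at different spatial scales (α, β, and γ diversity) in typical steppe under different precipitation conditions (wet and normal years) and topographic conditions (flat and sloped land). The findings were as follows. (1) Precipitation, topography, and their interaction clearly affected plant diversity. α, β, and γ diversity were all higher in wet years than in normal years. Precipitation and topography interacted: in normal years α diversity was higher on slopes than on flats, whereas in wet years α and γ diversity were higher on flats than on slopes; topography had no significant effect on β diversity. (2) As grazing intensity increased, α diversity declined gradually on both flats and slopes. The plant community membership types (dominant, common, and rare species) responded differently to grazing and contributed differently to α diversity: rare species contributed most, common species next, and dominant species least. (3) The response of γ diversity to grazing intensity depended on topography: with increasing grazing intensity, γ diversity declined gradually on flats but first decreased and then increased on slopes. (4) β diversity on flats decreased gradually with increasing grazing intensity, whereas slopes showed no clear pattern. The study shows that the response of plant diversity to grazing is regulated by precipitation and topography, that flats buffer grazing better than slopes, that drought aggravates the effect of overgrazing on biodiversity, and that rare species are important for maintaining diversity in grassland ecosystems. Precipitation and topography should therefore be taken into account when setting a reasonable grazing intensity. Protection of plant diversity should be strengthened on flats in normal years and on slopes in wet years, so as to achieve sustainable use of grassland resources.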
