Similar literature
19 similar articles found.
1.
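Protein structure analysis of the human Rh blood group system based on amino acid characteristic sequences   Cited by: 1 (self-citations: 0; by others: 1)
Gao Lei, Zhu Ping. 《生物信息学》 (Chinese Journal of Bioinformatics), 2009, 7(4): 248-251
Drawing on the idea of homomorphism from algebra, the idea of "coarse-graining" from physics, and the HP model, and classifying a, t, c, g by chemical structure, this study proposes the concept of characteristic sequences of a DNA sequence (σ-, τ-, σ∩τ-) and extends it to protein sequences, which yields a numerical characterization that reduces a protein sequence to a (0,1) sequence. Building on this approach, and using the relationship between amino acid molecular weight and degeneracy, it proposes a further characteristic sequence concept for DNA sequences (-) and extends it to protein sequences, which yields a second numerical characterization that reduces a protein sequence to a (0,1,2) sequence. Comparing the numerical characterization plots of the characteristic sequences of the RHD and RHCE genes shows that both genes prefer amino acids with low molecular weight and high degeneracy.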
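
The reduction of a protein sequence to a (0,1) sequence can be sketched in a few lines. This is an illustration only: the abstract does not list which residues fall in which class, so the set `HYDROPHOBIC` below is an assumed HP-style partition, and `hp_sequence` is a hypothetical helper name.

```python
# Assumed hydrophobic (H) residues; every other residue is treated as polar (P).
# The paper's actual partition is not given in the abstract.
HYDROPHOBIC = set("AVLIMFWC")

def hp_sequence(protein: str) -> list[int]:
    """Map each residue to 1 (hydrophobic) or 0 (polar)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in protein.upper()]

if __name__ == "__main__":
    print(hp_sequence("MKVLA"))
```

A (0,1,2) characterization would follow the same pattern with a three-way lookup table in place of the set.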

2.
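Implementation and application of an entropy calculation tool for amino acid sequence sets   Cited by: 1 (self-citations: 0; by others: 1)
Analysis of conserved and variable regions in amino acid sequences is a key step in analyzing and predicting protein structure and function. To meet this need, this study wrote the Entropy software, which computes entropy values for sets of amino acid sequences, performs statistical analysis, and automatically generates a dominant-sequence model. The software was used to analyze the features of the hemagglutinin amino acid sequences of influenza A virus. It provides a reliable tool for conservation analysis of amino acid sequence sets.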
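
A per-position entropy calculation of the kind described might look like the sketch below. It assumes the sequences are already aligned to equal length and uses Shannon entropy in bits; the Entropy software's actual formula and its "dominant-sequence model" are not specified in the abstract, so `consensus` (most frequent residue per column) is only one plausible reading.

```python
import math
from collections import Counter

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropy(seqs: list[str]) -> list[float]:
    """Entropy of every column of equal-length, pre-aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*seqs)]

def consensus(seqs: list[str]) -> str:
    """Most frequent residue at each position (assumed reading of 'dominant sequence')."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

if __name__ == "__main__":
    toy = ["MKV", "MKL", "MRL", "MKI"]  # made-up example data
    print(alignment_entropy(toy))
    print(consensus(toy))
```

Columns with entropy near zero are conserved; higher values mark variable positions.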

3.
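To study the effect of primary structure on protein thermostability, the software DNAMAN was used to analyze the amino acid content of 32 protein sequences from 16 families, and the effect of amino acid composition on thermostability was analyzed statistically. Comparing the high- and low-temperature protein sequences within each family, and all high- and low-temperature sequences across the 16 families, indicates that from low to high temperature the Ser and Cys contents decrease significantly, while the Arg, Ile, and Pro contents increase significantly. The authors conclude that high-temperature proteins tend to contain hydrophobic amino acids and to avoid hydrophilic ones.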
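
The composition comparison can be illustrated with a short sketch. The study used DNAMAN; the functions below are hypothetical stand-ins that compute percentage composition and the change from a low-temperature to a high-temperature sequence, and the sequences in the usage lines are made up.

```python
from collections import Counter

def composition(seq: str) -> dict[str, float]:
    """Percentage of each amino acid in a sequence."""
    counts = Counter(seq.upper())
    total = len(seq)
    return {aa: 100.0 * n / total for aa, n in counts.items()}

def composition_shift(low: str, high: str) -> dict[str, float]:
    """Change in percentage content going from the low- to the high-temperature protein."""
    lo, hi = composition(low), composition(high)
    return {aa: hi.get(aa, 0.0) - lo.get(aa, 0.0) for aa in sorted(set(lo) | set(hi))}

if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(composition_shift(low="SSCR", high="SRRI"))
```

Negative values mark residues that become rarer in the high-temperature sequence, positive values those that become more common.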

4.
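Ai Liang, Feng Jie. 《生物信息学》 (Chinese Journal of Bioinformatics), 2023, 21(3): 179-186
This paper proposes a new, fast, alignment-free method for protein sequence similarity and evolutionary analysis. To characterize a protein sequence, 10 physicochemical properties of the amino acids are first condensed into 6 principal components by principal component analysis, and the principal component scores are averaged using the count of each amino acid in the sequence as weights. Amino acid position information is then incorporated to form a 26-dimensional feature vector for the sequence. Finally, Euclidean distance is used to measure the similarity and evolutionary relationships between protein sequences. Tests on 3 protein sequence datasets show that the method clusters every protein sequence correctly and is simple and fast, which demonstrates its effectiveness.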

5.
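To extract more of the information hidden in protein sequences, this study places the 20 amino acids evenly on the unit circle to obtain a two-dimensional coordinate for each amino acid, combines these coordinates with 6 physicochemical indices of the amino acids, and thus characterizes a protein sequence by an eight-dimensional vector. To keep differences in data range from distorting the analysis, the eight-dimensional vectors are normalized. Based on the normalized vector representation, a neural network is used to classify protein sequences, and the Euclidean distance between vectors quantifies the similarity between sequences. Finally, using the ND5 protein sequences of 9 species and the ND6 protein sequences of 8 species as examples, with the Clustal W sequence alignment method as the benchmark, the proposed method is validated and compared with the 5-letter method. The results show that the proposed method is effective.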
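
The geometric part of this representation can be sketched as follows. The ordering of the amino acids on the circle, the min-max form of normalization, and all function names are assumptions made for illustration; the abstract does not give the 6 physicochemical indices, so they are omitted here.

```python
import math

# Assumed ordering on the circle (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def circle_coords(aa: str) -> tuple[float, float]:
    """Place the 20 amino acids at equal angular spacing on the unit circle."""
    angle = 2 * math.pi * AMINO_ACIDS.index(aa) / len(AMINO_ACIDS)
    return (math.cos(angle), math.sin(angle))

def min_max_normalize(vectors: list[list[float]]) -> list[list[float]]:
    """Rescale every dimension to [0, 1] so no dimension dominates the distance."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(x - lo) / sp if sp else 0.0 for x, lo, sp in zip(v, lows, spans)]
            for v in vectors]

def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.dist(u, v)

if __name__ == "__main__":
    print(circle_coords("A"))
    print(min_max_normalize([[1, 10], [3, 20], [2, 30]]))
```

In a fuller version, each sequence's vector would hold the two circle coordinates plus six property values before normalization and distance computation.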

6.
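Chen Hao, Zhu Sheng, Chen Liangbiao. 《遗传学报》 (Acta Genetica Sinica), 2005, 32(3): 315-321
In the 1970s, Ohno proposed a theory of the origin of functional proteins, holding that periodic repetition of oligopeptide fragments is one way in which proteins originate. Internal repeats within proteins are important in the evolution of protein sequences. Eight representative species of prokaryotes, archaea, and eukaryotes were selected, and a new method for extracting internal repeat fragments from proteins was designed. A matrix was used to display the types of repeat fragments and their frequencies, which preserves the sequence features of the repeats while allowing a global statistical description. The analysis shows that eukaryotes use simple repeat sequences at high frequency, eubacteria also use them but at low frequency, and archaea use almost none. Further study shows that, in all three groups, the biased use of amino acids in internal repeats is closely correlated with the amino acid usage frequency of the proteome. The correlation coefficient exceeds 0.95 in eubacteria and archaea and is slightly lower in eukaryotes. The heavy use of simple repeats in eukaryotic proteomes, together with the lower correlation between the two in amino acid usage, suggests that the rapid evolution of simple repeat sequences is a key factor behind the high complexity of eukaryotic proteomes.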

8.
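Ma Peng, Wang Lianjie. 《生物工程学报》 (Chinese Journal of Biotechnology), 2007, 23(6): 1082-1085
Nucleic acid sequences contain some protein structural information. Based on the number of hydrogen bonds that the middle base of a codon in the genetic code table normally forms when base-paired, the 20 amino acids were tentatively divided into two classes, and self-written software was used to analyze statistically how the two classes cluster in a protein secondary structure database. The results show that, under this division, an amino acid residue is more likely to be adjacent to residues of the same class, and that these clusters show some preference for particular secondary structures. Finally, an amino acid sequence was designed according to this method, and the structure predicted for it by a prediction server is given.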
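
One way to read the two-class division is sketched below. The class membership is derived here from the standard genetic code (middle base A or U pairs with 2 hydrogen bonds, G or C with 3) and is an assumption; the paper's own table is not given in the abstract, and `same_class_fraction` is a hypothetical name for the adjacency statistic.

```python
# Assumed classes, derived from the middle codon base in the standard genetic code.
WEAK = set("FLIMVYHQNKDE")   # middle base U or A (2 hydrogen bonds when paired)
STRONG = set("SPTACWRG")     # middle base C or G (3 hydrogen bonds when paired)

def same_class_fraction(seq: str) -> float:
    """Fraction of adjacent residue pairs whose members fall in the same class."""
    seq = seq.upper()
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return 0.0
    same = sum((a in WEAK) == (b in WEAK) for a, b in pairs)
    return same / len(pairs)

if __name__ == "__main__":
    print(same_class_fraction("FLSP"))
```

Comparing this fraction against the value expected for shuffled sequences would indicate whether same-class residues cluster more than chance predicts.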

9.
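Preliminary analysis of the complete genome sequences of the SARS coronavirus   Cited by: 4 (self-citations: 0; by others: 4)
Multiple sequence alignment of 12 fully sequenced SARS virus genomes showed that the main body of the sequences, 29,708 bases, has 99.82% identical bases. Apart from deletions of 5 and 6 bases in two of the sequences, the remainder differs at 42 nucleotide sites, and the differences at 28 of these sites can change an amino acid residue. Bioinformatics tools for protein secondary structure prediction, transmembrane helix prediction, and protein localization were used to analyze the protein conformation at the sites of these amino acid changes and to infer possible structural and functional changes, providing a reference for further biological experiments. All results are also published on the anti-SARS website of the Center for Bioinformatics, Peking University (antisars.cbi.pku.edu.cn).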
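
Counting identical and differing sites in a multiple alignment, as reported above, can be sketched like this. The functions assume equal-length, pre-aligned sequences with `-` for gaps; the names and the toy alignment are illustrative and do not reproduce the study's pipeline.

```python
def variable_sites(aligned: list[str]) -> list[int]:
    """0-based columns where the non-gap bases are not all the same."""
    return [i for i, col in enumerate(zip(*aligned))
            if len(set(col) - {"-"}) > 1]

def percent_identical(aligned: list[str]) -> float:
    """Percentage of columns in which every sequence has the same character."""
    same = sum(1 for col in zip(*aligned) if len(set(col)) == 1)
    return 100.0 * same / len(aligned[0])

if __name__ == "__main__":
    toy = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGT-C"]  # made-up alignment
    print(variable_sites(toy))
    print(round(percent_identical(toy), 2))
```

Here a column containing a deletion is not counted as a substitution site, but it does lower the percentage of identical columns.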

10.
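To analyze the whole-genome sequence features of hepatitis A virus (HAV) epidemic strains that share the same VP1-2A region sequence, acute-phase serum specimens were collected from some hepatitis A cases in the same or different hepatitis A outbreaks, in different years, in 6 provinces and municipalities of China. HAV RNA was extracted, the VP1-2A region was genotyped, near-complete HAV genome sequences were amplified in segments by RT-PCR, a phylogenetic tree was constructed, and genome features were analyzed. The study obtained 16 near-complete HAV genome sequences, all of subgenotype IA, with nucleotide and amino acid sequence homologies of 95.81% to 100% and 99.23% to 100%, respectively. Compared with the HAV sequences in GenBank, 15 sequences were closest to Man12-001 (a strain from Mongolia), with nucleotide and amino acid sequence homologies of 97.87% to 99.81% and 99.55% to 99.95%, respectively; 1 sequence was closest to HAJEF-K12 (a strain from Japan), with homologies of 99.37% and 99.91%. For the HAV epidemic strains in this study with identical VP1-2A region sequences, the near-complete genome nucleotide sequences differed by 0% to 0.03% when the strains came from the same outbreak and by 0.18% to 0.99% when they came from different outbreaks. No variation was found at the published neutralizing epitopes in the 16 HAV sequences, and the coding-region amino acid sequences were under negative selec...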

11.
The amino acid sequences of some fiber proteins possibly have a periodic structure. This periodicity can be analyzed using the Fourier transform of the mathematical image of the symbol sequence of amino acid residues in proteins. One of several possible methods of Fourier transform has been chosen as optimal for the given study. This optimal Fourier transform has been used to analyze the periodic structures in several fiber proteins of bacteriophage T4. Amino acids from some groups form sequences of alternating elements with a relatively small period (T=15); those from other groups form sequences with other small periods (T=10 and T=8). Relatively large periods of amino acid arrangement, with the entire amino acid sequence of the protein being divided between them into four or six equal parts, are a new finding. The data on protein structural periodicity make it possible to align the amino acid sequences according to the periodic structures of both types. The results obtained agree with the results of previous crystallographic and electron microscopic studies. Translated from Molekulyarnaya Biologiya, Vol. 39, No. 2, 2005, pp. 321–329. Original Russian Text Copyright © 2005 by Simakova, Simakov.
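
Detecting a period in the placement of a residue group can be sketched with a plain discrete Fourier sum. The abstract does not say which transform variant the authors judged optimal, so the indicator encoding, the mean subtraction, and the function names below are assumptions for illustration.

```python
import cmath

def indicator(seq: str, group: set[str]) -> list[float]:
    """Numeric image of a sequence: 1.0 where the residue belongs to the group."""
    return [1.0 if aa in group else 0.0 for aa in seq]

def power_at_period(signal: list[float], period: float) -> float:
    """Spectral power of the mean-subtracted signal at the given period."""
    n = len(signal)
    mean = sum(signal) / n
    s = sum((x - mean) * cmath.exp(-2j * cmath.pi * k / period)
            for k, x in enumerate(signal))
    return abs(s) ** 2 / n

def best_period(signal: list[float], periods: list[float]) -> float:
    """Candidate period with the highest spectral power."""
    return max(periods, key=lambda t: power_at_period(signal, t))

if __name__ == "__main__":
    toy = ("G" + "A" * 7) * 10  # artificial sequence with a glycine every 8 residues
    print(best_period(indicator(toy, {"G"}), [8, 10, 15]))
```

On real sequences the peak would be compared against a background, for example the power obtained from shuffled sequences, before a period is called significant.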

12.
The review deals with repeating fragments of amino acid sequences, so-called "motifs", that are important in maintaining the structural integrity and/or function of various proteins, especially those interacting with phospholipid aggregates. The occurrence of the Phe-Leu-Gly motif, characteristic of the amino acid sequences of the fusion peptides of primate immunodeficiency viruses, is analysed in various proteins, together with tripeptide fragments of the general formula Xaa-Xab-Gly (Xaa = Phe, Tyr; Xab = hydrophobic amino acids Ala, Val, Leu, Ile) homologous to this motif and the retro-sequences Gly-Xab-Xaa. These tripeptide repeats are characteristic of the amino acid sequences of complex membrane proteins, viral envelope proteins, proteinases, and proteins connected with energy transfer or interacting with lipids. The repeats are frequently found in conserved regions of amino acid sequences, at sites readily accessible to other molecules at the boundary of or between structured fragments, owing to the preference of the backbone in the given amino acid fragment for a semi-coiled form. This protein motif appears to play an important role in the initial stages of a large protein's interaction with the phospholipid membrane.

13.
Evolution of the amino acid substitution in the mammalian myoglobin gene   Cited by: 1 (self-citations: 0; by others: 1)
Summary. Multivariate statistical analyses were applied to 16 physical and chemical properties of amino acids. Four of these properties (volume, polarity, isoelectric point (charge), and hydrophobicity) were found to explain adequately 96% of the total variance of amino acid attributes. Using these four quantitative measures of amino acid properties, a structural discriminate function in the form of a weighted difference sum of squares equation was developed. The discriminate function is weighted by the location of each particular residue within a given tertiary structure and yields a numerical discriminate or difference value for the replacement of these residues by different amino acids. This resulting discriminate value represents an expression of the perturbation in the local positional environment of a protein when an amino acid substitution occurs. With the use of this structural discriminate function, a residue by residue comparison of the known mammalian myoglobin sequences was carried out in an attempt to elucidate the positions of possible deviations from the known tertiary structure of sperm whale myoglobin. Only 11 of the 153 residue positions in myoglobin demonstrated possible structural deviations. From this analysis, indices of difference were calculated for all amino acid exchanges between the various myoglobins. All comparisons yielded indices of difference that were considerably lower than would be expected if mutations had been fixed at random, even if the organization of the genetic code is taken into consideration. On the basis of these results, it is inferred that some form of selection has acted in the evolution of mammalian myoglobins to favor amino acid substitutions that are compatible with the retention of the original conformation of the protein.

14.
15.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.

16.
Mishra P, Pandey PN. Bioinformation, 2011, 6(10): 372-374
The number of amino acid sequences is increasing very rapidly in protein databases such as Swiss-Prot, UniProt, PIR and others, but the structures of only some amino acid sequences are found in the Protein Data Bank. Thus, an important problem in genomics is automatically clustering homologous protein sequences when only sequence information is available. Here, we use graph theoretic techniques for clustering amino acid sequences. A similarity graph is defined, and clusters in that graph correspond to connected subgraphs. Cluster analysis seeks to group amino acid sequences into subsets based on a distance or similarity score between pairs of sequences. Our goal is to find disjoint subsets, called clusters, such that two criteria are satisfied: homogeneity (sequences in the same cluster are highly similar to each other) and separation (sequences in different clusters have low similarity to each other). We tested our method on several subsets of the SCOP (Structural Classification of Proteins) database, a gold standard for protein structure classification. The results show that for a given set of proteins the number of clusters we obtained is close to the number of superfamilies in that set; there are fewer singletons; and the method correctly groups most remote homologs.
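
The idea of clusters as connected subgraphs of a similarity graph can be sketched with a union-find structure. This is a minimal reading of the abstract, not the authors' algorithm: the threshold rule, the `cluster` function, and the similarity scores in the usage lines are all hypothetical.

```python
def cluster(names: list[str],
            similarity: dict[tuple[str, str], float],
            threshold: float) -> list[list[str]]:
    """Group sequences into connected components of the thresholded similarity graph."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Follow parent links to the component root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # An edge exists only when the pairwise score reaches the threshold.
    for (a, b), score in similarity.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

if __name__ == "__main__":
    scores = {("p1", "p2"): 0.9, ("p2", "p3"): 0.8,
              ("p3", "p4"): 0.2, ("p4", "p5"): 0.7}  # made-up scores
    print(cluster(["p1", "p2", "p3", "p4", "p5"], scores, threshold=0.5))
```

Raising the threshold favors homogeneity within clusters; lowering it merges clusters and reduces the number of singletons.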

17.
Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10-12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.
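
Recoding a sequence in a reduced alphabet is a simple lookup, sketched below. The 10-letter grouping in `GROUPS` is a hypothetical example chosen by rough physicochemical similarity; it is not the reduction derived in the paper.

```python
# Hypothetical 10-group reduction; each group is represented by its first letter.
GROUPS = ["AG", "C", "DE", "FWY", "H", "ILMV", "KR", "NQ", "P", "ST"]
REDUCE = {aa: group[0] for group in GROUPS for aa in group}

def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence using one representative letter per group."""
    return "".join(REDUCE[aa] for aa in seq.upper())

if __name__ == "__main__":
    print(reduce_sequence("MKVLAW"))
```

Sequences recoded this way can be aligned with a correspondingly reduced substitution matrix to test how much homology signal survives the reduction.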

18.
19.
The amino acid sequences of rat ribosomal proteins L27a and L28 were deduced from the sequences of nucleotides in recombinant cDNAs and confirmed from the NH2-terminal amino acid sequences of the proteins. L27a contains 147 amino acids (the NH2-terminal methionine is removed after translation of the mRNA) and has a molecular weight of 16 476. Hybridization of the cDNA to digests of nuclear DNA suggests that there are 18-22 copies of the L27a gene. The mRNA for the protein is about 600 nucleotides in length. L27a is homologous to mouse L27a (there are 3 amino acid changes) and to yeast L29. Rat ribosomal protein L28 has 136 amino acids (its NH2-terminal methionine is also processed after translation) and has a molecular weight of 15 707. Hybridization of the cDNA to digests of nuclear DNA suggests that there are 9 or 10 copies of the L28 gene. The mRNA for the protein is about 640 nucleotides in length. L28 contains a possible internal duplication of 9 residues. Corrections are recorded in the sequences reported before for rat ribosomal proteins S4 and S12.
