Similar articles
1.
Z. Yang, S. Kumar, M. Nei. Genetics, 1995, 141(4): 1641-1650
A statistical method was developed for reconstructing the nucleotide or amino acid sequences of extinct ancestors, given the phylogeny and sequences of the extant species. A model of nucleotide or amino acid substitution was employed to analyze data of the present-day sequences, and maximum likelihood estimates of parameters such as branch lengths were used to compare the posterior probabilities of assignments of character states (nucleotides or amino acids) to interior nodes of the tree; the assignment having the highest probability was the best reconstruction at the site. The lysozyme c sequences of six mammals were analyzed by using the likelihood and parsimony methods. The new likelihood-based method was found to be superior to the parsimony method. The probability that the amino acids for all interior nodes at a site reconstructed by the new method are correct was calculated to be 0.91, 0.86, and 0.73 for all, variable, and parsimony-informative sites, respectively, whereas the corresponding probabilities for the parsimony method were 0.84, 0.76, and 0.51, respectively. The probability that an amino acid in an ancestral sequence is correctly reconstructed by the likelihood analysis ranged from 91.3 to 98.7% for the four ancestral sequences.

2.
Protein sequences are treated as stochastic processes on the basis of a reduced amino acid alphabet of 10 types of amino acids. The realization of a stochastic process is described by an associated transition probability matrix that corresponds to the process uniquely. New distances between transition probability matrices are then defined for sequence similarity analysis. Two separate datasets are prepared and tested to assess the validity of the method. The results demonstrate that the new method is powerful and efficient.
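
The idea in item 2 can be illustrated with a minimal sketch. The abstract does not give the 10-letter reduced alphabet or the new matrix distances, so the grouping `GROUPS` below is hypothetical and the Frobenius norm is only a stand-in for the unspecified distance.

```python
import math

# Hypothetical 10-group reduction of the 20 amino acids.
# This is NOT the grouping used in the paper, which the abstract does not state.
GROUPS = ["AG", "ST", "C", "P", "DE", "NQ", "KR", "H", "ILMV", "FWY"]
REDUCE = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def transition_matrix(seq, k=len(GROUPS)):
    """Row-normalized counts of transitions between consecutive reduced symbols."""
    counts = [[0.0] * k for _ in range(k)]
    # Residues outside the 20 standard amino acids are skipped.
    states = [REDUCE[aa] for aa in seq if aa in REDUCE]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    for row in counts:
        total = sum(row)
        if total:
            row[:] = [c / total for c in row]
    return counts


def frobenius_distance(p, q):
    """Stand-in distance between two matrices (assumption, not the paper's)."""
    return math.sqrt(
        sum((x - y) ** 2 for rp, rq in zip(p, q) for x, y in zip(rp, rq))
    )
```

Two sequences would then be compared with `frobenius_distance(transition_matrix(s1), transition_matrix(s2))`; rows for reduced symbols that never occur stay all-zero in this sketch.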

3.
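Proteins are highly diverse in sequence, structure, and function. Many studies have shown that a protein's structure depends on the order of its amino acid sequence, and that the local amino acid sequence environment has some influence on the structure. This paper proposes a new method for predicting protein structure type based on the statistical torsion-angle preferences of amino acid 5-mers; the method predicts the structure type from the statistical torsion-angle preferences of the central amino acid of each 5-mer in the PDB database. The new method can, through computer simulation...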

4.
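To extract more of the information hidden in protein sequences, this study places the 20 amino acids evenly on the unit circle to obtain two-dimensional coordinates for each amino acid, combines these with six physicochemical indices of the amino acids, and characterizes each protein sequence with an eight-dimensional vector. To keep differences in data range from distorting the analysis, the eight-dimensional vectors are normalized. Based on the normalized vector representation, a neural network is used to classify protein sequences, and the Euclidean distance between vectors is used to quantify the similarity between sequences. Finally, using the ND5 protein sequences of 9 species and the ND6 protein sequences of 8 species as examples, and taking Clustal W sequence alignment as the benchmark, the proposed method is validated and compared with the 5-letter method; the results show that the proposed method is effective.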
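
The geometric part of the encoding in item 4 (20 amino acids evenly spaced on the unit circle, compared by Euclidean distance) can be sketched as follows. The abstract does not state the order of the amino acids around the circle, which six physicochemical indices are used, or how per-residue values are aggregated into a sequence vector, so the alphabetical ordering below is an assumption and those other steps are left out.

```python
import math

# Assumed ordering around the circle (alphabetical by one-letter code);
# the ordering used in the paper is not given in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def circle_coordinates(order=AMINO_ACIDS):
    """Map each amino acid to a point on the unit circle, evenly spaced."""
    n = len(order)
    return {
        aa: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, aa in enumerate(order)
    }


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Each amino acid's two coordinates would then be joined with its six physicochemical values to give the eight dimensions mentioned in the abstract.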

5.
6.
Bio-support vector machines for computational proteomics
MOTIVATION: One of the most important issues in computational proteomics is to produce a prediction model for the classification or annotation of biological function of novel protein sequences. In order to improve the prediction accuracy, much attention has been paid to improving the performance of the algorithms used, but little to solving the fundamental issue, namely amino acid encoding, as most existing pattern recognition algorithms are unable to recognize amino acids in protein sequences. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias. RESULTS: By replacing kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs to obtain a new type of pattern recognition algorithm for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machines. When applied to the prediction of HIV protease cleavage sites, the new method has shown a remarkable advantage in reducing the model complexity and enhancing the model robustness.

7.
A new method based on the near infrared technique has been developed for the noninvasive and nondestructive determination of the identity and sequences of amino acid residues in small peptides. The method is capable of distinguishing not only peptides with very similar structures (e.g., Gly-Ala-Ala, Gly-Ala-Leu, Leu-Gly-Gly and Gly-Leu-Leu-Gly, Gly-Leu-Gly-Gly, Gly-Gly-Ala-Gly) but also peptides with the same amino acid residues but different sequences (e.g., Gly-Ala-Ala, Ala-Gly-Ala, Ala-Ala-Gly and Gly-Gly-Gly-Ala, Gly-Gly-Ala-Gly).

8.
Unbiased estimation of evolutionary distance between nucleotide sequences
A new algorithm for estimating the number of nucleotide substitutions per site (i.e., the evolutionary distance) between two nucleotide sequences is presented. This algorithm can be applied to many estimation methods, such as Jukes and Cantor's method, Kimura's transition/transversion method, and Tajima and Nei's method. Unlike ordinary methods, this algorithm is always applicable. Numerical computations and computer simulations indicate that this algorithm gives an almost unbiased estimate of the evolutionary distance, unless the evolutionary distance is very large. This algorithm should be useful especially when we analyze short nucleotide sequences. It can also be applied to amino acid sequences, for estimating the number of amino acid replacements.
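
For context on item 8, the sketch below shows the standard Jukes and Cantor correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. It is the ordinary estimator, not the new algorithm from the abstract, and the function names are illustrative. The logarithm is undefined once p reaches 3/4, which can happen by chance in short alignments and is one way an ordinary estimator becomes inapplicable.

```python
import math


def p_distance(seq1, seq2):
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p):
    """Standard Jukes-Cantor distance; None where the formula is undefined."""
    if p >= 0.75:
        return None  # log of a non-positive number: estimator not applicable
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, `jukes_cantor(p_distance("ACGT", "ACGA"))` corrects an observed difference of 0.25 upward to roughly 0.30 substitutions per site.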

9.
A new method based on the analysis of the oligopeptide composition of amino acid sequences from different protein families is presented. We assume that any protein family can be characterized by a set of oligopeptides (an oligopeptide vocabulary). We demonstrate that comparing oligopeptide vocabularies can distinguish different families from each other and from random sequences. Notably, this comparison can be performed successfully on a set of only 25 dipeptides and without preliminary alignment. We demonstrate that characteristic peptides are localized in regions of functional significance, as shown for the GTP-binding domain of translation elongation factors. We suggest how to use this method to localize the boundaries of functional domains in amino acid sequences. Using a few functional domains as examples, we demonstrate that the average prediction error does not exceed 3-4 amino acid residues.

10.
Protein sequence comparison based on the wavelet transform approach
A protein's chemical properties, the chain conformation, the function of the protein and its species specificity are determined by the information contained in the amino acid sequence. Proteins with similar functions have, at some level, identical stretches of amino acid sequence. The closer the phylogenetic relationship, the more similar are the sequences. To find the similarities between two or more protein sequences is of great importance for protein sequence analysis. The differences in the amino acid sequences permit the construction of a family tree of evolution. In this work, a comparison method was devised that is capable of analysing a protein sequence 'hierarchically', i.e. it can examine a protein sequence at different spatial resolutions. Based on a wavelet decomposition of protein sequences and a cross-correlation study, a sequence-scale similarity concept is proposed for generating a similarity vector, which renders the comparison of two sequences feasible at different spatial resolutions (scales). This new similarity concept is an expansion of the conventional sequence similarity, which only takes into account the local pairwise amino acid match and ignores the information contained in coarser spatial resolutions.

11.
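艾亮 (Ai Liang), 冯杰 (Feng Jie). 《生物信息学》 (Chinese Journal of Bioinformatics), 2023, 21(3): 179-186
This paper proposes a new, fast, alignment-free method for protein sequence similarity and evolutionary analysis. To characterize a protein sequence, ten physicochemical properties of the amino acids are first condensed into six principal components by principal component analysis, and the principal component scores are averaged using the count of each amino acid in the sequence as weights. Amino acid position information is then incorporated to form a 26-dimensional feature vector for the protein sequence. Finally, Euclidean distance is used to measure the similarity and evolutionary relationships between protein sequences. Tests on three protein sequence datasets show that the proposed method clusters each protein sequence accurately and is simple and fast, which demonstrates its effectiveness.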

12.
We present a model of amino acid sequence evolution based on a hidden Markov model that extends to transmembrane proteins previous methods that incorporate protein structural information into phylogenetics. Our model aims to give a better understanding of processes of molecular evolution and to extract structural information from multiple alignments of transmembrane sequences and use such information to improve phylogenetic analyses. This should be of value in phylogenetic studies of transmembrane proteins: for example, mitochondrial proteins have acquired a special importance in phylogenetics and are mostly transmembrane proteins. The improvement in fit to example data sets of our new model relative to less complex models of amino acid sequence evolution is statistically tested. To further illustrate the potential utility of our method, phylogeny estimation is performed on primate CCR5 receptor sequences, sequences of l and m subunits of the light reaction center in purple bacteria, guinea pig sequences with respect to lagomorph and rodent sequences of calcitonin receptor and K-substance receptor, and cetacean sequences of cytochrome b.

13.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.

14.
The rational design of peptide and protein helices is not only of practical importance for protein engineering but also is a useful approach in attempts to improve our understanding of protein folding. Recent modifications of theoretical models of helix‐coil transitions allow accurate predictions of the helix stability of monomeric peptides in water and provide new possibilities for protein design. We report here a new method for the design of α‐helices in peptides and proteins using AGADIR, the statistical mechanical theory for helix‐coil transitions in monomeric peptides, and the tunneling algorithm of global optimization of multidimensional functions for optimization of amino acid sequences. CD measurements of helical content of peptides with optimized sequences indicate that the helical potential of protein amino acids is high enough to allow formation of stable α‐helices in peptides as short as 10 residues in length. The results show the maximum achievable helix content (HC) of short peptides with fully optimized sequences at 5 °C is expected to be ~70–75%. Under certain conditions the method can be a powerful practical tool for protein engineering. Unlike traditional approaches that are often used to increase protein stability by adding a few favorable interactions to the protein structure, this method deals with all possible sequences of protein helices and selects the best one from them. Copyright © 2009 European Peptide Society and John Wiley & Sons, Ltd.

15.
We present a new support vector machine (SVM)-based approach to predict the substrate specificity of subtypes of a given protein sequence family. We demonstrate the usefulness of this method on the example of aryl acid-activating and amino acid-activating adenylation domains (A domains) of nonribosomal peptide synthetases (NRPS). The residues of gramicidin synthetase A that are 8 Å around the substrate amino acid and corresponding positions of other adenylation domain sequences with 397 known and unknown specificities were extracted and used to encode this physico-chemical fingerprint into normalized real-valued feature vectors based on the physico-chemical properties of the amino acids. The SVM software package SVM(light) was used for training and classification, with transductive SVMs to take advantage of the information inherent in unlabeled data. Specificities for very similar substrates that frequently show cross-specificities were pooled to the so-called composite specificities and predictive models were built for them. The reliability of the models was confirmed in cross-validations and in comparison with a currently used sequence-comparison-based method. When comparing the predictions for 1230 NRPS A domains that are currently detectable in UniProt, the new method was able to give a specificity prediction in an additional 18% of the cases compared with the old method. For 70% of the sequences both methods agreed, for <6% they did not, mainly on low-confidence predictions by the existing method. None of the predictive methods could infer any specificity for 2.4% of the sequences, suggesting completely new types of specificity.

16.
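Taking as its starting point the fact that proteins fold into the stable structural type with minimum free energy, and aiming to reveal the dynamical nature of protein folding in space, this study works on a database of non-homologous proteins. It uses the amino acid frequencies and the autocovariance function of each protein sequence as the feature vector and computes the eigenvalues of the covariance matrix that characterizes the coupling and cooperative interactions among the components of the feature vector. Compared with Chou's method, this reflects more fully the degeneracy, global character, and ambiguity of the protein folding code, and it provides a dynamical-parameter analysis method for quantitatively characterizing proteins that fold into different structural classes.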

17.
A new method for detecting site-specific variation of evolutionary rate (the so-called covarion process) from protein sequence data is proposed. It involves comparing the maximum-likelihood estimates of the replacement rate of an amino acid site in distinct subtrees of a large tree. This approach allows detection of covarion at the gene or the amino acid levels. The method is applied to mammalian-mitochondrial-protein sequences. Significant covarion-like evolution is found in the (simian) primate lineage: some amino acid positions are fast-evolving (i.e. unconstrained) in non-primate mammals but slow-evolving (i.e. highly constrained) in primates, and some show the opposite pattern. Our results indicate that the mitochondrial genome of primates reached a new peak of the adaptive landscape through positive selection.

18.
Prediction of protein secondary structure by the hidden Markov model
The purpose of this paper is to introduce a new method for analyzing the amino acid sequences of proteins using the hidden Markov model (HMM), which is a type of stochastic model. Secondary structures such as helix, sheet and turn are learned by HMMs, and these HMMs are applied to new sequences whose structures are unknown. The output probabilities from the HMMs are used to predict the secondary structures of the sequences. The authors tested this prediction system on 100 sequences from a public database (Brookhaven PDB). Although the implementation is 'without grammar' (no rule for the appearance patterns of secondary structure) the result was reasonable.

19.
All popular algorithms for pairwise alignment of protein primary structures (e.g. Smith-Waterman (SW), FASTA, BLAST, etc.) utilize only amino acid sequences. The SW algorithm is the most accurate among them, i.e. it produces alignments that are most similar to the alignments obtained by superposition of protein 3D structures. But even the SW algorithm is unable to restore the 3D-based alignment if the similarity of the amino acid sequences (%id) is below 30%. We have proposed a novel alignment method that explicitly takes into account the secondary structure of the compared proteins. We have shown that it creates significantly more accurate alignments than the SW algorithm. In particular, for sequences with %id < 30% the average accuracy of the new method is 58% compared to 35% for the SW algorithm (the accuracy of an algorithmic sequence alignment is the fraction of positions restored from a "gold standard" alignment obtained by superposition of the corresponding 3D structures). The accuracy of the proposed method is approximately the same for experimental and for theoretically predicted secondary structures. Thus the method can be applied to the alignment of protein sequences even if the protein 3D structure is unknown. The program is available at ftp://194.149.64.196/STRUSWER/.

20.
The main work of this paper is to propose a new theory and method, based on the idea of pseudo-amino acid composition, for phylogenetic analysis of DNA primary sequences. In our method, we revise the occurrence-frequency part of the pseudo-amino acid composition method, replacing the frequencies of the 20 amino acids with the frequencies of the 16 dinucleotides. We then select eight LZ complexity factors of eight (0,1) sequences of a DNA primary sequence as the PseAA components. Finally, we characterize a DNA sequence with a 24-dimensional vector. We reconstruct the phylogenetic trees of two datasets. The results show that our method is efficient and significant.
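
The first 16 components of the vector in item 20 are dinucleotide frequencies, which can be sketched as below. The sketch assumes overlapping dinucleotides and skips pairs containing ambiguous bases; the abstract does not say how either case is handled, and the eight LZ complexity components are not covered because the eight (0,1) sequences are not specified.

```python
from itertools import product

# The 16 dinucleotides in a fixed (alphabetical) order: AA, AC, ..., TT.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]


def dinucleotide_frequencies(dna):
    """Frequencies of the 16 overlapping dinucleotides in a DNA sequence."""
    dna = dna.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(dna) - 1):
        pair = dna[i:i + 2]
        if pair in counts:  # pairs with ambiguous bases such as N are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]
```

Appending the eight complexity values to this list would give the 24 dimensions described in the abstract.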
