Similar literature
 20 similar articles found (search time: 15 ms)
1.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.

2.
Freeing phylogenies from artifacts of alignment.   Cited by: 1 (self-citations: 0; citations by others: 1)
Widely used methods for phylogenetic inference, both those that require and those that produce alignments, share certain weaknesses. These weaknesses are discussed, and a method that lacks them is introduced. For each pair of sequences in the data set, the method utilizes both insertion-deletion and amino acid replacement information to estimate a pairwise evolutionary distance. It is also possible to allow regional heterogeneity of replacement rates. Because a likelihood framework is adopted, the standard deviation of each pairwise distance can be estimated. The distance matrix and standard error estimates are used to infer a phylogenetic tree. As an example, this method is used on 10 widely diverged sequences of the second largest RNA polymerase subunit. A pseudo-bootstrap technique is devised to assess the validity of the inferred phylogenetic tree.

3.
Amino acid sequences of peptides are often inferred from their amino acid compositions by comparison with homologous peptides of known sequence. This paper considers the probability that such an approach introduces errors through balanced double changes, i.e. reciprocal substitutions, between two homologous peptides of identical composition. Formulae are derived for the calculation of these probabilities, depending on peptide length and evolutionary distance. However, because such calculations require too much computer time, the probabilities of reciprocal substitutions are estimated by simulating evolutionary changes in peptides. It can be concluded from the resulting data that for many purposes the possible errors in amino acid sequences partially inferred from amino acid compositions are acceptably small.

4.
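Total DNA was extracted from fresh young leaves of Brassica napus and used as the template. Primers were designed from the coding-region sequence of the Arabidopsis thaliana Toc33 gene, and PCR was used to amplify Toc33, the gene for a component protein of the protein translocation machinery of the chloroplast outer envelope membrane in B. napus. Two amplified bands were obtained. Sequencing showed that the two cloned fragments were 1370 bp and 1490 bp long; they were named Bn Toc33-1 and Bn Toc33-2, respectively. Sequence comparison showed 78% homology between them, with 96% homology in the exons but only 60% in the introns. To study the functional relationship between Toc33 and Toc34, a gene of the same family, the Toc33 and Toc34 protein sequences of Arabidopsis, rapeseed, Orychophragmus violaceus and other plants were compared and a molecular phylogenetic tree was constructed.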

5.
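Yang Ziheng (杨子恒) 《Acta Genetica Sinica (遗传学报)》1994,21(3):198-200
This paper examines the shortcomings of the methods currently used to estimate evolutionary distances between homologous protein sequences and proposes several new formulas that account for the evident variation in substitution rates among amino acid sites. In addition, a maximum likelihood estimation method that accounts for unequal substitution probabilities among amino acids is proposed. The formulas are compared computationally, and recommendations are made for their use in practice.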
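
To make the idea of correcting a distance for rate variation among sites concrete, here is a minimal sketch. It uses two textbook corrections, the Poisson-correction distance d = -ln(1 - p) and the gamma distance d = a[(1 - p)^(-1/a) - 1]; these are standard formulas chosen for illustration and are not necessarily the formulas proposed in the paper. The function names, the example sequences and the shape value a = 2 are all assumptions made for the example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites in two aligned sequences (gapped sites skipped)."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(p):
    """Poisson-correction distance: assumes the same substitution rate at every site."""
    return -math.log(1.0 - p)


def gamma_distance(p, shape):
    """Gamma distance: rates vary among sites following a gamma distribution.

    `shape` is the gamma shape parameter a (hypothetical value in the example);
    smaller values mean stronger rate variation and a larger correction.
    """
    return shape * ((1.0 - p) ** (-1.0 / shape) - 1.0)


if __name__ == "__main__":
    # Made-up aligned peptides, for illustration only.
    p = p_distance("ACDEFGHIKL", "ACDEFGHIKV")
    print(p, poisson_distance(p), gamma_distance(p, shape=2.0))
```

As the shape parameter grows, the gamma distance approaches the Poisson-correction distance, which is one way to see that the latter is the special case of no rate variation among sites.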

6.
Using a maximum-likelihood formalism, we have developed a method with which to reconstruct the sequences of ancestral proteins. Our approach allows the calculation of not only the most probable ancestral sequence but also of the probability of any amino acid at any given node in the evolutionary tree. Because we consider evolution on the amino acid level, we are better able to include effects of evolutionary pressure and take advantage of structural information about the protein through the use of mutation matrices that depend on secondary structure and surface accessibility. The computational complexity of this method scales linearly with the number of homologous proteins used to reconstruct the ancestral sequence.

7.
Reconstructing evolution of sequences subject to recombination using parsimony   Cited by: 14 (self-citations: 0; citations by others: 14)
The parsimony principle states that a history of a set of sequences that minimizes the amount of evolution is a good approximation to the real evolutionary history of the sequences. This principle is applied to the reconstruction of the evolution of homologous sequences where recombinations or horizontal transfer can occur. First it is demonstrated that the appropriate structure to represent the evolution of sequences with recombinations is a family of trees each describing the evolution of a segment of the sequence. Two trees for neighboring segments will differ by exactly the transfer of a subtree within the whole tree. This leads to a metric between trees based on the smallest number of such operations needed to convert one tree into the other. An algorithm is presented that calculates this metric. This metric is used to formulate a dynamic programming algorithm that finds the most parsimonious history that fits a given set of sequences. The algorithm is potentially very practical, since many groups of sequences defy analysis by methods that ignore recombinations. These methods give ambiguous or contradictory results because the sequence history cannot be described by one phylogeny, but only a family of phylogenies that each describe the history of a segment of the sequences. The generalization of the algorithm to reconstruct gene conversions and the possibility for heuristic versions of the algorithm for larger data sets are discussed.

8.
9.
Each amino acid in a protein is considered to be an individual, mutable characteristic of the species from which the protein is extracted. For a branching tree representing the evolutionary history of the known sequences in different species, our computer programs use majority logic and parsimony of mutations to determine the most likely ancestral amino acid for each position of the protein at each node of the tree. The number of mutations necessary between the ancestral and present species is summed for each branch and the entire tree. The programs then move branches to make many different configurations, from which we select the one with the minimum number of mutations as the most likely evolutionary history. We used this method to elucidate primate phylogeny from sequences of fibrinopeptides, carbonic anhydrase, and the hemoglobin beta, delta and alpha chains. All available sequences indicate that the early Pongidae had diverged into two lines before the divergence of an ancestor for the human line alone. We have constructed some probable ancestral sequences at major points during primate evolution and have developed tentative trees showing the order of divergences and evolutionary distances among primate groups. Further questions on primate evolution could be answered in the future by the determination of the appropriate sequences.
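
The counting step described above, in which ancestral residues are assigned on a fixed tree and the mutations they imply are summed, can be sketched for a single alignment position. The sketch uses Fitch's small-parsimony rule, a standard technique substituted here for illustration; it is not claimed to be the authors' program. The tree encoding (nested tuples), the taxon labels and the residues are hypothetical.

```python
def fitch(tree, leaf_states):
    """Return (candidate ancestral states, minimum number of mutations) for one site.

    `tree` is either a leaf label (str) or a tuple (left_subtree, right_subtree).
    `leaf_states` maps each leaf label to the residue observed at this site.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra mutation is needed.
        return common, left_cost + right_cost
    # The children disagree: keep both possibilities and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical four-taxon example at a single alignment position.
    states = {"t1": "A", "t2": "A", "t3": "S", "t4": "S"}
    print(fitch(((("t1", "t2"), "t3"), "t4"), states))
    print(fitch((("t1", "t3"), ("t2", "t4")), states))
```

Comparing the mutation counts returned for the two tree shapes mirrors the search step in the abstract: among the configurations tried, the one that requires the fewest mutations is preferred.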

10.
Statistical methods for computing the standard errors of the branching points of an evolutionary tree are developed. These methods are for the unweighted pair-group method-determined (UPGMA) trees reconstructed from molecular data such as amino acid sequences, nucleotide sequences, restriction-sites data, and electrophoretic distances. They were applied to data for the human, chimpanzee, gorilla, orangutan, and gibbon species. Among the four different sets of data used, DNA sequences for an 895-nucleotide segment of mitochondrial DNA (Brown et al. 1982) gave the most reliable tree, whereas electrophoretic data (Bruce and Ayala 1979) gave the least reliable one. The DNA sequence data suggested that the chimpanzee is the closest and that the gorilla is the next closest to the human species. The orangutan and gibbon are more distantly related to man than is the gorilla. This topology of the tree is in agreement with that for the tree obtained from chromosomal studies and DNA-hybridization experiments. However, the difference between the branching point for the human and the chimpanzee species and that for the gorilla species and the human-chimpanzee group is not statistically significant. In addition to this analysis, various factors that affect the accuracy of an estimated tree are discussed.
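
For readers unfamiliar with UPGMA, the clustering whose branching points are being assessed can be sketched as follows. This is a minimal illustration of UPGMA itself, not of the standard-error calculations developed in the paper. The function name, the input format and the four-taxon distance matrix are assumptions made for the example.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist: dict mapping a pair (a, b) of labels to the distance between them.
    Returns a list of merges (cluster_a, cluster_b, height), where height is
    half the average distance between the two merged clusters.
    """
    sizes = {name: 1 for name in names}            # cluster -> number of taxa in it
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)                   # closest pair of clusters
        a, b = sorted(pair, key=str)
        merged = (a, b)
        merges.append((a, b, d[pair] / 2.0))
        # Distance from the new cluster to every other cluster: size-weighted mean.
        updated = {}
        for c in sizes:
            if c == a or c == b:
                continue
            d_ac = d[frozenset((a, c))]
            d_bc = d[frozenset((b, c))]
            updated[c] = (sizes[a] * d_ac + sizes[b] * d_bc) / (sizes[a] + sizes[b])
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in updated.items():
            d[frozenset((merged, c))] = value
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return merges


if __name__ == "__main__":
    # Made-up ultrametric distances among four hypothetical taxa.
    distances = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
                 ("A", "D"): 8, ("B", "D"): 8, ("C", "D"): 8}
    for merge in upgma(["A", "B", "C", "D"], distances):
        print(merge)
```

Each reported height is a branching point of the kind whose sampling error the abstract is concerned with; two branching points whose heights differ by less than their standard errors cannot be ordered with confidence.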

11.
A new method for detecting site-specific variation of evolutionary rate (the so-called covarion process) from protein sequence data is proposed. It involves comparing the maximum-likelihood estimates of the replacement rate of an amino acid site in distinct subtrees of a large tree. This approach allows detection of covarion at the gene or the amino acid levels. The method is applied to mammalian-mitochondrial-protein sequences. Significant covarion-like evolution is found in the (simian) primate lineage: some amino acid positions are fast-evolving (i.e. unconstrained) in non-primate mammals but slow-evolving (i.e. highly constrained) in primates, and some show the opposite pattern. Our results indicate that the mitochondrial genome of primates reached a new peak of the adaptive landscape through positive selection.

12.
A protein phylogenetic tree was constructed from 24 homologous proteinase inhibitor I sequences identified in the EMBL/GenBank and Swiss-Prot databases and from translated amino acid data from four constitutive cDNA clones of proteinase inhibitor I characterized from potato tuber mRNA. The tree suggests that divergence of at least four paralogous proteins with functional specialization occurred at different times during the evolutionary history of the proteinase inhibitor I family. Five distinct regions in the primary structure, earlier identified by structural studies, were used to analyze the inhibitor family for hypervariability (Creighton and Darby, Trends Biochem Sci 14:319–324, 1989). Mutations did not occur with higher-than-random frequency within the proteinase binding region. When isoinhibitor, orthologous, or paralogous data subsets were subsequently analyzed, the same results were obtained. Comparison of the amino acid sequences for all the known potato proteinase isoinhibitor I proteins identified ten highly variable sites. These also were distributed randomly. Thus hypervariability, which has been observed in all other serine proteinase inhibitor families to date, appears to be lacking in the proteinase inhibitor I family.

13.
A new D-type retrovirus originally designated SAIDS-D/Washington and here referred to as retrovirus-D/Washington (R-D/W) was recently isolated at the University of Washington Primate Center, Seattle, Wash., from a rhesus monkey with an acquired immunodeficiency syndrome and retroperitoneal fibromatosis. To better establish the relationship of this new D-type virus to the prototype D-type virus, Mason-Pfizer monkey virus (MPMV), we have purified and compared six structural proteins from each virus. The proteins purified from each D-type retrovirus include p4, p10, p12, p14, p27, and a phosphoprotein designated pp18 for MPMV and pp20 for R-D/W. Amino acid analysis and N-terminal amino acid sequence analysis show that the p4, p12, p14, and p27 proteins of R-D/W are distinct from the homologous proteins of MPMV but that these proteins from the two different viruses share a high degree of amino acid sequence homology. The p10 proteins from the two viruses have similar amino acid compositions, and both are blocked to N-terminal Edman degradation. The phosphoproteins from the two viruses each contain phosphoserine but are different from each other in amino acid composition, molecular weight, and N-terminal amino acid sequence. The data thus show that each of the R-D/W proteins examined is distinguishable from its MPMV homolog and that a major difference between these two D-type retroviruses is found in the viral phosphoproteins. The N-terminal amino acid sequences of D-type retroviral proteins were used to search for sequence homologies between D-type and other retroviral amino acid sequences. An unexpected amino acid sequence homology was found between R-D/W pp20 (a gag protein) and a 28-residue segment of the env precursor polyprotein of Rous sarcoma virus. The N-terminal amino acid sequences of the D-type major gag protein (p27) and the nucleic acid-binding protein (p14) show only limited amino acid sequence homology to functionally homologous proteins of C-type retroviruses.

14.
Two ways of estimating superimposed fixed mutations in the divergent descent of proteins are examined. One method counts these in terms of a Poisson process operating within selective constraints. The other uses the maximum parsimony method to connect the contemporary sequences through intervening ancestral sequences in an evolutionary tree, and then, from the distribution of fixed mutations in dense regions of this genealogy, estimates how many fixations should be added to sparse regions. An algorithm is described which determines such augmented distances. The two methods yield similar estimates of genetic divergence when tested on a series of cytochrome c amino acid sequences. Within those constraints imposed by Darwinian selection, the dynamic behavior of the evolutionary divergence of proteins is described by the probabilistic pathways of the stochastic model. The parsimony model provides a valid Aufbau-Prinzip for examining which of those pathways occurred along a particular lineage. Concordance of the numerical magnitudes of genetic divergence estimates made by the two methods reveals them as logically consistent complements, not as mutually exclusive antagonists. Both methods indicate that cytochrome c has evolved in a non-uniform manner over geological time and more rapidly than previously estimated.

15.
L F Wu, A Reizer, J Reizer, B Cai, J M Tomich, M H Saier Jr 《Journal of Bacteriology》1991,173(10):3117-3127
The fruK gene encoding fructose-1-phosphate kinase (FruK), located within the fructose (fru)-catabolic operon of Rhodobacter capsulatus, was sequenced. FruK of R. capsulatus (316 amino acids; molecular weight = 31,232) is the same size as and is homologous to FruK of Escherichia coli, phosphofructokinase B (PfkB) of E. coli, phosphotagatokinase of Staphylococcus aureus, and ribokinase of E. coli. These proteins therefore make up a family of homologous proteins, termed the PfkB family. A phylogenetic tree for this new family was constructed. Sequence comparisons plus chemical inactivation studies suggested the lack of involvement of specific residues in catalysis. Although the Rhodobacter FruK differed markedly from the other enzymes within the PfkB family with respect to amino acid composition, these enzymes exhibited similar predicted secondary structural features. A large internal segment of the Rhodobacter FruK was found to be similar in sequence to the domain bearing the sugar bisphosphate-binding region of the large subunit of ribulose 1,5-bisphosphate carboxylase/oxygenase of plants and bacteria. Proteins of the PfkB family did not exhibit statistically significant sequence identity with PfkA of E. coli. PfkA, however, is homologous to other prokaryotic and eukaryotic ATP- and PPi-dependent Pfks (the PfkA family). These eukaryotic, ATP-dependent enzymes each consist of a homotetramer (mammalian) or a heterooctamer (yeasts), with each subunit containing an internal duplication of the size of the entire PfkA protein of E. coli. In some of these enzymes, additional domains are present. A phylogenetic tree was constructed for the PfkA family and revealed that the bacterial enzymes closely resemble the N-terminal domains of the eukaryotic enzyme subunits whereas the C-terminal domains have diverged more extensively. The PPi-dependent Pfk of potato is only distantly related to the ATP-dependent enzymes. On the basis of their similar functions, sizes, predicted secondary structures, and sequences, we suggest that the PfkA and PfkB families share a common evolutionary origin.

16.
Estimation of primate speciation dates using local molecular clocks   Cited by: 16 (self-citations: 0; citations by others: 16)
Protein-coding genes of the mitochondrial genomes from 31 mammalian species were analyzed to estimate the speciation dates within primates and also between rats and mice. Three calibration points were used based on paleontological data: one at 20-25 MYA for the hominoid/cercopithecoid divergence, one at 53-57 MYA for the cetacean/artiodactyl divergence, and the third at 110-130 MYA for the metatherian/eutherian divergence. Both the nucleotide and the amino acid sequences were analyzed, producing conflicting results. The global molecular clock was clearly violated for both the nucleotide and the amino acid data. Models of local clocks were implemented using maximum likelihood, allowing different evolutionary rates for some lineages while assuming rate constancy in others. Surprisingly, the highly divergent third codon positions appeared to contain phylogenetic information and produced more sensible estimates of primate divergence dates than did the amino acid sequences. Estimated dates varied considerably depending on the data type, the calibration point, and the substitution model but differed little among the four tree topologies used. We conclude that the calibration derived from the primate fossil record is too recent to be reliable; we also point out a number of problems in date estimation when the molecular clock does not hold. Despite these obstacles, we derived estimates of primate divergence dates that were well supported by the data and were generally consistent with the paleontological record. Estimation of the mouse-rat divergence date, however, was problematic.

17.
In the fields of phylogenetics and comparative genomics, it is important to establish orthologous relationships when comparing homologous sequences. Because orthologs and paralogs differ only slightly in sequence, paralogs are easily mistaken for orthologs. For this reason, several methods based on evolutionary distance, phylogeny and BLAST have tried to detect orthologs with more precision. Depending on their algorithmic implementations, each of these methods sometimes has increased false negative or false positive rates. Here, we developed a novel algorithm for orthology detection that uses a distance method based on the phylogenetic criterion of minimum evolution. Our algorithm assumes that sets of sequences exhibiting orthologous relationships are evolutionarily less costly than sets that include one or more paralogous relationships. Calculation of evolutionary cost requires the reconstruction of a neighbor-joining (NJ) tree, but calculations are unaffected by the topology of any given NJ tree. Unlike tree reconciliation, our algorithm appears free from the problem of incorrect topologies of species and gene trees. The reliability of the algorithm was tested in a comparative analysis with two other orthology detection methods using 95 manually curated KOG datasets and 21 experimentally verified EXProt datasets. Sensitivity and specificity estimates indicate that the concept of minimum evolution could be valuable for the detection of orthologs.

18.
The evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluating the importance of this site in maintaining the structure/function of the protein. When evolutionary rates are estimated, one must reconstruct the phylogenetic tree describing the evolutionary relationship among the sequences under study. However, if the inferred phylogenetic tree is incorrect, it can lead to erroneous site-specific rate estimates. Here we describe a novel Bayesian method that uses Markov chain Monte Carlo methodology to integrate over the space of all possible trees and model parameters. By doing so, the method considers alternative evolutionary scenarios weighted by their posterior probabilities. We show that this comprehensive evolutionary approach is superior to methods that are based on only a single tree. We illustrate the potential of our algorithm by analyzing the conservation pattern of the potassium channel protein family. Itay Mayrose and Amir Mitchell contributed equally. Reviewing Editor: Dr. Nicolas Galtier

19.
A comparative analysis is presented of 24 known amino acid sequences of RNA-dependent RNA polymerases of positive strand RNA viruses infecting animals, plants and bacteria. Using a newly proposed methodology of group alignment for weakly similar sequences, evolutionarily conserved fragments of all these proteins were unambiguously aligned. A unique pattern (consensus) of 7 invariant amino acid residues was revealed which is absent from the sequences of other RNA and DNA polymerases and is thought to unequivocally identify the RNA-dependent RNA polymerases of positive strand RNA viruses. Based on the obtained alignment a tentative phylogenetic tree of viral RNA polymerases was constructed for the first time. The RNA-dependent RNA polymerases of positive strand RNA viruses are concluded to comprise a distinct family of evolutionarily related proteins.

20.
Alignment of whole genomes.   Cited by: 28 (self-citations: 4; citations by others: 24)
A new system for aligning whole genome sequences is described. Using an efficient data structure called a suffix tree, the system is able to rapidly align sequences containing millions of nucleotides. Its use is demonstrated on two strains of Mycobacterium tuberculosis, on two less similar species of Mycoplasma bacteria and on two syntenic sequences from human chromosome 12 and mouse chromosome 6. In each case it found an alignment of the input sequences, using between 30 s and 2 min of computation time. From the system output, information on single nucleotide changes, translocations and homologous genes can easily be extracted. Use of the algorithm should facilitate analysis of syntenic chromosomal regions, strain-to-strain comparisons, evolutionary comparisons and genomic duplications.
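
To give a feel for why a suffix-based index helps when comparing long sequences, the sketch below finds the longest substring shared by two sequences. It is only an illustration under stated assumptions: it builds a suffix array naively by sorting, which stands in for the suffix tree mentioned in the abstract, and it is not the system's actual algorithm. The function name, the separator character and the example sequences are hypothetical.

```python
def longest_common_substring(a, b):
    """Longest substring shared by `a` and `b`, using a naively built suffix array."""
    sep = "\x00"                       # assumed not to occur in either input
    text = a + sep + b
    # Naive construction by sorting suffixes; a suffix tree gives the same
    # ordering information in linear time, which is what makes genome-scale use feasible.
    order = sorted(range(len(text)), key=lambda i: text[i:])
    best = ""
    for i, j in zip(order, order[1:]):
        # Only adjacent suffixes that start in different sequences are of interest.
        if (i < len(a)) == (j < len(a)):
            continue
        k = 0
        while (i + k < len(text) and j + k < len(text)
               and text[i + k] == text[j + k] and text[i + k] != sep):
            k += 1
        if k > len(best):
            best = text[i:i + k]
    return best


if __name__ == "__main__":
    # Made-up nucleotide strings, for illustration only.
    print(longest_common_substring("GATTACAGGT", "CCTTACAGAA"))
```

The sketch relies on the fact that suffixes sharing a long prefix sit next to each other once sorted, so the longest shared substring always shows up between two neighbouring suffixes that come from different sequences.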
