Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Maximum likelihood (ML) phylogenies based on 9,957 amino acid (AA) sites of 45 proteins encoded in the plastid genomes of Cyanophora, a diatom, a rhodophyte (red alga), a euglenophyte, and five land plants are compared with respect to several properties of the data, including between-site rate variation and aberrant amino acid composition in individual species. Neighbor-joining trees from AA LogDet distances and ML analyses are seen to be congruent when site rate variability is taken into account. Four feasible trees are identified in these analyses, one of which is preferred, and one of which is almost excluded by statistical criteria. A transition probability matrix for the general reversible Markov model of amino acid substitutions is estimated from the data, assuming each of these four trees. In all cases, the tree with diatom and rhodophyte as sister taxa was clearly favored. The new transition matrix based on the best tree, called cpREV, takes into account distinct substitution patterns in plastid-encoded proteins and should be useful in future ML inferences using such data. A second rate matrix, called cpREV*, based on a weighted sum of rate matrices from different trees, is also considered. Received: 3 June 1999 / Accepted: 26 November 1999

2.
Summary. Several forms of maximum likelihood models are applied to aligned amino acid sequence data coded for in the mitochondrial DNA of six species (chicken, frog, human, bovine, mouse, and rat). These models range in form from relatively simple models of the type currently used for inferring phylogenetic tree structure to models more complex than those that have been used previously. No major discrepancies between the optimal trees inferred by any of these methods are found, but there are huge differences in adequacy of fit. A very significant finding is that the fit of any of these models is vastly improved by allowing a certain proportion of the amino acid sites to be invariant. An even more important, although disquieting, finding is that none of these models fits well, as judged by standard statistical criteria. The primary reason for this is that amino acid sites undergo substitution according to a process that is very heterogeneous. Because most phylogenetic inference is accomplished by choosing the optimal tree under the assumption that a homogeneous process is acting on the sites, the potential invalidity of some such conclusions is raised by this article's results. The seriousness of this problem depends upon the robustness of the phylogenetic inferential procedure to departures from the underlying model.

3.
4.
Phylogenetic analyses frequently rely on models of sequence evolution that detail nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood methods of simultaneously constructing phylogenetic tree topologies and estimating model parameters are computationally intensive, and are not feasible for sample sizes of 25 or greater using personal computers. Techniques that initially construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B infected individuals from the United States and a 620 nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific for HIV-1 subtype B env by estimating nucleotide substitution rates and the site-to-site heterogeneity in 100 individuals from the United States. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences. Received: 4 October 2000 / Accepted: 1 March 2001

5.
A simple nearly neutral mutation model of protein evolution was studied using computer simulation assuming a constant population size. In this model, a gene consists of a finite number of codons and there is no recombination within a gene. Each codon has two replacement sites and one silent site. The fitness of a gene was determined multiplicatively by amino acids specified by codons (the independent multicodon model). Nucleotide diversity at replacement sites decreases as selection becomes stronger. A reduction of nucleotide diversity at silent sites also occurs as selection intensifies but the magnitude of the reduction is not a monotone function of the intensity of selection. The dispersion index is close to one. The average values of Tajima's and Fu and Li's statistics are negative and their absolute values increase as selection intensifies. However, their powers of detecting selection under the present model were not high unless the number of sites is large or the mutation rate is high. The MK test was shown to detect intermediate selection fairly well. For comparison, the house-of-cards model was also investigated and its behavior was shown to be more sensitive to changes of population size than that of the independent multicodon model. The relevance of the present model for explaining protein evolution was discussed by comparing its predictions with recent DNA data. Received: 24 May 1999 / Accepted: 17 August 1999

6.
A higher rate of molecular evolution in rodents than in primates at synonymous sites and, to a lesser extent, at amino acid replacement sites has been reported previously for most nuclear genes examined. Thus in these genes the average ratio of amino acid replacement to synonymous substitution rates in rodents is lower than in primates, an observation at odds with the neutral model of molecular evolution. Under Ohta's mildly deleterious model of molecular evolution, these observations are seen as the consequence of the combined effects of a shorter generation time (driving a higher mutation rate) and a larger effective population size (resulting in more effective selection against mildly deleterious mutations) in rodents. The present study reports the results of a maximum-likelihood analysis of the ratio of amino acid replacements to synonymous substitutions for genes encoded in mitochondrial DNA (mtDNA) in these two lineages. A similar pattern is observed: in rodents this ratio is significantly lower than in primates, again consistent only with the mildly deleterious model. Interestingly, the lineage-specific difference is much more pronounced in mtDNA-encoded than in nuclear-encoded proteins, an observation which is shown to run counter to expectation under Ohta's model. Finally, accepting certain fossil divergence dates, the lineage-specific difference in amino acid replacement-to-synonymous substitution ratio in mtDNA can be partitioned and is found to be entirely the consequence of a higher mutation rate in rodents. This conclusion is consistent with a replication-dependent model of mutation in mtDNA. Received: 24 September 1999 / Accepted: 18 September 2000

7.
Algorithmic details to obtain maximum likelihood estimates of parameters on a large phylogeny are discussed. On a large tree, an efficient approach is to optimize branch lengths one at a time while updating parameters in the substitution model simultaneously. Codon substitution models that allow for variable nonsynonymous/synonymous rate ratios (ω = dN/dS) among sites are used to analyze a data set of human influenza virus type A hemagglutinin (HA) genes. The data set has 349 sequences. Methods for obtaining approximate estimates of branch lengths for codon models are explored, and the estimates are used to test for positive selection and to identify sites under selection. Compared with results obtained from the exact method estimating all parameters by maximum likelihood, the approximate methods produced reliable results. The analysis identified a number of sites in the viral gene under diversifying Darwinian selection and demonstrated the importance of including many sequences in the data in detecting positive selection at individual sites. Received: 25 April 2000 / Accepted: 24 July 2000

8.
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.

9.
Zhou XX, Wang YB, Pan YJ, Li WF. Amino Acids, 2008, 34(1): 25-33
Summary. Thermophilic proteins show substantially higher intrinsic thermal stability than their mesophilic counterparts. Amino acid composition is believed to alter the intrinsic stability of proteins. Several investigations and mutagenesis experiments have been carried out to understand how amino acid composition contributes to the thermostability of proteins. This review presents some generalized features of amino acid composition found in thermophilic proteins, including an increase in residue hydrophobicity, a decrease in uncharged polar residues, an increase in charged residues, an increase in aromatic residues, certain amino acid coupling patterns and amino acid preferences for thermophilic proteins. The differences in amino acid composition between thermophilic and mesophilic proteins are related to some properties of amino acids. These features provide guidelines for engineering mesophilic proteins into thermophilic proteins. Authors’ addresses: Yuan-Jiang Pan, Institute of Chemical Biology and Pharmaceutical Chemistry, Zhejiang University, Zhejiang University Road 38, Hangzhou 310027, China; Wei-Fen Li, Microbiology Division, College of Animal Science, Zhejiang University, Hangzhou 310029, China

10.
Summary. A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.
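As a small illustration of a correction made under the assumption of random substitutions, the sketch below applies the standard nucleotide form of the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is only an assumption about what the abstract's "Jukes and Cantor's formula" refers to: the abstract works from amino acid data and does not give the exact expression it uses. The function name and the sample values of p are hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Estimate substitutions per site from the observed proportion p of differing sites.

    Uses the nucleotide Jukes-Cantor correction, which assumes all
    substitutions are equally likely (an illustrative assumption here).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Hypothetical observed proportions, not data from the study
for p in (0.05, 0.20, 0.50):
    print(p, round(jukes_cantor_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as p grows, because multiple substitutions at the same site become more likely.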

11.
The vertebrates are traditionally classified into two distinct groups, Agnatha (jawless vertebrates) and Gnathostomata (jawed vertebrates). Extant agnathans are represented by hagfishes (Myxiniformes) and lampreys (Petromyzontiformes), frequently grouped together within the Cyclostomata. Whereas the recognition of the Gnathostomata as a clade is commonly acknowledged, a consensus has not been reached regarding whether or not Cyclostomata represents a clade. In the present study we have used newly established sequences of the protein-coding genes of the mitochondrial DNA molecule of the hagfish to explore agnathan and gnathostome relationships. The phylogenetic analysis of Pisces, using echinoderms as outgroup, placed the hagfish as a sister group of Vertebrata sensu stricto, i.e., the lamprey and the gnathostomes. The phylogenetic analysis of the Gnathostomata identified a basal divergence between gnathostome fishes and a branch leading to birds and mammals, i.e., between "Anamnia" and Amniota. The lungfish has a basal position among gnathostome fishes with the teleosts as the most recently evolving lineage. The findings portray a hitherto unrecognized polarity in the evolution of bony fishes. The presently established relationships are incompatible with previous molecular studies. Received: 15 August 1997 / Accepted: 1 October 1997

12.
Circular permutations of genes during molecular evolution often are regarded as elusive, although a simple model can explain these rearrangements. The model assumes that first a gene duplication of the precursor gene occurs in such a way that both genes become fused in frame, leading to a tandem protein. After generation of a new start codon within the 5′ part of the tandem gene and a stop at an equivalent position in the 3′ part of the gene, a protein is encoded that represents a perfect circular permutation of the precursor gene product. The model is illustrated here by the molecular evolution of adenine-N6 DNA methyltransferases. β- and γ-type enzymes of this family can be interconverted by a single circular permutation event. Interestingly, tandem proteins, proposed as evolutionary intermediates during circular permutation, can be directly observed in the case of adenine methyltransferases, because some enzymes belonging to type IIS, like the FokI methyltransferase, are built up by two fused enzymes, both of which are active independently of each other. The mechanism for circular permutation illustrated here is very easy and applicable to every protein. Thus, circular permutation can be regarded as a normal process in molecular evolution and a changed order of conserved amino acid motifs should not be interpreted to argue against divergent evolution. Received: 17 November 1998 / Accepted: 19 February 1999

13.
Synonymous substitution rates in mitochondrial and nuclear genes of Drosophila were compared. To make accurate comparisons, we considered the following: (1) relative synonymous rates, which do not require divergence time estimates, should be used; (2) methods estimating divergence should take into account base composition; (3) only very closely related species should be used to avoid effects of saturation; (4) the heterogeneity of rates should be examined. We modified the methods estimating synonymous substitution numbers to account for base composition bias. By using these methods, we found that mitochondrial genes have 1.7–3.4 times higher synonymous substitution rates than the fastest nuclear genes or 4.5–9.0 times higher rates than the average nuclear genes. The average rate of synonymous transversions was 2.7 (estimated from the melanogaster species subgroup) or 2.9 (estimated from the obscura group) times higher in mitochondrial genes than in nuclear genes. Synonymous transversions in mitochondrial genes occurred at an approximately equivalent rate to those in the fastest nuclear genes. This last result is not consistent with the hypothesis that the difference in turnover rates between mitochondrial and nuclear genomes is the major factor determining higher synonymous substitution rates in mtDNA. We conclude that the difference in synonymous substitution rates is due to a combination of two factors: a higher transitional mutation rate in mtDNA and constraints on nuclear genes due to selection for codon usage. Received: 27 November 1996 / Accepted: 8 May 1997

14.
15.
The relationship between 3-deoxy-D-manno-2-octulosonic acid 8-phosphate (KDO 8-P) synthase and 3-deoxy-D-arabino-2-heptulosonic acid 7-phosphate (DAH 7-P) synthase has not been adequately addressed in the literature. Based on recent reports of a metal-requiring KDO 8-P synthase and the newly solved X-ray crystal structures of both Escherichia coli KDO 8-P synthase and DAH 7-P synthase, we begin to address the evolutionary kinship between these catalytically similar enzymes. Using a maximum likelihood-based grouping of 29 KDO 8-P synthase sequences, we demonstrate the existence of a new class of KDO 8-P synthase, the members of which we propose to require a metal cofactor for catalysis. Similarly, we hypothesize a class of DAH 7-P synthase that does not have the metal requirement of the heretofore model E. coli enzyme. Based on this information and a careful investigation of the reported X-ray crystal structures, we also propose that KDO 8-P synthase and DAH 7-P synthase are the product of a divergent evolutionary process from a common ancestor.

16.
The cattle genome contains several distinct centromeric satellites with interrelated evolutionary histories. We compared these satellites in Bovini species that diverged 0.2 to about 5 Myr ago. Quantification of hybridization signals by phosphor imaging revealed a large variation in the relative amounts of the major satellites. In the genome of water buffalo this has led to the complete deletion of satellite III. Comparative sequencing and PCR-RFLP analysis of satellites IV, 1.711a, and 1.711b from the related Bos and Bison species revealed heterogeneities in 0.5 to 2% of the positions, again with variations in the relative amounts of sequence variants. Restriction patterns generated by double digestions suggested a recombination of sequence variants. Our results are compatible with a model of the life history of satellites during which homogeneity of interacting repeat units is both cause and consequence of the rapid turnover of satellite DNA. Initially, a positive feedback loop leads to a rapid saltatory amplification of homogeneous repeat units. In the second phase, mutations inhibit the interaction of repeat units and coexisting sequence variants amplify independently. Homogenization by the spreading of one of the variants is prevented by recombination and the satellite is eventually outcompeted by another, more homogeneous tandem repeat sequence. Received: 21 July 2000 / Accepted: 30 October 2000

17.
The mammalian defensin molecule is a short, highly cationic peptide cytotoxic to both microbial and mammalian cells which is cleaved from a precursor including a signal peptide and a highly anionic propiece. A phylogenetic analysis of 28 complete sequences from five mammalian species (mouse, rat, guinea pig, rabbit, and human) showed species-specific clusters of sequences, indicating that the genes duplicated after divergence of these species. Comparison of rates of synonymous and nonsynonymous nucleotide substitution suggested that gene duplication has often been followed by a period in which diversification of the mature defensins at the amino acid level has been selectively favored. In some comparisons, it appeared that amino acid differences in this region have appeared in a nonrandom fashion so as to change the pattern of residue charges. Because it has been hypothesized that the negative charge in the propiece serves to balance the positive charge in the mature defensin and thus to prevent cytotoxicity prior to cleavage, we used a maximum likelihood method of reconstructing ancestral states in order to test whether this balance has been maintained over evolutionary time in spite of rapid diversification of the mature defensin at the amino acid level. Reconstructed ancestral sequences always maintained a charge balance between mature defensin and propiece, and changes in the net positive charge of the mature defensin were balanced by corresponding changes in the propiece. The results support the hypothesis that, in the evolution of these proteins, amino acid changes have occurred in a coordinated fashion so as to preserve an adaptive phenotype. Received: 23 October 1996 / Accepted: 7 January 1997

18.
Summary. The relative abundances among the amino acids, which are functionally similar to one another, were explained by random partition of a unit interval.
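As a rough illustration of a random partition of a unit interval, the sketch below cuts [0, 1] at uniformly random points and ranks the resulting segment lengths, which could then be compared with ranked relative abundances. The function name `random_partition`, the group size of 5, and the fixed seed are assumptions made for illustration; the abstract does not say how the partition was constructed or compared with data.

```python
import random

def random_partition(n_parts, rng):
    """Split [0, 1] into n_parts segments using n_parts - 1 uniform cut points."""
    cuts = sorted(rng.random() for _ in range(n_parts - 1))
    edges = [0.0] + cuts + [1.0]
    # Each segment length is the gap between consecutive edges
    return [b - a for a, b in zip(edges, edges[1:])]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
segments = sorted(random_partition(5, rng), reverse=True)
print(segments)
```

Averaging the ranked lengths over many such partitions would give expected relative sizes under this null model.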

19.
In viruses an increased coding ability is provided by overlapping genes, in which two alternative open reading frames (ORFs) may be translated to yield two distinct proteins. The identification of signature sequences in overlapping genes is a topic of particular interest, since additional out-of-frame coding regions can be nested within known genes. In this work, a novel feature peculiar to overlapping coding regions is presented. It was detected by analysis of a sample set of 21 virus genomic sequences and consisted in the repeated occurrence of a cluster of basic amino acid residues, encoded by a frame, combined with a stretch of acidic residues, encoded by the corresponding overlapping frame. A computer scan of an additional set of virus sequences demonstrated that this feature is common to several other known overlapping ORFs and led to prediction of a novel overlapping gene in hepatitis G virus (HGV). The occurrence of a bifunctional coding region in HGV was also supported by its much lower rate of synonymous nucleotide substitutions compared to that observed in the other gene regions of the HGV genome. Analysis of the amino acid sequence that was deduced from the putative overlapping gene revealed a high content of basic residues and the presence of a nuclear targeting signal; these characteristics suggest that a core-like protein may be expressed by this novel ORF. Received: 21 July 1999 / Accepted: 26 October 1999

20.
A model of nucleotide substitution that allows the transition/transversion rate bias to vary across sites was constructed. We examined the fit of this model using likelihood-ratio tests by analyzing 13 protein coding genes and 1 pseudogene. Likelihood-ratio testing indicated that a model that allows variation in the transition/transversion rate bias across sites provided a significant improvement in fit for most protein coding genes but not for the pseudogene. When the analysis was repeated with parameters estimated separately for first, second, and third codon positions, strong heterogeneity was uncovered for the first and second codon positions; the variation in the transition/transversion rate was generally weaker at the third codon position. The transition rate bias and branch lengths were underestimated when variation in the transition/transversion rate was not accommodated, suggesting that it may be important to accommodate variation in the pattern of nucleotide substitution for accurate estimation of evolutionary parameters. Received: 4 November 1997 / Accepted: 19 May 1998
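The likelihood-ratio comparison described in the abstract above can be sketched generically as follows. This is a minimal example under stated assumptions: the two models are nested and differ by exactly one free parameter, and twice the log-likelihood difference is compared with a chi-square distribution with one degree of freedom. The abstract does not state the degrees of freedom or the reference distribution actually used, and the log-likelihood values below are hypothetical.

```python
import math

def likelihood_ratio_test(lnL_null, lnL_alt):
    """Return (statistic, p_value) for nested models differing by one free parameter."""
    # Test statistic: twice the gain in log-likelihood, floored at zero
    stat = max(2.0 * (lnL_alt - lnL_null), 0.0)
    # Chi-square survival function with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, not values from the study
stat, p = likelihood_ratio_test(lnL_null=-2350.4, lnL_alt=-2344.1)
print(f"2 * delta lnL = {stat:.2f}, p = {p:.4g}")
```

When the extra parameter sits on a boundary of its range under the null model, the plain chi-square approximation can be conservative, so a result like this is best read as approximate.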
