Similar literature
 Found 20 similar articles; search took 31 ms
1.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon--known as heterotachy--can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.  相似文献   
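The core idea in the abstract above, summing each site's likelihood over more than one set of branch lengths, can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "tree" is just two sequences joined by a single branch, the substitution model is Jukes-Cantor (JC69), and the names `mixture_log_likelihood`, `branch_lengths`, and `weights` are hypothetical. It is not the authors' implementation and omits the reversible-jump step entirely.

```python
import math


def jc69_match_prob(same: bool, t: float) -> float:
    """JC69 probability that a base ends as itself (same=True) or as one
    specific other base (same=False) after t expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def site_likelihood(x: str, y: str, t: float) -> float:
    """Likelihood of observing bases x and y at one site of a two-sequence
    alignment, with equal base frequencies (0.25) and branch length t."""
    return 0.25 * jc69_match_prob(x == y, t)


def mixture_log_likelihood(seq_a, seq_b, branch_lengths, weights):
    """Log-likelihood where each site's likelihood is a weighted sum over
    several candidate branch lengths (one per mixture component)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(branch_lengths) != len(weights):
        raise ValueError("need one weight per branch-length set")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        # Sum over components first, then take the log: sites are free to be
        # best explained by different branch-length sets.
        site = sum(w * site_likelihood(x, y, t)
                   for w, t in zip(weights, branch_lengths))
        total += math.log(site)
    return total
```

With a single component the function reduces to an ordinary JC69 log-likelihood; adding a second component with a different branch length lets slowly and rapidly changing sites be fitted separately.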

2.
Study of structure/function relationships constitutes an important field of research, especially for modification of protein function and drug design. However, the fact that rational design (i.e. the modification of amino acid sequences by means of directed mutagenesis, based on knowledge of the three-dimensional structure) appears to be much less efficient than irrational design (i.e. random mutagenesis followed by in vitro selection) clearly indicates that we understand little about the relationships between primary sequence, three-dimensional structure and function. The use of evolutionary approaches and concepts will bring insights to this difficult question. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico evolutionary methods to predict details of protein function in duplicated (paralogous) proteins. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogs. It has been proposed that the positions that show switches in substitution rate over time--i.e., 'heterotachous sites'--are good indicators of functional divergence. However, it appears that heterotachy is a much more general process, since most variable sites of homologous proteins with no evidence of functional shift are heterotachous. Similarly, it appears that switches in substitution rate are as frequent when paralogous sequences are compared as when orthologous sequences are compared. Heterotachy, instead of being indicative of functional shift, may more generally reflect a less specific process related to the many intra- and inter-molecular interactions compatible with a range of more or less equally viable protein conformations. These interactions will lead to different constraints on the nature of the primary sequences, consistently with theories suggesting the non-independence of substitutions in proteins. 
However, a specific type of amino acid variation might constitute a good indicator of functional divergence: substitutions occurring at positions that are generally slowly evolving. Such substitutions at constrained sites are indeed much more frequent soon after gene duplication. The identification and analysis of these sites by complementing structural information with evolutionary data may represent a promising direction to future studies dealing with the functional characterization of an ever increasing number of multi-gene families identified by complete genome analysis.

3.
We report the presence of four nuclear paralogs of a 380-bp segment of cytochrome b in callitrichine primates (marmosets and tamarins). The mitochondrial cytochrome b sequence and each nuclear paralog were obtained from several species, allowing multiple comparisons of rates and patterns of substitution both between mitochondrial and nuclear sequences and among nuclear sequences. The mitochondrial DNA had high overall rates of molecular evolution and a strong bias toward substitutions at third codon positions. Rates of molecular evolution among the nuclear sequences were low and constant, and there were small differences in substitution patterns among the nuclear clades which were probably attributable to the small number of sites involved. A novel method of phylogenetic reconstruction based on the large difference in rates of evolution at different codon positions among mitochondrial and nuclear clades was used to determine whether different nuclear paralogs represent independent transposition events or duplications following a single insertion. This method is generally applicable in cases where differences in pattern of molecular evolution are known, and it showed that at least three of the four nuclear clades represent independent transposition events. The insertion events giving rise to two of the nuclear clades predate the divergence of the callitrichines, whereas those leading to the other two nuclear clades may have occurred in the common ancestor of marmosets.  相似文献   

4.
Despite the advances in understanding molecular evolution, current phylogenetic methods barely take account of a fraction of the complexity of evolution. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and the limits of computational power. These limitations lead to the establishment of either biologically simplistic models that rarely account for a fraction of the complexity involved or overfitting models that add little resolution to the problem. Such oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used evolutionary models in phylogenetic studies. These account for heterogeneity in the evolutionary rates among sites but do not account for changing within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we show possible misleading effects in tree inference when the assumption of constant within-site rates across lineages is violated using maximum likelihood. Using a simulation study, we explore the ways in which gamma stationary models can lead to wrong topology or to deceptive bootstrap support values when the within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the model assumed is stationary. Finally, we propose a geometry-based approach to visualize and to test for the possible existence of bias due to heterotachy.  相似文献   

5.
Sequence alignments of multiple genes are routinely used to infer phylogenetic relationships among species. The analysis of their concatenation is more likely to give correct results under an assumption of homotachy (i.e., the evolutionary rates within lineages in each of the concatenated genes are constant during evolution). Here, we examine how the violation of homotachy (i.e., presence of within-site rate variation, called heterotachy) distorts species phylogenies. A theoretical examination has been conducted using a four-taxon case and the neighbor-joining (NJ) method, concluding that NJ recovers the incorrect tree when concatenated genes exhibit heterotachy. The application of average and weighted-average distance approaches, where gene boundaries are kept intact, overcomes the detrimental effect of heterotachy in multigene analysis using the NJ method.
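The contrast between concatenating genes and averaging per-gene distances can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: distances are Jukes-Cantor corrected, the weights are gene lengths (the abstract does not say how its weighted average is defined), and all function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equally long")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance in substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def concatenated_distance(genes_a, genes_b) -> float:
    """Distance computed after joining all genes into one alignment."""
    return jc69_distance("".join(genes_a), "".join(genes_b))


def weighted_average_distance(genes_a, genes_b) -> float:
    """Per-gene distances averaged with gene length as the weight
    (an assumed weighting), so gene boundaries stay intact."""
    total_len = sum(len(g) for g in genes_a)
    return sum(len(ga) * jc69_distance(ga, gb)
               for ga, gb in zip(genes_a, genes_b)) / total_len
```

When every gene has the same proportion of differences the two functions agree; when the genes differ in rate, the correction is applied to each gene separately and the averaged distance is larger than the concatenated one.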

6.
Granule-bound starch synthase: structure, function, and phylogenetic utility   Total citations: 18; self-citations: 2; citations by others: 16
Interest in the use of low-copy nuclear genes for phylogenetic analyses of plants has grown rapidly, because highly repetitive genes such as those commonly used are limited in number. Furthermore, because low-copy genes are subject to different evolutionary processes than are plastid genes or highly repetitive nuclear markers, they provide a valuable source of independent phylogenetic evidence. The gene for granule-bound starch synthase (GBSSI or waxy) exists in a single copy in nearly all plants examined so far. Our study of GBSSI had three parts: (1) Amino acid sequences were compared across a broad taxonomic range, including grasses, four dicotyledons, and the microbial homologs of GBSSI. Inferred structural information was used to aid in the alignment of these very divergent sequences. The informed alignments highlight amino acids that are conserved across all sequences, and demonstrate that structural motifs can be highly conserved in spite of marked divergence in amino acid sequence. (2) Maximum-likelihood (ML) analyses were used to examine exon sequence evolution throughout grasses. Differences in probabilities among substitution types and marked among-site rate variation contributed to the observed pattern of variation. Of the parameters examined in our set of likelihood models, the inclusion of among-site rate variation following a gamma distribution caused the greatest improvement in likelihood score. (3) We performed cladistic parsimony analyses of GBSSI sequences throughout grasses, within tribes, and within genera to examine the phylogenetic utility of the gene. Introns provide useful information among very closely related species, but quickly become difficult to align among more divergent taxa. Exons are variable enough to provide extensive resolution within the family, but with low bootstrap support.
The combined results of amino acid sequence comparisons, maximum-likelihood analyses, and phylogenetic studies underscore factors that might affect phylogenetic reconstruction. In this case, accommodation of the variable rate of evolution among sites might be the first step in maximizing the phylogenetic utility of GBSSI.

7.
The hepatitis B virus (HBV) has a circular DNA genome of about 3,200 base pairs. Economical use of the genome with overlapping reading frames may have led to severe constraints on nucleotide substitutions along the genome and to highly variable rates of substitution among nucleotide sites. Nucleotide sequences from 13 complete HBV genomes were compared to examine such variability of substitution rates among sites and to examine the phylogenetic relationships among the HBV variants. The maximum likelihood method was employed to fit models of DNA sequence evolution that can account for the complexity of the pattern of nucleotide substitution. Comparison of the models suggests that the rates of substitution are different in different genes and codon positions; for example, the third codon position changes at a rate over ten times higher than the second position. Furthermore, substantial variation of substitution rates was detected even after the effects of genes and codon positions were corrected; that is, rates are different at different sites of the same gene or at the same codon position. Such rates after the correction were also found to be positively correlated at adjacent sites, which indicated the existence of conserved and variable domains in the proteins encoded by the viral genome. A multiparameter model validates the earlier finding that the variation in nucleotide conservation is not random around the HBV genome. The test for the existence of a molecular clock suggests that substitution rates are more or less constant among lineages. The phylogenetic relationships among the viral variants were examined. Although the data do not seem to contain sufficient information to resolve the details of the phylogeny, it appears quite certain that the serotypes of the viral variants do not reflect their genetic relatedness. Correspondence to: Z. Yang  相似文献   

8.
Rate heterogeneity among lineages is a common feature of molecular evolution, and it has long impeded our ability to accurately estimate the age of evolutionary divergence events. The development of relaxed molecular clocks, which model variable substitution rates among lineages, was intended to rectify this problem. Major subtypes of pandemic HIV-1 group M are thought to exemplify closely related lineages with different substitution rates. Here, we report that inferring the time of most recent common ancestor of all these subtypes in a single phylogeny under a single (relaxed) molecular clock produces significantly different dates for many of the subtypes than does analysis of each subtype on its own. We explore various methods to ameliorate this problem. We conclude that current molecular dating methods are inadequate for dealing with this type of substitution rate variation in HIV-1. Through simulation, we show that heterotachy causes root ages to be overestimated.

9.
Only relatively recently have researchers turned to molecular methods for nematode phylogeny reconstruction. Thus, we lack the extensive literature on evolutionary patterns and phylogenetic usefulness of different DNA regions for nematodes that exists for other taxa. Here, we examine the usefulness of mtDNA for nematode phylogeny reconstruction and provide data that can be used for a priori character weighting or for parameter specification in models of sequence evolution. We estimated the substitution pattern for the mitochondrial ND4 gene from intraspecific comparisons in four species of parasitic nematodes from the family Trichostrongylidae (38-50 sequences per species). The resulting pattern suggests a strong mutational bias toward A and T, and a lower transition/transversion ratio than is typically observed in other taxa. We also present information on the relative rates of substitution at first, second, and third codon positions and on relative rates of saturation of different types of substitutions in comparisons ranging from intraspecific to interordinal. Silent sites saturate extremely quickly, presumably owing to the substitution bias and, perhaps, to an accelerated mutation rate. Results emphasize the importance of using only the most closely related sequences in order to infer patterns of substitution accurately for nematodes or for other taxa having strongly composition-biased DNA. ND4 also shows high amino acid polymorphism at both the intra- and interspecific levels, and in higher level comparisons, there is evidence of saturation at variable amino acid sites. In general, we recommend using mtDNA coding genes only for phylogenetics of relatively closely related nematode species and, even then, using only nonsynonymous substitutions and the more conserved mitochondrial genes (e.g., cytochrome oxidases). 
On the other hand, the high substitution rate in genes such as ND4 should make them excellent for population genetics studies, identifying cryptic species, and resolving relationships among closely related congeners when other markers show insufficient variation.
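The transition/transversion ratio mentioned in this abstract can be illustrated by direct counting between two aligned sequences. This is a minimal sketch, assuming a simple pairwise count with no correction for multiple substitutions at the same site (which is why the abstract stresses closely related sequences); the function name `count_ts_tv` is hypothetical and this is not the estimation procedure used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str):
    """Count observed transitions and transversions between two aligned
    DNA sequences. Sites with gaps or ambiguous bases are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        # A<->G and C<->T stay within a chemical class: transitions.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


ts, tv = count_ts_tv("AACCT", "GTCTT")
print(ts, tv)  # prints "2 1"
```

Because saturation hides repeated changes, raw counts like these understate the true ratio as divergence grows.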

10.
Variation in substitution rates among evolutionary lineages (among-lineage rate variation or ALRV) has been reported to negatively affect the estimation of phylogenies. When the substitution processes underlying ALRV are modeled inadequately, non-sister taxa with similar substitution rates are estimated incorrectly as sister species due to long-branch attraction. Recent advances in modeling site-specific rate variation (heterotachy) have reduced the impacts of ALRV on phylogeny estimation in several empirical and simulated datasets. However, the addition of parameters to the substitution model reduces power to estimate each parameter correctly, which can also lead to incorrect phylogeny estimation. A potential solution to this problem is to identify the levels of ALRV that negatively impact phylogeny estimation such that molecular markers with non-deleterious levels of ALRV can be identified. To this end, we used analyses of empirical and simulated gene datasets to evaluate whether levels of ALRV identified in a mitochondrial genomic dataset for salamanders negatively impacted phylogeny estimation. We simulated data with and without ALRV, holding all other evolutionary parameters constant, and compared the phylogenetic performance of both simulated and empirical datasets. Overall, we found limited, positive effects of ALRV on phylogeny estimation in this dataset, the majority of which resulted from an increase in substitution rate on short branches. We conclude that ALRV does not always negatively impact phylogeny estimation. Therefore, ALRV can likely be disregarded as a criterion for marker selection in comparable phylogenetic studies.  相似文献   

11.
Most molecular phylogenetic studies of vertebrates have been based on DNA sequences of mitochondrial-encoded genes. MtDNA evolves rapidly and is thus particularly useful for resolving relationships among recently evolved groups. However, it has the disadvantage that all of the mitochondrial genes are inherited as a single linkage group so that only one independent gene tree can be inferred regardless of the number of genes sequenced. Introns of nuclear genes are attractive candidates for independent sources of rapidly evolving DNA: they are pervasive, most of their nucleotides appear to be unconstrained by selection, and PCR primers can be designed for sequences in adjacent exons where nucleotide sequences are conserved. We sequenced intron 7 of the beta-fibrinogen gene (beta-fibint7) for a diversity of woodpeckers and compared the phylogenetic signal and nucleotide substitution properties of this DNA sequence with that of mitochondrial-encoded cytochrome b (cyt b) from a previous study. A few indels (insertions and deletions) were found in the beta-fibint7 sequences, but alignment was not difficult, and the indels were phylogenetically informative. The beta-fibint7 and cyt b gene trees were nearly identical to each other but differed in significant ways from the traditional woodpecker classification. Cyt b evolves 2.8 times as fast as beta-fibint7 (14.0 times as fast at third codon positions). Despite its relatively slow substitution rate, the phylogenetic signal in beta-fibint7 is comparable to that in cyt b for woodpeckers, because beta-fibint7 has less base composition bias and more uniform nucleotide substitution probabilities. As a consequence, compared with cyt b, beta-fibint7 nucleotide sites are expected to enter more distinct character states over the course of evolution and have fewer multiple substitutions and lower levels of homoplasy.
Moreover, in contrast to cyt b, in which nearly two thirds of nucleotide sites rarely vary among closely related taxa, virtually all beta-fibint7 nucleotide sites appear free of selective constraints, which increases informative sites per unit sequenced. However, the estimated gamma distribution used to model rate variation among sites suggests constraints on some beta-fibint7 sites. This study suggests that introns will be useful for phylogenetic studies of recently evolved groups.

12.
Divergence time and substitution rate are seriously confounded in phylogenetic analysis, making it difficult to estimate divergence times when the molecular clock (rate constancy among lineages) is violated. This problem can be alleviated to some extent by analyzing multiple gene loci simultaneously and by using multiple calibration points. While different genes may have different patterns of evolutionary rate change, they share the same divergence times. Indeed, the fact that each gene may violate the molecular clock differently leads to the advantage of simultaneous analysis of multiple loci. Multiple calibration points provide the means for characterizing the local evolutionary rates on the phylogeny. In this paper, we extend previous likelihood models of local molecular clock for estimating species divergence times to accommodate multiple calibration points and multiple genes. Heterogeneity among different genes in evolutionary rate and in substitution process is accounted for by the models. We apply the likelihood models to analyze two mitochondrial protein-coding genes, cytochrome oxidase II and cytochrome b, to estimate divergence times of Malagasy mouse lemurs and related outgroups. The likelihood method is compared with the Bayes method of Thorne et al. (1998, Mol. Biol. Evol. 15:1647-1657), which uses a probabilistic model to describe the change in evolutionary rate over time and uses the Markov chain Monte Carlo procedure to derive the posterior distribution of rates and times. Our likelihood implementation has the drawbacks of failing to accommodate uncertainties in fossil calibrations and of requiring the researcher to classify branches on the tree into different rate groups. Both problems are avoided in the Bayes method. Despite the differences in the two methods, however, data partitions and model assumptions had the greatest impact on date estimation. 
The three codon positions have very different substitution rates and evolutionary dynamics, and assumptions in the substitution model affect date estimation in both likelihood and Bayes analyses. The results demonstrate that the separate analysis is unreliable, with dates variable among codon positions and between methods, and that the combined analysis is much more reliable. When the three codon positions were analyzed simultaneously under the most realistic models using all available calibration information, the two methods produced similar results. The divergence of the mouse lemurs is dated to be around 7-10 million years ago, indicating a surprisingly early species radiation for such a morphologically uniform group of primates.

13.
Statistical Properties of a DNA Sample under the Finite-Sites Model   Total citations: 1; self-citations: 0; citations by others: 1
Z. Yang. Genetics, 1996, 144(4): 1941-1950
Statistical properties of a DNA sample from a random-mating population of constant size are studied under the finite-sites model. It is assumed that there is no migration and no recombination occurs within the locus. A Markov process model is used for nucleotide substitution, allowing for multiple substitutions at a single site. The evolutionary rates among sites are treated as either constant or variable. The general likelihood calculation using numerical integration involves intensive computation and is feasible for three or four sequences only; it may be used for validating approximate algorithms. Methods are developed to approximate the probability distribution of the number of segregating sites in a random sample of n sequences, with either constant or variable substitution rates across sites. Calculations using parameter estimates obtained for human D-loop mitochondrial DNAs show that among-site rate variation has a major effect on the distribution of the number of segregating sites; the distribution under the finite-sites model with variable rates among sites is quite different from that under the infinite-sites model.
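For orientation, the infinite-sites baseline that this abstract compares against has well-known closed forms for the mean and variance of the number of segregating sites S in a sample of n sequences without recombination: E[S] = θ·a_n and Var[S] = θ·a_n + θ²·b_n, where a_n and b_n are the sums of 1/i and 1/i² for i = 1..n-1 and θ is the population mutation parameter for the locus. The sketch below computes only this baseline; it does not implement the paper's finite-sites approximations, and the function names are hypothetical.

```python
def harmonic_sum(n: int, power: int = 1) -> float:
    """Sum of 1 / i**power for i = 1 .. n-1 (a_n when power=1, b_n when power=2)."""
    if n < 2:
        raise ValueError("need a sample of at least two sequences")
    return sum(1.0 / i ** power for i in range(1, n))


def expected_segregating_sites(theta: float, n: int) -> float:
    """Infinite-sites expectation E[S] = theta * a_n."""
    return theta * harmonic_sum(n)


def variance_segregating_sites(theta: float, n: int) -> float:
    """Infinite-sites variance Var[S] = theta * a_n + theta**2 * b_n."""
    return theta * harmonic_sum(n) + theta ** 2 * harmonic_sum(n, power=2)


print(expected_segregating_sites(2.0, 3))  # prints 3.0
print(variance_segregating_sites(2.0, 3))  # prints 8.0
```

Under a finite-sites model, repeated substitutions at the same site mean fewer sites appear segregating than these formulas predict, which is the kind of departure the abstract describes.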

14.
The nature of heterotachy at the center of recent controversy over the relative performance of tree-building methods is different from the form of heterotachy that has been inferred in empirical studies. The latter have suggested that proportions of variable sites (p(var)) vary among orthologues and among paralogues. However, the strength of this inference, describing what may be one of the most important evolutionary properties of sequence data, has remained weak. Consequently, other models of sequence evolution have been proposed to explain some long-branch attraction (LBA) problems that could be attributed to differences in p(var). For an empirical case with plastid and eubacterial RNA polymerase sequences, we confirm using capture-recapture estimates and simulations that p(var) can differ among orthologues in anciently diverged evolutionary lineages. We find that parsimony and a least squares distance method that implements an overly simple model of sequence evolution are susceptible to LBA induced by this form of heterotachy. Although homogeneous maximum likelihood inference was found to be robust to model misspecification in our specific example, we caution against assuming that it will always be so.

15.
16.
Evolutionary relationships are typically inferred from molecular sequence data using a statistical model of the evolutionary process. When the model accurately reflects the underlying process, probabilistic phylogenetic methods recover the correct relationships with high accuracy. There is ample evidence, however, that models commonly used today do not adequately reflect real-world evolutionary dynamics. Virtually all contemporary models assume that relatively fast-evolving sites are fast across the entire tree, whereas slower sites always evolve at relatively slower rates. Many molecular sequences, however, exhibit site-specific changes in evolutionary rates, called "heterotachy." Here we examine the accuracy of 2 phylogenetic methods for incorporating heterotachy, the mixed branch length model--which incorporates site-specific rate changes by summing likelihoods over multiple sets of branch lengths on the same tree--and the covarion model, which uses a hidden Markov process to allow sites to switch between variable and invariable as they evolve. Under a variety of simple heterogeneous simulation conditions, the mixed model was dramatically more accurate than homotachous models, which were subject to topological biases as well as biases in branch length estimates. When data were simulated with strong versions of the types of heterotachy observed in real molecular sequences, the mixed branch length model was more accurate than homotachous techniques. Analyses of empirical data sets confirmed that the mixed branch length model can improve phylogenetic accuracy under conditions that cause homotachous models to fail. In contrast, the covarion model did not improve phylogenetic accuracy compared with homotachous models and was sometimes substantially less accurate. We conclude that a mixed branch length approach, although not the solution to all phylogenetic errors, is a valuable strategy for improving the accuracy of inferred trees.  相似文献   

17.
The existence of a lineage-specific nucleotide substitution rate in mammalian mtDNA has been investigated by analyzing the mtDNA of all available species, that is, 35 complete mitochondrial genomes from 14 mammalian orders. A detailed study of their evolutionary dynamics has been carried out on both ribosomal RNA and first and second codon positions (P12) of H-strand protein-coding genes by using two different types of relative-rate tests. Results are quite congruent between ribosomal and P12 sites. Significant rate variations have been observed among orders and among species of the same order. However, rate variation does not exceed 1.8-fold between the fastest (Proboscidea and Primates) and the slowest (Perissodactyla) evolving orders. Thus, the observed mitochondrial rate variations among taxa do not invalidate the suitability of mtDNA for drawing mammalian phylogeny. Dependence of evolutionary rate differences on variations in mutation and/or fixation rates was examined. Body size, generation time, and metabolic rate were tested, and no significant correlation was observed between them and the taxon-specific evolutionary rates, most likely because the latter might be influenced by multiple overlapping variable constraints.
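The abstract does not say which two relative-rate tests were used. As one widely known example, Tajima's relative-rate test compares two ingroup sequences against an outgroup: it counts sites where only the first ingroup sequence differs (m1) and sites where only the second differs (m2), and under equal rates (m1 - m2)² / (m1 + m2) is approximately chi-square distributed with one degree of freedom. The sketch below implements that test as an illustration only; the function name is hypothetical and it should not be read as the method of this study.

```python
import math


def tajima_relative_rate(seq1: str, seq2: str, outgroup: str):
    """Tajima's relative-rate test on three aligned sequences.

    Returns (m1, m2, chi_square, p_value), where m1 counts sites at which
    seq1 alone differs from the other two and m2 does the same for seq2.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if a != b and b == o:
            m1 += 1  # change assigned to the lineage leading to seq1
        elif a != b and a == o:
            m2 += 1  # change assigned to the lineage leading to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return m1, m2, chi_square, p_value
```

A small p-value suggests that the two ingroup lineages have accumulated substitutions at different rates since they diverged.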

18.
Serial transfer of plastids from one eukaryotic host to another is the key process involved in evolution of secondhand plastids. Such transfers drastically change the environment of the plastids and hence the selection regimes, presumably leading to changes over time in the characteristics of plastid gene evolution and to misleading phylogenetic inferences. About half of the dinoflagellate protists species are photosynthetic and unique in harboring a diversity of plastids acquired from a wide range of eukaryotic algae. They are therefore ideal for studying evolutionary processes of plastids gained through secondary and tertiary endosymbioses. In the light of these processes, we have evaluated the origin of 2 types of dinoflagellate plastids, containing the peridinin or 19'-hexanoyloxyfucoxanthin (19'-HNOF) pigments, by inferring the phylogeny using "covarion" evolutionary models allowing the pattern of among-site rate variation to change over time. Our investigations of genes from secondary and tertiary plastids derived from the rhodophyte plastid lineage clearly reveal "heterotachy" processes characterized as stationary covarion substitution patterns and changes in proportion of variable sites across sequences. Failure to accommodate covarion-like substitution patterns can have strong effects on the plastid tree topology. Importantly, multigene analyses performed with probabilistic methods using among-site rate and covarion models of evolution conflict with proposed single origin of the peridinin- and 19'-HNOF-containing plastids, suggesting that analysis of secondhand plastids can be hampered by convergence in the evolutionary signature of the plastid DNA sequences. Another type of sequence convergence was detected at protein level involving the psaA gene. Excluding the psaA sequence from a concatenated protein alignment grouped the peridinin plastid with haptophytes, congruent with all DNA trees. 
Altogether, taking account of complex processes involved in the evolution of dinoflagellate plastid sequences (both at the DNA and amino acid level), we demonstrate the difficulty of excluding independent, tertiary origin for both the peridinin and 19'-HNOF plastids involving engulfment of haptophyte-like algae. In addition, the refined topologies suggest the red algal order, Porphyridales, as the endosymbiont ancestor of the secondary plastids in cryptophytes, haptophytes, and heterokonts.

19.
Molecular evolution of nitrate reductase genes   Total citations: 9; self-citations: 0; citations by others: 9
To understand the evolutionary mechanisms and relationships of nitrate reductases (NRs), the nucleotide sequences encoding 19 nitrate reductase (NR) genes from 16 species of fungi, algae, and higher plants were analyzed. The NR genes examined show substantial sequence similarity, particularly within functional domains, and large variations in GC content at the third codon position and intron number. The intron positions were different between the fungi and plants, but conserved within these groups. The overall and nonsynonymous substitution rates among fungi, algae, and higher plants were estimated to be 4.33 × 10^-10 and 3.29 × 10^-10 substitutions per site per year. The three functional domains of NR genes evolved at about one-third of the rate of the N-terminal and the two hinge regions connecting the functional domains. Relative rate tests suggested that the nonsynonymous substitution rates were constant among different lineages, while the overall nucleotide substitution rates varied between some lineages. The phylogenetic trees based on NR genes correspond well with the phylogeny of the organisms determined from systematics and other molecular studies. Based on the nonsynonymous substitution rate, the divergence time of monocots and dicots was estimated to be about 340 Myr when the fungi–plant or algae–higher plant divergence times were used as reference points and 191 Myr when the rice–barley divergence time was used as a reference point. These two estimates are consistent with other estimates of divergence times based on these reference points. The lack of consistency between these two values appears to be due to the uncertainty of the reference times. Received: 10 April 1995 / Accepted: 10 September 1995
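Dating a split from a substitution rate, as in this abstract, rests on the standard molecular-clock relation K = 2rT: two lineages that diverged T years ago and each evolve at r substitutions per site per year are separated by K substitutions per site. A reference divergence gives r = K_ref / (2·T_ref), and an unknown split is then dated as T = K / (2r). The sketch below shows that arithmetic only; the function names and the numbers in the usage lines are hypothetical and are not values from the study.

```python
def rate_from_calibration(k_ref: float, t_ref_years: float) -> float:
    """Per-lineage rate r = K_ref / (2 * T_ref), in substitutions per site per year."""
    if t_ref_years <= 0:
        raise ValueError("reference time must be positive")
    return k_ref / (2.0 * t_ref_years)


def divergence_time(k: float, rate: float) -> float:
    """Divergence time T = K / (2 * r), in years, assuming a constant rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * rate)


# Hypothetical numbers for illustration only.
rate = rate_from_calibration(k_ref=0.2, t_ref_years=1.0e8)
print(rate)                         # per-lineage rate implied by the calibration
print(divergence_time(0.1, rate))   # age of a split half as divergent as the reference
```

Because T scales inversely with r, any error in the reference time carries straight through to the estimate, which is consistent with the abstract attributing its two differing dates to uncertain reference times.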

20.
Simplifying assumptions made in various tree reconstruction methods--notably rate constancy among nucleotide sites, homogeneity, and stationarity of the substitutional processes--are clearly violated when nucleotide sequences are used to infer distant relationships. Use of tree reconstruction methods based on such oversimplified assumptions can lead to misleading results, as pointed out by previous authors. In this paper, we made use of a (discretized) gamma distribution to account for variable rates of substitution among sites and built models that allowed for unequal base frequencies in different sequences. The models were nonhomogeneous Markov-process models, assuming different patterns of substitution in different parts of the tree. Data of the small-subunit rRNAs from four species were analyzed, where base frequencies were quite different among sequences and rates of substitution were highly variable at sites. Parameters in the models were estimated by maximum likelihood, and models were compared by the likelihood-ratio test. The nonhomogeneous models provided significantly better fit to the data than homogeneous models despite their involvement of many parameters. They also appeared to produce reasonable estimation of the phylogenetic tree; in particular, they seemed able to identify the root of the tree.
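The likelihood-ratio test used here to compare models can be sketched in a few lines. For nested models, twice the difference in maximized log-likelihoods is compared with a chi-square distribution whose degrees of freedom equal the number of extra free parameters; that approximation assumes the usual regularity conditions. The log-likelihood values and degrees of freedom in the usage lines are hypothetical, as the abstract does not report them, and the function names are illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df,
    built from the df=1 and df=2 closed forms by the standard recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(z))   # df = 1
    else:
        a, q = 1.0, math.exp(-z)              # df = 2
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (statistic, p_value) for nested models, where lnl_alt is the
    maximized log-likelihood of the richer model."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: the richer model has 2 extra parameters.
stat, p = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-997.0, df=2)
print(stat, p)
```

A small p-value means the extra parameters of the richer model improve the fit by more than chance alone would explain, which is the sense in which the abstract calls the nonhomogeneous models significantly better.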
