Similar Documents
20 similar documents found.
1.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data: the Poisson correction method, the random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results indicate that these estimates are highly correlated and that, in practice, any of the three methods may be used for constructing evolutionary trees or for relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substitution among sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. These two methods are shown to be quite insensitive to variation in the rate of nucleotide substitution.
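The Poisson correction turns the observed proportion p of differing amino acid sites into an estimated number of substitutions per site, d = -ln(1 - p). Below is a minimal Python sketch, assuming gap-free, pre-aligned sequences of equal length; the function names and the short example sequences are illustrative only and are not taken from the paper.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    return differences / len(seq1)


def poisson_corrected_distance(seq1: str, seq2: str) -> float:
    """Poisson-corrected number of substitutions per site: d = -ln(1 - p)."""
    p = p_distance(seq1, seq2)
    if p >= 1.0:
        raise ValueError("p >= 1: the Poisson correction is undefined")
    return -math.log(1.0 - p)


# Two made-up aligned protein fragments that differ at 2 of 9 sites.
d = poisson_corrected_distance("MKVLAAGIT", "MKILAAGVT")
print(round(d, 4))
```

Because -ln(1 - p) is always at least p and the gap widens as p grows, the correction matters most for distantly related pairs.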

2.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by computer simulation. The distance-matrix methods examined are the neighbor-joining (NJ), distance-Wagner (DW), Tateno et al.'s modified Farris, Faith, and Li methods. In the simulation, six or eight DNA sequences were assumed to evolve along a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each set of parameters. The results indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides is used, the probability of obtaining the correct topology (P1) is generally lower for the MP method than for the distance-matrix methods. The P1 value for the MP method increases with the number of nucleotides but generally remains lower than that for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant and whether or not there was a transition bias in nucleotide substitution. The relatively poor performance of the MP method in these cases arises because the method does not use information from singular sites. The MP method also showed a relatively low P1 value when the rate of nucleotide substitution varied and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were counted as successes, the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.
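Why singular sites carry no information for MP can be checked with Fitch's parsimony algorithm. In the sketch below (assumptions: rooted binary trees written as nested tuples, one character state per taxon; the taxa, trees, and sites are made up), a site at which only one taxon differs costs one change on every topology and so cannot prefer any tree, whereas a site shared by two pairs of taxa does discriminate between topologies.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one site on a rooted binary tree."""

    def visit(node) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:  # the children can agree: no extra change needed
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


taxa = ["t1", "t2", "t3", "t4"]
tree_12_34 = (("t1", "t2"), ("t3", "t4"))
tree_13_24 = (("t1", "t3"), ("t2", "t4"))
singleton = dict(zip(taxa, "AAAG"))    # only t4 differs: a singular site
informative = dict(zip(taxa, "AAGG"))  # two taxa share each state

for label, site in [("singleton", singleton), ("informative", informative)]:
    print(label, fitch_score(tree_12_34, site), fitch_score(tree_13_24, site))
```

Distance methods, by contrast, still count the singular difference in the pairwise distances involving t4, which is one way to read the abstract's explanation of the lower P1 for MP.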

3.
Accuracy of estimated phylogenetic trees from molecular data   (Times cited: 2; self-citations: 0; citations by others: 2)
The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. Nucleotide substitution in this sequence was assumed to follow a Poisson distribution, a negative binomial distribution, or a model of temporally varying rate. Estimates of the number of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences present at the end of the simulated evolution, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large, the Farris and modified Farris methods tend to be better than UPGMA and the F/M method at obtaining a good topology. For estimating the number of nucleotide substitutions on each branch of the tree, however, the modified Farris method performs better than the Farris method. When the coefficient of variation of branch length is small, UPGMA shows the best performance of the four methods examined. Nevertheless, every tree-making method has a high probability of producing an incorrect topology unless all branch lengths of the true tree are sufficiently long. It is also shown that agreement between patristic and observed genetic distances is not a good indicator of the quality of the tree obtained.
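UPGMA repeatedly joins the two closest clusters and recomputes distances as averages over all OTU pairs between clusters, which implicitly assumes a constant rate (an ultrametric tree); that assumption is plausibly why it does best when branch-length variation is small. The sketch below is a minimal illustration, assuming a complete distance table keyed by unordered pairs; the data format, the internal-node ids, and the example distances are hypothetical.

```python
import itertools


def upgma(labels, distances):
    """Build a rooted tree by UPGMA.

    labels: OTU names.
    distances: dict mapping frozenset({x, y}) to the distance between x and y.
    Returns (tree, root_height): tree is a nested tuple of OTU names and
    root_height is half the distance at which the last two clusters joined.
    """
    clusters = {name: (name, 1) for name in labels}  # id -> (subtree, size)
    dist = dict(distances)
    height = 0.0
    counter = 0
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: dist[frozenset(pair)])
        height = dist[frozenset((a, b))] / 2.0
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        new_id = ("internal", counter)  # tuple id cannot clash with string OTU names
        counter += 1
        # Size-weighted average = mean distance over all OTU pairs between clusters.
        for c in clusters:
            dist[frozenset((new_id, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[new_id] = ((tree_a, tree_b), size_a + size_b)
    tree, _ = next(iter(clusters.values()))
    return tree, height


pairs = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
         ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
tree, root_height = upgma(["A", "B", "C", "D"],
                          {frozenset(k): v for k, v in pairs.items()})
print(tree, root_height)
```

With clock-like (ultrametric) input such as this, UPGMA recovers the generating tree; when lineages evolve at unequal rates, the averaging step can group taxa by rate rather than by ancestry.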

4.
Reconstruction of ancestral DNA and amino acid sequences is an important means of inferring information about past evolutionary events. Such reconstructions suggest changes in molecular function and evolutionary processes over the course of evolution and are used to infer adaptation and convergence. Maximum likelihood (ML) is generally thought to provide relatively accurate reconstructed sequences compared to parsimony, but both methods lead to the inference of multiple directional changes in nucleotide frequencies in primate mitochondrial DNA (mtDNA). To better understand this surprising result, as well as to better understand how parsimony and ML differ, we constructed a series of computationally simple "conditional pathway" methods that differed in the number of substitutions allowed per site along each branch, and we also evaluated the entire Bayesian posterior frequency distribution of reconstructed ancestral states. We analyzed primate mitochondrial cytochrome b (Cyt-b) and cytochrome oxidase subunit I (COI) genes and found that ML reconstructs ancestral frequencies that are often more different from tip sequences than are parsimony reconstructions. In contrast, frequency reconstructions based on the posterior ensemble more closely resemble extant nucleotide frequencies. Simulations indicate that these differences in ancestral sequence inference are probably due to deterministic bias caused by high uncertainty in the optimization-based ancestral reconstruction methods (parsimony, ML, Bayesian maximum a posteriori). In contrast, ancestral nucleotide frequencies based on an average of the Bayesian set of credible ancestral sequences are much less biased. 
The methods involving simpler conditional pathway calculations have slightly reduced likelihood values compared to full likelihood calculations, but they can provide fairly unbiased nucleotide reconstructions and may be useful in more complex phylogenetic analyses than considered here due to their speed and flexibility. To determine whether biased reconstructions using optimization methods might affect inferences of functional properties, ancestral primate mitochondrial tRNA sequences were inferred and helix-forming propensities for conserved pairs were evaluated in silico. For ambiguously reconstructed nucleotides at sites with high base composition variability, ancestral tRNA sequences from Bayesian analyses were more compatible with canonical base pairing than were those inferred by other methods. Thus, nucleotide bias in reconstructed sequences apparently can lead to serious bias and inaccuracies in functional predictions.
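The contrast between optimization-based reconstruction and averaging over the posterior can be illustrated with a toy calculation. The sketch assumes per-site posterior probabilities are already available (the numbers are invented, not from the Cyt-b or COI analyses): taking the single most probable base at each site inflates the frequency of the favored base, whereas averaging the posterior keeps the less probable bases in proportion.

```python
from collections import Counter

BASES = "ACGT"


def map_frequencies(posteriors):
    """Base frequencies of the sequence built from the most probable base at each site."""
    counts = Counter(max(site, key=site.get) for site in posteriors)
    n = len(posteriors)
    return {b: counts[b] / n for b in BASES}


def posterior_mean_frequencies(posteriors):
    """Expected base frequencies, averaging the posterior distribution over sites."""
    n = len(posteriors)
    return {b: sum(site.get(b, 0.0) for site in posteriors) / n for b in BASES}


# Hypothetical posteriors: ten ambiguous sites, each 60% A and 40% G.
posteriors = [{"A": 0.6, "G": 0.4}] * 10
map_freqs = map_frequencies(posteriors)
mean_freqs = posterior_mean_frequencies(posteriors)
print(map_freqs)
print(mean_freqs)
```

Every ambiguous site is assigned A in the MAP sequence, so the reconstructed A frequency is 1.0 even though the posterior expectation is about 0.6; this is the kind of deterministic bias the abstract attributes to optimization-based methods.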

5.
We propose two approximate methods (one based on parsimony and one on pairwise sequence comparison) for estimating the pattern of nucleotide substitution, and a parsimony-based method for estimating the gamma parameter for variable substitution rates among sites. The matrix of substitution rates that represents the substitution pattern can be recovered through its relationship with the observable matrix of site-pattern frequencies in pairwise sequence comparisons. In the parsimony approach, the ancestral sequences reconstructed by the parsimony algorithm were used, and the two sequences compared are those at the two ends of a branch in the phylogenetic tree. The method for estimating the gamma parameter was based on a reinterpretation of the numbers of changes at sites inferred by parsimony. Three data sets were analyzed to examine the utility of the approximate methods compared with the more reliable likelihood methods. The new methods for estimating the substitution pattern produced estimates quite similar to those obtained from the likelihood analyses. The new method for estimating the gamma parameter was effective in reducing the bias in conventional parsimony estimates, although it also overestimated the parameter. The approximate methods are computationally very fast and appear useful for analyzing large data sets, for which the likelihood method requires excessive computation.
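For orientation, the conventional estimate of the gamma shape parameter from parsimony change counts is a method-of-moments fit of a negative binomial distribution. The sketch below shows only that conventional estimate, not the authors' corrected method; it assumes the per-site change counts are already available, and the counts used here are made up. Because parsimony misses multiple changes at fast-evolving sites, this naive estimate tends to be biased, which is the problem the new method addresses.

```python
import math
import statistics


def gamma_shape_moments(changes_per_site):
    """Method-of-moments gamma shape from per-site change counts.

    If rates are gamma-distributed with shape alpha and counts are Poisson
    given the rate, counts are negative binomial with variance
    m + m**2 / alpha, so alpha = m**2 / (v - m).
    """
    m = statistics.fmean(changes_per_site)
    v = statistics.pvariance(changes_per_site)
    if v <= m:
        return math.inf  # no over-dispersion: consistent with equal rates
    return m * m / (v - m)


counts = [0, 0, 0, 1, 0, 3, 0, 5, 1, 0]  # hypothetical changes per site
alpha = gamma_shape_moments(counts)
print(alpha)
```

Small values of alpha indicate strong rate variation among sites; returning infinity when the variance does not exceed the mean is a design choice for this sketch.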

6.
Two ways of estimating superimposed fixed mutations in the divergent descent of proteins are examined. One method counts these in terms of a Poisson process operating within selective constraints. The other uses the maximum parsimony method to connect the contemporary sequences through intervening ancestral sequences in an evolutionary tree, and then, from the distribution of fixed mutations in dense regions of this genealogy, estimates how many fixations should be added to sparse regions. An algorithm is described which determines such augmented distances. The two methods yield similar estimates of genetic divergence when tested on a series of cytochrome c amino acid sequences. Within those constraints imposed by Darwinian selection, the dynamic behavior of the evolutionary divergence of proteins is described by the probabilistic pathways of the stochastic model. The parsimony model provides a valid Aufbau-Prinzip for examining which of those pathways occurred along a particular lineage. Concordance of the numerical magnitudes of genetic divergence estimates made by the two methods reveals them as logically consistent complements, not as mutually exclusive antagonists. Both methods indicate that cytochrome c has evolved in a non-uniform manner over geological time and more rapidly than previously estimated.

7.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. These three distance matrices are then weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a single distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have significantly better treelikeness than those obtained from standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate part of the noise often associated with some codon positions, particularly the third position, which is known to evolve rapidly. Simulation results show that fast distance-based tree reconstruction algorithms applied to distance matrices based on this codon-position weighting can produce phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed, showing that our codon evolutionary distances yield a phylogenetic tree that is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and significantly improved compared with standard nucleotide evolutionary distance estimates.
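The per-position-then-weight idea can be sketched as follows. This is not the authors' estimator: the Jukes-Cantor (JC69) distance per codon position and the user-supplied weights are assumptions made for illustration, whereas the paper estimates its weights from the global evolutionary rate of each position. Setting the third-position weight to zero mimics discarding its noise.

```python
import math


def jc69(p: float) -> float:
    """Jukes-Cantor distance from a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: JC69 distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def position_distances(seq1: str, seq2: str):
    """JC69 distances computed separately for codon positions 1, 2 and 3."""
    if len(seq1) != len(seq2) or len(seq1) == 0 or len(seq1) % 3 != 0:
        raise ValueError("need two aligned coding sequences of equal length (multiple of 3)")
    distances = []
    for k in range(3):
        a, b = seq1[k::3], seq2[k::3]  # every third nucleotide, starting at position k
        p = sum(x != y for x, y in zip(a, b)) / len(a)
        distances.append(jc69(p))
    return distances


def weighted_codon_distance(seq1: str, seq2: str, weights) -> float:
    """Weighted average of the three per-position distances (weights are hypothetical)."""
    d = position_distances(seq1, seq2)
    return sum(w * dk for w, dk in zip(weights, d)) / sum(weights)


s1, s2 = "ATGGCTGCA", "ATGGCCGCG"  # made-up codons differing only at third positions
equal_weights = weighted_codon_distance(s1, s2, (1.0, 1.0, 1.0))
no_third = weighted_codon_distance(s1, s2, (1.0, 1.0, 0.0))  # third positions ignored
print(equal_weights, no_third)
```

Applying the same function to every pair of sequences would produce the averaged distance matrix that a fast method such as neighbor joining could then use.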

8.
The phylogenetic inference of ancestral protein sequences is a powerful technique for the study of molecular evolution, but any conclusions drawn from such studies are only as good as the accuracy of the reconstruction method. Every inference method leads to errors in the ancestral protein sequence, resulting in potentially misleading estimates of the ancestral protein's properties. To assess the accuracy of ancestral protein reconstruction methods, we performed computational population evolution simulations featuring near-neutral evolution under purifying selection, speciation, and divergence using an off-lattice protein model where fitness depends on the ability to be stable in a specified target structure. We were thus able to compare the thermodynamic properties of the true ancestral sequences with the properties of “ancestral sequences” inferred by maximum parsimony, maximum likelihood, and Bayesian methods. Surprisingly, we found that methods such as maximum parsimony and maximum likelihood that reconstruct a “best guess” amino acid at each position overestimate thermostability, while a Bayesian method that sometimes chooses less-probable residues from the posterior probability distribution does not. Maximum likelihood and maximum parsimony apparently tend to eliminate variants at a position that are slightly detrimental to structural stability simply because such detrimental variants are less frequent. Other properties of ancestral proteins might be similarly overestimated. This suggests that ancestral reconstruction studies require greater care to come to credible conclusions regarding functional evolution. Inferred functional patterns that mimic reconstruction bias should be reevaluated.

9.
Conducting computer simulations, Nei and Tateno (1978) showed that Jukes and Holmquist's (1972) method of estimating the number of nucleotide substitutions tends to give an overestimate and that the estimate obtained has a large variance. Holmquist and Conroy (1980) repeated some parts of our simulation and claim that the overestimation of nucleotide substitutions in our paper occurred mainly because we used selected data. Examination of Holmquist and Conroy's simulation indicates that their results are essentially the same as ours when the Jukes-Holmquist method is used, but because they used a different method of computation, their estimates of nucleotide substitutions differed substantially from ours. Another problem in Holmquist and Conroy's Letter is that they confused the expected number of nucleotide substitutions with the number in a sample. This confusion has resulted in a number of unnecessary arguments. They also criticized our X² measure, but this criticism is apparently due to a misunderstanding of the assumptions of our method and a failure to use our method in the way we described. We believe that our earlier conclusions remain unchanged.

10.
Many methods are available for estimating ancestral values of continuous characteristics, but little is known about how well these methods perform. Here we compare six methods: linear parsimony, squared-change parsimony, one-parameter maximum likelihood (Brownian motion), two-parameter maximum likelihood (Ornstein-Uhlenbeck process), and independent comparisons with and without branch-length information. We apply these methods to data from 20 morphospecies of Pleistocene planktic Foraminifera in order to estimate ancestral size and shape variables, and compare these estimates with measurements on fossils close to the phylogenetic position of 13 ancestors. No method produced accurate estimates for any variable: the estimates were consistently worse predictors of the observed values than the averages of the observed values. The two-parameter maximum-likelihood model consistently produces the most accurate size estimates overall. Estimation of ancestral sizes is confounded by an evolutionary trend towards increasing size. Shape showed no trend but was still estimated very poorly; we consider possible reasons. We discuss the implications of our results for the use of estimates of ancestral characteristics.
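Under the one-parameter (Brownian motion) model, the maximum-likelihood estimate of the root state can be computed with Felsenstein's pruning: each internal node's value is the inverse-variance-weighted mean of its children's values, and its branch is lengthened to absorb the children's combined uncertainty. The sketch below assumes strictly positive branch lengths below the root; the `Node` class, the tree, and the trait values are hypothetical. It gives the ML estimate for the root only; estimates for other internal nodes would need the data from the rest of the tree as well.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    length: float = 0.0            # branch length to the parent (must be > 0 below the root)
    value: Optional[float] = None  # observed trait value (tips only)
    children: List["Node"] = field(default_factory=list)


def prune(node: Node) -> Tuple[float, float]:
    """Return (estimate, extra_variance) for the subtree rooted at node."""
    if not node.children:
        return node.value, 0.0
    weighted = []
    for child in node.children:
        estimate, extra = prune(child)
        variance = child.length + extra  # branch lengthened by the child's own uncertainty
        weighted.append((estimate, 1.0 / variance))
    total_weight = sum(w for _, w in weighted)
    estimate = sum(x * w for x, w in weighted) / total_weight
    return estimate, 1.0 / total_weight


def bm_root_estimate(root: Node) -> float:
    """Brownian-motion ML estimate of the trait value at the root."""
    return prune(root)[0]


# Hypothetical tree ((A:1, B:1):1, C:2) with trait values A=1, B=3, C=6.
tree = Node(children=[
    Node(length=1.0, children=[Node(length=1.0, value=1.0),
                               Node(length=1.0, value=3.0)]),
    Node(length=2.0, value=6.0),
])
root_estimate = bm_root_estimate(tree)
print(root_estimate)
```

The estimate always lies within the range of the tip values, so a directional trend such as the increase in size reported above cannot be captured by this model, which is consistent with the abstract's observation.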

11.
A simulation study was carried out to investigate the relative importance of tree topology (both balance and stemminess), evolutionary rates (constant, varying among characters, and varying among lineages), and evolutionary models in determining the accuracy with which phylogenetic trees can be estimated. The three evolutionary context models were phyletic (characters can change at each simulated time step), speciational (changes are possible only at the time of speciation into two daughter lineages), and punctuational (changes occur at the time of speciation but only in one of the daughter lineages). UPGMA clustering and maximum parsimony (“Wagner trees”) methods for estimating phylogenies were compared. All trees were based on eight recent OTUs. The three evolutionary context models were found to have the largest influence on the accuracy of estimates by both methods. The next most important effect was that of the stemminess × context interaction. Maximum parsimony and UPGMA performed worst under the punctuational models. Under the phyletic model, trees with high stemminess values could be estimated more accurately, and balanced trees were slightly easier to estimate than unbalanced ones. Overall, maximum parsimony yielded more accurate trees than UPGMA, but that was expected for these simulations since many more characters than OTUs were used. Our results suggest that the great majority of estimated phylogenetic trees are likely to be quite inaccurate; they underscore the inappropriateness of characterizing current phylogenetic methods as being for reconstruction rather than for estimation.
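Tree balance can be quantified in several ways; one common choice is the Colless index, the sum over internal nodes of the absolute difference between the numbers of tips in the two daughter subtrees. The study may have used a different balance statistic, so treat this as an illustrative sketch; the nested-tuple tree format and the two eight-tip example trees are assumptions.

```python
def colless_index(tree) -> int:
    """Colless imbalance: sum over internal nodes of |left tips - right tips|."""

    def visit(node):
        if not isinstance(node, tuple):
            return 1, 0  # (tip count, imbalance)
        left_tips, left_imbalance = visit(node[0])
        right_tips, right_imbalance = visit(node[1])
        return (left_tips + right_tips,
                left_imbalance + right_imbalance + abs(left_tips - right_tips))

    return visit(tree)[1]


tips = "ABCDEFGH"  # eight OTUs, as in the simulations described above
balanced = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
caterpillar = tips[0]
for tip in tips[1:]:
    caterpillar = (caterpillar, tip)  # fully pectinate (maximally unbalanced) tree

print(colless_index(balanced), colless_index(caterpillar))
```

For n tips the index ranges from 0 for a perfectly balanced tree (when n is a power of two) to (n - 1)(n - 2)/2 for a pectinate tree, which is 21 for eight tips.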

12.
Yang C, Hao H, Liu S, Liu Y, Yue B, Zhang X. Mitochondrial DNA, 2012, 23(2): 131-133
The Chinese oriental vole (Eothenomys chinensis), a member of the subfamily Arvicolinae, is endemic to the mountains of southwest China. E. chinensis and other arvicoline species display a number of features that make them ideal for evolutionary studies of speciation and of the role of Quaternary glacial cycles in diversification. In this study, the complete mitochondrial genome of E. chinensis was sequenced and found to be 16,362 bases long. The nucleotide sequences of the 12 heavy-strand protein-coding genes of E. chinensis and 19 other rodents were used for phylogenetic analyses. Trees constructed using three different phylogenetic methods (Bayesian, maximum parsimony, and maximum likelihood) showed similar topologies, demonstrating that E. chinensis clustered within the subfamily Arvicolinae, which formed a solid monophyletic group sister to the subfamily Cricetinae. The trees also suggested that E. chinensis is sister to the genera Microtus and Proedromys.

13.
The augmentation procedure of G. W. Moore leads to correct estimates of the total number of nucleotide substitutions separating two genes descended from a common ancestor, provided the data base is sufficiently dense. These estimates agree with the true distance values from simulations of known evolutionary pathways. On average, the estimates are unbiased: they neither overaugment nor underaugment seriously. The variance of the population of augmented distance values accurately reflects the variance of the population of true distance values and is thus not abnormally large because of procedural defects in the algorithm. The augmented distances agree with stochastic models tested on real data when the latter take proper account of the restricted mutability of codons resulting from natural selection. When the experimental data base is not dense, the augmented distance values and population variance may underestimate both the true distance values and their variance. A logical consequence is that significant and numerous errors exist in the ancestral sequences reconstructed by the parsimony principle from such data bases. The restrictions on the mutability of different nucleotide sites resulting from natural selection are shown to bear critically on the accuracy of estimates of the total number of nucleotide replacements made by stochastic models.

14.
Computer simulations provide a flexible method for assessing the power and robustness of phylogenetic inference methods. Unfortunately, simulated data are often obviously atypical of data encountered in studies of molecular evolution. Unrealistic simulations can lead to conclusions that are irrelevant to real-data analyses or can provide a biased view of which methods perform well. Here, we present a software tool designed to generate data under a complex codon model that allows each residue in the protein sequence to have a different set of equilibrium amino acid frequencies. The software can obtain maximum-likelihood estimates of the parameters of the Halpern-Bruno model from empirical data and a fixed tree; given an arbitrary tree and a fixed set of parameters, the software can then simulate artificial datasets. We present the results of a simulation experiment using randomly generated tree shapes and substitution parameters estimated from 1610 mammalian cytochrome b sequences. We tested tree inference at the amino acid, nucleotide, and codon levels and under parsimony, maximum-likelihood, Bayesian, and distance criteria (for a total of more than 650 analyses on each dataset). Based on these simulations, nucleotide-level analyses seem to be more accurate than amino acid and codon analyses. The performance of distance-based phylogenetic methods appears to be quite sensitive to the choice of model and the form of rate heterogeneity used. Further studies are needed to assess the generality of these conclusions. For example, fitting the parameters of the Halpern-Bruno model to sequences from other genes will reveal the extent to which our conclusions were influenced by the choice of cytochrome b. Incorporating codon bias and more sources of heterogeneity into the simulator will be crucial to determining whether the current results are caused by a bias in favour of nucleotide analyses in the current simulation study.
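The central idea of the Halpern-Bruno model is that each site has its own equilibrium frequencies, and the substitution rate from state i to state j is the mutation rate multiplied by a fixation factor that depends on the ratio of those frequencies. The sketch below builds such a per-site rate matrix under a simplifying assumption that is not taken from the software described above: a single symmetric mutation rate `mu` between every pair of states. The state space and the frequencies in the example are hypothetical.

```python
import math


def halpern_bruno_rates(pi, mu=1.0):
    """Per-site rate matrix under the Halpern-Bruno model, assuming a
    symmetric mutation rate mu between every pair of states.

    With symmetric mutation, Q[i][j] = mu * x * ln(x) / (x - 1), where
    x = pi[j] / pi[i]; all frequencies must be strictly positive.
    """
    n = len(pi)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            x = pi[j] / pi[i]
            # Fixation factor x*ln(x)/(x-1); its limit as x -> 1 is 1.
            factor = 1.0 if abs(x - 1.0) < 1e-12 else x * math.log(x) / (x - 1.0)
            q[i][j] = mu * factor
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    return q


pi = [0.1, 0.2, 0.3, 0.4]  # hypothetical site-specific equilibrium frequencies
q = halpern_bruno_rates(pi)
for row in q:
    print([round(r, 3) for r in row])
```

Because pi[i] * Q[i][j] equals pi[j] * Q[j][i] for this construction, the chain is reversible with stationary distribution pi, so sequences simulated under it drift toward each site's own frequencies.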

15.
A mathematical theory for the evolutionary change of restriction endonuclease cleavage sites is developed, and the probabilities of various types of restriction-site changes are evaluated. A computer simulation is also conducted to study properties of the evolutionary change of restriction sites. These studies indicate that parsimony methods of constructing phylogenetic trees often make erroneous inferences about evolutionary changes of restriction sites unless the number of nucleotide substitutions per site is less than 0.01 for all branches of the tree. This introduces a systematic error in estimating the number of mutational changes for each branch and, consequently, in constructing phylogenetic trees. Therefore, parsimony methods should be used only in cases where nucleotide sequences are closely related. Reexamination of Ferris et al.'s data on restriction-site differences of mitochondrial DNAs does not support Templeton's conclusions regarding the phylogenetic tree for man and apes and the molecular clock hypothesis. Templeton's claim that Nei and Li's method of estimating the number of nucleotide substitutions per site is seriously affected by parallel losses and loss-gains of restriction sites is also unsupported.

16.
A major assumption of many molecular phylogenetic methods is the homogeneity of nucleotide frequencies among taxa, that is, the equality of nucleotide frequency bias among species. Changes in nucleotide frequency among different lineages in a data set are thought to lead to erroneous phylogenetic inference, because unrelated clades may appear similar owing to similarities in nucleotide frequencies that are evolutionarily unrelated. We tested the effects of heterogeneity in nucleotide frequency bias on phylogenetic inference, along with the interaction between this heterogeneity and stratified taxon sampling, by means of computer simulations using evolutionary parameters derived from genomic databases. We found that phylogenetic trees inferred from data sets simulated under realistic, observed levels of heterogeneity for mammalian genes were reconstructed with accuracy comparable to those simulated with homogeneous nucleotide frequencies; the results hold for the Neighbor-Joining, minimum evolution, maximum parsimony, and maximum-likelihood methods. The LogDet distance method, specifically designed to deal with heterogeneous nucleotide frequencies, does not perform better than distance methods that assume homogeneity of the substitution pattern among sequences. Under these specific simulation conditions, we did not find a significant interaction between phylogenetic accuracy and substitution-pattern heterogeneity among lineages, even when taxon sampling was increased.
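The LogDet distance works from the 4 × 4 divergence matrix F of paired bases between two aligned sequences, correcting for each sequence's own base composition. The sketch below uses the common form d = -(1/4)[ln det F - (1/2) ln(det Πx · det Πy)], where Πx and Πy are diagonal matrices of the two sequences' base frequencies; other variants exist, so treat the exact normalization as an assumption. Sites with gaps or ambiguity codes are skipped, and the test sequences are constructed so that the result should equal the Jukes-Cantor distance for p = 0.25.

```python
import math

BASES = "ACGT"


def divergence_matrix(seq1, seq2):
    """Joint frequencies F[i][j] of base i in seq1 paired with base j in seq2."""
    index = {b: i for i, b in enumerate(BASES)}
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in index and b in index]
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        f[index[a]][index[b]] += 1.0 / len(pairs)
    return f


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-15:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def logdet_distance(seq1, seq2):
    f = divergence_matrix(seq1, seq2)
    freq1 = [sum(f[i]) for i in range(4)]                        # base frequencies in seq1
    freq2 = [sum(f[i][j] for i in range(4)) for j in range(4)]   # base frequencies in seq2
    det_f = det(f)
    if det_f <= 0 or min(freq1) == 0 or min(freq2) == 0:
        raise ValueError("LogDet distance is undefined for these sequences")
    log_det_pi = sum(math.log(x) for x in freq1) + sum(math.log(x) for x in freq2)
    return -0.25 * (math.log(det_f) - 0.5 * log_det_pi)


# Made-up pair: for each base, 9 sites unchanged and 1 site changed to each other base.
s1 = "".join(a * (9 if a == b else 1) for a in BASES for b in BASES)
s2 = "".join(b * (9 if a == b else 1) for a in BASES for b in BASES)
print(logdet_distance(s1, s2))
```

With equal base frequencies, as here, LogDet coincides with the Jukes-Cantor distance; its intended advantage appears only when base composition differs among sequences, which is the situation the simulations above examine.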

17.
The phylogenetic affinities of the chaetognaths: a molecular analysis   (Times cited: 8; self-citations: 3; citations by others: 5)
The chaetognaths, or arrowworms, constitute a small and enigmatic phylum of marine invertebrates whose phylogenetic affinities have long been uncertain. A popular hypothesis is that the chaetognaths are the sister group of the major deuterostome phyla: chordates, hemichordates, and echinoderms. Here we attempt to determine the affinities of the chaetognaths by using molecular sequence data. We describe the isolation and nucleotide sequence determination of 18S ribosomal DNA from one species of chaetognath and one acanthocephalan. Extensive phylogenetic analyses employing a suite of phylogenetic reconstruction methods (maximum parsimony, maximum likelihood, evolutionary parsimony, and two distance methods) suggest that the hypothesized relationship between chaetognaths and the deuterostomes is incorrect. In contrast, we propose that the lineage leading to the chaetognaths arose prior to the advent of the coelomate metazoa.

18.
Yang Z. Systematic Biology, 1998, 47(1): 125-133
The effect of the evolutionary rate of a gene on the accuracy of phylogeny reconstruction was examined by computer simulation. The evolutionary rate is measured by the tree length, that is, the expected total number of nucleotide substitutions per site on the phylogeny. DNA sequence data were simulated using both fixed trees with specified branch lengths and random trees with branch lengths generated from a model of cladogenesis. The parsimony and likelihood methods were used for phylogeny reconstruction, and the proportion of correctly recovered branch partitions by each method was estimated. Phylogenetic methods, including parsimony, appear quite tolerant of multiple substitutions at the same site. The optimum levels of sequence divergence were even higher than the upper limits previously suggested for saturation of substitutions, indicating that the problem of saturation may have been exaggerated. Instead, the lack of information at low levels of divergence should be seriously considered in evaluating a gene's phylogenetic utility, especially when the gene sequence is short. The performance of parsimony relative to that of likelihood does not necessarily decrease as the evolutionary rate increases.
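The accuracy measure used here, the proportion of correctly recovered branch partitions, can be computed by listing the non-trivial bipartitions of the true and estimated trees and counting the shared ones; for fully resolved trees this is the complement of the normalized Robinson-Foulds distance. A minimal sketch, assuming binary trees written as nested tuples with at least four tips (the trees below are made up); each bipartition is stored as the side that excludes a fixed reference taxon so that rooting does not matter.

```python
def bipartitions(tree):
    """Non-trivial splits of a tree, each stored as the side without the reference taxon."""
    clades = []

    def visit(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        clades.append(clade)
        return clade

    taxa = visit(tree)
    ref = min(taxa)
    result = set()
    for clade in clades:
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:  # drop trivial and empty splits
            result.add(side)
    return result


def proportion_recovered(true_tree, estimated_tree):
    """Fraction of the true tree's bipartitions present in the estimated tree."""
    true_splits = bipartitions(true_tree)
    return len(true_splits & bipartitions(estimated_tree)) / len(true_splits)


true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated = ((("A", "C"), "B"), ("D", "E"))
score = proportion_recovered(true_tree, estimated)
print(score)
```

Averaging this score over many replicate data sets gives a per-branch accuracy, which is more informative than an all-or-nothing test of whether the whole topology is correct.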

19.
Statistical properties of Goodman et al.'s (1974) method of compensating for undetected nucleotide substitutions in evolution are investigated by computer simulation. The method is found to overcompensate when the stochastic error of the number of nucleotide substitutions is large. Furthermore, the estimate of the number of nucleotide substitutions obtained by this method has a large variance. However, a much larger simulation appears to be necessary to determine whether this method overcompensates when applied together with the maximum parsimony method.

20.
Dou J, Zhao X, Fu X, Jiao W, Wang N, Zhang L, Hu X, Wang S, Bao Z. Biology Direct, 2012, 7(1): 17-9
Background: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype large numbers of SNPs in non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome.
Results: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates a mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulated and real sequencing datasets, we demonstrate that, compared with ML or a threshold approach, iML can markedly improve the accuracy of de novo SNP genotyping and is especially powerful for reference-free genotyping in diploid genomes with high repeat content.
Conclusions: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in non-model organisms without a reference genome.
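For context, a basic maximum-likelihood genotype call for a biallelic diploid site compares the likelihood of the observed read counts under the three genotypes, given a per-read error rate. This is a generic sketch, not the iML algorithm: the error rate, allele labels, and read counts are assumptions, and it does not model read depth, which is what iML's mixed Poisson/normal component adds to flag read clusters that merge repeat copies.

```python
import math


def genotype_log_likelihoods(n_a: int, n_b: int, error: float = 0.01):
    """Log-likelihood of each diploid genotype given counts of reads showing allele a or b.

    The binomial coefficient is omitted because it is the same for all genotypes.
    """
    prob_read_a = {"aa": 1.0 - error, "ab": 0.5, "bb": error}
    return {g: n_a * math.log(p) + n_b * math.log(1.0 - p)
            for g, p in prob_read_a.items()}


def call_genotype(n_a: int, n_b: int, error: float = 0.01) -> str:
    """Genotype with the highest likelihood."""
    ll = genotype_log_likelihoods(n_a, n_b, error)
    return max(ll, key=ll.get)


for counts in [(20, 0), (10, 9), (1, 25)]:
    print(counts, call_genotype(*counts))
```

A read cluster that merges two nearly identical repeat copies also yields roughly balanced allele counts, so this plain model would call it heterozygous; the inflated depth of such clusters is the extra signal that a depth-aware method can exploit.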


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417