Found 20 similar articles; search took 9 ms
1.
Relative efficiencies of the maximum-parsimony and distance-matrix methods of phylogeny construction for restriction data. Total citations: 4, self-citations: 0, citations by others: 4
The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.
2.
In the reconstruction of a large phylogenetic tree, the most difficult part is usually the problem of how to explore the topology space to find the optimal topology. We have developed a "divide-and-conquer" heuristic algorithm in which an initial neighbor-joining (NJ) tree is divided into subtrees at internal branches having bootstrap values higher than a threshold. The topology search is then conducted by using the maximum-likelihood method to reevaluate all branches with a bootstrap value lower than the threshold while keeping the other branches intact. Extensive simulation showed that our simple method, the neighbor-joining maximum-likelihood (NJML) method, is highly efficient in improving NJ trees. Furthermore, the performance of the NJML method is nearly equal to or better than existing time-consuming heuristic maximum-likelihood methods. Our method is suitable for reconstructing relatively large molecular phylogenetic trees (number of taxa ≥ 16).
3.
Grishin NV 《Journal of molecular evolution》1995,41(5):675-679
A general model for estimating the number of amino acid substitutions per site (d) from the fraction of identical residues between two sequences (q) is proposed. The well-known Poisson-correction formula $q = e^{-d}$ corresponds to a site-independent and amino-acid-independent substitution rate. The equation $q = (1 - e^{-2d})/(2d)$, derived for the case of substitution rates that are site-independent but vary among amino acids, closely approximates the empirical method suggested by Dayhoff et al. (1978). The equation $q = 1/(1 + d)$ describes the case of substitution rates that are amino-acid-independent but vary among sites. Lastly, the equation $q = \ln(1 + 2d)/(2d)$ accounts for the general case where substitution rates can differ for both amino acids and sites.
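The four relations in this abstract give q as a function of d, whereas in practice one observes q and wants d. The following is a minimal sketch of that inversion, not the author's implementation: the function names and the bisection solver `solve_d` are assumptions introduced here for illustration. It relies only on the fact that each relation is strictly decreasing in d.

```python
import math

# The four q(d) relations quoted in the abstract (function names are hypothetical).
def q_poisson(d):
    return math.exp(-d)

def q_amino_acid_variation(d):
    # (1 - e^{-2d}) / (2d), written with expm1 for accuracy at small d
    return -math.expm1(-2.0 * d) / (2.0 * d)

def q_site_variation(d):
    return 1.0 / (1.0 + d)

def q_general(d):
    # ln(1 + 2d) / (2d), written with log1p for accuracy at small d
    return math.log1p(2.0 * d) / (2.0 * d)

def solve_d(q_of_d, q, lo=1e-9, hi=1e3, tol=1e-10):
    """Find d such that q_of_d(d) == q by bisection.

    Assumes q_of_d is strictly decreasing and that the answer lies in [lo, hi].
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if q_of_d(mid) > q:
            lo = mid   # identity still too high, so d must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

if __name__ == "__main__":
    q_observed = 0.5  # illustrative value, not data from the paper
    for f in (q_poisson, q_amino_acid_variation, q_site_variation, q_general):
        print(f.__name__, round(solve_d(f, q_observed), 4))
```

For a fixed observed q, the Poisson correction yields the smallest d and the general relation the largest, which is consistent with rate heterogeneity hiding more substitutions behind the same fraction of identical residues.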
4.
Correspondence to: M. Nei
5.
Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide. Total citations: 17, self-citations: 0, citations by others: 17
Summary: A formal mathematical analysis of Kimura's (1981) six-parameter model of nucleotide substitution for the case of unequal substitution rates among different pairs of nucleotides is conducted, and new formulae for estimating the number of nucleotide substitutions and its standard error are obtained. By using computer simulation, the validities and utilities of Jukes and Cantor's (1969) one-parameter formula, Takahata and Kimura's (1981) four-parameter formula, and our six-parameter formula for estimating the number of nucleotide substitutions are examined under three different schemes of nucleotide substitution. It is shown that the one-parameter and four-parameter formulae often give underestimates when the number of nucleotide substitutions is large, whereas the six-parameter formula generally gives a good estimate for all three substitution schemes examined. However, when the number of nucleotide substitutions is large, the six-parameter and four-parameter formulae are often inapplicable unless the number of nucleotides compared is extremely large. It is also shown that as long as the mean number of nucleotide substitutions is smaller than one per nucleotide site, the three formulae give more or less the same estimate regardless of the substitution scheme used.
On leave of absence from the Department of Biology, Faculty of Science, Kyushu University 33, Fukuoka 812, Japan
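As a point of reference for the one-parameter formula mentioned above, here is a minimal sketch of the standard Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), with its usual large-sample standard error. It is not the six-parameter estimator derived in the paper; the function name `jc69_distance` and its arguments are assumptions made for illustration. The sketch also shows one way a distance formula becomes inapplicable: the logarithm's argument is no longer positive once the observed proportion of differing sites p reaches 3/4.

```python
import math

def jc69_distance(p, n_sites):
    """Jukes-Cantor (one-parameter) distance and its approximate standard error.

    p       -- observed proportion of sites that differ between two sequences
    n_sites -- number of nucleotide sites compared
    Raises ValueError when the formula is inapplicable (p outside [0, 0.75)).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("Jukes-Cantor formula is inapplicable for p outside [0, 0.75)")
    w = 1.0 - 4.0 * p / 3.0
    d = -0.75 * math.log(w)
    # Large-sample variance: p(1 - p) / (n * w^2)
    se = math.sqrt(p * (1.0 - p) / n_sites) / w
    return d, se

if __name__ == "__main__":
    # Illustrative inputs only, not values from the paper.
    d, se = jc69_distance(0.3, 1000)
    print(f"d = {d:.4f}, SE = {se:.4f}")
```

With short sequences, sampling noise alone can push p to 0.75 or beyond when the true divergence is large, which is one reason such estimators can fail unless many sites are compared.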
6.
When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models.
Correspondence to: M. Nei
7.
We test models for the evolution of helical regions of RNA sequences, where the base pairing constraint leads to correlated compensatory substitutions occurring on either side of the pair. These models are of three types: 6-state models include only the four Watson-Crick pairs plus GU and UG; 7-state models include a single mismatch state that combines all of the 10 possible mismatches; 16-state models treat all mismatch states separately. We analyzed a set of eubacterial ribosomal RNA sequences with a well-established phylogenetic tree structure. For each model, the maximum-likelihood values of the parameters were obtained. The models were compared using the Akaike information criterion, the likelihood-ratio test, and Cox's test. With a high significance level, models that permit a nonzero rate of double substitutions performed better than those that assume zero double substitution rate. Some models assume symmetry between GC and CG, between AU and UA, and between GU and UG. Models that relaxed this symmetry assumption performed slightly better, but the tests did not all agree on the significance level. The most general time-reversible model significantly outperformed any of the simplifications. We consider the relative merits of all these models for molecular phylogenetics.
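Two of the comparison criteria named in this abstract have simple closed forms. The sketch below uses the standard definitions (AIC = 2k - 2 ln L, and the likelihood-ratio statistic 2(ln L_general - ln L_simple)); the function names are assumptions, and the numbers in the usage lines are invented purely for illustration and are not results from the paper.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a preferred model."""
    return 2.0 * n_params - 2.0 * log_likelihood

def likelihood_ratio_statistic(lnl_simple, lnl_general):
    """Likelihood-ratio statistic for a simple model nested in a general one."""
    return 2.0 * (lnl_general - lnl_simple)

if __name__ == "__main__":
    # Made-up log-likelihoods and parameter counts, for illustration only.
    print(aic(-105.0, 2), aic(-100.0, 5))
    print(likelihood_ratio_statistic(-105.0, -100.0))
```

When the simpler model is nested in the more general one, the statistic is conventionally compared with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. AIC does not require nesting, which is presumably why both criteria are useful when models of different state counts are compared.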
8.
On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled. Total citations: 4, self-citations: 0, citations by others: 4
Using analytical methods, we show that under a variety of model misspecifications, Neighbor-Joining, minimum evolution, and least squares estimation procedures are statistically inconsistent. Failure to correctly account for differing rates-across-sites processes, failure to correctly model rate matrix parameters, and failure to adjust for parallel rates-across-sites changes (a rates-across-subtrees process) are all shown to lead to a "long branch attraction" form of inconsistency. In addition, failure to account for rates-across-sites processes is also shown to result in underestimation of evolutionary distances for a wide variety of substitution models, generalizing an earlier analytical result for the Jukes-Cantor model reported in Golding and a similar bias result for the GTR or REV model in Kelly and Rice (1996). Although standard rates-across-sites models can be employed in many of these cases to restore consistency, current models cannot account for other kinds of misspecification. We examine an idealized but biologically relevant case, where parallel changes in rates at sites across subtrees are shown to give rise to inconsistency. This changing rates-across-subtrees type of model misspecification cannot be adjusted for with conventional methods or without carefully considering the rate variation in the larger tree. The results are presented for four-taxon trees, but the expectation is that they have implications for larger trees as well. To illustrate this, a simulated 42-taxon example is given in which the microsporidia, an enigmatic group of eukaryotes, are incorrectly placed at the archaebacteria-eukaryotes split because of incorrectly specified pairwise distances. The analytical nature of the results lends insight into the reasons that long branch attraction tends to be a common form of inconsistency and the reasons that other forms of inconsistency like "long branches repel" can arise in some settings.
In many of the cases of inconsistency presented, a particular incorrect topology is estimated with probability converging to one, the implication being that measures of uncertainty like bootstrap support will be unable to detect that there is a problem with the estimation. The focus is on distance methods, but previous simulation results suggest that the zones of inconsistency for distance methods contain the zones of inconsistency for maximum likelihood methods as well.
9.
10.
We develop techniques to estimate the statistical significance of gap-free alignments between two genomic DNA sequences, using human-mouse alignments as an example. The sequences are assumed to be sufficiently similar that some but not all of the neutrally evolving regions (i.e., those under no evolutionary constraint) can be reliably aligned. Our goal is to model the situation in which the neutral rate of evolution, and hence the extent of the aligning intervals, varies across the genome. In some cases, this permits the weaker of two matches to be judged as less likely to have arisen by chance, provided it lies in a genomic interval with a high level of background divergence. We employ a hidden Markov model to capture variations in divergence rates and assign probability values to gap-free alignments using techniques of Dembo and Karlin, which are related to those used for the same purpose by BLAST. Our methods are illustrated in detail using a 1.49 Mb genomic region. Results obtained from the analysis of human chromosome 22 using these techniques are also provided.
11.
Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Total citations: 13, self-citations: 1, citations by others: 13
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small. 
12.
Rates of nucleotide substitution and mammalian nuclear gene evolution. Approximate and maximum-likelihood methods lead to different conclusions. Total citations: 14, self-citations: 0, citations by others: 14
Rates and patterns of synonymous and nonsynonymous substitutions have important implications for the origin and maintenance of mammalian isochores and the effectiveness of selection at synonymous sites. Previous studies of mammalian nuclear genes largely employed approximate methods to estimate rates of nonsynonymous and synonymous substitutions. Because these methods did not account for major features of DNA sequence evolution such as transition/transversion rate bias and unequal codon usage, they might not have produced reliable results. To evaluate the impact of the estimation method, we analyzed a sample of 82 nuclear genes from the mammalian orders Artiodactyla, Primates, and Rodentia using both approximate and maximum-likelihood methods. Maximum-likelihood analysis indicated that synonymous substitution rates were positively correlated with GC content at the third codon positions, but independent of nonsynonymous substitution rates. Approximate methods, however, indicated that synonymous substitution rates were independent of GC content at the third codon positions, but were positively correlated with nonsynonymous rates. Failure to properly account for transition/transversion rate bias and unequal codon usage appears to have caused substantial biases in approximate estimates of substitution rates.
13.
14.
The inconsistency of the maximum parsimony method is known to occur even when the rate of nucleotide substitution is constant. To understand why this inconsistency occurs, a mathematical study was conducted for the cases of five, six, and seven sequences. The results obtained indicate that this inconsistency occurs because the probability of occurrence of nucleotide configurations generated by one substitution on a short interior branch is often lower than that of configurations generated by more substitutions on other longer branches. The chance of occurrence of this event—or, the inconsistency of the maximum parsimony method—apparently increases as the number of sequences increases. The inconsistency may occur even when the extent of sequence divergence is relatively small.
Correspondence to: M. Nei
15.
Maxim V Gerashchenko, Zalan Peterfi, Sun Hee Yim, Vadim N Gladyshev 《Nucleic acids research》2021,49(2):e9
There has been a surge of interest in targeting protein synthesis to treat diseases and extend lifespan. Despite the progress, few options are available to assess translation in live animals, as their complexity limits the repertoire of experimental tools to monitor and manipulate processes within organs and individual cells. In this study, we developed a labeling-free method for measuring organ- and cell-type-specific translation elongation rates in vivo. It is based on time-resolved delivery of translation initiation and elongation inhibitors in live animals followed by ribosome profiling. It also reports translation initiation sites in an organ-specific manner. Using this method, we found that the elongation rates differ by more than 50% among mouse organs and determined them to be 6.8, 5.0 and 4.3 amino acids per second for liver, kidney, and skeletal muscle, respectively. We further found that the elongation rate is reduced by 20% between young adulthood and mid-life. Thus, translation, a major metabolic process in cells, is tightly regulated at the level of elongation of nascent polypeptide chains.
16.
Anup Som 《Theorie in den Biowissenschaften》2007,125(2):133-145
In this article, a new approach is presented for estimating the efficiencies of the nucleotide substitution models in a four-taxon case and then this approach is used to estimate the relative efficiencies of six substitution models under a wide variety of conditions. In this approach, efficiencies of the models are estimated by using a simple probability distribution theory. To assess the accuracy of the new approach, efficiencies of the models are also estimated by using the direct estimation method. Simulation results from the direct estimation method confirmed that the new approach is highly accurate. The success of the new approach opens a unique opportunity to develop analytical methods for estimating the relative efficiencies of the substitution models in a straightforward way.
17.
Piontkivska H 《Molecular phylogenetics and evolution》2004,31(3):865-873
Choice of a substitution model is a crucial step in the maximum likelihood (ML) method of phylogenetic inference, and investigators tend to prefer complex mathematical models to simple ones. However, when complex models with many parameters are used, the extent of noise in statistical inferences increases, and thus complex models may not produce the true topology with a higher probability than simple ones. This problem was studied using computer simulation. When the number of nucleotides used was relatively large (1000 bp), the HKY+Gamma model showed smaller d(T) (topological distance between the inferred and the true trees) than the JC and Kimura models. In the cases of shorter sequences (300 bp), a simpler model and search algorithm, such as the JC model and SA+NNI search, were found to be as efficient as more complicated searches and models in terms of topological distances, although the topologies obtained under the HKY+Gamma model had the highest likelihood values. The performance of the relatively simple search algorithm SA+NNI was found to be essentially the same as that of the more extensive SA+TBR search under all models studied. Similarly to the conclusions reached by Takahashi and Nei [Mol. Biol. Evol. 17 (2000) 1251], our results indicate that simple models can be as efficient as complex models, and that use of complex models does not necessarily give more reliable trees compared with simple models.
18.
The structure of the nuclear factor-kappaB protein-DNA complex varies with DNA-binding site sequence. Total citations: 3, self-citations: 0, citations by others: 3
Menetski JP 《The Journal of biological chemistry》2000,275(11):7619-7625
19.
Relative allocation to horn and body growth in bighorn rams varies with resource availability. Total citations: 5, self-citations: 2, citations by others: 5
Festa-Bianchet Marco; Coltman David W.; Turelli Luca; Jorgenson Jon T. 《Behavioral ecology》2004,15(2):305-312
Males may allocate a greater proportion of metabolic resources to maintenance than to the development of secondary sexual characters when food is scarce, to avoid compromising their probability of survival. We assessed the effects of resource availability on body mass and horn growth of bighorn rams (Ovis canadensis) at Ram Mountain, Alberta, Canada over 30 years. The number of adult ewes in the population tripled during our study, and the average mass of yearling females decreased by 13%. We used the average mass of yearling females as an index of resource availability. Yearling female mass was negatively correlated with the body mass of rams of all ages, but it affected horn growth only during the first three years of life. Yearly horn growth was affected by a complex interaction of age, body mass, and resource availability. Among rams aged 2-4 years, the heaviest individuals had similar horn growth at high and at low resource availability, but as ram mass decreased, horn growth for a given body mass became progressively smaller with decreasing resource availability. For rams aged 5-9 years, horn growth was weakly but positively correlated with body mass, and rams grew slightly more horn for a given body mass as resource availability decreased. When food is limited, young rams may direct more resources to body growth than to horn growth, possibly trading long-term reproductive success for short-term survival. Although horn growth of older rams appeared to be greater at low than at high resource availability, we found no correlation between early and late growth in horn length for the same ram, suggesting that compensatory horn growth does not occur in our study population. Young rams with longer horns were more likely to be shot by sport hunters than those with shorter horns. Trophy hunting could select against rams with fast-growing horns.
20.
Ziheng Yang 《Journal of molecular evolution》1995,40(6):689-697
Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite.