Similar literature
1.
Models of sequence evolution play an important role in molecular evolutionary studies. The use of inappropriate models of evolution may bias the results of the analysis and lead to erroneous conclusions. Several procedures for selecting the best-fit model of evolution for the data at hand have been proposed, such as the likelihood ratio test (LRT) and the Akaike (AIC) and Bayesian (BIC) information criteria. The relative performance of these model selection methods has not yet been studied under a range of different model trees. In this study, the influence of branch length variation upon model selection is characterized. This is done by simulating sequence alignments under a known model of nucleotide substitution, and recording how often this true model is recovered by different model-fitting strategies. Results of this study agree with previous simulations and suggest that model selection is reasonably accurate. However, different model selection methods showed distinct levels of accuracy. Some LRT approaches showed better performance than the AIC or BIC information criteria. Within the LRTs, model selection is affected by the complexity of the initial model selected for the comparisons, and only slightly by the order in which different parameters are added to the model. A specific hierarchy of LRTs, which starts from a simple model of evolution, performed overall better than other possible LRT hierarchies, or than the AIC or BIC. Received: 2 October 2000 / Accepted: 4 January 2001
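
The three selection criteria named in this abstract can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study: the log-likelihoods, the site count, and the dictionary `fits` are hypothetical, and branch-length parameters are left out of the parameter counts on the assumption that all models are fitted on the same tree.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik

def lrt_statistic(log_lik_null, log_lik_alt):
    """Likelihood ratio statistic 2 (ln L_alt - ln L_null) for nested models."""
    return 2 * (log_lik_alt - log_lik_null)

# Hypothetical fits: model name -> (log-likelihood, free substitution parameters).
fits = {"JC69": (-2100.0, 0), "HKY85": (-2050.0, 4), "GTR": (-2048.5, 8)}
n_sites = 1000  # hypothetical alignment length, used as the sample size for BIC

best_aic = min(fits, key=lambda m: aic(*fits[m]))
best_bic = min(fits, key=lambda m: bic(*fits[m], n_sites))

# LRT of JC69 (null) against HKY85 (alternative): 4 extra parameters.
# 9.488 is the 5% critical value of the chi-square distribution with 4 df.
statistic = lrt_statistic(fits["JC69"][0], fits["HKY85"][0])
reject_jc69 = statistic > 9.488
```

With these made-up numbers all three criteria prefer HKY85 over JC69, and the extra parameters of GTR are not justified by its small gain in likelihood.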

2.
Many tests of the lineage dependence of substitution rates, computations of the error of evolutionary distances, and simulations of molecular evolution assume that the rate of evolution is constant in time within each lineage descended from a common ancestor. However, estimates of the index of dispersion of numbers of mammalian substitutions suggest that the rate has time-dependent variations consistent with a fractal-Gaussian-rate Poisson process, which assumes common descent without assuming rate constancy. While this model does not affect certain relative-rate tests, it substantially increases the uncertainty of branch lengths. Thus, fluctuations in the rate of substitution cannot be neglected in calculations that rely on evolutionary distances, such as the confidence intervals of divergence times and certain phylogenetic reconstructions. The fractal-Gaussian-rate Poisson process is compared and contrasted with previous models of molecular evolution, including other Poisson processes, the fractal renewal process, a Lévy-stable process, a fractional-difference process, and a log-Brownian process. The fractal models are more compatible with mammalian data than the nonfractal models considered, and they may also be better supported by Darwinian theory. Although the fractal-Gaussian-rate Poisson process has not been proven to have better agreement with data or theory than the other fractal models, its Gaussian nature simplifies the exploration of its impact on evolutionary distance errors and relative-rate tests. Received: 29 September 1999 / Accepted: 20 January 2000
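
The index of dispersion mentioned above is the variance of the substitution counts divided by their mean; a constant-rate Poisson process gives a value near 1, and larger values point to rate fluctuations. The sketch below assumes the simplest estimator (sample variance over sample mean, with no correction for shared ancestry), and the counts are hypothetical.

```python
from statistics import mean, variance

def index_of_dispersion(counts):
    """Sample variance divided by sample mean of substitution counts."""
    return variance(counts) / mean(counts)

# Hypothetical numbers of substitutions observed along five lineages.
lineage_counts = [10, 14, 6, 18, 12]
dispersion = index_of_dispersion(lineage_counts)  # 20 / 12, above the Poisson value of 1
```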

3.
Direct Calculation of a Tree Length Using a Distance Matrix   Cited by: 8 (self-citations: 0, citations by others: 8)
Comparative studies of tree-building methods have shown minimum evolution to be in general an accurate criterion for selecting a true tree. To improve the use of this criterion, this paper proposes a method for rapidly and directly calculating the length of a dichotomous tree without having to resort to branch length calculations. This direct calculation (DC) method applies to the complete final topology, giving equal importance to each branch after a dichotomy. According to this method, the tree length $S_{DC}$ is $S_{DC} = \sum_{i \neq j} D_{ij}/2^{B_{ij}} = \left(\sum_{i<j} D_{ij}\, 2^{B_{\max}-B_{ij}}\right)/2^{B_{\max}-1}$, where $D_{ij}$ is the observed distance between taxa $i$ and $j$, $B_{ij}$ is the number of branches connecting $i$ and $j$, $B_{\max}$ is the greatest $B_{ij}$ in the tree, and the powers of two are due to the dichotomy of the tree. This tree length expression may be used as a rapid method for selecting the shortest tree from a set of hypothetical or suboptimal trees. Received: 2 March 2000 / Accepted: 24 March 2000
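
A minimal sketch of the direct calculation follows, assuming the right-hand form of the expression, in which each unordered pair of taxa contributes its distance weighted by 2^(1 − B_ij). The tree encoding (an adjacency dictionary), the helper names, and the example distances are all illustrative assumptions, not taken from the paper.

```python
from collections import deque
from itertools import combinations

def branch_counts(adjacency, source):
    """Breadth-first search: number of branches from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def direct_tree_length(adjacency, taxa, distance):
    """Sum over unordered taxon pairs of D_ij * 2**(1 - B_ij)."""
    total = 0.0
    for i, j in combinations(taxa, 2):
        b_ij = branch_counts(adjacency, i)[j]
        total += distance[frozenset((i, j))] * 2.0 ** (1 - b_ij)
    return total

# Hypothetical unrooted tree ((A,B),(C,D)) with interior nodes u and v.
adjacency = {
    "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"],
    "u": ["A", "B", "v"], "v": ["C", "D", "u"],
}
# Additive distances generated from branch lengths A=1, B=2, u-v=3, C=4, D=5.
distance = {
    frozenset("AB"): 3, frozenset("CD"): 9, frozenset("AC"): 8,
    frozenset("AD"): 9, frozenset("BC"): 9, frozenset("BD"): 10,
}
tree_length = direct_tree_length(adjacency, "ABCD", distance)
```

Because the example distances are exactly additive on this topology, the result equals the sum of the five branch lengths (15) without any branch length being estimated.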

4.
A new, model-based method was devised to locate nucleotide changes in a given phylogenetic tree. For each site, the posterior probability of any possible change in each branch of the tree is computed. This probabilistic method is a valuable alternative to the maximum parsimony method when base composition is skewed (i.e., different from 25% A, 25% C, 25% G, 25% T): computer simulations showed that parsimony misses more rare → common than common → rare changes, resulting in biased inferred change matrices, whereas the new method appeared unbiased. The probabilistic method was applied to the analysis of the mutation and substitution processes in the mitochondrial control region of mouse. Distinct change patterns were found at the polymorphism (within species) and divergence (between species) levels, rejecting the hypothesis of neutral evolution of base composition in mitochondrial DNA. Received: 15 March 1999 / Accepted: 7 October 1999

5.
Algorithmic details to obtain maximum likelihood estimates of parameters on a large phylogeny are discussed. On a large tree, an efficient approach is to optimize branch lengths one at a time while updating parameters in the substitution model simultaneously. Codon substitution models that allow for variable nonsynonymous/synonymous rate ratios (ω = dN/dS) among sites are used to analyze a data set of human influenza virus type A hemagglutinin (HA) genes. The data set has 349 sequences. Methods for obtaining approximate estimates of branch lengths for codon models are explored, and the estimates are used to test for positive selection and to identify sites under selection. Compared with results obtained from the exact method estimating all parameters by maximum likelihood, the approximate methods produced reliable results. The analysis identified a number of sites in the viral gene under diversifying Darwinian selection and demonstrated the importance of including many sequences in the data in detecting positive selection at individual sites. Received: 25 April 2000 / Accepted: 24 July 2000

6.
Maximum likelihood (ML) phylogenies based on 9,957 amino acid (AA) sites of 45 proteins encoded in the plastid genomes of Cyanophora, a diatom, a rhodophyte (red alga), a euglenophyte, and five land plants are compared with respect to several properties of the data, including between-site rate variation and aberrant amino acid composition in individual species. Neighbor-joining trees from AA LogDet distances and ML analyses are seen to be congruent when site rate variability is taken into account. Four feasible trees are identified in these analyses, one of which is preferred, and one of which is almost excluded by statistical criteria. A transition probability matrix for the general reversible Markov model of amino acid substitutions is estimated from the data, assuming each of these four trees. In all cases, the tree with diatom and rhodophyte as sister taxa was clearly favored. The new transition matrix based on the best tree, called cpREV, takes into account distinct substitution patterns in plastid-encoded proteins and should be useful in future ML inferences using such data. A second rate matrix, called cpREV*, based on a weighted sum of rate matrices from different trees, is also considered. Received: 3 June 1999 / Accepted: 26 November 1999

7.
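Models of basal diameter and branch length for primary branches in Mongolian pine (樟子松) plantations   Cited by: 4 (self-citations: 0, citations by others: 4)
刘兆刚  舒扬  李凤日 《植物研究》2008,28(2):244-248
Mongolian pine plantations at the Maoershan Experimental Forest Farm of Northeast Forestry University were studied. Using branch analysis, crown variables of 53 trees (age 17-38 a, diameter 8.61-21.5 cm, height 7.48-18.24 m) were measured in 2002 and 2003, and prediction models were built for the basal diameter (BD) and branch length (BL) of primary branches within the crown as functions of total depth into the crown (DINC). For primary branches of trees of the same size, these crown variables increase with DINC, while diameter at breast height (DBH) and tree height (HT) capture well the variation in basal diameter and branch length among trees of different sizes. The fitted models for primary branch basal diameter and branch length were checked for fit statistics and precision against an independent validation sample. The results show that the models predict well, with precision above 95% in every case. The models provide a basis for describing the crown shape of Mongolian pine plantations and its changes, and for three-dimensional visualized management.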

8.
The phylogenetic placement of the Aquifex and Thermotoga lineages has been inferred from (i) the concatenated ribosomal proteins S10, L3, L4, L23, L2, S19, L22, and S3 encoded in the S10 operon (833 aa positions); (ii) the joint sequences of the elongation factors Tu(1α) and G(2) coded by the str operon tuf and fus genes (733 aa positions); and (iii) the joint RNA polymerase β- and β′-type subunits encoded in the rpoBC operon (1130 aa positions). Phylogenies of r-protein and EF sequences support with moderate (r-proteins) to high statistical confidence (EFs) the placement of the two hyperthermophiles at the base of the bacterial clade in agreement with phylogenies of rRNA sequences. In the more robust EF-based phylogenies, the branching of Aquifex and Thermotoga below the successive bacterial lineages is given at bootstrap proportions of 82% (maximum likelihood; ML) and 85% (maximum parsimony; MP), in contrast to the trees inferred from the separate EF-Tu(1α) and EF-G(2) data sets, which lack both resolution and statistical robustness. In the EF analysis MP outperforms ML in discriminating (at the 0.05 level) trees having A. pyrophilus and T. maritima as the most basal lineages from competing alternatives that have (i) mesophiles, or the Thermus genus, as the deepest bacterial radiation and (ii) a monophyletic A. pyrophilus–T. maritima cluster situated at the base of the bacterial clade. RNAP-based phylogenies are equivocal with respect to the Aquifex and Thermotoga placements. The two hyperthermophiles fall basal to all other bacterial phyla when potential artifacts contributed by the compositionally biased and fast-evolving Mycoplasma genitalium and Mycoplasma pneumoniae sequences are eschewed. However, the branching order of the phyla is tenuously supported in ML trees inferred by the exhaustive search method and is unresolved in ML trees inferred by the quartet puzzling algorithm.
A rooting of the RNA polymerase-subunit tree at the mycoplasma level, seen in both the MP trees and the ML trees reconstructed with suboptimal amino acid substitution models, is not supported by the EF-based phylogenies, which robustly affiliate mycoplasmas with low-G+C gram-positives, and most probably reflects a "long branch attraction" artifact. Received: 22 September 1999 / Accepted: 11 January 2000

9.
A heuristic approach to search for the maximum-likelihood (ML) phylogenetic tree based on a genetic algorithm (GA) has been developed. It outputs the best tree as well as multiple alternative trees that are not significantly worse than the best one on the basis of the likelihood criterion. These near-optimum trees are subjected to further statistical tests. This approach enables one to infer phylogenetic trees of over 20 taxa, taking account of the rate heterogeneity among sites, on practical time scales on a PC cluster. Computer simulations were conducted to compare the efficiency of the present approach with that of several likelihood-based methods and distance-based methods, using amino acid sequence data for relatively large numbers of taxa (5–24). The superiority of the ML method over distance-based methods increases as the conditions of the simulations become more realistic (an incorrect model is assumed or many taxa are involved). This approach was applied to the inference of the universal tree based on the concatenated amino acid sequences of vertically descended genes that are shared among all genomes whose complete sequences have been reported. The inferred tree strongly supports the paraphyly of Archaea, with Eukarya specifically related to Crenarchaeota. Apart from the paraphyly of Archaea and some minor disagreements, the universal tree based on these genes is largely consistent with the universal tree based on SSU rRNA. Received: 4 January 2001 / Accepted: 16 May 2001

10.
We have investigated the effects of different among-site rate variation models on the estimation of substitution model parameters, branch lengths, topology, and bootstrap proportions under minimum evolution (ME) and maximum likelihood (ML). Specifically, we examined equal rates, invariable sites, gamma-distributed rates, and site-specific rates (SSR) models, using mitochondrial DNA sequence data from three protein-coding genes and one tRNA gene from species of the New Zealand cicada genus Maoricicada. Estimates of topology were relatively insensitive to the substitution model used; however, estimates of bootstrap support, branch lengths, and R-matrices (underlying relative substitution rate matrix) were strongly influenced by the assumptions of the substitution model. We identified one situation where ME and ML tree building became inaccurate when implemented with an inappropriate among-site rate variation model. Despite the fact that SSR models often have a better fit to the data than do invariable sites and gamma rates models, SSR models have some serious weaknesses. First, SSR rate parameters are not comparable across data sets, unlike the proportion of invariable sites or the alpha shape parameter of the gamma distribution. Second, the extreme among-site rate variation within codon positions is problematic for SSR models, which explicitly assume rate homogeneity within each rate class. Third, the SSR models appear to give severe underestimates of R-matrices and branch lengths relative to invariable sites and gamma rates models in this example. We recommend performing phylogenetic analyses under a range of substitution models to test the effects of model assumptions not only on estimates of topology but also on estimates of branch length and nodal support.

11.
The phylogenetic position of hagfishes in vertebrate evolution is currently controversial. The 18S and 28S rRNA trees support the monophyly of hagfishes and lampreys. In contrast, the mitochondrial DNAs suggest the close association of lampreys and gnathostomes. To clarify this controversial issue, we have cloned and sequenced the four nuclear DNA–coded single-copy genes encoding the triose phosphate isomerase, calreticulin, and the largest subunit of RNA polymerase II and III. Based on these proteins, together with the Mn superoxide dismutase for which hagfish and lamprey sequences are available in databases, phylogenetic trees have been inferred by the maximum likelihood (ML) method of protein phylogeny. It was shown that all five proteins prefer the monophyletic tree of cyclostomes, and the total log-likelihood of the five proteins significantly supports the cyclostome monophyly at the level of ±1 SE. The ML trees of the aldolase family comprising three nonallelic isoforms and the complement component group comprising C3, C4, and C5, both of which diverged during vertebrate evolution by gene duplications, also suggest the cyclostome monophyly. Received: 28 April 1999 / Accepted: 30 June 1999

12.
The Rooting of the Universal Tree of Life Is Not Reliable   Cited by: 19 (self-citations: 0, citations by others: 19)
Several composite universal trees connected by an ancestral gene duplication have been used to root the universal tree of life. In all cases, this root turned out to be in the eubacterial branch. However, the validity of results obtained from comparative sequence analysis has recently been questioned, in particular, in the case of ancient phylogenies. For example, it has been shown that several eukaryotic groups are misplaced in ribosomal RNA or elongation factor trees because of unequal rates of evolution and mutational saturation. Furthermore, the addition of new sequences to data sets has often turned apparently reasonable phylogenies into confused ones. We have thus revisited all composite protein trees that have been used to root the universal tree of life up to now (elongation factors, ATPases, tRNA synthetases, carbamoyl phosphate synthetases, signal recognition particle proteins) with updated data sets. In general, the two prokaryotic domains were not monophyletic, with several aberrant groupings at different levels of the tree. Furthermore, the respective phylogenies contradicted each other, so that various ad hoc scenarios (paralogy or lateral gene transfer) must be proposed in order to obtain the traditional Archaebacteria–Eukaryota sisterhood. More importantly, all of the markers are heavily saturated with respect to amino acid substitutions. As phylogenies inferred from saturated data sets are extremely sensitive to differences in evolutionary rates, present phylogenies used to root the universal tree of life could be biased by the phenomenon of long branch attraction. Since the eubacterial branch was always the longest one, the eubacterial rooting could be explained by an attraction between this branch and the long branch of the outgroup.
Finally, we suggested that a eukaryotic rooting could be a more fruitful working hypothesis, as it provides, for example, a simple explanation for the high genetic similarity of Archaebacteria and Eubacteria inferred from complete genome analysis.

13.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA genes with slow substitution rates. For consistently short distances, it is proved that, in the completely singular limit of the covariance matrix, ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
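
The product rule in the last sentence can be illustrated as follows. This is a sketch under stated assumptions: each interior branch is tested with a one-sided normal test on its OLS estimate, the per-branch confidence is taken as Φ(length / standard error), and the branch estimates are treated as independent. The function names and the example numbers are hypothetical.

```python
import math

def branch_confidence(length, std_error):
    """One-sided normal confidence that a branch is positive: Phi(length / SE)."""
    z = length / std_error
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simultaneous_confidence(confidences):
    """Product rule: approximate probability that all branches are positive."""
    return math.prod(confidences)

# Hypothetical interior branch estimates as (length, standard error) pairs.
interior_branches = [(0.012, 0.004), (0.008, 0.004), (0.015, 0.005)]
per_branch = [branch_confidence(b, se) for b, se in interior_branches]
all_positive = simultaneous_confidence(per_branch)
```

The combined value is never larger than the weakest single-branch confidence, which is why a tree with many moderately supported interior branches can have low simultaneous support.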

14.
Animals evolved a variety of gene families involved in cell–cell communication and developmental control by gene duplication and domain shuffling. Each family is made up of several subtypes or subfamilies with distinct structures and functions, which diverged by gene duplications and domain shufflings before the divergence of parazoans and eumetazoans. Since the separation from protostomes, vertebrates expanded the multiplicity of members (isoforms) in the same subfamily by further gene duplications in their early evolution before the fish–tetrapod split. To date the isoform duplications more precisely, we have isolated and sequenced cDNAs encoding the fibroblast growth factor receptor, Eph, src, and platelet-derived growth factor receptor subtypes belonging to the protein tyrosine kinase family from Branchiostoma belcheri, an amphioxus, Eptatretus burgeri, a hagfish, and Lampetra reissneri, a lamprey. From a phylogenetic tree of each subfamily inferred by a maximum likelihood (ML) method, together with a bootstrap analysis based on the ML method, we have shown that the isoform duplications frequently occurred in the early evolution of vertebrates around or just before the divergence of cyclostomes and gnathostomes by gene duplications and possibly chromosomal duplications. Received: 28 April 1998 / Accepted: 30 June 1999

15.
There are two tightly linked loci (D and CE) for the human Rh blood group. Their gene products are membrane proteins having 12 transmembrane domains and form a complex with Rh50 glycoprotein on erythrocytes. We constructed phylogenetic networks of human and nonhuman primate Rh genes, and the network patterns suggested the occurrences of gene conversions. We therefore used a modified site-by-site reconstruction method with two assumed gene trees and detected 9 or 11 converted regions. After eliminating the effect of gene conversions, we estimated numbers of nonsynonymous and synonymous substitutions for each branch of both trees. Whichever gene tree we selected, the branch connecting hominoids and Old World monkeys showed significantly higher nonsynonymous than synonymous substitutions, an indication of positive selection. Many other branches also showed higher nonsynonymous than synonymous substitutions; this suggests that the Rh genes have experienced some kind of positive selection. Received: 16 March 1999 / Accepted: 17 June 1999

16.
Phylogenetic analyses frequently rely on models of sequence evolution that detail nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood methods of simultaneously constructing phylogenetic tree topologies and estimating model parameters are computationally intensive, and are not feasible for sample sizes of 25 or greater using personal computers. Techniques that initially construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different from the true rates. Samples were isolated from 100 HIV-1 subtype B infected individuals from the United States and a 620 nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific for HIV-1 subtype B env by estimating nucleotide substitution rates and the site-to-site heterogeneity in 100 individuals from the United States. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences. Received: 4 October 2000 / Accepted: 1 March 2001

17.
The minimum-evolution (ME) method of phylogenetic inference is based on the assumption that the tree with the smallest sum of branch length estimates is most likely to be the true one. In the past this assumption has been used without mathematical proof. Here we present the theoretical basis of this method by showing that the expectation of the sum of branch length estimates for the true tree is smallest among all possible trees, provided that the evolutionary distances used are statistically unbiased and that the branch lengths are estimated by the ordinary least-squares method. We also present simple mathematical formulas for computing branch length estimates and their standard errors for any unrooted bifurcating tree, with the least-squares approach. As a numerical example, we have analyzed mtDNA sequence data obtained by Vigilant et al. and have found the ME tree for 95 human and 1 chimpanzee (outgroup) sequences. The tree was somewhat different from the neighbor-joining tree constructed by Tamura and Nei, but there was no statistically significant difference between them.
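
The ME criterion can be shown on the smallest non-trivial case, four taxa, where there are only three unrooted bifurcating topologies. For a quartet with split (a,b)|(c,d), the ordinary least-squares sum of branch lengths reduces to (D_ab + D_cd)/2 + (D_ac + D_ad + D_bc + D_bd)/4. The sketch below uses that four-taxon special case only; it is not the paper's general formula for any tree, and the function names and distances are hypothetical.

```python
def quartet_ols_length(distance, pair1, pair2):
    """OLS sum of branch lengths for the quartet topology pair1 | pair2."""
    within = distance[frozenset(pair1)] + distance[frozenset(pair2)]
    between = sum(distance[frozenset((x, y))] for x in pair1 for y in pair2)
    return within / 2 + between / 4

def minimum_evolution_quartet(distance, taxa):
    """Return the split with the smallest OLS tree length among the three."""
    a, b, c, d = taxa
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(splits, key=lambda s: quartet_ols_length(distance, *s))

# Hypothetical distances, additive on the tree ((A,B),(C,D)).
distance = {
    frozenset("AB"): 3, frozenset("CD"): 9, frozenset("AC"): 8,
    frozenset("AD"): 9, frozenset("BC"): 9, frozenset("BD"): 10,
}
best_split = minimum_evolution_quartet(distance, "ABCD")
```

For these distances the generating topology has length 15, while both alternatives have length 16.5, so the criterion recovers the tree that produced the data.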

18.
A correlation was shown between the length of introns and the codon usage of the coding sequences of the corresponding genes, which in some cases can be related to the level of gene expression. The link is positive in unicellular organisms, i.e., genes with longer introns show a higher bias of codon usage. It is most pronounced in baker's yeast, where it is definitely related to the level of gene expression: genes with a higher level of expression have longer introns. The correlation is inverted in multicellular organisms as compared to unicellular ones. Some organisms, however, do not show the link. The presence or absence of the link does not seem to be related to the GC percent of the coding sequences. Received: 7 December 1999 / Accepted: 10 May 2000

19.
We have examined the length distribution of perfect dimer repeats, where perfect means uninterrupted by any other base, using data from GenBank on primates and rodents. Virtually no lengths greater than 30 repeats are found, except for rodent AG repeats, which extend to 35. Comparable numbers of long AC and AG repeats suggest that they have not been selected for special functions or DNA structures. We have compared the data with predictions of two models: (1) a Bernoulli Model in which bases are assumed equally likely and distributed at random and (2) an Unbiased Random Walk Model (URWM) in which repeats are permitted to change length by plus or minus one unit, with equal probabilities, and in which base substitutions are allowed to destroy long perfect repeats, producing two shorter perfect repeats. The source of repeats is assumed to be single base substitutions in neighboring sequences, i.e., those differing from the perfect repeat by a single base. Mutation rates either independent of repeat length or proportional to length were considered. An upper limit to the lengths, L ≈ 30, is assumed and isolated dimers are assumed unable to expand, so that there are absorbing barriers to the random walk at lengths 1 and L + 1, and a steady state of lengths is reached. With these assumptions and estimated values for the rates of length mutation and base substitution, reasonable agreement is found with the data for lengths > 5 repeats. Shorter repeats, of lengths ≤ 3, are in general agreement with the Bernoulli Model. By reducing the rate of length mutations for n ≤ 5, it is possible to obtain reasonable agreement with the full range of data. For these reduced rates, the times between length mutations become comparable to those suggested for a bottleneck in the evolution of Homo sapiens, which may be the reason for low heterozygosity of short repeats.
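
The length-change component of the random walk model can be sketched as below. This is a deliberately reduced illustration: it keeps only the unbiased ±1 steps and the absorbing barriers at lengths 1 and L + 1, and it omits the base substitutions, the source of new repeats, and the length-dependent rates described in the abstract. The function name and the starting length are hypothetical.

```python
import random

def walk_until_absorbed(start, upper_limit, rng):
    """Unbiased +/-1 walk on repeat length; absorbs at 1 and upper_limit + 1."""
    length, steps = start, 0
    while 1 < length < upper_limit + 1:
        length += rng.choice((-1, 1))  # expansion or contraction by one unit
        steps += 1
    return length, steps

# Follow one repeat of 10 units with the upper limit L = 30 from the abstract.
final_length, n_steps = walk_until_absorbed(10, 30, random.Random(0))
```

Averaging the outcome over many such walks would approximate the absorption probabilities at each barrier; the full model additionally needs the substitution and source terms to reach the steady state the abstract describes.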

20.
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity.]

