Similar literature
Found 20 similar articles (search time: 31 ms)
1.
2.
We develop a new approach to estimate a matrix of pairwise evolutionary distances from a codon-based alignment based on a codon evolutionary model. The method first computes a standard distance matrix for each of the three codon positions. Then these three distance matrices are weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a single distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach leads to distance matrices that have significantly better treelikeness than those obtained by standard nucleotide evolutionary distances. We also propose an alternative weighting to eliminate part of the noise often associated with some codon positions, particularly the third position, which is known to evolve at a fast rate. Simulation results show that fast distance-based tree reconstruction algorithms applied to distance matrices based on this codon position weighting can lead to phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments is reanalyzed and shows that our codon evolutionary distances allow building a phylogenetic tree which is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and also significantly improved compared to standard nucleotide evolutionary distance estimates.
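The per-position weighting described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes uncorrected p-distances as the "standard distance" and takes the three position weights as caller-supplied numbers, whereas the paper derives them from estimated evolutionary rates. The function names `p_distance`, `codon_position_distances`, and `combine` are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping sites that are not A/C/G/T in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def codon_position_distances(seqs):
    """Return three pairwise-distance dicts, one per codon position (1st, 2nd, 3rd)."""
    names = sorted(seqs)
    matrices = []
    for pos in range(3):
        # Slice out every third site starting at this codon position.
        sub = {n: seqs[n][pos::3] for n in names}
        matrices.append({(i, j): p_distance(sub[i], sub[j])
                         for i, j in combinations(names, 2)})
    return matrices


def combine(matrices, weights):
    """Weighted average of the per-position matrices (weights are an assumption here)."""
    total = sum(weights)
    return {pair: sum(w * m[pair] for w, m in zip(weights, matrices)) / total
            for pair in matrices[0]}


# Toy in-frame alignment (made-up sequences, for illustration only).
seqs = {"A": "ATGAAACCC", "B": "ATGAAGCCA", "C": "ATGGAGTCA"}
mats = codon_position_distances(seqs)
# Setting the third-position weight to zero mimics discarding that position's noise.
combined = combine(mats, weights=(1.0, 1.0, 0.0))
```

With the third position given zero weight, sequences A and B (which differ only at third positions in this toy data) end up at distance zero, which shows how the choice of weights changes the resulting matrix.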

3.
4.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon position data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l showed a poor performance. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce results as good as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

5.
The WNT gene family is large, and new members are still being discovered. We constructed a parsimony tree for the WNT family based on all 82 of the full-length sequences currently available. The inclusion of sequences from the cephalochordate amphioxus is especially useful in comprehensive gene trees, because the amphioxus genes in each subfamily often mark the base of the vertebrate diversification. We thus isolated full-length cDNAs of five amphioxus WNT genes (AmphiWnt1, AmphiWnt4, AmphiWnt7, AmphiWnt8, and AmphiWnt11) for addition to the overall WNT family tree. The analysis combined amino acid and nucleotide sequences (excluding third codon positions), taking into account 97% of the available data for each sequence. This combinatorial method had the advantage of generating a single most-parsimonious tree that was trichotomy-free. The reliability of the nodes was assessed by both jackknifing and Bremer support (decay index). A regression analysis revealed that branch length was strongly correlated with branch support, and possible reasons for this pattern are discussed. The tree topology suggested that in amphioxus, at least an AmphiWnt5 and an AmphiWnt10 have yet to be discovered.

6.
Summary. We have recently described a method of building phylogenetic trees and have outlined an approach for proving whether a particular tree is optimal for the data used. In this paper we describe in detail the method of establishing lower bounds on the length of a minimal tree by partitioning the data set into subsets. All characters that could be involved in duplications in the data are paired with all other such characters. A matching algorithm is then used to obtain the pairing of characters that reveals the most duplications in the data. This matching may still not account for all nucleotide substitutions on the tree. The structure of the tree is then used to help select subsets of three or more characters until the lower bound found by partitioning is equal to the length of the tree. The tree must then be a minimal tree since no tree can exist with a length less than that of the lower bound. The method is demonstrated using a set of 23 vertebrate cytochrome c sequences with the criterion of minimizing the total number of nucleotide substitutions. There are 13,113,070,457,687,989,603,440,625 topologically distinct trees that can be constructed from this data set. The method described in this paper does identify 144 minimal tree variants. The method is general in the sense that it can be used for other data and other criteria of length. It may not always be possible to prove a tree minimal, but the method will give an upper and a lower bound on the length of minimal trees.

7.
Accuracy of phylogenetic trees estimated from DNA sequence data   Cited by: 4 (self-citations: 1; citations by others: 3)
The relative merits of four different tree-making methods in obtaining the correct topology were studied by using computer simulation. The methods studied were the unweighted pair-group method with arithmetic mean (UPGMA), Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was assumed to evolve into eight sequences following a given model tree. Both constant and varying rates of nucleotide substitution were considered. Once the DNA sequences for the eight extant species were obtained, phylogenetic trees were constructed by using corrected (d) and uncorrected (p) nucleotide substitutions per site. The topologies of the trees obtained were then compared with that of the model tree. The results obtained can be summarized as follows: (1) The probability of obtaining the correct rooted or unrooted tree is low unless a large number of nucleotide differences exists between different sequences. (2) When the number of nucleotide substitutions per sequence is small or moderately large, the FM, DW, and MF methods show a better performance than UPGMA in recovering the correct topology. The former group of methods is particularly good for obtaining the correct unrooted tree. (3) When the number of substitutions per sequence is large, UPGMA is at least as good as the other methods, particularly for obtaining the correct rooted tree. (4) When the rate of nucleotide substitution varies with evolutionary lineage, the FM, DW, and MF methods show a better performance in obtaining the correct topology than UPGMA, except when a rooted tree is to be produced from data with a large number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250 WORDS)
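The distinction between uncorrected (p) and corrected (d) substitutions per site can be illustrated with a short sketch. The abstract does not say which correction was used; the Jukes-Cantor formula below is an assumption chosen because it is the simplest common correction, and the function name `jukes_cantor` is hypothetical.

```python
import math


def jukes_cantor(p):
    """Convert an observed proportion of differences p into a Jukes-Cantor distance d.

    Assumes equal base frequencies and equal substitution rates; the formula
    is undefined once p reaches 0.75 (saturation).
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# The corrected distance always exceeds p, and the gap widens as p grows,
# because multiple hits at the same site become more likely.
for p in (0.05, 0.10, 0.30, 0.60):
    print(f"p = {p:.2f}  d = {jukes_cantor(p):.4f}")
```

For small p the two measures are nearly identical, which is consistent with corrected and uncorrected distances often giving similar trees when sequences are closely related.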

8.
We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127-150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151-166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.

9.
Isolates of cauliflower mosaic virus (CaMV) differ in host range and symptomatology. Knowledge of their sequence relationships should assist in identifying nucleotide sequences responsible for isolate-specific characters. Complete nucleotide sequences of the DNAs of eight isolates of CaMV were aligned and the aligned sequences were used to analyze phylogenetic relationships by maximum likelihood, bootstrapped parsimony, and distance methods. Isolates found in North America clustered separately from those isolated from other parts of the world. Additional isolates, for which partial sequences were available, were incorporated into phylogenetic analysis of the sequences of genome segments corresponding to individual protein coding regions or the large intergenic region of CaMV DNA. The analysis revealed several instances where the position of an isolate on a tree for one coding region did not agree with the position of the isolate on the tree for the complete genome or with its position on trees for other coding regions. Examination of the distribution of shared residue types of phylogenetically informative positions in anomalous regions suggested that most of the anomalies were due to recombination events during the evolution of the isolates. Application of an algorithm that searches for segments of significant length that are identical between pairs of isolates or contain a significantly high concentration of polymorphisms suggested two additional recombination events between progenitors of the isolates studied and an event between the XinJing isolate and a CaMV not represented in the data set. An earlier phylogenetic origin for CaMV than for carnation etched ring virus, the caulimovirus used as outgroup in these analyses, was deduced from the position of the outgroup with North American isolates in some trees, but with non-North American isolates in other trees. Correspondence to: U. Melcher

10.
We present a further application of the stochastic model previously described (Lanave et al., 1984, 1985) for measuring the nucleotide substitution rate in the mammalian evolution of the mitochondrial DNA (mtDNA). The applicability of this method depends on the validity of "stationarity conditions" (equal nucleotide frequencies at first, second and third silent codon positions in homologous protein coding genes). In the comparison of homologous sequences satisfying the stationarity condition at the silent sites, only the four-codon families (quartets), for which both transitions and transversions are silent at the third position, are considered here. This has allowed us to estimate the transition and transversion rates for any pair of species. We have analyzed the third silent codon position of the triplet rat-mouse-cow, of a series of slightly divergent primates and of two Drosophila species. Using two external dating inputs, we have then determined the phylogenetic trees for rat, mouse, and cow as well as for a number of primates including man. The phylogenetic tree that we have derived for the triplet rat, mouse and cow agrees with the one we had previously determined by analyzing the first, second and third silent codon positions (in both duets and quartets) of mt genes (Lanave et al., 1985). For primates our method leads to the following branching order from the oldest to the most recent: Gibbon, Orangutan, Gorilla, Chimpanzee and Man. In absolute time, fixing the distance Chimpanzee-Man as 5 million years (Myr), we estimate the dating of the divergence nodes as: Gorilla 7 Myr; Orangutan 16 Myr; Gibbon 20 Myr. In all cases analyzed, the transition rate has been found to be substantially higher than the transversion rate. Moreover we have found that the transition/transversion ratio is different in the various lineages. We suggest that this fact is probably related to the nucleotide frequencies at the third silent codon position.

11.
12.
MOTIVATION: Heterochronous gene sequence data are important for characterizing the evolutionary processes of fast-evolving organisms such as RNA viruses. A limited set of algorithms exists for estimating the rate of nucleotide substitution and inferring phylogenetic trees from such data. The authors here present a new method, Tree and Rate Estimation by Local Evaluation (TREBLE), that robustly calculates the rate of nucleotide substitution and phylogeny with several orders of magnitude improvement in computational time. METHODS: As the basis of its rate estimation, TREBLE uses a novel geometric interpretation of the molecular clock assumption to deduce a local estimate of the rate of nucleotide substitution for triplets of dated sequences. Averaging the triplet estimates via a variance weighting yields a global estimate of the rate. From this value, an iterative refinement procedure relying on statistical properties of the triplets then generates a final estimate of the global rate of nucleotide substitution. The estimated global rate is then utilized to find the tree from the pairwise distance matrix via a UPGMA-like algorithm. RESULTS: Simulation studies show that TREBLE estimates the rate of nucleotide substitution with point estimates comparable with the best of available methods. Confidence intervals are comparable with those of BEAST. TREBLE's phylogenetic reconstruction is significantly improved over the other distance matrix method but not as accurate as the Bayesian algorithm. Compared with three other algorithms, TREBLE reduces computational time by a minimum factor of 3000. Relative to the algorithm with the most accurate estimates for the rate of nucleotide substitution (i.e. BEAST), TREBLE is over 10,000 times more computationally efficient. AVAILABILITY: jdobrien.bol.ucla.edu/TREBLE.html

13.
Irwin AJ, Hamrick JL, Godt MJ, Smouse PE. Heredity 2003, 90(2): 187-194
Studies of pollen movement in plant populations are often limited to a single reproductive event, despite concerns about the adequacy of single-year measures for perennial organisms. In this study, we estimate the effective number of pollen donors per tree from a multiyear study of Albizia julibrissin Durazz (mimosa, Fabaceae), an outcrossing, insect-pollinated tree. We determined 40 seedling genotypes for each of 15 seed trees during 4 successive years. A molecular analysis of variance of the pollen gametes fertilizing the sampled seeds was used to partition variation in pollen pools among seed trees, among years, and within single tree-year collections. Using these variance components, we demonstrate significant male gametic variability among years for individual trees. However, results indicate that yearly variation in the 'global pollen pool', averaged over all 15 seed trees for these 4 years, is effectively zero. We estimate the effective number of pollen donors for a single mimosa tree (N(ep)) to be 2.87. Single season analyses yield N(ep) approximately 2.05, which is 40% less than the value of N(ep) estimated from 4 years of data. We discuss optimal sampling for future studies designed to estimate N(ep). Studies should include more trees, each sampled over at least a few years, with fewer seeds per tree per year than are needed for a traditional parentage study.

14.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.

15.
Statistical methods for computing the standard errors of the branching points of an evolutionary tree are developed. These methods are for the unweighted pair-group method-determined (UPGMA) trees reconstructed from molecular data such as amino acid sequences, nucleotide sequences, restriction-sites data, and electrophoretic distances. They were applied to data for the human, chimpanzee, gorilla, orangutan, and gibbon species. Among the four different sets of data used, DNA sequences for an 895-nucleotide segment of mitochondrial DNA (Brown et al. 1982) gave the most reliable tree, whereas electrophoretic data (Bruce and Ayala 1979) gave the least reliable one. The DNA sequence data suggested that the chimpanzee is the closest and that the gorilla is the next closest to the human species. The orangutan and gibbon are more distantly related to man than is the gorilla. This topology of the tree is in agreement with that for the tree obtained from chromosomal studies and DNA-hybridization experiments. However, the difference between the branching point for the human and the chimpanzee species and that for the gorilla species and the human-chimpanzee group is not statistically significant. In addition to this analysis, various factors that affect the accuracy of an estimated tree are discussed.
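For readers unfamiliar with UPGMA, the clustering step that produces the branching points discussed here can be sketched as below. This covers only the tree construction, not the standard-error methods the abstract develops. The function name `upgma`, the single-letter taxon labels, and the distance values are all made up for illustration and are not taken from the paper's data.

```python
def upgma(names, dist):
    """Build a UPGMA tree.

    names: iterable of taxon labels.
    dist:  dict mapping (a, b) tuples to pairwise distances.
    Returns (tree, heights): the tree as nested tuples, and the height
    (half the inter-cluster distance) of every internal branching point.
    """
    sizes = {n: 1 for n in names}                      # cluster -> number of taxa
    d = {frozenset(k): v for k, v in dist.items()}     # unordered pair -> distance
    heights = {}
    while len(sizes) > 1:
        pair = min(d, key=d.get)                       # closest pair of clusters
        a, b = sorted(pair, key=str)                   # fixed order for reproducibility
        merged = (a, b)
        heights[merged] = d.pop(pair) / 2
        for c in sizes:
            if c == a or c == b:
                continue
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted mean equals the arithmetic mean over all taxon pairs.
            d[frozenset((merged, c))] = (
                (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
            )
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    (tree,) = sizes
    return tree, heights


# Hypothetical, perfectly clock-like distances among four taxa.
toy = {("H", "C"): 2, ("H", "G"): 4, ("C", "G"): 4,
       ("H", "O"): 8, ("C", "O"): 8, ("G", "O"): 8}
tree, heights = upgma("HCGO", toy)
```

Each entry in `heights` corresponds to one branching point; the abstract's question is how much sampling error attaches to each such value when the distances are estimated from finite data.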

16.
Recent phylogenetic analyses of cetacean relationships based on DNA sequence data have challenged the traditional view that baleen whales (Mysticeti) and toothed whales (Odontoceti) are each monophyletic, arguing instead that baleen whales are the sister group of the odontocete family Physeteridae (sperm whales). We reexamined this issue in light of a morphological data set composed of 207 characters and molecular data sets of published 12S, 16S, and cytochrome b mitochondrial DNA sequences. We reach four primary conclusions: (1) our morphological data set strongly supports the traditional view of odontocete monophyly; (2) the unrooted molecular and morphological trees are very similar, and most of the conflict results from alternative rooting positions; (3) the rooting position of the molecular tree is sensitive to the choice of artiodactyl outgroup taxa and the treatment of two small but ambiguously aligned regions of the 12S and 16S sequences, whereas the morphological root is strongly supported; and (4) combined analyses of the morphological and molecular data provide a well-supported phylogenetic estimate consistent with that based on the morphological data alone (and the traditional view of toothed-whale monophyly) but with increased bootstrap support at nearly every node of the tree.

17.
It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

18.
The problem of determining an optimal phylogenetic tree from a set of data is an example of the Steiner problem in graphs. There is no efficient algorithm for solving this problem with reasonably large data sets. In the present paper an approach is described that proves in some cases that a given tree is optimal without testing all possible trees. The method first uses a previously described heuristic algorithm to find a tree of relatively small total length. The second part of the method independently analyses subsets of sites to determine a lower bound on the length of any tree. We simultaneously attempt to reduce the total length of the tree and increase the lower bound. When these are equal it is not possible to make a shorter tree with a given data set and given criterion. An example is given where the only two possible minimal trees are found for twelve different mammalian cytochrome c sequences. The criterion of finding the smallest number of minimum base changes was used. However, there is no general method of guaranteeing that a solution will be found in all cases and in particular better methods of improving the estimate of the lower bound need to be developed.
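To see why testing all possible trees is impractical even for twelve sequences, the number of candidate topologies can be counted directly. The sketch below uses the standard count of unrooted, fully bifurcating trees on n labelled taxa, (2n - 5)!!; whether the paper counts rooted or unrooted trees is not stated in the abstract, so that choice is an assumption, and the function name is hypothetical.

```python
def num_unrooted_trees(n):
    """Number of unrooted bifurcating trees on n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 8, 12):
    print(n, num_unrooted_trees(n))
# For n = 12 this gives 654729075 topologies.
```

The count grows faster than exponentially with n, which is why a matching lower bound, rather than exhaustive enumeration, is used to certify that a tree is minimal.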

19.
R. B. Meagher, S. Berry-Lowe, K. Rice. Genetics 1989, 123(4): 845-863
The nucleotide sequences encoding the mature portion of 31 ribulose 1,5-bisphosphate carboxylase small subunit (SSU) genes from 17 genera of plants, green algae and cyanobacteria were examined. Among the 465 pairwise sequence comparisons, SSU multigene family members within the same species were more similar to each other in nonsynonymous or replacement nucleotide substitutions (RNS) than they were to SSU sequences in any other organism. The concerted evolution of independent SSU gene lineages within closely related plant species suggests that homogenization of RNS positions has occurred at least once in the life of each genus. The rate of expected RNS among mature SSU sequences was calculated to be 1.25 × 10⁻⁹/site/yr for the first 70 million years (MY) of divergence with a significant slowing to 0.13 × 10⁻⁹/site/yr for the next 1,400 MY. The data suggest that mature SSU sequences do not accumulate more than 20% differences in the RNS positions without compensatory changes in other components of this enzyme system. During the first 70 MY of divergence between species, the rate of expected synonymous or silent nucleotide substitutions (SNS) is approximately 6.6 × 10⁻⁹/site/yr. This is five times the RNS rate and is similar to the silent rate observed in animals. In striking contrast, SNS and RNS do not show this correlation among SSU gene family members within a species. A mechanism involving gene conversion within the exons followed by selection for biased gene conversion products with conservation of RNS positions and divergence of SNS positions is discussed. An SSU gene tree based on corrected RNS for 31 SSU sequences is presented and agrees well with a species tree based on morphological and cytogenetic traits for the 17 genera examined. SSU gene comparisons may be useful in predicting phylogenetic relationships and in some cases divergence times of various plant, algal and cyanobacterial species.
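The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only recomputes quantities stated in the text (the number of pairwise comparisons among 31 sequences and the ratios between the reported rates); the variable names are illustrative.

```python
from math import comb

n_sequences = 31
pairwise_comparisons = comb(n_sequences, 2)   # unordered pairs of 31 sequences

rns_early = 1.25e-9   # replacement substitutions / site / yr, first 70 MY
rns_late = 0.13e-9    # replacement substitutions / site / yr, next 1,400 MY
sns_early = 6.6e-9    # silent substitutions / site / yr, first 70 MY

silent_to_replacement = sns_early / rns_early   # roughly 5, as the abstract states
slowdown = rns_early / rns_late                 # roughly a tenfold slowing

print(pairwise_comparisons)
print(round(silent_to_replacement, 2), round(slowdown, 1))
```

The pair count comes out to 465, matching the abstract, and the silent-to-replacement ratio of about 5.3 is consistent with the "five times" stated there.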

20.
The statistical properties of sample estimation and bootstrap estimation of phylogenetic variability from a sample of nucleotide sequences are studied by using model trees of three taxa with an outgroup and by assuming a constant rate of nucleotide substitution. The maximum-parsimony method of tree reconstruction is used. An analytic formula is derived for estimating the sequence length that is required if P, the probability of obtaining the true tree from the sampled sequences, is to be equal to or higher than a given value. Bootstrap estimation is formulated as a two-step sampling procedure: (1) sampling of sequences from the evolutionary process and (2) resampling of the original sequence sample. The probability that a bootstrap resampling of an original sequence sample will support the true tree is found to depend on the model tree, the sequence length, and the probability that a randomly chosen nucleotide site is an informative site. When a trifurcating tree is used as the model tree, the probability that one of the three bifurcating trees will appear in ≥ 95% of the bootstrap replicates is < 5%, even if the number of bootstrap replicates is only 50; therefore, the probability of accepting an erroneous tree as the true tree is < 5% if that tree appears in ≥ 95% of the bootstrap replicates and if more than 50 bootstrap replications are conducted. However, if a particular bifurcating tree is observed in, say, < 75% of the bootstrap replicates, then it cannot be claimed to be better than the trifurcating tree even if ≥ 1,000 bootstrap replications are conducted. When a bifurcating tree is used as the model tree, the bootstrap approach tends to overestimate P when the sequences are very short, but it tends to underestimate that probability when the sequences are long. Moreover, simulation results show that, if a tree is accepted as the true tree only if it has appeared in ≥ 95% of the bootstrap replicates, then the probability of failing to accept any bifurcating tree can be as large as 58% even when P = 95%, i.e., even when 95% of the samples from the evolutionary process will support the true tree. Thus, if the rate-constancy assumption holds, bootstrapping is a conservative approach for estimating the reliability of an inferred phylogeny for four taxa.
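The second sampling step described here, resampling the original sequence sample, amounts to drawing alignment columns with replacement. The sketch below is a generic illustration of that step under stated assumptions: the tree-inference function is a hypothetical caller-supplied callable (the paper uses maximum parsimony, which is not implemented here), and the names `bootstrap_alignment` and `bootstrap_support` are not from the source.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement, keeping its length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, infer_tree, target, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which infer_tree returns the target tree.

    infer_tree is a placeholder for any tree-building routine that maps an
    alignment dict to a comparable tree object.
    """
    rng = random.Random(seed)
    hits = sum(
        infer_tree(bootstrap_alignment(alignment, rng)) == target
        for _ in range(replicates)
    )
    return hits / replicates


# Toy alignment of three taxa plus an outgroup (made-up sequences).
aln = {"t1": "ACGTACGTAC", "t2": "ACGTACGTTC", "t3": "ACGAACGTTC", "out": "TCGAACGATC"}
replicate = bootstrap_alignment(aln, random.Random(1))
```

A tree would then be accepted under the abstract's criterion only if `bootstrap_support` returned at least 0.95, which is the threshold whose conservativeness the paper analyses.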
