Similar literature
Found 20 similar documents; search took 62 ms.
1.
It is now well-established that compositional bias in DNA sequences can adversely affect phylogenetic analysis based on those sequences. Phylogenetic analyses based on protein sequences are generally considered to be more reliable than those derived from the corresponding DNA sequences because it is believed that the use of encoded protein sequences circumvents the problems caused by nucleotide compositional biases in the DNA sequences. There exists, however, a correlation between AT/GC bias at the nucleotide level and content of AT- and GC-rich codons and their corresponding amino acids. Consequently, protein sequences can also be affected secondarily by nucleotide compositional bias. Here, we report that DNA bias not only may affect phylogenetic analysis based on DNA sequences, but also drives a protein bias which may affect analyses based on protein sequences. We present a striking example where common phylogenetic tools fail to recover the correct tree from complete animal mitochondrial protein-coding sequences. The data set is very extensive, containing several thousand sites per sequence, and the incorrect phylogenetic trees are statistically very well supported. Additionally, neither the use of the LogDet/paralinear transform nor removal of positions in the protein alignment with AT- or GC-rich codons allowed recovery of the correct tree. Two taxa with a large compositional bias continually group together in these analyses, despite a lack of close biological relatedness. We conclude that even protein-based phylogenetic trees may be misleading, and we advise caution in phylogenetic reconstruction using protein sequences, especially those that are compositionally biased. Received: 19 February 1998 / Accepted: 28 August 1998

2.
A major assumption of many molecular phylogenetic methods is the homogeneity of nucleotide frequencies among taxa, which refers to the equality of the nucleotide frequency bias among species. Changes in nucleotide frequency among different lineages in a data set are thought to lead to erroneous phylogenetic inference because unrelated clades may appear similar because of evolutionarily unrelated similarities in nucleotide frequencies. We tested the effects of the heterogeneity of nucleotide frequency bias on phylogenetic inference, along with the interaction between this heterogeneity and stratified taxon sampling, by means of computer simulations using evolutionary parameters derived from genomic databases. We found that the phylogenetic trees inferred from data sets simulated under realistic, observed levels of heterogeneity for mammalian genes were reconstructed with accuracy comparable to those simulated with homogeneous nucleotide frequencies; the results hold for Neighbor-Joining, minimum evolution, maximum parsimony, and maximum-likelihood methods. The LogDet distance method, specifically designed to deal with heterogeneous nucleotide frequencies, does not perform better than distance methods that assume substitution pattern homogeneity among sequences. In these specific simulation conditions, we did not find a significant interaction between phylogenetic accuracy and substitution pattern heterogeneity among lineages, even when the taxon sampling is increased.
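The homogeneity assumption described in this abstract can be screened for before any tree is built. The following is a minimal Python sketch, not the authors' simulation procedure: it computes a Pearson chi-square statistic on a taxa-by-base contingency table. The function names and the input dictionary are hypothetical, and the statistic treats sites as independent and ignores shared ancestry, so it is a rough screen only.

```python
from collections import Counter

BASES = "ACGT"


def base_counts(seq):
    """Count A, C, G and T in one sequence (other symbols are ignored)."""
    counts = Counter(seq.upper())
    return [counts[b] for b in BASES]


def homogeneity_chi2(seqs):
    """Pearson chi-square statistic for equal base composition across taxa.

    seqs maps a taxon name to its nucleotide sequence.
    Returns (statistic, degrees_of_freedom).
    """
    table = [base_counts(s) for s in seqs.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand
            if expected > 0:  # skip bases absent from every taxon
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(BASES) - 1)
    return chi2, dof
```

A large statistic relative to a chi-square distribution with the returned degrees of freedom would suggest that base composition differs among the taxa.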

3.
Opinions split when it comes to the significance and thus the weighting of indel characters as phylogenetic markers. This paper attempts to test the phylogenetic information content of indels and nucleotide substitutions by proposing an a priori weighting system of non-protein-coding genes. Theoretically, the system rests on a weighting scheme which is based on a falsificationist approach to cladistic inference. It provides insertions, deletions and nucleotide substitutions weights according to their specific number of identical classes of potential falsifiers, resulting in the following system: nucleotide substitutions weight = 3, deletions of n nucleotides weight = (2n–1), and insertions of n nucleotides weight = (5n–1). This weighting system and the utility of indels as phylogenetic markers are tested against a suitable data set of 18S rDNA sequences of Diptera and Strepsiptera taxa together with other Metazoa species. The indels support the same clades as the nucleotide substitution data, and the application of the weighting system increases the corresponding consistency indices of the differentially weighted character types. As a consequence, applying the weighting system seems to be reasonable, and indels appear to be good phylogenetic markers.

4.
Standard methods of phylogenetic reconstruction are based on models that assume homogeneity of nucleotide composition among taxa. However, this assumption is often violated in biological data sets. In this study, we examine possible effects of nucleotide heterogeneity among lineages on the phylogenetic reconstruction of a bacterial group that spans a wide range of genomic nucleotide contents: obligately endosymbiotic bacteria and free-living or commensal species in the gamma-Proteobacteria. We focus on AT-rich primary endosymbionts to better understand the origins of obligately intracellular lifestyles. Previous phylogenetic analyses of this bacterial group point to the importance of accounting for base compositional variation in estimating relationships, particularly between endosymbiotic and free-living taxa. Here, we develop an approach to compare susceptibility of various phylogenetic reconstruction methods to the effects of nucleotide heterogeneity. First, we identify candidate trees of gamma-Proteobacteria groEL and 16S rRNA using approaches that assume homogeneous and stationary base composition, including Bayesian, maximum likelihood, parsimony, and distance methods. We then create permutations of the resulting candidate trees by varying the placement of the AT-rich endosymbiont Buchnera. These permutations are evaluated under the nonhomogeneous and nonstationary maximum likelihood model of Galtier and Gouy, which allows equilibrium base content to vary among examined lineages. Our results show that commonly used phylogenetic methods produce incongruent trees of the Enterobacteriales, and that the placement of Buchnera is especially unstable. However, under a nonhomogeneous model, various groEL and 16S rRNA phylogenies that separate Buchnera from other AT-rich endosymbionts (Blochmannia and Wigglesworthia) have consistently and significantly higher likelihood scores. Blochmannia and Wigglesworthia appear to have evolved from secondary endosymbionts, and represent an origin of primary endosymbiosis that is independent from Buchnera. This application of a nonhomogeneous model offers a computationally feasible way to test specific phylogenetic hypotheses for taxa with heterogeneous and nonstationary base composition.

5.
We have analyzed the nad3-rps12 locus for eight angiosperms in order to compare the utility of mitochondrial DNA and edited mRNA sequences in phylogenetic reconstruction. The two coding regions, containing from 25 to 35 editing sites in the various plants, have been concatenated in order to increase the significance of the analysis. Differing from the corresponding chloroplast sequences, unedited mitochondrial DNA sequences seem to evolve under a quasi-neutral substitution process which undifferentiates the nucleotide substitution rates for the three codon positions. By using complete gene sequences (all codon positions) we found that genomic sequences provide a classical angiosperm phylogenetic tree with a clear-cut grouping of monocotyledons and dicotyledons with Magnoliidae at the basal branch of the tree. Conversely, owing to their low nucleotide substitution rates, edited mRNA sequences were found not to be suitable for studying phylogenetic relationships among angiosperms. Received: 24 January 1996 / Accepted: 5 June 1996

6.
We show that in animal mitochondria homologous genes that differ in guanine plus cytosine (G + C) content code for proteins differing in amino acid content in a manner that relates to the G + C content of the codons. DNA sequences were analyzed using square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in both DNA and protein. Square plots divide codons into four groups based on first and second position A + T (adenine plus thymine) and G + C content and indicate differences in amino acid content when comparing sequences that differ in G + C content. When sequences are compared using these plots, the amino acid content is shown to correlate with the nucleotide bias of the genes. This amino acid effect is shown in all protein-coding genes in the mitochondrial genome, including cox I, cox II, and cyt b, mitochondrial genes which are commonly used for phylogenetic studies. Furthermore, nucleotide content differences are shown to affect the content of all amino acids with A + T- and G + C-rich codons. We speculate that phylogenetic analysis of genes so affected may tend erroneously to indicate relatedness (or lack thereof) based only on amino acid content. Received: 3 July 1996 / Accepted: 6 November 1996
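The four-way codon grouping behind the square plots can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than the authors' method: the labels W (A or T) and S (G or C) follow IUPAC ambiguity codes, the function names are hypothetical, and the statistical and graphical parts of a square plot are not reproduced.

```python
from collections import Counter


def codon_group(codon):
    """Label a codon by whether positions 1 and 2 are A/T (W) or G/C (S)."""
    return "".join("W" if base in "AT" else "S" for base in codon[:2])


def group_counts(cds):
    """Count codons of a coding sequence in the groups WW, WS, SW and SS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Codons containing ambiguous symbols are skipped.
    return Counter(codon_group(c) for c in codons if set(c) <= set("ACGT"))
```

Comparing these counts between homologous genes that differ in G + C content gives a rough view of the shift the abstract describes.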

7.
8.
Phylogenetic analyses frequently rely on models of sequence evolution that detail nucleotide substitution rates, nucleotide frequencies, and site-to-site rate heterogeneity. These models can influence hypothesis testing and can affect the accuracy of phylogenetic inferences. Maximum likelihood methods of simultaneously constructing phylogenetic tree topologies and estimating model parameters are computationally intensive, and are not feasible for sample sizes of 25 or greater using personal computers. Techniques that initially construct a tree topology and then use this non-maximized topology to estimate ML substitution rates, however, can quickly arrive at a model of sequence evolution. The accuracy of this two-step estimation technique was tested using simulated data sets with known model parameters. The results showed that for a star-like topology, as is often seen in human immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting topology could produce nucleotide substitution rates that were not statistically different than the true rates. Samples were isolated from 100 HIV-1 subtype B infected individuals from the United States and a 620 nt region of the env gene was sequenced for each sample. The sequence data were used to obtain a substitution model of sequence evolution specific for HIV-1 subtype B env by estimating nucleotide substitution rates and the site-to-site heterogeneity in 100 individuals from the United States. The method of estimating the model should provide users of large data sets with a way to quickly compute a model of sequence evolution, while the nucleotide substitution model we identified should prove useful in the phylogenetic analysis of HIV-1 subtype B env sequences. Received: 4 October 2000 / Accepted: 1 March 2001

9.
Substitutional bias confounds inference of cyanelle origins from sequence data (total citations: 10; self-citations: 0; citations by others: 10)
Summary Available molecular and biochemical data offer conflicting evidence for the origin of the cyanelle of Cyanophora paradoxa. We show that the similarity of cyanelle and green chloroplast sequences is probably a result of these two lineages independently developing the same pattern of directional nucleotide change (substitutional bias). This finding suggests caution should be exercised in the interpretation of nucleotide sequence analyses that appear to favor the view of a common endosymbiont for the cyanelle and chlorophyll-b-containing chloroplasts. The data and approaches needed to resolve the issue of cyanelle origins are discussed. Our findings also have general implications for phylogenetic inference under conditions where the base compositions (compositional bias) of the sequences analyzed differ. Offprint requests to: C.J. Howe

10.
MOTIVATION: Maximum-likelihood analysis of nucleotide and amino acid sequences is a powerful approach for inferring phylogenetic relationships and for comparing evolutionary hypotheses. Because it is a computationally demanding and time-consuming process, most algorithms explore only a minute portion of tree-space, with the emphasis on finding the most likely tree while ignoring the less likely, but not significantly worse, trees. However, when such trees exist, it is equally important to identify them to give due consideration to the phylogenetic uncertainty. Consequently, it is necessary to change the focus of these algorithms such that near optimal trees are also identified. RESULTS: This paper presents the Advanced Stepwise Addition Algorithm for exploring tree-space and two algorithms for generating all binary trees on a set of sequences. The Advanced Stepwise Addition Algorithm has been implemented in TrExML, a phylogenetic program for maximum-likelihood analysis of nucleotide sequences. TrExML is shown to be more effective at finding near optimal trees than a similar program, fastDNAml, implying that TrExML offers a better approach to account for phylogenetic uncertainty than has previously been possible. A program, TreeGen, is also described; it generates binary trees on a set of sequences allowing for extensive exploration of tree-space using other programs. AVAILABILITY: TreeGen, TrExML, and the sequence data used to test the programs are available from the following two WWW sites: http://whitetail.bemidji.msus.edu/trexml/ and http://jcsmr.anu.edu.au/dmm/humgen.html.
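The reason most algorithms explore only a minute portion of tree-space becomes clear from the number of candidate trees. The Python sketch below is not taken from TrExML or TreeGen; it evaluates the standard count of distinct unrooted binary trees on n labelled taxa, (2n - 5)!!, and the function name is hypothetical.

```python
def num_unrooted_binary_trees(n_taxa):
    """Number of distinct unrooted binary trees on n labelled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        return 1  # one or two taxa admit only a single (trivial) tree
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count
```

For example, `num_unrooted_binary_trees(10)` returns 2027025, so exhaustive enumeration is already costly for ten sequences.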

11.
Rhodotorula aurantiaca (Saito) Lodder is an anamorphic basidiomycetous yeast species that belongs to the so-called "Erythrobasidium lineage" of the Urediniomycetes, according to molecular phylogenetic studies based on nucleotide sequence analyses of different ribosomal DNA regions. In the most recent editions of the yeast taxonomy treatises the species Rhodotorula colostri (Castelli) Lodder and Rhodotorula crocea Shifrine & Phaff were listed as synonyms of R. aurantiaca. Taxonomic heterogeneity within R. aurantiaca was demonstrated in a study based on whole-cell protein profiles and is also hinted at by the observed differences in physiological and biochemical characteristics among the different strains under that species name. We determined partial nucleotide sequences of the 26S rRNA gene (D1/D2 domains) of strains maintained in the CBS culture collection under R. aurantiaca, including the type strains of its synonyms. The results showed that R. colostri and R. crocea are clearly distinct from R. aurantiaca and from any other currently recognised basidiomycetous yeast species. Furthermore, phylogenetic analysis of the sequence data placed the former two species in separate lineages of the Microbotryomycetidae: R. colostri in the "ruineniae clade" (Sporidiobolus lineage or Sporidiobolales) and R. crocea loosely linked to Rhodotorula javanica (Microbotryum lineage).

12.
Alignments of nucleotide or amino acid sequences may contain a variety of different signals, one of which is the historical signal that we often try to recover by phylogenetic analysis. Other signals, such as those arising due to compositional heterogeneities, among-lineage and among-site rate heterogeneities, invariant sites, and covariotides, may interfere adversely with the recovery of the historical signal. The effect of the interaction of these signals on phylogenetic inference is not well understood and may, in many cases, even be underappreciated. In this study, we investigate this matter and present results based on Monte Carlo simulations. We explored the success of four phylogenetic methods in recovering the true tree from data that had evolved under conditions where the equilibrium base frequencies and substitution rates were allowed to vary among lineages. Seven scenarios with increasingly complex conditions were investigated. All of the methods tested, with the exception of neighbor-joining using LogDet distances, were sensitive to compositional convergence in nonsister lineages. Maximum parsimony was also susceptible to attraction between long edges. In many cases, however, phylogenetic inference methods can still recover the true tree when misleading signals are present, in some instances even when the historical signal is no longer dominant. These results highlight the growing need for simple methods to detect violation of the phylogenetic assumptions.
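The LogDet distance mentioned in this abstract can be sketched in a few lines of Python. The version below assumes one commonly cited normalised form, d = -(1/4) ln(det F / sqrt(det Px det Py)), where F is the 4 x 4 matrix of joint base frequencies for two aligned sequences and Px, Py are diagonal matrices of their marginal base frequencies. It may not match the variant used in the study, and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def logdet_distance(x, y):
    """LogDet distance between two aligned nucleotide sequences."""
    pairs = [(a, b) for a, b in zip(x.upper(), y.upper())
             if a in BASES and b in BASES]  # skip gaps and ambiguity codes
    if not pairs:
        raise ValueError("no comparable sites")
    f = [[0.0] * 4 for _ in range(4)]
    for a, b in pairs:
        f[BASES.index(a)][BASES.index(b)] += 1.0 / len(pairs)
    fx = [sum(row) for row in f]        # base frequencies in x
    fy = [sum(col) for col in zip(*f)]  # base frequencies in y
    det_f = determinant(f)
    norm = math.sqrt(math.prod(fx) * math.prod(fy))
    if det_f <= 0 or norm <= 0:
        raise ValueError("LogDet distance is undefined for this pair")
    return -0.25 * math.log(det_f / norm)
```

The distance is undefined when the determinant is not positive, which happens for short or highly divergent sequences; the sketch raises an error in that case rather than guessing a value.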

13.
MOTIVATION: The availability of the whole genomic sequences of HIV-1 viruses provides an excellent resource for studying the HIV-1 phylogenies using all the genetic materials. However, such huge volumes of data create computational challenges in both memory consumption and CPU usage. RESULTS: We propose the complete composition vector representation for an HIV-1 strain, and a string scoring method to extract the nucleotide composition strings that contain the richest evolutionary information for phylogenetic analysis. In this way, a large-scale whole genome phylogenetic analysis for thousands of strains can be done both efficiently and effectively. By using 42 carefully curated strains as references, we apply our method to subtype 1156 HIV-1 strains (10.5 million nucleotides in total), which include 825 pure subtype strains and 331 recombinants. Our results show that our nucleotide composition string selection scheme is computationally efficient, and is able to define both pure subtypes and recombinant forms for HIV-1 strains using the 5000 top ranked nucleotide strings. AVAILABILITY: The Java executable and the HIV-1 datasets are accessible through http://www.cs.ualberta.ca/~ghlin/src/WebTools/hiv.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
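The idea of representing a genome by the composition of its nucleotide strings can be illustrated with a simplified Python sketch. It is not the complete composition vector or the string scoring method of the abstract: it builds a plain k-mer frequency vector for a single k and compares two vectors by cosine distance. The function names and the choice of cosine distance are assumptions made for illustration.

```python
import math
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of the k-mers in seq that contain only A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def cosine_distance(u, v):
    """One minus the cosine similarity of two sparse frequency vectors."""
    dot = sum(value * v.get(kmer, 0.0) for kmer, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("empty composition vector")
    return 1.0 - dot / (norm_u * norm_v)
```

A pairwise matrix of such distances could then be passed to a distance-based tree-building method, which avoids aligning whole genomes.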

14.
Knowledge of rRNA structure is increasingly important to assist phylogenetic analysis through reconstructing optimal alignment, utilizing molecule features as an additional source of data and refining appropriate models of evolution of the molecule. We describe a procedure of optimization for alignment and a new coding method for nucleotide sequence data using secondary structure models of the D2 and D3 expansion fragments of the LSU-rRNA gene reconstructed for fifteen nematode species of the agriculturally important and diverse family Hoplolaimidae, order Tylenchida. Using secondary structure information we converted the original sequence data into twenty-eight symbol codes and submitted the transformed data to maximum parsimony analysis. We also applied the original sequence data set for Bayesian inference. This used the doublet model with sixteen states of nucleotide doublets for the stem region and the standard model of DNA substitution with four nucleotide states for loops and bulges. By this approach, we demonstrate that using structural information for phylogenetic analyses led to trees with lower resolved relationships between clades and likely eliminated some artefactual support for misinterpreted relationships, such as paraphyly of Helicotylenchus or Rotylenchus. This study as well as future phylogenetic analyses is herein supported by the development of an on-line database, NEMrRNA, for rRNA molecules in a structural format for nematodes. We also have developed a new computer program, RNAstat, for calculation of nucleotide statistics designed and proposed for phylogenetic studies.

15.
Summary An overview of recent molecular analyses regarding origins of plastids in algal lineages is presented. Since different phylogenetic analyses can yield contradictory views of algal plastid origins, we have examined the effect of two distance measurement methods and two distance matrix tree-building methods upon topologies for the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit nucleotide sequence data set. These results are contrasted to those from bootstrap parsimony analysis of nucleotide sequence data subsets. It is shown that the phylogenetic information contained within nucleotide sequences for the chloroplast-encoded gene for the large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase, integral to photosynthesis, indicates an independent origin for this plastid gene in different plant taxa. This finding is contrasted to contrary results derived from 16S rRNA sequences. Possible explanations for discrepancies observed for these two different molecules are put forth. Other molecular sequence data which address questions of early plant evolution and the eubacterial origins of algal organelles are discussed. Offprint requests to: W. Martin

16.
Selecting the best-fit model of nucleotide substitution (total citations: 2; self-citations: 0; citations by others: 2)
Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.
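The three criteria compared in this abstract reduce to simple formulas once each model's maximised log-likelihood is known: the likelihood ratio statistic 2(lnL1 - lnL0), AIC = -2 lnL + 2k, and BIC = -2 lnL + k ln n. The Python sketch below is illustrative only. The log-likelihood values are invented, the parameter counts assume that branch lengths are excluded, and the function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic for a null model nested in an alternative."""
    return 2.0 * (lnl_alt - lnl_null)


def aic(lnl, k):
    """Akaike information criterion for a model with k free parameters."""
    return -2.0 * lnl + 2.0 * k


def bic(lnl, k, n_sites):
    """Bayesian information criterion; n_sites is the alignment length."""
    return -2.0 * lnl + k * math.log(n_sites)


# Hypothetical fits: (log-likelihood, free substitution-model parameters).
fits = {"JC69": (-5230.4, 0), "HKY85": (-5102.7, 4), "GTR": (-5099.9, 8)}
n_sites = 1000  # assumed alignment length

best_aic = min(fits, key=lambda name: aic(*fits[name]))
best_bic = min(fits, key=lambda name: bic(*fits[name], n_sites))
hky_vs_gtr = lrt_statistic(fits["HKY85"][0], fits["GTR"][0])
```

For nested models the likelihood ratio statistic is usually compared with a chi-square distribution whose degrees of freedom equal the difference in free parameters, while AIC and BIC simply favour the model with the lowest score.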

17.
Modes and rates of molecular evolution, and congruence and combinability for phylogenetic reconstruction, of portions of the nuclear large ribosomal subunit (nLSU-rDNA) and mitochondrial small subunit (mtSSU-rDNA) genes were investigated in the mushroom genus Amanita. The AT content was higher in the mtSSU-rDNA than in the nLSU-rDNA. A transition bias in which AT substitutions were as frequent as transitions was present in the mtSSU-rDNA but not in the nLSU-rDNA. Among-sites rate variation in nucleotide substitutions at variable sites was present in the nLSU-rDNA but not in the mtSSU-rDNA. Likelihood ratio tests indicated very different models of evolution for the two molecules. A molecular clock could be rejected for both data sets. Rates of molecular evolution in the two molecules were uncoupled: faster evolutionary rates in the mtSSU-rDNA and nLSU-rDNA were not observed for the same taxa. In separate phylogenetic analyses, the nLSU-rDNA data set had higher phylogenetic resolution. The partition homogeneity test and statistical bootstrap support for branches indicated absence of conflict in the phylogenetic signal in the two data sets; however, tree topologies produced from the separate data sets were not congruent. Heterogeneity in modes and rates of evolution in the two molecules poses difficulties for a combined analysis of the two data sets: the use of equally weighted parsimony is not fully satisfactory when rate heterogeneity is present, and it is impractical to determine a model for maximum-likelihood analysis that fits simultaneously two heterogeneous data sets. Overall topologies produced from either the separated or the combined analyses using various tree reconstruction methods were identical for nearly all statistically significant branches.

18.
We have investigated the phylogenetic relationships of monotremes and marsupials using nucleotide sequence data from the neurotrophins: nerve growth factor (NGF), brain-derived neurotrophic factor (BDNF), and neurotrophin-3 (NT-3). The study included species representing monotremes, Australasian marsupials and placentals, as well as species representing birds, reptiles, and fish. PCR was used to amplify fragments encoding parts of the neurotrophin genes from echidna, platypus, and eight marsupials from four different orders. Phylogenetic trees were generated using parsimony analysis, and support for the different tree structures was evaluated by bootstrapping. The analysis was performed with NGF, BDNF, or NT-3 sequence data used individually as well as with the three neurotrophins in a combined matrix, thereby simultaneously considering phylogenetic information from three separate genes. The results showed that the monotreme neurotrophin sequences associate to either therian or bird neurotrophin sequences and suggest that the monotremes are not necessarily related closer to therians than to birds. Furthermore, the results confirmed the present classification of four Australasian marsupial orders based on morphological characters, and suggested a phylogenetic relationship where Dasyuromorphia is related closest to Peramelemorphia followed by Notoryctemorphia and Diprotodontia. These studies show that sequence data from neurotrophins are well suited for phylogenetic analysis of mammals and that neurotrophins can resolve basal relationships in the evolutionary tree. Received: 27 January 1997 / Accepted: 20 March 1997

19.
Selection at the protein-level can influence nucleotide substitution patterns for protein-coding genes, which in turn can affect their performance as phylogenetic characters. In this study, we compare two protein-coding nuclear genes that appear to have evolved under markedly different selective constraints and evaluate how selection has shaped their phylogenetic signal. We sequenced 1,100+ bp of exon 6 of the gene encoding dentin matrix protein 1 (DMP1) from most of the currently recognized genera of New World opossums (family: Didelphidae) and compared these data to an existing matrix of sequences from the interphotoreceptor retinoid-binding protein gene (IRBP) and morphological characters. In comparison to IRBP, DMP1 has far fewer sites under strong purifying selection and exhibits a number of sites under positive directional selection. Furthermore, selection on the DMP1 protein appears to conserve short, acidic, serine-rich domains rather than primary amino acid sequence; as a result, DMP1 has significantly different nucleotide substitution patterns from IRBP. Using Bayesian methods, we determined that DMP1 evolves almost 30% faster than IRBP, has 2.5 times more variable sites, has less among-site rate heterogeneity, is skewed toward A and away from CT (IRBP has relatively even base frequencies), and has a significantly lower rate of change between adenine and any other nucleotide. Despite these different nucleotide substitution patterns, estimates of didelphid relationships based on separate phylogenetic analyses of these genes are remarkably congruent whether patterns of nucleotide substitution are explicitly modeled or not. Nonetheless, DMP1 contains more phylogenetically informative characters per unit sequence and resolves more nodes with higher support than does IRBP. Thus, for these two genes, relaxed functional constraints and positive selection appear to improve the efficiency of phylogenetic estimation without compromising its accuracy.

20.