Similar literature (20 results found)
1.
Gene trees are evolutionary trees representing the ancestry of genes sampled from multiple populations. Species trees represent populations of individuals (each with many genes) splitting into new populations or species. The coalescent process, which models ancestry of gene copies within populations, is often used to model the probability distribution of gene trees given a fixed species tree. This multispecies coalescent model provides a framework for phylogeneticists to infer species trees from gene trees using maximum likelihood or Bayesian approaches. Because the coalescent models a branching process over time, all trees are typically assumed to be rooted in this setting. Often, however, gene trees inferred by traditional phylogenetic methods are unrooted. We investigate probabilities of unrooted gene trees under the multispecies coalescent model. We show that when there are four species with one gene sampled per species, the distribution of unrooted gene tree topologies identifies the unrooted species tree topology and some, but not all, information in the species tree edges (branch lengths). The location of the root on the species tree is not identifiable in this situation. However, for five or more species with one gene sampled per species, we show that the distribution of unrooted gene tree topologies identifies the rooted species tree topology and all its internal branch lengths. The length of any pendant branch leading to a leaf of the species tree is also identifiable for any species from which more than one gene is sampled.
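The four-species statement above can be illustrated with a minimal sketch. It assumes the standard coalescent result that, with one gene per species and branch lengths in coalescent units, the unrooted gene tree matches the unrooted species tree with probability 1 - (2/3)e^(-t), where t is the length of the internal edge of the unrooted species tree. The function names are hypothetical and are not taken from the abstract.

```python
import math

def unrooted_quartet_probs(internal_length):
    """Probabilities of the three unrooted gene tree topologies
    (ab|cd, ac|bd, ad|bc) when the unrooted species tree is ab|cd
    and its internal edge has the given length in coalescent units."""
    miss = math.exp(-internal_length)  # no coalescence on the internal edge
    return (1.0 - 2.0 * miss / 3.0, miss / 3.0, miss / 3.0)

def caterpillar_quartet(x, y):
    """Rooted species tree (((a,b):x,c):y,d). After unrooting, y merges
    with the pendant edge to d, so only x affects the distribution."""
    return unrooted_quartet_probs(x)

def balanced_quartet(x, y):
    """Rooted species tree ((a,b):x,(c,d):y). After unrooting, the two
    internal edges join into one edge of length x + y."""
    return unrooted_quartet_probs(x + y)

# Different rooted trees can give the same unrooted distribution,
# which is why the root is not identifiable with four species.
print(caterpillar_quartet(1.0, 0.1))
print(caterpillar_quartet(1.0, 5.0))
print(balanced_quartet(0.3, 0.7))
```

Under these assumptions, all three calls describe the same distribution: the topology ab|cd and one combined edge length are recoverable, but the root position and the individual values of x and y are not.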

2.
A total of 22 genes from the genome of Salinibacter ruber strain M31 were selected in order to study the phylogenetic position of this species based on protein alignments. The selection of the genes was based on their essential function for the organism, dispersion within the genome, and sufficient informative length of the final alignment. For each gene, an individual phylogenetic analysis was performed and compared with the resulting tree based on the concatenation of the 22 genes, which rendered a single alignment of 10,757 homologous positions. In addition to the manually chosen genes, an automatically selected data set of 74 orthologous genes was used to reconstruct a tree based on 17,149 homologous positions. Although single genes supported different topologies, the tree topology of both concatenated data sets was shown to be identical to that previously observed based on small subunit (SSU) rRNA gene analysis, in which S. ruber was placed together with Bacteroidetes. In both concatenated data sets the bootstrap support was very high, but an analysis with a gradually lower number of genes indicated that the bootstrap support was greatly reduced with fewer than 12 genes. The results indicate that tree reconstructions based on concatenating large numbers of protein-coding genes seem to produce tree topologies with similar resolution to that of the single 16S rRNA gene trees. For classification purposes, 16S rRNA gene analysis may remain the most pragmatic approach to infer genealogic relationships.

3.
Phylogenies involving nonmodel species are based on a few genes, mostly chosen following historical or practical criteria. Because gene trees are sometimes incongruent with species trees, the resulting phylogenies may not accurately reflect the evolutionary relationships among species. The increase in availability of genome sequences now provides large numbers of genes that could be used for building phylogenies. However, for practical reasons only a few genes can be sequenced for a wide range of species. Here we asked whether we can identify a few genes, among the single-copy genes common to most fungal genomes, that are sufficient for recovering accurate and well-supported phylogenies. Fungi represent a model group for phylogenomics because many complete fungal genomes are available. An automated procedure was developed to extract single-copy orthologous genes from complete fungal genomes using a Markov Clustering Algorithm (Tribe-MCL). Using 21 complete, publicly available fungal genomes with reliable protein predictions, 246 single-copy orthologous gene clusters were identified. We inferred the maximum likelihood trees using the individual orthologous sequences and constructed a reference tree from concatenated protein alignments. The topologies of the individual gene trees were compared to that of the reference tree using three different methods. The performance of individual genes in recovering the reference tree was highly variable. Gene size and the number of variable sites were highly correlated and significantly affected the performance of the genes, but the average substitution rate did not. Two genes recovered exactly the same topology as the reference tree, and when concatenated provided high bootstrap values. The genes typically used for fungal phylogenies did not perform well, which suggests that current fungal phylogenies based on these genes may not accurately reflect the evolutionary relationships among species. 
Analyses on subsets of species showed that the phylogenetic performance did not seem to depend strongly on the sample. We expect that the best-performing genes identified here will be very useful for phylogenetic studies of fungi, at least at a large taxonomic scale. Furthermore, we compare the method developed here for finding genes for building robust phylogenies with previous ones and we advocate that our method could be applied to other groups of organisms when more complete genomes are available.

4.
The concordance of gene trees and species trees is reconsidered in detail, allowing for samples of arbitrary size to be taken from the species. A sense of concordance for gene tree and species tree topologies is clarified, such that if the "collapsed gene tree" produced by a gene tree has the same topology as the species tree, the gene tree is said to be topologically concordant with the species tree. The term speciodendric is introduced to refer to genes whose trees are topologically concordant with species trees. For a given three-species topology, probabilities of each of the three possible collapsed gene tree topologies are given, as are probabilities of monophyletic concordance and concordance in the sense of N. Takahata (1989), Genetics 122, 957-966. Increasing the sample size is found to increase the probability of topological concordance, but a limit exists on how much the topological concordance probability can be increased. Suggested sample sizes beyond which this probability can be increased only minimally are given. The results are discussed in terms of implications for molecular studies of phylogenetics and speciation.

5.
We studied the phylogenetic relationships among Japanese Leptocarabus ground beetles, which show extensive trans-species polymorphisms in mitochondrial gene genealogies. Simultaneous analysis of combined nuclear data with partial sequences from the long-wavelength rhodopsin, wingless, phosphoenolpyruvate carboxykinase, and 28S rRNA genes resolved the relationships among the five species, although separate analyses of these genes provided topologies with low resolution. For both the nuclear gene tree resulting from the combined data from four genes and a mitochondrial cytochrome oxidase subunit I (COI) gene tree, we applied a Bayesian divergence time estimation using a common calibration method to identify mitochondrial introgression events that occurred after speciation. Three mitochondrial lineages shared by two or three species were likely subject to introgression due to interspecific hybridization because the coalescent times for these lineages were much shorter than the corresponding speciation times estimated from nuclear gene sequences. We demonstrated that when species phylogeny is fully resolved with nuclear gene sequence data, comparative analysis of nuclear and mitochondrial gene trees can be used to infer introgressive hybridization events that might cause trans-species polymorphisms in mitochondrial gene trees.

6.
Yang CC, Sakai H, Numa H, Itoh T. Gene 2011, 477(1-2): 53-60
Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.

7.
Most plant phylogenetic inference has used DNA sequence data from the plastid genome. This genome represents a single genealogical sample with no recombination among genes, potentially limiting the resolution of evolutionary relationships in some contexts. In contrast, nuclear DNA is inherently more difficult to employ for phylogeny reconstruction because major mutational events in the genome, including polyploidization, gene duplication, and gene extinction, can result in homologous gene copies that are difficult to identify as orthologs or paralogs. Gene tree parsimony (GTP) can be used to infer the rooted species tree by fitting gene genealogies to species trees while simultaneously minimizing the estimated number of duplications needed to reconcile conflicts among them. Here, we use GTP for five nuclear gene families and a previously published plastid data set to reconstruct the phylogenetic backbone of the aquatic plant family Pontederiaceae. Plastid-based phylogenetic studies strongly supported extensive paraphyly of Eichhornia (one of the four major genera) but also depicted considerable ambiguity concerning the true root placement for the family. Our results indicate that species trees inferred from the nuclear genes (alone and in combination with the plastid data) are highly congruent with gene trees inferred from plastid data alone. Consideration of optimal and suboptimal gene tree reconciliations places the root of the family at (or near) a branch leading to the rare and locally restricted E. meyeri. We also explore methods to incorporate uncertainty in individual gene trees during reconciliation by considering their individual bootstrap profiles and relate inferred excesses of gene duplication events on individual branches to whole-genome duplication events inferred for the same branches. Our study improves understanding of the phylogenetic history of Pontederiaceae and also demonstrates the utility of GTP for phylogenetic analysis.
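The duplication-minimizing criterion behind gene tree parsimony can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes rooted trees written as nested tuples, gene tree leaves labelled by the species they were sampled from, and the common last-common-ancestor (LCA) mapping rule in which a gene tree node counts as a duplication when it maps to the same species tree clade as one of its children. All function names are hypothetical.

```python
def leaf_set(tree):
    """Set of leaf labels under a nested-tuple tree; a leaf is a string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def species_clades(species_tree):
    """Leaf sets of every node (clade) in the species tree."""
    clades = [leaf_set(species_tree)]
    if not isinstance(species_tree, str):
        for child in species_tree:
            clades.extend(species_clades(child))
    return clades

def lca_clade(species, clades):
    """Smallest species tree clade that contains all the given species."""
    return min((c for c in clades if species <= c), key=len)

def count_duplications(gene_tree, species_tree):
    """Number of gene tree nodes inferred as duplications by LCA mapping.
    Assumes every gene tree leaf label occurs in the species tree."""
    clades = species_clades(species_tree)

    def visit(node):
        if isinstance(node, str):
            return 0
        here = lca_clade(leaf_set(node), clades)
        is_dup = any(lca_clade(leaf_set(child), clades) == here for child in node)
        return int(is_dup) + sum(visit(child) for child in node)

    return visit(gene_tree)

def gtp_score(gene_trees, species_tree):
    """Total duplications needed to reconcile all gene trees."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)

# Toy data: two gene families scored against two candidate species trees.
gene_trees = [(("A", "B"), "C"), ((("A", "B"), "C"), (("A", "B"), "C"))]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
best = min(candidates, key=lambda s: gtp_score(gene_trees, s))
print(best, gtp_score(gene_trees, best))
```

A real analysis would search over many more candidate species trees and, as the abstract notes, could weight gene trees by their bootstrap support; the exhaustive `min` over two candidates here is only for illustration.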

8.
Clades that have undergone episodes of rapid cladogenesis are challenging from a phylogenetic point of view. They are generally characterised by short or missing internal branches in phylogenetic trees and by conflicting topologies among individual gene trees. This may be the case of the subfamily Trematominae, a group of marine teleosts of coastal Antarctic waters, which is considered to have passed through a period of rapid diversification. Despite much phylogenetic attention, the relationships among Trematominae species remain unclear. In contrast to previous studies that were mostly based on concatenated datasets of mitochondrial and/or single nuclear loci, we applied various single-locus and multilocus phylogenetic approaches to sequences from 11 loci (eight nuclear) and we also used several methods to assess the hypothesis of a radiation event in Trematominae evolution. Diversification rate analyses support the hypothesis of a period of rapid diversification during Trematominae history and only a few nodes in the hypothetical species tree were consistently resolved with various phylogenetic methods. We detected significant discrepancies among trees from individual genes of these species, most probably resulting from incomplete lineage sorting, suggesting that concatenation of loci is not the most appropriate way to investigate Trematominae species interrelationships. These data also provide information about the possible effects of historic climate changes on the diversification rate of this group of fish.

9.
The relative efficiencies of different protein-coding genes of the mitochondrial genome and different tree-building methods in recovering a known vertebrate phylogeny (two whale species, cow, rat, mouse, opossum, chicken, frog, and three bony fish species) were evaluated. The tree-building methods examined were neighbor joining (NJ), minimum evolution (ME), maximum parsimony (MP), and maximum likelihood (ML), and both nucleotide sequences and deduced amino acid sequences were analyzed. Generally speaking, amino acid sequences were better than nucleotide sequences in obtaining the true tree (topology) or trees close to the true tree. However, when only first and second codon position data were used, nucleotide sequences produced reasonably good trees. Among the 13 genes examined, Nd5 produced the true tree in all tree-building methods or algorithms for both amino acid and nucleotide sequence data. Genes Cytb and Nd4 also produced the correct tree in most tree-building algorithms when amino acid sequence data were used. By contrast, Co2, Nd1, and Nd4l performed poorly. In general, large genes produced better results, and when the entire set of genes was used, all tree-building methods generated the true tree. In each tree-building method, several distance measures or algorithms were used, but all these distance measures or algorithms produced essentially the same results. The ME method, in which many different topologies are examined, was no better than the NJ method, which generates a single final tree. Similarly, an ML method, in which many topologies are examined, was no better than the ML star decomposition algorithm that generates a single final tree. In ML the best substitution model chosen by using the Akaike information criterion produced no better results than simpler substitution models. These results question the utility of the currently used optimization principles in phylogenetic construction. Relatively simple methods such as the NJ and ML star decomposition algorithms seem to produce results as good as those obtained by more sophisticated methods. The efficiencies of the NJ, ME, MP, and ML methods in obtaining the correct tree were nearly the same when amino acid sequence data were used. The most important factor in constructing reliable phylogenetic trees seems to be the number of amino acids or nucleotides used.

10.
Phylogenomics has largely succeeded in its aim of accurately inferring species trees, even when there are high levels of discordance among individual gene trees. These resolved species trees can be used to ask many questions about trait evolution, including the direction of change and number of times traits have evolved. However, the mapping of traits onto trees generally uses only a single representation of the species tree, ignoring variation in the gene trees used to construct it. Recognizing that genes underlie traits, these results imply that many traits follow topologies that are discordant with the species topology. As a consequence, standard methods for character mapping will incorrectly infer the number of times a trait has evolved. This phenomenon, dubbed “hemiplasy,” poses many problems in analyses of character evolution. Here we outline these problems, explaining where and when they are likely to occur. We offer several ways in which the possible presence of hemiplasy can be diagnosed, and discuss multiple approaches to dealing with the problems presented by underlying gene tree discordance when carrying out character mapping. Finally, we discuss the implications of hemiplasy for general phylogenetic inference, including the possible drawbacks of the widespread push for “resolved” species trees.

11.
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERA, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion, the implied mutational pattern at two key active sites in the SERA proteins, were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.

12.
The proliferation of gene data from multiple loci of large multigene families has been greatly facilitated by considerable recent advances in sequence generation. The evolution of such gene families, which often undergo complex histories and different rates of change, combined with increases in sequence data, poses complex problems for traditional phylogenetic analyses, and in particular, those that aim to successfully recover species relationships from gene trees. Here, we implement gene tree parsimony analyses on multicopy gene family data sets of snake venom proteins for two separate groups of taxa, incorporating Bayesian posterior distributions as a rigorous strategy to account for the uncertainty present in gene trees. Gene tree parsimony largely failed to infer species trees congruent with each other or with species phylogenies derived from mitochondrial and single-copy nuclear sequences. Analysis of four toxin gene families from a large expressed sequence tag data set from the viper genus Echis failed to produce a consistent topology, and reanalysis of a previously published gene tree parsimony data set, from the family Elapidae, suggested that species tree topologies were predominantly unsupported. We suggest that gene tree parsimony failure in the family Elapidae is likely the result of unequal and/or incomplete sampling of paralogous genes and demonstrate that multiple parallel gene losses are likely responsible for the significant species tree conflict observed in the genus Echis. These results highlight the potential for gene tree parsimony analyses to be undermined by rapidly evolving multilocus gene families under strong natural selection.

13.
We have characterized the relationship between accurate phylogenetic reconstruction and sequence similarity, testing whether high levels of sequence similarity can consistently produce accurate evolutionary trees. We generated protein families with known phylogenies using a modified version of the PAML/EVOLVER program that produces insertions and deletions as well as substitutions. Protein families were evolved over a range of 100-400 point accepted mutations; at these distances 63% of the families shared significant sequence similarity. Protein families were evolved using balanced and unbalanced trees, with ancient or recent radiations. In families sharing statistically significant similarity, about 60% of multiple sequence alignments were 95% identical to true alignments. To compare recovered topologies with true topologies, we used a score that reflects the fraction of clades that were correctly clustered. As expected, the accuracy of the phylogenies was greatest in the least divergent families. About 88% of phylogenies clustered over 80% of clades in families that shared significant sequence similarity, using Bayesian, parsimony, distance, and maximum likelihood methods. However, for protein families with short ancient branches (ancient radiation), only 30% of the most divergent (but statistically significant) families produced accurate phylogenies, and only about 70% of the second most highly conserved families, with median expectation values better than 10^-60, produced accurate trees. These values represent upper bounds on expected tree accuracy for sequences with a simple divergence history; proteins from 700 Giardia families, with a similar range of sequence similarities but considerably more gaps, produced much less accurate trees. For our simulated insertions and deletions, correct multiple sequence alignments did not perform much better than those produced by T-COFFEE, and including sequences with expressed sequence tag-like sequencing errors did not significantly decrease phylogenetic accuracy. In general, although less-divergent sequence families produce more accurate trees, the likelihood of estimating an accurate tree is most dependent on whether radiation in the family was ancient or recent. Accuracy can be improved by combining genes from the same organism when creating species trees or by selecting protein families with the best bootstrap values in comprehensive studies.
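A score based on the fraction of correctly clustered clades, as mentioned in the abstract above, could look like the following sketch. The abstract does not give the exact definition, so this assumes one plausible reading: the share of non-trivial clades of the true rooted tree that also appear in the recovered tree. Trees are nested tuples of leaf names, and the function names are hypothetical.

```python
def collect_clades(tree, out):
    """Add the leaf set of every internal node to `out`; return this node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves

def nontrivial_clades(tree):
    """Clades of internal nodes, excluding the root clade that holds every leaf."""
    out = set()
    all_leaves = collect_clades(tree, out)
    out.discard(all_leaves)
    return out

def fraction_clades_recovered(true_tree, inferred_tree):
    """Fraction of the true tree's non-trivial clades present in the inferred tree."""
    true_clades = nontrivial_clades(true_tree)
    if not true_clades:
        return 1.0  # nothing to recover in a star tree or a two-leaf tree
    found = true_clades & nontrivial_clades(inferred_tree)
    return len(found) / len(true_clades)

true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
print(fraction_clades_recovered(true_tree, inferred_tree))
```

In this toy case the clades {A,B,C} and {D,E} are recovered but {A,B} is not, so two of the three true clades are found. With a threshold such as the "over 80% of clades" used in the abstract, this tree would not count as accurate.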

14.
Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets.
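The idea of maximum likelihood species tree inference from gene tree topologies can be shown on the smallest possible case. The sketch below is not the STELLS algorithm: it assumes three species with one gene each, where the rooted gene tree matches the species tree ((A,B):T,C) with probability 1 - (2/3)e^(-T) and each of the other two topologies has probability (1/3)e^(-T), and it replaces the heuristic search by an exhaustive comparison of the three candidate topologies. Names and counts are hypothetical.

```python
import math

TOPOLOGIES = ("((A,B),C)", "((A,C),B)", "((B,C),A)")

def max_log_likelihood(counts, species_topology):
    """Maximise the multinomial log-likelihood over the internal branch
    length T >= 0 (coalescent units). Returns (log-likelihood, T estimate).
    Assumes `counts` holds at least one gene tree."""
    total = sum(counts.values())
    n_match = counts.get(species_topology, 0)
    frac = n_match / total
    if frac == 1.0:
        return 0.0, math.inf        # every gene tree matches: T is unbounded
    if frac <= 1.0 / 3.0:
        t_hat = 0.0                 # constrained optimum sits at the boundary
    else:
        t_hat = -math.log(1.5 * (1.0 - frac))
    miss = math.exp(-t_hat)
    loglik = (n_match * math.log(1.0 - 2.0 * miss / 3.0)
              + (total - n_match) * math.log(miss / 3.0))
    return loglik, t_hat

def ml_species_triple(counts):
    """Pick the species topology with the highest maximised likelihood."""
    best = max(TOPOLOGIES, key=lambda s: max_log_likelihood(counts, s)[0])
    return best, max_log_likelihood(counts, best)[1]

# Hypothetical counts of rooted gene tree topologies across 100 loci.
counts = {"((A,B),C)": 70, "((A,C),B)": 18, "((B,C),A)": 12}
topology, branch_length = ml_species_triple(counts)
print(topology, branch_length)
```

For larger trees the number of candidate topologies grows too fast for exhaustive comparison, which is why a heuristic search of the kind described in the abstract is needed.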

15.
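To better understand the phylogenetic relationships among species of the genus Nocardiopsis, partial sequences of the gyrB, sod, and rpoB genes were determined for the validly described species of the genus and combined with the 16S rRNA gene to reconstruct the phylogeny of Nocardiopsis. The average similarities of the gyrB, sod, and rpoB genes within the genus were 87.7%, 87.3%, and 94.1%, respectively, whereas the average similarity of the 16S rRNA gene reached 96.65%; all three housekeeping genes therefore showed greater divergence than the 16S rRNA gene. Comparison of the trees based on the different genes showed that the topology of the gyrB tree was largely consistent with that of the 16S rRNA tree at the subgroup level. The gyrB gene is therefore better suited than the 16S rRNA gene to the phylogenetic classification of Nocardiopsis.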

16.
Gene trees are often assumed to be equivalent to species trees, but processes such as incomplete lineage sorting can generate incongruence among gene topologies and analyzing multilocus data in concatenated matrices can be prone to systematic errors. Accordingly, a variety of new methods have been developed to estimate species trees using multilocus data sets. Here, we apply some of these methods to reconstruct the phylogeny of Buarremon and near relatives, a group in which phylogenetic analyses of mitochondrial DNA sequences produced results that were inconsistent with relationships implied by a taxonomy based on variation in external phenotype. Gene genealogies obtained for seven loci (one mitochondrial, six nuclear) were varied, with some supporting and some rejecting the monophyly of Buarremon. Overall, our species-tree analyses tended to support a monophyletic Buarremon, but due to lack of congruence between methodologies, resolution of the phylogeny of this group remains uncertain. More generally, our study indicates that the number of individuals sampled can have an important effect on phylogenetic reconstruction, that the use of seven markers does not guarantee obtaining a strongly-supported species tree, and that methods for species-tree reconstruction can produce different results using the same data; these are important considerations for researchers using these new phylogenetic approaches in other systems.

17.
In silico genomic fingerprints were produced by virtual hybridization of 191 fully sequenced bacterial genomes using a set of 15,264 13-mer probes specially designed to produce universal whole genome fingerprints. A novel approach for constructing phylogenetic trees, based on comparative analysis of genomic fingerprints, was developed. The resultant bacterial phylogenetic tree had strong similarities to those produced from the alignment of conserved sequences. Notably, the trees derived from the alignment of other conserved COG genes divided the Bacillus and Corynebacterium genera into the same subgroups produced by the novel bacterial tree. A number of discrepancies between both techniques were observed for the grouping of some Lactobacillus species. However, a detailed analysis of the alignment of these genomes using other bioinformatics tools revealed that the grouping of these organisms in the novel tree was more satisfactory than the groupings from previous classifications, which used only a few conserved genes. All these data suggest that the bacterial taxonomy produced by genomic fingerprints is satisfactory, but sometimes different from classical taxonomies. Discrepancies probably arise because the fingerprinting technique analyzes genomic sequences and reveals more information than previously used approaches.

18.
Yu Y, Degnan JH, Nakhleh L. PLoS Genetics 2012, 8(4): e1002660
Gene tree topologies have proven a powerful data source for various tasks, including species tree inference and species delimitation. Consequently, methods for computing probabilities of gene trees within species trees have been developed and widely used in probabilistic inference frameworks. All these methods assume an underlying multispecies coalescent model. However, when reticulate evolutionary events such as hybridization occur, these methods are inadequate, as they do not account for such events. Methods that account for both hybridization and deep coalescence in computing the probability of a gene tree topology currently exist for very limited cases. However, no such methods exist for general cases, owing primarily to the fact that it is currently unknown how to compute the probability of a gene tree topology within the branches of a phylogenetic network. Here we present a novel method for computing the probability of gene tree topologies on phylogenetic networks and demonstrate its application to the inference of hybridization in the presence of incomplete lineage sorting. We reanalyze a Saccharomyces species data set for which multiple analyses had converged on a species tree candidate. Using our method, though, we show that an evolutionary hypothesis involving hybridization in this group has better support than one of strict divergence. A similar reanalysis on a group of three Drosophila species shows that the data are consistent with hybridization. Further, using extensive simulation studies, we demonstrate the power of gene tree topologies at obtaining accurate estimates of branch lengths and hybridization probabilities of a given phylogenetic network. Finally, we discuss identifiability issues with detecting hybridization, particularly in cases that involve extinction or incomplete sampling of taxa.

20.
Classification of bacteria is mainly based on sequence comparisons of certain homologous genes such as 16S rRNA. Recently, attempts have been made to classify bacteria using oligonucleotide frequency patterns of nonhomologous sequences. However, the evolutionary significance of oligonucleotides longer than tetranucleotides has not been well studied. We performed phylogenetic analysis by using the Euclidean distances calculated from the di- to deca-nucleotide frequencies in bacterial genomes, and compared these oligonucleotide frequency-based tree topologies with those for the 16S rRNA gene and seven concatenated genes. When oligonucleotide frequency-based trees were constructed for bacterial species with similar GC content, their topologies at genus and family level were congruent with those based on homologous genes. Our results suggest that oligonucleotide frequency is useful not only for classification of bacteria, but also for estimation of phylogenetic relationships among closely related species.
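The distance computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: frequencies are counted on one strand with overlapping windows, words containing ambiguous bases are skipped, and the toy sequences are invented; the abstract does not say how the authors handled these details. Function names are hypothetical.

```python
import math
from itertools import product

def kmer_frequencies(sequence, k):
    """Relative frequencies of all 4**k oligonucleotides of length k,
    returned in a fixed alphabetical order so vectors are comparable."""
    sequence = sequence.upper()
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in counts:          # skips words with N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy sequences standing in for genomes.
genomes = {
    "g1": "ACGTACGTACGGTACC",
    "g2": "ACGTACGTACGGTACG",
    "g3": "TTTTAAAATTTTAAAA",
}
profiles = {name: kmer_frequencies(seq, 2) for name, seq in genomes.items()}
names = sorted(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(euclidean_distance(profiles[a], profiles[b]), 4))
```

The resulting pairwise distances form a matrix that a distance-based tree method could take as input. Note that the vector length grows as 4^k, so deca-nucleotide profiles have about a million entries and a sparse representation would be more practical than the dense list used here.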
