Similar Articles
20 similar articles found.
1.

Background  

Ongoing genome sequencing projects have led to a phylogenetic approach based on genome-scale data (phylogenomics), which is beginning to shed light on longstanding unresolved phylogenetic issues. The use of large datasets in phylogenomic analysis results in a global increase in resolution due to a decrease in sampling error. However, a fully resolved tree can still be wrong if the phylogenetic inference is biased.

2.

Background  

When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication). The utility of phylogenetic information in high-throughput genome annotation ("phylogenomics") is widely recognized, but existing approaches are either manual or not explicitly based on phylogenetic trees.

3.
The emerging field of phylogenomics is influencing both the amount and type of characters being brought to bear on long-standing problems in systematic biology. Moreover, the proliferation of sequence information from genome projects, in concert with the development of new informatics tools, is widening access to comparative data on retroelements to a broad cross section of investigators. Motivated by this, the Society of Systematic Biologists sponsored a symposium entitled "Genome Analysis and the Molecular Systematics of Retroelements," and the resulting papers illustrate this theme of new discoveries and cover three basic areas of research: (i) the taxonomic distribution and phylogenetic structure of families of retroelements; (ii) the use of SINE and LINE insertions for phylogenetic inference; and (iii) the informatics and classification of repetitive elements. The contributions of each article are briefly discussed in this context, and particularly fruitful directions for future research suggested by the results of this symposium are reviewed.

4.
MOTIVATION: Protein families evolve a multiplicity of functions through gene duplication, speciation and other processes. As a number of studies have shown, standard methods of protein function prediction produce systematic errors on these data. Phylogenomic analysis, which combines phylogenetic tree construction, integration of experimental data and differentiation of orthologs and paralogs, has been proposed to address these errors and improve the accuracy of functional classification. The explicit integration of structure prediction and analysis in this framework, which we call structural phylogenomics, provides additional insights into protein superfamily evolution. RESULTS: Protein functional classification using phylogenomic analysis yields fewer expected false positives overall than pairwise methods of functional classification. We present an overview of the motivations and fundamental principles of phylogenomic analysis, new methods developed for the key tasks, benchmark datasets for these tasks (when available), and suggest procedures to increase accuracy. We also discuss some of the methods used in the Celera Genomics high-throughput phylogenomic classification of the human genome. AVAILABILITY: Software tools from the Berkeley Phylogenomics Group are available at http://phylogenomics.berkeley.edu

5.
More than 190 plastid genomes have been completely sequenced during the past two decades owing to advances in DNA sequencing technologies. This unprecedented abundance of data has revealed extensive genomic changes in plastid genomes. Inversion is the most common mechanism leading to gene order changes. Several inversion events have been recognized as informative phylogenetic markers, such as a 30‐kb inversion found in all living vascular plants except lycopsids and two short inversions putatively shared by all ferns. Gene loss is a common event throughout plastid genome evolution. Many genes were independently lost or transferred to the nuclear genome in multiple plant lineages. The trnR‐CCG gene was lost in some clades of lycophytes, ferns, and seed plants, and all ndh genes are absent in parasitic plants, gnetophytes, Pinaceae, and the Taiwan moth orchid. Certain parasitic plants have, in particular, lost plastid genes related to photosynthesis because of relaxed functional constraint. The dramatic growth in plastid genome sequences has also promoted the use of whole plastid sequences and genomic features to solve phylogenetic problems. Chloroplast phylogenomics has provided additional evidence for deep‐level phylogenetic relationships as well as increased phylogenetic resolution at low taxonomic levels. However, chloroplast phylogenomics is still in its infancy, and a rigorous analytical methodology has yet to be developed.
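Because inversions act as phylogenetic markers by changing gene order, one simple way to compare two gene orders is to count breakpoints: adjacencies present in one order but absent from the other. The sketch below is a minimal illustration under stated assumptions: genes are encoded as hypothetical signed integers (negative meaning the opposite strand) and chromosomes are treated as linear, whereas real plastid genomes are circular and carry inverted repeats, which this toy version ignores.

```python
def adjacencies(order):
    """Return the signed adjacencies of a linear, signed gene order.

    Reading a segment on the other strand turns (a, b) into (-b, -a),
    so each adjacency is stored in one canonical orientation.
    """
    adj = set()
    for a, b in zip(order, order[1:]):
        adj.add(min((a, b), (-b, -a)))
    return adj


def breakpoint_distance(order1, order2):
    """Count adjacencies of order1 that are missing from order2."""
    return len(adjacencies(order1) - adjacencies(order2))


reference = [1, 2, 3, 4, 5, 6]
# Hypothetical lineage carrying one inversion of the segment 3..4.
inverted = [1, 2, -4, -3, 5, 6]
print(breakpoint_distance(reference, inverted))
```

A single internal inversion creates two breakpoints, while two taxa that share the same inversion have no breakpoints between them, which is what makes a shared inversion usable as a character.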

6.
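Phylogenomics is a new field that uses whole-genome data to construct phylogenetic trees. Whole-genome data can effectively eliminate the influence on phylogenetic trees of factors such as horizontal gene transfer and differences in gene evolutionary rates among taxa. According to the type of whole-genome data used, phylogenomic methods can be divided into five categories: concatenated multigene tree building, methods based on gene content, methods based on gene order, methods based on the composition of short sequence strings (k-mers), and methods based on metabolic pathways. This article systematically reviews the principles, speed, accuracy and scope of application of each category and its use across different groups of organisms, and outlines the prospects of and challenges facing phylogenomics.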
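The short-string (k-mer) composition category can be illustrated with a generic alignment-free comparison: count every overlapping word of length k in each genome and compare the count vectors. The sketch below assumes a plain cosine distance between raw k-mer counts; some published composition-vector methods additionally subtract an expected background frequency before comparing, a step this sketch omits. The sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k):
    """Count all overlapping k-mers (words of length k) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two k-mer count vectors."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with anything
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy "genomes"; real analyses use whole proteomes or genomes.
genomes = {"g1": "ACGTACGTAC", "g2": "ACGTACGAAC", "g3": "TTTTGGGGCC"}
profiles = {name: kmer_profile(s, 3) for name, s in genomes.items()}
for a in sorted(profiles):
    for b in sorted(profiles):
        if a < b:
            print(a, b, round(cosine_distance(profiles[a], profiles[b]), 3))
```

The resulting pairwise distances would then be passed to a distance-based tree method such as neighbor joining, which is not shown here.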

7.
8.
Determining the influence of horizontal gene transfer (HGT) on phylogenomic analyses and on the retrieval of a tree of life is relevant to our understanding of microbial genome evolution. It is particularly difficult to distinguish phylogenetic incongruence due to noise from incongruence resulting from HGT. We have performed a large-scale, detailed evolutionary analysis of the different phylogenetic signals present in the genomes of Xanthomonadales, a group of Proteobacteria. We show that the presence of phylogenetic noise is not an obstacle to inferring past and present HGTs during their evolution. The scenario derived from this analysis and from other recently published reports reflects the confounding effects of past and present HGT on bacterial phylogenomics. Although transfers between closely related species are difficult to detect in genome-scale phylogenetic analyses, past transfers to the ancestor of extant groups appear as conflicting signals that might occasionally make it impossible to determine the evolutionary origin of the whole genome.

9.
MOTIVATION: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree. RESULTS: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n²), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees. AVAILABILITY: http://www.genetics.wustl.edu/eddy/forester.
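Labelling gene-tree nodes against a trusted species tree is commonly done with the lowest-common-ancestor (LCA) mapping rule: each gene-tree node maps to the LCA of its children's mappings, and a node is a duplication if it maps to the same species-tree node as one of its children. The sketch below implements that generic rule; it is not claimed to be the exact algorithm of the paper above. The species names, internal labels and tree encoding (species tree as a child-to-parent dict, gene tree as nested 2-tuples) are assumptions for illustration. Recomputing ancestor paths at every node keeps it simple but can make it quadratic in the worst case.

```python
def ancestors(node, parent):
    """Path from a species-tree node up to the root (node first)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Lowest common ancestor of species-tree nodes a and b."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same species tree")


def reconcile(gene_tree, parent, events):
    """Map a gene-tree node onto the species tree, recording events.

    Leaves are species names; internal nodes are (left, right) tuples.
    Appends (node, mapped_species_node, event) in post-order and
    returns the species-tree node this gene-tree node maps to.
    """
    if isinstance(gene_tree, str):
        return gene_tree
    left, right = gene_tree
    m_left = reconcile(left, parent, events)
    m_right = reconcile(right, parent, events)
    m = lca(m_left, m_right, parent)
    # Same mapping as a child means both copies coexist in one lineage.
    event = "duplication" if m in (m_left, m_right) else "speciation"
    events.append((gene_tree, m, event))
    return m


# Hypothetical species tree: ((human, mouse)Mammalia, chicken)Vertebrata
parent = {"human": "Mammalia", "mouse": "Mammalia",
          "Mammalia": "Vertebrata", "chicken": "Vertebrata"}
# Hypothetical gene tree with a mammal-specific duplication.
gene_tree = ((("human", "mouse"), ("human", "mouse")), "chicken")
events = []
reconcile(gene_tree, parent, events)
for node, mapped, event in events:
    print(event, "at", mapped)
```

Nodes labelled as speciation separate orthologs, while nodes labelled as duplication separate paralogs, which is the distinction the abstract relies on for function prediction.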

10.
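Mammals are one of the most highly evolved animal groups and dominate life on Earth, and reconstructing their phylogenetic relationships has long been a focus of molecular systematics. As whole-genome sequencing is completed for more and more species, exploring the phylogeny and evolution of this group at the genomic level has become a research hotspot. This article briefly introduces current applications of phylogenomics in the molecular systematics of extant mammals from several perspectives, including whole-genome sequences, rare genomic changes and chromosome painting; synthesizes existing studies to summarize the phylogenetic relationships among the superorders and orders of Placentalia; and presents a phylogenetic tree of 19 placental orders. It also analyzes the main problems currently facing mammalian phylogenomics and its future prospects.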

11.
Mammalian phylogenomics comes of age   Times cited: 28 (self-citations: 0; by others: 28)
The relatively new field of phylogenomics is beginning to reveal the potential of genomic data for evolutionary studies. As the cost of whole genome sequencing falls, anticipation of complete genome sequences from divergent species, reflecting the major lineages of modern mammals, is no longer a distant dream. In this article, we describe how comparative genomic data from mammals is progressing to resolve long-standing phylogenetic controversies, to refine dogma on how chromosomes evolve and to guide annotation of human and other vertebrate genomes.

12.
The genus Drosophila has been the subject of intense comparative phylogenomic characterization to provide insights into genome evolution under diverse biological and ecological contexts and to functionally annotate the Drosophila melanogaster genome, a model system for animal and insect genetics. The recent sequencing of 11 additional Drosophila species from various divergence points of the genus is a first step in this direction. However, to fully reap the benefits of this resource, the Drosophila community faces two critical needs: the expansion of genomic resources from a much broader range of phylogenetic diversity and the development of additional resources to aid in finishing the existing draft genomes. To address these needs, we report the first synthesis of a comprehensive set of bacterial artificial chromosome (BAC) resources for 19 Drosophila species from all three subgenera. Ten libraries were derived from the exact source used to generate 10 of the 12 draft genomes, while the rest were generated from a strategically selected set of species on the basis of salient ecological and life history features and their phylogenetic positions. The majority of the new species have at least one sequenced reference genome for immediate comparative benefit. This 19-BAC library set was rigorously characterized and shown to have large insert sizes (125-168 kb), low nonrecombinant clone content (0.3-5.3%), and deep coverage (9.1-42.9×). Further, we demonstrated the utility of this BAC resource for generating physical maps of targeted loci, refining draft sequence assemblies and identifying potential genomic rearrangements across the phylogeny.
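Library coverage figures such as "9.1-42.9×" are genome equivalents: the number of clones times the mean insert size divided by the genome size. The standard Clarke-Carbon relationship then gives the probability that any given locus is present in at least one clone. The sketch below applies these textbook formulas to a hypothetical 180 Mb genome with 150 kb inserts; the numbers are illustrative and are not taken from the study.

```python
import math


def coverage(n_clones, insert_kb, genome_mb):
    """Genome-equivalent coverage: total cloned bases / genome size."""
    return n_clones * insert_kb / (genome_mb * 1000)


def prob_locus_covered(n_clones, insert_kb, genome_mb):
    """Clarke-Carbon probability that a locus lies in at least one clone."""
    f = insert_kb / (genome_mb * 1000)  # fraction of genome per clone
    return 1 - (1 - f) ** n_clones


def clones_needed(prob, insert_kb, genome_mb):
    """Smallest clone count giving at least `prob` chance of covering a locus."""
    f = insert_kb / (genome_mb * 1000)
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


# Hypothetical library: 180 Mb genome, 150 kb mean insert size.
print(coverage(12000, 150, 180))                      # genome equivalents
print(round(prob_locus_covered(12000, 150, 180), 5))  # chance a locus is present
print(clones_needed(0.99, 150, 180))                  # clones for 99% confidence
```

At roughly 10× coverage the chance of missing a given locus is close to e^(-10), which is why deep libraries are useful for mapping targeted loci.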

13.
Here we use phylogenomics with expressed sequence tag (EST) data from the ecologically important coccolithophore-forming alga Emiliania huxleyi and the plastid-lacking cryptophyte Goniomonas cf. pacifica to establish their phylogenetic positions in the eukaryotic tree. Haptophytes and cryptophytes are members of the putative eukaryotic supergroup Chromalveolata (chromists [cryptophytes, haptophytes, stramenopiles] and alveolates [apicomplexans, ciliates, and dinoflagellates]). The chromalveolates are postulated to be monophyletic on the basis of plastid pigmentation in photosynthetic members, plastid gene and genome relationships, nuclear "host" phylogenies of some chromalveolate lineages, unique gene duplications and replacements shared by these taxa, and the evolutionary history of components of the plastid import and translocation systems. However, the phylogenetic position of cryptophytes and haptophytes and the monophyly of chromalveolates as a whole remain to be substantiated. Here we assess chromalveolate monophyly using a multigene dataset of nuclear genes that includes members of all 6 eukaryotic supergroups. An automated phylogenomics pipeline followed by targeted database searches was used to assemble a 16-protein dataset (6,735 aa) from 46 taxa for tree inference. Maximum likelihood and Bayesian analyses of these data support the monophyly of haptophytes and cryptophytes. This relationship is consistent with a gene replacement via horizontal gene transfer of plastid-encoded rpl36 that is uniquely shared by these taxa. The haptophytes + cryptophytes are sister to a clade that includes all other chromalveolates and, surprisingly, two members of the Rhizaria, Reticulomyxa filosa and Bigelowiella natans. The association of the two Rhizaria with chromalveolates is supported by the approximately unbiased (AU) test and when the fastest-evolving amino acid sites are removed from the 16-protein alignment.
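Removing the fastest-evolving sites is a common check that a grouping is not driven by saturated, rapidly changing positions. Published analyses usually estimate per-site rates under a substitution model; the sketch below instead uses a crude, assumed proxy (the number of distinct non-gap residues per column) to rank columns and then drops the most variable fraction. The alignment and taxon names are hypothetical.

```python
def site_variability(alignment):
    """Number of distinct non-gap residues in each alignment column.

    A crude stand-in for per-site evolutionary rate.
    """
    n_sites = len(next(iter(alignment.values())))
    scores = []
    for i in range(n_sites):
        column = {seq[i] for seq in alignment.values() if seq[i] != "-"}
        scores.append(len(column))
    return scores


def remove_fastest_sites(alignment, fraction):
    """Drop the given fraction of most variable columns from every sequence."""
    scores = site_variability(alignment)
    n_remove = int(len(scores) * fraction)
    # Most variable first; Python's sort stays stable with reverse=True,
    # so ties keep their original column order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_remove])
    return {taxon: "".join(c for i, c in enumerate(seq) if i not in drop)
            for taxon, seq in alignment.items()}


# Hypothetical protein alignment for three taxa.
aln = {"taxonA": "MKVL", "taxonB": "MRVI", "taxonC": "MSVF"}
print(site_variability(aln))
print(remove_fastest_sites(aln, 0.5))
```

In practice the trimmed alignment is re-analysed, and a clade that survives removal of the fastest sites is less likely to be a long-branch or saturation artefact.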

14.
15.
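Advances in chloroplast phylogenomics   Times cited: 4 (self-citations: 0; by others: 4)
Phylogenomics is a new interdisciplinary field arising from the combination of phylogenetic research and genomics. In recent years, the advantages of chloroplast-genome-based phylogenomics in plant phylogenetic research have become increasingly evident, offering solutions to systematic problems in some taxonomically difficult groups, although some problems remain. Drawing on typical examples from recent chloroplast phylogenomic studies, this article discusses the value and prospects of chloroplast phylogenomics for reconstructing plant phylogenetic relationships, examines its existing problems, and also considers the impact of next-generation sequencing technologies on chloroplast phylogenomics.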

16.
The availability of numerous universal markers and of suitable phylogenetic analysis methods are both very important for phylogenomic inference. Based on PCR amplification, a total of 122 markers, which were amplified in 19 representative species, were developed for Laurasiatherian phylogenomics. We then illustrated the utility of these newly developed markers using a subset of eight markers. We showed that the 'supermatrix' and 'supertree' approaches generated trees with similar topologies, which accorded with the current understanding of Laurasiatherian phylogeny in most respects. Thus, the markers developed here are likely to contribute to resolving evolutionary relationships and inferring the evolutionary histories of Laurasiatherian mammals in the future.
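The 'supermatrix' approach concatenates the per-marker alignments into one matrix before inferring a single tree, whereas the 'supertree' approach infers one tree per marker and then combines the trees. The sketch below shows only the concatenation step, assuming each marker is a dict of already-aligned sequences; taxa missing from a marker are padded with gaps, and the column range of each marker is kept so that partitioned models can be applied later. Taxon names and sequences are hypothetical.

```python
def concatenate(markers):
    """Build a supermatrix from per-marker alignments.

    markers: list of dicts mapping taxon -> aligned sequence (equal
    lengths within each marker). Taxa absent from a marker get gaps.
    Returns the supermatrix and each marker's (start, end) columns.
    """
    taxa = sorted({t for m in markers for t in m})
    matrix = {t: [] for t in taxa}
    partitions = []
    start = 0
    for m in markers:
        lengths = {len(s) for s in m.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a marker must be aligned")
        length = lengths.pop()
        for t in taxa:
            matrix[t].append(m.get(t, "-" * length))
        partitions.append((start, start + length))
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, partitions


# Two hypothetical markers; "pig" failed to amplify for the second one.
marker1 = {"cow": "ACGT", "pig": "ACGA", "bat": "ACTT"}
marker2 = {"cow": "GG", "bat": "GC"}
supermatrix, parts = concatenate([marker1, marker2])
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(parts)
```

Padding with gaps keeps every taxon in the matrix even when a marker fails to amplify in some species, at the cost of missing data that the tree-inference program must tolerate.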

17.
MOTIVATION: Phylogenomics integrates the vast amount of phylogenetic information contained in complete genome sequences, and is rapidly becoming the standard for reliably inferring species phylogenies. There are, however, fundamental differences between the ways in which phylogenomic approaches like gene content, superalignment, superdistance and supertree integrate the phylogenetic information from separate orthologous groups. Furthermore, they all depend on the method by which the orthologous groups are initially determined. Here, we systematically compare these four phylogenomic approaches, in parallel with three approaches for large-scale orthology determination: pairwise orthology, cluster orthology and tree-based orthology. RESULTS: Including various phylogenetic methods, we apply a total of 54 fully automated phylogenomic procedures to the fungi, the eukaryotic clade with the largest number of sequenced genomes, for which we retrieved a gold-standard phylogeny from the literature. Phylogenomic trees based on gene content show, relative to the other methods, a bias in tree topology that parallels convergence in lifestyle among the species compared, indicating convergence in gene content. CONCLUSIONS: Complete genomes are no guarantee of good or even consistent phylogenies. However, the large amounts of data in genomes enable us to carefully select the data most suitable for phylogenomic inference. In terms of performance, the superalignment approach, combined with restrictive orthology, is the most successful in recovering a fungal phylogeny that agrees with current taxonomic views, and allows us to obtain a high-resolution phylogeny. We provide solid support for what has grown to be a common practice in phylogenomics during its advance in recent years. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
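Gene-content approaches turn the presence or absence of orthologous groups into a distance between genomes. The sketch below assumes one common formulation, in which the number of shared orthologous groups is normalised by the size of the smaller gene set; the genome names and orthologous-group IDs are hypothetical.

```python
def gene_content_distance(groups_a, groups_b):
    """1 - (shared orthologous groups / size of the smaller set).

    groups_a, groups_b: sets of orthologous-group IDs in each genome.
    """
    smaller = min(len(groups_a), len(groups_b))
    if smaller == 0:
        raise ValueError("each genome needs at least one orthologous group")
    shared = len(groups_a & groups_b)
    return 1 - shared / smaller


def distance_matrix(genomes):
    """All pairwise gene-content distances, keyed by (name1, name2)."""
    names = sorted(genomes)
    return {(a, b): gene_content_distance(genomes[a], genomes[b])
            for a in names for b in names}


# Hypothetical presence/absence of orthologous groups (OGs).
genomes = {
    "sp1": {"OG1", "OG2", "OG3", "OG4"},
    "sp2": {"OG1", "OG2", "OG3"},
    "sp3": {"OG1", "OG5"},
}
d = distance_matrix(genomes)
print(d[("sp1", "sp2")], d[("sp1", "sp3")], d[("sp2", "sp3")])
```

The weakness reported in the abstract is visible in this formulation: two unrelated lineages that independently lose the same genes, for example through a shared lifestyle, end up with a small distance even though they are not close relatives.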

18.
Zhang YJ, Ma PF, Li DZ. PLoS ONE, 2011, 6(5): e20596

Background

Bambusoideae is the only subfamily that contains woody members in the grass family, Poaceae. In phylogenetic analyses, Bambusoideae, Pooideae and Ehrhartoideae formed the BEP clade, yet the internal relationships of this clade are controversial. The distinctive life history (infrequent flowering and predominance of asexual reproduction) of woody bamboos makes them an interesting but taxonomically difficult group. Phylogenetic analyses based on large DNA fragments could only provide a moderate resolution of woody bamboo relationships, although a robust phylogenetic tree is needed to elucidate their evolutionary history. Phylogenomics is an alternative choice for resolving difficult phylogenies.

Methodology/Principal Findings

Here we present the complete nucleotide sequences of six woody bamboo chloroplast (cp) genomes using Illumina sequencing. These genomes are similar to those of other grasses and rather conservative in evolution. We constructed a phylogeny of Poaceae from 24 complete cp genomes including 21 grass species. Within the BEP clade, we found strong support for a sister relationship between Bambusoideae and Pooideae. In a substantial improvement over prior studies, all six nodes within Bambusoideae were supported with ≥0.95 posterior probability from Bayesian inference and 5/6 nodes resolved with 100% bootstrap support in maximum parsimony and maximum likelihood analyses. We found that repeats in the cp genome could provide phylogenetic information, while caution is needed when using indels in phylogenetic analyses based on few selected genes. We also identified relatively rapidly evolving cp genome regions that have the potential to be used for further phylogenetic study in Bambusoideae.

Conclusions/Significance

The cp genome of Bambusoideae evolved slowly, and phylogenomics based on whole cp genome could be used to resolve major relationships within the subfamily. The difficulty in resolving the diversification among three clades of temperate woody bamboos, even with complete cp genome sequences, suggests that these lineages may have diverged very rapidly.
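Bootstrap support values like the "100% bootstrap support" reported for the bamboo chloroplast phylogeny come from the nonparametric bootstrap: alignment columns are resampled with replacement, a tree is inferred from each replicate, and a clade's support is the percentage of replicate trees containing it. The sketch below shows only the resampling and the final tally; the tree-inference step (maximum parsimony or maximum likelihood) would be run in a phylogenetics program and is not shown, so the replicate clade sets in the example are hypothetical inputs.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (nonparametric bootstrap)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees whose clade set contains `clade`."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100 * hits / len(replicate_clades)


# Hypothetical alignment; a fixed seed makes the replicate reproducible.
aln = {"a": "ACGTAC", "b": "ACGTTC", "c": "TCGAAC"}
print(bootstrap_replicate(aln, random.Random(1)))

# Hypothetical clades recovered from four replicate trees.
reps = [{frozenset({"a", "b"})}, {frozenset({"a", "b"})},
        {frozenset({"a", "c"})}, {frozenset({"a", "b"})}]
print(clade_support(reps, {"a", "b"}))
```

Resampling whole columns keeps each site's pattern across taxa intact, which is why the bootstrap measures how consistently the sites as a whole support a clade.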

19.
20.
Full genome data sets are currently being explored on a regular basis to infer phylogenetic trees, but there are often discordances among the trees produced by different genes. An important goal in phylogenomics is to identify which individual genes and species produce the same phylogenetic tree and are thus likely to share the same evolutionary history. On the other hand, it is also essential to identify which genes and species produce discordant topologies and therefore evolve in a different way or represent noise in the data. The latter are outlier genes or species, and they can provide a wealth of information on potentially interesting biological processes, such as incomplete lineage sorting, hybridization, and horizontal gene transfers. Here, we propose a new method to explore the genomic tree space and detect outlier genes and species based on multiple co-inertia analysis (MCOA), which efficiently captures and compares the similarities in the phylogenetic topologies produced by individual genes. Our method allows the rapid identification of outlier genes and species by extracting the similarities and discrepancies, in terms of the pairwise distances, between all the species in all the trees, simultaneously. This is achieved by using MCOA, which finds successive decomposition axes from individual ordinations (i.e., derived from distance matrices) that maximize a covariance function. The method is freely available as a set of R functions. The source code and tutorial can be found online at http://phylomcoa.cgenomics.org.
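MCOA itself is a multivariate ordination method and does not fit in a short example; the authors distribute their implementation as R functions. The sketch below is a deliberately simpler stand-in, not MCOA, that shares only the starting point described above: one matrix of pairwise species distances per gene. It averages those matrices into a consensus and flags genes whose distances correlate poorly with it. The gene names, distances and the 0.5 threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def flag_outlier_genes(gene_distances, species, threshold=0.5):
    """Flag genes whose pairwise species distances disagree with the consensus.

    gene_distances: {gene: {(sp1, sp2): distance}} with sp1 < sp2.
    The consensus is the mean distance over genes for each species pair.
    Returns {gene: correlation} for genes correlating below `threshold`.
    """
    pairs = list(combinations(sorted(species), 2))
    vectors = {g: [d[p] for p in pairs] for g, d in gene_distances.items()}
    consensus = [sum(v[i] for v in vectors.values()) / len(vectors)
                 for i in range(len(pairs))]
    scores = {g: pearson(v, consensus) for g, v in vectors.items()}
    return {g: r for g, r in scores.items() if r < threshold}


# Three hypothetical genes support ((A,B),(C,D)); g4 supports ((A,C),(B,D)).
agree = {("A", "B"): 1, ("A", "C"): 3, ("A", "D"): 3,
         ("B", "C"): 3, ("B", "D"): 3, ("C", "D"): 1}
conflict = {("A", "B"): 3, ("A", "C"): 1, ("A", "D"): 3,
            ("B", "C"): 3, ("B", "D"): 1, ("C", "D"): 3}
dists = {"g1": agree, "g2": agree, "g3": agree, "g4": conflict}
print(flag_outlier_genes(dists, ["A", "B", "C", "D"]))
```

Unlike this single-score check, MCOA analyses all gene-wise distance matrices jointly, which is what lets the published method point to outlier species as well as outlier genes.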


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417