Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Composition Vector Tree (CVTree) is an alignment-free algorithm to infer phylogenetic relationships from genome sequences. It has been successfully applied to study phylogeny and taxonomy of viruses, prokaryotes, and fungi based on the whole genomes, as well as chloroplast genomes, mitochondrial genomes, and metagenomes. Here we present the standalone software for the CVTree algorithm. In the software, an extensible parallel workflow for the CVTree algorithm was designed. Based on the workflow, new alignment-free methods were also implemented. By examining the phylogeny and taxonomy of 13,903 prokaryotes based on 16S rRNA sequences, we show that the CVTree software is an efficient and effective tool for studying phylogeny and taxonomy based on genome sequences. The code of the CVTree software is available at https://github.com/ghzuo/cvtree.
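The abstract above does not spell out how a composition vector is built. As a rough illustration only, the sketch below follows the commonly described CVTree recipe: count overlapping k-strings, subtract a Markov-model estimate built from the (k-1)- and (k-2)-string frequencies, and turn the cosine correlation C of two vectors into a dissimilarity (1 - C)/2. The function names, the DNA alphabet and k = 3 are assumptions chosen to keep the example small; they are not taken from the software at the URL above, which may differ in every detail.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumption: DNA input; only k-strings over this alphabet are scored


def kmer_freqs(seq, k):
    """Relative frequencies of all overlapping k-strings in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def composition_vector(seq, k=3):
    """Relative deviation of observed k-string frequencies from a Markov estimate (k >= 3)."""
    f_k = kmer_freqs(seq, k)
    f_k1 = kmer_freqs(seq, k - 1)
    f_k2 = kmer_freqs(seq, k - 2)
    vector = []
    for letters in product(ALPHABET, repeat=k):
        s = "".join(letters)
        middle = f_k2.get(s[1:-1], 0.0)
        # Expected frequency predicted from the two overlapping (k-1)-strings.
        expected = f_k1.get(s[:-1], 0.0) * f_k1.get(s[1:], 0.0) / middle if middle else 0.0
        observed = f_k.get(s, 0.0)
        vector.append((observed - expected) / expected if expected else 0.0)
    return vector


def cv_distance(u, v):
    """Dissimilarity (1 - C) / 2, where C is the cosine correlation of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    correlation = dot / norm if norm else 0.0  # treat an all-zero vector as uncorrelated
    return (1.0 - correlation) / 2.0


# Usage: pairwise dissimilarity of two toy sequences.
seq_a = "ACGTTGCAAGCTTAGGCATCGATCCGTA"
seq_b = "TTGACCAGTAGGCTAACGTTAGCCATGA"
d_ab = cv_distance(composition_vector(seq_a), composition_vector(seq_b))
```

A matrix of such pairwise dissimilarities could then be passed to any distance-based tree-building method; which one the CVTree software uses is not stated in the abstract.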

2.
The advent of next-generation sequencing technologies has greatly promoted the field of metagenomics which studies genetic material recovered directly from an environment. Characterization of genomic composition of a metagenomic sample is essential for understanding the structure of the microbial community. Multiple genomes contained in a metagenomic sample can be identified and quantitated through homology searches of sequence reads with known sequences catalogued in reference databases. Traditionally, reads with multiple genomic hits are assigned to non-specific or high ranks of the taxonomy tree, thereby impacting on accurate estimates of relative abundance of multiple genomes present in a sample. Instead of assigning reads one by one to the taxonomy tree as many existing methods do, we propose a statistical framework to model the identified candidate genomes to which sequence reads have hits. After obtaining the estimated proportion of reads generated by each genome, sequence reads are assigned to the candidate genomes and the taxonomy tree based on the estimated probability by taking into account both sequence alignment scores and estimated genome abundance. The proposed method is comprehensively tested on both simulated datasets and two real datasets. It assigns reads to the low taxonomic ranks very accurately. Our statistical approach of taxonomic assignment of metagenomic reads, TAMER, is implemented in R and available at http://faculty.wcas.northwestern.edu/hji403/MetaR.htm.
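The two-step idea in this abstract (first estimate the proportion of reads generated by each candidate genome, then assign each read using both its alignment evidence and those proportions) can be illustrated with a generic mixture-model EM loop. This is a minimal sketch under stated assumptions, not TAMER's actual model, which the abstract does not detail: `likelihoods[i][g]` is a hypothetical, already computed alignment-derived likelihood of read `i` under genome `g`, every read is assumed to hit at least one genome, and the function names are invented for the example.

```python
def estimate_abundance(likelihoods, n_iter=200, tol=1e-9):
    """EM estimate of genome proportions; returns (proportions, read posteriors)."""
    n_reads = len(likelihoods)
    n_genomes = len(likelihoods[0])
    pi = [1.0 / n_genomes] * n_genomes  # start from uniform abundance
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each read came from each genome.
        resp = []
        for row in likelihoods:
            weighted = [p * l for p, l in zip(pi, row)]
            total = sum(weighted)
            resp.append([w / total for w in weighted])
        # M-step: new proportion = average posterior over all reads.
        new_pi = [sum(r[g] for r in resp) / n_reads for g in range(n_genomes)]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi, resp


def assign_reads(resp):
    """Assign each read to the genome with the highest posterior probability."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Usage: three reads hit only genome 0, one hits only genome 1, two hit both equally.
reads = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] + [[1.0, 1.0]] * 2
proportions, posteriors = estimate_abundance(reads)
assignments = assign_reads(posteriors)
```

In this toy input the two ambiguous reads are resolved in favour of the more abundant genome, which is the behaviour the abstract contrasts with pushing multi-hit reads up to high taxonomic ranks.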

3.
We report an important but long-overlooked manifestation of the low resolution power of 16S rRNA sequence analysis at the species level: in 16S rRNA-based phylogenetic trees, polyphyletic placements of closely related species are abundant compared to those in genome-based phylogeny. This phenomenon makes the demarcation of genera within many families ambiguous in the 16S rRNA-based taxonomy. In this study, we reconstructed the phylogenetic relationships of more than ten thousand prokaryote genomes using the CVTree method, which is based on whole-genome information. Many such genera, which are polyphyletic in 16S rRNA-based trees, are well resolved as monophyletic clusters by CVTree. We believe that with genome sequencing of prokaryotes becoming commonplace, genome-based phylogeny is destined to play a definitive role in the construction of a natural and objective taxonomy.

4.

Background

Metagenomics has great potential to discover previously unattainable information about microbial communities. An important prerequisite for such discoveries is to accurately estimate the composition of microbial communities. Most prevalent homology-based approaches utilize solely the results of an alignment tool such as BLAST, limiting their estimation accuracy to high ranks of the taxonomy tree.

Results

We developed a new homology-based approach called Taxonomic Analysis by Elimination and Correction (TAEC), which utilizes the similarity in the genomic sequence in addition to the result of an alignment tool. The proposed method is comprehensively tested on various simulated benchmark datasets of diverse complexity of microbial structure. Compared with other available methods designed for estimating taxonomic composition at a relatively low taxonomic rank, TAEC demonstrates greater accuracy in quantification of genomes in a given microbial sample. We also applied TAEC to two real metagenomic datasets, an oral cavity dataset and a Crohn’s disease dataset. Our results, while agreeing with previous findings at higher ranks of the taxonomy tree, provide accurate estimation of taxonomic compositions at the species/strain level, narrowing down which species/strains need more attention in the study of the oral cavity and Crohn’s disease.

Conclusions

By taking account of the similarity in the genomic sequence, TAEC outperforms other available tools in estimating taxonomic composition at a very low rank, especially when closely related species/strains exist in a metagenomic sample.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-242) contains supplementary material, which is available to authorized users.

5.
Species evolutionary relationships have traditionally been defined by sequence similarities of phylogenetic marker molecules, recently followed by whole-genome phylogenies based on gene order, average ortholog similarity or gene content. Here, we introduce genome conservation, a novel metric of evolutionary distances between species that simultaneously takes into account both gene content and sequence similarity at the whole-genome level. Genome conservation represents a robust distance measure, as demonstrated by accurate phylogenetic reconstructions. The genome conservation matrix for all presently sequenced organisms exhibits a remarkable ability to define evolutionary relationships across all taxonomic ranges. An assessment of taxonomic ranks with genome conservation shows that certain ranks are inadequately described and raises the possibility for a more precise and quantitative taxonomy in the future. All phylogenetic reconstructions are available at the genome phylogeny server: .

6.
The ultimate goal of taxonomy is to establish a system that mirrors the 'order in nature'. In prokaryote microbiology, almost all taxonomic concepts try to mirror the whole evolutionary order back to the origin of life with the cell as basic unit. The introduction of the 16S rRNA gene as molecular marker allowed for the first time the creation of a hierarchical taxonomic system based on one practical molecular marker. With the development of new and rapid sequencing technologies a wealth of new data can and will be used for critical evaluation of the taxonomic system. Comprehensive analyses of other molecular markers as well as total or partial genome comparisons confirmed the 16S rRNA-based hierarchical system as 'backbone of prokaryote taxonomy' at least at the genus level and above. A tendency is visible to classify novel taxa more and more based on the genotype, i.e. comparative analyses of 16S rRNA and/or other gene sequence data (in multilocus sequence analysis, MLSA) at the genus and the species level, sometimes contrary to the indications of other (often phenotypic) data. The understanding of all the information behind these data is lagging far behind their accumulation. Genes and genomes do not function on their own and can only display their potential within the cell as the basic unit of evolution (and hence taxonomy). It is the phenotype and the natural selection that 'drive' evolution in a given environment. In this context, the 'polyphasic taxonomic approach' should be revisited, taking into account the novel insights into genomes and other 'omic' sciences in a more strict and detailed context with the phenotype. This approach allows a more holistic view and provides a sound basis for describing the diversity of prokaryotes and has the potential to become the foundation of a more stable, in-depth taxonomy of the prokaryotes.

7.
The Composition Vector Tree (CVTree) is a parameter-free and alignment-free method to infer prokaryotic phylogeny from their complete genomes. It is distinct from the traditional 16S rRNA analysis in both the input data and the methodology. The prokaryotic phylogenetic trees constructed by using the CVTree method agree well with the Bergey's taxonomy in all major groupings and fine branching patterns. Thus, combined use of the CVTree approach and the 16S rRNA analysis may provide an objective and reliable reconstruction of the prokaryotic branch of the Tree of Life.

8.
We describe an interactive viewer for the All-Species Living Tree (LVTree). The viewer incorporates treeing and lineage information from the ARB-SILVA website. It allows collapsing the tree branches at different taxonomic ranks and expanding the collapsed branches as well, keeping the overall topology of the tree unchanged. It also enables the user to observe the consequence of trial lineage modifications by re-collapsing the tree. The system reports taxon statistics at all ranks automatically after each collapsing and re-collapsing. These features greatly facilitate the comparison of the 16S rRNA sequence phylogeny with prokaryotic taxonomy in a taxon by taxon manner. In view of the fact that the present prokaryotic systematics is largely based on 16S rRNA sequence analysis, the current viewer may help reveal discrepancies between phylogeny and taxonomy. As an application, we show that in the latest release of LVTree, based on 11,939 rRNA sequences, as few as 24 lineage modifications are enough to bring all but two phyla (Proteobacteria and Firmicutes) to monophyletic clusters.
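The comparison this abstract describes rests on one test: does a taxon form a monophyletic cluster in the tree, and does that change after a trial lineage modification? A minimal sketch of that check follows. It assumes a rooted tree written as nested tuples with leaf names as strings and a hypothetical `taxon_of` mapping from leaf to taxon at one rank; neither the data layout nor the function names come from the viewer itself.

```python
def collect_clades(node, found):
    """Record the leaf set under every node of a nested-tuple tree; return this node's leaves."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(collect_clades(child, found) for child in node))
    found.add(leaves)
    return leaves


def monophyletic_taxa(tree, taxon_of):
    """Map each taxon to True if its members are exactly the leaves of some clade."""
    found = set()
    collect_clades(tree, found)
    members = {}
    for leaf, taxon in taxon_of.items():
        members.setdefault(taxon, set()).add(leaf)
    return {taxon: frozenset(leaves) in found for taxon, leaves in members.items()}


# Usage: taxon B is split across two branches, so it is not monophyletic.
tree = ((("a1", "a2"), "b1"), ("b2", "c1"))
taxon_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C"}
before = monophyletic_taxa(tree, taxon_of)

# A trial lineage modification: reassign b1 to taxon A and check again.
taxon_of["b1"] = "A"
after = monophyletic_taxa(tree, taxon_of)
```

Re-running the check after each reassignment mirrors, in miniature, the collapse and re-collapse cycle with per-rank taxon statistics that the viewer is said to automate.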

9.
At advanced stages of working with user-defined protein and gene sequence collections, it is frequently necessary to link these data to the taxonomic tree and to extract subsets in accordance with taxonomic considerations. Since no general automatic tools had been available, this was a tedious manual effort. Our taxonomy workbench allows processing of sequence sets, mapping of these sets onto the taxonomic tree, collection of taxonomic subsets from them and printing of the whole tree or some part of it. As a side effect, the system enables queries to and navigation within the taxonomy database. AVAILABILITY: An implementation of the taxonomy workbench is accessible for public use as a www-service at http://mendel.imp.univie.ac.at/taxonomy/. Software components for the command-line and for the www-version are available on request. CONTACT: Georg.Schneider@nt.imp.univie.ac.at; Frank.Eisenhaber@nt.imp.univie.ac.at SUPPLEMENTARY INFORMATION: Documentation for the taxonomy workbench can be accessed at http://mendel.imp.univie.ac.at/taxonomy/help.html.

10.
We perform an exhaustive, taxon by taxon, comparison of the branchings in the composition vector trees (CVTrees) inferred from 432 prokaryotic genomes available on 31 December 2006, with the bacteriologists’ taxonomy, primarily the latest online Outline of the Bergey’s Manual of Systematic Bacteriology. The CVTree phylogeny agrees very well with the Bergey’s taxonomy in the majority of fine branchings and overall structures. At the same time, most of the differences between the trees and the Manual have been known to biologists to some extent and may hint at taxonomic revisions. Instead of demonstrating the overwhelming agreement, this paper puts emphasis on the biological implications of the differences.

11.
We perform an exhaustive, taxon by taxon, comparison of the branchings in the composition vector trees (CVTrees) inferred from 432 prokaryotic genomes available on 31 December 2006, with the bacteriologists' taxonomy, primarily the latest online Outline of the Bergey's Manual of Systematic Bacteriology. The CVTree phylogeny agrees very well with the Bergey's taxonomy in the majority of fine branchings and overall structures. At the same time, most of the differences between the trees and the Manual have been known to biologists to some extent and may hint at taxonomic revisions. Instead of demonstrating the overwhelming agreement, this paper puts emphasis on the biological implications of the differences.

12.
The classification of life forms into a hierarchical system (taxonomy) and the application of names to this hierarchy (nomenclature) is at a turning point in microbiology. The unprecedented availability of genome sequences means that a taxonomy can be built upon a comprehensive evolutionary framework, a longstanding goal of taxonomists. However, there is resistance to adopting a single framework to preserve taxonomic freedom, and ever increasing numbers of genomes derived from uncultured prokaryotes threaten to overwhelm current nomenclatural practices, which are based on characterised isolates. The challenge ahead then is to reach a consensus on the taxonomic framework and to adapt and scale the existing nomenclatural code, or create a new code, to systematically incorporate uncultured taxa into the chosen framework. Subject terms: Archaea, Bacteria

13.
Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree’ approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

14.
The recent multiplication of cladistic hypotheses for many zoological groups poses a challenge to zoological nomenclature following the International Code of Zoological Nomenclature: in order to account for these hypotheses, we will need many more ranks than currently allowed in this system, especially in lower taxonomy (around the ranks genus and species). The current Code allows the use of as many ranks as necessary in the family-series of nomina (except above superfamily), but forbids the use of more than a few ranks in the genus and species-series. It is here argued that this limitation has no theoretical background, does not respect the freedom of taxonomic thoughts or actions, and is harmful to zoological taxonomy in two respects at least: (1) it does not allow hypothesized cladistic relationships among taxa at lower taxonomic levels (genus and species) to be expressed in detail; (2) it does not allow low-level differentiation between populations of the same species to be pointed out taxonomically, although this would be useful in some cases for conservation biology purposes. It is here proposed to modify the rules of the Code in order to allow use by taxonomists of an indeterminate number of ranks in all nominal-series. Such an 'expanded nomenclatural system' would be highly flexible and likely to be easily adapted to any new finding or hypothesis regarding cladistic relationships between taxa, at genus and species level and below. This system could be useful for phylogeographic analysis and in conservation biology. In zoological nomenclature, whereas robustness of nomina is necessary, the same does not hold for nomenclatural ranks, as the latter are arbitrary and carry no special biological, evolutionary or other information, except concerning the mutual relationships between taxa in the taxonomic hierarchy. Compared to the Phylocode project, the new system is equally unambiguous within a given taxonomic frame, but it provides more explicit and informative nomina for non-specialist users, and is more economical in terms of the number of nomina needed to account for a given hierarchy. These ideas are exemplified by a comparative study of three possible nomenclatures for the taxonomy recently proposed by Hillis and Wilcox (2005) for American frogs traditionally referred to the genus Rana.

15.

Background

The correct taxonomic assignment of bacterial genomes is a primary and challenging task. With the availability of whole genome sequences, the gene content based approaches appear promising in inferring the bacterial taxonomy. The complete genome sequencing of a bacterial genome often reveals a substantial number of unique genes present only in that genome which can be used for its taxonomic classification.

Results

In this study, we have proposed a comprehensive method which uses the taxon-specific genes for the correct taxonomic assignment of existing and new bacterial genomes. The taxon-specific genes identified at each taxonomic rank have been successfully used for the taxonomic classification of 2,342 genomes present in the NCBI genomes, 36 newly sequenced genomes, and 17 genomes for which the complete taxonomy is not yet known. This approach has been implemented for the development of a tool ‘Microtaxi’ which can be used for the taxonomic assignment of complete bacterial genomes.

Conclusion

The taxon-specific gene-based approach provides a valuable alternative methodology for the taxonomic classification of newly sequenced or existing bacterial genomes.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1542-0) contains supplementary material, which is available to authorized users.

16.
Microsporidia are obligate intracellular parasites related to fungi, and since their discovery their classification and origin have been controversial due to their unique morphology. Early taxonomic studies of microsporidia were based on ultrastructural spore features, characteristics of their life cycle and transmission modes. However, taxonomy and phylogeny based solely on these characteristics can be misleading. SSU rRNA is a traditional marker used in taxonomic classifications, but the power of SSU rRNA to resolve phylogenetic relationships between microsporidia is considered weak at the species level, as it may not show enough variation to distinguish closely related species. Overall genome relatedness indices (OGRI), such as average nucleotide identity (ANI), allow fast and easy-to-implement comparative measurements between genomes to assess species boundaries in prokaryotes, with a 95% cutoff value for grouping genomes of the same species. Due to the increasing availability of complete genomes, metrics of genome relatedness have been applied to the taxonomy of eukaryotic microbes such as microsporidia. However, the distribution of ANI values and cutoff values for species delimitation have not yet been fully tested in microsporidia. In this study we examined the distribution of ANI values for 65 publicly available microsporidian genomes and tested whether the 95% cutoff value is a good estimation for circumscribing species based on their genetic relatedness.

17.
Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation.
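The genus criterion stated in this abstract (all pairwise POCP values above 50%) is easy to express in code. The sketch below assumes the usual form of the index, in which the conserved-protein counts of both genomes are divided by their total protein counts; that formula is not given in the abstract, and the function and argument names are hypothetical. How a protein is judged "conserved" (the alignment thresholds) is left outside the example.

```python
from itertools import combinations


def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes (assumed formula)."""
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def is_single_genus(pocp_values, species, threshold=50.0):
    """True if every pair of species has a POCP strictly above the threshold."""
    return all(
        pocp_values[frozenset(pair)] > threshold
        for pair in combinations(species, 2)
    )


# Usage with made-up POCP values keyed by unordered species pairs.
values = {
    frozenset(("sp_A", "sp_B")): 62.0,
    frozenset(("sp_A", "sp_C")): 55.0,
    frozenset(("sp_B", "sp_C")): 48.0,
}
same_genus_ab = is_single_genus(values, ["sp_A", "sp_B"])
same_genus_abc = is_single_genus(values, ["sp_A", "sp_B", "sp_C"])
```

Requiring every pair to pass, rather than the average, follows the wording of the abstract: a single pair below 50% is enough to break the group.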

18.
DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Basing the species concept for prokaryotes ultimately on DDH was chosen by microbiologists as a pragmatic approach for deciding on the recognition of novel species, and it also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes is dependent on multiple choices for software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).

19.
MOTIVATION: We explored the feasibility of using unaligned rRNA gene sequences as DNA barcodes, based on correlation analysis of composition vectors (CVs) derived from nucleotide strings. We tested this method with seven rRNA (including 12, 16, 18, 26 and 28S) datasets from a wide variety of organisms (from archaea to tetrapods) at taxonomic levels ranging from class to species. RESULT: Our results indicate that grouping of taxa based on CV analysis is always in good agreement with the phylogenetic trees generated by traditional approaches, although in some cases the relationships among the higher systemic groups may differ. The effectiveness of our analysis might be related to the length and divergence among sequences in a dataset. Nevertheless, the correct grouping of sequences and accurate assignment of unknown taxa make our analysis a reliable and convenient approach in analyzing unaligned sequence datasets of various rRNAs for barcoding purposes. AVAILABILITY: The newly designed software (CVTree 1.0) is publicly available at the Composition Vector Tree (CVTree) web server http://cvtree.cbi.pku.edu.cn.

20.