Similar articles
 20 similar articles found (search time: 78 ms)
1.
Increasingly large datasets of 16S rRNA gene sequences reveal new information about the extent of microbial diversity and the surprising extent of the rare biosphere. Currently, many of the largest datasets are represented by short and variable ribosomal sequence tags (RSTs) that are limited in their ability to accurately assign sequences to broad-scale phylogenetic trees. In this study, we selected 30 rare RSTs from existing sequence datasets and designed primers to amplify c. 1400 bases of the 16S rRNA gene to determine whether these sequences were represented by existing databases or if they might reveal new lineages within the Bacteria. Approximately one-third of the RST primers successfully amplified longer portions of these low-abundance 16S rRNA genes in a specific manner. Subsequent phylogenetic analysis demonstrated that most of these sequences were (1) distantly related to existing cultivated microorganisms and (2) closely related to uncultivated clone sequences that were recently deposited in GenBank. The presence of so many recently collected 16S rRNA gene reference sequences in existing databases suggests that progress is being made quickly towards a microbial census, one which has begun scratching the surface of the 'rare biosphere'.

2.
MOTIVATION: We explored the feasibility of using unaligned rRNA gene sequences as DNA barcodes, based on correlation analysis of composition vectors (CVs) derived from nucleotide strings. We tested this method with seven rRNA (including 12, 16, 18, 26 and 28S) datasets from a wide variety of organisms (from archaea to tetrapods) at taxonomic levels ranging from class to species. RESULT: Our results indicate that grouping of taxa based on CV analysis is always in good agreement with the phylogenetic trees generated by traditional approaches, although in some cases the relationships among the higher systemic groups may differ. The effectiveness of our analysis might be related to the length and divergence among sequences in a dataset. Nevertheless, the correct grouping of sequences and accurate assignment of unknown taxa make our analysis a reliable and convenient approach in analyzing unaligned sequence datasets of various rRNAs for barcoding purposes. AVAILABILITY: The newly designed software (CVTree 1.0) is publicly available at the Composition Vector Tree (CVTree) web server http://cvtree.cbi.pku.edu.cn.
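
The idea of comparing unaligned sequences through composition vectors can be illustrated with a minimal sketch. It is not the CVTree implementation: it counts overlapping k-mers and uses a plain cosine correlation, and it omits any background correction the published method may apply. The function names, the default k = 3 and the distance definition (1 minus the correlation) are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt


def composition_vector(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a nucleotide string (no alignment needed)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cv_correlation(a: Counter, b: Counter) -> float:
    """Cosine correlation between two k-mer count vectors."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # sequence shorter than k: no k-mers to compare
    return dot / (norm_a * norm_b)


def cv_distance(seq1: str, seq2: str, k: int = 3) -> float:
    """Illustrative distance: near 0 for identical composition, 1 when no k-mers are shared."""
    corr = cv_correlation(composition_vector(seq1, k), composition_vector(seq2, k))
    return 1.0 - corr
```

A pairwise matrix of such distances could then be passed to any distance-based tree-building method; which one the authors used is not stated in the abstract.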

3.
The signing authors together with the journal Systematic and Applied Microbiology (SAM) have started an ambitious project that has been conceived to provide a useful tool especially for the scientific microbial taxonomist community. The aim of what we have called "The All-Species Living Tree" is to reconstruct a single 16S rRNA tree harboring all sequenced type strains of the hitherto classified species of Archaea and Bacteria. This tree is to be regularly updated by adding the species with validly published names that appear monthly in the Validation and Notification lists of the International Journal of Systematic and Evolutionary Microbiology. For this purpose, the SAM executive editors, together with the responsible teams of the ARB, SILVA, and LPSN projects (www.arb-home.de, www.arb-silva.de, and www.bacterio.cict.fr, respectively), have prepared a 16S rRNA database containing over 6700 sequences, each of which represents a single type strain of a classified species up to 31 December 2007. The selection of sequences had to be undertaken manually due to a high error rate in the names and information fields provided for the publicly deposited entries. In addition, from among the often occurring multiple entries for a single type strain, the best-quality sequence was selected for the project. The living tree database that SAM now provides contains corrected entries and the best-quality sequences with a manually checked alignment. The tree reconstruction has been performed by using the maximum likelihood algorithm RAxML. The tree provided in the first release is a result of the calculation of a single dataset containing 9975 single entries, 6728 corresponding to type strain gene sequences, as well as 3247 additional high-quality sequences to give robustness to the reconstruction. Trees are dynamic structures that change on the basis of the quality and availability of the data used for their calculation. Therefore, the addition of new type strain sequences in subsequent releases may help to resolve certain branching orders that appear ambiguous in this first release. On the web sites www.elsevier.de/syapm and www.arb-silva.de/living-tree, the All-Species Living Tree team will release a regularly updated database compatible with the ARB software environment containing the whole 16S rRNA dataset used to reconstruct "The All-Species Living Tree". As a result, the latest reconstructed phylogeny will be provided. In addition to the ARB file, a readable multi-FASTA universal sequence editor file with the complete alignment will be provided for those not using ARB. There is also a complete set of supplementary tables and figures illustrating the selection procedure and its outcome. It is expected that the All-Species Living Tree will help to improve future classification efforts by simplifying the selection of the correct type strain sequences. For queries, information updates, remarks on the dataset or tree reconstructions shown, a contact email address has been created (living-tree@arb-silva.de). This provides an entry point for anyone from the scientific community to provide additional input for the construction and improvement of the first tree compiling all sequenced type strains of all prokaryotic species for which names had been validly published.

4.
5.
Page RD 《Nucleic acids research》2000,28(20):3839-3845
Comparative analysis is the preferred method of inferring RNA secondary structure, but its use requires considerable expertise and manual effort. As the importance of secondary structure for accurate sequence alignment and phylogenetic analysis becomes increasingly realised, the need for secondary structure models for diverse taxonomic groups becomes more pressing. The number of available structures bears little relation to the relative diversity or importance of the different taxonomic groups. Insects, for example, comprise the largest group of animals and yet are very poorly represented in secondary structure databases. This paper explores the utility of maximum weighted matching (MWM) to help automate the process of comparative analysis by inferring secondary structure for insect mitochondrial small subunit (12S) rRNA sequences. By combining information on correlated changes in substitutions and helix dot plots, MWM can rapidly generate plausible models of secondary structure. These models can be further refined using standard comparative techniques. This paper presents a secondary structure model for insect 12S rRNA based on an alignment of 225 insect sequences and an alignment for 16 exemplar insect sequences. This alignment is used as a template for a web server that automatically generates secondary structures for insect sequences.

6.
Phylogenetic analysis of Glomeromycota by partial LSU rDNA sequences   Cited by: 2 (self-citations: 0; citations by others: 2)
We analyzed the large subunit ribosomal RNA (rRNA) gene [LSU ribosomal DNA (rDNA)] as a phylogenetic marker for arbuscular mycorrhizal (AM) fungal taxonomy. Partial LSU rDNA sequences were obtained from ten AM fungal isolates, comprising seven species, with two new primers designed for Glomeromycota LSU rDNA. The sequences, together with 58 sequences available from the databases, represented 31 AM fungal species. Neighbor joining and parsimony analyses were performed with the aim of evaluating the potential of the LSU rDNA for phylogenetic resolution. The resulting trees indicated that Archaeosporaceae are a basal group in Glomeromycota, Acaulosporaceae and Gigasporaceae belong to the same clade, while Glomeraceae are polyphyletic. The results support data obtained with the small subunit (SSU) rRNA gene, demonstrating that the LSU rRNA gene is a useful molecular marker for clarifying taxonomic and phylogenetic relationships in Glomeromycota.

7.
Tangherlini M., Miralto M., Colantuono C., Sangiovanni M., Dell'Anno A., Corinaldesi C., Danovaro R., Chiusano M. L. 《BMC bioinformatics》2018,19(15):443-143

Background

Environmental metagenomics is a challenging approach that is exponentially spreading in the scientific community to investigate taxonomic diversity and possible functions of the biological components. The massive amount of sequence data produced, often endowed with rich environmental metadata, needs suitable computational tools to fully explore the embedded information. Bioinformatics plays a key role in providing methodologies to manage, process and mine molecular data, integrated with environmental metagenomics collections. One such relevant example is represented by the Tara Ocean Project.

Results

We considered the Tara 16S miTAGs released by the consortium, representing raw sequences from a shotgun metagenomics approach with similarities to 16S rRNA genes. We generated assembled 16S rDNA sequences, which were classified according to their lengths, the possible presence of chimeric reads, and their putative taxonomic affiliation. The dataset was included in GLOSSary (the GLobal Ocean 16S Subunit web accessible resource), a bioinformatics platform to organize environmental metagenomics data. The aims of this work were: i) to present alternative computational approaches to manage challenging metagenomics data; ii) to set up user friendly web-based platforms to allow the integration of environmental metagenomics sequences and of the associated metadata; iii) to implement an appropriate bioinformatics platform supporting the analysis of 16S rDNA sequences exploiting reference datasets, such as the SILVA database. We organized the data in a next-generation NoSQL “schema-less” database, allowing flexible organization of large amounts of data and supporting native geospatial queries. A web interface was developed to permit an interactive exploration and a visual geographical localization of the data, either raw miTAG reads or 16S contigs, from our processing pipeline. Information on unassembled sequences is also available. The taxonomic affiliations of contigs and miTAGs, and the spatial distribution of the sampling sites and their associated sequence libraries, as they are contained in the Tara metadata, can be explored by a query interface, which allows both textual and visual investigations. In addition, all the sequence data were made available for a dedicated BLAST-based web application alongside the SILVA collection.

Conclusions

GLOSSary provides an expandable bioinformatics environment, able to support the scientific community in current and forthcoming environmental metagenomics analyses.

8.
Marine sponges often contain diverse and abundant communities of microorganisms including bacteria, archaea and eukaryotic microbes. Numerous 16S rRNA-based studies have identified putative 'sponge-specific' microbes that are apparently absent from seawater and other (non-sponge) marine habitats. With more than 7500 sponge-derived rRNA sequences (from clone, isolate and denaturing gradient gel electrophoresis data) now publicly available, we sought to determine whether the current notion of sponge-specific sequence clusters remains valid. Comprehensive phylogenetic analyses were performed on the 7546 sponge-derived 16S and 18S rRNA sequences that were publicly available in early 2010. Overall, 27% of all sequences fell into monophyletic, sponge-specific sequence clusters. Such clusters were particularly well represented among the Chloroflexi, Cyanobacteria, 'Poribacteria', Betaproteobacteria and Acidobacteria, and in total were identified in at least 14 bacterial phyla, as well as the Archaea and fungi. The largest sponge-specific cluster, representing the cyanobacterium 'Synechococcus spongiarum', contained 245 sequences from 40 sponge species. These results strongly support the existence of sponge-specific microbes and provide a suitable framework for future studies of rare and abundant sponge symbionts, both of which can now be studied using next-generation sequencing technologies.

9.
Comparing bacterial 16S rDNA sequences to GenBank and other large public databases via BLAST often provides results of little use for identification and taxonomic assignment of the organisms of interest. The human microbiome, and in particular the oral microbiome, includes many taxa, and accurate identification of sequence data is essential for studies of these communities. For this purpose, a phylogenetically curated 16S rDNA database of the core oral microbiome, CORE, was developed. The goal was to include a comprehensive and minimally redundant representation of the bacteria that regularly reside in the human oral cavity with computationally robust classification at the level of species and genus. Clades of cultivated and uncultivated taxa were formed based on sequence analyses using multiple criteria, including maximum-likelihood-based topology and bootstrap support, genetic distance, and previous naming. A number of classification inconsistencies for previously named species, especially at the level of genus, were resolved. The performance of the CORE database for identifying clinical sequences was compared to that of three publicly available databases, GenBank nr/nt, RDP and HOMD, using a set of sequencing reads that had not been used in creation of the database. CORE offered improved performance compared to other public databases for identification of human oral bacterial 16S sequences by a number of criteria. In addition, the CORE database and phylogenetic tree provide a framework for measures of community divergence, and the focused size of the database offers advantages of efficiency for BLAST searching of large datasets. The CORE database is available as a searchable interface and for download at http://microbiome.osu.edu.

10.
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.

11.
Labyrinthulomycetes are heterotrophic stramenopiles that are ubiquitous in a wide range of both marine and freshwater habitats and play important roles in decomposition of organic matter. The diversity and taxonomy of Labyrinthulomycetes has been studied for many years, but we nevertheless lack both a comprehensive reference database and up-to-date phylogeny including all known diversity, which hinders many global insights into their ecological distribution and the relative importance of various subgroups in different environments. Here, we present a curated reference database and a phylogenetic tree of Labyrinthulomycetes small subunit ribosomal RNA (SSU or 18S rRNA) data. Using this reference database, we analyzed high-throughput environmental sequencing data, revealing many previously unknown environmental clades and exploring the ecological distribution of various subgroups. In particular, a number of newly identified environmental clades are widespread in the open ocean. Comparing the manually curated reference database to existing tools for identification of environmental sequences (e.g. PR2 or SILVA databases) suggests that the curated database provides a higher degree of specificity and a lower frequency of misidentification. The phylogenetic framework and database will be a useful tool for future ecological and evolutionary studies.

12.
Sequencing of taxonomic or phylogenetic markers is becoming a fast and efficient method for studying environmental microbial communities. This has resulted in a steadily growing collection of marker sequences, most notably of the small-subunit (SSU) ribosomal RNA gene, and an increased understanding of microbial phylogeny, diversity and community composition patterns. However, to utilize these large datasets together with new sequencing technologies, a reliable and flexible system for taxonomic classification is critical. We developed CREST (Classification Resources for Environmental Sequence Tags), a set of resources and tools for generating and utilizing custom taxonomies and reference datasets for classification of environmental sequences. CREST uses an alignment-based classification method with the lowest common ancestor algorithm. It also uses explicit rank similarity criteria to reduce false positives and identify novel taxa. We implemented this method in a web server, a command line tool and the graphical user interface program MEGAN. Further, we provide the SSU rRNA reference database and taxonomy SilvaMod, derived from the publicly available SILVA SSURef, for classification of sequences from bacteria, archaea and eukaryotes. Using cross-validation and environmental datasets, we compared the performance of CREST and SilvaMod to the RDP Classifier. We also utilized Greengenes as a reference database, both with CREST and the RDP Classifier. These analyses indicate that CREST performs better than alignment-free methods with higher recall rate (sensitivity) as well as precision, and with the ability to accurately identify most sequences from novel taxa. Classification using SilvaMod performed better than with Greengenes, particularly when applied to environmental sequences. CREST is freely available under a GNU General Public License (v3) from http://apps.cbu.uib.no/crest and http://lcaclassifier.googlecode.com.
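
The lowest common ancestor step named above can be sketched as follows. This is a generic illustration and not CREST's code: alignment hits are assumed to arrive as (bit score, lineage) pairs, the `min_fraction_of_best` cutoff of 0.98 is a hypothetical parameter, and the rank-specific similarity criteria mentioned in the abstract are left out.

```python
from typing import List, Sequence, Tuple


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest shared prefix of root-to-leaf lineages."""
    if not lineages:
        return []
    shared: List[str] = []
    for ranks in zip(*lineages):  # walk rank by rank, stopping at the shortest lineage
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def classify(hits: Sequence[Tuple[float, Sequence[str]]],
             min_fraction_of_best: float = 0.98) -> List[str]:
    """Keep hits scoring close to the best hit, then assign their common ancestor.

    The cutoff is an assumption for illustration, not a documented default.
    """
    if not hits:
        return []
    best = max(score for score, _ in hits)
    kept = [lineage for score, lineage in hits if score >= best * min_fraction_of_best]
    return lowest_common_ancestor(kept)
```

With two near-equal hits in different genera of the same class, the query is assigned to the class rather than to either genus, which is how an LCA approach avoids over-confident assignments.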

13.
The sequence of the Gyrodactylus salaris Malmberg, 1957, large subunit, or 28S, ribosomal RNA (rRNA) gene has been determined. This gene is the final portion of the Gyrodactylus rRNA gene operon to be sequenced and results in the first complete sequence of all rRNA genes and spacers from a monogenean. The nucleotide sequence was used to predict the secondary structure of the large subunit rRNA, and regions of conserved and variable sequence and structure were identified. The site where the 5' terminus of the 5.8S rRNA binds to a region within the large subunit rRNA was predicted and complements the anticipated interaction of the 3' terminus of the 5.8S with the 5' terminus of the large subunit rRNA. The large subunit gene may be useful in phylogenetic analysis of the Monogenea or Platyhelminthes and comparisons with other eukaryotes. The variable domains C and H may be most suitable for this purpose.

14.
Nucleotide sequences were determined for the rRNA internal transcribed spacers 1 and 2 (ITS1 and 2) and the 5' terminus of the large subunit rRNA in selected Gyrodactylus species. Examination of primary sequence variation and secondary structure models in ITS2 and variable region V4 of the small subunit rRNA revealed that structure was largely conserved despite significant variation in sequence. ITS1 sequences were highly variable, and models of structure were unreliable but, despite this, show some resemblance to structures predicted in Digenea. ITS2 models demonstrated binding of the 3' end of 5.8S rRNA to the 5' end of the large subunit rRNA and enabled the termini of these genes to be defined with greater confidence than previously. The structure model shown here may prove useful in future phylogenetic analyses.

15.
GenBank
GenBank® is a comprehensive sequence database that contains publicly available DNA sequences for more than 119 000 different organisms, obtained primarily through the submission of sequence data from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the BankIt (web) or Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in the UK and the DNA Data Bank of Japan helps ensure worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI home page at: http://www.ncbi.nlm.nih.gov.

16.
Taxonomic classification of the thousands–millions of 16S rRNA gene sequences generated in microbiome studies is often achieved using a naïve Bayesian classifier (for example, the Ribosomal Database Project II (RDP) classifier), due to favorable trade-offs among automation, speed and accuracy. The resulting classification depends on the reference sequences and taxonomic hierarchy used to train the model; although the influence of primer sets and classification algorithms has been explored in detail, the influence of training set has not been characterized. We compared classification results obtained using three different publicly available databases as training sets, applied to five different bacterial 16S rRNA gene pyrosequencing data sets (generated from human body, mouse gut, python gut, soil and anaerobic digester samples). We observed numerous advantages to using the largest, most diverse training set available, which we constructed from the Greengenes (GG) bacterial/archaeal 16S rRNA gene sequence database and the latest GG taxonomy. Phylogenetic clusters of previously unclassified experimental sequences were identified with notable improvements (for example, 50% reduction in reads unclassified at the phylum level in mouse gut, soil and anaerobic digester samples), especially for phylotypes belonging to specific phyla (Tenericutes, Chloroflexi, Synergistetes and Candidate phyla TM6, TM7). Trimming the reference sequences to the primer region resulted in systematic improvements in classification depth, and greatest gains at higher confidence thresholds. Phylotypes unclassified at the genus level represented a greater proportion of the total community variation than classified operational taxonomic units in mouse gut and anaerobic digester samples, underscoring the need for greater diversity in existing reference databases.
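
Why the result depends on the training set becomes clearer from a minimal sketch of a word-based naive Bayesian classifier. This is a simplified illustration in the spirit of such classifiers, not the RDP Classifier itself: the class name, the smoothing constants and the absence of bootstrap confidence estimation are all assumptions of this sketch.

```python
from collections import defaultdict
from math import log


def words(seq, k=8):
    """Set of overlapping k-mers ("words") present in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Toy classifier: scores each taxon by the words seen in its training sequences."""

    def __init__(self, k=8):
        self.k = k
        self.n_total = 0                          # all training sequences
        self.word_seq_count = defaultdict(int)    # sequences containing each word
        self.taxon_seq_count = defaultdict(int)   # training sequences per taxon
        self.taxon_word_count = defaultdict(lambda: defaultdict(int))

    def train(self, seq, taxon):
        self.n_total += 1
        self.taxon_seq_count[taxon] += 1
        for w in words(seq, self.k):
            self.word_seq_count[w] += 1
            self.taxon_word_count[taxon][w] += 1

    def classify(self, seq):
        """Return the taxon with the highest summed log-probability, or None if untrained."""
        query = words(seq, self.k)
        best_taxon, best_score = None, float("-inf")
        for taxon, m_total in self.taxon_seq_count.items():
            counts = self.taxon_word_count[taxon]
            score = 0.0
            for w in query:
                # Smoothed overall word frequency, used as a prior for unseen words.
                prior = (self.word_seq_count.get(w, 0) + 0.5) / (self.n_total + 1)
                score += log((counts.get(w, 0) + prior) / (m_total + 1))
            if score > best_score:
                best_taxon, best_score = taxon, score
        return best_taxon
```

Because every probability is estimated from the training sequences alone, a query can only ever be assigned to a taxon present in the training set, so a larger and more diverse reference collection directly changes which labels are reachable.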

17.

Background  

The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research.

18.
Tetrahymena thermophila mitochondrial DNA is a linear molecule with two tRNAs, large subunit beta (LSU beta) rRNA (21S rRNA) and LSU alpha rRNA (5.8S-like RNA) encoded near each terminus. The DNA sequence of approximately 550 bp of this region was determined in six species of Tetrahymena. In three species the LSU beta rRNA and tRNA(leu) genes were not present on one end of the DNA, demonstrating a mitochondrial genome organization different from that of T. thermophila. The DNA sequence of the LSU alpha rRNA was used to construct a mitochondrial phylogenetic tree, which was found to be topologically equivalent to a phylogenetic tree based on nuclear small subunit rRNA sequences (Sogin et al. (1986) EMBO J. 5, 3625-3630). The mitochondrial rRNA gene was found to accumulate base-pair substitutions considerably faster than the nuclear rRNA gene, the rate difference being similar to that observed for mammals.

19.
The ribosomal RNA (rRNA) gene region of the microsporidium Heterosporis anguillarum has been examined. Complete DNA sequence data (4060 bp, GenBank Accession No. AF402839) of the rRNA gene of H. anguillarum are presented for the small subunit gene (SSU rRNA: 1359 bp), the internal transcribed spacer (ITS: 37 bp), and the large subunit gene (LSU rRNA: 2664 bp). The secondary structures of the H. anguillarum SSU and LSU rRNA genes are constructed and described. This is the first complete sequence of an rRNA gene published for a fish-infecting microsporidian species. In the phylogenetic analysis, the sequences, including partial SSU rRNA, ITS, and partial LSU rRNA sequences of the fish-infecting microsporidia, were aligned and analysed. The taxonomic position of H. anguillarum as suggested by Lom et al. (2000; Dis Aquat Org 43:225-231) is confirmed in this paper.

20.

Background  

Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study.
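
The supermatrix idea, including the case where a gene is unrepresented for a species, can be shown with a short sketch. It is not taken from TaxMan: the function name, the taxon-to-sequence dictionary layout and the use of "?" as the missing-data symbol are assumptions made for illustration.

```python
from typing import Dict, List


def build_supermatrix(alignments: List[Dict[str, str]], missing: str = "?") -> Dict[str, str]:
    """Concatenate per-gene alignments (taxon -> aligned sequence) into one matrix.

    Taxa absent from a gene are padded with `missing` so every row keeps the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) > 1:
            raise ValueError("sequences within one gene alignment must have equal length")
        width = lengths.pop() if lengths else 0
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```

Padding with a missing-data symbol keeps all taxa in the analysis; dropping taxa that lack a gene is the other common choice and trades taxon coverage for a denser matrix.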
