Similar literature
 20 similar articles found
1.
The advent of next generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7 × 10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparison of communities should be made using an equal number of sequences.
Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that, even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects remained that could potentially confound the interpretation of microbial community data.

2.
A new computer program, called Mallard, is presented for screening entire 16S rRNA gene libraries of up to 1,000 sequences for chimeras and other artifacts. Written in the Java computer language and capable of running on all major operating systems, the program provides a novel graphical approach for visualizing phylogenetic relationships among 16S rRNA gene sequences. To illustrate its use, we analyzed most of the large libraries of cloned bacterial 16S rRNA gene sequences submitted to the public repository during 2005. Defining a large library as one containing 100 or more sequences of 1,200 bases or greater, we screened 25 of the 28 libraries and found that all but three contained substantial anomalies. Overall, 543 anomalous sequences were found. The average anomaly content per clone library was 9.0%, 4% higher than that previously estimated for the public repository overall. In addition, 90.8% of anomalies had characteristic chimeric patterns, a rise of 25.4% over that found previously. One library alone was found to contain 54 chimeras, representing 45.8% of its content. These figures far exceed previous estimates of artifacts within public repositories and further highlight the urgent need for all researchers to adequately screen their libraries prior to submission. Mallard is freely available from our website at http://www.cardiff.ac.uk/biosi/research/biosoft/.

3.
DECIPHER is a new method for finding 16S rRNA chimeric sequences by the use of a search-based approach. The method is based upon detecting short fragments that are uncommon in the phylogenetic group where a query sequence is classified but frequently found in another phylogenetic group. The algorithm was calibrated for full sequences (fs_DECIPHER) and short sequences (ss_DECIPHER) and benchmarked against WigeoN (Pintail), ChimeraSlayer, and Uchime using artificially generated chimeras. Overall, ss_DECIPHER and Uchime provided the highest chimera detection for sequences 100 to 600 nucleotides long (79% and 81%, respectively), but Uchime's performance deteriorated for longer sequences, while ss_DECIPHER maintained a high detection rate (89%). Both methods had low false-positive rates (1.3% and 1.6%). The more conservative fs_DECIPHER, benchmarked only for sequences longer than 600 nucleotides, had an overall detection rate lower than that of ss_DECIPHER (75%) but higher than those of the other programs. In addition, fs_DECIPHER had the lowest false-positive rate among all the benchmarked programs (<0.20%). DECIPHER was outperformed only by ChimeraSlayer and Uchime when chimeras were formed from closely related parents (less than 10% divergence). Given the differences in the programs, it was possible to detect over 89% of all chimeras with just the combination of ss_DECIPHER and Uchime. Using fs_DECIPHER, we detected between 1% and 2% additional chimeras in the RDP, SILVA, and Greengenes databases from which chimeras had already been removed with Pintail or Bellerophon. DECIPHER was implemented in the R programming language and is directly accessible through a webpage or by downloading the program as an R package (http://DECIPHER.cee.wisc.edu).

4.
A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

5.
Sequences in public databases may contain a number of sequencing errors. A double binomial model describing the distribution of indel-excluded similarity coefficients (S) among repeatedly sequenced 16S rRNA was previously developed and it produced a confidence interval of S useful for testing sequence identity among sequences of 400-bp length. We characterized patterns in sequencing errors found in nearly complete 16S rRNA sequences of Vibrionaceae as highly variable in reported sequence length and containing a small number of indels. To accommodate these characteristics, a simple binomial model for distribution of the similarity coefficient (H) that included indels was derived from the double binomial model for S. The model showed good fit to empirical data. By using either a pre-determined or bootstrapping estimated standard probability of base matching, we were able to use the exact binomial test to determine the relative level of sequencing error for a given pair of duplicated sequences. A limitation of the method is the requirement that duplicated sequences for the same template sequence be paired, but this can be overcome by using only conserved regions of 16S rRNA sequences and pairing a given sequence with its highest scoring BLAST search hit from the nr database of GenBank.
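
The exact binomial test mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the treatment of gap characters as mismatches, and the one-sided lower-tail form of the test are all assumptions made for illustration, and the standard matching probability `p` must be supplied by the user.

```python
from math import comb


def count_matches(a: str, b: str) -> tuple[int, int]:
    """Return (matches, aligned_length) for two pre-aligned sequences.

    Assumption: a gap character ('-') never counts as a match, so indels
    lower the similarity score.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches, len(a)


def binomial_lower_tail(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical pair of duplicated sequences and a hypothetical p of 0.99.
matches, length = count_matches("ACGT-A", "ACGTTA")
p_value = binomial_lower_tail(matches, length, 0.99)
```

Under these assumptions, a small lower-tail probability would suggest that the pair matches at fewer positions than expected for the chosen standard probability.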

6.
The contribution of PCR artifacts to 16S rRNA gene sequence diversity from a complex bacterioplankton sample was estimated. Taq DNA polymerase errors were found to be the dominant sequence artifact but could be constrained by clustering the sequences into 99% sequence similarity groups. Other artifacts (chimeras and heteroduplex molecules) were significantly reduced by employing modified amplification protocols. Surprisingly, no skew in sequence types was detected in the two libraries constructed from PCR products amplified for different numbers of cycles. Recommendations for modification of amplification protocols and for reporting diversity estimates at 99% sequence similarity as a standard are given.
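
Grouping sequences at 99% similarity, as described in this abstract, might be sketched as follows. The abstract does not state which clustering method was used, so the greedy centroid approach, the function names, and the requirement that sequences be pre-aligned to equal length are all assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[str], threshold: float = 0.99) -> list[list[str]]:
    """Assign each sequence to the first cluster whose centroid it matches.

    The first member of each cluster acts as its centroid. A sequence that
    matches no existing centroid at or above the threshold starts a new cluster.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Toy input: one polymerase-like single-base error and one more divergent read.
reads = ["A" * 100, "A" * 99 + "C", "A" * 98 + "CC"]
groups = greedy_cluster(reads)
```

In this toy input the single-base variant joins the first group, while the read with two differences (98% identity) starts its own group. Greedy clustering depends on input order, so results can differ from hierarchical methods.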

7.
Detection of chimeric artifacts formed when PCR is used to retrieve naturally occurring small-subunit (SSU) rRNA sequences may rely on demonstrating that different sequence domains have different phylogenetic affiliations. We evaluated the CHECK_CHIMERA method of the Ribosomal Database Project and another method which we developed, both based on determining nearest neighbors of different sequence domains, for their ability to discern artificially generated SSU rRNA chimeras from authentic Ribosomal Database Project sequences. The reliability of both methods decreases when the parental sequences which contribute to chimera formation are more than 82 to 84% similar. Detection is also complicated by the occurrence of authentic SSU rRNA sequences that behave like chimeras. We developed a naive statistical test based on CHECK_CHIMERA output and used it to evaluate previously reported SSU rRNA chimeras. Application of this test also suggests that chimeras might be formed by retrieving SSU rRNAs as cDNA. The amount of uncertainty associated with nearest-neighbor analyses indicates that such tests alone are insufficient and that better methods are needed.
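
The idea of comparing nearest neighbors of different sequence domains can be shown with a toy Python sketch. This is not CHECK_CHIMERA or the authors' method: the single midpoint split, the match-count similarity, the function names, and the assumption that the query and references are aligned to equal length are all illustrative choices.

```python
def nearest_reference(fragment: str, refs: dict[str, str], start: int) -> str:
    """Name of the reference whose matching region shares the most positions."""
    def score(name: str) -> int:
        region = refs[name][start:start + len(fragment)]
        return sum(x == y for x, y in zip(fragment, region))
    return max(refs, key=score)


def domain_neighbors(query: str, refs: dict[str, str]) -> tuple[bool, str, str]:
    """Split the query at its midpoint and find each half's nearest reference.

    Returns (flag, left_neighbor, right_neighbor); the flag is True when the
    two halves have different nearest neighbors.
    """
    mid = len(query) // 2
    left = nearest_reference(query[:mid], refs, 0)
    right = nearest_reference(query[mid:], refs, mid)
    return left != right, left, right


# Hypothetical reference set and queries.
references = {"parent_A": "AAAAAAAA", "parent_B": "CCCCCCCC"}
chimera_call = domain_neighbors("AAAACCCC", references)
authentic_call = domain_neighbors("AAAAAAAA", references)
```

As the abstract notes, a flag from a test of this kind is only suggestive: authentic sequences can behave like chimeras, and closely related parents are hard to tell apart.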

8.
The contribution of PCR artifacts to 16S rRNA gene sequence diversity from a complex bacterioplankton sample was estimated. Taq DNA polymerase errors were found to be the dominant sequence artifact but could be constrained by clustering the sequences into 99% sequence similarity groups. Other artifacts (chimeras and heteroduplex molecules) were significantly reduced by employing modified amplification protocols. Surprisingly, no skew in sequence types was detected in the two libraries constructed from PCR products amplified for different numbers of cycles. Recommendations for modification of amplification protocols and for reporting diversity estimates at 99% sequence similarity as a standard are given.

9.
G C Wang, Y Wang. Applied microbiology, 1997, 63(12): 4645-4650
PCR is routinely used in amplification and cloning of rRNA genes from environmental DNA samples for studies of microbial community structure and identification of novel organisms. There have been concerns about generation of chimeric sequences as a consequence of PCR coamplification of highly conserved genes, because such sequences may lead to reports of nonexistent organisms. To quantify the frequency of chimeric molecule formation, mixed genomic DNAs from eight actinomycete species whose 16S rRNA sequences had been determined were used for PCR coamplification of 16S rRNA genes. A large number of cloned 16S ribosomal DNAs were examined by sequence analysis, and chimeric molecules were identified by multiple-sequence alignment with reference species. Here, we report that the level of occurrence of chimeric sequences after 30 cycles of PCR amplification was 32%. We also show that PCR-induced chimeras were formed between different rRNA gene copies from the same organism. Because of the wide use of PCR for direct isolation of 16S rRNA sequences from environmental DNA to assess microbial diversity, the extent of chimeric molecule formation deserves serious attention.

10.
The aim of the study was to describe the diversity of the intraluminal intestinal microbial community in dogs by direct sequence analysis of the 16S rRNA gene. Intestinal content was collected from the duodenum, jejunum, ileum, and colon of six healthy dogs. The bacterial 16S rRNA gene was amplified with universal bacterial primers. Amplicons were ligated into cloning vectors and near-full-length 16S rRNA gene inserts were analyzed. From a total of 864 clones analyzed, 106 nonredundant 16S rRNA gene sequences were identified. Forty-two (40%) sequences showed <98% sequence similarity to 16S rRNA gene sequences reported previously. Operational taxonomic units were classified into four phyla: Firmicutes, Fusobacteria, Bacteroidetes, and Proteobacteria. Clostridiales predominated in the duodenum (40% of clones) and jejunum (39%), and were highly abundant in the ileum (25%) and colon (26%). Sequences affiliated with Clostridium cluster XI and Clostridium cluster XIVa dominated in the proximal small intestine and colon, respectively. Fusobacteriales and Bacteroidales were the most abundant bacterial orders in the ileum (33%) and colon (30%). Enterobacteriales were more commonly observed in the small intestine than in the colon. Lactobacillales occurred commonly in all parts of the intestine.

11.
Here we describe the natural occurrence of bacteria of the class Dehalococcoidia (DEH) and their diversity at different depths in anoxic waters of a remote meromictic lake (Lake Pavin) using 16S rRNA gene amplicon sequencing and quantitative PCR. Detected DEH are phylogenetically diverse and the majority of 16S rRNA sequences have less than 91% similarity to previously isolated DEH 16S rRNA sequences. To predict the metabolic potential of detected DEH subgroups and to assess if they encode genes to transform halogenated compounds, we enriched DEH-affiliated genomic DNA by using a specific-gene capture method and probes against DEH-derived 16S rRNA genes, reductive dehalogenase genes and known insertion sequences. Two reductive dehalogenase homologous sequences were identified from DEH-enriched genomic DNA, and marker genes in the direct vicinity confirm that gene fragments were derived from DEH. The low sequence similarity with known reductive dehalogenase genes suggests yet-unknown catabolic potential in the anoxic zone of Lake Pavin.

12.
SUMMARY: Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaptation of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. AVAILABILITY: Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl

13.
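Li Tao, Wang Peng. Acta Ecologica Sinica (生态学报), 2013, 33(1): 286-293
Parametric models and nonparametric estimators were each used to estimate bacterial richness in sediment core MD05-2896 from the South China Sea slope. Using a culture-independent PCR-RFLP approach targeting the 16S rRNA gene, bacterial 16S rRNA gene sequences were amplified from the sediment core and a 16S rRNA gene library was constructed. Phylogenetic analysis showed that most sequences in the library belonged to 17 known phyla. The 16S rRNA gene sequences were grouped into taxonomic units using sequence identity cutoffs of 99%, 97%, 90%, and 80%. The inverse Gaussian, lognormal, negative binomial, Pareto, and double exponential distribution models, together with estimators such as ACE and ACE-1, were used to estimate bacterial richness at each taxonomic level. At the "species" level, the negative binomial distribution was the best-fitting model, giving an estimated bacterial richness of 244±10 (SE). However, owing to limitations of the experimental conditions, this estimate may be too low.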
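
As a rough illustration of one of the nonparametric estimators named in this abstract, the following Python sketch computes the abundance-based coverage estimator (ACE) from per-taxon clone counts. It is not the authors' code: the function name is hypothetical, and the rare-taxon cutoff of 10 is the commonly used default rather than a value taken from the abstract.

```python
from collections import Counter


def ace_richness(abundances: list[int], rare_threshold: int = 10) -> float:
    """Abundance-based coverage estimator (ACE) of total taxon richness.

    abundances: number of clones observed for each taxon.
    rare_threshold: taxa with at most this many clones are treated as rare
    (assumed default of 10).
    """
    counts = [n for n in abundances if n > 0]
    rare = [n for n in counts if n <= rare_threshold]
    s_abund = len(counts) - len(rare)   # taxa above the cutoff
    s_rare = len(rare)                  # taxa at or below the cutoff
    n_rare = sum(rare)                  # clones belonging to rare taxa
    if n_rare == 0:
        return float(s_abund)
    freq = Counter(rare)                # freq[i] = rare taxa seen exactly i times
    f1 = freq.get(1, 0)                 # singletons
    coverage = 1.0 - f1 / n_rare        # estimated sample coverage of rare taxa
    if coverage == 0.0:
        raise ValueError("all rare taxa are singletons; ACE is undefined")
    skew = sum(i * (i - 1) * f for i, f in freq.items())
    gamma_sq = max(s_rare / coverage * skew / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / coverage + f1 / coverage * gamma_sq


# Hypothetical clone counts per taxonomic unit.
estimate = ace_richness([1, 1, 1, 2, 2, 3, 20])
```

The estimate exceeds the seven observed taxa because the singletons imply that some rare taxa were missed. The parametric models listed in the abstract require fitting frequency distributions and are not covered by this sketch.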

14.
An insertion of about 100 bases within the central part of the 23S rRNA genes was found to be a phylogenetic marker for the bacterial line of descent of Gram-positive bacteria with a high DNA G+C content. The insertion was present in 23S rRNA genes of 64 strains representing the major phylogenetic groups of Gram-positive bacteria with a high DNA G+C content, whereas it was not found in 23S rRNA genes of 55 (eu)bacteria representing Gram-positive bacteria with a low DNA G+C content and all other known (eu)bacterial phyla. The presence of the insertion could be easily demonstrated by comparative gel electrophoretic analysis of in vitro-amplified 23S rDNA fragments, which contained the insertion. The nucleotide sequences of the amplified fragments were determined and sequence similarities of at least 44% were found. The overall similarity values are lower than those of 16S and 23S rRNA sequences of the particular organism. Northern hybridization experiments indicated the presence of the insertion within the mature 23S rRNA of Corynebacterium glutamicum.

15.
The complete nucleotide sequence of 16S rRNA from Propionigenium modestum was determined and compared with 380 16S rRNA sequences from representatives of all eu- and archaebacterial phyla known so far. The phylogenetic analysis of this data set indicated P. modestum to represent a new separated line of descent within the radiation of eubacterial phyla moderately related to cyanobacteria and Gram-positive bacteria with low DNA GC content.

16.
As an evolutionary marker, 23S ribosomal RNA (rRNA) offers more diagnostic sequence stretches and greater sequence variation than 16S rRNA. However, 23S rRNA is still not as widely used. Based on 80 metagenome samples from the Global Ocean Sampling (GOS) Expedition, the usefulness and taxonomic resolution of 23S rRNA were compared to those of 16S rRNA. Since 23S rRNA is approximately twice as large as 16S rRNA, twice as many 23S rRNA gene fragments as 16S rRNA gene fragments were retrieved from the GOS reads, with 23S rRNA gene fragments being generally about 100 bp longer. Datasets for 16S and 23S rRNA sequences revealed similar relative abundances for major marine bacterial and archaeal taxa. However, 16S rRNA sequences had a better taxonomic resolution due to their significantly larger reference database. Reevaluation of the specificity of previously published PCR amplification primers and group-specific fluorescence in situ hybridization probes on this metagenomic set of non-amplified 23S rRNA sequences revealed that out of 16 primers investigated, only two had more than 90% target group coverage. Evaluations of two probes, BET42a and GAM42a, were in accordance with previous evaluations, with a discrepancy in the target group coverage of the GAM42a probe when evaluated against the GOS metagenomic dataset.

17.
The number of bacterial species estimated to exist on Earth has increased dramatically in recent years. This newly recognized species diversity has raised the possibility that bacterial natural product biosynthetic diversity has also been significantly underestimated by previous culture-based studies. Here, we compare 454-pyrosequenced nonribosomal peptide adenylation domain, type I polyketide ketosynthase domain, and type II polyketide ketosynthase alpha gene fragments amplified from cosmid libraries constructed using DNA isolated from three different arid soils. While 16S rRNA gene sequence analysis indicates these cloned metagenomes contain DNA from similar distributions of major bacterial phyla, we found that they contain almost completely distinct collections of secondary metabolite biosynthetic gene sequences. When grouped at 85% identity, only 1.5% of the adenylation domain, 1.2% of the ketosynthase, and 9.3% of the ketosynthase alpha sequence clusters contained sequences from all three metagenomes. Although there is unlikely to be a simple correlation between biosynthetic gene sequence diversity and the diversity of metabolites encoded by the gene clusters in which these genes reside, our analysis further suggests that sequences in one soil metagenome are so distantly related to sequences in another metagenome that they are, in many cases, likely to arise from functionally distinct gene clusters. The marked differences observed among collections of biosynthetic genes found in even ecologically similar environments suggest that prokaryotic natural product biosynthesis diversity is, like bacterial species diversity, potentially much larger than appreciated from culture-based studies.

18.
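To study haloarchaeal species and bacteriorhodopsin (BR) protein gene resources, 148 halophilic strains were isolated from 40 soil, lake water, and sludge samples. For six of these strains, the gene fragment encoding helix C to helix G of the protein and the 16S rRNA gene were amplified by polymerase chain reaction (PCR), and their nucleotide sequences were determined. Compared with the corresponding fragments reported previously, the helix C to helix G protein regions in ABDH10, ABDH11, and ABDH40 differed markedly from those of other strains. Homology comparison and phylogenetic analysis based on 16S rRNA sequences indicated that ABDH10 and ABDH40 are new members of the genera Natronorubrum and Natrinema, respectively. The 16S rRNA sequence of ABDH40 has been deposited in GenBank under accession number AY989910. The helix C to helix G protein region in ABDH11 differed markedly from those of other strains.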

19.
Bacteria of the phyla Proteobacteria and Bacteroidetes are known to be the most prominent heterotrophic organisms in marine surface waters. In order to investigate the occurrence of these phyla in a coastal environment, the tidal flat ecosystem German Wadden Sea, we analyzed a clone library of PCR-amplified and sequenced 16S rRNA gene fragments and isolated 46 new strains affiliated with these phyla from the water column with various polymers and complex media as substrates. The phylogenetic affiliation of these strains was analyzed on the basis of sequenced 16S rRNA gene fragments. Subsequently, a comprehensive phylogenetic analysis of Proteobacteria and Bacteroidetes including available sequences from oxic habitats of earlier studies of this ecosystem was performed. Sequences of the earlier studies were derived from isolation approaches and from denaturing gradient gel electrophoresis (DGGE) analyses of environmental samples and high dilution steps of MPN (most probable number) cultures. The majority of the 265 sequences included in this analysis affiliated with alpha-Proteobacteria (45.3%), gamma-Proteobacteria (31.7%), and Bacteroidetes (16.2%). Almost 7% belong to the delta-Proteobacteria and several of these clones affiliated with the Myxococcales, a group comprising obligate aerobic organisms. Within the alpha- and gamma-Proteobacteria specific clusters were identified including isolates from high dilution steps of dilution cultures and/or clones from the clone library or DGGE gels, implying a high abundance of some of these organisms. Within the gamma-Proteobacteria a new cluster is proposed, which consists of marine surface-attached organisms. This SAMMIC (Surface Attached Marine MICrobes) cluster comprises only uncultured phylotypes and exhibits a global distribution. 
Overall, the analysis indicates that Proteobacteria and Bacteroidetes of the Wadden Sea have a surprisingly high diversity, presumably a result of the signature of this ecosystem as a melting pot at the land-sea interface and comprising a great habitat variety.

20.
Molecular approaches aimed at detection of a broad-range of prokaryotes in the environment routinely rely on classifying heterogeneous 16S rRNA genes amplified by polymerase chain reaction (PCR) using primers with broad specificity. The general method of sampling and categorizing DNA has been to clone then sequence the PCR products. However, the number of clones required to adequately catalog the majority of taxa in a sample is unwieldy. Alternatively, hybridizing target sequences to a universal 16S rRNA gene microarray may provide a more rapid and comprehensive view of prokaryotic community composition. This study investigated the breadth and accuracy of a microarray in detecting diverse 16S rRNA gene sequence types compared to clone-and-sequencing using three environmental samples: urban aerosol, subsurface soil, and subsurface water. PCR products generated from universal 16S rRNA gene-targeted primers were classified by using either the clone-and-sequence method or by hybridization to a novel high-density microarray of 297,851 probes complementary to 842 prokaryotic subfamilies. The three clone libraries comprised 1391 high-quality sequences. Approximately 8% of the clones could not be placed into a known subfamily and were considered novel. The microarray results confirmed the majority of clone-detected subfamilies and additionally demonstrated greater amplicon diversity extending into phyla not observed by the cloning method. Sequences matching operational taxonomic units within the phyla Nitrospira, Planctomycetes, and TM7, which were uniquely detected by the array, were verified with specific primers and subsequent amplicon sequencing. Subfamily richness detected by the array corresponded well with nonparametric richness predictions extrapolated from clone libraries except in the water community where clone-based richness predictions were greatly exceeded. 
It was concluded that although the microarray is unreliable in identifying novel prokaryotic taxa, it reveals greater diversity in environmental samples than sequencing a typically sized clone library. Furthermore, the microarray allowed samples to be rapidly evaluated with replication, a significant advantage in studies of microbial ecology.
