Similar literature
1.
The advent of next-generation sequencing has coincided with a growth in interest in using these approaches to better understand the role of the structure and function of the microbial communities in human, animal, and environmental health. Yet, use of next-generation sequencing to perform 16S rRNA gene sequence surveys has resulted in considerable controversy surrounding the effects of sequencing errors on downstream analyses. We analyzed 2.7×10^6 reads distributed among 90 identical mock community samples, which were collections of genomic DNA from 21 different species with known 16S rRNA gene sequences; we observed an average error rate of 0.0060. To improve this error rate, we evaluated numerous methods of identifying bad sequence reads, identifying regions within reads of poor quality, and correcting base calls and were able to reduce the overall error rate to 0.0002. Implementation of the PyroNoise algorithm provided the best combination of error rate, sequence length, and number of sequences. Perhaps more problematic than sequencing errors was the presence of chimeras generated during PCR. Because we knew the true sequences within the mock community and the chimeras they could form, we identified 8% of the raw sequence reads as chimeric. After quality filtering the raw sequences and using the Uchime chimera detection program, the overall chimera rate decreased to 1%. The chimeras that could not be detected were largely responsible for the identification of spurious operational taxonomic units (OTUs) and genus-level phylotypes. The number of spurious OTUs and phylotypes increased with sequencing effort, indicating that comparison of communities should be made using an equal number of sequences.
Finally, we applied our improved quality-filtering pipeline to several benchmarking studies and observed that even with our stringent data curation pipeline, biases in the data generation pipeline and batch effects were observed that could potentially confound the interpretation of microbial community data.
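
The recommendation above to compare communities using an equal number of sequences can be illustrated with a minimal subsampling (rarefaction) sketch. It is not the authors' pipeline: the function names `rarefy` and `rarefy_all`, the dictionary-of-counts input format, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=0):
    """Draw `depth` reads without replacement from one sample's OTU counts."""
    total = sum(otu_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then sample without replacement.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def rarefy_all(samples, seed=0):
    """Subsample every sample down to the size of the smallest one."""
    depth = min(sum(counts.values()) for counts in samples.values())
    return {name: rarefy(counts, depth, seed) for name, counts in samples.items()}


# Hypothetical toy data: sample "a" has 80 reads, sample "b" has 15.
samples = {
    "a": {"otu1": 50, "otu2": 30},
    "b": {"otu1": 10, "otu3": 5},
}
even = rarefy_all(samples)
```

After this step every sample holds the same number of reads, so the count of spurious OTUs is no longer driven by differences in sequencing effort between samples.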

2.
Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ∼8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short-read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.

3.
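Li Tao, Wang Peng. Acta Ecologica Sinica, 2013, 33(1): 286-293
Bacterial richness in sediment core MD05-2896 from the continental slope of the South China Sea was predicted using both parametric models and nonparametric estimators. Using a culture-independent, PCR-RFLP-based 16S rRNA gene approach, bacterial 16S rRNA gene sequences were amplified from the sediment core and a 16S rRNA gene library was constructed. Phylogenetic analysis showed that most sequences in the library belonged to 17 known phyla. The 16S rRNA gene sequences were grouped into taxonomic units using sequence identity cutoffs of 99%, 97%, 90%, and 80%. Inverse Gaussian, lognormal, negative binomial, Pareto, and double-exponential distribution models, together with estimators such as ACE and ACE-1, were used to predict bacterial richness at each taxonomic level. At the species level, the negative binomial distribution was the best-fitting model, giving an estimated bacterial richness of 244±10 (SE). Owing to limitations of the experimental conditions, however, this estimate may be too low.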
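
As a rough illustration of the nonparametric side of such an analysis, the sketch below computes the abundance-based coverage estimator (ACE) from a list of per-OTU read counts. It follows the standard published form of ACE and is not taken from this study; the function name `ace_richness`, the input format, and the conventional rare-abundance cutoff of 10 are assumptions.

```python
from collections import Counter


def ace_richness(abundances, rare_cutoff=10):
    """Abundance-based Coverage Estimator (ACE) of total OTU richness."""
    counts = [n for n in abundances if n > 0]
    s_abund = sum(1 for n in counts if n > rare_cutoff)  # abundant OTUs
    rare = [n for n in counts if n <= rare_cutoff]       # rare OTUs
    s_rare = len(rare)
    if s_rare == 0:
        return float(s_abund)
    freq = Counter(rare)   # freq[i] = number of OTUs observed exactly i times
    n_rare = sum(rare)     # reads belonging to rare OTUs
    f1 = freq.get(1, 0)    # singletons
    if n_rare == f1:
        raise ValueError("all rare OTUs are singletons; ACE is undefined")
    coverage = 1.0 - f1 / n_rare
    # Squared coefficient of variation of the rare abundances, floored at zero.
    spread = sum(i * (i - 1) * f for i, f in freq.items())
    gamma_sq = max(s_rare / coverage * spread / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / coverage + f1 / coverage * gamma_sq


# Hypothetical counts: four singletons and one OTU seen five times.
estimate = ace_richness([1, 1, 1, 1, 5])
```

The estimate exceeds the observed OTU count whenever singletons are present, which is the sense in which such estimators predict unseen taxa.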

4.
The goal of this research was to investigate the influence of the error rate of sequence determination on the differentiation of cloned SSU rRNA gene sequences for assessment of community structure. SSU rRNA cloned sequences from groundwater samples that represent different bacterial divisions were sequenced multiple times with the same sequencing primer. From comparison of sequence alignments with unedited data, confidence intervals were obtained from both a 'double binomial' model of sequence comparison and by non-parametric methods. The results indicated that similarity values below 0.9946 are likely derived from dissimilar sequences at a confidence level of 0.95, and not sequencing errors. The results confirmed that screening by direct sequence determination could be reliably used to differentiate at the species level. However, given sequencing errors comparable to those seen in this study, sequences with similarities above 0.9946 should be treated as the same sequence if a 95% confidence is desired.
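
The decision rule stated above can be written as a short sketch. Only the 0.9946 cutoff comes from the abstract; the function names, the assumption that both sequences are already aligned to equal length, and the choice to skip columns that are gaps in both sequences are illustrative assumptions.

```python
def pairwise_identity(a, b, gap="-"):
    """Fraction of matching columns between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("no comparable columns")
    return sum(x == y for x, y in columns) / len(columns)


def same_sequence(a, b, threshold=0.9946):
    """Treat two reads as one sequence when similarity reaches the threshold."""
    return pairwise_identity(a, b) >= threshold


# Hypothetical 1,000-column alignments with 5 and 6 mismatches.
ref = "A" * 1000
five_off = "C" * 5 + "A" * 995
six_off = "C" * 6 + "A" * 994
```

With these toy inputs, five mismatches give a similarity of 0.995 (treated as the same sequence), while six give 0.994 (treated as different).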

5.
Next-generation sequencing has increased the coverage of microbial diversity surveys by orders of magnitude, but differentiating artifacts from rare environmental sequences remains a challenge. Clustering 16S rRNA sequences into operational taxonomic units (OTUs) organizes sequence data into groups of 97% identity, helping to reduce data volumes and avoid analyzing sequencing artifacts by grouping them with real sequences. Here, we analyze sequence abundance distributions across environmental samples and show that 16S rRNA sequences of >99% identity can represent functionally distinct microorganisms, rendering OTU clustering problematic when the goal is an accurate analysis of organism distribution. Strict postsequencing quality control (QC) filters eliminated the most prevalent artifacts without clustering. Further experiments proved that DNA polymerase errors in polymerase chain reaction (PCR) generate a significant number of substitution errors, most of which pass QC filters. Based on our findings, we recommend minimizing the number of PCR cycles in DNA library preparation and applying strict postsequencing QC filters to reduce the most prevalent artifacts while maintaining a high level of accuracy in diversity estimates. We further recommend correlating rare and abundant sequences across environmental samples, rather than clustering into OTUs, to identify remaining sequence artifacts without losing the resolution afforded by high-throughput sequencing.

6.
A bacterial strain, designated BzDS03, was isolated from a water sample collected from Dal Lake, Srinagar. The strain was characterized using 16S ribosomal RNA gene and 16S-23S rRNA internal transcribed spacer region sequences. Phylogenetic analysis showed that the 16S rRNA sequence of the isolate formed a monophyletic clade with the genus Escherichia. The closest phylogenetic relative was Escherichia coli, with 99% 16S rRNA gene sequence similarity. The Ribosomal Database Project's Classifier tool indicated that strain BzDS03 belongs to the genus Escherichia. The 16S rRNA sequence of the isolate was deposited in GenBank with accession number FJ961336. Further analysis of the 16S-23S rRNA sequence of the isolate confirms that the identified strain BzDS03 should be assigned as the type strain of Escherichia coli, with 98% 16S-23S rRNA sequence similarity. The GenBank accession number allotted for the 16S-23S rRNA intergenic spacer sequence of the isolate is FJ961337.

7.
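Genetic diversity of bacteria in a Xinjiang mud volcano   Total citations: 7, self-citations: 0, citations by others: 7
To investigate bacterial diversity in the Wusu mud volcano in Xinjiang, total DNA was extracted directly from mud samples and a bacterial 16S rDNA gene library containing 150 valid transformants was constructed. PCR on liquid cultures of the transformants followed by HaeIII digestion yielded 16 distinct banding patterns, and cloning and sequencing showed that these belonged to 16 different taxonomic units. Some of the sequences were highly similar to 16S rDNA sequences of known bacterial groups and were assigned to the Proteobacteria, Firmicutes, Fusobacteria, and Actinobacteria; the remainder showed low homology to 16S rDNA sequences of known bacterial groups and may represent new taxa. The results indicate that the mud volcano environment harbors rich microbial populations that merit further study.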

8.
Here we describe the natural occurrence of bacteria of the class Dehalococcoidia (DEH) and their diversity at different depths in anoxic waters of a remote meromictic lake (Lake Pavin) using 16S rRNA gene amplicon sequencing and quantitative PCR. Detected DEH are phylogenetically diverse and the majority of 16S rRNA sequences have less than 91% similarity to previously isolated DEH 16S rRNA sequences. To predict the metabolic potential of detected DEH subgroups and to assess if they encode genes to transform halogenated compounds, we enriched DEH-affiliated genomic DNA by using a specific-gene capture method and probes against DEH-derived 16S rRNA genes, reductive dehalogenase genes and known insertion sequences. Two reductive dehalogenase homologous sequences were identified from DEH-enriched genomic DNA, and marker genes in the direct vicinity confirm that gene fragments were derived from DEH. The low sequence similarity with known reductive dehalogenase genes suggests yet-unknown catabolic potential in the anoxic zone of Lake Pavin.

9.
10.
11.
Characterisation of microsporidian species and differentiation among genetic variants of the same species has typically relied on ribosomal RNA (rRNA) gene sequences. We characterised the entire rRNA gene of a microsporidium from 11 isolates representing eight different European bumblebee (Bombus) species. We demonstrate that the microsporidium Nosema bombi infected all hosts that originated from a wide geographic area. A total of 16 variable sites (all single nucleotide polymorphisms (SNPs)) were detected in the small subunit (SSU) rRNA gene and 42 (39 SNPs and 3 indels) in the large subunit (LSU) rRNA sequence. Direct sequencing of PCR-amplified DNA products of the internal transcribed spacer (ITS) region revealed identical sequences in all isolates. In contrast, ITS fragment length determined by PAGE and sequencing of cloned amplicons gave better resolution of sequences and revealed multiple SNPs across isolates and two fragment sizes in each isolate (six short and seven long amplicon variants). Genetic variants were not unique to individual host species. Moreover, two or more sequence variants were obtained from individual bumblebee hosts, suggesting the existence of multiple, variable copies of rRNA in the same microsporidium, contrary to what is expected for a multi-gene family under concerted evolution theory. Our data on within-genome rRNA variability call into question the usefulness of rRNA sequences to characterise intraspecific genetic variants in the Microsporidia and other groups of unicellular organisms.

12.
S Chao, R Sederoff, C S Levings 3rd. Nucleic Acids Research, 1984, 12(16): 6629-6644
The nucleotide sequence of the gene coding for the 18S ribosomal RNA of maize mitochondria has been determined and a model for the secondary structure is proposed. Dot matrix analysis has been used to compare the extent and distribution of sequence similarities of the entire maize mitochondrial 18S rRNA sequence with that of 15 other small subunit rRNA sequences. The mitochondrial gene shows great similarity to the eubacterial sequences and to the maize chloroplast, and less similarity to mitochondrial rRNA genes in animals and fungi. We propose that this similarity is due to a slow rate of nucleotide divergence in plant mtDNA compared to the mtDNA of animals. Sequence comparisons indicate that the evolution of the maize mitochondrial 18S, chloroplast 16S and nuclear 17S ribosomal genes has been essentially independent, in spite of evidence for DNA transfer between organelles and the nucleus.

13.
The contribution of PCR artifacts to 16S rRNA gene sequence diversity from a complex bacterioplankton sample was estimated. Taq DNA polymerase errors were found to be the dominant sequence artifact but could be constrained by clustering the sequences into 99% sequence similarity groups. Other artifacts (chimeras and heteroduplex molecules) were significantly reduced by employing modified amplification protocols. Surprisingly, no skew in sequence types was detected in the two libraries constructed from PCR products amplified for different numbers of cycles. Recommendations for modification of amplification protocols and for reporting diversity estimates at 99% sequence similarity as a standard are given.
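
Grouping sequences into 99% similarity groups, as described above, can be sketched with a simple greedy procedure. This is a simplified stand-in and not the clustering method used in the study: the function names, the first-match greedy strategy, and the assumption of pre-aligned, equal-length sequences are all illustrative.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.99):
    """Assign each sequence to the first cluster whose seed it matches.

    Returns a list of clusters, each a list of indices into `seqs`;
    the first index in every cluster is that cluster's seed sequence.
    """
    clusters = []
    for idx, seq in enumerate(seqs):
        for members in clusters:
            if identity(seq, seqs[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            # No existing seed is similar enough, so start a new cluster.
            clusters.append([idx])
    return clusters


# Hypothetical 200-base reads: one exact, one with 1 error, one with 10 changes.
reads = ["A" * 200, "A" * 199 + "C", "A" * 190 + "C" * 10]
groups = greedy_cluster(reads)
```

Here the read with a single substitution (99.5% identity) joins the first group, which is how a 99% cutoff absorbs isolated polymerase errors, while the read with ten substitutions (95% identity) seeds its own group.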

14.
We present an EST library, chloroplast genome sequence, and nuclear microsatellite markers that were developed for the semi-domesticated oilseed crop noug (Guizotia abyssinica) from Ethiopia. The EST library consists of 25 711 Sanger reads, assembled into 17 538 contigs and singletons, of which 4781 were functionally annotated using the Arabidopsis Information Resource (TAIR). The age distribution of duplicated genes in the EST library shows evidence of two paleopolyploidizations, a pattern that noug shares with several other species in the Heliantheae tribe (Compositae family). From the EST library, we selected 43 microsatellites and then designed and tested primers for their amplification. The number of microsatellite alleles varied between 2 and 10 (average 4.67), and the average observed and expected heterozygosities were 0.49 and 0.54, respectively. The chloroplast genome was sequenced de novo using Illumina’s sequencing technology and completed with traditional Sanger sequencing. No large re-arrangements were found between the noug and sunflower chloroplast genomes, but 1.4% of sites have indels and 1.8% show sequence divergence between the two species. We identified 34 tRNAs, 4 rRNA sequences, and 80 coding sequences, including one region (trnH-psbA) with 15% sequence divergence between noug and sunflower that may be particularly useful for phylogeographic studies in noug and its wild relatives.

16.
Toward a census of bacteria in soil   Total citations: 2, self-citations: 0, citations by others: 2
For more than a century, microbiologists have sought to determine the species richness of bacteria in soil, but the extreme complexity and unknown structure of soil microbial communities have obscured the answer. We developed a statistical model that makes the problem of estimating richness statistically accessible by evaluating the characteristics of samples drawn from simulated communities with parametric community distributions. We identified simulated communities with rank-abundance distributions that followed a truncated lognormal distribution whose samples resembled the structure of 16S rRNA gene sequence collections made using Alaskan and Minnesotan soils. The simulated communities constructed based on the distribution of 16S rRNA gene sequences sampled from the Alaskan and Minnesotan soils had a richness of 5,000 and 2,000 operational taxonomic units (OTUs), respectively, where an OTU represents a collection of sequences not more than 3% distant from each other. To sample each of these OTUs in the Alaskan 16S rRNA gene library at least twice, 480,000 sequences would be required; however, to estimate the richness of the simulated communities using nonparametric richness estimators would require only 18,000 sequences. Quantifying the richness of complex environments such as soil is an important step in building an ecological framework. We have shown that generating sufficient sequence data to do so requires less sequencing effort than completely sequencing a bacterial genome.

17.
The presence of heterozygous indels in a DNA sequence usually results in the sequence being discarded. If the sequence trace is of high enough quality, however, it will contain enough information to reconstruct the two constituent sequences with very little ambiguity. Solutions already exist using comparisons with a known reference sequence, but this is often unavailable for nonmodel organisms or novel DNA regions. I present a program which determines the sizes and positions of heterozygous indels in a DNA sequence and reconstructs the two constituent haploid sequences. No external data such as a reference sequence or other prior knowledge are required. Simulation suggests an accuracy of >99% from a single read, with errors being eliminable by the inclusion of a second sequencing read, such as one using a reverse primer. Diploid sequences can be fully reconstructed across any number of heterozygous indels, with two overlapping sequencing reads almost always sufficient to infer the entire DNA sequence. This eliminates the need for costly and laborious cloning, and allows data to be used which would otherwise be discarded. With no more laboratory work than is needed to produce two normal sequencing reads, two aligned haploid sequences can be produced quickly and accurately and with extensive phasing information.

18.
Because of technological limitations, the primer and amplification biases in targeted sequencing of 16S rRNA genes have veiled the true microbial diversity underlying environmental samples. However, the protocol of metagenomic shotgun sequencing provides 16S rRNA gene fragment data with natural immunity against the biases raised during priming and thus the potential of uncovering the true structure of the microbial community by giving more accurate predictions of operational taxonomic units (OTUs). Nonetheless, the lack of statistically rigorous comparison between 16S rRNA gene fragments and other data types makes it difficult to interpret previously reported results using 16S rRNA gene fragments. Therefore, in the present work, we established a standard analysis pipeline that would help confirm if the differences in the data are true or are just due to potential technical bias. This pipeline is built by using simulated data to find optimal mapping and OTU prediction methods. The comparison between simulated datasets revealed that, using our proposed pipeline, a 16S rRNA gene fragment longer than 150 bp provides the same accuracy as a full-length 16S rRNA sequence. This could serve as a good starting point for experimental design and make comparison between 16S rRNA gene fragment-based and targeted 16S rRNA sequencing-based surveys possible.

19.
20.
Microbial communities host unparalleled taxonomic diversity. Adequate characterization of environmental and host-associated samples remains a challenge for microbiologists, despite the advent of 16S rRNA gene sequencing. In order to increase the depth of sampling for diverse bacterial communities, we developed a method for sequencing and assembling millions of paired-end reads from the 16S rRNA gene (spanning the V3 region; ~200 nucleotides) by using an Illumina genome analyzer. To confirm reproducibility and to identify a suitable computational pipeline for data analysis, sequence libraries were prepared in duplicate for both a defined mixture of DNAs from known cultured bacterial isolates (>1 million postassembly sequences) and an Arctic tundra soil sample (>6 million postassembly sequences). The Illumina 16S rRNA gene libraries represent a substantial increase in number of sequences over all extant next-generation sequencing approaches (e.g., 454 pyrosequencing), while the assembly of paired-end 125-base reads offers a methodological advantage by incorporating an initial quality control step for each 16S rRNA gene sequence. This method incorporates indexed primers to enable the characterization of multiple microbial communities in a single flow cell lane, may be modified readily to target other variable regions or genes, and demonstrates unprecedented and economical access to DNAs from organisms that exist at low relative abundances.
