Similar literature
Found 20 similar articles (search time: 15 ms)
1.
We evaluated phylogenetic clustering of bacterial and archaeal communities from redox-dynamic subtropical forest soils that were defined by 16S rRNA and rRNA gene sequences. We observed significant clustering for the RNA-based communities but not the DNA-based communities, as well as increasing clustering over time of the highly active taxa detected by only rRNA.

2.
Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach to exploring the genetic diversity in a microbial community with these data is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarity. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster sequences into OTUs, relatively little guidance is available on their relative performance and on the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an underestimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed in choosing both a dissimilarity threshold and an abundance threshold for OTU inference.
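To make the role of the dissimilarity threshold concrete, here is a minimal Python sketch of threshold-based hierarchical OTU clustering. It is not one of the ten evaluated methods: it assumes pre-aligned reads of equal length, uses the mismatch fraction as the dissimilarity, and applies complete linkage. The function names and toy reads are illustrative assumptions.

```python
from itertools import combinations


def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatching positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(reads, threshold=0.03):
    """Complete-linkage agglomerative clustering that stops at `threshold`.

    Returns a list of clusters, each a list of indices into `reads`.
    Assumption: a brute-force toy version, far too slow for real data sets.
    """
    clusters = [[i] for i in range(len(reads))]

    def linkage(c1, c2):
        # Complete linkage: the largest pairwise dissimilarity between clusters.
        return max(dissimilarity(reads[i], reads[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break  # the closest pair is already too far apart
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


# Toy reads (hypothetical): two pairs that differ at 1 of 20 positions (5%).
toy_reads = ["A" * 20, "A" * 19 + "C", "G" * 20, "G" * 19 + "T"]
otus_loose = cluster_otus(toy_reads, threshold=0.06)
otus_strict = cluster_otus(toy_reads, threshold=0.01)
```

On these toy reads the looser threshold merges each near-identical pair into one OTU, while the stricter one keeps all four reads apart, so the estimated OTU count depends directly on the threshold chosen.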

3.
Recent developments in next-generation sequencing technologies have led to a rapid accumulation of 16S rRNA sequences for microbiome profiling. One key step in data processing is to cluster short sequences into operational taxonomic units (OTUs). Although many methods have been proposed for OTU inference, a major challenge is the balance between inference accuracy and computational efficiency, where accuracy is often sacrificed to accommodate the need to analyze large numbers of sequences. Inspired by the hierarchical clustering method and a modified greedy network clustering algorithm, we propose a novel multi-seed heuristic clustering method, named MSClust, for OTU inference. MSClust first adaptively selects multiple seeds, instead of one seed, for each candidate cluster, and the reads are then processed using a greedy clustering strategy. Through many numerical examples, we demonstrate that MSClust uses less memory and achieves better biological accuracy than existing heuristic clustering methods while preserving efficiency and scalability.
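The general idea of keeping several seeds per cluster in a greedy pass can be sketched in Python as follows. This is not the MSClust algorithm: the abstract does not specify how seeds are selected adaptively, so the rule used here (each joining read becomes a seed until `max_seeds` is reached) is a placeholder assumption, as are all names and the toy reads.

```python
def dissimilarity(a: str, b: str) -> float:
    """Fraction of mismatching positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_multi_seed(reads, threshold=0.03, max_seeds=3):
    """One greedy pass over `reads`, keeping up to `max_seeds` seeds per cluster.

    A read joins the first cluster that has any seed within `threshold`;
    otherwise it starts a new cluster. The seed-selection rule is a
    hypothetical stand-in, not the published adaptive selection.
    """
    clusters = []  # each cluster: {"seeds": [...], "members": [...]}
    for read in reads:
        for cluster in clusters:
            if any(dissimilarity(read, s) <= threshold for s in cluster["seeds"]):
                cluster["members"].append(read)
                if len(cluster["seeds"]) < max_seeds and read not in cluster["seeds"]:
                    cluster["seeds"].append(read)
                break
        else:
            clusters.append({"seeds": [read], "members": [read]})
    return clusters


# Toy reads (hypothetical), ordered as a real pipeline might order by abundance.
toy_reads = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAAAAACC", "GGGGGGGGGG"]
single_seed = greedy_multi_seed(toy_reads, threshold=0.15, max_seeds=1)
multi_seed = greedy_multi_seed(toy_reads, threshold=0.15, max_seeds=3)
```

With a single seed, the third read is too far from the first and opens its own cluster. With several seeds, it joins through the second read, so the multi-seed pass behaves more like linkage-based hierarchical clustering while still touching each read only once.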

4.
5.
6.

Background

Advances in biotechnology have changed how large populations of microbial communities, which are ubiquitous across many environments, are characterized. "Metagenome" sequencing involves decoding the DNA of organisms co-existing within ecosystems ranging from the ocean and soil to the human body. Many researchers are interested in metagenomics because it provides insight into the complex biodiversity across these environments. Clinicians are using metagenomics to determine the role that collections of microbial organisms within the human body play in human health, wellness, and disease.

Results

We have developed an efficient and scalable species richness estimation algorithm that uses locality-sensitive hashing (LSH). Our algorithm achieves efficiency by approximating the pairwise sequence comparison operations using hashing, and it also incorporates a criterion that matches fixed-length, gapless subsequences to improve the quality of sequence comparisons. We use an LSH-based similarity function to cluster similar sequences into groups called operational taxonomic units (OTUs). We also compute several species diversity and richness metrics from the OTU assignments to extend our analysis.

Conclusion

The algorithm is evaluated on synthetic samples and eight targeted 16S rRNA metagenome samples taken from seawater. We compare the performance of our algorithm with several competing diversity estimation algorithms. We show the benefits of our approach with respect to computational runtime and meaningful OTU assignments. We also demonstrate the practical significance of the developed algorithm by comparing bacterial diversity and structure across different skin locations.
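As an illustration of how hashing can stand in for exhaustive pairwise comparison, the Python sketch below uses MinHash signatures over k-mers with LSH banding to propose candidate pairs. It is a generic textbook construction, not the authors' algorithm. The k-mer length, the number of hash functions, the band count, and all function names are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict


def kmers(seq: str, k: int = 4) -> set:
    """Set of fixed-length, gapless subsequences (k-mers) of `seq`."""
    if len(seq) < k:
        raise ValueError("sequence shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    # Deterministic 64-bit hash; the seed selects one member of the hash family.
    digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(seq: str, num_hashes: int = 32, k: int = 4) -> tuple:
    """MinHash signature: the minimum hash of the k-mer set under each seed."""
    grams = kmers(seq, k)
    return tuple(min(_hash(g, seed) for g in grams) for seed in range(num_hashes))


def lsh_candidates(seqs, num_hashes: int = 32, bands: int = 8, k: int = 4) -> set:
    """Index pairs whose signatures agree on at least one whole band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(idx)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        pairs.update((i, j) for n, i in enumerate(ordered) for j in ordered[n + 1:])
    return pairs


# Toy input (hypothetical): two identical reads and one sharing no 4-mer with them.
toy_seqs = ["ACGTACGTAGCTAGCTAGGA", "ACGTACGTAGCTAGCTAGGA", "TTTTTTTTTTTTTTTTTTTT"]
candidate_pairs = lsh_candidates(toy_seqs)
```

Only pairs that land in a shared bucket need a full comparison, which is what removes the quadratic all-against-all step. More bands with fewer rows each make the filter more permissive; fewer bands with more rows make it stricter.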

7.
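In recent years, 16S amplicon sequencing has been widely used to study the structure and diversity of the gut microbiota, and it is also frequently used to detect unknown pathogens in clinical samples. However, it resolves the species composition of a sample only to relative abundance at the genus level, and many factors in the experimental workflow can affect the results, such as the starting concentration of the sample, the number of PCR cycles, and the amplification primers. To address these problems, this study combined random tags with an internal reference to develop a quantitative 16S amplicon sequencing method that converts the relative abundances obtained from conventional 16S rRNA gene sequencing into absolutely quantified copy numbers. The method effectively improves the precision of gut microbiota structure detection, reduces the influence of experimental handling on the results, and improves comparability between sequencing and other molecular biology methods, which should benefit further development and refinement of the technology.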
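The arithmetic behind converting relative abundance into absolute copy numbers can be sketched in Python as follows. This is only an illustration of the general principle: collapse reads that carry the same random tag so PCR duplicates count once, then scale by an internal reference added at a known copy number. It is not the protocol developed in the study, and the taxon names, tag strings, and the 1000-copy reference are invented for the example.

```python
from collections import defaultdict


def absolute_copies(reads, spike_name, spike_copies):
    """Estimate absolute copy numbers per taxon.

    reads        : iterable of (taxon, random_tag) pairs, one per sequencing read
    spike_name   : label of the internal-reference sequence (hypothetical name)
    spike_copies : number of reference copies added to the sample

    Assumption: each distinct random tag within a taxon marks one original
    template molecule.
    """
    tags = defaultdict(set)
    for taxon, tag in reads:
        tags[taxon].add(tag)

    spike_molecules = len(tags.get(spike_name, ()))
    if spike_molecules == 0:
        raise ValueError("internal reference not detected; cannot scale")

    scale = spike_copies / spike_molecules  # copies represented by one tagged molecule
    return {
        taxon: len(unique_tags) * scale
        for taxon, unique_tags in tags.items()
        if taxon != spike_name
    }


# Toy reads (hypothetical): the first two Bacteroides reads share a tag,
# so they count as one template molecule.
toy_reads = [
    ("Bacteroides", "AAT"), ("Bacteroides", "AAT"), ("Bacteroides", "CCG"),
    ("Lactobacillus", "GGA"),
    ("spike", "TTA"), ("spike", "TTC"),
]
estimates = absolute_copies(toy_reads, spike_name="spike", spike_copies=1000)
```

Here two tagged reference molecules represent 1000 copies, so each distinct tag stands for 500 copies. Bacteroides therefore comes out at 1000 copies and Lactobacillus at 500, independent of how many PCR duplicates were sequenced.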

8.
High-throughput sequencing can produce hundreds of thousands of 16S rRNA sequence reads corresponding to the different organisms present in environmental samples. Typically, bioinformatic analysis of microbial diversity starts with pre-processing, followed by clustering of the 16S rRNA reads into a much smaller number of operational taxonomic units (OTUs). The OTUs are reliable indicators of microbial diversity and greatly reduce downstream analysis time. However, existing hierarchical clustering algorithms, which are generally more accurate than greedy heuristic algorithms, struggle with large sequence datasets. To keep pace with the rapid rise in sequencing data, we present CLUSTOM-CLOUD, which is the first distributed sequence clustering program based on In-Memory Data Grid (IMDG) technology, a distributed data structure that stores all data in the main memory of multiple computing nodes. The IMDG technology enables CLUSTOM-CLOUD to handle larger datasets and to scale better computationally than its predecessor, CLUSTOM, while maintaining high accuracy. The clustering speed of CLUSTOM-CLOUD was evaluated on published 16S rRNA human microbiome sequence datasets using a small laboratory cluster (10 nodes) and the Amazon EC2 cloud-computing environment. In the laboratory environment, it required only ~3 hours to process a dataset of 200 K reads, regardless of the complexity of the human microbiome data. In turn, one million reads were processed in approximately 20, 14, and 11 hours when utilizing 20, 30, and 40 nodes, respectively, in the Amazon EC2 cloud-computing environment. The running time evaluation indicates that CLUSTOM-CLOUD can handle much larger sequence datasets than CLUSTOM and is also a scalable distributed processing system. The comparative accuracy test using 16S rRNA pyrosequences of a mock community shows that CLUSTOM-CLOUD achieves higher accuracy than DOTUR, mothur, ESPRIT-Tree, UCLUST and Swarm.
CLUSTOM-CLOUD is written in Java and is freely available at http://clustomcloud.kopri.re.kr.
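The scalability claim can be checked with a short Python calculation of speedup and parallel efficiency relative to the 20-node run. It uses only the approximate running times quoted above (20, 14, and 11 hours), so the results are rough; the function name is an illustrative assumption.

```python
def relative_efficiency(base_nodes, base_hours, nodes, hours):
    """Parallel efficiency of a run relative to a baseline run.

    speedup    = base_hours / hours
    efficiency = speedup / (nodes / base_nodes); 1.0 means ideal linear scaling.
    """
    speedup = base_hours / hours
    return speedup / (nodes / base_nodes)


# Approximate figures from the abstract: one million reads on Amazon EC2.
runs = {20: 20.0, 30: 14.0, 40: 11.0}  # nodes -> hours
efficiency = {
    nodes: relative_efficiency(20, runs[20], nodes, hours)
    for nodes, hours in runs.items()
}
```

Under these numbers, going from 20 to 30 nodes retains about 95% efficiency and going to 40 nodes about 91%, which is consistent with the near-linear scaling the abstract describes.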

9.
A 16S rRNA fluorescence in situ hybridization (FISH) method for cheese was developed to allow detection in situ of microorganisms within the dairy matrix. An embedding procedure using a plastic resin was applied to Stilton cheese, providing intact embedded cheese sections withstanding the hybridization reaction. The use of a fluorescein-labelled 16S rRNA Domain Bacteria probe allowed observation of large colonies of microbial cells homogeneously distributed in the cheese matrix. FISH experiments performed on cheese suspensions provided images of the different microbial morphotypes occurring. The technique has great potential to study the spatial distribution of microbial populations in situ in foods, especially where the matrix is too fragile to allow manipulation of cryosections.

10.
Taxonomy-independent analysis plays an essential role in microbial community analysis. Hierarchical clustering is one of the most widely employed approaches to finding operational taxonomic units, the basis for many downstream analyses. Most existing algorithms have quadratic space and computational complexities, and thus can be used only for small or medium-scale problems. We propose a new online learning-based algorithm that simultaneously addresses the space and computational issues of prior work. The basic idea is to partition a sequence space into a set of subspaces using a partition tree constructed using a pseudometric, then recursively refine a clustering structure in these subspaces. The technique relies on new methods for fast closest-pair searching and efficient dynamic insertion and deletion of tree nodes. To avoid exhaustive computation of pairwise distances between clusters, we represent each cluster of sequences as a probabilistic sequence, and define a set of operations to align these probabilistic sequences and compute genetic distances between them. We present analyses of space and computational complexity, and demonstrate the effectiveness of our new algorithm using a human gut microbiota data set with over one million sequences. The new algorithm exhibits a quasilinear time and space complexity comparable to greedy heuristic clustering algorithms, while achieving a similar accuracy to the standard hierarchical clustering algorithm.
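One way to picture a "probabilistic sequence" is as a per-position nucleotide frequency profile. The Python sketch below builds such profiles and compares two clusters by their expected mismatch rate. The abstract does not define its operations, so both the representation and the distance are assumptions made for illustration. Alignment is omitted: sequences are taken to be pre-aligned, equal in length, and restricted to A, C, G, and T.

```python
from collections import Counter

ALPHABET = "ACGT"


def profile(seqs):
    """Per-position nucleotide frequencies for a cluster of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    columns = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        columns.append({base: counts[base] / len(seqs) for base in ALPHABET})
    return columns


def expected_distance(p, q):
    """Mean probability that bases drawn from the two profiles differ at a site.

    Assumption: one plausible distance between profiles, not necessarily the
    genetic distance used in the paper.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    per_site = [
        1.0 - sum(col_p[b] * col_q[b] for b in ALPHABET)
        for col_p, col_q in zip(p, q)
    ]
    return sum(per_site) / len(per_site)


# Toy clusters (hypothetical): one with two members, one singleton.
cluster_a = profile(["ACGT", "ACGA"])
cluster_b = profile(["ACGT"])
distance_ab = expected_distance(cluster_a, cluster_b)
```

Comparing two profiles costs time proportional to the sequence length, regardless of how many sequences each cluster holds, which is the point of summarizing clusters this way instead of computing all member-to-member distances.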

11.
Methylation of G1405 within bacterial 16S ribosomal RNA results in high-level resistance to specific combinations of aminoglycoside antibiotics. Only a few closely related methyltransferases (MTases) that carry out this modification (here dubbed "Agr", for aminoglycoside resistance) are known. It is not clear whether they are related to "typical" S-adenosylmethionine (AdoMet)-dependent MTases. Demydchuk et al. (1998) proposed that the cofactor-binding region is localized at the C-terminus of Agr MTases, which implies an interesting case of sequence permutation. Since the Agr MTases lack significant sequence similarity to other proteins, we tested that hypothesis using a more sensitive sequence/structure threading approach. Structure prediction confirmed the presence of a putative AdoMet-binding site in these proteins, albeit at a distinct location, resembling that of "typical", non-permuted MTases. Additionally, a small alpha-helical domain dissimilar to other proteins in the database was identified in the N-terminal region of Agr MTases. Comparison of a three-dimensional model of the Agr family member with a recently solved structure of reovirus mRNA capping MTase suggests that the mechanism of guanine-N7 methylation in rRNA and mRNA may be different.

12.
Streamlined method to analyze 16S rRNA gene clone libraries   Total citations: 5, self-citations: 0, citations by others: 5
Vergin KL, Rappé MS, Giovannoni SJ. BioTechniques, 2001, 30(5): 938-40, 943-4

13.

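16S ribosomal RNA (16S rRNA) gene sequencing is an important tool for microbial analysis. The raw data from 16S rRNA gene sequencing are complex and contain many errors, so sequence quality control is generally required before analysis: the data are denoised, dereplicated, and filtered for chimeras, and the quality-controlled data are finally grouped into operational taxonomic units (OTUs) or amplicon sequence variants (ASVs), on which the various microbiota analyses are then performed. After years of refinement, UPARSE, DADA2, Deblur, and UNOISE3 have become the mainstream clustering methods; OTU clustering can no longer meet research needs, whereas ASV clustering makes microbiota analysis more accurate. In addition to reviewing progress in clustering methods, this article describes research progress on several 16S gene sequencing analysis tools, including USEARCH, MOTHUR, and QIIME.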


14.
The 30S ribosomal P site: a function of 16S rRNA   Total citations: 1, self-citations: 0, citations by others: 1
Noller HF, Hoang L, Fredrick K. FEBS Letters, 2005, 579(4): 855-858
The 30S ribosomal P site serves several functions in translation. It must specifically bind initiator tRNA during formation of the 30S initiation complex; bind the anticodon stem-loop of peptidyl-tRNA during the elongation phase; and help to maintain the translational reading frame when the A site is unoccupied. Early experiments provided evidence that 16S rRNA was an important component of the 30S P site. Footprinting and crosslinking studies later implicated specific nucleotides in interactions with tRNA. The crystal structures of the 30S subunit and 70S ribosome-tRNA complexes confirmed the interactions between 16S rRNA and tRNA, but also revealed contacts between tRNA and the C-terminal tails of proteins S9 and S13. Deletion of these tails now shows that the 16S rRNA contacts alone are sufficient to support protein synthesis in living cells.

15.
16.
The 16S and 23S rRNA higher-order structures inferred from comparative analysis are now quite refined. The models presented here differ from their immediate predecessors only in minor detail. Thus, it is safe to assert that all of the standard secondary-structure elements in (prokaryotic) rRNAs have been identified, with approximately 90% of the individual base pairs in each molecule having independent comparative support, and that at least some of the tertiary interactions have been revealed. It is interesting to compare the rRNAs in this respect with tRNA, whose higher-order structure is known in detail from its crystal structure (36) (Table 2). It can be seen that rRNAs have as great a fraction of their sequence in established secondary-structure elements as does tRNA. However, the fact that the former show a much lower fraction of identified tertiary interactions and a greater fraction of unpaired nucleotides than the latter implies that many of the rRNA tertiary interactions remain to be located. (Alternatively, the ribosome might involve protein-rRNA rather than intramolecular rRNA interactions to stabilize three-dimensional structure.) Experimental studies on rRNA are consistent to a first approximation with the structures proposed here, confirming the basic assumption of comparative analysis, i.e., that bases whose compositions strictly covary are physically interacting. In the exhaustive study of Moazed et al. (45) on protection of the bases in the small-subunit rRNA against chemical modification, the vast majority of bases inferred to pair by covariation are found to be protected from chemical modification, both in isolated small-subunit rRNA and in the 30S subunit. The majority of the tertiary interactions are reflected in the chemical protection data as well (45). On the other hand, many of the bases not shown as paired in Fig. 1 are accessible to chemical attack (45). 
However, in this case a sizeable fraction of them are also protected against chemical modification (in the isolated rRNA), which suggests that considerable higher-order structure remains to be found (although all of it may not involve base-base interactions and so may not be detectable by comparative analysis). The agreement between the higher-order structure of the small-subunit rRNA and protection against chemical modification is not perfect, however; some bases shown to covary canonically are accessible to chemical modification (45). (ABSTRACT TRUNCATED AT 400 WORDS)

17.
18.
Boreal soils have been suspected reservoirs of infectious environmental mycobacteria. Detection of these bacteria in the environment is hampered by their slow growth. We applied a quantitative sandwich hybridization approach for direct detection of mycobacterial 16S rRNA in soil without a nucleic acid amplification step. The numbers of mycobacterial 16S rRNA molecules found in the soil indicated the presence of up to 10^7 to 10^8 mycobacterial cells per gram of soil. These numbers exceed previous culture-based estimates of mycobacteria in soil by a factor of 10 to 100. When real-time PCR with mycobacteria-targeting primers was used to estimate the number of 16S rDNA copies in soil, one copy of 16S rDNA was detected per 10^4 copies of 16S rRNA. This is close to the number of 16S rRNA molecules detected per cell by the same method in laboratory pure cultures of M. chlorophenolicum. A major part of the mycobacterial DNA in the studied soils may thus have represented metabolically active cells. The 16S rRNA sandwich hybridization method described in this paper offers a culture-independent solution for tracking environmental reservoirs of viable and potentially infectious mycobacteria.

19.

Background

The intra- and inter-species genetic diversity of bacteria and the absence of ‘reference’, or most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility and compare the performance of several clustering and classification algorithms for identifying the species of 364 16S rRNA gene sequences with a defined species in GenBank and 110 16S rRNA gene sequences with no defined species, all within the genus Nocardia.

Methods

A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm, linear mapping (LM) of the distance matrix. Principal Components Analysis was used for dimensionality reduction and visualization.

Results

The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as ‘centroids’ in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578.

Conclusion

The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra-species variability.
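Two steps described above, picking the most representative sequence of a cluster and classifying unlabeled sequences with kNN, can both be run directly on a precomputed distance matrix. The Python sketch below illustrates them; it does not implement the LM algorithm. The "centroid" is treated as a medoid (the member with the smallest summed distance to the others), and the distance values and species labels are invented toy data.

```python
from collections import Counter


def medoid(indices, dist):
    """Cluster member whose summed distance to all other members is smallest."""
    return min(indices, key=lambda i: sum(dist[i][j] for j in indices))


def knn_label(query_dists, labels, k=3):
    """Majority label among the k reference sequences nearest to the query.

    query_dists[i] is the distance from the query to reference sequence i.
    """
    nearest = sorted(range(len(labels)), key=lambda i: query_dists[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy symmetric distance matrix (hypothetical values) for four reference sequences.
toy_dist = [
    [0, 1, 2, 9],
    [1, 0, 1, 9],
    [2, 1, 0, 9],
    [9, 9, 9, 0],
]
representative = medoid([0, 1, 2], toy_dist)

# Placeholder labels; a real run would use the species names recorded in GenBank.
toy_labels = ["species_A", "species_A", "species_B", "species_B"]
predicted = knn_label([0.5, 0.4, 0.6, 8.0], toy_labels, k=3)
```

In the toy matrix, sequence 1 sits between sequences 0 and 2 and is therefore the medoid of that cluster. The query's three nearest references vote two to one for `species_A`.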

20.
The nucleotide sequence of 16S rDNA from Euglena gracilis chloroplasts has been determined, representing the first complete sequence of an algal chloroplast rRNA gene. The structural part of the 16S rRNA gene has 1491 nucleotides according to a comparative analysis of our sequencing results with the published 5'- and 3'-terminal "T1-oligonucleotides" from 16S rRNA from E. gracilis. Alignment with 16S rDNA from Zea mays chloroplasts and E. coli reveals 80% and 72% sequence homology, respectively. Two deletions of 9 and 23 nucleotides are found which are identical in size and position to deletions observed in 16S rDNA of maize and tobacco chloroplasts and which seem to be characteristic of all chloroplast rRNA species. We also find insertions and deletions in E. gracilis not seen in 16S rDNA of higher plant chloroplasts. The 16S rRNA sequence of E. gracilis chloroplasts can be folded by base pairing according to the general 16S rRNA secondary structure model.
