Similar Literature
 6 similar records found (search time: 15 ms)
1.
Next-generation sequencing (NGS) technology has revolutionized and significantly impacted metagenomic research. However, NGS data usually contain sequencing artifacts such as low-quality reads and contaminating reads, which can significantly compromise downstream analysis. Many quality control (QC) tools have been proposed; however, few have been verified to be suitable or efficient for metagenomic data, which comprise multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe the status of the input data and identify potential errors, quality trimming filters out bases and reads of poor sequencing quality, and contamination screening identifies higher eukaryotic species, which are considered contamination in metagenomic data. Most computing processes are optimized with parallel programming. Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low-quality reads and contaminating reads, and the whole quality control procedure completed within 20 min. Meta-QC-Chain therefore provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is freely available at: http://computationalbioenergy.org/meta-qc-chain.html.
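The quality-trimming stage described above can be sketched in a few lines of Python. This is a minimal illustration, not Meta-QC-Chain's implementation: the thresholds (`MIN_QUAL`, `MIN_LEN`) are assumed values for demonstration, and reads are represented as plain `(header, sequence, quality)` tuples.

```python
# Sketch of FASTQ quality trimming, similar in spirit to the trimming
# stage described above. Thresholds are illustrative assumptions,
# not Meta-QC-Chain's actual defaults.

MIN_QUAL = 20   # Phred score below which 3' tail bases are trimmed
MIN_LEN = 50    # reads shorter than this after trimming are dropped

def phred_scores(qual_line):
    """Convert a Sanger/Illumina 1.8+ quality string to Phred scores."""
    return [ord(c) - 33 for c in qual_line]

def trim_read(seq, qual, min_qual=MIN_QUAL):
    """Trim low-quality bases from the 3' end of a read."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_qual:
        end -= 1
    return seq[:end], qual[:end]

def quality_filter(records, min_len=MIN_LEN):
    """Yield (header, seq, qual) records that survive trimming."""
    for header, seq, qual in records:
        seq, qual = trim_read(seq, qual)
        if len(seq) >= min_len:
            yield header, seq, qual
```

In a real pipeline this per-read filter would be the step parallelized across input chunks, which is how the abstract describes achieving its 20-minute runtime on an 8-GB dataset.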

2.
Next-generation sequencing (NGS) is routinely used in the diagnosis of hereditary diseases, such as human cardiomyopathies. Hence, it is of utmost importance to secure high-quality sequencing data, enabling the identification of disease-relevant mutations or the conclusion of negative test results. During sample preparation, each protocol for target-enrichment library preparation has its own requirements for quality control (QC); however, there is little evidence on the actual impact of these guidelines on the resulting data quality. In this study, we analyzed the impact of QC during the diverse library preparation steps of Agilent SureSelect XT target enrichment and Illumina sequencing. We quantified the parameters for a cohort of around 600 samples, including the starting amount of DNA; the amount of sheared DNA; the smallest and largest fragment sizes of the starting DNA; the amount of DNA after the pre-PCR and the smallest and largest fragment sizes of the resulting DNA; as well as the amount of the final library, the corresponding smallest and largest fragment sizes, and the number of detected variants. Intriguingly, there is a high tolerance for variation in all QC steps, meaning that within the boundaries proposed in the current study, considerable variance at each step of QC can be well tolerated without compromising NGS quality.

4.

Background

The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case–control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue.

Results

We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high- to low-quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from “empty” ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue, not only for the well-defined serial dilution datasets but also for the 52 human gut microbiota-derived metagenomic datasets.
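The core computation behind this approach, counting overlapping k-mers and summarizing their multiplicity distribution, can be sketched as follows. This is an illustrative outline rather than the authors' implementation; the choice of k is an assumption (the paper's actual k is not stated here), and production tools use far more memory-efficient counters.

```python
from collections import Counter

def kmer_counts(reads, k=16):
    """Count all overlapping k-mers across a set of reads.
    k=16 is an illustrative default, not the study's parameter."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def multiplicity_spectrum(counts):
    """Histogram of k-mer multiplicities: how many distinct k-mers
    occur once, twice, etc. Contaminants such as sequences from
    'empty' ligation products would appear as anomalous
    high-multiplicity peaks in this spectrum."""
    return Counter(counts.values())
```

For example, the read `ACGTACGT` with k=4 yields the k-mers ACGT (twice), CGTA, GTAC and TACG, giving a spectrum of three singletons and one doubly occurring k-mer; deviations of such spectra from the expected shape are what flag low-quality or contaminated datasets before any mapping is attempted.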

Conclusions

We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota, it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turnaround time, improved analytical quality including sample-quality metrics, and a significant cost reduction. Finally, improved quality assessment will have a major impact on the robustness of biological and clinical conclusions drawn from metagenomic studies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1406-7) contains supplementary material, which is available to authorized users.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号