期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

首页 | 本学科首页

官方微博 | 高级检索

相似文献

共查询到20条相似文献，搜索用时 0 毫秒

1.

Rapid evaluation and quality control of next generation sequencing data with FaQCs

Chien-Chi Lo Patrick S G Chain 《BMC bioinformatics》2014,15(1)

Background

Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform’s sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects.

Results

Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics.

Conclusion

FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0366-2) contains supplementary material, which is available to authorized users. 相似文献

2.

SAMStat: monitoring biases in next generation sequencing data

Lassmann T Hayashizaki Y Daub CO 《Bioinformatics (Oxford, England)》2011,27(1):130-131

相似文献

3.

In silico secretome analysis approach for next generation sequencing transcriptomic data

Garg G Ranganathan S 《BMC genomics》2011,12(Z3):S14

相似文献

4.

Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data

Althammer S González-Vallinas J Ballaré C Beato M Eyras E 《Bioinformatics (Oxford, England)》2011,27(24):3333-3340

相似文献

5.

PoPoolation: a toolbox for population genetic analysis of next generation sequencing data from pooled individuals 总被引：1，自引：0，他引：1

Kofler R Orozco-terWengel P De Maio N Pandey RV Nolte V Futschik A Kosiol C Schlötterer C 《PloS one》2011,6(1):e15925

Recent statistical analyses suggest that sequencing of pooled samples provides a cost effective approach to determine genome-wide population genetic parameters. Here we introduce PoPoolation, a toolbox specifically designed for the population genetic analysis of sequence data from pooled individuals. PoPoolation calculates estimates of θ(Watterson), θ(π), and Tajima's D that account for the bias introduced by pooling and sequencing errors, as well as divergence between species. Results of genome-wide analyses can be graphically displayed in a sliding window plot. PoPoolation is written in Perl and R and it builds on commonly used data formats. Its source code can be downloaded from http://code.google.com/p/popoolation/. Furthermore, we evaluate the influence of mapping algorithms, sequencing errors, and read coverage on the accuracy of population genetic parameter estimates from pooled data. 相似文献

6.

Assessment of metagenomic assembly using simulated next generation sequencing data

Mende DR Waller AS Sunagawa S Järvelin AI Chan MM Arumugam M Raes J Bork P 《PloS one》2012,7(2):e31386

Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available. 相似文献

7.

Mobster: accurate detection of mobile element insertions in next generation sequencing data

Djie Tjwan Thung Joep de Ligt Lisenka EM Vissers Marloes Steehouwer Mark Kroon Petra de Vries Eline P Slagboom Kai Ye Joris A Veltman Jayne Y Hehir-Kwa 《Genome biology》2014,15(10)

Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered due to the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0488-x) contains supplementary material, which is available to authorized users. 相似文献

8.

LymAnalyzer: a tool for comprehensive analysis of next generation sequencing data of T cell receptors and immunoglobulins

Yaxuan Yu Rhodri Ceredig Cathal Seoighe 《Nucleic acids research》2016,44(4):e31

The adaptive immune system includes populations of B and T cells capable of binding foreign epitopes via antigen specific receptors, called immunoglobulin (IG) for B cells and the T cell receptor (TCR) for T cells. In order to provide protection from a wide range of pathogens, these cells display highly diverse repertoires of IGs and TCRs. This is achieved through combinatorial rearrangement of multiple gene segments in addition, for B cells, to somatic hypermutation. Deep sequencing technologies have revolutionized analysis of the diversity of these repertoires; however, accurate TCR/IG diversity profiling requires specialist bioinformatics tools. Here we present LymAnalzyer, a software package that significantly improves the completeness and accuracy of TCR/IG profiling from deep sequence data and includes procedures to identify novel alleles of gene segments. On real and simulated data sets LymAnalyzer produces highly accurate and complete results. Although, to date we have applied it to TCR/IG data from human and mouse, it can be applied to data from any species for which an appropriate database of reference genes is available. Implemented in Java, it includes both a command line version and a graphical user interface and is freely available at https://sourceforge.net/projects/lymanalyzer/. 相似文献

9.

下一代测序cDNA样品制备及优化技术探讨DD

秦巧平张岚岚李南羿崔永一徐凯《遗传》2010,32(9)

下一代测序技术目前已经应用于微生物、人类、动物、植物等的基因组分析.样品制备是开展大规模测序的必要前提和测序成功的根本保证.对大规模测序造成干扰的主要因素有: polyA干扰测序信号及高丰度基因对低丰度基因的掩盖等.文章以堇菜(Viola verecumda A.Gray)叶片为试材, 提取总RNA, 合成双链cDNA, 利用DSN核酸酶对双链cDNA进行均一化处理, 并对双链cDNA polyA进行了切除, 将处理后的cDNA进行了TA克隆, 挑取100个克隆随机测序.结果表明, 未处理的cDNA样本测序有15个克隆由于polyA的存在而影响了附近碱基的正确阅读, 独立克隆只有62个, 而处理后的cDNA样本经测序未发现polyA, 独立克隆有94个.序列分析发现, 经过处理的cDNA样本随机测序有两个克隆是经MALDI-TOF检测在样本中有蛋白质峰, 而基因克隆一直没有分离到的序列.以处理后的cDNA样本为模板扩增2个已知表达丰度差异较大的基因显示, 处理后这2个基因的PCR扩增产量差异明显减小.这些结果表明polyA切除、DSN核酸酶处理的cDNA样本完全满足大规模测序、寻找新基因的要求. 相似文献

10.

Evaluation of next generation sequencing platforms for population targeted sequencing studies

Olivier Harismendy Pauline C Ng Robert L Strausberg Xiaoyun Wang Timothy B Stockwell Karen Y Beeson Nicholas J Schork Sarah S Murray Eric J Topol Samuel Levy Kelly A Frazer 《Genome biology》2009,10(3):R32-13

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage, depth base calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered additional 'random sampling' errors in base calling occur.

Conclusions

Our study provides important insights into systematic biases and data variability that need to be considered when utilizing NGS platforms for population targeted sequencing studies. 相似文献

11.

Human Disease: Next-generation sequencing of the next generation

Burgess DJ 《Nature reviews. Genetics》2011,12(2):78

相似文献

12.

An integrative framework for the identification of double minute chromosomes using next generation sequencing data

Matthew?Hayes Email author Jing?Li 《BMC genetics》2015,16(Z2):S1

Background

Double minute chromosomes are circular fragments of DNA whose presence is associated with the onset of certain cancers. Double minutes are lethal, as they are highly amplified and typically contain oncogenes. Locating double minutes can supplement the process of cancer diagnosis, and it can help to identify therapeutic targets. However, there is currently a dearth of computational methods available to identify double minutes. We propose a computational framework for the idenfication of double minute chromosomes using next-generation sequencing data. Our framework integrates predictions from algorithms that detect DNA copy number variants, and it also integrates predictions from algorithms that locate genomic structural variants. This information is used by a graph-based algorithm to predict the presence of double minute chromosomes.

Results

Using a previously published copy number variant algorithm and two structural variation prediction algorithms, we implemented our framework and tested it on a dataset consisting of simulated double minute chromosomes. Our approach uncovered double minutes with high accuracy, demonstrating its plausibility.

Conclusions

Although we only tested the framework with three programs (RDXplorer, BreakDancer, Delly), it can be extended to incorporate results from programs that 1) detect amplified copy number and from programs that 2) detect genomic structural variants like deletions, translocations, inversions, and tandem repeats.The software that implements the framework can be accessed here: https://github.com/mhayes20/DMFinder

相似文献

13.

Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data

Dong-Won Seo Jae-Don Oh Shil Jin Ki-Duk Song Hee-Bok Park Kang-Nyeong Heo Younhee Shin Myunghee Jung Junhyung Park Cheorun Jo Hak-Kyo Lee Jun-Heon Lee 《Molecular biology reports》2015,42(2):471-477

相似文献

14.

Personalized pathway enrichment map of putative cancer genes from next generation sequencing data

Jia P Zhao Z 《PloS one》2012,7(5):e37595

BACKGROUND: Pathway analysis of a set of genes represents an important area in large-scale omic data analysis. However, the application of traditional pathway enrichment methods to next-generation sequencing (NGS) data is prone to several potential biases, including genomic/genetic factors (e.g., the particular disease and gene length) and environmental factors (e.g., personal life-style and frequency and dosage of exposure to mutagens). Therefore, novel methods are urgently needed for these new data types, especially for individual-specific genome data. METHODOLOGY: In this study, we proposed a novel method for the pathway analysis of NGS mutation data by explicitly taking into account the gene-wise mutation rate. We estimated the gene-wise mutation rate based on the individual-specific background mutation rate along with the gene length. Taking the mutation rate as a weight for each gene, our weighted resampling strategy builds the null distribution for each pathway while matching the gene length patterns. The empirical P value obtained then provides an adjusted statistical evaluation. PRINCIPAL FINDINGS/CONCLUSIONS: We demonstrated our weighted resampling method to a lung adenocarcinomas dataset and a glioblastoma dataset, and compared it to other widely applied methods. By explicitly adjusting gene-length, the weighted resampling method performs as well as the standard methods for significant pathways with strong evidence. Importantly, our method could effectively reject many marginally significant pathways detected by standard methods, including several long-gene-based, cancer-unrelated pathways. We further demonstrated that by reducing such biases, pathway crosstalk for each individual and pathway co-mutation map across multiple individuals can be objectively explored and evaluated. This method performs pathway analysis in a sample-centered fashion, and provides an alternative way for accurate analysis of cancer-personalized genomes. It can be extended to other types of genomic data (genotyping and methylation) that have similar bias problems. 相似文献

15.

The introduction of a targeted next generation sequencing diagnostic service for MH

Dorota Fiszer Sarah Hobson Nickla Fisher Marie-Anne Shaw Sarah Shepherd Rachel Robinson Ruth Charlton Phil Hopkins 《BMC anesthesiology》2014,14(Z1):A14

相似文献

16.

Bacterial genomes: next generation sequencing technologies for studies of bacterial ecosystems

Siv GE Andersson Andrew L Goodman 《Current opinion in microbiology》2012,15(5):603-604

相似文献

17.

CAPRG: sequence assembling pipeline for next generation sequencing of non-model organisms

Rawat A Elasri MO Gust KA George G Pham D Scanlan LD Vulpe C Perkins EJ 《PloS one》2012,7(2):e30370

相似文献

18.

Single-cell RNA-Seq: a next generation sequencing tool for a high-resolution view of the individual cell

Kevin Stevenson 《Journal of biomolecular structure & dynamics》2020,38(12):3730-3735

Communicated by Ramaswamy H. Sarma 相似文献

19.

Comparison of solution-based exome capture methods for next generation sequencing

Sulonen AM Ellonen P Almusa H Lepistö M Eldfors S Hannula S Miettinen T Tyynismaa H Salo P Heckman C Joensuu H Raivio T Suomalainen A Saarela J 《Genome biology》2011,12(9):R94-18

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons. 相似文献

20.

Erratum to: Single nucleotide polymorphism analysis of Korean native chickens using next generation sequencing data

Dong-Won Seo Jae-Don Oh Shil Jin Ki-Duk Song Hee-Bok Park Kang-Nyeong Heo Younhee Shin Myunghee Jung Junhyung Park Cheorun Jo Hak-Kyo Lee Jun-Heon Lee 《Molecular biology reports》2015,42(2):567-567

相似文献

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司京ICP备09084417号