1.

BreaKmer: detection of structural variation in targeted massively parallel sequencing data using kmers

Ryan P. Abo Matthew Ducar Elizabeth P. Garcia Aaron R. Thorner Vanesa Rojas-Rudilla Ling Lin Lynette M. Sholl William C. Hahn Matthew Meyerson Neal I. Lindeman Paul Van Hummelen Laura E. MacConaill 《Nucleic acids research》2015,43(3):e19

Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for ‘targeted’ resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings.
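The abstract above does not spell out how the 'kmer' strategy works, so the following is only a minimal sketch of one common way such a comparison can be done, not BreaKmer's actual implementation: collect the kmers found in the sample reads and subtract those present in the reference target sequence, leaving kmers that may span a breakpoint. The function names, the toy sequences and the choice of k = 4 are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(reference, reads, k):
    """Return kmers seen in the reads but absent from the reference.

    Hypothetical helper for illustration only; a real tool would also
    handle reverse complements, sequencing errors and read quality.
    """
    ref_kmers = kmers(reference, k)
    sample_kmers = set()
    for read in reads:
        sample_kmers |= kmers(read, k)
    return sample_kmers - ref_kmers


# Toy data (assumed): the second read diverges from the reference
# after "CGTT", as a read crossing a breakpoint might.
reference = "ACGTACGTTAGC"
reads = ["ACGTACGT", "CGTTGGAG"]
print(sorted(novel_kmers(reference, reads, 4)))
# → ['GGAG', 'GTTG', 'TGGA', 'TTGG']
```

Reads contributing such sample-only kmers are the natural candidates for the assembly and realignment steps the abstract describes.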

2.

Flow cytometry for enrichment and titration in massively parallel DNA sequencing

Julia Sandberg Patrik L. Ståhl Afshin Ahmadian Magnus K. Bjursell Joakim Lundeberg 《Nucleic acids research》2009,37(8):e63

Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration, is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols.

3.

Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing (total citations: 2; self-citations: 0; citations by others: 2)

Robertson G Hirst M Bainbridge M Bilenky M Zhao Y Zeng T Euskirchen G Bernier B Varhol R Delaney A Thiessen N Griffith OL He A Marra M Snyder M Jones S 《Nature methods》2007,4(8):651-657

4.

An integrative genomic and epigenomic approach for the study of transcriptional regulation

Figueroa ME Reimers M Thompson RF Ye K Li Y Selzer RR Fridriksson J Paietta E Wiernik P Green RD Greally JM Melnick A 《PloS one》2008,3(3):e1882

5.

MochiView: versatile software for genome browsing and DNA motif analysis

Oliver R Homann Alexander D Johnson 《BMC biology》2010,8(1):49

Background

As high-throughput technologies rapidly generate genome-scale data, it becomes increasingly important to visually integrate these data so that specific hypotheses can be formulated and tested.

6.

ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

Misawa K Kamatani N 《Source code for biology and medicine》2011,6(1):10-4

Background

Use of missing genotype imputations and haplotype reconstructions is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required.

Results

We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, populations in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo.

Conclusions

ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

7.

Noninvasive prenatal diagnosis of fetal trisomy 21 by allelic ratio analysis using targeted massively parallel sequencing of maternal plasma DNA

Liao GJ Chan KC Jiang P Sun H Leung TY Chiu RW Lo YM 《PloS one》2012,7(5):e38154

8.

seqCNA: an R package for DNA copy number analysis in cancer using high-throughput sequencing

David Mosen-Ansorena Naiara Telleria Silvia Veganzones Virginia De la Orden Maria Luisa Maestro Ana M Aransay 《BMC genomics》2014,15(1)

Background

Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles.

Results

We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under great read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation.

Conclusions

seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.

9.

Recurrence time statistics: versatile tools for genomic DNA sequence analysis

Cao Y Tung WW Gao JB Qi Y 《Journal of bioinformatics and computational biology》2005,3(3):677-696

With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and a computational time of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
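The abstract does not define recurrence time precisely, so the sketch below assumes a simple reading: for every word of length k, record the distance between each occurrence and the previous occurrence of the same word, then histogram those distances. A tandem repeat of period p then shows up as a peak at p. The function name, the word length and the dictionary-based bookkeeping are illustrative assumptions, not the authors' algorithm or its stated memory and time bounds.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between successive occurrences of each k-mer.

    Hypothetical helper: maps recurrence time (in bases) to how often
    that time was observed anywhere in seq.
    """
    last_seen = {}      # k-mer -> index of its most recent occurrence
    histogram = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


# Toy sequence (assumed): three copies of "ATG" followed by "CCC".
print(recurrence_times("ATGATGATGCCC", 3))
# → Counter({3: 4})
```

In the toy sequence every repeated 3-mer recurs after exactly three bases, which is the period of the embedded tandem repeat.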

10.

Targeted genomic capture and massively parallel sequencing to identify genes for hereditary hearing loss in middle eastern families

Brownstein Z Friedman LM Shahin H Oron-Karni V Kol N Abu Rayyan A Parzefall T Lev D Shalev S Frydman M Davidov B Shohat M Rahile M Lieberman S Levy-Lahad E Lee MK Shomron N King MC Walsh T Kanaan M Avraham KB 《Genome biology》2011,12(9):R89-11

Background

Identification of genes responsible for medically important traits is a major challenge in human genetics. Due to the genetic heterogeneity of hearing loss, targeted DNA capture and massively parallel sequencing are ideal tools to address this challenge. Our subjects for genome analysis are Israeli Jewish and Palestinian Arab families with hearing loss that varies in mode of inheritance and severity.

Results

A custom 1.46 MB design of cRNA oligonucleotides was constructed containing 246 genes responsible for either human or mouse deafness. Paired-end libraries were prepared from 11 probands and bar-coded multiplexed samples were sequenced to high depth of coverage. Rare single base pair and indel variants were identified by filtering sequence reads against polymorphisms in dbSNP132 and the 1000 Genomes Project. We identified deleterious mutations in CDH23, MYO15A, TECTA, TMC1, and WFS1. Critical mutations of the probands co-segregated with hearing loss. Screening of additional families in a relevant population was performed. TMC1 p.S647P proved to be a founder allele, contributing to 34% of genetic hearing loss in the Moroccan Jewish population.

Conclusions

Critical mutations were identified in 6 of the 11 original probands and their families, leading to the identification of causative alleles in 20 additional probands and their families. The integration of genomic analysis into early clinical diagnosis of hearing loss will enable prediction of related phenotypes and enhance rehabilitation. Characterization of the proteins encoded by these genes will enable an understanding of the biological mechanisms involved in hearing loss.

11.

Pseudo-Sanger sequencing: massively parallel production of long and near error-free reads using NGS technology

Jue Ruan Lan Jiang Zechen Chong Qiang Gong Heng Li Chunyan Li Yong Tao Caihong Zheng Weiwei Zhai David Turissini Charles H Cannon Xuemei Lu Chung-I Wu 《BMC genomics》2013,14(1)

Background

Next generation sequencing (NGS) technology typically offers ultra-high throughput, but its read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS can computationally extend the read length, but with considerable practical inconvenience because of the inherent gaps. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is intriguing but challenging.

Results

We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It tries to fill in the gaps between paired ends and could generate near error-free sequences equivalent to conventional Sanger reads in length but with the high throughput of next generation sequencing. The major novelty of the PS method is that the gap filling is based on local assembly of paired-end reads that overlap at either end. Thus, we are able to fill in the gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has 3 advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5 fold improvement over the existing de novo assembly methods and a 3 fold advantage over the assembly of long reads from 454 sequencing.
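The N50 contig figure quoted above is a standard assembly summary statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled bases. As a minimal sketch of that definition (the function name and the toy contig lengths are assumptions, and this is not code from the paper):

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Walk the contigs from longest to shortest and return the length at
    which the running total first reaches half of the assembly size.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly (assumed lengths, in kb): total is 290, half is 145.
print(n50([80, 70, 50, 40, 30, 20]))
# → 70
```

Here the two longest contigs (80 + 70 = 150) are the first to cover half of the 290 kb total, so the N50 is 70.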

Conclusions

Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly could benefit substantially from these Sanger-like reads. In addition, these long reads could be applied to tasks such as structural variation detection and metagenomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.

12.

Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing

Misra S Agrawal A Liao WK Choudhary A 《Bioinformatics (Oxford, England)》2011,27(2):189-195

13.

CGHAnalyzer: a stand-alone software package for cancer genome analysis using array-based DNA copy number data

Margolin AA Greshock J Naylor TL Mosse Y Maris JM Bignell G Saeed AI Quackenbush J Weber BL 《Bioinformatics (Oxford, England)》2005,21(15):3308-3311

SUMMARY: This synopsis provides an overview of array-based comparative genomic hybridization data display, abstraction and analysis using CGHAnalyzer, a software suite, designed specifically for this purpose. CGHAnalyzer can be used to simultaneously load copy number data from multiple platforms, query and describe large, heterogeneous datasets and export results. Additionally, CGHAnalyzer employs a host of algorithms for microarray analysis that include hierarchical clustering and class differentiation. AVAILABILITY: CGHAnalyzer, the accompanying manual, documentation and sample data are available for download at http://acgh.afcri.upenn.edu. This is a Java-based application built in the framework of the TIGR MeV that can run on Microsoft Windows, Macintosh OSX and a variety of Unix-based platforms. It requires the installation of the free Java Runtime Environment 1.4.1 (or more recent) (http://www.java.sun.com).

14.

Strainer: software for analysis of population variation in community genomic datasets

John M Eppley Gene W Tyson Wayne M Getz Jillian F Banfield 《BMC bioinformatics》2007,8(1):398

Background

Metagenomic analyses of microbial communities that are comprehensive enough to provide multiple samples of most loci in the genomes of the dominant organism types will also reveal patterns of genetic variation within natural populations. New bioinformatic tools will enable visualization and comprehensive analysis of this sequence variation and inference of recent evolutionary and ecological processes.

15.

COHCAP: an integrative genomic pipeline for single-nucleotide resolution DNA methylation analysis

Charles D. Warden Heehyoung Lee Joshua D. Tompkins Xiaojin Li Charles Wang Arthur D. Riggs Hua Yu Richard Jove Yate-Ching Yuan 《Nucleic acids research》2013,41(11):e117

COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution DNA methylation data produced by either an Illumina methylation array or targeted bisulfite sequencing. The goal of the COHCAP algorithm is to identify CpG islands that show a consistent pattern of methylation among CpG sites. COHCAP is currently the only DNA methylation package that provides integration with gene expression data to identify a subset of CpG islands that are most likely to regulate downstream gene expression, and it can generate lists of differentially methylated CpG islands with ∼50% concordance with gene expression from both cell line data and heterogeneous patient data. For example, this article describes known breast cancer biomarkers (such as estrogen receptor) with a negative correlation between DNA methylation and gene expression. COHCAP also provides visualization for quality control metrics, regions of differential methylation and correlation between methylation and gene expression. This software is freely available at https://sourceforge.net/projects/cohcap/.

16.

High-resolution functional mapping of the venezuelan equine encephalitis virus genome by insertional mutagenesis and massively parallel sequencing

Beitzel BF Bakken RR Smith JM Schmaljohn CS 《PLoS pathogens》2010,6(10):e1001146

We have developed a high-resolution genomic mapping technique that combines transposon-mediated insertional mutagenesis with either capillary electrophoresis or massively parallel sequencing to identify functionally important regions of the Venezuelan equine encephalitis virus (VEEV) genome. We initially used a capillary electrophoresis method to gain insight into the role of the VEEV nonstructural protein 3 (nsP3) in viral replication. We identified several regions in nsP3 that are intolerant to small (15 bp) insertions, and thus are presumably functionally important. We also identified nine separate regions in nsP3 that will tolerate small insertions at low temperatures (30°C), but not at higher temperatures (37°C and 40°C). Because we found this method to be extremely effective at identifying temperature sensitive (ts) mutations, but limited by capillary electrophoresis capacity, we replaced the capillary electrophoresis with massively parallel sequencing and used the improved method to generate a functional map of the entire VEEV genome. We identified several hundred potential ts mutations throughout the genome and we validated several of the mutations in nsP2, nsP3, E3, E2, E1 and capsid using single-cycle growth curve experiments with virus generated through reverse genetics. We further demonstrated that two of the nsP3 ts mutants were attenuated for virulence in mice but could elicit protective immunity against challenge with wild-type VEEV. The recombinant ts mutants will be valuable tools for further studies of VEEV replication and virulence. Moreover, the method that we developed is applicable for generating such tools for any virus with a robust reverse genetics system.

17.

Global analysis of in vivo Foxa2-binding sites in mouse adult liver using massively parallel sequencing (total citations: 1; self-citations: 0; citations by others: 1)

Wederell ED Bilenky M Cullum R Thiessen N Dagpinar M Delaney A Varhol R Zhao Y Zeng T Bernier B Ingham M Hirst M Robertson G Marra MA Jones S Hoodless PA 《Nucleic acids research》2008,36(14):4549-4564

18.

Eval: A software package for analysis of genome annotations

Evan Keibler Michael R Brent 《BMC bioinformatics》2003,4(1):50

19.

cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data

Evangelos Bellos Michael R Johnson Lachlan J M Coin 《Genome biology》2012,13(12):R120

Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq

20.

GROMACS 3.0: a package for molecular simulation and trajectory analysis (total citations: 19; self-citations: 0; citations by others: 19)

Erik Lindahl Berk Hess David van der Spoel 《Journal of molecular modeling》2001,7(8):306-317
