Similar literature
 20 similar articles found (search time: 15 ms)
1.
Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for ‘targeted’ resequencing efforts. This is critically important in the clinical setting where targeted resequencing is frequently being applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a ‘kmer’ strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings.

2.
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration, is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols.

3.
4.
5.

Background  

As high-throughput technologies rapidly generate genome-scale data, it becomes increasingly important to visually integrate these data so that specific hypotheses can be formulated and tested.

6.

Background

Imputation of missing genotypes and haplotype reconstruction are valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required.

Results

We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, China samples in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo.

Conclusions

ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

7.
8.

Background

Deviations in the amount of genomic content that arise during tumorigenesis, called copy number alterations, are structural rearrangements that can critically affect gene expression patterns. Additionally, copy number alteration profiles allow insight into cancer discrimination, progression and complexity. On data obtained from high-throughput sequencing, improving quality through GC bias correction and keeping false positives to a minimum help build reliable copy number alteration profiles.

Results

We introduce seqCNA, a parallelized R package for an integral copy number analysis of high-throughput sequencing cancer data. The package includes novel methodology on (i) filtering, reducing false positives, and (ii) GC content correction, improving copy number profile quality, especially under high read coverage and high correlation between GC content and copy number. Adequate analysis steps are automatically chosen based on availability of paired-end mapping, matched normal samples and genome annotation.

Conclusions

seqCNA, available through Bioconductor, provides accurate copy number predictions in tumoural data, thanks to the extensive filtering and better GC bias correction, while providing an integrated and parallelized workflow.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-178) contains supplementary material, which is available to authorized users.
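
To make the idea of GC content correction concrete, here is a minimal Python sketch. It is not seqCNA's method (the abstract does not describe the algorithm, and seqCNA itself is an R package). It assumes a simple, commonly used scheme: windows are grouped into GC bins and each window's read count is rescaled by the ratio of the global median to the median of its bin. The function name `gc_correct` and the parameter `bin_width` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def gc_correct(counts, gc_fractions, bin_width=0.05):
    """Rescale per-window read counts so every GC bin shares the global median.

    counts       -- read counts, one per genomic window
    gc_fractions -- GC fraction (0..1) of each window, same order as counts
    bin_width    -- width of each GC bin (illustrative default)
    """
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have the same length")
    if not counts:
        return []

    # Group the counts by GC bin.
    bins = defaultdict(list)
    for count, gc in zip(counts, gc_fractions):
        bins[int(gc / bin_width)].append(count)

    global_median = median(counts)
    bin_median = {b: median(values) for b, values in bins.items()}

    # Scale each window by global median / median of its own GC bin.
    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = bin_median[int(gc / bin_width)]
        corrected.append(count * global_median / m if m > 0 else 0.0)
    return corrected
```

After correction, windows that differ only in GC content end up with comparable counts, so remaining differences are more likely to reflect copy number than sequencing bias.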

9.
With the completion of the human and a few model organisms' genomes, and with the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are those that are repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Our method requires approximately 6N bytes of memory and a computational time on the order of N log N to extract all the repeat-related and periodic or quasi-periodic features from a sequence of length N without any prior knowledge of the consensus sequence of those features, and hence enables us to carry out sequence analysis on the whole genomic scale on a PC.
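
The notion of a recurrence time can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' algorithm: it takes the recurrence time of a word of length k to be the distance between consecutive occurrences of that word, and it uses a hash map rather than whatever data structure gives the N log N bound quoted in the abstract. The function name `recurrence_times` is hypothetical.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between consecutive occurrences of each k-mer.

    A strong peak at distance d suggests a repeat or periodicity with period d.
    """
    if k <= 0:
        raise ValueError("k must be positive")

    last_seen = {}      # k-mer -> start position of its most recent occurrence
    histogram = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


# A tandem repeat with unit length 3 concentrates all recurrences at distance 3.
print(recurrence_times("ACGACGACG", 3))
```

On coding sequence, one would expect an excess of recurrence times that are multiples of 3, which is the kind of signal a codon index built from these statistics could exploit.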

10.

Background

Identification of genes responsible for medically important traits is a major challenge in human genetics. Due to the genetic heterogeneity of hearing loss, targeted DNA capture and massively parallel sequencing are ideal tools to address this challenge. Our subjects for genome analysis are Israeli Jewish and Palestinian Arab families with hearing loss that varies in mode of inheritance and severity.

Results

A custom 1.46 MB design of cRNA oligonucleotides was constructed containing 246 genes responsible for either human or mouse deafness. Paired-end libraries were prepared from 11 probands and bar-coded multiplexed samples were sequenced to high depth of coverage. Rare single base pair and indel variants were identified by filtering sequence reads against polymorphisms in dbSNP132 and the 1000 Genomes Project. We identified deleterious mutations in CDH23, MYO15A, TECTA, TMC1, and WFS1. Critical mutations of the probands co-segregated with hearing loss. Screening of additional families in a relevant population was performed. TMC1 p.S647P proved to be a founder allele, contributing to 34% of genetic hearing loss in the Moroccan Jewish population.

Conclusions

Critical mutations were identified in 6 of the 11 original probands and their families, leading to the identification of causative alleles in 20 additional probands and their families. The integration of genomic analysis into early clinical diagnosis of hearing loss will enable prediction of related phenotypes and enhance rehabilitation. Characterization of the proteins encoded by these genes will enable an understanding of the biological mechanisms involved in hearing loss.

11.

Background

Next generation sequencing (NGS) technology typically offers ultra-high throughput, but its read length is remarkably short compared to conventional Sanger sequencing. Paired-end NGS could computationally extend the read length, but with considerable practical inconvenience because of the inherent gaps. Now that Illumina paired-end sequencing can read both ends of 600 bp or even 800 bp DNA fragments, how to fill in the gaps between paired ends to produce accurate long reads is an intriguing but challenging problem.

Results

We have developed a new technology, referred to as pseudo-Sanger (PS) sequencing. It fills in the gaps between paired ends and can generate near error-free sequences equivalent to conventional Sanger reads in length but with the high throughput of next generation sequencing. The major novelty of the PS method is that gap filling is based on local assembly of paired-end reads that overlap at either end. Thus, we are able to fill in the gaps in repetitive genomic regions correctly. PS sequencing starts with short reads from NGS platforms, using a series of paired-end libraries of stepwise decreasing insert sizes. A computational method is introduced to transform these special paired-end reads into long and near error-free PS sequences, which correspond in length to those with the largest insert sizes. The PS construction has 3 advantages over untransformed reads: gap filling, error correction and heterozygote tolerance. Among the many applications of the PS construction is de novo genome assembly, which we tested in this study. Assembly of PS reads from a non-isogenic strain of Drosophila melanogaster yields an N50 contig of 190 kb, a 5-fold improvement over the existing de novo assembly methods and a 3-fold advantage over the assembly of long reads from 454 sequencing.

Conclusions

Our method generated near error-free long reads from NGS paired-end sequencing. We demonstrated that de novo assembly could benefit substantially from these Sanger-like reads. In addition, the long reads could be applied to structural variation detection and metagenomics.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-711) contains supplementary material, which is available to authorized users.
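
The N50 contig length quoted in the Results above is a standard assembly statistic: the length L such that contigs of length at least L together cover at least half of the total assembled bases. A minimal Python sketch of that definition follows; the function name `n50` and the example lengths are illustrative and do not come from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")

    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # integer comparison avoids float rounding
            return length


# Contigs of 6, 5, 4, 3 and 2 kb: 6 + 5 = 11 kb covers half of 20 kb.
print(n50([2, 3, 4, 5, 6]))
```

Because N50 depends only on the length distribution, it says nothing about correctness; a misassembled contig can raise the N50.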

12.
13.
SUMMARY: This synopsis provides an overview of array-based comparative genomic hybridization data display, abstraction and analysis using CGHAnalyzer, a software suite designed specifically for this purpose. CGHAnalyzer can be used to simultaneously load copy number data from multiple platforms, query and describe large, heterogeneous datasets and export results. Additionally, CGHAnalyzer employs a host of algorithms for microarray analysis that include hierarchical clustering and class differentiation. AVAILABILITY: CGHAnalyzer, the accompanying manual, documentation and sample data are available for download at http://acgh.afcri.upenn.edu. This is a Java-based application built in the framework of the TIGR MeV that can run on Microsoft Windows, Macintosh OSX and a variety of Unix-based platforms. It requires the installation of the free Java Runtime Environment 1.4.1 (or more recent) (http://www.java.sun.com).

14.

Background  

Metagenomic analyses of microbial communities that are comprehensive enough to provide multiple samples of most loci in the genomes of the dominant organism types will also reveal patterns of genetic variation within natural populations. New bioinformatic tools will enable visualization and comprehensive analysis of this sequence variation and inference of recent evolutionary and ecological processes.

15.
COHCAP (City of Hope CpG Island Analysis Pipeline) is an algorithm to analyze single-nucleotide resolution DNA methylation data produced by either an Illumina methylation array or targeted bisulfite sequencing. The goal of the COHCAP algorithm is to identify CpG islands that show a consistent pattern of methylation among CpG sites. COHCAP is currently the only DNA methylation package that provides integration with gene expression data to identify a subset of CpG islands that are most likely to regulate downstream gene expression, and it can generate lists of differentially methylated CpG islands with ∼50% concordance with gene expression from both cell line data and heterogeneous patient data. For example, this article describes known breast cancer biomarkers (such as estrogen receptor) with a negative correlation between DNA methylation and gene expression. COHCAP also provides visualization for quality control metrics, regions of differential methylation and correlation between methylation and gene expression. This software is freely available at https://sourceforge.net/projects/cohcap/.

16.
We have developed a high-resolution genomic mapping technique that combines transposon-mediated insertional mutagenesis with either capillary electrophoresis or massively parallel sequencing to identify functionally important regions of the Venezuelan equine encephalitis virus (VEEV) genome. We initially used a capillary electrophoresis method to gain insight into the role of the VEEV nonstructural protein 3 (nsP3) in viral replication. We identified several regions in nsP3 that are intolerant to small (15 bp) insertions, and thus are presumably functionally important. We also identified nine separate regions in nsP3 that will tolerate small insertions at low temperatures (30°C), but not at higher temperatures (37°C and 40°C). Because we found this method to be extremely effective at identifying temperature-sensitive (ts) mutations, but limited by capillary electrophoresis capacity, we replaced the capillary electrophoresis with massively parallel sequencing and used the improved method to generate a functional map of the entire VEEV genome. We identified several hundred potential ts mutations throughout the genome and we validated several of the mutations in nsP2, nsP3, E3, E2, E1 and capsid using single-cycle growth curve experiments with virus generated through reverse genetics. We further demonstrated that two of the nsP3 ts mutants were attenuated for virulence in mice but could elicit protective immunity against challenge with wild-type VEEV. The recombinant ts mutants will be valuable tools for further studies of VEEV replication and virulence. Moreover, the method that we developed is applicable for generating such tools for any virus with a robust reverse genetics system.

17.
18.
19.
Recent advances in sequencing technologies provide the means for identifying copy number variation (CNV) at an unprecedented resolution. A single next-generation sequencing experiment offers several features that can be used to detect CNV, yet current methods do not incorporate all available signatures into a unified model. cnvHiTSeq is an integrative probabilistic method for CNV discovery and genotyping that jointly analyzes multiple features at the population level. By combining evidence from complementary sources, cnvHiTSeq achieves high genotyping accuracy and a substantial improvement in CNV detection sensitivity over existing methods, while maintaining a low false discovery rate. cnvHiTSeq is available at http://sourceforge.net/projects/cnvhitseq

20.