Similar Articles
1.

Background

Target enrichment and resequencing is a widely used approach for identifying cancer genes and genetic variants associated with disease. Although cost-effective compared to whole genome sequencing, the analysis of many samples still constitutes a significant cost, which could be reduced by pooling samples before capture. The amount of available tumor DNA is another common limitation on the number of cancer samples that can be analyzed. We evaluated the performance of whole genome amplified DNA and the power to detect subclonal somatic single nucleotide variants in non-indexed pools of cancer samples, using the HaloPlex technology for target enrichment and next generation sequencing.

Results

Using the HaloPlex technology, we captured a set of 1528 putative somatic single nucleotide variants and germline SNPs that had been identified by whole genome sequencing, and sequenced them to a depth of 792–1752×. We found that the allele fractions of the analyzed variants are well preserved during whole genome amplification and that neither capture specificity nor variant calling is affected. We detected the large majority of the known single nucleotide variants present uniquely in one sample, with allele fractions as low as 0.1, in non-indexed pools of up to ten samples. We also identified and experimentally validated six novel variants in the samples included in the pools.
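
As a back-of-the-envelope illustration of why such dilute variants remain detectable at this depth, the chance of observing a pooled variant can be sketched with a binomial model; the 10-fold dilution and the 5-read detection threshold below are illustrative assumptions, not the authors' pipeline:

```python
from math import comb

def detection_power(depth: int, allele_fraction: float, min_alt_reads: int) -> float:
    """Probability of observing at least `min_alt_reads` variant-supporting
    reads at a site covered by `depth` reads, assuming each read carries the
    variant allele independently with probability `allele_fraction`."""
    p_fewer = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction)**(depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_fewer

# A variant private to one sample in a non-indexed pool of 10 samples is
# diluted 10-fold: a subclonal somatic variant at allele fraction 0.1 in its
# sample of origin appears at roughly 0.01 in the pool.
for depth in (792, 1752):
    print(depth, round(detection_power(depth, 0.01, min_alt_reads=5), 3))
```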

Conclusion

Our work demonstrates that whole genome amplified DNA can be used for target enrichment as effectively as genomic DNA and that accurate variant detection is possible in non-indexed pools of cancer samples. These findings show that analysis of a large number of samples is feasible at low cost, even when only small amounts of DNA are available, and thereby significantly increases the chances of identifying recurrent mutations in cancer samples.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-856) contains supplementary material, which is available to authorized users.

2.

Background

Validation of single nucleotide variants in whole-genome sequencing is critical for studying disease-related variation in large populations. Combining different types of next-generation sequencers to analyze individual genomes may be an efficient means of validating multiple single nucleotide variant calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.
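
A minimal sketch of how such per-sample concordance can be computed once each platform's calls are reduced to one genotype per site; the data layout below is an assumption for illustration, not the study's actual pipeline:

```python
def concordance(calls_a: dict, calls_b: dict) -> float:
    """Fraction of co-detected sites at which two platforms report the same
    genotype. Keys are (chrom, pos) tuples, values are genotype strings."""
    shared = set(calls_a) & set(calls_b)
    if not shared:
        return float("nan")
    agree = sum(calls_a[site] == calls_b[site] for site in shared)
    return agree / len(shared)

hiseq  = {("chr1", 1000): "A/G", ("chr1", 2000): "C/C"}
proton = {("chr1", 1000): "A/G", ("chr1", 2000): "C/T", ("chr2", 500): "T/T"}
print(concordance(hiseq, proton))  # 0.5: platforms agree at 1 of 2 shared sites
```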

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

3.

Background

Analysis of targeted amplicon sequencing data presents some unique challenges in comparison to the analysis of random fragment sequencing data. Whereas reads from randomly fragmented DNA have arbitrary start positions, the reads from amplicon sequencing have fixed start positions that coincide with the amplicon boundaries. As a result, any variants near the amplicon boundaries can cause misalignments of multiple reads that can ultimately lead to false-positive or false-negative variant calls.

Results

We show that amplicon boundaries are variant calling blind spots where variant calls are highly inaccurate. We propose that an effective strategy to avoid these blind spots is to retain the primer bases during read alignment and alignment post-processing, thereby effectively moving the blind spots into the primer binding regions (which are not used for variant calling). Targeted sequencing data analysis pipelines can achieve better variant calling accuracy when primer bases are retained and sequenced.
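
One hedged way to realize "retain for alignment, ignore for calling" is to zero the base qualities of primer-derived positions just before variant calling; the sketch below assumes ungapped reads and half-open primer intervals, and is illustrative rather than the paper's implementation:

```python
def mask_primer_bases(read_seq: str, read_quals: list, read_start: int, primers: list) -> list:
    """Zero the base qualities of read positions that fall inside a primer
    interval, so that alignment and post-processing still see the primer
    bases but the variant caller effectively ignores them. `primers` holds
    half-open (start, end) reference intervals; the read is assumed to be
    aligned without gaps for simplicity."""
    masked = list(read_quals)
    for i in range(len(read_seq)):
        ref_pos = read_start + i
        if any(start <= ref_pos < end for start, end in primers):
            masked[i] = 0
    return masked

# A 100 bp read starting at reference position 100, whose first 20 bases
# were synthesized as primer rather than copied from the template:
quals = mask_primer_bases("ACGT" * 25, [30] * 100, 100, [(100, 120)])
print(quals[:22])  # first 20 qualities zeroed, the rest untouched
```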

Conclusions

Read bases beyond the variant site are necessary for analysis of amplicon sequencing data. Enzymatic primer digestion, if used in the target enrichment process, should leave at least a few primer bases to ensure that these bases are available during data analysis. The primer bases should only be removed immediately before the variant calling step to ensure that the variants can be called irrespective of where they occur within the amplicon insert region.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1073) contains supplementary material, which is available to authorized users.

4.

Background

Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subjected to DNA sequencing, the samples can contain varying amounts of mouse DNA, and it has been unclear how the mouse reads affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data.

Results

We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variants calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing.

Conclusions

Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1172) contains supplementary material, which is available to authorized users.

5.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease-causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research, and increasingly the clinical use of sequence data, is likely to remain focused on the protein-coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq) for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome-targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that were subjected to both deep exome and whole genome sequencing. Detection sensitivity was scored by downsampling the sequence data and comparing against a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS requires a mean of only 14 reads. Known disease-causing mutations are not biased towards easy- or hard-to-sequence areas of the genome for either exome-seq or WGS.
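
A toy version of the downsampling-based sensitivity scoring, assuming each read survives independently with the target fraction and a site counts as detected once a minimum number of non-reference reads remains; thresholds and inputs are illustrative:

```python
import random

def downsample_sensitivity(alt_counts, keep_fraction, min_alt=3, seed=7):
    """Estimate SNP detection sensitivity after keeping each read with
    probability `keep_fraction`. `alt_counts` lists, for each gold-standard
    SNP, the number of reads carrying the non-reference allele at full
    depth; a site is detected if at least `min_alt` alt reads survive."""
    rng = random.Random(seed)
    detected = sum(
        sum(rng.random() < keep_fraction for _ in range(alt)) >= min_alt
        for alt in alt_counts
    )
    return detected / len(alt_counts)

# Sensitivity at ~25% of the original depth for a toy set of gold-standard SNPs:
print(downsample_sensitivity([20, 15, 4, 30, 2], keep_fraction=0.25))
```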

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

6.

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing with a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between the Ion Proton and Illumina sequencing platforms such as HiSeq 2000, the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq and called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rate of variant loci shared between the two sequencing platforms ranged from 68.0% to 75.3% across the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%).

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by either platform. For platform-specific variants, however, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, especially, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.

7.

Background

Targeting Induced Local Lesions IN Genomes (TILLING) is a reverse genetics approach for directly identifying point mutations in specific genes of interest in genomic DNA from a large, chemically mutagenized population. Classical TILLING processes, based on enzymatic detection of mutations in heteroduplex PCR amplicons, are slow and labor-intensive.

Results

Here we describe a new TILLING strategy in zebrafish using direct next generation sequencing (NGS) of 250 bp amplicons followed by Paired-End Low-Error (PELE) sequence analysis. By pooling a genomic DNA library made from over 9,000 N-ethyl-N-nitrosourea (ENU) mutagenized F1 fish into 32 equal pools of 288 fish, each with a unique Illumina barcode, we reduce the complexity of the template to a level at which we can detect mutations that occur in a single heterozygous fish in the entire library. MiSeq sequencing generates 250 base-pair overlapping paired-end reads, and PELE analysis aligns the overlapping sequences to each other and filters out any imperfect matches, thereby eliminating variants introduced during the sequencing process. We find that this filtering step reduces the number of false positive calls 50-fold without loss of true variant calls. After PELE we were able to validate 61.5% of the mutant calls that occurred at a frequency between 1 mutant call:100 wild-type calls and 1 mutant call:1,000 wild-type calls in a pool of 288 fish. We then use high-resolution melt analysis to identify the single heterozygous mutation carrier in the 288-fish pool in which the mutation was identified.
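
The core PELE filtering step can be sketched as follows: orient the mates onto the same strand, find the longest perfect overlap, and discard any pair whose overlap contains even one mismatch, so a sequencing error cannot masquerade as a mutant allele. This is an illustrative reconstruction, not the published code:

```python
def pele_merge(read1: str, read2_rc: str, min_overlap: int = 30):
    """Merge an overlapping read pair (read2 given reverse-complemented).
    Returns the merged sequence, or None if no perfect overlap of at least
    `min_overlap` bases exists, in which case the pair is discarded."""
    for olen in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        if read1[-olen:] == read2_rc[:olen]:
            return read1 + read2_rc[olen:]
    return None  # no perfect overlap found: discard the pair

print(pele_merge("TTTTTAAAAACCCCCGGGGG", "AAAAACCCCCGGGGGTTTTT", min_overlap=10))
# TTTTTAAAAACCCCCGGGGGTTTTT
```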

Conclusions

Using this NGS-TILLING protocol we validated 28 nonsense or splice-site mutations in 20 genes, with two-fold higher efficiency than traditional Cel1 screening. We conclude that this approach significantly increases screening efficiency and accuracy at reduced cost and can be applied to a wide range of organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1263-4) contains supplementary material, which is available to authorized users.

8.

Background

The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM’s reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing.

Results

Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that TS could generate false negative indels under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling on the same data with the open-source variant callers GATK and SAMtools showed that false negatives can be minimised with appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer-associated errors, without compromising sensitivity. In our best-case scenario, which involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and a 29% false discovery rate (FDR) in indel calling across all 23 samples, a good performance for mutation screening using the PGM.
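
Of the two measures, QD is straightforward to reproduce: it normalises the variant quality by the read depth at the site, so that deep but noisy homopolymer sites no longer reach high absolute quality scores. VARW is specific to the paper and is not reconstructed here; the cut-off below is an illustrative assumption:

```python
def quality_by_depth(qual: float, depth: int) -> float:
    """QD: variant call quality normalised by read depth at the site."""
    return qual / depth if depth else 0.0

def keep_indel(qual: float, depth: int, min_qd: float = 2.0) -> bool:
    """Discard indel calls whose quality is carried only by deep, noisy
    coverage, a hallmark of homopolymer-associated PGM artefacts."""
    return quality_by_depth(qual, depth) >= min_qd

print(keep_indel(qual=900.0, depth=300))  # True: QD = 3.0
print(keep_indel(qual=45.0, depth=250))   # False: QD = 0.18
```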

Conclusions

New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterparts. However, the variant caller of TS exhibits lower sensitivity than GATK and SAMtools. Our findings demonstrate that although indel calling from PGM sequences may appear noisy at first glance, proper computational indel calling analysis can maximize both sensitivity and specificity at the single-base level, paving the way for the use of this technology in future clinical genetic testing.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-516) contains supplementary material, which is available to authorized users.

9.

Background

The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.

Results

We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation, supplemented with mapped expressed sequence tags (ESTs), and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples, at least 10× sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high-confidence predicted SNPs and over 28,000 predicted indels.

Conclusions

We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding region variation associated with production traits, including loss of function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein coding genes in the pig.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.

10.

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small but consistent differences between the four capture technologies. The Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology is biased towards target regions with high GC content.
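
GC bias of the kind measured here can be summarised by binning targets on GC content and averaging normalised coverage per bin; a flat profile means no bias. A minimal sketch with assumed (gc, coverage) pairs, not the study's data:

```python
def gc_coverage_profile(targets):
    """Bin capture targets by GC fraction (10% bins) and report the mean
    normalised coverage per bin, the usual way to visualise GC bias.
    `targets` is a list of (gc_fraction, normalised_coverage) pairs."""
    bins = {}
    for gc, cov in targets:
        bins.setdefault(int(gc * 10), []).append(cov)
    return {b / 10: sum(c) / len(c) for b, c in sorted(bins.items())}

print(gc_coverage_profile([(0.35, 1.1), (0.38, 0.9), (0.62, 0.7), (0.68, 0.5)]))
# {0.3: 1.0, 0.6: 0.6} - coverage drops in the GC-rich bin
```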

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

11.

Background

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.
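
The paper trains a machine-learning model over pipeline-specific features; as a simpler stand-in, the intersect-by-support idea behind any consensus call set can be sketched with plain voting (illustrative, not the 1000G method):

```python
from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Toy consensus of several INDEL callsets: keep a call if at least
    `min_support` pipelines report it. Each callset is a set of
    (chrom, pos, ref, alt) tuples."""
    votes = Counter(call for cs in callsets for call in set(cs))
    return {call for call, n in votes.items() if n >= min_support}

gatk     = {("chr1", 100, "A", "AT"), ("chr1", 200, "GC", "G")}
samtools = {("chr1", 100, "A", "AT")}
third    = {("chr1", 100, "A", "AT"), ("chr2", 50, "T", "TA")}
print(consensus_calls([gatk, samtools, third]))  # only the 3-of-3 site survives
```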

Results

This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.

Conclusions

In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1333-7) contains supplementary material, which is available to authorized users.

12.

Background

Personal genome assembly is a critical process when studying tumor genomes and other highly divergent sequences. The accuracy of downstream analyses, such as RNA-seq and ChIP-seq, can be greatly enhanced by using personal genomic sequences rather than standard references. Unfortunately, reads sequenced from these types of samples often have a heterogeneous mix of various subpopulations with different variants, making assembly extremely difficult using existing assembly tools. To address these challenges, we developed SHEAR (Sample Heterogeneity Estimation and Assembly by Reference; http://vk.cs.umn.edu/SHEAR), a tool that predicts SVs, accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences to be used for downstream analysis.

Results

By making use of structural variant detection algorithms, SHEAR offers improved performance in the form of a stronger ability to handle difficult structural variant types and better computational efficiency. We compare against the leading competing approach using a variety of simulated scenarios as well as real tumor cell line data with known heterogeneous variants. SHEAR successfully estimates heterogeneity percentages in both cases, and demonstrates improved efficiency and a better ability to handle tandem duplications.
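
At its core, estimating the representative percentage of a heterogeneous variant reduces to comparing breakpoint-supporting reads against all reads spanning the locus; a deliberately simplified sketch (SHEAR itself works from structural variant detection output and is more involved):

```python
def variant_percentage(supporting_reads: int, spanning_reads: int) -> float:
    """Estimate the fraction of a sample carrying a structural variant from
    the reads supporting its breakpoint versus all reads spanning the locus."""
    if spanning_reads == 0:
        raise ValueError("no reads span the locus")
    return supporting_reads / spanning_reads

print(f"{variant_percentage(42, 120):.1%}")  # 35.0% of the sample carries the SV
```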

Conclusion

SHEAR allows for accurate and efficient SV detection and personal genomic sequence generation. It is also able to account for heterogeneous sequencing samples, such as from tumor tissue, by estimating the subpopulation percentage for each heterogeneous variant.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-84) contains supplementary material, which is available to authorized users.

13.

Motivation

Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost.

Results

We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study.
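
Job packing of this general kind can be sketched with nothing more than a worker pool: submit one variant-calling job per region and let idle workers pull the next job. Everything below (placeholder function, region list, worker count) is illustrative, not the authors' cluster workflow:

```python
from multiprocessing import Pool

def call_variants(region: str) -> str:
    """Placeholder for an expensive per-region variant-calling job."""
    return f"{region}: done"

if __name__ == "__main__":
    # In a real workflow the regions would be sorted by size, largest first,
    # so long-running jobs start early (a simple job-packing heuristic).
    regions = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]
    with Pool(processes=8) as pool:
        for result in pool.imap_unordered(call_variants, regions):
            print(result)
```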

Conclusions

We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging ‘big data’ problems in biomedical research brought on by the expansion of NGS technologies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0736-4) contains supplementary material, which is available to authorized users.

14.

Background

Interlocus gene conversion (IGC) is a recombination-based mechanism that results in the unidirectional transfer of short stretches of sequence between paralogous loci. Although IGC is a well-established mechanism of human disease, the extent to which this mutagenic process has shaped overall patterns of segregating variation in multi-copy regions of the human genome remains unknown. One expected manifestation of IGC in population genomic data is the presence of one-to-one paralogous SNPs that segregate identical alleles.

Results

Here, I use SNP genotype calls from the low-coverage phase 3 release of the 1000 Genomes Project to identify 15,790 parallel, shared SNPs in duplicated regions of the human genome. My approach for identifying these sites accounts for the potential redundancy of short read mapping in multi-copy genomic regions, thereby effectively eliminating false positive SNP calls arising from paralogous sequence variation. I demonstrate that independent mutation events to identical nucleotides at paralogous sites are not a significant source of shared polymorphisms in the human genome, consistent with the interpretation that these sites are the outcome of historical IGC events. These putative signals of IGC are enriched in genomic contexts previously associated with non-allelic homologous recombination, including clear signals in gene families that form tandem intra-chromosomal clusters.

Conclusions

Taken together, my analyses implicate IGC, not point mutation, as the mechanism generating at least 2.7% of single nucleotide variants in duplicated regions of the human genome.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1681-3) contains supplementary material, which is available to authorized users.

15.

Background

Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed.

Results

We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate.
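
Read count-based genotype calling at a breakpoint amplicon can be sketched as thresholding the fraction of insertion-supporting reads; the thresholds below are illustrative assumptions, not the paper's calibrated values:

```python
def genotype_from_counts(ref_reads: int, alt_reads: int,
                         hom_ref_max=0.1, het_min=0.35, het_max=0.65,
                         hom_alt_min=0.9, min_depth=20):
    """Call an insertion genotype at a breakpoint-PCR amplicon from read
    counts alone; ambiguous fractions or shallow sites yield no-call."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "no-call"
    frac = alt_reads / depth
    if frac <= hom_ref_max:
        return "absent/absent"
    if het_min <= frac <= het_max:
        return "present/absent"
    if frac >= hom_alt_min:
        return "present/present"
    return "no-call"

print(genotype_from_counts(480, 510))  # present/absent: heterozygous carrier
```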

Conclusions

This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1700-4) contains supplementary material, which is available to authorized users.

16.

Background

Retinitis pigmentosa is a phenotype with diverse genetic causes. Due to this genetic heterogeneity, genome-wide identification and analysis of protein-altering DNA variants by exome sequencing is a powerful tool for novel variant and disease gene discovery. In this study, exome sequencing analysis was used to search for potentially causal DNA variants in a two-generation pedigree with apparent dominant retinitis pigmentosa.

Methods

Variant identification and analysis of three affected members (mother and two affected offspring) was performed via exome sequencing. Parental samples of the index case were used to establish inheritance. Follow-up testing of 94 additional retinitis pigmentosa pedigrees was performed via retrospective analysis or Sanger sequencing.

Results and Conclusions

A total of 136 high-quality coding variants in 123 genes were identified that are consistent with autosomal dominant disease. Of these, one of the strongest genetic and functional candidates is a c.269A>G (p.Tyr90Cys) variant in ARL3. Follow-up testing established that this variant occurred de novo in the index case. No additional putative causal variants in ARL3 were identified in the follow-up cohort, suggesting that if ARL3 variants can cause adRP, it is an extremely rare phenomenon.

17.

Background

The processing and analysis of the large scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing for a total of 700 variants or array genotyping data for a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved crucial to accurate variant calling. The GATK HaplotypeCaller algorithm outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance on the one hand, and SNV call accuracy on the other. However, if best practices are used in data processing, additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable.
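
The accuracy comparison boils down to tallying each pipeline's calls against a gold standard; a minimal sketch of positive predictive value and sensitivity, with toy call sets standing in for the real data:

```python
def ppv_and_sensitivity(called: set, truth: set):
    """Positive predictive value and sensitivity of a callset against
    gold-standard sites, each represented as a (chrom, pos, alt) tuple."""
    tp = len(called & truth)
    ppv = tp / len(called) if called else float("nan")
    sens = tp / len(truth) if truth else float("nan")
    return ppv, sens

pipeline_calls = {("chr1", 100, "T"), ("chr1", 250, "G"), ("chr2", 40, "A")}
sanger_truth   = {("chr1", 100, "T"), ("chr2", 40, "A"), ("chr3", 7, "C")}
print(ppv_and_sensitivity(pipeline_calls, sanger_truth))  # PPV 0.67, sensitivity 0.67
```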

Conclusions

Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our codes are freely available at http://metamoodics.org/wes.

18.

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is the internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, a need exists for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.
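
The signal a tool like ExonDel looks for can be sketched very simply: a homozygously deleted exon should show near-zero coverage while neighbouring exons are covered normally. The threshold and the omission of GC correction below are simplifying assumptions, not ExonDel's actual algorithm:

```python
import statistics

def candidate_exon_deletions(exon_depths: dict, max_rel_depth=0.05):
    """Flag exons whose mean coverage is near zero relative to the sample
    median exon coverage, the signature of a homozygous internal exon
    deletion in exome data. `exon_depths` maps (gene, exon) -> mean depth."""
    median_depth = statistics.median(exon_depths.values())
    return [exon for exon, depth in exon_depths.items()
            if depth <= max_rel_depth * median_depth]

depths = {("BRCA1", 4): 85.0, ("BRCA1", 5): 0.3, ("BRCA1", 6): 0.1,
          ("TP53", 2): 92.0, ("TP53", 3): 78.0}
print(candidate_exon_deletions(depths))  # BRCA1 exons 5 and 6 flagged
```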

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.

19.

Background

Gene amplification is thought to promote over-expression of genes favouring tumour development. Because amplified regions are usually megabases long, amplification often involves numerous syntenic or non-syntenic genes, among which only a subset is over-expressed. The rationale for these differences remains poorly understood.

Methodology/Principal Findings

To address this question, we used quantitative RT-PCR to determine the expression level of a series of co-amplified genes in five xenografted human gliomas and one fresh human glioma. These gliomas were chosen because we had previously characterised the genetic content of their amplicons in detail. In all cases, the amplified sequences lie on extra-chromosomal DNA molecules, as commonly observed in gliomas. We show here that genes transcribed in non-amplified gliomas are over-expressed when amplified, roughly in proportion to their copy number, while non-expressed genes remain inactive. When specific antibodies were available, we also compared protein expression in amplified and non-amplified tumours. We found that protein accumulation barely correlates with the level of mRNA expression in some of these tumours.

Conclusions/Significance

Here we show that the tissue-specific pattern of gene expression is maintained upon amplification in gliomas. Our study relies on a single type of tumour and a limited number of cases. However, it strongly suggests that, even when amplified, genes that are normally silent in a given cell type play no role in tumour progression. The loose relationship between mRNA level and protein accumulation and/or activity indicates that translational or post-translational events play a key role in fine-tuning the final outcome of amplification in gliomas.

20.

Background

Identification of the causative genes of retinitis pigmentosa (RP) is important for the clinical care of patients with RP. However, a comprehensive genetic study has not been performed in Korean RP patients. Moreover, the genetic heterogeneity found in sensorineural genetic disorders makes identification of pathogenic mutations challenging. Therefore, high throughput genetic testing using massively parallel sequencing is needed.

Results

Sixty-two Korean patients with nonsyndromic RP (46 patients from 18 families and 16 simplex cases) who consented to molecular genetic testing were recruited, and targeted exome sequencing of 53 RP-related genes was performed. Candidate causal variants were identified by selecting exonic and splicing variants, retaining those with low allele frequency (below 1%), and discarding variants with quality scores below 20. The variants were further confirmed by inheritance pattern and cosegregation testing within the families, and the remaining variants were prioritised using in silico prediction tools. In total, causal variants were detected in 10 of the 18 familial cases (55.5%) and 7 of the 16 simplex cases (43.7%). Thirteen of the 20 candidate variants (65%) were novel. Compound heterozygous variants were found in four of the 7 simplex cases.
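
The triage described above maps naturally onto a small filter; the variant representation and field names below are assumptions for illustration, not the study's software:

```python
def prioritise(variants, max_af=0.01, min_qual=20):
    """Keep exonic/splicing variants with population allele frequency below
    1% and call quality of at least 20, mirroring the described triage.
    Each variant is a dict with 'effect', 'pop_af' and 'qual' keys."""
    keep_effects = {"exonic", "splicing"}
    return [v for v in variants
            if v["effect"] in keep_effects
            and v["pop_af"] < max_af
            and v["qual"] >= min_qual]

variants = [
    {"id": "v1", "effect": "exonic",   "pop_af": 0.0002, "qual": 48},
    {"id": "v2", "effect": "intronic", "pop_af": 0.0001, "qual": 50},
    {"id": "v3", "effect": "splicing", "pop_af": 0.0500, "qual": 60},
]
print([v["id"] for v in prioritise(variants)])  # ['v1']
```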

Conclusion

Panel-based targeted re-sequencing can be used as an effective molecular diagnostic tool for RP.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1723-x) contains supplementary material, which is available to authorized users.
