1.

Parla JS Iossifov I Grabill I Spector MS Kramer M McCombie WR 《Genome biology》2011,12(9):R97

Background

Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data.

2.

Performance comparison of four exome capture systems for deep sequencing

Chandra Sekhar Reddy Chilamakuri Susanne Lorenz Mohammed-Amin Madoui Daniel Vodák Jinchang Sun Eivind Hovig Ola Myklebost Leonardo A Meza-Zepeda 《BMC genomics》2014,15(1)

Background

Recent developments in deep (next-generation) sequencing technologies are significantly impacting medical research. The global analysis of protein coding regions in genomes of interest by whole exome sequencing is a widely used application. Many technologies for exome capture are commercially available; here we compare the performance of four of them: NimbleGen’s SeqCap EZ v3.0, Agilent’s SureSelect v4.0, Illumina’s TruSeq Exome, and Illumina’s Nextera Exome, all applied to the same human tumor DNA sample.

Results

Each capture technology was evaluated for its coverage of different exome databases, target coverage efficiency, GC bias, sensitivity in single nucleotide variant detection, sensitivity in small indel detection, and technical reproducibility. In general, all technologies performed well; however, our data demonstrated small, but consistent differences between the four capture technologies. Illumina technologies cover more bases in coding and untranslated regions. Furthermore, whereas most of the technologies provide reduced coverage in regions with low or high GC content, the Nextera technology tends to bias towards target regions with high GC content.

Conclusions

We show key differences in performance between the four technologies. Our data should help researchers who are planning exome sequencing to select appropriate exome capture technology for their particular application.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-449) contains supplementary material, which is available to authorized users.

3.

Design and development of exome capture sequencing for the domestic pig (Sus scrofa)

Christelle Robert Pablo Fuentes-Utrilla Karen Troup Julia Loecherbach Frances Turner Richard Talbot Alan L Archibald Alan Mileham Nader Deeb David A Hume Mick Watson 《BMC genomics》2014,15(1)

Background

The domestic pig (Sus scrofa) is both an important livestock species and a model for biomedical research. Exome sequencing has accelerated identification of protein-coding variants underlying phenotypic traits in human and mouse. We aimed to develop and validate a similar resource for the pig.

Results

We developed probe sets to capture pig exonic sequences based upon the current Ensembl pig gene annotation supplemented with mapped expressed sequence tags (ESTs) and demonstrated proof-of-principle capture and sequencing of the pig exome in 96 pigs, encompassing 24 capture experiments. For most of the samples at least 10x sequence coverage was achieved for more than 90% of the target bases. Bioinformatic analysis of the data revealed over 236,000 high confidence predicted SNPs and over 28,000 predicted indels.

Conclusions

We have achieved coverage statistics similar to those seen with commercially available human and mouse exome kits. Exome capture in pigs provides a tool to identify coding region variation associated with production traits, including loss of function mutations which may explain embryonic and neonatal losses, and to improve genomic assemblies in the vicinity of protein coding genes in the pig.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-550) contains supplementary material, which is available to authorized users.

4.

Comparison of solution-based exome capture methods for next generation sequencing

Sulonen AM Ellonen P Almusa H Lepistö M Eldfors S Hannula S Miettinen T Tyynismaa H Salo P Heckman C Joensuu H Raivio T Suomalainen A Saarela J 《Genome biology》2011,12(9):R94-18

Background

Techniques enabling targeted re-sequencing of the protein coding sequences of the human genome on next generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency to have more reference bases than variant bases in the heterozygous variant positions within the target regions in all methods. There was virtually no difference in the genotype concordance compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygote genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit most efficiently covered the exome with a minimum coverage of 20×, yet none of the kits captured all the Consensus Coding Sequence annotated exons.
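The relationship between read depth and heterozygote calling reported above can be illustrated with a simple binomial model. This is only a sketch under stated assumptions, not the authors' pipeline: the function names, the rule that both alleles must be seen in at least `min_reads` reads, and the default allele fraction of 0.5 are all hypothetical choices made for illustration. Lowering `alt_fraction` below 0.5 mimics the reference-allele bias the abstract describes.

```python
from math import comb


def prob_both_alleles_seen(depth: int, min_reads: int = 2,
                           alt_fraction: float = 0.5) -> float:
    """Probability that both alleles of a heterozygous site are each
    supported by at least `min_reads` reads at the given depth.

    Assumes reads sample the variant allele independently with
    probability `alt_fraction` (a simplification; real callers also
    model base quality and mapping errors).
    """
    return sum(
        comb(depth, x) * alt_fraction ** x * (1 - alt_fraction) ** (depth - x)
        for x in range(min_reads, depth - min_reads + 1)
    )


def min_depth_for(target: float, min_reads: int = 2,
                  alt_fraction: float = 0.5, max_depth: int = 500) -> int:
    """Smallest depth at which the detection probability reaches `target`."""
    for depth in range(1, max_depth + 1):
        if prob_both_alleles_seen(depth, min_reads, alt_fraction) >= target:
            return depth
    raise ValueError("target not reached within max_depth")


if __name__ == "__main__":
    for d in (8, 11, 20):
        print(d, round(prob_both_alleles_seen(d), 4))
    print("depth for 99%:", min_depth_for(0.99))
```

The threshold this toy model gives depends entirely on the assumed calling rule, so it should not be expected to reproduce the empirical 11× figure, which was derived by comparison with SNP array genotypes.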

5.

Comprehensive comparison of three commercial human whole-exome capture platforms

Asan Xu Y Jiang H Tyler-Smith C Xue Y Jiang T Wang J Wu M Liu X Tian G Wang J Wang J Yang H Zhang X 《Genome biology》2011,12(9):R95-12

Background

Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.

Results

We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

Conclusions

We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.

6.

Variant detection sensitivity and biases in whole genome and exome sequencing

Alison M Meynert Morad Ansari David R FitzPatrick Martin S Taylor 《BMC bioinformatics》2014,15(1)

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

7.

Targeted high throughput sequencing of a cancer-related exome subset by specific sequence capture with a fully automated microarray platform

Daniel Summerer Nadine Schracke Haiguo Wu Yang Cheng Stephan Bau Cord F. Stähler Peer F. Stähler Markus Beier 《Genomics》2010,95(4):241-246

Sequence capture methods for targeted next generation sequencing promise to massively reduce cost of genomics projects compared to untargeted sequencing. However, evaluated capture methods specifically dedicated to biologically relevant genomic regions are rare. Whole exome capture has been shown to be a powerful tool to discover the genetic origin of disease and provides a reduction in target size and thus calculative sequencing capacity of > 90-fold compared to untargeted whole genome sequencing. For further cost reduction, a valuable complementing approach is the analysis of smaller, relevant gene subsets but involving large cohorts of samples. However, effective adjustment of target sizes and sample numbers is hampered by the limited scalability of enrichment systems. We report a highly scalable and automated method to capture a 480 Kb exome subset of 115 cancer-related genes using microfluidic DNA arrays. The arrays are adaptable from 125 Kb to 1 Mb target size and/or one to eight samples without barcoding strategies, representing a further 26 – 270-fold reduction of calculative sequencing capacity compared to whole exome sequencing. Illumina GAII analysis of a HapMap genome enriched for this exome subset revealed a completeness of > 96%. Uniformity was such that > 68% of exons had at least half the median depth of coverage. An analysis of reference SNPs revealed a sensitivity of up to 93% and a specificity of 98.2% or higher.

8.

Design of DNA Pooling to Allow Incorporation of Covariates in Rare Variants Analysis

Weihua Guan Chun Li 《PloS one》2014,9(12)

Background

Rapid advances in next-generation sequencing technologies facilitate genetic association studies of an increasingly wide array of rare variants. To capture the rare or less common variants, a large number of individuals will be needed. However, the cost of a large scale study using whole genome or exome sequencing is still high. DNA pooling can serve as a cost-effective approach, but with a potential limitation that the identity of individual genomes would be lost and therefore individual characteristics and environmental factors could not be adjusted in association analysis, which may result in power loss and a biased estimate of genetic effect.

Methods

For case-control studies, we propose a design strategy for pool creation and an analysis strategy that allows covariate adjustment, using a multiple imputation technique.

Results

Simulations show that our approach can obtain a reasonable estimate of the genotypic effect with only a slight loss of power compared to the much more expensive approach of sequencing individual genomes.

Conclusion

Our design and analysis strategies enable more powerful and cost-effective sequencing studies of complex diseases, while allowing incorporation of covariate adjustment.

9.

Screening the human exome: a comparison of whole genome and whole transcriptome sequencing

Elizabeth T Cirulli Abanish Singh Kevin V Shianna Dongliang Ge Jason P Smith Jessica M Maia Erin L Heinzen James J Goedert David B Goldstein the Center for HIV/AIDS Vaccine Immunology 《Genome biology》2010,11(5):R57

Background

There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost effective alternatives are important.

10.

De Novo Occurrence of a Variant in ARL3 and Apparent Autosomal Dominant Transmission of Retinitis Pigmentosa

Samuel P. Strom Michael J. Clark Ariadna Martinez Sarah Garcia Amira A. Abelazeem Anna Matynia Sachin Parikh Lori S. Sullivan Sara J. Bowne Stephen P. Daiger Michael B. Gorin 《PloS one》2016,11(3)

Background

Retinitis pigmentosa is a phenotype with diverse genetic causes. Due to this genetic heterogeneity, genome-wide identification and analysis of protein-altering DNA variants by exome sequencing is a powerful tool for novel variant and disease gene discovery. In this study, exome sequencing analysis was used to search for potentially causal DNA variants in a two-generation pedigree with apparent dominant retinitis pigmentosa.

Methods

Variant identification and analysis of three affected members (mother and two affected offspring) was performed via exome sequencing. Parental samples of the index case were used to establish inheritance. Follow-up testing of 94 additional retinitis pigmentosa pedigrees was performed via retrospective analysis or Sanger sequencing.

Results and Conclusions

A total of 136 high quality coding variants in 123 genes were identified which are consistent with autosomal dominant disease. Of these, one of the strongest genetic and functional candidates is a c.269A>G (p.Tyr90Cys) variant in ARL3. Follow-up testing established that this variant occurred de novo in the index case. No additional putative causal variants in ARL3 were identified in the follow-up cohort, suggesting that if ARL3 variants can cause adRP it is an extremely rare phenomenon.

11.

Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling

Guoqiang Zhang Jianfeng Wang Jin Yang Wenjie Li Yutian Deng Jing Li Jun Huang Songnian Hu Bing Zhang 《BMC genomics》2015,16(1)

Background

To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq™ Exome Enrichment Kit together make up TargetSeq-Proton, whereas SureSelect-Hiseq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer.

Results

Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by two sequencing platforms were from 68.0 to 75.3 % in four samples, whereas the concordance of co-detected variant loci reached 99 %. Sanger sequencing validation revealed that the validated rate of concordant single nucleotide polymorphisms (SNPs) (91.5 %) was higher than the SNPs specific to TargetSeq-Proton (60.0 %) or specific to SureSelect-HiSeq (88.3 %). With regard to 1-bp small insertions and deletions (InDels), the Sanger sequencing validated rates of concordant variants (100.0 %) and SureSelect-HiSeq-specific (89.6 %) were higher than those of TargetSeq-Proton-specific (15.8 %).

Conclusions

In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. However, for platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, particularly for InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, in particular, InDels in Ion Proton exome sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1796-6) contains supplementary material, which is available to authorized users.
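The two cross-platform metrics quoted in the abstract above (the rate of shared variant loci and the concordance of co-detected loci) can be sketched with plain set operations. The abstract does not state the denominators, so this minimal example assumes the shared rate is taken over the union of both call sets and concordance over the shared loci only. The function name `compare_calls` and the toy genotypes are hypothetical.

```python
from typing import Dict, Tuple

Locus = Tuple[str, int]          # (chromosome, 1-based position)
Calls = Dict[Locus, str]         # locus -> genotype string, e.g. "A/G"


def compare_calls(calls_a: Calls, calls_b: Calls) -> Tuple[float, float]:
    """Return (shared_rate, concordance) for two variant call sets.

    shared_rate: loci called by both platforms / loci called by either
                 (assumed definition).
    concordance: shared loci with identical genotypes / shared loci.
    """
    loci_a, loci_b = set(calls_a), set(calls_b)
    shared = loci_a & loci_b
    union = loci_a | loci_b
    shared_rate = len(shared) / len(union) if union else 0.0
    same = sum(1 for locus in shared if calls_a[locus] == calls_b[locus])
    concordance = same / len(shared) if shared else 0.0
    return shared_rate, concordance


if __name__ == "__main__":
    # Toy data standing in for the two platforms' calls.
    platform_a = {("chr1", 100): "A/G", ("chr1", 200): "C/T", ("chr2", 50): "G/G"}
    platform_b = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr3", 10): "T/A"}
    print(compare_calls(platform_a, platform_b))
```

In practice both call sets would first be restricted to the regions targeted by both capture kits, as the study did with the 33.6 Mb overlap.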

12.

Nucleotide polymorphism and copy number variant detection using exome capture and next‐generation sequencing in the polyploid grass Panicum virgatum

Joseph Evans Jeongwoon Kim Kevin L. Childs Brieanne Vaillancourt Emily Crisovan Aruna Nandety Daniel J. Gerhardt Todd A. Richmond Jeffrey A. Jeddeloh Shawn M. Kaeppler Michael D. Casler C. Robin Buell 《The Plant journal : for cell and molecular biology》2014,79(6):993-1008

13.

Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries

Ji-Ping Z Wang Bruce G Lindsay Liying Cui P Kerr Wall Josh Marion Jiaxuan Zhang Claude W dePamphilis 《BMC bioinformatics》2005,6(1):300

Background

In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insights to sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed.

14.

Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population

Ikuko N Motoike Mitsuyo Matsumoto Inaho Danjoh Fumiki Katsuoka Kaname Kojima Naoki Nariai Yukuto Sato Yumi Yamaguchi-Kabata Shin Ito Hisaaki Kudo Ichiko Nishijima Satoshi Nishikawa Xiaoqing Pan Rumiko Saito Sakae Saito Tomo Saito Matsuyuki Shirota Kaoru Tsuda Junji Yokozawa Kazuhiko Igarashi Naoko Minegishi Osamu Tanabe Nobuo Fuse Masao Nagasaki Kengo Kinoshita Jun Yasuda Masayuki Yamamoto 《BMC genomics》2014,15(1)

Background

Validation of single nucleotide variations in whole-genome sequencing is critical for studying disease-related variations in large populations. A combination of different types of next-generation sequencers for analyzing individual genomes may be an efficient means of validating multiple single nucleotide variation calls simultaneously.

Results

Here, we analyzed 12 independent Japanese genomes using two next-generation sequencing platforms: the Illumina HiSeq 2500 platform for whole-genome sequencing (average depth 32.4×), and the Ion Proton semiconductor sequencer for whole exome sequencing (average depth 109×). Single nucleotide polymorphism (SNP) calls based on the Illumina Human Omni 2.5-8 SNP chip data were used as the reference. We compared the variant calls for the 12 samples, and found that the concordance between the two next-generation sequencing platforms varied between 83% and 97%.

Conclusions

Our results show the versatility and usefulness of the combination of exome sequencing with whole-genome sequencing in studies of human population genetics and demonstrate that combining data from multiple sequencing platforms is an efficient approach to validate and supplement SNP calls.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-673) contains supplementary material, which is available to authorized users.

15.

Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing

Zhao Q Kirkness EF Caballero OL Galante PA Parmigiani RB Edsall L Kuan S Ye Z Levy S Vasconcelos AT Ren B de Souza SJ Camargo AA Simpson AJ Strausberg RL 《Genome biology》2010,11(11):R114-14

16.

Identification of copy number variants from exome sequence data

Pubudu Saneth Samarakoon Hanne Sørmo Sorte Bjørn Evert Kristiansen Tove Skodje Ying Sheng Geir E Tjønnfjord Barbro Stadheim Asbjørg Stray-Pedersen Olaug Kristin Rødningen Robert Lyle 《BMC genomics》2014,15(1)

Background

With advances in next generation sequencing technologies and genomic capture techniques, exome sequencing has become a cost-effective approach for mutation detection in genetic diseases. However, computational prediction of copy number variants (CNVs) from exome sequence data is a challenging task. Whilst numerous programs are available, they have different sensitivities, and have low sensitivity to detect smaller CNVs (1–4 exons). Additionally, exonic CNV discovery using standard aCGH has limitations due to the low probe density over exonic regions. The goal of our study was to develop a protocol to detect exonic CNVs (including shorter CNVs that cover 1–4 exons), combining computational prediction algorithms and a high-resolution custom CGH array.

Results

We used six published CNV prediction programs (ExomeCNV, CONTRA, ExomeCopy, ExomeDepth, CoNIFER, XHMM) and an in-house modification to ExomeCopy and ExomeDepth (ExCopyDepth) for computational CNV prediction on 30 exomes from the 1000 genomes project and 9 exomes from primary immunodeficiency patients. CNV predictions were tested using a custom CGH array designed to capture all exons (exaCGH). After this validation, we next evaluated the computational prediction of shorter CNVs. ExomeCopy and the in-house modified algorithm, ExCopyDepth, showed the highest capability in detecting shorter CNVs. Finally, the performance of each computational program was assessed by calculating the sensitivity and false positive rate.

Conclusions

In this paper, we assessed the ability of 6 computational programs to predict CNVs, focussing on short (1–4 exon) CNVs. We also tested these predictions using a custom array targeting exons. Based on these results, we propose a protocol to identify and confirm shorter exonic CNVs combining computational prediction algorithms and custom aCGH experiments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-661) contains supplementary material, which is available to authorized users.
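Scoring CNV predictions against an array-based truth set, as described in the abstract above, can be sketched as an interval-overlap comparison. This is an illustrative sketch rather than the authors' evaluation code: the `CNV` record, the any-overlap matching rule, and the choice to report false positives as the fraction of predictions left unconfirmed are assumptions, since the abstract does not define how its false positive rate was computed.

```python
from typing import List, NamedTuple, Tuple


class CNV(NamedTuple):
    chrom: str
    start: int   # inclusive
    end: int     # inclusive
    kind: str    # "DEL" or "DUP"


def overlaps(a: CNV, b: CNV) -> bool:
    """True if two CNVs share a chromosome, a type and at least one base."""
    return (a.chrom == b.chrom and a.kind == b.kind
            and a.start <= b.end and b.start <= a.end)


def evaluate(predicted: List[CNV], validated: List[CNV]) -> Tuple[float, float]:
    """Return (sensitivity, unconfirmed_fraction).

    sensitivity:          validated CNVs hit by any prediction / validated CNVs
    unconfirmed_fraction: predictions matching no validated CNV / predictions
                          (assumed stand-in for a false positive rate)
    """
    confirmed = [p for p in predicted if any(overlaps(p, v) for v in validated)]
    found = [v for v in validated if any(overlaps(p, v) for p in predicted)]
    sensitivity = len(found) / len(validated) if validated else 0.0
    unconfirmed = (len(predicted) - len(confirmed)) / len(predicted) if predicted else 0.0
    return sensitivity, unconfirmed


if __name__ == "__main__":
    predicted = [CNV("chr1", 100, 500, "DEL"), CNV("chr2", 1000, 2000, "DUP")]
    validated = [CNV("chr1", 400, 900, "DEL"), CNV("chr5", 1, 100, "DUP")]
    print(evaluate(predicted, validated))
```

A stricter matching rule, such as a minimum reciprocal overlap, would matter most for the short 1–4 exon CNVs the study focuses on.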

17.

SNES: single nucleus exome sequencing

Marco L Leung Yong Wang Jill Waters Nicholas E Navin 《Genome biology》2015,16(1)

Single-cell genome sequencing methods are challenged by poor physical coverage and high error rates, making it difficult to distinguish real biological variants from technical artifacts. To address this problem, we developed a method called SNES that combines flow-sorting of single G1/0 or G2/M nuclei, time-limited multiple-displacement-amplification, exome capture, and next-generation sequencing to generate high coverage (96%) data from single human cells. We validated our method in a fibroblast cell line, and show low allelic dropout and false-positive error rates, resulting in high detection efficiencies for single nucleotide variants (92%) and indels (85%) in single cells.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0616-2) contains supplementary material, which is available to authorized users.

18.

Transcriptome-based differentiation of closely-related Miscanthus lines

Chouvarine P Cooksey AM McCarthy FM Ray DA Baldwin BS Burgess SC Peterson DG 《PloS one》2012,7(1):e29850

Background

Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations.

Results

A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence.

Conclusions

Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of commercial high-performance cloud computing for computational GO annotation.

19.

Detection of internal exon deletion with exon Del

Yan Guo Shilin Zhao Brian D Lehmann Quanhu Sheng Timothy M Shaver Thomas P Stricker Jennifer A Pietenpol Yu Shyr 《BMC bioinformatics》2014,15(1)

Background

Exome sequencing allows researchers to study the human genome in unprecedented detail. Among the many types of variants detectable through exome sequencing, one of the most overlooked is internal deletion of exons. Internal exon deletions (IEDs) are the absence of consecutive exons in a gene. Such deletions have potentially significant biological meaning, and they are often too short to be considered copy number variation. Therefore, a need exists for efficient detection of such deletions using exome sequencing data.

Results

We present ExonDel, a tool specially designed to detect homozygous exon deletions efficiently. We tested ExonDel on exome sequencing data generated from 16 breast cancer cell lines and identified both novel and known IEDs. Subsequently, we verified our findings using RNAseq and PCR technologies. Further comparisons with multiple sequencing-based CNV tools showed that ExonDel is capable of detecting unique IEDs not found by other CNV tools.

Conclusions

ExonDel is an efficient way to screen for novel and known IEDs using exome sequencing data. ExonDel and its source code can be downloaded freely at https://github.com/slzhao/ExonDel.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-332) contains supplementary material, which is available to authorized users.
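The idea of an internal exon deletion, a run of consecutive exons with no coverage flanked by covered exons in the same gene, can be sketched as a scan over per-exon read depths. This is not ExonDel's algorithm, which the abstract does not describe. The function name, the depth thresholds and the input format (mean depth per exon, in transcript order) are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def internal_zero_runs(exon_depths: Sequence[float],
                       max_deleted_depth: float = 0.0,
                       min_flank_depth: float = 10.0) -> List[Tuple[int, int]]:
    """Find runs of consecutive internal exons that look homozygously deleted.

    Returns (first_index, last_index) pairs, 0-based and inclusive.
    A run is reported only if the exons immediately before and after it
    are well covered, so deletions reaching the first or last exon are
    ignored (they are not internal).
    """
    runs = []
    n = len(exon_depths)
    i = 1                                  # skip the first exon
    while i < n - 1:                       # skip the last exon
        if exon_depths[i] <= max_deleted_depth:
            j = i
            # Extend the run while the next internal exon is also uncovered.
            while j + 1 < n - 1 and exon_depths[j + 1] <= max_deleted_depth:
                j += 1
            if (exon_depths[i - 1] >= min_flank_depth
                    and exon_depths[j + 1] >= min_flank_depth):
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs


if __name__ == "__main__":
    depths = [35, 40, 0, 0, 38, 42]        # toy gene with six exons
    print(internal_zero_runs(depths))
```

Requiring covered flanks is one way to separate a true deletion from a gene that is simply poorly captured. A real tool would also need to account for GC content and probe failures, which the other abstracts on this page identify as causes of low coverage.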

20.

SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations

Steven N. Hart Vivekananda Sarangi Raymond Moore Saurabh Baheti Jaysheel D. Bhavsar Fergus J. Couch Jean-Pierre A. Kocher 《PloS one》2013,8(12)

Background

Structural variation (SV) represents a significant, yet poorly understood contribution to an individual’s genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SV including split-read, discordant read-pair, and unmated pairs. Co-localized split-reads and discordant read pairs are used to refine the breakpoints.

Results

We developed and validated SoftSearch using real and synthetic datasets. SoftSearch’s key features are 1) no requirement for secondary (or exhaustive primary) alignment, 2) portability into established sequencing workflows, and 3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read-pairs which on their own would not be sufficient to make an SV call.

Conclusions

We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
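The soft-clipping signal mentioned in the abstract above can be illustrated by parsing SAM CIGAR strings and counting reads whose clipped ends pile up at the same coordinate. This sketch covers only that one feature and is not SoftSearch's implementation: the function names, the `min_support` threshold and the (position, side) representation are assumptions, and discordant read pairs are not modelled at all.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = {"M", "D", "N", "=", "X"}   # operations that advance the reference

Site = Tuple[int, str]   # (1-based reference position, "L" or "R")


def clip_sites(pos: int, cigar: str) -> List[Site]:
    """Return reference positions where a read's alignment is soft-clipped.

    `pos` is the 1-based leftmost aligned position, as in the SAM POS field.
    "L" marks a clip before the first aligned base, "R" a clip after the last.
    Hard clips are ignored.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    sites = []
    if len(ops) > 1 and ops[0][1] == "S":
        sites.append((pos, "L"))
    if len(ops) > 1 and ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        sites.append((pos + ref_len - 1, "R"))
    return sites


def candidate_breakpoints(reads: Iterable[Tuple[int, str]],
                          min_support: int = 2) -> List[Site]:
    """Sites supported by at least `min_support` soft-clipped reads."""
    support = Counter(site for pos, cigar in reads
                      for site in clip_sites(pos, cigar))
    return sorted(site for site, n in support.items() if n >= min_support)


if __name__ == "__main__":
    reads = [(100, "20S80M"), (100, "35S65M"), (21, "80M20S"), (500, "100M")]
    print(candidate_breakpoints(reads))
```

Keeping the clip side in the key avoids merging left-clipped and right-clipped reads that happen to share a coordinate, since they describe different junctions.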