Similar articles (20 found)
1.

Background

Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations.

Methodology/Principal Findings

We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS.
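
The sensitivity and positive predictive value quoted above are simple ratios of calls against the Sanger truth set. The following Python sketch shows the arithmetic. The abstract reports 107 Sanger variants but not the raw true-positive, false-positive, or false-negative tallies, so the counts below are illustrative assumptions, not study data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of known (Sanger-confirmed) variants that the platform called."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of platform calls that are real variants."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only (not taken from the study): 106 of 107 known
# variants detected, plus 5 calls not confirmed by Sanger sequencing.
tp, fn, fp = 106, 1, 5
print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"PPV         = {positive_predictive_value(tp, fp):.1%}")
```

Keeping the two metrics as separate functions makes it explicit that sensitivity depends on missed variants while PPV depends on spurious calls.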

Conclusions/Significance

MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive.

2.

Background

The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM’s reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing.

Results

Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers, GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors without compromising sensitivity. In our best case scenario that involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM.
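
The effect of a Quality-by-Depth filter on the false discovery rate can be sketched in Python. This is only an illustration: QD is taken here as variant quality divided by read depth, the cut-off `MIN_QD` and all call records are hypothetical, and the VARW measure is not modelled.

```python
from dataclasses import dataclass


@dataclass
class IndelCall:
    name: str
    qual: float    # variant quality score (QUAL column of a VCF record)
    depth: int     # read depth at the site (DP)
    is_true: bool  # whether an orthogonal method confirmed the indel


def quality_by_depth(call: IndelCall) -> float:
    """QD: variant quality normalised by read depth."""
    return call.qual / call.depth if call.depth else 0.0


def false_discovery_rate(calls: list[IndelCall]) -> float:
    """FDR = FP / (FP + TP) among the calls that were kept."""
    if not calls:
        return 0.0
    false_pos = sum(not c.is_true for c in calls)
    return false_pos / len(calls)


MIN_QD = 2.0  # hypothetical cut-off, not the value used in the study

calls = [
    IndelCall("indel_A", qual=900.0, depth=300, is_true=True),   # QD 3.0
    IndelCall("indel_B", qual=150.0, depth=500, is_true=False),  # QD 0.3
    IndelCall("indel_C", qual=400.0, depth=100, is_true=True),   # QD 4.0
    IndelCall("indel_D", qual=700.0, depth=250, is_true=False),  # QD 2.8
]
kept = [c for c in calls if quality_by_depth(c) >= MIN_QD]
print("FDR before filtering:", false_discovery_rate(calls))
print("FDR after filtering: ", false_discovery_rate(kept))
```

In this toy data the filter removes one false positive without losing a true indel, which is the behaviour the abstract describes; a real threshold would need to be tuned on validated calls.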

Conclusions

New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterparts. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the use of this technology in future clinical genetic testing.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-516) contains supplementary material, which is available to authorized users.

3.

Objectives

The aims of this study were to test the utility of benchtop NGS platforms for NIPT for trisomy 21 using previously published z score calculation methods and to optimize the sample preparation and data analysis with use of in silico and physical size selection methods.

Methods

Samples from 130 pregnant women were analyzed by whole genome sequencing on benchtop NGS systems Ion Torrent PGM and MiSeq. The targeted yield of 3 million raw reads on each platform was used for z score calculation. The impact of in silico and physical size selection on analytical performance of the test was studied.

Results

Using a z score of 3 as the cut-off, Ion Torrent PGM showed 98.11%-100% (104-106/106) specificity and 100% (24/24) sensitivity, and MiSeq showed 99.06%-100% (105-106/106) specificity and 100% (24/24) sensitivity. After in silico size selection, both platforms reached 100% specificity and sensitivity. Following physical size selection, the z scores of the tested trisomic samples increased significantly (p = 0.0141 for Ion Torrent PGM and p = 0.025 for MiSeq).
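
A z score of this kind compares a sample's chromosome 21 read fraction with the distribution seen in euploid reference samples. The Python sketch below assumes the common formulation (sample fraction minus reference mean, divided by reference standard deviation); the abstract does not say which published method was used, and all read fractions are hypothetical.

```python
from statistics import mean, stdev

Z_CUTOFF = 3.0  # cut-off quoted in the abstract


def chr21_z_score(sample_fraction: float, euploid_fractions: list[float]) -> float:
    """Z score of a sample's chr21 read fraction against euploid references."""
    mu = mean(euploid_fractions)
    sigma = stdev(euploid_fractions)  # sample standard deviation
    return (sample_fraction - mu) / sigma


# Hypothetical chr21 read fractions for a euploid reference set.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

for label, fraction in [("sample_1", 0.0130), ("sample_2", 0.0137)]:
    z = chr21_z_score(fraction, reference)
    status = "above cut-off" if z > Z_CUTOFF else "below cut-off"
    print(f"{label}: z = {z:.2f} ({status})")
```

Because the score is scaled by the reference spread, anything that tightens the euploid distribution or enriches the fetal fraction (such as size selection) raises the z score of a trisomic sample.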

Conclusions

Noninvasive prenatal testing for trisomy 21 on benchtop NGS systems gave results equivalent to previously published studies performed on high-to-ultrahigh throughput NGS systems. In silico size selection led to higher specificity of the test. Physical size selection performed on isolated DNA led to a significant increase in z scores. These results could provide a basis for improving the cost-effectiveness of the test and thus help its adoption worldwide.

4.

Background

Influenza viruses exist as a large group of closely related viral genomes, also called quasispecies. The composition of this influenza viral quasispecies can be determined by an accurate and sensitive sequencing technique and data analysis pipeline. We compared the suitability of two benchtop next-generation sequencers for whole genome influenza A quasispecies analysis: the Illumina MiSeq sequencing-by-synthesis and the Ion Torrent PGM semiconductor sequencing technique.

Results

We first compared the accuracy and sensitivity of both sequencers using plasmid DNA and different ratios of wild type and mutant plasmid. Illumina MiSeq sequencing reads were one and a half times more accurate than those of the Ion Torrent PGM. The majority of sequencing errors were substitutions on the Illumina MiSeq and insertions and deletions, mostly in homopolymer regions, on the Ion Torrent PGM. To evaluate the suitability of the two techniques for determining the genome diversity of influenza A virus, we generated plasmid-derived PR8 virus and grew this virus in vitro. We also optimized an RT-PCR protocol to obtain uniform coverage of all eight genomic RNA segments. The sequencing reads obtained with both sequencers could successfully be assembled de novo into the segmented influenza virus genome. After mapping of the reads to the reference genome, we found that the detection limit for reliable recognition of variants in the viral genome required a frequency of 0.5% or higher. This threshold exceeds the background error rate resulting from the RT-PCR reaction and the sequencing method. Most of the variants in the PR8 virus genome were present in hemagglutinin, and these mutations were detected by both sequencers.
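
The 0.5% detection limit amounts to a frequency filter on the base counts at each mapped position. A minimal Python sketch follows, assuming per-position base counts are already available; the function name and the pileup values are hypothetical and are not part of the authors' pipeline.

```python
MIN_FREQUENCY = 0.005  # 0.5% detection limit reported in the abstract


def minor_variants(base_counts: dict[str, int], reference_base: str,
                   min_frequency: float = MIN_FREQUENCY) -> dict[str, float]:
    """Return non-reference bases whose frequency meets the threshold."""
    depth = sum(base_counts.values())
    if depth == 0:
        return {}
    return {
        base: count / depth
        for base, count in base_counts.items()
        if base != reference_base and count / depth >= min_frequency
    }


# Hypothetical pileup at one position: 10,000 reads, reference base "A".
pileup = {"A": 9920, "G": 60, "C": 15, "T": 5}
print(minor_variants(pileup, reference_base="A"))
```

Only the G allele (60 of 10,000 reads) passes here; the rarer C and T observations fall below the threshold and would be treated as background error.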

Conclusions

Our approach underlines the power and limitations of two commonly used next-generation sequencers for the analysis of influenza virus gene diversity. We conclude that the Illumina MiSeq platform is better suited for detecting variant sequences whereas the Ion Torrent PGM platform has a shorter turnaround time. The data analysis pipeline that we propose here will also help to standardize variant calling in small RNA genomes based on next-generation sequencing data.

5.

Introduction

Next Generation Sequencing (NGS) is a cost-effective and faster method to study genes, but its protocol is challenging.

Objective

To analyze different adjustments to the protocol for screening the BRCA genes using Ion Torrent PGM sequencing and correlate the results with the number of false positive (FP) variants.

Material and methods

We conducted a library preparation process and analyzed the number of FP InDels, the library concentration, the number of cycles in the target amplification step, the purity of the nucleic acid, the input, and the number of samples/Ion 314 chips in association with the results obtained by NGS.

Results

We carried out 51 reactions and nine adjustments of protocols and observed eight FP InDels in homopolymer regions. No FP Single-Nucleotide Polymorphism variant was observed; 67.5% of protocol variables were jointly associated with the quality of the results obtained (p<0.05). The number of FP InDels decreased when the quality of results increased.

Conclusion

The Ion AmpliSeq BRCA1/BRCA2 Community Panel performed better with four samples per Ion 314 chip instead of eight, and the optimum number of cycles in the amplification step, even when using high-quality DNA, was 23. We observed better results with the manual equalization process than with the Ion Library Equalizer kit. These adjustments provided a higher coverage of the variants and fewer artifacts (6.7-fold). Laboratories must perform internal validation because FP InDel variants can vary according to the quality of results, while the NGS assay should be validated with Sanger.

6.

Background

Targeted Next Generation Sequencing (NGS) offers a way to implement testing of multiple genetic aberrations in diagnostic pathology practice, which is necessary for personalized cancer treatment. However, no standards regarding input material have been defined. This study therefore aimed to determine the effect of the type of input material (e.g. formalin fixed paraffin embedded (FFPE) versus fresh frozen (FF) tissue) on NGS derived results. Moreover, this study aimed to explore a standardized analysis pipeline to support consistent clinical decision-making.

Method

We used the Ion Torrent PGM sequencing platform in combination with the Ion AmpliSeq Cancer Hotspot Panel v2 to sequence frequently mutated regions in 50 cancer related genes, and validated the NGS detected variants in 250 FFPE samples using standard diagnostic assays. Next, 386 tumour samples were sequenced to explore the effect of input material on variant detection variables. For variant calling, Ion Torrent analysis software was supplemented with additional variant annotation and filtering.

Results

Both FFPE and FF tissue could be sequenced reliably with a sensitivity of 99.1%. Validation showed a 98.5% concordance between NGS and conventional sequencing techniques, where NGS provided both the advantage of low input DNA concentration and the detection of low-frequency variants. The reliability of mutation analysis could be further improved with manual inspection of sequence data.

Conclusion

Targeted NGS can be reliably implemented in cancer diagnostics using both FFPE and FF tissue when using appropriate analysis settings, even with low input DNA.

7.
The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.

8.
Three benchtop high-throughput sequencing instruments are now available. The 454 GS Junior (Roche), MiSeq (Illumina) and Ion Torrent PGM (Life Technologies) are laser-printer sized and offer modest set-up and running costs. Each instrument can generate data required for a draft bacterial genome sequence in days, making them attractive for identifying and characterizing pathogens in the clinical setting. We compared the performance of these instruments by sequencing an isolate of Escherichia coli O104:H4, which caused an outbreak of food poisoning in Germany in 2011. The MiSeq had the highest throughput per run (1.6 Gb/run, 60 Mb/h) and lowest error rates. The 454 GS Junior generated the longest reads (up to 600 bases) and most contiguous assemblies but had the lowest throughput (70 Mb/run, 9 Mb/h). Run in 100-bp mode, the Ion Torrent PGM had the highest throughput (80–100 Mb/h). Unlike the MiSeq, the Ion Torrent PGM and 454 GS Junior both produced homopolymer-associated indel errors (1.5 and 0.38 errors per 100 bases, respectively).

9.

Background

The processing and analysis of the large scale data generated by next-generation sequencing (NGS) experiments is challenging and is a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial to accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also showed a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, then additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable.

Conclusions

Our findings will help to determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our codes are freely available at http://metamoodics.org/wes.

10.
The functional consequences of missense variants in disease genes are difficult to predict. We assessed if gene expression profiles could distinguish between BRCA1 or BRCA2 pathogenic truncating and missense mutation carriers and familial breast cancer cases whose disease was not attributable to BRCA1 or BRCA2 mutations (BRCAX cases). 72 cell lines from affected women in high-risk breast ovarian families were assayed after exposure to ionising irradiation, including 23 BRCA1 carriers, 22 BRCA2 carriers, and 27 BRCAX individuals. A subset of 10 BRCAX individuals carried rare BRCA1/2 sequence variants considered to be of low clinical significance (LCS). BRCA1 and BRCA2 mutation carriers had similar expression profiles, with some subclustering of missense mutation carriers. The majority of BRCAX individuals formed a distinct cluster, but BRCAX individuals with LCS variants had expression profiles similar to BRCA1/2 mutation carriers. Gaussian Process Classifier predicted BRCA1, BRCA2 and BRCAX status, with a maximum of 62% accuracy, and prediction accuracy decreased with inclusion of BRCAX samples carrying an LCS variant, and inclusion of pathogenic missense carriers. Similarly, prediction of mutation status with gene lists derived using Support Vector Machines was good for BRCAX samples without an LCS variant (82–94%), poor for BRCAX with an LCS (40–50%), and improved for pathogenic BRCA1/2 mutation carriers when the gene list used for prediction was appropriate to mutation effect being tested (71–100%). This study indicates that mutation effect, and presence of rare variants possibly associated with a low risk of cancer, must be considered in the development of array-based assays of variant pathogenicity.

11.
Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

12.

Background

With the price of next generation sequencing steadily decreasing, bacterial genome assembly is now accessible to a wide range of researchers. It is therefore necessary to understand the best methods for generating a genome assembly, specifically, which combination of sequencing and bioinformatics strategies result in the most accurate assemblies. Here, we sequence three E. coli strains on the Illumina MiSeq, Life Technologies Ion Torrent PGM, and Pacific Biosciences RS. We then perform genome assemblies on all three datasets alone or in combination to determine the best methods for the assembly of bacterial genomes.

Results

Three E. coli strains – BL21(DE3), Bal225, and DH5α – were sequenced to a depth of 100× on the MiSeq and Ion Torrent machines and to at least 125× on the PacBio RS. Four assembly methods were examined and compared. The previously published BL21(DE3) genome [GenBank:AM946981.2], allowed us to evaluate the accuracy of each of the BL21(DE3) assemblies. BL21(DE3) PacBio-only assemblies resulted in a 90% reduction in contigs versus short read only assemblies, while N50 numbers increased by over 7-fold. Strikingly, the number of SNPs in PacBio-only assemblies were less than half that seen with short read assemblies (~20 SNPs vs. ~50 SNPs) and indels also saw dramatic reductions (~2 indel >5 bp in PacBio-only assemblies vs. ~12 for short-read only assemblies). Assemblies that used a mixture of PacBio and short read data generally fell in between these two extremes. Use of PacBio sequencing reads also allowed us to call covalent base modifications for the three strains. Each of the strains used here had a known covalent base modification genotype, which was confirmed by PacBio sequencing.
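
The N50 statistic used above to compare assembly contiguity can be computed in a few lines of Python. This sketch uses the standard definition (the contig length at which the cumulative length of contigs, sorted longest first, reaches half the assembly); the contig lengths are hypothetical, not values from the study.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (bp) for a fragmented and a contiguous assembly.
short_read_assembly = [200_000, 150_000, 120_000, 90_000, 60_000, 30_000]
long_read_assembly = [600_000, 50_000]
print(n50(short_read_assembly), n50(long_read_assembly))
```

Both toy assemblies have the same total length, so the difference in N50 reflects only how that length is split into contigs.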

Conclusion

Using data generated solely from the Pacific Biosciences RS, we were able to generate the most complete and accurate de novo assemblies of E. coli strains. We found that the addition of other sequencing technology data offered no improvements over use of PacBio data alone. In addition, the sequencing data from the PacBio RS allowed for sensitive and specific calling of covalent base modifications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-675) contains supplementary material, which is available to authorized users.

13.
14.
Pre-existing low-frequency resistance-associated variants (RAVs) may jeopardize successful sustained virological responses (SVR) to HCV treatment with direct-acting antivirals (DAAs). However, the potential impact of low-frequency (∼0.1%) mutations, concatenated mutations (haplotypes), and their association with genotypes (Gts) on the treatment outcome has not yet been elucidated, most probably owing to the difficulty in detecting pre-existing minor haplotypes with sufficient length and accuracy. Herein, we characterize a methodological framework based on Illumina MiSeq next-generation sequencing (NGS) coupled with bioinformatics of quasispecies reconstruction (QSR) to realize highly accurate variant calling and genotype-haplotype detection. The core-to-NS3 protease coding sequences in 10 HCV monoinfected patients, 5 of whom had a history of blood transfusion, and 11 HCV/HIV coinfected patients with hemophilia, were studied. Simulation experiments showed that, for minor variants constituting more than 1%, our framework achieved a positive predictive value (PPV) of 100% and sensitivities of 91.7–100% for genotyping and 80.6% for RAV screening. Genotyping analysis indicated the prevalence of dominant Gt1a infection in coinfected patients (6/11 vs 0/10, p = 0.01). For clinical samples, minor genotype overlapping infection was prevalent in HCV/HIV coinfected hemophiliacs (10/11) and patients who experienced whole-blood transfusion (4/5) but none in patients without exposure to blood (0/5). As for RAV screening, the Q80K/R and S122K/R variants were particularly prevalent among minor RAVs observed, detected in 12/21 and 6/21 cases, respectively. Q80K was detected only in coinfected patients, whereas Q80R was predominantly detected in monoinfected patients (1/11 vs 7/10, p < 0.01). Multivariate interdependence analysis revealed the previously unrecognized prevalence of Gt1b-Q80K, in HCV/HIV coinfected hemophiliacs [Odds ratio = 13.4 (3.48–51.9), p < 0.01]. 
Our study revealed the distinct characteristics of viral quasispecies between the subgroups specified above and the feasibility of NGS and QSR-based genetic deconvolution of pre-existing minor Gts, RAVs, and their interrelationships.

15.
Breast cancer is the most commonly diagnosed cancer in women, with 10% of disease attributed to hereditary factors. Although BRCA1 and BRCA2 account for a high percentage of hereditary cases, there are more than 25 susceptibility genes that differentially impact the risk for breast cancer. Traditionally, germline testing for breast cancer was performed by Sanger dideoxy terminator sequencing in a reflexive manner, beginning with BRCA1 and BRCA2. The introduction of next-generation sequencing (NGS) has enabled the simultaneous testing of all genes implicated in breast cancer, resulting in diagnostic labs offering large, comprehensive gene panels. However, some physicians prefer to test only for those genes for which established surveillance and treatment protocols exist. The NGS-based BRCAplus test utilizes a custom tiled PCR-based target enrichment design and bioinformatics pipeline coupled with array comparative genomic hybridization (aCGH) to identify mutations in the six high-risk genes: BRCA1, BRCA2, PTEN, TP53, CDH1, and STK11. Validation of the assay with 250 previously characterized samples resulted in 100% detection of 3,025 known variants and analytical specificity of 99.99%. Analysis of the clinical performance of the first 3,000 BRCAplus samples referred for testing revealed an average coverage greater than 9,000X per target base pair, resulting in excellent specificity and the sensitivity to detect low-level mosaicism and allele drop-out. The unique design of the assay enabled the detection of pathogenic mutations missed by previous testing. With the abundance of NGS diagnostic tests being released, it is essential that clinicians understand the advantages and limitations of different test designs.

16.
The development of next generation sequencing has challenged the use of other molecular fingerprinting methods used to study microbial diversity. We analysed the bacterial diversity in the rumen of defaunated sheep following the introduction of different protozoal populations, using both next generation sequencing (NGS: Ion Torrent PGM) and terminal restriction fragment length polymorphism (T-RFLP). Although absolute numbers differed, there was a high correlation between NGS and T-RFLP in terms of richness and diversity, with R values of 0.836 and 0.781 for richness and Shannon-Wiener index, respectively. Dendrograms for both datasets were also highly correlated (Mantel test = 0.742). Eighteen OTUs and ten genera were significantly impacted by the addition of rumen protozoa, with an increase in the relative abundance of Prevotella, Bacteroides and Ruminobacter, related to an increase in free ammonia levels in the rumen. Our findings suggest that classic fingerprinting methods are still valuable tools to study microbial diversity and structure in complex environments but that NGS techniques now provide cost-effective alternatives that provide a far greater level of information on the individual members of the microbial population.
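
Richness and the Shannon-Wiener index mentioned above are both derived from per-OTU counts. A minimal Python sketch follows, assuming the natural-logarithm form of the index (the abstract does not state the base) and using hypothetical OTU counts.

```python
import math


def richness(otu_counts: list[int]) -> int:
    """Number of OTUs observed at least once."""
    return sum(1 for c in otu_counts if c > 0)


def shannon_index(otu_counts: list[int]) -> float:
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over observed OTUs."""
    total = sum(otu_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


# Hypothetical OTU count vectors for an even and an uneven community.
even_community = [25, 25, 25, 25]
uneven_community = [85, 5, 5, 5]
print(richness(even_community), shannon_index(even_community))
print(richness(uneven_community), shannon_index(uneven_community))
```

The two communities have the same richness, but the uneven one has a lower index, which is why the two measures are reported separately.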

17.
Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.

18.

Background

The spectrum of BRCA1 and BRCA2 mutations varies among populations; however, some mutations may be frequent in particular ethnic groups due to the “founder” effect. The c.3700_3704del mutation was previously described as a recurrent BRCA1 variant in Eastern European countries. This study aimed to investigate the frequency of c.3700_3704del BRCA1 mutation in Albanian breast and ovarian cancer patients from North Macedonia and Kosovo.

Materials and methods

A total of 327 patients with invasive breast and/or ovarian cancer (111 Albanian women from North Macedonia and 216 from Kosovo) were screened for 13 recurrent BRCA1/2 mutations. Targeted NGS with a panel of 94 cancer-associated genes including BRCA1 and BRCA2 was performed in a selected group of 118 patients.

Results

We have identified 21 BRCA1/2 pathogenic variants, 17 (14 BRCA1 and 3 BRCA2) in patients from Kosovo (7.9%) and 4 (1 BRCA1 and 3 BRCA2) in patients from North Macedonia (3.6%). All BRCA1/2 mutations were found in one patient each, except for c.3700_3704del BRCA1 mutation which was observed in 14 unrelated families, all except one originating from Kosovo. The c.3700_3704del mutation accounts for 93% of BRCA1 mutation positive cases and is present with a frequency of 6% among breast cancer patients from Kosovo.

Conclusions

This is the first report of BRCA1/2 mutations among breast and ovarian cancer patients from Kosovo. The finding that BRCA1 c.3700_3704del represents a founder mutation in Kosovo with the highest worldwide reported frequency supports the implementation of a fast and low-cost screening protocol, regardless of family history, and even a pilot population-based screening in the at-risk population.

19.

Motivation

Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost.

Results

We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study.

Conclusions

We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging ‘big data’ problems in biomedical research brought on by the expansion of NGS technologies.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0736-4) contains supplementary material, which is available to authorized users.

20.
We develop a statistical tool SNVer for calling common and rare variants in analysis of pooled or individual next-generation sequencing (NGS) data. We formulate variant calling as a hypothesis testing problem and employ a binomial-binomial model to test the significance of observed allele frequency against sequencing error. SNVer reports one single overall P-value for evaluating the significance of a candidate locus being a variant, based on which multiplicity control can be obtained. This is particularly desirable because tens of thousands of loci are simultaneously examined in typical NGS experiments. Each user can choose the false-positive error rate threshold he or she considers appropriate, instead of just the dichotomous decisions of whether to 'accept or reject the candidates' provided by most existing methods. We use both simulated data and real data to demonstrate the superior performance of our program in comparison with existing methods. SNVer runs very fast and can complete testing 300 K loci within an hour. This excellent scalability makes it feasible for analysis of whole-exome sequencing data, or even whole-genome sequencing data using a high-performance computing cluster. SNVer is freely available at http://snver.sourceforge.net/.
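
The idea of testing an observed allele count against sequencing error can be illustrated with a plain one-sided binomial test in Python. This is a deliberately simplified stand-in, not SNVer's binomial-binomial model: the error rate, read counts, and the Bonferroni-style threshold are all assumptions chosen for illustration.

```python
from math import comb


def binomial_tail(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate)."""
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(alt_reads, depth + 1)
    )


def is_variant(alt_reads: int, depth: int, error_rate: float, alpha: float) -> bool:
    """Call a variant when the observed alt count is unlikely under pure error."""
    return binomial_tail(alt_reads, depth, error_rate) < alpha


# Hypothetical locus: 100 reads, assumed 1% per-base error rate.
# alpha is Bonferroni-adjusted for 300,000 tested loci (an assumption).
alpha = 0.05 / 300_000
print(is_variant(alt_reads=2, depth=100, error_rate=0.01, alpha=alpha))
print(is_variant(alt_reads=30, depth=100, error_rate=0.01, alpha=alpha))
```

Returning a P-value from `binomial_tail` rather than a yes/no decision mirrors the point made in the abstract: the threshold, and any multiplicity correction, stays in the user's hands.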
