Similar Documents

20 similar documents retrieved.
1.

Background

Next generation sequencing (NGS) platforms are currently being utilized for targeted sequencing of candidate genes or genomic intervals to perform sequence-based association studies. To evaluate these platforms for this application, we analyzed human sequence generated by the Roche 454, Illumina GA, and the ABI SOLiD technologies for the same 260 kb in four individuals.

Results

Local sequence characteristics contribute to systematic variability in sequence coverage (>100-fold difference in per-base coverage), resulting in coverage patterns for each NGS technology that are highly correlated between samples. A comparison of the base calls to 88 kb of overlapping ABI 3730xL Sanger sequence generated for the same samples showed that the NGS platforms all have high sensitivity, identifying >95% of variant sites. At high coverage depth, base-calling errors are systematic, resulting from local sequence contexts; as the coverage is lowered, additional 'random sampling' errors in base calling occur.
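As a rough illustration of the coverage-pattern comparison described above, the sketch below computes the between-sample correlation of per-base coverage for one platform; the depth vectors are invented for illustration only.

```python
import numpy as np

# Hypothetical per-base coverage over the same targeted interval in two samples
# sequenced on the same platform (in practice these vectors would come from a
# depth-of-coverage tool run on the aligned reads of each sample).
coverage_sample1 = np.array([12, 340, 5, 88, 410, 3, 150, 97, 1, 260], dtype=float)
coverage_sample2 = np.array([15, 310, 8, 95, 380, 2, 170, 110, 2, 240], dtype=float)

# Log-transform (with a pseudocount) so a few very deep positions do not
# dominate, then compute the Pearson correlation between the two samples.
r = np.corrcoef(np.log10(coverage_sample1 + 1), np.log10(coverage_sample2 + 1))[0, 1]

# A high correlation suggests that coverage peaks and troughs are driven by
# local sequence context rather than by random sampling in a single library.
print(f"between-sample coverage correlation (log scale): r = {r:.2f}")
print(f"fold range of per-base coverage in sample 1: "
      f"{coverage_sample1.max() / coverage_sample1.min():.0f}x")
```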

Conclusions

Our study provides important insights into the systematic biases and data variability that need to be considered when utilizing NGS platforms for population-targeted sequencing studies.

2.

Background

Inherited retinal disorders are clinically and genetically heterogeneous, with more than 150 gene defects accounting for the diversity of disease phenotypes. To date, mutation detection has mainly been performed by APEX technology and direct Sanger sequencing of known genes. However, these methods are time consuming, expensive, and unable to provide a result if the patient carries a mutation in a new gene. In addition, the multiplicity of phenotypes associated with the same gene defect may be overlooked.

Methods

To overcome these challenges, we designed an exon sequencing array to target 254 known and candidate genes using Agilent capture. Subsequently, 20 DNA samples from 17 different families, including four patients with known mutations, were sequenced on the Illumina Genome Analyzer IIx next-generation sequencing (NGS) platform. Different filtering approaches were applied to identify the genetic defect. The most likely disease-causing variants were analyzed by Sanger sequencing. Co-segregation and sequencing analysis of control samples validated the pathogenicity of the observed variants.

Results

The phenotypes of the patients included retinitis pigmentosa, congenital stationary night blindness (CSNB), Best disease, early-onset cone dystrophy and Stargardt disease. In three of the four control samples with known genotypes, NGS detected the expected mutations. Three known and five novel mutations were identified in NR2E3, PRPF3, EYS, PRPF8, CRB1, TRPM1 and CACNA1F. One of the control samples with a known genotype belongs to a family with two clinical phenotypes (Best disease and CSNB), in which a novel mutation was identified for CSNB. In six families the disease-associated mutations were not found, indicating that novel gene defects remain to be identified.

Conclusions

In summary, this unbiased and time-efficient NGS approach allowed mutation detection in 75% of control cases and in 57% of test cases. Furthermore, it offers the possibility of associating known gene defects with novel phenotypes and modes of inheritance.

3.

Background

Techniques enabling targeted re-sequencing of the protein-coding sequences of the human genome on next-generation sequencing instruments are of great interest. We conducted a systematic comparison of the solution-based exome capture kits provided by Agilent and Roche NimbleGen. A control DNA sample was captured with all four capture methods and prepared for Illumina GAII sequencing. Sequence data from additional samples prepared with the same protocols were also used in the comparison.

Results

We developed a bioinformatics pipeline for quality control, short-read alignment, variant identification and annotation of the sequence data. In our analysis, a larger percentage of the high-quality reads from the NimbleGen captures than from the Agilent captures aligned to the capture target regions. High GC content of the target sequence was associated with poor capture success in all exome enrichment methods. Comparison of mean allele balances for heterozygous variants indicated a tendency toward more reference bases than variant bases at heterozygous variant positions within the target regions for all methods. There was virtually no difference in genotype concordance when compared to genotypes derived from SNP arrays. A minimum of 11× coverage was required to make a heterozygous genotype call with 99% accuracy when compared to common SNPs on genome-wide association arrays.
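The allele-balance calculation referred to above can be sketched as follows; the read counts are hypothetical, and allele balance is taken here as the fraction of reads supporting the reference allele at a heterozygous position.

```python
# Hypothetical (reference, variant) read counts at heterozygous positions
# within the capture target regions, one tuple per site.
het_sites = [(14, 9), (22, 18), (31, 20), (8, 7), (40, 25)]

balances = []
for ref_reads, alt_reads in het_sites:
    depth = ref_reads + alt_reads
    balance = ref_reads / depth  # fraction of reads supporting the reference allele
    balances.append(balance)

mean_balance = sum(balances) / len(balances)
# A mean above 0.5 reflects the tendency, noted in the text, to observe more
# reference than variant bases at heterozygous positions after capture.
print(f"mean allele balance (reference fraction): {mean_balance:.2f}")
```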

Conclusions

Libraries captured with NimbleGen kits aligned more accurately to the target regions. The updated NimbleGen kit covered the exome most efficiently at a minimum coverage of 20×, yet none of the kits captured all of the Consensus Coding Sequence annotated exons.

4.

Background

Innumerable opportunities for new genomic research have been stimulated by advances in high-throughput next-generation sequencing (NGS). However, a pitfall of NGS data abundance is the difficulty of distinguishing true biological variants from sequencing errors during downstream analysis. Many error correction methods have been developed to correct erroneous NGS reads before further analysis, but independent evaluation of the impact of dataset features such as read length, genome size, and coverage depth on their performance is lacking. This comparative study aims to investigate the strengths, weaknesses, and limitations of some of the newest k-spectrum-based methods and to provide recommendations for users in selecting suitable methods for specific NGS datasets.

Methods

Six k-spectrum-based methods, i.e., Reptile, Musket, Bless, Bloocoo, Lighter, and Trowel, were compared using six simulated sets of paired-end Illumina sequencing data. These NGS datasets varied in coverage depth (10× to 120×), read length (36 to 100 bp), and genome size (4.6 to 143 Mb). The Error Correction Evaluation Toolkit (ECET) was employed to derive a suite of metrics (i.e., true positives, false positives, false negatives, recall, precision, gain, and F-score) for assessing the correction quality of each method.
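For illustration, the sketch below computes recall, precision, gain, and F-score from true-positive, false-positive, and false-negative counts; the counts are hypothetical, and the gain formula follows the definition commonly used in error-correction benchmarks (net errors removed relative to all true errors), which may differ in detail from ECET's exact implementation.

```python
def correction_metrics(tp: int, fp: int, fn: int) -> dict:
    """Summarise error-correction quality from per-base counts.

    tp: erroneous bases correctly fixed
    fp: correct bases wrongly changed (errors introduced)
    fn: erroneous bases left uncorrected
    """
    recall = tp / (tp + fn)              # fraction of true errors that were fixed
    precision = tp / (tp + fp)           # fraction of changes that were correct
    gain = (tp - fp) / (tp + fn)         # net error reduction relative to all true errors
    f_score = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "gain": gain, "F-score": f_score}

# Hypothetical counts for one corrected dataset.
print(correction_metrics(tp=95_000, fp=4_000, fn=12_000))
```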

Results

Results from computational experiments indicate that Musket had the best overall performance across the spectrum of dataset variants reflected in the six datasets. The lowest accuracy of Musket (F-score = 0.81) occurred for a dataset with a medium read length (56 bp), medium coverage (50×), and a small genome (5.4 Mb). The other five methods underperformed (F-score < 0.80) and/or failed to process one or more datasets.

Conclusions

This study demonstrates that factors such as coverage depth, read length, and genome size may influence the performance of individual k-spectrum-based error correction methods. Thus, care must be taken in choosing appropriate methods for error correction of specific NGS datasets. Based on our comparative study, we recommend Musket as the top choice because of its consistently superior performance across all six test datasets. Further extensive studies are warranted to assess these methods using experimental datasets generated by NGS platforms (e.g., 454, SOLiD, and Ion Torrent) under more diversified parameter settings (k-mer values and edit distances) and to compare them against other, non-k-spectrum-based classes of error correction methods.

5.

Background

Rare coding variants constitute an important class of human genetic variation, but they are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequencies (2 to 5%), but because of insufficient sample size it is not clear whether the same trend holds for rare variants below 1% allele frequency.

Results

The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data for roughly 1,000 human genes in nearly 700 samples. Although medical whole-exome projects are currently underway, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. In line with the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population specificity and are enriched for functional variants.
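As a minimal sketch of the allele-frequency classification implied above, the code below computes the alternate-allele frequency from diploid genotype counts and flags sites below 1%; the genotype counts and SNP names are hypothetical.

```python
def alt_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Alternate-allele frequency from diploid genotype counts at one site."""
    n_chromosomes = 2 * (hom_ref + het + hom_alt)
    n_alt_alleles = het + 2 * hom_alt
    return n_alt_alleles / n_chromosomes

# Hypothetical genotype counts (hom ref, het, hom alt) for a few exonic SNPs
# in roughly 700 samples.
sites = {
    "SNP_A": (690, 8, 0),     # a few heterozygotes -> rare
    "SNP_B": (520, 160, 18),  # common variant
    "SNP_C": (697, 1, 0),     # singleton -> rare
}

for name, counts in sites.items():
    af = alt_allele_frequency(*counts)
    label = "rare (<1%)" if af < 0.01 else "common"
    print(f"{name}: AF = {af:.4f} ({label})")
```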

Conclusions

This study represents a large step toward detecting and interpreting low-frequency coding variation, clearly lays out the technical steps for effective analysis of DNA capture data, and articulates the functional and population properties of this important class of genetic variation.

6.

Background

Recent progress in high-throughput technologies has greatly contributed to the development of DNA methylation profiling. Although several reports describe methylome detection by whole-genome bisulfite sequencing, its high cost and heavy bioinformatics demands prevent its extensive application. Thus, current strategies for the study of mammalian DNA methylomes are still based primarily on genome-wide methylated DNA enrichment combined with DNA microarray detection or sequencing. Methylated DNA enrichment is a key step in microarray-based genome-wide methylation profiling studies, and even for future high-throughput sequencing-based methylome analyses.

Results

To evaluate the sensitivity and accuracy of methylated DNA enrichment, we investigated and optimized a number of important parameters to improve the performance of several enrichment assays, including differential methylation hybridization (DMH), microarray-based methylation assessment of single samples (MMASS), and methylated DNA immunoprecipitation (MeDIP). Each approach has its own advantages and disadvantages; we found that assays based on methylation-sensitive enzyme digestion and those based on immunoprecipitation detected different methylated DNA fragments, indicating that they are complementary in their relative ability to detect methylation differences.

Conclusions

Our study provides the first comprehensive evaluation of widely used methodologies for methylated DNA enrichment, and could be helpful for developing a cost-effective approach to DNA methylation profiling.

7.

Background

Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of next-generation sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition remains a great challenge for the currently available NGS platforms. The genomes of some important pathogenic organisms, such as Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content), display extremes of base composition. Standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage, particularly across AT- and GC-rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA, such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantities of starting material and tolerant of extremely AT-rich sequences.

Results

We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of the sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC-neutral templates.

Conclusion

We have developed a robust, optimized next-generation sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain library complexity at either extreme of base composition. This development will greatly benefit the sequencing of clinical samples, which often require amplification due to the low mass of starting DNA.

8.

Background

Recent advances in human genome sequencing and genotyping have revealed millions of single nucleotide polymorphisms (SNPs) that underlie variation among human beings. One particularly important effort is the International HapMap Project, which provides a catalogue of human genetic variation for disease association studies. In this paper, we analyzed the genotype data in the HapMap project using SNPs from the National Institute of Environmental Health Sciences Environmental Genome Project (NIEHS EGP). We first determine whether the HapMap data are transferable to the NIEHS data. Then, we study how well the HapMap SNPs capture the untyped SNPs in each region. Finally, we provide general guidelines for determining whether the SNPs chosen from HapMap are likely to capture most of the untyped SNPs.

Results

Our analysis shows that HapMap data are not robust enough to capture the untyped variants for most human genes. The performance of the SNPs for the European and Asian samples is marginal in capturing untyped variants, at approximately 55%. As expected, the SNPs from the HapMap YRI panel capture only approximately 30% of the variants. Although the overall performance is low, the SNPs for some genes perform very well and are able to capture most of the variants along the gene. This is observed in the European and Asian panels, but not in the African panel. From these observations, we conclude that SNP density and the association among reference SNPs are important for estimating how robustly a chosen reference SNP panel covers untyped variants.
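How well a typed SNP "captures" an untyped SNP is commonly assessed via the squared correlation (r²) between their genotypes; the sketch below illustrates this under an r² ≥ 0.8 threshold, which is a common convention rather than a value taken from this study, and the genotype vectors are invented.

```python
import numpy as np

def r_squared(genotypes_a: np.ndarray, genotypes_b: np.ndarray) -> float:
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts."""
    r = np.corrcoef(genotypes_a, genotypes_b)[0, 1]
    return r ** 2

# Hypothetical 0/1/2 genotype vectors for the same individuals.
typed_hapmap_snp = np.array([0, 1, 2, 1, 0, 0, 1, 2, 1, 0])
untyped_niehs_snp = np.array([0, 1, 2, 1, 0, 1, 1, 2, 1, 0])

r2 = r_squared(typed_hapmap_snp, untyped_niehs_snp)
# A common convention is to call an untyped SNP "captured" if some typed SNP
# tags it with r^2 >= 0.8; the threshold is an assumption, not the study's value.
print(f"r^2 = {r2:.2f}; captured: {r2 >= 0.8}")
```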

Conclusion

We have analyzed the coverage of HapMap SNPs using NIEHS EGP data. The results show that HapMap SNPs are transferable to the NIEHS SNPs. However, HapMap SNPs cannot capture some of the untyped SNPs, and therefore resequencing may be needed to uncover additional SNPs in the missing regions.

9.
Wei X, Ju X, Yi X, Zhu Q, Qu N, Liu T, Chen Y, Jiang H, Yang G, Zhen R, Lan Z, Qi M, Wang J, Yang Y, Chu Y, Li X, Guang Y, Huang J. PLoS ONE 2011, 6(12): e29500.

Background

Identification of gene variants plays an important role in research on and diagnosis of genetic diseases. A combination of enrichment of targeted genes and next-generation sequencing (targeted DNA-HiSeq) results in both high efficiency and low cost for targeted sequencing of genes of interest.

Methodology/Principal Findings

To identify mutations associated with genetic diseases, we designed an array-based gene chip to capture all of the exons of 193 genes involved in 103 genetic diseases. To evaluate this technology, we selected seven samples from seven patients with six different genetic diseases caused by six disease-causing genes, and 100 samples from normal human adults as controls. The data obtained showed that, on average, 99.14% of the 3,382 exons were successfully detected with more than 30-fold coverage using targeted DNA-HiSeq technology, and we found six known variants in four disease-causing genes and two novel mutations in two other disease-causing genes (the STS gene for XLI and the FBN1 gene for MFS), as well as one exon deletion mutation in the DMD gene. These results were confirmed in their entirety using either Sanger sequencing or real-time PCR.

Conclusions/Significance

Targeted DNA-HiSeq combines next-generation sequencing with the capture of sequences from a relevant subset of high-interest genes. This method was tested by capturing sequences from a DNA library through hybridization to oligonucleotide probes specific for genetic disorder-related genes, and was found to show high selectivity, improve the detection of mutations, enable the discovery of novel variants, and provide additional indel data. Thus, targeted DNA-HiSeq can be used to analyze the gene variant profiles of monogenic diseases with high sensitivity, fidelity, throughput and speed.

10.

Background

High-throughput custom-designed genotyping arrays are a valuable resource for biologically focused research studies and, increasingly, for validation of variation predicted by next-generation sequencing (NGS) technologies. We investigate the Illumina GoldenGate chemistry using custom-designed VeraCode and Sentrix Array Matrix (SAM) assays for each of these applications, respectively. We highlight approaches for the interpretation of Illumina-generated genotype cluster plots to maximise data inclusion and reduce genotyping errors.

Findings

We illustrate the dramatic effect of outliers on genotype calling and data interpretation, and suggest simple means of avoiding genotyping errors. Furthermore, we present this platform as a successful method for two-cluster rare or non-autosomal variant calling. The ability of high-throughput technologies to accurately call rare variants will become an essential feature of future association studies. Finally, we highlight an additional advantage of the Illumina GoldenGate chemistry: it generates unusually segregated cluster plots that identify potential NGS-generated sequencing errors resulting from minimal coverage.

Conclusions

We demonstrate the importance of visually inspecting the genotype cluster plots generated by the Illumina software and issue warnings regarding commonly accepted quality control parameters. In addition to suggesting approaches to minimise data exclusion, we propose that the Illumina cluster plots may be helpful in identifying potential input sequence errors, which is particularly important for studies aiming to validate NGS-generated variation.

11.

Background

Target enrichment and resequencing is a widely used approach for the identification of cancer genes and genetic variants associated with diseases. Although cost-effective compared to whole-genome sequencing, analysis of many samples still constitutes a significant cost, which could be reduced by pooling samples before capture. Another limitation on the number of cancer samples that can be analyzed is often the amount of available tumor DNA. We evaluated the performance of whole-genome-amplified DNA and the power to detect subclonal somatic single nucleotide variants in non-indexed pools of cancer samples, using the HaloPlex technology for target enrichment and next-generation sequencing.

Results

We captured a set of 1,528 putative somatic single nucleotide variants and germline SNPs, previously identified by whole-genome sequencing, with the HaloPlex technology and sequenced them to a depth of 792–1752. We found that the allele fractions of the analyzed variants are well preserved during whole-genome amplification and that capture specificity and variant calling are not affected. We detected the large majority of the known single nucleotide variants present uniquely in one sample, with allele fractions as low as 0.1, in non-indexed pools of up to ten samples. We also identified and experimentally validated six novel variants in the samples included in the pools.

Conclusion

Our work demonstrates that whole-genome-amplified DNA can be used for target enrichment as effectively as genomic DNA and that accurate variant detection is possible in non-indexed pools of cancer samples. These findings show that analysis of a large number of samples is feasible at low cost, even when only small amounts of DNA are available, and thereby significantly increases the chances of identifying recurrent mutations in cancer samples.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-14-856) contains supplementary material, which is available to authorized users.

12.

Background

Next-generation sequencing (NGS) is the new method for sequencing DNA. But what does this actually mean, and how does it differ from Sanger sequencing? This review provides insights into next-generation sequencing, which does not represent a single technology but rather comprises many different new techniques.

Technology and application

The most commonly used sequencing machines and techniques are explained in detail, covering their similarities as well as their differences, advantages and disadvantages. The reader will learn that no single machine is perfect for all applications; rather, the best machine has to be chosen for a given application. In addition, the possibility of outsourcing is discussed, which could be interesting for some laboratories. Furthermore, analogous to the polymerase chain reaction for Sanger sequencing, one also has to enrich for the region of interest for most NGS applications. For this purpose, various methods can be selected, depending on the number of genes and samples to be investigated.

Future perspectives

Insights into future technologies are provided, underlining that the genetic revolution is ongoing.

13.

Background

The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments are challenging and constitute a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal.

Results

We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole-exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing of a total of 700 variants or array genotyping data for a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved to be crucial for accurate variant calling. The GATK HaplotypeCaller algorithm for variant calling outperformed the UnifiedGenotyper algorithm. We also show a relationship between mapping quality, read depth and allele balance, and SNV call accuracy. However, if best practices are used in data processing, additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable.
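A minimal sketch of the positive-predictive-value calculation used in the comparison above; the validation counts and caller labels are hypothetical, not the study's actual numbers.

```python
def positive_predictive_value(confirmed: int, not_confirmed: int) -> float:
    """PPV of variant calls validated against a gold standard (e.g. Sanger)."""
    return confirmed / (confirmed + not_confirmed)

# Hypothetical validation counts (confirmed, not confirmed) for two callers;
# illustrative numbers only, not the counts reported in the study.
callers = {"caller_A": (465, 35), "caller_B": (402, 98)}

for name, (confirmed, not_confirmed) in callers.items():
    print(f"{name}: PPV = {positive_predictive_value(confirmed, not_confirmed):.2%}")
```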

Conclusions

Our findings will help determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes.

14.

Background

Oral squamous cell carcinoma (OSCC) is mainly caused by smoking and alcohol abuse and shows a five-year survival rate of ~50%. We aimed to explore the variation of somatic mitochondrial DNA (mtDNA) mutations in primary oral tumors, recurrences and metastases.

Methods

We performed an in-depth validation of mtDNA next-generation sequencing (NGS) on an Illumina HiSeq 2500 platform for its application to cancer tissues, with the goal of detecting low-level heteroplasmies and avoiding artifacts. To this end, we genotyped the mitochondrial genome (16.6 kb) in 85 tissue samples (tumors, recurrences, resection edges, metastases and blood) collected from 28 prospectively recruited OSCC patients, applying both Sanger sequencing and high-coverage NGS (~35,000 reads per base).

Results

We observed a strong correlation between Sanger sequencing and NGS in estimating the mixture ratio of heteroplasmies (r = 0.99; p<0.001). Non-synonymous heteroplasmic variants were enriched in cancerous tissues. The proportions of somatic and inherited variants in a given gene region were strongly correlated (r = 0.85; p<0.001). Half of the patients shared mutations between benign and cancerous tissue samples. Low-level heteroplasmies (<10%) were more frequent in benign samples than in tumor samples, where heteroplasmies >10% were predominant. Four out of six patients who developed a local tumor recurrence showed mutations in the recurrence that had also been observed in the primary tumor. Three out of five patients who had tumor metastases in the lymph nodes of their necks shared mtDNA mutations between the primary tumor and the lymph node metastases. The percentage of mutation heteroplasmy increased from the primary tumor to the lymph node metastases.
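For illustration, heteroplasmy quantification from NGS read counts and the agreement check between Sanger and NGS estimates can be sketched as follows; all read counts and Sanger estimates are invented.

```python
import numpy as np

def heteroplasmy_fraction(minor_allele_reads: int, total_reads: int) -> float:
    """Heteroplasmy level at an mtDNA position: fraction of reads with the minor allele."""
    return minor_allele_reads / total_reads

# Hypothetical high-coverage NGS counts at a few mtDNA positions (~35,000 reads per base).
ngs_levels = [heteroplasmy_fraction(m, 35_000) for m in (420, 3_150, 8_900, 17_200)]

# Hypothetical heteroplasmy estimates for the same positions from Sanger electropherograms.
sanger_levels = [0.02, 0.10, 0.24, 0.48]

# Pearson correlation between the two sets of estimates.
r = np.corrcoef(ngs_levels, sanger_levels)[0, 1]
print("NGS heteroplasmy levels:", [f"{x:.3f}" for x in ngs_levels])
print(f"Sanger vs NGS correlation: r = {r:.2f}")
```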

Conclusions

We conclude that Sanger sequencing is valid for quantifying heteroplasmies ≥10% and that NGS is capable of reliably detecting and quantifying heteroplasmies down to the 1% level. The finding of shared mutations between primary tumors, recurrences and metastases indicates a clonal origin of the malignant cells in oral cancer.

15.
16.

Background

With the development of sequencing technologies, more and more sequence variants are available for investigation. Different classes of variants in the human genome have been identified, including single nucleotide substitutions, insertions and deletions, and large structural variations such as duplications and deletions. Insertion and deletion (indel) variants comprise a major proportion of human genetic variation. However, little is known about their effects in humans. This lack of understanding is largely due to the absence of both biological data and computational resources.

Results

This paper presents a new indel functional prediction method, HMMvar, based on hidden Markov model (HMM) profiles, which capture the conservation information in sequences. The results demonstrate that a scoring strategy based on HMM profiles can achieve good performance in identifying deleterious or neutral variants across different data sets, and can predict the protein functional effects of both single and multiple mutations.
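The sketch below illustrates only the general idea behind profile-based scoring: a log-odds comparison of mutant and wild-type residues under a position-specific profile built from homologous sequences. It is not the HMMvar implementation, and the profile probabilities are invented.

```python
import math

# A toy position-specific profile: for each alignment column, the probability of
# observing each amino acid, as might be derived from an alignment of homologs.
profile = [
    {"A": 0.70, "G": 0.20, "S": 0.10},   # column 1: moderately conserved
    {"L": 0.95, "I": 0.03, "V": 0.02},   # column 2: highly conserved
    {"D": 0.40, "E": 0.35, "N": 0.25},   # column 3: variable
]
BACKGROUND = 0.05  # probability assigned to residues unseen in a column

def column_prob(column: dict, residue: str) -> float:
    return column.get(residue, BACKGROUND)

def score_variant(wild: str, mutant: str) -> float:
    """Log-odds of the mutant vs. wild-type sequence under the toy profile.

    Strongly negative scores suggest a change at a conserved column, i.e. a
    candidate deleterious variant; scores near zero suggest a tolerated change.
    """
    score = 0.0
    for col, (w, m) in zip(profile, zip(wild, mutant)):
        score += math.log(column_prob(col, m) / column_prob(col, w))
    return score

print(f"ALD -> AID: {score_variant('ALD', 'AID'):.2f}")  # hits the conserved column
print(f"ALD -> ALE: {score_variant('ALD', 'ALE'):.2f}")  # hits the variable column
```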

Conclusions

This paper proposes a quantitative prediction method, HMMvar, for predicting the effect of genetic variation using hidden Markov models. The HMM-based pipeline implementing HMMvar is freely available at https://bioinformatics.cs.vt.edu/zhanglab/hmm.

17.

Background

Targeted next-generation sequencing (NGS) offers a way to implement testing of multiple genetic aberrations in diagnostic pathology practice, which is necessary for personalized cancer treatment. However, no standards regarding input material have been defined. This study therefore aimed to determine the effect of the type of input material (e.g. formalin-fixed paraffin-embedded (FFPE) versus fresh-frozen (FF) tissue) on NGS-derived results. Moreover, this study aimed to explore a standardized analysis pipeline to support consistent clinical decision-making.

Method

We used the Ion Torrent PGM sequencing platform in combination with the Ion AmpliSeq Cancer Hotspot Panel v2 to sequence frequently mutated regions in 50 cancer-related genes, and validated the NGS-detected variants in 250 FFPE samples using standard diagnostic assays. Next, 386 tumour samples were sequenced to explore the effect of input material on variant detection variables. For variant calling, the Ion Torrent analysis software was supplemented with additional variant annotation and filtering.

Results

Both FFPE and FF tissue could be sequenced reliably, with a sensitivity of 99.1%. Validation showed 98.5% concordance between NGS and conventional sequencing techniques, with NGS providing the advantages of a low input DNA requirement and the detection of low-frequency variants. The reliability of mutation analysis could be further improved by manual inspection of the sequence data.

Conclusion

Targeted NGS can be reliably implemented in cancer diagnostics using both FFPE and FF tissue when appropriate analysis settings are used, even with low input DNA.

18.
Asan, Xu Y, Jiang H, Tyler-Smith C, Xue Y, Jiang T, Wang J, Wu M, Liu X, Tian G, Wang J, Wang J, Yang H, Zhang X. Genome Biology 2011, 12(9): R95.

Background

Exome sequencing, which allows the global analysis of protein-coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, their relative performance has not been characterized sufficiently to know which is best for a particular study.

Results

We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including the number of genes covered and capture efficacy. Differences that may affect the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity for targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including for medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.

Conclusions

We demonstrate key differences between the three platforms, particularly the advantages of solution-based capture over array capture and the importance of a large gene target set.

19.

Background

Detection of copy number variants (CNVs) is an important aspect of clinical testing for several disorders, including Duchenne muscular dystrophy, and is often performed using multiplex ligation-dependent probe amplification (MLPA). However, since many genetic carrier screens depend instead on next-generation sequencing (NGS) for wider discovery of small variants, they often do not include CNV analysis. Moreover, most computational techniques developed to detect CNVs from exome sequencing data are not suitable for carrier screening, as they require matched normals, very large cohorts, or extensive gene panels.

Methods

We present a computational software package, geneCNV (http://github.com/vkozareva/geneCNV), which can identify exon-level CNVs using exome sequencing data from only a few genes. The tool relies on a hierarchical parametric model trained on a small cohort of reference samples.

Results

Using geneCNV, we accurately inferred heterozygous CNVs in the DMD gene across a cohort of 15 test subjects. These results were validated against MLPA, the current standard for clinical CNV analysis in DMD. We also benchmarked the tool’s performance against other computational techniques and found comparable or improved CNV detection in DMD using data from panels ranging from 4,000 genes to as few as 8 genes.
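As a rough sketch of the general idea behind exon-level CNV detection from read depth (a normalized copy ratio of a test sample relative to reference samples), the following uses invented depths, thresholds, and exon names; it is not geneCNV's hierarchical parametric model, which the paper describes only at a high level.

```python
import statistics

# Hypothetical mean read depth per DMD exon in reference samples and in one test sample.
reference_depth = {"exon_44": 210.0, "exon_45": 198.0, "exon_46": 205.0, "exon_51": 190.0}
test_depth      = {"exon_44": 205.0, "exon_45":  96.0, "exon_46": 200.0, "exon_51": 188.0}

# Normalize each sample by its own median depth (in practice this would use
# many more targets), then take the ratio test/reference per exon.
ref_median = statistics.median(reference_depth.values())
test_median = statistics.median(test_depth.values())

for exon in reference_depth:
    ratio = (test_depth[exon] / test_median) / (reference_depth[exon] / ref_median)
    # Roughly 1.0 = two copies; roughly 0.5 = heterozygous deletion; roughly 1.5 = duplication.
    call = "heterozygous deletion" if ratio < 0.7 else ("duplication" if ratio > 1.3 else "normal")
    print(f"{exon}: copy ratio = {ratio:.2f} ({call})")
```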

Conclusions

geneCNV enables the creation of cost-effective screening panels by allowing NGS-based approaches to generate results equivalent to those of bespoke genotyping assays like MLPA. By using a parametric model to detect CNVs, it also fulfills regulatory requirements to define a reference range for a genetic test. It is freely available and can be incorporated into any Illumina sequencing pipeline to create clinical assays for the detection of exon duplications and deletions.

20.