1.

Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing

Mi-Hyun Park Hwanseok Rhee Jung Hoon Park Hae-Mi Woo Byung-Ok Choi Bo-Young Kim Ki Wha Chung Yoo-Bok Cho Hyung Jin Kim Ji-Won Jung Soo Kyung Koo 《PloS one》2014,9(1)

Next-generation sequencing (NGS) has enabled the high-throughput discovery of germline and somatic mutations. However, NGS-based variant detection is still prone to errors, resulting in inaccurate variant calls. Here, we categorized the variants detected by NGS according to total read depth (TD) and SNP quality (SNPQ), and performed Sanger sequencing with 348 selected non-synonymous single nucleotide variants (SNVs) for validation. Using the SAMtools and GATK algorithms, the validation rate was positively correlated with SNPQ but showed no correlation with TD. In addition, common variants called by both programs had a higher validation rate than caller-specific variants. We further examined several parameters to improve the validation rate, and found that strand bias (SB) was a key parameter. SB in NGS data showed a strong difference between the variants passing validation and those that failed validation, showing a validation rate of more than 92% (filtering cutoff value: alternate allele forward [AF]≥20 and AF<80 in SAMtools, SB<–10 in GATK). Moreover, the validation rate increased significantly (up to 97–99%) when the variant was filtered together with the suggested values of mapping quality (MQ), SNPQ and SB. This detailed and systematic study provides comprehensive recommendations for improving validation rates, saving time and lowering cost in NGS analyses.
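The strand-bias and consensus rules in this abstract can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes that the SAMtools "AF" value is the percentage of alternate-allele reads on the forward strand, that the GATK SB annotation is available as a number, and that calls are keyed by hypothetical (chrom, pos, ref, alt) tuples. The MQ and SNPQ cutoffs are not stated in the abstract, so they are left out.

```python
# Sketch of the strand-bias and consensus filters described above.
# Assumptions (not from the paper): "AF" = percentage of alternate-allele
# reads on the forward strand; calls are hypothetical (chrom, pos, ref, alt) tuples.


def samtools_strand_bias_ok(alt_forward: int, alt_reverse: int) -> bool:
    """Keep a SAMtools call when 20 <= AF < 80 (forward share of alt reads, in %)."""
    total = alt_forward + alt_reverse
    if total == 0:
        return False  # no alternate-allele reads, nothing to evaluate
    af = 100.0 * alt_forward / total
    return 20.0 <= af < 80.0


def gatk_strand_bias_ok(sb: float) -> bool:
    """Keep a GATK call when its strand-bias (SB) annotation is below -10."""
    return sb < -10.0


def common_calls(samtools_calls: set, gatk_calls: set) -> set:
    """Variants reported by both callers (these validated more often)."""
    return samtools_calls & gatk_calls


if __name__ == "__main__":
    print(samtools_strand_bias_ok(12, 9))  # balanced strands -> True
    print(samtools_strand_bias_ok(30, 1))  # almost all forward -> False
    print(gatk_strand_bias_ok(-25.3))      # True
```

Requiring both a balanced strand distribution and agreement between callers mirrors the two effects the abstract reports; the real thresholds for MQ and SNPQ would need to come from the paper itself.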

2.

Pathogenicity of somatic variants in common tumors and developmental disorders of the nervous system

Liu Fang Song Xiaozhen Xie Hua Chen Xiaoli 《Hereditas (Beijing)》2016,38(3):196-205

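During the development of an organism, various endogenous and exogenous factors can cause DNA damage, giving rise to somatic variants. Studies have shown that somatic variants are pathogenic in tumors, whereas their pathogenicity in developmental disorders of the nervous system has rarely been reported. Advances in next-generation sequencing, especially the application of whole-exome sequencing and targeted deep sequencing, have greatly increased the sensitivity of detecting low-frequency somatic variants, leading researchers to re-evaluate the pathogenic role of somatic variants in nervous system tumors and developmental disorders. This article reviews research progress on the pathogenicity of somatic variants in nervous system tumors and developmental disorders, with the aim of offering new ideas for future studies of the genetic etiology of these diseases and a theoretical basis for new drug development.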

3.

Detailed evaluation of cancer sequencing pipelines in different microenvironments and heterogeneity levels

Batuhan Kısakol Şahin Sarıhan Mehmet Arif Ergün Mehmet Baysan 《Turkish Journal of Biology》2021,45(2):114

The importance of next-generation sequencing (NGS) in cancer research is rising as access to this key technology becomes easier for researchers. The sequence data created by NGS technologies must be processed by various bioinformatics algorithms within a pipeline in order to convert raw data to meaningful information. Mapping and variant calling are the two main steps of these analysis pipelines, and many algorithms are available for these steps. Therefore, detailed benchmarking of these algorithms in different scenarios is crucial for the efficient utilization of sequencing technologies. In this study, we compared the performance of twelve pipelines (three mapping and four variant discovery algorithms) with recommended settings to capture single nucleotide variants. We observed significant discrepancy in variant calls among tested pipelines for different heterogeneity levels in real and simulated samples, with overall high specificity and low sensitivity. In addition to the individual evaluation of pipelines, we also constructed and tested the performance of pipeline combinations. In these analyses, we observed that certain pipelines complement each other much better than others and outperform individual pipelines. This suggests that adhering to a single pipeline is not optimal for cancer sequencing analysis and that sample heterogeneity should be considered in algorithm optimization.

4.

Quantitative and Sensitive Detection of GNAS Mutations Causing McCune-Albright Syndrome with Next Generation Sequencing

Satoshi Narumi Kumihiro Matsuo Tomohiro Ishii Yusuke Tanahashi Tomonobu Hasegawa 《PloS one》2013,8(3)

Somatic activating GNAS mutations cause McCune-Albright syndrome (MAS). Owing to low mutation abundance, mutant-specific enrichment procedures, such as the peptide nucleic acid (PNA) method, are required to detect mutations in peripheral blood. Next generation sequencing (NGS) can analyze millions of PCR amplicons independently, thus it is expected to detect low-abundance GNAS mutations quantitatively. In the present study, we aimed to develop an NGS-based method to detect low-abundance somatic GNAS mutations. PCR amplicons encompassing exons 8 and 9 of GNAS, in which most activating mutations occur, were sequenced on the MiSeq instrument. As expected, our NGS-based method could sequence the GNAS locus with very high read depth (approximately 100,000) and low error rate. A serial dilution study with use of cloned mutant and wildtype DNA samples showed a linear correlation between dilution and measured mutation abundance, indicating the reliability of quantification of the mutation. Using the serially diluted samples, the detection limits of three mutation detection methods (the PNA method, NGS, and combinatory use of PNA and NGS [PNA-NGS]) were determined. The lowest detectable mutation abundance was 1% for the PNA method, 0.03% for NGS and 0.01% for PNA-NGS. Finally, we analyzed 16 MAS patient-derived leukocytic DNA samples with the three methods, and compared their mutation detection rates. Mutation detection rates of the PNA method, NGS and PNA-NGS in 16 patient-derived peripheral blood samples were 56%, 63% and 75%, respectively. In conclusion, NGS can detect somatic activating GNAS mutations quantitatively and sensitively from peripheral blood samples. At present, the PNA-NGS method is likely the most sensitive method to detect low-abundance GNAS mutation.

5.

Comprehensive Mutation Analysis for Congenital Muscular Dystrophy: A Clinical PCR-Based Enrichment and Next-Generation Sequencing Panel

C. Alexander Valencia Arunkanth Ankala Devin Rhodenizer Shruti Bhide Martin Robert Littlejohn Lisa Mari Keong Anne Rutkowski Susan Sparks Carsten Bonnemann Madhuri Hegde 《PloS one》2013,8(1)

The congenital muscular dystrophies (CMDs) comprise a heterogeneous group of heritable muscle disorders with muscle pathology that is often difficult to interpret, making them challenging to diagnose. Serial Sanger sequencing of suspected CMD genes, while the current molecular diagnostic method of choice, can be slow and expensive. A comprehensive panel test for simultaneous screening of mutations in all known CMD-associated genes would be a more effective diagnostic strategy. Thus, the CMDs are a model disorder group for development and validation of next-generation sequencing (NGS) strategies for diagnostic and clinical care applications. Using a highly multiplexed PCR-based target enrichment method (RainDance) in conjunction with NGS, we performed mutation detection in all CMD genes of 26 samples and compared the results with Sanger sequencing. The RainDance NGS panel showed great consistency in coverage depth, on-target efficiency, versatility of mutation detection, and genotype concordance with Sanger sequencing, demonstrating the test's appropriateness for clinical use. Compared to single tests, a higher diagnostic yield was observed by panel implementation. The panel's limitation is the amplification failure of select gene-specific exons, which require Sanger sequencing for test completion. Successful validation and application of the CMD NGS panel to improve the diagnostic yield in a clinical laboratory was shown.

6.

Validation and Application of a Custom-Designed Targeted Next-Generation Sequencing Panel for the Diagnostic Mutational Profiling of Solid Tumors

Guy Froyen An Broekmans Femke Hillen Karin Pat Ruth Achten Jeroen Mebis Jean-Luc Rummens Johan Willemse Brigitte Maes 《PloS one》2016,11(4)

The inevitable switch from standard molecular methods to next-generation sequencing for the molecular profiling of tumors is challenging for most diagnostic laboratories. However, fixed validation criteria for diagnostic accreditation are not in place because of the great variability in methods and aims. Here, we describe the validation of a custom panel of hotspots in 24 genes for the detection of somatic mutations in non-small cell lung carcinoma, colorectal carcinoma and malignant melanoma starting from FFPE sections, using 14, 36 and 5 cases, respectively. The targeted hotspots were selected for their present or future clinical relevance in solid tumor types. The target regions were enriched with the TruSeq approach starting from limited amounts of DNA. Cost effective sequencing of 12 pooled libraries was done using a micro flow cell on the MiSeq and subsequent data analysis with MiSeqReporter and VariantStudio. The entire workflow was diagnostically validated showing a robust performance with maximal sensitivity and specificity using as thresholds a variant allele frequency >5% and a minimal amplicon coverage of 300. We implemented this method through the analysis of 150 routine diagnostic samples and identified clinically relevant mutations in 16 genes including KRAS (32%), TP53 (32%), BRAF (12%), APC (11%), EGFR (8%) and NRAS (5%). Importantly, the highest success rate was obtained when using also the low quality DNA samples. In conclusion, we provide a workflow for the validation of targeted NGS by a custom-designed pan-solid tumor panel in a molecular diagnostic lab and demonstrate its robustness in a clinical setting.
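The two acceptance thresholds reported above (VAF >5%, amplicon coverage of at least 300) can be expressed as a small filter. This is a minimal sketch, assuming VAF is computed as alternate-allele reads divided by the amplicon depth at the site; the function name and arguments are hypothetical and not part of MiSeqReporter or VariantStudio.

```python
def passes_panel_thresholds(alt_reads: int, depth: int,
                            min_vaf: float = 0.05, min_depth: int = 300) -> bool:
    """Hypothetical check: VAF strictly above 5% and coverage of at least 300 reads."""
    if depth < min_depth:
        return False  # under-covered amplicon: the call is not reportable
    vaf = alt_reads / depth
    return vaf > min_vaf


calls = [(40, 500), (10, 400), (50, 250)]  # (alt_reads, depth) examples
print([passes_panel_thresholds(a, d) for a, d in calls])  # [True, False, False]
```

Checking depth first means a high VAF on a poorly covered amplicon (the third example) is still rejected, which matches the idea of a minimal coverage requirement.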

7.

Next‐generation sequencing (NGS)‐based identification of induced mutations in a doubly mutagenized tomato (Solanum lycopersicum) population

Prateek Gupta Sameera Devulapalli Bradley John Till Yellamaraju Sreelakshmi Rameshwar Sharma 《The Plant journal : for cell and molecular biology》2017,92(3):495-508

The identification of mutations in targeted genes has been significantly simplified by the advent of TILLING (Targeting Induced Local Lesions In Genomes), speeding up the functional genomic analysis of animals and plants. Next‐generation sequencing (NGS) is gradually replacing classical TILLING for mutation detection, as it allows the analysis of a large number of amplicons in short durations. The NGS approach was used to identify mutations in a population of Solanum lycopersicum (tomato) that was doubly mutagenized by ethylmethane sulphonate (EMS). Twenty‐five genes belonging to carotenoids and folate metabolism were PCR‐amplified and screened to identify potentially beneficial alleles. To augment efficiency, the 600‐bp amplicons were directly sequenced in a non‐overlapping manner in Illumina MiSeq, obviating the need for a fragmentation step before library preparation. A comparison of the different pooling depths revealed that heterozygous mutations could be identified up to 128‐fold pooling. An evaluation of six different software programs (CAMBa, CRISP, GATK UnifiedGenotyper, LoFreq, SNVer and VipR) revealed that no software program was robust enough to predict mutations with high fidelity. Among these, CRISP and CAMBa predicted mutations with lower false discovery rates. The false positives were largely eliminated by considering only mutations commonly predicted by two different software programs. The screening of 23.47 Mb of tomato genome yielded 75 predicted mutations, 64 of which were confirmed by Sanger sequencing with an average mutation density of 1/367 kb. Our results indicate that NGS combined with multiple variant detection tools can reduce false positives and significantly speed up the mutation discovery rate.
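Two quantitative steps in this abstract, keeping only mutations predicted by two programs and the reported mutation density, can be checked with a short sketch. The call sets below are hypothetical (gene, pos, ref, alt) tuples, not the output format of any of the listed tools.

```python
def shared_calls(calls_a, calls_b):
    """Keep only mutations predicted by both programs."""
    return set(calls_a) & set(calls_b)


def kb_per_mutation(screened_bp: float, confirmed: int) -> float:
    """Average kilobases screened per confirmed mutation."""
    return screened_bp / confirmed / 1_000


caller_1 = {("gene_a", 812, "G", "A"), ("gene_b", 430, "C", "T")}  # hypothetical
caller_2 = {("gene_a", 812, "G", "A"), ("gene_c", 77, "G", "A")}   # hypothetical
print(shared_calls(caller_1, caller_2))  # {('gene_a', 812, 'G', 'A')}
print(f"1 mutation per {kb_per_mutation(23.47e6, 64):.0f} kb")  # 1 mutation per 367 kb
```

The density check reproduces the reported figure: 23.47 Mb divided by the 64 Sanger-confirmed mutations is about 366.7 kb per mutation.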

8.

Network-based cancer genomic data integration for pattern discovery

Zhu Fangfang Li Jiang Liu Juan Min Wenwen 《BMC genetics》2021,22(1):1-10

Background

Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency.

Results

Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%.

Conclusions

For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
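The LOD procedure described above (per-mutation RSD across replicate runs, smoothed with a moving average, read off at RSD = 30%) could look roughly like the sketch below. The window size of 3, the use of window-averaged allele frequencies as the x-coordinate, and taking the first smoothed point at or below 30% as the LOD are illustrative assumptions, not details given in the abstract.

```python
import statistics


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def moving_average(xs, window):
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]


def estimate_lod(replicate_afs, window=3, rsd_cutoff=30.0):
    """replicate_afs: one list of measured allele frequencies per mutation,
    one value per independent WES run (at least two runs each)."""
    points = sorted((statistics.mean(r), rsd_percent(r)) for r in replicate_afs)
    afs = [af for af, _ in points]
    rsds = [rsd for _, rsd in points]
    # Smooth RSD along increasing allele frequency; pair each window with its mean AF.
    for af, rsd in zip(moving_average(afs, window), moving_average(rsds, window)):
        if rsd <= rsd_cutoff:
            return af  # lowest AF at which the smoothed RSD reaches the cutoff
    return None  # the curve never drops to the cutoff


runs = [[0.01, 0.03, 0.02], [0.02, 0.06, 0.04], [0.05, 0.07, 0.06],
        [0.08, 0.10, 0.09], [0.19, 0.21, 0.20]]
print(estimate_lod(runs))
```

Because RSD generally falls as allele frequency rises, scanning upward and stopping at the first smoothed value under the cutoff approximates the crossing point of the curve.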

9.

multiSNV: a probabilistic approach for improving detection of somatic point mutations from multiple related tumour samples

Malvina Josephidou Andy G. Lynch Simon Tavaré 《Nucleic acids research》2015,43(9):e61

Somatic variant analysis of a tumour sample and its matched normal has been widely used in cancer research to distinguish germline polymorphisms from somatic mutations. However, due to the extensive intratumour heterogeneity of cancer, sequencing data from a single tumour sample may greatly underestimate the overall mutational landscape. In recent studies, multiple spatially or temporally separated tumour samples from the same patient were sequenced to identify the regional distribution of somatic mutations and study intratumour heterogeneity. There are a number of tools to perform somatic variant calling from matched tumour-normal next-generation sequencing (NGS) data; however none of these allow joint analysis of multiple same-patient samples. We discuss the benefits and challenges of multisample somatic variant calling and present multiSNV, a software package for calling single nucleotide variants (SNVs) using NGS data from multiple same-patient samples. Instead of performing multiple pairwise analyses of a single tumour sample and a matched normal, multiSNV jointly considers all available samples under a Bayesian framework to increase sensitivity of calling shared SNVs. By leveraging information from all available samples, multiSNV is able to detect rare mutations with variant allele frequencies down to 3% from whole-exome sequencing experiments.

10.

Rapid identification and recovery of ENU-induced mutations with next-generation sequencing and Paired-End Low-Error analysis

Luyuan Pan Arish N Shah Ian G Phelps Dan Doherty Eric A Johnson Cecilia B Moens 《BMC genomics》2015,16(1)

Background

Targeting Induced Local Lesions IN Genomes (TILLING) is a reverse genetics approach to directly identify point mutations in specific genes of interest in genomic DNA from a large chemically mutagenized population. Classical TILLING processes, based on enzymatic detection of mutations in heteroduplex PCR amplicons, are slow and labor intensive.

Results

Here we describe a new TILLING strategy in zebrafish using direct next generation sequencing (NGS) of 250bp amplicons followed by Paired-End Low-Error (PELE) sequence analysis. By pooling a genomic DNA library made from over 9,000 N-ethyl-N-nitrosourea (ENU) mutagenized F1 fish into 32 equal pools of 288 fish, each with a unique Illumina barcode, we reduce the complexity of the template to a level at which we can detect mutations that occur in a single heterozygous fish in the entire library. MiSeq sequencing generates 250 base-pair overlapping paired-end reads, and PELE analysis aligns the overlapping sequences to each other and filters out any imperfect matches, thereby eliminating variants introduced during the sequencing process. We find that this filtering step reduces the number of false positive calls 50-fold without loss of true variant calls. After PELE we were able to validate 61.5% of the mutant calls that occurred at a frequency between 1 mutant call:100 wildtype calls and 1 mutant call:1000 wildtype calls in a pool of 288 fish. We then use high-resolution melt analysis to identify the single heterozygous mutation carrier in the 288-fish pool in which the mutation was identified.
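The core idea of the PELE step, and why a single carrier in a 288-fish pool falls in the 1:100 to 1:1000 window, can be sketched as follows. This is a simplified illustration, not the published PELE code: it assumes both 250-bp reads span the whole 250-bp amplicon, so read 2's reverse complement should equal read 1 exactly, and any mismatch is treated as a sequencing error that removes the pair.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pele_like_filter(pairs):
    """Keep read pairs whose mates agree base-for-base over the full overlap.

    Simplifying assumption: both mates cover the whole amplicon, so any
    disagreement marks the pair as carrying a sequencing error.
    """
    return [(r1, r2) for r1, r2 in pairs if r1 == reverse_complement(r2)]


def carrier_allele_fraction(pool_size: int) -> float:
    """Expected allele fraction of one heterozygous carrier among diploid fish."""
    return 1 / (2 * pool_size)


print(carrier_allele_fraction(288))  # about 0.0017, roughly 1 mutant : 575 wild-type
```

A true mutation is present on both strands of the original molecule, so it survives the agreement check, whereas an error introduced in only one read is discarded; this is the intuition behind the 50-fold drop in false positives reported above.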

Conclusions

Using this NGS-TILLING protocol we validated 28 nonsense or splice site mutations in 20 genes, at a two-fold higher efficiency than using traditional Cel1 screening. We conclude that this approach significantly increases screening efficiency and accuracy at reduced cost and can be applied in a wide range of organisms.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1263-4) contains supplementary material, which is available to authorized users.

11.

Validation of Next-Generation Sequencing of Entire Mitochondrial Genomes and the Diversity of Mitochondrial DNA Mutations in Oral Squamous Cell Carcinoma

Anita Kloss-Brandstätter Hansi Weissensteiner Gertraud Erhart Georg Schäfer Lukas Forer Sebastian Schönherr Dominic Pacher Christof Seifarth Andrea Stöckl Liane Fendt Irma Sottsas Helmut Klocker Christian W. Huck Michael Rasse Florian Kronenberg Frank R. Kloss 《PloS one》2015,10(8)

Background

Oral squamous cell carcinoma (OSCC) is mainly caused by smoking and alcohol abuse and shows a five-year survival rate of ~50%. We aimed to explore the variation of somatic mitochondrial DNA (mtDNA) mutations in primary oral tumors, recurrences and metastases.

Methods

We performed an in-depth validation of mtDNA next-generation sequencing (NGS) on an Illumina HiSeq 2500 platform for its application to cancer tissues, with the goal to detect low-level heteroplasmies and to avoid artifacts. Therefore we genotyped the mitochondrial genome (16.6 kb) from 85 tissue samples (tumors, recurrences, resection edges, metastases and blood) collected from 28 prospectively recruited OSCC patients applying both Sanger sequencing and high-coverage NGS (~35,000 reads per base).

Results

We observed a strong correlation between Sanger sequencing and NGS in estimating the mixture ratio of heteroplasmies (r = 0.99; p<0.001). Non-synonymous heteroplasmic variants were enriched among cancerous tissues. The proportions of somatic and inherited variants in a given gene region were strongly correlated (r = 0.85; p<0.001). Half of the patients shared mutations between benign and cancerous tissue samples. Low level heteroplasmies (<10%) were more frequent in benign samples compared to tumor samples, where heteroplasmies >10% were predominant. Four out of six patients who developed a local tumor recurrence showed mutations in the recurrence that had also been observed in the primary tumor. Three out of five patients, who had tumor metastases in the lymph nodes of their necks, shared mtDNA mutations between primary tumors and lymph node metastases. The percentage of mutation heteroplasmy increased from the primary tumor to lymph node metastases.

Conclusions

We conclude that Sanger sequencing is valid for heteroplasmy quantification for heteroplasmies ≥10% and that NGS is capable of reliably detecting and quantifying heteroplasmies down to the 1%-level. The finding of shared mutations between primary tumors, recurrences and metastasis indicates a clonal origin of malignant cells in oral cancer.

12.

Selection and mutation in the “new” genetics: an emerging hypothesis

Bruce Gottlieb Lenore K. Beitel Carlos Alvarado Mark A. Trifiro 《Human genetics》2010,127(5):491-501

It has been anticipated that new, much more sensitive, next generation sequencing (NGS) techniques, using massively parallel sequencing, will likely provide radical insights into the genetics of multifactorial diseases. While NGS has been used initially to analyze individual human genomes, and has revealed considerable differences between healthy individuals, we have used NGS to examine genetic variation within individuals, by sequencing tissues “in depth”, i.e., oversequencing many thousands of times. Initial studies have revealed intra-tissue genetic heterogeneity, in the form of multiple variants of a single gene that exist as distinct “majority” and “minority” variants. This highly specialized form of somatic mosaicism has been found within both cancer and normal tissues. If such genetic variation within individual tissues is widespread, it will need to be considered as a significant factor in the ontogeny of many multifactorial diseases, including cancer. The discovery of majority and minority gene variants and the resulting somatic cell heterogeneity in both normal and diseased tissues suggests that selection, as opposed to mutation, might be the critical event in disease ontogeny. We, therefore, are proposing a hypothesis to explain multifactorial disease ontogeny in which pre-existing multiple somatic gene variants, which may arise at a very early stage of tissue development, are eventually selected due to changes in tissue microenvironments.

13.

Integration of Wet and Dry Bench Processes Optimizes Targeted Next-generation Sequencing of Low-quality and Low-quantity Tumor Biopsies

Jeffrey Houghton Andrew G. Hadd Robert Zeigler Brian C. Haynes Gary J. Latham 《Journal of visualized experiments : JoVE》2016,(110)

All next-generation sequencing (NGS) procedures include assays performed at the laboratory bench ("wet bench") and data analyses conducted using bioinformatics pipelines ("dry bench"). Both elements are essential to produce accurate and reliable results, which are particularly critical for clinical laboratories. Targeted NGS technologies have increasingly found favor in oncology applications to help advance precision medicine objectives, yet the methods often involve disconnected and variable wet and dry bench workflows and uncoordinated reagent sets. In this report, we describe a method for sequencing challenging cancer specimens with a 21-gene panel as an example of a comprehensive targeted NGS system. The system integrates functional DNA quantification and qualification, single-tube multiplexed PCR enrichment, and library purification and normalization using analytically-verified, single-source reagents with a standalone bioinformatics suite. As a result, accurate variant calls from low-quality and low-quantity formalin-fixed, paraffin-embedded (FFPE) and fine-needle aspiration (FNA) tumor biopsies can be achieved. The method can routinely assess cancer-associated variants from an input of 400 amplifiable DNA copies, and is modular in design to accommodate new gene content. Two different types of analytically-defined controls provide quality assurance and help safeguard call accuracy with clinically-relevant samples. A flexible "tag" PCR step embeds platform-specific adaptors and index codes to allow sample barcoding and compatibility with common benchtop NGS instruments. Importantly, the protocol is streamlined and can produce 24 sequence-ready libraries in a single day. Finally, the approach links wet and dry bench processes by incorporating pre-analytical sample quality control results directly into the variant calling algorithms to improve mutation detection accuracy and differentiate false-negative and indeterminate calls. 
This targeted NGS method uses advances in both wetware and software to achieve high-depth, multiplexed sequencing and sensitive analysis of heterogeneous cancer samples for diagnostic applications.

14.

Host Subtraction, Filtering and Assembly Validations for Novel Viral Discovery Using Next Generation Sequencing Data

Gordon M. Daly Richard M. Leggett William Rowe Samuel Stubbs Maxim Wilkinson Ricardo H. Ramirez-Gonzalez Mario Caccamo William Bernal Jonathan L. Heeney 《PloS one》2015,10(6)

The use of next generation sequencing (NGS) to identify novel viral sequences from eukaryotic tissue samples is challenging. Issues can include the low proportion and copy number of viral reads and the high number of contigs (post-assembly), making subsequent viral analysis difficult. Comparison of assembly algorithms with pre-assembly host-mapping subtraction using a short-read mapping tool, a k-mer frequency based filter and a low complexity filter, has been validated for viral discovery with Illumina data derived from naturally infected liver tissue and simulated data. Assembled contig numbers were significantly reduced (up to 99.97%) by the application of these pre-assembly filtering methods. This approach provides a validated method for maximizing viral contig size as well as reducing the total number of assembled contigs that require down-stream analysis as putative viral nucleic acids.

15.

CNV-CH: A Convex Hull Based Segmentation Approach to Detect Copy Number Variations (CNV) Using Next-Generation Sequencing Data

Rituparna Sinha Sandip Samaddar Rajat K. De 《PloS one》2015,10(8)

Copy number variations (CNVs) are a form of structural alteration in the mammalian DNA sequence and are associated with many complex neurological diseases as well as cancer. The development of next generation sequencing (NGS) technology provides a new dimension for detecting genomic locations with copy number variations. Here we develop an algorithm for detecting CNVs based on depth-of-coverage data generated by NGS technology. In this work, we have used a novel way to represent the read count data as two-dimensional geometrical points. A key aspect of detecting regions with CNVs is to devise a proper segmentation algorithm that distinguishes genomic locations having a significant difference in read count data. We have designed a new segmentation approach in this context, using a convex hull algorithm on the geometrical representation of read count data. To our knowledge, most algorithms have used a single distribution model of read count data, but in our approach we have considered the read count data to follow two different distribution models independently, which adds to the robustness of CNV detection. In addition, our algorithm calls CNVs based on a multiple-sample analysis approach, resulting in a low false discovery rate with high precision.

16.

Germline and somatic mutation profile in Cancer patients revealed by a medium-sized pan-Cancer panel

《Genomics》2021,113(4):1930-1939

Gene mutation detection and the resulting precision-medicine therapy are transforming clinical practice. Here, we report the use of a custom-developed, medium-sized, pan-cancer probe panel for the detection of somatic and germline mutations. We used a hybridization capture-based NGS assay for targeted deep sequencing of all exons and selected introns of 181 key cancer driver genes, covering both inherited risks and somatic mutations. We performed paired variant calling on tumor samples and their matched normal samples. We processed clinical patient samples of formalin-fixed, paraffin-embedded tumors (FFPE samples) and cell-free DNA from peripheral blood (cfDNA samples). We found germline mutations conferring inherited cancer risk at a rate of 9% and discovered a novel germline mutation in BRCA1. The somatic mutation rate in driver genes was 73.1%, much higher than previously reported. For precision-medicine therapy recommendations, we achieved a rate of 91.6% for patients with FFPE samples.

17.

Rare Variants Association Analysis in Large-Scale Sequencing Studies at the Single Locus Level

Xinge Jessie Jeng Zhongyin John Daye Wenbin Lu Jung-Ying Tzeng 《PLoS computational biology》2016,12(6)

Genetic association analyses of rare variants in next-generation sequencing (NGS) studies are fundamentally challenging due to the presence of a very large number of candidate variants at extremely low minor allele frequencies. Recent developments often focus on pooling multiple variants to provide association analysis at the gene instead of the locus level. Nonetheless, pinpointing individual variants is a critical goal for genomic research, as such information can facilitate the precise delineation of molecular mechanisms and functions of genetic factors on diseases. Due to the extreme rarity of mutations and high dimensionality, the significance of causal variants cannot easily stand out from that of noncausal ones. Consequently, standard false-positive control procedures, such as the Bonferroni and false discovery rate (FDR), are often impractical to apply, as a majority of the causal variants can only be identified along with a few but unknown number of noncausal variants. To provide informative analysis of individual variants in large-scale sequencing studies, we propose the Adaptive False-Negative Control (AFNC) procedure that can include a large proportion of causal variants with high confidence by introducing a novel statistical inquiry to determine those variants that can be confidently dispatched as noncausal. The AFNC provides a general framework that can accommodate a variety of models and significance tests. The procedure is computationally efficient and can adapt to the underlying proportion of causal variants and quality of significance rankings. Extensive simulation studies across a plethora of scenarios demonstrate that the AFNC is advantageous for identifying individual rare variants, whereas the Bonferroni and FDR are exceedingly over-conservative for rare variants association studies. In the analyses of the CoLaus dataset, AFNC has identified individual variants most responsible for gene-level significances.
Moreover, single-variant results using the AFNC have been successfully applied to infer related genes with annotation information.
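For reference, the two standard false-positive controls that this abstract contrasts with AFNC are easy to state in code. This sketch implements the textbook Bonferroni correction and the Benjamini-Hochberg step-up procedure; it is not the AFNC procedure, whose details the abstract does not give, and the p-values are made up for illustration.

```python
def bonferroni(pvals, alpha=0.05):
    """Indices rejected by Bonferroni: p <= alpha / m."""
    m = len(pvals)
    return [i for i, p in enumerate(pvals) if p <= alpha / m]


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (FDR <= alpha)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value lies under its BH line rank * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni(p))          # [0]
print(benjamini_hochberg(p))  # [0, 1]
```

Both procedures only admit variants whose individual p-values are extreme; with very rare alleles, causal variants rarely reach those thresholds, which is the over-conservativeness the abstract describes.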

18.

Customisation of the exome data analysis pipeline using a combinatorial approach

Pattnaik S Vaidyanathan S Pooja DG Deepak S Panda B 《PloS one》2012,7(1):e30080

The advent of next generation sequencing (NGS) technologies has revolutionised the way biologists produce, analyse and interpret data. Although NGS platforms provide a cost-effective way to discover genome-wide variants from a single experiment, variants discovered by NGS need follow-up validation due to the high error rates associated with various sequencing chemistries. Recently, whole exome sequencing has been proposed as an affordable option compared to whole genome runs, but it still requires follow-up validation of all the novel exomic variants. Customarily, a consensus approach is used to overcome the systematic errors inherent to the sequencing technology, alignment and post-alignment variant detection algorithms. However, the aforementioned approach warrants the use of multiple sequencing chemistries, multiple alignment tools and multiple variant callers, which may not be viable in terms of time and money for individual investigators with limited informatics know-how. Biologists often lack the requisite training to deal with the huge amount of data produced by NGS runs and face difficulty in choosing from the list of freely available analytical tools for NGS data analysis. Hence, there is a need to customise the NGS data analysis pipeline to preferentially retain true variants by minimising the incidence of false positives and to make the choice of the right analytical tools easier. To this end, we have sampled different freely available tools used at the alignment and post-alignment stages and suggest the most suitable combination, determined by a simple framework of pre-existing metrics, to create significant datasets.

19.

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate

Klambauer G Schwarzbauer K Mayr A Clevert DA Mitterecker A Bodenhofer U Hochreiter S 《Nucleic acids research》2012,40(9):e69

Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose 'Copy Number estimation by a Mixture Of PoissonS' (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1-FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.
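The intuition behind modelling coverage across samples at one position with Poisson components for integer copy numbers can be illustrated with a toy example. This is not the cn.MOPS algorithm (which fits a Bayesian mixture with EM and is distributed as an R package); it is a simplified, hypothetical per-sample maximum-likelihood assignment that assumes copy number i yields Poisson(lam * i / 2) reads, with a small background rate for copy number 0 and lam estimated as the median count across samples.

```python
import math
import statistics


def poisson_logpmf(k: int, mu: float) -> float:
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def most_likely_copy_number(count: int, lam: float, max_cn: int = 8,
                            eps: float = 0.05) -> int:
    """Toy assignment: lam is the expected count for copy number 2 at this position.

    Copy number i is modelled as Poisson(lam * i / 2); copy number 0 uses a
    small background rate eps * lam / 2 so the log-likelihood stays finite.
    """
    best_cn, best_ll = 0, -math.inf
    for cn in range(max_cn + 1):
        mu = (cn if cn > 0 else eps) * lam / 2
        ll = poisson_logpmf(count, mu)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn


counts = [98, 105, 51, 102, 149, 97]  # reads at one position in six samples
lam = statistics.median(counts)       # assumed estimate of the diploid rate
print([most_likely_copy_number(c, lam) for c in counts])  # [2, 2, 1, 2, 3, 2]
```

Because the reference rate comes from the samples at the same position, a region that is uniformly over- or under-covered in every sample (for example, because of GC bias) still maps to copy number 2, which is the property the abstract credits for the lower FDR.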

20.

Discovery of rare mutations in extensively pooled DNA samples using multiple target enrichment

Xu Chi Yingchun Zhang Zheyong Xue Laibao Feng Huaqing Liu Feng Wang Xiaoquan Qi 《Plant biotechnology journal》2014,12(6):709-717

Chemical mutagenesis is routinely used to create large numbers of rare mutations in plant and animal populations, which can be subsequently subjected to selection for beneficial traits and phenotypes that enable the characterization of gene functions. Several next‐generation sequencing (NGS)‐based target enrichment methods have been developed for the detection of mutations in target DNA regions. However, most of these methods aim to sequence a large number of target regions from a small number of individuals. Here, we demonstrate an effective and affordable strategy for the discovery of rare mutations in a large sodium azide‐induced mutant rice population (F₂). The integration of multiplex, semi‐nested PCR combined with NGS library construction allowed for the amplification of multiple target DNA fragments for sequencing. The 8 × 8 × 8 tridimensional DNA sample pooling strategy enabled us to obtain DNA sequences of 512 individuals while only sequencing 24 samples. A stepwise filtering procedure was then elaborated to eliminate most of the false positives expected to arise through sequencing error, and the application of a simple Student's t‐test against position‐prone error allowed for the discovery of 16 mutations from 36 enriched targeted DNA fragments of 1024 mutagenized rice plants, all without any false calls.
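The arithmetic of the 8 × 8 × 8 pooling (512 individuals, 24 sequenced pools) and how a positive result is traced back to one plant can be sketched as follows. This assumes a standard three-dimensional design in which each plant sits at a grid point (x, y, z) and is added to pools X_x, Y_y and Z_z; the pool labels and functions are hypothetical, not taken from the paper.

```python
from itertools import product


def build_pools(n: int = 8) -> dict:
    """Assign each of n**3 individuals at grid point (x, y, z) to pools X_x, Y_y, Z_z."""
    pools = {}
    for x, y, z in product(range(n), repeat=3):
        for label in (("X", x), ("Y", y), ("Z", z)):
            pools.setdefault(label, []).append((x, y, z))
    return pools


def decode(positive_pools):
    """Return the single individual implied by one positive pool per axis, else None."""
    by_axis = {"X": [], "Y": [], "Z": []}
    for axis, index in positive_pools:
        by_axis[axis].append(index)
    if all(len(hits) == 1 for hits in by_axis.values()):
        return (by_axis["X"][0], by_axis["Y"][0], by_axis["Z"][0])
    return None  # zero or several hits on an axis: ambiguous, needs follow-up


pools = build_pools()
print(len(pools), sum(len(v) for v in pools.values()) // 3)  # 24 512
print(decode({("X", 3), ("Y", 5), ("Z", 1)}))                # (3, 5, 1)
```

Each pool holds 64 plants, so a heterozygous mutation private to one plant appears at roughly 1/128 allele frequency in exactly one pool per axis; the intersection of the three positive pools identifies the carrier.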