Similar articles
1.
Whole‐genome or whole‐exome sequencing (WGS/WES) of the affected proband together with normal parents (trio) is commonly adopted to identify de novo germline mutations (DNMs) underlying sporadic cases of various genetic disorders. However, our current knowledge of the occurrence and functional effects of DNMs remains limited and accurately identifying the disease‐causing DNM from a group of irrelevant DNMs is complicated. Herein, we provide a general‐purpose discussion of important issues related to pathogenic gene identification based on trio‐based WGS/WES data. Specifically, the relevance of DNMs to human sporadic diseases, current knowledge of DNM biogenesis mechanisms, and common strategies or software tools used for DNM detection are reviewed, followed by a discussion of pathogenic gene prioritization. In addition, several key factors that may affect DNM identification accuracy and causal gene prioritization are reviewed. Based on recent major advances, this review both sheds light on how trio‐based WGS/WES technologies can play a significant role in the identification of DNMs and causal genes for sporadic diseases, and also discusses existing challenges.
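The core idea of trio-based DNM detection described in this abstract can be illustrated with a minimal sketch. It assumes hard genotype calls are already available for child, father and mother, and it ignores genotype quality, read depth and sex chromosomes, all of which real tools take into account. The names `is_de_novo_candidate`, `find_de_novo_candidates` and `example_calls` are hypothetical and are not taken from any tool mentioned in the review.

```python
from typing import Dict, List, Tuple

# A genotype is a pair of allele indices: 0 = reference, 1+ = alternate.
Genotype = Tuple[int, int]


def is_de_novo_candidate(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Return True if the child carries a non-reference allele seen in neither parent."""
    parental_alleles = set(father) | set(mother)
    return any(allele != 0 and allele not in parental_alleles for allele in child)


def find_de_novo_candidates(trio_calls: Dict[str, Dict[str, Genotype]]) -> List[str]:
    """Return the sites whose trio genotypes are consistent with a de novo event."""
    return [
        site
        for site, genotypes in trio_calls.items()
        if is_de_novo_candidate(
            genotypes["child"], genotypes["father"], genotypes["mother"]
        )
    ]


# Made-up illustrative genotypes, not real data.
example_calls = {
    "chr1:1000": {"child": (0, 1), "father": (0, 0), "mother": (0, 0)},  # candidate
    "chr2:2000": {"child": (0, 1), "father": (0, 1), "mother": (0, 0)},  # inherited
    "chr3:3000": {"child": (0, 0), "father": (0, 0), "mother": (0, 1)},  # not transmitted
}

if __name__ == "__main__":
    print(find_de_novo_candidates(example_calls))
```

A filter this simple would flag many sequencing artifacts as candidates, which is one reason the review stresses factors that affect DNM identification accuracy.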

2.
Genome sequencing has improved fundamentally since next-generation sequencing (NGS) emerged in the 2000s. The newer technologies harness massively parallel short-read DNA sequencing together with genome alignment and assembly methods to interrogate genomes rapidly and at an unprecedented scale, making large-scale whole genome sequencing (WGS) accessible and practical for researchers. WGS is now increasingly used to study the genetics of diseases, investigate causal relationships in cancer, perform genome-level comparative analyses, reconstruct human population history, and inform clinical decisions. In this review, we first describe a typical WGS pipeline, including laboratory template preparation, sequencing, genome assembly and quality control, and variant calling and annotation. We compare whole genome and whole exome sequencing (WES), and explore a wide range of applications of WGS for both Mendelian and complex diseases in medical genetics. We highlight the impact of WGS on cancer studies, regulatory variant analysis, predictive medicine and precision medicine, and discuss the challenges that WGS still faces.

3.
With the rapid decline in sequencing cost, researchers today are rushing to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable to factor them into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice; it can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor.
This is a PLOS Computational Biology Software Article.
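To make the idea of a personalized diploid reference concrete, here is a toy sketch. It is not RefEditor's implementation: it assumes single-nucleotide genotypes only, 0-based positions, and an arbitrary assignment of the two alleles to the two haplotypes (no phasing). The function name `personalize_reference` is hypothetical.

```python
from typing import Dict, Tuple


def personalize_reference(
    reference: str, genotypes: Dict[int, Tuple[str, str]]
) -> Tuple[str, str]:
    """Build two haplotype sequences by substituting known SNP alleles.

    `genotypes` maps a 0-based position to the two alleles observed for the
    individual at that position (assumption: SNPs only, no indels).
    """
    haplotype_1 = list(reference)
    haplotype_2 = list(reference)
    for position, (allele_1, allele_2) in genotypes.items():
        haplotype_1[position] = allele_1
        haplotype_2[position] = allele_2
    return "".join(haplotype_1), "".join(haplotype_2)


if __name__ == "__main__":
    # Heterozygous G/T at position 2, homozygous A/A at position 5.
    print(personalize_reference("ACGTACGT", {2: ("G", "T"), 5: ("A", "A")}))
```

Reads would then be mapped against both haplotypes rather than the universal haploid reference, so that reads carrying a known alternate allele are less likely to be penalized as mismatches.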

4.
5.
The number of polymorphisms identified with next‐generation sequencing approaches depends directly on the sequencing depth and therefore on the experimental cost. Although higher levels of depth ensure more sensitive and more specific SNP calls, economic constraints limit the increase of depth for whole‐genome resequencing (WGS). For this reason, capture resequencing is used for studies focusing on only some specific regions of the genome. However, several biases in capture resequencing are known to have a negative impact on the sensitivity of SNP detection. Within this framework, the aim of this study was to compare the accuracy of WGS and capture resequencing, which differ in terms of both sequencing depth and biases, for SNP detection and genotype calling. We evaluated the SNP calling and genotyping accuracy in a WGS dataset (13X) and in a capture resequencing dataset (87X) obtained from 11 individuals. The percentage of SNPs not identified due to a sevenfold decrease in sequencing depth was estimated at 7.8% using a down‐sampling procedure on the capture sequencing dataset. A comparison of the 87X capture sequencing dataset with the WGS dataset revealed that capture‐related biases led to the loss of 5.2% of the SNPs detected with WGS. Nevertheless, when considering the SNPs detected by both approaches, capture sequencing appears to achieve far better SNP genotyping, with about 4.4% of the WGS genotypes considered erroneous, rising to 10% for heterozygous genotypes. In conclusion, WGS and capture deep sequencing can be considered equivalent strategies for SNP detection, as the rate of SNPs not identified because of a low sequencing depth in the former is quite similar to the rate of SNPs missed because of method biases in the latter. On the other hand, capture deep sequencing clearly appears better suited to studies requiring great accuracy in genotyping.

6.
Cancer genomes exhibit profound somatic copy number alterations (SCNAs). Studying tumor SCNAs using massively parallel sequencing provides unprecedented resolution and meanwhile gives rise to new challenges in data analysis, complicated by tumor aneuploidy and heterogeneity as well as normal cell contamination. While the majority of read depth based methods utilize total sequencing depth alone for SCNA inference, the allele specific signals are undervalued. We proposed a joint segmentation and inference approach using both signals to meet some of the challenges. Our method consists of four major steps: 1) extracting read depth supporting reference and alternative alleles at each SNP/Indel locus and comparing the total read depth and alternative allele proportion between tumor and matched normal sample; 2) performing joint segmentation on the two signal dimensions; 3) correcting the copy number baseline from which the SCNA state is determined; 4) calling SCNA state for each segment based on both signal dimensions. The method is applicable to whole exome/genome sequencing (WES/WGS) as well as SNP array data in a tumor-control study. We applied the method to a dataset containing no SCNAs to test the specificity, created by pairing sequencing replicates of a single HapMap sample as normal/tumor pairs, as well as a large-scale WGS dataset consisting of 88 liver tumors along with adjacent normal tissues. Compared with representative methods, our method demonstrated improved accuracy, scalability to large cancer studies, capability in handling both sequencing and SNP array data, and the potential to improve the estimation of tumor ploidy and purity.
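Step 1 of the method summarized above compares two signals per locus between tumor and matched normal. The following sketch shows one plausible way to compute them; it is an illustration under stated assumptions, not the authors' code. It assumes raw allele-supporting read counts and omits the library-size normalization that a real analysis would need. The function name `locus_signals` is hypothetical.

```python
import math
from typing import Tuple


def locus_signals(
    tumor_ref: int, tumor_alt: int, normal_ref: int, normal_alt: int
) -> Tuple[float, float, float]:
    """Return (log2 total-depth ratio, tumor alt proportion, normal alt proportion).

    Assumption: both samples have non-zero depth at the locus and depths are
    directly comparable (no library-size correction is applied here).
    """
    tumor_total = tumor_ref + tumor_alt
    normal_total = normal_ref + normal_alt
    log2_ratio = math.log2(tumor_total / normal_total)
    tumor_alt_proportion = tumor_alt / tumor_total
    normal_alt_proportion = normal_alt / normal_total
    return log2_ratio, tumor_alt_proportion, normal_alt_proportion


if __name__ == "__main__":
    # Made-up counts: total depth is unchanged, but the allele balance shifts.
    print(locus_signals(tumor_ref=30, tumor_alt=10, normal_ref=20, normal_alt=20))
```

In the made-up example the total depth ratio shows no change while the alternative allele proportion does, which illustrates why a method using total depth alone could overlook some alterations.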

7.
Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS); however, WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost ‘capture’ method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance.
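The abstract reports precision and recall separately. One common way to compare callers with a single number is the F1 score, the harmonic mean of the two. The abstract itself does not report F1, so the values computed below are derived here for illustration only; `f1_score` and `reported` are hypothetical names.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract for bulk ATAC-seq data.
reported = {
    ("GATK", "SNV"): (0.92, 0.97),
    ("GATK", "indel"): (0.87, 0.82),
    ("VarCA", "SNV"): (0.99, 0.95),
    ("VarCA", "indel"): (0.93, 0.80),
}

if __name__ == "__main__":
    for (caller, variant_type), (precision, recall) in reported.items():
        print(f"{caller} {variant_type}: F1 = {f1_score(precision, recall):.3f}")
```

F1 weights precision and recall equally; a study that cares more about missing variants than about false positives might prefer a recall-weighted measure instead.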

8.
Schizophrenia is a common disorder with high heritability, but its genetic architecture remains elusive. We performed whole-genome sequencing (WGS) analysis of 8 families with monozygotic (MZ) twin pairs discordant for schizophrenia to assess the potential association of de novo mutations (DNMs) or inherited variants with susceptibility to schizophrenia. Eight non-synonymous DNMs (including one splice-site mutation) were identified and shared by both twins; these were either located in previously reported schizophrenia risk genes (p.V24689I in TTN, p.S2506T in GCN1L1, IVS3+1G>T in DOCK1) or had a benign to damaging effect according to in silico prediction. By searching for inherited rare damaging or loss-of-function (LOF) variants and common susceptibility alleles in three classes of schizophrenia candidate genes, we identified genetic alterations in several schizophrenia risk genes, including GAD1, PLXNA2, RELN and FEZ1. Four inherited copy number variations (CNVs; including a large deletion at 16p13.11) implicated in schizophrenia were identified, one in each of four families. Most families carried both missense DNMs and inherited risk variants, which might suggest that DNMs, inherited rare damaging variants and common risk alleles together contribute to schizophrenia susceptibility. Our results support the view that schizophrenia is caused by a combination of multiple genetic factors, with each DNM or variant showing a relatively small effect size.

9.

Background

Less than two percent of the human genome is protein coding, yet that small fraction harbours the majority of known disease causing mutations. Despite rapidly falling whole genome sequencing (WGS) costs, much research and increasingly the clinical use of sequence data is likely to remain focused on the protein coding exome. We set out to quantify and understand how WGS compares with the targeted capture and sequencing of the exome (exome-seq), for the specific purpose of identifying single nucleotide polymorphisms (SNPs) in exome targeted regions.

Results

We have compared polymorphism detection sensitivity and systematic biases using a set of tissue samples that have been subject to both deep exome and whole genome sequencing. The scoring of detection sensitivity was based on sequence down sampling and reference to a set of gold-standard SNP calls for each sample. Despite evidence of incremental improvements in exome capture technology over time, whole genome sequencing has greater uniformity of sequence read coverage and reduced biases in the detection of non-reference alleles than exome-seq. Exome-seq achieves 95% SNP detection sensitivity at a mean on-target depth of 40 reads, whereas WGS only requires a mean of 14 reads. Known disease causing mutations are not biased towards easy or hard to sequence areas of the genome for either exome-seq or WGS.

Conclusions

From an economic perspective, WGS is at parity with exome-seq for variant detection in the targeted coding regions. WGS offers benefits in uniformity of read coverage and more balanced allele ratio calls, both of which can in most cases be offset by deeper exome-seq, with the caveat that some exome-seq targets will never achieve sufficient mapped read depth for variant detection due to technical difficulties or probe failures. As WGS is intrinsically richer data that can provide insight into polymorphisms outside coding regions and reveal genomic rearrangements, it is likely to progressively replace exome-seq for many applications.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-247) contains supplementary material, which is available to authorized users.

10.
Identifying copy number variants (CNVs) can provide diagnoses to patients and provide important biological insights into human health and disease. Current exome and targeted sequencing approaches cannot detect clinically and biologically-relevant CNVs outside their target area. We present SavvyCNV, a tool which uses off-target read data from exome and targeted sequencing data to call germline CNVs genome-wide. Up to 70% of sequencing reads from exome and targeted sequencing fall outside the targeted regions. We have developed a new tool, SavvyCNV, to exploit this ‘free data’ to call CNVs across the genome. We benchmarked SavvyCNV against five state-of-the-art CNV callers using truth sets generated from genome sequencing data and Multiplex Ligation-dependent Probe Amplification assays. SavvyCNV called CNVs with high precision and recall, outperforming the five other tools at calling CNVs genome-wide, using off-target or on-target reads from targeted panel and exome sequencing. We then applied SavvyCNV to clinical samples sequenced using a targeted panel and were able to call previously undetected clinically-relevant CNVs, highlighting the utility of this tool within the diagnostic setting. SavvyCNV outperforms existing tools for calling CNVs from off-target reads. It can call CNVs genome-wide from targeted panel and exome data, increasing the utility and diagnostic yield of these tests. SavvyCNV is freely available at https://github.com/rdemolgen/SavvySuite.

11.
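Wang Cuicui (王翠翠), Yuan Huijun (袁慧军). Hereditas (Beijing) (《遗传》), 2017, 39(3): 208-219
Hearing loss is a common and severe birth defect. Elucidating the pathogenic mechanisms of hereditary hearing loss not only aids clinical diagnosis and provides a basis for genetic counseling and deafness prevention, but also deepens our understanding of how hearing loss arises and supports the development of new treatments. With continuing innovation in genomic technologies, high-throughput sequencing methods, represented by whole-genome sequencing, whole-exome sequencing and targeted region sequencing, have been widely applied in research on hereditary hearing loss. This article summarizes the applications of, and research progress in, whole-exome sequencing and targeted region sequencing over the past five years in the study of causative genes for hereditary hearing loss and in clinical molecular diagnosis, in the hope of helping to advance genetic diagnosis techniques for hearing loss in China and to raise the standard of diagnosis.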

12.
Next-generation sequencing (NGS) technologies provide the potential for developing high-throughput and low-cost platforms for clinical diagnostics. A limiting factor to clinical applications of genomic NGS is downstream bioinformatics analysis for data interpretation. We have developed an integrated approach for end-to-end clinical NGS data analysis from variant detection to functional profiling. Robust bioinformatics pipelines were implemented for genome alignment, single nucleotide polymorphism (SNP), small insertion/deletion (InDel), and copy number variation (CNV) detection of whole exome sequencing (WES) data from the Illumina platform. Quality-control metrics were analyzed at each step of the pipeline by use of a validated training dataset to ensure data integrity for clinical applications. We annotate the variants with data regarding the disease population and variant impact. Custom algorithms were developed to filter variants based on criteria, such as quality of variant, inheritance pattern, and impact of variant on protein function. The developed clinical variant pipeline links the identified rare variants to Integrated Genome Viewer for visualization in a genomic context and to the Protein Information Resource’s iProXpress for rich protein and disease information. With the application of our system of annotations, prioritizations, inheritance filters, and functional profiling and analysis, we have created a unique methodology for downstream variant filtering that empowers clinicians and researchers to interpret more effectively the relevance of genomic alterations within a rare genetic disease.
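The filtering criteria named in this abstract (variant quality, inheritance pattern, impact on protein function) can be sketched as a simple predicate. This is not the authors' custom algorithm: the `Variant` fields, the thresholds and the impact labels are all hypothetical, and the recessive check is deliberately simplified (it ignores compound heterozygotes).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    gene: str
    quality: float               # caller-reported quality score (hypothetical scale)
    population_frequency: float  # allele frequency in a reference population
    impact: str                  # "HIGH", "MODERATE" or "LOW" (hypothetical labels)
    proband_genotype: str        # "het" or "hom"


def passes_filters(
    variant: Variant,
    min_quality: float = 30.0,
    max_frequency: float = 0.01,
    inheritance: str = "recessive",
) -> bool:
    """Keep rare, well-supported, protein-altering variants that fit the inheritance model."""
    if variant.quality < min_quality:
        return False
    if variant.population_frequency > max_frequency:
        return False
    if variant.impact not in {"HIGH", "MODERATE"}:
        return False
    # Simplification: a recessive model here requires a homozygous proband genotype.
    if inheritance == "recessive" and variant.proband_genotype != "hom":
        return False
    return True


def prioritize(variants: List[Variant], **filter_options) -> List[str]:
    """Return the genes of variants that survive all filters, in input order."""
    return [v.gene for v in variants if passes_filters(v, **filter_options)]


# Made-up variants with placeholder gene names, not real data.
example_variants = [
    Variant("GENE_A", 99.0, 0.0001, "HIGH", "hom"),
    Variant("GENE_B", 12.0, 0.0001, "HIGH", "hom"),      # fails quality
    Variant("GENE_C", 80.0, 0.2000, "MODERATE", "hom"),  # too common
    Variant("GENE_D", 85.0, 0.0005, "MODERATE", "het"),  # fails recessive model
]

if __name__ == "__main__":
    print(prioritize(example_variants))
    print(prioritize(example_variants, inheritance="dominant"))
```

Keeping each criterion as a separate early return makes it straightforward to report which filter removed a given variant, which matters when results have to be reviewed clinically.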

13.

Background

Whole exome sequencing (WES) is the state-of-the-art method for identification of pathogenic mutations in patients with a Mendelian disorder. WES comprehensively covers the coding sequence of the genome and is a fast and cost-effective technique.

Purpose

As most of the technical difficulties have been overcome for WES, the major issue is data processing and analysis to find the pathogenic sequence variation among tens of thousands of sequence changes. Bioinformatic analysis pipelines for filtering sequence variants have to be adapted according to the patients and family members examined by WES and the most likely inheritance pattern underlying the disease.

Possible approaches

Based on 4 cases, different variant prioritization strategies which led to identification of the most likely causative changes in the index patients are described.

14.

Background  

Whole exome capture sequencing allows researchers to cost-effectively sequence the coding regions of the genome. Although the exome capture sequencing methods have become routine and well established, there is currently a lack of tools specialized for variant calling in this type of data.

15.
Autism spectrum disorders (ASD) are a group of related neurodevelopmental disorders with significant combined prevalence (~1%) and high heritability. Dozens of individually rare genes and loci associated with high risk for ASD have been identified, which overlap extensively with genes for intellectual disability (ID). However, studies indicate that there may be hundreds of genes that remain to be identified. The advent of inexpensive massively parallel nucleotide sequencing can reveal the genetic underpinnings of heritable complex diseases, including ASD and ID. However, whole exome sequencing (WES) and whole genome sequencing (WGS) provide an embarrassment of riches, where many candidate variants emerge. It has been argued that genetic variation for ASD and ID will cluster in genes involved in distinct pathways and protein complexes. For this reason, computational methods that prioritize candidate genes based on additional functional information such as protein-protein interactions or association with specific canonical or empirical pathways, or other attributes, can be useful. In this study we applied several supervised learning approaches to prioritize ASD or ID disease gene candidates based on curated lists of known ASD and ID disease genes. We implemented two network-based classifiers and one attribute-based classifier to show that we can rank and classify known, and predict new, genes for these neurodevelopmental disorders. We also show that ID and ASD share common pathways that perturb an overlapping synaptic regulatory subnetwork, and that features relating to neuronal phenotypes in mouse knockouts can help in classifying neurodevelopmental genes. Our methods can be applied broadly to other diseases, helping to prioritize newly identified genetic variation that emerges from disease gene discovery based on WES and WGS.

16.
High-throughput DNA sequencing (HTS) is of increasing importance in the life sciences. One of its most prominent applications is the sequencing of whole genomes or targeted regions of the genome such as all exonic regions (i.e., the exome). Here, the objective is the identification of genetic variants such as single nucleotide polymorphisms (SNPs). The extraction of SNPs from the raw genetic sequences involves many processing steps and the application of a diverse set of tools. We review the essential building blocks for a pipeline that calls SNPs from raw HTS data. The pipeline includes quality control, mapping of short reads to the reference genome, visualization and post-processing of the alignment including base quality recalibration. The final steps of the pipeline include the SNP calling procedure along with filtering of SNP candidates. The steps of this pipeline are accompanied by an analysis of a publicly available whole-exome sequencing dataset. To this end, we employ several alignment programs and SNP calling routines for highlighting the fact that the choice of the tools significantly affects the final results.

17.
Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.

18.
Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/.

19.

Background

Variants with known or possible pathogenicity located in genes that are unrelated to primary disease conditions are defined as secondary findings. Secondary findings are not the primary targets of whole exome and genome sequencing (WES/WGS) assay but can be of great practical value in early disease prevention and intervention. The driving force for this study was to investigate the impact of racial difference and disease background on secondary findings. Here, we analyzed secondary findings frequencies in 421 whole exome-sequenced Chinese children who are phenotypically normal or bear congenital heart diseases/juvenile obesity. In total, 421 WES datasets were processed for potential deleterious variant screening. A reference gene list was defined according to the American College of Medical Genetics and Genomics (ACMG) recommendations for reporting secondary findings v2.0 (ACMG SF v2.0). The variant classification was performed according to the evidence-based guidelines recommended by the joint consensus of the ACMG and the Association for Molecular Pathology (AMP).

Results

Among the 421 WES datasets, we identified 11 known/expected pathogenic variants in 12 individuals, accounting for 2.85% of our samples, which is much higher than the reported frequency in a Caucasian population. In conclusion, secondary findings are not so rare in Chinese children, which means that we should pay more attention to the clinical interpretation of sequencing results.

20.