Similar literature
 20 similar articles found (search time: 15 ms)
1.
Somatic single nucleotide variants (SNVs) in cancer genomes affect gene expression through various mechanisms depending on their genomic location. While somatic SNVs near canonical splice sites have been reported to cause abnormal splicing of cancer-related genes, whether these SNVs can affect gene expression through other mechanisms remains an open question. Here, we analyzed RNA sequencing and exome data from 4,998 cancer patients covering ten cancer types and identified 152 somatic SNVs near splice sites that were associated with abnormal intronic polyadenylation (IPA). IPA-associated somatic variants were preferentially located near donor splice sites rather than acceptor splice sites. A proportion of SNV-associated IPA events overlapped with premature cleavage and polyadenylation events triggered by inhibition of U1 small nuclear ribonucleoprotein (snRNP). GC content, intron length and polyadenylation signal were three genomic features that differentiated between SNV-associated IPA and intron retention. Notably, IPA-associated SNVs were enriched in tumor suppressor genes (TSGs), including well-known TSGs such as PTEN and CDH1 with recurrent SNV-associated IPA events. Minigene assays confirmed that SNVs from PTEN, CDH1, VEGFA, GRHL2, CUL3 and WWC2 could lead to IPA. This work reveals that IPA acts as a novel mechanism explaining the functional consequence of somatic SNVs in human cancer.

2.
Early analytical clone screening is important during Chinese hamster ovary (CHO) cell line development of biotherapeutic proteins to select a clonally derived cell line with the most favorable stability and product quality. Sensitive sequence confirmation methods using mass spectrometry have limitations in throughput and turnaround time. Next‐generation sequencing (NGS) technologies emerged as alternatives for CHO clone analytics. We report an efficient NGS workflow applying the targeted locus amplification (TLA) strategy for genomic screening of antibody-expressing CHO clones. In contrast to previously reported RNA sequencing approaches, TLA allows for targeted sequencing of genomically integrated transgenic DNA without prior locus information, and for robust detection of single‐nucleotide variants (SNVs) and transgenic rearrangements. During clone selection, TLA/NGS revealed CHO clones with high‐level SNVs within the antibody gene, and in another case we report the utility of TLA/NGS for identifying rearrangements at the transgenic DNA level. We also determined detection limits for SNV calling and the potential to identify clone contaminations by TLA/NGS. TLA/NGS also allows identification of genetically identical clones. In summary, we demonstrate that TLA/NGS is a robust screening method useful for routine clone analytics during cell line development with the potential to process up to 24 CHO clones in less than 7 workdays.

3.
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high-coverage (30x) whole genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with mononeuritis of the lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.

4.
Fibroblast growth factor receptors (FGFRs) are recurrently altered by single nucleotide variants (SNVs) in many human cancers. The prevalence of SNVs in FGFRs depends on the cancer type. In some tumors, such as urothelial carcinoma, mutations of FGFRs occur at very high frequency (up to 60%). Many characterized mutations occur in the extracellular or transmembrane domains, while fewer known mutations are found in the kinase domain. In this study, we performed a bioinformatics analysis to identify novel putative cancer driver or therapeutically actionable mutations of the kinase domain of FGFRs. To pinpoint those mutations that may be clinically relevant, we exploited the recurrence of alterations on analogous amino acid residues within the kinase domain (PK_Tyr_Ser-Thr) of different kinases as a predictor of functional impact. By exploiting the MutationAligner and LowMACA bioinformatics resources, we highlighted novel uncharacterized mutations of FGFRs which recur in other protein kinases. By revealing unanticipated correspondence with known variants, we were able to infer their functional effects, as alterations clustering on similar residues in analogous proteins have a high probability of eliciting similar effects. As FGFRs represent an important class of oncogenes and drug targets, our study opens the way for further studies to validate their driver and/or actionable nature and, in the long term, for a more efficacious application of precision oncology.

5.
The domestic dog serves as an excellent model to investigate the genetic basis of disease. More than 400 heritable traits analogous to human diseases have been described in dogs. To further canine medical genetics research, we established the Dog Biomedical Variant Database Consortium (DBVDC) and present a comprehensive list of functionally annotated genome variants that were identified with whole genome sequencing of 582 dogs from 126 breeds and eight wolves. The genomes used in the study have a minimum coverage of 10× and an average coverage of ~24×. In total, we identified 23 133 692 single‐nucleotide variants (SNVs) and 10 048 038 short indels, including 93% undescribed variants. On average, each individual dog genome carried ~4.1 million single‐nucleotide and ~1.4 million short‐indel variants with respect to the reference genome assembly. About 2% of the variants were located in coding regions of annotated genes and loci. Variant effect classification showed 247 141 SNVs and 99 562 short indels having moderate or high impact on 11 267 protein‐coding genes. On average, each genome contained heterozygous loss‐of‐function variants in 30 potentially embryonic lethal genes and 97 genes associated with developmental disorders. More than 50 inherited disorders and traits have been unravelled using the DBVDC variant catalogue, enabling genetic testing for breeding and diagnostics. This resource of annotated variants and their corresponding genotype frequencies constitutes a highly useful tool for the identification of potential variants causative for rare inherited disorders in dogs.

6.
Mutation position imaging toolbox (MuPIT) interactive is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts either in bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs, including available functional annotations such as binding sites, mutagenesis experiments, and common polymorphisms. Multiple SNVs may be mapped onto each structure, enabling 3D visualization of SNV clusters and their relationship to functionally annotated positions. We illustrate the utility of MuPIT interactive in rationalizing the impact of selected polymorphisms in the PharmGKB database, somatic mutations identified in the Cancer Genome Atlas study of invasive breast carcinomas, and rare variants identified in the exome sequencing project. MuPIT interactive is freely available for non-profit use at http://mupit.icm.jhu.edu.

7.
Both schizophrenia (SCZ) and autism spectrum disorders (ASD) are neuropsychiatric disorders with overlapping genetic etiology. Protocadherin 15 (PCDH15), which encodes a member of the cadherin superfamily that contributes to neural development and function, has been cited as a risk gene for neuropsychiatric disorders. Recently, rare variants of large effect have received attention as a means of understanding the etiopathology of these complex disorders. Thus, we evaluated the impacts of rare single-nucleotide variants (SNVs) in PCDH15 on SCZ or ASD. First, we conducted coding exon-targeted resequencing of PCDH15 with next-generation sequencing technology in 562 Japanese patients (370 SCZ and 192 ASD) and detected 16 heterozygous SNVs. We then performed association analyses on 2,096 cases (1,714 SCZ and 382 ASD) and 1,917 controls with six novel variants of these 16 SNVs. Of these six variants, four (p.R219K, p.T281A, p.D642N, c.3010-1G>C) were ultra-rare variants (minor allele frequency < 0.0005) that may increase disease susceptibility. Finally, no statistically significant association between any of these rare, heterozygous PCDH15 point variants and SCZ or ASD was found. Our results suggest that a larger sample size of resequencing subjects is necessary to detect associations between rare PCDH15 variants and neuropsychiatric disorders.

8.
Hypertriglyceridemia (HTG) is a heritable risk factor for cardiovascular disease. Investigating the genetics of HTG may identify new drug targets. There are ∼35 known single-nucleotide variants (SNVs) that explain only ∼10% of variation in triglyceride (TG) level. Because of the genetic heterogeneity of HTG, a family study design is optimal for identification of rare genetic variants with large effect size because the same mutation can be observed in many relatives and cosegregation with TG can be tested. We considered HTG in a five-generation family of European American descent (n = 121), ascertained for familial combined hyperlipidemia. By using Bayesian Markov chain Monte Carlo joint oligogenic linkage and association analysis, we detected linkage to chromosomes 7 and 17. Whole-exome sequence data revealed shared, highly conserved, private missense SNVs in both SLC25A40 on chr7 and PLD2 on chr17. Jointly, these SNVs explained 49% of the genetic variance in TG; however, only the SLC25A40 SNV was significantly associated with TG (p = 0.0001). This SNV, c.374A>G, causes a highly disruptive p.Tyr125Cys substitution just outside the second helical transmembrane region of the SLC25A40 inner mitochondrial membrane transport protein. Whole-gene testing in subjects from the Exome Sequencing Project confirmed the association between TG and SLC25A40 rare, highly conserved, coding variants (p = 0.03). These results suggest a previously undescribed pathway for HTG and illustrate the power of large pedigrees in the search for rare, causal variants.

9.
Adaptation of viruses to their environments occurs through the acquisition of both novel single-nucleotide variants (SNVs) and recombination events including insertions, deletions, and duplications. The co-occurrence of SNVs in individual viral genomes during their evolution has been well described. However, unlike covariation of SNVs, studying the correlation of recombination events with each other or with SNVs has been hampered by their inherent genetic complexity and a lack of bioinformatic tools. Here, we expanded our previously reported CoVaMa pipeline (v0.1) to measure linkage disequilibrium between recombination events and SNVs within both short-read and long-read sequencing datasets. We demonstrate this approach using long-read nanopore sequencing data acquired from Flock House virus (FHV) serially passaged in vitro. We found SNVs that were either correlated or anti-correlated with large genomic deletions generated by nonhomologous recombination that give rise to Defective-RNAs. We also analyzed NGS data from longitudinal HIV samples derived from a patient undergoing antiretroviral therapy who proceeded to virological failure. We found correlations between insertions in the p6Gag and mutations in Gag cleavage sites. This report confirms previous findings and provides insights on novel associations between SNVs and specific recombination events within the viral genome and their role in viral evolution.
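
As an illustration of the quantity mentioned in the abstract above, the following is a minimal sketch of the textbook pairwise linkage disequilibrium statistics (D and r²) computed over reads that span both a recombination event and an SNV site. It is not the CoVaMa implementation; the function name, the per-read 0/1 encoding, and the example data are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def linkage_disequilibrium(event: Sequence[int], snv: Sequence[int]) -> Tuple[float, float]:
    """Return (D, r2) for two binary features scored on the same reads.

    event[i] is 1 if read i carries the recombination event (e.g. a deletion),
    snv[i] is 1 if read i carries the variant allele. Hypothetical encoding.
    """
    if len(event) != len(snv) or len(event) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(event)
    p_a = sum(event) / n                                        # frequency of the event
    p_b = sum(snv) / n                                          # frequency of the SNV
    p_ab = sum(1 for a, b in zip(event, snv) if a and b) / n    # joint frequency
    d = p_ab - p_a * p_b                                        # D > 0: correlated, D < 0: anti-correlated
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0                    # undefined when a feature is fixed; report 0
    return d, r2


# Hypothetical reads: the SNV appears only on reads without the deletion.
deletion = [1, 1, 0, 0]
variant = [0, 0, 1, 1]
d, r2 = linkage_disequilibrium(deletion, variant)
```

A negative D with r² close to 1, as in the hypothetical data here, would correspond to the anti-correlated case described in the abstract.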

10.
Although rare variants within the Toll-like receptor signalling pathway genes have been found to underlie human primary immunodeficiencies associated with selective predisposition to invasive pneumococcal disease (IPD), the contribution of variants in these genes to IPD susceptibility at the population level remains unknown. Complete re-sequencing of the IRAK4, MYD88 and IKBKG genes was undertaken in 164 IPD cases from the UK and 164 geographically matched population-based controls. 233 single-nucleotide variants (SNVs) were identified, of which ten were in coding regions. Four rare coding variants were predicted to be deleterious, two variants in MYD88 and two in IRAK4. The predicted deleterious variants in MYD88 were observed in two heterozygous cases but not seen in controls. Frequencies of predicted deleterious IRAK4 SNVs were the same in cases and controls. Our findings suggest that rare, functional variants in MYD88, IRAK4 or IKBKG do not significantly contribute to IPD susceptibility in adults at the population level.

11.
U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. 
The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date.

12.
13.
Recent advances in genomics technologies have spurred unprecedented efforts in genome and exome re-sequencing aiming to unravel the genetic component of rare and complex disorders. While in rare disorders this allowed the identification of novel causal genes, the missing heritability paradox in complex diseases remains so far elusive. Despite rapid advances in next-generation sequencing, both the technology and the analysis of the data it produces are in their infancy. At present there is abundant knowledge pertaining to the role of rare single nucleotide variants (SNVs) in rare disorders and of common SNVs in common disorders. Although the 1000 Genomes Project has clearly highlighted the prevalence of rare variants and more complex variants (e.g. insertions, deletions), their role in disease is as yet far from elucidated. We set out to analyse the properties of sequence variants identified in a comprehensive collection of exome re-sequencing studies performed on samples from patients affected by a broad range of complex and rare diseases (N = 173). Given the known potential for Loss of Function (LoF) variants to be false positives, we performed an extensive validation of the common, rare and private LoF variants identified, which indicated that most of the private and rare variants identified were indeed true, while common novel variants had a significantly higher false positive rate. Our results indicated a strong enrichment of very low-frequency insertion/deletion variants, so far under-investigated, which might be difficult to capture with low coverage and imputation approaches and for which most study designs would be under-powered. These insertions and deletions might play a significant role in disease genetics, contributing specifically to the underlying rare and private variation predicted to be discovered through next-generation sequencing.

14.
Exome sequencing has been widely used in detecting pathogenic nonsynonymous single nucleotide variants (SNVs) for human inherited diseases. However, traditional statistical genetics methods are ineffective in analyzing exome sequencing data, owing to the large number of sequenced variants, the presence of a non-negligible fraction of pathogenic rare variants or de novo mutations, and the limited size of affected and normal populations. Indeed, the widespread use of exome sequencing calls for an effective computational method for identifying causative nonsynonymous SNVs from a large number of sequenced variants. Here, we propose a bioinformatics approach called SPRING (Snv PRioritization via the INtegration of Genomic data) for identifying pathogenic nonsynonymous SNVs for a given query disease. Based on six functional effect scores calculated by existing methods (SIFT, PolyPhen2, LRT, MutationTaster, GERP and PhyloP) and five association scores derived from a variety of genomic data sources (gene ontology, protein-protein interactions, protein sequences, protein domain annotations and gene pathway annotations), SPRING calculates the statistical significance that an SNV is causative for a query disease and hence provides a means of prioritizing candidate SNVs. With a series of comprehensive validation experiments, we demonstrate that SPRING is valid for diseases whose genetic bases are either partly known or completely unknown and effective for diseases with a variety of inheritance modes. In applications of our method to real exome sequencing data sets, we show the capability of SPRING in detecting causative de novo mutations for autism, epileptic encephalopathies and intellectual disability. We further provide an online service, the standalone software and genome-wide predictions of causative SNVs for 5,080 diseases at http://bioinfo.au.tsinghua.edu.cn/spring.

15.
Next-generation sequencing (NGS) will likely facilitate a better understanding of the causes and consequences of human genetic variability. In this context, the validity of NGS-inferred single-nucleotide variants (SNVs) is of paramount importance. We therefore developed a statistical framework to assess the fidelity of three common NGS platforms. Using aligned DNA sequence data from two completely sequenced HapMap samples as included in the 1000 Genomes Project, we unraveled remarkably different error profiles for the three platforms. Compared to confirmed HapMap variants, newly identified SNVs included a substantial proportion of false positives (3–17%). Consensus calling by more than one platform yielded significantly lower error rates (1–4%). This implies that the use of multiple NGS platforms may be more cost-efficient than relying upon a single technology alone, particularly in physically localized sequencing experiments that rely upon small error rates. Our study thus highlights that different NGS platforms suit different practical applications to differing degrees, and that NGS-based studies require stringent data quality control for their results to be valid.
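
The consensus-calling idea in the abstract above can be sketched as keeping only variants reported by at least a given number of platforms. This is a generic illustration, not the statistical framework of the study; the platform names, the variant tuple layout (chromosome, position, reference allele, alternate allele), and the example calls are all hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def consensus_calls(call_sets: Dict[str, Set[Variant]], min_platforms: int = 2) -> Set[Variant]:
    """Return variants called by at least `min_platforms` of the given platforms."""
    if min_platforms < 1:
        raise ValueError("min_platforms must be at least 1")
    # Each platform contributes a set, so a variant is counted once per platform.
    counts = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in counts.items() if n >= min_platforms}


# Made-up calls from three placeholder platforms.
calls = {
    "platform_A": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "platform_B": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")},
    "platform_C": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
}
shared = consensus_calls(calls, min_platforms=2)
```

Raising `min_platforms` trades sensitivity for a lower expected false positive rate, which is the trade-off the abstract describes.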

16.
Recent advances in DNA sequencing techniques have identified rare single‐nucleotide variants with less than 1% minor allele frequency. Despite the growing interest and physiological importance of rare variants in genome sciences, less attention has been paid to the allele frequency of variants in protein sciences. To elucidate the characteristics of genetic variants on protein interaction sites, from the viewpoints of the allele frequency and the structural position of variants, we mapped about 20,000 human SNVs onto protein complexes. We found that variants are less abundant in protein interfaces, and specifically the core regions of interfaces. The tendency to “avoid” the interfacial core is stronger among common variants than rare variants. As amino acid substitutions, the trend of mutating amino acids among rare variants is consistent in different interfacial regions, reflecting the fact that rare variants result from random mutations in DNA sequences, whereas amino acid changes of common variants vary between the interfacial core and rim regions, possibly due to functional constraints on proteins. This study illustrated how the allele frequency of variants relates to the protein structural regions and the functional sites in general and will lead to deeper understanding of the potential deleteriousness of rare variants at the structural level. Exceptional cases of the observed trends will shed light on the limitations of structural approaches to evaluate the functional impacts of variants.

17.
Congenital Zika Syndrome (CZS) is a critical illness with a wide range of severity caused by Zika virus (ZIKV) infection during pregnancy. Life-threatening neurodevelopmental dysfunctions are among the most common phenotypes observed in affected newborns. Risk factors that contribute to susceptibility and response to ZIKV infection may be related to the virus itself, the environment, and maternal genetic background. Nevertheless, the newborn’s genetic contribution to the critical illness is still not elucidated. Here, we aimed to identify possible genetic variants as well as relevant biological pathways that might be associated with CZS phenotypes. For this purpose, we performed whole-exome sequencing in 40 children born to women with confirmed exposure to ZIKV during pregnancy. We investigated the occurrence of rare harmful single-nucleotide variants (SNVs) possibly associated with inborn errors in genes ontologically related to CZS phenotypes. Moreover, an exome-wide association analysis was also performed using a case-control design (29 CZS cases and 11 controls), for both common and rare variants. Five out of the 29 CZS patients harbored known pathogenic variants likely to contribute to the mild to severe manifestations observed. Approximately 30% of affected individuals carried at least one pathogenic or likely pathogenic SNV in candidate genes that may play a role in CZS. Our common variant association analysis detected a suggestive protective effect of rs2076469 in the DISP3 gene (p-value: 1.39 × 10⁻⁵). The IL12RB2 gene (p-value: 2.18 × 10⁻¹¹) also showed an unusual distribution of nonsynonymous rare SNVs in control samples. Finally, genes harboring harmful variants are involved in processes related to CZS phenotypes such as neurological development and immunity. Therefore, both rare and common variations may contribute to the genetic basis of CZS susceptibility. 
The variations and pathways identified in this study may also have implications for the development of therapeutic strategies in the future.

18.
Non-small-cell lung cancer (NSCLC) accounts for most cancer-related deaths worldwide. Liquid biopsy by a blood draw to detect circulating tumor cells (CTCs) is a tool for molecular profiling of cancer using single-cell and next-generation sequencing (NGS) technologies. The aim of the study was to identify somatic variants in single CTCs isolated from NSCLC patients by targeted NGS. Thirty-one subjects (20 NSCLC patients, 11 smokers without cancer) were enrolled for blood draws (7.5 mL). CTCs were identified by immunofluorescence, individually retrieved, and DNA-extracted. Targeted NGS was performed to detect somatic variants (single-nucleotide variants (SNVs) and insertions/deletions (Indels)) across 65 oncogenes and tumor suppressor genes. Cancer-associated variants were classified using OncoKB database. NSCLC patients had significantly higher CTC counts than control smokers (p = 0.0132; Mann–Whitney test). Analyzing 23 CTCs and 13 white blood cells across seven patients revealed a total of 644 somatic variants that occurred in all CTCs within the same subject, ranging from 1 to 137 per patient. The highest number of variants detected in ≥1 CTC within a patient was 441. A total of 18/65 (27.7%) genes were highly mutated. Mutations with oncogenic impact were identified in functional domains of seven oncogenes/tumor suppressor genes (NF1, PTCH1, TP53, SMARCB1, SMAD4, KRAS, and ERBB2). Single CTC-targeted NGS detects heterogeneous and shared mutational signatures within and between NSCLC patients. CTC single-cell genomics have potential for integration in NSCLC precision oncology.

19.
Genome and exome sequencing yield extensive catalogues of human genetic variation. However, pinpointing the few phenotypically causal variants among the many variants present in human genomes remains a major challenge, particularly for rare and complex traits wherein genetic information alone is often insufficient. Here, we review approaches to estimate the deleteriousness of single nucleotide variants (SNVs), which can be used to prioritize disease-causal variants. We describe recent advances in comparative and functional genomics that enable systematic annotation of both coding and non-coding variants. Application and optimization of these methods will be essential to find the genetic answers that sequencing promises to hide in plain sight.

20.
