Similar literature
20 similar articles found
1.
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find that both scores are significantly predictive of human dosage-sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
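The abstract above describes ncRVIS only as a comparison of observed and predicted levels of standing variation. One common way to express such a comparison is as a regression residual, and the sketch below illustrates that reading. The function name `intolerance_scores`, the toy counts, and the choice of a plain least-squares fit with simple standardization are all assumptions made for illustration; they are not taken from the abstract and may differ from the authors' actual model.

```python
from statistics import mean, pstdev


def intolerance_scores(total_variants, common_variants):
    """Return standardized residuals from a least-squares fit.

    Hypothetical helper: regresses the observed count of common variants
    on the total variant count per region, then scales the residuals by
    their population standard deviation. A negative score means fewer
    common variants than the fit predicts (more "intolerant").
    """
    x_bar = mean(total_variants)
    y_bar = mean(common_variants)
    sxx = sum((x - x_bar) ** 2 for x in total_variants)
    sxy = sum(
        (x - x_bar) * (y - y_bar)
        for x, y in zip(total_variants, common_variants)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [
        y - (intercept + slope * x)
        for x, y in zip(total_variants, common_variants)
    ]
    spread = pstdev(residuals)
    return [r / spread for r in residuals]


# Toy counts for five regulatory regions (made-up numbers, not real data).
total = [10, 20, 30, 40, 50]
common = [2, 4, 3, 8, 10]
scores = intolerance_scores(total, common)
most_intolerant = scores.index(min(scores))  # index of the lowest score
```

In this toy input the third region carries fewer common variants than the fitted line predicts, so it receives the lowest score.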

2.
Some individuals with a particular disease-causing mutation or genotype fail to express most if not all features of the disease in question, a phenomenon that is known as ‘reduced (or incomplete) penetrance’. Reduced penetrance is not uncommon; indeed, there are many known examples of ‘disease-causing mutations’ that fail to cause disease in at least a proportion of the individuals who carry them. Reduced penetrance may therefore explain not only why genetic diseases are occasionally transmitted through unaffected parents, but also why healthy individuals can harbour quite large numbers of potentially disadvantageous variants in their genomes without suffering any obvious ill effects. Reduced penetrance can be a function of the specific mutation(s) involved or of allele dosage. It may also result from differential allelic expression, copy number variation or the modulating influence of additional genetic variants in cis or in trans. The penetrance of some pathogenic genotypes is known to be age- and/or sex-dependent. Variable penetrance may also reflect the action of unlinked modifier genes, epigenetic changes or environmental factors. At least in some cases, complete penetrance appears to require the presence of one or more genetic variants at other loci. In this review, we summarize the evidence for reduced penetrance being a widespread phenomenon in human genetics and explore some of the molecular mechanisms that may help to explain this enigmatic characteristic of human inherited disease.  相似文献   

3.
Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the non-synonymous to synonymous variant ratio (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions that is independent of six factors, including conservation, codon usage, and GC-content, and is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely deleterious. PTM constraint also affects flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least ∼180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.

4.
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.

5.
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that results in progressive degeneration of motor neurons, ultimately leading to paralysis and death. Approximately 10% of ALS cases are familial, with the remaining 90% of cases being sporadic. Genetic studies in familial cases of ALS have been extremely informative in determining the causative mutations behind ALS, especially as the same mutations identified in familial ALS can also cause sporadic disease. However, the cause of ALS in approximately 30% of familial cases and in the majority of sporadic cases remains unknown. Sporadic ALS cases represent an underutilized resource for genetic information about ALS; therefore, we undertook a targeted sequencing approach of 169 known and candidate ALS disease genes in 242 sporadic ALS cases and 129 matched controls to try to identify novel variants linked to ALS. We found a significant enrichment in novel and rare variants in cases versus controls, indicating that we are likely identifying disease associated mutations. This study highlights the utility of next generation sequencing techniques combined with functional studies and rare variant analysis tools to provide insight into the genetic etiology of a heterogeneous sporadic disease.  相似文献   

6.
Large-scale sequencing of cancer genomes has revealed many novel mutations and inter-tumoral heterogeneity. Therefore, prioritizing variants according to their potential deleterious effects has become essential. We constructed a disease gene network and proposed a Bayesian ensemble approach that integrates diverse sources to predict the functional effects of missense variants. We analyzed 23,336 missense disease mutations and 36,232 neutral polymorphisms of 12,039 human proteins. The results showed successful improvement of prediction accuracy in both sensitivity and specificity, and we demonstrated the utility of the method by applying it to somatic mutations obtained from colorectal and breast cancer cell lines. The candidate genes with predicted deleterious mutations as well as known cancer genes were significantly enriched in many KEGG pathways related to carcinogenesis, supporting genetic homogeneity of cancer at the pathway level. The breast cancer-specific network increased the prediction accuracy for breast cancer mutations. This study provides a ranked list of deleterious mutations and candidate cancer genes and suggests that mutations affecting cancer may occur in important pathways and should be interpreted on the phenotype-related network or pathway. A disease gene network may be of value in predicting functional effects of novel disease-specific mutations.  相似文献   

7.
Finding genes for complex diseases has been the goal of many genetic studies. Most of these studies have succeeded by searching for genes and mutations in rare familial cases, by screening candidate genes, and by performing genome-wide association studies. However, only a small fraction of the total genetic risk for these complex genetic diseases can be explained by the identified mutations and associated genetic loci. In this review we focus on Hirschsprung disease (HSCR) as an example of a complex genetic disorder. We describe the genes identified in this congenital malformation and postulate that common ‘low penetrant’ variants in combination with rare or private ‘high penetrant’ variants determine the risk of HSCR and, likely, of other complex diseases. We also discuss how new technological advances can be used to gain further insights into the genetic background of complex diseases. Finally, we outline a few steps to develop functional assays in order to determine the involvement of these variants in disease development.

8.
Much emphasis has been placed on the identification, functional characterization, and therapeutic potential of somatic variants in tumor genomes. However, the majority of somatic variants lie outside coding regions and their role in cancer progression remains to be determined. In order to establish a system to test the functional importance of non-coding somatic variants in cancer, we created a low-passage cell culture of a metastatic melanoma tumor sample. As a foundation for interpreting functional assays, we performed whole-genome sequencing and analysis of this cell culture, the metastatic tumor from which it was derived, and the patient-matched normal genomes. When comparing somatic mutations identified in the cell culture and tissue genomes, we observe concordance at the majority of single nucleotide variants, whereas copy number changes are more variable. To understand the functional impact of non-coding somatic variation, we leveraged functional data generated by the ENCODE Project Consortium. We analyzed regulatory regions derived from multiple different cell types and found that melanocyte-specific regions are among the most depleted for somatic mutation accumulation. Significant depletion in other cell types suggests the metastatic melanoma cells de-differentiated to a more basal regulatory state. Experimental identification of genome-wide regulatory sites in two different melanoma samples supports this observation. Together, these results show that mutation accumulation in metastatic melanoma is nonrandom across the genome and that a de-differentiated regulatory architecture is common among different samples. Our findings enable identification of the underlying genetic components of melanoma and define the differences between a tissue-derived tumor sample and the cell culture created from it. Such information helps establish a broader mechanistic understanding of the linkage between non-coding genomic variations and the cellular evolution of cancer.  

9.
Although we now routinely sequence human genomes, we can confidently identify only a fraction of the sequence variants that have a functional impact. Here, we developed a deep mutational scanning framework that produces exhaustive maps for human missense variants by combining random codon mutagenesis and multiplexed functional variation assays with computational imputation and refinement. We applied this framework to four proteins corresponding to six human genes: UBE2I (encoding SUMO E2 conjugase), SUMO1 (small ubiquitin‐like modifier), TPK1 (thiamin pyrophosphokinase), and CALM1/2/3 (three genes encoding the protein calmodulin). The resulting maps recapitulate known protein features and confidently identify pathogenic variation. Assays potentially amenable to deep mutational scanning are already available for 57% of human disease genes, suggesting that DMS could ultimately map functional variation for all human disease genes.  相似文献   

10.
Exome sequencing in families affected by rare genetic disorders has the potential to rapidly identify new disease genes (genes in which mutations cause disease), but the identification of a single causal mutation among thousands of variants remains a significant challenge. We developed a scoring algorithm to prioritize potential causal variants within a family according to segregation with the phenotype, population frequency, predicted effect, and gene expression in the tissue(s) of interest. To narrow the search space in families with multiple affected individuals, we also developed two complementary approaches to exome-based mapping of autosomal-dominant disorders. One approach identifies segments of maximum identity by descent among affected individuals; the other nominates regions on the basis of shared rare variants and the absence of homozygous differences between affected individuals. We showcase our methods by using exome sequence data from families affected by autosomal-dominant retinitis pigmentosa (adRP), a rare disorder characterized by night blindness and progressive vision loss. We performed exome capture and sequencing on 91 samples representing 24 families affected by probable adRP but lacking common disease-causing mutations. Eight of 24 families (33%) were revealed to harbor high-scoring, most likely pathogenic (by clinical assessment) mutations affecting known RP genes. Analysis of the remaining 17 families identified candidate variants in a number of interesting genes, some of which have withstood further segregation testing in extended pedigrees. To empower the search for Mendelian-disease genes in family-based sequencing studies, we implemented these methods in a cross-platform-compatible software package, MendelScan, which is freely available to the research community.

11.
The problem of interpreting missense mutations of disease-causing genes is an increasingly important one. Because these point mutations result in alteration of only a single amino acid of the protein product, it is often unclear whether this change alone is sufficient to cause disease. We propose a Bayesian approach that utilizes genetic information on affected relatives in families ascertained through known missense-mutation carriers. This method is useful in evaluating known disease genes for common disease phenotypes, such as breast cancer or colorectal cancer. The posterior probability that a missense mutation is disease causing is conditioned on the relationship of the relatives to the proband, the population frequency of the mutation, and the phenocopy rate of the disease. The approach is demonstrated in two cancer data sets: BRCA1 R841W and APC I1307K. In both examples, this method helps establish that these mutations are likely to be disease causing, with Bayes factors in favor of causality of 5.09 and 66.97, respectively, and posterior probabilities of 0.836 and 0.985. We also develop a simple approximation for rare alleles and consider the case of unknown penetrance and allele frequency.
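The Bayes factors and posterior probabilities quoted in the abstract above are linked by the standard odds form of Bayes' rule: posterior odds = Bayes factor × prior odds. The sketch below shows that conversion. The function name is hypothetical, and the prior probability of 0.5 is an assumption: the abstract does not state the prior, but equal prior odds happen to reproduce the reported figures.

```python
def posterior_from_bayes_factor(bayes_factor, prior=0.5):
    """Convert a Bayes factor into a posterior probability.

    `prior` is the prior probability that the variant is disease causing;
    0.5 (prior odds of 1) is an assumption, not a value from the abstract.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Bayes factors reported for BRCA1 R841W and APC I1307K.
brca1_posterior = round(posterior_from_bayes_factor(5.09), 3)
apc_posterior = round(posterior_from_bayes_factor(66.97), 3)
print(brca1_posterior, apc_posterior)  # prints 0.836 0.985
```

With a more sceptical prior, the same Bayes factors would give lower posteriors, which is why the choice of prior is worth stating explicitly.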

12.
Phenotypic variation in natural populations results from a combination of genetic effects, environmental effects, and gene-by-environment interactions. Despite the vast amount of genomic data becoming available, many pressing questions remain about the nature of genetic mutations that underlie functional variation. We present the results of combining genome-wide association analysis of 41 different phenotypes in ∼5,000 inbred maize lines to analyze patterns of high-resolution genetic association among 28.9 million single-nucleotide polymorphisms (SNPs) and ∼800,000 copy-number variants (CNVs). We show that genic and intergenic regions have opposite patterns of enrichment, minor allele frequencies, and effect sizes, implying tradeoffs among the probability that a given polymorphism will have an effect, the detectable size of that effect, and its frequency in the population. We also find that genes tagged by GWAS are enriched for regulatory functions and are ∼50% more likely to have a paralog than expected by chance, indicating that gene regulation and gene duplication are strong drivers of phenotypic variation. These results will likely apply to many other organisms, especially ones with large and complex genomes like maize.

13.
The decreasing cost of sequencing is leading to a growing repertoire of personal genomes. However, we are lagging behind in understanding the functional consequences of the millions of variants obtained from sequencing. Global system-wide effects of variants in coding genes are particularly poorly understood. It is known that while variants in some genes can lead to diseases, complete disruption of other genes, called ‘loss-of-function tolerant’, is possible with no obvious effect. Here, we build a systems-based classifier to quantitatively estimate the global perturbation caused by deleterious mutations in each gene. We first survey the degree to which gene centrality in various individual networks and a unified ‘Multinet’ correlates with the tolerance to loss-of-function mutations and evolutionary conservation. We find that functionally significant and highly conserved genes tend to be more central in physical protein-protein and regulatory networks. However, this is not the case for metabolic pathways, where the highly central genes have more duplicated copies and are more tolerant to loss-of-function mutations. Integration of three-dimensional protein structures reveals that the correlation with centrality in the protein-protein interaction network is also seen in terms of the number of interaction interfaces used. Finally, combining all the network and evolutionary properties allows us to build a classifier distinguishing functionally essential and loss-of-function tolerant genes with higher accuracy (AUC = 0.91) than any individual property. Application of the classifier to the whole genome shows its strong potential for interpretation of variants involved in Mendelian diseases and in complex disorders probed by genome-wide association studies.  相似文献   

14.
Next-generation sequencing has prompted a surge in the discovery of millions of genetic variants from vertebrate genomes. Besides their applications in genetic association and linkage studies, a fraction of these variants will have functional consequences. This study describes the detection and characterization of 15 million SNPs from the chicken genome with the goal of predicting variants with potential functional implications (pfVars) from both coding and non-coding regions. The study reports 183K amino acid-altering SNPs, of which 48% are predicted to be evolutionarily intolerant, 13K splicing variants, 51K variants likely to alter RNA secondary structures, 500K within the most conserved elements, and 3K from non-coding RNAs. Regions of local fixation within commercial broiler and layer lines were investigated as potential selective sweeps using genome-wide SNP data. Relationships of the pfVars with phenotypes, if any, were explored by overlaying the sweep regions with known QTLs. Based on this, the candidate genes and/or causal mutations for a number of important traits are discussed. Although the fixed variants within sweep regions were enriched with non-coding SNPs, some non-synonymous-intolerant mutations reached fixation, suggesting a possible adaptive advantage. The results presented in this study are expected to have important implications for future genomic research to identify candidate causal mutations and for poultry breeding.

15.
As large-scale re-sequencing of genomes reveals many protein mutations, especially in human cancer tissues, prediction of their likely functional impact becomes an important practical goal. Here, we introduce a new functional impact score (FIS) for amino acid residue changes using evolutionary conservation patterns. The information in these patterns is derived from aligned families and sub-families of sequence homologs within and between species using a combinatorial entropy formalism. The score performs well on a large set of human protein mutations in separating disease-associated variants (∼19 200), assumed to be strongly functional, from common polymorphisms (∼35 600), assumed to be weakly functional (area under the receiver operating characteristic curve of ∼0.86). In cancer, using recurrence, multiplicity and annotation for ∼10 000 mutations in the COSMIC database, the method does well in assigning higher scores to more likely functional mutations (‘drivers’). To guide experimental prioritization, we report a list of about 1000 top human cancer genes frequently mutated in one or more cancer types, ranked by likely functional impact, and an additional 1000 candidate cancer genes with rare but likely functional mutations. In addition, we estimate that at least 5% of cancer-relevant mutations involve a switch of function, rather than simply loss or gain of function.

16.
The disease Xeroderma Pigmentosum (XP) is genetically heterogeneous and defined by pathogenic variants (formerly termed mutations) in any of eight different genes. Pathogenic variants in the XPC gene are the most commonly observed in US patients. Moreover, pathogenic variants in just four of the genes, XPA, XPC, XPD/ERCC2 and XPV/POLH account for 91% of all XP cases worldwide. In the current study, we describe the clinical, histopathologic, molecular genetic, and pathophysiological features of a 19-year-old female patient clinically diagnosed with XP as an infant. Analysis of archival material reveals a novel variation of a 13 base pair deletion in XPC exon 14 and a previously reported A>C missense pathogenic variant in the proximal splice site for XPC exon 6. Both variations induce frameshifts most likely leading to a truncated XPC protein product. Quantitative RT-PCR also revealed reduced mRNA levels in the archived specimen. Analysis of the XPA, XPD/ERCC2 and XPV/POLH genes in the current specimen failed to reveal pathologic variants. All previously reported pathogenic variants, polymorphisms and known amino acid changes for the XPC gene are compiled and described in the current nomenclature. Given the relative ease of screening for genetic variation and the potential role for such variation in human disease, a proposal for screening appropriate archival materials for alterations in the four most prevalent XP genes is presented.  相似文献   

17.
Common inbred strains of the laboratory rat can be divided into four different mitochondrial DNA haplotype groups represented by the SHR, BN, LEW, and F344 strains. In the current study, we investigated the metabolic and hemodynamic effects of the SHR vs. LEW mitochondrial genomes by comparing the SHR to a new SHR conplastic strain, SHR-mt(LEW); these strains are genetically identical except for their mitochondrial genomes. Complete mitochondrial DNA (mtDNA) sequence analysis comparing the SHR and LEW strains revealed gene variants encoding amino acid substitutions limited to a single mitochondrial enzyme complex, NADH dehydrogenase (complex I), affecting subunits 2, 4, and 5. Two of the variants in the mt-Nd4 subunit gene are located close to variants known to be associated with exercise intolerance and diabetes mellitus in humans. No variants were found in tRNA or rRNA genes. These variants in mt-Nd2, mt-Nd4, and mt-Nd5 in the SHR-mt(LEW) conplastic strain were linked to reductions in oxidative and nonoxidative glucose metabolism in skeletal muscle. In addition, SHR-mt(LEW) conplastic rats showed increased serum nonesterified fatty acid levels and resistance to insulin stimulated incorporation of glucose into adipose tissue lipids. These results provide evidence that inherited variation in mitochondrial genes encoding respiratory chain complex I subunits, in the absence of variation in the nuclear genome and other confounding factors, can influence glucose and lipid metabolism when expressed on the nuclear genetic background of the SHR strain.  相似文献   

18.
19.
Over 1600 mammalian genes are known to cause an inherited disorder, when subjected to one or more mutations. These disease genes represent a unique resource for the identification and quantification of relationships between phenotypic attributes of a disease and the molecular features of the associated disease genes, including their ascribed annotated functional classes and expression patterns. Such analyses can provide a more global perspective and a deeper understanding of the probable causes underlying human hereditary diseases. In this perspective and critical view of disease genomics, we present a comparative analysis of genes reported to cause inherited diseases in humans in terms of their causative effects on physiology, their genetics and inheritance modes, the functional processes they are involved in and their expression profiles across a wide spectrum of tissues. Our analysis reveals that there are more extensive correlations between these attributes of genetic disease genes than previously appreciated. For instance, the functional pattern of genes causing dominant and recessive diseases is markedly different. Also, the function of the genes and their expression correlate with the type of disease they cause when mutated. The results further indicate that a comparative genomics approach for the analysis of genes linked to human genetic diseases will facilitate the elucidation of the underlying molecular and cellular mechanisms.  相似文献   

20.
Retinitis Pigmentosa (RP) is a heterogeneous group of inherited retinal dystrophies characterised ultimately by the loss of photoreceptor cells. RP is the leading cause of visual loss in individuals younger than 60 years, with a prevalence of about 1 in 4000. The molecular genetic diagnosis of autosomal recessive RP (arRP) is challenging due to the large genetic and clinical heterogeneity. Traditional methods for sequencing arRP genes are often laborious and not easily available, and a screening technique that enables the rapid detection of the genetic cause would be very helpful in clinical practice. The goal of this study was to develop and apply microarray-based resequencing technology capable of detecting both known and novel mutations on a single high-throughput platform. Hence, the coding regions and exon/intron boundaries of 16 arRP genes were resequenced using microarrays in 102 Spanish patients with a clinical diagnosis of arRP. All the detected variations were confirmed by direct sequencing, and potential pathogenicity was assessed by functional predictions and frequency in controls. For validation purposes, 4 positive controls for variants consisting of previously identified changes were hybridized on the array. As a result of the screening, we detected 44 variants, of which 15 are very likely pathogenic, detected in 14 arRP families (14%). Finally, the design of this array can easily be transformed into an equivalent diagnostic system based on targeted enrichment followed by next generation sequencing.
