Similar articles
 Found 20 similar articles (search time: 31 ms)
1.

Background

Interlocus gene conversion (IGC) is a recombination-based mechanism that results in the unidirectional transfer of short stretches of sequence between paralogous loci. Although IGC is a well-established mechanism of human disease, the extent to which this mutagenic process has shaped overall patterns of segregating variation in multi-copy regions of the human genome remains unknown. One expected manifestation of IGC in population genomic data is the presence of one-to-one paralogous SNPs that segregate identical alleles.

Results

Here, I use SNP genotype calls from the low-coverage phase 3 release of the 1000 Genomes Project to identify 15,790 parallel, shared SNPs in duplicated regions of the human genome. My approach for identifying these sites accounts for the potential redundancy of short read mapping in multi-copy genomic regions, thereby effectively eliminating false positive SNP calls arising from paralogous sequence variation. I demonstrate that independent mutation events to identical nucleotides at paralogous sites are not a significant source of shared polymorphisms in the human genome, consistent with the interpretation that these sites are the outcome of historical IGC events. These putative signals of IGC are enriched in genomic contexts previously associated with non-allelic homologous recombination, including clear signals in gene families that form tandem intra-chromosomal clusters.

Conclusions

Taken together, my analyses implicate IGC, not point mutation, as the mechanism generating at least 2.7% of single nucleotide variants in duplicated regions of the human genome.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1681-3) contains supplementary material, which is available to authorized users.
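
As a rough illustration of the shared-SNP criterion described in entry 1 (one-to-one paralogous SNPs that segregate identical alleles), the following minimal sketch flags such site pairs. The function name, the input layout, and the toy coordinates are assumptions made for illustration; the published pipeline also handles read-mapping redundancy, which is not modelled here.

```python
def shared_paralogous_snps(pairs, alleles):
    """Return paralogous site pairs whose SNPs segregate identical alleles.

    pairs   : iterable of (site_a, site_b) one-to-one paralogous positions
    alleles : dict mapping a site to the set of alleles observed at it
    """
    shared = []
    for site_a, site_b in pairs:
        a = alleles.get(site_a, set())
        b = alleles.get(site_b, set())
        # Both sites must be polymorphic and carry the same allele set.
        if len(a) >= 2 and a == b:
            shared.append((site_a, site_b))
    return shared


# Toy data with hypothetical coordinates (not taken from the study).
pairs = [("chr1:100", "chr1:900"), ("chr1:150", "chr1:950"), ("chr1:200", "chr1:1000")]
alleles = {
    "chr1:100": {"A", "G"}, "chr1:900": {"A", "G"},   # same alleles in both copies
    "chr1:150": {"C", "T"}, "chr1:950": {"C"},        # polymorphic in one copy only
    "chr1:200": {"A", "C"}, "chr1:1000": {"A", "T"},  # polymorphic, different alleles
}
print(shared_paralogous_snps(pairs, alleles))
```

Only the first pair is reported, because it is the only one where both paralogous sites are polymorphic for the same two nucleotides.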

2.
Humans show tremendous phenotypic diversity across geographically distributed populations, and much of this diversity undoubtedly results from genetic adaptations to different environmental pressures. The availability of genome-wide genetic variation data from densely sampled populations offers unprecedented opportunities for identifying the loci responsible for these adaptations and for elucidating the genetic architecture of human adaptive traits. Several approaches have been used to detect signals of selection in human populations, and these approaches differ in the assumptions they make about the underlying mode of selection. We contrast the results of approaches based on haplotype structure and differentiation of allele frequencies to those from a method for identifying single nucleotide polymorphisms strongly correlated with environmental variables. Although the first group of approaches tends to detect new beneficial alleles that were driven to high frequencies by selection, the environmental correlation approach has power to identify alleles that experienced small shifts in frequency owing to selection. We suggest that the first group of approaches tends to identify only variants with relatively strong phenotypic effects, whereas the environmental correlation methods can detect variants that make smaller contributions to an adaptive trait.

3.
Structural variants (SVs) are defined as chromosomal changes larger than 1 kb. Although technical progress has enabled improved characterization of the qualitative and quantitative features of SVs, their phenotypic consequences remain poorly understood. Distinguishing between a neutral variant, a predisposing factor and a disease-causing aberration represents one of the major challenges in today’s human genetic diagnostics.

4.
It is a central assumption of evolution that gene duplications provide the genetic raw material from which to create proteins with new functions. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico approaches to predict details of protein function. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogous proteins. It has been proposed that the positions that show switches in substitution rate over time, i.e., "heterotachous sites," are good indicators of functional divergence. Here, we analyzed the alpha and beta paralogous subunits of hemoglobin in search of such signatures. We found as many heterotachous sites in comparisons between groups of paralogous subunits (alpha/beta) as between orthologous ones (alpha/alpha, beta/beta). Thus, the importance of substitution rate shifts as predictors of specialization between protein subfamilies might be reconsidered. Instead, such shifts may reflect a more general process of protein evolution, consistent with the fact that they can be compatible with function conservation. As an alternative, we focused on those residues showing highly constrained states in two sequence groups, but different in each group, and we named them CBD (for "constant but different"). Unlike heterotachous positions, CBD sites were markedly overrepresented in paralogous (alpha/beta) comparisons relative to orthologous ones (alpha/alpha, beta/beta), identifying them as likely signatures of functional specialization between the two subunits. When superimposed onto the three-dimensional structure of hemoglobin, CBD positions consistently appeared to cluster preferentially on inter-subunit surfaces, two contact areas crucial to function in vertebrate tetrameric hemoglobin.
The identification and analysis of CBD sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of a growing number of multigene families identified by complete genome analyses.
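
The "constant but different" idea from entry 4 can be sketched in a few lines. This is a simplified illustration, assuming two groups of pre-aligned, equal-length sequences and a strict criterion (a column must be fully invariant within each group); the abstract speaks of "highly constrained" states, so the authors' actual threshold may be more permissive. The function name and toy sequences are hypothetical.

```python
def cbd_sites(group_a, group_b):
    """Return 0-based alignment columns that are constant within each group
    but carry a different residue in the two groups."""
    length = len(group_a[0])
    sites = []
    for i in range(length):
        col_a = {seq[i] for seq in group_a}
        col_b = {seq[i] for seq in group_b}
        # One residue per group, and the two groups disagree.
        if len(col_a) == 1 and len(col_b) == 1 and col_a != col_b:
            sites.append(i)
    return sites


# Toy alignment (made-up sequences, not real hemoglobin data).
alpha_like = ["MKVLA", "MKVLS", "MKVLA"]
beta_like = ["MRVLT", "MRVIT", "MRVLG"]
print(cbd_sites(alpha_like, beta_like))
```

In the toy alignment only column 1 qualifies: it is always K in the first group and always R in the second, while the other columns are either identical across groups or variable within one of them.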

5.
Analysis of the evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. For the species Staphylococcus aureus, whole-genome sequences have been determined for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed the importance of recombination involving long conserved sequences in the evolution of paralogous genes in the genome.

6.
7.
We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from the dbSNP database in G-protein-coupled receptors (GPCRs), a large class of important transmembrane (TM) proteins. Apart from the location of the SNP in the protein, we evaluated the predictive power of three major classes of features to differentiate between disease-causing mutations and neutral changes: (i) properties derived from amino-acid scales, such as volume and hydrophobicity; (ii) position-specific phylogenetic features reflecting evolutionary conservation, such as normalized site entropy, residue frequency and SIFT score; and (iii) substitution-matrix scores, such as those derived from the BLOSUM62, GRANTHAM and PHAT matrices. We validated our approach using a control dataset consisting of known disease-causing mutations and neutral variations. Logistic regression analyses indicated that position-specific phylogenetic features that describe the conservation of an amino acid at a specific site are the best discriminators of disease mutations versus neutral variations, and integration of all our features improves discrimination power. Overall, we identify 115 SNPs in GPCRs from dbSNP that are likely to be associated with disease and thus are good candidates for genotyping in association studies.
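
One of the position-specific features named in entry 7, normalized site entropy, can be illustrated as follows. This sketch assumes the common definition (Shannon entropy of the residue frequencies in an alignment column, divided by the logarithm of the alphabet size); the abstract does not give the exact formula the authors used, and the function name is an assumption.

```python
import math
from collections import Counter


def normalized_site_entropy(column, alphabet_size=20):
    """Shannon entropy of one alignment column, scaled to the range [0, 1].

    column        : string or list of residues observed at the site
    alphabet_size : 20 for amino acids (assumed normalization constant)
    """
    counts = Counter(column)
    total = sum(counts.values())
    # sum over residues of p * log(1/p), written as log(total/n) for clarity
    entropy = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropy / math.log(alphabet_size)


conserved = normalized_site_entropy("LLLLLLLL")  # fully conserved site
variable = normalized_site_entropy("ACDEFGHIKLMNPQRSTVWY")  # every residue once
print(conserved, variable)
```

A fully conserved column scores 0 and a column in which all 20 amino acids are equally frequent scores 1, so lower values indicate stronger evolutionary conservation.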

8.
Quantitative trait nucleotide analysis using Bayesian model selection   (Total citations: 4; self-citations: 0; citations by others: 4)
Although much attention has been given to statistical genetic methods for the initial localization and fine mapping of quantitative trait loci (QTLs), little methodological work has been done to date on the problem of statistically identifying the most likely functional polymorphisms using sequence data. In this paper we provide a general statistical genetic framework, called Bayesian quantitative trait nucleotide (BQTN) analysis, for assessing the likely functional status of genetic variants. The approach requires the initial enumeration of all genetic variants in a set of resequenced individuals. These polymorphisms are then typed in a large number of individuals (potentially in families), and marker variation is related to quantitative phenotypic variation using Bayesian model selection and averaging. For each sequence variant a posterior probability of effect is obtained and can be used to prioritize additional molecular functional experiments. An example of this quantitative nucleotide analysis is provided using the GAW12 simulated data. The results show that the BQTN method may be useful for choosing the most likely functional variants within a gene (or set of genes). We also include instructions on how to use our computer program, SOLAR, for association analysis and BQTN analysis.
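
The model-averaging step described in entry 8 can be sketched as below. This is not the SOLAR implementation: it assumes that each candidate model (a subset of variants) has already been fitted, that model posteriors are approximated from BIC values, and that all models have equal prior probability. None of these details are stated in the abstract, and the function name, variant names, and BIC values are hypothetical.

```python
import math


def posterior_inclusion(bic_by_model):
    """Approximate per-variant posterior probabilities of effect.

    bic_by_model : dict mapping a frozenset of variant names (the variants
                   included in a model) to that model's BIC.
    Assumes equal prior model probabilities and weight proportional to exp(-BIC/2).
    """
    best = min(bic_by_model.values())
    # Subtract the best BIC before exponentiating for numerical stability.
    weights = {m: math.exp(-0.5 * (bic - best)) for m, bic in bic_by_model.items()}
    total = sum(weights.values())
    variants = set().union(*bic_by_model)
    return {
        v: sum(w for m, w in weights.items() if v in m) / total
        for v in sorted(variants)
    }


# Toy BIC values for all subsets of two hypothetical variants.
models = {
    frozenset(): 110.0,
    frozenset({"v1"}): 100.0,
    frozenset({"v2"}): 108.0,
    frozenset({"v1", "v2"}): 102.0,
}
probs = posterior_inclusion(models)
print(probs)
```

In this toy case v1 receives a posterior probability of effect close to 1 and v2 a much lower one, which is the kind of ranking the abstract proposes for prioritizing follow-up functional experiments.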

9.
Advancements in sequencing technologies have empowered recent efforts to identify polymorphisms and mutations on a global scale. The large number of variations and mutations found in these projects requires high-throughput tools to identify those that are most likely to have an impact on function. Numerous computational tools exist for predicting which mutations are likely to be functional, but none specifically attempts to identify mutations that result in hyperactivation or gain-of-function. Here we present a modified version of the SIFT (Sorting Intolerant from Tolerant) algorithm that utilizes protein sequence alignments with homologous sequences to identify functional mutations based on evolutionary fitness. We show that this bi-directional SIFT (B-SIFT) is capable of identifying experimentally verified activating mutants from multiple datasets. B-SIFT analysis of large-scale cancer genotyping data identified potential activating mutations, for some of which we provide detailed supporting structural evidence. B-SIFT could prove to be a valuable tool for efforts in protein engineering as well as in identification of functional mutations in cancer.

10.
The identification of novel sequence variants, which may be either disease-causing mutations or silent polymorphisms, in large numbers of samples is becoming the rate-limiting step in associating diseases with specific genes. This is particularly true in light of the imminent arrival of the complete reference sequence of the human genome. A number of techniques have been developed to analyze DNA samples for sequence variants rapidly. We describe a new technique, capillary-based conformation-sensitive gel electrophoresis (capillary CSGE), that transfers mutation detection from acrylamide gel to capillary electrophoresis. Capillary CSGE was able to detect 7/7 short insertion/deletions and 16/22 base substitutions in a series of random single-nucleotide polymorphisms and known variants in the lipoprotein lipase and BRCA2 genes. This technique has the potential to screen many megabases of DNA in a single day.

11.
Expression quantitative trait loci (eQTLs) are currently the most abundant and systematically surveyed class of functional consequence for genetic variation. Recent genetic studies of gene expression have identified thousands of eQTLs in diverse tissue types for the majority of human genes. Application of this large eQTL catalog provides an important resource for understanding the molecular basis of common genetic diseases. However, only now have both the availability of individuals with full genomes and corresponding advances in functional genomics provided the opportunity to dissect eQTLs to identify causal regulatory variants. Resolving the properties of such causal regulatory variants is improving understanding of the molecular mechanisms that influence traits and guiding the development of new genome-scale approaches to variant interpretation. In this review, we provide an overview of current computational and experimental methods for identifying causal regulatory variants and predicting their phenotypic consequences.

12.
Proteomic studies of some human tissues and organs (skeletal muscles, myometrium, motor zone of the brain, prostate), and of cultured myoblasts, revealed 41 proteins in which amino acid variants (“conflicts”) were recognized at several “conflict” positions. Among the 93 registered “amino acid conflicts”, seven cases resulted from protein polymorphisms caused by the substitution of an individual amino acid. Proteomic analysis of prostate proteins revealed two isoforms of a prostate-specific antigen, formed due to alternative splicing. Thus, our results show that proteomic technologies can characterize various types of biochemical polymorphism in many human proteins.

13.
Laboratory mouse strains are known to have emerged from recent interbreeding between individuals from isolated Mus musculus populations. As a result of this breeding history, the collection of polymorphisms observed between laboratory mouse strains is likely to harbor the effects of natural selection between reproductively isolated populations. Until now no study has systematically investigated the consequences of this breeding history for gene evolution. Here we have used a novel, unbiased evolutionary approach to predict the founder origin of laboratory mouse strains and to assess the balance between ancient and newly emerged mutations in the founder subspecies. Our results confirm a contribution from at least four distinct subspecies. Additionally, our method allowed us to identify regions of relaxed selective constraint among laboratory mouse strains. This unique structure of variation is likely to have significant consequences for the use of the mouse to find genes underlying phenotypic variation.

14.
Understanding how each residue position contributes to protein function has been a long-standing goal in protein science. Substitution studies have historically focused on conserved protein positions. However, substitutions of nonconserved positions can also modify function. Indeed, we recently identified nonconserved positions that have large substitution effects in human liver pyruvate kinase (hLPYK), including altered allosteric coupling. To facilitate a comparison of which characteristics determine when a nonconserved position does vs does not contribute to function, the goal of the current work was to identify neutral positions in hLPYK. However, existing hLPYK data showed that three features commonly associated with neutral positions—high sequence entropy, high surface exposure, and alanine scanning—lacked the sensitivity needed to guide experimental studies. We used multiple evolutionary patterns identified in a sequence alignment of the PYK family to identify which positions were least patterned, reasoning that these were most likely to be neutral. Nine positions were tested with a total of 117 amino acid substitutions. Although exploring all potential functions is not feasible for any protein, five parameters associated with substrate/effector affinities and allosteric coupling were measured for hLPYK variants. For each position, the aggregate functional outcomes of all variants were used to quantify a “neutrality” score. Three positions showed perfect neutral scores for all five parameters. Furthermore, the nine positions showed larger neutral scores than 17 positions located near allosteric binding sites. Thus, our strategy successfully enriched the dataset for positions with neutral and modest substitutions.

15.
The identification of DNA sequence variants underlying human complex phenotypes remains a significant challenge for several reasons: individual variants can have small phenotypic effects or low population frequencies, and multiple allelic variants may act in concert to affect a trait. We evaluated the combined effect of allelic variants in seven genes involved in high-density lipoprotein (HDL) metabolism, using forward stepwise regression. Analysis of all known common single-nucleotide polymorphisms (SNPs) in the seven candidate genes revealed four variants that were associated with incremental changes in HDL cholesterol levels in three independent samples. Conversely, analysis of 660 polymorphisms in eight genes that do not appear to be involved in HDL metabolism did not identify any associations with plasma HDL-cholesterol levels. These data indicate that several common SNPs act in concert to influence plasma levels of HDL cholesterol.

16.
Interpreting the phenotypic consequences of human structural variation remains challenging. Functional enrichment analysis, which can identify functional enrichments among genes affected by structural variants, is providing significant biological insights into the genotype-phenotype relationship. In this review, we discuss the different approaches and choices in the application of this technique to human structural variation. We consider the importance of choosing the right background distribution for detection, the significance of the gene selection criteria, the effects of tissue-specific gene length biases and discuss sources of functional annotations with a focus on Gene Ontology and mouse phenotypic resources. Throughout this review, we highlight potential sources of significant bias that are of particular concern to the analysis of structural variants, and illustrate the importance of examining the expectations upon which enrichment analysis techniques depend.
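
To make the role of the background set in entry 16 concrete, here is a minimal sketch of one common enrichment test, a one-sided hypergeometric test. It is an assumed, generic formulation rather than a method taken from the review; the function name and gene identifiers are hypothetical, and the test ignores the gene-length biases the review warns about.

```python
from math import comb


def enrichment_p_value(background, annotated, affected):
    """One-sided hypergeometric p-value for over-representation.

    background : set of all genes that could have been affected
    annotated  : set of genes carrying the functional annotation
    affected   : set of genes overlapped by structural variants
    """
    n_background = len(background)
    n_annotated = len(annotated & background)
    n_affected = len(affected & background)
    n_overlap = len(annotated & affected & background)
    # P(X >= n_overlap) when n_affected genes are drawn without replacement.
    upper = min(n_annotated, n_affected)
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_affected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_background, n_affected)


# Toy example with made-up gene identifiers.
background = {f"g{i}" for i in range(20)}
annotated = {"g0", "g1", "g2", "g3", "g4"}
affected = {"g0", "g1", "g2", "g10"}
p = enrichment_p_value(background, annotated, affected)
print(round(p, 4))
```

Changing `background` (for example, restricting it to genes expressed in one tissue) changes every count in the test, which is why the choice of background distribution matters so much for the resulting p-value.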

17.
Localization of human quantitative trait loci (QTLs) is now routine. However, identifying their functional DNA variants is still a formidable challenge. We present a complete dissection of a human QTL using novel statistical techniques to infer the most likely functional polymorphisms of a QTL that influence plasma levels of clotting factor VII (FVII), a risk factor for cardiovascular disease. Resequencing of 15 kb in and around the F7 gene identified 49 polymorphisms, which were then genotyped in 398 people. Using a Bayesian quantitative trait nucleotide (BQTN) method, we identified four to seven functional variants that completely account for this QTL. These variants include both rare coding variants and more common, potentially regulatory polymorphisms in intronic and promoter regions.

18.
Linkage mapping has been extensively applied in the murine and human genomes. It remains a powerful approach to mapping genes and identifying genetic variants. As genome efforts identify large numbers of single-nucleotide polymorphisms, it will be critical to validate these polymorphisms and confirm their gene assignment and chromosomal location. The presence of pseudogenes can confuse such efforts. We have used denaturing HPLC to identify polymorphisms in human genes and to genotype individuals in selected CEPH pedigrees. The same approach has been applied to the mapping of murine genes in interspecies backcross animals. This strategy is rapid, accurate and superior in several respects to other technologies.

19.
Within the past decade our understanding of thromboembolic disorders has become even more sophisticated as recent discoveries have suggested the influence of gene variants on the development of atherosclerotic disease and arterial thrombosis. Candidate genes encode proteins involved in processes relevant to atherosclerosis, ranging from cholesterol metabolism to arterial thrombosis. Platelets are key elements in primary hemostasis, but also in arterial thrombosis. Moreover, a number of genetic polymorphisms of platelet proteins may also induce gain or loss of function, supporting a role in predisposing some individuals to thrombotic events. However, after thousands of studies, much controversy remains over whether individual platelet polymorphisms contribute to an increased likelihood of thromboembolic disorders. Although platelet polymorphisms are a promising addition to more established cardiovascular risk factors, identifying genetic variants as a single cause of cardiovascular disease would be an oversimplification; instead, the contribution of these polymorphisms should also be considered in the context of a multifactorial disease. Gene-gene and gene-environment studies would identify specific combinations associated with a high risk of these diseases. The platelet's genetic heterogeneity should also be considered in every aspect of clinical medicine, ranging from susceptibility to diseases, pathogenesis, and clinical outcome to diversity in responses to drug treatment (pharmacogenomics), and bleeding.

20.
The p53 tumour suppressor protein lies at the crossroads of multiple cellular response pathways that control the fate of the cell in response to endogenous or exogenous stresses, and inactivation of the p53 tumour suppressor signalling pathway is seen in most human cancers. Such aberrant p53 activity may be caused by mutations in the TP53 gene sequence producing truncated or inactive mutant proteins, or by aberrant production of other proteins that regulate p53 activity, such as gene amplification and overexpression of MDM2 or viral proteins that inhibit or degrade p53. Recent studies have also suggested that inherited genetic polymorphisms in the p53 pathway influence tumour formation, progression and/or response to therapy. In some cases, these variants are clearly associated with clinico-pathological variables or prognosis of cancer, whereas in other cases the evidence is less conclusive. Here, we review the evidence that common polymorphisms in various aspects of p53 biology have important consequences for overall tumour susceptibility, clinico-pathology and prognosis. We also suggest reasons for some of the reported discrepancies in the effects of common polymorphisms on tumourigenesis, which relate to the complexity of effects on tumour formation in combination with other oncogenic changes and other polymorphisms. It is likely that future studies of combinations of polymorphisms in the p53 pathway will be useful for predicting tumour susceptibility in the human population and may serve as predictive biomarkers of tumour response to standard therapies.
