Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
2.
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and may consequently influence genome-wide patterns of LD.
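The abstract above compares LD between pairs of derived alleles. As a minimal sketch of what such a pairwise measure looks like, the following computes the standard coefficient D and r² from phased haplotypes. The function name `pairwise_ld` and the 0/1 (ancestral/derived) encoding are assumptions made for illustration; this is not the authors' pipeline and it does not implement their HR(j) statistic, which the abstract does not define.

```python
def pairwise_ld(haplotypes):
    """Return (D, r2) for two biallelic sites.

    haplotypes: list of (a, b) tuples, one per phased haplotype, where
    a and b are 0 (ancestral) or 1 (derived) at the first and second site.
    This encoding is an assumption of the sketch.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n   # derived frequency, site 1
    p_b = sum(b for _, b in haplotypes) / n   # derived frequency, site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # negative D: derived alleles in repulsion
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Two derived alleles that never share a haplotype
d, r2 = pairwise_ld([(1, 0), (0, 1), (1, 0), (0, 1)])
```

A negative D with the derived alleles as the reference state corresponds to the situation the abstract describes, where the two derived alleles tend to sit on different haplotypes.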

3.
4.
To investigate whether common variants in the human genetic background are associated with pathogenesis of ischemic heart diseases, we systematically surveyed 41 possible candidate genes for single-nucleotide polymorphisms (SNPs) by directly sequencing 96 independent alleles at each locus, derived from 48 unrelated Japanese patients with myocardial infarction, including 25.8 kb 5' flanking regions, 56.8 kb exonic and 35.4 kb intronic sequences, and 1.8 kb 3' flanking regions. In this genomic DNA of nearly 120 kb, we identified 187 SNPs: 55 in 5' flanking regions, seven in 5' untranslated regions (UTRs), 52 in coding elements, 64 in introns, eight in 3' UTRs, and one in a 3' flanking region. Among the 52 coding SNPs, 26 were non-synonymous changes. Allelic frequencies of some of the polymorphisms were significantly different from those reported in European populations. For example, the Q506R substitution in the coagulation factor V gene, the so-called "Leiden mutation", has a reported frequency of 2.3% in Europeans, but we detected the Leiden mutation in none of the Japanese genomes that we investigated. The allelic frequencies of the -33A>G SNP in the thrombomodulin gene were also very different; this allele occurred at a 12% frequency in the Japanese patients that we examined, although it had been detected in none of 82 Caucasians reported previously. These data support the hypothesis that some SNPs are specific to particular ethnic groups.
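The totals quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below only re-adds the figures as reported; the dictionary keys are labels chosen here for illustration, and the overall SNP density is a derived quantity that the abstract itself does not state.

```python
# Sequenced lengths in kb, as reported in the abstract
region_kb = {
    "5' flanking": 25.8,
    "exonic": 56.8,
    "intronic": 35.4,
    "3' flanking": 1.8,
}

# SNP counts per category, as reported in the abstract
snp_counts = {
    "5' flanking": 55,
    "5' UTR": 7,
    "coding": 52,
    "intron": 64,
    "3' UTR": 8,
    "3' flanking": 1,
}

total_kb = sum(region_kb.values())      # about 119.8, i.e. "nearly 120 kb"
total_snps = sum(snp_counts.values())   # 187, matching the reported total
snps_per_kb = total_snps / total_kb     # derived here, not reported in the abstract

print(f"{total_kb:.1f} kb screened, {total_snps} SNPs, {snps_per_kb:.2f} SNPs per kb")
```

Both reported totals are consistent with their parts: the four region lengths sum to about 119.8 kb and the six SNP categories sum to 187.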

5.
6.
Hitchhiking under positive Darwinian selection   Cited by: 77 (self-citations: 0, citations by others: 77)
Fay JC, Wu CI. Genetics, 2000, 155(3): 1405-1413
Positive selection can be inferred from its effect on linked neutral variation. In the restrictive case when there is no recombination, all linked variation is removed. If recombination is present but rare, both deterministic and stochastic models of positive selection show that linked variation hitchhikes to either low or high frequencies. While the frequency distribution of variation can be influenced by a number of evolutionary processes, an excess of derived variants at high frequency is a unique pattern produced by hitchhiking (derived refers to the nonancestral state as determined from an outgroup). We adopt a statistic, H, to measure an excess of high compared to intermediate frequency variants. Only a few high-frequency variants are needed to detect hitchhiking since not many are expected under neutrality. This is of particular utility in regions of low recombination where there is not much variation and in regions of normal or high recombination, where the hitchhiking effect can be limited to a small (<1 kb) region. Application of the H test to published surveys of Drosophila variation reveals an excess of high frequency variants that are likely to have been influenced by positive selection.
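The abstract above does not spell out the H statistic. A minimal sketch, assuming the usual definition H = θ_π − θ_H, where for a sample of n chromosomes θ_π = Σ 2·S_i·i·(n−i)/(n(n−1)) and θ_H = Σ 2·S_i·i²/(n(n−1)), with S_i the number of sites whose derived allele appears i times, might look like this. The function name `fay_wu_h` and the input format (one derived-allele count per segregating site) are choices made here for illustration.

```python
def fay_wu_h(derived_counts, n):
    """Compute H = theta_pi - theta_H from derived-allele counts.

    derived_counts: one integer per segregating site, the number of
                    sampled chromosomes carrying the derived allele.
    n:              number of sampled chromosomes.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    denom = n * (n - 1)
    theta_pi = 0.0
    theta_h = 0.0
    for i in derived_counts:
        if not 0 < i < n:
            raise ValueError("derived count must be between 1 and n-1")
        theta_pi += 2 * i * (n - i) / denom   # weights intermediate frequencies
        theta_h += 2 * i * i / denom          # weights high-frequency derived variants
    return theta_pi - theta_h


# Three singletons and two sites with the derived allele in 9 of 10 chromosomes
h = fay_wu_h([1, 1, 1, 9, 9], n=10)
```

In this toy sample the two high-frequency derived sites dominate θ_H, so H comes out negative, which is the direction associated with the excess of high-frequency derived variants the abstract describes. Assessing significance would need a neutral null distribution (for example from coalescent simulations), which this sketch does not attempt.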

7.
Localization of human quantitative trait loci (QTLs) is now routine. However, identifying their functional DNA variants is still a formidable challenge. We present a complete dissection of a human QTL using novel statistical techniques to infer the most likely functional polymorphisms of a QTL that influence plasma levels of clotting factor VII (FVII), a risk factor for cardiovascular disease. Resequencing of 15 kb in and around the F7 gene identified 49 polymorphisms, which were then genotyped in 398 people. Using a Bayesian quantitative trait nucleotide (BQTN) method, we identified four to seven functional variants that completely account for this QTL. These variants include both rare coding variants and more common, potentially regulatory polymorphisms in intronic and promoter regions.

8.
DNA from 130 individuals was studied with up to 18 (primarily cDNA) probes for the frequency of variants in this initial experiment to determine the feasibility of this approach to screening for germinal gene mutations. This approach, a modification of the usual restriction enzyme mapping strategy, focuses on the detection of insertion/deletion/rearrangement (I/D/R) variants, because the DNA is digested with only two restriction enzymes before transfer to membranes and hybridization with an extensive series of unrelated probes. Some 4000 noncontiguous, independent DNA fragments ("loci"), comprising functional loci, pseudogenes or anonymous fragments (a total of approximately 77,400 kb), were screened. 19 different classes and 31 copies of presumably I/D/R variants were detected, while 4 different classes and 24 individuals exhibiting base substitution variants were observed. 18 of the 19 I/D/R classes were rare variants, that is, each was observed at a frequency, within this population, of less than 0.01; 3 of the base substitution classes existed at polymorphic frequencies and only 1 was a rare variant. 10 of the I/D/R classes, occurring in a total of 18 individuals, were detected with probes which are not known to be associated with repetitive elements. This is a variant frequency for I/D/R variants without known repetitive elements of 0.15 classes and 0.23 copies for each 1000 kb screened; this would extrapolate to 1600 such variant sites in the genome of each individual. Within the context of a mutation screening program, the rare variants, either with or without repetitive elements, would have a higher probability of being de novo mutations than would polymorphic variants; this former group would be the focus of family studies to test for the heritability of the allele (fragment pattern). Sufficient DNA probes are available to screen a significant portion of the human genome for genetic variation and de novo mutations of this type.

9.
Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotypic arrays offers a cost-efficient strategy to achieve the sample sizes required for adequate statistical power. To estimate the performance of imputation of rare variants, we imputed 153 individuals, each of whom was genotyped on 3 different genotype arrays including 317k, 610k and 1 million single nucleotide polymorphisms (SNPs), to two different reference panels: HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot), using IMPUTE version 2. We found that more than 94% and 84% of all SNPs yield acceptable accuracy (info > 0.4) in HapMap2 and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) < 5%), the proportion of well-imputed SNPs increased as the MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. The proportion of well-imputed SNPs was 69%, 60% and 49% for SNPs with a MAF from 0.3% to 5% for 1M, 610k and 317k, respectively. None of the very rare variants (MAF < 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with higher density of genome-wide genotyping arrays when the size of the reference panel is small. Variants with lower MAF are more difficult to impute. These findings have important implications in the design and replication of large-scale sequencing studies.

10.
The human SNP database was used to detect selection on 238 hexamers previously identified as exonic splicing enhancers (ESEs). We compared the distribution of the 238 putative ESEs in biallelic and triallelic SNPs within five different functional categories of the SNP database: synonymous, nonsynonymous, introns, UTRs, and nongenic SNPs. Since true ESEs do not function outside of exons, SNPs that disrupt ESE motifs were expected to be more common in nonexonic portions of the genome. Our results supported this expectation: ESEs were least prevalent within synonymous SNPs and most common in nongenic SNPs. There were ∼11% fewer ESEs within synonymous biallelic SNPs than expected under no selective constraint. We also compared the frequency of neutral SNPs, those where neither allele was an ESE, with deleterious SNPs, those where one or more alleles was an ESE, across the five different functional classes of SNPs. In comparison with the other functional classes of SNPs, synonymous SNPs contained an excess of neutral variants (+1.64% and +6.04% for biallelic and triallelic SNPs, respectively) and a dearth of deleterious variants (−13.11% and −52.39% for biallelic and triallelic SNPs, respectively). The observed patterns were consistent with purifying selection on the 238 hexamers to maintain their function as ESEs. However, in contrast to previous work, we did not find evidence for selection to maintain ESE function at nonsynonymous SNPs because selection at the protein level probably obscured any difference at the level of ESE function.

11.
Differences in genomic structure between individuals are ubiquitous features of human genetic variation. Specific copy number variants (CNVs) have been associated with susceptibility to numerous complex psychiatric disorders, including attention-deficit-hyperactivity disorder, autism-spectrum disorders and schizophrenia. These disorders often display co-morbidity with low intelligence. Rare chromosomal deletions and duplications are associated with these disorders, so it has been suggested that these deletions or duplications may be associated with differences in intelligence. Here we investigate associations between large (≥500 kb), rare (<1% population frequency) CNVs and both fluid and crystallized intelligence in community-dwelling older people. We observe no significant associations between intelligence and total CNV load. Examining individual CNV regions previously implicated in neuropsychological disorders, we find suggestive evidence that CNV regions around SHANK3 are associated with fluid intelligence as derived from a battery of cognitive tests. This is the first study to examine the effects of rare CNVs as called by multiple algorithms on cognition in a large non-clinical sample, and finds no effects of such variants on general cognitive ability.

12.
Abstract: Range expansion from Pleistocene refugia and anthropogenic influences contribute to the present distribution pattern of Arabidopsis thaliana. We scored a genome-wide set of CAPSs and found two markers with an east-west geographic distribution across the Eurasian range of the species. Regions around the two SNPs were sequenced in 98 accessions, including newly collected plants from Middle Asia and Western Siberia. These regions correspond to a gene (∼ 1500 bp) and a non-coding region (∼ 500 bp) 300 kbp apart on chromosome 2. Nucleotide diversities, π, of the two sequenced fragments were 0.0032 and 0.0130. The haplotypes of both sequences belonged to one of two groups: a rather uniform "Asian" and a more variable "European" haplotype group, on the basis of non-disjunct clusters of SNPs. Recombination between "Asian" and "European" haplotypes occurs where they meet. Especially in the "European" haplotype, many rare SNP variants representing independent mutations are scattered among the shared haplotype-specific SNPs. This agrees with previous suggestions of two large haplotype groups in A. thaliana and the post-glacial colonization of central Europe from the east and the west. A clear correlation between climatic factors and the haplotype distribution may reflect the dispersal history rather than local climate adaptation. The pattern of SNP variation within the contiguous sequences explains why only a minority of SNPs selected across the genome show evidence of this geographic pattern.

13.
Interpreting the impact of human genome variation on phenotype is challenging. The functional effect of protein-coding variants is often predicted using sequence conservation and population frequency data; however, other factors are likely relevant. We hypothesized that variants in protein post-translational modification (PTM) sites contribute to phenotype variation and disease. We analyzed the fraction of rare variants and the non-synonymous to synonymous variant ratio (Ka/Ks) in 7,500 human genomes and found a significant negative selection signal in PTM regions independent of six factors, including conservation, codon usage, and GC-content, that is widely distributed across tissue-specific genes and function classes. PTM regions are also enriched in known disease mutations, suggesting that PTM variation is more likely deleterious. PTM constraint also affects flanking sequence around modified residues and increases around clustered sites, indicating the presence of functionally important short linear motifs. Using target site motifs of 124 kinases, we predict at least ∼180,000 motif-breaker amino acid residues that disrupt PTM sites when substituted, and highlight kinase motifs that show specific negative selection and enrichment of disease mutations. We provide this dataset with corresponding hypothesized mechanisms as a community resource. As an example of our integrative approach, we propose that PTPN11 variants in Noonan syndrome aberrantly activate the protein by disrupting an uncharacterized cluster of phosphorylation sites. Further, as PTMs are molecular switches that are modulated by drugs, we study mutated binding sites of PTM enzymes in disease genes and define a drug-disease network containing 413 novel predicted disease-gene links.

14.
Alcohol dependence (AD) is a complex psychiatric disorder that affects about 12.5% of US adults. Genetic factors play a major role in the development of AD. We conducted a genomewide association study in 2,875 African-Americans including 1,719 AD cases and 1,156 controls. We used the Illumina Omni 1-Quad microarray, which yielded 769,498 single-nucleotide polymorphisms (SNPs) after quality control. To explore the genetic architecture of AD, we estimated the variance that could be explained by all SNPs and subsets of SNPs using two different approaches to genome partitioning. We found that 23.9% (s.e. 9.3%) of the phenotypic variance could be explained by using all of the common SNPs on the array. We also found a significant linear relationship between the proportion of the top SNPs used and the phenotypic variance explained by them. Based on genome partitioning of common variants, we also observed a significant linear relationship between the variance explained by a chromosome and its length. Chromosome 4, known to contain several AD risk genes, accounted for excess risk in proportion to its length. By functional partitioning, we found that the genetic variants within 20 kb of genes explained 17.5% (s.e. 11.4%) of the phenotypic variance. Our findings are consistent with the generally accepted view that AD is a highly polygenic trait, i.e., the genetic risk in AD appears to be conferred by multiple variants, each of which may have a small or moderate effect.

15.
16.
Sequence diversity in 36 candidate genes for cardiovascular disorders.   Cited by: 22 (self-citations: 0, citations by others: 22)
Two strategies involving whole-genome association studies have been proposed for the identification of genes involved in complex diseases. The first one seeks to characterize all common variants of human genes and to test their association with disease. The second one seeks to develop dense maps of single-nucleotide polymorphisms (SNPs) and to detect susceptibility genes through linkage disequilibrium. We performed a molecular screening of the coding and/or flanking regions of 36 candidate genes for cardiovascular diseases. All polymorphisms identified by this screening were further genotyped in 750 subjects of European descent. In the whole set of genes, the lengths explored spanned 53.8 kb in the 5' regions, 68.4 kb in exonic regions, and 13 kb in the 3' regions. The strength of linkage disequilibrium within candidate regions suggests that genomewide maps of SNPs might be efficient ways to identify new disease-susceptibility genes, provided that the maps are sufficiently dense. However, the relatively large number of polymorphisms within coding and regulatory regions of candidate genes raises the possibility that several of them might be functional and that the pattern of genotype-phenotype association might be more complex than initially envisaged, as actually has been observed in some well-characterized genes. These results argue in favor of both genomewide association studies and detailed studies of the overall sequence variation of candidate genes, as complementary approaches.

17.
Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had 3 different genotype array data including 317k, 610k and 1 million SNPs, to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info > 0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF ≤ 0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
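The per-bin summaries described above (the proportion of variants with info > 0.4 in each MAF bin) can be tabulated with a short script. This is only a sketch: the function name `well_imputed_by_maf_bin`, the `(maf, info)` input tuples, the bin edges, and the toy values are all hypothetical and are not taken from the study or from IMPUTE's output format.

```python
def well_imputed_by_maf_bin(variants, bin_edges, info_threshold=0.4):
    """Fraction of variants with info > info_threshold in each MAF bin.

    variants:  iterable of (maf, info) pairs (hypothetical input format).
    bin_edges: increasing MAF cut points; bin k covers (edges[k], edges[k+1]].
    Returns a dict mapping (low, high) to a fraction, or None for empty bins.
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    totals = {b: 0 for b in bins}
    passed = {b: 0 for b in bins}
    for maf, info in variants:
        for low, high in bins:
            if low < maf <= high:
                totals[(low, high)] += 1
                if info > info_threshold:
                    passed[(low, high)] += 1
                break
    return {b: (passed[b] / totals[b] if totals[b] else None) for b in bins}


# Toy data, not from the study: (MAF, info score)
toy = [(0.001, 0.2), (0.002, 0.5), (0.01, 0.6), (0.04, 0.3), (0.2, 0.9)]
summary = well_imputed_by_maf_bin(toy, bin_edges=[0.0, 0.003, 0.05, 0.5])
```

The edges 0.003 and 0.05 mirror the 0.3% and 5% MAF cut-offs mentioned in the abstract; using right-closed bins places MAF = 0.3% in the "very rare" bin, matching the "MAF ≤ 0.3%" wording.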

18.
Genome-wide association studies (GWAS) have detected many disease associations. However, the reported variants tend to explain small fractions of risk, and there are doubts about issues such as the portability of findings over different ethnic groups or the relative roles of rare versus common variants in the genetic architecture of complex disease. Studying the degree of sharing of disease-associated variants across populations can help in solving these issues. We present a comprehensive survey of GWAS replicability across 28 diseases. Most loci and SNPs discovered in Europeans for these conditions have been extensively replicated using peoples of European and East Asian ancestry, while the replication with individuals of African ancestry is much less common. We found a strong and significant correlation of Odds Ratios across Europeans and East Asians, indicating that underlying causal variants are common and shared between the two ancestries. Moreover, SNPs that failed to replicate in East Asians map into genomic regions where Linkage Disequilibrium patterns differ significantly between populations. Finally, we observed that GWAS with larger sample sizes have detected variants with weaker effects rather than with lower frequencies. Our results indicate that most GWAS results are due to common variants. In addition, the sharing of disease alleles and the high correlation in their effect sizes suggest that most of the underlying causal variants are shared between Europeans and East Asians and that they tend to map close to the associated marker SNPs.

19.
We describe a new computer program, SnpEff, for rapidly categorizing the effects of variants in genome sequences. Once a genome is sequenced, SnpEff annotates variants based on their genomic locations and predicts coding effects. Annotated genomic locations include intronic, untranslated region, upstream, downstream, splice site, or intergenic regions. Coding effects such as synonymous or non-synonymous amino acid replacement, start codon gains or losses, stop codon gains or losses, or frame shifts can be predicted. Here the use of SnpEff is illustrated by annotating ~356,660 candidate SNPs in ~117 Mb unique sequences, representing a substitution rate of ~1/305 nucleotides, between the Drosophila melanogaster w(1118); iso-2; iso-3 strain and the reference y(1); cn(1) bw(1) sp(1) strain. We show that ~15,842 SNPs are synonymous and ~4,467 SNPs are non-synonymous (N/S ~0.28). The remaining SNPs are in other categories, such as stop codon gains (38 SNPs), stop codon losses (8 SNPs), and start codon gains (297 SNPs) in the 5'UTR. We found, as expected, that the SNP frequency is proportional to the recombination frequency (i.e., highest in the middle of chromosome arms). We also found that start-gain or stop-lost SNPs in Drosophila melanogaster often result in additions of N-terminal or C-terminal amino acids that are conserved in other Drosophila species. It appears that the 5' and 3' UTRs are reservoirs for genetic variations that change the termini of proteins during evolution of the Drosophila genus. As genome sequencing is becoming inexpensive and routine, SnpEff enables rapid analyses of whole-genome sequencing data to be performed by an individual laboratory.

20.
Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. It also provides an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing imputation results to those using the HapMap Phase II data that have been widely used. We used three reference panels: the CEU panel consisting of 120 haplotypes from HapMap II and 1KG data (June 2010 release) and the EUR panel consisting of 566 haplotypes also from 1KG data (August 2010 release). We used Illumina 324,607 autosomal SNPs genotyped in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel. There were more than twice as many successfully imputed SNPs as there were using the HapMap II panel (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq <0.3, accuracy for both rare and low frequency SNPs was very high and almost identical to accuracy for common SNPs. We found that imputation using the 1KG-EUR panel had advantages in successfully imputing rare, low frequency and common variants. Our findings suggest that 1KG-based imputation can increase the opportunity to discover significant associations for SNPs across the allele frequency spectrum. Because the 1KG Project is still underway, we expect that later versions will provide even better imputation performance.
