Similar literature
20 similar articles found
1.
Genome-wide association studies (GWAS) have had tremendous success in the identification of common DNA sequence variants associated with complex human diseases and traits. However, because of their design, GWAS are largely inappropriate to characterize the role of rare and low-frequency DNA variants on human phenotypic variation. Rarer genetic variation is geographically more restricted, supporting the need for local whole-genome sequencing (WGS) efforts to study these variants in specific populations. Here, we present the first large-scale low-pass WGS of the French-Canadian population. Specifically, we sequenced at ~5.6× coverage the whole genome of 1970 French Canadians recruited by the Montreal Heart Institute Biobank and identified 29 million bi-allelic variants (31 % novel), including 19 million variants with a minor allele frequency (MAF) <0.5 %. Genotypes from the WGS data are highly concordant with genotypes obtained by exome array on the same individuals (99.8 %), even when restricting this analysis to rare variants (MAF <0.5 %, 99.9 %) or heterozygous sites (98.9 %). To further validate our data set, we showed that we can effectively use it to replicate several genetic associations with myocardial infarction risk and blood lipid levels. Furthermore, we analyze the utility of our WGS data set to generate a French-Canadian-specific imputation reference panel and to infer population structure in the Province of Quebec. Our results illustrate the value of low-pass WGS to study the genetics of human diseases in the founder French-Canadian population.

2.
Resequencing is an emerging tool for identification of rare disease-associated mutations. Rare mutations are difficult to tag with SNP genotyping, as genotyping studies are designed to detect common variants. However, studies have shown that genetic heterogeneity is a probable scenario for common diseases, in which multiple rare mutations together explain a large proportion of the genetic basis for the disease. Thus, we propose a weighted-sum method to jointly analyse a group of mutations in order to test for groupwise association with disease status. For example, such a group of mutations may result from resequencing a gene. We compare the proposed weighted-sum method to alternative methods and show that it is powerful for identifying disease-associated genes, both on simulated and ENCODE data. Using the weighted-sum method, a resequencing study can identify a disease-associated gene with an overall population attributable risk (PAR) of 2%, even when each individual mutation has much lower PAR, using 1,000 to 7,000 affected and unaffected individuals, depending on the underlying genetic model. This study thus demonstrates that resequencing studies can identify important genetic associations, provided that specialised analysis methods, such as the weighted-sum method, are used.
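The abstract above names a weighted-sum method but does not give its formula. The following is a minimal Python sketch of one commonly described form of such a test, offered as an illustration rather than the authors' implementation. The weighting `sqrt(n * q * (1 - q))` with `q` estimated from unaffected individuals, the rank-sum statistic, and the permutation scheme are assumptions not stated in the abstract, and the function name `weighted_sum_test` and its parameters are hypothetical.

```python
import math
import random


def weighted_sum_test(genotypes, affected, n_perm=1000, seed=0):
    """Groupwise association sketch.

    genotypes[j][i]: number of mutant alleles (0, 1 or 2) that individual j
                     carries at variant i.
    affected[j]:     True for affected individuals, False for unaffected.
    Returns (observed rank sum of affected individuals, permutation z-score).
    """
    n_ind = len(genotypes)
    n_var = len(genotypes[0])

    def rank_sum(labels):
        unaffected = [j for j in range(n_ind) if not labels[j]]
        n_u = len(unaffected)
        # Assumed weighting: rarer variants (in unaffected) get smaller
        # weights, so each mutation they contribute counts for more.
        weights = []
        for i in range(n_var):
            m_u = sum(genotypes[j][i] for j in unaffected)
            q = (m_u + 1) / (2 * n_u + 2)
            weights.append(math.sqrt(n_ind * q * (1 - q)))
        scores = [
            sum(g / w for g, w in zip(genotypes[j], weights))
            for j in range(n_ind)
        ]
        # Rank all individuals by score, giving tied scores their mean rank.
        order = sorted(range(n_ind), key=lambda j: scores[j])
        ranks = [0.0] * n_ind
        k = 0
        while k < n_ind:
            end = k
            while end + 1 < n_ind and scores[order[end + 1]] == scores[order[k]]:
                end += 1
            mean_rank = (k + end) / 2 + 1
            for t in range(k, end + 1):
                ranks[order[t]] = mean_rank
            k = end + 1
        return sum(ranks[j] for j in range(n_ind) if labels[j])

    observed = rank_sum(list(affected))

    # Null distribution: recompute the statistic under shuffled labels.
    rng = random.Random(seed)
    labels = list(affected)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(rank_sum(labels))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_perm - 1))
    z = (observed - mean) / sd if sd > 0 else 0.0
    return observed, z
```

Because the weights are recomputed inside every permutation, the null distribution reflects the same weighting procedure as the observed statistic; a large positive z-score suggests an excess of rare mutations among affected individuals.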

3.
Large whole-genome sequencing projects have provided access to much rare variation in human populations, which is highly informative about population structure and recent demography. Here, we show how the age of rare variants can be estimated from patterns of haplotype sharing and how these ages can be related to historical relationships between populations. We investigate the distribution of the age of variants occurring exactly twice (f2 variants) in a worldwide sample sequenced by the 1000 Genomes Project, revealing enormous variation across populations. The median age of haplotypes carrying f2 variants is 50 to 160 generations across populations within Europe or Asia, and 170 to 320 generations within Africa. Haplotypes shared between continents are much older with median ages for haplotypes shared between Europe and Asia ranging from 320 to 670 generations. The distribution of the ages of f2 haplotypes is informative about their demography, revealing recent bottlenecks, ancient splits, and more modern connections between populations. We see the effect of selection in the observation that functional variants are significantly younger than nonfunctional variants of the same frequency. This approach is relatively insensitive to mutation rate and complements other nonparametric methods for demographic inference.

4.
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also extracted low‐coverage genome resequencing of 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Globally, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Besides the characterization of neutral diversity, the detection of the signature of selection and the accurate estimation of linkage disequilibrium (LD) required high‐density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low coverage resequencing but proportions of incorrect genotypes remained substantial, especially for heterozygote sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variations. These results have implications for studies seeking to deploy low‐density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay for a wide variety of purposes.

5.

Background

Estimating the historical and demographic parameters that characterize modern human populations is a fundamental part of reconstructing the recent history of our species. In addition, the development of a model of human evolution that can best explain neutral genetic diversity is required to identify confidently regions of the human genome that have been targeted by natural selection.

Methodology/Principal Findings

We have resequenced 20 independent noncoding autosomal regions dispersed throughout the genome in 213 individuals from different continental populations, corresponding to a total of ∼6 Mb of diploid resequencing data. We used these data to explore and co-estimate an extensive range of historical and demographic parameters with a statistical framework that combines the evaluation of multiple models of human evolution via a best-fit approach, followed by an Approximate Bayesian Computation (ABC) analysis. From a methodological standpoint, evaluating the accuracy of the parameter co-estimation allowed us to identify the most accurate set of statistics to be used for the estimation of each of the different historical and demographic parameters characterizing recent human evolution.

Conclusions/Significance

Our results support a model in which modern humans left Africa through a single major dispersal event occurring ∼60,000 years ago, corresponding to a drastic reduction of ∼5 times the effective population size of the ancestral African population of ∼13,800 individuals. Subsequently, the ancestors of modern Europeans and East Asians diverged much later, ∼22,500 years ago, from the population of ancestral migrants. This late diversification of Eurasians after the African exodus points to the occurrence of a long maturation phase in which the ancestral Eurasian population was not yet diversified.

6.
We investigate the performance of tests of neutrality in admixed populations using plausible demographic models for African-American history as well as resequencing data from African and African-American populations. The analysis of both simulated and human resequencing data suggests that recent admixture does not result in an excess of false-positive results for neutrality tests based on the frequency spectrum after accounting for the population growth in the parental African population. Furthermore, when simulating positive selection, Tajima's D, Fu and Li's D, and haplotype homozygosity have lower power to detect population-specific selection using individuals sampled from the admixed population than from the nonadmixed population. Fay and Wu's H test, however, has more power to detect selection using individuals from the admixed population than from the nonadmixed population, especially when the selective sweep ended long ago. Our results have implications for interpreting recent genome-wide scans for positive selection in human populations.
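Of the frequency-spectrum tests named in the abstract above, Tajima's D has a compact closed form. The sketch below, in Python, uses the standard textbook definition (the difference between mean pairwise diversity and Watterson's estimator, scaled by its estimated standard deviation). It is not taken from the abstract, and the function names `summarize` and `tajimas_d` are hypothetical.

```python
import math


def summarize(haplotypes):
    """Return (S, pi) for a list of equal-length 0/1 haplotypes.

    S  = number of segregating sites.
    pi = mean number of pairwise differences between haplotypes.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    seg_sites = 0
    pi = 0.0
    for site in range(n_sites):
        k = sum(h[site] for h in haplotypes)  # derived-allele count
        if 0 < k < n:
            seg_sites += 1
            # Fraction of haplotype pairs that differ at this site.
            pi += 2 * k * (n - k) / (n * (n - 1))
    return seg_sites, pi


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites, diversity pi."""
    if n < 2 or seg_sites == 0:
        raise ValueError("need at least two sequences and one segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

Negative values indicate an excess of rare variants relative to the neutral, constant-size expectation, which is why population growth in the parental population has to be accounted for before interpreting the test.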

7.
Pseudo-vitamin D-deficiency rickets (PDDR) was mapped close to D12S90 and between proximal D12S312 and distal (D12S305, D12S104) microsatellites that were subsequently found on a single YAC clone. Analysis of a complex haplotype in linkage disequilibrium (LD) with the disease discriminated among distinct founder effects in French Canadian populations in Acadia and in Charlevoix-Saguenay-Lac-Saint-Jean (Ch-SLSJ), as well as an earlier one in precolonial Europe. A simple demographic model suggested the historical age of the founder effect in Ch-SLSJ to be approximately 12 generations. The corresponding LD data are consistent with this figure when they are analyzed within the framework of the Luria-Delbrück model, which takes into account the population growth. Population sampling due to a limited number of first settlers and the rapid demographic expansion appear to have played a major role in the founding of PDDR in Ch-SLSJ and, presumably, other genetic disorders endemic to French Canada. Similarly, the founder effect in Ashkenazim, coinciding with their early settlement in medieval Poland and subsequent expansion eastward, could explain the origin of frequent genetic diseases in this population.

8.
A new genetic estimator of the effective population size (N(e)) is introduced. This likelihood-based (LB) estimator uses two temporally spaced genetic samples of individuals from a population. We compared its performance to that of the classical F-statistic-based N(e) estimator (N(eFk)) by using data from simulated populations with known N(e) and real populations. The new likelihood-based estimator (N(eLB)) showed narrower credible intervals and greater accuracy than N(eFk) when genetic drift was strong, but performed only slightly better when genetic drift was relatively weak. When drift was strong (e.g., N(e) = 20 for five generations), as few as approximately 10 loci (heterozygosity of 0.6; samples of 30 individuals) are sufficient to consistently achieve credible intervals with an upper limit <50 using the LB method. In contrast, approximately 20 loci are required for the same precision when using the classical F-statistic approach. The N(eLB) estimator is much improved over the classical method when there are many rare alleles. It will be especially useful in conservation biology because it less often overestimates N(e) than does N(eFk) and thus is less likely to erroneously suggest that a population is large and has a low extinction risk.
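The abstract above compares the new estimator against a classical F-statistic-based one without writing out the latter. As a rough illustration, the Python sketch below uses a commonly cited moment form: a standardized variance in allele-frequency change between the two samples, corrected for sampling noise and converted to N(e). The exact formulas, the function names `standardized_variance` and `temporal_ne`, and the parameter names are assumptions, not details given in the abstract, and the likelihood-based estimator itself is not shown.

```python
def standardized_variance(freqs_first, freqs_second):
    """Standardized variance of allele-frequency change at one locus.

    freqs_first, freqs_second: allele frequencies in the earlier and later
    sample, listed in the same allele order (K alleles, K >= 2).
    """
    n_alleles = len(freqs_first)
    total = 0.0
    for x, y in zip(freqs_first, freqs_second):
        mean = (x + y) / 2
        if mean > 0:  # skip alleles absent from both samples
            total += (x - y) ** 2 / mean
    return total / (n_alleles - 1)


def temporal_ne(f_stat, generations, n_first, n_second):
    """Moment estimate of N(e) from two samples taken `generations` apart.

    n_first and n_second are the numbers of diploid individuals sampled.
    Returns infinity when sampling noise alone explains the observed change.
    """
    drift = f_stat - 1 / (2 * n_first) - 1 / (2 * n_second)
    if drift <= 0:
        return float("inf")
    return generations / (2 * drift)
```

For several loci, the per-locus values are usually averaged (often weighted by the number of alleles) before conversion. The infinite estimates returned when the drift signal is weak are one way a moment estimator can overestimate N(e), which is the failure mode the abstract highlights.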

9.
Genome-wide association studies (GWAS) have in recent years discovered thousands of associated markers for hundreds of phenotypes. However, associated loci often only explain a relatively small fraction of heritability and the link between association and causality has yet to be uncovered for most loci. Rare causal variants have been suggested as one scenario that may partially explain these shortcomings. Specifically, Dickson et al. recently reported simulations of rare causal variants that lead to association signals of common, tag single nucleotide polymorphisms, dubbed "synthetic associations". However, an open question is what practical implications synthetic associations have for GWAS. Here, we explore the signatures exhibited by such "synthetic associations" and their implications based on patterns of genetic variation observed in human populations, thus accounting for human evolutionary history, a force disregarded in previous simulation studies. This is made possible by human population genetic data from HapMap 3 consisting of both resequencing and array-based genotyping data for the same set of individuals from multiple populations. We report that synthetic associations tend to be further away from the underlying risk alleles compared to "natural associations" (i.e. associations due to underlying common causal variants), but to a much lesser extent than previously predicted, with both the age and the effect size of the risk allele playing a part in this phenomenon. We find that while a synthetic association has a lower probability of capturing causal variants within its linkage disequilibrium block, sequencing around the associated variant need not extend substantially to have a high probability of capturing at least one causal variant. We also show that the minor allele frequency of synthetic associations is lower than that of natural associations for most, but not all, loci that we explored. Finally, we find the variance in associated allele frequency to be a potential indicator of synthetic associations.

10.
Many populations introduced into a novel environment fail to establish. One underlying process is the Allee effect, i.e., the difficulty of individuals to survive and reproduce when rare, and the consequently low or negative population growth. Although observations showing a positive relation between initial population size and establishment probability suggest that the Allee effect could be widespread in biological invasions, experimental tests are scarce. Here, we used a biological control program against Diuraphis noxia (Mordvilko) (Hemiptera: Aphididae) in the United States to manipulate initial population size of the introduced parasitoid Aphelinus asychis Walker (Hymenoptera: Aphelinidae) originating from France. For eight populations and three generations after introduction, we studied spatial distribution and spread, density, mate-finding, and population growth. Dispersal was lower in small populations during the first generation. Smaller initial population size nonetheless resulted in lower density during the three generations studied. The proportion of mated females and the population sex ratio were not affected by initial population size or population density. Net reproductive rate decreased with density within each generation, suggesting negative density-dependence. But for a given density, net reproductive rate was smaller in populations initiated with few individuals than in populations initiated with many individuals. Hence, our results demonstrate a demographic Allee effect. Mate-finding is excluded as an underlying mechanism, and other component Allee effects may have been overwhelmed by negative density-dependence in reproduction. Impact of generalist predators could provide one potential explanation for the relationship between initial population size and net reproductive rate. However, the continuing effect of initial population size on population growth suggests genetic processes may have been involved in the observed demographic Allee effect.  

11.
Elucidating the pattern of genetic diversity for non-European populations is necessary to make the benefits of human genetics research available to individuals from these groups. In the era of large human genomic initiatives, Native American populations have been neglected, in particular, the Quechua, the largest South Amerindian group settled along the Andes. We characterized the genetic diversity of a Quechua population in a global setting, using autosomal noncoding sequences (nine unlinked loci for a total of 16 kb), 351 unlinked SNPs and 678 microsatellites and tested predictions of the model of the evolution of Native Americans proposed by Tarazona-Santos et al. (Am J Hum Genet 68 (2001) 1485-1496). European admixture is <5% and African ancestry is barely detectable in the studied population. The largest genetic distances were between African versus Quechua or Melanesian populations, which is concordant with the African origin of modern humans and the fact that South America was the last part of the world to be peopled. The diversity in the Quechua population is comparable with that of Eurasian populations, and the allele frequency spectrum based on resequencing data does not reflect a reduction in the proportion of rare alleles. Thus, the Quechua population is a large reservoir of common and rare genetic variants of South Amerindians. These results are consistent with and complement our evolutionary model of South Amerindians (Tarazona-Santos et al.: Am J Hum Genet 68 (2001) 1485-1496), proposed based on Y-chromosome data, which predicts high genomic diversity due to the high level of gene flow between Andean populations and their long-term effective population size.

12.
Population persistence has been studied in a conservation context to predict the fate of small or declining populations. Persistence models have explored effects on extinction of random demographic and environmental fluctuations, but in the face of directional environmental change they should also integrate factors affecting whether a population can adapt. Here, we examine the population‐size dependence of demographic and genetic factors and their likely contributions to extinction time under scenarios of environmental change. Parameter estimates were derived from experimental populations of the rainforest species, Drosophila birchii, held in the lab for 10 generations at census sizes of 20, 100 and 1000, and later exposed to five generations of heat‐knockdown selection. Under a model of directional change in the thermal environment, rapid extinction of populations of size 20 was caused by a combination of low growth rate (r) and high stochasticity in r. Populations of 100 had significantly higher reproductive output, lower stochasticity in r and more additive genetic variance (VA) than populations of 20, but they were predicted to persist less well than the largest size class. Even populations of 1000 persisted only a few hundred generations under realistic estimates of environmental change because of low VA for heat‐knockdown resistance. The experimental results document population‐size dependence of demographic and adaptability factors. The simulations illustrate a threshold influence of demographic factors on population persistence, while genetic variance has a more elastic impact on persistence under environmental change.

13.
Rare genetic variants, identified by in-detail resequencing of loci, may contribute to complex traits. We used the apolipoprotein A-I gene (APOA1), a major high-density lipoprotein (HDL) gene, and population-based resequencing to determine the spectrum of genetic variants, the phenotypic characteristics of these variants, and how these results compared with results based on resequencing only the extremes of the apolipoprotein A-I (apoA-I) distribution. First, we resequenced APOA1 in 10,330 population-based participants in the Copenhagen City Heart Study. The spectrum and distribution of genetic variants was determined as a function of the number of individuals resequenced. Second, apoA-I and HDL cholesterol phenotypes were determined for nonsynonymous (NS) and synonymous (S) variants and were validated in the Copenhagen General Population Study (n = 45,239). Third, observed phenotypes were compared with those predicted using an extreme phenotype approach based on the apoA-I distribution. Our results are as follows: First, population-based resequencing of APOA1 identified 40 variants of which only 7 (18%) had minor allele frequencies >1%, and most were exceedingly rare. Second, 0.27% of individuals in the general population were heterozygous for NS variants which were associated with substantial reductions in apoA-I (up to 39 mg/dL) and/or HDL cholesterol (up to 0.9 mmol/L) and, surprisingly, 0.41% were heterozygous for variants predisposing to amyloidosis. NS variants were associated with a hazard ratio of 1.72 (1.09–2.70) for myocardial infarction (MI), largely driven by A164S, a variant not associated with apoA-I or HDL cholesterol levels. Third, using the extreme apoA-I phenotype approach, NS variants correctly predicted the apoA-I phenotype observed in the population-based resequencing. However, using the extreme approach, between 79% (screening 0–1st percentile) and 21% (screening 0–20th percentile) of all variants were not identified; among these were variants previously associated with amyloidosis. Population-based resequencing of APOA1 identified a majority of rare NS variants associated with reduced apoA-I and HDL cholesterol levels and/or predisposing to amyloidosis. In addition, NS variants were associated with increased risk of MI.

14.
The Amur tiger, Panthera tigris altaica, is a highly endangered felid whose range and population size have been severely reduced in recent times. At present, the wild population is estimated at 490 individuals, having rebounded from the 20–30 tigers remaining following a severe bottleneck in the 1940s. The current study presents preliminary data on the patterns and levels of genetic variation in the mitochondrial DNA control region (CR) using DNA extracted from non-invasively sampled faecal material, collected throughout the entire range of P. t. altaica in the Russian Far East. Analysis of 82 scat samples representing at least 27 individuals revealed extremely low levels of CR haplotype diversity, characterized by a single widespread haplotype (96.4%) and two rare variants, each differing by a single step within the hypervariable I (2.4%) and central conserved regions (1.2%), respectively. A comparison with previous data on cytochrome b variation in 14 captive individuals revealed a potentially greater amount of genetic variation represented in captivity relative to that found in the wild population. The extremely low levels of mitochondrial DNA variation in the wild population are discussed in light of the demographic processes that might have shaped these patterns as well as the potential bias introduced through analysis of faecal samples. These results highlight the continuing need to assess levels of genetic variation even in recovering populations that are increasing in number and underscore the important role that captive breeding programs may play in preserving remnant genetic diversity of endangered species.

15.
Demographic history plays a major role in shaping the distribution of genomic variation. Yet the interaction between different demographic forces and their effects in the genomes is not fully resolved in human populations. Here, we focus on the Roma population, the largest transnational ethnic minority in Europe. They have a South Asian origin and their demographic history is characterized by recent dispersals, multiple founder events, and extensive gene flow from non-Roma groups. Through the analyses of new high-coverage whole exome sequences and genome-wide array data for 89 Iberian Roma individuals together with forward simulations, we show that founder effects have reduced their genetic diversity and proportion of rare variants, gene flow has counteracted the increase in mutational load, runs of homozygosity show ancestry-specific patterns of accumulation of deleterious homozygotes, and selection signals primarily derive from preadmixture adaptation in the Roma population sources. The present study shows how two demographic forces, bottlenecks and admixture, act in opposite directions and have long-term balancing effects on the Roma genomes. Understanding how demography and gene flow shape the genome of an admixed population provides an opportunity to elucidate how genomic variation is modeled in human populations.

16.
Stranger BE, Stahl EA, Raj T. Genetics, 2011, 187(2): 367-383
Enormous progress in mapping complex traits in humans has been made in the last 5 yr. There has been early success for prevalent diseases with complex phenotypes. These studies have demonstrated clearly that, while complex traits differ in their underlying genetic architectures, for many common disorders the predominant pattern is that of many loci, individually with small effects on phenotype. For some traits, loci of large effect have been identified. For almost all complex traits studied in humans, the sum of the identified genetic effects comprises only a portion, generally less than half, of the estimated trait heritability. A variety of hypotheses have been proposed to explain why this might be the case, including untested rare variants, and gene-gene and gene-environment interaction. Effort is currently being directed toward implementation of novel analytic approaches and testing rare variants for association with complex traits using imputed variants from the publicly available 1000 Genomes Project resequencing data and from direct resequencing of clinical samples. Through integration with annotations and functional genomic data as well as by in vitro and in vivo experimentation, mapping studies continue to characterize functional variants associated with complex traits and address fundamental issues such as epistasis and pleiotropy. This review focuses primarily on the ways in which genome-wide association studies (GWASs) have revolutionized the field of human quantitative genetics.

17.
18.
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 Genomes Project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.

19.
Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ~2500 individuals by using Illumina SNP data, with an emphasis on “hotspots” prone to recurrent mutations. We find variants larger than 500 kb in 5%–10% of individuals and variants greater than 1 Mb in 1%–2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%–1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

20.
Deep resequencing of functional regions in human genomes is key to identifying potentially causal rare variants for complex disorders. Here, we present the results from a large-sample resequencing (n = 285 patients) study of candidate genes coupled with population genetics and statistical methods to identify rare variants associated with Autism Spectrum Disorder and Schizophrenia. Three genes, MAP1A, GRIN2B, and CACNA1F, were consistently identified by different methods as having significant excess of rare missense mutations in either one or both disease cohorts. In a broader context, we also found that the overall site frequency spectrum of variation in these cases is best explained by population models of both selection and complex demography rather than neutral models or models accounting for complex demography alone. Mutations in the three disease-associated genes explained much of the difference in the overall site frequency spectrum among the cases versus controls. This study demonstrates that genes associated with complex disorders can be mapped using resequencing and analytical methods with sample sizes far smaller than those required by genome-wide association studies. Additionally, our findings support the hypothesis that rare mutations account for a proportion of the phenotypic variance of these complex disorders.
