Similar articles
Found 20 similar articles (search time: 31 ms)
1.
To control for hidden population stratification in genetic-association studies, statistical methods that use marker genotype data to infer population structure have been proposed as a possible alternative to family-based designs. In principle, it is possible to infer population structure from associations between marker loci and from associations of markers with the trait, even when no information about the demographic background of the population is available. In a model in which the total population is formed by admixture between two or more subpopulations, confounding can be estimated and controlled. Current implementations of this approach have limitations, the most serious of which is that they do not allow for uncertainty in estimations of individual admixture proportions or for lack of identifiability of subpopulations in the model. We describe methods that overcome these limitations by a combination of Bayesian and classical approaches, and we demonstrate the methods by using data from three admixed populations (African American, African Caribbean, and Hispanic American) in which there is extreme confounding of trait-genotype associations because the trait under study (skin pigmentation) varies with admixture proportions. In these data sets, as many as one-third of marker loci show crude associations with the trait. Control for confounding by population stratification eliminates these associations, except at loci that are linked to candidate genes for the trait. With only 32 markers informative for ancestry, the efficiency of the analysis is 70%. These methods can deal with both confounding and selection bias in genetic-association studies, making family-based designs unnecessary.

2.
Law B, Buckleton JS, Triggs CM, Weir BS. Genetics 2003, 164(1): 381-387
The probability of multilocus genotype counts conditional on allelic counts and on allelic independence provides a test statistic for independence within and between loci. As the number of loci increases and each sampled genotype becomes unique, the conditional probability becomes a function of total heterozygosity. In that case, it does not address between-locus dependence directly but only indirectly through detection of the Wahlund effect. Moreover, the test will reject the hypothesis of allelic independence only for small values of heterozygosity. Low heterozygosity is expected for population subdivision but not for population admixture. The test may therefore be inappropriate for admixed populations. If individuals with parents in two different populations are always considered to belong to one of the populations, then heterozygosity is increased in that population and the exact test should not be used for sparse data sets from that population. If such a case is suspected, then alternative testing strategies are suggested.
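
The Wahlund effect mentioned in this abstract can be illustrated with a small numeric sketch. It is not taken from the paper: it assumes a single biallelic locus, Hardy-Weinberg proportions within each subpopulation, and hypothetical allele frequencies of 0.2 and 0.8. The function name is illustrative only.

```python
def wahlund_heterozygosity(subpop_freqs, weights=None):
    """Return (observed, expected) heterozygosity for a pooled sample.

    observed: average heterozygosity over subpopulations, each assumed
              to be in Hardy-Weinberg proportions on its own.
    expected: heterozygosity predicted if the pooled sample were a single
              randomly mating population.
    """
    k = len(subpop_freqs)
    if weights is None:
        weights = [1.0 / k] * k  # equal-sized subpopulations by default
    pooled_p = sum(w * p for w, p in zip(weights, subpop_freqs))
    observed = sum(w * 2.0 * p * (1.0 - p) for w, p in zip(weights, subpop_freqs))
    expected = 2.0 * pooled_p * (1.0 - pooled_p)
    return observed, expected


# Hypothetical allele frequencies in two equally sized subpopulations.
observed, expected = wahlund_heterozygosity([0.2, 0.8])
# The deficit equals twice the variance of allele frequency among subpopulations.
deficit = expected - observed
```

Under these assumptions the pooled sample shows fewer heterozygotes than a single randomly mating population would, which is the low-heterozygosity signal the abstract associates with subdivision rather than admixture.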

3.
Summary: Population admixture can be a confounding factor in genetic association studies. Family-based methods (Rabinowitz and Laird, 2000, Human Heredity 50, 211–223) have been proposed in both testing and estimation settings to adjust for this confounding, especially in case-only association studies. The family-based methods rely on conditioning on the observed parental genotypes or on the minimal sufficient statistic for the genetic model under the null hypothesis. In some cases, these methods do not capture all the available information because the conditioning strategy is too stringent. General efficient methods to adjust for population admixture that use all the available information have been proposed (Rabinowitz, 2002, Journal of the American Statistical Association 92, 742–758). However, these approaches may not be easy to implement in some situations. A previously developed easy-to-compute approach adjusts for admixture by adding supplemental covariates to linear models (Yang et al., 2000, Human Heredity 50, 227–233). Here it is shown that this strategy of augmenting linear models with appropriate covariates can be combined with the general efficient methods in Rabinowitz (2002) to provide computationally tractable and locally efficient adjustment. After deriving the optimal covariates, the adjusted analysis can be carried out using standard statistical software packages such as SAS or R. The proposed methods are locally efficient in a neighborhood of the true model. Simulation studies show that nontrivial efficiency gains can be obtained by using information not accessible to the methods that rely on conditioning on the minimal sufficient statistics. The approaches are illustrated through an analysis of the influence of apolipoprotein E (APOE) genotype on plasma low-density lipoprotein (LDL) concentration in children.

4.
African-American populations are genetically admixed. Studies performed among unrelated individuals from ethnically admixed populations may be vulnerable to confounding by population stratification, but they also offer an opportunity for efficiently mapping complex traits through admixture linkage disequilibrium. By typing 42 ancestry-informative markers and estimating genetic ancestry, we assessed genetic admixture and heterogeneity among African-American participants in the Coronary Artery Risk Development in Young Adults (CARDIA) cohort. We also assessed associations between individual genetic ancestry and several quantitative and binary traits related to cardiovascular risk. We found evidence of population sub-structure and excess inter-marker linkage disequilibrium, consistent with recent admixture. The estimated group admixture proportions were 78.1% African and 22.9% European, but differed according to geographic region. In multiple regression models, African ancestry was significantly associated with decreased total cholesterol, decreased LDL-cholesterol, and decreased triglycerides, and also with increased risk of insulin resistance. These observed associations between African ancestry and several lipid traits are consistent with the general tendency of individuals of African descent to have healthier lipid profiles compared to European-Americans. There was no association between genetic ancestry and hypertension, BMI, waist circumference, CRP level, or coronary artery calcification. These results demonstrate the potential for confounding of genetic associations with some cardiovascular disease-related traits in large studies involving US African-Americans.

5.
An approach is presented for testing whether the effect of a candidate gene on a quantitative trait is dominant and for testing whether the effect is recessive. The approach uses parental genotype information in nuclear families to adjust for bias due to population admixture. The approach is applicable regardless of the nature of the sampling. The results of an application of the methods to a candidate mutation for diabetic nephropathy are used for illustration.

6.
The nature of gene flow in parasites with complex life cycles is poorly understood, particularly when intermediate and definitive hosts have contrasting movement potential. We examined whether the fine-scale population genetic structure of the diphyllobothriidean cestode Schistocephalus solidus reflects the habits of intermediate threespine stickleback hosts or those of its definitive hosts, semi-aquatic piscivorous birds, to better understand complex host-parasite interactions. Seventeen lakes in the Cook Inlet region of south-central Alaska were sampled, including ten in the Matanuska-Susitna Valley, five on the Kenai Peninsula, and two in the Bristol Bay drainage. We analyzed sequence variation across a 759 bp region of the mitochondrial DNA (mtDNA) cytochrome oxidase I region for 1,026 S. solidus individuals sampled from 2009-2012. We also analyzed allelic variation at 8 microsatellite loci for 1,243 individuals. Analysis of mtDNA haplotype and microsatellite genotype variation recovered evidence of significant population genetic structure within S. solidus. Host, location, and year were factors in structuring observed genetic variation. Pairwise measures revealed significant differentiation among lakes, including a pattern of isolation-by-distance. Bayesian analysis identified three distinct genotypic clusters in the study region, little admixture within hosts and lakes, and a shift in genotype frequencies over time. Evidence of fine-scale population structure in S. solidus indicates that movement of its vagile, definitive avian hosts has less influence on gene flow than expected based solely on movement potential. Observed patterns of genetic variation may reflect genetic drift, behaviors of definitive hosts that constrain dispersal, life history of intermediate hosts, and adaptive specificity of S. solidus to intermediate host genotype.

7.
Elucidating genetic influences on bison growth and body composition is of interest, not only because bison are important for historical, cultural, and agricultural reasons, but also because their unusual population history makes them valuable models for finding influential loci in both domestic cattle and humans. We tested for trait loci associated with body weight, height, and bison mass index (BMI) while controlling for estimated ancestry to reduce potential confounding effects due to population admixture in 1316 bison sampled from four U.S. herds. We used 60 microsatellite markers to model each phenotype as a function of herd, sex, age, marker genotypes, and individual ancestry estimates. Statistical significance for genotype and its interaction with ancestry was evaluated using the adaptive false discovery rate. Of the four herds, two appeared to be admixed and two were nonadmixed. Although none of the main effects of the loci were significant, estimated ancestry and its interaction with marker loci were significantly associated with the phenotypes, illustrating the importance of including ancestry in the models and the dependence of genotype-phenotype associations on background ancestry. Individual loci contributed approximately 2.0% of variation in weight, height, and BMI, which confirms the utility and potential importance of adjusting for population stratification.
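
The idea of including an ancestry estimate as a covariate can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' model: the data are simulated, the trait depends on ancestry only (the marker has no true effect), and the function names, sample size, and effect sizes are hypothetical.

```python
import numpy as np


def simulate(n=5000, seed=1):
    """Simulate an admixed sample in which the marker has no effect on the trait."""
    rng = np.random.default_rng(seed)
    ancestry = rng.uniform(0.0, 1.0, n)             # individual admixture proportion
    allele_freq = 0.2 + 0.6 * ancestry              # allele frequency varies with ancestry
    genotype = rng.binomial(2, allele_freq)         # 0, 1 or 2 copies of the allele
    trait = 2.0 * ancestry + rng.normal(0.0, 0.5, n)  # trait depends on ancestry only
    return ancestry, genotype, trait


def genotype_slope(trait, genotype, covariates=()):
    """Least-squares slope of the trait on genotype, optionally adjusting for covariates."""
    columns = [np.ones(len(trait)), np.asarray(genotype, dtype=float), *covariates]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[1]


ancestry, genotype, trait = simulate()
crude = genotype_slope(trait, genotype)                            # confounded by ancestry
adjusted = genotype_slope(trait, genotype, covariates=[ancestry])  # ancestry in the model
```

In this simulated setting the crude slope is clearly positive although the marker is not causal, while the adjusted slope is close to zero. The study itself also models herd, sex, age, and genotype-by-ancestry interactions, which this sketch omits.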

8.
Falush D, Stephens M, Pritchard JK. Genetics 2003, 164(4): 1567-1587
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

9.
The effect of genetic drift on the genetic structure of seven Irish populations was investigated using anthropometric data collected during the 1890s on 259 adult males. These populations ranged in size from 769 to 3757, were relatively stable over time, and were located within 119 km of one another. Two populations are known to have experienced considerable English admixture. Data on ten anthropometric variables (three body measures and seven craniofacial measures) were adjusted for age and used to compute a relationship (R) matrix. The R matrix was converted into a distance measure and compared with a potential genetic drift distance measure, defined as (1/Ni + 1/Nj), where Ni and Nj are the effective population sizes of groups i and j (derivation of this formula is presented). Distances were rank-transformed, and the correlation between their pairwise elements was computed using matrix permutation methods to assess significance. Under the hypothesis that drift affects anthropometric variation, these correlations are expected to be positive. The correlation between anthropometric distance and potential genetic drift distance is 0.123, which is not significantly different from 0 (p = 0.368). When a multiple regression model is used to adjust for geographic distance and English admixture, the partial correlation (0.369) is significant (p = 0.021). As part of further analysis of the genetic structure of these populations, the same analyses were repeated using a distance matrix derived from surname frequencies. The correlation of surname distance and potential genetic drift distance is 0.164, which is not significant (p = 0.264). When the multiple regression model is applied, the correlation is 0.401, which is borderline significant (p = 0.055). These results show the influence of genetic drift, local migration, and admixture on Irish population structure.
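
The drift distance 1/Ni + 1/Nj and the rank-based matrix permutation comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the population sizes are hypothetical (chosen within the stated range of 769 to 3757), ties in ranks are broken by order of appearance, the p-value is one-sided, and the partial (multiple regression) version is not covered.

```python
import numpy as np


def drift_distance(sizes):
    """Pairwise potential drift distance 1/N_i + 1/N_j, with a zero diagonal."""
    inv = 1.0 / np.asarray(sizes, dtype=float)
    d = inv[:, None] + inv[None, :]
    np.fill_diagonal(d, 0.0)
    return d


def rank_transform(values):
    """Ranks 1..m; ties are broken by order of appearance (a simplification)."""
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="stable")] = np.arange(1, len(values) + 1)
    return ranks


def mantel_rank_test(a, b, n_perm=999, seed=0):
    """Rank correlation between two distance matrices with a permutation p-value."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    iu = np.triu_indices_from(a, k=1)        # each pair of populations counted once
    ranks_a = rank_transform(a[iu])
    observed = np.corrcoef(ranks_a, rank_transform(b[iu]))[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(a.shape[0])
        b_perm = b[np.ix_(perm, perm)]       # relabel populations: rows and columns together
        r = np.corrcoef(ranks_a, rank_transform(b_perm[iu]))[0, 1]
        hits += r >= observed
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical effective sizes for seven populations.
sizes = [769, 1200, 1500, 2100, 2600, 3100, 3757]
drift = drift_distance(sizes)
r_self, p_self = mantel_rank_test(drift, drift)  # sanity check: a matrix against itself
```

Permuting rows and columns together keeps each permuted matrix a valid distance matrix, which is why ordinary significance tests on the pairwise elements are replaced by a permutation test.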

10.
We examine the generation of cytonuclear disequilibria by admixture and continued gene flow. General formulas analogous to the nuclear case are first derived showing that the allelic and genotypic disequilibria from admixture or population subdivision equal their expected value across the contributing (sub)populations plus the covariance across these sources between the cytoplasmic gene frequency and the relevant nuclear frequency. A detailed study is then presented of the cytonuclear dynamics in a random-mating population under two different migration scenarios. In both cases closed-form solutions are given for all variables as a function of the initial conditions and relevant migration parameters. The dynamics of the gene frequencies and allelic disequilibria, which dominate each system, are the same as those involving two unlinked nuclear loci, while the dynamics of the genotypic disequilibria and cytonuclear frequencies have no nuclear counterpart. The continent-island formulation focuses on a population receiving continued immigration from a large source of constant composition. A major discovery is that cytonuclear disequilibria can transiently build up on the "island" to levels far exceeding those found at equilibrium. In contrast, the admixture formulation focuses on the dynamics within two populations undergoing continued intermigration. Although in this case all cytonuclear associations must ultimately decay to zero, long-term transient disequilibria can develop which are many times their initial admixture values. For both migration scenarios it is shown that the time of population censusing relative to migration and reproduction dramatically affects both the amount and pattern of the nonrandom associations produced. The empirical relevance of these models is discussed in light of nuclear-mitochondrial data from a hybrid zone between European and North American eels and from a zone of racial admixture in humans.

11.
Gao H, Williamson S, Bustamante CD. Genetics 2007, 176(3): 1635-1651
Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy-Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s approximately 0.48-0.70).
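
Replacing Hardy-Weinberg proportions with genotype frequencies that depend on a selfing rate can be sketched for a single biallelic locus. The parameterization below is the textbook one and may differ from the paper's: it assumes the equilibrium inbreeding coefficient F = s / (2 - s) for selfing rate s, and the function names are hypothetical.

```python
def equilibrium_inbreeding(selfing_rate):
    """Equilibrium inbreeding coefficient F = s / (2 - s) under partial selfing."""
    return selfing_rate / (2.0 - selfing_rate)


def genotype_frequencies(p, selfing_rate=0.0):
    """Expected genotype frequencies at a biallelic locus with allele frequency p."""
    q = 1.0 - p
    f = equilibrium_inbreeding(selfing_rate)
    return {
        "AA": p * p + f * p * q,        # excess homozygotes relative to Hardy-Weinberg
        "Aa": 2.0 * p * q * (1.0 - f),  # heterozygote deficit grows with selfing
        "aa": q * q + f * p * q,
    }


outcrossing = genotype_frequencies(0.3, selfing_rate=0.0)  # reduces to Hardy-Weinberg
partial = genotype_frequencies(0.3, selfing_rate=0.5)      # F = 1/3
```

A clustering model that assumes Hardy-Weinberg proportions would have to explain the heterozygote deficit in the second case in some other way, which is consistent with the spurious substructure the abstract reports under selfing.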

12.
Wakefield J. Biometrics 2003, 59(1): 9-17
In many ecological regression studies investigating associations between environmental exposures and health outcomes, the observed relative risks are in the range 1.0-2.0. The interpretation of such small relative risks is difficult due to a variety of biases, some of which are unique to ecological data, since they arise from within-area variability in exposures/confounders. The potential for residual spatial dependence, due to unmeasured confounders and/or data anomalies with spatial structure, must also be considered, though it often will be of secondary importance when compared to the likely effects of unmeasured confounding and within-area variability in exposures/confounders. Methods for addressing sensitivity to these issues are described, along with an approach for assessing the implications of spatial dependence. An ecological study of the association between myocardial infarction and magnesium is critically reevaluated to determine potential sources of bias. It is argued that the sophistication of the statistical analysis should not outweigh the quality of the data, and that finessing models for spatial dependence will often not be merited in the context of ecological regression.

13.
Stepwise regression is often used to draw associations between phenological records and weather data. For example, the dates that a species first flowers each year might be regressed on monthly mean temperatures for a period preceding flowering. The months that 'best' explain the variation in first flowering dates would be selected by stepwise regression. However, daily records of weather are usually available. Stepwise regression on daily temperatures would not be appropriate because of high correlations between neighbouring days. Smoothing methods provide a way of avoiding such difficulties. Regression coefficients can be smoothed by penalising differences in slopes between neighbouring regressors. The resultant curve of regression gradients is intuitively attractive. Various possible approaches to smoothing regression coefficients are discussed. We illustrate the use of one method, P-spline signal regression, which is particularly appropriate when there are many more regressors than observations. Smoothing can be applied to more than one set of regressors. This results in a multi-dimensional surface of regression coefficients. We use this approach to investigate how the time of year that a plant species tends to flower affects its relationship with temperature records. Using this method, we found that later species tend to be affected by later temperatures.
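
Penalising differences between neighbouring regression coefficients can be sketched with a difference-penalised least-squares fit. This is a simplified stand-in for P-spline signal regression, which applies the penalty to B-spline basis coefficients; here the penalty acts directly on the daily coefficients. The simulated data, dimensions, and function names are assumptions made for illustration.

```python
import numpy as np


def smoothed_coefficients(X, y, lam, order=2):
    """Minimise ||y - X b||^2 + lam * ||D b||^2, where D takes order-th differences of b."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    D = np.diff(np.eye(p), n=order, axis=0)  # (p - order) x p difference matrix
    return np.linalg.solve(X.T @ X + lam * (D.T @ D), X.T @ y)


def roughness(b, order=2):
    """Sum of squared order-th differences of the coefficient curve."""
    return float(np.sum(np.diff(b, n=order) ** 2))


# Simulated example: 30 years of records, 60 daily regressors (more regressors than observations).
rng = np.random.default_rng(0)
n_years, n_days = 30, 60
X = rng.normal(size=(n_years, n_days))  # stand-in for daily temperatures
true_b = np.exp(-0.5 * ((np.arange(n_days) - 40) / 6.0) ** 2)  # smooth bump of sensitivity
y = X @ true_b + rng.normal(scale=0.1, size=n_years)

b_light = smoothed_coefficients(X, y, lam=1.0)    # weak smoothing
b_heavy = smoothed_coefficients(X, y, lam=100.0)  # strong smoothing
```

The penalty makes the problem solvable even though there are more regressors than observations, and a larger penalty weight yields a smoother coefficient curve. In practice the weight would be chosen by a criterion such as cross-validation, which this sketch leaves out.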

14.
The quantitative genetic variance-covariance that can be maintained in a random environment is studied, assuming overlapping generations and Gaussian stabilizing selection with a fluctuating optimum. The phenotype of an individual is assumed to be determined by additive contributions from each locus on paternal and maternal gametes (i.e., no epistasis and no dominance). Recurrent mutation is ignored, but linkage between loci is arbitrary. The genotype distribution in the evolutionarily stable population is generically discrete: only a finite number of polymorphic alleles with distinctly different effects are maintained, even though we allow a continuum of alleles with arbitrary phenotypic contributions to invade. Fluctuating selection maintains nonzero genetic variance in the evolutionarily stable population if the environmental heterogeneity is larger than a certain threshold. Explicit asymptotic expressions for the standing variance-covariance components are derived for the population near the threshold, or for large generational overlap, as a function of environmental variability and genetic parameters (i.e., number of loci, recombination rate, etc.), using the fact that the genotype distribution is discrete. Above the threshold, the population maintains considerable genetic variance in the form of positive linkage disequilibrium and positive gamete covariance (Hardy-Weinberg disequilibrium) as well as allelic variance. The relative proportion of these disequilibrium variances in the total genetic variance increases with the environmental variability.

15.
Phenotypic variability is evaluated in a series of skeletal samples from the Apalachee region of Florida. Based on ethnohistoric evidence, several predictive models for changes in variability are generated. If variability decreases through time, this likely represents the effect of genetic drift in populations experiencing epidemic disease and population loss. If variability increases through time, this suggests that population aggregation or genetic admixture were primary factors shaping the Apalachee population during the mission period. Dental dimensions were collected from a series of precontact (pre-1500), early mission (AD 1633-1650) (San Pedro y San Pablo de Patale), and late mission (post-1657) (San Luis) samples from the Apalachee region and were subjected to univariate and multivariate variability analyses. The results indicate that the late mission San Luis sample was significantly more variable than the Patale or precontact samples; however, the Patale sample exhibited no significant variability change in comparison to the precontact population. This suggests that the missions initially effected limited change in genetic variability in the mission populations. However, San Luis was affected by either admixture or population aggregation to such a degree that the observed variation had increased beyond earlier levels. Given the limited historic evidence for population aggregation at this mission, and the comparatively large resident Spanish population, the increased variability may be indicative of admixture at this mission, and potentially at this mission only. Based on a limited data set, however, it appears that the mission period cannot be typified by a single evolutionary or historic process.

16.
The influences of the apolipoprotein E (Apo E) polymorphism and of gender on the distributions of plasma levels of total cholesterol (Total-C), ln triglycerides (ln Trig), HDL cholesterol (HDL-C), and apolipoproteins AI (Apo AI), AII (Apo AII), ln E (ln Apo E), B (Apo B), CII (Apo CII), and ln CIII (ln Apo CIII) were studied in 507 unrelated individuals representative of the adult population of Rochester, MN. Apo E genotypes influenced both phenotypic level and intragenotype phenotypic variability. The mean levels of six of the nine traits were influenced significantly by Apo E genotype. Intragenotype variability in eight of the nine traits was significantly different among Apo E genotypes. These effects were estimated separately in males and females. The contribution of allelic variation in the Apo E gene to the definition of the multivariate mean and variance of the lipid and apolipoprotein hyperspace was evaluated. These findings were used to demonstrate how heterogeneity of risk-factor-trait variance among genotype/gender-specific subgroups of the population at large may influence the evaluation of risk of coronary artery disease.

17.
Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids.

18.
Case-control genetic association studies in admixed populations are known to be susceptible to genetic confounding due to population stratification. The transmission/disequilibrium test (TDT) approach can avoid this problem. However, the TDT is expensive and impractical for late-onset diseases. Case-control study designs in which cases and controls are matched by admixture can be an appealing and suitable alternative for genetic association studies in admixed populations. In this study, we applied this matching strategy when recruiting our African American participants in the Study of African American, Asthma, Genes and Environments. Group admixture in this cohort consists of 83% African ancestry and 17% European ancestry, which was consistent with reports from other studies. By carrying out several complementary analyses, our results show that there is a substructure in the cohort, but that the admixture distributions are almost identical in cases and controls, and also in cases only. We performed association tests for asthma-related traits with ancestry, and found only that FEV1, a measure of baseline pulmonary function, was associated with ancestry after adjusting for socio-economic and environmental risk factors (P=0.01). We did not observe an excess of type I error rate in our association tests for ancestry informative markers and asthma-related phenotypes when ancestry was not adjusted in the analyses. Furthermore, using the association tests between genetic variants in a known asthma candidate gene, the β2-adrenergic receptor (β2AR), and ΔFEF25-75, an asthma-related phenotype, as an example, we demonstrated that population stratification was not a confounder in our genetic association. Our present work demonstrates that admixture-matched case-control strategies can efficiently control population stratification confounding in admixed populations.

19.
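An effective test for mapping genes of complex diseases   Cited by: 1 (self-citations: 0, citations by others: 1)
Linkage disequilibrium (LD) is used to map the genes underlying some complex diseases, and many LD mapping methods have been developed in recent years. Apart from the TDT, most LD mapping methods must first assume that there is no population admixture; admixture can increase the probability of type I error in disease gene mapping and produce invalid results. The method described here uses LD to test for linkage (in the presence of linkage disequilibrium) and association (in the presence of linkage) between a marker locus and a disease susceptibility locus (DSL). The analysis uses unrelated individuals whose parental genotypes are known and who have at least one heterozygous parent; the random sample is then divided into classes by genotype, and powerful statistical methods are applied to the data from the different classes in separate and joint analyses. This LD mapping method is applicable to both affected and unaffected individuals, and it effectively removes the influence of population admixture when mapping with samples classified by parental genotype. Analytical and simulation results also show that the method solves the problem of population admixture when testing for linkage and association between a marker locus and a DSL. Compared with the TDT, however, the method cannot use the corrected data effectively and fully when the tested locus is itself the DSL; when the tested locus is not the DSL, the method and the TDT can complement each other to detect a linked DSL more effectively.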

20.
Admixture between human populations is the norm rather than the exception. Many groups practice deme exogamy, so that all marriages include one spouse from an outside group. Typically, that group is a nearby population, but in many areas, especially since the advent of post-agricultural times, human movement has been greater and matings have occurred between ever more disparate groups. Because evolution is a response to local conditions, and occurs in the context of the overall genetic variability of local groups, admixture may be an important factor in human evolution and adaptation. To understand the potential impact of admixture on human evolution, it is important to know the effect of admixture on the geographic distribution of human variation.
