Similar Articles
20 similar articles found.
1.
This article reviews recent developments in Bayesian algorithms that explicitly include geographical information in the inference of population structure. Current models substantially differ in their prior distributions and background assumptions, falling into two broad categories: models with or without admixture. To aid users of this new generation of spatially explicit programs, we clarify the assumptions underlying the models, and we test these models in situations where their assumptions are not met. We show that models without admixture are not robust to the inclusion of admixed individuals in the sample, thus providing an incorrect assessment of population genetic structure in many cases. In contrast, admixture models are robust to an absence of admixture in the sample. We also give statistical and conceptual reasons why data should be explored using spatially explicit models that include admixture.

2.
Model-based methods for genetic clustering of individuals, such as those implemented in structure or ADMIXTURE, allow the user to infer individual ancestries and study population structure. The underlying model makes several assumptions about the demographic history that shaped the analysed genetic data. One assumption is that all individuals descend from K homogeneous ancestral populations that are all well represented in the data; another is that no drift happened after the admixture event. The histories of many real-world populations do not conform to that model, and in that case taking the inferred admixture proportions at face value might be misleading. We propose a method to evaluate the fit of admixture models based on estimating the correlation of the residual difference between the true genotypes and the genotypes predicted by the model. When the model assumptions are not violated, the residuals from a pair of individuals are not correlated. In the case of a badly fitting admixture model, individuals with similar demographic histories have positively correlated residuals. Using simulated and real data, we show how the method is able to detect a bad fit of inferred admixture proportions caused by using an insufficient number of clusters K, or by demographic histories that deviate significantly from the admixture model assumptions, such as admixture from ghost populations, drift after admixture events and nondiscrete ancestral populations. We have implemented the method as open-source software that can be applied to both unphased genotypes and low-depth sequencing data.
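The residual-correlation idea can be sketched with NumPy. This is a minimal, naive version under stated assumptions: genotypes are coded as 0/1/2 allele counts, the model's expected count is 2 times the sum over clusters of q_ik f_kj, and the correlation is taken directly between individuals' residual vectors. It omits any bias correction a published tool would need when Q and F are estimated from the same data; the function name `residual_correlation` and the toy data are hypothetical.

```python
import numpy as np

def residual_correlation(G, Q, F):
    """Naive pairwise correlation of admixture-model residuals.

    G: (n, m) genotype matrix, allele counts 0/1/2
    Q: (n, K) individual admixture proportions (rows sum to 1)
    F: (K, m) ancestral allele frequencies
    """
    expected = 2.0 * Q @ F          # expected allele count per individual and locus
    residuals = G - expected        # observed minus model prediction
    return np.corrcoef(residuals)   # (n, n) correlation between individuals

# Toy data: two discrete populations, 10 individuals each, 5000 SNPs
rng = np.random.default_rng(1)
F_true = rng.uniform(0.05, 0.95, size=(2, 5000))
labels = np.repeat([0, 1], 10)
G = rng.binomial(2, F_true[labels])

# Well-specified fit (K = 2, true parameters): residuals roughly uncorrelated
Q_good = np.eye(2)[labels]
C_good = residual_correlation(G, Q_good, F_true)

# Misspecified fit (K = 1, pooled frequencies): same-population pairs correlate
Q_bad = np.ones((20, 1))
F_bad = F_true.mean(axis=0, keepdims=True)
C_bad = residual_correlation(G, Q_bad, F_bad)
```

In this toy setting the K = 1 fit leaves residuals that correlate positively within each simulated population and negatively between them, which is the kind of signal the abstract describes for an insufficient K.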

3.
Falush D, Stephens M, Pritchard JK. Genetics, 2003, 164(4): 1567-1587
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back in the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

4.
Model-based (likelihood and Bayesian) and non-model-based (PCA and K-means clustering) methods have been developed to identify populations and assign individuals to the identified populations using marker genotype data. Model-based methods are favoured because they are based on a probabilistic model of population genetics with biologically meaningful parameters and thus produce results that are easily interpretable and applicable. Furthermore, they often yield more accurate structure inferences than non-model-based methods. However, current model-based methods are either computationally demanding, and thus applicable only to small problems, or use simplified admixture models that could yield inaccurate results in difficult situations such as unbalanced sampling. In this study, I propose new likelihood methods for fast and accurate population admixture inference using genotype data ranging from a few multiallelic microsatellites to millions of diallelic SNPs. The methods first conduct a clustering analysis of coarse-grained population structure using the mixture model and the simulated annealing algorithm, and then an admixture analysis of fine-grained population structure using the clustering results as a starting point in an expectation-maximisation algorithm. Extensive analyses of both simulated and empirical data show that the new methods compare favourably with existing methods in both accuracy and running speed. They can analyse small datasets with just a few multiallelic microsatellites but can also handle, in parallel, terabytes of data with millions of markers and millions of individuals. In difficult situations such as many and/or lowly differentiated populations, or unbalanced or very small samples of individuals, the new methods are substantially more accurate than other methods.
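The second stage, an expectation-maximisation update of admixture proportions, can be illustrated in its simplest form. The sketch below assumes biallelic SNPs and allele frequencies that are already known and held fixed, and estimates one individual's proportions with the standard EM for that setting; it does not reproduce the clustering stage, the simulated annealing, or the author's actual implementation, and the name `em_admixture` is hypothetical.

```python
import numpy as np

def em_admixture(g, F, n_iter=500, tol=1e-8):
    """EM estimate of one individual's admixture proportions with known allele frequencies.

    g: (m,) allele counts 0/1/2 for the counted allele A
    F: (K, m) frequency of allele A in each of the K source populations
    """
    K, m = F.shape
    q = np.full(K, 1.0 / K)                      # uniform starting point
    for _ in range(n_iter):
        pa = q[:, None] * F                      # weight of (source k, allele A)
        pb = q[:, None] * (1.0 - F)              # weight of (source k, allele a)
        a = pa / pa.sum(axis=0)                  # E-step: posterior source of each A copy
        b = pb / pb.sum(axis=0)                  # E-step: posterior source of each a copy
        # M-step: average posterior source membership over all 2m allele copies
        q_new = (g * a + (2 - g) * b).sum(axis=1) / (2.0 * m)
        if np.abs(q_new - q).max() < tol:
            return q_new
        q = q_new
    return q

# Toy example with hypothetical frequencies and true proportions
rng = np.random.default_rng(2)
F = rng.uniform(0.05, 0.95, size=(2, 5000))
q_true = np.array([0.7, 0.3])
g = rng.binomial(2, q_true @ F)
q_hat = em_admixture(g, F)
```

Each M-step keeps the proportions on the simplex automatically, because the posterior memberships of every allele copy sum to one across sources.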

5.
Argentine population genetic structure was examined using a set of 78 ancestry informative markers (AIMs) to assess the contributions of European, Amerindian, and African ancestry in 94 individual members of this population. Using the Bayesian clustering algorithm STRUCTURE, the mean European contribution was estimated at 78%, the Amerindian contribution at 19.4%, and the African contribution at 2.5%. Similar results were found using a weighted least mean squares method: European, 80.2%; Amerindian, 18.1%; and African, 1.7%. Consistent with previous studies, the current results showed very few individuals (four of 94) with greater than 10% African admixture. Notably, when individual admixture was examined, the Amerindian and European admixture showed a very large variance, and individual Amerindian contribution ranged from 1.5 to 84.5% in the 94 individual Argentine subjects. These results indicate that admixture must be considered when clinical epidemiology or case-control genetic analyses are conducted in this population. Moreover, the current study provides a set of informative SNPs that can be used to ascertain or control for this potentially hidden stratification. In addition, the large variance in admixture proportions in individual Argentine subjects shown by this study suggests that this population is appropriate for future admixture mapping studies.
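A least-squares admixture estimate of the kind mentioned above fits the admixed population's allele frequencies as a mixture of parental frequencies. The sketch below is an unweighted, illustrative version: it enforces that proportions sum to 1 by using the last parent as a reference, but it applies no per-locus weights and no non-negativity constraint, so it is not the weighted method used in the study. The function name and toy frequencies are hypothetical.

```python
import numpy as np

def ls_admixture(p_hybrid, P_parents):
    """Unweighted least-squares admixture proportions constrained to sum to 1.

    p_hybrid: (m,) allele frequencies in the admixed population
    P_parents: (K, m) allele frequencies in the K parental populations
    """
    ref = P_parents[-1]                           # last parent as reference
    X = (P_parents[:-1] - ref).T                  # (m, K-1) design matrix
    y = p_hybrid - ref                            # p_h - p_K = sum_k m_k (p_k - p_K)
    m_head, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.append(m_head, 1.0 - m_head.sum())  # proportions sum to 1 by construction

# Toy check: frequencies generated exactly as a 3-way mixture of hypothetical parents
rng = np.random.default_rng(4)
P = rng.uniform(0.05, 0.95, size=(3, 78))
m_true = np.array([0.6, 0.3, 0.1])
est = ls_admixture(m_true @ P, P)
```

With real data the hybrid frequencies carry sampling noise and drift, which is why weighted variants down-weight loci with imprecise frequency estimates.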

6.
This paper introduces a likelihood method of estimating ethnic admixture that uses individuals, pedigrees, or a combination of individuals and pedigrees. For each founder of a pedigree, admixture proportions are calculated by conditioning on the pedigree-wide genotypes at all ancestry-informative markers. These estimates are then propagated down the pedigree to the nonfounders by a simple averaging process. The large-sample standard errors of the founders' proportions can be similarly transformed into standard errors for the admixture proportions of the descendants. These standard errors are smaller than the corresponding standard errors when each individual is treated independently. Both hard and soft information on a founder's ancestry can be accommodated in this scheme, which has been implemented in the genetic software package Mendel. The utility of the method is demonstrated on simulated data and a real data example involving Mexican families of mixed Amerindian and Spanish ancestry.
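The averaging step can be sketched as follows: every nonfounder's estimate becomes a weighted sum of founder estimates, with weight 1/2 per generation. Turning founder standard errors into descendant standard errors here assumes the founder estimates are independent; that assumption, the pedigree, and all numbers are hypothetical and are not taken from Mendel.

```python
import numpy as np

# Hypothetical pedigree: child -> (mother, father); founders have no entry.
parents = {"c1": ("f1", "f2"), "c2": ("f1", "f2"), "g1": ("c1", "f3")}

# Hypothetical founder estimates (e.g. Amerindian, Spanish) and standard errors.
founder_q = {"f1": np.array([0.9, 0.1]), "f2": np.array([0.2, 0.8]), "f3": np.array([0.5, 0.5])}
founder_se = {f: np.array([0.05, 0.05]) for f in founder_q}

def founder_weights(ind):
    """Expected genetic contribution of each founder to `ind` (1/2 per generation)."""
    if ind not in parents:
        return {ind: 1.0}
    weights = {}
    for parent in parents[ind]:
        for founder, w in founder_weights(parent).items():
            weights[founder] = weights.get(founder, 0.0) + 0.5 * w
    return weights

def propagate(ind):
    """Admixture estimate and standard error for any pedigree member."""
    w = founder_weights(ind)
    q = sum(wf * founder_q[f] for f, wf in w.items())
    # Assumes independent founder estimates: variances add with squared weights.
    var = sum(wf ** 2 * founder_se[f] ** 2 for f, wf in w.items())
    return q, np.sqrt(var)

q_g1, se_g1 = propagate("g1")
```

Because the weights are spread over several founders, the squared weights sum to less than one, which is one way to see why a descendant's standard error can be smaller than a single founder's.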

7.
The Genetic Structure of Admixed Populations
Long JC. Genetics, 1991, 127(2): 417-428

8.
Bayesian statistical methods for the estimation of hidden genetic structure of populations have gained considerable popularity in recent years. Utilizing molecular marker data, Bayesian mixture models attempt to identify a hidden population structure by clustering individuals into genetically divergent groups, whereas admixture models aim to separate the ancestral sources of the alleles observed in different individuals. We discuss the difficulties involved in the simultaneous estimation of the number of ancestral populations and the levels of admixture in studied individuals' genomes. To resolve this issue, we introduce a computationally efficient method for the identification of admixture events in the population history. Our approach is illustrated by analyses of several challenging real and simulated data sets. The software (baps), implementing the methods introduced here, is freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.

9.
Admixture occurs when individuals from parental populations that have been isolated for hundreds of generations form a new hybrid population. Interest in measuring biogeographic ancestry has spread from anthropology to forensic science, direct-to-consumer personal genomics, and the civil rights issues of minorities, and such measurement is critical for genetic epidemiology studies of admixed populations. Markers with highly differentiated frequencies among human populations are informative of ancestry and are called ancestry informative markers (AIMs). For tri-hybrid Latin American populations, ancestry information is required for Africans, Europeans and Native Americans. We developed two multiplex panels of AIMs (14 SNPs in total), genotyped by two mini-sequencing reactions, suitable for small and medium-sized laboratories to estimate admixture in Latin American populations. We tested the performance of these AIMs by comparing the results obtained with our 14 AIMs with those obtained using 108 AIMs genotyped in the same individuals, whose DNA samples are available to other investigators. We emphasize that this type of comparison should be made when new admixture/population structure panels are developed. At the population level, our 14 AIMs were useful to estimate European admixture, though they overestimated African admixture and underestimated Native American admixture. Combined with more AIMs, our panel could be used to infer individual admixture. We used our panel to infer the pattern of admixture in two urban populations (Montes Claros and Manhuaçu) of the State of Minas Gerais (southeastern Brazil), obtaining a snapshot of their genetic structure in the context of their demographic history.

10.
We develop models that describe the cytonuclear structure for either a cytoplasmic and a nuclear marker in a haplodiploid species or a cytoplasmic and an X-linked marker in a diploid species. Sex-specific disequilibrium statistics that summarize nonrandom cytonuclear associations in such systems are defined, and their basic Hardy-Weinberg dynamics and admixture formulae are derived. We focus on the context of hybrid zones and develop continent-island models whereby individuals from two genetically differentiated source populations migrate into and mate within a single zone of admixture. We examine the effects of differential migration of the sexes, assortative mating by pure-type females, and census time (relative to mating and migration), as well as special cases of random mating and migration subsumed under the general models. We show that pure-type individuals and nonzero cytonuclear disequilibria can be maintained within a hybrid zone if there is continued migration from both source populations, and that females generally have a greater influence over these cytonuclear variables than males. The resulting theoretical framework can be used to estimate the rates of assortative mating and sex-specific gene flow in hybrid zones and other zones of admixture involving haplodiploid or sex-linked cytonuclear data.
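Why admixture alone creates cytonuclear disequilibrium can be shown with the generic pooling identity for two loci: if a fraction m of the pool comes from source 1, then D = m D1 + (1 - m) D2 + m(1 - m)(pA1 - pA2)(pC1 - pC2), where pA is a nuclear allele frequency and pC a cytotype frequency. This is a simplified, sex-independent illustration assumed here for clarity, not the paper's sex-specific haplodiploid or X-linked formulae; the function names are hypothetical.

```python
def mixture_disequilibrium(m, pA1, pC1, D1, pA2, pC2, D2):
    """Cytonuclear disequilibrium D(A, C) in a pool with fraction m from source 1."""
    return m * D1 + (1 - m) * D2 + m * (1 - m) * (pA1 - pA2) * (pC1 - pC2)

def direct_disequilibrium(m, pA1, pC1, D1, pA2, pC2, D2):
    """The same quantity computed from the pooled joint and marginal frequencies."""
    pA = m * pA1 + (1 - m) * pA2
    pC = m * pC1 + (1 - m) * pC2
    pAC = m * (D1 + pA1 * pC1) + (1 - m) * (D2 + pA2 * pC2)  # joint frequency of A with C
    return pAC - pA * pC

# Two pure source populations fixed for different alleles and cytotypes,
# each in equilibrium (D1 = D2 = 0), mixed in equal parts.
D_pool = mixture_disequilibrium(0.5, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
```

The m(1 - m) term is largest when the sources differ strongly at both markers, which is consistent with the abstract's point that continued migration from both sources can keep cytonuclear disequilibria nonzero in a hybrid zone.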

11.
We studied 156 individuals of Native American descent from the city of Tlapa in the state of Guerrero in western Mexico. Most individuals' ethnicity was either Nahua, Mixtec, or Tlapanec, but self-identified Mestizos and individuals of mixed ethnicities were also included in the sample. We typed 24 autosomal, one Y-chromosome, and four mitochondrial ancestry-informative markers (AIMs) to estimate group and individual admixture proportions, and determine whether the admixture process involved directional gene flow between parental groups. When genetically defined (GD) Mestizos were excluded from the analysis, Native American ancestry represented approximately 98% of the population's gene pool, while European and West African ancestry represented approximately 1% each. Maternally inherited markers also showed an exceptionally high Native American contribution (98.5%), as did the paternally inherited marker, DYS199 (90.7%). We did not detect genetic structure in this population using these AIMs, which appears consistent with the homogeneity of the sample in terms of admixture proportions. The addition of GD Mestizos to the sample did not produce a considerable change in admixture estimates, but it had a major effect on population structure. These results show that the population of Tlapa in Guerrero, Mexico, has experienced little admixture with Europeans and/or West Africans. They also show that the impact of a small number of admixed individuals on an otherwise homogeneous population might have profound implications on subsequent ancestry/phenotype analysis and mapping strategies. We suggest that heterogeneity is a major characteristic of Mexican populations and, as a consequence, should not be disregarded when designing epidemiological studies of Mexican and Mexican American populations.

12.
Inferences of population structure, and more precisely the identification of genetically homogeneous groups of individuals, are essential to the fields of ecology, evolutionary biology and conservation biology. Such population structure inferences are routinely investigated via the program structure, which implements a Bayesian algorithm to identify groups of individuals at Hardy–Weinberg and linkage equilibrium. While the method performs relatively well under various population models with even sampling between subpopulations, its robustness to uneven sample sizes between subpopulations and/or hierarchical levels of population structure has not yet been tested, despite these being commonly encountered in empirical data sets. In this study, I used simulated and empirical microsatellite data sets to investigate the impact of uneven sample size between subpopulations and/or hierarchical levels of population structure on the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences on hierarchical structure and downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged together, while individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods using both evenly and unevenly sampled data sets. Additionally, a subsampling strategy aiming to reduce sampling unevenness between subpopulations is presented and tested. Altogether, these results demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.
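The abstract does not spell out the subsampling strategy, so the following is only a hypothetical sketch of the general idea: after a preliminary grouping of individuals, each group is randomly downsampled to a common target size (by default the smallest group) before rerunning the clustering analysis. The function name, the `cap` parameter and the grouping input are all assumptions.

```python
import random

def subsample_even(samples_by_pop, cap=None, seed=0):
    """Randomly downsample each group to a common size to reduce sampling unevenness.

    samples_by_pop: dict mapping a (preliminary) group label to a list of individual IDs
    cap: target size per group; defaults to the size of the smallest group
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    target = cap if cap is not None else min(len(v) for v in samples_by_pop.values())
    return {pop: rng.sample(inds, min(target, len(inds)))
            for pop, inds in samples_by_pop.items()}

# Hypothetical unevenly sampled groups
pops = {"A": list(range(100)), "B": list(range(100, 110)), "C": list(range(110, 140))}
even = subsample_even(pops)            # every group reduced to 10 individuals
capped = subsample_even(pops, cap=20)  # groups larger than 20 reduced to 20
```

Repeating the analysis over several random seeds would show whether the inferred structure is stable across subsamples.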

13.
Rosenberg NA, Nordborg M. Genetics, 2006, 173(3): 1665-1678
In linkage disequilibrium mapping of genetic variants causally associated with phenotypes, spurious associations can potentially be generated by any of a variety of types of population structure. However, mathematical theory of the production of spurious associations has largely been restricted to population structure models that involve the sampling of individuals from a collection of discrete subpopulations. Here, we introduce a general model of spurious association in structured populations, appropriate whether the population structure involves discrete groups, admixture among such groups, or continuous variation across space. Under the assumptions of the model, we find that a single common principle, applicable to the discrete and admixed settings as well as to spatial populations, gives a necessary and sufficient condition for the occurrence of spurious associations. Using a mathematical connection between the discrete and admixed cases, we show that in admixed populations, spurious associations are less severe than in corresponding mixtures of discrete subpopulations, especially when the variance of admixture across individuals is small. This observation, together with the results of simulations that examine the relative influences of various model parameters, has important implications for the design and analysis of genetic association studies in structured populations.
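A toy calculation shows why admixture variance matters. Assume, for illustration only, that an individual with ancestry q carries an allele with probability q p1 + (1 - q) p2 and has the phenotype with probability q f1 + (1 - q) f2, independently given q. Then the genotype-phenotype covariance is Var(q)(p1 - p2)(f1 - f2). Since Var(q) is at most E[q](1 - E[q]) for q in [0, 1], with equality only for a discrete two-group mixture, admixed populations with the same mean ancestry produce weaker spurious association. This simple model and all parameter values are hypothetical, not the paper's general model.

```python
import numpy as np

def structured_covariance(q, p1, p2, f1, f2):
    """Genotype-phenotype covariance induced purely by variation in ancestry q."""
    return np.var(q) * (p1 - p2) * (f1 - f2)

rng = np.random.default_rng(3)
n = 200_000
p1, p2, f1, f2 = 0.8, 0.2, 0.4, 0.1   # hypothetical allele frequencies and phenotype risks

def simulate(q):
    """Empirical covariance when genotype and phenotype depend only on ancestry."""
    x = (rng.random(n) < q * p1 + (1 - q) * p2).astype(float)   # allele carried
    y = (rng.random(n) < q * f1 + (1 - q) * f2).astype(float)   # phenotype
    return np.cov(x, y)[0, 1]

q_discrete = (rng.random(n) < 0.5).astype(float)   # each person from one of two groups
q_admixed = rng.beta(20, 20, size=n)               # admixed: mean 0.5, small variance
cov_discrete = simulate(q_discrete)
cov_admixed = simulate(q_admixed)
```

Both simulated populations have mean ancestry near 0.5, yet the admixed one shows a much smaller spurious covariance because its admixture variance is small.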

14.
The computer program Structure implements a Bayesian method, based on a population genetics model, to assign individuals to their source populations using genetic marker data. It is widely applied in the fields of ecology, evolutionary biology, human genetics and conservation biology for detecting hidden genetic structures, inferring the most likely number of populations (K), assigning individuals to source populations and estimating admixture and migration rates. Recently, several simulation studies have concluded that the program yields erroneous inferences when samples from different populations are highly unbalanced in size. Analysing both simulated and empirical data sets, this study confirms that Structure indeed yields poor individual assignments to source populations and frequently gives incorrect estimates of K when sampling is unbalanced. However, this poor performance is mainly caused by the adoption of the default ancestry prior, which assumes that all source populations contribute equally to the pooled sample of individuals. When the alternative ancestry prior, which allows for unequal representation of the source populations in the sample, is adopted, accurate individual assignments can be obtained even if sampling is highly unbalanced. The alternative prior also improves the inference of K by two estimators, although the improvement is smaller than that in individual assignments to populations. For the difficult case of many populations and unbalanced sampling, a rarely used parameter combination is required for Structure to yield accurate inferences: the alternative ancestry prior, an initial ALPHA value much smaller than the default, and the uncorrelated allele frequency model.
I conclude that Structure is easy to use but even easier to misuse, because of its complicated genetic model and its many parameter (prior) options, which may not be obvious to choose; I suggest using multiple plausible models (parameters) and K estimators when conducting comparative and exploratory Structure analyses.

15.
Asmussen MA, Orive ME. Genetics, 2000, 155(2): 813-831
We determine the nuclear-dicytoplasmic effects of unidirectional gene flow via pollen and seeds upon a mixed-mating plant population, focusing on nuclear-mitochondrial-chloroplast systems where mitochondria are inherited maternally and chloroplasts paternally, as in many conifers. After first delineating the general effects of admixture (via seeds or individuals) on the nonrandom associations in such systems, we derive the full dicytonuclear equilibrium structure, including when disequilibria may be indicators of gene flow. Substantial levels of permanent two- and three-locus disequilibria can be generated in adults by (i) nonzero disequilibria in the migrant pools or (ii) intermigrant admixture effects via different chloroplast frequencies in migrant pollen and seeds. Additionally, three-locus disequilibria can be generated by higher-order intermigrant effects such as different chloroplast frequencies in migrant pollen and seeds coupled with nuclear-mitochondrial disequilibria in migrant seeds, or different nuclear frequencies in migrant pollen and seeds coupled with mitochondrial-chloroplast disequilibria in migrant seeds. Further insight is provided by considering special cases with seed or pollen migration alone, complete random mating or selfing, or migrant pollen and seeds lacking disequilibria or intermigrant admixture effects. The results complete the theoretical foundation for a new method for estimating pollen and seed migration using joint cytonuclear or dicytonuclear data.

18.
Spatially explicit models relating to plant populations have developed little since Felsenstein (1975) pointed out that if limited seed dispersal causes clustering of individuals, such models cannot reach an equilibrium. This paper aims to resolve this issue by modifying the Neyman-Scott cluster point process. The new point processes are dynamic models with random immigration, and the continuous increase in the clustering of individuals stops at some level. Hence, an equilibrium state is achieved, and new individual-based spatially explicit neutral coalescent models are established. By fitting the spatial structure at equilibrium to individual spatial distribution data, we can indirectly estimate seed dispersal and effective population density. These estimates are improved when genetic data are available, and become even more sophisticated if spatial distribution and genetic data pertaining to the offspring are also available.
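For orientation, the classic (static) Neyman-Scott process can be simulated in a few lines: parent locations form a Poisson process, each parent has a Poisson number of offspring, and offspring are displaced from their parent by isotropic Gaussian dispersal. This sketch does not include the paper's dynamic modification with random immigration; wrapping points onto a torus to avoid edge effects, and all parameter values, are assumptions made for the example.

```python
import numpy as np

def neyman_scott(parent_rate, mean_offspring, sigma, rng, size=1.0):
    """Simulate a Neyman-Scott cluster process on a square torus of side `size`.

    parent_rate: intensity of the Poisson process of parents (per unit area)
    mean_offspring: mean of the Poisson number of offspring per parent
    sigma: standard deviation of Gaussian offspring dispersal in each coordinate
    """
    n_parents = rng.poisson(parent_rate * size ** 2)
    parents = rng.uniform(0.0, size, size=(n_parents, 2))
    counts = rng.poisson(mean_offspring, size=n_parents)
    centres = np.repeat(parents, counts, axis=0)              # one row per offspring
    offspring = centres + rng.normal(0.0, sigma, size=centres.shape)
    return offspring % size                                   # wrap onto the torus

pts = neyman_scott(parent_rate=200, mean_offspring=10, sigma=0.02,
                   rng=np.random.default_rng(6))
```

Smaller values of `sigma` (more limited seed dispersal) give tighter clusters, the kind of spatial signal that fitting such a model to mapped individuals exploits when estimating dispersal.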

19.
Recent analyses have found that a substantial amount of the Neandertal genome persists in the genomes of contemporary non-African individuals. East Asians have, on average, higher levels of Neandertal ancestry than do Europeans, which might be due to differences in the efficiency of purifying selection, an additional pulse of introgression into East Asians, or other unexplored scenarios. To better define the scope of plausible models of archaic admixture between Neandertals and anatomically modern humans, we analyzed patterns of introgressed sequence in whole-genome data of 379 Europeans and 286 East Asians. We found that inferences of demographic history restricted to neutrally evolving genomic regions allowed a simple one-pulse model to be robustly rejected, suggesting that differences in selection cannot explain the differences in Neandertal ancestry. We show that two additional demographic models, involving either a second pulse of Neandertal gene flow into the ancestors of East Asians or a dilution of Neandertal lineages in Europeans by admixture with an unknown ancestral population, are consistent with the data. Thus, the history of admixture between modern humans and Neandertals is most likely more complex than previously thought.
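The two alternative models can be compared with simple ancestry bookkeeping. This sketch assumes that each pulse source is entirely Neandertal and that the diluting population carries no Neandertal ancestry; the proportions used are hypothetical placeholders, not estimates from the study.

```python
def after_second_pulse(a1, a2):
    """Neandertal ancestry after a first pulse of proportion a1, then a second of a2."""
    return 1 - (1 - a1) * (1 - a2)

def after_dilution(a, d):
    """Ancestry after a fraction d of the gene pool comes from a source with none."""
    return a * (1 - d)

shared = 0.02                                   # hypothetical shared first pulse
east_asian = after_second_pulse(shared, 0.005)  # model 1: extra pulse into East Asians
european = after_dilution(shared, 0.2)          # model 2: dilution in Europeans
```

Either mechanism leaves East Asians with more Neandertal ancestry than Europeans, so mean ancestry alone cannot separate them; that is why the study relies on the patterns of introgressed sequence.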

20.
A new statistical test for linkage heterogeneity
A new statistical test for linkage heterogeneity is described. It is a likelihood-ratio test based on a beta distribution for the prior distribution of the recombination fraction among families (or individuals). The null distribution for this statistic (called the B-test) is derived under a broad range of circumstances. Two other heterogeneity test statistics, the admixture test or A-test, first described by Smith, and Morton's test (here referred to as the K-test), are also examined. The probability distribution for the K-test statistic is very sensitive to family size, whereas the other two statistics are not. All three statistics are somewhat sensitive to the magnitude of the recombination fraction θ. Critical values for each of the test statistics are given. A conservative approximation for both the A-test and the B-test is given by a χ² distribution when P/2 instead of P is used for the observed significance level. In terms of power, the B-test performs best among the three tests over a broad range of alternative heterogeneity hypotheses, except for the specific case of admixture with loose linkage, in which the A-test performs best. Overall, the difference in power among the three tests is not large. An application to some recently published data on the fragile-X syndrome and X-chromosome markers is given.
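The admixture (A-test) likelihood treats each family as linked with probability α and unlinked otherwise: L(α, θ) is the product over families of α L_i(θ) + (1 - α) L_i(1/2). The sketch below maximises its log10 ratio against no linkage by grid search. It assumes phase-known meioses, so that a family with r recombinants in n meioses has L_i(θ) = θ^r (1 - θ)^(n - r); the family data are hypothetical, and neither the B-test nor any p-value calculation is reproduced.

```python
import numpy as np

def hlod(families, thetas=np.linspace(0.01, 0.5, 50), alphas=np.linspace(0.0, 1.0, 101)):
    """Maximise the admixture-model log10 likelihood ratio over alpha and theta.

    families: list of (r, n) = recombinants and phase-known meioses per family
    Returns (max log10 LR, alpha_hat, theta_hat).
    """
    r = np.array([f[0] for f in families], dtype=float)[:, None]
    n = np.array([f[1] for f in families], dtype=float)[:, None]
    # Per-family likelihood ratio L_i(theta) / L_i(1/2), shape (families, thetas)
    lr = 10 ** (r * np.log10(thetas) + (n - r) * np.log10(1 - thetas) - n * np.log10(0.5))
    best = (-np.inf, None, None)
    for a in alphas:
        score = np.log10(a * lr + (1 - a)).sum(axis=0)   # one value per theta
        j = int(np.argmax(score))
        if score[j] > best[0]:
            best = (score[j], a, thetas[j])
    return best

# Hypothetical data: three tightly linked families plus two apparently unlinked ones
h, alpha_hat, theta_hat = hlod([(0, 10), (1, 10), (0, 10), (5, 10), (6, 10)])
```

Because α = 0 or θ = 1/2 both reproduce the no-linkage likelihood, the maximised score is never negative, and an estimate of α strictly between 0 and 1 points to a mixture of linked and unlinked families.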
