Similar articles
 20 similar articles found (search time: 31 ms)
1.
Population genetics model based Bayesian methods have been proposed and widely applied to making unsupervised inference of population structure from a sample of multilocus genotypes. Usually they provide good estimates of the ancestry (or population membership) of sampled individuals by clustering them probabilistically or proportionally into (anonymous) populations. However, they have difficulties in accurately estimating the number of populations (K) represented by the sampled individuals. This study proposed a new ad hoc estimator of K, calculable from the output of a population clustering program such as STRUCTURE or ADMIXTURE. The new criterion, called parsimony index (PI), aims to identify the number of populations (K) which yields consistently the minimal admixture estimates of sampled individuals. Extensive simulated and empirical data were used to compare the accuracy of PI and two popular K estimators based on Pr[X|K] (i.e., the probability of genotype data X given K) and ΔK (i.e., the rate of change of the probability of data as a function of K) calculated from STRUCTURE outputs, and the accuracy of PI and the cross‐validation method calculated from ADMIXTURE outputs. It was shown that PI was more accurate than the other methods consistently in various population structure (e.g., hierarchical island model, different extents of differentiation) and sampling (e.g., unbalanced sample sizes, different marker information contents) scenarios. The ΔK method was more accurate than the Pr[X|K] method only for hierarchically structured or highly inbred populations, and the opposite was true in the other scenarios. The PI method was implemented in a computer program, KFinder, which can be run on all major computer platforms.
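The two STRUCTURE-based K estimators compared in this abstract can be illustrated with a short Python sketch. It assumes a commonly used form of ΔK that is not spelled out in the abstract itself: the absolute second-order difference of the mean ln Pr[X|K] across replicate runs, divided by the standard deviation of ln Pr[X|K] at that K. The function names and the input layout (a dict from K to replicate log-likelihoods) are hypothetical, and the parsimony index (PI) is not reproduced here because the abstract does not define it.

```python
from statistics import mean, stdev


def best_k_by_likelihood(lnp_by_k):
    """Pick the K with the highest mean ln Pr[X|K] over replicate runs."""
    return max(lnp_by_k, key=lambda k: mean(lnp_by_k[k]))


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    Assumed definition (not taken from the abstract):
    |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd L(K).
    """
    result = {}
    for k, runs in lnp_by_k.items():
        if k - 1 not in lnp_by_k or k + 1 not in lnp_by_k:
            continue  # delta K is undefined at the smallest and largest K
        sd = stdev(runs)
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = mean(lnp_by_k[k + 1]) - 2 * mean(runs) + mean(lnp_by_k[k - 1])
        result[k] = abs(second_diff) / sd
    return result


# Made-up replicate log-likelihoods, for illustration only.
example = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-895.0, -897.0],
    4: [-893.0, -895.0],
}
dk = delta_k(example)
print(best_k_by_likelihood(example), max(dk, key=dk.get))
```

In this made-up example the likelihood keeps creeping upward with K, so the Pr[X|K] rule picks the largest K tested, whereas ΔK picks the K at which the likelihood curve bends most sharply. This is one way the two estimators can disagree.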

2.
Maximum-likelihood estimation of admixture proportions from genetic data   Cited by: 9 (self-citations: 0; citations by others: 9)
Wang J. Genetics, 2003, 164(2): 747-765
For an admixed population, an important question is how much genetic contribution comes from each parental population. Several methods have been developed to estimate such admixture proportions, using data on genetic markers sampled from parental and admixed populations. In this study, I propose a likelihood method to estimate jointly the admixture proportions, the genetic drift that occurred to the admixed population and each parental population during the period between the hybridization and sampling events, and the genetic drift in each ancestral population within the interval between their split and hybridization. The results from extensive simulations using various combinations of relevant parameter values show that in general much more accurate and precise estimates of admixture proportions are obtained from the likelihood method than from previous methods. The likelihood method also yields reasonable estimates of genetic drift that occurred to each population, which translate into relative effective sizes (N(e)) or absolute average N(e)'s if the times when the relevant events (such as population split, admixture, and sampling) occurred are known. The proposed likelihood method also has features such as relatively low computational requirement compared with previous ones, flexibility for admixture models, and marker types. In particular, it allows for missing data from a contributing parental population. The method is applied to a human data set and a wolflike canids data set, and the results obtained are discussed in comparison with those from other estimators and from previous studies.

3.
Zhu X, Zhang S, Tang H, Cooper R. Human genetics, 2006, 120(3): 431-445
Several disease-mapping methods have been proposed recently, which use the information generated by recent admixture of populations from historically distinct geographic origins. These methods include both classic likelihood and Bayesian approaches. In this study we directly maximize the likelihood function from the hidden Markov Model for admixture mapping using the EM algorithm, allowing for uncertainty in model parameters, such as the allele frequencies in the parental populations. We determined the robustness of the proposed method by examining the ancestral allele frequency estimate and individual marker-location specific ancestry when the data were generated by different population admixture models and no learning sample was used. The proposed method outperforms a widely used Bayesian MCMC strategy for data generated from various population admixture models. The multipoint information content for ancestry was derived based on the map provided by Smith et al. (2004) and the associated statistical power was calculated. We examined the distribution of admixture LD across the genome for both real and simulated data and established a threshold for genome-wide significance applicable to admixture mapping studies. The software ADMIXPROGRAM for performing admixture mapping is available from the authors.

4.
Advancing technologies have facilitated the ever‐widening application of genetic markers such as microsatellites into new systems and research questions in biology. In light of the data and experience accumulated from several years of using microsatellites, we present here a literature review that synthesizes the limitations of microsatellites in population genetic studies. With a focus on population structure, we review the widely used fixation (FST) statistics and Bayesian clustering algorithms and find that the former can be confusing and problematic for microsatellites and that the latter may be confounded by complex population models and lack power in certain cases. Clustering, multivariate analyses, and diversity‐based statistics are increasingly being applied to infer population structure, but in some instances these methods lack formalization with microsatellites. Migration‐specific methods perform well only under narrow constraints. We also examine the use of microsatellites for inferring effective population size, changes in population size, and deeper demographic history, and find that these methods are untested and/or highly context‐dependent. Overall, each method possesses important weaknesses for use with microsatellites, and there are significant constraints on inferences commonly made using microsatellite markers in the areas of population structure, admixture, and effective population size. To ameliorate and better understand these constraints, researchers are encouraged to analyze simulated datasets both prior to and following data collection and analysis, the latter of which is formalized within the approximate Bayesian computation framework. We also examine trends in the literature and show that microsatellites continue to be widely used, especially in non‐human subject areas. 
This review assists with study design and molecular marker selection, facilitates sound interpretation of microsatellite data while fostering respect for their practical limitations, and identifies lessons that could be applied toward emerging markers and high‐throughput technologies in population genetics.

5.
The computer program Structure implements a Bayesian method, based on a population genetics model, to assign individuals to their source populations using genetic marker data. It is widely applied in the fields of ecology, evolutionary biology, human genetics and conservation biology for detecting hidden genetic structures, inferring the most likely number of populations (K), assigning individuals to source populations and estimating admixture and migration rates. Recently, several simulation studies repeatedly concluded that the program yields erroneous inferences when samples from different populations are highly unbalanced in size. Analysing both simulated and empirical data sets, this study confirms that Structure indeed yields poor individual assignments to source populations and gives frequently incorrect estimates of K when sampling is unbalanced. However, this poor performance is mainly caused by the adoption of the default ancestry prior, which assumes all source populations contribute equally to the pooled sample of individuals. When the alternative ancestry prior, which allows for unequal representations of the source populations by the sample, is adopted, accurate individual assignments could be obtained even if sampling is highly unbalanced. The alternative prior also improves the inference of K by two estimators, albeit the improvement is not as much as that in individual assignments to populations. For the difficult case of many populations and unbalanced sampling, a rarely used parameter combination of the alternative ancestry prior, an initial ALPHA value much smaller than the default and the uncorrelated allele frequency model is required for Structure to yield accurate inferences. 
I conclude that Structure is easy to use but is easier to misuse because of its complicated genetic model and many parameter (prior) options which may not be obvious to choose, and suggest using multiple plausible models (parameters) and K estimators in conducting comparative and exploratory Structure analysis.

6.
Inference of individual ancestry is useful in various applications, such as admixture mapping and structured-association mapping. Using information-theoretic principles, we introduce a general measure, the informativeness for assignment (I(n)), applicable to any number of potential source populations, for determining the amount of information that multiallelic markers provide about individual ancestry. In a worldwide human microsatellite data set, we identify markers of highest informativeness for inference of regional ancestry and for inference of population ancestry within regions; these markers, which are listed in online-only tables in our article, can be useful both in testing for and in controlling the influence of ancestry on case-control genetic association studies. Markers that are informative in one collection of source populations are generally informative in others. Informativeness of random dinucleotides, the most informative class of microsatellites, is five to eight times that of random single-nucleotide polymorphisms (SNPs), but 2%-12% of SNPs have higher informativeness than the median for dinucleotides. Our results can aid in decisions about the type, quantity, and specific choice of markers for use in studies of ancestry.
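As a rough illustration of an information-theoretic marker score of the kind this abstract describes, the Python sketch below computes a mutual-information style quantity for one locus. The abstract does not give the formula, so the form used here is an assumption: for K equally weighted source populations with allele frequencies p_ij, the score is the sum over alleles j of -p_j ln p_j + Σ_i (p_ij / K) ln p_ij, where p_j is the mean frequency of allele j across populations and 0 ln 0 is taken as 0. The function name `informativeness` is hypothetical.

```python
import math


def informativeness(freqs):
    """Mutual-information style informativeness of one locus.

    freqs: list of K lists; freqs[i][j] is the frequency of allele j in
    source population i. Populations are weighted equally (an assumption).
    """
    k = len(freqs)
    n_alleles = len(freqs[0])
    total = 0.0
    for j in range(n_alleles):
        p_bar = sum(pop[j] for pop in freqs) / k  # mean frequency of allele j
        if p_bar > 0:
            total -= p_bar * math.log(p_bar)
        for pop in freqs:
            if pop[j] > 0:  # treat 0 * ln 0 as 0
                total += pop[j] / k * math.log(pop[j])
    return total


# Identical frequencies carry no ancestry information; alleles that are
# private and fixed in each population carry the most.
uninformative = informativeness([[0.3, 0.7], [0.3, 0.7]])
fully_informative = informativeness([[1.0, 0.0], [0.0, 1.0]])
print(uninformative, fully_informative)
```

Under this assumed definition the score ranges from 0 (all populations share the same allele frequencies) to ln K (each population is fixed for its own private allele), which gives a convenient scale for ranking candidate markers.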

7.
Falush D, Stephens M, Pritchard JK. Genetics, 2003, 164(4): 1567-1587
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

8.
Wang J. Genetics, 2006, 173(3): 1679-1692
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that mutations are absent since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies.

9.
Knowledge of how individuals are related is important in many areas of research, and numerous methods for inferring pairwise relatedness from genetic data have been developed. However, the majority of these methods were not developed for situations where data are limited. Specifically, most methods rely on the availability of population allele frequencies, the relative genomic position of variants and accurate genotype data. But in studies of non‐model organisms or ancient samples, such data are not always available. Motivated by this, we present a new method for pairwise relatedness inference, which requires neither allele frequency information nor information on genomic position. Furthermore, it can be applied not only to accurate genotype data but also to low‐depth sequencing data from which genotypes cannot be accurately called. We evaluate it using data from a range of human populations and show that it can be used to infer close familial relationships with a similar accuracy as a widely used method that relies on population allele frequencies. Additionally, we show that our method is robust to SNP ascertainment and applicable to low‐depth sequencing data generated using different strategies, including resequencing and RADseq, which is important for application to a diverse range of populations and species.

10.
The objective was to assess by simulation the efficacy of population structure analysis in plant breeding. Twelve populations and 300 inbred lines were simulated and genotyped using 100 microsatellite loci. The experimental material included populations with and without admixture, ancestry relationship and linkage disequilibrium, and with distinct levels of genetic differentiation and effective sizes. The analyses were performed using Structure software and employed all available models. For all numbers of groups (K) tested, for both populations and inbred lines, the admixture model with correlated allelic frequencies provided the highest value for the logarithm of the marginal likelihood. When an appropriate model was fitted and adequate sample sizes of individuals and markers were used, Structure was effective in identifying the correct population structure, migrants and individuals with genome from distinct populations. The linkage model did not result in an improvement in clustering relative to the admixture model with correlated allelic frequencies. The inclusion of prior information did not change the results; for some K values the analyses showed slightly higher values of the marginal likelihood. The reduction in the number of individuals and markers negatively affected the results. There was a high variation in the most probable K value between the evaluated methods.

11.
Jinliang Wang. Molecular ecology, 2014, 23(13): 3191-3213
Coupled with rapid developments of efficient genetic markers, powerful population genetic methods were proposed to estimate migration rates (m) in natural populations in much broader spatial and temporal scales than the traditional mark‐release‐recapture (MRR) methods. Highly polymorphic (e.g. microsatellites) and genome‐wide (e.g. SNPs) markers provide sufficient information to assign individuals to their populations or parents of origin and thereby to estimate directly m in a way similar to MRR. Such direct estimates of current migration rates are particularly useful in understanding the ecology and microevolution of wild populations and in managing the populations in the future. In this study, I proposed and implemented, in the software MigEst, a likelihood method to use marker‐based parentage assignments in jointly estimating m and candidate parent sampling proportions (x) in a subset of populations, investigated its power and accuracy using data simulated in various scenarios of population properties (e.g. the actual m, number, size and differentiation of populations) and sampling properties (e.g. the numbers of sampled parent candidates, offspring and markers), compared it with the population assignment approach implemented in the software BayesAss and demonstrated its usefulness by analysing a microsatellite data set from three natural populations of Brazilian bats. Simulations showed that MigEst provides unbiased and accurate estimates of m and performs better than BayesAss except when populations are highly differentiated with very small and ecologically insignificant migration rates. A valuable property of MigEst is that in the presence of unsampled populations, it gives good estimates of the rate of migration among sampled populations as well as of the rate of migration into each sampled population from the pooled unsampled populations.

12.
Bayesian clustering methods have emerged as a popular tool for assessing hybridization using genetic markers. Simulation studies have shown these methods perform well under certain conditions; however, these methods have not been evaluated using empirical data sets with individuals of known ancestry. We evaluated the performance of two clustering programs, baps and structure, with genetic data from a reintroduced red wolf (Canis rufus) population in North Carolina, USA. Red wolves hybridize with coyotes (C. latrans), and a single hybridization event resulted in introgression of coyote genes into the red wolf population. A detailed pedigree has been reconstructed for the wild red wolf population that includes individuals of 50–100% red wolf ancestry, providing an ideal case study for evaluating the ability of these methods to estimate admixture. Using 17 microsatellite loci, we tested the programs using different training set compositions and varying numbers of loci. structure was more likely than baps to detect an admixed genotype and correctly estimate an individual's true ancestry composition. However, structure was more likely to misclassify a pure individual as a hybrid. Both programs were outperformed by a maximum‐likelihood‐based test designed specifically for this system, which never misclassified a hybrid (50–75% red wolf) as a red wolf or vice versa. Training set composition and the number of loci both had an impact on accuracy but their relative importance varied depending on the program. Our findings demonstrate the importance of evaluating methods used for detecting admixture in the context of endangered species management.

13.
There has been much recent excitement about the use of genetics to elucidate ancestral history and demography. Whole genome data from humans and other species are revealing complex stories of divergence and admixture that were left undiscovered by previous smaller data sets. A central challenge is to estimate the timing of past admixture and divergence events, for example the time at which Neanderthals exchanged genetic material with humans and the time at which modern humans left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past admixture events, along with population divergence times and changes in effective population size. We infer demography from a collection of pairwise sequence alignments by summarizing their length distribution of tracts of identity by state (IBS) and maximizing an analytic composite likelihood derived from a Markovian coalescent approximation. Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method power by influencing the distribution of shared IBS tracts. In simulated data, we accurately infer the timing and strength of admixture events, population size changes, and divergence times over a variety of ancient and recent time scales. Using the same technique, we analyze deeply sequenced trio parents from the 1000 Genomes project. The data show evidence of extensive gene flow between Africa and Europe after the time of divergence as well as substructure and gene flow among ancestral hominids. In particular, we infer that recent African-European gene flow and ancient ghost admixture into Europe are both necessary to explain the spectrum of IBS sharing in the trios, rejecting simpler models that contain less population structure.
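The summary statistic named in this abstract, the length distribution of IBS tracts in a pairwise alignment, can be sketched in a few lines of Python. This is only an illustration of the data summary, not of the composite-likelihood inference. The function name is hypothetical, and one detail is an assumption: tracts at the two ends of the alignment are counted here, although an actual analysis might treat them differently because they are bounded by only one observed difference.

```python
from collections import Counter


def ibs_tract_lengths(seq_a, seq_b):
    """Lengths of maximal runs of identical sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    tracts = []
    run = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            run += 1  # extend the current identical-by-state run
        else:
            if run:
                tracts.append(run)  # a difference closes the current tract
            run = 0
    if run:
        tracts.append(run)  # trailing tract (edge tracts are kept here)
    return tracts


# Toy alignment with a single difference at the fourth site.
lengths = ibs_tract_lengths("ACGTACGT", "ACGAACGT")
spectrum = Counter(lengths)  # tract length -> number of tracts
print(lengths, dict(spectrum))
```

Pooling such spectra over many pairwise alignments gives the kind of length distribution the abstract refers to. Intuitively, an excess of long tracts points to recent shared ancestry, which is why recent gene flow leaves a detectable signal.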

14.
15.
In this article, we develop an admixture F model (AFM) for the estimation of population-level coancestry coefficients from neutral molecular markers. In contrast to the previously published F model, the AFM enables disentangling small population size and lack of migration as causes of genetic differentiation behind a given level of FST. We develop a Bayesian estimation scheme for fitting the AFM to multiallelic data acquired from a number of local populations. We demonstrate the performance of the AFM, using simulated data sets and real data on ninespine sticklebacks (Pungitius pungitius) and common shrews (Sorex araneus). The results show that the parameterization of the AFM conveys more information about the evolutionary history than a simple summary parameter such as FST. The methods are implemented in the R package RAFM.

16.
Principal components analysis of population admixture   Cited by: 1 (self-citations: 0; citations by others: 1)
J Ma, CI Amos. PLoS ONE, 2012, 7(7): e40115
With the availability of high-density genotype information, principal components analysis (PCA) is now routinely used to detect and quantify the genetic structure of populations in both population genetics and genetic epidemiology. An important issue is how to make appropriate and correct inferences about population relationships from the results of PCA, especially when admixed individuals are included in the analysis. We extend our recently developed theoretical formulation of PCA to allow for admixed populations. Because the sampled individuals are treated as features, our generalized formulation of PCA directly relates the pattern of the scatter plot of the top eigenvectors to the admixture proportions and parameters reflecting the population relationships, and thus can provide valuable guidance on how to properly interpret the results of PCA in practice. Using our formulation, we theoretically justify the diagnostic of two-way admixture. More importantly, our theoretical investigations based on the proposed formulation yield a diagnostic of multi-way admixture. For instance, we found that admixed individuals with three parental populations are distributed inside the triangle formed by their parental populations and divide the triangle into three smaller triangles whose areas have the same proportions in the big triangle as the corresponding admixture proportions. We tested and illustrated these findings using simulated data and data from HapMap III and the Human Genome Diversity Project.
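The triangle diagnostic for three-way admixture described in this abstract can be illustrated geometrically. The Python sketch below reads approximate admixture proportions off a two-dimensional PCA plot by dividing the area of each small triangle by the area of the parental triangle. It assumes, as the abstract does not state this explicitly, that the proportion for a given parental population corresponds to the small triangle opposite that population's vertex (the usual barycentric-coordinate convention). The function names and coordinates are hypothetical.

```python
def triangle_area(p, q, r):
    """Unsigned area of the triangle with 2-D vertices p, q, r."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])
    return abs(cross) / 2.0


def admixture_from_pca(point, parent_a, parent_b, parent_c):
    """Area-ratio estimate of admixture proportions (from A, from B, from C).

    point and the three parents are (PC1, PC2) coordinates; the parents would
    typically be the centroids of the parental population clusters.
    """
    total = triangle_area(parent_a, parent_b, parent_c)
    if total == 0:
        raise ValueError("parental populations are collinear in this projection")
    return (
        triangle_area(point, parent_b, parent_c) / total,  # share from A
        triangle_area(point, parent_a, parent_c) / total,  # share from B
        triangle_area(point, parent_a, parent_b) / total,  # share from C
    )


# Hypothetical centroids and an individual placed at 0.5*A + 0.3*B + 0.2*C.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
proportions = admixture_from_pca((0.3, 0.2), a, b, c)
print(proportions)
```

Because unsigned areas are used, the three ratios sum to one only for points inside the parental triangle. A sum noticeably above one would suggest that the individual lies outside the triangle and that the three-way admixture reading does not apply.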

17.

Background

The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal.

Results

Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly.

Conclusions

The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.
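Choosing the number of populations by cross-validation, the first enhancement listed above, usually amounts to running the program over a range of K values and keeping the K with the lowest cross-validation error. The Python sketch below shows only that selection step. It assumes each run's log contains a line of the form `CV error (K=3): 0.52`; that format is an assumption to check against the output of the version in use, and the function name and sample logs are hypothetical.

```python
import re

# Assumed log-line format; adjust the pattern if the actual output differs.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]*\.?[0-9]+)")


def pick_k_by_cv(log_texts):
    """Return (best K, {K: CV error}) parsed from a list of log strings."""
    errors = {}
    for text in log_texts:
        for match in CV_LINE.finditer(text):
            errors[int(match.group(1))] = float(match.group(2))
    if not errors:
        raise ValueError("no cross-validation lines found in the logs")
    best = min(errors, key=errors.get)  # lowest CV error wins
    return best, errors


# Made-up log excerpts, for illustration only.
logs = [
    "Summary:\nCV error (K=2): 0.5610\n",
    "Summary:\nCV error (K=3): 0.5248\n",
    "Summary:\nCV error (K=4): 0.5301\n",
]
best_k, cv_errors = pick_k_by_cv(logs)
print(best_k, cv_errors)
```

When several K values give nearly identical errors, it is generally safer to report the whole error curve than a single best K.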

18.
Many studies in human genetics compare informativeness of single‐nucleotide polymorphisms (SNPs) and microsatellites (simple sequence repeats; SSR) in genome scans, but it is difficult to transfer the results directly to livestock because of different population structures. The aim of this study was to determine the number of SNPs needed to obtain the same differentiation power as with a given standard set of microsatellites. Eight chicken breeds were genotyped for 29 SSRs and 9216 SNPs. After filtering, only 2931 SNPs remained. The differentiation power was evaluated using two methods: partitioning of the Euclidean distance matrix based on a principal component analysis (PCA) and a Bayesian model‐based clustering approach. Generally, with PCA‐based partitioning, 70 SNPs provide a comparable resolution to 29 SSRs. In model‐based clustering, the similarity coefficient showed significantly higher values between repeated runs for SNPs compared to SSRs. For the membership coefficients, reflecting the proportion of the genome assigned to the ith cluster, the highest values were obtained for 29 SSRs and 100 SNPs respectively. With a low number of loci (29 SSRs or ≤100 SNPs), neither marker type could detect the admixture in the Gödöllö Nhx population. Using more than 250 SNPs allowed a more detailed insight into the genetic architecture. Thus, the admixed population could be detected. It is concluded that breed differentiation studies will substantially gain power even with moderate numbers of SNPs.

19.
The tiger shrimp (Penaeus monodon) is an important marine crustacean in terms of biological diversity and aquaculture resource. The shrimp is widespread across the Indo‐Pacific region and shows apparent genetic differentiation among geographical populations. It is common practice to transport female brooders between different countries to seed the shrimp farms, posing potential problems of unwanted population admixture. We developed 23 polymorphic microsatellites for P. monodon (average HE = 0.936) and these microsatellites were applicable for studying population differentiation, identifying valid stocks and tagging nonindigenous farmed shrimps.

20.
Although whole‐genome sequencing is becoming more accessible and feasible for nonmodel organisms, microsatellites have remained the markers of choice for various population and conservation genetic studies. However, the criteria for choosing microsatellites are still controversial due to ascertainment bias that may be introduced into the genetic inference. An empirical study of red deer (Cervus elaphus) populations was performed, using cross‐specific and species‐specific microsatellites developed through pyrosequencing of enriched libraries. Two different strategies were used to select the species‐specific panels: randomly chosen vs. highly polymorphic markers. The results suggest that reliable and accurate estimations of genetic diversity can be obtained using random microsatellites distributed throughout the genome. In addition, the results reinforce previous evidence that selecting the most polymorphic markers leads to an ascertainment bias in estimates of genetic diversity, when compared with randomly selected microsatellites. Analyses of population differentiation and clustering seem less influenced by the approach of microsatellite selection, whereas assigning individuals to populations might be affected by a random selection of a small number of microsatellites. Individual multilocus heterozygosity measures produced various discordant results, which in turn had impacts on the heterozygosity‐fitness correlation test. Finally, we argue that picking the appropriate microsatellite set should primarily take into account the ecological and evolutionary questions studied. Selecting the most polymorphic markers will generally overestimate genetic diversity parameters, leading to misinterpretations of the real genetic diversity, which is particularly important in managed and threatened populations.
