Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Dominant markers such as amplified fragment length polymorphisms (AFLPs) provide an economical way of surveying variation at many loci. However, the uncertainty about the underlying genotypes presents a problem for statistical analysis. Similarly, the presence of null alleles and the limitations of genotype calling in polyploids mean that many conventional analysis methods are invalid for many organisms. Here we present a simple approach for accounting for genotypic ambiguity in studies of population structure and apply it to AFLP data from whitefish. The approach is implemented in the program structure version 2.2, which is available from http://pritch.bsd.uchicago.edu/structure.html.

2.
Falush D, Stephens M, Pritchard JK. Genetics 2003, 164(4): 1567-1587
We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations ("admixture linkage disequilibrium"). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

3.
In conservation and management of species it is important to make inferences about gene flow, dispersal and population structure. In this study, we used 613 georeferenced tissue samples from hazel grouse (Bonasa bonasia), where each individual was genotyped at 12 microsatellite loci, to make inferences about population genetic structure, gene flow and dispersal in northern Sweden. Observed levels of genetic diversity suggest that Swedish hazel grouse do not suffer loss of genetic diversity compared with other grouse species. We found significant F(IS) (deviation from Hardy-Weinberg expectations) over the entire sample using jack-knifed estimators over loci, which is most likely explained by a Wahlund effect. With the use of spatial autocorrelation methods, we detected significant isolation by distance among individuals. Neighbourhood size was estimated to be of the order of 62-158 individuals, corresponding to a dispersal distance of 950-1500 m. Using a spatial statistical model for landscape genetics to infer the number of populations and the spatial location of genetic discontinuities between these populations, we found indications that Swedish hazel grouse are divided into a northern and a southern population. We could not find a sharp border between these two populations and none of the observed borders appeared to coincide with any potential geographical barriers. These results imply that gene flow appears somewhat unrestricted in the boreal taiga forests of northern Sweden and that the two populations of hazel grouse in Sweden may be explained by the post-glacial reinvasion history of the Scandinavian Peninsula.

4.
Information on the genetic diversity and population structure of cattle breeds is useful when deciding on, for example, the optimal crossbreeding strategies to improve phenotypic performance by exploiting heterosis. The present study investigated the genetic diversity and population structure of the most prominent dairy and beef breeds used in Ireland. Illumina high-density genotypes (777 962 single nucleotide polymorphisms; SNPs) were available on 4623 purebred bulls from nine breeds: Angus (n=430), Belgian Blue (n=298), Charolais (n=893), Hereford (n=327), Holstein-Friesian (n=1261), Jersey (n=75), Limousin (n=943), Montbéliarde (n=33) and Simmental (n=363). Principal component analysis revealed that Angus, Hereford, and Jersey formed non-overlapping clusters, representing distinct populations. In contrast, overlapping clusters suggested geographical proximity of origin and genetic similarity between Limousin, Simmental and Montbéliarde and to a lesser extent between Holstein, Friesian and Belgian Blue. The observed SNP heterozygosity averaged across all loci was 0.379. The Belgian Blue had the greatest mean observed heterozygosity (HO=0.389) among individuals within breed while the Holstein-Friesian and Jersey populations had the lowest mean heterozygosity (HO=0.370 and 0.376, respectively). The correlation between the genomic-based and pedigree-based inbreeding coefficients was weak (r=0.171; P<0.001). Mean genomic inbreeding estimates were greatest for Jersey (0.173) and least for Hereford (0.051). The pair-wise breed fixation index (Fst) ranged from 0.049 (Limousin and Charolais) to 0.165 (Hereford and Jersey). In conclusion, substantial genetic variation exists among breeds commercially used in Ireland. Thus custom-mating strategies would be successful in maximising the exploitation of heterosis in crossbreeding strategies.
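As a rough illustration of the observed-heterozygosity statistic (HO) reported in the abstract above, the following Python sketch counts the fraction of called SNP genotypes that are heterozygous. The 0/1/2 genotype coding, the use of None for missing calls, and the function name are assumptions made for this example; the study's actual pipeline is not described here.

```python
def observed_heterozygosity(genotypes):
    """Return the fraction of called genotypes that are heterozygous.

    genotypes: list of per-individual lists, each entry coded as the number
    of copies of the alternate allele (0, 1 or 2), or None if the call is
    missing. This coding is an assumption of the sketch.
    """
    het = 0
    called = 0
    for individual in genotypes:
        for g in individual:
            if g is None:
                continue  # skip missing calls
            called += 1
            if g == 1:  # one copy of each allele: heterozygous
                het += 1
    if called == 0:
        raise ValueError("no called genotypes")
    return het / called
```

Passing the genotypes of a single breed gives a within-breed value; passing all individuals gives a value pooled across breeds.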

5.
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this "chromosome painting" can be summarized as a "coancestry matrix," which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.

6.
We use the patterns of homozygosity at multiple loci to distinguish between excess homozygosity caused by consanguineous mating and that due to undetected population subdivision (the Wahlund effect). Clarification of the underlying causes of excess homozygosity is of practical importance in explaining the occurrence of recessive genetic disorders and in forensic match probability calculations. We calculated a likelihood surface for two parameters: C, the proportion of the population practicing consanguinity, and theta, the genetic correlation due to population subdivision. To illustrate the method, we applied it to multilocus genotypic data of two U.K. Asian populations, one practicing a high frequency of cousin marriage, and another in which caste endogamy was suspected. The method was able to successfully distinguish the different patterns of relatedness. The method also returned accurate estimates of C and theta using simulated data sets. We show how our method can be extended to allow for degrees of inbreeding closer than cousin unions, including selfing. With closer inbreeding, the relatedness of recent ancestors beyond the parents becomes an issue.

7.
Inference of bacterial microevolution using multilocus sequence data   Cited by: 5 (self-citations: 0; citations by others: 5)
Didelot X, Falush D. Genetics 2007, 175(3): 1251-1266
We describe a model-based method for using multilocus sequence data to infer the clonal relationships of bacteria and the chromosomal position of homologous recombination events that disrupt a clonal pattern of inheritance. The key assumption of our model is that recombination events introduce a constant rate of substitutions to a contiguous region of sequence. The method is applicable both to multilocus sequence typing (MLST) data from a few loci and to alignments of multiple bacterial genomes. It can be used to decide whether a subset of isolates share common ancestry, to estimate the age of the common ancestor, and hence to address a variety of epidemiological and ecological questions that hinge on the pattern of bacterial spread. It should also be useful in associating particular genetic events with the changes in phenotype that they cause. We show that the model outperforms existing methods of subdividing recombinogenic bacteria using MLST data and provide examples from Salmonella and Bacillus. The software used in this article, ClonalFrame, is available from http://bacteria.stats.ox.ac.uk/.

8.
Several methods have been developed to estimate the selfing rate of a population from a sample of individuals genotyped for several marker loci. These methods can be based on homozygosity excess (or inbreeding), identity disequilibrium, progeny array (PA) segregation or population assignment incorporating partial selfing. The progeny array-based method is generally the best because it is not subject to some assumptions made by other methods (such as lack of misgenotyping, absence of biparental inbreeding and presence of inbreeding equilibrium), and it can reveal other facets of a mixed-mating system such as patterns of shared paternity. However, in practice, it is often difficult to obtain PAs, especially for animal species. In this study, we propose a method to reconstruct the pedigree of a sample of individuals taken from a monoecious diploid population practicing mixed mating, using multilocus genotypic data. Selfing and outcrossing events are then detected when an individual derives from identical parents and from two distinct parents, respectively. Selfing rate is estimated by the proportion of selfed offspring in the reconstructed pedigree of a sample of individuals. The method enjoys many advantages of the PA method, but without the need for a priori family structure, although such information, if available, can be utilized to improve the inference. Furthermore, the new method accommodates genotyping errors, estimates allele frequencies jointly and is robust to the presence of biparental inbreeding and inbreeding disequilibrium. Both simulated and empirical data were analysed by the new and previous methods to compare their statistical properties and accuracies.

9.
Gao H, Williamson S, Bustamante CD. Genetics 2007, 176(3): 1635-1651
Nonrandom mating induces correlations in allelic states within and among loci that can be exploited to understand the genetic structure of natural populations (Wright 1965). For many species, it is of considerable interest to quantify the contribution of two forms of nonrandom mating to patterns of standing genetic variation: inbreeding (mating among relatives) and population substructure (limited dispersal of gametes). Here, we extend the popular Bayesian clustering approach STRUCTURE (Pritchard et al. 2000) for simultaneous inference of inbreeding or selfing rates and population-of-origin classification using multilocus genetic markers. This is accomplished by eliminating the assumption of Hardy-Weinberg equilibrium within clusters and, instead, calculating expected genotype frequencies on the basis of inbreeding or selfing rates. We demonstrate the need for such an extension by showing that selfing leads to spurious signals of population substructure using the standard STRUCTURE algorithm with a bias toward spurious signals of admixture. We gauge the performance of our method using extensive coalescent simulations and demonstrate that our approach can correct for this bias. We also apply our approach to understanding the population structure of the wild relative of domesticated rice, Oryza rufipogon, an important partially selfing grass species. Using a sample of n = 16 individuals sequenced at 111 random loci, we find strong evidence for existence of two subpopulations, which correlates well with geographic location of sampling, and estimate selfing rates for both groups that are consistent with estimates from experimental data (s ≈ 0.48-0.70).
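The idea of replacing Hardy-Weinberg proportions with genotype frequencies that depend on an inbreeding or selfing rate can be sketched in Python for a single biallelic locus. The sketch uses the textbook equilibrium relation F = s / (2 - s) between selfing rate s and inbreeding coefficient F; whether this matches the exact parameterisation used in the paper above is an assumption, and the function names are hypothetical.

```python
def inbreeding_from_selfing(s):
    """Equilibrium inbreeding coefficient F for selfing rate s (textbook relation)."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("selfing rate must lie in [0, 1]")
    return s / (2.0 - s)


def genotype_frequencies(p, f):
    """Expected (AA, Aa, aa) frequencies for allele frequency p and inbreeding f.

    With f = 0 these reduce to the Hardy-Weinberg proportions
    (p^2, 2pq, q^2); with f = 1 no heterozygotes are expected.
    """
    q = 1.0 - p
    hom_a = p * p + f * p * q        # excess homozygotes AA
    het = 2.0 * p * q * (1.0 - f)    # deficit of heterozygotes
    hom_b = q * q + f * p * q        # excess homozygotes aa
    return hom_a, het, hom_b
```

For example, a selfing rate of 0.5 gives F = 1/3, so the expected heterozygote frequency is two thirds of its Hardy-Weinberg value.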

10.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

11.
Das A, Mohanty S, Stephan W. Genetics 2004, 168(4): 1975-1985
Inferring the origin, population structure, and demographic history of a species is a major objective of population genetics. Although many organisms have been analyzed, the genetic structures of subdivided populations are not well understood. Here we analyze Drosophila ananassae, a highly substructured, cosmopolitan, and human-commensal species distributed in the tropical, subtropical, and mildly temperate regions of the world. We adopt a multilocus approach (with 10 neutral loci) using 16 population samples covering almost the entire species range (Asia, Australia, and America). Analyzed with our recently developed Bayesian method, 5 populations in Southeast Asia are found to be central, while the other 11 are peripheral. These 5 central populations were sampled from localities that belonged to a single landmass ("Sundaland") during the late Pleistocene (approximately 18,000 years ago), when sea level was approximately 120 m below the present level. The inferred migration routes of D. ananassae out of Sundaland seem to parallel those of humans in this region. Strong evidence for a population size expansion is seen particularly in the ancestral populations.

12.
Despite its importance as a human pathogen, information on population structure and global epidemiology of Staphylococcus epidermidis is scarce and the relative importance of the mechanisms contributing to clonal diversification is unknown. In this study, we addressed these issues by analyzing a representative collection of S. epidermidis isolates from diverse geographic and clinical origins using multilocus sequence typing (MLST). Additionally, we characterized the mobile element (SCCmec) carrying the genetic determinant of methicillin resistance. The 217 S. epidermidis isolates from our collection were split by MLST into 74 types, suggesting a high level of genetic diversity. Analysis of MLST data using the eBURST algorithm revealed the existence of nine epidemic clonal lineages that were disseminated worldwide. One single clonal lineage (clonal complex 2) comprised 74% of the isolates, whereas the remaining isolates were clustered into 8 minor clonal lineages and 13 singletons. According to our evolutionary model, SCCmec was acquired at least 56 times by S. epidermidis. Although geographic dissemination of S. epidermidis strains and the value of the index of association between the alleles, 0.2898 (P < 0.05), support the clonality of S. epidermidis species, examination of the sequence changes at MLST loci during clonal diversification showed that recombination gives rise to new alleles approximately twice as frequently as point mutations. We suggest that S. epidermidis has a population with an epidemic structure, in which nine clones have emerged upon a recombining background and evolved quickly through frequent transfer of genetic mobile elements, including SCCmec.

13.
It is an assumption of large, population-based datasets that samples are annotated accurately whether they correspond to known relationships or unrelated individuals. These annotations are key for a broad range of genetics applications. While many methods are available to assess relatedness that involve estimates of identity-by-descent (IBD) and/or identity-by-state (IBS) allele-sharing proportions, we developed a novel approach that estimates IBD0, 1, and 2 based on observed IBS within windows. When combined with genome-wide IBS information, it provides an intuitive and practical graphical approach with the capacity to analyze datasets with thousands of samples without prior information about relatedness between individuals or haplotypes. We applied the method to a commonly used Human Variation Panel consisting of 400 nominally unrelated individuals. Surprisingly, we identified identical, parent-child, and full-sibling relationships and reconstructed pedigrees. In two instances non-sibling pairs of individuals in these pedigrees had unexpected IBD2 levels, as well as multiple regions of homozygosity, implying inbreeding. This combined method allowed us to distinguish related individuals from those having atypical heterozygosity rates and determine which individuals were outliers with respect to their designated population. Additionally, it becomes increasingly difficult to identify distant relatedness using genome-wide IBS methods alone. However, our IBD method further identified distant relatedness between individuals within populations, supported by the presence of megabase-scale regions lacking IBS0 across individual chromosomes. We benchmarked our approach against the hidden Markov model of a leading software package (PLINK), showing improved calling of distantly related individuals, and we validated it using a known pedigree from a clinical study. 
The application of this approach could improve genome-wide association, linkage, heterozygosity, and other population genomics studies that rely on SNP genotype data.
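To make the notion of observed identity-by-state (IBS) within windows more concrete, the Python sketch below scores each SNP for a pair of individuals as IBS0, IBS1 or IBS2 and tallies the states in consecutive windows. It covers only the IBS bookkeeping; the IBD estimation described in the abstract above is not reproduced. The 0/1/2 genotype coding, the fixed window size in SNPs, and the function names are assumptions of this example.

```python
def ibs_states(g1, g2):
    """Per-SNP IBS state (0, 1 or 2 alleles shared) for two individuals.

    g1, g2: equal-length lists of genotypes coded as copies of the
    alternate allele (0, 1 or 2).
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have equal length")
    return [2 - abs(a - b) for a, b in zip(g1, g2)]


def windowed_ibs_counts(g1, g2, window):
    """Count (IBS0, IBS1, IBS2) SNPs in consecutive windows of `window` SNPs."""
    if window <= 0:
        raise ValueError("window must be positive")
    states = ibs_states(g1, g2)
    counts = []
    for start in range(0, len(states), window):
        chunk = states[start:start + window]
        counts.append((chunk.count(0), chunk.count(1), chunk.count(2)))
    return counts
```

Long runs of windows with no IBS0 SNPs are the kind of signal the abstract mentions as support for distant relatedness.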

14.
Wiuf C. Genetics 2004, 166(1): 537-545
In this study compatibility with a tree for unphased genotype data is discussed. If the data are compatible with a tree, the data are consistent with an assumption of no recombination in its evolutionary history. Further, it is said that there is a solution to the perfect phylogeny problem; i.e., for each individual a pair of haplotypes can be defined and the set of all haplotypes can be explained without invoking recombination. A new algorithm to decide whether or not a sample is compatible with a tree is derived. The new algorithm relies on an equivalence relation between sites that mutually determine the phase of each other. (The previous algorithm was based on advanced graph theoretical tools.) The equivalence relation is used to derive the number of solutions to the perfect phylogeny problem. Further, a series of statistics, R(j)(M), j ≥ 2, are defined. These can be used to detect recombination events in the sample's history and to divide the sample into regions that are compatible with a tree. The new statistics are applied to real data from human genes. The results from this application are discussed with reference to recent suggestions that recombination in the human genome is highly heterogeneous.

15.
A retrospective likelihood-based approach was proposed to test and estimate the effect of haplotype on disease risk using unphased genotype data with adjustment for environmental covariates. The proposed method was also extended to handle data in which the haplotype and environmental covariates are not independent. Likelihood ratio tests were constructed to test the effects of haplotype and gene-environment interaction. The model parameters, such as haplotype effect size, were estimated using an Expectation Conditional-Maximization (ECM) algorithm developed by Meng and Rubin (1993). Model-based variance estimates were derived using the observed information matrix. Simulation studies were conducted for three different genetic effect models, including dominant effect, recessive effect, and additive effect. The results showed that the proposed method generated unbiased parameter estimates, proper type I error, and true beta coverage probabilities. The model performed well with small or large sample sizes, as well as short or long haplotypes.

16.
Inference in population structure studies   Cited by: 2 (self-citations: 2; citations by others: 0)

17.
This study analyzes population structure and linkage disequilibrium (LD) among 187 commonly used Chinese maize inbred lines, representing the genetic diversity among public, commercial and historically important lines for corn breeding. Seventy SSR loci, evenly distributed over 10 chromosomes, were assayed for polymorphism. The identified 290 alleles served to estimate population structure and analyze the genome-wide LD. The population of lines was highly structured, showing 6 subpopulations: BSSS (American BSSS including Reid), PA (group A germplasm derived from modern U.S. hybrids in China), PB (group B germplasm derived from modern U.S. hybrids in China), Lan (Lancaster Surecrop), LRC (derivative lines from Lvda Red Cob, a Chinese landrace) and SPT (derivative lines from Si-ping-tou, a Chinese landrace). Forty lines, which formerly had an unknown and/or miscellaneous origin and pedigree record, were assigned to the appropriate group. Relationship estimates based on SSR marker data were quantified in a Q matrix, and this information will inform breeders' decisions regarding crosses. Extensive inter- and intra-chromosomal LD was detected between 70 microsatellite loci for the investigated maize lines (2109 locus pairs in LD with D′ > 0.1, 93 of them at P < 0.01). This suggests that rapidly evolving microsatellites may track recent population structure. Interlocus LD decay among the diverse maize germplasm indicated that association studies in QTLs and/or candidate genes might avoid nonfunctional and spurious associations since most of the LD blocks were broken between diverse germplasm. The defined population structure and the LD analysis present the basis for future association mapping.

18.
19.
20.
A geostatistical perspective on spatial genetic structure may explain methodological issues of quantifying spatial genetic structure and suggest new approaches to addressing them. We use a variogram approach to (i) derive a spatial partitioning of molecular variance, gene diversity, and genotypic diversity for microsatellite data under the infinite allele model (IAM) and the stepwise mutation model (SMM), (ii) develop a weighting of sampling units to reflect ploidy levels or multiple sampling of genets, and (iii) show how variograms summarize the spatial genetic structure within a population under isolation-by-distance. The methods are illustrated with data from a population of the epiphytic lichen Lobaria pulmonaria, using six microsatellite markers. Variogram-based analysis not only avoids bias due to the underestimation of population variance in the presence of spatial autocorrelation, but also provides estimates of population genetic diversity and the degree and extent of spatial genetic structure accounting for autocorrelation.
