Similar literature
20 similar articles found.
1.
Many models for inference of population genetic parameters are based on the assumption that the data set at hand consists of groups displaying within-group Hardy-Weinberg equilibrium at individual loci and linkage equilibrium between loci. This assumption is commonly violated by the presence of within-group spatial structure arising from nonrandom mating of individuals due to isolation by distance (IBD). This paper proposes a model and simulation method, implemented in a computer program, to flexibly simulate data displaying such patterns. The program can display smooth spatial variation in allele frequencies due to IBD and more abrupt variation due to strong barriers to gene flow. It is useful in assessing the performance of various statistical inference methods and in designing spatial sampling schemes. This is shown by a simulation study aimed at assessing the extent to which IBD patterns affect the accuracy of cluster inferences performed in models assuming panmixia. The program is also used to study the effects of the spatial sampling scheme (e.g. sampling individuals in clumps or uniformly across the spatial domain). The accuracy of such inferences is assessed in terms of the number of inferred populations, the assignment of individuals to populations and the location of borders between populations. The effect of spatial sampling was weak, while the effect of IBD may be substantial, leading to the inference of spurious populations, especially when IBD was strong with respect to the size of the sampling domain. The model and program are new and have been embedded in the R package Geneland, for user convenience and compliance with existing data formats.

2.
Wagner AP, Creel S, Kalinowski ST. Heredity 2006, 97(5):336-345
Relatedness is often estimated from microsatellite genotypes that include null alleles. When null alleles are present, observed genotypes represent one of several possible true genotypes. If null alleles are detected, but analyses do not adjust for their presence (i.e., observed genotypes are treated as true genotypes), then estimates of relatedness and relationship can be incorrect. The number of loci available in many wildlife studies is limited, and loci with null alleles are commonly a large proportion of the data that cannot be discarded without substantial loss of power. To resolve this problem, we present a new approach for estimating relatedness and relationships from data sets that include null alleles. Once it is recognized that the probability of the observed genotypes depends on the probabilities of a limited number of possible true genotypes, the required adjustments are straightforward. The concept can be applied to any existing estimator of relatedness and relationships. We review established maximum likelihood estimators and apply the correction in that setting. In an application of the corrected method to data from striped hyenas, we demonstrate that correcting for the presence of null alleles affects results substantially. Finally, we use simulated data to confirm that this method works better than two common approaches, namely ignoring the presence of null alleles or discarding affected loci.
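The idea that an observed genotype maps to a small set of possible true genotypes can be illustrated with a minimal sketch. It assumes a single diploid locus in Hardy-Weinberg equilibrium and no genotyping error other than the null allele; the function name, the `NULL` label and the example frequencies are hypothetical and are not taken from the paper.

```python
NULL = "null"  # illustrative label for the non-amplifying allele

def true_genotypes(observed, freqs):
    """Posterior over true genotypes for one observed genotype.

    observed: () for no amplification, or a pair of visible alleles,
              e.g. ("A", "A") or ("A", "B").
    freqs:    dict mapping allele labels (including NULL) to frequencies.
    Assumes Hardy-Weinberg proportions at the locus.
    """
    pn = freqs[NULL]
    if len(observed) == 0:
        # No band at all: only the null homozygote is compatible.
        weights = {(NULL, NULL): pn * pn}
    elif observed[0] == observed[1]:
        a = observed[0]
        # An apparent homozygote is either a/a or a/null.
        weights = {(a, a): freqs[a] ** 2, (a, NULL): 2 * freqs[a] * pn}
    else:
        a, b = observed
        # Two visible alleles: the true genotype is unambiguous.
        weights = {(a, b): 2 * freqs[a] * freqs[b]}
    total = sum(weights.values())  # probability of the observed genotype
    return {g: w / total for g, w in weights.items()}, total

freqs = {"A": 0.5, "B": 0.3, NULL: 0.2}
posterior, p_obs = true_genotypes(("A", "A"), freqs)
```

With these example frequencies, an apparent A/A homozygote has a substantial chance of being A/null, which is why treating observed genotypes as true genotypes can distort relatedness estimates.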

3.
Microsatellite null alleles and estimation of population differentiation (cited 20 times in total; self-citations: 0; citations by others: 20)
Microsatellite null alleles are commonly encountered in population genetics studies, yet little is known about their impact on the estimation of population differentiation. Computer simulations based on the coalescent were used to investigate the evolutionary dynamics of null alleles, their impact on $F_{ST}$ and genetic distances, and the efficiency of estimators of null allele frequency. Further, we explored how the existing method for correcting genotype data for null alleles performed in estimating $F_{ST}$ and genetic distances, and we compared this method with a new method proposed here (for $F_{ST}$ only). Null alleles were likely to be encountered in populations with a large effective size, with an unusually high mutation rate in the flanking regions, and that have diverged from the population from which the cloned allele state was drawn and the primers designed. When populations were significantly differentiated, $F_{ST}$ and genetic distances were overestimated in the presence of null alleles. Frequency of null alleles was estimated precisely with the algorithm presented in Dempster et al. (1977). The conventional method for correcting genotype data for null alleles did not provide an accurate estimate of $F_{ST}$ and genetic distances. However, the use of the genetic distance of Cavalli-Sforza and Edwards (1967) corrected by the conventional method gave better estimates than those obtained without correction. $F_{ST}$ estimation from corrected genotype frequencies performed well when restricted to visible allele sizes. Both the proposed method and the traditional correction method have been implemented in a program that is available free of charge at http://www.montpellier.inra.fr/URLB/. We used two published microsatellite data sets based on original and redesigned pairs of primers to empirically confirm our simulation results.
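The algorithm of Dempster et al. (1977) referred to above is the expectation-maximization (EM) algorithm. The following is a minimal sketch of an EM scheme for null allele frequency at one locus, assuming Hardy-Weinberg proportions and that individuals with no amplification are null homozygotes. The function name, argument layout and example counts are hypothetical; this is not the code of the program described in the abstract.

```python
def em_null_frequency(hom_counts, het_counts, n_blank, n_iter=2000, tol=1e-12):
    """EM estimate of visible and null allele frequencies at one locus.

    hom_counts: {allele: number of apparent homozygotes}
    het_counts: {(allele_a, allele_b): number of heterozygotes}
    n_blank:    number of individuals with no amplification
    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    alleles = set(hom_counts)
    for a, b in het_counts:
        alleles.update((a, b))
    n_total = sum(hom_counts.values()) + sum(het_counts.values()) + n_blank

    # Start from equal frequencies for all alleles, null included.
    p = {a: 1.0 / (len(alleles) + 1) for a in alleles}
    pn = 1.0 / (len(alleles) + 1)

    for _ in range(n_iter):
        counts = {a: 0.0 for a in alleles}
        null_count = 2.0 * n_blank
        # E-step: split apparent homozygotes into a/a and a/null.
        for a, n in hom_counts.items():
            w_hom = p[a] / (p[a] + 2.0 * pn)  # P(true a/a | observed a)
            counts[a] += n * (2.0 * w_hom + (1.0 - w_hom))
            null_count += n * (1.0 - w_hom)
        for (a, b), n in het_counts.items():
            counts[a] += n
            counts[b] += n
        # M-step: expected allele counts divided by the number of gene copies.
        new_p = {a: counts[a] / (2.0 * n_total) for a in alleles}
        new_pn = null_count / (2.0 * n_total)
        delta = max([abs(new_pn - pn)] + [abs(new_p[a] - p[a]) for a in alleles])
        p, pn = new_p, new_pn
        if delta < tol:
            break
    return p, pn

# Example counts built from frequencies A = 0.5, B = 0.3, null = 0.2 (N = 100).
p_hat, pn_hat = em_null_frequency({"A": 45, "B": 21}, {("A", "B"): 30}, n_blank=4)
```

The example counts are the exact Hardy-Weinberg expectations for the stated frequencies, so the estimates should return to those values; with real data the blank class may also contain failed reactions, which this sketch does not model.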

4.
A significant portion of plant species are polyploids, with ploidy levels sometimes varying among individuals and/or populations. Current techniques to determine the individual ploidy, e.g., flow cytometry, chromosome counting or genotyping‐by‐sequencing, are often cumbersome. Based on the genotypic probabilities for polysomic inheritance under double‐reduction, we developed a model to estimate allele frequencies and infer the ploidy status of individuals from the allelic phenotypes of codominant genetic markers. The allele frequencies are estimated by an expectation‐maximization algorithm in the presence of null alleles, false alleles, negative amplifications and self‐fertilization, and the posterior probabilities are used to assign individuals to different levels of ploidy. The accuracy of this method under different conditions is evaluated. Our methods are freely available in a new software package, ploidyinfer, which can be downloaded from http://github.com/huangkang1987/ploidyinfer.

5.
The use of dominant markers such as amplified fragment length polymorphism (AFLP) for population genetics analyses is often impeded by the lack of appropriate computer programs and rarely motivated by objective considerations. The point of the present note is twofold: (i) we describe how the computer program Geneland, designed to infer population structure, has been adapted to deal with dominant markers; and (ii) we use Geneland for a numerical comparison of dominant and codominant markers for clustering. AFLP markers lead to less accurate results than bi-allelic codominant markers such as single nucleotide polymorphisms (SNPs), but this difference becomes negligible for data sets of common size (number of individuals n≥100, number of markers L≥200). The latest Geneland version (3.2.1) handling dominant markers is freely available as an R package with a fully clickable graphical interface. Installation instructions and documentation can be found on http://www2.imm.dtu.dk/~gigu/Geneland.

6.
Null alleles are alleles that for various reasons fail to amplify in a PCR assay. The presence of null alleles in microsatellite data is known to bias the genetic parameter estimates. Thus, efficient detection of null alleles is crucial, but the methods available for indirect null allele detection return inconsistent results. Here, our aim was to compare different methods for null allele detection, to explain their respective performance and to provide improvements. We applied several approaches to identify the ‘true’ null alleles based on the predictions made by five different methods, used either individually or in combination. First, we introduced simulated ‘true’ null alleles into 240 population data sets and applied the methods to measure their success in detecting the simulated null alleles. The single best‐performing method was ML‐NullFreq_frequency. Furthermore, we applied different noise reduction approaches to improve the results. For instance, by combining the results of several methods, we obtained more reliable results than using a single one. Rule‐based classification was applied to identify population properties linked to the false discovery rate. Rules obtained from the classifier described which population genetic estimates and loci characteristics were linked to the success of each method. We have shown that by simulating ‘true’ null alleles into a population data set, we may define a null allele frequency threshold, related to a desired true or false discovery rate. Moreover, using such simulated data sets, the expected null allele homozygote frequency may be estimated independently of the equilibrium state of the population.

7.
We propose a new model to make use of georeferenced genetic data for inferring the location and shape of a hybrid zone. The model output includes the posterior distribution of a parameter that quantifies the width of the hybrid zone. The proposed model is implemented in the GUI and command-line versions of the Geneland program (versions ≥ 3.3.0). Information about the program can be found on http://www2.imm.dtu.dk/gigu/Geneland/.

8.
In the problem of reconstructing full sib pedigrees from DNA marker data, three existing algorithms and one new algorithm are compared in terms of accuracy, efficiency and robustness using real and simulated data sets. An algorithm based on the exclusion principle and another based on a maximization of the Simpson index were very accurate at reconstructing data sets comprising a few large families but had problems with data sets with limited family structure, while a Markov Chain Monte Carlo (MCMC) algorithm based on the maximization of a partition score had the opposite behaviour. An MCMC algorithm based on maximizing the full joint likelihood performed best in small data sets comprising several medium-sized families but did not work well under most other conditions. It appears that the likelihood surface may be rough and presents challenges for the MCMC algorithm to find the global maximum. This likelihood algorithm also exhibited problems in reconstructing large family groups, due possibly to limits in computational precision. The accuracy of each algorithm improved with an increasing amount of information in the data set, and was very high with eight loci with eight alleles each. All four algorithms were quite robust to deviation from an idealized uniform allelic distribution, to departures from idealized Mendelian inheritance in simulated data sets and to the presence of null alleles. In contrast, none of the algorithms were very robust to the probable presence of error/mutation in the data. Depending upon the type of mutation or errors and the algorithm used, between 70 and 98% of the affected individuals were classified improperly on average.

9.
Microsatellite null alleles are found to a varying degree across all taxa. They are problematic as they may inflate measures of genetic differentiation and create false homozygotes. Although there are several methods for correcting allele frequencies for null alleles and enabling estimation of $F_{ST}$, much less is known about how null alleles affect assignment testing. Data presented here, based on simulations, show that the percentage of correctly assigned individuals in model-based clustering and Bayesian assignment methods was slightly, though significantly, reduced in the presence of null alleles (frequency range from 0.000 to 0.913). The bias in assignment tests caused by null alleles led to a slight reduction in the power to correctly assign individuals (0.2 and 1.0 percent units for STRUCTURE- and 2.4 percent units for GENECLASS-based assignment tests). Further, the presence of null alleles caused a small but significant overestimation of $F_{ST}$. Consequently, microsatellite loci affected by null alleles would probably not alter the overall outcome of assignment testing and could therefore be included in these types of studies. Nevertheless, loci prone to null alleles should be used with caution as they lower the power of assignment tests and alter the accuracy of $F_{ST}$, and loci less prone to null alleles should always be preferred.

10.
Geneland is a computer package that uses georeferenced individual multilocus genotypes to infer the number of populations and the spatial location of genetic discontinuities between those populations. The main assumptions of the method are: (i) the number of populations is unknown and all values are considered a priori equally likely, (ii) populations are spread over areas given by a union of some polygons of unknown location in the spatial domain, (iii) Hardy–Weinberg equilibrium is assumed within each population and (iv) allele frequencies in each population are unknown and treated as random variables following either the so‐called Dirichlet model or the Falush model. The different algorithms implemented in Geneland to perform inferences are first briefly presented. Then the major running steps and outputs (i.e. histogram of the number of populations and map of posterior probabilities of population membership) are illustrated with the analysis of a simulated data set, which was also produced by Geneland.

11.
Studies of genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. However, in the presence of null alleles, an observed genotype can represent one of several possible true genotypes. This results in biased estimates of relatedness. As the number of marker loci is often limited, loci with null alleles cannot be abandoned without substantial loss of statistical power. Here, we show how loci with null alleles can be incorporated into six estimators of relatedness (two novel). We evaluate the performance of various estimators before and after correction for null alleles. If the frequency of a null allele is <0.1, some estimators can be used directly without adjustment; if it is >0.5, the potency of estimation is too low and such a locus should be excluded. We make available a software package entitled PolyRelatedness v1.6, which enables researchers to optimize these estimators to best fit a particular data set.

12.
Genetic data are useful for estimating the genealogical relationship or relatedness between individuals of unknown ancestry. We present a computer program, ml‐relate, that calculates maximum likelihood estimates of relatedness and relationship. ml‐relate is designed for microsatellite data and can accommodate null alleles. It uses simulation to determine which relationships are consistent with genotype data and to compare putative relationships with alternatives. ml‐relate runs on the Microsoft Windows operating system and is available from http://www.montana.edu/kalinowski.

13.
The use of microsatellites for studies of population structure, as markers in genome mapping, and for parentage control has become increasingly popular in recent years. However, the presence of null alleles can lead to confounding results when using microsatellites. In the Omy3DIAS microsatellite, the presence of a null allele was discovered by analysis of family material. The null allele was sequenced after amplification with new primers located farther away from the repeat sequence. The null allele was shown to be caused by a deletion of a 4-bp sequence, which was part of a repetitive sequence within one of the primer recognition sites. As this phenomenon has been seen in other cases of null alleles, this observation leads to the recommendation to avoid repetitive sequences of any kind within primer sequences. Allele-specific amplification of the null allele revealed the presence of a single variant of this allele. Received January 31, 2000; accepted May 5, 2000.

14.
Human genetic linkage maps are based on rates of recombination across the genome. These rates in humans vary by the sex of the parent from whom alleles are inherited, by chromosomal position, and by genomic features, such as GC content and repeat density. We have examined, for the first time to our knowledge, racial/ethnic differences in genetic maps of humans. We constructed genetic maps based on 353 microsatellite markers in four racial/ethnic groups: whites, African Americans, Mexican Americans, and East Asians (Chinese and Japanese). These maps were generated using 9,291 subjects from 2,900 nuclear families who participated in the National Heart, Lung, and Blood Institute-funded Family Blood Pressure Program, the largest sample used for map construction to date. Although the maps for the different groups are generally similar, we did find regional and genomewide differences across ethnic groups, including a longer genomewide map for African Americans than for other populations. Some of this variation was explained by genotyping artifacts, namely null alleles (i.e., alleles with null phenotypes) at a number of loci, and by ethnic differences in null-allele frequencies. In particular, null alleles appear to be the likely explanation for the excess map length in African Americans. We also found that nonrandom missing data biases map results. However, we found regions on chromosome 8p and telomeric segments with significant ethnic differences and a suggestive interval on chromosome 12q that were not due to genotype artifacts. The difference on chromosome 8p is likely due to a polymorphic inversion in the region. The results of our investigation have implications for inferences of possible genetic influences on human recombination as well as for future linkage studies, especially those involving populations of nonwhite ethnicity.

15.
Zaykin DV, Pudovkin A, Weir BS. Genetics 2008, 180(1):533-545
The correlation between alleles at a pair of genetic loci is a measure of linkage disequilibrium. The square of the sample correlation multiplied by sample size provides the usual test statistic for the hypothesis of no disequilibrium for loci with two alleles, and this relation has proved useful for study design and marker selection. Nevertheless, this relation holds only in a diallelic case, and an extension to multiple alleles has not been made. Here we introduce a similar statistic, $R^2$, which leads to a correlation-based test for loci with multiple alleles: for a pair of loci with $k$ and $m$ alleles, and a sample of $n$ individuals, the approximate distribution of $\frac{n(k-1)(m-1)}{km} R^2$ under independence between loci is $\chi^2_{(k-1)(m-1)}$. One advantage of this statistic is that it can be interpreted as the total correlation between a pair of loci. When the phase of two-locus genotypes is known, the approach is equivalent to a test for the overall correlation between rows and columns in a contingency table. In the phase-known case, $R^2$ is the sum of the squared sample correlations for all $km$ $2 \times 2$ subtables formed by collapsing to one allele vs. the rest at each locus. We examine the approximate distribution under the null of independence for $R^2$ and report its close agreement with the exact distribution obtained by permutation. The test for independence using $R^2$ is a strong competitor to approaches such as Pearson's chi square, Fisher's exact test, and a test based on Cressie and Read's power divergence statistic. We combine this approach with our previous composite-disequilibrium measures to address the case when the genotypic phase is unknown. Calculation of the new multiallele test statistic and its P-value is very simple and utilizes the approximate distribution of $R^2$. We provide a computer program that evaluates approximate as well as "exact" permutational P-values.
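The phase-known form of the statistic can be sketched directly from the description above: sum the squared correlations over all km collapsed 2 x 2 subtables, then scale by n(k-1)(m-1)/(km). The sketch below takes n to be the total count in the contingency table, which is an assumption about how the sample size enters, and the function name and example table are hypothetical. It is not the authors' program and does not compute P-values.

```python
def multiallelic_r2_statistic(table):
    """Correlation-based statistic for a k x m table of two-locus counts.

    table: list of k rows, each a list of m counts (rows = alleles at the
           first locus, columns = alleles at the second locus).
    Returns (R2, scaled statistic, degrees of freedom).
    Every row and column must have a nonzero total, and k, m >= 2.
    """
    k, m = len(table), len(table[0])
    n = sum(sum(row) for row in table)  # assumed sample size
    p = [sum(row) / n for row in table]                      # row frequencies
    q = [sum(table[i][j] for i in range(k)) / n for j in range(m)]  # column frequencies

    r2_total = 0.0
    for i in range(k):
        for j in range(m):
            # Collapse to "allele i vs. rest" and "allele j vs. rest".
            d = table[i][j] / n - p[i] * q[j]
            r2_total += d * d / (p[i] * (1 - p[i]) * q[j] * (1 - q[j]))

    statistic = n * (k - 1) * (m - 1) / (k * m) * r2_total
    df = (k - 1) * (m - 1)
    return r2_total, statistic, df

r2, stat, df = multiallelic_r2_statistic([[30, 10], [10, 50]])
```

For two alleles at each locus, all four subtables share the same squared correlation, so the scaled statistic reduces to the familiar sample size times squared correlation, which is a useful sanity check on the scaling factor.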

16.
For over a decade, experimental evolution has been combined with high-throughput sequencing techniques. In so-called Evolve-and-Resequence (E&R) experiments, populations are kept in the laboratory under controlled experimental conditions where their genomes are sampled and allele frequencies monitored. However, identifying signatures of adaptation in E&R datasets is far from trivial, and it is still necessary to develop more efficient and statistically sound methods for detecting selection in genome-wide data. Here, we present Bait-ER, a fully Bayesian approach based on the Moran model of allele evolution to estimate selection coefficients from E&R experiments. The model has overlapping generations, a feature that describes several experimental designs found in the literature. We tested our method under several different demographic and experimental conditions to assess its accuracy and precision, and it performs well in most scenarios. Nevertheless, some care must be taken when analysing trajectories where drift largely dominates and starting frequencies are low. We compare our method with other available software and report that ours has generally high accuracy even for trajectories whose complexity goes beyond a classical sweep model. Furthermore, our approach avoids the computational burden of simulating an empirical null distribution, outperforming available software in terms of computational time and facilitating its use on genome-wide data. We implemented and released our method in a new open-source software package that can be accessed at https://doi.org/10.5281/zenodo.7351736.

17.
Partial clonality is commonly used in eukaryotes and has large consequences for their evolution and ecology. Accurately assessing the relative importance of clonal vs. sexual reproduction matters for studying and managing such species. Here, we propose a Bayesian approach, ClonEstiMate, to infer rates of clonality c from populations sampled twice over a short time interval, ideally one generation time. The method relies on the likelihood of the transitions between genotype frequencies of ancestral and descendant populations, using an extended Wright–Fisher model explicitly integrating reproductive modes. Our model provides the posterior probability distribution of inferred c, given the assumed rates of mutation, as well as inbreeding and selfing when occurring. Tested under various conditions, this model provided accurate inferences of c, especially when the amount of information was modest, that is low sample sizes, few loci, low polymorphism and strong linkage disequilibrium. Inferences remained robust when mutation models and rates were misinformed. However, the method was sensitive to moderate frequencies of null alleles and to cases where the time interval between the required samplings exceeded two generations. Misinformed rates on mating modes (inbreeding and selfing) also resulted in biased inferences. Our method was tested on eleven data sets covering five partially clonal species, for which the extent of clonality was formerly deciphered. It delivered results highly consistent with previous information on the biology of those species. ClonEstiMate represents a powerful tool for detecting and inferring clonality in finite populations genotyped with SNPs or microsatellites. It is freely available at https://www6.rennes.inra.fr/igepp_eng/Productions/Software.

18.
Duplicated loci, for example those associated with major histocompatibility complex (MHC) genes, often have similar DNA sequences that can be coamplified with a pair of primers. This results in genotyping difficulties and inaccurate analyses. Here, we present a method to assign alleles to different loci in amplifications of duplicated loci. This method simultaneously considers several factors that may each affect correct allele assignment. These are the sharing of identical alleles among loci, null alleles, copy number variation, negative amplification, heterozygote excess or heterozygote deficiency, and linkage disequilibrium. The possible multilocus genotypes are extracted from the alleles for each individual and weighted to estimate the allele frequencies. The likelihood of an allele configuration is calculated and is optimized with a heuristic algorithm. Monte‐Carlo simulations and three empirical MHC data sets are used as examples to evaluate the efficacy of our method under different conditions. Our new software, mhc‐typer V1.1, is freely available at https://github.com/huangkang1987/mhc-typer.

19.
MICROSATELIGHT is a Perl/Tk pipeline with a graphical user interface that facilitates several tasks when scoring microsatellites. It implements new subroutines in R and PERL and takes advantage of features provided by previously developed freeware. MICROSATELIGHT takes raw genotype data and automates the peak identification through PeakScanner. The PeakSelect subroutine assigns peaks to different microsatellite markers according to their multiplex group, fluorochrome type, and size range. After peak selection, binning of alleles can be carried out 1) automatically through AlleloBin or 2) by manual bin definition through Binator. In both cases, several features for quality checking and further binning improvement are provided. The genotype table can then be converted into input files for several population genetics programs through CREATE. Finally, Hardy-Weinberg equilibrium tests and confidence intervals for null allele frequency can be obtained through GENEPOP. MICROSATELIGHT is the only freely available public-domain software that facilitates full multiplex microsatellite scoring, from electropherogram files to user-defined text files to be used with population genetics software. MICROSATELIGHT has been created for the Windows XP operating system and has been successfully tested under Windows 7. It is available at http://sourceforge.net/projects/microsatelight/.

20.
Genetic techniques are frequently used to sample and monitor wildlife populations. The goal of these studies is to maximize the ability to distinguish individuals for various genetic inference applications, a process which is often complicated by genotyping error. However, wildlife studies usually have fixed budgets, which limit the number of genetic markers available for inclusion in a study marker panel. Prior to our study, a formal algorithm for selecting a marker panel that included genotyping error, laboratory costs, and ability to distinguish individuals did not exist. We developed a constrained nonlinear programming optimization algorithm to determine the optimal number of markers for a marker panel, initially applied to a pilot study designed to estimate black bear abundance in central Georgia. We extend the algorithm to other genetic applications (e.g., parentage or population assignment) and incorporate possible null alleles. Our algorithm can be used in wildlife pilot studies to assess the feasibility of genetic sampling for multiple genetic inference applications. © 2011 The Wildlife Society.

