Similar articles
20 similar articles found.
1.
Anderson AD, Weir BS. Genetics, 2007, 176(1): 421-440
A maximum-likelihood estimator for pairwise relatedness is presented for the situation in which the individuals under consideration come from a large outbred subpopulation of the population for which allele frequencies are known. We demonstrate via simulations that a variety of commonly used estimators that do not take this kind of misspecification of allele frequencies into account will systematically overestimate the degree of relatedness between two individuals from a subpopulation. A maximum-likelihood estimator that includes F(ST) as a parameter is introduced with the goal of producing the relatedness estimates that would have been obtained if the subpopulation allele frequencies had been known. This estimator is shown to work quite well, even when the value of F(ST) is misspecified. Bootstrap confidence intervals are also examined and shown to exhibit close to nominal coverage when F(ST) is correctly specified.

2.
Wang J. Molecular Ecology, 2004, 13(10): 3169-3178
Knowledge of the genetic relatedness between a pair of individuals is important in many research areas of quantitative genetics, conservation genetics, evolution and ecology. Many estimators have been developed to estimate such pairwise relatedness (r) using codominant markers, such as microsatellites and enzymes. In contrast, only two estimators are proposed to use dominant markers, such as random amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs), in relatedness inference. They are both biased estimators, and their statistical properties and robustness to the sampling errors in allele frequency have not been investigated. In this short paper, I propose two new pairwise relatedness estimators for dominant markers, and compare them in precision, accuracy and robustness to sampling with the two previous estimators using simulations. It was found that the new estimator based on the least squares approach is unbiased when allele frequencies are known or estimated from a sample without correcting for sampling effects. It has, however, a low precision and as a result, an intermediate overall performance among the four estimators in terms of the mean squared deviation (MSD) of estimates from actual values of r. The new estimator based on a similarity index is slightly biased but has generally the lowest MSD among the four estimators compared, regardless of the number of loci, type of actual relationships, allele frequencies known or estimated from samples. Simulations also show that the confidence intervals estimated by bootstrapping are appropriate for different estimators provided that the number of loci used in the estimation is not small.

3.
An estimator for pairwise relatedness using molecular markers (total citations: 21; self-citations: 0; citations by others: 21)
Wang J. Genetics, 2002, 160(3): 1203-1215
I propose a new estimator for jointly estimating two-gene and four-gene coefficients of relatedness between individuals from an outbreeding population with data on codominant genetic markers and compare it, by Monte Carlo simulations, to previous ones in precision and accuracy for different distributions of population allele frequencies, numbers of alleles per locus, actual relationships, sample sizes, and proportions of relatives included in samples. In contrast to several previous estimators, the new estimator is well behaved and applies to any number of alleles per locus and any allele frequency distribution. The estimates for two- and four-gene coefficients of relatedness from the new estimator are unbiased irrespective of the sample size and have sampling variances decreasing consistently with an increasing number of alleles per locus to the minimum asymptotic values determined by the variation in identity-by-descent among loci per se, regardless of the actual relationship. The new estimator is also robust for small sample sizes and for unknown relatives being included in samples for estimating allele frequencies. Compared to previous estimators, the new one is generally advantageous, especially for highly polymorphic loci and/or small sample sizes.

4.
The two alleles an individual carries at a locus are identical by descent (ibd) if they have descended from a single ancestral allele in a reference population, and the probability of such identity is the inbreeding coefficient of the individual. Inbreeding coefficients can be predicted from pedigrees with founders constituting the reference population, but estimation from genetic data is not possible without data from the reference population. Most inbreeding estimators that make explicit use of sample allele frequencies as estimates of allele probabilities in the reference population are confounded by average kinships with other individuals. This means that the ranking of those estimates depends on the scope of the study sample and we show the variation in rankings for common estimators applied to different subdivisions of 1000 Genomes data. Allele-sharing estimators of within-population inbreeding relative to average kinship in a study sample, however, do have invariant rankings across all studies including those individuals. They are unbiased with a large number of SNPs. We discuss how allele sharing estimates are the relevant quantities for a range of empirical applications. Subject terms: Population genetics, Evolutionary biology, Molecular ecology

5.
Nielsen R, Tarpy DR, Reeve HK. Molecular Ecology, 2003, 12(11): 3157-3164
Estimating paternity and genetic relatedness is central to many empirical and theoretical studies of social insects. The two important measures of a queen's mating number are her actual number of mates and her effective number of mates. Estimating the effective number of mates is mathematically identical to the problem of estimating the effective number of alleles in population genetics, a common measure of genetic variability introduced by Kimura & Crow (1964). We derive a new bias-corrected estimator of effective number of types (mates or alleles) and compare this new method to previous methods for estimating true and effective numbers of types using Monte Carlo simulations. Our simulation results suggest that the examined estimators of the true number of types have very similar statistical properties, whereas the estimators of effective number of types have quite different statistical properties. Moreover, our new proposed estimator of effective number of types is approximately unbiased, and has considerably lower variance than the original estimator. Our new method will help researchers more accurately estimate intracolony genetic relatedness of social insects, which is an important measure in understanding their ecology and social behaviour. It should also be of use in population genetic studies in which the effective number of alleles is of interest.
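Illustrative note (not part of the abstract): the effective number of types discussed above is conventionally defined as 1 / sum(p_i^2), where p_i is the frequency of type i. The Python sketch below computes the naive plug-in version and a version based on the standard unbiased estimate of sum(p_i^2) under multinomial sampling. The function names and the sample data are hypothetical, and neither function is claimed to be the new estimator derived in the paper.

```python
from collections import Counter


def effective_number(labels):
    """Naive plug-in estimate 1 / sum(p_i^2) from observed type labels."""
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one observation")
    counts = Counter(labels).values()
    return 1.0 / sum((c / n) ** 2 for c in counts)


def effective_number_corrected(labels):
    """Same quantity, using an unbiased estimate of sum(p_i^2).

    Illustration only: this is a textbook sample-size correction,
    not the bias-corrected estimator proposed in the abstract above.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(labels).values()
    # Under multinomial sampling, E[c(c-1)] = n(n-1) p^2 for each type.
    sum_p2 = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    if sum_p2 == 0:
        return float("inf")  # every observation is a distinct type
    return 1.0 / sum_p2


# Hypothetical paternity assignments for ten offspring of one queen.
fathers = ["A"] * 6 + ["B"] * 3 + ["C"]
print(effective_number(fathers))
print(effective_number_corrected(fathers))
```

With few offspring sampled, the plug-in value tends to understate the effective number because sampling inflates the observed sum of squared frequencies, which is the kind of bias the abstract refers to.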

6.
Knowledge of relatedness between pairs of individuals plays an important role in many research areas including evolutionary biology, quantitative genetics, and conservation. Pairwise relatedness estimation methods based on genetic data from highly variable molecular markers are now used extensively as a substitute for pedigrees. Although the sampling variance of the estimators has been intensively studied for the most common simple genetic relationships, such as unrelated, half- and full-sib, or parent-offspring, little attention has been paid to the average performance of the estimators, by which we mean the performance across all pairs of individuals in a sample. Here we apply two measures to quantify the average performance: first, misclassification rates between pairs of genetic relationships and, second, the proportion of variance explained in the pairwise relatedness estimates by the true population relatedness composition (i.e., the frequencies of different relationships in the population). Using simulated data derived from exceptionally good quality marker and pedigree data from five long-term projects of natural populations, we demonstrate that the average performance depends mainly on the population relatedness composition and may be improved by the marker data quality only within the limits of the population relatedness composition. Our five examples of vertebrate breeding systems suggest that due to the remarkably low variance in relatedness across the population, marker-based estimates may often have low power to address research questions of interest.

7.
Cautions on direct gene flow estimation in plant populations (total citations: 4; self-citations: 0; citations by others: 4)
Through simulations we have investigated the statistical properties of two of the main approaches for directly estimating pollen gene flow (m) in plant populations: genotypic exclusion and mating models. When the assumptions about accurately known background pollen pool allelic frequencies are met, both methods provide unbiased results with comparable variances across a range of true m values. However, when presumed allelic frequencies differ from actual ones, which is more likely in research practice, both estimators are biased. We demonstrate that the extent and direction of bias largely depend on the difference (measured as genetic distance) between the presumed and actual pollen pools, and on the degree of genetic differentiation between the local population and the actual background pollen sources. However, one feature of the mating model is its ability to estimate pollen gene flow simultaneously with background pollen pool allelic frequencies. We have found that this approach gives nearly unbiased pollen gene flow estimates, and is practical because it eliminates the necessity of providing independent estimates of background pollen pool allelic frequencies. Violations of the mating model assumptions of random mating within local population affect the precision of the estimates only to a limited degree.

8.
Studies of inbreeding depression or kin selection require knowledge of relatedness between individuals. If pedigree information is lacking, one has to rely on genotypic information to infer relatedness. In this study we investigated the performance (absolute and relative) of 10 marker-based relatedness estimators using allele frequencies at microsatellite loci obtained from natural populations of two bird species and one mammal species. Using Monte Carlo simulations we show that many factors affect the performance of estimators and that different sets of loci promote the use of different estimators: in general, there is no single best-performing estimator. The use of locus-specific weights turns out to greatly improve the performance of estimators when marker loci are used that differ strongly in allele frequency distribution. Microsatellite-based estimates are expected to explain between 25 and 79% of variation in true relatedness depending on the microsatellite dataset and on the population composition (i.e. the frequency distribution of relationship in the population). We recommend performing Monte Carlo simulations to decide which estimator to use in studies of pairwise relatedness.

9.
The inference of population genetic structures is essential in many research areas in population genetics, conservation biology and evolutionary biology. Recently, unsupervised Bayesian clustering algorithms have been developed to detect a hidden population structure from genotypic data, assuming among others that individuals taken from the population are unrelated. Under this assumption, markers in a sample taken from a subpopulation can be considered to be in Hardy-Weinberg and linkage equilibrium. However, close relatives might be sampled from the same subpopulation, and consequently, might cause Hardy-Weinberg and linkage disequilibrium and thus bias a population genetic structure analysis. In this study, we used simulated and real data to investigate the impact of close relatives in a sample on Bayesian population structure analysis. We also showed that, when close relatives were identified by a pedigree reconstruction approach and removed, the accuracy of a population genetic structure analysis can be greatly improved. The results indicate that unsupervised Bayesian clustering algorithms cannot be used blindly to detect genetic structure in a sample with closely related individuals. Rather, when closely related individuals are suspected to be frequent in a sample, these individuals should be first identified and removed before conducting a population structure analysis.

10.
Maximum-likelihood estimation of relatedness (total citations: 8; self-citations: 0; citations by others: 8)
Milligan BG. Genetics, 2003, 163(3): 1153-1167
Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to enable molecular marker data to quantify relatedness. Despite this, no effort has been given to characterize the traditional maximum-likelihood estimator in relation to the remainder. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias in fact is quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve over the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.

11.

Background

Genetic relatedness or similarity between individuals is a key concept in population, quantitative and conservation genetics. When the pedigree of a population is available and assuming a founder population from which the genealogical records start, genetic relatedness between individuals can be estimated by the coancestry coefficient. If pedigree data is lacking or incomplete, estimation of the genetic similarity between individuals relies on molecular markers, using either molecular coancestry or molecular covariance. Some relationships between genealogical and molecular coancestries and covariances have already been described in the literature.

Methods

We show how the expected values of the empirical measures of similarity based on molecular marker data are functions of the genealogical coancestry. From these formulas, it is easy to derive estimators of genealogical coancestry from molecular data. We include variation of allelic frequencies in the estimators.

Results

The estimators are illustrated with simulated examples and with a real dataset from dairy cattle. In general, estimators are accurate and only slightly biased. From the real data set, estimators based on covariances are more compatible with genealogical coancestries than those based on molecular coancestries. A frequently used estimator based on the average of estimated coancestries produced inflated coancestries and numerical instability. The consequences of unknown gene frequencies in the founder population are briefly discussed, along with alternatives to overcome this limitation.

Conclusions

Estimators of genealogical coancestry based on molecular data are easy to derive. Estimators based on molecular covariance are more accurate than those based on identity by state. A correction considering the random distribution of allelic frequencies improves accuracy of these estimators, especially for populations with very strong drift.
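Illustrative note (not part of the abstract): molecular coancestry between two diploid individuals is commonly taken to be the probability that two alleles drawn at random, one from each individual, are identical by state, averaged over loci. The Python sketch below computes that empirical similarity measure under this assumed definition. The function name, the genotype encoding and the example data are hypothetical, and the sketch does not implement the frequency-corrected estimators of genealogical coancestry described above.

```python
def molecular_coancestry(genotype_x, genotype_y):
    """Mean identity-by-state coancestry across loci for two diploids.

    Each genotype is a list of (allele, allele) tuples, one per locus,
    with loci in the same order for both individuals. At each locus the
    four cross-individual allele pairs are compared and the matching
    fraction is recorded; the result is the average over loci.
    """
    if len(genotype_x) != len(genotype_y):
        raise ValueError("genotypes must cover the same loci")
    if not genotype_x:
        raise ValueError("need at least one locus")
    per_locus = []
    for (a, b), (c, d) in zip(genotype_x, genotype_y):
        matches = (a == c) + (a == d) + (b == c) + (b == d)
        per_locus.append(matches / 4)
    return sum(per_locus) / len(per_locus)


# Hypothetical three-locus genotypes.
ind_x = [("A", "A"), ("A", "B"), ("C", "D")]
ind_y = [("A", "A"), ("A", "B"), ("E", "F")]
print(molecular_coancestry(ind_x, ind_y))
```

Because this measure counts identity by state rather than identity by descent, it generally exceeds genealogical coancestry unless it is adjusted for allele frequencies, which is the relationship the abstract's estimators exploit.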

12.
Twenty-six individuals of the sporophytic self-incompatible (SSI) weed, Senecio squalidus, were crossed in a full diallel to determine the number and frequency of S alleles in an Oxford population. Incompatibility phenotypes were determined by fruit-set results and the mating patterns observed fitted a SSI model that allowed us to identify six S alleles. Standard population S allele number estimators were modified to deal with S allele data from a species with SSI. These modified estimators predicted a total number of approximately six S alleles for the entire Oxford population of S. squalidus. This estimate of S allele number is low compared to other estimates of S allele diversity in species with SSI. Low S allele diversity in S. squalidus is expected to have arisen as a consequence of a disturbed population history since its introduction and subsequent colonisation of the British Isles. Other features of the SSI system in S. squalidus were also investigated: (a) the strength of self-incompatibility response; (b) the nature of S allele dominance interactions; and (c) the relative frequencies of S phenotypes. These are discussed in view of the low S allele diversity estimates and the known population history of S. squalidus.

13.
Several estimators have been proposed that use molecular marker data to infer the degree of relatedness for pairs of individuals. The objective of this study was to evaluate the performance of seven estimators when applied to marker data of a set of 33 key individuals from a large complex apple pedigree. The evaluation considered different scenarios of allele frequencies and different numbers of marker loci. The method of moments estimators were Similarity, Queller-Goodknight, Lynch-Ritland and Wang. The maximum likelihood estimators were Thompson, Anderson-Weir and Jacquard. The pedigree-based coancestry coefficients were taken as the point of reference in calculating correlations and root mean square error (RMSE). The marker data comprised 86 multi-allelic SSR markers on 17 linkage groups, covering 11 Morgans. Additionally, we simulated 10 datasets conditional on the real pedigree to support the results on the real dataset. None of the estimators outperformed the others. Knowledge of allele frequencies appeared to be the most influential, i.e., the highest correlations and lowest RMSE were found when frequencies from the founder population were available. When equal allele frequencies were used, all estimators resulted in very similar, but on average lower, correlations. The use of allele frequencies estimated from the set of 33 individuals gave, on average, the poorest results. The maximum likelihood estimators and the Lynch-Ritland estimator were the most sensitive to allele frequencies. The results from the simulation study fully supported the trends in results of the real dataset. This study indicated that high correlations (up to 0.90) and small RMSE (below 0.03), may be obtained when population allelic frequencies are available. In this scenario, the performances of the various estimators were similar, but seemed to favor the maximum likelihood estimators. 
In the absence of reliable allele frequencies the method of moments estimators were shown to be more robust. The number of marker loci influenced the average performance of the estimators; however, the ranking was not affected. Correlations up to 0.80 were obtained when two markers per chromosome and appropriate allele frequencies were available. Adding more markers to the current dataset may lead to marginal improvements.

14.
Greaves S, Sanson B, White P, Vincent JP. Genetics, 1999, 152(4): 1753-1766
Applications of quantitative genetics and conservation genetics often require measures of pairwise relationships between individuals, which, in the absence of known pedigree structure, can be estimated only by use of molecular markers. Here we introduce methods for the joint estimation of the two-gene and four-gene coefficients of relationship from data on codominant molecular markers in randomly mating populations. In a comparison with other published estimators of pairwise relatedness, we find these new "regression" estimators to be computationally simpler and to yield similar or lower sampling variances, particularly when many loci are used or when loci are hypervariable. Two examples are given in which the new estimators are applied to natural populations, one that reveals isolation-by-distance in an annual plant and the other that suggests a genetic basis for a coat color polymorphism in bears.

15.
Simultaneous estimation of null alleles and inbreeding coefficients (total citations: 1; self-citations: 0; citations by others: 1)
Although microsatellites are a very efficient tool for many population genetics applications, they may occasionally produce "null" alleles, which, when present in high proportion, may affect estimates of key parameters such as inbreeding and relatedness coefficients or measures of genetic differentiation. In order to account for the presence of null alleles, it is first necessary to estimate their frequency within studied populations. However, the commonly used null allele frequency estimators are not of general applicability because they can produce upwardly biased estimates when a population under study experiences some inbreeding. In such a case, 2 formerly described approaches, population inbreeding model and individual inbreeding model, can be applied for simultaneous estimation of null allele frequencies and of the inbreeding coefficient. In this study, we demonstrate the properties and utility of these 2 methods and show that they outperform the commonly used approaches in the estimation of null allele frequencies based on genotypic data. The methods are applied to empirical data from a natural population of European beech (Fagus sylvatica L.), and results are briefly discussed. The methods presented in this paper are implemented in the Windows-based user-friendly INEST computer program (available free of charge at http://genetyka.ukw.edu.pl/INEst10_setup.exe).

16.
Genetic relatedness is a vital parameter in the evolution of social behaviour by kin selection. It can be easily estimated using genetic markers and calculating the genotypic correlation or regression of group members. Spatial gene frequency differentiation, due to population subdivision or isolation by distance, boosts the relatedness estimates. In such cases it may be useful to partition the estimate into components; the operational relatedness is normally that among individuals in social groups within the same subpopulation. Although it is straightforward to estimate the average relatedness in social groups, estimating values for specific individuals with the help of genetic markers is still problematic. Current estimators tend to give biased values and the sampling error is large. In spite of these shortcomings, studies of social behaviour combining relatedness and reproductive success are sorely needed.

17.
The Demerelate package offers algorithms to calculate different interindividual relatedness measurements. Three different allele sharing indices, five pairwise weighted estimates of relatedness and four pairwise weighted estimates with sample size correction are implemented to analyse kinship structures within populations. Statistics are based on randomization tests; modelling relatedness coefficients by logistic regression, modelling relatedness with geographic distance by Mantel correlation and comparing mean relatedness between populations using pairwise t-tests. Demerelate provides an advance on previous software packages by including some estimators not available in R to date, along with FIS, as well as combining analysis of relatedness and spatial structuring. A UPGMA tree visualizes genetic relatedness among individuals. Additionally, Demerelate summarizes information on data sets (allele vs. genotype frequencies; heterozygosity; FIS values). Demerelate is, to our knowledge, the first R package implementing basic allele sharing indices such as Blouin's Mxy relatedness, the estimator of Wang corrected for sample size (wangxy), estimators based on Moran's I adapted to genetic relatedness as well as combining all estimators with geographic information. The R environment enables users to better understand relatedness within populations due to the flexibility of Demerelate of accepting different data sets as empirical data, reference data, geographical data and by providing intermediate results. Each statistic and tool can be used separately, which helps to understand the suitability of the data for relatedness analysis, and can be easily implemented in custom pipelines.

18.
Studies in genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. Many diploid estimators have been developed using either method-of-moments or maximum-likelihood estimates. However, there are no relatedness estimators for polyploids. The development of a moment estimator for polyploids with polysomic inheritance, which simultaneously incorporates the two-gene relatedness coefficient and various 'higher-order' coefficients, is described here. The performance of the estimator is compared to other estimators under a variety of conditions. When using a small number of loci, the estimator is biased because of an increase in ill-conditioned matrices. However, the estimator becomes asymptotically unbiased with large numbers of loci. The ambiguity of polyploid heterozygotes (when balanced heterozygotes cannot be distinguished from unbalanced heterozygotes) is also considered; as with low numbers of loci, genotype ambiguity leads to bias. A software package, PolyRelatedness, implementing this method and supporting a maximum ploidy of 8 is provided.

19.
The coancestry coefficient, also known as the population structure parameter, is of great interest in population genetics. It can be thought of as the intraclass correlation of pairs of alleles within populations and it can serve as a measure of genetic distance between populations. For a general class of evolutionary models it determines the distribution of allele frequencies among populations. Under more restrictive models it can be regarded as the probability of identity by descent of any pair of alleles at a locus within a random mating population. In this paper we review estimation procedures that use the method of moments or are maximum likelihood under the assumption of normally distributed allele frequencies. We then consider the problem of testing hypotheses about this parameter. In addition to parametric and non-parametric bootstrap tests we present an asymptotically-distributed chi-square test. This test reduces to the contingency-table test for equal sample sizes across populations. Our new test appears to be more powerful than previous tests, especially for loci with multiple alleles. We apply our methods to HapMap SNP data to confirm that the coancestry coefficient for humans is strictly positive.

20.
Genome-wide association studies (GWASs) are commonly used for the mapping of genetic loci that influence complex traits. A problem that is often encountered in both population-based and family-based GWASs is that of identifying cryptic relatedness and population stratification because it is well known that failure to appropriately account for both pedigree and population structure can lead to spurious association. A number of methods have been proposed for identifying relatives in samples from homogeneous populations. A strong assumption of population homogeneity, however, is often untenable, and many GWASs include samples from structured populations. Here, we consider the problem of estimating relatedness in structured populations with admixed ancestry. We propose a method, REAP (relatedness estimation in admixed populations), for robust estimation of identity by descent (IBD)-sharing probabilities and kinship coefficients in admixed populations. REAP appropriately accounts for population structure and ancestry-related assortative mating by using individual-specific allele frequencies at SNPs that are calculated on the basis of ancestry derived from whole-genome analysis. In simulation studies with related individuals and admixture from highly divergent populations, we demonstrate that REAP gives accurate IBD-sharing probabilities and kinship coefficients. We apply REAP to the Mexican Americans in Los Angeles, California (MXL) population sample of release 3 of phase III of the International Haplotype Map Project; in this sample, we identify third- and fourth-degree relatives who have not previously been reported. We also apply REAP to the African American and Hispanic samples from the Women's Health Initiative SNP Health Association Resource (WHI-SHARe) study, in which hundreds of pairs of cryptically related individuals have been identified.
