Similar articles
1.
Maximum-likelihood estimation of relatedness
Milligan BG 《Genetics》2003,163(3):1153-1167
Relatedness between individuals is central to many studies in genetics and population biology. A variety of estimators have been developed to enable molecular marker data to quantify relatedness. Despite this, no effort has been made to characterize the traditional maximum-likelihood estimator in relation to the others. This article quantifies its statistical performance under a range of biologically relevant sampling conditions. Under the same range of conditions, the statistical performance of five other commonly used estimators of relatedness is quantified. Comparison among these estimators indicates that the traditional maximum-likelihood estimator exhibits a lower standard error under essentially all conditions. Only for very large amounts of genetic information do most of the other estimators approach the likelihood estimator. However, the likelihood estimator is more biased than any of the others, especially when the amount of genetic information is low or the actual relationship being estimated is near the boundary of the parameter space. Even under these conditions, the amount of bias can be greatly reduced, potentially to biologically irrelevant levels, with suitable genetic sampling. Additionally, the likelihood estimator generally exhibits the lowest root mean-square error, an indication that the bias in fact is quite small. Alternative estimators restricted to yield only biologically interpretable estimates exhibit lower standard errors and greater bias than do unrestricted ones, but generally do not improve over the maximum-likelihood estimator and in some cases exhibit even greater bias. Although some nonlikelihood estimators exhibit better performance with respect to specific metrics under some conditions, none approach the high level of performance exhibited by the likelihood estimator across all conditions and all metrics of performance.
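The likelihood approach evaluated in item 1 can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes unlinked codominant loci, Hardy-Weinberg proportions and known allele frequencies, and it maximizes the likelihood by a plain grid search over the IBD coefficients (k0, k1, k2), reporting relatedness as r = k1/2 + k2. The function names are hypothetical. The grid covers the whole simplex and does not further restrict (k0, k1, k2) to biologically attainable combinations.

```python
import math
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]
Locus = Tuple[Genotype, Genotype, Dict[str, float]]  # (genotype X, genotype Y, allele freqs)


def hwe_prob(g: Genotype, freqs: Dict[str, float]) -> float:
    """Hardy-Weinberg probability of an unordered genotype."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_one_ibd(gx: Genotype, gy: Genotype, freqs: Dict[str, float]) -> float:
    """P(genotype Y | genotype X, exactly one allele identical by descent)."""
    u, v = gy
    total = 0.0
    for shared in gx:  # each allele of X is the IBD allele with probability 1/2
        if u == v:
            p = freqs[u] if shared == u else 0.0
        else:
            p = (freqs[v] if shared == u else 0.0) + (freqs[u] if shared == v else 0.0)
        total += 0.5 * p
    return total


def ml_relatedness(loci: List[Locus], steps: int = 100) -> Tuple[float, Tuple[float, float, float]]:
    """Grid-search ML estimate of (k0, k1, k2); returns (r, k) with r = k1/2 + k2.

    P(genotype X) does not depend on k, so it is dropped from the likelihood.
    """
    per_locus = []
    for gx, gy, freqs in loci:
        same = sorted(gx) == sorted(gy)
        per_locus.append((hwe_prob(gy, freqs), prob_one_ibd(gx, gy, freqs), 1.0 if same else 0.0))

    best_ll, best_k = -math.inf, (1.0, 0.0, 0.0)
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k0, k1 = i / steps, j / steps
            k2 = max(0.0, 1.0 - k0 - k1)
            ll = 0.0
            for p0, p1, p2 in per_locus:
                lik = k0 * p0 + k1 * p1 + k2 * p2
                if lik <= 0.0:  # this k makes the observed genotypes impossible
                    ll = -math.inf
                    break
                ll += math.log(lik)
            if ll > best_ll:
                best_ll, best_k = ll, (k0, k1, k2)
    k0, k1, k2 = best_k
    return k1 / 2 + k2, best_k
```

A grid is used only to keep the sketch dependency-free; a constrained numerical optimizer would normally replace it.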

2.
Wang J 《Genetics》2006,173(3):1679-1692
A variety of estimators have been developed to use genetic marker information in inferring the admixture proportions (parental contributions) of a hybrid population. The majority of these estimators used allele frequency data, ignored molecular information that is available in markers such as microsatellites and DNA sequences, and assumed that no mutations have occurred since the admixture event. As a result, these estimators may fail to deliver an estimate or give rather poor estimates when admixture is ancient and thus mutations are not negligible. A previous molecular estimator based its inference of admixture proportions on the average coalescent times between pairs of genes taken from within and between populations. In this article I propose an estimator that considers the entire genealogy of all of the sampled genes and infers admixture proportions from the numbers of segregating sites in DNA sequence samples. By considering the genealogy of all sequences rather than pairs of sequences, this new estimator also allows the joint estimation of other interesting parameters in the admixture model, such as admixture time, divergence time, population size, and mutation rate. Comparative analyses of simulated data indicate that the new coalescent estimator generally yields better estimates of admixture proportions than the previous molecular estimator, especially when the parental populations are not highly differentiated. It also gives reasonably accurate estimates of other admixture parameters. A human mtDNA sequence data set was analyzed to demonstrate the method, and the analysis results are discussed and compared with those from previous studies.
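For contrast with the coalescent estimator, the kind of allele-frequency estimator the abstract describes can be sketched as a least-squares fit of the hybrid frequencies to p_H = m·p_1 + (1 - m)·p_2, pooled over all alleles and loci. This is an illustrative sketch with a hypothetical function name, not the method of the article: it assumes no drift or mutation since admixture, which is exactly the assumption the abstract criticizes. Each frequency list is assumed to enumerate every allele at every locus in the same order.

```python
from typing import Sequence


def admixture_ls(p_hybrid: Sequence[float], p1: Sequence[float], p2: Sequence[float]) -> float:
    """Least-squares admixture proportion m from p_H = m*p1 + (1 - m)*p2.

    Minimizing sum((p_H - p2 - m*(p1 - p2))**2) over m gives
    m = sum((p_H - p2)*(p1 - p2)) / sum((p1 - p2)**2).
    """
    num = sum((h - b) * (a - b) for h, a, b in zip(p_hybrid, p1, p2))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    if den == 0:
        raise ValueError("parental populations have identical allele frequencies")
    return num / den
```

When the parental frequencies are nearly equal the denominator is small and the estimate becomes unstable, which mirrors the abstract's remark about weakly differentiated parental populations.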

3.
Relatedness estimators are widely used in genetic studies, but effects of population structure on performance of estimators, criteria to evaluate estimators, and benefits of using such estimators in conservation programs have to date received little attention. In this article we present new estimators, based on the relationship between coancestry and molecular similarity between individuals, and compare them with existing estimators using Monte Carlo simulation of populations, either panmictic or structured. Estimators were evaluated using statistical criteria and a diversity criterion that minimized relatedness. Results show that ranking of estimators depends on the population structure. An existing estimator based on two-gene and four-gene coefficients of identity performs best in panmictic populations, whereas a new estimator based on coancestry performs best in structured populations. The number of marker alleles and loci did not affect ranking of estimators. Statistical criteria were insufficient to evaluate estimators for their use in conservation programs. The regression coefficient of pedigree relatedness on estimated relatedness (beta2) was substantially lower than unity for all estimators, causing overestimation of the diversity conserved. A simple correction to achieve beta2 = 1 improves both existing and new estimators. Using relatedness estimates with correction considerably increased diversity in structured populations, but did not do so or even decreased diversity in panmictic populations.

4.
Perme MP  Stare J  Estève J 《Biometrics》2012,68(1):113-120
Estimation of relative survival has become the first and the most basic step when reporting cancer survival statistics. Standard estimators are in routine use by all cancer registries. However, it has been recently noted that these estimators do not provide information on cancer mortality that is independent of the national general population mortality. Thus they are not suitable for comparison between countries. Furthermore, the commonly used interpretation of the relative survival curve is vague and misleading. The present article attempts to remedy these basic problems. The population quantities of the traditional estimators are carefully described and their interpretation discussed. We then propose a new estimator of net survival probability that enables the desired comparability between countries. The new estimator requires no modeling and is accompanied by a straightforward variance estimate. The methods are illustrated on real as well as simulated data.

5.
Tallmon DA  Luikart G  Beaumont MA 《Genetics》2004,167(2):977-988
We describe and evaluate a new estimator of the effective population size (N(e)), a critical parameter in evolutionary and conservation biology. This new "SummStat" N(e) estimator is based upon the use of summary statistics in an approximate Bayesian computation framework to infer N(e). Simulations of a Wright-Fisher population with known N(e) show that the SummStat estimator is useful across a realistic range of individuals and loci sampled, generations between samples, and N(e) values. We also address the paucity of information about the relative performance of N(e) estimators by comparing the SummStat estimator to two recently developed likelihood-based estimators and a traditional moment-based estimator. The SummStat estimator is the least biased of the four estimators compared. In 32 of 36 parameter combinations investigated using initial allele frequencies drawn from a Dirichlet distribution, it has the lowest bias. The relative mean square error (RMSE) of the SummStat estimator was generally intermediate among the four estimators. All of the estimators had RMSE > 1 when small samples (n = 20, five loci) were collected a generation apart. In contrast, when samples were separated by three or more generations and N(e) ≤ 50, the SummStat and likelihood-based estimators all had greatly reduced RMSE. Under the conditions simulated, SummStat confidence intervals were more conservative than those of the likelihood-based estimators and more likely to include the true N(e). The greatest strength of the SummStat estimator is its flexible structure. This flexibility allows it to incorporate any potentially informative summary statistic from population genetic data.
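A "traditional moment-based estimator" of N(e) from two temporal samples is usually the temporal method. The sketch below assumes the Nei-Tajima standardized variance Fc and the correction commonly attributed to Waples (1989) for the case where the census-size term is ignored, Ne = t / (2 (Fc - 1/(2 S0) - 1/(2 St))), with S0 and St the numbers of individuals sampled. It is not one of the estimators from the article, the function names are hypothetical, and averaging Fc over loci without weights is a simplification.

```python
import math
from typing import Sequence


def nei_tajima_fc(x: Sequence[float], y: Sequence[float]) -> float:
    """Fc for one locus from allele frequencies at generation 0 (x) and generation t (y).

    Alleles absent (or fixed) in both samples have a zero denominator and are skipped.
    """
    terms = []
    for xi, yi in zip(x, y):
        den = (xi + yi) / 2 - xi * yi
        if den > 0:
            terms.append((xi - yi) ** 2 / den)
    return sum(terms) / len(terms)


def temporal_ne(loci_x, loci_y, t: int, s0: int, st: int) -> float:
    """Moment-based Ne from an unweighted mean of per-locus Fc values."""
    fc = sum(nei_tajima_fc(x, y) for x, y in zip(loci_x, loci_y)) / len(loci_x)
    drift = fc - 1 / (2 * s0) - 1 / (2 * st)  # remove the sampling-noise contribution
    return t / (2 * drift) if drift > 0 else math.inf
```

Returning infinity when the sampling correction exceeds Fc follows the usual convention that the data then carry no evidence of drift.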

6.
Studies of inbreeding depression or kin selection require knowledge of relatedness between individuals. If pedigree information is lacking, one has to rely on genotypic information to infer relatedness. In this study we investigated the performance (absolute and relative) of 10 marker-based relatedness estimators using allele frequencies at microsatellite loci obtained from natural populations of two bird species and one mammal species. Using Monte Carlo simulations we show that many factors affect the performance of estimators and that different sets of loci promote the use of different estimators: in general, there is no single best-performing estimator. The use of locus-specific weights turns out to greatly improve the performance of estimators when marker loci are used that differ strongly in allele frequency distribution. Microsatellite-based estimates are expected to explain between 25 and 79% of variation in true relatedness depending on the microsatellite dataset and on the population composition (i.e. the frequency distribution of relationship in the population). We recommend performing Monte Carlo simulations to decide which estimator to use in studies of pairwise relatedness.

7.
Royle JA 《Biometrics》2004,60(1):108-115
Spatial replication is a common theme in count surveys of animals. Such surveys often generate sparse count data from which it is difficult to estimate population size while formally accounting for detection probability. In this article, I describe a class of models (N-mixture models) that allow for estimation of population size from such data. The key idea is to view site-specific population sizes, N, as independent random variables distributed according to some mixing distribution (e.g., Poisson). Prior parameters are estimated from the marginal likelihood of the data, having integrated over the prior distribution for N. Carroll and Lombard (1985, Journal of the American Statistical Association 80, 423-426) proposed a class of estimators based on mixing over a prior distribution for detection probability. Their estimator can be applied in limited settings, but is sensitive to prior parameter values that are fixed a priori. Spatial replication provides additional information regarding the parameters of the prior distribution on N that is exploited by the N-mixture models and which leads to reasonable estimates of abundance from sparse data. A simulation study demonstrates superior operating characteristics (bias, confidence interval coverage) of the N-mixture estimator compared to the Carroll and Lombard estimator. Both estimators are applied to point count data on six species of birds, illustrating the sensitivity to the choice of prior on p and the substantially different abundance estimates that result.
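The N-mixture likelihood described above (Poisson mixing distribution for each site's N, binomial detection with probability p on each visit) can be written directly. This is a minimal sketch, not the author's code: the truncation bound `K` for the infinite sum over N and the crude grid search in `nmix_fit` are assumptions made to keep it dependency-free, and all names are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple


def log_poisson(n: int, lam: float) -> float:
    return n * math.log(lam) - lam - math.lgamma(n + 1)


def log_binomial(y: int, n: int, p: float) -> float:
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log1p(-p)


def nmix_loglik(counts: Sequence[Sequence[int]], lam: float, p: float, K: int = 200) -> float:
    """Sum over sites of log sum_N Pois(N; lam) * prod_t Bin(y_t; N, p).

    Assumes lam > 0 and 0 < p < 1; the sum over N runs from max(y) up to K.
    """
    total = 0.0
    for site in counts:
        terms = [
            log_poisson(n, lam) + sum(log_binomial(y, n, p) for y in site)
            for n in range(max(site), K + 1)
        ]
        m = max(terms)  # log-sum-exp for numerical stability
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total


def nmix_fit(counts, lams: Iterable[float], ps: Iterable[float], K: int = 200) -> Tuple[float, float]:
    """Crude grid-search maximum-likelihood estimate of (lam, p)."""
    grid = [(lam, p) for lam in lams for p in ps]
    return max(grid, key=lambda lp: nmix_loglik(counts, lp[0], lp[1], K))
```

With a single visit the marginal count is Poisson(λp) by binomial thinning, which is a convenient check: λ and p are then not separately identifiable, and it is the replicated visits per site that separate them.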

8.
An estimator for pairwise relatedness using molecular markers
Wang J 《Genetics》2002,160(3):1203-1215
I propose a new estimator for jointly estimating two-gene and four-gene coefficients of relatedness between individuals from an outbreeding population with data on codominant genetic markers and compare it, by Monte Carlo simulations, to previous ones in precision and accuracy for different distributions of population allele frequencies, numbers of alleles per locus, actual relationships, sample sizes, and proportions of relatives included in samples. In contrast to several previous estimators, the new estimator is well behaved and applies to any number of alleles per locus and any allele frequency distribution. The estimates for two- and four-gene coefficients of relatedness from the new estimator are unbiased irrespective of the sample size and have sampling variances decreasing consistently with an increasing number of alleles per locus to the minimum asymptotic values determined by the variation in identity-by-descent among loci per se, regardless of the actual relationship. The new estimator is also robust for small sample sizes and for unknown relatives being included in samples for estimating allele frequencies. Compared to previous estimators, the new one is generally advantageous, especially for highly polymorphic loci and/or small sample sizes.

9.
A Phylogenetic Estimator of Effective Population Size or Mutation Rate
Y. X. Fu 《Genetics》1994,136(2):685-692
A new estimator of the essential parameter θ = 4N(e)μ from DNA polymorphism data is developed under the neutral Wright-Fisher model without recombination and population subdivision, where N(e) is the effective population size and μ is the mutation rate per locus per generation. The new estimator has a variance only slightly larger than the minimum variance of all possible unbiased estimators of the parameter, and substantially smaller than that of any existing estimator. The high efficiency of the new estimator is achieved by making full use of phylogenetic information in a sample of DNA sequences from a population. An example of estimating θ by the new method is presented using the mitochondrial sequences from an American Indian population.

10.
We consider the estimation of the scaled mutation parameter θ, which is one of the parameters of key interest in population genetics. We provide a general result showing when estimators of θ can be improved using shrinkage when taking the mean squared error as the measure of performance. As a consequence, we show that Watterson’s estimator is inadmissible, and propose an alternative shrinkage-based estimator that is easy to calculate and has a smaller mean squared error than Watterson’s estimator for all possible parameter values 0 < θ < ∞. This estimator is admissible in the class of all linear estimators. We then derive improved versions for other estimators of θ, including the MLE. We also investigate how an improvement can be obtained both when combining information from several independent loci and when explicitly taking into account recombination. A simulation study provides information about the amount of improvement achieved by our alternative estimators.
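Why shrinking Watterson's estimator can help is easy to check numerically. Under the neutral model without recombination, the number of segregating sites S in a sample of n sequences has mean a_n·θ and variance a_n·θ + b_n·θ², with a_n = Σ_{i=1}^{n-1} 1/i and b_n = Σ_{i=1}^{n-1} 1/i². For a linear estimator c·S the MSE is convex in c with minimizer c_opt(θ) = a_n / (a_n/θ + a_n² + b_n), which lies below c* = a_n/(a_n² + b_n) < 1/a_n for every θ > 0; hence c*·S has smaller MSE than Watterson's S/a_n for all θ. This sketch illustrates that kind of dominance result; we have not checked that c* is exactly the estimator proposed in the article, and the function names are hypothetical.

```python
def a_n(n: int) -> float:
    """Sum_{i=1}^{n-1} 1/i."""
    return sum(1.0 / i for i in range(1, n))


def b_n(n: int) -> float:
    """Sum_{i=1}^{n-1} 1/i**2."""
    return sum(1.0 / i ** 2 for i in range(1, n))


def watterson(S: int, n: int) -> float:
    return S / a_n(n)


def shrunk_watterson(S: int, n: int) -> float:
    """Linear shrinkage c*S with c = a_n / (a_n**2 + b_n) (illustrative choice)."""
    a, b = a_n(n), b_n(n)
    return S * a / (a * a + b)


def mse_linear(c: float, theta: float, n: int) -> float:
    """MSE of c*S when E[S] = a_n*theta and Var[S] = a_n*theta + b_n*theta**2."""
    a, b = a_n(n), b_n(n)
    variance = c * c * (a * theta + b * theta ** 2)
    bias = (c * a - 1.0) * theta
    return variance + bias ** 2
```

The price of the lower MSE is a downward bias, which is the usual trade-off behind shrinkage estimators.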

11.
In sample surveys, it is usual to make use of auxiliary information to increase the precision of the estimators. We propose a new chain ratio estimator and regression estimator of a finite population mean using a linear combination of two auxiliary variables and obtain the mean squared error (MSE) equations for the proposed estimators. We find theoretical conditions that make the proposed estimators more efficient than the traditional multivariate ratio estimator and the regression estimator using information from two auxiliary variables.

12.
Methods to evaluate populations for alleles to improve an elite hybrid
Elite hybrids can be improved by the introgression of favorable alleles not already present in the hybrid. Our first objective was to evaluate several estimators derived from quantitative genetic theory that attempt to quantify the relative number of useful alleles in potential donor populations. Secondly, we wanted to evaluate two proposed ways of determining relatedness of donor populations to the parents of the elite hybrid. Two experiments, each consisting of 21 maize populations of known pedigree, were grown in three and four environments, respectively, in Minnesota in 1991. Yield and plant height means were used to provide estimates of each of the following statistics: (1) LPLU, a minimally biased statistic, (2) UBND, the minimum estimate of an upper bound, (3) NI, the net improvement, (4) PTC, the predicted three-way cross, and (5) TCSC, the testcross of the populations. These statistics are biased estimators of the relative number of unique favorable alleles contained within a population compared to a reference elite hybrid. Based on rank correlations, all statistics except NI ranked populations similarly. The percent novel germplasm relative to the single cross to be improved was positively correlated with the estimates of favorable alleles except when NI was used as the estimator. The relationship estimators agreed with the genetic constitution of the donor populations. Strong positive correlations existed between diversity, based on the relationship rankings, and all the estimator rankings, except NI. Potential donor populations were effectively identified by LPLU, UBND, PTC, and TCSC. NI was not a good estimator of unique favorable alleles.

13.
Anderson AD  Weir BS 《Genetics》2007,176(1):421-440
A maximum-likelihood estimator for pairwise relatedness is presented for the situation in which the individuals under consideration come from a large outbred subpopulation of the population for which allele frequencies are known. We demonstrate via simulations that a variety of commonly used estimators that do not take this kind of misspecification of allele frequencies into account will systematically overestimate the degree of relatedness between two individuals from a subpopulation. A maximum-likelihood estimator that includes F(ST) as a parameter is introduced with the goal of producing the relatedness estimates that would have been obtained if the subpopulation allele frequencies had been known. This estimator is shown to work quite well, even when the value of F(ST) is misspecified. Bootstrap confidence intervals are also examined and shown to exhibit close to nominal coverage when F(ST) is correctly specified.

14.
Important aspects of population evolution have been investigated using nucleotide sequences. Under the neutral Wright–Fisher model, the scaled mutation rate represents twice the average number of new mutations per generation, and it is one of the key parameters in population genetics. In this study, we present various methods for estimating this parameter, analytical studies of their asymptotic behavior, and simulation-based comparisons of the distributions of these estimators. As knowledge of the genealogy is needed to compute the maximum likelihood estimator (MLE), an application to real data is also presented, using the jackknife to correct the bias of the MLE that can arise from estimating the tree. We proved analytically that Watterson's estimator and the MLE are asymptotically equivalent, with the same rate of convergence to normality. Furthermore, we showed that the MLE has a better rate of convergence than Watterson's estimator for values of the parameter greater than one, and that this relationship is reversed when the parameter is less than one.
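The jackknife bias correction mentioned above has a simple generic form: with n observations, full-sample estimate θ̂ and leave-one-out estimates θ̂_(-i), the corrected estimate is n·θ̂ - (n - 1)·mean(θ̂_(-i)). The sketch below is generic and uses a hypothetical helper name; in the article's setting the left-out units and the tree-based MLE are more involved. As a checkable example it corrects the plug-in variance (divisor n), for which the jackknife is known to return exactly the unbiased variance (divisor n - 1).

```python
from statistics import fmean
from typing import Callable, List, Sequence


def jackknife_bias_corrected(data: Sequence[float],
                             estimator: Callable[[List[float]], float]) -> float:
    """Leave-one-out jackknife: n * theta_hat - (n - 1) * mean(leave-one-out estimates)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    full = estimator(list(data))
    loo = [estimator(list(data[:i]) + list(data[i + 1:])) for i in range(n)]
    return n * full - (n - 1) * fmean(loo)


def plugin_variance(xs: List[float]) -> float:
    """Biased variance estimate with divisor n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

The correction removes bias terms of order 1/n; it does not guarantee that the corrected estimate has lower mean squared error.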

15.
Hardy OJ 《Molecular ecology》2003,12(6):1577-1588
A new estimator of the pairwise relatedness coefficient between individuals adapted to dominant genetic markers is developed. This estimator does not assume genotypes to be in Hardy-Weinberg proportions but requires a knowledge of the departure from these proportions (i.e. the inbreeding coefficient). Simulations show that the estimator provides accurate estimates, except for some particular types of individual pairs such as full-sibs, and performs better than a previously developed estimator. When comparing marker-based relatedness estimates with pedigree expectations, a new approach to account for the change of the reference population is developed and shown to perform satisfactorily. Simulations also illustrate that this new relatedness estimator can be used to characterize isolation by distance within populations, leading to essentially unbiased estimates of the neighbourhood size. In this context, the estimator appears fairly robust to moderate errors made on the assumed inbreeding coefficient. The analysis of real data sets suggests that dominant markers (random amplified polymorphic DNA, amplified fragment length polymorphism) may be as valuable as co-dominant markers (microsatellites) in studying microgeographic isolation-by-distance processes. It is argued that the estimators developed should find major applications, notably for conservation biology.

16.
Ratio estimation with measurement error in the auxiliary variate
Gregoire TG  Salas C 《Biometrics》2009,65(2):590-598
With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In this contribution we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties of three common ratio estimators. We examine the case of systematic measurement error as well as measurement error that varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when they are contaminated with measurement error we provide numerical results based on a specific population. Under systematic measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on the magnitude of the error. Under variable measurement error, bias of the conventional ratio-of-means estimator increased slightly with increasing error dispersion, but far less than the increased bias of the conventional mean-of-ratios estimator. In similar fashion, the variance of the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion compared with the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to the effects of measurement error in the auxiliary variate.
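The two conventional estimators named above can be sketched for a simple random sample with a known population total τ_x of the auxiliary variate. The ratio-of-means estimator is τ_x·Σy/Σx, and the mean-of-ratios estimator is τ_x·mean(y_i/x_i). This is an illustration only: the function names are hypothetical, the third estimator studied in the article is not shown, and measurement error in x is not modeled.

```python
from typing import Sequence


def ratio_of_means_total(y: Sequence[float], x: Sequence[float], tau_x: float) -> float:
    """Estimate the population total of y as tau_x * (sum y / sum x)."""
    return tau_x * sum(y) / sum(x)


def mean_of_ratios_total(y: Sequence[float], x: Sequence[float], tau_x: float) -> float:
    """Estimate the population total of y as tau_x * mean(y_i / x_i)."""
    return tau_x * sum(yi / xi for yi, xi in zip(y, x)) / len(y)
```

Because the mean-of-ratios form divides each y_i by its own x_i, an error in a small x_i is amplified in that unit's ratio, which is consistent with the abstract's finding that this estimator is the more sensitive one.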

17.
Jinliang Wang 《Molecular ecology》2016,25(19):4692-4711
In molecular ecology and conservation genetics studies, the important parameter of effective population size (Ne) is increasingly estimated from a single sample of individuals taken at random from a population and genotyped at a number of marker loci. Several estimators have been developed, based on the information of linkage disequilibrium (LD), heterozygote excess (HE), molecular coancestry (MC) and sibship frequency (SF) in marker data. The most popular is the LD estimator, because it is more accurate than the HE and MC estimators and is simpler to calculate than the SF estimator. However, little is known about the accuracy of the LD estimator relative to that of the SF estimator, or about the robustness of all single-sample estimators when some simplifying assumptions (e.g. random mating, no linkage, no genotyping errors) are violated. This study fills these gaps, using extensive simulations to compare the biases and accuracies of the four estimators for different population properties (e.g. bottlenecks, nonrandom mating, haplodiploidy), marker properties (e.g. linkage, polymorphism) and sample properties (e.g. numbers of individuals and markers), and to compare the robustness of the four estimators when marker data are imperfect (with allelic dropouts). The simulations show that the SF estimator is more accurate, has a much wider scope of application (e.g. nonrandom mating such as selfing, haplodiploid species, dominant markers) and is more robust (e.g. to linkage and genotyping errors of markers) than the other estimators. An empirical data set from a Yellowstone grizzly bear population was analysed to demonstrate the use of the SF estimator in practice.
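For orientation, the core idea behind the LD estimator can be sketched in its simplest uncorrected form. For unlinked loci in a closed random-mating population, the expected squared correlation between loci is often approximated as E[r²] ≈ 1/(3Ne) + 1/S, where S is the number of sampled individuals, which gives Ne ≈ 1/(3(r̄² - 1/S)). This approximation is known to be biased for small samples, and practical software applies further corrections; computing r² as the squared correlation of unphased genotype dosages (0, 1, 2) is also an approximation. All names are hypothetical, and this is not the SF estimator favored by the study.

```python
import math
from typing import Sequence


def dosage_r2(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared Pearson correlation between genotype dosages at two loci."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        raise ValueError("monomorphic locus in the sample")
    return suv * suv / (suu * svv)


def ld_ne(mean_r2: float, s: int) -> float:
    """Uncorrected LD estimate Ne = 1 / (3 * (mean r^2 - 1/S))."""
    drift = mean_r2 - 1.0 / s  # subtract the expected sampling contribution
    return 1.0 / (3.0 * drift) if drift > 0 else math.inf
```

In practice r̄² would be averaged over many pairs of unlinked loci before calling `ld_ne`.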

18.
Introduction: Surveillance networks are often neither exhaustive nor completely complementary. In such situations, capture-recapture methods can be used for incidence estimation. However, the choice of estimator and the estimators' robustness with respect to the homogeneity and independence assumptions are not well documented.
Methods: We investigated the performance of five different capture-recapture estimators in a simulation study. Eight different scenarios were used to detect and combine case information. The scenarios increasingly violated the assumptions of independence of samples and homogeneity of detection probabilities. Belgian datasets on invasive pneumococcal disease (IPD) and pertussis provided motivating examples.
Results: No estimator was unbiased in all scenarios. Performance of the parametric estimators depended on how much of the dependency and heterogeneity was correctly modelled. Model building was limited by parameter estimability, availability of additional information (e.g. covariates) and the possibilities inherent to the method. In the most complex scenario, methods that allowed for detection probabilities conditional on previous detections estimated the total population size within a 20–30% error range. Parametric estimators remained stable if individual data sources lost up to 50% of their data. The investigated non-parametric methods were more susceptible to data loss, and their performance was linked to the dependence between samples: they overestimated in scenarios with little dependence and underestimated in others. Issues with parameter estimability made it impossible to model all suggested relations between samples for the IPD and pertussis datasets. For IPD, the estimates of the Belgian incidence for cases aged 50 years and older ranged from 44 to 58/100,000 in 2010. The estimates for pertussis (all ages, Belgium, 2014) ranged from 24.2 to 30.8/100,000.
Conclusion: We encourage the use of capture-recapture methods, but epidemiologists should preferably include datasets for which the underlying dependency structure is not too complex, investigate this structure a priori, compensate for it within the model, and interpret the results with the remaining unmodelled heterogeneity in mind.
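The simplest capture-recapture setting, two surveillance sources, illustrates the idea. If source 1 records n1 cases, source 2 records n2, and m cases appear in both, the Lincoln-Petersen estimate of the total is n1·n2/m, and Chapman's bias-reduced variant is (n1 + 1)(n2 + 1)/(m + 1) - 1. Both assume independent sources and homogeneous detection, precisely the assumptions the study progressively violates; this sketch is not one of the five estimators the article necessarily compared, and the function names and numbers are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-source estimate of total cases: n1 * n2 / m."""
    if m == 0:
        raise ValueError("no overlap between sources; estimate undefined")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-reduced two-source estimator; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def incidence_per_100k(n_cases: float, population: float) -> float:
    """Convert an estimated case count to incidence per 100,000."""
    return 1e5 * n_cases / population
```

Positive dependence between sources inflates m and makes both estimators underestimate the total, while negative dependence does the opposite.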

19.
Procedures to estimate the genetic segregation parameter when ascertainment of families is incomplete have previously relied on iterative computer algorithms, since estimators in closed form are lacking. We now present the Minimum Variance Unbiased Estimator for the segregation parameter under any ascertainment probability. This estimator assumes a simple form when ascertainment is complete. We also present a simple estimator, akin to Li and Mantel's (1968) estimator, but without the restriction that ascertainment be complete. The performance of these estimators is compared with respect to asymptotic efficiency. We also provide tables giving the number of families of a given size that must be sampled to achieve a specified power for testing simple hypotheses on the segregation parameter.
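For reference, the Li and Mantel (1968) estimator cited above, which is valid under complete ascertainment, is commonly written p̂ = (R - J1)/(T - J1), where R is the total number of affected children, T the total number of children, and J1 the number of sibships with exactly one affected child. The sketch uses a hypothetical function name and does not implement the article's new estimators.

```python
from typing import Sequence, Tuple


def li_mantel(families: Sequence[Tuple[int, int]]) -> float:
    """Segregation ratio estimate from (sibship size, number affected) pairs.

    Assumes complete ascertainment, so every family has at least one affected child.
    """
    for size, affected in families:
        if not 1 <= affected <= size:
            raise ValueError("each family needs 1 <= affected <= sibship size")
    R = sum(a for _, a in families)                 # total affected
    T = sum(s for s, _ in families)                 # total children
    J1 = sum(1 for _, a in families if a == 1)      # sibships with a single affected
    if T == J1:
        raise ValueError("estimator undefined: all children are singleton probands")
    return (R - J1) / (T - J1)
```

Subtracting J1 from both counts removes the singly ascertained probands, which would otherwise inflate the apparent proportion affected.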

20.
Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, JASA 96, 1387-1398) proposed alternative bias-corrected covariance estimators for generalized estimating equations parameter estimates of regression models for marginal means. The finite sample properties of these estimators are compared to those of the uncorrected sandwich estimator, which underestimates variances in small samples. Although the formula of Mancl and DeRouen generally overestimates variances, it often leads to coverage of 95% confidence intervals near the nominal level even in some situations with as few as 10 clusters. An explanation for these seemingly contradictory results is that the tendency to undercoverage resulting from the substantial variability of sandwich estimators counteracts the impact of overcorrecting the bias. However, these positive results do not generally hold; for small cluster sizes (e.g., <10) the Mancl and DeRouen estimator often results in overcoverage, and the bias-corrected covariance estimator of Kauermann and Carroll may be preferred. The methods are illustrated using data from a nested cross-sectional cluster intervention trial on reducing underage drinking.
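The Mancl and DeRouen correction is easiest to see in the smallest possible model. Assume an identity-link GEE with independence working correlation and only an intercept: the estimate is the overall mean, the "bread" is N (total observations), and the leverage block for cluster i is H_ii = J/N, with J the n_i × n_i matrix of ones. Mancl and DeRouen replace each cluster residual vector e_i with (I - H_ii)^{-1} e_i; by the Sherman-Morrison formula this multiplies the cluster residual sum by N/(N - n_i). The sketch below follows from that derivation under these stated assumptions; the function name is hypothetical, and the Kauermann and Carroll variant, which uses (I - H_ii)^{-1/2}, is not shown.

```python
from typing import Sequence, Tuple


def cluster_mean_variances(clusters: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Intercept-only model: (mean, sandwich variance, Mancl-DeRouen variance).

    Requires at least two clusters so that N - n_i > 0 for every cluster.
    """
    N = sum(len(c) for c in clusters)
    mean = sum(sum(c) for c in clusters) / N
    meat_plain = 0.0
    meat_md = 0.0
    for c in clusters:
        s = sum(y - mean for y in c)                # cluster residual sum
        meat_plain += s * s
        meat_md += (s * N / (N - len(c))) ** 2      # leverage-inflated residual sum
    return mean, meat_plain / N ** 2, meat_md / N ** 2
```

With few clusters the factor N/(N - n_i) is large, which is why the corrected variance can overshoot, as the abstract notes.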
