Similar Articles
20 similar articles retrieved.
1.
Stephens and Donnelly have introduced a simple yet powerful importance sampling scheme for computing the likelihood in population genetic models. Fundamental to the method is an approximation to the conditional probability of the allelic type of an additional gene, given those currently in the sample. As noted by Li and Stephens, the product of these conditional probabilities for a sequence of draws that gives the frequency of allelic types in a sample is an approximation to the likelihood, and can be used directly in inference. The aim of this note is to demonstrate the high level of accuracy of the "product of approximate conditionals" (PAC) likelihood when used with microsatellite data. Results obtained on simulated microsatellite data show that this strategy leads to a negligible bias over a wide range of the scaled mutation parameter theta. Furthermore, the sampling variance of the likelihood estimates, as well as the computation time, is lower than that obtained with importance sampling across the whole range of theta. It follows that this approach is an efficient substitute for IS algorithms in computer-intensive (e.g., MCMC) inference methods in population genetics.
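As a toy illustration of the product-of-conditionals construction (not the Stephens-Donnelly microsatellite conditionals themselves), the sketch below uses the exact Hoppe-urn conditionals of a parent-independent mutation model, for which the product is an exact likelihood in theta; the PAC method instead uses approximate conditionals suited to stepwise microsatellite mutation and averages over orderings of the sample. The sample configuration and the grid of theta values are hypothetical.

```python
import numpy as np

def log_product_of_conditionals(sample, theta):
    """Log-likelihood of an ordered sample of allelic types, built one draw at a time."""
    counts, loglik = {}, 0.0
    for i, allele in enumerate(sample):
        if allele in counts:                       # type already seen: prob n_j / (i + theta)
            loglik += np.log(counts[allele] / (i + theta))
        else:                                      # new type: prob theta / (i + theta)
            loglik += np.log(theta / (i + theta))
        counts[allele] = counts.get(allele, 0) + 1
    return loglik

sample = ["A"] * 6 + ["B"] * 3 + ["C"]             # 10 genes, 3 allelic types (hypothetical)
thetas = np.linspace(0.1, 5, 50)
ll = [log_product_of_conditionals(sample, t) for t in thetas]
print("approximate MLE of theta:", round(thetas[int(np.argmax(ll))], 2))
```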

2.
We explore the estimation of uncertainty in evolutionary parameters using a recently devised approach for resampling entire additive genetic variance–covariance matrices (G). Large-sample theory shows that maximum-likelihood estimates (including restricted maximum likelihood, REML) asymptotically have a multivariate normal distribution, with covariance matrix derived from the inverse of the information matrix, and mean equal to the estimated G. This suggests that sampling estimates of G from this distribution can be used to assess the variability of estimates of G, and of functions of G. We refer to this as the REML-MVN method. This has been implemented in the mixed-model program WOMBAT. Estimates of sampling variances from REML-MVN were compared to those from the parametric bootstrap and from a Bayesian Markov chain Monte Carlo (MCMC) approach (implemented in the R package MCMCglmm). We apply each approach to evolvability statistics previously estimated for a large, 20-dimensional data set for Drosophila wings. REML-MVN and MCMC sampling variances are close to those estimated with the parametric bootstrap. Both slightly underestimate the error in the best-estimated aspects of the G matrix. REML analysis supports the previous conclusion that the G matrix for this population is full rank. REML-MVN is computationally very efficient, making it an attractive alternative to both data resampling and MCMC approaches to assessing confidence in parameters of evolutionary interest.
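A minimal numerical sketch of the REML-MVN idea (not the WOMBAT implementation): draw vectorized G matrices from a multivariate normal centred on the estimate, with a covariance matrix standing in for the inverse of the REML information matrix, and propagate the draws through a function of G. The G matrix, its sampling covariance, and the evolvability statistic below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

G_hat = np.array([[2.0, 0.5],
                  [0.5, 1.0]])              # REML estimate of G (2 traits, hypothetical)
idx = np.triu_indices(2)                    # positions of the unique elements of G
theta_hat = G_hat[idx]                      # vectorised estimate (g11, g12, g22)
V_hat = np.diag([0.10, 0.05, 0.08])         # hypothetical sampling covariance standing in
                                            # for the inverse information matrix

def unvec(theta):
    """Rebuild a symmetric 2x2 matrix from its upper-triangle vector."""
    G = np.zeros((2, 2))
    G[idx] = theta
    return G + np.triu(G, 1).T

# a function of G whose uncertainty we want: average evolvability over random directions
B = rng.standard_normal((1000, 2))
B /= np.linalg.norm(B, axis=1, keepdims=True)
def evolvability(G):
    return np.mean(np.einsum("ij,jk,ik->i", B, G, B))

# REML-MVN: sample the vectorised G and propagate the draws (draws are not forced to be
# positive definite; that is part of what the method's sampling distribution reflects)
draws = rng.multivariate_normal(theta_hat, V_hat, size=5000)
stats = np.array([evolvability(unvec(t)) for t in draws])
print(f"evolvability = {evolvability(G_hat):.3f} +/- {stats.std():.3f} (REML-MVN SE)")
```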

3.
Evolutionary biologists, ecologists and experimental gerontologists have increasingly used estimates of age-specific mortality as a critical component in studies of a range of important biological processes. However, the analysis of age-specific mortality rates is plagued by specific statistical challenges caused by sampling error. Here we discuss the nature of this ‘demographic sampling error’ and the way in which it can bias our estimates of (1) rates of ageing, (2) age at onset of senescence, (3) costs of reproduction and (4) demographic tests of evolutionary models of ageing. Our simulations suggest that, using standard statistical techniques, sample sizes on the order of tens of thousands would be needed in most experiments to effectively remove bias due to sampling error. We argue that biologists should use much larger sample sizes than have previously been used. However, we also present simple maximum likelihood models that effectively remove biases due to demographic sampling error even at relatively small sample sizes.

4.
Conway-Cranos LL, Doak DF. Oecologia 2011, 167(1): 199–207
Repeated, spatially explicit sampling is widely used to characterize the dynamics of sessile communities in both terrestrial and aquatic systems, yet our understanding of the consequences of errors made in such sampling is limited. In particular, when Markov transition probabilities are calculated by tracking individual points over time, misidentification of the same spatial locations will result in biased estimates of transition probabilities, successional rates, and community trajectories. Nonetheless, to date, all published studies that use such data have implicitly assumed that resampling occurs without error when making estimates of transition rates. Here, we develop and test a straightforward maximum likelihood approach, based on simple field estimates of resampling errors, to arrive at corrected estimates of transition rates between species in a rocky intertidal community. We compare community Markov models based on raw and corrected transition estimates using data from Endocladia muricata-dominated plots in a California intertidal assemblage, finding that uncorrected predictions of succession consistently overestimate recovery time. We tested the precision and accuracy of the approach using simulated datasets and found good performance of our estimation method over a range of realistic sample sizes and error rates.

5.
The supplemented case-control design consists of a case-control sample and an additional sample of disease-free subjects who arise from a given stratum of one of the exposures measured in the case-control study. The supplemental data might, for example, arise from a population survey conducted independently of the case-control study. This design improves the precision of estimates of main effects and especially of joint exposures, particularly when joint exposures are uncommon and the prevalence of one of the exposures is low. We first present a pseudo-likelihood estimator (PLE) that is easy to compute. We further adapt two-phase design methods to find maximum likelihood estimates (MLEs) of the log odds ratios for this design and derive asymptotic variance estimators that appropriately account for the ways in which the sampling scheme of this design differs from that of the traditional two-phase design. As an illustration of our design, we present a study that was conducted to assess the influence of joint exposure to hepatitis B virus (HBV) and hepatitis C virus (HCV) infection on the risk of hepatocellular carcinoma, using data from Qidong County, Jiangsu Province, China.

6.
A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information-theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported low-density forest when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used. Model support also varied depending on whether AIC or BIC was used. ISS supported this model, as well as additional models, when testing hypotheses about the land cover types that create the greatest resistance to gene flow for Columbia spotted frogs. Increased sampling density and study extent, seen by comparing PSS to PASS, changed model support. As the number of individuals increased, model support under ISS converged to that under PSS at 7–9 individuals. ISS may be useful for increasing study extent and sampling density, but may lack the power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on the effects of sampling design on landscape genetics inference.

7.
We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
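A minimal data-cloning sketch on a toy model (a Poisson mean with an arbitrary prior), rather than the hierarchical ecological models discussed in the paper: the data are cloned K times, a random-walk Metropolis sampler targets the cloned posterior, the posterior mean then approximates the MLE, and K times the posterior variance approximates its sampling variance.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.poisson(4.0, size=50)                # simulated data, true mean 4
K = 20                                       # number of clones

def log_post(lam):
    if lam <= 0:
        return -np.inf
    loglik = np.sum(y * np.log(lam) - lam)   # Poisson log-likelihood (up to a constant)
    logprior = -0.5 * np.log(lam) ** 2       # arbitrary smooth prior; its influence vanishes as K grows
    return K * loglik + logprior             # K-cloned likelihood plus a single prior

# random-walk Metropolis on lambda
lam, chain = 1.0, []
for _ in range(20000):
    prop = lam + 0.05 * rng.standard_normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(lam):
        lam = prop
    chain.append(lam)
chain = np.array(chain[5000:])               # discard burn-in

print("MLE approx:", chain.mean())                # close to y.mean()
print("SE approx :", np.sqrt(K * chain.var()))    # compare with sqrt(y.mean() / len(y))
```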

8.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider the use of SNPs to estimate the parameter Theta = 4N(e)mu (the scaled product of the effective population size and the per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Theta. With finite amounts of data the estimates are accurate when Theta is high, but tend to be biased upward when Theta is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Theta than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Theta. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.

9.
Accurate estimation of the size of animal populations is an important task in ecological science. Recent advances in molecular genetics allow the use of genetic data to estimate the size of a population from a single capture occasion rather than the repeated occasions of the usual capture–recapture experiments. Estimating population size using genetic data has also sometimes led to estimates that differ markedly from each other and from classical capture–recapture estimates. Here, we develop a closed-form estimator that uses genetic information to estimate the size of a population consisting of mothers and daughters, focusing on estimating the number of mothers, using data from a single sample. We demonstrate that the estimator is consistent and propose a parametric bootstrap to estimate the standard errors. The estimator is evaluated in a simulation study and applied to real data. We also consider maximum likelihood in this setting and identify problems that preclude its general use.

10.
Xie W, Lewis PO, Fan Y, Kuo L, Chen MH. Systematic Biology 2011, 60(2): 150–160
The marginal likelihood is commonly used for comparing different evolutionary models in Bayesian phylogenetics and is the central quantity in computing Bayes factors for comparing model fit. A popular method for estimating marginal likelihoods, the harmonic mean (HM) method, can be computed easily from the output of a Markov chain Monte Carlo analysis but often greatly overestimates the marginal likelihood. The thermodynamic integration (TI) method is much more accurate than the HM method but requires more computation. In this paper, we introduce a new method, stepping-stone sampling (SS), which uses importance sampling to estimate each ratio in a series (the "stepping stones") bridging the posterior and prior distributions. We compare the performance of the SS approach to the TI and HM methods in simulation and using real data. We conclude that the greatly increased accuracy of the SS and TI methods argues for their use instead of the HM method, despite the extra computation needed.
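A minimal stepping-stone sketch on a conjugate toy model (normal data with known variance and a normal prior), not a phylogenetic model: each power posterior along the path from prior to posterior can be sampled exactly, each "stone" is an importance-sampling estimate of a ratio of normalizing constants, and the analytic marginal likelihood is available for comparison. The data, prior, and beta schedule below are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(2)
y = rng.normal(1.0, 1.0, size=30)            # data; sigma^2 known and equal to 1
n, sigma2 = len(y), 1.0
mu0, tau20 = 0.0, 4.0                        # prior: theta ~ N(mu0, tau20)

def loglik(theta):
    return np.sum(norm.logpdf(y[:, None], loc=theta, scale=np.sqrt(sigma2)), axis=0)

betas = np.linspace(0, 1, 33) ** 3           # "stepping stones", concentrated near the prior
log_ml = 0.0
for b_prev, b_next in zip(betas[:-1], betas[1:]):
    # exact draw from the power posterior  prior(theta) * L(theta)^b_prev  (conjugate model)
    prec = 1.0 / tau20 + b_prev * n / sigma2
    mean = (mu0 / tau20 + b_prev * np.sum(y) / sigma2) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec), size=2000)
    # one stone: log E[ L(theta)^(b_next - b_prev) ] under that power posterior
    lw = (b_next - b_prev) * loglik(theta)
    log_ml += np.log(np.mean(np.exp(lw - lw.max()))) + lw.max()

# analytic log marginal likelihood for comparison
cov = sigma2 * np.eye(n) + tau20 * np.ones((n, n))
print("SS estimate:", log_ml)
print("analytic   :", multivariate_normal(mean=np.full(n, mu0), cov=cov).logpdf(y))
```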

11.
We consider inference for demographic models and parameters based upon postprocessing the output of an MCMC method that generates samples of genealogical trees (from the posterior distribution for a specific prior distribution of the genealogy). This approach has the advantage of taking account of the uncertainty in the inference for the tree when making inferences about the demographic model and can be computationally efficient in terms of reanalyzing data under a wide variety of models. We consider a (simulation-consistent) estimate of the likelihood for variable population size models, which uses importance sampling, and propose two new approximate likelihoods, one for migration models and one for continuous spatial models.

12.
Ecologists commonly use matrix models to study the population dynamics of plants. Most studies of plant demography use plot-based methods to collect data, in part because mapped individuals are easier to relocate in subsequent surveys and survey methods can be standardized among sites. However, there is tremendous variation among studies, both in terms of plot arrangement and the total area sampled. In addition, there has been little discussion of how alternative sampling arrangements influence estimates of population growth rates (λ) calculated with matrix models. We surveyed the literature to determine what sampling designs are most used in studies of plant demography using matrix models. We then used simulations of three common sampling techniques—using a single randomly placed plot, multiple randomly placed plots, and systematically distributed plots—to evaluate how these alternative strategies influenced the precision of estimates of λ. These simulations were based on long-term demographic data collected on 13 populations of the Amazonian understory herb Heliconia acuminata (Heliconiaceae). We found that the method used to collect data did not affect the bias or precision of estimates in our system—a surprising result, since the efficiency advantage gained from systematic sampling is well known from sampling theory. Because the statistical advantage of systematic sampling is most evident when there is spatial structure in demographic vital rates, we attribute this result to the lack of spatially structured vital rates in our focal populations. Given the likelihood of spatial autocorrelation in most ecological systems, we advocate sampling with a systematic grid of plots in each study site, and ensuring that enough area is sampled—both within and across sites—to encompass the range of spatial variation in plant survival, growth, and reproduction.
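For readers unfamiliar with the quantity being compared across sampling designs, the sketch below computes the asymptotic population growth rate λ as the dominant eigenvalue of a stage-structured projection matrix; the matrix entries are hypothetical and are not the Heliconia acuminata estimates from the study.

```python
import numpy as np

# hypothetical 3-stage projection matrix (stages: seedling, juvenile, adult);
# sub-diagonal entries are growth/survival, the top-right entry is adult fecundity
A = np.array([[0.20, 0.00, 1.50],
              [0.30, 0.50, 0.10],
              [0.00, 0.25, 0.90]])

lam = np.max(np.linalg.eigvals(A).real)   # dominant eigenvalue = asymptotic growth rate
print("lambda =", round(lam, 3))          # >1 population growing, <1 declining
```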

13.
Neuhaus JM, Scott AJ, Wild CJ. Biometrics 2006, 62(2): 488–494
Case-control studies augmented by the values of responses and covariates from family members allow investigators to study the association between the response and genetic and environmental factors by relating differences in the response directly to within-family differences in covariates. However, existing approaches for case-control family data parameterize covariate effects in terms of the marginal probability of response, the same effects that one estimates from standard case-control studies. This article focuses on the estimation of family-specific covariate effects and develops efficient methods to fit family-specific models such as binary mixed-effects models. We also extend the approach to cover any setting in which one has a fully specified model for the vector of responses in a family. We illustrate our approach using data from a case-control family study of brain cancer and consider the use of weighted and conditional likelihood methods as alternatives.

14.
Much work in nest survival analysis has used the maximum likelihood (ML) method. The ML method suffers from numerical instability when models with a large number of unknown parameters are used. A Bayesian approach to model fitting is developed to estimate age-specific survival rates for nesting studies using a large class of prior distributions. The computation is done by Gibbs sampling. Some latent variables are introduced to simplify the full conditional distributions. The method is illustrated using both a real and a simulated data set. Results indicate that the Bayesian analysis provides stable and accurate estimates of nest survival rates.
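A minimal Bayesian sketch in the spirit of this approach (conjugate Beta posteriors rather than the paper's Gibbs sampler with latent variables): age-specific daily nest survival is estimated from binomial exposure data with independent Beta priors. The exposure data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# per age class: (nest-days observed, nest-days survived) -- hypothetical field data
exposure = {"age 1-7": (400, 380), "age 8-14": (350, 338), "age 15-21": (300, 294)}
a0, b0 = 1.0, 1.0                                 # Beta(1, 1) prior on daily survival

for age, (n_days, n_surv) in exposure.items():
    # conjugate Beta posterior for the daily survival probability of this age class
    post = rng.beta(a0 + n_surv, b0 + (n_days - n_surv), size=10000)
    print(f"{age}: daily survival = {post.mean():.3f} "
          f"(95% CrI {np.quantile(post, 0.025):.3f}-{np.quantile(post, 0.975):.3f})")
```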

15.
DNA extracted from hair or faeces shows increasing promise for censusing populations whose individuals are difficult to locate. To date, the main problem with this approach has been that genotyping errors are common. If these errors are not identified, counting genotypes is likely to overestimate the number of individuals in a population. Here, we describe an algorithm that uses maximum likelihood estimates of genotyping error rates to calculate the evidence that samples came from the same individual. We test this algorithm with a hypothetical model of genotyping error and show that this algorithm works well with substantial rates of genotyping error and reasonable amounts of data. Additional work is necessary to develop statistical models of error in empirical data.
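A minimal sketch of the core calculation described here: a likelihood ratio that two multilocus genotypes came from the same individual versus from two different individuals, allowing for genotyping error. The simple error model (with probability e the recorded genotype is replaced by a random Hardy-Weinberg draw), the allele frequencies, and the example genotypes are placeholders, not the authors' maximum likelihood error model.

```python
import numpy as np

def hw_genotype_probs(p):
    """Hardy-Weinberg genotype probabilities for one biallelic locus (p = freq of allele 1)."""
    return {(0, 0): (1 - p) ** 2, (0, 1): 2 * p * (1 - p), (1, 1): p ** 2}

def p_obs_given_true(obs, true, hw, e):
    # with prob. 1-e the true genotype is recorded; with prob. e a random HW genotype is recorded
    return (1 - e) * (obs == true) + e * hw[obs]

def match_likelihood_ratio(g1, g2, freqs, e=0.02):
    lr = 1.0
    for locus, p in enumerate(freqs):
        hw = hw_genotype_probs(p)
        same = sum(hw[t] * p_obs_given_true(g1[locus], t, hw, e)
                         * p_obs_given_true(g2[locus], t, hw, e) for t in hw)
        diff = sum(hw[t] * p_obs_given_true(g1[locus], t, hw, e) for t in hw) \
             * sum(hw[t] * p_obs_given_true(g2[locus], t, hw, e) for t in hw)
        lr *= same / diff
    return lr

freqs = [0.3, 0.5, 0.7, 0.4]                       # hypothetical allele frequencies
g_a = [(0, 1), (1, 1), (0, 1), (0, 0)]             # hair sample A
g_b = [(0, 1), (0, 1), (0, 1), (0, 0)]             # faecal sample B, one mismatch
print("LR (same vs different individuals):", round(match_likelihood_ratio(g_a, g_b, freqs), 2))
```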

16.
Estimation of population size with a missing zero-class is an important problem encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by maximum likelihood and estimating the population size from this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable to count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and then using a Horvitz-Thompson estimator of population size. This was found to work well when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size is underestimated. In the search for a more robust estimator, we focused on three models that use all clusters with exactly one case, those with exactly two cases, and those with exactly three cases to estimate the probability of the zero-class, and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. The loss in efficiency associated with the gain in robustness was examined in a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferable in general. In applications, we recommend obtaining estimates from all three models and making a choice in light of the estimates from the three models, robustness, and the loss in efficiency.
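A minimal sketch of the baseline (unclustered) approach this work builds on: fit a zero-truncated Poisson by maximum likelihood and use a Horvitz-Thompson correction to add back the unobserved zero-class. The data are simulated; the clustered, robust extensions described in the paper are not shown.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(4)
full = rng.poisson(0.8, size=1000)            # true population of 1000 units
y = full[full > 0]                            # only units with at least one case are observed
n, ybar = len(y), y.mean()

# ML for the zero-truncated Poisson: solve E[Y | Y > 0] = lam / (1 - exp(-lam)) = ybar
lam_hat = brentq(lambda lam: lam / (1 - np.exp(-lam)) - ybar, 1e-6, 50)

# Horvitz-Thompson: each observed unit had inclusion probability 1 - exp(-lam)
N_hat = n / (1 - np.exp(-lam_hat))
print(f"observed units: {n}, estimated total (true 1000): {N_hat:.0f}")
```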

17.
The paper presents a method of multivariate data analysis based on a model that involves fixed effects, additive polygenic individual effects and the effects of a major gene. Model parameters are estimated by maximizing the likelihood function, with the maximum of the likelihood function computed using a Gibbs sampling approach in which values of all unknown parameters are generated from their conditional posterior distributions. On the basis of the resulting samples, the marginal posterior densities as well as the estimates of fixed effects, gene frequency, genotypic values, and the major-gene, polygenic and error (co)variances are calculated. A numerical example, supplementing the theoretical considerations, deals with data simulated according to the considered model.

18.
On estimation and prediction for spatial generalized linear mixed models
Zhang H. Biometrics 2002, 58(1): 129–136
We use spatial generalized linear mixed models (GLMM) to model non-Gaussian spatial variables that are observed at sampling locations in a continuous area. In many applications, prediction of random effects in a spatial GLMM is of great practical interest. We show that the minimum mean-squared error (MMSE) prediction can be done in a linear fashion in spatial GLMMs analogous to linear kriging. We develop a Monte Carlo version of the EM gradient algorithm for maximum likelihood estimation of model parameters. A by-product of this approach is that it also produces the MMSE estimates for the realized random effects at the sampled sites. This method is illustrated through a simulation study and is also applied to a real data set on plant root diseases to obtain a map of disease severity that can facilitate the practice of precision agriculture.

19.
We discuss design and analysis of longitudinal studies after case–control sampling, wherein interest is in the relationship between a longitudinal binary response that is related to the sampling (case–control) variable and a set of covariates. We propose a semiparametric modeling framework based on a marginal longitudinal binary response model and an ancillary model for subjects' case–control status. In this approach, the analyst must posit the population prevalence of being a case, which is then used to compute an offset term in the ancillary model. Parameter estimates from this model are used to compute offsets for the longitudinal response model. Examining the impact of population prevalence and ancillary model misspecification, we show that time-invariant covariate parameter estimates, other than the intercept, are reasonably robust, but intercept and time-varying covariate parameter estimates can be sensitive to such misspecification. We study design and analysis issues impacting study efficiency, namely: choice of sampling variable and the strength of its relationship to the response, sample stratification, choice of working covariance weighting, and degree of flexibility of the ancillary model. The research is motivated by a longitudinal study, following case–control sampling, of the time course of attention deficit hyperactivity disorder (ADHD) symptoms.

20.
To effectively manage rare populations, accurate monitoring data are critical. Yet many monitoring programs are initiated without careful consideration of whether the chosen sampling designs will provide accurate estimates of population parameters. Obtaining accurate estimates is especially difficult when natural variability is high, or when limited budgets mean that only a small fraction of the population can be sampled. The Missouri bladderpod, Lesquerella filiformis Rollins, is a federally threatened winter annual that has an aggregated distribution pattern and exhibits dramatic interannual population fluctuations. Using the simulation program SAMPLE, we evaluated five candidate sampling designs appropriate for rare populations, based on 4 years of field data: (1) simple random sampling, (2) adaptive simple random sampling, (3) grid-based systematic sampling, (4) adaptive grid-based systematic sampling, and (5) GIS-based adaptive sampling. We compared the designs based on the precision of density estimates for fixed sample size, cost, and distance traveled. Sampling fraction and cost were the most important factors determining the precision of density estimates, and relative design performance changed across the range of sampling fractions. Adaptive designs did not provide uniformly more precise estimates than conventional designs, in part because the spatial distribution of L. filiformis was relatively widespread within the study site. Adaptive designs tended to perform better as the sampling fraction increased and when sampling costs, particularly distance traveled, were taken into account. The rate at which units occupied by L. filiformis were encountered was higher for adaptive than for conventional designs. Overall, grid-based systematic designs were more efficient and more practical to implement than the others.
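A minimal simulation sketch of the kind of design comparison reported here (not the SAMPLE program or the Lesquerella data): a synthetic clustered population is sampled by simple random quadrats and by a systematic grid with a random start, and the precision of the resulting density estimates is compared.

```python
import numpy as np

rng = np.random.default_rng(5)
side, n_quadrats, n_reps = 40, 64, 2000

# one synthetic clustered population: plant counts per 1x1 quadrat on a 40x40 grid
centres = rng.uniform(0, side, size=(15, 2))
pts = np.vstack([c + rng.normal(0, 2.0, size=(60, 2)) for c in centres])
counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=side, range=[[0, side]] * 2)
true_density = counts.mean()

step = side // int(np.sqrt(n_quadrats))          # spacing of the 8x8 systematic grid
grid = np.arange(0, side, step)

srs_est, sys_est, flat = [], [], counts.ravel()
for _ in range(n_reps):
    # (1) simple random sample of quadrats
    srs_est.append(rng.choice(flat, n_quadrats, replace=False).mean())
    # (2) systematic grid with a random start, wrapped at the edges for simplicity
    dx, dy = rng.integers(step), rng.integers(step)
    sys_est.append(counts[np.ix_((grid + dx) % side, (grid + dy) % side)].mean())

print(f"true density:  {true_density:.2f} plants per quadrat")
print(f"SRS        SD: {np.std(srs_est):.3f}")
print(f"systematic SD: {np.std(sys_est):.3f}")
```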
