Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
 Multivariate analysis is a branch of statistics that successfully exploits the powerful tools of linear algebra to obtain a fairly comprehensive theory of estimation. The purpose of this paper is to explore to what extent a linear theory of estimation can be developed in the context of coalescent models used in the analysis of DNA polymorphism. We consider a large class of coalescent models, of which the neutral infinite sites model is one example. In the process, we discover several limitations of linear estimators that are quite distinct from those in the classical theory. In particular, we prove that there does not exist a uniformly BLUE (best linear unbiased estimator) for the scaled mutation parameter, under the assumptions of the neutral model of evolution. In fact, we show that no linear estimator performs uniformly better than the Watterson (1975) method based on the total number of segregating sites. For certain coalescent models, the segregating-sites estimator is actually optimal. The general conclusion is the following. If genealogical information is useful for estimating the rate of evolution, then there is no optimal linear method. If there is an optimal linear method, then no information other than the total number of segregating sites is needed. Received: 29 July 1998 / Revised version: 9 October 1998

2.
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

3.
Malka Gorfine 《Biometrics》2001,57(2):589-597
In this article, we investigate estimation of a secondary parameter in group sequential tests. We study the model in which the secondary parameter is the mean of the normal distribution in a subgroup of the subjects. The bias of the naive secondary parameter estimator is studied. It is shown that the sampling proportions of the subgroup have a crucial effect on the bias: as the sampling proportion of the subgroup at or just before the stopping time increases, the bias of the naive subgroup parameter estimator increases as well. An unbiased estimator for the subgroup parameter and an unbiased estimator for its variance are derived. Using simulations, we compare the mean squared error of the unbiased estimator to that of the naive estimator, and we show that the differences are negligible. As an example, the methods of estimation are applied to an actual group sequential clinical trial, the Beta-Blocker Heart Attack Trial.

4.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than what was previously believed. We investigate the application of our method in estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of program Genetree in computing the likelihood for subdivided populations.

5.
Gene diversity is an important measure of genetic variability in inbred populations. The survival of species in changing environments depends on, among other factors, the genetic variability of the population. In this communication, I have derived the uniformly minimum variance unbiased estimator of gene diversity. The proposed estimator of gene diversity does not assume that the inbreeding coefficient is known. I have also provided the approximate variance of this estimator according to Fisher's method. In addition, I have developed a numerical resampling-based method for obtaining variances and confidence intervals based on the maximum likelihood estimator and the uniformly minimum variance unbiased estimator. Efficiency in estimation of the gene diversity based on these two estimators is discussed. In accordance with the simulation results, I found that the uniformly minimum variance estimator developed in this report is more accurate for estimation of gene diversity than the maximum likelihood estimator.

6.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded, achieves the minimum variance for an unbiased estimator, and we can compute calibrated estimates of the variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
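The sampling rule described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard IBS estimate for one observation, minus the sum of 1/k for k from 1 to K-1, where K is the number of draws up to and including the first match. The simulator `biased_coin`, its probability 0.3, and the function names are hypothetical and chosen only so that the true log-likelihood is known.

```python
import math
import random


def ibs_log_likelihood(observations, simulate, rng):
    """Estimate the log-likelihood of `observations` by inverse binomial sampling.

    `simulate(rng)` is an assumed interface: it returns one synthetic
    observation from the model. For each real observation we draw until the
    first match (K draws) and add -(1 + 1/2 + ... + 1/(K-1)) to the total.
    Note: the loop never ends if the model cannot produce the observation.
    """
    total = 0.0
    for obs in observations:
        draws = 1
        while simulate(rng) != obs:
            draws += 1
        total -= sum(1.0 / k for k in range(1, draws))
    return total


def biased_coin(rng):
    """Toy simulator (hypothetical): returns 1 with probability 0.3, else 0."""
    return 1 if rng.random() < 0.3 else 0


rng = random.Random(0)
data = [1] * 20000  # toy data set in which every observation is a 1
estimate = ibs_log_likelihood(data, biased_coin, rng)
print(estimate / len(data), math.log(0.3))  # per-observation estimate vs. exact value
```

In this toy case the exact per-observation log-likelihood is log(0.3), so the average printed first should lie close to it; in a real application the simulator would also take the stimulus and the model parameters.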

7.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: It is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data is collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data is collected from a series of consecutive regions of equal size. Then it is shown that the observed data forms a stationary, ergodic process. General conditions are given under which the maximum composite estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.

8.
On estimating the heterozygosity and polymorphism information content value   Total citations: 1 (self-citations: 0, citations by others: 1)
The polymorphism information content (PIC) value is commonly used in genetics as a measure of polymorphism for a marker locus used in linkage analysis. In this communication we have derived the uniformly minimum variance unbiased estimator of PIC along with its exact variance. We have also calculated the exact variance of the maximum likelihood estimator of PIC which is asymptotically an unbiased estimator. In order to find this variance we have derived a recursive formula to calculate the moments of any polynomial in a set of variables that are multinomially distributed.
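For orientation, the quantities named in this entry can be computed from allele counts as follows. This is a hedged sketch of the plug-in (maximum likelihood) versions only, using the standard definitions H = 1 - sum of p_i^2 and PIC = H - sum over pairs i<j of 2 p_i^2 p_j^2; it is not the uniformly minimum variance unbiased estimator derived in the paper. The function names and the example counts are hypothetical.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert allele counts at one locus into sample frequencies."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return [c / total for c in counts]


def heterozygosity(freqs):
    """Plug-in heterozygosity: H = 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Plug-in PIC: H - sum_{i<j} 2 * p_i^2 * p_j^2."""
    cross = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - cross


# Hypothetical marker with three alleles observed 50, 30 and 20 times.
freqs = allele_frequencies([50, 30, 20])
print(heterozygosity(freqs), pic(freqs))
```

For a locus with two equally frequent alleles these formulas give H = 0.5 and PIC = 0.375, which is a convenient sanity check.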

9.
A Likelihood Approach to Populations Samples of Microsatellite Alleles   Total citations: 4 (self-citations: 2, citations by others: 2)
R. Nielsen 《Genetics》1997,146(2):711-716
This paper presents a likelihood approach to population samples of microsatellite alleles. A Markov chain recursion method previously published by Griffiths and Tavaré is applied to estimate the likelihood function under different models of microsatellite evolution. The method presented can be applied to estimate a fundamental population genetics parameter θ as well as parameters of the mutational model. The new likelihood estimator provides a better estimator of θ in terms of the mean square error than previous approaches. Furthermore, it is demonstrated how the method may easily be applied to test models of microsatellite evolution. In particular it is shown how to compare a one-step model of microsatellite evolution to a multi-step model by a likelihood ratio test.

10.
We consider estimation after a group sequential test. An estimator that is unbiased or has small bias may have substantial conditional bias (Troendle and Yu, 1999; Coburger and Wassmer, 2001). In this paper we derive the conditional maximum likelihood estimators of both the primary parameter and a secondary parameter, and investigate their properties within a conditional inference framework. The method applies to both the usual and adaptive group sequential test designs. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

11.
We show that the number of segregating sites is a sufficient statistic for the scaled mutation parameter (θ) in the limit as the number of sites tends to infinity and there is free recombination between sites. We assume that the mutation parameter at each site tends to zero such that the total mutation parameter (θ) is constant in the limit. Our results show that Watterson’s estimator is the maximum likelihood estimator in this case, but that it estimates a composite parameter which is different for different mutation models. Some of our results hold when recombination is limited, because Watterson’s estimator is an unbiased, method-of-moments estimator regardless of the recombination rate. The quantity it estimates depends on the details of how mutations occur at each site.
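Watterson's estimator, referred to in this entry and in entry 1, divides the number of segregating sites S by the harmonic number a = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled sequences. The sketch below shows that calculation only; the function names and the example values are hypothetical, and nothing here reproduces the limiting arguments of the paper.

```python
def harmonic_number(m):
    """Return 1 + 1/2 + ... + 1/m (0.0 when m is 0)."""
    return sum(1.0 / i for i in range(1, m + 1))


def watterson_theta(num_segregating_sites, sample_size):
    """Watterson's (1975) estimate of theta: S divided by the (n-1)th harmonic number."""
    if sample_size < 2:
        raise ValueError("at least two sequences are needed")
    if num_segregating_sites < 0:
        raise ValueError("the number of segregating sites cannot be negative")
    return num_segregating_sites / harmonic_number(sample_size - 1)


# Hypothetical sample: 10 sequences containing 16 segregating sites.
print(watterson_theta(16, 10))
```

Because the estimate is a fixed multiple of S for a given n, it is a linear, method-of-moments estimator, which is the property the abstracts rely on.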

12.
Targeted maximum likelihood estimation of a parameter of a data generating distribution, known to be an element of a semi-parametric model, involves constructing a parametric model through an initial density estimator with parameter ε representing an amount of fluctuation of the initial density estimator, where the score of this fluctuation model at ε = 0 equals the efficient influence curve/canonical gradient. The latter constraint can be satisfied by many parametric fluctuation models since it represents only a local constraint of its behavior at zero fluctuation. However, it is very important that the fluctuations stay within the semi-parametric model for the observed data distribution, even if the parameter can be defined on fluctuations that fall outside the assumed observed data model. In particular, in the context of sparse data, by which we mean situations where the Fisher information is low, a violation of this property can heavily affect the performance of the estimator. This paper presents a fluctuation approach that guarantees the fluctuated density estimator remains inside the bounds of the data model. We demonstrate this in the context of estimation of a causal effect of a binary treatment on a continuous outcome that is bounded. It results in a targeted maximum likelihood estimator that inherently respects known bounds, and consequently is more robust in sparse data situations than the targeted MLE using a naive fluctuation model. When an estimation procedure incorporates weights, observations having large weights relative to the rest heavily influence the point estimate and inflate the variance. Truncating these weights is a common approach to reducing the variance, but it can also introduce bias into the estimate. We present an alternative targeted maximum likelihood estimation (TMLE) approach that dampens the effect of these heavily weighted observations. As a substitution estimator, TMLE respects the global constraints of the observed data model. For example, when outcomes are binary, a fluctuation of an initial density estimate on the logit scale constrains predicted probabilities to be between 0 and 1. This inherent enforcement of bounds has been extended to continuous outcomes. Simulation study results indicate that this approach is on a par with, and many times superior to, fluctuating on the linear scale, and in particular is more robust when there is sparsity in the data.
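The contrast between fluctuating on the logit scale and on the linear scale can be illustrated numerically. The sketch below assumes a continuous outcome known to lie in [lower, upper]: the initial prediction is rescaled to (0, 1), shifted by ε times a covariate on the logit scale, and mapped back. The function names, the covariate value `h` and all numbers are hypothetical, and the fitting of ε itself is not shown.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of logit (fine for moderate |x|; no overflow guard here)."""
    return 1.0 / (1.0 + math.exp(-x))


def fluctuate_logit(q0, h, eps, lower, upper):
    """Update prediction q0 on the logit scale; the result stays inside (lower, upper).

    q0 must lie strictly between the bounds.
    """
    scaled = (q0 - lower) / (upper - lower)
    return lower + (upper - lower) * expit(logit(scaled) + eps * h)


def fluctuate_linear(q0, h, eps):
    """Update prediction q0 on the linear scale; nothing keeps it within bounds."""
    return q0 + eps * h


# Hypothetical case: outcome bounded in [0, 10], prediction near the upper
# bound, and a large covariate value as can arise with sparse data.
print(fluctuate_linear(9.5, 20.0, 0.1))             # leaves the interval [0, 10]
print(fluctuate_logit(9.5, 20.0, 0.1, 0.0, 10.0))   # remains below 10
```

With ε = 0 both updates return the initial prediction unchanged, so the two schemes agree locally and differ only in how they behave away from zero fluctuation.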

13.
This study examines the two-dimensional boundary layer flow of, and heat transfer to, a Sisko nanofluid over a non-linearly stretching sheet. Our nanofluid model incorporates the influences of thermophoresis and Brownian motion. Convective boundary conditions are taken into account. Applying suitable transformations consistent with the boundary conditions reduces the governing equations of motion, energy and concentration to non-linear ordinary differential equations. These coupled non-linear ordinary differential equations are solved analytically using the homotopy analysis method (HAM) and numerically using the shooting technique. The effects of the thermophoresis and Brownian motion parameters on the temperature and concentration fields are analyzed and presented graphically. The results show that the temperature distribution is an increasing function of the thermophoresis and Brownian motion parameters, whereas the concentration distribution increases with the thermophoresis parameter but decreases with the Brownian motion parameter. To validate the present work, we compared it with the numerical results and with previously published work and found excellent agreement.

14.
Anderson EC 《Genetics》2005,170(2):955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.

15.
We introduce an implicit method for state and parameter estimation and apply it to a stochastic ecological model. The method uses an ensemble of particles to approximate the distribution of model solutions and parameters conditioned on noisy observations of the state. For each particle, it first determines likely values based on the observations, then samples around those values. This approach has a strong theoretical foundation, applies to nonlinear models and non-Gaussian distributions, and can estimate any number of model parameters, initial conditions, and model error covariances. The method is called implicit because it updates the particles without forming a predictive distribution of forward model integrations. As a point of comparison for different assimilation techniques, we consider examples in which one or more bifurcations separate the true parameter from its initial approximation. The implicit estimator is asymptotically unbiased, has a root-mean-squared error comparable to or less than the other methods, and is accurate even with small ensemble sizes.

16.
M C Wu  K R Bailey 《Biometrics》1989,45(3):939-955
A general linear regression model for the usual least squares estimated rate of change (slope) on censoring time is described as an approximation to account for informative right censoring in estimating and comparing changes of a continuous variable in two groups. Two noniterative estimators for the group slope means, the linear minimum variance unbiased (LMVUB) estimator and the linear minimum mean squared error (LMMSE) estimator, are proposed under this conditional model. In realistic situations, we illustrate that the LMVUB and LMMSE estimators, derived under a simple linear regression model, are quite competitive compared to the pseudo maximum likelihood estimator (PMLE) derived by modeling the censoring probabilities. Generalizations to polynomial response curves and general linear models are also described.

17.
Vasco DA 《Genetics》2008,179(2):951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.

18.
In this article, we provide a method of estimation for the treatment effect in the adaptive design for censored survival data with or without adjusting for risk factors other than the treatment indicator. Within the semiparametric Cox proportional hazards model, we propose a bias-adjusted parameter estimator for the treatment coefficient and its asymptotic confidence interval at the end of the trial. The method for obtaining an asymptotic confidence interval and point estimator is based on a general distribution property of the final test statistic from the weighted linear rank statistics at the interims with or without considering the nuisance covariates. The computation of the estimates is straightforward. Extensive simulation studies show that the asymptotic confidence intervals have reasonable nominal probability of coverage, and the proposed point estimators are nearly unbiased with practical sample sizes.

19.
Over the past two decades, many short tandem repeat (STR) microsatellite loci on the human Y chromosome have been identified together with mutation rate estimates for the individual loci. These have been used to estimate the coalescent age, or the time to the most recent common ancestor (TMRCA) expressed in generations, in conjunction with the average square difference measure (ASD), an unbiased point estimator of TMRCA based upon the average within-locus allele variance between haplotypes. The ASD estimator, in turn, depends on accurate mutation rate estimates to be able to produce good approximations of the coalescent age of a sample. Here, a comparison is made between three published sets of per locus mutation rate estimates as they are applied to the calculation of the coalescent age for real and simulated population samples. A novel evaluation method is developed for estimating the degree of conformity of any Y chromosome STR locus of interest to the strict stepwise mutation model and specific recommendations are made regarding the suitability of thirty-two commonly used Y-STR loci for the purpose of estimating the coalescent. The use of the geometric mean for averaging ASD and across loci is shown to improve the consistency of the resulting estimates, with decreased sensitivity to outliers and to the number of STR loci compared or the particular set of mutation rates selected.

20.
The unbiased estimation of fluctuating asymmetry (FA) requires independent repeated measurements on both sides. The statistical analysis of such data is currently performed by a two-way mixed ANOVA analysis. Although this approach produces unbiased estimates of FA, many studies do not utilize this method. This may be attributed in part to the fact that the complete analysis of FA is very cumbersome and cannot be performed automatically with standard statistical software. Therefore, further elaboration of the statistical tools to analyse FA should focus on the usefulness of the method, in order for the correct statistical approaches to be applied more regularly. In this paper we propose a mixed regression model with restricted maximum likelihood (REML) parameter estimation to model FA. This routine yields exactly the same estimates of FA as the two-way mixed ANOVA. Yet the advantages of this approach are that it allows (a) testing the statistical significance of FA, (b) modelling and testing heterogeneity in both FA and measurement error (ME) among samples, (c) testing for nonzero directional asymmetry and (d) obtaining unbiased estimates of individual FA levels. The switch from a mixed two-way ANOVA to a mixed regression model was made to avoid overparametrization. Two simulation studies are presented. The first shows that a previously proposed method to test the significance of FA is incorrect, contrary to our mixed regression approach. In the second simulation study we show that a traditionally applied measure of individual FA [abs(left – right)] is biased by ME. The proposed mixed regression method, however, produces unbiased estimates of individual FA after modelling heterogeneity in ME. The applicability of this method is illustrated with two analyses.
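The claim that abs(left – right) is biased by measurement error can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the paper: the true left-minus-right difference and the measurement errors are independent normal variables, each side is measured once, and there is no directional asymmetry. The function name and parameter values are hypothetical.

```python
import random
import statistics


def simulate_abs_asymmetry(n, sd_fa, sd_me, seed=0):
    """Return (mean true |L - R|, mean measured |L - R|) over n simulated individuals."""
    rng = random.Random(seed)
    true_abs = []
    measured_abs = []
    for _ in range(n):
        true_diff = rng.gauss(0.0, sd_fa)   # true left-minus-right difference
        left_error = rng.gauss(0.0, sd_me)  # measurement error on the left side
        right_error = rng.gauss(0.0, sd_me) # measurement error on the right side
        true_abs.append(abs(true_diff))
        measured_abs.append(abs(true_diff + left_error - right_error))
    return statistics.mean(true_abs), statistics.mean(measured_abs)


true_mean, measured_mean = simulate_abs_asymmetry(5000, sd_fa=1.0, sd_me=1.0)
print(true_mean, measured_mean)  # the measured mean is inflated by measurement error
```

Under these assumptions the measured difference has variance sd_fa² + 2·sd_me², so its mean absolute value exceeds the true one whenever measurement error is present; separating the two components requires the repeated measurements the abstract calls for.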
