Similar Documents
20 similar documents found.
1.
Aitkin M. Biometrics 1999, 55(1): 117-128.
This paper describes an EM algorithm for nonparametric maximum likelihood (ML) estimation in generalized linear models with variance component structure. The algorithm provides an alternative analysis to approximate MQL and PQL analyses (McGilchrist and Aisbett, 1991, Biometrical Journal 33, 131-141; Breslow and Clayton, 1993, Journal of the American Statistical Association 88, 9-25; McGilchrist, 1994, Journal of the Royal Statistical Society, Series B 56, 61-69; Goldstein, 1995, Multilevel Statistical Models) and to GEE analyses (Liang and Zeger, 1986, Biometrika 73, 13-22). The algorithm, first given by Hinde and Wood (1987, in Longitudinal Data Analysis, 110-126), is a generalization of that for random effect models for overdispersion in generalized linear models, described in Aitkin (1996, Statistics and Computing 6, 251-262). The algorithm is initially derived as a form of Gaussian quadrature assuming a normal mixing distribution, but with only slight variation it can be used for a completely unknown mixing distribution, giving a straightforward method for the fully nonparametric ML estimation of this distribution. This is of value because the ML estimates of the GLM parameters can be sensitive to the specification of a parametric form for the mixing distribution. The nonparametric analysis can be extended straightforwardly to general random parameter models, with full NPML estimation of the joint distribution of the random parameters. This can produce substantial computational saving compared with full numerical integration over a specified parametric distribution for the random parameters. A simple method is described for obtaining correct standard errors for parameter estimates when using the EM algorithm. Several examples are discussed involving simple variance component and longitudinal models, and small-area estimation.
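A minimal sketch of the finite-mixture EM cycle that NPML estimation reduces to, here for an intercept-only Poisson kernel with toy data; the five starting mass points and the closed-form M-step are simplifying assumptions for this illustration, and the paper's standard-error correction is omitted.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)
# toy overdispersed counts: two latent groups
y = np.concatenate([rng.poisson(2.0, 150), rng.poisson(9.0, 50)])

K = 5                                                # number of mass points
lam = np.quantile(y, np.linspace(0.1, 0.9, K)) + 0.5 # mass-point locations
pi = np.full(K, 1.0 / K)                             # masses

for _ in range(200):
    # E-step: responsibility of mass point k for observation i
    w = pi * poisson.pmf(y[:, None], lam)            # shape (n, K)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: closed-form updates for a Poisson kernel
    pi = w.mean(axis=0)
    lam = (w * y[:, None]).sum(axis=0) / w.sum(axis=0)

print("mass points:", np.round(lam, 2))
print("masses     :", np.round(pi, 3))
```

With a regression part, the M-step becomes a weighted GLM fit over the expanded data, but the E-step above is unchanged.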

2.
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity.]
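The DistR algorithm itself is more involved, but the core idea (a gene's relative rate is how much it stretches a shared set of pairwise distances) can be sketched with a simple per-gene least-squares ratio; the distance matrices below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(13)

# toy data: pairwise distance matrices for 3 genes over the same 6 taxa,
# each gene evolving at its own relative rate
n_taxa, true_rates = 6, np.array([0.5, 1.0, 2.5])
base = rng.uniform(0.05, 0.3, size=(n_taxa, n_taxa))
base = np.triu(base, 1)
base = base + base.T                        # common "tree" distances
dists = [r * base + rng.normal(0, 0.005, base.shape) for r in true_rates]

# DistR-flavoured idea: each gene's rate is a least-squares stretch factor
# relative to a consensus set of pairwise distances
iu = np.triu_indices(n_taxa, 1)
d = np.array([m[iu] for m in dists])        # (genes, pairs)
consensus = d.mean(axis=0)
rates = (d @ consensus) / (consensus @ consensus)
rates /= rates.mean()                       # normalize mean rate to 1
print("estimated relative rates:", np.round(rates, 2))
print("true (normalized)       :", np.round(true_rates / true_rates.mean(), 2))
```

The real method also handles taxa missing from some genes, which the consensus step here glosses over.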

3.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.
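A sketch of the Monte Carlo estimate of the expected information, using a one-parameter Cauchy location model as a stand-in for the tree likelihood: datasets are simulated at the ML estimate, the observed information of each is averaged, and the inverse gives the conditional variance.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)

def negloglik(theta, x):            # Cauchy location model, unit scale
    return np.log1p((x - theta) ** 2).sum()

def obs_info(theta, x, h=1e-4):     # observed information by central differences
    return (negloglik(theta + h, x) - 2 * negloglik(theta, x)
            + negloglik(theta - h, x)) / h**2

x = rng.standard_cauchy(60) + 3.0   # "data", true location 3
theta_hat = minimize_scalar(negloglik, args=(x,),
                            bounds=(x.min(), x.max()), method="bounded").x

# Monte Carlo estimate of the EXPECTED information: average the observed
# information over many datasets simulated at theta_hat
R = 2000
sims = (rng.standard_cauchy(len(x)) + theta_hat for _ in range(R))
info_mcs = np.mean([obs_info(theta_hat, s) for s in sims])

print(f"SE from expected info (MCS): {1/np.sqrt(info_mcs):.4f}")
print(f"SE from observed info      : {1/np.sqrt(obs_info(theta_hat, x)):.4f}")
# for the Cauchy the exact expected information is n/2, a useful check:
print(f"SE from exact expected info: {np.sqrt(2/len(x)):.4f}")
```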

4.
Keightley PD, Bataillon TM. Genetics 2000, 154(3): 1193-1201.
We develop a maximum-likelihood (ML) approach to estimate genomic mutation rates (U) and average homozygous mutation effects (s) from mutation-accumulation (MA) experiments in which phenotypic assays are carried out in several generations. We use simulations to compare the procedure's performance with the method of moments traditionally used to analyze MA data. Similar precision is obtained if mutation effects are small relative to the environmental standard deviation, but ML can give estimates of mutation parameters that have lower sampling variances than those obtained by the method of moments if mutations with large effects have accumulated. The inclusion of data from intermediate generations may improve the precision. We analyze life-history trait data from two Caenorhabditis elegans MA experiments. Under a model with equal mutation effects, the two experiments provide similar estimates for U of approximately 0.005 per haploid, averaged over traits. Estimates of s are more divergent and average at -0.51 and -0.13 in the two studies. Detailed analysis shows that changes of mean and variance of genetic values of MA lines in both C. elegans experiments are dominated by mutations with large effects, but the analysis does not rule out the presence of a large class of deleterious mutations with very small effects.
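A stripped-down version of the equal-effects likelihood for single-generation MA data, with hypothetical values for U, s, and the assay error; the actual method handles several generations jointly.

```python
import numpy as np
from scipy.stats import poisson, norm

rng = np.random.default_rng(3)

# simulated MA experiment: 100 lines assayed after t generations
t, sigma_e = 100, 0.3                  # generations; assay error SD
U_true, s_true = 0.01, -0.4            # haploid mutation rate; effect per mutation
k = rng.poisson(U_true * t, size=100)  # mutations carried by each line
x = k * s_true + rng.normal(0, sigma_e, size=100)   # line means (control = 0)

kmax = 20
def loglik(U, s):
    # mixture over the unobserved number of mutations per line
    pk = poisson.pmf(np.arange(kmax + 1), U * t)
    dens = norm.pdf(x[:, None], np.arange(kmax + 1) * s, sigma_e)
    return np.log(dens @ pk).sum()

# crude grid-search ML over (U, s)
Us = np.linspace(0.001, 0.05, 60)
ss = np.linspace(-1.0, -0.05, 60)
ll = np.array([[loglik(U, s) for s in ss] for U in Us])
i, j = np.unravel_index(ll.argmax(), ll.shape)
print(f"U_hat = {Us[i]:.4f}, s_hat = {ss[j]:.3f}")
```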

5.
Recently there has been a surge of interest in estimating deleterious genomic mutation (DGM) parameters from mutation-accumulation (MA) experiments. Two estimation methods are commonly applied to MA data, maximum likelihood (ML) and the method of moments (MM); we compare the two using computer simulations and software applied to real data sets. The conclusions are as follows: the maximum likelihood estimates (MLEs) are often difficult to obtain, so ML is less effective in practice than MM; even when MLEs can be obtained, they suffer from severe small-sample error (bias and sampling variability) and so yield biased estimates; and the likelihood surface is rather flat, making leptokurtic and platykurtic distributions of mutational effects hard to distinguish.

6.
Approximate standard errors of genetic parameter estimates were obtained using a simulation technique and approximation formulae for a simple statistical model. The similarity of the corresponding estimates of standard errors from the two methods indicated that the simulation technique may be useful for estimating the precision of genetic parameter estimates for complex models or unbalanced population structures where approximation formulae do not apply. The method of generating simulation populations in the computer is outlined, and a technique of setting approximate confidence limits to heritability estimates is described.
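The simulation technique is easy to state in code: simulate many populations with known parameters, re-estimate each time, and take the spread of the estimates as the standard error. A half-sib sketch, with arbitrary design numbers:

```python
import numpy as np

rng = np.random.default_rng(11)

def estimate_h2(n_sires=50, n_off=20, h2=0.3):
    """One simulated half-sib experiment and its ANOVA heritability estimate."""
    var_s = h2 / 4                       # sire variance (phenotypic variance = 1)
    var_e = 1 - var_s
    sire = rng.normal(0, np.sqrt(var_s), n_sires)
    y = sire[:, None] + rng.normal(0, np.sqrt(var_e), (n_sires, n_off))
    ms_between = n_off * y.mean(axis=1).var(ddof=1)
    ms_within = y.var(axis=1, ddof=1).mean()
    sigma2_s = (ms_between - ms_within) / n_off
    return 4 * sigma2_s / (sigma2_s + ms_within)

reps = np.array([estimate_h2() for _ in range(1000)])
print(f"mean h2 estimate : {reps.mean():.3f}")
print(f"simulated SE     : {reps.std(ddof=1):.3f}")
```

Quantiles of `reps` give the approximate confidence limits the abstract mentions.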

7.
The molecular clock theory has greatly enlightened our understanding of macroevolutionary events. Maximum likelihood (ML) estimation of divergence times involves the adoption of fixed calibration points, and the confidence intervals associated with the estimates are generally very narrow. The credibility intervals are inferred assuming that the estimates are normally distributed, which may not be the case. Moreover, calculation of standard errors is usually carried out by the curvature method and is complicated by the difficulty in approximating second derivatives of the likelihood function. In this study, a standard primate phylogeny was used to examine the standard errors of ML estimates via the bootstrap method. Confidence intervals were also assessed from the posterior distribution of divergence times inferred via Bayesian Markov Chain Monte Carlo. For the primate topology under evaluation, no significant differences were found between the bootstrap and the curvature methods. Also, Bayesian confidence intervals were always wider than those obtained by ML.
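A toy version of the bootstrap approach, using a two-sequence Jukes-Cantor clock with a known rate rather than the primate phylogeny; sites are resampled and the date re-estimated to get a standard error and percentile interval.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, t_true, L = 1e-3, 40.0, 2000          # rate per site per Myr; age in Myr; sites

# simulate site-wise differences between two sequences under a JC clock
p_diff = 0.75 * (1 - np.exp(-8 * mu * t_true / 3))
diff = rng.random(L) < p_diff             # True where the sites differ

def age(d_sites):
    p = d_sites.mean()
    d = -0.75 * np.log(1 - 4 * p / 3)     # Jukes-Cantor distance
    return d / (2 * mu)                   # divergence time, given a known rate

t_hat = age(diff)
boot = np.array([age(rng.choice(diff, L, replace=True)) for _ in range(1000)])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"t_hat = {t_hat:.1f}, bootstrap SE = {boot.std(ddof=1):.1f}, "
      f"95% CI = ({lo:.1f}, {hi:.1f})")
```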

8.
Binding constant data K°(T) are commonly subjected to van't Hoff analysis to extract estimates of ΔH°, ΔS°, and ΔC°P for the process in question. When such analyses employ unweighted least-squares fitting of ln K° to an appropriate function of the temperature T, they are tacitly assuming constant relative error in K°. When this assumption is correct, the statistical errors in ΔG°, ΔH°, ΔS°, ΔC°P, and the T-derivative of ΔC°P (if determined) are all independent of the actual values of K° and can be computed from knowledge of just the T values at which K° is known and the percent error in K°. All of these statistical errors except that for the highest-order constant are functions of T, so they must normally be calculated using a form of the error propagation equation that is not widely known. However, this computation can be bypassed by defining ΔH° as a polynomial in (T − T0), the coefficients of which thus become ΔH°, ΔC°P, and ½ dΔC°P/dT at T = T0. The errors in the key quantities can then be computed by just repeating the fit for different T0. Procedures for doing this are described for a representative data analysis program. Results of such calculations show that expanding the T range from 10-40 °C to 5-45 °C gives significant improvement in the precision of all quantities. ΔG° is typically determined with standard error a factor of approximately 30 smaller than that for ΔH°. Accordingly, the error in TΔS° is nearly identical to that in ΔH°. For 4% error in K°, the T-derivative of ΔC°P cannot be determined unless it is approximately 10 cal mol⁻¹ K⁻² or greater; and ΔC°P must be approximately 50 cal mol⁻¹ K⁻¹. Since all errors scale with the data error and inversely with the square root of the number of data points, the present results for 4% error cover any other relative error and number of points, for the same approximate T structure of the data.
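The re-centering trick is easy to demonstrate: refitting the integrated van't Hoff model at a different reference temperature T0 makes ΔH°(T0) a fitted coefficient, so its standard error comes straight out of the covariance matrix with no error propagation. A sketch with synthetic 4% data and a temperature-independent ΔC°P:

```python
import numpy as np
from scipy.optimize import curve_fit

R = 1.987  # gas constant, cal mol^-1 K^-1
rng = np.random.default_rng(2)

def lnK_model(T, lnK0, dH0, dCp, T0):
    """Integrated van't Hoff equation with constant dCp,
    parameterized at reference temperature T0."""
    return (lnK0 - dH0 / R * (1 / T - 1 / T0)
            + dCp / R * (np.log(T / T0) + T0 / T - 1))

# synthetic K(T) data with 4% relative error, 5-45 degrees C
T = np.arange(278.15, 318.16, 5.0)
true = dict(lnK0=11.5, dH0=-10_000.0, dCp=-300.0)   # at T0 = 298.15 K
lnK = lnK_model(T, **true, T0=298.15) + rng.normal(0, 0.04, T.size)

def fit_at(T0):
    """Refit with reference T0; the dH0 coefficient and its SE then refer
    directly to dH(T0), bypassing the propagation formula."""
    popt, pcov = curve_fit(lambda T, a, b, c: lnK_model(T, a, b, c, T0),
                           T, lnK, p0=[11, -10_000, -300])
    return popt[1], np.sqrt(pcov[1, 1])

for T0 in (283.15, 298.15, 313.15):
    dH, se = fit_at(T0)
    print(f"T0 = {T0:.2f} K: dH = {dH:9.0f} +/- {se:5.0f} cal/mol")
```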

9.
Gelman A, Chew GL, Shnaidman M. Biometrics 2004, 60(2): 407-417.
In a serial dilution assay, the concentration of a compound is estimated by combining measurements of several different dilutions of an unknown sample. The relation between concentration and measurement is nonlinear and heteroscedastic, and so it is not appropriate to weight these measurements equally. In the standard existing approach for analysis of these data, a large proportion of the measurements are discarded as being above or below detection limits. We present a Bayesian method for jointly estimating the calibration curve and the unknown concentrations using all the data. Compared to the existing method, our estimates have much lower standard errors and give estimates even when all the measurements are outside the "detection limits." We evaluate our method empirically using laboratory data on cockroach allergens measured in house dust samples. Our estimates are much more accurate than those obtained using the usual approach. In addition, we develop a method for determining the "effective weight" attached to each measurement, based on a local linearization of the estimated model. The effective weight can give insight into the information conveyed by each data point and suggests potential improvements in design of serial dilution experiments.
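Not the Bayesian model of the paper, but a sketch of the heteroscedastic calibration problem it addresses: a four-parameter logistic curve with noise that grows with the mean, fitted with mean-proportional weights. The noise model and starting values are assumptions for this toy.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(14)

def calib(x, b1, b2, b3, b4):
    """four-parameter logistic calibration curve used in dilution assays"""
    return b1 + b2 / (1 + (x / b3) ** (-b4))

# synthetic standards: dilutions of a stock at concentration 100, with
# measurement noise that grows with the mean (heteroscedastic)
dil = np.array([1, 1/3, 1/9, 1/27, 1/81, 1/243] * 2)
x = 100 * dil
mu = calib(x, 10, 1000, 30, 1.3)
ymeas = mu + rng.normal(0, 0.02 * mu + 1.0)

# weighted fit: sigma proportional to the fitted mean, iterated once
popt, _ = curve_fit(calib, x, ymeas, p0=[5, 800, 20, 1.0], maxfev=10_000)
sigma = 0.02 * calib(x, *popt) + 1.0
popt, _ = curve_fit(calib, x, ymeas, p0=popt, sigma=sigma, maxfev=10_000)

# read an unknown off the fitted curve by numerical inversion
y_unknown = 150.0
grid = np.logspace(-2, 2.3, 2000)
est = grid[np.argmin(np.abs(calib(grid, *popt) - y_unknown))]
print(f"estimated concentration: {est:.2f}")
```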

10.
New approaches to dating suggest a recent age for the human mtDNA ancestor.
The most critical and controversial feature of the African origin hypothesis of human mitochondrial DNA (mtDNA) evolution is the relatively recent age of about 200 ka inferred for the human mtDNA ancestor. If this age is wrong, and the actual age instead approaches 1 million years ago, then the controversy abates. Reliable estimates of the age of the human mtDNA ancestor and the associated standard error are therefore crucial. However, more recent estimates of the age of the human ancestor rely on comparisons between human and chimpanzee mtDNAs that may not be reliable and for which standard errors are difficult to calculate. We present here two approaches for deriving an intraspecific calibration of the rate of human mtDNA sequence evolution that allow standard errors to be readily calculated. The estimates resulting from these two approaches for the age of the human mtDNA ancestor (and approximate 95% confidence intervals) are 133 (63-356) and 137 (63-416) ka ago. These results provide the strongest evidence yet for a relatively recent origin of the human mtDNA ancestor.

11.
Ten Have TR, Localio AR. Biometrics 1999, 55(4): 1022-1029.
We extend an approach for estimating random effects parameters under a random intercept and slope logistic regression model to include standard errors, thereby including confidence intervals. The procedure entails numerical integration to yield posterior empirical Bayes (EB) estimates of random effects parameters and their corresponding posterior standard errors. We incorporate an adjustment of the standard error due to Kass and Steffey (KS; 1989, Journal of the American Statistical Association 84, 717-726) to account for the variability in estimating the variance component of the random effects distribution. In assessing health care providers with respect to adult pneumonia mortality, comparisons are made with the penalized quasi-likelihood (PQL) approximation approach of Breslow and Clayton (1993, Journal of the American Statistical Association 88, 9-25) and a Bayesian approach. To make comparisons with an EB method previously reported in the literature, we apply these approaches to crossover trials data previously analyzed with the estimating equations EB approach of Waclawiw and Liang (1994, Statistics in Medicine 13, 541-551). We also perform simulations to compare the proposed KS and PQL approaches. These two approaches lead to EB estimates of random effects parameters with similar asymptotic bias. However, for many clusters with small cluster size, the proposed KS approach does better than the PQL procedures in terms of coverage of nominal 95% confidence intervals for random effects estimates. For large cluster sizes and a few clusters, the PQL approach performs better than the KS adjustment. These simulation results agree somewhat with those of the data analyses.

12.
Durban M, Hackett CA, Currie ID. Biometrics 1999, 55(3): 699-703.
We consider semiparametric models with p regressor terms and q smooth terms. We obtain an explicit expression for the estimate of the regression coefficients given by the back-fitting algorithm. The calculation of the standard errors of these estimates based on this expression is a considerable computational exercise. We present an alternative, approximate method of calculation that is less demanding. With smoothing splines, the method is exact, while with loess, it gives good estimates of standard errors. We assess the adequacy of our approximation and of another approximation with the help of two examples.
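The back-fitting cycle for a model with one regressor term and one smooth term can be sketched in a few lines; a Nadaraya-Watson smoother stands in for the spline or loess fit, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 300
x = rng.uniform(0, 1, n)                 # parametric term
t = rng.uniform(0, 1, n)                 # smooth term
y = 2.0 * x + np.sin(2 * np.pi * t) + rng.normal(0, 0.3, n)

def smooth(t, r, bw=0.05):
    """Nadaraya-Watson smoother as a stand-in for a spline/loess fit."""
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / bw) ** 2)
    return (w @ r) / w.sum(axis=1)

alpha, beta, g = 0.0, 0.0, np.zeros(n)
for _ in range(50):                      # back-fitting iterations
    alpha = np.mean(y - beta * x - g)               # intercept
    beta = np.sum(x * (y - alpha - g)) / np.sum(x * x)  # linear part
    g = smooth(t, y - alpha - beta * x)             # smooth the partial residuals
    g -= g.mean()                                   # identifiability constraint

print(f"beta_hat = {beta:.3f} (true 2.0)")
```

The paper's contribution is the standard error of `beta`, which the iteration above does not produce directly.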

13.
Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation-maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less-efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy comparable precision as the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has more straightforward interpretation than those of competing methods, thus should usefully augment tools available for LCA of multilevel data.

14.
García-Dorado A, Gallego A. Genetics 2003, 164(2): 807-819.
We simulated single-generation data for a fitness trait in mutation-accumulation (MA) experiments, and we compared three methods of analysis. Bateman-Mukai (BM) and maximum likelihood (ML) need information on both the MA lines and control lines, while minimum distance (MD) can be applied with or without the control. Both MD and ML assume gamma-distributed mutational effects. ML estimates of the rate of deleterious mutation had larger mean square error (MSE) than MD or BM had due to large outliers. MD estimates obtained by ignoring the mean decline observed from comparison to a control are often better than those obtained using that information. When effects are simulated using the gamma distribution, reducing the precision with which the trait is assayed increases the probability of obtaining no ML or MD estimates but causes no appreciable increase of the MSE. When the residual errors for the means of the simulated lines are sampled from the empirical distribution in a MA experiment, instead of from a normal one, the MSEs of BM, ML, and MD are practically unaffected. When the simulated gamma distribution accounts for a high rate of mild deleterious mutation, BM detects only approximately 30% of the true deleterious mutation rate, while MD or ML detects substantially larger fractions. To test the robustness of the methods, we also added a high rate of common contaminant mutations with constant mild deleterious effect to a low rate of mutations with gamma-distributed deleterious effects and moderate average. In that case, BM detects roughly the same fraction as before, regardless of the precision of the assay, while ML fails to provide estimates. However, MD estimates are obtained by ignoring the control information, detecting approximately 70% of the total mutation rate when the mean of the lines is assayed with good precision, but only 15% for low-precision assays. Contaminant mutations with only tiny deleterious effects could not be detected with acceptable accuracy by any of the above methods.
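For reference, the Bateman-Mukai moment estimators used as the baseline here are one-liners once the per-generation changes in mean and among-line variance are in hand; the sketch below uses simulated equal-effects data, the case in which BM is unbiased.

```python
import numpy as np

rng = np.random.default_rng(9)

# simulate line means for a control and for MA lines after t generations
t, n_lines = 50, 200
U, s = 0.02, -0.2                        # per-generation rate; constant effect
k = rng.poisson(U * t, n_lines)          # mutations per line
ma = k * s + rng.normal(0, 0.1, n_lines) # MA line means
control = rng.normal(0, 0.1, n_lines)    # control line means

# Bateman-Mukai moment estimators from the per-generation change in
# mean (dM = U*s) and among-line variance (dV = U*s^2)
dM = (ma.mean() - control.mean()) / t
dV = (ma.var(ddof=1) - control.var(ddof=1)) / t
print(f"U_BM = {dM**2 / dV:.4f}  (true {U})")
print(f"s_BM = {dV / dM:.3f}   (true {s})")
```

With variable effects, U_BM is only a lower bound on U, which is why it detects a small fraction of the rate when mild mutations dominate.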

15.
Accurate measurements of metabolic fluxes in living cells are central to metabolism research and metabolic engineering. The gold standard method is model-based metabolic flux analysis (MFA), where fluxes are estimated indirectly from mass isotopomer data with the use of a mathematical model of the metabolic network. A critical step in MFA is model selection: choosing what compartments, metabolites, and reactions to include in the metabolic network model. Model selection is often done informally during the modelling process, based on the same data that is used for model fitting (estimation data). This can lead to either overly complex models (overfitting) or too simple ones (underfitting), in both cases resulting in poor flux estimates. Here, we propose a method for model selection based on independent validation data. We demonstrate in simulation studies that this method consistently chooses the correct model in a way that is independent of errors in measurement uncertainty. This independence is beneficial, since estimating the true magnitude of these errors can be difficult. In contrast, commonly used model selection methods based on the χ²-test choose different model structures depending on the believed measurement uncertainty; this can lead to errors in flux estimates, especially when the magnitude of the error is substantially off. We present a new approach for quantification of prediction uncertainty of mass isotopomer distributions in other labelling experiments, to check for problems with too much or too little novelty in the validation data. Finally, in an isotope tracing study on human mammary epithelial cells, the validation-based model selection method identified pyruvate carboxylase as a key model component. Our results argue that validation-based model selection should be an integral part of MFA model development.
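The validation-based selection logic, stripped of the MFA specifics: fit candidate models of increasing complexity on estimation data and score them on independent validation data, where overfitting shows up as rising validation error. A polynomial stand-in:

```python
import numpy as np

rng = np.random.default_rng(15)

# candidate models of rising complexity, scored on independent validation
# data rather than the estimation data used for fitting
x_est, x_val = rng.uniform(-1, 1, 40), rng.uniform(-1, 1, 40)
f = lambda x: 1.0 + 2.0 * x - 1.5 * x**2            # "true" system
y_est = f(x_est) + rng.normal(0, 0.2, 40)
y_val = f(x_val) + rng.normal(0, 0.2, 40)

for degree in range(1, 7):
    coef = np.polyfit(x_est, y_est, degree)         # fit on estimation data
    sse_est = np.sum((np.polyval(coef, x_est) - y_est) ** 2)
    sse_val = np.sum((np.polyval(coef, x_val) - y_val) ** 2)
    print(f"degree {degree}: SSE est {sse_est:6.2f} | SSE val {sse_val:6.2f}")
# estimation SSE always falls with complexity; validation SSE picks degree 2
```

Note this toy needs no estimate of the measurement error magnitude, which is the property the abstract highlights over χ²-based selection.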

16.
Excoffier L, Estoup A, Cornuet JM. Genetics 2005, 169(3): 1727-1738.
We introduce here a Bayesian analysis of a classical admixture model in which all parameters are simultaneously estimated. Our approach follows the approximate Bayesian computation (ABC) framework, relying on massive simulations and a rejection-regression algorithm. Although computationally intensive, this approach can easily deal with complex mutation models and partially linked loci, and it can be thoroughly validated without much additional computation cost. Compared to a recent maximum-likelihood (ML) method, the ABC approach leads to similarly accurate estimates of admixture proportions in the case of recent admixture events, but it is found superior when the admixture is more ancient. All other parameters of the admixture model such as the divergence time between parental populations, the admixture time, and the population sizes are also well estimated, unlike the ML method. The use of partially linked markers does not introduce any particular bias in the estimation of admixture, but ML confidence intervals are found too narrow if linkage is not specifically accounted for. The application of our method to an artificially admixed domestic bee population from northwest Italy suggests that the admixture occurred in the last 10-40 generations and that the parental Apis mellifera and A. ligustica populations were completely separated since the last glacial maximum.
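The rejection step of ABC is simple to sketch; below it is applied to a toy admixture-proportion problem with known parental allele frequencies. The regression-adjustment step of the rejection-regression algorithm is omitted.

```python
import numpy as np

rng = np.random.default_rng(4)

# toy admixture model: hybrid allele frequencies are a mix of two parental
# populations, observed with binomial sampling noise at 30 loci
n_loci, n_chrom = 30, 100
p1, p2 = rng.uniform(0.1, 0.9, n_loci), rng.uniform(0.1, 0.9, n_loci)
a_true = 0.3
obs = rng.binomial(n_chrom, a_true * p1 + (1 - a_true) * p2) / n_chrom

def simulate(a):
    return rng.binomial(n_chrom, a * p1 + (1 - a) * p2) / n_chrom

# ABC rejection: sample from the prior, keep the draws whose simulated
# summaries fall closest to the observed ones
n_sim, keep = 100_000, 500
a_draws = rng.uniform(0, 1, n_sim)
dist = np.array([np.abs(simulate(a) - obs).mean() for a in a_draws])
accepted = a_draws[np.argsort(dist)[:keep]]
print(f"posterior mean {accepted.mean():.3f}, "
      f"95% credible interval ({np.quantile(accepted, .025):.3f}, "
      f"{np.quantile(accepted, .975):.3f})")
```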

17.
We introduce a new statistical computing method, called data cloning, to calculate maximum likelihood estimates and their standard errors for complex ecological models. Although the method uses the Bayesian framework and exploits the computational simplicity of the Markov chain Monte Carlo (MCMC) algorithms, it provides valid frequentist inferences such as the maximum likelihood estimates and their standard errors. The inferences are completely invariant to the choice of the prior distributions and therefore avoid the inherent subjectivity of the Bayesian approach. The data cloning method is easily implemented using standard MCMC software. Data cloning is particularly useful for analysing ecological situations in which hierarchical statistical models, such as state-space models and mixed effects models, are appropriate. We illustrate the method by fitting two nonlinear population dynamics models to data in the presence of process and observation noise.
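Data cloning in miniature: raise the likelihood to the k-th power in an ordinary MCMC run; as k grows, the posterior mean converges to the MLE and k times the posterior variance to its sampling variance. A normal-mean sketch with a deliberately vague prior:

```python
import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(5.0, 2.0, size=40)          # data; normal model with known sd = 2

def log_post(mu, k):
    """log posterior with the likelihood raised to the k-th power
    (k 'clones' of the data) and a vague N(0, 100^2) prior."""
    loglik = -0.5 * ((y - mu) ** 2).sum() / 4.0
    return k * loglik - 0.5 * mu**2 / 100.0**2

def metropolis(k, n_iter=50_000, step=0.5):
    mu, out = y.mean(), np.empty(n_iter)
    for i in range(n_iter):
        prop = mu + rng.normal(0, step / np.sqrt(k))   # shrink steps with k
        if np.log(rng.random()) < log_post(prop, k) - log_post(mu, k):
            mu = prop
        out[i] = mu
    return out[n_iter // 5:]               # drop burn-in

for k in (1, 10, 50):
    draws = metropolis(k)
    # posterior mean -> MLE; k * posterior variance -> variance of the MLE
    print(f"k={k:3d}: mean={draws.mean():.3f}, "
          f"SE={np.sqrt(k * draws.var()):.3f}")
print(f"analytic MLE = {y.mean():.3f}, SE = {2/np.sqrt(len(y)):.3f}")
```

The prior-invariance claim is visible here: changing the prior scale leaves the k = 50 results essentially unchanged.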

18.
Oman SD, Landsman V, Carmel Y, Kadmon R. Biometrics 2007, 63(3): 892-900.
We estimate the relation between binary responses and corresponding covariate vectors, both observed over a large spatial lattice. We assume a hierarchical generalized linear model with probit link function, partition the lattice into blocks, and adopt the working assumption of independence between the blocks to obtain an easily solved estimating equation. Standard errors are obtained using the "sandwich" estimator together with window subsampling (Sherman, 1996, Journal of the Royal Statistical Society, Series B 58, 509-523). We apply this to a large data set describing long-term vegetation growth, together with two other approximate-likelihood approaches: pairwise composite likelihood (CL) and estimation under a working assumption of independence. The independence and CL methods give similar point estimates and standard errors, while the independent-block approach gives considerably smaller standard errors, as well as more easily interpretable point estimates. We present numerical evidence suggesting this increased efficiency may hold more generally.
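A generic version of the working-independence-plus-sandwich recipe, here for clustered binary data with a logit rather than probit link, and with the sandwich "meat" summed cluster-by-cluster instead of by window subsampling:

```python
import numpy as np

rng = np.random.default_rng(12)

# clustered binary data: a shared cluster effect violates independence
n_clust, m = 200, 10
X = rng.normal(size=(n_clust, m, 2))
u = rng.normal(0, 1.0, size=(n_clust, 1))          # cluster random effect
beta_true = np.array([0.8, -0.5])
p = 1 / (1 + np.exp(-(X @ beta_true + u)))
y = rng.random((n_clust, m)) < p

# fit ordinary logistic regression under working independence
Xf, yf = X.reshape(-1, 2), y.reshape(-1).astype(float)
beta = np.zeros(2)
for _ in range(25):                                # Newton-Raphson
    mu = 1 / (1 + np.exp(-Xf @ beta))
    H = Xf.T @ ((mu * (1 - mu))[:, None] * Xf)
    beta += np.linalg.solve(H, Xf.T @ (yf - mu))

# sandwich variance B^{-1} M B^{-1}, with the meat summed by cluster
mu = 1 / (1 + np.exp(-Xf @ beta))
B = Xf.T @ ((mu * (1 - mu))[:, None] * Xf)
score = ((yf - mu)[:, None] * Xf).reshape(n_clust, m, 2).sum(axis=1)
M = score.T @ score
V = np.linalg.solve(B, np.linalg.solve(B, M).T)
print("beta:", np.round(beta, 3))
print("naive SE   :", np.round(np.sqrt(np.diag(np.linalg.inv(B))), 3))
print("sandwich SE:", np.round(np.sqrt(np.diag(V)), 3))
```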

19.
Maximum likelihood methods were developed for estimation of the six parameters relating to a marker-linked quantitative trait locus (QTL) segregating in a half-sib design, namely the QTL additive effect, the QTL dominance effect, the population mean, recombination between the marker and the QTL, the population frequency of the QTL alleles, and the within-family residual variance. The method was tested on simulated stochastic data with various family structures under two genetic models. A method for predicting the expected value of the likelihood was also derived and used to predict the lower bound sampling errors of the parameter estimates and the correlations between them. It was found that standard errors and confidence intervals were smallest for the population mean and variance, intermediate for QTL effects and allele frequency, and highest for recombination rate. Correlations among standard errors of the parameter estimates were generally low except for a strong negative correlation (r = -0.9) between the QTL's dominance effect and the population mean, and medium positive and negative correlations between the QTL's additive effect and, respectively, recombination rate (r = 0.5) and residual variance (r = -0.6). The implications for experimental design and method of analysis on power and accuracy of marker-QTL linkage experiments were discussed.

20.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Θ = 4Nₑμ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Θ. With finite amounts of data the estimates are accurate when Θ is high, but tend to be biased upward when Θ is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Θ than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Θ. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.
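A minimal example of why the ascertainment scheme must enter the likelihood: if only loci observed to be polymorphic are retained, the likelihood for segregating-site counts must be conditioned on that event, otherwise Θ is biased upward. This uses a Watterson-style Poisson approximation and toy numbers, not the paper's full coalescent likelihoods.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(6)

# toy setting: many short loci; segregating sites per locus ~ Poisson(lambda),
# with lambda = theta * a_n * L_sites (Watterson approximation)
n, L_sites, theta_true = 10, 500, 0.001
a_n = (1 / np.arange(1, n)).sum()
S = rng.poisson(theta_true * a_n * L_sites, size=400)

# ascertainment: only loci with at least one SNP make it into the data set
S_obs = S[S > 0]

def negll(lam, correct):
    ll = poisson.logpmf(S_obs, lam)
    if correct:                 # condition on the locus being ascertained
        ll -= np.log1p(-np.exp(-lam))
    return -ll.sum()

for correct, label in [(False, "naive"), (True, "ascertainment-corrected")]:
    lam = minimize_scalar(negll, args=(correct,), bounds=(1e-4, 10),
                          method="bounded").x
    print(f"{label:25s}: theta_hat = {lam / (a_n * L_sites):.5f} "
          f"(true {theta_true})")
```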

