Similar articles
Found 20 similar articles (search time: 15 ms)
1.
Short phylogenetic distances between taxa occur, for example, in studies on ribosomal RNA-genes with slow substitution rates. For consistently short distances, it is proved that in the completely singular limit of the covariance matrix ordinary least squares (OLS) estimates are minimum variance or best linear unbiased (BLU) estimates of phylogenetic tree branch lengths. Although OLS estimates are in this situation equal to generalized least squares (GLS) estimates, the GLS chi-square likelihood ratio test will be inapplicable as it is associated with zero degrees of freedom. Consequently, an OLS normal distribution test or an analogous bootstrap approach will provide optimal branch length tests of significance for consistently short phylogenetic distances. As the asymptotic covariances between branch lengths will be equal to zero, it follows that the product rule can be used in tree evaluation to calculate an approximate simultaneous confidence probability that all interior branches are positive.
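
The product rule mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the per-branch confidence probabilities are already available and that the branch length estimates are (approximately) uncorrelated; the function name and the example probabilities are hypothetical and are not taken from the paper.

```python
from math import prod


def simultaneous_confidence(branch_probs):
    """Approximate probability that all interior branches are positive.

    Assumes the branch length estimates are uncorrelated, so the
    per-branch confidence probabilities can simply be multiplied.
    """
    probs = list(branch_probs)
    if not all(0.0 <= p <= 1.0 for p in probs):
        raise ValueError("each probability must lie in [0, 1]")
    return prod(probs)


# Hypothetical per-branch confidence probabilities for three interior branches.
example = simultaneous_confidence([0.99, 0.95, 0.90])
print(round(example, 5))
```

With these illustrative inputs the simultaneous probability is lower than any single branch probability, which is why a tree with many interior branches can be poorly supported as a whole even when every branch looks well supported on its own.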

2.
Kneib T, Fahrmeir L. Biometrics 2006, 62(1): 109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

3.
A restricted maximum likelihood estimator for truncated height samples (total citations: 1; self-citations: 0; citations by others: 1)
A restricted maximum likelihood (ML) estimator is presented and evaluated for use with truncated height samples. In the common situation of a small sample truncated at a point not far below the mean, the ordinary ML estimator suffers from high sampling variability. The restricted estimator imposes an a priori value on the standard deviation and freely estimates the mean, exploiting the known empirical stability of the former to obtain less variable estimates of the latter. Simulation results validate the conjecture that restricted ML behaves like restricted ordinary least squares (OLS), whose properties are well established on theoretical grounds. Both estimators display smaller sampling variability when constrained, whether the restrictions are correct or not. The bias induced by incorrect restrictions sets up a decision problem involving a bias-precision tradeoff, which can be evaluated using the mean squared error (MSE) criterion. Simulated MSEs suggest that restricted ML estimation offers important advantages when samples are small and truncation points are high, so long as the true standard deviation is within roughly 0.5 cm of the chosen value.

4.
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests in distinguishing the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used to estimate the ROC curve from correlated data, how to develop least squares methods to estimate the ROC curve from clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ and, more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuity of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods.

5.
Genetic models for quantitative seed traits with effects of several major genes and polygenes, as well as their GE interaction, were proposed. Mixed linear model approaches were suggested for analyzing the genetic models. Monte Carlo simulations were conducted to evaluate unbiasedness and efficiency for estimating fixed effects and variance components of the embryo and the endosperm models, including effects of a major gene from an unbalanced modified diallel mating design with nine parents, respectively. Simulation results showed that estimates of generalized least squares (GLS) were unbiased and efficient, while those of ordinary least squares (OLS) were almost as good as GLS. Minimum norm quadratic unbiased estimation (MINQUE) could obtain unbiased estimates of the variance components. It was also suggested that precision of MINQUE estimation would be improved with augmentation of experimental size. Data from a modified diallel design in upland cotton (Gossypium hirsutum L.) were used as a worked example to illustrate the parameter estimation.

6.
The simultaneous estimation of individual growth curves and a mean growth curve is accomplished by weighted least squares. A polynomial curve is fitted for each individual and the polynomial parameters are linear functions of parameters corresponding to covariates. A simple, computationally efficient variance-covariance estimator is derived. The resultant estimate is used in the weighted least squares estimation. The results are compared to empirical Bayes estimation.

7.
In the linear model with right-censored responses and many potential explanatory variables, regression parameter estimates may be unstable or, when the covariates outnumber the uncensored observations, not estimable. We propose an iterative algorithm for partial least squares, based on the Buckley-James estimating equation, to estimate the covariate effect and predict the response for a future subject with a given set of covariates. We use a leave-two-out cross-validation method for empirically selecting the number of components in the partial least-squares fit that approximately minimizes the error in estimating the covariate effect of a future observation. Simulation studies compare the methods discussed here with other dimension reduction techniques. Data from the AIDS Clinical Trials Group protocol 333 are used to motivate the methodology.

8.
A genetic model was proposed to simultaneously investigate genetic effects of both polygenes and several single genes for quantitative traits of diploid plants and animals. Mixed linear model approaches were employed for statistical analysis. Based on two mating designs, a full diallel cross and a modified diallel cross including F2, Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of the estimation of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects and of minimum norm quadratic unbiased estimation (MINQUE) and Henderson III for variance components. Estimates of MINQUE (1) were unbiased and efficient in both reduced and full genetic models. Henderson III could have a large bias when used to analyze the full genetic model. Simulation results also showed that GLS and OLS were good methods to estimate fixed effects in the genetic models. Data on Drosophila melanogaster from Gilbert were used as a worked example to demonstrate the parameter estimation. Received: 11 November 2000 / Accepted: 2 May 2001

9.
Ibrahim JG, Chen MH, Lipsitz SR. Biometrics 1999, 55(2): 591-596
We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data.

10.
Physiological and ecological allometries often pose linear regression problems characterized by (1) noncausal, phylogenetically autocorrelated independent (x) and dependent (y) variables (characters); (2) random variation in both variables; and (3) a focus on regression slopes (allometric exponents). Remedies for the phylogenetic autocorrelation of species values (phylogenetically independent contrasts) and variance structure of the data (reduced major axis [RMA] regression) have been developed, but most functional allometries are reported as ordinary least squares (OLS) regression without use of phylogenetically independent contrasts. We simulated Brownian diffusive evolution of functionally related characters and examined the importance of regression methodologies and phylogenetic contrasts in estimating regression slopes for phylogenetically constrained data. Simulations showed that both OLS and RMA regressions exhibit serious bias in estimated regression slopes under different circumstances but that a modified orthogonal (least squares variance-oriented residual [LSVOR]) regression was less biased than either OLS or RMA regressions. For strongly phylogenetically structured data, failure to use phylogenetic contrasts as regression data resulted in overestimation of the strength of the regression relationship and a significant increase in the variance of the slope estimate. Censoring of data sets by simulated extinction of taxa did not affect the importance of appropriate regression models or the use of phylogenetic contrasts.
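
To make the contrast between the two slope estimators named in this abstract concrete, here is a minimal sketch using the standard textbook definitions: the OLS slope is the covariance of x and y divided by the variance of x, and the RMA slope is the ratio of the standard deviations of y and x, carrying the sign of the covariance. The function name and data are hypothetical; the sketch does not cover the paper's LSVOR method or phylogenetically independent contrasts.

```python
from math import copysign, sqrt


def ols_and_rma_slopes(x, y):
    """Return (OLS slope, RMA slope) for paired observations x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    ols = sxy / sxx                       # minimizes vertical residuals only
    rma = copysign(sqrt(syy / sxx), sxy)  # sd(y) / sd(x), signed by the covariance
    return ols, rma


# Hypothetical log-transformed trait values, for illustration only.
ols_slope, rma_slope = ols_and_rma_slopes([1, 2, 3, 4], [1, 3, 2, 4])
print(ols_slope, rma_slope)
```

Because the OLS slope equals the correlation coefficient times the RMA slope, the RMA slope is never smaller in magnitude than the OLS slope, and the two coincide only when the points lie exactly on a line.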

11.
The generalized estimating equation (GEE) algorithm under a heterogeneous residual variance model extends the iteratively reweighted least squares (IRLS) method from continuous traits to discrete traits. In contrast to the mixture model-based expectation–maximization (EM) algorithm, the GEE algorithm detects quantitative trait loci (QTLs) well, especially large-effect QTLs located in large marker intervals, and does so at high computing speed. Because it is based on a single-QTL model, however, the GEE algorithm has very limited statistical power to detect multiple QTLs, as it ignores other linked QTLs. In this study, the fast least absolute shrinkage and selection operator (LASSO) is derived for the generalized linear model (GLM) with all possible link functions. Under a heterogeneous residual variance model, the LASSO for GLM is used to iteratively estimate the non-zero genetic effects of loci over the entire genome. The iteratively reweighted LASSO is thereby extended to mapping QTLs for discrete traits, such as ordinal, binary, and Poisson traits. Simulated and real data analyses are conducted to demonstrate the efficiency of the proposed method in simultaneously identifying multiple QTLs, using binary and Poisson traits as examples.

12.
Branch length estimates play a central role in maximum-likelihood (ML) and minimum-evolution (ME) methods of phylogenetic inference. For various reasons, branch length estimates are not statistically independent under ML or ME. We studied the response of correlations among branch length estimates to the degree of among-branch length heterogeneity (BLH) in the model (true) tree. The frequency and magnitude of (especially negative) correlations among branch length estimates were both shown to increase as BLH increases under simulation and analytically. For ML, we used the correct model (Jukes–Cantor). For ME, we employed ordinary least-squares (OLS) branch lengths estimated under both simple p-distances and Jukes–Cantor distances, analyzed with and without an among-site rate heterogeneity parameter. The efficiency of ME and ML was also shown to decrease in response to increased BLH. We note that the shape of the true tree will in part determine BLH and represents a critical factor in the probability of recovering the correct topology. An important finding suggests that researchers cannot expect that different branches that were in fact the same length will have the same probability of being accurately reconstructed when BLH exists in the overall tree. We conclude that methods designed to minimize the interdependencies of branch length estimates (BLEs) may (1) reduce both the variance and the covariance associated with the estimates and (2) increase the efficiency of model-based optimality criteria. We speculate on possible ways to reduce the nonindependence of BLEs under OLS and ML. Received: 9 March 1999 / Accepted: 4 May 1999

13.
Shin Y, Raudenbush SW. Biometrics 2007, 63(4): 1262-1268
The development of model-based methods for incomplete data has been a seminal contribution to statistical practice. Under the assumption of ignorable missingness, one estimates the joint distribution of the complete data for theta ∈ Theta from the incomplete or observed data y(obs). Many interesting models involve one-to-one transformations of theta. For example, with y(i) ~ N(mu, Sigma) for i = 1, ..., n and theta = (mu, Sigma), an ordinary least squares (OLS) regression model is a one-to-one transformation of theta. Inferences based on such a transformation are equivalent to inferences based on OLS using data multiply imputed from f(y(mis) | y(obs), theta) for missing y(mis). Thus, identification of theta from y(obs) is equivalent to identification of the regression model. In this article, we consider a model for two-level data with continuous outcomes where the observations within each cluster are dependent. The parameters of the hierarchical linear model (HLM) of interest, however, lie in a subspace of Theta in general. This identification of the joint distribution overidentifies the HLM. We show how to characterize the joint distribution so that its parameters are a one-to-one transformation of the parameters of the HLM. This leads to efficient estimation of the HLM from incomplete data using either the transformation method or the method of multiple imputation. The approach allows outcomes and covariates to be missing at either of the two levels, and the HLM of interest can involve the regression of any subset of variables on a disjoint subset of variables conceived as covariates.

14.
Effects of censoring on parameter estimates and power in genetic modeling (total citations: 5; self-citations: 0; citations by others: 5)
Genetic and environmental influences on variance in phenotypic traits may be estimated with normal theory Maximum Likelihood (ML). However, when the assumption of multivariate normality is not met, this method may result in biased parameter estimates and incorrect likelihood ratio tests. We simulated multivariate normal distributed twin data under the assumption of three different genetic models. Genetic model fitting was performed in six data sets: multivariate normal data, discrete uncensored data, censored data, square root transformed censored data, normal scores of censored data, and categorical data. Estimates were obtained with normal theory ML (data sets 1-5) and with categorical data analysis (data set 6). Statistical power was examined by fitting reduced models to the data. When fitting an ACE model to censored data, an unbiased estimate of the additive genetic effect was obtained. However, the common environmental effect was underestimated and the unique environmental effect was overestimated. Transformations did not remove this bias. When fitting an ADE model, the additive genetic effect was underestimated while the dominant and unique environmental effects were overestimated. In all models, the correct parameter estimates were recovered with categorical data analysis. However, with categorical data analysis, the statistical power decreased. The analysis of L-shaped distributed data with normal theory ML results in biased parameter estimates. Unbiased parameter estimates are obtained with categorical data analysis, but the power decreases.

15.
Due to increasing discoveries of biomarkers and observed diversity among patients, there is growing interest in personalized medicine for the purpose of increasing the well‐being of patients (ethics) and extending human life. In fact, these biomarkers and observed heterogeneity among patients are useful covariates that can be used to achieve the ethical goals of clinical trials and improve the efficiency of statistical inference. Covariate‐adjusted response‐adaptive (CARA) design was developed to use information in such covariates in randomization to maximize the well‐being of participating patients as well as increase the efficiency of statistical inference at the end of a clinical trial. In this paper, we establish conditions for consistency and asymptotic normality of maximum likelihood (ML) estimators of generalized linear models (GLM) for a general class of adaptive designs. We prove that the ML estimators are consistent and asymptotically follow a multivariate Gaussian distribution. The efficiency of the estimators and the performance of response‐adaptive (RA), CARA, and completely randomized (CR) designs are examined based on the well‐being of patients under a logit model with categorical covariates. Results from our simulation studies and application to data from a clinical trial on stroke prevention in atrial fibrillation (SPAF) show that RA designs lead to ethically desirable outcomes as well as higher statistical efficiency compared to CARA designs if there is no treatment by covariate interaction in an ideal model. CARA designs were, however, more ethical than RA designs when there was significant interaction.

16.
Lou XY, Yang MC. Genetica 2006, 128(1-3): 471-484
A genetic model is developed with additive and dominance effects of a single gene and polygenes as well as general and specific reciprocal effects for the progeny from a diallel mating design. The methods of ANOVA, minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood estimation (REML), and maximum likelihood estimation (ML) are suggested for estimating variance components, and the methods of generalized least squares (GLS) and ordinary least squares (OLS) for fixed effects, while best linear unbiased prediction, linear unbiased prediction (LUP), and adjusted unbiased prediction are suggested for analyzing random effects. Monte Carlo simulations were conducted to evaluate the unbiasedness and efficiency of statistical methods involving two diallel designs with commonly used sample sizes, 6 and 8 parents, without and with missing crosses, respectively. Simulation results show that GLS and OLS are almost equally efficient for estimation of fixed effects, while MINQUE (1) and REML are better estimators of the variance components and LUP is the most practical method for prediction of random effects. Data from a Drosophila melanogaster experiment (Gilbert 1985a, Theor appl Genet 69:625–629) were used as a working example to demonstrate the statistical analysis. The new methodology is also applicable to screening candidate gene(s) and to other mating designs with multiple parents, such as nested (NC Design I) and factorial (NC Design II) designs. Moreover, this methodology can serve as a guide to develop new methods for detecting indiscernible major genes and mapping quantitative trait loci based on mixture distribution theory. The computer program for the methods suggested in this article is freely available from the authors.

17.
Rodenberg C, Zhou XH. Biometrics 2000, 56(4): 1256-1262
A receiver operating characteristic (ROC) curve is commonly used to measure the accuracy of a medical test. It is a plot of the true positive fraction (sensitivity) against the false positive fraction (1-specificity) for increasingly stringent positivity criteria. Bias can occur in estimation of an ROC curve if only some of the tested patients are selected for disease verification and if analysis is restricted only to the verified cases. This bias is known as verification bias. In this paper, we address the problem of correcting for verification bias in estimation of an ROC curve when the verification process and efficacy of the diagnostic test depend on covariates. Our method applies the EM algorithm to ordinal regression models to derive ML estimates for ROC curves as a function of covariates, adjusted for covariates affecting the likelihood of being verified. Asymptotic variance estimates are obtained using the observed information matrix of the observed data. These estimates are derived under the missing-at-random assumption, which means that selection for disease verification depends only on the observed data, i.e., the test result and the observed covariates. We also address the issues of model selection and model checking. Finally, we illustrate the proposed method on data from a two-phase study of dementia disorders, where selection for verification depends on the screening test result and age.
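
The definition of the ROC curve given in this abstract can be sketched as a plain empirical construction. The sketch assumes that every subject's true disease status is known, so it does not address the verification bias correction that is the paper's actual contribution; the function name and scores are hypothetical, and a subject is called positive when the score is at or above the threshold.

```python
def roc_points(diseased, healthy):
    """Empirical ROC points as (false positive fraction, true positive fraction).

    Thresholds run from the most stringent (highest score) to the least
    stringent, so the points trace the curve from (0, 0) to (1, 1).
    """
    if not diseased or not healthy:
        raise ValueError("both groups must be non-empty")
    thresholds = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score flags nobody
    for t in thresholds:
        tpf = sum(s >= t for s in diseased) / len(diseased)  # sensitivity
        fpf = sum(s >= t for s in healthy) / len(healthy)    # 1 - specificity
        points.append((fpf, tpf))
    return points


# Hypothetical ordinal test scores, for illustration only.
for fpf, tpf in roc_points([3, 4, 5], [1, 2, 3]):
    print(f"FPF={fpf:.2f}  TPF={tpf:.2f}")
```

If only verified cases were passed to such a function, and verification depended on the test result, the resulting sensitivity and specificity would be distorted, which is the bias the abstract describes.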

18.
SUMMARY: We consider two-armed clinical trials in which the response and/or the covariates are observed on either a binary, ordinal, or continuous scale. A new general nonparametric (NP) approach for covariate adjustment is presented using the notion of a relative effect to describe treatment effects. The relative effect is defined by the probability of observing a higher response in the experimental than in the control arm. The notion is invariant under monotone transformations of the data and is therefore especially suitable for ordinal data. For a normal or binary distributed response the relative effect is the transformed effect size or the difference of response probability, respectively. An unbiased and consistent NP estimator for the relative effect is presented. Further, we suggest a NP procedure for correcting the relative effect for covariate imbalance and random covariate imbalance, yielding a consistent estimator for the adjusted relative effect. Asymptotic theory has been developed to derive test statistics and confidence intervals. The test statistic is based on the joint behavior of the estimated relative effect for the response and the covariates. It is shown that the test statistic can be used to evaluate the treatment effect in the presence of (random) covariate imbalance. Approximations for small sample sizes are considered as well. The sampling behavior of the estimator of the adjusted relative effect is examined. We also compare the probability of a type I error and the power of our approach to standard covariate adjustment methods by means of a simulation study. Finally, our approach is illustrated on three studies involving ordinal responses and covariates.
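
The unadjusted relative effect described in this abstract can be estimated by comparing every control response with every experimental response. The sketch below is a Mann-Whitney-type estimator under one stated assumption: tied pairs are counted as one half, a common convention for ordinal data that the abstract itself does not spell out. The function name and data are hypothetical, and the paper's covariate adjustment is not reproduced here.

```python
def relative_effect(control, experimental):
    """Estimate P(experimental > control) + 0.5 * P(experimental == control)."""
    if not control or not experimental:
        raise ValueError("both arms must be non-empty")
    score = 0.0
    for c in control:
        for e in experimental:
            if e > c:
                score += 1.0
            elif e == c:
                score += 0.5  # assumed tie handling, see the note above
    return score / (len(control) * len(experimental))


# Hypothetical ordinal responses (higher means a better outcome).
print(relative_effect([1, 2, 3], [2, 3, 4]))
```

A value of 0.5 indicates no tendency toward higher responses in either arm. Because only the ordering of the responses is used, the estimate does not change under monotone transformations of the data, which matches the invariance property noted in the abstract.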

19.
This paper introduces a simple stochastic model for waterfowl movement. After outlining the properties of the model, we focus on parameter estimation. We compare three standard least squares estimation procedures with maximum likelihood (ML) estimates using Monte Carlo simulations. For our model, little is gained by incorporating information about the covariance structure of the process into least squares estimation. In fact, misspecifying the covariance produces worse estimates than ignoring heteroscedasticity and autocorrelation. We also develop a modified least squares procedure that performs as well as ML. We then apply the five estimators to field data and show that differences in the statistical properties of the estimators can greatly affect our interpretation of the data. We conclude by highlighting the effects of density on per capita movement rates.

20.
Has land surface cover in South America been impacted by the loss of most large herbivores following the severe Pleistocene and Early Holocene megafauna extinctions on this continent? Here, we estimate how mean savanna woody biomass may have changed in the Americas following these extinctions by creating an empirical model to understand how large herbivores impact savanna woody biomass. To create this empirical model, we combine a large recently published dataset of savanna woody cover from Lehmann et al. (2014) (n = 2154 plots) with estimates of mammal ranges and weights from the IUCN database. We evaluate how variables such as number of megaherbivores (mammal species ≥ 1000 kg), log10 sum species weights, and total number of mammal species predict changes to woody cover by using both ordinary least squares regression analysis (OLS) and simultaneous auto‐regressive (SAR) analysis to control for spatial autocorrelation. Both number of megaherbivores and log10 sum species weights, which both disproportionately weight for megaherbivores, explained a significant share (~5–13%) of the variance in woody cover, but the third variable, which weights all animals equally, did not. We then combined these biotic variables with abiotic variables such as temperature, precipitation, and fire frequency to create a model predicting 36% of the variance of savanna woody cover. We used this model combined with estimated range maps of extinct South American megafauna to estimate that had those South American megafauna not gone extinct, total savanna woody cover in South America could possibly have decreased by ~29% and that savannas would likely have been more open like current African savannas.
