Similar articles
1.
Kneib T, Fahrmeir L. Biometrics, 2006, 62(1): 109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

2.
The total deviation index of Lin and Lin et al. is an intuitive approach for the assessment of agreement between two methods of measurement. It assumes that the differences of the paired measurements are a random sample from a normal distribution and works essentially by constructing a probability content tolerance interval for this distribution. We generalize this approach to the case when differences may not have identical distributions, a common scenario in applications. In particular, we use the regression approach to model the mean and the variance of differences as functions of observed values of the average of the paired measurements, and describe two methods based on asymptotic theory of maximum likelihood estimators for constructing a simultaneous probability content tolerance band. The first method uses the bootstrap to approximate the critical point and the second method is an analytical approximation. Simulation shows that the first method works well for sample sizes as small as 30 and the second method is preferable for large sample sizes. We also extend the methodology for the case when the mean function is modeled using penalized splines via a mixed model representation. Two real data applications are presented.
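As a rough illustration of the basic (identically distributed) case that this abstract starts from, the sketch below computes the total deviation index as the p-th quantile of |D| for normal differences D. The function names, the bisection search, and the use of Lin's commonly quoted moment-based approximation are assumptions made for illustration; this is not the regression-based tolerance band proposed in the article.

```python
from statistics import NormalDist

def tdi_normal(mu, sigma, p=0.9, tol=1e-10):
    """Smallest t with P(|D| <= t) >= p when D ~ Normal(mu, sigma^2).

    Hypothetical helper: solves the coverage equation by bisection.
    """
    dist = NormalDist(mu, sigma)

    def coverage(t):
        return dist.cdf(t) - dist.cdf(-t)

    lo, hi = 0.0, abs(mu) + 10.0 * sigma  # hi is wide enough to cover p < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) < p:
            lo = mid
        else:
            hi = mid
    return hi

def tdi_lin_approx(mu, sigma, p=0.9):
    """Moment-based approximation: z_{(1+p)/2} * sqrt(mu^2 + sigma^2)."""
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * (mu ** 2 + sigma ** 2) ** 0.5
```

When mu is zero the two functions agree; the approximation drifts from the exact quantile as |mu| grows relative to sigma.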

3.
In the present paper the linear logistic extension of latent class analysis is described. Thereby it is assumed that the item latent probabilities as well as the class sizes can be attributed to some explanatory variables. The basic equations of the model state the decomposition of the log-odds of the item latent probabilities and of the class sizes into weighted sums of basic parameters representing the effects of the predictor variables. Further, the maximum likelihood equations for these effect parameters and statistical tests for goodness-of-fit are given. Finally, an example illustrates the practical application of the model and the interpretation of the model parameters.

4.
Houseman EA, Marsit C, Karagas M, Ryan LM. Biometrics, 2007, 63(4): 1269-1277
Increasingly used in health-related applications, latent variable models provide an appealing framework for handling high-dimensional exposure and response data. Item response theory (IRT) models, which have gained widespread popularity, were originally developed for use in the context of educational testing, where extremely large sample sizes permitted the estimation of a moderate-to-large number of parameters. In the context of public health applications, smaller sample sizes preclude large parameter spaces. Therefore, we propose a penalized likelihood approach to reduce mean square error and improve numerical stability. We present a continuous family of models, indexed by a tuning parameter, that ranges between the Rasch model and the IRT model. The tuning parameter is selected by cross validation or approximations such as the Akaike information criterion. While our approach can be placed easily in a Bayesian context, we find that our frequentist approach is more computationally efficient. We demonstrate our methodology on a study of methylation silencing of gene expression in bladder tumors. We obtain similar results using both frequentist and Bayesian approaches, although the frequentist approach is less computationally demanding. In particular, we find high correlation of methylation silencing among 16 loci in bladder tumors, and that methylation is associated with smoking and also with patient survival.

5.
Guo W. Biometrics, 2002, 58(1): 121-128
In this article, a new class of functional models in which smoothing splines are used to model fixed effects as well as random effects is introduced. The linear mixed effects models are extended to nonparametric mixed effects models by introducing functional random effects, which are modeled as realizations of zero-mean stochastic processes. The fixed functional effects and the random functional effects are modeled in the same functional space, which guarantees that the population-average and subject-specific curves have the same smoothness property. These models inherit the flexibility of the linear mixed effects models in handling complex designs and correlation structures, can include continuous covariates as well as dummy factors in both the fixed and random design matrices, and include the nested curves models as special cases. Two estimation procedures are proposed. The first estimation procedure exploits the connection between linear mixed effects models and smoothing splines and can be fitted using existing software. The second procedure is a sequential estimation procedure using Kalman filtering. This algorithm avoids inversion of large dimensional matrices and therefore can be applied to large data sets. A generalized maximum likelihood (GML) ratio test is proposed for inference and model selection. An application to comparison of cortisol profiles is used as an illustration.

6.
A predictive continuous time model is developed for continuous panel data to assess the effect of time-varying covariates on the general direction of the movement of a continuous response that fluctuates over time. This is accomplished by reparameterizing the infinitesimal mean of an Ornstein–Uhlenbeck process in terms of its equilibrium mean and a drift parameter, which assesses the rate at which the process reverts to its equilibrium mean. The equilibrium mean is modeled as a linear predictor of covariates. This model can be viewed as a continuous time first-order autoregressive regression model with time-varying lag effects of covariates and the response, which is more appropriate for unequally spaced panel data than its discrete time analog. Both maximum likelihood and quasi-likelihood approaches are considered for estimating the model parameters and their performances are compared through simulation studies. The simpler quasi-likelihood approach is suggested because it yields an estimator that is of high efficiency relative to the maximum likelihood estimator and it yields a variance estimator that is robust to the diffusion assumption of the model. To illustrate the proposed model, an application to diastolic blood pressure data from a follow-up study on cardiovascular diseases is presented. Missing observations are handled naturally with this model.
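To make the mean-reversion idea concrete, the sketch below uses the standard exact transition of an Ornstein–Uhlenbeck process, dY = drift * (equilibrium_mean - Y) dt + diffusion_sd dW, between two observation times. It assumes a constant equilibrium mean for simplicity, whereas the article models it through covariates; all function and parameter names are hypothetical.

```python
import math

def ou_transition(y_prev, dt, equilibrium_mean, drift, diffusion_sd):
    """Conditional mean and variance of Y(t + dt) given Y(t) = y_prev."""
    decay = math.exp(-drift * dt)  # fraction of the deviation that survives dt
    mean = equilibrium_mean + (y_prev - equilibrium_mean) * decay
    var = diffusion_sd ** 2 * (1.0 - decay ** 2) / (2.0 * drift)
    return mean, var

def ou_loglik(times, values, equilibrium_mean, drift, diffusion_sd):
    """Gaussian log-likelihood of one subject's series, conditioning on the first value.

    Unequal gaps between observation times are handled through dt.
    """
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        m, v = ou_transition(values[k - 1], dt, equilibrium_mean, drift, diffusion_sd)
        total += -0.5 * (math.log(2.0 * math.pi * v) + (values[k] - m) ** 2 / v)
    return total
```

Because the transition depends only on the elapsed time dt, a missed visit simply lengthens the gap between consecutive observations, which is one way to read the remark that missing observations are handled naturally.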

7.
Summary. Latent class analysis (LCA) and latent class regression (LCR) are widely used for modeling multivariate categorical outcomes in social science and biomedical studies. Standard analyses assume data of different respondents to be mutually independent, excluding application of the methods to familial and other designs in which participants are clustered. In this article, we consider multilevel latent class models, in which subpopulation mixing probabilities are treated as random effects that vary among clusters according to a common Dirichlet distribution. We apply the expectation-maximization (EM) algorithm for model fitting by maximum likelihood (ML). This approach works well, but is computationally intensive when either the number of classes or the cluster size is large. We propose a maximum pairwise likelihood (MPL) approach via a modified EM algorithm for this case. We also show that a simple latent class analysis, combined with robust standard errors, provides another consistent, robust, but less-efficient inferential procedure. Simulation studies suggest that the three methods work well in finite samples, and that the MPL estimates often enjoy precision comparable to that of the ML estimates. We apply our methods to the analysis of comorbid symptoms in the obsessive compulsive disorder study. Our models' random effects structure has a more straightforward interpretation than those of competing methods, and thus should usefully augment tools available for LCA of multilevel data.

8.
Summary. In this article, we present new methods to analyze data from an experiment using rodent models to investigate the role of p27, an important cell-cycle mediator, in early colon carcinogenesis. The responses modeled here are essentially functions nested within a two-stage hierarchy. Standard functional data analysis literature focuses on a single stage of hierarchy and conditionally independent functions with near white noise. However, in our experiment, there is substantial biological motivation for the existence of spatial correlation among the functions, which arises from the locations of biological structures called colonic crypts: this possible functional correlation is a phenomenon we term crypt signaling. Thus, as a point of general methodology, we require an analysis that allows for functions to be correlated at the deepest level of the hierarchy. Our approach is fully Bayesian and uses Markov chain Monte Carlo methods for inference and estimation. Analysis of this data set gives new insights into the structure of p27 expression in early colon carcinogenesis and suggests the existence of significant crypt signaling. Our methodology uses regression splines, and because of the hierarchical nature of the data, dimension reduction of the covariance matrix of the spline coefficients is important: we suggest simple methods for overcoming this problem.

9.
Summary. We consider variable selection in the Cox regression model (Cox, 1975, Biometrika 62, 269–276) with covariates missing at random. We investigate the smoothly clipped absolute deviation penalty and adaptive least absolute shrinkage and selection operator (LASSO) penalty, and propose a unified model selection and estimation procedure. A computationally attractive algorithm is developed, which simultaneously optimizes the penalized likelihood function and penalty parameters. We also optimize a model selection criterion, called the ICQ statistic (Ibrahim, Zhu, and Tang, 2008, Journal of the American Statistical Association 103, 1648–1658), to estimate the penalty parameters and show that it consistently selects all important covariates. Simulations are performed to evaluate the finite sample performance of the penalty estimates. Also, two lung cancer data sets are analyzed to demonstrate the proposed methodology.
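For readers unfamiliar with the two penalties named in this abstract, the sketch below writes them out for a single coefficient. It uses the usual textbook form of the smoothly clipped absolute deviation (SCAD) penalty with the conventional default a = 3.7 and the usual weighted form of the adaptive LASSO penalty; the function names and the `initial_estimate` and `gamma` parameters are assumptions for illustration, and nothing here reproduces the article's handling of missing covariates.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # behaves like the LASSO near zero
    if t <= a * lam:
        # quadratic blend between the linear and the constant regions
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0  # flat: large effects are not shrunk further

def adaptive_lasso_penalty(theta, lam, initial_estimate, gamma=1.0):
    """Adaptive LASSO penalty: an L1 penalty weighted by a preliminary estimate."""
    weight = 1.0 / abs(initial_estimate) ** gamma
    return lam * weight * abs(theta)
```

Both penalties charge small coefficients roughly in proportion to their size while charging large coefficients comparatively little, which is the property that makes them attractive for variable selection.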

10.
Moming Li, Guoqing Diao, Jing Qin. Biometrics, 2020, 76(4): 1216-1228
We consider a two-sample problem where data come from symmetric distributions. Usual two-sample data with only magnitudes recorded, arising from case-control studies or logistic discriminant analyses, may constitute a symmetric two-sample problem. We propose a semiparametric model such that, in addition to symmetry, the log ratio of two unknown density functions is modeled in a known parametric form. The new semiparametric model, tailor-made for symmetric two-sample data, can also be viewed as a biased sampling model subject to a symmetry constraint. A maximum empirical likelihood estimation approach is adopted to estimate the unknown model parameters, and the corresponding profile empirical likelihood ratio test is utilized to perform hypothesis testing regarding the two population distributions. Symmetry, however, comes with irregularity. It is shown that, under the null hypothesis of equal symmetric distributions, the maximum empirical likelihood estimator has degenerate Fisher information, and the test statistic has a mixture-of-χ²-type asymptotic distribution. Extensive simulation studies have been conducted to demonstrate promising statistical power under correct and misspecified models. We apply the proposed methods to two real examples.

11.
Shuwei Li, Limin Peng. Biometrics, 2023, 79(1): 253-263
Assessing the causal treatment effect on a time-to-event outcome is of key interest in many scientific investigations. The instrumental variable (IV) is a useful tool to mitigate the impact of endogenous treatment selection to attain unbiased estimation of the causal treatment effect. Existing development of IV methodology, however, has not attended to outcomes subject to interval censoring, which are ubiquitously present in studies with intermittent follow-up but are challenging to handle in terms of both theory and computation. In this work, we fill in this important gap by studying a general class of causal semiparametric transformation models with interval-censored data. We propose a nonparametric maximum likelihood estimator of the complier causal treatment effect. Moreover, we design a reliable and computationally stable expectation–maximization (EM) algorithm, which has a tractable objective function in the maximization step via the use of Poisson latent variables. The asymptotic properties of the proposed estimators, including the consistency, asymptotic normality, and semiparametric efficiency, are established with empirical process techniques. We conduct extensive simulation studies and an application to a colorectal cancer screening data set, showing satisfactory finite-sample performance of the proposed method as well as its prominent advantages over naive methods.

12.
Leung Lai T, Shih MC, Wong SP. Biometrics, 2006, 62(1): 159-167
To circumvent the computational complexity of likelihood inference in generalized mixed models that assume linear or more general additive regression models of covariate effects, Laplace's approximations to multiple integrals in the likelihood have been commonly used without addressing the issue of adequacy of the approximations for individuals with sparse observations. In this article, we propose a hybrid estimation scheme to address this issue. The likelihoods for subjects with sparse observations use Monte Carlo approximations involving importance sampling, while Laplace's approximation is used for the likelihoods of other subjects that satisfy a certain diagnostic check on the adequacy of Laplace's approximation. Because of its computational tractability, the proposed approach allows flexible modeling of covariate effects by using regression splines and model selection procedures for knot and variable selection. Its computational and statistical advantages are illustrated by simulation and by application to longitudinal data from a fecundity study of fruit flies, for which overdispersion is modeled via a double exponential family.

13.
In survival analysis when the mortality reaches a peak after some finite period and then slowly declines, it is appropriate to use a model which has a nonmonotonic failure rate. In this paper we study the log-logistic model, whose failure rate exhibits the above behavior and whose mean residual life behaves in the reverse fashion. The maximum likelihood estimation of the parameters is examined and it is proved analytically that unique maximum likelihood estimates exist for the parameters. A lung cancer data set is analyzed. Confidence intervals for the parameters as well as for the critical points of the failure rate and mean residual life functions are obtained for the high performance status (PS) and low PS subgroups, where the term performance status is a measure of general medical status.
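The rise-then-fall shape of the failure rate can be seen directly from the standard log-logistic hazard, h(t) = (shape / t) * u / (1 + u) with u = (t / scale) ** shape. The sketch below is a minimal illustration under that common scale/shape parameterization, which may differ from the one used in the paper; the function names are hypothetical.

```python
def loglogistic_hazard(t, scale, shape):
    """Failure rate of the log-logistic distribution at time t > 0."""
    u = (t / scale) ** shape
    return (shape / t) * u / (1.0 + u)

def hazard_peak_time(scale, shape):
    """Critical point of the failure rate; it exists only when shape > 1.

    Setting the derivative of log h(t) to zero gives (t / scale) ** shape = shape - 1.
    """
    if shape <= 1.0:
        raise ValueError("the hazard is monotone decreasing when shape <= 1")
    return scale * (shape - 1.0) ** (1.0 / shape)
```

For example, with scale 1 and shape 2 the hazard increases up to t = 1 and declines afterwards.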

14.
Semiparametric analysis of zero-inflated count data (cited by 1: 0 self-citations, 1 by others)
Lam KF, Xue H, Cheung YB. Biometrics, 2006, 62(4): 996-1003
Medical and public health research often involves the analysis of count data that exhibit a substantially large proportion of zeros, such as the number of heart attacks and the number of days of missed primary activities in a given period. A zero-inflated Poisson regression model, which hypothesizes a two-point heterogeneity in the population characterized by a binary random effect, is generally used to model such data. Subjects are broadly categorized into the low-risk group leading to structural zero counts and high-risk (or normal) group so that the counts can be modeled by a Poisson regression model. The main aim is to identify the explanatory variables that have significant effects on (i) the probability that the subject is from the low-risk group by means of a logistic regression formulation; and (ii) the magnitude of the counts, given that the subject is from the high-risk group by means of a Poisson regression where the effects of the covariates are assumed to be linearly related to the natural logarithm of the mean of the counts. In this article we consider a semiparametric zero-inflated Poisson regression model that postulates a possibly nonlinear relationship between the natural logarithm of the mean of the counts and a particular covariate. A sieve maximum likelihood estimation method is proposed. Asymptotic properties of the proposed sieve maximum likelihood estimators are discussed. Under some mild conditions, the estimators are shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method. For illustration purposes, the method is applied to a data set from a public health survey conducted in Indonesia where the variable of interest is the number of days of missed primary activities due to illness in a 4-week period.
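The two-part structure described in this abstract can be sketched as a log-likelihood. The code below covers only the fully parametric zero-inflated Poisson model with a single covariate (a logistic model for the structural-zero probability and a log-linear model for the Poisson mean); the sieve-based nonlinear term of the article is not implemented, and the function names and the (intercept, slope) parameter layout are assumptions.

```python
import math

def zip_log_pmf(y, zero_prob, mean):
    """Log-probability of count y under a zero-inflated Poisson model."""
    if y == 0:
        # A zero is either structural (low-risk group) or an ordinary Poisson zero.
        return math.log(zero_prob + (1.0 - zero_prob) * math.exp(-mean))
    log_poisson = -mean + y * math.log(mean) - math.lgamma(y + 1)
    return math.log(1.0 - zero_prob) + log_poisson

def zip_loglik(counts, x, gamma, beta):
    """Log-likelihood for one covariate x; gamma and beta are (intercept, slope) pairs."""
    total = 0.0
    for y, xi in zip(counts, x):
        # (i) logistic regression for membership in the low-risk group
        zero_prob = 1.0 / (1.0 + math.exp(-(gamma[0] + gamma[1] * xi)))
        # (ii) log link for the mean count in the high-risk group
        mean = math.exp(beta[0] + beta[1] * xi)
        total += zip_log_pmf(y, zero_prob, mean)
    return total
```

Maximizing `zip_loglik` over `gamma` and `beta` with any general-purpose optimizer would give the parametric fit that the semiparametric model generalizes.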

15.
16.
Huang J, Harrington D. Biometrics, 2002, 58(4): 781-791
The Cox proportional hazards model is often used for estimating the association between covariates and a potentially censored failure time, and the corresponding partial likelihood estimators are used for the estimation and prediction of relative risk of failure. However, partial likelihood estimators are unstable and have large variance when collinearity exists among the explanatory variables or when the number of failures is not much greater than the number of covariates of interest. A penalized (log) partial likelihood is proposed to give more accurate relative risk estimators. We show that asymptotically there always exists a penalty parameter for the penalized partial likelihood that reduces mean squared estimation error for log relative risk, and we propose a resampling method to choose the penalty parameter. Simulations and an example show that the bootstrap-selected penalized partial likelihood estimators can, in some instances, have smaller bias than the partial likelihood estimators and have smaller mean squared estimation and prediction errors of log relative risk. These methods are illustrated with a data set in multiple myeloma from the Eastern Cooperative Oncology Group.

17.
Modeling functional data with spatially heterogeneous shape characteristics (cited by 1: 0 self-citations, 1 by others)
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).

18.
Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then usually be fitted using the available sample data, and no allowance is necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.

19.
On estimation and prediction for spatial generalized linear mixed models (cited by 4: 0 self-citations, 4 by others)
Zhang H. Biometrics, 2002, 58(1): 129-136
We use spatial generalized linear mixed models (GLMM) to model non-Gaussian spatial variables that are observed at sampling locations in a continuous area. In many applications, prediction of random effects in a spatial GLMM is of great practical interest. We show that the minimum mean-squared error (MMSE) prediction can be done in a linear fashion in spatial GLMMs analogous to linear kriging. We develop a Monte Carlo version of the EM gradient algorithm for maximum likelihood estimation of model parameters. A by-product of this approach is that it also produces the MMSE estimates for the realized random effects at the sampled sites. This method is illustrated through a simulation study and is also applied to a real data set on plant root diseases to obtain a map of disease severity that can facilitate the practice of precision agriculture.

20.
Qin J, Leung DH. Biometrics, 2005, 61(2): 456-464
Malaria remains a major epidemiologic problem in many developing countries. Malaria is defined as the presence of parasites and symptoms (usually fever) due to the parasites. In endemic areas, an individual may have symptoms attributable either to malaria or to other causes. From a clinical viewpoint, it is important to correctly diagnose an individual who has developed symptoms so that the appropriate treatments can be given. From an epidemiologic and economic viewpoint, it is important to determine the proportion of malaria-affected cases in individuals who have symptoms so that policies on intervention programs can be developed. Once symptoms have developed in an individual, the diagnosis of malaria can be based on the analysis of the parasite levels in blood samples. However, even a blood test is not conclusive as in endemic areas many healthy individuals can have parasites in their blood slides. Therefore, data from this type of study can be viewed as coming from a mixture distribution, with the components corresponding to malaria and non-malaria cases. A unique feature in this type of data, however, is the fact that a proportion of the non-malaria cases have zero parasite levels. Therefore, one of the component distributions is itself a mixture distribution. In this article, we propose a semiparametric likelihood approach for estimating the proportion of clinical malaria using parasite-level data from a group of individuals with symptoms. Our approach assumes the density ratio for the parasite levels in clinical malaria and nonclinical malaria cases can be modeled using a logistic model. We use empirical likelihood to combine the zero and nonzero data. The maximum semiparametric likelihood estimate is more efficient than existing nonparametric estimates using only the frequencies of zero and nonzero data. On the other hand, it is more robust than a fully parametric maximum likelihood estimate that assumes a parametric model for the nonzero data. Simulation results show that the performance of the proposed method is satisfactory. The proposed method is used to analyze data from a malaria survey carried out in Tanzania.
