Similar Articles
 20 similar articles found (search time: 31 ms)
1.
The weights used in iterative weighted least squares (IWLS) regression are usually estimated parametrically using a working model for the error variance. When the variance function is misspecified, the IWLS estimates of the regression coefficients β are still asymptotically consistent, but there is some loss in efficiency. Since second moments can be quite hard to model, it makes sense to estimate the error variances nonparametrically and to employ weights inversely proportional to the estimated variances in computing the WLS estimate for β. Surprisingly, this approach has not received much attention in the literature. The aim of this note is to demonstrate that such a procedure can be implemented easily in S-plus using standard functions with default options, making it suitable for routine applications. The particular smoothing method that we use is local polynomial regression applied to the logarithm of the squared residuals, but other smoothers can be tried as well. The proposed procedure is applied to data on the use of two different assay methods for a hormone. Efficiency calculations based on the estimated model show that the nonparametric IWLS estimates are more efficient than the parametric IWLS estimates based on three different plausible working models for the variance function. The proposed estimators also perform well in a simulation study using both parametric and nonparametric variance functions as well as normal and gamma errors.
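The core loop of the procedure above (fit, smooth the log squared residuals, reweight) can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the authors' S-plus implementation: the local polynomial smoother here is a simple local linear fit with a Gaussian kernel, the log squared residuals are smoothed against the fitted means, and all function names are hypothetical.

```python
import numpy as np

def local_linear_smooth(x, y, bandwidth):
    """Local linear regression estimate of E[y|x] at each x (Gaussian kernel)."""
    fitted = np.empty_like(y, dtype=float)
    for i, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        sw = np.sqrt(w)
        A = np.column_stack([np.ones_like(x), x - x0])
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        fitted[i] = beta[0]          # intercept = fit at x0
    return fitted

def nonparametric_iwls(X, y, bandwidth=1.0, n_iter=3):
    """IWLS with weights from a nonparametric variance estimate.

    The variance function is estimated by smoothing log(residual^2),
    and the weights are taken inversely proportional to it.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from OLS
    for _ in range(n_iter):
        mu = X @ beta
        resid = y - mu
        log_var = local_linear_smooth(mu, np.log(resid**2 + 1e-12), bandwidth)
        w = 1.0 / np.exp(log_var)                        # weights ∝ 1/variance
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)       # WLS normal equations
    return beta
```

The multiplicative constant in E[log(residual²)] cancels out of the weights, so the bias of the log transform does not affect the coefficient estimates.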

2.
Wu Wang  Ying Sun 《Biometrics》2019,75(4):1179-1190
When performing spatial regression analysis in environmental data applications, spatial heterogeneity in the regression coefficients is often observed. Spatially varying coefficient models, including geographically weighted regression and spline models, are standard tools for quantifying such heterogeneity. In this paper, we propose a spatially varying coefficient model that represents the spatially varying parameters as a mixture of local polynomials at selected locations. The local polynomial parameters have attractive interpretations, indicating various types of spatial heterogeneity. Instead of estimating the spatially varying regression coefficients directly, we develop a penalized least squares regression procedure for the local polynomial parameter estimation, which both shrinks the parameter estimates and penalizes the differences among parameters that are associated with neighboring locations. We develop confidence intervals for the varying regression coefficients and prediction intervals for the response. We apply the proposed method to characterize the spatially varying association between particulate matter concentrations (PM2.5) and pollutant gases related to secondary aerosol formation in China. The identified regression coefficients show distinct spatial patterns for nitrogen dioxide, sulfur dioxide, and carbon monoxide during different seasons.
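A quadratic (ridge-type) difference penalty is one simple stand-in for the neighbor-difference penalty described above; the paper's actual penalty and local polynomial parameterization are richer. A minimal closed-form sketch, with hypothetical names:

```python
import numpy as np

def first_difference_matrix(p):
    """(p-1) x p matrix whose rows compute b[i+1] - b[i] for a chain of neighbors."""
    D = np.zeros((p - 1, p))
    idx = np.arange(p - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

def difference_penalized_ls(X, y, D, lam):
    """Minimize ||y - X b||^2 + lam * ||D b||^2 in closed form.

    With D a difference matrix over neighboring locations, the penalty
    shrinks adjacent coefficients toward each other, encoding spatial
    smoothness of the varying coefficients.
    """
    A = X.T @ X + lam * (D.T @ D)
    return np.linalg.solve(A, X.T @ y)
```

As lam grows, the fitted coefficients are forced toward a spatially constant value; lam = 0 recovers ordinary least squares.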

3.
We investigate methods for regression analysis when covariates are measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies the classical measurement error model, but it may not have repeated measurements. In addition to the surrogate variables that are available among the subjects in the calibration sample, we assume that there is an instrumental variable (IV) that is available for all study subjects. An IV is correlated with the unobserved true exposure variable and hence can be useful in the estimation of the regression coefficients. We propose a robust best linear estimator that uses all the available data, which is the most efficient among a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error from the surrogate or IV is heteroscedastic. Finite-sample performance of the proposed estimator is examined and compared with other estimators via intensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

4.
The efficiencies of the estimators in the linear logistic regression model are examined using simulations under six missing value treatments. These treatments use either the maximum likelihood or the discriminant function approach in the estimation of the regression coefficients. Missing values are assumed to occur at random. The cases of multivariate normal and dichotomous independent variables are both considered. We found that in general, there is no uniformly best method. However, mean substitution and discriminant function estimation using existing pairs of values for correlations turn out to be favourable for the cases considered.
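Mean substitution, one of the missing value treatments compared above, is easy to state concretely: each missing entry is replaced by the observed mean of its column. A minimal NumPy sketch with a hypothetical name:

```python
import numpy as np

def mean_substitution(X):
    """Replace NaNs in each column of X with that column's observed mean."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)      # means over non-missing entries
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X
```

The imputed matrix can then be passed to any complete-data fitting routine, at the cost of understating the variability due to missingness.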

5.
Variable Selection for Semiparametric Mixed Models in Longitudinal Studies   (total citations: 2; self-citations: 0; citations by others: 2)
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared to existing estimating equation based approaches, our procedure provides valid inference for data that are missing at random, and it is more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimates for the estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.

6.
A method for fitting regression models to data that exhibit spatial correlation and heteroskedasticity is proposed. It is well known that ignoring a nonconstant variance does not bias least-squares estimates of regression parameters; thus, data analysts are easily led to the false belief that moderate heteroskedasticity can generally be ignored. Unfortunately, ignoring nonconstant variance when fitting variograms can seriously bias estimated correlation functions. By modeling heteroskedasticity and standardizing by estimated standard deviations, our approach eliminates this bias in the correlations. A combination of parametric and nonparametric regression techniques is used to iteratively estimate the various components of the model. The approach is demonstrated on a large data set of predicted nitrogen runoff from agricultural lands in the Midwest and Northern Plains regions of the U.S.A. For this data set, the model comprises three main components: (1) the mean function, which includes farming practice variables, local soil and climate characteristics, and the nitrogen application treatment, is assumed to be linear in the parameters and is fitted by generalized least squares; (2) the variance function, which contains a local and a spatial component whose shapes are left unspecified, is estimated by local linear regression; and (3) the spatial correlation function is estimated by fitting a parametric variogram model to the standardized residuals, with the standardization adjusting the variogram for the presence of heteroskedasticity. The fitting of these three components is iterated until convergence. The model provides an improved fit to the data compared with a previous model that ignored the heteroskedasticity and the spatial correlation.

7.
Dimension reduction methods have been proposed for regression analysis with predictors of high dimension, but they have not received much attention for problems with censored data. In this article, we present an iterative imputed spline approach based on principal Hessian directions (PHD) for censored survival data in order to reduce the dimension of the predictors without requiring a prespecified parametric model. Our proposal is to replace the right-censored survival time with its conditional expectation, adjusting for the censoring effect by using the Kaplan-Meier estimator and an adaptive polynomial spline regression in the residual imputation. A sparse estimation strategy is incorporated in our approach to enhance the interpretation of variable selection. This approach can be implemented not only in PHD but also in other methods developed for estimating the central mean subspace. Simulation studies with right-censored data are conducted for the imputed spline approach to PHD (IS-PHD) in comparison with two sliced inverse regression methods, minimum average variance estimation, and a naive PHD that ignores censoring. The results demonstrate that the proposed IS-PHD method is particularly useful for survival time responses approximating symmetric or bending structures. Illustrative applications to two real data sets are also presented.

8.
There are copula-based statistical models in the literature for regression with dependent data such as clustered and longitudinal overdispersed counts, for which parameter estimation and inference are straightforward. For situations where the main interest is in the regression and other univariate parameters and not the dependence, we propose a "weighted scores method", which is based on weighting the score functions of the univariate margins. The weight matrices are obtained by initially fitting a discretized multivariate normal distribution, which admits a wide range of dependence. The general methodology is applied to negative binomial regression models. Asymptotic and small-sample efficiency calculations show that our method is robust and nearly as efficient as maximum likelihood for fully specified copula models. An illustrative example is given to show the use of our weighted scores method to analyze utilization of health care based on family characteristics.

9.
The covariance function approach with an iterative two-stage algorithm of LIU et al. (2000) was applied to estimate parameters for the Polish Black-and-White dairy population based on a sample of 338 808 test day records for milk, fat, and protein yields. A multiple trait sire model was used to estimate covariances of lactation stages. A third-order Legendre polynomial was subsequently fitted to the estimated (co)variances to derive (co)variances of random regression coefficients for both additive genetic and permanent environment effects. Daily and 305-day heritability estimates obtained are consistent with several studies which used both fixed and random regression test day models. Genetic correlations between any two days in milk (DIM) of the same lactation as well as genetic correlations between the same DIM of two lactations were within a biologically acceptable range. It was shown that the applied estimation procedure can utilise very large data sets and give plausible estimates of (co)variance components.
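The second-stage step described above, fitting a third-order Legendre polynomial to estimated values over days in milk, can be sketched with NumPy's Legendre utilities. This illustrates only the curve-fitting step, not the full two-stage (co)variance estimation; names are hypothetical, and DIM is mapped to [-1, 1] as is standard for Legendre bases in random regression models.

```python
import numpy as np
from numpy.polynomial import legendre

def fit_legendre_curve(dim, values, degree=3):
    """Fit a Legendre polynomial of given degree to stage-of-lactation values.

    Days in milk (DIM) are mapped linearly onto [-1, 1], the natural
    domain of the Legendre polynomials.
    """
    x = 2.0 * (dim - dim.min()) / (dim.max() - dim.min()) - 1.0
    coef = legendre.legfit(x, values, degree)
    return x, coef

def eval_legendre_curve(x, coef):
    """Evaluate the fitted curve on the mapped [-1, 1] scale."""
    return legendre.legval(x, coef)
```

In the actual application the same basis is applied to whole (co)variance matrices across lactation stages, not just a single curve.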

10.
We present a robust and computationally inexpensive method to estimate the lengths and three-dimensional moment arms for a large number of musculotendon actuators of the human lower limb. Using a musculoskeletal model of the lower extremity, a set of values was established for the length of each musculotendon actuator for different lower limb generalized coordinates (joint angles). A multidimensional spline function was then used to fit these data. Muscle moment arms were obtained by differentiating the musculotendon length spline function with respect to the generalized coordinate of interest. This new method was then compared to a previously used polynomial regression method. Compared to the polynomial regression method, the multidimensional spline method produced lower errors for estimating musculotendon lengths and moment arms throughout the whole generalized coordinate workspace. The fitting accuracy was also less affected by the number of dependent degrees of freedom and by the amount of experimental data available. The spline method only required information on musculotendon lengths to estimate both musculotendon lengths and moment arms, thus relaxing data input requirements, whereas the polynomial regression requires different equations to be used for musculotendon lengths and moment arms. Finally, we used the spline method in conjunction with an electromyography-driven musculoskeletal model to estimate muscle forces under different contractile conditions, which showed that the method is suitable for integration into large-scale neuromusculoskeletal models.
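The key relationship used above, obtaining moment arms by differentiating the fitted musculotendon-length function with respect to the generalized coordinate (the tendon-excursion method), can be illustrated in one dimension. This sketch uses a polynomial fit rather than the paper's multidimensional spline, purely for brevity; names are hypothetical.

```python
import numpy as np

def fit_length_and_moment_arm(theta, length, degree=5):
    """Fit musculotendon length vs. joint angle, then recover the moment arm.

    By the tendon-excursion method the moment arm is the negative
    derivative of musculotendon length with respect to the joint angle:
    r(q) = -dL/dq.
    """
    p = np.polynomial.Polynomial.fit(theta, length, degree)
    dp = p.deriv()                      # derivative w.r.t. the original angle
    return p, (lambda q: -dp(q))        # length model, moment arm model
```

A spline-based version would replace the polynomial fit but keep the same differentiation step, which is why the spline method needs only length data to deliver both quantities.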

11.
We describe the development of a multipoint nonparametric quantitative trait loci mapping method based on the Wilcoxon rank-sum test, applicable to outbred half-sib pedigrees. The method has been evaluated on a simulated dataset and its efficiency compared with interval mapping by regression. The rank-based approach was shown to be slightly inferior to regression when the residual variance is homoscedastic normal; however, in three other scenarios envisaged (residual variance heteroscedastic normal, homoscedastic skewed, and homoscedastic positively kurtosed), the rank-based approach outperforms regression. Both methods were applied to a real data set analyzing the effect of bovine chromosome 6 on milk yield and composition by using a 125-cM map comprising 15 microsatellites and a granddaughter design counting 1158 Holstein-Friesian sires.

12.
Local approximation and its applications in statistics   (total citations: 1; self-citations: 0; citations by others: 1)
For both the discrete and the continuous case, the problem of evaluating the derivatives of a function f(x) on a given interval of x is solved by a local approximation method. Examples of applications of the resulting numerical procedures are given, relating to the estimation of a smooth function and its derivative from measured values (of a growth process), internal regression, trend elimination in time series, Bernstein polynomials, and kernel estimation of a density function.
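The basic device, fitting a weighted polynomial locally around a point x0 and reading the derivative off the linear coefficient, can be sketched as follows. A minimal NumPy illustration under a Gaussian kernel, with hypothetical names:

```python
import numpy as np

def local_poly_derivative(x, y, x0, bandwidth, degree=2):
    """Estimate f'(x0) from samples (x, y) by local polynomial approximation.

    Fit a degree-`degree` polynomial in (x - x0) by kernel-weighted least
    squares; the coefficient of the linear term estimates the derivative.
    """
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # Gaussian kernel weights
    A = np.vander(x - x0, degree + 1, increasing=True)  # [1, (x-x0), (x-x0)^2, ...]
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta[1]
```

Higher-order derivatives follow the same pattern: the coefficient of (x - x0)^k, multiplied by k!, estimates the k-th derivative at x0.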

13.
Fei Liu  David Dunson  Fei Zou 《Biometrics》2011,67(2):504-512
This article considers the problem of selecting predictors of time to an event from a high-dimensional set of candidate predictors using data from multiple studies. As an alternative to the current multistage testing approaches, we propose to model the study-to-study heterogeneity explicitly using a hierarchical model to borrow strength. Our method incorporates censored data through an accelerated failure time model. Using a carefully formulated prior specification, we develop a fast approach to predictor selection and shrinkage estimation for high-dimensional predictors. For model fitting, we develop a Monte Carlo expectation maximization (MC-EM) algorithm to accommodate censored data. The proposed approach, which is related to the relevance vector machine (RVM), relies on maximum a posteriori estimation to rapidly obtain a sparse estimate. As for the typical RVM, there is an intrinsic thresholding property in which unimportant predictors tend to have their coefficients shrunk to zero. We compare our method with some commonly used procedures through simulation studies. We also illustrate the method using the gene expression barcode data from three breast cancer studies.

14.
This paper focuses on the problems of estimation and variable selection in the functional linear regression model (FLM) with functional response and scalar covariates. To this end, two different types of regularization (L1 and L2) are considered. On the one hand, a sample approach for the functional LASSO in terms of a basis representation of the sample values of the response variable is proposed. On the other hand, we propose a penalized version of the FLM by introducing a P-spline penalty in the least squares fitting criterion. Our aim is to propose P-splines as a powerful tool for simultaneous variable selection and functional parameter estimation. In that sense, the importance of smoothing the response variable before fitting the model is also studied. In summary, penalized (L1 and L2) and nonpenalized regression are combined with a presmoothing of the response variable sample curves, based on regression splines or P-splines, providing a total of six approaches to be compared in two simulation schemes. Finally, the most competitive approach is applied to a real data set on graft-versus-host disease, which is one of the most frequent complications (30%–50%) in allogeneic hematopoietic stem-cell transplantation.

15.
The standard Cox model is perhaps the most commonly used model for regression analysis of failure time data but it has some limitations such as the assumption on linear covariate effects. To relax this, the nonparametric additive Cox model, which allows for nonlinear covariate effects, is often employed, and this paper will discuss variable selection and structure estimation for this general model. For the problem, we propose a penalized sieve maximum likelihood approach with the use of Bernstein polynomials approximation and group penalization. To implement the proposed method, an efficient group coordinate descent algorithm is developed and can be easily carried out for both low- and high-dimensional scenarios. Furthermore, a simulation study is performed to assess the performance of the presented approach and suggests that it works well in practice. The proposed method is applied to an Alzheimer's disease study for identifying important and relevant genetic factors.

16.
Forensic age estimation has received growing attention from researchers in the last few years. Accurate estimates of age are needed both for identifying the real age of individuals without any identity document and for assessing it for human remains. The methods applied in this context are mostly based on radiological analysis of some anatomical districts and entail the use of a regression model. However, estimating chronological age by regression models leads to overestimated ages in younger subjects and underestimated ages in older ones. To mitigate this bias, we introduced a full Bayesian calibration method combined with a segmented function for age estimation, relying on a Normal distribution as the density model. In this way, we were also able to model the decreasing growth rate in juveniles. We compared our new Bayesian segmented model with other existing approaches. The proposed method produced more robust and precise forecasts of age than the compared models while exhibiting comparable accuracy in terms of forecasting measures. Our method also appeared to overcome the estimation bias when applied to a real data set of South African juvenile subjects.

17.
A random regression model for the analysis of "repeated" records in animal breeding is described which combines a random regression approach for additive genetic and other random effects with the assumption of a parametric correlation structure for within-animal covariances. Both stationary and non-stationary correlation models involving a small number of parameters are considered. Heterogeneity in within-animal variances is modelled through polynomial variance functions. Estimation of the parameters describing the dispersion structure of such a model by restricted maximum likelihood via an "average information" algorithm is outlined. An application to mature weight records of beef cows is given, and results are contrasted to those from analyses fitting sets of random regression coefficients for permanent environmental effects.

18.
The application of joint contact mechanics requires a precise configuration of the joint surfaces. B-splines and NURBS have been widely used to model joint surfaces, but because these formulations require a structured data set, defined first on a rectangular net and then on a grid, there is a limit to the accuracy of the models they can produce. However, new imaging systems such as 3D laser scanners can provide more realistic, unstructured data sets, so a method to manipulate unstructured data is needed. We created a parametric polynomial function and applied it to unstructured data sets obtained by scanning joint surfaces. We applied our polynomial model to unstructured data sets from an artificial joint, and confirmed that our polynomial produced a smoother and more accurate model than the conventional B-spline method. Next, we applied it to a diarthrodial joint surface containing many ripples, and found that our function's noise-filtering characteristics smoothed out existing ripples. Since no formulation was found to be optimal for all applications, we used two formulations to model surfaces with ripples. First, we used our polynomial to describe the global shape of the objective surface. Minute undulations were then specifically approximated with a Fourier series function. Finally, both approximated surfaces were superimposed to reproduce the original surface in a complete fashion.

19.
Welty LJ  Peng RD  Zeger SL  Dominici F 《Biometrics》2009,65(1):282-291
A distributed lag model (DLagM) is a regression model that includes lagged exposure variables as covariates; its corresponding distributed lag (DL) function describes the relationship between the lag and the coefficient of the lagged exposure variable. DLagMs have recently been used in environmental epidemiology for quantifying the cumulative effects of weather and air pollution on mortality and morbidity. Standard methods for formulating DLagMs include unconstrained, polynomial, and penalized spline DLagMs. These methods may fail to take full advantage of prior information about the shape of the DL function for environmental exposures, or for any other exposure with effects that are believed to smoothly approach zero as lag increases, and are therefore at risk of producing suboptimal estimates. In this article, we propose a Bayesian DLagM (BDLagM) that incorporates prior knowledge about the shape of the DL function and also allows the degree of smoothness of the DL function to be estimated from the data. We apply our BDLagM to its motivating data from the National Morbidity, Mortality, and Air Pollution Study to estimate the short-term health effects of particulate matter air pollution on mortality from 1987 to 2000 for Chicago, Illinois. In a simulation study, we compare our Bayesian approach with alternative methods that use unconstrained, polynomial, and penalized spline DLagMs. We also illustrate the connection between BDLagMs and penalized spline DLagMs. Software for fitting BDLagM models and the data used in this article are available online.
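A polynomial (Almon-type) distributed lag model, one of the standard formulations mentioned above, constrains the lag coefficients to lie on a low-degree polynomial in the lag index. A minimal least-squares sketch with hypothetical names (the article's Bayesian and penalized spline DLagMs are not reproduced here):

```python
import numpy as np

def polynomial_dlm(exposure, y, max_lag, degree):
    """Fit a polynomial (Almon) distributed lag model by least squares.

    The lag coefficients theta_0, ..., theta_max_lag are constrained to a
    degree-`degree` polynomial in the lag index l, encoding a smooth
    distributed lag (DL) function.
    """
    n = len(y)
    # lagged design matrix: column l holds the exposure at lag l
    L = np.column_stack(
        [exposure[max_lag - l : n - l] for l in range(max_lag + 1)]
    )
    # basis mapping polynomial coefficients to lag coefficients: B[l] = [1, l, l^2, ...]
    B = np.vander(np.arange(max_lag + 1.0), degree + 1, increasing=True)
    design = np.column_stack([np.ones(n - max_lag), L @ B])
    alpha = np.linalg.lstsq(design, y[max_lag:], rcond=None)[0]
    theta = B @ alpha[1:]   # estimated DL function over lags 0..max_lag
    return theta
```

The unconstrained DLagM corresponds to degree = max_lag; lowering the degree trades flexibility of the DL function for variance reduction, which is the tension the Bayesian approach addresses via an estimated smoothness.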

20.
Semiparametric smoothing methods are commonly used to model longitudinal data, where the interest lies in improving the efficiency of the regression coefficient estimates. This paper is concerned with estimation in semiparametric varying-coefficient models (SVCMs) for longitudinal data. Using the orthogonal projection method, the local linear technique, quasi-score estimation, and quasi-maximum likelihood estimation, we propose a two-stage orthogonality-based method to estimate the parameter vector, the coefficient function vector, and the covariance function. The developed procedures can be implemented separately, and the resulting estimators do not affect each other. Under some mild conditions, asymptotic properties of the resulting estimators are established explicitly. In particular, the asymptotic behavior of the estimator of the coefficient function vector at the boundaries is examined. Further, the finite-sample performance of the proposed procedures is assessed by Monte Carlo simulation experiments. Finally, the proposed methodology is illustrated with an analysis of an acquired immune deficiency syndrome (AIDS) dataset.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号