1.

Generalized additive modeling with implicit variable selection by likelihood-based boosting

Tutz G Binder H 《Biometrics》2006,62(4):961-971

The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson, and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to be a strong competitor to common procedures for the fitting of generalized additive models. In particular, in high-dimensional settings with many nuisance predictor variables it performs very well.
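The stagewise idea in this abstract can be illustrated with a much simpler relative of the method. The sketch below is not the authors' likelihood-based procedure: it assumes a Gaussian response and swaps the penalized splines and stumps for ridge-penalized one-variable linear learners. The function name `componentwise_boost` and the parameters `n_steps`, `nu`, and `ridge` are hypothetical. It only shows how updating one component per step yields implicit variable selection.

```python
import numpy as np

def componentwise_boost(X, y, n_steps=50, nu=0.1, ridge=1.0):
    """Componentwise L2 boosting with ridge-penalized one-variable linear learners.

    Simplified illustration (Gaussian response, linear weak learners);
    nu is the step-size shrinkage, ridge the per-learner penalty.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # centre columns so learners need no intercept
    intercept = y.mean()
    coef = np.zeros(p)
    resid = y - intercept
    col_ss = (Xc ** 2).sum(axis=0) + ridge   # penalized sum of squares per column
    for _ in range(n_steps):
        # penalized least-squares slope of the current residuals on each covariate
        slopes = Xc.T @ resid / col_ss
        # residual sum of squares each candidate learner would leave behind
        rss = ((resid[:, None] - Xc * slopes) ** 2).sum(axis=0)
        j = int(np.argmin(rss))        # update only the best-fitting component
        coef[j] += nu * slopes[j]      # weak update: take a small step
        resid -= nu * slopes[j] * Xc[:, j]
    return intercept, coef

# Simulated data: only covariates 0 and 3 carry signal, the rest are nuisance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
intercept, coef = componentwise_boost(X, y)
selected = np.flatnonzero(coef)        # covariates that were ever updated
```

Covariates that are never chosen keep a coefficient of exactly zero, so the number of boosting steps acts as the tuning parameter for both selection and smoothing in this toy version.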

2.

Low-rank scale-invariant tensor product smooths for generalized additive mixed models (total citations: 1; self-citations: 0; citations by others: 1)

Wood SN 《Biometrics》2006,62(4):1025-1036

A general method for constructing low-rank tensor product smooths for use as components of generalized additive models or generalized additive mixed models is presented. A penalized regression approach is adopted in which tensor product smooths of several variables are constructed from smooths of each variable separately, these "marginal" smooths being represented using a low-rank basis with an associated quadratic wiggliness penalty. The smooths offer several advantages: (i) they have one wiggliness penalty per covariate and are hence invariant to linear rescaling of covariates, making them useful when there is no "natural" way to scale covariates relative to each other; (ii) they have a useful tuneable range of smoothness, unlike single-penalty tensor product smooths that are scale invariant; (iii) the relatively low rank of the smooths means that they are computationally efficient; (iv) the penalties on the smooths are easily interpretable in terms of function shape; (v) the smooths can be generated completely automatically from any marginal smoothing bases and associated quadratic penalties, giving the modeler considerable flexibility to choose the basis penalty combination most appropriate to each modeling task; and (vi) the smooths can easily be written as components of a standard linear or generalized linear mixed model, allowing them to be used as components of the rich family of such models implemented in standard software, and to take advantage of the efficient and stable computational methods that have been developed for such models. A small simulation study shows that the methods can compare favorably with recently developed smoothing spline ANOVA methods.

3.

Structured additive regression for categorical space-time data: a mixed model approach (total citations: 1; self-citations: 0; citations by others: 1)

Kneib T Fahrmeir L 《Biometrics》2006,62(1):109-118

Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

4.

Semiparametric models for missing covariate and response data in regression models

Chen Q Ibrahim JG 《Biometrics》2006,62(1):177-184

We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

5.

Stochastic Approximation Boosting for Incomplete Data Problems

Joseph Sexton Petter Laake 《Biometrics》2009,65(4):1156-1163

Summary: Boosting is a powerful approach to fitting regression models. This article describes a boosting algorithm for likelihood-based estimation with incomplete data. The algorithm combines boosting with a variant of stochastic approximation that uses Markov chain Monte Carlo to deal with the missing data. Applications to fitting generalized linear and additive models with missing covariates are given. The method is applied to the Pima Indians Diabetes Data where over half of the cases contain missing values.

6.

Generalized models vs. classification tree analysis: Predicting spatial distributions of plant species at different scales

Wilfried Thuiller Miguel B. Araújo Sandra Lavorel 《Journal of Vegetation Science》2003,14(5):669-680

Abstract. Statistical models of the realized niche of species are increasingly used, but systematic comparisons of alternative methods are still limited. In particular, only a few studies have explored the effect of scale on model outputs. In this paper, we investigate the predictive ability of three statistical methods (generalized linear models, generalized additive models and classification tree analysis) using species distribution data at three scales: fine (Catalonia), intermediate (Portugal) and coarse (Europe). Four Mediterranean tree species were modelled for comparison. Variables selected by models were relatively consistent across scales and the predictive accuracy of models varied only slightly. However, there were slight differences in the performance of methods. Classification tree analysis had a lower accuracy than the generalized methods, especially at finer scales. The performance of generalized linear models also increased with scale. At the fine scale GLM with linear terms showed better accuracy than GLM with quadratic and polynomial terms. This is probably because distributions at finer scales represent a linear sub-sample of entire realized niches of species. In contrast to GLM, the performance of GAM was constant across scales, being more data-oriented. The predictive accuracy of GAM was always at least equal to other techniques, suggesting that this modelling approach is more robust to variations of scale because it can deal with any response shape.

7.

The use of mixed logit models to reflect heterogeneity in capture-recapture studies (total citations: 2; self-citations: 0; citations by others: 2)

Coull BA Agresti A 《Biometrics》1999,55(1):294-301

We examine issues in estimating population size N with capture-recapture models when there is variable catchability among subjects. We focus on a logistic-normal mixed model, for which the logit of the probability of capture is an additive function of a random subject and a fixed sampling occasion parameter. When the probability of capture is small or the degree of heterogeneity is large, the log-likelihood surface is relatively flat and it is difficult to obtain much information about N. We also discuss a latent class model and a log-linear model that account for heterogeneity and show that the log-linear model has greater scope. Models assuming homogeneity provide much narrower intervals for N but are usually highly overly optimistic, the actual coverage probability being much lower than the nominal level.

8.

Semiparametric frailty models for clustered failure time data

Yu Z Lin X Tu W 《Biometrics》2012,68(2):429-436

We consider frailty models with additive semiparametric covariate effects for clustered failure time data. We propose a doubly penalized partial likelihood (DPPL) procedure to estimate the nonparametric functions using smoothing splines. We show that the DPPL estimators can be obtained by fitting an augmented working frailty model with parametric covariate effects, with the nonparametric functions estimated as linear combinations of fixed and random effects and the smoothing parameters estimated as extra variance components. This approach allows us to conveniently estimate all model components within a unified frailty model framework. We evaluate the finite sample performance of the proposed method via a simulation study, and apply the method to analyze data from a study of sexually transmitted infections (STI).

9.

Prediction of plant species distribution in lowland river valleys in Belgium: modelling species response to site conditions (total citations: 7; self-citations: 0; citations by others: 7)

Ana M.F. Bio Piet De Becker Els De Bie Willy Huybrechts Martin Wassen 《Biodiversity and Conservation》2002,11(12):2189-2216

In ecological modelling, limitations in data and their applicability for predictive modelling are more rule than exception. Often modelling has to be performed on sub-optimal data, as explicit and controlled collection of (more) appropriate data would not be feasible. An example of predictive ecological modelling is given with application of generalized additive and generalized linear models fitted to presence–absence records of plant species and site condition data from four nutrient-poor Flemish lowland valleys. Standard regression procedures are used for modelling, although explanatory and response data do not meet all the assumptions implicit in these procedures. Data were non-randomly collected and are spatially autocorrelated; model residuals retain part of that correlation. The scale of most site-condition records does not match the scale of the response variable (species distribution). Hence, interpolated and up-scaled explanatory variables are used. Data are aggregated from distinct phytogeographical regions to allow for generalized models, applicable to a wider population of river valleys in the same region. Nevertheless, ecologically sound models are obtained, which predict well the distribution of most plant species for the Flemish river valleys considered.

10.

Autoregressive spatial smoothing and temporal spline smoothing for mapping rates (total citations: 1; self-citations: 0; citations by others: 1)

MacNab YC Dean CB 《Biometrics》2001,57(3):949-956

This article proposes generalized additive mixed models for the analysis of geographic and temporal variability of mortality rates. This class of models accommodates random spatial effects and fixed and random temporal components. Spatiotemporal models that use autoregressive local smoothing across the spatial dimension and B-spline smoothing over the temporal dimension are developed. The objective is the identification of temporal trends and the production of a series of smoothed maps from which spatial patterns of mortality risks can be monitored over time. Regions with consistently high rate estimates may be followed for further investigation. The methodology is illustrated by analysis of British Columbia infant mortality data.

11.

On Testing an Unspecified Function Through a Linear Mixed Effects Model with Multiple Variance Components

Yuanjia Wang Huaihou Chen 《Biometrics》2012,68(4):1113-1125

Summary: We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we imbed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10⁸ simulations) and asymptotic approximation may be unreliable and conservative.

12.

Identification of sparse neural functional connectivity using penalized likelihood estimation and basis functions

Dong Song Haonan Wang Catherine Y. Tu Vasilis Z. Marmarelis Robert E. Hampson Sam A. Deadwyler Theodore W. Berger 《Journal of computational neuroscience》2013,35(3):335-357

One key problem in computational neuroscience and neural engineering is the identification and modeling of functional connectivity in the brain using spike train data. To reduce model complexity, alleviate overfitting, and thus facilitate model interpretation, sparse representation and estimation of functional connectivity is needed. Sparsities include global sparsity, which captures the sparse connectivities between neurons, and local sparsity, which reflects the active temporal ranges of the input-output dynamical interactions. In this paper, we formulate a generalized functional additive model (GFAM) and develop the associated penalized likelihood estimation methods for such a modeling problem. A GFAM consists of a set of basis functions convolving the input signals, and a link function generating the firing probability of the output neuron from the summation of the convolutions weighted by the sought model coefficients. Model sparsities are achieved by using various penalized likelihood estimations and basis functions. Specifically, we introduce two variations of the GFAM using a global basis (e.g., Laguerre basis) and group LASSO estimation, and a local basis (e.g., B-spline basis) and group bridge estimation, respectively. We further develop an optimization method based on quadratic approximation of the likelihood function for the estimation of these models. Simulation and experimental results show that both group-LASSO-Laguerre and group-bridge-B-spline can capture faithfully the global sparsities, while the latter can replicate accurately and simultaneously both global and local sparsities. The sparse models outperform the full models estimated with the standard maximum likelihood method in out-of-sample predictions.

13.

Penalized spline smoothing using Kaplan–Meier weights with censored data

Jesus Orbe Jorge Virto 《Biometrical journal. Biometrische Zeitschrift》2018,60(5):947-961

In this paper, we consider the problem of nonparametric curve fitting in the specific context of censored data. We propose an extension of the penalized splines approach using Kaplan–Meier weights to take into account the effect of censorship and generalized cross-validation techniques to choose the smoothing parameter adapted to the case of censored samples. Using various simulation studies, we analyze the effectiveness of the censored penalized splines method proposed and show that the performance is quite satisfactory. We have extended this proposal to a generalized additive models (GAM) framework introducing a correction of the censorship effect, thus enabling more complex models to be estimated immediately. A real dataset from Stanford Heart Transplant data is also used to illustrate the methodology proposed, which is shown to be a good alternative when the probability distribution for the response variable and the functional form are not known in censored regression models.
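Kaplan–Meier weights are commonly defined as the jumps of the Kaplan–Meier estimator at the ordered observations: w_(i) = δ_(i)/(n−i+1) · ∏_{j<i} ((n−j)/(n−j+1))^{δ_(j)}, where δ is the event indicator. The sketch below assumes that this is the definition meant in the abstract. The function name `kaplan_meier_weights` and the tie-breaking rule (uncensored observations first) are choices made for this illustration only.

```python
import numpy as np

def kaplan_meier_weights(times, observed):
    """Jumps of the Kaplan-Meier estimator, returned in the input order.

    times    : observed (possibly censored) responses
    observed : 1 if the response is uncensored, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n = len(times)
    # Sort by time; at ties place uncensored observations first (assumed convention).
    order = np.lexsort((-observed, times))
    d = observed[order]
    i = np.arange(1, n + 1)
    # Factor ((n - j) / (n - j + 1)) ** d_j for each ordered observation j.
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product over j < i: cumulative product shifted by one position.
    prod_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - i + 1.0) * prod_before
    weights = np.empty(n)
    weights[order] = w_sorted      # map back to the original ordering
    return weights

# Four observations, the one at time 5 is censored.
w = kaplan_meier_weights([2, 5, 3, 8], [1, 0, 1, 1])
# Without censoring every observation gets weight 1/n.
w_full = kaplan_meier_weights([4, 1, 3], [1, 1, 1])
```

Censored observations receive weight zero and their mass is passed on to later uncensored ones. In a weighted penalized least-squares criterion these weights would multiply the squared residuals, which is the general idea the abstract describes.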

14.

Why the beta-function cannot be used to estimate skewness of species responses

Jari Oksanen 《Journal of Vegetation Science》1997,8(1):147-152

Abstract. The beta-function (β-function) has been suggested for testing the significance of the skewness of species responses along a gradient. However, the location of the optimum and skewness are correlated so that these parameters cannot be estimated independently. The only way for an independent estimation is to let the endpoints of the response curve vary. In that case they would no longer define the range of species occurrence. However, non-linear estimation of endpoints often leads to overwhelming problems in model fitting. Therefore, the beta-function is not suitable to test the shape of species response curves. Hierarchic models proposed by Huisman et al. (1993) seem to be superior to generalized additive models or third-degree polynomials and seem to be the best alternative to study the skewness of responses.

15.

Two-component mixture cure rate model with spline estimated nonparametric components

Wang L Du P Liang H 《Biometrics》2012,68(3):726-735

Summary: In some survival analysis of medical studies, there are often long-term survivors who can be considered as permanently cured. The goals in these studies are to estimate the noncured probability of the whole population and the hazard rate of the susceptible subpopulation. When covariates are present as often happens in practice, to understand covariate effects on the noncured probability and hazard rate is of equal importance. The existing methods are limited to parametric and semiparametric models. We propose a two-component mixture cure rate model with nonparametric forms for both the cure probability and the hazard rate function. Identifiability of the model is guaranteed by an additive assumption that allows no time-covariate interactions in the logarithm of hazard rate. Estimation is carried out by an expectation-maximization algorithm on maximizing a penalized likelihood. For inferential purpose, we apply the Louis formula to obtain point-wise confidence intervals for noncured probability and hazard rate. Asymptotic convergence rates of our function estimates are established. We then evaluate the proposed method by extensive simulations. We analyze the survival data from a melanoma study and find interesting patterns for this study.

16.

Penalized estimating equations (total citations: 1; self-citations: 0; citations by others: 1)

Fu WJ 《Biometrics》2003,59(1):126-132

Penalty models, such as the ridge estimator, the Stein estimator, the bridge estimator, and the Lasso, have been proposed to deal with collinearity in regressions. The Lasso, for instance, has been applied to linear models, logistic regressions, Cox proportional hazard models, and neural networks. This article considers the bridge penalty model with penalty $\sum_j |\beta_j|^{\gamma}$ for estimating equations in general and applies this penalty model to the generalized estimating equations (GEE) in longitudinal studies. The lack of joint likelihood in the GEE is overcome by the penalized estimating equations, in which no joint likelihood is required. The asymptotic results for the penalty estimator are provided. It is demonstrated, with a simulation and an application, that the penalized GEE potentially improves the performance of the GEE estimator, and enjoys the same properties as linear penalty models.
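The bridge penalty is usually written as λ Σ_j |β_j|^γ, which gives the Lasso penalty at γ = 1 and the ridge penalty at γ = 2. A minimal sketch of that family follows. The tuning constant `lam` and the function name `bridge_penalty` are assumptions for illustration and do not come from the abstract.

```python
import numpy as np

def bridge_penalty(beta, lam=1.0, gamma=1.0):
    """Bridge penalty lam * sum_j |beta_j| ** gamma.

    gamma = 1 is the Lasso (L1) penalty, gamma = 2 the ridge (squared L2) penalty.
    """
    beta = np.asarray(beta, dtype=float)
    return lam * np.sum(np.abs(beta) ** gamma)

beta = [1.0, -2.0, 0.5]
lasso_value = bridge_penalty(beta, gamma=1.0)   # sum of absolute values
ridge_value = bridge_penalty(beta, gamma=2.0)   # sum of squares
```

In a penalized estimating-equation setting the derivative of this penalty with respect to β would be added to the estimating function. That step is omitted here because its exact form in the article is not given in the abstract.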

17.

Detection of significant disease risks using a spatial conditional autoregressive model

Escaramís G Carrasco JL Ascaso C 《Biometrics》2008,64(4):1043-1053

SUMMARY: The conditional autoregressive (CAR) model is widely used to describe the geographical distribution of a specific disease risk in lattice mapping. Successful developments based on frequentist and Bayesian procedures have been extensively applied to obtain two-stage disease risk predictions at the subregional level. Bayesian procedures are preferred for making inferences, as the posterior standard errors (SE) of the two-stage prediction account for the variability in the variance component estimates; however, some recent work based on frequentist procedures and the use of bootstrap adjustments for the SE has been undertaken. In this article we investigate the suitability of an analytical adjustment for disease risk inference that provides accurate interval predictions by using the penalized quasilikelihood (PQL) technique to obtain model parameter estimates. The method is a first-order approximation of the naive SE based on a Taylor expansion and is interpreted as a conditional measure of variability providing conditional calibrated prediction intervals, given the data. We conduct a simulation study to demonstrate how the method can be used to estimate the specific subregion risk by interval. We evaluate the proposed methodology by analyzing the commonly used example data set of lip cancer incidence in the 56 counties of Scotland for the period 1975-1980. This evaluation reveals a close similarity between the solutions provided by the method proposed here and those of its fully Bayesian counterpart.

18.

Generalized additive models for cancer mapping with incomplete covariates

French JL Wand MP 《Biostatistics (Oxford, England)》2004,5(2):177-191

Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.

19.

Differential gene expression detection and sample classification using penalized linear regression models

Wu B 《Bioinformatics (Oxford, England)》2006,22(4):472-476

Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the L1 penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1 penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
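The link between L1 penalization and shrinkage can be seen in the simplest case. For a single coefficient with an orthonormal design, minimizing 0.5·(z − b)² + λ|b| gives the soft-thresholding rule b = sign(z)·max(|z| − λ, 0). The sketch below shows only that standard rule. It is not the penalized t/F-statistic from the article, and the name `soft_threshold` and the example values are made up for illustration.

```python
import numpy as np

def soft_threshold(z, lam):
    """Minimizer of 0.5 * (z - b)**2 + lam * |b|, applied elementwise."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

# Hypothetical standardized effects for three genes, threshold lam = 1.
shrunk = soft_threshold([3.0, -1.0, 0.5], lam=1.0)
```

Effects smaller in magnitude than the threshold are set exactly to zero and larger ones are pulled toward zero by λ. This is the same kind of shrinkage the nearest shrunken centroid method applies to class centroids.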

20.

Variable selection for nonparametric additive Cox model with interval-censored data

Tian Tian Jianguo Sun 《Biometrical journal. Biometrische Zeitschrift》2023,65(1):2100310

The standard Cox model is perhaps the most commonly used model for regression analysis of failure time data but it has some limitations such as the assumption on linear covariate effects. To relax this, the nonparametric additive Cox model, which allows for nonlinear covariate effects, is often employed, and this paper will discuss variable selection and structure estimation for this general model. For the problem, we propose a penalized sieve maximum likelihood approach with the use of Bernstein polynomials approximation and group penalization. To implement the proposed method, an efficient group coordinate descent algorithm is developed and can be easily carried out for both low- and high-dimensional scenarios. Furthermore, a simulation study is performed to assess the performance of the presented approach and suggests that it works well in practice. The proposed method is applied to an Alzheimer's disease study for identifying important and relevant genetic factors.