Similar Literature
20 similar records found
1.
In functional data analysis for longitudinal data, the observation process is typically assumed to be noninformative, which is often violated in real applications. Thus, methods that fail to account for the dependence between observation times and longitudinal outcomes may result in biased estimation. For longitudinal data with informative observation times, we find that under a general class of shared random effect models, a commonly used functional data method may lead to inconsistent model estimation while another functional data method results in consistent and even rate-optimal estimation. Indeed, we show that the mean function can be estimated appropriately via penalized splines and that the covariance function can be estimated appropriately via penalized tensor-product splines, both with specific choices of parameters. For the proposed method, theoretical results are provided, and simulation studies and a real data analysis are conducted to demonstrate its performance.
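The penalized-spline mean estimator referred to above can be sketched in a few lines. The following is an illustrative implementation only (truncated power basis, fixed smoothing parameter, hypothetical function name), not the authors' code or their specific choice of basis and tuning:

```python
import numpy as np

def penalized_spline_fit(x, y, n_knots=10, lam=0.1, degree=2):
    """Penalized-spline fit of a mean function using a truncated power
    basis; the ridge penalty acts only on the knot coefficients."""
    # Interior knots at equally spaced quantiles of x
    knots = np.quantile(x, np.linspace(0, 1, n_knots + 2)[1:-1])
    poly = np.vander(x, degree + 1, increasing=True)          # 1, x, x^2, ...
    trunc = np.maximum(x[:, None] - knots[None, :], 0.0) ** degree
    B = np.hstack([poly, trunc])
    # Penalize only the truncated-basis coefficients, not the polynomial part
    D = np.diag([0.0] * (degree + 1) + [1.0] * n_knots)
    beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ beta

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 200)
fit = penalized_spline_fit(x, y)
```

The same construction with a tensor product of two such bases and a two-dimensional penalty gives the covariance-surface analogue mentioned in the abstract.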

2.
Kneib T, Fahrmeir L. Biometrics 2006, 62(1): 109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

3.
Often, the functional form of covariate effects in an additive model varies across groups defined by the levels of a categorical variable. This structure represents a factor-by-curve interaction. This article presents penalized spline models that incorporate factor-by-curve interactions into additive models. A mixed model formulation for penalized splines allows for straightforward model fitting and smoothing parameter selection. We illustrate the proposed model by applying it to ragweed pollen data in which seasonal trends vary by year.

4.
Yuanjia Wang, Huaihou Chen. Biometrics 2012, 68(4): 1113-1125
We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we embed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric or varying-coefficient function with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over the bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than that of the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10^8 simulations) and asymptotic approximation may be unreliable and conservative.
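The spectral shortcut can be illustrated generically: once the relevant eigenvalues are in hand, the null distribution of a ratio of independent weighted chi-square sums can be simulated at negligible cost compared with bootstrap refitting. This is a sketch of the generic idea only, not the paper's exact test statistic or eigenvalue derivation:

```python
import numpy as np

def simulate_ratio_null(num_eigs, den_eigs, n_sim=100_000, seed=0):
    """Simulate the null distribution of
        T = sum_i a_i z_i^2 / sum_j b_j w_j^2,  z_i, w_j iid N(0, 1),
    i.e. a ratio of independent weighted chi-square(1) sums. Given the
    eigenvalues, each draw costs O(#eigenvalues) rather than a model refit."""
    rng = np.random.default_rng(seed)
    num = rng.chisquare(1, size=(n_sim, len(num_eigs))) @ np.asarray(num_eigs)
    den = rng.chisquare(1, size=(n_sim, len(den_eigs))) @ np.asarray(den_eigs)
    return num / den

# Sanity check with unit eigenvalues: chi2_2 / chi2_10 = (2/10) * F(2, 10),
# whose exact mean is 2 * E[1/chi2_10] = 2/8 = 0.25.
null_draws = simulate_ratio_null(np.ones(2), np.ones(10))
crit = np.quantile(null_draws, 0.95)  # a 5% critical value from the null draws
```

The observed statistic is then compared against `crit`, or its p-value read off as the fraction of null draws exceeding it.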

5.
We investigate the relative performances of a class of goodness of fit procedures for randomly censored data. For purposes of planning experiments, we quantify the loss of information induced by censorship. We evaluate efficiencies against particular alternatives of interest in survival studies, as the amount of censorship increases. We caution against attributing various power and efficiency properties of the goodness of fit criteria that are obtained under no censorship to situations where the censorship is far from negligible.  相似文献   

6.
Wang L, Dunson DB. Biometrics 2011, 67(3): 1111-1118
Current status data are a type of interval-censored event time data in which all the individuals are either left or right censored. For example, our motivation is drawn from a cross-sectional study, which measured whether or not fibroid onset had occurred by the age of an ultrasound exam for each woman. We propose a semiparametric Bayesian proportional odds model in which the baseline event time distribution is estimated nonparametrically by using adaptive monotone splines in a logistic regression model and the potential risk factors are included in the parametric part of the mean structure. The proposed approach has the advantage of being straightforward to implement using a simple and efficient Gibbs sampler, whereas alternative semiparametric Bayesian event time models encounter problems for current status data. The model is generalized to allow systematic underreporting in a subset of the data, and the methods are applied to an epidemiologic study of uterine fibroids.
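The paper estimates its monotone baseline with Bayesian adaptive monotone splines and a Gibbs sampler. As a minimal frequentist analogue of monotone estimation (a sketch only, not the authors' method), the pool-adjacent-violators algorithm gives the least-squares nondecreasing fit to a sequence:

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: least-squares nondecreasing fit.
    Maintains a stack of (block_mean, block_size); whenever a new value
    violates monotonicity, adjacent blocks are merged into their mean."""
    blocks = []
    for v in np.asarray(y, dtype=float):
        mean, size = float(v), 1.0
        while blocks and blocks[-1][0] > mean:
            m2, s2 = blocks.pop()
            mean = (m2 * s2 + mean * size) / (s2 + size)
            size += s2
        blocks.append((mean, size))
    return np.concatenate([np.full(int(s), m) for m, s in blocks])

# Monotone fit to a noisy sigmoid observed at sorted design points
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3.0, 3.0, 150))
truth = 1.0 / (1.0 + np.exp(-2.0 * x))
fit = pava(truth + rng.normal(0.0, 0.1, 150))
```

The fitted sequence is piecewise constant and nondecreasing by construction, which is the shape constraint a baseline distribution function must satisfy.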

7.
Recent interest in cancer research focuses on predicting patients' survival by investigating gene expression profiles based on microarray analysis. We propose a doubly penalized Buckley-James method for the semiparametric accelerated failure time model to relate high-dimensional genomic data to censored survival outcomes, which uses the elastic-net penalty that is a mixture of L1- and L2-norm penalties. Similar to the elastic-net method for a linear regression model with uncensored data, the proposed method performs automatic gene selection and parameter estimation, where highly correlated genes are able to be selected (or removed) together. The two-dimensional tuning parameter is determined by generalized cross-validation. The proposed method is evaluated by simulations and applied to the Michigan squamous cell lung carcinoma study.
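The elastic-net solve at the core of such a method can be written as plain coordinate descent. This sketch covers only the uncensored linear-regression analogue (in the doubly penalized Buckley-James method this solve would sit inside an outer loop that imputes censored outcomes), and the function name and tuning values are illustrative:

```python
import numpy as np

def elastic_net_cd(X, y, lam=0.1, alpha=0.9, n_iter=200):
    """Coordinate descent for the elastic net on standardized data:
    minimize (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]     # partial residual
            rho = X[:, j] @ r_j / n
            # L1 part: soft threshold; L2 part: extra shrinkage in denominator
            beta[j] = (np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)
                       / (col_ss[j] + lam * (1.0 - alpha)))
    return beta

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.5, n)
beta = elastic_net_cd(X, y)
```

The L1 component zeroes out noise coefficients while the L2 component lets groups of correlated predictors enter together, which is the behavior the abstract highlights.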

8.
In this work we study the effect of several covariates on a censored response variable with unknown probability distribution. A semiparametric model is proposed to consider situations where the functional form of the effect of one or more covariates is unknown, as is the case in the application presented in this work. We provide its estimation procedure and, in addition, a bootstrap technique to make inference on the parameters. A simulation study has been carried out to show the good performance of the proposed estimation process and to analyse the effect of censoring. Finally, we present the results of applying the methodology to data on patients diagnosed with AIDS.

9.
We propose a Bayesian chi-squared model diagnostic for the analysis of data subject to censoring. The test statistic has the form of Pearson's chi-squared test statistic and is easy to calculate from standard output of Markov chain Monte Carlo algorithms. The key innovation of this diagnostic is that it is based only on observed failure times. Because it does not rely on the imputation of failure times for observations that have been censored, we show that under heavy censoring it can have higher power for detecting model departures than a comparable test based on the complete data. In a simulation study, we show that tests based on this diagnostic exhibit comparable power and better nominal Type I error rates than a commonly used alternative test proposed by Akritas (1988, Journal of the American Statistical Association 83, 222-230). An important advantage of the proposed diagnostic is that it can be applied to a broad class of censored data models, including generalized linear models and other models with nonidentically distributed and nonadditive error structures. We illustrate the proposed model diagnostic by testing the adequacy of two parametric survival models for Space Shuttle main engine failures.

10.
Neural networks are considered by many to be very promising tools for classification and prediction. However, the flexibility of neural network models often results in over-fitting. Shrinking the parameters via a penalized likelihood is often used to overcome such over-fitting. In this paper we extend the approach proposed by Faraggi and Simon (1995a) to modeling censored survival data using the input-output relationship associated with a single hidden layer feed-forward neural network. Instead of estimating the neural network parameters by maximum likelihood, we place normal prior distributions on the parameters and make inferences based on the derived posterior distributions of the parameters. This Bayesian formulation shrinks the parameters of the neural network model and reduces over-fitting relative to maximum likelihood estimation. We illustrate the proposed method on a simulated and a real example.

12.
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p ≫ n), microarray data analysis poses big challenges for statistical analysis. An obvious problem in the 'large p, small n' setting is over-fitting: just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and have proved useful in empirical studies. Recently, Wu proposed penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data, formally derive the ad hoc shrunken centroid used by Tibshirani et al. from L1-penalized regression models, and show that penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
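A stripped-down sketch of the 'nearest shrunken centroid' idea shows how soft-thresholding zeroes out non-informative genes. This simplification omits the class-size factor and variance offset of the original method, and the function names are illustrative:

```python
import numpy as np

def shrunken_centroids(X, y, delta):
    """Shrink each class centroid toward the overall mean by
    soft-thresholding the standardized difference (simplified sketch)."""
    overall = X.mean(axis=0)
    scale = X.std(axis=0) + 1e-8
    cents = {}
    for k in np.unique(y):
        d = (X[y == k].mean(axis=0) - overall) / scale
        d = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft threshold
        cents[k] = overall + scale * d
    return cents, scale

def classify(X, cents, scale):
    """Assign each sample to the nearest shrunken centroid."""
    labels = list(cents)
    dist = np.stack([(((X - cents[k]) / scale) ** 2).sum(axis=1)
                     for k in labels])
    return np.array(labels)[dist.argmin(axis=0)]

# Two classes, 50 "genes", only the first 5 carry signal
rng = np.random.default_rng(3)
n_per, p = 40, 50
mu = np.zeros(p); mu[:5] = 1.0
X = np.vstack([rng.normal(mu, 1.0, (n_per, p)),
               rng.normal(-mu, 1.0, (n_per, p))])
y = np.array([0] * n_per + [1] * n_per)
cents, scale = shrunken_centroids(X, y, delta=0.3)
acc = float((classify(X, cents, scale) == y).mean())
```

Genes whose shrunken centroids coincide across classes drop out of the classifier, so the threshold `delta` performs gene selection and classification at once.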

13.
Krafty RT, Gimotty PA, Holtz D, Coukos G, Guo W. Biometrics 2008, 64(4): 1023-1031
In this article we develop a nonparametric estimation procedure for the varying coefficient model when the within-subject covariance is unknown. Extending the idea of iteratively reweighted least squares to the functional setting, we iterate between estimating the coefficients conditional on the covariance and estimating the functional covariance conditional on the coefficients. Smoothing splines for correlated errors are used to estimate the functional coefficients, with smoothing parameters selected via generalized maximum likelihood. The covariance is nonparametrically estimated using a penalized estimator with smoothing parameters chosen via a Kullback-Leibler criterion. Empirical properties of the proposed method are demonstrated in simulations, and the method is applied to data collected from an ovarian tumor study in mice to analyze the effects of different chemotherapy treatments on the volumes of two classes of tumors.

14.
We propose a semiparametric Bayesian model, based on penalized splines, for the recovery of the time-invariant topology of a causal interaction network from longitudinal data. Our motivation is inference of gene regulatory networks from low-resolution microarray time series, where existence of nonlinear interactions is well known. Parenthood relations are mapped by augmenting the model with kinship indicators and providing these with either an overall or gene-wise hierarchical structure. Appropriate specification of the prior is crucial to control the flexibility of the splines, especially under circumstances of scarce data; thus, we provide an informative, proper prior. Substantive improvement in network inference over a linear model is demonstrated using synthetic data drawn from ordinary differential equation models and gene expression from an experimental data set of the Arabidopsis thaliana circadian rhythm.

15.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. The theory of statistical models is well-established if the set of independent variables to consider is fixed and small. Hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and often we are confronted with a number of candidate variables in the range of 10-30. This number is often too large to be considered in a statistical model. We provide an overview of various available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise the stability of a final model, the unbiasedness of regression coefficients, and the validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on the application of variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities based on resampling the entire variable selection process to be routinely reported by software packages offering automated variable selection algorithms.
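Resampling the entire selection process can be illustrated with a toy version: bootstrap a simple AIC-based forward selection and report per-variable inclusion frequencies. This is an illustrative sketch under stated assumptions (forward selection, AIC, linear model), not the authors' recommended procedure or software:

```python
import numpy as np

def forward_aic(X, y):
    """Forward variable selection minimizing AIC in a linear model."""
    n, p = X.shape
    def aic(cols):
        Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(((y - Xs @ beta) ** 2).sum())
        return n * np.log(rss / n) + 2 * (len(cols) + 1)
    selected, current = [], aic([])
    while True:
        cand = [(aic(selected + [j]), j) for j in range(p) if j not in selected]
        if not cand or min(cand)[0] >= current:
            return sorted(selected)
        current, best_j = min(cand)
        selected.append(best_j)

def inclusion_frequencies(X, y, n_boot=100, seed=0):
    """Rerun the whole selection on bootstrap resamples and count
    how often each variable ends up in the final model."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        counts[forward_aic(X[idx], y[idx])] += 1
    return counts / n_boot

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 8))
y = 1.5 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 1.0, 150)
freq = inclusion_frequencies(X, y)
```

Stable signal variables show inclusion frequencies near 1, while noise variables that entered one particular fitted model by chance reveal themselves through low frequencies.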

16.
Tutz G, Binder H. Biometrics 2006, 62(4): 961-971
The use of generalized additive models in statistical data analysis suffers from the restriction to few explanatory variables and the problems of selection of smoothing parameters. Generalized additive model boosting circumvents these problems by means of stagewise fitting of weak learners. A fitting procedure is derived which works for all simple exponential family distributions, including binomial, Poisson, and normal response variables. The procedure combines the selection of variables and the determination of the appropriate amount of smoothing. Penalized regression splines and the newly introduced penalized stumps are considered as weak learners. Estimates of standard deviations and stopping criteria, which are notorious problems in iterative procedures, are based on an approximate hat matrix. The method is shown to be a strong competitor to common procedures for the fitting of generalized additive models. In particular, in high-dimensional settings with many nuisance predictor variables it performs very well.
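Stagewise fitting with stumps can be sketched as componentwise L2-boosting: at each step, the best single-split stump over all variables is fitted to the current residuals and a shrunken copy of it is added to the fit. This is an illustrative sketch for the squared-error case only; the paper's procedure additionally handles general exponential-family responses, penalization, and hat-matrix-based stopping:

```python
import numpy as np

def boost_stumps(X, y, n_iter=300, nu=0.1):
    """Componentwise L2-boosting with stumps as weak learners."""
    n, p = X.shape
    fit = np.full(n, y.mean())
    stumps = []
    for _ in range(n_iter):
        r = y - fit
        best = None  # (score, var, split, left_mean, right_mean)
        for j in range(p):
            order = np.argsort(X[:, j])
            xs, rs = X[order, j], r[order]
            csum, tsum = np.cumsum(rs), float(r.sum())
            for i in range(1, n):
                if xs[i] == xs[i - 1]:
                    continue
                lm, rm = csum[i - 1] / i, (tsum - csum[i - 1]) / (n - i)
                # minimizing SSE == maximizing explained sum of squares
                score = -(i * lm * lm + (n - i) * rm * rm)
                if best is None or score < best[0]:
                    best = (score, j, (xs[i] + xs[i - 1]) / 2.0, lm, rm)
        _, j, split, lm, rm = best
        fit += nu * np.where(X[:, j] <= split, lm, rm)  # shrunken update
        stumps.append((j, split, nu * lm, nu * rm))
    return fit, stumps

# Additive step-function signal in the first two of three variables
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 1.0, (200, 3))
y = 2.0 * (X[:, 0] > 0.5) + 1.0 * (X[:, 1] > 0.3) + rng.normal(0.0, 0.1, 200)
fit, stumps = boost_stumps(X, y)
mse = float(np.mean((y - fit) ** 2))
```

Because each iteration picks a single variable, the history of selected variables doubles as a variable-selection path, which is the combination of selection and smoothing the abstract describes.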

17.
To analyze responses of solid tumors to treatment with antitumor therapy, we applied nonparametric mixed-effects models to investigate tumor volumes measured over a fixed period of time. The population and individual response functions were approximated by penalized splines. Linear mixed-effects modeling was applied in the implementation of the estimation. We applied the approach to the analysis of a real xenograft study of a new antitumor agent, temozolomide, combined with irinotecan. The model fitted the data very well. We conducted a sensitivity analysis to determine the effect of informative dropout. We also propose an intuitive approach to comparing the antitumor effects of two different treatments. Biological interpretations and clinical implications are discussed.

18.
Yu Z, Lin X, Tu W. Biometrics 2012, 68(2): 429-436
We consider frailty models with additive semiparametric covariate effects for clustered failure time data. We propose a doubly penalized partial likelihood (DPPL) procedure to estimate the nonparametric functions using smoothing splines. We show that the DPPL estimators can be obtained from fitting an augmented working frailty model with parametric covariate effects, with the nonparametric functions estimated as linear combinations of fixed and random effects and the smoothing parameters estimated as extra variance components. This approach allows us to conveniently estimate all model components within a unified frailty model framework. We evaluate the finite sample performance of the proposed method via a simulation study, and apply the method to analyze data from a study of sexually transmitted infections (STI).

19.
Chen Q, Ibrahim JG. Biometrics 2006, 62(1): 177-184
We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.

20.
Factor analysis models are widely used in health research to summarize hard-to-measure predictor or outcome variable constructs. For example, in the ELEMENT study, factor models are used to summarize lead exposure biomarkers which are thought to indirectly measure prenatal exposure to lead. Classic latent factor models are fitted assuming that factor loadings are constant across all covariate levels (e.g., maternal age in ELEMENT); that is, measurement invariance (MI) is assumed. When the MI is not met, measurement bias is introduced. Traditionally, MI is examined by defining subgroups of the data based on covariates, fitting multi-group factor analysis, and testing differences in factor loadings across covariate groups. In this paper, we develop novel tests of measurement invariance by modeling the factor loadings as varying coefficients, i.e., letting the factor loading vary across continuous covariate values instead of groups. These varying coefficients are estimated using penalized splines, where spline coefficients are penalized by treating them as random coefficients. The test of MI is then carried out by conducting a likelihood ratio test for the null hypothesis that the variance of the random spline coefficients equals zero. We use a Monte Carlo EM algorithm for estimation, and obtain the likelihood using Monte Carlo integration. Using simulations, we compare the Type I error and power of our testing approach and the multi-group testing method. We apply the proposed methods to summarize data on prenatal biomarkers of lead exposure from the ELEMENT study and find violations of MI due to maternal age.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号