Similar Literature (20 results)
1.
A Bayesian model for sparse functional data
Thompson WK, Rosen O. Biometrics 2008, 64(1): 54-63.
We propose a method for analyzing data which consist of curves on multiple individuals, i.e., longitudinal or functional data. We use a Bayesian model where curves are expressed as linear combinations of B-splines with random coefficients. The curves are estimated as posterior means obtained via Markov chain Monte Carlo (MCMC) methods, which automatically select the local level of smoothing. The method is applicable to situations where curves are sampled sparsely and/or at irregular time points. We construct posterior credible intervals for the mean curve and for the individual curves. This methodology provides unified, efficient, and flexible means for smoothing functional data.
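As a rough orientation (not the paper's MCMC): with a Gaussian prior N(0, τ²I) on the B-spline coefficients and Gaussian noise, the posterior mean for a fixed smoothing level λ = σ²/τ² reduces to a ridge-type estimator. A minimal numpy/scipy sketch on one sparsely observed curve, with λ fixed by hand where the paper's sampler would adapt the local smoothing:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)

# Sparse, irregular observations of one noisy curve.
t_obs = np.sort(rng.uniform(0.0, 1.0, size=30))
y = np.sin(2 * np.pi * t_obs) + rng.normal(scale=0.2, size=t_obs.size)

# Cubic B-spline basis on [0, 1] with equally spaced interior knots.
k = 3
interior = np.linspace(0.0, 1.0, 12)[1:-1]
knots = np.concatenate((np.repeat(0.0, k + 1), interior, np.repeat(1.0, k + 1)))
B = BSpline.design_matrix(t_obs, knots, k).toarray()

# Gaussian prior on coefficients + Gaussian noise => posterior mean is a
# ridge estimator with lam = sigma^2 / tau^2 (fixed here; the paper's MCMC
# effectively averages over smoothing levels).
lam = 1e-2
coef = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)

grid = np.linspace(0.0, 1.0, 200)
fit = BSpline.design_matrix(grid, knots, k).toarray() @ coef
```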

2.
Chen H, Wang Y. Biometrics 2011, 67(3): 861-870.
In this article, we propose penalized spline (P-spline)-based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects, and residual measurement error processes. Using P-splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves, the associated covariance function that represents between-subject variation, and the variance function of the residual measurement errors that represents within-subject variation. The proposed methods offer flexible estimation of both the population- and subject-level curves. In addition, decomposing the variability of the outcomes into between- and within-subject sources is useful in identifying the dominant variance component and therefore in optimally modeling the covariance function. We use a likelihood-based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate the performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data, where we identified clearly distinct patterns in the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of antihypertensive treatment using data from the Framingham Heart Study.
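The ingredient that makes a spline a P-spline is a difference penalty on adjacent coefficients of a deliberately rich B-spline basis. A sketch of that single ingredient for one mean curve, with the smoothing parameter fixed by hand (the paper selects it by a likelihood-based method) and none of the mixed-effects structure:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.cos(3 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

k = 3
interior = np.linspace(0.0, 1.0, 22)[1:-1]   # rich basis; the penalty smooths
knots = np.concatenate((np.repeat(0.0, k + 1), interior, np.repeat(1.0, k + 1)))
B = BSpline.design_matrix(t, knots, k).toarray()

# P-spline: second-order difference penalty on adjacent coefficients.
m = B.shape[1]
D = np.diff(np.eye(m), n=2, axis=0)          # (m-2) x m difference matrix
lam = 10.0                                    # fixed here, not likelihood-selected
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ coef
```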

3.
The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure, in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and for selecting the correlation structure, respectively. We also demonstrate the serious drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of the proposed method is further illustrated by two real applications involving missing longitudinal outcome data.

4.
We provide statistical analysis methods for samples of curves in two or more dimensions, where the image, but not the parameterization of the curves, is of interest and suitable alignment/registration is thus necessary. Examples are handwritten letters, movement paths, or object outlines. We focus in particular on the computation of (smooth) means and distances, allowing, for example, classification or clustering. Existing parameterization invariant analysis methods based on the elastic distance of the curves modulo parameterization, using the square-root-velocity framework, have limitations in common realistic settings where curves are irregularly and potentially sparsely observed. We propose using spline curves to model smooth or polygonal (Fréchet) means of open or closed curves with respect to the elastic distance and show identifiability of the spline model modulo parameterization. We further provide methods and algorithms to approximate the elastic distance for irregularly or sparsely observed curves, via interpreting them as polygons. We illustrate the usefulness of our methods on two datasets. The first application classifies irregularly sampled spirals drawn by Parkinson's patients and healthy controls, based on the elastic distance to a mean spiral curve computed using our approach. The second application clusters sparsely sampled GPS tracks based on the elastic distance and computes smooth cluster means to find new paths on the Tempelhof field in Berlin. All methods are implemented in the R-package "elasdics" and evaluated in simulations.
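A sketch of the square-root-velocity framework the paper builds on, under the simplifying assumption that both curves are densely observed on a common parameterization. The full elastic distance additionally minimizes over reparameterizations (the hard part the paper addresses for sparse polygons), which is omitted here:

```python
import numpy as np

def srvf(beta, t):
    # Square-root-velocity transform: q = beta' / sqrt(|beta'|).
    vel = np.gradient(beta, t, axis=0)                 # (n, d) velocities
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    return vel / np.sqrt(np.maximum(speed, 1e-12))

def l2_srvf_distance(beta1, beta2, t):
    # L2 distance between SRVFs at a fixed common parameterization; the
    # elastic distance would also minimize this over warpings of t
    # (e.g., by dynamic programming).
    q1, q2 = srvf(beta1, t), srvf(beta2, t)
    integrand = np.sum((q1 - q2) ** 2, axis=1)
    return np.sqrt(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 200)
c1 = np.column_stack((np.cos(np.pi * t), np.sin(np.pi * t)))   # half circle
c2 = np.column_stack((t, t ** 2))                              # parabola arc
print(l2_srvf_distance(c1, c2, t))
```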

5.
The cubic smoothing spline has been a popular method for detrending tree-ring data since the 1980s. The common implementation of this procedure (e.g., ARSTAN, dplR) uses a unique method for determining the smoothing parameter that is widely known as the %n criterion. However, this smoothing parameter selection method carries the assumption that end point effects are ignorable. In this paper, we complete the mathematical derivation and show how the original method differs from the complete version, both in the interpretation of the smoothing parameter and in the spline fit. Frequency response curves (FRC) demonstrate how the smoothing parameter is affected by the original assumption. For example, the FRC results indicate that a tree core of 250-year length has a 14% difference in the cut-off frequency when looking at the 67%n criterion. The FRC analysis shows that the existing approach produces a more flexible fit than anticipated, i.e., it is removing more variance than previously thought. For example, a 67%n spline under the existing approach corresponds to a 53%n spline fit. By using both simulated tree-core sequences and a dataset from a Midwest forest, we discuss which conditions result in greater differences between the spline fits and which conditions will have small differences. Tree-core sequences that have more curvature, such as a large-amplitude growth release, will lead to greater differences. Finally, we provide approximations to the end-point effect procedure. For example, using an 83%n criterion under the original approach produces a spline fit approximating the 67%n fit under the complete approach. These approximations could be easily implemented within existing programs like ARSTAN.
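For orientation, a minimal detrending sketch in the spirit of this workflow: fit a cubic smoothing spline to a ring-width series and divide by it to get a ring-width index. The smoothing level `s` below is set by hand on toy data; translating a %n criterion into a smoothing parameter requires exactly the frequency-response calculation the paper analyzes and is not attempted here:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(2)
years = np.arange(1750, 2000)
# Toy ring widths: negative-exponential age trend plus noise.
rw = (2.0 * np.exp(-0.01 * (years - years[0])) + 0.3
      + rng.normal(scale=0.1, size=years.size))

# Cubic smoothing spline growth trend; `s` is an ad hoc residual budget,
# NOT a %n-criterion value.
trend = UnivariateSpline(years, rw, k=3, s=years.size * 0.01)(years)

rwi = rw / trend   # standard ring-width index: raw width over fitted trend
```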

6.
We describe an approach to analysis of growth that does not depend on assumptions about the underlying functional growth pattern and that allows for multiple observations arising from individual-specific, irregularly spaced data. We produce estimated growth curves for predefined subject groups by using LOWESS, a nonparametric smoothing algorithm. We describe how statistical significance of curve features may be evaluated by using the "jackknife," a sample re-use method; this technique can be used to assess differences between subject groups. We then obtain residuals at each data point by reference to the estimated curve. Consistency of residuals is evaluated as a characteristic of individual subjects, and in the presence of individual consistency, relative size-for-age is then scored by the average residual for each individual. This allows study of relationships between relative size and other individual characteristics such as birth order, dominance rank, or age of maturation. Finally, we indicate flexibility of these methods and alternatives, propose uses related to other questions about growth, and suggest potential applications to variables other than body size. Appendices demonstrate application of the LOWESS and jackknife algorithms to the problem of testing sex differences in growth.
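A sketch of the two main steps on synthetic growth data, assuming `lowess` from statsmodels: estimate a group curve by LOWESS, then jackknife by deleting whole subjects to get standard errors on a common age grid. The pseudo-value construction below is the textbook delete-one jackknife, not the paper's exact implementation:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

rng = np.random.default_rng(3)

# Toy growth data: 25 subjects measured at irregular, subject-specific ages.
subj, age, size = [], [], []
for sid in range(25):
    a = np.sort(rng.uniform(0.0, 10.0, size=rng.integers(4, 9)))
    subj += [sid] * a.size
    age += list(a)
    size += list(10 + 3 * np.log1p(a) + rng.normal(scale=0.5, size=a.size))
subj, age, size = map(np.asarray, (subj, age, size))

grid = np.linspace(0.5, 9.5, 50)

def curve(mask):
    # LOWESS fit, interpolated onto the common age grid.
    sm = lowess(size[mask], age[mask], frac=0.5)
    return np.interp(grid, sm[:, 0], sm[:, 1])

full = curve(np.ones(age.size, dtype=bool))

# Delete-one-subject jackknife: whole subjects are dropped (not single
# points) so the pseudo-values respect within-subject correlation.
n = 25
pseudo = np.array([n * full - (n - 1) * curve(subj != sid) for sid in range(n)])
jack_se = pseudo.std(axis=0, ddof=1) / np.sqrt(n)
```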

7.
Functional data are smooth, often continuous, random curves, which can be seen as an extreme case of multivariate data with infinite dimensionality. Just as componentwise inference for multivariate data naturally performs feature selection, subsetwise inference for functional data performs domain selection. In this paper, we present a unified testing framework for domain selection on populations of functional data. In detail, p-values of hypothesis tests performed on pointwise evaluations of functional data are suitably adjusted for providing control of the familywise error rate (FWER) over a family of subsets of the domain. We show that several state-of-the-art domain selection methods fit within this framework and differ from each other by the choice of the family over which the control of the FWER is provided. In the existing literature, these families are always defined a priori. In this work, we also propose a novel approach, coined thresholdwise testing, in which the family of subsets is instead built in a data-driven fashion. The method seamlessly generalizes to multidimensional domains in contrast to methods based on a priori defined families. We provide theoretical results with respect to consistency and control of the FWER for the methods within the unified framework. We illustrate the performance of the methods within the unified framework on simulated and real data examples and compare their performance with other existing methods.
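One of the simplest members of this kind of framework is pointwise testing with a Holm adjustment, i.e., FWER control over the family of single grid points. A sketch on synthetic two-group functional data (this is not the paper's thresholdwise method, just the baseline it generalizes):

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 100)

# Two samples of functional data; group B is shifted upward on (0.4, 0.7).
A = np.sin(2 * np.pi * grid) + rng.normal(scale=0.4, size=(30, grid.size))
B = (np.sin(2 * np.pi * grid) + 0.6 * ((grid > 0.4) & (grid < 0.7))
     + rng.normal(scale=0.4, size=(30, grid.size)))

# Pointwise two-sample t-tests, then a Holm adjustment across the grid.
_, p = ttest_ind(A, B, axis=0)
reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="holm")
selected = grid[reject]   # the selected subdomain
```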

8.
9.
Huang L, Chen MH, Ibrahim JG. Biometrics 2005, 61(3): 767-780.
We propose Bayesian methods for estimating parameters in generalized linear models (GLMs) with nonignorably missing covariate data. We show that when improper uniform priors are used for the regression coefficients, φ, of the multinomial selection model for the missing data mechanism, the resulting joint posterior will always be improper if (i) all missing covariates are discrete and an intercept is included in the selection model for the missing data mechanism, or (ii) at least one of the covariates is continuous and unbounded. This impropriety will result regardless of whether proper or improper priors are specified for the regression parameters, β, of the GLM or the parameters, α, of the covariate distribution. To overcome this problem, we propose a novel class of proper priors for the regression coefficients, φ, in the selection model for the missing data mechanism. These priors are robust and computationally attractive in the sense that inferences about β are not sensitive to the choice of the hyperparameters of the prior for φ and they facilitate a Gibbs sampling scheme that leads to accelerated convergence. In addition, we extend the model assessment criterion of Chen, Dey, and Ibrahim (2004a, Biometrika 91, 45-63), called the weighted L measure, to GLMs and missing data problems, and extend the deviance information criterion (DIC) of Spiegelhalter et al. (2002, Journal of the Royal Statistical Society B 64, 583-639) for assessing whether the missing data mechanism is ignorable or nonignorable. A novel Markov chain Monte Carlo sampling algorithm is also developed for carrying out posterior computation. Several simulations are given to investigate the performance of the proposed Bayesian criteria as well as the sensitivity of the prior specification. Real datasets from a melanoma cancer clinical trial and a liver cancer study are presented to further illustrate the proposed methods.

10.
A new method of extracting information about bacterial speeds from photon correlation spectroscopy is presented. This method has the advantage that an estimate of the translational speed distribution is varied directly so as to achieve the best least-squares fit to the experimental autocorrelation function. The theory of spline approximations to continuous functions is briefly outlined. The importance of the previously disregarded diffusional component of bacterial motion is discussed. Experimental data from Salmonella at a low scattering angle are analyzed by this method of spline approximation, and the distribution of translational speeds is obtained.
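A rough sketch of the fitting idea, with a discretized speed grid and a nonnegativity constraint standing in for the paper's spline parameterization. The swimming kernel sin(qvτ)/(qvτ), damped by diffusion, is the standard model for straight-line swimmers; all physical constants below are made-up illustrative values:

```python
import numpy as np
from scipy.optimize import nnls

q = 5.0e6             # scattering vector magnitude (1/m), assumed
D = 0.5e-12           # translational diffusion coefficient (m^2/s), assumed
tau = np.linspace(1e-4, 5e-2, 200)          # correlation delay times (s)
v_grid = np.linspace(1.0e-6, 60.0e-6, 60)   # candidate speeds (m/s)

# Kernel: each column is the model autocorrelation for one speed, with a
# diffusional damping factor applied to every delay time.
x = q * np.outer(tau, v_grid)
K = (np.sin(x) / x) * np.exp(-q**2 * D * tau)[:, None]

# Synthetic "measured" autocorrelation from a bell-shaped speed distribution.
p_true = np.exp(-0.5 * ((v_grid - 20e-6) / 6e-6) ** 2)
p_true /= p_true.sum()
g_meas = K @ p_true + np.random.default_rng(5).normal(scale=2e-3, size=tau.size)

# Least-squares recovery of the speed distribution under positivity
# (the paper instead varies a smooth spline approximation).
p_hat, _ = nnls(K, g_meas)
```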

11.
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.

12.
Du P, Jiang Y, Wang Y. Biometrics 2011, 67(4): 1330-1339.
Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and a general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and the parameters of the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of the parameter update and/or increasing the MCMC sample size along iterations. A model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate their use through an analysis of bladder tumor data.

13.
Grigoletto M, Akritas MG. Biometrics 1999, 55(4): 1177-1187.
We propose a method for fitting semiparametric models such as the proportional hazards (PH), additive risks (AR), and proportional odds (PO) models. Each of these semiparametric models implies that some transformation of the conditional cumulative hazard function (at each t) depends linearly on the covariates. The proposed method is based on nonparametric estimation of the conditional cumulative hazard function, forming a weighted average over a range of t-values, and subsequent use of least squares to estimate the parameters suggested by each model. An approximation to the optimal weight function is given. This allows semiparametric models to be fitted even in incomplete data cases where the partial likelihood fails (e.g., left censoring, right truncation). However, the main advantage of this method rests in the fact that neither the interpretation of the parameters nor the validity of the analysis depend on the appropriateness of the PH or any of the other semiparametric models. In fact, we propose an integrated method for data analysis where the role of the various semiparametric models is to suggest the best fitting transformation. A single continuous covariate and several categorical covariates (factors) are allowed. Simulation studies indicate that the test statistics and confidence intervals have good small-sample performance. A real data set is analyzed.
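A two-group sketch of the core idea under the PH model: estimate each group's cumulative hazard nonparametrically (Nelson-Aalen), apply the log transformation, and average the between-group contrast over a range of t. Uniform weights are used where the paper derives an approximately optimal weight function, and tied event times are assumed away:

```python
import numpy as np

def nelson_aalen(time, event):
    # Nelson-Aalen cumulative-hazard estimate (assumes no tied times).
    order = np.argsort(time)
    t, d = time[order], event[order].astype(bool)
    idx = np.flatnonzero(d)
    return t[idx], np.cumsum(1.0 / (t.size - idx))

rng = np.random.default_rng(6)
# Two groups with proportional hazards: rate 1 vs rate exp(beta_true).
beta_true = 0.7
t1 = rng.exponential(1.0, 300)
t2 = rng.exponential(np.exp(-beta_true), 300)   # scale = 1 / rate
c1, c2 = rng.exponential(2.0, 300), rng.exponential(2.0, 300)  # censoring
time1, ev1 = np.minimum(t1, c1), t1 <= c1
time2, ev2 = np.minimum(t2, c2), t2 <= c2

u1, H1 = nelson_aalen(time1, ev1)
u2, H2 = nelson_aalen(time2, ev2)

# Under PH, log H2(t) - log H1(t) = beta for every t; average the contrast
# over a grid of t-values (uniform weights stand in for the optimal ones).
tg = np.linspace(0.2, 1.5, 25)
beta_hat = np.mean(np.interp(tg, u2, np.log(H2)) - np.interp(tg, u1, np.log(H1)))
print(beta_hat)   # close to 0.7
```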

14.
We explore the use of a posterior predictive loss criterion for model selection for incomplete longitudinal data. We begin by identifying a property that most model selection criteria for incomplete data should satisfy. We then show that a straightforward extension of the Gelfand and Ghosh (1998, Biometrika 85, 1-11) criterion to incomplete data has two problems. First, it introduces an extra term (in addition to the goodness-of-fit and penalty terms) that compromises the criterion. Second, it does not satisfy the aforementioned property. We propose an alternative and explore its properties via simulations and on a real dataset, and we compare it to the deviance information criterion (DIC). In general, the DIC outperforms the posterior predictive criterion, but the latter appears to work well overall and is very easy to compute, unlike the DIC in certain classes of models for missing data.
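For complete data, the Gelfand and Ghosh criterion under squared-error loss is easy to compute from posterior predictive replicates: a penalty term P (predictive variance) plus a weighted goodness-of-fit term G. A sketch of that baseline criterion; the paper's contribution, the extension to incomplete longitudinal data, is not reproduced here:

```python
import numpy as np

def gelfand_ghosh(y_obs, y_rep, k=np.inf):
    # Posterior predictive loss, squared-error version: P + k/(k+1) * G,
    # where y_rep is an (n_draws, n_obs) array of posterior predictive
    # replicates of the observed data; k = inf gives P + G.
    mu = y_rep.mean(axis=0)
    P = y_rep.var(axis=0, ddof=1).sum()      # penalty: predictive variance
    G = ((y_obs - mu) ** 2).sum()            # goodness of fit
    w = 1.0 if np.isinf(k) else k / (k + 1.0)
    return P + w * G

# Usage: y_rep would hold one simulated dataset per retained MCMC draw.
# crit = gelfand_ghosh(y_obs, y_rep)
```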

15.
Chenlin Zhang, Huazhen Lin, Li Liu, Jin Liu, Yi Li. Biometrics 2023, 79(3): 2232-2245.
Functional data analysis has emerged as a powerful tool in response to the ever-increasing resources and efforts devoted to collecting information about response curves or anything that varies over a continuum. However, limited progress has been made with regard to linking the covariance structures of response curves to external covariates, as most functional models assume a common covariance structure. We propose a new functional regression model with covariate-dependent mean and covariance structures. Particularly, by allowing variances of random scores to be covariate-dependent, we identify eigenfunctions for each individual from the set of eigenfunctions that govern the variation patterns across all individuals, resulting in high interpretability and prediction power. We further propose a new penalized quasi-likelihood procedure that combines regularization and B-spline smoothing for model selection and estimation and establish the convergence rate and asymptotic normality of the proposed estimators. The utility of the developed method is demonstrated via simulations, as well as an analysis of the Avon Longitudinal Study of Parents and Children concerning parental effects on the growth curves of their offspring, which yields biologically interesting results.

16.
Graphical models play an important role in neuroscience studies, particularly in brain connectivity analysis. Typically, observations/samples are drawn from several heterogeneous groups and the group membership of each observation/sample is unavailable, which poses a great challenge for graph structure learning. In this paper, we propose a method that achieves Simultaneous Clustering and Estimation of Heterogeneous Graphs (SCEHG) for matrix-variate functional magnetic resonance imaging (fMRI) data. Unlike conventional clustering methods, which rely on the mean differences of various groups, the proposed SCEHG method fully exploits the group differences in conditional dependence relationships among brain regions for learning the cluster structure. In essence, by constructing individual-level between-region network measures, we formulate clustering as penalized regression with grouping and sparsity pursuit, which transforms the unsupervised learning into supervised learning. A modified difference-of-convex programming with the alternating direction method of multipliers (DC-ADMM) algorithm is proposed to solve the corresponding optimization problem. We also propose a generalized criterion to specify the number of clusters. Extensive simulation studies illustrate the superiority of the SCEHG method over some state-of-the-art methods in terms of both clustering and graph recovery accuracy. We also apply the SCEHG procedure to analyze fMRI data associated with attention-deficit hyperactivity disorder (ADHD), which illustrates its empirical usefulness.

17.
Yang YC, Liu A, Wang Y. Biometrics 2006, 62(1): 230-238.
Neuroendocrine ensembles communicate with their remote and proximal target cells via an intermittent pattern of chemical signaling. The identification of episodic releases of hormonal pulse signals constitutes a major emphasis of endocrine investigation. Estimating the number, temporal locations, secretion rate, and elimination rate from hormone concentration measurements is of critical importance in endocrinology. In this article, we propose a new flexible statistical method for pulse detection based on nonlinear mixed effects partial spline models. We model pulsatile secretions using biophysical models and investigate biological variation between pulses using random effects. Pooling information from different pulses provides more efficient and stable estimation for parameters of interest. We combine all nuisance parameters including a nonconstant basal secretion rate and biological variations into a baseline function that is modeled nonparametrically using smoothing splines. We develop model selection and parameter estimation methods for the general nonlinear mixed effects partial spline models and an R package for pulse detection and estimation. We evaluate performance and the benefit of shrinkage by simulations and apply our methods to data from a medical experiment.
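A heavily simplified sketch of pulse fitting for a single subject: secretion modeled as a baseline plus a fixed number of Gaussian pulses, convolved with one-compartment elimination, and fit by nonlinear least squares. The paper's random effects across pulses, nonparametric spline baseline, and model selection are all omitted, and the model function and its parameter values below are illustrative, not the paper's:

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.linspace(0.0, 24.0, 145)   # 10-minute sampling over 24 h
dt = t[1] - t[0]

def concentration(t, a1, m1, a2, m2, w, k, b):
    # Secretion rate: baseline b plus two Gaussian pulses of width w;
    # concentration is secretion convolved with elimination exp(-k t).
    sec = (b + a1 * np.exp(-0.5 * ((t - m1) / w) ** 2)
             + a2 * np.exp(-0.5 * ((t - m2) / w) ** 2))
    elim = np.exp(-k * t)
    return np.convolve(sec, elim)[: t.size] * dt   # discretized convolution

rng = np.random.default_rng(9)
y = concentration(t, 8, 6, 10, 15, 0.4, 1.2, 0.5) \
    * (1 + rng.normal(scale=0.05, size=t.size))

p0 = [5, 5, 5, 16, 0.5, 1.0, 0.3]   # rough starting values
popt, _ = curve_fit(concentration, t, y, p0=p0, maxfev=20000)
```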

18.
Hormone secretion processes remain incompletely understood, as they are highly regulated and individual specific. When hormone trajectories from multiple subjects are under investigation, both the population-average mechanism and subject-specific deviations are of great interest. In particular, statistical methodologies are needed that enable us not only to identify surge times and surge magnitudes but also to make inference on these biological features. In this paper, we propose a local kernel smoothing method to perform the analysis of multiple hormone curves using the nonparametric mixed-effects model. We develop a local quadratic mixed-effects (LQME) fitting procedure that detects local maxima of the population-average profile curve and the individual profile curves. Related statistical inference is established to carry out a hypothesis test for a local surge and to construct a confidence interval for a detected surge feature. This method is illustrated by simulation studies and a reproductive hormone data analysis.
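A single-curve sketch of the local quadratic idea: at each grid point, fit a kernel-weighted quadratic; the intercept estimates the curve, the linear coefficient its derivative, and a surge is flagged where the derivative crosses zero downward. The mixed-effects pooling across subjects and the surge inference are the paper's contribution and are omitted:

```python
import numpy as np

def local_quadratic(t_obs, y, grid, h):
    # At each grid point g, fit y ~ 1 + (t-g) + (t-g)^2 by weighted least
    # squares with Gaussian kernel weights of bandwidth h.
    fit, slope = np.empty(grid.size), np.empty(grid.size)
    for i, g in enumerate(grid):
        d = t_obs - g
        w = np.exp(-0.5 * (d / h) ** 2)
        X = np.column_stack((np.ones_like(d), d, d ** 2))
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        fit[i], slope[i] = beta[0], beta[1]   # curve value and derivative
    return fit, slope

rng = np.random.default_rng(7)
t = np.sort(rng.uniform(0.0, 24.0, 300))
y = 5 + 4 * np.exp(-0.5 * ((t - 14) / 1.5) ** 2) \
    + rng.normal(scale=0.5, size=t.size)

grid = np.linspace(1.0, 23.0, 200)
fit, slope = local_quadratic(t, y, grid, h=1.0)
# A surge peak is where the estimated derivative crosses zero downward.
surges = grid[1:][(slope[:-1] > 0) & (slope[1:] <= 0)]
```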

19.
We consider a Bayesian approach for the nonlinear regression model in which the normal distribution on the error term is replaced by skewed distributions that account for both skewness and heavy tails, or for skewness alone. The type of data considered in this paper concerns repeated measurements taken over time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes, so our model additionally allows for correlation between observations made on the same individual. We illustrate the procedure using a dataset of clinical measurements collected to study the growth curves of a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based on Markov chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our dataset, all these criteria chose the skew-t model as the best model for the errors. The DIC and CPO criteria are also validated, for the model proposed here, through a simulation study, which indicates that the DIC criterion is not trustworthy for this kind of complex model.
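Two of the comparison criteria are simple to compute from posterior draws. A sketch assuming Gaussian pointwise likelihoods for brevity (the paper's preferred errors are skew-t): the Monte Carlo CPO for observation i is the harmonic mean of its likelihood across draws, and the log posterior predictive density is one example of a proper scoring rule:

```python
import numpy as np
from scipy.stats import norm

def cpo(y, mu_draws, sigma_draws):
    # Monte Carlo CPO_i: harmonic mean over posterior draws of the
    # pointwise likelihood f(y_i | theta_s); larger is better.
    # mu_draws: (S, n) draws of each observation's mean;
    # sigma_draws: (S,) draws of the error scale.
    like = norm.pdf(y, loc=mu_draws, scale=sigma_draws[:, None])   # (S, n)
    return 1.0 / np.mean(1.0 / like, axis=0)

def log_score(y, mu_draws, sigma_draws):
    # Log posterior predictive density of the data, a proper scoring rule.
    like = norm.pdf(y, loc=mu_draws, scale=sigma_draws[:, None])
    return np.log(like.mean(axis=0)).sum()
```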

20.
Tan YD. Genomics 2011, 98(5): 390-399.
The receiver operating characteristic (ROC) curve has been widely used to evaluate statistical methods, but a fundamental problem is that it cannot evaluate a method's estimation of the false discovery rate (FDR), and hence the area under the curve cannot tell us whether a statistical method is conservative. To address this issue, we propose an alternative criterion, work efficiency, defined as the product of the power and the degree of conservativeness of a statistical method. We conducted large-scale simulation comparisons among the optimizing discovery procedure (ODP), the Bonferroni (B-) procedure, local FDR (Localfdr), ranking analysis of the F-statistics (RAF), the Benjamini-Hochberg (BH-) procedure, and significance analysis of microarray data (SAM). The results show that ODP, SAM, and the B-procedure perform with low efficiency, while the BH-procedure, RAF, and Localfdr work with higher efficiency. ODP and SAM have the same ROC curves, but their efficiencies differ significantly.
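A toy simulation comparing two of the listed procedures on these axes. The `efficiency` line below uses power times the true-discovery proportion as one plausible reading of "power × degree of conservativeness"; the paper's exact definition may differ:

```python
import numpy as np
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(8)
m, m1, n = 2000, 200, 10          # genes, truly differential genes, per-group n
alpha = 0.05

x = rng.normal(size=(m, n))
y = rng.normal(size=(m, n))
y[:m1] += 1.2                      # the first m1 genes carry a real effect

_, p = ttest_ind(x, y, axis=1)     # one t-test per gene
for method in ("bonferroni", "fdr_bh"):
    reject, *_ = multipletests(p, alpha=alpha, method=method)
    power = reject[:m1].mean()                       # true positives found
    fdr = reject[m1:].sum() / max(reject.sum(), 1)   # realized FDR
    # Assumed efficiency measure: power weighted by true-discovery proportion.
    efficiency = power * (1 - fdr)
    print(method, round(power, 3), round(fdr, 3), round(efficiency, 3))
```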
