Similar literature
 Found 20 similar articles (search time: 484 ms)
1.
Pang Z  Kuk AY 《Biometrics》2007,63(1):218-227
Exchangeable binary data are often collected in developmental toxicity and other studies, and a whole host of parametric distributions for fitting this kind of data have been proposed in the literature. While these distributions can be matched to have the same marginal probability and intra-cluster correlation, they can be quite different in terms of shape and higher-order quantities of interest such as the litter-level risk of having at least one malformed fetus. A sensible alternative is to fit a saturated model (Bowman and George, 1995, Journal of the American Statistical Association 90, 871-879) using the expectation-maximization (EM) algorithm proposed by Stefanescu and Turnbull (2003, Biometrics 59, 18-24). The assumption of compatibility of marginal distributions is often made to link up the distributions for different cluster sizes so that estimation can be based on the combined data. Stefanescu and Turnbull proposed a modified trend test to test this assumption. Their test, however, fails to take into account the variability of an estimated null expectation and as a result leads to inaccurate p-values. This drawback is rectified in this article. When the data are sparse, the probability function estimated using a saturated model can be very jagged and some kind of smoothing is needed. We extend the penalized likelihood method (Simonoff, 1983, Annals of Statistics 11, 208-218) to the present case of unequal cluster sizes and implement the method using an EM-type algorithm. In the presence of a covariate, we propose a penalized kernel method that performs smoothing in both the covariate and response space. The proposed methods are illustrated using several data sets and the sampling and robustness properties of the resulting estimators are evaluated by simulations.
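
To make the "litter-level risk" in the abstract above concrete, here is a minimal sketch that computes the probability of at least one malformed fetus under a beta-binomial model, one common parametric choice for exchangeable binary data. It is not the saturated model or the EM algorithm of the article; the function names and the parametrization by marginal probability `p` and intra-litter correlation `rho` are assumptions made for illustration.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def risk_beta_binomial(p, rho, n):
    """P(at least one affected fetus in a litter of size n) under a
    beta-binomial model with marginal probability p and intra-litter
    correlation rho (0 < rho < 1). Illustrative parametrization."""
    # Map (p, rho) to Beta(a, b): mean a / (a + b) = p, correlation 1 / (a + b + 1) = rho.
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    # P(no affected fetus) = B(a, b + n) / B(a, b).
    p_none = math.exp(log_beta(a, b + n) - log_beta(a, b))
    return 1.0 - p_none


def risk_independent(p, n):
    """Same risk when fetuses respond independently (rho = 0)."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p, n = 0.1, 10
    for rho in (0.01, 0.1, 0.3):
        print("rho =", rho, "risk =", risk_beta_binomial(p, rho, n))
    print("independent risk =", risk_independent(p, n))
```

With the marginal probability held fixed, the litter-level risk changes with the correlation, which is why models matched only on the first two moments can still disagree on this quantity.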

2.
We describe a new pathway for multivariate analysis of data consisting of counts of species abundances that includes two key components: copulas, to provide a flexible joint model of individual species, and dissimilarity‐based methods, to integrate information across species and provide a holistic view of the community. Individual species are characterized using suitable (marginal) statistical distributions, with the mean, the degree of over‐dispersion, and/or zero‐inflation being allowed to vary among a priori groups of sampling units. Associations among species are then modeled using copulas, which allow any pair of disparate types of variables to be coupled through their cumulative distribution function, while maintaining entirely the separate individual marginal distributions appropriate for each species. A Gaussian copula smoothly captures changes in an index of association that excludes joint absences in the space of the original species variables. A permutation‐based filter with exact family‐wise error can optionally be used a priori to reduce the dimensionality of the copula estimation problem. We describe in detail a Monte Carlo expectation maximization algorithm for efficient estimation of the copula correlation matrix with discrete marginal distributions (counts). The resulting fully parameterized copula models can be used to simulate realistic ecological community data under fully specified null or alternative hypotheses. Distributions of community centroids derived from simulated data can then be visualized in ordinations of ecologically meaningful dissimilarity spaces. Multinomial mixtures of data drawn from copula models also yield smooth power curves in dissimilarity‐based settings. Our proposed analysis pathway provides new opportunities to combine model‐based approaches with dissimilarity‐based methods to enhance understanding of ecological systems. 
We demonstrate implementation of the pathway through an ecological example, where associations among fish species were found to increase after the establishment of a marine reserve.
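
The idea of coupling separate count marginals through a Gaussian copula can be sketched in a few lines. The example below simulates two correlated species counts; it assumes plain Poisson marginals for brevity (the abstract also allows over-dispersion and zero-inflation) and does not implement the Monte Carlo EM estimation step. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_quantile(u, mean):
    """Smallest k with P(X <= k) >= u for X ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)
    cdf = pmf
    # The cap on k guards against u values numerically equal to 1.
    while cdf < u and k < 10_000:
        k += 1
        pmf *= mean / k
        cdf += pmf
    return k


def simulate_pair(mean_a, mean_b, rho, n, seed=0):
    """Draw n pairs of counts whose dependence comes from a Gaussian copula
    with latent correlation rho and whose marginals are Poisson."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    pairs = []
    for _ in range(n):
        # Correlated standard normals (latent scale).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map to uniforms with the normal CDF, then to counts with the
        # inverse CDF of each species' own marginal.
        pairs.append((poisson_quantile(phi(z1), mean_a),
                      poisson_quantile(phi(z2), mean_b)))
    return pairs


if __name__ == "__main__":
    for a, b in simulate_pair(2.0, 5.0, 0.8, 5, seed=42):
        print(a, b)
```

Each marginal keeps its own mean regardless of `rho`; only the association between the two counts changes, which is the separation of marginals and dependence that the copula provides.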

3.
Lachos VH  Bandyopadhyay D  Dey DK 《Biometrics》2011,67(4):1594-1604
HIV RNA viral load measures are often subjected to some upper and lower detection limits depending on the quantification assays. Hence, the responses are either left or right censored. Linear (and nonlinear) mixed-effects models (with modifications to accommodate censoring) are routinely used to analyze this type of data and are based on normality assumptions for the random terms. However, those analyses might not provide robust inference when the normality assumptions are questionable. In this article, we develop a Bayesian framework for censored linear (and nonlinear) models replacing the Gaussian assumptions for the random terms with normal/independent (NI) distributions. The NI is an attractive class of symmetric heavy-tailed densities that includes the normal, Student's t, slash, and the contaminated normal distributions as special cases. The marginal likelihood is tractable (using approximations for nonlinear models) and can be used to develop Bayesian case-deletion influence diagnostics based on the Kullback-Leibler divergence. The newly developed procedures are illustrated with two HIV/AIDS studies on viral loads that were initially analyzed using normal (censored) mixed-effects models, as well as simulations.
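
For readers unfamiliar with the NI class, the usual scale-mixture-of-normals representation is sketched below. The symbols (mu, sigma, U, nu, gamma) are notation assumed here and are not taken from the abstract; the univariate case is shown for simplicity.

```latex
% Normal/independent (NI) representation of a univariate random term
Y = \mu + U^{-1/2} Z, \qquad Z \sim N(0, \sigma^{2}), \qquad U > 0 \ \text{independent of } Z .

% Special cases, obtained by the choice of the mixing variable U:
\begin{align*}
U &\equiv 1                                   && \text{normal} \\
U &\sim \mathrm{Gamma}(\nu/2,\ \nu/2)         && \text{Student's } t \text{ with } \nu \text{ degrees of freedom (rate parametrization)} \\
U &\sim \mathrm{Beta}(\nu,\ 1)                && \text{slash} \\
P(U = \gamma) &= \nu,\ \ P(U = 1) = 1 - \nu   && \text{contaminated normal},\ 0 < \gamma < 1
\end{align*}
```

Small values of U inflate the variance of Y for that observation, which is how the heavier tails arise.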

4.
In behavioral medicine trials, such as smoking cessation trials, 2 or more active treatments are often compared. Noncompliance by some subjects with their assigned treatment poses a challenge to the data analyst. The principal stratification framework permits inference about causal effects among subpopulations characterized by potential compliance. However, in the absence of prior information, there are 2 significant limitations: (1) the causal effects cannot be point identified for some strata and (2) individuals in the subpopulations (strata) cannot be identified. We propose to use additional information, namely compliance-predictive covariates, to help identify the causal effects and to help describe characteristics of the subpopulations. The probability of membership in each principal stratum is modeled as a function of these covariates. The model is constructed using marginal compliance models (which are identified) and a sensitivity parameter that captures the association between the 2 marginal distributions. We illustrate our methods in both a simulation study and an analysis of data from a smoking cessation trial.

5.
Müller HG  Zhang Y 《Biometrics》2005,61(4):1064-1075
A recurring objective in longitudinal studies on aging and longevity has been the investigation of the relationship between age-at-death and current values of a longitudinal covariate trajectory that quantifies reproductive or other behavioral activity. We propose a novel technique for predicting age-at-death distributions for situations where an entire covariate history is included in the predictor. The predictor trajectories up to current time are represented by time-varying functional principal component scores, which are continuously updated as time progresses and are considered to be time-varying predictor variables that are entered into a class of time-varying functional regression models that we propose. We demonstrate for biodemographic data how these methods can be applied to obtain predictions for age-at-death and estimates of remaining lifetime distributions, including estimates of quantiles and of prediction intervals for remaining lifetime. Estimates and predictions are obtained for individual subjects, based on their observed behavioral trajectories, and include a dimension-reduction step that is implemented by projecting on a single index. The proposed techniques are illustrated with data on longitudinal daily egg-laying for female medflies, predicting remaining lifetime and age-at-death distributions from individual event histories observed up to current time.

6.
Simultaneous spike-counts of neural populations are typically modeled by a Gaussian distribution. On short time scales, however, this distribution is too restrictive to describe and analyze multivariate distributions of discrete spike-counts. We present an alternative that is based on copulas and can account for arbitrary marginal distributions, including Poisson and negative binomial distributions as well as second and higher-order interactions. We describe maximum likelihood-based procedures for fitting copula-based models to spike-count data, and we derive a so-called flashlight transformation which makes it possible to move the tail dependence of an arbitrary copula into an arbitrary orthant of the multivariate probability distribution. Mixtures of copulas that combine different dependence structures and thereby model different driving processes simultaneously are also introduced. First, we apply copula-based models to populations of integrate-and-fire neurons receiving partially correlated input and show that the best fitting copulas provide information about the functional connectivity of coupled neurons which can be extracted using the flashlight transformation. We then apply the new method to data which were recorded from macaque prefrontal cortex using a multi-tetrode array. We find that copula-based distributions with negative binomial marginals provide an appropriate stochastic model for the multivariate spike-count distributions rather than the multivariate Poisson latent variables distribution and the often used multivariate normal distribution. The dependence structure of these distributions provides evidence for common inhibitory input to all recorded stimulus encoding neurons. Finally, we show that copula-based models can be successfully used to evaluate neural codes, e.g., to characterize stimulus-dependent spike-count distributions with information measures. 
This demonstrates that copula-based models are not only a versatile class of models for multivariate distributions of spike-counts, but that those models can be exploited to understand functional dependencies.

7.
Linear mixed effects models have been widely used in analysis of data where responses are clustered around some random effects, so it is not reasonable to assume independence between observations in the same cluster. In most biological applications, it is assumed that the distributions of the random effects and of the residuals are Gaussian. This makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions for robust inferences are described. Specific distributions examined include univariate and multivariate versions of the Student‐t, the slash and the contaminated normal. A Bayesian framework is adopted and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth weight data on rats in a toxicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick‐tailed distributions provide an appealing robust alternative to the Gaussian process in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.

8.
We consider a nonparametric (NP) approach to the analysis of repeated measures designs with censored data. Using the NP model of Akritas and Arnold (1994, Journal of the American Statistical Association 89, 336-343) for marginal distributions, we present test procedures for the NP hypotheses of no main effects, no interaction, and no simple effects. This extends the existing NP methodology for such designs (Wei and Lachin, 1984, Journal of the American Statistical Association 79, 653-661). The procedures do not require any modeling assumptions and should be useful in cases where the assumptions of proportional hazards or location shift fail to be satisfied. The large-sample distribution of the test statistics is based on an i.i.d. representation for Kaplan-Meier integrals. The testing procedures apply also to ordinal data and to data with ties. Useful small-sample approximations are presented, and their performance is examined in a simulation study. Finally, the methodology is illustrated with two real life examples, one with censored and one with missing data. It is indicated that one of the data sets does not conform to any set of assumptions underlying the available methods and also that the present method provides a useful additional analysis even when data sets conform to modeling assumptions.

9.
Ross EA  Moore D 《Biometrics》1999,55(3):813-819
We have developed methods for modeling discrete or grouped time, right-censored survival data collected from correlated groups or clusters. We assume that the marginal hazard of failure for individual items within a cluster is specified by a linear log odds survival model and the dependence structure is based on a gamma frailty model. The dependence can be modeled as a function of cluster-level covariates. Likelihood equations for estimating the model parameters are provided. Generalized estimating equations for the marginal hazard regression parameters and pseudolikelihood methods for estimating the dependence parameters are also described. Data from two clinical trials are used for illustration purposes.

10.
Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines. The mean function can be viewed as a fixed effect and is estimated with a penalty for regularization. With the latent process viewed as another random effect, the model becomes a generalized linear mixed model. In our motivating data set and other applications, the sample size is too large to easily accommodate maximum likelihood or restricted maximum likelihood estimation (REML), so pairwise likelihood, a special case of composite likelihood, is used instead. We develop an asymptotic theory for models that are sufficiently general to be used in a wide variety of applications, including, but not limited to, the problem that motivated this work. The splines have penalty parameters that must converge to zero asymptotically: we derive theory for this along with a data-driven method for selecting the penalty parameter, a method that is shown in simulations to improve greatly upon standard devices, such as likelihood crossvalidation. Finally, we apply the methods to the ACF data from our experiment. We discover an unexpected location for peak formation of ACF.

11.
Gaussian process functional regression modeling for batch data   Total citations: 2, self-citations: 0, citations by others: 2
A Gaussian process functional regression model is proposed for the analysis of batch data. Covariance structure and mean structure are considered simultaneously, with the covariance structure modeled by a Gaussian process regression model and the mean structure modeled by a functional regression model. The model allows the inclusion of covariates in both the covariance structure and the mean structure. It models the nonlinear relationship between a functional output variable and a set of functional and nonfunctional covariates. Several applications and simulation studies are reported and show that the method provides very good results for curve fitting and prediction.

12.
Modeling the joint distribution of a binary trait (disease) within families is a tedious challenge, owing to the lack of a general statistical model with desirable properties such as the multivariate Gaussian model for a quantitative trait. Models have been proposed that either assume the existence of an underlying liability variable, the reality of which cannot be checked, or provide estimates of aggregation parameters that are dependent on the ordering of family members and on family size. We describe how a class of copula models for the analysis of exchangeable categorical data can be incorporated into a familial framework. In this class of models, the joint distribution of binary outcomes is characterized by a function of the given marginals. This function, referred to as a "copula," depends on an aggregation parameter that is weakly dependent on the marginal distributions. We propose to decompose a nuclear family into two sets of equicorrelated data (parents and offspring), each of which is characterized by an aggregation parameter (alphaFM and alphaSS, respectively). The marginal probabilities are modeled through a logistic representation. The advantage of this model is that it provides estimates of the aggregation parameters that are independent of family size and does not require any arbitrary ordering of sibs. It can be incorporated easily into segregation or combined segregation-linkage analysis and does not require extensive computer time. As an illustration, we applied this model to a combined segregation-linkage analysis of levels of plasma angiotensin I-converting enzyme (ACE) dichotomized into two classes according to the median. The conclusions of this analysis were very similar to those we had reported in an earlier familial analysis of quantitative ACE levels.

13.
Methods for modeling sets of complex curves where the curves must be aligned in time (or in another continuous predictor) fall into the general class of functional data analysis and include self-modeling regression and time-warping procedures. Self-modeling regression (SEMOR), also known as a shape invariant model (SIM), assumes the curves have a common shape, modeled nonparametrically, and curve-specific differences in amplitude and timing, traditionally modeled by linear transformations. When curves contain multiple features that need to be aligned in time, SEMOR may be inadequate since a linear time transformation generally cannot align more than one feature. Time warping procedures focus on timing variability and on finding flexible time warps to align multiple data features. We draw on these methods to develop a SIM that models the time transformations as random, flexible, monotone functions. The model is motivated by speech movement data from the University of Wisconsin X-ray microbeam speech production project and is applied to these data to test the effect of different speaking conditions on the shape and relative timing of movement profiles.

14.
Clegg LX  Cai J  Sen PK 《Biometrics》1999,55(3):805-812
In multivariate failure time data analysis, a marginal regression modeling approach is often preferred to avoid assumptions on the dependence structure among correlated failure times. In this paper, a marginal mixed baseline hazards model is introduced. Estimating equations are proposed for the estimation of the marginal hazard ratio parameters. The proposed estimators are shown to be consistent and asymptotically Gaussian with a robust covariance matrix that can be consistently estimated. Simulation studies indicate the adequacy of the proposed methodology for practical sample sizes. The methodology is illustrated with a data set from the Framingham Heart Study.

15.
The reliable estimation of animal location and its associated error is fundamental to animal ecology. There are many existing techniques for handling location error, but these are often ad hoc or are used in isolation from each other. In this study we present a Bayesian framework for determining location that uses all the data available, is flexible to all tagging techniques, and provides location estimates with built-in measures of uncertainty. Bayesian methods allow the contributions of multiple data sources to be decomposed into manageable components. We illustrate with two examples for two different location methods: satellite tracking and light level geo-location. We show that many of the problems with uncertainty involved are reduced and quantified by our approach. This approach can use any available information, such as existing knowledge of the animal's potential range, light levels or direct location estimates, auxiliary data, and movement models. The approach provides a substantial contribution to handling uncertainty in archival tag and satellite tracking data using readily available tools.

16.
The stochastic nature of high-throughput screening (HTS) data indicates that information may be gleaned by applying statistical methods to HTS data. A foundation of parametric statistics is the study and elucidation of population distributions, which can be modeled using modern spreadsheet software. The methods and results described here use fundamental concepts of statistical population distributions analyzed using a spreadsheet to provide tools in a developing armamentarium for extracting information from HTS data. Specific examples using two HTS kinase assays are analyzed. The analyses use normal and gamma distributions, which combine to form mixture distributions. HTS data were found to be described well using such mixture distributions, and deconvolution of the mixtures to the constituent gamma and normal parts provided insight into how the assays performed. In particular, the proportion of hits confirmed was predicted from the original HTS data and used to assess screening assay performance. The analyses also provide a method for determining how hit thresholds (values used to separate active from inactive compounds) affect the proportion of compounds verified as active and how the threshold can be chosen to optimize the selection process.
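
The link between a hit threshold and the predicted proportion of confirmed hits can be illustrated with a two-component mixture. The sketch below is not the spreadsheet procedure of the article: it assumes, hypothetically, that inactive compounds follow the normal component and active compounds follow the gamma component, restricts the gamma shape to an integer so the tail has a closed form, and uses invented parameter names.

```python
import math
from statistics import NormalDist


def gamma_tail(t, shape, scale):
    """P(X > t) for X ~ Gamma(shape, scale) with integer shape (Erlang)."""
    if t <= 0:
        return 1.0
    x = t / scale
    # Closed form: exp(-x) * sum_{n=0}^{shape-1} x^n / n!
    term, total = 1.0, 1.0
    for n in range(1, shape):
        term *= x / n
        total += term
    return math.exp(-x) * total


def predicted_confirmation_rate(threshold, w_active, shape, scale, mu, sigma):
    """Share of compounds above `threshold` that belong to the active
    (gamma) component of the mixture, under the assumptions stated above."""
    active = w_active * gamma_tail(threshold, shape, scale)
    inactive = (1.0 - w_active) * (1.0 - NormalDist(mu, sigma).cdf(threshold))
    return active / (active + inactive)


if __name__ == "__main__":
    # Hypothetical assay: 5% actives, signal measured in percent inhibition.
    for threshold in (10, 20, 30, 40):
        rate = predicted_confirmation_rate(threshold, 0.05, 3, 15.0, 0.0, 10.0)
        print("threshold", threshold, "predicted confirmation rate", rate)
```

Raising the threshold trades a higher predicted confirmation rate against fewer selected compounds, which is the trade-off the abstract describes when choosing a threshold.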

17.
With the increasing use of survival models in animal breeding to address the genetic aspects of mainly longevity of livestock but also disease traits, the need for methods to infer genetic correlations and to do multivariate evaluations of survival traits and other types of traits has become increasingly important. In this study we derived and implemented a bivariate quantitative genetic model for a linear Gaussian and a survival trait that are genetically and environmentally correlated. For the survival trait, we considered the Weibull log-normal animal frailty model. A Bayesian approach using Gibbs sampling was adopted. Model parameters were inferred from their marginal posterior distributions. The required fully conditional posterior distributions were derived and issues on implementation are discussed. The two Weibull baseline parameters were updated jointly using a Metropolis-Hastings step. The remaining model parameters with non-normalized fully conditional distributions were updated univariately using adaptive rejection sampling. Simulation results showed that the estimated marginal posterior distributions covered the true parameter values used to simulate the data well and placed high density on them. In conclusion, the proposed method allows inferring additive genetic and environmental correlations, and doing multivariate genetic evaluation of a linear Gaussian trait and a survival trait.

18.
Nonparametric mixed effects models for unequally sampled noisy curves   Total citations: 7, self-citations: 0, citations by others: 7
Rice JA  Wu CO 《Biometrics》2001,57(1):253-259
We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques (Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation) to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.

19.
In this article, we propose a two-stage approach to modeling multilevel clustered non-Gaussian data with sufficiently large numbers of continuous measures per cluster. Such data are common in biological and medical studies utilizing monitoring or image-processing equipment. We consider a general class of hierarchical models that generalizes the model in the global two-stage (GTS) method for nonlinear mixed effects models by using any square-root-n-consistent and asymptotically normal estimators from stage 1 as pseudodata in the stage 2 model, and by extending the stage 2 model to accommodate random effects from multiple levels of clustering. The second-stage model is a standard linear mixed effects model with normal random effects, but the cluster-specific distributions, conditional on random effects, can be non-Gaussian. This methodology provides a flexible framework for modeling not only a location parameter but also other characteristics of conditional distributions that may be of specific interest. For estimation of the population parameters, we propose a conditional restricted maximum likelihood (CREML) approach and establish the asymptotic properties of the CREML estimators. The proposed general approach is illustrated using quartiles as cluster-specific parameters estimated in the first stage, and applied to the data example from a collagen fibril development study. We demonstrate using simulations that in samples with small numbers of independent clusters, the CREML estimators may perform better than conditional maximum likelihood estimators, which are a direct extension of the estimators from the GTS method.

20.
Studies of latent traits often collect data for multiple items measuring different aspects of the trait. For such data, it is common to consider models in which the different items are manifestations of a normal latent variable, which depends on covariates through a linear regression model. This article proposes a flexible Bayesian alternative in which the unknown latent variable density can change dynamically in location and shape across levels of a predictor. Scale mixtures of underlying normals are used in order to model flexibly the measurement errors and allow mixed categorical and continuous scales. A dynamic mixture of Dirichlet processes is used to characterize the latent response distributions. Posterior computation proceeds via a Markov chain Monte Carlo algorithm, with predictive densities used as a basis for inferences and evaluation of model fit. The methods are illustrated using data from a study of DNA damage in response to oxidative stress.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP license 京ICP备09084417号