Similar literature
 20 similar articles found (search time: 171 ms)
1.
Within behavioural research, non‐normally distributed data with a complicated structure are common. For instance, data can represent repeated observations of quantities on the same individual. The regression analysis of such data is complicated both by the interdependency of the observations (response variables) and by their non‐normal distribution. Over the last decade, such data have been more and more frequently analysed using generalized mixed‐effect models. Some researchers invoke the heavy machinery of mixed‐effect modelling to obtain the desired population‐level (marginal) inference, which can be achieved by using simpler tools, namely marginal models. This paper highlights marginal modelling (using generalized estimating equations [GEE]) as an alternative method. In various situations, GEE can be based on fewer assumptions and directly generate estimates (population‐level parameters) which are of immediate interest to the behavioural researcher (such as population means). Using four examples from behavioural research, we demonstrate the use, advantages, and limits of the GEE approach as implemented within the functions of the ‘geepack’ package in R.
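
As a rough illustration of the marginal idea described in this abstract, the sketch below fits the simplest possible GEE by hand: an intercept-only model with identity link and an independence working correlation, so the estimate is the pooled mean and the cluster-robust (sandwich) standard error accounts for repeated observations on the same individual. This is not the paper's code and does not use 'geepack'; the function name `gee_mean_independence` and the toy data are assumptions made for illustration.

```python
import math


def gee_mean_independence(clusters):
    """Intercept-only GEE with identity link and independence working correlation.

    `clusters` is a list of lists; each inner list holds the repeated
    observations of one individual. Returns (mean, robust_se, naive_se).
    """
    values = [y for cluster in clusters for y in cluster]
    n = len(values)
    mean = sum(values) / n

    # Sandwich "meat": squared sum of residuals within each cluster.
    # The "bread" for an intercept-only model is simply 1 / n.
    meat = sum(sum(y - mean for y in cluster) ** 2 for cluster in clusters)
    robust_se = math.sqrt(meat) / n

    # Naive standard error that wrongly treats all observations as independent.
    naive_var = sum((y - mean) ** 2 for y in values) / (n - 1)
    naive_se = math.sqrt(naive_var / n)
    return mean, robust_se, naive_se


# Hypothetical data: two individuals, two observations each.
mean, robust_se, naive_se = gee_mean_independence([[1, 3], [5, 7]])
```

With positively correlated observations inside each cluster, the robust standard error is larger than the naive one, which is the reason the dependence cannot be ignored.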

2.
Quantile regression methods have been used to estimate upper and lower quantile reference curves as functions of several covariates. In survival analysis in particular, median regression models for right‐censored data have been proposed under several assumptions. In this article, we consider a median regression model for interval‐censored data and construct an estimating equation based on weights derived from interval‐censored data. In a simulation study, the performance of the proposed method is evaluated for both symmetric and right‐skewed failure time distributions. A well‐known breast cancer data set is analyzed to illustrate the proposed method.

3.
Clustered interval‐censored data commonly arise in many studies of biomedical research where the failure time of interest is subject to interval‐censoring and subjects are correlated for being in the same cluster. A new semiparametric frailty probit regression model is proposed to study covariate effects on the failure time by accounting for the intracluster dependence. Under the proposed normal frailty probit model, the marginal distribution of the failure time is a semiparametric probit model, the regression parameters can be interpreted as both the conditional covariate effects given frailty and the marginal covariate effects up to a multiplicative constant, and the intracluster association can be summarized by two nonparametric measures in simple and explicit form. A fully Bayesian estimation approach is developed based on the use of monotone splines for the unknown nondecreasing function and a data augmentation using normal latent variables. The proposed Gibbs sampler is straightforward to implement since all unknowns have standard form in their full conditional distributions. The proposed method performs very well in estimating the regression parameters as well as the intracluster association, and the method is robust to frailty distribution misspecifications as shown in our simulation studies. Two real‐life data sets are analyzed for illustration.

4.
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero‐inflation complicate these tasks, requiring highly flexible, data‐adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero‐inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero‐inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flows through to the causal effect estimates of interest, allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero‐inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER‐Medicare database.

5.
Marginal methods have been widely used for the analysis of longitudinal ordinal and categorical data. These models do not require full parametric assumptions on the joint distribution of repeated response measurements but only specify the marginal or even association structures. However, inference results obtained from these methods often incur serious bias when variables are subject to error. In this paper, we tackle the problem of misclassification in both response and categorical covariate variables. We develop a marginal method for misclassification adjustment, which utilizes second‐order estimating functions and a functional modeling approach, and can yield consistent estimates and valid inference for mean and association parameters. We propose a two‐stage estimation approach for cases in which validation data are available. Our simulation studies show good performance of the proposed method under a variety of settings. Although the proposed method is framed for data with a longitudinal design, it also applies to correlated data arising from clustered and family studies, in which association parameters may be of scientific interest. The proposed method is applied to analyze a dataset from the Framingham Heart Study as an illustration.

6.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.

7.
This paper focuses on analysis of spatiotemporal binary data with absorbing states. The research was motivated by a clinical study on amyotrophic lateral sclerosis (ALS), a neurological disease marked by gradual loss of muscle strength over time in multiple body regions. We propose an autologistic regression model to capture complex spatial and temporal dependencies in muscle strength among different muscles. As it is not clear how the disease spreads from one muscle to another, it may not be reasonable to define a neighborhood structure based on spatial proximity. Relaxing the requirement for prespecification of spatial neighborhoods as in existing models, our method identifies an underlying network structure empirically to describe the pattern of spreading disease. The model also allows the network autoregressive effects to vary depending on the muscles’ previous status. Based on the joint distribution derived from this autologistic model, the joint transition probabilities of responses among locations can be estimated and the disease status can be predicted in the next time interval. Model parameters are estimated through maximization of penalized pseudo‐likelihood. Postmodel selection inference was conducted via a bias‐correction method, for which the asymptotic distributions were derived. Simulation studies were conducted to evaluate the performance of the proposed method. The method was applied to the analysis of muscle strength loss from the ALS clinical study.

8.
The intraclass correlation is commonly used with clustered data. It is often estimated based on fitting a model to hierarchical data and it leads, in turn, to several concepts such as reliability, heritability, inter‐rater agreement, etc. For data where linear models can be used, such measures can be defined as ratios of variance components. Matters are more difficult for non‐Gaussian outcomes. The focus here is on count and time‐to‐event outcomes where so‐called combined models are used, extending generalized linear mixed models, to describe the data. These models combine normal and gamma random effects to allow for both correlation due to data hierarchies as well as for overdispersion. Furthermore, because the models admit closed‐form expressions for the means, variances, higher moments, and even the joint marginal distribution, it is demonstrated that closed forms of intraclass correlations exist. The proposed methodology is illustrated using data from agricultural and livestock studies.
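
The "ratio of variance components" definition mentioned in this abstract can be sketched for the linear (Gaussian) case only; the closed forms for the combined count and time-to-event models are not given in the abstract and are not reproduced here. The function names and the balanced one-way ANOVA estimator below are illustrative assumptions, not the paper's method.

```python
def icc_from_components(var_between, var_within):
    """Intraclass correlation as a ratio of variance components."""
    return var_between / (var_between + var_within)


def icc_oneway_anova(groups):
    """One-way ANOVA estimator of the ICC for balanced clusters.

    `groups` is a list of equally sized lists, one per cluster.
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("clusters must be balanced with at least 2 members")
    n = len(groups)
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]

    # Between-cluster and within-cluster mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

For example, a between-cluster variance of 1 and a within-cluster variance of 3 give an ICC of 0.25: a quarter of the total variance is attributable to cluster membership.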

9.
Modeling organism distributions from survey data involves numerous statistical challenges, including accounting for zero‐inflation, overdispersion, and selection and incorporation of environmental covariates. In environments with high spatial and temporal variability, addressing these challenges often requires numerous assumptions regarding organism distributions and their relationships to biophysical features. These assumptions may limit the resolution or accuracy of predictions resulting from survey‐based distribution models. We propose an iterative modeling approach that incorporates a negative binomial hurdle, followed by modeling of the relationship of organism distribution and abundance to environmental covariates using generalized additive models (GAM) and generalized additive models for location, scale, and shape (GAMLSS). Our approach accounts for key features of survey data by separating binary (presence‐absence) from count (abundance) data, separately modeling the mean and dispersion of count data, and incorporating selection of appropriate covariates and response functions from a suite of potential covariates while avoiding overfitting. We apply our modeling approach to surveys of sea duck abundance and distribution in Nantucket Sound (Massachusetts, USA), which has been proposed as a location for offshore wind energy development. Our model results highlight the importance of spatiotemporal variation in this system, as well as identifying key habitat features including distance to shore, sediment grain size, and seafloor topographic variation. Our work provides a powerful, flexible, and highly repeatable modeling framework with minimal assumptions that can be broadly applied to the modeling of survey data with high spatiotemporal variability. Applying GAMLSS models to the count portion of survey data allows us to incorporate potential overdispersion, which can dramatically affect model results in highly dynamic systems. Our approach is particularly relevant to systems in which little a priori knowledge is available regarding relationships between organism distributions and biophysical features, since it incorporates simultaneous selection of covariates and their functional relationships with organism responses.
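
The separation of presence-absence from abundance described in this abstract can be sketched as a hurdle probability mass function: a Bernoulli component decides whether any organisms are present, and a zero-truncated negative binomial describes the positive counts. This is a minimal sketch under the usual mean/size parameterization; the names `p_present`, `mu` and `size` are assumptions, and the GAM/GAMLSS covariate modelling of the paper is not shown.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial pmf with mean `mu` and dispersion parameter `size`."""
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, p_present, mu, size):
    """Hurdle model: zeros come only from the binary (presence-absence) part."""
    if y == 0:
        return 1.0 - p_present
    # Positive counts follow a zero-truncated negative binomial.
    return p_present * nb_pmf(y, mu, size) / (1.0 - nb_pmf(0, mu, size))
```

Because the zero and positive parts are separated, the probability of an empty survey cell (`1 - p_present`) and the abundance parameters (`mu`, `size`) can each be related to their own set of covariates.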

10.
Measures of fitness such as reproductive performance are considered reliable indicators of habitat quality for a species. Such measures are, however, only available in a restricted number of sites, which prevents them from being used to quantify habitat quality across landscapes or regions. Alternatively, species presence records can be used along with environmental variables to build models that predict the distribution of species across larger spatial extents. Model predictions are often used for management purposes as they are assumed to describe the quality of the habitats to support a species. Yet, given that species are often present both in optimal and suboptimal areas, the use of data collected during the breeding season to build these models may potentially result in misleading predictions of habitat quality for the reproduction of the species, with potentially significant conservation consequences. In this study we analysed the relationship between fitness parameters informing on habitat quality for reproduction and predictions of species distribution models at multiple spatial scales using two independent sets of data. For 19 passerine bird species, we compared an indirect measure of reproductive performance (ratio of juveniles‐to‐adults) – obtained from Constant Effort Sites (CES) mist‐netting data in Catalonia – with the predictions of models based on bird presence records collected during the Catalan Breeding Bird Atlas (CBBA). A positive relationship between the predictions derived from species distribution models and the reproductive performance of the species was found for almost half of the species at one or more spatial scales. This result suggests that species distribution models may help to predict habitat quality for some species over some extents. However, caution is needed as this is not consistent for all species at all scales. Further work based on species‐ and scale‐specific approaches is now required to understand in which situations species distribution models provide predictions that are in line with reproductive performance.

11.
Aim: The study and prediction of species–environment relationships is currently mainly based on species distribution models. These purely correlative models neglect spatial population dynamics and assume that species distributions are in equilibrium with their environment. This causes biased estimates of species niches and handicaps forecasts of range dynamics under environmental change. Here we aim to develop an approach that statistically estimates process‐based models of range dynamics from data on species distributions and permits a more comprehensive quantification of forecast uncertainties. Innovation: We present an approach for the statistical estimation of process‐based dynamic range models (DRMs) that integrate Hutchinson's niche concept with spatial population dynamics. In a hierarchical Bayesian framework the environmental response of demographic rates, local population dynamics and dispersal are estimated conditional upon each other while accounting for various sources of uncertainty. The method thus: (1) jointly infers species niches and spatiotemporal population dynamics from occurrence and abundance data, and (2) provides fully probabilistic forecasts of future range dynamics under environmental change. In a simulation study, we investigate the performance of DRMs for a variety of scenarios that differ in both ecological dynamics and the data used for model estimation. Main conclusions: Our results demonstrate the importance of considering dynamic aspects in the collection and analysis of biodiversity data. In combination with informative data, the presented framework has the potential to markedly improve the quantification of ecological niches, the process‐based understanding of range dynamics and the forecasting of species responses to environmental change. It thereby strengthens links between biogeography, population biology and theoretical and applied ecology.

12.
Summary: Genomic instability, such as copy‐number losses and gains, occurs in many genetic diseases. Recent technology developments enable researchers to measure copy numbers at tens of thousands of markers simultaneously. In this article, we propose a nonparametric approach for detecting the locations of copy‐number changes and provide a measure of significance for each change point. The proposed test is based on seeking scale‐based changes in the sequence of copy numbers, which is ordered by the marker locations along the chromosome. The method leads to a natural way to estimate the null distribution for the test of a change point and adjusted p‐values for the significance of a change point using a step‐down maxT permutation algorithm to control the family‐wise error rate. A simulation study investigates the finite sample performance of the proposed method and compares it with a more standard sequential testing method. The method is illustrated using two real data sets.
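
The step‐down maxT permutation adjustment named in this abstract can be sketched generically. The version below takes already computed test statistics (one per candidate change point) and the same statistics recomputed on permuted data, and returns adjusted p‐values that control the family‐wise error rate. How the paper computes its scale‐based statistics is not stated in the abstract, so the inputs `observed` and `permuted` are hypothetical placeholders.

```python
def stepdown_maxt(observed, permuted):
    """Step-down maxT adjusted p-values.

    observed: list of m test statistics.
    permuted: list of B lists, each holding the m statistics from one permutation.
    """
    m = len(observed)
    # Rank hypotheses from most to least extreme observed statistic.
    order = sorted(range(m), key=lambda j: abs(observed[j]), reverse=True)

    counts = [0] * m
    for stats in permuted:
        running_max = 0.0
        # Walk from the least to the most significant hypothesis so that
        # running_max is the successive maximum over the remaining hypotheses.
        for rank in range(m - 1, -1, -1):
            j = order[rank]
            running_max = max(running_max, abs(stats[j]))
            if running_max >= abs(observed[j]):
                counts[rank] += 1

    # Convert counts to p-values and enforce monotonicity down the ranking.
    adjusted = [0.0] * m
    previous = 0.0
    for rank, j in enumerate(order):
        previous = max(previous, counts[rank] / len(permuted))
        adjusted[j] = previous
    return adjusted
```

In practice the number of permutations would be in the thousands; a tiny hand-checkable input is enough to confirm the successive-maximum logic.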

13.
Biotic interactions are known to affect the composition of species assemblages via several mechanisms, such as competition and facilitation. However, most spatial models of species richness do not explicitly consider inter‐specific interactions. Here, we test whether incorporating biotic interactions into high‐resolution models alters predictions of species richness as hypothesised. We included key biotic variables (cover of three dominant arctic‐alpine plant species) into two methodologically divergent species richness modelling frameworks – stacked species distribution models (SSDM) and macroecological models (MEM) – for three ecologically and evolutionarily distinct taxonomic groups (vascular plants, bryophytes and lichens). Predictions from models including biotic interactions were compared to the predictions of models based on climatic and abiotic data only. Including plant–plant interactions consistently and significantly lowered bias in species richness predictions and increased predictive power for independent evaluation data when compared to the conventional climatic and abiotic data based models. Improvements in predictions were constant irrespective of the modelling framework or taxonomic group used. The global biodiversity crisis necessitates accurate predictions of how changes in biotic and abiotic conditions will potentially affect species richness patterns. Here, we demonstrate that models of the spatial distribution of species richness can be improved by incorporating biotic interactions, and thus that these key predictor factors must be accounted for in biodiversity forecasts.

14.
Bayesian inference of phylogeny is unique among phylogenetic reconstruction methods in that it produces a posterior distribution of trees rather than a point estimate of the best tree. The most common way to summarize this distribution is to report the majority-rule consensus tree annotated with the marginal posterior probabilities of each partition. Reporting a single tree discards information contained in the full underlying distribution and reduces the Bayesian analysis to simply another method for finding a point estimate of the tree. Even when a point estimate of the phylogeny is desired, the majority-rule consensus tree is only one possible method, and there may be others that are more appropriate for the given data set and application. We present a method for summarizing the distribution of trees that is based on identifying agreement subtrees that are frequently present in the posterior distribution. This method provides fully resolved binary trees for subsets of taxa with high marginal posterior probability on the entire tree and includes additional information about the spread of the distribution.

15.
Behavioural research often produces data that have a complicated structure. For instance, data can represent repeated observations of the same individual and suffer from heteroscedasticity as well as other technical snags. The regression analysis of such data is often complicated by the fact that the observations (response variables) are mutually correlated. The correlation structure can be quite complex and might or might not be of direct interest to the user. In any case, one needs to take correlations into account (e.g. by means of random‐effect specification) in order to arrive at correct statistical inference (e.g. for construction of the appropriate test or confidence intervals). Over the last decade, such data have been more and more frequently analysed using repeated‐measures ANOVA and mixed‐effects models. Some researchers invoke the heavy machinery of mixed‐effects modelling to obtain the desired population‐level (marginal) inference, which can be achieved by using simpler tools – namely marginal models. This paper highlights marginal modelling (using generalized least squares [GLS] regression) as an alternative method. In various concrete situations, such marginal models can be based on fewer assumptions and directly generate estimates (population‐level parameters) which are of immediate interest to the behavioural researcher (such as population mean). Sometimes, they might be not only easier to interpret but also easier to specify than their competitors (e.g. mixed‐effects models). Using five examples from behavioural research, we demonstrate the use, advantages, limits and pitfalls of marginal and mixed‐effects models implemented within the functions of the ‘nlme’ package in R.

16.
Wang Z, Louis TA. Biometrics, 2004, 60(4): 884-891
Marginal models and conditional mixed-effects models are commonly used for clustered binary data. However, regression parameters and predictions in nonlinear mixed-effects models usually do not have a direct marginal interpretation, because the conditional functional form does not carry over to the margin. Because both marginal and conditional inferences are of interest, a unified approach is attractive. To this end, we investigate a parameterization of generalized linear mixed models with a structured random-intercept distribution that matches the conditional and marginal shapes. We model the marginal mean of response distribution and select the distribution of the random intercept to produce the match and also to model covariate-dependent random effects. We discuss the relation between this approach and some existing models and compare the approaches on two datasets.

17.
Summary: In this article, we propose a positive stable shared frailty Cox model for clustered failure time data where the frailty distribution varies with cluster‐level covariates. The proposed model accounts for covariate‐dependent intracluster correlation and permits both conditional and marginal inferences. We obtain marginal inference directly from a marginal model, then use a stratified Cox‐type pseudo‐partial likelihood approach to estimate the regression coefficient for the frailty parameter. The proposed estimators are consistent and asymptotically normal and a consistent estimator of the covariance matrix is provided. Simulation studies show that the proposed estimation procedure is appropriate for practical use with a realistic number of clusters. Finally, we present an application of the proposed method to kidney transplantation data from the Scientific Registry of Transplant Recipients.

18.
Ecological niche models, or species distribution models, have been widely used to identify potentially suitable areas for species in future climate change scenarios. However, there are inherent errors to these models due to their inability to evaluate species occurrence influenced by non‐climatic factors. With the intent to improve the modelling predictions for a bromeliad‐breeding treefrog (Phyllodytes melanomystax, Hylidae), we investigate how the climatic suitability of bromeliads influences the distribution model for the treefrog in the context of baseline and 2050 climate change scenarios. We used point occurrence data on the frog and the bromeliad (Vriesea procera, Bromeliaceae) to generate their predicted distributions based on baseline and 2050 climates. Using a consensus of five algorithms, we compared the accuracy of the models and the geographic predictions for the frog generated from two modelling procedures: (i) a climate‐only model for P. melanomystax and V. procera; and (ii) a climate‐biotic model for P. melanomystax, in which the climatic suitability of the bromeliad was jointly considered with the climatic variables. Both modelling approaches generated strong and similar predictive power for P. melanomystax, yet climate‐biotic modelling generated more concise predictions, particularly for the year 2050. Specifically, because the predicted area of the bromeliad overlaps with the predictions for the treefrog in the baseline climate, both modelling approaches produce reasonably similar predicted areas for the anuran. Alternatively, due to the predicted loss of northern climatically suitable areas for the bromeliad by 2050, only the climate‐biotic models provide evidence that northern populations of P. melanomystax will likely be negatively affected by 2050.

19.
In observational studies of a population with a dichotomous outcome, researchers usually report the treatment effect alone, although both baseline risk and treatment effect are needed to evaluate the significance of the treatment effect to the population. In this article, we study point and interval estimates, including a confidence region, of baseline risk and treatment effect based on a logistic model, where baseline risk is the risk of outcome of the population under control treatment while treatment effect is measured by the risk difference between outcomes of the population under active versus control treatments. Using the approximate normal distribution of the maximum‐likelihood (ML) estimate of the model parameters, we obtain an approximate joint distribution of the ML estimate of the baseline risk and the treatment effect. Using the approximate joint distribution, we obtain a point estimate and confidence region of the baseline risk and the treatment effect as well as a point estimate and confidence interval of the treatment effect when the ML estimate of the baseline risk falls into a specified range. These interval estimates reflect nonnormality of the joint distribution of the ML estimate of the baseline risk and the treatment effect. The method can be easily implemented by using any software that generates normally distributed random numbers. The method can also be used to obtain point and interval estimates of baseline risk and any other measure of treatment effect such as the risk ratio and the number needed to treat. The method can also be extended from the logistic model to other models such as the log‐linear model.
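
The general idea in this abstract, turning approximately normal logistic-regression coefficients into estimates of baseline risk and risk difference by drawing normal random numbers, can be sketched as follows. This is only an illustration under stated assumptions: a model with an intercept `b0` and a single treatment coefficient `b1`, a known 2x2 covariance matrix for their ML estimates, and a simple percentile interval. The paper's actual construction of confidence regions is not described in enough detail here to reproduce.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def risk_summary(b0, b1):
    """Baseline risk under control and risk difference (active minus control)."""
    p0 = expit(b0)
    p1 = expit(b0 + b1)
    return p0, p1 - p0


def simulate_risk_difference(b0, b1, var0, var1, cov01, draws=2000, seed=0):
    """Percentile 95% interval for the risk difference from bivariate normal draws."""
    rng = random.Random(seed)
    # Cholesky factor of the assumed 2x2 covariance matrix of (b0, b1).
    l11 = math.sqrt(var0)
    l21 = cov01 / l11
    l22 = math.sqrt(var1 - l21 ** 2)

    diffs = []
    for _ in range(draws):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draw_b0 = b0 + l11 * z0
        draw_b1 = b1 + l21 * z0 + l22 * z1
        diffs.append(risk_summary(draw_b0, draw_b1)[1])
    diffs.sort()
    return diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]
```

For instance, `risk_summary(0.0, math.log(3.0))` corresponds to a baseline risk of 0.5 and a risk difference of 0.25, i.e. a number needed to treat of 4. Because the risk difference is a nonlinear function of the coefficients, the simulated interval need not be symmetric around the point estimate.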

20.
The popularity of penalized regression in high‐dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high‐dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso‐penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood‐based model, such as logistic regression and Cox regression. Our approach is fast, flexible and can be used with a variety of penalty functions including lasso, elastic net, MCP, and MNet. We derive theoretical results under which the proposed method is valid, and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these assumptions are violated. Despite being conservative, we show that our method often offers more power to select causally important features than existing approaches. Finally, the practical utility of the method is demonstrated on gene expression datasets with binary and time‐to‐event outcomes.
