Similar Articles
20 similar articles found (search time: 437 ms)
1.
Joint modeling of various longitudinal sequences has received considerable attention in recent years. This paper proposes a so-called marginalized joint model for longitudinal continuous and repeated time-to-event outcomes on the one hand, and a marginalized joint model for bivariate repeated time-to-event outcomes on the other. The model has several appealing features. It flexibly allows for association among measurements of the same outcome at different occasions as well as among measurements on different outcomes recorded at the same time. The model also accommodates overdispersion, and the time-to-event outcomes are allowed to be censored. While the model builds upon the generalized linear mixed model framework, its parameters enjoy a direct marginal interpretation. All of these features have been considered before, but here we bring them together in a unified, flexible framework. The framework's properties are scrutinized using a simulation study. The models are applied to data from a chronic heart failure study and to a so-called comet assay, encountered in preclinical research. Perhaps surprisingly, the models can be fitted relatively easily using standard statistical software.

2.
Continuous-time multistate models are widely used for categorical response data, particularly in the modeling of chronic diseases. However, inference is difficult when the process is only observed at discrete time points, with no information about the times or types of events between observation times, unless a Markov assumption is made. This assumption can be limiting, as rates of transition between disease states might instead depend on the time since entry into the current state. Such a formulation results in a semi-Markov model. We show that the computational problems associated with fitting semi-Markov models to panel-observed data can be alleviated by considering a class of semi-Markov models with phase-type sojourn distributions, which allows methods for hidden Markov models to be applied. In addition, extensions to models where observed states are subject to classification error are given. The methodology is demonstrated on a dataset relating to development of bronchiolitis obliterans syndrome in post-lung-transplantation patients.

3.
Survival data consisting of independent sets of correlated failure times may arise in many situations. For example, we may take repeated observations of the failure time of interest from each patient, observe failure times on siblings, or consider the failure times of littermates in toxicological experiments. Because failure times taken on the same patient, on related family members, or from the same litter are likely correlated, use of the classical log-rank test in these situations can be quite misleading with respect to type I error. To avoid this concern, this paper develops two closed-form asymptotic summary tests that account for the intraclass correlation between the failure times within patients or units. One of these two tests includes the classical log-rank test as a special case when the intraclass correlation equals 0. Furthermore, to evaluate the finite-sample performance of the two tests developed here, we carry out Monte Carlo simulations and find that the tests perform well across the variety of situations considered.
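As context for the correlated-failure-time extension described above, the classical two-sample log-rank statistic (the special case recovered when the intraclass correlation equals 0) can be sketched as follows. This is an illustrative textbook implementation on toy data, not the paper's proposed tests.

```python
def logrank_stat(times1, events1, times2, events2):
    """Two-sample log-rank chi-square statistic for independent data.

    times*: observation times; events*: 1 = event observed, 0 = censored.
    """
    data = [(t, e, 0) for t, e in zip(times1, events1)] + \
           [(t, e, 1) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    U = 0.0  # sum of observed-minus-expected group-1 events
    V = 0.0  # sum of hypergeometric variances
    for t in event_times:
        n1 = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group 1
        n2 = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group 2
        d1 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d2 = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n1 + n2, d1 + d2
        U += d1 - d * n1 / n
        if n > 1:
            V += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    return U * U / V  # approximately chi-square with 1 df under H0
```

Identical samples give a statistic of 0, while well-separated samples give a large value; the intraclass-correlation-adjusted tests of the paper modify the variance term V.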

4.
The Cochran–Armitage (CA) linear trend test for proportions is often used for genotype-based analysis of candidate gene association. Depending on the underlying genetic mode of inheritance, the use of model-specific scores maximises the power. Commonly, the underlying genetic model, i.e. additive, dominant or recessive mode of inheritance, is a priori unknown. Association studies are commonly analysed using permutation tests, where both inference and identification of the underlying mode of inheritance are important. Especially interesting are tests for case–control studies defined by a maximum over a series of standardised CA tests, because such a procedure has power under all three genetic models. We reformulate the test problem and propose a conditional maximum test of score-specific linear-by-linear association tests. For maximum-type, sum and quadratic test statistics the asymptotic expectation and covariance can be derived in closed form and the limiting distribution is known. Both the limiting distribution and approximations of the exact conditional distribution can easily be computed using standard software packages. In addition to these technical advances, we extend the area of application to stratified designs, studies involving more than two groups and the simultaneous analysis of multiple loci by means of multiplicity-adjusted p-values for the underlying multiple CA trend tests. The new test is applied to reanalyse a study investigating genetic components of different subtypes of psoriasis. The result is a new and flexible inference tool for association studies, available both in theory and in practice, since readily available software packages can be used to implement the suggested test procedures.
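The standardised CA trend test and the maximum over the three model-specific score sets can be sketched as follows. This uses the familiar asymptotic z form with illustrative genotype counts; the paper's contribution is the conditional maximum test and its exact/limiting distribution, which this sketch does not implement.

```python
import math

def ca_trend_z(cases, controls, scores):
    """Standardised Cochran-Armitage trend z-statistic.

    cases/controls: genotype counts (aa, aA, AA); scores: per-genotype scores.
    """
    n = [c + s for c, s in zip(cases, controls)]
    N, R = sum(n), sum(cases)
    pbar = R / N
    num = sum(x * (r - ni * pbar) for x, r, ni in zip(scores, cases, n))
    sx = sum(x * ni for x, ni in zip(scores, n))
    sxx = sum(x * x * ni for x, ni in zip(scores, n))
    var = pbar * (1 - pbar) * (sxx - sx * sx / N)
    return num / math.sqrt(var)

def max3(cases, controls):
    """Maximum |z| over recessive, additive, and dominant score sets."""
    score_sets = [(0, 0, 1), (0, 1, 2), (0, 1, 1)]
    return max(abs(ca_trend_z(cases, controls, s)) for s in score_sets)
```

Because the underlying mode of inheritance is unknown, `max3` retains power under all three genetic models, at the cost of a nonstandard null distribution (the problem the conditional maximum test addresses).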

5.
The augmentation of categorical outcomes with underlying Gaussian variables in bivariate generalized mixed effects models has facilitated the joint modeling of continuous and binary response variables. These models typically assume that random effects and residual effects (co)variances are homogeneous across all clusters and subjects, respectively. Motivated by conflicting evidence about the association between performance outcomes in dairy production systems, we consider the situation where these (co)variance parameters may themselves be functions of systematic and/or random effects. We present a hierarchical Bayesian extension of bivariate generalized linear models whereby functions of the (co)variance matrices are specified as linear combinations of fixed and random effects following a square-root-free Cholesky reparameterization that ensures necessary positive semidefinite constraints. We test the proposed model by simulation and apply it to the analysis of a dairy cattle data set in which the random herd-level and residual cow-level effects (co)variances between a continuous production trait and binary reproduction trait are modeled as functions of fixed management effects and random cluster effects.
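The square-root-free Cholesky reparameterization mentioned above amounts to writing Sigma = L D L^T with L unit lower-triangular and D diagonal with positive entries, so that any unconstrained parameter vector maps to a valid covariance matrix. The sketch below uses an exp() link for the diagonal as an illustrative choice; the parameter layout is hypothetical, not the paper's exact specification.

```python
import numpy as np

def cov_from_unconstrained(theta, dim):
    """Map an unconstrained vector to a positive definite covariance matrix.

    Sigma = L D L^T, where L is unit lower-triangular with free off-diagonal
    entries and D is diagonal with exp() entries, so Sigma is positive
    definite for any real theta of length dim*(dim-1)/2 + dim.
    """
    L = np.eye(dim)
    off = np.tril_indices(dim, k=-1)
    n_off = len(off[0])
    L[off] = theta[:n_off]                          # unconstrained off-diagonals
    D = np.diag(np.exp(theta[n_off:n_off + dim]))   # positive diagonal entries
    return L @ D @ L.T
```

Modeling the elements of theta (e.g. the log-diagonal of D) as linear combinations of fixed and random effects, as the abstract describes, then preserves positive definiteness automatically.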

6.
In model building and model evaluation, cross-validation is a frequently used resampling method. Unfortunately, this method can be quite time consuming. In this article, we discuss an approximation method that is much faster and can be used in generalized linear models and in the Cox proportional hazards model with a ridge penalty term. Our approximation method is based on a Taylor expansion around the estimate of the full model. In this way, all cross-validated estimates are approximated without refitting the model. The tuning parameter can then be chosen based on these approximations and optimized in less time. The method is most accurate when approximating leave-one-out cross-validation results for large data sets, which would otherwise be the most computationally demanding situation. To demonstrate the method's performance, it is applied to several microarray data sets. An R package, penalized, which implements the method, is available on CRAN.
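For the linear ridge special case, leave-one-out values need no refitting at all: ridge is a linear smoother, so the LOO residual is exactly e_i / (1 - h_ii) with h_ii from the ridge hat matrix. The sketch below verifies this shortcut against brute-force refitting; the paper's Taylor-expansion method extends the same idea approximately to GLMs and the Cox model.

```python
import numpy as np

def ridge_coef(X, y, lam):
    """Ridge coefficients (no intercept, penalty lam on all coefficients)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def loo_residuals_shortcut(X, y, lam):
    """Exact LOO residuals for ridge without refitting: e_i / (1 - h_ii)."""
    p = X.shape[1]
    beta = ridge_coef(X, y, lam)
    # h_ii = x_i^T (X^T X + lam I)^{-1} x_i, the ridge leverage
    h = np.einsum('ij,ji->i', X, np.linalg.solve(X.T @ X + lam * np.eye(p), X.T))
    return (y - X @ beta) / (1 - h)

def loo_residuals_brute(X, y, lam):
    """Reference implementation: refit with each observation held out."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        out[i] = y[i] - X[i] @ ridge_coef(X[keep], y[keep], lam)
    return out
```

The shortcut follows from the Sherman-Morrison identity and makes the cross-validated choice of the tuning parameter essentially free in the linear case.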

7.
Designing an effective conservation strategy requires understanding where rare species are located. Because rare species can be difficult to find, ecologists often identify other species called conservation surrogates that can help inform the distribution of rare species. Species distribution models typically rely on environmental data when predicting the occurrence of species, neglecting the effect of species' co-occurrences and biotic interactions. Here, we present a new approach that uses Bayesian networks to improve predictions by modeling environmental co-responses among species. For species from a European peat bog community, our approach consistently performs better than single-species models and better than conventional multi-species approaches that include the presence of nontarget species as additional independent variables in regression models. Our approach performs particularly well with rare species and when calibration data are limited. Furthermore, we identify a group of "predictor species" that are relatively common, insensitive to the presence of other species, and can be used to improve occurrence predictions of rare species. Predictor species are distinct from other categories of conservation surrogates such as umbrella or indicator species, which motivates focused data collection of predictor species to enhance conservation practices.

8.
We analyze a real data set pertaining to reindeer fecal pellet-group counts obtained from a survey conducted in a forest area in northern Sweden. In the data set, over 70% of counts are zeros, and there is high spatial correlation. We use conditionally autoregressive random effects for modeling of spatial correlation in a Poisson generalized linear mixed model (GLMM), quasi-Poisson hierarchical generalized linear model (HGLM), zero-inflated Poisson (ZIP), and hurdle models. The quasi-Poisson HGLM allows for both under- and overdispersion with excessive zeros, while the ZIP and hurdle models allow only for overdispersion. In analyzing the real data set, we see that the quasi-Poisson HGLMs can perform better than the other commonly used models, for example, ordinary Poisson HGLMs, spatial ZIP, and spatial hurdle models, and that the underdispersed Poisson HGLMs with spatial correlation fit the reindeer data best. We develop R code for fitting these models using a unified algorithm for the HGLMs. Spatial count responses with an extremely high proportion of zeros and underdispersion can be successfully modeled using the quasi-Poisson HGLM with spatial random effects.
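The core of a ZIP model can be illustrated with a simple moment-style fit that matches the sample mean and the zero fraction, using P(Y=0) = pi + (1-pi)exp(-lam) and E[Y] = (1-pi)lam. This is a minimal sketch for intuition only; the paper fits spatial HGLMs, not this estimator, and the function name is illustrative.

```python
import math

def fit_zip(mean, zero_frac, tol=1e-10):
    """Fit zero-inflated Poisson (pi, lam) by matching the mean and P(Y=0).

    Requires excess zeros relative to Poisson: zero_frac > exp(-mean).
    Substituting pi = 1 - mean/lam into P(Y=0) gives a function of lam that
    increases from exp(-mean) toward 1, so bisection on lam > mean works.
    """
    def implied_p0(lam):
        pi = 1.0 - mean / lam
        return pi + (1.0 - pi) * math.exp(-lam)

    lo, hi = mean * (1.0 + 1e-9), mean + 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if implied_p0(mid) < zero_frac:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 1.0 - mean / lam, lam
```

For the reindeer-type setting with over 70% zeros, such excess-zero structure is exactly what the ZIP and hurdle components capture, while the quasi-Poisson HGLM additionally handles underdispersion.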

9.
Ecological data sets often record the abundance of species, together with a set of explanatory variables. Multivariate statistical methods are optimal to analyze such data and are thus frequently used in ecology for exploration, visualization, and inference. Most approaches are based on pairwise distance matrices instead of the sites-by-species matrix, which stands in stark contrast to univariate statistics, where data models, assuming specific distributions, are the norm. However, through advances in statistical theory and computational power, models for multivariate data have gained traction. Systematic simulation-based performance evaluations of these methods are important as guides for practitioners but still lacking. Here, we compare two model-based methods, multivariate generalized linear models (MvGLMs) and constrained quadratic ordination (CQO), with two distance-based methods, distance-based redundancy analysis (dbRDA) and canonical correspondence analysis (CCA). We studied the performance of the methods to discriminate between causal variables and noise variables for 190 simulated data sets covering different sample sizes and data distributions. MvGLM and dbRDA differentiated accurately between causal and noise variables. The former had the lowest false-positive rate (0.008), while the latter had the lowest false-negative rate (0.027). CQO and CCA had the highest false-negative rate (0.291) and false-positive rate (0.256), respectively, where these error rates were typically high for data sets with linear responses. Our study shows that both model- and distance-based methods have their place in the ecologist's statistical toolbox. MvGLM and dbRDA are reliable for analyzing species–environment relations, whereas both CQO and CCA exhibited considerable flaws, especially with linear environmental gradients.

10.
Zero-truncated data arise in various disciplines where counts are observed but the zero count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, due to its nonstandard form, it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao–Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows for applying readily available software. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
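For the intercept-only zero-truncated Poisson case, the maximum likelihood estimate has a simple characterization: the score equation reduces to lam / (1 - exp(-lam)) = mean of the observed counts. The bisection sketch below illustrates that nonstandard-but-tractable form; it is context for the abstract, not the weighted partial likelihood method itself.

```python
import math

def ztp_mle(ys, tol=1e-12):
    """MLE of lambda for zero-truncated Poisson counts (all ys >= 1).

    The zero-truncated mean lam / (1 - exp(-lam)) increases from 1
    (as lam -> 0) to infinity, so bisection works whenever mean(ys) > 1.
    """
    ybar = sum(ys) / len(ys)
    mean_ztp = lambda lam: lam / (1.0 - math.exp(-lam))
    lo, hi = 1e-10, 2.0 * ybar  # mean_ztp(2*ybar) > 2*ybar > ybar
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_ztp(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Note that the estimate is always below the sample mean, since truncating the zeros inflates the observed mean above lambda.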

11.
Birth-and-death processes are widely used to model the development of biological populations. Although they are relatively simple models, their parameters can be challenging to estimate, as the likelihood can become numerically unstable when data arise from the most common sampling schemes, such as annual population censuses. A further difficulty arises when the discrete observations are not equi-spaced, for example, when census data are unavailable for some years. We present two approaches to estimating the birth, death, and growth rates of a discretely observed linear birth-and-death process: via an embedded Galton-Watson process and by maximizing a saddlepoint approximation to the likelihood. We study asymptotic properties of the estimators, compare them on numerical examples, and apply the methodology to data on monitored populations.
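The embedded Galton-Watson idea admits a very simple estimator for equispaced censuses: the Harris-type ratio estimate of the mean offspring number m, whose logarithm estimates the per-interval growth rate (lambda - mu for a linear birth-and-death process observed at unit intervals). This is a standard sketch of that first approach, not the paper's full methodology.

```python
import math

def gw_growth_rate(counts):
    """Harris ratio estimate of the mean offspring number m from census
    counts Z_0, Z_1, ..., Z_T of a discretely observed branching process,
    and the implied per-interval growth rate log(m)."""
    num = sum(counts[1:])    # total "offspring" across generations
    den = sum(counts[:-1])   # total "parents" across generations
    m_hat = num / den
    return m_hat, math.log(m_hat)
```

For example, censuses that double each year give m_hat = 2 and growth rate log 2; the saddlepoint approach of the paper trades this simplicity for better behavior with irregular observation times.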

12.
13.
Errors-in-variables models in high-dimensional settings pose two challenges in application. First, the number of observed covariates is larger than the sample size, while only a small number of covariates are true predictors under an assumption of model sparsity. Second, the presence of measurement error can result in severely biased parameter estimates, and also affects the ability of penalized methods such as the lasso to recover the true sparsity pattern. A new estimation procedure called SIMulation-SELection-EXtrapolation (SIMSELEX) is proposed. This procedure makes double use of lasso methodology. First, the lasso is used to estimate sparse solutions in the simulation step, after which a group lasso is implemented to do variable selection. The SIMSELEX estimator is shown to perform well in variable selection, and has significantly lower estimation error than naive estimators that ignore measurement error. SIMSELEX can be applied in a variety of errors-in-variables settings, including linear models, generalized linear models, and Cox survival models. It is furthermore shown in the Supporting Information how SIMSELEX can be applied to spline-based regression models. A simulation study is conducted to compare the SIMSELEX estimators to existing methods in the linear and logistic model settings, and to evaluate performance compared to naive methods in the Cox and spline models. Finally, the method is used to analyze a microarray dataset that contains gene expression measurements of favorable histology Wilms tumors.
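The simulation-extrapolation idea underlying SIMSELEX can be illustrated in its simplest (lasso-free, one-covariate SIMEX) form: add extra measurement error at increasing multiples lambda of the known error variance, track how the naive slope decays, and extrapolate a quadratic back to lambda = -1, where no measurement error would remain. All settings below are illustrative choices, not the paper's configuration.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=200, seed=1):
    """Quadratic-extrapolation SIMEX estimate of a simple regression slope
    when the covariate w = x + u carries known measurement-error sd sigma_u."""
    rng = np.random.default_rng(seed)
    slope = lambda a, b: np.polyfit(a, b, 1)[0]
    lams, means = [0.0] + list(lambdas), []
    for lam in lams:
        if lam == 0.0:
            means.append(slope(w, y))  # the naive (attenuated) estimate
        else:
            # average slope over B replicates with extra noise of variance lam*sigma_u^2
            reps = [slope(w + rng.normal(0.0, np.sqrt(lam) * sigma_u, len(w)), y)
                    for _ in range(B)]
            means.append(np.mean(reps))
    coefs = np.polyfit(lams, means, 2)     # quadratic trend in lambda
    return float(np.polyval(coefs, -1.0))  # extrapolate to lambda = -1
```

Because attenuation worsens smoothly as error variance grows, the extrapolated value recovers much of the bias that the naive fit suffers; SIMSELEX replaces the per-lambda refits with lasso fits and adds a group-lasso selection step.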

14.
15.
This paper provides asymptotic simultaneous confidence intervals for a success probability and intraclass correlation of the beta-binomial model, based on the maximum likelihood estimator approach. The coverage probabilities of those intervals are evaluated. An application to screening mammography is presented as an example. The individual and simultaneous confidence intervals for sensitivity and specificity and the corresponding intraclass correlations are investigated. Two additional examples using influenza data and sex ratio data among sibships are also considered, where the individual and simultaneous confidence intervals are provided.

16.
Data in medical sciences often have a hierarchical structure with lower level units (e.g. children) nested in higher level units (e.g. departments). Several specific but frequently studied settings, mainly in longitudinal and family research, involve a large number of units that tend to be quite small, with units containing only one element referred to as singletons. Regardless of sparseness, hierarchical data should be analyzed with appropriate methodology such as linear mixed models. Using a simulation study based on the structure of a data example on Ceftriaxone consumption in hospitalized children, we assess the impact of an increasing proportion of singletons (0–95%), in data with low, medium, or high intracluster correlation, on the stability of linear mixed model parameter estimates, confidence interval coverage, and F-test performance. Some techniques that are frequently used in the presence of singletons include ignoring clustering, dropping the singletons from the analysis, and grouping the singletons into an artificial unit. We show that both the fixed and random effects estimates and their standard errors are stable in the presence of an increasing proportion of singletons. We demonstrate that ignoring clustering and dropping singletons should be avoided, as they come with biased standard error estimates. Grouping the singletons into an artificial unit might be considered, although the linear mixed model performs better even when the proportion of singletons is high. We conclude that the linear mixed model is stable in the presence of singletons when both lower- and higher-level sample sizes are fixed. In this setting, the use of remedial measures, such as ignoring clustering and grouping or removing singletons, should be dissuaded.

17.
Aim: Models relating species distributions to climate or habitat are widely used to predict the effects of global change on biodiversity. Most such approaches assume that climate governs coarse-scale species ranges, whereas habitat limits fine-scale distributions. We tested the influence of topoclimate and land cover on butterfly distributions and abundance in a mountain range, where climate may vary as markedly at a fine scale as land cover.
Location: Sierra de Guadarrama (Spain, southern Europe).
Methods: We sampled the butterfly fauna of 180 locations (89 in 2004, 91 in 2005) in a 10,800 km2 region, and derived generalized linear models (GLMs) for species occurrence and abundance based on topoclimatic (elevation and insolation) or habitat (land cover, geology and hydrology) variables sampled at 100-m resolution using GIS. Models for each year were tested against independent data from the alternate year, using the area under the receiver operating characteristic curve (AUC) for distribution or Spearman's rank correlation coefficient (rs) for abundance.
Results: In independent model tests, 74% of occurrence models achieved AUCs of > 0.7, and 85% of abundance models were significantly related to observed abundance. Topoclimatic models outperformed models based purely on land cover in 72% of occurrence models and 66% of abundance models. Including both types of variables often explained most variation in model calibration, but did not significantly improve model cross-validation relative to topoclimatic models. Hierarchical partitioning analysis confirmed the overriding effect of topoclimatic factors on species distributions, with the exception of several species for which the importance of land cover was confirmed.
Main conclusions: Topoclimatic factors may dominate fine-resolution species distributions in mountain ranges where climate conditions vary markedly over short distances and large areas of natural habitat remain. Climate change is likely to be a key driver of species distributions in such systems and could have important effects on biodiversity. However, continued habitat protection may be vital to facilitate range shifts in response to climate change.

18.
Modeling individual heterogeneity in capture probabilities has been one of the most challenging tasks in capture–recapture studies. Heterogeneity in capture probabilities can be modeled as a function of individual covariates, but the correlation structure among capture occasions should be taken into account. We propose generalized estimating equation (GEE) and generalized linear mixed model (GLMM) approaches to estimate capture probabilities and population size for closed-population capture–recapture models. An example is used for an illustrative application and for comparison with currently used methodology. A simulation study is also conducted to show the performance of the estimation procedures. Our simulation results show that the proposed quasi-likelihood-based GEE approach provides lower SE than partial likelihood based on either generalized linear model (GLM) or GLMM approaches for estimating population size in a closed capture–recapture experiment. Estimator performance is good if a large proportion of individuals are captured. For cases where only a small proportion of individuals are captured, the estimates become unstable, but the GEE approach outperforms the other methods.

19.
The popularity of penalized regression in high-dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high-dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso-penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that can be applied to any penalized likelihood-based model, such as logistic regression and Cox regression. Our approach is fast, flexible and can be used with a variety of penalty functions including lasso, elastic net, MCP, and MNet. We derive theoretical results under which the proposed method is valid, and use simulation studies to demonstrate that the approach is reasonably robust, albeit slightly conservative, when these assumptions are violated. Despite being conservative, we show that our method often offers more power to select causally important features than existing approaches. Finally, the practical utility of the method is demonstrated on gene expression datasets with binary and time-to-event outcomes.

20.
Longitudinal data are common in clinical trials and observational studies, where missing outcomes due to dropouts are frequently encountered. In this context, under the assumption of missing at random, the weighted generalized estimating equation (WGEE) approach is widely adopted for marginal analysis. Model selection for the marginal mean regression is a crucial aspect of data analysis, and identifying an appropriate correlation structure for model fitting may also be of interest and importance. However, existing information criteria for model selection in WGEE have limitations, such as separate criteria for the selection of marginal mean and correlation structures and unsatisfactory selection performance in small-sample setups. In particular, few studies have developed joint information criteria for selection of both marginal mean and correlation structures. In this work, by embedding empirical likelihood into the WGEE framework, we propose two innovative information criteria, named the joint empirical Akaike information criterion and the joint empirical Bayesian information criterion, which can simultaneously select the variables for marginal mean regression and the correlation structure. Through extensive simulation studies, these empirical-likelihood-based criteria exhibit robustness, flexibility, and superior performance compared with other criteria, including the weighted quasi-likelihood under the independence model criterion, the missing longitudinal information criterion, and the joint longitudinal information criterion. In addition, we provide a theoretical justification of our proposed criteria and present two real data examples for further illustration.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号