Similar Literature
20 similar documents retrieved.
1.
He W, Lawless JF. Biometrics 2003, 59(4):837-848
This article presents methodology for multivariate proportional hazards (PH) regression models. The methods employ flexible piecewise-constant or spline specifications for the baseline hazard functions in either marginal or conditional PH models, along with assumptions about the association among lifetimes. Because the models are parametric, ordinary maximum likelihood can be applied; unlike existing semiparametric methods, it deals easily with data features such as interval censoring or sequentially observed lifetimes. The bivariate Clayton model (Clayton, 1978, Biometrika 65, 141-151) is used to illustrate the approach. Because a parametric assumption about association is made, efficiency and robustness comparisons are made between estimation based on the bivariate Clayton model and "working independence" methods that specify only the marginal distribution of each lifetime variable.

2.
Suppose that, having established a marginal total effect of a point exposure on a time-to-event outcome, an investigator wishes to decompose this effect into its direct and indirect pathways (natural direct and indirect effects), mediated by a variable known to occur after the exposure and prior to the outcome. This paper proposes a theory of estimation of natural direct and indirect effects in two important semiparametric models for a failure time outcome. The underlying survival model for the marginal total effect, and thus for the direct and indirect effects, can be either a marginal structural Cox proportional hazards model or a marginal structural additive hazards model. The proposed theory delivers new estimators for mediation analysis in each of these models, with appealing robustness properties. Specifically, in order to guarantee ignorability with respect to the exposure and mediator variables, the approach, which is multiply robust, allows the investigator to use several flexible working models to adjust for confounding by a large number of pre-exposure variables. Multiple robustness is appealing because it requires only a subset of the working models to be correct for consistency; furthermore, the analyst need not know which subset is in fact correct in order to report valid inferences. Finally, a novel semiparametric sensitivity analysis technique is developed for each model to assess the impact on inference of a violation of the assumption of ignorability of the mediator.

3.
Estimation of population size with a missing zero-class is an important problem encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by maximum likelihood and estimating the population size from this fit is a widely used approach. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable to count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and then using a Horvitz-Thompson estimator of population size. This works well when the data follow the hypothesized homogeneous Poisson model; however, when the true distribution deviates from the hypothesized model, the population size is underestimated. In search of a more robust estimator, we focused on three models that use the clusters with exactly one case, exactly two cases, and exactly three cases, respectively, to estimate the probability of the zero-class, so that data collected on all clusters enter the Horvitz-Thompson estimator of population size. The loss in efficiency associated with the gain in robustness was examined in a simulation study. As a trade-off between the two, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class is preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice that weighs the three estimates, robustness, and the loss in efficiency.
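To make the construction concrete, here is a minimal Python sketch of the baseline (unclustered) version: a zero-truncated Poisson fitted by maximum likelihood, with the estimated zero-class probability plugged into a Horvitz-Thompson estimator of population size. The counts are invented for illustration, and the clustered extension and the robust one/two/three-case variants are not shown.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ztp_negloglik(lam, counts):
    """Negative log-likelihood of a zero-truncated Poisson."""
    n = len(counts)
    log_fact = np.sum([np.sum(np.log(np.arange(1, c + 1))) for c in counts])
    return -(np.sum(counts) * np.log(lam) - n * lam
             - n * np.log1p(-np.exp(-lam)) - log_fact)

# Observed counts (the zero-class is unobservable by design).
counts = np.array([1, 1, 1, 2, 1, 3, 2, 1, 1, 4, 2, 1])

res = minimize_scalar(ztp_negloglik, bounds=(1e-6, 50.0),
                      args=(counts,), method="bounded")
lam_hat = res.x
p0_hat = np.exp(-lam_hat)          # estimated probability of the zero-class
n_obs = len(counts)
N_hat = n_obs / (1.0 - p0_hat)     # Horvitz-Thompson estimator of population size
print(f"lambda = {lam_hat:.3f}, P(zero) = {p0_hat:.3f}, N = {N_hat:.1f}")
```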

4.
We provide methods that can be used to obtain more accurate environmental exposure assessment. In particular, we propose two modeling approaches to combine monitoring data at point level with numerical model output at grid-cell level, yielding improved prediction of ambient exposure at point level. Extending our earlier downscaler model (Berrocal, V. J., Gelfand, A. E., and Holland, D. M. (2010b). A spatio-temporal downscaler for outputs from numerical models. Journal of Agricultural, Biological and Environmental Statistics 15, 176-197), these new models address two potential concerns with the model output. The first recognizes that there may be useful information in the outputs for grid cells that neighbor the one containing a given location. The second acknowledges potential spatial misalignment between a station and its putatively associated grid cell. The first model is a Gaussian Markov random field (GMRF) smoothed downscaler that relates monitoring-station data and computer-model output via a latent GMRF linked to both sources of data. The second model is a smoothed downscaler with spatially varying random weights, defined through a latent Gaussian process and an exponential kernel function, that yields at each site a new variable on which the monitoring-station data are regressed with a spatial linear model. We applied both methods to daily ozone concentration data for the Eastern US during the summer months of June, July, and August 2001, obtaining, respectively, a 5% and a 15% gain in overall predictive mean squared error over our earlier downscaler model (Berrocal et al., 2010b). Perhaps more importantly, the predictive gain is greater at hold-out sites that are far from monitoring sites.
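The second model's central idea, constructing a new covariate at each site as a kernel-weighted combination of neighboring grid-cell outputs and then regressing station data on it, can be sketched as follows. This is a deliberately simplified illustration: the weights here are deterministic, whereas the paper defines them through a latent Gaussian process, and plain least squares stands in for the spatial linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: numerical-model output on a coarse grid, stations at points.
grid_centers = np.array([(i + 0.5, j + 0.5) for i in range(10) for j in range(10)])
grid_output = rng.normal(40, 8, size=len(grid_centers))   # e.g. modeled ozone
stations = rng.uniform(0, 10, size=(25, 2))

def smoothed_covariate(s, centers, output, phi=1.5):
    """Exponential-kernel weighted average of neighboring grid-cell outputs;
    the paper's weights are random (latent Gaussian process), these are fixed."""
    d = np.linalg.norm(centers - s, axis=1)
    w = np.exp(-d / phi)
    w /= w.sum()
    return w @ output

x_tilde = np.array([smoothed_covariate(s, grid_centers, grid_output)
                    for s in stations])

# Station observations, then an ordinary regression on the smoothed covariate.
y = 5.0 + 0.9 * x_tilde + rng.normal(0, 2, size=len(stations))
slope, intercept = np.polyfit(x_tilde, y, 1)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```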

5.
Pan W, Zeng D. Biometrics 2011, 67(3):996-1006
We study the estimation of mean medical cost when censoring is dependent and a large amount of auxiliary information is present. Under a missing-at-random assumption, we propose semiparametric working models to obtain low-dimensional summary scores. An estimator for the mean total cost can then be derived nonparametrically, conditional on the summary scores. We show that when either the two working models for the cost-survival process or the model for the censoring distribution is correct, the estimator is consistent and asymptotically normal. Small-sample performance of the proposed method is evaluated via simulation studies. Finally, our approach is applied to a real data set in health economics.

6.
Marginalized kernels for biological sequences
MOTIVATION: Kernel methods such as support vector machines require a kernel function between objects to be defined a priori. Several approaches derive kernels from probability distributions, e.g., the Fisher kernel, but a general methodology for designing a kernel is not fully developed. RESULTS: We propose a principled way of designing a kernel when objects are generated from latent variable models (e.g., HMMs). First, a joint kernel is designed for the complete data, which include both visible and hidden variables. A marginalized kernel for the visible data is then obtained by taking the expectation with respect to the hidden variables. We show that the Fisher kernel is a special case of marginalized kernels, which gives another viewpoint on Fisher kernel theory. Although the approach applies to any object, we derive several marginalized kernels that are particularly useful for biological sequences (e.g., DNA and proteins). The effectiveness of marginalized kernels is illustrated in the task of classifying bacterial gyrase subunit B (gyrB) amino acid sequences.
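The following sketch illustrates the marginalization step for one simple case, a count kernel on (symbol, hidden-state) pairs: the joint kernel counts co-occurrences of visible symbols and hidden states, and taking expectations under the posteriors of the hidden variables gives a kernel on visible sequences alone. The posteriors below are random stand-ins; in practice they would come from the forward-backward algorithm of an HMM.

```python
import numpy as np

def expected_counts(seq, posteriors, n_symbols, n_states):
    """Expected joint (symbol, hidden-state) counts: posteriors[t, k] is
    P(state k | sequence) at position t of the sequence."""
    gamma = np.zeros((n_symbols, n_states))
    for t, a in enumerate(seq):
        gamma[a] += posteriors[t]
    return gamma / len(seq)

def marginalized_count_kernel(seq1, post1, seq2, post2,
                              n_symbols=4, n_states=3):
    """Joint count kernel on (visible, hidden) pairs, marginalized over the
    hidden states with the posterior probabilities: an inner product of
    expected count matrices."""
    g1 = expected_counts(seq1, post1, n_symbols, n_states)
    g2 = expected_counts(seq2, post2, n_symbols, n_states)
    return float(np.sum(g1 * g2))

# Tiny made-up example: DNA sequences coded 0..3 with stand-in posteriors.
rng = np.random.default_rng(1)
s1, s2 = rng.integers(0, 4, 30), rng.integers(0, 4, 40)
p1 = rng.dirichlet(np.ones(3), size=30)   # would come from forward-backward
p2 = rng.dirichlet(np.ones(3), size=40)
print(marginalized_count_kernel(s1, p1, s2, p2))
```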

7.
Ant clustering algorithms are a robust and flexible tool for clustering data and have produced promising results. This paper introduces two improvements that can be incorporated into any ant clustering algorithm: kernel-function similarity weights and a similarity memory model replacement scheme. A kernel function weights objects within an ant's neighborhood according to object distance and provides an alternate interpretation of the similarity of objects in that neighborhood. Ants can hill-climb the kernel gradients as they look for a suitable place to drop a carried object. The similarity memory model equips ants with a small memory consisting of a sampling of the current clustering space. We test several kernel functions and memory replacement schemes on the Iris, Wisconsin Breast Cancer, and Lincoln Lab network-intrusion datasets. Compared to a basic ant clustering algorithm, kernel functions and the similarity memory model increase clustering speed and cluster quality, especially for datasets with an unbalanced class distribution, such as network intrusion.
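A hedged sketch of the kernel-weighting idea, grafted onto the standard Lumer-Faieta-style neighborhood function used by basic ant clustering; the paper's exact kernels and parameters are not reproduced here, and the constants below are illustrative.

```python
import numpy as np

def kernel_neighborhood_similarity(obj, neighbors, alpha=0.5, bandwidth=1.0):
    """Kernel-weighted neighborhood function: objects close to the carried
    object in feature space contribute more to perceived similarity."""
    if len(neighbors) == 0:
        return 0.0
    d = np.linalg.norm(neighbors - obj, axis=1)
    w = np.exp(-(d ** 2) / (2 * bandwidth ** 2))   # Gaussian kernel weights
    sim = np.mean(w * (1.0 - d / alpha))
    return max(0.0, sim)

def drop_probability(f, k2=0.3):
    """Classic ant-clustering drop rule: high neighborhood similarity -> drop."""
    return (f / (k2 + f)) ** 2

rng = np.random.default_rng(2)
carried = rng.normal(0, 1, size=4)
patch = rng.normal(0, 1, size=(8, 4))   # objects in the ant's local neighborhood
f = kernel_neighborhood_similarity(carried, patch)
print(f"f = {f:.3f}, P(drop) = {drop_probability(f):.3f}")
```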

8.
Modeling of developmental toxicity studies often requires simple parametric analyses of the dose-response relationship between exposure and the probability of a birth defect, but poses challenges because of the nonstandard distribution of birth defects at a fixed exposure level. This article is motivated by two such experiments in which the distribution of the outcome variable challenges both the standard logistic model with binomial response and its parametric multistage elaborations. We approach the analysis using a Bayesian semiparametric model tailored specifically to developmental toxicology studies. It combines parametric dose-response relationships with a flexible nonparametric specification of the response distribution, obtained via a product of Dirichlet process mixtures approach (PDPM). Our formulation achieves three goals: (1) the distribution of the response is modeled in a general way, (2) the degree to which the distribution of the response adapts nonparametrically to the observations is driven by the data, and (3) the marginal posterior distribution of the parameters of interest is available in closed form. The logistic regression model, as well as many of its extensions such as the beta-binomial model and finite mixture models, arises as a special case. In the context of the two motivating examples and a simulated example, we provide model comparisons, illustrate overdispersion diagnostics that can assist model specification, show how to derive posterior distributions of the effective-dose parameters and predictive distributions of the response, and discuss the sensitivity of the results to the choice of prior distribution.

9.
10.
Comparative structure models are available for two orders of magnitude more protein sequences than are experimentally determined structures. These models, however, suffer from two limitations that experimentally determined structures do not: they frequently contain significant errors, and their accuracy cannot be readily assessed. We address the latter limitation by developing a protocol optimized specifically for predicting the Cα root-mean-squared deviation (RMSD) and native overlap (NO3.5Å) errors of a model in the absence of its native structure. In contrast to most traditional assessment scores, which merely predict that one model is more accurate than others, this approach quantifies the error in an absolute sense, thus helping to determine whether the model is suitable for its intended applications. The assessment relies on a model-specific scoring function constructed by a support vector machine. This regression optimizes the weights of up to nine features, including various sequence-similarity measures and statistical potentials, extracted from a tailored training set of models unique to the model being assessed: if possible, we use similarly sized models with the same fold; otherwise, we use similarly sized models with the same secondary-structure composition. This protocol predicts the RMSD and NO3.5Å errors for a diverse set of 580,317 comparative models of 6,174 sequences with correlation coefficients (r) of 0.84 and 0.86, respectively, to the actual errors. This scoring function achieves the best correlation among 13 other tested assessment criteria, whose correlations ranged from 0.35 to 0.71.
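The regression step can be sketched with an off-the-shelf support vector machine; the features below are synthetic stand-ins for the paper's sequence-similarity measures and statistical potentials, and the tailored fold-specific training-set construction is omitted.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(3)

# Synthetic stand-ins for the paper's features (e.g. target-template sequence
# identity, statistical-potential z-scores); real values would come from a
# training set of models with the same fold or secondary-structure composition.
n = 500
X = rng.normal(size=(n, 4))
rmsd = 3.0 - 1.2 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 0.5, n)  # toy target

# Support vector regression from features to the absolute model error.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.2))
model.fit(X[:400], rmsd[:400])
pred = model.predict(X[400:])
r = np.corrcoef(pred, rmsd[400:])[0, 1]
print(f"correlation on held-out models: r = {r:.2f}")
```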

11.
Depressive state has been reported to be significantly associated with higher-level functional capacity among community-dwelling elderly people. However, few studies have investigated this association among people with long-term care requirements. We aimed to investigate the association between depressive state and higher-level functional capacity, and to obtain marginal odds ratios using propensity score analyses, in people with long-term care requirements. We conducted a cross-sectional study of community-dwelling participants aged ≥65 years (n = 545) who used outpatient care services for long-term preventive care. We measured higher-level functional capacity, depressive state, and possible confounders. We then estimated the marginal odds ratios (i.e., the change in odds of impaired higher-level functional capacity if all versus no participants were exposed to depressive state) using logistic models fit as generalized linear models with inverse probability of treatment weighting (IPTW) based on the propensity score, with design-based standard errors. Depressive state was the exposure variable and higher-level functional capacity the outcome variable. All absolute standardized differences after IPTW were <10%, indicating negligible differences in the mean or prevalence of the covariates between the non-depressive and depressive groups. The marginal odds ratios were 2.17 (95% CI: 1.13-4.19) for men and 2.57 (95% CI: 1.26-5.26) for women. Prevention of depressive state may thus benefit not only mental health but also higher-level functional capacity.
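The IPTW estimation of a marginal odds ratio can be sketched as follows on simulated data; variable names and effect sizes are invented, and a sandwich covariance stands in for the paper's design-based standard errors.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 545
age = rng.normal(75, 6, n)
adl = rng.normal(0, 1, n)                    # a stand-in confounder

# Exposure (depressive state) depends on confounders.
p_dep = 1 / (1 + np.exp(-(-1.0 + 0.03 * (age - 75) + 0.5 * adl)))
dep = rng.binomial(1, p_dep)

# Outcome: impaired higher-level functional capacity.
p_out = 1 / (1 + np.exp(-(-0.8 + 0.8 * dep + 0.04 * (age - 75) + 0.6 * adl)))
out = rng.binomial(1, p_out)

# 1) Propensity score model.
X_ps = sm.add_constant(np.column_stack([age, adl]))
ps = sm.Logit(dep, X_ps).fit(disp=0).predict(X_ps)

# 2) Inverse probability of treatment weights.
w = np.where(dep == 1, 1 / ps, 1 / (1 - ps))

# 3) Weighted logistic model of outcome on exposure -> marginal odds ratio;
#    HC0 sandwich variance approximates design-based standard errors.
X_out = sm.add_constant(dep.astype(float))
fit = sm.GLM(out, X_out, family=sm.families.Binomial(),
             freq_weights=w).fit(cov_type="HC0")
print(f"marginal OR = {np.exp(fit.params[1]):.2f}")
```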

12.
We propose a model-based approach that combines Bayesian variable selection tools, a novel spatial kernel convolution structure, and autoregressive processes for detecting a subject's brain activation at the voxel level in complex-valued functional magnetic resonance imaging (CV-fMRI) data. A computationally efficient Markov chain Monte Carlo algorithm for posterior inference is developed by taking advantage of the dimension reduction of the kernel-based structure. The proposed spatiotemporal model leads to more accurate posterior probability activation maps and fewer false positives than alternative spatial approaches based on Gaussian process models, and than other complex-valued models that do not incorporate spatial and/or temporal structure. This is illustrated in the analysis of simulated data and human task-related CV-fMRI data. In addition, we show that complex-valued approaches dominate magnitude-only approaches and that the kernel structure in our proposed model considerably improves sensitivity when detecting activation at the voxel level.

13.
Marginal regression via generalized estimating equations is widely used in biostatistics to model longitudinal data from subjects whose outcomes and covariates are observed at several time points. In this paper we consider two issues that have been raised in the literature concerning the marginal regression approach. The first is that even though the past history may be predictive of outcome, the marginal approach does not use this history. Although marginal regression has the flexibility of allowing between-subject variations in the observation times, it may lose substantial predictive power in comparison with the transitional modeling approach, which relates the responses to the covariate and outcome histories. We address this issue by using the concept of "information sets" for prediction to generalize the "partly conditional mean" approach of Pepe and Couper (J. Am. Stat. Assoc. 92:991-998, 1997). This modeling approach strikes a balance between the flexibility of the marginal approach and the predictive power of transitional modeling. The second issue is the problem of excess zeros in the outcomes relative to what the underlying model for marginal regression implies. We show how our predictive modeling approach based on information sets can be readily modified to handle excess zeros in longitudinal time series. By synthesizing the marginal, transitional, and mixed-effects modeling approaches in a predictive framework, we also discuss how their respective advantages can be retained while their limitations are circumvented for modeling longitudinal data.

14.
We discuss tools for the evaluation of probabilistic forecasts and the critique of statistical models for count data. Our proposals include a nonrandomized version of the probability integral transform, marginal calibration diagrams, and proper scoring rules, such as the predictive deviance. In case studies, we critique count regression models for patent data, and assess the predictive performance of Bayesian age-period-cohort models for larynx cancer counts in Germany. The toolbox applies in Bayesian or classical and parametric or nonparametric settings and to any type of ordered discrete outcomes.
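A sketch of the nonrandomized probability integral transform for count forecasts, following the standard construction for discrete predictive distributions: each observation contributes a piecewise-linear conditional distribution function in place of a randomized PIT value, and averaging over observations gives histogram heights that should be near-uniform for a calibrated forecaster. The Poisson example is invented.

```python
import numpy as np
from scipy.stats import poisson

def nonrandomized_pit(cdf_at, cdf_below, bins=10):
    """Nonrandomized PIT histogram for count data:
    cdf_at[t] = F_t(y_t), cdf_below[t] = F_t(y_t - 1)."""
    u = np.linspace(0.0, 1.0, bins + 1)
    Fbar = np.empty_like(u)
    for i, ui in enumerate(u):
        # Piecewise-linear conditional CDF, clipped to [0, 1], averaged over t.
        v = np.clip((ui - cdf_below) / (cdf_at - cdf_below), 0.0, 1.0)
        Fbar[i] = v.mean()
    return np.diff(Fbar)   # bin heights; ~1/bins each if well calibrated

# Toy check: Poisson forecasts that actually generated the data.
rng = np.random.default_rng(5)
lam = rng.uniform(2, 10, size=2000)
y = rng.poisson(lam)
heights = nonrandomized_pit(poisson.cdf(y, lam), poisson.cdf(y - 1, lam))
print(np.round(heights, 3))   # should all be close to 0.1
```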

15.
Agreement between raters for binary outcome data is typically assessed using the kappa coefficient. There has been considerable recent work extending logistic regression to provide summary estimates of interrater agreement adjusted for covariates predictive of the marginal probability of classification by each rater. We propose an estimating-equations approach that can also be used to identify covariates predictive of kappa. Models may include an arbitrary and variable number of raters per subject, yet require no stringent parametric assumptions. Examples used to illustrate the procedure include an investigation of factors affecting agreement between primary and proxy respondents in a case-control study, and a study of the effects of gender and zygosity on twin concordance for smoking history.
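As a baseline for the covariate-adjusted extensions the abstract describes, the unadjusted kappa coefficient for two raters is simply chance-corrected agreement; a minimal sketch with made-up ratings:

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters for binary ratings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                              # observed agreement
    p1, p2 = r1.mean(), r2.mean()
    pe = p1 * p2 + (1 - p1) * (1 - p2)                  # agreement by chance
    return (po - pe) / (1 - pe)

primary = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
proxy   = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
print(f"kappa = {cohens_kappa(primary, proxy):.3f}")
```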

16.
Understanding the forces that shape the distribution of biodiversity across spatial scales is central to ecology and critical to effective conservation. To assess possible drivers of richness, we sampled ant communities on four elevational transects across two mountain ranges in Colorado, USA, with seven or eight sites per transect and twenty repeatedly sampled pitfall-trap pairs at each site, each sampled for a total of 90 days. With a multi-scale hierarchical Bayesian community occupancy model, we simultaneously evaluated the effects of temperature, productivity, area, habitat diversity, vegetation structure, and temperature variability on ant richness at two spatial scales, quantifying detection error and genus-level phylogenetic effects. We fit the model with data from one mountain range and tested predictive ability with data from the other. In total, we detected 105 ant species, and richness peaked at intermediate elevations on each transect. Species-specific thermal preferences drove richness at each elevation, with marginal effects of site-scale productivity. Trap-scale richness was primarily influenced by elevation-scale variables, along with a negative impact of canopy cover. Soil diversity had a marginal negative effect, while daily temperature variation had a marginal positive effect. We detected no impact of area, land-cover diversity, trap-scale productivity, or tree density. While phylogenetic relationships among genera had little influence, congeners tended to respond similarly. The hierarchical model, trained on data from the first mountain range, predicted the trends on the second mountain range better than multiple regression, reducing root mean squared error by up to 65%. Compared to a more standard approach, this modeling framework better predicts patterns on a novel mountain range and provides a nuanced, detailed evaluation of ant communities at two spatial scales.

17.
Wang Z, Louis TA. Biometrics 2004, 60(4):884-891
Marginal models and conditional mixed-effects models are commonly used for clustered binary data. However, regression parameters and predictions in nonlinear mixed-effects models usually do not have a direct marginal interpretation, because the conditional functional form does not carry over to the margin. Because both marginal and conditional inferences are of interest, a unified approach is attractive. To this end, we investigate a parameterization of generalized linear mixed models with a structured random-intercept distribution that matches the conditional and marginal shapes. We model the marginal mean of the response distribution and select the distribution of the random intercept to produce the match, and also to model covariate-dependent random effects. We discuss the relation between this approach and some existing models and compare the approaches on two datasets.

18.
MOTIVATION: Previous studies have shown that accounting for site-specific amino acid replacement patterns using mixtures of stationary probability profiles offers a promising approach to improving the robustness of phylogenetic reconstructions in the presence of saturation. However, such profile mixture models have been introduced only in a Bayesian context and are not yet available in a maximum likelihood (ML) framework. In addition, these mixture models perform well only on large alignments, from which they can reliably learn the shapes of the profiles and their associated weights. RESULTS: We introduce an expectation-maximization algorithm for estimating amino acid profile mixtures from alignment databases. Applying it to the HSSP database, we observe that a set of 20 profiles is enough to provide a better statistical fit than currently available empirical matrices (WAG, JTT), in particular on saturated data.
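The E- and M-steps for a mixture of amino-acid profiles are standard and can be sketched directly: alignment columns are summarized by residue-count vectors, and each mixture component is a categorical profile over the 20 amino acids. The toy data below stand in for columns extracted from a database such as HSSP.

```python
import numpy as np

def em_profile_mixture(counts, K=3, n_iter=200, seed=0):
    """EM for a mixture of amino-acid profiles. counts[c] holds the residue
    counts of alignment column c (shape: n_columns x 20)."""
    rng = np.random.default_rng(seed)
    C, A = counts.shape
    profiles = rng.dirichlet(np.ones(A), size=K)       # K x 20 initial profiles
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities via multinomial log-likelihoods
        # (the column-specific multinomial coefficient cancels).
        log_r = np.log(weights) + counts @ np.log(profiles).T   # C x K
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reweighted counts give new profiles and mixture weights.
        weights = r.mean(axis=0)
        profiles = (r.T @ counts) + 1e-8
        profiles /= profiles.sum(axis=1, keepdims=True)
    return weights, profiles

# Toy alignment columns drawn from two true profiles.
rng = np.random.default_rng(6)
true = rng.dirichlet(np.ones(20) * 0.3, size=2)
cols = np.array([rng.multinomial(40, true[c % 2]) for c in range(300)])
w, p = em_profile_mixture(cols, K=2)
print("mixture weights:", np.round(w, 2))
```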

19.
In this paper, we consider the problem of nonparametric curve fitting in the specific context of censored data. We propose an extension of the penalized splines approach that uses Kaplan-Meier weights to account for the effect of censoring, and generalized cross-validation techniques to choose the smoothing parameter adapted to censored samples. Through various simulation studies, we analyze the effectiveness of the proposed censored penalized-splines method and show that its performance is quite satisfactory. We extend the proposal to a generalized additive model (GAM) framework, introducing a correction for the censoring effect that enables more complex models to be estimated directly. The Stanford Heart Transplant data are also used to illustrate the proposed methodology, which is shown to be a good alternative when the probability distribution of the response variable and the functional form are not known in censored regression models.
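A minimal sketch of the core estimator, assuming Stute-type Kaplan-Meier weights attached to the observed responses and a difference-penalty P-spline solved by weighted ridge regression; the smoothing parameter is fixed here rather than chosen by the generalized cross-validation the paper proposes.

```python
import numpy as np
from scipy.interpolate import BSpline

def km_weights(time, delta):
    """Kaplan-Meier jump weights: censored points get weight 0, uncensored
    points get the KM mass at their observed time."""
    order = np.argsort(time)
    n = len(time)
    d = delta[order].astype(float)
    i = np.arange(1, n + 1)
    haz = d / (n - i + 1)
    surv_prev = np.concatenate([[1.0], np.cumprod(1 - haz)[:-1]])
    w = np.empty(n)
    w[order] = haz * surv_prev
    return w

def weighted_pspline(x, y, w, n_knots=20, degree=3, lam=1.0):
    """Penalized B-spline minimizing sum w_i (y_i - f(x_i))^2 + lam |D2 beta|^2."""
    knots = np.linspace(x.min(), x.max(), n_knots)
    t = np.concatenate([[knots[0]] * degree, knots, [knots[-1]] * degree])
    nb = len(t) - degree - 1
    B = BSpline.design_matrix(x, t, degree).toarray()
    D = np.diff(np.eye(nb), n=2, axis=0)        # second-order difference penalty
    W = np.diag(w)
    beta = np.linalg.solve(B.T @ W @ B + lam * D.T @ D, B.T @ W @ y)
    return lambda xs: BSpline(t, beta, degree)(xs)

# Toy censored regression: true curve plus right censoring.
rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0, 10, 200))
t_true = np.sin(x) + 2 + rng.normal(0, 0.3, 200)
c = rng.uniform(1, 6, 200)
y = np.minimum(t_true, c)
delta = (t_true <= c).astype(int)

f = weighted_pspline(x, y, km_weights(y, delta))
print(np.round(f(np.array([1.0, 5.0, 9.0])), 2))
```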

20.
Ecological Monographs 2011, 81(4):581-598
The complexity of mathematical models of ecological dynamics varies greatly, and it is often difficult to judge the optimal level of complexity in a particular case. Here we compare the parameter estimates, model fits, and predictive abilities of two models of metapopulation dynamics: a detailed individual-based model (IBM) and a population-based stochastic patch occupancy model (SPOM) derived from the IBM. The two models were fitted to a 17-year time series of data on the Glanville fritillary butterfly (Melitaea cinxia) inhabiting a network of 72 small meadows. The data consisted of biannual counts of larval groups (IBM) and the annual presence or absence of local populations (SPOM). The models were fitted using a Bayesian state-space approach with a hierarchical random-effect structure to account for observational, demographic, and environmental stochasticity. The detection probability of larval groups (IBM) and the probability of false zeros for local populations (SPOM) in the observation models were estimated simultaneously from the time-series data and independent control data. Prior distributions for dispersal parameters were obtained from a separate analysis of mark-recapture data. Both models fitted the data about equally well, but the results were more precise for the IBM than for the SPOM. The two models yielded similar estimates of a random-effect parameter describing habitat quality in each patch, and these estimates were correlated with independent empirical measures of habitat quality. The modeling results showed that variation in habitat quality influenced patch occupancy more through its effects on movement behavior at patch edges than on carrying capacity, whereas the latter influenced mean population size in occupied patches. The IBM and the SPOM explained 63% and 45%, respectively, of the observed variation in the fraction of occupied habitat area among 75 independent patch networks not used in parameter estimation. We conclude that, while carefully constructed detailed models can have better predictive ability than simple models, this advantage comes at the cost of greatly increased data requirements and computational challenges. Our results illustrate how complex models can help in constructing effective simpler models.
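For readers unfamiliar with the SPOM model class, a generic incidence-function-style patch occupancy simulation looks like the sketch below. This is not the paper's fitted model; all parameter values are invented, and the 72-patch network is simulated rather than the Åland meadow data.

```python
import numpy as np

def simulate_spom(areas, coords, p0, years=17,
                  e=0.2, x=0.5, alpha=1.0, y2=4.0, seed=0):
    """Generic incidence-function SPOM: extinction risk falls with patch area,
    colonization rises with connectivity to occupied patches."""
    rng = np.random.default_rng(seed)
    n = len(areas)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    kernel = np.exp(-alpha * d)
    np.fill_diagonal(kernel, 0.0)            # no self-colonization
    p = p0.astype(float).copy()
    occupancy = [p.mean()]
    for _ in range(years):
        S = kernel @ (areas * p)             # connectivity to occupied patches
        C = S**2 / (S**2 + y2)               # colonization probability
        E = np.minimum(1.0, e / areas**x)    # extinction probability
        u = rng.uniform(size=n)
        p = np.where(p == 1, (u > E).astype(float), (u < C).astype(float))
        occupancy.append(p.mean())
    return occupancy

rng = np.random.default_rng(8)
areas = rng.lognormal(0, 0.5, 72)            # 72 patches, as in the study network
coords = rng.uniform(0, 10, (72, 2))
p0 = rng.binomial(1, 0.5, 72)
print(np.round(simulate_spom(areas, coords, p0), 2))
```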
