Similar articles
1.
2.
Background stratified Poisson regression is an approach that has been used in the analysis of data derived from a variety of epidemiologically important studies of radiation-exposed populations, including uranium miners, nuclear industry workers, and atomic bomb survivors. We describe a novel approach to fit Poisson regression models that adjust for a set of covariates through background stratification while directly estimating the radiation-disease association of primary interest. The approach makes use of an expression for the Poisson likelihood that treats the coefficients for stratum-specific indicator variables as ‘nuisance’ parameters and avoids the need to estimate these coefficients explicitly. Log-linear models, as well as other general relative rate models, are accommodated. This approach is illustrated using data from the Life Span Study of Japanese atomic bomb survivors and data from a study of underground uranium miners. The point estimate and confidence interval obtained from this ‘conditional’ regression approach are identical to the values obtained using unconditional Poisson regression with model terms for each background stratum. Moreover, it is shown that the proposed approach allows estimation of background stratified Poisson regression models of non-standard form, such as models that parameterize latency effects, as well as regression models in which the number of strata is large, thereby overcoming the limitations of previously available statistical software for fitting background stratified Poisson regression models.
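
The idea of removing stratum-specific background terms can be sketched for the log-linear case. Conditioning on the total number of cases in each stratum turns the Poisson likelihood into a multinomial one in which the background rates cancel, so only the exposure coefficient remains to be estimated. The sketch below is a minimal illustration under that assumption, not necessarily the exact expression used in the paper; the function name, the cell layout (cases, person-years, dose) and the example counts are all hypothetical.

```python
import math

def fit_conditional_poisson(strata, n_iter=50, tol=1e-12):
    """Estimate beta in rate = background_i * exp(beta * dose) by Newton's
    method on the likelihood conditioned on each stratum's case total.

    strata: list of strata; each stratum is a list of cells
            (cases, person_years, dose). Background rates never appear.
    """
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for cells in strata:
            total_cases = sum(d for d, _, _ in cells)
            # Weight of each cell: person-years times relative rate.
            weights = [py * math.exp(beta * x) for _, py, x in cells]
            w_sum = sum(weights)
            mean_x = sum(w * c[2] for w, c in zip(weights, cells)) / w_sum
            mean_x2 = sum(w * c[2] ** 2 for w, c in zip(weights, cells)) / w_sum
            # Observed minus expected dose-weighted cases in this stratum.
            score += sum(d * x for d, _, x in cells) - total_cases * mean_x
            info += total_cases * (mean_x2 - mean_x ** 2)
        if info <= 0.0:
            break  # dose does not vary within any stratum
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data, built so that the rate ratio per unit dose is 2
# in both strata even though the background rates differ (0.10 vs 0.02).
example_strata = [
    [(10, 100.0, 0.0), (20, 100.0, 1.0)],
    [(1, 50.0, 0.0), (8, 200.0, 1.0)],
]
beta_hat = fit_conditional_poisson(example_strata)
rate_ratio = math.exp(beta_hat)
```

Because no stratum coefficient is estimated, the cost of each iteration grows only with the number of cells, which is what makes a large number of strata manageable.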

3.
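Appropriate statistical methods allow a more accurate understanding of animal habitat selection. We analyzed the 30 statistical methods used in 177 papers on habitat selection by birds and mammals published in 10 international journals from 2003 to 2012, and briefly summarize the currently popular statistical methods for habitat selection analysis and their characteristics; Chinese-language literature from the same period was also briefly analyzed. The methods currently most popular for analyzing animal habitat selection are logistic regression, resource selection functions, compositional analysis, generalized linear models, multivariate analysis of variance, Euclidean distance-based methods, generalized linear mixed models, ecological niche factor analysis, individual-based models, canonical correlation analysis, and species distribution models. Generalized linear models, logistic regression, multivariate analysis of variance, and Euclidean distance-based methods can be applied flexibly to data but lack an ecologically meaningful theoretical framework. Resource selection functions and ecological niche factor analysis each provide a unified theoretical framework for habitat selection research. Individual-based models are a bottom-up process, which makes it difficult to form theory at the system level. In the 232 Chinese articles, the most frequently used methods were principal component analysis, the Mann-Whitney U test, the t-test, the chi-square test, discriminant analysis, analysis of variance, the Vanderploeg selection coefficient and Scavia selection index, logistic regression, the Kruskal-Wallis H test, and multiple regression analysis. In practice, a feasible analytical method should be chosen according to the research question to be addressed.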

4.
Multi-state models are a flexible tool for analyzing complex time-to-event problems with multiple endpoints. Compared to the Cox regression model with a single endpoint or a summarizing composite endpoint, they can provide a more detailed insight into the disease process. Furthermore, prognosis can be improved by including information from intermediate events occurring during the course of the disease. Different model variants, options and additional assumptions provide many possibilities, but at the same time complicate the implementation of multi-state techniques. So far, no guiding literature is available to specify a multi-state model systematically. The objective of this work was to set up a general specification procedure for an illness-death model that optimizes the model fit and predictive accuracy by stepwise reduction of the model. As an application example, we reanalyzed data from an observational study of 434 ovarian cancer patients with progression as intermediate and death as absorbing state. The technique is described in general terms and can be applied to other illness-death models without recovery. The clock-reset approach was used, implying that the time was reset to zero after progression. The non-homogeneous semi-Markov characteristic stated that the present time as well as the time between surgery and progression influenced survival after progression. Covariate effects on transitions were estimated and proportionality of transition baseline hazards was tested. The finally developed model optimized the accuracy of predictions for two simulated patients. This stepwise procedure yields parsimonious but targeted multi-state models with well interpretable coefficients and optimized predictive ability, even for smaller data sets.

5.
We have performed a meta-analysis of the major-histocompatibility-complex (MHC) region in systemic lupus erythematosus (SLE) to determine the association with both SNPs and classical human-leukocyte-antigen (HLA) alleles. More specifically, we combined results from six studies and well-known out-of-study control data sets, providing us with 3,701 independent SLE cases and 12,110 independent controls of European ancestry. This study used genotypes for 7,199 SNPs within the MHC region and for classical HLA alleles (typed and imputed). Our results from conditional analysis and model choice with the use of the Bayesian information criterion show that the best model for SLE association includes both classical loci (HLA-DRB1*03:01, HLA-DRB1*08:01, and HLA-DQA1*01:02) and two SNPs, rs8192591 (in class III and upstream of NOTCH4) and rs2246618 (MICB in class I). Our approach was to perform a stepwise search from multiple baseline models deduced from a priori evidence on HLA-DRB1 lupus-associated alleles, a stepwise regression on SNPs alone, and a stepwise regression on HLA alleles. With this approach, we were able to identify a model that was an overwhelmingly better fit to the data than one identified by simple stepwise regression either on SNPs alone (Bayes factor [BF] > 50) or on classical HLA alleles alone (BF > 1,000).
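
For readers unfamiliar with how a Bayesian information criterion (BIC) comparison relates to a Bayes factor, the following sketch uses the standard Schwarz approximation, in which the log Bayes factor is roughly half the difference in BIC. Whether the authors computed their Bayes factors this way is an assumption; the log-likelihoods and parameter counts are hypothetical, and only the sample size (3,701 cases plus 12,110 controls) comes from the abstract.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def approx_bayes_factor(bic_reference, bic_candidate):
    """Approximate Bayes factor in favour of the candidate model,
    using the Schwarz approximation ln(BF) ~ (BIC_ref - BIC_cand) / 2."""
    return math.exp((bic_reference - bic_candidate) / 2.0)

n_obs = 3701 + 12110  # cases plus controls, as reported in the abstract

# Hypothetical fits: a smaller reference model and a larger candidate model.
bic_reference = bic(log_likelihood=-8000.0, n_params=3, n_obs=n_obs)
bic_candidate = bic(log_likelihood=-7985.0, n_params=5, n_obs=n_obs)

bayes_factor = approx_bayes_factor(bic_reference, bic_candidate)
```

The penalty term grows with the logarithm of the sample size, so with nearly 16,000 subjects each extra parameter must improve the log-likelihood by almost 5 units before the larger model is preferred.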

6.
Despite recent papers on problems associated with full-model and stepwise regression, their use is still common throughout ecological and environmental disciplines. Alternative approaches, including generating multiple models and comparing them post-hoc using techniques such as Akaike's Information Criterion (AIC), are becoming more popular. However, these are problematic when there are numerous independent variables and interpretation is often difficult when competing models contain many different variables and combinations of variables. Here, we detail a new approach, REVS (Regression with Empirical Variable Selection), which uses all-subsets regression to quantify empirical support for every independent variable. A series of models is created; the first containing the variable with most empirical support, the second containing the first variable and the next most-supported, and so on. The comparatively small number of resultant models (n = the number of predictor variables) means that post-hoc comparison is comparatively quick and easy. When tested on a real dataset – habitat and offspring quality in the great tit (Parus major) – the optimal REVS model explained more variance (higher R²), was more parsimonious (lower AIC), and had greater significance (lower P values), than full, stepwise or all-subsets models; it also had higher predictive accuracy based on split-sample validation. Testing REVS on ten further datasets suggested that this is typical, with R² values being higher than full or stepwise models (mean improvement = 31% and 7%, respectively). Results are ecologically intuitive as even when there are several competing models, they share a set of “core” variables and differ only in presence/absence of one or two additional variables. We conclude that REVS is useful for analysing complex datasets, including those in ecology and environmental disciplines.

7.
A working guide to boosted regression trees   Cited by: 33 (self-citations: 0, citations by others: 33)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to its predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. We provide code and a tutorial to enable the wider use of BRT by ecologists.
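
The forward, stagewise fitting of simple trees described in the abstract can be illustrated with a deliberately small sketch. It is not the code provided by the authors: it assumes a single predictor, squared-error loss, and one-split trees ("stumps"), and it omits features that full implementations typically offer, such as deeper trees and random subsampling. All function names and the synthetic sine-curve data are hypothetical.

```python
import math

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the single binary
    split on x that minimises the squared error of the residuals."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]

def boost(x, y, n_trees=100, learning_rate=0.1):
    """Fit stumps one at a time, each to the residuals of the current model."""
    intercept = sum(y) / len(y)
    fitted = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each tree's contribution by the learning rate.
        fitted = [fi + learning_rate * (left_mean if xi <= threshold else right_mean)
                  for xi, fi in zip(x, fitted)]
    return intercept, learning_rate, stumps

def predict(model, xi):
    """Sum the intercept and the shrunken contribution of every stump."""
    intercept, learning_rate, stumps = model
    total = intercept
    for threshold, left_mean, right_mean in stumps:
        total += learning_rate * (left_mean if xi <= threshold else right_mean)
    return total

def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Synthetic nonlinear response, used only for illustration.
x = [i / 10.0 for i in range(40)]
y = [math.sin(v) for v in x]

constant_model = boost(x, y, n_trees=0)  # intercept only
boosted_model = boost(x, y, n_trees=100, learning_rate=0.1)
mse_constant = mse(constant_model, x, y)
mse_boosted = mse(boosted_model, x, y)
```

The final prediction is a sum of many small tree terms, which is the additive structure the abstract refers to. The number of trees and the learning rate are the kind of settings whose effects the paper explores.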

8.
The problem of missing data is common in all fields of science. Various methods of estimating missing values in a dataset exist, such as deletion of cases, insertion of sample mean, and linear regression. Each approach presents problems inherent in the method itself or in the nature of the pattern of missing data. We report a method that (1) is more general in application and (2) provides better estimates than traditional approaches, such as one-step regression. The model is general in that it may be applied to singular matrices, such as small datasets or those that contain dummy or index variables. The strength of the model is that it builds a regression equation iteratively, using a bootstrap method. The precision of the regressed estimates of a variable increases as regressed estimates of the predictor variables improve. We illustrate this method with a set of measurements of European Upper Paleolithic and Mesolithic human postcranial remains, as well as a set of primate anthropometric data. First, simulation tests using the primate data set involved randomly turning 20% of the values to "missing". In each case, the first iteration produced significantly better estimates than other estimating techniques. Second, we applied our method to the incomplete set of human postcranial measurements. MISDAT estimates always perform better than replacement of missing data by means and better than classical multiple regression. As with classical multiple regression, MISDAT performs when squared multiple correlation values approach the reliability of the measurement to be estimated, e.g., above about 0.8.
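
The statement that estimates of one variable improve as the estimates of its predictors improve can be illustrated with a generic iterative regression imputation. This is not MISDAT itself, whose details the abstract does not give. The sketch assumes just two variables, starts from mean substitution, and then alternates simple linear regressions, each fitted on the rows where the target is observed. The function names and the noise-free example data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def impute_iteratively(a, b, n_iter=30):
    """a, b: equal-length lists with None marking missing values
    (no row may be missing both). Returns completed copies."""
    cols = [list(a), list(b)]
    missing = [[i for i, v in enumerate(col) if v is None] for col in cols]
    # Starting values: the mean of the observed entries in each column.
    for col, miss in zip(cols, missing):
        observed = [v for v in col if v is not None]
        for i in miss:
            col[i] = sum(observed) / len(observed)
    for _ in range(n_iter):
        for target in (0, 1):
            predictor = 1 - target
            miss = set(missing[target])
            # Fit on rows where the target is observed; the predictor may
            # itself hold current imputed values, which improve each pass.
            rows = [i for i in range(len(a)) if i not in miss]
            intercept, slope = fit_line([cols[predictor][i] for i in rows],
                                        [cols[target][i] for i in rows])
            for i in miss:
                cols[target][i] = intercept + slope * cols[predictor][i]
    return cols[0], cols[1]

# Hypothetical noise-free data: b = 2 * a + 1, with three values removed.
a_true = [float(i) for i in range(10)]
b_true = [2.0 * v + 1.0 for v in a_true]
a_obs = list(a_true)
b_obs = list(b_true)
a_obs[5] = None
b_obs[2] = None
b_obs[7] = None

a_filled, b_filled = impute_iteratively(a_obs, b_obs)
```

With exactly linear data the imputed values converge to the removed ones; with real, noisy measurements the iterations would instead settle on regression-based estimates whose quality depends on how strongly the variables are correlated.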

9.
ABSTRACT: BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS: We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. We summarize the external biological knowledge by an informative prior probability distribution over the candidate regression models. CONCLUSIONS: We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.

10.