Similar literature
20 similar documents found.
1.

In assessments of detrimental health risks from exposures to ionising radiation, many forms of risk to dose–response models are available in the literature. The usual practice is to base risk assessment on one specific model and ignore model uncertainty. The analysis illustrated here considers model uncertainty for the outcome all solid cancer incidence, when modelled as a function of colon organ dose, using the most recent publicly available data from the Life Span Study on atomic bomb survivors of Japan. Seven recent publications reporting all solid cancer risk models currently deemed plausible by the scientific community have been included in a model averaging procedure so that the main conclusions do not depend on just one type of model. The models have been estimated with different baselines and presented for males and females at various attained ages and ages at exposure, to obtain specially computed model-averaged Excess Relative Risks (ERR) and Excess Absolute Risks (EAR). Monte Carlo simulated estimation of uncertainty on excess risks was accounted for by applying realisations including correlations in the risk model parameters. Three models were found to weight the model-averaged risks most strongly depending on the baseline and information criteria used for the weighting. Fitting all excess risk models with the same baseline, one model dominates for both information criteria considered in this study. Based on the analysis presented here, it is generally recommended to take model uncertainty into account in future risk analyses.
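The weighting step described in the abstract above can be sketched as follows. This is a minimal illustration of information-criterion model weights and a weighted average, not the authors' procedure: the AIC values and ERR estimates are hypothetical placeholders, and the function names are invented for this sketch.

```python
import math


def information_criterion_weights(ic_values):
    """Convert information-criterion values (e.g. AIC or BIC) to model weights.

    Each weight is proportional to exp(-0.5 * delta), where delta is the
    difference from the smallest (best) value; weights are normalised to sum to 1.
    """
    best = min(ic_values)
    raw = [math.exp(-0.5 * (ic - best)) for ic in ic_values]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, weights):
    """Weighted mean of per-model point estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical inputs: three candidate risk models, not values from the study.
aic = [1000.0, 1002.0, 1010.0]
err_estimates = [0.50, 0.40, 0.30]

weights = information_criterion_weights(aic)
averaged_err = model_average(err_estimates, weights)
print(weights, averaged_err)
```

A point estimate alone understates uncertainty; the abstract additionally propagates parameter uncertainty by Monte Carlo simulation, which this sketch omits.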


2.
Generalized relative and absolute risk models are fitted to the latest Japanese atomic bomb survivor solid cancer and leukemia mortality data (through 2000), with the latest (DS02) dosimetry, by classical (regression calibration) and Bayesian techniques, taking account of errors in dose estimates and other uncertainties. Linear-quadratic and linear-quadratic-exponential models are fitted and used to assess risks for contemporary populations of China, Japan, Puerto Rico, the U.S. and the UK. Many of these models are the same as or very similar to models used in the UNSCEAR 2006 report. For a test dose of 0.1 Sv, the solid cancer mortality for a UK population using the generalized linear-quadratic relative risk model is estimated as 5.4% Sv⁻¹ [90% Bayesian credible interval (BCI) 3.1, 8.0]. At 0.1 Sv, leukemia mortality for a UK population using the generalized linear-quadratic relative risk model is estimated as 0.50% Sv⁻¹ (90% BCI 0.11, 0.97). Risk estimates varied little between populations; at 0.1 Sv the central estimates ranged from 3.7 to 5.4% Sv⁻¹ for solid cancers and from 0.4 to 0.6% Sv⁻¹ for leukemia. Analyses using regression calibration techniques yield central estimates of risk very similar to those for the Bayesian approach. The central estimates of population risk were similar for the generalized absolute risk model and the relative risk model. Linear-quadratic-exponential models predict lower risks (at least at low test doses) and appear to fit as well, although for other (theoretical) reasons we favor the simpler linear-quadratic models.

3.
Instead of assessing the overall fit of candidate models, as traditional model selection criteria do, the focused information criterion focuses attention directly on the parameter of primary interest and aims to select the model with the minimum estimated mean squared error for the estimate of the focused parameter. In this article we apply the focused information criterion to personalized medicine. By using individual‐level information from clinical observations, demographics, and genetics, we obtain personalized predictive models to make individual prognoses and diagnoses. Taking the heterogeneity among individuals into account helps reduce the prediction uncertainty and improve the prediction accuracy. Two real data examples from biomedical research are studied as illustrations.

4.
The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the ability to discriminate among competing topologies. We have explored the relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) for six data sets for which the true relationships are known with a high degree of certainty. We also performed equally weighted maximum parsimony analyses in order to assess the effects of ignoring branch length information during tree selection. We observed that maximum parsimony gave the lowest mean estimate of bootstrap support for the correct set of nodes relative to the ML models for every data set except one. For several data sets, we established that the exact distribution used to model among-site rate variation was critical for a successful phylogenetic analysis. Site-specific rate models were shown to perform very poorly relative to gamma and invariable sites models for several of the data sets most likely because of the gross underestimation of branch lengths. The invariable sites model also performed poorly for several data sets where this model had a poor fit to the data, suggesting that addition of the gamma distribution can be critical. Estimates of bootstrap support for the correct nodes often increased under gamma and invariable sites models relative to equal rates models. Our observations are contrary to the prediction that such models cause reduced confidence in phylogenetic hypotheses. Our results raise several issues regarding the process of model selection, and we briefly discuss model selection uncertainty and the role of sensitivity analyses in molecular phylogenetics.

5.
A working guide to boosted regression trees   Cited by: 33 (self-citations: 0; citations by others: 33)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. 
We provide code and a tutorial to enable the wider use of BRT by ecologists.
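The forward, stagewise fitting of simple trees described in the abstract above can be illustrated with a small sketch. It is not the code or tutorial the authors provide: it assumes a single predictor, squared-error loss, one-split trees (stumps), and a fixed learning rate, and all function names and the synthetic data are invented for illustration.

```python
import math


def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the best single binary split.

    Requires at least two distinct values in x.
    """
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_trees=200, learning_rate=0.1):
    """Fit stumps one at a time, each to the residuals of the current fit."""
    intercept = sum(y) / len(y)
    fitted = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Shrink each stump's contribution by the learning rate.
        fitted = [fi + learning_rate * (left_mean if xi <= threshold else right_mean)
                  for xi, fi in zip(x, fitted)]
    return intercept, learning_rate, stumps


def predict(model, xi):
    """Additive prediction: intercept plus the shrunken sum of all stumps."""
    intercept, learning_rate, stumps = model
    return intercept + learning_rate * sum(
        left if xi <= threshold else right for threshold, left, right in stumps)


def mse(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions)) / len(y)


# Synthetic nonlinear data, purely illustrative.
x = [i / 10.0 for i in range(50)]
y = [math.sin(xi) for xi in x]
model = boost(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [predict(model, xi) for xi in x])
print(baseline_mse, boosted_mse)
```

The comparison here is on training data only; in practice the number of trees and the learning rate would be chosen by cross-validation or a held-out set.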

6.
Mike Lonergan 《Oecologia》2014,175(4):1063-1067
Detailed models have the potential to reveal important processes underlying patterns in data. However, model fitting depends on the availability of sufficient data, and the results obtained from the models depend on detailed assumptions. In a recent paper, Matthiopoulos et al. fitted Bayesian state space models to a limited dataset and attempted to explain the recent trajectory of the harbour seal population in the Moray Firth, in northern Scotland. They went on to suggest that the results could help explain recent declines in other nearby populations. This Comment describes the implications of understating the uncertainty that the model required for convergence, questions the robustness of the results, highlights the differences between the areas, and cautions against extrapolating across these populations. The distinction between models that can be fitted to a dataset and those that provide useful information about the systems that generated the data is also considered.

7.
Lung cancer mortality in the period of 1948-2002 has been analysed for 6,293 male workers of the Mayak Production Association, for whom information on smoking, annual external doses and annual lung doses due to plutonium exposures was available. Individual likelihoods were maximized for the two-stage clonal expansion (TSCE) model of carcinogenesis and for an empirical risk model. Possible detrimental and protective bystander effects on mutation and malignant transformation rates were taken into account in the TSCE model. Criteria for non-nested models were used to evaluate the quality of fit. Data were found to be incompatible with the model including a detrimental bystander effect. The model with a protective bystander effect did not improve the quality of fit over models without a bystander effect. The preferred TSCE model was sub-multiplicative in the risks due to smoking and internal radiation, and more than additive. Smoking contributed 57% to the lung cancer deaths, the interaction of smoking and radiation 27%, radiation 10%, and other causes 6%. An assessment of the relative biological effectiveness of plutonium was consistent with the ICRP recommended value of 20. At age 60 years, the excess relative risk (ERR) per lung dose was 0.20 (95% CI: 0.13; 0.40) Sv⁻¹, while the excess absolute risk (EAR) per lung dose was 3.2 (2.0; 6.2) per 10⁴ PY Sv. With increasing attained age the ERR decreased and the EAR increased. In contrast to the atomic bomb survivors, a significantly elevated lung cancer risk was also found for attained ages younger than 55 years. For cumulative lung doses below 5 Sv, the excess risk depended linearly on dose. The excess relative risk was significantly lower in the TSCE model for attained ages younger than 55 than in the empirical model. This reflects a model uncertainty in the results, which is not expressed by the standard statistical uncertainty bands.

8.
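Growth parameters are key inputs to fishery stock assessment and management strategies, so choosing an appropriate growth model for the target species is essential. Taking the greater lizardfish (Saurida tumbil) of the Beibu Gulf as an example, this study used length-at-age data collected monthly from December 2006 to July 2009 (n = 2046), fitted five candidate growth models, estimated the growth parameters by maximum likelihood under an additive error structure, and assessed goodness of fit with the adjusted coefficient of determination (R²adj), root mean squared error (RMSE), Akaike information criterion (AIC) and Bayesian information criterion (BIC). The results showed that, with the present large sample, the four statistics ranked the models consistently. Multi-model inference indicated that the generalized VBGF received sufficient support, accounting for 95.9% of the AIC weight, and could on its own describe the length-at-age relationship of this species. The growth equation was $L_t = 578.49\,[1 - e^{-0.051(t - 0.14)}]^{0.361}$.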
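As a small worked illustration, the growth equation reported in the abstract above can be evaluated directly. The sketch assumes the reading L_t = 578.49 [1 - e^(-0.051 (t - 0.14))]^0.361, with the exponents reconstructed from flattened superscripts; the parameter names (`l_inf`, `k`, `t0`, `p`) are labels chosen here, and the abstract does not state the units of length or age.

```python
import math


def generalized_vbgf(t, l_inf=578.49, k=0.051, t0=0.14, p=0.361):
    """Length at age t under the assumed generalized von Bertalanffy form.

    L_t = l_inf * (1 - exp(-k * (t - t0))) ** p
    Defaults are the values quoted in the abstract; units are not given there.
    """
    if t < t0:
        # The bracketed term would be negative, so the power is undefined.
        raise ValueError("t must be at least t0")
    return l_inf * (1.0 - math.exp(-k * (t - t0))) ** p


# Predicted lengths for ages 1 to 10 (age units as in the source, unspecified).
lengths = [generalized_vbgf(age) for age in range(1, 11)]
for age, length in zip(range(1, 11), lengths):
    print(age, round(length, 1))
```

The predicted length rises monotonically with age and approaches the asymptote `l_inf` only slowly, because `k` is small.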

9.
Proportional and separate models, which can apply a different combination of substitution rate matrix (SRM) and among-site rate variation model (ASRVM) to each locus, are frequently used in phylogenetic studies of multilocus data. A proportional model assumes that branch lengths are proportional among partitions and a separate model assumes that each partition has an independent set of branch lengths. However, the selection from among nonpartitioned (i.e., a common combination of models is applied to all-loci concatenated sequences), proportional and separate models is usually based on the researcher's preference rather than on any information criteria. This study describes two programs, 'Kakusan4' (for DNA sequences) and 'Aminosan' (for amino-acid sequences), which allow the selection of evolutionary models based on several types of information criteria. The programs can handle both multilocus and single-locus data, in addition to providing an easy-to-use wizard interface and a noninteractive command line interface. In the case of multilocus data, SRMs and ASRVMs are compared at each locus and at all-loci concatenated sequences, after which nonpartitioned, proportional and separate models are compared based on information criteria. The programs also provide model configuration files for MrBayes, PAUP*, PhyML, RAxML and Treefinder to support further phylogenetic analysis using a selected model. When likelihoods are optimized by Treefinder, the best-fit models were found to differ depending on the data set. Furthermore, differences in the information criteria among nonpartitioned, proportional and separate models were much larger than those among the nonpartitioned models. These findings suggest that selecting from nonpartitioned, proportional and separate models results in a better phylogenetic tree. Kakusan4 and Aminosan are available at http://www.fifthdimension.jp/. They are licensed under GNU GPL Ver. 2, and are able to run on Windows, MacOS X and Linux.

10.
Muscle models are an important tool in the development of new rehabilitation and diagnostic techniques. Many models have been proposed in the past, but little work has been done on comparing the performance of models. In this paper, seven models that describe the isometric force response to pulse train inputs are investigated. Five of the models are from the literature while two new models are also presented. Models are compared in terms of their ability to fit to isometric force data, using Akaike’s and Bayesian information criteria and by examining the ability of each model to describe the underlying behaviour in response to individual pulses. Experimental data were collected by stimulating the locust extensor tibia muscle and measuring the force generated at the tibia. Parameters in each model were estimated by minimising the error between the modelled and actual force response for a set of training data. A separate set of test data, which included physiological kick-type data, was used to assess the models. It was found that a linear model performed the worst whereas a new model was found to perform the best. The parameter sensitivity of this new model was investigated using a one-at-a-time approach, and it was found that the force response is not particularly sensitive to changes in any parameter.

11.
ABSTRACT: BACKGROUND: In Cote d'Ivoire, an estimated 767,000 disability-adjusted life years are due to malaria, placing the country at position number 14 with regard to the global burden of malaria. Risk maps are important to guide control interventions, and hence, the aim of this study was to predict the geographical distribution of malaria infection risk in children aged <16 years in Cote d'Ivoire at high spatial resolution. METHODS: Using different data sources, a systematic review was carried out to compile and georeference survey data on Plasmodium spp. infection prevalence in Cote d'Ivoire, focusing on children aged <16 years. The period from 1988 to 2007 was covered. A suite of Bayesian geo-statistical logistic regression models was fitted to analyse malaria risk. Non-spatial models with and without exchangeable random effect parameters were compared to stationary and non-stationary spatial models. Non-stationarity was modelled assuming that the underlying spatial process is a mixture of separate stationary processes in each ecological zone. The best fitting model based on the deviance information criterion was used to predict Plasmodium spp. infection risk for entire Cote d'Ivoire, including uncertainty. RESULTS: Overall, 235 data points at 170 unique survey locations with malaria prevalence data for individuals aged <16 years were extracted. Most data points (n = 182, 77.4%) were collected between 2000 and 2007. A Bayesian non-stationary regression model showed the best fit with annualized rainfall and maximum land surface temperature identified as significant environmental covariates. This model was used to predict malaria infection risk at nonsampled locations. High-risk areas were mainly found in the north-central and western area, while relatively low-risk areas were located in the north at the country border, in the northeast, in the south-east around Abidjan, and in the central-west between two high prevalence areas. 
CONCLUSION: The malaria risk map at high spatial resolution gives an important overview of the geographical distribution of the disease in Cote d'Ivoire. It is a useful tool for the national malaria control programme and can be utilized for spatial targeting of control interventions and rational resource allocation.

12.
Progress in sociobiology continues to be hindered by abstract debates over methodology and the relative importance of within‐group vs. between‐group selection. We need concrete biological examples to ground discussions in empirical data. Recent work argued that the levels of aggression in social spider colonies are explained by group‐level adaptation. Here, we examine this conclusion using models that incorporate ecological detail while remaining consistent with kin‐ and multilevel selection frameworks. We show that although levels of aggression are driven, in part, by between‐group selection, incorporating universal within‐group competition provides a striking fit to the data that is inconsistent with pure group‐level adaptation. Instead, our analyses suggest that aggression is favoured primarily as a selfish strategy to compete for resources, despite causing lower group foraging efficiency or higher risk of group extinction. We argue that sociobiology will benefit from a pluralistic approach and stronger links between ecologically informed models and data.

13.
Radiation-related risks of cancer can be transported from one population to another population at risk, for the purpose of calculating lifetime risks from radiation exposure. Transfer via excess relative risks (ERR) or excess absolute risks (EAR) or a mixture of both (i.e., from the life span study (LSS) of Japanese atomic bomb survivors) has been done in the past based on qualitative weighting. Consequently, the values of the weights applied and the method of application of the weights (i.e., as additive or geometric weighted means) have varied both between reports produced at different times by the same regulatory body and also between reports produced at similar times by different regulatory bodies. Since the gender and age patterns are often markedly different between EAR and ERR models, it is useful to have an evidence-based method for determining the relative goodness of fit of such models to the data. This paper identifies a method, using Akaike model weights, which could aid expert judgment and be applied to help to achieve consistency of approach and quantitative evidence-based results in future health risk assessments. The results of applying this method to recent LSS cancer incidence models are that the relative EAR weighting by solid cancer site, on a scale of 0–1, is zero for breast and colon, 0.02 for all solid, 0.03 for lung, 0.08 for liver, 0.15 for thyroid, 0.18 for bladder and 0.93 for stomach. The EAR weighting for female breast cancer increases from 0 to 0.3, if a generally observed change in the trend between female age-specific breast cancer incidence rates and attained age, associated with menopause, is accounted for in the EAR model. Application of this method to preferred models from a study of multi-model inference from many models fitted to the LSS leukemia mortality data, results in an EAR weighting of 0. From these results it can be seen that lifetime risk transfer is most highly weighted by EAR only for stomach cancer. 
However, the generalization and interpretation of radiation effect estimates based on the LSS cancer data, when projected to other populations, are particularly uncertain if considerable differences exist between site-specific baseline rates in the LSS and the other populations of interest. Definitive conclusions, regarding the appropriate method for transporting cancer risks, are limited by a lack of knowledge in several areas including unknown factors and uncertainties in biological mechanisms and genetic and environmental risk factors for carcinogenesis; uncertainties in radiation dosimetry; and insufficient statistical power and/or incomplete follow-up in data from radio-epidemiological studies.
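The two ways of applying an EAR weight that the abstract above mentions, additive and geometric weighted means, can be compared in a short sketch. The EAR weights 0.93 (stomach) and 0.03 (lung) are quoted from the abstract; the two lifetime-risk inputs are hypothetical placeholders, and the function names are invented for this illustration.

```python
def additive_transfer(risk_ear, risk_err, ear_weight):
    """Additive (arithmetic) weighted mean of EAR- and ERR-transferred risks."""
    return ear_weight * risk_ear + (1.0 - ear_weight) * risk_err


def geometric_transfer(risk_ear, risk_err, ear_weight):
    """Geometric weighted mean of EAR- and ERR-transferred risks."""
    return risk_ear ** ear_weight * risk_err ** (1.0 - ear_weight)


# Hypothetical lifetime risks (per cent per Sv) from each transfer model.
risk_ear = 0.8
risk_err = 0.2

# EAR weights quoted in the abstract for two cancer sites.
for site, ear_weight in [("stomach", 0.93), ("lung", 0.03)]:
    print(site,
          additive_transfer(risk_ear, risk_err, ear_weight),
          geometric_transfer(risk_ear, risk_err, ear_weight))
```

For positive risks the geometric mean never exceeds the additive one, so the choice of method matters most when the EAR- and ERR-based risks differ widely and the weight is far from 0 or 1.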

14.
Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this can be a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo-based method which addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies both with number of taxa and structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods.

15.
Aim Understanding the spatial patterns of species distribution and predicting the occurrence of high biological diversity and rare species are central themes in biogeography and environmental conservation. The aim of this study was to model and scrutinize the relative contributions of climate, topography, geology and land‐cover factors to the distributions of threatened vascular plant species in taiga landscapes in northern Finland. Location North‐east Finland, northern Europe. Methods The study was performed using a data set of 28 plant species and environmental variables at a 25‐ha resolution. Four different stepwise selection algorithms [Akaike information criterion (AIC), Bayesian information criterion (BIC), adaptive backfitting, cross selection] with generalized additive models (GAMs) were fitted to identify the main environmental correlates for species occurrences. The accuracies of the distribution models were evaluated using fourfold cross‐validation based on the area under the curve (AUC) derived from receiver operating characteristic plots. The GAMs were tentatively extrapolated to the whole study area and species occurrence probability maps were produced using GIS techniques. The effect of spatial autocorrelation on the modelling results was also tested by including autocovariate terms in the GAMs. Results According to the AUC values, the model performance varied from fair to excellent. The AIC algorithm provided the highest mean performance (mean AUC = 0.889), whereas the lowest mean AUC (0.851) was obtained from BIC. Most of the variation in the distribution of threatened plant species was related to growing degree days, temperature of the coldest month, water balance, cover of mire and mean elevation. In general, climate was the most powerful explanatory variable group, followed by land cover, topography and geology. 
Inclusion of the autocovariate only slightly improved the performance of the models and had a minor effect on the importance of the environmental variables. Main conclusions The results confirm that the landscape‐scale distribution patterns of plant species can be modelled well on the basis of environmental parameters. A spatial grid system with several environmental variables derived from remote sensing and GIS data was found to produce useful data sets, which can be employed when predicting species distribution patterns over extensive areas. Landscape‐scale maps showing the predicted occurrences of individual or multiple threatened plant species may provide a useful basis for focusing field surveys and allocating conservation efforts.

16.
Bayesian multimodel inference for geostatistical regression models   Cited by: 2 (self-citations: 0; citations by others: 2)
Johnson DS  Hoeting JA 《PloS one》2011,6(11):e25677
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.

17.
Latent class model diagnosis   Cited by: 1 (self-citations: 0; citations by others: 1)
Garrett ES  Zeger SL 《Biometrics》2000,56(4):1055-1067
In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the intention of hierarchical modeling. Problems arise when it is not clear how many disease classes are appropriate, creating a need for model selection and diagnostic techniques. Previous work has shown that the Pearson χ² statistic and the log-likelihood ratio G² statistic are not valid test statistics for evaluating latent class models. Other methods, such as information criteria, provide decision rules without providing explicit information about where discrepancies occur between a model and the data. Identifiability issues further complicate these problems. This paper develops procedures for assessing Markov chain Monte Carlo convergence and model diagnosis and for selecting the number of categories for the latent variable based on evidence in the data using Markov chain Monte Carlo techniques. Simulations and a psychiatric example are presented to demonstrate the effective use of these methods.

18.
Abstract. Dominance/diversity curves, displaying the relative abundances of the species within a community, have often been constructed from field data. Several ecological and statistical models of dominance/diversity have been proposed to explain the curves. Yet, rarely have curves of different models been fitted to field data. In this paper the appropriate parameters and methods of curve fitting for plant communities are described for the General Lognormal, Canonical Lognormal, Geometric, Broken Stick, Zipf and Zipf-Mandelbrot models. Distinction is made between fixed and optimised parameters, to clarify parameterisation of the models. It is concluded that all should be fitted by minimising the deviance in a ranked-abundance plot. Statistical tests of goodness of fit are discussed. It is concluded that consistency of fit between replicate quadrats of a community provides the best test. Curves of all the models discussed are fitted to data from a species-rich Spanish hay meadow, and to data from a New Zealand intertidal algal community. The Spanish meadow data are best fitted by General Lognormal. The New Zealand algal data are best fitted by Geometric or General Lognormal. Goodness of fit for a sample is usually relatively good or poor for all models, since much of the deviance comes from steps in the curve which none of the models can fit closely.

19.
Phylogenetic estimation has largely come to rely on explicitly model-based methods. This approach requires that a model be chosen and that that choice be justified. To date, justification has largely been accomplished through use of likelihood-ratio tests (LRTs) to assess the relative fit of a nested series of reversible models. While this approach certainly represents an important advance over arbitrary model selection, the best fit of a series of models may not always provide the most reliable phylogenetic estimates for finite real data sets, where all available models are surely incorrect. Here, we develop a novel approach to model selection, which is based on the Bayesian information criterion, but incorporates relative branch-length error as a performance measure in a decision theory (DT) framework. This DT method includes a penalty for overfitting, is applicable prior to running extensive analyses, and simultaneously compares all models being considered and thus does not rely on a series of pairwise comparisons of models to traverse model space. We evaluate this method by examining four real data sets and by using those data sets to define simulation conditions. In the real data sets, the DT method selects the same or simpler models than conventional LRTs. In order to lend generality to the simulations, codon-based models (with parameters estimated from the real data sets) were used to generate simulated data sets, which are therefore more complex than any of the models we evaluate. On average, the DT method selects models that are simpler than those chosen by conventional LRTs. Nevertheless, these simpler models provide estimates of branch lengths that are more accurate both in terms of relative error and absolute error than those derived using the more complex (yet still wrong) models chosen by conventional LRTs. This method is available in a program called DT-ModSel.

20.
Despite the fact that conceptual models of individual decision making under risk are deterministic, attempts to econometrically estimate risk preferences require some assumption about the stochastic nature of choice. Unfortunately, the consequences of making different assumptions are, at present, unclear. In this paper, we compare three popular error specifications (Fechner, contextual utility, and Luce error) for three different preference functionals (expected utility, rank-dependent utility, and a mixture of those two) using in- and out-of-sample selection criteria. We find drastically different inferences about structural risk preferences across the competing functionals and error specifications. Expected utility theory is least affected by the selection of the error specification. A mixture model combining the two conceptual models assuming contextual utility provides the best fit of the data both in- and out-of-sample.
