Similar literature
1.
MOTIVATION: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS: We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY: The source code and datasets used are available from our Supplementary website.
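
The averaging step that BMA relies on can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes each candidate gene set (model) has already been fitted and summarized by its BIC and by its predicted probability for the class of interest, and it uses the standard BIC approximation with equal prior model probabilities. The function names are hypothetical.

```python
import math
from typing import List


def posterior_model_probs(bics: List[float]) -> List[float]:
    """Approximate p(M_k | data) ∝ exp(-BIC_k / 2), assuming equal prior model probabilities."""
    best = min(bics)
    # Subtracting the smallest BIC avoids underflow; the shift cancels on normalization.
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


def bma_class_probability(bics: List[float], model_probs: List[float]) -> float:
    """BMA prediction: each model's predicted class probability, weighted by its posterior probability."""
    post = posterior_model_probs(bics)
    return sum(p * q for p, q in zip(post, model_probs))


# Two hypothetical gene sets whose BIC values differ by 2 units.
print(posterior_model_probs([100.0, 102.0]))
print(bma_class_probability([100.0, 102.0], [0.9, 0.6]))
```

A BIC difference of 2 already gives the better model roughly 73% of the posterior weight, which is why BMA predictions tend to be dominated by a few small gene sets.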

2.
Model averaging is gaining popularity among ecologists for making inference and predictions. Methods for combining models include Bayesian model averaging (BMA) and Akaike's Information Criterion (AIC) model averaging. BMA can be implemented with different prior model weights, including the Kullback–Leibler prior associated with AIC model averaging, but it is unclear how the prior model weight affects model results in a predictive context. Here, we implemented BMA using the Bayesian Information Criterion (BIC) approximation to Bayes factors for building predictive models of bird abundance and occurrence in the Chihuahuan Desert of New Mexico. We examined how model predictive ability differed across four prior model weights, and how averaged coefficient estimates, standard errors and coefficients' posterior probabilities varied for 16 bird species. We also compared the predictive ability of BMA models to a best single-model approach. Overall, Occam's prior of parsimony provided the best predictive models. The Kullback–Leibler prior, however, generally favored complex models of lower predictive ability. BMA performed better than a best single-model approach regardless of the prior model weight for 6 out of 16 species. For 6 other species, the choice of the prior model weight affected whether BMA was better than the best single-model approach. Our results demonstrate that parsimonious priors may be favorable over priors that favor complexity for making predictions. The approach we present has direct applications in ecology for better predicting patterns of species' abundance and occurrence.
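
How a prior model weight enters BIC-based BMA can be sketched as follows. The sketch assumes the usual approximation p(M_j | D) ∝ π_j exp(-BIC_j / 2) and shows two priors: a uniform prior and the Kullback–Leibler prior, written here as log π_j = (k_j / 2) log n - k_j, the form under which BIC-based weights reduce exactly to AIC weights. The exact form of the Occam's prior used in the study is not given in the abstract, so it is not reproduced; all function names are illustrative.

```python
import math
from typing import List


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion for a model with k parameters fitted to n observations."""
    return -2.0 * loglik + k * math.log(n)


def normalize_log_weights(log_w: List[float]) -> List[float]:
    """Turn unnormalized log weights into probabilities without overflow."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    s = sum(w)
    return [x / s for x in w]


def posterior_model_weights(logliks: List[float], ks: List[int], n: int,
                            log_prior: List[float]) -> List[float]:
    """p(M_j | D) ∝ prior_j * exp(-BIC_j / 2)."""
    log_w = [lp - 0.5 * bic(ll, k, n) for ll, k, lp in zip(logliks, ks, log_prior)]
    return normalize_log_weights(log_w)


def uniform_log_prior(ks: List[int], n: int) -> List[float]:
    """Every model gets the same prior weight."""
    return [0.0 for _ in ks]


def kl_log_prior(ks: List[int], n: int) -> List[float]:
    """Kullback-Leibler prior: log prior = k/2 * log(n) - k (up to a constant); yields AIC weights."""
    return [0.5 * k * math.log(n) - k for k in ks]
```

The KL prior adds back most of BIC's log(n) penalty per parameter, so it tilts the weights toward larger models; this matches the abstract's observation that it favored more complex models.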

3.

Background  

Microarray technology is increasingly used to identify potential biomarkers for cancer prognostics and diagnostics. Previously, we have developed the iterative Bayesian Model Averaging (BMA) algorithm for use in classification. Here, we extend the iterative BMA algorithm for application to survival analysis on high-dimensional microarray data. The main goal in applying survival analysis to microarray data is to determine a highly predictive model of patients' time to event (such as death, relapse, or metastasis) using a small number of selected genes. Our multivariate procedure combines the effectiveness of multiple contending models by calculating the weighted average of their posterior probability distributions. Our results demonstrate that our iterative BMA algorithm for survival analysis achieves high prediction accuracy while consistently selecting a small and cost-effective number of predictor genes.

4.
We propose a simple method to provide a rapid and robust estimate of the short-term impacts of heat waves on mortality, to be used for communication within a heat warning system. The excess mortality during a heat wave is defined as the difference between the observed mortality over the period and the observed mortality over the same period during the N preceding years. This method was tested on 19 French cities between 1973 and 2007. In six cities, we compared the excess mortality to that obtained by modelling the temperature-mortality relationship. There was good agreement between the excess mortalities estimated by the simple indicator and by the models. Major differences were observed during the most extreme heat waves, in 1983 and 2003, and after the implementation of the heat prevention plan in 2006. Excluding these events, the mean difference between the estimates obtained by the two methods was 13 deaths [1:45]. A comparison of mortality with the previous years provides a simple estimate of the mortality impact of heat waves. It can be used to provide early and reliable information to stakeholders of the heat prevention plan, and to select heat waves that should be further investigated.
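
The excess-mortality indicator reduces to a subtraction. In this sketch the baseline is taken as the mean total over the N preceding years for the same calendar period; the abstract does not say whether a mean or another summary is used, so that choice, and the function name, are assumptions.

```python
from statistics import mean
from typing import Sequence


def excess_mortality(observed_daily: Sequence[int],
                     reference_daily_by_year: Sequence[Sequence[int]]) -> float:
    """Excess deaths during a heat wave.

    observed_daily: daily death counts over the heat-wave period in the current year.
    reference_daily_by_year: for each of the N preceding years, the daily death counts
    over the same calendar period (assumption: the baseline is the mean of their totals).
    """
    if not reference_daily_by_year:
        raise ValueError("at least one reference year is required")
    observed = sum(observed_daily)
    baseline = mean(sum(year) for year in reference_daily_by_year)
    return observed - baseline


# Hypothetical 3-day heat wave compared with the same 3 days in 3 preceding years.
print(excess_mortality([40, 40, 40], [[30, 30, 40], [30, 30, 30], [40, 40, 30]]))
```

Here 120 observed deaths against a baseline of 100 gives an excess of 20.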

5.
In response to environmental threats, numerous indicators have been developed to assess the impact of livestock farming systems on the environment. Some of them, notably those based on management practices, have been reported to have low accuracy. This paper reports the results of a study aimed at assessing whether accuracy can be increased at a reasonable cost by combining individual indicators into models. We focused on proxy indicators representing an alternative to direct measurement of the impact on two grassland bird species, the lapwing Vanellus vanellus and the redshank Tringa totanus. Models were developed using stepwise selection procedures or Bayesian model averaging (BMA). Sensitivity, specificity, and the probability of correctly ranking fields (area under the curve, AUC) were estimated for each individual indicator or model from observational data measured on 252 grazed plots over 2 years. The cost of implementing each model was computed as a function of the number and types of input variables. Among all management indicators, 50% had an AUC lower than or equal to 0.50 and thus were no better than a random decision. Independently of the statistical procedure, models combining management indicators were always more accurate than individual indicators, but only for lapwings. In redshanks, models based either on BMA or on some selection procedures were non-informative. Higher accuracy could be reached for both species with models mixing management and habitat indicators. However, this increase in accuracy was also associated with an increase in model cost. Models derived by BMA were more expensive and slightly less accurate than those derived with selection procedures. Analysing trade-offs between accuracy and cost of indicators opens promising application perspectives, as time-consuming and expensive indicators are likely to be of low practical utility.
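
The three accuracy measures named above can be computed directly from indicator scores and observed outcomes. This sketch assumes a binary outcome per plot (1 = species present or breeding, 0 = not; a hypothetical coding) and computes AUC as the probability that a randomly chosen positive plot outscores a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Classify a plot as positive when its score is >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability of correctly ranking a positive above a negative plot (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An indicator that gives every plot the same score has an AUC of exactly 0.5, the "random decision" benchmark used in the study.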

6.
In this paper we develop a hierarchical bivariate time series model to characterize the relationship between particulate matter less than 10 microns in aerodynamic diameter (PM10) and both mortality and hospital admissions for cardiovascular diseases. The model is applied to time series data on mortality and morbidity for 10 metropolitan areas in the United States from 1986 to 1993. We postulate that these time series should be related through a shared relationship with PM10. At the first stage of the hierarchy, we fit two seemingly unrelated Poisson regression models to produce city-specific estimates of the log relative rates of mortality and morbidity associated with exposure to PM10 within each location. The sample covariance matrix of the estimated log relative rates is obtained using a novel generalized estimating equation approach that takes into account the correlation between the mortality and morbidity time series. At the second stage, we combine information across locations to estimate overall log relative rates of mortality and morbidity and variation of the rates across cities. Using the combined information across the 10 locations we find that a 10 µg/m³ increase in average PM10 at the current day and previous day is associated with a 0.26% increase in mortality (95% posterior interval -0.37, 0.65), and a 0.71% increase in hospital admissions (95% posterior interval 0.35, 0.99). The log relative rates of mortality and morbidity have a similar degree of heterogeneity across cities: the posterior means of the between-city standard deviations of the mortality and morbidity air pollution effects are 0.42 (95% interval 0.05, 1.18), and 0.31 (95% interval 0.10, 0.89), respectively. The city-specific log relative rates of mortality and morbidity are estimated to have very low correlation, but the uncertainty in the correlation is very substantial (posterior mean = 0.20, 95% interval -0.89, 0.98).
With the parameter estimates from the model, we can predict the hospitalization log relative rate for a new city for which hospitalization data are unavailable, using that city's estimated mortality relative rate. We illustrate this prediction using New York as an example.
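
The second stage of the paper is a full Bayesian bivariate hierarchical model. The sketch below deliberately swaps in something much simpler, univariate normal-normal pooling with the between-city standard deviation tau treated as known, only to show how city estimates of different precision are combined. Names and inputs are illustrative assumptions.

```python
import math
from typing import List, Tuple


def pooled_log_rr(estimates: List[float], variances: List[float],
                  tau: float) -> Tuple[float, float]:
    """Normal-normal pooling of city-specific log relative rates.

    Each city's estimate b_i has sampling variance v_i; true city effects are assumed to
    vary around the overall effect with between-city SD tau (treated as known here).
    Returns the pooled estimate and its variance.
    """
    weights = [1.0 / (v + tau ** 2) for v in variances]  # precise cities get more weight
    total = sum(weights)
    estimate = sum(w * b for w, b in zip(weights, estimates)) / total
    return estimate, 1.0 / total


def percent_increase(log_rr: float) -> float:
    """Convert a log relative rate (per exposure increment) into a percent increase."""
    return 100.0 * (math.exp(log_rr) - 1.0)
```

Larger between-city heterogeneity (tau) flattens the weights and widens the pooled variance, which is the intuition behind reporting both an overall rate and its between-city standard deviation.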

7.
Wu CH, Drummond AJ. Genetics, 2011, 188(1):151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally we apply our method to a red colobus monkey data set as an example.

8.
Increases in drought and temperature stress in forest and woodland ecosystems are thought to be responsible for the rise in episodic mortality events observed globally. However, key climatic drivers common to mortality events and the impacts of future extreme droughts on tree survival have not been evaluated. Here, we characterize climatic drivers associated with documented tree die-off events across Australia using standardized climatic indices to represent the key dimensions of drought stress for a range of vegetation types. We identify a common probabilistic threshold associated with an increased risk of die-off across all the sites that we examined. We show that observed die-off events occur when water deficits and maximum temperatures are high and exist outside 98% of the observed range in drought intensity; this threshold was evident at all sites regardless of vegetation type and climate. The observed die-off events also coincided with at least one heat wave (three consecutive days above the 90th percentile for maximum temperature), emphasizing a pivotal role of heat stress in amplifying tree die-off and mortality processes. The joint drought intensity and maximum temperature distributions were modeled for each site to describe the co-occurrence of both hot and dry conditions and evaluate future shifts in climatic thresholds associated with the die-off events. Under a relatively dry and moderate warming scenario, the frequency of droughts capable of inducing significant tree die-off across Australia could increase from 1 in 24 years to 1 in 15 years by 2050, accompanied by a doubling in the occurrence of associated heat waves. By defining commonalities in drought conditions capable of inducing tree die-off, we show a strong interactive effect of water and high temperature stress and provide a consistent approach for assessing changes in the exposure of ecosystems to extreme drought events.
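
The heat-wave definition used above (three consecutive days above the 90th percentile for maximum temperature) is straightforward to operationalize. This sketch assumes a single daily series per site and computes the percentile over that whole series with inclusive interpolation; the study may have used a different reference period or percentile method.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def p90(tmax: Sequence[float]) -> float:
    """90th percentile of daily maximum temperature (inclusive interpolation)."""
    return quantiles(tmax, n=10, method="inclusive")[-1]


def heat_waves(tmax: Sequence[float], threshold: float,
               min_days: int = 3) -> List[Tuple[int, int]]:
    """Return (first_day, last_day) index pairs for runs of at least min_days
    consecutive days with tmax strictly above threshold."""
    events = []
    start = None
    for i, t in enumerate(tmax):
        if t > threshold:
            if start is None:
                start = i  # a hot run begins
        else:
            if start is not None and i - start >= min_days:
                events.append((start, i - 1))
            start = None  # any run has ended
    # A run may still be open at the end of the series.
    if start is not None and len(tmax) - start >= min_days:
        events.append((start, len(tmax) - 1))
    return events
```

For example, with a threshold of 35, the series [30, 36, 37, 38, 30, 36, 36, 30, 40, 41, 42, 43] contains two heat waves (days 1 to 3 and days 8 to 11); the two-day run on days 5 and 6 is too short to count.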

9.
A method is presented to statistically evaluate toxicity study design for dose–response assessment aimed at minimizing the uncertainty in resulting benchmark dose (BMD) estimates. Although the BMD method has been accepted as a valuable tool for risk assessment, the traditional no observed adverse effect level (NOAEL)/lowest observed adverse effect level (LOAEL) approach is still the principal basis for toxicological study design. To develop similar protocols for experimental design for BMD estimation, methods are needed that account for variability in experimental outcomes, and uncertainty in dose–response model selection and model parameter estimates. Based on Bayesian model averaging (BMA) BMD estimation, this study focuses on identifying the study design criteria that can reduce the uncertainty in BMA BMD estimates by using a Monte Carlo pre-posterior analysis on BMA BMD predictions. The results suggest that (1) as more animals are tested there is less uncertainty in BMD estimates; (2) one relatively high dose is needed and other doses can then be appropriately spread over the resulting dose scale; (3) placing different numbers of animals in different dose groups has very limited influence on improving BMD estimation; and (4) when the total number of animals is fixed, using more (but smaller) dose groups is a preferred strategy.

10.
Heat waves are expected to increase in frequency and magnitude with climate change. The first part of a study to produce projections of the effect of future climate change on heat-related mortality is presented. Separate city-specific empirical statistical models that quantify significant relationships between summer daily maximum temperature (Tmax) and daily heat-related deaths are constructed from historical data for six cities: Boston, Budapest, Dallas, Lisbon, London, and Sydney. 'Threshold temperatures' above which heat-related deaths begin to occur are identified. The results demonstrate significantly lower thresholds in 'cooler' cities exhibiting lower mean summer temperatures than in 'warmer' cities exhibiting higher mean summer temperatures. Analysis of individual 'heat waves' illustrates that a greater proportion of mortality is due to mortality displacement in cities with less sensitive temperature–mortality relationships than in those with more sensitive relationships, and that mortality displacement is no longer a feature more than 12 days after the end of the heat wave. Validation techniques through residual and correlation analyses of modelled and observed values and comparisons with other studies indicate that the observed temperature–mortality relationships are represented well by each of the models. The models can therefore be used with confidence to examine future heat-related deaths under various climate change scenarios for the respective cities (presented in Part 2).
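
One common way to locate a threshold temperature is to fit a piecewise-linear ("hockey-stick") model, deaths = a + b * max(Tmax - c, 0), and choose the breakpoint c with the smallest residual sum of squares over a grid of candidates. The abstract does not say the study used this exact procedure, so treat the sketch as an illustrative substitute; it also uses ordinary least squares rather than the count models often used for mortality.

```python
from typing import Iterable, Sequence, Tuple


def fit_threshold(tmax: Sequence[float], deaths: Sequence[float],
                  candidates: Iterable[float]) -> Tuple[float, float, float]:
    """Grid-search the breakpoint c of deaths = a + b * max(tmax - c, 0) by least squares.

    Returns (threshold, intercept, slope) for the candidate with the smallest
    residual sum of squares.
    """
    best = None
    n = len(tmax)
    my = sum(deaths) / n
    for c in candidates:
        x = [max(t - c, 0.0) for t in tmax]  # degrees above the candidate threshold
        mx = sum(x) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # x is constant (e.g. no day exceeds c): slope not identifiable
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, deaths))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, deaths))
        if best is None or sse < best[0]:
            best = (sse, c, intercept, slope)
    if best is None:
        raise ValueError("no candidate threshold lies below the observed temperatures")
    return best[1], best[2], best[3]
```

A lower fitted c for "cooler" cities would correspond to the pattern reported above.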

11.
The environmental changes caused by climate change represent a significant challenge to human societies. One part of this challenge will be greater heat-related mortality. Populations in the northern hemisphere will experience temperature increases exceeding the global average, but whether this will increase or decrease total temperature-related mortality burdens is debated. Here, we use distributed lag modeling to characterize temperature-mortality relationships in 15 Canadian cities. Further, we examine historical trends in temperature variation across Canada. We then develop city-specific general linear models to estimate change in high- and low-temperature-related mortality using dynamically downscaled climate projections for four future periods centred on 2040, 2060 and 2080. We find that the minimum mortality temperature is frequently located at approximately the 75th percentile of the city's temperature distribution, and that Canadians currently experience greater and longer lasting risk from cold-related than heat-related mortality. Additionally, we find no evidence that temperature variation is increasing in Canada. However, the projected increased temperatures are sufficient to change the relative levels of heat- and cold-related mortality in some cities. While most temperature-related mortality will continue to be cold-related, our models predict that higher temperatures will increase the burden of annual temperature-related mortality in Hamilton, London, Montreal and Regina, but result in slight to moderate decreases in the burden of mortality in the other 11 cities investigated.

12.
Summary: Model-based estimation of the effect of an exposure on an outcome is generally sensitive to the choice of which confounding factors are included in the model. We propose a new approach, which we call Bayesian adjustment for confounding (BAC), to estimate the effect of an exposure of interest on the outcome while accounting for the uncertainty in the choice of confounders. Our approach is based on specifying two models: (1) the outcome as a function of the exposure and the potential confounders (the outcome model); and (2) the exposure as a function of the potential confounders (the exposure model). We consider Bayesian variable selection on both models and link the two by introducing a dependence parameter, …, denoting the prior odds of including a predictor in the outcome model, given that the same predictor is in the exposure model. In the absence of dependence (…), BAC reduces to traditional Bayesian model averaging (BMA). In simulation studies, we show that BAC, with …, estimates the exposure effect with smaller bias than traditional BMA, and with improved coverage. We then compare BAC, a recent approach of Crainiceanu, Dominici, and Parmigiani (2008, Biometrika 95, 635–651), and traditional BMA on a time series data set of hospital admissions, air pollution levels, and weather variables in Nassau, NY, for the period 1999–2005. Using each approach, we estimate the short-term effects of … on emergency admissions for cardiovascular diseases, accounting for confounding. This application illustrates the potentially significant pitfalls of misusing variable selection methods in the context of adjustment uncertainty.

13.
14.
ABSTRACT: BACKGROUND: Inference about regulatory networks from high-throughput genomics data is of great interest in systems biology. We present a Bayesian approach to infer gene regulatory networks from time series expression data by integrating various types of biological knowledge. RESULTS: We formulate network construction as a series of variable selection problems and use linear regression to model the data. Our method summarizes additional data sources with an informative prior probability distribution over candidate regression models. We extend the Bayesian model averaging (BMA) variable selection method to select regulators in the regression framework. CONCLUSIONS: We demonstrate our method on simulated data and a set of time-series microarray experiments measuring the effect of a drug perturbation on gene expression levels, and show that it outperforms leading regression-based methods in the literature.
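
The "variable selection with an informative prior" step can be illustrated for one target gene. This is a minimal sketch, not the authors' algorithm: it assumes the candidate regulators' expression at the previous time point is the design matrix X, scores every subset of regulators with a Gaussian-regression BIC, multiplies in a prior built from hypothetical per-regulator prior inclusion probabilities (standing in for external biological knowledge), and reports posterior inclusion probabilities. Exhaustive enumeration is only feasible for a handful of candidates.

```python
import itertools
import math
from typing import List, Sequence

import numpy as np


def regulator_inclusion_probs(X: np.ndarray, y: np.ndarray,
                              prior_incl: Sequence[float]) -> List[float]:
    """Posterior inclusion probability of each candidate regulator (column of X).

    prior_incl[j] (strictly between 0 and 1) is the prior probability that
    regulator j belongs in the model; the model prior is their product.
    """
    n, p = X.shape
    log_post = {}
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ beta) ** 2))
            # Gaussian-regression BIC up to an additive constant.
            bic = n * math.log(rss / n) + design.shape[1] * math.log(n)
            log_prior = sum(math.log(prior_incl[j]) if j in subset
                            else math.log(1.0 - prior_incl[j]) for j in range(p))
            log_post[subset] = log_prior - 0.5 * bic
    m = max(log_post.values())
    weights = {s: math.exp(v - m) for s, v in log_post.items()}
    total = sum(weights.values())
    return [sum(w for s, w in weights.items() if j in s) / total for j in range(p)]
```

Raising a regulator's prior inclusion probability multiplies the prior odds of every model containing it, so well-supported external evidence can tip borderline regulators into the network.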

15.
Decadal changes in summer mortality in U.S. cities   Total citations: 2, self-citations: 0, citations by others: 2
Recent studies suggest that anthropogenic climate warming will result in higher heat-related mortality rates in U.S. cities than have been observed in the past. However, most of these analyses assume that weather-mortality relationships have not changed over time. We examine decadal-scale changes in relationships between human mortality and hot, humid weather for 28 U.S. cities with populations greater than one million. Twenty-nine years of daily total mortality rates, age-standardized to account for underlying demographic changes, are related to afternoon apparent temperatures (T(a)) and organized by decade for each city. Threshold T(a) values, or the T(a) at and above which mortality is significantly elevated, are calculated for each city, and the mortality rates on days when the threshold T(a) was exceeded are compared across decades. On days with high T(a), mortality rates were lower in the 1980s and 1990s than in the 1960s and 1970s in a majority of the cities. Regionally, northeastern and northern interior cities continue to exhibit elevated, albeit reduced, death rates on warm, humid days in the 1980s and 1990s, while most southern cities do not. The overall decadal decline in mortality in most cities is probably because of adaptations: increased use of air conditioning, improved health care, and heightened public awareness of the biophysical impacts of heat exposure. This finding of a more muted mortality response of the U.S. populace to high T(a) values over time raises doubts about the validity of projections of future U.S. mortality increases linked to potential greenhouse warming.

16.
Establishing that a set of population-splitting events occurred at the same time can be a potentially persuasive argument that a common process affected the populations. Recently, Oaks et al. (2013) assessed the ability of an approximate-Bayesian model-choice method (msBayes) to estimate such a pattern of simultaneous divergence across taxa, to which Hickerson et al. (2014) responded. Both papers agree that the primary inference enabled by the method is very sensitive to prior assumptions and often erroneously supports shared divergences across taxa when prior uncertainty about divergence times is represented by a uniform distribution. However, the papers differ about the best explanation and solution for this problem. Oaks et al. (2013) suggested the method's behavior was caused by the strong weight of uniformly distributed priors on divergence times leading to smaller marginal likelihoods (and thus smaller posterior probabilities) of models with more divergence-time parameters (Hypothesis 1); they proposed alternative prior probability distributions to avoid such strongly weighted posteriors. Hickerson et al. (2014) suggested numerical-approximation error causes msBayes analyses to be biased toward models of clustered divergences because the method's rejection algorithm is unable to adequately sample the parameter space of richer models within reasonable computational limits when using broad uniform priors on divergence times (Hypothesis 2). As a potential solution, they proposed a model-averaging approach that uses narrow, empirically informed uniform priors. Here, we use analyses of simulated and empirical data to demonstrate that the approach of Hickerson et al. (2014) does not mitigate the method's tendency to erroneously support models of highly clustered divergences, and is dangerous in the sense that the empirically derived uniform priors often exclude from consideration the true values of the divergence-time parameters.
Our results also show that the tendency of msBayes analyses to support models of shared divergences is primarily due to Hypothesis 1, whereas Hypothesis 2 is an untenable explanation for the bias. Overall, this series of papers demonstrates that if our prior assumptions place too much weight in unlikely regions of parameter space such that the exact posterior supports the wrong model of evolutionary history, no amount of computation can rescue our inference. Fortunately, as predicted by fundamental principles of Bayesian model choice, more flexible distributions that accommodate prior uncertainty about parameters without placing excessive weight in vast regions of parameter space with low likelihood increase the method's robustness and power to detect temporal variation in divergences.

17.
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both. In particular, the framework known as approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced population annealing, one of the population Monte Carlo algorithms. Although population annealing is usually used in statistical mechanics, we showed that it can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with the non-identifiability of representative parameter values, we proposed running simulations with a parameter ensemble sampled from the posterior distribution, which we call the “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm for generating a posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and to conduct model selection based on the Bayes factor.
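
Population annealing itself is beyond a short example, but the quantities it targets can be illustrated with the simplest ABC scheme, plain rejection sampling, substituted here for illustration. With a 0/1 acceptance kernel and fixed tolerance eps, the accepted draws approximate the ABC posterior (a "posterior parameter ensemble"), and the acceptance rate is proportional to the ABC marginal likelihood, so the ratio of two models' acceptance rates (same eps and distance) estimates an ABC Bayes factor. The prior, simulator and distance in the usage are toy assumptions.

```python
import random
from typing import Callable, List, Tuple


def abc_rejection(prior_sample: Callable[[random.Random], float],
                  simulate: Callable[[float, random.Random], float],
                  distance: Callable[[float, float], float],
                  observed: float, eps: float, n_draws: int,
                  seed: int = 0) -> Tuple[List[float], float]:
    """Plain ABC rejection sampling.

    Keeps prior draws whose simulated data lie within eps of the observed data.
    Returns the accepted parameters and the acceptance rate.
    """
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= eps:
            accepted.append(theta)
    return accepted, len(accepted) / n_draws


# Toy model: Uniform(0, 10) prior, noiseless simulator, absolute distance.
ensemble, rate = abc_rejection(lambda r: r.uniform(0.0, 10.0),
                               lambda th, r: th,
                               lambda a, b: abs(a - b),
                               observed=5.0, eps=0.5, n_draws=10_000)
```

Plain rejection becomes hopelessly inefficient as eps shrinks or the prior widens; sequential schemes such as population annealing address exactly that cost.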

18.
Understanding the mechanisms underlying the observed dynamics of complex biological systems requires the statistical assessment and comparison of multiple alternative models. Although this has traditionally been done using maximum likelihood-based methods such as Akaike's Information Criterion (AIC), Bayesian methods have gained in popularity because they provide more informative output in the form of posterior probability distributions. However, comparison between multiple models in a Bayesian framework is made difficult by the computational cost of numerical integration over large parameter spaces. A new, efficient method for the computation of posterior probabilities has recently been proposed and applied to complex problems from the physical sciences. Here we demonstrate how nested sampling can be used for inference and model comparison in biological sciences. We present a reanalysis of data from experimental infection of mice with Salmonella enterica showing the distribution of bacteria in liver cells. In addition to confirming the main finding of the original analysis, which relied on AIC, our approach provides: (a) integration across the parameter space, (b) estimation of the posterior parameter distributions (with visualisations of parameter correlations), and (c) estimation of the posterior predictive distributions for goodness-of-fit assessments of the models. The goodness-of-fit results suggest that alternative mechanistic models and a relaxation of the quasi-stationary assumption should be considered.
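
The core of nested sampling (Skilling's algorithm) can be sketched for a single parameter with a Uniform(0, 1) prior. This toy version assumes the prior volume shrinks deterministically by a factor exp(-1/N) per iteration, replaces the worst live point by brute-force rejection sampling from the prior, and assumes the likelihood has no flat plateaus (ties would stall the rejection loop). Realistic applications use far more efficient constrained samplers; the function names are illustrative.

```python
import math
import random
from typing import Callable


def _logaddexp(a: float, b: float) -> float:
    """log(exp(a) + exp(b)) computed stably; either argument may be -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def nested_sampling_log_evidence(log_likelihood: Callable[[float], float],
                                 n_live: int = 100, max_iter: int = 5000,
                                 tol: float = 1e-2, seed: int = 0) -> float:
    """Estimate log Z, where Z is the integral of L(theta) over a Uniform(0, 1) prior."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    log_z = -math.inf
    log_x_prev = 0.0  # log prior volume inside the current likelihood contour
    shrink = math.log1p(-math.exp(-1.0 / n_live))  # log(1 - exp(-1/N))
    for i in range(1, max_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        threshold = live_logl[worst]
        # Shell between X_{i-1} and X_i = exp(-i/N) has width X_{i-1} * (1 - exp(-1/N)).
        log_z = _logaddexp(log_z, threshold + log_x_prev + shrink)
        # Replace the worst point with a prior draw whose likelihood exceeds the threshold.
        while True:
            t = rng.random()
            ll = log_likelihood(t)
            if ll > threshold:
                break
        live[worst], live_logl[worst] = t, ll
        log_x_prev = -i / n_live
        # Stop once the live points could add at most a fraction tol to Z.
        if max(live_logl) + log_x_prev < log_z + math.log(tol):
            break
    # Add the remaining live points, each representing an equal share of the last volume.
    m = max(live_logl)
    log_mean_l = m + math.log(sum(math.exp(ll - m) for ll in live_logl) / n_live)
    return _logaddexp(log_z, log_mean_l + log_x_prev)


def gauss_logl(t: float, mu: float = 0.5, sigma: float = 0.05) -> float:
    """Normal log-density in t; its integral over [0, 1] is essentially 1, so log Z ≈ 0."""
    return -0.5 * ((t - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


print(nested_sampling_log_evidence(gauss_logl))
```

Comparing log Z across competing models gives log Bayes factors directly, which is the model-comparison use described above.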

19.
Heat wave impacts on mortality in Shanghai, 1998 and 2003   Total citations: 2, self-citations: 0, citations by others: 2
A variety of research has linked extreme heat to heightened levels of daily mortality and, not surprisingly, the heat waves of both 1998 and 2003 led to elevated mortality in Shanghai, China. Although the heat waves in the two years were similar in meteorological character, elevated mortality was much more pronounced during the 1998 event, and it remains unclear why the human response differed so much. In order to explain the differences in human mortality between the two years' heat waves, and to better understand how heat impacts human health, we examine a wide range of meteorological, pollution, and social variables in Shanghai during the summers (15 June to 15 September) of 1998 and 2003. Thus, the goal of this study is to determine what was responsible for the varying human health response during the two heat events. A multivariate analysis is used to investigate the relationships between mortality and heat wave intensity, duration, and timing within the summer season, along with levels of air pollution. It was found that for heat waves in both summers, mortality was strongly associated with the duration of the heat wave. In addition, while slightly higher than average, the air pollution levels for the two heat waves were similar and cannot fully explain the observed differences in human mortality. Finally, since the meteorological conditions and pollution levels for the two heat waves were alike, we conclude that improvements in living conditions in Shanghai, such as increased use of air conditioning, larger living areas, and increased urban green space, along with higher levels of heat awareness and the implementation of a heat warning system, were responsible for the lower levels of human mortality in 2003 compared to 1998.

20.
What stops parasites becoming ever more virulent? Conventional wisdom and most parasite-centred models of the evolution of virulence suppose that risk of host (and, hence, parasite) death imposes selection against more virulent strains. Here we selected for high and low virulence within each of two clones of the rodent malaria parasite Plasmodium chabaudi on the basis of between-host differences in a surrogate measure of virulence (loss of live weight post-infection). Despite imposing strong selection for low virulence which mimicked 50-75% host mortality, the low virulence lines increased in virulence as much as the high virulence lines. Thus, artificial selection on between-host differences in virulence was unable to counteract natural selection for increased virulence caused by within-host selection processes. The parasite's asexual replication rate and number of sexual transmission forms also increased in all lines, consistent with evolutionary models explaining high virulence. An upper bound to virulence, though not the asexual replication rate, was apparent, but this bound was not imposed by host mortality. Thus, we found evidence of the factors assumed to drive evolution of increased virulence, but not those thought to counter this selection.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417