Similar literature
1.
Forecasts of species distributions under future climates are inherently uncertain, but there have been few attempts to describe this uncertainty comprehensively in a probabilistic manner. We developed a Monte Carlo approach that accounts for uncertainty within generalized linear regression models (parameter uncertainty and residual error), uncertainty among competing models (model uncertainty), and uncertainty in future climate conditions (climate uncertainty) to produce site-specific frequency distributions of occurrence probabilities across a species' range. We illustrated the method by forecasting suitable habitat for bull trout (Salvelinus confluentus) in the Interior Columbia River Basin, USA, under recent and projected 2040s and 2080s climate conditions. The 95% interval of total suitable habitat under recent conditions was estimated at 30.1–42.5 thousand km; this was predicted to decline to 0.5–7.9 thousand km by the 2080s. Projections for the 2080s showed that the great majority of stream segments would be unsuitable with high certainty, regardless of the climate data set or bull trout model employed. The largest contributor to uncertainty in total suitable habitat was climate uncertainty, followed by parameter uncertainty and model uncertainty. Our approach makes it possible to calculate a full distribution of possible outcomes for a species, and permits ready graphical display of uncertainty for individual locations and of total habitat.

2.
Huihang Liu, Xinyu Zhang. Biometrics, 2023, 79(3): 2050-2062
Advances in information technologies have made network data increasingly frequent in a spectrum of big data applications, and such data are often explored with probabilistic graphical models. To precisely estimate the precision matrix, we propose an optimal model averaging estimator for Gaussian graphs. We prove that the proposed estimator is asymptotically optimal when candidate models are misspecified. The consistency and the asymptotic distribution of the model averaging estimator, and the weight convergence, are also studied when at least one correct model is included in the candidate set. Furthermore, numerical simulations and a real data analysis on yeast genetic data are conducted to illustrate that the proposed method is promising.

3.
Hokeun Sun, Hongzhe Li. Biometrics, 2012, 68(4): 1197-1206
Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers in gene expression data can lead to wrong inference on the dependency structure among the genes. We propose an l1-penalized estimation procedure for sparse Gaussian graphical models that is robustified against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical Lasso for Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has a better biological interpretation than that obtained from the graphical Lasso.

4.
We propose to define the complexity of an ecological model as the statistical complexity of the output it produces. This allows for a direct comparison between data and model complexity. Working with univariate time series, we show that this measure 'blindly' discriminates among the different dynamical behaviours a model can exhibit. We then search a model parameter space in order to segment it into areas of different dynamical behaviour and calculate the maximum complexity a model can generate. Given a time series, and the problem of choosing among a number of ecological models to study it, we suggest that models whose maximum complexity is lower than the time series complexity should be disregarded because they are unable to reconstruct some of the structures contained in the data. Similar reasoning could be used to disregard subdomains of a model, as well as areas of unnecessarily high complexity. We suggest that model complexity so defined better captures the difficulty faced by a user in managing and understanding the behaviour of an ecological model than measures based on a model's 'size'.

5.
Most molecular graphics programs ignore any uncertainty in the atomic coordinates being displayed. Structures are displayed in terms of perfect points, spheres, and lines with no uncertainty. However, all experimental methods for defining structures, and many methods for predicting and comparing structures, associate uncertainties with each atomic coordinate. We have developed graphical representations that highlight these uncertainties. These representations are encapsulated in a new interactive display program, PROTEAND. PROTEAND represents structural uncertainty in three ways: (1) The traditional way: The program shows a collection of structures as superposed and overlapped stick-figure models. (2) Ellipsoids: At each atom position, the program shows an ellipsoid derived from a three-dimensional Gaussian model of uncertainty. This probabilistic model provides additional information about the relationship between atoms that can be displayed as a correlation matrix. (3) Rigid-body volumes: Using clouds of dots, the program can show the range of rigid-body motion of selected substructures, such as individual α helices. We illustrate the utility of these display modalities by applying PROTEAND to the globin family of proteins, and show that certain types of structural variation are best illustrated with different methods of display.

6.
Ghosh D, Lin DY. Biometrics, 2003, 59(4): 877-885
Dependent censoring occurs in longitudinal studies of recurrent events when the censoring time depends on the potentially unobserved recurrent event times. To perform regression analysis in this setting, we propose a semiparametric joint model that formulates the marginal distributions of the recurrent event process and dependent censoring time through scale-change models, while leaving the distributional form and dependence structure unspecified. We derive consistent and asymptotically normal estimators for the regression parameters. We also develop graphical and numerical methods for assessing the adequacy of the proposed model. The finite-sample behavior of the new inference procedures is evaluated through simulation studies. An application to recurrent hospitalization data taken from a study of intravenous drug users is provided.

7.
Density dependence in population growth rates is of immense importance to ecological theory and application, but is difficult to estimate. The Global Population Dynamics Database (GPDD), one of the largest collections of population time series available, has been extensively used to study cross-taxa patterns in density dependence. A major difficulty with assessing density dependence from time series is that uncertainty in population abundance estimates can cause strong bias in both tests and estimates of strength. We analyse 627 data sets in the GPDD using Gompertz population models and account for uncertainty via the Kalman filter. Results suggest that at least 45% of the time series display density dependence, but that it is weak and difficult to detect for a large fraction. When uncertainty is ignored, the magnitude of, and evidence for, density dependence are strong, illustrating that uncertainty in abundance estimates qualitatively changes conclusions about density dependence drawn from the GPDD.
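
As a rough illustration of the kind of calculation this abstract refers to (not the authors' code), the sketch below evaluates the Kalman-filter log-likelihood of a Gompertz state-space model on log-abundances. It assumes the common formulation x_t = a + b*x_{t-1} + process noise and y_t = x_t + observation noise, in which b < 1 indicates density dependence. The function name, the parameter names, and the example series are all hypothetical.

```python
import math


def gompertz_kalman_loglik(y, a, b, q, r, m0, p0):
    """Log-likelihood of observed log-abundances y under an assumed model:
        state:        x_t = a + b * x_{t-1} + N(0, q)
        observation:  y_t = x_t + N(0, r)
    (m0, p0) are the prior mean and variance of the first state x_1.
    """
    m, p = m0, p0
    loglik = 0.0
    for t, obs in enumerate(y):
        if t > 0:
            # Prediction step: push the filtered state through the Gompertz map.
            m = a + b * m
            p = b * b * p + q
        # Innovation (one-step-ahead forecast error) and its variance.
        innov = obs - m
        s = p + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + innov * innov / s)
        # Update step: blend prediction and observation with the Kalman gain.
        k = p / s
        m = m + k * innov
        p = (1.0 - k) * p
    return loglik


# Hypothetical log-abundance series, used only to show the call.
series = [2.1, 2.4, 2.2, 2.6, 2.5]
value = gompertz_kalman_loglik(series, a=0.5, b=0.8, q=0.05, r=0.1, m0=2.0, p0=1.0)
```

Setting r to zero recovers a fit that ignores observation uncertainty, which is the comparison the abstract describes; in practice the parameters would be estimated by maximizing this log-likelihood.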

8.
We present two models of optimal resource exploitation for sit-and-wait foragers. The first model assumes immediate recognition of site quality and that site quality does not change over time. This model predicts a forager's minimum acceptable site quality. We present a graphical analysis to show how (1) the distribution of site qualities, (2) the travel time between sites, (3) cost of search, and (4) expected duration of the foraging process influence the minimum acceptable rate. Our second model allows site qualities to change and relaxes the assumption of immediate recognition. This model defines conditions of (1) state duration, (2) recognition time, (3) site abundance, and (4) cost of search where the optimal policy is to stay put in a site regardless of experience. We discuss the implications of these models for the design and interpretation of field experiments of site use and habitat selection.

9.

Background

A variety of obstacles including bureaucracy and lack of resources have interfered with timely detection and reporting of dengue cases in many endemic countries. Surveillance efforts have turned to modern data sources, such as Internet search queries, which have been shown to be effective for monitoring influenza-like illnesses. However, few have evaluated the utility of web search query data for other diseases, especially those of high morbidity and mortality or where a vaccine may not exist. In this study, we aimed to assess whether web search queries are a viable data source for the early detection and monitoring of dengue epidemics.

Methodology/Principal Findings

Bolivia, Brazil, India, Indonesia and Singapore were chosen for analysis based on available data and adequate search volume. For each country, a univariate linear model was then built by fitting a time series of the fraction of Google search query volume for specific dengue-related queries from that country against a time series of official dengue case counts for a time-frame within 2003–2010. The specific combination of queries used was chosen to maximize model fit. Spurious spikes in the data were also removed prior to model fitting. The final models, fit using a training subset of the data, were cross-validated against both the overall dataset and a holdout subset of the data. All models were found to fit the data quite well, with validation correlations ranging from 0.82 to 0.99.

Conclusions/Significance

Web search query data were found to be capable of tracking dengue activity in Bolivia, Brazil, India, Indonesia and Singapore. Whereas traditional dengue data from official sources are often not available until after some substantial delay, web search query data are available in near real-time. These data represent a valuable complement to traditional dengue surveillance.

10.
Ecological indicators are often collected to detect and monitor environmental change. Statistical models are used to estimate natural variability, pre-existing trends, and environmental predictors of baseline indicator conditions. Establishing standard models for baseline characterization is critical to the effective design and implementation of environmental monitoring programs. An anthropogenic activity that requires monitoring is the development of Marine Renewable Energy sites. Currently, there are no standards for the analysis of environmental monitoring data for these development sites. Marine Renewable Energy monitoring data are used as a case study to develop and apply a model evaluation to establish best practices for characterizing baseline ecological indicator data. We examined a range of models, including six generalized regression models, four time series models, and three nonparametric models. Because monitoring data are not always normally distributed, we evaluated model ability to characterize normal and non-normal data using hydroacoustic metrics that serve as proxies for ecological indicator data. The nonparametric support vector regression and random forest models, and parametric state-space time series models generally were the most accurate in interpolating the normal metric data. Support vector regression and state-space models best interpolated the non-normally distributed data. If parametric results are preferred, then state-space models are the most robust for baseline characterization. Evaluation of a wide range of models provides a comprehensive characterization of the case study data, and highlights advantages of models rarely used in Marine Renewable Energy environmental monitoring. Our model findings are relevant for any ecological indicator data with similar properties, and the evaluation approach is applicable to any monitoring program.

11.
The behavior of females in search of a mate determines the likelihood that high quality males are encountered and adaptive search strategies rely on the effective use of available information on the quality of prospective mates. The sequential search strategy was formulated, like most models of search behavior, on the assumption that females obtain perfect information on the quality of encountered males. In this paper, we modify the strategy to allow for uncertainty of male quality and we determine how the magnitude of this uncertainty and the ability of females to inspect multiple male attributes to reduce uncertainty influence mate choice decisions. In general, searchers are sensitive to search costs and higher costs lower acceptance criteria under all versions of the model. The choosiness of searchers increases with the variability of the quality of prospective mates under conditions of the original model, but under conditions of uncertainty the choosiness of searchers may increase or decrease with the variability of inspected male attributes. The behavioral response depends on the functional relationship between observed male attributes and the fitness return to searchers and on costs associated with the search process. Higher uncertainty often induces searchers to pay more for information and under conditions of uncertainty the fitness return to searchers is never higher than under conditions of the original model. Further studies of the performance of alternative search strategies under conditions of uncertainty may consequently be necessary to identify search strategies likely to be used under natural conditions.

12.
Modeling vital rates improves estimation of population projection matrices
Population projection matrices are commonly used by ecologists and managers to analyze the dynamics of stage-structured populations. Building projection matrices from data requires estimating transition rates among stages, a task that often entails estimating many parameters with few data. Consequently, large sampling variability in the estimated transition rates increases the uncertainty in the estimated matrix and quantities derived from it, such as the population multiplication rate and sensitivities of matrix elements. Here, we propose a strategy to avoid overparameterized matrix models. This strategy involves fitting models to the vital rates that determine matrix elements, evaluating both these models and ones that estimate matrix elements individually with model selection via information criteria, and averaging competing models with multimodel averaging. We illustrate this idea with data from a population of Silene acaulis (Caryophyllaceae), and conduct a simulation to investigate the statistical properties of the matrices estimated in this way. The simulation shows that compared with estimating matrix elements individually, building population projection matrices by fitting and averaging models of vital-rate estimates can reduce the statistical error in the population projection matrix and quantities derived from it.
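
For readers unfamiliar with the "population multiplication rate" mentioned in this abstract, it is the dominant eigenvalue of the projection matrix. The sketch below computes it by power iteration. It is only an illustration under stated assumptions: the function name and the two-stage example matrix are hypothetical, and the method assumes a non-negative, primitive matrix so that the iteration converges.

```python
def population_multiplication_rate(matrix, iterations=200):
    """Dominant eigenvalue of a stage-structured projection matrix,
    found by power iteration on a stage vector kept normalised to sum to 1.
    """
    n = len(matrix)
    stage = [1.0 / n] * n  # start from an even stage distribution
    rate = 0.0
    for _ in range(iterations):
        # Project the population one time step forward.
        projected = [
            sum(matrix[i][j] * stage[j] for j in range(n)) for i in range(n)
        ]
        # Because `stage` sums to 1, the new total is the growth factor.
        rate = sum(projected)
        stage = [value / rate for value in projected]
    return rate


# Hypothetical two-stage matrix: the top row holds stage-1 retention (0.5)
# and adult fecundity (2.0); the bottom row holds the transition of
# stage 1 into the adult stage (0.5).
example = [[0.5, 2.0],
           [0.5, 0.0]]
rate = population_multiplication_rate(example)
```

Sampling error in any matrix element propagates into this rate, which is why the abstract focuses on reducing the number of independently estimated parameters.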

13.
Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data.

14.
We have developed theoretical models for analysis of X-ray diffuse scattering from protein crystals. A series of models are proposed to be used for experimental data with different degrees of precision. First, we propose the normal mode model, where conformational dynamics of a protein is assumed to occur mostly in a limited conformational subspace spanned by a small number of low-frequency normal modes in the protein. When high precision data are available, variances and covariances of the normal mode variables can be determined from experimental data using this model. For experimental data with lower degrees of precision, we introduce a series of simpler models. These models express the covariance matrix using relatively simple empirical correlation functions by assuming the correlation between a pair of atoms to be isotropic. As an application of these simpler models, we calculate diffuse-scattering patterns from a human lysozyme crystal to examine how each adjustable parameter in the models affects general features of the resulting patterns. The results of the calculation are summarized as follows. (1) The higher order scattering makes a significant contribution at high resolutions. (2) The resulting simulated patterns are sensitive to changes in correlation lengths of about 1 Å, as well as to changes of the functional form of the correlation function. (3) But only the “average” value of the intra- and intermolecular correlation lengths seems to determine the gross features of the pattern. (4) The effect of the atom-dependent amplitude of fluctuations is difficult to observe. © 1994 John Wiley & Sons, Inc.

15.
We propose a network structure-based model for heterosis, and investigate it relying on metabolite profiles from Arabidopsis. A simple feed-forward two-layer network model (the Steinbuch matrix) is used in our conceptual approach. It allows for directly relating structural network properties with biological function. Interpreting heterosis as increased adaptability, our model predicts that the biological networks involved show increasing connectivity of regulatory interactions. A detailed analysis of metabolite profile data reveals that the increasing-connectivity prediction is true for graphical Gaussian models in our data from early development. This mirrors properties of observed heterotic Arabidopsis phenotypes. Furthermore, the model predicts a limit for increasing hybrid vigor with increasing heterozygosity, a known phenomenon in the literature.

16.
Time series data provided by single-molecule Förster resonance energy transfer (smFRET) experiments offer the opportunity to infer not only model parameters describing molecular complexes, e.g., rate constants, but also information about the model itself, e.g., the number of conformational states. Resolving whether such states exist or how many of them exist requires a careful approach to the problem of model selection, here meaning discrimination among models with differing numbers of states. The most straightforward approach to model selection generalizes the common idea of maximum likelihood (selecting the most likely parameter values) to maximum evidence: selecting the most likely model. In either case, such an inference presents a tremendous computational challenge, which we here address by exploiting an approximation technique termed variational Bayesian expectation maximization. We demonstrate how this technique can be applied to temporal data such as smFRET time series; show superior statistical consistency relative to the maximum likelihood approach; compare its performance on smFRET data generated from experiments on the ribosome; and illustrate how model selection in such probabilistic or generative modeling can facilitate analysis of closely related temporal data currently prevalent in biophysics. Source code used in this analysis, including a graphical user interface, is available open source via http://vbFRET.sourceforge.net.

17.
This paper addresses the problems that arise when fitting graphical chain models to multidimensional data sets like the one at our disposal: the so-called graduates study. These models seem adequate, because the analysis of dependencies and associations among the variables of interest requires a multivariate statistical device that is rich enough to capture not only direct, but also indirect associations. Since the graphical chain model space is vast even for a moderate number of variables, appropriate strategies for fitting graphical chain models are needed. Here, a data-driven selection strategy is discussed in detail.

18.
Mate choice and uncertainty in the decision process
The behavior of females in search of a mate determines the likelihood that a high quality male is encountered in the search process and alternative search strategies provide different fitness returns to searchers. Models of search behavior are typically formulated on an assumption that the quality of prospective mates is revealed to searchers without error, either directly or by inspection of a perfectly informative phenotypic character. But recent theoretical developments suggest that the relative performance of a search strategy may be sensitive to any uncertainty associated with the to-be-realized fitness benefit of mate choice decisions. Indeed, uncertainty in the decision process is inevitable whenever unobserved male attributes influence the fitness of searchers. In this paper, we derive solutions to the sequential search strategy and the fixed sample search strategy for the general situation in which observed and unobserved male attributes affect the fitness consequences of female mate choice decisions and we determine how the magnitude of various parameters that are influential in the standard models alter these more general solutions. The distribution of unobserved attributes amongst prospective mates determines the uncertainty of mate choice decisions (the reliability of an observed male character as a predictor of male quality) and the realized functional relationship between an observed male character and the fitness return to searchers. The uncertainty of mate choice decisions induced by unobserved male attributes has no influence on the generalized model solutions. Thus, the results of earlier studies of these search models that rely on the use of a perfectly informative male character apply even if an observed male trait does not reveal the quality of prospective mates with certainty.
But the solutions are sensitive to any changes of the distribution of unobserved male attributes that alter the realized functional relationship between an observed character and the fitness return to searchers. For example, the standard sequential search model exhibits a reservation property (the acceptability of prospective mates is delimited by a unique threshold criterion), and the existence of this model property under generalized conditions depends critically on the association between the observed and unobserved male characters. In our formulations of the models we assumed that females use a single male character to evaluate the quality of prospective mates, but the model properties generalize to situations in which male quality is evaluated by a direct inspection of multiple male characters.

19.
We introduce and exemplify an efficient method for direct sampling from hyper-inverse Wishart distributions. The method relies very naturally on the use of standard junction-tree representation of graphs, and couples these with matrix results for inverse Wishart distributions. We describe the theory and resulting computational algorithms for both decomposable and nondecomposable graphical models. An example drawn from financial time series demonstrates application in a context where inferences on a structured covariance model are required. We discuss and investigate questions of scalability of the simulation methods to higher-dimensional distributions. The paper concludes with general comments about the approach, including its use in connection with existing Markov chain Monte Carlo methods that deal with uncertainty about the graphical model structure.

20.
MOTIVATION: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an 'ill-posed' inverse problem. METHODS: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. RESULTS: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
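
For orientation, the partial correlations in step (1) relate to the covariance matrix through the standard identity below. The symbols are notation introduced here, not taken from the abstract: \Sigma is the covariance matrix of the gene expression levels, \Omega = (\omega_{ij}) is its inverse, and \rho_{ij \mid \mathrm{rest}} is the partial correlation between genes i and j given all other genes. The particular regularized estimator the authors substitute for \Sigma is not specified in the abstract.

```latex
\Omega = \Sigma^{-1},
\qquad
\rho_{ij \mid \mathrm{rest}}
  = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}},
\quad i \neq j .
```

A zero partial correlation corresponds to a missing edge in the GGM. When the number of samples is smaller than the number of genes, the sample covariance matrix is singular and cannot be inverted directly, which is why a regularized estimate is needed in the small-sample setting.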
