Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
We prove that the generalized Poisson distribution GP(theta, eta) (eta ≥ 0) is a mixture of Poisson distributions; this is a new property for a distribution which is the topic of the book by Consul (1989). Because we find that the fits to count data of the generalized Poisson and negative binomial distributions are often similar, to understand their differences, we compare the probability mass functions and skewnesses of the generalized Poisson and negative binomial distributions with the first two moments fixed. They have slight differences in many situations, but their zero-inflated distributions, with masses at zero, means and variances fixed, can differ more. These probabilistic comparisons are helpful in selecting a better fitting distribution for modelling count data with long right tails. Through a real example of count data with a large zero fraction, we illustrate how the generalized Poisson and negative binomial distributions as well as their zero-inflated distributions can be discriminated.
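As a rough illustration of the moment-matched comparison described in the abstract above, the following Python sketch evaluates both probability mass functions with a common mean and variance. It assumes Consul's parametrization of the generalized Poisson distribution, P(X = x) = theta (theta + eta x)^(x-1) exp(-theta - eta x) / x!, with mean theta / (1 - eta) and variance theta / (1 - eta)^3. The function names and the example mean and variance are hypothetical and are not taken from the paper.

```python
import math


def gp_pmf(x, theta, eta):
    """Generalized Poisson pmf (assumed Consul parametrization, 0 <= eta < 1)."""
    log_p = (math.log(theta) + (x - 1) * math.log(theta + eta * x)
             - theta - eta * x - math.lgamma(x + 1))
    return math.exp(log_p)


def nb_pmf(x, r, p):
    """Negative binomial pmf: Gamma(x + r) / (x! Gamma(r)) * p**r * (1 - p)**x."""
    log_p = (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
             + r * math.log(p) + x * math.log1p(-p))
    return math.exp(log_p)


def match_moments(mean, var):
    """Return ((theta, eta), (r, p)) so both distributions share mean and variance."""
    if var <= mean:
        raise ValueError("this sketch requires variance > mean (overdispersion)")
    eta = 1.0 - math.sqrt(mean / var)   # from var / mean = 1 / (1 - eta)**2
    theta = mean * (1.0 - eta)          # from mean = theta / (1 - eta)
    r = mean ** 2 / (var - mean)        # from var = mean + mean**2 / r
    p = mean / var                      # success probability of the NB
    return (theta, eta), (r, p)


if __name__ == "__main__":
    # Hypothetical target moments, chosen only for illustration.
    (theta, eta), (r, p) = match_moments(mean=2.0, var=6.0)
    for x in range(6):
        print(x, round(gp_pmf(x, theta, eta), 4), round(nb_pmf(x, r, p), 4))
```

Printing the two columns side by side for a few values of x is a quick way to see where the moment-matched distributions agree and where they diverge.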

2.
Aim A broad suite of climate data sets is becoming available for use in predictive species modelling. We compare the efficacy of using interpolated climate surfaces [Center for Resource and Environmental Studies (CRES) and Climate Research Unit (CRU)] or high‐resolution model‐derived climate data [Division of Atmospheric Research limited‐area model (DARLAM)] for predictive species modelling, using tick distributions from sub‐Saharan Africa. Location The analysis is restricted to sub‐Saharan Africa. The study area was subdivided into 3000 grid cells with a resolution of 60 × 60 km. Methods Species distributions were predicted using an established multivariate climate envelope modelling approach and three very different climate data sets. The recorded variance in the climate data sets was quantified by employing omnidirectional variograms. To further compare the interpolated tick distributions that flowed from using three climate data sets, we calculated true positive (TP) predictions, false negative (FN) predictions as well as the proportional overlaps between observed and modelled tick distributions. In addition, the effect of tick data set size on the performance of the climate data sets was evaluated by performing random draws of known tick distribution records without replacement. Results The predicted distributions were consistently wider ranging than the known records when using any of the three climate data sets. However, the proportional overlap between predicted and known distributions varied as follows: for Rhipicephalus appendiculatus Neumann (Acari: Ixodidae), these were 60%, 60% and 70%; for Rhipicephalus longus Neumann (Acari: Ixodidae) 60%, 57% and 75%; for Rhipicephalus zambeziensis Walker, Norval & Corwin (Acari: Ixodidae) 57%, 51% and 62%, and for Rhipicephalus capensis Koch (Acari: Ixodidae) 70%, 60% and 60% using the CRES, CRU and DARLAM climate data sets, respectively. 
All data sets were sensitive to data size but DARLAM performed better when using smaller species data sets. At a 20% data subsample level, DARLAM was able to capture more than 50% of the known records and captured more than 60% of known records at higher subsample levels. Main conclusions The use of data derived from high‐resolution nested climate models (e.g. DARLAM) provided equal or even better species distribution modelling performance. As the model is dynamic and process based, the output data are available at the modelled resolution, and are not hamstrung by the sampling intensity of observed climate data sets (c. one sample per 30,000 km² for Africa). In addition, when exploring the biodiversity consequences of climate change, these modelled outputs form a more useful basis for comparison with modelled future climate scenarios.

3.
Aim To investigate the application of environmental modelling to reconstructive mapping of pre‐impact vegetation using historical survey records and remnant vegetation data. Location The higher elevation regions of the Fleurieu Peninsula region in South Australia were selected as a case study. The Fleurieu Peninsula is an area typical of many agricultural regions in temperate Australia that have undergone massive environmental transformation since European settlement. Around 9% of the present land cover is remnant vegetation and historical survey records from the ad 1880s exist. It is a region with strong gradients in climate and topography. Methods Records of pre‐impact vegetation distribution made in surveyors’ field notebooks were transcribed into a geographical information system and the spatial and classificatory accuracy of these records was assessed. Maps of remnant vegetation distribution were obtained. Analysis was undertaken to quantify the environmental domains of historical survey record and remnant vegetation data to selected meso‐scaled climatic parameters and topo‐scaled terrain‐related indices at a 20 m resolution. An exploratory analytical procedure was used to quantify the probability of occurrence of vegetation types in environmental domains. Probability models spatially extended to geographical space produce maps of the probability of occurrence of vegetation types. Individual probability maps were combined to produce a pre‐impact vegetation map of the region. Results Surveyors’ field notebook records provide reliable information that is accurately locatable to levels of resolution such that the vegetation data can be spatially correlated with environmental variables generated on 20 m resolution environmental data sets. Historical survey records of vegetation were weakly correlated with the topo‐scaled environmental variables but were correlated with meso‐scaled climate. 
Remnant vegetation records were similarly correlated with climate but also displayed stronger relationships with the topo‐scaled environmental variables, particularly slope. Main conclusions A major conclusion of this study is that multiple sources of evidence are required to reconstruct past vegetation patterns in heavily transformed regions. Neither the remnant vegetation data nor historical survey records provided adequate data sets on their own to reconstruct the pre‐impact vegetation of the Fleurieu Peninsula. Multiple sources of evidence provide the only means of assessing the environmental and historical representativeness of data sets. The spatial distribution of historical survey records was more environmentally representative than remnant vegetation data, which reflect biases due to land clearance. Historical survey records were also shown to be classificatorily and spatially accurate, and are thus suitable for quantitative spatial analyses. Analysis of different spatial vegetation data sets in an environmental modelling framework provided a rigorous means of assessing and comparing respective data sets as well as mapping their predicted distributions based on quantitative correlations. The method could be usefully applied to other regions where predictions of pre‐impact vegetation cover are required.

4.
The question of how to characterize the bacterial density in a body of water when data are available as counts from a number of small-volume samples was examined for cases where either the Poisson or negative binomial probability distributions could be used to describe the bacteriological data. The suitability of the Poisson distribution when replicate analyses were performed under carefully controlled conditions and of the negative binomial distribution for samples collected from different locations and over time were illustrated by two examples. In cases where the negative binomial distribution was appropriate, a procedure was given for characterizing the variability by dividing the bacterial counts into homogeneous groups. The usefulness of this procedure was illustrated for the second example based on survey data for Lake Erie. A further illustration of the difference between results based on the Poisson and negative binomial distributions was given by calculating the probability of obtaining all samples sterile, assuming various bacterial densities and sample sizes.
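The closing calculation in the abstract above, the probability that every sample is sterile, can be sketched as follows. This is only an illustration under stated assumptions: counts per sample are independent, the expected count per sample is density × volume, and the negative binomial is parametrized by its mean and a dispersion parameter k (smaller k means more clumping). The function names and all numerical values are hypothetical and are not taken from the paper.

```python
import math


def p_all_sterile_poisson(density, volume, n_samples):
    """P(all n samples contain zero bacteria) under a Poisson model."""
    mean_count = density * volume
    return math.exp(-n_samples * mean_count)


def p_all_sterile_negbin(density, volume, n_samples, k):
    """Same probability under a negative binomial with mean m and dispersion k.

    A single sample is sterile with probability (1 + m / k) ** (-k).
    """
    mean_count = density * volume
    return (1.0 + mean_count / k) ** (-k * n_samples)


if __name__ == "__main__":
    # Hypothetical: 0.5 organisms per mL, 1 mL samples, 5 samples, k = 0.5.
    print(p_all_sterile_poisson(0.5, 1.0, 5))
    print(p_all_sterile_negbin(0.5, 1.0, 5, k=0.5))
```

For the same mean density, the negative binomial assigns a higher probability to an all-sterile result than the Poisson does, and the two agree as k grows large.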

5.
Molecular loci that fail relative-rate tests are said to be "overdispersed." Traditional molecular-clock approaches to estimating divergence times do not take this into account. In this study, a method was developed to estimate divergence times using loci that may be overdispersed. The approach was to replace the traditional Poisson process assumption with a more general stationary process assumption. A probability model was developed, and an accompanying computer program was written to find maximum-likelihood estimates of divergence times under both the Poisson process and the stationary process assumptions. In simulation, it was shown that confidence intervals under the traditional Poisson assumptions often vastly underestimate the true confidence limits for overdispersed loci. Both models were applied to two data sets: one from land plants, the other from the higher metazoans. In both cases, the traditional Poisson process model could be rejected with high confidence. Maximum-likelihood analysis of the metazoan data set under the more general stationary process suggested that their radiation occurred well over a billion years ago, but confidence intervals were extremely wide. It was also shown that a model consistent with a Cambrian (or nearly Cambrian) origination of the animal phyla, although significantly less likely than a much older divergence, fitted the data well. It is argued that without an a priori understanding of the variance in the time between substitutions, molecular data sets may be incapable of ever establishing the age of the metazoan radiation.

6.
1. State space models are starting to replace simpler time series models in analyses of temporal dynamics of populations that are not perfectly censused. By simultaneously modelling both the dynamics and the observations, consistent estimates of population dynamical parameters may be obtained. For many data sets, the distribution of observation errors is unknown and error models are typically chosen in an ad hoc manner. 2. To investigate the influence of the choice of observation error on inferences, we analyse the dynamics of a replicated time series of red kangaroo surveys using a state space model with linear state dynamics. Surveys were performed through aerial counts and Poisson, overdispersed Poisson, normal and log-normal distributions may all be adequate for modelling observation errors for the data. We fit each of these to the data and compare them using AIC. 3. The state space models were fitted with maximum likelihood methods using a recent importance sampling technique that relies on the Kalman filter. The method relaxes the assumption of Gaussian observation errors required by the basic Kalman filter. Matlab code for fitting linear state space models with Poisson observations is provided. 4. The ability of AIC to identify the correct observation model was investigated in a small simulation study. For the parameter values used in the study, without replicated observations, the correct observation distribution could sometimes be identified but model selection was prone to misclassification. On the other hand, when observations were replicated, the correct distribution could typically be identified. 5. Our results illustrate that inferences may differ markedly depending on the observation distributions used, suggesting that choosing an adequate observation model can be critical. 
Model selection and simulations show that for the models and parameter values in this study, a suitable observation model can typically be identified if observations are replicated. Model selection and replication of observations, therefore, provide a potential solution when the observation distribution is unknown.

7.
It is shown that any discrete distribution with non-negative support has a representation in terms of an extended Poisson process (or pure birth process). A particular extension of the simple Poisson process is proposed: one that admits a variety of distributions; the equations for such processes may be readily solved numerically. An analytical approximation for the solution is given, leading to approximate mean-variance relationships. The resulting distributions are then applied to analyses of some biological data-sets.

8.
Bivariate cumulative damage models are proposed where the responses given the damages are independent random variables. The bivariate damage process can be either bivariate Poisson or bivariate gamma. A bivariate continuous cumulative damage model is investigated in which the responses given the damages have gamma distributions. In this case evaluation of the joint density function and bivariate tail probability function is facilitated by expanding the gamma distributions of the conditional responses by Laguerre polynomials. This approach also leads to evaluation of associated survival models. Moments and estimating equations are discussed. In addition, a bivariate discrete cumulative damage model is investigated in which the responses given the damages have a distribution chosen from a class that includes the negative binomial, the Neyman Type‐A, the Polya‐Aeppli, and the Lagrangian Poisson. Probabilities are obtained from recursive formulas which do not involve cancellation error as all quantities are non‐negative. Moments and estimating equations are presented for these models also. The continuous and the discrete models are applied to describe the rise of systolic and diastolic blood pressure with age.

9.
Although it is widely predicted that the geographic distributions of tree species and forest types will undergo substantial shifts in future, modelling approaches used to date are largely unable to project the pace at which forest distributions will respond to environmental change. The expansion and contraction of forest distributions act against considerable demographic inertia in the present composition and size‐structure of forest stands as climate‐induced changes in growth, mortality, and recruitment alter population dynamics through time. We aimed to better understand how shifts in forest distributions reflect long‐term changes in tree demographic rates and population dynamics, and how such shifts are influenced by 1) disturbance from forest harvesting and 2) local environmental heterogeneity. Using a simple, data‐constrained gap model, we simulated regional forest dynamics in the eastern United States over the next 500 yr. We then compared the geographic distributions of five different forest types through time under present and altered climatic conditions, in scenarios that variously included and excluded forest harvesting and environmental heterogeneity. Although we held climate fixed after 100 yr, it took another 160 yr after this for these forest types to collectively experience 90% of their eventual climate‐related distribution gains and losses. Competition strongly affected the nature of responses to climate change. Harvesting accelerated and amplified gains by an early‐successional forest type at the expense of a late‐successional one, but these gains did not occur faster than those for other forest types. Environmental heterogeneity had little effect on distribution gains or losses through time. These findings indicate that forest distributions should respond quite slowly to climate change, with the leading and trailing edges of different forest types shifting over a span of centuries. 
Disturbances can expedite some transitions, but are unlikely to lead to wholesale changes in forest types in the coming decades.

10.
11.
Comparison of frequency distributions in flow cytometry   (Total citations: 2; self-citations: 0; citations by others: 2)

12.
Species abundances are undoubtedly the most widely available macroecological data, but can we use them to distinguish among several models of community structure? Here we present a Bayesian analysis of species‐abundance data that yields a full joint probability distribution of each model's parameters plus a relatively parameter‐independent criterion, the posterior Bayes factor, to compare these models. We illustrate our approach by comparing three classical distributions: the zero‐sum multinomial (ZSM) distribution, based on Hubbell's neutral model, the multivariate Poisson lognormal distribution (MPLN), based on niche arguments, and the discrete broken stick (DBS) distribution, based on MacArthur's broken stick model. We give explicit formulas for the probability of observing a particular species‐abundance data set in each model, and argue that conditioning on both sample size and species count is needed to allow comparisons between the two distributions. We apply our approach to two neotropical communities (trees, fish). We find that DBS is largely inferior to ZSM and MPLN for both communities. The tree data do not allow discrimination between ZSM and MPLN, but for the fish data ZSM (neutral model) overwhelmingly outperforms MPLN (niche model), suggesting that dispersal plays a previously underestimated role in structuring tropical freshwater fish communities. We advocate this approach for identifying the relative importance of dispersal and niche‐partitioning in determining diversity of different ecological groups of species under different environmental conditions.

13.
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

14.
Accurately estimating probabilities from observations is important for probabilistic-based approaches to problems in computational biology. In this paper we present a biologically-motivated method for estimating probability distributions over discrete alphabets from observations using a mixture model of common ancestors. The method is an extension of substitution matrix-based probability estimation methods. In contrast to previous such methods, our method has a simple Bayesian interpretation and has the advantage over Dirichlet mixtures that it is both effective and simple to compute for large alphabets. The method is applied to estimate amino acid probabilities based on observed counts in an alignment and is shown to perform comparably to previous methods. The method is also applied to estimate probability distributions over protein families and improves protein classification accuracy.

15.
Current circumstances — that the majority of species distribution records exist as presence‐only data (e.g. from museums and herbaria), and that there is an established need for predictions of species distributions — mean that scientists and conservation managers seek to develop robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transfer of analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence‐only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence–absence data were recorded. Models developed with absences inferred from the total set of presence‐only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo‐absences were drawn randomly from the study area. 
The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.

16.
Aim (1) To increase awareness of the challenges induced by imperfect detection, which is a fundamental issue in species distribution modelling; (2) to emphasize the value of replicate observations for species distribution modelling; and (3) to show how ‘cheap’ checklist data in faunal/floral databases may be used for the rigorous modelling of distributions by site‐occupancy models. Location Switzerland. Methods We used checklist data collected by volunteers during 1999 and 2000 to analyse the distribution of the blue hawker, Aeshna cyanea (Odonata, Aeshnidae), a common dragonfly in Switzerland. We used data from repeated visits to 1‐ha pixels to derive ‘detection histories’ and apply site‐occupancy models to estimate the ‘true’ species distribution, i.e. corrected for imperfect detection. We modelled blue hawker distribution as a function of elevation and year, and its detection probability as a function of elevation, year and season. Results The best model contained cubic polynomial elevation effects for distribution and quadratic effects of elevation and season for detectability. We compared the site‐occupancy model with a conventional distribution model based on a generalized linear model, which assumes perfect detectability (p = 1). The conventional distribution map looked very different from the distribution map obtained using site‐occupancy models that accounted for the imperfect detection. The conventional model underestimated the species distribution by 60%, and the slope parameters of the occurrence–elevation relationship were also underestimated when assuming p = 1. Elevation was not only an important predictor of blue hawker occurrence, but also of the detection probability, with a bell‐shaped relationship. Furthermore, detectability increased over the season. The average detection probability was estimated at only 0.19 per survey. 
Main conclusions Conventional species distribution models do not model species distributions per se but rather the apparent distribution, i.e. an unknown proportion of species distributions. That unknown proportion is equivalent to detectability. Imperfect detection in conventional species distribution models yields underestimates of the extent of distributions and covariate effects that are biased towards zero. In addition, patterns in detectability will erroneously be ascribed to species distributions. In contrast, site‐occupancy models applied to replicated detection/non‐detection data offer a powerful framework for making inferences about species distributions corrected for imperfect detection. The use of ‘cheap’ checklist data greatly enhances the scope of applications of this useful class of models.
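To make the reported per-survey detection probability of 0.19 concrete, the sketch below computes how likely an occupied site is to yield at least one detection after a given number of visits, and how many visits are needed to reach a target. It assumes a constant detection probability and independent visits, which is a simplification: the abstract reports that detectability varied with elevation and season. The function names are hypothetical.

```python
import math


def p_detected_at_least_once(p, n_visits):
    """P(at least one detection in n_visits) at an occupied site."""
    return 1.0 - (1.0 - p) ** n_visits


def visits_needed(p, target):
    """Smallest number of visits whose cumulative detection probability >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.19  # average per-survey detection probability reported in the abstract
    for k in (1, 2, 5, 10):
        print(k, round(p_detected_at_least_once(p, k), 4))
    print("visits for 95%:", visits_needed(p, 0.95))
```

Under these assumptions, two visits detect the species at an occupied site with probability 1 − 0.81² ≈ 0.34, which illustrates why a single visit per site can substantially understate a distribution.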

17.
In clinical trials one traditionally models the effect of treatment on the mean response. The underlying assumption is that treatment affects the response distribution through a mean location shift on a suitable scale, with other aspects of the distribution (shape/dispersion/variance) remaining the same. This work is motivated by a trial in Parkinson's disease patients in which one of the endpoints is the number of falls during a 10‐week period. Inspection of the data reveals that the Poisson‐inverse Gaussian (PiG) distribution is appropriate, and that the experimental treatment reduces not only the mean, but also the variability, substantially. The conventional analysis assumes a treatment effect on the mean, either adjusted or unadjusted for covariates, and a constant dispersion parameter. On our data, this analysis yields a non‐significant treatment effect. However, if we model a treatment effect on both mean and dispersion parameters, both effects are highly significant. A simulation study shows that if a treatment effect exists on the dispersion and is ignored in the modelling, estimation of the treatment effect on the mean can be severely biased. We show further that if we use an orthogonal parametrization of the PiG distribution, estimates of the mean model are robust to misspecification of the dispersion model. We also discuss inferential aspects that are more difficult than anticipated in this setting. These findings have implications in the planning of statistical analyses for count data in clinical trials.

18.
Summary This paper presents a series of simulations designed to determine optimal diet breadth under shortfall avoidance models. Profitability and encounter rate functions were varied, and means and variances of energy intake rate were generated using a simple simulation procedure. The resulting mean-variance sets assumed three distinct shapes: u-shaped, arched, and looped. These simulations show that certain mean-variance sets allow the forager to employ simple behavioural rules to determine the optimal diet breadth. This situation occurs when low ranking diet items have small handling times, and these conditions may be quite common. In other cases, mean-variance sets may be too complicated to allow for easy behavioural rules designed to minimize starvation probability. The ability to characterize foraging problems into a limited series of mean-variance set types benefits workers examining the evolution and maintenance of foraging strategies, since these sets have clear implications for the ability of animals to develop simple behavioural rules. Unfortunately, data are lacking on the profitability and encounter rate distributions animals face in nature.

19.
We present the one‐inflated zero‐truncated negative binomial (OIZTNB) model, and propose its use as the truncated count distribution in Horvitz–Thompson estimation of an unknown population size. In the presence of unobserved heterogeneity, the zero‐truncated negative binomial (ZTNB) model is a natural choice over the positive Poisson (PP) model; however, when one‐inflation is present the ZTNB model either suffers from a boundary problem, or provides extremely biased population size estimates. Monte Carlo evidence suggests that in the presence of one‐inflation, the Horvitz–Thompson estimator under the ZTNB model can converge in probability to infinity. The OIZTNB model gives markedly different population size estimates compared to some existing truncated count distributions, when applied to several capture–recapture data sets that exhibit both one‐inflation and unobserved heterogeneity.
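For orientation, the Horvitz–Thompson step mentioned in the abstract above can be sketched with the simplest truncated count model it names, the positive Poisson (PP). This is not the OIZTNB estimator proposed in the paper; it only shows how an estimate of the probability of a zero count turns the number of observed individuals n into a population size estimate, n / (1 - P(0)). The function names, the bisection solver and the example counts are assumptions made for illustration.

```python
import math


def fit_positive_poisson(counts):
    """Maximum-likelihood lambda of a zero-truncated (positive) Poisson.

    The MLE solves  mean(counts) = lam / (1 - exp(-lam)),  found here by bisection.
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("sample mean must exceed 1 (all-ones data give no finite estimate)")
    lo, hi = 1e-12, mean  # the root lies strictly below the truncated mean
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # -expm1(-mid) equals 1 - exp(-mid), computed stably for small mid
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def horvitz_thompson_size(counts):
    """Estimate population size as n / (1 - P(0)) under the fitted PP model."""
    lam = fit_positive_poisson(counts)
    p_zero = math.exp(-lam)
    return len(counts) / (1.0 - p_zero)


if __name__ == "__main__":
    captures = [1, 1, 2, 3, 4]  # hypothetical capture counts per observed individual
    print(horvitz_thompson_size(captures))
```

Swapping in a different truncated count model only changes how P(0) is estimated; the final division stays the same, which is why the choice of model can move the population size estimate so strongly.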

20.
In many biometrical applications, the count data encountered often contain extra zeros relative to the Poisson distribution. Zero‐inflated Poisson regression models are useful for analyzing such data, but parameter estimates may be seriously biased if the nonzero observations are over‐dispersed and simultaneously correlated due to the sampling design or the data collection procedure. In this paper, a zero‐inflated negative binomial mixed regression model is presented to analyze a set of pancreas disorder length of stay (LOS) data that comprised mainly same‐day separations. Random effects are introduced to account for inter‐hospital variations and the dependency of clustered LOS observations. Parameter estimation is achieved by maximizing an appropriate log‐likelihood function using an EM algorithm. Alternative modeling strategies, namely the finite mixture of Poisson distributions and the non‐parametric maximum likelihood approach, are also considered. The determination of pertinent covariates would assist hospital administrators and clinicians to manage LOS and expenditures efficiently.
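As background for the zero-inflated models discussed in the abstract above, here is a minimal sketch of the zero-inflated Poisson (ZIP) probability mass function and of the quantity an EM algorithm for such a model would compute in its E-step: the posterior probability that an observed zero is a structural zero. It is not the zero-inflated negative binomial mixed model of the paper (there are no covariates, no random effects, and the count part is Poisson rather than negative binomial). The symbols pi and lam and the function names are assumptions.

```python
import math


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson pmf: extra mass pi at zero, Poisson(lam) otherwise."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return pi * (k == 0) + (1.0 - pi) * poisson


def prob_structural_zero(pi, lam):
    """E-step quantity: P(zero came from the inflation component | count = 0)."""
    return pi / (pi + (1.0 - pi) * math.exp(-lam))


if __name__ == "__main__":
    pi, lam = 0.3, 2.0  # hypothetical parameter values
    print([round(zip_pmf(k, pi, lam), 4) for k in range(5)])
    print(round(prob_structural_zero(pi, lam), 4))
```

The mean of this distribution is (1 - pi) * lam, so ignoring the inflation component and fitting a plain Poisson would pull the estimated rate below lam.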
