Similar literature
 20 similar articles found
1.
Can we model the probability of presence of species without absence data?   Total citations: 1 (self-citations: 0, citations by others: 1)
In ecological studies, it is useful to estimate the probability that a species occurs at given locations. The probability of presence can be modeled by traditional statistical methods, if both presence and absence data are available. However, the challenge is that most species records contain only presence data, without reliable absence data. Previous presence‐only methods can estimate a relative index of habitat suitability, but cannot estimate the actual probability of presence. In this study, we develop a presence and background learning algorithm (PBL) that is successful in modeling the conditional probability of presence of a simulated species. The model is trained by two completely separate sets: observed presence and background data. Assuming that the probability of presence is one for ‘prototypical presence’ locations where the habitats are maximally suitable for a species, we can estimate a constant that can calibrate the trained model into the actual probability of presence. Experimental results show that the PBL method performs similarly to a presence‐absence method, and significantly better than the widely used maximum entropy method. The new algorithm enables us to model the probability that a species occurs conditional on environmental covariates without absence data. Hence, it has potential to improve modeling of the geographical distributions of species.
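
The calibration idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's estimator: it assumes the trained model outputs f(x), the probability that a location belongs to the presence set rather than the background set, and that the odds f/(1 − f) are proportional to the probability of presence. If the probability of presence is one at a prototypical presence location, the odds at that location fix the constant. The function name and the example scores are hypothetical.

```python
def calibrate_presence_probability(f_scores, f_prototypical):
    """Rescale presence-vs-background scores to probabilities of presence.

    f_scores: model outputs in [0, 1), read as P(presence sample | x).
    f_prototypical: model output at a location assumed to have a
        probability of presence equal to one (an assumption, not data).
    """
    if not 0.0 < f_prototypical < 1.0:
        raise ValueError("f_prototypical must lie strictly between 0 and 1")
    proto_odds = f_prototypical / (1.0 - f_prototypical)
    calibrated = []
    for f in f_scores:
        if not 0.0 <= f < 1.0:
            raise ValueError("scores must lie in [0, 1)")
        odds = f / (1.0 - f)
        # Dividing by the prototypical odds maps that location to 1.0;
        # cap at 1.0 in case a score exceeds the prototypical one.
        calibrated.append(min(1.0, odds / proto_odds))
    return calibrated


# Hypothetical scores; 0.8 is taken as the prototypical-presence output.
probabilities = calibrate_presence_probability([0.5, 0.8, 0.9], 0.8)
```

Working on the odds scale rather than on f itself is the design choice here: with separate presence and background samples, the raw score depends on the arbitrary ratio of the two sample sizes, whereas the ratio of odds does not.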

2.
Aim Several studies have found that more accurate predictive models of species’ occurrences can be developed for rarer species; however, one recent study found the relationship between range size and model performance to be an artefact of sample prevalence, that is, the proportion of presence versus absence observations in the data used to train the model. We examined the effect of model type, species rarity class, species’ survey frequency, detectability and manipulated sample prevalence on the accuracy of distribution models developed for 30 reptile and amphibian species. Location Coastal southern California, USA. Methods Classification trees, generalized additive models and generalized linear models were developed using species presence and absence data from 420 locations. Model performance was measured using sensitivity, specificity and the area under the curve (AUC) of the receiver‐operating characteristic (ROC) plot based on twofold cross‐validation, or on bootstrapping. Predictors included climate, terrain, soil and vegetation variables. Species were assigned to rarity classes by experts. The data were sampled to generate subsets with varying ratios of presences and absences to test for the effect of sample prevalence. Join count statistics were used to characterize spatial dependence in the prediction errors. Results Species in classes with higher rarity were more accurately predicted than common species, and this effect was independent of sample prevalence. Although positive spatial autocorrelation remained in the prediction errors, it was weaker than was observed in the species occurrence data. The differences in accuracy among model types were slight. Main conclusions Using a variety of modelling methods, more accurate species distribution models were developed for rarer than for more common species. This was presumably because it is difficult to discriminate suitable from unsuitable habitat for habitat generalists, and not as an artefact of the effect of sample prevalence on model estimation.

3.
In this paper we present a concept for using presence–absence data to recover information on the population dynamics of predator–prey systems. We use a highly complex and spatially explicit simulation model of a predator–prey mite system to generate simple presence–absence data: the number of patches with both prey and predators, with prey only, with predators only, and with neither species, along with the number of patches that change from one state to another in each time step. The average number of patches in the four states, as well as the average transition probabilities from one state to another, are then depicted in a state transition diagram, constituting the "footprints" of the underlying population dynamics. We investigate to what extent changes in the population processes modeled in the complex simulation (i.e. the predator's functional response and the dispersal rates of both species) are reflected by different footprints. The transition probabilities can be used to forecast the expected fate of a system given its current state. However, the transition probabilities in the modeled system depend on the number of patches in each state. We develop a model for the dependence of transition probabilities on state variables, and combine this information in a Markov chain transition matrix model. Finally, we use this extended model to predict the long-term dynamics of the system and to reveal its asymptotic steady state properties.
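
As a rough illustration of how a transition matrix over the four patch states yields long-term behaviour, the sketch below iterates a fixed matrix until the state distribution stops changing. It is a simplification: the abstract's model lets transition probabilities depend on the number of patches in each state, whereas this sketch assumes constant probabilities. All numbers and names are hypothetical.

```python
STATES = ["both", "prey_only", "predator_only", "empty"]

# Hypothetical row-stochastic matrix: P[i][j] is the probability that a
# patch in state i at time t is in state j at time t + 1.
P = [
    [0.60, 0.10, 0.25, 0.05],
    [0.30, 0.60, 0.00, 0.10],
    [0.00, 0.00, 0.40, 0.60],
    [0.00, 0.20, 0.00, 0.80],
]


def step(dist, matrix):
    """Advance a distribution over states by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def stationary_distribution(matrix, tol=1e-12, max_iter=100_000):
    """Iterate from a uniform start until successive steps agree."""
    n = len(matrix)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(dist, nxt)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("distribution did not converge")


# Long-run share of patches in each state under the fixed matrix.
steady_state = dict(zip(STATES, stationary_distribution(P)))
```

Calling `step` once on an observed distribution gives the one-step forecast mentioned in the abstract; repeating it approximates the asymptotic steady state for this fixed-matrix case.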

4.
Question: What are the effects of the number of presences on models generated with multivariate adaptive regression splines (MARS)? Do these effects vary with data quality and quantity and species ecology? Location: Spain and Ecuador. Methods: We used two data sets: (1) two trees from Spain, representing high‐occurrence number data sets with real absences and unbalanced prevalence; (2) two herbs from Ecuador, representing low‐occurrence number data sets without real absences and balanced prevalence. For model quality, we used two different measures: reliability and stability. For each sample size, different replicates were generated at random and then used to generate a consensus model. Results: Model reliability and stability decrease with sample size. Optimal minimum sample size varies depending on many factors, many of which are unknown. Regional niche variation and ecological heterogeneity are critical. Conclusions: (1) Model predictive power improves greatly with more than 18‐20 presences. (2) Model reliability depends on data quantity and quality as well as species ecological characteristics. (3) Depending on the number of presences in the data set, investigators must carefully distinguish between models that should be treated with skepticism and those whose predictions can be applied with reasonable confidence. (4) For species combining few initial presences and wide environmental range variation, it is advisable to generate several replicate models that partition the initial data and generate a consensus model. (5) Models of species with a narrow environmental range variation can be highly stable and reliable, even when generated with few presences.

5.
Tests for species interactions that involve the comparison of a statistic calculated from an observed matrix of species presences and absences with the distribution of the same statistic generated from a null model have been used by ecologists for about 30 years. We argue that the validity of these tests requires a specific definition of independence. In particular, we note that an assumption that is often made is that all presence–absence matrices with the same row and column totals are equally likely if there is no interaction. However, we show using a simple model for species presences and absences without any species interactions that, in general, this assumption should be made with caution. Our model incorporates a definition of independence, allowing the computation of probabilities of different patterns in the null matrices. Other definitions of independence are possible; one of them is outlined using a new generalized linear model approach for carrying out tests applicable to different null models with or without the assumption of keeping row and column totals fixed. Electronic supplementary material  The online version of this article (doi:) contains supplementary material, which is available to authorized users.

6.
The discriminating capacity (i.e. ability to correctly classify presences and absences) of species distribution models (SDMs) is commonly evaluated with metrics such as the area under the receiver operating characteristic curve (AUC), the Kappa statistic and the true skill statistic (TSS). AUC and Kappa have been repeatedly criticized, but TSS has fared relatively well since its introduction, mainly because it has been considered as independent of prevalence. In addition, discrimination metrics have been contested because they should be calculated on presence–absence data, but are often used on presence‐only or presence‐background data. Here, we investigate TSS and an alternative set of metrics: similarity indices, also known as F‐measures. We first show that even in ideal conditions (i.e. perfectly random presence–absence sampling), TSS can be misleading because of its dependence on prevalence, whereas similarity/F‐measures provide adequate estimations of model discrimination capacity. Second, we show that in real‐world situations where sample prevalence is different from true species prevalence (i.e. biased sampling or presence‐pseudoabsence), no discrimination capacity metric provides adequate estimation of model discrimination capacity, including metrics specifically designed for modelling with presence‐pseudoabsence data. Our conclusions are twofold. First, they unequivocally impel SDM users to understand the potential shortcomings of discrimination metrics when quality presence–absence data are lacking, and we recommend obtaining such data. Second, in the specific case of virtual species, which are increasingly used to develop and test SDM methodologies, we strongly recommend the use of similarity/F‐measures, which were not biased by prevalence, contrary to TSS.
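
To make the two kinds of metric concrete, the sketch below computes TSS (sensitivity + specificity − 1) and one similarity index, the Sørensen index (numerically the F1 score), from a confusion matrix. It is only an illustration of the standard definitions, not a reproduction of the study's experiments; the function name and the counts are hypothetical. The point it shows is narrow: the Sørensen index does not use true negatives, so adding correctly predicted absences changes TSS but leaves Sørensen unchanged.

```python
def discrimination_metrics(tp, fp, fn, tn):
    """Return TSS and the Sorensen index for one confusion matrix.

    tp, fp, fn, tn: counts of true positives, false positives,
    false negatives and true negatives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "tss": sensitivity + specificity - 1.0,
        # Sorensen index = 2TP / (2TP + FP + FN); true negatives are unused.
        "sorensen": 2 * tp / (2 * tp + fp + fn),
    }


# Same presences and same errors; only the number of true negatives differs.
few_absences = discrimination_metrics(tp=8, fp=5, fn=2, tn=85)
many_absences = discrimination_metrics(tp=8, fp=5, fn=2, tn=985)
```

Comparing `few_absences` with `many_absences` shows TSS rising as absences are added while the Sørensen value stays the same.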

7.
Maps depicting cancer incidence rates have become useful tools in public health research, giving valuable information about the spatial variation in rates of disease. Typically, these maps are generated using count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for the cancer cases and suitable control subjects. The use of such point data allows us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, such covariate information is often subject to missingness. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model. Estimates of the linear and non-linear effects are obtained using a mixed effects model representation. We develop an EM algorithm to account for missing data and the random effects. Since the expectation step involves an intractable integral, we estimate the E-step with a Laplace approximation. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.

8.
Most high‐performing species distribution modelling techniques require both presences, and either absences or pseudo‐absences or background points. In this paper, we explore the effect of sample size, towards developing improved strategies for modelling. We generated 1800 virtual species with three levels of prevalence using ten modelling techniques, while varying the number of training presences (NTP) and the number of random points (NRP representing pseudo‐absences or background sites). For five of the ten modelling techniques we built two versions of models: one with an equal total weight (ETW) setting where the total weight for pseudo‐absence is equivalent to the total weight for presence, and another with an unequal total weight (UTW) setting where the total weight for pseudo‐absence is not required to be equal to the total weight for presence. We compared two strategies for NRP: a small multiplier strategy (i.e. setting NRP at a few times as large as NTP), and a large number strategy (i.e. using numerous random points). We produced ensemble models (by averaging the predictions from 30 models built with the same set of training presences and different sets of random points in equivalent numbers) for three NTP magnitudes and two NRP strategies. We found that model accuracy altered as NRP increased with four distinct patterns of performance: increasing, decreasing, arch‐shaped and horizontal. In most cases ETW improved model performance. Ensemble models had higher accuracy than the corresponding single models, and this improvement was pronounced when NTP was low. We conclude that a large NRP is not always an appropriate strategy. The best choice for NRP will depend on the modelling techniques used, species prevalence and NTP. We recommend building ensemble models instead of single models, using the small multiplier strategy for NRP with ETW, especially when only a small number of species presence records are available.

9.
Species distribution models (SDMs) have been widely used in ecology, biogeography, and conservation. Although ecological theory predicts that species occupancy is dynamic, the outputs of SDMs are generally converted into a single occurrence map, and model performance is evaluated in terms of success in predicting presences and absences. The aim of this study was to characterize the effects of a gradual response in species occupancy to environmental gradients on the performance of SDMs. First, we outline guidelines for the appropriate simulation of artificial species that allows controlling for gradualism and prevalence in the occupancy patterns over an environmental gradient. Second, we derive theoretical expected values for success measures based on presence‐absence predictions (AUC, Kappa, sensitivity and specificity). Finally, we used artificial species to exemplify and test the effect of a gradual probabilistic occupancy response to environmental gradients on SDM performance. Our results show that when a species responds gradually to an environmental gradient, conventional measures of SDM predictive success based on presence‐absence cannot be expected to attain currently accepted performance values considered as good, even for a model that recovers perfectly well the true probability of occurrence. A gradual response imposes a theoretical expected value for these measures of performance that can be calculated from the species properties. However, irrespective of the statistical modeling strategy used and of how gradual the species response is, one can recover the true probability of occurrence as a function of environmental variables provided that species and sample prevalence are similar. Therefore, model performance based on presence‐absence should be judged against the theoretical expected value rather than against absolute values currently in use such as AUC > 0.8. Overall, we advocate for a wider use of the probability of occurrence and emphasize the need for further technical developments in this sense.

10.
Modelling the distribution of invasive alien species is widely used for predicting future dispersal, response to climate change, and effects of management, but little information is available on the scale dependence of spatial models. This study is focused on Heracleum mantegazzianum, a problematic invasive plant in central and north-western Europe. The main objective was to model the current distribution of this species at national (43,000 km²) and regional scale (4900 km²) using autologistic regression with a Danish data set. Presence–absence data were used in a grid system with 5 × 5 km² or 2 × 2 km² as basic units. To avoid misleading presence–absence models and unreliable probability values due to unbalanced data, the prevalence was used as cut-off value, and a favourability function was applied to the model predictions. The national model showed a widespread distribution of H. mantegazzianum with highest habitat suitability in the eastern and northern parts of the country where human population density is high, winters more severe and/or loamy soils more common. At a regional scale the distribution of H. mantegazzianum is associated with alluvial sand cover, high human population density, spring precipitation, and presence of the species in neighbour grid units. The observed widespread national distribution is likely the result of anthropogenic spread of this ornamental plant, while the locally clumped distribution suggests that H. mantegazzianum naturally spreads mainly over short distances. The current distribution in Denmark resembles an intermediate invasion stage where long-distance dispersal is less important, while spread from suitable neighbour habitats is significant. The study demonstrates that the favourability function leads to improved mapping standards for invasive species.
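
The abstract does not state which favourability function was used. As a hedged illustration, the sketch below assumes one form often given in the species distribution modelling literature, F = odds / (n1/n0 + odds), where odds = P / (1 − P), n1 is the number of presences and n0 the number of absences in the training data. Under this assumed form, a predicted probability equal to the sample prevalence maps to F = 0.5, which is consistent with using prevalence as the cut-off. The function name and numbers are hypothetical.

```python
def favourability(p, n_presences, n_absences):
    """Convert a predicted probability into a favourability value.

    Assumed form: F = odds / (n1 / n0 + odds), with odds = p / (1 - p).
    This removes the effect of sample prevalence from the prediction.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if n_presences <= 0 or n_absences <= 0:
        raise ValueError("both counts must be positive")
    odds = p / (1.0 - p)
    prevalence_odds = n_presences / n_absences
    return odds / (prevalence_odds + odds)


# Hypothetical grid: 20 presence cells and 80 absence cells (prevalence 0.2).
at_prevalence = favourability(0.2, n_presences=20, n_absences=80)
above_prevalence = favourability(0.5, n_presences=20, n_absences=80)
```

With these hypothetical counts, a probability of 0.5 is well above the prevalence of 0.2, so its favourability is correspondingly well above 0.5.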

11.
Co-occurrence of ectoparasites of marine fishes: a null model analysis   Total citations: 5 (self-citations: 0, citations by others: 5)
We used null model analysis to test for nonrandomness in the structure of metazoan ectoparasite communities of 45 species of marine fish. Host species consistently supported fewer parasite species combinations than expected by chance, even in analyses that incorporated empty sites. However, for most analyses, the null hypothesis was not rejected, and co-occurrence patterns could not be distinguished from those that might arise by random colonization and extinction. We compared our results to analyses of presence–absence matrices for vertebrate taxa, and found support for the hypothesis that there is an ecological continuum of community organization. Presence–absence matrices for small-bodied taxa with low vagility and/or small populations (marine ectoparasites, herps) were mostly random, whereas presence–absence matrices for large-bodied taxa with high vagility and/or large populations (birds, mammals) were highly structured. Metazoan ectoparasites of marine fishes fall near the low end of this continuum, with little evidence for nonrandom species co-occurrence patterns.

12.
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.

13.
Yip PS, Lin HZ, Xi L. Biometrics, 2005, 61(4): 1085-1092
A semiparametric estimation procedure is proposed to model capture-recapture data with the aim of estimating the population size for a closed population. Individuals' covariates are possibly time dependent and missing at noncaptured times and may be measured with error. A set of estimating equations (EEs) based on covariate process and capture-recapture data is constructed to estimate the relevant parameters and the population size. These EEs can be solved by an algorithm similar to an EM algorithm. Simulation results show that the proposed procedures work better than the naive estimate. In some cases they are even better than "ideal" estimates, for which the true values of covariates are available for all captured subjects over the entire experimental period. We apply the method to a capture-recapture experiment on the bird species Prinia flaviventris in Hong Kong.

14.
Estimates of HIV prevalence computed using data obtained from sampling a subgroup of the national population may not be representative of all the relevant domains of the population. These estimates are often computed on the assumption that HIV prevalence is uniform across all domains of the population. Use of appropriate statistical methods together with population-based survey data can improve estimation of national and subgroup level HIV prevalence and can provide improved explanations of the variation in HIV prevalence across different domains of the population. In this study we computed design-consistent estimates of HIV prevalence, and their respective 95% confidence intervals at both the national and subgroup levels. In addition, we provided a multivariable survey logistic regression model from a generalized linear modelling perspective for explaining the variation in HIV prevalence using demographic, socio-economic, socio-cultural and behavioural factors. Essentially, this study borrows from the proximate determinants conceptual framework which provides guiding principles upon which socio-economic and socio-cultural variables affect HIV prevalence through biological behavioural factors. We utilize the 2010–11 Zimbabwe Demographic and Health Survey (2010–11 ZDHS) data (which are population based) to estimate HIV prevalence in different categories of the population and for constructing the logistic regression model. It was established that HIV prevalence varies greatly with age, gender, marital status, place of residence, literacy level, belief about whether condom use can reduce the risk of contracting HIV and level of recent sexual activity, whereas there was no marked variation in HIV prevalence with social status (measured using a wealth index), method of contraception and an individual’s level of education.

15.
K Drescher, W Schill. Biometrics, 1991, 47(4): 1247-1256
By fitting an unconditional logistic regression model to unmatched case-control data, an estimate of the joint population attributable risk for the factor included is obtained. This estimate and its asymptotic variance can easily be computed from the intercept parameter and its asymptotic variance. A generalization to the analysis of stratified data with large strata enables the calculation of stratum-specific attributable risks and their variances via stratum-specific intercept parameters. If sampling of cases is independent of strata, an estimate of the summary attributable risk and its asymptotic variance may be obtained as a weighted sum of the stratum-specific attributable risks.

16.
17.
Effects of sample size on the performance of species distribution models   Total citations: 8 (self-citations: 0, citations by others: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.

18.
A new method for analyzing three-state protein unfolding equilibria is described that overcomes the difficulties created by direct effects of denaturants on circular dichroism (CD) and fluorescence spectra of the intermediate state. The procedure begins with a singular value analysis of the data matrix to determine the number of contributing species and perturbations. This result is used to choose a fitting model and remove all spectra from the fitting equation. Because the fitting model is a product of a matrix function which is nonlinear in the thermodynamic parameters and a matrix that is linear in the parameters that specify component spectra, the problem is solved with a variable projection algorithm. Advantages of this procedure are that perturbation spectra do not have to be estimated before fitting, arbitrary assumptions about magnitudes of parameters that describe the intermediate state are not required, and multiple experiments involving different spectroscopic techniques can be simultaneously analyzed. Two tests of this method were performed: First, simulated three-state data were analyzed, and the original and recovered thermodynamic parameters agreed within one standard error, whereas recovered and original component spectra agreed within 0.5%. Second, guanidine-induced unfolding titrations of the human retinoid-X-receptor ligand-binding domain were analyzed according to a three-state model. The standard unfolding free energy changes in the absence of guanidine and the guanidine concentrations at zero free-energy change for both transitions were determined from a joint analysis of fluorescence and CD spectra. Realistic spectra of the three protein states were also obtained.

19.
Liu L, Pearl DK. Systematic biology, 2007, 56(3): 504-514
The desire to infer the evolutionary history of a group of species should be more viable now that a considerable amount of multilocus molecular data is available. However, the current molecular phylogenetic paradigm still reconstructs gene trees to represent the species tree. Further, commonly used methods of combining data, such as the concatenation method, are known to be inconsistent in some circumstances. In this paper, we propose a Bayesian hierarchical model to estimate the phylogeny of a group of species using multiple estimated gene tree distributions, such as those that arise in a Bayesian analysis of DNA sequence data. Our model employs substitution models used in traditional phylogenetics but also uses coalescent theory to explain genealogical signals from species trees to gene trees and from gene trees to sequence data, thereby forming a complete stochastic model to estimate gene trees, species trees, ancestral population sizes, and species divergence times simultaneously. Our model is founded on the assumption that gene trees, even of unlinked loci, are correlated due to being derived from a single species tree and therefore should be estimated jointly. We apply the method to two multilocus data sets of DNA sequences. The estimates of the species tree topology and divergence times appear to be robust to the prior of the population size, whereas the estimates of effective population sizes are sensitive to the prior used in the analysis. These analyses also suggest that the model is superior to the concatenation method in fitting these data sets and thus provides a more realistic assessment of the variability in the distribution of the species tree that may have produced the molecular information at hand. Future improvements of our model and algorithm should include consideration of other factors that can cause discordance of gene trees and species trees, such as horizontal transfer or gene duplication.

20.
Aim  To highlight the benefit of using habitat use to improve the accuracy of predictive road fatality models.
Location  The Snowy Mountains Highway in southern New South Wales, Australia.
Methods  A binary logistic regression model was constructed using wombat fatality presences and randomly generated absences. Species-specific habitat variables were included as predictors in the model selection process as well as two spatially explicit measures of wombat habitat use. Generalized additive models (GAMs) were constructed for each possible combination of predictors in R. The final model was selected by comparing all model subsets for the eight predictors and employing the one standard error rule to select the best model set.
Results  The final predictive model had high discriminatory power and incorporated both measures of species habitat use, greatly exceeding the variation explained by a previously published model for the same species and road.
Main Conclusions  Our findings highlight the importance of incorporating variables that describe habitat use by fauna for predictive modelling of animal-vehicle crashes. Models that ignore landscape patterns are limited in their capacity to identify hotspots and inform managers of locations to engage in mitigation.
