Similar literature
1.
Species distribution models should provide conservation practitioners with estimates of the spatial distributions of species requiring attention. These species are often rare and have limited known occurrences, posing challenges for creating accurate species distribution models. We tested four modeling methods (Bioclim, Domain, GARP, and Maxent) across 18 species with different levels of ecological specialization using six different sample size treatments and three different evaluation measures. Our assessment revealed that Maxent was the most capable of the four modeling methods in producing useful results with sample sizes as small as 5, 10 and 25 occurrences. The other methods compensated reasonably well (Domain and GARP) to poorly (Bioclim) when presented with datasets of small sample sizes. We show that multiple evaluation measures are necessary to determine accuracy of models produced with presence-only data. Further, we found that accuracy of models is greater for species with small geographic ranges and limited environmental tolerance, ecological characteristics of many rare species. Our results indicate that reasonable models can be made for some rare species, a result that should encourage conservationists to add distribution modeling to their toolbox.

2.
Species distribution models (SDMs) relate presence/absence data to environmental variables, making it possible to predict species environmental requirements and potential distribution. They have been increasingly used in fields such as ecology, biogeography and evolution, and often support conservation priorities and strategies. Thus, it becomes crucial to understand how trustworthy and reliable their predictions are. Different approaches, such as using ensemble methods (combining forecasts of different single models), or applying the most suitable threshold to transform continuous probability maps into species presences or absences, have been used to reduce model-based uncertainty. Taking into account the influence of biased sampling, imprecision in species location, small datasets and species ecological characteristics may also help to detect and compensate for uncertainty in the model building process. To investigate the effect on model accuracy of applying an ensemble approach, several threshold selection criteria and different datasets representing seasonal and spatial sampling bias, SDMs were built for four estuarine fish species with distinct use of the estuarine systems. Overall, predictions obtained with the ensemble approach were more accurate. Variability in accuracy metrics obtained with the nine threshold selection criteria applied was more pronounced for species with low prevalence and when sensitivity was calculated. Higher values of accuracy measures were registered with the threshold that maximizes the sum of sensitivity and specificity, and the threshold where the predicted prevalence equals the observed, whereas the 0.5 cut-off was unreliable, producing the lowest values for these metrics. Accuracy of models created from a spatially biased sampling was overall higher than accuracy of models created with a seasonally biased sampling or with the multi-year database created, and this pattern was consistently obtained for marine migrant species, which use estuaries as nursery areas and make seasonal and regular use of these ecosystems. The ecological dependence between these fish species and estuaries may add difficulties in the model building process, and needs to be taken into account to improve model accuracy. The present study highlights the need for a thorough analysis of the critical underlying issues of the complete model building process to predict the distribution of estuarine fish species, due to the particular and dynamic nature of these ecosystems.
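As an illustration of the threshold criterion mentioned in the abstract above, the following is a minimal sketch of choosing the cut-off that maximizes sensitivity plus specificity and comparing it with a fixed 0.5 cut-off. The data, function names and variable names are hypothetical and are not taken from the study; the sketch assumes both presences and absences occur in the evaluation data.

```python
def sensitivity_specificity(y_true, scores, threshold):
    """Return (sensitivity, specificity); scores >= threshold count as predicted presences.

    Assumes y_true contains at least one presence (1) and one absence (0).
    """
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def max_sss_threshold(y_true, scores):
    """Return the observed score that maximizes sensitivity + specificity, and that sum."""
    best_t, best_sum = None, -1.0
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(y_true, scores, t)
        if sens + spec > best_sum:
            best_t, best_sum = t, sens + spec
    return best_t, best_sum


# Hypothetical low-prevalence evaluation data: 2 presences among 10 sites.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02]

best_t, best_sum = max_sss_threshold(y_true, scores)
sens_05, spec_05 = sensitivity_specificity(y_true, scores, 0.5)
print(best_t, best_sum, sens_05, spec_05)
```

In this toy data set every predicted value lies below 0.5, so the fixed cut-off predicts no presences at all, which is one way a 0.5 threshold can perform poorly for low-prevalence species.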

3.
Effects of sample size on the performance of species distribution models   Total citations: 8 (self-citations: 0, citations by others: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample size (n < 30) and this should encourage highly conservative use of predictions based on small sample size and restrict their use to exploratory modelling.
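The AUC used for evaluation in the abstract above can be read as the probability that a randomly chosen presence site receives a higher predicted value than a randomly chosen absence site. A minimal sketch of that pairwise definition follows; the scores and the function name are hypothetical and serve only as an illustration.

```python
def auc(presence_scores, absence_scores):
    """AUC as the fraction of presence/absence pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical model predictions at independent evaluation sites.
presence_scores = [0.9, 0.7, 0.4]
absence_scores = [0.8, 0.4, 0.3, 0.1]

value = auc(presence_scores, absence_scores)
print(round(value, 3))
```

With only a handful of evaluation sites, a single re-ranked pair shifts the value noticeably (here each pair is worth 1/12), which is consistent with the increased variability at small sample sizes that the abstract reports.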

4.
Sets of presence records used to model species' distributions typically consist of observations collected opportunistically rather than systematically. As a result, sampling probability is geographically uneven, which may confound the model's characterization of the species' distribution. Modelers frequently address sampling bias by manipulating training data: either subsampling presence data or creating a similar spatial bias in non-presence background data. We tested a new method, which we call 'background thickening', in the latter category. Background thickening entails concentrating background locations around presence locations in proportion to presence location density. We compared background thickening to two established sampling bias correction methods – target group background selection and presence thinning – using simulated data and data from a case study. In the case study, background thickening and presence thinning performed similarly well, both producing better model discrimination than target group background selection, and better model calibration than models without correction. In the simulation, background thickening performed better than presence thinning when the number of simulated presence locations was low, and vice versa. We discuss drawbacks to target group background selection, why background thickening and presence thinning are conservative but robust sampling bias correction methods, and why background thickening is better than presence thinning for small sample sizes. Particularly, background thickening is advantageous for treating sampling bias when data are scarce because it avoids discarding presence records.

5.
Georeferencing error is prevalent in datasets used to model species distributions, inducing uncertainty in covariate values associated with species occurrences that results in biased probability of occurrence estimates. Traditionally, this error has been dealt with at the data level by using only records with an acceptable level of error (filtering) or by summarizing covariates at sampling units by using measures of central tendency (averaging). Here we compare those previous approaches to a novel implementation of a Bayesian logistic regression with measurement error (ME), a seldom used method in species distribution modeling. We show that the ME model outperforms data-level approaches 1) on specialist species and 2) when sample sizes are small, when the georeferencing error is large, or when all georeferenced occurrences have a fixed level of error. Thus, for certain types of species and datasets the ME model is an effective method to reduce biases in probability of occurrence estimates and account for the uncertainty generated by georeferencing error. Our approach may be expanded for its use with presence-only data as well as to include other sources of uncertainty in species distribution models.

6.
Aim: The proportion of sampled sites where a species is present is known as prevalence. Empirical studies have shown that prevalence can affect the predictive performance of species distribution models. This paper uses simulated species data to examine how prevalence and the form of species environmental dependence affect the assessment of the predictive performance of models. Methods: Simulated species data were based on various functions of simulated environmental data with differing degrees of spatial correlation. Seven model performance measures – sensitivity, specificity, class-average (CA), overall prediction success, kappa (κ), normalized mutual information (NMI) and area under the receiver operating characteristic curve (AUC) – were applied to species models fitted by three regression methods. The response of the performance measures to prevalence was then assessed. Three probability threshold selection methods used to convert fitted logistic model values to presence or absence were also assessed. Results: The study shows that the extent to which prevalence affects model performance depends on the modelling technique and its degree of success in capturing dominant environmental determinants. It also depends on the statistic used to measure model performance and the probability threshold method. The response based on κ generally preferred models with medium prevalence. All performance measures were least affected by prevalence when the probability threshold was chosen to maximize predictive performance or was based directly on prevalence. In these cases, the responses based on AUC, CA and NMI generally preferred models with small or large prevalence. Main conclusions: The effect of prevalence on the predictive performance of species distribution models has a methodological basis. Relevant factors include the success of the fitted distribution model in capturing the dominant environmental determinant, the model performance measure and the probability threshold selection method. The fixed probability threshold method yields a marked response of model performance to prevalence and is therefore not recommended. The study explains previous empirical results obtained with real data.
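To make the dependence of κ on prevalence in the abstract above concrete, here is a small sketch that computes Cohen's kappa from a confusion matrix for two hypothetical evaluation sets with identical sensitivity and specificity (both 0.8) but different prevalence. The counts are invented for illustration and do not come from the study.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and observations.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical sets of 100 sites, sensitivity = specificity = 0.8 in both.
kappa_mid = cohen_kappa(tp=40, fp=10, fn=10, tn=40)  # prevalence 0.5
kappa_low = cohen_kappa(tp=8, fp=18, fn=2, tn=72)    # prevalence 0.1

print(round(kappa_mid, 3), round(kappa_low, 3))
```

The same error rates give a lower κ at prevalence 0.1 than at 0.5, in line with the statement that κ tends to favour models at medium prevalence.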

7.
Models of the distribution of rare and endangered species are important tools for their monitoring and management. Presence data used to build distribution models can be based on simple random sampling, but for patchily distributed species this yields a small number of presences and therefore low precision. Convenience sampling, either based on easily accessible units or a priori knowledge of the species habitat but with no known probability of sampling each unit, is likely to result in biased estimates. Stratified random sampling, with strata defined using habitat suitability models [estimated in the resource selection functions (RSFs) framework], is a promising approach for improving the precision of model parameters. We used this approach to sample the Tibetan argali (Ovis ammon hodgsoni) in Indian Transhimalaya in order to estimate their distribution and to test if it can lead to a significant reduction in survey effort compared to random sampling. We first used an initial sample of argali feeding sites in 2005 and 2006 based on a priori selected vantage points and survey transects. This initial sample was used to build an initial distribution model. The spatial predictions based on estimated RSFs were then used to define three strata of the study area. The strata were randomly sampled in 2007. As expected, many more presences per hour were obtained in the high-quality strata than in the low-quality strata (1.33 obs/h vs. 0.080 obs/h). Furthermore, the best models selected on the basis of the prospective sample differed from those using the first a priori sample, suggesting bias in the initial sampling effort. The method therefore has significant implications for decreasing sampling effort in terms of sampling time in the field, especially when dealing with rare species, and removing initial sampling bias.

8.
Aim: Techniques that predict species potential distributions by combining observed occurrence records with environmental variables show much potential for application across a range of biogeographical analyses. Some of the most promising applications relate to species for which occurrence records are scarce, due to cryptic habits, locally restricted distributions or low sampling effort. However, the minimum sample sizes required to yield useful predictions remain difficult to determine. Here we developed and tested a novel jackknife validation approach to assess the ability to predict species occurrence when fewer than 25 occurrence records are available. Location: Madagascar. Methods: Models were developed and evaluated for 13 species of secretive leaf-tailed geckos (Uroplatus spp.) that are endemic to Madagascar, for which available sample sizes range from 4 to 23 occurrence localities (at 1 km² grid resolution). Predictions were based on 20 environmental data layers and were generated using two modelling approaches: a method based on the principle of maximum entropy (Maxent) and a genetic algorithm (GARP). Results: We found high success rates and statistical significance in jackknife tests with sample sizes as low as five when the Maxent model was applied. Results for GARP at very low sample sizes (fewer than c. 10) were poorer. When sample sizes were experimentally reduced for those species with the most records, variability among predictions using different combinations of localities demonstrated that models were greatly influenced by exactly which observations were included. Main conclusions: We emphasize that models developed using this approach with small sample sizes should be interpreted as identifying regions that have similar environmental conditions to where the species is known to occur, and not as predicting actual limits to the range of a species. The jackknife validation approach proposed here enables assessment of the predictive ability of models built using very small sample sizes, although use of this test with larger sample sizes may lead to overoptimistic estimates of predictive power. Our analyses demonstrate that geographical predictions developed from small numbers of occurrence records may be of great value, for example in targeting field surveys to accelerate the discovery of unknown populations and species.
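The leave-one-out logic behind a jackknife test with very few localities can be sketched as follows. This is only an illustration: the study above used Maxent and GARP and also assessed statistical significance, whereas this sketch substitutes a toy min/max environmental envelope as the model and reports only the success rate. All names and data values are hypothetical.

```python
def fit_envelope(records):
    """Toy stand-in for a distribution model: the min/max range of each variable."""
    n_vars = len(records[0])
    return [(min(r[i] for r in records), max(r[i] for r in records))
            for i in range(n_vars)]


def predicts_presence(envelope, record):
    """A locality is predicted present if every variable falls inside the envelope."""
    return all(lo <= v <= hi for (lo, hi), v in zip(envelope, record))


def jackknife_success_rate(records):
    """Hold out each locality in turn, refit on the rest, and test the held-out one."""
    hits = 0
    for i, held_out in enumerate(records):
        training = records[:i] + records[i + 1:]
        if predicts_presence(fit_envelope(training), held_out):
            hits += 1
    return hits / len(records)


# Hypothetical (mean temperature, annual rainfall) at five occurrence localities.
records = [(18.0, 1200.0), (19.5, 1500.0), (20.0, 1350.0),
           (21.0, 1400.0), (22.5, 1300.0)]

rate = jackknife_success_rate(records)
print(rate)
```

Localities at the edge of the sampled environmental range fail the test in this toy model, which echoes the abstract's point that with few records the outcome depends heavily on exactly which observations are included.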

9.
Aim: Species distribution models (SDMs) use the locations of collection records to map the distributions of species, making them a powerful tool in conservation biology, ecology and biogeography. However, the accuracy of range predictions may be reduced by temporally autocorrelated biases in the data. We assess the accuracy of SDMs in predicting the ranges of tropical plant species on the basis of different sample sizes while incorporating real-world collection patterns and biases. Location: Tropical South American moist forests. Methods: We use dated herbarium records to model the distributions of 65 Amazonian and Andean plant species. For each species, we use the first 25, 50, 100, 125 and 150 records collected and available for each species to analyse changes in spatial aggregation and climatic representativeness through time. We compare the accuracy of SDM range estimates produced using the time-ordered data subsets to the accuracy of range estimates generated using the same number of collections but randomly subsampled from all available records. Results: We find that collections become increasingly aggregated through time but that additional collecting sites are added, resulting in progressively better representations of the species' full climatic niches. The range predictions produced using time-ordered data subsets are less accurate than predictions from random subsets of equal sample sizes. Range predictions produced using time-ordered data subsets consistently underestimate the extent of ranges while no such tendency exists for range predictions produced using random data subsets. Main conclusions: These results suggest that larger sample sizes are required to accurately map species ranges. Additional attention should be given to increasing the number of records available per species through continued collecting, better distributed collecting, and/or increasing access to existing collections. The fact that SDMs generally under-predict the extent of species ranges means that extinction risks of species because of future habitat loss may be lower than previously estimated.

10.
11.
12.
Long-term biodiversity monitoring data are mainly used to estimate changes in species occupancy or abundance over time, but they may also be incorporated into predictive models to document species distributions in space. Although changes in occupancy or abundance may be estimated from a relatively limited number of sampling units, small sample size may lead to inaccurate spatial models and maps of predicted species distributions. We provide a methodological approach to estimate the minimum sample size needed in monitoring projects to produce accurate species distribution models and maps. The method assumes that monitoring data are not yet available when sampling strategies are to be designed and is based on external distribution data from atlas projects. Atlas data are typically collected in a large number of sampling units during a restricted timeframe and are often similar in nature to the information gathered from long-term monitoring projects. The large number of sampling units in atlas projects makes it possible to simulate a broad gradient of sample sizes in monitoring data and to examine how the number of sampling units influences the accuracy of the models. We apply the method to several bird species using data from a regional breeding bird atlas. We explore the effect of prevalence, range size and habitat specialization of the species on the sample size needed to generate accurate models. Model accuracy is sensitive to particularly small sample sizes and levels off beyond a sufficiently large number of sampling units that varies among species depending mainly on their prevalence. The integration of spatial modelling techniques into monitoring projects is a cost-effective approach as it offers the possibility to estimate the dynamics of species distributions in space and over time. We believe our innovative method will help in the sampling design of future monitoring projects aiming to achieve such integration.

13.
Nazareno & Jump (2012) highlight potential issues with using small sample sizes in population genetic studies. By reanalysing allelic richness data from our recent publication on habitat fragmentation (Struebig et al. 2011), they assert that the observed relationship has been driven by three sites with the lowest number of individuals sampled. While sample size issues have been raised before in the genetic literature, Nazareno & Jump's (2012) comment serves as a useful reminder to us all. Nevertheless, we disagree that our findings were significantly biased by sampling limitations. Here, we demonstrate by jackknifing that, contrary to the claims of Nazareno & Jump (2012), our correlations of allelic richness and fragment area are not driven solely by sites with low sample sizes. We maintain that small sample sizes can be accounted for in fragmentation studies and that sampling limitations should not detract from undertaking conservation genetic research.

14.
Understanding the effects of different types and quality of data on bioclimatic modeling predictions is vital to ascertaining the value of existing models, and to improving future models. Bioclimatic models were constructed using the CLIMEX program, using different data types – seasonal dynamics, geographic (overseas) distribution, and a combination of the two – for two biological control agents for the major weed Lantana camara L. in Australia. The models for one agent, Teleonemia scrupulosa Stål (Hemiptera: Tingidae), were based on a higher quality and quantity of data than the models for the other agent, Octotoma scabripennis Guérin-Méneville (Coleoptera: Chrysomelidae). Predictions of the geographic distribution for Australia showed that T. scrupulosa models exhibited greater accuracy, with a progressive improvement from the model based on seasonal dynamics data, to the model based on overseas distribution, and finally the model combining the two data types. In contrast, O. scabripennis models were of low accuracy, and showed no clear trends across the various model types. These case studies demonstrate the importance of high quality data for developing models, and of supplementing distributional data with species seasonal dynamics data wherever possible. Seasonal dynamics data allow the modeller to focus on the species response to climatic trends, while distributional data enable easier fitting of stress parameters by restricting the species envelope to the described distribution. It is apparent that CLIMEX models based on low quality seasonal dynamics data, together with a small quantity of distributional data, are of minimal value in predicting the spatial extent of species distribution.

15.
Species distribution models are a very popular tool in ecology and biogeography and have great potential to help direct conservation efforts. Models are traditionally tested by using half the original species records to build the model and half to evaluate it. However, this can lead to overly optimistic estimates of model accuracy, particularly when there are systematic biases in the data. It is better to evaluate models using independent data. This study used independent species records from a new survey to provide a more rigorous evaluation of distribution-model accuracy. Distribution models were built for reptile, amphibian, butterfly and mammal species. The accuracy of these models was evaluated using the traditional approach of partitioning the original species records into model-building and model-evaluating datasets, and using independent records collected during a new field survey of 21 previously unvisited sites in diverse habitat types. We tested whether variation in distribution-model accuracy among species could be explained by species detectability, range size, number of records used to build the models, and body size. Estimates of accuracy derived using the new species records correlated positively with estimates generated using the traditional data-partitioning approach, but were on average 22% lower. Model accuracy was negatively related to range size and number of records used to build the models, and positively related to the body size of butterflies. There was no clear relationship between species detectability and model accuracy. The field data generally validated the species distribution models. However, there was considerable variation in model accuracy among species, some of which could be explained by the characteristics of species.

16.
Aim: To offer an objective approach to some of the problems associated with the development of logistic regression models: how to compare different models, determination of sample size adequacy, the influence of the ratio of positive to negative cells on model accuracy, and the appropriate scale at which the hypothesis of a non-random distribution should be tested. Location: Test data were taken from Southern Africa. Methods: The approach relies mainly on the use of the AUC (Area under the Curve) statistic, based on ROC (threshold Receiver Operating Characteristic) plots, for between-model comparisons. Data for the distribution of the bont tick Amblyomma hebraeum Koch (Acari: Ixodidae) are used to illustrate the methods. Results: Methods for the estimation of minimum sample sizes and more accurate hypothesis-testing are outlined. Logistic regression is robust to the assumption that uncollected cells can be scored as negative, provided that the sample size of cells scored as positive is adequate. The variation in temperature and rainfall at localities where A. hebraeum has been collected is significantly lower than expected from a random sample of points across the data set, suggesting that within-site variation may be an important determinant of its distribution. Main conclusions: Between-model comparisons relying on AUCs can be used to enhance objectivity in the development and refinement of logistic regression models. Both between-site and within-site variability should be considered as potentially important factors determining species distributions.

17.
Species distribution models are popular and widely applied ecological tools. Recent increases in data availability have led to opportunities and challenges for species distribution modelling. Each data source has different qualities, determined by how it was collected. As several data sources can inform on a single species, ecologists have often analysed just one of the data sources, but this loses information, as some data sources are discarded. Integrated distribution models (IDMs) were developed to enable inclusion of multiple datasets in a single model, whilst accounting for different data collection protocols. This is advantageous because it allows efficient use of all data available, can improve estimation and account for biases in data collection. What is not yet known is when integrating different data sources does not bring advantages. Here, for the first time, we explore the potential limits of IDMs using a simulation study integrating a spatially biased, opportunistic, presence-only dataset with a structured, presence–absence dataset. We explore four scenarios based on real ecological problems: small sample sizes, low levels of detection probability, correlations between covariates and a lack of knowledge of the drivers of bias in data collection. For each scenario we ask: do we see improvements in parameter estimation or the accuracy of spatial pattern prediction in the IDM versus modelling either data source alone? We found integration alone was unable to correct for spatial bias in presence-only data. Including a covariate to explain bias or adding a flexible spatial term improved IDM performance beyond single dataset models, with the models including a flexible spatial term producing the most accurate and robust estimates. Increasing the sample size of presence–absence data and having no correlated covariates also improved estimation. These results demonstrate under which conditions integrated models provide benefits over modelling single data sources.

18.
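The MaxEnt model has been one of the most popular species distribution models in recent years. Studies of endangered species, invasive species and simulated data have shown that MaxEnt can produce fairly accurate predictions from small samples of occurrence data. In addition, changes in the extent of the study area also affect the construction of MaxEnt models. However, few studies have evaluated MaxEnt using actual distribution data for animals. Taking the black-and-white snub-nosed monkey (Rhinopithecus bieti) as an example, we used the distribution data of 11 monkey groups as training data (with sample sizes ranging from 1 to 10 groups), built MaxEnt models over different study extents, and validated them with the distribution data of 5 other groups, in order to analyse how sample size and study extent affect model accuracy. The results show that both the accuracy and the stability of MaxEnt models increased as sample size and study extent increased. In addition, changes in study extent had some effect on model accuracy. When MaxEnt is used to predict species distributions, the training data should cover as fully as possible the whole range of environmental gradients in which the species may occur. The background points needed to build the model should be selected so that they form an effective contrast with the species occurrence points used in modelling.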

19.
Parasite prevalence (the proportion of infected hosts) is a common measure used to describe parasitaemias and to unravel ecological and evolutionary factors that influence host-parasite relationships. Prevalence estimates are often based on small sample sizes because of either low abundance of the hosts or logistical problems associated with their capture or laboratory analysis. Because the accuracy of prevalence estimates is lower with small sample sizes, addressing sample size has been a common problem when dealing with prevalence data. Different methods are currently being applied to overcome this statistical challenge, but far from being different correct ways of solving the same problem, some are clearly wrong, and others need improvement.

20.
Fiske IJ, Bruna EM, Bolker BM. PLoS ONE, 2008, 3(8): e3080

Background

Matrix models are widely used to study the dynamics and demography of populations. An important but overlooked issue is how the number of individuals sampled influences estimates of the population growth rate (λ) calculated with matrix models. Even unbiased estimates of vital rates do not ensure unbiased estimates of λ: Jensen's inequality implies that even when the estimates of the vital rates are accurate, small sample sizes lead to biased estimates of λ due to increased sampling variance. We investigated if sampling variability and the distribution of sampling effort among size classes lead to biases in estimates of λ.

Methodology/Principal Findings

Using data from a long-term field study of plant demography, we simulated the effects of sampling variance by drawing vital rates and calculating λ for increasingly larger populations drawn from a total population of 3842 plants. We then compared these estimates of λ with those based on the entire population and calculated the resulting bias. Finally, we conducted a review of the literature to determine the sample sizes typically used when parameterizing matrix models used to study plant demography.

Conclusions/Significance

We found significant bias at small sample sizes when survival was low (survival = 0.5), and that sampling with a more-realistic inverse J-shaped population structure exacerbated this bias. However, our simulations also demonstrate that these biases rapidly become negligible with increasing sample sizes or as survival increases. For many of the sample sizes used in demographic studies, matrix models are probably robust to the biases resulting from sampling variance of vital rates. However, this conclusion may depend on the structure of populations or the distribution of sampling effort in ways that are unexplored. We suggest more intensive sampling of populations when individual survival is low and greater sampling of stages with high elasticities.
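The Jensen's inequality argument in this abstract can be illustrated with an exact calculation rather than a simulation. The sketch below assumes a hypothetical two-stage matrix [[0, F], [s_j, s_a]] with fecundity F = 2 and adult survival s_a = 0.5, and treats juvenile survival s_j = 0.5 as estimated from n individuals (a binomial proportion). None of these values or names come from the study; they are chosen only to show that an unbiased survival estimate still gives a biased λ, because λ is a nonlinear (here concave) function of survival.

```python
from math import comb, sqrt


def growth_rate(juvenile_survival, adult_survival=0.5, fecundity=2.0):
    """Dominant eigenvalue of the hypothetical 2x2 matrix [[0, F], [s_j, s_a]]."""
    return (adult_survival
            + sqrt(adult_survival ** 2 + 4 * fecundity * juvenile_survival)) / 2


def expected_lambda_hat(n, true_survival=0.5):
    """Exact expectation of lambda(k / n) when k ~ Binomial(n, true_survival)."""
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) * true_survival ** k * (1 - true_survival) ** (n - k)
        total += prob * growth_rate(k / n)
    return total


true_lambda = growth_rate(0.5)
bias_small = expected_lambda_hat(5) - true_lambda    # 5 individuals sampled
bias_large = expected_lambda_hat(500) - true_lambda  # 500 individuals sampled

print(bias_small, bias_large)
```

In this toy setting the estimate of λ is biased downward for both sample sizes, and the bias shrinks in magnitude as n grows, which is the qualitative pattern the abstract describes.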
