1.
Giles M. Foody 《Global Ecology and Biogeography》2011,20(3):498-508
Aim To explore the impacts of imperfect reference data on the accuracy of species distribution model predictions. The main focus is on the impacts of the quality of reference data (labelling accuracy) and, to a lesser degree, data quantity (sample size) on species presence–absence modelling. Innovation The paper challenges the common assumption that some popular measures of model accuracy and model predictions are prevalence independent. It highlights how imperfect reference data may affect a study and the actions that may be taken to address the resulting problems. Main conclusions The theoretical prevalence independence of popular accuracy measures, such as sensitivity, specificity, the true skill statistic (TSS) and the area under the receiver operating characteristic curve (AUC), is unlikely to hold in practice because of reference data error; all of these measures of accuracy, together with estimates of species occurrence, showed prevalence dependency arising from the use of a non‐gold‐standard reference. The number of cases used also had implications for the ability of a study to meet its objectives. Means to reduce the negative effects of imperfect reference data in study design and interpretation are suggested.
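To make the mechanism concrete, the sketch below computes the sensitivity, specificity and TSS a model would appear to have when it is scored against an imperfect reference. It assumes, as a simplification not taken from the paper, that model and reference errors are conditionally independent given the true presence or absence; the function name and its parameters are hypothetical.

```python
def apparent_accuracy(se, sp, prevalence, ref_se, ref_sp):
    """Return (sensitivity, specificity, TSS) of a model as measured
    against an imperfect reference.

    se, sp: the model's true sensitivity and specificity
    prevalence: true proportion of cases where the species is present
    ref_se, ref_sp: sensitivity and specificity of the reference labels
    Assumes model and reference errors are independent given the true state.
    """
    p = prevalence
    # Probability that the reference labels a case "present" / "absent"
    ref_pos = p * ref_se + (1 - p) * (1 - ref_sp)
    ref_neg = p * (1 - ref_se) + (1 - p) * ref_sp
    # Joint probabilities that model and reference give the same label
    both_pos = p * se * ref_se + (1 - p) * (1 - sp) * (1 - ref_sp)
    both_neg = p * (1 - se) * (1 - ref_se) + (1 - p) * sp * ref_sp
    app_se = both_pos / ref_pos
    app_sp = both_neg / ref_neg
    return app_se, app_sp, app_se + app_sp - 1


# A model with true sensitivity 0.9 and specificity 0.8 (true TSS = 0.7),
# scored against a reference that is 95% correct for both classes.
for p in (0.1, 0.5):
    _, _, tss = apparent_accuracy(0.9, 0.8, p, ref_se=0.95, ref_sp=0.95)
    print(f"prevalence={p}: apparent TSS={tss:.3f}")
# prevalence=0.1: apparent TSS=0.471
# prevalence=0.5: apparent TSS=0.630
```

With a perfect reference (`ref_se = ref_sp = 1`) the function returns the true values at any prevalence; once the reference contains labelling errors, the measured TSS shifts with prevalence, consistent with the conclusion above.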
3.
Aim Studying relationships between species and their physical environment requires species distribution data, ideally based on presence–absence (P–A) data derived from surveys. Such data are limited in their spatial extent. Presence‐only (P‐O) data are considered inappropriate for such analyses. Our aim was to evaluate whether such data may be used, when considering a multitude of species over a large spatial extent, to analyse the relationships between environmental factors and species composition. Location The study was conducted in virtual space. However, the geographic origin of the data used is the contiguous USA. Methods We created distribution maps for 50 virtual species based on actual environmental conditions in the study area. Sampling locations were based on true observations from the Global Biodiversity Information Facility. We produced P–A data by selecting ∼1000 random locations and recording the presence/absence of all species. We produced two P‐O data sets. The full P‐O set was produced by sampling the species at locations of true occurrences of species. The partial P‐O set was a subset of the full P‐O data set matching the size of the P–A data set. For each data set, we recorded the environmental variables at the same locations. We used CCA to evaluate the amount of variance in species composition explained by each variable. We evaluated the bias in each data set by calculating the deviation of the average values of the environmental variables at sampled locations from those of the entire area. Results The P–A and P‐O data sets were similar in terms of the amount of variance explained by the different environmental variables. We found sizable environmental and spatial bias in the P‐O data set compared to the entire study area.
Main conclusions Our results suggest that although P‐O data from collections contain bias, the multitude of species, and thus the relatively large amount of information in the data, allow the use of P‐O data for analysing environmental determinants of species composition.
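The bias measure described in the Methods (the deviation of mean environmental values at sampled locations from the area-wide mean) can be sketched as follows. Dividing the deviation by the area-wide standard deviation, so that variables in different units are comparable, is our assumption and not necessarily the authors' choice; the function name and example values are hypothetical.

```python
from statistics import mean, pstdev


def environmental_bias(sampled_values, area_values):
    """Standardized deviation of the mean of an environmental variable at
    sampled locations from its mean over the whole study area.
    Positive values mean the sample over-represents high values."""
    return (mean(sampled_values) - mean(area_values)) / pstdev(area_values)


# Hypothetical example: annual temperature (degrees C) across a study area,
# and at the locations where presence-only records are concentrated.
area_temperature = [4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
record_temperature = [12.0, 14.0, 14.0]
print(round(environmental_bias(record_temperature, area_temperature), 2))  # 1.27
```

A value near zero indicates that sampled locations are environmentally representative of the study area for that variable; random P–A locations should typically score close to zero.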
4.
For many applications the continuous prediction afforded by species distribution modeling must be converted to a map of presence or absence, so a threshold probability indicative of species presence must be fixed. Because probability outputs are biased by the frequency of presences (prevalence), a fixed threshold value such as 0.5 does not usually correspond to the threshold above which the species is more likely to be present. In this paper four threshold criteria are compared for a wide range of sample sizes and prevalences, using a virtual species in order to avoid the omnipresent error sources that the use of real species data implies. In general, the sensitivity–specificity difference minimizer and sensitivity–specificity sum maximizer criteria produced the most accurate predictions. The widely used 0.5 fixed threshold and the Kappa-maximizer criterion are the worst in almost all situations. Nevertheless, whatever criterion is used, the threshold value chosen and the research goals that determined its choice must be stated.
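The two best-performing criteria can be applied directly to predicted probabilities and observed presences/absences. The sketch below scans the distinct predicted values as candidate thresholds (an implementation choice of ours, not the paper's) and returns the threshold that minimizes |sensitivity − specificity| and the one that maximizes their sum. The function names are hypothetical, and both presences and absences are assumed to occur in the data.

```python
def sens_spec(probs, obs, threshold):
    """Sensitivity and specificity when probs >= threshold are called present."""
    tp = sum(p >= threshold and o == 1 for p, o in zip(probs, obs))
    fn = sum(p < threshold and o == 1 for p, o in zip(probs, obs))
    tn = sum(p < threshold and o == 0 for p, o in zip(probs, obs))
    fp = sum(p >= threshold and o == 0 for p, o in zip(probs, obs))
    return tp / (tp + fn), tn / (tn + fp)


def select_thresholds(probs, obs):
    """Thresholds chosen by the difference-minimizer and sum-maximizer criteria."""
    scored = [(t, *sens_spec(probs, obs, t)) for t in sorted(set(probs))]
    min_diff = min(scored, key=lambda r: abs(r[1] - r[2]))[0]
    max_sum = max(scored, key=lambda r: r[1] + r[2])[0]
    return {"min_diff": min_diff, "max_sum": max_sum}


probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7]
obs = [0, 0, 1, 0, 1, 1]
print(select_thresholds(probs, obs))  # {'min_diff': 0.4, 'max_sum': 0.3}
```

In this toy example 0.3 and 0.6 tie on the sum criterion and `max` keeps the first (lowest) one; real data usually need an explicit tie-breaking rule, which is one more reason to report the criterion and the threshold chosen.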
5.
Jorge Soberón 《Journal of Biogeography》2015,42(4):807-808
It has been proposed that the study of co‐occurrence of species, which is traditionally performed using full presence–absence matrices of sets of many species, could benefit from simply testing for random co‐occurrence between pairs of species, and that use of a full presence–absence matrix is tantamount to regarding it as having some real ecological identity. Here I argue that although there are valid questions that can be answered using a pairwise approach, there are many others that naturally require the analysis of entire sets of species in a joint way, as provided for through the use of full presence–absence matrices. Moreover, there are theoretical and mathematical advantages to the use of presence–absence matrices, a few of which are briefly discussed in this short note.
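For contrast with the full-matrix view, a pairwise test of random co-occurrence can be sketched with a hypergeometric null model, in which each species' occupied sites are placed at random among all sites. This particular null model is one common choice and is our assumption, not a method described in the note; the function name is hypothetical.

```python
from math import comb


def cooccurrence_probabilities(n_sites, n_a, n_b, observed):
    """Probabilities of at most / at least `observed` shared sites for two
    species occupying n_a and n_b of n_sites, if the occupied sites were
    placed at random (hypergeometric null model)."""
    total = comb(n_sites, n_b)
    lo, hi = max(0, n_a + n_b - n_sites), min(n_a, n_b)
    pmf = {
        j: comb(n_a, j) * comb(n_sites - n_a, n_b - j) / total
        for j in range(lo, hi + 1)
    }
    p_at_most = sum(v for j, v in pmf.items() if j <= observed)
    p_at_least = sum(v for j, v in pmf.items() if j >= observed)
    return p_at_most, p_at_least


# Two species each occupying 5 of 10 sites and sharing all 5 of them
p_at_most, p_at_least = cooccurrence_probabilities(10, 5, 5, 5)
print(f"{p_at_least:.4f}")  # 0.0040
```

Each call answers a question about one pair only; repeating it over all pairs of a large species set is exactly the situation in which the joint, whole-matrix analysis argued for above becomes attractive.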
6.
Héctor T. Arita, Andrés Christen, Pilar Rodríguez, Jorge Soberón 《Global Ecology and Biogeography》2012,21(2):282-292
Aim A great deal of information on distribution and diversity can be extracted from presence–absence matrices (PAMs), the basic analytical tool of many biogeographic studies. This paper presents numerical procedures that allow the analysis of such information by taking advantage of mathematical relationships within PAMs. In particular, we show how range–diversity (RD) plots summarize much of the information contained in the matrices by the simultaneous depiction of data on distribution and diversity. Innovation We use matrix algebra to extract and process data from PAMs. Information on the distribution of species and on species richness of sites is computed using the traditional R (by rows) and Q (by columns) procedures, as well as the new Rq (by rows, considering the structure of columns) and Qr (by columns, considering the structure of rows) methods. Matrix notation is particularly suitable for summarizing complex calculations using PAMs, and the associated algebra allows the implementation of efficient computational programs. We show how information on distribution and species richness can be depicted simultaneously in RD plots, allowing a direct examination of the relationship between those two aspects of diversity. We explore the properties of RD plots with a simple example, and use null models to show that while parameters of central tendency are not affected by randomization, the dispersion of points in RD plots does change, revealing the significance of patterns of co‐occurrence of species and of similarity among sites. Main conclusion Species richness and range size are both valid measures of diversity that can be analysed simultaneously with RD plots. A full analysis of a system requires measures of central tendency and dispersion for both distribution and species richness.
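The row and column summaries can be sketched with plain loops; in matrix notation, range sizes and richness values are the products of the PAM (or its transpose) with a vector of ones. The two "considering" summaries below are our reading of the Rq and Qr procedures, and the scaling of the RD-plot coordinates is an assumption; the function name is hypothetical.

```python
def pam_summaries(pam):
    """Summaries of a presence-absence matrix.

    pam[i][j] is 1 if species i occurs at site j (rows: species, columns: sites).
    Assumes every species occupies at least one site and every site holds at
    least one species.
    """
    n_species, n_sites = len(pam), len(pam[0])
    # R (by rows): range size of each species = row sums
    range_size = [sum(row) for row in pam]
    # Q (by columns): species richness of each site = column sums
    richness = [sum(pam[i][j] for i in range(n_species)) for j in range(n_sites)]
    # By rows, considering columns: mean richness of the sites each species occupies
    mean_site_richness = [
        sum(richness[j] for j in range(n_sites) if pam[i][j]) / range_size[i]
        for i in range(n_species)
    ]
    # By columns, considering rows: mean range size of the species at each site
    mean_species_range = [
        sum(range_size[i] for i in range(n_species) if pam[i][j]) / richness[j]
        for j in range(n_sites)
    ]
    # Species points for an RD plot: proportional range vs proportional mean richness
    rd_points = [
        (range_size[i] / n_sites, mean_site_richness[i] / n_species)
        for i in range(n_species)
    ]
    return range_size, richness, mean_site_richness, mean_species_range, rd_points


pam = [
    [1, 1, 0],  # species 0
    [0, 1, 1],  # species 1
    [1, 0, 0],  # species 2
]
ranges, rich, rq, qr, rd = pam_summaries(pam)
print(ranges, rich)  # [2, 2, 1] [2, 2, 1]
print(rq, qr)        # [2.0, 1.5, 2.0] [1.5, 2.0, 2.0]
```

One relationship of the kind the matrix algebra exposes: summing range size × mean site richness over species gives the same total as summing squared richness over sites (9 in this example), since both count, site by site, ordered pairs of species present together.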
9.
Without quality presence–absence data, discrimination metrics such as TSS can be misleading measures of model performance
Boris Leroy, Robin Delsol, Bernard Hugueny, Christine N. Meynard, Chéïma Barhoumi, Morgane Barbet‐Massin, Céline Bellard 《Journal of Biogeography》2018,45(9):1994-2002
The discriminating capacity (i.e. ability to correctly classify presences and absences) of species distribution models (SDMs) is commonly evaluated with metrics such as the area under the receiver operating characteristic curve (AUC), the Kappa statistic and the true skill statistic (TSS). AUC and Kappa have been repeatedly criticized, but TSS has fared relatively well since its introduction, mainly because it has been considered independent of prevalence. In addition, discrimination metrics have been contested because they should be calculated on presence–absence data, but are often used on presence‐only or presence‐background data. Here, we investigate TSS and an alternative set of metrics: similarity indices, also known as F‐measures. We first show that even in ideal conditions (i.e. perfectly random presence–absence sampling), TSS can be misleading because of its dependence on prevalence, whereas similarity/F‐measures provide adequate estimations of model discrimination capacity. Second, we show that in real‐world situations where sample prevalence is different from true species prevalence (i.e. biased sampling or presence‐pseudoabsence), no discrimination capacity metric provides adequate estimation of model discrimination capacity, including metrics specifically designed for modelling with presence‐pseudoabsence data. Our conclusions are twofold. First, they unequivocally impel SDM users to understand the potential shortcomings of discrimination metrics when quality presence–absence data are lacking, and we recommend obtaining such data. Second, in the specific case of virtual species, which are increasingly used to develop and test SDM methodologies, we strongly recommend the use of similarity/F‐measures, which, unlike TSS, were not biased by prevalence.
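The metrics discussed can all be computed from a confusion matrix. The sketch below (function name hypothetical) shows one property relevant to the argument: the similarity indices ignore true negatives, so adding true negatives, as happens when the same model is evaluated over a larger or lower-prevalence area, changes TSS but leaves the Jaccard and Sørensen indices unchanged. It illustrates the mechanism only and does not reproduce the paper's analyses.

```python
def discrimination_metrics(tp, fp, fn, tn):
    """TSS and two similarity indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "TSS": sensitivity + specificity - 1,
        "Jaccard": tp / (tp + fp + fn),
        "Sorensen": 2 * tp / (2 * tp + fp + fn),  # identical to the F1 score
    }


# Same presences and errors; only the number of true negatives differs,
# so sample prevalence falls from 0.5 to 0.05.
small = discrimination_metrics(tp=40, fp=20, fn=10, tn=30)
large = discrimination_metrics(tp=40, fp=20, fn=10, tn=930)
print(round(small["TSS"], 3), round(large["TSS"], 3))            # 0.4 0.779
print(round(small["Sorensen"], 3), round(large["Sorensen"], 3))  # 0.727 0.727
```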
10.
Presence‐only data present challenges for selecting thresholds to transform species distribution modeling results into binary outputs. In this article, we compare two recently published threshold selection methods (maxSSS and maxFpb) and examine the effectiveness of the threshold‐based prevalence estimation approach. Six virtual species with varying prevalence were simulated within a real landscape in southeastern Australia. Presence‐only models were built with DOMAIN, generalized linear model, Maxent, and Random Forest. Thresholds were selected with the two methods, maxSSS and maxFpb, using four presence‐only datasets with different ratios of the number of known presences to the number of random points (KP–RPratio). Sensitivity, specificity, true skill statistic, and F measure were used to evaluate the performance of the results. Species prevalence was estimated as the ratio of the number of predicted presences to the total number of points in the evaluation dataset. Thresholds selected with maxFpb varied as the KP–RPratio of the threshold selection datasets changed. Datasets with a KP–RPratio around 1 generally produced better results than those with ratios distant from 1. Results produced by maxFpb had specificity too low for very common species using Random Forest and Maxent models. In contrast, maxSSS produced consistent results whichever dataset was used. The estimation of prevalence was almost always biased, and the bias was very large for DOMAIN and Random Forest predictions. We conclude that maxFpb is affected by the KP–RPratio of the threshold selection datasets, but maxSSS is almost unaffected by this ratio. Unbiased estimates of prevalence are difficult to obtain using the threshold‐based approach.
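A minimal sketch of two of the steps compared in the paper, threshold selection with maxSSS and threshold-based prevalence estimation, is given below. Random (background) points stand in for absences when computing specificity, which is why the resulting estimates can be biased; maxFpb is not sketched here, and the function names and example scores are hypothetical.

```python
def max_sss_threshold(presence_scores, background_scores):
    """Threshold maximizing sensitivity + specificity, with random
    (background) points used in place of true absences."""
    best_t, best_sum = None, float("-inf")
    for t in sorted(set(presence_scores) | set(background_scores)):
        se = sum(s >= t for s in presence_scores) / len(presence_scores)
        sp = sum(s < t for s in background_scores) / len(background_scores)
        if se + sp > best_sum:
            best_t, best_sum = t, se + sp
    return best_t


def estimate_prevalence(scores, threshold):
    """Share of evaluation points predicted present at the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


presence = [0.6, 0.7, 0.8, 0.9]     # model scores at known presences
background = [0.1, 0.2, 0.3, 0.7]   # model scores at random points
t = max_sss_threshold(presence, background)
print(t, estimate_prevalence(background, t))  # 0.6 0.25
```

Changing how many random points accompany the known presences (the KP–RP ratio) changes the candidate thresholds and the specificity estimates, which is the sensitivity the study examines.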
11.
Samuel D. Veloz 《Journal of Biogeography》2009,36(12):2290-2299
Aim Environmental niche models that utilize presence‐only data have been increasingly employed to model species distributions and test ecological and evolutionary predictions. The ideal method for evaluating the accuracy of a niche model is to train a model with one dataset and then test model predictions against an independent dataset. However, a truly independent dataset is often not available, and instead random subsets of the total data are used for ‘training’ and ‘testing’ purposes. The goal of this study was to determine how spatially autocorrelated sampling affects measures of niche model accuracy when using subsets of a larger dataset for accuracy evaluation. Location The distribution of Centaurea maculosa (spotted knapweed; Asteraceae) was modelled in six states in the western United States: California, Oregon, Washington, Idaho, Wyoming and Montana. Methods Two types of niche modelling algorithms, the genetic algorithm for rule‐set prediction (GARP) and maximum entropy modelling (as implemented with Maxent), were used to model the potential distribution of C. maculosa across the region. The effect of spatially autocorrelated sampling was examined by applying a spatial filter to the presence‐only data (to reduce autocorrelation) and then comparing predictions made using the spatial filter with those using a random subset of the data, equal in sample size to the filtered data. Results The accuracy of predictions from both algorithms was sensitive to the spatial autocorrelation of sampling effort in the occurrence data. Spatial filtering led to lower values of the area under the receiver operating characteristic curve but higher similarity statistic (I) values when compared with predictions from models built with random subsets of the total data, meaning that spatial autocorrelation of sampling effort between training and test data led to inflated measures of accuracy.
Main conclusions The findings indicate that care should be taken when interpreting the results from presence‐only niche models when training and test data have been randomly partitioned but occurrence data were non‐randomly sampled (in a spatially autocorrelated manner). The higher accuracies obtained without the spatial filter are a result of spatial autocorrelation of sampling effort between training and test data inflating measures of prediction accuracy. If independently surveyed data for testing predictions are unavailable, then it may be necessary to explicitly account for the spatial autocorrelation of sampling effort between randomly partitioned training and test subsets when evaluating niche model predictions.
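The comparison design (a spatially filtered record set versus a random subset of equal size) can be sketched as below. The abstract does not specify the filter, so the greedy minimum-distance thinning used here is an assumption, as are the function names; coordinates are assumed to be in a projected system (e.g. km) so that Euclidean distance is meaningful.

```python
import random
from math import dist


def spatial_filter(points, min_distance):
    """Greedy thinning: keep a record only if it lies at least min_distance
    from every record already kept. Points are (x, y) in projected units."""
    kept = []
    for p in points:
        if all(dist(p, q) >= min_distance for q in kept):
            kept.append(p)
    return kept


def random_subset(points, k, seed=None):
    """Random subset of k records, used as the size-matched comparison."""
    return random.Random(seed).sample(points, k)


records = [(0, 0), (1, 0), (5, 0), (5.5, 0), (10, 0)]
filtered = spatial_filter(records, min_distance=2)
comparison = random_subset(records, len(filtered), seed=1)
print(filtered)  # [(0, 0), (5, 0), (10, 0)]
```

Because greedy thinning depends on record order, shuffling the records before filtering (or repeating the filter several times) is a reasonable precaution.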
14.
Assessing the accuracy of species distribution models to predict amphibian species richness patterns (total citations: 1; self-citations: 0; citations by others: 1)
1. Evaluating the distribution of species richness where biodiversity is high but has been insufficiently sampled is not an easy task. Species distribution modelling has become a useful approach for predicting species' ranges, based on the relationships between species records and environmental variables. Overlapping predictions of individual distributions could be a useful strategy for obtaining estimates of species richness and composition in a region, but these estimates should be evaluated using a proper validation process, which compares the predicted richness values and composition with accurate data from independent sources. 2. In this study, we propose a simple approach to estimate model performance for several distributional predictions generated simultaneously. This approach is particularly suitable when species distribution modelling techniques that require only presence data are used. 3. The individual distributions for the 370 known amphibian species of Mexico were predicted using maxent to model data on their known presence (66,113 presence-only records). Distributions were subsequently overlapped to obtain a prediction of species richness. Accuracy was assessed by comparing the overall species richness values predicted for the region with observed and predicted values from 118 well-surveyed sites, each with an area of c. 100 km², which were identified using species accumulation curves and nonparametric estimators. 4. The derived models revealed a remarkable heterogeneity of species richness across the country, provided information about species composition per site and allowed us to obtain a measure of the spatial distribution of prediction errors. Examining the magnitude and location of model inaccuracies, as well as separately assessing errors of both commission and omission, highlights the inaccuracy of the predictions of species distribution models and the need to provide measures of uncertainty along with the model results.
5. The combination of a species distribution modelling method like maxent and species richness estimators offers a useful tool for identifying when the overall pattern provided by all model predictions might represent the geographical patterns of species richness and composition, regardless of the particular quality or accuracy of the predictions for each individual species.
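Stacking per-species predictions and scoring them against well-surveyed sites, with commission and omission counted separately, can be sketched as follows. Binary per-species maps (the set of sites where each species is predicted present) and complete species lists for the surveyed sites are assumed inputs; the study compared against richness values derived from accumulation curves and nonparametric estimators, which this sketch does not reproduce. Function names and data are hypothetical.

```python
def stack_predictions(binary_maps):
    """binary_maps: species -> set of sites predicted present.
    Returns site -> set of species predicted present there."""
    by_site = {}
    for species, sites in binary_maps.items():
        for site in sites:
            by_site.setdefault(site, set()).add(species)
    return by_site


def site_errors(predicted, observed):
    """Compare stacked predictions with survey lists (site -> set of species)."""
    report = {}
    for site, obs in observed.items():
        pred = predicted.get(site, set())
        report[site] = {
            "predicted_richness": len(pred),
            "observed_richness": len(obs),
            "commission": len(pred - obs),  # predicted present, not recorded
            "omission": len(obs - pred),    # recorded, not predicted
        }
    return report


maps = {"A": {"s1", "s2"}, "B": {"s1"}}
surveys = {"s1": {"A", "C"}, "s2": {"A"}}
print(site_errors(stack_predictions(maps), surveys)["s1"])
# {'predicted_richness': 2, 'observed_richness': 2, 'commission': 1, 'omission': 1}
```

Site s1 shows why richness alone can mislead: predicted and observed richness agree, yet one commission and one omission error cancel out, which is why assessing both error types separately is informative.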
15.
Aidin Niamir, Andrew K. Skidmore, Albertus G. Toxopeus, Antonio R. Muñoz, Raimundo Real 《Diversity & Distributions》2011,17(6):1173-1185
Aim The spatial resolution of species atlases, and therefore of the resulting model predictions, is often too coarse for local applications. Collecting distribution data at a finer resolution for large numbers of species requires a comprehensive sampling effort, making it impractical and expensive. This study outlines the incorporation of existing knowledge into a conventional approach to predict the distribution of Bonelli’s eagle (Aquila fasciata) at a resolution 100 times finer than available atlas data. Location Malaga province, Andalusia, southern Spain. Methods A Bayesian expert system was proposed to utilize the knowledge from distribution models to yield the probability of a species being recorded at a finer resolution (1 × 1 km) than the original atlas data (10 × 10 km). The recorded probability was then used as a weight vector to generate a sampling scheme from the species atlas to enhance the accuracy of the modelling procedure. Maximum entropy species distribution modelling (MaxEnt) was used as the species distribution model. The results of MaxEnt using the enhanced sampling scheme were compared with those using a random sampling scheme, based on four groups of environmental variables: topographic, climatic, biological and anthropogenic. Results The models with the sampling scheme enhanced by an expert system had a higher discriminative capacity than the baseline models. The downscaled (i.e. finer scale) species distribution maps produced by the hybrid MaxEnt/expert system approach were more specific to the nest locations and more contrasted than those of the baseline model. Main conclusions The proposed method is a feasible substitute for comprehensive field work. The approach developed in this study is applicable for predicting the distribution of Bonelli’s eagle at a local scale from a national‐level occurrence data set; however, its usefulness may be limited to well‐known species.
18.
Presence‐only data abound in ecology, often accompanied by a background sample. Although many interesting aspects of a species’ distribution can be learned from such data, one cannot learn the overall species occurrence probability, or prevalence, without making unjustified simplifying assumptions. In this forum article we question the approach of Royle et al. (2012), which claims to be able to do this.
19.
Tim Newbold, Tom Reader, Ahmed El‐Gabbas, Wiebke Berg, Wael M. Shohdi, Samy Zalat, Sherif Baha El Din, Francis Gilbert 《Oikos》2010,119(8):1326-1334
Species distribution models are a very popular tool in ecology and biogeography and have great potential to help direct conservation efforts. Models are traditionally tested by using half the original species records to build the model and half to evaluate it. However, this can lead to overly optimistic estimates of model accuracy, particularly when there are systematic biases in the data. It is better to evaluate models using independent data. This study used independent species records from a new survey to provide a more rigorous evaluation of distribution‐model accuracy. Distribution models were built for reptile, amphibian, butterfly and mammal species. The accuracy of these models was evaluated both with the traditional approach of partitioning the original species records into model‐building and model‐evaluating datasets, and with independent records collected during a new field survey of 21 previously unvisited sites in diverse habitat types. We tested whether variation in distribution‐model accuracy among species could be explained by species detectability, range size, number of records used to build the models, and body size. Estimates of accuracy derived using the new species records correlated positively with estimates generated using the traditional data‐partitioning approach, but were on average 22% lower. Model accuracy was negatively related to range size and number of records used to build the models, and positively related to the body size of butterflies. There was no clear relationship between species detectability and model accuracy. The field data generally validated the species distribution models. However, there was considerable variation in model accuracy among species, some of which could be explained by the characteristics of species.