Similar literature
Found 20 similar articles (search time: 156 ms)
1.
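To explore how different feature-mining methods can be combined with the generalized boosted regression model in digital soil mapping, this study first screened environmental covariates with two feature selection methods, recursive feature elimination and a filter method. The original environmental covariates and the selected optimal variable subsets were then used as predictors to build generalized boosted regression and random forest models of soil pH in Anhui Province and to produce maps. The results showed that both feature-mining methods effectively improved the accuracy with which the generalized boosted regression and random forest models predicted soil pH, and also reduced dimensionality. Compared with the random forest model, the generalized boosted regression model had slightly lower prediction accuracy on the validation set but far higher accuracy on the training set; it explained a high proportion of the variation and performed well overall. The main random forest parameters, ntree and mtry, had little influence on the model, whereas the parameters of the generalized boosted regression model strongly affected its prediction accuracy: different parameter combinations gave different accuracies, so the parameters need to be tuned before modelling. The spatial maps showed that soil pH in Anhui Province follows an "acidic in the south, alkaline in the north" pattern.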

2.
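Application of the random forest model in classification and regression analysis   Cited by: 25 (self-citations: 0, citations by others: 25)
Li Xinhai (李欣海). 《昆虫知识》, 2013, 50(4): 1190-1197
The random forest model is an algorithm based on classification trees, proposed by Breiman and Cutler in 2001. By aggregating a large number of classification trees it improves predictive accuracy, and it is a newer model that can replace traditional machine learning methods such as neural networks. Random forest is fast to compute and performs well on large data sets. It is not affected by the multicollinearity problem faced by ordinary regression analysis and does not require variable selection. Existing random forest software packages report the importance of every variable. In addition, random forest makes it easy to compute the nonlinear effects of variables and can capture interactions between variables. It is also insensitive to outliers. Using three case studies, this paper introduces the application of random forest to the discriminant analysis of insect species, to the analysis of presence/absence data (as a replacement for logistic regression), and to regression analysis. The data formats and R code of the case studies can serve as a reference for studying the application of random forest in classification and regression analysis.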

3.
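To explore high-resolution digital soil mapping methods for hilly areas at the small-watershed scale, this study investigated landscape facies classification and combined it with Geomorphons (GM) micro-landform feature data at different scales to form a group of categorical variables. These were used in high-resolution predictive mapping of soil pH, clay content and cation exchange capacity, and were combined and compared with traditional variables derived from digital elevation models and with remote sensing variables. In addition, the best of three machine learning models (support vector machine, partial least squares regression and random forest) was selected and combined with regression kriging of the residuals to build and evaluate the prediction models. The results showed that the landscape and multi-scale micro-landform categorical variables improved the prediction accuracy for pH, clay content and cation exchange capacity in small-watershed hilly terrain by 18.8%, 8.2% and 8.7%, respectively. The landscape facies classification map, which contains vegetation information, contributed more to the models than land use data did, and the 5 m resolution GM micro-landform classification map was better suited to high-accuracy predictive mapping than lower-resolution classification maps. Clay content was predicted most accurately by the random forest hybrid model, whereas for pH and cation exchange capacity adding residual regression kriging to the random forest model was not appropriate. The model that combined the landscape and multi-scale micro-landform categorical variables, the variables derived from digital elevation models and the remote sensing variables performed best, indicating that in areas of undulating terrain multiple data sources carry more useful soil information than a single source. The landscape categorical variable group, composed of GM data and land-surface landscape data, could explain about 40% of the spatial variation of some soil properties in small-watershed hilly areas when used as the main variables. In similar soil predictive mapping studies, multi-resolution GM and landscape classification data have the potential to serve as environmental variables in prediction models.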

4.
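Application of R-based structural equation modelling in ecology   Cited by: 1 (self-citations: 0, citations by others: 1)
Structural equation modelling has become one of the main methods for analysing ecological data. Unlike other multivariate statistical methods, the modelling process is driven by theoretical hypotheses and can simultaneously quantify direct and indirect causal relationships among multiple variables. However, because structural equation modelling was introduced into ecology in China relatively recently, researchers often run into problems when applying it, and misuse is common. This paper therefore systematically describes the principles of structural equation modelling, the modelling workflow, model evaluation and model modification, and uses concrete case studies to introduce the two mainstream R packages for structural equation modelling, lavaan and piecewiseSEM. lavaan can analyse structural equation models that include latent variables, whereas piecewiseSEM can handle non-independent observations and response variables whose residuals do not follow a multivariate normal distribution. This paper should help researchers understand structural equation modelling accurately and extend its application in ecology.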

5.
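The random forest model has attracted wide attention since its publication in 2001. Because random forest can be used for regression, discriminant analysis and other statistical analyses, and is not bound by the preconditions of parametric tests such as normality, homogeneity of variance and independence of the predictors, it is increasingly widely applied and tends to be regarded as a universal model. In fact, random forest is a model with distinctive characteristics: it fits observations by local optimization, and its results are often inaccurate when analysing data in which the predictors have partial effects. Using distribution data for species of Cicadidae as an example, this paper compares random forest with multiple linear regression, generalized additive models and artificial neural networks in regression analysis, and with linear discriminant analysis in discriminant analysis, and highlights the fragmented character of random forest predictions. The results showed that random forest was the most accurate when handling data with multicollinearity and interactions, and in discriminant analysis. Given the limitations of random forest, we recommend comparing several models when analysing data. The R code in the paper can serve as a reference for researchers.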

6.
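Species distribution models are commonly used in basic and applied ecological research to identify the factors that affect the distribution of organisms and species richness, to quantify the relationships between species and abiotic conditions, to predict species' responses to land use and climate change, and to identify potential protected areas. Biotic interactions are rarely included in traditional species distribution models. Joint species distribution models (JSDMs), a feasible approach proposed in recent years, can account for environmental factors and biotic interactions at the same time, and have therefore become a powerful tool for analysing community structure and interspecific interaction processes. JSDMs build on species distribution models (SDMs): they usually use generalized linear regression models to describe the multivariate response of species to environmental variables, capture associations among species in the form of random effects, and incorporate latent variable models (LVMs). Model parameters are estimated by maximum likelihood, based on the Laplace approximation and Markov chain Monte Carlo simulation, or by Bayesian methods. This paper summarizes the origin and theoretical basis of JSDMs, focuses on the characteristics of different types of JSDMs and their applications in modern ecology, and discusses their prospects, the problems encountered in their use and directions for development. As research on the relationships between environmental factors and multiple species deepens, JSDMs will become a focus of species distribution modelling research.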

7.
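The key to the successful invasion of Spartina alterniflora lies in its capacity for growth and reproduction and its ability to adapt to the environment. Leaf functional traits such as leaf water content, relative chlorophyll content, carbon-to-nitrogen ratio, total nitrogen, total phosphorus and specific leaf area reflect its ability to use resources and to adapt to the environment. Taking the coastal wetlands of Yancheng, Jiangsu, as the study area, we studied the relationship between leaf functional traits of Spartina alterniflora and hyperspectral data. Principal component analysis was applied to the raw spectra and to the first-derivative spectra to extract new principal component variables, which were used as predictors to build four types of prediction model for each trait: stepwise regression, BP neural network, support vector machine and random forest. The best model was selected by comparing R² and RMSE, and was then rebuilt on the sensitive bands identified by correlation analysis to verify its accuracy and applicability. The results showed that: (1) models built on first-derivative data outperformed those built on raw spectral data; (2) across the functional traits, the predictive performance of the four models ranked random forest > support vector machine > BP neural network > stepwise regression; the random forest model was accurate and stable and clearly better than the other three, whereas stepwise regression performed worst and is unsuitable for hyperspectral modelling of Spartina alterniflora leaf functional traits; (3) for random forest models built on the sensitive bands identified by correlation analysis, the calibration R² values were all greater than 0.90 and the validation R² values ranged from 0.73 to 0.95, further confirming the accuracy and stability of the random forest model. The results indicate that hyperspectral data can be a powerful means of rapidly monitoring the growth status of Spartina alterniflora, and that the random forest model can serve as a high-accuracy model for estimating its leaf functional traits.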

8.
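The Daxing'anling region of Heilongjiang has a high incidence of lightning-caused forest fires, and accurate fire risk models are urgently needed to predict forest fires there. Based on lightning fire data and environmental variable data for the Daxing'anling region, this paper used the MAXENT model to predict the risk of lightning-caused forest fires. The environmental variables were first checked for collinearity, their importance was then evaluated with the cumulative regularized gain method and the jackknife method, and the predictive accuracy of the MAXENT model was finally assessed with the maximum Kappa value and the AUC value. The results showed that lightning energy and neutralized charge had variance inflation factor (VIF) values of 5.012 and 6.230, respectively; they were collinear with other variables and could not be used for model training. Daily rainfall, the number of cloud-to-ground lightning flashes and the return-stroke current intensity of cloud-to-ground lightning were the three most important factors affecting the occurrence of lightning-caused forest fires, whereas daily mean wind speed and aspect had little influence. Both the maximum Kappa value and the AUC value tended to increase as the proportion of data used for model building increased. The maximum Kappa values were all greater than 0.75, with a mean of 0.772; the AUC values were all greater than 0.5, with a mean of 0.859. The MAXENT model reached a moderate level of predictive accuracy and can be applied to predicting the risk of lightning-caused forest fires in the Daxing'anling region.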
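The collinearity check mentioned in the abstract above can be illustrated with a small sketch. It assumes the simplest case of only two predictors, where the variance inflation factor reduces to 1 / (1 - r²) with r the Pearson correlation between them. The function name, the sample values and the cut-off of 5 are hypothetical; the abstract does not state which threshold the authors used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model has exactly two predictors.

    With two predictors, regressing one on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2).
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
VIF_THRESHOLD = 5.0  # assumed cut-off, not taken from the abstract
print(f"VIF = {vif:.3f}, collinear: {vif > VIF_THRESHOLD}")
```

With more than two predictors, each VIF requires regressing one predictor on all the others, which this sketch does not cover.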

9.
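This paper reports on the ecology and biodiversity of fungi in a unique fungal reserve in the United Kingdom, where data on species composition have been collected since 1994. Information on the ecological interrelationships of fungi and their role in overall ecosystem function can complement the biodiversity data. From May to August, in eight experimental plots with different vegetation cover (beech, birch, birch-oak-beech, grasses), the properties and composition of forest leaf litter and the underlying soil were studied, and bacterial population sizes and fungal biomass (measured as ergosterol) were determined. Correlation analysis and piecewise regression modelling, combined with protozoan and nematode data collected in a parallel study, yielded a series of results. These results highlight the complexity of the factors that influence the temporal dynamics of the spatial variability of fungal biomass in forest soil and forest leaf litter. Most interactions appear to be transient, which should be taken fully into account when interpreting environmental observations. Finally, several specific relationships are explained, directions for further research are suggested, and the need to study whole-ecosystem function is discussed.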

10.
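Liu Chenjian (刘陈坚), Zhang Liming (张黎明), Ren Yin (任引). 《生态学报》, 2020, 40(22): 8199-8206
Forest biomass directly affects the assessment of forest ecosystem services. How to use landsenses ecology to accurately predict the spatiotemporal trends of forest biomass at the regional scale is a key strategic question that bears on major national policy making and on building an ecological industry system. The aim of this study was to build an ecological information diagnosis framework and optimize the structure of the meliorization model (the 3PG2 model), in order to resolve the uncertainty in ecological prediction that model structure introduces during forest landsense creation. Taking Nanjing County, Fujian, where Chinese fir forests are widely distributed, as the study area, we identified the regions of uncertainty in simulated biomass by choosing an appropriate threshold range and applying spatial statistical analysis, and built an ecological information diagnosis framework consisting of three parts: the Geogdetector software, genetic techniques and a computer program. The Geogdetector software was used to clarify the influence and mechanism of interactions among multiple factors on model simulation, genetic techniques were used to optimize the model structure and improve simulation accuracy, and the computer program and the 3PG2 model were used to predict the spatiotemporal trends of Chinese fir forest biomass at the regional scale. The results showed that stand age was the dominant factor causing uncertainty in the biomass simulated by the 3PG2 model. The ecological information diagnosis framework built with landsenses ecology (mix-marching data and a meliorization model) can accurately predict forest biomass and support sustainable forest management at the regional scale.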

11.
Rangelands with more than 8000 plant species occupy nearly 54.6% of the land area of Iran and thus represent a rich store of plant genetic resources. Mazandaran province has 378,000 ha of rangelands with high plant species richness and diversity owing to its climate, but plant distributions are at risk because of unprincipled management, land use change and the resulting changes in environmental factors. Vegetation management strategies can be guided by models that predict plant species distribution based on governing environmental variables. This is especially useful for the dominant species that determine ecosystem processes. The modelling algorithm of each species distribution model (SDM) determines its suitability for different ecosystems. Our aim was to compare the predictive power of a number of SDMs and to evaluate the importance of a range of environmental variables as predictors in the context of semi-arid rangeland vegetation. The selected study area, the Sarkhas rangelands (northern Iran, 36°10′ 42˝ N - 51°19′ 11˝ E), covers approximately 4358.9 ha of Mazandaran province. The efficacy of four different modelling techniques, as well as an ensemble model, was evaluated for predicting the distribution of five dominant forage plant species (Vicia villosa, Stachys lavandulifolia, Coronilla balansae, Sanguisorba minor and Alopecurus textilis). The models used were artificial neural network (ANN), boosted regression trees (BRT), classification and regression trees (CART), and random forest (RF). The ensemble, RF and CART models had the highest area under the curve (AUC). The AUC values obtained for Vicia villosa, Stachys lavandulifolia, Coronilla balansae, Sanguisorba minor and Alopecurus textilis were 0.90, 0.72, 0.76, 0.69 and 0.75, respectively. The ensemble model most consistently demonstrated high predictive power across species in the rangeland context investigated here. BRT exhibited the least predictive power.
An importance analysis of variables showed that soil organic C according to the CART model (0.396) and K according to the RF model (0.396) were the most important environmental variables.
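The AUC used above to rank the models can be read as the probability that a randomly chosen presence site receives a higher predicted score than a randomly chosen absence site, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name and the scores are hypothetical and are not data from the study.

```python
def auc(presence_scores, absence_scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    Counts the fraction of presence/absence pairs in which the presence
    site has the higher predicted score; ties contribute 0.5.
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted suitability scores, for illustration only.
presence = [0.9, 0.8, 0.4]
absence = [0.7, 0.3, 0.2]
print(round(auc(presence, absence), 3))
```

The double loop is quadratic in the number of sites, which is fine for a sketch; rank-based formulas give the same value more efficiently on large data sets.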

12.
A working guide to boosted regression trees   Cited by: 33 (self-citations: 0, citations by others: 33)
1. Ecologists use statistical models for both explanation and prediction, and need techniques that are flexible enough to express typical features of their data, such as nonlinearities and interactions. 2. This study provides a working guide to boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques that aim to fit a single parsimonious model. Boosted regression trees combine the strengths of two algorithms: regression trees (models that relate a response to their predictors by recursive binary splits) and boosting (an adaptive method for combining many simple models to give improved predictive performance). The final BRT model can be understood as an additive regression model in which individual terms are simple trees, fitted in a forward, stagewise fashion. 3. Boosted regression trees incorporate important advantages of tree-based methods, handling different types of predictor variables and accommodating missing data. They have no need for prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and automatically handle interaction effects between predictors. Fitting multiple trees in BRT overcomes the biggest drawback of single tree models: their relatively poor predictive performance. Although BRT models are complex, they can be summarized in ways that give powerful ecological insight, and their predictive performance is superior to most traditional modelling methods. 4. The unique features of BRT raise a number of practical issues in model fitting. We demonstrate the practicalities and advantages of using BRT through a distributional analysis of the short-finned eel (Anguilla australis Richardson), a native freshwater fish of New Zealand. We use a data set of over 13 000 sites to illustrate effects of several settings, and then fit and interpret a model using a subset of the data. 
We provide code and a tutorial to enable the wider use of BRT by ecologists.
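The description above of BRT as an additive model whose terms are simple trees, fitted in a forward, stagewise fashion, can be made concrete with a toy sketch. It assumes a single predictor, squared-error loss and two-leaf trees (stumps); the function names, the learning rate and the data are hypothetical, and this is not the code or tutorial that the authors provide.

```python
def fit_stump(x, residuals):
    """Find the single binary split on x that minimizes squared error."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_brt(x, y, n_trees=200, learning_rate=0.1):
    """Forward stagewise fitting: each stump is fitted to current residuals."""
    intercept = sum(y) / len(y)
    fitted = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        fitted = [fi + learning_rate * predict_stump(stump, xi)
                  for xi, fi in zip(x, fitted)]
    return intercept, stumps, learning_rate


def predict_brt(model, xi):
    intercept, stumps, learning_rate = model
    return intercept + learning_rate * sum(predict_stump(s, xi) for s in stumps)


# Hypothetical nonlinear response, for illustration only.
x = list(range(10))
y = [xi ** 2 for xi in x]
model = fit_brt(x, y)
mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict_brt(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
print(mse_baseline, mse_model)
```

Real BRT implementations add features this sketch omits, such as multiple predictors, deeper trees, random subsampling of the data at each step and other loss functions.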

13.
1. Eutrophication is a serious threat in many parts of the world, and identifying the environmental factors that determine the spatial distribution of eutrophicated waterbodies as well as the development of management tools is a challenge. 2. In this study, data from the Ile‐de‐France region were analysed to determine if catchment scale environmental variables could predict concentrations of chlorophyll a (used as a proxy for eutrophication status) of artificial lakes and reservoirs. 3. Generalised additive models (GAM) and random forest models (RF) displayed greater predictive power than generalised linear models, indicating the importance of non‐monotonic relationships. Using RF modelling, very high predictive accuracy was achieved for both continuous and binomial (eutrophic or not) response variables (continuous: R2 = 0.715; binomial: kappa = 0.764, 89% of waterbodies were accurately predicted). The better predictive power and robustness of RF versus GAM was attributed to the former's ability to better handle complex interactions between predictors and to account for threshold effects. 4. Our results confirmed the close link between the water quality of lakes and reservoirs and the characteristics of their catchments. Moreover, we also showed that (i) simple (e.g. linear and/or monotonic) relationships between catchment land use and water quality were only found for sub‐regional datasets, and (ii) land use needs to be considered in association with complementary environmental variables (hydromorphological variables) to best assess its impact on water quality.

14.
SUMMARY 1. The prediction of species distributions is of primary importance in ecology and conservation biology. Statistical models play an important role in this regard; however, researchers have little guidance when choosing between competing methodologies because few comparative studies have been conducted. 2. We provide a comprehensive comparison of traditional and alternative techniques for predicting species distributions using logistic regression analysis, linear discriminant analysis, classification trees and artificial neural networks to model: (1) the presence/absence of 27 fish species as a function of habitat conditions in 286 temperate lakes located in south‐central Ontario, Canada and (2) simulated data sets exhibiting deterministic, linear and non‐linear species response curves. 3. Detailed evaluation of model predictive power showed that approaches produced species models that differed in overall correct classification, specificity (i.e. ability to correctly predict species absence) and sensitivity (i.e. ability to correctly predict species presence) and in terms of which of the study lakes they correctly classified. On average, neural networks outperformed the other modelling approaches, although all approaches predicted species presence/absence with moderate to excellent success. 4. Based on simulated non‐linear data, classification trees and neural networks greatly outperformed traditional approaches, whereas all approaches exhibited similar correct classification rates when modelling simulated linear data. 5. Detailed evaluation of model explanatory insight showed that the relative importance of the habitat variables in the species models varied among the approaches, where habitat variable importance was similar among approaches for some species and very different for others. 6. In general, differences in predictive power (both correct classification rate and identity of the lakes correctly classified) among the approaches corresponded with differences in habitat variable importance, suggesting that non‐linear modelling approaches (i.e. classification trees and neural networks) are better able to capture and model complex, non‐linear patterns found in ecological data. The results from the comparisons using simulated data further support this notion. 7. By employing parallel modelling approaches with the same set of data and focusing on comparing multiple metrics of predictive performance, researchers can begin to choose predictive models that not only provide the greatest predictive power, but also best fit the proposed application.
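The three measures of predictive power named in the summary above (overall correct classification, sensitivity and specificity) all follow from the counts in a 2 × 2 confusion matrix. The sketch below assumes presence is coded 1 and absence 0, and that both classes occur in the observations; the function name and the example vectors are hypothetical.

```python
def classification_metrics(observed, predicted):
    """Correct classification rate, sensitivity and specificity.

    observed, predicted: equal-length sequences of 1 (presence) / 0 (absence).
    Assumes at least one observed presence and one observed absence.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)  # presences predicted present
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)  # presences missed
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)  # absences predicted absent
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)  # false alarms
    return {
        "correct_classification": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical lake-by-lake outcomes for one species, for illustration only.
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(observed, predicted))
```

Reporting sensitivity and specificity separately matters because a model can reach a high overall rate for a rare species simply by predicting absence everywhere.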

15.
Aim Elucidating the environmental limits of coral reefs is central to projecting future impacts of climate change on these ecosystems and their global distribution. Recent developments in species distribution modelling (SDM) and the availability of comprehensive global environmental datasets have provided an opportunity to reassess the environmental factors that control the distribution of coral reefs at the global scale as well as to compare the performance of different SDM techniques. Location Shallow waters world‐wide. Methods The SDM methods used were maximum entropy (Maxent) and two presence/absence methods: classification and regression trees (CART) and boosted regression trees (BRT). The predictive variables considered included sea surface temperature (SST), salinity, aragonite saturation state (ΩArag), nutrients, irradiance, water transparency, dust, current speed and intensity of cyclone activity. For many variables both mean and SD were considered, and at weekly, monthly and annually averaged time‐scales. All were transformed to a global 1° × 1° grid to generate coral reef probability maps for comparison with known locations. Model performance was compared in terms of receiver operating characteristic (ROC) curves and area under the curve (AUC) scores. Potential geographical bias was explored via misclassification maps of false positive and negative errors on test data. Results Boosted regression trees consistently outperformed other methods, although Maxent also performed acceptably. The dominant environmental predictors were the temperature variables (annual mean SST, and monthly and weekly minimum SST), followed by, and with their relative importance differing between regions, nutrients, light availability and ΩArag. No systematic bias in SDM performance was found between major coral provinces, but false negatives were more likely for cells containing ‘marginal’ non‐reef‐forming coral communities, e.g. Bermuda. 
Main conclusions Agreement between BRT and Maxent models gives predictive confidence for exploring the environmental limits of coral reef ecosystems at a spatial scale relevant to global climate models (c. 1° × 1°). Although SST‐related variables dominate the coral reef distribution models, contributions from nutrients, ΩArag and light availability were critical in developing models of reef presence in regions such as the Bahamas, South Pacific and Coral Triangle. The steep response in SST‐driven probabilities at low temperatures indicates that latitudinal expansion of coral reef habitat is very sensitive to global warming.

16.
Species distribution modelling (SDM) is a widely used tool and has many applications in ecology and conservation biology. Spatial autocorrelation (SAC), a pattern in which observations are related to one another by their geographic distance, is common in georeferenced ecological data. SAC in the residuals of SDMs violates the ‘independent errors’ assumption required to justify the use of statistical models in modelling species’ distributions. The autologistic modelling approach accounts for SAC by including an additional term (the autocovariate) representing the similarity between the value of the response variable at a location and neighbouring locations. However, autologistic models have been found to introduce bias in the estimation of parameters describing the influence of explanatory variables on habitat occupancy. To address this problem we developed an extension to the autologistic approach by calculating the autocovariate on SAC in residuals (the RAC approach). Performance of the new approach was tested on simulated data with a known spatial structure and on strongly autocorrelated mangrove species’ distribution data collected in northern Australia. The RAC approach was implemented as generalized linear models (GLMs) and boosted regression tree (BRT) models. We found that the BRT models with only environmental explanatory variables can account for some SAC, but applying the standard autologistic or RAC approaches further reduced SAC in model residuals and substantially improved model predictive performance. The RAC approach showed stronger inferential performance than the standard autologistic approach, as parameter estimates were more accurate and statistically significant variables were accurately identified. The new RAC approach presented here has the potential to account for spatial autocorrelation while maintaining strong predictive and inferential performance, and can be implemented across a range of modelling approaches.
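The core step of the RAC approach described above, an autocovariate calculated from the residuals of an environment-only model, can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the neighbourhood is defined here by a hypothetical Euclidean radius, the autocovariate is the unweighted mean of neighbouring residuals, and sites without neighbours receive 0.

```python
import math


def residual_autocovariate(coords, residuals, radius):
    """Mean residual of neighbouring sites, for each site.

    coords: list of (x, y) site locations.
    residuals: residuals of a model fitted with environmental predictors only.
    radius: neighbourhood distance (an assumed, user-chosen value).
    """
    autocovariate = []
    for i, (xi, yi) in enumerate(coords):
        neighbours = [
            residuals[j]
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        # Isolated sites get 0, i.e. no spatial correction.
        autocovariate.append(sum(neighbours) / len(neighbours) if neighbours else 0.0)
    return autocovariate


# Hypothetical site coordinates and residuals, for illustration only.
coords = [(0, 0), (1, 0), (2, 0), (10, 10)]
residuals = [0.5, -0.5, 1.0, 2.0]
rac = residual_autocovariate(coords, residuals, radius=1.5)
print(rac)
```

In the workflow the abstract outlines, the resulting values would then be added as one more explanatory variable when the GLM or BRT model is refitted.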

17.
Linear discriminant analysis (LDA) is frequently used for classification/prediction problems in physical anthropology, but it is unusual to find examples where researchers consider the statistical limitations and assumptions required for this technique. In these instances, it is difficult to know whether the predictions are reliable. This paper considers a nonparametric alternative to predictive LDA: binary, recursive (or classification) trees. This approach has the advantage that data transformation is unnecessary, cases with missing predictor variables do not require special treatment, prediction success is not dependent on data meeting normality conditions or covariance homogeneity, and variable selection is intrinsic to the methodology. Here I compare the efficacy of classification trees with LDA, using typical morphometric data. With data from modern hominoids, the results show that both techniques perform nearly equally. With complete data sets, LDA may be a better choice, as is shown in this example, but with missing observations, classification trees perform outstandingly well, whereas commercial discriminant analysis programs do not predict classifications for cases with incompletely measured predictor variables and generally are not designed to address the problem of missing data. Testing of data prior to analysis is necessary, and classification trees are recommended either as a replacement for LDA or as a supplement whenever data do not meet relevant assumptions. It is highly recommended as an alternative to LDA whenever the data set contains important cases with missing predictor variables.

18.
Aim The oceans harbour a great diversity of organisms whose distribution and ecological preferences are often poorly understood. Species distribution modelling (SDM) could improve our knowledge and inform marine ecosystem management and conservation. Although marine environmental data are available from various sources, there are currently no user‐friendly, high‐resolution global datasets designed for SDM applications. This study aims to fill this gap by assembling a comprehensive, uniform, high‐resolution and readily usable package of global environmental rasters. Location Global, marine. Methods We compiled global coverage data, e.g. satellite‐based and in situ measured data, representing various aspects of the marine environment relevant for species distributions. Rasters were assembled at a resolution of 5 arcmin (c. 9.2 km) and a uniform landmask was applied. The utility of the dataset was evaluated by maximum entropy SDM of the invasive seaweed Codium fragile ssp. fragile. Results We present Bio‐ORACLE (ocean rasters for analysis of climate and environment), a global dataset consisting of 23 geophysical, biotic and climate rasters. This user‐friendly data package for marine species distribution modelling is available for download at http://www.bio‐oracle.ugent.be . The high predictive power of the distribution model of C. fragile ssp. fragile clearly illustrates the potential of the data package for SDM of shallow‐water marine organisms. Main conclusions The availability of this global environmental data package has the potential to stimulate marine SDM. The high predictive success of the presence‐only model of a notorious invasive seaweed shows that the information contained in Bio‐ORACLE can be informative about marine distributions and permits building highly accurate species distribution models.

19.
Model complexity in ecological niche modelling has recently been considered an important issue that might affect model performance. New methodological developments have implemented the Akaike information criterion (AIC) to capture model complexity in the Maxent algorithm. AIC is calculated based on the number of parameters and likelihoods of continuous raw outputs. The ENMeval R package allows users to perform a species-specific tuning of Maxent settings by running models with different combinations of regularization multiplier and feature classes; all these models are then compared using AIC corrected for small sample size. This approach aims to find the “best” model parametrization and it is thought to maximize the model complexity and therefore, its predictability. We found that most niche modelling studies examined by us (68%) tend to consider AIC as a criterion of predictive accuracy in geographical distribution. In other words, AIC is used as a criterion to choose those models with the highest capacity to discriminate between presences and absences. However, the link between AIC and geographical predictive accuracy has not been tested so far. Here, we evaluated this relationship using a set of simulated (virtual) species. We created a set of nine virtual species with different ecological and geographical traits (e.g., niche position, niche breadth, range size) and generated different sets of true presence and absence data across geography. We built a set of models using the Maxent algorithm with different regularization values and feature schemes and calculated AIC values for each model. For each model, we obtained binary predictions using different threshold criteria and validated them using independent presence and absence data. We correlated AIC values against standard validation metrics (e.g., Kappa, TSS) and the number of pixels correctly predicted as presences and absences.
We did not find a correlation between AIC values and predictive accuracy from validation metrics. In general, those models with the lowest AIC values tend to generate geographical predictions with high commission and omission errors. The results were consistent across all species simulated. Finally, we suggest that AIC should not be used if users are interested in prediction more than explanation in ecological niche modelling.
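The "AIC corrected for small sample size" mentioned in the abstract above is usually written as AICc = 2k - 2 ln L + 2k(k + 1) / (n - k - 1), where k is the number of parameters, ln L the log-likelihood and n the sample size. The sketch below applies that standard formula to two hypothetical candidate models; the function name and every number are assumptions for illustration, and how k and ln L are obtained from a fitted Maxent model is not shown.

```python
def aicc(log_likelihood, k, n):
    """Akaike information criterion corrected for small sample size.

    log_likelihood: natural-log likelihood of the fitted model.
    k: number of estimated parameters.
    n: number of observations; requires n > k + 1.
    """
    if n <= k + 1:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: (label, log-likelihood, number of parameters).
candidates = [("simple", -104.0, 3), ("complex", -100.0, 12)]
n_occurrences = 50  # assumed sample size
scores = {name: aicc(ll, k, n_occurrences) for name, ll, k in candidates}
best = min(scores, key=scores.get)
print(scores, best)
```

A lower AICc only indicates a better trade-off between fit and complexity on the training data, which is the distinction from predictive accuracy that the abstract draws.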

20.
Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences.
Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and caution should be taken when applying filter FS methods in selecting predictive models.
