Similar articles
Found 20 similar articles (search time: 187 ms)
1.
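Modeling the relationship between temperature and development rate is an important topic in entomological research, and the traditional nonlinear parametric models based on empirical risk minimization (the Logan, Lactin and Wang models) have many drawbacks. This paper uses an improved support vector regression (SVR), based on structural risk minimization, to study the relationship between temperature and pupal development duration in the cotton bollworm Helicoverpa armigera. The SVR model clearly outperformed the traditional nonlinear models: on all 92 samples, the coefficients of determination (R²) for SVR fitting and leave-one-out prediction were 0.998 and 0.996, respectively, and the estimated three cardinal temperatures of the pupal stage were more credible. Subsets of samples evenly spaced by temperature were then selected from the full set for independent prediction. With a training set of 20 samples, the R² of SVR independent prediction was 0.981, better than that of the Lactin model (R² = 0.958), which was the best of the traditional nonlinear models in independent prediction. When the training set was further reduced to 12 samples, the R² of the SVR model fell only to 0.964, whereas none of the traditional nonlinear models remained applicable. The results suggest that SVR has a clear advantage over traditional nonlinear models for small samples and is promising for modeling insect development duration.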
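The leave-one-out evaluation mentioned in this entry can be illustrated with a small sketch. It is not the authors' SVR: an ordinary least-squares line stands in for the model so that the example needs only the standard library, and R² is taken as 1 - PRESS/SST, which is one common definition and may differ from the one used in the paper. The function names and the temperature/rate values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_r2(xs, ys):
    """Leave-one-out R^2, computed as 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit on every sample except the i-th, then predict the held-out one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        press += (ys[i] - (a + b * xs[i])) ** 2
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / sst


# Hypothetical data (not from the paper): temperature in deg C and
# development rate in 1/day, roughly linear over this range.
temps = [16.0, 19.0, 22.0, 25.0, 28.0, 31.0]
rates = [0.031, 0.046, 0.059, 0.076, 0.089, 0.105]
print(round(loo_r2(temps, rates), 3))
```

Swapping `fit_line` for any other regressor leaves the held-out loop unchanged, which is why the same procedure can be used to compare models of very different kinds.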

2.
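Application of a combined ARIMA and SVM model to pest forecasting   Total citations: 2 (self-citations: 0; citations by others: 2)
Xiang Changsheng (向昌盛), Zhou Ziying (周子英). Acta Entomologica Sinica (《昆虫学报》), 2010, 53(9): 1055-1060
Pest occurrence is a complex, dynamic time series. Single forecasting models assume either linear or nonlinear data, cannot capture the linear and nonlinear patterns of pest occurrence at the same time, and therefore rarely reach satisfactory accuracy. This study first models the linear component of the pest occurrence time series with an autoregressive integrated moving average (ARIMA) model, then models the nonlinear component with a support vector machine (SVM), and finally combines the two into a single forecast. The combined model was applied to forecasting the occurrence area of the pine caterpillar Dendrolimus punctatus. The experiments show that its prediction accuracy is clearly better than that of either single model, because it exploits the strengths of both. The combined model is a practical and feasible method for pest forecasting.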

3.
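Objective: To explore the relationship between the toxicity of phenolic compounds and their molecular structure parameters using the quantitative structure-activity relationship (QSAR) approach. Methods: A QSAR study of phenolic compounds and their derivatives was carried out with support vector regression (SVR), choosing the optimal kernel function by the minimum mean squared error criterion. Results: The optimal kernel function differed between datasets. For small-sample and nonlinear problems, SVR showed good stability and predictive ability, and in the QSAR study of phenolic compounds and their derivatives it gave better independent predictions than the methods of the original references. Conclusion: The SVR model has good predictive ability and can be applied more widely in QSAR and related research.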

4.
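Shen Zehao (沈泽昊), Zhao Jun (赵俊). Acta Ecologica Sinica (《生态学报》), 2007, 27(3): 953-963
Combining regression analysis of community-habitat factors based on sample survey data with GIS-supported prediction of the spatial patterns of plant attributes is a new international approach to the quantitative study of vegetation-environment relationships. The nonparametric nature of the generalized additive model (GAM) makes it broadly adaptable to different data types and therefore an effective tool for this "regression analysis + spatial prediction" approach; environmental spatial datasets that depend to varying degrees on a digital elevation model are a prerequisite for spatial prediction. This paper introduces the new approach and applies it to predicting and analyzing the spatial patterns of plant diversity indices in a case study area. Topographic indices and plant diversity indices from a set of field survey plots (plot species richness and the richness of tree, shrub, herb, evergreen woody and rare species) served as predictor and response variables, respectively, for building GAMs. Using a 10 m resolution digital elevation model of the study area, the spatial pattern of plant species richness was predicted, and the prediction models and results were statistically analyzed and tested. The results show that: (1) different diversity indices have different model structures and simulation performance, and the stability of repeated simulations also differs, reflecting differences in how they are affected by topographic factors; (2) the topographic variables affecting the spatial patterns of the diversity indices are mainly small-scale features such as slope position and slope gradient, while the influence of large-scale elevation is not significant; (3) correlation analysis between simulated results and independent test data shows that the simulations for tree, herb and rare species were all valid, those for evergreen species and total plot species number were partly valid, and the prediction of shrub species richness essentially failed; (4) the validity and comprehensiveness of the predictor variables determine the model's ability to explain the data, and sample size also significantly affects model stability and reliability. The representativeness of topographic factors for habitat conditions, the sources of simulation error, and the strengths, weaknesses and application prospects of GAMs are discussed.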

5.
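Using multidimensional time series analysis, a multidimensional time series model of diameter at breast height (DBH) growth was built for a Tsuga longibracteata population: $Y_t = 1.7870785\,Y_{t-1} - 0.8950167\,Y_{t-2} + 0.4509997\,U_t - 0.8036035\,U_{t-1} + 0.3950577\,U_{t-2}$, where $Y_t$, $Y_{t-1}$ and $Y_{t-2}$ are the DBH values (cm) of Tsuga longibracteata in year t, year t-5 and year t-10, respectively, and $U_t$, $U_{t-1}$ and $U_{t-2}$ are the individual ages (a) in year t, year t-5 and year t-10, respectively. The correlation coefficient of the model is 0.9998. The method of determining individual age from tree rings is combined with the multidimensional time series model to establish …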
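As a worked illustration of the model quoted in this entry, the sketch below evaluates one step of the recurrence with the published coefficients. The function name, argument names and input values are hypothetical and are not taken from the paper; each lag is one 5-year step, as the entry states.

```python
def predict_dbh(y_prev, y_prev2, u_now, u_prev, u_prev2):
    """One-step DBH prediction (cm) from the recurrence quoted in the entry.

    y_prev, y_prev2: DBH (cm) 5 and 10 years before the target year.
    u_now, u_prev, u_prev2: individual age (years) in the target year and
    5 and 10 years before it.
    """
    return (1.7870785 * y_prev - 0.8950167 * y_prev2
            + 0.4509997 * u_now - 0.8036035 * u_prev + 0.3950577 * u_prev2)


# Hypothetical tree: DBH 8 cm at age 20 and 10 cm at age 25; predict DBH at age 30.
print(round(predict_dbh(10.0, 8.0, 30.0, 25.0, 20.0), 4))
```

With these illustrative inputs the five terms are 17.870785, -7.1601336, 13.529991, -20.0900875 and 7.901154, which sum to about 12.05 cm.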

6.
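Time series model of individual age and diameter at breast height in a Castanopsis kawakamii population   Total citations: 25 (self-citations: 3; citations by others: 22)
This paper proposes a time series method for predicting individual age in a Castanopsis kawakamii population that combines age determination from tree rings with age determination from diameter at breast height (DBH) and tree height. Time series analysis identified an ARIMA(1,2,0) model for the relationship between individual age and DBH. On testing, the similarity coefficient of the model was 93.48%, so its predictions of DBH growth are reliable. In addition, combining the ARIMA(1,2,0) predictions with field survey data gave a combined model that reflects individual age more accurately: $A = 10.15451 + 1.113851D + 0.04220049D^{2} - 0.000227303D^{3}$, where D is the individual DBH and A is the individual age; the correlation index is 0.9998. The regression relationship between DBH and age in the combined model is thus highly significant and the fit is satisfactory, providing a fairly reliable method for related studies.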
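The combined age model in this entry is a cubic polynomial in DBH, so it can be evaluated directly. The sketch below uses the published coefficients with Horner's rule. The function name is hypothetical, and the units (DBH in cm, age in years) are an assumption, since this entry does not state them.

```python
# Coefficients of A = c0 + c1*D + c2*D^2 + c3*D^3, as quoted in the entry.
COEFFS = (10.15451, 1.113851, 0.04220049, -0.000227303)


def estimate_age(dbh):
    """Estimate individual age from DBH with the cubic model (Horner's rule).

    Units are assumed to be cm for DBH and years for age.
    """
    age = 0.0
    for c in reversed(COEFFS):
        age = age * dbh + c
    return age


print(round(estimate_age(10.0), 3))
```

For D = 10 the terms are 10.15451 + 11.13851 + 4.220049 - 0.227303, giving an estimated age of about 25.29. The model is only meaningful within the DBH range of the surveyed trees, which this entry does not report.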

7.
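1 Statement of the problem. In recent years, regression models (linear or nonlinear) have been applied very widely and with good results. However, after fitting a linear or nonlinear regression model, some authors also apply the chi-square goodness-of-fit test to check how well the model's theoretical values agree with the observed values [1~3]. The author considers this a misuse of the chi-square test. To explain and analyze the problem further, the chi-square goodness-of-fit method is described below. 2 The chi-square goodness-of-fit test: method and steps [4]. Let the distribution function of the population X be F(x), which is unknown, and let $X_1, X_2, \ldots, X_n$ be a sample from it. Our aim is to test whether F(x) agrees with a pre-specified distribution function…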
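For orientation, the Pearson chi-square goodness-of-fit statistic is the sum over categories of (observed - expected)² / expected, computed on counts. The sketch below shows only that statistic; the function name and the counts are hypothetical, and the paper's own worked steps are truncated in this listing.

```python
def chi_square_statistic(observed, expected):
    """Pearson goodness-of-fit statistic: sum((O - E)^2 / E) over categories.

    Both arguments are category counts; expected counts must be positive.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 60 individuals in three categories that are
# expected to be equally frequent under the null hypothesis.
observed = [18, 22, 20]
expected = [20, 20, 20]
print(chi_square_statistic(observed, expected))
```

The statistic is defined for frequencies falling into categories. Feeding it continuous fitted and observed values from a regression is the kind of use the entry argues against.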

8.
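Multidimensional time series analysis of tree growth   Total citations: 15 (self-citations: 0; citations by others: 15)
Using multidimensional time series analysis, with the five dominant meteorological factors affecting the diameter growth of Chinese fir as control factors, a CAR model of Chinese fir diameter growth was built to forecast diameter growth one year ahead. Back-testing shows that the model is highly accurate, providing a new method for forecasting tree growth.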

9.
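Application of multiple fuzzy regression to pest forecasting   Total citations: 8 (self-citations: 2; citations by others: 6)
This paper introduces a pest forecasting method based on multiple fuzzy regression (Analysis for Multiple Fuzzy Regression). A set of fuzzy membership functions was established for the daily moth count at the second peak of the fifth (third) generation of the rice leaf roller in the Huizhou area of Anhui Province. Back-substitution against historical data and trial forecasts on independent samples gave satisfactory results, with a fit rate above 90%. The graded forecasts are more clear-cut than those obtained by multiple regression.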

10.
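Forecasting models for the peak-period larval occurrence of the Masson pine caterpillar   Total citations: 3 (self-citations: 0; citations by others: 3)
[Objective] To improve the accuracy of forecasts of the occurrence of the Masson pine caterpillar Dendrolimus punctatus Walker and to provide a basis for choosing a suitable forecasting model. [Methods] Stationary time series analysis, regression, BP neural networks, Markov chains, and contingency-table multi-factor multi-level analysis were used to build forecasting models for the peak-period occurrence of first- and second-generation larvae in Qianshan County, Anhui Province, over the 33 years from 1983 to 2016, and the five types of model were compared. [Results] Predictions from the regression model with egg count at the egg peak as the independent variable, the multiple regression model and the stepwise regression model differed from the actual values by 0.21~0.31 larvae per tree; predictions from the other eight univariate regressions differed by 1.06~1.58 larvae per tree. The stationary time series predictions for 2015 and 2016 matched the actual values exactly. With an error criterion of 1 larva per tree, the BP neural network forecast accuracy for 1983-2014 was 90.32% for the first generation and 100% for the second. The Markov chain predictions for 2015 and 2016 matched the actual values exactly, both at grade 1. The contingency-table multi-factor multi-level comprehensive correlation analysis also matched the actual values exactly for 2015 and 2016; its historical coincidence rate for 1983-2014 was 90.32% for the first-generation larval peak and 83.47% for the second generation. To study how the grading standard affects the predictions, the grade 1 threshold for the second-generation larval peak was changed to fewer than 3.5 larvae per tree, which gave a historical coincidence rate of 74.19%. [Conclusion] Of these methods, regression depends critically on the choice of independent variables; the stationary time series method is suitable when the pest occurrence process meets the criteria for a stationary time series; for the Markov chain and contingency-table methods, whether the grading standard is sound directly affects forecast accuracy; and the BP neural network can handle nonlinear relationships between the independent variables and the forecast quantity, making it a fairly ideal forecasting method.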

11.
The method of non-linear forecasting of time series was applied to different simulated signals and to EEG in order to check its ability to distinguish chaotic from noisy time series. The goodness of prediction was estimated, in terms of the correlation coefficient between the forecasted and the real time series, for non-linear and autoregressive (AR) methods. For the EEG signal both methods gave similar results. It seems that the EEG signal, in spite of its chaotic character, is well described by the AR model.

12.
AIM: We investigated the use of non-linear, multidimensional factor analysis for the study of observational data on death from breast cancer. These data were obtained in the context of a clinical practice and not in a clinical trial. We looked into the correlations between patient characteristics and time of death and/or disease-free interval. PATIENTS AND METHODS: We first analyzed the characteristics of a population of patients that had died from breast cancer (n = 295), then of a population including patients still alive 7 years after surgery (n = 344). We used correspondence analysis (CA), which is based on chi-square (χ²) metrics, does not assume linear relationships, and provides graphic overviews. RESULTS: The CA mapped variables (clinical stage, histoprognostic grade, node status, receptor positivity) in a way that fits in well with available knowledge on their importance as prognostic factors. We observed, however, that death occurred during three main periods (1-3, 4-7, ≥8 years after surgery) defined by different mixes of variables, as if the disease progressed by stage rather than continuously. The CA distinguished long-term survivors (>7 years) from patients who died 8-10 years after surgery. Long-term survivors tended to be node-negative; those who died at 8-10 years tended to be the youngest patients (under 40). CONCLUSIONS: Because correspondence analysis combines the advantages of multidimensional and non-linear methods, it is a valuable exploratory tool for describing multiple correlations within a population before attempting to establish statistical significance of selected variables by more classic methods.

13.
The increasing availability of time series expression datasets, although promising, raises a number of new computational challenges. Accordingly, the development of suitable classification methods to make reliable and sound predictions is becoming a pressing issue. We propose, here, a new method to classify time series gene expression via integration of biological networks. We evaluated our approach on two different datasets and showed that the use of a hidden Markov model/Gaussian mixture model hybrid explores the time-dependence of the expression data, thereby leading to better prediction results. We demonstrated that the biclustering procedure identifies function-related genes as a whole, giving rise to high accordance in prognosis prediction across independent time series datasets. In addition, we showed that integration of biological networks into our method significantly improves prediction performance. Moreover, we compared our approach with several state-of-the-art algorithms and found that our method outperformed previous approaches with regard to various criteria. Finally, our approach achieved better prediction results on early-stage data, implying the potential of our method for practical prediction.

14.
15.
We apply geostatistical modeling techniques to investigate spatial patterns of species richness. Unlike most other statistical modeling techniques that are valid only when observations are independent, geostatistical methods are designed for applications involving spatially dependent observations. When spatial dependencies, which are sometimes called autocorrelations, exist, geostatistical techniques can be applied to produce optimal predictions in areas (typically proximate to observed data) where no observed data exist. Using tiger beetle species (Cicindelidae) data collected in western North America, we investigate the characteristics of spatial relationships in species numbers data. First, we compare the accuracy of spatial predictions of species richness when data from grid squares of two different sizes (scales) are used to form the predictions. Next we examine how prediction accuracy varies as a function of areal extent of the region under investigation. Then we explore the relationship between the number of observations used to build spatial prediction models and prediction accuracy. Our results indicate that, within the taxon of tiger beetles and for the two scales we investigate, the accuracy of spatial predictions is unrelated to scale and that prediction accuracy is not obviously related to the areal extent of the region under investigation. We also provide information about the relationship between sample size and prediction accuracy, and, finally, we show that prediction accuracy may be substantially diminished if spatial correlations in the data are ignored.

16.
Support vector machines are a popular machine learning method for many classification tasks in biology and chemistry. In addition, the support vector regression (SVR) variant is widely used for numerical property predictions. In chemoinformatics and pharmaceutical research, SVR has become probably the most popular approach for modeling non-linear structure-activity relationships (SARs) and predicting compound potency values. Herein, we have systematically generated and analyzed SVR prediction models for a variety of compound data sets with different SAR characteristics. Although these SVR models were accurate on the basis of global prediction statistics and not prone to overfitting, they were found to consistently mispredict highly potent compounds. Hence, in regions of local SAR discontinuity, SVR prediction models displayed clear limitations. Compared to observed activity landscapes of compound data sets, landscapes generated on the basis of SVR potency predictions were partly flattened and activity cliff information was lost. Taken together, these findings have implications for practical SVR applications. In particular, prospective SVR-based potency predictions should be considered with caution because artificially low predictions are very likely for highly potent candidate compounds, the most important prediction targets.

17.
Quantitative predictions in computational life sciences are often based on regression models. The advent of machine learning has led to highly accurate regression models that have gained widespread acceptance. While there are statistical methods available to estimate the global performance of regression models on a test or training dataset, it is often not clear how well this performance transfers to other datasets or how reliable an individual prediction is, a fact that often reduces a user's trust in a computational method. In analogy to the concept of an experimental error, we sketch how estimators for individual prediction errors can be used to provide confidence intervals for individual predictions. Two novel statistical methods, named CONFINE and CONFIVE, can estimate the reliability of an individual prediction based on the local properties of nearby training data. The methods can be applied equally to linear and non-linear regression methods with very little computational overhead. We compare our confidence estimators with other existing confidence and applicability domain estimators on two biologically relevant problems (MHC-peptide binding prediction and quantitative structure-activity relationship (QSAR)). Our results suggest that the proposed confidence estimators perform comparably to or better than previously proposed estimation methods. Given a sufficient amount of training data, the estimators exhibit error estimates of high quality. In addition, we observed that the quality of estimated confidence intervals is predictable. We discuss how confidence estimation is influenced by noise, the number of features, and the dataset size. Estimating the confidence in individual predictions in terms of error intervals represents an important step from plain, non-informative predictions towards transparent and interpretable predictions that will help to improve the acceptance of computational methods in the biological community.

18.
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptides likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.

19.
Investigation of the dynamics underlying periodic complexes in the EEG   Total citations: 4 (self-citations: 0; citations by others: 4)
Periodic complexes (PC), occurring lateralised or diffuse, are relatively rare EEG phenomena which reflect acute severe brain disease. The pathophysiology is still incompletely understood. One hypothesis suggested by the alpha rhythm model of Lopes da Silva is that periodic complexes reflect limit cycle dynamics of cortical networks caused by excessive excitatory feedback. We examined this hypothesis by applying a recently developed technique to EEGs displaying periodic complexes and to periodic complexes generated by the model. The technique, non-linear cross prediction, characterises how well a time series can be predicted, and how much amplitude and time asymmetry is present. Amplitude and time asymmetry are indications of non-linearity. In accordance with the model, most EEG channels with PC showed clear evidence of amplitude and time asymmetry, pointing to non-linear dynamics. However, the non-linear predictability of true PC was substantially lower than that of PC generated by the model. Furthermore, no finite value for the correlation dimension could be obtained for the real EEG data, whereas the model time series had a dimension slightly higher than one, consistent with a limit cycle attractor. Thus we can conclude that PC reflect non-linear dynamics, but a limit cycle attractor is too simple an explanation. The possibility of more complex (high dimensional and spatio-temporal) non-linear dynamics should be investigated. Received: 26 February 1998 / Accepted in revised form: 24 August 1998

20.
Malaria is one of the most severe problems faced by the world even today. Understanding the causative factors, such as age, sex, social factors and environmental variability, as well as the underlying transmission dynamics of the disease is important for epidemiological research on malaria and its eradication. Thus, development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter, one-year time series (deaths due to Plasmodium vivax); and spatial data (zonal distribution of P. vivax deaths) for the city, along with the climatic factors, population and previous incidence of the disease. We performed variable selection by simple correlation study, identified the initial relationship between variables through non-linear curve fitting, used multi-step methods for induction of variables in the non-linear regression analysis, and applied Gauss-Markov models and ANOVA to test the predictions and their validity and to construct the confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of forecasting, and show high prediction power for both SPR and P. vivax deaths, where the one-lag SPR values play an influential role and prove useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at the zonal level and the effect of causative factors on different zonal clusters indicate the pattern of malaria prevalence in the city. The study also demonstrates that, with excellent models of climatic forecasts readily available, this method can predict disease incidence at long forecasting horizons with a high degree of efficiency, and that a useful early warning system for disease prevention and control activities can be developed on this basis at the regional or national level.
