Similar Literature
20 similar articles were retrieved.
1.
This paper first presents a new prediction error estimation method, termed the blocked 3×2 cross-validation estimate. Simulation experiments with several classifiers on artificial data and real cancer biology data show that, in the mean-squared-error sense, the proposed method outperforms the 2-fold and random 5×2 cross-validation estimates commonly used in the literature; compared with 5-fold, 10-fold, and bootstrap cross-validation it also has smaller mean squared error in most cases, and it is computationally cheaper than 10-fold and random 5×2 cross-validation.
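As a rough illustration of the idea behind an m×2 cross-validation estimate of prediction error, the sketch below averages the test error over three independent 2-fold splits (3×2). The blocked construction of the proposed estimator is not reproduced here, and the dataset and classifier (scikit-learn's breast cancer data and logistic regression) are illustrative assumptions rather than the experimental setup of the paper.

    # Plain 3x2 cross-validation estimate of classification error:
    # 3 replications of a 2-fold split, averaged over the 6 half-splits.
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    X, y = load_breast_cancer(return_X_y=True)   # illustrative dataset, not the paper's
    fold_errors = []
    for rep in range(3):                                  # 3 replications ...
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
        for train_idx, test_idx in cv.split(X, y):        # ... of 2-fold CV
            clf = LogisticRegression(max_iter=5000).fit(X[train_idx], y[train_idx])
            fold_errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

    print("3x2 CV error estimate: %.4f" % np.mean(fold_errors))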

2.
Based on the model V = aD^b, a simulation study in Matlab first examined how measurement error affects parameter estimation. The results show that when the error in V is fixed and the error in D keeps increasing, ordinary least-squares estimation yields estimates of a that grow and estimates of b that shrink, so the estimates move farther and farther from the true values as the measurement error in D increases. Parameter estimation methods that remove the effect of measurement error were then studied: regression calibration, simulation extrapolation (SIMEX), and the measurement-error-model approach were each applied to data in which both V and D carry measurement error. All three methods yield unbiased parameter estimates and eliminate the systematic bias introduced by ordinary least squares, and the measurement-error-model approach further outperforms regression calibration and SIMEX.
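A minimal Python analogue of the simulation described above, under the assumption of synthetic data and arbitrary true parameter values: it fits V = a*D^b by nonlinear least squares while the measurement error in D grows, to show the kind of drift in the estimates that the abstract reports (a upward, b downward).

    # Effect of measurement error in D on ordinary nonlinear least squares.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(0)
    a_true, b_true = 0.0001, 2.5                           # arbitrary "true" values
    D = rng.uniform(10, 50, size=500)                      # error-free diameters
    V = a_true * D**b_true + rng.normal(0, 0.01, size=D.size)   # fixed error in V

    def model(D, a, b):
        return a * D**b

    for sd in [0.0, 1.0, 2.0, 3.0]:                        # growing error in D
        D_obs = np.clip(D + rng.normal(0, sd, size=D.size), 0.1, None)  # observed D
        (a_hat, b_hat), _ = curve_fit(model, D_obs, V, p0=[1e-4, 2.0], maxfev=10000)
        print("sd(D)=%.1f  a_hat=%.6f  b_hat=%.3f" % (sd, a_hat, b_hat))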

3.
Parameter estimation and an algorithm for a multivariate nonlinear measurement error model
This paper discusses an estimation method for the parameters of a multivariate functional-relationship measurement error model and the implementation of the corresponding algorithm.

4.
Compiling compatible growth-process tables and volume tables with the measurement-error-model approach
The paper explains why growth models and volume models built by conventional methods are incompatible, estimates the parameters of the growth and volume models with a two-stage measurement-error-model approach, and on that basis compiles compatible growth-process tables and volume tables.

5.
For the parameter estimation problem in the 2SUR regression model, several general mean-squared-error matrix comparison results are given. Based on these results, a class of linear estimators and a class of nonlinear two-step estimators built on a generalized unrestricted estimator of the dispersion matrix are proposed, and some finite-sample properties of this two-step estimator class are obtained.

6.
Regression models can be used to predict the aboveground biomass of forest ecosystems, with least-squares regression being the most commonly used. Comparisons between least-squares and Bayesian methods have rarely been studied for predicting the aboveground biomass of shrubs, especially multi-stemmed shrubs. We developed biomass prediction models for Caragana microphylla Lam., a multi-stemmed shrub widely distributed in the Horqin Sandy Land that plays an important role in reducing wind erosion and stabilizing dunes. Six allometric models characterizing biomass were built, and the one that performed best in predicting biomass was selected according to statistical criteria; its parameters were then estimated with both least-squares and Bayesian methods. During parameter estimation, the bootstrap was used to examine the effect of sample size, and test and training sets were kept separate. Finally, we compared the performance of the least-squares and Bayesian methods in predicting the aboveground biomass of C. microphylla. All six allometric models were statistically significant, and the model with a power exponent of 1 performed best. The results show that Bayesian estimation, with or without prior information, yields a lower mean squared error on the test set than least squares. In addition, basal diameter as a predictor was not significant under either the least-squares or the Bayesian method, indicating that predictor variables should be chosen carefully in biomass prediction models. This study highlights that combining Bayesian methods, the bootstrap, and allometric models can improve the accuracy of biomass prediction models for sandy-land shrubs.
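To make the least-squares versus Bayesian comparison concrete, here is a hedged sketch that fits a generic power-law allometric model on the log scale by OLS and by a simple random-walk Metropolis sampler with a flat prior. The data are synthetic and none of the paper's models, priors, or predictor variables are reproduced; the predictor x merely stands in for a shrub size variable.

    # OLS versus a minimal Bayesian (Metropolis) fit of ln(B) = ln(a) + b*ln(x) + e.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0.2, 4.0, size=80)                     # synthetic size variable
    log_B = np.log(0.8) + 1.2 * np.log(x) + rng.normal(0, 0.3, size=x.size)

    # ---- ordinary least squares on the log scale ----
    X = np.column_stack([np.ones_like(x), np.log(x)])
    beta_ols, *_ = np.linalg.lstsq(X, log_B, rcond=None)
    sigma = (log_B - X @ beta_ols).std(ddof=2)             # residual sd, reused below

    # ---- random-walk Metropolis with flat priors on (ln a, b), sigma fixed ----
    def log_post(theta):
        return -0.5 * np.sum((log_B - X @ theta) ** 2) / sigma**2   # flat prior

    theta, samples = beta_ols.copy(), []
    for it in range(20000):
        prop = theta + rng.normal(0, 0.05, size=2)         # proposal scale is arbitrary
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop
        samples.append(theta.copy())
    post = np.array(samples[5000:])                        # drop burn-in

    print("OLS     : ln a=%.3f  b=%.3f" % tuple(beta_ols))
    print("Bayesian: ln a=%.3f  b=%.3f (posterior means)" % tuple(post.mean(axis=0)))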

7.
Comparative analysis of three forest biomass estimation models
Quantitative estimation of forest biomass provides an important reference for research on global carbon stocks and the carbon cycle. Using TM imagery of the Changbai Mountain area of Heilongjiang and data from 133 category I forest resource inventory sample plots, 71 explanatory variables including geoscience parameters and remote-sensing retrieval parameters were selected to build a multiple stepwise regression model, a conventional BP (back propagation) neural network model, and an improved BP neural network model based on the Gaussian error function (Erf-BP), which were then used to estimate forest biomass in the region and compared. The results show that the multiple stepwise regression model estimated forest biomass with a prediction accuracy of 75% and a root mean square error of 26.87 t·m-2; the conventional BP neural network model achieved a prediction accuracy of 80.92% with a root mean square error of 21.44 t·m-2; and Erf-BP achieved a prediction accuracy of 82.22% with a root mean square error of 20.83 t·m-2. The improved Erf-BP therefore better captures the relationship between biomass and the individual factors and gives higher estimation accuracy.
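The abstract does not give the details of Erf-BP, so the sketch below only illustrates the general idea of using the Gaussian error function erf(x) as the hidden-layer activation of a small back-propagation network. The architecture, toy regression data, and hyperparameters are arbitrary assumptions, not the paper's model.

    # One-hidden-layer network with erf activation, trained by plain backprop.
    import numpy as np
    from scipy.special import erf

    rng = np.random.default_rng(2)
    X = rng.uniform(-2, 2, size=(200, 3))                  # toy predictors
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 200)

    H, lr = 8, 0.01                                        # hidden units, learning rate
    W1 = rng.normal(0, 0.5, size=(3, H)); b1 = np.zeros(H)
    W2 = rng.normal(0, 0.5, size=(H, 1)); b2 = np.zeros(1)

    def d_erf(z):                                          # derivative of erf
        return 2.0 / np.sqrt(np.pi) * np.exp(-z ** 2)

    for epoch in range(2000):
        z1 = X @ W1 + b1                                   # forward pass
        a1 = erf(z1)
        pred = (a1 @ W2 + b2).ravel()
        err = pred - y                                     # gradient of 0.5*MSE w.r.t. pred
        dW2 = a1.T @ err[:, None] / len(y)                 # backward pass
        db2 = err.mean(keepdims=True)
        dz1 = (err[:, None] @ W2.T) * d_erf(z1)
        dW1 = X.T @ dz1 / len(y)
        db1 = dz1.mean(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

    print("training RMSE: %.4f" % np.sqrt(np.mean((pred - y) ** 2)))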

8.
Parameter estimation for terrestrial ecosystem models based on observational data helps improve the models' simulation and prediction ability and reduces simulation uncertainty. In previous parameter estimation studies, the random error of net ecosystem exchange (NEE) data measured with the eddy covariance technique has usually been assumed to follow a zero-mean normal distribution, but recent studies have shown that the random error of NEE data is better described by a double-exponential distribution. To examine how the choice of NEE observation error distribution affects parameter estimation of a process-based terrestrial ecosystem model and the resulting carbon flux simulations, we took the temperate broadleaved Korean pine forest at Changbai Mountain as the study site and used the Markov chain Monte Carlo method with NEE data measured in 2003-2005 to estimate the sensitive parameters of the process-based terrestrial ecosystem model CEVSA2, comparing the parameter estimates and simulated carbon fluxes under the two error distributions (normal and double-exponential). The results show that the annual totals of gross primary productivity and ecosystem respiration simulated under the normal observation error were 61-86 g C m-2 a-1 and 107-116 g C m-2 a-1 higher, respectively, than those simulated under the double-exponential error, so the former simulated an annual NEE total 29-47 g C m-2 a-1 lower than the latter, with clear underestimation especially during the peak growing season. In parameter estimation studies, the distribution type of the observation error and the corresponding choice of objective function must not be ignored; unreasonable settings may have a large impact on parameter estimation and simulation results.
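The practical consequence of choosing a normal versus a double-exponential (Laplace) error model is the objective function that the MCMC estimation works with: squared residuals in the first case, absolute residuals in the second. The toy sketch below uses made-up NEE residuals to show how differently the two negative log-likelihoods weight a large residual; the CEVSA2 model itself is not involved.

    # Negative log-likelihoods under normal vs. double-exponential observation error.
    import numpy as np

    def neg_loglik_normal(resid, sigma):
        n = resid.size
        return 0.5 * np.sum(resid ** 2) / sigma ** 2 + n * np.log(sigma * np.sqrt(2 * np.pi))

    def neg_loglik_laplace(resid, b):
        n = resid.size
        return np.sum(np.abs(resid)) / b + n * np.log(2 * b)

    resid = np.array([-1.8, 0.2, 0.4, -0.1, 6.0, 0.3, -0.5])   # made-up residuals, one outlier
    print("normal  NLL:", neg_loglik_normal(resid, sigma=1.0))
    print("laplace NLL:", neg_loglik_laplace(resid, b=1.0))
    # Under the normal objective the outlier contributes quadratically (18 with sigma=1),
    # pulling parameters toward fitting it; under the Laplace objective its contribution
    # is only linear (6), which is one reason the two error assumptions can lead to
    # different parameter estimates and flux simulations.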

9.
For some complex agro-ecosystems, our understanding of the underlying ecological processes is limited, and these systems involve considerable uncertainty and fuzziness, so their behavior is difficult to simulate with traditional methods. Neural network models have attracted wide interest among ecologists because they can simulate the behavior of such systems fairly accurately. This paper reviews the structure and algorithm of the error back-propagation (BP) neural network model and its applications in agricultural and ecological research. A BP network usually adopts a three-layer structure; a three-layer network can approximate continuous functions of arbitrary complexity, and its compact structure makes it less prone to overfitting the training data. The main feature of the BP algorithm is that the weights are adjusted according to the current output error. In ecological and agricultural research, BP networks are commonly used as nonlinear function approximators to predict crop yield, biological production, and relationships between organisms and their environment. Existing studies show that the simulation accuracy of BP networks is much higher than that of multiple linear equations and comparable to that of nonlinear equations, and that with a sufficient sample size they have some extrapolation ability. However, BP networks need large samples to guarantee reliable parameter estimates, which is difficult to achieve in practice and limits their application. In recent years, techniques such as forced early stopping of training and composite models have been proposed to improve the extrapolation ability of BP networks, and techniques such as the Garson algorithm, sensitivity analysis, and randomization tests have been proposed to interpret their internal mechanism. The real advantage of BP networks lies in simulating the behavior of systems that are poorly understood or highly uncertain and fuzzy, which traditional models cannot do; they are therefore an important complement to traditional process-based models.

10.
Error functions of growth models and their mathematical characteristics
Growth curves are one of the important methods for estimating animal age; in wildlife ecology, body weight is often used as the main index for estimating age. However, deviations often arise when measuring body weight. For example, an animal's daily activities (feeding, drinking, excretion, etc.) cause its weight to fluctuate, so weighing at different times introduces deviations; in addition, weight can only be measured to a given precision. These deviations translate directly into errors in the estimated age. This paper analyzes the mathematical characteristics of the error functions of four common growth models (Logistic, Gompertz, Bertalanffy, and Richards). The results show that the age estimation error caused by daily activities is smallest at young ages, whereas the error caused by weighing precision is smallest at the inflection point of the growth curve.
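A small numeric illustration of the precision-induced error for one of the four models (the logistic curve), under arbitrary example parameters: reading age off the inverse growth curve turns a weighing error dW into an age error of roughly dW / |dW/dt|, which is smallest where the growth rate is largest, i.e. at the inflection point, as the abstract states.

    # Age-estimation error from weighing precision along a logistic growth curve
    # W(t) = A / (1 + exp(-k*(t - t0))).  Parameter values are illustrative only.
    import numpy as np

    A, k, t0 = 10.0, 0.8, 6.0                              # asymptote, rate, inflection age
    dW = 0.05                                              # weighing precision

    def growth_rate(t):                                    # dW/dt of the logistic curve
        e = np.exp(-k * (t - t0))
        return A * k * e / (1 + e) ** 2

    for t in [2.0, 4.0, 6.0, 8.0, 10.0]:
        dt = dW / growth_rate(t)                           # first-order age error
        print("age %4.1f  growth rate %.3f  age error %.4f" % (t, growth_rate(t), dt))
    # The printed age error is minimal at t = 6 (= t0), the inflection point.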

11.
Best linear unbiased prediction is well known for its wide range of applications including small area estimation. While the theory is well established for mixed linear models and under normality of the error and mixing distributions, the literature is sparse for nonlinear mixed models under nonnormality of the error distribution or of the mixing distributions. We develop a resampling-based unified approach for predicting mixed effects under a generalized mixed model set-up. Second-order-accurate nonnegative estimators of mean squared prediction errors are also developed. Given the parametric model, the proposed methodology automatically produces estimators of the small area parameters and their mean squared prediction errors, without requiring explicit analytical expressions for the mean squared prediction errors.

12.
Readily available proxies for the time of disease onset, such as the time of the first diagnostic code, can lead to substantial risk prediction error if analyses are based on poor proxies. Due to the lack of detailed documentation and the labor intensiveness of manual annotation, it is often only feasible to ascertain for a small subset the current status of the disease by a follow-up time rather than the exact time. In this paper, we aim to develop risk prediction models for the onset time that efficiently leverage both a small number of labels on the current status and a large number of unlabeled observations on imperfect proxies. Under a semiparametric transformation model for onset and a highly flexible measurement error model for proxy onset time, we propose a semisupervised risk prediction method that efficiently combines information from proxies and limited labels. From an initial estimator solely based on the labeled subset, we perform a one-step correction with the full data, augmenting against a mean zero rank correlation score derived from the proxies. We establish the consistency and asymptotic normality of the proposed semisupervised estimator and provide a resampling procedure for interval estimation. Simulation studies demonstrate that the proposed estimator performs well in finite samples. We illustrate the proposed estimator by developing a genetic risk prediction model for obesity using data from the Mass General Brigham Healthcare Biobank.

13.
Accurate class probability estimation is important for medical decision making but is challenging, particularly when the number of candidate features exceeds the number of cases. Special methods have been developed for nonprobabilistic classification, but relatively little attention has been given to class probability estimation with numerous candidate variables. In this paper, we investigate overfitting in the development of regularized class probability estimators. We investigate the relation between overfitting and accurate class probability estimation in terms of mean square error. Using simulation studies based on real datasets, we found that some degree of overfitting can be desirable for reducing mean square error. We also introduce a mean square error decomposition for class probability estimation that helps clarify the relationship between overfitting and prediction accuracy.

14.
Lam Tran, Kevin He, Di Wang, Hui Jiang. Biometrics, 2023, 79(2): 1280-1292
The proliferation of biobanks and large public clinical data sets enables their integration with a smaller amount of locally gathered data for the purposes of parameter estimation and model prediction. However, public data sets may be subject to context-dependent confounders and the protocols behind their generation are often opaque; naively integrating all external data sets equally can bias estimates and lead to spurious conclusions. Weighted data integration is a potential solution, but current methods still require subjective specifications of weights and can become computationally intractable. Under the assumption that local data are generated from the set of unknown true parameters, we propose a novel weighted integration method based upon using the external data to minimize the local data leave-one-out cross validation (LOOCV) error. We demonstrate how the optimization of LOOCV errors for linear and Cox proportional hazards models can be rewritten as functions of external data set integration weights. Significant reductions in estimation error and prediction error are shown using simulation studies mimicking the heterogeneity of clinical data as well as a real-world example using kidney transplant patients from the Scientific Registry of Transplant Recipients.
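As a hedged, brute-force illustration of the selection criterion described above (not the paper's closed-form rewriting of the LOOCV error), the sketch below picks an external-data integration weight by directly evaluating the local leave-one-out error of a ridge-type estimator shrunk toward a synthetic external estimate. All data, the penalty form, and the candidate weights are illustrative assumptions.

    # Choose an integration weight w by minimizing local LOOCV error.
    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 40, 3
    beta_true = np.array([1.0, -2.0, 0.5])
    X = rng.normal(size=(n, p))
    y = X @ beta_true + rng.normal(0, 1.0, n)              # small local data set
    beta_ext = np.array([1.2, -1.7, 0.3])                  # estimate from external data

    def fit(Xtr, ytr, w):
        # beta(w) = argmin ||y - Xb||^2 + w * ||b - beta_ext||^2
        A = Xtr.T @ Xtr + w * np.eye(p)
        return np.linalg.solve(A, Xtr.T @ ytr + w * beta_ext)

    def loocv_error(w):
        errs = []
        for i in range(n):                                 # leave each local case out
            keep = np.arange(n) != i
            b = fit(X[keep], y[keep], w)
            errs.append((y[i] - X[i] @ b) ** 2)
        return np.mean(errs)

    weights = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
    scores = [loocv_error(w) for w in weights]
    print("LOOCV errors:", dict(zip(weights, np.round(scores, 3))))
    print("selected integration weight:", weights[int(np.argmin(scores))])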

15.
Clinicians are often interested in the effect of covariates on survival probabilities at prespecified study times. Because different factors can be associated with the risk of short- and long-term failure, a flexible modeling strategy is pursued. Given a set of multiple candidate working models, an objective methodology is proposed that aims to construct consistent and asymptotically normal estimators of regression coefficients and average prediction error for each working model that are free from the nuisance censoring variable. It requires the conditional distribution of censoring given covariates to be modeled. The model selection strategy uses step-up or step-down multiple hypothesis testing procedures that control either the proportion of false positives or the generalized familywise error rate when comparing models based on estimates of average prediction error. The context can actually be cast as a missing data problem, where augmented inverse probability weighted complete case estimators of regression coefficients and prediction error can be used (Tsiatis, 2006, Semiparametric Theory and Missing Data). A simulation study and an interesting analysis of a recent AIDS trial are provided.

16.
Quantification of the uncertainty associated with risk estimates is an important part of risk assessment. In recent years, the use of second-order distributions and two-dimensional simulations has been suggested for quantifying both variability and uncertainty. These approaches are better interpreted within the Bayesian framework. To help practitioners better use such methods and interpret the results, in this article we describe the propagation and interpretation of uncertainty in the Bayesian paradigm. We consider both the estimation problem, where some summary measures of the risk distribution (e.g., mean, variance, or selected percentiles) are to be estimated, and the prediction problem, where the risk values for some specific individuals are to be predicted. We discuss some connections and differences between uncertainties in estimation and prediction problems, and present an interpretation of a decomposition of total variability/uncertainty into variability and uncertainty in terms of the expected squared error of prediction and its reduction from perfect information. We also discuss the role of Monte Carlo methods in characterizing uncertainty. We explain the basic ideas using a simple example, and demonstrate Monte Carlo calculations using another example from the literature.
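A minimal two-dimensional Monte Carlo sketch of the variability/uncertainty separation discussed above, with arbitrary example distributions: the outer loop draws uncertain parameters, the inner loop draws inter-individual variability given those parameters, and the spread of the summaries across outer draws expresses parameter uncertainty.

    # Two-dimensional Monte Carlo: outer loop = uncertainty, inner loop = variability.
    import numpy as np

    rng = np.random.default_rng(4)
    n_outer, n_inner = 500, 2000

    means = np.empty(n_outer)                              # uncertainty about the mean
    p95 = np.empty(n_outer)                                # uncertainty about a percentile
    for j in range(n_outer):
        mu = rng.normal(0.0, 0.2)                          # uncertain log-mean
        sigma = rng.uniform(0.4, 0.6)                      # uncertain log-sd
        exposure = rng.lognormal(mu, sigma, size=n_inner)  # inter-individual variability
        means[j] = exposure.mean()
        p95[j] = np.percentile(exposure, 95)

    print("mean exposure: %.3f (95%% uncertainty interval %.3f-%.3f)"
          % (means.mean(), *np.percentile(means, [2.5, 97.5])))
    print("95th percentile: %.3f (95%% uncertainty interval %.3f-%.3f)"
          % (p95.mean(), *np.percentile(p95, [2.5, 97.5])))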

17.
Large-scale estimation of forest biomass has long been a focus of attention, and building stand-level biomass models is one way to estimate the biomass of the forest tree layer. In this study, stand biomass models for Korean pine plantations were built with four additive-model approaches (aggregation method 1, aggregation method 2, the adjustment method, and the disaggregation method), and the prediction accuracy of the four approaches was compared to provide a scientific basis for biomass prediction of Korean pine plantations in Heilongjiang Province. Weight functions were used in each model to remove heteroscedasticity, and leave-one-out cross-validation (LOOCV) was used to test each model. The results show that the overall predictive ability of the adjustment method is slightly better than that of aggregation method 1, aggregation method 2, and the disaggregation method, with prediction accuracy ranked adjustment method > aggregation method 1 > aggregation method 2 > disaggregation method. When predictive ability is compared within classes of stand basal area, however, the four approaches are not consistent: when the stand basal area of the Korean pine plantation falls within the 0-10 or 50-60 m2·hm-2 range, the parameter estimates of the disaggregation method are recommended, while for other ranges the parameter estimates of the adjustment method are recommended.

18.
The estimation of individual values (marks) in a finite population of units (e.g., trees) scattered onto a survey region is considered under 3P sampling. For each unit, the mark is estimated by means of an inverse distance weighting interpolator. Conditions ensuring the design-based consistency of maps are considered under 3P sampling. A computationally simple mean squared error estimator is adopted. Because 3P sampling involves the prediction of marks for each unit in the population, prediction errors rather than marks can be interpolated. Then, marks are estimated by the predictions plus the interpolated errors. If predictions are good, prediction errors are more smoothed than raw marks so that the procedure is likely to better meet consistency requirements. The purpose of this paper is to provide theoretical and empirical evidence on the effectiveness of the interpolation based on prediction errors to prove that the proposed strategy is a tool of general validity for mapping forest stands.
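A minimal inverse distance weighting interpolator of the kind referred to above, applied to made-up prediction errors at a few sampled locations; the 3P sampling design and the underlying model predictions are not simulated, and the coordinates, values, and power parameter are illustrative assumptions.

    # Inverse distance weighting (IDW): value at a target point is a weighted
    # average of sampled values, with weights proportional to 1/distance**power.
    import numpy as np

    def idw(xy_sampled, values, xy_target, power=2.0, eps=1e-12):
        d = np.linalg.norm(xy_target[:, None, :] - xy_sampled[None, :, :], axis=2)
        w = 1.0 / (d + eps) ** power
        return (w * values).sum(axis=1) / w.sum(axis=1)

    # sampled units: coordinates and, e.g., the prediction errors observed there
    xy_s = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    err_s = np.array([0.4, -0.2, 0.1, 0.3])

    # interpolate the error surface on a small grid (it would then be added to
    # the model predictions to estimate the marks)
    gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    print(idw(xy_s, err_s, grid).reshape(5, 5).round(3))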

19.
Marques TA. Biometrics, 2004, 60(3): 757-763
Line transect sampling is one of the most widely used methods for animal abundance assessment. Standard estimation methods assume certain detection on the transect, no animal movement, and no measurement errors. Failure of these assumptions can cause substantial bias. In this work, the effect of measurement error on line transect estimators is investigated. Based on considerations of the process generating the errors, a multiplicative error model is presented, and a simple way of correcting estimates based on knowledge of the error distribution is proposed. Using beta models for the error distribution, the effect of errors and of the proposed correction is assessed by simulation. Adequate confidence intervals for the corrected estimates are obtained using a bootstrap variance estimate for the correction and the delta method. As noted by Chen (1998, Biometrics 54, 899-908), even unbiased estimators of the distances might lead to biased density estimators, depending on the actual error distribution. In contrast with the findings of Chen, who used an additive model, unbiased estimation of distances under a multiplicative model leads to overestimation of density. Some error distributions result in observed distance distributions that make efficient estimation impossible by removing the shoulder present in the original detection function. This indicates the need to improve field methods to reduce measurement error. An application of the new methods to a real data set is presented.

20.
One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but differs from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma dataset is presented to illustrate the use of the method in practice.
