20 similar articles found (search time: 46 ms)
1.
We consider profile-likelihood inference based on the multinomial distribution for assessing the accuracy of a diagnostic test. The methods apply to ordinal rating data when accuracy is assessed using the area under the receiver operating characteristic (ROC) curve. Simulation results suggest that the derived confidence intervals have acceptable coverage probabilities, even when sample sizes are small and the diagnostic tests have high accuracies. The methods extend to stratified settings and situations in which the ratings are correlated. We illustrate the methods using data from a clinical trial on the detection of ovarian cancer.
2.
Summary. In medical research, there is great interest in developing methods for combining biomarkers. We argue that selection of markers should also be considered in the process. Traditional model/variable selection procedures ignore the underlying uncertainty after model selection. In this work, we propose a novel model-combining algorithm for classification in biomarker studies. It works by considering weighted combinations of various logistic regression models; five different weighting schemes are considered in the article. The weights and algorithm are justified using decision theory and risk-bound results. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from an immunohistochemical study in prostate cancer.
3.
Often, the functional form of covariate effects in an additive model varies across groups defined by levels of a categorical variable. This structure represents a factor-by-curve interaction. This article presents penalized spline models that incorporate factor-by-curve interactions into additive models. A mixed model formulation for penalized splines allows for straightforward model fitting and smoothing parameter selection. We illustrate the proposed model by applying it to pollen ragweed data in which seasonal trends vary by year.
4.
Janet Franklin. Journal of Vegetation Science, 1998, 9(5): 733-748
Abstract. Generalized additive, generalized linear, and classification tree models were developed to predict the distribution of 20 species of chaparral and coastal sage shrubs within the southwest ecoregion of California. Mapped explanatory variables included bioclimatic attributes related to primary environmental regimes: averages of annual precipitation, minimum temperature of the coldest month, maximum temperature of the warmest month, and topographically-distributed potential solar insolation of the wettest quarter (winter) and of the growing season (spring). Also tested for significance were slope angle (related to soil depth) and the geographic coordinates of each observation. Models were parameterized and evaluated based on species presence/absence data from 906 plots surveyed on National Forest lands. Although all variables were significant in at least one of the species’ models, those models based only on the bioclimatic variables predicted species presence with 3–26% error. While error would undoubtedly be greater if the models were evaluated using independent data, results indicate that these models are useful for predictive mapping – for interpolating species distribution data within the ecoregion. All three methods produced models with similar accuracy for a given species; GAMs were useful for exploring the shape of the response functions, GLMs allowed those response functions to be parameterized and their significance tested, and classification trees, while sometimes difficult to interpret, yielded the lowest prediction errors (lower by 3–5%).
5.
We modelled forest composition and structural diversity in the Uinta Mountains, Utah, as functions of satellite spectral data and spatially‐explicit environmental variables through generalized additive models. Measures of vegetation composition and structural diversity were available from existing forest inventory data. Satellite data included raw spectral data from the Landsat Thematic Mapper (TM), a GAP Analysis classified TM, and a vegetation index based on raw spectral data from an advanced very high resolution radiometer (AVHRR). Environmental predictor variables included maps of temperature, precipitation, elevation, aspect, slope, and geology. Spatially‐explicit predictions were generated for the presence of forest and lodgepole cover types, basal area of forest trees, percent cover of shrubs, and density of snags. The maps were validated using an independent set of field data collected from the Evanston ranger district within the Uinta Mountains. Within the Evanston ranger district, model predictions were 88% and 80% accurate for forest presence and lodgepole pine (Pinus contorta), respectively. An average 62% of the predictions of basal area, shrub cover, and snag density fell within a 15% deviation from the field validation values. The addition of TM spectral data and the GAP Analysis TM‐classified data contributed significantly to the models' predictions, while AVHRR had less significance.
6.
This article presents a method for estimating the accuracy of psychological screening scales using receiver operating characteristic curves and associated statistics. Screening scales are typically semicontinuous within a known range with distributions that are nearly symmetric when the target condition is present and highly skewed when the condition is absent. We model screening scale outcomes using truncated normal distributions that accommodate these different distributional shapes and use subject-specific random effects to adjust for multiple assessments within individuals. Using the proposed model, we estimate the accuracy of the Symptom Checklist as a measure of major depression from a repeatedly screened sample of patients.
7.
Background
In silico models have recently been created in order to predict which genetic variants are more likely to contribute to the risk of a complex trait given their functional characteristics. However, there has been no comprehensive review as to which type of predictive accuracy measures and data visualization techniques are most useful for assessing these models.
Methods
We assessed the performance of the models for predicting risk using various methodologies, some of which include: receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. These measures have variable interpretability depending on factors such as whether the dataset is balanced in terms of numbers of genetic variants classified as risk variants versus those that are not.
Results
We conclude that the area under the curve (AUC) is a suitable starting place, and for models with similar AUCs, violin plots are particularly useful for examining the distribution of the risk scores.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1616-z) contains supplementary material, which is available to authorized users.
8.
Receiver operating characteristic (ROC) curves are used to describe the performance of diagnostic procedures. This paper proposes a simple method for the statistical comparison of two ROC curves derived from the same set of patients and the same set of healthy subjects. Generalization to studies involving more than two screening factors is straightforward. This method does not require the calculation of variances of the areas or difference of areas under the curves.
9.
Summary. Boosting is a powerful approach to fitting regression models. This article describes a boosting algorithm for likelihood‐based estimation with incomplete data. The algorithm combines boosting with a variant of stochastic approximation that uses Markov chain Monte Carlo to deal with the missing data. Applications to fitting generalized linear and additive models with missing covariates are given. The method is applied to the Pima Indians Diabetes Data where over half of the cases contain missing values.
10.
Summary. The Wilcoxon Mann-Whitney (WMW) U test is commonly used in nonparametric two-group comparisons when the normality of the underlying distribution is questionable. There has been some previous work on estimating power based on this procedure (Lehmann, 1998, Nonparametrics). In this article, we present an approach for estimating type II error, which is applicable to any continuous distribution, and also extend the approach to handle grouped continuous data allowing for ties. We apply these results to obtaining standard errors of the area under the receiver operating characteristic curve (AUROC) for risk-prediction rules under H1 and for comparing AUROC between competing risk prediction rules applied to the same data set. These results are based on SAS-callable functions to evaluate the bivariate normal integral and are thus easily implemented with standard software.
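The link between the WMW U statistic and the AUROC mentioned in this abstract can be illustrated with a short sketch. It is not the article's SAS-based method and it does not compute standard errors; it only shows the standard identity AUROC = U / (n_cases × n_controls), with tied pairs counted as one half. The function name `auroc_mann_whitney` and the example scores are hypothetical, chosen for illustration.

```python
from itertools import product


def auroc_mann_whitney(case_scores, control_scores):
    """Return the AUROC as the normalized Mann-Whitney U statistic.

    Each (case, control) pair contributes 1 if the case scores higher,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    u = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            u += 1.0
        elif case == control:
            u += 0.5  # ties count as half a concordant pair
    return u / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, including ties between the two groups.
cases = [0.9, 0.8, 0.6, 0.6]
controls = [0.7, 0.6, 0.3, 0.2]
print(auroc_mann_whitney(cases, controls))  # 13 concordant-equivalents / 16 pairs
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for a small illustration; a rank-based computation would be the usual choice for larger data sets.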
11.
A comparison of methods for predicting vegetation type (cited 3 times: 0 self-citations, 3 by others)
Predictive modeling of vegetation patterns has wide application in vegetation science. In this paper I discuss three methods of predictive modeling using data from the alpine treeline ecotone as a case study. The study area is a portion of Glacier National Park, Montana. Parametric general linear models (GLM), artificial neural networks (ANN) and classification tree (CT) methods of predicting vegetation type are compared to determine the relative strength of each predictive approach and how they may be used in concert to increase understanding of important vegetation – environment relations. For each predictive method, vegetation type within the alpine treeline ecotone is predicted using a suite of environmental indicator variables including elevation, moisture potential, solar radiation potential, snow potential index, and disturbance history. Results from each of the predictive methods are compared against the real vegetation types to determine the relative accuracy of the methods. When the entire data field is examined (i.e., not evaluated by smaller spatial aggregates of data) the ANN procedure produces the most accurate predictions (=0.571); the CT predictions are the least accurate (=0.351). The predicted patterns of vegetation on the landscape are considerably different using the three methods. The GLM and CT methods produce large contiguous swaths of vegetation types throughout the study area, whereas the ANN method produces patterns with much more heterogeneity and smaller patches. When predictions are compared to reality at catchment scale, it becomes evident that the accuracy of each method varies depending upon the specific situation. The ANN procedure remains the most accurate method in the majority of the catchments, but both the GLM and CT produce the most accurate classifications in at least one basin each. The variability in predictive ability of the three methods tested here indicates that there may not be a single best predictive method. Rather it may be important to use a suite of predictive models to help understand the environment – vegetation relationships. The ability to use multiple predictive methods to determine which spatial subunits of a landscape are outliers is important when identifying locations useful for climate change monitoring studies.
12.
Prediction of plant species distribution in lowland river valleys in Belgium: modelling species response to site conditions (cited 7 times: 0 self-citations, 7 by others)
Ana M.F. Bio, Piet De Becker, Els De Bie, Willy Huybrechts, Martin Wassen. Biodiversity and Conservation, 2002, 11(12): 2189-2216
In ecological modelling, limitations in data and their applicability for predictive modelling are more rule than exception. Often modelling has to be performed on sub-optimal data, as explicit and controlled collection of (more) appropriate data would not be feasible. An example of predictive ecological modelling is given with application of generalized additive and generalized linear models fitted to presence–absence records of plant species and site condition data from four nutrient-poor Flemish lowland valleys. Standard regression procedures are used for modelling, although explanatory and response data do not meet all the assumptions implicit in these procedures. Data were non-randomly collected and are spatially autocorrelated; model residuals retain part of that correlation. The scale of most site-condition records does not match the scale of the response variable (species distribution). Hence, interpolated and up-scaled explanatory variables are used. Data are aggregated from distinct phytogeographical regions to allow for generalized models, applicable to a wider population of river valleys in the same region. Nevertheless, ecologically sound models are obtained, which predict well the distribution of most plant species for the Flemish river valleys considered.
13.
Accurate diagnosis of disease is a critical part of health care. New diagnostic and screening tests must be evaluated based on their abilities to discriminate diseased from nondiseased states. The partial area under the receiver operating characteristic (ROC) curve is a measure of diagnostic test accuracy. We present an interpretation of the partial area under the curve (AUC), which gives rise to a nonparametric estimator. This estimator is more robust than existing estimators, which make parametric assumptions. We show that the robustness is gained with only a moderate loss in efficiency. We describe a regression modeling framework for making inference about covariate effects on the partial AUC. Such models can refine knowledge about test accuracy. Model parameters can be estimated using binary regression methods. We use the regression framework to compare two prostate-specific antigen biomarkers and to evaluate the dependence of biomarker accuracy on the time prior to clinical diagnosis of prostate cancer.
14.
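The relationship between plant distribution and climate is the basis for projecting the impacts of future climate change on ecosystems. Previous species distribution models have usually taken species presence data, from distribution ranges or occurrence points, as the response variable. Compared with presence data, abundance reflects a species' ability to acquire resources and allocate them to individuals, and so better measures the species' influence on the regional ecosystem. This study obtained the abundance of Quercus trees in 1,045 plots in North China and surrounding areas through field surveys, and used generalized linear models, generalized additive models, and random forest models to simulate the geographic distribution of abundance for five species, Quercus variabilis, Q. acutissima, Q. aliena, Q. aliena var. acuteserrata, and Q. mongolica, together with their potential distributions in two future periods (2050 and 2070). The results show that the random forest model fitted the abundance of the five Quercus species better than the generalized linear and generalized additive models. Under Representative Concentration Pathway (RCP) 8.5, the magnitude of abundance change for the five species in both future periods was greater than under RCP 2.6. The abundance of Q. acutissima, Q. aliena, Q. aliena var. acuteserrata, and Q. mongolica decreased over more than half of the area, and the decreases in abundance of the five Quercus species were concentrated in northeastern Inner Mongolia and northern Heilongjiang. Under future climate change, monitoring and species protection in these regions need to be strengthened.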
15.
16.
17.
New biomarkers are frequently being developed in laboratory settings for the early diagnosis of diseases. However, the assay can be so expensive to assess in some cases that the evaluation of a large number of assays becomes unfeasible. Under this setting pooling biospecimens becomes an appealing alternative. In this paper, we present the methodology to allow for general pooling strategies and different data structures, which include balanced and unbalanced pooling cases. An estimate of the area under the ROC curve of a single biomarker with its asymptotic mean and variance is provided. Furthermore, we develop a test statistic for comparing the areas under the ROC curves of two biomarkers. The methods are illustrated with data from a study evaluating biomarkers for coronary heart disease.
18.
Fabio Attorre, Marco Alfò, Michele De Sanctis, Fabio Francesconi, Roberto Valenti, Marcello Vitale, Franco Bruno. Applied Vegetation Science, 2011, 14(2): 242-255
Question: What is the effect of climate change on tree species abundance and distribution in the Italian peninsula? Location: Italian peninsula. Methods: Regression tree analysis, Random Forest, generalized additive model and geostatistical methods were compared to identify the best model for quantifying the effect of climate change on tree species distribution and abundance. Future potential species distribution, richness, local colonization, local extinction and species turnover were modelled according to two scenarios (A2 and B1) for 2050 and 2080. Results: Robust Random Forest proved to be the best statistical model to predict the potential distribution of tree species abundance. Climate change could lead to a shift in tree species distribution towards higher altitudes and a reduction of forest cover. Pinus sylvestris and Tilia cordata may be considered at risk of local extinction, while the other species could find potential suitable areas at the cost of a rearrangement of forest community composition and increasing competition. Conclusions: Geographical and topographical regional characteristics can have a noticeable influence on the impact of predicted climate change on forest ecosystems within the Mediterranean basin. It would be highly beneficial to create a standardized and harmonized European forest inventory in order to evaluate, at high resolution, the effect of climate change on forest ecosystems, identify regional differences and develop specific adaptive management strategies and plans.
19.
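Application and comparison of generalized models and classification and regression trees in simulating species distribution (cited 19 times: 0 self-citations, 19 by others)
Three widely used models for simulating the geographic distribution of species, the generalized linear model (GLM), the generalized additive model (GAM), and the classification and regression tree (CART), were compared on the geographic distributions of tree species in China, in order to identify a more suitable model and to use it for predicting the effects of climate change on species distributions. Simulations for 15 tree species in China showed that accuracy was high for all species except Pinus tabuliformis and Quercus liaotungensis, for which it was somewhat lower; GAM performed best overall. Using a geographic information system (GIS), the simulated distributions of four species, Cyclobalanopsis glauca, Schima superba, Pinus koraiensis, and Pinus tabuliformis, were compared across the three models. All three models simulated the distributions of C. glauca and S. superba well; GLM performed poorly for P. koraiensis; and none of the three models performed well for P. tabuliformis, with GLM the worst. Predictions of the distributions of C. glauca and Quercus mongolica under future climate change showed that GLM and GAM gave similar results for C. glauca, which expands westward and northward under the future climate scenario, whereas CART predicted that, in addition to the westward and northward expansion, the C. glauca range in Guangdong and southern Guangxi would disappear. All three models predicted that Q. mongolica would expand westward under the future climate scenario, with the simulated expansion area ordered as: … model > … model > … model.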
20.
Quantitative procedures for evaluating added values from new markers over a conventional risk scoring system for predicting event rates at specific time points have been extensively studied. However, a single summary statistic, for example, the area under the receiver operating characteristic curve or its derivatives, may not provide a clear picture about the relationship between the conventional and the new risk scoring systems. When there are no censored event time observations in the data, two simple scatterplots with individual conventional and new scores for "cases" and "controls" provide valuable information regarding the overall and the subject-specific level incremental values from the new markers. Unfortunately, in the presence of censoring, it is not clear how to construct such plots. In this article, we propose a nonparametric estimation procedure for the distributions of the differences between two risk scores conditional on the conventional score. The resulting quantile curves of these differences over the subject-specific conventional score provide extra information about the overall added value from the new marker. They also help us to identify a subgroup of future subjects who need the new predictors, especially when there is no unified utility function available for cost-risk-benefit decision making. The procedure is illustrated with two data sets. The first is from a well-known Mayo Clinic primary biliary cirrhosis liver study. The second is from a recent breast cancer study on evaluating the added value from a gene score, which is relatively expensive to measure compared with the routinely used clinical biomarkers for predicting the patient's survival after surgery.