Similar Documents
20 similar documents found.
1.
Aim

The area under the receiver operating characteristic (ROC) curve (AUC) is a widely used statistic for assessing the discriminatory capacity of species distribution models. Here, I used simulated data to examine the interdependence of the AUC and classical discrimination measures (sensitivity and specificity) derived by applying a threshold. I further illustrate, again with simulated data, the implications of using the AUC to evaluate potential versus realized distribution models.

Innovation

After applying the threshold that makes sensitivity and specificity equal, a strong relationship was found between the AUC and these two measures. This result was corroborated with real data. On the other hand, the AUC penalizes models that estimate potential distributions (the regions where the species could survive and reproduce because suitable environmental conditions exist) and favours those that estimate realized distributions (the regions where the species actually lives).

Main conclusions

First, the independence of the AUC from threshold selection may be irrelevant in practice. This result also emphasizes that the AUC assumes nothing about the relative costs of omission and commission errors, a premise that may not be appropriate in most real situations. Measures derived from a contingency table for different cost-ratio scenarios, together with the ROC curve, may be more informative than a single AUC value. Second, the AUC is only truly informative when true absence data are available and the objective is to estimate the realized distribution. When the potential distribution is the goal of the research, the AUC is not an appropriate performance measure, because commission errors carry much less weight than omission errors.
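The link between the AUC and the equal sensitivity/specificity threshold can be made concrete with a toy calculation. The Python sketch below is an illustration with invented scores, not the author's simulation: it computes the AUC as the probability that a randomly chosen presence scores higher than a randomly chosen absence (ties count one half), then scans candidate thresholds for the one at which sensitivity and specificity are closest. The function names are hypothetical.

```python
# Toy illustration (hypothetical names and data, not the author's code).

def auc_mann_whitney(presence_scores, absence_scores):
    """Probability that a random presence outscores a random absence; ties count 1/2."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def equal_se_sp_threshold(presence_scores, absence_scores):
    """Return (threshold, sensitivity, specificity) minimizing |Se - Sp|.

    A site is predicted present when its score is >= threshold.
    """
    best = None
    for t in sorted(set(presence_scores) | set(absence_scores)):
        se = sum(s >= t for s in presence_scores) / len(presence_scores)
        sp = sum(s < t for s in absence_scores) / len(absence_scores)
        if best is None or abs(se - sp) < best[0]:
            best = (abs(se - sp), t, se, sp)
    return best[1:]


presence = [0.9, 0.8, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]
print(auc_mann_whitney(presence, absence))       # 0.9375
print(equal_se_sp_threshold(presence, absence))  # (0.6, 0.75, 0.75)
```

Both numbers summarize the same ROC curve: the AUC integrates over all thresholds, while the balanced point picks one of them, which is why the two tend to move together across models.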

2.
Species distribution modelling (SDM) can support conservation by providing information on the ecological requirements of species at risk. As a case study, we developed habitat suitability models at multiple spatial scales for a threatened freshwater turtle, Emydoidea blandingii, in Ontario. We also explored the effects of background data selection and modelling algorithm selection on habitat suitability predictions. We used sighting records, high-resolution (25 m) land cover data, and two SDM techniques: boosted regression trees and maximum entropy modelling. The area under the receiver operating characteristic curve (AUC) for habitat suitability models tested on independent data ranged from 0.878 to 0.912 with random background and from 0.727 to 0.741 with target-group background. E. blandingii habitat suitability was best predicted by air temperature, wetland area, open water area, road density, and cropland area. Habitat suitability increased with air temperature and wetland area, and decreased with cropland area. Low levels of road density and open water area increased habitat suitability, while high levels of either variable decreased it. Robust habitat suitability maps for species at risk require a multi-scale, multi-algorithm approach. Used well, SDM can offer insight into the habitat requirements of species at risk and help guide the development of management plans. Our results suggest that E. blandingii management plans should promote the protection of terrestrial habitat surrounding residential wetlands, halt road building within and adjacent to currently occupied habitat, and identify movement corridors for isolated populations.

3.
Atrial fibrillation (AF), the most frequent cause of cardioembolic stroke, is increasing in prevalence as the population ages and presents with a broad spectrum of symptoms and severity. Early identification of AF is essential for preventing blood clot formation and stroke. In this work, a real-time algorithm is proposed for accurately screening AF episodes in electrocardiograms. The method operates on the heart rate sequence and combines symbolic dynamics with Shannon entropy; novel recursive algorithms keep its computational complexity low. Four publicly accessible clinical databases (the Long-Term AF, MIT-BIH AF, MIT-BIH Arrhythmia, and MIT-BIH Normal Sinus Rhythm Databases) were used for assessment. The first database was used as the training set; ROC analysis was performed, and the best performance was achieved at a threshold of 0.639, with sensitivity (Se), specificity (Sp), positive predictive value (PPV) and overall accuracy (ACC) of 96.14%, 95.73%, 97.03% and 95.97%, respectively. The other three databases were used for independent testing with the same decision threshold (0.639). For the second database, Se, Sp, PPV and ACC were 97.37%, 98.44%, 97.89% and 97.99%, respectively; for the third database, they were 97.83%, 87.41%, 47.67% and 88.51%, respectively; for the fourth database, Sp was 99.68%. Recent state-of-the-art methods were also evaluated for comparison. Collectively, these results indicate that combining symbolic dynamics and Shannon entropy yields a potent AF detector, and suggest that the method could be of practical use in both clinical and out-of-clinic settings.
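The general idea of scoring rhythm irregularity with symbolic dynamics and Shannon entropy can be sketched in a few lines. This Python example is a hypothetical illustration, not the authors' algorithm: the abstract does not give the coding scheme, so the 10 bpm bin width, 16-symbol alphabet, 3-symbol words and normalization by log2 of the word count are all assumptions. Because of that, its output is not on the same scale as the reported 0.639 threshold.

```python
import math
from collections import Counter


def symbolize(heart_rates, bin_width=10.0, n_symbols=16):
    """Map each heart rate (bpm) to an integer symbol by uniform binning (assumed scheme)."""
    return [min(int(hr // bin_width), n_symbols - 1) for hr in heart_rates]


def word_entropy(symbols, word_len=3):
    """Shannon entropy (bits) of overlapping symbol words, normalized to [0, 1].

    Normalizing by log2(number of words) is an assumption: it equals 1.0
    when every word in the window is distinct.
    """
    words = [tuple(symbols[i:i + word_len]) for i in range(len(symbols) - word_len + 1)]
    n = len(words)
    if n < 2:
        return 0.0
    counts = Counter(words)
    h = sum(-(c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)


regular = [70] * 10                                       # steady sinus-like rate
irregular = [62, 95, 71, 120, 88, 54, 103, 77, 66, 115]   # erratic, AF-like rate
print(word_entropy(symbolize(regular)))    # 0.0
print(word_entropy(symbolize(irregular)))  # 1.0
```

A detector in this spirit would flag a window as AF when its entropy exceeds a threshold chosen by ROC analysis on training data.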

4.
Modelling approaches have the potential to contribute significantly and cost-effectively to the spatial management of deep-sea ecosystems. However, we currently have little understanding of the accuracy of such models when they are built from limited data of varying resolution. The aim of this study was to investigate the performance of predictive models constructed from non-simulated (real-world) data of different resolutions. Predicted distribution maps for three deep-sea habitats were constructed with MaxEnt, using high-resolution multibeam bathymetric data and associated terrain-derived variables as predictors. Model performance was evaluated with AUC and threshold-dependent assessment methods over repeated 75/25 training/test data partitions. The overall extent and distribution of each habitat, and the percentage contained within an existing MPA network, were quantified and compared with results from low-resolution GEBCO models. Predicted spatial extent for scleractinian coral reef and Syringammina fragilissima aggregations decreased as model resolution increased, whereas the total suitable area for Pheronema carpenteri increased. Distinct differences in predicted habitat distribution were observed for all three habitats. Estimates of habitat extent contained within the MPA network all increased when modelled at fine scale. High-resolution models performed better than low-resolution models according to threshold-dependent evaluation. We recommend using high-resolution multibeam bathymetry data rather than low-resolution bathymetry data in modelling approaches. We do not recommend using predictive models to produce absolute values of habitat extent, only to identify likely areas of suitable habitat. Assessments of MPA network effectiveness based on calculations of percentage area protection (policy-driven conservation targets) from low-resolution models are likely to be fit for purpose.

5.
The various human-induced threats imposed on nature have recently spurred the study of species' distributions. We developed potential suitability models using two algorithms for a threatened African mahogany, Entandrophragma angolense, in three East African countries: Kenya, Tanzania and Uganda. The effects of feature selection and modelling algorithm selection on potential suitability predictions were explored. Occurrence records and high-resolution environmental data were used. The two species distribution modelling techniques were the Genetic Algorithm for Rule-set Production (GARP) and maximum entropy modelling (Maxent). With Maxent, the area under the receiver operating characteristic curve (AUC) for potential distribution models tested on independent data ranged from 0.942 to 0.972 when using automatic features and from 0.666 to 0.974 with target or specific features. With GARP, AUC for potential distribution models ranged from 0.591 to 0.736 with all rule types and from 0.388 to 0.805 for specific rule types (Tables 1 and 2). E. angolense potential suitability was best predicted by soil, rainfall and aspect using GARP. Potential suitability increased with increasing aspect and decreased with increasing slope. Low rainfall and low elevation increased potential suitability, while high levels of either variable decreased it. Potential suitability maps for vulnerable species require a multi-algorithm, fine-scale data approach and the incorporation of environmental variables such as soil, slope, land use and elevation. Species distribution models can offer insight into the distribution requirements of vulnerable species and help guide the development of management plans. The results of this study suggest that E. angolense management plans should promote the protection of terrestrial forests surrounding water bodies, including Mabira Forest in Uganda.

6.
Aim

We demonstrate how to integrate two widely used tools for modelling the spread of invasive plants, and compare the performance of the combined model with that of its individual components, using the recent range dynamics of the invasive annual weed Ambrosia artemisiifolia L.

Location

Austria.

Methods

Species distribution models, which deliver habitat-based information on potential distributions, and interacting particle systems, which simulate spatio-temporal range dynamics as a function of neighbourhood configurations, were combined into a common framework. We then used the combined model to simulate the invasion of A. artemisiifolia in Austria between 1990 and 2005. For comparison, simulations were also performed with models that accounted only for habitat suitability or only for neighbourhood configurations. The fit of the three models to the data was assessed by likelihood ratio tests, and simulated invasion patterns were evaluated against observed ones in terms of predictive discrimination ability (area under the receiver operating characteristic curve, AUC) and spatial autocorrelation (Moran's I).

Results

The combined model fitted the data significantly better than the single-component alternatives. Simulations relying solely on parameterized spread kernels performed worst in terms of both AUC and spatial pattern formation. Simulations based only on habitat information correctly predicted infestation of susceptible areas but reproduced the autocorrelated patterns of A. artemisiifolia expansion less adequately than the integrated model did.

Main conclusions

Our integrated modelling approach offers a flexible tool for forecasting spatio-temporal invasion patterns from landscape to regional scales. A further advantage is that scenarios of environmental change can be incorporated consistently by updating the habitat suitability layers accordingly. Given the susceptibility of many alien plants, including A. artemisiifolia, to both land use and climate change, taking such scenarios into account will become increasingly relevant for the design of proactive management strategies.
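Moran's I, the spatial autocorrelation statistic used to compare simulated and observed invasion patterns, follows directly from its standard definition I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z_i = x_i - mean(x) and W is the sum of all weights. The Python sketch below is a toy illustration on a six-cell transect with binary neighbour weights. The helper names and data are hypothetical, and the abstract does not state which weighting scheme the study used.

```python
def line_adjacency(n):
    """Binary weights for cells on a 1-D transect: neighbours are i-1 and i+1."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Global Moran's I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, z = x - mean."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    denominator = sum(zi * zi for zi in z)
    return (n / w_total) * (numerator / denominator)


clustered = [1, 1, 1, 0, 0, 0]     # invaded cells grouped together
alternating = [1, 0, 1, 0, 1, 0]   # invaded cells interleaved with uninvaded ones
w = line_adjacency(6)
print(round(morans_i(clustered, w), 6))    # 0.6
print(round(morans_i(alternating, w), 6))  # -1.0
```

Positive values indicate clustered spread of the kind the integrated model reproduced; negative values indicate a checkerboard-like pattern.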

7.

Background  

The receiver operating characteristic (ROC) curve is a fundamental tool for assessing the discriminant performance not only of a single marker but also of a score function combining multiple markers. The area under the ROC curve (AUC) for a score function measures the intrinsic ability of the score function to discriminate between controls and cases. Recently, the partial AUC (pAUC) has received more attention than the AUC, because it can focus on a range of false positive rates suited to the clinical situation at hand. However, existing pAUC-based methods handle only a few markers and do not consider nonlinear combinations of markers.
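For a single score, the (unstandardized) partial AUC is the area under the empirical ROC curve between a false positive rate of 0 and a chosen upper limit. The Python sketch below illustrates that quantity with trapezoidal integration, clipping the last segment at the limit. It is a hypothetical illustration with invented data and function names, and it does not implement the multi-marker, nonlinear pAUC method the abstract motivates.

```python
def roc_points(cases, controls):
    """Empirical ROC points (FPR, TPR), predicting 'case' when score >= threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(cases) | set(controls), reverse=True):
        tpr = sum(s >= t for s in cases) / len(cases)
        fpr = sum(s >= t for s in controls) / len(controls)
        points.append((fpr, tpr))
    return points


def partial_auc(cases, controls, max_fpr):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    pts = roc_points(cases, controls)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the final segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]
print(partial_auc(cases, controls, 1.0))  # 0.9375 (the full AUC)
print(partial_auc(cases, controls, 0.5))  # 0.4375
```

With max_fpr = 1.0 the function returns the ordinary AUC, which is a convenient sanity check.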

8.

Background

Dengue is a re-emerging infectious disease of humans that is rapidly spreading from endemic areas to dengue-free regions under favorable conditions. In recent decades, Guangzhou has again experienced several large dengue outbreaks, as have its neighboring cities. This study aims to examine the impact of dengue epidemics in Guangzhou, China, and to develop a predictive model for Zhongshan based on local weather conditions and Guangzhou dengue surveillance information.

Methods

We obtained weekly dengue case data for Guangzhou and Zhongshan from 1 January 2005 to 31 December 2014 from the Chinese National Disease Surveillance Reporting System. Meteorological data were collected from the Zhongshan Weather Bureau and demographic data from the Zhongshan Statistical Bureau. A negative binomial regression model with a log link function was used to analyze the relationship between weekly dengue cases in Guangzhou and Zhongshan, controlling for meteorological factors. Cross-correlation functions were applied to identify the time lags of the effect of each weather factor on weekly dengue cases. Models were validated using receiver operating characteristic (ROC) curves and k-fold cross-validation.

Results

Our results showed that weekly dengue cases in Zhongshan were significantly associated with dengue cases in Guangzhou after applying a 5-week prior moving average (relative risk (RR) = 2.016, 95% confidence interval (CI): 1.845–2.203), controlling for weather factors including minimum temperature, relative humidity, and rainfall. ROC curve analysis indicated that the forecasting model performed well at different prediction thresholds, with an area under the receiver operating characteristic curve (AUC) of 0.969 for a threshold of 3 cases per week, 0.957 for a threshold of 2 cases per week, and 0.938 for a threshold of 1 case per week. Models established during k-fold cross-validation also had high AUCs (average 0.938–0.967). The sensitivity and specificity obtained from k-fold cross-validation were 78.83% and 92.48%, respectively, with a forecasting threshold of 3 cases per week; 91.17% and 91.39% with a threshold of 2 cases; and 85.16% and 87.25% with a threshold of 1 case. Out-of-sample prediction for the 2014 epidemics also showed satisfactory performance.

Conclusion

Our study findings suggest that the occurrence of dengue outbreaks in Guangzhou could impact dengue outbreaks in Zhongshan under suitable weather conditions. Future studies should focus on developing integrated early warning systems for dengue transmission that include local weather and human movement.
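Two data-preparation steps mentioned in this abstract can be sketched: smoothing the Guangzhou series with a moving average of prior weeks, and turning Zhongshan counts into binary outcomes at the 1, 2 or 3 cases-per-week thresholds used for ROC analysis. The Python below is a hypothetical sketch. The abstract does not say whether the current week is included in the 5-week window or whether a threshold means ">=" or ">", so both choices here (current week excluded, ">=") are assumptions, and the data are invented.

```python
def prior_moving_average(weekly_cases, window=5):
    """For week t, the mean of weeks t-window .. t-1; None until enough history exists.

    Assumption: the current week is excluded from its own average.
    """
    out = []
    for t in range(len(weekly_cases)):
        if t < window:
            out.append(None)
        else:
            out.append(sum(weekly_cases[t - window:t]) / window)
    return out


def exceeds_threshold(weekly_cases, threshold):
    """Binary outcome per week: 1 if cases >= threshold (assumed inclusive)."""
    return [1 if c >= threshold else 0 for c in weekly_cases]


guangzhou = [0, 1, 2, 3, 4, 5, 6]   # hypothetical weekly counts
zhongshan = [0, 0, 1, 2, 3, 1, 4]
print(prior_moving_average(guangzhou))  # [None, None, None, None, None, 2.0, 3.0]
print(exceeds_threshold(zhongshan, 3))  # [0, 0, 0, 0, 1, 0, 1]
```

The smoothed series would enter the negative binomial model as a covariate, and the binary outcomes would be compared against predicted counts to build the ROC curves.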

9.
Endocrine Practice, 2019, 25(11): 1117-1126
Objective: While intraoperative parathyroid hormone (IOPTH) monitoring with a ≥50% drop commonly guides the extent of exploration for primary hyperparathyroidism (pHPT), receiver operating characteristic (ROC) analysis has not been performed to determine whether other criteria yield better sensitivity and specificity. The aim of this study was to identify the optimum percent change in IOPTH following removal of the abnormal parathyroid pathology for predicting biochemical cure. Secondary aims were to identify patient subgroups with an increased area under the ROC curve (AUC) and a need for adjusted criteria.

Methods: A retrospective review was performed of patients undergoing primary parathyroid surgery for sporadic pHPT between 1999 and 2010 at a tertiary center for endocrine surgery. Eight hundred and ninety-six patients with primary hyperparathyroidism were included. Multigland disease (MGD) was defined as the intraoperative detection of more than 1 enlarged hypercellular gland or persistent disease after single gland excision. ROC analysis was used to determine the value with the best performance at predicting MGD following bilateral exploration.

Results: MGD was diagnosed in 174 patients (19.4%). ROC analysis demonstrated an AUC of 0.69. An IOPTH drop of 72% was the point of optimal discrimination, with a sensitivity of 55% and a specificity of 76% for predicting MGD. Subgroup analysis by preoperative calcium, preoperative PTH, localization studies, or pre- and post-excision IOPTH did not identify any factors associated with an improved AUC.

Conclusion: To our knowledge, this is the first study to apply ROC analysis to IOPTH criteria in a large patient cohort. An IOPTH drop of 72% was found to have optimal discriminating ability. We failed to identify a subset of patients for whom there was substantial improvement in the AUC, sensitivity, or specificity.

Abbreviations: AUC = area under the ROC curve; BE = bilateral neck exploration; FE = focal parathyroid exploration; IOPTH = intraoperative parathyroid hormone; MGD = multigland disease; MIBI = Tc99m-sestamibi I-123 subtraction single-photon emission computed tomography/computed tomography; pHPT = primary hyperparathyroidism; ROC = receiver operating characteristic; SGD = single gland disease; US = surgeon-performed neck ultrasound
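Finding an "optimal" percent-drop cutoff from an ROC analysis can be sketched as follows. The abstract does not state which optimality criterion was used; this Python example assumes Youden's J (sensitivity + specificity - 1), a common choice, and treats a drop strictly below the cutoff as predicting multigland disease. The function names and the eight-patient dataset are hypothetical.

```python
def percent_drop(pre_pth, post_pth):
    """Percent decline in IOPTH from the pre-excision to the post-excision value."""
    return 100.0 * (pre_pth - post_pth) / pre_pth


def youden_cutoff(drops, is_mgd):
    """Return (cutoff, sensitivity, specificity) maximizing Se + Sp - 1.

    Assumption: a percent drop strictly below the cutoff predicts MGD.
    """
    n_pos = sum(is_mgd)
    n_neg = len(is_mgd) - n_pos
    best = None
    for c in sorted(set(drops)):
        tp = sum(1 for d, m in zip(drops, is_mgd) if m and d < c)
        tn = sum(1 for d, m in zip(drops, is_mgd) if not m and d >= c)
        se, sp = tp / n_pos, tn / n_neg
        j = se + sp - 1
        if best is None or j > best[0]:
            best = (j, c, se, sp)
    return best[1:]


drops = [40, 55, 65, 70, 75, 80, 85, 90]
is_mgd = [True, True, False, True, False, False, False, False]
print(percent_drop(200, 56))         # 72.0
print(youden_cutoff(drops, is_mgd))  # (75, 1.0, 0.8)
```

Other criteria, such as the point closest to the top-left corner of the ROC plot, can pick a different cutoff on the same data.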

10.
Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and the commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, encourage positive lifestyle changes, and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine has arrived yet. We focus on the clinical validity of genetic testing, with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration dataset. Using an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10^-13, 10^-13, and 10^-3, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for people aged 80, 65, and 40 years and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and two SNPs achieve an AUC of only 0.56 for prostate cancer. Nine SNPs were not sufficient to improve discrimination over that of nongenetic predictors for the risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls.
The scientific community should be careful not to overstate the value of association findings for personalized medicine before their time.
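The drop from a respectable AUC to a low fraction of true cases among "high-risk" people follows from Bayes' rule: PPV = Se * p / (Se * p + (1 - Sp) * (1 - p)), where p is prevalence. The Python sketch below evaluates this at the three prevalences quoted above. The sensitivity and specificity (0.8 each) are hypothetical values chosen for illustration, not the study's, so the outputs will not match the reported 30%, 12% and 3%.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: P(case | classified high risk)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point (Se = Sp = 0.8); prevalences from the abstract.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(0.8, 0.8, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

Even with this fixed operating point, PPV falls from roughly 41% to under 6% as prevalence drops, because false positives from the large unaffected group swamp the true positives.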

11.
We used correlative models with species occurrence points, Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indices, and topo-climatic predictors to map the current distribution and potential habitat of invasive Prosopis juliflora in Afar, Ethiopia. Time series of MODIS Enhanced Vegetation Index (EVI) and Normalized Difference Vegetation Index (NDVI) products at 250 m spatial resolution were selected as remote sensing predictors for mapping the current distribution, while WorldClim bioclimatic products and topographic variables derived from the Shuttle Radar Topography Mission (SRTM) product were used to predict potential infestations. We ran Maxent models using non-correlated variables and the 143 species-occurrence points. Maxent-generated probability surfaces were converted into binary maps using the 10-percentile logistic threshold values. Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Our results indicate that the extent of P. juliflora invasion in the Afar region is approximately 3,605 km² (AUC = 0.94), while the potential habitat for future infestations is 5,024 km² (AUC = 0.95). Our analyses demonstrate that time series of MODIS vegetation indices and species occurrence points can be used with the Maxent modeling software to map the current distribution of P. juliflora, while topo-climatic variables are good predictors of potential habitat in Ethiopia. Our results quantify current and potential future infestations and can inform management and policy decisions for containing P. juliflora. Our methods can also be replicated for managing invasive species in other East African countries.
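Converting a continuous suitability surface into a binary map with a 10-percentile threshold, and then into an area, can be sketched as below. This Python example is an illustration under assumptions: it takes the threshold as the training-presence score below which 10% of presences fall (Maxent's own percentile calculation may interpolate differently), and the scores are invented. The cell area uses the 250 m MODIS pixel size mentioned above (0.25 km x 0.25 km = 0.0625 km²).

```python
def ten_percentile_training_presence(presence_scores):
    """Suitability value that leaves about 10% of training presences below it.

    Assumption: simple order statistic, no interpolation.
    """
    ranked = sorted(presence_scores)
    k = int(0.10 * len(ranked))  # number of presences allowed below the threshold
    return ranked[k]


def binary_map(cell_scores, threshold):
    """1 = predicted suitable/invaded, 0 = not."""
    return [1 if s >= threshold else 0 for s in cell_scores]


presences = [0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.5, 0.45, 0.3, 0.1]
t = ten_percentile_training_presence(presences)
cells = binary_map([0.2, 0.3, 0.55, 0.05], t)
cell_area_km2 = 0.25 * 0.25  # one 250 m x 250 m MODIS pixel
print(t, cells, sum(cells) * cell_area_km2)  # 0.3 [0, 1, 1, 0] 0.125
```

Summing suitable cells and multiplying by the cell area is how a binary map yields an extent estimate in km².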

12.
The discriminating capacity (i.e. the ability to correctly classify presences and absences) of species distribution models (SDMs) is commonly evaluated with metrics such as the area under the receiver operating characteristic curve (AUC), the Kappa statistic and the true skill statistic (TSS). AUC and Kappa have been repeatedly criticized, but TSS has fared relatively well since its introduction, mainly because it has been considered independent of prevalence. In addition, discrimination metrics have been contested because they should be calculated on presence-absence data but are often used on presence-only or presence-background data. Here, we investigate TSS and an alternative set of metrics: similarity indices, also known as F-measures. We first show that even in ideal conditions (i.e. perfectly random presence-absence sampling), TSS can be misleading because of its dependence on prevalence, whereas similarity/F-measures provide adequate estimates of model discrimination capacity. Second, we show that in real-world situations where sample prevalence differs from true species prevalence (i.e. biased sampling or presence-pseudoabsence data), no discrimination metric provides an adequate estimate of model discrimination capacity, including metrics specifically designed for modelling with presence-pseudoabsence data. Our conclusions are twofold. First, they unequivocally urge SDM users to understand the potential shortcomings of discrimination metrics when quality presence-absence data are lacking, and we recommend obtaining such data. Second, in the specific case of virtual species, which are increasingly used to develop and test SDM methodologies, we strongly recommend the use of similarity/F-measures, which, unlike TSS, were not biased by prevalence.
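For reference, both metrics discussed here come straight from a presence-absence confusion matrix: TSS = sensitivity + specificity - 1, while the Sørensen similarity index (equivalent to the F1 score) is 2TP / (2TP + FP + FN) and does not use true negatives at all. The Python sketch below uses invented counts and hypothetical function names; it only shows the formulas and does not reproduce the paper's prevalence experiments.

```python
def tss(tp, fn, fp, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def sorensen_f(tp, fn, fp):
    """Sorensen similarity / F1 score: 2TP / (2TP + FP + FN); ignores true negatives."""
    return 2.0 * tp / (2.0 * tp + fp + fn)


# Hypothetical confusion matrix: 50 presences, 150 absences.
print(round(tss(40, 10, 20, 130), 4))    # 0.6667
print(round(sorensen_f(40, 10, 20), 4))  # 0.7273
```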

13.
Habitat modelling and predictive mapping are important tools for conservation planning, particularly for lesser-known species such as many insectivorous bats. However, the scale at which modelling is undertaken can affect predictive accuracy and restrict the use of a model at other scales. We assessed the validity of existing regional-scale habitat models at a local scale and contrasted the habitat use of two morphologically similar species with differing conservation status (Mormopterus norfolkensis and Mormopterus species 2). We used negative binomial generalised linear models created from indices of activity and environmental variables collected in systematic acoustic surveys. Habitat type (based on vegetation community) best explained the activity of both species, which were more active in floodplain areas, with most foraging activity recorded in the freshwater wetland habitat type. The threatened M. norfolkensis avoided urban areas, whereas M. species 2 occurred frequently in urban bushland. The broad habitat types predicted from local-scale models were generally consistent with those from regional-scale models. However, threshold-dependent accuracy measures indicated a poor fit, and we advise caution when using the regional models at a fine scale, particularly when the consequences of false negatives or false positives are severe. Additionally, our study illustrates that habitat type classifications can be important predictors, and we suggest that they are more practical for conservation than complex combinations of raw variables because they are easily communicated to land managers.

14.
The classification accuracy of new diagnostic tests is commonly assessed with receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) is a well-accepted summary measure of the accuracy of diagnostic tests. The AUC can vary with patient and testing characteristics, so the performance of a test may differ in certain subpopulations of patients and readers. For this purpose, we propose a direct semi-parametric regression model for the non-parametric AUC measure for ordinal data that accounts for discrete and continuous covariates. The proposed method can be used to estimate the AUC under degenerate data, where certain rating categories are not observed. We also discuss the non-standard asymptotic theory that is needed because the estimating functions are based on cross-correlated random variables. Simulation studies based on different classification models showed that the proposed model worked reasonably well, with small percent bias and percent mean-squared error. The proposed method was applied to a prostate cancer study to estimate the AUC for four readers, and to a carotid vessel study, with age, gender, history of previous stroke, and total number of risk factors as covariates, to estimate the accuracy of the diagnostic test in the presence of subject-level covariates.

15.
Native to Southeast Asia, Hygrophila polysperma is an invasive aquatic weed of lotic habitats in the southern United States and Mexico. An increase since 1990 in the number of water bodies invaded by hygrophila suggests that current control methods for this weed are inadequate. Classical biological control may be a viable option for long-term regulation of hygrophila in the invaded range. In this study, we used the Maximum Entropy species distribution model (MaxEnt) to prioritize climatically suitable native habitats in India and Bangladesh for exploratory surveys to collect biological control agents. In total, 164 point occurrences from the United States and Mexico and 20 predictor variables (19 bioclimatic variables and altitude) were used to predict the native distribution of hygrophila. Model performance was statistically verified using threshold-dependent binomial tests and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The model performed significantly better than random in both the binomial tests and the AUC analyses. High habitat suitability for hygrophila was predicted in northeastern India and in northern and eastern Bangladesh. Based on the percent omission of known native occurrences, a color-coded final distribution map was prepared to prioritize areas for future surveys. Our study proposes a technique that can be useful for prioritizing areas in native ranges for exploratory surveys to collect biological control agents.
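A threshold-dependent binomial test asks whether more test localities fall inside the area predicted suitable than chance would allow. Under the null hypothesis, each test point lands in the predicted area with probability equal to that area's fraction of the study region, so the one-tailed p-value is P(X >= k) for X ~ Binomial(n, fraction). The Python sketch below shows this common formulation. The function name and numbers are hypothetical, and the exact variant used in the study is not given in the abstract.

```python
from math import comb


def binomial_upper_tail(n_test, n_correct, predicted_area_fraction):
    """One-tailed P(X >= n_correct) for X ~ Binomial(n_test, predicted_area_fraction).

    Null hypothesis: test points fall in the predicted-suitable area no more
    often than random points would.
    """
    p = predicted_area_fraction
    return sum(
        comb(n_test, k) * p**k * (1 - p) ** (n_test - k)
        for k in range(n_correct, n_test + 1)
    )


# Hypothetical: 8 of 10 test points fall in a region covering 30% of the study area.
print(binomial_upper_tail(10, 8, 0.30))  # about 0.0016
```

A small p-value means the model places test localities in its predicted area far more often than a random map covering the same area would.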

16.
Habitat assessment of red deer in the Ebinur Lake National Nature Reserve based on a niche model
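Habitat assessment and prediction are the basis for the effective conservation of endangered species. During two autumn field surveys in the Ebinur Lake National Nature Reserve, Xinjiang, in September 2013 and October 2014, we collected 92 occurrence records of red deer (Cervus elaphus). Using these records as distribution points and 23 variables in three categories (topography, vegetation type and climate) as habitat variables, we applied the MAXENT niche model to analyse the distribution of autumn habitat suitability for red deer in the reserve and the influence of the main habitat factors on red deer distribution. The model performed well, with a mean AUC (area under the receiver operating characteristic curve) of 0.976. Jackknife tests showed that the maximum temperature of the warmest month had a large influence on red deer habitat distribution, whereas vegetation type and slope had little influence. Elevation, annual precipitation, mean diurnal temperature range and mean temperature of the warmest quarter were the main habitat factors affecting red deer distribution. Autumn habitat was classified into four grades: highly suitable, moderately suitable, marginally suitable and unsuitable. Highly suitable habitat was concentrated in the northern part of the study area, moderately and marginally suitable habitat lay along the edges of the highly suitable areas, and unsuitable habitat was concentrated in the western and eastern parts. The study not only documents the actual distribution of red deer at Ebinur Lake but also provides an important scientific basis for understanding the relationship between red deer habitat and habitat factors.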

17.
We present a correlative modelling technique that uses locality records (associated with species presence) and a set of predictor variables to produce a statistically justifiable probability response surface for a target species. The probability response surface indicates the suitability of each grid cell in a map for the target species in terms of the suite of predictor variables. The technique constructs a hyperspace for the target species using principal component axes derived from a principal components analysis performed on a training dataset. The training dataset comprises the values of the predictor variables at the localities where the species has been recorded as present. The origin of this hyperspace is taken to characterize the centre of the organism's niche. All localities (grid cells) in the map region are then fitted into this hyperspace using the values of the predictor variables at these localities (the prediction dataset). The Euclidean distance from any locality to the origin of the hyperspace gives a measure of the 'centrality' of that locality in the hyperspace. These distances are used to derive probability values for each grid cell in the map region. The modelling technique was applied to bioclimatic data to predict bioclimatic suitability for three alien invasive plant species (Lantana camara L., Ricinus communis L. and Solanum mauritianum Scop.) in South Africa, Lesotho and Swaziland. The models were tested against independent test records by calculating area under the curve (AUC) values of receiver operating characteristic (ROC) curves and kappa statistics. There was good agreement between the models and the independent test records. Pre-processing the climatic variable data to reduce the deleterious effects of multicollinearity, and using stopping rules to prevent overfitting of the models, are important aspects of the modelling process.

18.

Background

In silico models have recently been developed to predict which genetic variants are more likely to contribute to the risk of a complex trait, given their functional characteristics. However, there has been no comprehensive review of which predictive accuracy measures and data visualization techniques are most useful for assessing these models.

Methods

We assessed the performance of the models for predicting risk using various methodologies, including receiver operating characteristic (ROC) curves, histograms of classification probability, and the novel use of the quantile-quantile plot. The interpretability of these measures varies with factors such as whether the dataset is balanced in terms of the numbers of genetic variants classified as risk variants versus those that are not.

Results

We conclude that the area under the curve (AUC) is a suitable starting point, and that for models with similar AUCs, violin plots are particularly useful for examining the distribution of risk scores.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1616-z) contains supplementary material, which is available to authorized users.

19.
Metabolomics is increasingly being applied to the identification of biomarkers for disease diagnosis, prognosis and risk prediction. Unfortunately, among the many published metabolomic studies focusing on biomarker discovery, there is very little consistency and relatively little rigor in how researchers select, assess or report their candidate biomarkers. In particular, few studies report any measure of sensitivity or specificity, or provide receiver operating characteristic (ROC) curves with associated confidence intervals. Even fewer studies explicitly describe or release the biomarker model used to generate their ROC curves. This is surprising given that, in most other biomedical fields, ROC curve analysis is generally considered the standard method for assessing biomarker performance. Because the ultimate goal of biomarker discovery is the translation of biomarkers to clinical practice, the metabolomics community clearly needs to start “speaking the same language” in terms of biomarker analysis and reporting, especially if it wants to see metabolite markers routinely used in the clinic. In this tutorial, we first introduce the concept of ROC curves and describe their use in single biomarker analysis for clinical chemistry. This includes the construction of ROC curves, the meaning of the area under the ROC curve (AUC) and the partial AUC, and the calculation of confidence intervals. The second part of the tutorial focuses on biomarker analyses within the context of metabolomics. This section describes different statistical and machine learning strategies that can be used to create multi-metabolite biomarker models and explains how these models can be assessed using ROC curves. In the third part of the tutorial, we discuss common issues and potential pitfalls associated with different analysis methods and provide readers with a list of nine recommendations for biomarker analysis and reporting.
To help readers test, visualize and explore the concepts presented in this tutorial, we also introduce a web-based tool called ROCCET (ROC Curve Explorer & Tester, http://www.roccet.ca). ROCCET was originally developed as a teaching aid, but it can also serve as a training and testing resource to help metabolomics researchers build biomarker models and conduct a range of common ROC curve analyses for biomarker studies.
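One common way to attach a confidence interval to an empirical AUC is the percentile bootstrap: resample cases and controls separately with replacement, recompute the AUC each time, and take the 2.5th and 97.5th percentiles. The Python sketch below illustrates this under stated assumptions: the data and function names are hypothetical, and it is not ROCCET's implementation (analytic alternatives such as the DeLong method also exist).

```python
import random


def auc(cases, controls):
    """Empirical AUC: P(case score > control score), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in cases for n in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(cases, controls, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling cases and controls separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_cases = [rng.choice(cases) for _ in cases]
        boot_controls = [rng.choice(controls) for _ in controls]
        stats.append(auc(boot_cases, boot_controls))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


cases = [5.1, 6.3, 4.8, 7.2, 6.9, 5.5, 6.1, 7.8]     # hypothetical marker levels
controls = [4.2, 5.0, 3.9, 4.7, 5.3, 4.4, 3.6, 4.9]
print(auc(cases, controls))  # 0.9375
print(bootstrap_auc_ci(cases, controls))
```

Resampling the two groups separately keeps the case/control ratio fixed, which matches how such studies are usually sampled.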

20.
In the everyday routine of an analytical lab, one is often confronted with the challenge of identifying an unknown microbial sample without prior information to set the search limits. In the present work, we propose a workflow that uses the spectral diversity of a commercial database (SARAMIS) to narrow down the search field at a certain taxonomic level, followed by a refined classification by supervised modelling. As the supervised learning algorithm, we chose a shrinkage discriminant analysis approach, which takes collinearity of the data into account and provides a scoring system for biomarker ranking. This ranking can be used to tailor specific biomarker subsets that optimize discrimination between subgroups, allowing misclassification to be weighted. The suitability of the approach was verified on a dataset containing the mass spectra of three Yersinia species, Yersinia enterocolitica, Y. pseudotuberculosis and Yersinia pestis, with particular emphasis on discriminating between the highly related species Y. pseudotuberculosis and Y. pestis. All three species were correctly identified at the genus level by the commercial database. Whereas Y. enterocolitica was correctly identified at the species level, discrimination between the highly related Y. pseudotuberculosis and Y. pestis strains was ambiguous. With the supervised modelling approach, we were able to accurately discriminate all the species, even when they were grown under different culture conditions.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417