Similar articles
 Found 14 similar articles; search time 0 ms
1.
2.
  1. The receiver operating characteristic (ROC) and precision–recall (PR) plots have been widely used to evaluate the performance of species distribution models. Plotting the ROC/PR curves requires a traditional test set with both presence and absence data (the PA approach), but species absence data are usually not available in practice. Plotting the ROC/PR curves from presence‐only data while treating background data as pseudo‐absence data (the PO approach) may give misleading results.
  2. In this study, we propose a new approach, the PB approach, to calibrate the ROC/PR curves from presence and background data with user‐provided information on a constant c. Here, c is the probability that a species occurrence is detected (labeled), and an estimate of c can also be derived from the PB‐based ROC/PR plots, provided that a model with good discriminative ability is available. We used five virtual species and a real aerial photograph to test the effectiveness of the proposed PB‐based ROC/PR plots. Different models (or classifiers) were trained from presence and background data with various sample sizes. The ROC/PR curves plotted by the PA approach were used to benchmark the curves plotted by the PO and PB approaches.
  3. Experimental results show that the curves and the areas under the curves obtained with the PB approach are more similar to those obtained with the PA approach than the PO results are. The PB‐based ROC/PR plots also provide highly accurate estimates of c in our experiments.
  4. We conclude that the proposed PB‐based ROC/PR plots can provide a valuable complement to existing model assessment methods, and they also provide an additional way to estimate the constant c (or species prevalence) from presence and background data.
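The following is a minimal sketch of why the PO approach can mislead; it is not the PB calibration proposed in the abstract. It computes the AUC as the probability that a presence outscores a comparison point, first against true absences (PA) and then against background points that contain unlabeled presences (PO). All scores are hypothetical values chosen for illustration.

```python
def auc(pos_scores, ref_scores):
    """Probability that a random positive outscores a random reference point.

    Ties count as 0.5. This pairwise form equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for r in ref_scores:
            if p > r:
                wins += 1.0
            elif p == r:
                wins += 0.5
    return wins / (len(pos_scores) * len(ref_scores))


# Hypothetical model scores (assumed, not taken from the study).
presence = [0.9, 0.8, 0.7, 0.6]
absence = [0.5, 0.4, 0.3, 0.2]
# Background = true absences plus two unlabeled presences (0.85 and 0.75).
background = [0.85, 0.75, 0.5, 0.4, 0.3, 0.2]

auc_pa = auc(presence, absence)      # presence vs. true absence
auc_po = auc(presence, background)   # presence vs. background as pseudo-absence
print(auc_pa, auc_po)
```

In this toy case the model separates presences from absences perfectly, yet the PO-style AUC is lower because the unlabeled presences in the background are counted as absences.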

3.
Area under the free-response ROC curve (FROC) and a related summary index   Total citations: 1 (self-citations: 0, citations by others: 1)
Bandos AI, Rockette HE, Song T, Gur D. Biometrics, 2009, 65(1): 247-256
Summary. Free-response assessment of diagnostic systems continues to gain acceptance in areas related to the detection, localization, and classification of one or more "abnormalities" within a subject. A free-response receiver operating characteristic (FROC) curve is a tool for characterizing the performance of a free-response system at all decision thresholds simultaneously. Although the importance of a single index summarizing the entire curve over all decision thresholds is well recognized in ROC analysis (e.g., area under the ROC curve), currently there is no widely accepted summary of a system being evaluated under the FROC paradigm. In this article, we propose a new index of the free-response performance at all decision thresholds simultaneously, and develop a nonparametric method for its analysis. Algebraically, the proposed summary index is the area under the empirical FROC curve penalized for the number of erroneous marks, rewarded for the fraction of detected abnormalities, and adjusted for the effect of the target size (or "acceptance radius"). Geometrically, the proposed index can be interpreted as a measure of average performance superiority over an artificial "guessing" free-response process and it represents an analogy to the area between the ROC curve and the "guessing" or diagonal line. We derive the ideal bootstrap estimator of the variance, which can be used for a resampling-free construction of asymptotic bootstrap confidence intervals and for sample size estimation using standard expressions. The proposed procedure is free from any parametric assumptions and does not require an assumption of independence of observations within a subject. We provide an example with a dataset sampled from a diagnostic imaging study and conduct simulations that demonstrate the appropriateness of the developed procedure for the considered sample sizes and ranges of parameters.

4.
In lightning-induced fire risk prediction models, the number of potential predictors is usually high, with some redundancy among them. It is therefore important to select the subset of predictors that yields the model with the greatest discrimination capacity. With this aim in mind, the logistic generalized linear model was used to estimate lightning-induced fire occurrence using a case study of the province of León (northwest Spain). A bootstrap-based test was used to determine the optimal number of predictors and to fit the model with this number of predictors that has the largest area under the receiver operating characteristic curve. The results show that of the 16 variables initially considered, only three were necessary to obtain the model with the best discriminatory capacity for estimating lightning-induced fire occurrence. Moreover, this model can be considered equivalent to another nine alternative models with three covariates. Both the optimal and the equivalent models are useful in the spatially explicit assessment of fire risk, the planning and coordination of regional efforts to identify areas at greatest risk, and the design of long-term wildfire management strategies. The methodology used for this case study can be applied to other wildfire risk assessment situations where multiple and interconnected covariates are available.

5.
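[Objective] Ecological niche models are widely applied in biogeography, invasion biology, and conservation biology, and are increasingly used in studies that predict the potential and realized distributions of species. Using the fall webworm (Hyphantria cunea) as an example, this paper introduces the application of the pROC approach to niche model evaluation and the points that require attention, with the aim of evaluating predictions of species' potential distributions appropriately and promoting the sound use and development of niche models in China. [Methods] We introduce the basic principles of the ROC curve and the AUC value and summarize their use in niche model evaluation. Starting from the reliability of species presence and absence records, we analyze the strengths and weaknesses of the AUC value for model evaluation, and finally introduce the partial area under the receiver operating characteristic curve approach (the pROC approach) to compensate for the shortcomings of the traditional AUC value. [Results] Although the AUC value is independent of the threshold, it combines sensitivity and specificity and therefore masks the individual behavior of these two measures. It cannot assess the sensitivity and the specificity of a prediction separately, nor can it weigh the omission rate against the commission error rate, so it can mislead users when they evaluate a model. Compared with the AUC value, the shape of the ROC curve is more valuable and carries rich information for model evaluation. [Conclusion] Model evaluation needs to treat sensitivity and specificity separately. The shape of the ROC curve is more important than the AUC value in niche model evaluation. The pROC approach has advantages over the traditional AUC value, but it is prone to misjudging overfitted models. Model evaluation is closely tied to the purpose of the study: when the purpose is to predict the potential distribution of a species (e.g., the potential distribution of invasive species, the effect of climate change on species distributions, and phylogeography), evaluation should give more weight to sensitivity (or the omission rate); when the purpose is to predict the realized distribution of a species (e.g., delimiting protected areas and introducing endangered species), evaluation should give equal weight to sensitivity and specificity.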
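As a rough illustration of the idea of a partial area under the ROC curve, the sketch below integrates the empirical ROC curve only over a chosen range of false-positive rates. This is a generic partial AUC under assumed, hypothetical scores; the pROC procedure discussed in the abstract may define its axes and its region of interest differently.

```python
def roc_points(pos_scores, neg_scores):
    """Empirical ROC points (FPR, TPR), from (0, 0) to (1, 1)."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points


def partial_auc(points, fpr_max):
    """Trapezoidal area under the ROC curve for FPR in [0, fpr_max]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= fpr_max:
            break
        if x1 > fpr_max:
            # Clip the segment at fpr_max by linear interpolation.
            y1 = y0 + (y1 - y0) * (fpr_max - x0) / (x1 - x0)
            x1 = fpr_max
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for presence and absence records (assumed values).
presence = [0.9, 0.8, 0.7, 0.6]
absence = [0.75, 0.5, 0.4, 0.3]

points = roc_points(presence, absence)
full_area = partial_auc(points, 1.0)   # ordinary AUC
half_area = partial_auc(points, 0.5)   # area restricted to FPR <= 0.5
print(full_area, half_area)
```

Restricting the integration range is one way to make the summary reflect only the part of the curve that matters for a given study purpose, rather than averaging over all thresholds.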

6.
Aim: The area under the receiver operating characteristic (ROC) curve (AUC) is a widely used statistic for assessing the discriminatory capacity of species distribution models. Here, I used simulated data to examine the interdependence of the AUC and classical discrimination measures (sensitivity and specificity) derived for the application of a threshold. I shall further exemplify with simulated data the implications of using the AUC to evaluate potential versus realized distribution models. Innovation: After applying the threshold that makes sensitivity and specificity equal, a strong relationship between the AUC and these two measures was found. This result is corroborated with real data. On the other hand, the AUC penalizes the models that estimate potential distributions (the regions where the species could survive and reproduce due to the existence of suitable environmental conditions), and favours those that estimate realized distributions (the regions where the species actually lives). Main conclusions: Firstly, the independence of the AUC from the threshold selection may be irrelevant in practice. This result also emphasizes the fact that the AUC assumes nothing about the relative costs of errors of omission and commission. However, in most real situations this premise may not be optimal. Measures derived from a contingency table for different cost ratio scenarios, together with the ROC curve, may be more informative than reporting just a single AUC value. Secondly, the AUC is only truly informative when there are true instances of absence available and the objective is the estimation of the realized distribution. When the potential distribution is the goal of the research, the AUC is not an appropriate performance measure because the weight of commission errors is much lower than that of omission errors.
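The threshold "that makes sensitivity and specificity equal" can be located with a simple scan, sketched here under assumed data. The function names and the scores are hypothetical; with finite samples exact equality may not exist, so the sketch picks the candidate threshold with the smallest gap between the two measures.

```python
def sens_spec(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are predicted present."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


def balanced_threshold(pos_scores, neg_scores):
    """Observed score at which |sensitivity - specificity| is smallest."""
    candidates = sorted(set(pos_scores) | set(neg_scores))

    def gap(t):
        sens, spec = sens_spec(pos_scores, neg_scores, t)
        return abs(sens - spec)

    return min(candidates, key=gap)


# Hypothetical model scores (assumed values).
presence = [0.9, 0.8, 0.7, 0.4]
absence = [0.6, 0.3, 0.2, 0.1]

t = balanced_threshold(presence, absence)
sens, spec = sens_spec(presence, absence, t)
print(t, sens, spec)
```

Reporting sensitivity and specificity at an explicit threshold, alongside the AUC, makes the relative weight given to omission and commission errors visible.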

7.
In this work, we extend our previous ligand shape-based virtual screening approach by using the Hamza–Wei–Zhan (HWZ) scoring function and an enhanced molecular shape-density model for the ligands. The performance of the method has been tested against the 40 targets in the Database of Useful Decoys and compared with the performance of our previous HWZ score method. The virtual screening results using the novel ligand shape-based approach demonstrated a favorable improvement (area under the receiver operator characteristics curve AUC = 0.89 ± 0.02) and effectiveness (hit rate HR1% = 53.0% ± 6.3 and HR10% = 71.1% ± 4.9). The comparison of the overall performance of our ligand shape-based method with the best-performing ligand shape-based virtual screening approach, which uses data fusion of multiple queries, showed that our strategy takes deeper account of the chemical information in the set of active ligands. Furthermore, the results indicated that our method is suitable for virtual screening and yields higher prediction accuracy than the other study, which was derived from data fusion using five queries. Therefore, our novel ligand shape-based screening method constitutes a robust and efficient approach to the 3D similarity screening of small compounds and opens the door to a whole new approach to drug design by implementing the method in structure-based virtual screening.
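The abstract reports hit rates at the top 1% and 10% of the ranked database without defining them. The sketch below assumes one common reading, the share of all active compounds recovered within the top fraction of the ranking; that definition, the function name, and the toy data are assumptions rather than the authors' procedure.

```python
import math


def hit_rate(scores, is_active, top_fraction):
    """Share of all actives found in the top `top_fraction` of the ranking.

    Assumed definition for illustration; higher scores rank first.
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    found = sum(active for _, active in ranked[:n_top])
    return found / sum(is_active)


# Hypothetical screening result: 10 compounds, 4 of them active.
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
is_active = [True, True, False, True, False, False, False, False, True, False]

hr_top_half = hit_rate(scores, is_active, 0.5)
hr_all = hit_rate(scores, is_active, 1.0)
print(hr_top_half, hr_all)
```

Unlike the AUC, which averages over the whole ranking, a hit rate at a small top fraction measures only early recognition, which is why the two are usually reported together.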

8.
9.
We developed an accurate method to predict nucleosome positioning from genome sequences by refining the previously developed method of Peckham et al. (2007) [19]. Here, we used the relative fragment frequency index we developed and a support vector machine to screen for nucleosomal and linker DNA sequences. Our twofold cross-validation revealed that the accuracy of our method, measured as the area under the receiver operating characteristic curve, was 81%, whereas that of Peckham’s method was 75%, when both nucleosomal sequence data sets obtained from independent experiments were used for validation. We suggest that our method is more effective in predicting nucleosome positioning.

10.
11.
The spectra of k-mer frequencies can reveal the structures and evolution of genome sequences. We confirmed that the trimodal spectrum of 8-mers in human genome sequences is distinguished only by the CG2, CG1 and CG0 8-mer sets, which contain 2, 1 or 0 CpG dinucleotides, respectively. This phenomenon is called the independent selection law. The three types of CG 8-mers were considered as different functional elements. We conjectured that (1) nucleosome binding motifs are mainly characterized by CG1 8-mers and (2) the core structural units of CpG island (CGI) sequences are predominantly characterized by CG2 8-mers. To validate our conjectures, nucleosome-occupied sequences and CGI sequences were extracted, and sequence parameters were then constructed from the information in each of the three CG 8-mer sets. ROC analysis showed that CG1 8-mers are preferentially found in nucleosome-occupied segments (AUC > 0.7) and CG2 8-mers are preferentially found in CGI sequences (AUC > 0.99). This validates our conjectures in principle.
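The grouping of 8-mers by CpG content can be sketched as follows. This is only an illustration of the classification step, with an assumed toy sequence; the function names are hypothetical, and 8-mers with more than two CpGs, which the abstract does not mention, are placed in an assumed extra bucket "CG3+".

```python
from collections import Counter


def cpg_count(kmer):
    """Number of CG dinucleotides in a k-mer ('CG' cannot overlap itself)."""
    return kmer.upper().count("CG")


def cg_class_spectrum(sequence, k=8):
    """Count the k-mers of a sequence by CpG class: CG0, CG1, CG2, or CG3+."""
    counts = Counter()
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers with ambiguous bases such as N
        n = cpg_count(kmer)
        counts["CG%d" % n if n <= 2 else "CG3+"] += 1
    return counts


# Hypothetical 18-base sequence, giving 11 overlapping 8-mers.
spectrum = cg_class_spectrum("ACGTACGTAAAAAAAATT")
print(dict(spectrum))
```

Counting each class separately over a genome, or over nucleosome-occupied and CGI segments, would give the per-class frequencies from which sequence parameters like those in the abstract could be built.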

12.
To develop a non-invasive and sensitive diagnostic test for cancer using peripheral blood, we evaluated gene expression profiling of blood obtained from patients with cancer of the digestive system and normal subjects. The expression profiles of blood-derived total RNA obtained from 39 cancer patients (11 colon cancer, 14 gastric cancer, and 14 pancreatic cancer) were clearly different from those obtained from 15 normal subjects. By comparing the gene expression profiles of cancer patients and normal subjects, 25 cancer-differentiating genes (p < 5.0 × 10⁻⁶ and fold differences > 3) were identified, and an “expression index” deduced from the expression values of these genes classified the validation cohort (11 colon cancer, 8 gastric cancer, 18 pancreatic cancer, and 15 normal subjects) into cancer patients and normal subjects with 100% (37/37) and 87% (13/15) accuracy, respectively. Although the expression profiles were not clearly different among the cancer patients, some characteristic genes were identified according to the stage and type of the cancer. Interestingly, many immune-related genes such as antigen presenting, cell cycle accelerating, and apoptosis- and stress-inducing genes were up-regulated in cancer patients, reflecting the active turnover of immune regulatory cells in cancer patients. These results showed the potential relevance of peripheral blood gene expression profiling for the development of new diagnostic examination tools for cancer patients.

13.
Batch effects are technical sources of variation and can confound analysis. While many performance ranking exercises have been conducted to establish the best batch effect-correction algorithm (BECA), we hold the viewpoint that the notion of best is context-dependent. Moreover, alternative questions beyond the simplistic notion of “best” are also interesting: are BECAs robust against various degrees of confounding and if so, what is the limit? Using two different methods for simulating class (phenotype) and batch effects and taking various representative datasets across both genomics (RNA-Seq) and proteomics platforms, we demonstrate that under situations where sample classes and batch factors are moderately confounded, most BECAs are remarkably robust and only weakly affected by upstream normalization procedures. This observation is consistently supported across the multitude of test datasets. BECAs do have limits: When sample classes and batch factors are strongly confounded, BECA performance declines, with variable performance in precision, recall and also batch correction. We also report that while conventional normalization methods have minimal impact on batch effect correction, they do not affect downstream statistical feature selection, and in strongly confounded scenarios, may even outperform BECAs. In other words, removing batch effects is no guarantee of optimal functional analysis. Overall, this study suggests that simplistic performance ranking exercises are quite trivial, and all BECAs are compromises in some context or another.

14.
Background: Intravoxel incoherent motion (IVIM) plays an important role in predicting treatment responses in patients with nasopharyngeal carcinoma (NPC). The goal of this study was to develop and validate a radiomics nomogram based on IVIM parametric maps and clinical data for the prediction of treatment responses in NPC patients. Methods: Eighty patients with biopsy-proven NPC were enrolled in this study. Sixty-two patients had complete responses and 18 patients had incomplete responses to treatment. Each patient received a multiple b-value diffusion-weighted imaging (DWI) examination before treatment. Radiomics features were extracted from IVIM parametric maps derived from the DWI images. Feature selection was performed by the least absolute shrinkage and selection operator method. The radiomics signature was generated by a support vector machine based on the selected features. Receiver operating characteristic (ROC) curves and area under the ROC curve (AUC) values were used to evaluate the diagnostic performance of the radiomics signature. A radiomics nomogram was established by integrating the radiomics signature and clinical data. Results: The radiomics signature showed good prognostic performance in predicting treatment response in both the training (AUC = 0.906, P < 0.001) and testing (AUC = 0.850, P < 0.001) cohorts. The radiomics nomogram established by integrating the radiomics signature with clinical data significantly outperformed clinical data alone (C-index, 0.929 vs 0.724; P < 0.0001). Conclusions: The IVIM-based radiomics nomogram showed high prognostic ability for treatment responses in patients with NPC. The IVIM-based radiomics signature has the potential to be a new biomarker for predicting treatment responses and may affect treatment strategies in patients with NPC.

