Similar articles
 Found 20 similar articles (search took 31 ms)
1.
Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance because they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
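The effect of class imbalance on precision can be illustrated with a small calculation. This is a minimal sketch, not taken from the study: the function name `precision_at` and the class sizes are assumptions chosen for illustration. It holds sensitivity and specificity fixed (so the ROC operating point does not move) and shows how precision changes when the number of negatives grows.

```python
def precision_at(sensitivity, specificity, n_pos, n_neg):
    """Precision (PPV) implied by a fixed ROC operating point and given class sizes."""
    true_pos = sensitivity * n_pos            # positives correctly flagged
    false_pos = (1.0 - specificity) * n_neg   # negatives incorrectly flagged
    return true_pos / (true_pos + false_pos)


# Same operating point (Se = Sp = 0.9), two hypothetical class ratios.
balanced = precision_at(0.9, 0.9, n_pos=1000, n_neg=1000)
imbalanced = precision_at(0.9, 0.9, n_pos=1000, n_neg=100_000)

print(f"balanced precision:   {balanced:.3f}")
print(f"imbalanced precision: {imbalanced:.3f}")
```

Under these assumed numbers the operating point on the ROC plot is identical in both cases, yet most positive predictions in the imbalanced case are false positives, which is the kind of information a PRC plot makes visible.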

2.
MOTIVATION: An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC (receiver operating characteristic) technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability, as required, for example, in the logistic regression method; (2) it accommodates case-control designs; and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, ROC-based classification has not been used with microarray data. Moreover, the standard ROC technique does not incorporate built-in biomarker selection. RESULTS: We propose a novel method for biomarker selection and classification using the ROC technique for microarray data. The proposed method uses a sigmoid approximation to the area under the ROC curve as the objective function for classification and the threshold gradient descent regularization method for estimation and biomarker selection. Tuning parameter selection based on V-fold cross validation and predictive performance evaluation are also investigated. The proposed approach is demonstrated with a simulation study, the Colon data and the Estrogen data. The proposed approach yields parsimonious models with excellent classification performance.
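The idea of a sigmoid approximation to the AUC can be sketched as follows. This is an illustrative sketch only: the abstract does not give the exact objective, so the smoothing parameter `sigma`, the function names, and the toy scores are assumptions, and the threshold gradient descent step is not shown. The empirical AUC counts the fraction of (positive, negative) pairs ranked correctly; replacing the step function with a sigmoid makes that count differentiable.

```python
import math


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sigmoid_auc(pos_scores, neg_scores, sigma=0.1):
    """Smooth surrogate: the indicator 1[p > n] is replaced by sigmoid((p - n) / sigma).

    Assumes moderate score differences; very large negative (p - n) / sigma
    would overflow math.exp in this simple version.
    """
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 / (1.0 + math.exp(-(p - n) / sigma))
    return total / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for three cases and three controls.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]

print(empirical_auc(pos, neg))            # 8 of 9 pairs are ordered correctly
print(sigmoid_auc(pos, neg, sigma=0.01))  # approaches the empirical AUC as sigma shrinks
```

A smaller `sigma` tracks the empirical AUC more closely but gives a flatter gradient away from the decision boundary, so in practice the value would need tuning.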

3.
4.
5.
6.
7.

Background

Integrated 18F-fluorodeoxyglucose positron emission tomography/computed tomography (18F-FDG PET/CT) is widely performed in hilar and mediastinal lymph node (HMLN) staging of non-small cell lung cancer (NSCLC). However, the diagnostic efficiency of PET/CT remains controversial. This retrospective study evaluates the accuracy of PET/CT and the characteristics of false negatives and false positives, with the aim of improving specificity and sensitivity.

Methods

A total of 219 NSCLC patients who had systematic lymph node dissection or sampling underwent a preoperative PET/CT scan. Nodal uptake with a maximum standardized uptake value (SUVmax) >2.5 was interpreted as PET/CT positive. The results of PET/CT were compared with the histopathological findings. A receiver operating characteristic (ROC) curve was generated to determine the diagnostic efficiency of PET/CT. Univariate and multivariate analyses were conducted to detect risk factors for false negatives and false positives.

Results

The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of PET/CT in detecting HMLN metastases were 74.2% (49/66), 73.2% (112/153), 54.4% (49/90), 86.8% (112/129), and 73.5% (161/219), respectively. The ROC curve had an area under the curve (AUC) of 0.791 (95% CI 0.723-0.860). The incidence of false negative HMLN metastases was 13.2% (17 of 129 patients). Factors significantly associated with false negatives were concurrent lung disease or diabetes (p<0.001), non-adenocarcinoma (p<0.001), and SUVmax of the primary tumor >4.0 (p=0.009). Postoperatively, 45.5% (41/90) of patients were confirmed as false positive cases. The univariate analysis indicated age >65 years (p=0.009), well-differentiated tumor (p=0.002), and SUVmax of the primary tumor ≤4.0 (p=0.007) as risk factors for false positive uptake.
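The reported percentages follow directly from a 2×2 table. A minimal sketch, assuming the counts implied by the fractions in the Results (49 true positives, 41 false positives, 17 false negatives, 112 true negatives); the function name `diagnostic_metrics` is illustrative and not from the study:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among diseased
        "specificity": tn / (tn + fp),   # true negatives among non-diseased
        "ppv": tp / (tp + fp),           # true positives among positive calls
        "npv": tn / (tn + fn),           # true negatives among negative calls
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the fractions quoted in the Results.
metrics = diagnostic_metrics(tp=49, fp=41, fn=17, tn=112)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```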

Conclusion

The SUVmax of HMLN is a predictor of malignancy. Lymph node staging using PET/CT is far from equivalent to pathological staging on account of several risk factors. This study may aid pre-therapy evaluation and decision-making.

8.
9.
10.
11.
MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low-quality interaction datasets that contain sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, the naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of the highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives into the interactomes. Identification of both false positives and false negatives is performed in IRAP* using interaction confidence measures based on network topological metrics. Potential false positives are identified amongst the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as the undetected interactions with high computed confidence values. Our results from applying IRAP* on large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets contained potentially lower fractions of false positive and false negative errors based on functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.

12.
OBJECTIVE: To express the value of a diagnostic test under standardized and comparable conditions. STUDY DESIGN: Four new concepts of standardizing positive predictive value (SPPV), standardizing negative predictive value (SNPV), standardizing accuracy (SAc) and standardizing an incorrect diagnostic test were developed. The theoretical positive predictive value (SPPV), theoretical negative predictive value (SNPV), theoretical accuracy (SAc) and theoretical incorrect diagnosis rate (SIDR), which are not affected by a different constituent ratio of disease and nondisease groups and are obtained under the theoretical standard condition that the sample size in the disease group equals that in the nondisease group, were defined. Based on these concepts and the principles and methods of statistics and evaluation of diagnostic tests, corresponding formulas were deduced. RESULTS: The formulas are: SPPV = a(b + d)/[a(b + d) + b(a + c)] = Se/(1 + Se - Sp), SNPV = d(a + c)/[c(b + d) + d(a + c)] = Sp/(1 - Se + Sp), SAc = [a(b + d) + d(a + c)]/[2(a + c)(b + d)] = (Se + Sp)/2, and SIDR = [b(a + c) + c(b + d)]/[2(a + c)(b + d)] = (2 - Se - Sp)/2. Here, a, b, c and d refer to the case numbers of true positives, false positives, false negatives and true negatives; Se and Sp refer, respectively, to sensitivity and specificity. CONCLUSION: SPPV, SNPV, SAc and SIDR are very useful for expressing and evaluating the value of a diagnostic test under standardized and comparable conditions.
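The equivalence between the count-based and the Se/Sp-based forms of these formulas can be checked with exact arithmetic. A minimal sketch: the function name `standardized_values` and the example counts (a = 80, b = 30, c = 20, d = 170) are hypothetical and chosen only for illustration; the formulas themselves are those given in the abstract.

```python
from fractions import Fraction


def standardized_values(a, b, c, d):
    """Count-based and Se/Sp-based forms of SPPV, SNPV, SAc and SIDR.

    a, b, c, d: true positives, false positives, false negatives, true negatives.
    Returns a dict mapping each name to a (count_form, se_sp_form) pair.
    """
    a, b, c, d = (Fraction(x) for x in (a, b, c, d))
    se = a / (a + c)   # sensitivity
    sp = d / (b + d)   # specificity
    return {
        "SPPV": (a * (b + d) / (a * (b + d) + b * (a + c)), se / (1 + se - sp)),
        "SNPV": (d * (a + c) / (c * (b + d) + d * (a + c)), sp / (1 - se + sp)),
        "SAc": ((a * (b + d) + d * (a + c)) / (2 * (a + c) * (b + d)), (se + sp) / 2),
        "SIDR": ((b * (a + c) + c * (b + d)) / (2 * (a + c) * (b + d)), (2 - se - sp) / 2),
    }


# Hypothetical 2x2 table: Se = 80/100, Sp = 170/200.
values = standardized_values(a=80, b=30, c=20, d=170)
for name, (count_form, se_sp_form) in values.items():
    print(name, count_form, se_sp_form, count_form == se_sp_form)
```

Because `Fraction` arithmetic is exact, the two forms compare equal rather than merely close, which makes the identity easy to confirm for any table with non-empty disease and nondisease groups.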

13.
Large-scale presence-absence monitoring programs have great promise for many conservation applications. Their value can be limited by potential incorrect inferences owing to observational errors, especially when data are collected by the public. To combat this, previous analytical methods have focused on addressing non-detection from public survey data. Misclassification errors have received less attention but are also likely to be a common component of public surveys, as well as many other data types. We derive estimators for dynamic occupancy parameters (extinction and colonization), focusing on the case where certainty can be assumed for a subset of detections. We demonstrate how to simultaneously account for non-detection (false negatives) and misclassification (false positives) when estimating occurrence parameters for gray wolves in northern Montana from 2007–2010. Our primary data source for the analysis was observations by deer and elk hunters, reported as part of the state’s annual hunter survey. These data were supplemented with data from known locations of radio-collared wolves. We found that occupancy was relatively stable during the years of the study and wolves were largely restricted to the highest quality habitats in the study area. Transitions in the occupancy status of sites were rare, as occupied sites almost always remained occupied and unoccupied sites remained unoccupied. Failing to account for false positives led to overestimation of both the area inhabited by wolves and the frequency of turnover. The ability to properly account for both false negatives and false positives is an important step to improve inferences for conservation from large-scale public surveys. The approach we propose will improve our understanding of the status of wolf populations and is relevant to many other data types where false positives are a component of observations.

14.
Although the aldosterone/renin ratio (ARR) is the most reliable screening test for primary aldosteronism, false positives and negatives occur. Dietary salt restriction, concomitant malignant or renovascular hypertension, pregnancy and treatment with diuretics (including spironolactone), dihydropyridine calcium blockers, angiotensin converting enzyme inhibitors, and angiotensin receptor antagonists can produce false negatives by stimulating renin. We recently reported that selective serotonin reuptake inhibitors lower the ratio. Because potassium regulates aldosterone, uncorrected hypokalemia can lead to false negatives. Beta-blockers, alpha-methyldopa, clonidine, and nonsteroidal anti-inflammatory drugs suppress renin, raising the ARR with potential for false positives. False positives may occur in patients with renal dysfunction or advancing age. We recently showed that (1) females have higher ratios than males, and (2) false positive ratios can occur during the luteal menstrual phase and while taking an oral ethynylestradiol/drospirenone (but not implanted subdermal etonogestrel) contraceptive, but only if calculated using direct renin concentration and not plasma renin activity. Where feasible, diuretics should be ceased at least 6 weeks and other interfering medications at least 2 weeks before ARR measurement, substituting noninterfering agents (e.g., verapamil slow-release ± hydralazine and prazosin or doxazosin) where required. Hypokalemia should be corrected and a liberal salt diet encouraged. Collecting blood midmorning from seated patients following 2-4 h upright posture improves sensitivity. The ARR is a screening test only and should be repeated once or more before deciding whether to proceed to confirmatory suppression testing. Liquid chromatography-tandem mass spectrometry aldosterone assays represent a major advance towards addressing inaccuracies inherent in other available methods.

15.
Strain engineering has been traditionally centered on the use of mutation, selection, and screening to develop improved strains. Although mutational and screening methods are well-characterized, selection remains poorly understood. We hypothesized that we could use a genome-wide method for assessing laboratory selections to design selections with enhanced sensitivity (true positives) and specificity (true negatives) towards a single desired phenotype. To test this hypothesis, we first applied multi-SCale Analysis of Library Enrichments (SCALEs) to identify genes conferring increased fitness in continuous flow selections with increasing levels of 3-hydroxypropionic acid (3-HP). We found that this selection not only enriched for 3-HP tolerance phenotypes but also for wall adherence phenotypes (41% false positives). Using this genome-wide data, we designed a serial-batch selection with a decreasing 3-HP gradient. Further examination by ROC analysis confirmed that the serial-batch approach resulted in significantly increased sensitivity (46%) and specificity (10%) for our desired phenotype (3-HP tolerance).

16.
Molecular docking computationally screens thousands to millions of organic molecules against protein structures, looking for those with complementary fits. Many approximations are made, often resulting in low “hit rates.” A strategy to overcome these approximations is to rescore top-ranked docked molecules using a better but slower method. One such is afforded by molecular mechanics-generalized Born surface area (MM-GBSA) techniques. These more physically realistic methods have improved models for solvation and electrostatic interactions and conformational change compared to most docking programs. To investigate MM-GBSA rescoring, we re-ranked docking hit lists in three small buried sites: a hydrophobic cavity that binds apolar ligands, a slightly polar cavity that binds aryl and hydrogen-bonding ligands, and an anionic cavity that binds cationic ligands. These sites are simple; consequently, incorrect predictions can be attributed to particular errors in the method, and many likely ligands may actually be tested. In retrospective calculations, MM-GBSA techniques with binding-site minimization better distinguished the known ligands for each cavity from the known decoys compared to the docking calculation alone. This encouraged us to test rescoring prospectively on molecules that ranked poorly by docking but that ranked well when rescored by MM-GBSA. A total of 33 molecules highly ranked by MM-GBSA for the three cavities were tested experimentally. Of these, 23 were observed to bind—these are docking false negatives rescued by rescoring. The 10 remaining molecules are true negatives by docking and false positives by MM-GBSA. X-ray crystal structures were determined for 21 of these 23 molecules. In many cases, the geometry prediction by MM-GBSA improved the initial docking pose and more closely resembled the crystallographic result; yet in several cases, the rescored geometry failed to capture large conformational changes in the protein. 
Intriguingly, rescoring not only rescued docking false negatives, but also introduced several new false positives into the top-ranking molecules. We consider the origins of the successes and failures in MM-GBSA rescoring in these model cavity sites and the prospects for rescoring in biologically relevant targets.

17.
18.
Species distribution modelling has become a common approach in ecology in the last decades. As in any modelling exercise, evaluation of the predicted suitability surfaces is a key process, and the area under the receiver operating characteristic (ROC) curve (AUC) has become the most popular statistic for this purpose. A close covariation between the AUC and threshold-dependent discrimination measures (sensitivity Se and specificity Sp) calls into question the advantage of the threshold-independence of the AUC. In this study, the relationship between the AUC and several threshold-dependent discrimination measures is characterized in detail, and the sensitivity of the pattern to variations in the shape of the ROC curve is assessed. Hypothetical suitability values, coming from normal and skew-normal distributions, were simulated for both instances of presence and absence. The flexibility of the skew-normal distribution allowed for the simulation of a wide range of ROC curve configurations. The relationship between the AUC and threshold-dependent measures was graphically assessed; independently of the ROC curve shape, a nonlinear asymptotic relationship between the AUC and Se (and Sp) was obtained after applying the threshold that makes Se = Sp. A nonlinear asymptotic relationship between the AUC and the Youden index was also reported. These results imply that the AUC does not appropriately measure changes in the discrimination of models, and it is especially incapable of distinguishing between models with high discrimination capacity. Se or Sp derived from the application of the threshold that makes them equal is a preferred measure of discrimination power. Together with the rate of false positives and negatives, and with the prevalence of the species, these statistics provide more information about the discrimination capacity of the models than the AUC.
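The saturation of the AUC at high discrimination can be illustrated in closed form for the simplest case. This sketch is a deliberate simplification and not the study's simulation: it assumes equal-variance normal suitability scores (absences ~ N(0, 1), presences ~ N(d, 1)) and omits the skew-normal cases. Under that assumption the AUC equals Φ(d/√2), and the threshold that makes Se = Sp is d/2, giving Se = Sp = Φ(d/2). The function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def auc_and_equal_se_sp(d):
    """AUC and the common value of Se and Sp at the Se = Sp threshold.

    Assumes absence scores ~ N(0, 1) and presence scores ~ N(d, 1).
    """
    auc = normal_cdf(d / math.sqrt(2.0))
    se_sp = normal_cdf(d / 2.0)   # threshold d/2 is equidistant from both means
    return auc, se_sp


# Separation between the presence and absence distributions, in standard deviations.
for d in (0.5, 1.0, 2.0, 3.0, 4.0):
    auc, se_sp = auc_and_equal_se_sp(d)
    print(f"d={d:.1f}  AUC={auc:.4f}  Se=Sp={se_sp:.4f}")
```

In this simplified setting the AUC approaches 1 sooner than Se (= Sp) does, so an increase in separation between two already well-discriminating models changes the AUC less than it changes Se and Sp.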

19.
20.
Predicting the interactions between all the possible pairs of proteins in a given organism (making a protein-protein interaction map) is a crucial subject in bioinformatics. Most of the previous methods based on supervised machine learning use datasets containing approximately the same number of interacting pairs of proteins (positives) and non-interacting pairs of proteins (negatives) for training a classifier and are estimated to yield a large number of false positives. Thinking that the negatives used in previous studies cannot adequately represent all the negatives that need to be taken into account, we have developed a method based on multiple Support Vector Machines (SVMs) that uses more negatives than positives for predicting interactions between pairs of yeast proteins and pairs of human proteins. We show that the performance of a single SVM improved as we increased the number of negatives used for training and that, if more than one CPU is available, an approach using multiple SVMs is useful not only for improving the performance of classifiers but also for reducing the time required for training them. Our approach can also be applied to assessing the reliability of high-throughput interactions.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号