Related Articles
1.
Clinical trials increasingly employ medical imaging data in conjunction with supervised classifiers, where the latter require large amounts of training data to accurately model the system. Yet, a classifier selected at the start of the trial based on smaller and more accessible datasets may yield inaccurate and unstable classification performance. In this paper, we aim to address two common concerns in classifier selection for clinical trials: (1) predicting expected classifier performance for large datasets based on error rates calculated from smaller datasets and (2) the selection of appropriate classifiers based on expected performance for larger datasets. We present a framework for comparative evaluation of classifiers using only limited amounts of training data by using random repeated sampling (RRS) in conjunction with a cross-validation sampling strategy. Extrapolated error rates are subsequently validated via comparison with leave-one-out cross-validation performed on a larger dataset. The ability to predict error rates as dataset size increases is demonstrated on synthetic data as well as three different computational imaging tasks: detecting cancerous image regions in prostate histopathology, differentiating high and low grade cancer in breast histopathology, and detecting cancerous metavoxels in prostate magnetic resonance spectroscopy. For each task, the relationships between 3 distinct classifiers (k-nearest neighbor, naive Bayes, Support Vector Machine) are explored. Further quantitative evaluation in terms of interquartile range (IQR) suggests that our approach consistently yields error rates with lower variability (mean IQRs of 0.0070, 0.0127, and 0.0140) than a traditional RRS approach (mean IQRs of 0.0297, 0.0779, and 0.305) that does not employ cross-validation sampling for all three datasets.
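The core idea above can be sketched in a few lines. The following is a minimal re-implementation of the idea, not the authors' code: a nearest-centroid classifier stands in for the paper's kNN/naive Bayes/SVM, the data are synthetic Gaussians, and the IQR across repetitions plays the role of the paper's stability measure.

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_centroid_error(Xtr, ytr, Xte, yte):
    # Train a nearest-centroid classifier and return its test error rate.
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

def rrs_cv_error(X, y, n_train, n_reps=50, k=5):
    # Repeated random sampling: draw a stratified subset of size n_train,
    # then score it with internal k-fold cross-validation; the spread
    # (IQR) over repetitions measures the stability of the estimate.
    errs = []
    for _ in range(n_reps):
        idx = np.concatenate([rng.choice(np.where(y == c)[0], n_train // 2,
                                         replace=False) for c in (0, 1)])
        Xs, ys = X[idx], y[idx]
        folds = np.array_split(rng.permutation(n_train), k)
        fold_errs = []
        for f in folds:
            mask = np.ones(n_train, dtype=bool)
            mask[f] = False
            fold_errs.append(nearest_centroid_error(Xs[mask], ys[mask], Xs[f], ys[f]))
        errs.append(np.mean(fold_errs))
    errs = np.array(errs)
    q1, q3 = np.percentile(errs, [25, 75])
    return float(errs.mean()), float(q3 - q1)

# Synthetic two-class Gaussian data standing in for imaging features
n, d = 400, 5
X = np.vstack([rng.normal(0, 1, (n // 2, d)), rng.normal(1, 1, (n // 2, d))])
y = np.repeat([0, 1], n // 2)

results = {m: rrs_cv_error(X, y, m) for m in (40, 80, 160)}
for m, (err, iqr) in results.items():
    print(f"n_train={m:4d}  mean error={err:.3f}  IQR={iqr:.4f}")
```

Plotting the mean error against training-set size gives the learning curve from which large-sample performance can be extrapolated.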

2.
Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21-61%) and 29% (IQR, 15-65%), respectively]. The median reported classification performance for sensitivity and specificity was 94% and 98%, respectively, in cross-validation and 88% and 81% for independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04-5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) which cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice.

3.
Prediction error estimation: a comparison of resampling methods
MOTIVATION: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection. RESULTS: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases. SUPPLEMENTARY INFORMATION: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).
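The central pitfall such comparisons guard against is performing feature selection outside the resampling loop. A small self-contained simulation (not from the paper; the nearest-centroid classifier and mean-difference filter are illustrative choices) shows how selecting features on the full data before cross-validation produces a wildly optimistic error estimate on pure-noise data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pure-noise data: no feature is truly predictive, so the honest
# prediction error is 50%.
n, p, keep = 40, 1000, 10
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)

def top_features(Xtr, ytr, k):
    # Rank features by absolute class-mean difference; keep the top k.
    score = np.abs(Xtr[ytr == 0].mean(0) - Xtr[ytr == 1].mean(0))
    return np.argsort(score)[-k:]

def centroid_predict(Xtr, ytr, Xte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)

def cv_error(select_inside, k=5):
    folds = np.array_split(rng.permutation(n), k)
    if not select_inside:            # biased: selection sees the test folds
        cols = top_features(X, y, keep)
    errs = []
    for f in folds:
        mask = np.ones(n, dtype=bool)
        mask[f] = False
        if select_inside:            # honest: selection repeated per fold
            cols = top_features(X[mask], y[mask], keep)
        pred = centroid_predict(X[mask][:, cols], y[mask], X[f][:, cols])
        errs.append(np.mean(pred != y[f]))
    return float(np.mean(errs))

e_biased = cv_error(select_inside=False)
e_honest = cv_error(select_inside=True)
print(f"selection outside CV: {e_biased:.2f} (optimistic)")
print(f"selection inside CV:  {e_honest:.2f} (close to 0.50)")
```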

4.
MOTIVATION: Estimation of misclassification error has received increasing attention in clinical diagnosis and bioinformatics studies, especially in small sample studies with microarray data. Current error estimation methods are not satisfactory because they either have large variability (such as leave-one-out cross-validation) or large bias (such as resubstitution and leave-one-out bootstrap). Because small sample sizes remain a key feature of costly clinical investigations and of microarray studies with limited funding, time and tissue materials, accurate and easy-to-implement error estimation methods for small samples are desirable and will be beneficial. RESULTS: A bootstrap cross-validation method is studied. It achieves accurate error estimation through a simple procedure with bootstrap resampling and costs only computer CPU time. Simulation studies and applications to microarray data demonstrate that it performs consistently better than its competitors. This method possesses several attractive properties: (1) it is implemented through a simple procedure; (2) it performs well for small samples, with sample sizes as small as 16; (3) it is not restricted to any particular classification rule and thus applies to many parametric or non-parametric methods.
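A minimal sketch of the bootstrap cross-validation idea as described, assuming a simple nearest-centroid rule in place of an arbitrary classification rule; the sample size of 16 mirrors the small-sample setting, and the synthetic data are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

def centroid_error(Xtr, ytr, Xte, yte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

def bootstrap_cv(X, y, n_boot=100, k=4):
    # Bootstrap cross-validation: draw a bootstrap replicate of the data,
    # run k-fold CV inside it, and average the error over replicates.
    n = len(y)
    estimates = []
    for _ in range(n_boot):
        idx = rng.choice(n, n, replace=True)
        Xb, yb = X[idx], y[idx]
        folds = np.array_split(rng.permutation(n), k)
        errs = []
        for f in folds:
            mask = np.ones(n, dtype=bool)
            mask[f] = False
            if len(np.unique(yb[mask])) < 2:   # guard: a class absent
                continue
            errs.append(centroid_error(Xb[mask], yb[mask], Xb[f], yb[f]))
        if errs:
            estimates.append(np.mean(errs))
    return float(np.mean(estimates))

# A sample as small as n = 16, echoing the paper's setting
n, d = 16, 3
X = rng.normal(size=(n, d))
y = np.repeat([0, 1], n // 2)
X[y == 1] += 1.2

est = bootstrap_cv(X, y)
print(f"bootstrap cross-validation error estimate: {est:.3f}")
```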

5.
Background: Chikungunya, dengue, and Zika are three different arboviruses which have similar symptoms and are a major public health issue in Colombia. Despite the mandatory reporting of these arboviruses to the National Surveillance System in Colombia (SIVIGILA), it has been reported that the system captures less than 10% of diagnosed cases in some cities. Methodology/Principal findings: To assess the scope and degree of arbovirus reporting in Colombia between 2014 and 2017, we conducted an observational study of surveillance data using the capture-recapture approach in three Colombian cities. Using healthcare facility registries (capture data) and surveillance-notified cases (recapture data), we estimated the degree of reporting by clinical diagnosis. We fit robust Poisson regressions to identify predictors of reporting and estimated the predicted probability of reporting by disease and year. To account for the potential misclassification of the clinical diagnosis, we used the simulation extrapolation for misclassification (MC-SIMEX) method. A total of 266,549 registries were examined. Overall arbovirus reporting ranged from 5.3% to 14.7% and varied in magnitude according to age and year of diagnosis. Dengue was the most notified disease (21–70%) followed by Zika (6–45%). The highest reporting rate was seen in 2016, an epidemic year. The MC-SIMEX corrected rates indicated underestimation of the reporting due to the potential misclassification bias. Conclusions: These findings reflect challenges in arbovirus reporting and, therefore, potential challenges in the estimation of arboviral burden in Colombia and other endemic settings with similar surveillance systems.
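The two-source capture-recapture estimate underlying such a study can be illustrated with Chapman's variant of the Lincoln-Petersen estimator. All counts below are invented for illustration and are not the study's data:

```python
# Chapman's nearly unbiased variant of the Lincoln-Petersen estimator.
def chapman(n1, n2, m):
    """n1: cases in source 1, n2: cases in source 2, m: cases in both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts for one disease-year stratum:
clinic   = 1200   # diagnosed cases found in healthcare facility registries
sivigila = 150    # cases notified to the surveillance system
both     = 90     # cases appearing in both sources

N_hat = chapman(clinic, sivigila, both)     # estimated total diagnosed cases
reporting = sivigila / N_hat                # surveillance completeness
print(f"estimated total cases: {N_hat:.0f}")
print(f"estimated reporting completeness: {reporting:.1%}")
```

With these invented counts the completeness comes out near 7.5%, in the same ballpark as the "less than 10%" figure quoted in the abstract.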

6.
We present an approach to construct a classification rule based on the mass spectrometry data provided by the organizers of the "Classification Competition on Clinical Mass Spectrometry Proteomic Diagnosis Data." Before constructing a classification rule, we attempted to pre-process the data and to select features of the spectra that were likely due to true biological signals (i.e., peptides/proteins). As a result, we selected a set of 92 features. To construct the classification rule, we considered eight methods for selecting a subset of the features, combined with seven classification methods. The performance of the resulting 56 combinations was evaluated by using a cross-validation procedure with 1000 re-sampled data sets. The best result, as indicated by the lowest overall misclassification rate, was obtained by using the whole set of 92 features as the input for a support-vector machine (SVM) with a linear kernel. This method was therefore used to construct the classification rule. For the training data set, the total error rate for the classification rule, as estimated by using leave-one-out cross-validation, was equal to 0.16, with the sensitivity and specificity equal to 0.87 and 0.82, respectively.

7.
Is cross-validation valid for small-sample microarray classification?
MOTIVATION: Microarray classification typically possesses two striking attributes: (1) classifier design and error estimation are based on remarkably small samples and (2) cross-validation error estimation is employed in the majority of the papers. Thus, it is necessary to have a quantifiable understanding of the behavior of cross-validation in the context of very small samples. RESULTS: An extensive simulation study has been performed comparing cross-validation, resubstitution and bootstrap estimation for three popular classification rules-linear discriminant analysis, 3-nearest-neighbor and decision trees (CART)-using both synthetic and real breast-cancer patient data. Comparison is via the distribution of differences between the estimated and true errors. Various statistics for the deviation distribution have been computed: mean (for estimator bias), variance (for estimator precision), root-mean square error (for composition of bias and variance) and quartile ranges, including outlier behavior. In general, while cross-validation error estimation is much less biased than resubstitution, it displays excessive variance, which makes individual estimates unreliable for small samples. Bootstrap methods provide improved performance relative to variance, but at a high computational cost and often with increased bias (albeit, much less than with resubstitution).
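The bias-variance contrast described here is easy to reproduce. The simulation below is an illustrative sketch with a nearest-centroid rule (not the study's LDA/3NN/CART setup): it draws many small samples, computes resubstitution and leave-one-out estimates for each, and compares both with the designed classifier's error on a large independent test set.

```python
import numpy as np

rng = np.random.default_rng(7)

def centroid_predict(Xtr, ytr, Xte):
    c0, c1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
    return (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)

def one_dataset(n=20, d=2, delta=1.0):
    X = rng.normal(size=(n, d))
    y = np.repeat([0, 1], n // 2)
    X[y == 1] += delta
    # Resubstitution: test the designed classifier on its own training data.
    resub = np.mean(centroid_predict(X, y, X) != y)
    # Leave-one-out cross-validation.
    loo = 0.0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        loo += centroid_predict(X[mask], y[mask], X[i:i + 1])[0] != y[i]
    loo /= n
    # "True" error of the designed classifier on a large independent set.
    Xt = rng.normal(size=(4000, d))
    yt = np.repeat([0, 1], 2000)
    Xt[yt == 1] += delta
    true = np.mean(centroid_predict(X, y, Xt) != yt)
    return resub, loo, true

res = np.array([one_dataset() for _ in range(200)])
bias = res[:, :2].mean(axis=0) - res[:, 2].mean()
sd = res[:, :2].std(axis=0)
print(f"bias: resub={bias[0]:+.3f}  loocv={bias[1]:+.3f}")
print(f"sd:   resub={sd[0]:.3f}   loocv={sd[1]:.3f}")
```

The printed numbers show the abstract's point in miniature: resubstitution is optimistically biased, while leave-one-out is nearly unbiased but more variable.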

8.
Vaccine safety studies are often electronic health record (EHR)-based observational studies. These studies often face significant methodological challenges, including confounding and misclassification of adverse events. Vaccine safety researchers use the self-controlled case series (SCCS) study design to handle confounding and employ medical chart review to ascertain cases that are identified using EHR data. However, for common adverse events, limited resources often make it impossible to adjudicate all adverse events observed in electronic data. In this paper, we considered four approaches for analyzing SCCS data with confirmation rates estimated from an internal validation sample: (1) observed cases, (2) confirmed cases only, (3) known confirmation rate, and (4) multiple imputation (MI). We conducted a simulation study to evaluate these four approaches using type I error rates, percent bias, and empirical power. Our simulation results suggest that when misclassification of adverse events is present, approaches such as observed cases, confirmed cases only, and known confirmation rate may inflate the type I error, yield biased point estimates, and affect statistical power. The multiple imputation approach accounts for the uncertainty of confirmation rates estimated from an internal validation sample and yields a proper type I error rate, a largely unbiased point estimate, a proper variance estimate, and adequate statistical power.
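A simplified sketch of approach (4): the confirmation rate is drawn from the posterior implied by the internal validation sample, confirmed counts are imputed, and the imputations are pooled with Rubin's rules. The Poisson rate-ratio model below is a stand-in for a full SCCS likelihood, and all counts are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical unadjudicated event counts in risk vs. control windows
n_risk, n_ctrl = 60, 100
t_risk, t_ctrl = 1.0, 2.0     # person-time: control window twice as long

# Internal validation sample: 40 of 50 chart-reviewed events confirmed
v_conf, v_total = 40, 50

def log_rr_and_var(a, b):
    # Log incidence-rate ratio and its variance for two Poisson counts.
    return np.log((a / t_risk) / (b / t_ctrl)), 1.0 / a + 1.0 / b

m = 200
logs, wvars = [], []
for _ in range(m):
    # Draw a plausible confirmation rate from its Beta posterior
    # (flat prior), then impute the confirmed counts.
    p = rng.beta(v_conf + 1, v_total - v_conf + 1)
    a = rng.binomial(n_risk, p)
    b = rng.binomial(n_ctrl, p)
    if a == 0 or b == 0:
        continue
    lr, v = log_rr_and_var(a, b)
    logs.append(lr)
    wvars.append(v)

logs, wvars = np.array(logs), np.array(wvars)
# Rubin's rules: total variance = within- plus between-imputation variance
pooled_log_rr = logs.mean()
within = wvars.mean()
total_var = within + (1 + 1 / len(logs)) * logs.var(ddof=1)
print(f"pooled rate ratio: {np.exp(pooled_log_rr):.2f}, total variance: {total_var:.4f}")
```

The between-imputation term is what propagates the uncertainty in the confirmation rate, which the "known confirmation rate" approach ignores.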

9.
MOTIVATION: Ranking gene feature sets is a key issue for both phenotype classification, for instance, tumor classification in a DNA microarray experiment, and prediction in the context of genetic regulatory networks. Two broad methods are available to estimate the error (misclassification rate) of a classifier. Resubstitution fits a single classifier to the data, and applies this classifier in turn to each data observation. Cross-validation (in leave-one-out form) removes each observation in turn, constructs the classifier, and then computes whether this leave-one-out classifier correctly classifies the deleted observation. Resubstitution typically underestimates classifier error, severely so in many cases. Cross-validation has the advantage of producing an effectively unbiased error estimate, but the estimate is highly variable. In many applications it is not the misclassification rate per se that is of interest, but rather the construction of gene sets that have the potential to classify or predict. Hence, one needs to rank feature sets based on their performance. RESULTS: A model-based approach is used to compare the ranking performances of resubstitution and cross-validation for classification based on real-valued feature sets and for prediction in the context of probabilistic Boolean networks (PBNs). For classification, a Gaussian model is considered, along with classification via linear discriminant analysis and the 3-nearest-neighbor classification rule. Prediction is examined in the steady-state distribution of a PBN. Three metrics are proposed to compare feature-set ranking based on error estimation with ranking based on the true error, which is known owing to the model-based approach. In all cases, resubstitution is competitive with cross-validation relative to ranking accuracy. This is in addition to the enormous savings in computation time afforded by resubstitution.

10.
We investigate whether the use of a priori knowledge can improve medical decision making. We compare two frameworks of classification – direct and indirect classification – with respect to different classification errors: differential misclassification, observed misclassification and true misclassification. We analyze the general behavior of the classifiers in an artificial example and, motivated by the diagnosis of early glaucoma, adapt a simulation model of the optic nerve head. Indirect classifiers outperform direct classifiers in certain parameter settings of a Monte Carlo study. In summary, we demonstrate that indirect classification provides a flexible framework to improve diagnostic rules by using explicit a priori knowledge in clinical research.

11.
MOTIVATION: A major problem of pattern classification is estimation of the Bayes error when only small samples are available. One way to estimate the Bayes error is to design a classifier based on some classification rule applied to sample data, estimate the error of the designed classifier, and then use this estimate as an estimate of the Bayes error. Relative to the Bayes error, the expected error of the designed classifier is biased high, and this bias can be severe with small samples. RESULTS: This paper provides a correction for the bias by subtracting a term derived from the representation of the estimation error. It does so for Boolean classifiers, these being defined on binary features. Although the general theory applies to any Boolean classifier, a model is introduced to reduce the number of parameters. A key point is that the expected correction is conservative. Properties of the corrected estimate are studied via simulation. The correction applies to binary predictors because they are mathematically identical to Boolean classifiers. In this context the correction is adapted to the coefficient of determination, which has been used to measure nonlinear multivariate relations between genes and design genetic regulatory networks. An application using gene-expression data from a microarray experiment is provided on the website http://gspsnap.tamu.edu/smallsample/ (user: 'smallsample', password: 'smallsample').

12.

Background

We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process).

Results

With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles.
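The chunk-elimination heuristic can be sketched as follows. Absolute class-mean differences stand in for the SVM weight vector (an assumption made for brevity), and the 0.9 entropy threshold and halving schedule are illustrative choices, not the paper's calibrated ones:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy expression matrix: only the first 5 of 200 "genes" carry signal.
n, p = 60, 200
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)
X[y == 1, :5] += 1.5

def weights(Xs, ys):
    # Stand-in for |SVM weights|: absolute class-mean difference.
    return np.abs(Xs[ys == 0].mean(0) - Xs[ys == 1].mean(0))

def entropy_rfe(X, y, target=5):
    # While the weight distribution is near-uniform (high normalized
    # entropy), no gene stands out, so a whole chunk can be discarded;
    # once the entropy drops, fall back to one-at-a-time elimination.
    alive = list(range(X.shape[1]))
    while len(alive) > target:
        w = weights(X[:, alive], y)
        pmf = w / w.sum()
        H = -(pmf * np.log(pmf + 1e-12)).sum() / np.log(len(alive))
        n_drop = len(alive) // 2 if H > 0.9 else 1
        n_drop = max(1, min(n_drop, len(alive) - target))
        order = np.argsort(w)                  # weakest genes first
        alive = sorted(alive[i] for i in order[n_drop:])
    return alive

selected = entropy_rfe(X, y)
print("selected genes:", selected)
```

Because entire chunks are removed while the weights are uninformative, far fewer refits are needed than in standard one-gene-at-a-time RFE.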

Conclusions

Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.

13.
A recently proposed optimal Bayesian classification paradigm addresses optimal error rate analysis for small-sample discrimination, including optimal classifiers, optimal error estimators, and error estimation analysis tools with respect to the probability of misclassification under binary classes. Here, we address multi-class problems and optimal expected risk with respect to a given risk function, which are common settings in bioinformatics. We present Bayesian risk estimators (BRE) under arbitrary classifiers, the mean-square error (MSE) of arbitrary risk estimators under arbitrary classifiers, and optimal Bayesian risk classifiers (OBRC). We provide analytic expressions for these tools under several discrete and Gaussian models and present a new methodology to approximate the BRE and MSE when analytic expressions are not available. Of particular note, we present analytic forms for the MSE under Gaussian models with homoscedastic covariances, which are new even in binary classification.

14.
Avoiding model selection bias in small-sample genomic datasets
MOTIVATION: Genomic datasets generated by high-throughput technologies are typically characterized by a moderate number of samples and a large number of measurements per sample. As a consequence, classification models are commonly compared based on resampling techniques. This investigation discusses the conceptual difficulties involved in comparative classification studies. Conclusions derived from such studies are often optimistically biased, because the apparent differences in performance are usually not controlled in a statistically stringent framework taking into account the adopted sampling strategy. We investigate this problem by means of a comparison of various classifiers in the context of multiclass microarray data. RESULTS: Commonly used accuracy-based performance values, with or without confidence intervals, are inadequate for comparing classifiers for small-sample data. We present a statistical methodology that avoids bias in cross-validated model selection in the context of small-sample scenarios. This methodology is valid for both k-fold cross-validation and repeated random sampling.
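One standard device for statistically stringent comparisons on resampled estimates is the Nadeau-Bengio corrected resampled t-test, which inflates the variance to account for overlapping training sets; it is shown here as an illustration of the general problem, not necessarily as this paper's methodology. The accuracy differences are invented numbers:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical per-fold accuracy differences (classifier A minus B) from
# r repetitions of k-fold CV; in practice both models share the folds.
k, r = 10, 10
diffs = rng.normal(loc=0.01, scale=0.03, size=k * r)

n = diffs.size
var = diffs.var(ddof=1)

# Naive resampled t-test: treats the k*r estimates as independent,
# which they are not (training sets overlap), so it is anti-conservative.
t_naive = diffs.mean() / np.sqrt(var / n)

# Nadeau-Bengio correction: add the test/train size ratio to the 1/n
# term (each test fold holds 1/k of the data, the training set (k-1)/k).
rho = 1.0 / (k - 1)
t_corrected = diffs.mean() / np.sqrt((1.0 / n + rho) * var)

print(f"naive t = {t_naive:.2f}, corrected t = {t_corrected:.2f}")
```

The corrected statistic is always smaller in magnitude, which is exactly the guard against the optimistic bias the abstract describes.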

15.
Introgressive hybridization between domestic dogs and wolves (Canis lupus) represents an emblematic case of anthropogenic hybridization and is increasingly threatening the genomic integrity of wolf populations expanding into human-modified landscapes. But studies formally estimating prevalence and accounting for imperfect detectability and uncertainty in hybrid classification are lacking. Our goal was to present an approach to formally estimate the proportion of admixture by using a capture-recapture (CR) framework applied to individual multilocus genotypes detected from non-invasive samples collected from a protected wolf population in Italy. We scored individual multilocus genotypes using a panel of 12 microsatellites and assigned genotypes to reference wolf and dog populations through Bayesian clustering procedures. Based on 152 samples, our dataset comprised the capture histories of 39 individuals sampled in 7 wolf packs and was organized in bi-monthly sampling occasions (Aug 2015−May 2016). We fitted CR models using a multievent formulation to explicitly handle uncertainty in individual classification, and accordingly examined 2 model scenarios: one reflecting a traditional approach to classifying individuals (i.e., minimizing the misclassification of wolves as hybrids; Type 1 error), and the other using a more stringent criterion aimed to balance Type 1 and Type 2 error rates (i.e., the misclassification of hybrids as wolves). Compared to the sample proportion of admixed individuals in the dataset (43.6%), formally estimated prevalence was 50% under the first and 70% under the second scenario, with 71.4% and 85.7% of admixed packs, respectively. At the individual level, the proportion of dog ancestry in the wolf population averaged 7.8% (95% CI = 4.4−11%). 
Balancing between Type 1 and 2 error rates in assignment tests, our second scenario produced an estimate of prevalence 40% higher compared to the alternative scenario, corresponding to a 65% decrease in Type 2 and no increase in Type 1 error rates. Providing a formal and innovative estimation approach to assess prevalence in admixed wild populations, our study confirms previous population modeling indicating that reproductive barriers between wolves and dogs, or dilution of dog genes through backcrossing, should not be expected per se to prevent the spread of introgression. As anthropogenic hybridization is increasingly affecting animal species globally, our approach is of interest to a broader audience of wildlife conservationists and practitioners. © 2021 The Authors. The Journal of Wildlife Management published by Wiley Periodicals LLC on behalf of The Wildlife Society.

16.
Small sample issues for microarray-based classification
In order to study the molecular biological differences between normal and diseased tissues, it is desirable to perform classification among diseases and stages of disease using microarray-based gene-expression values. Owing to the limited number of microarrays typically used in these studies, serious issues arise with respect to the design, performance and analysis of classifiers based on microarray data. This paper reviews some fundamental issues facing small-sample classification: classification rules, constrained classifiers, error estimation and feature selection. It discusses both unconstrained and constrained classifier design from sample data, and the contributions to classifier error from constrained optimization and lack of optimality owing to design from sample data. The difficulty with estimating classifier error when confined to small samples is addressed, particularly estimating the error from training data. The impact of small samples on the ability to include more than a few variables as classifier features is explained.

17.
MOTIVATION: Machine learning methods such as neural networks, support vector machines, and other classification and regression methods rely on iterative optimization of the model quality in the space of the parameters of the method. Model quality measures (accuracies, correlations, etc.) are frequently overly optimistic because the training sets are dominated by particular families and subfamilies. To overcome the bias, the dataset is usually reduced by filtering out closely related objects. However, such filtering uses fixed similarity thresholds and ignores a part of the training information. RESULTS: We suggested a novel approach to calculate prediction model quality based on assigning to each data point inverse density weights derived from the postulated distance metric. We demonstrated that our new weighted measures estimate the model generalization better and are consistent with the machine learning theory. The Vapnik-Chervonenkis theorem was reformulated and applied to derive the space-uniform error estimates. Two examples were used to illustrate the advantages of the inverse density weighting. First, we demonstrated on a set with a built-in bias that the unweighted cross-validation procedure leads to an overly optimistic quality estimate, while the density-weighted quality estimates are more realistic. Second, an analytical equation for weighted quality estimates was used to derive an SVM model for signal peptide prediction using a full set of known signal peptides, instead of the usual filtered subset.
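The inverse-density weighting idea can be illustrated on a toy set where a tight family of near-duplicate points would otherwise dominate an unweighted quality estimate; the radius-based density count below is a simplification of the postulated distance metric, and the per-point losses are invented:

```python
import numpy as np

# Toy set: a tight "family" of 9 near-duplicate points plus 3 isolated ones.
cluster = 0.01 * np.arange(9)[:, None] * np.ones((1, 2))
outliers = np.array([[5.0, 5.0], [-5.0, 3.0], [4.0, -6.0]])
X = np.vstack([cluster, outliers])

def inverse_density_weights(X, radius=1.0):
    # Weight each point by 1 / (number of points within `radius`,
    # itself included), so redundant family members share one "vote".
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return 1.0 / (d <= radius).sum(axis=1)

w = inverse_density_weights(X)

# Hypothetical per-point 0/1 losses: the model nails the big family
# but misses every isolated point.
losses = np.array([0] * 9 + [1] * 3, dtype=float)
print("unweighted error:", losses.mean())                    # 0.25
print("weighted error:  ", np.average(losses, weights=w))    # 0.75
```

Unweighted, the redundant family makes the model look good; weighted, the nine near-duplicates count as a single effective point and the estimate reflects generalization to the sparse regions.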

18.
Gustafson P, Le Nhu D. Biometrics 2002;58(4):878-887.
It is well known that imprecision in the measurement of predictor variables typically leads to bias in estimated regression coefficients. We compare the bias induced by measurement error in a continuous predictor with that induced by misclassification of a binary predictor in the contexts of linear and logistic regression. To make the comparison fair, we consider misclassification probabilities for a binary predictor that correspond to dichotomizing an imprecise continuous predictor in lieu of its precise counterpart. On this basis, nondifferential binary misclassification is seen to yield more bias than nondifferential continuous measurement error. However, it is known that differential misclassification results if a binary predictor is actually formed by dichotomizing a continuous predictor subject to nondifferential measurement error. When the postulated model linking the response and precise continuous predictor is correct, this differential misclassification is found to yield less bias than continuous measurement error, in contrast with nondifferential misclassification, i.e., dichotomization reduces the bias due to mismeasurement. This finding, however, is sensitive to the form of the underlying relationship between the response and the continuous predictor. In particular, we give a scenario where dichotomization involves a trade-off between model fit and misclassification bias. We also examine how the bias depends on the choice of threshold in the dichotomization process and on the correlation between the imprecise predictor and a second precise predictor.  相似文献
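The headline comparison can be reproduced with a short Monte Carlo check under the classical additive-error model: with reliability 0.5, the continuous error-prone predictor attenuates the slope to about 50% of its true value, while dichotomizing at the median retains about 71% (the correlation between the true and error-prone predictor). The linear-model setup here is an illustration of the phenomenon, not the paper's full analysis:

```python
import numpy as np

rng = np.random.default_rng(6)

n = 200_000
x = rng.normal(size=n)               # precise continuous predictor
w = x + rng.normal(size=n)           # imprecise version, reliability 0.5
y = 1.0 * x + rng.normal(size=n)     # linear outcome model, true slope 1

def slope(pred, resp):
    # Least-squares slope of resp on a single predictor.
    return np.cov(pred, resp)[0, 1] / np.var(pred, ddof=1)

b_true      = slope(x, y)                       # about 1.0
b_noisy     = slope(w, y)                       # attenuated, about 0.5
b_bin_true  = slope((x > 0).astype(float), y)   # precise, dichotomized
b_bin_noisy = slope((w > 0).astype(float), y)   # misclassified binary

relative_bias_cont = b_noisy / b_true           # about 0.50
relative_bias_bin  = b_bin_noisy / b_bin_true   # about 0.71: less bias
print(f"continuous: {relative_bias_cont:.3f}  dichotomized: {relative_bias_bin:.3f}")
```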

19.
Class II human leukocyte antigens (HLA II) are proteins involved in the human immunological adaptive response by binding and exposing some pre-processed, non-self peptides in the extracellular domain in order to make them recognizable by the CD4+ T lymphocytes. However, the understanding of HLA–peptide binding interaction is a crucial step for designing a peptide-based vaccine because the high rate of polymorphisms in HLA class II molecules creates a big challenge, even though the HLA II proteins can be grouped into supertypes, where members of the same supertype bind a similar pool of peptides. Hence, first we performed the supertype classification of 27 HLA II proteins using their binding affinities and structural-based linear motifs to create a stable group of supertypes. For this purpose, a well-known clustering method was used, and then a consensus was built to find the stable groups and to show the functional and structural correlation of HLA II proteins. Thus, the overlap of the binding events was measured, confirming a large promiscuity within the HLA II–peptide interactions. Moreover, a very low rate of locus-specific binding events was observed for the HLA-DP genetic locus, suggesting a different binding selectivity of these proteins with respect to HLA-DR and HLA-DQ proteins. Secondly, a predictor based on a support vector machine (SVM) classifier was designed to recognize HLA II-binding peptides. The efficiency of prediction was estimated using precision, recall (sensitivity), specificity, accuracy, F-measure, and area under the ROC curve values of a random subsampled dataset in comparison with other supervised classifiers. Leave-one-out cross-validation was also performed to establish the efficiency of the predictor. The HLA II–peptide interaction dataset, HLA II-binding motifs, high-quality amino acid indices, peptide dataset for SVM training, and MATLAB code of the predictor are available at http://sysbio.icm.edu.pl/HLA.
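The evaluation metrics listed can all be computed from a confusion matrix plus a rank-based AUC; the sketch below uses a tiny invented score vector, with AUC obtained from the Mann-Whitney statistic:

```python
import numpy as np

def classification_metrics(y_true, y_score, thr=0.5):
    # Standard evaluation metrics for a binary classifier's scores.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pred = (y_score >= thr).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    tn = int(np.sum((pred == 0) & (y_true == 0)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)             # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall)
    # AUC via the Mann-Whitney statistic: the probability that a random
    # positive outscores a random negative (ties count one half).
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    auc = wins / (len(pos) * len(neg))
    return dict(precision=precision, recall=recall, specificity=specificity,
                accuracy=accuracy, f1=f1, auc=auc)

# Invented scores for four peptides (two binders, two non-binders)
m = classification_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(m)
```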

20.
Phenice (Am J Phys Anthropol 30 (1969):297–301) reported a success rate of 96% for his method of sex determination based on three morphological features of the pelvis. Numerous studies have tested and evaluated the method with affirmative results. The results of the study by MacLaughlin and Bruce (J Forensic Sci 35 (1990):1384–1392) were inconsistent with other studies, reporting far lower rates of accuracy and a greater degree of interobserver error. The authors believe that this may be the result of the inclusion of an “ambiguous” classification category. Revised modelling using forced classification of sex provides much higher classification rates with the implication that the poor results reported by MacLaughlin and Bruce were due to methodological error for the most part. Am J Phys Anthropol 159:182–183, 2016. © 2015 Wiley Periodicals, Inc.

