Similar literature
20 similar articles found (search time: 44 ms)
1.
We compared the performance of several prediction techniques for breast cancer prognosis, based on AU-ROC (area under the ROC curve) for different prognosis periods. The analyzed dataset contained 1,981 patients, and from an initial 25 variables, the 11 most common clinical predictors were retained. We compared eight models from a wide spectrum of predictive models, namely Generalized Linear Model (GLM), GLM-Net, Partial Least Squares (PLS), Support Vector Machines (SVM), Random Forests (RF), Neural Networks, k-Nearest Neighbors (k-NN) and Boosted Trees. To compare these models, a paired t-test was applied to the model performance differences obtained from data resampling. Random Forests, Boosted Trees, Partial Least Squares and GLM-Net have superior overall performance; however, they are only slightly better than the other models. The comparative analysis also allowed us to define a relative variable importance as the average of the variable importance from the different models. Two sets of variables are identified from this analysis. The first includes number of positive lymph nodes, tumor size, cancer grade and estrogen receptor, all of which have an important influence on model predictability. The second set includes variables related to histological parameters and treatment types. The short-term vs long-term contributions of the clinical variables are also analyzed from the comparative models. Of the various cancer treatment plans, the combination of chemo/radiotherapy has the largest impact on cancer prognosis.
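
The paired comparison described in this abstract can be illustrated with a small sketch. It assumes that each model is scored on the same resamples, so the per-resample AUC values are paired; the function name and the AUC values below are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-resample scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired, one per resample")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two resamples")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical AUC values from five resamples (illustrative only).
auc_rf = [0.74, 0.76, 0.73, 0.77, 0.75]
auc_knn = [0.71, 0.74, 0.72, 0.73, 0.72]
t, df = paired_t_statistic(auc_rf, auc_knn)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The t statistic would then be compared against a Student t distribution with `df` degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch free of third-party dependencies.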

2.
3.

Motivation

Ischemic stroke, triggered by an obstruction in the cerebral blood supply, leads to infarction of the affected brain tissue. An accurate and reproducible automatic segmentation is of high interest, since the lesion volume is an important end-point for clinical trials. However, various factors, such as the high variance in lesion shape, location and appearance, render it a difficult task.

Methods

In this article, nine classification methods (e.g. Generalized Linear Models, Random Decision Forests and Convolutional Neural Networks) are evaluated and compared with each other using 37 multiparametric MRI datasets of ischemic stroke patients in the sub-acute phase in terms of their accuracy and reliability for ischemic stroke lesion segmentation. Within this context, a multi-spectral classification approach is compared against mono-spectral classification performance using only FLAIR MRI datasets and two sets of expert segmentations are used for inter-observer agreement evaluation.

Results and Conclusion

The results of this study reveal that high-level machine learning methods lead to significantly better segmentation results compared to the rather simple classification methods, pointing towards a difficult non-linear problem. The overall best segmentation results were achieved by a Random Decision Forest and a Convolutional Neural Network classification approach, even outperforming all previously published results. However, none of the methods tested in this work is capable of achieving results in the range of the human observer agreement, and automatic ischemic stroke lesion segmentation remains a complicated problem that needs to be explored in more detail to improve the segmentation results.

4.

Background  

Data generated using 'omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of subjects in the study. In this paper, we consider issues relevant in the design of biomedical studies in which the goal is the discovery of a subset of features and an associated algorithm that can predict a binary outcome, such as disease status. We compare the performance of four commonly used classifiers (K-Nearest Neighbors, Prediction Analysis for Microarrays, Random Forests and Support Vector Machines) in high-dimensionality data settings. We evaluate the effects of varying levels of signal-to-noise ratio in the dataset, imbalance in class distribution and choice of metric for quantifying performance of the classifier. To guide study design, we present a summary of the key characteristics of 'omics' data profiled in several human or animal model experiments utilizing high-content mass spectrometry and multiplexed immunoassay based techniques.

5.
Skin is the largest organ and the outer enclosure of the integumentary system, which protects the human body from pathogens. Among the various cancers in the world, skin cancer is one of the most commonly diagnosed and can be either melanoma or non-melanoma. Melanoma is far more often fatal than non-melanoma skin cancer, but the chances of survival are high when it is diagnosed and treated early. The main aim of this work is to analyze and investigate the performance of the Non-Subsampled Bendlet Transform (NSBT) with various classifiers for detecting melanoma from dermoscopic images. NSBT is a multiscale and multidirectional transform based on a second-order shearlet system, which classifies curvature more precisely than other directional representation systems. Here, two-phase classification is employed using k-Nearest Neighbour (kNN), Naive Bayes (NB), Decision Trees (DT) and Support Vector Machines (SVM). The first phase classifies the images of the PH2 database into normal and abnormal images, and the second phase classifies the abnormal images into benign and malignant. Experimental results show an improvement in classification accuracy, sensitivity and specificity compared with state-of-the-art methods.

6.
One important application of gene expression analysis is to classify tissue samples according to their gene expression levels. Gene expression data are typically characterized by high dimensionality and small sample size, which makes the classification task quite challenging. In this paper, we present a data-dependent kernel for microarray data classification. This kernel function is engineered so that the class separability of the training data is maximized. A bootstrapping-based resampling scheme is introduced to reduce the possible training bias. The effectiveness of this adaptive kernel for microarray data classification is illustrated with a k-Nearest Neighbor (KNN) classifier. Our experimental study shows that the data-dependent kernel leads to a significant improvement in the accuracy of KNN classifiers. Furthermore, this kernel-based KNN scheme has been demonstrated to be competitive with, if not better than, more sophisticated classifiers such as Support Vector Machines (SVMs) and the Uncorrelated Linear Discriminant Analysis (ULDA) for classifying gene expression data.

7.
《Genomics》2020,112(5):3089-3096
Automatic classification of glaucoma from fundus images is a vital diagnostic tool for Computer-Aided Diagnosis (CAD) systems. In this work, a novel fused feature extraction technique and ensemble classifier fusion are proposed for the diagnosis of glaucoma. The proposed method comprises three stages. Initially, the fundus images are subjected to preprocessing, followed by feature extraction and feature fusion by Intra-Class and Extra-Class Discriminative Correlation Analysis (IEDCA). The feature fusion approach eliminates between-class correlation while retaining sufficient Feature Dimension (FD) for Correlation Analysis (CA). The fused features are then fed individually to the classifiers, namely Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN), for classification. Finally, classifier fusion is designed, which combines the decisions of the ensemble of classifiers based on the Consensus-based Combining Method (CCM). CCM-based classifier fusion adjusts the weights iteratively after comparing the outputs of all the classifiers. The proposed fusion classifier improves accuracy and convergence compared with the individual algorithms. A classification accuracy of 99.2% is achieved by the two-level hybrid fusion approach. The method is evaluated on the public High Resolution Fundus (HRF) and DRIVE datasets with cross-dataset validation.

8.
This study investigates the use of saliva as an emerging diagnostic fluid, in conjunction with classification techniques, to discern biological heterogeneity in clinically labelled gingivitis and periodontitis subjects (80 subjects; 40/group). A battery of classification techniques was investigated as traditional single classifier systems as well as within a novel selective voting ensemble classification approach (SVA) framework. Unlike traditional single classifiers, SVA is shown to reveal patient-specific variations within disease groups, which may be important for identifying proclivity to disease progression or disease stability. Salivary expression profiles of IL-1β, IL-6, MMP-8, and MIP-1α from 80 patients were analyzed using four classification algorithms (Linear Discriminant Analysis [LDA], Quadratic Discriminant Analysis [QDA], Naïve Bayes Classifier [NBC] and Support Vector Machines [SVM]) as traditional single classifiers and within the SVA framework (SVA-LDA, SVA-QDA, SVA-NB and SVA-SVM). Our findings demonstrate that the performance measures (sensitivity, specificity and accuracy) of the traditional single classifiers were comparable to those of their SVA counterparts, using the clinical labels of the samples as ground truth. However, unlike traditional single classifier approaches, the normalized ensemble vote-counts from SVA revealed varying proclivity of the subjects for each of the disease groups. More importantly, the SVA identified a subset of gingivitis and periodontitis samples that demonstrated a biological proclivity commensurate with the other clinical group. This subset was confirmed across SVA-LDA, SVA-QDA, SVA-NB and SVA-SVM. Heatmap visualization of their ensemble sets revealed a lack of consensus between these subsets and the rest of the samples within the respective disease groups, indicating the unique nature of the patients in these subsets.
While the source of variation is not known, the results presented clearly elucidate the need for novel approaches that accommodate inherent heterogeneity and personalized variations within disease groups in diagnostic characterization. The proposed approach falls within the scope of P4 medicine (predictive, preventive, personalized, and participatory) with the ability to identify unique patient profiles that may predict specific disease trajectories and targeted disease management.

9.

Background

The explosively radiating evolution of the cichlid fishes of Lake Malawi has yielded an amazing number of haplochromine species, estimated at as many as 500 to 800, with a surprising degree of diversity not only in color and stripe pattern but also in the shape of the jaw and body. As this morphological diversity has been a central subject of adaptive speciation and taxonomic classification, it could serve as a foundation for automating species identification of cichlids.

Methodology/Principal Finding

Here we demonstrate a method for automatic classification of the Lake Malawi cichlids based on computer vision and geometric morphometrics. To this end, we developed a pipeline that integrates multiple image processing tools to automatically extract informative features of color and stripe patterns from a large set of photographic images of wild cichlids. The extracted information was evaluated by two statistical classifiers, Support Vector Machine and Random Forests. Both classifiers performed better when body shape information was added to the color and stripe features: body shape variables boosted the classification accuracy by about 10%. The programs were able to classify 594 live cichlid individuals belonging to 12 different classes (species and sexes) with an average accuracy of 78%, in contrast to a mere 42% success rate by human eye. The variables that contributed most to the accuracy were body height and the hue of the most frequent color.

Conclusions

Computer vision showed a notable performance in extracting information from the color and stripe patterns of Lake Malawi cichlids, although the information was not enough for errorless species identification. Our results indicate that there appears to be an unavoidable difficulty in automatic species identification of cichlid fishes, which may arise from short divergence times and gene flow between closely related species.

10.
Machine learning and statistical model based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive and under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray based data characteristics on the predictive performance of various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).

11.
Optical Music Recognition (OMR) has received increasing attention in recent years. In this paper, we propose a classifier based on a new method named Directed Acyclic Graph-Large margin Distribution Machine (DAG-LDM). The DAG-LDM is an improvement of the Large margin Distribution Machine (LDM), which is a binary classifier that optimizes the margin distribution by maximizing the margin mean and minimizing the margin variance simultaneously. We modify the LDM to the DAG-LDM to solve the multi-class music symbol classification problem. Tests are conducted on more than 10000 music symbol images, obtained from handwritten and printed images of music scores. The proposed method provides superior classification capability and achieves much higher classification accuracy than the state-of-the-art algorithms such as Support Vector Machines (SVMs) and Neural Networks (NNs).
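
The two quantities that the LDM is said to optimize, the margin mean and the margin variance, can be computed as in the sketch below. It makes several assumptions not stated in the abstract: a linear decision function without a bias term, labels in {-1, +1}, the margin of sample i defined as y_i (w · x_i), and the population variance. It only evaluates these statistics for a fixed, hypothetical weight vector and does not implement the LDM optimization itself.

```python
from statistics import mean, pvariance


def margins(w, xs, ys):
    """Per-sample margins y_i * (w . x_i) for a linear decision function."""
    return [y * sum(wj * xj for wj, xj in zip(w, x)) for x, y in zip(xs, ys)]


def margin_mean_and_variance(w, xs, ys):
    """Return (margin mean, margin variance) over the training samples."""
    m = margins(w, xs, ys)
    return mean(m), pvariance(m)


# Hypothetical toy data: four 2-D samples with labels in {-1, +1}.
xs = [(1.0, 2.0), (2.0, 0.0), (-1.0, -1.0), (0.0, -3.0)]
ys = [1, 1, -1, -1]
w = (1.0, 1.0)  # hypothetical weight vector

m_mean, m_var = margin_mean_and_variance(w, xs, ys)
print(m_mean, m_var)  # prints 2.5 0.25
```

A weight vector with a larger margin mean and a smaller margin variance would be preferred under the criterion described in the abstract.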

12.
A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multi-single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series was tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as the study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See5 (C5.0). The results indicated that classification performance improved substantially as the number of training samples increased (mean overall accuracy rose by 5%-10%, and the standard deviation of overall accuracy fell by around 1%), and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers, with a 1%-2% higher mean overall accuracy. However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern.
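
The difference between the two hybrid strategies can be sketched as follows. The abstract does not define the fusion rules, so this sketch assumes that M-voting is a plain majority vote over predicted labels and that P-fusion averages the per-class probabilities of the single classifiers; the function names, class names and probability values are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def probabilistic_fusion(prob_dicts):
    """Average per-class probabilities and return (best class, averages)."""
    classes = list(prob_dicts[0])
    avg = {c: sum(p[c] for p in prob_dicts) / len(prob_dicts) for c in classes}
    return max(avg, key=avg.get), avg


# Hypothetical class probabilities for one pixel from three single classifiers.
rf = {"cotton": 0.45, "maize": 0.55}
svm = {"cotton": 0.40, "maize": 0.60}
c50 = {"cotton": 0.95, "maize": 0.05}
outputs = [rf, svm, c50]

voted = majority_vote([max(p, key=p.get) for p in outputs])
fused, averages = probabilistic_fusion(outputs)
print(voted, fused)  # prints: maize cotton
```

In this toy case two weakly confident classifiers outvote one highly confident classifier under voting, while averaging the probabilities lets the confident classifier decide.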

13.
《IRBM》2022,43(4):251-258
Objectives

Esophageal cancer is the sixth most common cancer and has a high fatality rate. Early prognosis of esophageal abnormalities can improve the survival rate of patients. Esophageal cancer progresses from esophagitis to non-dysplasia Barrett's esophagus to dysplasia Barrett's esophagus to esophageal adenocarcinoma (EAC). Many studies revealed a 5-fold increase in EAC patients diagnosed with esophagitis, and those diagnosed with Barrett's esophagus have a greater risk of EAC.

Material and methods

A Convolutional Neural Network (CNN) with efficient feature extractors enables better prognosis of the precancerous stages, Barrett's esophagus and esophagitis. Transfer learning techniques with CNNs can extract more relevant features for the automated classification of Barrett's esophagus and esophagitis. This paper presents a study on the classification of esophagitis and Barrett's esophagus (BE) using Deep Convolutional Neural Networks (DCNN).

Results

In the first experiment, the DCNN models act as feature extractors, and standard classifiers perform the classification. The performance analysis shows that the CNN model ResNet50 with a Support Vector Machine (SVM) has an accuracy of 93.5%, recall 93.5%, precision 93.4%, f score 93.5%, AUC 89.8%. In the second experiment, the DCNN classification models perform the classification with transfer learning and fine-tuning. The ResNet50 model has an improved accuracy of 94.46%, precision 94.46%, f score 94.46%, AUC 96.20%.

Conclusion

The ResNet50 model with transfer learning and fine-tuning gives a better performance than the ResNet50 model with an SVM classifier. Our experiments show that DCNNs are effective for diagnosing EAC, both as feature extractors and as classification models with transfer learning and fine-tuning.

14.
Due to the large volume of protein sequence data, computational methods to determine the structure class and the fold class of a protein sequence have become essential. Several techniques based on sequence similarity, Neural Networks, Support Vector Machines (SVMs), etc. have been applied. Since most of these methods build multi-class classification out of binary classifiers, up to C(N, 2) = N(N-1)/2 binary classifiers may be required for N classes. This paper presents a framework using Tree-Augmented Bayesian Networks (TAN), which performs multi-classification based on the theory of learning Bayesian Networks and uses the improved feature vector representation of Ding et al. (2001). In order to enhance TAN's performance, pre-processing of the data is done by feature discretization and post-processing is done using a Mean Probability Voting (MPV) scheme. The advantage of using the Bayesian approach over other learning methods is that the network structure is intuitive. In addition, one can read off the TAN structure probabilities to determine the significance of each feature (say, hydrophobicity) for each class, which helps to further understand the complexity of protein structure. The experiments on the datasets used in three prominent recent works show that our approach is more accurate than other discriminative methods. The framework is implemented on the BAYESPROT web server and is available at http://www-appn.comp.nus.edu.sg/~bioinfo/bayesprot/Default.htm. More detailed results are also available on the above website.
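
The growth in the number of binary classifiers mentioned in this abstract can be made concrete with a short sketch. It assumes a one-vs-one decomposition, in which one binary classifier is trained per unordered pair of classes; the class names and the 27-class figure are hypothetical illustrations.

```python
from itertools import combinations
from math import comb


def one_vs_one_pairs(classes):
    """List every unordered pair of classes; each pair needs one binary classifier."""
    return list(combinations(classes, 2))


# Hypothetical set of four structural classes.
structure_classes = ["all-alpha", "all-beta", "alpha/beta", "alpha+beta"]
pairs = one_vs_one_pairs(structure_classes)
print(len(pairs))   # prints 6, i.e. C(4, 2)
print(comb(27, 2))  # prints 351 for a hypothetical 27-class problem
```

A single multi-class model avoids training and combining this quadratic number of pairwise classifiers.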

15.

Background

Lung cancer is a very frequent and lethal tumor with an identifiable risk population. Cytological analysis and chest X-ray failed to reduce mortality, and CT screenings are still controversially discussed. Recent studies provided first evidence for the potential usefulness of autoantigens as markers for lung cancer.

Methods

We used extended panels of arrayed antigens and determined autoantibody signatures of sera from patients with different kinds of lung cancer, different common non-tumor lung pathologies, and controls without any lung disease by a newly developed computer aided image analysis procedure. The resulting signatures were classified using linear kernel Support Vector Machines and 10-fold cross-validation.

Results

The novel approach allowed for discriminating lung cancer patients from controls without any lung disease with a specificity of 97.0%, a sensitivity of 97.9%, and an accuracy of 97.6%. The classification of stage IA/IB tumors and controls yielded a specificity of 97.6%, a sensitivity of 75.9%, and an accuracy of 92.9%. The discrimination of lung cancer patients from patients with non-tumor lung pathologies reached an accuracy of 88.5%.
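
For reference, the three measures reported above follow from the counts of a binary confusion matrix, as in this minimal sketch. The counts used here are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of diseased subjects detected
    specificity = tn / (tn + fp)  # fraction of controls correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 50 patients (45 detected) and 40 controls (38 rejected).
sens, spec, acc = binary_metrics(tp=45, fn=5, tn=38, fp=2)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Because accuracy pools both groups, it depends on the ratio of patients to controls, which is why sensitivity and specificity are reported alongside it.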

Conclusion

We were able to separate lung cancer patients from subjects without any lung disease with high accuracy. Furthermore, lung cancer patients could be separated from patients with other non-tumor lung diseases. These results provide clear evidence that blood-based tests open new avenues for the early diagnosis of lung cancer.

16.
Wang X 《Genomics》2012,99(2):90-95
Two-gene classifiers have attracted broad interest for their simplicity and practicality. Most existing two-gene classification algorithms involve an exhaustive search, which makes them time-inefficient. In this study, we proposed two new two-gene classification algorithms which used a simple univariate gene selection strategy and constructed simple classification rules based on optimal cut-points for the two selected genes. We detected the optimal cut-point with the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance to that of some established two-gene classification models, such as the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of the compared models.
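
An entropy-based cut-point search for a single gene can be sketched as follows. The abstract does not give the exact criterion, so this sketch assumes that the optimal cut-point is the threshold minimizing the size-weighted class entropy of the two resulting groups; the function names and expression values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in a group."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (threshold, weighted entropy) of the best split, or None if no split exists."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates two equal expression values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or h < best[1]:
            best = ((lo + hi) / 2, h)  # midpoint between neighbouring values
    return best


# Hypothetical expression values of one gene in six samples.
expression = [1.2, 1.5, 1.9, 3.1, 3.4, 3.8]
classes = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
threshold, weighted_entropy = best_cut_point(expression, classes)
print(threshold, weighted_entropy)
```

Each selected gene would get one such threshold, and a two-gene rule could then combine the two binary outcomes; how the original algorithms combine them is not stated in the abstract.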

17.

Background  

Prediction of the transmembrane strands and topology of β-barrel outer membrane proteins is of interest in current bioinformatics research. Several methods have been applied so far for this task, utilizing different algorithmic techniques, and a number of freely available predictors exist. The methods can be broadly divided into those based on Hidden Markov Models (HMMs), on Neural Networks (NNs) and on Support Vector Machines (SVMs). In this work, we compare the different available methods for topology prediction of β-barrel outer membrane proteins. We evaluate their performance on a non-redundant dataset of 20 β-barrel outer membrane proteins of gram-negative bacteria, with structures known at atomic resolution. Also, we describe, for the first time, an effective way to combine the individual predictors, at will, into a single consensus prediction method.

18.
Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.

19.
Electrograms stored in Implantable Cardioverter Defibrillators (ICD-EGM) have been proven to convey useful information for roughly determining the anatomical location of the Left Ventricular Tachycardia exit site (LVTES). Our aim here was to evaluate the potential of a machine learning system intended to estimate the LVTES anatomical region from ICD-EGM in situations where 12-lead electrocardiograms of ventricular tachycardia are not available. Several machine learning techniques were specifically designed and benchmarked, from both classification (such as Neural Networks (NN) and Support Vector Machines (SVM)) and regression (Kernel Ridge Regression) problem statements. Classifiers were evaluated by using accuracy rates for LVTES identification in a controlled number of anatomical regions, and the quality of the regression approach was studied in terms of the spatial resolution. We analyzed the ICD-EGM of 23 patients (18±10 EGM per patient) during left ventricular pacing and simultaneous recording of the spatial coordinates of the pacing electrode with a navigation system. Several feature sets extracted from ICD-EGM (consisting of times and voltages) were shown to convey more discriminative information than the raw waveform. Among classifiers, the SVM performed slightly better than the NN. The average spatial resolution for the LVTES in our system was about 3 cm, in accordance with previous clinical works, which allows the system to support faster determination of the LVTES in ablation procedures. The proposed approach also provides a framework suitable for guiding the design of future systems with improved performance.

20.

Background

Effective management of patients with diabetic foot infection is a crucial concern. A delay in prescribing an appropriate antimicrobial agent can lead to amputation or life-threatening complications. Thus, this electronic nose (e-nose) technique will provide a diagnostic tool that allows rapid and accurate identification of a pathogen.

Results

This study investigates the performance of an e-nose technique that performs direct measurement of static headspace, with algorithms and data interpretation validated by Headspace SPME-GC-MS, to determine the causative bacteria responsible for diabetic foot infection. The study was proposed to complement the wound swabbing method for bacterial culture and to serve as a rapid screening tool for bacterial species identification. The investigation focused on both single and polymicrobial samples cultured on different agar media. Multi-class techniques were applied, including statistical approaches such as Support Vector Machine (SVM), K Nearest Neighbor (KNN) and Linear Discriminant Analysis (LDA), as well as a neural network, the Probability Neural Network (PNN). Most of the classifiers successfully identified poly and single microbial species with up to 90% accuracy.

Conclusions

The results obtained from this study showed that the e-nose was able to identify and differentiate between poly and single microbial species with performance comparable to the conventional clinical technique. They also indicate that even though poly and single bacterial species in different agar solutions emit different headspace volatiles, they can still be discriminated and identified using multivariate techniques.
