Similar articles
1.
In this paper, the recently developed Extreme Learning Machine (ELM) is used for direct multicategory classification problems in the cancer diagnosis area. ELM avoids problems such as local minima, improper learning rates and overfitting that iterative learning methods commonly face, and it completes training very quickly. We have evaluated the multicategory classification performance of ELM on three benchmark microarray datasets for cancer diagnosis, namely the GCM dataset, the Lung dataset and the Lymphoma dataset. The results indicate that ELM produces comparable or better classification accuracies with reduced training time and implementation complexity compared to artificial neural network methods such as conventional back-propagation ANN and Linder's SANN, and Support Vector Machine methods such as SVM-OVO and Ramaswamy's SVM-OVA. ELM also achieves better accuracies for classification of individual categories.

2.
Zhao N  Pang B  Shyu CR  Korkin D 《Proteomics》2011,11(22):4321-4330
Structural knowledge about protein-protein interactions can provide insights into the basic processes underlying cell function. Recent progress in experimental and computational structural biology has led to a rapid growth of experimentally resolved structures and computationally determined near-native models of protein-protein interactions. However, determining whether a protein-protein interaction is physiological or an artifact of an experimental or computational method remains a challenging problem. In this work, we have addressed two related problems. The first problem is distinguishing between experimentally obtained physiological and crystal-packing protein-protein interactions. The second problem is the classification of near-native and inaccurate docking models. We first defined a universal set of interface features and employed a support vector machine (SVM)-based approach to classify the interactions for both problems, with the accuracy, precision, and recall of the classifier for the first problem reaching 93%. To improve the classification, we next developed a semi-supervised learning approach for the second problem, using transductive SVM (TSVM). We applied both classifiers to a commonly used protein docking benchmark of 124 complexes. We found that while we reached classification accuracies of 78.9% for the SVM classifier and 80.3% for the TSVM classifier, improving protein-docking methods by model re-ranking remains a challenging problem.

3.
MOTIVATION: An important challenge in the use of large-scale gene expression data for biological classification occurs when the expression dataset being analyzed involves multiple classes. Key issues that need to be addressed under such circumstances are the efficient selection of good predictive gene groups from datasets that are inherently 'noisy', and the development of new methodologies that can enhance the successful classification of these complex datasets. METHODS: We have applied genetic algorithms (GAs) to the problem of multi-class prediction. A GA-based gene selection scheme is described that automatically determines the members of a predictive gene group, as well as the optimal group size, that maximize classification success using a maximum likelihood (MLHD) classification method. RESULTS: The GA/MLHD-based approach achieves higher classification accuracies than other published predictive methods on the same multi-class test dataset. It also permits substantial feature reduction in classifier genesets without compromising predictive accuracy. We propose that GA-based algorithms may represent a powerful new tool in the analysis and exploration of complex multi-class gene expression data. AVAILABILITY: Supplementary information, data sets and source code are available at http://www.omniarray.com/bioinformatics/GA.

4.
MOTIVATION: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially in the absence of significant sequence similarity of a query protein to those with known structures. The prediction of solvent accessibility is less accurate than secondary structure prediction in spite of improvements in recent research. The k-nearest neighbor method, a simple but powerful classification algorithm, has never been applied to the prediction of solvent accessibility, although it has been used frequently for the classification of biological and medical data. RESULTS: We applied the fuzzy k-nearest neighbor method to solvent accessibility prediction, using PSI-BLAST profiles as feature vectors, and achieved high prediction accuracies. With leave-one-out cross-validation on the ASTRAL SCOP reference dataset constructed by sequence clustering, our method achieved 64.1% accuracy for a 3-state (buried/intermediate/exposed) prediction (thresholds of 9% for buried/intermediate and 36% for intermediate/exposed) and 86.7, 82.0, 79.0 and 78.5% accuracies for 2-state (buried/exposed) predictions (buried/exposed thresholds of 0, 5, 16 and 25%, respectively). Our method also showed slightly better accuracies than other methods, by about 2-5%, on the RS126 dataset and a benchmarking dataset with 229 proteins. AVAILABILITY: Program and datasets are available at http://biocom1.ssu.ac.kr/FKNNacc/ CONTACT: jul@ssu.ac.kr.
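
The fuzzy k-nearest neighbor idea in the abstract above can be illustrated with a minimal sketch. It assumes crisp training labels, Euclidean distance and the common inverse-distance weighting with fuzzifier `m`; the toy feature vectors are hypothetical stand-ins, not PSI-BLAST profiles, and the function name `fuzzy_knn` is illustrative only.

```python
import math
from collections import defaultdict

def fuzzy_knn(train, query, k=3, m=2.0):
    """Return {label: membership} for `query` using fuzzy k-nearest neighbours.

    train: list of (feature_vector, label) pairs with crisp labels.
    m: fuzzifier (> 1); m = 2 gives inverse squared-distance weights.
    """
    # Rank training points by Euclidean distance to the query and keep k.
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )[:k]

    weights = defaultdict(float)
    for distance, label in neighbours:
        # Inverse-distance weight; the small constant avoids division by zero.
        weights[label] += 1.0 / (distance ** (2.0 / (m - 1.0)) + 1e-12)

    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

# Toy 2-state example (hypothetical 2-D features).
train = [((0.0, 0.0), "buried"), ((0.0, 1.0), "buried"),
         ((5.0, 5.0), "exposed"), ((6.0, 5.0), "exposed")]
memberships = fuzzy_knn(train, (1.0, 1.0), k=3)
prediction = max(memberships, key=memberships.get)
```

Unlike plain k-NN, the result is a graded membership per class, so a confidence threshold could be applied before committing to a state.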

5.
In this paper, an immune-inspired model named the innate and adaptive artificial immune system (IA-AIS) is proposed and applied to the problem of identifying unsolicited bulk e-mail messages (SPAM). It integrates entities analogous to macrophages and B and T lymphocytes, modeling both the innate and the adaptive immune systems. An implementation of the algorithm was capable of identifying more than 99% of legitimate or SPAM messages in particular parameter configurations. It was compared to an optimized version of the naïve Bayes classifier, which has attained extremely high correct classification rates. It has been concluded that IA-AIS has a greater ability to identify SPAM messages, although its identification rate for legitimate messages is not as high as that of the implemented naïve Bayes classifier.

6.
Lu J  Zhu Y  Li Y  Lu W  Hu L  Niu B  Qing P  Gu L 《Protein and peptide letters》2010,17(12):1536-1541
Information about interactions between enzymes and small molecules is important for understanding various metabolic bioprocesses. In this article we applied a majority voting system to predict the interactions between enzymes and small molecules in metabolic pathways by combining several classifiers, including AdaBoost, Bagging and KNN. The advantage of such a strategy rests on the principle that predictors based on majority voting usually provide more reliable results than any single classifier. The prediction accuracies thus obtained on a training dataset and an independent testing dataset were 82.8% and 84.8%, respectively. The prediction accuracy for the networking couples in the independent testing dataset was 75.5%, which is about 4% higher than that reported in a previous study. The web-server for the prediction method presented in this paper is available at http://chemdata.shu.edu.cn/small-enz.
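
The majority voting principle described in the abstract above can be sketched as follows. The three base classifiers here are hypothetical threshold rules standing in for trained AdaBoost, Bagging and KNN models, and the feature tuples are invented for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by most classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Stand-in base classifiers (hypothetical rules, not trained models).
classifiers = [
    lambda x: "interacts" if x[0] > 0.5 else "no-interaction",
    lambda x: "interacts" if x[1] > 0.5 else "no-interaction",
    lambda x: "interacts" if x[0] + x[1] > 1.0 else "no-interaction",
]

def ensemble_predict(x):
    """Collect one vote per base classifier and return the majority label."""
    return majority_vote([clf(x) for clf in classifiers])

result_a = ensemble_predict((0.9, 0.2))    # two of three vote "interacts"
result_b = ensemble_predict((0.4, 0.55))   # two of three vote "no-interaction"
```

With an odd number of voters and two classes no tie can occur; other configurations would need an explicit tie-breaking rule.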

7.
Capillary non-perfusion (CNP) in the retina is a characteristic feature used in the management of a wide range of retinal diseases. There is no well-established computational tool for assessing the extent of CNP. We propose a novel texture segmentation framework to address this problem. This framework comprises three major steps: pre-processing, unsupervised total variation texture segmentation, and supervised segmentation. It employs a state-of-the-art multiphase total variation texture segmentation model which is enhanced by new kernel-based region terms. The model can be applied to texture and intensity-based multiphase problems. A supervised segmentation step allows the framework to take expert knowledge into account; an AdaBoost classifier with weighted cost coefficients is chosen to tackle imbalanced data classification problems. To demonstrate its effectiveness, we applied this framework to 48 images from malarial retinopathy and 10 images from ischemic diabetic maculopathy. The performance of segmentation is satisfactory when compared to a reference standard of manual delineations: accuracy, sensitivity and specificity are 89.0%, 73.0%, and 90.8% respectively for the malarial retinopathy dataset and 80.8%, 70.6%, and 82.1% respectively for the diabetic maculopathy dataset. In terms of region-wise analysis, this method achieved an accuracy of 76.3% (45 out of 59 regions) for the malarial retinopathy dataset and 73.9% (17 out of 26 regions) for the diabetic maculopathy dataset. This comprehensive segmentation framework can quantify capillary non-perfusion in retinopathy from two distinct etiologies, and has the potential to be adopted for wider applications.

8.
BACKGROUND: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms which uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification. RESULTS: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of different classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, naive Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross-validation will be more generally applicable to external validation sets. CONCLUSIONS: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer measured features are available.
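
To make the relative expression idea concrete, here is a minimal sketch of the simplest member of the family, the top-scoring pair (TSP), which is the N = 2 case that TSN generalizes. The scoring rule follows the usual TSP definition; the expression values and class names are hypothetical, and this is not the TSN implementation evaluated in the paper.

```python
from itertools import combinations

def top_scoring_pair(samples, labels):
    """Find the feature pair (i, j) whose ordering best separates two classes.

    score(i, j) = |P(x_i < x_j | class A) - P(x_i < x_j | class B)|
    Returns the best pair and a predict(sample) function.
    """
    class_a, class_b = sorted(set(labels))

    def freq(i, j, cls):
        # Fraction of samples of class `cls` in which feature i < feature j.
        rows = [s for s, y in zip(samples, labels) if y == cls]
        return sum(r[i] < r[j] for r in rows) / len(rows)

    best = max(
        combinations(range(len(samples[0])), 2),
        key=lambda p: abs(freq(*p, class_a) - freq(*p, class_b)),
    )
    i, j = best
    # Is the ordering x_i < x_j more typical of class A than of class B?
    a_if_less = freq(i, j, class_a) > freq(i, j, class_b)

    def predict(sample):
        return class_a if (sample[i] < sample[j]) == a_if_less else class_b

    return best, predict

# Hypothetical expression values for three genes in four samples.
samples = [[1.0, 5.0, 3.0], [2.0, 6.0, 1.5], [7.0, 3.0, 3.1], [8.0, 2.0, 1.0]]
labels = ["normal", "normal", "tumor", "tumor"]
pair, predict = top_scoring_pair(samples, labels)
```

Because only the ordering of two values matters, any monotone per-sample normalization leaves the prediction unchanged, which is the invariance property the abstract mentions.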

9.
《IRBM》2020,41(6):331-353
Objectives: Epileptic seizures are one of the most common diseases in society and are difficult to detect. In this study, a new method is proposed to automatically detect and classify epileptic seizures from EEG (electroencephalography) signals. Methods: In the proposed method, five-class classification of EEG signals (eyes open, eyes closed, healthy, from the tumor region, and epileptic seizure) has been carried out using the support vector machine (SVM) and normalization methods comprising the z-score, minimum-maximum, and MAD normalizations. To classify the EEG signals, support vector machine classifiers with different kernel functions, namely Linear SVM, Cubic SVM, and Medium Gaussian SVM, have been used. To evaluate the performance of the proposed hybrid models, the confusion matrix, ROC curves, and classification accuracy have been used. Results: Without normalization, the obtained classification accuracies are 76.90%, 82.40%, and 81.70% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. After applying the z-score normalization to the multi-class EEG signals dataset, the obtained classification accuracies are 77.10%, 82.30%, and 81.70% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. With the minimum-maximum normalization, the obtained classification accuracies are 77.20%, 82.40%, and 81.50% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. Finally, after applying the MAD normalization to the multi-class EEG signals dataset, the obtained classification accuracies are 76.70%, 82.50%, and 81.40% using Linear SVM, Cubic SVM, and Medium Gaussian SVM, respectively. Conclusion: The obtained results show that the best hybrid model for the five-class classification of EEG signals is the combination of Cubic SVM and MAD normalization.
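
The three normalizations named in the abstract above can be sketched per feature as follows. This is an illustration under assumptions: MAD is taken here to mean the median absolute deviation (the abstract does not spell out the definition), the population standard deviation is used for the z-score, and the sample values are invented.

```python
import statistics

def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def mad_normalize(values):
    """Centre on the median and scale by the median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return [(v - med) / mad for v in values]

# Hypothetical values of one EEG-derived feature across eight recordings.
feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_score(feature)
mm = min_max(feature)
mad = mad_normalize(feature)
```

Each function assumes the feature is not constant; a zero spread would need special handling before feeding the result to a classifier.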

10.

Background  

Microarray experiments are becoming a powerful tool for clinical diagnosis, as they have the potential to discover gene expression patterns that are characteristic of a particular disease. To date, this problem has received most attention in the context of cancer research, especially in tumor classification. Various feature selection methods and classifier design strategies have also been widely used and compared. Most published articles on tumor classification have applied a certain technique to a certain dataset, although several researchers have recently compared these techniques on several public datasets. However, it has been verified that differently selected features reflect different aspects of a dataset, and that some selected features yield better solutions on certain problems. At the same time, faced with a large amount of microarray data and little prior knowledge, it is difficult to find the intrinsic characteristics using traditional methods. In this paper, we introduce a combinational feature selection method in conjunction with ensemble neural networks to improve the accuracy and robustness of sample classification.

11.
Purpose: Accurate detection and treatment of Coronary Artery Disease is mainly based on invasive Coronary Angiography, which could be avoided if a robust, non-invasive detection methodology emerged. Despite the progress of computational systems, this remains a challenging issue. The present research investigates how Machine Learning and Deep Learning methods compete with the medical experts' diagnostic yield. Although highly accurate detection of Coronary Artery Disease, even by experts, is presently implausible, developing Artificial Intelligence models to compete with the human eye and expertise is the first step towards a state-of-the-art Computer-Aided Diagnostic system. Methods: A set of 566 patient samples is analysed. The dataset contains Polar Maps derived from scintigraphic Myocardial Perfusion Imaging studies, clinical data, and Coronary Angiography results. The latter is considered the reference standard. For the classification of the medical images, the InceptionV3 Convolutional Neural Network is employed, while for the categorical and continuous features, Neural Networks and a Random Forest classifier are proposed. Results: The research suggests that an optimal strategy competing with the medical expert's accuracy involves a hybrid multi-input network composed of InceptionV3 and a Random Forest. This method matches the expert's accuracy, which is 79.15% in the particular dataset. Conclusion: Image classification using deep learning methods can cooperate with clinical data classification methods to enhance the robustness of the predictive model, aiming to compete with the medical expert's ability to identify Coronary Artery Disease subjects from a large-scale patient dataset.

12.
Classification is a data mining task whose goal is to learn, from a training dataset, a model that can predict the class of a new data instance, while clustering aims to discover natural instance-groupings within a given dataset. Learning cluster-based classification systems involves partitioning a training set into data subsets (clusters) and building a local classification model for each data cluster. The class of a new instance is predicted by first assigning the instance to its nearest cluster and then using that cluster’s local classification model to predict the instance’s class. In this paper, we present an ant colony optimization (ACO) approach to building cluster-based classification systems. Our ACO approach optimizes the number of clusters, the positioning of the clusters, and the choice of classification algorithm to use as the local classifier for each cluster. We also present an ensemble approach that allows the system to decide on the class of a given instance by considering the predictions of all local classifiers, employing a weighted voting mechanism based on the fuzzy degree of membership in each cluster. Our experimental evaluation employs five widely used classification algorithms: naïve Bayes, nearest neighbour, Ripper, C4.5, and support vector machines, and results are reported on a suite of 54 popular UCI benchmark datasets.
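
The two prediction modes described in the abstract above (nearest-cluster and membership-weighted voting) can be sketched as follows. The sketch assumes fixed cluster centroids, fuzzy c-means style memberships with fuzzifier `m`, and constant stand-in local classifiers; the ACO search that would choose the clusters and classifiers is not shown.

```python
import math
from collections import defaultdict

def memberships(x, centroids, m=2.0):
    """Fuzzy degree of membership of x in each cluster (values sum to 1)."""
    inverse = [max(math.dist(x, c), 1e-12) ** (-2.0 / (m - 1.0)) for c in centroids]
    total = sum(inverse)
    return [v / total for v in inverse]

def predict_nearest(x, centroids, local_models):
    """Use only the local classifier of the nearest cluster."""
    idx = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    return local_models[idx](x)

def predict_ensemble(x, centroids, local_models):
    """Let every local classifier vote, weighted by cluster membership."""
    votes = defaultdict(float)
    for u, model in zip(memberships(x, centroids), local_models):
        votes[model(x)] += u
    return max(votes, key=votes.get)

# Hypothetical clusters; each local model is a constant classifier for brevity.
centroids = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
local_models = [lambda x: "A", lambda x: "B", lambda x: "B"]
point = (4.5, 4.5)
nearest_label = predict_nearest(point, centroids, local_models)
ensemble_label = predict_ensemble(point, centroids, local_models)
```

For this point the nearest cluster predicts "A", but the two other clusters together carry more membership weight, so the ensemble predicts "B"; this is the kind of boundary case where the two modes can disagree.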

13.
Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Because it can image through cloud, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global (Global Forest Change) forest cover product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post-classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%), whereas in Sligo, the highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.

14.
This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. This system extracts features from both profiles and amino acid sequences. Four different features extracted from profiles are each handled by a separate probabilistic neural network (PNN) classifier (the amino acid composition from whole profiles; the amino acid composition from the N-terminus of profiles; the dipeptide composition from whole profiles; and the amino acid composition from fragments of profiles). In addition, a support vector machine (SVM) classifier is added to handle the residue-couple feature extracted from amino acid sequences. The results from the five classifiers are fused by an additional SVM classifier. The overall accuracies of TSSub reach 93.0 and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. Comparison with the results of existing methods shows that TSSub provides better prediction performance. AVAILABILITY: The web server is available from http://166.111.24.5/webtools/TSSub/index.html.
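
The composition features mentioned in the abstract above can be illustrated on a raw sequence. This is a simplified sketch: TSSub derives these compositions from profiles, whereas the code below counts residues directly in a plain amino acid string, and the example sequence is hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each of the 20 residues in the sequence (20 values)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 values)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]

seq = "MKKLLA"  # hypothetical sequence
aac = amino_acid_composition(seq)
dpc = dipeptide_composition(seq)
# N-terminal composition: the same feature restricted to the first residues.
n_terminal_aac = amino_acid_composition(seq[:3])
```

Each feature vector would then be passed to its own classifier, with the classifier outputs fused in a second stage as the abstract describes.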

15.
Taxonomic and phylogenetic fingerprinting based on sequence analysis of gene fragments from the large-subunit rRNA (LSU) gene or the internal transcribed spacer (ITS) region is becoming an integral part of fungal classification. The lack of an accurate and robust classification tool trained on a validated sequence database for taxonomic placement of fungal LSU genes is a severe limitation in taxonomic analysis of fungal isolates or large data sets obtained from environmental surveys. Using a hand-curated set of 8,506 fungal LSU gene fragments, we determined the performance characteristics of a naïve Bayesian classifier across multiple taxonomic levels and compared the classifier performance to that of a sequence similarity-based (BLASTN) approach. The naïve Bayesian classifier was computationally faster (>460-fold with our system) than the BLASTN approach, and it provided equal or superior classification accuracy. Classifier accuracies were compared using sequence fragments of 100 bp and 400 bp and two different PCR primer anchor points to mimic sequence read lengths commonly obtained using current high-throughput sequencing technologies. Accuracy was higher with 400-bp sequence reads than with 100-bp reads. It was also significantly affected by sequence location across the 1,400-bp test region. The highest accuracy was obtained across either the D1 or D2 variable region. The naïve Bayesian classifier provides an effective and rapid means to classify fungal LSU sequences from large environmental surveys. The training set and tool are publicly available through the Ribosomal Database Project (http://rdp.cme.msu.edu/classifier/classifier.jsp).
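
A naïve Bayesian sequence classifier of the kind described above can be sketched with word (k-mer) presence features. The abstract does not state the feature set, so the k-mer representation, the word length `k=4`, the smoothing constants, the uniform prior and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def kmers(seq, k=4):
    """Set of distinct overlapping words of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(sequences, k=4):
    """sequences: list of (dna_string, taxon). Returns (word counts, totals)."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for seq, taxon in sequences:
        totals[taxon] += 1
        for word in kmers(seq, k):
            counts[taxon][word] += 1   # number of training sequences containing word
    return counts, totals

def classify(seq, model, k=4):
    """Return the taxon with the highest log-likelihood (uniform prior assumed)."""
    counts, totals = model

    def log_score(taxon):
        # Smoothed probability that a word occurs in a sequence of this taxon.
        return sum(
            math.log((counts[taxon][w] + 0.5) / (totals[taxon] + 1.0))
            for w in kmers(seq, k)
        )

    return max(totals, key=log_score)

# Hypothetical training sequences and taxon names.
training = [("ACGTACGTAC", "TaxonA"), ("ACGTACGAAC", "TaxonA"),
            ("TTGGCCTTGG", "TaxonB"), ("TTGGCCATGG", "TaxonB")]
model = train(training)
label_1 = classify("ACGTACGT", model)
label_2 = classify("TTGGCCTT", model)
```

Scoring needs only one pass over the query's words per taxon, which is consistent with the large speed advantage over alignment-based search that the abstract reports.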

16.

Background

Each lung structure exhales a unique pattern of aerosols, which can be used to detect and monitor lung diseases non-invasively. The challenges are accurately interpreting the exhaled aerosol fingerprints and quantitatively correlating them to the lung diseases.

Objective and Methods

In this study, we present a paradigm for an exhaled aerosol test that addresses the above two challenges and shows promise for detecting the site and severity of lung diseases. This paradigm consists of two steps: image feature extraction using sub-regional fractal analysis and data classification using a support vector machine (SVM). Numerical experiments were conducted to evaluate the feasibility of the breath test in four asthmatic lung models. A high-fidelity image-CFD approach was employed to compute the exhaled aerosol patterns under different disease conditions.

Findings

By employing the 10-fold cross-validation method, we achieved 100% classification accuracy among four asthmatic models using an ideal 108-sample dataset and 99.1% accuracy using a more realistic 324-sample dataset. The fractal-SVM classifier has been shown to be robust, highly sensitive to structural variations, and inherently suitable for investigating aerosol-disease correlations.

Conclusion

For the first time, this study quantitatively linked the exhaled aerosol patterns with their underlying diseases and set the stage for the development of a computer-aided diagnostic system for non-invasive detection of obstructive respiratory diseases.
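
The 10-fold cross-validation protocol mentioned in the Findings can be sketched as an index splitter. This is a minimal illustration: it does not shuffle or stratify the samples, which a real evaluation would normally do, and the function name is illustrative. The sample count of 108 is taken from the abstract.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(108, k=10))
```

The reported accuracy would then be the fraction of correctly classified samples pooled over (or averaged across) the ten test folds.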

17.
Pattern recognition and classification are two of the key topics in computer science. In this paper a novel method for the task of pattern classification is presented. The proposed method combines a hybrid associative classifier (Clasificador Híbrido Asociativo con Traslación, CHAT, in Spanish), a coding technique for output patterns called the one-hot vector, and majority voting during the classification step. The method is termed CHAT One-Hot Majority (CHAT-OHM). The performance of the method is validated by comparing the accuracy of CHAT-OHM with that of other well-known classification algorithms. During the experimental phase, the classifier was applied to four datasets related to the medical field. The results also show that the proposed method outperforms the original CHAT in classification accuracy.

18.
19.
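This paper introduces the basic principle of the non-negative matrix factorization (NMF) algorithm and presents a method for extracting EEG power-spectrum features using NMF. An experiment was designed in which features were extracted from the EEG signals of 10 subjects performing three different attention tasks, and an artificial neural network was used as the classifier for classification tests. The results show that the NMF algorithm has strong feature selection ability in high-dimensional feature spaces; its classification accuracy is clearly higher than that of principal component analysis (PCA) and the direct method, with the classification accuracies for the three mental tasks reaching 84.5%, 88% and 86.5%, respectively.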
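
The basic principle of NMF referred to above can be sketched with the classic multiplicative update rules for the squared-error objective. The abstract does not say which NMF variant the authors used, so this choice is an assumption; the small matrix `v` is a hypothetical stand-in for a non-negative power-spectrum matrix.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical non-negative data matrix (rows: samples, columns: spectral bins).
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]]
w, h = nmf(v, rank=2)
```

In a feature-extraction setting, the rows of `w` would serve as the low-dimensional non-negative features passed to the classifier.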

20.
The process of knowledge discovery from big and high-dimensional datasets has become a popular research topic. The classification problem is a key task in bioinformatics, business intelligence, decision science, astronomy, physics, etc. Building associative classifiers has been a notable research interest in recent years because of their superior accuracy. In associative classifiers, using under-sampling or over-sampling methods for imbalanced big datasets reduces accuracy or increases running time, respectively. Hence, there is a significant need to create efficient associative classifiers for imbalanced big data problems. These classifiers should be able to handle challenges such as memory usage, running time and efficient exploration of the search space. To this end, efficient calculation of measures is a primary objective for associative classifiers. In this paper, we propose a new efficient associative classifier for big imbalanced datasets. The proposed method is based on Rare-PEARs (a multi-objective evolutionary algorithm that efficiently discovers rare and reliable association rules) and is able to evaluate rules in a distributed manner by using a new data storage format. This format simplifies the calculation of measures and is fully compatible with the MapReduce programming model. We have applied the proposed method (RPII) to a well-known big dataset (ECBDL’14) and have compared our results with seven other learning methods. The experimental results show that RPII outperforms the other methods in the sensitivity and final score measures (approximately 0.74 and 0.54, respectively). The results demonstrate that the proposed method is a good candidate for large-scale classification problems; furthermore, it achieves reasonable execution time when the target platform is a typical computer cluster.
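
The distributed evaluation of rule measures described above can be sketched in map/reduce style. This illustrates the general idea only: it computes the standard support and confidence of a single class association rule by summing per-partition counts, and it does not reproduce the storage format or the measures actually used by RPII. The partitions, items and labels are hypothetical.

```python
from collections import Counter

def map_partition(rows, antecedent, target):
    """Map step: count rule-relevant events within one data partition."""
    counts = Counter()
    for features, label in rows:
        covers = antecedent <= features          # rule antecedent matches this row
        counts["n"] += 1
        counts["antecedent"] += covers
        counts["both"] += covers and label == target
    return counts

def rule_measures(partitions, antecedent, target):
    """Reduce step: sum partition counts, then derive the rule measures."""
    total = sum((map_partition(p, antecedent, target) for p in partitions),
                Counter())
    return {
        "support": total["both"] / total["n"],
        "confidence": total["both"] / total["antecedent"],
    }

# Two hypothetical partitions of (item set, class label) rows.
partitions = [
    [({"a", "b"}, "pos"), ({"a"}, "neg"), ({"b", "c"}, "neg")],
    [({"a", "b", "c"}, "pos"), ({"c"}, "neg")],
]
measures = rule_measures(partitions, antecedent={"a", "b"}, target="pos")
```

Because each partition contributes only a few integers, the map step can run independently on every node and the reduce step stays cheap regardless of dataset size.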
