Similar literature
20 similar documents found
1.
Anticipated hand movements of amputee subjects are considered difficult to classify using only electromyogram (EMG) signals and machine learning techniques. Classifying such surface EMG (s-EMG) signals has long been considered a non-linear problem, and the problem of signal sparsity has not received detailed attention for large sets of action classes. To address these problems, this paper proposes a linear-time classifier termed Random Fourier Mapped Collaborative Representation with distance-weighted Tikhonov regularization matrix (RFMCRT). RFMCRT attempts to tackle the non-linearity via Random Fourier Features and the sparsity issue via collaborative representation. The projection error of Random Fourier Features is reduced by projecting to the same dimension as the original feature space and then finding the collaborative representation, with an optional non-negative constraint (RFMNNCRT). The two proposed classifiers were tested with time-domain features computed from EMG signals obtained from the NINAPRO databases using a non-overlapping sliding window of 256 ms. Because the proposed classifiers are randomized, the paper computes the average and worst-case performance over 50 trials and compares them with other reported classifiers. The results show that RFMNNCRT (average case) outperformed state-of-the-art classifiers, with an accuracy of 93.44% for intact subjects and 55.67% for amputee subjects. In the worst case, RFMCRT still achieves considerable performance, with reported accuracies of 91.55% and 50.27%, respectively. The proposed classifier guarantees acceptable levels of accuracy for large classes of hand movements and also maintains good computational efficiency in comparison to LDA and SVM.
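The Random Fourier Features step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) and uses many more random features than input dimensions so that the approximation is easy to see; the abstract instead projects to the same dimension as the input. The names `make_rff_map`, `dot`, and `rbf_kernel` are hypothetical, and the collaborative-representation stage is not shown.

```python
import math
import random

def make_rff_map(input_dim, num_features, gamma, seed=0):
    """Return z(x) whose dot products approximate exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Assumption: RBF kernel, so frequencies come from N(0, 2 * gamma)
    # and phases are uniform on [0, 2*pi).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
               for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def z(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]
    return z

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(x, y, gamma):
    """Exact kernel value, used only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

After the mapping, a linear method applied to z(x) behaves approximately like a kernel method on x, which is the usual motivation for this kind of feature map.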

2.
We have introduced a new method of protein secondary structure prediction which is based on the theory of the support vector machine (SVM). SVM represents a new approach to supervised pattern classification which has been successfully applied to a wide range of pattern recognition problems, including object recognition, speaker identification, gene function prediction with microarray expression profiles, etc. In these cases, the performance of SVM either matches or is significantly better than that of traditional machine learning approaches, including neural networks. The first use of the SVM approach to predict protein secondary structure is described here. Unlike previous studies, we first constructed several binary classifiers, then assembled a tertiary classifier for three secondary structure states (helix, sheet and coil) based on these binary classifiers. The SVM method achieved a good segment overlap accuracy of SOV = 76.2% through sevenfold cross-validation on a database of 513 non-homologous protein chains with multiple sequence alignments, which outperforms existing methods. Meanwhile, the three-state overall per-residue accuracy Q(3) reached 73.5%, which is at least comparable to existing single prediction methods. Furthermore, a useful "reliability index" for the predictions was developed. In addition, SVM has many attractive features, including effective avoidance of overfitting, the ability to handle large feature spaces, information condensing of the given data set, etc. The SVM method can be conveniently applied to many other pattern classification tasks in biology.
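As a rough sketch of how binary classifiers can be assembled into a three-state predictor and scored with Q(3): the abstract does not say how the binary outputs were combined, so the argmax rule below is an assumption, as are the state letters H/E/C and the function names. Q(3) itself is simply the fraction of residues assigned the correct state.

```python
def assemble_three_state(helix_score, sheet_score, coil_score):
    """Pick the state whose one-vs-rest classifier gives the largest score.

    Assumption: each score is the decision value of a binary classifier
    (helix vs. rest, sheet vs. rest, coil vs. rest).
    """
    scores = {"H": helix_score, "E": sheet_score, "C": coil_score}
    return max(scores, key=scores.get)

def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)

# Hypothetical decision values for four residues (helix, sheet, coil).
residue_scores = [(1.2, -0.4, -0.9), (0.3, 0.8, -0.2),
                  (-1.0, -0.5, 0.6), (0.9, 0.1, -0.3)]
predicted = "".join(assemble_three_state(*s) for s in residue_scores)
observed = "HECC"
print(predicted, q3_accuracy(predicted, observed))
```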

3.
The unpredictability of the occurrence of epileptic seizures makes it difficult to detect and treat this condition effectively. An automatic system that characterizes epileptic activities in EEG signals would allow patients or the people near them to take appropriate precautions, would allow clinicians to better manage the condition, and could provide more insight into these phenomena thereby revealing important clinical information. Various methods have been proposed to detect epileptic activity in EEG recordings. Because of the nonlinear and dynamic nature of EEG signals, the use of nonlinear Higher Order Spectra (HOS) features is a seemingly promising approach. This paper presents the methodology employed to extract HOS features (specifically, cumulants) from normal, interictal, and epileptic EEG segments and to use significant features in classifiers for the detection of these three classes. In this work, 300 sets of EEG data belonging to the three classes were used for feature extraction and classifier development and evaluation. The results show that the HOS based measures have unique ranges for the different classes with high confidence level (p-value < 0.0001). On evaluating several classifiers with the significant features, it was observed that the Support Vector Machine (SVM) presented a high detection accuracy of 98.5% thereby establishing the possibility of effective EEG segment classification using the proposed technique.

4.
Ensemble classifier for protein fold pattern recognition   (Total citations: 4; self-citations: 0; citations by others: 4)
MOTIVATION: Prediction of protein folding patterns is one level deeper than that of protein structural classes, and hence is much more complicated and difficult. To deal with such a challenging problem, the ensemble classifier was introduced. It was formed by a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, which were extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The recognition task was to find the true fold among the 27 possible patterns. RESULTS: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. Such a rate is 6-21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as proteomics and bioinformatics. AVAILABILITY: The ensemble classifier, called PFP-Pred, is available as a web-server at http://202.120.37.186/bioinf/fold/PFP-Pred.htm for public usage.
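The weighted-voting step can be sketched as follows. This is only an illustration of the general idea: the abstract does not give the weights or the exact voting rule used by PFP-Pred, so the function name, the weights, and the fold labels are all hypothetical.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across base classifiers.

    predictions[i] is the label chosen by base classifier i and
    weights[i] is the (assumed, non-negative) weight given to that classifier.
    """
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("need one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Five hypothetical base classifiers voting on the fold of one query protein.
votes = ["fold_3", "fold_7", "fold_7", "fold_3", "fold_12"]
weights = [0.9, 0.4, 0.3, 0.2, 0.5]
print(weighted_vote(votes, weights))
```

Here two votes for one fold outweigh two votes for another because the classifiers casting them carry more weight.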

5.
Epilepsy is a common neurological disorder that is characterized by the recurrence of seizures. Electroencephalogram (EEG) signals are widely used to diagnose seizures. Because of the non-linear and dynamic nature of EEG signals, it is difficult to decipher the subtle changes in these signals by visual inspection or by linear techniques. Therefore, non-linear methods are being researched to analyze EEG signals. In this work, we represent the recorded EEG signals as Recurrence Plots (RP) and extract Recurrence Quantification Analysis (RQA) parameters from the RP in order to classify the EEG signals into normal, ictal, and interictal classes. A Recurrence Plot is a graph that shows all the times at which a state of the dynamical system recurs. Studies have reported significantly different RQA parameters for the three classes. However, more studies are needed to develop classifiers that use these promising features and achieve good classification accuracy in differentiating the three types of EEG segments. Therefore, in this work, we have used ten RQA parameters to quantify the important features in the EEG signals. These features were fed to seven different classifiers: Support Vector Machine (SVM), Gaussian Mixture Model (GMM), Fuzzy Sugeno Classifier, K-Nearest Neighbor (KNN), Naive Bayes Classifier (NBC), Decision Tree (DT), and Radial Basis Probabilistic Neural Network (RBPNN). Our results show that the SVM classifier was able to identify the EEG class with an average efficiency of 95.6%, and sensitivity and specificity of 98.9% and 97.8%, respectively.
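A recurrence plot and one simple RQA measure can be sketched in a few lines. The sketch assumes a scalar series compared sample by sample, without the time-delay embedding usually applied to EEG, and the threshold `epsilon` and the function names are hypothetical. The recurrence rate here counts the main diagonal as well.

```python
def recurrence_matrix(series, epsilon):
    """R[i][j] = 1 when samples i and j lie within epsilon of each other."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= epsilon else 0
             for j in range(n)]
            for i in range(n)]

def recurrence_rate(matrix):
    """Fraction of recurrent points in the plot (one of the RQA measures)."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)

# A toy series that alternates between two levels, so every second
# sample recurs and the plot shows a checkerboard pattern.
signal = [0.0, 1.0, 0.1, 1.1]
plot = recurrence_matrix(signal, epsilon=0.2)
for row in plot:
    print(row)
print(recurrence_rate(plot))
```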

6.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection was done using the t-statistic. A comparative study of the class prediction accuracy of three different classifiers, viz. support vector machine (SVM), neural nets and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multi-layer perceptron (MLP). The top 10 genes selected for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic based feature selection is an efficient and viable alternative to popular techniques.
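Ranking genes by a t-statistic can be sketched as below. The abstract does not say which form of the t-statistic was used, so Welch's unequal-variance version is an assumption here, and the gene names and expression values are made up for illustration.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's two-sample t-statistic for one gene.

    Note: fails with a division by zero if both groups have zero variance.
    """
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def top_genes(expression, labels, k):
    """Rank genes by |t| between the two classes and return the top k names.

    expression maps gene name -> list of expression values, one per sample;
    labels is a list of 0/1 class labels in the same sample order.
    """
    scores = {}
    for gene, values in expression.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(tumor, normal))
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical data: three genes measured in three tumor and three normal samples.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "gene_A": [5.1, 4.9, 5.0, 1.0, 1.2, 0.8],
    "gene_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    "gene_C": [3.0, 3.5, 2.5, 2.0, 2.6, 1.7],
}
print(top_genes(expression, labels, k=2))
```

The selected genes would then be the only inputs passed to the downstream classifier.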

7.
Understanding the cause of amyloid illnesses requires predicting the short protein fragments capable of forming amyloid-like fibril motifs, which aids the discovery of sequence-targeted anti-aggregation drugs. Because molecular techniques for identifying such fragments are limited, computational tools that provide affordable in silico predictions are highly desirable. In this research article, we study, from a machine learning perspective, the performance of several classifiers that use heterogeneous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine, Neural Network, Decision Tree and Random Forest, were trained and tested to find the classifier that best fits the problem domain. Prior to classification, novel implementations of two biologically inspired feature optimization techniques, based on evolutionary algorithms and on methodologies that mimic social life, and a multivariate method based on projection are used to remove unimportant and uninformative features. Among the dimensionality reduction algorithms considered in the study, prediction results show that those based on evolutionary computation are the most effective. Among the classifiers considered, SVM best suits the problem domain. The best classifier is also compared with an online predictor to demonstrate the balance it maintains between true positive and false positive rates. This exploratory study suggests that these methods are promising for amyloidogenicity prediction and may be further extended to large-scale proteomic studies.

8.
We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences ("k-mers") in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with SVM classifiers strongly outperform all recently presented supervised SVM methods. We further examine how to incorporate predicted secondary structure information into the profile kernel to obtain a small but significant performance improvement. We also show how we can use the learned SVM classifier to extract "discriminative sequence motifs"--short regions of the original profile that contribute almost all the weight of the SVM classification score--and show that these discriminative motifs correspond to meaningful structural features in the protein data. The use of PSI-BLAST profiles can be seen as a semi-supervised learning technique, since PSI-BLAST leverages unlabeled data from a large sequence database to build more informative profiles. Recently presented "cluster kernels" give general semi-supervised methods for improving SVM protein classification performance. We show that our profile kernel results also outperform cluster kernels while providing much better scalability to large datasets.
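To make the idea of a k-mer string kernel concrete, here is a sketch of the plain spectrum kernel, which counts exact k-mer matches only. It is a deliberately simplified stand-in: the profile kernel described in the abstract additionally allows inexact matches within profile-defined mutation neighborhoods, which is not implemented here. The function names and example sequences are hypothetical.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every contiguous k-length subsequence in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

def spectrum_kernel(seq_a, seq_b, k):
    """Inner product of the two k-mer count vectors (exact matches only)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(count * counts_b[kmer] for kmer, count in counts_a.items())

# Two short made-up peptide fragments sharing the 3-mers KVL and VLA.
print(spectrum_kernel("MKVLA", "KVLAK", k=3))
```

A kernel value computed this way for every pair of sequences gives the Gram matrix that an SVM with a precomputed kernel would consume.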

9.
Zhao N, Pang B, Shyu CR, Korkin D. Proteomics, 2011, 11(22): 4321-4330
Structural knowledge about protein-protein interactions can provide insights into the basic processes underlying cell function. Recent progress in experimental and computational structural biology has led to a rapid growth of experimentally resolved structures and computationally determined near-native models of protein-protein interactions. However, determining whether a protein-protein interaction is physiological or an artifact of an experimental or computational method remains a challenging problem. In this work, we have addressed two related problems. The first problem is distinguishing between experimentally obtained physiological and crystal-packing protein-protein interactions. The second problem concerns the classification of near-native and inaccurate docking models. We first defined a universal set of interface features and employed a support vector machine (SVM)-based approach to classify the interactions for both problems, with the accuracy, precision, and recall of the classifier for the first problem reaching 93%. To improve the classification, we next developed a semi-supervised learning approach for the second problem, using transductive SVM (TSVM). We applied both classifiers to a commonly used protein docking benchmark of 124 complexes. We found that, while we reached classification accuracies of 78.9% for the SVM classifier and 80.3% for the TSVM classifier, improving protein-docking methods by model re-ranking remains a challenging problem.

10.
IRBM, 2020, 41(4): 195-204
Objectives: Mammography mass recognition is considered a very challenging pattern recognition problem due to the high similarity between normal and abnormal masses. Therefore, the main objective of this study is to develop an efficient and optimized two-stage recognition model to tackle this recognition task. Material and methods: The developed recognition model combines an ensemble of linear Support Vector Machine (SVM) classifiers with a Reinforcement Learning-based Memetic Particle Swarm Optimizer (RLMPSO), giving the RLMPSO-SVM recognition model. RLMPSO is used to construct a two-stage ensemble of linear SVM classifiers by simultaneously performing SVM parameter tuning, feature selection, and training instance selection. The first stage of the RLMPSO-SVM model is responsible for recognizing input ROI mammography masses as normal or abnormal. The second stage then further classifies abnormal ROIs as malignant or benign masses. To evaluate the effectiveness of RLMPSO-SVM, a total of 1187 normal ROIs, 111 malignant ROIs, and 135 benign ROIs were randomly selected from DDSM database images. Results: The RLMPSO-SVM model achieved a sensitivity of 97.57% with a specificity of 97.86% for normal vs. abnormal recognition. For malignant vs. benign recognition, it achieved a sensitivity of 97.81% with a specificity of 96.92%. Conclusion: The reported results indicate that the RLMPSO-SVM recognition model is an effective tool that could assist the radiologist in diagnosing abnormalities in mammography images. RLMPSO-SVM significantly outperformed various SVM-based models as well as other computational intelligence models, including multi-layer perceptron, naive Bayes classifier, and k-nearest neighbor.
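The sensitivity and specificity figures quoted in this abstract follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes them from label lists; the function name and the toy labels are hypothetical and are not data from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """Return (sensitivity, specificity) treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            if guess == positive:
                tp += 1   # positive case correctly detected
            else:
                fn += 1   # positive case missed
        else:
            if guess == positive:
                fp += 1   # negative case wrongly flagged
            else:
                tn += 1   # negative case correctly rejected
    return tp / (tp + fn), tn / (tn + fp)

# Toy first-stage output: 4 abnormal and 6 normal regions of interest.
y_true = ["abnormal"] * 4 + ["normal"] * 6
y_pred = ["abnormal", "abnormal", "abnormal", "normal",
          "normal", "normal", "normal", "normal", "normal", "abnormal"]
sens, spec = sensitivity_specificity(y_true, y_pred, positive="abnormal")
print(sens, spec)
```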

11.
Genomics, 2020, 112(5): 3089-3096
Automatic classification of glaucoma from fundus images is a vital diagnostic tool for Computer-Aided Diagnosis (CAD) systems. In this work, a novel fused feature extraction technique and ensemble classifier fusion are proposed for the diagnosis of glaucoma. The proposed method comprises three stages. Initially, the fundus images are subjected to preprocessing, followed by feature extraction and feature fusion by Intra-Class and Extra-Class Discriminative Correlation Analysis (IEDCA). The feature fusion approach eliminates between-class correlation while retaining sufficient Feature Dimension (FD) for Correlation Analysis (CA). The fused features are then fed individually to the classifiers, namely Support Vector Machine (SVM), Random Forest (RF) and K-Nearest Neighbor (KNN). Finally, a classifier fusion stage combines the decisions of the ensemble of classifiers using the Consensus-based Combining Method (CCM). CCM-based classifier fusion adjusts the weights iteratively after comparing the outputs of all the classifiers. The proposed fusion classifier improves accuracy and convergence compared to the individual algorithms. A classification accuracy of 99.2% is achieved by the two-level hybrid fusion approach. The method is evaluated on the public High Resolution Fundus (HRF) and DRIVE datasets with cross-dataset validation.

12.
Prototype based classifiers are effective algorithms in modeling classification problems and have been applied in multiple domains. While many supervised learning algorithms have been successfully extended to kernels to improve the discrimination power by means of the kernel concept, prototype based classifiers are typically still used with Euclidean distance measures. Kernelized variants of prototype based classifiers are currently too complex to be applied to larger data sets. Here we propose an extension of Kernelized Generalized Learning Vector Quantization (KGLVQ) employing a sparsity and approximation technique to reduce the learning complexity. We provide generalization error bounds and experimental results on real world data, showing that the extended approach is comparable to SVM on several public data sets.

13.
The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.

14.
Dabney AR, Storey JD. PloS one, 2007, 2(10): e1002
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
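For orientation, a plain nearest-centroid classifier restricted to a chosen feature subset can be sketched as follows. This is the baseline idea only: the greedy optimal feature selection and the shrinkage centroid estimates proposed in the abstract are not implemented, and the function names and data are hypothetical.

```python
def fit_centroids(samples, labels):
    """Average the feature vectors of each class (maximum likelihood centroids)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x, features=None):
    """Assign x to the class with the closest centroid.

    If `features` is given, only those feature indices enter the distance,
    which is where a feature selection step would plug in.
    """
    idx = range(len(x)) if features is None else features

    def sq_dist(centroid):
        return sum((x[i] - centroid[i]) ** 2 for i in idx)

    return min(centroids, key=lambda label: sq_dist(centroids[label]))

samples = [[1.0, 0.0], [1.2, 0.2], [4.0, 5.0], [4.2, 4.8]]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(samples, labels)
print(predict(centroids, [3.9, 0.0]))                # uses both features
print(predict(centroids, [3.9, 0.0], features=[0]))  # uses feature 0 only
```

The two calls disagree on the same query point, which shows why the choice of feature subset matters for this kind of classifier.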

15.
We propose a novel method for recognizing sequential patterns such as motion trajectories of biological objects (e.g., cells, organelles, protein molecules), human behavior motion, and meteorological data. In the proposed method, a local classifier is prepared for every point (or timing or frame), and the whole pattern is then recognized by majority voting over the recognition results of the local classifiers. The voting strategy has the strong benefit that even if an input pattern deviates greatly from a prototype locally at several points, these deviations do not severely influence the recognition result; they are treated as just a few incorrect votes and are thus neglected through the majority voting. To regularize the recognition result, we introduce partial dependency between local classifiers. An important point is that this dependency is introduced not only between local classifiers at neighboring point pairs but also between those at distant point pairs. Although the dependency makes the problem non-Markovian (i.e., higher-order Markovian), it can still be solved efficiently by using a graph cut algorithm with polynomial-order computations. The experimental results revealed that the proposed method achieves better recognition accuracy by exploiting these characteristics.

16.
Species- and individual-specific animal calls can be used in identification, as verified in playback experiments and analyses of features extracted from these signals. The use of machine-learning methods and acoustic features borrowed from human speech recognition to identify animals at the species and individual level has increased recently. To date there have been few studies comparing the performance of these methods and features for call-type-independent species and individual identification. We compared the performance of four machine-learning classifiers in the identification of ten passerine species, and in individual identification for three passerines, using two acoustic features. The methods did not require us to pre-categorize the component syllables in call-type-independent species and individual identification systems. The results of our experiment indicated that support vector machines (SVM) generally performed best regardless of which acoustic feature was used; that linear predictive coefficients (LPCs) greatly increased the recognition accuracies of hidden Markov models (HMM); and that the most appropriate classifiers for LPCs and Mel-frequency cepstral coefficients (MFCCs) were HMM and SVM, respectively. This study will assist researchers in selecting classifiers and features to use in future species and individual recognition studies.

17.
Support vector machine applications in bioinformatics   (Total citations: 14; self-citations: 0; citations by others: 14)

18.
N. Bhaskar, M. Suchetha. IRBM, 2021, 42(4): 268-276
Objectives: In this paper, we propose a computationally efficient Correlational Neural Network (CorrNN) learning model and an automated diagnosis system for detecting Chronic Kidney Disease (CKD). A Support Vector Machine (SVM) classifier is integrated with the CorrNN model to improve the prediction accuracy. Material and methods: The proposed hybrid model is trained and tested with a novel sensing module. We monitored the concentration of urea in saliva samples to detect the disease. Experiments were carried out to test the model with real-time samples and to compare its performance with a conventional Convolutional Neural Network (CNN) and other traditional data classification methods. Results: The proposed method outperforms the conventional methods in terms of computational speed and prediction accuracy. The combined CorrNN-SVM network achieved a prediction accuracy of 98.67%. The experimental evaluations show a reduction in overall computation time of about 9.85% compared to the conventional CNN algorithm. Conclusion: The use of the SVM classifier has improved the capability of the network to make accurate predictions. The proposed framework substantially advances the current methodology, and it provides more precise results compared to other data classification methods.

19.
Mass spectrometry (MS)-based metabolomics studies often require handling of both identified and unidentified metabolite data. To avoid bias in data interpretation, it would be advantageous for the data analysis to include all available data. A practical challenge in exploratory metabolomics analysis is therefore how to interpret the changes related to unidentified peaks. In this paper, we address the challenge by predicting the class membership of unknown peaks, applying and comparing multiple supervised classifiers on selected lipidomics datasets. The employed classifiers include k-nearest neighbours (k-NN), support vector machines (SVM), partial least squares discriminant analysis (PLS-DA) and Naive Bayes methods, which are known to be effective and efficient in predicting the labels of unseen data. Here, class label predictions are sought for unidentified lipid profiles coming from high-throughput global screening in an Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC™/MS) experimental setup. Our investigation reveals that the k-NN and SVM classifiers outperform both the PLS-DA and Naive Bayes classifiers. The Naive Bayes classifier performs worst among all models, and this observation seems logical, as lipids are highly co-regulated and do not satisfy the Naive Bayes assumption that features are conditionally independent given the class. Common label predictions from k-NN and SVM can serve as a good starting point for exploring the full data, thereby facilitating exploratory studies where label information is critical for data interpretation.
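The k-NN classifier compared in this abstract can be sketched in a few lines. The example assumes Euclidean distance and simple majority voting, which the abstract does not specify, and the two-feature points and lipid class labels ("PC", "TG") are invented for illustration. It requires Python 3.8 or later for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_x, train_y),
                        key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical identified peaks described by two features each.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["PC", "PC", "PC", "TG", "TG", "TG"]

# Predict the class of two unidentified peaks.
print(knn_predict(train_x, train_y, (0.5, 0.5)))
print(knn_predict(train_x, train_y, (5.5, 5.5)))
```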

20.
Microarrays have thousands to tens-of-thousands of gene features, but only a few hundred patient samples are available. The fundamental problem in microarray data analysis is identifying genes whose disruption causes congenital or acquired disease in humans. In this paper, we propose a new evolutionary method that can efficiently select a subset of potentially informative genes for support vector machine (SVM) classifiers. The proposed evolutionary method uses SVM with a given subset of gene features to evaluate the fitness function, and new subsets of features are selected based on the estimates of generalization error of SVMs and frequency of occurrence of the features in the evolutionary approach. Thus, in theory, selected genes reflect to some extent the generalization performance of SVM classifiers. We compare our proposed method with several existing methods and find that the proposed method can obtain better classification accuracy with a smaller number of selected genes than the existing methods.
