Similar literature
1.

Background

Millions of cells are present in thousands of images created in high-throughput screening (HTS). Biologists could classify each of these cells into a phenotype by visual inspection. But in the presence of millions of cells this visual classification task becomes infeasible. Biologists train classification models on a few thousand visually classified example cells and iteratively improve the training data by visual inspection of the important misclassified phenotypes. Classification methods differ in performance and performance evaluation time. We present a comparative study of computational performance of gentle boosting, joint boosting CellProfiler Analyst (CPA), support vector machines (linear and radial basis function) and linear discriminant analysis (LDA) on two data sets of HT29 and HeLa cancer cells.

Results

For the HT29 data set we find that gentle boosting, SVM (linear) and SVM (RBF) are close in performance but SVM (linear) is faster than gentle boosting and SVM (RBF). For the HT29 data set the average performance difference between SVM (RBF) and SVM (linear) is 0.42 %. For the HeLa data set we find that SVM (RBF) outperforms other classification methods and is on average 1.41 % better in performance than SVM (linear).

Conclusions

Our study proposes SVM (linear) for iterative improvement of the training data and SVM (RBF) for the final classifier to classify all unlabeled cells in the whole data set.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-342) contains supplementary material, which is available to authorized users.

2.
Using surface electromyography (sEMG) signals for efficient recognition of hand gestures has attracted increasing attention during the last decade. Most previous work has focused on recognition of upper arm and gross hand movements, with some work on the classification of individual finger movements such as finger typing tasks. However, relatively few investigations can be found in the literature on automatic classification of multiple finger movements such as finger number gestures. This paper focuses on the recognition of number gestures based on a 4-channel wireless sEMG system. We investigate the effects of three popular feature types (i.e. Hudgins' time-domain features (TD), autocorrelation and cross-correlation coefficients (ACCC) and spectral power magnitudes (SPM)) and four popular classification algorithms (i.e. k-nearest neighbor (k-NN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and support vector machine (SVM)) in offline recognition. Motivated by the good performance of SVM, we further propose combining the three features and employing a new classification method, multiple kernel learning SVM (MKL-SVM). Real sEMG results from six subjects show that all combinations, except k-NN or LDA using ACCC features, can achieve above 91% average recognition accuracy, and the highest accuracy is 97.93%, achieved by the proposed MKL-SVM method using the three-feature combination (3F). Building on the offline recognition results, we also implement a real-time recognition system. Our results show that all six subjects can achieve a real-time recognition accuracy higher than 90%. The number gestures are therefore promising for practical applications such as human-computer interaction (HCI).

3.
N. Bhaskar  M. Suchetha 《IRBM》2021,42(4):268-276

Objectives

In this paper, we propose a computationally efficient Correlational Neural Network (CorrNN) learning model and an automated diagnosis system for detecting Chronic Kidney Disease (CKD). A Support Vector Machine (SVM) classifier is integrated with the CorrNN model for improving the prediction accuracy.

Material and methods

The proposed hybrid model is trained and tested with a novel sensing module. We have monitored the concentration of urea in the saliva sample to detect the disease. Experiments are carried out to test the model with real-time samples and to compare its performance with conventional Convolutional Neural Network (CNN) and other traditional data classification methods.

Results

The proposed method outperforms the conventional methods in terms of computational speed and prediction accuracy. The CorrNN-SVM combined network achieved a prediction accuracy of 98.67%. The experimental evaluations show a reduction in overall computation time of about 9.85% compared to the conventional CNN algorithm.

Conclusion

The use of the SVM classifier has improved the capability of the network to make predictions more accurately. The proposed framework substantially advances the current methodology, and it provides more precise results compared to other data classification methods.

4.
In the drug discovery process, the metabolic fate of drugs is crucially important to prevent drug-drug interactions. Therefore, P450 isozyme selectivity prediction is an important task for screening drugs with appropriate metabolic profiles. Recently, large-scale activity data of five P450 isozymes (CYP1A2, CYP2C9, CYP3A4, CYP2D6, and CYP2C19) have been obtained using quantitative high-throughput screening with a bioluminescence assay. Although some isozymes share similar selectivities, conventional supervised learning algorithms learn a prediction model for each P450 isozyme independently. They are unable to exploit the activity data of the other P450 isozymes to improve the predictive performance for each isozyme's selectivity. To address this issue, we apply transfer learning that uses activity data of the other isozymes to learn a prediction model from multiple P450 isozymes. Using the large-scale P450 isozyme selectivity dataset for five P450 isozymes, we evaluate the model's predictive performance. Experimental results show that, overall, our algorithm outperforms conventional supervised learning algorithms such as support vector machine (SVM), weighted k-nearest neighbor classifier, Bagging, Adaboost, and latent semantic indexing (LSI). Moreover, our results show that the predictive performance of our algorithm is improved by exploiting the multiple P450 isozyme activity data in the learning process. Our algorithm can be an effective tool for P450 selectivity prediction for new chemical entities using multiple P450 isozyme activity data.

5.
We consider the efficient initialization of the structure and parameters of generalized Gaussian radial basis function (RBF) networks using fuzzy decision trees generated by fuzzy ID3-like induction algorithms. The initialization scheme is based on the proposed functional equivalence property of fuzzy decision trees and generalized Gaussian RBF networks. The resulting RBF network is compact, easy to induce, comprehensible, and has acceptable classification accuracy when trained with a stochastic gradient descent learning algorithm.

6.
Chopra P  Lee J  Kang J  Lee S 《PloS one》2010,5(12):e14305
Recent studies suggest that the deregulation of pathways, rather than individual genes, may be critical in triggering carcinogenesis. The pathway deregulation is often caused by the simultaneous deregulation of more than one gene in the pathway. This suggests that robust gene pair combinations may exploit the underlying bio-molecular reactions that are relevant to the pathway deregulation and thus they could provide better biomarkers for cancer, as compared to individual genes. In order to validate this hypothesis, in this paper, we used gene pair combinations, called doublets, as input to the cancer classification algorithms, instead of the original expression values, and we showed that the classification accuracy was consistently improved across different datasets and classification algorithms. We validated the proposed approach using nine cancer datasets and five classification algorithms including Prediction Analysis for Microarrays (PAM), C4.5 Decision Trees (DT), Naive Bayesian (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN).

7.
Understanding the cause of amyloid illnesses requires predicting the short protein fragments capable of forming amyloid-like fibril motifs, which aids the discovery of sequence-targeted anti-aggregation drugs. Because molecular techniques for identifying these fragments are limited, computational tools that provide affordable in silico predictions are highly desirable. In this research article, we study, from a machine learning perspective, the performance of several classifiers that use heterogeneous features based on biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine, Neural network, Decision tree and Random forest, were trained and tested to find the classifier that best fits the problem domain. Prior to classification, novel implementations of two biologically inspired feature optimization techniques, based on evolutionary algorithms and on methodologies that mimic social life, and a multivariate method based on projection are used to remove unimportant and uninformative features. Among the dimensionality reduction algorithms considered in the study, prediction results show that algorithms based on evolutionary computation are the most effective. Among the classifiers considered, SVM suits the problem domain best. The best classifier is also compared with an online predictor to show the balance it maintains between true positive rates and false positive rates. This exploratory study suggests that these methods are promising for amyloidogenicity prediction and may be further extended to large-scale proteomic studies.

8.
This paper presents pruning and model-selection algorithms for support vector learning in sample classification and function regression. When constructing an RBF network by support vector learning, we occasionally obtain redundant support vectors which do not significantly affect the final classification and function approximation results. The pruning algorithms are primarily based on the sensitivity measure and the penalty term. The kernel function parameters and the position of each support vector are updated so as to minimize the increase in error, which makes the structure of the SVM network more flexible. We illustrate this approach with a synthetic data simulation and a face detection problem in order to demonstrate the pruning effectiveness.

9.
Mass spectrometry (MS)-based metabolomics studies often require handling of both identified and unidentified metabolite data. In order to avoid bias in data interpretation, it would be advantageous for the data analysis to include all available data. A practical challenge in exploratory metabolomics analysis is therefore how to interpret the changes related to unidentified peaks. In this paper, we address the challenge by predicting the class membership of unknown peaks by applying and comparing multiple supervised classifiers on selected lipidomics datasets. The employed classifiers include k-nearest neighbours (k-NN), support vector machines (SVM), partial least squares and discriminant analysis (PLS-DA) and Naive Bayes methods, which are known to be effective and efficient in predicting the labels for unseen data. Here, the class label predictions are sought for unidentified lipid profiles coming from high-throughput global screening in an Ultra Performance Liquid Chromatography Mass Spectrometry (UPLC™/MS) experimental setup. Our investigation reveals that k-NN and SVM classifiers outperform both PLS-DA and Naive Bayes classifiers. The Naive Bayes classifier performs worst among all models; this observation seems logical, as lipids are highly co-regulated and do not respect the Naive Bayes assumption that features are conditionally independent given the class. Common label predictions from k-NN and SVM can serve as a good starting point to explore the full data, thereby facilitating exploratory studies where label information is critical for the data interpretation.

10.
Computational models of cytochrome P450 3A4 inhibition were developed based on high-throughput screening data for 4470 proprietary compounds. Multiple models differentiating inhibitors (IC50 < 3 µM) and noninhibitors were generated using various machine-learning algorithms (recursive partitioning [RP], Bayesian classifier, logistic regression, k-nearest-neighbor, and support vector machine [SVM]) with structural fingerprints and topological indices. Nineteen models were evaluated by internal 10-fold cross-validation and also by an independent test set. The three most predictive models, Barnard Chemical Information (BCI)-fingerprint/SVM, MDL-keyset/SVM, and topological indices/RP, correctly classified 249, 248, and 236 compounds of 291 noninhibitors and 135, 137, and 147 compounds of 179 inhibitors in the validation set. Their overall accuracies were 82%, 82%, and 81%, respectively. An investigation of the applicability of the BCI/SVM model found a strong correlation between the predictive performance and the structural similarity to the training set. Using the Tanimoto similarity index as a confidence measurement for the predictions, the limit of extrapolation was 0.7 in the case of the BCI/SVM model. Taking the consensus of the 3 best models yielded a further improvement in predictive capability, with kappa = 0.65 and accuracy = 83%. The consensus model could also be tuned to minimize either false positives or false negatives depending on the emphasis of the screening.
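
As a worked check of the counts reported in this abstract, the following Python sketch recomputes overall accuracy and Cohen's kappa for the BCI-fingerprint/SVM model (249 of 291 noninhibitors and 135 of 179 inhibitors correctly classified). The function and variable names are illustrative assumptions, not taken from the paper, and the kappa computed here applies to that single model only, not to the consensus model for which the abstract reports kappa = 0.65.

```python
def accuracy_and_kappa(tn, fp, fn, tp):
    """Return (accuracy, Cohen's kappa) for a 2x2 confusion matrix.

    tn/fp: noninhibitors classified correctly / incorrectly
    fn/tp: inhibitors classified incorrectly / correctly
    """
    total = tn + fp + fn + tp
    observed = (tn + tp) / total  # observed agreement = accuracy

    # Agreement expected by chance, from the row and column totals.
    actual_neg, actual_pos = tn + fp, fn + tp
    pred_neg, pred_pos = tn + fn, fp + tp
    expected = (actual_neg * pred_neg + actual_pos * pred_pos) / total**2

    if expected == 1.0:  # degenerate case: only one class present
        return observed, 0.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Counts for the BCI-fingerprint/SVM model as given in the abstract:
# 291 noninhibitors (249 correct), 179 inhibitors (135 correct).
acc, kappa = accuracy_and_kappa(tn=249, fp=291 - 249, fn=179 - 135, tp=135)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

The accuracy from these counts rounds to the 82% stated in the abstract, which is a useful consistency check when only per-class counts are published.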

11.
Anticipated hand movements of amputee subjects are considered difficult to classify using only electromyogram (EMG) signals and machine learning techniques. For a long time, classifying such sEMG signals has been considered a non-linear problem, and the problem of signal sparsity has not been given detailed attention for large sets of action classes. To address these problems, this paper proposes a linear-time classifier termed Random Fourier Mapped Collaborative Representation with distance weighted Tikhonov regularization matrix (RFMCRT). RFMCRT attempts to tackle the non-linear problem via Random Fourier Features and the sparsity issue with collaborative representation. The projection error of Random Fourier Features is reduced by projecting to the same dimension as the original feature space and then finding the collaborative representation, with an optional non-negative constraint (RFMNNCRT). The two proposed classifiers were tested with time-domain features computed from the EMG signals obtained from NINAPRO databases using a non-overlapping sliding window of 256 ms. Because the proposed classifiers are randomized, this paper computes the average and worst-case performance over 50 trials and compares them with other reported classifiers. The results show that RFMNNCRT (average case) outperformed state-of-the-art classifiers, with an accuracy of 93.44% for intact subjects and 55.67% for amputee subjects. In the worst case, RFMCRT still performs well on the same task, with reported accuracies of 91.55% and 50.27%, respectively. Our proposed classifier guarantees acceptable levels of accuracy for large classes of hand movements and also maintains good computational efficiency in comparison to LDA and SVM.
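
To make the Random Fourier Features idea mentioned in this abstract concrete, here is a minimal Python sketch of the generic technique: a random cosine feature map whose inner products approximate an RBF kernel, so that a linear method can be applied afterwards. It is not the RFMCRT classifier itself; the collaborative representation and Tikhonov regularization steps are omitted, the sketch uses many more random features than the input dimension (unlike the same-dimension projection described above), and all names and parameter values are assumptions chosen for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_random_fourier_map(dim, n_features, gamma, seed=0):
    """Build a feature map z with z(x) . z(y) ~ exp(-gamma * ||x - y||^2).

    Frequencies are drawn from N(0, 2 * gamma) per coordinate and phase
    offsets uniformly from [0, 2 * pi].
    """
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


# Hypothetical 3-dimensional feature vectors, for illustration only.
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
gamma = 0.5

transform = make_random_fourier_map(dim=3, n_features=2000, gamma=gamma)
approx = sum(a * b for a, b in zip(transform(x), transform(y)))
print("exact kernel:      ", rbf_kernel(x, y, gamma))
print("random-feature dot:", approx)
```

Because the map is random, the approximation varies with the seed, which is in keeping with the abstract's choice to report both average and worst-case performance over repeated trials.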

12.
Song S  Zhan Z  Long Z  Zhang J  Yao L 《PloS one》2011,6(2):e17191

Background

Support vector machine (SVM) has been widely used as an accurate and reliable method to decipher brain patterns from functional MRI (fMRI) data. Previous studies have not found a clear benefit for non-linear (polynomial kernel) SVM over linear SVM. Here, a more effective non-linear SVM using a radial basis function (RBF) kernel is compared with linear SVM. Unlike traditional studies, which focused merely on either the evaluation of different types of SVM or the voxel selection methods, we aimed to investigate the overall performance of linear and RBF SVM for fMRI classification together with voxel selection schemes, in terms of classification accuracy and computation time.

Methodology/Principal Findings

Six different voxel selection methods were employed to decide which voxels of fMRI data would be included in SVM classifiers with linear and RBF kernels in classifying 4-category objects. Then the overall performances of voxel selection and classification methods were compared. Results showed that: (1) voxel selection had an important impact on the classification accuracy of the classifiers: in a relatively low-dimensional feature space, RBF SVM significantly outperformed linear SVM; in a relatively high-dimensional space, linear SVM performed better than its counterpart; (2) considering classification accuracy and computation time together, linear SVM with relatively more voxels as features and RBF SVM with a small set of voxels (after PCA) achieved better accuracy at lower time cost.

Conclusions/Significance

The present work provides the first empirical result of linear and RBF SVM in classification of fMRI data, combined with voxel selection methods. Based on the findings, if only classification accuracy is of concern, RBF SVM with an appropriately small set of voxels and linear SVM with relatively more voxels are the two suggested solutions; if users are more concerned about computational time, RBF SVM with a relatively small set of voxels, with part of the principal components kept as features, is the better choice.

13.
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0% and 92.4% in level I and level II subfamily classification, respectively, while SVM has a reported accuracy of 88.4% and 86.3%. This is a 39.7% and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families, as shown by applying it to the superfamily of nuclear receptors, with 94.5%, 97.8% and 93.6% accuracy in family, level I and level II subfamily classification, respectively.
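
The "reduction in residual error" figures in this abstract follow directly from the reported accuracies. The short Python sketch below reproduces the arithmetic; the function name is an illustrative assumption, and the inputs are the accuracies quoted above (SVM as the baseline, Naive Bayes as the improved classifier).

```python
def residual_error_reduction(baseline_acc, improved_acc):
    """Percentage of the baseline's error removed by the improved model.

    Both arguments are accuracies in percent. The residual error of a
    classifier is 100 minus its accuracy.
    """
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Level I subfamilies: SVM 88.4%, Naive Bayes 93.0%
print(round(residual_error_reduction(88.4, 93.0), 1))  # → 39.7
# Level II subfamilies: SVM 86.3%, Naive Bayes 92.4%
print(round(residual_error_reduction(86.3, 92.4), 1))  # → 44.5
```

Expressing the gain this way highlights that a few points of accuracy near the top of the scale correspond to a large share of the remaining errors.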

14.
In recent years, the number of wildfires has increased all over the world. Therefore, mapping wildfire susceptibility is crucial for prevention, early detection, and supporting wildfire management decisions. This study aims to generate Machine Learning (ML) based wildfire susceptibility maps for Adana and Mersin provinces, which are located in the Mediterranean Region of Turkey. To generate a wildfire inventory, this study uses active fire pixels derived from MODIS monthly MCD14ML composites. Furthermore, as a secondary aim, the performance of seven ML approaches, namely the stand-alone algorithms Logistic Regression (LR), Support Vector Machine (SVM), and Linear Discriminant Analysis (LDA), and the ensemble algorithms Random Forest (RF), Gradient Boosting (GB), eXtreme Gradient Boosting (XGB), and AdaBoost (AB), was evaluated for wildfire susceptibility mapping. The capabilities of these ML methods were assessed using thirteen wildfire conditioning factors, which can be grouped into four main categories: topographical, meteorological, vegetation, and anthropogenic factors. The Information Gain (IG) approach was used to assess their importance scores. A multicollinearity analysis was also performed to assess the relationship between conditioning factors. To compare the predictive performances of the ML algorithms, five performance metrics, namely average accuracy, precision, recall, F1 score, and area under the curve (AUC), were used. To test the significance of the generated wildfire susceptibility maps and to detect similarities and differences among the outputs of these ML algorithms, McNemar's test was implemented. Finally, the ML-based models were locally interpreted using the Shapley Additive exPlanations (SHAP) technique. The AUC values of the seven methods varied from 0.817 to 0.879, and the accuracy scores ranged between 0.734 and 0.812. The results showed that the RF model provided the best results considering all performance metrics. The accuracy score and AUC value of the RF model were 0.812 and 0.879, respectively. On the other hand, the stand-alone algorithms (LDA, SVM, and LR) showed lower performance than the tree-based ensemble methods. Both the IG and SHAP analyses showed that elevation, temperature, and slope were the factors that contributed the most. The RF model found that 7.20% of the study area has very high wildfire susceptibility, and the majority of the wildfire samples (68.84%) fall within the very high susceptibility areas in the RF model. The outcomes of this study are likely to provide decision-makers with a better understanding of wildfires in the Eastern Mediterranean Region of Turkey.
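
McNemar's test, which this abstract uses to compare the outputs of pairs of classifiers, depends only on the samples where exactly one of the two models is correct. The Python sketch below shows the standard chi-square form of the test with an optional continuity correction; the labels and predictions are made-up toy data, the function names are assumptions, and the abstract does not say which variant of the test the authors used.

```python
import math


def discordant_counts(y_true, pred_a, pred_b):
    """Return (b, c): samples only model A got right, only model B got right."""
    b = c = 0
    for truth, a, other in zip(y_true, pred_a, pred_b):
        a_ok = a == truth
        other_ok = other == truth
        if a_ok and not other_ok:
            b += 1
        elif other_ok and not a_ok:
            c += 1
    return b, c


def mcnemar(b, c, continuity=True):
    """Return (statistic, p_value) for McNemar's chi-square test (1 d.o.f.)."""
    n = b + c
    if n == 0:  # the two models never disagree
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)
    stat = diff ** 2 / n
    # Survival function of a chi-square variable with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy example: hypothetical fire / no-fire labels and two models' predictions.
y_true = [1, 1, 0, 0, 1, 0, 1, 1]
model_a = [1, 0, 0, 1, 1, 0, 1, 1]
model_b = [1, 1, 1, 1, 0, 0, 0, 1]
b, c = discordant_counts(y_true, model_a, model_b)
print("discordant pairs:", b, c)
print("statistic, p-value:", mcnemar(b, c))
```

With so few discordant pairs, as in this toy data, an exact binomial version of the test would normally be preferred over the chi-square approximation.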

15.
Yao Y  Zhang T  Xiong Y  Li L  Huo J  Wei DQ 《Biotechnology journal》2011,6(11):1367-1376
The support vector machine (SVM), an effective statistical learning method, has been widely used in mutation prediction. Two factors, feature selection and parameter setting, have been shown to strongly influence the efficiency and accuracy of SVM classification. In this study, according to the principles of a genetic algorithm (GA) and SVM, we developed a GA-SVM program and applied it to human cytochrome P450s (CYP450s), which are important monooxygenases in phase I drug metabolism. The program optimizes features and parameters simultaneously, and hence fewer features are used and the overall prediction accuracy is improved. We focus on the mutation of non-synonymous single nucleotide polymorphisms (nsSNPs) in protein sequences that appear to exhibit significant influences on drug metabolism. The final predictive model performs satisfactorily, with a prediction accuracy of 61% and a cross-validation accuracy of 73%. The results indicate that the GA-SVM program is a powerful tool for optimizing mutation predictive models of nsSNPs of human CYP450s.

16.
Spectral fusion of Raman spectroscopy and Fourier infrared spectroscopy, combined with pattern recognition algorithms, is used to diagnose thyroid dysfunction from serum and to find the spectral segment with the highest sensitivity in order to speed up diagnosis further. Compared with infrared spectroscopy or Raman spectroscopy alone, the proposed approach improves the detection accuracy and yields more spectral features, indicating greater differences between thyroid dysfunction and normal serum samples. To discriminate between the samples, principal component analysis (PCA) was first used for feature extraction, to reduce the dimension of the high-dimensional spectral data, and for spectral fusion. Then, support vector machine (SVM), back propagation neural network, extreme learning machine and learning vector quantization algorithms were employed to establish the discriminant diagnostic models. The best analytical model, PCA-SVM, reached an accuracy of 83.48% with spectral fusion, compared with 78.26% for Raman spectra alone and 80% for infrared spectra alone. The accuracy of spectral fusion is higher than the accuracy of a single spectrum in five classifiers. The diagnostic accuracy of spectral fusion in the range of 2000 to 2500 cm⁻¹ is 81.74%, and using this range makes sample measurement and data analysis considerably faster than analysis of the full spectra. The results from our study demonstrate that the serum spectral fusion technique combined with multivariate statistical methods has great potential for the screening of thyroid dysfunction.

17.
One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the use of state-of-the-art learning machines, the Support Vector Machine (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant the samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method, namely Kernel Matrix Gene Selection (KMGS), is proposed. The rationale of the method, which is rooted in the fundamental ideas of the SVM algorithm, is described. The notion of "the separability of a sample", which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS), which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate.

18.
Acquisition of the standard plane is the prerequisite of biometric measurement and diagnosis during the ultrasound (US) examination. In this paper, a new algorithm is developed for the automatic recognition of the fetal facial standard planes (FFSPs) such as the axial, coronal, and sagittal planes. Specifically, densely sampled root scale invariant feature transform (RootSIFT) features are extracted and then encoded by Fisher vector (FV). The Fisher network with multi-layer design is also developed to extract spatial information to boost the classification performance. Finally, automatic recognition of the FFSPs is implemented by support vector machine (SVM) classifier based on the stochastic dual coordinate ascent (SDCA) algorithm. Experimental results using our dataset demonstrate that the proposed method achieves an accuracy of 93.27% and a mean average precision (mAP) of 99.19% in recognizing different FFSPs. Furthermore, the comparative analyses reveal the superiority of the proposed method based on FV over the traditional methods.

19.
张霞  李占斌  张振文  邓彦 《生态学报》2012,32(21):6788-6794
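To predict groundwater dynamics in the Luohuiqu irrigation district of Shaanxi, and building on a comprehensive analysis of existing methods for studying groundwater dynamics, we propose a prediction method for the district based on support vector machines (SVM) and an improved BP neural network model. The corresponding programs were written in MATLAB and the corresponding groundwater dynamics prediction models were built. Using multi-year measured data from the irrigation district as learning and test samples, we compared the predictive performance of the two models. The results show that both the SVM model and the BP network model achieve high simulation accuracy during sample training, whereas in the sample learning stage the prediction accuracy of the SVM is clearly better than that of the BP network, and the SVM describes the complex coupled relationships of groundwater dynamics well. The SVM approach is feasible, is better suited to groundwater dynamics prediction in large irrigation districts, and supplements and improves on traditional methods for studying groundwater dynamics.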

20.
A support vector machine (SVM) modeling approach for short-term load forecasting is proposed. The SVM learning scheme is applied to the power load data, forcing the network to learn the inherent internal temporal property of power load sequence. We also study the performance when other related input variables such as temperature and humidity are considered. The performance of our proposed SVM modeling approach has been tested and compared with feed-forward neural network and cosine radial basis function neural network approaches. Numerical results show that the SVM approach yields better generalization capability and lower prediction error compared to those neural network approaches.
