Similar articles
1.
We compared the classification accuracy of two sections of the fungal internal transcribed spacer (ITS) region, individually and combined, and the 5′ section (about 600 bp) of the large-subunit rRNA (LSU), using a naive Bayesian classifier and BLASTN. A hand-curated ITS-LSU training set of 1,091 sequences and a larger training set of 8,967 ITS region sequences were used. Of the factors evaluated, database composition and quality had the largest effect on classification accuracy, followed by fragment size and use of a bootstrap cutoff to improve classification confidence. The naive Bayesian classifier and BLASTN gave similar results at higher taxonomic levels, but the classifier was faster and more accurate at the genus level when a bootstrap cutoff was used. All of the ITS and LSU sections performed well (>97.7% accuracy) at higher taxonomic ranks from kingdom to family, and differences between them were small at the genus level (within 0.66 to 1.23%). When full-length sequence sections were used, the LSU outperformed the ITS1 and ITS2 fragments at the genus level, but the ITS1 and ITS2 showed higher accuracy when smaller fragment sizes of the same length and a 50% bootstrap cutoff were used. In a comparison using the larger ITS training set, ITS1 and ITS2 had very similar classification accuracy for fragments between 100 and 200 bp. Collectively, the results show that any of the ITS or LSU sections we tested provided comparable classification accuracy to the genus level and underscore the need for larger and more diverse classification training sets.

2.
In this paper we propose a new technique that adaptively extracts subject specific motor imagery related EEG patterns in the space–time–frequency plane for single trial classification. The proposed approach requires no prior knowledge of reactive frequency bands, their temporal behavior or cortical locations. For a given electrode array, it finds all these parameters by constructing electrode adaptive time–frequency segmentations that are optimized for discrimination. This is accomplished first by segmenting the EEG along the time axis with Local Cosine Packets. Next the most discriminant frequency subbands are selected in each time segment with a frequency axis clustering algorithm to achieve time and frequency band adaptation individually. Finally the subject adapted features are sorted according to their discrimination power to reduce dimensionality and the top subset is used for final classification. We provide experimental results for 5 subjects of the BCI competition 2005 dataset IVa to show the superior performance of the proposed method. In particular, we demonstrate that by using a linear support vector machine as a classifier, the classification accuracy of the proposed algorithm varied between 90.5% and 99.7% and the average classification accuracy was 96%.

3.
Identifying biomarkers that are indicative of a phenotypic state is difficult because of the amount of natural variability which exists in any population. While there are many different algorithms to select biomarkers, previous investigation shows the sensitivity and flexibility of support vector machines (SVM) make them an attractive candidate. Here we evaluate the ability of support vector machine recursive feature elimination (SVM-RFE) to identify potential metabolic biomarkers in liquid chromatography mass spectrometry untargeted metabolite datasets. Two separate experiments are considered, a low variance (low biological noise) prokaryotic stress experiment, and a high variance (high biological noise) mammalian stress experiment. For each experiment, the phenotypic response to stress is metabolically characterized. SVM-based classification and metabolite ranking is undertaken using a systematically reduced number of biological replicates to evaluate the impact of sample size on biomarker reproducibility and robustness. Our results indicate the highest ranked 1% of metabolites, the most predictive of the physiological state, were identified by SVM-RFE even when the number of training examples was small (≥3) and the coefficient of variation was high (>0.5). An accuracy analysis shows filtering with recursive feature elimination measurably improves SVM classification accuracy, an effect that is pronounced when the number of training examples is small. These results indicate that SVM-RFE can be successful at biomarker identification even in challenging scenarios where the training examples are noisy and the number of biological replicates is low.

4.
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the ‘state-of-the-art’ transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. 
We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improved the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools is determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
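
The hierarchical filtering described in this abstract can be sketched as a chain of binary classifiers, where a text only reaches the next, more specific stage if it passed the previous one. This is a minimal illustration, not the authors' pipeline: the three keyword rules are hypothetical stand-ins for trained models, and the stage names and criteria are assumptions, since the abstract does not state the actual relevancy criteria.

```python
from typing import Callable, List, Tuple

# Each stage is (name, predicate). The predicates are hypothetical keyword rules
# standing in for trained binary text classifiers.
Stage = Tuple[str, Callable[[str], bool]]


def mentions_bats(text: str) -> bool:
    return "bat" in text.lower()


def mentions_exploitation(text: str) -> bool:
    lowered = text.lower()
    return any(stem in lowered for stem in ("hunt", "kill", "persecut"))


def is_reported_event(text: str) -> bool:
    return "reported" in text.lower()


STAGES: List[Stage] = [
    ("about bats", mentions_bats),
    ("about exploitation", mentions_exploitation),
    ("reported event", is_reported_event),
]


def stages_passed(text: str, stages: List[Stage] = STAGES) -> int:
    """Return how many consecutive stages the text passes (0 = rejected at stage 1)."""
    passed = 0
    for _name, is_relevant in stages:
        if not is_relevant(text):
            break  # later, more specific stages never see this text
        passed += 1
    return passed
```

Because most collected texts are rejected at the first stage, the later stages only need to handle a small and more balanced subset, which is the practical benefit the abstract attributes to the hierarchical design.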

5.
A recently developed machine learning algorithm referred to as Extreme Learning Machine (ELM) was used to classify machine control commands out of time series of spike trains of ensembles of CA1 hippocampus neurons (n = 34) of a rat, which was performing a target-to-goal task on a two-dimensional space through a brain-machine interface system. Performance of ELM was analyzed in terms of training time and classification accuracy. The results showed that some processes such as class code prefix, redundancy code suffix and smoothing effect of the classifiers' outputs could improve the accuracy of classification of robot control commands for a brain-machine interface system.

6.
OBJECTIVE: To investigate the potential value of morphometry and discriminant analysis for the classification of benign and malignant gastric cells and lesions. STUDY DESIGN: The data set consisted of 13,300 cells from 120 cases composed of 30 cases of cancer, 26 cases of gastritis and 64 cases of ulcer according to the final histologic diagnosis. The cytologic diagnosis was divided into 5 categories (gastritis, ulcer, inflammatory dysplasia, cancer and true dysplasia). Classification was attempted at 2 levels: the cell level to classify individual cells and the case level to classify individual cases. For the cellular classification the measured cells from 50% of available cases were selected as a training set to construct a model. The cells from the remaining cases were used as a test set to validate the model. Similarly for case classification, the same 50% of cases that were used for cell classification were used as a training set and the remaining cases as a test set. Images of routinely processed gastric smears stained by the Papanicolaou technique were analyzed by a customized image analysis system. RESULTS: Application of discriminant analysis on the test set gave correct classification of 98.4% of benign cells and 67.1% of malignant cells. On case classification, 100% accuracy was achieved for benign and malignant cases, both for the training and test sets. CONCLUSION: The application of discriminant analysis described in this paper could produce significant classification results at the cellular and individual case level.

7.
The aim of this study was the development, evaluation and analysis of a neuro-fuzzy classifier for a supervised and hard classification of coastal environmental vulnerability due to marine aquaculture using minimal training sets within a Geographic Information System (GIS). The neuro-fuzzy classification model NEFCLASS-J was used to develop learning algorithms to create the structure (rule base) and the parameters (fuzzy sets) of a fuzzy classifier from a set of labeled data. The training sites were manually classified based on four categories of coastal environmental vulnerability through meetings and interviews with experts having field experience and specific knowledge of the environmental problems investigated. The inter-class separability estimations were performed on the training data set to assess the difficulty of the class separation problem under investigation. The two training data sets did not follow the assumptions of multivariate normality. For this reason Bhattacharyya and Jeffries–Matusita distances were used to estimate the probability of correct classification. Further evaluation and analysis of the quality of the classification achieved low values of quantity and allocation disagreement and a good overall accuracy. For each of the four classes the user and producer values for accuracy were between 77% and 100%. In conclusion, the use of a neuro-fuzzy classifier for a supervised and hard classification of coastal environmental vulnerability demonstrated an ability to derive an accurate and reliable classification using a minimal number of training sets.
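
As a rough illustration of the separability measures named in this abstract, the sketch below computes the Bhattacharyya distance and the Jeffries–Matusita (JM) distance between two classes represented as histograms over the same bins. The histogram representation is an assumption (the abstract does not say how the distances were estimated), and the JM form used here, JM = 2(1 - exp(-B)) with range 0 to 2, is one common convention.

```python
import math
from typing import Sequence


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya distance between two histograms defined over the same bins."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("histograms must have positive total mass")
    # Bhattacharyya coefficient of the normalized histograms (1 = identical).
    coefficient = sum(
        math.sqrt((a / p_total) * (b / q_total)) for a, b in zip(p, q)
    )
    coefficient = min(1.0, coefficient)  # guard against floating-point overshoot
    if coefficient == 0.0:
        return math.inf  # no overlap at all
    return -math.log(coefficient)


def jeffries_matusita(p: Sequence[float], q: Sequence[float]) -> float:
    """JM distance in [0, 2]; values near 2 indicate well-separated classes."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_distance(p, q)))
```

The JM distance saturates at 2 for classes with no overlap, which makes it easier to compare across class pairs than the unbounded Bhattacharyya distance.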

8.
A range of single classifiers have been proposed to classify crop types using time series vegetation indices, and hybrid classifiers are used to improve discriminatory power. Traditional fusion rules use the product of multi-single classifiers, but that strategy cannot integrate the classification output of machine learning classifiers. In this research, the performance of two hybrid strategies, multiple voting (M-voting) and probabilistic fusion (P-fusion), for crop classification using NDVI time series were tested with different training sample sizes at both pixel and object levels, and two representative counties in north Xinjiang were selected as study area. The single classifiers employed in this research included Random Forest (RF), Support Vector Machine (SVM), and See 5 (C 5.0). The results indicated that classification performance improved (increased the mean overall accuracy by 5% to 10%, and reduced standard deviation of overall accuracy by around 1%) substantially with the training sample number, and when the training sample size was small (50 or 100 training samples), hybrid classifiers substantially outperformed single classifiers with higher mean overall accuracy (1% to 2%). However, when abundant training samples (4,000) were employed, single classifiers could achieve good classification accuracy, and all classifiers obtained similar performances. Additionally, although object-based classification did not improve accuracy, it resulted in greater visual appeal, especially in study areas with a heterogeneous cropping pattern.
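
The difference between the two hybrid strategies can be illustrated with a small sketch. It assumes that M-voting means a majority vote over the labels predicted by the single classifiers and that P-fusion means averaging their class probabilities before taking the most probable class; the abstract does not give the exact rules, and the crop names and probabilities below are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def probabilistic_fusion(probabilities: List[Dict[str, float]]) -> str:
    """Average per-class probabilities across classifiers and pick the top class."""
    classes = list(probabilities[0])
    mean = {
        c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes
    }
    return max(classes, key=lambda c: mean[c])


# Hypothetical outputs of three single classifiers (e.g. RF, SVM, C5.0) for one pixel.
outputs = [
    {"cotton": 0.55, "maize": 0.45},
    {"cotton": 0.52, "maize": 0.48},
    {"cotton": 0.10, "maize": 0.90},
]
votes = [max(p, key=p.get) for p in outputs]  # ["cotton", "cotton", "maize"]
```

In this example the two rules disagree: voting returns "cotton" because two classifiers weakly prefer it, while fusion returns "maize" because the third classifier is far more confident.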

9.
Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

10.
The minimum squared error based classification (MSEC) method establishes a unique classification model for all the test samples. However, this classification model may not be optimal for each test sample. This paper proposes an improved MSEC (IMSEC) method, which is tailored for each test sample. The proposed method first roughly identifies the possible classes of the test sample, and then establishes a minimum squared error (MSE) model based on the training samples from these possible classes of the test sample. We apply our method to face recognition. The experimental results on several datasets show that IMSEC outperforms MSEC and the other state-of-the-art methods in terms of accuracy.

11.
In this paper, I describe a set of procedures that automate forest disturbance mapping using a pair of Landsat images. The approach is built on the traditional pair-wise change detection method, but is designed to extract training data without user interaction and uses a robust classification algorithm capable of handling incorrectly labeled training data. The steps in this procedure include: i) creating masks for water, non-forested areas, clouds, and cloud shadows; ii) identifying training pixels whose value is above or below a threshold defined by the number of standard deviations from the mean value of the histograms generated from local windows in the short-wave infrared (SWIR) difference image; iii) filtering the original training data through a number of classification algorithms using an n-fold cross validation to eliminate mislabeled training samples; and finally, iv) mapping forest disturbance using a supervised classification algorithm. When applied to 17 Landsat footprints across the U.S. at five-year intervals between 1985 and 2010, the proposed approach produced forest disturbance maps with 80 to 95% overall accuracy, comparable to those obtained from traditional approaches to forest change detection. The primary sources of mis-classification errors included inaccurate identification of forests (errors of commission), issues related to the land/water mask, and clouds and cloud shadows missed during image screening. The approach requires images from the peak growing season, at least for the deciduous forest sites, and cannot readily distinguish forest harvest from natural disturbances or other types of land cover change. The accuracy of detecting forest disturbance diminishes with the number of years between the images that make up the image pair. 
Nevertheless, the relatively high accuracies, little or no user input needed for processing, speed of map production, and simplicity of the approach make the new method especially practical for forest cover change analysis over very large regions.
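
Step ii) of the procedure, selecting training pixels by their distance from the local mean in units of standard deviation, might look roughly like the sketch below. It treats one local window of the SWIR difference image as a flat list of values; the parameter `k`, the use of the population standard deviation, and the way the two groups would later be labeled are all assumptions, since the abstract gives no thresholds.

```python
import statistics
from typing import List, Sequence, Tuple


def select_training_pixels(
    values: Sequence[float], k: float = 2.0
) -> Tuple[List[int], List[int]]:
    """Return indices of pixels above mean + k*std and below mean - k*std.

    `values` holds the difference-image values of one local window.
    Pixels between the two thresholds are left unlabeled.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation of the window
    upper, lower = mean + k * std, mean - k * std
    above = [i for i, v in enumerate(values) if v > upper]
    below = [i for i, v in enumerate(values) if v < lower]
    return above, below
```

Pixels selected this way will include some mislabeled samples, which is why the procedure follows this step with cross-validated filtering and a classifier that tolerates label noise.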

12.
13.
14.
Annotating hypothetical proteins (HPs) poses a great challenge, especially when the protein is putatively linked or mapped to another protein. Although protein interaction networks (PINs) are now prevalent, many visualizers still do not support HP annotation. In this work, we propose a six-point classification system to validate protein interactions based on diverse features. The HP data set was used as a training data set to find putative functional interaction partners for the remaining proteins whose interactions have yet to be established. A Total Reliability Score (TRS) was calculated based on the six-point classification and evaluated using a machine learning algorithm on a single node. We found that a multilayer perceptron neural network yielded 81.08% accuracy in modelling TRS, whereas feature selection algorithms confirmed that all classification features are implementable. Furthermore, statistical results using variance and covariance analyses confirmed the usefulness of these classification metrics. Of all the classification features, subcellular location (sorting signals) had the greatest impact in predicting the function of HPs.

15.
IRBM, 2022, 43(4): 300-308
Objectives: This study investigates the performance of the Support Vector Machine (SVM) to classify non-real-time and real-time EMG signals. The study also compares training performance using personalized and generalized data from all subjects. Thus, an idea about the data sets to be used in the training of the real-time classification model has been put forward. In addition, real-time classification results were obtained for ten days, and it was observed how training oneself would affect the classification results. Material and methods: EMG data were acquired for 7 hand gestures from 8 healthy subjects to create the data set: fist, fingers spread, wave-in, wave-out, pronation, supination, and rest. Subjects repeated each gesture 30 times. The Myo armband with 8 dry surface electrodes was used for data acquisition. Results: 14 features of the EMG signals were extracted and non-real-time classification was performed for each feature; the highest accuracy of 96.38% was obtained using root mean square (RMS) and integrated EMG features. Three kernel functions of SVM were tested in non-real-time classification and the highest accuracy was obtained with Cubic SVM using a 3rd order polynomial. For this reason, Cubic SVM was used for real-time classification using the features that gave the best results in non-real-time classification. A subject repeated the gestures and real-time classification was performed. The highest accuracy of 99.05% was obtained with the mean absolute value (MAV) feature. Real-time classification was then performed on eight subjects using the best-performing MAV feature, with an average accuracy of 95.83% using the personalized data set and 91.79% using the generalized data set. Conclusion: The greatest accuracy is obtained by training the classifier with the subject's own data. Thus, it can be said that EMG signals are personal, just like fingerprints and retinas. 
In addition, the tests repeated over 10 days showed that activation of the relevant muscle set is repeatable and that training takes place, and indicated how this can be applied to prosthetic hand users who need to produce particular gestures.
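
The three time-domain features named in this abstract have standard definitions, sketched below for a single-channel window of samples: RMS is the square root of the mean squared amplitude, MAV is the mean of the absolute amplitudes, and integrated EMG (IEMG) is the sum of the absolute amplitudes. The function name and the dictionary layout are illustrative choices, and the window length and any preprocessing used in the study are not given in the abstract.

```python
import math
from typing import Dict, Sequence


def emg_features(window: Sequence[float]) -> Dict[str, float]:
    """Compute RMS, MAV and IEMG for one window of one EMG channel."""
    if not window:
        raise ValueError("window must contain at least one sample")
    n = len(window)
    absolute_sum = sum(abs(sample) for sample in window)
    return {
        "rms": math.sqrt(sum(sample * sample for sample in window) / n),
        "mav": absolute_sum / n,
        "iemg": absolute_sum,
    }
```

For an armband with 8 electrodes, applying this function to each channel would give one feature vector per window, which is the kind of input an SVM classifier would then receive.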

16.
Data transformations prior to analysis may be beneficial in classification tasks. In this article we investigate a set of such transformations on 2D graph-data derived from facial images and their effect on classification accuracy in a high-dimensional setting. These transformations are low-variance in the sense that each involves only a fixed small number of input features. We show that classification accuracy can be improved when penalized regression techniques are employed, as compared to a principal component analysis (PCA) pre-processing step. In our data example classification accuracy improves from 47% to 62% when switching from PCA to penalized regression. A second goal is to visualize the resulting classifiers. We develop importance plots highlighting the influence of coordinates in the original 2D space. Features used for classification are mapped to coordinates in the original images and combined into an importance measure for each pixel. These plots assist in assessing plausibility of classifiers, interpretation of classifiers, and determination of the relative importance of different features.

17.
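Skin autofluorescence spectra from 63 patients with type II diabetes and 140 healthy subjects were divided into a training set and a test set. For four commonly used kernel functions, the optimal classification parameters were computed using cross-validation and grid search; models were then built on the training set and used to classify the test set. The radial basis function (RBF) kernel gave the best classification performance of the four. On this basis, a hybrid kernel combining the linear kernel and the RBF kernel was constructed. This hybrid kernel classified human skin autofluorescence spectra better than the RBF kernel alone, with an accuracy of 82.61%, a sensitivity of 69.57%, and a specificity of 95.65%. The results indicate that support vector machines can be used to classify human skin autofluorescence spectra and can help improve the accuracy of diabetes screening.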
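
One simple way to build a hybrid of a linear kernel and an RBF kernel is a weighted sum, sketched below. The abstract does not say how the two kernels were combined, so the convex-combination form, the `weight` parameter and the `gamma` value are assumptions for illustration. A weighted sum with non-negative weights of two valid kernels is itself a valid kernel, which is what makes this construction usable in an SVM.

```python
import math
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def hybrid_kernel(
    x: Sequence[float],
    y: Sequence[float],
    weight: float = 0.5,
    gamma: float = 1.0,
) -> float:
    """weight * linear + (1 - weight) * RBF; weight is a hypothetical mixing factor."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * linear_kernel(x, y) + (1.0 - weight) * rbf_kernel(x, y, gamma)
```

Under this assumed form, `weight` and `gamma` would be tuned together, for example with the same cross-validated grid search the abstract describes for the single kernels.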

18.
Ultrasound can be used to study tendon movement. However, measurement of tendon movement is mostly based on manual tracking of anatomical landmarks such as the musculo-tendinous junction, limiting the applicability to a small number of muscle-tendon units. The aim of this study was to quantify tendon displacement without anatomical landmarks using a speckle tracking algorithm optimized for tendons in long B-mode image sequences. A dedicated two-dimensional multi-kernel block-matching scheme with subpixel motion estimation was devised to handle large displacements over long sequences. The accuracy of the tracking on porcine tendons was evaluated during different displacements and velocities. Subsequently, the accuracy of tracking the flexor digitorum superficialis (FDS) of a human cadaver hand was evaluated. Finally, the in-vivo accuracy of the tendon tracking was determined by measuring the movement of the FDS at the wrist level. For the porcine experiment and the human cadaver arm experiment tracking errors were, on average, 0.08 and 0.05 mm, respectively (1.3% and 1.0%). For the in-vivo experiment the tracking error was, on average, 0.3 mm (1.6%). This study demonstrated that our dedicated speckle tracking can quantify tendon displacement at different physiological velocities without anatomical landmarks with high accuracy. The technique allows tracking over large displacements and in a wider range of tendons than by using anatomical landmarks.

19.
Machine learning or deep learning models have been widely used for taxonomic classification of metagenomic sequences and many studies reported high classification accuracy. Such models are usually trained based on sequences in several training classes in hope of accurately classifying unknown sequences into these classes. However, when deploying the classification models on real testing data sets, sequences that do not belong to any of the training classes may be present and are falsely assigned to one of the training classes with high confidence. Such sequences are referred to as out-of-distribution (OOD) sequences and are ubiquitous in metagenomic studies. To address this problem, we develop a deep generative model-based method, MLR-OOD, that measures the probability of a testing sequence belonging to OOD by the likelihood ratio of the maximum of the in-distribution (ID) class conditional likelihoods and the Markov chain likelihood of the testing sequence measuring the sequence complexity. We compose three different microbial data sets consisting of bacterial, viral, and plasmid sequences for comprehensively benchmarking OOD detection methods. We show that MLR-OOD achieves the state-of-the-art performance demonstrating the generality of MLR-OOD to various types of microbial data sets. It is also shown that MLR-OOD is robust to the GC content, which is a major confounding effect for OOD detection of genomic sequences. In conclusion, MLR-OOD will greatly reduce false positives caused by OOD sequences in metagenomic sequence classification.

20.
Automated segmentation and morphometry of fluorescently labeled cell nuclei in batches of 3D confocal stacks is essential for quantitative studies. Model-based segmentation algorithms are attractive due to their robustness. Previous methods incorporated a single nuclear model. This is a limitation for tissues containing multiple cell types with different nuclear features. Improved segmentation for such tissues requires algorithms that permit multiple models to be used simultaneously. This requires a tight integration of classification and segmentation algorithms. Two or more nuclear models are constructed semiautomatically from user-provided training examples. Starting with an initial over-segmentation produced by a gradient-weighted watershed algorithm, a hierarchical fragment merging tree rooted at each object is built. Linear discriminant analysis is used to classify each candidate using multiple object models. On the basis of the selected class, a Bayesian score is computed. Fragment merging decisions are made by comparing the score with that of other candidates, and the scores of constituent fragments of each candidate. The overall segmentation accuracy was 93.7% and the classification accuracy was 93.5% on a diverse collection of images drawn from five different regions of the rat brain. The multi-model method was found to achieve high accuracy on nuclear segmentation and classification by correctly resolving ambiguities in clustered regions containing heterogeneous cell populations.
