Similar documents
20 similar documents found.
1.
Despite growing concerns over the health of global invertebrate diversity, terrestrial invertebrate monitoring efforts remain poorly distributed geographically. Machine-assisted classification has been proposed as a potential solution for quickly gathering large amounts of data; however, previous studies have often used unrealistic or idealized datasets to train and test their models. In this study, we describe a practical methodology for including machine learning in ecological data acquisition pipelines. Here we train and test machine learning algorithms to classify over 72,000 terrestrial invertebrate specimens from morphometric data and contextual metadata. All vouchered specimens were collected in pitfall traps by the National Ecological Observatory Network (NEON) at 45 locations across the United States from 2016 to 2019. Specimens were photographed, and two separate machine learning paradigms were used to classify them. In the first, we used a convolutional neural network (ResNet-50); in the second, we extracted morphometric data as feature vectors using ImageJ and used traditional machine learning methods to classify specimens. Issues stemming from inconsistent taxonomic label specificity were resolved by making classifications at the lowest identified taxonomic level (LITL). Taxa with too few specimens to be included in the training dataset were classified using zero-shot classification. When classifying specimens that were known and seen by our models, we reached a maximum accuracy of 72.7% using eXtreme Gradient Boosting (XGBoost) at the LITL, nearly matching the maximum accuracy of 72.8% achieved by the CNN at the LITL. Models trained without contextual metadata underperformed those trained with it.
We also classified invertebrate taxa that were unknown to the model using zero-shot classification, reaching a maximum accuracy of 65.5% with the ResNet-50, compared with 39.4% with XGBoost. The general methodology outlined here represents a realistic application of machine learning as a tool for ecological studies. We found that more advanced and complex machine learning methods such as convolutional neural networks are not necessarily more accurate than traditional machine learning methods. Hierarchical and LITL classifications allow flexible taxonomic specificity at the input and output layers. These methods also help address the ‘long tail’ problem of underrepresented taxa missed by machine learning models. Finally, we encourage researchers to consider more than just morphometric data when training their models, as we have shown that including contextual metadata can provide significant improvements in accuracy.
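A minimal sketch of the LITL idea, assuming each specimen record stores its identification as optional fields ordered from general to specific ranks. The rank list, the record layout, and the function name `litl_label` are hypothetical; the abstract does not describe how the authors encoded their labels.

```python
from typing import Optional

# Hypothetical rank order, from most general to most specific.
RANKS = ["order", "family", "genus", "species"]


def litl_label(record: dict) -> Optional[str]:
    """Return 'rank:name' for the deepest rank that was actually identified.

    Walks the ranks from general to specific and stops at the first gap,
    so a specimen identified only to family is labelled at family level.
    Returns None if not even the most general rank is present.
    """
    label = None
    for rank in RANKS:
        name = record.get(rank)
        if not name:
            break
        label = f"{rank}:{name}"
    return label


specimens = [
    {"order": "Coleoptera", "family": "Carabidae",
     "genus": "Pterostichus", "species": "melanarius"},
    {"order": "Coleoptera", "family": "Carabidae"},
    {"order": "Araneae"},
]
print([litl_label(s) for s in specimens])
# ['species:melanarius', 'family:Carabidae', 'order:Araneae']
```

Each distinct LITL string can then serve as one output class, so specimens identified to different depths can share a single training set.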

2.
A Lord  D Horn  M Breakspear  M Walter 《PloS one》2012,7(8):e41282
Major depression is a prevalent disorder that imposes a significant burden on society, yet objective laboratory-style tests to assist in diagnosis are lacking. We employed network-based analyses of "resting state" functional neuroimaging data to ascertain group differences in endogenous cortical activity between healthy and depressed subjects. We additionally sought to use machine learning techniques to explore the ability of these network-based measures of resting state activity to provide diagnostic information for depression. Resting state fMRI data were acquired from twenty-two depressed outpatients and twenty-two healthy subjects matched for age and gender. These data were anatomically parcellated, and functional connectivity matrices were then derived from the linear correlations between the BOLD signal fluctuations of all pairs of cortical and subcortical regions. We characterised the hierarchical organization of these matrices using network-based metrics, with an emphasis on their mid-scale "modularity" arrangement. Whilst whole-brain measures of organization did not differ between groups, a significant rearrangement of their community structure was observed. Furthermore, we were able to classify individuals with a high level of accuracy using a support vector machine, primarily through the use of a modularity-based metric known as the participation index. In conclusion, the application of machine learning techniques to features of resting state fMRI network activity shows promising potential to assist in the diagnosis of major depression, and now needs validation in independent data sets.
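The participation index is not defined in the abstract. The sketch below assumes it corresponds to the standard participation coefficient, P_i = 1 - Σ_s (k_is / k_i)², where k_is is node i's connection strength to module s and k_i is its total strength. Values near 0 mean a region connects mostly within one module; higher values mean its connections are spread across modules. The toy network is hypothetical.

```python
from collections import defaultdict


def participation_coefficient(adj: dict, modules: dict) -> dict:
    """Participation coefficient P_i = 1 - sum_s (k_is / k_i)^2 for every node.

    adj: node -> {neighbour: edge weight} (undirected, weights >= 0)
    modules: node -> module (community) label
    """
    result = {}
    for node, neighbours in adj.items():
        k_i = sum(neighbours.values())
        if k_i == 0:
            result[node] = 0.0  # isolated node: nothing to distribute
            continue
        strength_per_module = defaultdict(float)
        for nbr, weight in neighbours.items():
            strength_per_module[modules[nbr]] += weight
        result[node] = 1.0 - sum((k_is / k_i) ** 2 for k_is in strength_per_module.values())
    return result


# Toy network: region "a" links equally to two modules; "b" and "c" do not.
adj = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}}
modules = {"a": 1, "b": 1, "c": 2}
print(participation_coefficient(adj, modules))
# {'a': 0.5, 'b': 0.0, 'c': 0.0}
```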

3.
Erroneous behavior usually elicits a distinct pattern in neural waveforms. In particular, inspection of the concurrently recorded electroencephalogram (EEG) typically reveals a negative potential at fronto-central electrodes shortly after a response error (Ne or ERN), as well as an error-awareness-related positivity (Pe). The brain signal thus appears to contain information about the occurrence of an error. Assuming a general error evaluation system, the question arises whether this information can be used to classify behavioral performance within or even across different cognitive tasks. In the present study, a machine learning approach was employed to investigate this issue. Ne and Pe were extracted from the single-trial EEG signals of participants performing a flanker task and a mental rotation task and subjected to a machine learning classification scheme (a support vector machine, SVM). Overall, individual performance in the flanker task was classified more accurately, with accuracy rates above 85%. Most importantly, it was even feasible to classify responses across both tasks. In particular, an SVM trained on the flanker task could identify erroneous behavior with almost 70% accuracy in the EEG data recorded during the rotation task, and vice versa. In summary, we replicate that the response-related EEG signal can be used to identify erroneous behavior within a particular task. Going beyond this, it was possible to classify response types across functionally different tasks. The outlined methodological approach therefore appears promising for future applications.

4.
A new machine learning method referred to as F-score_ELM was proposed to classify lying and truth-telling from the electroencephalogram (EEG) signals of 28 guilty and innocent subjects. Thirty-one features were extracted from these subjects' probe responses. A recently developed classifier called the extreme learning machine (ELM) was then combined with F-score, a simple but effective feature selection method, to jointly optimize the number of ELM hidden nodes and the feature subset through a grid-search training procedure. The method was compared with two classification models that combine principal component analysis with back-propagation network and support vector machine classifiers, respectively. We thoroughly assessed the performance of these classification models, including training and testing time, sensitivity and specificity on the training and testing sets, and network size. The experimental results showed that the number of hidden nodes can be effectively optimized by the proposed method. F-score_ELM also obtained the best classification accuracy and required the shortest training and testing time.
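The abstract does not spell out its F-score. A common two-class definition, assumed here, scores each feature by how far the guilty-class and innocent-class means lie from the overall mean, divided by the sum of the two within-class sample variances. The sketch only ranks features by that score; picking the top k and searching k jointly with the ELM hidden-node count, as the abstract describes, is not shown. The data are made up.

```python
from statistics import mean, variance


def f_score(pos: list, neg: list) -> float:
    """Two-class F-score of one feature; larger means more discriminative.

    Needs at least two samples per class (sample variance uses n - 1).
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X_pos: list, X_neg: list) -> list:
    """Feature indices sorted from highest to lowest F-score."""
    n_features = len(X_pos[0])
    scores = [
        f_score([row[j] for row in X_pos], [row[j] for row in X_neg])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Feature 0 separates the classes; feature 1 is noise.
guilty = [[1.0, 0.5], [1.2, 0.7], [0.9, 0.6]]
innocent = [[3.0, 0.6], [3.1, 0.5], [2.9, 0.7]]
print(rank_features(guilty, innocent))  # [0, 1]
```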

5.
Recent years have witnessed increasing interest in the application of machine learning to clinical informatics and healthcare systems. A significant amount of research has been done on healthcare systems based on supervised learning. In this study, we present a generalized solution for detecting visually observable symptoms on faces using semi-supervised anomaly detection combined with machine vision algorithms. We rely on disease-related statistical facts to detect abnormalities and classify them into multiple categories, narrowing down the possible medical causes of the detected symptoms. Unlike most existing approaches, which are limited by the availability of the labeled training data that supervised learning requires, our method offers the major advantage of flagging any unusual, visually observable symptom.

6.
A probabilistic model is presented for detecting mental activity from gait signatures recorded from healthy subjects. The proposed scheme is based on principal component analysis with reduced feature dimension, followed by a naïve Gaussian Bayes classifier. Leave-one-out cross-validation shows a detection accuracy of 94%, with specificity and sensitivity of 96% and 98.3%, respectively. The research has potential applications in preventing falls in at-risk elderly people, lie detection, and rehabilitation of Parkinson's patients.
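Sensitivity, specificity, and accuracy all come from the same binary confusion matrix. The helper below is a generic sketch, not the authors' code; treating "mental activity present" as the positive class is an assumption, and the labels are hypothetical.

```python
def binary_metrics(y_true: list, y_pred: list, positive=1) -> dict:
    """Sensitivity (recall), specificity and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives rejected
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels: 1 = mental task during walking, 0 = normal walking.
print(binary_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

In a single pooled binary confusion matrix, accuracy is the prevalence-weighted average of sensitivity and specificity, so it lies between the two; figures averaged per fold or per subject need not obey this.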

7.
A probabilistic model is presented for detecting mental activity from gait signatures recorded from healthy subjects. The proposed scheme is based on principal component analysis with reduced feature dimension, followed by a naïve Gaussian Bayes classifier. Leave-one-out cross-validation shows a detection accuracy of 94%, with specificity and sensitivity of 96% and 98.3%, respectively. The research has potential applications in preventing falls in at-risk elderly people, lie detection, and rehabilitation of Parkinson's patients.

8.
Suitable shark conservation depends on well-informed population assessments. Direct methods such as scientific surveys and fisheries monitoring are adequate for defining population statuses, but species-specific indices of abundance and distribution from these sources are rare for most shark species. We can rapidly fill these information gaps by boosting media-based remote monitoring efforts with machine learning and automation. We created a database of 53,345 shark images covering 219 shark species and packaged object-detection and image classification models into a Shark Detector bundle. The Shark Detector recognizes and classifies sharks in videos and images using transfer learning and convolutional neural networks (CNNs). We applied these models to common data-generation approaches for sharks: collecting occurrence records from photographs taken by the public or citizen scientists, processing baited remote camera footage and online videos, and data-mining Instagram. We examined the accuracy of each model and tested how the correctness of genus and species predictions depends on training data quantity. The Shark Detector can classify 47 species belonging to 26 genera. It sorted heterogeneous datasets of images sourced from Instagram with 91% accuracy and classified species with 70% accuracy. It located sharks in baited remote footage and YouTube videos with 89% accuracy and classified located subjects to the species level with 69% accuracy. All data-generation methods were processed without manual interaction. As media-based remote monitoring appears to dominate methods for observing sharks in nature, we developed an open-source Shark Detector to facilitate common identification applications. Prediction accuracy of the software pipeline increases as more images are added to the training dataset. We provide public access to the software on our GitHub page.

9.
Supervised machine learning can be used to predict which drugs human cardiomyocytes have been exposed to. Using electrophysiological data collected from human cardiomyocytes with known exposure to different drugs, a supervised machine learning algorithm can be trained to recognize and classify cells that have been exposed to an unknown drug. Furthermore, the learning algorithm provides information on the relative contribution of each data parameter to the overall classification. Probabilities and confidence in the accuracy of each classification may also be determined by the algorithm. In this study, the electrophysiological effects of the β-adrenergic drugs propranolol and isoproterenol on cardiomyocytes derived from human induced pluripotent stem cells (hiPS-CM) were assessed. The electrophysiological data were collected using high-temporal-resolution two-photon microscopy of voltage-sensitive dyes as a reporter of membrane voltage. The results demonstrate the ability of our algorithm to accurately assess, classify, and predict hiPS-CM membrane depolarization following exposure to chronotropic drugs.

10.
Semantic priming is usually studied by examining ERPs over many trials and subjects. This article aims at detecting semantic priming at the single-trial level. Using machine learning techniques, it is possible to analyse and classify short traces of brain activity, which could, for example, be used to build a Brain Computer Interface (BCI). This article describes an experiment in which subjects were presented with word pairs and asked to decide whether the words were related or not. A classifier was trained to determine, from one second of EEG data, whether the subjects judged the words as related or unrelated. When training per subject, classifier accuracy ranged from 54% to 67% and was significantly above chance level for all subjects (N = 12); when training across subjects, accuracy ranged from 51% to 63% and was significantly above chance level for 11 subjects, pointing to a general effect.
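The abstract does not say which test established "significantly above chance". One common choice, assumed here, is a one-sided exact binomial test: with n independent trials and a chance rate of 0.5 for the related/unrelated decision, compute the probability of getting at least k trials correct by guessing. The trial counts below are hypothetical.

```python
from math import comb


def binomial_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """One-sided P(X >= correct) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical subject: 60 of 100 single trials classified correctly.
p = binomial_p_value(60, 100)
print(f"p = {p:.4f}")
```

For 60 of 100 correct this gives p < 0.05. Trials evaluated inside cross-validation are not strictly independent, so permutation tests are often preferred; the binomial version is shown here only because it is short.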

11.
Existing computational pipelines for quantitative analysis of high‐content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone‐arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open‐source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high‐content microscopy data.

12.
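Machine learning algorithms and their applications in environmental microbiology research   Total citations: 1, self-citations: 0, citations by others: 1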
陈鹤  陶晔  毛振镀  邢鹏 《微生物学报》(Acta Microbiologica Sinica) 2022,62(12):4646-4662
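Microorganisms are ubiquitous in the environment. They are not only key participants in biogeochemical cycles and environmental evolution but also play important roles in environmental monitoring, ecological remediation, and conservation. With the development of high-throughput technologies, large amounts of microbial data are being generated, and using machine learning to model and analyze environmental microbial big data is of great significance for scientific research and social applications in areas such as microbial biomarker identification, pollutant prediction, and environmental quality prediction. Machine learning can be divided into two major categories: supervised learning and unsupervised learning. In microbiome research, unsupervised learning efficiently learns the features of input data through methods such as clustering and dimensionality reduction, thereby integrating and grouping microbial data. Supervised learning trains models on microbial datasets that have both features and labels; when presented with data that have features but no labels, the model can infer the labels, enabling classification, identification, and prediction on new data. However, complex machine learning algorithms usually focus on predictive accuracy at the expense of interpretability. Machine learning models can often be regarded as "black boxes" that predict specific outcomes, meaning that little is known about how a model arrives at its predictions. To apply machine learning more widely in microbiome research and improve our ability to extract valuable microbial information, it is particularly important to understand machine learning algorithms in depth and to improve model interpretability. This article mainly introduces machine learning algorithms commonly used in environmental microbiology and the steps for building machine learning models from microbiome data, including feature selection, algorithm selection, model construction, and evaluation. It also reviews applications of various machine learning models in environmental microbiology, explores the associations between microbiomes and their surrounding environments, discusses methods for improving model interpretability, and provides a scientific reference for future environmental monitoring and environmental health prediction.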

13.
Despite extensive preventive efforts, falls continue to be a major source of morbidity and mortality among the elderly. Real-time detection of falls and their urgent communication to a telecare center may enable rapid medical assistance, thus increasing the sense of security of the elderly and reducing some of the negative consequences of falls. Many different approaches have been explored to automatically detect a fall using inertial sensors. Although previously published algorithms report high sensitivity (SE) and high specificity (SP), they have usually been tested on simulated falls performed by healthy volunteers. We recently collected acceleration data during a number of real-world falls in a patient population with a high fall risk as part of the SensAction-AAL European project. The aim of the present study is to benchmark the performance of thirteen published fall-detection algorithms when applied to this database of 29 real-world falls. To the best of our knowledge, this is the first systematic comparison of fall-detection algorithms tested on real-world falls. We found that the average SP of the thirteen algorithms was (mean ± std) 83.0% ± 30.3% (maximum value = 98%). The SE was considerably lower (SE = 57.0% ± 27.3%, maximum value = 82.8%), much lower than the values obtained on simulated falls. The number of false alarms generated by the algorithms during 1-day monitoring of three representative fallers ranged from 3 to 85. The factors that affect the performance of the published algorithms on real-world falls are also discussed. These findings indicate the importance of testing fall-detection algorithms in real-life conditions in order to produce more effective automated alarm systems with higher acceptance.
Further, the present results support the idea that a large, shared real-world fall database could potentially provide an enhanced understanding of the fall process and the information needed to design and evaluate a high-performance fall detector.

14.
We propose a new method, based on machine learning techniques, for the analysis of a combination of continuous data from dataloggers and a sampling of contemporaneous behaviour observations. This data combination provides an opportunity for biologists to study behaviour at a previously unknown level of detail and accuracy; however, continuously recorded data are of little use unless the resulting large volumes of raw data can be reliably translated into actual behaviour. We address this problem by applying a Support Vector Machine and a Hidden Markov Model, which allow us to classify an animal's behaviour using a small set of field observations to calibrate continuously recorded activity data. Such classified data can be applied quantitatively to the behaviour of animals over extended periods and at times during which observation is difficult or impossible. We demonstrate the usefulness of the method by applying it to data from six cheetahs (Acinonyx jubatus) in the Okavango Delta, Botswana. Cumulative activity data scores were recorded every five minutes by accelerometers embedded in GPS radio-collars for around one year on average. Direct behaviour observations of each of the six cheetahs were collected in the field for comparatively short periods. Using this approach, we are able to classify each five-minute activity score into one of three key behaviours (feeding, mobile and stationary), creating a continuous behavioural sequence for the entire period for which the collars were deployed. Evaluation of our classifier with cross-validation shows the accuracy to be , but that the accuracy for individual classes is reduced with decreasing sample size of direct observations. We demonstrate how these processed data can be used to study behaviour, identifying seasonal and gender differences in daily activity and feeding times. Results given here are unlike any that could be obtained using traditional approaches in both accuracy and detail.
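The abstract pairs a Support Vector Machine with a Hidden Markov Model but gives no details. One plausible arrangement, assumed here, treats the SVM's per-epoch class probabilities as emission scores and uses Viterbi decoding with "sticky" transition probabilities to find the most likely behaviour sequence, which suppresses isolated one-epoch flips. All probabilities below are invented for illustration.

```python
import math


def _log(p: float) -> float:
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions: list, transition: dict, initial: dict) -> list:
    """Most likely hidden-state sequence (computed in log space).

    emissions: per-epoch {state: score}, e.g. classifier class probabilities
    transition: {from_state: {to_state: probability}}
    initial: {state: probability}; emissions and initial must be non-empty
    """
    states = list(initial)
    score = {s: _log(initial[s]) + _log(emissions[0][s]) for s in states}
    backpointers = []
    for obs in emissions[1:]:
        new_score, pointer = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: score[r] + _log(transition[r][s]))
            new_score[s] = score[best_prev] + _log(transition[best_prev][s]) + _log(obs[s])
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


STATES = ["feeding", "mobile", "stationary"]
sticky = {a: {b: 0.9 if a == b else 0.05 for b in STATES} for a in STATES}
uniform = {s: 1 / 3 for s in STATES}
svm_probs = [
    {"feeding": 0.1, "mobile": 0.1, "stationary": 0.8},
    {"feeding": 0.1, "mobile": 0.5, "stationary": 0.4},  # ambiguous epoch
    {"feeding": 0.1, "mobile": 0.1, "stationary": 0.8},
]
print(viterbi(svm_probs, sticky, uniform))
# ['stationary', 'stationary', 'stationary']
```

A per-epoch argmax would label the middle epoch "mobile"; the sticky transitions make a single-epoch switch too costly. Strictly, an HMM expects likelihoods P(observation | state) rather than classifier posteriors, so posteriors are sometimes divided by class priors first; that step is omitted here.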

15.
Meissner M  Koch O  Klebe G  Schneider G 《Proteins》2009,74(2):344-352
We present machine learning approaches for turn prediction from the amino acid sequence. Different turn classes and types were considered based on a novel turn classification scheme. We trained an unsupervised classifier (a self-organizing map) and two kernel-based classifiers, namely a support vector machine and a probabilistic neural network. Turn versus non-turn classification was carried out for turn families containing intramolecular hydrogen bonds and three to six residues. Support vector machine classifiers yielded a Matthews correlation coefficient (mcc) of approximately 0.6 and a prediction accuracy of 80%. Probabilistic neural networks were developed for beta-turn type prediction. The method was able to distinguish between five types of beta-turns, yielding mcc > 0.5 and at least 80% overall accuracy. We conclude that the proposed new turn classification is distinct and well-defined, and that machine learning classifiers are suited for sequence-based turn prediction. Their potential for sequence-based prediction of turn structures is discussed.
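For readers unfamiliar with the Matthews correlation coefficient: it uses all four cells of the binary confusion matrix, MCC = (TP·TN - FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and ranges from -1 to +1, with 0 meaning no better than chance. This makes it informative when turns are much rarer than non-turns. The counts below are a hypothetical balanced example, not the paper's data.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical balanced test set of 100 segments.
tp, tn, fp, fn = 40, 40, 10, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy, matthews_corrcoef(tp, tn, fp, fn))  # 0.8 0.6
```

On imbalanced data the two measures diverge: a classifier that labels 95 non-turns correctly but misses all 5 turns has 95% accuracy and an MCC of 0.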

16.
High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning methods and learned tolerance for reaction condition deviations. We tested this method in silico through multiple cross-validations using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae and demonstrated over 99% accuracy with 8 training curves per serotype. The algorithm was verified in vitro using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.

17.
Because trip-related falls account for a significant proportion of falls by patients with amputations and older adults, the ability to repeatedly and reliably simulate a trip or evoke a trip-like response in a laboratory setting has potential utility as a tool to assess trip-related fall risk and as a training tool to reduce fall risk. This paper describes a treadmill-based method for delivering postural perturbations during locomotion to evoke a trip-like response and serve as a surrogate for an overground trip. Subjects walked at a normalized velocity in a Computer Assisted Rehabilitation Environment (CAREN). During single-limb stance, the treadmill belt speed was rapidly changed, thereby requiring the subject to perform a compensatory stepping response to avoid falling. Peak trunk flexion angle and peak trunk flexion velocity during the initial compensatory step following the perturbation were smaller for responses associated with recoveries compared to those associated with falls. These key fall prediction variables were consistent with the outcomes observed for laboratory-induced trips of older adults. This perturbation technique also demonstrated that this method of repeated but randomly delivered perturbations can evoke consistent, within-subject responses.
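The two fall-prediction variables can be computed from a sampled trunk-flexion angle trace. The sketch below assumes angles in degrees sampled at a fixed rate and uses finite differences for velocity; the sampling rate and the trace are hypothetical, and real motion-capture data would normally be low-pass filtered first.

```python
def peak_flexion(angles_deg: list, fs_hz: float) -> tuple:
    """Return (peak flexion angle in deg, peak flexion velocity in deg/s).

    Velocity uses central differences inside the trace and one-sided
    differences at the two ends; at least two samples are required.
    """
    dt = 1.0 / fs_hz
    n = len(angles_deg)
    velocity = []
    for i in range(n):
        if i == 0:
            v = (angles_deg[1] - angles_deg[0]) / dt
        elif i == n - 1:
            v = (angles_deg[-1] - angles_deg[-2]) / dt
        else:
            v = (angles_deg[i + 1] - angles_deg[i - 1]) / (2 * dt)
        velocity.append(v)
    return max(angles_deg), max(velocity)


# Hypothetical 100 Hz trace of the first compensatory step.
trace = [0.0, 0.5, 1.5, 3.0, 4.0, 4.2]
print(peak_flexion(trace, fs_hz=100.0))
```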

18.
Aberrant DNA methylation of CpG sites is among the earliest and most frequent alterations in cancer. Several studies suggest that aberrant methylation occurs in a tumour type-specific manner. However, large-scale analysis of candidate genes has so far been hampered by the lack of high-throughput assays for methylation detection. We have developed the first microarray-based technique that allows genome-wide assessment of selected CpG dinucleotides as well as quantification of methylation at each site. Several hundred CpG sites were screened in 76 samples from four different human tumour types and corresponding healthy controls. Discriminative CpG dinucleotides were identified for different tissue type distinctions and used to predict the tumour class of as yet unknown samples with high accuracy using machine learning techniques. Some CpG dinucleotides correlate with progression to malignancy, whereas others are methylated in a tissue-specific manner independent of malignancy. Our results demonstrate that genome-wide analysis of methylation patterns combined with supervised and unsupervised machine learning techniques constitutes a powerful novel tool to classify human cancers.

19.
In this paper, a recently developed machine learning algorithm referred to as Extreme Learning Machine (ELM) is used to classify five mental tasks from different subjects using electroencephalogram (EEG) signals available from a well-known database. The performance of ELM is compared, in terms of training time and classification accuracy, with that of a Backpropagation Neural Network (BPNN) classifier and Support Vector Machines (SVMs). For SVMs, the comparisons have been made for both 1-against-1 and 1-against-all methods. Results show that ELM needs an order of magnitude less training time than SVMs and two orders of magnitude less than BPNN. The classification accuracy of ELM is similar to that of SVMs and BPNN. The study showed that smoothing of the classifiers' outputs can significantly improve their classification accuracies.
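The abstract reports that smoothing the classifiers' outputs improves accuracy but does not say how. One simple option, assumed here, is a trailing moving average of the per-class output scores over the last few EEG segments before taking the argmax; the window length and scores are hypothetical.

```python
def smooth_and_decide(scores: list, window: int = 3) -> list:
    """Moving-average each class score over the last `window` outputs, then argmax.

    scores: per-segment lists of class scores, all the same length.
    The trailing (causal) window only uses current and past outputs.
    """
    decisions = []
    for t in range(len(scores)):
        recent = scores[max(0, t - window + 1): t + 1]
        n_classes = len(scores[t])
        avg = [sum(s[c] for s in recent) / len(recent) for c in range(n_classes)]
        decisions.append(max(range(n_classes), key=lambda c: avg[c]))
    return decisions


raw = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.8, 0.2]]
print(smooth_and_decide(raw))  # [0, 0, 0, 0]
```

Without smoothing (`window=1`) the third segment would flip to class 1. A trailing window needs only past outputs, so it can run online; offline analyses could use a centred window instead.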

20.
Taxonomic identification of fossils based on morphometric data traditionally relies on the use of standard linear models to classify such data. Machine learning and decision trees offer powerful alternative approaches to this problem but are not widely used in palaeontology. Here, we apply these techniques to published morphometric data of isolated theropod teeth in order to explore their utility in tackling taxonomic problems. We chose two published datasets consisting of 886 teeth from 14 taxa and 3020 teeth from 17 taxa, respectively, each with five morphometric variables per tooth. We also explored the effects that missing data have on the final classification accuracy. Our results suggest that machine learning and decision trees yield superior classification results over a wide range of data permutations, with decision trees achieving accuracies of 96% in classifying test data in some cases. Missing data, or attempts to generate synthetic data to overcome missing data, seriously degrade the predictive accuracy of all classifiers. The results of our analyses also indicate that using ensemble classifiers combining different classification techniques and examining posterior probabilities is a useful aid in checking final class assignments. The application of such techniques to isolated theropod teeth demonstrates that simple morphometric data can be used to yield statistically robust taxonomic classifications and that lower classification accuracy is more likely to reflect preservational limitations of the data or poor application of the methods.
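The abstract recommends combining classifiers and checking posterior probabilities but does not name a specific scheme. A minimal sketch, assuming soft voting (averaging each model's class posteriors) and a hypothetical confidence threshold below which an assignment is flagged for manual review; the taxon labels and probabilities are placeholders.

```python
def soft_vote(posteriors: list, threshold: float = 0.5) -> tuple:
    """Average class posteriors across models and flag weak assignments.

    posteriors: one {class_label: probability} dict per model, same labels.
    Returns (best_label, mean_probability, needs_review).
    """
    n_models = len(posteriors)
    labels = posteriors[0].keys()
    mean_post = {c: sum(p[c] for p in posteriors) / n_models for c in labels}
    best = max(mean_post, key=mean_post.get)
    return best, mean_post[best], mean_post[best] < threshold


# Three hypothetical classifiers scoring one isolated tooth.
tooth = [
    {"taxon_A": 0.8, "taxon_B": 0.2},
    {"taxon_A": 0.6, "taxon_B": 0.4},
    {"taxon_A": 0.4, "taxon_B": 0.6},
]
print(soft_vote(tooth))
```

Here the models disagree individually, but the averaged posterior still favours `taxon_A` with about 0.6, above the assumed 0.5 review threshold.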
