Similar literature
20 similar records found (search time: 31 ms)
1.
Gaussian processes for machine learning
Gaussian processes (GPs) are natural generalisations of multivariate Gaussian random variables to infinite (countable or continuous) index sets. GPs have been applied in a large number of fields to a diverse range of ends, and many deep theoretical analyses of their properties are available. This paper gives an introduction to Gaussian processes at a fairly elementary level, with special emphasis on characteristics relevant to machine learning. It draws explicit connections to branches such as spline smoothing models and support vector machines, in which similar ideas have been investigated. Gaussian process models are routinely used to solve hard machine learning problems. They are attractive because of their flexible non-parametric nature and computational simplicity. Treated within a Bayesian framework, very powerful statistical methods can be implemented which offer valid estimates of uncertainties in our predictions and generic model selection procedures cast as nonlinear optimization problems. Their main drawback of heavy computational scaling has recently been alleviated by the introduction of generic sparse approximations [13,78,31]. The mathematical literature on GPs is large and often uses deep concepts which are not required to fully understand most machine learning applications. In this tutorial paper, we aim to present characteristics of GPs relevant to machine learning and to draw precise connections to other "kernel machines" popular in the community. Our focus is on a simple presentation, but references to more detailed sources are provided.
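As a concrete illustration of the Bayesian treatment the abstract describes, the sketch below implements GP regression with a squared-exponential kernel in plain NumPy. The kernel hyperparameters, the noise level, and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP regression."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train)
    K_ss = rbf_kernel(x_test, x_test)
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# fit to noiseless samples of sin(x) and predict at a held-out point
x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x)
mean, var = gp_posterior(x, y, np.array([np.pi / 2]))
```

The predictive variance returned alongside the mean is the "valid estimate of uncertainty" the abstract refers to; far from the training inputs it grows back toward the prior variance.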

2.
This paper reviews concept learning in Cebus monkeys, focussing on their ability to use the identity relation, oddity and natural concepts. Capuchins are similar to other primate genera in their use of these concepts. The extant data on learning in primates generally reflect historical concerns with general processes of learning. An alternative approach which considers the tasks the animal faces in its natural environment may be better suited to the discovery of species-unique characteristics of learning. This approach has not yet been applied to Cebus.

3.
For current computational intelligence techniques, a major challenge is how to learn new concepts in a changing environment. Traditional learning schemes cannot adequately address this problem because they lack a dynamic data selection mechanism. In this paper, inspired by the human learning process, a novel classification algorithm based on an incremental semi-supervised support vector machine (SVM) is proposed. Through analysis of the prediction confidence of samples and the data distribution in a changing environment, a "soft-start" approach, a data selection mechanism and a data cleaning mechanism are designed, which together complete the construction of our incremental semi-supervised learning system. Notably, the design of the proposed algorithm effectively reduces computational complexity. In addition, we analyze in detail the case where new labeled samples appear during the learning process. The results show that our algorithm does not rely on a model of the sample distribution, has an extremely low rate of introducing wrongly semi-labeled samples, and can effectively use unlabeled samples to enrich the classifier's knowledge and improve its accuracy. Moreover, our method also has outstanding generalization performance and the ability to overcome concept drift in a changing environment.
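The paper's exact "soft-start" and data-cleaning mechanisms are not public here, so the following is only a hedged sketch of the general self-training idea: unlabeled points whose prediction confidence exceeds a threshold are absorbed into the labeled set. A nearest-centroid margin stands in for SVM confidence; all names, thresholds, and data are illustrative assumptions.

```python
import numpy as np

def nearest_centroid_predict(X, centroids):
    """Predict labels plus a confidence score: the margin between the
    distances to the two nearest class centroids."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    labels = order[:, 0]
    rows = np.arange(len(X))
    margin = d[rows, order[:, 1]] - d[rows, order[:, 0]]
    return labels, margin

def self_train(X_lab, y_lab, X_unlab, threshold=0.5, rounds=5):
    """Iteratively absorb unlabeled points whose confidence exceeds
    `threshold` -- a stand-in for the paper's data selection mechanism."""
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        centroids = np.stack([X[y == k].mean(axis=0) for k in np.unique(y)])
        labels, margin = nearest_centroid_predict(pool, centroids)
        keep = margin > threshold
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, labels[keep]])
        pool = pool[~keep]
    return X, y

rng = np.random.default_rng(0)
X_lab = np.array([[0.0, 0.0], [4.0, 4.0]])      # one labeled seed per class
y_lab = np.array([0, 1])
X_unlab = np.vstack([rng.normal(0, 0.3, (20, 2)),
                     rng.normal(4, 0.3, (20, 2))])
X_aug, y_aug = self_train(X_lab, y_lab, X_unlab)
```

With two well-separated clusters, every unlabeled point is confidently absorbed in the first round, growing the labeled set from 2 to 42 samples.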

4.
A system with some degree of biological plausibility is developed to categorise items from a widely used machine learning benchmark. The system uses fatiguing leaky integrate-and-fire neurons, a relatively coarse point model that roughly duplicates biological spiking properties; this allows spontaneous firing based on hypo-fatigue, so that neurons not directly stimulated by the environment may be included in the circuit. A novel compensatory Hebbian learning algorithm is used that considers the total synaptic weight coming into a neuron. The network is unsupervised and entirely self-organising. This is relatively effective as a machine learning algorithm, categorising with just neurons, and the performance is comparable with a Kohonen map. However, the learning algorithm is not stable, and behaviour decays as the length of training increases. Variables including learning rate, inhibition and topology are explored, leading to stable systems driven by the environment. The model is thus a reasonable next step toward a full neural memory model.

5.
《Genomics》2020,112(3):2524-2534
The development of embryonic cells involves several continuous stages, and some genes are related to embryogenesis. To date, few studies have systematically investigated changes in gene expression profiles during mammalian embryogenesis. In this study, a computational analysis using machine learning algorithms was performed on the gene expression profiles of mouse embryonic cells at seven stages. First, the profiles were analyzed with a powerful Monte Carlo feature selection method to generate a feature list. Second, incremental feature selection was applied to the list, incorporating two classification algorithms: the support vector machine (SVM) and repeated incremental pruning to produce error reduction (RIPPER). Through the SVM, we extracted several latent gene biomarkers indicating the stages of embryonic cells and constructed an optimal SVM classifier that produced a nearly perfect classification of embryonic cells. Furthermore, some interesting rules were extracted by the RIPPER algorithm, suggesting different expression patterns for different stages.
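The incremental feature selection loop described above can be sketched as follows: evaluate a classifier on growing prefixes of a ranked feature list and keep the best prefix. This is a minimal stand-in, assuming a nearest-centroid classifier in place of the SVM and absolute label correlation in place of the Monte Carlo feature ranking; the synthetic data and every parameter are illustrative.

```python
import numpy as np

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of a nearest-centroid classifier (lightweight SVM stand-in)."""
    cents = np.stack([X_tr[y_tr == k].mean(axis=0) for k in np.unique(y_tr)])
    d = np.linalg.norm(X_te[:, None, :] - cents[None, :, :], axis=2)
    return (d.argmin(axis=1) == y_te).mean()

def incremental_feature_selection(X_tr, y_tr, X_te, y_te, ranking):
    """Try each prefix of the ranked feature list; return the best size/accuracy."""
    best_k, best_acc = 1, 0.0
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        acc = centroid_accuracy(X_tr[:, cols], y_tr, X_te[:, cols], y_te)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc

rng = np.random.default_rng(1)
n = 100
y = np.repeat([0, 1], n // 2)
informative = y[:, None] * 2.0 + rng.normal(0, 0.5, (n, 2))  # 2 useful features
noise = rng.normal(0, 1.0, (n, 4))                           # 4 pure-noise features
X = np.hstack([informative, noise])
# rank features by absolute correlation with the label
scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]
tr = rng.permutation(n)[:70]
te = np.setdiff1d(np.arange(n), tr)
best_k, best_acc = incremental_feature_selection(X[tr], y[tr], X[te], y[te], ranking)
```

On this synthetic data the informative features rank first and a short prefix already classifies almost perfectly, mirroring the "optimal classifier from a feature prefix" outcome the abstract reports.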

6.
Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function such as a ramp, a sigmoid or a Gaussian function. Some variations also exist, where units can be heterogeneous or where an alternative update technique is employed, such as a pulse stream generator. Associated with the connections are numerical values that must be adjusted using a learning rule, dictated by rule-specific parameters such as momentum, a learning rate and a temperature, amongst others. Usually, neural learning algorithms involve local updates, and global interaction between units is often discouraged, except in instances where units are fully connected or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification and for building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and the information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template.
We demonstrate our technique using Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster and on a dedicated parallel machine with PVM libraries.

7.
《Trends in genetics : TIG》2023,39(4):285-307
Liquid biopsies (LBs), particularly using circulating tumor DNA (ctDNA), are expected to revolutionize precision oncology and blood-based cancer screening. Recent technological improvements, in combination with the ever-growing understanding of cell-free DNA (cfDNA) biology, are enabling the detection of tumor-specific changes with extremely high resolution and new analysis concepts beyond genetic alterations, including methylomics, fragmentomics, and nucleosomics. The interrogation of a large number of markers and the high complexity of data render traditional correlation methods insufficient. In this regard, machine learning (ML) algorithms are increasingly being used to decipher disease- and tissue-specific signals from cfDNA. Here, we review recent insights into biological ctDNA features and how these are incorporated into sophisticated ML applications.

8.
Machine learning algorithms and their applications in environmental microbiology research
陈鹤  陶晔  毛振镀  邢鹏 《微生物学报》2022,62(12):4646-4662
Microorganisms are ubiquitous in the environment: they are key participants in biogeochemical cycles and environmental evolution, and they also play important roles in environmental monitoring, ecological remediation and conservation. With the development of high-throughput technologies, massive amounts of microbial data have been generated, and applying machine learning to model and analyze environmental microbial big data is of great significance for scientific research and practical applications in areas such as microbial biomarker identification, pollutant prediction and environmental quality prediction. Machine learning can be divided into two broad categories: supervised and unsupervised learning. In microbiome research, unsupervised learning efficiently learns the features of the input data through clustering, dimensionality reduction and related methods, thereby integrating and grouping microbial data. Supervised learning trains models on microbial datasets that have both features and labels; when faced with data that have features but no labels, the model can infer the labels, enabling classification, identification and prediction on new data. However, complex machine learning algorithms typically prioritize predictive accuracy at the expense of interpretability. A machine learning model can often be regarded as a "black box" that predicts a particular outcome, with little insight into how the prediction is made. To apply machine learning more widely in microbiome research and improve our ability to extract valuable microbial information, a deeper understanding of machine learning algorithms and better model interpretability are especially important. This review introduces the machine learning algorithms commonly used in environmental microbiology and the steps for building machine learning models on microbiome data, including feature selection, algorithm selection, model construction and evaluation. It also surveys applications of various machine learning models in environmental microbiology, explores the associations between microbiomes and their surrounding environments, discusses approaches for improving model interpretability, and provides a scientific reference for future environmental monitoring and environmental health prediction.

9.
赵学彤  杨亚东  渠鸿竹  方向东 《遗传》2018,40(9):693-703
With the continuous development of omics technologies, methods for acquiring biological data of different levels and types have matured. Large amounts of data are generated during disease diagnosis and treatment. It is essential to use machine learning and other artificial intelligence methods to analyze these complex, multidimensional, multiscale disease datasets, build clinical decision support tools, and help physicians find fast and effective diagnosis and treatment plans. In this process, the choice of machine learning method is particularly important. Accordingly, this review first briefly surveys the machine learning and related AI methods commonly used in clinical decision support from the perspectives of type and algorithm, covering support vector machines, logistic regression, clustering algorithms, Bagging, random forests and deep learning. It then summarizes and categorizes their applications in clinical decision support, discusses their respective strengths and weaknesses, and provides an effective reference for the selection of machine learning and other AI methods in clinical decision support.

10.
11.
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is an emerging high-throughput technique that has been widely applied to the rapid identification of clinical, food and aquatic microorganisms. A major current challenge is how to further improve the resolution of MALDI-TOF MS for microbial identification. To efficiently process large volumes of high-dimensional microbial MALDI-TOF MS data, various machine learning algorithms have been applied. This article reviews the application of machine learning to microbial identification with MALDI-TOF MS. After introducing the machine learning workflow for microbial MALDI-TOF MS classification, it describes the characteristics of MALDI-TOF MS data, MALDI-TOF MS databases, data preprocessing and model performance evaluation. It then discusses the application of typical machine learning classification algorithms and ensemble learning algorithms. Simple machine learning algorithms can hardly meet the high-resolution requirements of microbial MALDI-TOF MS classification, whereas combining different machine learning algorithms with ensemble learning can achieve better classification performance. For the preprocessing of MALDI-TOF MS data, wavelet algorithms and genetic algorithms are the most widely used; they…

12.
A pseudo-random generator is an algorithm that produces a sequence of objects which is determined by a truly random seed but is not itself truly random. It has been widely used in many applications, such as cryptography and simulations. In this article, we examine currently popular machine learning algorithms combined with various on-line algorithms on pseudo-random generated data, in order to find out which machine learning approach is more suitable for prediction on this kind of data. To further improve prediction performance, we propose a novel sample-weighted algorithm that takes the generalization error in each iteration into account. We perform an intensive evaluation on real Baccarat data generated by casino machines and on random numbers generated by a popular Java program, two typical examples of pseudo-random generated data. The experimental results show that the support vector machine and k-nearest neighbours perform better than the other methods on the evaluation data set, both with and without the sample-weighted algorithm.
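To illustrate why pseudo-random output is learnable at all, the sketch below runs an online 1-nearest-neighbour predictor over a deliberately weak linear congruential generator: once the generator's short period has been observed, every transition recurs and prediction becomes near-perfect. This is a toy stand-in for the kind of on-line evaluation the paper performs, not its sample-weighted algorithm; the generator constants and window size are illustrative assumptions.

```python
import numpy as np

def lcg(seed, n, a=5, c=3, m=64):
    """A deliberately weak linear congruential generator (period 64)."""
    out, x = [], seed
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x)                 # emit the internal state directly
    return np.array(out)

def knn_online_accuracy(seq, window=1, k=1):
    """Predict each value online from the previous `window` values using
    k-NN over all (pattern -> next value) pairs seen so far."""
    hits, total = 0, 0
    patterns, targets = [], []
    for t in range(window, len(seq)):
        x = seq[t - window:t]
        if patterns:
            d = np.abs(np.array(patterns) - x).sum(axis=1)
            nearest = np.argsort(d)[:k]
            pred = np.bincount(np.array(targets)[nearest]).argmax()
            hits += int(pred == seq[t])
            total += 1
        patterns.append(x)            # learn from this step afterwards
        targets.append(seq[t])
    return hits / total

seq = lcg(seed=42, n=2000)
acc = knn_online_accuracy(seq)
```

After the first full period (64 steps) every transition has an exact stored match, so accuracy climbs well above 90% over 2000 samples; on truly random data the same predictor would stay near chance.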

13.
For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions, thus we hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth only in 10%-13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology.

14.
Research on the microbiome has been actively conducted worldwide, and the results show that the human gut bacterial environment significantly affects the immune system, psychological conditions, cancers, obesity, and metabolic diseases. Thanks to the development of sequencing technology, microbiome studies with large numbers of samples are now feasible at an acceptable cost. Large samples allow more sophisticated modeling using machine learning approaches to study relationships between the microbiome and various traits. This article provides an overview of machine learning methods for non-data-scientists interested in association analysis of microbiomes and host phenotypes. Once the genomic features of the microbiome are determined, various methods can be used to explore the relationship between microbiome and host phenotypes, including penalized regression, the support vector machine (SVM), random forests, and artificial neural networks (ANNs). Deep neural network methods are also touched on. The analysis procedure, from environment setup to extraction of the results, is presented using the Python programming language.
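As one concrete example of the penalized-regression approach mentioned above, the following sketch fits an L2-penalized logistic regression to a synthetic relative-abundance table in NumPy. The data, the log/standardize preprocessing, and all hyperparameters are illustrative assumptions, not a prescription from the article.

```python
import numpy as np

def ridge_logistic(X, y, lam=0.1, lr=0.1, steps=500):
    """L2-penalized logistic regression fit by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        grad = X.T @ (p - y) / len(y) + lam * w   # log-loss gradient + ridge term
        w -= lr * grad
    return w

rng = np.random.default_rng(7)
n_samples, n_taxa = 200, 30
# synthetic relative-abundance table; taxon 0 drives the phenotype
abund = rng.dirichlet(np.ones(n_taxa), size=n_samples)
y = (abund[:, 0] > np.median(abund[:, 0])).astype(float)
X = np.log(abund + 1e-6)                      # log-transform compositional data
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each taxon feature
w = ridge_logistic(X, y)
```

The coefficient vector `w` is what an association analysis would inspect: here the phenotype-driving taxon receives the largest (positive) coefficient, while the penalty keeps the noise taxa near zero.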

15.
A fundamental and frequently overlooked aspect of animal learning is its reliance on compatibility between the learning rules used and the attentional and motivational mechanisms directing them to process the relevant data (called here data-acquisition mechanisms). We propose that this coordinated action, which may first appear fragile and error prone, is in fact extremely powerful, and critical for understanding cognitive evolution. Using basic examples from imprinting and associative learning, we argue that by coevolving to handle the natural distribution of data in the animal's environment, learning and data-acquisition mechanisms are tuned jointly so as to facilitate effective learning using relatively little memory and computation. We then suggest that this coevolutionary process offers a feasible path for the incremental evolution of complex cognitive systems, because it can greatly simplify learning. This is illustrated by considering how animals and humans can use these simple mechanisms to learn complex patterns and represent them in the brain. We conclude with some predictions and suggested directions for experimental and theoretical work.

16.
Advances in the prediction of protein targeting signals
Schneider G  Fechner U 《Proteomics》2004,4(6):1571-1580
Enlarged sets of reference data and specialised machine learning approaches have improved the accuracy of predicting protein subcellular localization. Recent approaches report over 95% correct predictions with low fractions of false positives for secretory proteins. A clear trend is to develop specifically tailored organism- and organelle-specific prediction tools rather than one general method. The focus of this review is on machine learning systems, highlighting four concepts: the artificial feed-forward neural network, the self-organizing map (SOM), the hidden Markov model (HMM), and the support vector machine (SVM).

17.

Background

Natural Language Processing (NLP) has been shown effective to analyze the content of radiology reports and identify diagnosis or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnosis and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated with the support of NLP tools by a physician for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep-vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports.

Results

The best model achieved an F measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases.
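For reference, the F measure reported above is the harmonic mean of precision and recall; a one-line sketch (the 0.97/0.99 figures are an invented illustration, not values from this study):

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# e.g. a classifier with precision 0.97 and recall 0.99
print(round(f_measure(0.97, 0.99), 2))  # → 0.98
```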

Conclusions

This study demonstrates the benefits of developing an automated method to identify medical concepts, modality and relations from radiology reports in French. An end-to-end automatic system for annotation and classification which could be applied to other radiology reports databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.

18.
Abstract

Accurate and rapid toxic gas concentration prediction models play an important role in emergency response to sudden gas leaks. However, existing dispersion models find it difficult to meet accuracy and efficiency requirements at the same time. Although some researchers have developed new forecasting models based on traditional machine learning, such as back-propagation (BP) neural networks and support vector machines (SVMs), the prediction accuracy of such models still needs improvement. This paper therefore proposes new prediction models based on deep learning, which has clear advantages over traditional machine learning in prediction and classification. Deep belief networks (DBNs) and convolutional neural networks (CNNs) are used to build the new dispersion models. Both models are compared with the Gaussian plume model, a computational fluid dynamics (CFD) model, and models based on traditional machine learning in terms of accuracy, prediction time, and computation time. The experimental results show that the CNN model performs best across all evaluation indexes.

19.
The purpose of this narrative review is to provide a critical reflection on how analytical machine learning approaches could harness the variability of patient presentation to enhance clinical prediction. The review summarizes current knowledge of the physiological adaptations present in people with spinal pain. We discuss how contemporary evidence highlights the importance of not relying on single features when characterizing patients, given the variability of these physiological adaptations. We review the advantages and disadvantages of current analytical strategies in basic science and epidemiological research, and consider how machine learning approaches could enhance clinical prediction of pain persistence or recurrence. We propose that machine learning techniques can be leveraged to translate a potentially heterogeneous set of variables into clinically useful information, with the potential to enhance patient management.

20.
Animals use heuristic strategies to determine from which conspecifics to learn socially, which leads to directed social learning and protects them from copying non-adaptive information. So far, the strategies that lead to directed social learning have been assumed to rely on (possibly indirect) inferences about a demonstrator's success. As an alternative to this assumption, we propose a strategy that uses only self-established estimates of the pay-offs of behaviour. We evaluate the strategy in a number of agent-based simulations. Critically, the strategy's success depends on the inclusion of an incremental learning mechanism. Our findings point out new theoretical opportunities for animals to regulate social learning. More broadly, our simulations emphasize the need to include a realistic learning mechanism in game-theoretic studies of social learning strategies, and call for a re-evaluation of previous findings.


Copyright©北京勤云科技发展有限公司  京ICP备09084417号