Similar articles
 20 similar articles found
1.
In this paper, we develop a novel semi-supervised learning algorithm, active hybrid deep belief networks (AHD), to address the semi-supervised sentiment classification problem with deep learning. First, we construct the first several hidden layers using restricted Boltzmann machines (RBM), which reduce the dimensionality and abstract the information in the reviews quickly. Second, we construct the subsequent hidden layers using convolutional restricted Boltzmann machines (CRBM), which abstract the information in the reviews effectively. Third, the constructed deep architecture is fine-tuned by gradient-descent-based supervised learning with an exponential loss function. Finally, an active learning method is combined with the proposed deep architecture. Experiments on five sentiment classification datasets show that AHD is competitive with previous semi-supervised learning algorithms. Further experiments verify the effectiveness of the proposed method with different numbers of labeled and unlabeled reviews.

2.
Subcellular localization of a protein is important for understanding its functions and interactions. Many computational techniques predict protein subcellular locations, but many prediction tasks have been shown to suffer from a shortage of training data. This paper introduces a new method that mines proteins with non-experimental annotations, i.e., proteins labeled on the basis of non-experimental evidence in protein databases, to overcome this shortage. A novel active sample selection strategy, drawing on active learning, is designed to find useful samples in the entire pool of candidate proteins with non-experimental annotations. This approach estimates the “value” of each sample, automatically selects the most valuable samples, and adds them to the original training set to help retrain the classifiers. Numerical experiments with four popular multi-label classifiers on three benchmark datasets show that the proposed method effectively selects valuable samples to supplement the original training set and significantly improves the performance of the classifiers.

3.
Considering the two-class classification problem in brain imaging data analysis, we propose a sparse representation-based multi-variate pattern analysis (MVPA) algorithm to localize brain activation patterns corresponding to different stimulus classes/brain states. Feature selection can be modeled as a sparse representation (or sparse regression) problem, and this technique has been successfully applied to voxel selection in fMRI data analysis. However, a single selection based on sparse representation or other methods tends to yield only a subset of the informative features rather than all of them. Our proposed algorithm therefore recursively eliminates informative features selected by a sparse regression method until the decoding accuracy based on the remaining features drops to a threshold close to chance level. In this way, the resultant feature set, which includes all the identified features, is expected to contain all the informative features for discrimination. According to the signs of the sparse regression weights, the selected features are separated into two sets corresponding to the two stimulus classes/brain states. Next, to remove irrelevant/noisy features from the two selected feature sets, we perform a nonparametric permutation test at the individual subject level or the group level. We verified our algorithm with a toy data set and an intrinsic signal optical imaging data set. The results show that our algorithm accurately localized two class-related patterns. As an application example, we applied our algorithm to a functional magnetic resonance imaging (fMRI) data set. Two sets of informative voxels, corresponding to two semantic categories (i.e., “old people” and “young people”), were obtained in the human brain.

4.
In this paper, we address several aspects of the important problem of feature selection for a P300-based brain computer interface (BCI) speller system. First, time segment selection and electroencephalogram channel selection are performed jointly for better discriminability of P300 and background signals. Second, because labeled training data are insufficient, we propose an iterative semi-supervised support vector machine for joint spatio-temporal feature selection and classification, in which both labeled training data and unlabeled test data are utilized. More importantly, the semi-supervised learning makes the system adaptive. The performance of our algorithm has been evaluated on a P300 dataset provided by BCI Competition 2005 and another dataset collected from an in-house P300 speller system. The results show that our algorithm for joint feature selection and classification achieves satisfactory performance while significantly reducing the training effort of the system. Furthermore, the algorithm is implemented online, and the corresponding results demonstrate that it can improve the adaptiveness of the P300-based BCI speller.

5.
Genome-wide association studies (GWAS) have generated sufficient data to assess the role of selection in shaping allelic diversity of disease-associated SNPs. Negative selection against disease risk variants is expected to reduce their frequencies, making them overrepresented in the group of minor (<50%) alleles. Indeed, we found that the overall proportion of risk alleles was higher among alleles with frequency <50% (minor alleles) than among major alleles. We hypothesized that negative selection may have different effects on environment (or lifestyle)-dependent versus environment (or lifestyle)-independent diseases. We used an environment/lifestyle index (ELI) to assess the influence of environmental/lifestyle factors on disease etiology. ELI was defined as the number of publications mentioning “environment” or “lifestyle” AND disease per 1,000 disease-mentioning publications. We found that the frequency distributions of the risk alleles for diseases with strong environmental/lifestyle components follow the distribution expected under a selectively neutral model, while the frequency distributions of the risk alleles for diseases with weak environmental/lifestyle influences are shifted toward lower values, indicating effects of negative selection. We hypothesized that previously selectively neutral variants become risk alleles when the environment changes. The hypothesis of ancestrally neutral, currently disadvantageous risk-associated alleles predicts that the distribution of risk alleles for environment/lifestyle-dependent diseases will follow a neutral model, since natural selection has not had enough time to influence allele frequencies. The results of our analysis suggest that prediction of SNP functionality based on the level of evolutionary conservation may not be useful for SNPs associated with environment/lifestyle-dependent diseases.
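The ELI definition in this abstract is a simple rate per 1,000 publications. A minimal sketch of that arithmetic follows; the function name and the publication counts are hypothetical and chosen only for illustration, and how the authors queried the literature is not described here.

```python
def environment_lifestyle_index(n_env_and_disease: int, n_disease: int) -> float:
    """ELI: publications mentioning ("environment" OR "lifestyle") AND the disease,
    expressed per 1,000 publications that mention the disease."""
    if n_disease <= 0:
        raise ValueError("n_disease must be positive")
    if not 0 <= n_env_and_disease <= n_disease:
        raise ValueError("n_env_and_disease must lie between 0 and n_disease")
    return 1000.0 * n_env_and_disease / n_disease


# Hypothetical counts, not taken from the paper.
eli = environment_lifestyle_index(n_env_and_disease=150, n_disease=5000)
print(eli)  # 30.0
```

Under this definition a larger ELI indicates that a greater share of the disease literature discusses environmental or lifestyle factors.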

6.
In constructing and visualizing a virtual three-dimensional forest scene, we must first obtain the vegetation distribution, namely, the location of each plant in the forest. Because the forest contains a large number of plants, the location of each plant is difficult to obtain by direct measurement. Random approaches are commonly used to simulate a forest distribution but fail to reflect the specific biological arrangements among types of plants. Observations show that plants in the forest tend to form particular distribution patterns due to growth competition and specific habitats. Such a pattern, which represents a local feature in the distribution and occurs repeatedly in the forest, is in line with the “locality” and “static” characteristics of “texture data”, making it possible to use a sample-based texture synthesis strategy to build the distribution. We propose a vegetation distribution data generation method that uses sample-based vector pattern synthesis. A sample forest stand is obtained first and recorded as a two-dimensional vector-element distribution pattern. Next, the large-scale vegetation distribution pattern is synthesized automatically using the proposed vector pattern synthesis algorithm. The synthesized distribution pattern resembles the sample pattern in its distribution features. The vector pattern synthesis algorithm proposed in this paper adopts a neighborhood comparison technique based on histogram matching, which makes it efficient and easy to implement. Experiments show that the distribution pattern synthesized with this method sufficiently preserves the features of the sample distribution pattern, making our method useful for constructing realistic forest scenes.

7.
The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method based on pattern clustering and sentence parsing for biomedical relation extraction. The pattern clustering algorithm is based on the polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used in relation extraction between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing and phrase structure parsing rules. We evaluated the approaches on two different tasks: (1) protein–protein interaction extraction, and (2) gene–suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods, which are rule based, SVM based, and kernel based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation of gene–suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
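For readers unfamiliar with the polynomial kernel mentioned in this abstract, its standard form is K(x, y) = (x · y + c)^d. The sketch below only illustrates that form; the vectors, the degree and the offset are hypothetical, and the abstract does not state which values or pattern representation the authors used.

```python
def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Standard polynomial kernel: (x . y + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Hypothetical pattern vectors (for example, word-count features).
p1 = [1.0, 2.0]
p2 = [3.0, 4.0]
similarity = polynomial_kernel(p1, p2)  # (1*3 + 2*4 + 1) ** 2
print(similarity)
```

A clustering step would then group patterns whose pairwise kernel values are high; that step is omitted here because the abstract does not specify it.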

8.
Our nervous system can efficiently recognize objects in spite of changes in contextual variables such as perspective or lighting conditions. Several lines of research have proposed that this ability for invariant recognition is learned by exploiting the fact that object identities typically vary more slowly in time than contextual variables or noise. Here, we study the question of how this “temporal stability” or “slowness” approach can be implemented within the limits of biologically realistic spike-based learning rules. We first show that slow feature analysis, an algorithm that is based on slowness, can be implemented in linear continuous model neurons by means of a modified Hebbian learning rule. This approach provides a link to the trace rule, which is another implementation of slowness learning. Then, we show analytically that for linear Poisson neurons, slowness learning can be implemented by spike-timing–dependent plasticity (STDP) with a specific learning window. By studying the learning dynamics of STDP, we show that for functional interpretations of STDP, it is not the learning window alone that is relevant but rather the convolution of the learning window with the postsynaptic potential. We then derive STDP learning windows that implement slow feature analysis and the “trace rule.” The resulting learning windows are compatible with physiological data both in shape and timescale. Moreover, our analysis shows that the learning window can be split into two functionally different components that are sensitive to reversible and irreversible aspects of the input statistics, respectively. The theory indicates that irreversible input statistics are not in favor of stable weight distributions but may generate oscillatory weight dynamics. Our analysis offers a novel interpretation for the functional role of STDP in physiological neurons.

9.
Community structure detection is of great importance because it can help reveal the relationship between the function and the topological structure of a network. Many community detection algorithms have been proposed, but how to incorporate prior knowledge into the detection process remains a challenging problem. In this paper, we propose a semi-supervised community detection algorithm that makes full use of must-link and cannot-link constraints to guide the process of community detection and thereby extracts high-quality community structures from networks. To acquire high-quality must-link and cannot-link constraints, we also propose a semi-supervised component generation algorithm based on active learning, which step by step selects the nodes with maximum utility for the proposed semi-supervised community detection algorithm and then generates the must-link and cannot-link constraints by querying a noiseless oracle. Extensive experiments show that introducing active learning into community detection is successful: our proposed method extracts high-quality community structures from networks and significantly outperforms the other methods compared.
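The must-link and cannot-link constraints in this abstract can be made concrete with a small check: a must-link pair should share a community, and a cannot-link pair should not. The sketch below assumes a community assignment stored as a node-to-label dictionary; the node names, labels and the helper name are hypothetical, and this is not the authors' detection algorithm.

```python
def violated_constraints(labels, must_link, cannot_link):
    """Return the must-link and cannot-link pairs that a community assignment violates."""
    # A must-link pair is violated when its two nodes carry different labels.
    bad_must = [(u, v) for u, v in must_link if labels[u] != labels[v]]
    # A cannot-link pair is violated when its two nodes carry the same label.
    bad_cannot = [(u, v) for u, v in cannot_link if labels[u] == labels[v]]
    return bad_must, bad_cannot


# Hypothetical four-node network split into two communities.
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
must_link = [("a", "b"), ("b", "c")]
cannot_link = [("a", "d"), ("c", "d")]

bad_must, bad_cannot = violated_constraints(labels, must_link, cannot_link)
print(bad_must)    # [('b', 'c')]
print(bad_cannot)  # [('c', 'd')]
```

A semi-supervised method would use such violations as a penalty or as hard constraints while searching for communities.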

10.
Remote sensing technologies are now widely employed in the dynamic monitoring of land. This paper presents an algorithm named fuzzy nonlinear proximal support vector machine (FNPSVM) based on ETM+ remote sensing images. The algorithm is applied to extract various land types in the city of Da’an in northern China. Two multi-category strategies for this algorithm, namely “one-against-one” and “one-against-rest”, are described in detail and compared. A fuzzy membership function is presented to reduce the effects of noise and outliers in the data samples. The approaches to feature extraction and feature selection and several key parameter settings are also given. Numerous experiments were carried out to evaluate its performance, including accuracy (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared with three other classifiers, the maximum likelihood classifier (MLC), the back propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples and features on the four classifiers were also evaluated in these experiments.
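The two accuracy measures named in this abstract, overall accuracy and the kappa coefficient, are both computed from a confusion matrix. A minimal sketch follows, assuming rows are reference classes and columns are predicted classes; the matrix values are hypothetical and not results from the paper. The last helper counts the binary classifiers each multi-category strategy needs for k classes.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)


def n_binary_classifiers(k, strategy):
    """Binary classifiers required for k classes under each strategy."""
    if strategy == "one-against-one":
        return k * (k - 1) // 2
    if strategy == "one-against-rest":
        return k
    raise ValueError("unknown strategy")


# Hypothetical 2-class confusion matrix (rows: reference, columns: predicted).
cm = [[45, 5],
      [10, 40]]
print(overall_accuracy(cm), kappa(cm))
```

For this matrix the observed agreement is 0.85 and the chance agreement is 0.5, so kappa works out to about 0.7.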

11.
Incremental learning, in which new knowledge is acquired gradually through trial and error, can be distinguished from one-shot learning, in which the brain learns rapidly from only a single pairing of a stimulus and a consequence. Very little is known about how the brain transitions between these two fundamentally different forms of learning. Here we test a computational hypothesis that uncertainty about the causal relationship between a stimulus and an outcome induces rapid changes in the rate of learning, which in turn mediates the transition between incremental and one-shot learning. By using a novel behavioral task in combination with functional magnetic resonance imaging (fMRI) data from human volunteers, we found evidence implicating the ventrolateral prefrontal cortex and hippocampus in this process. The hippocampus was selectively “switched” on when one-shot learning was predicted to occur, while the ventrolateral prefrontal cortex was found to encode uncertainty about the causal association, exhibiting increased coupling with the hippocampus for high-learning rates, suggesting this region may act as a “switch,” turning on and off one-shot learning as required.

12.
Moshe Szyf 《Epigenetics》2011,6(8):971-978
Although epidemiological data provide evidence that there is an interaction between genetics (nature) and the social and physical environments (nurture) in human development, the main open question remains the mechanism. The pattern of distribution of methyl groups in DNA differs from cell type to cell type and confers cell-specific identity on DNA during cellular differentiation and organogenesis. This is an innate and highly programmed process. However, recent data suggest that DNA methylation is involved not only in cellular differentiation but also in modulation of genome function in response to signals from the physical, biological and social environments. We propose that modulation of DNA methylation in response to environmental cues early in life serves as a mechanism of life-long genome “adaptation” that molecularly embeds the early experiences of a child (“nurture”) in the genome (“nature”). There is an emerging line of data supporting this hypothesis in rodents, non-human primates and humans that will be reviewed here. However, several critical questions remain, including the identification of mechanisms that transmit the signals from the social environment to the DNA methylation/demethylation enzymes.
Key words: DNA methylation, psychiatry, development, epidemiology, environment

13.
Learning in a stochastic environment consists of estimating a model from a limited amount of noisy data, and is therefore inherently uncertain. However, many classical models reduce the learning process to the updating of parameter estimates and neglect the fact that learning is also frequently accompanied by a variable “feeling of knowing” or confidence. The characteristics and the origin of these subjective confidence estimates thus remain largely unknown. Here we investigate whether, during learning, humans not only infer a model of their environment, but also derive an accurate sense of confidence from their inferences. In our experiment, humans estimated the transition probabilities between two visual or auditory stimuli in a changing environment, and reported their mean estimate and their confidence in this report. To formalize the link between both kinds of estimate and assess their accuracy in comparison to a normative reference, we derive the optimal inference strategy for our task. Our results indicate that subjects accurately track the likelihood that their inferences are correct. Learning and estimating confidence in what has been learned appear to be two intimately related abilities, suggesting that they arise from a single inference process. We show that human performance matches several properties of the optimal probabilistic inference. In particular, subjective confidence is impacted by environmental uncertainty, both at the first level (uncertainty in stimulus occurrence given the inferred stochastic characteristics) and at the second level (uncertainty due to unexpected changes in these stochastic characteristics). Confidence also increases appropriately with the number of observations within stable periods. Our results support the idea that humans possess a quantitative sense of confidence in their inferences about abstract non-sensory parameters of the environment. 
This ability cannot be reduced to simple heuristics; it seems instead to be a core property of the learning process.
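To make the link between an estimate and its confidence concrete, the sketch below tracks one transition probability with a Beta posterior and reports confidence as the log precision of that posterior. This is a deliberate simplification: it assumes a stable environment with no change points, a uniform Beta(1, 1) prior, and a log-precision confidence measure, none of which is claimed to be the authors' optimal inference model. The stimulus sequence is hypothetical.

```python
import math


def transition_estimate(seq, source):
    """Posterior mean of P(next != source | current == source) and a confidence value.

    Assumes a uniform Beta(1, 1) prior and no change points.
    Confidence is the log precision (negative log variance) of the posterior.
    """
    switches = stays = 0
    for current, nxt in zip(seq, seq[1:]):
        if current == source:
            if nxt == source:
                stays += 1
            else:
                switches += 1
    a, b = 1 + switches, 1 + stays
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, -math.log(variance)


# Hypothetical sequence of two stimuli, A and B.
short_seq = "AABABBBA"
long_seq = short_seq * 5

mean_short, conf_short = transition_estimate(short_seq, "A")
mean_long, conf_long = transition_estimate(long_seq, "A")
print(mean_short, conf_short, conf_long)
```

With more observations the posterior narrows, so the confidence value rises, which mirrors the abstract's statement that confidence increases with the number of observations within stable periods.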

14.
A new manifold learning method, called parameter-free semi-supervised local Fisher discriminant analysis (pSELF), is proposed to map the gene expression data into a low-dimensional space for tumor classification. Motivated by the fact that semi-supervised and parameter-free are two desirable and promising characteristics for dimension reduction, a new difference-based optimization objective function with unlabeled samples has been designed. The proposed method preserves the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. The semi-supervised method has an analytic form of the globally optimal solution, which can be computed efficiently by eigen decomposition. Experimental results on synthetic data and SRBCT, DLBCL, and Brain Tumor gene expression data sets demonstrate the effectiveness of the proposed method.

15.

Background

Selecting a subset of relevant properties from a large set of features that describe a dataset is a challenging machine learning task. In biology, for instance, the advances in the available technologies enable the generation of a very large number of biomarkers that describe the data. Choosing the more informative markers along with performing a high-accuracy classification over the data can be a daunting task, particularly if the data are high dimensional. An often adopted approach is to formulate the feature selection problem as a biobjective optimization problem, with the aim of maximizing the performance of the data analysis model (the quality of the data training fitting) while minimizing the number of features used.

Results

We propose an optimization approach for the feature selection problem that considers a “chaotic” version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature. The balance between exploration of the search space and exploitation of the best solutions is a challenge in multi-objective optimization. The exploration/exploitation rate is controlled by the parameter I, which limits the random walk range of the ants/prey. This variable is increased iteratively in a quasi-linear manner to decrease the exploration rate as the optimization progresses. The resulting quasi-linear decrease in exploration may lead to premature convergence in some cases and trapping in local minima in others. The chaotic system proposed here attempts to improve the tradeoff between exploration and exploitation. The methodology is evaluated using different chaotic maps on a number of feature selection datasets. To ensure generality, we used ten biological datasets, but we also used other types of data from various sources. The results are compared with the particle swarm optimizer and with genetic algorithm variants for feature selection using a set of quality metrics.
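As an illustration of what a chaotic map contributes, the sketch below generates a logistic-map sequence and uses it to modulate a linearly shrinking random-walk radius. The logistic map is one common chaotic map; which maps the authors tested and how they couple them to the parameter I is not stated in this abstract, so the schedule below is an assumption made only for illustration.

```python
def logistic_map(x0, n, r=4.0):
    """Iterate x <- r * x * (1 - x) n times; with r = 4 the sequence is chaotic on (0, 1)."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_radius_schedule(n_iter, x0=0.7):
    """Hypothetical schedule: a linearly shrinking walk radius scaled by a chaotic value.

    The linear factor lowers exploration over time; the chaotic factor perturbs it
    so that the decrease is no longer monotonic.
    """
    chaos = logistic_map(x0, n_iter)
    return [(1.0 - t / n_iter) * c for t, c in enumerate(chaos)]


radii = chaotic_radius_schedule(50)
print(radii[:5])
```

Compared with a purely linear schedule, occasional large radii late in the run give the search a chance to leave a local optimum.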

16.
The quality of electrophysiological recordings varies considerably due to technical and biological variability, and neuroscientists inevitably have to select “good” recordings for further analyses. This procedure is time-consuming and prone to selection biases. Here, we investigate replacing human decisions by a machine learning approach. We define 16 features, such as spike height and width, select the most informative ones using a wrapper method and train a classifier to reproduce the judgement of one of our expert electrophysiologists. Generalisation performance is then assessed on unseen data, classified by the same or by another expert. We observe that the learning machine can be as consistent in its judgements as individual experts are with one another, if not more so. Best performance is achieved for a limited number of informative features, with the optimal feature set differing from one data set to another. With 80–90% of correct judgements, the performance of the system is very promising within the data sets of each expert, but judgements are less reliable when it is used across sets of recordings from different experts. We conclude that the proposed approach is relevant to the selection of electrophysiological recordings, provided parameters are adjusted to different types of experiments and to individual experimenters.

17.
18.
Classical Marr-Albus theories of cerebellar learning employ only cortical sites of plasticity. However, tests of these theories using adaptive calibration of the vestibulo–ocular reflex (VOR) have indicated plasticity in both cerebellar cortex and the brainstem. To resolve this long-standing conflict, we attempted to identify the computational role of the brainstem site, by using an adaptive filter version of the cerebellar microcircuit to model VOR calibration for changes in the oculomotor plant. With only cortical plasticity, introducing a realistic delay in the retinal-slip error signal of 100 ms prevented learning at frequencies higher than 2.5 Hz, although the VOR itself is accurate up to at least 25 Hz. However, the introduction of an additional brainstem site of plasticity, driven by the correlation between cerebellar and vestibular inputs, overcame the 2.5 Hz limitation and allowed learning of accurate high-frequency gains. This “cortex-first” learning mechanism is consistent with a wide variety of evidence concerning the role of the flocculus in VOR calibration, and complements rather than replaces the previously proposed “brainstem-first” mechanism that operates when ocular tracking mechanisms are effective. These results (i) describe a process whereby information originally learnt in one area of the brain (cerebellar cortex) can be transferred and expressed in another (brainstem), and (ii) indicate for the first time why a brainstem site of plasticity is actually required by Marr-Albus type models when high-frequency gains must be learned in the presence of error delay.
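The 100 ms delay and the 2.5 Hz limit quoted in this abstract are related by simple arithmetic: a pure delay produces a phase lag of 360 × frequency × delay degrees, which reaches a quarter cycle (90 degrees) at 1 / (4 × delay). The sketch below shows that calculation. Reading the limit as a quarter-cycle condition is offered as a plausible interpretation (beyond 90 degrees a delayed sinusoidal error is, on average, negatively correlated with the signal it should correct), not as the authors' derivation; the function names are hypothetical.

```python
def phase_lag_degrees(freq_hz, delay_s):
    """Phase lag, in degrees, introduced by a pure time delay at a given frequency."""
    return 360.0 * freq_hz * delay_s


def quarter_cycle_frequency(delay_s):
    """Frequency at which a pure delay equals a quarter of a cycle (90 degrees of lag)."""
    return 1.0 / (4.0 * delay_s)


delay = 0.100  # retinal-slip error delay in seconds, as quoted in the abstract
print(quarter_cycle_frequency(delay))   # about 2.5 Hz
print(phase_lag_degrees(2.5, delay))    # about 90 degrees
print(phase_lag_degrees(25.0, delay))   # about 900 degrees, i.e. 2.5 full cycles
```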

19.
The concept of plant intelligence, as proposed by Anthony Trewavas, has raised considerable discussion. However, plant intelligence remains loosely defined; often it is either perceived as practically synonymous to Darwinian fitness, or reduced to a mere decorative metaphor. A more strict view can be taken, emphasizing necessary prerequisites such as memory and learning, which requires clarifying the definition of memory itself. To qualify as memories, traces of past events have to be not only stored, but also actively accessed. We propose a criterion for eliminating false candidates of possible plant intelligence phenomena in this stricter sense: an “intelligent” behavior must involve a component that can be approximated by a plausible algorithmic model involving recourse to stored information about past states of the individual or its environment. Re-evaluation of previously presented examples of plant intelligence shows that only some of them pass our test.
“You were hurt?” Kumiko said, looking at the scar. Sally looked down. “Yeah.” “Why didn't you have it removed?” “Sometimes it's good to remember.” “Being hurt?” “Being stupid.” (W. Gibson: Mona Lisa Overdrive)
Key words: intelligence, memory, learning, plant development, mathematical models, plant neurobiology, definition of terms

20.
Guillaume Martin 《Genetics》2014,197(1):237-255
Models relating phenotype space to fitness (phenotype–fitness landscapes) have seen important developments recently. They can roughly be divided into mechanistic models (e.g., metabolic networks) and more heuristic models like Fisher’s geometrical model. Each has its own drawbacks, but both yield testable predictions on how the context (genomic background or environment) affects the distribution of mutation effects on fitness and thus adaptation. Both have received some empirical validation. This article aims at bridging the gap between these approaches. A derivation of the Fisher model “from first principles” is proposed, where the basic assumptions emerge from a more general model, inspired by mechanistic networks. I start from a general phenotypic network relating unspecified phenotypic traits and fitness. A limited set of qualitative assumptions is then imposed, mostly corresponding to known features of phenotypic networks: a large set of traits is pleiotropically affected by mutations and determines a much smaller set of traits under optimizing selection. Otherwise, the model remains fairly general regarding the phenotypic processes involved or the distribution of mutation effects affecting the network. A statistical treatment and a local approximation close to a fitness optimum yield a landscape that is effectively the isotropic Fisher model or its extension with a single dominant phenotypic direction. The fit of the resulting alternative distributions is illustrated in an empirical data set. These results bear implications on the validity of Fisher’s model’s assumptions and on which features of mutation fitness effects may vary (or not) across genomic or environmental contexts.
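The isotropic Fisher model that this abstract arrives at can be simulated in a few lines. The sketch below assumes a Gaussian fitness function with its optimum at the origin and isotropic normal mutation effects on every trait; the number of traits, the mutation size and the starting phenotypes are hypothetical choices, and the code is not the article's derivation. It shows how the distribution of mutation fitness effects depends on context, here the distance from the optimum.

```python
import random


def log_fitness(z):
    """Log fitness under an isotropic Gaussian landscape with its optimum at the origin."""
    return -0.5 * sum(v * v for v in z)


def mutation_effects(z, n_mut, sigma, rng):
    """Selection coefficients s = log W(z + dz) - log W(z) for isotropic normal mutations."""
    base = log_fitness(z)
    effects = []
    for _ in range(n_mut):
        mutant = [v + rng.gauss(0.0, sigma) for v in z]
        effects.append(log_fitness(mutant) - base)
    return effects


rng = random.Random(0)  # fixed seed so the run is reproducible
at_optimum = mutation_effects([0.0] * 5, 1000, 0.3, rng)
off_optimum = mutation_effects([2.0, 0.0, 0.0, 0.0, 0.0], 1000, 0.3, rng)

frac_beneficial = sum(s > 0 for s in off_optimum) / len(off_optimum)
print(max(at_optimum), frac_beneficial)
```

At the optimum every mutation is deleterious, whereas a maladapted phenotype has a sizeable fraction of beneficial mutations, which is the kind of context dependence the abstract refers to.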
