首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 662 毫秒
1.
2.
An important open problem of computational neuroscience is the generic organization of computations in networks of neurons in the brain. We show here through rigorous theoretical analysis that inherent stochastic features of spiking neurons, in combination with simple nonlinear computational operations in specific network motifs and dendritic arbors, enable networks of spiking neurons to carry out probabilistic inference through sampling in general graphical models. In particular, it enables them to carry out probabilistic inference in Bayesian networks with converging arrows ("explaining away") and with undirected loops, that occur in many real-world tasks. Ubiquitous stochastic features of networks of spiking neurons, such as trial-to-trial variability and spontaneous activity, are necessary ingredients of the underlying computational organization. We demonstrate through computer simulations that this approach can be scaled up to neural emulations of probabilistic inference in fairly large graphical models, yielding some of the most complex computations that have been carried out so far in networks of spiking neurons.  相似文献   

3.
Bayesian belief networks (BBN) are a widely studied graphical model for representing uncertainty and probabilistic interdependence among variables. One of the factors that restricts the model's wide acceptance in practical applications is that the general inference with BBN is NP-hard. This is also true for the maximum a posteriori probability (MAP) problem, which is to find the most probable joint value assignment to all uninstantiated variables, given instantiation of some variables in a BBN. To circumvent the difficulty caused by MAP's computational complexity, we suggest in this paper a neural network approximation approach. With this approach, a BBN is treated as a neural network without any change or transformation of the network structure, and the node activation functions are derived based on an energy function defined over a given BBN. Three methods are developed. They are the hill-climbing style discrete method, the simulated annealing method, and the continuous method based on the mean field theory. All three methods are for BBN of general structures, with the restriction that nodes of BBN are binary variables. In addition, rules for applying these methods to noisy-or networks are also developed, which may lead to more efficient computation in some cases. These methods' convergence is analyzed, and their validity tested through a series of computer experiments with two BBN of moderate size and complexity. Although additional theoretical and empirical work is needed, the analysis and experiments suggest that this approach may lead to effective and accurate approximation for MAP problems.  相似文献   

4.
We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology methods.  相似文献   

5.
The identification of effective connectivity from time-series data such as electroencephalogram (EEG) or time-resolved function magnetic resonance imaging (fMRI) recordings is an important problem in brain imaging. One commonly used approach to inference effective connectivity is based on vector autoregressive models and the concept of Granger causality. However, this probabilistic concept of causality can lead to spurious causalities in the presence of latent variables. Recently, graphical models have been used to discuss problems of causal inference for multivariate data. In this paper, we extend these concepts to the case of time-series and present a graphical approach for discussing Granger-causal relationships among multiple time-series. In particular, we propose a new graphical representation that allows the characterization of spurious causality and, thus, can be used to investigate spurious causality. The method is demonstrated with concurrent EEG and fMRI recordings which are used to investigate the interrelations between the alpha rhythm in the EEG and blood oxygenation level dependent (BOLD) responses in the fMRI. The results confirm previous findings on the location of the source of the EEG alpha rhythm.  相似文献   

6.
Minimum message length encoding and the comparison of macromolecules   总被引:2,自引:0,他引:2  
A method of inductive inference known asminimum message length encoding is applied to string comparison in molecular biology. The question of whether or not two strings are related and, if so, of how they are related and the problem of finding a good theory of string mutation are treated as inductive inference problems. The method allows the posterior odds-ratio of two string alignments or of two models of string mutation to be computed. The connection between models of mutation and existing string alignment algorithms is made explicit. A fast minimum message length alignment algorithm is also described. Supported by Australian Research Council grant A48830856.  相似文献   

7.
Much modern work in phylogenetics depends on statistical sampling approaches to phylogeny construction to estimate probability distributions of possible trees for any given input data set. Our theoretical understanding of sampling approaches to phylogenetics remains far less developed than that for optimization approaches, however, particularly with regard to the number of sampling steps needed to produce accurate samples of tree partition functions. Despite the many advantages in principle of being able to sample trees from sophisticated probabilistic models, we have little theoretical basis for concluding that the prevailing sampling approaches do in fact yield accurate samples from those models within realistic numbers of steps. We propose a novel approach to phylogenetic sampling intended to be both efficient in practice and more amenable to theoretical analysis than the prevailing methods. The method depends on replacing the standard tree rearrangement moves with an alternative Markov model in which one solves a theoretically hard but practically tractable optimization problem on each step of sampling. The resulting method can be applied to a broad range of standard probability models, yielding practical algorithms for efficient sampling and rigorous proofs of accurate sampling for heated versions of some important special cases. We demonstrate the efficiency and versatility of the method by an analysis of uncertainty in tree inference over varying input sizes. In addition to providing a new practical method for phylogenetic sampling, the technique is likely to prove applicable to many similar problems involving sampling over combinatorial objects weighted by a likelihood model.  相似文献   

8.
Summary Minimum message length encoding is a technique of inductive inference with theoretical and practical advantages. It allows the posterior odds-ratio of two theories or hypotheses to be calculated. Here it is applied to problems of aligning or relating two strings, in particular two biological macromolecules. We compare the r-theory, that the strings are related, with the null-theory, that they are not related. If they are related, the probabilities of the various alignments can be calculated. This is done for one-, three-, and five-state models of relation or mutation. These correspond to linear and piecewise linear cost functions on runs of insertions and deletions. We describe how to estimate parameters of a model. The validity of a model is itself an hypothesis and can be objectively tested. This is done on real DNA strings and on artificial data. The tests on artificial data indicate limits on what can be inferred in various situations. The tests on real DNA support either the three- or five-state models over the one-state model. Finally, a fast, approximate minimum message length string comparison algorithm is described.Offprint requests to: L. Allison  相似文献   

9.
In order to improve our understanding of directly transmitted pathogens within host populations, epidemic models should take into account individual heterogeneities as well as stochastic fluctuations in individual parameters. The associated cost results in an increasing level of complexity of the mathematical models which generally lack consistent formalisms. In this paper, we demonstrate that complex epidemic models could be expressed as colored stochastic Petri nets (CSPN). CSPN is a mathematical tool developed in computer science. The concept is based on the Markov Chain theory and on a standard well codified graphical formalism. This approach presents an alternative to other computer simulation methods since it offers both a theoretical formalism and a graphical representation that facilitate the implementation, the understanding and thus the replication or modification of the model. We explain how common concepts of epidemic models--such as the incidence function--can be easily translated into an individual based point of view in the CSPN formalism. We then illustrate this approach by using the well documented susceptible-infected model with recruitment and death.  相似文献   

10.
Graphical models describe the linear correlation structure of data and have been used to establish causal relationships among phenotypes in genetic mapping populations. Data are typically collected at a single point in time. Biological processes on the other hand are often non-linear and display time varying dynamics. The extent to which graphical models can recapitulate the architecture of an underlying biological processes is not well understood. We consider metabolic networks with known stoichiometry to address the fundamental question: “What can causal networks tell us about metabolic pathways?”. Using data from an Arabidopsis BaySha population and simulated data from dynamic models of pathway motifs, we assess our ability to reconstruct metabolic pathways using graphical models. Our results highlight the necessity of non-genetic residual biological variation for reliable inference. Recovery of the ordering within a pathway is possible, but should not be expected. Causal inference is sensitive to subtle patterns in the correlation structure that may be driven by a variety of factors, which may not emphasize the substrate-product relationship. We illustrate the effects of metabolic pathway architecture, epistasis and stochastic variation on correlation structure and graphical model-derived networks. We conclude that graphical models should be interpreted cautiously, especially if the implied causal relationships are to be used in the design of intervention strategies.  相似文献   

11.
The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.  相似文献   

12.
We introduce three algorithms for learning generative models of molecular structures from molecular dynamics simulations. The first algorithm learns a Bayesian-optimal undirected probabilistic model over user-specified covariates (e.g., fluctuations, distances, angles, etc). L1 regularization is used to ensure sparse models and thus reduce the risk of over-fitting the data. The topology of the resulting model reveals important couplings between different parts of the protein, thus aiding in the analysis of molecular motions. The generative nature of the model makes it well-suited to making predictions about the global effects of local structural changes (e.g., the binding of an allosteric regulator). Additionally, the model can be used to sample new conformations. The second algorithm learns a time-varying graphical model where the topology and parameters change smoothly along the trajectory, revealing the conformational sub-states. The last algorithm learns a Markov Chain over undirected graphical models which can be used to study and simulate kinetics. We demonstrate our algorithms on multiple molecular dynamics trajectories.  相似文献   

13.
Drawing on various notions from theoretical computer science, we present a novel numerical approach, motivated by the notion of algorithmic probability, to the problem of approximating the Kolmogorov-Chaitin complexity of short strings. The method is an alternative to the traditional lossless compression algorithms, which it may complement, the two being serviceable for different string lengths. We provide a thorough analysis for all binary strings of length and for most strings of length by running all Turing machines with 5 states and 2 symbols ( with reduction techniques) using the most standard formalism of Turing machines, used in for example the Busy Beaver problem. We address the question of stability and error estimation, the sensitivity of the continued application of the method for wider coverage and better accuracy, and provide statistical evidence suggesting robustness. As with compression algorithms, this work promises to deliver a range of applications, and to provide insight into the question of complexity calculation of finite (and short) strings.Additional material can be found at the Algorithmic Nature Group website at http://www.algorithmicnature.org. An Online Algorithmic Complexity Calculator implementing this technique and making the data available to the research community is accessible at http://www.complexitycalculator.com.  相似文献   

14.
Ribonucleic Acid (RNA) structures can be viewed as a special kind of strings where characters in a string can bond with each other. The question of aligning two RNA structures has been studied for a while, and there are several successful algorithms that are based upon different models. In this paper, by adopting the model introduced in Wang and Zhang,(19) we propose two algorithms to attack the question of aligning multiple RNA structures. Our methods are to reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. Meanwhile, we will show that the framework of sequence center star alignment algorithm can be applied to the problem of multiple RNA structure alignment, and if the triangle inequality is met in the scoring matrix, the approximation ratio of the algorithm remains to be 2-2(over)n, where n is the total number of structures.  相似文献   

15.
Identifying the structure and dynamics of synaptic interactions between neurons is the first step to understanding neural network dynamics. The presence of synaptic connections is traditionally inferred through the use of targeted stimulation and paired recordings or by post-hoc histology. More recently, causal network inference algorithms have been proposed to deduce connectivity directly from electrophysiological signals, such as extracellularly recorded spiking activity. Usually, these algorithms have not been validated on a neurophysiological data set for which the actual circuitry is known. Recent work has shown that traditional network inference algorithms based on linear models typically fail to identify the correct coupling of a small central pattern generating circuit in the stomatogastric ganglion of the crab Cancer borealis. In this work, we show that point process models of observed spike trains can guide inference of relative connectivity estimates that match the known physiological connectivity of the central pattern generator up to a choice of threshold. We elucidate the necessary steps to derive faithful connectivity estimates from a model that incorporates the spike train nature of the data. We then apply the model to measure changes in the effective connectivity pattern in response to two pharmacological interventions, which affect both intrinsic neural dynamics and synaptic transmission. Our results provide the first successful application of a network inference algorithm to a circuit for which the actual physiological synapses between neurons are known. The point process methodology presented here generalizes well to larger networks and can describe the statistics of neural populations. In general we show that advanced statistical models allow for the characterization of effective network structure, deciphering underlying network dynamics and estimating information-processing capabilities.  相似文献   

16.
Inferring regulatory networks from experimental data via probabilistic graphical models is a popular framework to gain insights into biological systems. However, the inherent noise in experimental data coupled with a limited sample size reduces the performance of network reverse engineering. Prior knowledge from existing sources of biological information can address this low signal to noise problem by biasing the network inference towards biologically plausible network structures. Although integrating various sources of information is desirable, their heterogeneous nature makes this task challenging. We propose two computational methods to incorporate various information sources into a probabilistic consensus structure prior to be used in graphical model inference. Our first model, called Latent Factor Model (LFM), assumes a high degree of correlation among external information sources and reconstructs a hidden variable as a common source in a Bayesian manner. The second model, a Noisy-OR, picks up the strongest support for an interaction among information sources in a probabilistic fashion. Our extensive computational studies on KEGG signaling pathways as well as on gene expression data from breast cancer and yeast heat shock response reveal that both approaches can significantly enhance the reconstruction accuracy of Bayesian Networks compared to other competing methods as well as to the situation without any prior. Our framework allows for using diverse information sources, like pathway databases, GO terms and protein domain data, etc. and is flexible enough to integrate new sources, if available.  相似文献   

17.
MOTIVATION: Matching a biological sequence against a probabilistic pattern (or profile) is a common task in computational biology. A probabilistic profile, represented as a scoring matrix, is more suitable than a deterministic pattern to retain the peculiarities of a given segment of a family of biological sequences. Brute-force algorithms take O(NP) to match a sequence of N characters against a profile of length P < N. RESULTS: In this work, we exploit string compression techniques to speedup brute-force profile matching. We present two algorithms, based on run-length and LZ78 encodings, that reduce computational complexity by the compression factor of the encoding.  相似文献   

18.
Particle Swarm Optimization (PSO) is a stochastic optimization approach that originated from simulations of bird flocking, and that has been successfully used in many applications as an optimization tool. Estimation of distribution algorithms (EDAs) are a class of evolutionary algorithms which perform a two-step process: building a probabilistic model from which good solutions may be generated and then using this model to generate new individuals. Two distinct research trends that emerged in the past few years are the hybridization of PSO and EDA algorithms and the parallelization of EDAs to exploit the idea of exchanging the probabilistic model information. In this work, we propose the use of a cooperative PSO/EDA algorithm based on the exchange of heterogeneous probabilistic models. The model is heterogeneous because the cooperating PSO/EDA algorithms use different methods to sample the search space. Three different exchange approaches are tested and compared in this work. In all these approaches, the amount of information exchanged is adapted based on the performance of the two cooperating swarms. The performance of the cooperative model is compared to the existing state-of-the-art PSO cooperative approaches using a suite of well-known benchmark optimization functions.  相似文献   

19.
Dependence measures and tests for independence have recently attracted a lot of attention, because they are the cornerstone of algorithms for network inference in probabilistic graphical models. Pearson''s product moment correlation coefficient is still by far the most widely used statistic yet it is largely constrained to detecting linear relationships. In this work we provide an exact formula for the th nearest neighbor distance distribution of rank-transformed data. Based on that, we propose two novel tests for independence. An implementation of these tests, together with a general benchmark framework for independence testing, are freely available as a CRAN software package (http://cran.r-project.org/web/packages/knnIndep). In this paper we have benchmarked Pearson''s correlation, Hoeffding''s , dcor, Kraskov''s estimator for mutual information, maximal information criterion and our two tests. We conclude that no particular method is generally superior to all other methods. However, dcor and Hoeffding''s are the most powerful tests for many different types of dependence.  相似文献   

20.
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, both for the case of discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号