Found 20 similar articles (search time: 156 ms)
1.
2.
3.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is totally unknown in the beginning, and all the edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs. For example, when a similar network inference problem must be solved for other organisms, the dataset excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, where the inference of edges boils down to estimation of missing entries of a kernel matrix. Then, an expectation-maximization algorithm is proposed to simultaneously infer the missing entries of the kernel matrix and the weights of multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy datasets. Our approach is tested favorably on two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.
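The dataset-weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' expectation-maximization algorithm: it only shows how one kernel matrix per dataset might be combined as a weighted sum, so that a weight of zero excludes a dataset entirely. The function name `combine_kernels`, the two toy matrices, and the weights are all hypothetical.

```python
def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices.

    kernels: list of n x n matrices (lists of lists of floats), one per dataset.
    weights: list of non-negative floats, one per dataset.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Toy similarity matrices over three proteins (made-up numbers).
K_expression = [[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]]
K_noisy = [[1.0, 0.0, 0.9],
           [0.0, 1.0, 0.0],
           [0.9, 0.0, 1.0]]

# A zero weight removes the second dataset from the integrated kernel.
K_integrated = combine_kernels([K_expression, K_noisy], [1.0, 0.0])
```

In the method described above the weights are estimated jointly with the missing kernel entries; here they are fixed by hand purely for illustration.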
4.
MOTIVATION: Many biomedical and clinical research problems involve discovering causal relationships between observations gathered from temporal events. Dynamic Bayesian networks are a powerful modeling approach to describe causal or apparently causal relationships, and support complex medical inference, such as future response prediction, automated learning, and rational decision making. Although many engines exist for creating Bayesian networks, most require a local installation and significant data manipulation to be practical for a general biologist or clinician. No software pipeline currently exists for interpretation and inference of dynamic Bayesian networks learned from biomedical and clinical data. RESULTS: miniTUBA is a web-based modeling system that allows clinical and biomedical researchers to perform complex medical/clinical inference and prediction using dynamic Bayesian network analysis with temporal datasets. The software allows users to choose different analysis parameters (e.g. Markov lags and prior topology), and continuously update their data and refine their results. miniTUBA can make temporal predictions to suggest interventions based on an automated learning process pipeline using all data provided. Preliminary tests using synthetic data and laboratory research data indicate that miniTUBA accurately identifies regulatory network structures from temporal data. AVAILABILITY: miniTUBA is available at http://www.minituba.org.
5.
6.
Many mechanisms of neural processing rely critically upon the synaptic connectivity between neurons. As our ability to simultaneously record from large populations of neurons expands, the ability to infer network connectivity from this data has become a major goal of computational neuroscience. To address this issue, we employed several different methods to infer synaptic connections from simulated spike data from a realistic local cortical network model. This approach allowed us to directly compare the accuracy of different methods in predicting synaptic connectivity. We compared the performance of model-free (coherence measure and transfer entropy) and model-based (coupled escape rate model) methods of connectivity inference, applying those methods to the simulated spike data from the model networks with different network topologies. Our results indicate that the accuracy of the inferred connectivity was higher for highly clustered, near regular, or small-world networks, while accuracy was lower for random networks, irrespective of which analysis method was employed. Among the employed methods, the model-based method performed best. This model performed with higher accuracy, was less sensitive to threshold changes, and required less data to make an accurate assessment of connectivity. Given that cortical connectivity tends to be highly clustered, our results outline a powerful analytical tool for inferring local synaptic connectivity from observations of spontaneous activity.
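Transfer entropy, one of the model-free measures named above, can be sketched for two binary spike trains as follows. This is a plain plug-in estimator with a history length of one time bin, written under the assumption that spikes are already binned into 0/1 sequences; it is not the pipeline used in the study, and the function name and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate, in bits, of transfer entropy from source to target.

    Uses a history length of one bin:
    TE = sum p(y_next, y, x) * log2( p(y_next | y, x) / p(y_next | y) ).
    """
    if len(source) != len(target):
        raise ValueError("spike trains must have equal length")
    n = len(target) - 1
    triples = Counter()       # (target[t+1], target[t], source[t])
    past_pairs = Counter()    # (target[t], source[t])
    target_pairs = Counter()  # (target[t+1], target[t])
    target_past = Counter()   # target[t]
    for t in range(n):
        triples[(target[t + 1], target[t], source[t])] += 1
        past_pairs[(target[t], source[t])] += 1
        target_pairs[(target[t + 1], target[t])] += 1
        target_past[target[t]] += 1
    te = 0.0
    for (y_next, y, x), count in triples.items():
        p_joint = count / n
        p_given_both = count / past_pairs[(y, x)]
        p_given_own_past = target_pairs[(y_next, y)] / target_past[y]
        te += p_joint * math.log2(p_given_both / p_given_own_past)
    return te


# Toy data: neuron y repeats neuron x with a delay of one bin.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y)  # large: x drives y
te_y_to_x = transfer_entropy(y, x)  # near zero: y carries no information about future x
```

The asymmetry between the two directions is what lets the measure suggest a directed connection; on real recordings a threshold on the estimate would still have to be chosen.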
7.
8.
9.
Background
All currently available methods of network/association inference from microarray gene expression measurements implicitly assume that such measurements represent the actual expression levels of different genes within each cell included in the biological sample under study. Contrary to this common belief, modern microarray technology produces signals aggregated over a random number of individual cells, a "nitty-gritty" aspect of such arrays, thereby causing a random effect that distorts the correlation structure of intra-cellular gene expression levels.
10.
11.
12.
13.
Background
Inferring Gene Regulatory Networks (GRNs) from time course microarray data suffers from the dimensionality problem created by the short length of available time series compared to the large number of genes in the network. To overcome this, data integration from diverse sources is mandatory. Microarray data from different sources and platforms are publicly available, but integration is not straightforward, due to platform and experimental differences.
Methods
We analyse here different normalisation approaches for microarray data integration, in the context of reverse engineering of GRN quantitative models. We introduce two preprocessing approaches based on existing normalisation techniques and provide a comprehensive comparison of normalised datasets.
Conclusions
Results identify a method based on a combination of Loess normalisation and iterative K-means as best for time series normalisation for this problem.
14.
Dilated cardiomyopathy (DCM) is a leading cause of heart failure (HF) and cardiac transplantations in Western countries. Single-source gene expression analysis studies have identified potential disease biomarkers and drug targets. However, because of the diversity of experimental settings and relative lack of data, concerns have been raised about the robustness and reproducibility of the predictions. This study presents the identification of robust and reproducible DCM signature genes based on the integration of several independent data sets and functional network information. Gene expression profiles from three public data sets containing DCM and non-DCM samples were integrated and analyzed, which allowed the implementation of clinical diagnostic models. Differentially expressed genes were evaluated in the context of a global protein–protein interaction network, constructed as part of this study. Potential associations with HF were identified by searching the scientific literature. From these analyses, classification models were built and their effectiveness in differentiating between DCM and non-DCM samples was estimated. The main outcome was a set of integrated, potentially novel DCM signature genes, which may be used as reliable disease biomarkers. An empirical demonstration of the power of the integrative classification models against single-source models is also given.
15.
Background
Recent analysis of the yeast gene network shows that most genes have few inputs, indicating that enumerative gene reconstruction methods are both useful and computationally feasible. A simple enumerative reconstruction method based on a discrete dynamical system model is used to study how microarray experiments involving modulated global perturbations can be designed to obtain reasonably accurate reconstructions. The method is tested on artificial gene networks with biologically realistic in/out degree characteristics.
16.
17.
18.
Ram R, Chetty M. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, 8(2):353-367
An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray data sets is presented. The inferred gene regulatory network (GRN) is based on the time series gene expression data capturing the underlying gene interactions. For constructing a highly accurate GRN, the proposed method performs: 1) discovery of a gene's Markov Blanket (MB), 2) formulation of a flexible measure to determine the network's quality, 3) efficient searching with the aid of a guided genetic algorithm, and 4) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell cycle gene expression data sets. The realistic synthetic data sets validate the robustness of the method by varying topology, sample size, time delay, noise, vertex in-degree, and the presence of hidden nodes. It is shown that the proposed approach has excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network and their influence on other genes is also discussed.
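For readers unfamiliar with the term, the Markov blanket of a node in a directed acyclic graph consists of its parents, its children, and the other parents of its children. The sketch below reads the blanket off a network whose structure is already known; it does not reproduce the data-driven discovery step of the method described above. The function name and the five-gene toy network are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a directed acyclic graph.

    parents: dict mapping every node to the set of its parent nodes.
    The blanket is the union of the node's parents, its children,
    and the other parents of those children.
    """
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = {p for child in children for p in parents[child]}
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)  # a node is not part of its own blanket
    return blanket


# Toy regulatory network: g1 -> g3, g3 -> g4, g2 -> g4, g4 -> g5.
toy_network = {
    "g1": set(),
    "g2": set(),
    "g3": {"g1"},
    "g4": {"g2", "g3"},
    "g5": {"g4"},
}

blanket_g3 = markov_blanket(toy_network, "g3")
```

Here `blanket_g3` contains g1 (parent), g4 (child) and g2 (co-parent of g4), while g5 is left out because it is shielded from g3 by g4.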
19.
20.
The amount of data produced by molecular biologists is growing at an exponential rate. Some of the fastest growing sets of data are measurements of gene expression, comparable in quantity only to gene sequences and the vast biological literature. Both gene expression data and sequence data offer hints as to the functions of thousands of newly discovered genes, but neither gives complete answers. Therefore, much effort is being focused on integrating these large data sets and combining them with all available functional data to draw inferences about the functions of uncharacterised genes. This review discusses the most pertinent functional data for genome-wide functional inference and describes several methods by which these disparate data types are being integrated.