Similar articles
 Found 20 similar articles (search time: 109 ms)
1.
Many statistical methods have been developed to screen for differentially expressed genes associated with specific phenotypes in microarray data. However, it remains a major challenge to synthesize the observed expression patterns with abundant biological knowledge for a more complete understanding of the biological functions among genes. Various methods, including clustering analysis on genes, neural networks, Bayesian networks and pathway analysis, have been developed toward this goal. In most of these procedures, the activation and inhibition relationships among genes have hardly been utilized in the modeling steps. We propose two novel Bayesian models to integrate the microarray data with the putative pathway structures obtained from the KEGG database and the directional gene–gene interactions in the medical literature. We define the symmetric Kullback–Leibler divergence of a pathway, and use it to identify the pathway(s) most supported by the microarray data. A Markov chain Monte Carlo sampling algorithm is given for posterior computation in the hierarchical model. The proposed method is shown to select the most supported pathway in an illustrative example. Finally, we apply the methodology to a real microarray data set to understand the gene expression profile of the osteoblast lineage at defined stages of differentiation. We observe that our method correctly identifies the pathways that are reported to play essential roles in modulating bone mass.
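The abstract does not give the pathway-level definition of the symmetric Kullback–Leibler divergence, so the following is only a minimal sketch of the generic quantity for two discrete distributions, KL(p‖q) + KL(q‖p). The function names and the example distributions are assumptions for illustration and are not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


def symmetric_kl(p, q):
    """Symmetrised divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical example distributions (not data from the paper).
p_example = [0.5, 0.5]
q_example = [0.9, 0.1]
print(symmetric_kl(p_example, q_example))
```

Unlike the plain KL divergence, this quantity does not depend on the order of its arguments, which is convenient when ranking candidate pathways against the same data.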

2.
3.
Prediction of molecular interaction networks from large-scale datasets in genomics and other omics experiments is an important task in terms of both developing bioinformatics methods and solving biological problems. We have applied a kernel-based network inference method to extract genes functionally related to the response to nitrogen deprivation in the cyanobacterium Anabaena sp. PCC 7120, integrating three heterogeneous datasets: microarray data, phylogenetic profiles, and gene orders on the chromosome. We obtained 1348 predicted genes that are related in some way to known genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. While this dataset contained previously known genes related to the nitrogen deprivation condition, it also contained additional genes. Thus, we attempted to select relevant genes using the constraints of Pfam domains and NtcA-binding sites. We found candidate nitrogen metabolism-related genes, which are depicted as extensions of existing KEGG pathways. The prediction of functional relationships between proteins, rather than functions of individual proteins, will thus assist discovery from large-scale datasets.

4.
MOTIVATION: Our purpose is to develop a statistical modeling approach for cancer biomarker discovery and provide new insights into early cancer detection. We propose the concept of a dependence network, apply it to identify cancer biomarkers, and study the difference between the protein or gene samples from cancer and non-cancer subjects based on mass-spectrometry (MS) and microarray data. RESULTS: Three MS and two gene microarray datasets are studied. Clear differences are observed in the dependence networks for cancer and non-cancer samples. Protein/gene features are examined three at a time through an exhaustive search. Dependence networks are constructed by binding triples identified by the eigenvalue pattern of the dependence model, and are further compared to identify cancer biomarkers. Such dependence-network-based biomarkers show much greater consistency under 10-fold cross-validation than the classification-performance-based biomarkers. Furthermore, the biological relevance of the dependence-network-based biomarkers using microarray data is discussed. The proposed scheme is shown to be promising for cancer diagnosis and prediction. AVAILABILITY: See supplements: http://dsplab.eng.umd.edu/~genomics/dependencenetwork/

5.
Many bioinformatics problems can be tackled from a fresh angle offered by the network perspective. Directly inspired by metabolic network structural studies, we propose an improved gene clustering approach for inferring gene signaling pathways from gene microarray data. Based on the construction of co-expression networks that consist of both significantly linear and non-linear gene associations together with controlled biological and statistical significance, our approach tends to group functionally related genes into tight clusters despite their expression dissimilarities. We illustrate our approach and compare it to the traditional clustering approaches on a yeast galactose metabolism dataset and a retinal gene expression dataset. Our approach greatly outperforms the traditional approach in rediscovering the relatively well-known galactose metabolism pathway in yeast and in clustering genes of the photoreceptor differentiation pathway. AVAILABILITY: The clustering method has been implemented in an R package "GeneNT" that is freely available from: http://www.cran.org.

6.

Background

We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes.

Methodology

Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both the time and frequency domains to selected genes as well as to all genes to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed over 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence.

Conclusions

We use a totally data-driven approach to form biological hypotheses. Clustering using power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.
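The complex and partial Granger causality measures used in this work are not specified in the abstract. As a rough illustration of the underlying idea only, the sketch below computes ordinary pairwise Granger causality in the time domain with a single lag: the target series is regressed on its own past, then on its own past plus the source's past, and the statistic is the log ratio of the two residual sums of squares. The function name `granger_lag1`, the single-lag choice, and the simulated series are all assumptions made for this example.

```python
import math
import random


def _demean(series):
    """Subtract the mean so that no intercept term is needed."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def granger_lag1(source, target):
    """Return ln(RSS_restricted / RSS_full) for predicting `target`.

    Restricted model: target[t] ~ target[t-1]
    Full model:       target[t] ~ target[t-1] + source[t-1]
    Larger values suggest that the source's past helps predict the target.
    """
    x, y = _demean(source), _demean(target)
    y_now, y_prev, x_prev = y[1:], y[:-1], x[:-1]

    syy = sum(a * a for a in y_prev)
    sxx = sum(a * a for a in x_prev)
    sxy = sum(a * b for a, b in zip(x_prev, y_prev))
    sy = sum(a * b for a, b in zip(y_prev, y_now))
    sx = sum(a * b for a, b in zip(x_prev, y_now))
    det = syy * sxx - sxy * sxy
    if syy == 0 or det == 0:
        raise ValueError("regressors are degenerate")

    # Restricted model: one-regressor least squares.
    a_r = sy / syy
    rss_r = sum((t - a_r * p) ** 2 for t, p in zip(y_now, y_prev))

    # Full model: solve the 2x2 normal equations in closed form.
    a_f = (sy * sxx - sx * sxy) / det
    b_f = (sx * syy - sy * sxy) / det
    rss_f = sum((t - a_f * p - b_f * q) ** 2
                for t, p, q in zip(y_now, y_prev, x_prev))
    return math.inf if rss_f == 0 else math.log(rss_r / rss_f)


# Toy data (simulated, not from the paper): x drives y with a one-step delay.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]
print(granger_lag1(x, y), granger_lag1(y, x))
```

In this toy setting the statistic for x → y should be much larger than for y → x; a real analysis would use more lags, a significance test, and the conditioning on other genes that the partial variant provides.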

7.
8.
MOTIVATION: Experimental gene expression data sets, such as those generated by microarray or gene chip experiments, typically have significant noise and complicated interconnectivities that make understanding even simple regulatory patterns difficult. Given these complications, characterizing the effectiveness of different analysis techniques to uncover network groups and structures remains a challenge. Generating simulated expression patterns with known biological features of expression complexity, diversity and interconnectivities provides a more controlled means of investigating the appropriateness of different analysis methods. A simulation-based approach can systematically evaluate different gene expression analysis techniques and provide a basis for improved methods in dynamic metabolic network reconstruction. RESULTS: We have developed an on-line simulator, called eXPatGen, to generate dynamic gene expression patterns typical of microarray experiments. eXPatGen provides a quantitative network structure to represent key biological features, including the induction, repression, and cascade regulation of messenger RNA (mRNA). The simulation is modular such that the expression model can be replaced with other representations, depending on the level of biological detail required by the user. Two example gene networks, of 25 and 100 genes respectively, were simulated. Two standard analysis techniques, clustering and PCA, were performed on the resulting expression patterns in order to demonstrate how the simulator might be used to evaluate different analysis methods and provide experimental guidance for biological studies of gene expression. AVAILABILITY: http://www.che.udel.edu/eXPatGen/

9.
10.
11.
12.
Kim S, Imoto S, Miyano S. Bio Systems, 2004, 75(1-3): 57-65
We propose a dynamic Bayesian network and nonparametric regression model for constructing a gene network from time series microarray gene expression data. The proposed method can overcome a shortcoming of the Bayesian network model, namely its inability to represent cyclic regulations. The proposed method can analyze the microarray data as continuous data and can capture even nonlinear relations among genes. It can be expected that this model will give a deeper insight into complicated biological systems. We also derive a new criterion for evaluating an estimated network from a Bayesian approach. We conduct Monte Carlo experiments to examine the effectiveness of the proposed method. We also demonstrate the proposed method through the analysis of Saccharomyces cerevisiae gene expression data.

13.
In poplar, genetic research on wood properties is very important for the improvement of wood quality. Studies of wood formation genes at each developmental stage using modern biotechnology have often been limited to several genes or gene families. Because of the complex regulatory network involved in the co-expression and interactions of thousands of genes, however, the genetic mechanisms of wood formation must be surveyed on a genome-wide scale. In this study, we identified wood formation-related genes using a differentially co-expressed (DCE) gene subset approach based on biological networks inferred from microarray data. Gene co-expression networks in leaf, root, and wood tissues were first constructed and topologically analyzed using microarray data collected from the Gene Expression Omnibus. The DCE gene modules in wood-forming tissue were then detected based on graph theory, which was followed by gene ontology (GO) enrichment analysis and GO annotation of probe sets. Finally, 72 probe sets were identified in the largest cohesive subgroup of the DCE gene network in wood tissue, with most of the probe sets associated with wood formation-related biological processes and GO cellular component categories. The approach described in this paper provides an effective strategy to identify wood formation genes in poplar and should contribute to a better understanding of the genetic and molecular mechanisms underlying wood properties in trees.

14.
MOTIVATION: The immune response to bacterial infection represents a complex network of dynamic gene and protein interactions. We present an optimized reverse engineering strategy aimed at reconstructing this kind of interaction network. The proposed approach is based on both microarray data and available biological knowledge. RESULTS: The main kinetics of the immune response were identified by fuzzy clustering of gene expression profiles (time series). The number of clusters was optimized using various evaluation criteria. For each cluster, a representative gene with a high fuzzy membership was chosen in accordance with available physiological knowledge. Hypothetical network structures were then identified by seeking systems of ordinary differential equations whose simulated kinetics could fit the gene expression profiles of the cluster-representative genes. For the construction of hypothetical network structures, singular value decomposition (SVD) based methods and a newly introduced heuristic Network Generation Method were compared. The proposed novel method found sparser networks and gave better fits to the experimental data. CONTACT: Reinhard.Guthke@hki-jena.de.

15.
Gene regulatory networks are a crucial aspect of systems biology in describing molecular mechanisms of the cell. Various computational models rely on random gene selection to infer such networks from microarray data. While incorporation of prior knowledge into data analysis has been deemed important, in practice, it has generally been limited to referencing genes in probe sets and using curated knowledge bases. We investigate the impact of augmenting microarray data with semantic relations automatically extracted from the literature, with the view that relations encoding gene/protein interactions eliminate the need for random selection of components in non-exhaustive approaches, producing a more accurate model of cellular behavior. A genetic algorithm is then used to optimize the strength of interactions using microarray data and an artificial neural network fitness function. The result is a directed and weighted network providing the individual contribution of each gene to its target. For testing, we used invasive ductal carcinoma of the breast to query the literature and a microarray set containing gene expression changes in these cells over several time points. Our model demonstrates significantly better fitness than the state-of-the-art model, which relies on an initial random selection of genes. Comparison to the component pathways of the KEGG Pathways in Cancer map reveals that the resulting networks contain both known and novel relationships. The p53 pathway results were manually validated in the literature. 60% of non-KEGG relationships were supported (74% for highly weighted interactions). The method was then applied to yeast data and our model again outperformed the comparison model. Our results demonstrate the advantage of combining gene interactions extracted from the literature in the form of semantic relations with microarray analysis in generating contribution-weighted gene regulatory networks. This methodology can make a significant contribution to understanding the complex interactions involved in cellular behavior and molecular physiology.

16.
MOTIVATION: A promising and reliable approach to annotating gene function is to cluster genes using not only gene expression data but also literature information, especially gene networks. RESULTS: We present a systematic method for gene clustering that combines these two entirely different types of data, focusing in particular on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field, in which the relation between hidden variables directly represents a given gene network. Our learning algorithm, which minimizes an energy function that takes the network modularity into account, is time-efficient in practice even though it uses a global network property. We evaluated our method using a metabolic network and microarray expression data, varying the microarray datasets, the parameters of our model and the gold standard clusters. Experimental results showed that our method outperformed four other competing methods, including k-means and existing graph partitioning methods, with statistically significant differences in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster corresponding to the folate metabolic pathway, while the other methods could not. From these results, we can say that our method is highly effective for gene clustering and annotating gene function.
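The abstract does not state how network modularity enters the energy function. For orientation, the sketch below computes the standard Newman modularity Q of a given partition of an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The function name, the edge list, and the community labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy network: two triangles joined by a single bridge edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
toy_partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(toy_edges, toy_partition))
```

Placing every node in a single community gives Q = 0, so positive values indicate more within-community edges than expected under random wiring with the same degrees.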

17.
In this paper, we investigate the reverse engineering of gene regulatory networks from time-series microarray data. We apply dynamic Bayesian networks (DBNs) for modeling cell cycle regulations. In developing a network inference algorithm, we focus on soft solutions that can provide the a posteriori probability (APP) of the network topology. In particular, we propose a variational Bayesian structural expectation maximization (VBSEM) algorithm that can learn the posterior distribution of the network model parameters and topology jointly. We also show how the obtained APPs of the network topology can be used in a Bayesian data integration strategy to integrate two different microarray data sets. The proposed VBSEM algorithm has been tested on yeast cell cycle data sets. To evaluate the confidence of the inferred networks, we apply a moving block bootstrap method. The inferred network is validated by comparing it to the KEGG pathway map.
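The moving block bootstrap mentioned above resamples a time series in overlapping contiguous blocks so that short-range temporal dependence is preserved within each block. The sketch below shows the generic resampling step only; the block length, the function name, and the example series are assumptions, and the abstract does not say how the resamples are fed back into the VBSEM algorithm.

```python
import random


def moving_block_bootstrap(series, block_len, rng=None):
    """Return one bootstrap resample of `series` with the same length.

    Blocks of `block_len` consecutive observations are drawn with replacement
    from all overlapping blocks and concatenated, then trimmed to len(series).
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_blocks = -(-n // block_len)  # ceiling division
    resample = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)
        resample.extend(series[start:start + block_len])
    return resample[:n]


# Hypothetical example: 20 time points, blocks of 4 consecutive points.
example_series = list(range(20))
example_resample = moving_block_bootstrap(example_series, 4, random.Random(1))
print(example_resample)
```

Repeating the inference on many such resamples and counting how often each edge reappears is one common way to attach a confidence level to an inferred network.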

18.
Gene co-expression, in many cases, implies the presence of a functional linkage between genes. Co-expression analysis has uncovered gene regulatory mechanisms in model organisms such as Escherichia coli and yeast. Recently, accumulation of Arabidopsis microarray data has facilitated a genome-wide inspection of gene co-expression profiles in this model plant. An approach using network analysis has provided an intuitive way to represent complex co-expression patterns between many genes. Co-expression network analysis has enabled us to extract modules, or groups of tightly co-expressed genes, associated with biological processes. Furthermore, integrated analysis of gene expression and metabolite accumulation has allowed us to hypothesize the functions of genes associated with specific metabolic processes. Co-expression network analysis is a powerful approach for data-driven hypothesis construction and gene prioritization, and provides novel insights into the system-level understanding of plant cellular processes.
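One simple way to build a co-expression network of the kind described here is to connect two genes whenever the absolute Pearson correlation of their expression profiles exceeds a cutoff. The sketch below illustrates that idea; the 0.9 threshold, the gene names, and the expression values are made up for the example and do not come from the article.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression profiles across five conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
toy_network = coexpression_edges(toy_expression)
print(toy_network)
```

Modules can then be extracted from the resulting edge list with any graph clustering method; the choice of threshold strongly affects network density and is usually tuned per dataset.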

19.
20.
An efficient two-step Markov blanket method for modeling and inferring complex regulatory networks from large-scale microarray data sets is presented. The inferred gene regulatory network (GRN) is based on time series gene expression data capturing the underlying gene interactions. To construct a highly accurate GRN, the proposed method performs: 1) discovery of a gene's Markov Blanket (MB), 2) formulation of a flexible measure to determine the network's quality, 3) efficient searching with the aid of a guided genetic algorithm, and 4) pruning to obtain a minimal set of correct interactions. Investigations are carried out using both synthetic and yeast cell cycle gene expression data sets. The realistic synthetic data sets validate the robustness of the method by varying topology, sample size, time delay, noise, vertex in-degree, and the presence of hidden nodes. It is shown that the proposed approach has excellent inferential capabilities and high accuracy even in the presence of noise. The gene network inferred from yeast cell cycle data is investigated for its biological relevance using well-known interactions, sequence analysis, motif patterns, and GO data. Further, novel interactions are predicted for the unknown genes of the network and their influence on other genes is also discussed.
