Similar articles
Found 20 similar articles (search time: 15 ms)
1.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems such as gene knock-out experiments, followed by a genome-wide profiling of differential gene expressions. However, this approach is significantly limited in that it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions or to distinguish between direct and indirect downstream regulations of the differentially-expressed genes. As an alternative, genetical genomics study has been proposed to treat naturally-occurring genetic variants as potential perturbants of gene regulatory system and to recover gene networks via analysis of population gene-expression and genotype data. Despite many advantages of genetical genomics data analysis, the computational challenge that the effects of multifactorial genetic perturbations should be decoded simultaneously from data has prevented a widespread application of genetical genomics analysis. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of gene regulatory system by a large number of SNPs to identify a gene network along with expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of gene network, by performing inference on the probabilistic graphical model, we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets. 
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results from experimental perturbation studies related to the DNA replication stress response.
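The abstract above does not state the model's formula. As a hedged sketch, one parameterization commonly used for sparse conditional Gaussian graphical models is given below; the symbols (x for the SNP genotype vector, y for the expression vector, Θ_yy for the gene network, Θ_xy for direct SNP-to-gene effects, λ1 and λ2 for penalty weights) are assumptions introduced here for illustration and may differ from the article's notation.

```latex
% Assumed notation: x = SNP genotypes, y = gene expression levels.
% Conditional density: Theta_yy encodes the gene network,
% Theta_xy encodes direct SNP perturbations of genes.
p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\!\left(-\tfrac{1}{2}\,\mathbf{y}^{\top}\Theta_{yy}\,\mathbf{y}
              - \mathbf{x}^{\top}\Theta_{xy}\,\mathbf{y}\right)

% Sparse estimation: L1-penalized negative conditional log-likelihood.
\min_{\Theta_{yy}\succ 0,\;\Theta_{xy}}\;
  -\sum_{i=1}^{n}\log p(\mathbf{y}_i\mid\mathbf{x}_i)
  + \lambda_1\lVert\Theta_{xy}\rVert_1
  + \lambda_2\lVert\Theta_{yy}\rVert_1

% Inference: the conditional mean shows how direct effects
% propagate through the network to perturb other genes indirectly.
\mathbb{E}[\mathbf{y}\mid\mathbf{x}]
  = -\Theta_{yy}^{-1}\,\Theta_{xy}^{\top}\,\mathbf{x}
```

Under this parameterization, a zero entry in Θ_xy means no direct perturbation of that gene by that SNP, while the corresponding entry of the product in the last line can still be nonzero, which is the indirect effect the abstract describes.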

2.
3.
To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but also may save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.

4.
Genetic and pharmacological perturbation experiments, such as deleting a gene and monitoring gene expression responses, are powerful tools for studying cellular signal transduction pathways. However, it remains a challenge to automatically derive knowledge of a cellular signaling system at a conceptual level from systematic perturbation-response data. In this study, we explored a framework that unifies knowledge mining and data mining toward this goal. The framework consists of the following automated processes: 1) applying an ontology-driven knowledge mining approach to identify functional modules among the genes responding to a perturbation in order to reveal potential signals affected by the perturbation; 2) applying a graph-based data mining approach to search for perturbations that affect a common signal; and 3) revealing the architecture of a signaling system by organizing signaling units into a hierarchy based on their relationships. Applying this framework to a compendium of yeast perturbation-response data, we have successfully recovered many well-known signal transduction pathways; in addition, our analysis has led to many new hypotheses regarding the yeast signal transduction system; finally, our analysis automatically organized perturbed genes as a graph reflecting the architecture of the yeast signaling system. Importantly, this framework transformed molecular findings from a gene level to a conceptual level, which can be readily translated into computable knowledge in the form of rules regarding the yeast signaling system, such as “if genes involved in the MAPK signaling are perturbed, genes involved in pheromone responses will be differentially expressed.”

5.

Background  

The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
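Of the four methods compared above, Fisher's exact test is the simplest to write down. The following is a minimal sketch of the one-sided (over-representation) form, computed as a hypergeometric tail; the function name, argument names, and example counts are hypothetical and are not taken from the paper.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for gene-set over-representation.

    N: total number of genes measured
    K: number of measured genes that belong to the gene set
    n: number of differentially expressed genes
    k: number of differentially expressed genes that belong to the set
    Returns P(X >= k) for X drawn from Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for overlaps of size k or larger.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 8 of 100 differentially expressed genes fall in a
# 40-gene set, out of 10000 genes measured.
print(fisher_enrichment_p(k=8, n=100, K=40, N=10000))
```

This test only uses a thresholded list of differentially expressed genes and ignores the magnitude of each gene's change, which is one commonly cited reason it can lose power relative to rank-based methods such as GSEA.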

6.
Ephrins and semaphorins regulate a wide variety of developmental processes, including axon guidance and cell migration. We have studied the roles of the ephrin EFN-4 and the semaphorin MAB-20 in patterning cell-cell contacts among the cells that give rise to the ray sensory organs of Caenorhabditis elegans. In wild-type, contacts at adherens junctions form only between cells belonging to the same ray. In efn-4 and mab-20 mutants, ectopic contacts form between cells belonging to different rays. Ectopic contacts also occur in mutants in regulatory genes that specify ray morphological identity. We used efn-4 and mab-20 reporters to investigate whether these ray identity genes function through activating expression of efn-4 or mab-20 in ray cells. mab-20 reporter expression in ray cells was unaffected by mutants in the Pax6 homolog mab-18 and the Hox genes egl-5 and mab-5, suggesting that these genes do not regulate mab-20 expression. We find that mab-18 is necessary for activating efn-4 reporter expression, but this activity alone is not sufficient to account for mab-18 function in controlling cell-cell contact formation. In egl-5 mutants, efn-4 reporter expression in certain ray cells was increased, inconsistent with a simple repulsion model for efn-4 action. The evidence indicates that ray identity genes primarily regulate ray morphogenesis by pathways other than through regulation of expression of semaphorin and ephrin.

7.
Jung KH  Lee J  Dardick C  Seo YS  Cao P  Canlas P  Phetsom J  Xu X  Ouyang S  An K  Cho YJ  Lee GC  Lee Y  An G  Ronald PC 《PLoS genetics》2008,4(8):e1000164
Functional redundancy limits detailed analysis of genes in many organisms. Here, we report a method to efficiently overcome this obstacle by combining gene expression data with analysis of gene-indexed mutants. Using a rice NSF45K oligo-microarray to compare 2-week-old light- and dark-grown rice leaf tissue, we identified 365 genes that showed significant 8-fold or greater induction in the light relative to dark conditions. We then screened collections of rice T-DNA insertional mutants to identify rice lines with mutations in the strongly light-induced genes. From this analysis, we identified 74 different lines comprising two independent mutant lines for each of 37 light-induced genes. This list was further refined by mining gene expression data to exclude genes that had potential functional redundancy due to co-expressed family members (12 genes) and genes that had inconsistent light responses across other publicly available microarray datasets (five genes). We next characterized the phenotypes of rice lines carrying mutations in ten of the remaining candidate genes and then carried out co-expression analysis associated with these genes. This analysis effectively provided candidate functions for two genes of previously unknown function and for one gene not directly linked to the tested biochemical pathways. These data demonstrate the efficiency of combining gene family-based expression profiles with analyses of insertional mutants to identify novel genes and their functions, even among members of multi-gene families.

8.
9.
Global gene expression profiling has emerged as a major tool in understanding complex response patterns of biological systems to perturbations. However, a lack of unbiased analytical approaches has restricted the utility of complex microarray data to gain novel system level insights. Here we report a strategy, express path analysis (EPA), that helps to establish various pathways differentially recruited to achieve specific cellular responses under contrasting environmental conditions in an unbiased manner. The analysis superimposes differentially regulated genes between contrasting environments onto the network of functional protein associations followed by a series of iterative enrichments and network analysis. To test the utility of the approach, we infected THP1 macrophage cells with a virulent Mycobacterium tuberculosis strain (H37Rv) or the attenuated non-virulent strain H37Ra as contrasting perturbations and generated the temporal global expression profiles. EPA of the results provided details of response-specific and time-dependent host molecular network perturbations. Further analysis identified tyrosine kinase Src as the major regulatory hub discriminating the responses between wild-type and attenuated Mtb infection. We were then able to verify this novel role of Src experimentally and show that Src executes its role through regulating two vital antimicrobial processes of the host cells (i.e., autophagy and acidification of the phagolysosome). These results bear significant potential for developing novel anti-tuberculosis therapy. We propose that EPA could prove extremely useful in understanding complex cellular responses for a variety of perturbations, including pathogenic infections.

10.
Aylor DL  Zeng ZB 《PLoS genetics》2008,4(3):e1000029
Gene expression data has been used in lieu of phenotype in both classical and quantitative genetic settings. These two disciplines have separate approaches to measuring and interpreting epistasis, which is the interaction between alleles at different loci. We propose a framework for estimating and interpreting epistasis from a classical experiment that combines the strengths of each approach. A regression analysis step accommodates the quantitative nature of expression measurements by estimating the effect of gene deletions plus any interaction. Effects are selected by significance such that a reduced model describes each expression trait. We show how the resulting models correspond to specific hierarchical relationships between two regulator genes and a target gene. These relationships are the basic units of genetic pathways and genomic system diagrams. Our approach can be extended to analyze data from a variety of experiments, multiple loci, and multiple environments.
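The regression step described above can be illustrated with a two-gene deletion design. The sketch below assumes a model of the form y = b0 + bA·a + bB·b + bAB·a·b, where a and b are 0/1 indicators for deletion of regulators A and B; with all four genotype classes observed, the least-squares estimates reduce to differences of group means. The function name and the numbers are hypothetical, and the significance-based selection of effects mentioned in the abstract is not shown.

```python
from statistics import mean


def epistasis_effects(wt, del_a, del_b, del_ab):
    """Least-squares estimates for y = b0 + bA*a + bB*b + bAB*a*b.

    Each argument is a list of expression measurements of one target gene in
    one genotype class: wild type, deletion of A, deletion of B, and the
    double deletion. For this saturated 2x2 design the estimates are simple
    contrasts of the four group means.
    """
    m_wt, m_a, m_b, m_ab = mean(wt), mean(del_a), mean(del_b), mean(del_ab)
    return {
        "baseline": m_wt,
        "effect_a": m_a - m_wt,
        "effect_b": m_b - m_wt,
        # Nonzero interaction means the double mutant is not the sum of the
        # two single-deletion effects, i.e. epistasis on this scale.
        "interaction": m_ab - m_a - m_b + m_wt,
    }


# Hypothetical log-expression values for one target gene.
effects = epistasis_effects(
    wt=[10.0, 10.2], del_a=[4.1, 3.9], del_b=[9.9, 10.1], del_ab=[4.0, 4.0]
)
print(effects)
```

Which terms survive selection then suggests a relationship between the regulators: for example, if only the effect of deleting A is retained, the target appears to depend on A but not on B under the tested condition.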

11.
In complex diseases, various combinations of genomic perturbations often lead to the same phenotype. On a molecular level, combinations of genomic perturbations are assumed to dysregulate the same cellular pathways. Such a pathway-centric perspective is fundamental to understanding the mechanisms of complex diseases and the identification of potential drug targets. In order to provide an integrated perspective on complex disease mechanisms, we developed a novel computational method to simultaneously identify causal genes and dysregulated pathways. First, we identified a representative set of genes that are differentially expressed in cancer compared to non-tumor control cases. Assuming that disease-associated gene expression changes are caused by genomic alterations, we determined potential paths from such genomic causes to target genes through a network of molecular interactions. Applying our method to sets of genomic alterations and gene expression profiles of 158 Glioblastoma multiforme (GBM) patients we uncovered candidate causal genes and causal paths that are potentially responsible for the altered expression of disease genes. We discovered a set of putative causal genes that potentially play a role in the disease. Combining an expression Quantitative Trait Loci (eQTL) analysis with pathway information, our approach allowed us not only to identify potential causal genes but also to find intermediate nodes and pathways mediating the information flow between causal and target genes. Our results indicate that different genomic perturbations indeed dysregulate the same functional pathways, supporting a pathway-centric perspective of cancer. While copy number alterations and gene expression data of glioblastoma patients provided opportunities to test our approach, our method can be applied to any disease system where genetic variations play a fundamental causal role.

12.
13.
We develop a statistical framework to study the relationship between chromatin features and gene expression. This can be used to predict gene expression of protein coding genes, as well as microRNAs. We demonstrate the prediction in a variety of contexts, focusing particularly on the modENCODE worm datasets. Moreover, our framework reveals the positional contribution around genes (upstream or downstream) of distinct chromatin features to the overall prediction of expression levels.

14.
Wessel J  Zapala MA  Schork NJ 《Genomics》2007,90(1):132-142
The availability of high-throughput genotyping technologies and microarray assays has allowed researchers to consider pursuing investigations whose ultimate goal is the identification of genetic variations that influence levels of gene expression, e.g., "expression quantitative trait locus" or "eQTL" mapping studies. However, the large number of genes whose expression levels can be tested for association with genetic variations in such studies can create both statistical and biological interpretive problems. We consider the integrated analysis of eQTL mapping data that incorporates pathway, function, and disease process information. The goal of this analysis is to determine if compelling patterns emerge from the data that are consistent with the notion that perturbations in the molecular physiologic environment induced by genetic variations implicate the expression patterns of multiple genes via genetic network relationships or feedback mechanisms. We apply available genetic network and pathway analysis software, as well as a novel regression analysis technique, to carry out the proposed studies. We also consider extensions of the proposed strategies and areas of future research.

15.
Tran LM  Rizk ML  Liao JC 《Biophysical journal》2008,95(12):5606-5617
Complete modeling of metabolic networks is desirable, but it is difficult to accomplish because of the lack of kinetics. As a step toward this goal, we have developed an approach to build an ensemble of dynamic models that reach the same steady state. The models in the ensemble are based on the same mechanistic framework at the elementary reaction level, including known regulations, and span the space of all kinetics allowable by thermodynamics. This ensemble allows for the examination of possible phenotypes of the network upon perturbations, such as changes in enzyme expression levels. The size of the ensemble is reduced by acquiring data for such perturbation phenotypes. If the mechanistic framework is approximately accurate, the ensemble converges to a smaller set of models and becomes more predictive. This approach bypasses the need for detailed characterization of kinetic parameters and arrives at a set of models that describes relevant phenotypes upon enzyme perturbations.

16.
MOTIVATION: Gene expression data have become an instrumental resource in describing the molecular state associated with various cellular phenotypes and responses to environmental perturbations. The utility of expression profiling has been demonstrated in partitioning clinical states, predicting the class of unknown samples and in assigning putative functional roles to previously uncharacterized genes based on profile similarity. However, gene expression profiling has had only limited success in identifying therapeutic targets. This is partly due to the fact that current methods based on fold-change focus only on single genes in isolation, and thus cannot convey causal information. In this paper, we present a technique for analysis of expression data in a graph-theoretic framework that relies on associations between genes. We describe the global organization of these networks and biological correlates of their structure. We go on to present a novel technique for the molecular characterization of disparate cellular states that adds a new dimension to the fold-based methods and conclude with an example application to a human medulloblastoma dataset. RESULTS: We have shown that expression networks generated from large model-organism expression datasets are scale-free and that the average clustering coefficient of these networks is several orders of magnitude higher than would be expected for similarly sized scale-free networks, suggesting an inherent hierarchical modularity similar to that previously identified in other biological networks. Furthermore, we have shown that these properties are robust with respect to the parameters of network construction. We have demonstrated an enrichment of genes having lethal knockout phenotypes in the high-degree (i.e., hub) nodes in networks generated from aggregate condition datasets; using process-focused Saccharomyces cerevisiae datasets we have demonstrated additional high-degree enrichments of condition-specific genes encoding proteins known to be involved in or important for the processes interrogated by the microarrays. These results demonstrate the utility of network analysis applied to expression data in identifying genes that are regulated in a state-specific manner. We concluded by showing that a sample application to a human clinical dataset prominently identified a known therapeutic target. AVAILABILITY: Software implementing the methods for network generation presented in this paper is available for academic use by request from the authors in the form of compiled Linux binary executables.
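The average clustering coefficient mentioned in the results can be computed directly from an association network. The sketch below is illustrative only: it assumes genes are linked when the absolute Pearson correlation of their expression profiles reaches a threshold, which may differ from the construction used in the paper, and every function name, the threshold value, and the toy profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_network(profiles, threshold=0.9):
    """Link two genes when |correlation| >= threshold (assumed rule)."""
    adj = {gene: set() for gene in profiles}
    for g, h in combinations(profiles, 2):
        if abs(pearson(profiles[g], profiles[h])) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def average_clustering(adj):
    """Mean over nodes of (links among neighbors) / (possible such links)."""
    coeffs = []
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            coeffs.append(0.0)  # convention: undefined cases count as zero
            continue
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


# Toy profiles: genes a, b, c are mutually (anti-)correlated; d is not.
profiles = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
    "d": [1.0, 3.0, 1.0, 3.0],
}
network = coexpression_network(profiles)
print({gene: len(nbrs) for gene, nbrs in network.items()})  # node degrees
print(average_clustering(network))
```

Node degree in the same adjacency structure is what identifies hub genes, so the degree counts printed above correspond to the quantity the abstract relates to lethal knockout phenotypes.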

17.
18.
19.
20.