Similar Articles
20 similar articles found
1.
Hokeun Sun, Hongzhe Li. Biometrics, 2012, 68(4): 1197-1206
Summary: Gaussian graphical models have been widely used as an effective method for studying the conditional independence structure among genes and for constructing genetic networks. However, gene expression data typically have heavier tails or more outlying observations than the standard Gaussian distribution. Such outliers can lead to wrong inference about the dependency structure among the genes. We propose an ℓ1-penalized estimation procedure for sparse Gaussian graphical models that is robust against possible outliers. The likelihood function is weighted according to how far each observation deviates, where the deviation of an observation is measured by its own likelihood. An efficient computational algorithm based on the coordinate gradient descent method is developed to obtain the minimizer of the negative penalized robustified likelihood, where the nonzero elements of the concentration matrix represent the graphical links among the genes. After the graphical structure is obtained, we re-estimate the positive definite concentration matrix using an iterative proportional fitting algorithm. Through simulations, we demonstrate that the proposed robust method performs much better than the graphical lasso for Gaussian graphical models in terms of both graph structure selection and estimation when outliers are present. We apply the robust estimation procedure to an analysis of yeast gene expression data and show that the resulting graph has a better biological interpretation than the one obtained from the graphical lasso.
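A minimal sketch of the observation-weighting idea follows, in Python with NumPy. It assumes a hypothetical density-power scheme in which each observation's weight is proportional to its Gaussian density raised to a power `gamma`; this may differ from the exact weighting used in the paper, and the function names are illustrative. The weighted covariance it returns could then be handed to an ℓ1-penalized solver such as the graphical lasso.

```python
import numpy as np

def gaussian_logpdf(X, mu, Sigma):
    """Log-density of N(mu, Sigma) evaluated at each row of X."""
    p = X.shape[1]
    diff = X - mu
    Sigma_inv = np.linalg.inv(Sigma)
    # Squared Mahalanobis distance of every observation
    maha = np.einsum("ij,jk,ik->i", diff, Sigma_inv, diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def robust_weighted_moments(X, gamma=0.5, n_iter=50):
    """Iteratively reweighted mean and covariance (hypothetical density-power weights).

    Each observation gets weight proportional to f(x_i)^gamma, where f is the
    current Gaussian fit, so low-likelihood observations (outliers) contribute little.
    """
    n, _ = X.shape
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        mu = w @ X
        diff = X - mu
        # Weighted covariance, rescaled by (1 + gamma) to offset the shrinkage
        # caused by down-weighting the tails of a Gaussian sample.
        Sigma = (1.0 + gamma) * ((w[:, None] * diff).T @ diff)
        a = gamma * gaussian_logpdf(X, mu, Sigma)
        w = np.exp(a - a.max())  # subtract the max for numerical stability
        w /= w.sum()
    return mu, Sigma, w
```

The `(1 + gamma)` factor compensates for the shrinkage that down-weighting the tails introduces, so that the iteration's fixed point is the true covariance when the data are exactly Gaussian.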

2.
3.
A duplication growth model of gene expression networks (cited 8 times; self-citations: 0, citations by others: 8)

4.
MOTIVATION: Genetic networks are often described statistically using graphical models (e.g. Bayesian networks). However, inferring the network structure poses a serious challenge in microarray analysis, where the sample size is small compared with the number of genes considered. This renders many standard algorithms for graphical models inapplicable and makes inferring genetic networks an 'ill-posed' inverse problem. METHODS: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on the so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (1) improved (regularized) small-sample point estimates of partial correlation, (2) an exact test of edge inclusion with adaptive estimation of the degrees of freedom and (3) a heuristic network search based on false discovery rate multiple testing. Steps (2) and (3) correspond to an empirical Bayes estimate of the network topology. RESULTS: Using computer simulations, we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework for estimating GGMs from microarray data. This shows that the true network topology can be recovered with high accuracy even from small-sample datasets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3883 genes.
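The partial correlations in step (1) can be illustrated with a short NumPy sketch: they follow from the inverse of a regularized correlation matrix via rho_ij = -w_ij / sqrt(w_ii * w_jj). As a stand-in regularizer, the sketch uses simple linear shrinkage toward the identity with a fixed, hypothetical intensity `lam`; this may differ from the regularization used in the paper, and the edge test and FDR steps are not shown.

```python
import numpy as np

def shrunk_partial_correlation(X, lam=0.2):
    """Partial correlations from a correlation matrix shrunk toward the identity.

    X: n x p data matrix (rows = samples, columns = genes).
    lam: fixed, hypothetical shrinkage intensity in (0, 1].
    """
    R = np.corrcoef(X, rowvar=False)              # p x p sample correlation
    p = R.shape[0]
    R_shrink = (1.0 - lam) * R + lam * np.eye(p)  # eigenvalues are all >= lam
    Omega = np.linalg.inv(R_shrink)               # scaled concentration matrix
    d = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(d, d)                   # rho_ij = -w_ij / sqrt(w_ii w_jj)
    np.fill_diagonal(P, 1.0)
    return P
```

Because the shrunk matrix has all eigenvalues at least `lam`, it stays invertible even when there are fewer samples than genes, which is the small-sample situation the abstract describes.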

5.
6.
7.
A conjugate Wishart prior is used to obtain a simple and rapid procedure for computing the analytic posterior (mode and uncertainty) of the elements of the precision matrix of a Gaussian distribution. An interpretation of covariance estimates in terms of eigenvalues is presented, along with a simple decision-rule step that improves the estimation of sparse precision matrices and their associated graphs: elements of the estimated precision matrix that are zero or near zero are detected and shrunk to zero. Simulated data sets are used to compare posterior estimation with the decision rule against two other Wishart-based approaches and the graphical lasso. Furthermore, an empirical Bayes procedure is used to select the prior hyperparameters in high-dimensional cases, with an extension to sparsity.
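The conjugate update behind the analytic posterior can be sketched as follows, assuming zero-mean data and a Wishart(V, nu) prior on the precision matrix with scale matrix `V` and `nu` degrees of freedom. Under these assumptions the posterior is Wishart((V^-1 + S)^-1, nu + n) with S = X^T X, and its mode is (nu + n - p - 1)(V^-1 + S)^-1. The thresholding step, which zeroes off-diagonal entries whose posterior mode is small relative to their posterior standard deviation (multiplier `c`), is a hypothetical stand-in for the paper's decision rule.

```python
import numpy as np

def wishart_posterior_precision(X, V, nu, c=2.0):
    """Conjugate Wishart update for the precision matrix of zero-mean data.

    Prior: Omega ~ Wishart(V, nu). Posterior: Wishart(Psi, m) with
    Psi = inv(inv(V) + S), S = X^T X, m = nu + n.
    Returns the posterior mode, elementwise posterior standard deviations,
    and a thresholded mode (hypothetical rule: zero entries with |mode| < c * sd).
    """
    n, p = X.shape
    S = X.T @ X
    Psi = np.linalg.inv(np.linalg.inv(V) + S)
    m = nu + n
    if m <= p + 1:
        raise ValueError("posterior mode requires nu + n > p + 1")
    mode = (m - p - 1) * Psi
    # For W ~ Wishart(Psi, m): Var(W_ij) = m * (Psi_ij^2 + Psi_ii * Psi_jj)
    d = np.diag(Psi)
    sd = np.sqrt(m * (Psi**2 + np.outer(d, d)))
    sparse = np.where(np.abs(mode) < c * sd, 0.0, mode)
    np.fill_diagonal(sparse, np.diag(mode))  # never shrink the diagonal
    return mode, sd, sparse
```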

8.
Applications to the inference of biological networks have raised strong interest in the problem of graph estimation in high-dimensional Gaussian graphical models. To handle this problem, we propose a two-stage procedure that first builds a family of candidate graphs from the data and then selects one graph from this family according to a dedicated criterion. This estimation procedure is shown to be consistent in a high-dimensional setting, and its risk is controlled by a non-asymptotic oracle-like inequality. The procedure is tested on a real gene expression data set, and its performance is assessed in a large numerical study. The procedure is implemented in the R package GGMselect, available on CRAN.

9.
We propose a new statistical method for constructing a genetic network from microarray gene expression data using a Bayesian network. An essential step in Bayesian network construction is estimating the conditional distribution of each random variable. We fit nonparametric regression models with heterogeneous error variances to the microarray gene expression data to capture nonlinear relationships between genes. Selecting the optimal graph, the one that best represents the system of genes, remains an open problem. We theoretically derive a new graph selection criterion from a Bayesian approach in general settings. The proposed method encompasses previous methods based on Bayesian networks. We demonstrate its effectiveness through the analysis of Saccharomyces cerevisiae gene expression data newly obtained by disrupting 100 genes.

10.
The standard approach for identifying gene networks is based on experimental perturbations of gene regulatory systems, such as gene knock-out experiments, followed by genome-wide profiling of differential gene expression. However, this approach is significantly limited: it is not possible to perturb more than one or two genes simultaneously to discover complex gene interactions, or to distinguish between direct and indirect downstream regulation of the differentially expressed genes. As an alternative, genetical genomics has been proposed, which treats naturally occurring genetic variants as potential perturbants of the gene regulatory system and recovers gene networks from population gene-expression and genotype data. Despite its many advantages, genetical genomics analysis has not been widely applied because of a computational challenge: the effects of multifactorial genetic perturbations must be decoded from the data simultaneously. In this article, we propose a statistical framework for learning gene networks that overcomes the limitations of experimental perturbation methods and addresses the challenges of genetical genomics analysis. We introduce a new statistical model, called a sparse conditional Gaussian graphical model, and describe an efficient learning algorithm that simultaneously decodes the perturbations of the gene regulatory system by a large number of SNPs to identify a gene network along with the expression quantitative trait loci (eQTLs) that perturb this network. While our statistical model captures direct genetic perturbations of the gene network, by performing inference on the probabilistic graphical model we obtain detailed characterizations of how the direct SNP perturbation effects propagate through the gene network to perturb other genes indirectly. We demonstrate our statistical method using HapMap-simulated and yeast eQTL datasets.
In particular, the yeast gene network identified computationally by our method under SNP perturbations is well supported by the results of experimental perturbation studies related to the DNA replication stress response.
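How direct SNP effects propagate through the gene network can be illustrated with a small NumPy sketch. It assumes the parametrization p(y | x) proportional to exp(-1/2 y^T Theta_yy y - x^T Theta_xy y), under which y | x is Gaussian with mean -Theta_yy^-1 Theta_xy^T x and covariance Theta_yy^-1; the paper's exact parametrization and scaling may differ, and the function name is hypothetical. A sparse Theta_xy encodes direct SNP-to-gene links, while the product with the inverse network matrix gives the total (direct plus indirect) effects.

```python
import numpy as np

def total_snp_effects(Theta_yy, Theta_xy):
    """Map direct SNP->gene parameters to total effects on expected expression.

    Assumed model: y | x ~ N(-inv(Theta_yy) @ Theta_xy.T @ x, inv(Theta_yy)),
    with Theta_yy (p x p) the gene network and Theta_xy (q x p) the direct
    SNP->gene links. Returns B (q x p) such that E[y | x] = x @ B.
    """
    # B = -Theta_xy @ inv(Theta_yy). Theta_yy is symmetric, so
    # solve(Theta_yy, Theta_xy.T).T equals Theta_xy @ inv(Theta_yy)
    # without forming the inverse explicitly.
    return -np.linalg.solve(Theta_yy, Theta_xy.T).T

# Toy example: genes 0 and 1 are linked, gene 2 is isolated;
# one SNP is directly linked to gene 0 only.
Theta_yy = np.array([[1.0, -0.4, 0.0], [-0.4, 1.0, 0.0], [0.0, 0.0, 1.0]])
Theta_xy = np.array([[-1.0, 0.0, 0.0]])
B = total_snp_effects(Theta_yy, Theta_xy)
```

In this toy example the SNP is directly linked only to gene 0, yet it shifts the expected expression of gene 1 through the network edge between genes 0 and 1, while gene 2, which is not connected, is unaffected.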

11.
Huihang Liu, Xinyu Zhang. Biometrics, 2023, 79(3): 2050-2062
Advances in information technologies have made network data increasingly common in a spectrum of big data applications, and such data are often explored with probabilistic graphical models. To estimate the precision matrix precisely, we propose an optimal model averaging estimator for Gaussian graphs. We prove that the proposed estimator is asymptotically optimal when the candidate models are misspecified. When at least one correct model is included in the candidate set, we also study the consistency and asymptotic distribution of the model averaging estimator, as well as the convergence of the weights. Furthermore, numerical simulations and a real data analysis of yeast genetic data illustrate that the proposed method is promising.

12.
13.
MGraph: graphical models for microarray data analysis (cited 2 times; self-citations: 0, citations by others: 2)

14.
We propose a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge, including protein-protein interactions, protein-DNA interactions, binding site information, the existing literature, and so on. In many cases, microarray data do not contain enough information to construct gene networks accurately. Our method incorporates biological knowledge into gene network estimation under a Bayesian statistical framework and automatically controls the trade-off between microarray information and biological knowledge. We conduct Monte Carlo simulations to show the effectiveness of the proposed method, and we analyze Saccharomyces cerevisiae gene expression data as an application.

15.
MOTIVATION: An important problem in systems biology is the inference of biochemical pathways and regulatory networks from postgenomic data. Various reverse engineering methods have been proposed in the literature, and it is important to understand their relative merits and shortcomings. In the present paper, we compare the accuracy of reconstructing gene regulatory networks with three different modelling and inference paradigms: (1) relevance networks (RNs): pairwise association scores independent of the remaining network; (2) graphical Gaussian models (GGMs): undirected graphical models with constraint-based inference; and (3) Bayesian networks (BNs): directed graphical models with score-based inference. The evaluation is carried out on the Raf pathway, a cellular signalling network describing the interaction of 11 phosphorylated proteins and phospholipids in human immune system cells. We use both laboratory data from cytometry experiments and data simulated from the gold-standard network. We also compare passive observations with active interventions. RESULTS: On Gaussian observational data, BNs and GGMs were found to outperform RNs. The difference in performance was, however, not significant for the non-linear simulated data or the cytoflow data. Also, we did not observe a significant difference between BNs and GGMs on observational data in general. For interventional data, however, BNs outperform GGMs and RNs, especially when the edge directions, rather than just the skeletons of the graphs, are taken into account. This suggests that the higher computational cost of inference with BNs, compared with GGMs and RNs, is not justified when using only passive observations, and that active interventions in the form of gene knockouts and over-expressions are required to exploit the full potential of BNs. AVAILABILITY: Data, software and supplementary material are available from http://www.bioss.sari.ac.uk/staff/adriano/research.html
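The skeleton-based comparison can be sketched in plain Python: edge directions are dropped, and the predicted skeleton is scored against the gold standard over all unordered node pairs. The function names and the toy edge lists below are illustrative assumptions, not the actual Raf pathway gold standard or the paper's evaluation code, which may use ROC-based summaries instead.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected skeleton of a graph: drop edge directions (no self-loops assumed)."""
    return {frozenset(e) for e in edges}

def sensitivity_specificity(pred_edges, true_edges, nodes):
    """Sensitivity (TPR) and specificity (TNR) of a predicted skeleton."""
    pred, true = skeleton(pred_edges), skeleton(true_edges)
    all_pairs = {frozenset(pair) for pair in combinations(nodes, 2)}
    tp = len(pred & true)               # edges found in both
    fn = len(true - pred)               # gold-standard edges missed
    fp = len(pred - true)               # spurious predicted edges
    tn = len(all_pairs - (pred | true)) # non-edges correctly left out
    return tp / (tp + fn), tn / (tn + fp)

nodes = ["Raf", "Mek", "Erk", "Akt"]
true_edges = [("Raf", "Mek"), ("Mek", "Erk")]
pred_edges = [("Mek", "Raf"), ("Erk", "Akt")]  # first edge has the wrong direction
sens, spec = sensitivity_specificity(pred_edges, true_edges, nodes)
```

Because only skeletons are compared, the reversed edge Mek -> Raf still counts as a true positive; a direction-aware score would treat it differently.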

16.
Babur O, Colak R, Demir E, Dogrusoz U. Proteomics, 2008, 8(11): 2196-2198
High-throughput experiments, most notably DNA microarrays, provide system-scale profiles. Connecting these data with existing biological networks to uncover facts about a cell's proteome poses a formidable challenge. Studies and tools with this purpose are either limited to networks with simple structure, such as protein-protein interaction graphs, or do little more than display values on the network. We have built a microarray data analysis tool, named PATIKAmad, which can be used to associate microarray data with pathway models in mechanistic detail, and which provides facilities for visualizing, clustering, querying, and navigating biological graphs related to the loaded microarray experiments. PATIKAmad is freely available to noncommercial users as a new module of PATIKAweb at http://web.patika.org.

17.
Biomarkers are often organized into networks in which the strengths of the connections vary across subjects depending on subject-specific covariates (e.g., genetic variants). Variation in network connections, used as subject-specific feature variables, has been found to predict clinical disease outcomes. In this work, we develop a two-stage method to estimate biomarker networks that account for heterogeneity among subjects and to evaluate the networks' association with clinical outcome. In the first stage, we propose a conditional Gaussian graphical model whose mean and precision matrix depend on covariates, yielding covariate-dependent networks with connection strengths that vary across subjects while the network structure is assumed to be homogeneous. In the second stage, we evaluate the clinical utility of the network measures (connection strengths) estimated in the first stage. The second-stage analysis provides the relative predictive power of between-region network measures for clinical impairment in the context of regional biomarkers and existing disease risk factors. We assess the performance of the proposed method through extensive simulation studies and an application to a Huntington's disease (HD) study, investigating the effect of the HD causal gene on the rate of change in motor symptoms through its effect on connections in subcortical and cortical gray matter atrophy. We find that cortical network connections and subcortical volumes, but not subcortical connections, are predictive of clinical motor function deterioration. We validate these findings in an independent HD study. Lastly, the highly similar patterns seen in the gray matter connections and in a previous white matter connectivity study suggest a shared biological mechanism for HD and support the hypothesis that white matter loss is a direct result of neuronal loss, as opposed to loss of myelin or dysmyelination.
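The first-stage idea of a shared network structure with subject-specific connection strengths can be sketched as follows. The sketch assumes a hypothetical linear model Omega(z) = Omega0 + z * Omega1 for a scalar covariate z (for example, a genotype coded 0, 1, or 2), with Omega1 sharing the zero pattern of Omega0; the paper's actual covariate dependence, which also covers the mean, may be more general, and the function name is illustrative.

```python
import numpy as np

def subject_partial_correlations(Omega0, Omega1, z):
    """Subject-specific partial correlations under Omega(z) = Omega0 + z * Omega1.

    Because Omega1 is assumed to share Omega0's zero pattern, every subject
    has the same network structure; only the connection strengths change with z.
    """
    Omega = Omega0 + z * Omega1
    if np.any(np.linalg.eigvalsh(Omega) <= 0):
        raise ValueError("Omega(z) must be positive definite")
    d = np.sqrt(np.diag(Omega))
    P = -Omega / np.outer(d, d)  # partial correlation = -w_ij / sqrt(w_ii w_jj)
    np.fill_diagonal(P, 1.0)
    return P

Omega0 = np.array([[1.0, -0.2, 0.0], [-0.2, 1.0, -0.3], [0.0, -0.3, 1.0]])
Omega1 = np.array([[0.0, -0.1, 0.0], [-0.1, 0.0, 0.0], [0.0, 0.0, 0.0]])
P_z0 = subject_partial_correlations(Omega0, Omega1, z=0)
P_z2 = subject_partial_correlations(Omega0, Omega1, z=2)
```

Here the connection between nodes 0 and 1 strengthens from 0.2 to 0.4 as z goes from 0 to 2, while the absent edge between nodes 0 and 2 stays absent for every subject.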

18.
A system is constructed to automatically infer a genetic network by applying graphical Gaussian modeling to expression profile data. Our system is composed of two parts: one automatically determines the cluster boundaries of the profiles in hierarchical clustering, and the other infers a genetic network by applying graphical Gaussian modeling to the clustered profiles. Since thousands or tens of thousands of gene expression profiles are measured under only one hundred conditions, the profiles naturally show similar patterns. Therefore, preprocessing that systematically clusters the profiles is a prerequisite for inferring the relationships between the genes. For this purpose, a new method is developed that determines cluster boundaries automatically, without any biological knowledge or additional analyses. The profiles in each cluster are then analyzed by graphical Gaussian modeling to infer the relationships between the clusters. Thus, our system automatically produces a graph between clusters from the profile data alone. The performance of the system is validated on 2467 profiles of yeast genes. The clusters and the genetic network obtained by our system are discussed in terms of gene function and the known regulatory relationships between genes.

19.
MOTIVATION: Estimating the network of regulative interactions between genes from gene expression measurements is a major challenge. Recently, we have shown that optimal network models can be computed for gene networks of up to around 35 genes. However, even optimal gene network models will in general contain false edges, since the expression data do not unambiguously point to a single network. RESULTS: To overcome this problem, we present a computational method that enumerates the m most likely networks and extracts from them a widely common subgraph (denoted a gene network motif). We apply the method to bacterial gene expression data and extensively compare the estimation results with existing knowledge. Our results reveal that gene network motifs agree significantly better with biological knowledge than optimal network models do. We also confirm this observation in a series of estimations using synthetic microarray data and compare estimations by our method with previous estimations for yeast. Furthermore, we use our method to estimate similarities and differences between the gene networks that regulate tryptophan metabolism in two related species, thereby demonstrating the analysis of gene network evolution. AVAILABILITY: Commercial license negotiable with Gene Networks Inc. (cherkis@gene-networks.com) CONTACT: sascha-ott@gmx.net
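The motif-extraction step can be sketched in plain Python: given the m highest-scoring networks as edge sets, keep the edges that occur in at least a fraction `tau` of them. The threshold `tau` and the function name are hypothetical, since the abstract does not specify how "widely common" is defined, and the enumeration of the m best networks is not shown.

```python
from collections import Counter

def network_motif(networks, tau=0.8):
    """Edges present in at least a fraction tau of the given networks.

    networks: list of edge collections (e.g. directed (parent, child) tuples),
              such as the m highest-scoring networks.
    tau: hypothetical frequency threshold in (0, 1].
    """
    m = len(networks)
    # Count each edge at most once per network.
    counts = Counter(edge for net in networks for edge in set(net))
    return {edge for edge, c in counts.items() if c >= tau * m}

nets = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("c", "d")},
    {("a", "b"), ("b", "c")},
]
```

Raising `tau` trades recall for precision: with `tau=1.0` only edges shared by every enumerated network survive.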

20.
KnowledgeEditor is a graphical workbench for biological experts to model biomolecular network graphs. The modeled network data are represented in SRML and can be published via the internet with the help of the plug-in module 'GSCope'. KnowledgeEditor helps users model and analyze biological pathways based on microarray data. The drawn networks can be analyzed by simulating up- and down-regulatory cascades in molecular interactions. AVAILABILITY: KnowledgeEditor is available at http://gscope.gsc.riken.go.jp/.

