Similar articles
1.
Recently, mutual information has been proposed for inferring the structure of genetic regulatory networks from gene expression profiles. After analyzing the limitations of mutual information in inferring gene-to-gene interactions, this paper introduces conditional mutual information and, based on it, proposes two novel algorithms for inferring the connectivity structure of genetic regulatory networks. One of the proposed algorithms achieves better accuracy, while the other excels in simplicity and flexibility. By exploiting both mutual information and conditional mutual information, a practical metric is also proposed to assess the likelihood of direct connectivity between genes. This metric resolves a common limitation of current inference algorithms, which establish gene connectivity only as a dichotomy of connected versus disconnected. On data sets generated by synthetic networks, the proposed algorithms compare favorably with existing state-of-the-art schemes. The proposed algorithms are also applied to real biological measurements, such as the cutaneous melanoma data set, and biologically meaningful results are inferred.
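A minimal sketch of the two quantities this abstract builds on, assuming the expression profiles have already been discretized (the discretization and the plug-in entropy estimator are our assumptions, not the paper's estimator). It illustrates why conditioning helps: in a cascade A → B → C, A and C share information, but that shared information vanishes once B is known, so the A–C link can be recognized as indirect.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint Shannon entropy (bits) of equal-length discrete sequences."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); clipped at 0 to absorb float round-off."""
    return max(0.0, entropy(x) + entropy(y) - entropy(x, y))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z); clipped at 0 likewise."""
    return max(0.0, entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z))


# Hypothetical discretized profiles for a cascade A -> B -> C (C copies B).
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 0, 1, 1, 1, 1, 0]
c = list(b)

print(round(mutual_information(a, c), 3))                 # A and C look linked
print(round(conditional_mutual_information(a, c, b), 3))  # the link vanishes given B
```

Plug-in estimates like these are biased upward for small samples, so in practice a significance threshold (or permutation test) would be applied before declaring an edge.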

2.
Leucine-responsive regulatory protein (Lrp) is a global regulatory protein that affects the expression of multiple genes and operons in bacteria. Although the physiological purpose of Lrp-mediated gene regulation remains unclear, it has been suggested that it functions to coordinate cellular metabolism with the nutritional state of the environment. A comparison of gene expression profiles between otherwise isogenic lrp(+) and lrp(-) strains of Escherichia coli supports this suggestion. The newly discovered Lrp-regulated genes reported here are involved either in small-molecule or macromolecule synthesis or degradation, or in small-molecule transport and environmental stress responses. Although many of these regulatory effects are direct, others are indirect consequences of Lrp-mediated changes in the expression levels of other global regulatory proteins. Because computational methods to analyze and interpret high-dimensional DNA microarray data are still at an early stage, much of this work is directed toward developing methods that identify differentially expressed genes with a high level of confidence. In particular, we describe a Bayesian statistical framework for a posterior estimate of the standard deviation of gene measurements based on a limited number of replicates. We also describe an algorithm that computes a posterior estimate of differential expression for each gene based on the experiment-wide global false positive and false negative levels of a DNA microarray data set. This allows the experimenter to compute the posterior probability of differential expression for each individual gene expression measurement.
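The abstract does not give the estimator's formula. A minimal sketch of one common conjugate-prior form of such a posterior variance estimate, assuming a background variance `prior_var` that carries the weight of `prior_df` pseudo-replicates (both hypothetical tuning parameters): the estimate is (ν0·σ0² + (n − 1)·s²) / (ν0 + n − 2). With only a few replicates, this keeps a gene whose replicates happen to agree from receiving an implausibly small variance and hence an inflated test statistic.

```python
from math import sqrt
from statistics import mean, variance


def regularized_variance(replicates, prior_var, prior_df):
    """Shrink the sample variance of a few replicates toward a background value.

    prior_var: assumed background variance (e.g. pooled from genes of similar intensity)
    prior_df:  assumed number of 'virtual' replicates behind prior_var
    """
    n = len(replicates)
    if n < 2 or prior_df + n - 2 <= 0:
        raise ValueError("need at least 2 replicates and prior_df + n - 2 > 0")
    sample_var = variance(replicates)  # unbiased estimate, n - 1 denominator
    return (prior_df * prior_var + (n - 1) * sample_var) / (prior_df + n - 2)


def regularized_t(group1, group2, prior_var, prior_df):
    """Welch-style t statistic that uses the regularized variances."""
    v1 = regularized_variance(group1, prior_var, prior_df)
    v2 = regularized_variance(group2, prior_var, prior_df)
    return (mean(group1) - mean(group2)) / sqrt(v1 / len(group1) + v2 / len(group2))


# Three replicates that agree exactly: the raw variance is 0, which would
# make an ordinary t statistic infinite; the regularized variance is not 0.
print(regularized_variance([1.0, 1.0, 1.0], prior_var=0.04, prior_df=10))
print(regularized_t([2.0, 2.1, 1.9], [1.0, 1.1, 0.9], prior_var=0.04, prior_df=10))
```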

3.
In this article, we introduce an exploratory framework for learning patterns of conditional co-expression in gene expression data. The main idea is to estimate how the information content shared by a set of M nodes in a network (where each node is associated with an expression profile) varies upon conditioning on a set of L conditioning variables (in the simplest case, a separate set of expression profiles). The method is non-parametric and based on the concept of statistical co-information, which, unlike conventional correlation-based techniques, is not restricted to linear conditional dependency patterns. Moreover, such conditional co-expression relationships can potentially indicate regulatory interactions that do not manifest themselves when only pairwise relationships are considered. We derive a moment-based approximation of the co-information measure that efficiently circumvents the problem of estimating high-dimensional multivariate probability density functions from the data, a task usually not viable given the intrinsic sample-size limitations of expression level measurements. Applying the proposed exploratory method to a whole-genome microarray assay of the eukaryote Saccharomyces cerevisiae, we were able to learn statistically significant patterns of conditional co-expression. A selection of these interactions that carry a meaningful biological interpretation is discussed.
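The paper's moment-based approximation is not reproduced here. As a rough illustration of co-information itself, the sketch below uses plug-in entropies on discretized profiles (our assumption) and the XOR pattern, the textbook case of a dependency that pairwise measures miss: x and y are pairwise independent, yet become fully dependent once z is known. Note that sign conventions for co-information differ between authors; here I(X;Y;Z) = I(X;Y) − I(X;Y|Z).

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Plug-in joint Shannon entropy (bits) of equal-length discrete sequences."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def co_information(x, y, z):
    """I(X;Y;Z) = I(X;Y) - I(X;Y|Z) (one of several sign conventions)."""
    i_xy = entropy(x) + entropy(y) - entropy(x, y)
    i_xy_given_z = entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)
    return i_xy - i_xy_given_z


# Hypothetical binarized profiles: z is "on" when exactly one of x, y is on (XOR).
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [0, 1, 1, 0]
print(co_information(x, y, z))  # negative: x and y only become dependent given z
```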

4.
5.
Inferring gene networks from gene expression data is an important step in understanding the molecular machinery of life. Three methods for establishing and quantifying causal relationships between genes based on steady-state measurements in single-gene perturbation experiments have recently been proposed: the regulatory strength method, the local regulatory strength method, and Gardner's method. The theoretical basis of these methods is presented here in a thorough and consistent fashion. In principle, all three methods would generate identical networks from the same data set, but they would quantify the strengths of connections in different ways. The regulatory strength method is shown here to be topology-dependent. It uses the data format collected in gene expression microarray experiments and can therefore be applied immediately with this technology. The regulatory strengths obtained by this method can also be used to compute local regulatory strengths. In contrast, Gardner's method requires both measurements of mRNA concentrations and measurements of the applied rate perturbations, which are not usually part of a standard microarray experimental protocol. The results generated by Gardner's method and by the two regulatory strength methods differ only by scaling constants, but Gardner's method requires more measurements. On the other hand, the explicit use of rate perturbations in Gardner's approach allows one to address new questions, such as which perturbations caused a given response of the system. Results of applying the three techniques to real experimental data are presented and discussed. The comparative analysis presented in this paper can help in choosing an appropriate technique for inferring genetic networks and in interpreting the results of its application to experimental data.
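A minimal sketch of the linear model behind Gardner's method, assuming the network relaxes to a steady state governed by dx/dt = A·x + u. At steady state A·x = −u, so stacking n perturbation experiments as the columns of X (expression responses) and U (applied rate perturbations) gives A·X = −U, hence A = −U·X⁻¹. The last function rescales each row of A by its diagonal entry, r_ij = −A_ij / A_ii, which we assume here as the link to local regulatory strengths; it illustrates the abstract's point that the methods differ only by scaling constants. The matrix helpers are written out to keep the example dependency-free, and all numbers are hypothetical.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_inv(m):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("response matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def infer_jacobian(responses, perturbations):
    """Solve A X = -U for the interaction matrix A (columns of X and U = experiments)."""
    ux_inv = mat_mul(perturbations, mat_inv(responses))
    return [[-v for v in row] for row in ux_inv]


def local_response_coefficients(a):
    """Row-wise rescaling r_ij = -A_ij / A_ii, with the diagonal fixed at -1."""
    n = len(a)
    return [[-1.0 if i == j else -a[i][j] / a[i][i] for j in range(n)] for i in range(n)]


# Hypothetical steady-state responses of two genes to two unit rate perturbations.
X = [[1.0, 0.25], [0.0, 0.5]]
U = [[1.0, 0.0], [0.0, 1.0]]
A = infer_jacobian(X, U)
print(A)
print(local_response_coefficients(A))
```

The second print makes the scaling relationship concrete: each row of the local coefficients is the corresponding row of A divided by −A_ii.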

6.
MOTIVATION: Most biological traits may be correlated with underlying gene expression patterns that are partially determined by DNA sequence variation. The correlations between gene expression levels and quantitative traits are essential for understanding the functions of genes and dissecting gene regulatory networks. RESULTS: In the present study, we adopted a novel statistical method, the stochastic expectation and maximization (SEM) algorithm, to analyze the associations between gene expression levels and quantitative trait values and to identify genetic loci controlling variation in gene expression. In the first step, gene expression levels measured in microarray experiments were assigned to two clusters based on the strength of their association with the phenotypes of the quantitative trait under investigation. In the second step, genes associated with the trait were mapped to genetic loci in the genome. Because expression levels are quantitative, the genetic loci controlling these expression traits are called expression quantitative trait loci. We applied the SEM algorithm to a real dataset from a barley genetic experiment that includes both quantitative traits and gene expression traits. For the first time, we identified genes associated with eight agronomic traits of barley. These genes were then mapped to seven chromosomes of the barley genome. The SEM algorithm and the results of the barley data analysis are useful to scientists in bioinformatics and plant breeding. Availability and implementation: The R program for the SEM algorithm can be downloaded from our website: http://www.statgen.ucr.edu.
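The first step (splitting genes into trait-associated and non-associated clusters) can be sketched as a stochastic EM fit of a two-component Gaussian mixture. This is a minimal illustration under our own assumptions: each gene is summarized by one hypothetical association score with the trait, the mixture is one-dimensional, and initialization is crude. It is not the authors' R program. Each iteration computes posterior memberships (E-step), draws hard labels from them (S-step, the "stochastic" part), and re-estimates parameters from the drawn labels (M-step).

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def sem_two_clusters(scores, n_iter=50, seed=0):
    """Stochastic EM for a two-component 1-D Gaussian mixture.

    scores: one hypothetical association statistic per gene (expression vs. trait).
    Returns (means, sds, weights, labels).
    """
    rng = random.Random(seed)
    mu = [min(scores), max(scores)]  # crude initialization at the extremes
    sd = [1.0, 1.0]
    w = [0.5, 0.5]
    labels = [0] * len(scores)
    for _ in range(n_iter):
        for i, x in enumerate(scores):
            # E-step: posterior probability that gene i belongs to cluster 1
            p0 = w[0] * normal_pdf(x, mu[0], sd[0])
            p1 = w[1] * normal_pdf(x, mu[1], sd[1])
            post1 = p1 / (p0 + p1) if p0 + p1 > 0 else 0.5
            # S-step: draw a hard label from that posterior
            labels[i] = 1 if rng.random() < post1 else 0
        for k in (0, 1):
            # M-step: maximum-likelihood update from the sampled labels
            members = [x for x, lab in zip(scores, labels) if lab == k]
            if len(members) < 2:
                continue  # keep the previous parameters of a (nearly) empty cluster
            w[k] = len(members) / len(scores)
            mu[k] = sum(members) / len(members)
            var = sum((x - mu[k]) ** 2 for x in members) / len(members)
            sd[k] = max(math.sqrt(var), 1e-3)
        total = w[0] + w[1]
        w = [w[0] / total, w[1] / total]
    return mu, sd, w, labels


# Hypothetical scores: 200 unassociated genes around 0, 50 associated genes around 5.
data_rng = random.Random(1)
scores = [data_rng.gauss(0, 1) for _ in range(200)] + [data_rng.gauss(5, 1) for _ in range(50)]
means, sds, weights, labels = sem_two_clusters(scores, seed=2)
print(sorted(round(m, 2) for m in means), [round(v, 2) for v in weights])
```

Genes labelled with the high-mean cluster would then be the candidates carried into the second (mapping) step.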

7.
A Bayesian missing value estimation method for gene expression profile data
MOTIVATION: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced into gene expression profiles. Although existing multivariate analysis methods have difficulty handling missing values, this problem has received little attention. There are many options for dealing with missing values, and they can lead to drastically different results. Ignoring missing values is the simplest approach and is frequently applied, but it has its flaws. In this article, we propose an estimation method for missing values based on Bayesian principal component analysis (BPCA). Although estimating a probabilistic model and latent variables simultaneously within a Bayesian inference framework is not new in principle, a practical BPCA implementation that can estimate arbitrary missing variables is new in terms of statistical methodology. RESULTS: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. While the estimation performance of existing methods depends on model parameters that are difficult to determine, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation of missing values. AVAILABILITY: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.
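BPCA itself (variational Bayesian PCA) is too involved for a short example. As a point of reference, here is a minimal sketch of the K-nearest-neighbor imputation that the abstract lists among the compared methods; this is the baseline, not BPCA. The RMS distance over co-observed samples and the use of `None` for missing entries are our choices. The parameter `k` is exactly the kind of hard-to-choose model parameter the abstract refers to.

```python
import math


def knn_impute(matrix, k=3):
    """Fill None entries of a genes x samples matrix with the mean of the k nearest genes.

    Distance between two genes is the RMS difference over the samples both have observed.
    The input matrix is left unchanged; a filled copy is returned.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        candidates = []
        for r, other in enumerate(matrix):
            if r == i:
                continue
            shared = [j for j in range(len(row)) if row[j] is not None and other[j] is not None]
            if shared:
                dist = math.sqrt(sum((row[j] - other[j]) ** 2 for j in shared) / len(shared))
                candidates.append((dist, r))
        candidates.sort()
        for j in missing:
            # nearest genes that actually have a value in column j
            donors = [matrix[r][j] for _, r in candidates if matrix[r][j] is not None][:k]
            if donors:
                filled[i][j] = sum(donors) / len(donors)
    return filled


# Hypothetical profiles: gene 0 is missing its third measurement.
profiles = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.1],
    [0.9, 1.9, 2.9],
    [10.0, 20.0, 30.0],
]
print(knn_impute(profiles, k=2)[0])
```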

8.
9.
10.
11.
Duarte CW, Zeng ZB. Genetics, 2011, 187(3): 955-964
Expression QTL (eQTL) studies collect microarray gene expression data and genetic marker data from segregating individuals in a population to search for genetic determinants of differential gene expression. Previous studies have found large numbers of trans-regulated genes (regulated by unlinked genetic loci) that link to a single locus, or eQTL "hotspot," and it would be desirable to find the mechanism of coregulation for these gene groups. However, current network reconstruction algorithms face many difficulties, such as low power and high computational cost. A common observation for biological networks is that they have a scale-free or power-law architecture. In such an architecture, highly influential nodes exist that have many connections to other nodes. If we assume that this type of architecture applies to genetic networks, then we can simplify the problem of genetic network reconstruction by focusing on discovering the key regulatory genes at the top of the network. We introduce the concept of "shielding," in which a specific gene expression variable (the shielder) renders a set of other gene expression variables (the shielded genes) independent of the eQTL. We iteratively build networks downward from the eQTL through the shielders, using tests of conditional independence. We have proposed a novel test that controls the shielder false-positive rate at a predetermined level by requiring a threshold number of shielded genes per shielder. Using simulation, we have demonstrated that we can control the shielder false-positive rate while obtaining high shielder and edge specificity. In addition, we have shown our method to be robust to violation of the latent variable assumption, an important feature in its practical application. We have applied our method to a yeast expression QTL data set in which microarray and marker data were collected from the progeny of a backcross of two strains of Saccharomyces cerevisiae (Brem et al. 2002).
Seven genetic networks have been discovered, and bioinformatic analysis of the discovered regulators and corresponding regulated genes has generated plausible hypotheses for mechanisms of regulation that can be tested in future experiments.
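The core idea of shielding can be sketched as a pair of correlation tests, under a Gaussian (linear) assumption that is ours, not necessarily the paper's: a target gene is "shielded" if it is linked to the eQTL marker marginally but becomes independent of it once the shielder is conditioned on. The sketch uses first-order partial correlation with Fisher's z test; the paper's test that controls the shielder false-positive rate via a threshold number of shielded genes is not reproduced. All data are simulated and hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation = 0, via Fisher's z transform."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def is_shielded(marker, shielder, target, alpha=0.05):
    """Target linked to the eQTL marginally, but independent of it given the shielder."""
    n = len(marker)
    linked = fisher_z_pvalue(pearson(marker, target), n, 0) < alpha
    screened_off = fisher_z_pvalue(partial_corr(marker, target, shielder), n, 1) >= alpha
    return linked and screened_off


# Simulated chain: eQTL genotype -> shielder gene -> target gene.
rng = random.Random(0)
n = 500
marker = [rng.randint(0, 1) for _ in range(n)]
shielder = [m + rng.gauss(0, 0.5) for m in marker]
target = [s + rng.gauss(0, 0.5) for s in shielder]
print(round(pearson(marker, target), 2), round(partial_corr(marker, target, shielder), 2))
print(is_shielded(marker, shielder, target))
```

A gene regulated by the eQTL through some other route would keep a sizeable partial correlation given this shielder and so would not be counted among its shielded genes.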

12.
Design of microarray experiments for genetical genomics studies
Bueno Filho JS, Gilmour SG, Rosa GJ. Genetics, 2006, 174(2): 945-957

13.
An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature for performing this type of diagnosis. Unfortunately, most of these techniques assume that different subtypes of cancer are already known to exist. Their utility is limited when such subtypes have not been previously identified. Although methods for identifying such subtypes exist, these methods do not work well for all datasets. It would be desirable to develop a procedure to find such subtypes that is applicable in a wide variety of circumstances. Even if no information is known about possible subtypes of a certain form of cancer, clinical information about the patients, such as their survival time, is often available. In this study, we develop some procedures that utilize both the gene expression data and the clinical data to identify subtypes of cancer and use this knowledge to diagnose future patients. These procedures were successfully applied to several publicly available datasets. We present diagnostic procedures that accurately predict the survival of future patients based on the gene expression profile and survival times of previous patients. This has the potential to be a powerful tool for diagnosing and treating cancer.
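The abstract does not spell out the procedures. One simple way to combine the two data sources, sketched here under our own assumptions, is to rank genes by how strongly they track survival, keep only the top few, and cluster patients on those genes. We use the absolute Pearson correlation with survival time as the gene score, which ignores censoring; a Cox-model score would be the more standard choice for censored survival data. The data and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_survival_genes(expr, survival, top):
    """Indices of the `top` genes whose expression tracks survival time most strongly."""
    scored = sorted(((abs(pearson(g, survival)), idx) for idx, g in enumerate(expr)), reverse=True)
    return [idx for _, idx in scored[:top]]


def two_means(points, n_iter=20):
    """Plain 2-means clustering with a deterministic farthest-point initialization."""
    def d2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    first = points[0]
    centers = [list(first), list(max(points, key=lambda p: d2(p, first)))]
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [0 if d2(p, centers[0]) <= d2(p, centers[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def supervised_subtypes(expr, survival, top=10):
    """expr: genes x patients. Returns (selected gene indices, patient cluster labels)."""
    genes = select_survival_genes(expr, survival, top)
    patients = [[expr[g][p] for g in genes] for p in range(len(survival))]
    return genes, two_means(patients)


# Hypothetical data: gene 0 separates short- from long-surviving patients, gene 1 does not.
survival = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]
expr = [[5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
        [1.0, 3.0, 2.0, 3.0, 1.0, 2.0]]
print(supervised_subtypes(expr, survival, top=1))
```

A future patient could then be assigned to the subtype whose cluster center is nearest on the selected genes.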

14.
Adjustments and measures of differential expression for microarray data
MOTIVATION: Existing analyses of microarray data often incorporate an obscure data normalization procedure applied prior to data analysis. For example, ratios of microarray channel intensities are normalized to have a common mean over the set of genes. We attempted to understand the meaning of such procedures from a modeling point of view and to formulate the model assumptions that underlie them. Given the considerable diversity of data adjustment procedures, the question of their performance, comparison, and ranking across various microarray experiments was of interest. RESULTS: A two-step statistical procedure is proposed: data transformation (adjustment for slide-specific effects) followed by a statistical test applied to the transformed data. Various methods of analysis for differential expression are compared using simulations and real data on colon cancer cell lines. We found that robust categorical adjustments outperform those based on a precisely defined stochastic model, including some commonly used procedures.
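The two-step structure (adjust, then test) can be sketched as follows. As an assumed example adjustment we subtract each slide's median log-ratio, a simple robust variant of normalizing ratios to a common mean; it is not the paper's categorical adjustment. The test step is an ordinary one-sample t statistic per gene across slides. Slide values are hypothetical.

```python
import math
import statistics


def center_slide(log_ratios):
    """Step 1 (adjustment): remove a slide-wide offset by subtracting the median log-ratio."""
    med = statistics.median(log_ratios)
    return [v - med for v in log_ratios]


def one_sample_t(values):
    """Step 2 (test): t statistic for H0 'mean adjusted log-ratio is 0'."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (sd / math.sqrt(len(values)))


def two_step_analysis(slides):
    """slides: one list of per-gene log2 ratios per slide (same gene order on every slide)."""
    adjusted = [center_slide(s) for s in slides]
    return [one_sample_t([s[g] for s in adjusted]) for g in range(len(slides[0]))]


# Three hypothetical slides of four genes; each slide carries its own offset.
slides = [
    [0.1, 0.0, 2.0, -0.1],
    [0.7, 0.5, 2.6, 0.4],
    [-0.2, -0.35, 1.6, -0.4],
]
print([round(t, 1) for t in two_step_analysis(slides)])
```

Swapping `center_slide` for another adjustment while keeping the test fixed is the kind of comparison the abstract describes.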

15.
A duplication growth model of gene expression networks

16.
17.
Large-scale microarray gene expression data make it possible to construct genetic networks or biological pathways. Gaussian graphical models have been suggested as an effective method for constructing such genetic networks. However, most available methods for constructing Gaussian graphs do not account for the sparsity of the networks and are computationally demanding or even infeasible, especially in high-dimension, low-sample-size settings. We introduce a threshold gradient descent (TGD) regularization procedure for estimating the sparse precision matrix in the setting of Gaussian graphical models and demonstrate its application to identifying genetic networks. Such a procedure is computationally feasible and can easily incorporate prior biological knowledge about the network structure. Simulation results indicate that the proposed method yields a better estimate of the precision matrix than procedures that fail to account for the sparsity of the graphs. We also present results on the inference of a gene network for isoprenoid biosynthesis in Arabidopsis thaliana. These results demonstrate that the proposed procedure can identify biologically meaningful genetic networks from microarray gene expression data.
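A minimal sketch of threshold gradient descent on a single node-wise least-squares regression. This is a simplification we assume for illustration: for a Gaussian vector with precision matrix Ω, regressing gene j on the others gives coefficients β_jk = −Ω_jk / Ω_jj, so zeros in β match zeros in row j of Ω. The paper's exact loss for the full precision matrix may differ. At each step only coefficients whose gradient magnitude is at least `tau` times the largest one are updated, so `tau` near 1 yields sparse solutions and `tau = 0` reduces to plain gradient descent.

```python
def tgd_regression(X, y, tau=1.0, step=0.01, n_steps=2000):
    """Threshold gradient descent for least-squares regression.

    X: list of samples (rows of predictor values); y: list of responses.
    tau in [0, 1]: 0 updates every coefficient (plain gradient descent),
    1 updates only the coefficient(s) with the steepest gradient (sparsest path).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_steps):
        resid = [yi - sum(b * xij for b, xij in zip(beta, xi)) for xi, yi in zip(X, y)]
        # negative gradient of the loss sum(resid**2) / (2n)
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        gmax = max(abs(g) for g in grad)
        if gmax == 0:
            break
        for j in range(p):
            if abs(grad[j]) >= tau * gmax:  # threshold: only large gradients move
                beta[j] += step * grad[j]
    return beta


# Hypothetical node-wise regression: the response gene is driven by predictor 0 only.
X = [[1, 0, 1], [-1, 1, 0], [1, 1, -1], [-1, -1, 0], [1, 0, 0], [-1, 0, 1]]
y = [2.0 * row[0] for row in X]
print([round(b, 3) for b in tgd_regression(X, y)])
```

In practice `tau` and the number of steps act as the regularization parameters and would be chosen by cross-validation.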

18.
Genetic interaction screens have been applied with great success in several organisms to study gene function and the genetic architecture of the cell. However, most studies have been performed under optimal growth conditions even though many functional interactions are known to occur under specific cellular conditions. In this study, we have performed a large‐scale genetic interaction analysis in Saccharomyces cerevisiae involving approximately 49 × 1,200 double mutants in the presence of five different stress conditions, including osmotic, oxidative and cell wall‐altering stresses. This resulted in the generation of a differential E‐MAP (or dE‐MAP) comprising over 250,000 measurements of conditional interactions. We found an extensive number of conditional genetic interactions that recapitulate known stress‐specific functional associations. Furthermore, we have also uncovered previously unrecognized roles involving the phosphatase regulator Bud14, the histone methylation complex COMPASS and membrane trafficking complexes in modulating the cell wall integrity pathway. Finally, the osmotic stress differential genetic interactions showed enrichment for genes coding for proteins with conditional changes in phosphorylation but not for genes with conditional changes in gene expression. This suggests that conditional genetic interactions are a powerful tool to dissect the functional importance of the different response mechanisms of the cell.
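The notion of a differential (condition-specific) interaction can be sketched with the simple multiplicative definition of a genetic interaction: observed double-mutant fitness minus the product of the single-mutant fitnesses. This is an assumed stand-in; E-MAP studies typically report S-scores, a t-statistic-like measure, rather than this raw difference. The differential score is the change between a stress condition and the reference condition, and all fitness values are hypothetical.

```python
def interaction_score(w_ab, w_a, w_b):
    """Epistasis under a multiplicative fitness model: observed minus expected double-mutant fitness."""
    return w_ab - w_a * w_b


def differential_score(stress, reference):
    """Change in the interaction between a stress condition and the reference condition.

    Each argument is a hypothetical (w_ab, w_a, w_b) tuple of relative fitness values.
    """
    return interaction_score(*stress) - interaction_score(*reference)


reference = (0.81, 0.9, 0.9)  # no interaction under standard growth
stress = (0.20, 0.8, 0.5)     # strong aggravating interaction under stress
print(round(differential_score(stress, reference), 3))
```

A strongly negative differential score flags a gene pair whose functional relationship is specific to the stress condition.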

19.
A fundamental problem in DNA microarray analysis is the lack of a common standard for comparing the expression levels of different samples. Several normalization protocols have been proposed to overcome the variability inherent in this technology. As yet, there are no satisfactory methods for exchanging gene expression data among research groups or for comparing gene expression values under different stimulus–response profiles. We have tested a normalization procedure based on comparing gene expression levels to the signals generated by hybridizing genomic DNA (genomic normalization). This procedure was applied to DNA microarrays of Mycobacterium tuberculosis using RNA extracted from cultures growing in the logarithmic and stationary phases. The normalization procedure generated reproducible measurements of expression level for 98% of the putative mycobacterial ORFs, of which 5.2% were significantly changed between the logarithmic and stationary growth phases. Additionally, real-time PCR analysis of a subset of genes showed agreement in expression for 90% of the examined genes when genomic DNA normalization was applied, compared with 29–68% agreement when RNA normalization was used to measure expression levels in the same set of RNA samples. Further examination of microarray expression levels revealed clusters of genes differentially expressed between the logarithmic, early stationary, and late stationary growth phases. We conclude that genomic DNA standards offer advantages over conventional RNA normalization procedures and can be adapted for the investigation of microbial genomes.
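The idea of genomic normalization can be sketched in a few lines, assuming each ORF is present at roughly equal copy number in genomic DNA. Under that assumption the gDNA signal on a spot captures spot-specific effects (amount of probe printed, hybridization efficiency), and dividing the RNA signal by it yields values that can be compared across arrays and conditions. The intensities below are hypothetical.

```python
import math


def genomic_normalize(rna_signal, gdna_signal):
    """Per-spot expression relative to the genomic-DNA signal from the same spot."""
    return [r / g for r, g in zip(rna_signal, gdna_signal)]


def log2_fold_change(reference, condition):
    """Per-gene log2 ratio of condition to reference (both already normalized)."""
    return [math.log2(c / r) for r, c in zip(reference, condition)]


# Hypothetical raw intensities for two ORFs on two arrays with different spot efficiencies.
log_phase = genomic_normalize([200.0, 50.0], [100.0, 50.0])
stationary = genomic_normalize([100.0, 200.0], [100.0, 100.0])
print(log2_fold_change(log_phase, stationary))  # [-1.0, 1.0]
```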

20.
MOTIVATION: In clinical practice, pathological phenotypes are often labelled on ordinal scales rather than as binary outcomes, e.g. the Gleason grading system for tumour cell differentiation. However, in the microarray analysis literature, such ordinal labels have rarely been treated in a principled way. This paper describes a gene selection algorithm based on Gaussian processes to discover consistent gene expression patterns associated with ordinal clinical phenotypes. The technique of automatic relevance determination is applied to represent the significance level of the genes in a Bayesian inference framework. RESULTS: The usefulness of the proposed algorithm for ordinal labels is demonstrated by the gene expression signature associated with the Gleason score in prostate cancer data. Our results demonstrate how multi-gene markers that may initially be developed with a diagnostic or prognostic application in mind are also useful as an investigative tool to reveal associations between specific molecular and cellular events and features of tumour physiology. Our algorithm can also be applied to microarray data with binary labels, with results comparable to other methods in the literature.
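Two ingredients of such a model can be sketched in isolation, as an assumption about the general approach rather than the paper's implementation: an ordinal probit likelihood, which maps a latent Gaussian-process value f to probabilities of ordered grades through increasing cut points, and an automatic relevance determination (ARD) kernel with one length scale per gene. A gene whose fitted length scale is very large barely affects the covariance and is therefore deemed irrelevant. Posterior inference (e.g. Laplace or expectation propagation) and the fitting of length scales are not shown; cut points and inputs are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF; handles z = +/-inf."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def ordinal_probit_likelihood(f, thresholds, sigma=1.0):
    """P(y = k | f) for ordered classes k = 0..K-1.

    thresholds: increasing cut points b_1 < ... < b_{K-1} on the latent scale;
    b_0 = -inf and b_K = +inf are added here.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    return [norm_cdf((cuts[k + 1] - f) / sigma) - norm_cdf((cuts[k] - f) / sigma)
            for k in range(len(cuts) - 1)]


def ard_kernel(x1, x2, lengthscales, signal_var=1.0):
    """Squared-exponential covariance with one length scale per gene (ARD)."""
    sq = sum(((a - b) / ell) ** 2 for a, b, ell in zip(x1, x2, lengthscales))
    return signal_var * math.exp(-0.5 * sq)


# Hypothetical three-level grade with cut points at -1 and 1 on the latent scale.
print([round(p, 3) for p in ordinal_probit_likelihood(0.0, [-1.0, 1.0])])
# A gene with a huge length scale barely changes the covariance: it is "irrelevant".
print(ard_kernel([0.0, 0.0], [0.0, 5.0], lengthscales=[1.0, 1e6]))
print(ard_kernel([0.0, 0.0], [5.0, 0.0], lengthscales=[1.0, 1e6]))
```

Ranking genes by their fitted inverse length scales is one way to read off a gene selection from such a model.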


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417