Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expression switches between up- and down-regulation from one condition to another. To discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. A co-regulated gene profile contains two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden from traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and is competitive with bi-clustering techniques.

2.
Gene expression profiles of clinical cohorts can be used to identify genes that are correlated with a clinical variable of interest such as patient outcome or response to a particular drug. However, expression measurements are susceptible to technical bias caused by variation in extraneous factors such as RNA quality and array hybridization conditions. If such technical bias is correlated with the clinical variable of interest, the likelihood of identifying false positive genes increases. Here we describe a method to visualize an expression matrix as a projection of all genes onto a plane defined by a clinical variable and a technical nuisance variable. The resulting plot indicates the extent to which each gene is correlated with the clinical variable or the technical variable. We demonstrate this method by applying it to three clinical trial microarray data sets, one of which identified genes that may have been driven by a confounding technical variable. This approach can be used as a quality control step to identify data sets that are likely to yield false positive results.
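
The projection described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the plane coordinates are computed, so the code below assumes each gene is placed at (Pearson correlation with the clinical variable, Pearson correlation with the technical variable). The function names `pearson` and `project_genes` and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def project_genes(expression, clinical, technical):
    """Map each gene to (corr. with clinical variable, corr. with technical variable).

    Assumption: correlation is used as the plane coordinate; the source
    abstract does not specify the exact projection.
    """
    return {
        gene: (pearson(values, clinical), pearson(values, technical))
        for gene, values in expression.items()
    }

# Toy data (hypothetical): six samples, a binary outcome and an RNA quality score.
clinical = [0, 0, 0, 1, 1, 1]
technical = [7.1, 9.0, 8.2, 6.5, 9.3, 7.8]
expression = {
    "geneA": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],  # follows the clinical variable
    "geneB": [7.1, 9.0, 8.2, 6.5, 9.3, 7.8],  # follows the technical variable
}
coords = project_genes(expression, clinical, technical)
for gene, (r_clin, r_tech) in coords.items():
    print(gene, round(r_clin, 3), round(r_tech, 3))
```

Genes that land far along the technical axis would be the ones to treat with suspicion in a quality-control pass of this kind.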

3.
MOTIVATION: A new type of expression data is now being collected that aims to measure the effect of genetic variation on gene expression in pathways. In these datasets, expression profiles are constructed for multiple strains of the same model organism under the same condition. The goal of analyzing these data is to find differences in regulatory patterns due to genetic variation between strains, often without a phenotype of interest in mind. We present a new method based on notions of tight regulation and differential expression to look for sets of genes that appear to be significantly affected by genetic variation. RESULTS: When we use categorical phenotype information, as in the Alzheimer's and diabetes datasets, our method finds many of the same gene sets as gene set enrichment analysis. In addition, our notion of correlated gene sets allows us to focus our efforts on biological processes subjected to tight regulation. In murine hematopoietic stem cells, we are able to discover significant gene sets independent of a phenotype of interest. Some of these gene sets are associated with several blood-related phenotypes. AVAILABILITY: The programs are available by request from the authors.

4.
Accumulated biological knowledge is often encoded as gene sets, collections of genes associated with similar biological functions or pathways. The use of gene sets in the analysis of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest remains in finding modules of biological knowledge, or corresponding gene sets, significantly associated with disease conditions. Risk prediction from censored survival times using gene sets has not been well studied. In this work, we propose a hybrid method that uses both single-gene and gene-set information to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Conversely, single genes supplement the incomplete information in gene sets that results from our imperfect biomedical knowledge. In tests on multiple cancer and trauma injury data sets, the proposed method showed robust and improved performance compared with conventional approaches that use only single genes or only gene sets. Additionally, we examined the prediction results for the trauma injury data and showed that the modules of biological knowledge used in the prediction by the proposed method were biologically interpretable. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.

5.
Bø T, Jonassen I. Genome Biology, 2002, 3(4): research00-11
Methods for extracting useful information from the datasets produced by microarray experiments are currently of great interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared their performance with that of two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier. We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes. When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.
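
As a rough illustration of pair-based gene evaluation, the sketch below scores every gene pair by how well the two genes together separate two classes. The abstract does not state which pair score the authors use; leave-one-out accuracy of a 1-nearest-neighbour classifier in the two-gene plane is substituted here as a stand-in. The names `loo_1nn_accuracy` and `rank_pairs` and the toy data are hypothetical.

```python
from itertools import combinations

def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier in 2-D."""
    correct = 0
    for i, p in enumerate(points):
        # Nearest other sample by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (p[0] - points[j][0]) ** 2 + (p[1] - points[j][1]) ** 2,
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)

def rank_pairs(expression, labels):
    """Score every gene pair and return (score, gene1, gene2), best first.

    Assumption: the 1-NN leave-one-out score is an illustrative stand-in,
    not the scoring function of the original paper.
    """
    scores = []
    for g1, g2 in combinations(sorted(expression), 2):
        points = list(zip(expression[g1], expression[g2]))
        scores.append((loo_1nn_accuracy(points, labels), g1, g2))
    return sorted(scores, key=lambda s: -s[0])

# Toy data (hypothetical): the classes overlap on each gene alone but are
# cleanly separated by the difference geneA - geneB.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 2.5, 3.5, 4.5, 5.5],
    "geneB": [2.5, 3.5, 4.5, 5.5, 1.0, 2.0, 3.0, 4.0],
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0],  # unrelated to the classes
}
ranked = rank_pairs(expression, labels)
print(ranked[0])
```

In the toy data the value ranges of the two classes overlap on geneA and on geneB individually, which is the situation the abstract describes: the pair carries information that neither gene shows on its own.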

6.
7.
8.
Sesame (Sesamum indicum) is an important oilseed crop that produces seeds with 50% oil that have a distinct flavor and contain antioxidant lignans. Because sesame lignans are known to have antioxidant and health-protecting properties, metabolic pathways for lignans in developing sesame seeds have been of interest. As an initial approach to identify genes involved in accumulation of storage products and in the biosynthesis of antioxidant lignans, 3328 expressed sequence tags (ESTs) were obtained from a cDNA library of immature seeds 5-25 days old. ESTs were clustered and analyzed by the BLASTX or FASTAX program against the GenBank NR and Arabidopsis proteome databases. To compare gene expression profiles during development of green and non-green seeds, a comparative analysis was carried out between developing sesame and Arabidopsis seed ESTs. Analyses of these two seed EST sets have helped to identify similar and different gene expression profiles during seed development, and to identify a large number of sesame seed-specific genes. In particular, we have identified EST candidates for genes possibly involved in biosynthesis of the sesame lignans sesamin and sesamolin, and we also suggest a possible metabolic pathway for the generation of cofactors required for synthesis of storage lipid in non-green oilseeds. Seed-specific expression of several candidate genes has been confirmed by northern blot analysis.

9.
J Zhang, J Jia, F Zhu, X Ma, B Han, X Wei, C Tan, Y Jiang, Y Chen. Molecular BioSystems, 2012, 8(10): 2645-2656
Some drugs, such as anticancer EGFR tyrosine kinase inhibitors, elicit markedly different clinical response rates due to differences in drug bypass signaling as well as genetic variations of the drug target and downstream drug-resistant genes. The profiles of these bypass signaling routes are expected to improve drug response prediction, but they have not been systematically explored. In this work, we searched and analyzed 16 literature-reported EGFR tyrosine kinase inhibitor bypass signaling routes in the EGFR pathway, which include 5 compensatory routes of EGFR transactivation by another receptor and 11 alternative routes activated by another receptor. These 16 routes are reportedly regulated by 11 bypass genes. Their expression profiles, together with the mutational, amplification and expression profiles of EGFR and 4 downstream drug-resistant genes, were used as new sets of biomarkers for classifying 53 NSCLC cell lines as sensitive or resistant to the EGFR tyrosine kinase inhibitors gefitinib, erlotinib and lapatinib. The collective profiles of all 16 genes distinguish sensitive and resistant cell lines better than those of individual genes or of the combined EGFR and downstream drug-resistant genes, and their derived cell-line response rates are consistent with the reported clinical response rates of the three drugs. The usefulness of cell-line data for drug response studies was further analyzed by comparing the expression profiles of EGFR and bypass genes in NSCLC cell lines and patient samples, and by using a machine learning feature selection method for selecting drug response biomarkers. Our study suggests that the profiles of drug bypass signaling are highly useful for improved drug response prediction.

10.
Understanding the regulation of gene expression requires the identification of cis-acting control elements that modulate gene function. The recent availability of complete genome sequences and profiles of mRNA expression has facilitated the development and utilization of computational methods to identify discrete regulatory elements. We have developed an oligomer counting method that identifies sequences that occur significantly more often in a group of interest relative to other genes in the genome. The use of a second parameter, which measures the frequency of oligomers within the group of interest, allows the detection of false positive signals caused by very infrequent oligomers that would otherwise appear as significant. Applying this method to gene groups that have a common expression pattern or shared function should identify oligomers that comprise cis-acting control elements. As a test of this method, we applied it to a set of intron-containing yeast genes, where we easily identified the known splicing signals as control elements. We have used this training set to examine how the method is affected by the length of the oligomer examined, as well as the size and composition of the gene group. These simulations allowed us to identify rules for selecting groups of genes to analyze. Finally, application of this method to nuclear genes encoding proteins targeted to the mitochondria identified a new putative cis-acting sequence in the 3'-untranslated region of this family of genes, which may play a role in mRNA localization or the regulation of mRNA stability or translation.

11.
12.
13.
A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Second, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. Where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references, and the second contains newly discovered genes in Drosophila melanogaster, many of which have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention. In both cases, we identified novel clusters that were not noted by the original investigators.

14.
15.
16.
17.
The chemotherapeutic response of cancer cells to a given compound is among the most fundamental pieces of information needed to design anti-cancer drugs. Recently, a considerable amount of drug-induced gene expression data has become publicly available, in addition to cytotoxicity databases. These large data sets provide an opportunity to apply machine learning methods to predict drug activity. However, due to the complexity of cancer drug mechanisms, none of the existing methods is perfect. In this paper, we propose a novel ensemble learning method to predict drug response. In addition, we attempt to use the drug screen data together with two novel signatures produced from the drug-induced gene expression profiles of cancer cell lines. Finally, we evaluate predictions by in vitro experiments in addition to tests on data sets. The predictions of the methods, the signatures and the software are available from http://mtan.etu.edu.tr/drug-response-prediction/.

18.
MOTIVATION: MicroRNAs (miRNAs) and mRNAs constitute an important part of gene regulatory networks, influencing diverse biological phenomena. Elucidating closely related miRNAs and mRNAs can be an essential first step towards the discovery of their combinatorial effects on different cellular states. Here, we propose a probabilistic learning method to identify synergistic miRNAs involved in the regulation of their condition-specific target genes (mRNAs) from multiple information sources, i.e. computationally predicted target genes of miRNAs and their respective expression profiles. RESULTS: We used data sets consisting of miRNA-target gene binding information and expression profiles of miRNAs and mRNAs on human cancer samples. Our method allowed us to detect functionally correlated miRNA-mRNA modules involved in specific biological processes from multiple data sources by using a balanced fitness function and efficient searching over multiple populations. The proposed algorithm found two miRNA-mRNA modules that are highly correlated with respect to their expression and biological function. Moreover, the mRNAs included in the same module showed much higher correlations when the related miRNAs were highly expressed, demonstrating our method's ability to find coherent miRNA-mRNA modules. Most members of these modules have been reported to be closely related to cancer. Consequently, our method can provide a primary source of miRNA and target sets presumed to constitute closely related parts of gene regulatory pathways.

19.

Background

Clustering is a widely used technique for the analysis of gene expression data. Most clustering methods group genes based on distances, while few methods group genes according to the similarity of the distributions of their expression levels. Furthermore, as biological annotation resources have accumulated, an increasing number of genes have been annotated into functional categories. As a result, evaluating the performance of clustering methods in terms of the functional consistency of the resulting clusters is of great interest.

Results

In this paper, we propose the WDCM (Weibull Distribution-based Clustering Method), a robust approach for clustering gene expression data in which the expression levels of each gene are treated as a random variable following its own Weibull distribution. The WDCM is based on the concept that genes with similar expression profiles have similar distribution parameters, and thus the genes are clustered via their Weibull distribution parameters. We used the WDCM to cluster three cancer gene expression data sets, from lung cancer, B-cell follicular lymphoma and bladder carcinoma, and obtained well-clustered results. We compared the performance of the WDCM with k-means and Self Organizing Map (SOM) using functional annotation information given by the Gene Ontology (GO). The results showed that the functional annotation ratios of the WDCM are higher than those of the other methods. We also used the external measure Adjusted Rand Index to validate the performance of the WDCM. The comparative results demonstrate that the WDCM provides better clustering performance than the k-means and SOM algorithms. The merit of the proposed WDCM is that it can be applied to cluster incomplete gene expression data without imputing the missing values. Moreover, the robustness of the WDCM is also evaluated on the incomplete data sets.

Conclusions

The results demonstrate that our WDCM produces clusters with more consistent functional annotations than the other methods. The WDCM is also verified to be robust and is capable of clustering gene expression data containing a small quantity of missing values.
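
To make the idea of clustering on Weibull parameters concrete, the sketch below estimates a (shape, scale) pair for one gene's expression values. The abstract does not say how the WDCM estimates its parameters; this sketch assumes a least-squares fit on a Weibull probability plot with Bernard's median-rank approximation, and it assumes strictly positive expression values. Missing values (represented here as `None`) are simply dropped before fitting, so no imputation is needed. The names `fit_weibull` and `weibull_quantiles` are hypothetical.

```python
from math import exp, log

def fit_weibull(values):
    """Estimate Weibull (shape, scale) by least squares on a probability plot.

    Assumptions: values are strictly positive; None marks a missing value
    and is skipped. Uses ln(-ln(1 - F)) = shape * ln(x) - shape * ln(scale).
    """
    xs = sorted(v for v in values if v is not None)
    n = len(xs)
    points = []
    for i, x in enumerate(xs, start=1):
        f = (i - 0.3) / (n + 0.4)  # Bernard's median-rank approximation
        points.append((log(x), log(-log(1.0 - f))))
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in points)
             / sum((a - mean_x) ** 2 for a, _ in points))
    intercept = mean_y - slope * mean_x
    return slope, exp(-intercept / slope)

def weibull_quantiles(shape, scale, n):
    """Ideal sample of size n lying exactly on the Weibull probability plot."""
    return [scale * (-log(1.0 - (i - 0.3) / (n + 0.4))) ** (1.0 / shape)
            for i in range(1, n + 1)]

# Two hypothetical genes with different distribution parameters.
gene_a = weibull_quantiles(2.0, 5.0, 20)
gene_b = weibull_quantiles(0.8, 1.5, 20)
gene_b_incomplete = [None if i in (3, 11) else v for i, v in enumerate(gene_b)]

params = {
    "geneA": fit_weibull(gene_a),
    "geneB": fit_weibull(gene_b_incomplete),  # fitted without imputation
}
print(params)
```

The resulting parameter pairs form a small feature vector per gene that could be handed to any standard clustering routine; which routine the WDCM uses is not stated in the abstract.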

20.