Similar articles
 20 similar articles found
1.
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expressions change between up- and down-regulation from one condition to another. In order to discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden to traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and competitive with bi-clustering techniques.

2.

Background  

Cells dynamically adapt their gene expression patterns in response to various stimuli. This response is orchestrated into a number of gene expression modules consisting of co-regulated genes. A growing pool of publicly available microarray datasets allows the identification of modules by monitoring expression changes over time. These time-series datasets can be searched for gene expression modules by one of the many clustering methods published to date. For an integrative analysis, several time-series datasets can be joined into a three-dimensional gene-condition-time dataset, to which standard clustering or biclustering methods are, however, not applicable. We thus devise a probabilistic clustering algorithm for gene-condition-time datasets.

3.
Gene expression array technology has made possible the assay of expression levels of tens of thousands of genes at a time; large databases of such measurements are currently under construction. One important use of such databases is the ability to search for experiments that have similar gene expression levels to a query, potentially identifying previously unsuspected relationships among cellular states. Such searches depend crucially on the metric used to assess the similarity between pairs of experiments. The complex joint distribution of gene expression levels, particularly their correlational structure and non-normality, makes simple similarity metrics such as Euclidean distance or correlational similarity scores suboptimal for use in this application. We present a similarity metric for gene expression array experiments that takes into account the complex joint distribution of expression values. We provide a computationally tractable approximation to this measure, and have implemented a database search tool based on it. We discuss implementation issues and efficiency, and we compare our new metric to other standard metrics.

4.
MOTIVATION: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based cluster algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including and excluding genes from the analysis leads to different distances between the same objects, and consequently different clustering results. RESULTS: We describe a new clustering algorithm, in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition. CONCLUSIONS: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes as well as new subgroupings. We conjecture that our method has the potential to reveal so far unknown, clinically relevant classes of patients in an unsupervised manner. AVAILABILITY: We provide the R package adSplit as part of Bioconductor release 1.9 and on http://compdiag.molgen.mpg.de/software.
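
The point that the chosen gene set changes the distances between the same patients can be illustrated with a minimal sketch. It is not taken from the adSplit package: the patient names, the expression values, and the helper functions `euclidean` and `nearest` are all hypothetical.

```python
import math

# Hypothetical expression values for three patients over four genes.
patients = {
    "p1": [1.0, 5.0, 2.0, 0.0],
    "p2": [1.0, 5.0, 6.0, 3.0],
    "p3": [4.0, 1.0, 2.0, 0.0],
}

def euclidean(a, b, genes):
    """Euclidean distance between two profiles, restricted to the given gene indices."""
    return math.sqrt(sum((a[g] - b[g]) ** 2 for g in genes))

def nearest(target, genes):
    """Name of the patient closest to `target` when only `genes` are considered."""
    others = [p for p in patients if p != target]
    return min(others, key=lambda p: euclidean(patients[target], patients[p], genes))

# The same patient has a different nearest neighbour under each gene set.
print(nearest("p1", [0, 1]))  # genes 0 and 1 only
print(nearest("p1", [2, 3]))  # genes 2 and 3 only
```

With these made-up values, p1 is closest to p2 on the first gene set and closest to p3 on the second, so any distance-based clustering would group the patients differently depending on which genes are included.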

5.
6.
7.
MOTIVATION: Analysis of gene expression data can provide insights into the time-lagged co-regulation of genes/gene clusters. However, existing methods such as the Event Method and the Edge Detection Method are inefficient as they compare only two genes at a time. More importantly, they neglect some important information due to their scoring criteria. In this paper, we propose an efficient algorithm to identify time-lagged co-regulated gene clusters. The algorithm facilitates localized comparison and processes several genes simultaneously to generate detailed and complete time-lagged information for genes/gene clusters. RESULTS: We experimented with the time-series yeast gene dataset and compared our algorithm with the Event Method. Our results show that our algorithm is not only efficient, but also delivers more reliable and detailed information on time-lagged co-regulation between genes/gene clusters. AVAILABILITY: The software is available upon request. CONTACT: jiliping@comp.nus.edu.sg SUPPLEMENTARY INFORMATION: Supplementary tables and figures for this paper can be found at http://www.comp.nus.edu.sg/~jiliping/p2.htm.

8.
MOTIVATION: Microarray technology enables the study of gene expression on a large scale. The application of methods for data analysis then allows for grouping genes that show a similar expression profile and that are thus likely to be co-regulated. A relationship among genes at the biological level often presents itself by locally similar and potentially time-shifted patterns in their expression profiles. RESULTS: Here, we propose a new method (CLARITY; Clustering with Local shApe-based similaRITY) for the analysis of microarray time course experiments that uses a local shape-based similarity measure based on Spearman rank correlation. This measure does not require a normalization of the expression data and is comparably robust towards noise. It is also able to detect similar and even time-shifted sub-profiles. To this end, we implemented an approach motivated by the BLAST algorithm for sequence alignment. We used CLARITY to cluster the time series of gene expression data during the mitotic cell cycle of the yeast Saccharomyces cerevisiae. The obtained clusters were related to the MIPS functional classification to assess their biological significance. We found that several clusters were significantly enriched with genes that share similar or related functions.
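
The idea of a local, time-shift-tolerant similarity based on Spearman rank correlation can be sketched as follows. This is not the CLARITY implementation: the abstract describes a BLAST-motivated search, whereas this sketch simply tries every pair of windows by brute force. The function names, the fixed window length, and the `max_shift` parameter are assumptions made for illustration.

```python
import math

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def best_local_match(x, y, window, max_shift):
    """Best (score, start_in_x, start_in_y) over all window pairs within max_shift."""
    best = (-2.0, 0, 0)
    for i in range(len(x) - window + 1):
        lowest = max(0, i - max_shift)
        highest = min(len(y) - window, i + max_shift)
        for j in range(lowest, highest + 1):
            score = spearman(x[i:i + window], y[j:j + window])
            if score > best[0]:
                best = (score, i, j)
    return best

# Hypothetical profiles: y repeats the shape of x[0:4] two time points later,
# on a different scale, so no normalization is needed to find the match.
x = [1, 4, 2, 8, 5, 0, 0.5, 0.2]
y = [9, 9.5, 10, 40, 20, 80, 50, 3]
print(best_local_match(x, y, window=4, max_shift=2))
```

Because only ranks enter the score, the match between `x[0:4]` and `y[2:6]` reaches a correlation of 1 even though the two profiles are on very different scales.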

9.
There is great interest in chromosome- and pathway-based techniques for genomics data analysis as a means of understanding disease mechanisms. However, few studies have addressed the ability of machine learning methods to incorporate pathway information when analyzing microarray data. In this paper, we identified characteristic pathways by combining the out-of-bag (OOB) classification error rates of random forests with pathway information. Within each characteristic pathway, the correlation of gene expression was studied, and the co-regulated gene patterns in different biological conditions were mined with the Mining Attribute Profile (MAP) algorithm. The discovered co-regulated gene patterns were clustered by average-linkage hierarchical clustering. The results showed that genes in the same characteristic pathway had similar expression. Furthermore, two characteristic pathways were found to present co-regulated gene patterns: one contained 108 patterns and the other contained one pattern. The cluster analysis showed that the smallest similarity coefficient of the clusters was greater than 0.623, which indicated that the co-regulated patterns in different biological conditions were more similar within the same characteristic pathway. The methods discussed in this paper can provide additional insight into the study of microarray data.

10.
11.
Kim HY, Kim MJ, Han JI, Kim BK, Lee YS, Lee YS, Kim JH. Bio Systems, 2009, 95(1): 17-25
A time-series microarray experiment is useful to study the changes in the expression of a large number of genes over time. Many methods for clustering genes using gene expression profiles have been suggested, but it is not easy to interpret the biological significance of the results or utilize these methods for understanding the dynamics of gene regulatory systems. In this study, we introduce an algorithm for readjusting the boundaries of clusters by adopting the advantages of both k-means and singular value decomposition (SVD). In addition, we suggest a methodology for searching for the principal genes that can be the most crucial genes in the regulation of clusters. We found 34 principal genes from 171 clusters having strong concentratedness in their expression patterns and distinct ranges of oscillatory phases, by using a time-series microarray dataset of mouse embryonic stem (ES) cells after induction of dopaminergic neural differentiation. The biological significance of the principal genes examined in the literature supports the feasibility of our algorithms in that the hierarchy of clusters may lead to the manifestation of the phenotypes, e.g., the development of the nervous system.

12.

Background

Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets.

Results

The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric.

Conclusions

The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.

13.
Differential expression of genes detected with the analysis of high-throughput genomic experiments is a commonly used intermediate step for the identification of signaling pathways involved in the response to different biological conditions. The impact analysis was the first approach for the analysis of signaling pathways involved in a certain biological process that was able to take into account not only the magnitude of the expression change of the genes but also the topology of signaling pathways, including the type of each interaction between the genes. In the impact analysis, signaling pathways are represented as weighted directed graphs with genes as nodes and the interactions between genes as edges. Edge weights are represented by a β factor, the regulatory efficiency, which is assumed to be equal to 1 in inductive interactions between genes and equal to −1 in repressive interactions. This study presents a similarity analysis between gene expression time series aimed at finding correspondences with the regulatory efficiency, i.e. the β factor as found in a widely used pathway database. Here, we focused on correlations among genes directly connected in signaling pathways, assuming that the expression variations of upstream genes impact immediately downstream genes in a short time interval and without significant influences by the interactions with other genes. Time series were processed using three different similarity metrics. The first metric is based on bit-string matching; the second is a specific application of Dynamic Time Warping to detect similarities even in the presence of stretching and delays; the third is a quantitative comparative analysis resulting from an evaluation of the frequency-domain representation of the time series, in which the similarity metric is the correlation between dominant spectral components. These three approaches are tested on real data and pathways, and a comparison is performed using Information Retrieval benchmark tools, indicating the frequency approach as the best similarity metric among the three, for its ability to detect the correlation based on the correspondence of the most significant frequency components.
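
Of the three metrics, Dynamic Time Warping is the easiest to show in a few lines. The sketch below is the textbook dynamic-programming formulation with an absolute-difference local cost, not the specific application used in the study; the function name `dtw_distance` and the example series are assumptions.

```python
def dtw_distance(a, b):
    """Classic DTW: minimal cumulative |a[i] - b[j]| over monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched to b[j-1] again (stretch in b)
                cost[i][j - 1],      # b[j-1] is matched to a[i-1] again (stretch in a)
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]

# Hypothetical upstream and downstream expression series: the downstream
# gene follows the same pulse one time step later.
upstream = [0, 1, 2, 1, 0]
downstream = [0, 0, 1, 2, 1, 0]
print(dtw_distance(upstream, downstream))
```

For this delayed copy the warping path absorbs the one-step lag, so the distance is zero, whereas a point-by-point comparison of the first five values would report a difference at four of them.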

14.
A considerable number of biclustering approaches have been proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, recognizing groups of co-expressed or co-regulated genes, that is, genes which follow a similar expression pattern, is one of the main objectives. Due to the complexity of the problem, heuristic searches are usually used instead of exhaustive algorithms. Furthermore, most biclustering approaches use a measure or cost function that determines the quality of biclusters. Having a suitable quality metric for biclusters is a critical aspect, not only for guiding the search, but also for establishing a comparison criterion among the results obtained by different biclustering techniques. In this paper, we analyse a large number of existing approaches to quality measures for gene expression biclusters, and we present a comparative study of them based on their capability to recognize different expression patterns in biclusters.

15.
Biochemical and cytogenetic experiments have led to the hypothesis that eukaryotic chromatin is organized into a series of distinct domains that are functionally independent. Two expectations of this hypothesis are: (i) adjacent genes are more frequently co-expressed than is expected by chance; and (ii) co-expressed neighbouring genes are often functionally related. Here we report that over 10% of Arabidopsis thaliana genes are within large, co-expressed chromosomal regions. Two per cent (497/22,520) of genes are highly co-expressed (r > 0.7), about five times the number expected by chance. These genes fall into 226 groups distributed across the genome, and each group typically contains two to three genes. Among the highly co-expressed groups, 40% (91/226) have genes with high amino acid sequence similarity. Nonetheless, duplicate genes alone do not explain the observed levels of co-expression. Co-expressed, non-homologous genes are transcribed in parallel, share functions, and lie close together more frequently than expected. Our results show that the A. thaliana genome contains domains of gene expression. Small domains have highly co-expressed genes that often share functional and sequence similarity and are probably co-regulated by nearby regulatory sequences. Genes within large, significantly correlated groups are typically co-regulated at a low level, suggesting the presence of large chromosomal domains.
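
A rough sketch of how neighbouring genes might be grouped by a correlation threshold such as r > 0.7 is given below. The abstract does not state how the groups were constructed, so the rule used here (merge consecutive genes whose pairwise Pearson correlation exceeds the threshold) is an assumption, as are the function names and the example profiles.

```python
import math

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def coexpressed_groups(profiles, threshold=0.7):
    """Group runs of adjacent genes whose neighbour-to-neighbour r exceeds threshold.

    `profiles` is a list of expression vectors ordered by chromosomal position.
    Returns lists of gene indices; single genes are not reported as groups.
    """
    groups, current = [], [0]
    for i in range(1, len(profiles)):
        if pearson(profiles[i - 1], profiles[i]) > threshold:
            current.append(i)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [i]
    if len(current) > 1:
        groups.append(current)
    return groups

# Hypothetical profiles for five genes in chromosomal order.
profiles = [
    [1, 2, 3, 4],
    [2, 4, 6, 9],
    [4, 3, 2, 1],
    [1, 3, 2, 5],
    [2, 5, 4, 9],
]
print(coexpressed_groups(profiles))
```

On these made-up profiles, genes 0 and 1 form one group and genes 3 and 4 another, while gene 2 is anti-correlated with both of its neighbours and stays ungrouped.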

16.
MOTIVATION: Bi-clustering is an important approach in microarray data analysis. The underlying bases for using bi-clustering in the analysis of gene expression data are (1) similar genes may exhibit similar behaviors only under a subset of conditions, not all conditions, and (2) genes may participate in more than one function, resulting in one regulation pattern in one context and a different pattern in another. Using bi-clustering algorithms, one can obtain sets of genes that are co-regulated under subsets of conditions. RESULTS: We develop a polynomial time algorithm to find an optimal bi-cluster with the maximum similarity score. To our knowledge, this is the first formulation for bi-cluster problems that admits a polynomial time algorithm for optimal solutions. The algorithm works for a special case, where the bi-clusters are approximately squares. We then extend the algorithm to handle various kinds of other cases. Experiments on simulation data and real data show that the new algorithms outperform most of the existing methods in many cases. Our new algorithms have the following advantages: (1) no discretization procedure is required, (2) they perform well for overlapping bi-clusters and (3) they work well for additive bi-clusters. AVAILABILITY: The software is available at http://www.cs.cityu.edu.hk/~liuxw/msbe/help.html.

17.
Expression profiling of time-series experiments is widely used to study biological systems. However, determining the quality of the resulting profiles remains a fundamental problem. Because of inadequate sampling rates, the effect of arrest-and-release methods and loss of synchronization, the measurements obtained from a series of time points may not accurately represent the underlying expression profiles. To solve this, we propose an approach that combines time-series and static (average) expression data analysis: for each gene, we determine whether its temporal expression profile can be reconciled with its static expression levels. We show that by combining synchronized and unsynchronized human cell cycle data, we can identify many cycling genes that are missed when using only time-series data. The algorithm also correctly distinguishes cycling genes from genes that specifically react to an environmental stimulus even if they share similar temporal expression profiles. Experimental validation of these results shows the utility of this analytical approach for determining the accuracy of gene expression patterns.

18.
Both autonomously functioning thyroid nodules (AFTNs) and cold thyroid nodules (CTNs) are characterized by increased proliferation; however, they have opposite functional activities. Therefore, with the aim of further understanding the distinct molecular pathology of each entity and of discovering common mechanisms, such as those leading to increased proliferation in both AFTNs and CTNs, we compared gene expression of AFTNs and CTNs with in vitro model systems (TSH-stimulated and ras-transfected primary cultures (PC)) whose gene expression patterns can be attributed to specific molecular alterations. Since combinations of co-regulated genes are more likely to reveal molecular mechanisms, we used a procedure which groups co-regulated genes within "gene sets". We found a co-regulated gene set in the AFTNs that overlaps with differential expression in TSH-stimulated PCs but not in CTNs or ras-transfected PCs. In addition to thyroid peroxidase and sialyltransferase 1, this set of co-regulated genes comprises metallothioneins and the G-protein-coupled receptor 56. Although their role in the thyroid is unknown so far, their appearance in one group indicates a functional relevance in TSH-TSH receptor-stimulated mechanisms. Furthermore, we identified down-regulated gene sets with concordant expression patterns in AFTNs, CTNs and ras-transfected PCs. However, these expression patterns are not of relevance in the TSH-stimulated PCs. These findings suggest that TSH-stimulated PCs can be used as a model of increased thyroid function (AFTNs), whereas the ras-transfected PCs better reflect the increased proliferation of both AFTNs and CTNs.

19.

Background  

A common observation in the analysis of gene expression data is that many genes display similarity in their expression patterns and therefore appear to be co-regulated. However, the variation associated with microarray data and the complexity of the experimental designs make the acquisition of co-expressed genes a challenge. We developed a novel method for Extracting microarray gene expression Patterns and Identifying co-expressed Genes, designated as EPIG. The approach utilizes the underlying structure of gene expression data to extract patterns and identify co-expressed genes that are responsive to experimental conditions.

20.
MOTIVATION: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. RESULTS: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms. AVAILABILITY: http://www.cs.tau.ac.il/~rshamir/expander/expander.html
