1.

CLICK and EXPANDER: a system for clustering and visualizing gene expression data (total citations: 9; self-citations: 0; citations by others: 9)

Sharan R Maron-Katz A Shamir R 《Bioinformatics (Oxford, England)》2003,19(14):1787-1799

MOTIVATION: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. RESULTS: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms. AVAILABILITY: http://www.cs.tau.ac.il/~rshamir/expander/expander.html

2.

Fuzzy J-Means and VNS methods for clustering genes from microarray data (total citations: 4; self-citations: 0; citations by others: 4)

Belacel N Cuperlović-Culf M Laflamme M Ouellette R 《Bioinformatics (Oxford, England)》2004,20(11):1690-1701

MOTIVATION: In the interpretation of gene expression data from a group of microarray experiments that include samples from either different patients or conditions, special consideration must be given to the pleiotropic and epistatic roles of genes, as observed in the variation of gene coexpression patterns. Crisp clustering methods assign each gene to one cluster, thereby omitting information about the multiple roles of genes. RESULTS: Here, we present the application of a local search heuristic, Fuzzy J-Means, embedded into the variable neighborhood search metaheuristic for the clustering of microarray gene expression data. We show that for all the datasets studied this algorithm outperforms the standard Fuzzy C-Means heuristic. Different methods for the utilization of cluster membership information in determining gene coregulation are presented. The clustering and data analyses were performed on simulated datasets as well as experimental cDNA microarray data for breast cancer and human blood from the Stanford Microarray Database. AVAILABILITY: The source code of the clustering software (C programming language) is freely available from Nabil.Belacel@nrc-cnrc.gc.ca
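
The difference between crisp and fuzzy assignment described in this abstract can be illustrated with the standard Fuzzy C-Means membership formula, the baseline the authors compare against. This is only a minimal sketch and not the authors' Fuzzy J-Means or VNS code; the function name `fuzzy_memberships`, the fuzzifier value `m = 2`, and the toy centroids are assumptions made for illustration.

```python
import math

def fuzzy_memberships(profile, centroids, m=2.0):
    """Fuzzy C-Means membership of one expression profile in each cluster.

    u_k = 1 / sum_j (d_k / d_j) ** (2 / (m - 1)), where d_k is the Euclidean
    distance from the profile to centroid k. The memberships sum to 1.
    """
    dists = [math.dist(profile, c) for c in centroids]
    # A profile lying exactly on one or more centroids belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]

# Hypothetical centroids for two clusters measured over three conditions.
centroids = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
gene = [1.0, 1.0, 1.0]  # equidistant from both centroids
print(fuzzy_memberships(gene, centroids))
```

A crisp method would have to place the equidistant gene in exactly one cluster, whereas the fuzzy memberships keep the information that it relates equally to both.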

3.

Clustering of time-course gene expression data using a mixed-effects model with B-splines (total citations: 2; self-citations: 0; citations by others: 2)

Luan Y Li H 《Bioinformatics (Oxford, England)》2003,19(4):474-482

4.

Combining sequence and time series expression data to learn transcriptional modules

Kundaje A Middendorf M Gao F Wiggins C Leslie C 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(3):194-202

5.

A Bayesian regression approach to the inference of regulatory networks from gene expression data (total citations: 3; self-citations: 0; citations by others: 3)

Rogers S Girolami M 《Bioinformatics (Oxford, England)》2005,21(14):3131-3137

MOTIVATION: There is currently much interest in reverse-engineering regulatory relationships between genes from microarray expression data. We propose a new algorithmic method for inferring such interactions between genes using data from gene knockout experiments. The algorithm we use is the Sparse Bayesian regression algorithm of Tipping and Faul. This method is highly suited to this problem as it does not require the data to be discretized, overcomes the need for an explicit topology search and, most importantly, requires no heuristic thresholding of the discovered connections. RESULTS: Using simulated expression data, we are able to show that this algorithm outperforms a recently published correlation-based approach. Crucially, it does this without the need to set any ad hoc threshold on possible connections.

6.

Clustering and re-clustering for pattern discovery in gene expression data

Ma PC Chan KC Chiu DK 《Journal of bioinformatics and computational biology》2005,3(2):281-301

7.

Diametrical clustering for identifying anti-correlated gene clusters (total citations: 6; self-citations: 0; citations by others: 6)

Dhillon IS Marcotte EM Roshan U 《Bioinformatics (Oxford, England)》2003,19(13):1612-1619

8.

Spearman Correlation Identifies Statistically Significant Gene Expression Clusters in Spinal Cord Development and Injury (total citations: 1; self-citations: 0; citations by others: 1)

Kotlyar M Fuhrman S Ableson A Somogyi R 《Neurochemical research》2002,27(10):1133-1140

An important problem in the analysis of large-scale gene expression data is the validation of gene expression clusters. By examining the temporal expression patterns of 74 genes expressed in rat spinal cord under three different experimental conditions, we have found evidence that some genes cluster together under multiple conditions. Using RT-PCR data from spinal cord development and two sets of microarray data from spinal injury, we applied Spearman correlation to identify clusters and to assign P values to pairs of genes with highly similar temporal expression patterns. We found that 15% of genes occurred in statistically significant pairs in all three experimental conditions, providing both statistical and experimental support for the idea that genes that cluster together are co-regulated. In addition, we demonstrated that DNA microarray and RT-PCR data are comparable, and can be combined to confirm gene expression relationships.
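
The idea of scoring a gene pair by Spearman correlation and attaching a P value can be sketched in a few lines. The abstract does not say how the P values were computed, so the exact permutation test below is an assumption, as are the function names and the two toy time courses; it is an illustration rather than the authors' procedure.

```python
from itertools import permutations

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def permutation_p_value(x, y):
    """One-sided exact P value: share of orderings of y with rho >= observed.

    Enumerates all n! orderings, so it is only practical for short time courses.
    """
    observed = spearman(x, y)
    rx, ry = average_ranks(x), average_ranks(y)
    perms = list(permutations(ry))
    hits = sum(1 for p in perms if pearson(rx, list(p)) >= observed - 1e-12)
    return hits / len(perms)

# Hypothetical expression levels of two genes at five time points.
gene_a = [0.1, 0.4, 0.9, 1.5, 2.2]
gene_b = [1.0, 1.2, 2.5, 2.4, 3.9]
print(spearman(gene_a, gene_b), permutation_p_value(gene_a, gene_b))
```

Because the test works on ranks, it depends only on the ordering of the measurements within each series, which is one reason a rank-based measure is convenient when data from different platforms such as RT-PCR and microarrays are compared.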

9.

Quantitative trait associated microarray gene expression data analysis

Qu Y Xu S 《Molecular biology and evolution》2006,23(8):1558-1573

Selection on phenotypes may cause genetic change. To understand the relationship between phenotype and gene expression from an evolutionary viewpoint, it is important to study the concordance between gene expression and profiles of phenotypes. In this study, we use a novel method of clustering to identify genes whose expression profiles are related to a quantitative phenotype. Cluster analysis of gene expression data aims at classifying genes into several different groups based on the similarity of their expression profiles across multiple conditions. The hope is that genes that are classified into the same clusters may share underlying regulatory elements or may be a part of the same metabolic pathways. Current methods for examining the association between phenotype and gene expression are limited to linear association measured by the correlation between individual gene expression values and phenotype. Genes may be associated with the phenotype in a nonlinear fashion. In addition, groups of genes that share a particular pattern in their relationship to phenotype may be of evolutionary interest. In this study, we develop a method to group genes based on orthogonal polynomials under a multivariate Gaussian mixture model. The effect of each expressed gene on the phenotype is partitioned into a cluster mean and a random deviation from the mean. Genes can also be clustered based on a time series. Parameters are estimated using the expectation-maximization algorithm and implemented in SAS. The method is verified with simulated data and demonstrated with experimental data from 2 studies, one clusters with respect to severity of disease in Alzheimer's patients and another clusters data for a rat fracture healing study over time. We find significant evidence of nonlinear associations in both studies and successfully describe these patterns with our method. 
We give detailed instructions and provide a working program that allows others to directly implement this method in their own analyses.

10.

Adaptive quality-based clustering of gene expression profiles (total citations: 17; self-citations: 0; citations by others: 17)

De Smet F Mathys J Marchal K Thijs G De Moor B Moreau Y 《Bioinformatics (Oxford, England)》2002,18(5):735-746

MOTIVATION: Microarray experiments generate a considerable amount of data, which, analyzed properly, help us gain a huge amount of biologically relevant information about the global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in data analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful to make sense of such data. These classical algorithms, though useful, suffer from several drawbacks (e.g. they require the predefinition of arbitrary parameters like the number of clusters; they force every gene into a cluster despite a low correlation with other cluster members). In the following we describe a novel adaptive quality-based clustering algorithm that tackles some of these drawbacks. RESULTS: We propose a heuristic iterative two-step algorithm: First, we find in the high-dimensional representation of the data a sphere where the "density" of expression profiles is locally maximal (based on a preliminary estimate of the radius of the cluster-quality-based approach). In a second step, we derive an optimal radius of the cluster (adaptive approach) so that only the significantly coexpressed genes are included in the cluster. This estimation is achieved by fitting a model to the data using an EM-algorithm. By inferring the radius from the data itself, the biologist is freed from finding an optimal value for this radius by trial-and-error. The computational complexity of this method is approximately linear in the number of gene expression profiles in the data set. Finally, our method is successfully validated using existing data sets. AVAILABILITY: http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html

11.

Bayesian hierarchical modeling for time course microarray experiments

Chi YY Ibrahim JG Bissahoyo A Threadgill DW 《Biometrics》2007,63(2):496-504

Time course microarray experiments designed to characterize the dynamic regulation of gene expression in biological systems are becoming increasingly important. One critical issue that arises when examining time course microarray data is the identification of genes that show different temporal expression patterns among biological conditions. Here we propose a Bayesian hierarchical model to incorporate important experimental factors and to account for correlated gene expression measurements over time and over different genes. A new gene selection algorithm is also presented with the model to simultaneously identify genes that show changes in expression among biological conditions, in response to time and other experimental factors of interest. The algorithm performs well in terms of the false positive and false negative rates in simulation studies. The methodology is applied to a mouse model time course experiment to correlate temporal changes in azoxymethane-induced gene expression profiles with colorectal cancer susceptibility.

12.

Clustering methods for microarray gene expression data (total citations: 1; self-citations: 0; citations by others: 1)

Belacel N Wang Q Cuperlovic-Culf M 《Omics : a journal of integrative biology》2006,10(4):507-531

Within the field of genomics, microarray technologies have become a powerful technique for simultaneously monitoring the expression patterns of thousands of genes under different sets of conditions. A main task now is to propose analytical methods to identify groups of genes that manifest similar expression patterns and are activated by similar conditions. The corresponding analysis problem is to cluster multi-condition gene expression data. The purpose of this paper is to present a general view of clustering techniques used in microarray gene expression data analysis.

13.

A data-driven clustering method for time course gene expression data (total citations: 1; self-citations: 0; citations by others: 1)

Ma P Castillo-Davis CI Zhong W Liu JS 《Nucleic acids research》2006,34(4):1261-1269

Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that allows direct discovery of related patterns of gene expression and their underlying functions (curves) from data without a priori specification of either cluster number or functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account natural differences in gene expression within a cluster of similarly expressed genes, the effects of experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a 'mean curve' construct and its associated confidence bands. We apply this method to gene expression data over the life-cycle of Drosophila melanogaster and Caenorhabditis elegans to discover 17 and 16 unique patterns of gene expression in each species, respectively. New and previously described expression patterns in both species are discovered, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSClust, is freely available (http://genemerge.bioteam.net/SSClust.html).

14.

Gene expression data analysis using a novel approach to biclustering combining discrete and continuous data (total citations: 1; self-citations: 0; citations by others: 1)

Christinat Y Wachmann B Zhang L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(4):583-593

Many different methods exist for pattern detection in gene expression data. In contrast to classical methods, biclustering has the ability to cluster a group of genes together with a group of conditions (replicates, set of patients or drug compounds). However, since the problem is NP-complex, most algorithms use heuristic search functions and therefore might converge towards local maxima. By using the results of biclustering on discrete data as a starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values are growing across the bicluster's columns and vice-versa. Results have been generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset and random data. Results on the yeast genome showed that 89% of the one hundred biggest non-overlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and ISA demonstrated a better efficiency when using gene and condition orders. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters.
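
The ordering property this abstract describes (rows and columns that can be ordered so that values grow across the bicluster) can be checked for a candidate bicluster with a short routine. The sketch below only tests a given set of genes and conditions; it does not search for biclusters and is not the authors' algorithm. Reading "vice-versa" as the same test applied to the transposed submatrix is an interpretation, and all names and the toy matrix are hypothetical.

```python
def common_increasing_order(rows):
    """Return a column order under which every row is strictly increasing, or None.

    If such an order exists it must equal the sorted order of the first row,
    so it is enough to derive it there and verify it on all rows.
    """
    if not rows:
        return None
    order = sorted(range(len(rows[0])), key=lambda j: rows[0][j])
    for row in rows:
        if any(row[a] >= row[b] for a, b in zip(order, order[1:])):
            return None
    return order

def is_order_preserving_bicluster(matrix, gene_idx, cond_idx):
    """Check the ordering property on rows and, transposed, on columns."""
    sub = [[matrix[g][c] for c in cond_idx] for g in gene_idx]
    transposed = [list(col) for col in zip(*sub)]
    return (common_increasing_order(sub) is not None
            and common_increasing_order(transposed) is not None)

# Hypothetical expression matrix: 3 genes (rows) x 4 conditions (columns).
matrix = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 6.0, 4.0, 0.5],
    [3.0, 7.0, 5.0, 2.0],
]
print(is_order_preserving_bicluster(matrix, [0, 1, 2], [0, 1, 2]))
print(is_order_preserving_bicluster(matrix, [0, 1, 2], [0, 1, 2, 3]))
```

Verifying a candidate is cheap; the hard part, which the abstract addresses with a heuristic search seeded from discrete data, is finding large gene and condition subsets that pass the check.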

15.

Context-specific infinite mixtures for clustering gene expression profiles across diverse microarray dataset

Liu X Sivaganesan S Yeung KY Guo J Bumgarner RE Medvedovic M 《Bioinformatics (Oxford, England)》2006,22(14):1737-1744

MOTIVATION: Identifying groups of co-regulated genes by monitoring their expression over various experimental conditions is complicated by the fact that such co-regulation is condition-specific. Ignoring the context-specific nature of co-regulation significantly reduces the ability of clustering procedures to detect co-expressed genes due to additional 'noise' introduced by non-informative measurements. RESULTS: We have developed a novel Bayesian hierarchical model and corresponding computational algorithms for clustering gene expression profiles across diverse experimental conditions and studies that accounts for context-specificity of gene expression patterns. The model is based on the Bayesian infinite mixtures framework and does not require a priori specification of the number of clusters. We demonstrate that explicit modeling of context-specificity results in increased accuracy of the cluster analysis by examining the specificity and sensitivity of clusters in microarray data. We also demonstrate that probabilities of co-expression derived from the posterior distribution of clusterings are valid estimates of statistical significance of created clusters. AVAILABILITY: The open-source package gimm is available at http://eh3.uc.edu/gimm.

16.

A skellam model to identify differential patterns of gene expression induced by environmental signals

Libo Jiang Ke Mao Rongling Wu 《BMC genomics》2014,15(1)

17.

Learning “graph-mer” Motifs that Predict Gene Expression Trajectories in Development

Xuejing Li Casandra Panea Chris H. Wiggins Valerie Reinke Christina Leslie 《PLoS computational biology》2010,6(4)

18.

Validating clustering for gene expression data (total citations: 24; self-citations: 0; citations by others: 24)

Yeung KY Haynor DR Ruzzo WL 《Bioinformatics (Oxford, England)》2001,17(4):309-318

MOTIVATION: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. RESULTS: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
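
The leave-one-condition-out procedure described in this abstract can be sketched as follows. The abstract does not give the exact measure of variation, so the root-mean-square deviation from the cluster mean used here is an assumption, as are the function names, the toy clusterers, and the data; this is an illustration of the idea rather than the authors' implementation.

```python
import math

def left_out_variation(data, cluster_fn, left_out):
    """Cluster genes without condition `left_out`, then measure spread in it.

    data: one row per gene, one column per condition.
    cluster_fn: takes the reduced rows and returns one cluster label per gene.
    Returns the RMS deviation of the left-out values from their cluster means.
    """
    reduced = [[v for j, v in enumerate(row) if j != left_out] for row in data]
    labels = cluster_fn(reduced)
    groups = {}
    for label, row in zip(labels, data):
        groups.setdefault(label, []).append(row[left_out])
    squared = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        squared += sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared / len(data))

def aggregate_variation(data, cluster_fn):
    """Sum the left-out variation over every condition; lower is better."""
    return sum(left_out_variation(data, cluster_fn, e) for e in range(len(data[0])))

def sign_clusterer(rows):
    """Toy stand-in clusterer: split genes by the sign of their mean level."""
    return [0 if sum(r) / len(r) >= 0 else 1 for r in rows]

def single_cluster(rows):
    """Uninformative baseline: put every gene in the same cluster."""
    return [0] * len(rows)

# Hypothetical data: two up-regulated and two down-regulated genes.
data = [
    [1.0, 1.2, 0.9],
    [1.1, 0.8, 1.0],
    [-1.0, -1.1, -0.9],
    [-0.9, -1.2, -1.1],
]
print(aggregate_variation(data, sign_clusterer))
print(aggregate_variation(data, single_cluster))
```

On this toy data the informative clusterer yields a much lower score than the single-cluster baseline, which is the behaviour the abstract expects of meaningful clusters. Any clustering routine can be passed as `cluster_fn`, so several algorithms can be compared on the same data.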

19.

Evaluation and optimization of clustering in gene expression data analysis

Famili AF Liu G Liu Z 《Bioinformatics (Oxford, England)》2004,20(10):1535-1545

MOTIVATION: A measurement of cluster quality is needed to choose potential clusters of genes that contain biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles have to be analyzed and proper clusters of genes need to be identified for further analysis, such as the search for meaningful patterns, identification of gene functions or gene response analysis. RESULTS: We propose a new cluster quality method, called stability, by which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate this method and demonstrate its performance using four independent, real gene expression and three simulated datasets. We demonstrate that our method outperforms other techniques listed in the literature. The method has applications in evaluating clustering validity as well as identifying stable clusters. AVAILABILITY: Please contact the first author.

20.

Supervised cluster analysis for microarray data based on multivariate Gaussian mixture (total citations: 7; self-citations: 0; citations by others: 7)

Qu Y Xu S 《Bioinformatics (Oxford, England)》2004,20(12):1905-1913

MOTIVATION: Grouping genes having similar expression patterns is called gene clustering, which has been proved to be a useful tool for extracting underlying biological information of gene expression data. Many clustering procedures have shown success in microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are alternative clustering algorithms, which are based on the assumption that the whole set of microarray data is a finite mixture of a certain type of distributions with different parameters. Application of the model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrated the use of the model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The software of SVMs is available at the website http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi