
1.

Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data (cited by: 5; self-citations: 0; citations by others: 5)

McShane LM Radmacher MD Freidlin B Yu R Li MC Simon R 《Bioinformatics (Oxford, England)》2002,18(11):1462-1469

MOTIVATION: Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Often interest lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example identification of new tumor taxonomies. Cluster analysis techniques such as hierarchical clustering and self-organizing maps have frequently been used for investigating structure in microarray data. However, clustering algorithms always detect clusters, even on random data, and it is easy to misinterpret the results without some objective measure of the reproducibility of the clusters. RESULTS: We present statistical methods for testing for overall clustering of gene expression profiles, and we define easily interpretable measures of cluster-specific reproducibility that facilitate understanding of the clustering structure. We apply these methods to elucidate structure in cDNA microarray gene expression profiles obtained on melanoma tumors and on prostate specimens.

2.

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data

Hsu AL Tang SL Halgamuge SK 《Bioinformatics (Oxford, England)》2003,19(16):2131-2140

MOTIVATION: Current Self-Organizing Map (SOM) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide a unique partitioning of the data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates the merits of hierarchical clustering with the robustness against noise known from self-organizing approaches. RESULTS: The proposed algorithm, applied to DNA microarray data sets of two types of cancer, has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm, tested on leukemia microarray data containing three leukemia types, was able to determine three major clusters and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, and it is therefore labelled an uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data automatically derived two clusters, which is consistent with the number of classes in the data (cancerous and normal). AVAILABILITY: Java software implementing the dynamic SOM tree algorithm is available upon request for academic use. SUPPLEMENTARY INFORMATION: A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf

3.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

Grotkjaer T Winther O Regenberg B Nielsen J Hansen LK 《Bioinformatics (Oxford, England)》2006,22(1):58-67


4.

Text mining biomedical literature for discovering gene-to-gene relationships: a comparative study of algorithms

Liu Y Navathe SB Civera J Dasigi V Ram A Ciliax BJ Dingledine R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(1):62-76

Partitioning closely related genes into clusters has become an important element of practically all statistical analyses of microarray data. A number of computer algorithms have been developed for this task. Although these algorithms have demonstrated their usefulness for gene clustering, some basic problems remain. This paper describes our work on extracting functional keywords from MEDLINE for a set of genes that are isolated for further study from microarray experiments based on their differential expression patterns. The sharing of functional keywords among genes is used as a basis for clustering in a new approach called BEA-PARTITION. Functional keywords associated with genes were extracted from MEDLINE abstracts. We modified the Bond Energy Algorithm (BEA), which is widely accepted in psychology and database design but is virtually unknown in bioinformatics, to cluster genes by functional keyword associations. The results showed that BEA-PARTITION and the hierarchical clustering algorithm outperformed k-means clustering and the self-organizing map by correctly assigning 25 of 26 genes in a test set of four known gene groups. To evaluate the effectiveness of BEA-PARTITION for clustering genes identified by microarray profiles, 44 yeast genes that are differentially expressed during the cell cycle and have been widely studied in the literature were used as a second test set. Using established measures of cluster quality, the results produced by BEA-PARTITION had higher purity, lower entropy, and higher mutual information than those produced by k-means and the self-organizing map. Whereas BEA-PARTITION and hierarchical clustering produced clusters of similar quality, BEA-PARTITION provides clearer cluster boundaries than hierarchical clustering. BEA-PARTITION is simple to implement and provides a powerful approach to clustering genes or to any clustering problem where starting matrices are available from experimental observations.
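The abstract names purity and entropy as cluster-quality measures without giving formulas. The sketch below uses one common textbook definition of each, which may differ in detail from the paper's; the function names and the toy data are illustrative assumptions.

```python
from collections import Counter
from math import log2


def _group_labels(clusters, labels):
    """Collect the known group labels of the items assigned to each cluster."""
    by_cluster = {}
    for cluster_id, label in zip(clusters, labels):
        by_cluster.setdefault(cluster_id, []).append(label)
    return by_cluster


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster."""
    by_cluster = _group_labels(clusters, labels)
    majority = sum(max(Counter(ys).values()) for ys in by_cluster.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted mean of per-cluster label entropy in bits (0 = pure)."""
    by_cluster = _group_labels(clusters, labels)
    total = len(labels)
    result = 0.0
    for ys in by_cluster.values():
        n = len(ys)
        h = -sum((k / n) * log2(k / n) for k in Counter(ys).values())
        result += (n / total) * h
    return result


# Toy example (made-up data): six genes, two clusters, two known groups.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(round(purity(clusters, labels), 3))   # 0.833
print(round(entropy(clusters, labels), 3))  # 0.459
```

Higher purity and lower entropy both indicate clusters that agree more closely with the known gene groups.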

5.

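A clustering analysis method for gene functional modules associated with experimental conditions (cited by: 2; self-citations: 0; citations by others: 2)

Yu Hui, Guo Zheng, Li Xia, Tu Kang 《Acta Biophysica Sinica》2004,20(3):225-232

To address the modular organization of gene function within the cell, we define two concepts, the "gene functional module" and the "characteristic functional module", and on this basis propose a "clustering algorithm for gene functional modules associated with experimental conditions". The algorithm combines knowledge of gene function with gene expression profile information to cluster genes into gene functional modules associated with the experimental conditions. Data noise of gradually increasing level is added to the gene expression profiles, and the most stable gene functional modules, i.e. the characteristic functional modules, are determined according to each module's resistance to the noise. The noise-addition experiments show that, within the range of noise that may occur with gene chip technology, the algorithm is more robust to noise than hierarchical clustering and fuzzy C-means clustering. Applying the module clustering algorithm to the NCI60 dataset identified 8 characteristic functional modules highly associated with the experimental conditions.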

6.

Binary tree-structured vector quantization approach to clustering and visualizing microarray data

Sultan M Wigle DA Cumbaa CA Maziarz M Glasgow J Tsao MS Jurisica I 《Bioinformatics (Oxford, England)》2002,18(Z1):S111-S119

MOTIVATION: With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified. RESULTS: Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.

7.

Fast iterative gene clustering based on information theoretic criteria for selecting the cluster structure.

Ciprian Doru Giurcăneanu, Ioan Tăbuş, Jaakko Astola, Juha Ollila, Mauno Vihinen 《Journal of computational biology》2004,11(4):660-682


8.

Using repeated measurements to validate hierarchical gene clusters

Bréhélin L Gascuel O Martin O 《Bioinformatics (Oxford, England)》2008,24(5):682-688

MOTIVATION: Hierarchical clustering is a common approach to studying protein and gene expression data. This unsupervised technique is used to find clusters of genes or proteins that are expressed in a coordinated manner across a set of conditions. Because of both biological and technical variability, experimental repetitions are generally performed. In this work, we propose an approach to evaluate the stability of clusters derived from hierarchical clustering by taking repeated measurements into account. RESULTS: The method is based on the bootstrap technique, which is used to obtain pseudo-hierarchies of genes from resampled datasets. Using a fast dynamic programming algorithm, we compare the original hierarchy to the pseudo-hierarchies and assess the stability of the original gene clusters. A shuffling procedure can then be used to assess the significance of the cluster stabilities. Our approach is illustrated on simulated data and on two microarray datasets. Compared to the standard hierarchical clustering methodology, it makes it possible to distinguish dubious clusters from stable ones, and thus avoids misleading interpretations. AVAILABILITY: The programs were developed in the C and R languages.

9.

Hierarchical Dirichlet process model for gene expression clustering

Liming Wang Xiaodong Wang 《EURASIP Journal on Bioinformatics and Systems Biology》2013,2013(1):5

Clustering is an important data processing tool for interpreting microarray data and genomic network inference. In this article, we propose a clustering algorithm based on the hierarchical Dirichlet process (HDP). HDP clustering introduces a hierarchical structure in the statistical model which captures the hierarchical features prevalent in biological data such as gene expression data. We develop a Gibbs sampling algorithm based on the Chinese restaurant metaphor for HDP clustering. We apply the proposed HDP algorithm to both regulatory network segmentation and gene expression clustering. The HDP algorithm is shown to outperform several popular clustering algorithms by revealing the underlying hierarchical structure of the data. For the yeast cell cycle data, we compare the HDP result to the standard result and show that the HDP algorithm provides more information and reduces unnecessary clustering fragments.

10.

Clustering approaches to identifying gene expression patterns from DNA microarray data

Do JH Choi DK 《Molecules and cells》2008,25(2):279-288


11.

Clustering by soft-constraint affinity propagation: applications to gene-expression data (cited by: 4; self-citations: 0; citations by others: 4)

Leone M Sumedha Weigt M 《Bioinformatics (Oxford, England)》2007,23(20):2708-2715

MOTIVATION: Similarity-measure-based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP), based on message-passing techniques, was proposed by Frey and Dueck (2007a). In AP, each cluster is identified by a common exemplar to which all other data points of the same cluster refer, and exemplars have to refer to themselves. Despite its proven power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, e.g. in analyzing gene expression data. RESULTS: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows interpolation between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) is more informative and accurate, and leads to more stable clustering. Even though a new a priori free parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Furthermore, it allows the extraction of sparse gene expression signatures for each cluster.

12.

A graph-theoretic modeling on GO space for biological interpretation of gene clusters (cited by: 3; self-citations: 0; citations by others: 3)

Lee SG Hur JU Kim YS 《Bioinformatics (Oxford, England)》2004,20(3):381-388


13.

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings

Torrente A Kapushesky M Brazma A 《Bioinformatics (Oxford, England)》2005,21(21):3993-3999

MOTIVATION: Clustering is one of the most widely used methods in unsupervised gene expression data analysis. The use of different clustering algorithms or different parameters often produces rather different results on the same data. Biological interpretation of multiple clustering results requires understanding how different clusters relate to each other. It is particularly non-trivial to compare the results of a hierarchical and a flat, e.g. k-means, clustering. RESULTS: We present a new method for comparing and visualizing relationships between different clustering results, either flat versus flat, or flat versus hierarchical. When comparing a flat clustering to a hierarchical clustering, the algorithm cuts different branches in the hierarchical tree at different levels to optimize the correspondence between the clusters. The optimization function is based on graph layout aesthetics or on mutual information. The clusters are displayed using a bipartite graph where the edges are weighted proportionally to the number of common elements in the respective clusters and the weighted number of crossings is minimized. The performance of the algorithm is tested using simulated and real gene expression data. The algorithm is implemented in the online gene expression data analysis tool Expression Profiler. AVAILABILITY: http://www.ebi.ac.uk/expressionprofiler
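The bipartite-graph edge weights described in this abstract, the number of elements two clusters share, can be sketched for two flat clusterings as follows. This covers only the weight computation, not the tree-cutting optimization or the crossing minimization; the function name, gene IDs and cluster labels are hypothetical.

```python
from collections import Counter


def overlap_edges(clustering_a, clustering_b):
    """Count shared elements for each pair (cluster in A, cluster in B).

    Each clustering maps an element ID to a cluster label. Elements that
    are missing from clustering_b are ignored.
    """
    edges = Counter()
    for item, cluster_a in clustering_a.items():
        if item in clustering_b:
            edges[(cluster_a, clustering_b[item])] += 1
    return dict(edges)


# Hypothetical example: a k-means result versus one cut of a hierarchical tree.
kmeans = {"g1": "K1", "g2": "K1", "g3": "K2", "g4": "K2"}
tree_cut = {"g1": "H1", "g2": "H1", "g3": "H1", "g4": "H2"}
print(overlap_edges(kmeans, tree_cut))
# {('K1', 'H1'): 2, ('K2', 'H1'): 1, ('K2', 'H2'): 1}
```

Each key is one edge of the bipartite graph and each value its weight; pairs of clusters with no common element produce no edge.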

14.

Gene ordering in partitive clustering using microarray expressions

Ray SS Bandyopadhyay S Pal SK 《Journal of biosciences》2007,32(5):1019-1025

A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering the genes using gene expression data into homogeneous groups was shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, there is, to the best of the authors' knowledge, no work addressing and evaluating the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, different gene ordering algorithms are applied to the whole data set, and the domain of partitive clustering is still unexplored with gene ordering approaches. A new hybrid method is proposed for ordering genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are hybridized individually with the self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the result quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds a comparable or sometimes superior biological gene order in less computation time than that obtained by optimal leaf ordering in a hierarchical clustering solution.

15.

Simultaneous gene clustering and subset selection for sample classification via MDL

Jörnsten R Yu B 《Bioinformatics (Oxford, England)》2003,19(9):1100-1109


16.

Searching the principal genes for neural differentiation of mouse ES cells by factorizing eigengenes of clusters

Kim HY Kim MJ Han JI Kim BK Lee YS Lee YS Kim JH 《Bio Systems》2009,95(1):17-25

A time-series microarray experiment is useful for studying changes in the expression of a large number of genes over time. Many methods for clustering genes using gene expression profiles have been suggested, but it is not easy to interpret the biological significance of the results or to use these methods for understanding the dynamics of gene regulatory systems. In this study, we introduce an algorithm for readjusting the boundaries of clusters by adopting the advantages of both k-means and singular value decomposition (SVD). In addition, we suggest a methodology for searching for the principal genes that can be the most crucial genes in the regulation of clusters. Using a time-series microarray dataset of mouse embryonic stem (ES) cells after induction of dopaminergic neural differentiation, we found 34 principal genes from 171 clusters having strong concentratedness in their expression patterns and distinct ranges of oscillatory phases. The biological significance of the principal genes examined in the literature supports the feasibility of our algorithms in that the hierarchy of clusters may lead to the manifestation of the phenotypes, e.g., the development of the nervous system.

17.

Detecting clusters of different geometrical shapes in microarray gene expression data (cited by: 1; self-citations: 0; citations by others: 1)

Kim DW Lee KH Lee D 《Bioinformatics (Oxford, England)》2005,21(9):1927-1934

MOTIVATION: Clustering has been used as a popular technique for finding groups of genes that show similar expression patterns under multiple experimental conditions. Many clustering methods have been proposed for clustering gene-expression data, including hierarchical clustering, k-means clustering and the self-organizing map (SOM). However, the conventional methods are limited in their ability to identify clusters of different shapes because they use a fixed distance norm when calculating the distance between genes. The fixed distance norm imposes a fixed geometrical shape on the clusters regardless of the actual data distribution. Thus, different distance norms are required for handling the different shapes of clusters. RESULTS: We present the Gustafson-Kessel (GK) clustering method for microarray gene-expression data. To detect clusters of different shapes in a dataset, we use an adaptive distance norm that is calculated from a fuzzy covariance matrix (F) of each cluster, in which the eigenstructure of F is used as an indicator of the shape of the cluster. Moreover, the GK method is less prone to falling into local minima than k-means and SOM because it makes decisions through the use of membership degrees of a gene to clusters. The algorithmic procedure is accomplished by the alternating optimization technique, which iteratively improves a sequence of sets of clusters until no further improvement is possible. To test the performance of the GK method, we applied the GK method and well-known conventional methods to three recently published yeast datasets, and compared the performance of each method using the Saccharomyces Genome Database annotations. The clustering results of the GK method are more significantly relevant to the biological annotations than those of the other methods, demonstrating its effectiveness and potential for clustering gene-expression data. AVAILABILITY: The software was developed in Java and can be executed on any platform that runs a JVM (Java Virtual Machine). It is available from the authors upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://dragon.kaist.ac.kr/gk.
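For orientation, the adaptive distance norm mentioned in this abstract is usually written as below in the standard Gustafson-Kessel formulation. The abstract itself gives no formulas, so the notation here is an assumption and may differ from the paper's.

```latex
% Assumed standard Gustafson-Kessel notation (not taken from the abstract):
% x_k: expression vector of gene k (n-dimensional), v_i: center of cluster i,
% u_{ik}: membership degree of gene k in cluster i, m > 1: fuzzifier,
% \rho_i: fixed cluster volume (commonly set to 1), N: number of genes.
\begin{align}
F_i &= \frac{\sum_{k=1}^{N} u_{ik}^{m}\,(x_k - v_i)(x_k - v_i)^{\top}}{\sum_{k=1}^{N} u_{ik}^{m}} \\
A_i &= \bigl[\rho_i \det(F_i)\bigr]^{1/n} F_i^{-1} \\
d_{ik}^{2} &= (x_k - v_i)^{\top} A_i \,(x_k - v_i)
\end{align}
```

If A_i is replaced by the identity matrix, the distance reduces to the squared Euclidean norm, which favors spherical clusters. The eigenvectors and eigenvalues of F_i give the orientation and relative extent of the ellipsoid fitted to cluster i.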

18.

A novel harmony search-K means hybrid algorithm for clustering gene expression data

KA Abdul Nazeer MP Sebastian SD Madhu Kumar 《Bioinformation》2013,9(2):84-88

Recent progress in bioinformatics research has led to the accumulation of huge quantities of biological data at various data sources. DNA microarray technology makes it possible to simultaneously analyze a large number of genes across different samples. Clustering of microarray data can reveal hidden gene expression patterns in large quantities of expression data, which in turn offers tremendous possibilities in functional genomics, comparative genomics, disease diagnosis and drug development. The k-means clustering algorithm is widely used for many practical applications, but the original k-means algorithm has several drawbacks. It is computationally expensive and generates locally optimal solutions based on the random choice of the initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means algorithm. A meta-heuristic optimization algorithm named harmony search helps find near-global optimal solutions by searching the entire solution space. Low clustering accuracy of the existing algorithms limits their use in many crucial applications of life sciences. In this paper we propose a novel Harmony Search-K means Hybrid (HSKH) algorithm for clustering gene expression data. Experimental results show that the proposed algorithm produces clusters with better accuracy in comparison with the existing algorithms.
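The dependence of k-means on its initial centroids, which this abstract cites as a drawback, can be seen in a minimal sketch of the plain algorithm on one-dimensional data. This is not the HSKH algorithm, and the harmony search step is not shown; the function name and the toy data are illustrative assumptions.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on 1-D data.

    Returns the sorted final centroids and the within-cluster sum of squares.
    """
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: (p - centroids[i]) ** 2)
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        updated = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return sorted(centroids), sse


# Made-up data with three obvious groups; only the starting centroids differ.
points = [0, 1, 10, 11, 20, 21]
print(kmeans_1d(points, [0, 1, 10]))   # ([0.0, 1.0, 15.5], 101.0)
print(kmeans_1d(points, [0, 10, 20]))  # ([0.5, 10.5, 20.5], 1.5)
```

Both runs converge, but the first stops in a poor local optimum with a much larger sum of squares. A global search over starting points is the kind of remedy the abstract attributes to harmony search.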

19.

Clustering of the SOM easily reveals distinct gene expression patterns: results of a reanalysis of lymphoma study

Junbai Wang, Jan Delabie, Hans Christian Aasheim, Erlend Smeland, Ola Myklebost 《BMC bioinformatics》2002,3(1):36

Background

A method to evaluate and analyze the massive data generated by series of microarray experiments is of utmost importance to reveal the hidden patterns of gene expression. Because of the complexity and the high dimensionality of microarray gene expression profiles, reducing the dimensionality of raw expression data and selecting the features needed for tasks such as classification of disease samples remain a challenge. To solve the problem we propose a two-level analysis. First, a self-organizing map (SOM) is used. The SOM is a vector quantization method that simplifies and reduces the dimensionality of the original measurements and visualizes individual tumor samples in a SOM component plane. Next, hierarchical clustering and K-means clustering are used to identify patterns of gene expression useful for classification of samples.

20.

Genesis: cluster analysis of microarray data (cited by: 26; self-citations: 0; citations by others: 26)

Sturn A Quackenbush J Trajanoski Z 《Bioinformatics (Oxford, England)》2002,18(1):207-208
