Similar literature
20 similar documents found
1.

Background  

Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point, and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

2.
MOTIVATION: Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied. RESULTS: We describe a simple and robust algorithm for the clustering of temporal gene expression profiles that is based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that serves to evaluate quantitatively the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to organize the search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle) for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.
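
The simulated annealing idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the cost function (within-cluster sum of squares), the geometric cooling schedule, and the names `within_cluster_cost` and `anneal_clusters` are all assumptions made for the example, and the number of clusters `k` is fixed rather than estimated.

```python
import math
import random


def within_cluster_cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster mean (assumed cost)."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(profiles, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        cost += sum((m[d] - mean[d]) ** 2 for m in members for d in range(dim))
    return cost


def anneal_clusters(profiles, k, steps=5000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over cluster labels; returns the best labelling seen."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    cost = within_cluster_cost(profiles, labels, k)
    best_labels, best_cost = labels[:], cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (a hypothetical schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(profiles))
        old = labels[i]
        labels[i] = rng.randrange(k)
        new_cost = within_cluster_cost(profiles, labels, k)
        delta = new_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        else:
            labels[i] = old
    return best_labels, best_cost
```

Calling `anneal_clusters(profiles, 2)` on a list of equal-length expression time series returns a label per gene and the cost of that labelling. Accepting occasional worse moves at high temperature is what lets the search escape poor local groupings.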

3.
Whole genomic DNA-DNA hybridization has been a cornerstone of bacterial species determination but is not widely used because it is not easily implemented. We have developed a method based on random genome fragments and DNA microarray technology that overcomes the disadvantages of whole-genome DNA-DNA hybridization. Reference genomes of four fluorescent Pseudomonas species were fragmented, and 60 to 96 genome fragments of approximately 1 kb from each strain were spotted on microarrays. Genomes from 12 well-characterized fluorescent Pseudomonas strains were labeled with Cy dyes and hybridized to the arrays. Cluster analysis of the hybridization profiles revealed taxonomic relationships between bacterial strains tested at species to strain level resolution, suggesting that this approach is useful for the identification of bacteria as well as determining the genetic distance among bacteria. Since arrays can contain thousands of DNA spots, a single array has the potential for broad identification capacity. In addition, the method does not require laborious cross-hybridizations and can provide an open database of hybridization profiles, avoiding the limitations of traditional DNA-DNA hybridization.

5.
MOTIVATION: Genome-wide gene expression measurements, as currently determined by microarray technology, can be represented mathematically as points in a high-dimensional gene expression space. Genes interact with each other in regulatory networks, restricting the cellular gene expression profiles to a certain manifold, or surface, in gene expression space. To obtain knowledge about this manifold, various dimensionality reduction methods and distance metrics are used. For data points distributed on curved manifolds, a sensible distance measure would be the geodesic distance along the manifold. In this work, we examine whether an approximate geodesic distance measure captures biological similarities better than the traditionally used Euclidean distance. RESULTS: We computed approximate geodesic distances, determined by the Isomap algorithm, for one set of lymphoma and one set of lung cancer microarray samples. Compared with the ordinary Euclidean distance metric, this distance measure produced more instructive, biologically relevant visualizations when multidimensional scaling was applied. This suggests that the Isomap algorithm is a promising tool for the interpretation of microarray data. Furthermore, the results demonstrate the benefit and importance of taking nonlinearities in gene expression data into account.
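
As a rough illustration of the approximate geodesic distance mentioned above, the sketch below follows the usual first two Isomap steps: build a k-nearest-neighbour graph, then take shortest-path lengths through it. It is a plain-Python sketch under assumptions (the function names and the choice of Floyd-Warshall are mine, and the final multidimensional scaling step is omitted), not the code used in the study.

```python
import math


def euclidean(a, b):
    """Ordinary straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approximate_geodesic(points, n_neighbors):
    """All-pairs shortest-path distances over a symmetric k-nearest-neighbour graph."""
    n = len(points)
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: euclidean(points[i], points[j]))
        for j in nearest[:n_neighbors]:
            d = euclidean(points[i], points[j])
            dist[i][j] = d
            dist[j][i] = d  # keep the graph undirected
    # Floyd-Warshall: relax every pair through each intermediate point.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist
```

For samples lying along a curve, the graph distance between the two ends follows the curve and is therefore longer than the straight-line Euclidean distance. Pairs left at infinity indicate a disconnected neighbour graph, in which case `n_neighbors` would need to be increased.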

6.
DNA microarrays may be used to identify microbial species present in environmental and clinical samples. However, automated tools for reliable species identification based on observed microarray hybridization patterns are lacking. We present an algorithm, E-Predict, for microarray-based species identification. E-Predict compares observed hybridization patterns with theoretical energy profiles representing different species. We demonstrate the application of the algorithm to viral detection in a set of clinical samples and discuss its relevance to other metagenomic applications.

7.
8.
9.
Determination of stromal signatures in breast carcinoma   (cited 2 times: 0 self-citations, 2 by others)
Many soft tissue tumors recapitulate features of normal connective tissue. We hypothesize that different types of fibroblastic tumors are representative of different populations of fibroblastic cells or different activation states of these cells. We examined two tumors with fibroblastic features, solitary fibrous tumor (SFT) and desmoid-type fibromatosis (DTF), by DNA microarray analysis and found that they have very different expression profiles, including significant differences in their patterns of expression of extracellular matrix genes and growth factors. Using immunohistochemistry and in situ hybridization on a tissue microarray, we found that genes specific for these two tumors have mutually specific expression in the stroma of nonneoplastic tissues. We defined a set of 786 gene spots whose pattern of expression distinguishes SFT from DTF. In an analysis of DNA microarray gene expression data from 295 previously published breast carcinomas, we found that expression of this gene set defined two groups of breast carcinomas with significant differences in overall survival. One of the groups had a favorable outcome and was defined by the expression of DTF genes. The other group of tumors had a poor prognosis and showed variable expression of genes enriched for SFT type. Our findings suggest that the host stromal response varies significantly among carcinomas and that gene expression patterns characteristic of soft tissue tumors can be used to discover new markers for normal connective tissue cells.

10.
Analysis of gene expression data using self-organizing maps.   (cited 29 times: 0 self-citations, 29 by others)
DNA microarray technologies, together with rapidly increasing genomic sequence information, are leading to an explosion in available gene expression data. Currently there is a great need for efficient methods to analyze and visualize these massive data sets. A self-organizing map (SOM) is an unsupervised neural network learning algorithm which has been successfully used for the analysis and organization of large data files. Here we have applied the SOM algorithm to analyze published yeast gene expression data and show that SOM is an excellent tool for the analysis and visualization of gene expression profiles.

11.
The detection of genes that show similar profiles under different experimental conditions is often an initial step in inferring the biological significance of such genes. Visualization tools are used to identify genes with similar profiles in microarray studies. Given the large number of genes recorded in microarray experiments, gene expression data are generally displayed on a low dimensional plot, based on linear methods. However, microarray data show nonlinearity, due to high-order terms of interaction between genes, so alternative approaches, such as kernel methods, may be more appropriate. We introduce a technique that combines kernel principal component analysis (KPCA) and Biplot to visualize gene expression profiles. Our approach relies on the singular value decomposition of the input matrix and incorporates an additional step that involves KPCA. The main properties of our method are the extraction of nonlinear features and the preservation of the input variables (genes) in the output display. We apply this algorithm to colon tumor, leukemia and lymphoma datasets. Our approach reveals the underlying structure of the gene expression profiles and provides a more intuitive understanding of the gene and sample association.

12.
13.
Cluster analysis of gene-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and constructing gene regulatory networks. The motivation for considering mutual information is its capacity to measure a general dependence among gene random variables. We propose a novel clustering strategy based on minimizing mutual information among gene clusters. Simulated annealing is employed to solve the optimization problem. Bootstrap techniques are employed to get more accurate estimates of mutual information when the data sample size is small. Moreover, we propose to combine the mutual information criterion and traditional distance criteria such as the Euclidean distance and the fuzzy membership metric in designing the clustering algorithm. The performances of the new clustering methods are compared with those of some existing methods, using both synthesized data and experimental data. It is seen that the clustering algorithm based on a combined metric of mutual information and fuzzy membership achieves the best performance. The supplemental material is available at www.gspsnap.tamu.edu/gspweb/zxb/glioma_zxb.
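
To make the mutual information criterion concrete, here is a minimal sketch of a plug-in estimate between two genes whose expression has already been discretized into levels. The discretization, the function name `mutual_information`, and the use of natural logarithms are assumptions for illustration; the bootstrap correction and the cluster-level criterion described in the abstract are not reproduced.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts.
        mi += (count / n) * math.log(count * n / (px[a] * py[b]))
    return mi
```

Two genes with identical discretized profiles give the entropy of that profile, while statistically independent profiles give zero. Unlike a correlation coefficient, the measure also responds to nonlinear dependence, which is the motivation stated in the abstract.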

14.
The comparison of gene expression profiles among DNA microarray experiments enables the identification of unknown relationships among experiments to uncover the underlying biological relationships. Despite the ongoing accumulation of data in public databases, detecting biological correlations among gene expression profiles from multiple laboratories on a large scale remains difficult. Here, we applied a module (sets of genes working in the same biological action)-based correlation analysis in combination with a network analysis to Arabidopsis data and developed a 'module-based correlation network' (MCN) which represents relationships among DNA microarray experiments on a large scale. We developed a Web-based data analysis tool, 'AtCAST' (Arabidopsis thaliana: DNA Microarray Correlation Analysis Tool), which enables browsing of an MCN or mining of users' microarray data by mapping the data into an MCN. AtCAST can help researchers to find novel connections among DNA microarray experiments, which in turn will help to build new hypotheses to uncover physiological mechanisms or gene functions in Arabidopsis.

15.
16.
Rooman M, Albert J, Dehouck Y, Haye A. PLoS ONE, 2011, 6(12): e27948
Available DNA microarray time series that record gene expression along the developmental stages of multicellular eukaryotes, or in unicellular organisms subject to external perturbations such as stress and diauxie, are analyzed. By pairwise comparison of the gene expression profiles on the basis of a translation-invariant and scale-invariant distance measure corresponding to least-rectangle regression, it is shown that peaks in the average distance values are noticeable and are localized around specific time points. These points systematically coincide with the transition points between developmental phases or just follow the external perturbations. This approach can thus be used to identify automatically, from microarray time series alone, the presence of external perturbations or the succession of developmental stages in arbitrary cell systems. Moreover, our results show that there is a striking similarity between the gene expression responses to these a priori very different phenomena. In contrast, the cell cycle does not involve a perturbation-like phase, but rather continuous gene expression remodeling. Similar analyses were conducted using three other standard distance measures, showing that the one we introduced was superior. Based on these findings, we set up an adapted clustering method that uses this distance measure and classifies the genes on the basis of their expression profiles within each developmental stage or between perturbation phases.

17.
18.
Prostatic intraepithelial neoplasia (PIN) is considered the pre-malignant stage of prostate carcinoma, but little is known of its initiation and evolution. The identification of genes associated with these precursors of prostate cancer may elucidate the pathways of the early oncogenesis of this disease. Previously, we have reported that activin, a member of the TGFbeta superfamily, acted as an inhibitory growth factor in prostate cancer. We used laser capture microdissection, mRNA-library amplification (RNA-PCR), subtractive hybridization, and complementary DNA microarray to examine gene expression profiles in activin-positive PIN, compared with activin-negative PIN. Subtractive hybridization showed that 28 genes were differentially expressed (13 and 15 genes were up- and down-regulated, respectively). Microarray analysis identified 29 and 56 more genes (4 times) up- and down-regulated, respectively, suggesting that DNA microarray is a more effective method for screening gene profiles. We have validated the known genes identified by both subtractive hybridization and microarray technologies, using Northern blot analysis in the mRNA libraries generated from cells microdissected from pathological slides. We have successfully shown that at least 13 genes are involved in activin-associated PIN. The evaluation of candidate genes that emerge from these experiments provides a rational approach to investigating those genes significant in the evolution from PIN to prostate carcinoma.

19.
We study the effects of different normalization and pre-clustering techniques on clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). These are important issues to be addressed. DNA microarray experiments are informative tools to elucidate gene regulatory networks. But in order for gene expression levels to be comparable across microarrays, normalization procedures have to be properly undertaken. The aim of pre-clustering is to use an adequate amount of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped together and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and symbolic representations such as {+, o, -}. As a performance metric, we look at the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates a significant effect of normalization and pre-clustering methods on the clustering results. Hence, the outcome of this study has significance in fine-tuning the EP_GOS_Clust clustering approach.
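
The abstract does not define the intra- and inter-cluster error sums, so the sketch below uses one common reading, stated here as an assumption: the intra-cluster sum is the total squared Euclidean distance from each gene to its cluster centroid, and the inter-cluster sum is the total squared distance between every pair of cluster centroids. The function names are hypothetical.

```python
def centroid(rows):
    """Component-wise mean of a list of equal-length expression vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_error_sums(clusters):
    """Return (intra, inter) error sums under the assumed definitions above."""
    centroids = [centroid(c) for c in clusters]
    # Intra: how tightly each gene sits around its own cluster centroid.
    intra = sum(sq_dist(row, mu) for c, mu in zip(clusters, centroids) for row in c)
    # Inter: how far apart the cluster centroids are from one another.
    inter = sum(sq_dist(centroids[i], centroids[j])
                for i in range(len(centroids))
                for j in range(i + 1, len(centroids)))
    return intra, inter
```

Under these definitions a good clustering tends to have a small intra-cluster sum and a large inter-cluster sum, which makes the pair a convenient way to compare normalization or pre-clustering choices on the same data.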

20.
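Objective: To develop a whole-genome DNA microarray for Streptococcus suis serotype 2 (SS2) and establish a technical platform for SS2 gene expression profiling. Methods: Using the SS2 whole-genome sequence, 2194 genes were selected; 2156 genes were amplified by PCR, and the products were purified and spotted to prepare the microarray. The microarray was then used for expression profiling, real-time quantitative PCR was used to validate the expression profiling results, and the reliability of the microarray was analyzed. Results: The microarray hybridization data correlated well with the real-time quantitative PCR validation, with a correlation coefficient of r = 0.87. Conclusion: A batch of SS2 whole-genome DNA microarrays was produced, and a DNA microarray-based expression profiling platform was established.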
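
A correlation coefficient such as the r = 0.87 reported between microarray and real-time quantitative PCR measurements is typically a Pearson coefficient over paired values for the same genes. The sketch below shows that computation under this assumption; the abstract does not say which coefficient was used, and the function name `pearson_r` is chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance and standard deviations, each left unnormalized since n cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Here `x` would hold the microarray expression ratios and `y` the corresponding qPCR ratios for the validated genes, usually on a log scale. Both sequences must vary, since a constant input makes the denominator zero.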
