Similar Literature
20 similar documents found (search time: 46 ms)
1.
MOTIVATION: Grouping genes that have similar expression patterns is called gene clustering, which has proved to be a useful tool for extracting the underlying biological information in gene expression data. Many clustering procedures have been applied successfully to microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are an alternative family of clustering algorithms, based on the assumption that the whole set of microarray data is a finite mixture of distributions of a certain type with different parameters. Application of model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of a model-based algorithm for supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. We showed that the supervised model-based algorithm is superior to the unsupervised method and to the support vector machine (SVM) method. AVAILABILITY: The program, written in the SAS language, implementing methods I-III in this report is available upon request. SVM software is available at http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi

2.
MOTIVATION: Microarrays have become a central tool in biological research. Their applications range from functional annotation to tissue classification and genetic network inference. A key step in the analysis of gene expression data is the identification of groups of genes that manifest similar expression patterns. This translates to the algorithmic problem of clustering genes based on their expression patterns. RESULTS: We present a novel clustering algorithm, called CLICK, and its applications to gene expression analysis. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups (kernels) of highly similar elements, which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clusters. We report on the application of CLICK to a variety of gene expression data sets. In all those applications it outperformed extant algorithms according to several common figures of merit. We also point out that CLICK can be successfully used for the identification of common regulatory motifs in the upstream regions of co-regulated genes. Furthermore, we demonstrate how CLICK can be used to accurately classify tissue samples into disease types, based on their expression profiles. Finally, we present a new Java-based graphical tool, called EXPANDER, for gene expression analysis and visualization, which incorporates CLICK and several other popular clustering algorithms. AVAILABILITY: http://www.cs.tau.ac.il/~rshamir/expander/expander.html

3.
4.
Gene co-expression network (GCN) mining identifies gene modules with highly correlated expression profiles across samples/conditions. It enables researchers to discover latent gene/molecule interactions, identify novel gene functions, and extract molecular features from certain disease/condition groups, thus helping to identify disease biomarkers. However, there is no easy-to-use tool package that lets users mine GCN modules that are relatively small and consist of tightly connected genes (convenient for downstream gene set enrichment analysis), as well as modules that may share common members. To address this need, we developed an online GCN mining tool package: TSUNAMI (Tools SUite for Network Analysis and MIning). TSUNAMI incorporates our state-of-the-art lmQCM algorithm to mine GCN modules for both public and user-input data (microarray, RNA-seq, or any other numerical omics data), and then performs downstream gene set enrichment analysis for the identified modules. It has several features and advantages: 1) a user-friendly interface and real-time co-expression network mining through a web server; 2) direct access to and search of the NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, as well as support for user-input gene expression matrices, for GCN module mining; 3) multiple co-expression analysis tools to choose from, all of which are highly flexible with regard to parameter selection; 4) identified GCN modules are summarized as eigengenes, which make it convenient for users to check their correlation with other clinical traits; 5) integrated downstream Enrichr enrichment analysis and links to other gene set enrichment tools; and 6) visualization of gene loci by Circos plot at any step of the process. The web service is freely accessible at https://biolearns.medicine.iu.edu/. Source code is available at https://github.com/huangzhii/TSUNAMI/.
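The abstract says modules are summarized as eigengenes but does not define the term. A common definition (used, for example, in WGCNA) takes the eigengene as the first principal component of the module's standardized genes x samples matrix; the pure-Python sketch below assumes that definition, and the helper names `standardize` and `module_eigengene` are hypothetical, not TSUNAMI code. It uses power iteration instead of a full SVD.

```python
import math
import random


def standardize(row):
    """Center one gene's profile and scale it to unit (population) variance."""
    n = len(row)
    mean = sum(row) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / n)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in row]


def module_eigengene(expr, iters=200, seed=0):
    """Hypothetical helper: first principal component over samples of a
    genes x samples module matrix, found by power iteration on X^T X."""
    X = [standardize(row) for row in expr]
    n_genes, n_samples = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_samples)]
    for _ in range(iters):
        # u = X v gives one value per gene; w = X^T u gives one value per sample.
        u = [sum(g[j] * v[j] for j in range(n_samples)) for g in X]
        w = [sum(X[i][j] * u[i] for i in range(n_genes)) for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so it correlates positively with the module average.
    avg = [sum(g[j] for g in X) / n_genes for j in range(n_samples)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


if __name__ == "__main__":
    module = [[1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0], [0.5, 1.1, 1.4, 2.1]]
    print(module_eigengene(module))
```

The returned vector has unit length and one entry per sample, so it can be correlated directly with a clinical trait measured on the same samples.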

5.
The Chinese hamster ovary (CHO) cell line is one of the most widely used mammalian cell lines for biopharmaceutical production. We have developed and characterized a gene expression microarray (WyeHamster2a) specific to CHO cells that has enabled the study of ~3,500 sequences. Analysis of multiple sets of replicate scans showed that data derived from the WyeHamster2a array are highly reproducible, confirming it as a robust tool for profiling. Twelve gene sequences were selected for follow-up RT-qPCR to confirm the accuracy and precision of the microarray results. For all but the most subtle gene expression differences, the microarray proved to be a reliable measure of differential gene expression. Finally, we were able to quantify the difference between using a bona fide CHO-specific microarray to profile CHO cells and using an alternative, commercially available rodent microarray, such as a mouse- or rat-specific format.

6.
MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expression switches between up- and down-regulation from one condition to another. To discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. A co-regulated gene profile contains two gene sets such that genes within the same set behave identically (up or down), while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden from traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and is competitive with bi-clustering techniques.
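The definition of a co-regulated gene profile can be made concrete on discretized data. The sketch below assumes each gene's expression has already been discretized per condition to +1 (up), -1 (down) or 0 (unchanged); the function name and this encoding are illustrative assumptions, not the MAP implementation, and the function only checks a given pair of sets rather than mining them.

```python
from typing import Dict, List, Sequence


def is_coregulated_profile(states: Dict[str, Sequence[int]],
                           set_a: List[str], set_b: List[str],
                           conditions: Sequence[int]) -> bool:
    """Check whether (set_a, set_b) forms a co-regulated gene profile over the
    given condition indices. states[gene][c] is +1 (up), -1 (down) or 0."""
    if not set_a or not set_b:
        return False  # a profile needs two non-empty gene sets
    for c in conditions:
        a_states = {states[g][c] for g in set_a}
        b_states = {states[g][c] for g in set_b}
        # Within each set, every gene must behave identically in condition c.
        if len(a_states) != 1 or len(b_states) != 1:
            return False
        sa, sb = next(iter(a_states)), next(iter(b_states))
        # The two sets must move in opposite, non-neutral directions.
        if sa == 0 or sa != -sb:
            return False
    return True


if __name__ == "__main__":
    states = {"g1": [1, -1, 1], "g2": [1, -1, 1], "g3": [-1, 1, -1]}
    print(is_coregulated_profile(states, ["g1", "g2"], ["g3"], [0, 1, 2]))
```

In this toy example g1 and g2 switch between up- and down-regulation across conditions, which is exactly the kind of pattern the abstract says APD methods miss, yet they form a valid profile together with the mirror-image gene g3.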

7.

Background  

Gene clustering has been widely used to group genes with similar expression patterns in microarray data analysis. Subsequent enrichment analysis using predefined gene sets can provide clues as to which functional themes or regulatory sequence motifs are associated with individual gene clusters. Despite this potential utility, gene clustering and enrichment analysis have been performed on separate platforms; thus, developing an integrative algorithm that links both methods is highly challenging.

8.

Background  

Gene set enrichment analysis (GSEA) is a microarray data analysis method that uses predefined gene sets and gene ranks to identify significant biological changes in microarray data sets. GSEA is especially useful when the gene expression changes in a given microarray data set are minimal or moderate.
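The ranking-based idea behind GSEA can be illustrated with its running-sum enrichment score: walking down the ranked gene list, the sum rises at genes in the set (weighted by their ranking metric) and falls at genes outside it, and the enrichment score is the maximum deviation from zero. The pure-Python sketch below assumes the weighted form with exponent `p = 1`, the commonly used default; permutation-based significance testing is omitted, and the function name is illustrative.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str], scores: Sequence[float],
                     gene_set: Set[str], p: float = 1.0) -> float:
    """Weighted running-sum enrichment score.
    ranked_genes must be sorted by decreasing ranking metric (scores)."""
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must partially overlap the ranked list")
    # Normalizer so the hit increments sum to 1 over the whole list.
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    miss_step = 1.0 / (n - n_hit)  # misses also sum to 1 over the list
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        running += (abs(s) ** p) / n_r if h else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the signed maximum deviation from zero
    return best


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]
    print(enrichment_score(genes, [4.0, 3.0, 2.0, 1.0], {"a", "b"}))
```

A set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom yields a score near -1.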

9.
While numerous genes that play important regulatory roles during tooth development in mice have been identified, little is known about the expression profiles and functions of genes during human odontogenesis. To unveil the expression profile of odontogenic genes in humans, we conducted a genome-wide gene expression analysis using microarray assays to compare gene expression between tooth germ and lip tissue from 11-week-old human fetuses. We identified 167 genes that are strongly expressed in the cap-stage tooth germ compared with the lip tissue. Gene ontology enrichment analysis further showed that 145 of these genes are highly represented in multiple gene ontology classes, including extracellular components, sequence-specific DNA binding proteins, Wnt-protein binding molecules, system development, organogenesis, and cell differentiation. Sixty-seven genes known to be associated with mammalian tooth development and tooth abnormalities were identified. Real-time PCR was further employed to validate the microarray data. Moreover, in situ hybridization assays demonstrated tooth-type-specific expression of ISL1 and BARX1 in the incisor, canine, and molar, respectively, consistent with the microarray results. Our results represent a reliable data set that could provide a solid basis for future elucidation of the molecular mechanisms underlying human tooth development.

10.
MOTIVATION: Clustering techniques such as k-means and hierarchical clustering are commonly used to analyze gene expression data derived from DNA microarrays. However, the interactions between processes underlying cell activity suggest that the complexity of the microarray data structure may not be fully captured by discrete clustering methods. RESULTS: A newly developed software tool called MILVA (microarray latent visualization and analysis) is presented here to investigate microarray data without separating gene expression profiles into discrete classes. MILVA is built on a two-dimensional topographic representation of multidimensional microarray data. On this basis, the interactive MILVA functions allow continuous exploration of microarray data, driven by the direct supervision of the biologist, to detect activity patterns of co-regulated genes. AVAILABILITY: The MILVA software is freely available. The software and related documentation can be downloaded from http://www.ncrg.aston.ac.uk/Projects/milva. Use 'surrey' as the username and '3245' as the password to log in. The software is currently available for the Windows platform only.

11.
Idiopathic pulmonary fibrosis (IPF), characterized by irreversible scarring and progressive destruction of lung tissue, is one of the most common types of idiopathic interstitial pneumonia worldwide. However, there are no reliable candidates for curative therapy. Hence, elucidating the mechanisms of IPF genesis and exploring potential biomarkers and prognostic indicators are essential for accurate diagnosis and treatment of IPF. Recently, efficient microarray and bioinformatics analyses have advanced the understanding of the molecular mechanisms of disease occurrence and development, which is necessary to explore genetic alterations and identify potential diagnostic biomarkers. However, high false-positive rates have been observed in results based on single microarray datasets. In the current study, we performed a comprehensive analysis of the differential expression, biological functions, and interactions of IPF-related genes. Three publicly available microarray datasets, including 54 IPF samples and 34 normal samples, were integrated by performing gene set enrichment analysis and analyzing differentially expressed genes (DEGs). Our results identified 350 DEGs genetically associated with IPF. Gene ontology analyses revealed that the changes in the modules were mostly enriched in positive regulation of smooth muscle cell proliferation, positive regulation of inflammatory responses, and the extracellular space. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis of DEGs revealed that IPF involves the TNF signaling pathway, the NOD-like receptor signaling pathway, and the PPAR signaling pathway. To identify key genes related to IPF in the protein-protein interaction network, the 20 hub genes with the highest scores were screened out. Our results provide a framework for developing new pathological molecular networks related to specific diseases in silico.
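The abstract does not state which score was used to rank hub genes in the protein-protein interaction network. Degree (the number of distinct interaction partners) is one common choice; the sketch below assumes it, and the function name and toy edge list are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hub_genes(edges: Iterable[Tuple[str, str]], k: int = 20) -> List[Tuple[str, int]]:
    """Rank genes in an undirected PPI network by degree and return the top k.
    Ties are broken alphabetically so the output is deterministic."""
    degree: Counter = Counter()
    seen = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        key = (a, b) if a < b else (b, a)
        if key in seen:
            continue  # count each undirected interaction only once
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


if __name__ == "__main__":
    ppi = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A")]
    print(top_hub_genes(ppi, k=2))
```

Other centrality scores (for example betweenness or clustering-based measures) could be swapped in for degree; the ranking step would stay the same.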

12.
Microarrays: technologies overview and data analysis (total citations: 2; self-citations: 0; citations by others: 2)
DNA microarrays are a powerful tool to investigate differential gene expression for thousands of genes simultaneously. In this review, recent advances in DNA microarray technologies and their applications are examined. Various DNA microarray platforms are described along with their methods of fabrication and their use. In addition, some algorithms and tools for the analysis of microarray expression data, including clustering, partitioning and machine learning methods, are discussed.

13.
MOTIVATION: The increasing use of microarray technologies is generating large amounts of data that must be processed in order to extract useful and rational fundamental patterns of gene expression. Hierarchical clustering technology is one method used to analyze gene expression data, but traditional hierarchical clustering algorithms suffer from several drawbacks (e.g. fixed topology structure; mis-clustered data which cannot be reevaluated). In this paper, we introduce a new hierarchical clustering algorithm that overcomes some of these drawbacks. RESULT: We propose a new tree-structure self-organizing neural network, called dynamically growing self-organizing tree (DGSOT) algorithm for hierarchical clustering. The DGSOT constructs a hierarchy from top to bottom by division. At each hierarchical level, the DGSOT optimizes the number of clusters, from which the proper hierarchical structure of the underlying dataset can be found. In addition, we propose a new cluster validation criterion based on the geometric property of the Voronoi partition of the dataset in order to find the proper number of clusters at each hierarchical level. This criterion uses the Minimum Spanning Tree (MST) concept of graph theory and is computationally inexpensive for large datasets. A K-level up distribution (KLD) mechanism, which increases the scope of data distribution in the hierarchy construction, was used to improve the clustering accuracy. The KLD mechanism allows the data misclustered in the early stages to be reevaluated at a later stage and increases the accuracy of the final clustering result. The clustering result of the DGSOT is easily displayed as a dendrogram for visualization. Based on a yeast cell cycle microarray expression dataset, we found that our algorithm extracts gene expression patterns at different levels. Furthermore, the biological functionality enrichment in the clusters is considerably high and the hierarchical structure of the clusters is more reasonable. 
AVAILABILITY: DGSOT is available upon request from the authors.

14.
MOTIVATION: A promising and reliable approach to annotating gene function is to cluster genes using not only gene expression data but also literature information, especially gene networks. RESULTS: We present a systematic method for gene clustering that combines these two totally different types of data, focusing particularly on network modularity, a global feature of gene networks. Our method is based on learning a probabilistic model, which we call a hidden modular random field, in which the relations between hidden variables directly represent a given gene network. Our learning algorithm, which minimizes an energy function that accounts for network modularity, is practically time-efficient despite using this global network property. We evaluated our method using a metabolic network and microarray expression data, varying the microarray datasets, the parameters of our model and the gold-standard clusters. Experimental results showed that our method outperformed four other competing methods, including k-means and existing graph partitioning methods, with statistically significant differences in all cases. Further detailed analysis showed that our method could group a set of genes into a cluster corresponding to the folate metabolic pathway, whereas the other methods could not. These results indicate that our method is highly effective for gene clustering and gene function annotation.
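Network modularity, the global property this method builds on, is commonly quantified with the Newman-Girvan modularity Q, which compares the fraction of edges inside each community with the fraction expected at random given node degrees. The sketch below only scores a given partition of an undirected, unweighted graph without duplicate edges; it is not the hidden modular random field learning algorithm, and the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def modularity(edges: Iterable[Tuple[str, str]], community: Dict[str, int]) -> float:
    """Newman-Girvan modularity of a partition:
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ],
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c."""
    edge_list = list(edges)
    m = len(edge_list)
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edge_list:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


if __name__ == "__main__":
    two_triangles = [("a", "b"), ("b", "c"), ("a", "c"),
                     ("d", "e"), ("e", "f"), ("d", "f")]
    split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(modularity(two_triangles, split))
```

For two disconnected triangles, splitting them into their natural communities gives Q = 0.5, while putting all six nodes in one community gives Q = 0.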

15.
Competitive gene set tests are commonly used in molecular pathway analysis to test for enrichment of a particular gene annotation category amongst the differential expression results from a microarray experiment. Existing gene set tests that rely on gene permutation are shown here to be extremely sensitive to inter-gene correlation. Several data sets are analyzed to show that inter-gene correlation is non-ignorable even for experiments on homogeneous cell populations using genetically identical model organisms. A new gene set test procedure (CAMERA) is proposed based on the idea of estimating the inter-gene correlation from the data, and using it to adjust the gene set test statistic. An efficient procedure is developed for estimating the inter-gene correlation and characterizing its precision. CAMERA is shown to control the type I error rate correctly regardless of inter-gene correlations, yet retains excellent power for detecting genuine differential expression. Analysis of breast cancer data shows that CAMERA recovers known relationships between tumor subtypes in very convincing terms. CAMERA can be used to analyze specified sets or as a pathway analysis tool using a database of molecular signatures.
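The adjustment idea can be illustrated with a variance inflation factor: if the m genes in a set have average pairwise correlation ρ̄, the variance of their mean statistic is inflated by VIF = 1 + (m - 1)ρ̄ relative to the independent case. The sketch below is a simplified illustration, not limma's `camera()` implementation; it assumes the gene-wise statistics are z-scores with unit variance under the null and that ρ̄ has already been estimated, and the function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def variance_inflation_factor(m: int, mean_corr: float) -> float:
    """Inflation of the variance of a mean of m statistics whose average
    pairwise correlation is mean_corr: VIF = 1 + (m - 1) * mean_corr."""
    return 1.0 + (m - 1) * mean_corr


def correlation_adjusted_z(set_stats: Sequence[float],
                           other_stats: Sequence[float],
                           mean_corr: float) -> float:
    """Simplified competitive statistic: mean(set) - mean(others), divided by
    a standard error in which only the gene-set term is inflated by the VIF.
    Assumes gene-wise statistics are z-scores with unit variance under H0."""
    m, n_other = len(set_stats), len(other_stats)
    vif = variance_inflation_factor(m, mean_corr)
    se = math.sqrt(vif / m + 1.0 / n_other)
    return (mean(set_stats) - mean(other_stats)) / se


if __name__ == "__main__":
    in_set, rest = [2.0] * 10, [0.0] * 90
    print(correlation_adjusted_z(in_set, rest, 0.0))
    print(correlation_adjusted_z(in_set, rest, 0.1))
```

With ten genes and ρ̄ = 0.1 the VIF is 1.9, so the same mean shift produces a noticeably smaller statistic than under the independence assumption, which is how ignoring positive correlation inflates false positives.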

16.
cDNA microarray technology enables detailed analysis of gene expression throughout complex processes such as differentiation. The aim of this study was to analyze the gene expression profile of normal human intestinal epithelial cells using cell models that recapitulate the crypt-villus axis of intestinal differentiation in comparison with the widely used Caco-2 cell model. cDNA microarrays (19,200 human genes) and a clustering algorithm were used to identify patterns of gene expression in the crypt-like proliferative HIEC and tsFHI cells, and villus epithelial cells as well as Caco-2/15 cells at two distinct stages of differentiation. Unsupervised hierarchical clustering analysis of global gene expression among the cell lines identified two branches: one for the HIEC cells versus a second comprised of two sub-groups: (a) the proliferative Caco-2 cells and (b) the differentiated Caco-2 cells and closely related villus epithelial cells. At the gene level, supervised hierarchical clustering with 272 differentially expressed genes revealed distinct expression patterns specific to each cell phenotype. We identified several upregulated genes that could lead to the identification of new regulatory pathways involved in cell differentiation and carcinogenesis. The combined use of microarray analysis and human intestinal cell models thus provides a powerful tool for establishing detailed gene expression profiles of proliferative to terminally differentiated intestinal cells. Furthermore, the molecular differences between the normal human intestinal cell models and Caco-2 cells clearly point out the strengths and limitations of this widely used experimental model for studying intestinal cell proliferation and differentiation.

17.
Konno R. Human Cell. 2001;14(4):261-266.
Gene expression in human ovarian carcinoma cell lines and epithelial ovarian tumors was examined by oligonucleotide microarray for about 6000 human cDNAs. (1) Comparison of gene expression between CDDP-sensitive human ovarian serous adenocarcinoma cell lines and CDDP-resistant cell lines revealed that gamma-glutamylcysteine synthetase, glutathione peroxidase-like protein, UDP-glucose dehydrogenase (UGDH), NAD(P)H:quinone oxidoreductase, glucose-6-phosphatase, ornithine decarboxylase and dihydrodiol dehydrogenase were associated with a mechanism of CDDP resistance. Comparison of gene expression between taxol-sensitive and taxol-resistant human ovarian cell lines showed up-regulation of 30 genes, including MDR and semaphorin E, in the taxol-resistant cell lines. (2) Comparison of gene expression among serous adenocarcinomas, clear cell adenocarcinomas and non-cancerous ovarian tissues by hierarchical clustering demonstrated a clear difference between carcinomas and non-cancerous ovarian tissues, but no obvious difference between serous and clear cell adenocarcinomas. Genes that were up- or down-regulated specifically in these two types of ovarian carcinoma were further selected using the criterion of a more than 4-fold difference in mRNA level between tumors and non-cancerous tissues. Tissue-type-specific alterations of gene expression are likely to play important roles in the carcinogenesis of epithelial ovarian tumors. cDNA microarray analysis is a powerful, high-throughput tool for analyzing gene expression during cancer development.
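The 4-fold selection criterion can be expressed as a threshold on the log2 ratio of mean mRNA levels between tumor and non-cancerous tissue. The sketch below assumes positive, non-log-transformed intensities and compares group means; the function name and toy data are hypothetical, and the original study may have applied the criterion differently (for example, per sample).

```python
import math
from typing import Dict, List, Sequence


def fourfold_genes(tumor: Dict[str, Sequence[float]],
                   normal: Dict[str, Sequence[float]],
                   min_fold: float = 4.0) -> Dict[str, List[str]]:
    """Split genes into up- and down-regulated lists using a strict
    mean-expression fold-change cutoff (tumor vs. non-cancerous tissue)."""
    up, down = [], []
    cutoff = math.log2(min_fold)
    for gene in tumor:
        t = sum(tumor[gene]) / len(tumor[gene])
        n = sum(normal[gene]) / len(normal[gene])
        if t <= 0 or n <= 0:
            continue  # fold change is undefined for non-positive intensities
        lfc = math.log2(t / n)
        if lfc > cutoff:       # "more than 4-fold" higher in tumors
            up.append(gene)
        elif lfc < -cutoff:    # "more than 4-fold" lower in tumors
            down.append(gene)
    return {"up": up, "down": down}


if __name__ == "__main__":
    tumor = {"G1": [80.0, 90.0], "G2": [10.0, 12.0], "G3": [5.0, 5.0]}
    normal = {"G1": [10.0, 10.0], "G2": [11.0, 10.0], "G3": [40.0, 30.0]}
    print(fourfold_genes(tumor, normal))
```

Working on the log2 scale makes the cutoff symmetric: a 4-fold increase and a 4-fold decrease both correspond to an absolute log2 ratio of 2.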

18.
19.
MOTIVATION: Gene expression profiling is a powerful approach to identify genes that may be involved in a specific biological process on a global scale. For example, gene expression profiling of mutant animals that lack or contain an excess of certain cell types is a common way to identify genes that are important for the development and maintenance of given cell types. However, it is difficult for traditional computational methods, including unsupervised and supervised learning methods, to detect relevant genes from a large collection of expression profiles with high sensitivity and specificity. Unsupervised methods group similar gene expressions together while ignoring important prior biological knowledge. Supervised methods utilize training data from prior biological knowledge to classify gene expression. However, for many biological problems, little prior knowledge is available, which limits the prediction performance of most supervised methods. RESULTS: We present a Bayesian semi-supervised learning method, called BGEN, that improves upon supervised and unsupervised methods by both capturing relevant expression profiles and using prior biological knowledge from literature and experimental validation. Unlike currently available semi-supervised learning methods, this new method trains a kernel classifier based on labeled and unlabeled gene expression examples. The semi-supervised trained classifier can then be used to efficiently classify the remaining genes in the dataset. Moreover, we model the confidence of microarray probes and probabilistically combine multiple probe predictions into gene predictions. We apply BGEN to identify genes involved in the development of a specific cell lineage in the C. elegans embryo, and to further identify the tissues in which these genes are enriched. Compared to K-means clustering and SVM classification, BGEN achieves higher sensitivity and specificity. We confirm certain predictions by biological experiments. 
AVAILABILITY: The results are available at http://www.csail.mit.edu/~alanqi/projects/BGEN.html.

20.
Accurate prediction of survival in cancer patients is still a key open problem in clinical research. Recently, many large-scale gene expression clustering studies have identified sets of genes reportedly predictive of prognosis; however, those gene sets shared few genes and were poorly validated on independent data. We have developed a systems biology-based approach that uses either combined gene sets and the protein interaction network (Method A) or the protein network alone (Method B) to identify common prognostic genes from microarray gene expression data of glioblastoma multiforme, and compared it with differential gene expression clustering (Method C). Validation of prediction performance shows that the 23-gene prognostic classifier identified by Method A outperforms the gene classifiers identified by Methods B and C, or previously reported for gliomas, on 17 of 20 independent sample cohorts across five tumor types. We also found that 21 of the 23 genes are related to cellular proliferation and two are related to stress/immune response. We further found that increased expression of the 21 genes and decreased expression of the other two genes are associated with poorer survival, which supports the notion that cellular proliferation and immune response contribute a significant portion of the predictive power of prognostic classifiers. Our results demonstrate that the systems biology-based approach enables the identification of common survival-associated genes.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417