20 similar articles found (search time: 15 ms)
1.
Genomics and proteomics approaches generate distinct gene expression and protein profiles, listing individual genes embedded in broad functional terms as gene ontologies. However, interpretation of gene profiles in a regulatory and functional context remains a major issue. Elucidation of regulatory mechanisms at the gene expression level via analysis of promoter regions is a prominent procedure to decipher such gene regulatory networks. We propose a novel genetic algorithm (GA) to extract joint promoter modules in a set of coexpressed genes as resulting from differential gene expression experiments. Algorithm design has focused on the following constraints: (I) identification of the major promoter modules, which are (II) characterized by a maximum number of joint motifs and (III) are found in a maximum number of coexpressed genes. The capability of the GA in detecting multiple modules was evaluated on various test data sets, analyzing the impact of the number of motifs per promoter module, the number of genes associated with a module, as well as the total number of distinct promoter modules encoded in a sequence set. In addition to the test data sets, the GA was evaluated on two biological examples, namely a muscle-specific data set and the upstream sequences of the beta-actin gene (ACTB) derived from different species, complemented by a comparison to alternative promoter module identification routines.
2.
MOTIVATION: There is a growing interest in extracting statistical patterns from gene expression time-series data, in which a key challenge is the development of stable and accurate probabilistic models. Currently popular models, however, would be computationally prohibitive unless some independence assumptions are made to describe large-scale data. We propose an unsupervised conditional random fields (CRF) model to overcome this problem by progressively infusing information into the labelling process through a small variable voting pool. RESULTS: An unsupervised CRF model is proposed for efficient analysis of gene expression time series and is successfully applied to gene class discovery and class prediction. The proposed model treats each time series as a random field and assigns an optimal cluster label to each time series, so as to partition the time series into clusters without a priori knowledge about the number of clusters and the initial centroids. Another advantage of the proposed method is the relaxation of independence assumptions.
3.
Krejník M Kléma J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):788-798
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
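The feature replacement described in this abstract can be illustrated with a minimal sketch. It assumes that a cluster "centroid" is simply the mean expression of the cluster's genes within each sample; the function name `centroid_features`, the argument names, and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def centroid_features(samples, gene_names, gene_to_cluster):
    """Replace per-gene features with per-cluster centroid features.

    samples         -- list of expression vectors, one per biological sample
    gene_names      -- gene name for each column of the expression vectors
    gene_to_cluster -- mapping gene name -> cluster id (e.g. a functional group)

    Assumption: the centroid of a cluster in one sample is the mean
    expression of the genes assigned to that cluster.
    """
    # Collect the column indices belonging to each cluster.
    columns = defaultdict(list)
    for idx, gene in enumerate(gene_names):
        if gene in gene_to_cluster:  # genes without a cluster are dropped
            columns[gene_to_cluster[gene]].append(idx)

    cluster_ids = sorted(columns)
    features = [
        [sum(sample[i] for i in columns[c]) / len(columns[c]) for c in cluster_ids]
        for sample in samples
    ]
    return cluster_ids, features


# Toy example: three genes collapse into two cluster-level features.
ids, feats = centroid_features(
    samples=[[1.0, 3.0, 10.0], [2.0, 4.0, 20.0]],
    gene_names=["g1", "g2", "g3"],
    gene_to_cluster={"g1": "A", "g2": "A", "g3": "B"},
)
```

The resulting `feats` matrix has one column per cluster instead of one per gene and could be passed to any classifier in place of the original gene-level matrix.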
4.
5.
Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and ultimately to understand the dynamics of biological processes. As the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates the combination of a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a novel clustering method that combines an initial pairwise curve alignment to adjust for time variation within likely clusters. The cluster-specific time synchronization method shows excellent performance over standard clustering methods in terms of cluster quality measures in simulations and for yeast and human fibroblast data sets. In the yeast example, the discovered clusters have high concordance with the known biological processes.
6.
Gene-Ontology-based clustering of gene expression data. Cited by: 2 (self-citations: 0, citations by others: 2)
The expected correlation between genetic co-regulation and affiliation to a common biological process is not necessarily the case when numerical cluster algorithms are applied to gene expression data. GO-Cluster uses the tree structure of the Gene Ontology database as a framework for numerical clustering, thus allowing a simple visualization of gene expression data at various levels of the ontology tree. AVAILABILITY: The 32-bit Windows application is freely available at http://www.mpibpc.mpg.de/go-cluster/
7.
8.
9.
Sperança MA Vinkenoog R Ocampos M Fischer K Janse CJ Waters AP del Portillo HA 《Experimental parasitology》2001,97(3):119-128
The cdc2 gene product, a 34-kDa protein kinase, plays a universal role in the M phase of the eukaryotic cell cycle. To study the cell cycle regulation in malarial parasites, we have characterized a cdc2-related gene from the most widely distributed human malaria, Plasmodium vivax (Pvcrk2). The full-length Pvcrk2 revealed 90-99% homology with Crk2 proteins from other Plasmodium species and approximately 60% homology with p34(cdc2) proteins from higher eukaryotes. We used the temperature-sensitive Schizosaccharomyces pombe cdc2 mutant (cdc2-33(ts)) for gene complementation studies. Expression of the full-length 33-kDa PvCrk2 protein, a truncated 27-kDa version, and two chimeric proteins in which we exchanged the N- and C-terminal regions of PvCrk2 with their S. pombe counterparts at the restrictive temperature in the mutant cdc2-33(ts) did not complement the cell cycle defect. However, conditional expression of the Pvcrk2 genes or the chimera containing the C terminus from Spcdc2 in mutant cdc2-33(ts) cells produced cell-cycle-arrested phenotypes only in the induced state and at the permissive temperature. Our results thus provide the first compelling genetic evidence that the plasmodial Crk2 gene product(s) is capable of interfering with the well-conserved eukaryotic cell cycle machinery.
10.
Validating clustering for gene expression data. Cited by: 24 (self-citations: 0, citations by others: 24)
MOTIVATION: Many clustering algorithms have been proposed for the analysis of gene expression data, but little guidance is available to help choose among them. We provide a systematic framework for assessing the results of clustering algorithms. Clustering algorithms attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. Our methodology is to apply a clustering algorithm to the data from all but one experimental condition. The remaining condition is used to assess the predictive power of the resulting clusters: meaningful clusters should exhibit less variation in the remaining condition than clusters formed by chance. RESULTS: We successfully applied our methodology to compare six clustering algorithms on four gene expression data sets. We found our quantitative measures of cluster quality to be positively correlated with external standards of cluster quality.
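The leave-one-condition-out idea in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the score used here is the root-mean-square deviation of each gene from its cluster mean in the held-out condition, averaged over all conditions, and the tiny `kmeans` helper merely stands in for whichever clustering algorithm is being assessed. All function and parameter names are hypothetical.

```python
import math
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20, seed=0):
    """Very small k-means used only as a placeholder clustering algorithm."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center.
        labels = [min(range(k), key=lambda c: _sq_dist(r, centers[c])) for r in rows]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def left_out_variation(data, k, cluster_fn=kmeans):
    """Cluster on all but one condition, then score the held-out condition.

    data is a genes x conditions matrix (list of lists). Lower values mean
    the clusters predict the held-out condition better.
    """
    n_genes, n_cond = len(data), len(data[0])
    total = 0.0
    for e in range(n_cond):
        # Drop condition e and cluster the genes on the remaining ones.
        train = [[v for j, v in enumerate(row) if j != e] for row in data]
        labels = cluster_fn(train, k)
        squared = 0.0
        for c in set(labels):
            held_out = [data[i][e] for i in range(n_genes) if labels[i] == c]
            mean = sum(held_out) / len(held_out)
            squared += sum((v - mean) ** 2 for v in held_out)
        total += math.sqrt(squared / n_genes)
    return total / n_cond
```

Comparing `left_out_variation` for several choices of `cluster_fn` on the same matrix gives a simple way to rank algorithms; a baseline that assigns labels at random shows what "clusters formed by chance" would score.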
11.
It has been well established that gene expression data contain large amounts of random variation that affects both the analysis and the results of microarray experiments. Typically, microarray data are either tested for differential expression between conditions or grouped on the basis of profiles that are assessed temporally or across genetic or environmental conditions. While testing differential expression relies on levels of certainty to evaluate the relative worth of various analyses, cluster analysis is exploratory in nature and has not had the benefit of any judgment of statistical inference. By using a novel dissimilarity function to ascertain gene expression clusters and conditional randomization of the data space to illuminate distinctions between statistically significant clusters of gene expression patterns, we aim to provide a level of confidence to inferred clusters of gene expression data. We apply both permutation and convex hull approaches for randomization of the data space and show that both methods can provide an effective assessment of gene expression profiles whose coregulation is statistically different from that expected by random chance alone.
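A permutation-based significance check of the kind mentioned in this abstract might look like the sketch below. Two things are assumptions made for illustration: the dissimilarity is taken to be one minus the Pearson correlation (the abstract's own dissimilarity function is not specified here), and the null distribution is built by shuffling each gene's values across conditions independently. The convex hull variant is not covered. All names are hypothetical.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def mean_dissimilarity(profiles):
    """Average of (1 - Pearson correlation) over all pairs of profiles."""
    pairs = list(combinations(profiles, 2))
    return sum(1.0 - pearson(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(cluster, n_perm=1000, seed=0):
    """Estimate how often shuffled profiles are at least as tight as the cluster.

    cluster is a list of expression profiles (one list of floats per gene).
    A small p-value suggests the cluster is tighter than expected by chance.
    """
    rng = random.Random(seed)
    observed = mean_dissimilarity(cluster)
    hits = 0
    for _ in range(n_perm):
        # Shuffle each gene's values across conditions independently.
        shuffled = [rng.sample(profile, len(profile)) for profile in cluster]
        if mean_dissimilarity(shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With `n_perm` permutations the smallest attainable p-value is `1 / (n_perm + 1)`, so the number of permutations sets the resolution of the test.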
12.
Adaptive quality-based clustering of gene expression profiles. Cited by: 17 (self-citations: 0, citations by others: 17)
De Smet F Mathys J Marchal K Thijs G De Moor B Moreau Y 《Bioinformatics (Oxford, England)》2002,18(5):735-746
MOTIVATION: Microarray experiments generate a considerable amount of data, which, when analyzed properly, helps us gain a huge amount of biologically relevant information about the global cellular behaviour. Clustering (grouping genes with similar expression profiles) is one of the first steps in data analysis of high-throughput expression measurements. A number of clustering algorithms have proved useful to make sense of such data. These classical algorithms, though useful, suffer from several drawbacks (e.g. they require the predefinition of arbitrary parameters like the number of clusters; they force every gene into a cluster despite a low correlation with other cluster members). In the following we describe a novel adaptive quality-based clustering algorithm that tackles some of these drawbacks. RESULTS: We propose a heuristic iterative two-step algorithm: First, we find in the high-dimensional representation of the data a sphere where the "density" of expression profiles is locally maximal (based on a preliminary estimate of the radius of the cluster; the quality-based approach). In a second step, we derive an optimal radius of the cluster (adaptive approach) so that only the significantly coexpressed genes are included in the cluster. This estimation is achieved by fitting a model to the data using an EM-algorithm. By inferring the radius from the data itself, the biologist is freed from finding an optimal value for this radius by trial-and-error. The computational complexity of this method is approximately linear in the number of gene expression profiles in the data set. Finally, our method is successfully validated using existing data sets. AVAILABILITY: http://www.esat.kuleuven.ac.be/~thijs/Work/Clustering.html
13.
14.
15.
Background
The four heterogeneous childhood cancers, neuroblastoma, non-Hodgkin lymphoma, rhabdomyosarcoma, and Ewing sarcoma, present a similar histology of small round blue cell tumor (SRBCT), which often leads to misdiagnosis. Identification of biomarkers for distinguishing these cancers is a well-studied problem. Existing methods typically evaluate each gene separately and do not take into account the nonlinear interaction between genes and the tools that are used to design the diagnostic prediction system. Consequently, more genes are usually identified as necessary for prediction. We propose a general scheme for finding a small set of biomarkers to design a diagnostic system for accurate classification of the cancer subgroups. We use multilayer networks with online gene selection ability and relational fuzzy clustering to identify a small set of biomarkers for accurate classification of the training and blind test cases of a well-studied data set.
16.
17.
18.
19.
20.
Osmostress-induced changes in yeast gene expression. Cited by: 17 (self-citations: 0, citations by others: 17)
João C. S. Varela Catelijne van Beekvelt Rudi J. Planta Willem H. Mager 《Molecular microbiology》1992,6(15):2183-2190
When Saccharomyces cerevisiae cells are exposed to a high concentration of NaCl, they show reduced viability, methionine uptake and protein biosynthesis. Cells can acquire tolerance against a severe salt shock (up to 1.4 M NaCl) by a previous treatment with 0.7 M NaCl, but not by a previous heat shock. Two-dimensional analysis of [3H]-leucine-labelled proteins from salt-shocked cells (0.7 M NaCl) revealed the elevated rate of synthesis of nine proteins, among which were the heat-shock proteins hsp12 and hsp26. Northern analysis using gene-specific probes confirmed the identity of the latter proteins and, in addition, demonstrated the induction of glycerol-3-phosphate dehydrogenase gene expression. The synthesis of the same set of proteins is induced or enhanced upon exposure of cells to 0.8 M sucrose, although not as dramatically as in an iso-osmolar NaCl concentration (0.7 M).