期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Discovering coherent biclusters from gene expression data using zero-suppressed binary decision diagrams

Yoon S Nardini C Benini L De Micheli G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(4):339-354

The biclustering method can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse in gene expression measurement. This is because the biclustering approach, in contrast to the conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagrams (ZBDDs) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach. 相似文献

2.

Group additive regression models for genomic data analysis

Luan Y Li H 《Biostatistics (Oxford, England)》2008,9(1):100-113

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are also smaller than the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of microarray gene expression data of breast cancer. Results from analysis of a breast cancer microarray gene expression data set indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance are important to breast cancer-specific survival. 相似文献

3.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data

Zhao H Liew AW Xie X Yan H 《Journal of theoretical biology》2008,251(2):264-274

Biclustering is an important tool in microarray analysis when only a subset of genes co-regulates in a subset of conditions. Different from standard clustering analyses, biclustering performs simultaneous classification in both gene and condition directions in a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on the geometrical viewpoint of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters with respect to the increased noise level and regulatory complexity. Furthermore, we also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes. 相似文献

4.

A computational approach to measuring coherence of gene expression in pathways

Yang HH Hu Y Buetow KH Lee MP 《Genomics》2004,84(1):211-217

This study uses a computational approach to analyze coherence of expression of genes in pathways. Microarray data were analyzed with respect to coherent gene expression in a group of genes defined as a pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Our hypothesis is that genes in the same pathway are more likely to be coordinately regulated than a randomly selected gene set. A correlation coefficient for each pair of genes in a pathway was estimated based on gene expression in normal or tumor samples, and statistically significant correlation coefficients were identified. The coherence indicator was defined as the ratio of the number of gene pairs in the pathway whose correlation coefficients are significant, divided by the total number of gene pairs in the pathway. We defined all genes that appeared in the KEGG pathways as a reference gene set. Our analysis indicated that the mean coherence indicator of pathways is significantly larger than the mean coherence indicator of random gene sets drawn from the reference gene set. Thus, the result supports our hypothesis. The significance of each individual pathway of n genes was evaluated by comparing its coherence indicator with coherence indicators of 1000 random permutation sets of n genes chosen from the reference gene set. We analyzed three data sets: two Affymetrix microarrays and one cDNA microarray. For each of the three data sets, statistically significant pathways were identified among all KEGG pathways. Seven of 96 pathways had a significant coherence indicator in normal tissue and 14 of 96 pathways had a significant coherence indicator in tumor tissue in all three data sets. The increase in the number of pathways with significant coherence indicators may reflect the fact that tumor cells have a higher rate of metabolism than normal cells. Five pathways involved in oxidative phosphorylation, ATP synthesis, protein synthesis, or RNA synthesis were coherent in both normal and tumor tissue, demonstrating that these are essential genes, a high level of expression of which is required regardless of cell type. 相似文献

5.

Recent patents on biclustering algorithms for gene expression data analysis

Liew AW Law NF Yan H 《Recent patents on DNA & gene sequences》2011,5(2):117-125

相似文献

6.

Gene expression data analysis using a novel approach to biclustering combining discrete and continuous data 总被引：1，自引：0，他引：1

Christinat Y Wachmann B Zhang L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2008,5(4):583-593

Many different methods exist for pattern detection in gene expression data. In contrast to classical methods, biclustering has the ability to cluster a group of genes together with a group of conditions (replicates, set of patients or drug compounds). However, since the problem is NP-complex, most algorithms use heuristic search functions and therefore might converge towards local maxima. By using the results of biclustering on discrete data as a starting point for a local search function on continuous data, our algorithm avoids the problem of heuristic initialization. Similar to OPSM, our algorithm aims to detect biclusters whose rows and columns can be ordered such that row values are growing across the bicluster's columns and vice-versa. Results have been generated on the yeast genome (Saccharomyces cerevisiae), a human cancer dataset and random data. Results on the yeast genome showed that 89% of the one hundred biggest non-overlapping biclusters were enriched with Gene Ontology annotations. A comparison with OPSM and ISA demonstrated a better efficiency when using gene and condition orders. We present results on random and real datasets that show the ability of our algorithm to capture statistically significant and biologically relevant biclusters. 相似文献

7.

Pathway Network Analyses for Autism Reveal Multisystem Involvement,Major Overlaps with Other Diseases and Convergence upon MAPK and Calcium Signaling

Ya Wen Mohamad J. Alshikho Martha R. Herbert 《PloS one》2016,11(4)

We used established databases in standard ways to systematically characterize gene ontologies, pathways and functional linkages in the large set of genes now associated with autism spectrum disorders (ASDs). These conditions are particularly challenging—they lack clear pathognomonic biological markers, they involve great heterogeneity across multiple levels (genes, systemic biological and brain characteristics, and nuances of behavioral manifestations)—and yet everyone with this diagnosis meets the same defining behavioral criteria. Using the human gene list from Simons Foundation Autism Research Initiative (SFARI) we performed gene set enrichment analysis with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database, and then derived a pathway network from pathway-pathway functional interactions again in reference to KEGG. Through identifying the GO (Gene Ontology) groups in which SFARI genes were enriched, mapping the coherence between pathways and GO groups, and ranking the relative strengths of representation of pathway network components, we 1) identified 10 disease-associated and 30 function-associated pathways 2) revealed calcium signaling pathway and neuroactive ligand-receptor interaction as the most enriched, statistically significant pathways from the enrichment analysis, 3) showed calcium signaling pathways and MAPK signaling pathway to be interactive hubs with other pathways and also to be involved with pervasively present biological processes, 4) found convergent indications that the process “calcium-PRC (protein kinase C)-Ras-Raf-MAPK/ERK” is likely a major contributor to ASD pathophysiology, and 5) noted that perturbations associated with KEGG’s category of environmental information processing were common. These findings support the idea that ASD-associated genes may contribute not only to core features of ASD themselves but also to vulnerability to other chronic and systemic problems potentially including cancer, metabolic conditions and heart diseases. ASDs may thus arise, or emerge, from underlying vulnerabilities related to pleiotropic genes associated with pervasively important molecular mechanisms, vulnerability to environmental input and multiple systemic co-morbidities. 相似文献

8.

Quality Measures for Gene Expression Biclusters

Beatriz Pontes Ral Girldez Jess S. Aguilar-Ruiz 《PloS one》2015,10(3)

An noticeable number of biclustering approaches have been proposed proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, recognizing groups of co-expressed or co-regulated genes, that is, genes which follow a similar expression pattern, is one of the main objectives. Due to the problem complexity, heuristic searches are usually used instead of exhaustive algorithms. Furthermore, most of biclustering approaches use a measure or cost function that determines the quality of biclusters. Having a suitable quality metric for bicluster is a critical aspect, not only for guiding the search, but also for establishing a comparison criteria among the results obtained by different biclustering techniques. In this paper, we analyse a large number of existing approaches to quality measures for gene expression biclusters, as well as we present a comparative study of them based on their capability to recognize different expression patterns in biclusters. 相似文献

9.

Biclustering algorithms for biological data analysis: a survey 总被引：7，自引：0，他引：7

Madeira SC Oliveira AL 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(1):24-45

A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix has been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred in the literature as coclustering and direct clustering, among others names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications. 相似文献

10.

QServer: a biclustering server for prediction and assessment of co-expressed gene clusters

Zhou F Ma Q Li G Xu Y 《PloS one》2012,7(3):e32660

相似文献

11.

Phylogenetic comparative assembly

Husemann Peter Stoye Jens 《Algorithms for molecular biology : AMB》2010,5(1):1-12

Background

Biclustering is an important analysis procedure to understand the biological mechanisms from microarray gene expression data. Several algorithms have been proposed to identify biclusters, but very little effort was made to compare the performance of different algorithms on real datasets and combine the resultant biclusters into one unified ranking.

Results

In this paper we propose differential co-expression framework and a differential co-expression scoring function to objectively quantify quality or goodness of a bicluster of genes based on the observation that genes in a bicluster are co-expressed in the conditions belonged to the bicluster and not co-expressed in the other conditions. Furthermore, we propose a scoring function to stratify biclusters into three types of co-expression. We used the proposed scoring functions to understand the performance and behavior of the four well established biclustering algorithms on six real datasets from different domains by combining their output into one unified ranking.

Conclusions

Differential co-expression framework is useful to provide quantitative and objective assessment of the goodness of biclusters of co-expressed genes and performance of biclustering algorithms in identifying co-expression biclusters. It also helps to combine the biclusters output by different algorithms into one unified ranking i.e. meta-biclustering. 相似文献

12.

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Serin A Vingron M 《Algorithms for molecular biology : AMB》2011,6(1):18-12

相似文献

13.

Biclustering by sparse canonical correlation analysis

Harold Pimentel Zhiyue Hu Haiyan Huang 《Quantitative Biology.》2018,6(1):56

Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering that involves clustering both genes and samples has become in-creasingly important, especially when the samples are pooled from a wide range of experimental conditions. Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong “group interactions” under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set. Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc. Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix. 相似文献

14.

Quantitative trait associated microarray gene expression data analysis

Qu Y Xu S 《Molecular biology and evolution》2006,23(8):1558-1573

Selection on phenotypes may cause genetic change. To understand the relationship between phenotype and gene expression from an evolutionary viewpoint, it is important to study the concordance between gene expression and profiles of phenotypes. In this study, we use a novel method of clustering to identify genes whose expression profiles are related to a quantitative phenotype. Cluster analysis of gene expression data aims at classifying genes into several different groups based on the similarity of their expression profiles across multiple conditions. The hope is that genes that are classified into the same clusters may share underlying regulatory elements or may be a part of the same metabolic pathways. Current methods for examining the association between phenotype and gene expression are limited to linear association measured by the correlation between individual gene expression values and phenotype. Genes may be associated with the phenotype in a nonlinear fashion. In addition, groups of genes that share a particular pattern in their relationship to phenotype may be of evolutionary interest. In this study, we develop a method to group genes based on orthogonal polynomials under a multivariate Gaussian mixture model. The effect of each expressed gene on the phenotype is partitioned into a cluster mean and a random deviation from the mean. Genes can also be clustered based on a time series. Parameters are estimated using the expectation-maximization algorithm and implemented in SAS. The method is verified with simulated data and demonstrated with experimental data from 2 studies, one clusters with respect to severity of disease in Alzheimer's patients and another clusters data for a rat fracture healing study over time. We find significant evidence of nonlinear associations in both studies and successfully describe these patterns with our method. We give detailed instructions and provide a working program that allows others to directly implement this method in their own analyses. 相似文献

15.

Discovering biclusters in gene expression data based on high-dimensional linear geometries

Xiangchao Gan Alan Wee-Chung Liew Hong Yan 《BMC bioinformatics》2008,9(1):209

相似文献

16.

PageRank-based identification of signaling crosstalk from transcriptomics data: the case of Arabidopsis thaliana

Omranian N Mueller-Roeber B Nikoloski Z 《Molecular bioSystems》2012,8(4):1121-1127

相似文献

17.

An ensemble biclustering approach for querying gene expression compendia with experimental lists

De Smet R Marchal K 《Bioinformatics (Oxford, England)》2011,27(14):1948-1956

MOTIVATION: Query-based biclustering techniques allow interrogating a gene expression compendium with a given gene or gene list. They do so by searching for genes in the compendium that have a profile close to the average expression profile of the genes in this query-list. As it can often not be guaranteed that the genes in a long query-list will all be mutually coexpressed, it is advisable to use each gene separately as a query. This approach, however, leaves the user with a tedious post-processing of partially redundant biclustering results. The fact that for each query-gene multiple parameter settings need to be tested in order to detect the 'most optimal bicluster size' adds to the redundancy problem. RESULTS: To aid with this post-processing, we developed an ensemble approach to be used in combination with query-based biclustering. The method relies on a specifically designed consensus matrix in which the biclustering outcomes for multiple query-genes and for different possible parameter settings are merged in a statistically robust way. Clustering of this matrix results in distinct, non-redundant consensus biclusters that maximally reflect the information contained within the original query-based biclustering results. The usefulness of the developed approach is illustrated on a biological case study in Escherichia coli. Availability and implementation: Compiled Matlab code is available from http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Information_DeSmet_2011/. 相似文献

18.

A new DNA sequence entropy-based Kullback-Leibler algorithm for gene clustering

Houshang Dehghanzadeh Mostafa Ghaderi-Zefrehei Seyed Ziaeddin Mirhoseini Saeid Esmaeilkhaniyan Ishaku Lemu Haruna Hamed Amirpour Najafabadi 《Journal of applied genetics》2020,61(2):231-238

Information theory is a branch of mathematics that overlaps with communications, biology, and medical engineering. Entropy is a measure of uncertainty in the set of information. In this study, for each gene and its exons sets, the entropy was calculated in orders one to four. Based on the relative entropy of genes and exons, Kullback-Leibler divergence was calculated. After obtaining the Kullback-Leibler distance for genes and exons sets, the results were entered as input into 7 clustering algorithms: single, complete, average, weighted, centroid, median, and K-means. To aggregate the results of clustering, the AdaBoost algorithm was used. Finally, the results of the AdaBoost algorithm were investigated by GeneMANIA prediction server to explore the results from gene annotation point of view. All calculations were performed using the MATLAB Engineering Software (2015). Following our findings on investigating the results of genes metabolic pathways based on the gene annotations, it was revealed that our proposed clustering method yielded correct, logical, and fast results. This method at the same that had not had the disadvantages of aligning allowed the genes with actual length and content to be considered and also did not require high memory for large-length sequences. We believe that the performance of the proposed method could be used with other competitive gene clustering methods to group biologically relevant set of genes. Also, the proposed method can be seen as a predictive method for those genes bearing up weak genomic annotations. 相似文献

19.

An improved hybrid of SVM and SCAD for pathway analysis

Misman MF Mohamad MS Deris S Abdullah A Hashim SZ 《Bioinformation》2011,7(4):169-175

Pathway analysis has lead to a new era in genomic research by providing further biological process information compared to traditional single gene analysis. Beside the advantage, pathway analysis provides some challenges to the researchers, one of which is the quality of pathway data itself. The pathway data usually defined from biological context free, when it comes to a specific biological context (e.g. lung cancer disease), typically only several genes within pathways are responsible for the corresponding cellular process. It also can be that some pathways may be included with uninformative genes or perhaps informative genes were excluded. Moreover, many algorithms in pathway analysis neglect these limitations by treating all the genes within pathways as significant. In previous study, a hybrid of support vector machines and smoothly clipped absolute deviation with groups-specific tuning parameters (gSVM-SCAD) was proposed in order to identify and select the informative genes before the pathway evaluation process. However, gSVM-SCAD had showed a limitation in terms of the performance of classification accuracy. In order to deal with this limitation, we made an enhancement to the tuning parameter method for gSVM-SCAD by applying the B-Type generalized approximate cross validation (BGACV). Experimental analyses using one simulated data and two gene expression data have shown that the proposed method obtains significant results in identifying biologically significant genes and pathways, and in classification accuracy. 相似文献

20.

Mining co-regulated gene profiles for the detection of functional associations in gene expression data 总被引：1，自引：0，他引：1

Gyenesei A Wagner U Barkow-Oesterreicher S Stolte E Schlapbach R 《Bioinformatics (Oxford, England)》2007,23(15):1927-1935

MOTIVATION: Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expressions change between up- and down-regulation from one condition to another. In order to discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods. RESULTS: We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden to traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and competitive with bi-clustering techniques. 相似文献