
1.

Discovering coherent biclusters from gene expression data using zero-suppressed binary decision diagrams

Yoon S Nardini C Benini L De Micheli G 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2005,2(4):339-354

The biclustering method can be a very useful analysis tool when some genes have multiple functions and experimental conditions are diverse in gene expression measurement. This is because the biclustering approach, in contrast to the conventional clustering techniques, focuses on finding a subset of the genes and a subset of the experimental conditions that together exhibit coherent behavior. However, the biclustering problem is inherently intractable, and it is often computationally costly to find biclusters with high levels of coherence. In this work, we propose a novel biclustering algorithm that exploits the zero-suppressed binary decision diagrams (ZBDDs) data structure to cope with the computational challenges. Our method can find all biclusters that satisfy specific input conditions, and it is scalable to practical gene expression data. We also present experimental results confirming the effectiveness of our approach.

2.

A new geometric biclustering algorithm based on the Hough transform for analysis of large-scale microarray data

Zhao H Liew AW Xie X Yan H 《Journal of theoretical biology》2008,251(2):264-274

Biclustering is an important tool in microarray analysis when only a subset of genes co-regulates in a subset of conditions. Different from standard clustering analyses, biclustering performs simultaneous classification in both gene and condition directions in a microarray data matrix. However, the biclustering problem is inherently intractable and computationally complex. In this paper, we present a new biclustering algorithm based on the geometrical viewpoint of coherent gene expression profiles. In this method, we perform pattern identification based on the Hough transform in a column-pair space. The algorithm is especially suitable for the biclustering analysis of large-scale microarray data. Our studies show that the approach can discover significant biclusters with respect to the increased noise level and regulatory complexity. Furthermore, we also test the ability of our method to locate biologically verifiable biclusters within an annotated set of genes.

3.

A systematic comparison and evaluation of biclustering methods for gene expression data (total citations: 9; self-citations: 0; citations by others: 9)

Prelić A Bleuler S Zimmermann P Wille A Bühlmann P Gruissem W Hennig L Thiele L Zitzler E 《Bioinformatics (Oxford, England)》2006,22(9):1122-1129

MOTIVATION: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters as well as with other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of the biclustering method are currently available. RESULTS: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough to exactly determine all optimal groupings; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.
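
To make the binary reference model in the abstract above more concrete, the following is a minimal sketch that enumerates all inclusion-maximal all-ones submatrices of a small binary matrix. It assumes that a bicluster in the binary model is a set of rows and columns whose entries are all 1. It uses exhaustive enumeration over row subsets, not the divide-and-conquer strategy of Bimax, so it is only practical for toy inputs. The function name `maximal_one_biclusters` is hypothetical.

```python
from itertools import combinations


def maximal_one_biclusters(matrix):
    """Enumerate inclusion-maximal all-ones submatrices of a small binary matrix.

    Illustrative brute force only (exponential in the number of rows);
    this is NOT the Bimax divide-and-conquer algorithm.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns that contain a 1 in every chosen row.
            cols = frozenset(
                c for c in range(n_cols) if all(matrix[r][c] for r in rows)
            )
            if not cols:
                continue
            # Close the row set: every row that is 1 on all of those columns.
            closed_rows = frozenset(
                r for r in range(n_rows) if all(matrix[r][c] for c in cols)
            )
            found.add((closed_rows, cols))
    return found


if __name__ == "__main__":
    toy = [
        [1, 1, 0],
        [1, 1, 1],
        [0, 1, 1],
    ]
    for rows, cols in sorted(maximal_one_biclusters(toy), key=lambda b: sorted(b[0])):
        print(sorted(rows), sorted(cols))
```

Closing the row set after computing the shared columns guarantees that neither a row nor a column can be added without introducing a 0, which is what inclusion-maximal means here.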

4.

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Serin A Vingron M 《Algorithms for molecular biology : AMB》2011,6(1):18-12


5.

Symmetric and asymmetric multi-modality biclustering analysis for microarray data matrix

Kung SY Mak MW Tagkopoulos I 《Journal of bioinformatics and computational biology》2006,4(2):275-298

Machine learning techniques offer a viable approach to cluster discovery from microarray data, which involves identifying and classifying biologically relevant groups in genes and conditions. It has been recognized that genes (whether or not they belong to the same gene group) may be co-expressed via a variety of pathways. Therefore, they can be adequately described by a diversity of coherence models. In fact, it is known that a gene may participate in multiple pathways that may or may not be co-active under all conditions. It is therefore biologically meaningful to simultaneously divide genes into functional groups and conditions into co-active categories--leading to the so-called biclustering analysis. For this, we have proposed a comprehensive set of coherence models to cope with various plausible regulation processes. Furthermore, a multivariate biclustering analysis based on fusion of different coherence models appears to be promising because the expression level of genes from the same group may follow more than one coherence model. The simulation studies further confirm that the proposed framework enjoys the advantage of high prediction performance.

6.

Context Specific and Differential Gene Co-expression Networks via Bayesian Biclustering

Chuan Gao Ian C. McDowell Shiwen Zhao Christopher D. Brown Barbara E. Engelhardt 《PLoS computational biology》2016,12(7)

Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with state-of-the-art biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a principled method to recover context specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue specific gene networks. We validate these findings by using our tissue specific networks to identify trans-eQTLs specific to one of four primary tissues.

7.

BicAT: a biclustering analysis toolbox

Barkow S Bleuler S Prelic A Zimmermann P Zitzler E 《Bioinformatics (Oxford, England)》2006,22(10):1282-1283

SUMMARY: Besides classical clustering methods such as hierarchical clustering, in recent years biclustering has become a popular approach to analyze biological data sets, e.g. gene expression data. The Biclustering Analysis Toolbox (BicAT) is a software platform for clustering-based data analysis that integrates various biclustering and clustering techniques in terms of a common graphical user interface. Furthermore, BicAT provides different facilities for data preparation, inspection and postprocessing such as discretization, filtering of biclusters according to specific criteria or gene pair analysis for constructing gene interconnection graphs. The possibility to use different biclustering algorithms inside a single graphical tool allows the user to compare clustering results and choose the algorithm that best fits a specific biological scenario. The toolbox is described in the context of gene expression analysis, but is also applicable to other types of data, e.g. data from proteomics or synthetic lethal experiments. AVAILABILITY: The BicAT toolbox is freely available at http://www.tik.ee.ethz.ch/sop/bicat and runs on all operating systems. The Java source code of the program and a developer's guide are provided on the website as well. Therefore, users may modify the program and add further algorithms or extensions.

8.

Quality Measures for Gene Expression Biclusters

Beatriz Pontes Raúl Giráldez Jesús S. Aguilar-Ruiz 《PloS one》2015,10(3)

A noticeable number of biclustering approaches have been proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, recognizing groups of co-expressed or co-regulated genes, that is, genes which follow a similar expression pattern, is one of the main objectives. Due to the complexity of the problem, heuristic searches are usually used instead of exhaustive algorithms. Furthermore, most biclustering approaches use a measure or cost function that determines the quality of biclusters. Having a suitable quality metric for biclusters is critical, not only for guiding the search, but also for establishing a comparison criterion among the results obtained by different biclustering techniques. In this paper, we analyse a large number of existing approaches to quality measures for gene expression biclusters, and we present a comparative study of them based on their capability to recognize different expression patterns in biclusters.
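
As an illustration of what a bicluster quality measure can look like, the sketch below computes the mean squared residue, a widely used measure introduced by Cheng and Church. The abstract above does not name it; it is chosen here only as a well-known example, and the function name `mean_squared_residue` is hypothetical. A value of 0 indicates a perfectly additive (shifting) pattern, and larger values indicate less coherence.

```python
def mean_squared_residue(sub):
    """Mean squared residue of a bicluster given as a list of equal-length rows.

    residue(i, j) = a_ij - row_mean_i - col_mean_j + overall_mean
    """
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [
        sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)
    ]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


if __name__ == "__main__":
    shifting = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]  # rows differ by a constant
    checker = [[1, 0], [0, 1]]                    # no additive coherence
    print(mean_squared_residue(shifting))
    print(mean_squared_residue(checker))
```

A measure like this captures shifting patterns but not scaling ones, which is one reason a comparison of several measures against different expression patterns is informative.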

9.

Genome-wide matching of genes to cellular roles using guilt-by-association models derived from single sample analysis

JA Klomp KA Furge 《BMC research notes》2012,5(1):370


10.

Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree

Gadagkar SR Rosenberg MS Kumar S 《Journal of experimental zoology. Part B. Molecular and developmental evolution》2005,304(1):64-74

Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.

11.

QServer: a biclustering server for prediction and assessment of co-expressed gene clusters

Zhou F Ma Q Li G Xu Y 《PloS one》2012,7(3):e32660


12.

Genomics and genome-wide association studies: an integrative approach to expression QTL mapping

Degnan JH Lasky-Su J Raby BA Xu M Molony C Schadt EE Lange C 《Genomics》2008,92(3):129-133

Expression QTL mapping by integrating genome-wide gene expression and genotype data is a promising approach to identifying functional genetic variation, but is hampered by the large number of multiple comparisons inherent in such studies. A novel approach to addressing multiple testing problems in genome-wide family-based association studies is screening candidate markers using heritability or conditional power. We apply these methods in settings in which microarray gene expression data are used as phenotypes, screening for SNPs near the expressed genes. We perform association analyses for phenotypes using a univariate approach. We also perform simulations on trios with large numbers of causal SNPs to determine the optimal number of markers to use in a screen. We demonstrate that our family-based screening approach performs well in the analysis of integrative genomic datasets and that screening using either heritability or conditional power produces similar, though not identical, results.

13.

Biclustering by sparse canonical correlation analysis

Harold Pimentel Zhiyue Hu Haiyan Huang 《Quantitative Biology.》2018,6(1):56

Background: Developing appropriate computational tools to distill biological insights from large-scale gene expression data has been an important part of systems biology. Considering that gene relationships may change or only exist in a subset of collected samples, biclustering that involves clustering both genes and samples has become increasingly important, especially when the samples are pooled from a wide range of experimental conditions. Methods: In this paper, we introduce a new biclustering algorithm to find subsets of genomic expression features (EFs) (e.g., genes, isoforms, exon inclusion) that show strong “group interactions” under certain subsets of samples. Group interactions are defined by strong partial correlations, or equivalently, conditional dependencies between EFs after removing the influences of a set of other functionally related EFs. Our new biclustering method, named SCCA-BC, extends an existing method for group interaction inference, which is based on sparse canonical correlation analysis (SCCA) coupled with repeated random partitioning of the gene expression data set. Results: SCCA-BC gives sensible results on real data sets and outperforms most existing methods in simulations. Software is available at https://github.com/pimentel/scca-bc. Conclusions: SCCA-BC seems to work in numerous conditions and the results seem promising for future extensions. SCCA-BC has the ability to find different types of bicluster patterns, and it is especially advantageous in identifying a bicluster whose elements share the same progressive and multivariate normal distribution with a dense covariance matrix.

14.

BiVisu: software tool for bicluster detection and visualization

Cheng KO Law NF Siu WC Lau TH 《Bioinformatics (Oxford, England)》2007,23(17):2342-2344

BiVisu is an open-source software tool for detecting and visualizing biclusters embedded in a gene expression matrix. Through the use of appropriate coherence relations, BiVisu can detect constant, constant-row, constant-column, additive-related as well as multiplicative-related biclusters. The biclustering results are then visualized under a 2D setting for easy inspection. In particular, parallel coordinate (PC) plots for each bicluster are displayed, from which objective and subjective cluster quality evaluation can be performed. Availability: BiVisu has been developed in Matlab and is available at http://www.eie.polyu.edu.hk/~nflaw/Biclustering/.
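
The additive-related and multiplicative-related bicluster types named in the abstract above can be illustrated with two small checks. This is a sketch under the usual textbook definitions (each row equals a reference row plus a row-specific offset, or times a row-specific factor). It is not BiVisu's detection algorithm, and the function names and the tolerance parameter `tol` are hypothetical.

```python
def is_additive(sub, tol=1e-9):
    """True if every row equals the first row plus a row-specific offset."""
    first = sub[0]
    for row in sub:
        offset = row[0] - first[0]
        if any(abs((x - f) - offset) > tol for x, f in zip(row, first)):
            return False
    return True


def is_multiplicative(sub, tol=1e-9):
    """True if every row equals the first row times a row-specific factor."""
    first = sub[0]
    if any(f == 0 for f in first):
        # Ratios are undefined with zeros in the reference row;
        # rejecting such input is a simplification made for this sketch.
        return False
    for row in sub:
        factor = row[0] / first[0]
        if any(abs(x - factor * f) > tol for x, f in zip(row, first)):
            return False
    return True


if __name__ == "__main__":
    shifted = [[1, 2, 4], [3, 4, 6]]   # second row = first row + 2
    scaled = [[1, 2, 4], [3, 6, 12]]   # second row = first row * 3
    print(is_additive(shifted), is_multiplicative(shifted))
    print(is_additive(scaled), is_multiplicative(scaled))
```

Constant, constant-row and constant-column biclusters are special cases of the additive pattern in which all offsets, or all column differences, are zero.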

15.

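Mining gene functional modules shared by cancers based on biclustering

Zhang F Lin AH Lin MH Ding YL Rao SQ 《遗传 (Hereditas (Beijing))》2013,35(3):333-342

Gene pleiotropy is a common phenomenon in the genetic mechanisms of cancer, but systematic analyses of it are rare. This article proposes a new approach that uses biclustering to mine gene functional modules in order to explore the molecular mechanisms shared by cancers and the relationships among different cancers. Gene expression data for 20 cancers were obtained, and a modified t-test together with the fold-change method was used to select genes differentially expressed in at least two cancers, yielding a 10417×20 data matrix. Biclustering identified 22 gene clusters shared by cancers, and subsequent enrichment analysis yielded 17 gene functional modules (P<0.05 after Bonferroni correction). These modules mainly participate in biological processes such as regulation of mitotic chromatid segregation, cell differentiation, immune and inflammatory responses, and collagen fibril organization; they mainly perform molecular functions such as ATP binding and microtubule activity, MHC class II receptor activity, and endopeptidase inhibitor activity; and they are mainly located in the cytoskeleton, chromosomes, the MHC class II protein complex, intermediate filaments, and collagen fibrils. A cancer-related network built from the modules shows that gastric cancer, ovarian adenocarcinoma, cervical squamous cell carcinoma, and mesothelioma are relatively closely related, whereas the molecular mechanisms of the two hematological cancers (acute myeloid leukemia and multiple myeloma) differ considerably from those of the other cancers. Thus, the gene functional modules shared by cancers are associated with a variety of biological mechanisms, and similarities between cancers may be related to tissue origin, shared carcinogenic mechanisms, and other factors. The pleiotropy analysis method proposed in this article helps to explain the shared molecular mechanisms of complex human diseases.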

16.

Enhanced homology searching through genome reading frame predetermination

Yuan J Bush B Elbrecht A Liu Y Zhang T Zhao W Blevins R 《Bioinformatics (Oxford, England)》2004,20(9):1416-1427

MOTIVATION: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Traditionally, homology search-based methods are often the first approach employed in determining whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes or motifs often are difficult to find using single query-based homology search algorithms against large sequence datasets such as the human genome. Therefore, the motivation behind this work was to develop an approach to enhance the sensitivity of traditional single query-based homology algorithms against genomic data without losing search selectivity. RESULTS: We demonstrate that by searching against a genome fragmented into all possible reading frames, the sensitivity of homology-based searches is enhanced without degrading their selectivity. Using the ETS-domain, bromodomain and acetyl-CoA acetyltransferase gene as queries, we were able to demonstrate that direct protein-protein searches using BLAST2P or FASTA3 against a human genome segmented among all possible reading frames and translated were substantially more sensitive than traditional protein-DNA searches against a raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic analysis was employed to demonstrate that the algorithms remained selective, while comparisons of the algorithms showed that the protein-protein searches were more sensitive in identifying hits. Therefore, through the overprediction of reading frames by this method and the increased sensitivity of protein-protein based homology search algorithms, a genome can be deeply mined, potentially finding hits overlooked by protein-DNA searches against raw genomic data.

17.

A general framework for biclustering gene expression data

Li H Chen X Zhang K Jiang T 《Journal of bioinformatics and computational biology》2006,4(4):911-933

A large number of biclustering methods have been proposed to detect patterns in gene expression data. All these methods try to find some type of bicluster, but none can discover all the types of patterns in the data. Furthermore, researchers have to design new algorithms in order to find new types of biclusters/patterns that interest biologists. In this paper, we propose a novel approach for biclustering that, in general, can be used to discover all computable patterns in gene expression data. The method is based on the theory of Kolmogorov complexity. More precisely, we use Kolmogorov complexity to measure the randomness of submatrices as the merit of biclusters because randomness naturally consists in a lack of regularity, which is a common property of all types of patterns. On the basis of algorithmic probability measure, we develop a Markov Chain Monte Carlo algorithm to search for biclusters. Our method can also be easily extended to solve the problems of conventional clustering and checkerboard type biclustering. The preliminary experiments on simulated as well as real data show that our approach is very versatile and promising.

18.

Biclustering algorithms for biological data analysis: a survey (total citations: 7; self-citations: 0; citations by others: 7)

Madeira SC Oliveira AL 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2004,1(1):24-45

A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.

19.

cluML

Bolshakova N Cunningham P 《Applied bioinformatics》2005,4(3):211-213

cluML is a new markup language for microarray data clustering and cluster validity assessment. The XML-based format has been designed to address some of the limitations observed in traditional formats, such as the inability to store multiple clustering (including biclustering) and validation results within a dataset. cluML is an effective tool to support biomedical knowledge representation in gene expression data analysis. Although cluML was developed for DNA microarray analysis applications, it can be effectively used, without limitation, for the representation of clustering and for the validation of other biomedical and physical data.

20.

Recent patents on biclustering algorithms for gene expression data analysis

Liew AW Law NF Yan H 《Recent patents on DNA & gene sequences》2011,5(2):117-125
