20 similar articles found (search time: 8 ms)
1.
Ponzoni I Azuaje F Augusto J Glass D 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(4):624-634
There is a need to design computational methods to support the prediction of gene regulatory networks. Such models should offer both biologically meaningful and computationally accurate predictions, which in combination with other techniques may improve large-scale, integrative studies. This paper presents a new machine learning method for the prediction of putative regulatory associations from expression data, which exhibits properties never or only partially addressed by other recently published techniques. The method was tested on a Saccharomyces cerevisiae gene expression dataset. The results were statistically validated and compared with the relationships inferred by two machine learning approaches to gene regulatory network prediction. Furthermore, the resulting predictions were assessed using domain knowledge. The proposed algorithm may be able to accurately predict relevant biological associations between genes. One of the most relevant features of this new method is the prediction of adaptive regulation thresholds for the discretization of gene expression values, which is required prior to the association rule learning process. Moreover, an important advantage is its low computational cost for inferring association rules. The proposed system may significantly support exploratory, large-scale studies of automated identification of potentially relevant gene expression associations.
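The abstract above mentions discretizing expression values before association rule learning. The following is a minimal Python sketch of that general idea only, not the authors' algorithm: it assumes a single fixed threshold rather than the adaptive, predicted thresholds the paper describes. The function names `discretize` and `rule_stats` and the sample values are hypothetical.

```python
def discretize(values, threshold):
    """Map each expression value to +1 (up), -1 (down) or 0 (unchanged).

    Assumption: one fixed, symmetric threshold; the paper instead predicts
    adaptive regulation thresholds.
    """
    return [1 if v > threshold else -1 if v < -threshold else 0 for v in values]


def rule_stats(antecedent, consequent):
    """Support and confidence of the rule 'antecedent up => consequent up'."""
    n = len(antecedent)
    both = sum(1 for a, c in zip(antecedent, consequent) if a == 1 and c == 1)
    ante = sum(1 for a in antecedent if a == 1)
    support = both / n
    confidence = both / ante if ante else 0.0
    return support, confidence


# Made-up log-ratio values for two hypothetical genes over four samples.
gene_a = discretize([1.2, 0.1, 1.5, -1.1], threshold=0.5)
gene_b = discretize([0.9, 0.0, 1.1, 0.2], threshold=0.5)
support, confidence = rule_stats(gene_a, gene_b)
```

Support is the fraction of samples in which both genes are up-regulated; confidence is that count divided by the number of samples in which the antecedent gene is up-regulated.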
2.
It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: various gene functional groups form the strata in a stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative for clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights between the clustering results of the stratified analysis and those of the global analysis, with the weight determined by the data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitate choosing appropriate gene functional groups as priors. We use simulated data and real data to demonstrate the feasibility and advantages of the proposed method.
3.
Background
With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, there is still a lack of high-quality annotation methods. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.
Results
We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is applied to p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power and reduce the inconsistency of annotations.
Conclusions
A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.
4.
OryzaExpress: an integrated database of gene expression networks and omics annotations in rice (total citations: 1, self-citations: 0, citations by others: 1)
Hamada K Hongo K Suwabe K Shimizu A Nagayama T Abe R Kikuchi S Yamamoto N Fujii T Yokoyama K Tsuchida H Sano K Mochizuki T Oki N Horiuchi Y Fujita M Watanabe M Matsuoka M Kurata N Yano K 《Plant & cell physiology》2011,52(2):220-229
5.
Mining gene expression databases for association rules (total citations: 16, self-citations: 0, citations by others: 16)
6.
Background
Protein-protein interactions (PPIs) play a key role in understanding the mechanisms of cellular processes. The availability of interactome data has catalyzed the development of computational approaches to elucidate functional behaviors of proteins on a system level. Gene Ontology (GO) and its annotations are a significant resource for functional characterization of proteins. Because of their wide coverage, GO data have often been adopted as a benchmark for protein function prediction on the genomic scale.
Results
We propose a computational approach, called M-Finder, for functional association pattern mining. This method employs semantic analytics to integrate genome-wide PPIs with GO data. We also introduce an interactive web application tool that visualizes a functional association network linked to a protein specified by a user. The proposed approach comprises two major components. First, the PPIs that have been generated by high-throughput methods are weighted in terms of their functional consistency using GO and its annotations. We assess two advanced semantic similarity metrics which quantify the functional association level of each interacting protein pair. We demonstrate that these measures outperform other existing methods by evaluating their agreement with other biological features, such as sequence similarity, the presence of common Pfam domains, and core PPIs. Second, an information flow-based algorithm is employed to efficiently discover a set of proteins functionally associated with the query protein, together with their links. This algorithm reconstructs a functional association network of the query protein. The output network size can be flexibly determined by parameters.
Conclusions
M-Finder provides a useful framework to investigate functional association patterns with any protein. This software will also allow users to perform further systematic analysis of a set of proteins for any specific function. It is available online at http://bionet.ecs.baylor.edu/mfinder
7.
8.
Pedro Carmona-Saez Monica Chagoyen Andres Rodriguez Oswaldo Trelles Jose M Carazo Alberto Pascual-Montano 《BMC bioinformatics》2006,7(1):54-16
Background
Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most current approaches to analyzing microarray datasets focus mainly on the analysis of experimental data, and external biological information is incorporated as a posterior process.
9.
10.
11.
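EST sequences represent the transcriptional signals of gene expression in tissues. This study attempts to develop a simple and efficient method for large-scale EST analysis: all rice (Oryza sativa) EST sequences were downloaded from NCBI and analyzed to obtain important information on gene expression during rice development. Using BLAST alignment and phrap assembly, together with Unix text filtering, more than 30,000 contig sequences were obtained from the assembled EST sequences. The contigs were then aligned against the NCBI nucleotide database to obtain annotation information for each sequence. Preliminary mining of the tissue expression of the contigs showed that the anther had the largest number of expressed sequences, which lays an important foundation for further study of the regulation of organ-specific gene expression during rice development.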
12.
Information on gene expression in colon tumors versus normal human colon was recently generated by an oligonucleotide microarray study. We used the associated database to search for genes that display age-dependent variations in expression. Statistically significant evidence was obtained that such genes are present in both the tumor and normal tissue databases. Besides the analysis of all genes included in the database, three subsets of genes were analyzed separately: genes controlled by p53, and genes coding for ribosomal proteins and for nuclear-encoded mitochondrial proteins. Among the genes controlled by p53 some show an age-dependent change in expression in tumor tissues, in the sense compatible with an activation of p53 at higher age. A decreased expression of some ribosomal genes at advanced age was detected both in tumor and normal tissues. No significant age-dependent expression could be detected for genes encoding mitochondrial proteins.
13.
14.
15.
The present paper demonstrates the application of CART (classification and regression trees) to control a mosquito vector (Culex quinquefasciatus) for bancroftian filariasis in India. A database on filariasis and the commercially available software CART (Salford Systems Inc., USA) were used in this study. Baseline entomological data related to bancroftian filariasis were used to derive prediction rules. The data were grouped into three categories: (1) mosquito abundance, (2) meteorological details and (3) socio-economic details. These data were taken from a database developed for a project entitled "Database management system for the control of bancroftian filariasis" sponsored by the Ministry of Communication and Information Technology (MC&IT), Government of India, New Delhi. Predictor variables (maximum temperature, minimum temperature, rainfall, relative humidity, wind speed, house type) were ranked by CART according to their influence on the target variable (month). The approach is useful for forecasting vector (mosquito) densities in forthcoming seasons.
16.
Background
Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but is also useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems.
Results
In this paper, based on our previous work on CPredictor and CPredictor2.0, we present a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations, which is called CPredictor3.0. This new method follows the viewpoint that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins using gene expression data from different time points, and separately cluster proteins using gene ontology (GO) functional annotations. Then, for each time point, we compute set intersections between a set of active proteins generated from expression data and a protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network, and obtain a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, by expanding and merging these candidate complexes, the predicted complexes are obtained.
We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmark complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall.
Conclusion
CPredictor3.0 can serve as a promising tool for protein complex prediction.
17.
Massive amounts of gene expression data are generated using microarrays for functional studies of genes, and gene expression data clustering is a useful tool for studying the functional relationship among genes in a biological process. We have developed a computer package, EXCAVATOR, for clustering gene expression profiles based on our new framework for representing gene expression data as a minimum spanning tree. EXCAVATOR uses a number of rigorous and efficient clustering algorithms. This program has a number of unique features, including capabilities for: (i) data-constrained clustering; (ii) identification of genes with similar expression profiles to pre-specified seed genes; (iii) cluster identification from a noisy background; (iv) computational comparison between different clustering results of the same data set. EXCAVATOR can be run from a Unix/Linux/DOS shell, from a Java interface or from a Web server. The clustering results can be visualized as colored figures and 2-dimensional plots. Moreover, EXCAVATOR provides a wide range of options for data formats, distance measures, objective functions, clustering algorithms, methods to choose the number of clusters, etc. The effectiveness of EXCAVATOR has been demonstrated on several experimental data sets. Its performance compares favorably against the popular K-means clustering method in terms of clustering quality and computing time.
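To illustrate what clustering on a minimum spanning tree can look like, here is a minimal Python sketch. It is not EXCAVATOR's code or its objective functions: it assumes Euclidean distance between profiles and the simplest possible rule, cutting the k-1 longest tree edges, with 1 <= k <= number of profiles. The names `minimum_spanning_tree` and `mst_clusters` are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(profiles):
    """Prim's algorithm on the complete distance graph.

    Returns the n-1 tree edges as (weight, i, j) tuples.
    """
    n = len(profiles)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to tree, nearest tree node)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i][0])
        in_tree[u] = True
        if best[u][1] != -1:
            edges.append((best[u][0], best[u][1], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(profiles[u], profiles[v])
                if d < best[v][0]:
                    best[v] = (d, u)
    return edges


def mst_clusters(profiles, k):
    """Cut the k-1 longest MST edges and return one cluster label per profile."""
    edges = sorted(minimum_spanning_tree(profiles))
    kept = edges[: len(edges) - (k - 1)]
    parent = list(range(len(profiles)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(profiles))]


# Four made-up two-condition expression profiles forming two obvious groups.
labels = mst_clusters([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

Removing the longest edges separates groups of profiles that are joined to the rest of the tree only by a large distance, which is why a tree representation lends itself to clustering.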
18.
MOTIVATION: Inferring the genetic interaction mechanism using Bayesian networks has recently drawn increasing attention due to its well-established theoretical foundation and statistical robustness. However, the relative insufficiency of experiments with respect to the number of genes leads to many false positive inferences. RESULTS: We propose a novel method to infer genetic networks by alleviating the shortage of available mRNA expression data with prior knowledge. We call the proposed method 'modularized network learning' (MONET). First, the proposed method divides a whole gene set into overlapping modules, considering biological annotations and expression data together. Second, it infers a Bayesian network for each module, and integrates the learned subnetworks into a global network. An algorithm that measures the similarity between genes based on the hierarchy, specificity and multiplicity of biological annotations is presented. The proposed method draws a global picture of inter-module relationships as well as a detailed look at intra-module interactions. We applied the proposed method to analyze Saccharomyces cerevisiae stress data, and found several hypotheses suggesting putative functions of unclassified genes. We also compared the proposed method with a whole-set-based approach and two expression-based clustering approaches.
19.
Recent developments in DNA microarray technologies have made the reconstruction of gene regulatory networks (GRNs) feasible. To infer the overall structure of a GRN, there is a need to find out how the expression of each gene can be affected by the others. Many existing approaches to reconstructing GRNs are developed to generate hypotheses about the presence or absence of interactions between genes so that laboratory experiments can be performed afterwards for verification. Since they are not intended to be used to predict if a gene in an unseen sample has any interactions with other genes, statistical verification of the reliability of the discovered interactions can be difficult. Furthermore, since the temporal ordering of the data is not taken into consideration, the directionality of regulation cannot be established using these existing techniques. To tackle these problems, we propose a data mining technique here. This technique makes use of a probabilistic inference approach to uncover interesting dependency relationships in noisy, high-dimensional time series expression data. It is able to determine not only whether a gene is dependent on another but also whether it is activated or inhibited. In addition, it can predict how a gene would be affected by other genes even in unseen samples. For performance evaluation, the proposed technique has been tested with real expression data. Experimental results show that it can be very effective. The discovered dependency relationships can reveal gene regulatory relationships that could be used to infer the structures of GRNs.