1.

The computational analysis of scientific literature to define and recognize gene expression clusters (total citations: 2; self-citations: 0; citations by others: 2)

Raychaudhuri S Chang JT Imam F Altman RB 《Nucleic acids research》2003,31(15):4553-4560

A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Second, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. In the case where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references, and the second contains newly discovered genes in Drosophila melanogaster; many have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention. In both cases, we identified novel clusters that were not noted by the original investigators.

2.

GARBAN: genomic analysis and rapid biological annotation of cDNA microarray and proteomic data

Martínez-Cruz LA Rubio A Martínez-Chantar ML Labarga A Barrio I Podhorski A Segura V Sevilla Campo JL Avila MA Mato JM 《Bioinformatics (Oxford, England)》2003,19(16):2158-2160

SUMMARY: Genomic Analysis and Rapid Biological ANnotation (GARBAN) is a new tool that provides an integrated framework to simultaneously analyze and compare multiple data sets derived from microarray or proteomic experiments. It carries out automated classifications of genes or proteins according to the criteria of the Gene Ontology Consortium at a level of depth defined by the user. Additionally, it performs clustering analysis of all sets based on functional categories or on differential expression levels. GARBAN also provides graphical representations of the biological pathways in which all the genes/proteins participate. AVAILABILITY: http://garban.tecnun.es.

3.

Assigning function to yeast proteins by integration of technologies

Hazbun TR Malmström L Anderson S Graczyk BJ Fox B Riffle M Sundin BA Aranda JD McDonald WH Chiu CH Snydsman BE Bradley P Muller EG Fields S Baker D Yates JR Davis TN 《Molecular cell》2003,12(6):1353-1365

Interpreting genome sequences requires the functional analysis of thousands of predicted proteins, many of which are uncharacterized and without obvious homologs. To assess whether the roles of large sets of uncharacterized genes can be assigned by targeted application of a suite of technologies, we used four complementary protein-based methods to analyze a set of 100 uncharacterized but essential open reading frames (ORFs) of the yeast Saccharomyces cerevisiae. These proteins were subjected to affinity purification and mass spectrometry analysis to identify copurifying proteins, two-hybrid analysis to identify interacting proteins, fluorescence microscopy to localize the proteins, and structure prediction methodology to predict structural domains or identify remote homologies. Integration of the data assigned function to 48 ORFs using at least two of the Gene Ontology (GO) categories of biological process, molecular function, and cellular component; 77 ORFs were annotated by at least one method. This combination of technologies, coupled with annotation using GO, is a powerful approach to classifying genes.

4.

A framework for incorporating functional interrelationships into protein function prediction algorithms

Zhang XF Dai DQ 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):740-753

The functional annotation of proteins is one of the most important tasks in the post-genomic era. Although many computational approaches have been developed in recent years to predict protein function, most of these traditional algorithms do not take interrelationships among functional terms into account, such as the fact that different GO terms usually coannotate some common proteins. In this study, we propose a new functional similarity measure in the form of a Jaccard coefficient to quantify these interrelationships and also develop a framework for incorporating GO term similarity into the protein function prediction process. The experimental results of cross-validation on S. cerevisiae and Homo sapiens data sets demonstrate that our method is able to improve the performance of protein function prediction. In addition, we find that small terms associated with only a few proteins benefit more than large ones when functional interrelationships are considered. We also compare our similarity measure with two other widely used measures, and the results indicate that, when incorporated into function prediction algorithms, our proposed measure is more effective. Experimental results also show that our algorithms outperform two previous competing algorithms, which also take functional interrelationships into account, in prediction accuracy. Finally, we show that our method is robust to annotations in the database that are currently incomplete. These results give new insights about the importance of functional interrelationships in protein function prediction.
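
To make the idea of a Jaccard-style similarity between functional terms concrete, here is a minimal Python sketch. It is an illustration only, not the authors' measure: the abstract says the measure takes "the form of" a Jaccard coefficient but does not give its definition, so the plain set-based coefficient below, the function names, and the toy annotation table are all assumptions.

```python
def jaccard(a, b):
    """Plain Jaccard coefficient |A & B| / |A | B|; returns 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: GO term -> set of proteins annotated with that term.
annotations = {
    "GO:0006412": {"P1", "P2", "P3", "P4"},
    "GO:0042254": {"P3", "P4", "P5"},
    "GO:0015979": {"P6"},
}


def term_similarity(term_a, term_b, table):
    """Similarity of two terms, measured by how many annotated proteins they share."""
    return jaccard(table[term_a], table[term_b])


if __name__ == "__main__":
    print(term_similarity("GO:0006412", "GO:0042254", annotations))
```

Two terms that coannotate many of the same proteins score close to 1, and terms with no shared proteins score 0.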

5.

A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities

Soneson C Fontes M 《Biostatistics (Oxford, England)》2012,13(1):129-141

Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.

6.

Annotating genes of known and unknown function by large-scale coexpression analysis (total citations: 3; self-citations: 0; citations by others: 3)

Horan K Jang C Bailey-Serres J Mittler R Shelton C Harper JF Zhu JK Cushman JC Gollery M Girke T 《Plant physiology》2008,147(1):41-57

About 40% of the proteins encoded in eukaryotic genomes are proteins of unknown function (PUFs). Their functional characterization remains one of the main challenges in modern biology. In this study we identified the PUF encoding genes from Arabidopsis (Arabidopsis thaliana) using a combination of sequence similarity, domain-based, and empirical approaches. Large-scale gene expression analyses of 1,310 publicly available Affymetrix chips were performed to associate the identified PUF genes with regulatory networks and biological processes of known function. To generate quality results, the study was restricted to expression sets with replicated samples. First, genome-wide clustering and gene function enrichment analysis of clusters allowed us to associate 1,541 PUF genes with tightly coexpressed genes for proteins of known function (PKFs). Over 70% of them could be assigned to more specific biological process annotations than the ones available in the current Gene Ontology release. The most highly overrepresented functional categories in the obtained clusters were ribosome assembly, photosynthesis, and cell wall pathways. Interestingly, the majority of the PUF genes appeared to be controlled by the same regulatory networks as most PKF genes, because clusters enriched in PUF genes were extremely rare. Second, large-scale analysis of differentially expressed genes was applied to identify a comprehensive set of abiotic stress-response genes. This analysis resulted in the identification of 269 PKF and 104 PUF genes that responded to a wide variety of abiotic stresses, whereas 608 PKF and 206 PUF genes responded predominantly to specific stress treatments. The provided coexpression and differentially expressed gene data represent an important resource for guiding future functional characterization experiments of PUF and PKF genes. 
Finally, the public Plant Gene Expression Database (http://bioweb.ucr.edu/PED) was developed as part of this project to provide efficient access and mining tools for the vast gene expression data of this study.

7.

Gene coexpression network analysis as a source of functional annotation for rice genes

Childs KL Davidson RM Buell CR 《PloS one》2011,6(7):e22196

8.

Empirical evidence of the applicability of functional clustering through gene expression classification

Krejník M Kléma J 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):788-798

The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
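
The feature replacement described above (per-gene features swapped for per-cluster centroid features) can be sketched as follows. This is a hedged illustration: it assumes the centroid is the arithmetic mean of the member genes' expression values within one sample, and the function name, gene names, and cluster names are hypothetical.

```python
def centroid_features(sample, clusters):
    """Map one sample's gene-level expression values to cluster-level features.

    sample:   dict gene -> expression value for a single biological sample
    clusters: dict cluster name -> list of member genes (assumed non-empty)
    Returns:  dict cluster name -> mean expression of the member genes
    """
    return {
        name: sum(sample[gene] for gene in genes) / len(genes)
        for name, genes in clusters.items()
    }


# Hypothetical toy sample and functional clusters.
sample = {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0, "geneD": 3.0, "geneE": 5.0}
clusters = {
    "ribosome": ["geneA", "geneB"],
    "stress": ["geneC", "geneD", "geneE"],
}

if __name__ == "__main__":
    print(centroid_features(sample, clusters))
```

A classifier would then be trained on the much shorter cluster-level vectors instead of the original gene-level ones.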

9.

Network enrichment analysis: extension of gene-set enrichment analysis to gene networks

A Alexeyenko W Lee M Pernemalm J Guegan P Dessen V Lazar J Lehtiö Y Pawitan 《BMC bioinformatics》2012,13(1):226

ABSTRACT: BACKGROUND: Gene-set enrichment analyses (GEA or GSEA) are commonly used for biological characterization of an experimental gene-set. This is done by finding known functional categories, such as pathways or Gene Ontology terms, that are over-represented in the experimental set; the assessment is based on an overlap statistic. Rich biological information in terms of gene interaction network is now widely available, but this topological information is not used by GEA, so there is a need for methods that exploit this type of information in high-throughput data analysis. RESULTS: We developed a method of network enrichment analysis (NEA) that extends the overlap statistic in GEA to network links between genes in the experimental set and those in the functional categories. For the crucial step in statistical inference, we developed a fast network randomization algorithm in order to obtain the distribution of any network statistic under the null hypothesis of no association between an experimental gene-set and a functional category. We illustrate the NEA method using gene and protein expression data from a lung cancer study. CONCLUSIONS: The results indicate that the NEA method is more powerful than the traditional GEA, primarily because the relationships between gene sets were more strongly captured by network connectivity rather than by simple overlaps.
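
For orientation, the traditional overlap-based assessment that the abstract contrasts with NEA can be sketched as a one-sided hypergeometric test. This is an assumption about how the "overlap statistic" is typically evaluated, not a description of the NEA method itself, which works on network links and uses a network randomization algorithm that is not reproduced here. The function name and the numbers are hypothetical.

```python
from math import comb


def overlap_p_value(n_universe, n_category, n_experimental, n_overlap):
    """Upper-tail hypergeometric probability of seeing at least n_overlap shared genes.

    n_universe:     number of genes considered overall
    n_category:     genes in the functional category
    n_experimental: genes in the experimental set
    n_overlap:      genes observed in both
    """
    total = comb(n_universe, n_experimental)
    largest_possible = min(n_category, n_experimental)
    tail = sum(
        comb(n_category, k) * comb(n_universe - n_category, n_experimental - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 10 genes overall, a 3-gene category, a 3-gene experimental set.
    print(overlap_p_value(10, 3, 3, 3))
```

A small probability suggests the category is over-represented in the experimental set. Because only shared membership is counted, two gene sets that are tightly linked in a network but share no genes would go undetected, which is the gap NEA is described as addressing.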

10.

MRHCA: a nonparametric statistics based method for hub and co-expression module identification in large gene co-expression network

Yu Zhang Sha Cao Jing Zhao Burair Alsaihati Qin Ma Chi Zhang 《Quantitative Biology.》2018,6(1):40

11.

Chapter 8: Biological Knowledge Assembly and Interpretation

Ju Han Kim 《PLoS computational biology》2012,8(12)

12.

Analyzing yeast protein-protein interaction data obtained from different sources (total citations: 1; self-citations: 0; citations by others: 1)

Bader GD Hogue CW 《Nature biotechnology》2002,20(10):991-997

High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone.
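
The difference between the "spoke" and "matrix" models can be shown with a short Python sketch that turns one purification (a bait plus its copurified proteins) into binary interactions. The function names and protein labels are hypothetical, and the code follows only the one-line definitions given in the abstract.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each associated protein, and nothing else."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model: connect every protein in the purification to every other one."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


if __name__ == "__main__":
    # Hypothetical purification: bait "A" pulls down "B", "C" and "D".
    print(len(spoke_pairs("A", ["B", "C", "D"])))
    print(len(matrix_pairs("A", ["B", "C", "D"])))
```

For a purification with n proteins, the spoke model yields n - 1 pairs while the matrix model yields n(n - 1)/2, so the matrix model adds many prey-prey pairs that the experiment did not test directly.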

13.

YGA: Identifying distinct biological features between yeast gene sets

Darby Tien-Hao Chang Wen-Si Li Yi-Han Bai Wei-Sheng Wu 《Gene》2013

14.

STARTS--a stable root transformation system for rapid functional analyses of proteins of the monocot model plant barley

Imani J Li L Schäfer P Kogel KH 《The Plant journal : for cell and molecular biology》2011,67(4):726-735

Large data sets are generated from plants by the various 'omics platforms. Currently, a limiting step in data analysis is the assessment of protein function and its translation into a biological context. The lack of robust high-throughput transformation systems for monocotyledonous plants, to which the vast majority of crop plants belong, is a major restriction and impedes exploitation of novel traits in agriculture. Here we present a stable root transformation system for barley, termed STARTS, that allows assessment of gene function in root tissues within 6 weeks. The system is based on the finding that a callus, produced on root induction medium from the scutellum of the immature embryo, is able to regenerate roots from single transformed cells by concomitant suppression of shoot development. Using Agrobacterium tumefaciens-mediated transfer of genes involved in root development and pathogenesis, we show that those calli regenerate large amounts of uniformly transformed roots for in situ functional analysis of newly expressed proteins.

15.

Nonparametric methods for identifying differentially expressed genes in microarray data (total citations: 11; self-citations: 0; citations by others: 11)

Troyanskaya OG Garber ME Brown PO Botstein D Altman RB 《Bioinformatics (Oxford, England)》2002,18(11):1454-1461

MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs. RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
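
As an illustration of the second approach, the rank sum statistic for one gene measured in two groups of experiments can be computed as below. This is a textbook sketch under stated assumptions, not the authors' implementation: tied values receive average ranks, no p-value is computed, and the function name and sample values are hypothetical.

```python
def rank_sum(group_a, group_b):
    """Return (W, U): the Wilcoxon rank sum of group_a and the matching Mann-Whitney U.

    Values from both groups are pooled and ranked from 1; tied values share
    the average of the ranks they occupy.
    """
    pooled = sorted(
        [(value, 0) for value in group_a] + [(value, 1) for value in group_b],
        key=lambda pair: pair[0],
    )
    w = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        members_of_a = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        w += average_rank * members_of_a
        i = j
    n_a = len(group_a)
    u = w - n_a * (n_a + 1) / 2
    return w, u


if __name__ == "__main__":
    # Hypothetical expression values of one gene in two groups of experiments.
    print(rank_sum([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

U ranges from 0 (every value in the first group lies below every value in the second) to the product of the two group sizes (the reverse), so values near either extreme indicate a gene that separates the groups well.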

16.

Functional annotation of hierarchical modularity

Padmanabhan K Wang K Samatova NF 《PloS one》2012,7(4):e33744

In biological networks of molecular interactions in a cell, network motifs that are biologically relevant are also functionally coherent, or form functional modules. These functionally coherent modules combine in a hierarchical manner into larger, less cohesive subsystems, thus revealing one of the essential design principles of system-level cellular organization and function: hierarchical modularity. Arguably, hierarchical modularity has not been explicitly taken into consideration by most, if not all, functional annotation systems. As a result, the existing methods would often fail to assign a statistically significant functional coherence score to biologically relevant molecular machines. We developed a methodology for hierarchical functional annotation. Given the hierarchical taxonomy of functional concepts (e.g., Gene Ontology) and the association of individual genes or proteins with these concepts (e.g., GO terms), our method will assign a Hierarchical Modularity Score (HMS) to each node in the hierarchy of functional modules; the HMS score and its p-value measure functional coherence of each module in the hierarchy. While existing methods annotate each module with a set of "enriched" functional terms in a bag of genes, our complementary method provides the hierarchical functional annotation of the modules and their hierarchically organized components. A hierarchical organization of functional modules often comes as a by-product of cluster analysis of gene expression data or protein interaction data. Otherwise, our method will automatically build such a hierarchy by directly incorporating the functional taxonomy information into the hierarchy search process and by allowing multi-functional genes to be part of more than one component in the hierarchy. In addition, its underlying HMS scoring metric ensures that functional specificity of the terms across different levels of the hierarchical taxonomy is properly treated.
We have evaluated our method using Saccharomyces cerevisiae data from KEGG and MIPS databases and several other computationally derived and curated datasets. The code and additional supplemental files can be obtained from http://code.google.com/p/functional-annotation-of-hierarchical-modularity/ (Accessed 2012 March 13).

17.

Functional Module Search in Protein Networks based on Semantic Similarity Improves the Analysis of Proteomics Data

Desislava Boyanova Santosh Nilla Gunnar W. Klau Thomas Dandekar Tobias Müller Marcus Dittrich 《Molecular & cellular proteomics : MCP》2014,13(7):1877-1889

18.

Novel search method for the discovery of functional relationships

Ramírez F Lawyer G Albrecht M 《Bioinformatics (Oxford, England)》2012,28(2):269-276

MOTIVATION: Numerous annotations are available that functionally characterize genes and proteins with regard to molecular process, cellular localization, tissue expression, protein domain composition, protein interaction, disease association and other properties. Searching this steadily growing amount of information can lead to the discovery of new biological relationships between genes and proteins. To facilitate the searches, methods are required that measure the annotation similarity of genes and proteins. However, most current similarity methods are focused only on annotations from the Gene Ontology (GO) and do not take other annotation sources into account. RESULTS: We introduce the new method BioSim that incorporates multiple sources of annotations to quantify the functional similarity of genes and proteins. We compared the performance of our method with four other well-known methods adapted to use multiple annotation sources. We evaluated the methods by searching for known functional relationships using annotations based only on GO or on our large data warehouse BioMyn. This warehouse integrates many diverse annotation sources of human genes and proteins. We observed that the search performance improved substantially for almost all methods when multiple annotation sources were included. In particular, our method outperformed the other methods in terms of recall and average precision.

19.

Global landscape of a co-expressed gene network in barley and its application to gene discovery in Triticeae crops

Mochida K Uehara-Yamaguchi Y Yoshida T Sakurai T Shinozaki K 《Plant & cell physiology》2011,52(5):785-803

20.

Probabilistic protein function prediction from heterogeneous genome-wide data

Nariai N Kolaczyk ED Kasif S 《PloS one》2007,2(3):e337

Dramatic improvements in high-throughput sequencing technologies have led to a staggering growth in the number of predicted genes. However, a large fraction of these newly discovered genes do not have a functional assignment. Fortunately, a variety of novel high-throughput genome-wide functional screening technologies provide important clues that shed light on gene function. The integration of heterogeneous data to predict protein function has been shown to improve the accuracy of automated gene annotation systems. In this paper, we propose and evaluate a probabilistic approach for protein function prediction that integrates protein-protein interaction (PPI) data, gene expression data, protein motif information, mutant phenotype data, and protein localization data. First, functional linkage graphs are constructed from PPI data and gene expression data, in which an edge between nodes (proteins) represents evidence for functional similarity. The assumption here is that graph neighbors are more likely to share protein function, compared to proteins that are not neighbors. The functional linkage graph model is then used in concert with protein domain, mutant phenotype and protein localization data to produce a functional prediction. Our method is applied to the functional prediction of Saccharomyces cerevisiae genes, using Gene Ontology (GO) terms as the basis of our annotation. In a cross-validation study we show that the integrated model increases recall by 18%, compared to using PPI data alone, at 50% precision. We also show that the integrated predictor is significantly better than each individual predictor. However, the observed improvement vs. PPI depends on both the new source of data and the functional category to be predicted. Surprisingly, in some contexts integration hurts overall prediction accuracy. Lastly, we provide a comprehensive assignment of putative GO terms to 463 proteins that currently have no assigned function.