Found 20 similar articles (search time: 28 ms)
1.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile

GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us,
is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this
paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in
protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach
automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls
on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast
protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of
our approach for predicting protein functions to “biology process” by three measures particularly designed for functional
classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific
functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown
at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
2.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile (total citations: 1; self-citations: 2; citations by others: 1)
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
3.
4.
5.
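With the widespread application of DNA chip technology, gene expression data analysis has become one of the research hotspots in the life sciences. This paper outlines the types of gene expression clustering techniques, the classification and characteristics of the algorithms, and the visualization and annotation of results; describes several popular and novel algorithms; introduces 17 recent related software packages and online web service tools; and discusses research trends in software tools.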
6.
A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance
7.
8.
Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools.
9.
Assessing reliability of gene clusters from gene expression data (total citations: 5; self-citations: 0; citations by others: 5)
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule consensus tree can be obtained, showing clusters that only occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering methods, they can be applied with other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.
10.
Wu B. Biostatistics (Oxford, England), 2007, 8(3): 566-575
We study statistical methods to detect cancer genes that are over- or down-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated from the t-statistic, the most commonly used differential gene expression detection method. Using real and simulation studies, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/ort.html.
11.
12.
The risk associated with exposure to hepatotoxic drugs is difficult to quantify. Animal experiments to assess their chronic toxicological impact are time consuming. New quantitative approaches to correlate gene expression changes caused by drug exposure to chronic toxicity are required. This article proposes a mathematical model entitled Toxicologic Prediction Network (TPN) to assess chronic hepatotoxicity based on subchronic hepatic gene expression data in rats. A directed graph accounts for the interactions between the drugs, differentially expressed genes and chronic hepatotoxicity. A knowledge-based mathematical model estimates phenotypical exposure risk such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma for rats. The network's edges encoding the interaction strength are determined by solving an inversion problem that minimizes the difference between the observed and the predicted relative gene expressions as well as the chronic toxicity data. A realistic case study demonstrates how chronic health risk of three halogenated aromatic hydrocarbons can be inferred from subchronic gene expression data. The advantages of the TPN are further demonstrated through two novel applications: Estimation of toxicological impact of new drugs and drug mixtures as well as rigorous determination of the optimal drug formulation to achieve maximum potency with minimum side-effects. Prediction of animal toxicity may be relevant for assessing risk for humans in the future.
13.
Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on graph structure constituted by biological knowledge. After applying a widely used dataset, we compared the result clusters of two of these algorithms in terms of the homogeneity of clusters and coherence of annotation and matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of Gene Ontology (GO)-Cluster software, which contains the genes that are most expression correlative and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster in a larger dataset.
14.
Analysis of large-scale gene expression data. (total citations: 10; self-citations: 0; citations by others: 10)
G Sherlock. Briefings in bioinformatics, 2001, 2(4): 350-362
DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.
15.
Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact when the mixed model equations can be partitioned into unrelated components: one for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous); and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.
16.
Although large-scale gene expression data have been studied from many perspectives, they have not been systematically integrated to infer the regulatory potentials of individual genes in specific pathways. Here we report the analysis of expression patterns of genes in the Calvin cycle from 95 Arabidopsis microarray experiments, which revealed a consistent gene regulation pattern in most experiments. This identified pattern, likely due to gene regulation by light rather than feedback regulations of the metabolite fluxes in the Calvin cycle, is remarkably consistent with the rate-limiting roles of the enzymes encoded by these genes reported from both experimental and modeling approaches. Therefore, the regulatory potential of the genes in a pathway may be inferred from their expression patterns. Furthermore, gene expression analysis in the context of a known pathway helps to categorize various biological perturbations that would not be recognized with the prevailing methods.
17.
Data analysis, not data production, is becoming the bottleneck in gene expression research. Data integration is necessary to cope with an ever increasing amount of data, to cross-validate noisy data sets, and to gain broad interdisciplinary views of large biological data sets. New Internet resources may help researchers to combine data sets across different gene expression platforms. However, noise and disparities in experimental protocols strongly limit data integration. A detailed review of four selected studies reveals how some of these limitations may be circumvented and illustrates what can be achieved through data integration.
18.
Differential display (DD) is one of the most commonly used approaches for identifying differentially expressed genes. However,
there has been a lack of accurate guidance on how many DD polymerase chain reaction (PCR) primer combinations are needed
to display most of the genes expressed in a eukaryotic cell. This study critically evaluated the gene coverage by DD as a
function of the number of arbitrary primers, the number of 3′ bases of an arbitrary primer required to completely match an
mRNA target sequence, the additional 5′ base match(es) of arbitrary primers in first-strand cDNA recognition, and the length
of mRNA tails being analyzed. The resulting new DD mathematical model predicts that 80 to 160 arbitrary 13mers, when used
in combination with 3 one-base anchored oligo-dT primers, would allow any given mRNA within a eukaryotic cell to be detected
with a 74% to 93% probability, respectively. The prediction was supported by both computer simulation of the DD process and
experimental data from a comprehensive fluorescent DD screening for target genes of tumor-suppressor p53. Thus, this work provides a theoretical foundation upon which global analysis of gene expression by DD can be pursued.
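As a rough plausibility check on the two coverage figures quoted in this abstract, the sketch below assumes that each arbitrary primer detects a given mRNA independently with the same probability. This independence model, and the names `coverage` and `p_single`, are illustrative assumptions and not the mathematical model of the paper. Under that assumption, coverage with n primers is 1 - (1 - p)^n, so doubling the primers from 80 to 160 squares the miss probability.

```python
def coverage(n_primers: int, p_single: float) -> float:
    """Probability that at least one of n_primers detects a given mRNA,
    assuming independent primers with equal per-primer probability."""
    return 1.0 - (1.0 - p_single) ** n_primers


# Back out the hypothetical per-primer probability from the reported
# 74% coverage at 80 arbitrary primers.
p_single = 1.0 - (1.0 - 0.74) ** (1.0 / 80)

# Under the independence assumption, 160 primers miss with
# probability 0.26 ** 2 = 0.0676.
cov_160 = coverage(160, p_single)
print(f"implied coverage at 160 primers: {cov_160:.4f}")
```

The implied value of about 93% agrees with the upper figure reported in the abstract, which suggests the two numbers are at least consistent with a simple independent-detection picture.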
19.
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models on prostate cancer progression. The latter application shows that with combined statistical/bioinformatics analyses, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.