1.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us,
is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this
paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in
protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach
automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls
on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast
protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of
our approach for predicting protein functions to “biology process” by three measures particularly designed for functional
classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific
functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown
at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
2.
Widely predicting specific protein functions based on protein-protein interaction data and gene expression profile (total citations: 1; self-citations: 2; citations by others: 1)
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.
3.
4.
5.
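With the widespread use of DNA chip (microarray) technology, gene expression data analysis has become one of the research hotspots in the life sciences. This paper outlines the types of gene expression clustering techniques, the classification and characteristics of the algorithms, and the visualization and annotation of results; describes some popular and novel algorithms; introduces 17 recent software packages and online web service tools; and discusses research trends in software tools.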
6.
A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance
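To make the idea of searching for a near-optimal set of medoids concrete, here is a minimal sketch. It is not the HGACLUS schema: it uses only a plain simulated-annealing swap search, with no genetic algorithm component. The function names, the Euclidean distance, the cooling schedule, and all parameter defaults are assumptions chosen for illustration.

```python
import math
import random


def medoid_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid.

    Lower values mean tighter (more cohesive) clusters.
    `medoids` is a list of indices into `points`.
    """
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def anneal_medoids(points, k, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Simulated-annealing search over medoid sets (illustrative only).

    Assumes 0 < k < len(points). Returns (sorted best medoid indices, best cost).
    """
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), k)
    cost = medoid_cost(points, current)
    best, best_cost = list(current), cost
    temp = t0
    for _ in range(steps):
        # Propose a neighbor: swap one medoid for a random non-medoid point.
        candidate = list(current)
        non_medoids = [i for i in range(len(points)) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(non_medoids)
        cand_cost = medoid_cost(points, candidate)
        delta = cand_cost - cost
        # Always accept improvements; accept worse moves with
        # probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cost = candidate, cand_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        temp *= cooling
    return sorted(best), best_cost
```

Calling `anneal_medoids(points, k=2)` on a list of coordinate tuples returns the best medoid indices seen during the search together with their cost. Accepting occasional worse swaps early on is what lets the search escape poor local optima.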
7.
8.
Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools.
9.
10.
Assessing reliability of gene clusters from gene expression data (total citations: 5; self-citations: 0; citations by others: 5)
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis.
Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations
and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature.
In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess
the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed
from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule
consensus tree can be obtained, showing clusters that only occur in a majority of the resampled trees. We illustrate our proposed
methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering
methods, they can be applied with other cluster-identification methods for gene expression data to assess the reliability
of any gene cluster of interest.
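The resampling idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it uses a nonparametric bootstrap over samples (columns), Euclidean distance, and average linkage, any of which may differ from the paper. All function names are hypothetical.

```python
import random
from itertools import combinations


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def tree_clusters(profiles):
    """Average-linkage agglomerative clustering of genes (rows).

    Returns the set of clusters (frozensets of gene indices) that
    appear as internal nodes of the resulting tree, root included.
    """
    n = len(profiles)
    dist = {frozenset((i, j)): distance(profiles[i], profiles[j])
            for i, j in combinations(range(n), 2)}

    def linkage(c1, c2):
        total = sum(dist[frozenset((i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    active = [frozenset([i]) for i in range(n)]
    nodes = set()
    while len(active) > 1:
        # Merge the two closest active clusters.
        c1, c2 = min(combinations(active, 2), key=lambda pair: linkage(*pair))
        merged = c1 | c2
        active = [c for c in active if c not in (c1, c2)] + [merged]
        nodes.add(merged)
    return nodes


def node_confidence(data, n_resamples=200, seed=0):
    """Fraction of resampled trees that contain each observed node."""
    rng = random.Random(seed)
    n_cols = len(data[0])
    observed = tree_clusters(data)
    counts = {node: 0 for node in observed}
    for _ in range(n_resamples):
        # Resample the samples (columns) with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = [[row[c] for c in cols] for row in data]
        for node in tree_clusters(resampled) & observed:
            counts[node] += 1
    return {node: count / n_resamples for node, count in counts.items()}


def majority_rule(confidence):
    """Keep only the observed nodes found in more than half of the resampled trees."""
    return {node for node, value in confidence.items() if value > 0.5}
```

Here `data` is a list of rows, one per gene, with one column per sample. Note that `majority_rule` filters only the nodes of the observed tree, which is a simplification of a full majority-rule consensus tree built from all resampled trees.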
11.
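Differential gene expression and the mechanisms underlying heterosis (total citations: 6; self-citations: 0; citations by others: 6)
Although heterosis, a widespread and important biological phenomenon, has been studied for more than a century, its fundamental mechanism has not yet been clearly explained. Following studies of differences in genome composition and of gene effects, differential gene expression has become a new entry point for exploring the molecular mechanism of heterosis. The aim is to understand the molecular mechanism of heterosis by revealing the regulatory mechanisms of differential allelic expression in hybrids and of differential gene expression between hybrids and their parents, and thereby to guide breeding practice. This article outlines the phenomenon of differential allelic expression in hybrids and its causes; summarizes the differential expression patterns that hybrids show relative to their parents, including additive, dominant, and overdominant patterns; and reviews the heterosis-related genes identified by expression profiling studies, as well as the contribution of certain key biochemical metabolic pathways to heterosis. Because the mechanism of heterosis is complex, however, gene expression studies have not yielded a unified expression pattern, and most heterosis genes cannot be assigned to a single category. Nevertheless, expression profiling has taken the first step toward dissecting the complex gene expression network underlying heterosis, and with continued advances in expression profiling technology and bioinformatics, a breakthrough in understanding the molecular mechanism of heterosis at the level of gene expression can be expected.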
12.
Wu B. Biostatistics (Oxford, England), 2007, 8(3): 566-575
We study statistical methods to detect cancer genes that are over- or down-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated from the t-statistic, the most commonly used differential gene expression detection method. Using real and simulation studies, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/ort.html.
13.
14.
Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method, the clustering of regression models method, which groups genes that share a similar relationship to the covariate(s). This method provides a unified approach for a family of clustering procedures and can be applied to data collected with various experimental designs. In addition, when combined with per-gene methods for assessing differential expression that employ the same regression modeling structure, an integrated framework for the analysis of microarray data is obtained. The proposed methodology was applied to two microarray data sets, one from a breast cancer study and the other from a yeast cell cycle study.
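The abstract mentions adjustments for multiplicity without naming one. As an illustration, the sketch below implements the Benjamini-Hochberg adjustment, one widely used choice. That choice is an assumption made here, not something the abstract states, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each gene's adjusted value is the smallest p * m / rank over all
    genes whose p-value is at least as large as its own, capped at 1.
    """
    m = len(p_values)
    # Gene indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Given one p-value per gene from the per-gene regression models, genes whose adjusted value falls below a chosen threshold (for example 0.05) would be reported as significant while controlling the false discovery rate at that level.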
15.
The risk associated with exposure to hepatotoxic drugs is difficult to quantify. Animal experiments to assess their chronic toxicological impact are time consuming. New quantitative approaches to correlate gene expression changes caused by drug exposure to chronic toxicity are required. This article proposes a mathematical model entitled Toxicologic Prediction Network (TPN) to assess chronic hepatotoxicity based on subchronic hepatic gene expression data in rats. A directed graph accounts for the interactions between the drugs, differentially expressed genes and chronic hepatotoxicity. A knowledge-based mathematical model estimates phenotypical exposure risk such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma for rats. The network's edges encoding the interaction strength are determined by solving an inversion problem that minimizes the difference between the observed and the predicted relative gene expressions as well as the chronic toxicity data. A realistic case study demonstrates how chronic health risk of three halogenated aromatic hydrocarbons can be inferred from subchronic gene expression data. The advantages of the TPN are further demonstrated through two novel applications: Estimation of toxicological impact of new drugs and drug mixtures as well as rigorous determination of the optimal drug formulation to achieve maximum potency with minimum side-effects. Prediction of animal toxicity may be relevant for assessing risk for humans in the future.
16.
Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on graph structure constituted by biological knowledge. After applying a widely used dataset, we compared the result clusters of two of these algorithms in terms of the homogeneity of clusters and coherence of annotation and matching ratio. The results show that the clusters of knowledge-guided analysis are the kernel parts of the clusters of Gene Ontology (GO)-Cluster software, which contains the genes that are most expression correlative and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster in a larger dataset.
17.
Analysis of large-scale gene expression data (total citations: 10; self-citations: 0; citations by others: 10)
G Sherlock. Briefings in Bioinformatics, 2001, 2(4): 350-362
DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.
18.
Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact, when the mixed model equations can be partitioned into unrelated components: One for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous) and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.
19.
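Current status and development of clustering analysis techniques for gene expression (total citations: 5; self-citations: 0; citations by others: 5)
With the completion of genome sequencing for many organisms and the widespread use of DNA chip technology, the analysis of gene expression data has become a research hotspot of the post-genomic era. Cluster analysis groups functionally related genes according to the similarity of their expression profiles, which helps in studying genes of unknown function, and it is currently one of the main computational techniques in gene expression analysis. Many clustering algorithms have been applied to gene expression data; because they differ in focus and underlying principles, each has its own strengths and weaknesses. Analyzing the effectiveness of the various clustering algorithms and developing new methods suited to gene expression data is an urgent task.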
20.
The primitive epithelium of embryonic chicken proventriculus (glandular stomach) differentiates, after day 6 of incubation, into luminal epithelium, which faces the lumen and abundantly secretes mucus, and glandular epithelium, which invaginates into mesenchyme and later expresses embryonic chicken pepsinogen (ECPg). So far it is not well understood how undifferentiated epithelial cells differentiate into these two distinct cell populations. Spasmolytic polypeptide (SP) is known to be expressed in surface mucous cells of mammalian stomach. In order to obtain the differentiation marker for proventricular luminal epithelial cells, we cloned a cDNA encoding chicken SP (cSP). Sequence analysis indicated that cSP has the duplicated cysteine-rich domain characteristic of SP. Examination of the spatial and temporal expression pattern of cSP gene revealed that, during embryogenesis, cSP was expressed in luminal epithelial cells of the proventriculus, gizzard, small intestine, and lung, but not the esophagus. In the proventriculus, cSP mRNA was first detected on day 8 of incubation and was localized to differentiated luminal epithelial cells. By using cSP as a molecular marker, the effects of mesenchyme on the differentiation of epithelium were analyzed in vitro. On the basis of these data, a model is presented concerning the differentiation of proventricular epithelium.