Similar articles
 20 similar articles found (search time: 28 ms)
1.
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and on the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend the method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions under “biological process” with three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for predicting gene functions broadly with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.

2.
GESTs (gene expression similarity and taxonomy similarity), a gene function prediction approach that we proposed previously, is based on gene expression similarity and on the concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend the method to protein-protein interaction data by introducing several methods for filtering the neighbors of a protein of unknown function in protein interaction networks. Unlike conventional methods, the proposed approach automatically selects the most appropriate functional classes, as specific as possible, during the learning process, and calls on genes annotated to nearby classes to support predictions for small, specific classes in GO. Using yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performance of our approach in predicting protein functions under “biological process” with three measures designed specifically for functional classes organized in GO. The results show that our method is powerful for predicting gene functions broadly with very specific functional terms. Based on the GO database published in December 2004, we predict functions for some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April 2006.

3.
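The k-means clustering algorithm is an iterative relocation algorithm widely used in the cluster analysis of gene expression data. It usually represents relationships between genes by distance, which cannot effectively reflect the interdependence among genes. To address this, a k-modes clustering algorithm based on information theory is proposed that overcomes this shortcoming. In addition, the pseudo-F statistic is introduced: on the one hand, it allows points that partially overlap in space to be classified effectively; on the other hand, it gives the optimal number of clusters, compensating for a weakness of the k-modes method. Together these make the algorithm very effective and yield better clustering results.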
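As a rough illustration of how a pseudo-F statistic can guide the choice of cluster number, the sketch below computes the common Calinski-Harabasz form, (between-cluster sum of squares / (k - 1)) divided by (within-cluster sum of squares / (n - k)), for a given hard clustering. It assumes numeric data and squared Euclidean distance; the abstract does not say which variant the authors use with k-modes, so the function name `pseudo_f` and the distance choice are assumptions made for illustration only.

```python
def pseudo_f(points, labels):
    """Pseudo-F (Calinski-Harabasz form) for a hard clustering.

    points: list of equal-length numeric vectors
    labels: cluster label for each point
    Assumption: squared Euclidean distance; the cited paper may differ.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("pseudo-F needs 2 <= k < n")

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(points)
    between = 0.0  # size-weighted spread of cluster centroids
    within = 0.0   # spread of points around their own centroid
    for rows in clusters.values():
        centroid = mean(rows)
        between += len(rows) * sqdist(centroid, grand_mean)
        within += sum(sqdist(r, centroid) for r in rows)

    if within == 0:
        return float("inf")  # perfectly tight clusters
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(pseudo_f(data, [0, 0, 1, 1]))  # well-separated grouping
    print(pseudo_f(data, [0, 1, 0, 1]))  # poor grouping scores lower
```

In practice one would cluster the data for several candidate values of k and prefer the k with the largest pseudo-F; that selection rule is the usual convention and is assumed here rather than taken from the abstract.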

4.
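PCR-based techniques for differential gene expression analysis   Total citations: 2, self-citations: 0, citations by others: 2
Differential gene expression analysis is a direct and effective route to studying the molecular basis of many biological processes. Since the DDRT-PCR technique was established, a series of PCR-based differential expression analysis techniques, such as SAGE, SSH, RDA and DNA microarrays, have been developed in succession, providing faster and more sensitive tools for analyzing and cloning differentially expressed genes. This article briefly reviews these methods, compares their advantages and disadvantages, and discusses future directions for differential gene expression research techniques.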

5.
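With the wide application of DNA chip technology, gene expression data analysis has become one of the research hotspots in the life sciences. This article outlines the types of gene expression clustering techniques, the classification and characteristics of the algorithms, and the visualization and annotation of results; describes some popular and novel algorithms; introduces 17 recent software packages and online web service tools; and discusses research trends in software tools.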

6.
A hybrid GA (genetic algorithm)-based clustering (HGACLUS) schema, combining the merits of simulated annealing, was described for finding an optimal or near-optimal set of medoids. This schema maximized the clustering success by achieving internal cluster cohesion and external cluster isolation. The performance

7.
8.
Array-based gene expression studies frequently serve to identify genes that are expressed differently under two or more conditions. The actual analysis of the data, however, may be hampered by a number of technical and statistical problems. Possible remedies on the level of computational analysis lie in appropriate preprocessing steps, proper normalization of the data and application of statistical testing procedures in the derivation of differentially expressed genes. This review summarizes methods that are available for these purposes and provides a brief overview of the available software tools.

9.
Assessing reliability of gene clusters from gene expression data   Total citations: 5, self-citations: 0, citations by others: 5
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule consensus tree can be obtained, showing only clusters that occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering methods, they can be applied with other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.
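The idea of scoring a cluster by how often it reappears in trees built from resampled data can be sketched as follows. This is not the authors' procedure: the abstract does not specify the resampling scheme or the linkage, so the sketch assumes a nonparametric bootstrap over samples (columns), Euclidean distance, and single linkage, chosen because a gene set is a node of a single-linkage tree exactly when its genes can be joined using only links shorter than the smallest distance to any outside gene (ignoring ties). The names `is_single_linkage_cluster` and `cluster_confidence` are hypothetical.

```python
import random


def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def is_single_linkage_cluster(profiles, members):
    """True if `members` forms a node of the single-linkage tree.

    profiles: dict mapping gene name -> list of expression values
    members:  set of gene names to test
    """
    members = set(members)
    outside = [g for g in profiles if g not in members]
    # Smallest distance from the candidate cluster to any outside gene.
    gap = min(
        (euclid(profiles[m], profiles[o]) for m in members for o in outside),
        default=float("inf"),
    )
    # Grow from one member using only within-cluster links shorter than the gap.
    start = next(iter(members))
    reached, frontier = {start}, [start]
    while frontier:
        g = frontier.pop()
        for h in members - reached:
            if euclid(profiles[g], profiles[h]) < gap:
                reached.add(h)
                frontier.append(h)
    return reached == members


def cluster_confidence(profiles, members, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which `members` is still a cluster.

    Assumption: samples (columns) are resampled with replacement.
    """
    rng = random.Random(seed)
    n_samples = len(next(iter(profiles.values())))
    hits = 0
    for _ in range(n_boot):
        cols = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {g: [v[c] for c in cols] for g, v in profiles.items()}
        hits += is_single_linkage_cluster(resampled, members)
    return hits / n_boot


if __name__ == "__main__":
    toy = {
        "g1": [0, 0, 0, 0],
        "g2": [1, 1, 1, 1],
        "g3": [10, 10, 10, 10],
        "g4": [11, 11, 11, 11],
    }
    print(cluster_confidence(toy, {"g1", "g2"}))
    print(cluster_confidence(toy, {"g2", "g3"}))
```

Applying `cluster_confidence` to every node of the observed tree and keeping the nodes whose value exceeds 0.5 would give a majority-rule style summary in the spirit of the abstract; the 0.5 cutoff is the conventional reading of "majority" and is assumed here.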

10.
We study statistical methods to detect cancer genes that are over- or under-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated by the t-statistic, the most commonly used method for detecting differential gene expression. Using real and simulated data, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/~baolin/research/ort.html.

11.
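Gao Lei, Zhu Mingzhu, Guo Zheng, Li Xia. 《生物信息学》, 2006, 4(3): 105-108
Gene expression profile data are used to screen and refine protein interaction networks by computing the expression correlation coefficient of each pair of interacting proteins. The results show that function prediction with the neighbor-counting method and the chi-square method improves markedly when the filtered interaction data are used, and that neighbors farther from the protein under study also carry information consistent with its function.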
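A minimal sketch of the filtering step described above, assuming Pearson correlation over expression profiles and a fixed cutoff. The abstract states only that a correlation coefficient is computed for interacting pairs, so the choice of Pearson, the cutoff value `min_corr=0.5`, the decision to drop pairs lacking expression data, and the function names are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def filter_interactions(interactions, expression, min_corr=0.5):
    """Keep interacting pairs whose expression profiles correlate strongly.

    interactions: iterable of (protein_a, protein_b) pairs
    expression:   dict mapping protein/gene name -> expression profile
    min_corr:     hypothetical cutoff; not taken from the cited paper
    Pairs with a missing profile are dropped (an assumption).
    """
    kept = []
    for a, b in interactions:
        if a in expression and b in expression:
            if pearson(expression[a], expression[b]) >= min_corr:
                kept.append((a, b))
    return kept


if __name__ == "__main__":
    expression = {
        "A": [1, 2, 3, 4],
        "B": [2, 4, 6, 8],
        "C": [4, 3, 2, 1],
    }
    pairs = [("A", "B"), ("A", "C"), ("A", "D")]
    print(filter_interactions(pairs, expression))
```

The filtered pair list can then be passed to whatever neighbor-based function predictor is in use, for example neighbor counting over the retained edges.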

12.
The risk associated with exposure to hepatotoxic drugs is difficult to quantify. Animal experiments to assess their chronic toxicological impact are time consuming. New quantitative approaches to correlate gene expression changes caused by drug exposure to chronic toxicity are required. This article proposes a mathematical model entitled Toxicologic Prediction Network (TPN) to assess chronic hepatotoxicity based on subchronic hepatic gene expression data in rats. A directed graph accounts for the interactions between the drugs, differentially expressed genes and chronic hepatotoxicity. A knowledge-based mathematical model estimates phenotypical exposure risks such as toxic hepatopathy, diffuse fatty change and hepatocellular adenoma for rats. The network's edges, which encode the interaction strength, are determined by solving an inversion problem that minimizes the difference between the observed and the predicted relative gene expressions as well as the chronic toxicity data. A realistic case study demonstrates how the chronic health risk of three halogenated aromatic hydrocarbons can be inferred from subchronic gene expression data. The advantages of the TPN are further demonstrated through two novel applications: estimation of the toxicological impact of new drugs and drug mixtures, and rigorous determination of the optimal drug formulation to achieve maximum potency with minimum side-effects. Prediction of animal toxicity may be relevant for assessing risk for humans in the future.

13.
Although many numerical clustering algorithms have been applied to gene expression data analysis, the essential step is still biological interpretation by manual inspection. The correlation between genetic co-regulation and affiliation to a common biological process is what biologists expect. Here, we introduce some clustering algorithms that are based on a graph structure built from biological knowledge. Using a widely used dataset, we compared the resulting clusters of two of these algorithms in terms of cluster homogeneity, coherence of annotation, and matching ratio. The results show that the clusters from knowledge-guided analysis are the kernel parts of the clusters from the Gene Ontology (GO)-Cluster software, containing the genes that are most correlated in expression and most consistent with biological functions. Moreover, knowledge-guided analysis seems much more applicable than GO-Cluster to larger datasets.

14.
Analysis of large-scale gene expression data.   Total citations: 10, self-citations: 0, citations by others: 10
DNA microarray technology has resulted in the generation of large complex data sets, such that the bottleneck in biological investigation has shifted from data generation to data analysis. This review discusses some of the algorithms and tools for the analysis and organisation of microarray expression data, including clustering methods, partitioning methods, and methods for correlating expression data to other biological data.

15.
Currently, linear mixed model analyses of expression microarray experiments are performed either in a gene-specific or global mode. The joint analysis provides more flexibility in terms of how parameters are fitted and estimated and tends to be more powerful than the gene-specific analysis. Here we show how to implement the gene-specific linear mixed model analysis as an exact algorithm for the joint linear mixed model analysis. The gene-specific algorithm is exact when the mixed model equations can be partitioned into unrelated components: one for all global fixed and random effects and the others for the gene-specific fixed and random effects for each gene separately. This unrelatedness holds under three conditions: (1) any gene must have the same number of replicates or probes on all arrays, but these numbers can differ among genes; (2) the residual variance of the (transformed) expression data must be homogeneous or constant across genes (other variance components need not be homogeneous); and (3) the number of genes in the experiment is large. When these conditions are violated, the gene-specific algorithm is expected to be nearly exact.

16.
Although large-scale gene expression data have been studied from many perspectives, they have not been systematically integrated to infer the regulatory potentials of individual genes in specific pathways. Here we report the analysis of expression patterns of genes in the Calvin cycle from 95 Arabidopsis microarray experiments, which revealed a consistent gene regulation pattern in most experiments. This identified pattern, likely due to gene regulation by light rather than feedback regulations of the metabolite fluxes in the Calvin cycle, is remarkably consistent with the rate-limiting roles of the enzymes encoded by these genes reported from both experimental and modeling approaches. Therefore, the regulatory potential of the genes in a pathway may be inferred from their expression patterns. Furthermore, gene expression analysis in the context of a known pathway helps to categorize various biological perturbations that would not be recognized with the prevailing methods.

17.
Data analysis, not data production, is becoming the bottleneck in gene expression research. Data integration is necessary to cope with an ever increasing amount of data, to cross-validate noisy data sets, and to gain broad interdisciplinary views of large biological data sets. New Internet resources may help researchers to combine data sets across different gene expression platforms. However, noise and disparities in experimental protocols strongly limit data integration. A detailed review of four selected studies reveals how some of these limitations may be circumvented and illustrates what can be achieved through data integration.

18.
Differential display (DD) is one of the most commonly used approaches for identifying differentially expressed genes. However, accurate guidance has been lacking on how many DD polymerase chain reaction (PCR) primer combinations are needed to display most of the genes expressed in a eukaryotic cell. This study critically evaluated the gene coverage by DD as a function of the number of arbitrary primers, the number of 3′ bases of an arbitrary primer required to completely match an mRNA target sequence, the additional 5′ base match(es) of arbitrary primers in first-strand cDNA recognition, and the length of mRNA tails being analyzed. The resulting new DD mathematical model predicts that 80 to 160 arbitrary 13mers, when used in combination with 3 one-base anchored oligo-dT primers, would allow any given mRNA within a eukaryotic cell to be detected with a 74% to 93% probability, respectively. The prediction was supported by both computer simulation of the DD process and experimental data from a comprehensive fluorescent DD screening for target genes of tumor-suppressor p53. Thus, this work provides a theoretical foundation upon which global analysis of gene expression by DD can be pursued.

19.
We propose a model-based approach to unify clustering and network modeling using time-course gene expression data. Specifically, our approach uses a mixture model to cluster genes. Genes within the same cluster share a similar expression profile. The network is built over cluster-specific expression profiles using state-space models. We discuss the application of our model to simulated data as well as to time-course gene expression data arising from animal models of prostate cancer progression. The latter application shows that with combined statistical and bioinformatics analyses, we are able to extract gene-to-gene relationships supported by the literature as well as new plausible relationships.

20.
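Because gene expression data have high attribute dimensionality and few samples, the Fisher classifier does not perform well on such data. This paper proposes Fisher-List, an improved version of the Fisher algorithm. Its distinctive feature is that it determines a decision threshold for each class, and each threshold contains both overall sample information and information about certain individual samples that are crucial for classification. Experiments show that the new algorithm outperforms Fisher, LogitBoost, AdaBoost, k-nearest neighbors, decision trees, and support vector machines in classifying gene expression data.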

