Similar articles
Found 20 similar articles (search time: 0 ms)
1.
Increasing knowledge about the organization of proteins into complexes, systems, and pathways has led to a flowering of theoretical approaches for exploiting this knowledge in order to better learn the functions of proteins and their roles underlying phenotypic traits and diseases. Much of this body of theory has been developed and tested in model organisms, relying on their relative simplicity and genetic and biochemical tractability to accelerate the research. In this review, we discuss several of the major approaches for computationally combining proteomics and genomics observations into integrated protein networks, then applying guilt-by-association in these networks in order to identify genes underlying traits. Recent trends in this field include a rising appreciation of the modular network organization of proteins underlying traits or mutational phenotypes, and how to exploit such protein modularity using computational approaches related to the internet search algorithm PageRank. Many protein network-based predictions have recently been experimentally confirmed in yeast, worms, plants, and mice, and several successful approaches in model organisms have been directly translated to analyze human disease, with notable recent applications to glioma and breast cancer prognosis.
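As an illustration of PageRank-style guilt-by-association, the following is a minimal sketch of a random walk with restart on a small undirected network. It is not the method of any specific paper reviewed above; the gene names, the toy edges, the damping value, and the function name `personalized_pagerank` are all hypothetical and chosen only for illustration.

```python
def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Score nodes of an undirected network by a random walk that restarts at seed nodes."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the seed genes (assumed to be in the network).
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(restart)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbor passes on an equal share of its current score.
            inflow = sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            updated[n] = (1.0 - damping) * restart[n] + damping * inflow
        scores = updated
    return scores


# Hypothetical toy network: genes A and B are already linked to the trait (seeds).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
seeds = {"A", "B"}
scores = personalized_pagerank(edges, seeds)

# Rank the remaining genes by score: genes closer to the seeds rank higher.
candidates = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
```

In this toy network, the gene connected to both seeds ranks first among the candidates, which is the intuition behind guilt-by-association.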

2.
Cells exhibit a variety of phenotypes in different stages and diseases. Although several markers for cellular phenotypes have been identified, gene combinations denoting cellular phenotypes have not been completely elucidated. Recent advances in gene analysis have revealed that various gene expression patterns are observed in each cell type and state. In this review, perspectives on gene combinations in cellular phenotype presentation are discussed. Gene expression profiles change during cellular processes, such as cell proliferation, cell differentiation, and cell death. In addition, epigenetic regulation increases the complexity of the gene expression profile. The roles of gene combinations and panels of gene combinations in each cellular condition are also discussed.

3.
Statistical inference for simultaneous clustering of gene expression data   (Total citations: 1; self-citations: 0; citations by others: 1)
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function theta=Phi(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P(n). We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Phi(P(n)). The method is illustrated on a publicly available data set.

4.
Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there are a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We present first an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then also present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (measure of predictions made that are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhood's annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, such increased hierarchical precision performance is purchased at a modest expense of hierarchical recall (measure of all annotations that get predicted at all).
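The abstract describes hierarchical precision and recall only informally, so the sketch below uses one common set-based formulation as an assumption: both the predicted and the true terms are extended with all of their ancestors, and precision and recall are computed on the extended sets. The paper's exact definition may differ. The toy ontology, the term names, and the function names are hypothetical.

```python
def ancestors_closure(terms, parents):
    """Return the given terms together with all of their ancestors in the ontology."""
    closure = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closure:
            closure.add(term)
            stack.extend(parents.get(term, ()))
    return closure


def hierarchical_precision_recall(predicted, true, parents, root):
    """Compare ancestor-extended term sets, ignoring the uninformative root term."""
    pred = ancestors_closure(predicted, parents) - {root}
    gold = ancestors_closure(true, parents) - {root}
    shared = pred & gold
    precision = len(shared) / len(pred) if pred else 0.0
    recall = len(shared) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy ontology, given as child -> list of parents.
parents = {
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
}

# Predicting the parent term "binding" for a protein truly annotated to "dna_binding":
# everything predicted is correct, but half of the true annotation path is missed.
hp, hr = hierarchical_precision_recall({"binding"}, {"dna_binding"}, parents, "root")
```

Under this formulation a prediction that is correct but too general keeps full precision and loses recall, which matches the trade-off the abstract reports.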

5.
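Gene Ontology is widely used in the analysis of differentially expressed functional categories based on gene microarray data. Microarray technology suffers from problems such as missing detections and measurement errors. This paper examines how these two factors affect the mining of differentially expressed functional categories in Gene Ontology from gene expression profiles. The results show that differentially expressed functional categories are reasonably robust to interference from missing detections and measurement errors.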

6.
The function of the majority of genes in the human and mouse genomes is unknown. Investigating and illuminating this dark genome is a major challenge for the biomedical sciences. The International Mouse Phenotyping Consortium (IMPC) is addressing this through the generation and broad-based phenotyping of a knockout (KO) mouse line for every protein-coding gene, producing a multidimensional data set that underlies a genome-wide annotation map from genes to phenotypes. Here, we develop a multivariate (MV) statistical approach and apply it to IMPC data comprising 148 phenotypes measured across 4,548 KO lines. There are 4,256 (1.4% of 302,997 observed data measurements) hits called by the univariate (UV) model analysing each phenotype separately, compared to 31,843 (10.5%) hits in the observed data results of the MV model, corresponding to an estimated 7.5-fold increase in power of the MV model relative to the UV model. One key property of the data set is its 55.0% rate of missingness, resulting from quality control filters and incomplete measurement of some KO lines. This raises the question of whether it is possible to infer perturbations at phenotype–gene pairs at which data are not available, i.e., to infer some in vivo effects using statistical analysis rather than experimentation. We demonstrate that, even at missing phenotypes, the MV model can detect perturbations with power comparable to the single-phenotype analysis, thereby filling in the complete gene–phenotype map with good sensitivity. A factor analysis of the MV model’s fitted covariance structure identifies 20 clusters of phenotypes, with each cluster tending to be perturbed collectively. These factors cumulatively explain 75% of the KO-induced variation in the data and facilitate biological interpretation of perturbations. We also demonstrate that the MV approach strengthens the correspondence between IMPC phenotypes and existing gene annotation databases.
Analysis of a subset of KO lines measured in replicate across multiple laboratories confirms that the MV model increases power with high replicability.

The function of the majority of genes in the human and mouse genomes is unknown, and illuminating this "dark genome" is a major challenge for the biomedical sciences. This study shows that multi-dimensional phenotypes from single-gene knockout mouse lines can be analysed at a genome-wide scale both to increase power and infer missing phenotypes.

7.
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

8.
GESTs (gene expression similarity and taxonomy similarity), a gene functional prediction approach previously proposed by us, is based on gene expression similarity and concept similarity of functional classes defined in Gene Ontology (GO). In this paper, we extend this method to protein-protein interaction data by introducing several methods to filter the neighbors in protein interaction networks for a protein of unknown function(s). Unlike other conventional methods, the proposed approach automatically selects the most appropriate functional classes as specific as possible during the learning process, and calls on genes annotated to nearby classes to support the predictions to some small-sized specific classes in GO. Based on the yeast protein-protein interaction information from MIPS and a dataset of gene expression profiles, we assess the performances of our approach for predicting protein functions to “biology process” by three measures particularly designed for functional classes organized in GO. Results show that our method is powerful for widely predicting gene functions with very specific functional terms. Based on the GO database published in December 2004, we predict some proteins whose functions were unknown at that time, and some of the predictions have been confirmed by the new SGD annotation data published in April, 2006.

9.
Some efforts have been devoted to the text analysis of disease phenotype data, and their results indicate that similar disease phenotypes arise from functionally related genes. These related genes work together, as a functional module, to perform a desired cellular function. We constructed a text-based human disease phenotype network and detected 82 disease-specific gene functional modules, each corresponding to a different phenotype cluster, by means of graph-based clustering and mapping from disease phenotype to gene. Since genes in such gene functional modules are functionally related and cause clinically similar diseases, they may share a common genetic origin of their associated disease phenotypes. We believe the investigation may facilitate the ultimate understanding of the common pathophysiologic basis of associated diseases.

11.
In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets express differentially (enrichment and/or deletion) in variations of phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients.

13.
Learnability-based further prediction of gene functions in Gene Ontology   (Total citations: 9; self-citations: 0; citations by others: 9)
Tu K, Yu H, Guo Z, Li X. Genomics, 2004, 84(6): 922-928
Currently the functional annotations of many genes are not specific enough, limiting their further application in biology and medicine. It is necessary to push the gene functional annotations deeper in Gene Ontology (GO), or to predict further annotated genes with more specific GO terms. A framework of learnability-based further prediction of gene functions in GO is proposed in this paper. Local classifiers are constructed in local classification spaces rooted at qualified parent nodes in GO, and their classification performances are evaluated with the averaged Tanimoto index (ATI). Classification spaces with higher ATIs are selected out, and genes annotated only to the parent classes are predicted to child classes. Through learnability-based further predicting, the functional annotations of annotated genes are made more specific. Experiments on the fibroblast serum response dataset reported further functional predictions for several human genes and also gave interesting clues to the varied learnability between classes of different GO ontologies, different levels, and different numbers of child classes.
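The abstract does not spell out how the averaged Tanimoto index is computed. A minimal sketch, assuming it is the per-gene Tanimoto (Jaccard) similarity between the predicted and the actual sets of child classes, averaged over genes, could look as follows. The gene and class names and both function names are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) index of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an assumption made for this sketch).
    return len(a & b) / len(union) if union else 1.0


def averaged_tanimoto(predicted, actual):
    """Average the per-gene Tanimoto index over all genes with known classes."""
    genes = sorted(actual)
    return sum(tanimoto(predicted.get(g, ()), actual[g]) for g in genes) / len(genes)


# Hypothetical child-class annotations within one local classification space.
actual = {"g1": {"c1"}, "g2": {"c1", "c2"}}
predicted = {"g1": {"c1"}, "g2": {"c2"}}

# g1 is predicted exactly (index 1.0); g2 is half right (index 0.5).
ati = averaged_tanimoto(predicted, actual)
```

A classification space would then be kept for further prediction only if its averaged index exceeds some threshold, which the abstract does not specify.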

14.
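Ontologies are playing an irreplaceable role in the recently emerging fields of biomedical big data and precision medicine research. However, many people are confused about the definition, types, and applications of ontologies. This review addresses these questions. Specifically, an ontology uses a set of terms and relations that both humans and computers can understand to represent entities/concepts and the relationships among them. According to their roles, ontologies fall into three categories: enhanced controlled vocabularies with multi-relational models, ontological knowledge systems that reflect a particular knowledge domain, and metadata ontologies that semantically constrain and standardize metadata. Based on these roles, ontologies are widely applied to the standardization, integration, retrieval, and analysis of biomedical data, as well as to knowledge mining. Contemporary ontology research is developing rapidly; in China it has only just begun, and opportunities and challenges coexist. Broad domestic and international collaborative research on ontologies will raise the overall level of biomedical ontology research in China and improve the capacity for integrating biological and clinical research big data and for precision medicine research.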

17.
Human embryogenesis includes an integrated set of complex yet coordinated development of different organs and tissues, which is regulated by the spatiotemporal expression of many genes. Deciphering the gene regulation profile is essential for understanding the molecular basis of human embryo development. While molecular and genetic studies in mouse have served as a valuable tool to understand mammalian development, significant differences exist between human and mouse development at the morphological and genomic levels. Thus it is important to carry out research directly on human embryonic development. Here we review some recent studies on gene regulation during human embryogenesis, with particular focus on the period of organogenesis, which had not been well studied previously. We highlight a gene expression database of human embryos from the 4th to the 9th week. The analysis of gene regulation during this period reveals that genes functioning in a given developmental process tend to be coordinately regulated during human embryogenesis. This feature allows us to use this database to identify new genes important for a particular developmental process/pathway and deduce the potential function of a novel gene during organogenesis. Such a gene expression atlas should serve as an important resource for molecular study of human development and pathogenesis.

18.

Background

The past few years have seen a rapid development in novel high-throughput technologies that have created large-scale data on protein-protein interactions (PPI) across human and most model species. These data are commonly represented as networks, with nodes representing proteins and edges representing the PPIs. A fundamental challenge in bioinformatics is how to interpret this wealth of data to elucidate the patterns of interaction and the biological characteristics of the proteins. One significant purpose of this interpretation is to predict unknown protein functions. Although many approaches have been proposed in recent years, the challenge remains of how to measure the functional similarities between proteins reasonably and precisely in order to improve prediction effectiveness.

Results

We used a Semantic and Layered Protein Function Prediction (SLPFP) framework to more effectively predict unknown protein functions at different functional levels. The framework relies on a new protein similarity measurement and a clustering-based protein function prediction algorithm. The new protein similarity measurement incorporates the topological structure of the PPI network, as well as the protein’s semantic information in terms of known protein functions at different functional layers. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed framework in predicting unknown protein functions.

Conclusion

The proposed framework has a higher prediction accuracy compared with other similar approaches. The prediction results are stable even for a large number of proteins. Furthermore, the framework is able to predict unknown functions at different functional layers within the Munich Information Center for Protein Sequence (MIPS) hierarchical functional scheme. The experimental results demonstrated that the new protein similarity measurement reflects the relationships between proteins more reasonably and precisely.

19.
Assessing reliability of gene clusters from gene expression data   (Total citations: 5; self-citations: 0; citations by others: 5)
The rapid development of microarray technologies has raised many challenging problems in experiment design and data analysis. Although many numerical algorithms have been successfully applied to analyze gene expression data, the effects of variations and uncertainties in measured gene expression levels across samples and experiments have been largely ignored in the literature. In this article, in the context of hierarchical clustering algorithms, we introduce a statistical resampling method to assess the reliability of gene clusters identified from any hierarchical clustering method. Using the clustering trees constructed from the resampled data, we can evaluate the confidence value for each node in the observed clustering tree. A majority-rule consensus tree can be obtained, showing only clusters that occur in a majority of the resampled trees. We illustrate our proposed methods with applications to two published data sets. Although the methods are discussed in the context of hierarchical clustering methods, they can be applied with other cluster-identification methods for gene expression data to assess the reliability of any gene cluster of interest.
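The resampling idea can be sketched as follows. The abstract does not state the resampling scheme, so this example assumes a plain bootstrap over samples (columns) and, for brevity, scores flat clusters from a single-linkage cut instead of every node of the tree. The toy expression values and the function names `single_linkage` and `cluster_confidence` are hypothetical.

```python
import random


def single_linkage(profiles, k):
    """Agglomerate genes by single linkage (Euclidean distance) until k clusters remain."""
    clusters = [frozenset([g]) for g in profiles]

    def dist(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(
            sum((x - y) ** 2 for x, y in zip(profiles[g], profiles[h])) ** 0.5
            for g in a
            for h in b
        )

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return set(clusters)


def cluster_confidence(profiles, k, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which each observed cluster reappears exactly."""
    rng = random.Random(seed)
    n_cols = len(next(iter(profiles.values())))
    observed = single_linkage(profiles, k)
    counts = {c: 0 for c in observed}
    for _ in range(n_resamples):
        # Resample the samples (columns) with replacement.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        resampled = {g: [v[c] for c in cols] for g, v in profiles.items()}
        for c in single_linkage(resampled, k) & observed:
            counts[c] += 1
    return {c: n / n_resamples for c, n in counts.items()}


# Hypothetical expression profiles: two clearly separated groups of genes.
profiles = {
    "g1": [0.0, 0.0, 0.0, 0.0],
    "g2": [0.1, 0.2, 0.1, 0.0],
    "g3": [5.0, 5.0, 5.0, 5.0],
    "g4": [5.1, 4.9, 5.2, 5.0],
}
confidence = cluster_confidence(profiles, k=2)
```

Clusters with a confidence above 0.5 correspond to those a majority-rule consensus would retain; with noisier data the values would fall below 1.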

20.
Comprehensive analysis of keratin gene clusters in humans and rodents   (Total citations: 1; self-citations: 0; citations by others: 1)
Here, we present the comparative analysis of the two keratin (K) gene clusters in the genomes of man, mouse and rat. Overall, there is a remarkable but not perfect synteny among the clusters of the three mammalian species. The human type I keratin gene cluster consists of 27 genes and 4 pseudogenes, all in the same orientation. It is interrupted by a domain of multiple genes encoding keratin-associated proteins (KAPs). Cytokeratin, hair and inner root sheath keratin genes are grouped together in small subclusters, indicating that evolution occurred by duplication events. At the end of the rodent type I gene cluster, a novel gene related to K14 and K17 was identified, which is converted to a pseudogene in humans. The human type II cluster consists of 27 genes and 5 pseudogenes, most of which are arranged in the same orientation. Of the 26 type II murine keratin genes now known, the expression of two new genes was identified by RT-PCR. Kb20, the first gene in the cluster, was detected in lung tissue. Kb39, a new ortholog of K1, is expressed in certain stratified epithelia. It represents a candidate gene for those hyperkeratotic skin syndromes in which no K1 mutations have been identified so far. Most remarkably, the human K3 gene, which causes Meesmann's corneal dystrophy when mutated, lacks a counterpart in the mouse genome. While the human genome has 138 pseudogenes related to K8 and K18, the mouse and rat genomes contain only 4 and 6 such pseudogenes. Our results also provide the basis for a unified keratin nomenclature and for future functional studies.
