Similar literature
 20 similar articles found (search time: 31 ms)
1.
One of the most important tasks of modern bioinformatics is the development of computational tools that can be used to understand and treat human disease. To date, a variety of methods have been explored, and algorithms for candidate gene prioritization are becoming increasingly useful. Here, we propose an algorithm for detecting gene-disease associations based on the human protein-protein interaction network, known gene-disease associations, protein sequence, and protein functional information at the molecular level. Our method, PhenoPred, is supervised: first, we mapped each gene/protein onto the spaces of disease and functional terms based on its distance to all annotated proteins in the protein interaction network. We also encoded sequence, function, physicochemical, and predicted structural properties, such as secondary structure and flexibility. We then trained support vector machines to detect gene-disease associations for a number of terms in Disease Ontology and provided evidence that, despite the noise and incompleteness of experimental data and the unfinished ontology of diseases, identification of candidate genes can be successful even when predictions are made for a large number of candidate disease terms simultaneously. Availability: www.phenopred.org.

2.
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.

3.
Microarray technology is widely used by biological researchers to identify genes associated with conditions such as diseases and drugs. To date, many methods have been developed to analyze data covering a large number of genes, but they focus only on statistical significance and cannot interpret the data in terms of biological concepts. Gene Ontology (GO) is used to give the data a biological interpretation; however, it is restricted to specific ontologies such as biological process, molecular function, and cellular component. Here, we applied MeSH (Medical Subject Headings) to interpret groups of genes from a biological viewpoint. To assign MeSH terms to genes, contexts associated with genes were retrieved from the full set of MEDLINE data using machine learning, and MeSH terms were then extracted from the retrieved articles. Using this method, we implemented a software tool called BioCompass. It generates high-scoring lists and hierarchical lists of disease MeSH terms associated with groups of genes, making use of the MeSH and GO trees, and draws a wiring diagram that links genes through associations extracted from articles. Researchers can easily retrieve genes and keywords of interest, such as diseases and drugs, associated with groups of genes. Using the retrieved MeSH terms in conjunction with OMIM, we could obtain more disease information associated with a target gene. BioCompass helps researchers interpret groups of genes, such as those from microarray data, from a biological viewpoint.

4.
5.
An increasing number of genes have been experimentally confirmed in recent years as causative genes for various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by the inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method, EPU, is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results than various state-of-the-art prediction methods as well as ensemble learning classifiers. By integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms and achieve more accurate and robust disease gene predictions. In the future, our EPU method can serve as an effective framework for integrating additional biological and computational resources for better disease gene predictions.
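The PU setting described in this abstract (a positive set P, an unlabeled set U, no negative set N) can be illustrated with a small sketch. This is not the EPU method itself, whose details the abstract does not give; it is a generic bagging-style PU scheme chosen for illustration. The nearest-centroid base classifier, the function name `pu_bagging_scores`, and the toy two-dimensional gene features are all assumptions.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pu_bagging_scores(positives, unlabeled, rounds=50, seed=0):
    """Score unlabeled items by averaging many weak PU classifiers.

    Each round treats a random subset of U as pseudo-negatives, builds a
    nearest-centroid classifier (an assumed, deliberately simple base
    learner), and scores only the unlabeled items left out of that subset.
    Higher scores mean "looks more like the positive set".
    """
    rng = random.Random(seed)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    k = min(len(positives), len(unlabeled) - 1)
    for _ in range(rounds):
        chosen = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in chosen])
        for i, x in enumerate(unlabeled):
            if i in chosen:
                continue  # only out-of-bag items are scored this round
            totals[i] += sq_dist(x, neg_c) - sq_dist(x, pos_c)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# Toy data (hypothetical): three confirmed disease genes and five candidates,
# the first of which is a hidden positive lying close to the confirmed genes.
P = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
U = [(1.1, 1.0), (-1.0, -1.0), (-1.2, -0.8), (-0.9, -1.1), (-1.1, -1.0)]
scores = pu_bagging_scores(P, U)
best = scores.index(max(scores))
```

Averaging over many random pseudo-negative subsets is what makes the scheme tolerant of the positives hidden in U: any single subset may be contaminated, but the contamination is diluted across rounds.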

6.
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross-validation setting) to rank the true causal gene first for 34% of the diseases, and to infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
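The two constraints named in this abstract, smoothness over the network and agreement with prior information, are commonly met by iterative score propagation. The sketch below is one such scheme, not a reproduction of the published implementation: the symmetric degree normalization, the value of `alpha`, the function name `propagate`, and the four-gene toy network are assumptions made for illustration.

```python
import math

def propagate(adj, prior, alpha=0.8, iters=100):
    """Iterate score = alpha * W' * score + (1 - alpha) * prior.

    adj   : dict mapping node -> {neighbor: edge weight}, symmetric
    prior : dict mapping node -> prior evidence (known disease genes)
    W'    : edge weights divided by sqrt(deg(u) * deg(v)) (assumed choice)
    alpha : trade-off between network smoothness and the prior
    """
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    score = {u: prior.get(u, 0.0) for u in adj}
    for _ in range(iters):
        updated = {}
        for u, nbrs in adj.items():
            # Evidence flowing in from neighbors, damped by node degrees.
            flow = sum(
                w / math.sqrt(degree[u] * degree[v]) * score[v]
                for v, w in nbrs.items()
            )
            updated[u] = alpha * flow + (1.0 - alpha) * prior.get(u, 0.0)
        score = updated
    return score

# Toy path network g1 - g2 - g3 - g4 (hypothetical); g1 is a known disease gene.
ppi = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
scores = propagate(ppi, {"g1": 1.0})
ranking = sorted(ppi, key=scores.get, reverse=True)
```

On this toy path the scores fall off with distance from the known gene, which is the "global" behavior the abstract contrasts with methods that look only at direct neighbors. In a real prioritization run the known genes themselves would be excluded from the candidate ranking.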

7.
Systemic lupus erythematosus (SLE), commonly known as “the great imitator”, is a highly complex disease involving susceptibility at multiple genes and non-specific symptoms. Many experimental and computational approaches have been used to investigate disease-related candidate genes. However, limited knowledge of gene function and disease correlation, together with the lack of complete functional details for the majority of genes in susceptibility loci, hinders the identification of SLE-related candidate genes. In this paper, we studied the (undirected) human immunome network using various graph-theoretical centrality measures integrated with Gene Ontology terms to predict new candidate genes. As a result, we identified 8 candidate genes that may act as potential targets for SLE. We also carried out the same analysis after replacing the human immunome network with the (directed) human immunome signaling network and obtained 5 candidate genes as potential targets for SLE. Comparing the two, we found that the approaches are complementary.
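As an illustration of the centrality measures this abstract refers to, the sketch below computes two of the standard ones, degree and closeness centrality, on an undirected graph. The abstract does not say which measures were used or how they were combined with Gene Ontology terms, so the choice of measures and the five-node toy network are assumptions.

```python
from collections import deque

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) for u, nbrs in graph.items()}

def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path distances.

    Distances come from a breadth-first search on the unweighted graph.
    Nodes that cannot reach any other node get a closeness of 0.
    """
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

# Toy undirected network (hypothetical): H is a hub, D hangs off C.
network = {
    "H": {"A", "B", "C"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
}
degree = degree_centrality(network)
closeness = closeness_centrality(network)
top_by_degree = max(degree, key=degree.get)
top_by_closeness = max(closeness, key=closeness.get)
```

Here both measures pick the hub H, but on larger networks they can disagree, which is one reason studies of this kind combine several measures rather than relying on one.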

8.
Familial hypercholesterolemia (FH) is a monogenic lipid disorder that promotes atherosclerosis and cardiovascular diseases. Owing to the lack of sufficient published information, this study aims to identify potential genetic biomarkers for FH by studying the global gene expression profile of blood cells. The microarray expression data of FH patients and controls were analyzed by different computational biology methods such as differential expression analysis, protein network mapping, hub gene identification, functional enrichment of biological pathways, and immune cell restriction analysis. Our results showed dysregulated expression of 115 genes connected to lipid homeostasis, immune responses, cell adhesion molecules, canonical Wnt signaling, and mucin type O-glycan biosynthesis pathways in FH patients. The findings from expanded protein interaction network construction with known FH genes and subsequent Gene Ontology (GO) annotations also supported these results, in addition to identifying the involvement of dysregulated thyroid hormone and ErbB signaling pathways in FH patients. Genes such as CSNK1A1, JAK3, PLCG2, RALA, and ZEB2 were found to be enriched under all GO annotation categories. The subsequent phenotype ontology results revealed JAK3, PLCG2, and ZEB2 as key hub genes contributing to the inflammation underlying cardiovascular and immune response related phenotypes. Immune cell restriction findings show that these three genes are highly expressed by T-follicular helper CD4+ T cells, naïve B cells, and monocytes, respectively. These findings not only provide a theoretical basis for understanding the role of immune dysregulation underlying atherosclerosis among FH patients but may also pave the way to developing genomic medicine for cardiovascular diseases.

9.
10.
A first-draft human protein-interaction map

Background

Protein-interaction maps are powerful tools for suggesting the cellular functions of genes. Although large-scale protein-interaction maps have been generated for several invertebrate species, projects of a similar scale have not yet been described for any mammal. Because many physical interactions are conserved between species, it should be possible to infer information about human protein interactions (and hence protein function) using model organism protein-interaction datasets.

Results

Here we describe a network of over 70,000 predicted physical interactions between around 6,200 human proteins generated using the data from lower eukaryotic protein-interaction maps. The physiological relevance of this network is supported by its ability to preferentially connect human proteins that share the same functional annotations, and we show how the network can be used to successfully predict the functions of human proteins. We find that combining interaction datasets from a single organism (but generated using independent assays) and combining interaction datasets from two organisms (but generated using the same assay) are both very effective ways of further improving the accuracy of protein-interaction maps.

Conclusions

The complete network predicts interactions for a third of human genes, including 448 human disease genes and 1,482 genes of unknown function, and so provides a rich framework for biomedical research.
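The core idea of this abstract, inferring human interactions from interactions observed in model organisms, can be sketched as a simple transfer through an ortholog table. The abstract does not describe how orthologs were assigned or how predictions were scored, so the function name `map_interologs`, the ortholog mapping, and the toy yeast data below are assumptions. Self-interactions are dropped here purely to keep the example small.

```python
from itertools import product

def map_interologs(model_interactions, orthologs):
    """Predict human interactions from model-organism interactions.

    model_interactions : iterable of (protein_a, protein_b) pairs
    orthologs          : dict mapping a model-organism protein to the set
                         of its human orthologs (hypothetical mapping)
    Returns a set of unordered human protein pairs (frozensets).
    """
    predicted = set()
    for a, b in model_interactions:
        # Every ortholog of a is paired with every ortholog of b.
        for ha, hb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ha != hb:
                predicted.add(frozenset((ha, hb)))
    return predicted

# Toy data (hypothetical identifiers): yD has no known human ortholog,
# so the yC-yD interaction cannot be transferred.
yeast_ppi = [("yA", "yB"), ("yB", "yC"), ("yC", "yD")]
ortholog_table = {"yA": {"H1"}, "yB": {"H2", "H3"}, "yC": {"H4"}}
human_ppi = map_interologs(yeast_ppi, ortholog_table)
```

Running the same function on datasets from several organisms or assays and keeping pairs that are predicted more than once is one simple way to act on the abstract's observation that combining datasets improves accuracy.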

11.
Chen L  Tai J  Zhang L  Shang Y  Li X  Qu X  Li W  Miao Z  Jia X  Wang H  Li W  He W 《Molecular bioSystems》2011,7(9):2547-2553
Understanding the pathogenesis of complex diseases is aided by precise identification of the genes responsible. Many computational methods have been developed to prioritize candidate disease genes, but coverage of functional annotations may be a limiting factor for most of these methods. Here, we introduce a global candidate gene prioritization approach that considers information about network properties in the human protein interaction network and risk transformative contents from known disease genes. Global risk transformative scores were then used to prioritize candidate genes. The method was applied to prioritize candidate genes for prostate cancer, and its effectiveness was evaluated in validation studies. Our method outperformed both ToppGene and random walk-based methods. The generality of our method was assessed by testing it on prostate cancer and other types of cancer. The performance was evaluated using standard leave-one-out cross-validation.

12.
Recent studies have demonstrated that multiple early-onset diseases have shared risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method was able to identify 23 genes for CHD from joint analysis, including 12 novel genes, which is substantially more than single-trait analysis, leading to novel insights into CHD disease etiology.
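To make the Expectation-Maximization step in this abstract concrete, the sketch below fits a much simpler model than M-DATA: a single trait, no functional annotations, and a two-component Poisson mixture in which risk genes carry de novo mutations at `gamma` times the expected background rate. The model form, the function name `em_risk_genes`, the starting values, and the toy counts are all assumptions made for illustration.

```python
import math

def em_risk_genes(counts, expected, pi=0.1, gamma=5.0, iters=200):
    """EM for a two-component Poisson mixture over per-gene DNM counts.

    counts   : observed DNM count per gene
    expected : expected count per gene under the background (null) rate
    pi       : proportion of risk genes (estimated)
    gamma    : rate multiplier for risk genes (estimated)
    Returns (pi, gamma, posterior probability that each gene is a risk gene).
    """
    post = [0.0] * len(counts)
    for _ in range(iters):
        # E-step: Poisson likelihood ratio is gamma**y * exp(-(gamma-1)*lam).
        for g, (y, lam) in enumerate(zip(counts, expected)):
            ratio = math.exp(y * math.log(gamma) - (gamma - 1.0) * lam)
            post[g] = pi * ratio / (pi * ratio + (1.0 - pi))
        # M-step: closed-form updates of the mixture weight and multiplier.
        pi = sum(post) / len(counts)
        weighted_obs = sum(p * y for p, y in zip(post, counts))
        weighted_exp = sum(p * lam for p, lam in zip(post, expected))
        gamma = max(weighted_obs / weighted_exp, 1e-12)  # guard against log(0)
    return pi, gamma, post

# Toy data (hypothetical): 100 genes with the same background expectation,
# five of which carry clearly more DNMs than the background predicts.
dnm_counts = [0] * 86 + [1] * 9 + [3, 4, 3, 5, 4]
background = [0.1] * 100
pi_hat, gamma_hat, posterior = em_risk_genes(dnm_counts, background)
```

The per-gene posterior plays the role of the "gene association probability" mentioned in the abstract. Extending the latent state from risk/non-risk for one trait to the four combinations for two traits is what would let such a model share information across diseases.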

13.
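The onset and progression of complex diseases are closely linked to the dysfunction of biological pathways in the body, so using computational methods to study the relationship between diseases and pathways from high-throughput data is of great importance. This paper proposes a new network-based, global pathway identification method. The method uses protein interaction information and the gene-set composition of pathways to build a complex protein-pathway network. Then, based on expression profile data, a random walk algorithm is used to optimize disease risk pathways from a global perspective. Finally, statistically significant risk pathways are identified by perturbation. Applied to the identification of risk pathways for colorectal cancer, the method identified 15 pathways significantly associated with the onset and progression of colorectal cancer. Compared with other pathway identification methods (the hypergeometric test and SPIA), this method identifies disease-related risk pathways more effectively.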

14.

Background

Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understanding the genetic landscape of diseases at the molecular level. With the advent of the ‘omic’ data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge.

Results

In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping the phenotype similarity information of the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross-validation and compared them with state-of-the-art algorithms. The results favored both the phenotype-specific network and the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms using a test gene set derived from the DisGeNET database. In addition, potential disease genes of three polygenic diseases (obesity, prostate cancer and lung cancer) were predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence.

Conclusions

The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating the predictive capacity gained by combining topological similarity with functional similarity. Our work offers insight into the discovery of disease genes by fusing multiple similarities of genes and diseases.

15.
Genome-wide association studies for a variety of diseases are identifying increasing numbers of candidate genes. We are now confronted with the fact that some genes are common candidates across diseases. Thus there is a strong need to develop a hypothesis formulation methodology to comprehend multifaceted associations between genes and diseases. We have developed a computational method for building a transdisease-transgene association structure. By applying the basic rationale of the gene knockout approach, as an information processing procedure, to a network constructed from the hyperlinks between disease and gene pages listed in the Online Mendelian Inheritance in Man (OMIM) database, relations of genes with diseases are computationally quantified. We successively eliminated gene pages expected to contribute to metabolic syndrome (called "computational gene knockout" in this paper) and catalogued each association with various disease pages. We then applied a co-clustering method to the gene-disease relations to obtain an association structure by classifying diseases and genes simultaneously. Examining the association structure between over 100 diseases and their related genes, we found that it revealed gene classes that were commonly associated with diseases as well as gene classes that were selectively associated with a specific disease class.

16.
Borklu Yucel E  Ulgen KO 《PloS one》2011,6(12):e29284

Background

Cellular mechanisms leading to aging, and therefore to increasing susceptibility to age-related diseases, are a central topic of research, since aging is the ultimate, yet not understood, mechanism of the fate of a cell. Studies with model organisms have been conducted to elucidate these mechanisms, and chronological aging of yeast has been extensively used as a model for oxidative stress and aging of postmitotic tissues in higher eukaryotes.

Methodology/Principal Findings

The chronological aging network of yeast was reconstructed by integrating protein-protein interaction data with gene ontology terms. The reconstructed network was then statistically “tuned” based on the betweenness centrality values of the nodes to compensate for the computer automated method. Both the originally reconstructed and tuned networks were subjected to topological and modular analyses. Finally, an ultimate “heart” network was obtained via pooling the step specific key proteins, which resulted from the decomposition of the linear paths depicting several signaling routes in the tuned network.

Conclusions/Significance

The reconstructed networks are scale-free and hierarchical, following a power law model with γ = 1.49. The results of the modular and topological analyses verified that the tuning method was successful. The significantly enriched gene ontology terms of the modular analysis also confirmed that the multifactorial nature of chronological aging was captured by the tuned network. The interplay between various signaling pathways such as TOR, Akt/PKB and cAMP/Protein kinase A was summarized in the “heart” network derived from the linear path analysis. The deletion of four genes, TCB3, SNA3, PST2 and YGR130C, was found to increase the chronological life span of yeast. The reconstructed networks can also give insight into the effect of other cellular machineries on chronological aging by targeting different signaling pathways in the linear path analysis, along with revealing novel proteins that play a part in these pathways.

17.
Rapid development of high-throughput technologies has permitted the identification of an increasing number of disease-associated genes (DAGs), which are important for understanding disease initiation and developing precision therapeutics. However, DAGs often contain large amounts of redundant or false positive information, leading to difficulties in quantifying and prioritizing potential relationships between these DAGs and human diseases. In this study, a network-oriented gene entropy approach (NOGEA) is proposed for accurately inferring master genes that contribute to specific diseases by quantitatively calculating their perturbation abilities on directed disease-specific gene networks. In addition, we confirmed that the master genes identified by NOGEA have a high reliability for predicting disease-specific initiation events and progression risk. Master genes may also be used to extract the underlying information of different diseases, thus revealing mechanisms of disease comorbidity. More importantly, approved therapeutic targets are topologically localized in a small neighborhood of master genes in the interactome network, which provides a new way for predicting drug-disease associations. Through this method, 11 old drugs were newly identified and predicted to be effective for treating pancreatic cancer and then validated by in vitro experiments. Collectively, the NOGEA was useful for identifying master genes that control disease initiation and co-occurrence, thus providing a valuable strategy for drug efficacy screening and repositioning. NOGEA codes are publicly available at https://github.com/guozihuaa/NOGEA.

18.

Background

Predicting disease causative genes (or simply, disease genes) plays a critical role in understanding the genetic basis of human diseases and in providing disease treatment guidelines. While various computational methods have been proposed for disease gene prediction, the recent increase in available biological information for genes provides strong motivation to leverage these valuable data sources and extract useful information for accurately predicting disease genes.

Results

We present an integrative framework called N2VKO to predict disease genes. Firstly, we learn the node embeddings from protein-protein interaction (PPI) network for genes by adapting the well-known representation learning method node2vec. Secondly, we combine the learned node embeddings with various biological annotations as rich feature representation for genes, and subsequently build binary classification models for disease gene prediction. Finally, as the data for disease gene prediction is usually imbalanced (i.e. the number of the causative genes for a specific disease is much less than that of its non-causative genes), we further address this serious data imbalance issue by applying oversampling techniques for imbalance data correction to improve the prediction performance. Comprehensive experiments demonstrate that our proposed N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases.

Conclusions

In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, while integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance the prediction performance. In addition, the literature search of predicted disease genes also shows the effectiveness of our proposed N2VKO framework for disease gene prediction.
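The imbalance-correction step in this abstract can be illustrated with the simplest form of oversampling: duplicating randomly chosen minority-class samples until the classes are the same size. The abstract does not say which oversampling technique N2VKO uses, so plain random oversampling, the function name `random_oversample`, and the toy feature vectors are assumptions.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Every class is grown to the size of the largest class. Original
    samples are always kept; only the added copies are drawn at random.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels

# Toy data (hypothetical embeddings): one disease gene, five non-disease genes.
features = [(0.9, 0.8), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (0.3, 0.0), (0.1, 0.1)]
is_disease_gene = [1, 0, 0, 0, 0, 0]
balanced_x, balanced_y = random_oversample(features, is_disease_gene)
```

Oversampling should be applied to the training portion only. If duplicated samples leak into the evaluation fold, cross-validation results will look better than they are.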

19.
刘澳  陈宇  亓春龙  吕晓萌  王威 《菌物学报》(Mycosystema) 2023,42(1):312-329
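The pileus (cap) is an important part of macrofungi and the site where sexual spores are produced, but its developmental mechanism remains unclear. Using Flammulina filiformis as material, this study combined transcriptome and proteome analyses to compare differentially expressed genes and proteins between the pileus at the mature stage and at the elongation stage, and subjected them to GO (Gene Ontology) functional clustering, KEGG (Kyoto Encyclopedia of Genes and Genomes) enrichment analysis, and protein interaction network analysis. In total, 1,391 differentially expressed genes and 147 differentially expressed proteins were identified, most of which were up-regulated. GO functional clustering showed that the catalytic activity term contained the most enriched genes, followed by cell part, cellular process, and organelle. KEGG enrichment analysis showed that the differentially expressed genes and proteins were mainly enriched in carbohydrate metabolism and amino acid metabolism pathways, among others. Nine key differentially expressed genes were selected and their expression levels were verified by real-time quantitative PCR (RT-qPCR). The RT-qPCR results were consistent with the transcriptome sequencing results. Protein interaction network analysis indicated that hydrolases, domain-containing proteins, and transcriptional regulators were the main nodes of the interaction network. By combining transcriptome and proteome sequencing data and analyzing the differentially expressed genes and proteins, this study provides reference data for a deeper understanding of the mechanism of pileus development in F. filiformis.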

20.
Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
