Similar Literature
20 similar records found.
1.
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state-of-the-art methods exploit a protein-protein interaction (PPI) network for this task, based on the observation that genes causing phenotypically similar diseases tend to lie close to one another in a PPI network. However, to date these methods have used a static picture of human PPIs, whereas diseases affect specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve prioritization results compared with those obtained using a generic PPI network. Furthermore, they allow the prediction of novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
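The abstract above does not specify how expression and PPI data are combined. One simple scheme, assumed here for illustration and not necessarily the authors' method, keeps an interaction only if both partner genes are expressed in the tissue. The expression threshold, the data layout and the gene names below are hypothetical.

```python
def tissue_specific_network(edges, expression, tissue, threshold=1.0):
    """Filter a generic PPI edge list down to one tissue.

    Assumption (hypothetical): a gene counts as "expressed" in a tissue when its
    expression value is at least `threshold`; an edge is kept only if both of
    its endpoints are expressed.
    """
    levels = expression.get(tissue, {})
    expressed = {gene for gene, value in levels.items() if value >= threshold}
    return [(a, b) for a, b in edges if a in expressed and b in expressed]


# Toy generic network and made-up expression values for one tissue.
generic_ppi = [("A", "B"), ("B", "C"), ("C", "D")]
expression = {"liver": {"A": 5.0, "B": 2.3, "C": 0.1, "D": 4.0}}

print(tissue_specific_network(generic_ppi, expression, "liver"))  # [('A', 'B')]
```

A network-based prioritization method could then be run on the filtered edge list instead of the generic one.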

2.
Chen L  Tai J  Zhang L  Shang Y  Li X  Qu X  Li W  Miao Z  Jia X  Wang H  Li W  He W 《Molecular bioSystems》2011,7(9):2547-2553
Understanding the pathogenesis of complex diseases is aided by precise identification of the responsible genes. Many computational methods have been developed to prioritize candidate disease genes, but limited coverage of functional annotations may constrain most of these methods. Here, we introduce a global candidate gene prioritization approach that considers network properties in the human protein interaction network and risk transformative contents from known disease genes. Global risk transformative scores are then used to prioritize candidate genes. We applied this method to prioritize candidate genes for prostate cancer and evaluated its effectiveness in validation studies, where it outperformed two other candidate gene prioritization methods, ToppGene and a random walk-based method. The generality of the method was assessed by testing it on prostate cancer and other types of cancer, with performance evaluated using standard leave-one-out cross-validation.
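The abstract above evaluates its method with leave-one-out cross-validation. A minimal sketch of that evaluation loop for network-based prioritization follows: each known disease gene is hidden in turn and the rank it receives is recorded. The scoring function here, which counts how many remaining known genes are direct neighbours, is a hypothetical placeholder and not the global risk transformative score described in the abstract.

```python
def build_graph(edges):
    """Undirected adjacency sets from an edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbour_score(graph, seeds, gene):
    # Hypothetical placeholder scorer: number of seed genes adjacent to `gene`.
    return sum(1 for n in graph.get(gene, ()) if n in seeds)


def leave_one_out_ranks(graph, known):
    """Hide each known gene in turn and return the rank it gets (1 = best)."""
    ranks = {}
    for held_out in known:
        seeds = set(known) - {held_out}
        candidates = [g for g in graph if g not in seeds]
        # Higher score first; ties broken alphabetically for determinism.
        ordered = sorted(candidates, key=lambda g: (-neighbour_score(graph, seeds, g), g))
        ranks[held_out] = ordered.index(held_out) + 1
    return ranks


ppi = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")])
print(leave_one_out_ranks(ppi, ["A", "B", "C"]))  # {'A': 1, 'B': 1, 'C': 1}
```

Swapping in a different scorer leaves the evaluation loop unchanged, which is what makes leave-one-out results comparable across methods.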

3.
Zhao J  Yang TH  Huang Y  Holme P 《PloS one》2011,6(9):e24306
Many diseases have complex genetic causes, in which a set of alleles can affect the propensity to develop the disease. Identifying such disease genes is important for understanding the mechanistic and evolutionary aspects of pathogenesis, improving diagnosis and treatment, and aiding drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases, but picking out an unknown disease gene from hundreds of candidates located in the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating gene expression levels, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed between cases and controls, or if it is close to other disease-gene candidates in the protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets from the NCBI Gene Expression Omnibus database. On these datasets, our method predicts unknown disease genes and also identifies pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but also offers a way to discover phenotypic interdependency, co-occurrence and shared pathophysiology between different disorders.

4.

Background

Alzheimer’s disease (AD) is one of the leading genetically complex and heterogeneous disorders and is influenced by both genetic and environmental factors. The underlying risk factors for this heterogeneous disorder remain largely unclear. In recent years, high-throughput methodologies such as genome-wide linkage analysis (GWL), genome-wide association (GWA) studies and genome-wide expression profiling (GWE) have led to the identification of several candidate genes associated with AD. However, because these findings are inconsistent, an integrative approach is warranted. Here, we designed a rank-based gene prioritization approach involving convergent analysis of multi-dimensional data and protein-protein interaction (PPI) network modelling.

Results

Our approach integrates three different AD datasets (GWL, GWA and GWE) to identify overlapping candidate genes, which are ranked using a novel cumulative rank score (SR) and then prioritized using clusters derived from a PPI network. The SR of each gene is the sum of the ranks assigned to that gene in the three datasets, based on either its p value or its score. This analysis yielded 108 plausible AD genes. Modelling a PPI network from the proteins encoded by these genes and their direct interactors produced a layered network of 640 proteins. Clustering these proteins identified 6 significant clusters, with 7 proteins (EGFR, ACTB, CDC2, IRAK1, APOE, ABCA1 and AMPH) forming the central hub nodes. Functional annotation of the 108 genes revealed roles in several biological activities, such as neurogenesis, regulation of MAP kinase activity, response to calcium ion and endocytosis, paralleling AD-specific attributes. Finally, 3 potential biochemical biomarkers were found in the overlap between the 108 AD proteins and proteins from the CSF and plasma proteomes. EGFR and ACTB were found to be the two most significant AD risk genes.

Conclusions

Assuming that common genetic signals obtained from different methodological platforms may serve as more robust AD risk markers than candidates identified using a single-dimension approach, we demonstrated an integrated genomic convergence approach for prioritizing disease candidate genes from heterogeneous data sources linked to AD.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-199) contains supplementary material, which is available to authorized users.
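The cumulative rank score (SR) in the abstract above is the sum of a gene's ranks across the GWL, GWA and GWE datasets. The sketch below assumes that rank 1 is the most significant gene in each dataset, that genes are ranked only among those present in all three datasets, and that a lower SR is better; the abstract does not state these details, and the gene identifiers and values are made up.

```python
def rank_genes(values, higher_is_better=False):
    """Map each gene to its rank (1 = most significant) within one dataset."""
    def sort_key(gene):
        value = values[gene]
        return (-value if higher_is_better else value, gene)
    ordered = sorted(values, key=sort_key)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def cumulative_rank_scores(datasets):
    """Sum per-dataset ranks for genes present in every dataset.

    datasets: list of (values, higher_is_better) pairs, e.g. p values
    (lower is better) or linkage scores (higher is better).
    Assumption: genes are re-ranked among the shared genes only.
    Returns (gene, SR) pairs sorted by ascending SR (assumed: lower = stronger).
    """
    shared = set.intersection(*(set(values) for values, _ in datasets))
    totals = dict.fromkeys(shared, 0)
    for values, higher_is_better in datasets:
        subset = {gene: values[gene] for gene in shared}
        for gene, rank in rank_genes(subset, higher_is_better).items():
            totals[gene] += rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Hypothetical inputs: a linkage score (higher is better) and two p-value tables.
gwl = {"G1": 3.9, "G2": 2.1, "G3": 1.5, "G4": 5.0}
gwa = {"G1": 1e-8, "G2": 1e-3, "G3": 1e-5}
gwe = {"G1": 0.01, "G2": 0.02, "G3": 0.03, "G5": 0.001}

print(cumulative_rank_scores([(gwl, True), (gwa, False), (gwe, False)]))
# [('G1', 3), ('G2', 7), ('G3', 8)]
```

Summing ranks rather than raw values puts p values and linkage scores on a common scale, which is presumably why a rank-based score suits heterogeneous platforms.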

5.
Breast cancer is the leading cause of cancer death among women worldwide. We proposed a network-based integration method to identify potential breast cancer genes. First, genes were prioritized using a gene prioritization algorithm based on transferring disease risk between genes in a network with weighted vertices and edges. Our prioritization algorithm was effective and robust to the number of top-ranked seed genes, and achieved higher area under the curve (AUC) values than ToppGene and ToppNet. Then, 20 potential breast cancer genes were identified as the genes common to the top 50 candidates across multiple prioritizations, reflecting their robustness. These genes accurately classified tumor and normal samples in both the full and the paired sample sets, as well as in three independent datasets. Of the potential breast cancer genes, 18 were verified by the literature and 2 were novel genes that need further study. This study may contribute to understanding the genetic architecture of breast cancer for its diagnosis and treatment.

6.
Genome-wide linkage and association studies have shown promise in identifying genetic factors that influence health and disease. An important challenge is to narrow down the set of candidate genes implicated by these analyses. Protein-protein interaction (PPI) networks are useful for extracting the functional relationships between known disease genes and candidate genes, based on the principle that products of genes implicated in similar diseases are likely to exhibit significant connectivity or proximity. Information flow-based methods have been shown to be very effective in prioritizing candidate disease genes. In this article, we use the topology of PPI networks to infer functional information in the context of disease association. Our approach is based on the assumption that PPI networks are organized into recurrent schemes that underlie the mechanisms of cooperation among different proteins. We hypothesize that proteins associated with similar diseases exhibit similar topological characteristics in PPI networks. Using the location of a protein in the network relative to other proteins (i.e., the "topological profile" of the protein), we develop a novel measure to assess the topological similarity of proteins in a PPI network. We then use this measure to prioritize candidate disease genes based on the topological similarity of their products to the products of known disease genes. We test the resulting algorithm, Vavien, through systematic experimental studies using an integrated human PPI network and the Online Mendelian Inheritance in Man (OMIM) database. Our results show that Vavien outperforms other network-based prioritization algorithms; it is available at www.diseasegenes.org.
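Vavien scores a candidate by how similar its topological profile is to those of known disease proteins. The sketch below is a simplified stand-in, not Vavien's actual measure. As assumptions, it builds each protein's profile from shortest-path proximities 1/(1 + d) to the other proteins, compares two profiles with Pearson correlation over the proteins other than the pair being compared, and averages the similarity over the known disease proteins. The toy network is made up.

```python
from collections import deque
from math import sqrt


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs_distances(graph, source):
    """Shortest-path hop counts from `source` to every reachable protein."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def topological_similarity(graph, p, q):
    """Correlate the proximity profiles of p and q, excluding p and q themselves."""
    others = sorted(node for node in graph if node not in (p, q))
    dp, dq = bfs_distances(graph, p), bfs_distances(graph, q)
    # Assumed proximity 1 / (1 + d); unreachable proteins get 0.
    prof_p = [1.0 / (1 + dp[o]) if o in dp else 0.0 for o in others]
    prof_q = [1.0 / (1 + dq[o]) if o in dq else 0.0 for o in others]
    return pearson(prof_p, prof_q)


def prioritize(graph, candidates, disease_proteins):
    """Rank candidates by mean similarity to the known disease proteins."""
    def score(c):
        sims = [topological_similarity(graph, c, d) for d in disease_proteins]
        return sum(sims) / len(sims)
    scores = {c: score(c) for c in candidates}
    return sorted(candidates, key=lambda c: (-scores[c], c))


ppi = build_graph([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")])
print(prioritize(ppi, ["C", "F"], ["B"]))  # ['C', 'F']
```

In the toy network, C occupies the same structural position as the disease protein B (both are leaves of hub A), so it ranks above the distant protein F.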

7.
Zhang L  Li X  Tai J  Li W  Chen L 《PloS one》2012,7(6):e39542
Predicting candidate genes using gene expression profiles and unbiased protein-protein interactions (PPI) contributes substantially to deciphering the pathogenesis of complex diseases. Recent studies showed significant disparities in network topological features between disease and non-disease genes in protein-protein interaction networks. Integrated methods can consider these characteristics comprehensively within a biological network. In this study, we introduce a novel computational method that uses combined network topological features to construct a combined classifier, and then use it to predict candidate genes for coronary artery disease (CAD). As a result, 276 novel candidate genes were predicted, and they were found to share similar functions with known disease genes. The majority of the candidate genes were cross-validated by three other methods. Our method will be useful in the search for candidate genes of other diseases.

8.
Gene co-expression often implies a functional linkage between genes. Co-expression analysis has uncovered gene regulatory mechanisms in model organisms such as Escherichia coli and yeast. Recently, the accumulation of Arabidopsis microarray data has enabled a genome-wide inspection of gene co-expression profiles in this model plant. Network analysis provides an intuitive way to represent complex co-expression patterns among many genes. Co-expression network analysis has enabled us to extract modules, or groups of tightly co-expressed genes, associated with biological processes. Furthermore, integrated analysis of gene expression and metabolite accumulation has allowed us to hypothesize the functions of genes associated with specific metabolic processes. Co-expression network analysis is a powerful approach for data-driven hypothesis construction and gene prioritization, and provides novel insights into the system-level understanding of plant cellular processes.
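A co-expression network of the kind described above can be sketched by linking gene pairs whose expression profiles are strongly correlated and treating connected groups as modules. The correlation cutoff of 0.8, the use of absolute correlation and the use of connected components as "modules" are illustrative assumptions; dedicated tools typically use more elaborate module detection. The expression values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(profiles, threshold=0.8):
    """Link gene pairs whose |Pearson r| across samples is at least `threshold`."""
    edges = []
    for (g1, x), (g2, y) in combinations(sorted(profiles.items()), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


def modules(genes, edges):
    """Connected components of the co-expression graph (a crude module finder)."""
    neighbours = {g: set() for g in genes}
    for g1, g2, _ in edges:
        neighbours[g1].add(g2)
        neighbours[g2].add(g1)
    seen, result = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            gene = stack.pop()
            component.append(gene)
            for nb in neighbours[gene]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(sorted(component))
    return result


# Made-up expression values for four genes across four samples.
profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [4, 3, 2, 1],
    "g4": [1, 3, 2, 1],
}
edges = coexpression_edges(profiles)
print(edges)                     # [('g1', 'g2', 1.0), ('g1', 'g3', -1.0), ('g2', 'g3', -1.0)]
print(modules(profiles, edges))  # [['g1', 'g2', 'g3'], ['g4']]
```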

9.
MOTIVATION: Inferring the genes truly associated with inherited human diseases from a set of candidates produced by genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes using protein-protein interaction (PPI) networks, these methods usually cover fewer than half of known human genes. RESULTS: We propose using the biological process domain of the Gene Ontology to construct a gene semantic similarity network, and then using this network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By comparing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores when the corresponding proteins are closer to each other in the PPI network. By comparing the gene semantic similarity network with a phenotype similarity network, we show that the semantic similarity scores of genes associated with similar diseases differ significantly from those of randomly selected genes, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network achieves not only higher coverage but also higher accuracy than the PPI network in inferring disease genes.
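Random walk with restart, as applied above to the gene semantic similarity network, can be sketched as the iteration p(t+1) = (1 - r) W p(t) + r p(0), where W is the column-normalized weighted adjacency matrix and p(0) spreads unit mass evenly over the known disease genes. The restart probability r = 0.7, the convergence tolerance and the toy weights below are illustrative assumptions, not values from the paper.

```python
def random_walk_with_restart(graph, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Steady-state RWR scores.

    graph: {node: {neighbour: weight}} (symmetric, positive weights).
    seeds: known disease genes; p0 gives each seed 1 / len(seeds).
    restart: hypothetical restart probability r.
    """
    nodes = sorted(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            if total == 0:
                continue  # isolated node: its walk mass is simply dropped in this sketch
            for neighbour, weight in graph[n].items():
                # Column-normalized step: n passes mass to neighbours in proportion to edge weight.
                nxt[neighbour] += (1 - restart) * p[n] * weight / total
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy weighted network (weights could be semantic similarity scores).
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = random_walk_with_restart(network, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D']
```

Candidate genes are then ranked by their steady-state scores; genes closer to the seeds in the network receive more probability mass.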

10.
11.
Candidate gene identification is typically labour-intensive, requiring laboratory experiments to corroborate or refute the hypothesis that a nominated candidate is the causative gene. The traditional approach to reducing the number of candidate genes entails fine-mapping studies using markers and pedigrees. Gene prioritization ranks candidate genes by their relevance to the biological process of interest, so that the most promising genes can be selected for further analysis. To date, many computational methods have predicted candidate genes by analysing their inherent sequence characteristics, their similarity to known disease genes and their functional annotation. Several computational tools for prioritizing candidate genes have been proposed in the last decade; many are web-based, while others are standalone applications that are installed and run locally. This review takes a close look at gene prioritization criteria and candidate gene prioritization algorithms, providing a comprehensive synopsis of the subject.

12.
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge with network-based approaches, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single-gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complexes associated with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred, and show that it outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross-validation setting) to rank the true causal gene first for 34% of the diseases, and infers 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multifactorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature and suggest several novel causal genes and protein complexes for further investigation.
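The smoothness-plus-prior formulation above is commonly written as the iteration F = alpha * W' F + (1 - alpha) * Y, where W' = D^(-1/2) W D^(-1/2) is the symmetrically normalized adjacency matrix and Y holds prior scores. A minimal sketch of that propagation, assuming this update rule, follows. The value alpha = 0.9 and the simple 0/1 prior are assumptions for illustration (the method derives its prior from prior information about the disease), and protein complex inference is not covered.

```python
from math import sqrt


def propagate(graph, prior, alpha=0.9, tol=1e-10, max_iter=1000):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y until convergence.

    graph: {node: {neighbour: weight}} (symmetric, positive weights).
    prior: {node: Y value}; nodes missing from it get prior 0.
    W'[n][m] = w(n, m) / sqrt(deg(n) * deg(m)), with deg = weighted degree.
    alpha: hypothetical smoothing weight.
    """
    nodes = sorted(graph)
    degree = {n: sum(graph[n].values()) for n in nodes}
    f = {n: prior.get(n, 0.0) for n in nodes}
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            spread = sum(w * f[m] / sqrt(degree[n] * degree[m]) for m, w in graph[n].items())
            nxt[n] = alpha * spread + (1 - alpha) * prior.get(n, 0.0)
        change = max(abs(nxt[n] - f[n]) for n in nodes)
        f = nxt
        if change < tol:
            break
    return f


network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
    "E": {},  # isolated protein: receives nothing from the network
}
scores = propagate(network, prior={"A": 1.0})
print(sorted(scores, key=scores.get, reverse=True))  # ['A', 'B', 'C', 'D', 'E']
```

Because alpha is below 1 and the normalized matrix has spectral radius at most 1, the iteration is a contraction and converges to a unique fixed point.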

13.
14.
15.
Renal transplantation is the only efficacious treatment for end-stage kidney disease. However, some recipients develop renal insufficiency after transplantation, and the mechanisms have not been well clarified. Previous studies have focused on patient factors, whereas the effect of gene expression in the donor kidney on post-transplant renal function has been less studied. Donor kidney clinical data and mRNA expression profiles were extracted from the GEO database (GSE147451), and weighted gene co-expression network analysis (WGCNA) and differential gene enrichment analysis were performed. For external validation, we collected data from 122 patients who underwent renal transplantation at several hospitals and measured the levels of the target genes by qPCR. The study included 192 patients from the GEO data set, and 13 co-expressed genes were confirmed by WGCNA and differential gene enrichment analysis. The resulting PPI network contained 12 nodes and 17 edges, from which four central genes (PRKDC, RFC5, RFC3 and RBM14) were identified. Multivariate logistic regression on the 122 validation patients showed that acute graft-versus-host disease, postoperative infection and PRKDC mRNA level [Hazard Ratio (HR) = 4.44; 95% CI = [1.60, 13.68]; p = 0.006] correlated with renal function after transplantation. The constructed prediction model had good predictive accuracy (C-index = 0.886). Elevated levels of donor kidney PRKDC are associated with renal dysfunction after transplantation, and the PRKDC-based prediction model of post-transplant renal function shows good predictive accuracy and potential for clinical application.

16.
17.
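This study used public and experimental data to comprehensively analyze the expression, functional role and prognostic significance of epoxide hydrolase 2 (EPHX2) in hepatocellular carcinoma (HCC). Mitochondria-related genes differentially expressed in HCC were screened using the GEO and MitoCarta datasets; the expression levels of EPHX2 and its related genes in HCC were analyzed using the TCGA database; R packages were used to plot Kaplan-Meier survival curves and to perform functional enrichment analysis; a protein-protein interaction network was constructed with STRING, and gene set enrichment analysis was performed with GSEA; and EPHX2 expression in HCC was validated by quantitative real-time PCR (qPCR) and GEO datasets. A total of 15 mitochondria-related genes differentially expressed in HCC were identified. EPHX2 expression was significantly reduced in HCC tissues (P < 0.01). EPHX2 expression was associated with patient sex, stage and grade, but not with age, T stage or other factors. Patients with high EPHX2 expression had a better prognosis than those with low EPHX2 expression. Functional enrichment showed that EPHX2 was associated with signaling pathways such as the complement pathway and fatty acid degradation. The protein-protein interaction network showed that EPHX2 was closely associated with HAO1, AGXT, ACOX1, GSTκ1, SCP-2, CAT, CYP2C8, CYP2C9, CYP2B6 and CYP2J2, among others. GSEA showed that the low-EPHX2 group was positively correlated with gene sets such as liver cancer cell proliferation and liver cancer recurrence. Validation by qPCR and GEO datasets confirmed that EPHX2 was significantly downregulated in HCC tissues and liver cancer cell lines. The significant downregulation of EPHX2 in HCC suggests that it may act as a tumor suppressor in the development and progression of HCC, although the specific mechanism requires further validation.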

18.
Protein-protein interaction (PPI) network analysis has been considered a useful approach for exploring the mechanisms of complex diseases such as cancer. To date, many proteins have been reported to be involved in the development of cancer. Exploring cancer proteins in the human PPI network may provide important biological information for uncovering the molecular mechanisms of cancer. Here, we explored network characteristics (including degree, betweenness, clustering coefficient and shortest-path distance) of cancer proteins in the human nuclear receptor and receptor tyrosine kinase network (NR-RTK) constructed in our earlier work. We found that cancer proteins in this network have specific topological features: relative to non-cancer proteins, they tend to have higher degree and higher betweenness, but a similar clustering coefficient and shortest-path distance. Finally, we found that the cancer proteins were involved mainly in signalling pathways whose dysfunction is directly related to cancer onset. These findings should help in prioritizing and verifying candidate cancer proteins and in identifying key pathways involved in cancer.
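The per-protein topological features compared above are straightforward to compute. A minimal sketch follows for degree and the local clustering coefficient, with group means for a hypothetical "cancer" and "non-cancer" protein set; betweenness and shortest-path distance are omitted for brevity, and the toy network and labels are made up.

```python
from itertools import combinations


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph, node):
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours that are themselves connected."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def group_mean(graph, proteins, feature):
    values = [feature(graph, p) for p in proteins]
    return sum(values) / len(values)


ppi = build_graph([("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P1", "P4"), ("P4", "P5")])
cancer, non_cancer = ["P1", "P4"], ["P2", "P3", "P5"]  # hypothetical labels
for name, feature in [("degree", degree), ("clustering", clustering_coefficient)]:
    print(name, group_mean(ppi, cancer, feature), group_mean(ppi, non_cancer, feature))
```

On real data, the group means would be compared with a statistical test rather than read off directly.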

19.
Epithelial ovarian cancer (EOC) is categorized into four major histological subtypes: clear cell carcinoma (CCC), endometrioid carcinoma (EC), mucinous carcinoma (MC) and serous carcinoma (SC). The heterogeneity of EOC leads to different clinical outcomes, although all the subtypes originate from the same tissue layer. It is therefore of interest to identify the candidate genes and miRNAs common to the four EOC subtypes, together with their interaction network. A comparative gene expression analysis identified 248 differentially expressed genes (DEGs) common to the four subtypes, and these common DEGs were enriched in cancer-specific pathways. A protein-protein interaction (PPI) network of the common DEGs was constructed, and subsequent module and survival analyses identified seven key candidate genes (CCNB1, CENPM, CEP55, RACGAP1, TPX2, UBE2C and ZWINT). We also documented 10 key candidate miRNAs (hsa-mir-16-5p, hsa-mir-23b-3p, hsa-mir-34a-5p, hsa-mir-103a-3p, hsa-mir-107, hsa-mir-124-3p, hsa-mir-129-2-3p, hsa-mir-147a, hsa-mir-205-5p and hsa-mir-195-5p) linked to the candidate genes. These data may help in understanding EOC.

20.
In microarray-based case–control studies of a disease, researchers often attempt to identify a few diagnostic or prognostic markers among the most significant differentially expressed (DE) genes. However, the reproducibility of DE genes identified in different studies of a disease is typically very low. To tackle this problem, we evaluated the reproducibility of DE genes across studies and defined robust markers for disease diagnosis using a disease-associated protein–protein interaction (PPI) subnetwork. Using datasets for four cancer types, we found that the most significant DE genes in cancer show consistent up- or down-regulation across datasets. For each cancer type, the 5 (or 10) most significant DE genes extracted separately from different datasets tend to be significantly co-expressed and closely connected in the PPI subnetwork, indicating that they are highly reproducible at the PPI level. Consequently, we were able to build robust subnetwork-based classifiers for cancer diagnosis.
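One simple way to quantify the cross-dataset consistency described above is to take the top-k DE genes from one dataset and check how often the same genes change in the same direction in another. The sketch below assumes signed statistics (for example t-statistics, positive meaning up-regulated in cases); the statistic, k and the values are illustrative assumptions, not the paper's exact procedure.

```python
def top_k(stats, k):
    """The k genes with the largest |statistic| (ties broken by name)."""
    return sorted(stats, key=lambda g: (-abs(stats[g]), g))[:k]


def top_k_overlap(stats_a, stats_b, k):
    """Number of genes shared by the two datasets' top-k lists."""
    return len(set(top_k(stats_a, k)) & set(top_k(stats_b, k)))


def direction_consistency(stats_a, stats_b, k):
    """Share of dataset A's top-k genes (also measured in B) with the same sign in B."""
    shared = [g for g in top_k(stats_a, k) if g in stats_b]
    if not shared:
        return None
    agree = sum(1 for g in shared if stats_a[g] * stats_b[g] > 0)
    return agree / len(shared)


# Made-up signed statistics for two case-control datasets.
study_a = {"g1": 5.2, "g2": -4.8, "g3": 3.1, "g4": 0.2}
study_b = {"g1": 4.0, "g2": -3.5, "g3": -0.5, "g4": 2.9}

print(top_k_overlap(study_a, study_b, 2))          # 2
print(direction_consistency(study_a, study_b, 2))  # 1.0
print(direction_consistency(study_a, study_b, 3))  # 0.6666666666666666
```

Direction consistency can stay high even when the top-k lists overlap only partially, which is one reason to measure the two separately.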


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417