Similar literature
20 similar articles found (search time: 281 ms)
1.

Background

The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, and thus greatly restrict the scope of application of such methods. Meanwhile, independently constructed and maintained PPI networks are usually quite diverse in coverage and quality, making the selection of a suitable PPI network inevitable but difficult.

Methods

We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach and derive an analytic form for the Bayes factor, which naturally measures the strength of association between a query disease and a candidate gene and can therefore be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.
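As a rough illustration of the gene proximity measure mentioned above, the following sketch computes a diffusion kernel K = exp(-beta * L) from the Laplacian L of a small undirected network. It is a minimal pure-Python sketch, not the authors' implementation: the function names, the value of `beta`, and the Taylor-series approximation with scaling and squaring are all assumptions made for illustration.

```python
def graph_laplacian(n, edges):
    """Return the combinatorial Laplacian L = D - A of an undirected graph.

    Each (u, v) pair adds one unit of edge weight, so repeated pairs add up.
    """
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(lap, beta=0.5, terms=30, squarings=6):
    """Approximate K = exp(-beta * L) (beta is an illustrative choice).

    The matrix is scaled down by 2**squarings, exponentiated with a
    truncated Taylor series, and then squared back up.
    """
    n = len(lap)
    scale = 2 ** squarings
    m = [[-beta * lap[i][j] / scale for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = mat_mul(term, m)                      # term = m**k / (k-1)!
        term = [[x / k for x in row] for row in term]  # now m**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Toy path network 0 - 1 - 2 - 3 (hypothetical, for illustration only).
kernel = diffusion_kernel(graph_laplacian(4, [(0, 1), (1, 2), (2, 3)]))
```

In this toy network, entries of `kernel` are larger for pairs of nodes that are closer together, which is the sense in which a diffusion kernel quantifies proximity. For real PPI networks a numerical library routine for the matrix exponential would be the more practical choice.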

Results

We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.
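The two summary statistics named above can be sketched as follows. This is a hedged illustration only: it assumes each leave-one-out run ranks one held-out disease gene among a set of candidates by descending score, and it computes a per-run AUC that is then averaged, which may differ from the exact ROC construction used in the paper. All function names and the toy scores are hypothetical.

```python
def rank_of_held_out(scores, held_out):
    """1-based rank of the held-out gene, sorting by descending score.

    Ties with the held-out gene contribute half a rank each (average rank).
    """
    target = scores[held_out]
    higher = sum(1 for g, s in scores.items() if s > target)
    ties = sum(1 for g, s in scores.items() if s == target and g != held_out)
    return 1 + higher + ties / 2.0


def mean_rank_ratio(runs):
    """Average of rank / number of candidates over all validation runs."""
    ratios = [rank_of_held_out(scores, gene) / len(scores)
              for scores, gene in runs]
    return sum(ratios) / len(ratios)


def mean_auc(runs):
    """Average per-run AUC with a single positive (the held-out gene).

    With one positive, AUC is the fraction of the other candidates that
    are ranked below it, counting ties as one half.
    """
    aucs = []
    for scores, gene in runs:
        n = len(scores)
        aucs.append((n - rank_of_held_out(scores, gene)) / (n - 1))
    return sum(aucs) / len(aucs)


# Two toy validation runs: (candidate scores, held-out known disease gene).
runs = [
    ({"A": 0.9, "B": 0.5, "C": 0.1, "D": 0.3}, "A"),
    ({"A": 0.2, "B": 0.8, "C": 0.6, "D": 0.4}, "C"),
]
print(mean_rank_ratio(runs), mean_auc(runs))
```

A lower mean rank ratio and a higher AUC both indicate that known disease genes tend to be ranked near the top of the candidate list.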

Conclusions

The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.

2.

Objectives

To characterize biomarkers that underlie osteosarcoma (OS) metastasis based on an ego-network.

Results

From the microarray data, we obtained 13,326 genes. By combining PPI data and microarray data, 10,520 shared genes were found and constructed into ego-networks. 17 significant ego-networks were identified with p < 0.05. In the pathway enrichment analysis, seven ego-networks were identified with the most significant pathway.

Conclusions

These significant ego-modules were potential biomarkers that reveal the potential mechanisms in OS metastasis, which may contribute to understanding cancer prognoses and providing new perspectives in the treatment of cancer.

3.

Background

Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understanding the genetic landscape of diseases at the molecular level. With the advent of the ‘omics’ data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge.

Results

In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping the phenotype similarity information of the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross-validation and compared them with state-of-the-art algorithms. The results favored the phenotype-specific network as well as the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms on a test gene set derived from the DisGeNET database. In addition, potential disease genes of three polygenic diseases (obesity, prostate cancer and lung cancer) were predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence.

Conclusions

The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating the predictive capacity gained by combining topological similarity with functional similarity. Our work offers insight into the discovery of disease genes by fusing multiple similarities of genes and diseases.
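For readers unfamiliar with the RWR baseline named above, a minimal sketch follows. It iterates p = r * p0 + (1 - r) * W * p, where W is the column-normalized adjacency matrix and p0 places equal mass on the seed (known disease) genes. The restart probability, the tolerance, and the toy network are assumptions for illustration; this is not the gene gravity-like algorithm proposed in the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walk that restarts at seeds.

    adj is a symmetric adjacency matrix (list of rows). The sketch assumes
    every node has at least one neighbour; isolated nodes would leak mass.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Mass flowing into node i from each neighbour j.
            inflow = sum(adj[i][j] * p[j] / col_sums[j]
                         for j in range(n) if col_sums[j] > 0)
            nxt.append(restart * p0[i] + (1.0 - restart) * inflow)
        change = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if change < tol:
            break
    return p


# Toy path network 0 - 1 - 2 - 3 with node 0 as the only seed gene.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(adjacency, seeds=[0])
```

Candidate genes are then ranked by their entries in `scores`; nodes closer to the seed receive higher values in this toy example.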

4.
5.

Background

Cardiac hypertrophy and acute myocardial infarction (AMI) are two common heart diseases worldwide. However, research is needed into the exact pathogenesis and effective treatment strategies for these diseases. Recently, microRNAs (miRNAs) have been suggested to regulate the pathological pathways of heart disease, indicating a potential role in novel treatments.

Results

In our study, we constructed a miRNA-gene-drug network and analyzed its topological features. We also identified some significantly dysregulated miRNA-gene-drug triplets (MGDTs) in cardiac hypertrophy and AMI using a computational method. Then, we characterized the activity score profile features for MGDTs in cardiac hypertrophy and AMI. The functional analyses suggested that the genes in the network held special functions. We extracted an insulin-like growth factor 1 receptor-related subnetwork in cardiac hypertrophy and a vascular endothelial growth factor A-related subnetwork in AMI. Finally, we considered insulin-like growth factor 1 receptor and vascular endothelial growth factor A as two candidate drug targets by utilizing the cardiac hypertrophy and AMI pathways.

Conclusion

These results provide novel insights into the mechanisms and treatment of cardiac hypertrophy and AMI.

6.

Background

Although post-traumatic stress disorder (PTSD) is primarily a mental disorder, it can cause additional symptoms that do not seem to be directly related to the central nervous system, which PTSD is assumed to directly affect. PTSD-mediated heart diseases are some of such secondary disorders. In spite of the significant correlations between PTSD and heart diseases, spatial separation between the heart and brain (where PTSD is primarily active) prevents researchers from elucidating the mechanisms that bridge the two disorders. Our purpose was to identify genes linking PTSD and heart diseases.

Methods

In this study, gene expression profiles of various murine tissues observed under various types of stress or without stress were analyzed in an integrated manner using tensor decomposition (TD).

Results

Based on the obtained features, ~400 genes were identified as candidate genes that may mediate heart diseases associated with PTSD. Various gene enrichment analyses supported the biological reliability of the identified genes. Ten genes encoding protein-, DNA-, or mRNA-interacting proteins (ILF2, ILF3, ESR1, ESR2, RAD21, HTT, ATF2, NR3C1, TP53, and TP63) were found to be likely to regulate expression of most of these ~400 genes and are therefore candidate primary genes that cause PTSD-mediated heart diseases. Approximately 400 genes in the heart were also found to be strongly affected by various drugs whose known adverse effects are related to heart diseases and/or fear memory conditioning; these data support the reliability of our findings.

Conclusions

TD-based unsupervised feature extraction turned out to be a useful method for gene selection and successfully identified possible genes causing PTSD-mediated heart diseases.

7.

Background

Pancreatic cancer is one of the most lethal tumors, with poor prognosis, and lacks effective biomarkers for diagnosis and treatment. The aim of this investigation was to identify hub genes in pancreatic cancer that could serve as potential biomarkers for cancer diagnosis and therapy in the future.

Methods

Two expression profiles, GSE16515 and GSE22780, from the Gene Expression Omnibus (GEO) database were combined to serve as the training set. Differentially expressed genes (DEGs) with top 25% variance were identified, followed by protein-protein interaction (PPI) network analysis, to find candidate genes. Hub genes were then further screened by survival and Cox analyses in The Cancer Genome Atlas (TCGA) database. Finally, the hub genes were validated in the GSE15471 dataset from GEO using two supervised learning methods, the k-nearest neighbor (kNN) and random forest algorithms.

Results

After quality control and batch effect elimination of the training set, 181 DEGs with top 25% variance were identified as candidate genes. Then, two genes, MMP7 and ITGA2, which correlate with the diagnosis and prognosis of pancreatic cancer, were screened as hub genes according to the above-mentioned bioinformatics methods. Finally, the hub genes were shown to distinguish tumor samples from normal tissues, with predictive accuracies of 93.59% and 81.31% using the kNN and random forest algorithms, respectively.

Conclusions

All the hub genes were associated with the regulation of the tumor microenvironment, which is implicated in tumor proliferation, progression, migration, and metastasis. Our results provide a novel prospect for the diagnosis and treatment of pancreatic cancer, which may have further clinical application.

8.

Background

Appropriate definition of neural network architecture prior to data analysis is crucial for successful data mining. This can be challenging when the underlying model of the data is unknown. The goal of this study was to determine whether optimizing neural network architecture using genetic programming as a machine learning strategy would improve the ability of neural networks to model and detect nonlinear interactions among genes in studies of common human diseases.

Results

Using simulated data, we show that a genetic programming optimized neural network approach is able to model gene-gene interactions as well as a traditional back propagation neural network. Furthermore, the genetic programming optimized neural network is better than the traditional back propagation neural network approach in terms of predictive ability and power to detect gene-gene interactions when non-functional polymorphisms are present.

Conclusion

This study suggests that a machine learning strategy for optimizing neural network architecture may be preferable to traditional trial-and-error approaches for the identification and characterization of gene-gene interactions in common, complex human diseases.

9.
Wang J, Xie D, Lin H, Yang Z, Zhang Y. Proteome Science, 2012, 10(Z1): S18

Background

Protein complexes are important in many biological processes, and various computational approaches have been developed to identify complexes from protein-protein interaction (PPI) networks. However, the high false-positive rate of PPIs makes this identification challenging.

Results

A protein semantic similarity measure is proposed in this study, based on the ontology structure of Gene Ontology (GO) terms and GO annotations, to estimate the reliability of interactions in PPI networks. Interaction pairs with low GO semantic similarity are removed from the network as unreliable interactions. Then, a cluster-expanding algorithm is used to detect complexes with core-attachment structure on the filtered network. Our method is applied to three different yeast PPI networks. The effectiveness of our method is examined on two benchmark complex datasets. Experimental results show that our method performed better than other state-of-the-art approaches in most evaluation metrics.
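The filtering step described above can be sketched as follows. Note the substitution: the paper's measure is based on the GO ontology structure, whereas this sketch uses a plain Jaccard index over annotation sets as a simplified stand-in. The protein names, term labels, and threshold are hypothetical and chosen only for illustration.

```python
def jaccard(terms_a, terms_b):
    """Jaccard index of two annotation sets (stand-in similarity measure)."""
    union = terms_a | terms_b
    if not union:
        return 0.0  # two unannotated proteins are treated as dissimilar
    return len(terms_a & terms_b) / len(union)


def filter_interactions(edges, annotations, threshold=0.5):
    """Keep only interactions whose annotation similarity reaches the threshold."""
    kept = []
    for u, v in edges:
        sim = jaccard(annotations.get(u, set()), annotations.get(v, set()))
        if sim >= threshold:
            kept.append((u, v))
    return kept


# Hypothetical toy network and annotations (placeholder term labels).
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
annotations = {
    "P1": {"t1", "t2", "t3"},
    "P2": {"t2", "t3"},
    "P3": {"t9"},
    "P4": {"t3", "t4"},
}
reliable = filter_interactions(edges, annotations)
```

Only the P1-P2 interaction survives in this toy case; complex detection would then run on the filtered edge list.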

Conclusions

The method detects protein complexes from large-scale PPI networks after filtering interactions by GO semantic similarity. Removing interactions with low GO similarity significantly improves the performance of complex identification. The expanding strategy is also effective in identifying attachment proteins of complexes.

10.

Background

Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem to minimize network based module separation from two subgraphs produced by mapping genes associated with disease onto the interactome.

Results

Using cross-validation on more than 600 disease pairs, our method achieves a significantly higher average receiver operating characteristic (ROC) score of 0.95, compared with a baseline ROC score of 0.60 obtained using randomized data.

Conclusion

Prediction of missing common genes aims to complete the gene set associated with comorbid diseases for a better understanding of biological intervention. It will also be useful for gene-targeted therapeutics related to comorbid diseases. This method can be further considered for prediction of missing edges to complete the subgraph associated with a disease pair.

11.
12.

Background

Chromophobe renal cell carcinoma (ChRCC) is the second most common subtype of non-clear cell renal cell carcinoma (nccRCC), accounting for 4–5% of renal cell carcinoma (RCC). However, there is no effective biomarker to predict clinical outcomes of this malignant disease. Bioinformatic methods may provide a feasible way to solve this problem.

Methods

In this study, differentially expressed genes (DEGs) of ChRCC samples from The Cancer Genome Atlas database were filtered to construct co-expression modules by weighted gene co-expression network analysis, and the key module was identified by calculating module-trait correlations. Functional analysis was performed on the key module, and candidate hub genes were screened by co-expression and MCODE analysis. Afterwards, real hub genes were filtered in an independent dataset, GSE15641, and validated by survival analysis.

Results

Overall, 2215 DEGs were screened to construct eight co-expression modules. The brown module was identified as the key module because it had the highest correlations with pathologic stage, neoplasm status and survival status. Twenty-nine candidate hub genes were identified. GO and KEGG analyses demonstrated that most candidate genes were enriched in the mitotic cell cycle. Three real hub genes (SKA1, ERCC6L, GTSE-1) were selected after mapping candidate genes to GSE15641, and two of them (SKA1, ERCC6L) were significantly related to the overall survival of ChRCC patients.

Conclusions

In summary, our findings identified molecular markers correlated with progression and prognosis of ChRCC, which might provide new implications for improving risk evaluation, therapeutic intervention, and prognosis prediction in ChRCC patients.

13.

Background

Protein kinase C ζ (PKCζ), an isoform of the atypical protein kinase C, is a pivotal regulator in cancer. However, the molecular and cellular mechanisms whereby PKCζ regulates tumorigenesis and metastasis are still not fully understood. In this study, proteomics and bioinformatics analyses were performed to establish a protein-protein interaction (PPI) network associated with PKCζ, laying a stepping stone to further understand the diverse biological roles of PKCζ.

Methods

Protein complexes associated with PKCζ were purified by co-immunoprecipitation from breast cancer cell MDA-MB-231 and identified by LC-MS/MS. Two biological replicates and two technical replicates were analyzed. The observed proteins were filtered using the CRAPome database to eliminate the potential false positives. The proteomics identification results were combined with PPI database search to construct the interactome network. Gene ontology (GO) and pathway analysis were performed by PANTHER database and DAVID. Next, the interaction between PKCζ and protein phosphatase 2 catalytic subunit alpha (PPP2CA) was validated by co-immunoprecipitation, Western blotting and immunofluorescence. Furthermore, the TCGA database and the COSMIC database were used to analyze the expressions of these two proteins in clinical samples.

Results

A PKCζ-centered PPI network containing 178 nodes and 1225 connections was built. Network analysis showed that the identified proteins were significantly associated with several key signaling pathways regulating cancer-related cellular processes.

Conclusions

Through combining the proteomics and bioinformatics analyses, a PKCζ centered PPI network was constructed, providing a more complete picture regarding the biological roles of PKCζ in both cancer regulation and other aspects of cellular biology.

14.

Background

A genetic study was performed to identify candidate genes associated with day blindness in the standard wire-haired dachshund. Based on a literature review of diseases in dogs and humans with phenotypes similar to day blindness, ten genes were selected and evaluated as potential candidate genes associated with day blindness in the breed.

Results

Three of the genes, CNGB3, CNGA3 and GNAT2, involved in cone degeneration and seven genes and loci, ABCA4, RDH5, CORD8, CORD9, RPGRIP1, GUCY2D and CRX, reported to be involved in cone-rod dystrophies were studied. Polymorphic markers at each of the candidate loci were studied in a family with 36 informative offspring. The study revealed a high frequency of recombinations between the candidate marker alleles and the disease.

Conclusion

Since all of the markers were at the exact position of the candidate loci, and several recombinations were detected for each of the loci, all ten genes were excluded as causal for this canine, early onset cone-rod dystrophy. The described markers may, however, be useful to screen other canine resource families segregating eye diseases for association to the ten genes.

15.

Background

Protein complexes play an important role in biological processes. Recent experimental developments have resulted in the publication of many high-quality, large-scale protein-protein interaction (PPI) datasets, which provide abundant data for computational approaches to the prediction of protein complexes. However, the precision of protein complex prediction still needs to be improved because PPI networks are incomplete and noisy.

Results

Integrating multiple sources of biological information reveals complex and diverse relationships among proteins. Considering that different types of interactions do not carry the same weight in protein complex prediction, we construct a multi-relationship protein interaction network (MPIN) by integrating PPI network topology with gene ontology annotation information. Then, we design a novel algorithm named MINE (identifying protein complexes based on Multi-relationship protein Interaction NEtwork) to predict protein complexes with high cohesion and low coupling from MPIN.

Conclusions

The experiments on yeast data show that MINE outperforms the current methods in terms of both accuracy and statistical significance.

16.

Background

Predicting disease causative genes (or simply, disease genes) plays a critical role in understanding the genetic basis of human diseases and in providing disease treatment guidelines. While various computational methods have been proposed for disease gene prediction, the recent increasing availability of biological information for genes strongly motivates leveraging these valuable data sources to extract useful information for accurately predicting disease genes.

Results

We present an integrative framework called N2VKO to predict disease genes. First, we learn node embeddings for genes from the protein-protein interaction (PPI) network by adapting the well-known representation learning method node2vec. Second, we combine the learned node embeddings with various biological annotations as a rich feature representation for genes, and subsequently build binary classification models for disease gene prediction. Finally, as the data for disease gene prediction is usually imbalanced (i.e., the number of causative genes for a specific disease is much smaller than that of its non-causative genes), we further address this serious data imbalance issue by applying oversampling techniques to improve the prediction performance. Comprehensive experiments demonstrate that our proposed N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases.

Conclusions

In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, while integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance the prediction performance. In addition, a literature search of the predicted disease genes also shows the effectiveness of our proposed N2VKO framework for disease gene prediction.

17.
18.
Ou-Yang Le, Yan Hong, Zhang Xiao-Fei. BMC Bioinformatics, 2017, 18(13): 463-34

Background

The accurate identification of protein complexes is important for the understanding of cellular organization. Up to now, computational methods for protein complex detection have mostly focused on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy, and it is hard to achieve reliable prediction results by simply applying computational methods to PPI data. Behind protein interactions, there are protein domains that interact with each other. Therefore, based on domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDIs) has the potential to obtain better performance in protein complex detection. As traditional computational methods are designed to detect protein complexes from a single PPI network, it is necessary to design a new algorithm that can effectively utilize the information inherent in multiple heterogeneous networks.

Results

In this paper, we introduce a novel multi-network clustering algorithm to detect protein complexes from multiple heterogeneous networks. Unlike existing protein complex identification algorithms that focus on the analysis of a single PPI network, our model can jointly exploit the information inherent in PPI and DDI data to achieve more reliable prediction results. Extensive experimental results on real-world data sets demonstrate that our method can predict protein complexes more accurately than other state-of-the-art protein complex identification algorithms.

Conclusions

In this work, we demonstrate that the joint analysis of PPI network and DDI network can help to improve the accuracy of protein complex detection.

19.

Background

Alcoholism is a complex disease. There have been many reports on significant comorbidity between alcoholism and schizophrenia. For the genetic study of complex diseases, association analysis has been recommended because of its higher power than that of the linkage analysis for detecting genes with modest effects on disease.

Results

To identify alcoholism susceptibility loci, we performed genome-wide single-nucleotide polymorphism (SNP) association tests, which yielded 489 significant SNPs at the 1% significance level. The association tests showed that tsc0593964 (P-value 0.000013) on chromosome 7 was most significantly associated with alcoholism. From the 489 SNPs, 74 genes were identified. Among these genes, GABRA1 is a member of the same gene family as GABRA2, which was recently reported as an alcoholism susceptibility gene.

Conclusion

By comparing the 74 genes with the published results of various linkage studies of schizophrenia, we identified 13 alcoholism-associated genes located in regions reported to be linked to schizophrenia. These 13 genes can be important candidates for studying the genetic mechanism underlying the co-occurrence of both diseases.

20.

Background

The integration of high-quality, genome-wide analyses offers a robust approach to elucidating genetic factors involved in complex human diseases. Even though several methods exist to integrate heterogeneous omics data, most biologists still manually select candidate genes by examining the intersection of lists of candidates stemming from analyses of different types of omics data that have been generated by imposing hard (strict) thresholds on quantitative variables, such as P-values and fold changes, increasing the chance of missing potentially important candidates.

Methods

To better facilitate the unbiased integration of heterogeneous omics data collected from diverse platforms and samples, we propose a desirability function framework for identifying candidate genes with strong evidence across data types as targets for follow-up functional analysis. Our approach is targeted towards disease systems with sparse, heterogeneous omics data, so we tested it on one such pathology: spontaneous preterm birth (sPTB).

Results

We developed the software integRATE, which uses desirability functions to rank genes both within and across studies, identifying well-supported candidate genes according to the cumulative weight of biological evidence rather than based on imposition of hard thresholds of key variables. Integrating 10 sPTB omics studies identified both genes in pathways previously suspected to be involved in sPTB as well as novel genes never before linked to this syndrome. integRATE is available as an R package on GitHub (https://github.com/haleyeidem/integRATE).
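To make the idea of ranking by desirability rather than by hard thresholds concrete, here is a minimal sketch. It does not use or reproduce the integRATE API; the function names, the linear one-sided desirability shapes, the cut points (0.05 for P-values, 3.0 for absolute log2 fold change), and the weighted arithmetic mean used to combine scores are all assumptions for illustration.

```python
def desirability_high(x, low, high, scale=1.0):
    """Desirability in [0, 1] for a variable where larger values are better."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return ((x - low) / (high - low)) ** scale


def desirability_low(x, low, high, scale=1.0):
    """Desirability in [0, 1] for a variable where smaller values are better."""
    if x <= low:
        return 1.0
    if x >= high:
        return 0.0
    return ((high - x) / (high - low)) ** scale


def overall_desirability(values, weights=None):
    """Combine per-variable desirabilities with a weighted arithmetic mean."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def rank_genes(table):
    """Rank genes by combined desirability of (P-value, |log2 fold change|)."""
    scored = {}
    for gene, (p_value, abs_log2_fc) in table.items():
        d_p = desirability_low(p_value, 0.0, 0.05)       # illustrative cut points
        d_fc = desirability_high(abs_log2_fc, 0.0, 3.0)  # illustrative cut points
        scored[gene] = overall_desirability([d_p, d_fc])
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical gene table: gene -> (P-value, absolute log2 fold change).
ranking = rank_genes({"g1": (0.01, 1.5), "g2": (0.04, 3.0), "g3": (0.2, 0.5)})
```

In this toy table, a gene with a weak P-value is not discarded outright; it simply receives a low combined score, which is the contrast with hard thresholding that the abstract draws.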

Conclusions

Desirability-based data integration is a solution most applicable in biological research areas where omics data is especially heterogeneous and sparse, allowing for the prioritization of candidate genes that can be used to inform more targeted downstream functional analyses.