Similar articles
 20 similar articles found; search took 31 ms
1.
Substantial evidence has shown that microRNAs (miRNAs) may be causally linked to the occurrence and progression of human diseases. Herein, we conducted an enrichment analysis to identify potential functional miRNA-disease associations (MDAs) in humans by integrating currently known biological data: miRNA-target interactions (MTIs), protein-protein interactions, and gene-disease associations. Two contributing factors to functional miRNA-disease associations were quantitatively considered: the direct effects of miRNAs that target disease-related genes, and indirect effects triggered by protein-protein interactions. Ninety-nine miRNAs were scanned for possible functional association with 2223 MeSH-defined human diseases. Each miRNA was experimentally validated to target ≥ 10 mRNA genes. Putative MDAs were identified when at least one MTI was confidently validated for a disease. Overall, 19648 putative MDAs were found, of which 10.0% were experimentally validated. Further results suggest that filtering for miRNAs that target a greater number of disease-related genes (n ≥ 8) can significantly enrich for true MDAs from the set of putative associations (enrichment rate = 60.7%, adjusted hypergeometric p = 2.41×10^−91). Considering the indirect effects of miRNAs further elevated the enrichment rate to 72.6%. By using this method, a novel MDA between miR-24 and ovarian cancer was found. Compared with scrambled miRNA, overexpression of miR-24 was validated to markedly induce apoptosis in ovarian cancer cells. Our study provides novel insight into factors contributing to functional MDAs by integrating large quantities of previously generated biological data, and establishes a feasible method to identify plausible associations with high confidence.
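
The enrichment test mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `hypergeom_sf` and the toy counts are assumptions made for illustration, and the multiple-testing adjustment behind the reported "adjusted" p-value is not shown. The idea is to ask how likely it is to see at least k validated associations in a filtered subset of size N, if the subset were drawn at random from M putative associations of which n are validated.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """Upper-tail hypergeometric probability P(X >= k).

    M: population size (all putative associations)
    n: successes in the population (validated associations)
    N: sample size (associations that pass the filter)
    k: successes observed in the sample
    """
    lo = max(k, 0)
    hi = min(n, N)
    total = comb(M, N)
    # comb(a, b) returns 0 when b > a, so impossible terms drop out.
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(lo, hi + 1))
    return favourable / total

# Hypothetical toy numbers, not taken from the study.
p_value = hypergeom_sf(k=3, M=20, n=5, N=4)
enrichment_rate = 3 / 4
```

With these toy counts the p-value is 155/4845, about 0.032, and the enrichment rate of the filtered subset is 75% against a background rate of 25%.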

2.
Assigning functions to unknown proteins is one of the most important problems in proteomics. Several approaches have used protein-protein interaction data to predict protein functions. We previously developed a Markov random field (MRF) based method to infer a protein's functions using protein-protein interaction data and the functional annotations of its protein interaction partners. In the original model, only direct interactions were considered and each function was considered separately. In this study, we develop a new model which extends direct interactions to all neighboring proteins, and one function to multiple functions. The goal is to understand a protein's function based on information on all the neighboring proteins in the interaction network. We first developed a novel kernel logistic regression (KLR) method based on diffusion kernels for protein interaction networks. The diffusion kernels provide a means to incorporate all neighbors of proteins in the network. Second, we identified a set of functions that are highly correlated with the function of interest, referred to as the correlated functions, using the chi-square test. Third, the correlated functions were incorporated into our new KLR model. Fourth, we extended our model by incorporating multiple biological data sources such as protein domains, protein complexes, and gene expression by converting them into networks. We showed that the KLR approach of incorporating all protein neighbors significantly improved the accuracy of protein function predictions over the MRF model. The incorporation of multiple data sets also improved prediction accuracy. The prediction accuracy is comparable to another protein function classifier based on the support vector machine (SVM), using a diffusion kernel. The advantages of the KLR model include its simplicity as well as its ability to explore the contribution of neighbors to the functions of proteins of interest.

3.
Background

There is a growing body of evidence associating microRNAs (miRNAs) with human diseases. MiRNAs are new key players in the disease paradigm, demonstrating roles in several human diseases. The functional association between miRNAs and diseases remains largely unclear and far from complete. With the advent of high-throughput functional genomics techniques that infer genes and biological pathways dysregulated in diseases, it is now possible to infer functional associations between diseases and biological molecules by integrating disparate biological information.

Results

Here, we first used a Lasso regression model to identify miRNAs associated with a disease signature as a proof of concept. Then we proposed an integrated approach that uses disease-gene associations from microarray experiments and text mining, and miRNA-gene associations from computational predictions and protein networks, to build a functional association network between miRNAs and diseases. The findings of the proposed model were validated against gold standard datasets using ROC analysis, and the results were promising (AUC=0.81). Our protein network-based approach discovered 19 new functional associations between prostate cancer and miRNAs. The 19 new associations were validated using miRNA expression data and clinical profiles and were shown to act as diagnostic and prognostic prostate biomarkers. The proposed integrated approach allowed us to reconstruct functional associations between miRNAs and human diseases and uncovered functional roles of newly discovered miRNAs.

Conclusions

Lasso regression was used to find associations between diseases and miRNAs using their gene signature. Defining the miRNA gene signature by integrating the downstream effects of miRNAs demonstrated better performance than the miRNA signature alone. Integrating biological networks and multiple data sources to define miRNA and disease gene signatures demonstrated high performance in uncovering new functional associations between miRNAs and diseases.

4.
Using indirect protein-protein interactions for protein complex prediction   Total citations: 1, self-citations: 0, citations by others: 1
Protein complexes are fundamental for understanding principles of cellular organization. As the sizes of protein-protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using a topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein-protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
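
The network modification step described in the abstract above (drop weak direct interactions, add strong level-2 interactions) can be sketched as follows. The abstract does not give the FS-Weight formula, so this sketch substitutes a plain Jaccard overlap of closed neighborhoods as a stand-in topological weight; the function names, the toy network and the threshold are all assumptions for illustration only.

```python
from itertools import combinations

def neighbor_sets(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shared_neighbor_score(adj, u, v):
    """Jaccard overlap of closed neighborhoods (stand-in for FS-Weight)."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)

def augment_network(edges, threshold):
    """Keep direct and level-2 pairs whose score reaches the threshold."""
    adj = neighbor_sets(edges)
    direct = {frozenset(e) for e in edges}
    kept = set()
    for u, v in combinations(sorted(adj), 2):
        pair = frozenset((u, v))
        # Level-2 pair: not directly linked, but sharing at least one partner.
        is_level2 = pair not in direct and bool(adj[u] & adj[v])
        if (pair in direct or is_level2) and shared_neighbor_score(adj, u, v) >= threshold:
            kept.add((u, v))
    return kept

# Hypothetical toy network: A and B share partners C and D but do not interact.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("D", "E"), ("E", "F")]
modified = augment_network(edges, threshold=0.5)
```

On this toy input the level-2 pair (A, B) is introduced, while the weakly supported direct edges (A, D), (B, D) and (D, E) are removed. Any clustering algorithm could then be run on `modified`.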

5.
Plant protein-protein interaction networks have not been identified by large-scale experiments. In order to better understand the protein interactions in rice, the Predicted Rice Interactome Network (PRIN; http://bis.zju.edu.cn/prin/) presented 76,585 predicted interactions involving 5,049 rice proteins. After mapping genomic features of rice (GO annotation, subcellular localization prediction, and gene expression), we found that a well-annotated and biologically significant network is rich enough to capture many significant functional linkages within higher-order biological systems, such as pathways and biological processes. Furthermore, we took MADS-box domain-containing proteins and circadian rhythm signaling pathways as examples to demonstrate that functional protein complexes and biological pathways could be effectively expanded in our predicted network. The expanded molecular network in PRIN has considerably improved the capability of these analyses to integrate existing knowledge and provide novel insights into the function and coordination of genes and gene networks.

6.
A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and usage of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method is able (in a cross-validation setting) to rank the true causal gene first for 34% of the diseases, and infer 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have been found already: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases highly match the known literature, suggesting several novel causal genes and protein complexes for further investigation.
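
A prioritization function that is smooth over the network while staying close to prior information, as described in the abstract above, is commonly computed by iterative score propagation. The sketch below is a generic version of that idea, not the PRINCE implementation: the update rule, the degree normalization, the parameter `alpha` and the toy network are assumptions for illustration.

```python
from math import sqrt

def propagate(edges, prior, alpha=0.8, iterations=100):
    """Iterate f <- alpha * W' f + (1 - alpha) * y on an undirected graph.

    W' is the adjacency matrix normalized by sqrt(deg(u) * deg(v)); y is the
    prior score vector. alpha trades network smoothness against the prior.
    """
    nodes = sorted({n for e in edges for n in e} | set(prior))
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    deg = {n: len(adj[n]) for n in nodes}
    y = {n: float(prior.get(n, 0.0)) for n in nodes}
    f = dict(y)
    for _ in range(iterations):
        # The right-hand side reads the previous f before it is rebound.
        f = {
            n: alpha * sum(f[m] / sqrt(deg[n] * deg[m]) for m in adj[n])
            + (1 - alpha) * y[n]
            for n in nodes
        }
    return f

# Hypothetical toy chain A - B - C - D with one known disease gene, A.
scores = propagate([("A", "B"), ("B", "C"), ("C", "D")], prior={"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Genes closer to the known disease gene receive higher scores, so the ranking on this toy chain follows network distance from A.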

7.
Greedily building protein networks with confidence   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: With genome sequences complete for human and model organisms, it is essential to understand how individual genes and proteins are organized into biological networks. Much of the organization is revealed by proteomics experiments that now generate torrents of data. Extracting relevant complexes and pathways from high-throughput proteomics data sets has posed a challenge, however, and new methods to identify and extract networks are essential. We focus on the problem of building pathways starting from known proteins of interest. RESULTS: We have developed an efficient, greedy algorithm, SEEDY, that extracts biologically relevant networks from protein-protein interaction data, building out from selected seed proteins. The algorithm relies on our previous study establishing statistical confidence levels for interactions generated by two-hybrid screens and inferred from mass spectrometric identification of protein complexes. We demonstrate the ability to extract known yeast complexes from high-throughput protein interaction data with a tunable parameter that governs the trade-off between sensitivity and selectivity. DNA damage repair pathways are presented as a detailed example. We highlight the ability to join heterogeneous data sets, in this case protein-protein interactions and genetic interactions, and the appearance of cross-talk between pathways caused by re-use of shared components. SIGNIFICANCE AND COMPARISON: The significance of the SEEDY algorithm is threefold: it is fast, with running time O[(E + V) log V] for V proteins and E interactions; a single adjustable parameter controls the size of the pathways that are generated; and an associated P-value indicates the statistical confidence that the pathways are enriched for proteins with a coherent function. Previous approaches have focused on extracting sub-networks by identifying motifs enriched in known biological networks. SEEDY provides the complementary ability to perform a directed search based on proteins of interest. AVAILABILITY: SEEDY software (Perl source), data tables and confidence score models (R source) are freely available from the author.
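
The abstract above describes a greedy search that grows a network outward from seed proteins, with one parameter controlling its size. The sketch below shows one plausible way to do this with a priority queue; it is not the SEEDY source (which is Perl and not reproduced here). The function name, the use of a product of edge confidences as path confidence, and the toy data are assumptions for illustration.

```python
import heapq

def grow_from_seeds(edges, seeds, min_confidence):
    """Greedily expand from seeds, always taking the most confident path next.

    edges: dict mapping (u, v) -> confidence in (0, 1].
    Returns a dict of reached proteins -> best path confidence from any seed.
    A protein is added only if that confidence is at least min_confidence.
    """
    adj = {}
    for (u, v), c in edges.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    best = {}
    # heapq is a min-heap, so confidences are stored negated.
    heap = [(-1.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        neg_conf, node = heapq.heappop(heap)
        if node in best:
            continue  # already reached through a more confident path
        conf = -neg_conf
        best[node] = conf
        for nbr, c in adj.get(node, []):
            path_conf = conf * c
            if nbr not in best and path_conf >= min_confidence:
                heapq.heappush(heap, (-path_conf, nbr))
    return best

# Hypothetical interaction confidences.
toy_edges = {("S", "A"): 0.9, ("A", "B"): 0.8, ("S", "C"): 0.3, ("B", "D"): 0.5}
pathway = grow_from_seeds(toy_edges, seeds=["S"], min_confidence=0.5)
```

Because confidences never exceed 1, the greedy order is the same as in Dijkstra's algorithm, and a binary heap gives a running time of the same order as the bound quoted in the abstract. Lowering `min_confidence` enlarges the extracted pathway; on the toy data, C and D are excluded at a threshold of 0.5.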

8.
The biomedical literature contains a wealth of information on associations between many different types of objects, such as protein-protein interactions, gene-disease associations and subcellular locations of proteins. When searching such information using conventional search engines, e.g. PubMed, users see the data only one abstract at a time and 'hidden' in natural language text. AliBaba is an interactive tool for graphical summarization of search results. It parses the set of abstracts that fit a PubMed query and presents extracted information on biomedical objects and their relationships as a graphical network. AliBaba extracts associations between cells, diseases, drugs, proteins, species and tissues. Several filter options allow for a more focused search. Thus, researchers can grasp complex networks described in various articles at a glance. AVAILABILITY: http://alibaba.informatik.hu-berlin.de/

9.
Large-scale protein-protein interaction data sets have been generated for several species including yeast and human and have enabled the identification, quantification, and prediction of cellular molecular networks. Affinity purification-mass spectrometry (AP-MS) is the preeminent methodology for large-scale analysis of protein complexes, performed by immunopurifying a specific "bait" protein and its associated "prey" proteins. The analysis and interpretation of AP-MS data sets is, however, not straightforward. In addition, although yeast AP-MS data sets are relatively comprehensive, current human AP-MS data sets only sparsely cover the human interactome. Here we develop a framework for analysis of AP-MS data sets that addresses the issues of noise, missing data, and sparsity of coverage in the context of a current, real-world human AP-MS data set. Our goal is to extend and increase the density of the known human interactome by integrating bait-prey and cocomplexed preys (prey-prey associations) into networks. Our framework incorporates a score for each identified protein, as well as elements of signal processing to improve the confidence of identified protein-protein interactions. We identify many protein networks enriched in known biological processes and functions. In addition, we show that integrated bait-prey and prey-prey interactions can be used to refine network topology and extend known protein networks.

10.
Protein interactions are fundamental to the proper functioning of cells, and aberrant formation or regulation of protein interactions is at the heart of many diseases, including cancer. The advancement of methods to study the identity, function, and regulation of protein complexes makes possible the understanding of how those complexes malfunction in human diseases. New methodologies in mass spectrometry, microscopy, and protein structural analysis are rapidly advancing the amount and quality of the data, as well as the level of detail that can be obtained from experiments. With this progress, the questions that can be addressed and the biological landscape are changing. This series of minireviews highlights methodological advances and how they have been applied in novel ways to explore the function and regulation of pathways and dynamic networks in cells.

11.
MOTIVATION: Recent screening techniques have made large amounts of protein-protein interaction data available, from which biologically important information such as the function of uncharacterized proteins, the existence of novel protein complexes, and novel signal-transduction pathways can be discovered. However, experimental data on protein interactions contain many false positives, making these discoveries difficult. Therefore computational methods of assessing the reliability of each candidate protein-protein interaction are urgently needed. RESULTS: We developed a new 'interaction generality' measure (IG2) to assess the reliability of protein-protein interactions using only the topological properties of their interaction-network structure. Using yeast protein-protein interaction data, we showed that reliable protein-protein interactions had significantly lower IG2 values than less-reliable interactions, suggesting that IG2 values can be used to evaluate and filter interaction data to enable the construction of reliable protein-protein interaction networks.

12.
MOTIVATION: Gene association/interaction networks provide vast amounts of information about essential processes inside the cell. A complete picture of gene-gene associations/interactions would open new horizons for biologists, ranging from pure appreciation to successful manipulation of biological pathways for therapeutic purposes. Therefore, identification of important biological complexes whose members (genes and their protein products) interact with each other is of prime importance. Numerous experimental methods exist but, for the most part, they are costly and labor intensive. Computational techniques, such as the one proposed in this work, provide a quick 'budget' solution that can be used as a screening tool before more expensive techniques are attempted. Here, we introduce a novel computational method based on the partial least squares (PLS) regression technique for reconstruction of genetic networks from microarray data. RESULTS: The proposed PLS method is shown to be an effective screening procedure for the detection of gene-gene interactions from microarray data. Both simulated and real microarray experiments show that the PLS-based approach is superior to its competitors both in terms of performance and applicability. AVAILABILITY: R code is available from the supplementary web-site whose URL is given below.

13.
Tu K, Yu H, Li YX. Journal of Biotechnology. 2006;124(3):475-485
The ever-increasing flow of gene expression profiles and protein-protein interactions has catalyzed many computational approaches for inference of gene functions. Despite all the efforts, there is still room for improvement, because the information contained in each biological data source has not been fully exploited. A composite method is proposed for classifying unannotated genes based on expression data and protein-protein interaction (PPI) data, which extracts information from both data sources in novel ways. With the noisy nature of expression data taken into consideration, importance is attached to the consensus expression patterns of gene classes instead of the actual expression profiles of individual genes, which gives the composite method enhanced robustness against microarray data variation. With regard to the PPI network, the traditional clear-cut binary attitude towards inter- and intra-functional interactions is abandoned, and a more objective perspective on the PPI network structure is formed by incorporating the varied function-function interaction probabilities into the algorithm. The composite method was implemented in two numerical experiments, where its improvement over single-data-source based methods was observed and the superiority of the novel data handling operations was discussed.

14.
Yang P, Li X, Wu M, Kwoh CK, Ng SK. PLoS ONE. 2011;6(7):e21502

Background

Phenotypically similar diseases have been found to be caused by functionally related genes, suggesting a modular organization of the genetic landscape of human diseases that mirrors the modularity observed in biological interaction networks. Protein complexes, as molecular machines that integrate multiple gene products to perform biological functions, express the underlying modular organization of protein-protein interaction networks. As such, protein complexes can be useful for interrogating the networks of phenome and interactome to elucidate gene-phenotype associations of diseases.

Methodology/Principal Findings

We proposed a technique called RWPCN (Random Walker on Protein Complex Network) for predicting and prioritizing disease genes. The basis of RWPCN is a protein complex network constructed using existing human protein complexes and protein interaction network. To prioritize candidate disease genes for the query disease phenotypes, we compute the associations between the protein complexes and the query phenotypes in their respective protein complex and phenotype networks. We tested RWPCN on predicting gene-phenotype associations using leave-one-out cross-validation; our method was observed to outperform existing approaches. We also applied RWPCN to predict novel disease genes for two representative diseases, namely, Breast Cancer and Diabetes.

Conclusions/Significance

Guilt-by-association prediction and prioritization of disease genes can be enhanced by fully exploiting the underlying modular organizations of both the disease phenome and the protein interactome. Our RWPCN uses a novel protein complex network as a basis for interrogating the human phenome-interactome network. As the protein complex network can capture the underlying modularity in the biological interaction networks better than simple protein interaction networks, RWPCN was found to be able to detect and prioritize disease genes better than traditional approaches that used only protein-phenotype associations.

15.
Prediction of molecular interaction networks from large-scale datasets in genomics and other omics experiments is an important task in terms of both developing bioinformatics methods and solving biological problems. We have applied a kernel-based network inference method to extract genes functionally related to the response to nitrogen deprivation in the cyanobacterium Anabaena sp. PCC 7120, integrating three heterogeneous datasets: microarray data, phylogenetic profiles, and gene orders on the chromosome. We obtained 1348 predicted genes that are related in some way to known genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. While this dataset contained previously known genes related to the nitrogen deprivation condition, it also contained additional genes. Thus, we attempted to select relevant genes using the constraints of Pfam domains and NtcA-binding sites. We found candidate nitrogen metabolism-related genes, which are depicted as extensions of existing KEGG pathways. The prediction of functional relationships between proteins, rather than functions of individual proteins, will thus assist discovery from large-scale datasets.

16.
The increasing interest in systems biology has resulted in extensive experimental data describing networks of interactions (or associations) between molecules in metabolism, protein-protein interactions and gene regulation. Comparative analysis of these networks is central to understanding biological systems. We report a novel method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence) by which similar subgraphs in a pair of networks can be identified. Like other methods, PHUNKEE explicitly considers the graphical form of the data and allows for gaps. However, it is novel in that it includes information about the context of the subgraph within the adjacent network. We also explore a new approach to quantifying the statistical significance of matching subgraphs. We report similar subgraphs in metabolic pathways and in protein-protein interaction networks. The most similar metabolic subgraphs were generally found to occur in processes central to all life, such as purine, pyrimidine and amino acid metabolism. The most similar pairs of subgraphs found in the protein-protein interaction networks of Drosophila melanogaster and Saccharomyces cerevisiae also include central processes such as cell division but, interestingly, also include protein sub-networks involved in pre-mRNA processing. The inclusion of network context information in the comparison of protein interaction networks increased the number of similar subgraphs found consisting of proteins involved in the same functional process. This could have implications for the prediction of protein function.

17.

Background

Protein complexes play an important role in biological processes. Recent developments in experiments have resulted in the publication of many high-quality, large-scale protein-protein interaction (PPI) datasets, which provide abundant data for computational approaches to the prediction of protein complexes. However, the precision of protein complex prediction still needs to be improved because PPI networks are incomplete and noisy.

Results

Complex and diverse relationships among proteins emerge after integrating multiple sources of biological information. Considering that different types of interactions do not carry the same weight for protein complex prediction, we construct a multi-relationship protein interaction network (MPIN) by integrating PPI network topology with gene ontology annotation information. Then, we design a novel algorithm named MINE (identifying protein complexes based on Multi-relationship protein Interaction NEtwork) to predict protein complexes with high cohesion and low coupling from MPIN.

Conclusions

The experiments on yeast data show that MINE outperforms the current methods in terms of both accuracy and statistical significance.

18.
MOTIVATION: Biological processes in cells are carried out through gene regulation, signal transduction and interactions between proteins. To understand such molecular networks, we propose a statistical method to estimate gene regulatory networks and protein-protein interaction networks simultaneously from DNA microarray data, protein-protein interaction data and other genome-wide data. RESULTS: We unify Bayesian networks and Markov networks for estimating gene regulatory networks and protein-protein interaction networks according to the reliability of each biological information source. Through the simultaneous construction of gene regulatory networks and protein-protein interaction networks of the Saccharomyces cerevisiae cell cycle, we predict the role of several genes whose functions are currently unknown. By using our probabilistic model, we can detect false positives in high-throughput data, such as yeast two-hybrid data. In a genome-wide experiment, we find possible gene regulatory relationships and protein-protein interactions between large protein complexes that underlie complex regulatory mechanisms of biological processes.

19.

Background

Scientists have long been trying to understand the molecular mechanisms of diseases in order to design preventive and therapeutic strategies. For some diseases, it has become evident that it is not enough to obtain a catalogue of disease-related genes; one must also uncover how disruptions of molecular networks in the cell give rise to disease phenotypes. Moreover, with the unprecedented wealth of information available, even obtaining such a catalogue is extremely difficult.

Principal Findings

We developed a comprehensive gene-disease association database by integrating associations from several sources that cover different biomedical aspects of diseases. In particular, we focus on the current knowledge of human genetic diseases including Mendelian, complex and environmental diseases. To assess the concept of modularity of human diseases, we performed a systematic study of the emergent properties of human gene-disease networks by means of network topology and functional annotation analysis. The results indicate a highly shared genetic origin of human diseases and show that for most diseases, including Mendelian, complex and environmental diseases, functional modules exist. Moreover, a core set of biological pathways is found to be associated with most human diseases. We obtained similar results when studying clusters of diseases, suggesting that related diseases might arise due to dysfunction of common biological processes in the cell.

Conclusions

For the first time, we include Mendelian, complex and environmental diseases in an integrated gene-disease association database and show that the concept of modularity applies to all of them. We furthermore provide a functional analysis of disease-related modules that yields important new biological insights, which might not be discovered when considering each of the gene-disease association repositories independently. Hence, we present a suitable framework for the study of how genetic and environmental factors, such as drugs, contribute to diseases.

Availability

The gene-disease networks used in this study and part of the analysis are available at http://ibi.imim.es/DisGeNET/DisGeNETweb.html#Download.

20.