Found 20 similar articles.
1.
Predicting potential drug-target interactions from heterogeneous biological data is critical not only for better understanding of the various interactions and biological processes, but also for the development of novel drugs and the improvement of human medicines. In this paper, the method of Network-based Random Walk with Restart on the Heterogeneous network (NRWRH) is developed to predict potential drug-target interactions on a large scale, under the hypothesis that similar drugs often target similar target proteins and within the framework of Random Walk. Compared with traditional supervised or semi-supervised methods, NRWRH makes full use of the network as a tool for data integration to predict drug-target associations. It integrates three different networks (protein-protein similarity network, drug-drug similarity network, and known drug-target interaction network) into a heterogeneous network by known drug-target interactions and implements the random walk on this heterogeneous network. When applied to four classes of important drug-target interactions including enzymes, ion channels, GPCRs and nuclear receptors, NRWRH significantly improves on previous methods in terms of cross-validation and potential drug-target interaction prediction. Excellent performance enables us to suggest a number of new potential drug-target interactions for drug development.
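The core iteration behind a random walk with restart can be sketched as follows. This is a minimal illustration on a toy four-node network, not the NRWRH implementation: the function name, the restart probability of 0.7, the edge weights, and the use of a single merged adjacency matrix are all assumptions made for the example. The published method defines its own transition matrix across the drug, target, and interaction networks.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    `adjacency` is a symmetric list-of-lists of non-negative edge weights.
    All parameter names and defaults are hypothetical choices for this sketch.
    """
    n = len(adjacency)
    # Column-normalise the weights so that each column is a transition distribution.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The restart vector puts equal mass on every seed node.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1.0 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:  # stop once the L1 change is negligible
            break
    return p


# Toy heterogeneous network (assumed weights): nodes 0 and 1 are drugs,
# nodes 2 and 3 are targets.
adjacency = [
    [0.0, 0.8, 1.0, 0.0],  # drug 0: similar to drug 1, known to hit target 2
    [0.8, 0.0, 0.0, 0.0],  # drug 1: no known targets
    [1.0, 0.0, 0.0, 0.6],  # target 2: similar to target 3
    [0.0, 0.0, 0.6, 0.0],  # target 3
]
scores = random_walk_with_restart(adjacency, seeds=[1])
ranked_targets = sorted([2, 3], key=lambda t: scores[t], reverse=True)
```

Starting the walk from drug 1, which has no known targets, the candidate targets are ranked by their steady-state probability; target 2 ranks first because it is reached through the similar drug 0.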
2.
Background
Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers with the purpose of predicting new undiscovered interactions. However, a key challenge regarding this data that has not yet been addressed by these methods, namely class imbalance, is potentially degrading the prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance due to the bias in prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data with some types having relatively fewer members (or are less represented) than others. This variation in representation of the different interaction types leads to another kind of imbalance referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types.
Results
We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over 4 state-of-the-art methods. In addition, we simulated cases for new drugs and targets to see how our method would perform in predicting their interactions. New drugs and targets are those for which no prior interactions are known. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.
Conclusions
Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
3.
4.
Background
Protein-protein interactions are crucially important for cellular processes. Knowledge of these interactions improves the understanding of cell cycle, metabolism, signaling, transport, and secretion. Information about interactions can hint at molecular causes of diseases, and can provide clues for new therapeutic approaches. Several (usually expensive and time-consuming) experimental methods can probe protein-protein interactions. Data sets derived from such experiments make the development of prediction methods feasible and make the creation of protein-protein interaction network prediction tools possible.
5.
Bo Xu Kun Li Wei Zheng Xiaoxia Liu Yijia Zhang Zhehuan Zhao Zengyou He 《BMC bioinformatics》2018,19(1):535
Background
Identifying protein complexes from protein-protein interaction (PPI) networks is one of the most important tasks in proteomics. Existing computational methods try to incorporate a variety of biological evidence to enhance the quality of predicted complexes. However, it is still a challenge to integrate different types of biological information into the complex discovery process under a unified framework. Recently, attributed network embedding methods have been proved to be remarkably effective in generating vector representations for nodes in the network. In the transformed vector space, both the topological proximity and the node attribute affinity between different nodes are preserved. Therefore, such attributed network embedding methods provide us with a unified framework to integrate various biological evidence into the protein complex identification process.
Results
In this article, we propose a new method called GANE to predict protein complexes based on Gene Ontology (GO) attributed network embedding. Firstly, it learns the vector representation for each protein from a GO attributed PPI network. Based on the pair-wise vector representation similarity, a weighted adjacency matrix is constructed. Secondly, it uses the clique mining method to generate candidate cores. Consequently, seed cores are obtained by ranking candidate cores based on their densities on the weighted adjacency matrix and removing redundant cores. For each seed core, its attachments are the proteins whose correlation score is larger than a given threshold. The combination of a seed core and its attachment proteins is reported as a predicted protein complex by the GANE algorithm. For performance evaluation, we compared GANE with six protein complex identification methods on five yeast PPI networks. Experimental results show that GANE performs better than the competing algorithms in terms of different evaluation metrics.
Conclusions
GANE provides a framework that integrates many valuable and diverse kinds of biological information into the task of protein complex identification. The protein vector representation learned from our attributed PPI network can also be used in other tasks, such as PPI prediction and disease gene prediction.
6.
Background
Predicting drug-protein interactions from heterogeneous biological data sources is a key step for in silico drug discovery. The difficulty of this prediction task lies in the rarity of known drug-protein interactions and myriad unknown interactions to be predicted. To meet this challenge, a manifold regularization semi-supervised learning method is presented to tackle this issue by using labeled and unlabeled information which often generates better results than using the labeled data alone. Furthermore, our semi-supervised learning method integrates known drug-protein interaction network information as well as chemical structure and genomic sequence data.
Results
Using the proposed method, we predicted certain drug-protein interactions on the enzyme, ion channel, GPCRs, and nuclear receptor data sets. Some of them are confirmed by the latest publicly available drug targets databases such as KEGG.
Conclusions
We report encouraging results of using our method for drug-protein interaction network reconstruction which may shed light on the molecular interaction inference and new uses of marketed drugs.
7.
8.
J. B. de Miranda-Filho L. J. Chaves 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1991,81(2):265-271
Summary: Procedures for selecting among parental varieties to be used in the synthesis of composites are discussed. In addition to the criterion based on the mean and variance of composites of the same size (k) proposed by Cordoso (1976), we suggest the index Ij=w1vj+w2 j or Ij=(2/k) Ij for a preliminary selection among parental varieties. We show that by increasing k (size of the composite) Ij tends to gj, the general combining ability effect. Such a criterion is particularly important when n, the number of parental varieties, is large, so that the number of possible composites (Nc = 2^n - n - 1) becomes too large to be handled when using the common prediction procedures. Yield data from a 9 × 9 variety diallel cross were used for illustration.
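To make the growth of the search space concrete, the short sketch below counts the possible composites, taking the count as Nc = 2^n - n - 1, i.e., every subset of at least two of the n parental varieties. The function name is hypothetical, and choosing n = 9 simply mirrors the 9 × 9 diallel mentioned in the summary.

```python
from math import comb


def count_composites(n):
    """Number of composites that can be formed from at least two of n varieties."""
    return 2 ** n - n - 1


n = 9
# Break the total down by composite size k (k = 2 .. n).
by_size = {k: comb(n, k) for k in range(2, n + 1)}
total = sum(by_size.values())

print(count_composites(n), total)
```

Each additional parental variety roughly doubles the count, which is why a preliminary selection index becomes attractive when n is large.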
9.
Yuyang Jiang;Jing Wang;Yixin Zhang;Zhi Wei Cao;Qinglong Zhang;Jinsong Su;Song He;Xiaochen Bo 《中国科学:生命科学英文版》2025,(2):527-540
The concept of synthetic lethality (SL) has been successfully used for targeted therapies. To further explore SL for cancer therapy, identifying more SL interactions with therapeutic potential is essential. Recently, graph neural network-based deep learning methods have been proposed for SL prediction, which reduce the SL search space of wet-lab based methods. However, these methods ignore that most SL interactions depend strongly on genetic context, which limits the application of the predicted results. In this study, we proposed a graph recurrent network-based model for specific context-dependent SL prediction (SLGRN). In particular, we introduced a Graph Recurrent Network-based encoder to acquire a context-specific, low-dimensional feature representation for each node, facilitating the prediction of novel SL. SLGRN leveraged the gated recurrent unit (GRU) and incorporated a context-dependent-level state to effectively integrate information from all nodes. As a result, SLGRN outperforms the state-of-the-art models for SL prediction. We subsequently validate novel SL interactions under different contexts based on combination therapy or patient survival analysis. Through in vitro experiments and retrospective clinical analysis, we emphasize the potential clinical significance of this context-specific SL prediction model.
10.
MOTIVATION: High-throughput protein interaction detection methods are strongly affected by false positive and false negative results. Focused experiments are needed to complement the large-scale methods by validating previously detected interactions but it is often difficult to decide which proteins to probe as interaction partners. Developing reliable computational methods assisting this decision process is a pressing need in bioinformatics. RESULTS: We show that we can use the conserved properties of the protein network to identify and validate interaction candidates. We apply a number of machine learning algorithms to the protein connectivity information and achieve a surprisingly good overall performance in predicting interacting proteins. Using a 'leave-one-out' approach we find average success rates between 20 and 40% for predicting the correct interaction partner of a protein. We demonstrate that the success of these methods is based on the presence of conserved interaction motifs within the network. AVAILABILITY: A reference implementation and a table with candidate interacting partners for each yeast protein are available at http://www.protsuggest.org.
11.
12.
Background
Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understand the genetic landscape of diseases at the molecular level. With the advent of the 'omic' data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge.
Results
In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was constructed by mapping the phenotype similarity information of the given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross validation and compared them with state-of-the-art algorithms. The results favored the phenotype-specific network as well as the gene gravity-like algorithm. Finally, we tested the predictive capacity of the proposed algorithms on a test gene set derived from the DisGeNET database. Also, potential disease genes of three polygenic diseases, obesity, prostate cancer and lung cancer, were predicted by the proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence.
Conclusions
The good performance of phenotype-specific networks indicates that phenotype similarity information has a positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the Random Walk with Restart (RWR) algorithm, indicating its predictive capacity from combining topological similarity with functional similarity. Our work will give insight into the discovery of disease genes by fusing multiple similarities of genes and diseases.
13.
MicroRNAs (miRNAs) suppress gene expression by forming a duplex with a target messenger RNA (mRNA), blocking translation or initiating cleavage. Computational approaches have proven valuable for predicting which mRNAs can be targeted by a given miRNA, but currently available prediction methods do not address the extent of duplex formation under physiological conditions. Some miRNAs can at low concentrations bind to target mRNAs, whereas others are unlikely to bind within a physiologically relevant concentration range. Here we present a novel approach in which we find potential target sites on mRNA that minimize the calculated free energy of duplex formation, compute the free energy change involved in unfolding these sites, and use these energies to estimate the extent of duplex formation at specified initial concentrations of both species. We compare our predictions to experimentally confirmed miRNA-mRNA interactions (and non-interactions) in Drosophila melanogaster and in human. Although our method does not predict whether the targeted mRNA is degraded and/or its translation to protein inhibited, our quantitative estimates generally track experimentally supported results, indicating that this approach can be used to predict whether an interaction occurs at specified concentrations. Our approach offers a more-quantitative understanding of post-translational regulation in different cell types, tissues, and developmental conditions.
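One way to turn a net free energy into an extent of duplex formation at given initial concentrations is a simple two-state equilibrium, miRNA + mRNA ⇌ duplex. The sketch below is an illustration under that assumption only and is not taken from the paper: the function names, the 37 °C temperature, the 1 M standard state, and the example energies and concentrations are all hypothetical. Here `dg_total` stands for the duplex formation energy plus the cost of unfolding the target site, in kcal/mol.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal / (mol * K)


def duplex_concentration(dg_total, mirna0, mrna0, temp_k=310.15):
    """Equilibrium duplex concentration (M) for miRNA + mRNA <-> duplex.

    dg_total : net free energy change in kcal/mol (negative favours binding)
    mirna0, mrna0 : initial molar concentrations of the two species
    """
    k_eq = math.exp(-dg_total / (GAS_CONSTANT * temp_k))  # association constant
    # Mass action: k_eq = x / ((mirna0 - x) * (mrna0 - x)), a quadratic in x.
    b = k_eq * (mirna0 + mrna0) + 1.0
    # Discriminant b^2 - 4*k_eq^2*mirna0*mrna0, rewritten so it is always positive.
    disc = (k_eq * (mirna0 - mrna0)) ** 2 + 2.0 * k_eq * (mirna0 + mrna0) + 1.0
    # Smaller root, in a form that avoids subtracting nearly equal numbers.
    return 2.0 * k_eq * mirna0 * mrna0 / (b + math.sqrt(disc))


def fraction_bound(dg_total, mirna0, mrna0, temp_k=310.15):
    """Fraction of the limiting species that sits in the duplex at equilibrium."""
    x = duplex_concentration(dg_total, mirna0, mrna0, temp_k)
    return x / min(mirna0, mrna0)


# Hypothetical comparison: two sites at 1 nM of each species.
strong_site = fraction_bound(-12.0, 1e-9, 1e-9)
weak_site = fraction_bound(-8.0, 1e-9, 1e-9)
```

Under these assumptions, the same pair of concentrations gives very different bound fractions for the two energies, which is the sense in which binding can be likely for one miRNA and unlikely for another within a physiological concentration range.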
14.
MOTIVATION: We are motivated by the fast-growing number of protein structures in the Protein Data Bank with necessary information for prediction of protein-protein interaction sites to develop methods for identification of residues participating in protein-protein interactions. We would like to compare conditional random fields (CRFs)-based method with conventional classification-based methods that omit the relation between two labels of neighboring residues to show the advantages of CRFs-based method in predicting protein-protein interaction sites. RESULTS: The prediction of protein-protein interaction sites is solved as a sequential labeling problem by applying CRFs with features including protein sequence profile and residue accessible surface area. The CRFs-based method can achieve a comparable performance with state-of-the-art methods, when 1276 nonredundant hetero-complex protein chains are used as training and test set. Experimental result shows that CRFs-based method is a powerful and robust protein-protein interaction site prediction method and can be used to guide biologists to make specific experiments on proteins. AVAILABILITY: http://www.insun.hit.edu.cn/~mhli/site_CRFs/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
15.
This paper proposes an efficient ensemble system to tackle the protein secondary structure prediction problem with neural networks as base classifiers. The experimental results show that the multi-layer system can lead to better results. When more accurate classifiers are deployed, higher accuracy of the ensemble system can be obtained.
16.
Background
Identifying the associated phenotypes of proteins is a challenge in modern genetics, since a multifactorial trait often results from the contributions of many proteins. Besides high-throughput phenotype assays, computational methods are an alternative way to identify the phenotypes of proteins.
Methodology/Principal Findings
Here, we proposed a new method for predicting protein phenotypes in yeast based on the protein-protein interaction network. Instead of only the most likely phenotype, a series of possible phenotypes for the query protein were generated and ranked according to the tethering potential score. As a result, the first-order prediction accuracy of our method reached 65.4% as evaluated by a jackknife test of 1,267 proteins in budding yeast, much higher than the success rate (15.4%) of a random guess. The likelihood of the first 3 predicted phenotypes including all the real phenotypes of the proteins was 70.6%.
Conclusions/Significance
The candidate phenotypes predicted by our method provided useful clues for further validation. In addition, the method can be easily applied to the prediction of protein-associated phenotypes in other organisms.
17.
Protein-protein interactions (PPIs), involved in many biological processes such as cellular signaling, are ultimately encoded in the genome. Solving the problem of predicting protein interactions from the genome sequence will lead to increased understanding of complex networks, evolution and human disease. We can learn the relationship between genomes and networks by focusing on an easily approachable subset of high-resolution protein interactions that are mediated by peptide recognition modules (PRMs) such as PDZ, WW and SH3 domains. This review focuses on computational prediction and analysis of PRM-mediated networks and discusses sequence- and structure-based interaction predictors, techniques and datasets for identifying physiologically relevant PPIs, and interpreting high-resolution interaction networks in the context of evolution and human disease.
18.
Background
Current knowledge and data on miRNA-lncRNA interactions are still limited, and little effort has been made to predict target lncRNAs of miRNAs. Accumulating evidence suggests that the interaction patterns between lncRNAs and miRNAs are closely related to relative expression level, forming a titration mechanism. It could provide an effective approach for characteristic feature extraction. In addition, using the coding non-coding co-expression network and sequence data could also help to measure the similarities among miRNAs and lncRNAs. By mathematically analyzing these types of similarities, we arrive at two findings: (i) lncRNAs/miRNAs tend to collaboratively interact with miRNAs/lncRNAs of similar expression profiles, and vice versa, and (ii) those miRNAs interacting with a cluster of common target genes tend to jointly target common lncRNAs.
Methods
In this work, we developed a novel group preference Bayesian collaborative filtering model called GBCF for picking up a top-k probability ranking list for an individual miRNA or lncRNA based on the known miRNA-lncRNA interaction network.
Results
To evaluate the effectiveness of GBCF, leave-one-out and k-fold cross validations as well as a series of comparison experiments were carried out. GBCF achieved area under the ROC curve values of 0.9193, 0.8354 ± 0.0079, 0.8615 ± 0.0078, and 0.8928 ± 0.0082 based on leave-one-out, 2-fold, 5-fold, and 10-fold cross validations respectively, demonstrating its reliability and robustness.
Conclusions
GBCF could be used to select potential lncRNA targets of specific miRNAs and offer great insights for further research on the ceRNA regulation network.
19.
Geurts P Fillet M de Seny D Meuwis MA Malaise M Merville MP Wehenkel L 《Bioinformatics (Oxford, England)》2005,21(14):3138-3145
MOTIVATION: Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids like serum, saliva or urine. These measurements can be used in many medical applications in order to diagnose the current state or predict the evolution of a disease. Recent developments in machine learning allow one to exploit such datasets, characterized by small numbers of very high-dimensional samples. RESULTS: We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time of flight measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.
20.