Similar literature
20 similar articles found.
1.
2.
As one large class of non-coding RNAs (ncRNAs), long ncRNAs (lncRNAs) have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only a few are able to predict these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI). LPBNI aims to identify potential lncRNA-interacting proteins by making full use of the known lncRNA–protein interactions. A leave-one-out cross-validation (LOOCV) test shows that LPBNI significantly outperforms other network-based methods, including random walk (RWR) and protein-based collaborative filtering (ProCF). Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA-interacting proteins.
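The abstract does not spell out how the bipartite inference works. As a rough illustration only, the sketch below uses a generic two-step resource-allocation scheme that is often used on bipartite networks; whether LPBNI follows exactly this scheme is an assumption, and the function name and toy data are hypothetical.

```python
def resource_allocation_scores(interactions, lncrna):
    """Score proteins for one lncRNA on a bipartite network.

    interactions: dict mapping each lncRNA to the set of proteins it is
    known to bind. This is a generic two-step resource-allocation sketch,
    assumed for illustration, not a confirmed description of LPBNI.
    """
    # Invert the map: protein -> set of lncRNAs that bind it.
    protein_to_lnc = {}
    for lnc, proteins in interactions.items():
        for p in proteins:
            protein_to_lnc.setdefault(p, set()).add(lnc)

    # Step 1: each protein bound by the query lncRNA holds one unit of
    # resource and splits it equally among all lncRNAs that bind it.
    lnc_resource = {}
    for p in interactions[lncrna]:
        share = 1.0 / len(protein_to_lnc[p])
        for lnc in protein_to_lnc[p]:
            lnc_resource[lnc] = lnc_resource.get(lnc, 0.0) + share

    # Step 2: each lncRNA splits what it received equally among its proteins.
    scores = {}
    for lnc, resource in lnc_resource.items():
        share = resource / len(interactions[lnc])
        for p in interactions[lnc]:
            scores[p] = scores.get(p, 0.0) + share
    return scores


# Toy network (hypothetical): protein "C" is not yet linked to "L1".
toy = {"L1": {"A", "B"}, "L2": {"B", "C"}}
toy_scores = resource_allocation_scores(toy, "L1")
```

Proteins that are not yet linked to the query lncRNA but receive a non-zero score (here "C") would be the candidate interactors. A leave-one-out test would hide one known pair at a time and check how highly the hidden protein is ranked.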

3.
Computational methods for predicting drug-target interactions have become important in drug research because they can help to reduce the time, cost, and failure rates of developing new drugs. Recently, with the accumulation of data sets on drug side effects and pharmacology, it has become possible to predict potential drug-target interactions. In this study, we focus on drug-drug interactions (DDI), their adverse effects and pharmacological information, and investigate the relationship among chemical structures, side effects, and DDIs from several data sources. Data from the STITCH database, from drugs.com, and drug-target pairs from ChEMBL and SIDER were first collected. Then, by applying two machine learning approaches, a support vector machine (SVM) and a kernel-based L1-norm regularized logistic regression (KL1LR), we showed that DDI is a promising feature for predicting drug-target interactions. Next, the accuracies of predicting drug-target interactions using DDI were compared to those obtained using chemical structure and side effects with the SVM and KL1LR approaches, showing that DDI was the data source contributing the most to predicting drug-target interactions.

4.
Kinase-mediated phosphorylation is a key post-translational modification mechanism that plays an important role in regulating various cellular processes and phenotypes. Many diseases, such as cancer, are related to signaling defects associated with protein phosphorylation. Characterizing protein kinases and their substrates enhances our ability to understand the mechanism of protein phosphorylation and extends our knowledge of signaling networks, thereby helping us to treat such diseases. Experimental methods for identifying phosphorylation sites are labour intensive and expensive. In addition, the manifold increase of protein sequences in the databanks over the years necessitates fast and accurate computational methods for predicting phosphorylation sites in protein sequences. To date, a number of computational methods have been proposed for predicting phosphorylation sites, but there remains much scope for improvement. In this communication, we present a simple and novel method based on the Grammatical Inference (GI) approach to automate the prediction of kinase-specific phosphorylation sites. In this regard, we have used a popular GI algorithm, Alergia, to infer Deterministic Stochastic Finite State Automata (DSFA), which equivalently represent the regular grammar corresponding to the phosphorylation sites. Extensive experiments on several datasets generated by us reveal that our inferred grammar successfully predicts phosphorylation sites in a kinase-specific manner. It performs significantly better than the other existing phosphorylation site prediction methods. We have also compared our inferred DSFA with two other GI inference algorithms. The DSFA generated by our method performs better, which indicates that our method is robust and has potential for predicting phosphorylation sites in a kinase-specific manner.

5.

Background

Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information about interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) to protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information.
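To make the AAC feature concrete, here is a minimal sketch of how normalized single and pair counts could be computed from a sequence. It assumes that "pairs" means adjacent residues and that a candidate interaction is encoded by concatenating the two proteins' vectors; neither detail is stated in the abstract, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_single(seq):
    """Return 20 normalized single-residue counts (non-standard letters ignored)."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def aac_pairs(seq):
    """Return 400 normalized counts of adjacent residue pairs (an assumed definition)."""
    cleaned = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    counts = Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))
    total = max(len(cleaned) - 1, 0)
    return [counts[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def pair_features(seq_a, seq_b):
    """Encode a candidate protein pair by concatenating both single-residue vectors."""
    return aac_single(seq_a) + aac_single(seq_b)
```

The resulting fixed-length vectors can be passed to any standard classifier, which fits the statement that the feature works for proteins without domain annotations.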

Results

AAC performed on par with domain-based protein interaction prediction on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of the features and not of the classifiers. In addition to the yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high-confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins.

Conclusion

AAC is a simple yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new interactions and validate existing ones. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction but is not necessarily restricted to domains.

6.
Cox regression is commonly used to predict the outcome by the time to an event of interest and, in addition, to identify relevant features for survival analysis in cancer genomics. Due to the high dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply it to a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into Cox's proportional hazards model to explore the co-expression or functional relation among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over the Cox models regularized by or . This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict the events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases.
In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with early recurrence after 12 months of treatment in ovarian cancer patients who are initially sensitive to chemotherapy. The Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/.
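One common way to integrate a gene network into a Cox model is to add a graph-Laplacian penalty to the partial log-likelihood. The formulation below is a generic sketch of that idea, not the exact Net-Cox objective, which the abstract does not give. All symbols are assumptions introduced here: $x_i$ is the expression vector of patient $i$, $\delta_i$ the event indicator, $R(t_i)$ the risk set at time $t_i$, $A$ the gene network adjacency matrix with degrees $d_g$, and $\lambda$ a tuning parameter.

```latex
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \Big[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\big(x_j^{\top}\beta\big) \Big],
\qquad
\hat{\beta} = \arg\max_{\beta}\; \ell(\beta) - \frac{\lambda}{2}\, \beta^{\top} L\, \beta,

L = I - D^{-1/2} A D^{-1/2},
\qquad
\beta^{\top} L\, \beta
  = \frac{1}{2} \sum_{g,h} A_{gh}
    \left( \frac{\beta_g}{\sqrt{d_g}} - \frac{\beta_h}{\sqrt{d_h}} \right)^{2}.
```

Under this kind of penalty, genes that are connected in the network are pushed toward similar coefficients, which is one plausible reason a network-based model might select more consistent signature genes across datasets.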

7.
Fanconi anemia (FA) is a heterogeneous recessive disorder associated with a markedly elevated risk of developing cancer. To date, sixteen FA genes have been identified, three of which predispose heterozygous mutation carriers to breast cancer. The FA proteins work together in a genome maintenance pathway, the so-called FA/BRCA pathway, which is important during the S phase of the cell cycle. Since not all FA patients can be linked to (one of) the sixteen known complementation groups, new FA genes remain to be identified. In addition, the complex FA network remains to be further unravelled. One of the FA genes, FANCI, has been identified via a combination of bioinformatic techniques exploiting FA protein properties and genetic linkage. The aim of this study was to develop a prioritization approach for proteins of the entire human proteome that potentially interact with the FA/BRCA pathway or are novel candidate FA genes. To this end, we combined the original bioinformatics approach, based on the properties of the first thirteen FA proteins identified, with publicly available tools for protein-protein interactions, literature mining (Nermal) and a protein function prediction tool (FuncNet). Importantly, the three newest FA proteins FANCO/RAD51C, FANCP/SLX4, and XRCC2 displayed scores in the range of the already known FA proteins. Likewise, a prime candidate FA gene based on next generation sequencing and having a very low score was subsequently disproven by functional studies for the FA phenotype. Furthermore, the approach strongly enriches for GO terms such as DNA repair, response to DNA damage stimulus, and cell cycle-regulated genes. Additionally, overlaying the top 150 with a haploinsufficiency probability score renders the approach more tailored for identifying breast cancer related genes. This approach may be useful for prioritization of putative novel FA or breast cancer genes from next generation sequencing efforts.

8.
Gene-gene interactions may play an important role in the genetics of a complex disease. Detection and characterization of gene-gene interactions is a challenging issue that has stimulated the development of various statistical methods to address it. In this study, we introduce a method to measure gene interactions using entropy-based statistics from a contingency table of trait and genotype combinations. We also developed an exploration procedure using graphs. We propose a standardized relative information gain (RIG) measure to evaluate the interactions between single nucleotide polymorphism (SNP) combinations. To identify the kth-order interactions, contingency tables of trait and genotype combinations of k SNPs are constructed, from which RIGs are calculated. The RIGs are standardized using the mean and standard deviation from the permuted datasets. SNP combinations yielding high standardized RIG are chosen as gene-gene interactions. Detection of high-order interactions and comparison of interaction strengths between different orders are made possible by using standardized RIG. We have applied the proposed standardized entropy-based method to two types of data sets, from a simulation study and a real genetic association study. We have compared our method and the multifactor dimensionality reduction (MDR) method through power analysis of eight different genetic models with varying penetrance rates, numbers of SNPs, and sample sizes. Our method shows successful identification of genetic associations and gene-gene interactions in both simulated and real genetic data. Simulation results suggest that the proposed entropy-based method is better able to detect high-order interactions and is superior to the MDR method in most cases. The proposed method is well suited for detecting interactions without main effects as well as for models including main effects.
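The standardization step described above can be sketched as follows. The abstract does not give the exact RIG formula, so this example assumes RIG = (H(trait) − H(trait | genotype combination)) / H(trait) and permutes trait labels to obtain the null mean and standard deviation; the function names and the permutation count are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import mean, stdev


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def relative_information_gain(trait, genotypes):
    """Assumed RIG: information gain about the trait, divided by trait entropy.

    genotypes holds one tuple per subject with the genotypes of the k SNPs,
    so each distinct tuple is one column of the contingency table.
    """
    h_trait = entropy(trait)
    if h_trait == 0:
        return 0.0
    groups = {}
    for t, g in zip(trait, genotypes):
        groups.setdefault(g, []).append(t)
    n = len(trait)
    h_conditional = sum(len(ts) / n * entropy(ts) for ts in groups.values())
    return (h_trait - h_conditional) / h_trait


def standardized_rig(trait, genotypes, n_perm=200, seed=0):
    """Z-score of the observed RIG against RIGs from permuted trait labels."""
    rng = random.Random(seed)
    observed = relative_information_gain(trait, genotypes)
    shuffled = list(trait)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(relative_information_gain(shuffled, genotypes))
    sd = stdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0


# Toy data: the trait is the XOR of two SNPs, so neither SNP has a main effect.
genotypes = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
trait = [a ^ b for a, b in genotypes]
snp1_only = [(g[0],) for g in genotypes]
```

In the toy data the single-SNP RIG is zero while the two-SNP RIG is maximal, which illustrates the claim that the measure can detect interactions without main effects.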

9.
10.
11.
A common biological pathway reconstruction approach, as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and in the functional annotation of metagenomic sequences, starts with the identification of protein functions or families (e.g., KO families for the KEGG database and FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a pathway is reported as present in a dataset if at least one of the steps associated with it is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstruction using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. Compared to the naïve mapping approach, MinPath identified far fewer pathways for the genomes collected in the KEGG database, eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
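The difference between naïve mapping and a parsimony criterion can be illustrated with a small sketch. It treats parsimony as finding the smallest set of pathways that accounts for every observed family and solves it by exhaustive search, which only works for toy inputs; the solver MinPath actually uses is not stated in the abstract, and the mapping and names below are hypothetical.

```python
from itertools import combinations


def naive_pathways(families, family_to_pathways):
    """Report every pathway that contains at least one observed family."""
    found = set()
    for fam in families:
        found.update(family_to_pathways.get(fam, ()))
    return found


def minimal_pathways(families, family_to_pathways):
    """Smallest pathway set in which every mapped family has at least one pathway.

    Brute-force search by increasing set size: exact, but exponential, so
    suitable only for illustration on small inputs.
    """
    mapped = [f for f in families if family_to_pathways.get(f)]
    candidates = sorted(naive_pathways(mapped, family_to_pathways))
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            chosen = set(subset)
            if all(chosen & set(family_to_pathways[f]) for f in mapped):
                return chosen
    return set()


# Hypothetical family-to-pathway map: K2 and K3 are shared between pathways.
mapping = {
    "K1": ["glycolysis"],
    "K2": ["glycolysis", "pentose"],
    "K3": ["glycolysis", "tca"],
    "K4": ["tca"],
}
observed = ["K1", "K2", "K3", "K4"]
naive = naive_pathways(observed, mapping)
minimal = minimal_pathways(observed, mapping)
```

Here the naïve approach reports three pathways, while the parsimony criterion drops "pentose", because its only supporting family is already explained by "glycolysis".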

12.
We describe a method to predict protein-protein interactions (PPIs) formed between structured domains and short peptide motifs. We take an integrative approach based on consensus patterns of known motifs in databases, structures of domain-motif complexes from the PDB and various sources of non-structural evidence. We combine this set of clues using a Bayesian classifier that reports the likelihood of an interaction and obtain significantly improved prediction performance when compared to individual sources of evidence and to previously reported algorithms. Our Bayesian approach was integrated into PrePPI, a structure-based PPI prediction method that, so far, has been limited to interactions formed between two structured domains. Around 80,000 new domain-motif mediated interactions were predicted, thus enhancing PrePPI’s coverage of the human protein interactome.
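One simple way to combine several evidence sources in a Bayesian manner is to multiply their likelihood ratios, which assumes the sources are conditionally independent. The sketch below shows that naive Bayes-style combination; the abstract does not state the exact form of the classifier, so both the independence assumption and the function names are illustrative.

```python
import math


def combined_likelihood_ratio(evidence_lrs):
    """Multiply per-source likelihood ratios (assumes independent evidence)."""
    return math.prod(evidence_lrs)


def posterior_probability(evidence_lrs, prior):
    """Turn a prior interaction probability and evidence LRs into a posterior."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * combined_likelihood_ratio(evidence_lrs)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical likelihood ratios from two evidence sources, with an even prior.
example = posterior_probability([2.0, 3.0], prior=0.5)
```

Each likelihood ratio would be estimated as the frequency of a piece of evidence among known interactions divided by its frequency among non-interactions, so that strong evidence from one source can compensate for weak evidence from another.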

13.
Although deregulation of Hedgehog signalling is considered to play a crucial oncogenic role and commonly occurs in medulloblastoma, genetic lesions in components of this pathway are observed in a minority of cases. The recent identification of a novel putative tumor suppressor, REN(KCTD11), on chromosome 17p13.2, a region most frequently lost in human medulloblastoma, highlights the role of allelic deletion of the gene in this brain malignancy, leading to the loss of growth inhibitory activity via suppression of Gli-dependent activation of Hedgehog target genes. The presence on 17p13 of another tumor suppressor gene (p53), whose inactivation cooperates with the Hedgehog pathway in medulloblastoma formation, suggests that 17p deletion unveils haploinsufficiency conditions leading to abrogation of both direct and indirect checkpoints of Hedgehog signalling in cancer.

14.
As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings or a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which includes LSTMWE and the random forest classifier with a novel encoding of enhanced amino acid content. LEMP not only performs better than the individual classifiers but also outperforms the currently available malonylation predictors. Additionally, it demonstrates promising performance with a low false positive rate, which is highly useful in prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.

15.
Missing information in motion capture data caused by occlusion or detachment of markers is a common problem that is difficult to avoid entirely. The aim of this study was to develop and test an algorithm for reconstruction of corrupted marker trajectories in datasets representing human gait. The reconstruction was facilitated using information on marker inter-correlations obtained from a principal component analysis, combined with a novel weighting procedure. The method was completely data-driven, and did not require any training data. We tested the algorithm on datasets with movement patterns that can be considered both well suited (healthy subject walking on a treadmill) and less suited (transitioning from walking to running and the gait of a subject with cerebral palsy) to reconstruction. Specifically, we created 50 copies of each dataset, and corrupted them with gaps in multiple markers at random temporal and spatial positions. The reconstruction error, quantified by the average Euclidean distance between predicted and measured marker positions, was ≤ 3 mm for the well suited dataset, even when there were gaps in up to 70% of all time frames. For the less suited datasets, median reconstruction errors were in the range 5–6 mm. However, a few reconstructions had substantially larger errors (up to 29 mm). Our results suggest that the proposed algorithm is a viable alternative both to conventional gap-filling algorithms and state-of-the-art reconstruction algorithms developed for motion capture systems. The strengths of the proposed algorithm are that it can fill gaps anywhere in the dataset, and that the gaps can be considerably longer than when using conventional interpolation techniques. Limitations are that it does not enforce musculoskeletal constraints, and that the reconstruction accuracy declines if applied to datasets with less predictable movement patterns.

16.
Synthetic lethal (SL) genetic interactions play a key role in various types of biological research, ranging from understanding genotype-phenotype relationships to identifying drug targets against cancer. Despite recent advances in empirically measuring SL interactions in human cells, the human genetic interaction map is far from complete. Here, we present a novel approach to predict this map by exploiting patterns in cancer genome evolution. First, we show that empirically determined SL interactions are reflected in various gene presence, absence, and duplication patterns in hundreds of cancer genomes. The most evident pattern that we discovered is that when one member of an SL interaction gene pair is lost, the other gene tends not to be lost, i.e. the absence of co-loss. This observation is in line with expectation, because the loss of an SL interacting pair will be lethal to the cancer cell. SL interactions are also reflected in gene expression profiles, such as an under-representation of cases where the genes in an SL pair are both under-expressed, and an over-representation of cases where one gene of an SL pair is under-expressed while the other one is over-expressed. We integrated the various previously unknown cancer genome patterns and the gene expression patterns into a computational model to identify SL pairs. This simple, genome-wide model achieves a high prediction power (AUC = 0.75) for known genetic interactions. It allows us to present for the first time a comprehensive genome-wide list of SL interactions with a high estimated prediction precision, covering up to 591,000 gene pairs. This unique list can potentially be used in various application areas ranging from biotechnology to medical genetics.
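The "absence of co-loss" pattern can be quantified in a simple way by comparing how often two genes are lost in the same genome with how often that would happen if losses were independent. The sketch below is only an illustration of that idea, with a hypothetical function name and toy data; it is not the integrated model described in the abstract.

```python
def co_loss_ratio(lost_a, lost_b, n_samples):
    """Observed co-loss count divided by the count expected under independence.

    lost_a, lost_b: sets of sample identifiers in which each gene is lost.
    A ratio well below 1 means the two genes are rarely lost together.
    """
    observed = len(lost_a & lost_b)
    expected = len(lost_a) * len(lost_b) / n_samples
    return observed / expected if expected else float("nan")


# Toy example: 16 tumour genomes; the two genes are never lost together.
gene_a_lost = {1, 2, 3, 4}
gene_b_lost = {5, 6, 7, 8}
ratio = co_loss_ratio(gene_a_lost, gene_b_lost, n_samples=16)
```

A feature like this, computed for every gene pair alongside the expression-based patterns, is the kind of input a classifier could use to rank candidate SL pairs.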

17.
Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships.

Protein complexes are central to nearly all biochemical processes in the cell (1). In physiologically relevant states, their protein members assemble with varying degrees of stability, over time and under different cellular conditions, to carry out specific cellular functions (1). Although it is a dynamic and tightly regulated process, there is much evidence to support the notion that protein complex assembly results in discrete signaling macromolecules (2). According to the modular organization of molecular networks of the cell (3), protein complexes cooperate in functional networks through dynamic physical interactions with other macromolecules, including other protein complexes (4–6). These physical interactions between pairs of protein complexes may form the backbone of cellular processes (7), such as the recruitment of complexes by other complexes to sites of genome reorganization or in signaling networks. In this study, we attempted to predict these physical interactions between all pairs of known protein complexes, using the manually curated protein complex databases CORUM and CYC2008 for humans and yeast, respectively.

The physical protein interactions that may occur between pairs of complexes have been reported to be more transient, compared with the combination of both permanent and transient interactions that occur within complexes (8). Indeed, the very stability of protein interactions within a protein complex lies between the two extremes of either transient or permanent states (9). Consequently, the experimental identification in a genome-wide manner of the physical interactions between pairs of complexes is very difficult. This challenge has recently been addressed (7, 10) by experiments where the weak interactions were preserved during affinity purifications, followed by inference of the less stable interactions of proteins with the core proteins within the complex. Guided by a computational method to predict the list of protein members in the complexes (10), this allowed a screen of putative inter-complex relationships from human cell lines (7). This adds to the many landmark developments in recent years to characterize protein complexes in a genome-wide manner (7, 11–13). However, in these experiments it is not always easy to infer accurately what constitutes the protein members of a protein complex. Because of various experimental limitations (14) and the dynamic nature of complex assembly in the cell (15), the protein members of the complexes must be predicted from thousands of purification measurements (10–12, 16). As a result, there are surprisingly large differences in the protein complexes inferred in these studies, depending on the algorithm used (17, 18). Hence, the inference of protein complexes from genome-wide screens (11, 12) is likely to contain significant noise from false positives resulting from methodological uncertainty (9). This noise would in turn cause ambiguity when attempting to predict, genome-wide, interactions that may occur between protein complexes. One solution to this problem, as applied in this study, is the use of comprehensive databases of the so-called “gold standard” community definitions of protein complexes (19–22). In these resources, critical reading of the scientific literature by trained experts leads to definitions of the lists of protein members that are experimentally verified to form complexes. Each of these manually curated protein complexes is assigned functional annotations and a unique identifier. It is our assumption that this approach will allow for a more accurate resolution of the physical interactions between protein complexes.

Based on this reasoning, we utilized all protein complex pairs from 1216 human protein complexes in CORUM (21) and 471 in the yeast CYC2008 databases (22, 23), and we attempted to predict physical interactions between them.

To this end, we integrated only binary physical protein interactions that were experimentally verified and supported by Medline references, from the iRefIndex database (24, 25), and we developed a statistical method that compared the number of observed physical protein interactions between pairs of protein complexes versus the number of protein interactions expected to be present in pairs of randomized protein complexes. The highest scoring predicted pairs formed a network that was analyzed to identify communities of physically interacting protein complexes. Such higher order perspectives of cellular proteomes may aid discovery of novel functional relationships and lead to an improved understanding of cellular behavior.

One recent study utilized manually curated protein complex-complex interactions in yeast (23) as part of a machine learning strategy to identify complex-complex interactions. However, its authors added to the training data complex pairs enriched with protein interactions, under the assumption that these were likely to contain complex-complex interactions but without a clear statistical argument to assess their reliability. Our aim has been to provide a more rigorous statistical approach applied to yeast and human, in which the main confounding factors, protein degrees and protein similarities within the complexes, have been taken into account.

We used only the manually curated yeast complex-complex interactions from Ref. 23 as the reference set to evaluate our method, after verifying with the authors that the manual curation had not been guided by enrichment in the protein network. Of these interactions, we predicted half at a 10% false discovery rate. Thus, although improvements in data as well as methods are still required for a more complete prediction of complex-complex interactions, a fair portion of these interactions can be reliably predicted now by using our method.
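The comparison of observed versus expected interaction counts can be sketched as a randomization test. The version below is deliberately simplified: it draws random complexes of the same sizes from the whole protein list and does not control for protein degree or within-complex similarity, which the text names as the main confounders handled by the actual method. All names and the toy network are hypothetical.

```python
import random


def count_between(complex_a, complex_b, interactions):
    """Count interacting protein pairs with one member in each complex."""
    return sum(1 for a in complex_a for b in complex_b
               if a != b and frozenset((a, b)) in interactions)


def empirical_p_value(complex_a, complex_b, interactions, proteins,
                      n_rand=1000, seed=0):
    """Compare the observed count with counts from size-matched random complexes.

    proteins: list of all proteins in the interaction network.
    Returns (observed count, empirical p-value with a +1 correction).
    """
    rng = random.Random(seed)
    observed = count_between(complex_a, complex_b, interactions)
    at_least_as_many = 0
    for _ in range(n_rand):
        rand_a = rng.sample(proteins, len(complex_a))
        rand_b = rng.sample(proteins, len(complex_b))
        if count_between(rand_a, rand_b, interactions) >= observed:
            at_least_as_many += 1
    return observed, (at_least_as_many + 1) / (n_rand + 1)


# Toy network: 40 proteins, two 2-protein complexes fully connected to each other.
proteins = [f"P{i}" for i in range(40)]
interactions = {frozenset(pair) for pair in
                [("P0", "P2"), ("P0", "P3"), ("P1", "P2"), ("P1", "P3")]}
observed, p_value = empirical_p_value(["P0", "P1"], ["P2", "P3"],
                                      interactions, proteins)
```

Applying such a test to every complex pair and then correcting for multiple testing would give a ranked list comparable in spirit to the one evaluated at a 10% false discovery rate in the text.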

18.
ABC‐type drug efflux pumps, e.g., ABCB1 (=P‐glycoprotein, =MDR1), ABCC1 (=MRP1), and ABCG2 (=MXR, =BCRP), confer a multi‐drug resistance (MDR) phenotype to cancer cells. Furthermore, the important contribution of ABC transporters to bioavailability, distribution, elimination, and blood–brain barrier permeation of drug candidates is increasingly recognized. This review presents an overview of the different computational methods and models pursued to predict ABC transporter substrate properties of drug‐like compounds. They encompass ligand‐based approaches ranging from ‘simple rule’‐based efforts to sophisticated machine learning methods. Many of these models show excellent performance for the data sets used. However, due to the complex nature of the applied methods, useful interpretation of the models that can be directly translated into chemical structures by the medicinal chemist is rather difficult. Additionally, very recent and promising attempts in the field of structure‐based modeling of ABC transporters, which embody homology modeling as well as recently published X‐ray structures of murine ABCB1, will be discussed.

19.
Recently, more and more evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare-variant association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and the aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G×G) and gene-environmental (G×E) interactions in rare-variant association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Second, it does not sacrifice marginal detection power; when rare variants only have marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.

20.
