Similar Literature
A total of 20 similar documents were found (search time: 31 ms).
1.

Background

Computational prediction of drug-target interactions (DTIs) is important for identifying potential candidates before expensive wet-lab validation. DTI screening involves four scenarios, depending on whether the drug is an existing or a new drug and whether the target is an existing or a new target. However, existing approaches have limitations. First, only a few of them can address the most difficult scenario, i.e., predicting interactions between new drugs and new targets. More importantly, none of the existing approaches can provide explicit information for understanding how interactions form, such as the drug-target feature pairs that contribute to the interactions.

Results

In this paper, we propose a Triple Matrix Factorization-based model (TMF) to tackle these problems. Assessed on four benchmark datasets across the four screening scenarios, TMF demonstrates significant superiority over former state-of-the-art predictive methods, and it further shows its advantage when validating predicted novel interactions. More importantly, by using PubChem fingerprints of chemical structures as drug features and occurrence frequencies of amino acid trimers as protein features, TMF is able to identify the features that determine interactions, including dominant feature pairs, frequently occurring substructures, and conserved triplets of amino acids.

Conclusions

Our TMF provides a unified framework of DTI prediction for all the screening scenarios. It also offers new insight into the underlying mechanism of DTIs by indicating the dominant features that play important roles in the formation of DTIs.
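
A minimal sketch (not the authors' TMF implementation) of the general idea: drug fingerprints and protein trimer-frequency features are coupled through a learned feature-interaction matrix W, whose largest entries point to the dominant drug-target feature pairs. All data below are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary drug fingerprints and protein 3-mer frequency features.
n_drugs, n_targets, p, q = 50, 40, 128, 64
Fd = rng.integers(0, 2, size=(n_drugs, p)).astype(float)          # drug features
Ft = rng.random((n_targets, q))                                    # protein features
Y = rng.integers(0, 2, size=(n_drugs, n_targets)).astype(float)    # known DTIs

# Learn a feature-interaction matrix W so that sigmoid(Fd @ W @ Ft.T) ~ Y.
W = 0.01 * rng.standard_normal((p, q))
lr, lam = 1e-4, 1e-2
for _ in range(200):
    S = Fd @ W @ Ft.T
    P = 1.0 / (1.0 + np.exp(-np.clip(S, -30, 30)))   # predicted probabilities
    grad = Fd.T @ (P - Y) @ Ft + lam * W             # logistic loss + L2 gradient
    W -= lr * grad

# Large |W[i, j]| highlights drug-feature/protein-feature pairs that drive
# predicted interactions, analogous to the "dominant feature pairs" idea.
idx = np.argsort(-np.abs(W), axis=None)[:5]
rows, cols = np.unravel_index(idx, W.shape)
print("top feature pairs (drug_feature, protein_feature):", list(zip(rows.tolist(), cols.tolist())))
```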

2.

Background

Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers that predict new, undiscovered interactions. However, a key property of these data that has not yet been addressed by such methods, class imbalance, potentially degrades prediction performance. Class imbalance can be divided into two sub-problems. First, the number of known interacting drug-target pairs is much smaller than the number of non-interacting pairs; this imbalance ratio between interacting and non-interacting pairs is referred to as between-class imbalance. Between-class imbalance degrades prediction performance because predictions are biased towards the majority class (the non-interacting pairs), leading to more errors in the minority class (the interacting pairs). Second, the data contain multiple types of drug-target interactions, and some types have relatively fewer members (are less well represented) than others. This variation in representation across interaction types leads to another kind of imbalance, referred to as within-class imbalance, in which predictions are biased towards the better represented interaction types, leading to more errors in the less represented types.

Results

We propose an ensemble learning method that incorporates techniques to address both between-class and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods. In addition, we simulated cases for new drugs and targets, i.e., those for which no prior interactions are known, to see how our method would perform in predicting their interactions. Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully.

Conclusions

Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
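
The abstract does not include code, but a minimal sketch of one common way to tackle between-class imbalance, an ensemble of base learners each trained on all interacting pairs plus a balanced undersample of the non-interacting pairs, could look like this (toy data; not the authors' exact technique, and the within-class handling is omitted):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_ensemble(X, y, n_models=10, seed=0):
    """Train each base learner on all minority (interacting) pairs plus an
    equally sized random sample of the majority (non-interacting) pairs."""
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    models = []
    for _ in range(n_models):
        sub_neg = rng.choice(neg, size=len(pos), replace=False)
        idx = np.concatenate([pos, sub_neg])
        clf = DecisionTreeClassifier(max_depth=5,
                                     random_state=int(rng.integers(1_000_000)))
        models.append(clf.fit(X[idx], y[idx]))
    return models

def predict_proba(models, X):
    # Average the positive-class probabilities over the ensemble.
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)

# Toy imbalanced data: about 5% interacting pairs.
rng = np.random.default_rng(1)
X = rng.random((2000, 20))
y = (rng.random(2000) < 0.05).astype(int)
ens = balanced_ensemble(X, y)
print(predict_proba(ens, X[:5]))
```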

3.
4.

Background

Tandem affinity purification coupled with mass spectrometry (TAP/MS) analysis is a popular method for the large-scale identification of novel endogenous protein-protein interactions (PPIs). Computational analysis of TAP/MS data is a critical step, particularly for high-throughput datasets, yet it remains challenging due to the noisy nature of TAP/MS data.

Results

We investigated several major TAP/MS data analysis methods for identifying PPIs and developed an advanced method, named PPIRank (PPI ranking in TAP/MS data), which incorporates an improved statistical method to filter out false positives using the negative controls. We compared PPIRank with several other existing methods on two pathway-specific TAP/MS PPI datasets from Drosophila.
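
PPIRank itself is not reproduced here; the sketch below only illustrates the general idea of scoring bait-prey pairs against negative-control runs, using a simple Poisson background model that is an assumption rather than the PPIRank statistic:

```python
import numpy as np
from scipy.stats import poisson

def score_preys(bait_counts, control_counts, pseudocount=0.5):
    """Rank preys of one bait by how strongly their spectral counts exceed
    what the negative controls would predict (smaller p-value = better).
    bait_counts: {prey: spectral count in the bait purification}
    control_counts: {prey: list of spectral counts across control runs}"""
    scores = {}
    for prey, count in bait_counts.items():
        ctrl = control_counts.get(prey, [0])
        lam = np.mean(ctrl) + pseudocount          # expected background rate
        scores[prey] = poisson.sf(count - 1, lam)  # P(X >= count) under background
    return sorted(scores.items(), key=lambda kv: kv[1])

bait = {"preyA": 25, "preyB": 3, "preyC": 12}
controls = {"preyA": [1, 0, 2], "preyB": [4, 3, 5], "preyC": [0, 0, 1]}
for prey, p in score_preys(bait, controls):
    print(prey, f"p={p:.2e}")
```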

Conclusion

Experimental results show that PPIRank identifies known interactions collected in the BioGRID PPI database more effectively than the other approaches. Specifically, PPIRank captures more true interactions and, at the same time, fewer false positives in both the Insulin and Hippo pathways of Drosophila melanogaster.

5.

Purpose

The environmental life cycle management (LCM) literature proposes many factors considered critical to conducting LCM successfully. This study contrasts these vague and general factors, proposed as critical to LCM in the existing literature, with detailed accounts of LCM in practice.

Methods

A literature review of three related research fields, i.e., LCM, life cycle thinking, and sustainable supply chain management, is contrasted with a study of how LCM is enacted in practice in a large multinational manufacturing company recognized for its LCM work. A qualitative study, with mainly a managerial focus, is conducted based on interviews, workshops, part-time observations, and document studies.

Results and discussion

The literature review demonstrates that the three related research fields provide different accounts of LCM: all apply a holistic environmental perspective, but with different emphases and using largely different research methods. The empirical study shows that integration was a common topic at the studied company and that solutions were often sought in tools and processes. Middle management support proved important, and challenging, in these integration efforts. Challenges identified also included further integrating LCM into departments such as purchasing and sales.

Conclusions

The constant focus on integration at the studied company implies that LCM work is an ongoing effort. Several integration paths are identified: (1) inclusion of sustainability aspects in tools and processes, (2) finding ways to work around certain organizational levels, and (3) using networks and social interaction to create commitment and integration. Although the concept of LCM implies a holistic approach, LCM in practice reveals a lack of a comprehensive overview of LCM-related initiatives and of involved sustainability practitioners within the studied organization.

6.

Background

Developing novel uses for approved drugs, called drug repositioning, can reduce the costs and time of traditional drug development. Network-based approaches have shown promising results in this field. However, even though various types of interactions, such as activation or inhibition, exist in drug-target interactions and molecular pathways, most previous network-based studies have disregarded this information.

Methods

We developed a novel computational method, Prediction of Drugs having Opposite effects on Disease genes (PDOD), for identifying drugs that have opposite effects on the altered states of disease genes. PDOD utilizes drug-drug target interactions with 'effect type', an integrated directed molecular network with 'effect type' and 'effect direction', and disease genes with their regulated states in patients. With this information, we propose a scoring function to discover drugs likely to restore the altered states of disease genes, using the path from a drug to a disease through drug-drug target interactions, the shortest paths from drug targets to disease genes in molecular pathways, and disease gene-disease associations.
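
A minimal sketch of the core idea, propagating signed effects ('effect type' and 'effect direction') along shortest directed paths and checking whether a drug's net effect opposes a disease gene's altered state; the toy network, gene names, and simplified score below are hypothetical and much simpler than the PDOD scoring function:

```python
import networkx as nx

# Directed molecular network with signed edges: +1 activation, -1 inhibition.
G = nx.DiGraph()
G.add_edge("T1", "G1", sign=+1)   # target T1 activates G1
G.add_edge("G1", "G2", sign=-1)   # G1 inhibits G2
G.add_edge("T2", "G2", sign=+1)

drug_targets = {"drugX": [("T1", +1)]}   # drugX activates T1 (effect type +1)
disease_state = {"G2": +1}               # G2 is up-regulated in the disease

def net_effect(G, source, gene):
    """Propagate sign along the shortest directed path from a drug target to a gene."""
    try:
        path = nx.shortest_path(G, source, gene)
    except nx.NetworkXNoPath:
        return 0
    sign = 1
    for u, v in zip(path, path[1:]):
        sign *= G[u][v]["sign"]
    return sign

def opposition_score(drug):
    score = 0
    for target, drug_effect in drug_targets[drug]:
        for gene, altered in disease_state.items():
            # Positive contribution when the drug's propagated effect on the
            # gene has the opposite sign to the gene's altered state.
            score += -(drug_effect * net_effect(G, target, gene)) * altered
    return score

print("drugX opposition score:", opposition_score("drugX"))
```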

Results

We collected drug-drug target interactions, molecular pathways, and disease genes with their regulated states in the diseases. PDOD was applied to 898 drugs with known drug-drug target interactions and nine diseases. We compared the performance of PDOD in predicting known therapeutic drug-disease associations with that of previous methods, and PDOD outperformed the previous approaches that do not exploit directional information in the molecular network. In addition, we provide a simple web service where researchers can submit genes of interest with their altered states and obtain drugs predicted to have opposite effects on the altered states of the input genes at http://gto.kaist.ac.kr/pdod/index.php/main.

Conclusions

Our results showed that 'effect type' and 'effect direction' information can be utilized in network-based approaches to identify drugs having opposite effects on diseases. Our study offers a novel insight into the field of network-based drug repositioning.

7.

Background

Recently, the metabolite-likeness of the drug space has emerged, opening a new possibility for exploring human metabolite-like candidates in drug discovery. However, the applicability of metabolite-likeness in drug discovery has been largely unexplored. Moreover, there are no reports on its application to repositioning drugs as possible enzyme modulators, although enzyme-drug relations could be directly inferred from the similarity between an enzyme's metabolites and drugs.

Methods

We constructed a drug-metabolite structural similarity matrix, containing 1,861 FDA-approved drugs and 1,110 human intermediary metabolites scored with the Tanimoto similarity. To verify the metabolite-likeness measure for drug repositioning, we analyzed 17 known antimetabolite drugs that resemble the innate metabolites of their eleven target enzymes as gold-standard positives. Highly scored drugs were selected as possible modulators of the enzymes corresponding to their matched metabolites. We then assessed the performance of metabolite-likeness with a receiver operating characteristic (ROC) analysis and compared it with other drug-target prediction methods. We set the similarity threshold for drug repositioning candidates of new enzyme modulators by maximizing Youden's index, and carried out literature surveys to support the repositioning results based on metabolite-likeness.
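
A minimal sketch of the two computational building blocks described here: Tanimoto similarity between structures (computed with RDKit Morgan fingerprints, an assumption, since the abstract does not state which fingerprint implementation was used) and threshold selection by maximizing Youden's index. The molecules and labels below are simple placeholders, not the study's data:

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.metrics import roc_curve

def tanimoto(smiles_a, smiles_b):
    """Tanimoto similarity between two molecules given as SMILES."""
    fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
           for s in (smiles_a, smiles_b)]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])

# Placeholder structures (aspirin vs. salicylic acid), not drug-metabolite pairs
# from the study.
print(tanimoto("CC(=O)Oc1ccccc1C(=O)O", "OC(=O)c1ccccc1O"))

# Threshold selection by maximizing Youden's J = sensitivity + specificity - 1,
# given gold-standard labels and drug-metabolite similarity scores.
labels = np.array([1, 1, 1, 0, 0, 0, 0, 1])
scores = np.array([0.9, 0.7, 0.8, 0.2, 0.4, 0.3, 0.6, 0.65])
fpr, tpr, thr = roc_curve(labels, scores)
best = np.argmax(tpr - fpr)
print("optimal threshold:", thr[best])
```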

Results

In this paper, we applied metabolite-likeness to repurpose FDA-approved drugs as modulators of disease-associated enzymes whose metabolites the drugs resemble. All antimetabolite drugs were mapped to their 11 known target enzymes with statistically significant similarity values to the corresponding metabolites. The comparison with other drug-target prediction methods showed the higher performance of metabolite-likeness for predicting enzyme modulators. Drugs with a similarity score higher than 0.654 were then selected as possible modulators of the enzymes corresponding to their matched metabolites. In addition, we showed that the drug repositioning results for 10 enzymes were concordant with literature evidence.

Conclusions

This study introduced a method to predict the repositioning of known drugs as possible modulators of disease-associated enzymes using human metabolite-likeness. We demonstrated that this approach works correctly for known antimetabolite drugs and that the proposed method outperforms other drug-target prediction methods in predicting enzyme modulators. As a proof of concept, this study showed how to apply metabolite-likeness to drug repositioning, with potential for further expansion as more disease-associated metabolite-target protein relations are acquired.

8.

Introduction

A common problem in metabolomics data analysis is the existence of a substantial number of missing values, which can complicate, bias, or even prevent certain downstream analyses. One of the most widely-used solutions to this problem is imputation of missing values using a k-nearest neighbors (kNN) algorithm to estimate missing metabolite abundances. kNN implicitly assumes that missing values are uniformly distributed at random in the dataset, but this is typically not true in metabolomics, where many values are missing because they are below the limit of detection of the analytical instrumentation.

Objectives

Here, we explore the impact of nonuniformly distributed missing values (missing not at random, or MNAR) on imputation performance. We present a new model for generating synthetic missing data and a new algorithm, No-Skip kNN (NS-kNN), that accounts for MNAR values to provide more accurate imputations.

Methods

We compare the imputation errors of the original kNN algorithm using two distance metrics, NS-kNN, and a recently developed algorithm KNN-TN, when applied to multiple experimental datasets with different types and levels of missing data.
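
NS-kNN itself is not reproduced here; the sketch below only contrasts plain kNN imputation with a crude MNAR-aware fallback (half the observed minimum for mostly missing metabolites, a stand-in for limit-of-detection censoring, and an assumption rather than the NS-kNN algorithm):

```python
import numpy as np
from sklearn.impute import KNNImputer

def impute_mnar_aware(X, mnar_frac=0.5, k=2):
    """Plain kNN imputation, then overwrite imputed values for metabolites whose
    missingness is likely MNAR (mostly missing columns) with half the observed
    column minimum, a crude proxy for limit-of-detection censoring."""
    X = np.asarray(X, dtype=float)
    filled = KNNImputer(n_neighbors=k).fit_transform(X)
    for j in range(X.shape[1]):
        col = X[:, j]
        miss = np.isnan(col)
        if miss.any() and (~miss).any() and miss.mean() > mnar_frac:
            filled[miss, j] = 0.5 * np.nanmin(col)
    return filled

# Toy metabolite matrix (rows = samples, columns = metabolites); NaN = missing.
X = np.array([[1.0, 5.0, np.nan],
              [1.2, np.nan, np.nan],
              [0.9, 5.5, 0.2],
              [1.1, 5.2, np.nan],
              [1.0, 5.1, 0.3]])
print(impute_mnar_aware(X))
```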

Results

Our results show that NS-kNN typically outperforms kNN when at least 20–30% of missing values in a dataset are MNAR. NS-kNN also has lower imputation errors than KNN-TN on realistic datasets when at least 50% of missing values are MNAR.

Conclusion

Accounting for the nonuniform distribution of missing values in metabolomics data can significantly improve the results of imputation algorithms. The NS-kNN method imputes missing metabolomics data more accurately than existing kNN-based approaches when used on realistic datasets.

9.

Background

Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by identifying the most probable candidate genes linked to genetic markers of the disease or phenotype under investigation. Integration of drug-target data with candidate gene prediction systems can identify novel phenotypes which may benefit from current therapeutics. Such a drug repositioning tool can save valuable time and money spent on preclinical studies and phase I clinical trials.

Methods

We previously used Gentrepid (http://www.gentrepid.org) as a platform to predict 1,497 candidate genes for the seven complex diseases considered in the Wellcome Trust Case-Control Consortium genome-wide association study; namely Type 2 Diabetes, Bipolar Disorder, Crohn's Disease, Hypertension, Type 1 Diabetes, Coronary Artery Disease and Rheumatoid Arthritis. Here, we adopted a simple approach to integrate drug data from three publicly available drug databases: the Therapeutic Target Database, the Pharmacogenomics Knowledgebase and DrugBank; with candidate gene predictions from Gentrepid at the systems level.
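
A minimal sketch of the integration step, joining candidate gene predictions with drug-target associations drawn from the drug databases; the tables, column names, and example rows below are hypothetical stand-ins for the actual database exports:

```python
import pandas as pd

# Hypothetical extracts: Gentrepid candidate genes per phenotype and a merged
# drug-target table built from TTD, PharmGKB and DrugBank exports.
candidates = pd.DataFrame({
    "phenotype": ["T2D", "T2D", "RA"],
    "gene":      ["ABCC8", "KCNJ11", "PTPN22"],
})
drug_targets = pd.DataFrame({
    "drug":   ["glibenclamide", "glibenclamide", "repaglinide"],
    "gene":   ["ABCC8", "KCNJ11", "KCNJ11"],
    "source": ["DrugBank", "DrugBank", "TTD"],
})

# Candidate genes that are already drugged become repositioning leads.
leads = candidates.merge(drug_targets, on="gene", how="inner")
print(leads)
print("druggable candidate genes per phenotype:")
print(leads.groupby("phenotype")["gene"].nunique())
```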

Results

Using the publicly available drug databases as sources of drug-target association data, we identified a total of 428 candidate genes as novel therapeutic targets for the seven phenotypes of interest, and 2,130 drugs feasible for repositioning against the predicted novel targets.

Conclusions

By integrating genetic, bioinformatic and drug data, we have demonstrated that currently available drugs may be repositioned as novel therapeutics for the seven diseases studied here, quickly taking advantage of prior work in pharmaceutics to translate ground-breaking results in genetics to clinical treatments.

10.

Background

Computational approaches in the identification of drug targets are expected to reduce time and effort in drug development. Advances in genomics and proteomics provide the opportunity to uncover properties of druggable genomes. Although several studies have been conducted for distinguishing drug targets from non-drug targets, they mainly focus on the sequences and functional roles of proteins. Many other properties of proteins have not been fully investigated.

Methods

Using the DrugBank (version 3.0) database, which contains 6,816 drug entries including 760 FDA-approved drugs and 1,822 of their targets, together with the human UniProt/Swiss-Prot database, we defined 1,578 non-redundant drug target proteins and 17,575 non-drug target proteins. To select these non-redundant protein datasets, we built four datasets (A, B, C, and D) by considering clustering of paralogous proteins.

Results

We first reassessed the widely used properties of drug target proteins. We confirmed and extended previous findings that drug target proteins (1) tend to have more hydrophobic, less polar, fewer PEST sequences, and more signal peptide sequences, and (2) are more involved in enzyme catalysis, oxidation and reduction in cellular respiration, and operational genes. In this study, we proposed new properties (essentiality, expression pattern, post-translational modifications (PTMs), and solvent accessibility) for effectively identifying drug target proteins. We found that (1) drug targetability and protein essentiality are decoupled, (2) druggable proteins tend to have high expression levels and tissue specificity, and (3) functional post-translational modification residues are enriched in drug target proteins. In addition, to predict the drug targetability of proteins, we exploited two machine learning methods (Support Vector Machine and Random Forest). When we predicted drug targets by combining previously known protein properties with the proposed new properties, an F-score of 0.8307 was obtained.
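
A minimal sketch of the classification setup described above (a Random Forest scored by F-measure); the protein property features below are synthetic stand-ins, not the datasets A-D:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(42)

# Synthetic stand-ins for protein properties (hydrophobicity, expression level,
# tissue specificity, PTM counts, solvent accessibility, ...).
n_pos, n_neg, n_feat = 300, 3000, 12
X = np.vstack([rng.normal(0.5, 1.0, (n_pos, n_feat)),     # drug targets
               rng.normal(0.0, 1.0, (n_neg, n_feat))])    # non-targets
y = np.array([1] * n_pos + [0] * n_neg)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print("F-score:", round(f1_score(y_te, clf.predict(X_te)), 4))
```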

Conclusions

When the newly proposed properties are integrated, the prediction performance improves, indicating that these properties are indeed related to drug targets. We believe that our study will provide a new perspective on inferring drug-target interactions.

11.

Introduction

Lung cancer is the leading cause of cancer-related mortality, largely because it is usually detected at an advanced stage: the available diagnostic tests are expensive and invasive and therefore cannot be used for general screening.

Objectives

To increase the robustness of the biomarker panels previously proposed by the authors, based on metabolites in sweat samples, new samples were collected at different intervals (4 months and 2 years) and analyzed at different times (2012 and 2014, respectively) by different analysts, with the aim of discriminating between lung cancer (LC) patients and at-risk smokers.

Methods

Sweat analysis was carried out by LC–MS/MS with minimal sample preparation, and the generated analytical data were then integrated to minimize variability in the statistical analysis.

Results

Panels capable of discriminating LC patients from at-risk smokers were obtained, taking into account the variability between the two cohorts arising from the different sample collection intervals, the times at which the analyses were carried out, and the influence of the analyst. Two panels of metabolites built with the PanelomiX tool reduce false positives (at 95% specificity) and false negatives (at 95% sensitivity). The first panel (96.9% specificity and 83.8% sensitivity) is composed of the monoglyceride MG(22:2), muconic, suberic and urocanic acids, and a tetrahexose; the second panel (81.2% specificity and 97.3% sensitivity) is composed of the monoglyceride MG(22:2), muconic, nonanedioic and urocanic acids, and a tetrahexose.

Conclusion

The study yielded a prediction model that is more robust than that obtained in the authors' previous study.

12.

Background

Clinical information alone is not enough to predict the progression of disease. Instead, gene expression profiles have been widely used to forecast clinical outcomes. Many genes related to survival have been identified, and recently miRNA expression signatures predicting patient survival have also been investigated for several cancers. However, miRNAs and their target genes associated with clinical outcomes have remained largely unexplored.

Methods

Here, we demonstrate a survival analysis based on the regulatory relationships between miRNAs and their target genes. Patient survival for two major cancers, ovarian cancer and glioblastoma multiforme (GBM), is investigated through an integrated analysis of miRNA-mRNA interaction pairs.
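
A minimal sketch of one way such an analysis can be set up, splitting patients by an inversely correlated miRNA-mRNA pair and comparing survival with a log-rank test (using the lifelines package, an assumption; all data below are simulated):

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
n = 120

# Toy data: a miRNA and its target mRNA; higher miRNA should suppress the mRNA.
mirna = rng.normal(size=n)
mrna = -0.7 * mirna + rng.normal(scale=0.7, size=n)

# Group A: patients with high miRNA and low mRNA (the inversely correlated pattern).
group_a = (mirna > np.median(mirna)) & (mrna < np.median(mrna))

# Simulated survival times (months) and event indicators; group A made worse.
time = rng.exponential(30, size=n) * np.where(group_a, 0.6, 1.0)
event = rng.random(n) < 0.7

res = logrank_test(time[group_a], time[~group_a],
                   event_observed_A=event[group_a],
                   event_observed_B=event[~group_a])
print("log-rank p-value:", res.p_value)
```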

Results

We found a larger survival difference between two patient groups when the miRNA and mRNA in an interaction pair showed inversely correlated expression profiles. This supports the idea that signatures of miRNAs and their targets related to cancer progression can be detected via this approach.

Conclusions

This integrated analysis can help to discover coordinated expression signatures of miRNAs and their target mRNAs that can be employed for therapeutics in human cancers.

13.

Background

Malaria is a major public health burden in Southeastern Bangladesh, particularly in the Chittagong Hill Tracts region. Malaria is endemic in 13 districts of Bangladesh and the highest prevalence occurs in Khagrachari (15.47%).

Methods

A risk map was developed and geographic risk factors identified using a Bayesian approach. The Bayesian geostatistical model was developed from previously identified individual and environmental covariates (p < 0.2; age, different forest types, elevation and economic status) for malaria prevalence using WinBUGS 1.4. Spatial correlation was estimated within a Bayesian framework based on a geostatistical model. The infection status (positives and negatives) was modeled using a Bernoulli distribution. Maps of the posterior distributions of predicted prevalence were developed in geographic information system (GIS).
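
The authors fitted their geostatistical model in WinBUGS 1.4; as a rough illustration only, a non-spatial Bernoulli-logit version of such a model can be written in PyMC (v4+) as below. The covariates are simulated placeholders, and the spatially correlated random effect that makes the model geostatistical is deliberately omitted:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n = 200

# Placeholder covariates: age, elevation, forest type, economic status.
X = rng.normal(size=(n, 4))
true_beta = np.array([0.4, -0.6, 0.8, -0.3])
y = rng.random(n) < 1 / (1 + np.exp(-(X @ true_beta - 1.0)))  # simulated infections

with pm.Model():
    alpha = pm.Normal("alpha", 0, 2)
    beta = pm.Normal("beta", 0, 1, shape=4)
    p = pm.math.invlogit(alpha + pm.math.dot(X, beta))
    pm.Bernoulli("infected", p=p, observed=y.astype(int))
    # A geostatistical version would add a spatially correlated random effect
    # (e.g. a Gaussian process over household coordinates) inside this model.
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["beta"].mean(dim=("chain", "draw")).values)
```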

Results

Predicted high-prevalence areas were located along the north-eastern areas and the central part of the study area. Low to moderate prevalence areas were predicted in the southwestern, southeastern and central regions. Individual age and proximity to fragmented forest were associated with malaria prevalence after adjusting for spatial autocorrelation.

Conclusion

A Bayesian analytical approach using multiple enabling technologies (geographic information systems, global positioning systems, and remote sensing) provides a strategy to characterize spatial heterogeneity in malaria risk at a fine scale. Even in the most hyper-endemic region of Bangladesh there is substantial spatial heterogeneity in risk. Areas predicted to be at high risk based on the environment, but not yet reached by surveys, are identified.

14.

Purpose

Practitioners of life cycle assessment (LCA) acknowledge that more input from social scientists can help advance the cause of life cycle management (LCM). This commentary offers a social science perspective on a long-running question within LCA, namely, how the field should manage not only stakeholders’ values but also those of practitioners themselves.

Methods

More than 60 interviews were conducted with LCA practitioners and their industry clients. Qualitative data were also collected through participant observation at several LCA and LCM conferences, a study of the field’s history, and extensive content and discourse analysis of LCA publications and online forums.

Results and discussion

Results show that LCA practitioners’ values are informed partly by the knowledge acquired through their LCA work. At the same time, LCA standards and professional norms implicitly advise practitioners to keep those values out of their work as much as possible, so as not to compromise its apparent objectivity. By contrast, many social scientists contend openly that value-based judgments, based on “situated knowledge,” can actually enhance the rigor, accountability, and credibility of scientific assessments.

Conclusions

LCA practitioners’ own situated knowledge justifies not only the value choices required by LCA but also their evaluative judgments of contemporary life cycle-based sustainability initiatives. This more critical voice could advance the goals of LCM while also boosting the credibility of LCA more generally.

15.

Background

Protein synthetic lethal genetic interactions are useful to define functional relationships between proteins and pathways. However, the molecular mechanism of synthetic lethal genetic interactions remains unclear.

Results

In this study, we used clusters of short polypeptide sequences, which are typically shorter than classically defined protein domains, to characterize the functionalities of proteins. We developed a framework to identify significant short polypeptide clusters from yeast protein sequences, and then used these clusters as features to predict yeast synthetic lethal genetic interactions. The short-polypeptide-cluster-based approach provides much higher coverage for predicting yeast synthetic lethal genetic interactions. Evaluation on experimental data sets showed that this approach is superior to the previous protein-domain-based one.
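
The cluster-identification framework itself is not shown here; as a rough illustration of the feature idea only, short polypeptide (k-mer) counts can be extracted from protein sequences and combined per gene pair for a classifier. The sequences, gene names, and labels below are toy placeholders:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy protein sequences (placeholders) and synthetic-lethal labels for gene pairs.
seqs = {"YFG1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        "YFG2": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
        "YFG3": "MKTLLLTLVVVTIVCLDLGYTLKCNKLVPLFYKTC",
        "YFG4": "MALWMRLLPLLALLALWGPDPAAA"}
pairs = [("YFG1", "YFG2", 1), ("YFG1", "YFG3", 0),
         ("YFG2", "YFG4", 1), ("YFG3", "YFG4", 0)]

# Short polypeptide (3-mer) counts as protein features.
vec = CountVectorizer(analyzer="char", ngram_range=(3, 3), lowercase=False)
M = vec.fit_transform(seqs.values()).toarray()
feat = dict(zip(seqs, M))

# A gene pair is represented by the element-wise sum of its two protein vectors.
X = np.array([feat[a] + feat[b] for a, b, _ in pairs])
y = np.array([label for _, _, label in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X))
```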

Conclusion

We achieved higher performance in predicting yeast synthetic lethal genetic interactions using short polypeptide clusters as features. Our study suggests that short polypeptide clusters may help us better understand the functionalities of proteins.

16.
17.

Background

Currently, a huge amount of protein-protein interaction data is available from high-throughput experimental methods. In a large network of protein-protein interactions, groups of proteins can be identified as functional clusters with related functions, where a single protein can occur in multiple clusters. However, experimental methods are error-prone, so the interactions in a functional cluster may include false positives, and there may be unreported interactions. Correctly identifying a functional cluster of proteins therefore requires knowing whether any two proteins in a cluster interact, whether an interaction can exclude other interactions, and how strong the affinity between two interacting proteins is.

Methods

In the present work the yeast protein-protein interaction network is clustered using a spectral clustering method we proposed in 2006, and the individual clusters are investigated for functional relationships among the member proteins. 3D structural models of the proteins in one cluster were built; the protein structures were retrieved from the Protein Data Bank or predicted using a comparative modeling approach. A rigid-body protein docking method (ClusPro) was used to predict the protein-protein interaction complexes. Binding sites of the docked complexes are characterized by their buried surface areas, as a measure of the strength of an interaction.
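
A minimal sketch of the first step only (spectral clustering of a PPI adjacency matrix), using scikit-learn's implementation rather than the authors' 2006 method; the toy network below is a placeholder:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy PPI adjacency matrix: two loosely connected groups of proteins.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
for cluster in set(labels):
    print(f"cluster {cluster}:", [p for p, l in zip(proteins, labels) if l == cluster])
```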

Results

The clustering method yields functionally coherent clusters. Some of the interactions in a cluster exclude other interactions because of shared binding sites. New interactions among the interacting proteins are uncovered, and thus higher order protein complexes in the cluster are proposed. Also the relative stability of each of the protein complexes in the cluster is reported.

Conclusions

Although the methods used are computationally expensive and require human intervention and judgment, they can identify interactions that could occur together and those that are mutually exclusive. In addition, indirect interactions through an intermediate protein can be identified. These theoretical predictions might be useful for crystallographers selecting targets for the X-ray crystallographic determination of protein complexes.

18.

Background

Identification of common genes associated with comorbid diseases can be critical to understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem that minimizes the network-based module separation between the two subgraphs produced by mapping the genes associated with each disease onto the interactome.
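
As a hedged illustration of the quantity being minimized, a Menche-style network-based separation between two gene sets can be computed with networkx as below; the toy interactome and gene sets are placeholders, and the exact separation measure used by the authors may differ:

```python
import networkx as nx

def mean_nearest_dist(G, A, B, exclude_self=False):
    """Average, over genes in A, of the shortest-path distance to the nearest gene in B."""
    dists = []
    for a in A:
        cands = [b for b in B if not (exclude_self and b == a)]
        d = [nx.shortest_path_length(G, a, b) for b in cands if nx.has_path(G, a, b)]
        if d:
            dists.append(min(d))
    return sum(dists) / len(dists) if dists else float("inf")

def separation(G, A, B):
    """s_AB = <d_AB> - (<d_AA> + <d_BB>) / 2; lower values mean more overlapping modules."""
    d_ab = (mean_nearest_dist(G, A, B) + mean_nearest_dist(G, B, A)) / 2
    d_aa = mean_nearest_dist(G, A, A, exclude_self=True)
    d_bb = mean_nearest_dist(G, B, B, exclude_self=True)
    return d_ab - (d_aa + d_bb) / 2

# Toy interactome and two disease gene sets.
G = nx.erdos_renyi_graph(60, 0.08, seed=1)
A, B = set(range(0, 10)), set(range(5, 15))
print("s_AB =", round(separation(G, A, B), 3))
```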

Results

Using cross-validation on more than 600 disease pairs, our method achieves a significantly higher average receiver operating characteristic (ROC) score of 0.95, compared to a baseline ROC score of 0.60 obtained with randomized data.

Conclusion

Predicting missing common genes aims to complete the gene set associated with a comorbid disease pair for a better understanding of biological interventions. It will also be useful for gene-targeted therapeutics related to comorbid diseases. The method can further be considered for predicting missing edges to complete the subgraph associated with a disease pair.

19.

Introduction

Data processing is one of the biggest problems in metabolomics, given the high number of samples analyzed and the need for multiple software packages for each step of the processing workflow.

Objectives

To merge the steps required for metabolomics data processing into a single platform.

Methods

KniMet is a workflow for the processing of mass spectrometry-metabolomics data based on the KNIME Analytics platform.

Results

The approach includes key steps to follow in metabolomics data processing: feature filtering, missing value imputation, normalization, batch correction and annotation.
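
KniMet is a KNIME workflow, so it is not reproduced here; a rough pandas sketch of the same step order (feature filtering, missing value imputation, normalization) on a toy feature table looks like this, with batch correction and annotation left as a comment:

```python
import numpy as np
import pandas as pd

def process(df, max_missing=0.4):
    """Toy metabolomics pipeline: rows = samples, columns = features (peak areas)."""
    # 1. Feature filtering: drop features missing in too many samples.
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 2. Missing value imputation: half of the feature minimum (detection-limit proxy).
    df = df.apply(lambda c: c.fillna(c.min() / 2))
    # 3. Normalization: total-signal scaling per sample.
    df = df.div(df.sum(axis=1), axis=0)
    # (Batch correction and annotation would follow here in a full workflow.)
    return df

data = pd.DataFrame({"m1": [100, 120, np.nan, 95],
                     "m2": [np.nan, np.nan, np.nan, 5],
                     "m3": [40, 55, 60, 48]},
                    index=["s1", "s2", "s3", "s4"])
print(process(data))
```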

Conclusion

KniMet provides the user with a local, modular and customizable workflow for the processing of both GC–MS and LC–MS open profiling data.

20.

Introduction

The generic metabolomics data processing workflow is constructed from a serial set of processes including peak picking, quality assurance, normalisation, missing value imputation, transformation and scaling. The combination of these processes should present the experimental data in an appropriate structure so as to identify the biological changes in a valid and robust manner.

Objectives

Currently, different researchers apply different data processing methods and no assessment of the permutations applied to UHPLC-MS datasets has been published. Here we wish to define the most appropriate data processing workflow.

Methods

We assess the influence of normalisation, missing value imputation, transformation and scaling methods on univariate and multivariate analysis of UHPLC-MS datasets acquired for different mammalian samples.

Results

Our studies have shown that once data are filtered, missing values are not correlated with m/z, retention time or response. Following an exhaustive evaluation, we recommend PQN normalisation with no missing value imputation and no transformation or scaling for univariate analysis. For PCA we recommend applying PQN normalisation with Random Forest missing value imputation, glog transformation and no scaling method. For PLS-DA we recommend PQN normalisation, KNN as the missing value imputation method, generalised logarithm transformation and no scaling. These recommendations are based on searching for the biologically important metabolite features independent of their measured abundance.
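
As a hedged illustration of two of the recommended steps, minimal implementations of PQN normalisation and one common parameterisation of the generalised logarithm (glog) transform are sketched below on simulated data:

```python
import numpy as np

def pqn_normalise(X):
    """Probabilistic quotient normalisation. X: samples x features, no missing values."""
    ref = np.median(X, axis=0)                      # reference "spectrum"
    quotients = X / ref                             # feature-wise quotients per sample
    dilution = np.median(quotients, axis=1)         # one dilution factor per sample
    return X / dilution[:, None]

def glog(X, lam=1e-8):
    """Generalised logarithm: behaves like log for large values, stays finite near zero."""
    return np.log((X + np.sqrt(X ** 2 + lam)) / 2)

rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=50.0, size=(6, 100))
X[1] *= 3.0                                         # simulate a more concentrated sample
X_norm = pqn_normalise(X)
print("dilution-corrected sample means:", X_norm.mean(axis=1).round(1))
print("glog-transformed range:", glog(X_norm).min().round(2), glog(X_norm).max().round(2))
```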

Conclusion

The appropriate choice of normalisation, missing value imputation, transformation and scaling methods differs depending on the data analysis method and the choice of method is essential to maximise the biological derivations from UHPLC-MS datasets.
