Similar articles
 20 similar articles found (search time: 216 ms)
1.
Structure-based drug design uses three-dimensional geometric information of macromolecules, such as proteins or nucleic acids, to identify suitable ligands. Geometric deep learning, an emerging concept of neural-network-based machine learning, has been applied to macromolecular structures. This review provides an overview of the recent applications of geometric deep learning in bioorganic and medicinal chemistry, highlighting its potential for structure-based drug discovery and design. Emphasis is placed on molecular property prediction, ligand binding site and pose prediction, and structure-based de novo molecular design. The current challenges and opportunities are highlighted, and a forecast of the future of geometric deep learning for drug discovery is presented.

2.
3.
4.
Deep learning outperforms traditional machine learning techniques on many tasks. In the last several years, deep learning has been applied to protein function prediction with a series of good results, and these findings have substantially advanced our understanding of protein function. However, the accuracy of deep-learning-based protein function prediction still needs to be improved. In article number 1900019, Issue 12, Zhang et al. construct DeepFunc, a deep learning framework that uses derived feature information from protein sequences and protein interaction networks. They find that DeepFunc predicts protein function more accurately than DeepGO, a similar method reported previously. They also find that combining multiple kinds of derived feature information in DeepFunc performs much better than using a single kind. Because deep learning fully exploits feature representation learning, deep learning with more derived feature information is a promising method for solving more complicated protein function prediction problems and other bioinformatics challenges. Recent research has provided major insights into the value of applying deep learning to the protein function prediction problem.

5.
Accurate retention time (RT) prediction is important for spectral library-based analysis in data-independent acquisition mass spectrometry-based proteomics. The deep learning approach has demonstrated superior performance over traditional machine learning methods for this purpose. The transformer architecture is a recent development in deep learning that delivers state-of-the-art performance in many fields such as natural language processing, computer vision, and biology. We assess the performance of the transformer architecture for RT prediction using datasets from five deep learning models: Prosit, DeepDIA, AutoRT, DeepPhospho, and AlphaPeptDeep. The experimental results on holdout datasets and independent datasets show state-of-the-art performance for the transformer architecture. The software and evaluation datasets are publicly available for future development in the field.

6.
Errata     
Abstract

Mass spectrometry (MS)-based proteomics is an unrivaled tool for studying complex biological systems and diseases in the post-genomic era. In recent years, MS has emerged as a powerful structural biological tool to characterize protein conformation and conformational dynamics. The advantages of MS in structural studies are most evident for membrane proteins such as GPCRs (G protein-coupled receptors), where other well-established structural methods such as X-ray crystallography and NMR remain challenging. For proteins with available high-resolution structures, MS-based structural strategies can provide valuable, previously inaccessible information on protein conformational changes and dynamics, protein motion/flexibility, ligand–protein binding, and protein–protein interfaces. In the past several years, we have developed and adapted a number of MS-based structural approaches, such as CDSiL-MS (Conformational changes and Dynamics using Stable-isotope Labeling and MS), CXMS (Crosslinking/MS) and HDXMS (Hydrogen-Deuterium Exchange MS), to study protein structures and conformational dynamics in human β2-adrenergic receptor (β2AR) signaling. In this mini-review, we highlight several examples demonstrating the power of MS in structural analysis to better elucidate the structural basis of GPCR signaling, particularly through the β-arrestin-mediated GPCR signaling pathway.

7.
Prediction of protein–protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.

8.
Deep learning has revolutionized research in image processing, speech recognition, natural language processing, and game playing, and will soon revolutionize research in proteomics and genomics. Through three examples in genomics, protein structure prediction, and proteomics, we demonstrate that deep learning is changing bioinformatics research, shifting it from algorithm‐centric to data‐centric approaches.

9.
We present a novel partner‐specific protein–protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state‐of‐the‐art accuracy in predicting binding sites at the protein level as well as inter‐protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding‐associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/ . Proteins 2014; 82:1142–1155. © 2013 Wiley Periodicals, Inc.
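
The pairwise-kernel idea mentioned in the PAIRpred abstract can be illustrated with a small sketch. The abstract does not say which kernels PAIRpred actually uses, so the code below shows only one common symmetric construction (a tensor-product pairwise kernel built on a Gaussian base kernel). The function names, the `gamma` value, and the toy feature vectors are assumptions made for illustration and are not taken from the PAIRpred code.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) base kernel between two residue feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def pairwise_kernel(pair1, pair2, base=rbf):
    """Symmetric tensor-product kernel between two residue pairs.

    pair1 = (a, b) and pair2 = (c, d), where each element is the feature
    vector of one residue. Summing both matchings makes the value
    independent of the order of the residues inside a pair.
    """
    a, b = pair1
    c, d = pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)

# Hypothetical two-dimensional residue features (toy values, not real data).
res_a, res_b = [0.0, 0.0], [1.0, 0.0]
res_c, res_d = [0.5, 0.5], [2.0, 1.0]
similarity = pairwise_kernel((res_a, res_b), (res_c, res_d))
```

A matrix of such values over all training pairs could then be passed to any support vector machine implementation that accepts a precomputed kernel.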

10.
Current understanding of the underlying molecular network and mechanism of attention-deficit hyperactivity disorder (ADHD) is incomplete. Previous studies suggest that genomic structural variations play an important role in the pathogenesis of ADHD. For effective modeling, deep learning approaches have become a method of choice, with the ability to predict the impact of genetic variations involving complicated mechanisms. In this study, we examined copy number variation in whole genome sequencing data from 116 African American children with ADHD and 408 African American controls. We divided the human genome into 150 regions, and the variation intensity in each region was used as a feature vector for deep learning modeling to classify ADHD patients. The accuracy of deep learning for predicting ADHD diagnosis is consistently around 78% in a two-fold shuffle test, compared with ∼50% for traditional k-means clustering methods. Additional whole genome sequencing data from 351 European American children, including 89 ADHD cases and 262 controls, were used as independent validation with feature vectors obtained from the African American analysis. The accuracy of ADHD labeling was lower in this setting (∼70–75%) but still above the results from traditional methods. The regions with the highest weights overlapped with previously reported ADHD-associated copy number variation regions, including genes such as GRM1 and GRM8, key drivers of metabotropic glutamate receptor signaling. A notable and unexpected finding is that structural variations in non-coding (intronic/intergenic) genomic regions show prediction weights that can be as high as those of variations in coding regions.
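
The region-based feature construction described in the abstract above can be sketched as follows. The abstract does not define "variation intensity" or how the 150 regions were chosen, so this sketch assumes equal-width regions on a single concatenated coordinate axis and an overlap-weighted sum of per-call intensities. The function name `region_features` and the toy calls are hypothetical.

```python
def region_features(cnv_calls, genome_length, n_regions=150):
    """Turn CNV calls into one feature value per genomic region.

    cnv_calls: iterable of (start, end, intensity) with half-open
    coordinates [start, end) on a concatenated genome axis.
    Each call contributes its intensity to every region it overlaps,
    weighted by the fraction of that region it covers (an assumption).
    """
    width = genome_length / n_regions
    features = [0.0] * n_regions
    for start, end, intensity in cnv_calls:
        first = int(start // width)
        last = min(int((end - 1) // width), n_regions - 1)
        for idx in range(first, last + 1):
            lo = max(start, idx * width)
            hi = min(end, (idx + 1) * width)
            features[idx] += intensity * (hi - lo) / width
    return features

# Toy example: a 1500-unit "genome" gives 150 regions of width 10.
toy_calls = [(5, 25, 2.0), (1490, 1500, 1.0)]
toy_vector = region_features(toy_calls, genome_length=1500)
```

Each sample then yields a fixed-length vector of 150 values, which is the kind of input a classifier needs regardless of how many CNV calls the sample contains.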

11.
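Post-translational modifications (PTMs) play a decisive role in protein maturation and in the diversity of protein structure and function. However, the diversity, ubiquity, and dynamic nature of PTMs severely limit what traditional biochemical methods can reveal about them at a global level, and research on PTMs, especially large-scale research, has long progressed slowly. Today, building on experimental studies, a range of bioinformatics methods makes rapid, high-throughput prediction and identification of PTMs possible. On the one hand, starting from sequence and exploiting the specificity with which enzymes recognize their substrates, algorithms such as position weight matrices and support vector machines can extract modification-associated conserved sequences from substrate proteins and use them to predict PTM sites. This approach is relatively mature and can achieve fairly good prediction accuracy, but it cannot reflect the modification state at different times or in different cells. On the other hand, starting from mass spectrometry data analysis offers the prospect of capturing the dynamics of PTMs within cells. The high sensitivity, high accuracy, and high throughput of mass spectrometry have made MS-based proteomics an important tool for studying PTMs, and combining bioinformatics methods with MS-based proteomics can further accelerate PTM research. This article summarizes and evaluates recent progress in PTM-related bioinformatics methods from both the sequence and the mass spectrometry perspectives, with a focus on new approaches to identifying PTMs from mass spectrometry data.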
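
The position weight matrix approach mentioned in the abstract above can be illustrated with a minimal sketch. It assumes equal-length peptide windows centred on a known modification site, a uniform background over the 20 standard amino acids, and a simple pseudocount. The training windows are made-up toy strings, not real modification sites, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(windows, pseudocount=1.0):
    """Build a log-odds position weight matrix from equal-length windows."""
    length = len(windows[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm

def score_window(pwm, window):
    """Sum the per-position log-odds scores of a candidate window."""
    return sum(column[aa] for column, aa in zip(pwm, window))

# Toy windows with the modified serine at the centre (index 2).
training = ["RRSLV", "KRSIV", "RRSLA", "RKSLV"]
pwm = build_pwm(training)
motif_like = score_window(pwm, "RRSLV")
unrelated = score_window(pwm, "GDSWE")
```

A candidate site would be predicted as modified when its score exceeds a threshold chosen on held-out data; as the abstract notes, such a sequence-only score says nothing about when or in which cells the modification occurs.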

12.
Jie Hou  Tianqi Wu  Renzhi Cao  Jianlin Cheng 《Proteins》2019,87(12):1165-1178
Predicting residue-residue distance relationships (e.g., contacts) has become the key direction for advancing protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated for the first time to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distances when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

13.
With the development of artificial intelligence (AI) technologies and the availability of large amounts of biological data, computational methods for proteomics have progressed from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein–DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development of computational methods and summarize their advantages and shortcomings. We further compile applications in tasks related to protein–DNA/RNA interactions and point out possible future application trends. Moreover, the strategies for numerically encoding biological sequences that are used in different types of computational methods are also summarized and discussed.

14.
Rong Liu  Jianjun Hu 《Proteins》2013,81(11):1885-1899
Accurate prediction of DNA‐binding residues has become a problem of increasing importance in structural bioinformatics. Here, we present DNABind, a novel hybrid algorithm for identifying these crucial residues by exploiting the complementarity between machine learning‐ and template‐based methods. Our machine learning‐based method is based on the probabilistic combination of a structure‐based and a sequence‐based predictor, both of which are implemented using support vector machine algorithms. The former includes our structural features, such as solvent accessibility, local geometry, topological features, and relative positions, which can effectively quantify the difference between DNA‐binding and nonbinding residues. The latter combines evolutionary conservation features with three other sequence attributes. Our template‐based method depends on structural alignment and uses the template structure from known protein–DNA complexes to infer DNA‐binding residues. We show that the template method has excellent performance when reliable templates are found for the query proteins but tends to be strongly influenced by the template quality as well as by conformational changes upon DNA binding. In contrast, the machine learning approach yields better performance when high‐quality templates are not available (about one‐third of the cases in our dataset) or when the query protein undergoes substantial conformational changes upon DNA binding. Our extensive experiments indicate that the hybrid approach can distinctly improve on the performance of the individual methods for both bound and unbound structures. DNABind also significantly outperforms the state‐of‐the‐art algorithms by around 10% in terms of the Matthews correlation coefficient. The proposed methodology could also have wide application in various protein functional site annotations. DNABind is freely available at http://mleg.cse.sc.edu/DNABind/ . Proteins 2013; 81:1885–1899. © 2013 Wiley Periodicals, Inc.
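
The Matthews correlation coefficient used as the comparison metric in the abstract above is computed from the four cells of a binary confusion matrix. The sketch below gives the standard definition; the function name and the example counts are hypothetical and are not results from the DNABind paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/fn count binding residues predicted correctly/incorrectly,
    tn/fp count nonbinding residues predicted correctly/incorrectly.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denom

# Made-up counts for a protein with few binding residues.
example = matthews_corrcoef(tp=6, tn=80, fp=10, fn=4)
```

Because binding residues are usually a small minority, this coefficient is more informative than plain accuracy: a predictor that labels every residue as nonbinding scores high accuracy but a coefficient of zero.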

15.

Background  

The prediction of protein-protein binding sites can provide structural annotation for the protein interaction data from proteomics studies. This is very important for biological applications of protein interaction data, the volume of which is increasing rapidly. Moreover, methods for predicting protein interaction sites can also provide crucial information for improving the speed and accuracy of protein docking methods.

16.
Structural proteomics is one of the powerful research areas in the postgenomic era, elucidating structure-function relationships of uncharacterized gene products based on the 3D protein structure. It proposes biochemical and cellular functions of unannotated proteins and thereby identifies potential drug design and protein engineering targets. Recently, a number of pioneering groups in structural proteomics research have achieved proof of structural proteomic theory by predicting the 3D structures of hypothetical proteins that successfully identified the biological functions of those proteins. The pioneering groups made use of a number of techniques, including NMR spectroscopy, which has been applied successfully to structural proteomics studies over the past 10 years. In addition, advances in hardware design, data acquisition methods, sample preparation and automation of data analysis have been developed and successfully applied to high-throughput structure determination techniques. These efforts ensure that NMR spectroscopy will become an important methodology for performing structural proteomics research on a genomic scale. NMR-based structural proteomics together with X-ray crystallography will provide a comprehensive structural database to predict the basic biological functions of hypothetical proteins identified by the genome projects.

17.
Manaz Kaleel  Mirko Torrisi  Catherine Mooney  Gianluca Pollastri 《Amino acids》2019,51(9):1289-1296

Predicting the three-dimensional structure of proteins is a long-standing challenge of computational biology, as the structure (or lack of a rigid structure) is well known to determine a protein’s function. Predicting relative solvent accessibility (RSA) of amino acids within a protein is a significant step towards resolving the protein structure prediction challenge, especially in cases in which structural information about a protein is not available by homology transfer. Today, arguably the core of the most powerful methods for predicting RSA and other structural features of proteins is some form of deep learning, and all the state-of-the-art protein structure prediction tools rely on some machine learning algorithm. In this article we present a deep neural network architecture composed of stacks of bidirectional recurrent neural networks and convolutional layers, which is capable of mining information from long-range interactions within a protein sequence and applying it to the prediction of protein RSA using a novel encoding method that we shall call “clipped”. The final system we present, PaleAle 5.0, which is available as a public server, predicts RSA into two, three and four classes at an accuracy exceeding 80% in two classes, surpassing the performances of all the other predictors we have benchmarked.


18.
In the past decade, improvements in genome annotation, protein fractionation methods and mass spectrometry instrumentation resulted in rapid growth of Drosophila proteomics. This review presents the current status of proteomics research in the fly. Areas that have seen major advances in recent years include efforts to map and catalog the Drosophila proteome and high-throughput as well as targeted studies to analyze protein–protein interactions and post-translational modifications. Stable isotope labeling of flies and other applications of quantitative proteomics have opened up new possibilities for functional analyses. It is clear that proteomics is becoming an indispensable tool in Drosophila systems biology research that adds a unique dimension to studying gene function.

19.
Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to use computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures or with the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.

20.
Proteomics is the study of the protein complement of a genome and employs a number of newly emerging tools. One such tool is chemical proteomics, which is a branch of proteomics devoted to the exploration of protein function using both in vitro and in vivo chemical probes. Chemical proteomics aims to define protein function and mechanism at the level of directly observed protein–ligand interactions, whereas chemical genomics aims to define the biological role of a protein using chemical knockouts and observing phenotypic changes. Chemical proteomics is therefore traditional mechanistic biochemistry performed in a systems-based manner, using either activity- or affinity-based probes that target proteins related by chemical reactivities or by binding site shape/properties, respectively. Systems are groups of proteins related by metabolic pathway, regulatory pathway or binding to the same ligand. Studies can be based on two main types of proteome samples: pooled proteins (1 mixture of N proteins) or isolated proteins in a given system and studied in parallel (N single protein samples). Although the field of chemical proteomics originated with the use of covalent labeling strategies such as isotope-coded affinity tagging, it is expanding to include chemical probes that bind proteins noncovalently, and to include more methods for observing protein–ligand interactions. This review presents an emerging role for nuclear magnetic resonance spectroscopy in chemical proteomics, both in vitro and in vivo. Applications include: functional proteomics using cofactor fingerprinting to assign proteins to gene families; gene family-based structural characterizations of protein–ligand complexes; gene family-focused design of drug leads; and chemical proteomic probes using nuclear magnetic resonance SOLVE and studies of protein–ligand interactions in vivo.
