Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted by SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRF prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.

Deep learning outperforms traditional machine learning techniques on many tasks. In the last several years, deep learning has been applied to protein function prediction with a series of good results. These findings have substantially advanced our understanding of protein function. However, the accuracy of deep learning-based protein function prediction still needs improvement. In article number 1900019, Issue 12, Zhang et al. construct DeepFunc, a deep learning framework that uses feature information derived from protein sequences and protein interaction networks. They find that DeepFunc predicts protein function more accurately than DeepGO, a similar method reported previously. They also find that combining multiple kinds of derived feature information in DeepFunc works much better than using a single kind. Because it fully exploits representation learning, deep learning with more derived feature information is a promising method for solving more complicated protein function prediction problems and other bioinformatics challenges. Recent research has provided major insights into the value of applying deep learning to the protein function prediction problem.

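Drug development is very important but also highly labor- and resource-intensive. Computer-aided prediction of drug-protein affinity can greatly accelerate the process. The key to drug-target affinity prediction is an accurate and detailed representation of both the drug and the protein. We propose a drug-target affinity prediction model based on deep learning and multi-level information fusion, which aims to improve predictive performance by integrating multi-level information on drugs and proteins. First, each drug is represented in two forms, a molecular graph and extended-connectivity fingerprints, which are learned by a graph convolutional network module and fully connected layers, respectively. Second, the protein sequence and protein k-mer features are fed into a convolutional neural network module and fully connected layers, respectively, to learn latent protein features. The features learned by the four channels are then fused, and fully connected layers produce the prediction. The method was validated on two benchmark drug-target affinity datasets and compared with existing models. The results show that the proposed model achieves better predictive performance than the baseline models, indicating that the strategy of integrating multi-level drug and protein information for drug-target affinity prediction is effective.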

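Ubiquitination is a post-translational modification that currently attracts wide attention and plays an important regulatory role in many cellular processes, such as protein degradation and DNA repair. Drawing on Chinese and international research on the prediction of protein ubiquitination sites, this paper analyzes the feature attributes used to predict ubiquitination sites, summarizes the feature selection methods used to optimize these features, and gives an overview of the machine learning classifiers used in prediction.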

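Lysine succinylation is a novel post-translational modification that plays an important role in protein regulation and the control of cellular function, so accurate identification of succinylation sites in proteins is necessary. Traditional experiments are costly in materials and money, and computational prediction has recently been proposed as an efficient alternative. In this study, we developed a new prediction method, iSucc-PseAAC, which combines multiple classification algorithms with different feature extraction methods. We found that a support vector machine gave the best classification performance with sequence-coupling (PseAAC) feature extraction, and ensemble learning was used to address the data imbalance problem. Compared with existing methods, iSucc-PseAAC is more meaningful and practical for discriminating lysine succinylation sites.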

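It takes a long time for a drug to go from development to clinical use, and the investment during development can reach well over one billion yuan. With the integration of pharmaceutical R&D and artificial intelligence and the rapid development of bioinformatics, drug activity data have increased sharply, and traditional experimental approaches to drug activity prediction can no longer meet the needs of drug development. Using algorithms to assist drug development and solve its various problems can greatly advance the process. Traditional machine learning methods, especially random forests, support vector machines, and artificial neural networks, can achieve high prediction accuracy for drug activity. Deep learning, with its multi-layer neural networks, can accept high-dimensional input variables without manually specified input features and can fit relatively complex functions; applied to drug development, it can further improve the efficiency of each stage. The deep learning models most widely used in drug activity prediction are deep neural networks (DNNs), recurrent neural networks (RNNs), and autoencoders (AEs), while generative adversarial networks (GANs), owing to their ability to generate data, are often combined with other models for data augmentation. Recent reviews of research and applications of deep learning in predicting the activity of drug molecules indicate that deep learning models are more accurate and efficient than traditional experimental methods and traditional machine learning methods. Deep learning models are therefore expected to become the most important auxiliary computational models in drug development over the next decade.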


We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here the side chain orientation of a given residue i is measured by the angle Ωi between the vector pointing from the center of the protein structure to the Cα atom of residue i and the vector pointing from that Cα atom to the center of the residue's side chain atoms. To predict the Ωi angles, we construct statistical models by using several different methods such as general linear regression, a regression tree and bagging, a neural network, and a support vector machine. The root mean square errors for the different models range only from 36.67 to 37.60 degrees and the correlation coefficients are all between 0.30 and 0.34. The performances of the different models on the test set are thus quite similar, and show the predictive power of these models to be significant in comparison with random side chain orientations.
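The angle Ωi defined in this abstract can be computed directly from atomic coordinates. The sketch below is illustrative only: the function names, the use of plain coordinate tuples, and the choice of an unweighted mean for the side chain center are assumptions, not details taken from the paper.

```python
import math

def centroid(points):
    """Unweighted mean of a list of (x, y, z) tuples (assumed definition of 'center')."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))

def side_chain_orientation(protein_center, ca, side_chain_atoms):
    """Angle in degrees between (protein center -> CA) and (CA -> side chain center)."""
    sc = centroid(side_chain_atoms)
    u = tuple(ca[k] - protein_center[k] for k in range(3))
    v = tuple(sc[k] - ca[k] for k in range(3))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("degenerate geometry: a vector has zero length")
    cos_omega = sum(a * b for a, b in zip(u, v)) / norm
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_omega))))
```

Under this definition, a side chain pointing straight away from the protein center gives an angle near 0 degrees and one pointing back toward the center gives an angle near 180 degrees.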

As a newly identified protein post-translational modification, malonylation is involved in a variety of biological functions. Recognizing malonylation sites in substrates represents an initial but crucial step in elucidating the molecular mechanisms underlying protein malonylation. In this study, we constructed a deep learning (DL) network classifier based on long short-term memory (LSTM) with word embedding (LSTMWE) for the prediction of mammalian malonylation sites. LSTMWE performs better than traditional classifiers developed with common pre-defined feature encodings or a DL classifier based on LSTM with a one-hot vector. The performance of LSTMWE is sensitive to the size of the training set, but this limitation can be overcome by integration with a traditional machine learning (ML) classifier. Accordingly, an integrated approach called LEMP was developed, which includes LSTMWE and the random forest classifier with a novel encoding of enhanced amino acid content. LEMP not only outperforms the individual classifiers but is also superior to the currently available malonylation predictors. Additionally, it demonstrates promising performance with a low false positive rate, which is highly useful in prediction applications. Overall, LEMP is a useful tool for easily identifying malonylation sites with high confidence. LEMP is available at http://www.bioinfogo.org/lemp.

Predicting the various binding sites of a protein from its structure sheds light on its function and paves the way towards design of interaction inhibitors. Here, we report ScanNet, a freely available web server for prediction of protein–protein, protein–disordered protein and protein–antibody binding sites from structure. ScanNet (Spatio-Chemical Arrangement of Neighbors Network) is an end-to-end, interpretable geometric deep learning model that learns spatio-chemical patterns directly from 3D structures. ScanNet consistently outperforms machine learning models based on handcrafted features and comparative modeling approaches. The web server is linked to both the PDB and AlphaFoldDB, and supports user-provided structure files. Predictions can be readily visualized on the website via the Molstar web app and locally via ChimeraX. ScanNet is available at http://bioinfo3d.cs.tau.ac.il/ScanNet/.

Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotation of functions is expensive and time-consuming, and does not keep up with the rapid growth in the number of sequences. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector which is combined with topological information extracted from protein–protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.
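For readers unfamiliar with the Fmax figure quoted above, the following is a minimal sketch of the protein-centric Fmax commonly used in CAFA-style evaluation: the maximum, over score thresholds, of the harmonic mean of average precision and average recall. It is a simplified illustration, not the official CAFA evaluation code. It skips the propagation of annotations up the Gene Ontology hierarchy, assumes every protein has at least one true term, and the function name, threshold grid, and term labels are hypothetical.

```python
def fmax(predictions, truths, thresholds=None):
    """Protein-centric Fmax.

    predictions: list of dicts mapping term -> score in [0, 1], one per protein.
    truths: list of non-empty sets of true terms, aligned with predictions.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # assumed grid, step 0.01
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for scores, true_terms in zip(predictions, truths):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            # Precision is averaged only over proteins with a prediction at this threshold.
            if predicted:
                precisions.append(hits / len(predicted))
            # Recall is averaged over all proteins.
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over proteins that receive a prediction, while averaging recall over all proteins, is what makes the measure penalize a method that stays silent on hard targets.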

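Jin Dong, Zhang Meng, Jia Cangzhi. 《生物信息学》 (Chinese Journal of Bioinformatics), 2022, 20(3): 182-188
In genetics, a terminator is a DNA domain located downstream of the poly(A) site, a few hundred bases or less in length and containing multiple palindromic sequences, whose main role is to terminate transcription. Prokaryotic genomes contain two classes of transcription terminators, Rho-dependent and Rho-independent. In this study, we propose a new prediction model, TermCNN, for fast and accurate identification of bacterial transcription terminators. The model takes a representative subset of 6-mer features (2,537 features) and the electron-ion interaction pseudopotential (EIIP) as input vectors and uses a convolutional neural network (CNN) to build the predictor. Five-fold cross-validation and independent tests show that the model outperforms the latest predictor, iTerm-PseKNC. Notably, the model has a clear advantage in cross-species tests: it predicts the transcription terminators of Escherichia coli (E. coli) and Bacillus subtilis (B. subtilis) with high accuracy.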
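The two input encodings named in this abstract, k-mer frequencies and EIIP values, can be sketched as follows. This is an illustration under stated assumptions rather than TermCNN's actual preprocessing: the function names are hypothetical, the EIIP constants are the values commonly published for DNA nucleotides, and the paper's selection of a 2,537-feature subset from the 4,096 possible 6-mers is not reproduced.

```python
from itertools import product

# Commonly published EIIP values for DNA nucleotides (assumed to match the paper's).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def kmer_frequencies(seq, k=6):
    """Relative frequency of every possible k-mer, in lexicographic ACGT order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]

def eiip_encode(seq):
    """Map each base to its EIIP value; unknown symbols become 0.0 (an assumption)."""
    return [EIIP.get(base, 0.0) for base in seq.upper()]
```

With k = 6 the frequency vector has 4^6 = 4,096 entries, which is why a feature selection step such as the one described in the abstract is typically applied before training.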

Cervical cancer is a leading cause of cancer-related death in women, mainly in developing countries, including India. Recent advancements in technologies could allow for more rapid, cost-effective, and sensitive screening and treatment measures for cervical cancer. To this end, deep learning-based methods have gained importance for classifying cervical cancer patients into different risk groups. Furthermore, deep learning models are now available to study the progression and treatment of cancerous cervical conditions. Deep learning methods can enhance our understanding of cervical cancer progression. However, it is essential to thoroughly validate deep learning-based models before they can be implemented in everyday clinical practice. This work reviews recent developments in deep learning approaches employed in cervical cancer diagnosis and prognosis. Further, we provide an overview of recent methods and databases leveraging these new approaches for cervical cancer risk prediction and patient outcomes. Finally, we conclude with state-of-the-art approaches and future research opportunities in this domain.

In eukaryotes, protein phosphorylation is specifically catalyzed by numerous protein kinases (PKs), faithfully orchestrates various biological processes, and reversibly determines cellular dynamics and plasticity. Here we report an updated algorithm of Group-based Prediction System (GPS) 5.0 to improve the performance for predicting kinase-specific phosphorylation sites (p-sites). Two novel methods, position weight determination (PWD) and scoring matrix optimization (SMO), were developed. Compared with other existing tools, GPS 5.0 exhibits a highly competitive accuracy. Besides serine/threonine or tyrosine kinases, GPS 5.0 also supports the prediction of dual-specificity kinase-specific p-sites. In the classical module of GPS 5.0, 617 individual predictors were constructed for predicting p-sites of 479 human PKs. To extend the application of GPS 5.0, a species-specific module was implemented to predict kinase-specific p-sites for 44,795 PKs in 161 eukaryotes. The online service and local packages of GPS 5.0 are freely available for academic research at http://gps.biocuckoo.cn.

Proteases are enzymes that cleave and hydrolyse the peptide bonds between two specific amino acid residues of target substrate proteins. Protease-controlled proteolysis plays a key role in the degradation and recycling of proteins, which is essential for various physiological processes. Thus, solving the substrate identification problem will have important implications for the precise understanding of functions and physiological roles of proteases, as well as for therapeutic target identification and pharmaceutical applicability. Consequently, there is a great demand for bioinformatics methods that can predict novel substrate cleavage events with high accuracy by utilizing both sequence and structural information. In this study, we present Procleave, a novel bioinformatics approach for predicting protease-specific substrates and specific cleavage sites by taking into account both their sequence and 3D structural information. Structural features of known cleavage sites were represented by discrete values using a LOWESS data-smoothing optimization method, which turned out to be critical for the performance of Procleave. The optimal approximations of all structural parameter values were encoded in a conditional random field (CRF) computational framework, alongside sequence and chemical group-based features. Here, we demonstrate the outstanding performance of Procleave through extensive benchmarking and independent tests. Procleave is capable of correctly identifying most cleavage sites in the case study. Importantly, when applied to the human structural proteome encompassing 17,628 protein structures, Procleave suggests a number of potential novel target substrates and their corresponding cleavage sites of different proteases. Procleave is implemented as a webserver and is freely accessible at http://procleave.erc.monash.edu/.

The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
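The consensus averaging step mentioned in this abstract can be illustrated with a very small sketch. It assumes an unweighted mean of per-term confidence scores and treats a term that a source does not report as a score of 0.0; whether QAUST weights its three sources or handles missing terms this way is not stated in the abstract. The function name and the example term labels are hypothetical.

```python
def consensus_average(*source_scores):
    """Unweighted mean of per-term scores across sources.

    Each argument is a dict mapping a function term to a confidence score.
    A term missing from a source counts as 0.0 (an assumption).
    """
    terms = set().union(*source_scores)
    n = len(source_scores)
    return {t: sum(s.get(t, 0.0) for s in source_scores) / n for t in terms}

# Hypothetical scores from structure, network, and sequence-motif evidence.
structure = {"GO:A": 0.9}
network = {"GO:A": 0.6, "GO:B": 0.3}
motifs = {"GO:A": 0.3}
combined = consensus_average(structure, network, motifs)
```

A term supported by all three sources ends up with a higher combined score than one supported by a single source, which is the intuition behind combining independent lines of evidence.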

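Objective: To propose RE-Net, a network architecture based on the human visual attention mechanism, so that convolutional neural networks (CNNs) are better suited to the intelligent diagnostic assessment of refractive error from fundus photographs. Methods: RE-Net uses ResNet34 as its backbone and adds a contextual attention module, comprising channel attention and spatial attention mechanisms, so that the relevant channels contribute most and the weights of responsive regions are increased. Results: A total of 4,358 fundus images were used as the training set for RE-Net. On a test set of 485 fundus images, the classification accuracies were 93.3% for high myopia, 89.7% for moderate myopia, 83.2% for mild myopia, 82.5% for mild hyperopia, 79.5% for moderate hyperopia, and 84.6% for severe hyperopia, with an average classification accuracy of 85.5%, an area under the curve (AUC) of 0.909, a sensitivity of 0.93, a specificity of 0.89, and a Kappa value of 0.79 (χ² = 23.21, P0.05). Conclusion: The deep learning-based RE-Net artificial intelligence diagnostic system performs well in the diagnostic assessment of refractive error and may provide a new screening tool for refractive error.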


A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM) which is able to generate low resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local φ-ψ energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight into protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone rmsd for fragments of about 60–70 residues of α-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of β-sheet structures are briefly described.



Adhesion of trypomastigotes of Trypanosoma cruzi, the causative agent of Chagas' disease in humans, to components of the extracellular matrix (ECM) is an important step in host cell invasion. The signaling events triggered in the parasite upon binding to ECM are less explored and, to our knowledge, there are no data available regarding •NO signaling.

Methodology/Principal Findings

Trypomastigotes were incubated with ECM for different periods of time. Nitrated and S-nitrosylated proteins were analyzed by Western blotting using anti-nitrotyrosine and anti-S-nitrosyl cysteine antibodies. After 2 h of incubation, decreases in NO synthase activity, in •NO, citrulline, arginine and cGMP concentrations, and in protein modification levels were observed in the parasite. The modified proteins were enriched by immunoprecipitation with anti-nitrotyrosine antibodies (nitrated proteins) or by the biotin switch method (S-nitrosylated proteins) and identified by MS/MS. The presence of both modifications was confirmed in proteins of interest by immunoblotting or immunoprecipitation.


For the first time it was shown that T. cruzi proteins are amenable to modification by S-nitrosylation and nitration. When T. cruzi trypomastigotes are incubated with the extracellular matrix there is a general down-regulation of these reactions, including a decrease in both NOS activity and cGMP concentration. Nevertheless, some specific proteins, such as enolase and histones, showed increased nitration levels. This suggests that post-translational modifications of T. cruzi proteins are not only a reflection of NOS activity, implying other mechanisms that circumvent a relatively low synthesis of •NO. In conclusion, the extracellular matrix, a layer of macromolecules surrounding the cell that the parasite must traverse in order to be internalized into host cells, contributes to the modification of •NO signaling in the parasite, probably an essential move for the ensuing invasion step.

Computational approaches for predicting protein-protein interfaces are extremely useful for understanding and modelling the quaternary structure of protein assemblies. In particular, partner-specific binding site prediction methods allow delineating the specific residues that compose the interface of protein complexes. In recent years, new machine learning and other algorithmic approaches have been proposed to solve this problem. However, little effort has been made in finding better training datasets to improve the performance of these methods. With the aim of demonstrating the importance of the training set compilation procedure, in this work we present BIPSPI+, a new version of our original server trained on carefully curated datasets that outperforms our original predictor. We show how prediction performance can be improved by selecting specific datasets that better describe particular types of protein interactions and interfaces (e.g. homo/hetero). In addition, our upgraded web server offers a new set of functionalities such as the sequence-structure prediction mode, hetero- or homo-complex specialization and the guided docking tool, which computes 3D quaternary structure poses using the predicted interfaces. BIPSPI+ is freely available at https://bipspi.cnb.csic.es.
