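Typical protein docking programs generate a large number of candidate conformations, of which only a few are correct. The main task in docking is now to pick the correct conformations out of this large pool. Our previous work showed that protein interfaces have higher energy than non-interface surfaces. Here, using the non-antigen-antibody complexes from the benchmark of protein complexes proposed by Chen et al. for testing and designing docking programs, we apply side-chain energy to docking and compare its performance with that of residue pairing propensity, residue composition propensity, and residue conservation. Used alone, these four terms give average percentile ranks for the correct conformations of 38.6±19.6, 26.3±20.8, 22.7±16.6, and 37.8±26.1, respectively; for some individual proteins, however, side-chain energy outperforms the other three parameters. Combining the four parameters, we developed a new scoring function with an average percentile rank of 22.2±7.8 and improved screening efficiency.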
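
The percentile-rank metric quoted above can be illustrated with a minimal sketch. It assumes that lower scores are better and that the percentile rank is the 1-based rank of the correct conformation divided by the number of candidates; the abstract does not state either convention, and all scores below are hypothetical.

```python
from statistics import mean, stdev


def percentile_rank(scores, correct_index, lower_is_better=True):
    """Percentile position (0-100] of the correct conformation in the ranked list.

    Assumption: rank = 1 + number of candidates that score strictly better.
    """
    target = scores[correct_index]
    if lower_is_better:
        better = sum(1 for s in scores if s < target)
    else:
        better = sum(1 for s in scores if s > target)
    rank = better + 1
    return 100.0 * rank / len(scores)


# Hypothetical scores for four candidate conformations of one complex;
# index 3 is taken to be the correct conformation.
scores = [-3.2, -7.5, -1.0, -5.1]
print(percentile_rank(scores, correct_index=3))

# Hypothetical percentile ranks for three complexes, summarised as mean and
# sample standard deviation, the form in which the abstract reports results.
ranks = [50.0, 25.0, 75.0]
print(mean(ranks), stdev(ranks))
```

A smaller percentile rank means the correct conformation sits nearer the top of the list, so fewer candidates need to be inspected.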

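Functional annotation of genes by computational and mathematical methods has become both a hot topic and a challenge, with machine learning the most widely applied approach. Bioinformaticians continue to propose effective, fast, and accurate machine learning methods for gene function annotation, which has greatly advanced biomedicine. This article reviews the application and progress of machine learning methods in gene function annotation. It introduces several commonly used methods, including support vector machines, the k-nearest neighbor algorithm, decision trees, random forests, neural networks, Markov random fields, logistic regression, clustering algorithms, and Bayesian classifiers, and discusses how to select data sources, improve algorithms, and raise predictive performance when applying machine learning to gene function annotation.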

Chen J  Lu Z  Sakon J  Stites WE 《Biophysical chemistry》2004,110(3):15334-248
Efforts to design proteins with greatly reduced sequence diversity have often resulted in proteins with so-called molten globule properties. Substitutions were made at six neighboring sites in the major hydrophobic core of staphylococcal nuclease to create variants with all leucine, all isoleucine or all valine at these sites. The mutant proteins with simplified cores constructed here are quite unstable and have poorly packed cores, attested to by interaction energies. Eight related mutants with greater sequence diversity were also constructed. Comparison to these mutants and 159 other permutations of these 3 aliphatic side chains at these same 6 sites previously constructed shows that the simplified cores are not unusual in their stabilities or interaction energies. Further, crystal structures of the two mutants with the worst packing, as measured by interaction energies, showed no unusual disorder in the core. Therefore, reduction of sequence diversity is not necessarily incompatible with a single stable native structure. Other factors must also contribute to previous protein design failures.

Prediction of protein structure from sequence has been intensely studied for many decades, owing to the problem's importance and its uniquely well-defined physical and computational bases. While progress has historically ebbed and flowed, the past two years saw dramatic advances driven by the increasing “neuralization” of structure prediction pipelines, whereby computations previously based on energy models and sampling procedures are replaced by neural networks. The extraction of physical contacts from the evolutionary record; the distillation of sequence–structure patterns from known structures; the incorporation of templates from homologs in the Protein Data Bank; and the refinement of coarsely predicted structures into finely resolved ones have all been reformulated using neural networks. Cumulatively, this transformation has resulted in algorithms that can now predict single protein domains with a median accuracy of 2.1 Å, setting the stage for a foundational reconfiguration of the role of biomolecular modeling within the life sciences.

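Bringing a drug from development to clinical use takes a long time, and investment during development can run to more than one billion yuan. With the combination of pharmaceutical R&D and artificial intelligence and the rapid growth of bioinformatics, data on drug activity have increased sharply, and traditional experimental approaches to drug activity prediction can no longer meet the needs of drug development. Using algorithms to assist drug development and solve its various problems can greatly accelerate the process. Traditional machine learning methods, especially random forests, support vector machines, and artificial neural networks, can achieve high accuracy in predicting drug activity. Because deep learning uses multilayer neural networks, its models can accept high-dimensional input variables without manually specified input features and can fit relatively complex functions, so applying it to drug development can further improve efficiency at every stage. The deep learning models most widely used in drug activity prediction are deep neural networks (DNN), recurrent neural networks (RNN), and autoencoders (AE), while generative adversarial networks (GAN), owing to their ability to generate data, are often combined with other models for data augmentation. A review of recent research and applications of deep learning in predicting the activity of drug molecules indicates that deep learning models are more accurate and efficient than both traditional experimental methods and traditional machine learning methods. Deep learning models are therefore expected to become the most important auxiliary computational models in drug development over the next decade.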

In higher eukaryotic cells, chromosomes are folded inside the nucleus. Recent advances in whole-genome mapping technologies have revealed the multiscale features of 3D genome organization that are intertwined with fundamental genome functions. However, DNA sequence determinants that modulate the formation of 3D genome organization remain poorly characterized. In the past few years, predicting 3D genome organization based on DNA sequence features has become an active area of research. Here, we review the recent progress in computational approaches to unraveling important sequence elements for 3D genome organization. In particular, we discuss the rapid development of machine learning-based methods that facilitate the connections between DNA sequence features and 3D genome architectures at different scales. While much progress has been made in developing predictive models for revealing important sequence features for 3D genome organization, new research is urgently needed to incorporate multi-omic data and enhance model interpretability, further advancing our understanding of gene regulation mechanisms through the lens of 3D genome organization.

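Lysine succinylation is a novel post-translational modification that plays an important role in protein regulation and the control of cellular function, so accurate identification of succinylation sites in proteins is necessary. Traditional experiments are costly in materials and money. Computational prediction has recently been proposed as an efficient alternative. In this study, we developed a new prediction method, iSucc-PseAAC, by combining multiple classification algorithms with different feature extraction methods. We found that a support vector machine with coupled-sequence (PseAAC) feature extraction gave the best classification performance, and we used ensemble learning to address the data imbalance problem. Compared with existing methods, iSucc-PseAAC is more meaningful and practical for distinguishing lysine succinylation sites.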

Intrinsically disordered or unstructured proteins (or regions in proteins) have been found to be important in a wide range of biological functions and implicated in many diseases. Due to the high cost and low efficiency of experimental determination of intrinsic disorder and the exponential increase of unannotated protein sequences, developing complementary computational prediction methods has been an active area of research for several decades. Here, we employed an ensemble of deep Squeeze-and-Excitation residual inception and long short-term memory (LSTM) networks for predicting protein intrinsic disorder with input from evolutionary information and predicted one-dimensional structural properties. The method, called SPOT-Disorder2, offers substantial and consistent improvement not only over our previous technique based on LSTM networks alone, but also over other state-of-the-art techniques in three independent tests with different ratios of disordered to ordered amino acid residues, and for sequences with either rich or limited evolutionary information. More importantly, semi-disordered regions predicted in SPOT-Disorder2 are more accurate in identifying molecular recognition features (MoRFs) than methods directly designed for MoRFs prediction. SPOT-Disorder2 is available as a web server and as a standalone program at https://sparks-lab.org/server/spot-disorder2/.

Protein nitration and nitrosylation are essential post-translational modifications (PTMs) involved in many fundamental cellular processes. Recent studies have revealed that excessive levels of nitration and nitrosylation in some critical proteins are linked to numerous chronic diseases. Therefore, the identification of substrates that undergo such modifications in a site-specific manner is an important research topic in the community and will provide candidates for targeted therapy. In this study, we aimed to develop a computational tool for predicting nitration and nitrosylation sites in proteins. We first constructed four types of encoding features, including positional amino acid distributions, sequence contextual dependencies, physicochemical properties, and position-specific scoring features, to represent the modified residues. Based on these encoding features, we established a predictor called DeepNitro using deep learning methods for predicting protein nitration and nitrosylation. Using n-fold cross-validation, our evaluation gave AUC values for DeepNitro of 0.65 for tyrosine nitration, 0.80 for tryptophan nitration, and 0.70 for cysteine nitrosylation, demonstrating the robustness and reliability of our tool. Also, when tested on the independent dataset, DeepNitro is substantially superior to other similar tools, with a 7% to 42% improvement in prediction performance. Taken together, the application of deep learning methods and novel encoding schemes, especially the position-specific scoring feature, greatly improves the accuracy of nitration and nitrosylation site prediction and may facilitate the prediction of other PTM sites. DeepNitro is implemented in JAVA and PHP and is freely available for academic research at http://deepnitro.renlab.org.

A simple analytical model is presented for the prediction of methyl-side chain dynamics in comparison with S² order parameters obtained by NMR relaxation spectroscopy. The model, which is an extension of the local contact model for backbone order parameter prediction, uses a static 3D protein structure as input. It expresses the methyl-group S² order parameters as a function of local contacts of the methyl carbon with respect to the neighboring atoms in combination with the number of consecutive mobile dihedral angles between the methyl group and the protein backbone. For six out of seven proteins the prediction results are good when compared with experimentally determined methyl-group S² values with an average correlation coefficient r = 0.65 ± 0.14. For the unusually rigid cytochrome c2 no significant correlation between prediction and experiment is found. The presented model provides independent support for the reliability of current side-chain relaxation methods along with their interpretation by the model-free formalism.

In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning-based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV.

Xu H  Xu H  Lin M  Wang W  Li Z  Huang J  Chen Y  Chen X 《Proteomics》2007,7(23):4255-4263
Current drug discovery and development approaches rely extensively on the identification and validation of appropriate targets; for example, those with marketable and robust therapeutics. Wide-ranging efforts have been directed at this problem and various approaches have been developed to identify disease-associated genes as candidates. In this work, we show with statistical significance that successful drug targets, in addition to their linkage to disease, share common characteristics that are disease-independent. For example, marked differences in functional category, tissue specificity, and sequence variability are observed between known targets and average proteins. These results lead to an interesting hypothesis: potentially good drug targets shall have some desired properties, which we refer to as "drug target-likeness" that are beyond their disease-associations. Because of the limited availability of comprehensive protein characteristics data, we tried to learn the drug target-likeness property at the sequence level. Results show that a support vector machine model is able to accurately distinguish targets from nontargets entirely with sequence features. It is our hope that these encouraging results will invite future systematic proteomic scale experiments to gather necessary protein characteristics data for the accurate and predictive definition of "drug target-likeness", providing a new perspective toward understanding and pursuing effective therapeutics.

The use of antigenicity scales based on physicochemical properties and the sliding window method in combination with an averaging algorithm and subsequent search for the maximum value is the classical method for B-cell epitope prediction. However, recent studies have demonstrated that the best classical methods provide a poor correlation with experimental data. We review both classical and novel algorithms and present our own implementation of the algorithms. The AAPPred software is available at http://www.bioinf.ru/aappred/.
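
The classical procedure named above (per-residue scale, sliding-window averaging, search for the maximum) can be sketched as follows. This is an illustration only, not the AAPPred implementation: the scale values are made-up toy numbers rather than any published antigenicity scale, and the window size is an arbitrary choice.

```python
def window_average_profile(sequence, scale, window=7):
    """Average a per-residue scale over a centered sliding window.

    Returns (center_index, mean_value) pairs for every position where the
    full window fits inside the sequence.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    values = [scale[aa] for aa in sequence]
    profile = []
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        profile.append((i, sum(chunk) / window))
    return profile


def top_position(profile):
    """Return the (center_index, mean_value) pair with the highest average."""
    return max(profile, key=lambda item: item[1])


# Toy scale (hypothetical values): higher means "more likely antigenic".
toy_scale = {"A": 0.0, "K": 1.0, "D": 1.0, "L": -1.0}
profile = window_average_profile("LLAKDKALL", toy_scale, window=3)
print(top_position(profile))
```

The window centered on the residue with the highest averaged value is the predicted epitope candidate; in practice several scales and window sizes are usually compared.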

We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
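
The idea of running gradient descent on a potential built from predicted inter-residue distances can be shown with a toy sketch. It is not the AlphaFold potential: it assumes a plain sum of squared differences between current and target distances, optimizes free 2D points rather than backbone torsion angles, and uses hypothetical target distances, step size, and iteration count.

```python
import math


def potential(coords, targets):
    """Sum of squared differences between current and target pair distances."""
    return sum((math.dist(coords[i], coords[j]) - t) ** 2
               for (i, j), t in targets.items())


def gradient_descent(coords, targets, step=0.05, iterations=500):
    """Minimise the toy potential by plain gradient descent on the coordinates."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0] * len(p) for p in coords]
        for (i, j), t in targets.items():
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # gradient undefined for coincident points; skip
            factor = 2.0 * (d - t) / d
            for k in range(len(coords[i])):
                diff = coords[i][k] - coords[j][k]
                grads[i][k] += factor * diff
                grads[j][k] -= factor * diff
        for p, g in zip(coords, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return coords


# Hypothetical "predicted" distances: three points that should be 1.0 apart.
targets = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
start = [(0.0, 0.0), (2.0, 0.0), (0.5, 3.0)]
final = gradient_descent(start, targets)
print(potential(start, targets), potential(final, targets))
```

The potential drops from its starting value toward zero as the three points settle into a triangle with unit sides.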

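microRNAs (miRNAs) are a class of small regulatory RNAs that do not encode proteins and that perform broad and important regulatory functions in eukaryotes. Because miRNA expression is temporally and spatially specific, predicting miRNAs computationally and then validating them with targeted experiments is an important route to miRNA discovery. Reducing the false positive rate is a major challenge for miRNA prediction methods. In this study, we used ensemble learning to build SVMbagging, a classifier for predicting miRNA precursors. Results on the training set, test set, and independent test set show that the method is robust, has a low false positive rate, and generalizes well; in particular, at a threshold of 0.9 the specificity reaches 99.90% with a sensitivity above 26%, which makes it suitable for genome-wide prediction. Applying SVMbagging to the whole human genome with a threshold of 0.9 yielded 14,933 possible miRNA precursors. Comparison with high-throughput small RNA sequencing data showed that 4,481 of these precursors have perfectly matching small RNA sequences, very close to the theoretically estimated number of true positives. Finally, 32 possible miRNAs were tested experimentally, and 2 of them were confirmed as genuine miRNAs.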
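
The trade-off between sensitivity and specificity at a high decision threshold can be illustrated with a small sketch. It assumes that the ensemble score is the fraction of base classifiers voting "precursor", which the abstract does not specify, and the votes and labels are invented; no SVMs are trained here.

```python
def ensemble_score(votes):
    """Fraction of base classifiers that voted positive (1) for one candidate."""
    return sum(votes) / len(votes)


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical votes from 10 bagged classifiers for five candidate hairpins.
all_votes = [[1] * 10, [1] * 9 + [0], [1] * 6 + [0] * 4, [1] * 7 + [0] * 3, [0] * 10]
labels = [1, 1, 1, 0, 0]  # 1 = real precursor, 0 = negative (made-up labels)
scores = [ensemble_score(v) for v in all_votes]

for threshold in (0.5, 0.9):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(threshold, round(sens, 3), round(spec, 3))
```

Raising the threshold discards the borderline false positive at the cost of one true precursor, the same direction of trade-off as the high-specificity, low-sensitivity operating point reported at 0.9.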

Recent advancements in Artificial Intelligence (AI) and Machine Learning (ML) technology have brought substantial strides in predicting and identifying health emergencies, disease populations, disease states, and immune responses, among others. Although skepticism remains regarding the practical application and interpretation of results from ML-based approaches in healthcare settings, the inclusion of these approaches is increasing at a rapid pace. Here we first provide a brief overview of machine learning-based approaches and learning algorithms, including supervised, unsupervised, and reinforcement learning, along with examples. Second, we discuss the application of ML in several healthcare fields, including radiology, genetics, electronic health records, and neuroimaging. We also briefly discuss the risks and challenges of applying ML to healthcare, such as system privacy and ethical concerns, and provide suggestions for future applications.

Yuan Z  Burrage K  Mattick JS 《Proteins》2002,48(3):566-570
A Support Vector Machine learning system has been trained to predict protein solvent accessibility from the primary structure. Different kernel functions and sliding window sizes have been explored to find how they affect the prediction performance. Using a cut-off threshold of 15% that splits the dataset evenly (an equal number of exposed and buried residues), this method was able to achieve a prediction accuracy of 70.1% for single sequence input and 73.9% for multiple alignment sequence input, respectively. The prediction of three and more states of solvent accessibility was also studied and compared with other methods. The prediction accuracies are better than, or comparable to, those obtained by other methods such as neural networks, Bayesian classification, multiple linear regression, and information theory. In addition, our results further suggest that this system may be combined with other prediction methods to achieve more reliable results, and that the Support Vector Machine method is a very useful tool for biological sequence analysis.
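
How a sliding window over the primary structure becomes a fixed-length input for a classifier, and how a 15% cut-off defines the two states, can be sketched as below. The one-hot encoding, zero padding at the termini, window size, and the strict ">" comparison are all assumptions for illustration; the abstract does not describe these details, and no SVM is trained here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(sequence, center, window=5):
    """One-hot encode the residues in a window centered on `center`.

    Positions that fall outside the sequence are encoded as all zeros.
    The result has window * 20 entries.
    """
    half = window // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            one_hot[AMINO_ACIDS.index(sequence[pos])] = 1
        features.extend(one_hot)
    return features


def two_state_label(relative_accessibility, cutoff=15.0):
    """Label a residue from its relative solvent accessibility in percent."""
    return "exposed" if relative_accessibility > cutoff else "buried"


# Hypothetical peptide and accessibility values, one per residue.
sequence = "ACDEF"
accessibility = [42.0, 3.5, 60.1, 15.0, 8.2]
features = [encode_window(sequence, i) for i in range(len(sequence))]
labels = [two_state_label(a) for a in accessibility]
print(len(features[0]), labels)
```

Each feature vector and its label form one training example; a wider window gives the classifier more sequence context at the cost of a longer input vector.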

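Database searching is the main approach to protein identification based on mass spectrometry. Missed cleavage sites and enzyme cleavage rules determine the range of candidate peptides for a spectrum and are important parameters of database search algorithms. For the commonly used trypsin, whether the site after lysine (K) or arginine (R) is cleaved is affected not only by local conformation, three-dimensional structure, experimental conditions, and other incidental factors, but also by other amino acids near the site. Peptides containing missed cleavage sites are frequently identified from mass spectra, so predicting protein cleavage sites can provide a more reliable model for database search algorithms as well as a basis for understanding and analyzing the rules of protein cleavage. This paper proposes a prediction method based on Markov chains that uses protein sequence information to predict the cleavage probability of candidate cleavage sites; in protein digestion, the coverage of predicted peptides can exceed 85%.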
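
How the cleavage rule and the allowed number of missed cleavages fix the candidate peptide set can be shown with a short in-silico digestion sketch. It uses the widely applied simplified trypsin rule (cut after K or R unless the next residue is P) as an assumption, not the Markov chain model of the paper, and the protein sequence is invented.

```python
def cleavage_sites(protein):
    """Positions where trypsin is assumed to cut.

    Simplified rule: cut after K or R unless the next residue is P.
    A returned value n means the cut lies between protein[n-1] and protein[n].
    """
    sites = []
    for i, aa in enumerate(protein[:-1]):
        if aa in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    return sites


def digest(protein, max_missed=1):
    """List candidate peptides with up to `max_missed` missed cleavages."""
    bounds = [0] + cleavage_sites(protein) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        last_end = min(start + 1 + max_missed, len(bounds) - 1)
        for end in range(start + 1, last_end + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


protein = "AAKGGRPLKCC"  # hypothetical sequence
print(digest(protein, max_missed=0))
print(digest(protein, max_missed=1))
```

Allowing more missed cleavages enlarges the candidate list quickly, which is why a per-site cleavage probability is useful for pruning or weighting candidates during database search.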

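An important problem in protein structure prediction is the correct prediction of disulfide bond connectivity. Accurate disulfide prediction reduces the conformational search space and aids prediction of a protein's 3D structure. This paper recasts disulfide connectivity prediction as the classification of connectivity patterns and successfully introduces the support vector machine method into this task. By classifying the connectivity patterns of local sequences around cysteines, the disulfide connectivity of a protein can be predicted from its primary sequence. The results show that disulfide connectivity is closely related to the connectivity patterns of local cysteine sequences, and that applying support vector machines to disulfide bond prediction gives good results.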
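
To make "connectivity pattern" concrete, the sketch below enumerates every way of pairing a set of bonded cysteines. For n disulfide bonds there are (2n-1)!! = 1 × 3 × 5 × ... × (2n-1) such patterns, which is the set of classes a pattern classifier would choose from. The cysteine positions are hypothetical, and the enumeration is a generic illustration rather than the method of the paper.

```python
def connectivity_patterns(cysteines):
    """Yield every way to pair up an even-sized list of cysteine positions.

    Each pattern is a list of (position, partner) tuples. An odd-sized list
    yields nothing because one cysteine would be left unpaired.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for pattern in connectivity_patterns(remaining):
            yield [(first, partner)] + pattern


# Hypothetical sequence positions of four bonded cysteines (two disulfides).
for pattern in connectivity_patterns([5, 12, 30, 44]):
    print(pattern)
```

Two bonds give 3 possible patterns, three bonds give 15, and four give 105, so the number of classes grows rapidly with the number of disulfides.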

