20 similar articles found (search time: 0 ms)
1.
Arguably, 2020 was the year of high-accuracy protein structure predictions, with AlphaFold 2.0 achieving previously unseen accuracy in the Critical Assessment of Protein Structure Prediction (CASP). In 2021, DeepMind and EMBL-EBI developed the AlphaFold Protein Structure Database to make an unprecedented number of reliable protein structure predictions easily accessible to the broad scientific community. We provide a brief overview and describe the latest developments in the AlphaFold database. We highlight how the fields of data services, bioinformatics, structural biology, and drug discovery are directly affected by the influx of protein structure data. We also show examples of cutting-edge research that took advantage of the AlphaFold database. It is apparent that connections between various fields through protein structures are now possible, but the amount of data poses new challenges. Finally, we give an outlook regarding the future direction of the database, both in terms of data sets and new functionalities.
2.
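Protein structure prediction is an important research area in the life sciences and medicine, and an important application of artificial intelligence in scientific research. AlphaFold2 is a deep-learning-based protein structure prediction system developed by DeepMind that can efficiently generate three-dimensional protein structures with atomic-level accuracy from amino acid sequences. Owing to its superior performance, AlphaFold2 has given unprecedented impetus to research on protein structure prediction since its release and has therefore attracted wide attention and study. This article introduces the model architecture, highlights, limitations, and application progress of AlphaFold2, lists several other types of protein structure prediction models, discusses their capabilities, advantages, and limitations, and considers future directions for such protein structure prediction models.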
3.
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences where templates are available but without using such information directly. Here we show that combining machine-learning based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions to outperform any other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.
4.
Theodoros K. Karamanos 《Biopolymers》2023,114(3):e23530
Coevolution between protein residues is normally interpreted as direct contact. However, the evolutionary record of a protein sequence contains rich information that may include long-range functional couplings, couplings that report on homo-oligomeric states or even conformational changes. Due to the complexity of the sequence space and the lack of structural information on various members of a protein family, it has been difficult to effectively mine the additional information encoded in a multiple sequence alignment (MSA). Here, taking advantage of the recent release of the AlphaFold (AF) database, we attempt to identify coevolutionary couplings that cannot be explained simply by spatial proximity. We propose a simple computational method that performs direct coupling analysis on an MSA and searches for couplings that are not satisfied in any of the AF models of members of the identified protein family. Application of this method to 2012 protein families suggests that ~12% of the total identified coevolving residue pairs are spatially distant and more likely to be disordered than their contacting counterparts. We expect that this analysis will help improve the quality of coevolutionary distance restraints used for structure determination and will be useful in identifying potentially functional/allosteric cross-talk between distant residues.
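The filtering step described in this abstract (keeping only couplings that are not satisfied in any model of the family) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `distant_couplings`, the dictionary-based model representation, and the 8.0 Å contact cutoff are all assumptions made for illustration, and the coupling pairs are taken as already computed by some direct coupling analysis tool.

```python
from math import dist


def distant_couplings(pairs, models, cutoff=8.0):
    """Return coevolving residue pairs that are not in contact in ANY model.

    pairs  : iterable of (i, j) residue indices flagged as coevolving
    models : list of dicts mapping residue index -> (x, y, z) coordinate
    cutoff : contact distance in angstroms (8.0 is an assumed value)
    """
    unexplained = []
    for i, j in pairs:
        # Collect the i-j distance in every model that contains both residues.
        dists = [dist(m[i], m[j]) for m in models if i in m and j in m]
        # Keep the pair only if it was observed and never within the cutoff.
        if dists and min(dists) > cutoff:
            unexplained.append((i, j))
    return unexplained


# Two toy "models" with three residues each (hypothetical coordinates).
models = [
    {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0)},
    {1: (0.0, 0.0, 0.0), 2: (30.0, 0.0, 0.0), 3: (25.0, 0.0, 0.0)},
]
result = distant_couplings([(1, 2), (1, 3), (1, 4)], models)
```

In this toy input, pair (1, 2) is in contact in the first model and is therefore explained by proximity, pair (1, 4) is never observed, and only pair (1, 3) remains as a spatially distant coupling.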
5.
Airlie J. McCoy Massimo D. Sammito Randy J. Read 《Acta Crystallographica. Section D, Structural Biology》2022,78(1):1-13
The AlphaFold2 results in the 14th edition of Critical Assessment of Structure Prediction (CASP14) showed that accurate (low root-mean-square deviation) in silico models of protein structure domains are on the horizon, whether or not the protein is related to known structures through high-coverage sequence similarity. As highly accurate models become available, generated by harnessing the power of correlated mutations and deep learning, one of the aspects of structural biology to be impacted will be methods of phasing in crystallography. Here, the data from CASP14 are used to explore the prospects for changes in phasing methods, and in particular to explore the prospects for molecular-replacement phasing using in silico models.
6.
宫维斌 《生物化学与生物物理进展》 (Progress in Biochemistry and Biophysics) 2024,51(12):3073-3083
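In recent years, deep-learning-based methods have achieved major breakthroughs in protein structure prediction. AlphaFold 2 (AF2), released as open source in 2021, achieved high-accuracy prediction of the three-dimensional structures of proteins and protein complexes, enabling researchers to quickly obtain reliable three-dimensional structural information and significantly accelerating research on protein structure and function. AlphaFold 3 (AF3), released in 2024, goes further and can accurately predict the three-dimensional structures of biological complexes such as protein-nucleic acid and protein-small molecule complexes. AF3 adopts improved algorithms and a more efficient model, greatly improving prediction accuracy, and performs particularly well on antigen-antibody complexes and protein-small molecule complexes. The success of AlphaFold has not only brought revolutionary progress to structural biology but has also shown great application potential in drug development, protein design, and studies of molecular functional mechanisms, driving innovation in biomedical research. This article reviews the development history of AlphaFold and related protein structure prediction methods, outlines their key technologies and current applications, and, in light of their limitations, looks ahead to future research directions and applications.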
7.
Gerard J. Kleywegt 《Acta Crystallographica. Section D, Structural Biology》1999,55(11):1878-1884
Prior to attaching any biological significance to differences between two related protein crystal structures, it must be established that such differences are genuine, rather than artefacts of the structure-determination protocol. This will be all the more important as more and more related protein structures are solved and comparative structural biology attempts to correlate structural differences with variations in biological function, activity or affinity. A method has been developed which enables unbiased assessment of differences between the structures of related biomacromolecules using experimental crystallographic information alone. It is based on the use of local density-correlation maps, which contain information regarding the similarity of the experimental electron density for corresponding parts of different copies of a molecule. The method can be used to assess a priori which parts of two or more molecules are likely to be structurally similar; this information can then be employed during structure refinement. Alternatively, the method can be used a posteriori to verify that differences observed in two or more models are supported by the experimental information. Several examples are discussed which validate the notion that local conformational variability is highly correlated to differences in the local experimental electron density.
8.
Maarten A. Brems Robert Runkel Todd O. Yeates Peter Virnau 《Protein science : a publication of the Protein Society》2022,31(8)
The computer artificial intelligence system AlphaFold has recently predicted previously unknown three-dimensional structures of thousands of proteins. Focusing on the subset with high-confidence scores, we algorithmically analyze these predictions for cases where the protein backbone exhibits rare topological complexity, that is, knotting. Amongst others, we discovered a 7₁-knot, the most topologically complex knot ever found in a protein, as well as several six-crossing composite knots comprised of two methyltransferase or carbonic anhydrase domains, each containing a simple trefoil knot. These deeply embedded composite knots occur evidently by gene duplication and interconnection of knotted dimers. Finally, we report two new five-crossing knots including the first 5₁-knot. Our list of analyzed structures forms the basis for future experimental studies to confirm these novel knotted topologies and to explore their complex folding mechanisms.
9.
《Journal of molecular biology》2021,433(20):167127
Characterizing the three-dimensional structure of macromolecules is central to understanding their function. Traditionally, structures of proteins and their complexes have been determined using experimental techniques such as X-ray crystallography, NMR, or cryo-electron microscopy, applied individually or in an integrative manner. Meanwhile, however, computational methods for protein structure prediction have been improving their accuracy, gradually, then suddenly, with the breakthrough advance by AlphaFold2, whose models of monomeric proteins are often as accurate as experimental structures. This breakthrough foreshadows a new era of computational methods that can build accurate models for most monomeric proteins. Here, we envision how such accurate modeling methods can combine with experimental structural biology techniques, enhancing integrative structural biology. We highlight the challenges that arise when considering multiple structural conformations, protein complexes, and polymorphic assemblies. These challenges will motivate further developments, both in modeling programs and in methods to solve experimental structures, towards better and quicker investigation of structure–function relationships.
10.
Proteins consisting of repeating amino acid motifs are abundant in all kingdoms of life, especially in higher eukaryotes. Repeat-containing proteins self-organize into elongated non-globular structures. Do the same general underlying principles that dictate the folding of globular domains apply also to these extended topologies? Using a simplified structure-based model capturing a perfectly funneled energy landscape, we surveyed the predicted mechanism of folding for ankyrin repeat containing proteins. The ankyrin family is one of the most extensively studied classes of non-globular folds. The model based only on native contacts reproduces most of the experimental observations on the folding of these proteins, including a folding mechanism that is reminiscent of a nucleation propagation growth. The confluence of simulation and experimental results suggests that the folding of non-globular proteins is accurately described by a funneled energy landscape, in which topology plays a determinant role in the folding mechanism.
11.
Jorge Roel-Touris;Lourdes Carcelén;Enrique Marcos; 《Protein science : a publication of the Protein Society》2024,33(4):e4936
De novo design of immunoglobulin-like frameworks that allow for functional loop diversification shows great potential for crafting antibody-like scaffolds with fully customizable structures and functions. In this work, we combined de novo parametric design with deep-learning methods for protein structure prediction and design to explore the structural landscape of 7-stranded immunoglobulin domains. After screening folding of nearly 4 million designs, we have assembled a structurally diverse library of ~50,000 immunoglobulin domains with high-confidence AlphaFold2 predictions and structures diverging from naturally occurring ones. The designed dataset enabled us to identify structural requirements for the correct folding of immunoglobulin domains and to shed light on β-sheet–β-sheet rotational preferences and on how these are linked to functional properties. Our approach eliminates the need for preset loop conformations and opens the route to large-scale de novo design of immunoglobulin-like frameworks.
12.
Lasse Middendorf;Lars A. Eicholt; 《Proteins》2024,92(6):757-767
Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
13.
《Structure (London, England : 1993)》2023,31(1):111-119.e2
14.
Nabuurs SB Krieger E Spronk CA Nederveen AJ Vriend G Vuister GW 《Journal of biomolecular NMR》2005,33(2):123-134
For biomolecular NMR structures typically only a poor correspondence is observed between statistics derived from the experimental
input data and structural quality indicators obtained from the structure ensembles. Here, we investigate the relationship
between the amount of available NMR data and structure quality. By generating datasets with a predetermined information content
and evaluating the quality of the resulting structure ensembles we show that there is, in contrast to previous findings, a
linear relation between the information contained in experimental data and structural quality. From this relation, a new quality
parameter is derived that provides direct insight, on a per-residue basis, into the extent to which structural quality is
governed by the experimental input data.
15.
William Sheffler David Baker 《Protein science : a publication of the Protein Society》2010,19(10):1991-1995
We present an improved version of RosettaHoles, a methodology for quantitative and visual characterization of protein core packing. RosettaHoles2 features a packing measure more rapidly computable, accurate and physically transparent, as well as a new validation score intended for structures submitted to the Protein Data Bank. The differential packing measure is parameterized to maximize the gap between computationally generated and experimentally determined X-ray structures, and can be used in refinement of protein structure models. The parameters of the model provide insight into components missing in current force fields, and the validation score gives an upper bound on the X-ray resolution of Protein Data Bank structures; a crystal structure should have a validation score as good as or better than its resolution.
16.
Alvaro Martin Hermosilla;Carolin Berner;Sergey Ovchinnikov;Anastassia A. Vorobieva; 《Protein science : a publication of the Protein Society》2024,33(7):e5033
In silico validation of de novo designed proteins with deep learning (DL)-based structure prediction algorithms has become mainstream. However, formal evidence of the relationship between a high-quality predicted model and the chance of experimental success is lacking. We used experimentally characterized de novo water-soluble and transmembrane β-barrel designs to show that AlphaFold2 and ESMFold excel at different tasks. ESMFold can efficiently identify designs generated based on high-quality (designable) backbones. However, only AlphaFold2 can predict which sequences have the best chance of experimentally folding among similar designs. We show that ESMFold can generate high-quality structures from just a few predicted contacts and introduce a new approach based on incremental perturbation of the prediction (“in silico melting”), which can reveal differences in the presence of favorable contacts between designs. This study provides new insight into the explainability of DL-based structure prediction models and into how they could be leveraged for the design of increasingly complex proteins, in particular membrane proteins, which have historically lacked basic in silico validation tools.
17.
Oligopeptide biases in protein sequences and their use in predicting protein coding regions in nucleotide sequences (cited 16 times in total; self-citations: 0; citations by others: 16)
We have examined oligopeptides with lengths ranging from 2 to 11 residues in protein sequences that show no obvious evolutionary relationship. All sequences in the Protein Identification Resource database were carefully classified by sensitive homology searches into superfamilies to obtain unbiased oligopeptide counts. The results, contrary to previous studies, show clear prejudices in protein sequences. The oligopeptide preferences were used to help decide the significance of sequence homologies and to improve the more general methods for detecting protein coding regions within nucleotide sequences.
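The counting step mentioned in this abstract can be sketched as follows. This is only an illustrative example under stated assumptions, not the authors' procedure: the function name `oligopeptide_counts` is hypothetical, and the input is assumed to be a set of sequences that has already been reduced to evolutionarily unrelated representatives (the superfamily classification itself is not shown).

```python
from collections import Counter


def oligopeptide_counts(sequences, k):
    """Count every overlapping oligopeptide of length k in the sequences.

    sequences : iterable of protein sequences (one-letter amino acid codes),
                assumed to be evolutionarily unrelated to each other
    k         : oligopeptide length (the abstract covers 2 to 11 residues)
    """
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k along the sequence, one residue at a time.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


# Toy input: two short hypothetical sequences.
dipeptides = oligopeptide_counts(["MKKL", "KKA"], 2)
```

Comparing such counts with the frequencies expected from single-residue composition is one way a preference for particular oligopeptides could be detected; that comparison is left out here to keep the sketch short.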
18.
Based on the geometrical parameters around seventeen incorrectly assigned trans conformations of peptide bonds in protein structures and their correct cis counterparts, we have devised an algorithm that is capable of detecting these sites. The algorithm was optimized to reliably find all of the seventeen test cases. It can be used to quickly scan an atomic coordinate file or the complete Brookhaven Protein Data Base for more likely candidates for non-Pro cis peptide bonds. Also, it can be of help to guide the crystallographer in intermediate stages of structure determination towards suspect areas. © 1999 John Wiley & Sons, Inc. Biopoly 50: 536–544, 1999
19.
Mia L. Raves Jurgen F. Doreleijers Hans Vis Constantin E. Vorgias Keith S. Wilson Robert Kaptein 《Journal of biomolecular NMR》2001,21(3):235-248
Joint refinement, i.e., the simultaneous refinement of a structure against both nuclear magnetic resonance (NMR) spectroscopic and X-ray crystallographic data, was performed on the HU protein from Bacillus stearothermophilus (HUBst). The procedure was aimed at investigating the compatibility of the two data sets and at identifying conflicting information. Wherever important differences were found, such as peptide flips in the main-chain conformation, the data were further analyzed to find the cause. The NMR data showed some errors arising either from the manual interpretation of the spectra or from incorrect accounting for spin diffusion. The most important artefact inherent to the X-ray data is the crystal packing of the molecules: the effects range from the limitation of the freedom of the flexible parts of the HUBst molecule to possibly one of the peptide flips.
20.
Roman A. Laskowski Janet M. Thornton 《Protein science : a publication of the Protein Society》2022,31(1):283
The PDBsum web server provides structural analyses of the entries in the Protein Data Bank (PDB). Two recent additions are described here. The first is the detailed analysis of the SARS-CoV-2 virus protein structures in the PDB. These include the variants of concern, which are shown both on the sequences and 3D structures of the proteins. The second addition is the inclusion of the available AlphaFold models for human proteins. The pages allow a search of the protein against existing structures in the PDB via the Sequence Annotated by Structure (SAS) server, so one can easily compare the predicted model against experimentally determined structures. The server is freely accessible to all at http://www.ebi.ac.uk/pdbsum.