
1.

Positioning of anchor groups in protein loop prediction: the importance of solvent accessibility and secondary structure elements

Wohlfahrt G Hangoc V Schomburg D 《Proteins》2002,47(3):370-378

The prediction of loop regions in protein structure prediction by homology is still an unsolved problem. In an earlier publication, we showed that the correct placement of the amino acids serving as anchor groups, which are to be connected by a loop fragment of predicted geometry, is a highly important step and an essential requirement within the process (Lessel and Schomburg, Proteins 1999; 37:56-64). In this article, we present an analysis of the quality of possible loop predictions with respect to gap length, fragment length, amino acid type, secondary structure, and solvent accessibility. For 550 insertions and 544 deletions, we test all possible positions for anchor groups with an inserted loop of between 3 and 12 amino acids. We show that approximately 80% of the indel regions could be predicted within 1.5 Å RMSD from a knowledge-based loop database if criteria for the correct localization of anchor groups could be found and the loops could be sorted correctly. From our analysis, several conclusions regarding the optimal placement of anchor groups become apparent: (1) the correct placement of anchor groups is even more important for longer gaps; (2) medium-length fragments (length 5-8) perform better than short or long ones; (3) placing anchor groups at hydrophobic amino acids gives a higher chance of including the best possible loop; (4) anchor groups within secondary structure elements, in particular β-sheets, are suitable; and (5) amino acids with lower solvent accessibility are better anchor groups. A preliminary test using a combination of the anchor group positioning criteria deduced from our analysis shows very promising results.
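The exhaustive anchor-placement test described above can be pictured as an enumeration over candidate N- and C-terminal anchor pairs around each indel. The sketch below is a hypothetical illustration, not the authors' code: the function name `anchor_pairs`, the 0-based inclusive gap coordinates, and the rule that loop length equals the template residues between the anchors plus the indel size `delta` are all assumptions.

```python
from typing import Iterator, Tuple

def anchor_pairs(n_residues: int, gap_start: int, gap_end: int, delta: int,
                 min_len: int = 3, max_len: int = 12) -> Iterator[Tuple[int, int, int]]:
    """Yield (n_anchor, c_anchor, loop_length) for every admissible anchor placement.

    Hypothetical conventions: gap_start..gap_end are 0-based, inclusive template
    residues at the indel; delta is the number of residues the target gains (+)
    or loses (-) relative to the template.
    """
    for a in range(0, gap_start):                 # N-terminal anchor before the gap
        for b in range(gap_end + 1, n_residues):  # C-terminal anchor after the gap
            loop_len = (b - a - 1) + delta        # residues the new loop must span
            if loop_len > max_len:
                break                             # loop_len only grows as b increases
            if loop_len >= min_len:
                yield a, b, loop_len

# A two-residue insertion at template residue 10 of a 20-residue chain.
pairs = list(anchor_pairs(20, 10, 10, delta=2))
```

For this toy case the generator yields 54 placements, the tightest being `(9, 11, 3)`; scoring each placement against a loop database would be the next step and is not shown.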

2.

Importance of anchor group positioning in protein loop prediction.

U Lessel D Schomburg 《Proteins》1999,37(1):56-64

The aim of loop prediction in protein homology modeling is to connect the main-chain ends of two successive regions that are conserved in template and target structures, using protein fragments that are as similar to the target as possible. To develop a new loop prediction method, examples of insertions and deletions were searched automatically in data sets of structurally aligned protein pairs. Three different criteria were applied to determine the positions where the main-chain conformations of the proteins begin to differ, i.e., the anchoring groups of the insertions and deletions, giving three test data sets. The target structures in these data sets were predicted by inserting fragments from different fragment data banks between the anchoring groups of the templates. The proposed matching fragments were sorted by decreasing correspondence in the geometry of the anchoring groups. To assess prediction quality, the template loops were replaced by the proposed ones, and their root mean square deviations (RMSDs) from the target structures were determined. In addition, the best 20 fragments in the whole loop data bank used (those with the lowest deviations from the target structures after insertion into the templates) were determined and compared with the proposals. The analysis of the results shows the limitations of knowledge-based loop prediction. It is demonstrated that the selection of the anchoring groups is the most important step in the whole procedure.
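Sorting candidate fragments by how well their ends match the anchor geometry can be sketched as follows. As an assumption for illustration, the geometry match is scored with a superposition-free comparison of the pairwise distances among anchor atoms; the abstract does not specify the authors' measure, and the names `anchor_deviation` and `rank_fragments` are hypothetical.

```python
import math
from itertools import combinations

def distance_signature(atoms):
    """All pairwise distances between anchor atoms (invariant to rotation/translation)."""
    return [math.dist(p, q) for p, q in combinations(atoms, 2)]

def anchor_deviation(template_anchors, fragment_ends):
    """RMS difference between the two distance signatures (lower = better match)."""
    s1 = distance_signature(template_anchors)
    s2 = distance_signature(fragment_ends)
    if len(s1) != len(s2):
        raise ValueError("anchor atom counts differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(s1, s2)) / len(s1))

def rank_fragments(template_anchors, fragments):
    """Return fragment ids sorted by decreasing correspondence of anchor geometry."""
    return sorted(fragments, key=lambda fid: anchor_deviation(template_anchors, fragments[fid]))

# Toy example: four anchor atoms (two flanking each side of the gap), in Å.
template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (6.0, 0.0, 0.0), (7.5, 0.0, 0.0)]
fragments = {
    "stretched": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (9.0, 0.0, 0.0), (10.5, 0.0, 0.0)],
    "shifted": [(10.0, 10.0, 10.0), (11.5, 10.0, 10.0), (16.0, 10.0, 10.0), (17.5, 10.0, 10.0)],
}
print(rank_fragments(template, fragments))  # ['shifted', 'stretched']
```

The translated fragment ranks first because distance signatures ignore rigid-body placement; a real pipeline would still have to superpose the chosen fragment onto the anchors before computing RMSD to the target.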

3.

Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model

de Bakker PI DePristo MA Burke DF Blundell TL 《Proteins》2003,51(1):21-40

The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area (GBSA) solvation model. Large ensembles of consistent loop conformations, represented in atomic detail with idealized geometry, were generated for a large test set of protein loops 2 to 12 residues long by a novel ab initio method called RAPPER, which relies on fine-grained residue-specific φ/ψ propensity tables for conformational sampling. Ranking the conformers by RAPDF score selected conformers with an average global, non-superimposed RMSD over all heavy main-chain atoms ranging from 1.2 Å for 4-mers to 2.9 Å for 8-mers to 6.2 Å for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers with global RMSD values of 0.5 Å for 4-mers, 2.3 Å for 8-mers, and 5.0 Å for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 Å) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected in the observation that the average RMSD accuracy for all loop lengths was worse when this term was omitted. There are, however, still many cases where AMBER gas-phase minimization selected conformers of lower RMSD than AMBER/GBSA minimization. The AMBER/GBSA energy function correlated better with RMSD to native than RAPDF did. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 Å for 8-mers) when scoring with the AMBER/GBSA force field.
This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.
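RAPPER's conformational sampling draws backbone torsions from residue-specific φ/ψ propensity tables. The sketch below shows only the idea of sampling from such a table; the two-residue `PROPENSITY` dictionary, its bin centres and weights, and the uniform jitter within a bin are invented placeholders, far coarser than the fine-grained tables the abstract describes, and chain building and anchor-geometry filtering are omitted.

```python
import random

# Hypothetical, deliberately coarse propensity tables:
# residue -> {(phi, psi) bin centre in degrees: relative weight}.
PROPENSITY = {
    "GLY": {(-80.0, 170.0): 0.3, (80.0, 10.0): 0.3, (-65.0, -40.0): 0.4},
    "ALA": {(-65.0, -40.0): 0.6, (-120.0, 130.0): 0.4},
}

def sample_loop_torsions(sequence, rng, jitter=10.0):
    """Draw one (phi, psi) pair per residue from its residue-specific table."""
    torsions = []
    for res in sequence:
        table = PROPENSITY[res]
        bins = list(table)
        (phi, psi), = rng.choices(bins, weights=[table[b] for b in bins], k=1)
        # Spread samples uniformly within the bin so decoys are not all identical.
        torsions.append((phi + rng.uniform(-jitter, jitter),
                         psi + rng.uniform(-jitter, jitter)))
    return torsions

rng = random.Random(0)
decoys = [sample_loop_torsions(["GLY", "ALA", "ALA"], rng) for _ in range(100)]
```

Using a seeded `random.Random` keeps the decoy ensemble reproducible, which matters when comparing scoring functions such as RAPDF and AMBER/GBSA on the same set.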

4.

Modeling protein loops with knowledge-based prediction of sequence-structure alignment

Peng HP Yang AS 《Bioinformatics (Oxford, England)》2007,23(21):2836-2842

MOTIVATION: As protein structure databases expand, protein loop modeling remains an important yet challenging problem. Knowledge-based protein loop prediction methods have faced two challenges in methodology development: (1) loop boundaries in protein structures are frequently problematic when constructing length-dependent loop databases for protein loop prediction; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the loop sequence-template matches. RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered, length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted structural fragments to a set of non-redundant loop structural templates regardless of loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the novel procedure provides an alternative approach that overcomes the challenges of knowledge-based loop prediction. AVAILABILITY: http://cmb.genomics.sinica.edu.tw

5.

Loops In Proteins (LIP)--a comprehensive loop database for homology modelling

Michalsky E Goede A Preissner R 《Protein engineering》2003,16(12):979-985

One of the most important and challenging tasks in protein modelling is the prediction of loops, as the large variety of existing approaches shows. Loops In Proteins (LIP) is a database that includes all protein segments up to 15 residues long contained in the Protein Data Bank (PDB). In this study, the applicability of LIP to loop prediction in the framework of homology modelling is investigated. Searching the database for loop candidates takes less than 1 s on a desktop PC, and ranking them takes a few minutes. This is an order of magnitude faster than most existing procedures. Accuracy is measured as the root mean square deviation (RMSD) of the main-chain atoms after local superposition of the target loop and the predicted loop. Loops of up to nine residues were modelled with a local RMSD <1 Å, and those of up to 14 residues with an accuracy better than 2 Å. The results were compared in detail with a thoroughly evaluated and tested ab initio method published recently and, additionally, with two further methods on a small loop test set. The LIP method produced very good predictions; in particular, for longer loops it outperformed the other methods.
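The accuracy measure used here, main-chain RMSD after local superposition, is the minimum RMSD over all rigid-body rotations and translations of one loop onto the other. The SVD-based Kabsch algorithm is a common way to compute it; to stay within the Python standard library, the sketch below instead uses the equivalent quaternion formulation (Horn's 4x4 key matrix, with RMSD = sqrt((G_A + G_B - 2*lambda_max)/n) as in Theobald's QCP method) and finds the largest eigenvalue with Jacobi rotations. It is an illustration, not the LIP implementation.

```python
import math

def _centered(coords):
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in coords]

def _largest_eigenvalue(matrix, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return max(a[i][i] for i in range(n))

def superposed_rmsd(xs, ys):
    """Minimum RMSD between two equal-length coordinate sets over all rigid motions."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty coordinate sets of equal length")
    X, Y = _centered(xs), _centered(ys)
    S = [[sum(x[i] * y[j] for x, y in zip(X, Y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's key matrix; its largest eigenvalue is the best achievable overlap.
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    g = sum(v * v for p in X + Y for v in p)  # G_A + G_B
    lam = _largest_eigenvalue(N)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / len(xs)))

# A loop and a copy rotated 90 degrees about z and translated.
loop = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moved = [(-y + 1.0, x + 2.0, z + 3.0) for x, y, z in loop]
print(round(superposed_rmsd(loop, moved), 6))  # 0.0
```

A rigidly moved copy gives an RMSD of essentially zero, whereas the raw (non-superimposed) RMSD between `loop` and `moved` is several Å; that gap is exactly what "local superposition" removes.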

6.

Development and benchmarking of TASSER(iter) for the iterative improvement of protein structure predictions

Lee SY Skolnick J 《Proteins》2007,68(1):39-47

To improve the accuracy of TASSER models, especially in the limit where threading-provided template alignments are of poor quality, we have developed the TASSER(iter) algorithm, which uses the templates and contact restraints from TASSER-generated models for iterative structure refinement. We apply TASSER(iter) to a large benchmark set of 2,773 nonhomologous single-domain proteins that are ≤200 residues in length and that cover the PDB at the level of 35% pairwise sequence identity. Overall, TASSER(iter) models have a smaller global average RMSD of 5.48 Å compared to 5.81 Å for the original TASSER models. Classifying the targets by level of prediction difficulty (Easy targets have a good template with a correspondingly good threading alignment, Medium targets have a good template but a poor alignment, and Hard targets have an incorrectly identified template), TASSER(iter) (TASSER) models have an average RMSD of 4.15 Å (4.35 Å) for the Easy set and 9.05 Å (9.52 Å) for the Hard set. The largest reduction in average RMSD is for the Medium set, where the TASSER(iter) models have an average global RMSD of 5.67 Å compared to 6.72 Å for the TASSER models. Seventy percent of the Medium set TASSER(iter) models have a smaller RMSD than the TASSER models, while 63% of the Easy and 60% of the Hard TASSER models are improved by TASSER(iter). For the foldable cases, where the targets have an RMSD to the native <6.5 Å, TASSER(iter) shows clear improvement over TASSER models: for the Medium set, it improves the success rate from 57.0 to 67.2%, followed by the Hard targets, where the success rate improves from 32.0 to 34.8%, with the smallest improvement in the Easy targets, from 82.6 to 84.0%. These results suggest that TASSER(iter) can provide more reliable predictions for targets of Medium difficulty, a range that had resisted improvement in the quality of protein structure predictions.
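The two summary statistics quoted above, the success rate for foldable cases (RMSD to native below 6.5 Å) and the percentage of targets whose model improves, are simple to recompute from per-target RMSD lists. The helper names and the toy RMSD values below are hypothetical, not data from the benchmark.

```python
def success_rate(rmsds, cutoff=6.5):
    """Percentage of models with RMSD to native below `cutoff` (Å)."""
    return 100.0 * sum(r < cutoff for r in rmsds) / len(rmsds)

def percent_improved(before, after):
    """Percentage of targets whose refined model has a smaller RMSD than the original."""
    return 100.0 * sum(new < old for old, new in zip(before, after)) / len(before)

# Hypothetical per-target RMSDs (Å) for one difficulty class.
tasser = [4.2, 7.1, 6.9, 5.0]
tasser_iter = [4.0, 6.2, 7.3, 4.9]
print(success_rate(tasser), success_rate(tasser_iter))  # 50.0 75.0
print(percent_improved(tasser, tasser_iter))            # 75.0
```

Note that the two measures can move independently: a refinement can improve most targets slightly without pushing any of them across the 6.5 Å cutoff.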

7.

Near-native protein loop sampling using nonparametric density estimation accommodating sparcity

Joo H Chavan AG Day R Lennox KP Sukhanov P Dahl DB Vannucci M Tsai J 《PLoS computational biology》2011,7(10):e1002234

Unlike the core structural elements of a protein, such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD, with a worst case of 3.66 Å, were produced. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample structures nearer to native for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that DPM-HMM produces an advantage by consistently sampling near-native loop structures.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.
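The end-to-end distance filter mentioned above can be sketched as a simple geometric test: keep a sampled loop only if the distance between its terminal Cα atoms matches the distance between the anchors it must connect. The tolerance, the 3.8 Å Cα-Cα virtual-bond length used for the closability check, and the function names are assumptions for illustration; the abstract does not give the filter's exact form.

```python
import math

CA_CA = 3.8  # approximate distance between consecutive C-alpha atoms (Å)

def closable(anchor_start, anchor_end, n_residues):
    """A loop of n_residues C-alpha atoms cannot span more than (n - 1) * 3.8 Å."""
    return math.dist(anchor_start, anchor_end) <= (n_residues - 1) * CA_CA

def end_to_end_filter(candidates, anchor_start, anchor_end, tol=1.0):
    """Keep candidates whose first-to-last C-alpha distance matches the anchor gap."""
    target = math.dist(anchor_start, anchor_end)
    return [c for c in candidates if abs(math.dist(c[0], c[-1]) - target) <= tol]

start, end = (0.0, 0.0, 0.0), (6.0, 0.0, 0.0)
loops = [
    [(0.0, 0.0, 0.0), (3.0, 2.0, 0.0), (6.5, 0.0, 0.0)],  # ends 6.5 Å apart: kept
    [(0.0, 0.0, 0.0), (3.0, 2.0, 0.0), (9.0, 0.0, 0.0)],  # ends 9.0 Å apart: rejected
]
print(len(end_to_end_filter(loops, start, end)))  # 1
```

The filter is cheap because it touches only two atoms per candidate, which is why it suits enriching a large sampled population before any more expensive scoring.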

8.

How intron splicing affects the deletion and insertion profile in Drosophila melanogaster

Ptak SE Petrov DA 《Genetics》2002,162(3):1233-1244

Studies of "dead-on-arrival" transposable elements in Drosophila melanogaster found that deletions outnumber insertions approximately 8:1 with a median size for deletions of approximately 10 bp. These results are consistent with the deletion and insertion profiles found in most other Drosophila pseudogenes. In contrast, a recent study of D. melanogaster introns found a deletion/insertion ratio of 1.35:1, with 84% of deletions being shorter than 10 bp. This discrepancy could be explained if deletions, especially long deletions, are more frequently strongly deleterious than insertions and are eliminated disproportionately from intron sequences. To test this possibility, we use analysis and simulations to examine how deletions and insertions of different lengths affect different components of splicing and determine the distribution of deletions and insertions that preserve the original exons. We find that, consistent with our predictions, longer deletions affect splicing at a much higher rate compared to insertions and short deletions. We also explore other potential constraints in introns and show that most of these also disproportionately affect large deletions. Altogether we demonstrate that constraints in introns may explain much of the difference in the pattern of deletions and insertions observed in Drosophila introns and pseudogenes.
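The intuition that long deletions are more often harmful than insertions can be illustrated with a toy model: treat a few intron positions (splice-site and branch-point motifs) as constrained, count a deletion as disruptive if it removes any constrained base, and count an insertion as disruptive only if it lands inside a constrained motif. The motif positions and intron length below are invented; this is not the analysis or simulation used in the paper.

```python
def deletion_hit_rate(intron_len, constrained, k):
    """Fraction of possible length-k deletions that remove at least one constrained base."""
    starts = range(intron_len - k + 1)
    hits = sum(any(p in constrained for p in range(s, s + k)) for s in starts)
    return hits / len(starts)

def insertion_hit_rate(intron_len, constrained):
    """Fraction of insertion points (between bases i-1 and i) that split a constrained motif."""
    points = range(1, intron_len)
    hits = sum((i - 1) in constrained and i in constrained for i in points)
    return hits / len(points)

# Hypothetical 100-bp intron: 5' splice site, branch point, 3' splice site.
motifs = set(range(0, 6)) | set(range(70, 77)) | set(range(97, 100))
for k in (1, 5, 10, 20):
    print(k, round(deletion_hit_rate(100, motifs, k), 3))
print("insertion", round(insertion_hit_rate(100, motifs), 3))
```

In this toy intron the disruptive fraction rises from 0.16 for 1-bp deletions to about 0.43 for 20-bp deletions, while insertions stay near 0.13, mirroring the qualitative pattern the abstract reports.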

9.

Loop prediction for a GPCR homology model: Algorithms and results

Dahlia A. Goldfeld Kai Zhu Thijs Beuming Richard A. Friesner 《Proteins》2013,81(2):214-228

We present loop structure prediction results for the intracellular and extracellular loops of four G‐protein‐coupled receptors (GPCRs): bovine rhodopsin (bRh), the turkey β1‐adrenergic receptor (β1Ar), the human β2‐adrenergic receptor (β2Ar), and the human A2a adenosine receptor (A2Ar), in perturbed environments. We used the Protein Local Optimization Program, which builds thousands of loop candidates by sampling rotamer states of the loops' constituent amino acids. The candidate loops are discriminated with our physics‐based, all‐atom energy function, which is based on the OPLS force field with implicit solvent and several correction terms. For relevant cases, explicit membrane molecules are included to simulate the effect of the membrane on loop structure. We also discuss a new sampling algorithm that divides phase space into different regions, allowing more thorough sampling of long loops and greatly improving results. In the first half of the paper, loop prediction is done with the GPCRs' transmembrane domains fixed in their crystallographic positions, while the loops are built one by one. Side chains near the loops are also in non‐native conformations. The second half describes a full homology model of β2Ar using β1Ar as a template. No information about the crystal structure of β2Ar was used to build this homology model. We are able to capture the architecture of short loops and of the very long second extracellular loop, which is key for ligand binding. We believe this is the first successful example of an RMSD‐validated, physics‐based loop prediction in the context of a GPCR homology model.

10.

Functional consequences of insertions and deletions in the complementarity-determining regions of human antibodies

Lantto J Ohlin M 《The Journal of biological chemistry》2002,277(47):45108-45114

Insertions and deletions of nucleotides in the genes encoding the variable domains of antibodies are natural components of the hypermutation process, which may expand the available repertoire of hypervariable loop lengths and conformations. Although insertion of amino acids has also been utilized in antibody engineering, little is known about the functional consequences of such modifications. To investigate this further, we introduced single-codon insertions and deletions, as well as more complex modifications, into the complementarity-determining regions of human antibody fragments with different specificities. Our results demonstrate that single amino acid insertions and deletions are generally well tolerated and permit production of stably folded proteins, often with retained antigen recognition, even though the modified loops carry amino acids that are disallowed at key residue positions in canonical loops of the corresponding length, or are of a length not associated with any known canonical structure. We have thus shown that single-codon insertions and deletions can efficiently be utilized to expand the structure and sequence space of the antigen-binding site beyond what is encoded by the germline gene repertoire.

11.

An automatic search for similar spatial arrangements of alpha-helices and beta-strands in globular proteins

R A Abagyan V N Maiorov 《Journal of biomolecular structure & dynamics》1989,6(6):1045-1060

A fast search algorithm for revealing similar polypeptide backbone structural motifs in proteins is proposed. It is based on a vector representation of the polypeptide chain fold in which regular secondary structure elements are approximated by linear segments (Abagyan and Maiorov, J. Biomol. Struct. Dyn. 5, 1267-1279 (1988)). The algorithm permits insertions and deletions in the polypeptide chain fragments being compared. The fast search algorithm, implemented in the FASEAR program, is used to collect βαβ supersecondary structure units in a number of α/β proteins from the Brookhaven Data Bank. The variation of the geometrical parameters specifying the backbone chain fold is estimated. It appears that the conformation of most of the fragments, although almost all of them are right-handed, is quite different from that of standard βαβ units. Apart from searching for a specific type of secondary structure motif, the algorithm can automatically identify new recurrent folding patterns in proteins. It may be of particular interest for the development of a tertiary template approach to the prediction of protein three-dimensional structure, as well as for constructing artificial polypeptides with goal-oriented conformations.
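Approximating a helix or strand by a linear segment can be sketched by fitting the principal axis of its Cα coordinates and projecting the first and last atoms onto it. The principal-axis fit (done here by power iteration on the 3x3 scatter matrix) is an assumed, generic choice; the abstract does not describe how the original vector representation was constructed, and `fit_segment` is a hypothetical name.

```python
import math

def fit_segment(coords, iters=200):
    """Return (start, end) of a line segment approximating an SSE's C-alpha trace."""
    if len(coords) < 2:
        raise ValueError("need at least two atoms")
    n = len(coords)
    centre = [sum(p[i] for p in coords) / n for i in range(3)]
    X = [[p[i] - centre[i] for i in range(3)] for p in coords]
    scatter = [[sum(x[i] * x[j] for x in X) for j in range(3)] for i in range(3)]
    # Power iteration, seeded with the first-to-last direction, which is already
    # close to the long axis of an elongated element.
    v = [X[-1][i] - X[0][i] for i in range(3)]
    for _ in range(iters):
        w = [sum(scatter[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("degenerate coordinates")
        v = [c / norm for c in w]
    # Project the terminal atoms onto the axis to get the segment ends.
    t0 = sum(X[0][i] * v[i] for i in range(3))
    t1 = sum(X[-1][i] * v[i] for i in range(3))
    start = tuple(centre[i] + t0 * v[i] for i in range(3))
    end = tuple(centre[i] + t1 * v[i] for i in range(3))
    return start, end

# A zig-zag, strand-like C-alpha trace running along x.
strand = [(1.5 * k, 0.3 * (-1) ** k, 0.0) for k in range(8)]
start, end = fit_segment(strand)
```

For the zig-zag trace the fitted segment runs from roughly x = 0 to x = 10.5 Å and the sideways pleat is averaged out; comparing such segments (lengths, angles, separations) is what makes a motif search fast.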

12.

An Approach for Searching Insertions in Bacterial Genes Leading to the Phase Shift of Triplet Periodicity

Maria A. Korotkova Nikolay A. Kudryashov Eugene V. Korotkov 《Genomics, Proteomics & Bioinformatics》2011,(Z2):158-170

The concept of the phase shift of triplet periodicity (TP) was used to search for potential DNA insertions in genes from 17 bacterial genomes. A mathematical algorithm for detecting these insertions has been developed. This approach can detect potential insertions and deletions with lengths that are not multiples of three bases, especially insertions of relatively large DNA fragments (>100 bases). A new similarity measure between triplet matrices was employed to improve the sensitivity of detecting the TP phase shift. Sequences of 17,220 bacterial genes, each consisting of more than 1,200 bases, were analyzed, and a TP phase shift was found in ~16% of the analyzed genes (2,809 genes), about 4 times more than detected in our previous work. We propose that shifts of the TP phase may indicate shifts of the reading frame in genes after insertions of DNA fragments with lengths that are not multiples of three bases. A relationship between the phase shifts of TP and frame shifts in genes is discussed.
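A triplet-periodicity (TP) matrix counts each nucleotide at codon positions 0, 1, and 2, and an insertion whose length is not a multiple of three shows up as a cyclic shift of that matrix's columns downstream of the insertion. The sketch below illustrates the idea with a plain squared-difference comparison; it does not implement the new similarity measure of the paper, and the window and step sizes and function names are assumptions.

```python
BASES = "ACGT"

def triplet_matrix(seq):
    """4x3 frequency matrix: rows are nucleotides, columns are positions modulo 3."""
    m = [[0.0] * 3 for _ in BASES]
    for i, base in enumerate(seq):
        if base in BASES:
            m[BASES.index(base)][i % 3] += 1.0
    total = sum(map(sum, m)) or 1.0
    return [[x / total for x in row] for row in m]

def best_phase(reference, window):
    """Cyclic column shift (0, 1 or 2) of `window` that best matches `reference`."""
    def distance(k):
        return sum((reference[r][j] - window[r][(j + k) % 3]) ** 2
                   for r in range(4) for j in range(3))
    return min(range(3), key=distance)

def phase_profile(gene, window=60, step=30):
    """TP phase of successive windows relative to the gene's first window.

    window and step are multiples of 3 so that column j keeps meaning
    'codon position j' in the frame of the gene start.
    """
    ref = triplet_matrix(gene[:window])
    return [best_phase(ref, triplet_matrix(gene[s:s + window]))
            for s in range(0, len(gene) - window + 1, step)]

# Synthetic gene with perfect periodicity and a 1-base insertion in the middle.
gene = "GCA" * 40 + "T" + "GCA" * 40
profile = phase_profile(gene)
```

The profile starts at phase 0 and ends at phase 1, i.e. the single inserted base shifts the triplet pattern by one position; windows that straddle the insertion point are mixed and can go either way.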

13.

Novel PMS1 alleles preferentially affect the repair of primer strand loops during DNA replication

Erdeniz N Dudley S Gealy R Jinks-Robertson S Liskay RM 《Molecular and cellular biology》2005,25(21):9221-9231

Null mutations in DNA mismatch repair (MMR) genes elevate both base substitutions and insertions/deletions in simple sequence repeats. Data suggest that during replication of simple repeat sequences, polymerase slippage can generate single-strand loops on either the primer or template strand that are subsequently processed by the MMR machinery to prevent insertions and deletions, respectively. In the budding yeast Saccharomyces cerevisiae and in mammalian cells, MMR appears to be more efficient at repairing mispairs composed of loops on the template strand than loops on the primer strand. We identified two novel yeast pms1 alleles, pms1-G882E and pms1-H888R, which confer a strong defect in the repair of "primer strand" loops while maintaining efficient repair of "template strand" loops. Furthermore, these alleles appear to affect equally the repair of 1-nucleotide primer strand loops during both leading- and lagging-strand replication. Interestingly, both pms1 mutants are proficient in the repair of 1-nucleotide loop mispairs in heteroduplex DNA generated during meiotic recombination. Our results suggest that the inherent inefficiency of primer strand loop repair is not simply a mismatch recognition problem but also involves Pms1 and other proteins that are presumed to function downstream of mismatch recognition, such as Mlh1. In addition, the findings reinforce the current view that during mutation avoidance, MMR is associated with the replication apparatus.

14.

RNA loop structure prediction via bond scaling and relaxation

Tonya Frederic Rakefet Rosenfeld Charles R. Cantor Charles DeLisi 《Biopolymers》1996,38(6):769-779

We have developed a method for predicting the structure of small RNA loops that can be used to augment existing RNA modeling techniques. The method requires no input constraints on loop configuration other than the end-to-end distance. Initial loop structures are generated by randomizing the torsion angles, beginning at one end of the polynucleotide chain and correlating each successive angle with the previous one. The bond lengths of these structures are then scaled to fit within the known end constraints, and the equilibrium bond lengths of the potential energy function are scaled accordingly. Through a series of rescaling and minimization steps, the structures are allowed to relax to lower-energy configurations with standard bond lengths and reduced van der Waals clashes. This algorithm has been tested on the variable loops of yeast tRNA-Asp and yeast tRNA-Phe, as well as on the sarcin-ricin tetraloop and the anticodon loop of yeast tRNA-Phe. The results indicate good correlation between potential energy and the loop structure predictions that are closest to the variable loop crystal structures, but poorer correlation for the more isolated stem loops. The number of stacking interactions has proven to be a good objective measure of the best loop predictions. Selecting on the basis of energy and stacking, we obtain two structures with 0.65 and 0.75 Å all-atom RMS deviations (RMSD) from the crystal structure for the tRNA-Asp variable loop. The best structure prediction for the tRNA-Phe variable loop has an all-atom RMSD of 2.2 Å and a backbone RMSD of 1.6 Å, with a single base responsible for most of the deviation. For the sarcin-ricin loop from 28S ribosomal RNA, the predicted structure's all-atom RMSD from the NMR structure is 1.0 Å. We obtain a 1.8 Å RMSD structure for the tRNA-Phe anticodon loop.
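The bond-scaling step can be sketched for a simple bead chain: generate a random chain, scale it about its first atom so the end-to-end distance equals the required closure distance, and then relax the equilibrium bond length back toward its standard value over several stages. This is a simplified, assumed model: it draws random bond directions (a freely jointed chain) instead of correlated torsion angles, omits the energy minimization between stages, and the function names are hypothetical.

```python
import math
import random

def random_chain(n_bonds, bond, rng):
    """Freely jointed chain: each bond points in a uniformly random direction."""
    pts = [(0.0, 0.0, 0.0)]
    for _ in range(n_bonds):
        z = rng.uniform(-1.0, 1.0)            # uniform z and azimuth give a
        phi = rng.uniform(0.0, 2.0 * math.pi)  # uniform direction on the sphere
        r = math.sqrt(1.0 - z * z)
        x, y, zz = pts[-1]
        pts.append((x + bond * r * math.cos(phi), y + bond * r * math.sin(phi), zz + bond * z))
    return pts

def scale_to_end_distance(pts, target):
    """Uniformly scale the chain about its first atom to hit the target end distance."""
    d = math.dist(pts[0], pts[-1])
    if d == 0.0:
        raise ValueError("chain ends coincide; resample")
    f = target / d
    x0 = pts[0]
    return [tuple(x0[i] + f * (p[i] - x0[i]) for i in range(3)) for p in pts], f

def bond_schedule(scaled_bond, standard_bond, n_stages):
    """Equilibrium bond lengths for successive relax-and-minimize stages."""
    return [scaled_bond + (standard_bond - scaled_bond) * s / n_stages
            for s in range(1, n_stages + 1)]

rng = random.Random(1)
chain = random_chain(6, 1.5, rng)
scaled, factor = scale_to_end_distance(chain, 5.0)
stages = bond_schedule(1.5 * factor, 1.5, 5)
```

Uniform scaling satisfies the end constraint exactly at the cost of non-standard bond lengths; the staged schedule is where the original method's minimization would pull the bonds back to normal while keeping the ends fixed.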

15.

Predicting 3D structures of transient protein-protein complexes by homology

Kundrotas PJ Alexov E 《Biochimica et biophysica acta》2006,1764(9):1498-1511

The paper reports a homology-based approach for predicting the 3D structures of full-length hetero protein complexes. We have created a database of templates that includes structures of hetero protein-protein complexes as well as domain-domain structures (), which allowed us to expand the template pool to 418 two-chain entries (at 40% sequence identity). Two protocols were tested: a protocol based on position-specific BLAST search (Protocol-I) and a protocol based on structural similarity of monomers (Protocol-II). All possible combinations of two monomers (350,284 pairs) in the ProtCom database were subjected to both protocols to predict whether they form complexes. The predictions were benchmarked against the ProtCom database, resulting in false-to-true positive ratios of approximately 5:1 and approximately 7:1 and recoveries of 19% and 86% for Protocols I and II, respectively. Out of 350,284 trials, Protocol-I made only approximately 500 wrong predictions, resulting in 0.5% error. In addition, although it was shown that artificially created domain-domain structures can in principle be good templates for modeling full-length protein complexes, more sensitive methods are needed to detect homology relations. The quality of the models was assessed using two different criteria: interfacial residues and overall RMSD. No correlation was found between these two measures: in many cases the interface residues were predicted correctly but the overall RMSD was over 6 Å, and vice versa.

16.

Consensus alignment for reliable framework prediction in homology modeling

Prasad JC Comeau SR Vajda S Camacho CJ 《Bioinformatics (Oxford, England)》2003,19(13):1682-1691

MOTIVATION: Even the best sequence alignment methods frequently fail to correctly identify the framework regions whose backbones can be copied from the template into the target structure. Since underprediction and, more significantly, overprediction of these regions reduce the quality of the final model, it is of prime importance to recover as much as possible of the true structural alignment between target and template. RESULTS: We have developed an algorithm called Consensus that consistently provides a high-quality alignment for comparative modeling. The method follows from a benchmark analysis of the 3D models generated by ten alignment techniques for a set of 79 homologous protein structure pairs. For 20 to 40% of the targets, these methods yield models with at least 6 Å root mean square deviation (RMSD) from the native structure. We selected the five top-performing methods and developed a consensus algorithm to generate an improved alignment. Building on the individual strengths of each method, a set of criteria was implemented to remove alignment segments that are likely to correspond to structurally dissimilar regions. The automated algorithm was validated on a different set of 48 protein pairs, resulting in 2.2 Å average RMSD for the predicted models and only four cases in which the RMSD exceeded 3 Å. The average length of the alignments was about 75% of that found by standard structural superposition methods. The performance of Consensus was consistent from 2 to 32% target-template sequence identity; hence it can be used for accurate prediction of framework regions in homology modeling.
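A minimal version of the consensus idea is to keep only those target-template residue pairs that a strict majority of the individual alignment methods agree on, discarding the rest as likely structurally dissimilar. This majority-vote rule is an assumed simplification; the abstract says the actual Consensus algorithm applies a set of criteria that it does not spell out. One useful property of a strict majority: if every input alignment is collinear (no crossing pairs), the kept pairs are collinear too, because any two majority pairs must co-occur in at least one input alignment.

```python
from collections import Counter

def consensus_alignment(alignments):
    """Keep target->template residue pairs proposed by a strict majority of methods.

    Each alignment is a dict {target_index: template_index}.
    """
    threshold = len(alignments) // 2 + 1
    votes = Counter(pair for aln in alignments for pair in aln.items())
    return dict(sorted(pair for pair, n in votes.items() if n >= threshold))

# Five hypothetical alignments of target residues 1-3.
methods = [
    {1: 1, 2: 2, 3: 3},
    {1: 1, 2: 2, 3: 4},
    {1: 1, 2: 3, 3: 4},
    {1: 1, 2: 2},
    {2: 2, 3: 3},
]
print(consensus_alignment(methods))  # {1: 1, 2: 2}
```

Target residue 3 is dropped because the methods split between template residues 3 and 4; like the shorter alignments reported above, the consensus trades coverage for reliability.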

17.

Modeling of loops in protein structures

Fiser A Do RK Sali A 《Protein science : a publication of the Protein Society》2000,9(9):1753-1773

Comparative protein structure prediction is limited mostly by errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo-energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and their separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the main-chain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments, with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively.
The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
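The idea that the accuracy of the lowest-energy loop can be estimated from the spread among several low-energy predictions can be sketched as the mean pairwise RMSD of the top-k models. Because the loops are optimized in a fixed environment, the RMSD here is computed without superposition. The choice of k, the plain averaging, and the function names are assumptions; the abstract does not state how the variability was quantified.

```python
import math
from itertools import combinations

def rmsd(a, b):
    """Coordinate RMSD without superposition (models share one fixed frame)."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def low_energy_spread(models, energies, k=5):
    """Return (index of lowest-energy model, mean pairwise RMSD among the k lowest)."""
    ranked = sorted(range(len(models)), key=lambda i: energies[i])[:k]
    if len(ranked) < 2:
        raise ValueError("need at least two models to measure spread")
    pairs = list(combinations(ranked, 2))
    return ranked[0], sum(rmsd(models[i], models[j]) for i, j in pairs) / len(pairs)

models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
energies = [1.0, 2.0, 0.5]
best, spread = low_energy_spread(models, energies, k=3)
```

A small spread means the low-energy models agree, which under this heuristic suggests the top prediction is more trustworthy; a large spread flags a loop whose prediction should be treated with caution.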

18.

Energy-based graph convolutional networks for scoring protein docking models

Yue Cao Yang Shen 《Proteins》2020,88(8):1091-1099

Structural information about protein-protein interactions, often missing at the interactome scale, is important for mechanistic understanding of cells and rational discovery of therapeutics. Protein docking provides a computational alternative for obtaining such information. However, ranking near-native docked models highly among a large number of candidates, often known as the scoring problem, remains a critical challenge. Moreover, estimating model quality, also known as the quality assessment problem, is rarely addressed in protein docking. In this study, these two challenging problems in protein docking are regarded as relative and absolute scoring, respectively, and addressed in one physics-inspired deep learning framework. We represent protein and complex structures as intra- and inter-molecular residue contact graphs with atom-resolution node and edge features, and we propose a novel graph convolutional kernel that aggregates interacting nodes’ features through edges so that generalized interaction energies can be learned directly from 3D data. The resulting energy-based graph convolutional networks (EGCN) with multihead attention are trained to predict intra- and inter-molecular energies, binding affinities, and quality measures (interface RMSD) for encounter complexes. Compared to a state-of-the-art scoring function for model ranking, EGCN significantly improves ranking for a Critical Assessment of PRedicted Interactions (CAPRI) test set involving homology docking, and is comparable or slightly better for Score_set, a CAPRI benchmark set generated by diverse community-wide docking protocols not known to the training data. For Score_set quality assessment, EGCN shows about a 27% improvement over our previous efforts. Learning directly from 3D structure data in graph representation, EGCN represents the first successful development of graph convolutional networks for protein docking.
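The core operation of a graph convolution, aggregating neighbouring nodes' features through weighted edges, can be sketched in plain Python. This is a generic message-passing layer with a single linear transform and scalar edge weights, followed by a sum readout, written for illustration; it is not the EGCN kernel, which uses atom-resolution edge features and multihead attention, and the names `graph_conv` and `readout` are hypothetical.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def graph_conv(node_feats, edges, W):
    """One layer: h_i' = ReLU(h_i + sum_j e_ij * W h_j) over undirected contact edges."""
    out = [list(h) for h in node_feats]
    for (i, j), e in edges.items():
        msg_i, msg_j = matvec(W, node_feats[j]), matvec(W, node_feats[i])
        out[i] = [a + e * m for a, m in zip(out[i], msg_i)]
        out[j] = [a + e * m for a, m in zip(out[j], msg_j)]
    return [[max(0.0, v) for v in h] for h in out]

def readout(node_feats, v):
    """Sum-pool node features into one scalar score (e.g. a learned 'energy')."""
    return sum(sum(a * b for a, b in zip(h, v)) for h in node_feats)

nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # per-residue features
contacts = {(0, 1): 0.5, (1, 2): 1.0}           # contact edge weights
identity = [[1.0, 0.0], [0.0, 1.0]]
h = graph_conv(nodes, contacts, identity)
print(h)                       # [[1.0, 0.5], [1.5, 2.0], [1.0, 2.0]]
print(readout(h, [1.0, 1.0]))  # 8.0
```

Because messages flow only along contact edges, the learned score decomposes into pairwise interaction terms, which is the physics-inspired intuition behind treating the network output as an energy.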

19.

Assessment and Challenges of Ligand Docking into Comparative Models of G-Protein Coupled Receptors

Elizabeth Dong Nguyen, Christoffer Norn, Thomas M. Frimurer, Jens Meiler 《PloS one》2013,8(7)

The rapidly increasing number of high-resolution X-ray structures of G-protein coupled receptors (GPCRs) creates a unique opportunity to employ comparative modeling and docking to provide valuable insight into the function and ligand-binding determinants of novel receptors, to assist in virtual screening, and to design and optimize drug candidates. However, low sequence identity between receptors, conformational flexibility, and the chemical diversity of ligands present an enormous challenge to molecular modeling approaches. We hypothesize that rapid Monte Carlo sampling of protein backbone and side-chain conformational space with Rosetta can be leveraged to meet this challenge. This study applies unbiased comparative modeling and docking methods to 14 distinct high-resolution GPCRs and proposes knowledge-based filtering methods to improve sampling performance and to identify correct ligand-receptor interactions. On average, top-ranked receptor models built on templates with over 50% sequence identity are within 2.9 Å of the experimental structure, with an average root mean square deviation (RMSD) of 2.2 Å for the transmembrane region and 5 Å for the second extracellular loop. Furthermore, these models consistently correlate with low Rosetta energy scores. To predict binding modes, conformers of the 14 ligands co-crystallized with the GPCRs were docked against the top-ranked comparative models. In contrast to the comparative models themselves, it remains difficult to unambiguously identify correct binding modes by score alone. On average, knowledge-based and energy-based filters improved sampling performance 10³-fold over random. In assessing the applicability of experimental constraints, we found that sampling performance increases by one order of magnitude for every 10 residues known to contact the ligand. Additionally, in the case of the δ-opioid receptor (DOR), knowledge of a single specific ligand-protein contact improved sampling efficiency 7-fold. These findings offer specific guidelines that may lead to increased success in determining receptor-ligand complexes.
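The reported constraint trend (one order of magnitude per 10 residues known to contact the ligand) is log-linear, so the implied fold-improvement for n known contacts is roughly 10^(n/10). The helper below, with the hypothetical name expected_gain, only evaluates that empirical average as a worked example; it is not a model from the paper and should not be trusted far outside the range the authors tested.

```python
def expected_gain(known_contacts: int, residues_per_decade: float = 10.0) -> float:
    """Fold-improvement in sampling implied by the reported average trend of
    one order of magnitude per `residues_per_decade` known ligand-contacting residues."""
    return 10.0 ** (known_contacts / residues_per_decade)


for n in (1, 10, 20):
    print(n, round(expected_gain(n), 2))
# 1 1.26
# 10 10.0
# 20 100.0
```

For a single contact the average trend implies only about a 1.26-fold gain, whereas the DOR example reports 7-fold, so individual contacts can deviate strongly from the average.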

20.

Automatic protein structure prediction system enabling rapid and accurate model building for enzyme screening

Joo-Hyun Seo, Gang-Seong Lee, Juhan Kim, Byung-Kwan Cho, Keehyoung Joo, Jooyoung Lee, Byung-Gee Kim 《Enzyme and microbial technology》2009,45(3):218-225

Protein structure prediction has great potential for understanding the function of proteins at the molecular level and for designing novel protein functions. Here, we report a rapid and accurate structure prediction system that runs in a fully automated manner. Because fold recognition of the target protein is the starting point of the template-guided model building process, various approaches, such as profile analysis, threading, and SCOP fold classification, were applied to generate the template library and to select the best template structure. After the best template was determined, fold consistency among the template candidates was evaluated using TM-score and the SCOP database to select additional template structures from the template library. MODELLER was then run with the selected templates to generate a total of 100 decoy sets. The predicted decoys were clustered with an RMSD cutoff of 3 Å to obtain a centroid from each cluster. Finally, the selected centroids were subjected to side-chain rearrangement using the SCWRL module. The fully automated system was tested on a sample set of 80 recently released PDB chains. Using a TM-score threshold of ≥0.4, 60 cases (75%) yielded models with statistically significant structural similarity. The system provides users with simple and reliable models within hours of query submission, making it readily applicable to high-throughput enzyme screening.
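The abstract does not say which clustering algorithm was used, so the sketch below swaps in a simple greedy scheme as an assumption: starting from a precomputed pairwise RMSD matrix (structures already superposed), it repeatedly takes the unassigned decoy with the most unassigned neighbours within 3 Å as a centroid. It also includes a TM-score for one fixed superposition, using the standard scale d0 = 1.24·(L − 15)^(1/3) − 1.8 with a 0.5 Å floor, to show what the TM-score ≥ 0.4 criterion measures; the actual TM-score program additionally searches for the superposition that maximizes the score. All function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def _neighbours(i: int, pool: Set[int], rmsd: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """Decoys in `pool` within `cutoff` (Å RMSD) of decoy i, including i itself."""
    return [j for j in pool if rmsd[i][j] <= cutoff]


def cluster_decoys(rmsd: Sequence[Sequence[float]], cutoff: float = 3.0) -> List[Tuple[int, List[int]]]:
    """Greedy RMSD clustering (assumed scheme): returns (centroid, members) pairs.
    Ties between equally populated candidates go to the lowest decoy index."""
    unassigned = set(range(len(rmsd)))
    clusters = []
    while unassigned:
        centroid = max(sorted(unassigned),
                       key=lambda i: len(_neighbours(i, unassigned, rmsd, cutoff)))
        members = sorted(_neighbours(centroid, unassigned, rmsd, cutoff))
        clusters.append((centroid, members))
        unassigned -= set(members)
    return clusters


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition, given distances (Å) between aligned residue pairs."""
    d0 = max(0.5, 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy example: decoys 0-2 are mutually within 3 Å, decoy 3 is far away.
rmsd = [
    [0.0, 1.0, 2.0, 8.0],
    [1.0, 0.0, 1.5, 7.0],
    [2.0, 1.5, 0.0, 6.0],
    [8.0, 7.0, 6.0, 0.0],
]
print(cluster_decoys(rmsd))  # [(0, [0, 1, 2]), (3, [3])]
print(tm_score([0.0] * 100, 100) >= 0.4)  # True (a perfect superposition scores 1.0)
```

Picking the most populated neighbourhood first favours centroids that sit in dense regions of decoy space, which is one common reason for clustering decoys before selecting final models.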