Similar literature
Found 20 similar documents (search time: 46 ms)
1.
Newly determined protein structures are classified as belonging to a new fold if the structures are sufficiently dissimilar from all other protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

2.
Analysis of protein structures based on backbone structural patterns known as structural alphabets has been shown to be very useful. Among them, a set of 16 pentapeptide structural motifs known as protein blocks (PBs) has been identified, upon which backbone models of most protein structures can be built. PBs allow simplification of 3D space onto 1D space in the form of a sequence of PBs. Here, for the first time, substitution probabilities of PBs in a large number of aligned homologous protein structures have been studied and are expressed as a simplified 16 x 16 substitution matrix. The matrix was validated by benchmarking how well it can align sequences of PBs, much like amino acid alignment, to identify structurally equivalent regions in closely or distantly related proteins using a dynamic programming approach. The alignment results obtained are very comparable to well-established structure comparison methods like DALI and STAMP. Other interesting applications of the matrix have been investigated. We first show that, in variable regions between two superimposed homologous proteins, one can distinguish between local conformational differences and rigid-body displacement of a conserved motif by comparing the PBs and their substitution scores. Second, we demonstrate, with the example of aspartic proteinases, that PBs can be efficiently used to detect the lobe/domain flexibility in multidomain proteins. Lastly, using protein kinase as an example, we identify regions of conformational variations and rigid body movements in the enzyme as it is changed to the active state from an inactive state.

3.
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.
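
For orientation, the quadratic baseline that this abstract contrasts against can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two structures are taken to be already superimposed and given as lists of Cα coordinates, and the function name `max_pairs_under_cutoff` and the 4.0 Å default cutoff are hypothetical choices, not values from the paper. The sketch is a plain O(nm) dynamic program; it is not the subquadratic algorithm the abstract presents.

```python
import math

def max_pairs_under_cutoff(coords_a, coords_b, cutoff=4.0):
    """Largest number of sequentially ordered residue pairs (i, j) whose
    superimposed C-alpha atoms lie within `cutoff` of each other.

    The name and the default cutoff are illustrative assumptions.
    """
    n, m = len(coords_a), len(coords_b)
    # best[i][j] = best count using the first i residues of A and first j of B
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(coords_a[i - 1], coords_b[j - 1]) <= cutoff
            best[i][j] = max(
                best[i - 1][j],                            # skip residue i of A
                best[i][j - 1],                            # skip residue j of B
                best[i - 1][j - 1] + (1 if close else 0),  # pair i with j
            )
    return best[n][m]
```

Each call fills an n × m table, which is why repeating it over many trial superpositions becomes the bottleneck the abstract describes.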

4.
5.
An algorithm is proposed for the conversion of a virtual-bond polypeptide chain (connected C alpha atoms) to an all-atom backbone, based on determining the most extensive hydrogen-bond network between the peptide groups of the backbone, while maintaining all of the backbone atoms in energetically feasible conformations. Hydrogen bonding is represented by aligning the peptide-group dipoles. These peptide groups are not contiguous in the amino acid sequence. The first dipoles to be aligned are those that are both sufficiently close in space to be arranged in approximately linear arrays termed dipole paths. The criteria used in the construction of dipole paths are: to assure good alignment of the greatest possible number of dipoles that are close in space; to optimize the electrostatic interactions between the dipoles that belong to different paths close in space; and to avoid locally unfavorable amino acid residue conformations. The equations for dipole alignment are solved separately for each path, and then the remaining single dipoles are aligned optimally with the electrostatic field from the dipoles that belong to the dipole-path network. A least-squares minimizer is used to keep the geometry of the alpha-carbon trace of the resulting backbone close to that of the input virtual-bond chain. This procedure is sufficient to convert the virtual-bond chain to a real chain; in applications to real systems, however, the final structure is obtained by minimizing the total ECEPP/2 (empirical conformational energy program for peptides) energy of the system, starting from the geometry resulting from the solution of the alignment equations. When applied to model alpha-helical and beta-sheet structures, the algorithm, followed by the ECEPP/2 energy minimization, resulted in an energy and backbone geometry characteristic of these alpha-helical and beta-sheet structures. Application to the alpha-carbon trace of the backbone of the crystallographic 5PTI structure of bovine pancreatic trypsin inhibitor, followed by ECEPP/2 energy minimization with C alpha-distance constraints, led to a structure with almost as low energy and root mean square deviation as the ECEPP/2 geometry analog of 5PTI, the best agreement between the crystal and reconstructed backbone being observed for the residues involved in the dipole-path network.

6.
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB-based alignments are used to obtain a three-dimensional fit of the structures followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs underline that the alignment quality is better than MULTIPROT, MUSTANG and the alignments in HOMSTRAD, in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods.

7.
The question of how best to compare and classify the (three‐dimensional) structures of proteins is one of the most important unsolved problems in computational biology. To help tackle this problem, we have developed a novel shape‐density superposition algorithm called 3D‐Blast which represents and superposes the shapes of protein backbone folds using the spherical polar Fourier correlation technique originally developed by us for protein docking. The utility of this approach is compared with several well‐known protein structure alignment algorithms using receiver‐operator‐characteristic plots of queries against the “gold standard” CATH database. Despite being completely independent of protein sequences and using no information about the internal geometry of proteins, our results from searching the CATH database show that 3D‐Blast is highly competitive compared to current state‐of‐the‐art protein structure alignment algorithms. A novel and potentially very useful feature of our approach is that it allows an average or “consensus” fold to be calculated easily for a given group of protein structures. We find that using consensus shapes to represent entire fold families also gives very good database query performance. We propose that using the notion of consensus fold shapes could provide a powerful new way to index existing protein structure databases, and that it offers an objective way to cluster and classify all of the currently known folds in the protein universe. Proteins 2012. © 2011 Wiley Periodicals, Inc.

8.
YibK is a 160-residue homodimeric protein belonging to the SPOUT class of methyltransferases. Proteins in this group all display a unique topological feature: the backbone polypeptide chain folds to form a deep trefoil knot. Such knotted structures were completely unpredicted, it being thought impossible for a protein to fold efficiently in this way. However, they are becoming more common and there are now a growing number of examples in the Protein Data Bank. These intriguing knotted structures represent a new and significant challenge in the field of protein folding. Here, we present an initial characterisation of the folding of YibK, one of the smallest knotted proteins to be identified. This is the first detailed folding study on a knotted protein to be reported. We have established conditions under which the protein can be denatured reversibly in vitro using urea, thereby showing that molecular chaperones are not required for the efficient folding of this protein. A series of equilibrium unfolding experiments were performed over a 400-fold range of protein concentration. Both secondary and tertiary structural probes show a single, protein concentration-dependent unfolding transition, and data are most consistent with a three-state equilibrium denaturation model involving a monomeric intermediate. Thermodynamic parameters obtained from the fit of the data to this model indicate that the intermediate is a stable species with appreciable secondary and tertiary structure; whether the topological knot remains in the intermediate state is still to be shown. Together, these results demonstrate that, despite its complex knotted structure, YibK is able to fold efficiently and behaves remarkably similarly to other dimeric proteins under equilibrium conditions.

9.
A conceptual framework for understanding the protein folding problem has remained elusive in spite of many significant advances. We show that geometrical constraints imposed by chain connectivity, compactness, and the avoidance of steric clashes can be encompassed in a natural way using a three-body potential and lead to a selection in structure space, independent of chemical details. Strikingly, secondary motifs such as hairpins, sheets, and helices, which are the building blocks of protein folds, emerge as the chosen structures for segments of the protein backbone based just on elementary geometrical considerations.

10.
Automated minimization of steric clashes in protein structures   Cited by: 1 (self-citations: 0, citations by others: 1)
Molecular modeling of proteins including homology modeling, structure determination, and knowledge-based protein design requires tools to evaluate and refine three-dimensional protein structures. Steric clash is one of the artifacts prevalent in low-resolution structures and homology models. Steric clashes arise due to the unnatural overlap of any two nonbonding atoms in a protein structure. Usually, removal of severe steric clashes in some structures is challenging since many existing refinement programs do not accept structures with severe steric clashes. Here, we present a quantitative approach for identifying steric clashes in proteins by defining clashes based on the van der Waals repulsion energy of the clashing atoms. We also define a metric for quantitative estimation of the severity of clashes in proteins by performing statistical analysis of clashes in high-resolution protein structures. We describe a rapid, automated, and robust protocol, Chiron, which efficiently resolves severe clashes in low-resolution structures and homology models with minimal perturbation in the protein backbone. Benchmark studies highlight the efficiency and robustness of Chiron compared with other widely used methods. We provide Chiron as an automated web server to evaluate and resolve clashes in protein structures that can be further used for more accurate protein design.

11.
AlphaFold2 is a promising new tool for researchers to predict protein structures and generate high-quality models, with low backbone and global root-mean-square deviation (RMSD) when compared with experimental structures. However, it is unclear if the structures predicted by AlphaFold2 will be valuable targets of docking. To address this question, we redocked ligands in the PDBbind datasets against the experimental co-crystallized receptor structures and against the AlphaFold2 structures using AutoDock-GPU. We find that the quality measure provided during structure prediction is not a good predictor of docking performance, despite accurately reflecting the quality of the alpha carbon alignment with experimental structures. Removing low-confidence regions of the predicted structure and making side chains flexible improves the docking outcomes. Overall, despite high-quality prediction of backbone conformation, fine structural details limit the naive application of AlphaFold2 models as docking targets.

12.
We examine how effectively simple potential functions previously developed can identify compatibilities between sequences and structures of proteins for database searches. The potential function consists of pairwise contact energies, repulsive packing potentials of residues for overly dense arrangement and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean field approximation on the basis of probabilities of site pairs to be aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position, and as a result gaps will be more frequently placed on protein surfaces than in cores. In addition to minimum energy alignments, we use probability alignments made by successively aligning site pairs in order by pairwise alignment probabilities. The results show that the present energy function and alignment method can detect well both folds compatible with a given sequence and, inversely, sequences compatible with a given fold, and yield mostly similar alignments for these two types of sequence and structure pairs. Probability alignments consisting of most reliable site pairs only can yield extremely small root mean square deviations, and including less reliable pairs increases the deviations. Also, it is observed that secondary structure potentials are usefully complementary to yield improved alignments with this method. Remarkably, by this method some individual sequence-structure pairs are detected having only 5-20% sequence identity.

13.
We present a comprehensive evaluation of a new structure mining method called PB-ALIGN. It is based on the encoding of protein structure as a 1D sequence of a combination of 16 short structural motifs or protein blocks (PBs). PBs are short motifs capable of representing most of the local structural features of a protein backbone. Using a derived PB substitution matrix and a simple dynamic programming algorithm, PB sequences are aligned in the same way as amino acid sequences to yield a structure alignment. Alignment of these local features as a sequence of symbols enables fast detection of structural similarities between two proteins. The ability of the method to characterize and align regions beyond regular secondary structures, for example, N and C caps of helices and loops connecting regular structures, puts it a step ahead of existing methods, which strongly rely on secondary structure elements. PB-ALIGN achieved an efficiency of 85% in extracting the true fold from a large database of 7259 SCOP domains and was successful in identifying true super-family members in 82% of cases. In comparison with 13 existing structure comparison/mining methods, PB-ALIGN emerged as the best on the general ability test dataset and was on par with methods like YAKUSA and CE on the nontrivial test dataset. Furthermore, the proposed method performed well when compared to a flexible structure alignment method like FATCAT and outperforms it in processing speed (less than 45 s per database scan). This work also establishes a reliable cut-off value for the demarcation of similar folds. It finally shows that global alignment scores of unrelated structures using PBs follow an extreme value distribution. PB-ALIGN is freely available on a web server called Protein Block Expert (PBE) at http://bioinformatics.univ-reunion.fr/PBE/.

14.
Two genes that are expressed when precursor cytotoxic T lymphocytes are transformed to T killer cells have been cloned and sequenced. The derived amino acid sequences, coding for cytotoxic cell protease 1 (CCP1) and Hannuka factor (HF) are highly homologous to members of the serine proteinase family. Comparative molecular model building using the known three-dimensional structures and the derived amino acid sequences of the lymphocyte enzymes has provided useful structural information, especially in predicting the conformations of the substrate binding sites. In applying this modelling procedure, we used the X-ray structures of four serine proteinases to provide a structurally based sequence alignment: alpha-chymotrypsin (CHT), bovine trypsin (BT), Streptomyces griseus trypsin (SGT), and rat mast cell protease 2 (RMCP2). The root mean square differences in alpha-carbon atom positions among these four structures when compared in a pairwise fashion range from 0.79 to 0.97 Å for structurally equivalent residues. The sequences of the two lymphocyte enzymes were then aligned to these proteinases using chemical criteria and the superimposed X-ray structures as guides. The alignment showed that the sequence of CCP1 was most similar to RMCP2, whereas HF has regions of homology with both RMCP2 and BT. With RMCP2 as a template for CCP1 and the two enzymes RMCP2 and BT as templates for HF, the molecular models were constructed. Intramolecular steric clashes that resulted from the replacement of amino acid side chains of the templates by the aligned residues of CCP1 and HF were relieved by adjustment of the side chain conformational angles in an interactive computer graphics device. This process was followed by energy minimization of the enzyme model to optimize the stereochemical geometry and to relieve any remaining unacceptably close nonbonded contacts. The resulting model of CCP1 has an arginine residue at position 226 in the specificity pocket, thereby predicting a substrate preference for P1 aspartate or glutamate residues. The model also predicts favorable binding for a small hydrophobic residue at the P2 position of the substrate. The primary specificity pocket of HF resembles that of BT and therefore predicts a lysine or arginine preference for the P1 residue. The arginine at position 99 in the model of HF suggests a preference for aspartate or glutamate side chains in the P2 position of the substrate. Both CCP1 and HF have a free cysteine in the segment of polypeptide 88 to 93. (ABSTRACT TRUNCATED AT 400 WORDS)

15.
16.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure. Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

17.
Structural comparison of multiple-chain protein complexes is essential in many studies of protein–protein interactions. We develop a new algorithm, MM-align, for sequence-independent alignment of protein complex structures. The algorithm is built on a heuristic iteration of a modified Needleman–Wunsch dynamic programming (DP) algorithm, with the alignment score specified by the inter-complex residue distances. The multiple chains in each complex are first joined, in every possible order, and then simultaneously aligned with cross-chain alignments prevented. The alignments of interface residues are enhanced by an interface-specific weighting factor. MM-align is tested on a large-scale benchmark set of 205 × 3897 non-homologous multiple-chain complex pairs. Compared with a naïve extension of the monomer alignment program of TM-align, the alignment accuracy of MM-align is significantly higher as judged by the average TM-score of the physically-aligned residues. MM-align is about two times faster than TM-align because of omitting the cross-alignment zone of the DP matrix. It also shows that the enhanced alignment of the interfaces helps in identifying biologically relevant protein complex pairs.
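
The abstract uses the TM-score as its accuracy measure without defining it. As a hedged illustration, the Python sketch below follows the commonly used definition: a sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, normalized by the target length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The function names and the 0.5 Å floor applied to d0 for short chains are assumptions of this sketch, and nothing here reproduces MM-align's own scoring or interface weighting.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)

def tm_score(aligned_distances, target_length):
    """TM-score from the distances (Å) between aligned residue pairs after
    superposition, normalized by the length of the target structure."""
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length
```

Because the normalization uses the full target length rather than the number of aligned pairs, unaligned residues lower the score, which is what makes it usable for comparing alignments of different coverage.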

18.
The ab initio folding problem can be divided into two sequential tasks of approximately equal computational complexity: the generation of native-like backbone folds and the positioning of side chains upon these backbones. The prediction of side-chain conformation in this context is challenging, because at best only the near-native global fold of the protein is known. To test the effect of displacements in the protein backbones on side-chain prediction for folds generated ab initio, sets of near-native backbones (≤ 4 Å Cα RMS error) for four small proteins were generated by two methods. The steric environment surrounding each residue was probed by placing the side chains in the native conformation on each of these decoys, followed by torsion-space optimization to remove steric clashes on a rigid backbone. We observe that on average 40% of the χ1 angles were displaced by 40° or more, effectively setting the limits in accuracy for side-chain modeling under these conditions. Three different algorithms were subsequently used for prediction of side-chain conformation. The average prediction accuracy for the three methods was remarkably similar: 49% to 51% of the χ1 angles were predicted correctly overall (33% to 36% of the χ1+2 angles). Interestingly, when the inter-side-chain interactions were disregarded, the mean accuracy increased. A consensus approach is described, in which side-chain conformations are defined based on the most frequently predicted χ angles for a given method upon each set of near-native backbones. We find that consensus modeling, which de facto includes backbone flexibility, improves side-chain prediction: χ1 accuracy improved to 51–54% (36–42% of χ1+2). Implications of a consensus method for ab initio protein structure prediction are discussed. Proteins 33:204–217, 1998. © 1998 Wiley-Liss, Inc.

19.
To adequately deal with the inherent complexity of interactions between protein side-chains, we develop and describe here a novel method for characterizing protein packing within a fold family. Instead of approaching side-chain interactions absolutely from one residue to another, we instead consider the relative interactions of contacting residue pairs. The basic element, the pair-wise relative contact, is constructed from a sequence alignment and contact analysis of a set of structures and consists of a cluster of similarly oriented, interacting, side-chain pairs. To demonstrate this construct's usefulness in analyzing protein structure, we used the pair-wise relative contacts to analyze two sets of protein structures as defined by SCOP: the diverse globin-like superfamily (126 structures) and the more uniform heme binding globin family (a 94 structure subset of the globin-like superfamily). The superfamily structure set produced 1266 unique pair-wise relative contacts, whereas the family structure subset gave 1001 unique pair-wise relative contacts. For both sets, we show that these constructs can be used to accurately and automatically differentiate between fold classes. Furthermore, these pair-wise relative contacts correlate well with sequence identity and thus provide a direct relationship between changes in sequence and changes in structure. To capture the complexity of protein packing, these pair-wise relative contacts can be superimposed around a single residue to create a multi-body construct called a relative packing group. Construction of convex hulls around the individual packing groups provides a measure of the variation in packing around a residue and defines an approximate volume of space occupied by the groups interacting with a residue. We find that these relative packing groups are useful in understanding the structural quality of sequence or structure alignments. Moreover, they provide context to calculate a value for structural randomness, which is important in properly assessing the quality of a structural alignment. The results of this study provide the framework for future analysis for correlating sequence changes to specific structure changes.

20.
The distribution of the C(alpha)-C(alpha) distances between residues separated by three to 30 amino acid residues is highly characteristic of protein folds and makes it possible to identify them from a straightforward comparison of the distance histograms. The comparison is carried out by contingency table analysis and yields a probability of identity (PRIDE score), with values between zero and 1. For closely related structures, PRIDE is highly correlated with the root-mean-square distance between C(alpha) atoms, but it provides a correct classification even for unrelated structures for which a structural alignment is not meaningful. For example, an analysis of the CATH database of fold structures showed that 98.8% of the folds fall into the correct CATH homologous superfamily category, based on the highest PRIDE score obtained. Structural alignment and secondary-structure assignment are not necessary for the calculation of PRIDE, which is fast enough to allow the scanning of large databases.
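
The first step described in this abstract, building Cα–Cα distance histograms for sequence separations of 3 to 30 residues, can be sketched in Python as follows. The function name `ca_distance_histograms` and the 1.0 Å bin width are assumptions made for illustration; the abstract does not state the binning, and the contingency-table comparison that produces the PRIDE score is not reproduced here.

```python
import math
from collections import Counter

def ca_distance_histograms(ca_coords, min_sep=3, max_sep=30, bin_width=1.0):
    """Map each sequence separation k to a Counter of binned C-alpha distances.

    The bin width is an illustrative assumption, not a value from the paper.
    """
    histograms = {}
    for k in range(min_sep, max_sep + 1):
        counts = Counter()
        # Every residue pair (i, i + k) contributes one distance to histogram k
        for i in range(len(ca_coords) - k):
            d = math.dist(ca_coords[i], ca_coords[i + k])
            counts[int(d // bin_width)] += 1
        histograms[k] = counts
    return histograms
```

No alignment or secondary-structure assignment is needed to compute these histograms, which is consistent with the abstract's point that the method is fast enough to scan large databases.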
