Similar articles
2.
A huge number of high-quality predicted protein structures are now publicly available. However, many of these structures contain non-globular regions, which diminish the performance of downstream structural bioinformatics applications. In this study, we develop AlphaCutter to remove non-globular regions from predicted protein structures. A large-scale cleaning of 542,380 predicted SwissProt structures shows that AlphaCutter can (1) remove non-globular regions that are undetectable with pLDDT scores and (2) preserve the integrity of the cleaned domain regions. In practical applications, AlphaCutter cleaning improved folding energy scores and sequence recovery rates in the redesign of domain regions. On average, AlphaCutter takes less than 3 s to clean a protein structure, enabling efficient cleaning of the rapidly growing number of predicted protein structures. AlphaCutter is available at https://github.com/johnnytam100/AlphaCutter . AlphaCutter-cleaned SwissProt structures are available for download at https://doi.org/10.5281/zenodo.7944483 .

3.
In this work we examine how structural changes in proteins are coupled with sequence variation over the course of evolution within a family of homologs. A sequence-structure correlation analysis of 81 homologous protein families shows that most of them exhibit a statistically significant linear correlation between measures of sequence similarity and structural similarity. However, we observed cases where structural variability cannot be explained mainly by sequence variation, such as protein families containing multiple disulfide bonds. To understand whether structures from different families and/or folds evolve in the same manner, we compared the degree of structural change per unit of sequence change ("the evolutionary plasticity of structure") among the families with a significant linear correlation. Using rigorous statistical procedures, we find that, with a few exceptions, evolutionary plasticity does not differ significantly between protein families. A similar sequence-structure analysis of loop regions shows that the evolutionary plasticity of loops is greater than that of the protein core.
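As a minimal sketch, assuming that "evolutionary plasticity" can be read as the slope of an ordinary least-squares line fitted to pairwise structural distance versus pairwise sequence distance within one family, the calculation looks like this. The function name `linear_fit` and the data points are hypothetical; the abstract does not specify which distance measures or which statistical procedures were used.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, pearson_r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x or y values are all identical; slope or r is undefined")
    slope = sxy / sxx  # structural change per unit of sequence change
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical pairs from one family: sequence distance (1 - fractional identity)
# and structural distance (e.g. RMSD in Å). Values are invented for illustration.
seq_dist = [0.1, 0.3, 0.5, 0.7]
struct_dist = [0.5, 1.1, 1.7, 2.3]
plasticity, offset, r = linear_fit(seq_dist, struct_dist)
print(f"plasticity = {plasticity:.2f} Å per unit sequence distance, r = {r:.2f}")
```

Comparing plasticity between families would then amount to comparing these slopes, with a proper test of whether they differ significantly.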

4.
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguishing real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources of decoy datasets for comparison with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction (CASP) competition, in which researchers attempt to predict the structure of a protein knowing only its amino acid sequence, and decoys generated by 3DRobot, which have user-specified global root-mean-square deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network that incorporates key physical features of protein cores to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method ranks decoy structures with accuracy similar to that of state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.

5.
A protein molecule consists of both rigid and flexible segments, which together satisfy the demands of stability and catalysis. Because a protein segment must be flexible to be attacked by a protease, limited proteolysis is a superb tool for analysing both confined local fluctuations and global unfolding events in proteins. Identifying the primary cleavage products assigns the flexible regions to positions in the primary structure, while the kinetics of proteolytic degradation enable differentiation between local fluctuations in the native protein molecule and the global unfolding process during denaturation. Modifying the amino acid sequence in these regions can tune proteolytic susceptibility and alter protein stability. In the present paper, we summarise our results on native-state and unfolded-state proteolysis of ribonuclease A (RNase A) and the effect of mutations in the detected flexible regions on the stability and unfolding of the RNase A molecule.

7.
The thermodynamic stability of a protein provides an experimental metric for the relationship between protein sequence and native structure. We investigated an approach to stability engineering of an immunoglobulin variable domain based on an analysis of the structural database. The residues occurring most frequently at specific positions of beta-turn motifs were predicted to increase folding stability, and the corresponding mutants were constructed by site-directed mutagenesis. The predictions were confirmed even at positions where immunoglobulin sequences conserve different residues. Mutants with increased beta-turn propensities frequently display increased folding cooperativity, suggesting pronounced effects on the unfolded state independent of the expected effect on conformational entropy. We conclude that structural motifs with predominantly local interactions can serve as templates for extracting patterns of sequence preference from the database of protein structures. Such preferences can predict the stability effects of mutations for protein engineering and design.
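The position-by-position frequency analysis can be sketched as follows, assuming the turns have already been extracted from the structural database as aligned four-residue segments (positions i to i+3). The helper name `most_frequent_by_position` and the example turn sequences are hypothetical and are not data from the study.

```python
from collections import Counter


def most_frequent_by_position(motifs):
    """For aligned, equal-length motif sequences, return (residue, frequency) of the
    most common residue at each position. Ties go to the residue seen first."""
    if not motifs:
        raise ValueError("no motif sequences given")
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    result = []
    for pos in range(length):
        counts = Counter(m[pos] for m in motifs)
        residue, count = counts.most_common(1)[0]
        result.append((residue, count / len(motifs)))
    return result


# Invented four-residue beta-turn segments, one-letter amino acid codes.
turns = ["PDGS", "PNGK", "ADGT", "PDGS"]
for pos, (residue, freq) in enumerate(most_frequent_by_position(turns)):
    print(f"turn position i+{pos}: {residue} ({freq:.0%})")
```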

9.
Three-dimensional domain swapping occurs when two or more identical proteins exchange identical parts of their structure to generate an oligomeric unit. It affects proteins with diverse sequences and structures and is expected to play important roles in evolution, functional regulation, and even conformational diseases. Here, we search for traces of domain swapping in protein sequences by means of algorithms that predict the structure and stability of proteins using database-derived potentials. Regions whose sequences are not optimal for the stability of the native structure, or that show marked intrinsic preferences for non-native conformations in the absence of tertiary interactions, are detected in most domain-swapping proteins. These regions are often located in areas crucial to the swapping process and are likely to influence it at a kinetic or thermodynamic level. In addition, cation-pi interactions are frequently observed to zip up the edges of the interface between intertwined chains or to involve hinge-loop residues, thereby modulating stability. We end by proposing a set of mutations altering swapping propensities, whose experimental characterization would help refine our in silico-derived hypotheses.

10.
Effects of salt bridges on protein structure and design.
Theoretical calculations (Hendsch ZS & Tidor B, 1994, Protein Sci 3:211-226) and experiments (Waldburger CD et al., 1995, Nat Struct Biol 2:122-128; Wimley WC et al., 1996, Proc Natl Acad Sci USA 93:2985-2990) suggest that hydrophobic interactions are more stabilizing than salt bridges in protein folding. The lack of an apparent stability benefit for many salt bridges calls for an alternative explanation of their occurrence in proteins. To examine the effect of salt bridges on protein structure and stability in more detail, we developed an energy function for simple cubic lattice polymers based on continuum electrostatic calculations of a representative selection of salt bridges found in known protein crystal structures. The model contains only three residue types, with charges of -1, 0, or +1. We exhaustively enumerated conformational space, and significant regions of sequence space, for three-dimensional cubic lattice polymers of length 16. The results demonstrate that, although more highly charged sequences are less stable, the loss of stability is accompanied by a substantial reduction in the degeneracy of the lowest-energy state. Moreover, charges that pair reduce the degeneracy more than lone charges that remain relatively exposed to solvent. We also explored and illustrated the use of ion-pairing strategies for rational structural design in model lattice studies.
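A minimal sketch of how a single conformation might be scored in such a model, assuming a simplified energy in which each non-bonded pair of residues on adjacent cubic-lattice sites contributes `epsilon * q_i * q_j` (so opposite charges attract and like charges repel). This pairwise rule is an assumption for illustration only: the study's energy function was derived from continuum electrostatic calculations and is not reproduced here, and the function names are hypothetical.

```python
def is_valid_conformation(coords):
    """True if the chain is self-avoiding and consecutive residues occupy adjacent cubic-lattice sites."""
    if len(set(coords)) != len(coords):
        return False
    return all(
        sum(abs(p - q) for p, q in zip(a, b)) == 1
        for a, b in zip(coords, coords[1:])
    )


def charge_contact_energy(charges, coords, epsilon=1.0):
    """Sum epsilon * q_i * q_j over non-bonded residue pairs on adjacent lattice sites."""
    if len(charges) != len(coords):
        raise ValueError("one charge per residue is required")
    if not is_valid_conformation(coords):
        raise ValueError("coords are not a self-avoiding walk on the cubic lattice")
    energy = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):  # skip chain-bonded neighbours (j = i + 1)
            lattice_distance = sum(abs(p - q) for p, q in zip(coords[i], coords[j]))
            if lattice_distance == 1:
                energy += epsilon * charges[i] * charges[j]
    return energy


# A four-residue U-shaped conformation: residues 0 and 3 sit on adjacent sites.
u_shape = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(charge_contact_energy([+1, 0, 0, -1], u_shape))  # -1.0: an ion pair
print(charge_contact_energy([+1, 0, 0, +1], u_shape))  # 1.0: like charges
```

Exhaustive enumeration, as in the study, would apply such a scoring function to every self-avoiding conformation and count how many share the lowest energy.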

11.
Daily MD, Gray JJ. Proteins, 2007, 67(2): 385-399
Allosteric proteins have been studied extensively over the last 40 years, but so far no systematic analysis of conformational changes between allosteric structures has been carried out. Here, we compile a set of 51 pairs of known inactive and active allosteric protein structures from the Protein Data Bank. We calculate local conformational differences between the two structures of each protein using simple metrics, such as backbone and side-chain Cartesian displacement, torsion-angle change, and rearrangement of residue-residue contacts. Thresholds for each metric are derived from the distributions of motions in two control sets of pairs of protein structures in the same biochemical state. Statistical analysis of motions in allosteric proteins quantifies the magnitude of allosteric effects and reveals simple structural principles of allostery. For example, allosteric proteins exhibit substantial conformational changes involving about 20% of their residues. In addition, motions in allosteric proteins are strongly biased toward weakly constrained regions such as loops and the protein surface. Correlation functions show that motions communicate through protein structures over distances averaging 10-20 residues in sequence space and 10-20 Å in Cartesian space. Comparison of motions in the allosteric set with a set of 21 nonallosteric ligand-binding proteins shows that nonallosteric proteins also exhibit a bias of motion toward weakly constrained regions and local correlation of motion. However, on average, allosteric proteins exhibit twice the percent motion of nonallosteric proteins with ligand-induced motion. These observations may guide efforts to design flexibility and allostery into proteins.
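The threshold idea can be sketched as follows, assuming that each structure pair has already been superposed, that each residue is represented by one atom (for example Cα), and that the threshold is taken as a high percentile of displacements seen in the control pairs. The 90th nearest-rank percentile, the function names, and all coordinates are assumptions for illustration; the abstract does not say how thresholds were computed from the control distributions. `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of a non-empty list, for 0 < q <= 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_moving(coords_a, coords_b, threshold):
    """Fraction of residues whose displacement between two superposed structures exceeds threshold."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    moved = sum(1 for a, b in zip(coords_a, coords_b) if math.dist(a, b) > threshold)
    return moved / len(coords_a)


# Hypothetical per-residue displacements (Å) observed between control pairs in the same state.
control_displacements = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cutoff = nearest_rank_percentile(control_displacements, 90)  # 0.9 Å

# Hypothetical one-atom-per-residue coordinates for inactive and active states.
inactive = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
active = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 2.0, 0.0), (4.0, 2.0, 0.0)]
print(fraction_moving(inactive, active, cutoff))  # 2 of 5 residues moved
```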

12.
It is commonly believed that similarity between the sequences of two proteins implies similarity between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures, provided that the percentage sequence identity between the two sequences is sufficiently high. This recognition, however, becomes statistically less reliable when the percentage sequence identity falls below 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity in 12 protein structure families. We define the structural similarity between two proteins as the cRMS distance between their structures. The sequence similarity of a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We approximate the sequence space compatible with a protein by designing a collection of protein sequences that are both stable in, and specific to, the structure of that protein. Using these measures of sequence and structural similarity, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
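cRMS is the coordinate root-mean-square deviation, conventionally computed after optimally superposing the two structures. The sketch below assumes the superposition (for example with the Kabsch algorithm) has already been done and only computes the deviation itself; the function name and coordinates are hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Coordinate RMS deviation between equally long lists of 3D points.

    Assumes the two structures are already optimally superposed; this function
    does not perform the rigid-body fit itself.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two hypothetical three-atom structures differing only in the last atom (shifted by 3 Å).
model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 3.0, 0.0)]
print(round(rms_deviation(model, reference), 3))  # 1.732
```

Skipping the superposition step would overestimate the deviation whenever the two structures differ by a rigid rotation or translation.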

13.
It is generally believed that loop regions in globular proteins, and particularly hypervariable loops in immunoglobulins, can accommodate a wide variety of sequence changes without jeopardizing protein structure or stability. We show here, however, that novel sequences introduced within complementarity determining regions (CDRs) 1 and 3 of the immunoglobulin variable domain REI VL can significantly diminish the stability of the native state of this protein. Besides their implications for the general role of loops in the stability of globular proteins, these results suggest previously unrecognized stability constraints on the variability of CDRs that may impact efforts to engineer new and improved activities into antibodies.

14.
Palzkill T, Botstein D. Proteins, 1992, 14(1): 29-44
We describe a new analytical mutagenesis technique that involves randomizing the DNA sequence of a short stretch of a gene (3-6 codons) and determining the percentage of all possible random sequences that produce a functional protein. A low percentage of functional random sequences in a complete library of random substitutions indicates that the mutagenized region is important for the structure and/or function of the protein. Repeating the mutagenesis over many regions throughout a protein gives a global view of which parts of its amino acid sequence are critical. We applied this method to 66 codons of the gene encoding TEM-1 beta-lactamase in 19 separate experiments. We found that TEM-1 beta-lactamase is extremely tolerant of amino acid substitutions: on average, 44% of all mutants with random substitutions function, and 20% of the substitutions are expressed, secreted, and fold well enough to function at levels similar to those of the wild-type enzyme. We also found a few exceptional regions where only a few random sequences function. Examination of the X-ray structures of homologous beta-lactamases indicates that the regions most sensitive to substitution lie near the active-site pocket or are buried in the hydrophobic core of the protein. DNA sequence analysis of functional random sequences was used to obtain more detailed information about the amino acid sequence requirements of several regions, and this information was compared with sequence conservation among several related beta-lactamases.
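The arithmetic behind such a random-substitution library can be sketched as follows, assuming fully randomized NNN codons (any of the four bases at each of the three codon positions); the abstract does not state the randomization scheme actually used, and the function names are hypothetical. It shows why even a 3-6 codon stretch gives a large library and why some random clones are expected to be non-functional simply because they carry a stop codon.

```python
def dna_variants(n_codons):
    """Distinct DNA sequences when n codons are fully randomized (NNN, 64 codons each)."""
    return 64 ** n_codons


def protein_variants(n_codons):
    """Distinct amino acid sequences over n positions using the 20 standard amino acids."""
    return 20 ** n_codons


def fraction_without_stop(n_codons):
    """Probability that an NNN-randomized stretch of n codons has no stop codon (3 of 64 are stops)."""
    return (61 / 64) ** n_codons


def percent_functional(n_functional, n_screened):
    """Percentage of screened random clones that give a functional protein."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    return 100.0 * n_functional / n_screened


for n in (3, 6):
    print(n, dna_variants(n), protein_variants(n), round(fraction_without_stop(n), 3))
```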

15.
We develop an approximate maximum likelihood method to estimate flanking nucleotide context-dependent mutation rates and amino acid exchange-dependent selection in orthologous protein-coding sequences and use it to analyze genome-wide coding sequence alignments from mammals and yeast. Allowing context-dependent mutation provides a better fit to coding sequence data than simpler (context-independent or CpG "hotspot") models and significantly affects selection parameter estimates. Allowing asymmetric (nonreciprocal) selection on amino acid exchanges gives a better fit than simple dN/dS or symmetric selection models. Relative selection strength estimates from our models show good agreement with independent estimates derived from human disease-causing and engineered mutations. Selection strengths depend on local protein structure, showing expected biophysical trends in helical versus nonhelical regions and increased asymmetry on polar-hydrophobic exchanges with increased burial. The more stringent selection that has previously been observed for highly expressed proteins is primarily concentrated in buried regions, supporting the notion that such proteins are under stronger than average selection for stability. Our analyses indicate that a highly parameterized model of mutation and selection is computationally tractable and is a useful tool for exploring a variety of biological questions concerning protein and coding sequence evolution.

16.
We describe a novel approach for inferring functional relationships between proteins by detecting sequence and spatial patterns on protein surfaces. Well-formed concave surface regions, in the form of pockets and voids, are examined to identify similarity relationships that may be directly related to protein function. We first exhaustively identify and analytically measure all 910,379 surface pockets and interior voids in 12,177 protein structures from the Protein Data Bank. The similarity of the residue patterns forming pockets and voids is then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance, in the form of E-values and p-values, is then estimated for each of the three types of similarity measurement. Our method is fully automated, requires no human intervention, and can be used without input query patterns. It assumes no prior knowledge of the functional residues of a protein and can detect similarity based on both small and large surface patterns. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationships specifically among members of the same protein family and superfamily, as well as between remotely related functional surfaces of proteins with different folds. We envision that this method can be used for discovering novel functional relationships among protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries into the evolutionary origins of structural elements important for protein function.

17.
Cao B, Elber R. Proteins, 2010, 78(4): 985-1003
We investigate small sequence adjustments (of one or a few amino acids) that induce large conformational transitions between distinct and stable protein folds. Such transitions are intriguing from evolutionary and protein-design perspectives: they make it possible to search for ancient protein structures or to design protein switches that flip between folds and functions. A network of sequence flow between protein folds is computed for representative structures of the Protein Data Bank. The computed network is dense; on average, each structure is connected to tens of other folds. Proteins that attract sequences from a higher-than-expected number of neighboring folds are more likely to be enzymes and to have an alpha/beta fold. The large number of connections between folds may reflect the need of enzymes to adjust their structures for alternative substrates. The network of the Cro family is discussed, and we speculate that capacity is an important factor (but not the only one) that determines protein evolution. The experimentally observed flip from an all-alpha to an alpha + beta fold is examined with the network tools. A kinetic model for the transition of sequences between folds (considering only protein stability) is proposed. Proteins 2010. © 2009 Wiley-Liss, Inc.
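A minimal sketch of counting how many distinct neighboring folds send sequences into each fold, assuming the network is given as a list of directed (source fold, target fold) edges. Comparing each fold's in-degree with the network mean is a simplifying assumption standing in for the study's unspecified "expected" number; the fold labels and function names are hypothetical.

```python
from collections import defaultdict


def in_degrees(edges):
    """Map each fold to the number of distinct other folds with an edge pointing into it."""
    sources = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # ignore self-loops
            sources[dst].add(src)
    return {node: len(sources[node]) for node in nodes}


def above_mean(degrees):
    """Folds whose in-degree exceeds the network average, sorted by name."""
    mean = sum(degrees.values()) / len(degrees)
    return sorted(node for node, d in degrees.items() if d > mean)


# Hypothetical directed edges: a small sequence change moves a sequence from src to dst.
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("C", "D")]
degrees = in_degrees(edges)
print(degrees["B"], above_mean(degrees))
```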

18.
Principles of protein folding--a perspective from simple exact models.
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse.
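To make exhaustive enumeration concrete, here is a minimal sketch of the two-letter HP lattice model, assuming a 2D square lattice (the studies also use 3D lattices) and the common convention of -1 per non-bonded H-H contact. It enumerates every self-avoiding conformation of a short chain, without removing rotated or reflected copies, and reports the lowest energy and its degeneracy. The function names are hypothetical; because the number of conformations grows exponentially with chain length, this brute-force approach is practical only for short chains.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hp_energy(sequence, coords):
    """HP-model energy: -1 for each non-bonded pair of H residues on adjacent lattice sites."""
    energy = 0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 2, n):  # j = i + 1 is a chain bond, not a contact
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    energy -= 1
    return energy


def enumerate_walks(n):
    """Yield every self-avoiding walk of n sites on the square lattice starting at the origin."""
    def extend(path, occupied):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            step = (x + dx, y + dy)
            if step not in occupied:
                path.append(step)
                occupied.add(step)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(step)

    yield from extend([(0, 0)], {(0, 0)})


def ground_state(sequence):
    """Return (lowest energy, number of conformations with that energy)."""
    best, count = None, 0
    for walk in enumerate_walks(len(sequence)):
        e = hp_energy(sequence, walk)
        if best is None or e < best:
            best, count = e, 1
        elif e == best:
            count += 1
    return best, count


print(ground_state("HPPH"))  # (-1, 8): the eight U-shaped conformations
```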

19.
We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures are. The largest errors occur in regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of its main-chain atoms modeled with an rms deviation of ≈1 Å from the X-ray structure, in large part because such templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein. © 1995 Wiley-Liss, Inc.

20.
The amino acid sequences of soluble, ordered proteins with stable structures have evolved under biological and physical requirements that distinguish them from random sequences. Previous analyses have focused on extracting features that frequently appear in protein substructures, such as α-helices and β-sheets, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity (up to 10 residues apart along the sequence) with their expected occurrences in random sequences. We found that hydrophobic-hydrophobic and polar-polar residue pairs are significantly depleted, whereas pairs of a hydrophobic and a polar residue are enriched. This trend was observed universally, regardless of secondary structure content, but was not observed in protein sequences that include intrinsically disordered regions, indicating that it may be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, both of which are caused by low-complexity regions of hydrophobic or polar residues.
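A minimal sketch of the observed-versus-expected pair count, assuming that the expected count of an ordered pair (a, b) is the total number of counted pairs multiplied by the product of the overall frequencies of a and b, and that all separations from 1 to `max_sep` residues are pooled. The abstract does not specify the exact null model for random sequences or whether separations were analysed separately, so both choices are assumptions; the function name and toy sequence are hypothetical. Ratios above 1 indicate enrichment and below 1 depletion; pairs never observed are simply absent from the result.

```python
from collections import Counter


def pair_enrichment(sequences, max_sep=10):
    """Observed/expected ratios for ordered residue pairs (a, b) with b located
    1..max_sep positions after a in the same sequence."""
    composition = Counter()
    observed = Counter()
    total_pairs = 0
    for seq in sequences:
        composition.update(seq)
        for i, a in enumerate(seq):
            for j in range(i + 1, min(i + max_sep + 1, len(seq))):
                observed[(a, seq[j])] += 1
                total_pairs += 1
    total_residues = sum(composition.values())
    ratios = {}
    for (a, b), count in observed.items():
        # Expected count if residues were placed independently with the observed composition.
        expected = total_pairs * (composition[a] / total_residues) * (composition[b] / total_residues)
        ratios[(a, b)] = count / expected
    return ratios


# Toy alternating sequence of a hydrophobic (L) and a polar (K) residue.
ratios = pair_enrichment(["LKLK"], max_sep=1)
print(ratios)
```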
