Similar literature
1.
Li M, Huang Y, Xiao Y. Proteins, 2008, 72(4): 1161-1170
Proteins with symmetric structures are ideal models to investigate the sequence-structure relations. We investigate proteins with beta-trefoil fold and find they have different degrees of sequence symmetries although they show similar symmetric structures. To understand this, we calculate the strength of interactions of the beta-trefoil folds with surrounding environments and find the low degrees of sequence symmetries are often correlated with large external interactions. Our results give an additional confirmation of Anfinsen's thermodynamic hypothesis that protein structures are not only determined by their sequences but also by their surrounding environments. We suggest the external interactions should be considered additionally in protein structure prediction through ab initio folding.

2.
Determination of the structures of fibroblast growth factors and interleukin-1s has previously revealed that they both adopt a beta-trefoil fold, similar to those found in Kunitz soybean trypsin inhibitors, ricin-like toxins, plant agglutinins and hisactophilin. These families possess distinct functions and occur in different subcellular localisations, and they appear to lack significant similarities in their sequences, ligands and modes of ligand binding. We have analysed the significance of sequence identities observed after structure alignment and provide statistical evidence that these beta-trefoil proteins are all homologues, having arisen from a common ancestor. In addition, we have explored the sequence space of all beta-trefoil proteins and have determined that the actin-binding proteins fascins, and other proteins of unknown function, are beta-trefoil family homologues. Unlike other beta-trefoil proteins, the triplicated repeats in each of the four beta-trefoil domains of fascins are significantly similar in sequence. This hints at how the beta-trefoil fold arose from the duplication of an ancestral gene encoding a homotrimeric single-repeat protein. The combined analysis of structure and sequence databases for detecting significant similarities is suggested as a highly sensitive approach to determining the common ancestry of extremely divergent homologues.

3.
The beta-turn is the most common type of nonrepetitive structure in globular proteins, comprising ~25% of all residues; however, a detailed understanding of effects of specific residues upon beta-turn stability and conformation is lacking. Human acidic fibroblast growth factor (FGF-1) is a member of the beta-trefoil superfold and contains a total of five beta-hairpin structures (antiparallel beta-sheets connected by a reverse turn). beta-Turns related by the characteristic threefold structural symmetry of this superfold exhibit different primary structures, and in some cases, different secondary structures. As such, they represent a useful system with which to study the role that turn sequences play in determining structure, stability, and folding of the protein. Two turns related by the threefold structural symmetry, the beta4/beta5 and beta8/beta9 turns, were subjected to both sequence-swapping and poly-glycine substitution mutations, and the effects upon stability, folding, and structure were investigated. In the wild-type protein these turns are of identical length, but exhibit different conformations. These conformations were observed to be retained during sequence-swapping and glycine substitution mutagenesis. The results indicate that the beta-turn structure at these positions is not determined by the turn sequence. Structural analysis suggests that residues flanking the turn are a primary structural determinant of the conformation within the turn.

4.
The ability to predict structure from sequence is particularly important for toxins, virulence factors, allergens, cytokines, and other proteins of public health importance. Many such functions are represented in the parallel beta-helix and beta-trefoil families. A method using pairwise beta-strand interaction probabilities coupled with evolutionary information represented by sequence profiles is developed to tackle these problems for the beta-helix and beta-trefoil folds. The algorithm BetaWrapPro employs a "wrapping" component that may capture folding processes with an initiation stage followed by processive interaction of the sequence with the already-formed motifs. BetaWrapPro outperforms all previous motif recognition programs for these folds, recognizing the beta-helix with 100% sensitivity and 99.7% specificity and the beta-trefoil with 100% sensitivity and 92.5% specificity, in crossvalidation on a database of all nonredundant known positive and negative examples of these fold classes in the PDB. It additionally aligns 88% of residues for the beta-helices and 86% for the beta-trefoils accurately (within four residues of the exact position) to the structural template, which is then used with the side-chain packing program SCWRL to produce 3D structure predictions. One striking result has been the prediction of an unexpected parallel beta-helix structure for a pollen allergen, and its recent confirmation through solution of its structure. A Web server running BetaWrapPro is available and outputs putative PDB-style coordinates for sequences predicted to form the target folds.
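
For readers unfamiliar with the two figures quoted above, the following minimal sketch shows how sensitivity and specificity are conventionally computed from cross-validation counts. It is not BetaWrapPro code; the function names and the counts in the usage note are illustrative assumptions, since the abstract reports only the resulting percentages.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real fold members that the recognizer accepts."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-members that the recognizer correctly rejects."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical counts chosen only to reproduce the reported shape
    # (100% sensitivity, 92.5% specificity); they are NOT from the paper.
    print(f"sensitivity = {sensitivity(12, 0):.3f}")
    print(f"specificity = {specificity(925, 75):.3f}")
```

With these made-up counts, no true member is missed, while 75 of 1,000 negatives are wrongly accepted.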

5.
Human acidic fibroblast growth factor (FGF-1) is a member of the beta-trefoil hyperfamily and exhibits a characteristic threefold symmetry of the tertiary structure. However, evidence of this symmetry is not readily apparent at the level of the primary sequence. This suggests that while selective pressures may exist to retain (or converge upon) a symmetric tertiary structure, other selective pressures have resulted in divergence of the primary sequence during evolution. Using intra-chain and homologue sequence comparisons for 19 members of this family of proteins, we have designed mutants of FGF-1 that constrain a subset of core-packing residues to threefold symmetry at the level of the primary sequence. The consequences of these mutations regarding structure and stability were evaluated using a combination of X-ray crystallography and differential scanning calorimetry. The mutational effects on structure and stability can be rationalized through the characterization of "microcavities" within the core detected using a 1.0 Å probe radius. The results show that the symmetric constraint within the primary sequence is compatible with a well-packed core and near wild-type stability. However, despite the general maintenance of overall thermal stability, a noticeable increase in non-two-state denaturation follows the increase in primary sequence symmetry. Therefore, properties of folding, rather than stability, may contribute to the selective pressure for asymmetric primary core sequences within symmetric protein architectures.

6.
The sequence and structural analysis of cadherins allow us to find sequence determinants: a few positions in sequences whose residues are characteristic and specific for the structures of a given family. Comparison of the five extracellular domains of classic cadherins showed that they share the same sequence determinants despite only a nonsignificant sequence similarity between the N-terminal domain and other extracellular domains. This allowed us to predict secondary structures and propose three-dimensional structures for these domains that have not been structurally analyzed previously. A new method of assigning a sequence to its proper protein family is suggested: analysis of sequence determinants. The main advantage of this method is that it is not necessary to know all or almost all residues in a sequence as required for other traditional classification tools such as BLAST, FASTA, and HMM. Using the key positions only, that is, residues that serve as the sequence determinants, we found that all members of the classic cadherin family were unequivocally selected from among 80,000 examined proteins. In addition, we proposed a model for the secondary structure of the cytoplasmic domain of cadherins based on the principal relations between sequences and secondary structure multialignments. The patterns of the secondary structure of this domain can serve as the distinguishing characteristics of cadherins.
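
The idea of classifying by key positions only can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' method: the determinant table below (positions and allowed residues) is entirely hypothetical, and a real analysis would first align the sequence to the family so that positions are comparable.

```python
def matches_determinants(sequence: str, determinants: dict[int, str]) -> bool:
    """Return True if every key position holds one of its allowed residues.

    `determinants` maps a 0-based position to a string of allowed
    one-letter residue codes. All other positions are ignored, so the
    rest of the sequence does not need to be known.
    """
    return all(
        pos < len(sequence) and sequence[pos] in allowed
        for pos, allowed in determinants.items()
    )


if __name__ == "__main__":
    # Hypothetical determinants, for illustration only.
    toy_determinants = {0: "D", 2: "PA", 5: "N"}
    print(matches_determinants("DWPXXN", toy_determinants))
    print(matches_determinants("DWGXXN", toy_determinants))
```

Unknown residues (written `X` here) at non-determinant positions do not affect the result, which is the advantage the abstract highlights over whole-sequence comparison.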

7.
An alternative core packing group, involving a set of five positions, has been introduced into human acidic FGF-1. This alternative group was designed so as to constrain the primary structure within the core region to the same threefold symmetry present in the tertiary structure of the protein fold (the beta-trefoil superfold). The alternative core is essentially indistinguishable from the WT core with regard to structure, stability, and folding kinetics. The results show that the beta-trefoil superfold is compatible with a threefold symmetric constraint on the core region, as might be the case if the superfold arose as a result of gene duplication/fusion events. Furthermore, this new core arrangement can form the basis of a structural "building block" that can greatly simplify the de novo design of beta-trefoil proteins by using symmetric structural complementarity. Remaining asymmetry within the core appears to be related to asymmetry in the tertiary structure associated with receptor and heparin binding functionality of the growth factor.

8.
Based on previous studies of interleukin-1beta (IL-1beta) and both acidic and basic fibroblast growth factors (FGFs), it has been suggested that the folding of beta-trefoil proteins is intrinsically slow and may occur via the formation of essential intermediates. Using optical and NMR-detected quenched-flow hydrogen/deuterium exchange methods, we have measured the folding kinetics of hisactophilin, another beta-trefoil protein that has < 10% sequence identity and unrelated function to IL-1beta and FGFs. We find that hisactophilin can fold rapidly and with apparently two-state kinetics, except under the most stabilizing conditions investigated where there is evidence for formation of a folding intermediate. The hisactophilin intermediate has significant structural similarities to the IL-1beta intermediate that has been observed experimentally and predicted theoretically using a simple, topology-based folding model; however, it appears to be different from the folding intermediate observed experimentally for acidic FGF. For hisactophilin and acidic FGF, intermediates are much less prominent during folding than for IL-1beta. Considering the structures of the different beta-trefoil proteins, it appears that differences in nonconserved loops and hydrophobic interactions may play an important role in differential stabilization of the intermediates for these proteins.

9.
10.
Crystal structures of the bacterial multidrug transporter AcrB in R32 and C2 space groups showing both symmetric and asymmetric trimeric assemblies, respectively, supplemented with biochemical investigations, have provided most of the structural basis for a molecular level understanding of the protein structure and mechanisms for substrate uptake and translocation carried out by this 114-kDa inner membrane protein. They suggest that AcrB captures ligands primarily from the periplasm. Substrates can also enter the inner cavity of the transporter from the cytoplasm, but the exact mechanism of this remains undefined. Analysis of the amino acid sequences of AcrB and its homologs revealed the presence of conserved residues at the N-terminus including two phenylalanines which may be exposed to the cytoplasm. Any potential role that these conserved residues may play in function has not been addressed by existing biochemical or structural studies. Since phenylalanine residues elsewhere in the protein have been implicated in ligand binding, we explored the structure of this N-terminal region to investigate structural determinants near the cytoplasmic opening that may mediate drug uptake. Our structure of AcrB in R32 space group reveals an N-terminus loop, reducing the diameter of the central opening to approximately 15 Å as opposed to the previously reported value of approximately 30 Å for crystal structures in this space group with disordered N-terminus. Recent structures of the AcrB in C2 space group have revealed a helical conformation of this N-terminus but have not discussed its possible implications. We present the crystal structure of AcrB that reveals the structure of the N-terminus containing the conserved residues. We hope that the structural information provides a structural basis for others to design further biochemical investigation of the role of this portion of AcrB in mediating cytoplasmic ligand discrimination and uptake.

11.
Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin, can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and if they do what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem co-factor.
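
The two "buried spine" patterns quoted above (i, i+3, i+7, i+10, ... and i, i+4, i+7, i+11, ...) are simply alternating steps of 3 and 4 residues along a helix. The sketch below enumerates such sites; the function name and parameters are hypothetical and serve only to make the pattern concrete.

```python
from itertools import cycle


def spine_positions(start: int, steps: tuple[int, ...], count: int) -> list[int]:
    """List `count` helix sites beginning at `start`, repeating `steps`.

    steps=(3, 4) gives i, i+3, i+7, i+10, ...
    steps=(4, 3) gives i, i+4, i+7, i+11, ...
    """
    if count <= 0:
        return []
    positions = [start]
    for step in cycle(steps):
        if len(positions) >= count:
            break
        positions.append(positions[-1] + step)
    return positions


if __name__ == "__main__":
    print(spine_positions(0, (3, 4), 4))
    print(spine_positions(0, (4, 3), 4))
```

Both patterns advance seven residues per two steps, which keeps the listed sites on one face of the helix.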

12.
Based on the recently determined X-ray structures of Torpedo californica acetylcholinesterase and Geotrichum candidum lipase and on their three-dimensional superposition, an improved alignment of a collection of 32 related amino acid sequences of other esterases, lipases, and related proteins was obtained. On the basis of this alignment, 24 residues are found to be invariant in 29 sequences of hydrolytic enzymes, and an additional 49 are well conserved. The conservation in the three remaining sequences is somewhat lower. The conserved residues include the active site, disulfide bridges, salt bridges, and residues in the core of the proteins. Most invariant residues are located at the edges of secondary structural elements. A clear structural basis for the preservation of many of these residues can be determined from comparison of the two X-ray structures.

13.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions. Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

15.
Intrinsic disorder in the Protein Data Bank   Cited by: 2 (self-citations: 0; citations by others: 2)
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins is shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues, which are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains. We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Non-observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are really ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino-acids and approximately 40% of the proteins possess short regions (≥10 and <30 amino-acid long) of missing and ambiguous residues.
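
The length classes in the last sentence can be made concrete with a short sketch that finds runs of consecutive missing or ambiguous residues in a per-residue annotation string. The one-letter status codes (`O` observed, `N` not observed, `A` ambiguous) and the function names are assumptions made for this illustration; they are not the study's data format.

```python
from itertools import groupby


def missing_runs(status: str, missing: str = "NA") -> list[tuple[int, int]]:
    """Return (start, length) for each run of consecutive missing residues."""
    runs = []
    pos = 0
    for is_missing, group in groupby(status, key=lambda code: code in missing):
        length = sum(1 for _ in group)
        if is_missing:
            runs.append((pos, length))
        pos += length
    return runs


def classify_run(length: int) -> str:
    """Apply the abstract's thresholds: long is >30, short is >=10 and <30."""
    if length > 30:
        return "long"
    if 10 <= length < 30:
        return "short"
    return "other"  # shorter than 10, or exactly 30 (covered by neither class)


if __name__ == "__main__":
    # Toy annotation: 5 observed, 12 missing, 3 observed, 35 missing, 2 observed.
    toy = "O" * 5 + "N" * 12 + "O" * 3 + "A" * 20 + "N" * 15 + "O" * 2
    for start, length in missing_runs(toy):
        print(start, length, classify_run(length))
```

Adjacent ambiguous and not-observed residues are merged into a single run here, matching the abstract's wording "consecutive missing or ambiguous residues".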

16.
A detailed comparison of the structures of aspartate aminotransferase, alanine racemase, the beta subunit of tryptophan synthase, D-amino acid aminotransferase and glycogen phosphorylase has revealed more extensive structural similarities among pyridoxal phosphate (PLP)-binding domains in these enzymes than was observed previously. These similarities consist of seven common structural segments of the polypeptide chain, which form an extensive common structural organization of the backbone chain responsible for the appropriate disposition of key residues, some from the aligned fragments and some from variable loops joined to these fragments, interacting with PLPs in these enzymes. This common structural organization contains an analogous hydrophobic minicore formed from four amino acid side chains present in the two most conserved structural elements. In addition, equivalent alpha-beta-alpha-beta supersecondary structures are formed by these seven fragments in three of the five structures: alanine racemase, tryptophan synthase and glycogen phosphorylase. Despite these similarities, it is generally accepted that these proteins do not share a common heritage, but have arisen on five separate occasions. The common and contiguous alpha-beta-alpha-beta structure accounts for only 28 residues and all five enzymes differ greatly in both the orientation of the PLP pyridoxal rings and their contacts with residues close to the common structural elements.

17.
The atomic-level structural properties of proteins, such as bond lengths, bond angles, and torsion angles, have been well studied and understood based on either chemistry knowledge or statistical analysis. Similar properties on the residue-level, such as the distances between two residues and the angles formed by short sequences of residues, can be equally important for structural analysis and modeling, but these have not been examined and documented on a similar scale. While these properties are difficult to measure experimentally, they can be statistically estimated in meaningful ways based on their distributions in known protein structures. Residue-level structural properties including various types of residue distances and angles are estimated statistically. A software package is built to provide direct access to the statistical data for the properties including some important correlations not previously investigated. The distributions of residue distances and angles may vary with varying sequences, but in most cases, are concentrated in some high probability ranges, corresponding to their frequent occurrences in either α-helices or β-sheets. Strong correlations among neighboring residue angles, similar to those between neighboring torsion angles at the atomic-level, are revealed based on their statistical measures. Residue-level statistical potentials can be defined using the statistical distributions and correlations of the residue distances and angles. Ramachandran-like plots for strongly correlated residue angles are plotted and analyzed. Their applications to structural evaluation and refinement are demonstrated. With the increase in both number and quality of known protein structures, many structural properties can be derived from sets of protein structures by statistical analysis and data mining, and these can even be used as a supplement to the experimental data for structure determinations. Indeed, the statistical measures on various types of residue distances and angles provide more systematic and quantitative assessments on these properties, which can otherwise be estimated only individually and qualitatively. Their distributions and correlations in known protein structures show their importance for providing insights into how proteins may fold naturally to various residue-level structures.
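
As a rough illustration of what a residue-level distance and angle are, the sketch below computes them from three points standing in for consecutive residue positions (for example Cα atoms, which is an assumption; the abstract does not say which representative atom is used). This is not the software package mentioned above, and the coordinates in the usage note are arbitrary.

```python
import math

Point = tuple[float, float, float]


def residue_distance(a: Point, b: Point) -> float:
    """Euclidean distance between two residue positions."""
    return math.dist(a, b)


def residue_angle(a: Point, b: Point, c: Point) -> float:
    """Angle in degrees at residue b formed by residues a, b and c."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Arbitrary coordinates, for illustration only.
    p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
    print(residue_distance(p1, p2), residue_angle(p1, p2, p3))
```

Collecting such values over many known structures and histogramming them would give the kind of distribution the abstract describes.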

18.
Symmetry, and in particular point group symmetry, is generally the rule for the global arrangement between subunits in homodimeric and other oligomeric proteins. The structures of fragments of tropomyosin and bovine fibrinogen are recently published examples, however, of asymmetric interactions between chemically identical chains. Their departures from strict twofold symmetry are based on simple and generalizable chemical designs, but were not anticipated prior to their structure determinations. The current review aims to improve our understanding of the structural principles and functional consequences of asymmetric interactions in proteins. Here, a survey of >100 diverse homodimers has focused on the structures immediately adjacent to the twofold axis. Five regular frameworks in alpha-helical coiled coils and antiparallel beta-sheets accommodate many of the twofold symmetric axes. On the basis of these frameworks, certain sequence motifs can break symmetry in geometrically defined manners. In antiparallel beta-sheets, these asymmetries include register slips between strands of repeating residues and the adoption of different side-chain rotamers to avoid steric clashes of bulky residues. In parallel coiled coils, an axial stagger between the alpha-helices is produced by clusters of core alanines. Such simple designs lead to a basic understanding of the functions of diverse proteins. These functions include regulation of muscle contraction by tropomyosin, blood clot formation by fibrin, half-of-site reactivity of caspase-9, and adaptive protein recognition in the matrix metalloproteinase MMP9. Moreover, asymmetry between chemically identical subunits, by producing multiple equally stable conformations, leads to unique dynamic and self-assembly properties.

19.
20.
Extensive sequence data and structural sampling of expressed proteins from different species lead to the idea that entire molecules or specific domain folds belong to large superfamilies of proteins. A subset of G protein-coupled receptors, one of the largest families involved in cellular signaling, rod and cone opsins are involved in phototransduction in photoreceptor cells. Here, the evolutionary analysis of opsin sequences and structures predicts key residues involved in the transmission of the signal from the binding site of the chromophore to the cytoplasmic surface and residues that are involved in the spectral tuning of opsins to short wavelengths of light.
