Similar literature
Found 20 similar articles (search time: 31 ms)
1.
Membership in a protein domain database does not a domain make, a fact we realized when generating a consensus view of protein fold space with our consensus domain dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1,695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predictions suggest that 40% of proteins have either local or global disorder. One thing is clear: filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets.

2.
3.
Qi Y, Grishin NV. Proteins, 2005, 58(2):376-388
Protein structure classification is necessary to comprehend the rapidly growing structural data for better understanding of protein evolution and sequence-structure-function relationships. Thioredoxins are important proteins that ubiquitously regulate cellular redox status and various other crucial functions. We define the thioredoxin-like fold using the structure consensus of thioredoxin homologs and consider all circular permutations of the fold. The search for thioredoxin-like fold proteins in the PDB database identified 723 protein domains. These domains are grouped into eleven evolutionary families based on combined sequence, structural, and functional evidence. Analysis of the protein-ligand structure complexes reveals two major active site locations for the thioredoxin-like proteins. Comparison to existing structure classifications reveals that our thioredoxin-like fold group is broader and more inclusive, unifying proteins from five SCOP folds, five CATH topologies and seven DALI domain dictionary globular folding topologies. Considering these structurally similar domains together sheds new light on the relationships between sequence, structure, function and evolution of thioredoxins.

4.
We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web.

5.
Although many naturally occurring proteins consist of multiple domains, most studies on protein folding to date deal with single-domain proteins or isolated domains of multi-domain proteins. Studies of multi-domain protein folding are required for further advancing our understanding of protein folding mechanisms. Borrelia outer surface protein A (OspA) is a β-rich two-domain protein, in which two globular domains are connected by a rigid and stable single-layer β-sheet. Thus, OspA is particularly suited as a model system for studying the interplay of domains in protein folding. Here, we studied the equilibria and kinetics of the urea-induced folding–unfolding reactions of OspA probed with tryptophan fluorescence and ultraviolet circular dichroism. Global analysis of the experimental data revealed compelling lines of evidence for accumulation of an on-pathway intermediate during kinetic refolding and for the identity between the kinetic intermediate and a previously described equilibrium unfolding intermediate. The results suggest that the intermediate has the fully native structure in the N-terminal domain and the single-layer β-sheet, with the C-terminal domain still unfolded. The observation of the productive on-pathway folding intermediate clearly indicates substantial interactions between the two domains mediated by the single-layer β-sheet. We propose that a rigid and stable intervening region between two domains creates an overlap between two folding units and can energetically couple their folding reactions.

6.
It is known that larger globular proteins are built from domains, relatively independent structural units. Domain size seems to be limited, and a single domain consists of a few tens to a couple of hundred amino acids. Based on Monte Carlo simulations of a reduced protein model restricted to the face centered simple cubic lattice, with a minimal set of short-range and long-range interactions, we have shown that some model sequences upon the folding transition spontaneously divide into separate domains. The observed domain sizes closely correspond to the sizes of real protein domains. Short chains with a proper sequence pattern of the hydrophobic and polar residues undergo a two-state folding transition to the structurally ordered globular state, while similar longer sequences follow a multistate transition. Homopolymeric (uniformly hydrophobic) chains and random heteropolymers undergo a continuous collapse transition into a single globule, and the globular state is much less ordered. Thus, the factors responsible for the multidomain structure of proteins are a sufficiently long polypeptide chain and characteristic, protein-like sequence patterns. These findings provide some hints for the analysis of real sequences aimed at prediction of the domain structure of large proteins.

7.
Multidomain proteins are the product of evolutionary selection for diversity of function through concatenation and repurposing of existing modular units of structure. In structures of proteins with multiple domains, components are often globular units stitched together with flexible linkers. Multidomain proteins often fold as multiple distinct order–disorder transitions. However, the relationship between structure and folding is not always straightforward. Tropomyosin binds to actin in muscle and cytoskeletal filaments. The structure is that of a continuous α-helix lacking domain boundaries, but unfolding shows distinct transitions, suggesting that at least three possible domains exist. To explore how domains might occur in a continuous structure, we used Lifson-Roig helix-coil models with sequence domains of varying helical nucleation propensities. Of these models, ones with a central folding insulator, separating folding of N- and C-terminal domains, are most consistent with experimental folding studies. The positions of domain boundaries are identified by hydrogen–deuterium exchange mass spectrometry. The presence of structurally cryptic folding domains in tropomyosin could relate to its evolution and explain the uneven distribution of deleterious mutations that lead to various cardiomyopathies.

8.
The domain is a fundamental unit of protein structure. Numerous studies have analyzed folding patterns in protein domains of known structure to gain insight into the underlying protein folding process. Are such patterns a haphazard assortment or are they similar to sentences in a language, which can be generated by an underlying grammar? Specifically, can a small number of intuitively sensible rules generate a large class of folds, including feasible new folds? In this paper, we explore the extent to which four simple rules can generate the known all-β folds, using tools from graph theory. As a control, an exhaustive set of β-sandwiches was tested and found to be largely incompatible with such a grammar. The existence of a protein grammar has potential implications for both the mechanism of folding and the evolution of domains.

9.
Although our understanding of globular protein folding continues to advance, the irregular tertiary structures and high cooperativity of globular proteins complicate energetic dissection. Recently, proteins with regular, repetitive tertiary structures have been identified that sidestep limitations imposed by globular protein architecture. Here we review recent studies of repeat-protein folding. These studies uniquely advance our understanding of both the energetics and kinetics of protein folding. Equilibrium studies provide detailed maps of local stabilities, access to energy landscapes, insights into cooperativity, determination of nearest-neighbor interaction parameters using statistical thermodynamics, and relationships between consensus sequences and repeat-protein stability. Kinetic studies provide insight into the influence of short-range topology on folding rates, the degree to which folding proceeds by parallel (versus localized) pathways, and the factors that select among multiple potential pathways. The recent application of force spectroscopy to repeat-protein unfolding is providing a unique route to test and extend many of these findings.

10.
Domains are the structural, functional, and evolutionary components of proteins. Most folding studies to date have concentrated on the folding of single domains, but more than 70% of human proteins contain more than one domain, and interdomain interactions can affect both the stability and the folding kinetics. Whether the folding pathway is altered by interdomain interactions is not yet known. Here we investigated the effect of a folded neighbouring domain on the folding pathway of spectrin R16 (the 16th α-helical repeat from chicken brain α-spectrin) by using the two-domain construct R1516. R16 folds faster and unfolds more slowly in the presence of its folded neighbour R15 (the 15th α-helical repeat from chicken brain α-spectrin). An extensive Φ-value analysis of the R16 domain in R1516 was completed to compare the transition state of the R16 domain alone with that of the R16 domain in a multidomain construct. The results indicate that the folding pathways are the same. This result validates the current approach of breaking up larger proteins into domains for the study of protein folding pathways.
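For readers unfamiliar with Φ-value analysis, the following is a minimal sketch of the standard textbook definition, not the authors' own procedure. It assumes that Φ is the ratio of the mutation-induced change in transition-state free energy (relative to the denatured state) to the change in equilibrium stability, with destabilization counted as positive. The function name, argument names, and the default temperature are hypothetical choices made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); units are an assumption of this sketch.
R_KCAL = 1.987e-3


def phi_value(kf_wt, kf_mut, ddg_eq, temperature=298.0):
    """Estimate a folding phi-value from refolding rates.

    kf_wt, kf_mut : folding rate constants of wild type and mutant (same units)
    ddg_eq        : loss of equilibrium stability on mutation, in kcal/mol
                    (positive when the mutant is less stable; assumed convention)
    """
    if ddg_eq == 0:
        raise ValueError("phi is undefined when the mutation does not change stability")
    # Change in the denatured-to-transition-state barrier caused by the mutation.
    ddg_ts = R_KCAL * temperature * math.log(kf_wt / kf_mut)
    return ddg_ts / ddg_eq
```

Under these assumptions, a value near 1 suggests the mutated site is as structured in the transition state as in the native state, and a value near 0 suggests it is as unstructured as in the denatured state. Comparing such values for the same mutations in the isolated domain and in the two-domain construct is the kind of comparison the abstract describes.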

11.
Although progress has been made to determine the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion is not dependent on extensive curve-fitting or perturbation of protein structure. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule.

12.
The identification of protein domains within multi-domain proteins is a persistent problem. Here, we describe an experimental method (shotgun proteolysis) based on random DNA fragmentation and protease selection of the encoded polypeptides on phage for this purpose. We applied the method to the Escherichia coli genome and identified 124 protease-resistant fragments; several were re-cloned for expression as soluble fragments in bacteria, and corresponded to autonomously folding units with folding energies similar to natural protein domains (ΔG_u = 3.8-6.6 kcal/mol). Structural information was available for approximately half of the selected proteins, which corresponded to compact, globular and domain-sized units that had been derived from a wide range of protein superfamilies. Furthermore, boundaries of the selected fragments correlated with domain boundaries as defined by bioinformatics predictions (R² = 0.82; p = 0.016). However, predictions were incomplete or entirely lacking for the remaining fragments, reflecting the limited proteome coverage of current bioinformatics methods. Shotgun proteolysis therefore provides a means to identify domains and other autonomously folding units on a genome-wide scale, without any prior knowledge of sequence or structure. Shotgun proteolysis should be particularly valuable for structural studies of proteins and represents a high-throughput alternative to the classical limited proteolysis method for the isolation of stable components of multi-domain proteins.

13.
Folding intermediates have been detected and characterized for many proteins. However, their structures at atomic resolution have only been determined for two small single-domain proteins: Rd-apocytochrome b562 and engrailed homeodomain. T4 lysozyme has two easily distinguishable but energetically coupled domains: the N- and C-terminal domains. An early native-state hydrogen exchange experiment identified an intermediate with the C-terminal domain folded and the N-terminal domain unfolded. We have used a native-state hydrogen exchange-directed protein engineering approach to populate this intermediate and demonstrated that it is on the folding pathway and exists after the rate-limiting step. Here, we determined its high-resolution structure and the backbone dynamics by multi-dimensional NMR methods. We also characterized the folding behavior of the intermediate using stopped-flow fluorescence, protein engineering, and native-state hydrogen exchange. Unlike the folding intermediates of the two single-domain proteins, which have many non-native side-chain interactions, the structure of the hidden folding intermediate of T4 lysozyme is largely native-like. It folds like many small single-domain proteins. These results have implications for understanding the folding mechanism and evolution of multi-domain proteins.

14.

Background

As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains?

Results

To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To account for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database.

Conclusion

The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output in the future should further contribute to the reduction of this bias. The total numbers of structural superfamilies and folds in the COG database are estimated as ~4000 and ~1700, respectively. These numbers are respectively four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.

15.
Recognition of protein fold from amino acid sequence is a challenging task. The structure and stability of proteins from different folds are mainly dictated by inter-residue interactions. In our earlier work, we successfully used the medium- and long-range contacts for predicting protein folding rates, discriminating globular and membrane proteins, and distinguishing protein structural classes. In this work, we analyze the role of inter-residue interactions in commonly occurring folds of globular proteins in order to understand their folding mechanisms. In the medium-range contacts, the globin fold and four-helical bundle proteins have more contacts than the DNA-RNA fold, although they all belong to the all-α class. In long-range contacts, only the ribonuclease fold prefers the 4-10 range, and the other folding types prefer the 21-30 range in α/β class proteins. Further, the preferred residues and residue pairs influenced by these different folds are discussed. The information about the preference of medium- and long-range contacts exhibited by the 20 amino acid residues can be effectively used to predict the folding type of each protein.
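The contact classification discussed above can be illustrated with a small sketch. The abstract does not state the distance cutoff or the exact sequence-separation boundaries, so the values below are assumptions chosen for illustration only: one representative atom per residue, an 8 Å cutoff, "short" for separations of 1-2 residues, "medium" for 3-4, and "long" for more than 4. The function names are hypothetical.

```python
import math


def contact_separations(coords, cutoff=8.0):
    """Return the sequence separation |i - j| of every residue pair in contact.

    coords : list of (x, y, z) tuples, one representative atom per residue
    cutoff : contact distance in angstroms (assumed value, not from the abstract)
    """
    separations = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                separations.append(j - i)
    return separations


def classify_contacts(separations, medium_range=(3, 4)):
    """Count contacts as short-, medium-, or long-range by sequence separation."""
    counts = {"short": 0, "medium": 0, "long": 0}
    for sep in separations:
        if sep < medium_range[0]:
            counts["short"] += 1
        elif sep <= medium_range[1]:
            counts["medium"] += 1
        else:
            counts["long"] += 1
    return counts


# Toy hairpin: six residues, 3.8 angstroms apart, folded back on itself.
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
           (7.6, 3.8, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
hairpin_counts = classify_contacts(contact_separations(hairpin))
```

Binning the long-range separations further (for example 4-10 or 21-30 residues, as in the abstract) would only require a histogram over the values returned by `contact_separations`.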

16.
The score matrix from a structure comparison program (SAP) was used to search for repeated structures using a Fourier analysis. When tested with artificial data, a simple Fourier transform of the smoothed matrix provided a clear signal of the repeat periodicity that could be used to extract the repeating units with the SAP program. The strength of the Fourier signal was calibrated against the signal from model proteins. The most useful of these was the novel random-walk approach employed to generate realistic 'fake' structures. On the basis of these it was possible to conclude that only a small proportion of protein structures have an unexpected degree of symmetry. Artificially generated 'ideal' folds provided an upper limit on the strength of signal that could be expected from a 'perfectly' repeating compact structure. Unexpectedly, some of the very regular β-propeller folds attained the same strength, but the majority of symmetric structures lay below this region. When native proteins were ranked by the power of their spectrum, a wide variety of fold types were seen to score highly. In the βα class, these included the globular βα proteins and the more repetitive leucine-rich βα folds. In the all-β class, β-propellers, β-prisms and β-helices were found, as well as the more globular γ-crystallin domains. When this ranked list was filtered to remove proteins that contained detectable internal sequence similarity (using the program REPRO), the list became exclusively composed of just globular βα class proteins, and in the top 50 re-ranked proteins only a single 4-fold propeller structure remained.
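The idea of reading a repeat period off a Fourier spectrum can be sketched as follows. This is a simplified stand-in, not the procedure used with SAP: it assumes a square self-comparison score matrix, collapses it to a one-dimensional profile by averaging each off-diagonal (a swapped-in simplification in place of the smoothing the abstract mentions), and takes the strongest non-zero frequency of a plain discrete Fourier transform. All function names are hypothetical.

```python
import cmath
import math


def diagonal_profile(matrix, max_offset):
    """Average score along each off-diagonal k = 0 .. max_offset - 1."""
    n = len(matrix)
    profile = []
    for k in range(max_offset):
        values = [matrix[i][i + k] for i in range(n - k)]
        profile.append(sum(values) / len(values))
    return profile


def power_spectrum(signal):
    """Power of a naive DFT of the mean-centred signal, frequencies 0 .. n // 2."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]
    spectrum = []
    for f in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(centred))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def repeat_period(matrix):
    """Estimate the repeat length (in residues) from the strongest frequency."""
    max_offset = len(matrix) // 2  # keep enough residue pairs on every diagonal
    power = power_spectrum(diagonal_profile(matrix, max_offset))
    peak = max(range(1, len(power)), key=power.__getitem__)  # skip the zero frequency
    return max_offset / peak
```

In such a scheme the height of the peak, relative to peaks obtained from random or idealized model structures, would play the role of the calibrated signal strength described in the abstract.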

17.
Compactness has been used to locate discontinuous structural units containing one or more polypeptide chains in proteins of known structure. Rather than exhaustively calculating the compactness of all possible units, our procedure uses a screening algorithm to find discontinuous regions that are potentially compact. Precise calculations of compactness are restricted only to units in these regions. With our procedure, compactness can be used to discover discontinuous domains with virtually any number of disjoint peptides. Small, single-domain proteins may contain several compact regions: thus, compact regions do not always correspond to folding domains. Because a domain is an independent folding unit and should contain a hydrophobic core, compact units were further examined for the presence of hydrophobic clusters (Zehfus MH, 1995, Protein Sci 4:1188-1202). This added constraint limits the number of acceptable units and helps greatly in the location of the true structural domains. The larger hydrophobically stabilized compact units correspond to domains, while the smaller units may correspond to folding intermediates.

18.
The Dynameomics project aims to simulate a representative sample of all globular protein metafolds under both native and unfolding conditions. We have identified protein unfolding transition state (TS) ensembles from multiple molecular dynamics simulations of high-temperature unfolding in 183 structurally distinct proteins. These data can be used to study individual proteins and individual protein metafolds and to mine for TS structural features common across all proteins. Separating the TS structures into four different fold classes (all proteins, all-α, all-β, and mixed α/β and α + β) resulted in no significant difference in the overall protein properties. The residues with the most contacts in the native state lost the most contacts in the TS ensemble. On average, residues beginning in an α-helix maintained more structure in the TS ensemble than did residues starting in β-strands or any other conformation. The metafolds studied here represent 67% of all known protein structures, and this is, to our knowledge, the largest, most comprehensive study of the protein folding/unfolding TS ensemble to date. One might have expected broad distributions in the average global properties of the TS relative to the native state, indicating variability in the amount of structure present in the TS. Instead, the average global properties converged with low standard deviations across metafolds, suggesting that there are general rules governing the structure and properties of the TS.

19.
We have devised several mechanical models of globular proteins by approximating them to various polyhedra (dodecahedron, truncated octahedron, icosahedron, truncated icosahedron). The models comprise hollow blocks linked together in a flexible chain. Between blocks there is a set of several reversible, weak magnetic interactions such that when the chain is agitated, it will fold into a stable polyhedral structure about the size of a hand. Folding may be followed in real time with a video camera. Key to the success of the folding process is the lightness of the chain. Several side chains may also be added to the blocks such that they come together to create a polyhedral core when the chain folds. The models have a number of similarities to globular proteins: each chain folds into a unique, but dynamic, three-dimensional structure; the instructions that determine this structure are built into the configuration of blocks; and it is difficult to predict this structure given the unfolded block configuration. Furthermore, the chains fold quickly, generally in less than a minute, several pathways are involved, and these pathways progress through elements of "native" structure. In particular, the models emphasize the importance of restricted conformational mobility in assisting the chain to fold, and also in eliminating undesirable interactions. Because of these similarities to globular proteins, we believe that the polyhedral models will, with continued development, be helpful in understanding the protein folding process, while at the same time acting as valuable educational visual aids. They might also inspire the construction of new types of microscopic, self-assembling devices.

20.
Hierarchic organization of domains in globular proteins (cited 16 times in total: 0 self-citations, 16 by others)
An automatic procedure is developed for the identification of domains in globular proteins from X-ray elucidated co-ordinates. Using this tool, domains are shown to be iteratively decomposable into subdomains, leading to a hierarchic molecular architecture. There is no convenient geometry that will fully characterize the atom by atom interdigitation at an interface between domains, and the strategy adopted here was devised to reduce this unwieldy three-dimensional problem to a closely approximating companion analysis in a plane. These analytically derived domain choices can be used subsequently to construct computer-generated, space-filling, color-coded views of the domains; and when this is done, the derived domains are seen to be completely resolved. The number of domains in a protein is a mathematically well-behaved function of the chain length, lending support to the supposition that the domains are an implicit structural consequence of the folding process. A spectrum of domains ranging in size from whole protein monomers to the individual units of secondary structure is apparent in each of the 22 proteins analyzed here. The hierarchic organization of structural domains is evidence in favor of an underlying protein folding process that proceeds by hierarchic condensation. In this highly constrained model, every pathway leading to the native state can be described by a tree of local folding interactions.
