Similar articles
 20 similar articles found (search time: 15 ms)
1.
In this work, we discovered a fundamental connection between selection for protein stability and the emergence of preferred structures of proteins. Using a standard exact three-dimensional lattice model, we evolve sequences starting from random ones and determine the exact native structure after each mutation. Acceptance of mutations is biased to select for stable proteins. We found that certain structures, "wonderfolds", are independently discovered numerous times as native states of stable proteins in many unrelated runs of selection. The strong dependence of lattice fold usage on the structural determinant of designability quantitatively reproduces uneven fold usage in natural proteins. Diversity of sequences that fold into wonderfold structures gives rise to superfamilies, i.e. sets of dissimilar sequences that fold into the same or very similar structures. The present work establishes a model of pre-biotic structure selection, which identifies dominant structural patterns emerging upon optimization of proteins for survival in a hot environment. Convergently discovered pre-biotic initial superfamilies with wonderfold structures could have served as a seed for subsequent biological evolution involving gene duplications and divergence.

2.
Background: A problem concerning unique protein folding was raised in 1998: are there proteins having unique optimal foldings for all lengths in the hydrophobic-hydrophilic (hydrophobic-polar; HP) model? In response, it was proved that on a square lattice there are (i) closed chains of monomers having unique optimal foldings for all even lengths and (ii) open monomer chains having unique optimal foldings for all lengths divisible by four. In this article, we aim to extend the previous work on a square lattice to the optimal foldings of proteins on a triangular lattice by examining the uniqueness property or stability of HP chain folding. Method: We consider this protein folding problem on a triangular lattice using graph theory. For an HP chain with length n > 13, it is generally very time-consuming to enumerate all of its possible folding conformations. Hence, one can hardly know whether or not it has a unique optimal folding. A natural problem is to determine for what values of n there is an n-node HP chain that has a unique optimal folding on a triangular lattice. Results and conclusion: Using graph theory, this article proves that there are both closed and open chains having unique optimal foldings for all lengths > 19 on a triangular lattice. This result is not only general from the theoretical viewpoint, but can also be expected to apply to areas of protein structure prediction and protein design because of their close relationship with the concept of energy state and designability.
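
The enumeration that the abstract above describes as very time-consuming for n > 13 can be illustrated for very short chains. The following is a minimal sketch, not the article's graph-theoretic method: it assumes axial coordinates for the triangular lattice, scores a conformation by the number of H-H contacts between monomers that are not chain neighbours, and fixes the first bond so that rotated copies are not counted (mirror images still are). The function names are hypothetical.

```python
# Six neighbour directions of the triangular lattice in axial coordinates.
DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hh_contacts(seq, path):
    """Count H-H pairs that are lattice neighbours but not chain neighbours."""
    index_at = {site: i for i, site in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "H":
            continue
        for dx, dy in DIRS:
            j = index_at.get((x + dx, y + dy))
            # j > i + 1 counts each pair once and skips bonded neighbours.
            if j is not None and j > i + 1 and seq[j] == "H":
                count += 1
    return count


def best_foldings(seq):
    """Return (max contacts, number of walks reaching it) by exhaustive search.

    The first bond is fixed to (1, 0), which removes rotated copies only.
    """
    if len(seq) < 2:
        return (0, 1)
    best = [-1, 0]  # [best contact count, number of optimal walks]

    def extend(path, occupied):
        if len(path) == len(seq):
            contacts = hh_contacts(seq, path)
            if contacts > best[0]:
                best[0], best[1] = contacts, 1
            elif contacts == best[0]:
                best[1] += 1
            return
        x, y = path[-1]
        for dx, dy in DIRS:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                occupied.add(nxt)
                path.append(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0), (1, 0)], {(0, 0), (1, 0)})
    return (best[0], best[1])
```

The number of walks grows roughly by a factor of five per added monomer, which is why exhaustive search of this kind is only practical for short chains.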

3.
Olson MA, Yeh IC, Lee MS. Biopolymers 2008, 89(2): 153-159
Many realistic protein-engineering design problems extend beyond the computational limits of what is considered practical when applying all-atom molecular-dynamics simulation methods. Lattice models provide computationally robust alternatives, yet most are regarded as too simplistic to accurately capture the details of complex designs. We revisit a coarse-grained lattice simulation model and demonstrate that a multiresolution modeling approach of reconstructing all-atom structures from lattice chains is of sufficient accuracy to resolve the comparability of sequence-structure modifications of the ricin A-chain (RTA) protein fold. For a modeled structure, the unfolding-folding transition temperature was calculated from the heat capacity using either the potential energy from the lattice model or the all-atom CHARMM19 force field plus a generalized Born solvent approximation. We found that, despite the low-resolution modeling of conformational states, the potential energy functions were capable of detecting the relative change in the thermodynamic transition temperature that distinguishes between a protein design and the native RTA fold, in excellent accord with reported experimental studies of thermal denaturation. A discussion is provided of different sequences fitted to the RTA fold and a possible unfolding model.

4.
The hydrophobic interaction is the main driving force for protein folding. Here, we address the question of what is the optimal fraction, $f$, of hydrophobic (H) residues required to ensure protein collapse. For very small $f$ (say $f < 0.1$), the protein chain is expected to behave as a random coil, where the H residues are "wrapped" locally by polar (P) residues. However, for large enough $f$ this local coverage cannot be achieved and the thermodynamic alternative to avoid contact with water is burying the H residues in the interior of a compact chain structure. The interior also contains P residues that are known to be clustered to optimize their electrostatic interactions. This means that the H residues are clustered as well, i.e. they effectively attract each other like the H-monomers in Dill's HP lattice model. Previously, we asked the question: assuming that the H monomers in the HP model are distributed randomly along the chain, what fraction of them is required to ensure a compact ground state? We claimed there that $f \approx p_c$, where $p_c$ is the site percolation threshold of the lattice. (In a percolation experiment, each site of an initially empty lattice is visited and a particle is placed there with a probability $p$. The interest is in the critical (minimal) value, $p_c$, for which percolation occurs, i.e. a cluster connecting the opposite sides of the lattice is created.) Due to the above correspondence between the HP model and real proteins (and assuming that the H residues are distributed at random) we suggest that the experimental $f$ should lead to percolating clusters of H residues over the highly dense protein core, i.e. clusters of the core size. To check this theory, we treat a simplified model consisting of H and P residues represented by their alpha-carbon atoms only. The structure is defined by the $C_\alpha$-$C_\alpha$ virtual bond lengths, angles and dihedral angles, and the X-ray structure is best-fitted onto a face-centered cubic lattice. Percolation experiments are carried out for 103 single-chain proteins using six different hydrophobic sets of residues. Indeed, on average, percolating clusters are generated, which supports our theory; however, some sets lead to a better core coverage than others. We also calculate the largest actual hydrophobic cluster of each protein and show that, on average, these clusters span the core, again in accord with our theory. We discuss the effect of protein size, deviations from the average picture, and implications of this study for defining reliable simplified models of proteins.
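
The percolation experiment defined in the abstract above (occupy each site with probability p, then ask whether a cluster connects opposite sides) can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a 2D square lattice with 4-neighbour connectivity rather than the face-centered cubic lattice of the study, and the function names `random_grid` and `percolates` are hypothetical.

```python
import random
from collections import deque


def random_grid(n, p, seed=0):
    """Occupy each site of an n x n lattice independently with probability p."""
    rng = random.Random(seed)
    return [[rng.random() < p for _ in range(n)] for _ in range(n)]


def percolates(grid):
    """Return True if occupied sites connect the top row to the bottom row."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Start a breadth-first search from every occupied site in the top row.
    queue = deque((0, c) for c in range(n_cols) if grid[0][c])
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == n_rows - 1:
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < n_rows and 0 <= cc < n_cols
            if inside and grid[rr][cc] and (rr, cc) not in seen:
                seen.add((rr, cc))
                queue.append((rr, cc))
    return False
```

Repeating `percolates(random_grid(n, p, seed))` over many seeds and a range of p values gives an estimate of the spanning probability; the value of p at which it rises sharply approximates the threshold for the chosen lattice.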

5.
Topology fingerprint approach to the inverse protein folding problem.   Cited by: 19 (self-citations: 0; by others: 19)
We describe the most general solution to date of the problem of matching globular protein sequences to the appropriate three-dimensional structures. The screening template, against which sequences are tested, is provided by a protein "structural fingerprint" library based on the contact map and the buried/exposed pattern of residues. Then, a lattice Monte Carlo algorithm validates or dismisses the stability of the proposed fold. Examples are found of known structural similarities between proteins having weakly related or unrelated sequences, such as the globins and phycocyanins, the eight-member alpha/beta fold of triose phosphate isomerase, and even a close structural equivalence between azurin and immunoglobulins.

6.
A new method for the homology-based modeling of protein three-dimensional structures is proposed and evaluated. The alignment of a query sequence to a structural template produced by threading algorithms usually produces low-resolution molecular models. The proposed method attempts to improve these models. In the first stage, a high-coordination lattice approximation of the query protein fold is built by suitable tracking of the incomplete alignment of the structural template and connection of the alignment gaps. These initial lattice folds are very similar to the structures resulting from standard molecular modeling protocols. Then, a Monte Carlo simulated annealing procedure is used to refine the initial structure. The process is controlled by the model's internal force field and a set of loosely defined restraints that keep the lattice chain in the vicinity of the template conformation. The internal force field consists of several knowledge-based statistical potentials that are enhanced by a proper analysis of multiple sequence alignments. The template restraints are implemented such that the model chain can slide along the template structure or even ignore a substantial fraction of the initial alignment. The resulting lattice models are, in most cases, closer (sometimes much closer) to the target structure than the initial threading-based models. All-atom models could easily be built from the lattice chains. The method is illustrated on 12 examples of target/template pairs whose initial threading alignments are of varying quality. Possible applications of the proposed method for use in protein function annotation are briefly discussed.

7.
The chaperonin system, GroEL and GroES of Escherichia coli, enables certain proteins to fold under conditions in which spontaneous folding is too slow to compete with non-productive channels such as aggregation. We investigated the plausible mechanisms of GroEL-mediated folding using simple lattice models. In particular, we have investigated protein folding in a confined environment, such as that offered by GroEL, to decipher whether rate and yield enhancement can occur when the substrate protein is allowed to fold within the cavity of the chaperonins. The GroEL cavity is modeled as a cubic box and a simple bead model is used to represent the substrate chain. We consider three distinct characteristics of the confining environment. First, the cavity is taken to be a passive Anfinsen cage in which the walls merely reduce the available conformation space. We find that at temperatures at which the native conformation is stable, the folding rate is retarded in the Anfinsen cage. We then assumed that the interior of the wall is hydrophobic. In this case the folding times exhibit a complex behavior. When the strength of the interaction between the polypeptide chain and the cavity is too strong or too weak we find that the rates of folding are retarded compared to spontaneous folding. There is an optimum range of the interaction strength that enhances the rates. Thus, above this value there is an inverse correlation between the folding rates and the strength of the substrate-cavity interactions. The optimal hydrophobic walls essentially pull the kinetically trapped states, which leads to a smoother energy landscape. It is known that upon addition of ATP and GroES the interior cavity of GroEL offers a hydrophilic-like environment to the substrate protein. In order to mimic this within the context of the dynamic Anfinsen cage model, we allow for changes in the hydrophobicity of the walls of the cavity. The duration for which the walls remain hydrophobic during one cycle of ATP hydrolysis is allowed to vary. These calculations show that frequent cycling of the wall hydrophobicity can dramatically reduce the folding times and also increase the yield under non-permissive conditions. Examination of the structures of the substrate proteins before and after the change in hydrophobicity indicates that there is global unfolding involved. In addition, it is found that a fraction of the molecules kinetically partition to the native state in accordance with the iterative annealing mechanism. Thus, frequent "unfoldase" activity of chaperonins leading to global unfolding of the polypeptide chain results in enhancement of the folding rates and yield of the folded protein. We suggest that chaperonin efficiency can be greatly enhanced if the cycling time is reduced. The calculations are used to interpret a few experiments on chaperonin-mediated protein folding.

8.
A quantitative structure-property relationship (QSPR) was used to design model protein sequences that fold repeatedly and relatively rapidly to stable target structures. The specific model was a 125-residue heteropolymer chain subject to Monte Carlo dynamics on a simple cubic lattice. The QSPR was derived from an analysis of a database of 200 sequences by a statistical method that uses a genetic algorithm to select the sequence attributes that are most important for folding and a neural network to determine the corresponding functional dependence of folding ability on the chosen attributes. The QSPR depends on the number of anti-parallel sheet contacts, the energy gap between the native state and quasi-continuous part of the spectrum and the total energy of the contacts between surface residues. Two Monte Carlo procedures were used in series to optimize both the target structures and the sequences. We generated 20 fully optimized sequences and 60 partially optimized control sequences and tested each for its ability to fold in dynamic MC simulations. Although sequences in which either the number of anti-parallel sheet contacts or the energy of the surface residues is non-optimal are capable of folding almost as well as fully optimized ones, sequences in which only the energy gap is optimized fold markedly more slowly. Implications of the results for the design of proteins are discussed.

9.
We give a 5-approximation algorithm to the rooted Subtree-Prune-and-Regraft (rSPR) distance between two phylogenies, which was recently shown to be NP-complete. This paper presents the first approximation result for this important tree distance. The algorithm follows a standard format for tree distances. The novel ideas are in the analysis. In the analysis, the cost of the algorithm uses a "cascading" scheme that accounts for possible wrong moves. This accounting is missing from previous analysis of tree distance approximation algorithms. Further, we show how all algorithms of this type can be implemented in linear time and give experimental results.

10.
It is shown that it is possible to single out the thermodynamically most stable chain fold, and how this can be done. The suggested approach is based on two physical ideas: (1) a "molecular field" approximation makes it possible to examine all protein structures that belong to the same "folding pattern"; (2) only a limited set of "potentially stable" folding patterns has to be examined. The general approach is illustrated by calculations of the stable folds for two beta-domains.

11.
Doruker P, Jernigan RL. Proteins 2003, 53(2): 174-181
The three-dimensional structure of a 1509-residue protein, hemagglutinin, is reconstructed on a simple cubic lattice by retaining all lattice sites that fall within close proximity of the X-ray coordinates. Coarse-grained normal mode analysis is performed using these lattice sites as the nodes of an elastic network. The collective deformations of the protein can still be extracted from such a structure, which mimics only the overall shape of the protein and not its mass distribution. These results emphasize that the overall shape rather than the details of the protein fold determines the dynamical domains in proteins. Thus, low-resolution protein structures, even those constructed on a regularly spaced lattice, can provide insights about the functionally important global dynamics around the native state.

12.
Theory for the folding and stability of globular proteins   Cited by: 52 (self-citations: 0; by others: 52)
Dill KA. Biochemistry 1985, 24(6): 1501-1509
Using lattice statistical mechanics, we develop a theory to account for the folding of a heteropolymer molecule such as a protein to the globular and soluble state. Folding is assumed to be driven by the association of solvophobic monomers to avoid solvent and opposed by the chain configurational entropy. The theory predicts a phase transition as a function of temperature or solvent character. Molecules that are too short or too long or that have too few solvophobic residues are predicted not to fold. Globular molecules should have a largely solvophobic core, but there is an entropic tendency for some residues to be "out of place", particularly in small molecules. For long chains, molecules composed of globular domains are predicted to be thermodynamically more stable than spherical molecules. The number of accessible conformations in the globular state is calculated to be an exceedingly small fraction of the number available to the random coil. Previous estimates of this number, which have motivated kinetic theories of folding, err by many tens of orders of magnitude.

13.
We revisit the DOUBLE DIGEST problem, which occurs in sequencing of large DNA strings and consists of reconstructing the relative positions of cut sites from two different enzymes. We first show that DOUBLE DIGEST is strongly NP-complete, improving upon previous results that only showed weak NP-completeness. Even the (experimentally more meaningful) variation in which we disallow coincident cut sites turns out to be strongly NP-complete. In the second part, we model errors in data as they occur in real-life experiments: we propose several optimization variations of DOUBLE DIGEST that model partial cleavage errors. We then show that most of these variations are hard to approximate. In the third part, we investigate variations with the additional restriction that coincident cut sites are disallowed, and we show that it is NP-hard to even find feasible solutions in this case, thus making it impossible to guarantee any approximation ratio at all.
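
To make the DOUBLE DIGEST problem statement above concrete, here is a brute-force sketch for tiny instances. It assumes the usual formulation, which the abstract does not spell out: the input is three multisets of fragment lengths (enzyme A alone, enzyme B alone, both together), and the task is to order the A and B fragments so that the merged cut sites reproduce the third multiset. Coincident cut sites are allowed here, and the function names are hypothetical. The search is exponential, which is consistent with the hardness results the abstract reports.

```python
from itertools import permutations


def cut_sites(fragments):
    """Cumulative positions of the cuts implied by an ordered fragment list."""
    sites, position = [], 0
    for length in fragments:
        position += length
        sites.append(position)
    return sites  # the last entry is the end of the DNA string


def double_digest(a, b, c):
    """Return one (order_a, order_b) whose merged cuts give fragments c, else None."""
    if not (sum(a) == sum(b) == sum(c)):
        return None
    target = sorted(c)
    for order_a in set(permutations(a)):
        sites_a = set(cut_sites(order_a))
        for order_b in set(permutations(b)):
            # Merge both cut sets; a set union collapses coincident cuts.
            merged = sorted(sites_a | set(cut_sites(order_b)))
            fragments = [s - t for s, t in zip(merged, [0] + merged[:-1])]
            if sorted(fragments) == target:
                return order_a, order_b
    return None
```

For example, with A fragments `[2, 3]`, B fragments `[1, 4]` and double-digest fragments `[1, 1, 3]`, a consistent ordering exists, whereas the double-digest multiset `[5]` is infeasible because both enzymes cut inside the string.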

14.
Antibodies that bind to protein surfaces of interest can be used to report the three-dimensional structure of the protein as follows: Proteins are composed of linear polypeptide chains that fold together in complex spatial patterns to create the native protein structure. These folded structures form binding sites for antibodies. Antibody binding sites are typically "assembled" on the protein surface from segments that are far apart in the primary amino acid sequence of the target proteins. Short amino acid probe sequences that bind to the active region of each antibody can be used as witnesses to the antibody epitope surface and these probes can be efficiently selected from random sequence peptide libraries. This paper presents a new method to align these antibody epitopes to discontinuous regions of the one-dimensional amino acid sequence of a target protein. Such alignments of the epitopes indicate how segments of the protein sequence must be folded together in space and thus provide long-range constraints for solving the 3-D protein structure. This new antibody-based approach is applicable to the large fraction of proteins that are refractory to current approaches for structure determination and has the additional advantage of requiring very small amounts of the target protein. The binding site of an antibody is a surface, not just a continuous linear sequence, so the epitope mapping alignment problem is outside the scope of classical string alignment algorithms, such as Smith-Waterman. We formalize the alignment problem that is at the heart of this new approach, prove that the epitope mapping alignment problem is NP-complete, and give some initial results using a branch-and-bound algorithm to map two real-life cases. Initial results for two validation cases are presented for a graph-based protein surface neighbor mapping procedure that promises to provide additional spatial proximity information for the amino acid residues on the protein surface.

15.
The design of a protein folding approximation algorithm is not straightforward even when a simplified model is used. The folding problem is a combinatorial problem, for which approximation and heuristic algorithms are usually used to find near-optimal folds of protein primary structures. Approximation algorithms provide guarantees on the distance to the optimal solution. The folding approximation approach proposed here depends on two-dimensional cellular automata (CA) to fold proteins represented in a well-studied simplified model called the hydrophobic–hydrophilic model. Cellular automata are discrete computational models that rely on local rules to produce some overall global behavior. One-third and one-fourth approximation algorithms choose a subset of the hydrophobic amino acids to form H–H contacts. Those algorithms start by finding a point at which to fold the protein sequence into two sides, where one side ignores H's at even positions and the other side ignores H's at odd positions. In addition, blocks or groups of amino acids fold the same way according to a predefined normal form. We intend to improve approximation algorithms by considering all hydrophobic amino acids and folding based on the local neighborhood instead of using normal forms. The CA does not assume a fixed folding point. The proposed approach guarantees a one-half approximation minus the H–H endpoints. This guaranteed lower bound applies to short sequences only. The proof relies on the fact that the core and the folds of the protein have two identical sides for all short sequences.

16.
A polymer molecule (represented by a statistical chain) end-grafted to a topologically rough surface was studied by static MC simulations. A modified self-avoiding walk on a cubic lattice was used to model the polymer in an athermal solution. Different statistical models of surface roughness were applied. Conformational entropies of chains attached to uncorrelated Gaussian, Brownian, and fractional Brownian surfaces were calculated. Results were compared with the predictions of a simple analytical model of a macromolecule end-grafted to a fractal surface.
Figure: Visualization of a SAW generated by the (023) algorithm on a 3D cubic lattice
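
The self-avoiding walk (SAW) model used in the abstract above can be illustrated by exact enumeration on the simple cubic lattice. The sketch below is a simplification and an assumption on my part: it counts free walks starting at the origin, with no grafting surface and no surface roughness, and it is not the sampling algorithm named in the figure caption. For an athermal chain, the conformational entropy in units of the Boltzmann constant is the logarithm of the number of allowed conformations.

```python
import math

# The six unit steps of the simple cubic lattice.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def count_saws(n_steps):
    """Count self-avoiding walks of n_steps bonds starting at the origin."""

    def extend(position, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        x, y, z = position
        for dx, dy, dz in STEPS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    origin = (0, 0, 0)
    return extend(origin, {origin}, n_steps)


def athermal_entropy(n_steps):
    """Conformational entropy S / k_B = ln(number of walks) for an athermal chain."""
    return math.log(count_saws(n_steps))
```

Restricting the walk to sites on one side of a surface would reduce the count, and the resulting drop in `athermal_entropy` is the kind of quantity the abstract compares across surface models.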

17.
In this paper, we introduce the 2D hexagonal lattice as a biologically meaningful alternative to the standard square lattice for the study of protein folding in the HP model. We show that the hexagonal lattice alleviates the "sharp turn" problem and models certain aspects of the protein secondary structure more realistically. We present a 1/6-approximation and a clustering heuristic for protein folding on the hexagonal lattice. In addition to these two algorithms, we also implement a Monte Carlo Metropolis algorithm and a branch-and-bound partial enumeration algorithm, and conduct experiments to compare their effectiveness.

18.
Proteins that share even low sequence homology are known to adopt similar folds. The beta-propeller structural motif is one such example. Identifying sequences that adopt a beta-propeller fold is useful for annotating protein structure and function. Often, tandem sequence repeats provide the necessary signal for identifying beta-propellers in proteins. In our recent analysis to identify cell surface proteins in archaeal and bacterial genomes, we identified some proteins that contain the novel tandem repeats "LVIVD", "RIVW" and "LGxL". In this work, based on protein fold predictions and three-dimensional comparative modeling methods, we predicted that these repeat types fold as beta-propellers. Further, the evolutionary trace analysis of all proteins constituting amino acid sequence repeats in beta-propellers suggests that the novel repeats have diverged from a common ancestor.

19.
BACKGROUND: The ability to predict the native conformation of a globular protein from its amino-acid sequence is an important unsolved problem of molecular biology. We have previously reported a method in which reduced representations of proteins are folded on a lattice by Monte Carlo simulation, using statistically derived potentials. When applied to sequences designed to fold into four-helix bundles, this method generated predicted conformations closely resembling the real ones. RESULTS: We now report a hierarchical approach to protein-structure prediction, in which two cycles of the above-mentioned lattice method (the second on a finer lattice) are followed by a full-atom molecular dynamics simulation. The end product of the simulations is thus a full-atom representation of the predicted structure. The application of this procedure to the 60-residue B domain of staphylococcal protein A predicts a three-helix bundle with a backbone root mean square (rms) deviation of 2.25-3 Å from the experimentally determined structure. Further application to a designed, 120-residue monomeric protein, mROP, based on the dimeric ROP protein of Escherichia coli, predicts a left-turning, four-helix bundle native state. Although the ultimate assessment of the quality of this prediction awaits the experimental determination of the mROP structure, a comparison of this structure with the set of equivalent residues in the ROP dimer crystal structure indicates that they have an rms deviation of approximately 3.6-4.2 Å. CONCLUSION: Thus, for a set of helical proteins that have simple native topologies, the native folds of the proteins can be predicted with reasonable accuracy from their sequences alone. Our approach suggests a direction for future work addressing the protein-folding problem.

20.
MOTIVATION: The double cut and join operation (abbreviated as DCJ) has been extensively used for genomic rearrangement. Although the DCJ distance between signed genomes with both linear and circular (uni- and multi-) chromosomes is well studied, the only known result for the NP-complete unsigned DCJ distance problem is an approximation algorithm for unsigned linear unichromosomal genomes. In this article, we study the problem of computing the DCJ distance on two unsigned linear multichromosomal genomes (abbreviated as UDCJ). RESULTS: We devise a 1.5-approximation algorithm for UDCJ by exploiting the distance formula for signed genomes. In addition, we show that UDCJ admits a weak kernel of size 2k and hence an FPT algorithm running in $O(2^{2k} n)$ time.
