Similar Articles
20 similar articles found.
1.
We examine how effectively previously developed simple potential functions can identify compatibilities between protein sequences and structures in database searches. The potential function consists of pairwise contact energies, repulsive packing potentials that penalize overly dense arrangements of residues, and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean-field approximation on the basis of the probabilities that site pairs are aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position; as a result, gaps are placed more frequently on protein surfaces than in cores. In addition to minimum-energy alignments, we use probability alignments, built by successively aligning site pairs in decreasing order of their pairwise alignment probabilities. The results show that the present energy function and alignment method detect well both folds compatible with a given sequence and, conversely, sequences compatible with a given fold, and yield mostly similar alignments for these two types of sequence-structure pairs. Probability alignments consisting only of the most reliable site pairs can yield extremely small root-mean-square deviations, and including less reliable pairs increases the deviations. Secondary structure potentials are also found to be usefully complementary, yielding improved alignments with this method. Remarkably, this method detects some individual sequence-structure pairs that share only 5-20% sequence identity.
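The probability-alignment step (accept site pairs from most to least probable, skipping any pair that conflicts with one already accepted) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the pairwise alignment probabilities have already been computed and are supplied as a dictionary keyed by (sequence position, structure site), and `min_prob` is a hypothetical cutoff for keeping only the most reliable pairs.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (sequence position, structure site)


def conflicts(a: Pair, b: Pair) -> bool:
    """True if two site pairs cannot coexist in one order-preserving alignment."""
    (i1, j1), (i2, j2) = a, b
    if i1 == i2 or j1 == j2:
        return True  # a residue or a structure site would be used twice
    return (i1 < i2) != (j1 < j2)  # the two pairs cross each other


def probability_alignment(prob: Dict[Pair, float], min_prob: float = 0.0) -> List[Pair]:
    """Greedily accept site pairs in decreasing order of alignment probability.

    `prob` is assumed to be precomputed (e.g. by a mean-field threading model);
    this sketch does not compute it.
    """
    accepted: List[Pair] = []
    for pair, p in sorted(prob.items(), key=lambda kv: kv[1], reverse=True):
        if p < min_prob:
            break  # all remaining pairs are less reliable than the cutoff
        if not any(conflicts(pair, q) for q in accepted):
            accepted.append(pair)
    return sorted(accepted)
```

Raising `min_prob` keeps only the most reliable pairs, which mirrors the trade-off described above between alignment coverage and root-mean-square deviation.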

2.
In this paper we consider the statistical distribution of hydrophobic residues along the length of protein chains. For this purpose we used a binary hydrophobicity scale that assigns hydrophobic residues a value of one and non-hydrophobic residues a value of zero. The resulting binary sequences are tested for randomness using the standard runs test. For the majority of the 5,247 proteins examined, the distribution of hydrophobic residues along the sequence cannot be distinguished from that expected for a random distribution. This suggests that (a) functional proteins may have originated from random sequences, (b) the folding of proteins into compact structures may be much more permissive, with less sequence specificity, than previously thought, and (c) the clusters of hydrophobic residues along chains that are revealed by hydrophobicity plots are a natural consequence of a random distribution and can be conveniently described by binomial statistics.
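A minimal sketch of the binary encoding and the Wald-Wolfowitz runs test with its usual normal approximation. The set of residues treated as hydrophobic (`HYDROPHOBIC`) is an assumption chosen for illustration, not the scale used in the study.

```python
import math
from typing import List, Tuple

# Assumed hydrophobic set, for illustration only.
HYDROPHOBIC = set("AVILMFWC")


def to_binary(seq: str) -> List[int]:
    """Map each residue to 1 (hydrophobic) or 0 (non-hydrophobic)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq.upper()]


def runs_test(bits: List[int]) -> Tuple[int, float]:
    """Return (number of runs, z-score) under the null hypothesis of randomness."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("sequence must contain both symbols")
    # A new run starts wherever adjacent symbols differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    return runs, (runs - mean) / math.sqrt(var)
```

A |z| above about 1.96 would reject randomness at the 5% level (two-sided); negative z means hydrophobic residues are more clustered than random, positive z means they alternate more than random. The normal approximation is only reasonable for sequences of realistic protein length.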

3.
Protein sequences can be represented as binary patterns of polar (○) and nonpolar (●) amino acids. These binary sequence patterns are categorized into two classes: class A patterns match the structural repeat of an idealized amphiphilic α-helix (3.6 residues per turn), and class B patterns match the structural repeat of an idealized amphiphilic β-strand (2 residues per turn). The difference between these two classes of sequence patterns has led to a strategy for de novo protein design based on binary patterning of polar and nonpolar amino acids. Here we ask whether similar binary patterning is incorporated in the sequences and structures of natural proteins. Analysis of the Protein Data Bank demonstrates the following. (1) Class A sequence patterns occur considerably more frequently in the sequences of natural proteins than would be expected at random, but class B patterns occur less often than expected. (2) Each pattern is found predominantly in the secondary structure expected from the binary strategy for protein design. Thus, class A patterns are found more frequently in α-helices than in β-strands, and class B patterns are found more frequently in β-strands than in α-helices. (3) Among the α-helices of natural proteins, the most commonly used binary patterns are indeed the class A patterns. (4) Among all β-strands in the database, the most commonly used binary patterns are not the expected class B patterns. (5) However, for solvent-exposed β-strands, the correlation is striking: all β-strands in the database that contain the class B patterns are exposed to solvent. (6) The bias of class A patterns for α-structure over β-structure and the bias of class B patterns for β-structure over α-structure are significant, not merely when compared with other binary patterns of polar (○) and nonpolar (●) amino acids, but also when compared with the full range of sequences in the database. The implications for the design of novel proteins are discussed.
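One way to check which structural repeat a binary pattern matches is to compare its "moment" at the helical angular step (100° per residue, i.e. 3.6 residues per turn) with that at the strand step (180°, i.e. 2 residues per repeat). This is the hydrophobic-moment idea swapped in for illustration; it is not the classification procedure used in the study. Here '1' stands for nonpolar (●) and '0' for polar (○).

```python
import cmath
import math


def binary_moment(pattern: str, angle_deg: float) -> float:
    """Magnitude of the vector sum of nonpolar positions at a fixed angular step."""
    step = math.radians(angle_deg)
    total = sum(int(c) * cmath.exp(1j * k * step) for k, c in enumerate(pattern))
    return abs(total)


def classify(pattern: str) -> str:
    """Return 'A' if the pattern is more helix-periodic, else 'B' (strand-periodic)."""
    helix = binary_moment(pattern, 100.0)   # alpha-helix: 3.6 residues per turn
    strand = binary_moment(pattern, 180.0)  # beta-strand: 2 residues per repeat
    return "A" if helix > strand else "B"
```

For example, the strictly alternating pattern ○●○●○● (`"010101"`) scores higher at 180° and comes out as class B, while ●○○●●○○ (`"1001100"`) scores higher at 100° and comes out as class A.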

4.
Morphospaces (representations of phenotypic characteristics) are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavoring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to appear preferentially as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Under random mutation, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies with which noncoding RNA SS shapes appear in the RNAcentral database with the frequencies obtained by random sampling of sequences. We show that: 1) only the most frequent structures appear in nature, and the vast majority of possible structures in the morphospace have not yet been explored; 2) remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far; and 3) perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude of variation, by the likelihood that structures appear under uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection but a strong phenotype bias in the RNA genotype–phenotype map, a type of developmental bias or “findability constraint,” which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to “find.”

5.
RNA molecules, through their dual identity as sequence and structure, are an appropriate experimental and theoretical model for studying the genotype-phenotype map and the evolutionary processes that take place in simple replicator populations. In this computational study, we relate properties of the sequence-structure map, in particular the abundance of a given secondary structure in a random pool, to the number of replicative events that an initially random population of sequences needs to find that structure through mutation and selection. For common structures, this search process turns out to be much faster than for rare structures. Furthermore, search and fixation are efficient over a wider range of mutation rates for common structures, indicating that the evolvability of RNA populations is not simply determined by abundance. We also find significant differences in the search and fixation processes for structures of the same abundance, and relate them to the number of base pairs forming the structure. Moreover, we study the influence of the nucleotide content of the RNA sequences on the search process. Our results advance the understanding of the distribution and attainability of RNA secondary structures. They suggest that, beyond sequence length and sequence-to-function redundancy, the mutation rate that permits localization and fixation of a given phenotype depends strongly on its relative abundance and on its global, generally non-uniform, distribution in sequence space.

6.
Statistical potentials based on pairwise interactions between Cα atoms are commonly used in protein threading/fold-recognition attempts. Including higher-order interactions is one possible way to improve the specificity of these potentials. Delaunay tessellation of the Cα-atom representation of protein structure has been suggested as a means of defining multi-body interactions. A large number of parameters is required to define all four-body interactions of 20 amino acid types (20^4 = 160,000). Assuming that residue order within a four-body contact is irrelevant reduces this to a manageable 8,855 parameters; the potential is derived from a nonredundant dataset of 608 protein structures. Three lines of evidence support the significance and utility of the four-body potential for sequence-structure matching. First, compared with the four-body model, all lower-order interaction models (three-body, two-body, one-body) are found to be statistically inadequate to explain the frequency distribution of residue contacts. Second, coherent patterns of interaction are seen in a graphical presentation of the four-body potential. Many patterns have plausible biophysical explanations and are consistent across sets of residues sharing certain properties (e.g., size, hydrophobicity, or charge). Third, the utility of the multi-body potential is evaluated on a test set of 12 same-length pairs of proteins of known structure under two protocols: sequence-recognizes-structure, in which a query sequence is threaded (without gaps) through its native structure and a non-native structure; and structure-recognizes-sequence, in which a query structure is threaded with its native sequence and a non-native sequence. Using cross-validated training, protein sequences correctly recognized their native structure in all 24 cases. Conversely, structures recognized their native sequence in 23 of 24 cases.
Further, the score differences between correct and decoy structures increased significantly with the three- or four-body potential compared with lower-order potentials.
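The reduction from 160,000 to 8,855 parameters is the count of multisets of size 4 drawn from 20 residue types, C(20 + 4 - 1, 4) = C(23, 4). A quick standard-library check, which also enumerates the order-independent quadruplets directly:

```python
from itertools import combinations_with_replacement
from math import comb

n_types, body = 20, 4

ordered = n_types ** body                      # residue order matters
unordered = comb(n_types + body - 1, body)     # order ignored: multisets of size 4
enumerated = sum(1 for _ in combinations_with_replacement(range(n_types), body))

print(ordered, unordered, enumerated)  # 160000 8855 8855
```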

7.
DNA loss and evolution of genome size in Drosophila
Petrov DA. Genetica, 2002, 115(1): 81-91.

8.
A statistical reference for minimum-free-energy RNA secondary structures is computed by folding large ensembles of random RNA sequences. Four nucleotide alphabets are used: two binary alphabets, AU and GC; the biophysical AUGC alphabet; and the synthetic GCXK alphabet. RNA secondary structures are made of structural elements such as stacks, loops, joints, and free ends. Statistical properties of these elements are computed for small RNA molecules with chain lengths up to 100. The resulting structure statistics depend strongly on the alphabet chosen. The statistical reference is compared with data derived from natural RNA molecules with similar base frequencies. Secondary structures are represented as trees, and tree editing provides a quantitative measure of the distance d_t between two structures. We compute a structure density surface as the conditional probability that two structures are at distance t given that their sequences are at distance h. This surface indicates that the vast majority of possible minimum-free-energy secondary structures occur within a fairly small neighborhood of any typical (random) sequence. Correlation lengths for secondary structures in their tree representations are computed from these probability densities; they are appropriate measures of the complexity of the sequence-structure relation. The correlation length also provides a quantitative estimate of the mean sensitivity of structures to point mutations. © 1993 John Wiley & Sons, Inc.
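Tree editing is one way to measure the distance between two secondary structures. A simpler structural distance, swapped in here purely for illustration, is the base-pair distance: the number of base pairs present in one structure but not the other. The sketch below assumes structures are given in dot-bracket notation of equal length; it is not the tree-edit distance used in the study.

```python
from typing import Set, Tuple


def base_pairs(dot_bracket: str) -> Set[Tuple[int, int]]:
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Size of the symmetric difference of the two base-pair sets."""
    if len(a) != len(b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(a) ^ base_pairs(b))
```

Pairing this distance with the Hamming distance h between the corresponding sequences, and tabulating the results over many random sequence pairs, would give an empirical analogue of the conditional density P(t | h) described above.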

9.
Residue contacts in protein structures and implications for protein folding
The preferential association of amino acid side groups with specific side-chain atoms is examined in 44 known protein structures. The resulting association potentials among residue side groups are used to detect structural homology in proteins displaying little or no homology in their primary sequences. Suggestions are also made regarding the nature of the protein folding process. They are based on statistical observations that delineate the extent of short- and long-range interactions and that reveal a bias of side groups toward association with side-chain atoms on their N-terminal side.
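Association potentials of this kind are usually knowledge-based log-odds scores: observed contact counts compared with the counts expected if residue types paired at random. A minimal sketch, assuming contacts are given as a list of residue-type pairs and that the expected count comes from each type's frequency among contacting residues; this is a generic construction, not necessarily the exact formula of the study.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def contact_potential(contacts: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Return e(a, b) = -ln(observed / expected) for each observed unordered pair."""
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    total = sum(pair_counts.values())

    # Frequency of each residue type among all contact partners.
    residue_counts = Counter()
    for a, b in contacts:
        residue_counts[a] += 1
        residue_counts[b] += 1
    n_res = sum(residue_counts.values())

    energies = {}
    for (a, b), observed in pair_counts.items():
        fa, fb = residue_counts[a] / n_res, residue_counts[b] / n_res
        # Unordered pairs of distinct types can arise in two ways.
        expected = fa * fb * (1 if a == b else 2) * total
        energies[(a, b)] = -math.log(observed / expected)  # negative = favourable
    return energies
```

Negative values mark pairs seen more often than chance, which is what lets such potentials score sequence-structure compatibility even when sequence identity is low.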

10.
Recent experiments with combinatorial libraries of de novo proteins have demonstrated that sequences designed to contain polar and non-polar amino acid residues arranged in an alternating pattern form fibrillar structures resembling beta-amyloid. This finding prompted us to probe the distribution of alternating patterns in the sequences of natural proteins. Analysis of a database of 250,514 protein sequences (79,708,024 residues) for all possible binary patterns of polar and non-polar amino acid residues revealed that alternating patterns occur significantly less often than other patterns with similar compositions. The under-representation of alternating binary patterns in natural protein sequences, coupled with the observation that such patterns promote amyloid-like structures in de novo proteins, suggests that sequences of alternating polar and non-polar amino acids are inherently amyloidogenic and consequently have been disfavored by evolutionary selection.
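Counting binary patterns of a fixed window length across a sequence database can be sketched as below. The nonpolar residue set is an assumption for illustration; the study's own polar/nonpolar assignment may differ. Comparing the count of the alternating pattern with the counts of other patterns that have the same number of nonpolar positions gives the kind of under-representation test described above.

```python
from collections import Counter
from typing import Iterable

# Assumed nonpolar set, for illustration only.
NONPOLAR = set("AVILMFWC")


def binary_pattern_counts(sequences: Iterable[str], k: int) -> Counter:
    """Count every length-k polar(0)/nonpolar(1) window across all sequences."""
    counts = Counter()
    for seq in sequences:
        bits = "".join("1" if aa in NONPOLAR else "0" for aa in seq.upper())
        for i in range(len(bits) - k + 1):
            counts[bits[i:i + k]] += 1
    return counts


def is_alternating(pattern: str) -> bool:
    """True if no two adjacent positions share the same polarity."""
    return all(pattern[i] != pattern[i + 1] for i in range(len(pattern) - 1))
```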

11.
Measurements of protein sequence-structure correlations
Crooks GE, Wolfe J, Brenner SE. Proteins, 2004, 57(4): 804-810.
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and the corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
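Mutual information between two aligned symbol sequences (for example, residue identity and secondary-structure state at the same positions) can be estimated with a simple plug-in estimator, sketched below. This naive estimator is biased upward for small samples; the study's own estimation and any finite-sample corrections are not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count(x) * count(y)).
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

Applied to residue pairs at sequence separation 1 versus residue pairs in tertiary contact, this gives the kind of local versus non-local comparison described above.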

12.
For the past few decades, intensive studies have sought to understand how the amino acid sequences of proteins encode the three-dimensional structures through which they perform their specific functions. To understand the sequence-structure relationship of proteins, several sub-sequence searches of non-redundant sequence-structure databases have been undertaken and have yielded useful clues. In our earlier work, we analyzed a set of 3124 non-redundant protein sequences from the Protein Data Bank (PDB) and retrieved 30 identical octapeptides having different secondary structures. These octapeptides were characterized using different computational procedures. This prompted us to search for octapeptides whose reverse sequences also occur and to analyze whether these reverse octapeptides adopt structures similar to those of their parent octapeptides. Our search for identical reverse octapeptides found eight octapeptide pairs (octapeptide and reverse octapeptide) with similar secondary structures and 23 pairs with different secondary structures. In the present work, the geometrical and biophysical characteristics of identical reverse octapeptides were explored and compared with those of unrelated octapeptide pairs using various computational tools. We conclude that proteins containing identical reverse octapeptides are not very abundant and that the residues in these octapeptide pairs do not contribute to the stability of the protein. Furthermore, compared with unrelated octapeptides, identical reverse octapeptides do not show certain biophysical and geometrical properties.
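A search for octapeptides whose exact reverse also occurs in a sequence set can be sketched as follows. This is an illustrative sketch, not the authors' pipeline; it assumes plain one-letter sequences and skips palindromic octapeptides, which are their own reverse.

```python
from typing import Iterable, List, Tuple


def reverse_kmer_pairs(sequences: Iterable[str], k: int = 8) -> List[Tuple[str, str]]:
    """Return sorted (kmer, reversed kmer) pairs where both occur in the sequence set."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    pairs = set()
    for word in kmers:
        rev = word[::-1]
        if rev != word and rev in kmers:  # ignore palindromes
            pairs.add(tuple(sorted((word, rev))))
    return sorted(pairs)
```

Each returned pair could then be mapped back to its structures to compare the secondary structure adopted by the octapeptide and by its reverse.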

13.
Emberly EG, Miller J, Zeng C, Wingreen NS, Tang C. Proteins, 2002, 47(3): 295-304.
Using an off-lattice model, we fully enumerate folded conformations of polypeptide chains of up to N = 19 monomers. Structures are found to differ markedly in designability, defined as the number of sequences with that structure as a unique lowest-energy conformation. We find that designability is closely correlated with the pattern of surface exposure of the folded structure. For longer chains, complete enumeration of structures is impractical. Instead, structures can be randomly sampled, and relative designability estimated either from designability within the random sample, or directly from surface-exposure pattern. We compare the surface-exposure patterns of those structures identified as highly designable to the patterns of naturally occurring proteins.
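The definition of designability (number of sequences for which a structure is the unique lowest-energy conformation) can be illustrated with a toy model. The energy function here is an assumption: each structure is reduced to a binary surface-exposure vector, each sequence is binary (1 = hydrophobic), and the energy is the number of hydrophobic residues left exposed. It is not the off-lattice potential of the study.

```python
from itertools import product
from typing import List, Sequence


def designability(structures: List[Sequence[int]]) -> List[int]:
    """Count, for each structure, the binary sequences whose unique minimum it is.

    Assumed toy energy: E(seq, s) = sum_i seq[i] * s[i], where s[i] = 1 if
    position i is surface-exposed. Exhaustive over all 2**N sequences.
    """
    n = len(structures[0])
    counts = [0] * len(structures)
    for seq in product((0, 1), repeat=n):
        energies = [sum(h * e for h, e in zip(seq, s)) for s in structures]
        best = min(energies)
        winners = [i for i, e in enumerate(energies) if e == best]
        if len(winners) == 1:  # ties mean no unique ground state
            counts[winners[0]] += 1
    return counts
```

Because every sequence is scored only through the exposure vector, structures whose exposure patterns are distinctive win many sequences; a structure whose pattern is a superset of another's (more exposed positions) can never be a unique minimum here.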

14.
Principles of protein folding--a perspective from simple exact models.
General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics (fast hydrophobic collapse followed by slower annealing). These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse.
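"Complete exploration of conformational space" in a simple exact model can be sketched for the two-dimensional HP square-lattice model: enumerate every self-avoiding walk and score each non-bonded H-H lattice contact as -1. This is a minimal sketch under those standard HP assumptions; the number of walks grows exponentially, so it is only practical for short chains.

```python
from typing import Iterator, List, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def self_avoiding_walks(n: int) -> Iterator[List[Coord]]:
    """Yield every self-avoiding walk of n sites on the square lattice, from the origin."""
    if n < 1:
        raise ValueError("chain must have at least one monomer")

    def extend(path: List[Coord]) -> Iterator[List[Coord]]:
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in path:
                path.append(nxt)
                yield from extend(path)
                path.pop()

    yield from extend([(0, 0)])


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """-1 for each pair of non-bonded H monomers on adjacent lattice sites."""
    energy = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == seq[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:
                    energy -= 1
    return energy


def ground_state_energy(seq: str) -> int:
    """Exact minimum energy by exhaustive enumeration (short chains only)."""
    return min(hp_energy(seq, walk) for walk in self_avoiding_walks(len(seq)))
```

For example, `HPPH` can fold into a square that brings its two H monomers into contact, so its ground-state energy is -1.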

15.
The ability to design specific amino acid sequences that fold into desired structures is central to engineering novel proteins. Protein design is also a good method for assessing our understanding of sequence-structure and structure-function relationships. While beta-sheet structures are important elements of protein architecture, it has traditionally been more difficult to design beta-proteins than alpha-helical proteins. Taking advantage of the tandem repeated sequences that form the structural building blocks in a group of beta-propeller proteins, we have used a consensus design approach to engineer modular and relatively large scaffolds. An idealized WD repeat was designed from a structure-based sequence alignment with a set of structural guidelines. Using a plasmid sequential ligation strategy, artificial concatemeric genes with up to 10 copies of this idealized repeat were then constructed. The corresponding proteins with 4 to 10 WD repeats were soluble when overexpressed in Escherichia coli. Notably, they were stable enough in vivo to survive attack from endogenous proteases, and they maintained a homogeneous, non-aggregated form in vitro. The results show that the beta-propeller scaffold is an attractive platform for future engineering work, particularly in experiments in which directed evolution techniques might improve the stability of the molecules and/or tailor them for a specific function.

16.
We previously reported the de novo design of combinatorial libraries of proteins targeted to fold into four-helix bundles. The sequences of these proteins were designed using a binary code strategy in which each position in the linear sequence is designated as either polar or nonpolar, but the exact identity of the amino acid at each position is varied combinatorially. We subsequently reported that approximately half of these binary coded proteins were capable of binding heme. These de novo heme-binding proteins showed CO binding characteristics similar to natural heme proteins, and several were active as peroxidases. Here we analyze the midpoint reduction potentials and heme binding stoichiometries of several of these de novo heme proteins. All the proteins bound heme with a 1:1 stoichiometry. The reduction potentials ranged from -112 to -176 mV. We suggest that this represents an estimate of the default range of potentials for heme proteins that have neither been prejudiced by rational design nor selected by evolution.

17.
18.

Background  

Codon usage bias (CUB), the uneven use of synonymous codons, is observed in virtually all organisms examined. The pattern of codon usage is generally similar among closely related species but differs significantly among distantly related organisms, e.g., bacteria, yeast, and Drosophila. Several explanations for CUB have been offered, and some have been supported by observations and experiments, although a thorough understanding of the evolutionary forces involved (random drift, mutation bias, and selection) and of their relative importance remains to be achieved. The recently available complete genome DNA sequences of twelve phylogenetically defined species of Drosophila offer an unprecedented opportunity to examine these problems. We report here the patterns of codon usage in the twelve species and offer insights into the possible evolutionary forces involved.
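A common way to quantify codon usage bias is relative synonymous codon usage (RSCU): each codon's count divided by the mean count of the codons in its synonymous family, so that 1.0 means no bias. The sketch below uses a hypothetical three-amino-acid excerpt of the standard genetic code for brevity; a real analysis would cover every synonymous family, and the study's own bias measures may differ.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical excerpt of the standard code (illustration only).
SYNONYMOUS_CODONS: Dict[str, List[str]] = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> Dict[str, float]:
    """RSCU = observed count / (family total / family size), per codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values: Dict[str, float] = {}
    for codons in SYNONYMOUS_CODONS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for its codons
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values
```

Values above 1.0 mark codons used more often than expected under uniform synonymous usage; comparing these profiles across species is one way to see the similarity among close relatives described above.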

19.
Libraries of de novo proteins provide an opportunity to explore the structural and functional potential of biological molecules that have not been biased by billions of years of evolutionary selection. Given the enormity of sequence space, a rational approach to library design is likely to yield a higher fraction of folded and functional proteins than a stochastic sampling of random sequences. We previously investigated the potential of library design by binary patterning of hydrophobic and hydrophilic amino acids. The structure of the most stable protein from a binary patterned library of de novo 4-helix bundles was solved previously and shown to be consistent with the design. One structure, however, cannot fully assess the potential of the design strategy, nor can it account for differences in the stabilities of individual proteins. To more fully probe the quality of the library, we now report the NMR structure of a second protein, S-836. Protein S-836 proved to be a 4-helix bundle, consistent with design. The similarity between the two solved structures reinforces previous evidence that binary patterning can encode stable, 4-helix bundles. Despite their global similarities, the two proteins have cores that are packed at different degrees of tightness. The relationship between packing and dynamics was probed using the Modelfree approach, which showed that regions containing a high frequency of chemical exchange coincide with less well-packed side chains. These studies show (1) that binary patterning can drive folding into a particular topology without the explicit design of residue-by-residue packing, and (2) that within a superfamily of binary patterned proteins, the structures and dynamics of individual proteins are modulated by the identity and packing of residues in the hydrophobic core.

20.
We describe a new computer algorithm for finding low-energy conformations of proteins. It is a chain-growth method that uses a heuristic bias function to help assemble a hydrophobic core. We call it the Core-directed chain Growth method (CG). We test the CG method on several well-known literature examples of HP lattice model proteins (in which proteins are modeled as sequences of hydrophobic (H) and polar (P) monomers), ranging from 20 to 64 monomers in two dimensions and up to 88-mers in three dimensions. Previous nonexhaustive methods (Monte Carlo, a Genetic Algorithm, Hydrophobic Zippers, and Contact Interactions) have been tried on these same model sequences. CG is substantially better at finding the global optima and avoiding local optima, and it does so in comparable or shorter times. CG finds the global minimum energy of the longest HP lattice model chain for which the global optimum is known, a 3D 88-mer that previously had been reachable only by the CHCC complete search method. A potential advantage of CG is that it should scale nonexponentially with chain length. We believe this is a promising method for conformational searching in protein folding algorithms.
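The general idea of biased chain growth (place monomers one at a time, trying first the lattice sites that look best under a heuristic) can be sketched on the 2D square lattice. This is a deliberately simplified, hypothetical illustration, not the CG method: its only bias is the number of new H-H contacts a placement creates, it has no look-ahead or core-directed scoring, and it backtracks only when growth is trapped, so it can easily miss the optimum.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def new_hh_contacts(seq: str, path: List[Coord], site: Coord) -> int:
    """H-H contacts monomer len(path) would form at `site` with non-bonded earlier monomers."""
    k = len(path)
    if seq[k] != "H":
        return 0
    index = {p: i for i, p in enumerate(path)}
    count = 0
    for dx, dy in MOVES:
        i = index.get((site[0] + dx, site[1] + dy))
        if i is not None and i != k - 1 and seq[i] == "H":
            count += 1
    return count


def greedy_chain_growth(seq: str) -> Optional[List[Coord]]:
    """Grow a self-avoiding conformation, trying highest-scoring sites first (hypothetical heuristic)."""
    if not seq:
        return []

    def extend(path: List[Coord]) -> Optional[List[Coord]]:
        if len(path) == len(seq):
            return path
        x, y = path[-1]
        occupied = set(path)
        options = [
            (new_hh_contacts(seq, path, (x + dx, y + dy)), (x + dx, y + dy))
            for dx, dy in MOVES
            if (x + dx, y + dy) not in occupied
        ]
        options.sort(key=lambda o: -o[0])  # best-scoring site first; ties keep move order
        for _, site in options:
            result = extend(path + [site])
            if result is not None:
                return result
        return None  # trapped: backtrack to the previous monomer

    return extend([(0, 0)])
```

Only a biased candidate ordering plus backtracking is shown here; the published method's bias function and its handling of the core are not reproduced.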
