1.
The amino-acid sequences of soluble, globular proteins must have hydrophobic residues to form a stable core, but excess sequence hydrophobicity can lead to loss of native state conformational specificity and aggregation. Previous studies of polar-to-hydrophobic mutations in the β-sheet of the Arc repressor dimer showed that a single substitution at position 11 (N11L) leads to population of an alternate dimeric fold in which the β-sheet is replaced by helix. Two additional hydrophobic mutations at positions 9 and 13 (Q9V and R13V) lead to population of a differently folded octamer along with both dimeric folds. Here we conduct a comprehensive study of the sequence determinants of this progressive loss of fold specificity. We find that the alternate dimer-fold specifically results from the N11L substitution and is not promoted by other hydrophobic substitutions in the β-sheet. We also find that three highly hydrophobic substitutions at positions 9, 11, and 13 are necessary and sufficient for oligomer formation, but the oligomer size depends on the identity of the hydrophobic residue in question. The hydrophobic substitutions increase thermal stability, illustrating how increased hydrophobicity can increase folding stability even as it degrades conformational specificity. The oligomeric variants are predicted to be aggregation-prone but may be hindered from doing so by proline residues that flank the β-sheet region. Loss of conformational specificity due to increased hydrophobicity can manifest itself at any level of structure, depending upon the specific mutations and the context in which they occur.
2.
Katie L. Stewart Eric D. Dodds Vicki H. Wysocki Matthew H. J. Cordes 《Protein science : a publication of the Protein Society》2013,22(5):641-649
Arc repressor is a homodimeric protein with a ribbon‐helix–helix fold. A single polar‐to‐hydrophobic substitution (N11L) at a solvent‐exposed position leads to population of an alternate dimeric fold in which 3₁₀ helices replace a β‐sheet. Here we find that the variant Q9V/N11L/R13V (S‐VLV), with two additional polar‐to‐hydrophobic surface mutations in the same β‐sheet, forms a highly stable, reversibly folded octamer with approximately half the α‐helical content of wild‐type Arc. At low protein concentration and low ionic strength, S‐VLV also populates both dimeric topologies previously observed for N11L, as judged by NMR chemical shift comparisons. Thus, accumulation of simple hydrophobic mutations in Arc progressively reduces fold specificity, leading first to a sequence with two folds and then to a manifold bridge sequence with at least three different topologies. Residues 9–14 of S‐VLV form a highly hydrophobic stretch that is predicted to be amyloidogenic, but we do not observe aggregates of higher order than octamer. Increases in sequence hydrophobicity can promote amyloid aggregation but also exert broader and more complex effects on fold specificity. Altered native folds, changes in fold coupled to oligomerization, toxic pre‐amyloid oligomers, and amyloid fibrils may represent a near continuum of accessible alternatives in protein structure space.
3.
It is commonly believed that similarity between the sequences of two proteins implies similarity between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This recognition, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is known in that regime about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
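The percentage sequence identity that the abstract above refers to can be illustrated with a minimal Python sketch. The function name `percent_identity` and the choice of denominator (only columns where neither sequence has a gap) are assumptions made for illustration; the abstract does not say which convention the authors used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two sequences are already aligned (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns (one of several possible conventions)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

Under this convention a pair scoring below 30 would fall in the low-identity regime that the abstract describes as statistically less reliable.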
4.
5.
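Spider silk is the natural fiber with the best known combination of properties, pairing very high tensile strength with toughness, which has earned it the name "biosteel". It also has excellent biocompatibility and shape-memory behavior, giving it great application potential in biomedicine, tissue engineering, and other fields. Spider silk is composed of a structurally diverse class of high-molecular-weight spider silk proteins (spidroins). Natural spidroin genes have a high GC content, highly repetitive amino acid sequences in the repetitive core region, a high content of particular amino acids, and a large molecular weight, all of which make heterologous expression difficult. This review focuses on how the characteristic motifs in the repetitive core region of spidroin repeat units relate to structure, spinning performance, and heterologous expression. Optimized design of recombinant spidroin sequences, combined with heterologous expression strategies, has greatly advanced the biosynthesis of multifunctional spidroins. This review may offer ideas for the rational design and efficient synthesis of recombinant spidroins.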
6.
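The concept of a compact domain is proposed: the largest folding unit in which one or several consecutive α-helices and β-sheets of the secondary-structure sequence pack tightly together in space is called a compact domain. Three kinds of compact domain (α domain, β domain, and α/β domain) are used to define five structural classes of globular proteins: α proteins, β proteins, α/β proteins, multidomain proteins, and ζ proteins. A set of 1,261 representative proteins (1,022 families) was classified and compared with the SCOP classification. An analysis with sequence redundancy removed was also carried out. On this basis, a scheme for predicting structural class is proposed, with a success rate of 82% to 85%.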
7.
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.
8.
The rotamer approximation states that protein side-chain conformations can be described well using a finite set of rotational isomers. This approximation is often applied in the context of computational protein design and structure prediction to reduce the complexity of structural sampling. It is an effective way of reducing the structure space to the most relevant conformations. However, the appropriateness of rotamers for sampling structure space does not imply that a rotamer-based energy landscape preserves any of the properties of the true continuous energy landscape. Specifically, because the energy of a van der Waals interaction can be very sensitive to small changes in atomic separation, meaningful van der Waals energies are particularly difficult to calculate from rotamer-based structures. This presents a problem for computational protein design, where the total energy of a given structure is often represented as a sum of precalculated rigid rotamer self and pair contributions. A common way of addressing this issue is to modify the van der Waals function to reduce its sensitivity to atomic position, but excessive modification may result in a strongly nonphysical potential. Although many different van der Waals modifications have been used in protein design, little is known about which performs best, and why. In this paper, we study 10 ways of computing van der Waals energies under the rotamer approximation, representing four general classes, and compare their performance using a variety of metrics relevant to protein design and native-sequence repacking calculations. Scaling van der Waals radii by anywhere from 85 to 95% gives the best performance. Linearizing and capping the repulsive portion of the potential can give additional improvement, which comes primarily from getting rid of unrealistically large clash energies. 
On the other hand, continuously minimizing individual rotamer pairs prior to evaluating their interaction works acceptably in native-sequence repacking, but fails in protein design. Additionally, we show that the problem of predicting relevant van der Waals energies from rotamer-based structures is strongly nonpairwise decomposable and hence further modifications of the potential are unlikely to give significant improvement.
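Two of the modifications named in the abstract above, scaling the van der Waals radii and capping the repulsive wall, can be sketched in Python as follows. The 12-6 Lennard-Jones form, the parameter names, and the default values (`radius_scale=0.9`, `cap=10.0`) are assumptions chosen for illustration, not the functional forms or settings evaluated in the paper; the linearization step is omitted for brevity.

```python
def lj_energy(r: float, r_min: float, eps: float = 1.0) -> float:
    """Standard 12-6 Lennard-Jones energy with its minimum (-eps) at r = r_min."""
    ratio6 = (r_min / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)


def softened_energy(r: float, r_min: float, eps: float = 1.0,
                    radius_scale: float = 0.9, cap: float = 10.0) -> float:
    """Lennard-Jones energy with scaled radii and a capped repulsive wall.

    radius_scale shrinks the optimal separation (0.9 means 90% radii);
    cap bounds the clash energy so small overlaps do not dominate a sum.
    Both defaults are illustrative assumptions.
    """
    energy = lj_energy(r, radius_scale * r_min, eps)
    return min(energy, cap)
```

With these defaults, a slightly overlapping pair of atoms is penalized far less than under the unmodified potential, which is the reduced sensitivity to atomic position that the abstract describes.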
9.
We developed a method, CHOP, that dissects proteins into domain-like fragments. The basic idea was to cut proteins beginning from very reliable experimental information (PDB), proceeding to expert annotations of domain-like regions (Pfam-A), and completing through cuts based on termini of known proteins. In this way, CHOP dissected more than two thirds of all proteins from 62 proteomes. Analysis of our structural domain-like fragments revealed four surprising results. First, >70% of all dissected proteins contained more than one fragment. Second, domains spanned approximately 100 residues on average. This average was similar for eukaryotic and prokaryotic proteins, and it is also valid (although not previously described) for all proteins in the PDB. Third, single-domain proteins were significantly longer than most domains in multidomain proteins. Fourth, three fourths of all domains appeared shorter than 210 residues. We believe that our CHOP fragments constitute an important resource for functional and structural genomics. Nevertheless, our main motivation to develop CHOP was that the single-linkage clustering method failed to adequately group full-length proteins. In contrast, CLUP, the simple clustering scheme introduced here, largely succeeded in grouping the CHOP fragments from 62 proteomes such that all members of one cluster shared a basic structural core. CLUP found >63,000 multi- and >118,000 single-member clusters. Although most fragments were restricted to a particular cluster, approximately 24% of the fragments were duplicated in at least two clusters. Our thresholds for grouping two fragments into the same cluster were rather conservative. Nevertheless, our results suggested that structural genomics initiatives have to target >30,000 fragments to at least cover the multimember clusters in 62 proteomes.
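For readers unfamiliar with the single-linkage clustering mentioned in the abstract above, here is a minimal Python sketch using a union-find structure. It is not CHOP or CLUP; the function name and the input format (a list of index pairs already judged similar) are assumptions for illustration. A commonly cited weakness of single linkage is chaining, where items that share no direct link end up in one cluster; whether that is the specific failure the authors observed is an assumption.

```python
def single_linkage(n_items: int, linked_pairs: list) -> list:
    """Group items 0..n_items-1 so that any chain of links joins a cluster."""
    parent = list(range(n_items))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for i in range(n_items):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

In `single_linkage(5, [(0, 1), (1, 2)])`, items 0 and 2 land in the same cluster even though they were never linked directly. A multidomain protein that shares one domain with each of two otherwise unrelated proteins would act like item 1 here.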
10.
The nucleotide and amino acid sequences of the genomic RNA of the dengue-1 (D1) Mochizuki strain were examined and compared with those of other selected virus strains reported previously. Genomic regions corresponding to the C, preM and M proteins were the major subjects of study. Parts of the E protein were additionally examined. Among the D1 viruses investigated, the Mochizuki virus, which was isolated in 1943 in Japan, was shown to be close to the Philippine 836-1 strain isolated in 1984 and the Nauru Island strain isolated in 1974, in contrast with the Thai AHF 82-80 strain isolated in 1980 and the Caribbean CV1636/77 strain isolated in 1977. At the same time, a difference was noted between the Mochizuki and Philippine/Nauru strains at the cleavage site of the preM/M junction: Mochizuki possessed the RRGKR/S sequence whereas the Philippine/Nauru strains had RRDKR/S. The glycosylation site in preM and the hydrophobic regions at the carboxyl termini of M and E were well conserved. The significance of the data is discussed in connection with viral epidemiology and variation.
11.
Locating sequences compatible with a protein structural fold is the well‐known inverse protein‐folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy‐optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment‐derived sequence profiles and structure‐derived energy profiles. SPIN improves over the fragment‐derived profile by 6.7% (from 23.6 to 30.3%) in sequence identity between predicted and wild‐type sequences. The method also reduces the number of residues in low‐complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single‐body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org . Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.
12.
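The distribution of hexamers (six-element segments) in the secondary-structure main sequences of α, β, α/β, and multidomain proteins is presented. A method is proposed for recognizing (classifying) protein structural classes from the secondary-structure main sequence. Using triplets of the secondary-structure main sequence as parameters, the Mahalanobis distance method recognizes the four structural classes with an overall accuracy of 81%. Using hexamer frequencies in the secondary-structure main sequence as the source of diversity for protein structure, minimization of the increment of diversity recognizes the four structural classes with an overall accuracy of 83%. An approach to recognizing compact domains is also given.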
13.
We show that long- and short-range interactions in almost all protein native structures are actually consistent with each other for coarse-grained energy scales; specifically we mean the long-range inter-residue contact energies and the short-range secondary structure energies based on peptide dihedral angles, which are potentials of mean force evaluated from residue distributions observed in protein native structures. This consistency is observed at equilibrium in sequence space rather than in conformational space. Statistical ensembles of sequences are generated by exchanging residues for each of 797 protein native structures with the Metropolis method. It is shown that adding the other category of interaction to either the short- or long-range interactions decreases the means and variances of those energies for essentially all protein native structures, indicating that both interactions consistently work by more-or-less restricting sequence spaces available to one of the interactions. In addition to this consistency, independence of these interaction classes is also indicated by the fact that there are almost no correlations between them when equilibrated using both interactions and significant but small, positive correlations at equilibrium using only one of the interactions. Evidence is provided that protein native sequences can be regarded approximately as samples from the statistical ensembles of sequences with these energy scales and that all proteins have the same effective conformational temperature. Designing protein structures and sequences to be consistent and minimally frustrated among the various interactions is a most effective way to increase protein stability and foldability.
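The sequence-space sampling described in the abstract above (exchanging residues under the Metropolis method) might look roughly like the following Python sketch. The move set (swapping two positions, which preserves amino acid composition), the function names, and the user-supplied `energy_fn` are assumptions for illustration; the paper's actual energy functions and move definitions are not given in the abstract.

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, uphill with exp(-dE/T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def swap_step(seq: list, energy_fn, temperature: float, rng: random.Random) -> list:
    """Propose exchanging the residues at two positions on a fixed structure.

    energy_fn is a hypothetical callable mapping a residue list to an energy.
    Returns the trial sequence if accepted, otherwise a copy of the input.
    """
    i, j = rng.sample(range(len(seq)), 2)
    trial = list(seq)
    trial[i], trial[j] = trial[j], trial[i]
    delta = energy_fn(trial) - energy_fn(seq)
    return trial if metropolis_accept(delta, temperature, rng) else list(seq)
```

Repeating `swap_step` many times at a fixed temperature would yield an ensemble of sequences whose energy means and variances could then be compared, in the spirit of the analysis the abstract reports.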
14.
James T. MacDonald Katarzyna Maksimiak Michael I. Sadowski William R. Taylor 《Proteins》2010,78(5):1311-1325
In recent years, there have been significant advances in the field of computational protein design including the successful computational design of enzymes based on backbone scaffolds from experimentally solved structures. It is likely that large‐scale sampling of protein backbone conformations will become necessary as further progress is made on more complicated systems. Removing the constraint of having to use scaffolds based on known protein backbones is a potential method of solving the problem. With this application in mind, we describe a method to systematically construct a large number of de novo backbone structures from idealized topological forms in a top–down hierarchical approach. The structural properties of these novel backbone scaffolds were analyzed and compared with a set of high‐resolution experimental structures from the Protein Data Bank (PDB). It was found that the Ramachandran plot distribution and relative γ‐ and β‐turn frequencies were similar to those found in the PDB. The de novo scaffolds were sequence designed with RosettaDesign, and the energy distributions and amino acid compositions were comparable with the results for redesigned experimentally solved backbones. Proteins 2010. © 2009 Wiley‐Liss, Inc.
15.
It is well known that protein fold recognition can be greatly improved if models for the underlying evolution history of the folds are taken into account. The improvement, however, exists only if such evolutionary information is available. To circumvent this limitation for protein families that only have a small number of representatives in current sequence databases, we follow an alternate approach in which the benefits of including evolutionary information can be recreated by using sequences generated by computational protein design algorithms. We explore this strategy on a large database of protein templates with 1747 members from different protein families. An automated method is used to design sequences for these templates. We use the backbones from the experimental structures as fixed templates, thread sequences on these backbones using a self‐consistent mean field approach, and score the fitness of the corresponding models using a semi‐empirical physical potential. Sequences designed for one template are translated into a hidden Markov model‐based profile. We describe the implementation of this method, the optimization of its parameters, and its performance. When the native sequences of the protein templates were tested against the library of these profiles, the class, fold, and family memberships of a large majority (>90%) of these sequences were correctly recognized for an E‐value threshold of 1. In contrast, when homologous sequences were tested against the same library, a much smaller fraction (35%) of sequences were recognized; the structural classification of the protein families corresponding to these sequences, however, is correctly recognized (with an accuracy of >88%). Proteins 2013; © 2013 Wiley Periodicals, Inc.
16.
J. Bajorath R. Stenkamp A. Aruffo 《Protein science : a publication of the Protein Society》1993,2(11):1798-1810
We describe how to build protein models from structural templates. Methods to identify structural similarities between proteins in cases of significant, moderate to low, or virtually absent sequence similarity are discussed. The detection and evaluation of structural relationships is emphasized as a central aspect of protein modeling, distinct from the more technical aspects of model building. Computational techniques to generate and complement comparative protein models are also reviewed. Two examples, P-selectin and gp39, are presented to illustrate the derivation of protein model structures and their use in experimental studies.
17.
P Alfarano G Varadamsetty C Ewald F Parmeggiani R Pellarin O Zerbe A Plückthun A Caflisch 《Protein science : a publication of the Protein Society》2012,21(9):1298-1314
A multidisciplinary approach based on molecular dynamics (MD) simulations using homology models, NMR spectroscopy, and a variety of biophysical techniques was used to efficiently improve the thermodynamic stability of armadillo repeat proteins (ArmRPs). ArmRPs can form the basis of modular peptide recognition and the ArmRP version on which synthetic libraries are based must be as stable as possible. The 42-residue internal Arm repeats had been designed previously using a sequence-consensus method. Heteronuclear NMR revealed unfavorable interactions present at neutral but absent at high pH. Two lysines per repeat were involved in repulsive interactions, and stability was increased by mutating both to glutamine. Five point mutations in the capping repeats were suggested by the analysis of positional fluctuations and configurational entropy along multiple MD simulations. The most stabilizing single C-cap mutation Q240L was inferred from explicit solvent MD simulations, in which water penetrated the ArmRP. All mutants were characterized by temperature- and denaturant-unfolding studies and the improved mutants were established as monomeric species with cooperative folding and increased stability against heat and denaturant. Importantly, the mutations tested resulted in a cumulative decrease of flexibility of the folded state in silico and a cumulative increase of thermodynamic stability in vitro. The final construct has a melting temperature of about 85°C, 14.5° higher than the starting sequence. This work indicates that in silico studies in combination with heteronuclear NMR and other biophysical tools may provide a basis for successfully selecting mutations that rapidly improve biophysical properties of the target proteins.
18.
The concept of consensus in multiple sequence alignments (MSAs) has been used to design and engineer proteins previously with some success. However, consensus design implicitly assumes that all amino acid positions function independently, whereas in reality, the amino acids in a protein interact with each other and work cooperatively to produce the optimum structure required for its function. Correlation analysis is a tool that can capture the effect of such interactions. In a previously published study, we made consensus variants of the triosephosphate isomerase (TIM) protein using MSAs that included sequences from both prokaryotic and eukaryotic organisms. These variants were not completely native-like and were also surprisingly different from each other in terms of oligomeric state, structural dynamics, and activity. Extensive correlation analysis of the TIM database has revealed some clues about factors leading to the unusual behavior of the previously constructed consensus proteins. Among other things, we have found that the more ill-behaved consensus mutant had more broken correlations than the better-behaved consensus variant. Moreover, we report three correlation and phylogeny-based consensus variants of TIM. These variants were more native-like than the previous consensus mutants and considerably more stable than a wild-type TIM from a mesophilic organism. This study highlights the importance of choosing the appropriate diversity of MSA for consensus analysis and provides information that can be used to engineer stable enzymes.
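The plain per-column consensus that the abstract above contrasts with correlation-aware design can be sketched in Python as follows. The function name, the gap handling, and the alphabetical tie-break are assumptions for illustration; this is the simple independent-position baseline, not the correlation and phylogeny-based procedure the authors report.

```python
from collections import Counter


def consensus(msa: list, gap: str = "-") -> str:
    """Most frequent non-gap residue in each column of an aligned set of sequences.

    Each column is treated independently, which is exactly the assumption
    the abstract criticizes. Ties are broken alphabetically (an arbitrary choice).
    """
    if not msa or len({len(s) for s in msa}) != 1:
        raise ValueError("msa must be non-empty with equal-length sequences")
    out = []
    for column in zip(*msa):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            out.append(gap)  # column contains only gaps
            continue
        top = max(counts.values())
        out.append(min(c for c, n in counts.items() if n == top))
    return "".join(out)
```

Because each column is decided on its own, the resulting sequence can combine residues that never co-occur in any natural sequence, which is one way the correlations mentioned in the abstract could be broken.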
19.
We propose a machine-learning approach to sequence-based prediction of protein crystallizability in which we exploit subtle differences between proteins whose structures were solved by X-ray analysis [or by both X-ray and nuclear magnetic resonance (NMR) spectroscopy] and those proteins whose structures were solved by NMR spectroscopy alone. Because the NMR technique is usually applied on relatively small proteins, sequence length distributions of the X-ray and NMR datasets were adjusted to avoid predictions biased by protein size. As feature space for classification, we used frequencies of mono-, di-, and tripeptides represented by the original 20-letter amino acid alphabet as well as by several reduced alphabets in which amino acids were grouped by their physicochemical and structural properties. The classification algorithm was constructed as a two-layered structure in which the output of primary support vector machine classifiers operating on peptide frequencies was combined by a second-level Naive Bayes classifier. Due to the application of metamethods for cost sensitivity, our method is able to handle real datasets with unbalanced class representation. An overall prediction accuracy of 67% [65% on the positive (crystallizable) and 69% on the negative (noncrystallizable) class] was achieved in a 10-fold cross-validation experiment, indicating that the proposed algorithm may be a valuable tool for more efficient target selection in structural genomics. A Web server for protein crystallizability prediction called SECRET is available at http://webclu.bio.wzw.tum.de:8080/secret.
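The peptide-frequency features described in the abstract above can be illustrated with a short Python sketch. The three-class reduced alphabet below (hydrophobic `h`, polar `p`, charged `c`) is a hypothetical grouping invented for this example; the abstract does not list the reduced alphabets actually used by SECRET, and the classifier stages are not shown.

```python
from collections import Counter

# Hypothetical reduced alphabet: an illustrative grouping, not the paper's.
REDUCED = {
    **{a: "h" for a in "AVLIMFWC"},   # hydrophobic
    **{a: "p" for a in "STNQYGPH"},   # polar
    **{a: "c" for a in "DEKR"},       # charged
}


def reduce_alphabet(seq: str) -> str:
    """Map each residue to its group letter; unknown symbols become 'x'."""
    return "".join(REDUCED.get(a, "x") for a in seq)


def peptide_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of each overlapping k-peptide (k=1, 2, or 3 in the abstract)."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {pep: n / total for pep, n in counts.items()}
```

Calling `peptide_frequencies(reduce_alphabet(seq), 3)` gives at most 27 tripeptide features for this three-letter alphabet, compared with up to 8,000 for the full 20-letter alphabet, which is one plausible motivation for using reduced alphabets.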
20.
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Benchmarking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.