Similar articles
Found 20 similar articles; search time: 31 ms
1.
Lisewski AM 《PloS one》2008,3(9):e3110
The transmission of genomic information from coding sequence to protein structure during protein synthesis is subject to stochastic errors. To analyze transmission limits in the presence of spurious errors, Shannon's noisy channel theorem is applied to a communication channel between amino acid sequences and their structures established from a large-scale statistical analysis of protein atomic coordinates. While Shannon's theorem confirms that in close to native conformations information is transmitted with limited error probability, additional random errors in sequence (amino acid substitutions) and in structure (structural defects) trigger a decrease in communication capacity toward a Shannon limit at 0.010 bits per amino acid symbol at which communication breaks down. In several controls, simulated error rates above a critical threshold and models of unfolded structures always produce capacities below this limiting value. Thus an essential biological system can be realistically modeled as a digital communication channel that is (a) sensitive to random errors and (b) restricted by a Shannon error limit. This forms a novel basis for predictions consistent with observed rates of defective ribosomal products during protein synthesis, and with the estimated excess of mutual information in protein contact potentials.
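As a rough illustration of how channel capacity falls as random errors accumulate, the sketch below computes the capacity of a binary symmetric channel, C = 1 - H(p). This is a textbook toy channel, not the sequence-to-structure channel estimated in the paper, and the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy, in bits, of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(error_rate):
    """Capacity, in bits per symbol, of a binary symmetric channel."""
    return 1.0 - binary_entropy(error_rate)

# Capacity shrinks toward zero as the error rate approaches 0.5.
for p in (0.0, 0.1, 0.3, 0.5):
    print(f"error rate {p:.1f}: capacity {bsc_capacity(p):.3f} bits/symbol")
```

In this toy model the capacity vanishes at an error rate of 0.5, which is the analogue of the breakdown point the abstract describes for its own, much richer channel.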

2.
Measurements of protein sequence-structure correlations   Total citations: 1 (self-citations: 0, citations by others: 1)
Crooks GE  Wolfe J  Brenner SE 《Proteins》2004,57(4):804-810
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
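The mutual information measured in this entry can be estimated from paired observations with a simple plug-in estimator. The sketch below is a minimal version under that assumption; the toy data and the function name are hypothetical, and the plug-in estimate is biased upward for small samples, which a real analysis would need to correct.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y), in bits, from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi

# Hypothetical toy data: (amino acid, secondary-structure state) pairs.
observations = [("A", "H"), ("A", "H"), ("G", "C"), ("G", "C"),
                ("V", "E"), ("V", "E"), ("A", "C"), ("G", "H")]
print(round(mutual_information(observations), 3))
```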

3.
4.
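Whether a site-directed mutation increases or decreases protein stability is one of the central questions in molecular biology and protein engineering, and an important area of current bioinformatics research. Methods that predict stability after site-directed mutation from protein sequence information have been widely studied and applied because they are simple and broadly applicable. By exploring coding schemes, we found that the choice of coding scheme has a considerable effect on prediction accuracy, and that the BLOSUM scoring matrix, which is based on evolutionary information, can be used to predict the stability of site-directed mutants with relatively high accuracy. Artificial neural network (ANN) and support vector machine (SVM) algorithms based on the BLOSUM62 scoring matrix can improve the prediction of protein stability after site-directed mutation, and ANN + BLOSUM62, tested on a data set of 1623 sequences, outperformed several prediction programs currently in general international use.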
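To make the idea of a coding scheme concrete, the sketch below encodes a point mutation in two ways: as one-hot vectors and as rows of a substitution matrix. The abstract does not describe the authors' exact feature layout, so the concatenation used here is an assumption, and the three-residue score table is a hand-copied excerpt that should be checked against a full BLOSUM62 table before any real use.

```python
# Illustrative three-residue excerpt (assumed to match BLOSUM62; verify before use).
RESIDUES = ["I", "L", "V"]
SCORES = {
    ("I", "I"): 4, ("I", "L"): 2, ("I", "V"): 3,
    ("L", "L"): 4, ("L", "V"): 1, ("V", "V"): 4,
}

def score(a, b):
    """Symmetric lookup in the score table."""
    return SCORES[(a, b)] if (a, b) in SCORES else SCORES[(b, a)]

def one_hot(res):
    """One-hot encoding of a residue over RESIDUES."""
    return [1 if res == r else 0 for r in RESIDUES]

def matrix_row(res):
    """Encode a residue as its row of substitution scores."""
    return [score(res, r) for r in RESIDUES]

def encode_mutation(wild, mutant, scheme):
    """Hypothetical feature layout: wild-type encoding followed by mutant encoding."""
    return scheme(wild) + scheme(mutant)

print(encode_mutation("I", "V", one_hot))
print(encode_mutation("I", "V", matrix_row))
```

The matrix-row scheme gives similar residues similar vectors, which is one plausible reason an evolution-based encoding could help a downstream ANN or SVM.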

5.
Solis AD  Rackovsky S 《Proteins》2008,71(3):1071-1087
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence in their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. 
Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
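The Z-score referred to in this entry is conventionally the distance of the native score from the mean decoy score, in units of the decoy standard deviation. The sketch below assumes that convention and that lower scores are better; the function name and the numbers are hypothetical.

```python
import statistics

def z_score(native_score, decoy_scores):
    """Native score relative to the decoy distribution (population std. dev.)."""
    mean = statistics.mean(decoy_scores)
    sd = statistics.pstdev(decoy_scores)
    return (native_score - mean) / sd

# Hypothetical threading scores: a strongly negative Z-score means the
# native structure is well separated from the decoys.
decoys = [-2.0, 0.0, 2.0, 1.0, -1.0]
print(round(z_score(-10.0, decoys), 2))
```

Computing this requires threading the sequence through every decoy, which is the cost the abstract's "information product" is meant to avoid.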

6.
Leucine and isoleucine are two amino acids that differ only by the positioning of one methyl group. This small difference can have important consequences in α-helices, as the β-branching of Ile results in helix destabilization. We set out to investigate whether there are general trends for the occurrences of Leu and Ile residues in the structures and sequences of class A GPCRs (G protein-coupled receptors). GPCRs are integral membrane proteins in which α-helices span the plasma membrane seven times and which play a crucial role in signal transmission. We found that Leu side chains are generally more exposed at the protein surface than Ile side chains. We explored whether this difference might be attributed to different functions of the two amino acids and tested if Leu tunes the hydrophobicity of the transmembrane domain based on the Wimley-White whole-residue hydrophobicity scales. Leu content decreases the variation in hydropathy between receptors and correlates with the non-Leu receptor hydropathy. Both measures indicate that hydropathy is tuned by Leu. To test this idea further, we generated protein sequences with random amino acid compositions using a simple numerical model, in which hydropathy was tuned by adjusting the number of Leu residues. The model was able to replicate the observations made with class A GPCR sequences. We speculate that the hydropathy of transmembrane domains of class A GPCRs is tuned by Leu (and to some lesser degree by Lys and Val) to facilitate correct insertion into membranes and/or to stably anchor the receptors within membranes.

7.
A cDNA clone encoding 55-kDa multifunctional, thyroid hormone binding protein of rabbit skeletal muscle sarcoplasmic reticulum was isolated and sequenced. The cDNA encoded a protein of 509 amino acids, and a comparison of the deduced amino acid sequence with the NH2-terminal amino acid sequence of the purified protein indicates that an 18-residue NH2-terminal signal sequence was removed during synthesis. The deduced amino acid sequence of the rabbit muscle clone suggested that this protein is related to human liver thyroid hormone binding protein, rat liver protein disulfide isomerase, human hepatoma beta-subunit of prolyl 4-hydroxylase and hen oviduct glycosylation site binding protein. The protein contains two repeated sequences Trp-Cys-Gly-His-Cys-Lys proposed to be in the active sites of protein disulfide isomerase. Northern blot analysis showed that the mRNA encoding rabbit skeletal muscle form of the protein is present in liver, kidney, brain, fast- and slow-twitch skeletal muscle, and in the myocardium. In all tissues the cDNA reacts with mRNA of 2.7 kilobases in length. The 55-kDa multifunctional thyroid hormone binding protein was identified in isolated sarcoplasmic reticulum vesicles using a monoclonal antibody specific to the 55-kDa thyroid hormone binding protein from rat liver endoplasmic reticulum. The mature protein of Mr 56,681 contains 95 acidic and 61 basic amino acids. The COOH-terminal amino acid sequence of the protein is highly enriched in acidic residues with 17 of the last 29 amino acids being negatively charged. Analysis of hydropathy of the mature protein suggests that there are no potential transmembrane segments. The COOH-terminal sequence of the protein, Arg-Asp-Glu-Leu (RDEL), is similar to but different from that proposed to be an endoplasmic reticulum retention signal; Lys-Asp-Glu-Leu (KDEL) (Munro, S., and Pelham, H.R.B. (1987) Cell 48, 899-907). 
This variant of the retention signal may function in a similar manner to the KDEL sequence, to localize the protein to the sarcoplasmic or endoplasmic reticulum. The positively charged amino acids Lys and Arg may thus interchange in this retention signal.

8.
Kinjo AR  Nakamura H 《PloS one》2008,3(4):e1963
Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on the protein. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs acts to disfavor potentially but falsely functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. In addition, singular vectors may be useful for analyzing and annotating the characteristics of conserved sites in protein families.

9.
Armando D. Solis 《Proteins》2015,83(12):2198-2216
To reduce complexity, understand generalized rules of protein folding, and facilitate de novo protein design, the 20‐letter amino acid alphabet is commonly reduced to a smaller alphabet by clustering amino acids based on some measure of similarity. In this work, we seek the optimal alphabet that preserves as much as possible of the structural information found in long‐range (contact) interactions among amino acids in natively‐folded proteins. We employ the Information Maximization Device, based on information theory, to partition the amino acids into well‐defined clusters. Numbering from 2 to 19 groups, these optimal clusters of amino acids, while generated automatically, embody well‐known properties of amino acids such as hydrophobicity/polarity, charge, size, and aromaticity, and are demonstrated to maintain the discriminative power of long‐range interactions with minimal loss of mutual information. Our measurements suggest that reduced alphabets (of fewer than 10 letters) are able to capture virtually all of the information residing in native contacts and may be sufficient for fold recognition, as demonstrated by extensive threading tests. In an expansive survey of the literature, we observe that alphabets derived from various approaches (including those derived from physicochemical intuition, local structure considerations, and sequence alignments of remote homologs) fare consistently well in preserving contact interaction information, highlighting a convergence in the various factors thought to be relevant to the folding code. Moreover, we find that alphabets commonly used in experimental protein design are nearly optimal and are largely coherent with observations that have arisen in this work. Proteins 2015; 83:2198–2216. © 2015 Wiley Periodicals, Inc.

10.
The simplest approximation of the interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by the set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodology to determine the contact potentials in proteins from experimental measurements of changes in protein thermodynamic stability (ΔΔG) upon mutations. We apply our methodology to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce experimental measurements by statistical tests. We evaluate the maximum accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different data sets of experimental ΔΔG values. We argue that it is impossible to reach experimental accuracy and derive fully transferable contact parameters using the contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for data sets of mutations confined to the same amino acid positions in the sequence of a single protein.
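A contact potential of the kind described here scores a conformation by summing a pairwise parameter over every residue contact. The sketch below shows that sum and the resulting folded-state energy change for a point mutation. The parameter values, the sequence and the contact list are all hypothetical, and the sketch ignores the unfolded state, solvation and multi-body terms that the abstract's richer models include.

```python
def contact_energy(sequence, contacts, eps):
    """Sum pairwise contact parameters over a list of (i, j) residue contacts."""
    total = 0.0
    for i, j in contacts:
        pair = tuple(sorted((sequence[i], sequence[j])))
        total += eps[pair]
    return total

# Hypothetical contact parameters, keyed by alphabetically sorted residue pairs.
eps = {("L", "L"): -1.0, ("K", "L"): 0.5, ("K", "K"): 1.0}
contacts = [(0, 2), (1, 3), (0, 3)]

wild = contact_energy("LKLL", contacts, eps)
mutant = contact_energy("LKLK", contacts, eps)  # L -> K at position 3
print(mutant - wild)  # folded-state energy change under this toy model
```

Fitting the entries of `eps` to measured ΔΔG values is then a regression problem, which is the step the abstract's methodology addresses.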

11.
A heparin-binding peptide within antithrombin III (ATIII) was identified by digestion of ATIII with Staphylococcus aureus V8 protease followed by purification on reverse-phase high pressure liquid chromatography using a C-4 column matrix. The column fractions were assayed for their ability to bind heparin by ligand blotting with 125I-fluoresceinamine-heparin as previously described (Smith, J. W., and Knauer, D. J. (1987) Anal. Biochem. 160, 105-114). This analysis identified at least three fractions with heparin binding ability of which the peptide eluting at 25.4 min gave the strongest signal. Amino acid sequence analysis of this peptide gave a partially split sequence which was consistent with regions encompassing amino acids 89-96 and 114-156. These amino acids are present in a 1:1 molar ratio which is consistent with a disulfide linkage between Cys-95 and Cys-128. High affinity heparin competed more effectively for the binding of 125I-fluoresceinamine-heparin to this peptide than low affinity heparin. Chondroitin sulfate did not block the binding of 125I-fluoresceinamine-heparin to the peptide. These data strongly suggest that the isolated peptide represents a native heparin-binding region within intact ATIII. Computer generation of a plot of running charge density of ATIII confirms that the region encompassing amino acid residues 123-141 has the highest positive charge density within the molecule. A hydropathy plot of ATIII was generated using a method similar to that of Kyte and Doolittle (Kyte, J., and Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132). This plot indicates that amino acid residues 126-140 are exposed to the exterior surface of the molecule. Based on these data, we suggest that the region corresponding to amino acid residues 114-156 is a likely site for the physiological heparin-binding domain of ATIII. We also conclude that the proposed disulfide bridges within the protein are suspect and should be re-examined (Petersen, T. 
E., Dudek-Wojiechowska, G., Sottrup-Jensen, L., and Magnussun, S. (1979) in The Physiological Inhibitors of Coagulation and Fibrinolysis (Collen, D., Wiman, B., and Verstaeta, M., eds) pp. 43-54, Elsevier Scientific Publishing Co., Amsterdam).
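The hydropathy and running-charge plots mentioned in this entry are sliding-window averages along the sequence. The sketch below uses the Kyte-Doolittle hydropathy index and a simple integer charge per residue. The window length, the treatment of His as neutral and the example peptide are assumptions, since the abstract only says the method was similar to that of Kyte and Doolittle.

```python
# Kyte-Doolittle hydropathy index per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumed formal charges near neutral pH; His is treated as neutral here.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def window_average(values, window):
    """Mean over a centered window; `window` must be odd."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]

def hydropathy_profile(seq, window=7):
    return window_average([KD[a] for a in seq], window)

def charge_profile(seq, window=7):
    return window_average([CHARGE.get(a, 0) for a in seq], window)

peptide = "GSLKKRAKEIVLAG"  # hypothetical peptide, not the ATIII sequence
print([round(v, 2) for v in hydropathy_profile(peptide)])
print([round(v, 2) for v in charge_profile(peptide)])
```

Windows with a strongly positive charge average and a negative hydropathy average are the kind of exposed, basic stretch that the abstract proposes as a heparin-binding region.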

12.
Effects of amino acid substitutions at four fully buried sites of the ubiquitin molecule on the thermodynamic parameters (enthalpy, Gibbs energy) of unfolding were evaluated experimentally using differential scanning calorimetry. The same set of substitutions has been incorporated at each of four sites. These substitutions have been designed to perturb packing (van der Waals) interactions, hydration, and/or hydrogen bonding. From the analysis of the thermodynamic parameters for these ubiquitin variants we conclude that: (i) packing of non-polar groups in the protein interior is favorable and is largely defined by a favorable enthalpy of van der Waals interactions. The removal of one methylene group from the protein interior will destabilize a protein by approximately 5 kJ/mol, and will decrease the enthalpy of a protein by 12 kJ/mol. (ii) Burial of polar groups in the non-polar interior of a protein is highly destabilizing, and the degree of destabilization depends on the relative polarity of this group. For example, burial of a Thr side-chain in the non-polar interior will be less destabilizing than burial of an Asn side-chain. This decrease in stability is defined by a large enthalpy of dehydration of polar groups upon burial. (iii) The destabilizing effect of dehydration of polar groups upon burial can be compensated if these buried polar groups form hydrogen bonds. The enthalpy of this hydrogen bonding will compensate for the unfavorable dehydration energy and as a result the effect will be energetically neutral or even slightly stabilizing.

13.
14.
A novel algorithm is proposed for predicting transmembrane protein secondary structure from two-dimensional vector trajectories consisting of a hydropathy index and formal charge of a test amino acid sequence using stochastic dynamical system models. Two prediction problems are discussed. One is the prediction of transmembrane region counts; another is that of transmembrane regions, i.e. predicting whether or not each amino acid belongs to a transmembrane region. The prediction accuracies, using a collection of well-characterized transmembrane protein sequences and benchmarking sequences, suggest that the proposed algorithm performs reasonably well. An experiment was performed with a glutamate transporter homologue from Pyrococcus horikoshii. The predicted transmembrane regions of the five human glutamate transporter sequences and observations based on the computed likelihood are reported.

15.
Bastolla U  Porto M  Roman HE  Vendruscolo M 《Gene》2005,347(2):219-230
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that closely resembles the effective connectivity of each site; sequences are represented through the "interactivity" of each amino acid type, using novel parameters that are correlated with hydropathy scales. These interactivity parameters are more strongly correlated than the other hydropathy scales that we examine with: (1) the change upon mutation of the unfolding free energy of proteins with two-state thermodynamics; (2) genomic properties such as genome size and genome-wide GC content; (3) the main eigenvectors of the substitution matrices. The evolutionary average of the interactivity vector correlates very strongly with the PE of a protein structure. Using this result, we derive an analytic expression for site-specific distributions of amino acids across protein families in the form of Boltzmann distributions whose "inverse temperature" is a function of the PE component. We show that our predictions are in agreement with site-specific amino acid distributions obtained from the Protein Data Bank, and we determine the mutational model that best fits the observed site-specific amino acid distributions. Interestingly, the optimal model almost minimizes the rate at which deleterious mutations are eliminated by natural selection.
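A site-specific Boltzmann distribution of the kind described here assigns each amino acid a probability that depends exponentially on its interactivity, scaled by a site-dependent inverse temperature. The sketch below assumes the generic form p(a) proportional to exp(-beta * h(a)); the sign convention, the interactivity values and the function name are hypothetical, and the abstract does not give the mapping from the PE component to beta.

```python
import math

def site_distribution(interactivity, beta):
    """Boltzmann-like distribution over amino acids at one site."""
    weights = {a: math.exp(-beta * h) for a, h in interactivity.items()}
    z = sum(weights.values())  # normalizing constant
    return {a: w / z for a, w in weights.items()}

# Hypothetical interactivity values for three residue types.
h = {"L": -0.3, "A": 0.0, "K": 0.4}

for beta in (0.0, 2.0, 8.0):
    dist = site_distribution(h, beta)
    print(beta, {a: round(p, 3) for a, p in dist.items()})
```

At beta = 0 every residue is equally likely; as beta grows, the distribution concentrates on the residue with the lowest h, mimicking a strongly constrained site.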

16.
Tan YH  Huang H  Kihara D 《Proteins》2006,64(3):587-600
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance has been increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself, thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform them in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.

17.
The catalytic (C) subunit and the type II regulatory (RII) subunit of cAMP-dependent protein kinase can be cross-linked by interchain disulfide bonding. This disulfide bond can be catalyzed by cupric phenanthroline and also can be generated by a disulfide interchange using either RII-subunit or C-subunit that has been modified with either 5,5'-dithiobis(2-nitrobenzoic acid) (DTNB) or N-4(azidophenylthio)phthalimide (APTP). When the 2 cysteine residues of the C-subunit are reacted with DTNB prior to incubation with the RII-subunit, interchain disulfide bonding occurs. Similar observations are seen with C-subunit that had been modified with APTP. Interchain disulfide bonds also form when the RII-subunit is modified with DTNB prior to incubation with the C-subunit. The presence of cAMP facilitates this cross-linking while autophosphorylation of the RII-subunit retards the rate at which the interchain disulfide bond forms. Interchain disulfide bonds also form spontaneously when the RII-subunit and the C-subunit are dialyzed at pH 8.0 in the absence of reducing agents. The specific amino acid residues that participate in intersubunit disulfide bonding have been identified as Cys-97 in the RII-subunit and Cys-199 in the C-subunit. Based on the sequence homologies of the RII-subunit with other kinase substrates and on the proximity of Cys-97 to the catalytic site, a model is proposed in which the autophosphorylation site of the RII-subunit occupies the substrate-binding site in the holoenzyme. The model also proposes that this same site may be occupied by the region flanking Cys-199 in the C-subunit when the C-subunit is dissociated.

18.
An energy potential is constructed and trained to succeed in fold recognition for the general population of proteins as well as an important class which has previously been problematic: small, disulfide-bearing proteins. The potential is modeled on solvation, with the energy a function of side chain burial and the number of disulfide bonds. An accurate disulfide recognition algorithm identifies cysteine pairs which have the appropriate orientation to form a disulfide bridge. The potential has 22 energy parameters which are optimized so the Protein Data Bank (PDB) structure for each sequence in a training set is the lowest in energy out of thousands of alternative structures. One parameter per amino acid type reflects burial preference and a single parameter is used in an overpacking term. Additionally, one optimized parameter provides a favorable contribution for each disulfide identified in a given protein structure. With little training, the potential is >80% accurate in ungapped threading tests using a variety of proteins. The same level of accuracy is observed in a threading test of small proteins which have disulfide bonds. Importantly, the energy potential is also successful with proteins having uncrosslinked cysteines.

19.
The complete peptide map of purified folded recombinant human insulin-like growth factor II (rhIGF-II) was determined to verify its sequence and disulfide bonding scheme. Each peptide generated by digestion with pepsin was purified and characterized by amino acid analysis, amino acid sequence analysis, and fast atom bombardment/mass spectrometry. Some peptides were also sequenced using tandem mass spectrometry. The rhIGF-II peptide map was compared to that of rat insulin-like growth factor II and to that of a disulfide-bonded isomer of rhIGF-II. The data obtained in these studies are consistent with the conclusion that the rhIGF-II obtained from Escherichia coli has the correct amino acid composition, sequence, and the native disulfide-bonded structure. The binding affinities of these forms of recombinant IGF-II for IGF carrier proteins were measured in an IGF binding protein assay. The disulfide isomer of rhIGF-II was 160-fold less potent than native rhIGF-II in the competitive protein binding assay. These studies illustrate the need to characterize recombinant polypeptides containing disulfide bonds to allow the native structure to be verified before characterizing the biological properties of such molecules in hopes of elucidating their physiologic functions.

20.
Using information‐theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information‐based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi‐chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information‐theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. The results also suggest using only sequences of similar composition when deriving contact potentials, so as to tailor the contact potential specifically to a test sequence. Proteins 2010. © 2009 Wiley‐Liss, Inc.
