Similar literature
20 similar articles found.
1.
The correlation between the primary and secondary structures of proteins was analysed using a large data set from the Protein Data Bank. Clear preferences of amino acids towards certain secondary structures classify amino acids into four groups: α-helix preferrers, strand preferrers, turn and bend preferrers, and His and Cys (the latter two amino acids show no clear preference for any secondary structure). Amino acids in the same group have similar structural characteristics at their Cβ and Cγ atoms that predict their preference for a particular secondary structure. All α-helix preferrers have neither polar heteroatoms on the Cβ and Cγ atoms, nor a branching or aromatic group on the Cβ atom. All strand preferrers have aromatic groups or branching groups on the Cβ atom. All turn and bend preferrers have a polar heteroatom on the Cβ or Cγ atoms or do not have a Cβ atom at all. These new rules could be helpful in making predictions about non-natural amino acids.
Snežana D. Zarić

2.
Estimation of secondary structure in polypeptides is important for studying their structure, folding and dynamics. In NMR spectroscopy, such information is generally obtained after sequence specific resonance assignments are completed. We present here a new methodology for assignment of secondary structure type to spin systems in proteins directly from NMR spectra, without prior knowledge of resonance assignments. The methodology, named Combination of Shifts for Secondary Structure Identification in Proteins (CSSI-PRO), involves detection of specific linear combinations of backbone 1Hα and 13C′ chemical shifts in a two-dimensional (2D) NMR experiment based on G-matrix Fourier transform (GFT) NMR spectroscopy. Such linear combinations of shifts facilitate editing of residues belonging to α-helical/β-strand regions into distinct spectral regions nearly independent of the amino acid type, thereby allowing the estimation of overall secondary structure content of the protein. Comparison of the predicted secondary structure content with that estimated from the respective 3D structures and/or the Chemical Shift Index method for 237 proteins gives a correlation of more than 90% and an overall rmsd of 7.0%, which is comparable to other biophysical techniques used for structural characterization of proteins. Taken together, this methodology has a wide range of applications in NMR spectroscopy such as rapid protein structure determination, monitoring conformational changes in protein-folding/ligand-binding studies and automated resonance assignment.

3.
Chemical shifts of amino acids in proteins are the most sensitive and easily obtainable NMR parameters that reflect the primary, secondary, and tertiary structures of the protein. In recent years, chemical shifts have been used to identify secondary structure in peptides and proteins, and it has been confirmed that 1Hα, 13Cα, 13Cβ, and 13C′ NMR chemical shifts for all 20 amino acids are sensitive to their secondary structure. Currently, most methods are based purely on one-dimensional statistical analyses of various chemical shifts for each residue to identify protein secondary structure. However, it is possible to achieve increased accuracy from two-dimensional analyses of these chemical shifts. The 2DCSi approach performs two-dimensional cluster analyses of 1Hα, 1HN, 13Cα, 13Cβ, 13C′, and 15NH chemical shifts to identify protein secondary structure and the redox state of cysteine residues. For the analysis of paired chemical shifts of 6 data sets, each of the 20 amino acids has its own 15 two-dimensional cluster scattering diagrams. Accordingly, the probabilities for identifying helix and extended structure were calculated by using our scoring matrix. Compared with the existing chemical shift-based methods, it appears to improve the prediction accuracy of secondary structure identification, particularly for the extended structure. In addition, the probability of a given residue being in a helix or extended structure is displayed, allowing users to make decisions themselves. Grant sponsor: National Science Council of ROC; Grant numbers: NSC-94-2323-B006-001, NSC-93-2212-E-006.

4.
Thermophiles, mesophiles, and psychrophiles have different amino acid frequencies in their proteins, probably because of the way the species adapt to very different temperatures in their environment. In this paper, we analyse how contacts between sidechains vary between homologous proteins from species that are adapted to different temperatures but display relatively high sequence similarity. We investigate whether specific contacts between amino acid sidechains are a key factor in the thermostabilisation of proteins. The dataset was divided into two subsets with optimal growth temperatures from 0–40 and 35–102°C. Comparison of homologues was made between low-temperature species and high-temperature species within each subset. We found that unspecific interactions, like hydrophobic interactions in the core and solvent interactions and entropic effects at the surface, appear to be more important factors than specific contact types like salt bridges and aromatic clusters.

5.
Proteinaceous components from four Washington coast margin sediments were extracted with base, fractionated into one of four size classes (<3 kDa, 3–10 kDa, 10–100 kDa, >100 kDa), and analyzed for their amino acid contents. Base-extracted material accounts for ~30% of the total hydrolyzable amino acids (THAA) and each size fraction has a unique composition, regardless of where the sediment was collected (shelf or upper slope). The <3 kDa size fraction (~10% of base-extractable THAA) is relatively enriched in glycine (~30 mol%), lysine (~5 mol%), and non-protein amino acids (~5 mol%). Glycine and non-protein amino acids are common degradation products, and lysine is very surface active. We suggest that the <3 kDa size fraction, therefore, represents a diagenetic mixture of fragments produced during the degradation of larger proteins. The 3–10 and 10–100 kDa size fractions (~10% and 42% of base-extractable THAA, respectively) have similar amino acid distributions dominated by aspartic acid (~30 mol%). Enrichment in Asp is likely due to both preservation of Asp-rich proteins and the production of Asp during degradation. The >100 kDa size fraction (~38% of base-extractable THAA) is not dominated by any particular amino acid and cannot be modeled by mixing the amino acid compositions of the other size fractions. We propose that the larger size fractions (10–100 kDa and >100 kDa) represent intact, or near intact, proteins. Estimates of isoelectric points and relative hydrophobicity suggest the base-extractable proteins are primarily acidic and have globular structures. Statistical comparisons to several known proteins indicate that the base-extractable component is most similar to planktonic cytoplasmic proteins.

6.
The metabolic cycle of Saccharomyces cerevisiae consists of alternating oxidative (respiration) and reductive (glycolysis) energy-yielding reactions. The intracellular concentrations of amino acid precursors generated by these reactions oscillate accordingly, attaining maximal concentration during the middle of their respective yeast metabolic cycle phases. Typically, the amino acids themselves are most abundant at the end of their precursor’s phase. We show that this metabolic cycling has likely biased the amino acid composition of proteins across the S. cerevisiae genome. In particular, we observed that the metabolic source of amino acids is the single most important source of variation in the amino acid compositions of functionally related proteins and that this signal appears only in (facultative) organisms using both oxidative and reductive metabolism. Periodically expressed proteins are enriched for amino acids generated in the preceding phase of the metabolic cycle. Proteins expressed during the oxidative phase contain more glycolysis-derived amino acids, whereas proteins expressed during the reductive phase contain more respiration-derived amino acids. Rare amino acids (e.g., tryptophan) are greatly overrepresented or underrepresented, relative to the proteomic average, in periodically expressed proteins, whereas common amino acids vary by a few percent. Genome-wide, we infer that 20,000 to 60,000 residues have been modified by this previously unappreciated pressure. This trend is strongest in ancient proteins, suggesting that oscillating endogenous amino acid availability exerted genome-wide selective pressure on protein sequences across evolutionary time. Benjamin L. de Bivort and Ethan O. Perlstein have contributed equally to this work.

7.
For most proteins, multiple sequence alignments are a viable method to identify functionally and structurally important amino acids, but for most organisms, there is a subset of proteins that are unique or found in only a few closely related organisms. For these proteins, it is not possible to produce sequence alignments that are useful in identifying functionally or structurally important amino acids. We have investigated the relationship between amino acid conservation and five factors (the amino acid’s identity, N-terminal neighbor, C-terminal neighbor, the local hydropathy of surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence) in Escherichia coli proteins. For four of the factors examined (all but the amino acid’s identity), there is a significant relationship with conservation for some of the standard 20 amino acids. Using the combination of all five factors, we show that it is possible to calculate a score based on the primary sequences of a subset of E. coli proteins that has statistically significant value for predicting conserved amino acids in other E. coli proteins and Saccharomyces cerevisiae proteins. As these five variables show significant relationships with conservation, we have termed them conservation factors.

8.
Proteins that assimilate particular elements were found to avoid using amino acids containing the element, which indicates that the metabolic constraints of amino acids may influence the evolution of proteins. We suspected that low contents of carbon, nitrogen, and sulfur may also be selected for economy in highly abundant proteins that consume large amounts of the resources of cells. By analyzing recently available proteomic data in Escherichia coli, Saccharomyces cerevisiae, and Schizosaccharomyces pombe, we found that at least the carbon and nitrogen contents in amino acid side chains are negatively correlated with protein abundance. An amino acid with a high number of carbon atoms in its side chain generally requires relatively more energy for its synthesis. Thus, it may be selected against in highly abundant proteins either because of economy in building blocks or because of economy in energy. Previous studies showed that highly abundant proteins preferentially use cheap (in terms of energy) amino acids. We found that the carbon content is still negatively correlated with protein abundance after controlling for the energetic cost of the amino acids. However, the negative correlation between protein abundance and energetic cost disappeared after controlling for carbon content. Building blocks seem to be more restricted than energy. It seems that the amino acid sequences of highly abundant proteins have to compromise between optimization for their biological functions and reducing the consumption of limiting resources. By contrast, the amino acid sequences of weakly expressed proteins are more likely to be optimized for their biological functions.
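
As an illustration of the quantity discussed in this abstract, the following is a minimal sketch of how the average side-chain carbon content of a protein sequence could be computed. The carbon counts are the standard numbers of carbon atoms in each side chain (backbone carbons excluded); the function name and the two toy sequences are hypothetical and are not taken from the study, which additionally correlates such values with measured protein abundance.

```python
# Carbon atoms in each standard amino acid side chain
# (backbone Cα and carbonyl carbon are excluded).
SIDE_CHAIN_CARBONS = {
    "G": 0, "A": 1, "S": 1, "C": 1, "T": 2, "D": 2, "N": 2,
    "P": 3, "V": 3, "E": 3, "Q": 3, "M": 3,
    "H": 4, "L": 4, "I": 4, "K": 4, "R": 4,
    "F": 7, "Y": 7, "W": 9,
}


def mean_side_chain_carbons(sequence):
    """Return the average number of side-chain carbon atoms per residue."""
    counts = [SIDE_CHAIN_CARBONS[aa] for aa in sequence.upper()]
    if not counts:
        raise ValueError("sequence must not be empty")
    return sum(counts) / len(counts)


# Hypothetical toy sequences, used only to show the calculation.
carbon_poor = "GASGAS"
carbon_rich = "WFYWFY"
print(mean_side_chain_carbons(carbon_poor))
print(mean_side_chain_carbons(carbon_rich))
```

With real data, one such value per protein could then be compared against abundance measurements, for example with a rank correlation.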

9.
In recent years, solid-state magic-angle spinning nuclear magnetic resonance spectroscopy (MAS NMR) has been growing into an important technique to study the structure of membrane proteins, amyloid fibrils and other protein preparations which do not form crystals or are insoluble. Currently, a key bottleneck is the assignment process due to the absence of the resolving power of proton chemical shifts. Particularly for large proteins (more than approximately 150 residues) it is difficult to obtain a full set of resonance assignments. In order to address this problem, we present an assignment method based upon samples prepared using [1,3-13C]- and [2-13C]-glycerol as the sole carbon source in the bacterial growth medium (so-called selectively and extensively labelled protein). Such samples give rise to higher quality spectra than uniformly [13C]-labelled protein samples, and have previously been used to obtain long-range restraints for use in structure calculations. Our method exploits the characteristic cross-peak patterns observed for the different amino acid types in 13C-13C correlation and 3D NCACX and NCOCX spectra. An in-depth analysis of the patterns and how they can be used to aid assignment is presented, using spectra of the chicken α-spectrin SH3 domain (62 residues), αB-crystallin (175 residues) and outer membrane protein G (OmpG, 281 residues) as examples. Using this procedure, over 90% of the Cα, Cβ, C′ and N resonances in the core domain of αB-crystallin and around 73% in the flanking domains could be assigned (excluding 24 residues at the extreme termini of the protein).

10.
We studied the amino acid frequency and substitution patterns between homologues of prokaryotic species adapted to temperatures in the range 0–102°C, and found a significant temperature-dependent difference in frequency for many of the amino acids. This was particularly clear when we analysed the surface and core residues separately. The difference between the surface and the core becomes more pronounced in proteins adapted to warmer environments, with a more hydrophobic core, and more charged and long-chained amino acids on the surface of the proteins. We also see that mesophiles have an amino acid composition more similar to that of psychrophiles than to that of thermophiles, and that archaea appear to have a slightly different pattern of substitutions than bacteria.

11.
In maturing seed cells, proteins that accumulate in the protein storage vacuoles (PSVs) are synthesized on the endoplasmic reticulum (ER) and transported by vesicles to the PSVs. Vacuolar sorting determinants (VSDs), which are usually amino acid sequences of short or moderate length, direct the proteins to this pathway. VSDs identified so far are classified into two types: sequence specific VSDs (ssVSDs) and C-terminal VSDs (ctVSDs). We previously demonstrated that the VSDs of the α′ and β subunits of β-conglycinin, one of the major storage proteins of soybean (Glycine max), reside in the C-terminal ten amino acids. Here we show that both types of VSDs coexist within this region of the α′ subunit. Although ctVSDs can function only at the very C-termini of proteins, the C-terminal ten amino acids of the α′ subunit directed green fluorescent protein (GFP) to the PSVs even when they were placed at the N-terminus of GFP, indicating that an ssVSD resides in the sequence. By mutation analysis, it was found that the core sequence of the ssVSD is Ser-Ile-Leu (fifth to seventh residues counted from the C-terminus), which is conserved in the α and β subunits and some vicilin-like proteins. On the other hand, the sequence composed of the C-terminal three amino acids (AFY) directed GFP to the PSVs when it was placed at the C-terminus of GFP, though its function as a VSD was disrupted at the N-terminus of GFP, indicating that the AFY sequence is a ctVSD.

12.
Summary: We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
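
The binary run test described in this abstract can be sketched as follows. This is a minimal illustration using the textbook Wald-Wolfowitz normal approximation for the number of runs; the exact normalization and sign convention of the statistic used in the paper are not given in the abstract, so the function name, the formula choice, and the toy sequences are assumptions.

```python
import math


def runs_z_score(sequence, targets):
    """Z-score for the number of runs in a binary encoding of a sequence.

    Residues in `targets` are encoded as 1 and all others as 0. Under
    random ordering the score is approximately N(0, 1); negative values
    mean fewer runs than expected, i.e. the target residues cluster.
    """
    bits = [1 if aa in targets else 0 for aa in sequence]
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must occur in the sequence")
    # A new run starts at every position where the symbol changes.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n1 * n0 / n + 1
    variance = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the normal approximation")
    return (runs - mean) / math.sqrt(variance)


# Hypothetical toy sequences: alanine clustered versus evenly spread.
clustered = "AAAAALLLLL"
alternating = "ALALALALAL"
print(runs_z_score(clustered, {"A"}))
print(runs_z_score(alternating, {"A"}))
```

The normal approximation is only reliable for reasonably long sequences; the ten-residue examples are far too short for real inference and serve only to show the sign of the statistic.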

13.
Most investigations of the forces shaping protein evolution have focussed on protein function. However, cells are typically 50%–75% protein by dry weight, with protein expression levels distributed over five orders of magnitude. Cells may, therefore, be under considerable selection pressure to incorporate amino acids that are cheap to synthesize into proteins that are highly expressed. Such selection pressure has been demonstrated to alter amino acid usage in a few organisms, but whether “cost selection” is a general phenomenon remains unknown. One reason for this is that reliable protein expression level data are not available for most organisms. Accordingly, I have developed a new method for detecting cost selection. This method depends solely on interprotein gradients in amino acid usage. Applying it to an analysis of 43 whole genomes from all three domains of life, I show that selection on the synthesis cost of amino acids is a pervasive force in shaping the composition of proteins. Moreover, some amino acids have different price tags for different organisms (the cost of amino acids is changed for organisms living in hydrothermal vents compared with those living at the sea surface, or for organisms that have difficulty acquiring elements such as nitrogen compared with those that do not), so I also investigated whether differences between organisms in amino acid usage might reflect differences in synthesis or acquisition costs. The results suggest that organisms evolve to alter amino acid usage in response to environmental conditions. [Reviewing Editor: Hector Musto]

14.
Overproduction of soluble and stable proteins for functional and structural studies is a major bottleneck for structural genomics programs and traditional biochemistry laboratories. Many high-payoff proteins that are important in various biological processes are “difficult to handle” as protein reagents in their native form. We have recently made several advances in enabling biochemical technologies for improving protein stability, allowing stratagems for efficient protein domain trapping, solubility-improving mutations, and finding protein folding partners. In particular, split-GFP protein tags are a very powerful tool for detection of stable protein domains. Soluble, stable proteins tagged with the 15 amino acid GFP fragment (amino acids 216–228) can be detected in vivo and in vitro using the engineered GFP 1–10 “detector” fragment (amino acids 1–215). If the small tag is accessible, the detector fragment spontaneously binds, resulting in fluorescence. Here, we describe our current and on-going efforts to move this process from the bench (manual sample manipulation) to an automated, high-throughput, liquid-handling platform. We discuss optimization and validation of bacterial culture growth, lysis protocols, protein extraction, and assays of soluble and insoluble protein in multiple 96-well plate format. The optimized liquid-handling protocol can be used for rapid determination of the optimal, compact domains from single ORFs, collections of ORFs, or cDNA libraries.

15.
Homology modeling is a powerful tool for predicting protein structures, whose success depends on obtaining a reasonable alignment between a given structural template and the protein sequence being analyzed. In order to leverage greater predictive power for proteins with few structural templates, we have developed a method to rank homology models based upon their compliance with secondary structure derived from experimental solid-state NMR (SSNMR) data. Such data are obtainable in a rapid manner by simple SSNMR experiments (e.g., 13C–13C 2D correlation spectra). To test our homology model scoring procedure for various amino acid labeling schemes, we generated a library of 7,474 homology models for 22 protein targets culled from the TALOS+/SPARTA+ training set of protein structures. Using subsets of amino acids that are plausibly assigned by SSNMR, we discovered that pairs of the residues Val, Ile, Thr, Ala and Leu (VITAL) emulate an ideal dataset where all residues are site specifically assigned. Scoring the models with a predicted VITAL site-specific dataset and calculating secondary structure with the Chemical Shift Index resulted in a Pearson correlation coefficient (−0.75) commensurate with the control (−0.77), where secondary structure was scored site specifically for all amino acids (ALL 20) using STRIDE. This method promises to accelerate structure procurement by SSNMR for proteins with unknown folds through guiding the selection of remotely homologous protein templates and assessing model quality.

16.
A new, unique galactose-specific lectin purified from the seeds of Dolichos lablab, designated DLL-II, is a heterodimer composed of closely related subunits α and β. These were separated by SDS-PAGE and isolated by electroelution. By ESI-MS analysis their molecular masses were found to be 30.746 kDa (α) and 28.815 kDa (β), respectively. Both subunits were glycosylated and displayed similar amino acid composition. Using advanced mass spectrometry in combination with de novo sequencing and database searches for the peptides derived by enzymatic and chemical cleavage of these subunits, the primary sequence was deduced. This revealed DLL-II to be made of two polypeptide chains of 281 (α) and 263 (β) amino acids, respectively. The β subunit differed from the α subunit by the absence of some amino acids at the carboxy-terminal end. This structural difference suggests that the β subunit is possibly derived from the α subunit by posttranslational proteolytic modification at the COOH-terminus. Comparison of the DLL-II sequence to other leguminous seed lectins indicates a high degree of structural conservation.

17.
A gene encoding an esterase (estO) was identified and sequenced from a gene library screen of the psychrotolerant bacterium Pseudoalteromonas arctica. Analysis of the 1,203 bp coding region revealed that the deduced peptide sequence is composed of 400 amino acids with a predicted molecular mass of 44.1 kDa. EstO contains an N-terminal esterase domain and an additional OsmC domain (osmotically induced family of proteins) at the C-terminus. The highly conserved five-residue motif typical for all α/β hydrolases (G-x-S-x-G) was detected from position 104 to 108 together with a putative catalytic triad consisting of Ser106, Asp196, and His225. Sequence comparison showed that EstO exhibits 90% amino acid identity with hypothetical proteins containing similar esterase and OsmC domains but only around 10% identity to the amino acid sequences of known esterases. EstO variants with and without the OsmC domain were produced and purified as His-tag fusion proteins in E. coli. EstO displayed an optimum pH of 7.5 and an optimum temperature of 25°C with more than 50% retained activity at the freezing point of water. The thermostability of EstO (50% activity after 5 h at 40°C) dramatically increased in the truncated variant (50% activity after 2.5 h at 90°C). Furthermore, the esterase displays broad substrate specificity for esters of short-chain fatty acids (C2–C8).

18.
Measurements of protein sequence-structure correlations
Crooks GE, Wolfe J, Brenner SE. Proteins, 2004, 57(4): 804-810
Correlations between protein structures and amino acid sequences are widely used for protein structure prediction. For example, secondary structure predictors generally use correlations between a secondary structure sequence and corresponding primary structure sequence, whereas threading algorithms and similar tertiary structure predictors typically incorporate interresidue contact potentials. To investigate the relative importance of these sequence-structure interactions, we measured the mutual information among the primary structure, secondary structure and side-chain surface exposure, both for adjacent residues along the amino acid sequence and for tertiary structure contacts between residues distantly separated along the backbone. We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative. This suggests that knowledge-based contact potentials may be less important for structure prediction than is generally believed.
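
For readers unfamiliar with the measure, the mutual information between two discrete variables, such as residue type and secondary structure state, can be estimated from paired observations as sketched below. This is a plain plug-in estimate from observed frequencies; it is not necessarily the estimator used by the authors, and the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one observation is required")
    joint = Counter(pairs)
    count_x = Counter(x for x, _ in pairs)
    count_y = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical toy data: (residue, secondary structure state) pairs.
fully_informative = [("A", "H"), ("V", "E")] * 2
uninformative = [("A", "H"), ("A", "E"), ("V", "H"), ("V", "E")]
print(mutual_information(fully_informative))
print(mutual_information(uninformative))
```

The plug-in estimate is biased upward for small samples, so real analyses on sparse contact data would normally need a bias correction or a large data set.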

19.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to secondary structural elements of proteins based on the Dictionary of Secondary Structure of Proteins (DSSP)-generated secondary structure for 408 selected nonhomologous proteins. Amino acid triplets that are not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements of proteins. The total number of parameters generated is 15,432, out of a possible 25,260. The deviation parameter set is complete with respect to amino acid singlets and doublets, and partially complete with respect to amino acid triplets. These generated parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of single sequence (NNPREDICT) and multiple sequence (PHD) methods. The average prediction accuracy for α-helix by the SSPDP, NNPREDICT and PHD methods was found to be 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For β-strand the prediction accuracy was found to be 69%, 21% and 53%, respectively, by the SSPDP, NNPREDICT and PHD methods. This clearly indicates that the secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.
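
The figure of 25,260 possible parameters can be reproduced with a short calculation, under the assumption (not stated explicitly in the abstract) that one parameter is defined for every amino acid singlet, doublet and triplet with respect to each of three secondary structural elements.

```python
N_AMINO_ACIDS = 20
N_STRUCTURAL_ELEMENTS = 3  # assumption: e.g. helix, strand and coil

# 20 singlets + 400 doublets + 8,000 triplets = 8,420 sequence contexts.
contexts = sum(N_AMINO_ACIDS ** k for k in (1, 2, 3))
possible_parameters = contexts * N_STRUCTURAL_ELEMENTS

generated_parameters = 15_432  # value reported in the abstract
coverage = generated_parameters / possible_parameters

print(contexts, possible_parameters)
print(f"{coverage:.1%} of the possible parameters were generated")
```

Under this assumption the total matches the reported value, and roughly 61% of the possible parameters were observed in the 408-protein dataset, the remainder being triplets that never occur and therefore receive a value of zero.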

20.
The Chou-Fasman predictive algorithm for determining the secondary structure of proteins from the primary sequence is reviewed. Many examples of its use are presented which illustrate its wide applicability, such as predicting (a) regions with the potential for conformational change, (b) sequences which are capable of assuming several conformations in different environments, (c) effects of single amino acid mutations, (d) amino acid replacements in synthesis of peptides to bring about a change in conformation, (e) guide to the synthesis of polypeptides with definitive secondary structure, e.g., signal sequences, (f) conformational homologues from varying sequences and (g) the amino acid requirements for amphiphilic α-helical peptides.
