Similar literature
 20 similar articles found
1.
Interior and surface of monomeric proteins   Cited by: 47 (self-citations: 0, citations by others: 47)
The solvent-accessible surface area (As) of 46 monomeric proteins is calculated using atomic co-ordinates from high-resolution and well-refined crystal structures. The As of these proteins can be determined to within 1 to 2% and that of their individual residues to within 10 to 20%. The As values of proteins are correlated with their molecular weight (Mr) in the range 4000 to 35,000: the power law As = 6.3 M^0.73 predicts protein As values to within 4% on average. The average water-accessible surface is found to be 57% non-polar, 24% polar and 19% charged, with 5% root-mean-square variations. The molecular surface buried inside the protein is 58% non-polar, 39% polar and 4% charged. The buried surface contains more uncharged polar groups (mostly peptides) than the surface that remains accessible, but many fewer charged groups. On average, 15% of residues in small proteins and 32% in larger ones may be classed as "buried residues", having less than 5% of their surface accessible to the solvent. The accessibilities of most other residues are evenly distributed in the range 5 to 50%. Although the fraction of buried residues increases with molecular weight, the amino acid compositions of the protein interior and surface show no systematic variation with molecular weight, except for small proteins that are often very rich in buried cysteines. From amino acid compositions of protein surfaces and interiors we calculate an effective coefficient of partition for each type of residue, and derive an implied set of transfer free energy values. This is compared with other sets of partition coefficients derived directly from experimental data. The extent to which groups of residues (charged, polar and non-polar) are buried within proteins correlates well with their hydrophobicity derived from amino acid transfer experiments. Within these three groups, the correlation is low.
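
The power law quoted in this abstract can be evaluated directly. The following is a minimal sketch, assuming that As is expressed in square ångströms and that M is the molecular weight Mr in daltons; the units and the function name `predicted_accessible_area` are assumptions of this illustration and are not stated in the abstract.

```python
def predicted_accessible_area(molecular_weight: float) -> float:
    """Estimate the solvent-accessible surface area from molecular weight.

    Uses the power law As = 6.3 * M**0.73 quoted in the abstract.
    Assumption: the result is in square angstroms, M is in daltons.
    """
    if molecular_weight <= 0:
        raise ValueError("molecular weight must be positive")
    return 6.3 * molecular_weight ** 0.73


# The abstract reports the fit for molecular weights of 4000 to 35,000.
for mr in (4000, 14000, 35000):
    print(f"Mr = {mr:>6}: As ~ {predicted_accessible_area(mr):.0f}")
```

Because the exponent is below 1, the predicted surface grows more slowly than the mass, which fits the abstract's observation that larger proteins bury a larger fraction of their residues.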

2.
Summary: It has previously been shown that the formation of GU base pairs in RNA copying processes leads to an accumulation of G and U in both strands of the replicating RNA, which results in a non-random distribution of base triplets. In the present paper, this distribution is calculated, and, using the χ²-test, a correlation between the distribution of triplets and the amino acid composition of the evolutionarily conservative interior regions of selected globular proteins is established. It is suggested that GU wobbling in early replication of RNA could have led to the observed amino acid composition of present-day protein interiors. If this hypothesis is correct, the GU wobbling must have been very extensive in the imprecisely replicating RNA, even reaching values close to the critical value for stability of its double-helical structure. Implications of the hypothesis both for the evolution of the genetic code and of proteins are discussed.

3.
The structures of several variants of staphylococcal nuclease with long flexible unnatural amino acid side chains in the hydrophobic core have been determined by X-ray crystallography. The unnatural amino acids are disulfide moieties between the lone cysteine residue in V23C nuclease and methane, ethane, 1-n-propane, 1-n-butane, 1-n-pentane, and 2-hydroxyethyl thiols. We have examined changes in the core packing of these mutants. Side chains as large as the 1-n-propyl cysteine disulfide can be incorporated without perturbation of the structure. This is due, in part, to cavities present in the wild-type protein. The longest side chains are not well defined, even though they remain buried within the protein interior. These results suggest that the enthalpy-entropy balance that governs the rigidity of protein interiors favors tight packing only weakly. Additionally, the tight packing observed normally in protein interiors may reflect, in part, the limited numbers of rotamers available to the natural amino acids.

4.
5.
Cost measure matrices or different amino acid indices have been widely used for studies in many fields of biology. One major criticism of these studies might be based on the unavailability of an unbiased and yet effective amino acid substitution matrix. In this study we have devised a cost measure matrix based on the solvent accessibility, residue charge, and residue volume indices. Analyses performed on this novel substitution matrix (i.e. the solvent accessibility charge volume (SCV) matrix) support the uncontaminated nature of this matrix regarding the genetic code. Although highly similar to a number of previously available cost measure matrices, the SCV matrix results in a more significant optimality in the error-buffering capacity of the genetic code when compared to many other amino acid substitution matrices. In addition, a method to compare an SCV-based scoring matrix with a number of widely used matrices has been devised, the results of which highlight the robustness of this matrix in protein family discrimination.

6.
Shestopalov BV 《Tsitologiia》2003,45(7):702-706
The calculation of protein three-dimensional structure from the amino acid sequence is a fundamental problem to be solved. This paper presents principles of the code theory of protein secondary structure, and their consequence: the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The bases of the theory are: 1) the name secondary structure is assigned to the conformation stabilized only by the nearest (intraresidual) and middle-range (at a distance no more than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) the alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2), (i, i + 1), according to the numbers of residues per period, 3.6, 2, 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlappings of structurons of one and the same structure generate the longer segments of this structure; 7) overlapping of structurons of different structures is forbidden, and therefore selection of codons is required; the codon selection is hierarchic; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. There are two possible kinds of model construction based on the theory: the physical one using physical properties of amino acid residues, and the statistical one using results of statistical analysis of a great body of structural data. 
Some evident consequences of the theory are: a) the theory can be used for calculating the secondary structure from the amino acid sequence as a partial solution of the problem of calculation of protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used for simulating the next step of protein folding; b) one can propose that the same secondary structures can be folded into different tertiary structures and, vice versa, different secondary structures can be folded into the same tertiary structures, provided codon distributions are considered also; c) codons can be considered as first elements of protein three-dimensional structure language.
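
Point 3 of this abstract describes structural "codons" as residue pairs at fixed offsets: (i, i + 4) for alpha-helices, (i, i + 2) for beta-strands and (i, i + 1) for coil. The sketch below only enumerates those candidate pairs for a sequence. It does not implement the codon strengths or the hierarchic selection, which the abstract does not specify; the names `OFFSETS` and `candidate_codons` are hypothetical, and indices are 0-based.

```python
# Pair offsets per structure type, taken from point 3 of the abstract.
OFFSETS = {"alpha-helix": 4, "beta-strand": 2, "coil": 1}


def candidate_codons(sequence: str) -> dict:
    """List every residue pair that could act as a structural codon.

    Returns, for each structure type, tuples (i, j, pair) where
    j = i + offset and pair is the two one-letter residue codes.
    """
    pairs = {}
    for structure, offset in OFFSETS.items():
        pairs[structure] = [
            (i, i + offset, sequence[i] + sequence[i + offset])
            for i in range(len(sequence) - offset)
        ]
    return pairs


for structure, found in candidate_codons("ACDEFG").items():
    print(structure, found)
```

Every residue takes part in pairs of all three types, which is why the theory needs a rule (point 7) for choosing among overlapping codons of different structures.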

7.
Novel models of idiotype nets of antibodies have been developed to study the code responsible for the amino acid interaction and complex formation of proteins. It is shown that the interaction of protein active centres in idiotype nets can be interpreted and predicted on the basis of the structure of the code of codon roots of amino acids and the polarity principle. "Internal images" of the sequence antigen determinants of proteins in immunoglobulin molecules are built mainly from the amino acid groups having common codon roots, which is in agreement with the conception of the structure of the root code.

8.
Fifty years have passed since the genetic code was deciphered, but how the genetic code came into being has not been satisfactorily addressed. It is now widely accepted that the earliest genetic code did not encode all 20 amino acids found in the universal genetic code as some amino acids have complex biosynthetic pathways and likely were not available from the environment. Therefore, the genetic code evolved as pathways for synthesis of new amino acids became available. One hypothesis proposes that early in the evolution of the genetic code four amino acids (valine, alanine, aspartic acid, and glycine) were coded by GNC codons (N = any base) with the remaining codons being nonsense codons. The other sixteen amino acids were subsequently added to the genetic code by changing nonsense codons into sense codons for these amino acids. Improvement in protein function is presumed to be the driving force behind the evolution of the code, but how improved function was achieved by adding amino acids has not been examined. Based on an analysis of amino acid function in proteins, an evolutionary mechanism for expansion of the genetic code is described in which individual coded amino acids were replaced by new amino acids that used nonsense codons differing by one base change from the sense codons previously used. The improved or altered protein function afforded by the changes in amino acid function provided the selective advantage underlying the expansion of the genetic code. Analysis of amino acid properties and functions explains why amino acids are found in their respective positions in the genetic code.
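
The GNC hypothesis mentioned in this abstract can be illustrated with a four-codon lookup table. The sketch below uses the standard-code assignments of the four GNC codons (GUC, GCC, GAC, GGC) and treats every other triplet as a nonsense codon, returned as `None`. The function name `translate_gnc` and the choice of `None` for nonsense codons are assumptions of this illustration.

```python
# The four GNC codons and the amino acids they specify in the standard code.
GNC_CODONS = {"GUC": "Val", "GCC": "Ala", "GAC": "Asp", "GGC": "Gly"}


def translate_gnc(rna: str) -> list:
    """Translate an RNA string under a hypothetical GNC-only code.

    Triplets outside the GNC set are treated as nonsense codons and
    reported as None; trailing bases that do not fill a triplet are ignored.
    """
    usable = len(rna) - len(rna) % 3
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [GNC_CODONS.get(codon) for codon in codons]


print(translate_gnc("GUCGCCAAAGGC"))
```

Under such a code only 4 of the 64 triplets are sense codons, so each sense codon has neighbouring nonsense codons one base change away, which is the raw material for the expansion mechanism the abstract describes.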

9.
The aim of this research was to examine the possible significance of genome/protein relationships in terms of effects on distribution of mass, especially in proteins. Amino acid residues in proteins have side-chains and polypeptide segments. We use "SCM" (side-chain mass), "MCM" (main-chain mass), and "deltaM" (SCM-MCM) as the deviation from "mass balance." Total MCM of the 61 amino acids in the standard code, 3412, equals total SCM: they form a mass balanced set (mean deltaM = 0). Of 14 natural variants of the code, seven have slightly positive mean deltaM values and seven have slightly negative values. Codes with the standard amino acids assigned randomly to the 20 codon sets of the standard code have about one chance in 3,300 of producing a mass balanced set. In natural proteins, as %A + T increases, the proportion of the mass in the side-chains also increases, by about half the amount calculated for standard genes with various AT/GC ratios, partly due to selection of codons with greater variability in composition at synonymous sites. For 203 representative species (including organelles), the total protein mass is distributed approximately equally between SCM and MCM (overall mean deltaM/amino acid residue, -0.06). The attainment of some overall macromolecular mass balance may have been a criterion for selecting the codon/amino acid pairs. When both structural and dynamic requirements are considered, a genetic code based on hydrophobicity and mass balance as key properties seems likely.
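
The mass-balance arithmetic in this abstract can be checked with a short tally over the 61 sense codons. The sketch below rests on assumptions that the abstract does not state: nominal integer masses, a main-chain mass of 56 per residue (55 for proline, whose backbone nitrogen carries no hydrogen), and side chains of Asp, Glu, Lys and Arg counted in their charged forms. Under these assumptions both totals come to the 3412 reported in the abstract.

```python
# residue: (number of sense codons in the standard code, side-chain mass)
# Masses are nominal integer values; Asp/Glu are counted deprotonated and
# Lys/Arg protonated. These conventions are assumptions of this sketch.
RESIDUES = {
    "Gly": (4, 1),   "Ala": (4, 15),  "Ser": (6, 31),  "Pro": (4, 42),
    "Val": (4, 43),  "Thr": (4, 45),  "Cys": (2, 47),  "Leu": (6, 57),
    "Ile": (3, 57),  "Asn": (2, 58),  "Asp": (2, 58),  "Gln": (2, 72),
    "Lys": (2, 73),  "Glu": (2, 72),  "Met": (1, 75),  "His": (2, 81),
    "Phe": (2, 91),  "Arg": (6, 101), "Tyr": (2, 107), "Trp": (1, 130),
}


def main_chain_mass(residue: str) -> int:
    """Nominal main-chain mass: 56, or 55 for proline (assumed)."""
    return 55 if residue == "Pro" else 56


def mass_totals() -> tuple:
    """Return (codon count, total SCM, total MCM) over all sense codons."""
    codons = sum(n for n, _ in RESIDUES.values())
    scm = sum(n * mass for n, mass in RESIDUES.values())
    mcm = sum(n * main_chain_mass(name) for name, (n, _) in RESIDUES.items())
    return codons, scm, mcm


codons, scm, mcm = mass_totals()
print(f"codons={codons}  SCM={scm}  MCM={mcm}  mean deltaM={(scm - mcm) / codons}")
```

With uncharged Asp, Glu, Lys and Arg side chains the same tally gives a slightly different SCM total, so the ionization convention matters when reproducing the exact balance.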

10.
Genes 38, which code for a receptor-recognizing protein present at the tip of the long tail fibers, have been sequenced from phages T2, the T-even-type phage K3 and its host range mutants K3hx, K3h1 and K3h1h. The genes from phages T2 and K3 code for proteins consisting of 262 and 260 amino acid residues, respectively. Fifty amino-terminal and 25 carboxy-terminal residues are highly conserved. The amino-terminal amino acids are most likely involved in binding to the neighboring protein 37. Between residues 116 and 226 of the T2 protein and residues 116 and 223 of the K3 protein, sequences exist that are similar to sequences present in Escherichia coli outer membrane proteins and which serve as phage receptors. Most likely, all of these regions in the latter proteins are exposed on the cell surface and are part of their phage receptor areas. In the phage proteins, these sequences are flanked by stretches rich in glycine, perhaps providing an increased flexibility for the polypeptide at these sites; some "wobble" may be required during the protein 38-receptor interaction. The mutational alterations in the host range mutants were found in gene 38. In the K3hx protein, a duplication of six base-pairs caused the wild-type sequence -Gly163-Lys-Leu-Ile- to be changed to -Gly163-Lys-Leu-Lys-Leu-Ile-. In the K3h1 protein, a glutamic acid residue at position 203 was substituted by a lysine. Both alterations occurred within areas similar to outer membrane proteins. Mutant K3h1h, derived from K3h1, exhibits an extended host range as compared to K3h1. No mutational alteration, in addition to that found in K3h1, was found in g38 nor was the part of gene 37 that encodes the carboxy-terminal moiety of the protein altered. K3h1h may represent a "trigger-happy" phage. The results of this and other work show that the phage-phage receptor systems under study represent a primitive immune system.

11.
Summary: Chou-Fasman parameters, measuring preferences of each amino acid for different conformational regions in proteins, were used to obtain an amino acid difference index of conformational parameter distance (CPD) values. CPD values were found to be significantly lower for amino acid exchanges representing in the genetic code transitions of purines, G↔A, than for exchanges representing either transitions of pyrimidines, C↔U, or transversions of purines and pyrimidines. Inasmuch as the distribution of CPD values in these non-G↔A exchanges resembles that obtained for amino acid pairs with double or triple base differences in their underlying codons, we conclude that the genetic code was not particularly designed to minimize effects of mutation on protein conformation. That natural selection minimizes these changes, however, was shown by tabulating results obtained by the maximum parsimony method for eight protein genealogies with a total occurrence of 4574 base substitutions. At the beginning position of the codons G↔A transitions were in very great excess over other base substitutions, and, conversely, C↔U transitions were deficient. At the middle position of the codons only fast evolving proteins showed an excess of G↔A transitions, as though selection mainly preserved conformation in these proteins while weeding out mutations affecting chemical properties of functional sites in slow evolving proteins. In both fast and slow evolving proteins the net direction of transitions and transversions was found to be from G beginning codons to non-G beginning codons resulting in more commonly occurring amino acids, especially alanine with its generalized conformational properties, being replaced at suitable sites by amino acids with more specialized conformational and chemical properties. 
Historical circumstances pertaining to the origin of the genetic code and the nature of primordial proteins could account for such directional changes leading to increases in the functional density of proteins. In order to further explore the course of protein evolution, a modified parsimony algorithm was developed for constructing protein genealogies on the basis of minimum CPD length. The algorithm's ability to judge with finer discrimination that in protein evolution certain pathways of amino acid substitution should occur more readily than others was considered a potential advantage over strict maximum parsimony. In developing this CPD algorithm, the path of minimum CPD length through intermediate amino acids allowed by the genetic code for each pair of amino acids was determined. It was found that amino acid exchanges representing two base changes have a considerably lower average CPD value per base substitution than the amino acid exchanges representing single base changes. Amino acid exchanges representing three base changes have yet a further marked reduction in CPD per base change. This shows how extreme constraining effects of stabilizing selection can be circumvented, for by way of intermediate amino acids almost any amino acid can ultimately be substituted for another without damage to an evolving protein's conformation during the process.
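
The abstract groups single-base changes into purine transitions (between G and A), pyrimidine transitions (between C and U) and transversions (purine to pyrimidine or the reverse). The sketch below shows that classification only; it does not compute CPD values, which depend on Chou-Fasman parameters not given here. The function name `classify_substitution` is hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CU")


def classify_substitution(old: str, new: str) -> str:
    """Classify a single RNA base substitution.

    Returns "purine transition", "pyrimidine transition" or "transversion".
    """
    bases = PURINES | PYRIMIDINES
    if old not in bases or new not in bases:
        raise ValueError("bases must be one of A, C, G, U")
    if old == new:
        raise ValueError("no substitution: bases are identical")
    if old in PURINES and new in PURINES:
        return "purine transition"
    if old in PYRIMIDINES and new in PYRIMIDINES:
        return "pyrimidine transition"
    return "transversion"


for old, new in (("G", "A"), ("C", "U"), ("G", "C")):
    print(f"{old} -> {new}: {classify_substitution(old, new)}")
```

Of the 12 possible directed substitutions among four bases, 2 are purine transitions, 2 are pyrimidine transitions and 8 are transversions.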

12.
Why does the genetic code have a fixed length? Protein information is transferred by coding each amino acid using codons whose length equals 3 for all amino acids. Hence the most probable and the least probable amino acids get codewords of equal length. Moreover, the distributions of amino acids found in nature are not uniform and therefore the efficiency of such codes is sub-optimal. The origins of these apparently non-efficient codes are as yet unclear. In this paper we propose an a priori argument for the energy efficiency of such codes resulting from their reversibility, in contrast to their time inefficiency. Such codes are reversible in the sense that a primitive processor, reading three letters in each step, can always reverse its operation, undoing its process. We examine the codes for the distributions of amino acids that exist in nature and show that they could not be both time efficient and reversible. We investigate a family of Zipf-type distributions and present their efficient (non-fixed length) prefix code, their graphs, and the condition for their reversibility. We prove that for a large family of such distributions, if the code is time efficient, it could not be reversible. In other words, if pre-biotic processes demand reversibility, the protein code could not be time efficient. The benefits of reversibility are clear: reversible processes are adiabatic, namely, they dissipate a very small amount of energy. Such processes must be done slowly enough; therefore time efficiency is unimportant. It is reasonable to assume that early biochemical complexes were more prone towards energy efficiency, where forward and backward processes were almost symmetrical.
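
The contrast between a fixed-length code and an efficient prefix code can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions, not the authors' construction: it builds a binary Huffman code (a standard optimal prefix code, swapped in here for the unspecified code of the paper) for a Zipf distribution over 20 symbols with exponent 1, and compares its average codeword length with the 5 bits a fixed-length binary code for 20 symbols would need. The real genetic code uses a four-letter alphabet, so the numbers are only indicative.

```python
import heapq
from itertools import count


def huffman_lengths(probabilities: list) -> list:
    """Return the binary Huffman codeword length of each symbol."""
    tiebreak = count()  # unique counter so heap entries never compare lists
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them.
        for i in symbols1 + symbols2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tiebreak), symbols1 + symbols2))
    return lengths


def zipf(n: int, exponent: float = 1.0) -> list:
    """Zipf distribution over n ranks: p(rank) proportional to rank**-exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf(20)
lengths = huffman_lengths(probs)
average = sum(p * length for p, length in zip(probs, lengths))
print(f"Huffman average length: {average:.2f} bits (fixed-length code: 5 bits)")
```

The variable-length code is shorter on average, which is the "time efficiency" the abstract sets against reversibility.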

13.
The degradation signal in a short-lived protein   Cited by: 37 (self-citations: 0, citations by others: 37)
A Bachmair  A Varshavsky 《Cell》1989,56(6):1019-1032
Our previous work has shown that the amino-terminal residue of a short-lived protein is a distinct component of the protein's degradation signal. To define the complete signal, otherwise identical dihydrofolate reductase test proteins bearing different extensions and either a "stabilizing" or a "destabilizing" amino-terminal residue were expressed in the yeast S. cerevisiae and their in vivo half-lives compared. The amino-terminal degradation signal is shown to comprise two distinct determinants. One, discovered previously, is the protein's amino-terminal residue. The second determinant, identified in the present work, is a specific lysine residue whose function in the degradation signal is not dependent on the unique amino acid sequences in the vicinity of the residue. The mechanistic significance of the second determinant is illuminated by the finding that in a targeted, short-lived protein, a chain of branched ubiquitin-ubiquitin conjugates is confined to a lysine residue that has been identified in the present work as the second determinant of the degradation signal.

14.
15.
Protein evolution can be seen as the successive replacement of amino acids by other amino acids. In general, it is a very slow process which is triggered by point mutations in the nucleotide sequence. These mutations can transform into single nucleotide polymorphisms (SNPs) within populations and diverging proteins between species. It is well known that in many cases amino acids can be replaced by others without impeding the functioning of the protein, even if these are of quite different physico-chemical character. In some cases, however, almost any replacement would result in a functionally deficient protein. Based upon comprehensive published SNP data and applying correlation analysis we quantified the two antagonistic factors controlling the process of amino acid replacement and thus protein evolution: first, the degenerate structure of the genetic code, which facilitates the exchange of certain amino acids and, second, the physico-chemical forces which limit the range of possible exchanges to maintain a functional protein. We found that the observed frequencies of amino acid exchanges within species are best explained by the genetic code and that the conservation of physico-chemical properties plays a subordinate role, but nevertheless has to be considered as a key factor. Between moderately diverged species, genetic code and physico-chemical properties exert comparable influence on amino acid exchanges. We furthermore studied amino acid exchanges in more detail for six species (four mammals, one bird, and one insect) and found that the profiles are highly correlated across all examined species despite their large evolutionary divergence of up to 800 million years. The species-specific exchange profiles are also correlated with the exchange profile observed between different species. The currently available huge body of SNP data makes it possible to characterize the role of two major shaping forces of protein evolution more quantitatively than before.

16.
Statistical analysis of the occurrence of tetrapeptides in 35 globular proteins was performed. It was found that the amino acids along the polypeptide chain are close to being randomly distributed and that the same tetrapeptide segments exist in different types of secondary structure. Therefore, a new method was proposed for locating 'microdomains' in protein interiors. Amino acid replacements in the hydrophobic core of six proteins were analyzed. The results show that the locations of amino acids belonging to defined microdomains are extremely conserved. It is suggested that the structures found may play a role as nucleation centers in protein folding.

17.
J D Irvin  G M Aron 《FEBS letters》1982,148(1):127-130
Pokeweed antiviral protein (PAP) is a protein known to inactivate eukaryotic ribosomes by an unknown enzymatic action and inhibit the production of mammalian viruses in tissue culture. This protein was subjected to a variety of chemical modifications to determine their effects upon ribosomal inactivation, antiviral action, and cytotoxicity. It was found that modifications of a number of different amino acid residues had similar effects upon all 3 activities. Also the inactivation of PAP with diethylpyrocarbonate was not due to its reaction with a histidine residue but to a modification of an unidentified amino acid residue.

18.
A new approach is presented to give evidence for the theories of Jukes and Crick (1-3) that at a more primitive stage the genetic code consisted of doublets separated by "comma-bases" rather than true triplets and that G and C or A and U are the exclusive bases used by the primordial code. This approach makes use of the conservation of the histone IV sequence over extremely long periods of time by comparing the amino acid composition of the average vertebrate protein with that of histone IV, a reconstructed ancestral polypeptide and various nuclear proteins, homologous or otherwise related to it. All protamines studied and the majority of histones show deviations from the average vertebrate protein which are statistically highly significant if the amino acids sufficiently coded for by the first two bases are compared. A similar result is obtained for those amino acids which are sufficiently coded for by the first two bases of the codon and have codons composed of G and C only.

19.
We have identified the protein biomarkers observed in the matrix-assisted laser desorption/ionization time-of-flight mass spectra (MALDI-TOF-MS) of cell lysates of five strains of Campylobacter upsaliensis and one strain of C. helveticus by "bottom-up" proteomic techniques. Only one C. upsaliensis strain had previously been genomically sequenced. The significant findings are as follows: (1) The protein biomarkers identified were: 10 kD chaperonin, protein of unknown function (DUF465), phnA protein, probable periplasmic protein, D-methionine-binding lipoprotein MetQ, cytochrome c family protein, DNA-binding protein HU, thioredoxin, asparaginase family protein, helix-turn-helix domain protein, as well as several ribosomal and conserved hypothetical proteins. (2) Amino acid substitutions in protein biomarkers across species and strains account for variations in biomarker ion mass-to-charge (m/z). (3) The most common post-translational modifications (PTMs) identified were cleavage of N-terminal methionine and N-terminal signal peptides. The rule that predicts N-terminal methionine cleavage, based on the penultimate residue, does not appear to apply to C. upsaliensis proteins when the penultimate residue is threonine. (4) Some protein biomarker genes of the genomically sequenced C. upsaliensis strain were found to have nucleotide sequences with GTG or TTG "start" codons that were not the actual start codon (ATG) of the protein based on proteomic analysis. (5) Proteomic identification of the protein biomarkers of the non-genomically sequenced C. upsaliensis and C. helveticus strains involved identification of homologous protein amino acid sequences to that of the sequenced strain. Interestingly, some protein sequence regions that were not completely homologous to the sequenced strain, due to amino acid substitutions, were found to have homologous sequence regions from more phylogenetically distant species/strains, e.g., C. jejuni. 
Exploiting this partial homology of more distant species/strains, it was possible to construct a "composite" amino acid sequence using multiple non-overlapping sequence regions from both phylogenetically proximate and distant strains. The new composite sequence was confirmed by both MS and MS/MS data. Thus, it was possible in some cases to determine the amino acid sequence of an unknown protein biomarker from a genomically non-sequenced bacterial strain without the necessity of either genetically sequencing the biomarker gene or resorting to de novo MS/MS analysis of the full protein sequence.

20.
Polypeptide chains are segregated by the translocon channel into secreted or membrane-inserted proteins. Recent reports claim that an in vivo system has been used to break the "amino acid code" used by translocons to make the determination of protein type (i.e. secreted or membrane-inserted). However, the experimental setup used in these studies could have confused the derivation of this code, in particular for polar amino acids. These residues are likely to undergo stabilizing interactions with other protein components in the experiment, shielding them from direct contact with the inhospitable membrane. Hence, it is our view that the "code" for protein translocation has not yet been deciphered and that further experiments are required for teasing apart the various energetic factors contributing to protein translocation.
