Similar literature
1.
The amino acid compositions of proteins from halophilic archaea were compared with those from non-halophilic mesophiles and thermophiles, in terms of the protein surface and interior, on a genome-wide scale. As we previously reported for proteins from thermophiles, a biased amino acid composition also exists in halophiles, in which an abundance of acidic residues was found on the protein surface as compared to the interior. This general feature did not seem to depend on the individual protein structures, but was applicable to all proteins encoded within the entire genome. Unique protein surface compositions are common in both halophiles and thermophiles. Statistical tests have shown that significant surface compositional differences exist among halophiles, non-halophiles, and thermophiles, while the interior composition within each of the three types of organisms does not significantly differ. Although thermophilic proteins have an almost equal abundance of both acidic and basic residues, a large excess of acidic residues in halophilic proteins seems to be compensated by fewer basic residues. Aspartic acid, lysine, asparagine, alanine, and threonine significantly contributed to the compositional differences of halophiles from meso- and thermophiles. Among them, however, only aspartic acid deviated largely from the expected amount estimated from the dinucleotide composition of the genomic DNA sequence of the halophile, which has an extremely high G+C content (68%). Thus, the other residues with large deviations (Lys, Ala, etc.) from their non-halophilic frequencies could have arisen merely as "dragging effects" caused by the compositional shift of the DNA, which would have changed to increase principally the fraction of aspartic acid alone.

2.
Assembly of the ribosome from its protein and RNA constituents has been studied extensively over the past 50 years, and here we utilize a comparative analysis approach to relate the composition of ribosomal proteins (r-proteins) to their role in the assembly process. We computed the amino acid distributions for the 30S subunit r-protein sequences from 560 bacterial species and compared this composition to those of other house-keeping proteins from the same species. We found that r-proteins have a significantly higher content of positively charged residues (Lysine, K, and Arginine, R) than do nonribosomal proteins (10% for R and 11% for K in r-proteins, vs. 4.7% R and 5.9% K in nonribosomal proteins), which is consistent with prior knowledge of net positive charges carried by r-proteins (Baker et al., 2001; Klein et al., 2004; Burton et al., 2012). Furthermore, these two residues are also highly represented at contact sites along the protein/RNA interface (contact enrichment factor (CEF) > 1). These results provide further evidence of the importance of electrostatic interactions between the positively charged proteins and negatively charged ribosomal RNA (rRNA) during ribosome assembly. Other highly represented contact residues include polar and aromatic residues, which are likely to interact with rRNA via hydrogen bonds and base stacking interactions, respectively. Interestingly, the proportion of K residues generally decreases with r-protein size, reflecting a negative correlation between protein lengths and the proportion of K (Spearman's rank correlation, ρ = −0.802, p = 2.60e−5). We suggest that this trend helps the smaller r-proteins, which experience higher translational entropy than large proteins, overcome the increased free energy barrier during assembly.
When the r-protein sequences were categorized according to the species' optimal growth temperature, we found that thermophiles show increased R, Isoleucine (I), and Tyrosine (Y) content, whereas mesophiles have increased proportions of Serine (S) and Threonine (T). These results reflect one typical distinction between thermophiles and mesophiles (Kumar and Nussinov 2001), yet these differences in amino acid distributions do not extend to their respective contact sites. That is, the makeup of thermophilic and mesophilic r-protein contact residues is not significantly different (p > 0.01). This indicates that, while the percent compositions of amino acids relating to qualities such as thermostability and protein folding are expected to vary with environmental temperature, the distributions of residues in contact with rRNA are comparable for all bacterial species. From this, we conclude that the electrostatic interactions that guide ribosome assembly are independent of temperature.
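
The two quantities this abstract relies on, per-residue composition and Spearman's rank correlation between protein length and lysine fraction, can be sketched as follows. This is only an illustrative sketch and not the authors' pipeline: the function names (`residue_fractions`, `average_ranks`, `spearman_rho`) and the toy numbers are assumptions made up for the example.

```python
from collections import Counter


def residue_fractions(sequence):
    """Fraction of each one-letter amino acid code in a protein sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data, not taken from the study.
toy_protein = "MKRKAARKLG"
fractions = residue_fractions(toy_protein)
toy_lengths = [50, 100, 150, 200]
toy_k_fractions = [0.15, 0.12, 0.10, 0.08]
rho = spearman_rho(toy_lengths, toy_k_fractions)
```

In the toy data the lysine fraction falls strictly as length grows, so the ranks are perfectly reversed. The sketch does not compute the p-value reported in the abstract, which needs a separate significance test.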

3.
I G Wool, Y L Chan, A Glück, K Suzuki. Biochimie 1991, 73(7-8):861-870
The covalent structures of rat ribosomal proteins P0, P1, and P2 were deduced from the sequences of nucleotides in recombinant cDNAs. P0 contains 316 amino acids and has a molecular weight of 34,178; P1 has 114 residues and a molecular weight of 11,490; and P2 has 115 amino acids and a molecular weight of 11,684. The rat P-proteins have a nearly identical (16 of 17 residues) sequence of amino acids at their carboxyl termini and are related to analogous proteins in other eukaryotic species. A proposal is made for a uniform nomenclature for rat and yeast ribosomal proteins.

4.
To facilitate swift structural characterizations, structural genomic/proteomic projects need to divide large multi-domain proteins into structural domains and to determine their structures separately. Thus, the assignment of structural domains based solely on sequence information, especially on the physico-chemical properties of the amino acid sequences, could be very helpful for such projects. In this study, we examined the characteristics of domain linker sequences, which are loop sequences connecting two structural domains. To this end, we prepared a set of 101 non-redundant multi-domain protein sequences with known structures, and performed an analysis of the linker sequences. The analysis revealed that the frequencies of five (Pro, Gly, Asp, Asn, Lys) amino acid residues differed significantly between the linker and non-linker loop sequences. Moreover, we observed a similar deviation for the residue pair frequencies between the two types of loop sequences. Finally, we describe an automated method, based on the above analysis, to detect loops that have high probabilities of being domain linkers in a protein sequence.

5.
Ribosomal proteins in halobacteria   Cited by: 2 (self-citations: 0; citations by others: 2)
The amino acid sequences of 16 ribosomal proteins from the archaebacterium Halobacterium marismortui have been determined by a direct protein chemical method. In addition, amino acid sequences of three proteins, S11, S18, and L25, have been established by DNA sequencing of their genes as well as by protein sequencing. Comparison of their sequences with those of ribosomal proteins from other organisms revealed that proteins S14, S16, S19, and L25 are related to both eukaryotic and eubacterial ribosomal proteins, being more homologous to eukaryotic than eubacterial counterparts, and proteins S12, S15, and L16 are related to only eukaryotic ribosomal proteins. Furthermore, some proteins are found to be similar to only eubacterial proteins, whereas other proteins show no homology to any other known ribosomal proteins. Comparisons of amino acid compositions between halophilic and nonhalophilic ribosomal proteins revealed that halophilic proteins gain aspartic and glutamic acid residues and significantly lose lysine and arginine residues. In addition, halophilic proteins seem to lose isoleucine as compared with Escherichia coli ribosomal proteins.

6.
We address the question of the thermal stability of proteins in thermophiles through comprehensive genome comparison, focussing on the occurrence of salt bridges. We compared a set of 12 genomes (from four thermophilic archaeons, one eukaryote, six mesophilic eubacteria, and one thermophilic eubacterium). Our results showed that thermophiles have a greater content of charged residues than mesophiles, both at the overall genomic level and in alpha helices. Furthermore, we found that in thermophiles the charged residues in helices tend to be preferentially arranged with a 1–4 helical spacing and oriented so that intra-helical charge pairs agree with the helix dipole. Collectively, these results imply that intra-helical salt bridges are more prevalent in thermophiles than mesophiles and thus suggest that they are an important factor stabilizing thermophilic proteins. We also found that the proteins in thermophiles appear to be somewhat shorter than those in mesophiles. However, this latter observation may have more to do with evolutionary relationships than with physically stabilizing factors. In all our statistics we were careful to control for various biases. These could have, for instance, arisen due to repetitive or duplicated sequences. In particular, we repeated our calculation using a variety of random and directed sampling schemes. One of these involved making a "stratified sample," a representative cross-section of the genomes derived from a set of 52 orthologous proteins present roughly once in each genome. For another sample, we focused on the subset of the 52 orthologs that had a known 3D structure. This allowed us to determine the frequency of tertiary as well as main-chain salt bridges. Our statistical controls supported our overall conclusion about the prevalence of salt bridges in thermophiles in comparison to mesophiles.

7.
The majority of chloroplast ribosomal proteins are encoded in the nuclear genome. In order to characterize these proteins through their mRNA, we have previously constructed a spinach cDNA expression library and raised antisera to several spinach chloroplast ribosomal proteins. Here we describe the immunoisolation of cDNA clones encoding protein L11 and its chloroplast-targeting presequence. The cytoplasmic precursor form of L11 is 224 amino acid residues long (Mr 23,662); the mature L11 and the transit sequence are predicted to be of approximately 159 and approximately 65 residues, respectively. The predicted chloroplast L11 is significantly longer than the E. coli L11, but similar (in size) to archaebacterial and yeast cytoplasmic L11. In sequence it is closer to E. coli L11 (54% identity) than to the archaebacterial (32%) or yeast (23%) proteins. These results and the conservation of the contexts of the 3 methyl-modified residues found in E. coli L11 are discussed in the light of the endosymbiont theory and nuclear relocation of the rplKAJL gene cluster.

8.
The occurrence and relative positions of cysteine residues were investigated in proteins of various species. Considering random mathematical occurrence for an amino acid coded by two codons (3.28%), cysteine is underrepresented in all organisms investigated. Representation of cysteine appears to correlate positively with the complexity of the organism, ranging between 2.26% in mammals and 0.5% in some members of the Archaebacteria order. This observation, together with the results obtained from comparison of cysteine content of various ribosomal proteins, indicates that evolution takes advantage of increased use of cysteine residues. In all organisms studied except plants, two cysteines are frequently found two amino acid residues apart (C-(X)2-C motif). Such a motif is known to be present in a variety of metal-binding proteins and oxidoreductases. Remarkably, more than 21% of all cysteines were found within the C-(X)2-C motifs in Archaea. This observation may indicate that cysteine appeared in ancient metal-binding proteins first and was introduced into other proteins later.
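
The 3.28% baseline quoted above is consistent with two of the 61 sense codons of the standard genetic code (2/61 ≈ 3.28%). The sketch below reproduces that arithmetic and shows one possible way to count C-(X)2-C motifs and the share of cysteines inside them. It is only a sketch: the function names and the toy sequence are hypothetical, and the abstract does not say how overlapping motifs were handled, so counting them via a regex lookahead is an assumption.

```python
import re

SENSE_CODONS = 64 - 3  # the standard genetic code has three stop codons


def expected_fraction(n_codons):
    """Expected residue fraction if all sense codons were used equally."""
    return n_codons / SENSE_CODONS


# Zero-width lookahead, so overlapping motifs such as CxxCxxC are all found.
CXXC = re.compile(r"(?=C..C)")


def cxxc_stats(sequence):
    """Return (motif count, fraction of cysteines lying inside a motif)."""
    seq = sequence.upper()
    starts = [m.start() for m in CXXC.finditer(seq)]
    in_motif = set()
    for s in starts:
        in_motif.update((s, s + 3))  # positions of the two flanking cysteines
    total_cys = seq.count("C")
    fraction = len(in_motif) / total_cys if total_cys else 0.0
    return len(starts), fraction


cys_baseline_percent = round(100 * expected_fraction(2), 2)
motifs, cys_in_motifs = cxxc_stats("MACPHCGGC")  # made-up toy sequence
```

Note that only the two flanking cysteines of each motif are counted as "inside" it; a cysteine at one of the X positions is counted only if it also flanks another motif.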

9.
It is known that in thermophiles the G+C content of ribosomal RNA linearly correlates with growth temperature, while that of genomic DNA does not. Although the G+C contents (singlet) of the genomic DNAs of thermophiles and mesophiles do not differ significantly, the dinucleotide (doublet) compositions of the two bacterial groups clearly do. The average amino acid compositions of proteins of the two groups are also distinct. Based on these facts, we here analyzed the DNA and protein compositions of various bacteria in terms of the optimal growth temperature (OGT). Regression analyses of the sequence data for thermophilic, mesophilic and psychrophilic bacteria revealed good linear relationships between OGT and the dinucleotide compositions of DNA, and between OGT and the amino acid compositions of proteins. Together with the above-mentioned linear relationship between ribosomal RNA and OGT, the DNA and protein compositions can be regarded as thermostability measures for RNA, DNA and proteins, covering a wide range of temperatures. Both the DNA and proteins of psychrophiles apparently exhibit characteristics diametrically opposite to those of thermophiles. The physicochemical parameters of dinucleotides suggested that supercoiling of DNA is relevant to its thermostability. Protein stability in thermophiles is realized primarily through global changes that increase charged residues (i.e., Glu, Arg, and Lys) on the molecular surface of all proteins. This kind of global change is attainable through a change in the amino acid composition coupled with alterations in the DNA base composition. The general strategies of thermophiles and psychrophiles for adaptation to higher and lower temperatures, respectively, that are suggested by the present study are discussed.

10.
To understand more fully how amino acid composition of proteins has changed over the course of evolution, a method has been developed for estimating the composition of proteins in an ancestral genome. Estimates are based upon the composition of conserved residues in descendant sequences and empirical knowledge of the relative probability of conservation of various amino acids. Simulations are used to model and correct for errors in the estimates. The method was used to infer the amino acid composition of a large protein set in the Last Universal Ancestor (LUA) of all extant species. Relative to the modern protein set, LUA proteins were found to be generally richer in those amino acids that are believed to have been most abundant in the prebiotic environment and poorer in those amino acids that are believed to have been unavailable or scarce. It is proposed that the inferred amino acid composition of proteins in the LUA probably reflects historical events in the establishment of the genetic code.

11.
We identified 34 new ribosomal protein genes in the Schizosaccharomyces pombe database at the Sanger Centre coding for 30 different ribosomal proteins. All contain the Homol D-box in their promoter. We have shown that Homol D is, in this promoter type, the TATA-analogue. Many promoters contain the Homol E-box, which serves as a proximal activation sequence. Furthermore, comparative sequence analysis revealed a ribosomal protein gene encoding a protein which is the equivalent of the mammalian ribosomal protein L28. The budding yeast Saccharomyces cerevisiae has no L28 equivalent. Over the past 10 years we have isolated and characterized nine ribosomal protein (rp) genes from the fission yeast S. pombe. This endeavor yielded promoters which we have used to investigate the regulation of rp genes. Since eukaryotic ribosomal proteins are remarkably conserved and several rp genes of the budding yeast S. cerevisiae were sequenced in 1985, we probed genomic libraries of S. pombe with DNA fragments encoding S. cerevisiae ribosomal proteins. The deduced amino acid sequences of the different isolated rp genes of fission yeast share between 65 and 85% identical amino acids with their counterparts of budding yeast.

12.
13.
A method is proposed to represent and to analyze complete genome sequences (52 species from prokaryotes and eukaryotes), based upon the n-gram frequencies of amino acid pairs (bigrams) separated by a given number of other residues. For each of the species analyzed, it allows us to construct over-abundant and over-deficient occurrence profiles, summarizing amino acid bigram frequencies over the entire genome. The method deals efficiently with the sparseness of statistical representations of individual sequences, and describes every gene sequence in the same way, independently of its length and of the genome size. The frequency of over-abundant and over-deficient occurrences of bigrams presents a singular periodicity around 3.5 peptide bonds, suggesting a relation with the alpha helical secondary structure.
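
As a rough illustration of the bigram idea described above, the sketch below counts amino acid pairs separated by a fixed number of intervening residues and compares each observed pair frequency with what the single-residue frequencies alone would predict. The observed/expected ratio is an assumed stand-in for the abstract's over-abundance measure, whose exact definition is not given here; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def gapped_bigram_counts(sequence, gap):
    """Count residue pairs (a, b) with exactly `gap` residues between them."""
    seq = sequence.upper()
    step = gap + 1
    return Counter((seq[i], seq[i + step]) for i in range(len(seq) - step))


def observed_over_expected(sequence, gap):
    """Ratio of observed pair frequency to the product of single frequencies.

    Values above 1 suggest over-abundance and values below 1 suggest
    deficiency, under the (assumed) independence null model.
    """
    seq = sequence.upper()
    pairs = gapped_bigram_counts(seq, gap)
    n_pairs = sum(pairs.values())
    singles = Counter(seq)
    n = len(seq)
    return {
        (a, b): (count / n_pairs) / ((singles[a] / n) * (singles[b] / n))
        for (a, b), count in pairs.items()
    }


toy = "ABCABC"  # made-up sequence with period 3
counts = gapped_bigram_counts(toy, gap=2)
ratios = observed_over_expected(toy, gap=2)
```

Scanning `gap` over a range of values and plotting how many pairs are over- or under-represented at each gap is one way a periodicity such as the reported 3.5-residue signal could show up.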

14.
Beta-lactoglobulin is the major whey protein in the milk of ruminants and is expressed in the mammary gland during pregnancy and lactation. Here we describe the isolation and characterization of genomic clones encoding ovine beta-lactoglobulin. Two very similar, but non-identical, types of beta-lactoglobulin clone were obtained. DNA sequence analysis of one of these showed that the gene is 4900 bases long and contains seven exons. It codes for a protein of 180 amino acid residues, containing an 18-residue signal peptide, within exons I to VI; exon VII is non-coding. We show that the genes encoding serum retinol binding protein, major urinary protein, alpha-1-acid glycoprotein and apolipoprotein D have a similar organization of exons and introns to beta-lactoglobulin. In particular, a comparison between beta-lactoglobulin and retinol binding protein shows that both genes encode equivalent elements of three-dimensional protein structure within analogous exons. These proteins are all members of a large, diverse family of secretory proteins, many of which function in binding small hydrophobic molecules.

15.
Conformational isomers of insect odorant-binding proteins.   Cited by: 5 (self-citations: 0; citations by others: 5)
We have identified and cloned the cDNAs encoding odorant-binding proteins (OBPs) from the large black chafer, Holotrichia parallela, and the yellowish elongate chafer, Heptophylla picea. Each species possesses two OBPs; the proteins migrating faster in native gels (OBP1) showed high amino acid identity (>88%) to previously identified pheromone-binding proteins (PBPs) from scarab beetles. HparOBP1 and HpicOBP1 have 116 amino acids and six highly conserved cysteine residues. In contrast to OBP1, which gave a single band, HparOBP2 and HpicOBP2 each separated into two bands in native gels (15%). The N-terminal amino acid sequences for the two bands from each species were indistinguishable, and they had the same molecular masses. Although we sequenced several clones from each species, they all encoded only one protein for each species, indicating that the two bands are different conformational isomers of the same protein. HparOBP2 and HpicOBP2 have 133 amino acids and cysteine residues are conserved in proteins of the same family.

16.
Majewski J, Ott J. Gene 2003, 305(2):167-173
Functional differences between amino acids have long been of interest in understanding protein evolution. Several indices exist for comparing residues on the basis of their physicochemical properties and frequencies of occurrence in conserved protein alignments. Here we present a residue dissimilarity index based on coding single nucleotide polymorphisms (SNPs) in the human genome. The index represents an average, organism-wide set of differences between residues and provides important insight into evolutionary restraints on residue substitutions in the human genome. Unlike previous models, it is not restricted to highly conserved protein structures, nor confounded by evolutionary differences between species. Our results confirm earlier observations regarding residue mutabilities but also suggest that in addition to the established key properties, such as size and polarity, charge conservation may be an important and currently underestimated factor in protein evolution. We also estimate that less than 51% of amino acid substitutions occurring in the human genome are evolutionarily neutral.

17.
A partial nucleotide sequence of the mRNA encoding a major part of elongation factor 1 alpha (EF1 alpha) from a mitochondria-lacking protozoan, Giardia lamblia, was reported, and the phylogenetic relationship among lower eukaryotes was inferred by the maximum-likelihood and maximum-parsimony methods of protein phylogeny. Both methods consistently demonstrated that G. lamblia, among the four protozoan species analyzed, is the earliest offshoot of the eukaryotic tree. Although the Giardia EF1 alpha gene showed an extremely high G+C content as compared with those of other protozoa, it was concentrated only at the third codon positions, resulting in no remarkable differences of amino acid frequencies vis-a-vis those of other species. This clearly suggests (a) that the amino acid frequencies of conservative proteins are free from the drastic bias of genome G+C content, which is a serious problem in the widely used tree of ribosomal RNA, and (b) that protein phylogeny gives a robust estimation for the early divergences in the evolution of eukaryotes.

18.
The usage of synonymous codons and the frequencies of amino acids were investigated in the complete genome of the bacterium Thermotoga maritima using a multivariate statistical approach. The GC3 content of each gene was the most prominent source of variation of codon usage. Surprisingly, the usage of UGU and UGC (synonymous triplets coding for Cys, the least frequent amino acid in this species) was detected as the second most prominent source of variation. However, this result is probably an artifact due to the very low frequency of Cys together with the nonbiased composition of this genome. The third trend was related to the preferential usage of a subset of codons among highly expressed genes, and these triplets are presumed to be translationally optimal. Concerning the amino acid usage, the hydropathy level of each protein (and therefore the frequency of charged residues) was the main trend, while the second factor was related to the frequency of usage of the smaller residues, suggesting that the cell economy strongly influences the architecture of the proteins. The third axis of the analysis discriminated the usage of Phe, Tyr, Trp (aromatic residues) plus Cys, Met, and His. These six residues have in common the property of being the preferential targets of reactive oxygen species, and therefore the anaerobic condition of T. maritima is an important factor for the amino acid frequencies. Finally, the Cys content of each protein was the fourth trend. Received: 22 June 2001 / Accepted: 1 October 2001
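
GC3, the quantity named above as the main source of codon usage variation, is the G+C fraction at third codon positions. A minimal sketch of how it might be computed for one in-frame coding sequence follows. The function name and the toy sequence are hypothetical; whether start and stop codons are excluded varies between studies, and this sketch simply counts every complete codon.

```python
def gc3_content(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence.

    Assumes the sequence starts at codon position 1; a trailing partial
    codon is ignored. Accepts DNA (T) or RNA (U) letters.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop any incomplete final codon
    third_bases = seq[2:usable:3]      # every third base, starting at index 2
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)


toy_cds = "ATGGCCAAATAA"  # made-up: ATG GCC AAA TAA
gc3 = gc3_content(toy_cds)
```

Here the third bases are G, C, A and A, so half of them are G or C.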

19.
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have systematically analyzed the amino acid composition of globular proteins from different structural classes and outer membrane proteins. We found that the residues, Glu, His, Ile, Cys, Gln, Asn and Ser, show a significant difference between globular and outer membrane proteins. Based on this information, we have devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly picked up the outer membrane proteins with an accuracy of 89% for the training set of 337 proteins. On the other hand, our method has correctly excluded the globular proteins at an accuracy of 79% in a non-redundant dataset of 674 proteins. Furthermore, the present method is able to correctly exclude alpha-helical membrane proteins up to an accuracy of 80%. These accuracy levels are comparable to other methods in the literature, and this is a simple method, which could be used for dissecting outer membrane proteins from genomic sequences. The influence of protein size, structural class and specific residues for discrimination is discussed.

20.
Liang HK, Huang CM, Ko MT, Hwang JK. Proteins 2005, 59(1):58-63
Structural analysis is useful in elucidating structural features responsible for enhanced thermal stability of proteins. However, due to the rapid increase of sequenced genomic data, there are far more protein sequences than the corresponding three-dimensional (3D) structures. The usual sequence-based amino acid composition analysis provides useful but simplified clues about the amino acid types related to thermal stability of proteins. In this work, we developed a statistical approach to identify the significant amino acid coupling sequence patterns in thermophilic proteins. The amino acid coupling sequence pattern is defined as any 2 types of amino acids separated by 1 or more amino acids. Using this approach, we construct the rho profiles for the coupling patterns. The rho value gives a measure of the relative occurrence of a coupling pattern in thermophiles compared with mesophiles. We found that thermophiles and mesophiles exhibit significant bias in their amino acid coupling patterns. We showed that such bias is mainly due to temperature adaptation instead of species or GC content variations. Though no single outstanding coupling pattern can adequately account for protein thermostability, we can use a group of amino acid coupling patterns having strong statistical significance (p values < 10^-7) to distinguish between thermophilic and mesophilic proteins. We found a good correlation between the optimal growth temperatures of the genomes and the occurrences of the coupling patterns (the correlation coefficient is 0.89). Furthermore, we can separate the thermophilic proteins from their mesophilic orthologs using the amino acid coupling patterns. These results may be useful in the study of the enhanced stability of proteins from thermophiles, especially when structural information is scarce. Proteins 2005. (c) 2005 Wiley-Liss, Inc.
