Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.

2.
Evolutionary traces of thermophilic adaptation are manifest, on the whole-genome level, in compositional biases toward certain types of amino acids. However, it is sometimes difficult to discern their causes without a clear understanding of underlying physical mechanisms of thermal stabilization of proteins. For example, it is well-known that hyperthermophiles feature a greater proportion of charged residues, but, surprisingly, the excess of positively charged residues is almost entirely due to lysines but not arginines in the majority of hyperthermophilic genomes. All-atom simulations show that lysines have a much greater number of accessible rotamers than arginines of similar degree of burial in folded states of proteins. This finding suggests that lysines would preferentially entropically stabilize the native state. Indeed, we show in computational experiments that arginine-to-lysine amino acid substitutions result in noticeable stabilization of proteins. We then hypothesize that if evolution uses this physical mechanism as a complement to electrostatic stabilization in its strategies of thermophilic adaptation, then hyperthermostable organisms would have much greater content of lysines in their proteomes than comparably sized and similarly charged arginines. Consistent with that, high-throughput comparative analysis of complete proteomes shows extremely strong bias toward arginine-to-lysine replacement in hyperthermophilic organisms and overall much greater content of lysines than arginines in hyperthermophiles. This finding cannot be explained by genomic GC compositional biases or by the universal trend of amino acid gain and loss in protein evolution. We discovered here a novel entropic mechanism of protein thermostability due to residual dynamics of rotamer isomerization in native state and demonstrated its immediate proteomic implications. 
Our study provides an example of how analysis of a fundamental physical mechanism of thermostability helps to resolve a puzzle in comparative genomics as to why amino acid compositions of hyperthermophilic proteomes are significantly biased toward lysines but not similarly charged arginines.

3.
Amino acid repeats, or homorepeats, are low complexity protein motifs consisting of tandem repetitions of a single amino acid. Their presence and relative number vary in different proteomes, and some studies have tried to address this variation, proteome by proteome. In this work, we present a full characterization of amino acid homorepeats across evolution. We studied the presence and differential usage of each possible homorepeat in proteomes from various taxonomic groups, using clusters of very similar proteins to eliminate redundancy. The position of each amino acid repeat within proteins, and the order of co-occurring amino acid repeats were also addressed. As a result, we present evidence about the uneven evolution of homorepeats, as well as the functional implications of their relative position in proteins. We discuss some of these cases in their taxonomic context. Collectively, our results show evolutionary and positional signals that suggest that homorepeats have biological function, likely creating unspecific protein interactions or modulating specific interactions in a context-dependent manner. In conclusion, our work supports the functional importance of homorepeats and establishes a basis for the study of other low complexity repeats. Proteins 2017; 85:709–719. © 2016 Wiley Periodicals, Inc.

4.
All amino acid sequences derived from 248 prokaryotic genomes, 10 invertebrate genomes (plants and fungi) and 10 vertebrate genomes were analysed by the autocorrelation function of charge sequences. The analysis of the total amino acid sequences derived from the 268 biological genomes showed that a significant periodicity of 28 residues is observable for the vertebrate genomes, but not for the other genomes. When proteins with a charge periodicity of 28 residues (PCP28) were selected from the total proteomes, we found that PCP28 in fact exists in all proteomes, but the number of PCP28 is much larger for the vertebrate proteomes than for the other proteomes. Although excess PCP28 in the vertebrate proteomes are only poorly characterized, a detailed inspection of the databases suggests that most excess PCP28 are nuclear proteins.
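As a rough illustration of the kind of analysis entry 4 describes, the sketch below computes an autocorrelation of a charge sequence and reports the lag with the strongest signal. The charge assignment (K, R = +1; D, E = -1; every other residue 0), the unnormalised autocorrelation, and all function names are assumptions made for illustration; the abstract does not state the authors' exact definition.

```python
def charge_sequence(seq):
    """Map a protein sequence to per-residue charges (assumed scheme)."""
    # Assumption: His is treated as neutral; only K/R and D/E carry charge.
    charges = {"K": 1, "R": 1, "D": -1, "E": -1}
    return [charges.get(res, 0) for res in seq.upper()]


def autocorrelation(values, max_lag):
    """Mean product of values separated by each lag from 1 to max_lag."""
    n = len(values)
    result = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag
        if pairs <= 0:
            # Sequence too short for this lag: report no signal.
            result.append(0.0)
            continue
        total = sum(values[i] * values[i + lag] for i in range(pairs))
        result.append(total / pairs)
    return result


def strongest_lag(seq, max_lag=40):
    """Return the lag (in residues) with the largest autocorrelation."""
    ac = autocorrelation(charge_sequence(seq), max_lag)
    best_index = max(range(len(ac)), key=lambda i: ac[i])
    return best_index + 1  # index 0 corresponds to lag 1
```

On a toy sequence with one lysine every 28 residues, such as `("K" + "A" * 27) * 10`, `strongest_lag` returns 28. A real analysis would also need a significance criterion, which is not attempted here.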

5.
The beta hairpin motif is a ubiquitous protein structural motif that can be found in molecules across the tree of life. This motif, which is also popular in synthetically designed proteins and peptides, is known for its stability and adaptability to broad functions. Here, we systematically probe all 49,000 unique beta hairpin substructures contained within the Protein Data Bank (PDB) to uncover key characteristics correlated with stable beta hairpin structure, including amino acid biases and enriched interstrand contacts. We find that position specific amino acid preferences, while seen throughout the beta hairpin structure, are most evident within the turn region, where they depend on subtle turn dynamics associated with turn length and secondary structure. We also establish a set of broad design principles, such as the inclusion of aspartic acid residues at a specific position and the careful consideration of desired secondary structure when selecting residues for the turn region, that can be applied to the generation of libraries encoding proteins or peptides containing beta hairpin structures.

6.
Across the streptophyte lineage, which includes charophycean algae and embryophytic plants, there have been at least four independent transitions to the terrestrial habitat. One of these involved the evolution of embryophytes (bryophytes and tracheophytes) from a charophycean ancestor, while others involved the earliest branching lineages, containing the monotypic genera Mesostigma and Chlorokybus, and within the Klebsormidiales and Zygnematales lineages. To overcome heat, water stress, and increased exposure to ultraviolet radiation, which must have accompanied these transitions, adaptive mechanisms would have been required. During periods of dehydration and/or desiccation, proteomes struggle to maintain adequate cytoplasmic solute concentrations. The increased usage of charged amino acids (DEHKR) may be one way of maintaining protein hydration, while increased use of aromatic residues (FHWY) protects proteins and nucleic acids by absorbing damaging UV, with both groups of residues thought to be important for the stabilization of protein structures. To test these hypotheses we examined amino acid sequences of orthologous proteins representing both mitochondrion- and plastid-encoded proteomes across streptophytic lineages. We compared relative differences within categories of amino acid residues and found consistent patterns of amino acid compositional fluctuation in extra-membranous regions that correspond with episodes of terrestrialization: positive change in usage frequency for residues with charged side-chains, and aromatic residues of the light-capturing chloroplast proteomes. We also found a general decrease in the usage frequency of hydrophobic, aliphatic, and small residues. These results suggest that amino acid compositional shifts in extra-membrane regions of plastid and mitochondrial proteins may represent biochemical adaptations that allowed green plants to colonize the land.
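The residue categories named in entry 6 lend themselves to a simple usage-frequency comparison. The sketch below uses the two groups given in the abstract (charged DEHKR, aromatic FHWY; histidine appears in both, as in the source). The function names and the idea of subtracting one sequence's fraction from another's are illustrative assumptions, not the authors' procedure, which also restricted the comparison to extra-membranous regions of orthologous proteins.

```python
# Residue groups as listed in the abstract; His belongs to both.
CHARGED = set("DEHKR")
AROMATIC = set("FHWY")


def group_fraction(seq, group):
    """Fraction of residues in seq that belong to the given group."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for res in seq if res in group) / len(seq)


def usage_shift(reference_seq, derived_seq, group):
    """Change in group usage from a reference to a derived sequence.

    A positive value means the derived sequence uses the group more often.
    """
    return group_fraction(derived_seq, group) - group_fraction(reference_seq, group)
```

For example, `usage_shift("AAAA", "AAFW", AROMATIC)` gives 0.5, a positive shift toward aromatic residues.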

7.
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend for amino acid compositions of proteins found in full genomes of species of different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially for different kingdoms of life, and this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is a surprisingly small selection for the amino acid composition of proteins for higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency following from a uniform usage of codons of the universal genetic code. On the contrary, lower organisms (bacteria and especially archaea) demonstrate an enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes are totally disordered, and long (> 41 residues) disordered segments are found to occur in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. Correlations between the amino acid compositions of proteins of various taxa show that the highest correlation is observed between eukaryotes and their viruses (the correlation coefficient is 0.98), and bacteria and their viruses (the correlation coefficient is 0.96), while the correlation between eukaryotes and archaea is only 0.85.
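The "random" frequency mentioned in entry 7 can be made concrete: under uniform usage of the 61 sense codons of the standard genetic code, each amino acid's expected frequency is its codon count divided by 61. The sketch below computes that baseline, an observed composition, and a Pearson correlation between two compositions. The function names are hypothetical, and taking the correlation over the 20 amino acid frequencies is an assumption about how the reported coefficients were obtained.

```python
import math
from collections import Counter

# Number of sense codons per amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "C": 2, "D": 2, "E": 2, "F": 2, "H": 2, "K": 2, "N": 2, "Q": 2, "Y": 2,
    "M": 1, "W": 1,
}


def uniform_codon_frequencies():
    """Expected amino acid frequencies if all 61 sense codons are equally used."""
    total = sum(CODONS_PER_AMINO_ACID.values())
    return {aa: n / total for aa, n in CODONS_PER_AMINO_ACID.items()}


def composition(sequences):
    """Observed amino acid frequencies over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        # Ignore characters outside the 20 standard amino acids (e.g. X).
        counts.update(res for res in seq.upper() if res in CODONS_PER_AMINO_ACID)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in CODONS_PER_AMINO_ACID}


def pearson(freq_a, freq_b):
    """Pearson correlation between two compositions over the 20 amino acids.

    Raises ZeroDivisionError if either composition is constant.
    """
    keys = sorted(CODONS_PER_AMINO_ACID)
    xs = [freq_a[k] for k in keys]
    ys = [freq_b[k] for k in keys]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Comparing `composition(proteome_sequences)` with `uniform_codon_frequencies()` via `pearson` gives one possible measure of how far a proteome departs from the codon-count baseline.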

8.
Analysis of the Arabidopsis thaliana, Saccharomyces cerevisiae, Mus musculus, Escherichia coli, Bacillus subtilis, Thermoplasma acidophilum, and Sulfolobus tokodaii genomes demonstrates that many amino acid biases occur at the N- and C-termini of proteins, that a statistically significant number of these biases are evolutionarily conserved, and that these biases occur in amino acids beyond the first and last five amino acids. Analyses designed to shed light on the mechanism causing amino acid biases suggest that in at least some cases the bias is caused by forces acting at the nucleic acid level. It is also demonstrated that in E. coli functionally related proteins show similar biases at the N- and C-termini, suggesting that the mechanisms causing the biases are complex and in some cases are related to function.

9.
An exhaustive statistical analysis of the amino acid sequences at the carboxyl (C) and amino (N) termini of proteins and of coding nucleic acid sequences at the 5' side of the stop codons was undertaken. At the N ends, Met and Ala residues are over-represented at the first (+1) position whereas at positions 2 and 5 Thr is preferred. These peculiarities at N-termini are most probably related to the mechanism of initiation of translation (for Met) and to the mechanisms governing the life-span of proteins via regulation of their degradation (for Ala and Thr). We assume that the C-terminal bias facilitates fixation of the C ends on the protein globule by a preference for charged and Cys residues. The terminal biases, a novel feature of protein structure, have to be taken into account when molecular evolution, three-dimensional structure, initiation and termination of translation, protein folding and life-span are concerned. In addition, the bias of protein termini composition is an important feature which should be considered in protein engineering experiments.

10.
The levels of cellular organization in living organisms are the results of a variety of selection pressures. We have investigated here the final outcome of this integrated selective process in proteins of the best known microbial models Escherichia coli, Bacillus subtilis, and Methanococcus jannaschii, supposed to have undergone separate evolution for more than 1 billion years. Using multivariate analysis methods, including correspondence analysis, we studied the overall amino acid composition of all proteins making a proteome. Starting from and further developing previous results that had pointed out some general forces driving the amino acid composition of the proteomes of these model bacteria, we explored the correlations existing between the structure and functions of the proteins forming a proteome and their amino acid composition. The electric charge of amino acids measured against hydrophobicity creates a highly homogeneous cluster, made exclusively of proteins that are core components of the cytoplasmic membrane of the cell (integral inner membrane proteins). A second bias is imposed by the G+C content of the genome, indicating that protein functions are so robust with respect to amino acid changes that they can accommodate a large shift in the nucleotide content of the genome. A remarkable role of aromatic amino acids was uncovered. Expressed orphan proteins are enriched in these residues, suggesting that they might participate in a process of gain of function during evolution.

11.
The aim of this work is to elucidate how physical principles of protein design are reflected in natural sequences that evolved in response to the thermal conditions of the environment. Using an exactly solvable lattice model, we design sequences with selected thermal properties. Compositional analysis of designed model sequences and natural proteomes reveals a specific trend in amino acid compositions in response to the requirement of stability at elevated environmental temperature: the increase of fractions of hydrophobic and charged amino acid residues at the expense of polar ones. We show that this “from both ends of the hydrophobicity scale” trend is due to positive (to stabilize the native state) and negative (to destabilize misfolded states) components of protein design. Negative design strengthens specific repulsive non-native interactions that appear in misfolded structures. A pressure to preserve specific repulsive interactions in non-native conformations may result in correlated mutations between amino acids that are far apart in the native state but may be in contact in misfolded conformations. Such correlated mutations are indeed found in TIM barrel and other proteins.

12.
It is now known that 18 inherited neurological diseases are connected with mutations involving multiple insertions of a single amino acid residue in a protein sequence. Studying the functional role of such simple motifs is therefore an important task in biology. In this work we investigated how often homorepeats (runs of a single amino acid residue) six residues long, as well as simple motifs six residues long composed of two amino acid residues, occur at any position in three well-studied eukaryotic proteomes: Homo sapiens, Drosophila melanogaster, and Caenorhabditis elegans. It turns out that many simple motifs occur very often. The occurrence of each motif can be found at our site: http://antares.protres.ru/motifs_six_residues.html. One can suggest that such short, similar motifs are responsible for common functions in nonhomologous, unrelated proteins from different organisms.
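Counting homorepeats of the kind surveyed in entry 12 is straightforward with a back-referencing regular expression. The sketch below counts, for each residue type, how many proteins contain at least one run of six or more identical residues. The function name, the dictionary-of-sequences input, and the "at least six" threshold (rather than exactly six) are assumptions for illustration only.

```python
import re
from collections import Counter


def count_homorepeat_proteins(proteome, run_length=6):
    """Count proteins containing a run of >= run_length identical residues.

    proteome: dict mapping a protein identifier to its sequence.
    Returns a Counter keyed by residue letter.
    """
    # ([A-Z]) captures one residue; \1{n-1,} requires it to repeat
    # at least run_length - 1 more times.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (run_length - 1))
    counts = Counter()
    for seq in proteome.values():
        # A set ensures each protein is counted once per residue type.
        residues = {match.group(1) for match in pattern.finditer(seq.upper())}
        counts.update(residues)
    return counts
```

With `{"p1": "MQQQQQQAK", "p2": "MSSSSSSSQQQQQQ", "p3": "MAAAAAK"}` the result has Q counted twice and S once; the five-alanine run in `p3` falls below the threshold and is not counted.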

13.
As protein aggregation is potentially lethal, control of protein conformation by molecular chaperones is essential for cellular organisms. This is especially important during protein expression and translocation, since proteins are then unfolded and therefore most susceptible to form non-native interactions. Using TANGO, a statistical mechanics algorithm to predict protein aggregation, we here analyse the aggregation propensities of 28 complete proteomes. Our results show that between 10% and 20% of the residues in these proteomes are within aggregating protein segments and that this represents a lower limit for the aggregation tendency of globular proteins. Further, we show that evolution strongly pressurizes aggregation downwards not only by minimizing the amount of strongly aggregating sequences but also by selectively capping strongly aggregating hydrophobic protein sequences with arginine, lysine and proline. These residues are strongly favoured at these positions as they function as gatekeepers that are most efficient at opposing aggregation. Finally, we demonstrate that the substrate specificity of different unrelated chaperone families is geared by these gatekeepers. Chaperones face the difficulty of having to combine substrate affinity for a broad range of hydrophobic sequences and selectivity for those hydrophobic sequences that aggregate most strongly. We show that chaperones achieve these requirements by preferentially binding hydrophobic sequences that are capped by positively charged gatekeeper residues. In other words, targeting evolutionarily selected gatekeepers allows chaperones to prioritize substrate recognition according to aggregation propensity.

14.
Patterns of hydrophobic and hydrophilic residues play a major role in protein folding and function. Long, predominantly hydrophobic strings of 20-22 amino acids each are associated with transmembrane helices and have been used to identify such sequences. Much less attention has been paid to hydrophobic sequences within globular proteins. In prior work on computer simulations of the competition between on-pathway folding and off-pathway aggregate formation, we found that long sequences of consecutive hydrophobic residues promoted aggregation within the model, even controlling for overall hydrophobic content. We report here on an analysis of the frequencies of different lengths of contiguous blocks of hydrophobic residues in a database of amino acid sequences of proteins of known structure. Sequences of three or more consecutive hydrophobic residues are found to be significantly less common in actual globular proteins than would be predicted if residues were selected independently. The result may reflect selection against long blocks of hydrophobic residues within globular proteins relative to what would be expected if residue hydrophobicities were independent of those of nearby residues in the sequence.
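The comparison described in entry 14, observed block lengths against what independent residue selection would predict, can be sketched as follows. The hydrophobic residue set is an assumption (the abstract does not list one), as are the function names. The expected count uses the standard result for independent draws: a maximal run of exactly k hydrophobic residues needs a non-hydrophobic neighbour on each interior side.

```python
from collections import Counter
from itertools import groupby

# Assumption: an illustrative hydrophobic set, not the one used in the study.
HYDROPHOBIC = set("AILMFVW")


def run_lengths(seq, hydrophobic=HYDROPHOBIC):
    """Count maximal runs of consecutive hydrophobic residues by length."""
    counts = Counter()
    for is_hydrophobic, group in groupby(seq.upper(), key=lambda r: r in hydrophobic):
        if is_hydrophobic:
            counts[len(list(group))] += 1
    return counts


def expected_runs(n, k, p):
    """Expected number of maximal hydrophobic runs of exactly length k.

    Assumes a sequence of n residues, each hydrophobic independently
    with probability p.
    """
    if k > n:
        return 0.0
    if k == n:
        return p ** n
    # Interior runs are flanked by a non-hydrophobic residue on both sides.
    interior = max(n - k - 1, 0) * p ** k * (1 - p) ** 2
    # Runs touching either end of the sequence need only one flank.
    ends = 2 * p ** k * (1 - p)
    return interior + ends
```

Setting `p` to a sequence's observed hydrophobic fraction and comparing `run_lengths(seq)[k]` with `expected_runs(len(seq), k, p)` for k of three or more gives a per-sequence version of the comparison; pooling across a database and testing significance are left out of this sketch.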

15.

Background

Microsatellites have been used extensively in the field of comparative genomics. By studying microsatellites in coding regions we have a simple model of how genotypic changes undergo selection as they are directly expressed in the phenotype as altered proteins. The simplest of these tandem repeats in coding regions are the tri-nucleotide repeats, which produce a repeat of a single amino acid when translated into proteins. Tri-nucleotide repeats are often disease associated, and are also known to be unstable to both expansion and contraction. This makes them sensitive markers for studying proteome evolution in closely related species.

Results

The evolutionary history of Plasmodia, the family of malaria-causing parasites, is complex because of the organism's life cycle, in which it interacts with a number of different hosts and goes through a series of tissue-specific stages. This study shows that the divergence between the primate and rodent malarial parasites has resulted in a lineage-specific change in the simple amino acid repeat (SAAR) distribution that is correlated with A–T content. The paper also shows that this altered use of amino acids in SAARs is consistent with the repeat distributions being under selective pressure.

Conclusions

The study shows that simple amino acid repeat distributions can be used to group related species and to examine their phylogenetic relationships. This study also shows that an outgroup species with a similar A–T content can be distinguished based only on the amino acid usage in repeats, and suggests that this might be a useful feature for proteome clustering. The lineage-specific use of amino acids in repeat regions suggests that comparative studies of SAAR distributions between proteomes give an insight into the mechanisms of expansion and the selective pressures acting on the organism.

16.
Aggregation of expanded polyglutamine (polyQ) seems to be the cause of various genetic neurodegenerative diseases. Relatively little is known as yet about the polyQ structure and the mechanism that induces aggregation. We have characterised the solution structure of polyQ in a proteic context using a model system based on glutathione S-transferase fusion proteins. A wide range of biophysical techniques was applied. For the first time, nuclear magnetic resonance was used to observe directly and selectively the conformation of polyQ in the pathological range. We demonstrate that, in solution, polyQs are in a random coil conformation. However, under destabilising conditions, their aggregation behaviour is determined by the polyQ length.

17.
Combining motif discovery with the identification of disordered protein segments in the PDB allowed us to create the first and largest library of disordered patterns. At present the library includes 109 disordered patterns. Here we offer a comprehensive analysis of the occurrence of selected disordered patterns and of the 20 homorepeats six residues long in 123 proteomes. 27 disordered patterns occur sparsely in all considered proteomes, but the patterns of low complexity (homorepeats) appear more often in eukaryotic than in bacterial proteomes. A comparative analysis of the number of proteins containing homorepeats six residues long and the selected disordered patterns in these proteomes has been performed. Matrices of correlation coefficients have been calculated between the numbers of proteins in which a homorepeat six residues long (for each of the 20 types of amino acid residues) or one of the 109 disordered patterns from the library appears at least once, for 9 kingdoms of eukaryota and 5 phyla of bacteria. As a rule, the correlation coefficients are higher within a considered kingdom than between kingdoms. The largest fraction of homorepeats of 6 residues belongs to Amoebozoa proteomes (D. discoideum), 46%. Moreover, the longest uninterrupted repeats belong to S306 from D. discoideum (Amoebozoa). Homorepeats of some amino acids occur more frequently than others, and the type of homorepeat varies across proteomes. For example, E6 appears most frequently in all considered proteomes of Chordata, Q6 in Arthropoda, and S6 in Nematoda. The averaged occurrence of multiple long runs of 6 amino acids in decreasing order for 97 eukaryotic proteomes is as follows: Q6, S6, A6, G6, N6, E6, P6, T6, D6, K6, L6, H6, R6, F6, V6, I6, Y6, C6, M6, W6, and for 26 bacterial proteomes it is A6, G6, P6, and the others occur seldom. This suggests that such short similar motifs are responsible for common functions for nonhomologous, unrelated proteins from different organisms.

18.
Recent studies across animal phyla have suggested a possible link between amino acid compositional shifts and adaptive evolution across mitochondrial proteomes enabling longer lifespans. These studies examined associations of a gradual loss of cysteine (Cys) residues, increased usage of methionine (Met), and increased usage of threonine (Thr), with the evolution of longevity. Here, we examine all three hypotheses in a framework that considers nucleotide composition. We find that nucleotide composition is strongly correlated across codon positions, and with the above amino acid frequency patterns. We also find that the ND6 gene, which in vertebrates is the only mitochondrial gene situated on the “light-strand”, shows no significant pattern for any of the amino acid associations. We also reasoned that if the mitochondrially-encoded proteins of oxidative phosphorylation (OXPHOS) were under selection for such shifts, then nuclear-encoded components should also reflect such pressure. However, we found non-correspondence of these patterns in the nuclear genes when compared to the mitochondrial genes previously associated with positive selection. These results are strongly suggestive of mutational bias, or less efficient purifying selection, as the primary driver of whole proteome shifts in amino acid composition.

19.
Stabilization of secondary structure elements by specific combinations of hydrophobic and hydrophilic amino acids has been studied by analysis of pentapeptide fragments from twelve partial bacterial proteomes. PDB files describing structures of proteins from species with extremely high and low genomic GC-content, as well as with average G + C, were included in the study. Amino acid residues in 78,009 pentapeptides from alpha helices, beta strands and coil regions were classified into hydrophobic and hydrophilic ones. A common propensity scale for the 32 possible combinations of hydrophobic and hydrophilic amino acid residues in a pentapeptide has been created: specific pentapeptides for helix, sheet and coil were described. The usage of pentapeptides preferably forming alpha helices decreases in alpha helices of partial bacterial proteomes as the average genomic GC-content in first and second codon positions increases. The usage of pentapeptides preferably forming beta strands increases in coil regions and in helices of partial bacterial proteomes as the average genomic GC-content in first and second codon positions grows. Due to these circumstances the probability of coil-sheet and helix-sheet transitions should be increased in proteins encoded by GC-rich genes, making them prone to form amyloid in certain conditions. Possible causes of the observation that the importance of alpha helix and coil stabilization by specific combinations of hydrophobic and hydrophilic amino acids grows with decreasing genomic GC-content are discussed.
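The "32 possible combinations" in entry 19 follow from classifying each of the five positions in a pentapeptide as hydrophobic or hydrophilic (2^5 = 32). The sketch below enumerates those patterns and counts them over a sliding window. The hydrophobic residue set and the function names are assumptions; the study's own classification and its restriction to fragments from a single secondary-structure class are not reproduced here.

```python
from collections import Counter
from itertools import product

# Assumption: an illustrative hydrophobic set; the study's set is not given.
HYDROPHOBIC = set("ACFILMVW")

# All 2**5 = 32 hydrophobic (H) / hydrophilic (P) pentapeptide patterns.
ALL_PATTERNS = ["".join(p) for p in product("HP", repeat=5)]


def to_pattern(seq, hydrophobic=HYDROPHOBIC):
    """Translate a sequence into a string of H and P letters."""
    return "".join("H" if res in hydrophobic else "P" for res in seq.upper())


def pentapeptide_pattern_counts(seq, hydrophobic=HYDROPHOBIC):
    """Count each of the 32 patterns over all overlapping 5-residue windows."""
    pattern = to_pattern(seq, hydrophobic)
    # Start from zero for every pattern so absent ones are still reported.
    counts = Counter({p: 0 for p in ALL_PATTERNS})
    for start in range(len(pattern) - 4):
        counts[pattern[start:start + 5]] += 1
    return counts
```

Running `pentapeptide_pattern_counts` separately on helix, strand and coil segments (taken from an external secondary-structure assignment) and normalising the counts would give a simple per-class usage table for the 32 patterns.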

20.
MOTIVATION: Tandem peptide repeats play a key role in self-assembly and aggregation processes. A notable example is the occurrence of tandem peptide repeats in prion proteins and their role in the aggregation process that leads to the formation of the prion. One of the structural characteristics that is evident from the comparison of mammalian and yeast prion proteins is the presence of aromatic residues in their tandem repeats. These residues are accompanied by glycine residues before and/or after the aromatic amino acid. Such aromatic-glycine conjugates are also present in the tandem repeats of the large family of the bacterial ice nucleation proteins. To study the significance of such aromatic-glycine occurrences, a global analysis of all the aromatic octapeptide repeats in the Swiss-Prot and TrEMBL databases was conducted. The search pattern was formulated to compare the number of conjugates of each of the 20 natural amino acids before or after the different aromatic residues. RESULTS: The presence of aromatic-glycine conjugates appears to be significantly higher than that of aromatic conjugates to any other amino acid. Furthermore, all six combinations of glycine occurrences before or after the three aromatic residues are present. No such pattern was observed for any other amino acid. The significance of the findings is discussed in the context of the physicochemical properties of aromatic-glycine conjugates and their possible role in the facilitation of aggregate formation.
