Similar literature
1.
Protein size is an important biochemical feature since longer proteins can harbor more domains and therefore can display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ~81 aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could be explained neither by endosymbiosis nor by subcellular compartmentation nor by exon size, but rather by exon number. Metazoan proteins are encoded on average by ~10 exons of small size [~176 nucleotides (nt)]. Streptophyta have on average only ~5.7 exons of medium size (~230 nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ rather larger exons (>400 nt). Among subcellular compartments, membrane proteins are the largest (~520 aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (~240 aa). Plant proteins are encoded by half the number of exons and also contain fewer domains than animal proteins on average. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than bacteria but smaller than animals or fungi. Compared to the average of eukaryotic species, plants have ~34% more but ~20% smaller proteins. This suggests that photosynthetic organisms are unique and therefore deserve special attention with regard to the evolutionary forces acting on their genomes and proteomes.
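
As a rough check on the exon figures quoted in the abstract above, the sketch below multiplies mean exon count by mean exon size and divides by three nucleotides per codon. It assumes, purely for illustration, that every exon is entirely coding (no untranslated regions, no stop codon), so the results are upper-bound estimates and not values reported by the authors. The function name is hypothetical.

```python
def approx_protein_length(mean_exon_count: float, mean_exon_size_nt: float) -> float:
    """Rough protein length in amino acids, ASSUMING fully coding exons."""
    coding_nt = mean_exon_count * mean_exon_size_nt
    return coding_nt / 3  # three nucleotides per codon


# Mean exon counts and sizes as quoted in the abstract (approximate values).
metazoa_aa = approx_protein_length(10, 176)
streptophyta_aa = approx_protein_length(5.7, 230)

print(f"Metazoa: ~{metazoa_aa:.0f} aa")
print(f"Streptophyta: ~{streptophyta_aa:.0f} aa")
```

Under these assumptions the Streptophyta estimate comes out shorter than the Metazoa estimate, which is the direction the abstract describes when it attributes smaller plant proteins to exon number.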

2.
The elemental composition of proteins influences the quantities of different elements required by organisms. Here, we considered variation in the sulphur content of whole proteomes among 19 Archaea, 122 Eubacteria and 10 eukaryotes whose genomes have been fully sequenced. We found that different species vary greatly in the sulphur content of their proteins, and that average sulphur content of proteomes and genome base composition are related. Forces contributing to variation in proteomic sulphur content appear to operate quite uniformly across the proteins of different species. In particular, the sulphur content of orthologous proteins was frequently correlated with mean proteomic sulphur contents. Among prokaryotes, proteomic sulphur content tended to be greater in anaerobes, relative to non-anaerobes. Thermophiles tended to have lower proteomic sulphur content than non-thermophiles, consistent with the thermolability of cysteine and methionine residues. This work suggests that persistent environmental growth conditions can influence the evolution of elemental composition of whole proteomes in a manner that may have important implications for the amount of sulphur used by living organisms to build proteins. It extends previous studies that demonstrated links between transient changes in environmental conditions and the elemental composition of subsets of proteins expressed under these conditions.

3.
We cloned and sequenced a plant cDNA that encodes U1 small nuclear ribonucleoprotein (snRNP) 70K protein. The plant U1 snRNP 70K protein cDNA is not full length and lacks the coding region for 68 amino acids in the amino-terminal region as compared to human U1 snRNP 70K protein. Comparison of the deduced amino acid sequence of the plant U1 snRNP 70K protein with the amino acid sequence of animal and yeast U1 snRNP 70K protein showed a high degree of homology. The plant U1 snRNP 70K protein is more closely related to the human counterpart than to the yeast 70K protein. The carboxy-terminal half is less well conserved but, like the vertebrate 70K proteins, is rich in charged amino acids. Northern analysis with the RNA isolated from different parts of the plant indicates that the snRNP 70K gene is expressed in all of the parts tested. Southern blotting of genomic DNA using the cDNA indicates that the U1 snRNP 70K protein is encoded by a single gene.

4.
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend for amino acid compositions of proteins found in full genomes of species of different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially for different kingdoms of life, and this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is a surprisingly small selection for the amino acid composition of proteins for higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency following from a uniform usage of codons of the universal genetic code. On the contrary, lower organisms (bacteria and especially archaea) demonstrate an enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes are totally disordered, and long (> 41 residues) disordered segments are found to occur in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. Correlations between amino acid compositions of proteins of various taxa show that the highest correlation is observed between eukaryotes and their viruses (the correlation coefficient is 0.98), and bacteria and their viruses (the correlation coefficient is 0.96), while the correlation between eukaryotes and archaea is only 0.85.
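
The "random" baseline mentioned in the abstract above, amino acid frequencies that would follow from uniform codon usage, can be sketched as below. This is an illustrative reading, not the authors' procedure: it assumes the standard genetic code (NCBI translation table 1) and assumes that the three stop codons are excluded, so each amino acid's frequency is its codon count divided by 61. The function and variable names are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... and "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def uniform_codon_frequencies() -> dict[str, float]:
    """Expected amino acid frequencies if all sense codons were equally used.

    ASSUMPTION: stop codons are dropped, leaving 61 sense codons.
    """
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


baseline = uniform_codon_frequencies()
for aa, freq in sorted(baseline.items(), key=lambda kv: -kv[1]):
    print(f"{aa}: {freq:.3f}")
```

Observed proteome frequencies could then be compared against `baseline`, for example with a correlation coefficient, to gauge how far a taxon departs from uniform codon usage.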

5.
Lipid bodies store oils in the form of triacylglycerols. Oleosin, caleosin and steroleosin are unique proteins localized on the surface of lipid bodies in seed plants. This study has identified genes encoding lipid body proteins oleosin, caleosin and steroleosin in the genomes of five plants: Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Selaginella moellendorffii and Physcomitrella patens. The protein sequence alignment indicated that each oleosin protein contains a highly conserved proline knot motif, and the proline knob motif is well conserved in steroleosin proteins, while caleosin proteins possess the Dx[D/N]xDG-containing calcium-binding motifs. The identification of motifs (proline knot and knob) and conserved amino acids at the active site was further supported by the sequence logos. The phylogenetic analysis revealed the presence of magnoliophyte- and bryophyte-specific subgroups. We analyzed the public microarray data for expression of oleosin, caleosin and steroleosin in Arabidopsis and rice during the vegetative and reproductive stages, or under abiotic stresses. Our results indicated that genes encoding oleosin, caleosin and steroleosin proteins were expressed predominantly in plant seeds. This work may facilitate better understanding of the members of lipid-body-membrane proteins in diverse organisms and their gene expression in model plants Arabidopsis and rice.

6.
Flowering plants, angiosperms, can be divided into two major clades, monocots and dicots, and while differences in amino acid composition in different species from the two clades have been reported, a systematic analysis of amino acid content and distribution remains outstanding. Here, we show that monocot and dicot proteins have developed distinct amino acid content. In Arabidopsis thaliana and poplar, as in the ancestral moss Physcomitrella patens, the average mass per amino acid appears to be independent of protein length, while in the monocots rice, maize and sorghum, shorter proteins tend to be made of lighter amino acids. An examination of the elemental content of these proteomes reveals that the difference between monocot and dicot proteins can be largely attributed to their different carbon signatures. In monocots, the shorter proteins, which comprise the majority of all proteins, are made of amino acids with less carbon, while the nitrogen content is unchanged in both monocots and dicots. We hypothesise that this signature could be the result of carbon use and energy optimisation in fast-growing annual Poaceae (grasses).

7.
Evolutionary traces of thermophilic adaptation are manifest, on the whole-genome level, in compositional biases toward certain types of amino acids. However, it is sometimes difficult to discern their causes without a clear understanding of underlying physical mechanisms of thermal stabilization of proteins. For example, it is well-known that hyperthermophiles feature a greater proportion of charged residues, but, surprisingly, the excess of positively charged residues is almost entirely due to lysines but not arginines in the majority of hyperthermophilic genomes. All-atom simulations show that lysines have a much greater number of accessible rotamers than arginines of similar degree of burial in folded states of proteins. This finding suggests that lysines would preferentially entropically stabilize the native state. Indeed, we show in computational experiments that arginine-to-lysine amino acid substitutions result in noticeable stabilization of proteins. We then hypothesize that if evolution uses this physical mechanism as a complement to electrostatic stabilization in its strategies of thermophilic adaptation, then hyperthermostable organisms would have much greater content of lysines in their proteomes than comparably sized and similarly charged arginines. Consistent with that, high-throughput comparative analysis of complete proteomes shows extremely strong bias toward arginine-to-lysine replacement in hyperthermophilic organisms and overall much greater content of lysines than arginines in hyperthermophiles. This finding cannot be explained by genomic GC compositional biases or by the universal trend of amino acid gain and loss in protein evolution. We discovered here a novel entropic mechanism of protein thermostability due to residual dynamics of rotamer isomerization in native state and demonstrated its immediate proteomic implications. 
Our study provides an example of how analysis of a fundamental physical mechanism of thermostability helps to resolve a puzzle in comparative genomics as to why amino acid compositions of hyperthermophilic proteomes are significantly biased toward lysines but not similarly charged arginines.

8.
Protein products of highly expressed genes tend to favor amino acids that have lower average biosynthetic costs (i.e., they exhibit metabolic efficiency). While this trend has been observed in several studies, the specific sites where cost-reducing substitutions accumulate have not been well characterized. Toward that end, weighted costs in conserved and variable positions were evaluated across a total of 9,119 homologous proteins in four mammalian orders (primates, carnivores, rodents, and artiodactyls), which together contain a total of 20,457,072 amino acids. Degree of conservation at homologous positions in these mammalian proteins and average-weighted cost across all positions within a single protein are significantly correlated. Dividing human genes into two classes (those with and those without CpG islands in their promoters) suggests that humans also preferentially utilize less costly amino acids in highly expressed genes. In contrast to the intuitive expectation that the relatively weak selective force associated with metabolic efficiency would be a selection pressure in complex multicellular organisms, the overall level of selective constraint within the variable regions of mammalian proteins allows the metabolic efficiency to derive a reduction of overall biosynthetic cost, particularly in genes with the highest levels of expression.

9.
The plastids of ecologically and economically important algae from phyla such as stramenopiles, dinoflagellates and cryptophytes were acquired via a secondary endosymbiosis and are surrounded by three or four membranes. Nuclear‐encoded plastid‐localized proteins contain N‐terminal bipartite targeting peptides with the conserved amino acid sequence motif ‘ASAFAP’. Here we identify the plastid proteomes of two diatoms, Thalassiosira pseudonana and Phaeodactylum tricornutum, using a customized prediction tool (ASAFind) that identifies nuclear‐encoded plastid proteins in algae with secondary plastids of the red lineage based on the output of SignalP and the identification of conserved ‘ASAFAP’ motifs and transit peptides. We tested ASAFind against a large reference dataset of diatom proteins with experimentally confirmed subcellular localization and found that the tool accurately identified plastid‐localized proteins with both high sensitivity and high specificity. To identify nucleus‐encoded plastid proteins of T. pseudonana and P. tricornutum we generated optimized sets of gene models for both whole genomes, to increase the percentage of full‐length proteins compared with previous assembly model sets. ASAFind applied to these optimized sets revealed that about 8% of the proteins encoded in their nuclear genomes were predicted to be plastid localized and therefore represent the putative plastid proteomes of these algae.

10.
11.
12.
Nitrogen (N) is a fundamental component of nucleotides and amino acids and is often a limiting nutrient in natural ecosystems. Thus, study of the N content of biomolecules may establish important connections between ecology and genomics. However, while significant differences in the elemental composition of whole organisms are well documented, how the flux of nutrients in the cell has shaped the evolution of different cellular processes remains poorly understood. By examining the elemental composition of major functional classes of proteins in four multicellular eukaryotic model organisms, we find that the catabolic machinery shows substantially lower N content than the anabolic machinery and the rest of the proteome. This pattern suggests that ecological selection for N conservation specifically targets cellular components that are highly expressed in response to nutrient limitation. We propose that the RNA component of the anabolic machineries is the mechanistic force driving the elemental imbalance we found, and that RNA functions as an intracellular nutrient reservoir that is degraded and recycled during starvation periods. A comparison of the elemental composition of the anabolic and catabolic machineries in species that have experienced different levels of N limitation in their evolutionary history (animals versus plants) suggests that selection for N conservation has preferentially targeted the catabolic machineries of plants, resulting in a lower N content of the proteins involved in their catabolic processes. These findings link the composition of major cellular components to the environmental factors that trigger the activation of those components, suggesting that resource availability has constrained the atomic composition and the molecular architecture of the biotic processes that enable cells to respond to reduced nutrient availability.

13.
The plant enzyme 4-coumarate:coenzyme A ligase (4CL) is part of a family of adenylate-forming enzymes present in all organisms. Analysis of genome sequences shows the presence of '4CL-like' enzymes in plants and other organisms, but their evolutionary relationships and functions remain largely unknown. 4CL and 4CL-like genes were identified by BLAST searches in Arabidopsis, Populus, rice, Physcomitrella, Chlamydomonas and microbial genomes. Evolutionary relationships were inferred by phylogenetic analysis of aligned amino acid sequences. Expression patterns of a conserved set of Arabidopsis and poplar 4CL-like acyl-CoA synthetase (ACS) genes were assayed. The conserved ACS genes form a land plant-specific class. Angiosperm ACS genes grouped into five clades, each of which contained representatives in three fully sequenced genomes. Expression analysis revealed conserved developmental and stress-induced expression patterns of Arabidopsis and poplar genes in some clades. Evolution of plant ACS enzymes occurred early in land plants. Differential gene expansion of angiosperm ACS clades has occurred in some lineages. Evolutionary and gene expression data, combined with in vitro and limited in vivo protein function data, suggest that angiosperm ACS enzymes play conserved roles in octadecanoid and fatty acid metabolism, and play roles in organ development, for example in anthers.

14.
15.
16.
Few quantitative measures of genome architecture or organization exist to support assumptions of differences between microorganisms that are broadly defined as being free-living or pathogenic. General principles about complete proteomes exist for codon usage, amino acid biases and essential or core genes. Genome-wide shifts in amino acid usage between free-living and pathogenic microorganisms result in fundamental differences in the complexity of their respective proteomes that are size and gene content independent. These differences are evident across broad phylogenetic groups, a result of environmental factors and population genetic forces rather than phylogenetic distance. A novel comparative analysis of amino acid usage, utilizing linguistic analyses of word frequency in language and text, identified a global pattern of higher peptide word repetition in 376 free-living versus 421 pathogen genomes across broad ranges of genome size, G+C content and phylogenetic ancestry. This imprint of repetitive word usage indicates free-living microorganisms have a bias for repetitive sequence usage compared to pathogens. These findings quantify fundamental differences in microbial genomes relative to life-history function.

17.
At least six rust resistance specificities (P and P1 to P5) map to the complex P locus in flax. The P2 resistance gene was identified by transposon tagging and transgenic expression. P2 is a member of a small multigene family and encodes a protein with nucleotide binding site (NBS) and leucine-rich repeat (LRR) domains and an N-terminal Toll/interleukin-1 receptor (TIR) homology domain, as well as a C-terminal non-LRR (CNL) domain of approximately 150 amino acids. A related CNL domain was detected in almost half of the predicted Arabidopsis TIR-NBS-LRR sequences, including the RPS4 and RPP1 resistance proteins, and in the tobacco N protein, but not in the flax L and M proteins. Presence or absence of this domain defines two subclasses of TIR-NBS-LRR resistance genes. Truncations of the P2 CNL domain cause loss of function, and evidence for diversifying selection was detected in this domain, suggesting a possible role in specificity determination. A spontaneous rust-susceptible mutant of P2 contained a G→E amino acid substitution in the GLPL motif, which is conserved in the NBS domains of plant resistance proteins and the animal cell death control proteins APAF-1 and CED4, providing direct evidence for the importance of this motif in resistance gene function. A P2 homologous gene isolated from a flax line expressing the P resistance specificity encodes a protein with only 10 amino acid differences from the P2 protein. Chimeric gene constructs indicate that just six of these amino acid changes, all located within the predicted beta-strand/beta-turn motif of four LRR units, are sufficient to alter P2 to the P specificity.

18.
19.
Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We compared orthologous genes between animal mitochondrial genomes that show large differences in GC and AT skews. Specifically, we compared the mitochondrial genomes of mammals, which are characterized by a negative GC skew and a positive AT skew, to those of flatworms, which show the opposite skews for both GC and AT base pairs. We found that the mammalian proteins are highly enriched in amino acids encoded by CA-rich codons (as predicted by their negative GC and positive AT skews), whereas their flatworm orthologs were enriched in amino acids encoded by GT-rich codons (also as predicted from their skews). We found that these differences in mitochondrial strand asymmetry (measured as GC and AT skews) can have very large, predictable effects on the composition of the encoded proteins.
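
The skews discussed in the abstract above are commonly computed on a single strand as (G - C)/(G + C) and (A - T)/(A + T). The abstract itself does not state a formula, so the sketch below assumes these conventional definitions; the function names and the toy sequence are hypothetical and are not data from the study.

```python
def skew(seq: str, first: str, second: str) -> float:
    """Return (n_first - n_second) / (n_first + n_second) for one strand.

    Returns 0.0 when neither nucleotide occurs, to avoid dividing by zero.
    """
    seq = seq.upper()
    n_first = seq.count(first)
    n_second = seq.count(second)
    total = n_first + n_second
    if total == 0:
        return 0.0
    return (n_first - n_second) / total


def gc_skew(seq: str) -> float:
    """GC skew, ASSUMING the conventional definition (G - C) / (G + C)."""
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    """AT skew, ASSUMING the conventional definition (A - T) / (A + T)."""
    return skew(seq, "A", "T")


# Toy CA-rich strand (made up for illustration).
example_strand = "ACCACCATCACCAACC"
print(f"GC skew: {gc_skew(example_strand):+.2f}")
print(f"AT skew: {at_skew(example_strand):+.2f}")
```

A strand rich in C and A, like the toy example, gives a negative GC skew and a positive AT skew, the combination the abstract attributes to mammalian mitochondrial genomes.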

20.