Similar articles
1.
Archaea, bacteria and eukaryotes represent the main kingdoms of life. Is there any trend in the amino acid compositions of proteins found in the full genomes of species from different kingdoms? What is the percentage of totally unstructured proteins in various proteomes? We obtained amino acid frequencies for different taxa using 195 known proteomes and all annotated sequences from the Swiss-Prot database. Investigation of the two databases (proteomes and Swiss-Prot) shows that the amino acid compositions of proteins differ substantially between kingdoms of life, and that this difference is larger between different proteomes than between different kingdoms of life. Our data demonstrate that there is surprisingly little selection on the amino acid composition of proteins in higher organisms (eukaryotes) and their viruses in comparison with the "random" frequency following from a uniform usage of codons of the universal genetic code. In contrast, lower organisms (bacteria and especially archaea) demonstrate an enhanced selection of amino acids. Moreover, according to our estimates, 12%, 3% and 2% of the proteins in eukaryotic, bacterial and archaean proteomes, respectively, are totally disordered, and long (> 41 residues) disordered segments occur in 16% of archaean, 20% of eubacterial and 43% of eukaryotic proteins for 19 archaean, 159 bacterial and 17 eukaryotic proteomes, respectively. A correlation analysis of the amino acid compositions of proteins from various taxa shows that the highest correlation is observed between eukaryotes and their viruses (correlation coefficient 0.98) and between bacteria and their viruses (correlation coefficient 0.96), while the correlation between eukaryotes and archaea is only 0.85.
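
The "random" frequency mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard genetic code (NCBI translation table 1) and equal usage of all 61 sense codons; the function names `uniform_codon_frequencies`, `composition` and `pearson` are hypothetical helpers for illustration and are not taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def uniform_codon_frequencies():
    """Amino acid frequencies if all 61 sense codons were used equally often."""
    counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
    total = sum(counts.values())  # 61 sense codons
    return {aa: n / total for aa, n in counts.items()}


def composition(protein):
    """Observed frequencies of the 20 standard amino acids in a protein."""
    counts = Counter(protein.upper())
    return {aa: counts[aa] / len(protein) for aa in uniform_codon_frequencies()}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)
```

Under this assumption leucine, serine and arginine each have an expected frequency of 6/61, while methionine and tryptophan each have 1/61. Comparing an observed composition with these values through `pearson` gives one rough measure of how far a proteome departs from uniform codon usage.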

2.
3.
G D'Onofrio  G Bernardi 《Gene》1992,110(1):81-88
We have investigated the compositional distributions of third codon positions of genes from the 16 prokaryotes and seven eukaryotes for which the largest numbers of coding sequences are available in data banks. In prokaryotes, both narrow and broad distributions were found. In eukaryotes, distributions were very broad (except for Saccharomyces cerevisiae) and remarkably different for different genomes. In low-GC genomes, third codon positions were lower in GC than first + second codon positions and trailed towards high GC; the opposite situation was found for high-GC genomes. In all genomes, first codon positions were higher in GC than second codon positions. We then investigated the compositional correlations between third and first + second codon positions in prokaryotic genomes (the 16 mentioned above plus 87 additional ones) and in genome compartments of eukaryotes. A general, common relationship was found, which also holds within the same (heterogeneous) genomes. This universal correlation is due to the fact that the relative effects of compositional constraints on different codon positions are the same, on the average, whatever the genome under consideration.
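
The GC levels at the first, second and third codon positions discussed here can be computed as in the following minimal sketch. It assumes the input is an in-frame coding sequence; `gc_by_codon_position` is a hypothetical helper name, not something defined in the paper.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    levels = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every base at this codon position
        gc = sum(base in "GC" for base in bases)
        levels.append(gc / len(bases) if bases else 0.0)
    return tuple(levels)
```

For the toy sequence `ATGGCCTAA` (codons ATG, GCC, TAA) the first and second positions each contain one G or C out of three bases, and the third position contains two out of three.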

4.
The ribosomes of the amitochondriate but hydrogenosome-containing protist lineage, the trichomonads, have previously been reported to be prokaryotic or primitive eukaryotic, based on evidence that they have a 70S sedimentation coefficient and a small number of proteins, similar to prokaryotic ribosomes. In order to determine whether the components of the trichomonad ribosome indeed differ from those of typical eukaryotic ribosomes, the ribosome of a representative trichomonad, Trichomonas vaginalis, was characterized. The sedimentation coefficient of the T. vaginalis ribosome was smaller than that of Saccharomyces cerevisiae and larger than that of Escherichia coli. Based on two-dimensional PAGE analysis, the number of different ribosomal proteins was estimated to be approximately 80. This number is the same as those obtained for typical eukaryotes (approximately 80) but larger than that of E. coli (approximately 55). N-Terminal amino acid sequencing of 18 protein spots and the complete sequences of 4 ribosomal proteins as deduced from their genes revealed these sequences to display typical eukaryotic features. Phylogenetic analyses of the five ribosomal proteins currently available also clearly confirmed that the T. vaginalis sequences are positioned within a eukaryotic clade. Comparison of deduced secondary structure models of the small and large subunit rRNAs of T. vaginalis with those of other eukaryotes revealed that all helices commonly found in typical eukaryotes are present and conserved in T. vaginalis, while variable regions are shortened or lost. These lines of evidence demonstrate that the T. vaginalis ribosome has no prokaryotic or primitive eukaryotic features but is clearly a typical eukaryotic type.

5.
A compositional transition was previously detected by comparing orthologous coding sequences from cold- and warm-blooded vertebrates (see Bernardi, G., Hughes, S., Mouchiroud, D., 1997. The major compositional transitions in the vertebrate genome. J. Mol. Evol. 44, S44-S51 for a review). The transition is characterized by higher GC levels (GC is the molar ratio of guanine+cytosine in DNA) and, especially, by higher GC3 levels (GC3 is the GC level of third codon positions) in coding sequences from warm-blooded vertebrates. This transition essentially affects GC-rich genes, although the nucleotide substitution rate is of the same order of magnitude in both GC-poor and GC-rich genes. In order to understand the evolutionary basis of the changes, we have compared the hydrophobicity of orthologous proteins from Xenopus and human. Although the differences are small in proteins encoded by coding sequences ranging from 0 to 65% in GC3, they are large in the proteins encoded by sequences characterized by GC3 values higher than 65%. The latter proteins are more hydrophobic in human than in Xenopus.

6.

Background

The origin of eukaryotes remains a fundamental question in evolutionary biology. Although it is clear that eukaryotic genomes are a chimeric combination of genes of eubacterial and archaebacterial ancestry, the specific ancestry of most eubacterial genes is still unknown. The growing availability of microbial genomes offers the possibility of analyzing the ancestry of eukaryotic genomes and testing previous hypotheses on their origins.

Methodology/Principal Findings

Here, we have applied a phylogenomic analysis to investigate a possible contribution of the Myxococcales to the first eukaryotes. We conducted a conservative pipeline with homologous sequence searches against a genomic sampling of 40 eukaryotic and 357 prokaryotic genomes. The phylogenetic reconstruction showed that several eukaryotic proteins traced to Myxococcales. Most of these proteins were associated with mitochondrial lipid intermediate pathways, particularly enzymes generating reducing equivalents with pivotal roles in fatty acid β-oxidation metabolism. Our data suggest that myxococcal species with the ability to oxidize fatty acids transferred several genes to eubacteria that eventually gave rise to the mitochondrial ancestor. Later, the eukaryotic nucleocytoplasmic lineage acquired those metabolic genes through endosymbiotic gene transfer.

Conclusions/Significance

Our results support a prokaryotic origin, different from α-proteobacteria, for several mitochondrial genes. Our data reinforce a fluid prokaryotic chromosome model in which the mitochondrion appears to be an important entry point for myxococcal genes to enter eukaryotes.

7.
Generalized structures of the 5S ribosomal RNAs.
The sequences of 5S ribosomal RNAs from a wide range of organisms have been compared. All sequences fit a generalized 5S RNA secondary structural model. Twenty-three nucleotide positions are found universally, i.e., in 5S RNAs of eukaryotes, prokaryotes, archaebacteria, chloroplasts and mitochondria. One major distinguishing feature between the prokaryotic and eukaryotic 5S RNAs is the number of nucleotide positions between certain universal positions, e.g., prokaryotic 5S RNAs have three positions between the universal positions PuU40 and G44 (using the E. coli numbering system) and eukaryotic 5S RNAs have two. The archaebacterial 5S RNAs appear to resemble the eukaryotic 5S RNAs to varying degrees depending on the species of archaebacteria, although all the RNAs conform with the prokaryotic "rule" of chain length between PuU40 and G44. The green plant chloroplast and wheat mitochondrial 5S RNAs appear prokaryotic-like when comparing the number of positions between universal nucleotides. Nucleotide positions common to eukaryotic 5S RNAs have been mapped; in addition, nucleotide sequences, helix lengths and looped-out residues specific to phyla are proposed. Several of the common nucleotides found in the 5S RNAs of metazoan somatic tissue differ in the 5S RNAs of oocytes. These changes may indicate an important functional role of the 5S RNA during oocyte maturation.

8.
D'Onofrio G  Ghosh TC 《Gene》2005,345(1):27-33
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated by comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences in GC(3) levels; the second shows considerable increments of GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed different increments of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids was also affected during the compositional transition, showing that there is a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of the isochore organization of the human genome is discussed on the basis of these results.
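
Relative synonymous codon usage is commonly defined as the observed count of a codon divided by the mean count of all codons that encode the same amino acid. The following is a minimal sketch of that common definition, assuming the standard genetic code and an in-frame coding sequence; it is not the authors' own pipeline, and `rscu` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Map each sense codon to its RSCU value for one coding sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Count sense codons only; stop codons and ambiguous triplets are skipped.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")

    # Group synonymous codons by the amino acid they encode.
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms[aa].append(codon)

    values = {}
    for family in synonyms.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        mean = total / len(family)
        for c in family:
            values[c] = counts[c] / mean
    return values
```

A value of 1 means a codon is used exactly as often as expected under equal usage within its synonymous family; values above 1 indicate preferred codons.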

9.
Molecular evolution of the HSP70 multigene family
Eukaryotic genomes encode multiple 70-kDa heat-shock proteins (HSP70s). The Saccharomyces cerevisiae HSP70 family is comprised of eight members. Here we present the nucleotide sequence of the SSA3 and SSB2 genes, completing the nucleotide sequence data for the yeast HSP70 family. We have analyzed these yeast sequences as well as 29 HSP70s from 24 additional eukaryotic and prokaryotic species. Comparison of the sequences demonstrates the extreme conservation of HSP70s; proteins from the most distantly related species share at least 45% identity and more than one-sixth of the amino acids are identical in the aligned region (567 amino acids) among all proteins analyzed. Phylogenetic trees constructed by two independent methods indicate that ancient molecular and cellular events have given rise to at least four monophyletic groups of eukaryotic HSP70 proteins. Each group of evolutionarily similar HSP70s shares a common intracellular localization and is presumed to be comprised of functional homologues; these include heat-shock proteins of the cytoplasm, endoplasmic reticulum, mitochondria, and chloroplasts. HSP70s localized in mitochondria and plastids are most similar to the DnaK HSP70 homologues in purple bacteria and cyanobacteria, respectively, which is consistent with the proposed prokaryotic origin of these organelles. The analyses indicate that the major eukaryotic HSP70 groups arose prior to the divergence of the earliest eukaryotes, roughly 2 billion years ago. In some cases, as exemplified by the SSA genes encoding the cytoplasmic HSP70s of S. cerevisiae, more recent duplication events have given rise to subfamilies within the major groups. The S. cerevisiae SSB proteins comprise a unique subfamily not identified in other species to date. This subfamily appears to have resulted from an ancient gene duplication that occurred at approximately the same time as the origin of the major eukaryotic HSP70 groups. Correspondence to: E.A. Craig

10.
A strong correlation between GC content and recombination rate is observed in many eukaryotes, which is thought to be due to conversion events linked to the repair of meiotic double-strand breaks. In several organisms, the length of conversion tracts has been shown to decrease exponentially with increasing distance from the sites of meiotic double-strand breaks. I show here that this behavior leads to a simple analytical model for the evolution and the equilibrium state of the GC content of sequences devoid of meiotic double-strand break sites. In the yeast Saccharomyces cerevisiae, meiotic double-strand breaks are practically excluded from protein-coding sequences. A good fit was observed between the predictions of the model and the variations of the average GC content of the third codon position (GC3) of S. cerevisiae genes. Moreover, recombination parameters that can be extracted by fitting the data to the model coincide with experimentally determined values. These results thus indicate that meiotic recombination plays an important part in determining the fluctuations of GC content in yeast coding sequences. The model also accounted for the different patterns of GC variations observed in the genes of Candida species that exhibit a variety of sexual lifestyles, and hence a wide range of meiotic recombination rates. Finally, the variations of the average GC3 content of human and chicken coding sequences could also be fitted by the model. These results suggest the existence of a widespread pattern of GC variation in eukaryotic genes due to meiotic recombination, which would imply the generality of two features of meiotic recombination: its association with GC-biased gene conversion and the quasi-exclusion of meiotic double-strand breaks from coding sequences. Moreover, the model points to specific constraints on protein fragments encoded by exon terminal sequences, which are the most affected by the GC bias.

11.
A census of protein repeats.
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find that repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting that most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences that contain small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

12.
It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, the internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
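
The reasoning behind the hypothesis (AT-rich stop codons become rarer as GC content rises) can be made concrete with a small sketch. It rests on simplifying assumptions that are not from the paper: bases are drawn independently, A and T are equally frequent, and G and C are equally frequent. Both function names are hypothetical.

```python
def stop_codon_probability(gc):
    """Chance that a random codon is TAA, TAG or TGA at GC fraction `gc`."""
    p_at = (1.0 - gc) / 2.0  # frequency of A, and of T
    p_gc = gc / 2.0          # frequency of G, and of C
    taa = p_at * p_at * p_at
    tag = p_at * p_at * p_gc
    tga = p_at * p_gc * p_at
    return taa + tag + tga


def expected_random_orf_codons(gc):
    """Mean number of codons up to and including the first stop codon.

    Codons are treated as independent draws, so the position of the first
    stop follows a geometric distribution. Undefined for gc == 1.0, where
    no stop codon can occur.
    """
    return 1.0 / stop_codon_probability(gc)
```

At 50% GC every codon has probability 1/64, so a stop occurs with probability 3/64 and a random reading frame runs for 64/3 codons on average. Raising the GC fraction lowers the stop probability and lengthens the expected open stretch, which is the direction the hypothesis predicts.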

13.
C‐tail‐anchored (TA) proteins constitute a heterogeneous group of membrane proteins that are inserted into membranes by unique post‐translational mechanisms and that play key roles within cells. During recent years, bioinformatic screens on eukaryotic genomes have helped to obtain comprehensive pictures of the number, intracellular distribution and functions of TA proteins, but similar screens had not yet been carried out on prokaryotic cells. Here, we report the results of a bioinformatic screen of the genomes of two bacteria and one archaeon. We find that all three of these prokaryotes contain TA proteins in proportions approaching those found in eukaryotic cells, indicating that this protein group is present in all three domains of life. Although some of our hits correspond to proteins of unknown function, others are enzymes with hydrophobic substrates or have functions carried out at the inner face of the cytoplasmic membrane. To generate hypotheses on the insertion mechanisms of prokaryotic TA proteins, we compared the sequences of the prokaryotic and eukaryotic versions of Asna1/Trc40/GET3, a cytosolic ATPase that plays a key role in TA protein post‐translational delivery to membranes in eukaryotic cells. We found that hydrophobic residues involved in TA binding by the eukaryotic chaperone (Mateja et al., Nature 2009;461:361–366) are generally replaced with equally hydrophobic amino acids in the archaeal homologue (ArsA), whereas this is not the case for the bacterial protein. Thus, eukaryotes may have inherited the GET3 targeting pathway from our archaeal ancestor, while the bacterial homologue may be exclusively dedicated to heavy metal resistance.

14.
Okayasu T  Sorimachi K 《Amino acids》2009,36(2):261-271
We recently classified 23 bacteria into two types based on their complete genomes; “S-type” as represented by Staphylococcus aureus and “E-type” as represented by Escherichia coli. Classification was characterized by concentrations of Arg, Ala or Lys in the amino acid composition calculated from the complete genome. Based on these previous classifications, not only prokaryotic but also eukaryotic genome structures were investigated by amino acid compositions and nucleotide contents. Organisms consisting of 112 bacteria, 15 archaea and 18 eukaryotes were classified into two major groups by cluster analysis using GC contents at the three codon positions calculated from complete genomes. The 145 organisms were classified into “AT-type” and “GC-type” represented by high A or T (low G or C) and high G or C (low A or T) contents, respectively, at every third codon position. Reciprocal changes between G or C and A or T contents at the third codon position occurred almost synchronously in every codon among the organisms. Correlations between amino acid concentrations (Ala, Ile and Lys) and the nucleotide contents at the codon position were obtained in both “AT-type” and “GC-type” organisms, but with different regression coefficients. In certain correlations of amino acid concentrations with GC contents, eukaryotes, archaea and bacteria showed different behaviors; thus these kingdoms evolved differently. All organisms are basically classifiable into two groups having characteristic codon patterns; organisms with low GC and high AT contents at the third codon position and their derivatives, and organisms with an inverse relationship.

15.
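3-tuple distribution characteristics of coding and non-coding sequences
Fu Qiang  Qian Minping  Chen Liangbiao  Zhu Yuxian 《Acta Genetica Sinica》2005,32(10):1018-1026
The origin of non-coding sequences, especially introns, is an important unresolved question. We first computed the frequency distributions of 3-tuples in the different reading frames of coding and non-coding sequences from model organisms. In coding regions the different reading frames have very different 3-tuple distributions, whereas in non-coding regions the 3-tuple distributions of the different reading frames are nearly identical, and this property is not species-dependent. To describe the degree of difference between the distributions, we introduced a variable, the symmetric relative entropy (SRE). Comparing prokaryotes and eukaryotes, we found that prokaryotes have higher SRE values than eukaryotes in both coding and non-coding regions. Further study showed that the SRE value of an organism is correlated with the percentage of its whole genome occupied by coding regions (correlation coefficient 0.86). Computer-simulated evolution experiments showed that 2% mutation is sufficient to turn the high SRE value typical of prokaryotic coding regions into the low SRE value characteristic of eukaryotic intron regions. Aligning the annotated intron and coding sequences in the databases confirmed that some intron sequences are indeed highly homologous to coding regions. These experiments indicate that at least some eukaryotic introns may have originated from coding sequences, and also suggest that SRE may be useful for studying the evolution of genome sequences across species.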
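
The abstract does not give the exact formula for the symmetric relative entropy (SRE). The sketch below assumes the usual symmetrized Kullback-Leibler divergence between the 3-tuple distributions of two reading frames, with a pseudocount to avoid zero probabilities. The pseudocount, the base-2 logarithm and the function names are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import log2

# All 64 possible 3-tuples over the DNA alphabet.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def frame_distribution(seq, frame, pseudocount=1.0):
    """3-tuple frequencies read in steps of three, starting at offset `frame`."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    total = sum(counts[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
    return [(counts[t] + pseudocount) / total for t in TRIPLETS]


def symmetric_relative_entropy(p, q):
    """Assumed SRE: KL(p || q) + KL(q || p), in bits."""
    return sum(a * log2(a / b) + b * log2(b / a) for a, b in zip(p, q))
```

Under this definition two identical frame distributions give an SRE of zero, which matches the description of non-coding regions, while frames with very different 3-tuple usage give a large value, as reported for coding regions.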

16.
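Correlation analysis between the (G+C)% of coding sequences and protein thermostability
Zhu Wei  Zheng Zuohua 《Acta Genetica Sinica》1999,26(4):418-427
Using computational statistical methods, we analyzed the nucleic acid and amino acid sequences of several protein families, chiefly xylose isomerase. The (G+C)% at each codon position is positively and linearly correlated with the (G+C)% of the coding sequence, and the content of most amino acids is also correlated with the (G+C)% of the coding sequence. According to this correlation, the amino acids were divided into three classes: positively correlated, negatively correlated and uncorrelated. Statistics on xylose isomerase amino acid sequences and enzyme thermostability showed that statistically significant amino acid substitutions that may increase protein thermostability are often accompanied by an increase in the GC content of the coding sequence. This ...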

17.

Background

The sizes of proteins are relevant to their biochemical structure and for their biological function. The statistical distribution of protein lengths across a diverse set of taxa can provide hints about the evolution of proteomes.

Results

Using the full genomic sequences of over 1,302 prokaryotic and 140 eukaryotic species, two datasets containing 1.2 and 6.1 million proteins were generated and analyzed statistically. The length distribution of proteins can be roughly described with a gamma-type or log-normal model, depending on the species. However, the shape parameter of the gamma model does not have a fixed value of 2, as previously suggested, but varies between 1.5 and 3 in different species. A gamma model with unrestricted shape parameter best described the distributions in ~48% of the species, whereas the log-normal distribution better described the observed protein sizes in 42% of the species. The restricted gamma function and the sum-of-exponentials distribution gave a better fit in only ~5% of the species. Eukaryotic proteins have an average size of 472 aa, whereas bacterial (320 aa) and archaeal (283 aa) proteins are significantly smaller (33-40% on average). Average protein sizes in different phylogenetic groups were: Alveolata (628 aa), Amoebozoa (533 aa), Fornicata (543 aa), Placozoa (453 aa), Eumetazoa (486 aa), Fungi (487 aa), Stramenopila (486 aa), Viridiplantae (392 aa). Amino acid composition is biased according to protein size. Protein length correlated negatively with %C, %M, %K, %F, %R, %W, %Y and positively with %D, %E, %Q, %S and %T. Prokaryotic proteins had a different protein size bias for %E, %G, %K and %M as compared to eukaryotes.

Conclusions

Mathematical modeling of empirical protein length distributions can be used to assess the quality of small ORF annotation in genomic releases (detection of too many false-positive small ORFs). There is a negative correlation between average protein size and total number of proteins among eukaryotes but not in prokaryotes. The %GC content is positively correlated with total protein number and protein size in prokaryotes but not in eukaryotes. Small proteins have a different amino acid bias than larger proteins. Compared to prokaryotic species, the evolution of eukaryotic proteomes was characterized by increased protein number (massive gene duplication) and substantial changes of protein size (domain addition/subtraction).
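
The gamma and log-normal models compared in this abstract can be parameterized from a list of protein lengths as in the sketch below. It uses simple moment-based estimates rather than the fitting procedure of the study, which the abstract does not describe, and the function names are hypothetical.

```python
from math import log
from statistics import mean, pstdev, pvariance


def gamma_moment_estimates(lengths):
    """Method-of-moments gamma parameters: returns (shape, scale)."""
    m = mean(lengths)
    v = pvariance(lengths)
    shape = m * m / v  # mean = shape * scale, variance = shape * scale**2
    scale = v / m
    return shape, scale


def lognormal_estimates(lengths):
    """Log-normal parameters: mean and standard deviation of log(length)."""
    logs = [log(x) for x in lengths]
    return mean(logs), pstdev(logs)
```

For a proteome, a gamma shape estimate inside the 1.5 to 3 range reported above would be consistent with the abstract, though a moment estimate is only a rough check and is sensitive to a few very long proteins.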

18.
Summary: Four complete and three partial sequences of E. coli L7/L12-type ribosomal A proteins obtained from four eukaryotes (Saccharomyces cerevisiae, Artemia salina, rat liver, and wheat germ), two metabacteria (Halobacterium cutirubrum and Methanobacterium thermoautotrophicum), and the prokaryote Escherichia coli have been compared using a computer program that searches for homologous tertiary structures. Comparison matrices show that eukaryotic sequences sequentially match each other if deletions and/or insertions of certain residues (gaps) are assumed at specific sites corresponding to residues 36, 51, 72, and 94 of S. cerevisiae protein YL44c. This is similar to what was previously found in prokaryotes. Metabacteria, which exhibit eukaryote-type sequences, must have separated from the eukaryotes in ancient times, because an additional deletion site is found in their sequences and their sequences have low correlation coefficients with those of all the other eukaryotes. When the eukaryote-type A proteins (110–111 residues) are compared with E. coli L7/L12 (120 residues), four groups of well-matching segments are found. It was deduced that the eukaryote-type A proteins had regenerated from the prokaryote types by a transposition and several deletions, resulting in the eukaryote-type lengths. The correspondence between the eukaryotic and prokaryotic proteins, as well as that among eukaryotic proteins themselves, is discussed in terms of protein evolution. In addition, ribosomal protein YL35 from S. cerevisiae has been compared with RL37 from rat liver, with results indicating five well-matching parts separated by four gaps, one of which consists of 20 residues. These results contrast with those previously reported by Lin et al. No prokaryotic counterparts to these ribosomal proteins have yet been identified.

19.

Background

A better understanding of the size and abundance of open reading frames (ORFs) in whole genomes may shed light on the factors that control genome complexity. Here we examine the statistical distributions of open reading frames (i.e. the distribution of start and stop codons) in the fully sequenced genomes of 297 prokaryotes and 14 eukaryotes.

Methodology/Principal Findings

By fitting mixture models to data from whole genome sequences, we show that the size-frequency distributions for ORFs are strikingly similar across prokaryotic and eukaryotic genomes. Moreover, we show that (i) a large fraction (60–80%) of ORF size-frequency distributions can be predicted a priori with a stochastic assembly model based on GC content, and that (ii) size-frequency distributions of the remaining “non-random” ORFs are well fitted by log-normal or gamma distributions, and are similar to the size distributions of annotated proteins.

Conclusions/Significance

Our findings suggest stochastic processes have played a primary role in the evolution of genome complexity, and that common processes govern the conservation and loss of functional genomics units in both prokaryotes and eukaryotes.

20.
The distributions of the junction sequences of homooligomer tracts of various lengths have been examined in prokaryotic DNA sequences and compared with those of eukaryotes. The general trends in the nearest and next to nearest neighbors to the tracts are similar for both groups. In both prokaryotes and eukaryotes A/T runs are preferentially flanked on either the 5' or the 3' ends by A and/or T. G/C runs are preferentially flanked by G and/or C. There is discrimination against A/T runs flanked by G or C and G/C runs flanked by A or T. However, whereas the distribution of prokaryotic homooligomer tract junction sequences was quite homogeneous, large variations were observed in the 5-fold larger eukaryotic database, increasing in magnitude from tracts of length 2 to 3 to 4 base pairs long. Possible DNA conformational implications and in particular DNA curvature and packaging aspects of prokaryotes and eukaryotes are discussed.
