Similar literature
 20 similar documents found (search time: 15 ms)
1.
BACKGROUND: The composition and sequence of amino acids in a protein may serve the underlying needs of the nucleic acids that encode the protein (the genome phenotype). In extreme form, amino acids become mere placeholders inserted between functional segments or domains, and--apart from increasing protein length--playing no role in the specific function or structure of a protein (the conventional phenotype). METHODS: We studied the genomes of two malarial parasites and 521 prokaryotes (144 complete) that differ widely in GC% and optimum growth temperature, comparing the base compositions of the protein coding regions and corresponding lengths (kilobases). RESULTS: Malarial parasites show distinctive responses to base-compositional pressures that increase as protein lengths increase. A low-GC% species (Plasmodium falciparum) is likely to have more placeholder amino acids than an intermediate-GC% species (P. vivax), so that homologous proteins are longer. In prokaryotes, GC% is generally greater and AG% is generally less in open reading frames (ORFs) encoding long proteins. The increased GC% in long ORFs increases as species' GC% increases, and decreases as species' AG% increases. In low- and intermediate-GC% prokaryotic species, increases in ORF GC% as encoded proteins increase in length are largely accounted for by the base compositions of first and second (amino acid-determining) codon positions. In high-GC% prokaryotic species, first and third (non-amino acid-determining) codon positions play this role. CONCLUSION: In low- and intermediate-GC% prokaryotes, placeholder amino acids are likely to be well defined, corresponding to codons enriched in G and/or C at first and second positions. In high-GC% prokaryotes, placeholder amino acids are likely to be less well defined. 
Increases in ORF GC% as encoded proteins increase in length are greater in mesophiles than in thermophiles, which are constrained from increasing protein lengths in response to base-composition pressures.

2.
Summary: We have investigated the compositional properties of coding sequences from cold-blooded vertebrates and we have compared them with those from warm-blooded vertebrates. Moreover, we have studied the compositional correlations of coding sequences with the genomes in which they are contained, as well as the compositional correlations among the codon positions of the genes analyzed. The distribution of GC levels of the third codon positions of genes from cold-blooded vertebrates is distinctly different from that of warm-blooded vertebrates in that it does not reach the high values attained by the latter. Moreover, coding sequences from cold-blooded vertebrates are either equal, or, in most cases, lower in GC (not only in third, but also in first and second codon positions) than homologous coding sequences from warm-blooded vertebrates; higher values are exceptional. These results at the gene level are in agreement with the compositional differences between cold-blooded and warm-blooded vertebrates previously found at the whole genome (DNA) level (Bernardi and Bernardi 1990a,b). Two linear correlations were found: one between the GC levels of coding sequences (or of their third codon positions) and the GC levels of the genomes of cold-blooded vertebrates containing them; and another between the GC levels of third and first + second codon positions of genes from cold-blooded vertebrates. The first correlation applies to the genomes (or genome compartments) of all vertebrates and the second to the genes of all living organisms. These correlations are tantamount to a genomic code.

3.
The translation of viral mRNAs by host ribosomes is essential for infection. Hence, codon usage of virus genes may influence efficiency of infection. In addition, composition of nucleotides in the third position within codons of genes can reflect evolutionary relationships. In this study, third position codon composition was examined for the seven genes of eight Cauliflower mosaic virus isolates. Genes IV-VII had similar codon composition values and were termed Class 1 genes. Genes I-III possessed corresponding codon composition values and were termed Class 2 genes. The codon composition values of Class 1 and Class 2 genes differed significantly. Neither Class 1 nor Class 2 genes had codon composition values identical to that of the host plant, Arabidopsis thaliana. However, Class 1 genes possessed codon composition values closer to those of the host than Class 2 genes. Examination of the genomes of three Rous sarcoma virus isolates indicated that codon composition values were similar for the gag, pol, and env genes but these genes differed significantly from the src genes. Since codon composition values for Rous sarcoma virus distinguished a "foreign" gene from the rest of the viral genome, it is possible that the Cauliflower mosaic virus genome is composed of genes from two different sources. Others have suggested that Cauliflower mosaic virus evolved in this manner and our data provide support for this hypothesis.

4.
G D'Onofrio, G Bernardi. Gene, 1992, 110(1): 81-88
We have investigated the compositional distributions of third codon positions of genes from the 16 prokaryotes and seven eukaryotes for which the largest numbers of coding sequences are available in data banks. In prokaryotes, both narrow and broad distributions were found. In eukaryotes, distributions were very broad (except for Saccharomyces cerevisiae) and remarkably different for different genomes. In low-GC genomes, third codon positions were lower in GC than first + second codon positions and trailed towards high GC; the opposite situation was found for high-GC genomes. In all genomes, first codon positions were higher in GC than second codon positions. We then investigated the compositional correlations between third and first + second codon positions in prokaryotic genomes (the 16 mentioned above plus 87 additional ones) and in genome compartments of eukaryotes. A general, common relationship was found, which also holds within the same (heterogeneous) genomes. This universal correlation is due to the fact that the relative effects of compositional constraints on different codon positions are the same, on the average, whatever the genome under consideration.
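
The quantities compared in this abstract (GC levels at first, second, and third codon positions) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `gc_by_codon_position` is hypothetical, the input is assumed to be an in-frame coding sequence, and the first + second value is assumed here to be the simple mean of the two positions.

```python
def gc_by_codon_position(cds: str) -> tuple:
    """Return the GC fraction at codon positions 1, 2 and 3 of an in-frame CDS.

    Assumption: the sequence starts at the first base of a codon; a trailing
    partial codon, if any, is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop an incomplete final codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this codon position
        gc_count = sum(base in "GC" for base in bases)
        fractions.append(gc_count / len(bases))
    return tuple(fractions)


# Toy sequence with codons ATG GCC GCG TAA
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCGCGTAA")
gc12 = (gc1 + gc2) / 2  # assumed definition of the "first + second" GC level
```

Plotting `gc3` against `gc12` for many genes of one genome, or for many genomes, would give the kind of correlation the abstract describes.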

5.
Adenine nucleotides have been found to appear preferentially in the regions after the initiation codons or before the termination codons of bacterial genes. Our previous experiments showed that AAA and AAT, the two most frequent second codons in Escherichia coli, significantly enhance translation efficiency. To determine whether such a characteristic feature of base frequencies exists in eukaryote genes, we performed a comparative analysis of the base biases at the gene terminal portions using the proteomes of seven eukaryotes. Here we show that the base appearance at the codon third positions of gene terminal regions is highly biased in eukaryote genomes, although the codon third positions are almost free from amino acid preference. The bias changes depending on its position in a gene, and is characteristic of each species. We also found that bias is most pronounced at the second codon, the codon after the initiation codon. NCN is preferred in every genome; in particular, GCG is strongly favored in human and plant genes. The presence of the bias implies that the base sequences at the second codon affect translation efficiency in eukaryotes as well as bacteria.

6.
Xu X, Wu X, Yu Z. Génome, 2010, 53(12): 1041-1052
Extraordinary variation has been found in mitochondrial (mt) genome inheritance, gene content and arrangement among bivalves. However, only a few bivalve mt genomes have been comparatively analyzed to infer their evolutionary scenarios. In this study, the complete mt genome of the venerid Paphia euglypta (Bivalvia: Veneridae) was first studied and then compared with those of other venerids (e.g., Venerupis philippinarum and Meretrix petechialis) to better understand the mt genome evolution within a family. Though several common features such as the AT content, codon usage of protein-coding genes, and AT/GC skew are shared by the three venerids, a high level of variability is observed in genome size, gene content, gene order, arrangements and primary sequence of nucleotides or amino acids. Most of the gene rearrangements can be explained by the "tandem duplication and random loss" model. From the observed rearrangement patterns, we speculate that block interchange between adjacent genes may be common in the evolution of mt genomes in venerids. Furthermore, this study presents several new findings in mt genome annotation of V. philippinarum and M. petechialis, and hence we have reannotated the genome of these two species as: (1) the ORF of the formerly annotated cox2 gene in V. philippinarum is deduced by using a truncated "T" codon and a second cox2 gene is identified; (2) the trnS-AGN gene is identified and marked in the mt genome of both venerids. Thus, this study demonstrated a high variability of mt genomes in the Veneridae, and showed the importance of comparative mt genome analysis to interpret the evolution of the bivalve mt genome.

7.
Summary: Ubiquitin is ubiquitous in all eukaryotes and its amino acid sequence shows extreme conservation. Ubiquitin genes comprise direct repeats of the ubiquitin coding unit with no spacers. The nucleotide sequences coding for 13 ubiquitin genes from 11 species reported so far have been compiled and analyzed. The G+C content of codon third base reveals a positive linear correlation with the genome G+C content of the corresponding species. The slope strongly suggests that the overall G+C content of codons of polyubiquitin genes clearly reflects the genome G+C content by AT/GC substitutions at the codon third position. The G+C content of ubiquitin codon third base also shows a positive linear correlation with the overall G+C content of coding regions of compiled genes, indicating the codon choices among synonymous codons reflect the average codon usage pattern of corresponding species. On the other hand, the monoubiquitin gene, which is different from the polyubiquitin gene in gene organization, gene expression, and function of the encoding protein, shows a different codon usage pattern compared with that of the polyubiquitin gene. From comparisons of the levels of synonymous substitutions among ubiquitin repeats and the homology of the amino acid sequence of the tail of monomeric ubiquitin genes, we propose that the molecular evolution of ubiquitin genes occurred as follows: Plural primitive ubiquitin sequences were dispersed across the genome in ancestral eukaryotes. Some of them situated in a particular environment fused with the tail sequence to produce monomeric ubiquitin genes that were maintained across species. After divergence of species, polyubiquitin genes were formed by duplication of the other primitive ubiquitin sequences on different chromosomes. Differences in the environments in which ubiquitin genes are embedded reflect the differences in codon choice and in gene expression pattern between poly- and monomeric ubiquitin genes.

8.
Small segments of rice genome sequence have been compared with that of the model plant Arabidopsis thaliana and with several closer relatives, including the cereals maize, rice, sorghum, barley and wheat. The rice genome is relatively stable compared with those of other grasses. Nevertheless, comparisons with other cereals have demonstrated that the DNA between cereal genes is highly variable and evolves rapidly. Genic regions have undergone many more small rearrangements than have been revealed by recombinational mapping studies. Tandem gene duplication/deletion is particularly common, but other types of deletions, inversions and translocations also occur. The many thousands of small genic rearrangements within the rice genome complicate but do not negate its use as a model for larger cereal genomes.

9.
G+C3 structuring along the genome: a common feature in prokaryotes   (Cited by: 1; self-citations: 0; citations by others: 1)
The heterogeneity of gene nucleotide content in prokaryotic genomes is commonly interpreted as the result of three main phenomena: (1) genes undergo different selection pressures both during and after translation (affecting codon and amino acid choice); (2) genes undergo different mutational pressure whether they are on the leading or lagging strand; and (3) genes may have different phylogenetic origins as a result of lateral transfers. However, this view neglects the necessity of organizing genetic information on a chromosome that needs to be replicated and folded, which may add constraints to single gene evolution. As a consequence, genes are potentially subjected to different mutation and selection pressures, depending on their position in the genome. In this paper, we analyze the structuring of different codon usage measures along completely sequenced bacterial genomes. We show that most of them are highly structured, suggesting that genes have different base content, depending on their location on the chromosome. A peculiar pattern of genome structure, with a tendency toward an A+T-enrichment near the replication terminus, is found in most bacterial phyla and may reflect common chromosome constraints. Several species may have lost this pattern, probably because of genome rearrangements or integration of foreign DNA. We show that in several species, this enrichment is associated with an increase of evolutionary rate and we discuss the evolutionary implications of these results. We argue that structural constraints acting on the circular chromosome are not negligible and that this natural structuring of bacterial genomes may be a cause of overestimation in lateral gene transfer predictions using codon composition indices.

10.
An ab initio model for gene prediction in prokaryotic genomes is proposed based on physicochemical characteristics of codons calculated from molecular dynamics (MD) simulations. The model requires a specification of three calculated quantities for each codon: the double-helical trinucleotide base pairing energy, the base pair stacking energy, and an index of the propensity of a codon for protein-nucleic acid interactions. The base pairing and stacking energies for each codon are obtained from recently reported MD simulations on all unique tetranucleotide steps, and the third parameter is assigned based on the conjugate rule previously proposed to account for the wobble hypothesis with respect to degeneracies in the genetic code. The third interaction propensity parameter values correlate well with ab initio MD calculated solvation energies and flexibility of codon sequences as well as codon usage in genes and amino acid composition frequencies in ∼175,000 protein sequences in the Swissprot database. Assignment of these three parameters for each codon enables the calculation of the magnitude and orientation of a cumulative three-dimensional vector for a DNA sequence of any length in each of the six genomic reading frames. Analysis of 372 genomes comprising ∼350,000 genes shows that the orientations of the gene and nongene vectors are well differentiated and make a clear distinction feasible between genic and nongenic sequences at a level equivalent to or better than currently available knowledge-based models trained on the basis of empirical data, presenting a strong support for the possibility of a unique and useful physicochemical characterization of DNA sequences from codons to genomes.

11.
It has been reported earlier that the relative di-nucleotide frequency (RDF) in different parts of a genome is similar while the frequency is variable among different genomes. So RDF is termed as genome signature in bacteria. It is not known if the constancy in RDF is governed by genome wide mutational bias or by selection. Here we did comparative analysis of RDF between the inter-genic and the coding sequences in seventeen bacterial genomes, whose gene expression data was available. The constraint on di-nucleotides was found to be higher in the coding sequences than that in the inter-genic regions and the constraint at the 2nd codon position was more than that in the 3rd position within a genome. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than that in the whole genomes as well as in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated by keeping the codon usage bias (CUB) according to genome G+C composition and the sequence of amino acids unaltered. In the simulated coding sequences, the constraint observed was significantly low and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to the stronger selection on CUB in these genes in comparison to the LEG within a genome. Further, we did comparative analyses of the RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB on di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. 
The analysis revealed that selection on CUB is an important attribute for the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
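
The abstract does not define how the relative di-nucleotide frequency is computed. As a minimal sketch under a stated assumption, the commonly used odds ratio f(XY) / (f(X) f(Y)) over overlapping di-nucleotides on a single strand can be written as follows; the function name is hypothetical and the input is assumed to contain only A, C, G and T.

```python
from collections import Counter


def relative_dinucleotide_frequency(seq: str) -> dict:
    """Odds ratio f(XY) / (f(X) * f(Y)) for every di-nucleotide observed in seq.

    Assumptions (not taken from the abstract): overlapping di-nucleotides,
    one strand only, and an input alphabet restricted to A, C, G, T.
    """
    seq = seq.upper()
    mono_counts = Counter(seq)
    di_counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    n_mono = len(seq)
    n_di = len(seq) - 1
    rdf = {}
    for pair, count in di_counts.items():
        f_x = mono_counts[pair[0]] / n_mono
        f_y = mono_counts[pair[1]] / n_mono
        rdf[pair] = (count / n_di) / (f_x * f_y)
    return rdf


rdf = relative_dinucleotide_frequency("ACAC")
```

Values near 1 indicate a di-nucleotide that occurs about as often as its base composition predicts; restricting the input to bases spanning particular codon positions would be one way to approximate a position-specific analysis.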

12.
We sequenced most of the mitochondrial (mt) genomes of 2 apocritan taxa: Vanhornia eucnemidarum and Primeuchroeus spp. These mt genomes have similar nucleotide composition and codon usage to those of mt genomes reported for other Hymenoptera, with a total A + T content of 80.1% and 78.2%, respectively. Gene content corresponds to that of other metazoan mt genomes, but gene organization is not conserved. There are a total of 6 tRNA genes rearranged in V. eucnemidarum and 9 in Primeuchroeus spp. Additionally, several noncoding regions were found in the mt genome of V. eucnemidarum, as well as evidence of a sustained gene duplication involving 3 tRNA genes. We also report an inversion of the large and small ribosomal RNA genes in Primeuchroeus spp. mt genome. However, none of the rearrangements reported are phylogenetically informative with respect to the current taxon sample.

13.
Torgerson DG, Singh RS. Genetics, 2004, 168(3): 1421-1432
Gene duplication is an important mechanism for acquiring new genes and creating genetic novelty in organisms. Evidence suggests that duplicated genes are retained at a much higher rate than originally thought and that functional divergence of gene copies is a major factor promoting their retention in the genome. We find that two Drosophila testes-specific alpha4 proteasome subunit genes (alpha4-t1 and alpha4-t2) have a higher polymorphism within species and are significantly more diverged between species than the somatic alpha4 gene. Our data suggest that following gene duplication, the alpha4-t1 gene experienced relaxed selective constraints, whereas the alpha4-t2 gene experienced positive selection acting on several codons. We report significant heterogeneity in evolutionary rates among all three paralogs at homologous codons, indicating that functional divergence has coincided with genic divergence. Reproductive subfunctionalization may allow for a more rapid evolution of reproductive traits and a greater specialization of testes function. Our data add to the increasing evidence that duplicated genes experience lower selective constraints and in some cases positive selection following duplication. Newly duplicated genes that are freer from selective constraints may provide a mechanism for developing new interactions and a pathway for the evolution of new genes.

14.
Codon catalog usage and the genome hypothesis.   (Cited by: 34; self-citations: 31; citations by others: 34)
Frequencies for each of the 61 amino acid codons have been determined in every published mRNA sequence of 50 or more codons. The frequencies are shown for each kind of genome and for each individual gene. A surprising consistency of choices exists among genes of the same or similar genomes. Thus each genome, or kind of genome, appears to possess a "system" for choosing between codons. Frameshift genes, however, have widely different choice strategies from normal genes. Our work indicates that the main factors distinguishing between mRNA sequences relate to choices among degenerate bases. These systematic third base choices can therefore be used to establish a new kind of genetic distance, which reflects differences in coding strategy. The choice patterns we find seem compatible with the idea that the genome and not the individual gene is the unit of selection. Each gene in a genome tends to conform to its species' usage of the codon catalog; this is our genome hypothesis.
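
Tabulating frequencies for the 61 amino acid codons can be sketched as below. This is an illustration only, not the original authors' procedure: it assumes the standard genetic code (stop codons TAA, TAG, TGA), an in-frame sequence, and a hypothetical function name `codon_frequencies`.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, so 64 - 3 stop codons = 61 sense codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(triplet)
    for triplet in product("ACGT", repeat=3)
    if "".join(triplet) not in STOP_CODONS
]


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each of the 61 sense codons in an in-frame sequence.

    RNA input is accepted (U is converted to T); stop codons, codons with
    ambiguous bases, and a trailing partial codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    sense = set(SENSE_CODONS)
    counts = Counter(codon for codon in codons if codon in sense)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sense codons found")
    return {codon: counts[codon] / total for codon in SENSE_CODONS}


freqs = codon_frequencies("AUGGCCGCCUAA")  # codons AUG GCC GCC UAA
```

Comparing such frequency vectors between genes or genomes, for example with a distance over the 61 values, is one way to picture the "genetic distance" based on coding strategy that the abstract mentions.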

15.
S Zoubak, A Rynditch, G Bernardi. Gene, 1992, 119(2): 207-213
The compositional distributions of genomes, genes (and their third codon positions) and long terminal repeats from retroviruses of warm-blooded vertebrates are characterized by a striking bimodality which is accompanied by a remarkable compositional homogeneity within each retroviral genome. A first, major class of retroviral genomes is GC-rich, whereas a second, minor class is GC-poor. Representative expressed viral genomes from the two classes integrate in GC-rich and GC-poor isochores, respectively, of host genomes. The first class comprises all oncoviruses (except B-types and some D-types), the second, lentiviruses, spumaviruses, as well as B-type and some D-type oncoviruses (e.g., mouse mammary tumor virus and simian retroviruses type D, respectively). The compositional bimodal distribution of retroviral genomes and the accompanying compositional homogeneity within each retroviral genome appear to be the result of the compositional evolution of retroviral genomes in their integrated form.

16.
An increasing number of cases where tri-nucleotide stop codons do not signal the termination of protein synthesis are being reported. In order to identify what constitutes an efficient stop signal, we analysed the region around natural stop codons in genes from a wide variety of eukaryotic species and gene families. Certain stop codons and nucleotides following stop codons are over-represented, and this pattern is accentuated in highly expressed genes. For example, the preferred signal for Saccharomyces cerevisiae and Drosophila melanogaster highly expressed genes is UAAG, and generally the signals UAA(A/G) and UGA(A/G) are preferred in eukaryotes. The GC% of the organism or DNA region can affect whether there is A or G in the second or fourth positions. We suggest therefore, that the stop codon and the nucleotide following it comprise a tetra-nucleotide stop signal. A model is proposed in which the polypeptide chain release factor, a protein, recognises this sequence, but will tolerate some substitution, particularly A to G in the second or third positions.
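
The tetra-nucleotide stop signal described here (stop codon plus the following nucleotide) can be tallied with a short sketch. The record layout is an assumption made for illustration: each record is a pair of a transcript sequence and `cds_end`, the index just past the last base of the stop codon. The function name is hypothetical.

```python
from collections import Counter

STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code, RNA alphabet


def tally_stop_signals(records) -> Counter:
    """Count stop codon + next-nucleotide signals across transcripts.

    records: iterable of (sequence, cds_end) pairs, where cds_end is the
    0-based index just past the stop codon (a hypothetical layout).
    Records without a downstream base or without a recognised stop codon
    at the stated position are skipped.
    """
    tally = Counter()
    for sequence, cds_end in records:
        rna = sequence.upper().replace("T", "U")
        if cds_end < 3:
            continue  # no room for a stop codon before cds_end
        signal = rna[cds_end - 3:cds_end + 1]
        if len(signal) == 4 and signal[:3] in STOP_CODONS:
            tally[signal] += 1
    return tally


signals = tally_stop_signals([
    ("ATGAAATAAG", 9),    # stop UAA followed by G
    ("ATGCCCTGAACC", 9),  # stop UGA followed by A
    ("ATGTAA", 6),        # no downstream base, skipped
])
```

Comparing such tallies between highly expressed genes and all genes would be one way to look for the over-representation the abstract reports.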

17.
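Advances in gene duplication research   (Cited by: 2; self-citations: 0; citations by others: 2)
Li Hongjian (李鸿健), Tan Jun (谭军). 生命科学, 2006, 18(2): 150-154
Gene duplication refers to the copying of a DNA segment within a genome to produce one or more additional copies; the segment may be a short stretch of genomic sequence, an entire chromosome, or even the whole genome. Gene duplication is one of the main driving forces of genome evolution and one of the main sources of genes with new functions and of new species. This article reviews recent advances in research on gene duplication during the evolution of vertebrates, model plants, and yeast, and discusses future directions for gene duplication research.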

18.
Covarion structure in plastid genome evolution: a new statistical test   (Cited by: 4; self-citations: 0; citations by others: 4)
Covarion models of molecular evolution allow the rate of evolution of a site to vary through time. There are few simple and effective tests for covarion evolution, and consequently, little is known about the presence of covarion processes in molecular evolution. We describe two new tests for covarion evolution and demonstrate with simulations that they perform well under a wide range of conditions. A survey of covarion evolution in sequenced plastid genomes found evidence of covarion drift in at least 26 out of 57 genes. Covarion evolution is most evident in first and second codon positions of the plastid genes, and there is no evidence of covarion evolution in third codon positions. Therefore, the significant covarion tests are likely due to changes in the selective constraints of amino acids. The frequency of covarion evolution within the plastid genome suggests that covarion processes of evolution were important in generating the observed patterns of sequence variation among plastid genomes.

19.
SK Behura, DW Severson. PLoS ONE, 2012, 7(8): e43111

Background

Codon bias is the non-uniform usage of codons, whereas codon context generally refers to a sequential pair of codons in a gene. Although genome sequencing of multiple species of dipteran and hymenopteran insects has been completed, only a few of these species have been analyzed for codon usage bias.

Methods and Principal Findings

Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns in a genome-wide manner among 15 dipteran and 7 hymenopteran insect species. Results show that GAA is the most frequent codon in the dipteran species whereas GAG is the most frequent codon in the hymenopteran species. Data reveals that codons ending with C or G are frequently used in the dipteran genomes whereas codons ending with A or T are frequently used in the hymenopteran genomes. Synonymous codon usage orders (SCUO) vary within genomes in a pattern that seems to be distinct for each species. Based on comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias whereas the honey bee (Apis mellifera) shows the highest bias. Analysis of codon context patterns of these insects shows that specific codons are frequently used as the 3′- and 5′-context of start and stop codons, respectively.

Conclusions

Codon bias pattern is distinct between dipteran and hymenopteran insects. While codon bias is favored by high GC content of dipteran genomes, high AT content of genes favors biased usage of synonymous codons in the hymenopteran insects. Also, codon context patterns vary among these species largely according to their phylogeny.

20.
DNA replication in vertebrate mitochondria is usually directional, leaving different portions of the genome single-stranded for different periods of time. During this time, mutations resulting from deaminations of cytosines to thymines and adenines to guanines accumulate on the heavy strand. Therefore, T/C and G/A ratios increase along mitochondrial genomes, proportionally to the time spent single-stranded during replication. Such trends exist at third codon positions for base ratios averaged across genes in individual genomes as well as for gene-specific and site-specific substitution frequencies estimated using phylogenetic methods. We use multiple regressions to test for the potential functioning of all 12 tRNA clusters in 19 primate mitochondrial genomes as alternative origins of light strand replication (OL). We provide a general algorithm for calculating time spent single stranded by a given site for any possible locations of the site and OL. For codon positions 1, 2, and 3, respectively, 23%, 9% and 35% of tRNA gene clusters have significant (p < 0.05) deamination gradients originating from them. The strength of the deamination gradient originating from tRNA gene clusters varies among species, and for five clusters, correlates with the tendency of tRNA genes in each of these clusters to form secondary structures that resemble the OL's structure. This is notably true for all codon positions for tRNA-Lys, which in absence of nuclear regulation, forms secondary structures resembling the hairpin structure of OL. For two tRNA gene clusters, correlations were statistically significant, but opposite to the direction expected by the known unidirectional replication, putatively compatible with bi-directional replication. Few substitutions in tRNA sequences can be neutral at the level of cloverleaf structure and function, yet significantly alter capacities to form OL-like structures, causing sudden evolution of genome-wide nucleotide contents.
