20 similar articles found (search time: 31 ms)
Herein, we rigorously develop novel 3-dimensional algebraic models called Genetic Hotels of the Standard Genetic Code (SGC). We start by considering the primeval RNA genetic code, which consists of the 16 codons of type RNY (purine-any base-pyrimidine). Using simple algebraic operations, we show how the RNA code could have evolved toward the current SGC via two different intermediate evolutionary stages called Extended RNA code type I and II. By rotations or translations of the subset RNY, we arrive at the SGC via the former (type I) or via the latter (type II), respectively. Biologically, the Extended RNA code type I consists of all codons of the type RNY plus codons obtained by considering the RNA code but in the second (NYR type) and third (YRN type) reading frames. The Extended RNA code type II comprises all codons of the type RNY plus codons that arise from transversions of the RNA code in the first (YNY type) and third (RNR type) nucleotide bases. Since the dimensions of remarkable subsets of the Genetic Hotels are not necessarily integer numbers, we also introduce the concept of algebraic fractal dimension. A general decoding function which maps each codon to its corresponding amino acid or the stop signals is also derived. The Phenotypic Hotel of amino acids is also illustrated. The proposed evolutionary paths are discussed in terms of the existing theories of the evolution of the SGC. The adoption of 3-dimensional models of the Genetic and Phenotypic Hotels will facilitate the understanding of the biological properties of the SGC.

In this work, we explicitly consider the evolution of the Standard Genetic Code (SGC) by assuming two evolutionary stages, to wit, the primeval RNY code and two intermediate codes in between. We used network theory and graph theory to measure the connectivity of each phenotypic graph. The connectivity values are compared to the values of the codes under different randomization scenarios. An error-correcting optimal code is one in which the algebraic connectivity is minimized. We show that the SGC is optimal in regard to its robustness and error-tolerance when compared to all random codes under different assumptions.

An algebraic and geometrical approach is used to describe the primaeval RNA code and a proposed Extended RNA code. The former consists of all codons of the type RNY, where R means purines, Y pyrimidines, and N any of them. The latter comprises the 16 codons of the type RNY plus codons obtained by considering the RNA code but in the second (NYR type) and the third (YRN type) reading frames. In each of these reading frames, there are 16 triplets that altogether complete a set of 48 triplets, which specify 17 out of the 20 amino acids, including AUG, the start codon, and the three known stop codons. The other 16 codons do not pertain to the Extended RNA code and constitute the union of the triplets YYY and RRR that we define as the RNA-less code. The codons in each of the three subsets of the Extended RNA code are represented by a four-dimensional hypercube and the set of codons of the RNA-less code is portrayed as a four-dimensional hyperprism. Remarkably, the union of these four symmetrical pairwise disjoint sets comprises precisely the already known six-dimensional hypercube of the Standard Genetic Code (SGC) of 64 triplets. These results suggest a plausible evolutionary path along which the primaeval RNA code could have given rise to the SGC, via the Extended RNA code plus the RNA-less code. We argue that the life forms that probably obeyed the Extended RNA code were intermediate between the ribo-organisms of the RNA World and the last common ancestor (LCA) of the Prokaryotes, Archaea, and Eucarya, that is, the cenancestor. A general encoding function, E, which maps each codon to its corresponding amino acid or the stop signal is also derived. In 45 out of the 64 cases, this function takes the form of a linear transformation F, which projects the whole six-dimensional hypercube onto a four-dimensional hyperface formed by all triplets that end in cytosine. In the remaining 19 cases the function E adopts the form of an affine transformation, i.e., the composition of F with a particular translation. Graphical representations of the four local encoding functions and E are illustrated and discussed. For every amino acid and for the stop signal, a single triplet, among those that specify it, is selected as a canonical representative. From this mapping a graphical representation of the 20 amino acids and the stop signal is also derived. We conclude that the general encoding function E represents the SGC itself.
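The codon counts quoted in this abstract can be checked by brute force. The following is a minimal sketch, not the authors' algebraic construction: it enumerates the 64 codons of the standard code table, sorts them by purine/pyrimidine template, and confirms the 16 + 16 + 16 + 16 partition. The helper names (`pattern`, `matches`, `frames`, `rna_less`) are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = set("AG")


def pattern(codon):
    """Reduce a codon to its purine (R) / pyrimidine (Y) pattern."""
    return "".join("R" if base in PURINES else "Y" for base in codon)


def matches(codon, template):
    """True if the codon fits a template such as 'RNY' (N matches any base)."""
    return all(t == "N" or t == p for t, p in zip(template, pattern(codon)))


# The three 16-codon subsets of the Extended RNA code.
frames = {t: {c for c in CODE if matches(c, t)} for t in ("RNY", "NYR", "YRN")}
extended = set().union(*frames.values())

# Everything left over: the "RNA-less" code.
rna_less = set(CODE) - extended

amino_acids = {CODE[c] for c in extended if CODE[c] != "*"}
stops = {c for c in CODE if CODE[c] == "*"}

print({t: len(s) for t, s in frames.items()}, len(extended), len(rna_less))
print(sorted(pattern(c) for c in rna_less)[0], sorted(pattern(c) for c in rna_less)[-1])
print(len(amino_acids), sorted(set("ACDEFGHIKLMNPQRSTVWY") - amino_acids))
```

Under this table, the three amino acids absent from the 48-codon set are Phe, Lys and Glu, whose codons all fall in the YYY or RRR classes.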

J.C. Shepherd notes that codons of the type RNY (R = purine, N = any nucleotide base, Y = pyrimidine) predominate over RNR in the genes for proteins. He has hypothesized that RNY codons are the relics of “a primitive code” composed of repeating RNY triplets. He found that RNY codons predominated in fourfold RNN codon sets (family boxes). These family boxes code for valine, threonine, alanine, and glycine. We argue that the proposed “comma-less” code composed of RNY never existed, and that, in any case, any trace of such a code would long since have been erased by mutations. The excess of RNY codons in family boxes is probably attributable to preference for the corresponding tRNAs.

A model for the developmental pathway of the genetic code, grounded on group theory and the thermodynamics of codon-anticodon interaction, is presented. At variance with previous models, it takes into account not only the optimization with respect to amino acid attributes but also physicochemical constraints and initial conditions. A 'simple-first' rule is introduced after ranking the amino acids with respect to two current measures of chemical complexity. It is shown that a primeval code of only seven amino acids is enough to build functional proteins. It is assumed that these proteins drive the further expansion of the code. The proposed primeval code is compared with surrogate codes randomly generated and with another proposal for primeval code found in the literature. The departures from the 'universal' code, observed in many organisms and cellular compartments, fit naturally in the proposed evolutionary scheme. A strong correlation is found between, on one side, the two classes of aminoacyl-tRNA synthetases, and on the other, the amino acids grouped by end-atom-type and by codon type. An inverse of Davydov's rules, to associate the amino acid end atoms (O/N and non-O/non-N) of 18 amino acids with codons containing a weak base (A/U), extended to the 20 amino acids, is derived.



The standard genetic code (SGC) is a unique set of rules which assign amino acids to codons. Similar amino acids tend to have similar codons indicating that the code evolved to minimize the costs of amino acid replacements in proteins, caused by mutations or translational errors. However, if such optimization in fact occurred, many different properties of amino acids must have been taken into account during the code evolution. Therefore, this problem can be reformulated as a multi-objective optimization task, in which the selection constraints are represented by measures based on various amino acid properties.


To study the optimality of the SGC we applied a multi-objective evolutionary algorithm and we used the representatives of eight clusters, which grouped over 500 indices describing various physicochemical properties of amino acids. In this way we avoided an arbitrary choice of amino acid features as optimization criteria. As a consequence, we were able to conduct a more general study on the properties of the SGC than the ones presented so far in other papers on this topic. We considered two models of the genetic code, one preserving the characteristic codon block structure of the SGC and the other without this restriction. The results revealed that the SGC could be significantly improved in terms of error minimization; hence, it is not fully optimized. Its structure differs significantly from the structure of the codes optimized to minimize the costs of amino acid replacements. On the other hand, using newly defined quality measures that placed the SGC in the global space of theoretical genetic codes, we showed that the SGC is definitely closer to the codes that minimize the costs of amino acid replacements than to those maximizing them.


The standard genetic code most likely represents only a partially optimized system, which emerged under the influence of many different factors. Our findings can be useful to researchers involved in modifying the genetic code of living organisms and designing artificial ones.

Herein we outline a plausible proteome, encoded by assuming a primeval RNY genetic code. We unveil the primeval phenotype by using only the RNA genotype; that is, we recovered the most ancestral proteome, mostly made of the 8 amino acids encoded by RNY triplets. By looking at those fragments, it is noticeable that they are positioned, not at catalytic sites, but in the cofactor binding sites. This implies that the stabilization of a molecule appeared long before its catalytic activity, and therefore the Ur-proteome comprised a set of protein modules that corresponded to Cofactor Stabilizing Binding Sites (CSBSs), which we call the primitive bindome. With our method, we reconstructed the structures of the “first protein modules” that Sobolevsky and Trifonov (2006) found by using only RMSD. We also examine the probable cofactors that bound to them. We discuss the notion of CSBSs as the first protein modules in progenotes in the context of several proposals about the primitive forms of life.

The primitive comma-free genetic code may have had 16 triplets of the form RNY, where R = purine, N = purine or pyrimidine, and Y = pyrimidine, specifying eight (present-day) amino acids. Calculations reveal that in this primitive code all transition changes (A↔G, C↔U) are either silent or missense, i.e. result in the same or another one of these particular eight amino acids. There are no single transitions to non-RNY codons. Single transversions in the primitive codons can, individually, generate new (present-day) codons for four or eight amino acids. Present-day glutamine, tryptophan and stop (UGA, UAA, UAG) codons cannot be so derived, by single transversions, from any of the eight primitive codons. The modern initiation codons, AUG and GUG, can however be generated by both C → G and U → G single transversions in primitive codons. Overall, a total of 32 modern sense codons, not represented in the primitive RNY code, can be derived from this code by single transversions. Many modern codons, including all those not generated by single transversions in the primitive code, can also be produced by either of the two types of frameshift possible in runs of U- or C-rich primitive codons. Present-day stop codons are generated by +1 (-2) type frameshifts in U-rich primitive runs; AUG and GUG initiation codons are produced by the other type, +2 (-1), frameshifts in U-rich runs.
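Several of the counts in this abstract follow directly from the purine/pyrimidine structure of RNY codons and can be reproduced by enumeration. The sketch below is an illustration under the standard code table, not the original calculation; the names `single_mutants`, `by_transition`, `by_transversion` and `unreachable` are assumptions introduced here.

```python
from itertools import product

# Standard genetic code, codons enumerated in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Transitions keep the base class (purine or pyrimidine); transversions swap it.
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}
TRANSVERSIONS = {"A": "CU", "G": "CU", "C": "AG", "U": "AG"}

# The 16 primitive codons: purine first, pyrimidine last.
rny = {c for c in CODE if c[0] in "AG" and c[2] in "CU"}
primitive_aas = {CODE[c] for c in rny}


def single_mutants(codon, table):
    """Yield every codon one substitution away, using the given substitution table."""
    for i, base in enumerate(codon):
        for new in table[base]:
            yield codon[:i] + new + codon[i + 1:]


by_transition = {m for c in rny for m in single_mutants(c, TRANSITION)}
by_transversion = {m for c in rny for m in single_mutants(c, TRANSVERSIONS)} - rny
unreachable = set(CODE) - rny - by_transversion

print(sorted(primitive_aas))      # the eight amino acids of the RNY code
print(by_transition <= rny)       # transitions never leave the RNY set
print(len(by_transversion))       # new codons reachable by one transversion
print(sorted(unreachable))        # codons of the form YNR, including all stops
```

The transversion products are exactly the YNY and RNR codons, which is why the stop codons and the glutamine and tryptophan codons (all of the form YRR) are out of reach in a single step.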

The genetic code is not universal. Various non-standard versions of the code have been found in mitochondrial, prokaryotic and eukaryotic genomes. Stop codons signal the ribosome to stop translation of the coding sequence and are prone to reassignment to sense codons. Class-1 termination factors (RF1 and RF2 in prokaryotes, eRF1 in eukaryotes) recognize stop codons and promote hydrolysis of the peptidyl-tRNA in the ribosome. The termination specificity of the class-1 factors is altered in organisms with non-standard codes. Pyrrolysine and selenocysteine use dissimilar decoding strategies. The various hypotheses on the origin of non-standard codes are described. It has been proposed that alteration of the specificity of the class-1 release factor was the starting point for stop codon reassignment.

The standard genetic code (SGC) has been extensively analyzed for the biological ramifications of its nonrandom structure. For instance, mismatch errors due to point mutation or mistranslation have an overall smaller effect on the amino acid polar requirement under the SGC than under random genetic codes (RGCs). A similar observation was recently made for frameshift errors, prompting the assertion that the SGC has been shaped by natural selection for frameshift-robustness, that is, conservation of certain amino acid properties upon a frameshift mutation or translational frameshift. However, frameshift-robustness confers no benefit because frameshifts usually create premature stop codons that cause nonsense-mediated mRNA decay or production of nonfunctional truncated proteins. We here propose that the frameshift-robustness of the SGC is a byproduct of its mismatch-robustness. We find that, of 564 amino acid properties considered, the SGC exhibits mismatch-robustness in 93–133 properties and frameshift-robustness in 55 properties, and that the latter are largely a subset of the former. For each of the 564 real and 564 randomly constructed fake properties of amino acids, there is a positive correlation between mismatch-robustness and frameshift-robustness across one million RGCs; this correlation arises because most amino acid changes resulting from a frameshift are also achievable by a mismatch error. Importantly, the SGC does not show significantly higher frameshift-robustness in any of the 55 properties than RGCs of comparable mismatch-robustness. These findings support the view that the frameshift-robustness of the SGC need not originate through direct selection and can instead be a side effect of its mismatch-robustness.

The causes and consequences of the nonrandom structure of the standard genetic code (SGC) have been of long-standing interest. A recent study reported that mutations in present-day protein-coding sequences are less likely to increase proteomic nitrogen and carbon uses under the SGC than under random genetic codes, concluding that the SGC has been selectively optimized for resource conservation. If true, this finding might offer important information on the environment in which the SGC and some of the earliest life forms evolved. However, we here show that the hypothesis of optimization of a genetic code for resource conservation is theoretically untenable. We discover that the aforementioned study estimated the expected mutational effect by inappropriately excluding mutations lowering resource consumption and including mutations involving stop codons. After remedying these problems, we find no evidence that the SGC is optimized for nitrogen or carbon conservation.

Base sequences of the φX174 and MS2 virus genomes and of some mRNAs (Coat protein fd virus, Rabbit β-Globin, Rat Growth Hormone and Human Chorionic Somatomammotropin) show a preferential use of some amino-acid codons. Based on this observation, the reliability of three non-degenerate codes is analyzed. All of them display higher reliability than the standing genetic code, especially one formed by a set of non-directly related codons. The absence of this type of code in Nature is discussed in terms of a balance between reliability and mutability of the genetic information, able to preserve species and maintain evolution.

In the genetic code, the UGA codon has a dual function as it encodes selenocysteine (Sec) and serves as a stop signal. However, only the translation terminator function is used in gene annotation programs, resulting in misannotation of selenoprotein genes. Here, we applied two independent bioinformatics approaches to characterize a selenoprotein set in prokaryotic genomes. One method searched for selenoprotein genes by identifying RNA stem-loop structures (selenocysteine insertion sequence elements); the second approach identified Sec/Cys pairs in homologous sequences. These analyses identified all or almost all selenoproteins in completely sequenced bacterial and archaeal genomes and provided a view on the distribution and composition of prokaryotic selenoproteomes. In addition, lineage-specific and core selenoproteins were detected, which provided insights into the mechanisms of selenoprotein evolution. Characterization of selenoproteomes allows interpretation of other UGA codons in completed genomes of prokaryotes as terminators, addressing the UGA dual-function problem.

Summary: We searched the complete 39,936 base DNA sequence of bacteriophage T7 for nonrandomness that might be attributed to natural selection. Codon usage in the 50 genes of T7 is nonrandom, both over the whole code and among groups of synonymous codons. There is a great excess of purine-any base-pyrimidine (RNY) codons. Codon usage varies between genes, but from the pooled data for the whole genome (12,145 codons) certain putative selective constraints can be identified. Codon usage appears to be influenced by host tRNA abundance (particularly in highly expressed genes), tRNA-mRNA interactions (one such interaction being perhaps responsible for maintaining the excess of RNY codons) and a lack of short palindromes. This last constraint is probably due to selection against host restriction enzyme recognition sites; this is the first report of an effect of this kind on codon usage. Selection against susceptibility to mutational damage does not appear to have been involved.

It has been suggested that tRNA acceptor stems specify an operational RNA code for amino acids. In the last 20 years several attributes of the putative code have been elucidated for a small number of model organisms. To gain insight about the ensemble attributes of the code, we analyzed 4925 tRNA sequences from 102 bacterial and 21 archaeal species. Here, we used a classification and regression tree (CART) methodology, and we found that the degrees of degeneracy or specificity of the RNA codes in both Archaea and Bacteria differ from those of the genetic code. We found instances of taxon-specific alternative codes, i.e., identical acceptor stem determinants encrypting different amino acids in different species, as well as instances of ambiguity, i.e., identical acceptor stem determinants encrypting two or more amino acids in the same species. When partitioning the data by class of synthetase, the degree of code ambiguity was significantly reduced. In cryptographic terms, a plausible interpretation of this result is that the class distinction in synthetases is an essential part of the decryption rules for resolving the subset of RNA code ambiguities enciphered by identical acceptor stem determinants of tRNAs acylated by enzymes belonging to the two classes. In evolutionary terms, our findings lend support to the notion that in the pre-DNA world, interactions between tRNA acceptor stems and synthetases formed the basis for the distinction between the two classes; hence, ambiguities in the ancient RNA code were pivotal for the fixation of these enzymes in the genomes of ancestral prokaryotes.

We searched the complete 39,936 base DNA sequence of bacteriophage T7 for nonrandomness that might be attributed to natural selection. Codon usage in the 50 genes of T7 is nonrandom, both over the whole code and among groups of synonymous codons. There is a great excess of purine-any base-pyrimidine (RNY) codons. Codon usage varies between genes, but from the pooled data for the whole genome (12,145 codons) certain putative selective constraints can be identified. Codon usage appears to be influenced by host tRNA abundance (particularly in highly expressed genes), tRNA-mRNA interactions (one such interaction being perhaps responsible for maintaining the excess of RNY codons) and a lack of short palindromes. This last constraint is probably due to selection against host restriction enzyme recognition sites; this is the first report of an effect of this kind on codon usage. Selection against susceptibility to mutational damage does not appear to have been involved.

Error detection and correction properties are fundamental for informative codes. The Hamming distance allows us to study this noise resistance. We present codes characterized by optimized resistance to nonsense mutational effects. The calculation of the cumulative Hamming distance, which allows the number of optimal codes and their structure to be determined, is detailed. The principle of these laws of optimization of resistance consists of choosing constituent codons connected as mutational neighbours in such a way that random application of mutations on such a code minimizes the occurrence of nonsense n-uplets or terminators. New coding symmetries are then described and screened using properties of Galois polynomials and Baudot's code. Such a study can be applied to any length of the codons. Here we present the principles of this optimization for the simplest doublet codes. Another constraint is discussed: the distribution of optimal subcodes for synonymity and the frequencies of utilization of the different codons. We compare these results to those of the present genetic code, and we observe that all coded amino acids (except the particular case of SER) use optimal sub-codes of synonymity. This work suggests that the appearance of the genetic code was provoked by mutations while optimizing on several levels its resistance to their effects. Thus genetic coding would have been the best automaton that could be produced in prebiotic conditions.
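The notions of Hamming distance and mutational neighbours used in this abstract can be made concrete with a short sketch. It is only an illustration of the underlying idea, not the authors' cumulative-distance calculation; `neighbours` and `nonsense_exposure` are hypothetical helper names, and the exposure count is an assumed, simplified measure.

```python
def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def neighbours(word, alphabet="UCAG"):
    """All words at Hamming distance 1 from `word` (its mutational neighbours)."""
    return [
        word[:i] + base + word[i + 1:]
        for i in range(len(word))
        for base in alphabet
        if base != word[i]
    ]


def nonsense_exposure(code_words, nonsense):
    """Count the single substitutions that turn a code word into a nonsense word."""
    return sum(n in nonsense for w in code_words for n in neighbours(w))


# Works for any word length: a doublet has 6 neighbours, a triplet has 9.
print(len(neighbours("UA")), len(neighbours("UAU")))

# Example with the present-day terminators: UAU is one step from UAA and UAG.
print(nonsense_exposure({"UAU"}, {"UAA", "UAG", "UGA"}))
```

A code whose words are chosen to keep this count low is, in the sense sketched here, more resistant to nonsense mutations.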

A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, A, C, G, U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G ≡ C and A = U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

The genetic code is one of the most highly conserved characters in living organisms. Only a small number of genomes have evolved slight variations on the code, and these non-canonical codes are instrumental in understanding the selective pressures maintaining the code. Here, we describe a new case of a non-canonical genetic code from the oxymonad flagellate Streblomastix strix. We have sequenced four protein-coding genes from S. strix and found that the canonical stop codons TAA and TAG encode the amino acid glutamine. These codons are retained in S. strix mRNAs, and the legitimate termination codons of all genes examined were found to be TGA, supporting the prediction that this should be the only true stop codon in this genome. Only four other lineages of eukaryotes are known to have evolved non-canonical nuclear genetic codes, and our phylogenetic analyses of alpha-tubulin, beta-tubulin, elongation factor-1 alpha (EF-1 alpha), heat-shock protein 90 (HSP90), and small subunit rRNA all confirm that the variant code in S. strix evolved independently of any other known variant. The independent origin of each of these codes is particularly interesting because the code found in S. strix, where TAA and TAG encode glutamine, has evolved in three of the four other nuclear lineages with variant codes, but this code has never evolved in a prokaryote or a prokaryote-derived organelle. The distribution of non-canonical codes is probably the result of a combination of differences in translation termination, tRNAs, and tRNA synthetases, such that the eukaryotic machinery preferentially allows changes involving TAA and TAG.
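The practical effect of such a reassignment can be illustrated with a small translation sketch. It assumes only what the abstract states (TAA and TAG read as glutamine, TGA as the sole stop); the table name `TAA_TAG_GLN`, the `translate` helper and the toy sequence are hypothetical and are not taken from the S. strix data.

```python
from itertools import product

# Standard genetic code over the DNA alphabet, in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Variant table as described in the abstract: TAA and TAG encode glutamine (Q).
TAA_TAG_GLN = {**STANDARD, "TAA": "Q", "TAG": "Q"}


def translate(dna, table):
    """Translate in frame 0 until the first stop codon (or the end of the sequence)."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = table[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy = "ATGTAAGGTTGA"  # made-up sequence for illustration only
print(translate(toy, STANDARD))      # translation ends at the in-frame TAA
print(translate(toy, TAA_TAG_GLN))   # TAA is read through as Q; TGA terminates
```

Annotating a genome of this kind with the standard table would truncate every gene at its first in-frame TAA or TAG.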

Genetic code redundancy would yield, on average, the assignment of three codons for each of the natural amino acids. The fact that this number is observed only for incorporating Ile and for stopping RNA translation still awaits an overall explanation. Through a Structural Bioinformatics approach, the wealth of information stored in the Protein Data Bank has been used here to look for unambiguous clues to decipher the rationale of the standard genetic code (SGC) in assigning from one to six different codons for amino acid translation. Leu and Arg, both protected from translational errors by six codons, offer the clearest clue by appearing as the most abundant amino acids in protein-protein and protein-nucleic acid interfaces. Other SGC hidden messages have been sought by analyzing, in a protein structure framework, the roles of over- and under-protected amino acids.
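The degeneracy figures mentioned in this abstract can be read off the standard code table. The sketch below simply counts codons per amino acid; the variable names are illustrative, and the average is computed here as 61 sense codons over 20 amino acids, which is one plausible reading of the abstract's "three codons on average".

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons assigned to each amino acid ("*" is the stop signal).
degeneracy = Counter(CODE.values())

sense_codons = len(CODE) - degeneracy["*"]
average = sense_codons / 20

three_codons = sorted(aa for aa, n in degeneracy.items() if n == 3)
six_codons = sorted(aa for aa, n in degeneracy.items() if n == 6)

print(sense_codons, round(average, 2))
print(three_codons)   # only Ile and the stop signal have exactly three codons
print(six_codons)     # Leu and Arg, and also Ser, have six codons
```

Note that Ser also has six codons in the standard table, although the abstract singles out Leu and Arg.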
