By considering two important factors involved in codon-anticodon interactions, the hydrogen bond number and the chemical type of bases, a codon array of the genetic code table was obtained as an increasing code scale of the interaction energies of amino acids in proteins. Next, in order to obtain all codons consecutively from the codon AAC, a sum operation was introduced on the set of codons. The group obtained over the set of codons is isomorphic to the group (Z64, +) of the integers modulo 64. On the Z64-algebra of the set of 64^N codon sequences of length N, gene mutations are described by means of endomorphisms f: (Z64)^N → (Z64)^N. Endomorphisms and automorphisms helped us describe the gene mutation pathways. For instance, 77.7% of the mutations in 749 HIV protease gene sequences correspond to unique diagonal endomorphisms of the wild type strain HXB2. In particular, most of the reported mutations that confer drug resistance to the HIV protease gene correspond to diagonal automorphisms of the wild type. What is more, a similar situation appears in the human beta-globin gene, where most of the single codon mutations correspond to automorphisms. Hence, in analyses of the molecular evolution process on the set of DNA sequences of length N, the Z64-algebra will help us explain the quantitative relationships between genes.
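The group structure described above can be illustrated with a small Python sketch. It is not the authors' construction: the base order A, C, G, U and the positional weights 16, 4, 1 are assumptions chosen only so that AAC maps to 1 and generates the whole group, and the function names are hypothetical. The abstract does not give the actual codon ordering, which is derived from hydrogen bond number and base type. The sketch also shows when a single-codon mutation can be written as multiplication by a factor in Z64, and when that factor is a unit (an odd number), which is the condition for a diagonal automorphism.

```python
from math import gcd

# Assumed base order; the paper's ordering may differ.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "U": 3}
VALUE_BASE = {v: b for b, v in BASE_VALUE.items()}


def codon_to_int(codon):
    """Map a codon to an element of Z64 (assumed weights 16, 4, 1; AAC -> 1)."""
    first, second, third = (BASE_VALUE[b] for b in codon)
    return 16 * first + 4 * second + third


def int_to_codon(n):
    """Inverse of codon_to_int, reducing n modulo 64 first."""
    n %= 64
    return VALUE_BASE[n // 16] + VALUE_BASE[(n // 4) % 4] + VALUE_BASE[n % 4]


def codon_sum(x, y):
    """Sum of two codons, inherited from addition in (Z64, +)."""
    return int_to_codon(codon_to_int(x) + codon_to_int(y))


def diagonal_factors(wild, mutant):
    """All a in Z64 with a * wild == mutant (mod 64) for one codon position."""
    w, m = codon_to_int(wild), codon_to_int(mutant)
    return [a for a in range(64) if (a * w) % 64 == m]


def is_automorphism_factor(a):
    """Multiplication by a is invertible on Z64 exactly when a is odd."""
    return gcd(a, 64) == 1


# Repeatedly adding AAC visits every codon once before returning to AAC.
orbit = ["AAC"]
while len(orbit) < 64:
    orbit.append(codon_sum(orbit[-1], "AAC"))

print(len(set(orbit)))                  # number of distinct codons reached
print(diagonal_factors("AAU", "ACC"))   # a single odd factor
print(diagonal_factors("ACA", "AGA"))   # several factors, all even
```

Under these assumptions, a mutation of a sequence of N codons is a diagonal endomorphism when each position has such a factor, and a diagonal automorphism when every factor can be chosen odd.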

Summary: One-half of the twenty amino acids of the genetic code are just one mutational step away from the chain-terminator codons UAA, UAG, and UGA. It is postulated that somatic mutation to terminator is a hazard to which the organism has had to respond by adjusting certain proteins in the direction of fewer mutable residues. This view is supported by calculations based on the primary structure of five of the human hemoglobin chains. Each chain is scored for mutability to terminator in accord with the numbers and kinds of amino acids present. Among the adult chains, the most essential one, the alpha, has the lowest mutability. The beta and delta follow, in order of the presumed harm to the organism of a shortage of chain copies. Ante-natal chains tend to have higher mutabilities, supporting the view that cumulative mutational change in DNA can do little harm if the gene ceases to transcribe early in life. Two other predictions based on the supposition of effective selection against mutability to terminator are also met: chain length of polypeptides is negatively correlated with their scores for mutability to terminator, and examination of the recently determined sequence of beta messenger RNA shows preferential use of codons that are not readily mutable to terminator. Supported in part by the National Institutes of Health, Grant HL-16005.
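The opening claim can be checked by direct enumeration. The Python sketch below is not from the paper; it writes the standard genetic code in the usual UCAG table order and collects every amino acid that has at least one codon a single base substitution away from UAA, UAG, or UGA. The function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG table order; '*' marks a terminator.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_step_neighbours(codon):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def amino_acids_next_to_stop():
    """Amino acids with a codon one substitution away from a terminator."""
    stops = [c for c, aa in CODE.items() if aa == "*"]
    near = {CODE[n] for s in stops for n in single_step_neighbours(s)}
    near.discard("*")  # terminators neighbouring each other do not count
    return near


near_stop = amino_acids_next_to_stop()
print(sorted(near_stop), len(near_stop))
# -> ['C', 'E', 'G', 'K', 'L', 'Q', 'R', 'S', 'W', 'Y'] 10
```

Ten of the twenty amino acids appear, which matches the "one-half" stated in the abstract. A per-chain mutability score of the kind the abstract describes would additionally weight each residue by how many of its codons can reach a terminator; the exact scoring used by the author is not given here.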

shCherbak VI. Bio Systems, 2003, 70(3): 187-209
The first information system emerged on the earth as a primordial version of the genetic code and genetic texts. The natural appearance of arithmetic power in such a linguistic milieu is theoretically possible and practical for producing information systems of extremely high efficiency. In this case, the arithmetic symbols should be incorporated into an alphabet, i.e. the genetic code. A number is the fundamental arithmetic symbol produced by the system of numeration. If the system of numeration were detected inside the genetic code, it would be natural to expect that its purpose is arithmetic calculation, e.g., for the sake of control, safety, and precise alteration of the genetic texts. The nucleons of amino acids and the bases of nucleic acids seem most suitable for embodiments of digits. These assumptions were used for analyzing the genetic code.

The compressed, life-size, and split representations of the Escherichia coli and Euplotes octocarinatus code versions were considered simultaneously. An exact equilibration of the nucleon sums of the amino acid standard blocks and/or side chains was found repeatedly within specified sets of the genetic code. Moreover, the digital notations of the balanced sums acquired, in decimal representation, the unique form 111, 222, …, 999. This form is a consequence of the criterion of divisibility by 037. The criterion could simplify some computing mechanism of a cell, if any, and facilitate its computational procedure. The cooperative symmetry of the genetic code demonstrates that possibly a zero was invented and used by this mechanism. Such organization of the genetic code could be explained by activities of some hypothetical molecular organelles working as natural biocomputers of digital genetic texts.

It is well known that if a mutation replaces an amino acid, the change of hydrophobicity is generally weak, while that of size is strong. The antisymmetrical correlation between the amino acid size and the degeneracy number is known as well. It is shown that these and some other familiar properties may be a physicochemical effect of arithmetic inside the genetic code.

The “frozen accident” model, giving unlimited freedom to the mapping function, could optimally support the appearance of both arithmetic symbols and physicochemical protection inside the genetic code.

The genetic code has been regarded as arbitrary in the sense that the codon-amino acid assignments could be different than they actually are. This general idea has been spelled out differently by previous, often rather implicit accounts of arbitrariness. They have drawn on the frozen accident theory, on evolutionary contingency, on alternative causal pathways, and on the absence of direct stereochemical interactions between codons and amino acids. It has also been suggested that the arbitrariness of the genetic code justifies attributing semantic information to macromolecules, notably to DNA. I argue that these accounts of arbitrariness are unsatisfactory. I propose that the code is arbitrary in the sense of Jacques Monod's concept of chemical arbitrariness: the genetic code is arbitrary in that any codon requires certain chemical and structural properties to specify a particular amino acid, but these properties are not required in virtue of a principle of chemistry. This notion of arbitrariness is compatible with several recent hypotheses about code evolution. I maintain that the code's chemical arbitrariness is neither sufficient nor necessary for attributing semantic information to nucleic acids.

We used simulated evolution to study the adaptability level of the canonical genetic code. An adapted genetic algorithm (GA) searches for optimal hypothetical codes. Adaptability is measured as the average variation of the hydrophobicity that the encoded amino acids undergo when errors or mutations are present in the codons of the hypothetical codes. Different types of mutations and point mutation rates that depend on codon base number are considered in this study. Previous works have used statistical approaches based on randomly generated alternative codes or have used local search techniques to determine an optimum value. In this work, we emphasize what can be concluded from the use of simulated evolution considering the results of previous works. The GA provides more information about the difficulty of the evolution of codes, without contradicting previous studies using statistical or engineering approaches. The GA also shows that, within the coevolution theory, the third base clearly improves the adaptability of the current genetic code.

It is known that different codons may be unified into larger groups related to the hierarchical structure, approximate hidden symmetries, and evolutionary origin of the universal genetic code. Using a simplified, evolutionarily motivated two-letter version of the genetic code, the general principles of the most stable coding are discussed. By complete enumeration in such a reduced code, it is strictly proved that maximum stability with respect to point mutations and shifts in the reading frame requires the fixation of the middle letters within codons in groups with different physico-chemical properties, thus explaining a key feature of the universal genetic code. The translational stability of the genetic code is studied by mapping the code onto a de Bruijn graph, which provides both a compact visual representation of the mutual relationships between different codons, as well as between codons and the protein-coding DNA sequence, and a powerful tool for investigating the stability of protein coding. Then, the results are extended to four-letter codes. As is shown, the universal genetic code mainly obeys the principles of optimal coding. These results demonstrate the hierarchical character of the optimization of the universal genetic code, with strictly optimal coding having evolved at the earliest stages of molecular evolution. Finally, the universal genetic code is compared with the other natural variants of genetic codes.
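The de Bruijn graph mapping mentioned above can be sketched as follows. This is a generic construction and an assumption about the setup, since the abstract does not spell out the authors' exact graph: each node is a dinucleotide, each codon xyz is a directed edge from node xy to node yz, and reading a sequence one base at a time traces a path whose every third edge belongs to one reading frame. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"


def codon_de_bruijn_graph(alphabet=BASES):
    """Map each dinucleotide node to its outgoing (codon, next_node) edges."""
    graph = defaultdict(list)
    for x, y, z in product(alphabet, repeat=3):
        graph[x + y].append((x + y + z, y + z))
    return dict(graph)


def codons_along_sequence(seq):
    """All overlapping triplets of seq, i.e. the edge path through the graph."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


graph = codon_de_bruijn_graph()
path = codons_along_sequence("AUGGCU")

print(len(graph), sum(len(edges) for edges in graph.values()))  # nodes, edges
print(path)        # every overlapping triplet, in all three frames
print(path[0::3])  # frame 0 codons
print(path[1::3])  # codons read after a one-base frame shift
```

Passing `alphabet="AU"` (or any two letters) gives the reduced two-letter graph with 4 nodes and 8 edges, which is small enough to enumerate completely, in the spirit of the reduced code discussed in the abstract.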

We have investigated the origin of genes, the genetic code, proteins and life using six indices (hydropathy, α-helix, β-sheet and β-turn formabilities, acidic amino acid content and basic amino acid content) necessary for appropriate three-dimensional structure formation of globular proteins. From the analysis of microbial genes, we have concluded that newly-born genes are products of nonstop frames (NSF) on antisense strands of microbial GC-rich genes [GC-NSF(a)] and from SNS repeating sequences [(SNS)n] similar to the GC-NSF(a) (S and N mean G or C and any of the four bases, respectively). We have also proposed that the universal genetic code used by most organisms on the earth presently could be derived from a GNC-SNS primitive genetic code. We have further presented the [GADV]-protein world hypothesis of the origin of life as well as a hypothesis of protein production, suggesting that proteins were originally produced by random peptide formation of amino acids restricted in specific amino acid compositions termed as GNC-, SNS- and GC-NSF(a)-0th order structures of proteins. The [GADV]-protein world hypothesis is primarily derived from the GNC-primitive genetic code hypothesis. It is also expected that basic properties of extant genes and proteins could be revealed by considerations based on the scenario with four stages. This review is a modified English version of the paper, which was written in Japanese and published in Viva Origino 2001 29 66–85.

The standard genetic code is known to be much more efficient in minimizing adverse effects of misreading errors and one-point mutations in comparison with a random code having the same structure, i.e. the same number of codons coding for each particular amino acid. We study the inverse problem: how the code structure affects the optimal physico-chemical parameters of amino acids ensuring the highest stability of the genetic code. It is shown that the choice of two or more amino acids with given properties determines unambiguously all the others. In this sense the code structure determines strictly the optimal parameters of amino acids, or the corresponding scales may be derived directly from the genetic code. In the code with the structure of the standard genetic code, the resulting values for hydrophobicity obtained in the “leave one out” scheme and in the scheme with fixed maximum and minimum parameters correlate significantly with the natural scale. The comparison of the optimal and natural parameters allows assessing the relative impact of physico-chemical and error-minimization factors during evolution of the genetic code. As the resulting optimal scale depends on the choice of amino acids with given parameters, the technique can also be applied to testing various scenarios of the code evolution with an increasing number of codified amino acids. Our results indicate the co-evolution of the genetic code and the physico-chemical properties of recruited amino acids.

Reprogramming of the standard genetic code to include non-canonical amino acids (ncAAs) opens new prospects for medicine, industry, and biotechnology. There are several methods of code engineering, which allow us to store new genetic information in DNA sequences and to produce proteins with new properties. Here, we provided a theoretical background for the optimal genetic code expansion, which may find application in the experimental design of the genetic code. We assumed that the expanded genetic code includes both canonical and non-canonical information stored in 64 classical codons. What is more, the new coding system is robust to point mutations and minimizes the possibility of reversion from the new to the old information. In order to find such codes, we applied graph theory to analyze the properties of optimal codon sets. We presented the formal procedure for finding the optimal codes with various numbers of vacant codons that could be assigned to new amino acids. Finally, we discussed the optimal number of newly incorporated ncAAs and also the optimal size of codon groups that can be assigned to ncAAs.

A model for the formation of the genetic code is presented where protein synthesis is directed initially by tRNA dimers. Proteins that are resistant to degradation and efficient RNA-binders protect the RNAs. Replication becomes elongational producing poly-tRNAs from which the mRNAs and ribosomes are derived. Attributions are successively fixed to tRNAs paired through the perfect palindromic anticodons, with the same bases at the extremities (5′ANA: UNU 3′; GNG: CNC; principal dinucleotides, pDiN). The 5′ degeneracy is then developed. The first pairs to be encoded correspond to the hydropathy correlation outliers (Gly-CC: Pro-GG and Ser-GA: Ser-CU) and to the sector of homogeneous pDiN, composed by two pyrimidines or two purines. These amino acids are preferred in the N-ends of proteins, stabilizers of proteins against catabolism and strong RNA-binders. The next pairs complete the sector of homogeneous pDiN (Asp, Glu-UC: Leu-AG and Asn, Lys-UU: Phe-AA). This set of nine amino acids forms the protein cores with the predominant aperiodic conformation. Next enter the pairs with mixed pDiN (one purine and one pyrimidine), the RY attributions composing the protein N-ends and the YR attributions the C-ends. The last pair contains the main punctuation signs (Ile, Met, iMet-AU: Tyr, Stop-UA). The model indicates that genetic information emerged during the process of formation of the coding/decoding system and that genes were defined by the proteins. Stable proteins constructed the nucleoprotein system by binding to the RNAs that produced them. In this circular rationale, genes are memories in a metabolic system for production of proteins that stabilize it. The simplicity and the highly deterministic character of the process suggest that the Last Universal Common Ancestor populations could be composed, in early stages, of lineages bearing similar genetic codes.

Two ideas have essentially been used to explain the origin of the genetic code: Crick's frozen accident and Woese's amino acid-codon specific chemical interaction. Whatever the origin and codon-amino acid correlation, it is difficult to imagine the sudden appearance of the genetic code in its present form of 64 codons coding for 20 amino acids without appealing to some evolutionary process. On the contrary, it is more reasonable to assume that it evolved from a much simpler initial state in which a few triplets were coding for each of a small number of amino acids. Analysis of the genetic code through information theory and the metabolism of pyrimidine biosynthesis provides evidence suggesting that the genetic code could have begun in an RNA world with the two letters A and U grouped in eight triplets coding for seven amino acids and one stop signal. This code could have progressively evolved by making gradual use of the letters G and C to end with 64 triplets coding for 20 amino acids and three stop signals. According to the proposed evidence, DNA could have appeared after the four-letter structure was already achieved. In the newborn DNA world, T replaced U to achieve higher physicochemical and genetic stability.

Error detection and correction properties are fundamental for informative codes. Hamming's distance allows us to study this noise resistance. We present codes characterized by optimized resistance to nonsense mutational effects. The calculation of the cumulated Hamming's distance, which determines the number of optimal codes and their structure, can be detailed. The principle of these laws of optimization of resistance consists of choosing constituent codons connected by mutational neighbouring in such a way that random application of mutations to such a code minimizes the occurrence of nonsense n-uplets or terminators. New coding symmetries are then described and screened using Galois's polynomial properties and Baudot's code. Such a study can be applied to codons of any length. Here we present the principles of this optimization for the simplest doublet codes. Another constraint is discussed: the distribution of optimal subcodes for synonymity and the frequencies of utilization of the different codons. We compare these results to those of the present genetic code, and we observe that all coded amino acids (except the particular case of SER) are using optimal sub-codes of synonymity. This work suggests that the appearance of the genetic code was provoked by mutations while optimizing on several levels its resistance to their effects. Thus genetic coding would have been the best automaton that could be produced in prebiotic conditions.

We describe a compact representation of the genetic code that factorizes the table in quartets. It represents a “least grammar” for the genetic language. It is justified by the Klein-4 group structure of RNA bases and codon doublets. The matrix of the outer product between the column-vector of bases and the corresponding row-vector V^T = (C G U A), considered as signal vectors, has a block structure consisting of the four cosets of the K × K group of base transformations acting on doublet AA. This matrix, translated into weak/strong (W/S) and purine/pyrimidine (R/Y) nucleotide classes, leads to a code table with mixed and unmixed families in separate regions. A basic difference between them is the non-commuting (R/Y) doublets: AC/CA, GU/UG. We describe the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24, employing modulo multiplication groups. We illustrate binary sub-codes characterizing mutations in the quartets. We introduce a decision-tree to predict the mode of tRNA recognition corresponding to each codon, and compare our result with related findings by Jestin and Soulé [Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2′ or 3′ aminoacylation of tRNAs. J. Theor. Biol. 247, 391–394], and the rearrangements of the table by Delarue [Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161–169] and Rodin and Rodin [Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341–355], respectively.

New insights into the arrangement of the genetic code table, based on the analysis of the physico-chemical properties of its molecular constituents, are reported in this paper. It will be demonstrated that the code has a twofold symmetry that is not apparent from the conventional code table, but becomes apparent when the codon-anticodon energies are listed for each triplet. The evolutionary development of the current code based on single base replacement mutations (transitions) from an 'iso-energetic' degenerated subset of 16 of the 64 codons is discussed. The energy landscape of all 64 codons is presented. A detailed analysis of the energy changes due to mutations in the 3rd, 1st or 2nd position of a codon reveals that the modern genetic code is highly robust. Changes come in small discrete steps that can be quantified in relation to the thermal noise of the system. The relation of the individual codon to its neighbours in the rearranged codon table can be completely understood based on thermodynamic considerations.

Summary: We have calculated the average effect of changing a codon by a single base for all possible single-base changes in the genetic code and for changes in the first, second, and third codon positions separately. Such values were calculated for an amino acid's polar requirement, hydropathy, molecular volume, and isoelectric point. For each attribute the average effect of single-base changes was also calculated for a large number of randomly generated codes that retained the same level of redundancy as the natural code. Amino acids whose codons differed by a single base in the first and third codon positions were very similar with respect to polar requirement and hydropathy. The major differences between amino acids were specified by the second codon position. Codons with U in the second position are hydrophobic, whereas most codons with A in the second position are hydrophilic. This accounts for the observation of complementary hydropathy. Single-base changes in the natural code had a smaller average effect on polar requirement than all but 0.02% of random codes. This result is most easily explained by selection to minimize deleterious effects of translation errors during the early evolution of the code.
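The calculation described above can be sketched in Python under stated assumptions. The abstract does not say which hydropathy scale or which averaging formula was used, so this sketch assumes the Kyte-Doolittle hydropathy index and a mean squared difference, skips changes to or from terminators, and builds random codes by permuting the twenty amino acids among the existing synonymous codon blocks (which keeps the redundancy pattern). The function names and the sample size of 200 are illustrative only.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code in the usual UCAG table order; '*' marks a terminator.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Kyte-Doolittle hydropathy index (assumed scale; the paper's may differ).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def mean_square_change(code, positions=(0, 1, 2), scale=HYDROPATHY):
    """Mean squared change in `scale` over single-base changes at `positions`."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in positions:
            for base in BASES:
                if base == codon[pos]:
                    continue
                other = code[codon[:pos] + base + codon[pos + 1:]]
                if other == "*":
                    continue  # changes to a terminator are left out here
                total += (scale[aa] - scale[other]) ** 2
                count += 1
    return total / count


def shuffled_code(code, rng):
    """Permute the amino acids among the synonymous codon blocks."""
    amino_acids = sorted(set(code.values()) - {"*"})
    permuted = amino_acids[:]
    rng.shuffle(permuted)
    swap = dict(zip(amino_acids, permuted))
    swap["*"] = "*"  # terminators stay where they are
    return {codon: swap[aa] for codon, aa in code.items()}


per_position = [mean_square_change(CODE, (p,)) for p in range(3)]
natural = mean_square_change(CODE)

rng = random.Random(0)
samples = [mean_square_change(shuffled_code(CODE, rng)) for _ in range(200)]
fraction_better = sum(s < natural for s in samples) / len(samples)

print(per_position)     # first, second, third codon position
print(fraction_better)  # share of random codes with a smaller average effect
```

With this scale the third-position value is much smaller than the other two, because most third-position changes are synonymous. A sample of 200 codes is far too small to resolve a figure such as the 0.02% quoted in the abstract, which was obtained for polar requirement with a much larger set of random codes.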

Summary: The use of triplet code words in E. coli, φX174, MS2, and rabbit globin was examined. A significant deficiency of purines in the third position of fourfold degenerate codons was noted, although its significance is not understood. There has been no consistent selection against uracil in pyrimidine restricted codons. For many amino acids the choice between code words appears random, while for arginine, isoleucine, and probably glycine, distinct biases exist which can be explained in terms of tRNA availability.

