Similar articles
 20 similar articles found (search time: 593 ms)
1.
2.
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D, A, C, G, U}, where the symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneracy of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could have made possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G ≡ C and A = U and the non-specific base pairing of the hypothetical ancestral base D, which are used to define the sum and product operations, are sufficient to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. In addition, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of present coding DNA sequences could also have existed in ancient coding DNA sequences. The phylogenetic analyses performed with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

3.
Explaining the apparent non-random codon distribution and the nature and number of amino acids in the ‘standard’ genetic code remains a challenge, despite the various hypotheses proposed so far. In this paper we propose a simple new hypothesis for code evolution involving a progression from singlet to doublet to triplet codons, with a reading mechanism that moves three bases at each step. We suggest that triplet codons gradually evolved from two types of ambiguous doublet codons: those in which the first two bases of each three-base window were read (‘prefix’ codons) and those in which the last two bases of each window were read (‘suffix’ codons). This hypothesis explains multiple features of the genetic code, such as the origin of the pattern of four-fold degenerate and two-fold degenerate triplet codons, the origin of its error-minimising properties, and why there are only 20 amino acids. [Reviewing Editor: Dr. Laura Landweber] An erratum to this article has been published.

4.
The frequencies of A, C, G, and T in mitochondrial DNA vary among species due to unequal rates of mutation between the bases. The frequencies of bases at fourfold degenerate sites respond directly to mutation pressure. At first and second positions, selection reduces the degree of frequency variation. Using a simple evolutionary model, we show that first position sites are less constrained by selection than second position sites and, therefore, that the frequencies of bases at first position are more responsive to mutation pressure than those at second position. We define a measure of distance between amino acids that is dependent on eight measured physical properties and a similarity measure that is the inverse of this distance. Columns 1, 2, 3, and 4 of the genetic code correspond to codons with U, C, A, and G in their second position, respectively. The similarity of amino acids in the four columns decreases systematically from column 1 to column 2 to column 3 to column 4. We then show that the responsiveness of first position bases to mutation pressure is dependent on the second position base and follows the same decreasing trend through the four columns. Again, this shows the correlation between physical properties and responsiveness. We determine a proximity measure for each amino acid, which is the average similarity between an amino acid and all others that are accessible via single point mutations in the mitochondrial genetic code structure. We also define a responsiveness for each amino acid, which measures how rapidly an amino acid frequency changes as a result of mutation pressure acting on the base frequencies. We show that there is a strong correlation between responsiveness and proximity, and that both these quantities are also correlated with the mutability of amino acids estimated from the mtREV substitution rate matrix. We also consider the variation of base frequencies between strands and between genes on a strand. These trends are consistent with the patterns expected from analysis of the variation among genomes. [Reviewing Editor: Dr. David Pollock]

5.
By considering two important factors involved in codon-anticodon interactions, the number of hydrogen bonds and the chemical type of the bases, a codon array of the genetic code table was obtained as an increasing code scale of interaction energies of amino acids in proteins. Next, in order to obtain all codons consecutively from the codon AAC, a sum operation was introduced on the set of codons. The group obtained over the set of codons is isomorphic to the group (Z64, +) of the integers modulo 64. On the Z64-algebra of the set of 64^N codon sequences of length N, gene mutations are described by means of endomorphisms f: (Z64)^N → (Z64)^N. Endomorphisms and automorphisms helped us describe the gene mutation pathways. For instance, 77.7% of the mutations in 749 HIV protease gene sequences correspond to unique diagonal endomorphisms of the wild-type strain HXB2. In particular, most of the reported mutations that confer drug resistance to the HIV protease gene correspond to diagonal automorphisms of the wild type. Moreover, a similar situation appears in the human beta-globin gene, where most of the single-codon mutations correspond to automorphisms. Hence, in analyses of the molecular evolution process on the set of DNA sequences of length N, the Z64-algebra will help us explain the quantitative relationships between genes.

6.
It is known that different codons may be unified into larger groups related to the hierarchical structure, approximate hidden symmetries, and evolutionary origin of the universal genetic code. Using a simplified, evolutionarily motivated two-letter version of the genetic code, the general principles of the most stable coding are discussed. By complete enumeration in such a reduced code, it is strictly proved that maximum stability with respect to point mutations and shifts in the reading frame requires the fixation of the middle letters within codons in groups with different physico-chemical properties, thus explaining a key feature of the universal genetic code. The translational stability of the genetic code is studied by mapping the code onto a de Bruijn graph, which provides both a compact visual representation of the mutual relationships between different codons, as well as between codons and protein-coding DNA sequences, and a powerful tool for investigating the stability of protein coding. Then, the results are extended to four-letter codes. As is shown, the universal genetic code mainly obeys the principles of optimal coding. These results demonstrate the hierarchical character of the optimization of the universal genetic code, with strictly optimal coding having evolved at the earliest stages of molecular evolution. Finally, the universal genetic code is compared with the other natural variants of the genetic code.

7.
Information theoretic analysis of genetic languages indicates that the naturally occurring 20 amino acids and the triplet genetic code arose by duplication of 10 amino acids of class-II and a doublet genetic code having codons NNY and anticodons GNN. Evidence for this scenario is presented based on the properties of aminoacyl-tRNA synthetases, amino acids and nucleotide bases.

8.
9.
Error detection and correction properties are fundamental for informative codes. The Hamming distance allows us to study this noise resistance. We present codes characterized by optimized resistance to the effects of nonsense mutations. The calculation of the cumulative Hamming distance, which makes it possible to determine the number of optimal codes and their structure, is described in detail. The principle of these laws of resistance optimization consists of choosing constituent codons connected by mutational neighbourhood in such a way that the random application of mutations to such a code minimizes the occurrence of nonsense n-tuples or terminators. New coding symmetries are then described and screened using properties of Galois polynomials and the Baudot code. Such a study can be applied to codons of any length. Here we present the principles of this optimization for the simplest doublet codes. Another constraint is discussed: the distribution of optimal subcodes for synonymity and the frequencies of use of the different codons. We compare these results to those of the present genetic code, and we observe that all coded amino acids (except the particular case of SER) use optimal subcodes of synonymity. This work suggests that the appearance of the genetic code was provoked by mutations while its resistance to their effects was optimized on several levels. Thus genetic coding would have been the best automaton that could be produced in prebiotic conditions.
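
As a rough illustration of the idea in abstract 9 (and not the authors' own procedure), the sketch below measures Hamming distance between codons and counts how many single-point mutations turn a sense codon into a terminator. It assumes the standard genetic code with stop codons UAA, UAG and UGA; the function names `hamming`, `neighbours` and `nonsense_mutations` are hypothetical helpers introduced here.

```python
from itertools import product

BASES = "UCAG"
# Assumption: terminators of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def neighbours(codon: str) -> list[str]:
    """All codons at Hamming distance 1 from `codon` (single-point mutants)."""
    result = []
    for i, old in enumerate(codon):
        for new in BASES:
            if new != old:
                result.append(codon[:i] + new + codon[i + 1:])
    return result


def nonsense_mutations(stops: set[str], length: int = 3) -> int:
    """Count single-point mutations that change a sense word into a stop word."""
    words = ["".join(p) for p in product(BASES, repeat=length)]
    sense = [w for w in words if w not in stops]
    return sum(1 for w in sense for n in neighbours(w) if n in stops)


if __name__ == "__main__":
    total = (4 ** 3 - len(STOP_CODONS)) * 9  # 9 single-point mutants per codon
    print(nonsense_mutations(STOP_CODONS), "nonsense mutations out of", total)
```

Passing `length=2` and a different set of terminators gives the same count for a doublet code, which is the case the abstract focuses on; comparing that count across candidate codes is one simple way to rank them by resistance to nonsense mutations.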

10.
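High-dimensional-space digital coding of the genetic code and DNA sequences   Total citations: 13, self-citations: 7, citations by others: 6
Binary digital coding is the most basic coding scheme in information science. When the four digits 0 (00), 1 (01), 2 (10) and 3 (11) are used to encode the four bases (C, T, A, G) in binary, there are 24 possible coding combinations, of which 8 satisfy the base complementarity rule; these 8 are topologically equivalent. The coding format ordered by the molecular weight of the bases, 0123/CTAG, is the most suitable one. Encoding the character sequence of DNA with binary numbers has the following advantages: 1) it reduces information redundancy and improves coding efficiency; 2) the structure and functional groups of the bases can be …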
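
The counts in abstract 10 (24 possible encodings, 8 of them respecting base complementarity) can be reproduced with a short sketch. It assumes that the complementarity rule means that complementary bases (A/T and C/G) receive bitwise-complementary two-bit codes, i.e. codes that sum to 3; the abstract does not spell this out, so treat that reading as an assumption. The helper names are hypothetical.

```python
from itertools import permutations

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# The 0123/CTAG format named in the abstract: C=00, T=01, A=10, G=11.
CTAG = {"C": 0, "T": 1, "A": 2, "G": 3}


def all_encodings() -> list[dict[str, int]]:
    """Every assignment of the digits 0..3 to the bases C, T, A, G."""
    return [dict(zip("CTAG", digits)) for digits in permutations(range(4))]


def is_complementary(code: dict[str, int]) -> bool:
    """Assumed rule: complementary bases get bitwise-complementary 2-bit codes.

    The bitwise complement of a 2-bit value v is 3 - v, so the two codes of
    each complementary pair must sum to 3.
    """
    return all(code[base] + code[COMPLEMENT[base]] == 3 for base in code)


def encode(seq: str, code: dict[str, int] = CTAG) -> str:
    """Encode a DNA string as a bit string, two bits per base."""
    return "".join(format(code[base], "02b") for base in seq)


if __name__ == "__main__":
    encodings = all_encodings()
    valid = [c for c in encodings if is_complementary(c)]
    print(len(encodings), "encodings,", len(valid), "respect complementarity")
    print(encode("GATTACA"))
```

Under this encoding each base costs two bits instead of one eight-bit character, which is the reduction in redundancy that the abstract lists as its first advantage.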

11.
Since the genetic code was first determined, many have claimed that it is organized adaptively, so as to assign similar codons to similar amino acids. This claim has proved difficult to establish due to the absence of relevant comparative data on alternative primordial codes and of objective measures of amino acid exchangeability. Here we use a recently developed measure of exchangeability to evaluate a null hypothesis and two alternative hypotheses about the adaptiveness of the genetic code. The null hypothesis, that there is no tendency for exchangeable amino acids to be assigned to similar codons, can be excluded here, as expected from earlier work. The first alternative hypothesis is that any such correlation between codon distance and amino acid distance is due to incremental mechanisms of code evolution, and not to adaptation to reduce deleterious effects of future mutations. More specifically, new codon assignments that occur by ambiguity reduction or by codon capture will tend to give rise to correlations, whether due to the condition of amino acid ambiguity, or to the condition of similarity between a new tRNA synthetase (or tRNA) and its parent. The second alternative hypothesis, the adaptive hypothesis, may then be defined as an excess relative to what may be expected given the incremental nature of evolution, reflecting true adaptation for robustness rather than an incidental effect. The results reported here indicate that most of the nonrandomness in the amino acid-to-codon assignments can be explained by incremental code evolution, with a small residue of orderliness that may reflect code adaptation.

12.
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides (adenine, thymine, guanine and cytosine) according to their emergence in evolution, and apply the organizational rules to devise an algebraic representation of the canonical genetic code. Within the framework of the devised code, we quantify codon and amino acid usage across a large collection of 917 prokaryotic genome sequences, and associate the usage with the code's intrinsic structure and classification schemes as well as with amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code, and that codon and amino acid usage under different classification schemes is closely correlated with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, and that the six-fold degenerate codons and their amino acids have important balancing roles in error minimization. Therefore, the content-centric code is highly useful for deciphering hitherto unknown regularities of the code as well as the dynamics of nucleotide, codon, and amino acid compositions.

13.
Summary: We lay new foundations for the hypothesis that the genetic code is adapted to the evolutionary retention of information in the antisense strands of natural DNA/RNA sequences. In particular, we show that the genetic code exhibits, beyond the neutral replacement patterns of amino acid substitutions, optimal properties by favoring the simultaneous evolution of proteins encoded in DNA/RNA sense-antisense strands. This is borne out in the sense-antisense transformations of the codons of every amino acid, which target amino acids physicochemically similar to each other. Moreover, silent mutations in the sense strand generate conservative ones in its antisense counterpart and vice versa. Coevolution of proteins coded by complementary strands is shown to be a definite possibility, a result which does not depend on any physical interaction between the coevolving proteins. Likewise, the degree to which the present genetic code is dedicated to evolutionary sense-antisense tolerance is demonstrated by comparison with many randomized codes. Double-strand coding is quantified from an information-theoretical point of view.

14.
The aminoacyl-tRNA synthetases exist as two enzyme families which were apparently generated by divergent evolution from two primordial synthetases. The two classes of enzymes exhibit intriguing familial relationships, in that they are distributed nonrandomly within the codon-amino acid matrix of the genetic code. For example, all XCX codons code for amino acids handled by class II synthetases, and all but one of the XUX codons code for amino acids handled by class I synthetases. One interpretation of these patterns is that the synthetases coevolved with the genetic code. The more likely explanation, however, is that the synthetases evolved in the context of an already-established genetic code, one which developed earlier in an RNA world. The rules which governed the development of the genetic code, and led to certain patterns in the coding catalog between codons and amino acids, would also have governed the subsequent evolution of the synthetases in the context of a fixed code, leading to patterns in synthetase distribution such as those observed. These rules are (1) conservative evolution of amino acid and adapter binding sites and (2) minimization of the disruptive effects on protein structure caused by codon meaning changes.

15.
We have previously proposed an SNS hypothesis on the origin of the genetic code (Ikehara and Yoshida 1998). The hypothesis predicts that the universal genetic code originated from the SNS code composed of 16 codons and 10 amino acids (S and N mean G or C and any of the four bases, respectively). However, it must have been very difficult to create the SNS code at one stroke in the beginning. Therefore, we searched for a code simpler than the SNS code that could still encode water-soluble globular proteins with appropriate three-dimensional structures at a high probability, using four conditions for globular protein formation (hydropathy, α-helix, β-sheet, and β-turn formations). Four amino acids (Gly [G], Ala [A], Asp [D], and Val [V]) encoded by the GNC code satisfied the four structural conditions well, but other codes in rows and columns of the universal genetic code table did not, except for the GNG code, a slightly modified form of the GNC code. Three three-amino acid systems ([D], Leu and Tyr; [D], Tyr and Met; Glu, Pro and Ile) also satisfied the above four conditions. However, some amino acids in the three systems are far more complex than those encoded by the GNC code. In addition, the amino acids in the three-amino acid systems are scattered in the universal genetic code table. Thus, we concluded that the universal genetic code originated not from a three-amino acid system but from a four-amino acid system, the GNC code encoding [GADV]-proteins, as the most primitive genetic code. Received: 11 June 2001 / Accepted: 11 October 2001

16.
Two ideas have essentially been used to explain the origin of the genetic code: Crick's frozen accident and Woese's amino acid-codon specific chemical interaction. Whatever the origin and codon-amino acid correlation, it is difficult to imagine the sudden appearance of the genetic code in its present form of 64 codons coding for 20 amino acids without appealing to some evolutionary process. Rather, it is more reasonable to assume that it evolved from a much simpler initial state in which a few triplets coded for each of a small number of amino acids. Analysis of the genetic code through information theory and the metabolism of pyrimidine biosynthesis provides evidence suggesting that the genetic code could have begun in an RNA world with the two letters A and U grouped in eight triplets coding for seven amino acids and one stop signal. This code could have evolved progressively by making gradual use of the letters G and C, ending with 64 triplets coding for 20 amino acids and three stop signals. According to the proposed evidence, DNA could have appeared after the four-letter structure had already been achieved. In the newborn DNA world, T replaced U to achieve higher physicochemical and genetic stability.

17.
New insights into the arrangement of the genetic code table, based on the analysis of the physico-chemical properties of its molecular constituents, are reported in this paper. It will be demonstrated that the code has a twofold symmetry that is not apparent from the conventional code table, but becomes apparent when the codon-anticodon energies are listed for each triplet. The evolutionary development of the current code based on single base replacement mutations (transitions) from an 'iso-energetic' degenerate subset of 16 of the 64 codons is discussed. The energy landscape of all 64 codons is presented. A detailed analysis of the energy changes due to mutations in the 3rd, 1st or 2nd position of a codon reveals that the modern genetic code is highly robust. Changes come in small discrete steps that can be quantified in relation to the thermal noise of the system. The relation of the individual codon to its neighbours in the rearranged codon table can be completely understood based on thermodynamic considerations.

18.
We describe a compact representation of the genetic code that factorizes the table into quartets. It represents a “least grammar” for the genetic language. It is justified by the Klein-4 group structure of RNA bases and codon doublets. The matrix of the outer product between the column-vector of bases and the corresponding row-vector V^T = (C G U A), considered as signal vectors, has a block structure consisting of the four cosets of the K × K group of base transformations acting on doublet AA. This matrix, translated into weak/strong (W/S) and purine/pyrimidine (R/Y) nucleotide classes, leads to a code table with mixed and unmixed families in separate regions. A basic difference between them is the non-commuting (R/Y) doublets: AC/CA, GU/UG. We describe the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24, employing modulo multiplication groups. We illustrate binary sub-codes characterizing mutations in the quartets. We introduce a decision tree to predict the mode of tRNA recognition corresponding to each codon, and compare our result with related findings by Jestin and Soulé [Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2′ or 3′ aminoacylation of tRNAs. J. Theor. Biol. 247, 391–394], and the rearrangements of the table by Delarue [Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161–169] and Rodin and Rodin [Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341–355], respectively.

19.
Given a genetic code formed by 64 codons, we calculate the number of partitions of the set of amino acid-encoding codons. When there are 0-3 stop codons, the results indicate that the most probable number of blocks in a partition is 19 and/or 20. Then, assuming that the genetic code could have undergone random variations during early evolution, we suggest that the most probable number of blocks in a partition of the set of amino acid-encoding codons determined the actual number, 20, of standard amino acids.
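
One way to make the counting in abstract 19 concrete is sketched below. It assumes, and the abstract does not state this explicitly, that "number of partitions" refers to Stirling numbers of the second kind S(n, k), which count the ways of splitting n codons into k non-empty groups, and that the "most probable" k is the one that maximizes S(n, k). The function names are hypothetical.

```python
def stirling2_row(n: int) -> list[int]:
    """Return [S(n, 0), ..., S(n, n)], Stirling numbers of the second kind.

    Uses the recurrence S(m, k) = k * S(m - 1, k) + S(m - 1, k - 1),
    starting from S(0, 0) = 1.
    """
    row = [1]
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            keep = k * row[k] if k < m else 0  # S(m - 1, m) is zero
            new[k] = keep + row[k - 1]
        row = new
    return row


def most_probable_block_count(n: int) -> int:
    """The k that maximizes S(n, k), i.e. the most common number of groups
    when all partitions of an n-element set are taken as equally likely."""
    row = stirling2_row(n)
    return max(range(n + 1), key=row.__getitem__)


if __name__ == "__main__":
    for stops in range(4):
        sense_codons = 64 - stops
        print(stops, "stop codons:", most_probable_block_count(sense_codons))
```

Python integers have arbitrary precision, so the very large values of S(n, k) for n near 64 are handled exactly without any special library.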

20.
Reprogramming of the standard genetic code to include non-canonical amino acids (ncAAs) opens new prospects for medicine, industry, and biotechnology. There are several methods of code engineering, which allow us to store new genetic information in DNA sequences and produce proteins with new properties. Here, we provide a theoretical background for optimal genetic code expansion, which may find application in the experimental design of the genetic code. We assume that the expanded genetic code includes both canonical and non-canonical information stored in the 64 classical codons. Moreover, the new coding system is robust to point mutations and minimizes the possibility of reversion from the new to the old information. In order to find such codes, we applied graph theory to analyze the properties of optimal codon sets. We present a formal procedure for finding the optimal codes with various numbers of vacant codons that could be assigned to new amino acids. Finally, we discuss the optimal number of newly incorporated ncAAs and also the optimal size of the codon groups that can be assigned to ncAAs.
