Similar literature
 20 similar articles found; search took 62 ms
1.
All established methods for detecting positive selection at the molecular level rely on comparisons between nucleotide sequences. An exceptional method that purports to detect selection on the basis of a single genomic sequence has recently been proposed. This method uses a measure called "codon volatility," defined for each codon as the ratio between the number of nonsynonymous codons that differ from the codon under study at a single nucleotide position and the number of sense codons that differ from the codon under study at a single nucleotide position. Here, we examine various properties of codon volatility and its derivatives and use simulation of evolutionary processes to determine whether they can be used to detect selective pressures. Codons for only four amino acids (glycine, leucine, arginine, and serine) show any variation in codon volatility. Thus, codon volatility is mainly a proxy for amino acid usage, rather than for codon usage, with 65% of all synonymous changes and 27% of all nonsynonymous changes being undetectable by this measure. Genes identified by the volatility method as being subject to positive selection tend to have idiosyncratic amino acid compositions (e.g., they are glycine rich or arginine poor). An additional property of codon volatility is the near zero variance of its mean expectation, which translates into overestimated statistical significance estimates, especially in the absence of corrections for multiple comparisons. A comparison with measures of selection inferred through comparative methodology reveals no relationship between the results of the two methods. Finally, we show that codon volatility can increase in the absence of positive Darwinian selection; that is, increased codon volatility is not indicative of positive selection.
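The volatility definition quoted in this abstract can be tried out directly. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `neighbours`, `volatility` and `variable_amino_acids` are illustrative and do not come from the paper.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbours(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def volatility(codon):
    """Nonsynonymous one-step neighbours divided by sense one-step neighbours."""
    aa = CODON_TABLE[codon]
    sense = [n for n in neighbours(codon) if CODON_TABLE[n] != "*"]
    nonsyn = [n for n in sense if CODON_TABLE[n] != aa]
    return Fraction(len(nonsyn), len(sense))


def variable_amino_acids():
    """Amino acids whose synonymous codons do not all share one volatility."""
    values = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            values.setdefault(aa, set()).add(volatility(codon))
    return {aa for aa, vols in values.items() if len(vols) > 1}


if __name__ == "__main__":
    print(volatility("GGT"), volatility("GGA"))
    print(sorted(variable_amino_acids()))
```

Under this definition the only amino acids returned by `variable_amino_acids()` are glycine, leucine, arginine and serine, which agrees with the statement in the abstract.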

2.
The codon table for the canonical genetic code can be rearranged in such a way that the code is divided into four quarters and two halves according to the variability of their GC and purine contents, respectively. For prokaryotic genomes, when the genomic GC content increases, their amino acid contents tend to be restricted to the GC-rich quarter and the purine-content insensitive half, where all codons are fourfold degenerate and relatively mutation-tolerant. Conversely, when the genomic GC content decreases, most of the codons retract to the AU-rich quarter and the purine-content sensitive half; most of the codons not only remain encoding physicochemically diversified amino acids but also vary when transversion (between purine and pyrimidine) happens. Amino acids with sixfold-degenerate codons are distributed into all four quarters and across the two halves; their fourfold-degenerate codons are all partitioned into the purine-insensitive half in favor of robustness against mutations. The features manifested in the rearranged codon table explain most of the intrinsic relationship between protein coding sequences (the informational content) and amino acid compositions (the functional content). The renovated codon table is useful in predicting abundant amino acids and positioning the amino acids with related or distinct physicochemical properties.

3.
Summary: Using many more cytochrome sequences than previously available, we have confirmed: 1, the eukaryotic cytochromes c diverged from a common ancestor; 2, the ancestral eukaryotic cytochrome c was not greatly different in character from those present today; 3, fixations are non-randomly distributed among the codons, there being evidence for at least four classes of variability; 4, there are similar classes of variability when the data are considered according to the nucleotide position within the codon; 5, the number of covarions (concomitantly variable codons) in mammalian cytochrome c genes is about 12 and the same value has been obtained for dicotyledonous plants as well; 6, all of the hyper- and most highly variable codons are for external residues, nearly 60 per cent of the invariable codons are for internal residues and nearly half of the codons for internal residues are invariable; 7, the first nucleotide position of a codon is more likely and the second position less likely to fix mutations than would be expected on the basis of the number of ways that alternative amino acids can be reached; 8, the character of nucleotide replacements is enormously non-random, with GA interchanges representing 42% of those observed in the first nucleotide position, but the observation does not stem from a bias in the DNA strand receiving the mutation, nor from the presence of a compositional equilibrium, nor from a bias in the frequency with which different nucleotides mutate, but rather from a bias in the acceptability of an alternative nucleotide as circumscribed by the functional acceptability of the new amino acid encoded; and 9, the unit evolutionary period is approximately 150 million years/observable (amino acid changing) nucleotide replacement/cytochrome c covarion in two diverging lines. Wherever non-randomness has been observed, it has always been consistent with the consideration that an alternative amino acid at any location is more likely to be acceptable the more closely it resembles the present amino acid in its physico-chemical properties. Finally, in no case did the a priori assumption of a biologically realistic phylogeny lead to any observations or conclusions that were in any way significantly different from those obtained when the phylogeny was based solely upon the sequences, proving that the earlier results were not a consequence of some internal circularity.

4.
A periodic table of codons has been designed where the codons are in regular locations. The table has four fields (16 places in each), one with each of the four nucleotides (A, U, G, C) in the central codon position. Thus, AAA (lysine), UUU (phenylalanine), GGG (glycine), and CCC (proline) were placed into the corners of the fields as the main codons (and amino acids) of the fields. They were connected to each other by six axes. The resulting nucleic acid periodic table showed perfect axial symmetry for codons. The corresponding amino acid table also displayed periodicity regarding the biochemical properties (charge and hydropathy) of the 20 amino acids and the position of the stop signals. The table emphasizes the importance of the central nucleotide in the codons and predicts that purines control the charge while pyrimidines determine the polarity of the amino acids. This prediction was experimentally tested.

5.
Summary: The mRNA sequences of beta hemoglobin for human, mouse and rabbit were examined. Observations included the following: (1) there is a significant bias against the use of codons only one nucleotide different from terminating codons; (2) less than 4% of the codons end in adenine; (3) guanine is the most common third position nucleotide but it never follows a second position cytosine; (4) nearest neighbor (doublet) nucleotides are non-random with the greatest contributor to non-randomness being the third position, suggesting that codon choice for a given amino acid rather than a choice among amino acids is the more important contributor; (5) the CG dinucleotide is even rarer in positions other than the first and second of the codon than it is in those two, suggesting that the need for arginine has in fact elevated the CG frequency in those positions; (6) 77 per cent of the nucleotides are unsubstituted among these three taxa, which could be a sampling effect, but there is strong evidence that about one-third of them are in fact unsubstitutable because of selective constraints; (7) the two longest stretches of unsubstituted nucleotides (32 and 35 consecutive nucleotides) surround the points of the two non-coding insertion sequences; (8) over half the substitutions occur in the third nucleotide position of the codons; (9) silent (non-amino acid changing) substitutions occur at about four times the rate of non-silent substitutions on the basis of their relative opportunity to occur; (10) silent substitutions occur slightly but significantly more often in codons that also have non-silent substitutions than independence of the two events would predict; (11) substitutions occur in adjacent nucleotides significantly more often than chance would predict; (12) among four-fold degenerate codons, third position transitions (principally cytosine-uracil interchanges) outnumber transversions by two to one although the reverse ratio would be expected. The analysis of these messengers provided an opportunity to evaluate the random evolutionary hit (REH) theory. I observed that: (1) the REH theory is premised upon five assumptions, all false; (2) the theory leads to contradictory estimates of the number of varions; (3) the REH values are underestimates; (4) the REH values frequently violate the triangle inequality; (5) the REH values, contrary to claim, are not concordant either with accepted point mutations (PAMs) or augmented distances; (6) the REH values are more likely than values uncorrected for multiple substitutions to give incorrect phylogenies; and (7) the REH values have statistical problems probably associated with a large variance in its fundamental parameter, re. From this I conclude that REH theory is not suitable for its intended purpose of estimating from protein sequences the number of nucleotide substitutions since the common ancestor of two gene products.
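Observation (12) in this abstract rests on a simple count: each base has one transition partner and two transversion partners, so equal rates for every base change would give a 1:2 transition-to-transversion ratio. The sketch below illustrates only that arithmetic; the function names are illustrative and nothing here is taken from the paper's own analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def is_transition(a, b):
    """True when a and b are different bases of the same chemical class."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def expected_counts():
    """Classify the 12 ordered base changes as transitions or transversions."""
    bases = sorted(PURINES | PYRIMIDINES)
    changes = [(a, b) for a in bases for b in bases if a != b]
    transitions = sum(is_transition(a, b) for a, b in changes)
    transversions = len(changes) - transitions
    return transitions, transversions


if __name__ == "__main__":
    ts, tv = expected_counts()
    print(f"transitions: {ts}, transversions: {tv}")
```

An observed 2:1 excess of transitions, as reported for the four-fold degenerate third positions, is therefore the reverse of this equal-rate expectation.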

6.
In the analysis of protein-coding nucleotide sequences, the ratio of the number of nonsynonymous substitutions to that of synonymous substitutions (d(N)/d(S)) is used as an indicator for the direction and magnitude of natural selection operating at the amino acid sequence level. The d(S) and d(N) values are estimated based on the comparison of homologous codons, which are often identified by converting (reverse-translating) aligned amino acid sequences into codon sequences. In this method, however, homologous codons may be mis-identified when frame-shifts occurred or amino acid sequences were mis-aligned, which may lead to overestimation of the d(N)/d(S) ratio. Here the effect of reverse-translating aligned amino acid sequences on the estimation of d(N)/d(S) ratio was examined through a large-scale analysis of protein-coding nucleotide sequences from vertebrate species. Apparently, 1-9% of codon sites that were identified as homologous with reverse-translation contained non-homologous codons, where the d(N)/d(S) ratio was unduly high. By correcting the d(N)/d(S) ratio for these codon sites, it was inferred that the ratio was 5-43% overestimated with reverse-translation. These results suggest that caution should be exercised in the study of natural selection using the d(N)/d(S) ratio by reverse-translating aligned amino acid sequences.
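The reverse-translation step described in this abstract can be sketched as threading each unaligned coding sequence onto its aligned protein sequence, three nucleotides per residue and three gap characters per alignment gap. This is a simplified illustration, not the pipeline used in the study; the function name `reverse_translate` is hypothetical, and the sketch deliberately does not check that each codon actually encodes the residue it is paired with.

```python
def reverse_translate(aligned_protein, cds, gap="-"):
    """Thread an unaligned coding sequence onto an aligned protein sequence.

    Each residue consumes the next three nucleotides of `cds`;
    each gap in the protein alignment becomes three gap characters.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must be three times the residue count")
    codons = []
    i = 0
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[i:i + 3])
            i += 3
    return "".join(codons)


if __name__ == "__main__":
    print(reverse_translate("M-K", "ATGAAA"))
```

Because codons are paired purely by position, any misaligned residue column carries a non-homologous codon pair into the codon alignment, which is the source of error the abstract examines.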

7.
A study of the basic nutrient requirements of poultry allowed the development of a scheme for obtaining feed polypeptides (“polypeptide cassettes”) enriched with the L-amino acids necessary for avian metabolism. The amino acid and nucleotide profiles of about 500 sequences of thermostable proteins from plants and archaea were studied, and candidate sequences were selected on that basis. The amino acid and domain composition of the thermostable polypeptides was optimized in silico. A library of genetically engineered constructs encoding optimized polypeptides with the required composition of L-amino acids essential for poultry was created. Primary E. coli producer strains were obtained, and the expression and thermostability of the target polypeptides were studied.

8.
M A Soto  C J Tohá 《Bio Systems》1985,18(2):209-215
A quantitative rationale for the evolution of the genetic code is developed considering the principle of minimal hardware. This principle defines an optimal code as one that minimizes, for a given amount of information encoded, the product of the number of physical devices used and the average complexity of each device. By identifying the number of different amino acids, number of nucleotide positions per codon and number of base types that can occupy each such position with, respectively, the amount of information, number of devices and the complexity, we show that optimal codes occur for 3, 7 and 20 amino acids with codons having a single, two and three base positions per codon, respectively. The advantage of a code of exactly 4 symbols is deduced, as well as a plausible evolutionary pathway from a code of doublets to triplets. The present day code of 20 amino acids encoded by 64 codons is shown to be the most optimal in an absolute sense. Using a tetraplet code, further evolution to a code in which there would be 55 amino acids is in principle possible, but such a code would deviate slightly more than the present day code from the minimal hardware configuration. The change from a triplet code to a tetraplet code would occur at about 32 amino acids. Our conclusions are independent of, but consistent with, the observed physico-chemical properties of the amino acids and codon structures. These correlations could have evolved within the constraints imposed by the minimal hardware principle.

9.
The nucleotide frequencies 5' and 3' to the sense codons in highly and weakly expressed genes have been investigated by the chi-square method. A comparison between the experimental and computer-generated random nucleotide sequences (in which each codon is substituted by a random synonymous one) was made. It was shown that the choice of a particular codon among the synonymous ones in a given position of the gene depends on the three nucleotides 3' and 5' adjacent to the codon in highly expressed genes (the triplet 3' and a single nucleotide 5' to the codons in weakly expressed genes). Concrete patterns for the preferable choice of synonymous codons depending on their contexts are presented. It is suggested that these constraints are related to the efficiency of messenger translation. The constraints on the amino acid sequences of encoded proteins also lead to statistically significant biases in nucleotide frequencies around the sense codons. The biological role of these constraints is discussed.

10.
《BBA》2022,1863(8):148597
The origin of the genetic code is an abiding mystery in biology. Hints of a ‘code within the codons’ suggest biophysical interactions, but these patterns have resisted interpretation. Here, we present a new framework, grounded in the autotrophic growth of protocells from CO2 and H2. Recent work suggests that the universal core of metabolism recapitulates a thermodynamically favoured protometabolism right up to nucleotide synthesis. Considering the genetic code in relation to an extended protometabolism allows us to predict most codon assignments. We show that the first letter of the codon corresponds to the distance from CO2 fixation, with amino acids encoded by the purines (G followed by A) being closest to CO2 fixation. These associations suggest a purine-rich early metabolism with a restricted pool of amino acids. The second position of the anticodon corresponds to the hydrophobicity of the amino acid encoded. We combine multiple measures of hydrophobicity to show that this correlation holds strongly for early amino acids but is weaker for later species. Finally, we demonstrate that redundancy at the third position is not randomly distributed around the code: non-redundant amino acids can be assigned based on size, specifically length. We attribute this to additional stereochemical interactions at the anticodon. These rules imply an iterative expansion of the genetic code over time with codon assignments depending on both distance from CO2 and biophysical interactions between nucleotide sequences and amino acids. In this way the earliest RNA polymers could produce non-random peptide sequences with selectable functions in autotrophic protocells.

11.
Amino acid substitution plays a vital role in both the molecular engineering of proteins and analysis of structure-activity relationships. High-throughput substitution is achieved by codon randomisation, which generates a library of mutants (a randomised gene library) in a single experiment. For full randomisation, key codons are typically replaced with NNN (64 sequences) or NN(G/C) or NN(G/T) (32 sequences). This obligates cloning of redundant codons alongside those required to encode the 20 amino acids. As the number of randomised codons increases, there is therefore a progressive loss of randomisation efficiency; the number of genes required per protein rises exponentially. The redundant codons cause amino acids to be represented unevenly; for example, methionine is encoded just once within NNN, whilst arginine is encoded six times. Finally, the organisation of the genetic code makes it impossible to encode functional subsets of amino acids (e.g. polar residues only) in a single experiment. Here, we present a novel solution to randomisation where genetic redundancy is eliminated; the number of different genes equals the number of encoded proteins, regardless of codon number. There is no inherent amino acid bias and any required subset of amino acids may be encoded in one experiment. This generic approach should be widely applicable in studies involving randomisation of proteins.
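The redundancy figures quoted in this abstract (64 versus 32 sequences, methionine once, arginine six times) can be reproduced by expanding the degenerate schemes against the standard genetic code. The sketch below is an illustration only: it assumes the IUPAC codes N (any base), K (G or T) and S (C or G), and the helper names `expand`, `residue_counts` and `library_size` are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate base codes used by the randomisation schemes.
IUPAC = {"N": "ACGT", "K": "GT", "S": "CG"}


def expand(scheme):
    """List every codon covered by a three-letter degenerate scheme."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]


def residue_counts(scheme):
    """Count how many codons in the scheme encode each residue or stop."""
    return Counter(CODON_TABLE[c] for c in expand(scheme))


def library_size(scheme, positions):
    """Number of distinct DNA variants when `positions` codons are randomised."""
    return len(expand(scheme)) ** positions


if __name__ == "__main__":
    for scheme in ("NNN", "NNK", "NNS"):
        counts = residue_counts(scheme)
        print(scheme, len(expand(scheme)), counts["M"], counts["R"], counts["*"])
```

Comparing `library_size("NNN", n)` with `20 ** n` for growing `n` shows how quickly the number of genes outpaces the number of distinct proteins.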

12.
Herein, we rigorously develop novel 3-dimensional algebraic models called Genetic Hotels of the Standard Genetic Code (SGC). We start by considering the primeval RNA genetic code, which consists of the 16 codons of type RNY (purine-any base-pyrimidine). Using simple algebraic operations, we show how the RNA code could have evolved toward the current SGC via two different intermediate evolutionary stages called Extended RNA code type I and II. By rotations or translations of the subset RNY, we arrive at the SGC via the former (type I) or via the latter (type II), respectively. Biologically, the Extended RNA code type I consists of all codons of the type RNY plus codons obtained by considering the RNA code but in the second (NYR type) and third (YRN type) reading frames. The Extended RNA code type II comprises all codons of the type RNY plus codons that arise from transversions of the RNA code in the first (YNY type) and third (RNR) nucleotide bases. Since the dimensions of remarkable subsets of the Genetic Hotels are not necessarily integer numbers, we also introduce the concept of algebraic fractal dimension. A general decoding function which maps each codon to its corresponding amino acid or the stop signals is also derived. The Phenotypic Hotel of amino acids is also illustrated. The proposed evolutionary paths are discussed in terms of the existing theories of the evolution of the SGC. The adoption of 3-dimensional models of the Genetic and Phenotypic Hotels will facilitate the understanding of the biological properties of the SGC.
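The codon subsets named in this abstract can be enumerated from their patterns alone. The sketch below builds the RNY set and the two extended sets exactly as the abstract words them (type I: RNY plus NYR plus YRN; type II: RNY plus YNY plus RNR). It is a plain set enumeration, not the algebraic construction of the paper, and the function names are illustrative.

```python
from itertools import product

# Base classes: R = purine, Y = pyrimidine, N = any base.
CLASSES = {"R": "AG", "Y": "CU", "N": "ACGU"}


def codons_of_type(pattern):
    """Return the set of codons matching a three-letter pattern such as 'RNY'."""
    return {"".join(c) for c in product(*(CLASSES[p] for p in pattern))}


def extended_code(patterns):
    """Union of the codon sets for several patterns."""
    result = set()
    for pattern in patterns:
        result |= codons_of_type(pattern)
    return result


if __name__ == "__main__":
    rny = codons_of_type("RNY")
    type_one = extended_code(["RNY", "NYR", "YRN"])
    type_two = extended_code(["RNY", "YNY", "RNR"])
    print(len(rny), len(type_one), len(type_two))
    print(sorted(rny))
```

In both unions the three patterns are mutually exclusive at some codon position, so each extended set is a disjoint union of three 16-codon subsets.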

13.
RNA-ligand chemistry: a testable source for the genetic code (cited 5 times in total; 3 self-citations, 2 citations by others)
In the genetic code, triplet codons and amino acids can be shown to be related by chemical principles. Such chemical regularities could be created either during the code's origin or during later evolution. One such chemical principle can now be shown experimentally. Natural or particularly selected RNA binding sites for at least three disparate amino acids (arginine, isoleucine, and tyrosine) are enriched in codons for the cognate amino acid. Currently, in 517 total nucleotides, binding sites contain 2.4-fold more codon sequences than surrounding nucleotides. The aggregate probability of this enrichment is 10^-7 to 10^-8, had codons and binding site sequences been independent. Thus, at least some primordial coding assignments appear to have exploited triplets from amino acid binding sites as codons.

14.
Selection Intensity for Codon Bias (cited 26 times in total; 7 self-citations, 19 citations by others)
D. L. Hartl  E. N. Moriyama    S. A. Sawyer 《Genetics》1994,138(1):227-234
The patterns of nonrandom usage of synonymous codons (codon bias) in enteric bacteria were analyzed. Poisson random field (PRF) theory was used to derive the expected distribution of frequencies of nucleotides differing from the ancestral state at aligned sites in a set of DNA sequences. This distribution was applied to synonymous nucleotide polymorphisms and amino acid polymorphisms in the gnd and putP genes of Escherichia coli. For the gnd gene, the average intensity of selection against disfavored synonymous codons was estimated as approximately 7.3 × 10^-9; this value is significantly smaller than the estimated selection intensity against selectively disfavored amino acids in observed polymorphisms (2.0 × 10^-8), but it is approximately of the same order of magnitude. The selection coefficients for optimal synonymous codons estimated from PRF theory were consistent with independent estimates based on codon usage for threonine and glycine. Across 118 genes in E. coli and Salmonella typhimurium, the distribution of estimated selection coefficients, expressed as multiples of the effective population size, has a mean and standard deviation of 0.5 +/- 0.4. No significant differences were found in the degree of codon bias between conserved positions and replacement positions, suggesting that translational misincorporation is not an important selective constraint among synonymous polymorphic codons in enteric bacteria. However, across the first 100 codons of the genes, conserved amino acids with identical codons have significantly greater codon bias than that of either synonymous or nonidentical codons, suggesting that there are unique selective constraints, perhaps including mRNA secondary structures, in this part of the coding region.

15.

Background

In plant organelles, specific messenger RNAs (mRNAs) are subjected to conversion editing, a process that often converts the first or second nucleotide of a codon and hence the encoded amino acid. No systematic patterns in converted sites were found on mRNAs, and the converted sites rarely encoded residues located at the active sites of proteins. The role and origin of RNA editing in plant organelles remain to be elucidated.

Results

Here we study the relationship between amino acid residues encoded by edited codons and the structural characteristics of these residues within proteins, e.g., in protein-protein interfaces, elements of secondary structure, or protein structural cores. We find that the residues encoded by edited codons are significantly biased toward involvement in helices and protein structural cores. RNA editing can convert codons for hydrophilic to hydrophobic amino acids. Hence, only the edited form of an mRNA can be translated into a polypeptide with helix-preferring and core-forming residues at the appropriate positions, which is often required for a protein to form a functional three-dimensional (3D) structure.

Conclusion

We have performed a novel analysis of the location of residues affected by RNA editing in proteins in plant organelles. This study documents that RNA editing sites are often found in positions important for 3D structure formation. Without RNA editing, protein folding will not occur properly, thus affecting gene expression. We suggest that RNA editing may have conferred an evolutionary advantage by acting as a mechanism to reduce susceptibility to DNA damage, allowing the GC content of DNA to increase while maintaining the RNA codons essential to encode residues required for protein folding and activity.

16.
Tetrahymena thermophila and Paramecium tetraurelia are ciliates that reassign TAA and TAG from stop codons to glutamine codons. Because of the lack of full genome sequences, few studies have concentrated on analyzing the effects of codon reassignment in protein evolution. We used the recently sequenced genomes of these species to analyze the patterns of amino acid substitution in ciliates that reassign the code. We show that, as expected, the codon reassignment has a large impact on amino acid substitutions in closely related proteins; however, contrary to expectations, these effects also hold for very diverged proteins. Previous studies have used amino acid substitution data to calculate the minimization of the genetic code; our results show that because of the lasting influence of the code in the patterns of substitution, such studies are tautological. These different substitution patterns might affect alignment of ciliate proteins, as alignment programs use scoring matrices based on substitution patterns of organisms that use the standard code. We also show that glutamine is used more frequently in ciliates than in other species, as often as expected based on the presence of the 2 new reassigned codons, indicating that the frequencies of amino acids in proteomes are mostly determined by neutral processes based on their number of codons.

17.
X Liu  H Liu  W Guo  K Yu 《Gene》2012,509(1):136-141
Codon models are now widely used to draw evolutionary inferences from alignments of homologous sequence data. Incorporating physicochemical properties of amino acids into codon models, two novel codon substitution models describing the evolution of protein-coding DNA sequences are presented based on the similarity scores of amino acids. To describe substitutions between codons, a continuous-time Markov process is used. Transition/transversion rate bias and nonsynonymous codon usage bias are allowed in the models. In our implementation, the parameters are estimated by the maximum-likelihood (ML) method as in previous studies. Furthermore, instantaneous mutations involving more than one nucleotide position of a codon are considered in the second model. The two suggested models are then applied to five real data sets. The results indicate that the new codon models considering physicochemical properties of amino acids fit the data better than existing codon models and produce more reliable estimates of certain biologically important measures than existing methods.

18.
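The assignment of genetic codons shows a puzzling polymorphism: the number of codons per amino acid takes every value from one to six except five. This feature indicates considerable heterogeneity among codons in both translational behavior and evolutionary trajectory. The convergent connotation of the term "degeneracy" therefore fails to capture the evolutionary meaning of this polymorphism. AUG (Met) and UGG (Trp), which have no synonymous codons, are not degenerate at all. The remaining codons fall into two broad classes. In the first, four synonymous codons form a group that shares the same first and second bases and follows a "two-out-of-three" reading rule. The four synonymous codons in such a group are merely relics of a single two-letter primitive codon (XYN), and in this sense they should not be regarded as degenerate either. In the second class, mainly two synonymous codons form a group and follow a "three-out-of-three" reading rule. They arose when the undetermined third base N of an ambiguous primitive codon encoding two amino acids was further specified. The particularly puzzling groups with six synonymous codons are in fact 4 + 2, which suggests that they derive from the two classes above. The polymorphism of the genetic code may have originated at the earliest stage, in interactions between amino acids and the initial doublet of certain oligonucleotides, and been completed with the differentiation of the third base of all ambiguous primitive codons. This evolutionary trajectory is blurred by the traditional term "degeneracy", which obscures both the solid evidence needed to assess the credibility of the relevant theories and the basis on which differing views could reach consensus. This may be why, in the field of the origin of the genetic code, there have long been many…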
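The counts discussed in this abstract (codon numbers of 1, 2, 3, 4 and 6 per amino acid, and the "4 + 2" make-up of the six-codon groups) can be checked by grouping the standard code by the first two bases of each codon. This is a minimal sketch assuming the standard genetic code written with U; the function names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G ("*" marks a stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def boxes_by_amino_acid():
    """Map each amino acid to {first two bases: number of its codons there}."""
    boxes = defaultdict(lambda: defaultdict(int))
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            boxes[aa][codon[:2]] += 1
    return {aa: dict(prefixes) for aa, prefixes in boxes.items()}


def degeneracy_classes():
    """Map each codon count to the sorted amino acids having that many codons."""
    classes = defaultdict(list)
    for aa, prefixes in boxes_by_amino_acid().items():
        classes[sum(prefixes.values())].append(aa)
    return {n: sorted(aas) for n, aas in sorted(classes.items())}


if __name__ == "__main__":
    print(degeneracy_classes())
    for aa in "LSR":
        print(aa, boxes_by_amino_acid()[aa])
```

For leucine, serine and arginine the grouping returns one prefix with four codons and a second prefix with two, which is the 4 + 2 split the abstract refers to.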

19.
Kamatani T  Yamamoto T 《Bio Systems》2007,90(2):362-370
To gain insight into the nature of the mitochondrial genomes (mtDNA) of different Candida species, the synonymous codon usage bias of mitochondrial protein coding genes and the tRNAs in C. albicans, C. parapsilosis, C. stellata, C. glabrata and the closely related yeast Saccharomyces cerevisiae were analyzed. Common features of the mtDNA in Candida species are a strong A+T pressure on protein coding genes and an insufficient number of encoded mitochondrial tRNA species to perform protein synthesis. The wobble site of the anticodon is always U for the NNR (NNA and NNG) codon families, which are dominated by A-ending codons, always G for the NNY (NNC and NNU) codon families, which are dominated by U-ending codons, and always U for the NNN (NNA, NNU, NNC and NNG) codon families, which are dominated by A-ending codons and U-ending codons. Patterns of synonymous codon usage of Candida species can be classified into three groups: (1) Optimal codon-anticodon usage: Glu, Lys, Leu (translated by anti-codon UAA), Gln, Arg (translated by anti-codon UCU) and Trp contain NNR codons. NNA, whose corresponding tRNA is encoded in the mtDNA, is used preferentially. (2) Non-optimal codon-anticodon usage: Cys, Asp, Phe, His, Asn, Ser (translated by anti-codon GCU) and Tyr contain NNY codons. The NNU codon, whose corresponding tRNA is not encoded in the mtDNA, is used preferentially. (3) Combined codon-anticodon usage: Ala, Gly, Leu (translated by anti-codon UAG), Pro, Ser (translated by anti-codon UGA), Thr and Val contain NNN codons. NNA (tRNA encoded in the mtDNA) and NNU (tRNA not encoded in the mtDNA) are used preferentially. In conclusion, we propose that in Candida species, codons containing A or U at the third position are used preferentially, regardless of whether the corresponding tRNAs are encoded in the mtDNA. These results might be useful in understanding the common features of the mtDNA in Candida species and patterns of synonymous codon usage.

20.
We propose that glycine was the first amino acid to be incorporated into the genetic code, followed by serine, aspartic and/or glutamic acid: small hydrophilic amino acids that all have codons in the bottom right-hand corner of the standard genetic code table. Because primordial ribosomal synthesis is presumed to have been rudimentary, this stage would have been characterized by the synthesis of short, water-soluble peptides, the first of which would have comprised polyglycine. Evolution of the code is proposed to have occurred by the duplication and mutation of tRNA sequences, which produced a radiation of codon assignment outwards from the bottom right-hand corner. As a result of this expansion, we propose a trend from small hydrophilic to hydrophobic amino acids, with selection for longer polypeptides requiring a hydrophobic core for folding and stability driving the incorporation of hydrophobic amino acids into the code.
