Similar articles
 20 similar articles found (search time: 34 ms)
1.
The entropy of the amino acid sequences coded by DNA is considered as a measure of the diversity or variety of proteins, and is taken as a measure of evolution. The DNA or mRNA sequence is considered as a stationary second-order Markov chain composed of four kinds of bases. Because of the biased nature of the genetic code table, an increase in the entropy of amino acid sequences is possible with a biased nucleotide sequence. Thus the biased DNA base composition and the extreme rarity of the base doublet CpG in higher organisms are explained. It is expected that the amino acid composition was highly biased at the time the genetic code table originated, and that the more frequent amino acids have tended to become rarer, and the rarer ones more frequent. This tendency is observed in the evolution of hemoglobin, cytochrome C, fibrinopeptide, immunoglobulin and lysozyme, and of protein as a whole.

2.
3.
4.
In the RNA world, RNA is assumed to be the dominant macromolecule performing most, if not all, core "house-keeping" functions. The ribo-cell hypothesis suggests that the genetic code and the translation machinery may both have been born of the RNA world, and that DNA, once introduced into ribo-cells, may have gradually taken over the informational role of RNA, providing a mature genetic code and a mechanism enabling stable inheritance of sequence and its variation. In this context, we modeled the genetic code in two content variables (GC and purine contents) of protein-coding sequences and measured the purine content sensitivities for each codon when the sensitivity (% usage) is plotted as a function of GC content variation. The analysis leads to a new pattern, the symmetric pattern, in which the sensitivity of purine content variation shows diagonal symmetry in the codon table, more significantly in the two GC-content-invariable quarters. This pattern is in addition to the two existing patterns, in which the table is divided into either four GC content sensitivity quarters or two amino acid diversity halves. The most insensitive codon sets are GUN (valine) and CAN (CAR for asparagine and CAY for aspartic acid) and the most biased amino acid is valine (always over-estimated) followed by alanine (always under-estimated). The unique position of valine and its codons suggests its key role in the final recruitment of the complete codon set of the canonical table. The distinct choice may only be attributable to sequence signatures or signals of splice sites for spliceosomal introns shared by all extant eukaryotes.

5.
Statistical and biochemical studies have revealed non-random patterns in codon assignments. The canonical genetic code is known to be highly efficient in minimizing the effects of mistranslation errors and point mutations: when an amino acid is converted to another due to error, the biochemical properties of the resulting amino acid are usually very similar to those of the original one. In this study, using altered forms of the fitness functions used in prior studies, we have optimized the parameters involved in calculating the error-minimizing property of the genetic code so that the genetic code outscores random codes as much as possible. This work also compares two prominent matrices, the Mutation Matrix and Point Accepted Mutations 74-100 (PAM(74-100)). The results show that the hypothetical properties of the coevolution theory of the genetic code are already incorporated in PAM(74-100), giving more evidence of a bias towards the genetic code in this matrix. Furthermore, our results indicate that PAM(74-100) is biased towards single-base mistranslation occurrences in the second codon position as well as towards the frequency of amino acids. Thus PAM(74-100) is not a suitable substitution matrix for studies of the evolution of the genetic code.

6.
The Shannon information entropy of protein sequences.   Total citations: 6 (self-citations: 1, citations by others: 5)
A comprehensive database is analyzed to determine the Shannon information content of a protein sequence. This information entropy is estimated by three methods: a k-tuplet analysis, a generalized Zipf analysis, and a "Chou-Fasman gambler." The k-tuplet analysis is a "letter" analysis, based on conditional sequence probabilities. The generalized Zipf analysis demonstrates the statistical linguistic qualities of protein sequences and uses the "word" frequency to determine the Shannon entropy. The Zipf analysis and k-tuplet analysis give Shannon entropies of approximately 2.5 bits/amino acid. This entropy is much smaller than the value of 4.18 bits/amino acid obtained from the nonuniform composition of amino acids in proteins. The "Chou-Fasman gambler" is an algorithm based on the Chou-Fasman rules for protein structure. It uses both sequence and secondary structure information to guess at the number of possible amino acids that could appropriately substitute into a sequence. As is the case for the English language, the gambler algorithm gives significantly lower entropies than the k-tuplet analysis. Using these entropies, the number of most probable protein sequences can be calculated. The number of most probable protein sequences is much less than the number of possible sequences but is still much larger than the number of sequences thought to have existed throughout evolution. Implications of these results for mutagenesis experiments are discussed.
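The composition-based figure quoted in this abstract is a zeroth-order estimate: it uses only how often each residue occurs and ignores neighbouring residues. The following is a minimal Python sketch of that calculation, not the authors' method; the function name `composition_entropy` and the toy input are assumptions made for illustration, and reproducing the reported 4.18 bits would require the database the authors analyzed.

```python
import math
from collections import Counter

def composition_entropy(sequence):
    """Zeroth-order Shannon entropy in bits per symbol, from symbol frequencies only."""
    counts = Counter(sequence)
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the symbols that actually occur
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical toy input: the 20 standard amino acids, each used exactly once.
# Equal frequencies give the upper bound log2(20).
uniform = "ACDEFGHIKLMNPQRSTVWY"
print(round(composition_entropy(uniform), 2))  # prints 4.32
```

Any departure from equal frequencies lowers the value below log2(20), which is why a nonuniform amino acid composition yields a figure under 4.32 bits.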

7.
The genetic code provides the translation table necessary to transform the information contained in DNA into the language of proteins. In this table, a correspondence between each codon and each amino acid is established: tRNA is the main adaptor that links the two. Although the genetic code is nearly universal, several variants of this code have been described in a wide range of nuclear and organellar systems, especially in metazoan mitochondria. These variants are generally found by searching for conserved positions that consistently code for a specific alternative amino acid in a new species. We have devised an accurate computational method to automate these comparisons, and have tested it with 626 metazoan mitochondrial genomes. Our results indicate that several arthropods have a new genetic code and translate the codon AGG as lysine instead of serine (as in the invertebrate mitochondrial genetic code) or arginine (as in the standard genetic code). We have investigated the evolution of the genetic code in the arthropods and found several events of parallel evolution in which the AGG codon was reassigned between serine and lysine. Our analyses also revealed correlated evolution between the arthropod genetic codes and the tRNA-Lys/-Ser, which show specific point mutations at the anticodons. These rather simple mutations, together with a low usage of the AGG codon, might explain the recurrence of the AGG reassignments.

8.
9.
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides (adenine, thymine, guanine and cytosine) according to their emergence in evolution, and apply the organizational rules to devise an algebraic representation of the canonical genetic code. Within the framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with the code's intrinsic structure and classification schemes as well as with amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes are closely correlated with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is highly useful for deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.

10.
The distinctive amino acid compositions of protein exteriors and interiors were compared to the composition bias imposed by genetic code redundancy. It transpired that the synonym allocation is biased more in favour of those residues which are preferred in interiors, and this leads to an average interior residue being more probable and less mutable compared to an exterior residue. The general implications for protein evolution are discussed in association with the known evolutionary behavior of particular protein families. It is suggested that some proteins may have their structural history "fossilised" in their interiors and that the "amino acid" code is in reality a "protein" code.

11.
The yeast tcm1 gene, which codes for ribosomal protein L3, has been isolated by using recombinant DNA and genetic complementation. The DNA fragment carrying this gene has been subcloned and we have determined its DNA sequence. The 20 amino acid residues at the amino terminus as inferred from the nucleotide sequence agreed exactly with the amino acid sequence data. The amino acid composition of the encoded protein agreed with that determined for purified ribosomal protein L3. Codon usage in the tcm1 gene was strongly biased in the direction found for several other abundant Saccharomyces cerevisiae proteins. The tcm1 gene has no introns, which appears to be atypical of ribosomal protein structural genes.

12.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

13.
V. V. Sukhodolets. Genetika, 1986, 22(11): 2551-2559
Literature data on the significance of genetic recombination and crossing over are considered. An obvious result of recombination is the production of genotypically diverse offspring, but the main role of recombination consists of combining the genes from diverging subspecies and races, thus maintaining a rather wide ecological potential of a species. This effect of recombination substantiates the tendency towards increasing complexity of organic forms in progressive evolution. Accordingly, evolution is considered as a chain of recombinational "syntheses". The literature data treating crossing over as a mechanism of DNA repair are discussed. This function of crossing over is interpreted on the basis of a notion, implied by the composition of the genetic code, that a crystalline associate composed of bases as free molecules preceded the appearance of DNA in evolution. The stability of the crystalline associate of bases was due to a "balanced" distribution of bases with respect to their electrochemical properties. The degeneracy of the genetic code seems to make it possible to construct electrostatically "balanced" base sequences in highly expressed bacterial genes. Crossing over possibly restores the "balanced" distribution of bases with respect to their electrochemical properties and thus "repairs" a high level of heterocatalytic DNA activity.

14.
Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.

15.
It is well known that sequences of bases in DNA are translated into sequences of amino acids in cells via the genetic code. More recently, it has been discovered that the sequence of DNA bases also influences the geometry and deformability of the DNA. These two correspondences represent a naturally arising example of duplexed codes, providing two different ways of interpreting the same DNA sequence. This paper will set up the notation and basic results necessary to mathematically investigate the relationship between these two natural DNA codes. It then undertakes two very different such investigations: one graphical approach based only on expected values and another analytic approach incorporating the deformability of the DNA molecule and approximating the mutual information of the two codes. Special emphasis is paid to whether there is evidence that pressure to maximize the duplexing efficiency influenced the evolution of the genetic code. Disappointingly, the results fail to support the hypothesis that the genetic code was influenced in this way. In fact, applying both methods to samples of realistic alternative genetic codes shows that the duplexing of the genetic code found in nature is just slightly less efficient than average. The implications of this negative result are considered in the final section of the paper.

16.
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.

17.
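Reverse transcriptase nucleotide sequences of Ty1-copia-like retrotransposons were amplified from the genomic DNA of eight chieh-qua (Benincasa hispida var. chieh-qua How) lines. The nucleotide sequences of 29 cloned products from line A39FA and their translated amino acid sequences were analyzed for phylogeny and homology, and the 29 amino acid sequences were aligned. Amplification showed that the genomic DNA of all eight lines contained a reverse transcriptase nucleotide fragment of about 260 bp. The 29 Ty1-copia-like retrotransposon reverse transcriptase nucleotide sequences obtained from line A39FA (CqRt1 to CqRt29) were 247 to 267 bp long with 46.2% to 98.1% homology, whereas their amino acid sequences shared 26.7% to 98.8% homology. Sequence analysis showed that the numbers of A, T, G and C bases in these nucleotide sequences were 65 to 96, 47 to 92, 45 to 74 and 32 to 49, respectively; all sequences were rich in A and T, with AT/GC ratios of 1.35 to 2.33. Deletion mutation was the main cause of the length differences among the sequences, and the marked differences in sequence length and base composition indicate that the chieh-qua Ty1-copia-like reverse transcriptase nucleotide sequences are highly heterogeneous. Among the translated amino acid sequences, 21 carried stop-codon mutations and 12 carried frameshift mutations, indicating that Ty1-copia-like retrotransposons are hotspots of sequence recombination in the chieh-qua genome. Cluster analysis divided the 29 reverse transcriptase nucleotide sequences into five families containing 16, 4, 4, 4 and 1 sequences, respectively. Family 1 may be a retrotransposon family with transpositional activity, but reverse transcriptase sequences with transcriptional activity accounted for only 20.69% of all sequences. Alignment of one or two sequences from each family with the Ty1-copia-like reverse transcriptase amino acid sequences of 15 other plant species showed relatively high homology. These results suggest that the Ty1-copia-like retrotransposons of chieh-qua and of other plants may share a common origin, and that Ty1-copia-like retrotransposons can be transmitted horizontally between different taxa.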
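The base counts and AT/GC ratio reported in this abstract are simple composition statistics. As a minimal Python sketch of how such a ratio could be computed (the function name `at_gc_ratio` and the toy sequence are assumptions for illustration; the CqRt sequences themselves are not given in the abstract):

```python
from collections import Counter

def at_gc_ratio(seq):
    """Return (A + T) / (G + C) for a DNA sequence, ignoring letter case."""
    counts = Counter(seq.upper())
    at = counts["A"] + counts["T"]
    gc = counts["G"] + counts["C"]
    if gc == 0:
        raise ValueError("sequence contains no G or C bases")
    return at / gc

# Hypothetical toy fragment, not one of the CqRt sequences: 4 A/T bases, 2 G/C bases
print(at_gc_ratio("ATATGC"))  # prints 2.0
```

A ratio above 1 means the sequence is AT-rich, which is the situation the abstract describes for all 29 sequences.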

18.
The genetic code relates nucleotide sequence to amino acid sequence and is shared across all organisms, with the rare exceptions of lineages in which one or a few codons have acquired novel assignments. Recoding of UGA from stop to tryptophan has evolved independently in certain reduced bacterial genomes, including those of the mycoplasmas and some mitochondria. Small genomes typically exhibit low guanine plus cytosine (GC) content, and this bias in base composition has been proposed to drive UGA Stop to Tryptophan (Stop→Trp) recoding. Using a combination of genome sequencing and high-throughput proteomics, we show that an α-Proteobacterial symbiont of cicadas has the unprecedented combination of an extremely small genome (144 kb), a GC-biased base composition (58.4%), and a coding reassignment of UGA Stop→Trp. Although it is not clear why this tiny genome lacks the low GC content typical of other small bacterial genomes, these observations support a role of genome reduction rather than base composition as a driver of codon reassignment.

20.
Fifty years have passed since the genetic code was deciphered, but how the genetic code came into being has not been satisfactorily addressed. It is now widely accepted that the earliest genetic code did not encode all 20 amino acids found in the universal genetic code, as some amino acids have complex biosynthetic pathways and likely were not available from the environment. Therefore, the genetic code evolved as pathways for synthesis of new amino acids became available. One hypothesis proposes that early in the evolution of the genetic code four amino acids (valine, alanine, aspartic acid, and glycine) were coded by GNC codons (N = any base), with the remaining codons being nonsense codons. The other sixteen amino acids were subsequently added to the genetic code by changing nonsense codons into sense codons for these amino acids. Improvement in protein function is presumed to be the driving force behind the evolution of the code, but how improved function was achieved by adding amino acids has not been examined. Based on an analysis of amino acid function in proteins, an evolutionary mechanism for expansion of the genetic code is described in which individual coded amino acids were replaced by new amino acids that used nonsense codons differing by one base change from the sense codons previously used. The improved or altered protein function afforded by the changes in amino acid function provided the selective advantage underlying the expansion of the genetic code. Analysis of amino acid properties and functions explains why amino acids are found in their respective positions in the genetic code.
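The GNC hypothesis described in this abstract can be made concrete with a small lookup table. The sketch below, in Python, is only an illustration under stated assumptions: the four assignments are those of the standard genetic code for GUC, GCC, GAC and GGC, while the names `GNC_CODE` and `translate_gnc` and the example RNA string are hypothetical and do not come from the paper.

```python
# Hypothesized early code: only the four GNC codons are sense codons.
# Assignments follow the standard genetic code for these codons.
GNC_CODE = {
    "GUC": "Val",
    "GCC": "Ala",
    "GAC": "Asp",
    "GGC": "Gly",
}

def translate_gnc(rna):
    """Read an RNA string codon by codon; codons outside the GNC set map to None (nonsense)."""
    usable = len(rna) - len(rna) % 3  # drop any incomplete trailing codon
    codons = [rna[i:i + 3] for i in range(0, usable, 3)]
    return [GNC_CODE.get(codon) for codon in codons]

# Hypothetical example: four GNC codons followed by AUG, which is nonsense under this code
print(translate_gnc("GUCGCCGACGGCAUG"))  # prints ['Val', 'Ala', 'Asp', 'Gly', None]
```

Under the expansion mechanism the abstract proposes, new amino acids would take over nonsense codons one base change away from these four, for example GUU or AUC next to GUC.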
