Similar literature
 20 similar articles found (search time: 46 ms)
1.
In the past, two kinds of Markov models have been considered to describe protein sequence evolution. Codon-level models have been mechanistic, with a small number of parameters designed to take into account features such as transition-transversion bias, codon frequency bias, and synonymous-nonsynonymous amino acid substitution bias. Amino acid models have been empirical, attempting to summarize the replacement patterns observed in large quantities of data and not explicitly considering the distinct factors that shape protein evolution. We have estimated the first empirical codon model (ECM). Previous codon models assume that protein evolution proceeds only by successive single nucleotide substitutions, but our results indicate that model accuracy is significantly improved by incorporating instantaneous doublet and triplet changes. We also find that the affiliations between codons, the amino acid each encodes, and the physicochemical properties of the amino acids are the main factors driving the process of codon evolution. Neither multiple nucleotide changes nor the strong influence of the genetic code nor amino acids' physicochemical properties form a part of standard mechanistic models and their views of how codon evolution proceeds. We have implemented the ECM for likelihood-based phylogenetic analysis, and an assessment of its ability to describe protein evolution shows that it consistently outperforms comparable mechanistic codon models. We point out the biological interpretation of our ECM and possible consequences for studies of selection.
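To make the distinction between single, doublet, and triplet changes concrete, the following Python sketch counts how many ordered pairs of sense codons differ at one, two, or three positions. It assumes the standard genetic code (three stop codons excluded), and the function names are hypothetical; it illustrates the classification only and is not the ECM estimation procedure.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, so these three stop codons are excluded.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [
    "".join(triplet)
    for triplet in product("TCAG", repeat=3)
    if "".join(triplet) not in STOP_CODONS
]


def n_differences(codon_a, codon_b):
    """Return the number of positions (0 to 3) at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def count_change_classes():
    """Count ordered sense-codon pairs by how many nucleotides differ."""
    counts = Counter()
    for source, target in product(SENSE_CODONS, repeat=2):
        if source != target:
            counts[n_differences(source, target)] += 1
    return counts


if __name__ == "__main__":
    for k, n in sorted(count_change_classes().items()):
        print(f"{k}-nucleotide changes: {n} ordered codon pairs")
```

A model restricted to single nucleotide substitutions fixes the instantaneous rates of the two- and three-difference classes at zero, whereas an empirical model of the kind described in the abstract estimates them from data.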

2.
Models of amino acid substitution were developed and compared using maximum likelihood. Two kinds of models are considered. "Empirical" models do not explicitly consider factors that shape protein evolution, but attempt to summarize the substitution pattern from large quantities of real data. "Mechanistic" models are formulated at the codon level and separate mutational biases at the nucleotide level from selective constraints at the amino acid level. They account for features of sequence evolution, such as transition-transversion bias and base or codon frequency biases, and make use of physicochemical distances between amino acids to specify nonsynonymous substitution rates. A general approach is presented that transforms a Markov model of codon substitution into a model of amino acid replacement. Protein sequences from the entire mitochondrial genomes of 20 mammalian species were analyzed using different models. The mechanistic models were found to fit the data better than empirical models derived from large databases. Both the mutational distance between amino acids (determined by the genetic code and mutational biases such as the transition-transversion bias) and the physicochemical distance are found to have strong effects on amino acid substitution rates. A significant proportion of amino acid substitutions appeared to have involved more than one codon position, indicating that nucleotide substitutions at neighboring sites may be correlated. Rates of amino acid substitution were found to be highly variable among sites.

3.
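Analysis methods for codon bias and related research progress   Total citations: 22 (self-citations: 0; citations by others: 22)
Codon bias refers to the unequal use of synonymous codons that encode the same amino acid in an organism. Because this phenomenon is linked both to DNA, the carrier of genetic information, and to proteins, the functional molecules of life, it is of considerable biological significance. This paper outlines the basic theory and common analytical methods of codon bias research, summarizes commonly used software and websites that offer online codon usage analysis, introduces the biological fields related to codon bias and the latest research progress, and discusses prospects for further research.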

4.
A codon-based model of nucleotide substitution for protein-coding DNA sequences   Total citations: 34 (self-citations: 23; citations by others: 11)
A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons. Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons. Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.
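The structure of such a Markov codon model can be sketched in Python as below. This is a deliberately simplified variant, not the model of the abstract: it replaces the physicochemical-distance term with a single nonsynonymous/synonymous multiplier `omega`, a simplification that is common in later codon models. The names `rate`, `build_rate_matrix`, `kappa`, and `omega` are illustrative assumptions, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in TCAG order (assumption: nuclear code, table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SENSE_CODONS = [codon for codon, aa in GENETIC_CODE.items() if aa != "*"]
PURINES = {"A", "G"}


def is_transition(base_a, base_b):
    """True if two different bases are both purines or both pyrimidines."""
    return (base_a in PURINES) == (base_b in PURINES)


def rate(source, target, kappa, omega, target_freq):
    """Instantaneous rate from source to target codon (simplified sketch)."""
    diffs = [(a, b) for a, b in zip(source, target) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are allowed here
    r = target_freq  # codon usage bias enters through the target frequency
    if is_transition(*diffs[0]):
        r *= kappa  # transition/transversion rate bias
    if GENETIC_CODE[source] != GENETIC_CODE[target]:
        r *= omega  # stand-in for the physicochemical-distance term
    return r


def build_rate_matrix(kappa, omega, freqs=None):
    """Return a nested dict Q[source][target]; each row sums to zero."""
    if freqs is None:
        freqs = {codon: 1.0 / len(SENSE_CODONS) for codon in SENSE_CODONS}
    matrix = {}
    for source in SENSE_CODONS:
        row = {
            target: rate(source, target, kappa, omega, freqs[target])
            for target in SENSE_CODONS
            if target != source
        }
        row[source] = -sum(row.values())  # diagonal makes the row sum zero
        matrix[source] = row
    return matrix
```

For example, `build_rate_matrix(kappa=2.0, omega=0.5)` gives a 61-state matrix with equal codon frequencies; a real analysis would estimate these parameters by maximum likelihood rather than fix them.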

5.
Over the years, there have been claims that evolution proceeds according to systematically different processes over different timescales and that protein evolution behaves in a non-Markovian manner. On the other hand, Markov models are fundamental to many applications in evolutionary studies. Apparent non-Markovian or time-dependent behavior has been attributed to influence of the genetic code at short timescales and dominance of physicochemical properties of the amino acids at long timescales. However, any long time period is simply the accumulation of many short time periods, and it remains unclear why evolution should appear to act systematically differently across the range of timescales studied. We show that the observed time-dependent behavior can be explained qualitatively by modeling protein sequence evolution as an aggregated Markov process (AMP): a time-homogeneous Markovian substitution model observed only at the level of the amino acids encoded by the protein-coding DNA sequence. The study of AMPs sheds new light on the relationship between amino acid-level and codon-level models of sequence evolution, and our results suggest that protein evolution should be modeled at the codon level rather than using amino acid substitution models.

6.
The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assimilation of empirical amino acid replacement probabilities into a codon-substitution matrix. In this way, the resulting codon model takes into account not only the transition-transversion bias and the nonsynonymous/synonymous ratio, but also the different amino acid replacement probabilities as specified in empirical amino acid matrices. Different empirical amino acid replacement matrices, such as secondary structure-specific matrices or organelle-specific matrices (e.g., mitochondria and chloroplasts), can be incorporated into the model, making it context dependent. Using a diverse set of coding DNA sequences, we show that the novel model better fits biological data as compared with either mechanistic or empirical codon models. Using the suggested model, we further analyze human immunodeficiency virus type 1 protease sequences obtained from drug-treated patients and reveal positive selection in sites that are known to confer drug resistance to the virus.

7.
Models of codon substitution are developed that incorporate physicochemical properties of amino acids. When amino acid sites are inferred to be under positive selection, these models suggest the nature and extent of the physicochemical properties under selection. This is accomplished by first partitioning the codons on the basis of some property of the encoded amino acids. This partition is used to parametrize the rates of property-conserving and property-altering base substitutions at the codon level by means of finite mixtures of Markov models that also account for codon and transition:transversion biases. Here, we apply this method to two positively selected receptors involved in ligand recognition: the class I alleles of the human major histocompatibility complex (MHC) of known structure and the S-locus receptor kinase (SRK) of the sporophytic self-incompatibility system (SSI) in cruciferous plants (Brassicaceae), whose structure is unknown. Through likelihood ratio tests we demonstrate that at some sites, the positively selected MHC and SRK proteins are under physicochemical selective pressures to alter polarity, volume, polarity and/or volume, and charge to various extents. An empirical Bayes approach is used to identify sites that may be important for ligand recognition in these proteins. Reviewing Editor: Dr. Willie Swanson

8.
To better conceptualize the mechanism underlying the evolution of synonymous codons, we have analysed intragenic codon usage in chosen "regions" of some mouse and human genes. We divided a given gene into two regions: one consisting of a trinucleotide repeat (TNR) and the other consisting of the "rest of the coding region" (RCR). Usually, a TNR is composed of a repetitive single codon, which may reflect its frequency in a gene. In contrast, a non-random frequency of a codon in the RCR versus TNR (or vice versa) of a gene should indicate a bias for that codon within the TNR. We examined this scenario by comparing codon frequency between the RCR and the cognate TNR(s) for a set of human and mouse genes. A TNR length of six amino acids or more was used to identify genes from the GenBank database. Twenty-nine human and twenty-one mouse genes containing TNRs coding for nine different amino acid runs were identified. The ratio of codon frequency in a TNR versus the corresponding RCR was expressed as "fold change", which was also regarded as a measure of codon bias (defined as preferential use either in TNR or in RCR). Chi-square values were then determined from the distribution of codon frequency in a TNR vs. the cognate RCR. At p<0.001, 22% and 27%, respectively, of human and mouse TNRs showed codon bias. More than 40% of the TNRs (29 out of 69 in human, and 18 of 42 in mouse) showed codon bias at p<0.05. In addition, we identify eight single-codon TNRs in mouse and ten in human genes. Thus, our results show intragenic codon bias in both mouse and human genes expressed in diverse tissue types. Since our results are independent of the Codon Adaptation Index (CAI) and starvation CAI, and since the tRNA repertoire in a cell or in a tissue is constant, our data suggest that other constraints besides tRNA abundance played a role in creating intragenic codon bias in these genes.
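The "fold change" and chi-square comparison described above can be sketched in Python as follows. The abstract does not give the exact definitions, so two assumptions are made here: codon frequency is taken as the fraction of codons in a region that equal the codon of interest, and the chi-square value is the Pearson statistic on a 2 × 2 table (target codon versus all other codons, TNR versus RCR). All function names are hypothetical.

```python
def codon_frequency(codons, target):
    """Fraction of codons in a region that equal the target codon."""
    if not codons:
        raise ValueError("region contains no codons")
    return codons.count(target) / len(codons)


def fold_change(tnr_codons, rcr_codons, target):
    """Ratio of the target codon's frequency in the TNR to that in the RCR."""
    rcr_freq = codon_frequency(rcr_codons, target)
    if rcr_freq == 0:
        raise ValueError("target codon is absent from the RCR")
    return codon_frequency(tnr_codons, target) / rcr_freq


def chi_square_2x2(tnr_codons, rcr_codons, target):
    """Pearson chi-square statistic (1 d.f., no continuity correction)."""
    a = tnr_codons.count(target)   # target codon in TNR
    b = len(tnr_codons) - a        # other codons in TNR
    c = rcr_codons.count(target)   # target codon in RCR
    d = len(rcr_codons) - c        # other codons in RCR
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the 2x2 table is empty")
    return (a + b + c + d) * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    # Toy data (invented for illustration): a CAG repeat and its RCR.
    tnr = ["CAG"] * 10
    rcr = ["CAG"] * 5 + ["CAA"] * 15
    print(fold_change(tnr, rcr, "CAG"), chi_square_2x2(tnr, rcr, "CAG"))
```

The statistic would then be compared against a chi-square distribution with one degree of freedom to obtain a p-value such as those quoted in the abstract.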

9.
Amino acids are essential indicators of growth potential because of their connection to protein structure and function. The objective of this paper was to analyze the features of rice chromosomes at the plastid region, as represented by nucleotide, synonymous codon, and amino acid usage, in order to predict gene expression from codon usage patterns. The codon adaptation index ranged from 0.733 in chromosome 9 to 0.631 in chromosome 8; the full lengths of these two chromosomes were 3738 and 1635, respectively. Guanine and cytosine content was highest in chromosome 9 (60%) and lowest in chromosome 11 (37%). Eight chromosomes (ch1, ch2, ch3, ch5, ch7, ch8, ch10, and ch12) had modified relative codon bias values above the threshold of 0.66, especially for cysteine in ch1, ch2, ch5, ch10, and ch12, whereas the remaining chromosomes fell below the threshold. Relative synonymous codon usage analysis showed that the over-represented amino acids were asparagine, aspartate, cysteine, glutamate, and phenylalanine across all 12 chromosomes. These results provide a platform for further projects on rice breeding and genetics and on codon optimization for developing varieties. They should also help breeders select desirable genes across the genome to improve target traits.

10.
The organization of the canonical genetic code needs to be thoroughly illuminated. Here we reorder the four nucleotides (adenine, thymine, guanine and cytosine) according to their emergence in evolution, and apply the organizational rules to devising an algebraic representation for the canonical genetic code. Under a framework of the devised code, we quantify codon and amino acid usages from a large collection of 917 prokaryotic genome sequences, and associate the usages with the code's intrinsic structure and classification schemes as well as amino acid physicochemical properties. Our results show that the algebraic representation of the code is structurally equivalent to a content-centric organization of the code and that codon and amino acid usages under different classification schemes were correlated closely with GC content, implying a set of rules governing composition dynamics across a wide variety of prokaryotic genome sequences. These results also indicate that codons and amino acids are not randomly allocated in the code, where the six-fold degenerate codons and their amino acids have important balancing roles for error minimization. Therefore, the content-centric code is of great usefulness in deciphering its hitherto unknown regularities as well as the dynamics of nucleotide, codon, and amino acid compositions.

11.
We analyzed the complete genome sequence of Arabidopsis thaliana and sequence data from 83 genes in the outcrossing A. lyrata, to better understand the role of gene expression on the strength of natural selection on synonymous and replacement sites in Arabidopsis. From data on tRNA gene abundance, we find a good concordance between codon preferences and the relative abundance of isoaccepting tRNAs in the complete A. thaliana genome, consistent with models of translational selection. Both EST-based and new quantitative measures of gene expression (MPSS) suggest that codon preferences derived from information on tRNA abundance are more strongly associated with gene expression than those obtained from multivariate analysis, which provides further support for the hypothesis that codon bias in Arabidopsis is under selection mediated by tRNA abundance. Consistent with previous results, analysis of protein evolution reveals a significant correlation between gene expression level and amino acid substitution rate. Analysis by MPSS estimates of gene expression suggests that this effect is primarily the result of a correlation between the number of tissues in which a gene is expressed and the rate of amino acid substitution, which indicates that the degree of tissue specialization may be an important determinant of the rate of protein evolution in Arabidopsis.

12.
Wall DP  Herbeck JT 《Journal of molecular evolution》2003,56(6):673-88; discussion 689-90
In this study we reconstruct the evolution of codon usage bias in the chloroplast gene rbcL using a phylogeny of 92 green-plant taxa. We employ a measure of codon usage bias that accounts for chloroplast genomic nucleotide content, as an attempt to limit plausible explanations for patterns of codon bias evolution to selection- or drift-based processes. This measure uses maximum likelihood-ratio tests to compare the performance of two models, one in which a single codon is overrepresented and one in which two codons are overrepresented. The measure allowed us to analyze both the extent of bias in each lineage and the evolution of codon choice across the phylogeny. Despite predictions based primarily on the low G + C content of the chloroplast and the high functional importance of rbcL, we found large differences in the extent of bias, suggesting differential molecular selection that is clade specific. The seed plants and simple leafy liverworts each independently derived a low level of bias in rbcL, perhaps indicating relaxed selectional constraint on molecular changes in the gene. Overrepresentation of a single codon was typically plesiomorphic, and transitions to overrepresentation of two codons occurred commonly across the phylogeny, possibly indicating biochemical selection. The total codon bias in each taxon, when regressed against the total bias of each amino acid, suggested that twofold amino acids play a strong role in inflating the level of codon usage bias in rbcL, despite the fact that twofolds compose a minority of residues in this gene. Those amino acids that contributed most to the total codon usage bias of each taxon are known through amino acid knockout and replacement to be of high functional importance. This suggests that codon usage bias may be constrained by particular amino acids and, thus, may serve as a good predictor of what residues are most important for protein fitness.

13.
The codon table for the canonical genetic code can be rearranged in such a way that the code is divided into four quarters and two halves according to the variability of their GC and purine contents, respectively. For prokaryotic genomes, when the genomic GC content increases, their amino acid contents tend to be restricted to the GC-rich quarter and the purine-content insensitive half, where all codons are fourfold degenerate and relatively mutation-tolerant. Conversely, when the genomic GC content decreases, most of the codons retract to the AU-rich quarter and the purine-content sensitive half; most of the codons not only remain encoding physicochemically diversified amino acids but also vary when transversion (between purine and pyrimidine) happens. Amino acids with sixfold-degenerate codons are distributed into all four quarters and across the two halves; their fourfold-degenerate codons are all partitioned into the purine-insensitive half in favor of robustness against mutations. The features manifested in the rearranged codon table explain most of the intrinsic relationship between protein coding sequences (the informational content) and amino acid compositions (the functional content). The renovated codon table is useful in predicting abundant amino acids and positioning the amino acids with related or distinct physicochemical properties.

14.
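In nature, the existence of synonymous codons allows many amino acids to be encoded by more than one codon. As research has deepened, the biological functions of synonymous codon usage bias have been found to extend to gene replication, transcription, translation, chemical modification, and other life processes. Building on the biological properties of synonymous codon usage bias, codon pairs and codon co-occurrence have in turn been found to show marked bias in their usage patterns. During gene expression, codon optimization of the coding sequence can significantly raise the expression level of a gene, which is of great importance for protein expression in bioengineering. In addition, synonymous codon usage patterns indirectly govern the orderliness of intracellular life processes through their roles in regulating gene transcription, chemical modification, and translation. These processes, which are intricately linked to synonymous codon usage patterns, are regulated mainly by fine-tuned translational selection pressure. Here, focusing on the fine-tuned translational selection pressure mediated by synonymous codon usage patterns, we briefly describe how codon usage patterns affect gene expression and the biological function of protein products at the levels of transcription, chemical modification, and translation. This should provide useful ideas for optimizing high-level protein expression in bioengineering and for in-depth study of gene expression regulation in important biological processes.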

15.
Rao Y  Wu G  Wang Z  Chai X  Nie Q  Zhang X 《DNA research》2011,18(6):499-512
Synonymous codons are used with different frequencies both among species and among genes within the same genome and are controlled by neutral processes (such as mutation and drift) as well as by selection. Up to now, a systematic examination of the codon usage for the chicken genome has not been performed. Here, we carried out a whole genome analysis of the chicken genome by the use of the relative synonymous codon usage (RSCU) method and identified 11 putative optimal codons, all of them ending with uracil (U), a pattern that departs significantly from that observed in other eukaryotes. Optimal codons in the chicken genome are most likely the ones corresponding to highly expressed transfer RNAs (tRNAs) or tRNA gene copy numbers in the cell. Codon bias, measured as the frequency of optimal codons (Fop), is negatively correlated with the G + C content and recombination rate, but positively correlated with gene expression, protein length, gene length and intron length. The positive correlation between codon bias and protein, gene and intron length is quite different from other multicellular organisms, as this trend has been found only in unicellular organisms. Our data show that regional G + C content explains a large proportion of the variance of codon bias in chicken. Stepwise selection model analyses indicate that G + C content of coding sequence is the most important factor for codon bias. It appears that variation in the G + C content of CDSs accounts for over 60% of the variation of codon bias. This study suggests that both mutation bias and selection contribute to codon bias. However, mutation bias is the driving force of the codon usage in the Gallus gallus genome. Our data also provide evidence that the negative correlation between codon bias and recombination rates in G. gallus is determined mostly by recombination-dependent mutational patterns.
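The RSCU measure named above can be sketched in Python as follows. The sketch uses the usual definition, in which the observed count of a codon is divided by the count expected if all synonymous codons for that amino acid were used equally. The function names and the toy sequence are illustrative assumptions, and the standard genetic code is assumed; the abstract does not describe the authors' own pipeline.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumption: nuclear code, table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any tail."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds):
    """Return {codon: RSCU} for every amino acid present in the sequence."""
    counts = Counter(c for c in split_codons(cds) if c in GENETIC_CODE)
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue  # amino acid absent from the sequence: RSCU undefined
        for codon in family:
            # observed count / (total / number of synonymous codons)
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence (invented): three CTG and one CTA, all leucine codons.
    print(rscu("CTGCTGCTGCTA")["CTG"])
```

An RSCU above 1 means a codon is used more often than expected under equal usage; identifying optimal codons, as in the abstract, additionally requires comparing genes of high and low expression.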

16.
Based on the differences in synonymous codon use between E. coli and S. typhimurium, the synonymous substitution rates can be estimated. In contrast to previous studies on the substitution rates in these two organisms, we use a kinetic model that explicitly takes the selection bias into account. The selection pressure on synonymous codons for a particular amino acid can be calculated from the observed codon bias. This offers a unique opportunity to study systematically the relationship between substitution-rate constants and selection pressure. The results indicate that the codon bias in these organisms is determined by a mutation-selection balance rather than by stabilizing selection. A best fit to the data implies that the mutation rate constant increases about threefold in genes at low expression levels relative to those that are highly expressed. Correspondence to: O.G. Berg

17.
Suzuki H  Saito R  Tomita M 《FEBS letters》2005,579(28):6499-6504
Multivariate analyses are often used to identify major trends of variation in synonymous codon usage among genes. These analyses need to be performed on properly normalized codon usage data to avoid biases masking this synonymous variation, i.e., gene length, amino acid usage, and codon degeneracy; however, previous studies have failed to do so. In this paper, we demonstrate that the use of alternative normalized data (called 'relative adaptiveness' in the literature) can avoid all these biases and furthermore, can identify more trends of variation among genes, including GC-ending codon usage, GT-ending codon usage, and gene expression level.
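The abstract does not define 'relative adaptiveness'. In the codon usage literature the term usually denotes the count of a codon divided by the count of the most frequently used synonymous codon for the same amino acid. Assuming that definition and the standard genetic code, a minimal Python sketch follows; the function names are hypothetical and this is not necessarily the normalization pipeline used in the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumption: nuclear code, table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def relative_adaptiveness(cds):
    """Return {codon: count / max count among its synonymous codons}."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in GENETIC_CODE)
    families = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for family in families.values():
        best = max(counts[codon] for codon in family)
        if best == 0:
            continue  # amino acid absent from this gene: value undefined
        for codon in family:
            values[codon] = counts[codon] / best
    return values


if __name__ == "__main__":
    # Toy sequence (invented): three CTG and one CTA, all leucine codons.
    print(relative_adaptiveness("CTGCTGCTGCTA"))
```

Because each value is scaled by the most used codon of the same amino acid, it always lies between 0 and 1 whatever the gene's length, its amino acid composition, or the degeneracy of the codon family, which is the normalization property the abstract relies on.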

18.
Different mechanisms regulate the expression level of tissue-specific genes in humans. Here we report some compositional features, such as codon usage bias, amino acid usage bias, codon frequency, and base composition, which may be related to the mRNA amount of tissue-specific tumor suppressor genes. Our findings support the possibility that structural elements in gene and protein may play an important role in the regulation of tumor suppressor genes, development, and tumorigenesis. The data presented here can open broad vistas in the understanding and treatment of a variety of human malignancies.

19.
Patterns of codon usage have been extensively studied among Bacteria and Eukaryotes, but there has been little investigation of species from the third domain of life, the Archaea. Here, we examine the nature of codon usage bias in a methanogenic archaeon, Methanococcus maripaludis. Genome-wide patterns of codon usage are dominated by a strong A + T bias, presumably largely reflecting mutation patterns. Nevertheless, there is variation among genes in the use of a subset of putatively translationally optimal codons, which is strongly correlated with gene expression level. In comparison with Bacteria such as Escherichia coli, the strength of selected codon usage bias in highly expressed genes in M. maripaludis seems surprisingly high given its moderate growth rate. However, the pattern of selected codon usage differs between M. maripaludis and E. coli: in the archaeon, strongly selected codon usage bias is largely restricted to twofold degenerate amino acids (AAs). Weaker bias among the codons for fourfold degenerate AAs is consistent with the small number of tRNA genes in the M. maripaludis genome.

20.
To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped in order to carry out its growth in the GC-rich host Pseudomonas aeruginosa, synonymous codon and amino acid usage bias of PhiKZ was investigated and the data were compared with that of P. aeruginosa. It was found that synonymous codon and amino acid usage of PhiKZ was distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that codons of the phage-like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into their proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.
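The contrast drawn above between third-codon-position composition in an AT-rich phage and a GC-rich host is commonly summarized as GC3, the fraction of G or C at third codon positions. A minimal Python sketch follows; the function names and the toy sequence are illustrative assumptions, not data or code from the study.

```python
def gc_content(seq):
    """Fraction of G or C bases over the whole sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # ignore an incomplete final codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    # Toy sequence (invented): codons ATG, GCC, AAA.
    toy = "ATGGCCAAA"
    print(gc_content(toy), gc3_content(toy))
```

Comparing GC3 with overall GC content across genes is one simple way to see whether third-position composition tracks genome-wide mutational bias, as the abstract argues for PhiKZ.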
