Similar literature
 20 similar articles found
1.
An analytical model based on the statistical properties of Open Reading Frames (ORFs) of eubacterial genomes such as codon composition and sequence length of all reading frames was developed. This new model predicts the average length, maximum length as well as the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted with high accordance. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot completely be explained by a biased codon usage in the +1 frame. While it is unknown if the stop codon depletion has a biological function, it could be due to a protein coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes, therefore, leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
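
As a rough illustration of the kind of statistical expectation such a model rests on, the Python sketch below computes the chance that a random codon is a stop codon at a given GC content, and the resulting mean ORF length under a geometric distribution. It assumes independent bases with equal A/T and equal G/C frequencies. This is a simplified null model chosen for illustration, not the analytical model described in the abstract, and the function names are hypothetical.

```python
def stop_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA.

    Assumption: bases are drawn independently, with P(G) = P(C) = gc / 2
    and P(A) = P(T) = (1 - gc) / 2.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be a fraction between 0 and 1")
    g = gc / 2
    a = t = (1 - gc) / 2
    # P(TAA) + P(TAG) + P(TGA)
    return t * a * a + t * a * g + t * g * a


def expected_orf_length(gc: float) -> float:
    """Mean number of sense codons before the first stop codon."""
    p = stop_probability(gc)
    if p == 0.0:
        return float("inf")  # no stop codon can occur without A and T
    return (1 - p) / p
```

Under these assumptions, stop codons become rarer as GC content rises, so random ORFs are expected to be longer in GC-rich genomes.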

2.
Codon usage in Clonorchis sinensis was analyzed using 12,515 codons from 38 coding sequences. Total GC content was 49.83%, and GC1, GC2 and GC3 contents were 56.32%, 43.15% and 50.00%, respectively. The effective number of codons converged at 51-53 codons. When plotted against total GC content or GC3, codon usage was distributed in relation to GC3 biases. Relative synonymous codon usage for each codon revealed a single major trend, which was highly correlated with GC content at the third position when codons began with A or U at the first two positions. In codons beginning with G or C base at the first two positions, the G or C base rarely occurred at the third position. These results suggest that codon usage is shaped by a bias towards G or C at the third base, and that this is affected by the first and second bases.
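
The position-specific values GC1, GC2 and GC3 reported above can be computed along the following lines. This is a minimal Python sketch, assuming an in-frame coding sequence given as a plain string; the function name is hypothetical and the abstract does not say which software the authors used.

```python
def gc_by_position(cds: str) -> tuple[float, float, float]:
    """Return (GC1, GC2, GC3): the GC fraction at each codon position.

    Assumption: the sequence starts in frame; trailing bases that do not
    complete a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    fractions = []
    for pos in range(3):
        bases = seq[pos:usable:3]  # every third base, starting at this position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return fractions[0], fractions[1], fractions[2]
```

For example, the codons ATG GCC AAT give a GC fraction of 1/3 at positions 1 and 2 and 2/3 at position 3.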

3.
Synonymous codon usage of 53 protein-coding genes in the chloroplast genome of Coffea arabica was analyzed for the first time to identify the factors that may contribute to codon bias. All preferred synonymous codons were A/T-ending, as chloroplast genomes are rich in AT. No difference in codon preference was observed between the leading and lagging strands. Complex correlations between total base compositions (A, T, G, C, GC) and silent base contents (A3, T3, G3, C3, GC3) revealed that compositional constraints played a crucial role in shaping the codon usage pattern of the C. arabica chloroplast genome. The ENC vs. GC3 plot placed the majority of the analyzed genes on or just below the left side of the expected GC3 curve, indicating the influence of base compositional constraints on codon usage. However, some genes lay far below the expected curve, confirming the influence of other factors on codon usage in those genes. The influence of compositional constraints was further confirmed by correspondence analysis, as axes 1 and 3 had significant correlations with silent base contents. Correlations of ENC with axes 1 and 4 and of CAI with axes 1 and 2 indicated a minor influence of selection, but a clear separation of highly and lowly expressed genes could not be seen. From the present study, we concluded that mutational pressure combined with weak selection influenced the pattern of synonymous codon usage across the genes in the chloroplast genome of C. arabica.
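
The expected curve in an ENC vs. GC3 plot is commonly drawn from Wright's formula, Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC content at synonymous third positions. The Python sketch below evaluates that curve. It assumes this is the formula behind the plot, which the abstract does not state, and the function name is hypothetical.

```python
def expected_nc(gc3s: float) -> float:
    """Expected effective number of codons when codon usage is shaped
    only by base composition at third positions (Wright's curve)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)
```

Genes that fall well below this curve have a stronger codon bias than base composition alone would predict, which is why such genes are read as evidence of additional factors.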

4.
ABSTRACT: BACKGROUND: The evolution and genomic frequencies of stop codons have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes. RESULTS: We show that in bacteria stop codons evolve slower than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being the optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG. CONCLUSIONS: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications. REVIEWERS: This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers' Comments section.

5.
Behura SK  Severson DW 《Gene》2012,504(2):226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p<0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r(2)=0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggests that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.

6.
Rao Y  Wu G  Wang Z  Chai X  Nie Q  Zhang X 《DNA research》2011,18(6):499-512
Synonymous codons are used with different frequencies both among species and among genes within the same genome and are controlled by neutral processes (such as mutation and drift) as well as by selection. Up to now, a systematic examination of the codon usage for the chicken genome has not been performed. Here, we carried out a whole genome analysis of the chicken genome by the use of the relative synonymous codon usage (RSCU) method and identified 11 putative optimal codons, all of them ending with uracil (U), which departs significantly from the pattern observed in other eukaryotes. Optimal codons in the chicken genome are most likely the ones corresponding to highly expressed transfer RNA (tRNAs) or tRNA gene copy numbers in the cell. Codon bias, measured as the frequency of optimal codons (Fop), is negatively correlated with the G + C content and recombination rate, but positively correlated with gene expression, protein length, gene length and intron length. The positive correlation between codon bias and protein, gene and intron length is quite different from other multicellular organisms, as this trend has been only found in unicellular organisms. Our data showed that regional G + C content explains a large proportion of the variance of codon bias in chicken. Stepwise selection model analyses indicate that G + C content of coding sequence is the most important factor for codon bias. It appears that variation in the G + C content of CDSs accounts for over 60% of the variation of codon bias. This study suggests that both mutation bias and selection contribute to codon bias. However, mutation bias is the driving force of the codon usage in the Gallus gallus genome. Our data also provide evidence that the negative correlation between codon bias and recombination rates in G. gallus is determined mostly by recombination-dependent mutational patterns.
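
RSCU, the measure used above, is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below is one minimal way to compute it, assuming the standard genetic code and an in-frame coding sequence. The function and variable names are hypothetical, and it is not the authors' pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage for each sense codon.

    Amino acids absent from the sequence are skipped, so their codons
    do not appear in the result. Stop codons are excluded.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by amino acid.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        for c in codons:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(codons) / total
    return values
```

A value above 1 means the codon is used more often than expected under equal usage. For the sequence CAGCAGCAGCAA, which encodes four glutamines, CAG scores 1.5 and CAA scores 0.5.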

7.
The complete 15,223-bp mitochondrial genome (mitogenome) of Tryporyza incertulas (Walker) (Lepidoptera: Pyraloidea: Crambidae) was determined, characterized and compared with seven other species of superfamily Pyraloidea. The order of 37 genes was typical of insect mitochondrial DNA sequences described to date. Compared with other moths of Pyraloidea, the A + T bias (77.0%) of T. incertulas was the lowest. Eleven protein-coding genes (PCGs) utilized the standard ATN, but cox1 used CGA and nad4 used AAT as the initiation codons. Ten protein-coding genes had the common stop codon TAA; the exceptions were nad3, which had TAG as the stop codon, and cox2 and nad4, which used the incomplete stop codons T and TA, respectively. All of the tRNA genes had typical cloverleaf secondary structures except trnS1(AGN), in which the dihydrouridine (DHU) arm did not form a stable stem-loop structure. There was a spacer between trnQ and nad2, which was common in Lepidoptera moths. A 6-bp motif ‘ATACTA’ between trnS2(UCN) and nad1, a 7-bp motif “AGC(T)CTTA” between trnW and trnC and a 6-bp motif “ATGATA” of overlapping region between atp8 and atp6 were found in Pyraloidea moths. The A + T-rich region contained an ‘ATAGT(A)’-like motif followed by a poly-T stretch. In addition, two potential stem-loop structures, a duplicated 19-bp repeat element, and two microsatellites ‘(TA)12’ and ‘(TA)9’ were observed in the A + T-rich region of T. incertulas mitogenome. Finally, the phylogenetic relationships of Pyraloidea species were constructed based on amino acid sequences of 13 PCGs of mitogenomes using Bayesian inference (BI) and maximum likelihood (ML) methods. These molecular-based phylogenies supported the morphological classification on relationships within Pyraloidea species.

8.
Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome.
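
GC skew, mentioned above, is conventionally defined as (G - C) / (G + C) for a given strand. A minimal Python sketch follows, assuming this standard definition applies to the reported values; the function name is hypothetical.

```python
def gc_skew(seq: str) -> float:
    """GC skew of a sequence: (G - C) / (G + C).

    Positive values mean G outnumbers C on this strand. A sequence with
    no G or C returns 0.0, which is a convention chosen for this sketch.
    """
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)
```

A gene whose skew has the opposite sign to the usual pattern for its strand would count as having a reversed GC skew in this sense.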

9.
Background: The mitochondrial ND gene encodes NADH dehydrogenase, the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in basal ganglia and subcortical brain regions. Therefore, it is of interest to analyze mitochondrial DNA to glean information on evolutionary relationships. This study focuses on the analysis of compositional dynamics and selection pressure in shaping the codon usage patterns in the coding sequence of the MT-ND2 gene across pisces, aves and mammals by using bioinformatics tools such as effective number of codons (ENC), codon adaptation index (CAI) and relative synonymous codon usage (RSCU). Results: We observed a low codon usage bias, as reflected by high ENC values, in the MT-ND2 gene among pisces, aves and mammals. The most frequently used codons ended with A/C at the 3rd position of the codon, and the gene was AT rich in all three classes. The codons TCA, CTA, CGA and TGA were over-represented in all three classes. The F1 correspondence axis showed significant positive correlation with G, T3 and CAI, while the F2 axis showed significant negative correlation with A and T but significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. Conclusions: The codon usage bias in the MT-ND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in the MT-ND2 gene.

10.
The complete mitochondrial genome is of great importance for better understanding the genome-level characteristics and phylogenetic relationships among related species. In the present study, we determined the complete mitochondrial genome DNA sequence of the mud crab (Scylla paramamosain) by 454 deep sequencing and Sanger sequencing approaches. The complete genome DNA was 15,824 bp in length and contained a typical set of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes and a putative control region (CR). Of 37 genes, twenty-three were encoded by the heavy strand (H-strand), while the other ones were encoded by light strand (L-strand). The gene order in the mitochondrial genome was largely identical to those obtained in most arthropods, although the relative position of gene tRNAHis differed from other arthropods. Among 13 protein-coding genes, three (ATPase subunit 6 (ATP6), NADH dehydrogenase subunits 1 (ND1) and ND3) started with a rare start codon ATT, whereas, one gene cytochrome c oxidase subunit I (COI) ended with the incomplete stop codon TA. All 22 tRNAs could fold into a typical clover-leaf secondary structure, with the gene sizes ranging from 63 to 73 bp. The phylogenetic analysis based on 12 concatenated protein-coding genes showed that the molecular genetic relationship of 19 species of 11 genera was identical to the traditional taxonomy.

11.
12.
Ciliated protozoa of the genus Euplotes have undergone genetic code reassignment, redefining the termination codon UGA to encode cysteine. In addition, Euplotes spp. genes very frequently employ shifty stop frameshifting. Both of these phenomena involve noncanonical events at a termination codon, suggesting they might have a common cause. We recently demonstrated that Euplotes octocarinatus peptide release factor eRF1 ignores UGA termination codons while continuing to recognize UAA and UAG. Here we show that both the Tetrahymena thermophila and E. octocarinatus eRF1 factors allow efficient frameshifting at all three termination codons, suggesting that UGA redefinition also impaired UAA/UAG recognition. Mutations of the Euplotes factor restoring a phylogenetically conserved motif in eRF1 (TASNIKS) reduced programmed frameshifting at all three termination codons. Mutation of another conserved residue, Cys124, strongly reduces frameshifting at UGA while actually increasing frameshifting at UAA/UAG. We will discuss these results in light of recent biochemical characterization of these mutations.

13.
Salim HM  Ring KL  Cavalcanti AR 《Protist》2008,159(2):283-298
We used the recently sequenced genomes of the ciliates Tetrahymena thermophila and Paramecium tetraurelia to analyze the codon usage patterns in both organisms; we have analyzed codon usage bias, Gln codon usage, GC content and the nucleotide contexts of initiation and termination codons in Tetrahymena and Paramecium. We also studied how these trends change along the length of the genes and in a subset of highly expressed genes. Our results corroborate some of the trends previously described in Tetrahymena, but also negate some specific observations. In both genomes we found a strong bias toward codons with low GC content; however, in highly expressed genes this bias is smaller and codons ending in GC tend to be more frequent. We also found that codon bias increases along gene segments and in highly expressed genes and that the contexts surrounding initiation and termination codons are always AT rich. Our results also suggest differences in the efficiency of translation of the reassigned stop codons between the two species and between the reassigned codons. Finally, we discuss some of the possible causes for such translational efficiency differences.

14.
In two Escherichia coli genomes, laboratory strain K-12 and pathological strain O157:H7, tandem termination codons as a group are slightly over-represented as termination signals. Individually however, they span the range of representations, over, as expected, or under, in one or both of the strains. In vivo, tandem termination codons do not make more efficient signals. The second codon can act as a backstop where readthrough of the first has occurred, but not at the expected efficiency. UGAUGA remains an enigma, highly over-represented, but with the second UGA a relatively inefficient back up stop codon.

15.
Codon usage and base composition in sequences from the A + T-rich genome of Rickettsia prowazekii, a member of the alpha Proteobacteria, have been investigated. Synonymous codon usage patterns are roughly similar among genes, even though the data set includes genes expected to be expressed at very different levels, indicating that translational selection has been ineffective in this species. However, multivariate statistical analysis differentiates genes according to their G + C contents at the first two codon positions. To study this variation, we have compared the amino acid composition patterns of 21 R. prowazekii proteins with that of a homologous set of proteins from Escherichia coli. The analysis shows that individual genes have been affected by biased mutation rates to very different extents: genes encoding proteins highly conserved among other species being the least affected. Overall, protein coding and intergenic spacer regions have G + C content values of 32.5% and 21.4%, respectively. Extrapolation from these values suggests that R. prowazekii has around 800 genes and that 60–70% of the genome may be coding.

16.
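To analyze differences in codon usage between the mitochondrial genomes of cultivated and wild soybean, this study examined their mitochondrial coding sequences and compared the factors shaping codon bias and its evolution. The results showed that: (1) the GC content of the mitochondrial coding regions was 44.56% in cultivated soybean and 44.58% in wild soybean, indicating that mitochondrial coding genes in both are rich in A/T bases. (2) In both cultivated and wild soybean, the mean GC content at the first and second codon positions was highly significantly correlated with GC content at the third position, indicating that the role of mutation in shaping codon bias cannot be ignored; PR2-plot analysis showed that, at the third position of synonymous codons, purines were used less frequently than pyrimidines; in the Nc-plot analysis, genes with an Nc ratio between -0.1 and 0.2 accounted for more than 95% of all genes; multiple factors, including mutation and selection, jointly shaped the codon usage bias of soybean mitochondrial coding sequences. (3) Twenty and 21 codons were identified as optimal codons in the mitochondrial coding sequences of cultivated and wild soybean, respectively, and all of them except the serine codon TCC end in A or T. Taken together, the results suggest that selection has had a greater influence on mitochondrial codon bias in cultivated soybean than in wild soybean, possibly as a result of the long-term artificial cultivation and domestication of cultivated soybean from wild soybean.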

17.
Long stretches of “rare” codons are known to severely inhibit the efficiency of translation. Understanding the distribution of such rare codons is of critical importance in improving the efficiency of heterologous gene expression systems. Accurate estimates of codon usage take the abundance of each protein into consideration. In this paper, we analyze the correlation between approximate measures of codon usage and the availability of tRNA at various growth rates in E. coli. We show that the computationally derived estimates of tRNA isoacceptor concentration enable the finding of poorly translated codons.

18.
Insects, the most biodiverse taxonomic group, have high AT content in their mitochondrial genomes. Although codon usage tends to be AT-rich, base composition and codon usage of mitochondrial genomes may vary among taxa. Thus, we compare base composition and codon usage patterns of 49 insect mitochondrial genomes. For protein coding genes, AT content is as high as 80% in the Hymenoptera and Lepidoptera and as low as 72% in the Orthoptera. The AT content is high at positions 1 and 3, but A content is low at position 2. A close correlation occurs between codon usage and tRNA abundance in nuclear genomes. Optimal codons can pair well with the anticodons of the most abundant tRNAs. One tRNA gene translates a synonymous codon family in vertebrate mitochondrial genomes and these tRNA anticodons can pair with optimal codons. However, optimal codons cannot pair with anticodons in mtDNA of Cochliomyia hominivorax (Diptera: Calliphoridae). Ten optimal codons cannot pair with tRNA anticodons in all 49 insect mitochondrial genomes; non-optimal codon-anticodon usage is common and codon usage is not influenced by tRNA abundance.

19.

Background

Pine moths (Lepidoptera; Bombycoidea; Lasiocampidae: Dendrolimus spp.) are among the most serious insect pests of forests, especially in southern China. Although COI barcodes (a standardized portion of the mitochondrial cytochrome c oxidase subunit I gene) can distinguish some members of this genus, the evolutionary relationships of the three morphospecies Dendrolimus punctatus, D. tabulaeformis and D. spectabilis have remained largely unresolved. We sequenced whole mitochondrial genomes of eight specimens, including D. punctatus wenshanensis. This is an unambiguous subspecies of D. punctatus, and was used as a reference for inferring the relationships of the other two morphospecies of the D. punctatus complex. We constructed phylogenetic trees from these data, including twelve published mitochondrial genomes of other Bombycoidea species, and examined the relationships of the Dendrolimus taxa using these trees and the genomic features of the mitochondrial genome.

Results

The eight fully sequenced mitochondrial genomes from the three morphospecies displayed similar genome structures as other Bombycoidea species in terms of gene content, base composition, level of overall AT-bias and codon usage. However, the Dendrolimus genomes possess a unique feature in the large ribosomal 16S RNA subunits (rrnL), which are more than 60 bp longer than other members of the superfamily and have a higher AC proportion. The eight mitochondrial genomes of Dendrolimus were highly conservative in many aspects, for example with identical stop codons and overlapping regions. But there were many differences in start codons, intergenic spacers, and numbers of mismatched base pairs of tRNA (transfer RNA genes). Our results, based on phylogenetic trees, genetic distances, species delimitation and genomic features (such as intergenic spacers) of the mitochondrial genome, indicated that D. tabulaeformis is as close to D. punctatus as is D. punctatus wenshanensis, whereas D. spectabilis evolved independently from D. tabulaeformis and D. punctatus. Whole mitochondrial DNA phylogenies showed that D. spectabilis formed a well-supported monophyletic clade, with a clear species boundary separating it from the other congeners examined here. However, D. tabulaeformis often clustered with D. punctatus and with the subspecies D. punctatus wenshanensis. Genetic distance analyses showed that the distance between D. tabulaeformis and D. punctatus is generally less than the intraspecific distance of D. punctatus and its subspecies D. punctatus wenshanensis. In the species delimitation analysis of Poisson Tree Processes (PTP), D. tabulaeformis, D. punctatus and D. punctatus wenshanensis clustered into a putative species separated from D. spectabilis. In comparison with D. spectabilis, D. tabulaeformis and D. punctatus also exhibit a similar structure in intergenic spacer characterization. These different types of evidence suggest that D. tabulaeformis is very close to D. punctatus and its subspecies D. punctatus wenshanensis, and is likely to be another subspecies of D. punctatus.

Conclusions

Whole mitochondrial genomes possess relatively rich genetic information compared with the traditional use of single or multiple genes for phylogenetic purposes. They can be used to better infer phylogenetic relationships and degrees of relatedness of taxonomic groups, at least from the aspect of maternal lineage: caution should be taken due to the maternal-only inheritance of this genome. Our results indicate that D. spectabilis is an independent lineage, while D. tabulaeformis shows an extremely close relationship to D. punctatus.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1566-5) contains supplementary material, which is available to authorized users.

20.
Analysis of synonymous codon usage pattern in the genome of a thermophilic cyanobacterium, Thermosynechococcus elongatus BP-1, using multivariate statistical analysis revealed a single major explanatory axis accounting for codon usage variation in the organism. This axis is correlated with the GC content at third base of synonymous codons (GC3s) in correspondence analysis taking T. elongatus genes. A negative correlation was observed between effective number of codons, i.e. Nc, and GC3s. Results suggested a mutational bias as the major factor in shaping codon usage in this cyanobacterium. In comparison to the lowly expressed genes, highly expressed genes of this organism possess a significantly higher proportion of pyrimidine-ending codons, suggesting that besides mutational bias, translational selection also influenced codon usage variation in T. elongatus. Correspondence analysis of relative synonymous codon usage (RSCU) with A, T, G, C at third positions (A3s, T3s, G3s, C3s, respectively) also supported this fact, and expression levels of genes and gene length also influenced codon usage. A role of translational accuracy was identified in dictating the codon usage variation of this genome. Results indicated that although mutational bias is the major factor in shaping codon usage in T. elongatus, factors like translational selection, translational accuracy and gene expression level also influenced codon usage variation.
