Similar articles
20 similar articles found (search time: 31 ms)
1.
2.
A simple and species-independent coding measure   Total citations: 2, self-citations: 0, citations by others: 2
We present a coding measure that is based on the statistical properties of stop codons and that can accurately estimate the variation of coding content along an anonymous sequence. Because stop codons play the same role in all genomes (with very few exceptions), the measure turns out to be species-independent. We show results for both prokaryotic and eukaryotic genomes, indicating, first, the accuracy of the measure and, second, that better prediction is achieved if the measure is applied to homogeneous, isochore-like sequences than if it is applied following the standard moving-window approach. Finally, we discuss some of the possible applications of the measure.

3.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code, 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect, on average, every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular for detecting protein-coding ORFs. Traditional methods based on stop codon frequency assume that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing an ORF of that length in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods, along with the resulting false positive rates. A rough estimate for this upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking start codons into account as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
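The link between GC content, ORF length thresholds and false positive rates described in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: it assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2, ignores start codons, and all function names are hypothetical.

```python
import math

def stop_codon_probability(gc: float) -> float:
    """Probability that a random trinucleotide is TAA, TAG or TGA.

    Assumption: independent bases, P(G) = P(C) = gc/2, P(A) = P(T) = (1 - gc)/2.
    """
    at = (1.0 - gc) / 2.0   # probability of A (and of T)
    g = gc / 2.0            # probability of G (and of C)
    return at**3 + 2 * at**2 * g   # TAA + (TAG + TGA)

def prob_orf_at_least(n_codons: int, gc: float) -> float:
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons

def length_threshold(alpha: float, gc: float) -> int:
    """Smallest codon count n for which prob_orf_at_least(n, gc) <= alpha."""
    p = stop_codon_probability(gc)
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(gc, round(1 / stop_codon_probability(gc), 1), length_threshold(0.01, gc))
```

Under this simple model the threshold grows quickly with GC content, which is consistent with the abstract's point that ORF length becomes a weak feature for GC-rich genomes.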

4.
Fortes GG  Bouza C  Martínez P  Sánchez L 《Genetica》2007,129(3):281-289
To revisit the general view that the genomes of warm- and cold-blooded vertebrates differ in compositional structure, we made use of the increasing number of genetic sequences, including coding (exon) and non-coding (intron) regions, that have been deposited in databases in recent years. The nucleotide distributions at third codon positions (GC3) were analyzed in 1510 coding sequences (CDS) of fish, 1414 CDS of amphibians and 320 CDS of reptiles. In addition, the relationship between the GC content of 74, 56 and 25 CDS of fish, amphibians and reptiles, respectively, and that of their corresponding introns (GCI) was considered. In accordance with recent data, sequence analysis showed the presence of very GC3-rich CDS in these poikilothermic vertebrates. However, very high diversity in compositional patterns was found among different orders of fish, amphibians and reptiles. A significant positive correlation between GC3 and GCI was also confirmed for the genes analyzed. Nevertheless, introns proved to be poorer in GC than their corresponding CDS, and this difference was larger than in the human genome. Because of the limited number of available sequences that include both exons and introns, the results derived from them must be interpreted with caution. However, the indication that coding sequences are richer in GC than their corresponding introns could help explain the discrepancy between sequence analysis and the ultracentrifugation studies in cold-blooded vertebrates, which did not predict the existence of GC-rich isochores.
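The two quantities compared in the abstract above, GC3 of a coding sequence and the GC content of its introns (GCI), and their correlation can be computed along the following lines. This is only a sketch: the function names are hypothetical, the CDS is assumed to be in frame, and the abstract does not say which correlation statistic was used, so Pearson's r is an assumption.

```python
import math

def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame CDS."""
    return gc_fraction(cds[2::3])   # every third base, starting at position 3

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient (assumed statistic, see lead-in)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For each gene one would pair `gc3(cds)` with `gc_fraction(introns)` and pass the two lists to `pearson`.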

5.
We conducted a multi-genome analysis correlating protein domain organization with the exon-intron structure of genes in nine eukaryotic genomes. We observed a significant correlation between the borders of exons and domains on a genomic scale for both invertebrates and vertebrates. In addition, we found that the more complex organisms displayed consistently stronger exon-domain correlation, with substantially more significant correlations detected in vertebrates compared with invertebrates. Our observations concur with the principles of exon shuffling theory, including the prediction of predominantly symmetric phase of introns flanking the borders of correlating exons. These results suggest that extensive exon shuffling events during evolution significantly contributed to the shaping of eukaryotic proteomes.

6.
Since the base composition of translational stop codons (TAG, TAA, and TGA) is biased toward a low G+C content, a differential density for these termination signals is expected in random DNA sequences of different base compositions. The expected length of reading frames (DNA segments of sense codons flanked by in-phase stop codons) in random sequences is thus a function of GC content. The analysis of DNA sequences from several genome databases stratified according to GC content reveals that the longest coding sequences (exons in vertebrates and genes in prokaryotes) are GC-rich, while the shortest ones are GC-poor. Exon lengthening in GC-rich vertebrate regions does not result, however, in longer vertebrate proteins, perhaps because of the lower number of exons in the genes located in these regions. The effects on coding-sequence lengths constitute a new evolutionary meaning for compositional variations in DNA GC content. Correspondence to: J. L. Oliver
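Why reading-frame length depends on GC content can be made explicit with a short worked derivation. It rests on a simplifying assumption that the abstract does not state: bases are independent, with P(G) = P(C) = g/2 and P(A) = P(T) = (1 - g)/2, where g is the GC fraction.

```latex
p_{\mathrm{stop}}(g)
  = P(\mathrm{TAA}) + P(\mathrm{TAG}) + P(\mathrm{TGA})
  = \left(\frac{1-g}{2}\right)^{3} + 2\left(\frac{1-g}{2}\right)^{2}\frac{g}{2}
  = \frac{(1-g)^{2}(1+g)}{8},
\qquad
\text{mean spacing between in-phase stops} = \frac{1}{p_{\mathrm{stop}}(g)} = \frac{8}{(1-g)^{2}(1+g)} \ \text{codons}.
```

Under this model the mean spacing is 64/3 ≈ 21.3 codons at g = 0.5, about 12.6 codons at g = 0.3, and about 52.3 codons at g = 0.7, so random reading frames lengthen as GC content rises.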

7.
The codon table for the canonical genetic code can be rearranged in such a way that the code is divided into four quarters and two halves according to the variability of their GC and purine contents, respectively. For prokaryotic genomes, when the genomic GC content increases, their amino acid contents tend to be restricted to the GC-rich quarter and the purine-content-insensitive half, where all codons are fourfold degenerate and relatively mutation-tolerant. Conversely, when the genomic GC content decreases, most of the codons retract to the AU-rich quarter and the purine-content-sensitive half; most of these codons not only continue to encode physicochemically diverse amino acids but also vary when a transversion (between purine and pyrimidine) occurs. Amino acids with sixfold-degenerate codons are distributed into all four quarters and across the two halves; their fourfold-degenerate codons are all partitioned into the purine-insensitive half, in favor of robustness against mutations. The features manifested in the rearranged codon table explain most of the intrinsic relationship between protein-coding sequences (the informational content) and amino acid compositions (the functional content). The rearranged codon table is useful for predicting abundant amino acids and for positioning amino acids with related or distinct physicochemical properties.

8.
The genetic code is not universal. Various non-standard versions of the code have been found in mitochondrial, prokaryotic and eukaryotic genomes. Stop codons signal the ribosome to stop translation of the coding sequence and are prone to reassignment to sense codons. Class-1 termination factors (RF1 and RF2 in prokaryotes, eRF1 in eukaryotes) recognize stop codons and promote hydrolysis of the peptidyl-tRNA in the ribosome. The termination specificity of class-1 factors is altered in organisms with non-standard codes. Pyrrolysine and selenocysteine use dissimilar decoding strategies. Various hypotheses on the origin of non-standard codes are described. It has been proposed that an alteration in the specificity of the class-1 release factor was the starting point for stop codon reassignment.

9.
ABSTRACT: BACKGROUND: The evolution and genomic frequencies of stop codons have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here we study the rate of evolution and the frequency distribution of stop codons in bacterial genomes. RESULTS: We show that in bacteria stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data, we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Nes < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content. For biologically plausible rates of mutation we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16%, while for G-content > 16% TGA has a higher fitness than TAG. CONCLUSIONS: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications. REVIEWERS: This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers' Comments section.
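The per-genome stop codon frequencies that the abstract above relates to GC content can be tabulated with a few lines of code. This is a minimal sketch rather than the authors' pipeline: it assumes each coding sequence is given with its terminal stop codon included, and `stop_codon_usage` is a hypothetical helper name.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")

def stop_codon_usage(cds_list):
    """Relative frequency of each terminal stop codon across coding sequences.

    Sequences whose last three bases are not a standard stop codon are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        last = cds[-3:].upper().replace("U", "T")   # accept RNA or DNA input
        if last in STOP_CODONS:
            counts[last] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in STOP_CODONS}

if __name__ == "__main__":
    genes = ["ATGAAATAA", "ATGCCCTGA", "ATGGGGTAA", "ATGTTTTAG"]
    print(stop_codon_usage(genes))
```

Computing this dictionary for many genomes and plotting each frequency against genomic GC content would reproduce the kind of comparison the abstract describes.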

10.
Base composition varies among and within eukaryote genomes. Although mutational bias and selection were initially invoked, more recently GC-biased gene conversion (gBGC) has been proposed to play a central role in shaping nucleotide landscapes, especially in yeast, mammals, and birds. gBGC is a kind of meiotic drive in favor of G and C alleles, associated with recombination. Previous studies have also suggested that gBGC could be at work in grass genomes. However, these studies were carried out on third codon positions, which can undergo selection on codon usage. As most preferred codons end in G or C in grasses, gBGC and selection can be confounded. Here we investigated further the forces that might drive GC content evolution in the rice genus using both coding and noncoding sequences. We found that recombination rates correlate positively with equilibrium GC content and that selfing species (Oryza sativa and O. glaberrima) have significantly lower equilibrium GC content compared with more outcrossing species. As recombination is less efficient in selfing species, these results suggest that recombination drives GC content. We also detected a positive relationship between expression levels and GC content in third codon positions, suggesting that selection favors codons ending with G or C bases. However, the correlation between GC content and recombination cannot be explained by selection on codon usage alone, as it was also observed in noncoding positions. Finally, analyses of polymorphism data ruled out the hypothesis that genomic variation in GC content is due to mutational processes. Our results suggest that both gBGC and selection on codon usage affect GC content in the Oryza genus and likely in other grass species.

11.
The small genome size (740 Mb), short life cycle (3 months) and high economic importance as a food crop legume make chickpea (Cicer arietinum L.) an important system for genomics research. Although several genetic linkage maps using various markers and genomic tools have become available, sequencing efforts and their use are limited in chickpea genomic research. In this study, we explored the genome organization of chickpea by sequencing approximately 500 kb from 11 BAC clones (three representing ascochyta blight resistance QTL1 (ABR-QTL1) and eight randomly selected BAC clones). Our analysis revealed that these sequenced chickpea genomic regions have a gene density of one per 9.2 kb, an average gene length of 2,500 bp, an average of 4.7 exons per gene, with an average exon and intron size of 401 and 316 bp, respectively, and approximately 8.6% repetitive elements. Other features analyzed included exon and intron length, number of exons per gene, protein length and %GC content. Although there are reports on high synteny among legume genomes, the microsynteny between the 500 kb chickpea and available Medicago truncatula genomic sequences varied depending on the region analyzed. The GBrowse-based annotation of these BACs is available at http://www.genome.ou.edu/plants_totals.html. We believe that our work provides significant information that supports a chickpea genome sequencing effort in the future.

12.
13.
Six diverse prokaryotic and five eukaryotic genomes were compared to deduce whether the protein synthesis termination signal has common determinants within and across both kingdoms. Four of the six prokaryotic and all of the eukaryotic genomes investigated demonstrated a similar pattern of nucleotide bias both 5′ and 3′ of the stop codon. A preferred core signal of 4 nt was evident, encompassing the stop codon and the following nucleotide. Codons decoded by hyper-modified tRNAs were over-represented in the region 5′ to the stop codon in genes from both kingdoms. The origin of the 3′ bias was more variable particularly among the prokaryotic organisms. In both kingdoms, genes with the highest expression index exhibited a strong bias but genes with the lowest expression showed none. Absence of bias in parasitic prokaryotes may reflect an absence of pressure to evolve more efficient translation. Experiments were undertaken to determine if a correlation existed between bias in signal abundance and termination efficiency. In Escherichia coli signal abundance correlated with termination efficiency for UAA and UGA stop codons, but not in mammalian cells. Termination signals that were highly inefficient could be made more efficient by increasing the concentration of the cognate decoding release factor.

14.
15.
The genomic DNAs of 11 species of percid fishes representing the five recognized North American genera are characterized using data from thermal denaturation assays. Base compositions were estimated from the transitional melting temperature of native and sonicated DNA and expressed as per cent guanine-cytosine (%GC) values. Among genera, %GC values for native DNAs (c. 23,000 base pairs in length) range from 38.3% GC for yellow perch, Perca flavescens (Mitchill), to 43.2% GC for sauger, Stizostedion canadense (Smith). Significant variation in %GC values was observed among surveyed genera of the subfamily Percinae, which include Perca, Percina, Etheostoma and Ammocrypta. Melting profiles were generated for each species, and distinct GC-rich regions were identified within the genomes of walleye, Stizostedion vitreum (Mitchill), and Etheostoma spp. Compositional heterogeneity (CH) and asymmetry values were calculated from melting profile data. Patterns of variation in genomic characters differed among the genera surveyed. Members of the speciose genus Etheostoma showed relatively little variation in genomic characters, whereas Stizostedion exhibited significant interspecific variation.

16.
It is well known that stop codons play a critical role in the process of protein synthesis. However, little effort has been made to investigate whether stop codon usage exhibits biases such as those widely seen for synonymous codon usage. Here we systematically investigate stop codon usage bias in various eukaryotes as well as its relationships with its context, GC3 content, gene expression level, and secondary structure. The results show that there is a strong bias for stop codon usage in different eukaryotes, i.e., UAA is overrepresented in the lower eukaryotes, UGA is overrepresented in the higher eukaryotes, and UAG is least used in all eukaryotes. Different conserved patterns for each stop codon in different eukaryotic classes are found based on information content and logo analysis. GC3 contents increase with increasing complexity of organisms. Secondary structure prediction revealed that UAA is generally associated with loop structures, whereas UGA is more uniformly present in loop and stem structures, i.e., UGA is less biased toward having a particular structure. The stop codon usage bias, however, shows no significant relationship with GC3 content and gene expression level in individual eukaryotes. The results indicate that genomic complexity and GC3 content might contribute to stop codon usage bias in different eukaryotes. Our results indicate that stop codons, like synonymous codons, exhibit biases in usage. Additional work will be needed to understand the causes of these biases and their relationship to the mechanism of protein termination. [Reviewing Editor: Dr. Manyuan Long]

17.
18.
Salim HM  Ring KL  Cavalcanti AR 《Protist》2008,159(2):283-298
We used the recently sequenced genomes of the ciliates Tetrahymena thermophila and Paramecium tetraurelia to analyze the codon usage patterns in both organisms, examining codon usage bias, Gln codon usage, GC content and the nucleotide contexts of initiation and termination codons. We also studied how these trends change along the length of the genes and in a subset of highly expressed genes. Our results corroborate some of the trends previously described in Tetrahymena, but also negate some specific observations. In both genomes we found a strong bias toward codons with low GC content; however, in highly expressed genes this bias is smaller and codons ending in G or C tend to be more frequent. We also found that codon bias increases along gene segments and in highly expressed genes, and that the contexts surrounding initiation and termination codons are always AT-rich. Our results also suggest differences in the efficiency of translation of the reassigned stop codons between the two species and between the reassigned codons. Finally, we discuss some of the possible causes of these differences in translational efficiency.

19.
The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a) base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b) the distance of each usage has a linear correlation with the GC content difference; and c) GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitive graphical methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.
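The idea of a usage "distance" between genomes in the abstract above can be sketched as follows. The abstract does not specify the metric, so the Euclidean distance between 64-dimensional codon frequency vectors used here is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so frequency vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)

def codon_frequencies(cds_list):
    """Relative frequency of each of the 64 codons over in-frame CDSs."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - 2, 3):   # complete codons only
            codon = cds[i:i + 3]
            if codon in CODON_SET:            # skip codons with N or other symbols
                counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]

def usage_distance(freq_a, freq_b):
    """Euclidean distance between two frequency vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(freq_a, freq_b)))
```

Plotting `usage_distance` for pairs of genomes against their difference in GC content would give the kind of relationship summarized in point b) of the abstract.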

20.
Huang G  Wen Q  Gao Q  Zhang F  Bai Y 《Biotechnology letters》2011,33(10):1939-1947
As gene cloning from difficult templates with regionalized high GC content is a long-recognized problem, we have developed a novel and reliable method to clone such genes. First, the high-GC region of the target cDNA was synthesized directly after codon optimization, and the remaining cDNA fragment without high GC content was generated by routine RT-PCR. Then the entire redesigned coding sequence of the target gene was obtained by fusing the two cDNA fragments with SOE-PCR (splicing by overlapping extension-PCR). We have cloned the human RANK gene (ten exons; CDS 1851 bp) using this strategy. The redesigned cDNA was transfected into a eukaryotic expression system (A459 cells) to verify its expression. RT-PCR and western blotting confirmed this. To validate our method, we also successfully cloned the human TIMP2 gene (five exons; CDS 660 bp), which also has a regionalized high GC content. Our strategy of combining codon optimization and SOE-PCR to clone difficult genes is thus feasible and potentially universally applicable.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  ICP registration: 京ICP备09084417号