Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Tandem stop codons are extra stop codons hypothesized to be present downstream of genes to act as a backup in case of read-through of the real stop codon. Although such codons are seemingly absent from Escherichia coli, recent studies have confirmed their presence in yeast. In this paper we analyze the genomes of two ciliate species (Paramecium tetraurelia and Tetrahymena thermophila) that reassign the stop codons TAA and TAG to glutamine, for the presence of tandem stop codons. We show that there are more tandem stop codons downstream of both Paramecium and Tetrahymena genes than expected by chance given the base composition of the downstream regions. This excess of tandem stop codons is larger in Tetrahymena and Paramecium than in yeast. We propose that this might be caused by a higher frequency of stop codon read-through in these species than in yeast, possibly because of a leaky termination machinery resulting from stop codon reassignment.
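
The comparison of observed against chance-expected tandem stop codons can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `tandem_stop_stats`, the 10-codon window, and the zero-order background model (independent bases drawn from the downstream region's own composition) are all hypothetical choices. The default stop set contains only TGA because, per the abstract, TAA and TAG encode glutamine in these ciliates.

```python
from collections import Counter

def tandem_stop_stats(downstream, stops=("TGA",), n_codons=10):
    """Return (observed, expected) in-frame stop codons in the first
    n_codons codons of the sequence that follows the real stop codon."""
    seq = downstream.upper()
    limit = min(len(seq), 3 * n_codons)
    # In-frame codons, starting right after the real stop codon.
    codons = [seq[i:i + 3] for i in range(0, limit - 2, 3)]
    observed = sum(codon in stops for codon in codons)

    # Assumed background: base frequencies of the downstream region itself,
    # with bases treated as independent (zero-order model).
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGT")
    freq = {b: counts[b] / total for b in "ACGT"}
    p_stop = sum(freq[s[0]] * freq[s[1]] * freq[s[2]] for s in stops)
    return observed, p_stop * len(codons)
```

Summing the observed and expected values over all genes of a genome would give the kind of excess the abstract describes; the input is assumed to be a non-empty sequence of A, C, G and T.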

2.
Chen LL, Gao F. The FEBS Journal, 2005, 272(13): 3328-3336
Eukaryotic genomes are composed of isochores, i.e. long sequences relatively homogeneous in GC content. In this paper, the isochore structure of the Arabidopsis thaliana genome has been studied using a windowless technique based on the Z curve method, and intuitive curves are drawn for all five chromosomes. Using these curves, we can calculate the GC content at any resolution, even at the base level. It is observed that all five chromosomes are composed of several alternating GC-rich and AT-rich regions. Usually, these regions, named 'isochore-like regions', have large fluctuations in the GC content. Five isochores with little fluctuation are also observed. Detailed analyses have been performed for these isochores. A GC-rich 'isochore-like region' and a GC-isochore in chromosomes II and IV, respectively, are the nucleolar organizer regions (NORs), and genes located in the two regions prefer to use GC-ending codons. Another GC-isochore located in chromosome II is a mitochondrial DNA insertion region; the position and size of this region are precisely predicted by the current method. The amino acid usage and codon preference of genes in this organellar-to-nuclear transfer region show significant differences from other regions. Moreover, the centromeres are located in GC-rich 'isochore-like regions' in all five chromosomes. The current method can provide a useful tool for analyzing whole genomic sequences of eukaryotes.
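
How a windowless curve yields GC content "at any resolution" can be illustrated with the z component of the Z curve, which is commonly defined as the cumulative count of (A + T) minus (G + C). The sketch below assumes that definition, since the abstract gives no formula, and the function names are hypothetical. It also assumes the sequence contains only A, C, G and T.

```python
def z_component(seq):
    """Cumulative (A+T) - (G+C) after each base; z[0] = 0."""
    z = [0]
    for base in seq.upper():
        if base in "AT":
            z.append(z[-1] + 1)
        elif base in "GC":
            z.append(z[-1] - 1)
        else:
            z.append(z[-1])  # ambiguous bases are not handled properly here
    return z

def gc_content(z, start, end):
    """GC fraction of seq[start:end], recovered from the curve alone.

    Over the segment, AT - GC = z[end] - z[start] and AT + GC = end - start,
    so GC / length = 0.5 * (1 - slope).
    """
    length = end - start
    return 0.5 * (1 - (z[end] - z[start]) / length)
```

Because only two points on the curve are needed, any segment, down to a single base, can be queried without choosing a sliding-window size.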

3.
It has been hypothesized that the length of an exon tends to increase with the GC content because stop codons are AT-rich and should occur less frequently in GC-rich exons. This prediction assumes that mutation pressure plays a significant role in the occurrence and distribution of stop codons. However, the prediction is applicable not to all exons, but only to the last coding exon of a gene and to single-exon CDS sequences. We classified exons in multiexon genes in eight eukaryotic species into three groups (the first exon, the internal exons, and the last exon) and computed the Spearman correlation between the exon length and the percentage GC (%GC) for each of the three groups. In only five of the species studied is the correlation for the last coding exon greater than that for the first or internal exons. For the single-exon CDS sequences, the correlation between CDS length and %GC is mostly negative. Thus, eukaryotic genomes do not support the predicted relationship between exon length and %GC. In prokaryotic genomes, CDS length and %GC are positively correlated in each of the 68 completely sequenced prokaryotic genomes in GenBank with genomic GC contents varying from 25 to 68%, except for the wall-less Mycoplasma genitalium and the syphilis pathogen Treponema pallidum. Moreover, the average CDS length and the genomic GC content are also positively correlated. After correcting for genome size, the partial correlation between the average CDS length and the genomic GC content is 0.3217 (p < 0.025).
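
The per-group statistic named in the abstract, the Spearman correlation between exon length and %GC, can be sketched in plain Python as the Pearson correlation of average ranks. The helper names are hypothetical, and the grouping of exons into first, internal and last is assumed to be done by the caller.

```python
def gc_percent(seq):
    """Percentage of G and C in a sequence."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y))
```

For one exon group, the call would be `spearman([len(e) for e in exons], [gc_percent(e) for e in exons])`, where `exons` is a list of sequences with varying lengths and GC contents.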

4.
The genomes of homeothermic (warm-blooded) vertebrates are mosaic interspersions of homogeneously GC-rich and GC-poor regions (isochores). Evolution of genome compartmentalization and GC-rich isochores is hypothesized to reflect either selective advantages of an elevated GC content or chromosome location and mutational pressure associated with the timing of DNA replication in germ cells. To address the present controversy regarding the origins and maintenance of isochores in homeothermic vertebrates, newly obtained as well as published nucleotide sequences of the insulin and insulin-like growth factor (IGF) genes, members of a well-characterized gene family believed to have evolved by repeated duplication and divergence, were utilized to examine the evolution of base composition in nonconstrained (flanking) and weakly constrained (introns and fourfold degenerate sites) regions. A phylogeny derived from amino acid sequences supports a common evolutionary history for the insulin/IGF family genes. In cold-blooded vertebrates, insulin and the IGFs were similar in base composition. In contrast, insulin and IGF-II demonstrate dramatic increases in GC richness in mammals, but no such trend occurred in IGF-I. Base composition of the coding portions of the insulin and IGF genes across vertebrates correlated (r = 0.90) with that of the introns and flanking regions. The GC content of homologous introns differed dramatically between insulin/IGF-II and IGF-I genes in mammals but was similar to the GC level of noncoding regions in neighboring genes. Our findings suggest that the base composition of introns and flanking regions is determined by chromosomal location and the mutational pressure of the isochore in which the sequences are embedded. An elevated GC content at codon third positions in the insulin and the IGF genes may reflect selective constraints on the usage of synonymous codons.

5.
Synonymous codon choices vary considerably among Schistosoma mansoni genes. Principal components analysis detects a single major trend among genes, which highly correlates with GC content in third codon positions and exons, but does not discriminate among putatively highly and lowly expressed genes. The effective number of codons used in each gene, and its distribution when plotted against GC3, suggests that codon usage is shaped mainly by mutational biases. The GC content of exons, GC3, 5′, 3′, and flanking (5′+ 3′+ introns) regions are all correlated among them, suggesting that variations in GC content may exist among different regions of the S. mansoni genome. We propose that this genome structure might be among the most important factors shaping codon usage in this species, although the action of selection on certain sequences cannot be excluded. Received: 10 March 1997 / Accepted: 27 June 1997
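
Plots of the effective number of codons (Nc) against GC3 are usually read against the curve expected when codon usage is determined by base composition alone. The sketch below assumes the widely used expression from Wright (1990), Nc = 2 + s + 29 / (s^2 + (1 - s)^2), with s the GC3 fraction. The abstract does not state which reference curve was used, so this is an assumption. The simplified `gc3` helper is also hypothetical: it counts every third position, whereas published analyses typically restrict GC3 to synonymous sites.

```python
def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    s = cds.upper()
    thirds = [s[i + 2] for i in range(0, len(s) - 2, 3)]
    return sum(base in "GC" for base in thirds) / len(thirds)

def expected_nc(s):
    """Expected Nc when only GC3 (s, between 0 and 1) constrains codon choice."""
    return 2 + s + 29 / (s * s + (1 - s) * (1 - s))
```

Genes whose observed Nc falls close to `expected_nc(gc3(cds))` are consistent with mutational bias being the main influence, which is the interpretation the abstract reports.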

6.
Base compositions were examined at every position in codons of more than 50 genes from taxonomically different bacteria and of the corresponding antisense sequences on the bacterial genes. We propose that the nonstop frame on antisense strand [NSF(a)] of GC-rich bacterial genes is the most promising sequence for newly-born genes. Reasons are: (i) NSF(a) frequently appears on the antisense strand of GC-rich bacterial genes; (ii) base compositions at three positions in the codon are nearly symmetrical between the gene having around 55% GC content and the corresponding NSF(a); (iii) amino acid compositions of actual proteins are also similar to those of hypothetical proteins from the GC-rich NSF(a); and (iv) proteins from NSF(a) of 60% or more GC content are flexible enough to adapt to various molecules encountered as novel substrates, due to the high glycine content. To support our proposition, using a computer we generated hypothetical antisense sequences with the same base compositions as of NSF(a) at each base position in the codon, and examined properties of resulting proteins encoded by the imaginary genes. It was confirmed that NSF(a) of GC-rich gene carrying about 60% GC content is competent enough for a newly-born gene.

7.
Summary. The compositional distribution of coding sequences from five vertebrates (Xenopus, chicken, mouse, rat, and human) is shifted toward higher GC values compared to that of the DNA molecules (in the 35–85-kb size range) isolated from the corresponding genomes. This shift is due to the lower GC levels of intergenic sequences compared to coding sequences. In the cold-blooded vertebrate, the two distributions are similar in that GC-poor genes and GC-poor DNA molecules are largely predominant. In contrast, in the warm-blooded vertebrates, GC-rich genes are largely predominant over GC-poor genes, whereas GC-poor DNA molecules are largely predominant over GC-rich DNA molecules. As a consequence, the genomes of warm-blooded vertebrates show a compositional gradient of gene concentration. The compositional distributions of coding sequences (as well as of DNA molecules) showed remarkable differences between chicken and mammals, and between mouse (or rat) and human. Differences were also detected in the compositional distribution of housekeeping and tissue-specific genes, the former being more abundant among GC-rich genes.

8.
9.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular, in detecting protein-coding ORFs. Traditional methods based on stop codon frequency are based on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds of potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
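
The dependence of stop codon frequency on GC content can be made concrete with a simple background model. The sketch below assumes independent bases with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2. The abstract does not spell out its model, and the function names and the `alpha` false-positive parameter are hypothetical. At 50% GC the model reproduces the 3/64 figure quoted above, i.e. roughly one stop per 21 trinucleotides.

```python
import math

def stop_codon_probability(gc):
    """P(random trinucleotide is TAA, TAG or TGA) for GC fraction gc."""
    at = (1 - gc) / 2  # P(A) = P(T)
    g = gc / 2         # P(G) = P(C)
    return at ** 3 + 2 * at * at * g  # TAA + (TAG and TGA)

def prob_stop_free(n_codons, gc):
    """P(n_codons consecutive random codons contain no stop codon)."""
    return (1 - stop_codon_probability(gc)) ** n_codons

def length_threshold(gc, alpha=0.01):
    """Smallest ORF length (in codons) that random sequence reaches
    with probability at most alpha under the assumed model."""
    p = stop_codon_probability(gc)
    if p == 0:
        return math.inf  # no stop codons possible at 100% GC
    return math.ceil(math.log(alpha) / math.log(1 - p))
```

Under this model the threshold grows quickly as GC content rises, because stop codons become rare in random sequence, which is why ORF length alone loses discriminating power in GC-rich genomes.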

10.
DNA helix: the importance of being GC-rich   Cited by: 14 (self-citations: 2, citations by others: 12)

11.
The short-chain oxidoreductase (SCOR) family of enzymes includes over 6000 members, extending from bacteria and archaea to humans. Nucleic acid sequence analysis reveals that significant numbers of these genes are remarkably free of stop codons in reading frames other than the coding frame, including those on the antisense strand. The genes from this subset also use almost entirely the GC-rich half of the 64 codons. Analysis of a million hypothetical genes having random nucleotide composition shows that the percentage of SCOR genes having multiple open reading frames exceeds random by a factor of as much as 1 × 10^6. Nevertheless, screening the content of the SWISS-PROT TrEMBL database reveals that 15% of all genes contain multiple open reading frames. The SCOR genes having multiple open reading frames and a GC-rich coding bias exhibit a similar GC bias in the nucleotide triple composition of their DNA. This bias is not correlated with the GC content of the species in which the SCOR genes are found. One possible explanation for the conservation of multiple open reading frames and extreme bias in nucleic acid composition in the family of Rossmann folds is that the primordial member of this family was encoded early using only very stable GC-rich DNA and that evolution proceeded with extremely limited introduction of any codons having two or more adenine or thymine nucleotides. These and other data suggest that the SCOR family of enzymes may even have diverged from a common ancestor before most of the AT-rich half of the genetic code was fully defined.

12.
Vertebrate genomes are characterized by CpG deficiency, particularly in GC-poor regions. The GC content-related CpG deficiency is probably caused by context-dependent deamination of methylated CpG sites. This hypothesis was examined in this study by comparing nucleotide frequencies at CpG flanking positions among invertebrate and vertebrate genomes. The finding is a transition of nucleotide preference of 5' T to 5' A at the invertebrate-vertebrate boundary, indicating that a large number of CpG sites with 5' Ts were depleted because of global DNA methylation developed in vertebrates. At genome level, we investigated CpG observed/expected (obs/exp) values in 500 bp fragments, and found that higher CpG obs/exp value is shown in GC-poor regions of invertebrate genomes (except sea urchin) but in GC-rich sequences of vertebrate genomes. We next compared GC content at CpG flanking positions with genomic average, showing that the GC content is lower than the average in invertebrate genomes, but higher than that in vertebrate genomes. These results indicate that although 5' T and 5' A are different in inducing deamination of methylated CpG sites, GC content is even more important in affecting the deamination rate. In all the tests, the results of sea urchin are similar to vertebrates perhaps due to its fractional DNA methylation. CpG deficiency is therefore suggested to be mainly a result of high mutation rates of methylated CpG sites in GC-poor regions.
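
The per-fragment CpG obs/exp value can be sketched as below. The abstract does not give its exact formula, so the code assumes the commonly used ratio (number of CG dinucleotides × fragment length) / (number of C × number of G). The function names are hypothetical, and the 500 bp default mirrors the fragment size mentioned in the abstract.

```python
def cpg_obs_exp(fragment):
    """CpG observed/expected ratio of one fragment, or None if undefined."""
    s = fragment.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return None
    return s.count("CG") * len(s) / (c * g)

def gc_fraction(fragment):
    s = fragment.upper()
    return (s.count("G") + s.count("C")) / len(s)

def scan(genome, size=500):
    """Yield (GC fraction, CpG obs/exp) for non-overlapping full-size fragments."""
    for start in range(0, len(genome) - size + 1, size):
        fragment = genome[start:start + size]
        yield gc_fraction(fragment), cpg_obs_exp(fragment)
```

Binning the `scan` output by GC fraction would show whether high obs/exp values sit in GC-poor or GC-rich fragments, which is the contrast the abstract draws between invertebrate and vertebrate genomes.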

13.
We present an algorithm to detect distances between oligonucleotides in large collections of nucleic acid sequences. The ratios of actual frequencies of occurrence of short oligonucleotides at a given distance to the corresponding expected frequencies were analyzed in four categories of DNA sequences (eukaryotic exons, bacterial genes, introns and non-Alu repeated DNAs). Three base periodic occurrences (independent of the reading frame) of all combinations of mononucleotides and repeats of all dinucleotides were characteristic for protein coding regions. This was also the case with the majority of trinucleotides (including translational stop signals) in these regions. Mirror-symmetric trinucleotides (except GCG and CGC) displayed a strong tendency to be two base periodically repeated in introns. Some two and three base periodic motifs were also observed in repeated DNAs. The possible biological implications of outstanding three base periodicities in bacterial genes and eukaryotic exons are discussed. Received on March 2, 1987; accepted on May 5, 1987
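
The ratio of actual to expected co-occurrence frequency at a given distance can be illustrated with a minimal sketch. The abstract does not describe the algorithm itself, so everything here is an assumption: the function names are hypothetical, and the expected count treats the two oligonucleotides as occurring independently at their overall frequencies in the sequence.

```python
def occurrences(seq, oligo):
    """Number of (possibly overlapping) occurrences of oligo in seq."""
    return sum(1 for i in range(len(seq) - len(oligo) + 1)
               if seq.startswith(oligo, i))

def distance_ratio(seq, a, b, d):
    """Observed / expected count of oligo b starting d bases after oligo a."""
    s = seq.upper()
    positions = len(s) - d - len(b) + 1
    if positions <= 0:
        return None
    observed = sum(1 for i in range(positions)
                   if s.startswith(a, i) and s.startswith(b, i + d))
    freq_a = occurrences(s, a) / (len(s) - len(a) + 1)
    freq_b = occurrences(s, b) / (len(s) - len(b) + 1)
    expected = positions * freq_a * freq_b
    return observed / expected if expected else None
```

Evaluating `distance_ratio` for d = 1, 2, 3, ... and looking for peaks at multiples of three is one way a three-base periodicity of the kind reported for coding regions would show up.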

14.
Schmegner C, Hoegel J, Vogel W, Assum G. Genetics, 2007, 175(1): 421-428
The human genome is composed of long stretches of DNA with distinct GC contents, called isochores or GC-content domains. A boundary between two GC-content domains in the human NF1 gene region is also a boundary between domains of early- and late-replicating sequences and of regions with high and low recombination frequencies. The perfect conservation of the GC-content distribution in this region between human and mouse demonstrates that GC-content stabilizing forces must act regionally on a fine scale at this locus. To further elucidate the nature of these forces, we report here on the spectrum of human SNPs and base pair substitutions between human and chimpanzee. The results show that the mutation rate changes exactly at the GC-content transition zone from low values in the GC-poor sequences to high values in GC-rich ones. The GC content of the GC-poor sequences can be explained by a bias in favor of GC > AT mutations, whereas the GC content of the GC-rich segment may result from a fixation bias in favor of AT > GC substitutions. This fixation bias may be explained by direct selection by the GC content or by biased gene conversion.

15.
Congenital adrenal hyperplasia (CAH) is a common autosomal recessive disorder mainly caused by defects in the steroid 21-hydroxylase (CYP21) gene. We have analyzed CYP21 gene sequences in 65 CAH families in Taiwan. All ten exons of the CYP21 gene were analyzed by differential polymerase chain reaction followed by single-strand conformation polymorphism electrophoresis and the amplification-created restriction site method. About 95% (123 chromosomes) contain mutations due to conversion of DNA sequences into its neighboring homologous pseudogene, CYP21P. Four novel mutations representing 5% of the total chromosomes have also been identified. The mutations were confirmed by sequencing an aberrant DNA fragment. These four mutations included a base change of the splicing donor site at intron 2 from GT to AT, a base substitution of C to T at codon 316, deletion of ten bases (TCCAGCTCCC) at codons 330–333 of exon 8, and duplication of 16 bases (CCTGGATGACACGGTC) at codons 393–397 of exon 9. The loss of the splicing donor site at intron 2 and the premature stop at codon 316 may result in aberrant splicing to reduce enzyme activity and a truncated protein with no enzyme activity, respectively. Likewise, both the duplication and the deletion forms create a frameshift and premature stop during translation. The resulting proteins lack the heme-binding domain and hence are expected to lose enzymatic activity. Since these mutations are not found in the neighboring CYP21P pseudogene, gene conversion should not be the cause of these novel mutations. Received: 20 April 1998 / Accepted: 30 May 1998

16.
Vertebrate genomes are composed of isochores, which are relatively long (>100 kb) regions with a relatively homogeneous (either GC-rich or AT-rich) base composition and with rather sharp boundaries with neighboring isochores. Mammals and living archosaurs (birds and crocodilians) have heterogeneous genomes that include very GC-rich isochores. In sharp contrast, the genomes of amphibians and fishes are more homogeneous and they have a lower overall GC content. Because DNA with higher GC content is more thermostable, the elevated GC content of mammalian and archosaurian DNA has been hypothesized to be an adaptation to higher body temperatures. This hypothesis can be tested by examining the structure of isochores across the reptilian clade, which includes the archosaurs, testudines (turtles), and lepidosaurs (lizards and snakes), because reptiles exhibit diverse body sizes, metabolic rates, and patterns of thermoregulation. This study focuses on a comparative analysis of a new set of expressed genes of the red-eared slider turtle and orthologs of the turtle genes in mammalian (human, mouse, dog, and opossum), archosaurian (chicken and alligator), and amphibian (western clawed frog) genomes. EST (expressed sequence tag) data from a turtle cDNA library enriched for genes that have specialized functions (developmental genes) revealed that using the GC content of the third codon position to examine isochore structure requires careful consideration of the types of genes examined. The more highly expressed genes (e.g., housekeeping genes) are more likely to be GC-rich than are genes with specialized functions. However, the set of highly expressed turtle genes demonstrated that the turtle genome has a GC content that is intermediate between the GC-poor amphibians and the GC-rich mammals and archosaurs. There was a strong correlation between the GC content of all turtle genes and the GC content of other vertebrate genes, with the slope of the line describing this relationship also indicating that the isochore structure of turtles is intermediate between that of amphibians and other amniotes. These data are consistent with some thermal hypotheses of isochore evolution, but we believe that the credible set of models for isochore evolution still includes a variety of models. These data expand the amount of genomic data available from reptiles upon which future studies of reptilian genomics can build.

17.
Plant introns are typically AU-rich or U-rich, and this feature has been shown to be important for splicing. In maize, however, about 20% of the introns exceed 50% GC, and most of them are efficiently spliced. A series of constructs has been designed to analyze the cis requirements for splicing of the GC-rich Bz2 maize intron and two other GC-rich intron derivatives. By manipulating exon, intron and splice site sequences it is shown that exons can play an important role in intron definition: changes in exon sequences can increase splicing efficiency of a GC-rich intron from 17% to 86%. The relative difference, or base compositional contrast, in GC and U content between exon and intron sequences in the vicinity of splice sites, rather than the absolute base-content of the intron or exons, correlates with splicing efficiency. It is also shown that GC-rich intron constructs that are poorly spliced can be partially rescued by an improved 3' splice site.

18.
The mechanism by which protein-coding portions of eukaryotic genes came to be separated by long non-coding stretches of DNA, and the purpose for this perplexing arrangement, have remained unresolved fundamental biological problems for three decades. We report here a plausible solution to this problem based on analysis of open reading frame (ORF) length constraints in the genomes of nine diverse species. If primordial nucleic acid sequences were random in sequence, functional proteins that are innately long would not be encoded due to the frequent occurrence of stop codons. The best possible way that a long protein-coding sequence could have been derived was by evolving a split-structure from the random DNA (or RNA) sequence. Results of the systematic analyses of nine complete genome sequences presented here suggest that perhaps the major underlying structural features of split-genes have evolved due to the indigenous occurrence of split protein-coding genes in primordial random nucleotide sequence. The results also suggest that intron-rich genes containing short exons may have been the original form of genes intrinsically occurring in random DNA, and that intron-poor genes containing long exons were perhaps derived from the original intron-rich genes.

19.
Role of premature stop codons in bacterial evolution   Cited by: 1 (self-citations: 0, citations by others: 1)
When the stop codons TGA, TAA, and TAG are found in the second and third reading frames of a protein-encoding gene, they are considered premature stop codons (PSC). Deinococcus radiodurans disproportionately favored TGA more than the other two triplets as a PSC. The TGA triplet was also found more often in noncoding regions and as a stop codon, though the bias was less pronounced. We investigated this phenomenon in 72 bacterial species with widely differing chromosomal GC contents. Although TGA and TAG were compositionally similar, we found a great variation in use of TGA but a very limited range of use of TAG. The frequency of use of TGA in the gene sequences generally increased with the GC content of the chromosome, while the frequency of use of TAG, like that of TAA, was inversely proportional to the GC content of the chromosome. The patterns of use of TAA, TGA and TAG as real stop codons were less biased and less influenced by the GC content of the chromosome. Bacteria with higher chromosomal GC contents often contained fewer PSC trimers in their genes. Phylogenetically related bacteria often exhibited similar PSC ratios. In addition, metabolically versatile bacteria have significantly fewer PSC trimers in their genes. The bias toward TGA but against TAG as a PSC could not be explained either by the preferential usage of specific codons or by the GC contents of individual chromosomes. We proposed that the quantity and the quality of the PSC in the genome might be important in bacterial evolution.
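
The definition of a premature stop codon given above, a TGA, TAA or TAG triplet read in the second or third frame of a gene, translates directly into a counting routine. The sketch below is a hypothetical illustration, not the authors' code. It assumes the input is the coding strand starting at the first base of the start codon.

```python
STOPS = ("TAA", "TAG", "TGA")

def premature_stop_counts(cds):
    """Count stop triplets in the second and third reading frames of a gene."""
    s = cds.upper()
    counts = dict.fromkeys(STOPS, 0)
    for offset in (1, 2):  # frame 1 (offset 0) is the coding frame itself
        for i in range(offset, len(s) - 2, 3):
            triplet = s[i:i + 3]
            if triplet in counts:
                counts[triplet] += 1
    return counts
```

Dividing each count by gene length, or comparing the TGA, TAA and TAG counts with one another across genomes of different GC content, gives the kind of PSC ratio discussed in the abstract.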

20.
Although bacterial species display wide variation in their overall GC contents, the genes within a particular species' genome are relatively similar in base composition. As a result, sequences that are novel to a bacterial genome (i.e., DNA introduced through recent horizontal transfer) often bear unusual sequence characteristics and can be distinguished from ancestral DNA. At the time of introgression, horizontally transferred genes reflect the base composition of the donor genome; but, over time, these sequences will ameliorate to reflect the DNA composition of the new genome because the introgressed genes are subject to the same mutational processes affecting all genes in the recipient genome. This process of amelioration is evident in a large group of genes involved in host-cell invasion by enteric bacteria and can be modeled to predict the amount of time required after transfer for foreign DNA to resemble native DNA. Furthermore, models of amelioration can be used to estimate the time of introgression of foreign genes in a chromosome. Applying this approach to a 1.43-megabase continuous sequence, we have calculated that the entire Escherichia coli chromosome contains more than 600 kb of horizontally transferred, protein-coding DNA. Estimates of amelioration times indicate that this DNA has accumulated at a rate of 31 kb per million years, which is on the order of the amount of variant DNA introduced by point mutations. This rate predicts that the E. coli and Salmonella enterica lineages have each gained and lost more than 3 megabases of novel DNA since their divergence. Received: 7 July 1996 / Accepted: 27 September 1996
