Similar articles
 20 similar articles found (search time: 343 ms)
1.
Analysis of DNA sequences of 132 introns and 140 exons from 42 pairs of orthologous genes of mouse and rat was used to compare patterns of evolutionary change between introns and exons. The mean of the absolute difference in length (measured in base pairs) between the two species was nearly five times as high in the case of introns as in the case of exons. The average rate of nucleotide substitution in introns was very similar to the rate of synonymous substitution in exons, and both were about three times the rate of substitution at nonsynonymous sites in exons. The G+C contents of introns and exons of the same gene were correlated, but mean G+C content at the third positions of exons was significantly higher than that of introns or of positions 1–2 of exons from the same gene. G+C content was conserved over evolutionary time, as indicated by strong correlations between mouse and rat, but the change in G+C content was greatest at position 3 of exons, intermediate in introns, and lowest at positions 1–2 of exons. Received: 23 December 1996 / Accepted: 1 April 1997
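The comparison of G+C content between introns, codon positions 1–2, and codon position 3 can be illustrated with a minimal sketch. The function names and the toy sequences below are assumptions made for illustration; they are not the authors' code or data, and the sketch assumes an in-frame coding sequence.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_by_codon_position(cds):
    """Return G+C fractions for codon positions 1-2 combined and for position 3.

    Assumes `cds` is an in-frame coding sequence starting at codon position 1.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    cds = cds[:usable]
    pos12 = "".join(cds[i:i + 2] for i in range(0, usable, 3))
    pos3 = cds[2::3]
    return gc_fraction(pos12), gc_fraction(pos3)


# Toy, invented sequences purely for illustration (not data from the study).
exon = "ATGGCCAAGCTG"
intron = "GTAAGTATTTTCAG"
gc12, gc3 = gc_by_codon_position(exon)
print(f"exon pos 1-2: {gc12:.2f}, exon pos 3: {gc3:.2f}, intron: {gc_fraction(intron):.2f}")
```

Applied to aligned orthologous sequences from two species, the same per-class fractions could then be correlated or differenced, which is the kind of comparison the abstract describes.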

2.
We previously observed that Antarctic fish genes contain intron sequences of high A+T content (averaging 60-70% A+T), in stark contrast to the adjacent protein-coding sequences. Here, we report that this disparity in intron/exon base composition is a common feature among teleosts. We analyzed 483 teleost genomic DNA sequences, containing 2583 introns, from 80 teleost genera that populate polar, temperate, or tropical habitats. Eighty-nine percent of teleost introns display an A+T content between 50% and 84%, with a mean of 60% A+T. In contrast, only 37% of teleost exons have an A+T content greater than 50%, with a mean of 48% A+T. A comparison to homologous mammalian genes showed a striking difference; in this case, introns and exons have similar base compositions, averaging 45-47% A+T. This indicates that most teleost genes exhibit a large difference in base composition between their introns and exons. There was no correlation of teleost intron A+T content to intron length or habitat temperature range. Thus, teleost intron sequences tend to show the common feature of being much higher in A+T content than neighboring exons.

3.
CpG islands in vertebrate genomes   Total citations: 120, self-citations: 0, citations by others: 120

4.
Base composition is not uniform across the genome of Drosophila melanogaster. Earlier analyses have suggested that there is variation in composition in D. melanogaster on both a large scale and a much smaller, within-gene, scale. Here we present analyses on 117 genes which have reliable intron/exon boundaries and no known alternative splicing. We detect significant heterogeneity in G+C content among intron segments from the same gene, as well as a significant positive correlation between the intron and the third codon position G+C content within genes. Both of these observations appear to be due, in part, to an overall decline in intron and third codon position G+C content along Drosophila genes with introns. However, there is also evidence of an increase in third codon position G+C content at the start of genes; this is particularly evident in genes without introns. This is consistent with selection acting against preferred codons at the start of genes. Received: 24 February 1997 / Accepted: 10 November 1997

5.
6.
A positive correlation was found between the G + C content at the codon third position in genes of vertebrates and the G + C content of the genome portion surrounding each gene. Exons of genes with a high G + C% at the codon third position are surrounded by G + C-rich introns and G + C-rich flanking sequences, and those with a low G + C% at that position by A + T-rich introns and flanking sequences. Analysis of G + C content distribution along DNA sequences using a DNA Sequence Data Bank supported the view that the vertebrate genome is a mosaic of regions with clear differences in their G + C content. The biological significance of the variation in G + C content throughout the vertebrate genome is discussed in connection with chromosomal banding.
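A G + C distribution along a DNA sequence is commonly profiled with a sliding window. The sketch below is one simple way to do that, not the procedure used in the study; the function name, window settings, and toy sequence are assumptions for illustration.

```python
def sliding_gc(seq, window, step):
    """Return (start, G+C fraction) for each full window along seq."""
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = sum(base in "GC" for base in chunk) / window
        profile.append((start, gc))
    return profile


# Invented toy sequence: an A+T-rich block, a G+C-rich block, a mixed block.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATATGCGCAT"
for start, gc in sliding_gc(toy, window=10, step=10):
    print(start, f"{gc:.1f}")
```

On real genomic sequence the window would typically span kilobases rather than ten bases, so that regional differences stand out over local noise.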

7.
The recent determination of the complete sequence of chromosome III from the yeast Saccharomyces cerevisiae allows, for the first time, the investigation of the long-range primary structure of a eukaryotic chromosome. We have found that, against a background G+C level of about 35%, there are two regions (one in each chromosome arm) in which G+C values rise to over 50%. This effect is seen in silent sites within genes, but not in noncoding intergenic sequences. The variation in G+C content is not related to differential selection of synonymous codons, and probably reflects mutational biases. That the intergenic regions do not exhibit the same phenomenon is particularly interesting, and suggests that they are under substantial constraint. The yeast chromosome may be a model of the structure of the human genome, since there is evidence that it is also a mosaic of long regions of different base compositions, reflected in wide variation of G+C content at silent sites among genes. Two possible causes of this regional effect, replication timing and recombination frequency, are discussed.

8.
We have sequenced 14 introns from the ciliate Tetrahymena thermophila and include these in an analysis of the 27 intron sequences available from seven T. thermophila protein-encoding genes. Consensus 5' and 3' splice junctions were determined and found to resemble the junctions of other nuclear pre-mRNA introns. Unique features are noted and discussed. Overall the introns have a mean A + T content of 85% (21% higher than neighbouring exons) with smaller introns tending towards a higher A + T content. Approximately half of the introns are less than 100 bp. Introns from other organisms (approximately 30 of each) were also examined. The introns of Dictyostelium discoideum, Caenorhabditis elegans and Drosophila melanogaster, like those of T. thermophila, have a much higher mean A + T content than their neighbouring exons (greater than 20%). Introns from plants, Neurospora crassa and Schizosaccharomyces pombe also have a significantly higher A + T content (10%-20%). Since a high A + T content is required for intron splicing in plants (58), the elevated A + T content in the introns of these other organisms may also be functionally significant. The introns of yeast (Saccharomyces cerevisiae) and mammals (humans) appear to lack this trait and thus in some aspects may be atypical. The polypyrimidine tract, so distinctive of vertebrate introns, is not a trait of the introns in the non-vertebrate organisms examined in this study.

9.
10.
Identical G+1 mutations in three different introns of the gene for type III procollagen (COL3A1) that cause aberrant splicing of RNA were found in three probands with life-threatening variants of Ehlers-Danlos syndrome. Because the three mutations were in a gene with multiple and homologous exons, they provided an interesting test for factors that influence aberrant splicing. The G+1 to A mutation in intron 16 caused extensive exon skipping, the G+1 to A mutation in intron 20 caused both use of a cryptic splice site and retention of all the intron sequences, and the G+1 to A mutation in intron 42 caused efficient use of a single cryptic splice site. The different patterns of RNA splicing were not explained by evaluation of potential cryptic splice sites in the introns by either their homology with 5'-splice sites from other genes or by their ΔG°37 values for binding to U1 RNA. Instead, the results suggested that the patterns of aberrant RNA splicing were primarily determined by the relative rates at which adjacent introns were normally spliced.

11.
The majority of eukaryotic genes consist of exons and introns. Introns can be inserted either between codons (phase 0) or within codons, after the first nucleotide (phase 1) and after the second (phase 2). We report here that the frequency of phase 0 increases and phase 1 declines from the 5′ region to the 3′ end of genes. This trend is particularly noticeable in genomes of Homo sapiens and Arabidopsis thaliana, in which gains of novel introns in the 3′ portion of genes were probably a dominant process. Similar but more moderate gradients exist in Drosophila melanogaster and Caenorhabditis elegans genomes, where the accumulation of novel introns was not a prevailing factor. There are nine types of exons, three symmetric (0,0; 1,1; 2,2) and six asymmetric (0,1; 1,0; 1,2; 2,1; 2,0; 0,2). Assuming random distribution of different types of introns along genes, one can expect the frequencies of asymmetric exons such as 0,1 and 1,0 or 1,2 and 2,1 to be approximately equal, allowing for some variation caused by randomness. The gradient in intron distribution leads to a small but consistent and statistically significant bias: phase 1 introns are more likely at the 5′ ends and phase 0 introns are more likely at the 3′ ends of asymmetric exons. For the same reason, the frequency of 0,0 exons increases and the frequency of 1,1 exons decreases in the 3′ direction, at least in H. sapiens and A. thaliana. The number of introns per gene also affects the distribution and frequency of phase 0 and 1 introns. The gradient provides an insight into the evolution of intron-exon structures of eukaryotic genes. Electronic Supplementary Material is available for this article and accessible for authorised users. [Reviewing Editor: Dr. Manyuan Long]
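The phase and exon-type definitions above can be made concrete with a short sketch. It assumes each intron is described by the number of coding bases that precede it; the function names and the example gene are hypothetical and serve only to illustrate the classification, not the authors' analysis.

```python
from collections import Counter


def intron_phase(coding_bases_before):
    """Phase of an intron given the number of coding bases upstream of it.

    0: between codons; 1: after the first base of a codon; 2: after the second.
    """
    return coding_bases_before % 3


def internal_exon_types(intron_positions):
    """Classify internal exons by the phases of their flanking introns.

    `intron_positions` lists, in 5'-to-3' order, how many coding bases precede
    each intron. Each internal exon gets a (5' phase, 3' phase) pair.
    """
    phases = [intron_phase(p) for p in intron_positions]
    return list(zip(phases, phases[1:]))


# Hypothetical gene with four introns (positions are invented for illustration).
positions = [100, 223, 340, 480]
types = internal_exon_types(positions)
counts = Counter(types)
symmetric = sum(n for (a, b), n in counts.items() if a == b)
print(types, "symmetric exons:", symmetric)
```

Tallying these pairs over many genes, and separately for the 5′ and 3′ portions of genes, would give the kind of frequency table in which a gradient of phase 0 and phase 1 introns could be looked for.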

12.
13.
Thalassiosira weissflogii (Grun.) Fryxell et Hasle is one of the more commonly studied centric diatoms, and yet molecular studies of this organism are still in their infancy. The ability to identify open reading frames and thus distinguish between introns and exons, coding and noncoding sequence is essential to move from nuclear DNA sequences to predicted amino acid sequences. To facilitate the identification of open reading frames in T. weissflogii, two newly identified nuclear genes encoding β-tubulin and t-complex polypeptide (TCP)-γ, along with six previously published nuclear DNA sequences, were examined for general structural features. The coding region of the nuclear open reading frames had a G + C content of about 49% and could readily be distinguished from noncoding sequence due to a significant difference in G + C content. The introns were uniformly small, about 100 base pairs in size. Furthermore, the 5' and 3' splice sites of introns displayed the canonical GT/AG sequence, further facilitating recognition of noncoding regions. Six of the nuclear open reading frames displayed relatively little bias in the use of synonymous codons, as exemplified by the cDNAs encoding β-tubulin and TCP-γ. Two open reading frames displayed strong bias in the use of particular codons (although the codons used were different), as exemplified by the cDNA encoding fucoxanthin chlorophyll a/c binding protein. Knowledge of codon bias should facilitate, for example, design of degenerate PCR primers and potential heterologous reporter gene constructs.

14.
15.
To study the tissue-specific expression of the heart(H)- and liver(L)-type of rat cytochrome-c oxidase subunit VIa (rCOXVIa), we have screened and sequenced the genes for the two isoforms. Both genes contain three exons and two introns, spanning 880 bp (rCOXVIa-H) and 3089 bp (rCOXVIa-L), respectively. In both genes, exon I codes for the whole leader sequence comprising 12 (rCOXVIa-H) or 26 (rCOXVIa-L) amino acids and for 12 (rCOXVIa-H) or 10 (rCOXVIa-L) amino acids of the corresponding mature protein, while the remaining amino acids for the mature proteins are encoded by exons II and III. The 5′ regions of the genes lack both TATA and CAAT boxes, but show a high G+C content in the early 5′-upstream region. We have identified in upstream regions and in the introns of both genes several putative binding sites associated with respiratory function, muscle gene activation and housekeeping function. In rCOXVIa-H, we identified a CCAC/Myo-D motif, known to be required for muscle-specific expression of the human myoglobin-encoding gene, which is not present in rCOXVIa-L. In addition, we have analyzed a pseudogene, showing 84% homology to the COXVIa-L cDNA sequence.

16.
In recent years, the amount of molecular sequencing data from Tetrahymena thermophila has dramatically increased. We analyzed G + C content, codon usage, initiator codon context and stop codon sites in the extremely A + T rich genome of this ciliate. Average G + C content was 38% for protein coding regions, 21% for 5' non-coding sequences, 19% for 3' non-coding sequences, 15% for introns, 19% for micronuclear limited sequences and 17% for macronuclear retained sequences flanking micronuclear specific regions. The 75 available T. thermophila protein coding sequences favored codons ending in T and, where possible, avoided those with G in the third position. Highly expressed genes were relatively G + C-rich and exhibited an extremely biased pattern of codon usage while developmentally regulated genes were more A + T-rich and showed less codon usage bias. Regions immediately preceding Tetrahymena translation initiator codons were generally A-rich. For the 60 stop codons examined, the frequency of G in the end + 1 site was much higher than expected whereas C never occupied this position.

17.
18.
19.
The G + C content of silent sites in codons varies greatly among Serratia marcescens genes; the value in any one gene seems to reflect a balance between mutation pressure towards high G + C content and natural selection constraining choice among synonymous codons. Interestingly, non-coding sequences have substantially lower G + C content than silent sites thought to be under little selective constraint.

20.
Can Codon Usage Bias Explain Intron Phase Distributions and Exon Symmetry?   Total citations: 1, self-citations: 0, citations by others: 1
More introns exist between codons (phase 0) than between the first and the second bases (phase 1) or between the second and the third base (phase 2) within the codon. Many explanations have been suggested for this excess of phase 0. It has, for example, been argued to reflect an ancient utility for introns in separating exons that code for separate protein modules. There may, however, be a simple, alternative explanation. Introns typically require, for correct splicing, particular nucleotides immediately 5′ in exons (typically a G) and immediately 3′ in the following exon (also often a G). Introns therefore tend to be found between particular nucleotide pairs (e.g., G|G pairs) in the coding sequence. If, owing to bias in usage of different codons, these pairs are especially common at phase 0, then intron phase biases may have a trivial explanation. Here we take codon usage frequencies for a variety of eukaryotes and use these to generate random sequences. We then ask about the phase of putative intron insertion sites. Importantly, in all simulated data sets intron phase distribution is biased in favor of phase 0. In many cases the bias is of the magnitude observed in real data and can be attributed to codon usage bias. It is also known that exons may carry either the same phase (symmetric) or different phases (asymmetric) at the opposite ends. We simulated a distribution of different types of exons using frequencies of introns observed in real genes assuming random combination of intron phases at the opposite sides of exons. Surprisingly, the simulated pattern was quite similar to that observed. In the simulants we typically observe a prevalence of symmetric exons carrying phase 0 at both ends, which is common for eukaryotic genes. However, at least in some species, the extent of the bias in favor of symmetric (0,0) exons is not as great in simulants as in real genes. 
These results emphasize the need to construct a biologically relevant null model of successful intron insertion. Reviewing Editor: Dr. Manyuan Long
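The simulation idea described in this abstract (generate random coding sequence from codon usage frequencies, then ask at which phase G|G pairs fall) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: codons are drawn independently, only G|G pairs count as putative insertion sites, and the function name and any codon table passed in are hypothetical.

```python
import random
from collections import Counter


def simulate_gg_phases(codon_freqs, n_codons=10000, seed=0):
    """Draw codons independently from `codon_freqs` and tally G|G site phases.

    A candidate insertion site is any boundary between two adjacent G bases.
    Its phase is the number of bases of the current codon lying 5' of it.
    """
    rng = random.Random(seed)
    codons = list(codon_freqs)
    weights = [codon_freqs[c] for c in codons]
    seq = "".join(rng.choices(codons, weights=weights, k=n_codons))
    phases = Counter()
    for i in range(1, len(seq)):
        if seq[i - 1] == "G" and seq[i] == "G":
            phases[i % 3] += 1  # i bases precede the boundary
    return phases


# Invented three-codon usage table, for illustration only.
toy_usage = {"GAG": 0.5, "AAG": 0.3, "GGC": 0.2}
print(simulate_gg_phases(toy_usage, n_codons=3000))
```

With a real codon usage table for a species, the resulting phase counts would serve as a null expectation against which observed intron phase frequencies could be compared.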
