Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (with a maximum size of 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases, we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites with sequences deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank, and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites.
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
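The grouping of splice site pairs by their terminal dinucleotides, as described in the abstract above, can be illustrated with a short Python sketch. It is not the authors' pipeline: the function names `classify_splice_pair` and `pair_frequencies` are hypothetical, and the sketch assumes each intron is supplied as a plain string running from the first donor base to the last acceptor base.

```python
from collections import Counter
from itertools import product

# Every donor/acceptor dinucleotide combination: 4**4 = 256 possible pairs.
ALL_PAIRS = [f"{a}{b}-{c}{d}" for a, b, c, d in product("ACGT", repeat=4)]


def classify_splice_pair(intron: str) -> str:
    """Label an intron by its first two and last two bases, e.g. 'GT-AG'.

    Hypothetical helper: assumes the string starts at the donor site
    and ends at the acceptor site. RNA input (U) is mapped to DNA (T).
    """
    seq = intron.upper().replace("U", "T")
    if len(seq) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return f"{seq[:2]}-{seq[-2:]}"


def pair_frequencies(introns):
    """Return the percentage of introns falling into each observed pair type."""
    counts = Counter(classify_splice_pair(i) for i in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}
```

Applied to a list of intron strings, `pair_frequencies` yields percentages of the kind quoted in the abstract. It cannot tell a genuine non-canonical site from a sequencing or annotation error, which is the distinction the abstract draws using EST and HTG evidence.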

2.
3.
We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT-AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3' splice sites (3'ss) and (iv) distinct evolutionary histories of 5' and 3'ss. Our study proves that the identification of broad patterns in naturally-occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.
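The abstract above does not say how the information content of a splice-site motif was computed. One common definition can be sketched as follows, under stated assumptions: a uniform background over the four bases, no small-sample correction, and equal-length aligned sites. The function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Per-position information content (bits) for aligned DNA sites.

    Assumes a uniform background, so each position scores
    2 - H(position), where H is the Shannon entropy in bits.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    bits = []
    for column in zip(*sites):
        counts = Counter(base.upper() for base in column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits
```

A fully conserved position, such as the G of a GT donor, scores 2 bits, while a position with all four bases equally frequent scores 0 bits.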

4.
For the purpose of analyzing the relation between the splice sites and the order of introns, we conducted the following analysis for the GT-AG and GC-AG splice site groups. First, the pre-mRNAs of H. sapiens, M. musculus, D. melanogaster, A. thaliana and O. sativa were sampled by mapping the full-length cDNA to the genomes. Next, the consensus sequences at different regions of pre-mRNAs were analyzed in the five species. We also investigated the mononucleotide and dinucleotide frequencies in the extensive regions around the 5' splice sites (5'ss) and 3' splice sites (3'ss). As a result, differential frequencies of nucleotides at the first 5'ss in both the GT-AG and GC-AG splice site groups were observed in A. thaliana and O. sativa pre-mRNAs. The trend, which indicates that GC 5'ss possess strong consensus sequences, was observed not only in mammalian pre-mRNAs but also in the pre-mRNAs of D. melanogaster, A. thaliana and O. sativa. Furthermore, we examined the consensus sequences of the constitutive and alternative splice sites. It was suggested that in the case of the alternative GC-AG introns, the tendency to have a weak consensus sequence at 5'ss is different between H. sapiens and M. musculus pre-mRNAs.

5.
A combination of experimental and computational approaches was employed to identify introns with noncanonical GC-AG splice sites (GC-AG introns) within euascomycete genomes. Evaluation of 2335 cDNA-confirmed introns from Neurospora crassa revealed 27 such introns (1.2%). A similar frequency (1.0%) of GC-AG introns was identified in Fusarium graminearum, in which 3 of 292 cDNA-confirmed introns contained GC-AG splice sites. Computational analyses of the N. crassa genome using a GC-AG intron consensus sequence identified an additional 20 probable GC-AG introns in this fungus. For 8 of the 47 GC-AG introns identified in N. crassa a GC donor site is also present in a homolog from Magnaporthe grisea, F. graminearum, or Aspergillus nidulans. In most cases, however, homologs in these fungi contain a GT-AG intron or no intron at the corresponding position. These findings have important implications for fungal genome annotation, as the automated annotations of euascomycete genomes incorrectly identified intron boundaries for all of the confirmed and probable GC-AG introns reported here.

6.
It has been previously observed that the intrinsically weak variant GC donor sites, in order to be recognized by the U2-type spliceosome, possess strong consensus sequences maximized for base pair formation with U1 and U5/U6 snRNAs. However, variability in signal strength is a fundamental mechanism for splice site selection in alternative splicing. Here we report human alternative GC-AG introns (for the first time from any species), and show that while constitutive GC-AG introns do possess strong signals at their donor sites, a large subset of alternative GC-AG introns possess weak consensus sequences at their donor sites. Surprisingly, this subset of alternative isoforms shows strong consensus at acceptor exon positions 1 and 2. The improved consensus at the acceptor exon can facilitate a strong interaction with U5 snRNA, which tethers the two exons for ligation during the second step of splicing. Further, these isoforms nearly always possess alternative acceptor sites and exhibit particularly weak polypyrimidine tracts characteristic of AG-dependent introns. The acceptor exon nucleotides are part of the consensus required for the U2AF35-mediated recognition of AG in such introns. Such improved consensus at acceptor exons is not found in either normal or alternative GT-AG introns having weak donor sites or weak polypyrimidine tracts. The changes probably reflect mechanisms that allow GC-AG alternative intron isoforms to cope with two conflicting requirements, namely an apparent need for differential splice strength to direct the choice of alternative sites and a need for improved donor signals to compensate for the central mismatch base pair (C-A) in the RNA duplex of U1 snRNA and the pre-mRNA. The other important findings include (i) one in every twenty alternative introns is a GC-AG intron, and (ii) three of every five observed GC-AG introns are alternative isoforms.

7.
8.
GC-AG introns represent 0.7% of total human pre-mRNA introns. To study the function of GC-AG introns in splicing regulation, 196 cDNA-confirmed GC-AG introns were identified in Caenorhabditis elegans. These represent 0.6% of the cDNA-confirmed intron data set for this organism. Eleven of these GC-AG introns are involved in alternative splicing. In a comparison of the genomic sequences of homologous genes between C. elegans and Caenorhabditis briggsae for 26 GC-AG introns, the C at the +2 position is conserved in only five of these introns. A system to experimentally test the function of GC-AG introns in alternative splicing was developed. Results from these experiments indicate that the conserved C at the +2 position of the tenth intron of the let-2 gene is essential for developmentally regulated alternative splicing. This C allows the splice donor to function as a very weak splice site that works in balance with an alternative GT splice donor. A weak GT splice donor can functionally replace the GC splice donor and allow for splicing regulation. These results indicate that while the majority of GC-AG introns appear to be constitutively spliced and have no evolutionary constraints to prevent them from being GT-AG introns, a subset of GC-AG introns is involved in alternative splicing and the C at the +2 position of these introns can have an important role in splicing regulation.

9.
ING4 (inhibitor of growth 4) is a candidate tumor suppressor gene that is implicated as a repressor of cell growth, angiogenesis, cell spreading and cell migration and can suppress loss of contact inhibition in vitro. Another group and we identified four wobble-splicing isoforms of ING4 generated by alternative splicing at two tandem splice sites, GC(N)7GT and NAGNAG, which caused canonical (GT-AG) and non-canonical (GC-AG) splice site wobbling selection. Expression of the four ING4 wobble-splicing isoforms did not vary significantly in any of the cell lines examined. Here we show that ING4_v1 is translocated to the nucleolus, indicating that ING4 contains an intrinsic nucleolar localization signal. We further demonstrate that the subcellular localization of ING4 is modulated by two wobble-splicing events at the exon 4-5 boundary, causing displacement from the nucleolus to the nucleus. We also observed that ING4 is degraded through the ubiquitin-proteasome pathway and that it is subjected to N-terminal ubiquitination. We demonstrate that nucleolar accumulation of ING4 prolongs its half-life, but lack of nucleolar targeting potentially increases ING4 degradation. Taken together, our data suggest that the two wobble-splicing events at the exon 4-5 boundary influence subnuclear localization and degradation of ING4.

10.
Cleaning the GenBank Arabidopsis thaliana data set.
Data driven computational biology relies on the large quantities of genomic data stored in international sequence data banks. However, the possibilities are drastically impaired if the stored data is unreliable. During a project aiming to predict splice sites in the dicot Arabidopsis thaliana, we extracted a data set from the A. thaliana entries in GenBank. A number of simple 'sanity' checks, based on the nature of the data, revealed an alarmingly high error rate. More than 15% of the most important entries extracted contained erroneous information. In addition, a number of entries had directly conflicting assignments of exons and introns, not stemming from alternative splicing. In a few cases the errors are due to mere typographical misprints, which may be corrected by comparison to the original papers, but errors caused by wrong assignments of splice sites from experimental data are the most common. It is proposed that the level of error correction should be increased and that gene structure sanity checks should be incorporated, also at the submitter level, to avoid or reduce the problem in the future. A non-redundant and error corrected subset of the data for A. thaliana is made available through anonymous FTP.

11.
Expressed sequence tags (ESTs) currently encompass more entries in the public databases than any other form of sequence data. Thus, EST data sets provide a vast resource for gene identification and expression profiling. We have mapped the complete set of 176,915 publicly available Arabidopsis EST sequences onto the Arabidopsis genome using GeneSeqer, a spliced alignment program incorporating sequence similarity and splice site scoring. About 96% of the available ESTs could be properly aligned with a genomic locus, with the remaining ESTs deriving from organelle genomes and non-Arabidopsis sources or displaying insufficient sequence quality for alignment. The mapping provides verified sets of EST clusters for evaluation of EST clustering programs. Analysis of the spliced alignments suggests corrections to current gene structure annotation and provides examples of alternative and non-canonical pre-mRNA splicing. All results of this study were parsed into a database and are accessible via a flexible Web interface at http://www.plantgdb.org/AtGDB/.

12.
AT-AC introns constitute a minor class of eukaryotic pre-mRNA introns, characterized by 5'-AT and AC-3' boundaries, in contrast to the 5'-GT and AG-3' boundaries of the much more prevalent conventional introns. In addition to the AT-AC borders, most known AT-AC introns have highly conserved 5' splice site and branch site sequence elements of 7-8 nt. Intron 6 of the nucleolar P120 gene and intron 2 of the SCN4A voltage-gated skeletal muscle sodium channel are AT-AC introns that have been shown recently to be processed via a unique splicing pathway involving several minor U snRNAs. Interestingly, intron 21 of the same SCN4A gene and the corresponding intron 25 of the SCN5A cardiac muscle sodium channel gene also have 5'-AT and AC-3' boundaries, but they have divergent 5' splice site and presumptive branch site sequences. Here, we report the accurate in vitro processing of these two divergent AT-AC introns and show that they belong to a functionally distinct subclass of AT-AC introns. Splicing of these introns does not require U12, U4atac, and U6atac snRNAs, but instead requires the major spliceosomal snRNAs U1, U2, U4, U5, and U6. Previous studies showed that G → A mutation at the first position and G → C mutation at the last position of a conventional yeast or mammalian GT-AG intron suppress each other in vivo, suggesting that the first and last bases participate in an essential non-Watson-Crick interaction. Our results show that such introns, hereafter termed AT-AC II introns, occur naturally and are spliced by a mechanism distinct from that responsible for processing of the apparently more common AT-AC I introns.

13.
14.

Background  

While the current model of pre-mRNA splicing is based on the recognition of four canonical intronic motifs (5' splice site, branchpoint sequence, polypyrimidine (PY) tract and 3' splice site), it is becoming increasingly clear that splicing is regulated by both canonical and non-canonical splicing signals located in the RNA sequence of introns and exons that act to recruit the spliceosome and associated splicing factors. The diversity of human intronic sequences suggests the existence of novel recognition pathways for non-canonical introns. This study addresses the recognition and splicing of human introns that lack a canonical PY tract. The PY tract is a uridine-rich region at the 3' end of introns that acts as a binding site for U2AF65, a key factor in splicing machinery recruitment.

15.
16.
MOTIVATION: High accuracy of data always governs the large-scale gene discovery projects. The data should not only be trustworthy but should be correctly annotated for various features it contains. Sequence errors are inherent in single-pass sequences such as ESTs obtained from automated sequencing. These errors further complicate the automated identification of EST-related sequencing. A tool is required to prepare the data prior to advanced annotation processing and submission to public databases. RESULTS: This paper describes ESTprep, a program designed to preprocess expressed sequence tag (EST) sequences. It identifies the location of features present in ESTs and allows the sequence to pass only if it meets various quality criteria. Use of ESTprep has resulted in substantial improvement in accurate EST feature identification and fidelity of results submitted to GenBank. AVAILABILITY: The program is freely available for download from http://genome.uiowa.edu/pubsoft/software.html

17.
Peutz-Jeghers syndrome (PJS) is an autosomal dominant disorder associated with gastrointestinal polyposis and an increased cancer risk. PJS is caused by germline mutations in the tumor suppressor gene LKB1. One such mutation, IVS2+1A>G, alters the second intron 5' splice site, which has sequence features of a U12-type AT-AC intron. We report that in patients, LKB1 RNA splicing occurs from the mutated 5' splice site to several cryptic, noncanonical 3' splice sites immediately adjacent to the normal 3' splice site. In vitro splicing analysis demonstrates that this aberrant splicing is mediated by the U12-dependent spliceosome. The results indicate that the minor spliceosome can use a variety of 3' splice site sequences to pair to a given 5' splice site, albeit with tight constraints for maintaining the 3' splice site position. The unusual splicing defect associated with this PJS-causing mutation uncovers differences in splice-site recognition between the major and minor pre-mRNA splicing pathways.

18.
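Objective: To computationally identify novel non-canonical splice sites in Drosophila in order to explore unknown splicing mechanisms. Methods: Gene structures were reconstructed from alignments of Drosophila melanogaster expressed sequence tags (ESTs) against the genome sequence, and non-canonical splice sites were identified from these structures. The sequences upstream and downstream of the non-canonical splice sites were analyzed with Weblogo to look for specific elements related to splicing. Results: A total of 265 non-canonical splice sites were obtained, located in 195 protein-coding genes. Conclusion: Bioinformatic methods identified over a hundred non-canonical splice sites in Drosophila, laying a foundation for the study of non-canonical splicing mechanisms.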

19.
Bacteriophages T2 and T4 encode DNA-[N6-adenine] methyltransferases (Dam) which differ from each other by only three amino acids. The canonical recognition sequence for these enzymes in both cytosine and 5-hydroxymethylcytosine-containing DNA is GATC; at a lower efficiency they also recognize some non-canonical sites in sequences derived from GAY (where Y is cytosine or thymine). We found that T4 Dam fails to methylate certain GATA and GATT sequences which are methylated by T2 Dam. This indicates that T2 Dam and T4 Dam do not have identical sequence specificities. We analyzed DNA sequence data files obtained from GenBank, containing about 30% of the T4 genome, to estimate the overall frequency of occurrence of GATC, as well as non-canonical sites derived from GAY. The observed N6-methyladenine (m6A) content of T4 DNA, methylated exclusively at GATC (by Escherichia coli Dam), was found to be in good agreement with this estimate. Although GATC is fully methylated in virion DNA, only a small percentage of the non-canonical sequences are methylated.

20.
The cytosine C5 methyltransferase M.HaeIII recognises and methylates the central cytosine of its canonical site GGCC. Here we report that M.HaeIII can also, with lower efficiency, methylate cytosines located in a wide range of non-canonical sequences. Using bisulphite sequencing we mapped the methyl-cytosine residues in DNA methylated in vitro and in vivo by M.HaeIII. Methyl-cytosine residues were observed in multiple sequence contexts, most commonly, but not exclusively, at star sites (sites differing by a single base from the canonical sequence). The most frequently used star sites had changes at positions 1 and 4, but there is little or no methylation at star sites changed at position 2. The rate of methylation of non-canonical sites can be quite significant: a DNA substrate lacking a canonical site was methylated by M.HaeIII in vitro at a rate only an order of magnitude slower than an otherwise identical substrate containing the canonical site. In vivo methylation of non-canonical sites may therefore be significant and may have provided the starting point for the evolution of restriction–modification systems with novel sequence specificities.
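The definition of a star site given above (a site differing from the canonical sequence by a single base) can be made concrete with a small enumeration. This is only an illustrative sketch: the helper name `star_sites` is hypothetical, and positions are numbered from 1 to match the abstract's wording.

```python
def star_sites(canonical: str, alphabet: str = "ACGT"):
    """List (position, site) for every single-base variant of `canonical`.

    Positions are 1-based. Hypothetical helper for illustration only;
    it says nothing about which variants are actually methylated.
    """
    sites = []
    for pos, original in enumerate(canonical):
        for base in alphabet:
            if base != original:
                variant = canonical[:pos] + base + canonical[pos + 1:]
                sites.append((pos + 1, variant))
    return sites
```

For the four-base site GGCC this gives 4 × 3 = 12 star sites, three for each changed position.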
