Similar articles (20 found)
1.
The human immunodeficiency virus type-1 regulatory protein Rev is absolutely required for the production of viral structural proteins. Splice sites have been seen to function as cis-acting repressor sequences (CRS) and inhibit expression of the Rev-dependent RNAs. In order to analyze the role of a splice donor in Rev dependence, the wild-type 5' splice donor of HIV-1 was mutated in the context of other gag sequences. Following transient transfection, RNA expression was analyzed by RT-PCR. The unspliced RNA produced by the mutant construct still required Rev for its cytoplasmic accumulation. Despite deletion of the wild-type 5' splice donor, the tat splice acceptor was still used. A cryptic splice donor was identified by PCR and subsequent cloning of the spliced RNA. The cryptic site matches the consensus sequence at 5 of 9 positions and is located immediately downstream of the initiation codon (ATG) for Gag. Analysis of the RNA product containing the cryptic splice donor revealed that Rev was required for the cytoplasmic accumulation of unspliced RNA, while spliced RNA was Rev independent. Transfection of a wild-type construct also demonstrated usage of the cryptic splice donor. These results indicate that a cryptic splice donor can be activated when the wild-type splice donor is inactivated and that the cryptic splice donor may retain Rev regulation. The findings also suggest the potential for cryptic splice sites to serve as CRS in determining the Rev dependence of viral RNAs.
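The "5/9 to the consensus" figure in the abstract above can be illustrated with a small sketch. It assumes the commonly cited 9-nt 5' splice site consensus MAG|GTRAGT (positions -3 to +6); the abstract does not state which consensus was used, and the function names and the `min_matches` threshold are hypothetical.

```python
# IUPAC codes needed for the assumed consensus MAG|GTRAGT (M = A/C, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
CONSENSUS = "MAGGTRAGT"  # assumption: positions -3..+6 around the exon/intron boundary


def consensus_matches(site, consensus=CONSENSUS):
    """Count positions of a 9-mer that are compatible with the consensus."""
    site = site.upper().replace("U", "T")
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def candidate_donors(seq, min_matches=5):
    """Yield (offset, 9-mer, matches) for windows with GT at positions +1/+2."""
    seq = seq.upper().replace("U", "T")
    for i in range(len(seq) - 8):
        window = seq[i:i + 9]
        if window[3:5] == "GT":  # invariant GT immediately after the boundary
            matches = consensus_matches(window)
            if matches >= min_matches:
                yield i, window, matches
```

A site scoring 5 of 9 under this scheme would pass the default threshold; requiring GT at +1/+2 is a simplification that ignores GC and AT donors.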

2.
A clean data set of verified splice sites from Homo sapiens is reported, together with the standards used for the clean-up procedure. The sites were validated by: (i) standard cleaning procedures such as requiring consistency in the annotation of the gene structural elements, completeness of the coding regions and elimination of redundant sequences; (ii) clustering by decision trees coupled with analysis of ClustalW alignments of the translated protein sequence with homologous proteins from SWISS-PROT; (iii) matching against human EST sequences. The sites are categorised as: (i) donor sites, a set of 619 EST-confirmed donor sites, for 138 of which either the sites or the regions around the sites are involved in alternative splice events; (ii) acceptor sites, a set of 623 EST-confirmed acceptor sites, for 144 of which either the sites or the regions around the sites are involved in alternative splice events; (iii) genuine splice sites, a set of 392 splice sites wherein both the donor and acceptor sites had EST confirmation and were not involved in any alternative splicing; (iv) alternative splice sites, a set of 209 splice sites wherein both the donor and acceptor sites had EST confirmation and the sites or the regions around them were involved in alternative splicing. A set of nucleotide regions that can be used to generate a control set of false splice sites with a high confidence of being non-functional is also reported.

3.
4.
Zhang L, Luo L. Nucleic Acids Research, 2003, 31(21): 6214-6220
Based on the conservation of nucleotides at splicing sites and the features of base composition and base correlation around these sites, we use the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to study the dependence structure of splicing sites and to predict the exons/introns and their boundaries for four model genomes: Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and human. The comparison of compositional features between two sequences and the comparison of base dependencies at adjacent or non-adjacent positions of two sequences can be integrated automatically in the increment of diversity (ID). Eight feature variables around a potential splice site are defined in terms of ID. They are integrated in a single formal framework given by IDQD. In our calculations, regions of 7 (8) bases around the donor (acceptor) sites have been considered in studying the conservation of nucleotides, and sequences of 48 bp on either side of splice sites have been used in studying the compositional and base-correlating features. The windows are enlarged to 16 (donor), 29 (acceptor) and 80 bp (either side) to improve the prediction for human splice sites. The prediction capability of the present method is comparable with that of the leading splice site detector, GeneSplicer.
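The abstract above does not give the formula for the increment of diversity, so the following is only a sketch under an assumption: it uses the commonly quoted definition D = N ln N - Σ n_i ln n_i for the diversity of a count vector, and ID(X, Y) = D(X+Y) - D(X) - D(Y). The function names and the k-mer feature choice are hypothetical, not the authors' eight feature variables.

```python
import math
from collections import Counter


def diversity(counts):
    """Diversity of a count vector: N ln N - sum(n_i ln n_i), with 0 ln 0 = 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count dictionaries."""
    keys = set(x) | set(y)
    merged = [x.get(k, 0) + y.get(k, 0) for k in keys]
    return diversity(merged) - diversity(list(x.values())) - diversity(list(y.values()))


def kmer_counts(seq, k=1):
    """Count overlapping k-mers; k=1 gives base composition."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
```

Under this definition ID is zero when two sources have proportional counts and grows as their compositions diverge, which is why it can serve as a similarity feature between a candidate site and a training set.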

5.
Xia H, Bi J, Li Y. Nucleic Acids Research, 2006, 34(21): 6305-6313
Alternative splicing plays an important role in regulating gene expression. Currently, the most efficient methods use expressed sequence tags or microarray analysis for large-scale detection of alternative splicing. However, it is difficult to detect all alternative splice events with them because of their inherent limitations. Previous computational methods for alternative splicing prediction could only predict particular kinds of alternative splice events. Thus, it would be highly desirable to predict alternative 5'/3' splice sites with various splicing levels using genomic sequences alone. Here, we introduce the competition mechanism of splice site selection into alternative splice site prediction. This approach allows us to predict not only rarely used but also frequently used alternative splice sites. On a dataset extracted from the AltSplice database, our method correctly classified approximately 70% of the splice sites into alternative and constitutive, as well as approximately 80% of the locations of real competitors for alternative splice sites. It outperforms a method which only considers features extracted from the splice sites themselves. Furthermore, this approach can also predict the changes in activation level arising from mutations in flanking cryptic splice sites of a given splice site. Our approach might be useful for studying alternative splicing in both computational and molecular biology.

6.
Conserved quartets near 5' intron junctions in primate nuclear pre-mRNA
Analysis of a 1000 nucleotide span around 664 primate 5' exon/intron junctions revealed frequent recurrences of G-rich runs downstream of the 5' splice sites. In particular, AGGG, GGGA, GGGG, GGGT and TGGG are frequent at this site. Some C-rich quartets are frequent upstream of the 5' splice site. Similar behaviour of these G- and C-rich quartets is indicated for the 587 rodent introns and for a combined eukaryotic file containing 1688 introns. (A)GGG(A) is also frequent in the introns 60 nucleotides upstream of the 3' splice site, and (A)CCC(A) is frequently found in the exons downstream of the 3' site. The same consistent behaviour of the 3' splice sites is obtained as for the 5' sites, for the primates, rodents and combined eukaryotic file. These results suggest that in addition to the well-conserved 5' and 3' splice sequences, exon as well as intron sequences may play a role in nuclear pre-mRNA splicing.
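A quartet survey like the one described above can be sketched as a sliding-window count. The quartet list comes from the abstract; the function names and the default 500-nt downstream span (half of the 1000-nt window) are assumptions for illustration.

```python
from collections import Counter

# G-rich quartets reported as frequent downstream of 5' splice sites.
G_RICH = ("AGGG", "GGGA", "GGGG", "GGGT", "TGGG")


def quartet_counts(seq, quartets=G_RICH):
    """Count overlapping occurrences of each listed quartet in seq."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 3):
        quartet = seq[i:i + 4]
        if quartet in quartets:
            counts[quartet] += 1
    return counts


def downstream_counts(seq, junction, span=500):
    """Count quartets in the span starting at the first intron base (index junction)."""
    return quartet_counts(seq[junction:junction + span])
```

Comparing these counts with the same counts in the upstream exon, or in shuffled sequence, would be one way to judge whether a quartet is enriched; the abstract does not say which statistic was used.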

7.
Jin HY, Luo LF, Zhang LR. Gene, 2008, 424(1-2): 115-120
A crucial part of gene structure prediction is the accurate identification of splice sites, both constitutive and alternative. Here, we use the maximum information principle (MIP) to analyze the conserved segments around splice sites. According to the MIP, a reaction free energy (RFE) expression is deduced, which can be employed to estimate the free energy change during a splicing reaction involving a donor or acceptor site. The expression contains not only the background probability factors, but also all kinds of dependencies among both adjacent and non-adjacent bases. We apply the RFE expression to recognize splice sites and their flanking competitors in human genes; the results show high sensitivity and specificity, indicating that the RFE expression accords well with the splicing reaction process. Moreover, the RFE expression is better than previous methods for predicting competitors of splice sites, and it outperforms the reaction free energy subtraction (RFES), which implies that RFE competition between a given splice site and its flanking competitor may not be the only primary factor in alternative splice site selection. This work contributes both to the understanding of the splicing reaction through its relation to the MIP and to research on the computational recognition of splicing sites and alternative splice events.

8.
We have collected over half a million splice sites from five species (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana) and classified them into four subtypes: U2-type GT-AG and GC-AG and U12-type GT-AG and AT-AC. We have also found new examples of rare splice-site categories, such as U12-type introns without canonical borders, and U2-dependent AT-AC introns. The splice-site sequences and several tools to explore them are available on a public website (SpliceRack). For the U12-type introns, we find several features conserved across species, as well as a clustering of these introns on genes. Using the information content of the splice-site motifs, and the phylogenetic distance between them, we identify: (i) a higher degree of conservation in the exonic portion of the U2-type splice sites in more complex organisms; (ii) conservation of exonic nucleotides for U12-type splice sites; (iii) divergent evolution of C. elegans 3' splice sites (3'ss) and (iv) distinct evolutionary histories of 5' and 3'ss. Our study shows that the identification of broad patterns in naturally occurring splice sites, through the analysis of genomic datasets, provides mechanistic and evolutionary insights into pre-mRNA splicing.

9.
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% fall into many small groups (each with a maximum size of 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites deposited in GenBank with those from high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. They can be classified after corrections as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors that were corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors that corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites.
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus approximately 600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
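The GT-AG / GC-AG / AT-AC breakdown above amounts to tallying introns by their terminal dinucleotides. A minimal sketch follows, assuming each intron is supplied as a plain string from its first to its last base; the function names are hypothetical and no EST or HTG verification step is modelled.

```python
from collections import Counter


def classify_pair(intron):
    """Label an intron by its first and last dinucleotides, e.g. 'GT-AG'."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron must be at least 4 nt long")
    return f"{intron[:2]}-{intron[-2:]}"


def pair_frequencies(introns):
    """Return the percentage of introns in each dinucleotide-pair class."""
    counts = Counter(classify_pair(intron) for intron in introns)
    total = sum(counts.values())
    return {pair: 100.0 * n / total for pair, n in counts.items()}


# 16 possible donor dinucleotides x 16 possible acceptor dinucleotides.
POSSIBLE_PAIRS = (4 ** 2) * (4 ** 2)
```

The 256 a priori combinations quoted in the abstract correspond to `POSSIBLE_PAIRS`; the study reports that only eight of them were observed after correction.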

10.
It has long been considered that cryptic splice sites are ignored by the splicing machinery in the context of intact genuine splice sites. In the present study, it is shown that cryptic splice sites are utilized in all circumstances, whether the authentic site is intact, partially functional or completely abolished. Their use would therefore contribute to a background lack of fidelity in the context of the wild-type sequence. We also found that a mutation at the 5' splice site of beta-globin intron 1 accommodates multiple cryptic splicing pathways, including three previously reported pathways. Focusing on the two major cryptic 5' splice sites within beta-globin exon 1, we show that cryptic splice site selection ex vivo varies depending upon: (a) the cell stage of development during terminal erythroid differentiation; (b) the nature of the mutation at the authentic 5' splice site; and (c) the nature of the promoter. Finally, we found that the two major cryptic 5' splice sites are utilized with differential efficiencies in two siblings sharing the same beta-globin chromosome haplotype in the homozygous state. Collectively, these data suggest that intrinsic, sequence-specific factors and cell genetic background factors both contribute to promote a subtle differential use of cryptic splice sites in vivo.

11.
The performance of computational tools that can predict human splice sites is reviewed using a test set of EST-confirmed splice sites. The programs (namely HMMgene, NetGene2, HSPL, NNSPLICE, SpliceView and GeneID-3) differ from one another in the degree of discriminatory information used for prediction. The results indicate that, as expected, HMMgene and NetGene2 (which use global as well as local coding information and splice signals) followed by HSPL (which uses local coding information and splice signals) performed better than the other three programs (which use only splice signals). For the former three programs, one in every three false positive splice sites was predicted in the vicinity of true splice sites while only one in every 12 was expected to occur in such a region by chance. The persistence of this observation for programs (namely FEXH, GRAIL2, MZEF, GeneID-3, HMMgene and GENSCAN) that can predict all the potential exons (including optimal and sub-optimal) was assessed. In a high proportion (>50%) of the partially correct predicted exons, the incorrect exon ends were located in the vicinity of the real splice sites. Analysis of the distribution of proximal false positives indicated that the splice signals used by the algorithms are not strong enough to discriminate particularly those false predictions that occur within ±25 nt around the real sites. It is therefore suggested that specialised statistics that can discriminate real splice sites from proximal false positives be incorporated in gene prediction programs.
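The notion of a "proximal" false positive used above can be made concrete with a short sketch. It assumes predicted and real splice sites are given as integer coordinates on the same sequence; the ±25 nt window comes from the abstract, while the function name and return layout are hypothetical.

```python
def split_predictions(predicted, real, window=25):
    """Partition predicted positions into true positives, proximal and distal false positives.

    A false positive is 'proximal' if it lies within `window` nt of any real site.
    """
    real_set = set(real)
    true_pos, proximal, distal = [], [], []
    for pos in predicted:
        if pos in real_set:
            true_pos.append(pos)
        elif any(abs(pos - r) <= window for r in real_set):
            proximal.append(pos)
        else:
            distal.append(pos)
    return true_pos, proximal, distal
```

The ratio `len(proximal) / (len(proximal) + len(distal))` corresponds to the "one in every three" observation, to be compared against the fraction expected by chance.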

12.
A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions, the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT-AG junctions (22 199 entries) and 0.56% have non-canonical GC-AG splice site pairs. The remainder (0.73%) fall into many small groups (each with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conserved dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG), 61 errors corrected to GT-AG canonical pairs, six AT-AC pairs (of which two were errors corrected to AT-AC), one case that was produced from a non-existent intron, seven cases that were found in HTG that were deposited to GenBank and finally only two other cases of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.
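The abstract above mentions weight matrices for the major splice groups without describing their construction. The sketch below shows one common way to build a log-odds position weight matrix from aligned sites; the pseudocount, the uniform background of 0.25 and the function names are assumptions, not the SpliceDB procedure.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix (one dict per position) from aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    denom = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos].upper() for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / denom) / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum per-position log-odds; the site must contain only A, C, G, T."""
    return sum(column[base] for column, base in zip(pwm, site.upper()))
```

A gene prediction program could use `score` to rank candidate donor or acceptor windows, with a threshold chosen on held-out verified sites.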

13.
The fourth exon of the mouse polymeric immunoglobulin receptor (pIgR) is 654 nt long and, despite being surrounded by large introns, is constitutively spliced into the mRNA. Deletion of an 84 nt sequence from this exon strongly activated both cryptic 5' and 3' splice sites surrounding a 78 nt cryptic intron. The 84 nt deletion is just upstream of the cryptic 3' splice site; the cryptic 3' splice site was likely activated because the deletion created a better 3' splice site. However, the cryptic 5' splice site was also required to activate the cryptic splice reaction; point mutations in either of the cryptic splice sites that decreased their match to the consensus splice site sequence inactivated the cryptic splice reaction. The activation and inactivation of these cryptic splice sites as a pair suggests that they are being co-recognized by the splicing machinery. Interestingly, the large fourth exon of the pIgR gene encodes two immunoglobulin-like extracellular protein domains; the cryptic 3' splice site coincides with the junction between these protein domains. The cryptic 5' splice site is located between protein subdomains where an intron is found in another gene of the immunoglobulin superfamily.

14.
Alternative 5' splice site selection allows Bcl-x to produce two isoforms with opposite effects on apoptosis. The pro-apoptotic Bcl-x(S) variant is up-regulated by ceramide and down-regulated by protein kinase C through specific cis-acting exonic elements, one of which is bound by SAP155. Splicing to the Bcl-x(S) 5' splice site is also enforced by heterogeneous nuclear ribonucleoprotein (hnRNP) F/H proteins and by Sam68 in cooperation with hnRNP A1. Here, we have characterized exon elements that influence splicing to the 5' splice site of the anti-apoptotic Bcl-x(L) isoform. Within an 86-nucleotide region (B3) located immediately upstream of the Bcl-x(L) donor site we have identified two elements (ML2 and AM2) that stimulate splicing to the Bcl-x(L) 5' splice site. SRp30c binds to these elements and can shift splicing to the 5' splice site of Bcl-x(L) in an ML2/AM2-dependent manner in vitro and in vivo. The B3 region also contains an element that represses the use of Bcl-x(L). This element is bound by U1 small nuclear ribonucleoprotein and contains two 5' splice sites that can be used when the Bcl-x(L) 5' splice site is mutated or the ML2/AM2 elements are deleted. Conversely, mutating the cryptic 5' splice sites stimulates splicing to the Bcl-x(L) site. Thus, SRp30c stimulates splicing to the downstream 5' splice site of Bcl-x(L), thereby attenuating the repressive effect of upstream U1 snRNP binding sites.

15.
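Splice site recognition based on support vector machines (SVM)
Splice site recognition, an important step in gene identification, has long attracted researchers' attention. Given the sequence conservation near splice sites, several methods based on statistical properties have been applied to splice site recognition, but their performance still leaves room for improvement. The support vector machine (SVM), a new learning machine based on statistical learning theory, has developed rapidly in recent years and has been applied to many pattern recognition problems. This paper applies it to splice site recognition and, because false splice sites far outnumber true sites among sequence samples that satisfy the GT-AG rule, proposes an SVM-based "balanced minimum" method (平衡取小法) to obtain better recognition. Experimental results show that applying SVMs to splice site recognition extracts the statistical features of the conserved sequences near the sites more effectively, generalizes better to the test set, and is simpler to use. This result provides a new method for splice site recognition and offers new leads for solving structure and site recognition problems in the study of biological macromolecules.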

16.
Prediction of exact boundaries of exons
It is known that while the programs used to predict genes are good at determining coding nucleotides, there are considerable inaccuracies in the determination of the gene structural elements. Among them, the most notable is that of the exact boundaries of exons. In order to assess this, we had earlier reviewed various programs that predict potential splice sites and exons. The results led to the following two observations: (i) a high proportion of false positive splice sites from computational predictions occur in the vicinity of real splice sites; and (ii) current algorithms are misled to predict wrong splice sites more often when the coding potential ends within ±25 nucleotides of real sites than when it ends at farther positions. In this report, we review decision tree models for human splice sites and the resultant software tool, namely SpliceProximalCheck, that discriminates such 'proximal' false positives from real splice sites. Further presented is an integrated system (MZEF-SPC) with SpliceProximalCheck (SPC) as a front-end tool operating on the results of Michael Zhang's exon finder program. Examination of the output of the integrated program on an illustrative gene set revealed that as many as 61 of 93 MZEF-predicted false positive exons could be eliminated by SPC for a loss of only 3 out of 33 MZEF-predicted true positive exons.
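The closing figures above (61 of 93 false positives removed, 3 of 33 true positives lost) can be turned into a before/after precision comparison. This is derived arithmetic on the reported counts only, not a result stated in the abstract; the variable names are hypothetical.

```python
def precision(true_pos, false_pos):
    """Fraction of predicted exons that are real."""
    return true_pos / (true_pos + false_pos)


# Counts reported for MZEF alone on the illustrative gene set.
MZEF_TP, MZEF_FP = 33, 93
# Counts after SPC filtering: 3 true positives lost, 61 false positives removed.
SPC_TP, SPC_FP = MZEF_TP - 3, MZEF_FP - 61

precision_before = precision(MZEF_TP, MZEF_FP)  # 33 / 126
precision_after = precision(SPC_TP, SPC_FP)     # 30 / 62
retained_true_positives = SPC_TP / MZEF_TP      # 30 / 33
```

On these counts precision rises from roughly 26% to roughly 48% while about 91% of the originally correct exons are kept; sensitivity against the full set of real exons cannot be computed because the abstract does not report missed exons.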

17.
18.
19.
20.
Splice site selection is a key element of pre-mRNA splicing and involves specific recognition of consensus sequences at the 5' and 3' splice sites. Evidently, the compliance of a given sequence with the consensus 5' splice site sequence is not sufficient to define it as a functional 5' splice site, because not all sequences that conform with the consensus are used for splicing. We have previously hypothesized that the necessity to avoid the inclusion of premature termination codons within mature mRNAs may serve as a criterion that differentiates normal 5' splice sites from unused (latent) ones. We further provided experimental support to this idea, by analyzing the splicing of pre-mRNAs in which in-frame stop codons upstream of a latent 5' splice site were mutated, and showing that splicing using the latent site is indeed activated by such mutations. Here we evaluate this hypothesis by a computerized survey for latent 5' splice sites in 446 protein-coding human genes. This data set contains 2311 introns, in which we found 10490 latent 5' splice sites. The utilization of 10045 (95.8%) of these sites for splicing would have led to the inclusion of an in-frame stop codon within the resultant mRNA. The validity of this finding is confirmed here by statistical analyses. This finding, together with our previous experimental results, invokes a nuclear scanning mechanism, as part of the splicing machine, which identifies in-frame stop codons within the pre-mRNA and prevents splicing that could lead to the formation of a prematurely terminated protein.
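The survey described above asks, for each latent 5' splice site inside an intron, whether using it would pull an in-frame stop codon into the mRNA. A minimal sketch follows under simplifying assumptions: any intronic GT counts as a candidate latent site (the study required conformity with the consensus), and `carry` holds the 0 to 2 exon bases of an incomplete codon at the authentic junction. All names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_inframe_stop(carry, extension):
    """Check the exon carry-over plus intronic extension for an in-frame stop codon."""
    seq = (carry + extension).upper()
    return any(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def latent_sites(intron, carry=""):
    """Yield (position, includes_stop) for each candidate latent 5' site.

    position is the index of the G of an intronic GT; using that site would
    retain intron[:position] as extra exonic sequence.
    """
    intron = intron.upper()
    for pos in range(1, len(intron) - 1):
        if intron[pos:pos + 2] == "GT":  # simplification: GT only, no consensus scoring
            yield pos, has_inframe_stop(carry, intron[:pos])


# Consistency check on the reported figures: 10045 of 10490 sites.
REPORTED_PERCENT = round(100 * 10045 / 10490, 1)
```

The result depends on the reading frame at the junction, which is why `carry` must be supplied; the same extension can contain a stop in one frame and none in another.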
