Similar literature
 Found 20 similar articles; search took 31 ms
1.
Only a fraction of the rules used by the splicing machinery to precisely determine intron-exon boundaries is known. Recent evidence suggests that specific short sequences within exons help define these boundaries. Such sequences are known as exonic splicing enhancers (ESEs). One possible bioinformatic approach to studying ESE sequences is to compare genes that harbor introns with genes that do not. For this purpose, two non-redundant samples of 719 intron-containing and 63 intron-lacking human genes were created. We performed a statistical analysis of these datasets of intron-containing and intron-lacking human coding sequences and found a statistically significant difference (P = 0.01) between the samples in their distributions of 5-6mer oligonucleotides. The difference is not created by a few strong signals present in the majority of exons, but rather by the accumulation of multiple weak signals through small variations in codon frequencies, codon biases and context-dependent codon biases between the samples. Our analysis yielded a list of putative novel human splicing regulation sequences.
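The comparison of 5-6mer distributions described above can be illustrated with a minimal sketch. The abstract does not say which statistic was used, so the plain frequency difference below is an assumption, and the function names `kmer_frequencies` and `frequency_differences` are hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Relative frequency of every overlapping k-mer across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def frequency_differences(set_a, set_b, k):
    """Per-k-mer frequency in set_a minus frequency in set_b (illustrative statistic only)."""
    freq_a = kmer_frequencies(set_a, k)
    freq_b = kmer_frequencies(set_b, k)
    return {
        kmer: freq_a.get(kmer, 0.0) - freq_b.get(kmer, 0.0)
        for kmer in set(freq_a) | set(freq_b)
    }
```

With real data, `set_a` and `set_b` would hold the intron-containing and intron-lacking coding sequences, and `k` would be 5 or 6. Many small differences, rather than a few large ones, would match the pattern the abstract reports.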

2.
The aim of the present study was to examine whether statistical methods commonly used to analyze point process signals could be applied to the electromyogram in order to extract information about the physiological mechanisms involved. The study assumed that the electromyogram can be treated as the superposition of a number of point process signals, each representing the firing pattern of one motor unit. No correlated activity between the different spike trains was assumed at this stage. A digital model for the superposition of event sequences was constructed, assigning a Gaussian interval distribution to the individual sequences. The effects of varying the number of spike trains participating in the superposition process and of changing the mean firing rates were explored. The statistical methods used in the analysis were serial correlation, event autocorrelation, and power spectrum studies. Serial correlograms of the superimposed processes were found to be helpful in detecting the number of spike trains involved in the superposition, whereas power spectrum studies were useful in determining the mean firing rates of the individual sequences.
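The superposition model in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the clamping of non-positive Gaussian intervals, the particular serial-correlation estimator, and all function names are choices made here, not details taken from the study.

```python
import random


def spike_train(mean_interval, sd, duration, rng):
    """Event times of one unit with Gaussian inter-spike intervals (clamped to stay positive)."""
    t, times = 0.0, []
    while True:
        t += max(rng.gauss(mean_interval, sd), 1e-6)  # clamp is an assumption
        if t >= duration:
            return times
        times.append(t)


def superpose(trains):
    """Merge independent spike trains into one ordered event sequence."""
    return sorted(t for train in trains for t in train)


def serial_correlation(times, lag=1):
    """Serial correlation coefficient of successive intervals at the given lag."""
    intervals = [b - a for a, b in zip(times, times[1:])]
    n = len(intervals) - lag
    if n <= 0:
        raise ValueError("not enough intervals for this lag")
    mean = sum(intervals) / len(intervals)
    variance = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    if variance == 0:
        raise ValueError("intervals have zero variance")
    covariance = sum(
        (intervals[i] - mean) * (intervals[i + lag] - mean) for i in range(n)
    ) / n
    return covariance / variance
```

A serial correlogram is then `serial_correlation(merged, lag)` evaluated over a range of lags, where `merged = superpose([...])` for a chosen number of simulated units.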

3.
Identification of splicing regulatory elements (SREs) deserves special attention because these short cis-acting sequences are vital parts of the splicing code. Because a variety of other biological signals cooperatively govern the splicing pattern, novel tools are needed that incorporate information from multiple sources to improve the prediction of splicing factor binding sites. In this context, we propose Varying Effect Regression for Splicing Elements (VERSE), which discovers intronic SREs in the proximity of exon junctions by integrating other biological features. As a result, 1562 intronic SREs were identified in 16 human tissues, many of which overlapped with experimentally verified binding motifs for several well-known splicing factors, including FOX-1, PTB, hnRNP A/B, and hnRNP F/H. The tissue, region, and conservation preferences discovered for the putative motifs demonstrate that splice site selection is a complicated process that requires subtle and delicate regulation. VERSE may serve as a powerful tool not only to discover SREs by incorporating additional informative signals but also to precisely quantify their varying contributions under different biological contexts.

4.
Pyrosequencing is one of the important next-generation sequencing technologies. We derive the distribution of the number of positive signals in pyrograms of this sequencing technology as a function of flow cycle numbers and nucleotide probabilities of the target sequences. As for the distribution of sequence length, we also derive the distribution of positive signals for the fixed flow cycle model. Explicit formulas are derived for the mean and variance of the distributions. A simple result for the mean of the distribution is that the mean number of positive signals in a pyrogram is approximately twice the number of flow cycles, regardless of nucleotide probabilities. The statistical distributions will be useful for instrument and software development for pyrosequencing and other related platforms.
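The "twice the number of flow cycles" result can be checked informally with a small simulation. The sketch below assumes a fixed cyclic dispensation order (`TACG` is an illustrative choice, not taken from the abstract), an i.i.d. random template, and a homopolymer run that is incorporated in a single flow. The function names are hypothetical.

```python
import random


def count_positive_signals(template, n_cycles, flow_order="TACG"):
    """Count flows that incorporate at least one base during n_cycles flow cycles."""
    pos = 0
    positives = 0
    for _ in range(n_cycles):
        for nucleotide in flow_order:
            if pos < len(template) and template[pos] == nucleotide:
                positives += 1
                # A whole homopolymer run is incorporated in one flow.
                while pos < len(template) and template[pos] == nucleotide:
                    pos += 1
    return positives


def random_template(length, rng, weights=(1, 1, 1, 1)):
    """Random template with independent bases drawn from A, C, G, T with the given weights."""
    return "".join(rng.choices("ACGT", weights=weights, k=length))
```

Dividing `count_positive_signals(template, n)` by `n` for a long random template gives an empirical estimate to compare against the approximate factor of two stated in the abstract.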

5.
A database of 209 Drosophila introns was extracted from GenBank (release number 64.0) and examined by a number of methods in order to characterize features that might serve as signals for messenger RNA splicing. A tight distribution of sizes was observed: while the smallest introns in the database are 51 nucleotides, more than half are less than 80 nucleotides in length, and most of these have lengths in the range of 59-67 nucleotides. Drosophila splice sites found in large and small introns differ in only minor ways from each other and from those found in vertebrate introns. However, larger introns have greater pyrimidine-richness in the region between 11 and 21 nucleotides upstream of 3' splice sites. The Drosophila branchpoint consensus matrix resembles C T A A T (in which branch formation occurs at the underlined A), and differs from the corresponding mammalian signal in the absence of G at the position immediately preceding the branchpoint. The distribution of occurrences of this sequence suggests a minimum distance between 5' splice sites and branchpoints of about 38 nucleotides, and a minimum distance between 3' splice sites and branchpoints of 15 nucleotides. The methods we have used detect no information in exon sequences other than in the few nucleotides immediately adjacent to the splice sites. However, Drosophila resembles many other species in that there is a discontinuity in A + T content between exons and introns, with the introns being A + T rich.

6.
In recent years, identification of alternative splicing (AS) variants has been gaining momentum. We developed AVATAR, a database for documenting AS using 5,469,433 human EST sequences and 26,159 human mRNA sequences. AVATAR contains 12,000 alternative splicing sites identified by mapping ESTs and mRNAs to the whole human genome sequence. AVATAR also contains AS information for 6 eukaryotes. We mapped EST alignment information into a graph model in which exons and introns are represented by vertices and edges, respectively. AVATAR can be queried using search parameters such as (1) gene names, (2) the number of identified AS events in a gene, and (3) the minimal number of ESTs supporting a splicing site. The system provides visualized AS information for queried genes.
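The graph model mentioned above, with exons as vertices and introns as edges, can be sketched in a few lines. This is not AVATAR's implementation. The exon representation as `(start, end)` tuples and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def build_splice_graph(transcripts):
    """Map each exon (vertex) to the set of exons it is spliced to (edges are introns)."""
    graph = defaultdict(set)
    for exons in transcripts:
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph


def branching_exons(graph):
    """Exons with more than one outgoing intron, a simple indicator of alternative splicing."""
    return {
        exon: sorted(targets)
        for exon, targets in graph.items()
        if len(targets) > 1
    }
```

For two aligned transcripts that share a first and last exon but differ in a middle exon, `branching_exons` reports the first exon together with both of its downstream partners.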

Availability


7.
8.
Recognition of coding regions within eukaryotic genomes is one of the oldest yet still unsolved problems in bioinformatics. New high-accuracy methods for recognizing splicing sites are needed to solve this problem. A question of current interest is how to identify specific features of the nucleotide sequences near splicing sites and how to recognize the sites in their sequence context. We performed a statistical analysis of a database of human gene fragments and revealed some characteristics of the nucleotide sequences in the neighborhood of splicing sites. The frequencies of all nucleotides and dinucleotides around splicing sites were computed, and nucleotides and dinucleotides with extremely high or low occurrences were identified. The statistical information obtained in this work can be used in the further development of methods for splicing site annotation and exon-intron structure recognition.
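Position-specific nucleotide and dinucleotide frequencies of the kind described here can be computed as in the sketch below. It assumes that every input window has the same length and is aligned on the splice site. The function name `positional_frequencies` is hypothetical.

```python
from collections import Counter


def positional_frequencies(windows, width=1):
    """Frequency of each nucleotide (width=1) or dinucleotide (width=2) at every window position."""
    if not windows:
        return []
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    table = []
    for pos in range(length - width + 1):
        counts = Counter(w[pos:pos + width].upper() for w in windows)
        table.append({symbol: n / len(windows) for symbol, n in counts.items()})
    return table
```

Positions where a single symbol has a frequency near 1 or near 0 are the candidates for "extremely high or low occurrences" in the sense of the abstract.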

9.
10.
11.
Auxiliary splicing signals in introns play an important role in splice site selection, but these elements are poorly understood. We show that a subset of serine/arginine (SR)-rich proteins activate a cryptic 3' splice site in a sense Alu repeat located in intron 4 of the human LST1 gene. Utilization of this cryptic splice site is controlled by juxtaposed Alu-derived splicing silencers and enhancers between closely linked short tandem repeats TNFd and TNFe. Systematic mutagenesis of these elements showed that AG dinucleotides that were not preceded by purine residues were critical for repressing exon inclusion of a chimeric splicing reporter. Since the splice acceptor-like sequences are present in excess in exonic splicing silencers, these signals may contribute to inhibition of a large number of pseudosites in primate genomes.

12.
This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. The goal of the present study was to investigate the possibility that the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 super-family proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two independent exponential random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had exponentially distributed random independent lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length. The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345), which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
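The two densities named in this abstract are related by a standard result: the sum of two independent exponential variables with the same rate follows a gamma distribution with shape 2, while unequal rates give a different (hypoexponential) density. Whether this is the distinction the authors intended is an assumption here, since the abstract does not say. The sketch below checks both closed forms against a numerical convolution. All function names and the rate values are illustrative.

```python
import math


def exponential_sum_pdf(x, rate_a, rate_b, steps=2000):
    """Density at x of X + Y, X ~ Exp(rate_a), Y ~ Exp(rate_b), via midpoint-rule convolution."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (rate_a * math.exp(-rate_a * t)
                  * rate_b * math.exp(-rate_b * (x - t)))
    return total * h


def gamma_shape2_pdf(x, rate):
    """Gamma density with shape 2: rate^2 * x * exp(-rate * x)."""
    return rate * rate * x * math.exp(-rate * x)


def hypoexponential_pdf(x, rate_a, rate_b):
    """Closed-form density of the sum of two exponentials with different rates."""
    return (rate_a * rate_b / (rate_b - rate_a)
            * (math.exp(-rate_a * x) - math.exp(-rate_b * x)))
```

With equal rates, `exponential_sum_pdf` agrees with `gamma_shape2_pdf`. With unequal rates it agrees with `hypoexponential_pdf`, which tends to the gamma form as the two rates approach each other.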

13.
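Genome-wide detection and analysis of alternative splicing of NBS-LRR genes in rice   Total citations: 1, self-citations: 0, citations by others: 1
Gu Lianfeng (顾连峰)  Guo Rongfa (郭荣发) 《遗传学报》 (Acta Genetica Sinica) 2007,34(3):247-257
Alternative splicing is a major mechanism that increases genome complexity and proteome diversity, yet no genome-wide analysis of alternative splicing of rice NBS-LRR sequences has been reported. Using hidden Markov model searches, 855 sequences encoding NBS-LRR motifs were obtained from the TIGR database. These sequences were used for homology searches against three databases (KOME, the TIGR Gene Indices, and UniProt) to obtain homologous full-length cDNA sequences, tentative consensus sequences, and protein sequences. The full-length cDNA sequences and tentative consensus sequences were then aligned to the corresponding BAC sequences with the Spidey and SIM4 programs to predict alternative splicing. Protein sequences were aligned to genomic sequences with tBLASTn. Of these 875 NBS-LRR genes, 119 showed alternative splicing, comprising 71 intron retention events, 20 exon skipping events, 25 alternative initiation events, 16 alternative termination events, 12 alternative 5′ splicing events, and 16 alternative 3′ splicing events. Most alternative splicing events were supported by two or more transcripts. The data can be queried at http://www.bioinfor.org. Bioinformatic analysis of the splice boundaries further showed that the 'GT…AG' rule is less conserved in exon skipping and intron retention than in constitutive splicing, which suggests that different regulatory mechanisms direct the formation of these splice variants. Analysis of the effect of intron retention on proteins showed that alternatively spliced proteins tend to have altered C-terminal amino acid sequences. Finally, analysis of tissue distribution and protein localization showed that alternative splicing events were most frequent in root and callus tissue, and that more than one third of the splice-variant proteins were localized to the plasma membrane and cytoplasm. These alternatively spliced proteins may play important roles in disease-resistance signal transduction.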
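Conservation of the 'GT…AG' rule can be compared between intron classes with a check like the one below. It is a minimal sketch, assuming intron sequences are given 5′ to 3′ on the coding strand. The function names are hypothetical and this is not the pipeline used in the study.

```python
def follows_gt_ag(intron):
    """True if the intron starts with GT (or GU) and ends with AG."""
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def gt_ag_fraction(introns):
    """Fraction of introns that obey the GT...AG rule (0.0 for an empty list)."""
    if not introns:
        return 0.0
    return sum(follows_gt_ag(i) for i in introns) / len(introns)
```

Computing `gt_ag_fraction` separately for constitutive introns and for introns involved in exon skipping or intron retention gives the kind of comparison the abstract describes.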

14.
The adenovirus E1A region encodes three overlapping mRNAs, designated 9S, 12S and 13S. They differ from each other in the length of the intron that is removed by RNA splicing. In a search for signals that influence splicing of the 13S mRNA, we constructed E1A genes with deletions and insertions in the intervening sequence that is common to all three E1A mRNAs. Mutant plasmids were transfected into HeLa cells and the transiently expressed E1A mRNAs were characterized by the S1 protection assay. The results show that five upstream and 20 downstream nucleotides are sufficient to allow correct utilization of the 5'-splice junction for the E1A 13S mRNA. Moreover, we show that a minimal intron length of 78 nucleotides is required for efficient 13S mRNA splicing. The ability of mutants with large intron deletions to produce mature 13S mRNA could be partially restored by expanding the intron length with phage lambda sequences. However, in no case was the normal splicing efficiency obtained with these mutants. In contrast, one mutant in which sequences from the authentic 13S mRNA intron were used to expand the intron expressed almost normal levels of 13S mRNA, suggesting that signals that specifically promote 13S mRNA splicing exist.

15.
16.
Pre-mRNA splicing in higher plants   Total citations: 13, self-citations: 0, citations by others: 13
Most plant mRNAs are synthesized as precursors containing one or more intervening sequences (introns) that are removed during the process of splicing. The basic mechanism of spliceosome assembly and intron excision is similar in all eukaryotes. However, the recognition of introns in plants has some unique features that distinguish it from the reactions in vertebrates and yeast. Recent progress has been made in characterizing the splicing signals in plant pre-mRNAs, in identifying mutants affected in splicing, and in discovering new examples of alternatively spliced mRNAs. In combination with information provided by the Arabidopsis genome-sequencing project, these studies are contributing to a better understanding of the splicing process and its role in the regulation of gene expression in plants.

17.
Signal-dependent alternative splicing is important for regulating gene expression in eukaryotes, yet our understanding of how signals impact splicing mechanisms is limited. A model to address this issue is alternative splicing of Drosophila TAF1 pre-mRNA in response to camptothecin (CPT)-induced DNA damage signals. CPT treatment of Drosophila S2 cells causes increased inclusion of TAF1 alternative cassette exons 12a and 13a through an ATR signaling pathway. To evaluate the role of TAF1 pre-mRNA sequences in the alternative splicing mechanism, we developed a TAF1 minigene (miniTAF1) and an S2 cell splicing assay that recapitulated key aspects of CPT-induced alternative splicing of endogenous TAF1. Analysis of miniTAF1 indicated that splice site strength underlies independent and distinct mechanisms that control exon 12a and 13a inclusion. Mutation of the exon 13a weak 5' splice site or weak 3' splice site to a consensus sequence was sufficient for constitutive exon 13a inclusion. In contrast, mutation of the exon 12a strong 5' splice site or moderate 3' splice site to a consensus sequence was only sufficient for constitutive exon 12a inclusion in the presence of CPT-induced signals. Analogous studies of the exon 13 3' splice site suggest that exon 12a inclusion involves signal-dependent pairing between constitutive and alternative splice sites. Finally, intronic elements identified by evolutionary conservation were necessary for full repression of exon 12a inclusion or full activation of exon 13a inclusion and may be targets of CPT-induced signals. In summary, this work defines the role of sequence elements in the regulation of TAF1 alternative splicing in response to a DNA damage signal.

18.
E L Madison  P Bird 《Gene》1992,121(1):179-180
A phagemid (pSHT) containing the pUC and M13 ori sequences was constructed to facilitate the expression of partial cDNAs or of sequences encoding mammalian membrane- and secretory-protein domains. It provides a start codon and signal sequence flanked upstream by the simian virus 40 and bacteriophage T7 promoters and downstream by cloning sites, stop codons in all three frames, splicing and polyadenylation signals.

19.
We previously reported a computational approach to infer alternative splicing patterns from Mus musculus full-length cDNA clones and microarray data. Although we predicted a large number of unreported splice variants, the general mechanisms regulating alternative splicing were still unknown. In the present study, we compared alternative exons and constitutive exons in terms of splice-site strength and frequency of potential regulatory sequences. These regulatory features were further compared among five different species: Homo sapiens, M. musculus, Arabidopsis thaliana, Oryza sativa, and Drosophila melanogaster. Statistical validation of our comparative analyses indicated that alternative exons have (1) weaker splice sites and (2) more potential regulatory sequences than constitutive exons. Based on our observations, we propose a combinatorial model of alternative splicing mechanisms, which suggests that alternative exons contain weak splice sites regulated alternatively by potential regulatory sequences on the exons.

20.
The branchpoint sequence and associated polypyrimidine tract are firmly established splicing signals in vertebrates. In plants, however, these signals have not been characterized in detail. The potato invertase mini-exon 2 (9 nt) requires a branchpoint sequence positioned around 50 nt upstream of the 5' splice site of the neighboring intron and a U11 element found adjacent to the branchpoint in the upstream intron (Simpson et al., RNA, 2000, 6:422-433). Utilizing the sensitivity of this plant splicing system, these elements have been characterized by systematic mutation and analysis of the effect on inclusion of the mini-exon. Mutation of the branchpoint sequence in all possible positions demonstrated that branchpoints matching the consensus, CURAY, were most efficient at supporting splicing. Branchpoint sequences that differed from this consensus were still able to permit mini-exon inclusion but at greatly reduced levels. Mutation of the downstream U11 element suggested that it functioned as a polypyrimidine tract rather than as a UA-rich element of the kind common in plant introns. The minimum sequence requirement of the polypyrimidine tract for efficient splicing was two closely positioned groups of uridines 3-4 nt long (<6 nt apart) that, within the context of the mini-exon system, had to be close (<14 nt) to the branchpoint sequence. The functional characterization of the branchpoint sequence and polypyrimidine tract defines these sequences in plants for the first time, and firmly establishes polypyrimidine tracts as important signals in splicing of at least some plant introns.
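Candidate branchpoints matching the CURAY consensus (R = A or G, Y = C or U) can be located with a regular expression, as in the sketch below. The function name `find_branchpoints` is hypothetical. The code only finds consensus matches and does not model the spacing requirements for the polypyrimidine tract described in the abstract.

```python
import re

# CURAY with IUPAC codes expanded: R = A/G, Y = C/U.
# The lookahead lets overlapping matches be reported as well.
BRANCHPOINT = re.compile(r"(?=(CU[AG]A[CU]))")


def find_branchpoints(sequence):
    """Return (index, match) pairs for every CURAY match; DNA input is converted to RNA."""
    rna = sequence.upper().replace("T", "U")
    return [(m.start(1), m.group(1)) for m in BRANCHPOINT.finditer(rna)]
```

For example, `find_branchpoints("GGCUAACGG")` reports a single match, `CUAAC`, starting at index 2.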
