Similar literature
Found 20 similar articles; search took 93 ms
1.
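In recent years, the fractal scaling properties of DNA sequences have attracted wide interest. Many studies indicate that the exon and intron regions of DNA sequences have different fractal scaling properties, which may serve as one feature for distinguishing exon sequences from intron sequences. This paper applies the WTMM (Wavelet Transform Modulus Maxima) method to analyze the fractal structure of DNA sequences and computes the Hölder exponent, a quantitative parameter that characterizes the scaling properties of a fractal structure. Taking the triplet coding property of exons into account, Hölder exponents are computed for the series obtained under three DNA walk schemes from each DNA sequence and from its three phase subsequences. Each Hölder exponent is treated as a one-dimensional feature, and the distributions of exon and intron sequences are examined. The results show that exons and introns are not separable on any single fractal scaling parameter. On this basis, from a pattern-recognition standpoint, exons and introns are treated as two pattern classes in the multidimensional feature space formed by these parameters, a classifier based on an LLM (Local Linear Map) neural network is designed, and its error rate is estimated. The experiments show that exon and intron sequences cluster in this feature space, indicating that exons and introns are separable when this set of fractal scaling parameters is used as sequence features. This result offers a new lead for research on algorithms that recognize exon and intron sequences.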
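The preprocessing described in this abstract (turning a sequence into a DNA walk, and splitting it into three phase subsequences) can be sketched as follows. This is only a minimal illustration: the abstract does not say which three walk schemes were used, so the purine/pyrimidine step rule below is an assumption, and the function names `dna_walk` and `phase_sequences` are hypothetical. The WTMM analysis and Hölder exponent estimation are not shown.

```python
from itertools import accumulate

# Assumed step rule (purine/pyrimidine walk): +1 for C/T, -1 for A/G.
# The abstract does not specify its three walk schemes; this is one common choice.
STEP = {"A": -1, "G": -1, "C": 1, "T": 1}


def dna_walk(seq):
    """Return the cumulative walk y(n) = sum of steps for bases 1..n."""
    return list(accumulate(STEP[base] for base in seq.upper()))


def phase_sequences(seq):
    """Split a sequence into its three codon-position (phase) subsequences."""
    return [seq[k::3] for k in range(3)]


if __name__ == "__main__":
    seq = "ATGCCA"
    print(dna_walk(seq))
    for phase, sub in enumerate(phase_sequences(seq)):
        print(phase, sub, dna_walk(sub))
```

Each of the resulting walks would then be passed to a scaling analysis, giving one feature per combination of subsequence and walk scheme.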

2.
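By introducing correlations between bases, we studied the fractal dimension of exon and intron sequences with the dinucleotide as the unit. In this setting, exon and intron sequences are self-similar over short and intermediate ranges, and we defined a fractal dimension for each of these two ranges. The results show that the short-range fractal dimension Dg is generally larger than the intermediate-range dimension Dm, and that both dimensions are larger for exons than for introns. Changing the phase of the dinucleotides leaves the fractal dimensions unchanged. This indicates that, on a dinucleotide basis, exons are more irregular than introns, short-range irregularity exceeds intermediate-range irregularity, and exon and intron sequences show no phase specificity for period-2 structure.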

3.
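(2) Gene rearrangement at the RNA level. Since the discovery of introns, it has been shown that most eukaryotic genes consist of two components, exons and introns. Segments removed during mRNA maturation are called introns, and segments retained in the mature mRNA are called exons. It was soon found, however, that there are exons within introns and introns within exons. Some sequence units appear in some mRNAs and disappear from others, acting sometimes as exons and sometimes as introns, to the point that exons are not always expressed and introns are not always intervening …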

4.
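Research progress on introns in fish genes   Cited by: 1 (self-citations: 0, citations by others: 1)
Introns are the non-coding sequences of split genes and are removed before the protein is encoded. In higher organisms introns are far longer than exons, so most random mutations fall within introns; the presence of introns therefore greatly increases the tolerance of higher organisms to mutation. Studies show that introns can raise the efficiency of gene expression; affect RNA transcription, splicing, export through nuclear pores, and translation; initiate the expression of certain genes; and regulate gene expression through alternative splicing. Findings on intron function have opened a new perspective for current research on fish immune genes. This review covers the classification, splicing, and function of introns and recent progress in fish intron research, and discusses prospects for applying introns in fish immune gene research.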

5.
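Interaction between introns and the corresponding coding sequences in nematode ribonucleoprotein genes   Cited by: 1 (self-citations: 0, citations by others: 1)
Intron sequences of nematode ribonucleoprotein genes were locally aligned against the corresponding coding sequences with the Smith-Waterman method to explore the mechanism of interaction between the two. The central part of the introns was found to contain regions that interact with the corresponding coding sequences. For first introns the best matches fall within 15% to 55% of the intron length, and for second introns within 30% to 80%. For long introns, the best matches fall within 5% to 20% of the intron when aligned against exon sequences. When aligned against the whole coding sequence, two peaks appear, one at 15% to 30% and the other at 54% to 78% of the intron. The first peak is presumed to relate to sequences inside exons, and the second to sequences at exon-exon junctions. The coding sequences were also found to carry multiple domains that interact with intron sequences, along with some regions where matches are excluded, which are presumed to relate to protein-binding regions. These conclusions support the view that intron sequences co-evolve with their corresponding coding sequences.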
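The local alignment step named in this abstract can be illustrated with a minimal Smith-Waterman scorer. The scoring parameters (match 2, mismatch -1, linear gap -2) and the function name `smith_waterman` are assumptions chosen for illustration; the abstract does not report the parameters actually used. The sketch returns only the best score and where the alignment ends, which is enough to locate a best match as a fraction of intron length.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_in_a, end_in_b) for the best local alignment.

    End positions are 1-based indices of the last aligned character.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)          # previous row of the DP matrix
    best = (0, 0, 0)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are floored at zero.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            if cur[j] > best[0]:
                best = (cur[j], i, j)
        prev = cur
    return best


if __name__ == "__main__":
    intron = "TTTACGTTT"    # toy sequences, not real data
    coding = "GGACGGG"
    score, end_intron, _ = smith_waterman(intron, coding)
    # Relative position of the match end within the intron, as a percentage.
    print(score, round(100 * end_intron / len(intron)))
```

Keeping only two rows of the matrix saves memory but gives up traceback; recovering the aligned region itself would require storing the full matrix.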

6.
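Objective: To study a typical pedigree with maturity-onset diabetes of the young (MODY) and identify its gene mutation sites. Methods: Seven members of a typical MODY pedigree were studied, with 10 patients with ordinary type 2 diabetes and no family history of diabetes and 10 healthy individuals as two control groups. Peripheral blood was drawn, leukocytes were isolated, and genomic DNA was extracted by rapid salting-out. Using genomic DNA as the template, exons 4, 2, and 6 of the HNF-1α gene and exon 1 of the GCK gene were amplified by PCR; the purified products were sequenced directly and compared with the respective normal sequences. Results: All 7 members of the MODY pedigree carried a single-base G→A substitution in the intron upstream of HNF-1α exon 2, namely IVS2nt-42 G-A. Four carried the frameshift mutation P380fsinsG in HNF-1α exon 6, one of whom also carried the P379S point mutation and the IVS6nt-4G-A mutation; 1 carried the P379S point mutation; in 2, no mutation or polymorphism was found. No mutation or polymorphism was found in GCK exon 1 or its introns. None of the 20 controls carried mutations in GCK exon 1 or in HNF-1α exons 2, 4, and 6 or their introns. Conclusion: This pedigree carries mutations in exon 6 of the HNF-1α gene and its upstream intron (frameshift and/or point mutations) and a single-base mutation in the intron upstream of exon 2; the MODY in this pedigree is MODY3.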

7.
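Splicing is one of the key steps in selective gene expression and consists mainly of two steps: removal of introns and joining of exons. Based on differences in base sequence and potential folding, introns are divided into three types: group I, group II, and group III introns. The first two types can self-splice, whereas splicing of group III introns must be mediated by the spliceosome, which is composed of nuclear RNA and proteins. This review covers the recognition and splicing mechanisms of the different intron types and briefly introduces applications of introns in bioinformatics.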

8.
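Establishment of a system for screening inteins with high splicing activity in heterologous organisms   Cited by: 1 (self-citations: 0, citations by others: 1)
In their native species, the autocatalytic protein splicing reaction mediated by inteins (protein introns) proceeds with 100% efficiency. When these inteins are cloned into heterologous species, splicing efficiency often drops sharply, and the great majority lose the ability to splice altogether. Because intein splicing activity depends directly on the first conserved amino acid at the C-terminal end of the extein (protein exon), this study designed several short extein sequences containing all of these conserved amino acids, introduced them by PCR at different sites of the kanamycin resistance protein (KanR), and cloned the corresponding inteins into these exteins. The result is a system that screens for inteins with high splicing activity in Escherichia coli on the basis of kanamycin resistance. Colony growth on kanamycin plates agreed closely with Western blot results, indicating that the screening system was established successfully. By coupling protein splicing to kanamycin resistance, this system with selectable exteins lets splicing results be read directly from plates, providing a fast and stable new means of screening for inteins that splice in heterologous species.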

9.
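Effect of the distribution of yeast introns within gene sequences on gene transcription efficiency   Cited by: 4 (self-citations: 2, citations by others: 4)
A comparative analysis of oligonucleotide usage in the intron sequences of highly and weakly transcribed yeast genes shows that the two classes of introns differ in sequence structure, and that introns of highly transcribed genes contain more potential transcription factor binding sites. This suggests that introns may take part in regulating gene transcription, a conclusion that awaits confirmation with more data. A detailed comparison of the distribution (length, position, and so on) of introns and exons in the two groups of genes shows a fairly clear boundary between the intron lengths of highly and weakly transcribed genes. Mean exon length differs somewhat between the two groups but shows no clear boundary, and the same holds for gene sequence length. Although the relative position of introns is close to the 5′ end in both classes, in terms of actual position more introns in highly transcribed genes lie very close to the 5′ end of the gene, and some lie in the 5′-UTR. These results suggest that transcription efficiency is related to intron length but not to exon or gene sequence length, that intron position may also affect transcription efficiency, and that intron-mediated regulation of transcription may be linked to, or be a continuation of, upstream transcriptional regulation.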

10.
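Construction and identification of a mammary gland expression vector for the human lactoferrin cDNA gene   Cited by: 2 (self-citations: 0, citations by others: 2)
To construct a mammary gland expression vector for the human lactoferrin gene (hLF) and verify its expression in mammary cells, the vector uses as its 5′ regulatory sequence the upstream region of the goat β-casein gene comprising the promoter, exon 1, intron 1, and part of exon 2, and as its 3′ regulatory sequence the downstream region comprising part of exon 7, intron 7, exon 8, intron 8, exon 9, and part of the 3′ genomic fragment. The two regulatory sequences are 6.2 kb and 7.1 kb long, respectively. The hLF gene (target gene) and the Neo gene (selection marker) were inserted downstream of the 5′ and 3′ regulatory sequences, respectively, yielding the vector pBC1-hLF-Neo with a total length of 25.348 kb. To test the vector's biological …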

11.
Wu J. BMC Genomics, 2008, 9(Z2): S13

Background

Computational gene prediction tools routinely generate large volumes of predicted coding exons (putative exons). One common limitation of these tools is the relatively low specificity due to the large amount of non-coding regions.

Methods

A statistical approach is developed that largely improves the gene prediction specificity. The key idea is to utilize the evolutionary conservation principle relative to the coding exons. By first exploiting the homology between genomes of two related species, a probability model for the evolutionary conservation pattern of codons across different genomes is developed. A probability model for the dependency between adjacent codons/triplets is added to differentiate coding exons and random sequences. Finally, the log odds ratio is developed to classify putative exons into the group of coding exons and the group of non-coding regions.
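The final classification step (a log odds ratio that separates coding exons from non-coding regions) can be sketched in a much simplified form. The sketch below scores each codon independently under two probability tables and thresholds the summed log ratio; it does not implement the paper's cross-species conservation model or its adjacent-codon dependency model. The function names, the threshold of 0, and the toy probabilities are all hypothetical.

```python
import math


def log_odds(codons, p_coding, p_noncoding):
    """Sum of per-codon log ratios log(P(c | coding) / P(c | non-coding)).

    Simplifying assumption: codons are scored independently of one another.
    """
    return sum(math.log(p_coding[c] / p_noncoding[c]) for c in codons)


def classify(codons, p_coding, p_noncoding, threshold=0.0):
    """Label a putative exon by comparing its log odds to a threshold."""
    score = log_odds(codons, p_coding, p_noncoding)
    return "coding" if score > threshold else "non-coding"


if __name__ == "__main__":
    # Toy two-codon alphabet with made-up probabilities.
    p_coding = {"GCC": 0.6, "TAA": 0.4}
    p_noncoding = {"GCC": 0.5, "TAA": 0.5}
    print(classify(["GCC", "GCC", "TAA"], p_coding, p_noncoding))
```

Raising the threshold trades sensitivity for specificity, which is the trade-off the abstract reports on.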

Results

The method was tested on pre-aligned human-mouse sequences where the putative exons are predicted by GENSCAN and TWINSCAN. The proposed method improves exon specificity by 73% and 32%, respectively, while the loss of sensitivity is ≤ 1%. The method also keeps 98% of the RefSeq gene structures that are correctly predicted by TWINSCAN while removing 26% of predicted genes that lie in non-coding regions. The estimated number of true exons in TWINSCAN's predictions is 157,070. The results and the executable code can be downloaded from http://www.stat.purdue.edu/~jingwu/codon/

Conclusion

The proposed method demonstrates an application of the evolutionary conservation principle to coding exons. It is a complementary method that can be used as an additional criterion to refine many existing gene predictions.

12.
13.
CD45 is a transmembrane protein tyrosine phosphatase, which in mammals plays an important role in T and B cell receptor and cytokine signaling. Recently, a catfish cDNA was shown to contain all characteristic CD45 features: an alternatively spliced amino-terminus, a cysteine-rich region, three fibronectin domains, a transmembrane region, and two phosphotyrosine phosphatase domains. However, analyses of CD45 cDNAs from various catfish lymphoid cell lines demonstrated that catfish CD45 is unique in that it contains a large number of alternatively spliced exons. Sequence analyses of cDNAs derived from the catfish clonal B cell line 3B11 indicated that this cell line expresses up to 13 alternatively spliced exons. Furthermore, sequence similarity among the alternatively spliced exons suggested duplication events. To establish the exact number and organization of alternatively spliced exons, a bacterial artificial chromosome library was screened, and the catfish functional CD45 gene plus six CD45 pseudogenes were sequenced. The catfish functional CD45 gene spans 37 kb and contains 49 exons. In comparison, the human and pufferfish CD45 genes consist of 34 and 30 exons, respectively. This difference in the otherwise structurally conserved catfish gene is due to the presence of 18 alternatively spliced exons that were likely derived through several duplication events. In addition, duplication events were also likely involved in generating the six pseudogenes, which are truncated at the 3′ ends. A similarly 3′-truncated CD45 pseudogene is also present in the pufferfish genome, suggesting that this specific CD45 gene duplication occurred before catfish and pufferfish diverged (400 million years ago).

14.
Our previous work applied neural network techniques to the problem of discriminating open reading frame (ORF) sequences taken from introns versus exons. The method counted the codon frequencies in an ORF of a specified length, and then used this codon frequency representation of DNA fragments to train a neural net (essentially a Perceptron with a sigmoidal, or "soft step function", output) to perform this discrimination. After training, the network was applied to a disjoint "predict" set of data to assess accuracy. The resulting accuracy in our previous work was 98.4%, exceeding accuracies reported in the literature at that time for other algorithms. Here, we report even higher accuracies stemming from calculations of mutual information (a correlation measure) of spatially separated codons in exons and in introns. Significant mutual information exists between adjacent codons in exons, but not in introns. This suggests that dicodon frequencies of adjacent codons are important for intron/exon discrimination. We report that accuracies obtained using a neural net trained on the frequency of dicodons are significantly higher at smaller fragment lengths than even our original results using codon frequencies, which were already higher than those of simple statistical methods that also used codon frequencies. We also report accuracies obtained from including codon and dicodon statistics in all six reading frames, i.e. the three frames on the original and complement strands. Inclusion of six-frame statistics increases the accuracy still further. We also compare these neural net results to a Bayesian statistical prediction method that assumes independent codon frequencies in each position. The performance of the Bayesian scheme is poorer than that of any of the neural schemes; however, many methods reported in the literature use this method, either explicitly or implicitly.
Specifically, Bayesian prediction schemes based on codon frequencies achieve 90.9% accuracy on 90 codon ORFs, while our best neural net scheme reaches 99.4% accuracy on 60 codon ORFs. "Accuracy" is defined as the average of the exon and intron sensitivities. Achievement of sufficiently high accuracies on short fragment lengths can be useful in providing a computational means of finding coding regions in unannotated DNA sequences such as those arising from the mega-base sequencing efforts of the Human Genome Project. We caution that the high accuracies reported here do not represent a complete solution to the problem of identifying exons in "raw" base sequences. The accuracies are considerably lower for exons of small length, although still higher than accuracies reported in the literature for other methods. Short exon lengths are not uncommon.(ABSTRACT TRUNCATED AT 400 WORDS)
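The feature extraction and the accuracy definition in this abstract can be made concrete with a small sketch. It counts codons and adjacent-codon pairs (dicodons) in a single reading frame and computes accuracy as the mean of exon and intron sensitivities, as the abstract defines it. The function names are hypothetical, the neural network itself is not shown, and six-frame statistics are left out.

```python
from collections import Counter


def codons_of(orf):
    """Split an ORF into non-overlapping codons, dropping any trailing partial codon."""
    return [orf[i:i + 3] for i in range(0, len(orf) - 2, 3)]


def codon_counts(orf):
    """Count each codon in frame 0."""
    return Counter(codons_of(orf))


def dicodon_counts(orf):
    """Count each pair of adjacent codons (dicodons) in frame 0."""
    codons = codons_of(orf)
    return Counter(a + b for a, b in zip(codons, codons[1:]))


def accuracy(exons_correct, exons_total, introns_correct, introns_total):
    """Average of exon sensitivity and intron sensitivity."""
    return 0.5 * (exons_correct / exons_total + introns_correct / introns_total)


if __name__ == "__main__":
    orf = "ATGAAATGA"   # toy fragment, not real data
    print(codon_counts(orf))
    print(dicodon_counts(orf))
    print(accuracy(3, 4, 1, 2))
```

Averaging the two sensitivities keeps the score from being dominated by whichever class has more test fragments.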

15.
Rapid direct sequence analysis of the dystrophin gene   Cited by: 8 (self-citations: 0, citations by others: 8)
Mutations in the dystrophin gene result in both Duchenne and Becker muscular dystrophy (DMD and BMD), as well as X-linked dilated cardiomyopathy. Mutational analysis is complicated by the large size of the gene, which consists of 79 exons and 8 promoters spread over 2.2 million base pairs of genomic DNA. Deletions of one or more exons account for 55%-65% of cases of DMD and BMD, and a multiplex polymerase chain reaction method, currently the most widely available method of mutational analysis, detects approximately 98% of deletions. Detection of point mutations and small subexonic rearrangements has remained challenging. We report the development of a method that allows direct sequence analysis of the dystrophin gene in a rapid, accurate, and economical fashion. This same method, termed "SCAIP" (single condition amplification/internal primer) sequencing, is applicable to other genes and should allow the development of widely available assays for any number of large, multiexon genes.

16.
A family of mammalian protocadherin (Pcdh) proteins is encoded by three closely linked gene clusters (alpha, beta, and gamma). Multiple alpha and gamma Pcdh mRNAs are expressed in distinct patterns in the nervous system and are generated by alternative pre-mRNA splicing between different "variable" exons and three "constant" exons within each cluster. We show that each Pcdh variable exon is preceded by a promoter and that promoter choice determines which variable exon is included in a Pcdh mRNA. In addition, we provide evidence that alternative splicing of variable exons within a gene cluster occurs via a cis-splicing mechanism. However, virtually every variable exon can engage in trans-splicing with constant exons from another cluster, albeit at a far lower level.

17.
Genomic cloning and chromosomal assignment of rat regucalcin gene   Cited by: 1 (self-citations: 0, citations by others: 1)
The gene for the Ca2+-binding protein regucalcin was cloned from a rat genomic library, constructed in FIX II, by screening with a radiolabeled probe (complementary DNA of rat liver regucalcin). The positive clone had a 19.9 kb insert and contained four exons of the gene coding for rat regucalcin. These exons included part of the coding sequence (61.2% of the open reading frame) and the entire 3′-untranslated region of the gene. The nucleotide sequence of the exons agreed completely with that of a rat regucalcin cDNA clone. Sequence analysis of the clone showed that an identifier sequence and two simple repeated sequences exist in the intron of the gene. Moreover, the chromosomal location of the rat regucalcin gene was determined by the direct R-banding fluorescence in situ hybridization (FISH) method with the 19.9 kb clone containing four exons. The regucalcin gene was localized to the proximal end of rat chromosome Xq11.1–12. The nucleotide sequence data reported in this paper will appear in the DDBJ, EMBL and GenBank Nucleotide Sequence Databases under accession number D31662.

18.
19.
20.

Background  

ESTs are a tremendous resource for determining the exon-intron structures of genes, but even extensive EST sequencing tends to leave many exons and genes untouched. Gene prediction systems based exclusively on EST alignments miss these exons and genes, leading to poor sensitivity. De novo gene prediction systems, which ignore ESTs in favor of genomic sequence, can predict such "untouched" exons, but they are less accurate when predicting exons to which ESTs align. TWINSCAN is the most accurate de novo gene finder available for nematodes and N-SCAN is the most accurate for mammals, as measured by exact CDS gene prediction and exact exon prediction.
