Similar articles
 19 similar articles found (search time: 140 ms)
1.
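To address the problem of identifying coding regions in DNA sequences, this study proposes a model that combines a feature vector with logistic regression. DNA sequences are first converted numerically into feature vectors, whose elements are extracted with the k-character (k-mer) relative frequency technique; a binary logistic regression algorithm is then used to distinguish coding regions from non-coding regions. Five-fold cross-validation on two benchmark data sets, HMR195 and BG570, gave mean AUC (area under the curve) values of 0.9813 and 0.9874, respectively, clearly better than the traditional Bayesian discriminant method, VOSSDFT, and other methods. In addition, the proposed feature vector has very low dimensionality, which improves computational efficiency. The combined model can therefore identify protein-coding regions efficiently and accurately.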
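The feature-extraction step described in this abstract can be sketched as follows. This is a minimal illustration, assuming that the "k-character relative frequency" of a sequence is the count of each k-mer divided by the number of valid k-mer windows; the function name `kmer_relative_frequencies` and the choice to skip windows with ambiguous bases are assumptions for illustration, not details taken from the paper. The logistic regression step is not shown.

```python
from itertools import product


def kmer_relative_frequencies(seq, k=3):
    """Return the relative frequency of every DNA k-mer in seq.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing N or other ambiguity codes
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[w] / total for w in kmers]


# Hypothetical toy sequence; with k=2 the feature vector has 16 elements.
features = kmer_relative_frequencies("ATGGCGTAA", k=2)
```

A vector like `features` could then be passed, one row per sequence, to any binary logistic regression implementation.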

2.
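The coding region of the CD147 gene was cloned from human osteosarcoma cells and sequenced. Using mRNA from the human osteosarcoma cell line MG63 as template, the coding-region cDNA of human CD147 was obtained by RT-PCR, cloned into the vector pcDNA3.1( ), verified by restriction digestion, and then sequenced. The human CD147 coding region and the recombinant plasmid pcDNA3.1( )/asCD147 were obtained, and DNA sequence analysis confirmed that the sequence agrees with that reported in the literature. These results show that CD147 is also expressed in osteosarcoma cells and lay a foundation for further study of its structure and function.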

3.
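A comparative study of coding-region and non-coding-region SSR markers in rice groups   Cited by: 1 (self-citations: 0; citations by others: 1)
Fourteen pairs of SSR primers for rice coding regions were designed, and 12 published pairs of non-coding-region SSR primers and 3 pairs of coding-region SSR primers were selected; the polymorphism of the resulting 29 markers was analyzed by the SSR technique in 60 rice accessions. Coding-region SSR markers detected an average of 3.59 polymorphic loci, with polymorphism information content (PIC) ranging from 0.032 to 0.853 and averaging 0.447; non-coding-region SSR markers detected an average of 3.92 polymorphic loci, with PIC ranging from 0.063 to 0.795 and averaging 0.521. Cluster analysis showed that non-coding-region SSR markers distinguish rice groups from different regions more precisely, while coding-region SSR markers also show good polymorphism and can likewise be used to analyze genetic relationships in rice.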
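The abstract does not state which PIC formula was used. As a hedged sketch, the code below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a marker. The function name `pic` and the example frequencies are illustrative only.

```python
def pic(allele_freqs):
    """Polymorphism information content for one marker.

    Assumes the Botstein et al. (1980) formula; allele_freqs should sum to 1.
    """
    n = len(allele_freqs)
    sum_sq = sum(p * p for p in allele_freqs)
    cross = sum(
        2 * allele_freqs[i] ** 2 * allele_freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1 - sum_sq - cross


# Hypothetical marker with two equally frequent alleles.
two_allele_pic = pic([0.5, 0.5])
```

A marker with a single allele gives a PIC of 0, and the value rises as more alleles occur at more even frequencies.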

4.
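Compositional analysis of non-coding regions of eukaryotic DNA   Cited by: 4 (self-citations: 0; citations by others: 4)
At the whole-genome level, four methods (histograms, chaos game representation grayscale images, distance dissimilarity, and information-entropy dissimilarity) were used to study the short-oligonucleotide composition and compositional complexity of three kinds of regions (introns, intergenic DNA, and exons) in Arabidopsis, nematode, and fruit fly. The results show: (a) Across genomes, regardless of gene number, the compositional complexity of exons obtained with all four methods is similar, whereas that of the non-coding regions differs greatly. This shows quantitatively that differences in complexity between species are reflected mainly in the non-coding regions rather than in the coding regions. (b) Within a genome, the short-oligonucleotide compositional complexity of introns is similar, and that of exons and intergenic DNA is also similar. (c) Introns and intergenic DNA differ greatly in transcription, splicing, and secondary structure, yet differ very little in short-oligonucleotide composition, which indicates that these differences are not constrained through short-oligonucleotide composition.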
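One of the four methods named above, the chaos game representation (CGR), can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation: the corner assignment (A, C, G, T at the four corners of the unit square), the starting point (0.5, 0.5), and the function names are assumptions. Each base moves the current point halfway toward its corner; binning the points on a 2^k by 2^k grid gives the counts that a grayscale image would display.

```python
# Assumed corner layout; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory of seq as a list of (x, y) points."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ambiguous bases are simply skipped in this sketch
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, k=2):
    """Count CGR points on a 2**k x 2**k grid (one cell per k-mer)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points do not yet correspond to a full k-mer.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


grid = cgr_grid("ACGTAC", k=2)  # hypothetical toy sequence
```

Comparing such grids between regions (for example, by a distance or an entropy difference) is one way to quantify differences in short-oligonucleotide composition.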

5.
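Variant forms exist in the coding region of human Nmi mRNA   Cited by: 1 (self-citations: 0; citations by others: 1)
The Nmi gene encodes a protein that interacts with Myc. Eight hours after withdrawal of the cytokine GM-CSF from the human erythroleukemia cell line TF-1, the expression level of Nmi mRNA was found to increase. When the Nmi cDNA coding region was amplified from these cells by PCR, a fragment about 100-200 bp shorter than the published nucleic acid sequence was found in addition to the fragment of normal size. Sequence analysis showed that in this fragment bases 337-509 of the coding region are deleted and replaced by the 11 bases GTTCCATTGCG, forming an open reading frame that encodes 254 amino acids, 53 fewer than the 307 amino acids encoded by wild-type Nmi.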

6.
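Sun Yigang, Gao Lei, Zhang Zhonghua, Xue Qingzhong. 《遗传》 (Hereditas (Beijing)), 2005, 27(4): 629-635
Entropy-based divergence measures provide a powerful tool for analyzing DNA sequence complexity, predicting gene coding regions, and identifying the boundaries between coding and non-coding DNA. To make the search for boundaries between coding and non-coding regions of rice genes more efficient, this paper proposes two new divergence measures (the α-KL divergence and the α-Jensen-Shannon divergence) and constructs a coarse-grained vector over the codons of each amino acid according to codon GC content. The accuracy obtained with different vectors combined with the Jensen-Shannon divergence, the Jensen-Renyi divergence, the α-KL divergence, and the α-Jensen-Shannon divergence was compared. The results show that, for recognizing the 'terminator' of rice gene coding regions, the codon coarse-grained vector combined with the newly introduced measures is 4-5 times more efficient than the method of Bernaola et al. (2000).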
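The boundary-finding idea behind these measures can be illustrated with the standard Jensen-Shannon divergence. The sketch below is a simplification under stated assumptions: it uses single-base composition rather than the codon coarse-grained vectors of the paper, it does not implement the α-KL or α-Jensen-Shannon measures, and the function names and the `margin` parameter are hypothetical.

```python
from math import log2


def shannon_entropy(p):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)


def jensen_shannon(p, q, w1=0.5, w2=0.5):
    """Weighted Jensen-Shannon divergence between distributions p and q."""
    mix = [w1 * a + w2 * b for a, b in zip(p, q)]
    return shannon_entropy(mix) - w1 * shannon_entropy(p) - w2 * shannon_entropy(q)


def composition(seq):
    """Relative frequency of A, C, G, T in a non-empty sequence."""
    return [seq.count(b) / len(seq) for b in "ACGT"]


def best_split(seq, margin=2):
    """Return (position, divergence) of the split that maximizes the
    length-weighted divergence between the left and right segments."""
    n = len(seq)
    best_pos, best_d = None, -1.0
    for i in range(margin, n - margin + 1):
        d = jensen_shannon(
            composition(seq[:i]), composition(seq[i:]), i / n, (n - i) / n
        )
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


position, divergence = best_split("AAAAAAAACCCCCCCC")  # toy sequence
```

The position with the largest divergence is taken as the candidate boundary; a real application would also need a significance threshold before accepting a split.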

7.
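The amino-terminal and carboxy-terminal amino acid sequences of the two capsid proteins of cowpea mosaic virus (VP37 and VP23) were analyzed. The results allow the VP37 and VP23 coding regions to be mapped on the nucleotide sequence of the viral middle-component (M) RNA. The two coding regions are adjacent, and the proteolytic cleavage sites at which VP37 and VP23 are released from the primary translation product of M RNA are the dipeptide sequences glutamine-methionine and glutamine-glycine, respectively.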

8.
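Wu Fang, Liu Yinghua, Liu Lin, Deng Guangbing, Yu Maoqun, Chen Xiao. 《遗传》 (Hereditas (Beijing)), 2007, 29(11): 1399-1404
To analyze the effect of LMW-GS genes on dough strength, the F9 generations of two recombinant inbred lines, 99G45/京771 and Pm97034/京771, were used to analyze LMW-GS gene-specific loci and the tightly linked Gli-1 locus, and to study sequence differences in the core coding region of alleles at the Glu-B3 locus that differ significantly in their effect on dough strength. The results show that the LMW-GS core coding regions of all three parents contain 6 cysteine residues, but PB lacks one 7-amino-acid repeat unit relative to GB and JB, and amino acid substitutions occur in the different sequences; 2 of these substitutions may affect the hydrophilicity of the amino acid sequence and thereby dough strength.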

9.
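A 22 kb EcoRI fragment carrying the chloroplast gene psbA of foxtail millet (Setaria italica) was cloned. The 5'-end non-coding region of the gene lies on this fragment. Sequence analysis shows that this region contains a promoter structure similar to that of prokaryotic genes: the "-10" sequence is TATACT, which differs from the prokaryotic one by only one nucleotide, and the "-35" sequence is TTGACA, identical to the prokaryotic one. In addition, between the "-10" and "-35" regions there is a conserved "TATATA" sequence that resembles the promoter structure of eukaryotic nuclear genes. The promoter of the foxtail millet psbA gene therefore has features of both prokaryotic and eukaryotic genes. The mRNA leader region of the foxtail millet psbA gene is 87 bp long, identical to that of sorghum. It can be inferred that differences in the psbA mRNA leader region between C3 and C4 plants of the grass family may be of some generality. Computer analysis shows that a small stem-loop structure can form within the psbA mRNA leader region of all 6 plants, and that the extra "CTATTTT" sequence lies exactly within the stem-loop, which causes the differences in stem-loop size among the 6 plants. This small secondary structure may have some influence on the regulation of psbA gene expression.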
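The comparison with prokaryotic promoter elements can be made explicit by counting mismatches against the commonly cited bacterial consensus sequences, TATAAT for the "-10" box and TTGACA for the "-35" box. The sketch below is illustrative only: the helper names `hamming` and `best_match` and the test sequence are hypothetical and are not part of the study.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_match(sequence, consensus):
    """Return (start, mismatches) of the window closest to the consensus."""
    width = len(consensus)
    best_start, best_mismatches = None, width + 1
    for i in range(len(sequence) - width + 1):
        m = hamming(sequence[i:i + width], consensus)
        if m < best_mismatches:
            best_start, best_mismatches = i, m
    return best_start, best_mismatches


# Elements reported in the abstract versus the bacterial consensus.
minus10_mismatches = hamming("TATACT", "TATAAT")
minus35_mismatches = hamming("TTGACA", "TTGACA")
```

With these inputs the "-10" element shows one mismatch and the "-35" element none, which matches the description in the abstract.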

10.
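With the arrival of the post-genomic era, the study of non-coding regions has become a challenge for scientists, and one major direction in this work is the study of regulatory elements. Identifying transcriptional regulatory elements is key to understanding gene transcription mechanisms and expression patterns. This paper gives a fairly comprehensive introduction to gene non-coding regions and regulatory elements, including their functions and roles and the commonly used identification algorithms, introduces commonly used databases, and proposes possible research methods and directions for development.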

11.
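Sequence complexity of exons and introns   Cited by: 1 (self-citations: 0; citations by others: 1)
Two new measures of sequence complexity are introduced and used as indices to analyze and compare the differences in complexity between exons and introns in structural gene sequences.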

12.
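Zhou Xueping, Liu Yong. 《病毒学报》 (Chinese Journal of Virology), 1997, 13(3): 240-246
Primers were synthesized based on the sequence of the U1 strain of tobacco mosaic virus. After cDNA was synthesized by reverse transcription (RT), the coat protein gene and the 3'-end non-coding region of the broad bean strain of tobacco mosaic virus were amplified by PCR and cloned. DNA sequencing showed that the coat protein gene is 480 bases long and encodes 158 amino acids, and that the 3'-end non-coding region is 204 bases long, with 100% homology to the TMV-U1 strain.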

13.
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5' untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to 'pure' UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by 'coding' noise, thus enhancing significantly the prediction of 5' UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3' ends of non-coding exons and 5' non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2-3-fold better compared with NetGene2 and GenScan in 5' UTRs. We also tested the 5' UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR.

14.
Masked and exposed sites in rabbit beta-globin messenger RNA were identified through S1 nuclease mapping of RNase T1 cleavage sites. Sites exposed to this enzyme were compared in deproteinized polysomal RNA and in mRNA in its native configuration in reticulocyte extracts. The analysis showed that most of the 3' non-coding region is well accessible to the enzyme, both in deproteinized RNA and in the cell extract. A possible protecting function for the poly(A) sequence is suggested by the fact that molecules with very short poly(A) segments were cleaved preferentially in this region. The G residues in the 5' non-coding region were inaccessible to RNase T1. A highly sensitive site adjacent to the initiation AUG codon was evident in the deproteinized RNA. This site was far less accessible to the enzyme in the mRNA associated with ribosomes in the cell extract. The first 150 nucleotides in the coding region showed very little susceptibility to digestion by the enzyme, in deproteinized RNA as well as in the cell extracts. Preparations of untreated mRNA showed the occurrence of truncated molecules, apparently generated by cleavage by endogenous nucleases. These cleavages were most prevalent in the two non-coding regions. They occurred at sites containing A-U sequences in the 3' non-coding region, and at sites with different sequences in the 5' non-coding region. Incubation of cell extracts at 37 degrees C did not cause any increase in these endogenous cleavages. It is suggested that they may have been generated in the intact cells, possibly as part of the mRNA degradation process in maturing reticulocytes.

15.
16.
Summary: Coding sequences of eucaryotic nuclear DNA were characterized by an excess of short runs and a deficit of long runs of weak and of strong hydrogen bonding bases; non-coding sequences by a deficit of short runs and an excess of long runs, in the same of purines and of pyrimidines. The conservation of these attributes across DNA sequences coding for proteins of widely different function, across widely different eucaryotic species for the same protein and across related genes that diverged a long time ago and that now show large differences in base and, if coding, amino acid sequence suggested that these attributes have survival value. It was concluded that these attributes constitute probabilistic constraints on the primary structure (base sequence) of both coding and non-coding DNA.

17.
18.
Most of the gene prediction algorithms for prokaryotes are based on Hidden Markov Models or similar machine-learning approaches, which imply the optimization of a high number of parameters. The present paper presents a novel method for the classification of coding and non-coding regions in prokaryotic genomes, based on a suitably defined compression index of a DNA sequence. The main features of this new method are the non-parametric logic and the construction of a dictionary of words extracted from the sequences. These dictionaries can be very useful to perform further analyses on the genomic sequences themselves. The proposed approach has been applied on some prokaryotic complete genomes, obtaining optimal scores of correctly recognized coding and non-coding regions. Several false-positive and false-negative cases have been investigated in detail, which have revealed that this approach can fail in the presence of highly structured coding regions (e.g., genes coding for modular proteins) or quasi-random non-coding regions (e.g., regions hosting non-functional fragments of copies of functional genes; regions hosting promoters or other protein-binding sequences). We perform an overall comparison with other gene-finder software, since at this step we are not interested in building another gene-finder system, but only in exploring the possibility of the suggested approach.
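The abstract does not define its compression index, so the following is only a stand-in that shows the general idea of a dictionary-based, non-parametric score. It assumes an LZ78-style parse, in which each new phrase is the shortest prefix of the remaining sequence not yet in the dictionary, and a hypothetical index equal to the number of phrases divided by the sequence length. Neither the parse nor the index is claimed to be the paper's definition.

```python
def lz78_dictionary(seq):
    """Parse seq into LZ78 phrases and return them in order of appearance."""
    words = set()
    phrases = []
    current = ""
    for ch in seq:
        current += ch
        if current not in words:
            words.add(current)
            phrases.append(current)
            current = ""
    if current:
        phrases.append(current)  # trailing phrase that is already in the dictionary
    return phrases


def compression_index(seq):
    """Hypothetical index: phrases per base (lower means more repetitive)."""
    return len(lz78_dictionary(seq)) / len(seq) if seq else 0.0


repetitive = compression_index("AAAAAA")  # parses as A, AA, AAA
diverse = compression_index("ACGT")       # parses as A, C, G, T
```

A repetitive sequence yields few, long phrases and a low index, while a sequence with little repeated structure yields many short phrases; a classifier could threshold such a score per region.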

19.