Similar literature
 19 similar articles found.
1.
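Using human DNA sequences as raw data, the sequences are numerically encoded. Building on power-spectrum analysis of the encoded DNA sequences, the Hurst exponent is then used to analyze their self-similarity. The results show that long-range correlations do exist in DNA sequences and that they are related to the compositional structure of DNA: the Hurst exponent of its structural groups is larger than that of its functional groups. The degree of long-range correlation is also analyzed in terms of intron content; the results show that, within the same sequence, intron content relates differently to different chemical groups. This indicates that different chemical structures contribute differently to the characteristics of DNA sequences. These conclusions offer a new perspective on DNA sequence analysis from the standpoint of nonlinear methods.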
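
The Hurst analysis described in this abstract can be illustrated with a minimal rescaled-range (R/S) sketch. The abstract does not give the numeric encoding or the estimator used, so the purine/pyrimidine mapping in `ENCODING`, the function names, and the choice of window sizes are all assumptions made for illustration.

```python
import math

# Assumed encoding (not from the abstract): purines +1, pyrimidines -1.
ENCODING = {"A": 1, "G": 1, "C": -1, "T": -1}


def encode(seq):
    """Map a DNA string to a list of +1/-1 values."""
    return [ENCODING[base] for base in seq.upper()]


def rescaled_range(values):
    """Return R/S: range of cumulative mean deviations over the std deviation."""
    n = len(values)
    mean = sum(values) / n
    cumulative = 0.0
    lowest = highest = 0.0
    for v in values:
        cumulative += v - mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return (highest - lowest) / std if std > 0 else 0.0


def hurst_exponent(values, window_sizes):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for w in window_sizes:
        chunks = [values[i:i + w] for i in range(0, len(values) - w + 1, w)]
        rs = [r for r in (rescaled_range(c) for c in chunks) if r > 0]
        if rs:
            xs.append(math.log(w))
            ys.append(math.log(sum(rs) / len(rs)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den
```

An estimate near 0.5 is usually read as no long-range correlation and a value above 0.5 as persistence; a strictly alternating series gives a slope of 0 with this estimator.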

2.
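Based on the principle of error control, the DNA sequences of genetic information, composed of vast numbers of bases, are treated as code sequences that have been encoded and therefore carry certain coding properties. On this basis, the analysis methods for block codes from coding theory are applied to DNA sequences: a (6,3) block code is used to analyze the DNA sequences of three prokaryotes and two eukaryotes, and clear changes in code distance are observed at the start and end of ORFs. This shows that the method offers useful guidance for DNA sequence analysis.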

3.
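In studies of DNA sequence similarity, the commonly used dynamic programming algorithms rely on gap penalty functions that lack a theoretical basis and are therefore subjective, which leads to differing results. This paper proposes a DNA sequence similarity measure based on the DTW (Dynamic Time Warping) distance to address this problem. A graphical representation converts each DNA sequence into a time series, and the DTW distance between the series is then computed as a measure of sequence similarity that characterizes the sequences. The method is used to compare the gene sequences of seven neurotoxins from the East Asian scorpion (Buthus martensi Karsch neurotoxin), which verifies the effectiveness and accuracy of the measure.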
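
The two steps named in this abstract (sequence to time series, then DTW distance) can be sketched as follows. The abstract does not specify the graphical representation, so the cumulative purine/pyrimidine walk in `to_series` is a stand-in assumption; `dtw_distance` is the textbook dynamic-programming form of DTW with an absolute-difference local cost.

```python
def to_series(seq):
    """Assumed mapping: cumulative walk, purine +1 and pyrimidine -1."""
    step = {"A": 1, "G": 1, "C": -1, "T": -1}
    total = 0
    series = []
    for base in seq.upper():
        total += step[base]
        series.append(total)
    return series


def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a_i - b_j| over monotone alignments."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]
```

Because DTW lets one point align with several points of the other series, a repeated value costs nothing, so no gap penalty needs to be chosen.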

4.
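Traditional DNA sequence visualization models are limited to short DNA sequences and lack a general method for analyzing the resulting graphics. This article therefore proposes an image-based DNA sequence visualization model that converts a one-dimensional DNA sequence into a two-dimensional grayscale image with 256 gray levels, which makes it possible to visualize long DNA sequences with high spatial compactness. Mature image-processing methods can then be applied to the image to obtain important information such as the size of the original DNA sequence, the distribution of the four bases, and the degree of disorder. Comparing the images of different DNA sequences yields information on their similarity.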
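
One way to obtain 256 gray levels from four bases is to pack four consecutive bases, at 2 bits each, into one pixel value. The abstract does not say how its model maps bases to pixels, so this packing, the `BITS` table, and the function names are assumptions for illustration only.

```python
# Assumed 2-bit code per base (not from the abstract).
BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_gray_pixels(seq):
    """Pack each group of four bases into one 0-255 gray value."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 4  # trailing 1-3 bases are dropped
    pixels = []
    for i in range(0, usable, 4):
        value = 0
        for base in seq[i:i + 4]:
            value = value * 4 + BITS[base]
        pixels.append(value)
    return pixels


def to_image(seq, width):
    """Arrange the pixel values row by row into a 2D list."""
    pixels = to_gray_pixels(seq)
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]
```

Under this packing a sequence of n bases needs only about n/4 pixels, which is one reading of the "spatial compactness" mentioned above.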

5.
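To address the identification of coding regions in DNA sequences, this study proposes a model combining feature vectors with logistic regression. DNA sequences are first converted numerically into feature vectors, whose elements are extracted using k-mer relative frequencies; a binary logistic regression algorithm then separates coding from non-coding regions. Five-fold cross-validation on the two benchmark datasets HMR195 and BG570 gives average AUC (Area Under Curve) values of 0.9813 and 0.9874, respectively, clearly better than traditional methods such as Bayesian discriminant analysis and VOSSDFT. In addition, the proposed feature vectors have very low dimension, which improves computational efficiency. The combined model can therefore identify protein-coding regions fairly efficiently and accurately.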
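
The k-mer relative-frequency step can be sketched as below. This covers only the plain frequency vector; the abstract's exact low-dimensional feature construction and the logistic regression stage are not reproduced, and the function name `kmer_frequencies` is hypothetical.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows with ambiguous bases such as N are skipped
            counts[word] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

A vector of this kind, one per sequence, is what a binary classifier such as logistic regression would take as input.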

6.
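To study the multifractal properties of genomic sequences in depth, 12 relatively long DNA sequences are first selected and converted into 12 corresponding time series according to their coding/non-coding segments. Multifractal Hurst analysis is then performed on these 12 time series: their Hurst exponents are computed and used to analyze the self-similarity of the sequences. Comparing the resulting Hurst exponents with the one-dimensional DNA walk model shows that all 12 sequences exhibit long-range correlation, indicating that long-range correlations do exist in DNA sequences.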

7.
田靖  赵志虎  陈惠鹏 《遗传》2009,31(11):1067-1076
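Comparative genomics studies have found that about 5% of the human genome is under selective constraint, but coding sequences account for only a small part of this; about 3.5% is conserved non-coding sequence. These conserved non-coding elements have important functions: they may participate in regulating gene expression at different levels, including chromatin conformation (higher-order structure), DNA transcription, and RNA processing, and are associated with mammalian morphogenesis and human disease. This article briefly reviews the identification, function and validation, and origin and evolution of conserved non-coding elements, as well as their relationship to human disease.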

8.
秦丹  徐存拴 《遗传》2013,35(11):1253-1264
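Non-coding DNA sequences are DNA sequences in the genome that do not encode proteins. These sequences can bind regulatory factors, be transcribed into functional RNAs, and regulate physiological activities and pathological processes either alone or cooperatively. Focusing on the regulation of gene expression, this article summarizes recent research on non-coding DNA sequences, gives a preliminary account of their structure, function, and possible mechanisms of action, introduces the computational methods and experimental techniques currently used to identify functional elements in non-coding DNA, and discusses prospects for future research on non-coding DNA.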

9.
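Primers were designed from the complete plastid genome sequence of tobacco, and sweet potato plastid genomic DNA was used as the template for PCR amplification of a sequence containing the complete coding region of the plastid accD gene (GenBank accession number GQ395771). Sequence analysis shows that the fragment is 2209 bp long and includes a 1548 bp accD coding sequence, predicted to encode a protein of 515 amino acids; the protein sequence contains the zinc finger structure conserved in heteromeric β-CT and five motifs at the C-terminus. A restriction map of the DNA fragment was also constructed. Similarity comparison shows that the sweet potato accD gene shares 72%-87% nucleotide similarity and 58%-83% amino acid similarity with the accD genes of soybean, potato, Arabidopsis, ginseng, lettuce, grape, sea island cotton, cabbage, pepper, spinach, tomato, and tobacco.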

10.
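Research progress on DNA sequence encoding   Cited: 1 in total (0 self-citations, 1 by others)
The reliability and accuracy of DNA computing depend closely on the quality of the encoding design. As DNA computing develops and research on it deepens, the demands on encoding keep rising. To ensure ideal biochemical reactions and successful extraction of solutions, a variety of constraints and evaluation models have been proposed. This paper outlines the principles of DNA encoding, reviews current methods of DNA sequence encoding, analyzes the open problems in DNA encoding design, and discusses its future development.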

11.
A fractal method to distinguish coding and non-coding sequences in a complete genome is proposed, based on different statistical behaviors between these two kinds of sequences. We first propose a number sequence representation of DNA sequences. Multifractal analysis is then performed on the measure representation of the obtained number sequence. The three exponents C(-1), C1 and C2 are selected from the result of multifractal analysis. Each DNA sequence may be represented by a point in the three-dimensional space generated by these three-component vectors. It is shown that points corresponding to coding and non-coding sequences in the complete genome of many prokaryotes are roughly distributed in different regions. Fisher's discriminant algorithm can be used to separate these two regions in the spanned space. If the point (C(-1), C1, C2) for a DNA sequence is situated in the region corresponding to coding sequences, the sequence is classified as a coding sequence; otherwise, the sequence is classified as a non-coding one. For all 51 prokaryotes we considered, the average discriminant accuracies pc, pnc, qc and qnc reach 72.28%, 84.65%, 72.53% and 84.18%, respectively.

12.
Coding capacity of complementary DNA strands.   Cited: 7 in total (4 self-citations, 3 by others)
A Fortran computer algorithm has been used to analyze the nucleotide sequence of several structural genes. The analysis performed on both coding and complementary DNA strands shows that whereas open reading frames shorter than 100 codons are randomly distributed on both DNA strands, open reading frames longer than 100 codons ("virtual genes") are significantly more frequent on the complementary DNA strand than on the coding one. These "virtual genes" were further investigated by looking at intron sequences, splicing points, signal sequences and by analyzing gene mutations. On the basis of this analysis, coding and complementary DNA strands of several eukaryotic structural genes cannot be distinguished. In particular we suggest that the complementary DNA strand of the human epsilon-globin gene might indeed code for a protein.
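
The strand comparison in this abstract can be sketched in a few lines. The original Fortran program is not available here, and the abstract does not state how it delimits an open reading frame; this sketch assumes an open frame is simply a run of codons with no stop codon, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_open_frame(seq):
    """Longest stop-free run of codons over the three reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


def compare_strands(seq):
    """Return (longest open frame on given strand, on complementary strand)."""
    return longest_open_frame(seq), longest_open_frame(reverse_complement(seq))
```

For example, `compare_strands("ATGAAATAA")` returns `(2, 3)`: the given strand is interrupted by `TAA`, while its reverse complement `TTATTTCAT` has no stop in frame 0.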

13.
DNA sequence representation without degeneracy   Cited: 2 in total (0 self-citations, 2 by others)
Yau SS  Wang J  Niknejad A  Lu C  Jin N  Ho YK 《Nucleic acids research》2003,31(12):3078-3080
Graphical representation of DNA sequence provides a simple way of viewing, sorting and comparing various gene structures. A new two-dimensional graphical representation method using a two-quadrant Cartesian coordinate system has been derived for mathematical denotation of DNA sequence. The two-dimensional graphical representation resolves sequences’ degeneracy and is mathematically proven to eliminate circuit formation. Given the x-projection and y-projection of any point on the graphical representation, the number of A, G, C and T from the beginning of the sequence to that point can be found. Compared with previous methods, this graphical representation is more in line with the conventional recognition of linear sequences by molecular biologists, and also provides a metaphor in two dimensions for local and global DNA sequence comparison.

14.
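Based on a 3D graphical representation of DNA sequences, each sequence is characterized by a three-dimensional vector formed from the normalized leading eigenvalues of its L/L matrices. Using this method, the similarity of 11 species is analyzed on the first exon of the β-globin gene.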

15.
We introduce a novel 2D graphical representation of DNA sequences based on the pairs of the neighboring nucleotides (PNNs). Then we get the PNNs' distributions and obtain a y-M. The construction of the PNN-curve has some important advantages: (1) it avoids loss of information, and the PNN-curve standing for DNA sequences does not overlap or intersect with itself; (2) the novel 2D representation is more sensitive. The utility of this method can be illustrated by the examination of similarities/dissimilarities among the coding sequences of the first exon of beta-globin gene of eleven different species in Table 2.

16.
New 3D graphical representation of DNA sequence based on dual nucleotides   Cited: 2 in total (2 self-citations, 0 by others)
We introduce a 3D graphical representation of DNA sequences based on the pairs of dual nucleotides (DNs). Based on this representation, we consider some mathematical invariants and construct two 16-component vectors associated with these invariants. The vectors are used to characterize and compare the complete coding sequence part of beta globin gene of nine different species. The examination of similarities/dissimilarities illustrates the utility of the approach.

17.
We consider a novel 2-D graphical representation of DNA sequences according to chemical structures of bases, reflecting distribution of bases with different chemical structure, preserving information on sequential adjacency of bases, and allowing numerical characterization. The representation avoids loss of information accompanying alternative 2-D representations in which the curve standing for DNA overlaps and intersects itself. Based on this representation we present a numerical characterization approach by the leading eigenvalues of the matrices associated with the DNA sequences. The utility of the approach is illustrated on the coding sequences of the first exon of human beta-globin gene.

18.
张闻  明洪  龙莉  陈元晓 《遗传》2001,23(6):511-514
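By analyzing all 95 complete HIV-1 genome sequences in GenBank and designing fusion-peptide "probe sequences", the coding DNA sequences of the 95 fusion peptides obtained were translated, aligned, and analyzed. The "dominant sequences" and mutation distributions of the fusion peptide and its coding sequence were obtained.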

19.
CRITICA: coding region identification tool invoking comparative analysis.   Cited: 34 in total (0 self-citations, 34 by others)
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
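
The dicodon-bias idea in this abstract (hexanucleotide frequencies in coding frames versus other contexts) can be illustrated with a simple log-odds score. This is not CRITICA's code or its iterative procedure: the function names, the add-one pseudocount, and the use of all overlapping hexamers as the "other contexts" background are assumptions for this sketch.

```python
import math
from collections import Counter


def hexamer_counts(seqs, step):
    """Count hexamers; step=3 reads in-frame dicodons, step=1 reads all."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 5, step):
            counts[s[i:i + 6]] += 1
    return counts


def dicodon_score(seq, coding, background, pseudo=1.0):
    """Sum of log(P_coding / P_background) over in-frame hexamers of seq."""
    # The pseudocount is spread over all 4**6 = 4096 possible hexamers.
    n_coding = sum(coding.values()) + pseudo * 4096
    n_background = sum(background.values()) + pseudo * 4096
    seq = seq.upper()
    score = 0.0
    for i in range(0, len(seq) - 5, 3):
        hexamer = seq[i:i + 6]
        p_coding = (coding[hexamer] + pseudo) / n_coding
        p_background = (background[hexamer] + pseudo) / n_background
        score += math.log(p_coding / p_background)
    return score
```

A positive score means the in-frame hexamers of the candidate are more typical of the coding training set than of the background; `coding` and `background` are expected to be `Counter` objects so that unseen hexamers count as zero.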
