Similar articles
1.
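Operation rules for the high-dimensional-space digital encoding of DNA sequences   Total citations: 1 (self-citations: 0, citations by others: 1)
Beyond encoding properties of a DNA sequence such as base structure, functional groups, base complementarity, and hydrogen-bond strength, the high-dimensional-space binary digital encoding of DNA sequences also lends itself to mathematical and logical operations. The operation rules are as follows: (1) From the parity properties of the numeric code of a DNA sequence, its correspondence with the terminal base can be derived: when the value of a DNA sequence S is X(S) = 4n, 4n+1, 4n+2, or 4n+3, the terminal base is C, T, A, or G, respectively (n = 0, 1, 2, …). (2) The concepts of apparent dimension Nv, numerical dimension Nx, and difference dimension Nd of the high-dimensional space of a DNA sequence are introduced. When Nd = 0, the leading base is A or G; when Nd = 2n or 2n+1 (n = 1, 2, …), the leading bases are (C)^n or (C)^nT. (3) Operation rules for point mutations (single nucleotide polymorphisms, SNPs) are derived. (4) Operation rules for tandem repeats are derived. (5) The concept of a DNA subsequence is introduced, and its digital value Xi and location value Qi are defined together with formulas for computing them. (6) Operation rules are derived for extension, removal, deletion, insertion, translocation, transposition, and substitution of DNA sequences. (7) Bitwise addition yields the Hamming distance dh, the base distance dh′, the group distance dh″, and the conjugate distance dG of DNA sequences, along with the meaning of and relationships among these distances. (8) The analysis shows that digital encoding of DNA sequences has clear advantages over conventional character encoding for mathematical operations.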
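Rules (1) and (7) can be illustrated with a minimal Python sketch. It assumes a 2-bit code per base (C=00, T=01, A=10, G=11) read as a base-4 number, chosen only because it reproduces the mod-4 rule stated in the abstract; the paper's own code table is not shown there. The function names are hypothetical, and mapping `bit_distance` and `base_distance` onto the paper's dh and dh′ is a reading of the abstract, not something it confirms.

```python
# Assumed 2-bit code per base (C=00, T=01, A=10, G=11), chosen only because it
# reproduces the abstract's rule X(S) mod 4 -> C, T, A, G; the paper's own
# code table is not given in the abstract.
CODE = {"C": 0, "T": 1, "A": 2, "G": 3}
BASES = "CTAG"  # BASES[d] is the base whose digit is d


def encode(seq):
    """Return X(S): the sequence read as a base-4 number, first base most significant."""
    x = 0
    for base in seq:
        x = x * 4 + CODE[base]
    return x


def last_base(x):
    """Recover the terminal base from the numeric value alone (rule 1)."""
    return BASES[x % 4]


def bit_distance(s, t):
    """Count differing bits of two equal-length sequences via XOR (bitwise addition mod 2)."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    return bin(encode(s) ^ encode(t)).count("1")


def base_distance(s, t):
    """Count positions whose bases differ by scanning the XOR two bits at a time."""
    if len(s) != len(t):
        raise ValueError("sequences must have equal length")
    diff = encode(s) ^ encode(t)
    count = 0
    while diff:
        if diff & 3:  # any differing bit inside this 2-bit base slot
            count += 1
        diff >>= 2
    return count
```

Under these assumptions, `encode("ACGT")` is 141 and `last_base(141)` is `"T"`, in line with the 4n+1 case of rule (1).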

2.
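As a kind of phylogenetic footprint, conserved noncoding DNA sequences in genomes have attracted great attention. Because conserved noncoding DNA sequences are likely to interact with transcription factors or specific proteins and to participate directly in important processes such as regulating gene expression or stabilizing chromosome structure, they may well become the next wave of genome research. Building on a summary of how the understanding of conserved noncoding DNA sequences has developed, this review describes in detail the models for the formation and evolution of these sequences and the underlying molecular-biological mechanisms, and looks ahead to their applications in biological research.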

3.
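An alignment-free analysis based on the K-tuple distribution of DNA sequences   Total citations: 1 (self-citations: 0, citations by others: 1)
Shen Juan, Wu Wenwu, Xie Xiaoli, Guo Mancai, Yuan Zhifa. 《遗传》 (Hereditas, Beijing), 2010, 32(6): 606-612
Based on the distribution of K-tuples in a genome, this article presents an alignment-free method for estimating the degree of difference between biological sequences. The method can measure how a real DNA sequence differs from its randomly shuffled counterparts in K-tuple distribution. When the method was used to build a phylogenetic tree from the complete mitochondrial genomes of 26 placental mammals, the tree's classification agreed increasingly well with the biologically accepted consensus as K increased. The results indicate that phylogenetic trees built with this method are more reasonable than those built with other alignment-free methods.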
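The comparison of a real sequence with its shuffled versions can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's actual statistic, so a plain Euclidean distance between K-tuple frequency vectors is swapped in, and all function names and the `n_shuffles` and `seed` parameters are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def ktuple_freqs(seq, k):
    """Relative frequency of every overlapping K-tuple in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def freq_distance(p, q):
    """Euclidean distance between two frequency dictionaries (an assumed metric)."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(t, 0.0) - q.get(t, 0.0)) ** 2 for t in keys))


def distance_to_shuffles(seq, k, n_shuffles=20, seed=0):
    """Mean distance between seq and random rearrangements of its own bases."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    real = ktuple_freqs(seq, k)
    total = 0.0
    for _ in range(n_shuffles):
        bases = list(seq)
        rng.shuffle(bases)
        total += freq_distance(real, ktuple_freqs("".join(bases), k))
    return total / n_shuffles
```

A sequence with strong K-tuple structure should sit farther from its shuffles than a sequence that already looks random; a homopolymer such as `"AAAAAAAA"` gives exactly 0.0 because every shuffle is identical to it.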

4.
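Using 31 genome sequences of human adenovirus subspecies B and 39 genome sequences of subspecies D as material, this study used Imperfect Microsatellite Extractor and DNAMAN to systematically analyze and compare the distribution of simple sequence repeats (SSRs) in these genomes. The results show that the mean relative density of SSRs is very similar in the genomes of subspecies B and D, but that the distribution differs among SSR types. Dinucleotide SSRs are markedly more abundant in subspecies D than in subspecies B; among mononucleotide SSRs, (A)n and (T)n are relatively abundant in both subspecies, while among dinucleotide SSRs, (CG/GC)n shows a strong preference in both. In within-subspecies multiple sequence alignments, subspecies D showed greater stability. This subspecies-specific distribution of SSRs in B and D may be related to their evolutionary mechanisms and pathogenicity.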
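As a rough illustration of what an SSR scan involves, the sketch below finds perfect mononucleotide or dinucleotide repeats with a regular expression. It is not the algorithm of Imperfect Microsatellite Extractor, which also handles imperfect repeats; the function name and the `min_repeats` threshold are assumptions made for the example.

```python
import re


def find_perfect_ssrs(seq, unit_len, min_repeats):
    """Return (start, unit, repeat_count) for each perfect tandem repeat in seq.

    Only perfect repeats are found; imperfect microsatellites are out of scope
    for this sketch.
    """
    # A unit of unit_len bases followed by at least min_repeats - 1 more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        # Skip units such as "AA" that are really mononucleotide repeats.
        if unit_len > 1 and len(set(unit)) == 1:
            continue
        hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return hits
```

For instance, `find_perfect_ssrs("TTAAAAAAGG", 1, 6)` reports the (A)6 tract starting at position 2.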

5.
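Using the DNA sequence of human chromosome 1 as a sample, the occurrence frequencies of 8-mers were computed separately for five classes of sequence (CDS, 5'UTR, 3'UTR, introns, and intergenic regions), and the distribution of the relative number of 8-mer motifs as a function of frequency was obtained. Intron and intergenic sequences show a clear three-peak distribution, CDS shows a single-peak distribution, and 5'UTR and 3'UTR show approximately single-peak distributions. To uncover why these distributions differ, the set of 8-mers was partitioned according to whether an 8-mer contains two or more copies of a given dinucleotide (XY2), exactly one copy (XY1), or none (XY0). Of the 16 possible partitions, only the CGj partition yields three subsets that each form an independent single-peak distribution. This indicates that DNA sequences are composed of three classes of CGj subsets whose frequencies are the result of independent evolution; that the observed 8-mer frequency distribution is the superposition of the three CGj frequency distributions; and that the differences among the 8-mer distributions of the five sequence classes arise because the distances between these three distributions differ. An analysis of the relative frequencies of dinucleotides and trinucleotides within the three CGj subsets of the five sequence classes shows that the relative frequencies of CG2 motifs are essentially the same across the five classes, that those of CG1 motifs clearly separate the five classes, and that those of CG0 motifs clearly separate coding from noncoding sequences. In short, the CGj motif sets follow specific regularities in the composition of DNA sequences and play an important role in DNA sequence evolution.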
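The XY0 / XY1 / XY2 partition described above can be sketched in a few lines of Python. The function names are hypothetical, and counting overlapping occurrences of the dinucleotide is an assumption (for CG itself, occurrences cannot overlap, so the choice does not matter there).

```python
from collections import Counter


def dinucleotide_class(kmer, dinuc="CG"):
    """Return 0, 1, or 2: the number of dinuc occurrences in kmer, capped at 2."""
    n = sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == dinuc)
    return min(n, 2)


def kmer_counts_by_class(seq, k=8, dinuc="CG"):
    """Count every k-mer in seq, split into the three dinucleotide classes."""
    classes = {0: Counter(), 1: Counter(), 2: Counter()}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        classes[dinucleotide_class(kmer, dinuc)][kmer] += 1
    return classes
```

Plotting a histogram of the counts in each of the three returned `Counter` objects would give the per-class frequency distributions whose superposition the abstract discusses.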

6.
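Cloning and sequence analysis of the genomic DNA of the alkaline lipase from Penicillium expansum PF898   Total citations: 6 (self-citations: 0, citations by others: 6)
Penicillium expansum PF898 produces an alkaline lipase (PEL) of considerable industrial value. After the complete cDNA sequence of PEL had been obtained by 3′ RACE and 5′ RACE, the complete genomic DNA sequence of this lipase was cloned by PCR for the first time (GenBank accession number AF330635). The lipase DNA is 1404 bp long and comprises the PEL coding region, the 3′ untranslated region, and part of the 5′ untranslated region. The coding-region DNA consists of 1135 bases and contains five introns of 58 bp, 47 bp, 50 bp, 56 bp, and 69 bp. Among the filamentous-fungal lipases reported so far, the PEL gene has the most introns, and, like the introns of other filamentous-fungal lipase genes, they are all small introns of only a few dozen bases. The PEL DNA sequence obtained by PCR amplification also includes a 195-base 3′ noncoding region and 74 bases of the partial 5′ noncoding region. In the full-length PEL DNA sequence, positions -24 to -27 nt form a TATA box, an AATAAA sequence appears 156 nt downstream of the stop codon TGA, and the poly(A) tail begins at position 182 downstream of TGA, which is a typical eukaryotic gene structure. Homology analysis shows that the genomic DNA sequence of PEL is about 39% to 49% homologous to those of lipases from other fungi, and that sequence homology among the PEL introns, or between PEL introns and the introns of other filamentous-fungal lipase genes, is about 42% to 57%.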

7.
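In molecular systematics, choosing the target gene or gene fragment is the most critical step: because evolutionary rates differ, different DNA sequences suit phylogenetic studies at different taxonomic ranks. This paper reviews the DNA sequences commonly used in current molecular phylogenetic studies of ferns, which come from the chloroplast, nuclear, and mitochondrial genomes, with emphasis on the application of chloroplast genes in fern molecular systematics. It also briefly introduces common problems in molecular systematics and their solutions (such as the choice of ingroups and outgroups and strategies for selecting suitable DNA fragments), summarizes the progress and current state of molecular systematic research on ferns, and considers current international trends in fern molecular systematics.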

8.
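[Objective] To explore evolutionary trends in the chloroplast genomes of Elaeagnaceae and thereby provide a theoretical basis for species identification and for the development and use of Elaeagnaceae plant resources. [Methods] The chloroplast genomes of four taxa from Hippophae and Shepherdia were assembled de novo and annotated. Together with published chloroplast genome sequences, they were used to compare the gene content, repeat sequences, and structural features of chloroplast genomes across Elaeagnaceae taxa, to build a phylogenetic tree, and to locate potential DNA barcode regions in the family's chloroplast genomes through highly divergent regions. [Results] The chloroplast genomes of the Elaeagnaceae genera are highly similar in quadripartite structure, gene number, and gene order; compared with Elaeagnus, the inverted repeat (IR) regions of Hippophae and Shepherdia tend to be expanded, and the number of repeat sequences across the whole genome tends to be higher. In the phylogenetic tree based on the complete chloroplast genome sequences of 18 Elaeagnaceae taxa, Elaeagnus, Hippophae, and Shepherdia each form a clade; Elaeagnus diverged first, and Hippophae and Shepherdia share a most recent common ancestor. Three candidate DNA barcode regions were screened from the large single-copy (LSC) and small single-copy (SSC) regions, of which the ycf1 gene performed best for identification; the phylogenetic relationships reconstructed from it agree with those based on whole-genome sequences. [Conclusions] The chloroplast genome structure of Elaeagnaceae is conserved, but noncoding sequences differ markedly among genera, and the IR regions and repeat sequences tend, respectively, to expand and to increase in number during evolution. The DNA barcode sequences selected in this study distinguish well among the genera of Elaeagnaceae and among species within Elaeagnus.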

9.
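Application of DNA sequences in the molecular systematics of ferns   Total citations: 1 (self-citations: 0, citations by others: 1)
Liu Hongmei, Zhang Xianchun, Zeng Hui. 《植物学报》 (Chinese Bulletin of Botany), 2009, 44(2): 143-158
In molecular systematics, choosing the target gene or gene fragment is the most critical step: because evolutionary rates differ, different DNA sequences suit phylogenetic studies at different taxonomic ranks. This paper reviews the DNA sequences commonly used in current molecular phylogenetic studies of ferns, which come from the chloroplast, nuclear, and mitochondrial genomes, with emphasis on the application of chloroplast genes in fern molecular systematics. It also briefly introduces common problems in molecular systematics and their solutions (such as the choice of ingroups and outgroups and strategies for selecting suitable DNA fragments), summarizes the progress and current state of molecular systematic research on ferns, and considers current international trends in fern molecular systematics.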

10.
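To complete the symmetry theory of DNA sequences, this paper extends the order-12 DNA group (D group) to the order-24 DNA full symmetry group (Dd group). The DNA full symmetry group is defined as a special permutation group (S4) whose permuted elements are the four bases of a DNA sequence. The DNA group is isomorphic to the tetrahedral group (T group), the DNA full symmetry group is isomorphic to the full symmetry group of the regular tetrahedron (Td group), and the D group is a subgroup of the Dd group. The paper also derives the matrix table for the 12 new elements of the Dd group and the multiplication table of the Dd group, and obtains the transformation table of the four bases A, C, G, and T under the operations of the Dd group.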
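The group sizes quoted above can be checked with a short Python sketch that enumerates all permutations of the four bases. Two hedges apply: the representation of a group element as a tuple of indices is an illustrative choice, not the paper's matrix representation, and identifying the order-12 D group with the even permutations rests on the general fact that the alternating group A4 is the only order-12 subgroup of S4, not on anything stated in the abstract.

```python
from itertools import permutations

BASES = "ACGT"


def parity(perm):
    """Return 0 for an even permutation and 1 for an odd one (inversion count mod 2)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return inversions % 2


def compose(p, q):
    """Return the permutation that applies q first and then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def apply_to_sequence(perm, seq):
    """Relabel every base of seq: BASES[i] becomes BASES[perm[i]]."""
    mapping = {BASES[i]: BASES[perm[i]] for i in range(len(BASES))}
    return "".join(mapping[base] for base in seq)


# All 24 permutations of the four bases (S4) and its 12 even permutations (A4).
full_group = list(permutations(range(4)))
even_subgroup = [p for p in full_group if parity(p) == 0]
```

In this representation the base-complement map (A with T, C with G) is the tuple `(3, 2, 1, 0)`, which is an even permutation and therefore lies in the order-12 subgroup.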

11.
Genomics, 2019, 111(6): 1574-1582
Given the vast amount of genomic data, alignment-free sequence comparison methods are required due to their low computational complexity. k-mer based methods can improve comparison accuracy by extracting an effective feature of the genome sequences. The aim of this paper is to extract k-mer intervals of a sequence as a feature of a genome for high comparison accuracy. In the proposed method, we calculated the distance between genome sequences by comparing the distribution of k-mer intervals. Then, we identified the classification results using phylogenetic trees. We used viral, mitochondrial (MT), microbial and mammalian genome sequences to perform classification for various genome sets. We confirmed that the proposed method provides a better classification result than other k-mer based methods. Furthermore, the proposed method could efficiently be applied to long sequences such as human and mouse genomes.
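One possible reading of "k-mer intervals" is the gap between consecutive occurrences of the same k-mer. The Python sketch below follows that reading; pooling the intervals of all k-mers into a single distribution and comparing distributions with an L1 distance are both assumptions, since the abstract does not define the feature or the distance, and the function names are hypothetical.

```python
from collections import Counter


def kmer_interval_distribution(seq, k):
    """Relative frequency of gaps between consecutive occurrences of the same k-mer."""
    last_seen = {}        # k-mer -> start position of its most recent occurrence
    intervals = Counter() # gap length -> how many times it was observed
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            intervals[i - last_seen[kmer]] += 1
        last_seen[kmer] = i
    total = sum(intervals.values())
    if total == 0:
        return {}
    return {gap: count / total for gap, count in intervals.items()}


def interval_distance(p, q):
    """L1 distance between two interval distributions (an assumed metric)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(gap, 0.0) - q.get(gap, 0.0)) for gap in keys)
```

A pairwise matrix of `interval_distance` values over a set of genomes could then be passed to any standard tree-building routine.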

12.
13.
The distribution of the time T for the occurrence of the rth event is derived in a linear function Poisson process, which belongs to a recently introduced family of distributions (JANARDAN and RAO, 1981), called Lagrangian distributions of the second kind (LD2). In analogy with the ordinary Poisson process, the distribution of the random variable T has been termed the Lagrangian Gamma distribution of the second kind (LG2); several of its properties and special cases are studied. The frequency function is graphed for some parameter values.

14.
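Based on catch data from eight survey cruises in the coastal waters of the northern South China Sea from 2014 to 2017, statistical methods were used to analyze the distribution characteristics of fishery resource density in this area and to identify suitable probability distribution types, from which the regional mean resource density was estimated. The results show that the coefficient of variation (CV) of resource density ranged from 0.67 to 1.03 across periods, indicating a highly uneven spatial distribution of fishery resource density; the frequency distribution of catch resource density was clearly right-skewed and dominated overall by densities of 0 to 1000 kg·km⁻². One-sample Kolmogorov-Smirnov tests indicated that the lognormal, gamma, and Weibull distributions are suitable distribution types for resource density in this region. For estimates of the mean resource density, the lognormal result did not differ significantly from those of the other two distributions, whereas the gamma and Weibull estimates differed significantly from each other. Compared with the 1960s and 1970s, the suitable probability distribution of fishery resource density in this area has changed from a single type to multiple types, mainly owing to changes in the proportion of low catches caused by fishery resource structure, fishing intensity, and climate change.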

15.
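The distribution of Euglenophyta in the Jianghan Plain has the following characteristics: they are widely distributed, with a high occurrence probability of 96.4%; analysis of variance, used to test how far the species richness of Euglenophyta in plankton samples from each collection site differs across the plain, found no significant regional difference; the frequency distribution of Euglenophyta species richness in water bodies approximates a normal curve, reflecting relatively high euglenoid species richness in most water bodies; and the number of species sharing the same occurrence frequency shows a clear power-function relationship with that frequency, indicating that only a few very common Euglenophyta species are widely distributed in the region.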

16.
Wei C, Wang G, Chen X, Huang H, Liu B, Xu Y, Li F. PLoS ONE, 2011, 6(10): e26296
Identification and typing of human enteroviruses (HEVs) are important to pathogen detection and therapy. Previous phylogeny-based typing methods are mainly based on multiple sequence alignments of specific genes in the HEVs, but the results are not stable with respect to different choices of genes. Here we report a novel method for identification and typing of HEVs based on information derived from their whole genomes. Specifically, we calculate the k-mer based barcode image for each genome, HEV or other human viruses, for a fixed k, 1…

17.
Two types of distributions for the frequencies of occurrence of amino acids in each position of hypervariable regions CDR-1 and CDR-2 were obtained for 2,000 immunoglobulins. The results show that some positions fit an inverse power-law distribution, while others fit an exponential-type distribution. As a result of comparison with structural data in the literature it is proposed that sites in which the frequency distribution fits the inverse power law are critical to maintaining canonical shapes of the recognition regions or are involved in modulating these canonical conformations, while those sites where the distribution fits the exponential law are those which should be exclusively involved in the recognition mechanism. Correspondence to: F. Lara-Ochoa

18.
Ohta S, Morishita M. Hereditas, 2001, 135(2-3): 101-110
To elucidate the genome relationships in the genus Dasypyrum and the ancestry of tetraploid D. breviaristatum, two cytotypes of D. breviaristatum and D. villosum were reciprocally crossed with one another. Chromosome pairing at the first metaphase of meiosis and fertility were examined in the F1 hybrids and the parental plants. The mean pairing configuration and mean arm pairing frequency in D. villosum-D. breviaristatum (2x) hybrids were 11.12I + 1.44II per cell and 0.107, respectively, and they were almost completely sterile. In the D. breviaristatum (4x)-D. breviaristatum (2x) hybrid, up to seven trivalents were formed, and the mean pairing configuration was 3.38I + 3.20II + 3.74III + 0.005IV per cell. The mean arm pairing frequency and relative affinity calculated in that F1 hybrid were 0.915 and 0.641, respectively. Seven bivalents and seven univalents were characteristically formed in D. villosum-D. breviaristatum (4x) hybrids. Based on the present results, we concluded that the genome of diploid D. breviaristatum is distantly related to the genome V of D. villosum, and that these two species have different basic genomes. We, therefore, proposed the symbol Vb for the haploid genome of the diploid cytotype of D. breviaristatum. Moreover, we concluded that tetraploid D. breviaristatum is an autotetraploid with doubled sets of the genomes homologous with that of diploid D. breviaristatum, and we proposed the genome constitution VbVb for the haploid genome set of the tetraploid cytotype of D. breviaristatum. Furthermore, from the chromosome pairing in the F1 hybrids involving Moroccan and Greek accessions, it was suggested that complicated rearrangements of chromosome structure have occurred in tetraploid D. breviaristatum in its natural populations across the entire distribution area.

19.
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.
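To make the Count-Min Sketch idea concrete, here is a minimal, self-contained Python sketch of the data structure applied to k-mer counting. It is not khmer's API or implementation: the class name, the `width` and `depth` defaults, and the use of salted BLAKE2b hashes are all assumptions chosen for illustration.

```python
import hashlib


class CountMinSketch:
    """Toy Count-Min Sketch: counts may be overestimated but never underestimated."""

    def __init__(self, width=1024, depth=4):
        self.width = width  # counters per row
        self.depth = depth  # number of independent hash rows
        self.tables = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # A differently salted hash per row stands in for independent hash functions.
        digest = hashlib.blake2b(
            item.encode("ascii"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        """Increment the item's counter in every row."""
        for row in range(self.depth):
            self.tables[row][self._index(item, row)] += 1

    def count(self, item):
        """Return the minimum over rows; collisions can only inflate this value."""
        return min(self.tables[row][self._index(item, row)] for row in range(self.depth))


def count_kmers(seq, k, sketch):
    """Stream every overlapping k-mer of seq into the sketch."""
    for i in range(len(seq) - k + 1):
        sketch.add(seq[i:i + k])
```

Only the counters are stored, never the k-mers themselves, which is why memory use is fixed by `width * depth` and why the reported count for a k-mer is an upper bound on its true count.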

20.
Genomics, 2019, 111(4): 966-972
Recombination hotspots in a genome are unevenly distributed. Hotspots are regions in a genome that show higher rates of meiotic recombinations. Computational methods for recombination hotspot prediction often use sophisticated features that are derived from physico-chemical or structure based properties of nucleotides. In this paper, we propose iRSpot-SF that uses sequence based features which are computationally cheap to generate. Four feature groups are used in our method: k-mer composition, gapped k-mer composition, TF-IDF of k-mers and reverse complement k-mer composition. We have used recursive feature elimination to select 17 top features for hotspot prediction. Our analysis shows the superiority of gapped k-mer composition and reverse complement k-mer composition features over others. We have used SVM with RBF kernel as a classification algorithm. We have tested our algorithm on standard benchmark datasets. Compared to other methods iRSpot-SF is able to produce significantly better results in terms of accuracy, Matthews correlation coefficient and sensitivity, which are 84.58%, 0.6941 and 84.57%. We have made our method readily available to use as a Python-based tool and made the datasets and source codes available at: https://github.com/abdlmaruf/iRSpot-SF. A web application is developed based on iRSpot-SF and freely available to use at: http://irspot.pythonanywhere.com/server.html.
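Two of the feature groups named above, reverse complement k-mer composition and gapped k-mer composition, can be sketched in plain Python as follows. This is not the iRSpot-SF code: the function names are hypothetical, and the specific definitions used here (merging each k-mer with its reverse complement under the lexicographically smaller label, and treating a gapped k-mer as a pair of bases separated by `gap` positions) are common conventions assumed for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def _normalize(counts):
    """Turn raw counts into relative frequencies (empty input gives an empty dict)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def revcomp_kmer_composition(seq, k):
    """k-mer composition where a k-mer and its reverse complement share one feature."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[min(kmer, reverse_complement(kmer))] += 1
    return _normalize(counts)


def gapped_pair_composition(seq, gap):
    """Composition of base pairs separated by exactly `gap` intervening positions."""
    counts = Counter(seq[i] + seq[i + gap + 1] for i in range(len(seq) - gap - 1))
    return _normalize(counts)
```

Feature dictionaries like these would typically be flattened into fixed-order vectors before being passed to feature selection and a classifier such as an SVM.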
