Similar documents
1.
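As a footprint of phylogenetic evolution, conserved non-coding DNA sequences in genomes have attracted great attention. Because they are likely to interact with transcription factors or specific proteins, conserved non-coding DNA sequences may take part directly in important biological processes such as regulating gene expression or stabilizing chromosome structure, and they are therefore likely to become the next new wave of genome research. Building on a summary of how our understanding of conserved non-coding DNA sequences has developed, this review details models of their formation and evolution and the underlying molecular mechanisms, and discusses prospects for applying conserved non-coding DNA sequences in biological research.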

2.
An alignment-free analysis based on the K-tuple distribution of DNA sequences   Cited by: 1 (self: 0, others: 1)
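Based on the K-tuple distribution of genomes, this article presents an alignment-free method for estimating how much biological sequences differ. The method measures the difference in K-tuple distribution between a real DNA sequence and randomly shuffled versions of it. When the method was used to build a phylogenetic tree of the complete mitochondrial genomes of 26 placental mammals, the tree agreed more and more closely with the generally accepted biological classification as K increased. The results indicate that phylogenetic trees built with this method are more reasonable than those built with other alignment-free methods.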
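The abstract does not give the method's exact formula. As a rough illustration only, the Python sketch below assumes that each sequence is represented by the difference between its K-tuple frequencies and the mean frequencies of several randomly shuffled copies, and that two sequences are compared by the Euclidean distance between these difference vectors. The function names, the number of shuffles and the distance choice are all hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product


def ktuple_freqs(seq: str, k: int) -> list:
    """Relative frequency of every possible K-tuple over ACGT, in a fixed order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts["".join(t)] / total for t in product("ACGT", repeat=k)]


def shuffle_deviation(seq: str, k: int, n_shuffles: int = 10, seed: int = 0) -> list:
    """Observed K-tuple frequencies minus their mean over shuffled copies (hypothetical feature)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = ktuple_freqs(seq, k)
    mean = [0.0] * len(observed)
    letters = list(seq)
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # random permutation preserving base composition
        for i, f in enumerate(ktuple_freqs("".join(letters), k)):
            mean[i] += f / n_shuffles
    return [o - m for o, m in zip(observed, mean)]


def ktuple_distance(s1: str, s2: str, k: int) -> float:
    """Euclidean distance between the two sequences' deviation vectors."""
    a, b = shuffle_deviation(s1, k), shuffle_deviation(s2, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Because shuffling preserves base composition, the deviation vector in this sketch is zero (up to floating-point rounding) for K = 1, so only K ≥ 2 carries information beyond composition. A pairwise distance matrix built this way could then be passed to any tree-building method such as neighbor joining.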

3.
Operation rules for high-dimensional digital coding of DNA sequences   Cited by: 1 (self: 0, others: 1)
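Binary digital coding of DNA sequences in a high-dimensional space can encode properties such as base structure, functional groups, base complementarity and hydrogen-bond strength, and it also makes mathematical and logical operations convenient. The operation rules for this coding are as follows. (1) From the parity properties of a DNA sequence's numeric code, its correspondence with the last base can be derived: when the value of a DNA sequence S is X(S) = 4n, 4n+1, 4n+2 or 4n+3, its last base is C, T, A or G, respectively (n = 0, 1, 2, …). (2) The concepts of apparent dimension Nv, numerical dimension Nx and difference dimension Nd of a DNA sequence in high-dimensional space are introduced. When Nd = 0, the first base is A or G; when Nd = 2n or 2n+1 (n = 1, 2, …), the sequence begins with (C)^n or (C)^nT, respectively. (3) Operation rules for point mutations (single-nucleotide polymorphisms, SNPs) are derived. (4) Operation rules for tandem repeats are derived. (5) The concept of a DNA subsequence is introduced, and its digital value Xi and location value Qi are defined together with formulas for computing them. (6) Operation rules are derived for the elongation, deletion, deficiency, insertion, transposition, translocation and substitution of DNA sequences. (7) Bitwise addition is used to obtain the Hamming distance dh, base distance dh′, group distance dh″ and conjugate distance dG of DNA sequences, and the meanings of and relationships among these distances are discussed. (8) The analysis shows that digital coding of DNA sequences has clear advantages over conventional character coding for mathematical operations.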
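Rules (1) and (2) are consistent with reading a sequence as a base-4 number whose last base is the least significant digit, with C = 0, T = 1, A = 2, G = 3 (two bits per base). The Python sketch below assumes that mapping, which is inferred from rule (1) rather than stated in the abstract. It also assumes that Nv = 2 × sequence length, that Nx is the bit length of X(S), and that the Hamming distance dh counts differing bits after bitwise XOR ("bitwise addition" mod 2). The other distances (dh′, dh″, dG) are not modeled, and all function names are hypothetical.

```python
# Assumed 2-bit digit per base, inferred from rule (1): X(S) mod 4 -> last base.
DIGIT = {"C": 0, "T": 1, "A": 2, "G": 3}
BASE = {v: k for k, v in DIGIT.items()}


def encode(seq: str) -> int:
    """Read the sequence as a base-4 number, first base most significant."""
    x = 0
    for b in seq:
        x = 4 * x + DIGIT[b]
    return x


def last_base(x: int) -> str:
    """Rule (1): the residue of X(S) modulo 4 identifies the last base."""
    return BASE[x % 4]


def dimension_gap(seq: str) -> int:
    """Nd = Nv - Nx: apparent bits (2 per base) minus bits actually needed for X(S)."""
    nv = 2 * len(seq)
    nx = encode(seq).bit_length()
    return nv - nx


def hamming_bits(s1: str, s2: str) -> int:
    """Bit-level Hamming distance of two equal-length sequences via XOR."""
    assert len(s1) == len(s2)
    return bin(encode(s1) ^ encode(s2)).count("1")


def base_mismatches(s1: str, s2: str) -> int:
    """Number of positions whose 2-bit digits differ (nonzero XOR digit)."""
    assert len(s1) == len(s2)
    d = encode(s1) ^ encode(s2)
    count = 0
    for _ in range(len(s1)):
        count += int((d & 3) != 0)  # inspect the lowest 2-bit digit
        d >>= 2
    return count
```

Under these assumptions "CCAT" encodes to 9 (binary 1001), so Nd = 8 − 4 = 4 = 2 × 2, matching rule (2): the sequence begins with (C)^2 followed by A or G. A first base of A or G sets the top bit of the leading digit, which is why Nd = 0 in that case.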

4.
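Using 31 genome sequences of human adenovirus subgroup B and 39 of subgroup D, this study systematically analyzed and compared the distribution of simple sequence repeats (SSRs) in these genomes with Imperfect Microsatellite Extractor and DNAMAN. The average relative density of SSRs was very similar in the B and D genomes, but the distribution differed among SSR types. Dinucleotide SSRs were markedly more abundant in subgroup D than in subgroup B; among mononucleotide SSRs, (A)n and (T)n were relatively common in both subgroups, and among dinucleotide SSRs, (CG/GC)n showed a marked preference in both. In multiple sequence alignments within each subgroup, subgroup D showed greater stability. This subgroup-specific distribution of SSRs in B and D may be related to their evolutionary mechanisms and pathogenicity.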
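The study used Imperfect Microsatellite Extractor, which also detects imperfect repeats. As a simplified illustration only, the Python sketch below finds perfect mononucleotide and dinucleotide SSRs with regular expressions and reports relative density as SSRs per kilobase. The minimum repeat counts in MIN_REPEATS are hypothetical and are not the thresholds used in the study.

```python
import re

# Hypothetical minimum copy numbers for perfect SSRs (not the study's thresholds).
MIN_REPEATS = {1: 6, 2: 4}


def find_ssrs(seq: str, motif_len: int) -> list:
    """Return (start, motif, copies) for perfect repeats of the given motif length."""
    n = MIN_REPEATS[motif_len]
    # Group 2 is one motif unit; \2{n-1,} requires at least n-1 further copies.
    pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (motif_len, n - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(2)
        if motif_len == 2 and unit[0] == unit[1]:
            continue  # e.g. (AA)n is a mononucleotide run, not a dinucleotide SSR
        hits.append((m.start(), unit, len(m.group(1)) // motif_len))
    return hits


def relative_density(seq: str) -> float:
    """Mono- plus dinucleotide SSRs per kb of sequence."""
    if not seq:
        return 0.0
    total = sum(len(find_ssrs(seq, k)) for k in MIN_REPEATS)
    return total / (len(seq) / 1000)
```

Comparing relative densities (rather than raw counts) normalizes for genome length, which is what makes genomes of different sizes directly comparable.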

5.
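Using the DNA sequence of human chromosome 1, the frequencies of 8-mers were counted in five sequence classes (CDS, 5′UTR, 3′UTR, introns and intergenic sequences), and the distribution of the relative number of 8-mer motifs against frequency was obtained. Introns and intergenic sequences show a clear trimodal distribution, CDS a unimodal distribution, and 5′UTRs and 3′UTRs an approximately unimodal distribution. To explain these differences, the set of 8-mers was classified according to whether an 8-mer contains two or more copies of a given dinucleotide (XY2), exactly one copy (XY1) or none (XY0). Of the 16 classifications, only the CGj classification splits into three subsets that each form an independent unimodal distribution. This indicates that DNA sequences are composed of three CGj subsets whose frequencies evolved independently: the observed 8-mer frequency distribution is the superposition of the three CGj frequency distributions, and differences in the distances between these three distributions produce the differences among the 8-mer distributions of the five sequence classes. Analysis of the relative frequencies of dinucleotides and trinucleotides within the three CGj subsets of the five classes showed that the relative frequency of CG2 motifs is essentially the same in all five classes, the relative frequency of CG1 motifs clearly distinguishes the five classes, and the relative frequency of CG0 motifs clearly separates coding from non-coding sequences. In short, the CGj motif sets follow specific regularities in DNA sequence composition and play an important role in DNA sequence evolution.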
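The CGj classification can be sketched as follows. This Python version assumes that j is the number of CG dinucleotides in an 8-mer, capped at 2 for "two or more", and that the motif-number distribution is tabulated only over 8-mers that actually occur in the sequence. The function names are hypothetical.

```python
from collections import Counter


def cg_class(kmer: str) -> str:
    """'CG0', 'CG1' or 'CG2' by the number of CG dinucleotides (CG2 = two or more)."""
    n = sum(1 for i in range(len(kmer) - 1) if kmer[i:i + 2] == "CG")
    return "CG%d" % min(n, 2)


def kmer_counts(seq: str, k: int = 8) -> Counter:
    """Frequency of every k-mer occurring in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def split_by_cg(counts: Counter) -> dict:
    """Partition the observed k-mers into the three CGj subsets."""
    groups = {"CG0": Counter(), "CG1": Counter(), "CG2": Counter()}
    for kmer, c in counts.items():
        groups[cg_class(kmer)][kmer] = c
    return groups


def relative_motif_distribution(group: Counter) -> dict:
    """Fraction of distinct k-mers (motifs) in a subset observed at each frequency."""
    by_freq = Counter(group.values())
    total = sum(by_freq.values()) or 1
    return {f: n / total for f, n in sorted(by_freq.items())}
```

Plotting relative_motif_distribution for each of the three subsets separately, and for their union, is one way to see whether a multimodal 8-mer distribution decomposes into unimodal components as the abstract describes.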

6.
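Penicillium expansum PF898 produces an alkaline lipase (PEL) of considerable industrial value. Building on the complete PEL cDNA sequence obtained by 3′ RACE and 5′ RACE, the complete genomic DNA sequence of this lipase was cloned by PCR for the first time (GenBank accession number AF330635). The lipase DNA is 1404 bp long and comprises the PEL coding region, the 3′ untranslated region and part of the 5′ untranslated region. The coding region consists of 1135 bases and contains five introns of 58 bp, 47 bp, 50 bp, 56 bp and 69 bp. Among the filamentous-fungal lipase genes reported so far, the PEL gene has the largest number of introns, while its introns, like those of other filamentous-fungal lipase genes, are small, only a few dozen bases long. The PCR-amplified PEL DNA sequence also includes a 195-base 3′ non-coding region and a 74-base partial 5′ non-coding region. In the full-length PEL DNA sequence, a TATA box lies at nt −24 to −27, an AATAAA sequence appears 156 nt downstream of the TGA stop codon, and a poly(A) tail begins at position 182 downstream of TGA, a typical eukaryotic gene structure. Sequence homology analysis shows that the genomic DNA of PEL shares about 39%–49% homology with those of other fungal lipases, and that the PEL introns share about 42%–57% sequence homology with one another or with introns of other filamentous-fungal lipase genes.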
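The lengths reported above are internally consistent, as the arithmetic below shows. The last line rests on an assumption the abstract does not state: that the 1135-base coding region runs from the start codon through the TGA stop codon, in which case the exons would encode a protein of 284 amino acids.

```latex
\begin{aligned}
1135 + 195 + 74 &= 1404 \ \text{bp (coding region + 3' UTR + partial 5' UTR)} \\
58 + 47 + 50 + 56 + 69 &= 280 \ \text{bp (five introns)} \\
1135 - 280 &= 855 = 3 \times 285 \ \text{nt (exons)} \\
285 - 1 &= 284 \ \text{amino acids (excluding the stop codon)}
\end{aligned}
```

That the exonic length is an exact multiple of 3 is a quick sanity check that the five intron boundaries leave an intact reading frame.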

7.
Applications of DNA sequences in fern molecular systematics   Cited by: 1 (self: 0, others: 1)
Liu HM  Zhang XC  Zeng H 《Chinese Bulletin of Botany (植物学报)》2009,44(2):143-158
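In molecular systematics, choosing the target gene or gene fragment is the most critical step: because evolutionary rates differ, different DNA sequences suit phylogenetic studies at different taxonomic ranks. This paper reviews the DNA sequences commonly analyzed in current molecular phylogenetic studies of ferns, drawn from the chloroplast, nuclear and mitochondrial genomes, with emphasis on the use of chloroplast genes in fern molecular systematics. It also briefly introduces common problems in molecular systematics and their solutions (such as choosing ingroups and outgroups, and strategies for selecting suitable DNA fragments), summarizes the progress and current state of fern molecular systematics, and outlines international research trends in the field.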

9.
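To refine the theory of DNA sequence symmetry, this paper extends the DNA group of order 12 (D group) to the full DNA symmetry group of order 24 (Dd group). The full DNA symmetry group is defined as a special permutation group (S4) whose permuted elements are the four bases of DNA sequences. The D group is isomorphic to the tetrahedral group (T group), the Dd group is isomorphic to the full symmetry group of the regular tetrahedron (Td group), and the D group is a subgroup of the Dd group. The paper also derives the matrix representations of the 12 new elements of the Dd group and the multiplication table of the Dd group, and obtains the transformation table of the four bases A, C, G and T under the operations of the Dd group.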
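Because the Dd group is S4 acting on {A, C, G, T}, and S4 has exactly one subgroup of order 12 (the alternating group A4, which is isomorphic to the rotation group T of the tetrahedron), the D group can be identified with the 12 even permutations of the bases. The Python sketch below enumerates both groups as permutations, checks closure, builds a multiplication table and applies an element to a sequence. The tuple encoding and function names are illustrative and do not reproduce the paper's matrix representations.

```python
from itertools import permutations

BASES = "ACGT"


def compose(p: tuple, q: tuple) -> tuple:
    """Product p*q: apply q first, then p. p[i] is the index of the image of BASES[i]."""
    return tuple(p[q[i]] for i in range(4))


def is_even(p: tuple) -> bool:
    """A permutation is even iff its number of inversions is even."""
    inversions = sum(1 for i in range(4) for j in range(i + 1, 4) if p[i] > p[j])
    return inversions % 2 == 0


Dd = list(permutations(range(4)))                 # all 24 permutations of A, C, G, T (S4)
D = [p for p in Dd if is_even(p)]                 # the 12 even permutations (A4)
NEW_ELEMENTS = [p for p in Dd if not is_even(p)]  # the 12 elements of Dd outside D


def is_closed(elements: list) -> bool:
    """True if the product of any two elements is again in the set."""
    members = set(elements)
    return all(compose(p, q) in members for p in elements for q in elements)


def multiplication_table(group: list) -> list:
    """Entry [a][b] is the index of group[a] * group[b] within group."""
    index = {p: n for n, p in enumerate(group)}
    return [[index[compose(p, q)] for q in group] for p in group]


def apply(p: tuple, seq: str) -> str:
    """Replace every base in seq by its image under p."""
    return "".join(BASES[p[BASES.index(b)]] for b in seq)
```

For example, apply((2, 3, 0, 1), "ACGT") swaps A↔G and C↔T and returns "GTAC"; this permutation has four inversions, so it is even and lies in the D group. The 12 odd permutations are not closed under composition (two odd permutations compose to an even one), which is why they only form a group together with D.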

10.
Cloning and nucleotide sequence analysis of a Microtus fortis-specific DNA fragment   Cited by: 12 (self: 1, others: 11)
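Objective: To obtain DNA sequences specific to Microtus fortis. Methods: PCR targeting the Aβ gene, gene cloning, dot-blot hybridization, DNA sequencing and bioinformatics. Results: Primers were designed from mouse MHC class II exon 2 and its flanking sequences and used to amplify M. fortis genomic DNA. After the PCR products were recovered and sequenced, internal primers were designed to amplify M. fortis genomic DNA; one primer pair gave a specific amplification band, and the resulting DNA fragment was inserted into the pGEM-T Easy vector for sequence analysis. This primer pair produced no amplification product from human, Kunming mouse, BALB/c mouse or C57BL/6J mouse genomic DNA. In dot-blot hybridization using the M. fortis-specific amplicon as the probe, all genomic DNAs other than that of M. fortis were negative. BLAST homology searches and exon prediction on the fragment found no highly homologous sequence in GenBank and identified one putative exon encoding 69 amino acids. Conclusion: The fragment obtained is specific to M. fortis, which lays a foundation for molecular studies of the genetic background and evolution of M. fortis and of the mechanism by which M. fortis resists Schistosoma japonicum.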

11.
《Genomics》2019,111(6):1574-1582
Given the vast amount of genomic data, alignment-free sequence comparison methods are required due to their low computational complexity. k-mer based methods can improve comparison accuracy by extracting an effective feature of the genome sequences. The aim of this paper is to extract k-mer intervals of a sequence as a feature of a genome for high comparison accuracy. In the proposed method, we calculated the distance between genome sequences by comparing the distribution of k-mer intervals. Then, we identified the classification results using phylogenetic trees. We used viral, mitochondrial (MT), microbial and mammalian genome sequences to perform classification for various genome sets. We confirmed that the proposed method provides a better classification result than other k-mer based methods. Furthermore, the proposed method could efficiently be applied to long sequences such as human and mouse genomes.
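A k-mer interval is the distance between consecutive occurrences of the same k-mer. The Python sketch below extracts these intervals and then, as a deliberately crude stand-in for the paper's comparison of interval distributions, summarizes each k-mer by its mean interval and compares sequences by Euclidean distance. The summary statistic, the distance and the function names are assumptions, not the published method.

```python
import math
from collections import defaultdict
from itertools import product


def kmer_intervals(seq: str, k: int) -> dict:
    """Map each k-mer to the gaps between consecutive start positions of its occurrences."""
    last_seen = {}
    gaps = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in last_seen:
            gaps[kmer].append(i - last_seen[kmer])
        last_seen[kmer] = i
    return dict(gaps)


def mean_interval_profile(seq: str, k: int) -> list:
    """Mean interval per k-mer in ACGT order; 0.0 if the k-mer occurs fewer than twice."""
    gaps = kmer_intervals(seq, k)
    profile = []
    for t in product("ACGT", repeat=k):
        g = gaps.get("".join(t), [])
        profile.append(sum(g) / len(g) if g else 0.0)
    return profile


def interval_distance(s1: str, s2: str, k: int) -> float:
    """Euclidean distance between the two mean-interval profiles."""
    a, b = mean_interval_profile(s1, k), mean_interval_profile(s2, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A single pass over the sequence is enough to collect all intervals, so the cost is linear in sequence length, which fits the abstract's point that the approach scales to long genomes. A fuller version would compare each k-mer's whole interval histogram rather than its mean.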

13.
The distribution of the time T for the occurrence of the rth event is derived in a linear function Poisson process, which belongs to a recently introduced family of distributions (JANARDAN and RAO, 1981), called Lagrangian distributions of the second kind (LD2). In analogy with the ordinary Poisson process, the distribution of the random variable T has been termed the Lagrangian Gamma distribution of the second kind (LG2); several of its properties and special cases are studied. The frequency function is graphed for some parameter values.

14.
Two types of distributions for the frequencies of occurrence of amino acids in each position of hypervariable regions CDR-1 and CDR-2 were obtained for 2,000 immunoglobulins. The results show that some positions fit an inverse power-law distribution, while others fit an exponential-type distribution. As a result of comparison with structural data in the literature it is proposed that sites in which the frequency distribution fits the inverse power law are critical to maintaining canonical shapes of the recognition regions or are involved in modulating these canonical conformations, while those sites where the distribution fits the exponential law are those which should be exclusively involved in the recognition mechanism. Correspondence to: F. Lara-Ochoa

15.
Wei C  Wang G  Chen X  Huang H  Liu B  Xu Y  Li F 《PloS one》2011,6(10):e26296
Identification and typing of human enteroviruses (HEVs) are important to pathogen detection and therapy. Previous phylogeny-based typing methods are mainly based on multiple sequence alignments of specific genes in the HEVs, but the results are not stable with respect to different choices of genes. Here we report a novel method for identification and typing of HEVs based on information derived from their whole genomes. Specifically, we calculate the k-mer based barcode image for each genome, HEV or other human viruses, for a fixed k, 1…

16.
Ohta S  Morishita M 《Hereditas》2001,135(2-3):101-110
To elucidate the genome relationships in the genus Dasypyrum and the ancestry of tetraploid D. breviaristatum, two cytotypes of D. breviaristatum and D. villosum were reciprocally crossed with one another. Chromosome pairing at the first metaphase of meiosis and fertility were examined in the F1 hybrids and the parental plants. The mean pairing configuration and mean arm pairing frequency in D. villosum-D. breviaristatum (2x) hybrids were 11.12I + 1.44II per cell and 0.107, respectively, and they were almost completely sterile. In the D. breviaristatum (4x)-D. breviaristatum (2x) hybrid, up to seven trivalents were formed, and the mean pairing configuration was 3.38I + 3.20II + 3.74III + 0.005IV per cell. The mean arm pairing frequency and relative affinity calculated in that F1 hybrid were 0.915 and 0.641, respectively. Seven bivalents and seven univalents were characteristically formed in D. villosum-D. breviaristatum (4x) hybrids. Based on the present results, we concluded that the genome of diploid D. breviaristatum is distantly related to the genome V of D. villosum, and that these two species have different basic genomes. We therefore proposed the symbol Vb for the haploid genome of the diploid cytotype of D. breviaristatum. Moreover, we concluded that tetraploid D. breviaristatum is an autotetraploid with doubled sets of the genomes homologous with that of diploid D. breviaristatum, and we proposed the genome constitution VbVb for the haploid genome set of the tetraploid cytotype of D. breviaristatum. Furthermore, chromosome pairing in the F1 hybrids involving Moroccan and Greek accessions suggested that complicated rearrangements of chromosome structure have occurred in tetraploid D. breviaristatum in its natural populations across the entire distribution area.

17.
K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.
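The Count-Min Sketch idea can be illustrated with a generic Python version: several rows of counters, one hash function per row, and a query that returns the minimum over rows. Collisions can only add to a counter, which is why the structure overcounts but never undercounts, and the k-mers themselves are never stored. This is an illustrative sketch with SHA-256-derived hashes; the class and function names are hypothetical and it is not khmer's implementation or API.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch (illustrative only, not khmer's implementation)."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]  # counters only, no keys

    def _index(self, row: int, item: str) -> int:
        # Derive an independent-looking hash per row by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str) -> None:
        for row in range(self.depth):
            self.tables[row][self._index(row, item)] += 1

    def count(self, item: str) -> int:
        # Every row can only overcount, so the minimum is the tightest estimate.
        return min(self.tables[row][self._index(row, item)] for row in range(self.depth))


def count_kmers(seq: str, k: int, sketch: CountMinSketch) -> None:
    """Stream every k-mer of seq into the sketch (online updating)."""
    for i in range(len(seq) - k + 1):
        sketch.add(seq[i:i + k])
```

Memory is fixed at width × depth counters regardless of how many distinct k-mers are seen. With width=1 every k-mer collides, so count() returns the total number of k-mers added, an extreme case of the systematic overcount described above; larger widths make collisions, and hence overcounts, rarer.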

18.
Theoretical and analytical problems of the dynamics of distribution and abundance in animal communities were examined. In many communities, species with low abundance and of limited spatial occurrence (i.e., rare species) typically form a conspicuous peak when a frequency distribution of the number of species is constructed with respect to the proportion of sites occupied within an area of distribution. Models of distribution dynamics, including a new model proposed here, were compared with a range of animal community data using a new procedure to assess single- and bi-modal patterns in frequency distributions of spatial occurrence. Data reveal that single-modality with an excess of rare species occurs more frequently than bimodality. Even when bimodality is detected, the mode representing wide-spread species is in the majority of cases smaller than that for rare species. Thus, a new model in which the rate of local extinctions is assumed to be negatively related to patch occupancy (or population abundance) is in better agreement with observed data than earlier models. Some problems of analysis, in particular model assumptions and testing, are discussed.

19.
《Genomics》2019,111(4):966-972
Recombination hotspots in a genome are unevenly distributed. Hotspots are regions in a genome that show higher rates of meiotic recombination. Computational methods for recombination hotspot prediction often use sophisticated features that are derived from physico-chemical or structure-based properties of nucleotides. In this paper, we propose iRSpot-SF, which uses sequence-based features that are computationally cheap to generate. Four feature groups are used in our method: k-mer composition, gapped k-mer composition, TF-IDF of k-mers and reverse complement k-mer composition. We have used recursive feature elimination to select the top 17 features for hotspot prediction. Our analysis shows the superiority of gapped k-mer composition and reverse complement k-mer composition features over the others. We have used an SVM with an RBF kernel as the classification algorithm. We have tested our algorithm on standard benchmark datasets. Compared to other methods, iRSpot-SF produces significantly better results in terms of accuracy, Matthews correlation coefficient and sensitivity, which are 84.58%, 0.6941 and 84.57%, respectively. We have made our method readily available as a Python-based tool, and the datasets and source code are available at: https://github.com/abdlmaruf/iRSpot-SF. A web application based on iRSpot-SF is freely available at: http://irspot.pythonanywhere.com/server.html.
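Three of the sequence-based feature groups named above can be sketched in Python as below. The exact definitions used by iRSpot-SF may differ: here the gapped feature is assumed to be the frequency of nucleotide pairs separated by a fixed number of positions, and the reverse complement composition merges each k-mer with its reverse complement. TF-IDF features, feature selection and the classifier are omitted, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string."""
    return s.translate(COMPLEMENT)[::-1]


def kmer_composition(seq: str, k: int) -> list:
    """Frequency of each of the 4**k k-mers, in ACGT order."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)
    return [counts["".join(t)] / total for t in product("ACGT", repeat=k)]


def gapped_pair_composition(seq: str, gap: int) -> list:
    """Frequency of nucleotide pairs X..Y separated by `gap` arbitrary positions."""
    n = max(len(seq) - gap - 1, 0)
    counts = Counter(seq[i] + seq[i + gap + 1] for i in range(n))
    return [counts[a + b] / max(n, 1) for a, b in product("ACGT", repeat=2)]


def revcomp_composition(seq: str, k: int) -> list:
    """Composition over canonical k-mers (a k-mer and its reverse complement are merged)."""
    all_kmers = ("".join(t) for t in product("ACGT", repeat=k))
    canonical = sorted({min(km, revcomp(km)) for km in all_kmers})
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(min(km, revcomp(km)) for km in windows)
    total = max(len(seq) - k + 1, 1)
    return [counts[c] / total for c in canonical]


def feature_vector(seq: str) -> list:
    """Concatenate example features: dinucleotides, 1-gapped pairs, canonical dinucleotides."""
    return kmer_composition(seq, 2) + gapped_pair_composition(seq, 1) + revcomp_composition(seq, 2)
```

For dinucleotides the canonical set has 10 members (16 k-mers, 4 of which are their own reverse complement), so feature_vector returns 16 + 16 + 10 = 42 values. Vectors like these could be fed to an RBF-kernel SVM such as scikit-learn's SVC(kernel="rbf").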

20.
Assuming some general models for the HIV epidemic, in this paper I derive the HIV incubation distributions under AZT treatment. It is shown that under some conditions, these probability distributions are mixtures of some generalized Gamma distributions and products of generalized Gamma distributions.
