Similar articles
1.
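In studies of DNA sequence similarity, the commonly used dynamic programming algorithms rely on gap penalty functions that lack a theoretical basis and are therefore subjective, so different choices lead to different results. This paper proposes a DNA sequence similarity measure based on the DTW (Dynamic Time Warping) distance to address this problem. DNA sequences are converted into time series through a graphical representation, and the DTW distance between the resulting series is then computed as the similarity measure characterizing the sequences. The method was used to compare the gene sequences of seven neurotoxins from the East Asian scorpion (Buthus martensi Karsch), which confirmed its effectiveness and accuracy.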
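
The DTW distance named in this abstract can be illustrated with the textbook dynamic programming recurrence. This is a minimal sketch, not the authors' implementation: the abstract does not say which graphical representation turns a DNA sequence into a time series, so the `to_series` mapping (A=1, C=2, G=3, T=4) is a hypothetical stand-in, and the absolute difference is assumed as the local cost.

```python
def to_series(seq, mapping=None):
    """Hypothetical numeric encoding; the paper's graphical representation is not specified here."""
    mapping = mapping or {"A": 1, "C": 2, "G": 3, "T": 4}
    return [mapping[ch] for ch in seq]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because warping lets one point match several, `dtw_distance(to_series("ACGT"), to_series("AACGT"))` is zero even though the sequences differ in length, which is the property that removes the need for an explicit gap penalty.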

2.
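A DNA barcode is a DNA sequence that can be used for species identification. This paper reviews the DNA-barcode-based analysis methods of recent years and their application to species identification and the discovery of cryptic species, mainly genetic distance methods, phylogenetic tree methods, similarity alignment methods, diagnostic methods and statistical classification methods, with the aim of providing a reference for the wider application of this technique.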

3.
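A six-dimensional representation of protein sequences is presented. The representation takes three different forms, which are used to construct the information entropy of distance matrices; the similarity between sequences is then compared through the Euclidean distance and the angle between the information entropy vectors.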
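
The two comparison measures named in this abstract, the Euclidean distance and the angle between vectors, can be sketched as follows. The functions take plain lists of numbers; how the information entropy vectors themselves are built is not described in the abstract, so the inputs here are assumed to be already computed.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)
```

The distance is sensitive to vector magnitude while the angle is not, which is one plausible reason for reporting both.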

4.
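Similarity measures for time series   Total citations: 1 (self-citations: 0; citations by others: 1)
A time series is a set of observations arranged in chronological order. In bioinformatics, both DNA sequences and gene expression data can be regarded as time series data. A key step in time series analysis is characterizing the similarity between two time series or subsequences, for example for sequence alignment. Similarity measures are a foundation and focus of time series research: they directly affect the efficiency and accuracy of subsequent computations such as querying and clustering, and they have important applications in high-throughput gene chip data analysis and gene network construction. The topic has attracted many researchers, and a large body of work has built on the Euclidean distance. This paper reviews time series similarity measures based on the Euclidean distance and on time warping, together with progress in related fields, as a reference for further research.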

5.
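Identification of arbuscular mycorrhizal fungi of Phellodendron amurense   Total citations: 1 (self-citations: 0; citations by others: 1)
Objective: To identify the arbuscular mycorrhizal (AM) fungi of Phellodendron amurense using morphological characteristics and nested PCR. Methods: Arbuscular mycorrhizae of P. amurense were selected by acid fuchsin staining. AM fungal spores were obtained by wet sieving and identified morphologically. Nested PCR was used for specific amplification from crude DNA extracts of P. amurense, and sequence similarity was compared with blastn. A phylogenetic tree was constructed to determine which AM fungus colonizes the roots of P. amurense. Results: The morphological characteristics of the AM fungal spores numbered HDAM-1 matched the description of G. intraradices. Nested PCR detected a target fragment of about 455 bp whose sequence was most similar to G. intraradices (DQ469118), at 97.8%, with 11 base differences. In the phylogenetic tree based on 25S rDNA, the sequence fell on the same branch as G. intraradices (DQ469118.1), establishing that G. intraradices colonizes the roots of P. amurense. Conclusion: Combining morphological characteristics with nested PCR to identify AM fungi is simple and economical and also improves the reliability of the results.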

6.
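Zhang Kun, Zhao Jingjing, Tang Xuqing. Life Science Research (《生命科学研究》), 2011, 15(2): 101-106, 124
Based on the classical HP model, a new comparison method for protein sequences is proposed using the matrix graphical representation (MGR) of protein sequences and the idea of numerical characterization. By inspecting the numerical characterization plots of the protein sequences and computing the Euclidean distance d between pairs of sequences, a similarity analysis was carried out on protein sequences from two xylanase families. Sequences assigned to the same xylanase family were found to be more similar to each other, and the degree of similarity was related to molecular size, structure and molecular evolution.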

7.
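This paper reports the genome sequencing of the first coxsackievirus B5 strain associated with hand, foot and mouth disease in China (01/CVB5/SD/CHN/09, CVB5/09), together with its comparison and evolutionary analysis against existing related sequences. The CVB5/09 genome is 7399 nt long and encodes 2185 amino acids. Its nucleotide sequence similarity to existing CVB5 genomes is 80.6%-85.3%, and its amino acid sequence similarity is 96.1%-96.9%. In phylogenetic trees built from the different genomic regions P1, P2 and P3, CVB5/09 falls on different branches, indicating that the regions evolve at different rates. Simplot similarity analysis found no evident recombination in the genome. This work completes the first whole-genome sequence of coxsackievirus B5 in China; comparison with other related viruses gives a deeper understanding of its genetic characteristics and is intended to provide useful information for the epidemiological investigation, prevention and control of hand, foot and mouth disease.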

8.
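Two types of primers, universal primers (L1490, H2198) and specific primers (Pat, Jerry), were used to amplify and sequence fragments of the mitochondrial cytochrome c oxidase I (COI) gene from four common scarab beetle species, yielding sequences of 689 bp and 775 bp. Genetic distances were analyzed from the sequencing results, and phylogenetic trees of the four species were constructed. The genetic distances of the sequences amplified with the specific primers were clearly better than those from the universal primers in both intraspecific stability and interspecific divergence, and the phylogenetic tree built from the specific-primer sequences agreed best with the actual relationships. Sequences amplified with the specific primers therefore classify scarab beetles more accurately.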

9.
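Two types of primers, universal primers (L1490, H2198) and specific primers (Pat, Jerry), were used to amplify and sequence fragments of the mitochondrial cytochrome c oxidase I (COI) gene from four common scarab beetle species, yielding sequences of 689 bp and 775 bp. Genetic distances were analyzed from the sequencing results, and phylogenetic trees of the four species were constructed. The genetic distances of the sequences amplified with the specific primers were clearly better than those from the universal primers in both intraspecific stability and interspecific divergence, and the phylogenetic tree built from the specific-primer sequences agreed best with the actual relationships. Sequences amplified with the specific primers therefore classify scarab beetles more accurately.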

10.
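Objective: To explore the value and significance of arbitrary primer PCR (AP-PCR) combined with cloning, sequencing and bioinformatic analysis in studying the geographic origin of Helicobacter pylori (H. pylori) strains. Methods: Genomic DNA from clinically isolated and cultured H. pylori strains was amplified by random PCR with a set of 10-nt oligonucleotide primers. Relatively conserved fragments were recovered, cloned and sequenced, and the sequences were submitted to the GenBank database for BLAST similarity searches. The corresponding sequences of Helicobacter strains of different geographic origin with high homology in the BLAST results were collected and aligned with ClustalX, and phylogenetic trees were analyzed with the neighbor-joining and maximum parsimony methods in MEGA 4.0. Results: The products obtained by arbitrary primer amplification, screening, cloning and sequencing were partial coding sequences of the NADH dehydrogenase G and H subunits. Their homology with 27 H. pylori strains of different geographic origin and one Helicobacter strain of feline origin was above 90% in every case, indicating that the NADH dehydrogenase gene sequence is conserved in H. pylori. Phylogenetic analysis showed that the sequences of the clinical H. pylori strains obtained by AP-PCR carry the genetic signature of East Asian strains: they were genetically close to the Peruvian strains Sat464 and Shi470 from the Americas, which have East Asian strain characteristics, to the Korean strains 52 and 51, and to the Japanese strain 1757; they were farther from South Asian and European strains, and farthest from the African strain SouthAfrica7 and the feline-derived strain Sheeba. Conclusion: Not only certain special genes but also randomly located, relatively conserved gene fragments can reflect the geographic origin of H. pylori. Combining AP-PCR and sequencing with phylogenetic analysis is a more convenient and effective new approach to exploring the geographic origin of H. pylori.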

11.
In the ciliate Tetrahymena thermophila, thousands of DNA segments of variable size are eliminated from the developing somatic macronucleus by specific DNA rearrangements. It is unclear whether rearrangement of the many different DNA elements occurs via a single mechanism or via multiple rearrangement systems. In this study, we characterized in vivo cis-acting sequences required for the rearrangement of the 1.1-kbp R deletion element. We found that rearrangement requires specific sequences flanking each side of the deletion element. The required sequences on the left side appear to span roughly a 70-bp region that is located at least 30 bp from the rearrangement boundary. When we moved the location of the left cis-acting sequences closer to the eliminated region, we observed a rightward shift of the rearrangement boundary such that the newly formed deletion junction retained its original distance from this flanking region. Likewise, when we moved the flanking region as much as 500 bp away from the deletion element, the rearrangement boundary shifted to remain in relative juxtaposition. Clusters of base substitutions made throughout this critical flanking region did not affect rearrangement efficiency or accuracy, which suggests a complex nature for this regulatory sequence. We also found that the right flanking region effectively replaced the essential sequences identified on the left side, and thus, the two flanking regions contain sequences of analogous function despite the lack of obvious sequence identity. These data taken together indicate that the R-element flanking regions contain sequences that position the rearrangement boundaries from a short distance away. Previously, a 10-bp polypurine tract flanking the M-deletion element was demonstrated to act from a distance to determine its rearrangement boundaries. No apparent sequence similarity exists between the M and R elements. 
The functional similarity between these different cis-acting sequences of the two elements is firm support for a common mechanism controlling Tetrahymena rearrangement.

12.
Phylogenetic tree reconstruction requires construction of a multiple sequence alignment (MSA) from sequences. Computationally, it is difficult to achieve an optimal MSA for many sequences. Moreover, even if an optimal MSA is obtained, it may not be the true MSA that reflects the evolutionary history of the underlying sequences. Therefore, errors can be introduced during MSA construction, which in turn affect the subsequent phylogenetic tree construction. In order to circumvent this issue, we extend the application of the k-tuple distance to phylogenetic tree reconstruction. The k-tuple distance between two sequences is the sum of the differences in frequency, over all possible tuples of length k, between the sequences and can be estimated without MSAs. It has traditionally been used to build a fast ‘guide tree’ to assist the construction of MSAs. Using 1470 simulated sets of sequences generated under different evolutionary scenarios, together with neighbor-joining trees and BioNJ trees, we compared the performance of the k-tuple distance with four commonly used distance estimators: Jukes–Cantor, Kimura, F84 and Tamura–Nei. These four distance estimators fall into the category of model-based distance estimators, as each of them takes account of a specific substitution model in order to compute the distance between a pair of already aligned sequences. Results show that trees constructed from the k-tuple distance are more accurate than those from the other distances most of the time; when the divergence between the underlying sequences is high, tree accuracy with the k-tuple distance can be twice that of the other estimators or higher. Furthermore, as the k-tuple distance avoids the need for constructing an MSA, it can save a tremendous amount of time in phylogenetic tree reconstruction when the data include a large number of sequences.
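
The k-tuple distance defined in this abstract can be sketched directly from its description. The version below is one plausible reading, not the paper's exact estimator: it counts overlapping tuples and sums the absolute differences of raw counts, whereas the paper may normalize by sequence length or use squared differences. The function names are illustrative.

```python
from collections import Counter


def ktuple_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(s, t, k=3):
    """Sum of absolute count differences over all k-tuples seen in either sequence."""
    cs, ct = ktuple_counts(s, k), ktuple_counts(t, k)
    # Tuples absent from both sequences contribute zero, so only observed ones are visited.
    return sum(abs(cs[w] - ct[w]) for w in cs.keys() | ct.keys())
```

No alignment is involved: each sequence is scanned once, so a full pairwise distance matrix for a neighbor-joining tree can be filled without building an MSA first.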

13.
Genome analysis with distance to the nearest dissimilar nucleotide   Total citations: 1 (self-citations: 0; citations by others: 1)
DNA may be represented by sequences of four symbols, but it is often useful to convert those symbols into real or complex numbers for further analysis. Several mapping schemes have been used in the past, but most of them seem to be unrelated to any intrinsic characteristic of DNA. The objective of this work was to study a mapping scheme that is directly related to DNA characteristics and that could be useful in discriminating between different species. Recently, we proposed a methodology based on the inter-nucleotide distance, which proved to contribute to the discrimination among species. In this paper, we introduce a new distance, the distance to the nearest dissimilar nucleotide, which is the distance from a nucleotide to the first occurrence of a different nucleotide. This distance is related to the repetition structure of single nucleotides. Using the information resulting from the concatenation of the distance to the nearest dissimilar nucleotide and the inter-nucleotide distance, we found that this new distance brings additional discriminative capabilities. This suggests that the distance to the nearest dissimilar nucleotide might contribute useful information about the evolution of the species.
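
The distance to the nearest dissimilar nucleotide can be computed in a single pass. The sketch below rests on two assumptions the abstract does not settle: the search runs forward (toward higher positions), and positions in the final run, which have no dissimilar successor, are reported as `None`.

```python
def nearest_dissimilar_distances(seq):
    """For each position, the distance to the next position holding a different symbol."""
    n = len(seq)
    out = [None] * n
    # Scan right to left so each answer can reuse the one to its right.
    for i in range(n - 2, -1, -1):
        if seq[i] != seq[i + 1]:
            out[i] = 1
        elif out[i + 1] is not None:
            out[i] = out[i + 1] + 1
    return out
```

For example, in "AAACG" the three A's are 3, 2 and 1 steps away from the C, which shows how the values encode the length of single-nucleotide repeats.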

14.
Biotechnological and biomolecular advances have introduced novel uses for DNA such as DNA computing, storage, and encryption. For these applications, DNA sequence design requires maximal desired (and minimal undesired) hybridizations, each of which produces a single new DNA strand from two single DNA strands. Here, we propose a novel constraint to design DNA sequences based on thermodynamic properties. Existing constraints for DNA design are based on the Hamming distance, a constraint that does not address the thermodynamic properties of the DNA sequence. Using a unique, improved genetic algorithm, we designed DNA sequence sets that satisfy different distance constraints and employ a free energy gap based on a minimum free energy (MFE) to gauge DNA sequences based on set thermodynamic properties. When compared to the best constraints of the Hamming distance, our method yielded better thermodynamic qualities. We then used our improved genetic algorithm to obtain lower-bound DNA sequence sets. Here, we discuss the effects of novel constraint parameters on the free energy gap.
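
For reference, the Hamming distance constraint that this abstract contrasts with its thermodynamic one can be sketched as below. This shows only the baseline constraint in its simplest form; the helper name `satisfies_min_distance` and the threshold `d` are illustrative, and the free energy gap proposed in the paper is not reproduced.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def satisfies_min_distance(seqs, d):
    """True if every pair in the set differs in at least d positions."""
    return all(hamming(a, b) >= d for a, b in combinations(seqs, 2))
```

The check is purely combinatorial: it treats every mismatch alike, which is the limitation the abstract points to when it says the Hamming distance ignores thermodynamic properties.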

15.
Operational taxonomic units (OTUs) are conventionally defined at a phylogenetic distance (0.03 for species, 0.05 for genus, 0.10 for family) based on full-length 16S rRNA gene sequences. However, partial sequences (700 bp or shorter) have been used in most studies. This discord may affect analysis of diversity and species richness because sequence divergence is not distributed evenly along the 16S rRNA gene. In this study, we compared a set each of bacterial and archaeal 16S rRNA gene sequences of nearly full length with multiple sets of different partial 16S rRNA gene sequences derived therefrom (approximately 440-700 bp), at conventional and alternative distance levels. Our objective was to identify partial sequence region(s) and distance level(s) that allow more accurate phylogenetic analysis of partial 16S rRNA genes. Our results showed that no partial sequence region could estimate OTU richness or define OTUs as reliably as nearly full-length genes. However, the V1-V4 regions can provide more accurate estimates than others. For analysis of archaea, we recommend the V1-V3 and the V4-V7 regions and clustering of species-level OTUs at 0.03 and 0.02 distances, respectively. For analysis of bacteria, the V1-V3 and the V1-V4 regions should be targeted, with species-level OTUs being clustered at 0.04 distance in both cases.

16.
Most molecular analyses, including phylogenetic inference, are based on sequence alignments. We present an algorithm that estimates relatedness between biomolecules without the requirement of sequence alignment by using a protein frequency matrix that is reduced by singular value decomposition (SVD), in a latent semantic index information retrieval system. Two databases were used: one with 832 proteins from 13 mitochondrial gene families and another composed of 1000 sequences from nine types of proteins retrieved from GenBank. Firstly, 208 sequences from the first database and 200 from the second were randomly selected and compared using edit distance between each pair of sequences and respective cosines and Euclidean distances from SVD. Correlation between cosine and edit distance was -0.32 (P < 0.01) and between Euclidean distance and edit distance was +0.70 (P < 0.01). In order to check the ability of SVD in classifying sequences according to their categories, we used a sample of 202 sequences from the 13 gene families as queries (test set), and the other proteins (630) were used to generate the frequency matrix (training set). The classification algorithm applies a voting scheme based on the five most similar sequences with each query. With a 3-peptide frequency matrix, all 202 queries were correctly classified (accuracy = 100%). This algorithm is very attractive, because sequence alignments are neither generated nor required. In order to achieve results similar to those obtained with edit distance analysis, we recommend that Euclidean distance be used as a similarity measure for protein sequences in latent semantic indexing methods.

17.
The focus of the research is on the analysis of genome sequences. Based on the inter-nucleotide distance sequence, we propose the conditional multinomial distribution profile for the complete genomic sequence. These profiles can be used to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin and to build the phylogenetic tree of 24 complete genome sequences of coronaviruses. Our results demonstrate that the new method is powerful and efficient.

18.
19.
Molecular phylogenetic studies are executed by the alignment of protein or nucleotide sequences, followed by the construction of trees according either to distance, parsimony or maximum likelihood methods. Linguistic analysis was investigated here as an alternative method to aligning sequences. In an empirical study, we inferred trees for a variable number of Bovidae and sister taxa based on three different mitochondrial orthologous sequences. Comparison of our results with existing phylogenies indicated that the method, except for some still disputable points, was able to establish sensible systematic relationships, similar to patterns of radiation of the family found in recent studies.

20.
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. Our method shows linear time complexity for the length of DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
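
The two measures this abstract singles out, histogram intersection and Manhattan distance, can be sketched on plain frequency histograms. The inputs are assumed to be already extracted counts of local bitmap patterns; the extraction step itself is not described in the abstract and is not attempted here.

```python
def normalize(hist):
    """Scale raw counts so that the histogram sums to 1."""
    total = sum(hist)
    return [h / total for h in hist]


def manhattan(p, q):
    """Manhattan (L1) distance between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(p, q))


def histogram_intersection(p, q):
    """Overlap of two histograms: 1.0 for identical normalized inputs, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(p, q))
```

For histograms normalized to sum to 1, the two are tied by intersection = 1 - manhattan / 2, so they rank sequence pairs identically, which is consistent with both being reported as the best performers.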
