Similar Articles
19 similar articles found.
1.
Similarity Measures for Time Series   Total citations: 1; self-citations: 0; citations by others: 1
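A time series is a set of observations ordered in time. In bioinformatics, both DNA sequences and gene expression data can be treated as time series data. A key step in time-series analysis is characterizing the similarity between two time series or subsequences, for example for sequence alignment. Similarity measures are fundamental to time-series research and directly affect the efficiency and accuracy of downstream computations such as querying and clustering. They have important applications in high-throughput gene-chip data analysis, gene network construction and related research. The topic has attracted the attention of many researchers, and a large body of work builds on the Euclidean distance. This paper reviews progress in time-series similarity measures based on the Euclidean distance and on time warping, together with related fields, as a reference for further research.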
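The two families of measures covered by the review can be contrasted in a few lines. The Python sketch below uses illustrative function names that are not taken from the paper. It computes the lock-step Euclidean distance and the classic dynamic time warping (DTW) distance with a squared-difference local cost. DTW is shown in its unconstrained O(nm) form, without the warping windows or lower bounds that a practical system would add.

```python
import math


def euclidean(x, y):
    """Lock-step Euclidean distance; both series must have the same length."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Unconstrained dynamic time warping with squared-difference local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            cost[i][j] = local + min(cost[i - 1][j],      # advance x only
                                     cost[i][j - 1],      # advance y only
                                     cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 1.0, 0.0]
    b = [0.0, 0.0, 1.0, 2.0, 1.0]  # same shape, shifted by one step
    print(euclidean(a, b), dtw(a, b))
```

For these two toy series, which have the same shape shifted by one step, DTW gives a smaller value than the Euclidean distance. It can repeat elements to absorb the shift, while the lock-step measure cannot.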

2.
A New Relative Distance for Protein Similarity Analysis   Total citations: 1; self-citations: 0; citations by others: 1
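This paper presents a new relative distance for analyzing the similarity of different protein sequences and constructing phylogenetic trees. The distance is based on Lempel-Ziv complexity and requires neither sequence alignment nor complicated algorithms. To show that the distance is reasonable, the paper analyzes the similarity of 8 species and constructs their phylogenetic tree.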
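The abstract does not give the exact form of its relative distance, so the following is only a sketch under that caveat. It computes Lempel-Ziv (1976) complexity with the Kaspar-Schuster scanning algorithm. It then combines the complexities of concatenated sequences into a normalized distance in the style of Otu and Sayood. The paper's own formula may differ, and the function names are hypothetical.

```python
def lz76_complexity(s):
    """Lempel-Ziv (1976) complexity via the Kaspar-Schuster scanning algorithm."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # component still copies an earlier substring
            if l + k > n:
                c += 1                  # count the final (possibly partial) component
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:                  # no earlier start works: close the component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_distance(s, q):
    """Normalized LZ distance (Otu-Sayood style); symmetric by construction."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)
```

The measure works on any alphabet, including the 20 amino acids, and lies in [0, 1]. The distance d(S, S) is small but need not be exactly zero. A matrix of such pairwise distances would then feed a tree-building method such as neighbor joining.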

3.
Operation Rules for High-Dimensional Digital Coding of DNA Sequences   Total citations: 1; self-citations: 0; citations by others: 1
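Binary digital coding of DNA sequences in a high-dimensional space can encode properties such as base structure, functional groups, base complementarity and hydrogen-bond strength. It also makes arithmetic and logical operations convenient. The operation rules are as follows.

(1) The parity properties of the numeric code of a DNA sequence determine its last base. When the value X(S) of a DNA sequence S is 4n, 4n+1, 4n+2 or 4n+3, its last base is C, T, A or G respectively (n = 0, 1, 2, …).

(2) The paper introduces the apparent dimension N_v, the numeric dimension N_x and the difference dimension N_d of a DNA sequence in high-dimensional space. When N_d = 0, the first base is A or G. When N_d = 2n or 2n+1 (n = 1, 2, …), the sequence begins with n consecutive C bases (C^n) or with C^n followed by T.

(3) Operation rules are derived for point mutations (single-nucleotide polymorphisms, SNPs).

(4) Operation rules are derived for tandem repeats.

(5) The paper introduces the concept of a DNA subsequence and defines its digital value X_i and location value Q_i, together with formulas for computing them.

(6) Operation rules are derived for extension, removal, deletion, insertion, transposition, position exchange and substitution of DNA sequences.

(7) Bitwise addition yields the Hamming distance d_h, the base distance d_h′, the group distance d_h″ and the conjugate distance d_G of DNA sequences. The meaning of these distances and the relationships among them are discussed.

(8) The analysis shows that, for mathematical operations, digital coding of DNA sequences has clear advantages over conventional character coding.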
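The residue rule in (1) pins down the two-bit code. If X(S) mod 4 = 0, 1, 2, 3 means the last base is C, T, A, G, then C = 00, T = 01, A = 10 and G = 11. This also agrees with rule (2): a leading C contributes two leading zero bits and a leading T contributes one.

The Python sketch below is built on that inference. It further assumes that N_v is 2 times the sequence length and that N_x is the bit length of X(S). It covers the encoding, rules (1) and (2), and a Hamming distance d_h obtained by XOR ("bitwise addition") as in (7). The base, group and conjugate distances are not reproduced, and all function names are hypothetical.

```python
# Two-bit codes inferred from rule (1): X mod 4 = 0, 1, 2, 3 -> C, T, A, G.
CODE = {"C": 0b00, "T": 0b01, "A": 0b10, "G": 0b11}
BASE = {v: k for k, v in CODE.items()}


def encode(seq):
    """Read a DNA string as a base-4 number, first base most significant."""
    x = 0
    for b in seq:
        x = (x << 2) | CODE[b]
    return x


def last_base(x):
    """Rule (1): X(S) = 4n, 4n+1, 4n+2, 4n+3 -> last base C, T, A, G."""
    return BASE[x & 0b11]


def difference_dimension(seq):
    """Rule (2), assuming N_v = 2 * len(seq) and N_x = bit length of X(S)."""
    return 2 * len(seq) - encode(seq).bit_length()


def hamming(s, q):
    """Rule (7): XOR the codes, then count the non-zero 2-bit groups."""
    if len(s) != len(q):
        raise ValueError("sequences must have equal length")
    diff = encode(s) ^ encode(q)
    count = 0
    for _ in range(len(s)):
        count += (diff & 0b11) != 0     # this base position differs
        diff >>= 2
    return count
```

For example, "CTAG" encodes as 00 01 10 11 (binary) = 27. Since 27 = 4·6 + 3, the last base is G. Its 5-bit value in an 8-bit space gives N_d = 3 = 2·1 + 1, matching the leading "CT".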

4.
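Building on two-dimensional graphical representations of biological sequences, this work compares the similarity of biological sequences using the Balaban index and an information distribution index. The method is illustrated with DNA sequences from nine species, including human, and with six proteins, including yar029w.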
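For reference, the Balaban J index of a connected graph with n vertices and m edges is defined as follows. J = m/(mu + 1) times the sum, over all edges (i, j), of (s_i s_j)^(-1/2). Here s_i is the i-th row sum of the distance matrix and mu = m - n + 1 is the cyclomatic number.

The Python sketch below computes J from any distance matrix. The abstract does not state how the paper turns a 2D sequence curve into a graph and a distance matrix. One hypothetical option takes the curve's points as vertices, consecutive points as edges and Euclidean distances as the matrix. The example itself uses only plain graph distances.

```python
import math
from collections import deque


def bfs_distance_matrix(n, edges):
    """All-pairs shortest-path (edge-count) distances of a connected graph."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = []
    for src in range(n):
        d = [-1] * n
        d[src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if d[w] < 0:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist.append(d)
    return dist


def balaban_j(dist, edges):
    """Balaban J = m / (mu + 1) * sum over edges of (s_i * s_j) ** -0.5."""
    n, m = len(dist), len(edges)
    mu = m - n + 1                      # cyclomatic number of a connected graph
    s = [sum(row) for row in dist]      # distance sums (row sums)
    return m / (mu + 1) * sum(1.0 / math.sqrt(s[i] * s[j]) for i, j in edges)
```

For the three-vertex path, the index is 4/sqrt(6), about 1.633. For the four-cycle, it is 2.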

5.
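This work gives a six-dimensional representation of protein sequences. The representation yields three different forms, each of which is used to construct the information entropy of a distance matrix. Sequences are then compared through the Euclidean distance between their entropy vectors and the angle between those vectors.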
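Only the final comparison step is fully specified in the abstract. Each protein is summarized by a vector of entropies, one per representation form. Two vectors are then compared by their Euclidean distance and by the angle between them.

The Python sketch below implements that step. Its `matrix_entropy` helper normalizes the matrix entries to a probability distribution and takes the Shannon entropy. This is one hypothetical reading of "information entropy of a distance matrix". The six-dimensional representation itself is not reproduced.

```python
import math


def matrix_entropy(matrix):
    """Shannon entropy (bits) of a non-negative matrix after normalizing its
    entries to sum to 1 (a hypothetical reading of the paper's definition)."""
    values = [v for row in matrix for v in row if v > 0]
    total = sum(values)
    return -sum((v / total) * math.log2(v / total) for v in values)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle(u, v):
    """Angle in radians between two vectors; 0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))   # clamp rounding error before acos
    return math.acos(cos)
```

The two measures are complementary. The distance responds to overall entropy levels, while the angle ignores scale and compares only the relative pattern across the three forms.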

6.
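DNA graphical coding encodes a DNA sequence geometrically, using different marker symbols and directed line segments at different positions. Compared with character coding, graphical coding is intuitive, concise and visual, and it makes the similarity of local DNA segments easy to compare. After analyzing existing graphical representation schemes for DNA, the paper proposes a "two-symbol, third-order" (双符三阶) graphical coding of DNA sequences and uses it to analyze several specific DNA coding sequences. The graphical coding corresponds one-to-one with character coding. It is simple to use, easy to encode and decode, visually rich and convenient for comparison. It is suited to similarity detection and analysis of short DNA sequences and has some application prospects in bioinformatics.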

7.
Ai Liang, Feng Jie. Chinese Journal of Bioinformatics, 2023, 21(3): 179-186
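This paper proposes a new fast, alignment-free method for analyzing protein sequence similarity and evolution. To characterize a protein sequence, the method works in four steps:

1. It condenses 10 physicochemical properties of amino acids into 6 principal components by principal component analysis (PCA).
2. It averages the principal component scores, weighting each amino acid by the number of times it occurs in the sequence.
3. It adds amino-acid position information to form a 26-dimensional feature vector.
4. It measures similarity and evolutionary relationships between protein sequences by the Euclidean distance.

Tests on three protein sequence datasets show that the method clusters every protein sequence correctly and is simple and fast, which demonstrates its effectiveness.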
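The abstract fixes the layout of the feature vector: 6 count-weighted principal-component averages plus position information, 26 dimensions in total. It does not define the position term. The Python sketch below assumes the remaining 20 dimensions hold the mean relative position of each standard amino acid, or 0 when that amino acid is absent. Because the 10 properties and their PCA loadings are not given, it takes the 6 PC scores per amino acid as an input. The names `feature_vector` and `pc_scores` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def feature_vector(seq, pc_scores):
    """Hypothetical 26-D vector: 6 count-weighted PC averages + 20 position terms.

    pc_scores maps each standard amino acid to its 6 principal-component scores,
    assumed to come from a prior PCA of 10 physicochemical properties.
    """
    counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    total = sum(counts.values())
    # Weighted average of PC scores; weights = occurrence counts in this sequence.
    pc_part = [sum(counts[aa] * pc_scores[aa][k] for aa in AMINO_ACIDS) / total
               for k in range(6)]
    # Position term (assumption): mean 1-based position / length, 0.0 if absent.
    n = len(seq)
    pos_part = []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        pos_part.append(sum(positions) / (len(positions) * n) if positions else 0.0)
    return pc_part + pos_part


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Any PC scores plugged into this sketch should be treated as placeholders. Real use would first run PCA on a 20-by-10 property table and keep the first six components.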

8.
Yang Ziheng. Hereditas (Beijing), 1991, 13(2): 9-11
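This paper introduces a set of BASIC programs for the comparative analysis of multiple DNA sequences. Program K1246p uses the 1-P, 2-P, 4-P and 6-P (one-, two-, four- and six-parameter) substitution models to estimate K, the number of nucleotide substitutions between homologous DNA sequences since their divergence. Program DOT uses the dot-matrix method to compare two DNA or protein sequences. It finds similarities between them as well as primary-structure features such as oligonucleotides and palindromic sequences.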
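The BASIC programs themselves are not reproduced here. As a sketch of what K1246p and DOT compute, the Python below assumes that 1-P and 2-P denote the Jukes-Cantor and Kimura two-parameter models, as is conventional. It implements those two estimators of K for gap-free aligned DNA sequences and a word-based dot plot. The 4-P and 6-P models are omitted.

```python
import math

PURINES = set("AG")


def _proportions(s1, s2):
    """Return (P, Q): proportions of transition and transversion differences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("need two aligned, non-empty sequences of equal length")
    ts = tv = 0
    for a, b in zip(s1, s2):
        if a != b:
            if (a in PURINES) == (b in PURINES):
                ts += 1          # transition: A<->G or C<->T
            else:
                tv += 1          # transversion: purine <-> pyrimidine
    n = len(s1)
    return ts / n, tv / n


def jukes_cantor(s1, s2):
    """One-parameter model: K = -3/4 ln(1 - 4p/3)."""
    P, Q = _proportions(s1, s2)
    return -0.75 * math.log(1 - 4 * (P + Q) / 3)


def kimura_2p(s1, s2):
    """Two-parameter model: K = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    P, Q = _proportions(s1, s2)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def dot_plot(s1, s2, word=1):
    """Cells (i, j) where the length-`word` substrings starting at i and j match."""
    return {(i, j)
            for i in range(len(s1) - word + 1)
            for j in range(len(s2) - word + 1)
            if s1[i:i + word] == s2[j:j + word]}
```

Both estimators raise a math domain error when divergence is saturated, for example when p >= 3/4 under Jukes-Cantor, because K is undefined there. In the dot plot, a longer `word` suppresses the background of single-letter matches, which leaves diagonals (similar regions) and anti-diagonals (palindrome-like features) easier to see.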

9.
A Fast Sequence Alignment Algorithm Based on Dynamic Programming   Total citations: 3; self-citations: 0; citations by others: 3
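Sequence alignment is one of the important research areas in bioinformatics, and dynamic programming is the most effective and fundamental approach to it. The basic dynamic programming method has high time and space complexity, which makes it unsuitable for practical biological sequence alignment. After analyzing several related dynamic programming algorithms, this paper therefore proposes UKK_FA, a fast sequence alignment algorithm based on dynamic programming. Experimental results show that the algorithm effectively reduces time complexity and has some practical value.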
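The abstract does not describe UKK_FA in detail. Its name suggests Ukkonen's cut-off idea, so the Python sketch below illustrates that idea on unit-cost edit distance. The dynamic-programming table is filled only inside the diagonal band |i - j| <= t, which gives the exact distance whenever the distance is at most t. The band width t is then doubled until the result fits. Because only band cells are stored, the total work is roughly O(d·n) for distance d instead of O(nm). This illustrates the technique and is not the paper's algorithm.

```python
def banded_edit_distance(a, b, t):
    """Unit-cost edit distance restricted to the band |i - j| <= t.

    Returns the exact distance if it is <= t, otherwise None.
    """
    n, m = len(a), len(b)
    if abs(n - m) > t:
        return None
    inf = float("inf")
    prev = {j: j for j in range(min(m, t) + 1)}       # row 0, band cells only
    for i in range(1, n + 1):
        cur = {0: i} if i <= t else {}
        for j in range(max(1, i - t), min(m, i + t) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev.get(j - 1, inf) + cost,  # match / substitution
                         prev.get(j, inf) + 1,         # deletion from a
                         cur.get(j - 1, inf) + 1)      # insertion into a
        prev = cur
    d = prev.get(m, inf)
    return d if d <= t else None


def edit_distance(a, b):
    """Ukkonen-style band doubling: widen the band until the result fits in it."""
    t = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_edit_distance(a, b, t)
        if d is not None:
            return d
        t *= 2
```

The band is safe because an optimal alignment path with d edits never strays more than d cells from the main diagonal. Once t >= d, the banded result is therefore exact.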

10.
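Niviventer confucianus belongs to the order Rodentia, family Muridae and genus Niviventer, and very few molecular systematic studies of this species exist. To obtain its complete mitochondrial genome sequence, the authors took the following steps:

1. Extract total genomic DNA.
2. Design 34 pairs of specific primers, using the complete mitochondrial genomes of closely related species as references.
3. Amplify all fragments by PCR and sequence them.
4. Carry out a preliminary analysis of the composition and structure of the genome.

The complete mitochondrial genome of N. confucianus is 16,281 bp long (GenBank accession number KJ152220). It contains 22 tRNA genes, 13 protein-coding genes, 2 rRNA genes and 1 non-coding control region. Its nucleotide composition is 34.0% A, 28.6% T, 24.9% C and 12.5% G.

The sequence was compared with the complete mitochondrial genomes of three closely related species: Niviventer excelsior, Mus musculus and Rattus norvegicus. The four genomes differ in several respects:

- genome size;
- the secondary structure of some tRNAs;
- the start or stop codons of some protein-coding genes;
- the length and base composition of the control region.

Even so, they are highly similar in genome organization and sequence features. Genetic distances between the four complete mitochondrial genomes show that N. confucianus is closest to N. excelsior and most distant from M. musculus. This study provides valuable data for molecular systematic studies of rodents based on complete mitochondrial genomes.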

11.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match between the graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method measures the similarity of DNA sequences using two important tools: the Yau-Hausdorff distance and the graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. Phylogenetic analyses of DNA sequences using the Yau-Hausdorff distance show that our approach is accurate and stable in the similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences through graphical representations, as well as to general two-dimensional shape matching.
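The Yau-Hausdorff distance minimizes a Hausdorff distance over all translations and rotations of one curve, and computing that minimum efficiently is the paper's contribution. The Python sketch below shows only two ingredients the method builds on. The first is a 2D graphical curve of a DNA sequence. The second is the plain Hausdorff distance between the vertices of two curves, with no minimization. The step vectors are hypothetical, chosen so that every step moves right, which keeps the curve from retracing itself.

```python
import math

# Hypothetical unit step vectors; every step has positive x, so the walk never
# doubles back on itself.
STEP = {"A": (0.5, -math.sqrt(3) / 2), "G": (math.sqrt(3) / 2, -0.5),
        "C": (math.sqrt(3) / 2, 0.5), "T": (0.5, math.sqrt(3) / 2)}


def walk(seq):
    """Vertices of the 2D graphical curve, starting at the origin."""
    x = y = 0.0
    pts = [(x, y)]
    for b in seq:
        dx, dy = STEP[b]
        x, y = x + dx, y + dy
        pts.append((x, y))
    return pts


def directed_hausdorff(P, Q):
    """Largest distance from a point of P to its nearest point of Q."""
    return max(min(math.dist(p, q) for q in Q) for p in P)


def hausdorff(P, Q):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(P, Q), directed_hausdorff(Q, P))
```

This brute-force version costs O(nm) per comparison and compares curves only in their given positions. The Yau-Hausdorff distance would instead take the minimum of this quantity over all rigid motions of one curve.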

12.
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly identifies the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan and the chimp). It also finds that the sequence most different from the human one in this dataset belongs to a cucumber.
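Below is a minimal Python sketch of the two ingredients, with stated simplifications:

- `fcgr` builds the frequency Chaos Game Representation: a 2^k by 2^k grid with one count per k-mer occurrence, using the conventional corner assignment for A, C, G and T.
- `dssim` computes (1 - SSIM)/2 from a single global SSIM window over max-scaled grids. The image-processing SSIM used by the authors averages over local windows instead.
- The SSIM constants are the usual (0.01 L)^2 and (0.03 L)^2 with L = 1.
- The MDS step is omitted.

```python
# Conventional CGR corners (assumption): A, C, G, T at the unit-square corners.
CORNER = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def fcgr(seq, k):
    """Frequency CGR: 2^k x 2^k grid of CGR point counts, one per k-mer occurrence."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    x = y = 0.5
    for i, base in enumerate(seq):
        cx, cy = CORNER[base]
        x, y = (x + cx) / 2, (y + cy) / 2   # move halfway towards the base's corner
        if i >= k - 1:                      # the last k bases now fix the grid cell
            grid[int(y * size)][int(x * size)] += 1
    return grid


def _scaled(grid):
    values = [v for row in grid for v in row]
    top = max(values)
    if top == 0:
        raise ValueError("empty FCGR (sequence shorter than k?)")
    return [v / top for v in values]        # dynamic range [0, 1]


def dssim(grid1, grid2, c1=1e-4, c2=9e-4):
    """(1 - SSIM) / 2 using one global SSIM window (a simplification)."""
    a, b = _scaled(grid1), _scaled(grid2)
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((u - ma) * (u - ma) for u in a) / n
    vb = sum((v - mb) * (v - mb) for v in b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    ssim = ((2 * ma * mb + c1) * (2 * cov + c2)) / (
        (ma * ma + mb * mb + c1) * (va + vb + c2))
    return (1 - ssim) / 2
```

With k = 9, as in the abstract, each grid has 512 by 512 cells. A matrix of pairwise DSSIM values would then be passed to an MDS routine to produce the map.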

13.
Here we propose a weighted measure for the similarity analysis of DNA sequences, based on LZ complexity and the (0,1) characteristic sequences of DNA sequences. The weighting lets biologists extract similarity information from biological sequences according to their requirements. For example, one can obtain either the full similarity information or a similarity analysis from one particular biological aspect. Moreover, the method is not restricted by sequence length. Applying the weighted measure to the similarity analysis of β-globin genes from nine species demonstrates its flexibility.
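The abstract does not list the characteristic sequences or the weighting rule. The Python sketch below assumes the three standard binary partitions of the bases: purine/pyrimidine, amino/keto and strong/weak. It scores each pair of (0,1) sequences with a normalized LZ-complexity distance in the style of Otu and Sayood and returns their weighted mean. Putting all the weight on one partition therefore analyzes similarity from that single aspect. All names are hypothetical.

```python
# Assumed binary partitions of {A, C, G, T}; the paper's own choice may differ.
PARTITIONS = {
    "purine": set("AG"),   # purine (1) vs pyrimidine (0)
    "amino": set("AC"),    # amino (1) vs keto (0)
    "strong": set("CG"),   # strong, 3 H-bonds (1) vs weak, 2 H-bonds (0)
}


def characteristic(seq, partition):
    """(0,1) characteristic sequence of seq under one partition."""
    ones = PARTITIONS[partition]
    return "".join("1" if b in ones else "0" for b in seq)


def lz76_complexity(s):
    """Lempel-Ziv (1976) complexity via the Kaspar-Schuster scanning algorithm."""
    n = len(s)
    if n <= 1:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def lz_distance(s, q):
    """Normalized, symmetric LZ distance between two strings."""
    c_s, c_q = lz76_complexity(s), lz76_complexity(q)
    c_sq, c_qs = lz76_complexity(s + q), lz76_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)


def weighted_distance(s1, s2, weights):
    """Weighted mean of per-partition LZ distances; weights are user-chosen."""
    total = sum(weights.values())
    return sum(w * lz_distance(characteristic(s1, p), characteristic(s2, p))
               for p, w in weights.items()) / total
```

For example, a weight dictionary of {"strong": 1.0} compares sequences only through their GC/AT pattern. Equal weights combine all three aspects.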

14.
Comparing DNA sequences reveals their degree of similarity, which helps to infer their relationships in structure, function and evolution. In this paper, we introduce the principle of a weighted relative entropy, based on a 2-step Markov model, for comparing DNA sequences. A DNA sequence, consisting of the four characters A, T, C and G, can be regarded as a Markov chain. We take the state space I = {A, T, C, G} and describe each DNA sequence by its 2-step transition probability matrix. From this matrix we obtain eigenvalues of the DNA sequence, which we use to define the similarity metric. This yields a new method for comparing DNA sequences, which we use to classify chromosomal DNA sequences from 30 species. The phylogenetic tree is built without alignment from the distance matrix of weighted relative entropies, and it shows clearer and more accurate groupings.
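Below is a Python sketch of the core computation. It rests on several assumptions:

- One-step transition probabilities are estimated from dinucleotide counts with a pseudocount of 1, so that no logarithm sees a zero.
- The 2-step matrix is the square of the one-step matrix (Chapman-Kolmogorov). Counting (x_i, x_{i+2}) pairs directly would be another reading.
- Rows are weighted by the base frequencies of the first sequence.

The eigenvalue step mentioned in the abstract is not reproduced.

```python
import math

BASES = "ACGT"


def transition_matrix(seq, pseudo=1.0):
    """One-step transition probabilities; the pseudocount (an assumption) avoids zeros."""
    counts = {a: {b: pseudo for b in BASES} for a in BASES}
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES}
            for a in BASES}


def two_step(P):
    """Chapman-Kolmogorov: P2[a][c] = sum over b of P[a][b] * P[b][c]."""
    return {a: {c: sum(P[a][b] * P[b][c] for b in BASES) for c in BASES}
            for a in BASES}


def weighted_relative_entropy(s1, s2):
    """Sum over rows a of w(a) * KL(P2[a] || Q2[a]), w = base frequencies of s1."""
    P2, Q2 = two_step(transition_matrix(s1)), two_step(transition_matrix(s2))
    w = {a: s1.count(a) / len(s1) for a in BASES}
    return sum(w[a] * sum(P2[a][c] * math.log(P2[a][c] / Q2[a][c]) for c in BASES)
               for a in BASES)


def symmetric_distance(s1, s2):
    """Relative entropy is asymmetric; adding both directions gives a symmetric score."""
    return weighted_relative_entropy(s1, s2) + weighted_relative_entropy(s2, s1)
```

A matrix of `symmetric_distance` values over all sequence pairs could then be passed to any distance-based tree-building method.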

15.
A database of the structural properties of all 32,896 unique DNA octamer sequences has been calculated, including information on stability, the minimum energy conformation and flexibility. The contents of the database have been analysed using a variety of Euclidean distance similarity measures. A global comparison of sequence similarity with structural similarity shows that the structural properties of DNA are much less diverse than the sequences, and that DNA sequence space is larger and more diverse than DNA structure space. Thus, there are many very different sequences that have very similar structural properties, and this may be useful for identifying DNA motifs that have similar functional properties that are not apparent from the sequences. On the other hand, there are also small numbers of almost identical sequences that have very different structural properties, and these could give rise to false positives in methods used to identify function based on sequence alignment. A simple validation test demonstrates that structural similarity can differentiate between promoter and non-promoter DNA. Combining structural and sequence similarity improves promoter recall beyond that possible using either similarity measure alone, demonstrating that there is indeed information available in the structure of double-helical DNA that is not readily apparent from the sequence.
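The figure of 32,896 matches a count of octamers up to reverse complementation. A double-stranded octamer reads as one sequence on one strand and as its reverse complement on the other. The 4^8 = 65,536 single-strand octamers therefore collapse to (4^8 + 4^4)/2 = 32,896 molecules, with each of the 4^4 = 256 self-complementary octamers counted once. The short Python check below confirms the count by brute force.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    return s.translate(COMPLEMENT)[::-1]


def count_unique_double_stranded(k):
    """Count k-mers up to reverse complementation (one per double-helical molecule)."""
    seen = set()
    for letters in product("ACGT", repeat=k):
        s = "".join(letters)
        seen.add(min(s, reverse_complement(s)))   # canonical representative
    return len(seen)
```

For odd k no k-mer equals its own reverse complement, so the count is simply 4^k / 2.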

16.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
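The abstract names the technique, random Eulerian walks on a directed multigraph, without giving the steps. The Python sketch below implements case (1), the dinucleotide-preserving permutation, with the usual random-arborescence construction:

1. Treat every adjacent pair in the sequence as an edge.
2. For each vertex except the final letter, choose a random "last exit" edge. Resample until these edges form a tree directed towards the final letter.
3. Shuffle the remaining edges of each vertex.
4. Walk from the first letter, taking each vertex's edges in that order.

The sketch illustrates the approach and may differ in detail from the paper's algorithm. Cases (2) and (3) need a richer multigraph and are not shown.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    return Counter(zip(seq, seq[1:]))


def _reaches_root(v, last_edge, root):
    """Follow the chosen last-exit edges from v; True if they reach root without a cycle."""
    seen = set()
    while v != root:
        if v in seen:
            return False
        seen.add(v)
        v = last_edge[v]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Random permutation of seq with exactly the same dinucleotide counts."""
    if len(seq) < 3:
        return seq
    first, last = seq[0], seq[-1]
    out_edges = defaultdict(list)            # multigraph: one edge per adjacent pair
    for a, b in zip(seq, seq[1:]):
        out_edges[a].append(b)
    # 1. Random last-exit edge per vertex (except `last`), resampled until the
    #    chosen edges form a tree directed towards `last`.
    while True:
        last_edge = {v: rng.choice(out_edges[v]) for v in out_edges if v != last}
        if all(_reaches_root(v, last_edge, last) for v in last_edge):
            break
    # 2. Shuffle each vertex's other edges; its tree edge is used last.
    order = {}
    for v, targets in out_edges.items():
        rest = list(targets)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        order[v] = rest + ([last_edge[v]] if v in last_edge else [])
    # 3. Walk from the first letter, always taking the next unused edge.
    pos = defaultdict(int)
    walk, v = [first], first
    while pos[v] < len(order.get(v, [])):
        nxt = order[v][pos[v]]
        pos[v] += 1
        walk.append(nxt)
        v = nxt
    return "".join(walk)
```

Keeping each vertex's tree edge until last is what guarantees that the walk never gets stuck before every edge is used. The output therefore has the same length, first letter, last letter and dinucleotide counts as the input.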

17.
Zuckerkandl and Pauling (1962, "Horizons in Biochemistry," pp. 189-225, Academic Press, New York) first noticed that the degree of sequence similarity between the proteins of different species could be used to estimate their phylogenetic relationship. Since then models have been developed to improve the accuracy of phylogenetic inferences based on amino acid or DNA sequences. Most of these models were designed to yield distance measures that are linear with time, on average. The reliability of phylogenetic reconstruction, however, depends on the variance of the distance measure in addition to its expectation. In this paper we show how the method of generalized least squares can be used to combine data types, each most informative at different points in time, into a single distance measure. This measure reconstructs phylogenies more accurately than existing non-likelihood distance measures. We illustrate the approach for a two-rate mutation model and demonstrate that its application provides more accurate phylogenetic reconstruction than do currently available analytical distance measures.
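In the simplest case of the generalized least squares idea, two estimates d1 and d2 of the same distance have a known covariance matrix V. They combine into d = (1'V^-1 d)/(1'V^-1 1), with variance 1/(1'V^-1 1). The Python sketch below writes this out for the 2-by-2 case. In the paper, the variances and covariances would come from the two-rate mutation model, which is not reproduced here, so V is simply an input.

```python
def gls_combine(d, V):
    """GLS estimate of one quantity from two correlated estimates.

    d = (d1, d2): two estimates of the same distance.
    V = [[v11, v12], [v12, v22]]: their covariance matrix (assumed known).
    Returns (estimate, variance) = (1'V^-1 d / 1'V^-1 1, 1 / 1'V^-1 1).
    """
    (a, b), (_, c) = V
    det = a * c - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    denom = a + c - 2 * b                   # equals det * (1' V^-1 1)
    estimate = ((c - b) * d[0] + (a - b) * d[1]) / denom
    return estimate, det / denom
```

With independent estimates, the weights reduce to inverse variances. For example, variances 1 and 3 give weights 3/4 and 1/4. This is how the combination automatically favors whichever data type is more informative at the divergence time in question.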

18.
Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity because of the inherently digital nature of these sequences. DSP methods have shown early success in detecting coding regions in genes. Recently, these methods have also been used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transform that can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method. It generates a genetic signature for any DNA sequence, and these signatures are used to derive relative measures of similarity among DNA sequences. We demonstrate our method on two datasets: a set of insulin genes from an evolutionarily wide range of species, and a set of highly similar avian influenza viral sequences. We compare phylogenetic trees generated with our technique against trees generated with traditional alignment techniques. The ICD method produces a highly accurate tree without requiring an alignment before sequence similarity is established.
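The ICD transform itself is not defined in the abstract, so the Python sketch below shows only the standard DFT groundwork it extends. Each base gets a 0/1 indicator sequence, and the power spectrum S[k] sums the squared magnitudes of the four DFTs. Period-3 structure produces a peak at k = N/3, and this is the signal that early coding-region detectors relied on. A direct O(N^2) DFT is used for clarity; an FFT would be used in practice.

```python
import cmath


def indicator_power_spectrum(seq):
    """S[k] = sum over A, C, G, T of |DFT of that base's 0/1 indicator sequence|^2."""
    n = len(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for base in "ACGT":
            # DFT coefficient k of the indicator: only positions holding `base` contribute.
            coeff = sum(cmath.exp(-2j * cmath.pi * k * m / n)
                        for m, b in enumerate(seq) if b == base)
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum
```

For a perfectly periodic sequence such as "ATG" repeated 30 times (N = 90), all the power sits at k = 0, 30 and 60, and neighbors such as k = 29 are numerically zero.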

19.
We develop a novel method of assessing the similarity between two biological sequences without the need for alignment. The proposed method uses the free energy of nearest-neighbor interactions as a simple measure of dissimilarity. We use it to search three complex datasets for sequences similar to a query sequence. We compute and evaluate the sensitivity and selectivity, and compare the performance of the proposed distance measure. Real data analysis shows that it is a very efficient, sensitive and highly selective algorithm for comparing large datasets of DNA sequences.
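The abstract does not say how free energies become a dissimilarity. The Python sketch below makes one hypothetical choice. Each sequence is summarized by its length-normalized free-energy contribution from each of the 10 Watson-Crick nearest-neighbor stacks. Two sequences are then compared by the Euclidean distance between those profiles. The ΔG°37 values are the SantaLucia (1998) unified parameters as commonly tabulated; check them against the original table before relying on them.

```python
import math

# SantaLucia (1998) unified nearest-neighbor ΔG°37 values (kcal/mol), keyed by the
# 5'->3' top-strand dinucleotide; verify against the original table before use.
NN_DG37 = {"AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
           "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stack_key(dinuc):
    """Map any of the 16 dinucleotides to one of the 10 unique stacks."""
    return dinuc if dinuc in NN_DG37 else dinuc.translate(COMPLEMENT)[::-1]


def nn_profile(seq):
    """Per-stack free-energy contributions divided by the number of stacks (assumption)."""
    profile = dict.fromkeys(NN_DG37, 0.0)
    for i in range(len(seq) - 1):
        key = stack_key(seq[i:i + 2])
        profile[key] += NN_DG37[key]
    n = max(len(seq) - 1, 1)
    return [profile[k] / n for k in NN_DG37]


def nn_dissimilarity(s1, s2):
    """Euclidean distance between free-energy profiles (hypothetical measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(nn_profile(s1), nn_profile(s2))))
```

Because a dinucleotide and its reverse complement describe the same stack, a sequence and its complementary strand get identical profiles. For example, "AAAA" and "TTTT" are at distance 0.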

