Similar literature
20 similar documents found.
1.
In this paper, a novel 3D graphical representation of DNA sequences based on codons is proposed. Since no information is lost through overlapping or loops, this representation is useful for comparing different DNA sequences, and the 3D curve is especially convenient for comparing DNA mutations. We then give a numerical characterization of DNA sequences based on the new 3D curve. This characterization facilitates quantitative, codon-based analysis of similarities/dissimilarities among DNA sequences.

2.
In this paper, we first present a new concept of ‘weight’ for the 64 triplets and define a different weight for each kind of triplet. Then, we give a novel 2D graphical representation for DNA sequences, which can transform a DNA sequence into a plot set to facilitate quantitative comparisons of DNA sequences. Thereafter, together with a newly designed measure of similarity, we introduce a novel approach to analyzing similarities/dissimilarities among DNA sequences. Finally, an application to the complete coding sequences of the β-globin genes of 11 species illustrates the utility of the proposed method.

3.
4.
We consider a novel 2-D graphical representation of DNA sequences according to the chemical structures of bases, reflecting the distribution of bases with different chemical structures, preserving information on sequential adjacency of bases, and allowing numerical characterization. The representation avoids the loss of information accompanying alternative 2-D representations in which the curve standing for DNA overlaps and intersects itself. Based on this representation we present a numerical characterization approach by the leading eigenvalues of the matrices associated with the DNA sequences. The utility of the approach is illustrated on the coding sequences of the first exon of the human beta-globin gene.

5.
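Building on the two-dimensional graphical representation of biological sequences, we use the Balaban index and the information distribution index to compare the similarity of biological sequences. We illustrate the method on DNA sequences from 9 species, including human, and on 6 proteins, including yar029w.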

6.
Journal of Molecular Modeling - A new 3D graphical representation of DNA sequences is introduced. This representation is called 3D-dynamic representation. It is a generalization of the 2D-dynamic...

7.
DNA sequencing has resulted in an abundance of data on DNA sequences for various species. Hence, the characterization and comparison of sequences have become more important but remain difficult tasks. In this paper, we first give a 2-D ladderlike graphical representation for the characteristic sequences of a DNA sequence, and then construct a 3-component vector, in which the normalized ALE-indices extracted from these three 2-D graphs via D/D matrices are individual components, to characterize the DNA sequence. The examination of similarities/dissimilarities among sequences of the beta-globin genes of different species illustrates the utility of the approach.

8.
A time series model of DNA sequences based on CGR (in English)   Total citations: 1, self-citations: 0, citations by others: 1
高洁  蒋丽丽  徐振源 《生物信息学》2010,8(2):156-160,164
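Using the chaos game representation (CGR) of DNA sequences, we propose a method for converting a two-dimensional DNA map into a corresponding spectrum-like format. The method not only provides a good visual representation but also converts a DNA sequence into a time series. The CGR coordinates are used to convert DNA sequences into CGR radian sequences, and a long-memory ARFIMA(p,d,q) model is introduced to fit such sequences. These sequences are found to exhibit significant long-range correlation, and the model fits them well.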
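
The CGR coordinates mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used convention in which the four bases sit at the corners of the unit square and each new point is the midpoint between the previous point and the corner of the current base. The corner assignment, the starting point and the name `cgr_points` are assumptions made for illustration; the abstract does not specify them, and the conversion to radian sequences and the ARFIMA fit are not reproduced here.

```python
# Corner assignment is a common CGR convention, assumed here for illustration;
# the paper summarized above may use a different one.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the list of CGR (x, y) points for a DNA string.

    Starting from the centre of the unit square, each point is the midpoint
    between the previous point and the corner assigned to the current base.
    Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # ignore ambiguous or non-nucleotide symbols
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

Because every point depends on all preceding bases, the resulting list is ordered along the sequence and can be treated as a two-component series indexed by position.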

9.
10.
MOTIVATION: Biologists usually work with textual DNA sequences (successions of A, C, G and T). This representation allows biologists to study the syntax and other linguistic properties of DNA sequences. Nevertheless, such a linear coding offers only a local and one-dimensional view of the molecule. The 3D structure of DNA is known to be very important in many essential biological mechanisms. By using 3D conformation models, one is able to construct a 3D trajectory of a naked DNA molecule. From the various studies that we performed, it turned out that two very different textual DNA sequences could have similar 3D structures. RESULTS: In this article, we present new work on 3D pattern matching for DNA sequences. The aim of this work is to enhance conventional pattern matching analyses with 3D-augmented criteria. We have developed an algorithm, based on 3D trajectories, which compares angles formed by these trajectories and thus quantifies the difference between two 3D DNA sequences. The analysis proceeds from a global scale to a local one. AVAILABILITY: Available on request from the authors.

11.
Directed graphs of DNA sequences and their numerical characterization   Total citations: 1, self-citations: 0, citations by others: 1
In this paper we (1) introduce a directed graphical representation of DNA primary sequences; (2) describe a scheme that transforms the directed graph of a DNA sequence into an upper triangular matrix; (3) investigate whether or not the existing matrix-based invariants of DNA sequences are compatible with the upper triangular matrix representation. The utility of our method is illustrated by an examination of the similarity between human and seven other species.

12.
In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and there is no loss of information when transferring a DNA sequence to its mathematical representation and (ii) the coordinates of every node on these 3D curves have a clear biological implication. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of the beta-globin gene from eleven species and validate similarity of cDNA sequences of the beta-globin gene from eight species.
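
The three base classifications behind the curve names can be sketched in Python as follows. The sketch assumes the standard groupings: purine (R = A, G) versus pyrimidine (Y = C, T), amino (M = A, C) versus keto (K = G, T), and strong (S = C, G) versus weak (W = A, T) hydrogen bonding. The function name `characteristic_sequences` is hypothetical, and the construction of the 3D curves themselves is not given in the abstract and is not attempted here.

```python
# Standard two-class groupings of the four bases (IUPAC letters R/Y, M/K, S/W).
CLASSIFICATIONS = {
    "RY": {"A": "R", "G": "R", "C": "Y", "T": "Y"},  # purine / pyrimidine
    "MK": {"A": "M", "C": "M", "G": "K", "T": "K"},  # amino / keto
    "SW": {"C": "S", "G": "S", "A": "W", "T": "W"},  # strong / weak H-bond
}


def characteristic_sequences(sequence):
    """Map a DNA string to its RY, MK and SW two-letter sequences.

    Returns a dict with keys "RY", "MK" and "SW". Characters other than
    A, C, G, T are skipped (an assumption made for this illustration).
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    return {
        name: "".join(table[b] for b in bases)
        for name, table in CLASSIFICATIONS.items()
    }
```

Each of the three output strings keeps the length and order of the input, so any one of them together with a second one is enough to recover the original sequence.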

13.
Three related polyoma virus species, designated D92 (92% the size of full-length polyoma virus DNA), D91 (91%) and D76 (76%) have been analysed and their structures compared with that of polyoma virus A2 DNA. Three independent methods (restriction endonuclease cleavage, depurination fingerprinting and DNA-DNA hybridization) were used in the analysis. The defective DNAs appear to be: (1) entirely composed of viral sequences (no host DNA sequences were detected); (2) made up in part of long continuous sequences of DNA which appear identical to sequences of A2 DNA (D92 contains continuous sequences from 1 to 72 map units on the physical map of A2 DNA; that is, it contains the entire late region and part of the early region of the viral DNA. D91 and D76 contain those same sequences except for a 1% deletion around 18 map units); (3) made up in part of rearranged viral sequences. Several interesting features were noted about the rearranged sequences present in the defective DNAs. Sequences from the region around 67 map units were found linked to other (non-contiguous) regions of the DNA. Sequences from about 72 map units were linked to sequences from about 1 map unit. Multiple copies of sequences from 67 to 72 map units (from around the origin of DNA replication) were found (4 copies in D91 and D92, and 2 copies in D76).

14.
Stupar RM  Song J  Tek AL  Cheng Z  Dong F  Jiang J 《Genetics》2002,162(3):1435-1444
The heterochromatin in eukaryotic genomes represents gene-poor regions and contains highly repetitive DNA sequences. The origin and evolution of DNA sequences in the heterochromatic regions are poorly understood. Here we report a unique class of pericentromeric heterochromatin consisting of DNA sequences highly homologous to the intergenic spacer (IGS) of the 18S.25S ribosomal RNA genes in potato. A 5.9-kb tandem repeat, named 2D8, was isolated from a diploid potato species Solanum bulbocastanum. Sequence analysis indicates that the 2D8 repeat is related to the IGS of potato rDNA. This repeat is associated with highly condensed pericentromeric heterochromatin at several hemizygous loci. The 2D8 repeat is highly variable in structure and copy number throughout the Solanum genus, suggesting that it is evolutionarily dynamic. Additional IGS-related repetitive DNA elements were also identified in the potato genome. The possible mechanism of the origin and evolution of the IGS-related repeats is discussed. We demonstrate that potato serves as an interesting model for studying repetitive DNA families because it is propagated vegetatively, thus minimizing the meiotic mechanisms that can remove novel DNA repeats.

15.
DNA sequences seen in the normal character-based representation appear to have a formidable mixing of the four nucleotides without any apparent order. Nucleotide frequencies and distributions in the sequences have been studied extensively since the simple rule given by Chargaff almost a century ago that equates the total number of purines to the pyrimidines in a duplex DNA sequence. While it is difficult to trace any relationship between the bases from studies in the character representation of a DNA sequence, graphical representations may provide a clue. These novel representations of DNA sequences have been useful in providing an overview of base distribution and composition of the sequences and providing insights into many hidden structures. We report here our observation based on a graphical representation that the intra-purine and intra-pyrimidine differences in sequences of conserved genes generally follow a quadratic distribution relationship and show that this may have arisen from mutations in the sequences over evolutionary time scales. From this hitherto undescribed relationship for the gene sequences considered in this report we hypothesize that such relationships may be characteristic of these sequences and therefore could become a barrier to large scale sequence alterations that override such characteristics, perhaps through some monitoring process inbuilt in the DNA sequences. Such a relationship also raises the possibility of intron sequences playing an important role in maintaining the characteristics and could be indicative of possible intron-late phenomena.

16.
New 3D graphical representation of DNA sequence based on dual nucleotides   Total citations: 2, self-citations: 2, citations by others: 0
We introduce a 3D graphical representation of DNA sequences based on the pairs of dual nucleotides (DNs). Based on this representation, we consider some mathematical invariants and construct two 16-component vectors associated with these invariants. The vectors are used to characterize and compare the complete coding sequence part of the beta globin gene of nine different species. The examination of similarities/dissimilarities illustrates the utility of the approach.

17.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests an increase in fractal complexity for MHC genes with evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially noncorrelated coding region. Both this new model and the DNA walk analysis support the intron-late theory of gene evolution.
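
A DNA walk of the kind described here can be sketched in Python under one widely used rule: step up (+1) for a pyrimidine (C or T) and down (-1) for a purine (A or G), with the walk being the running sum of the steps. The step rule and the names `dna_walk` and `fluctuation` are assumptions for illustration, since the abstract does not state which mapping was used. The second function computes the root-mean-square fluctuation of the walk displacement over a window of length `window`; the scaling exponent alpha would be the slope of log F against log window, which is not estimated here.

```python
from itertools import accumulate

# Assumed purine/pyrimidine step rule: pyrimidines step up, purines step down.
STEP = {"C": 1, "T": 1, "A": -1, "G": -1}


def dna_walk(sequence):
    """Return the cumulative displacement after each base of the sequence."""
    steps = [STEP[b] for b in sequence.upper() if b in STEP]
    return list(accumulate(steps))


def fluctuation(walk, window):
    """Root-mean-square fluctuation of walk displacements over `window` steps."""
    if window < 1 or window >= len(walk):
        raise ValueError("window must satisfy 1 <= window < len(walk)")
    diffs = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(diffs) / len(diffs)
    variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
    return variance ** 0.5
```

For an uncorrelated sequence the fluctuation grows roughly as the square root of the window length (alpha near 0.5); long-range correlations show up as a larger exponent.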

18.
BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequence substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than both IMM with protein (p < 0.001) and with DNA (p < 0.001). Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.

19.
20.
We describe the unique features of an aberrantly rearranged mu immunoglobulin heavy chain gene isolated from MPC-11 cells (a gamma 2b producing Balb/c plasmacytoma). A novel rearrangement has occurred 1.5 Kb 5' of the MPC-11 mu gene (denoted 18b mu) resulting in the deletion of the majority of the repetitive switch region (S mu) and 5' flanking DNA including the Joining (JH) sequences. The remainder (275 bp) of the S mu repeat has undergone a complete sequence inversion. DNA sequences 5' of the inverted S mu sequence do not resemble Variable (VH), Diversity (D), JH or their conserved flanking sequences. A DNA sequence localized 5' of the inverted S mu sequence, (p18b mu-1.4) detects a small family of homologous sequences in Balb/c DNA. The 18b mu-1.4 like sequences lack homology to S mu, exhibit flanking sequence polymorphisms in 5 out of 6 inbred mouse strains and undergo partial or complete deletion in 5 out of 10 plasmacytomas tested. Two 18b mu-1.4 homologous sequences display a higher copy number in C57Bl/6, AL/N and CAL9 mouse strains.
