Similar literature
20 similar articles found; search took 31 ms
1.
Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities and dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and similarities are obtained by comparing the matrices of the sequences of interest. Three methods of analyzing the resulting difference matrix are considered: a three-dimensional representation giving both local and global information, a numerical characterization based on an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammal species, taking human albumin as the reference.

2.

Background  

Representing symbolic sequences graphically using iterated maps has enjoyed enduring popularity since it was first proposed by Jeffrey in 1990 as chaos game representation (CGR). The usefulness of this representation goes beyond the convenience of scale independence: it also provides a variable-memory-length representation of transitions. This includes the representation of succession with non-integer order, which promises to generalize Markovian formalisms. The original proposal targeted genomic sequences only, but several generalizations have since been proposed, many specifically designed to handle protein data.

3.
A time series model of DNA sequences based on CGR (in English)   Total citations: 1, self-citations: 0, citations by others: 1
高洁  蒋丽丽  徐振源 《生物信息学》2010,8(2):156-160,164
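Using the chaos game representation (CGR) of DNA sequences, a method is proposed for converting the two-dimensional DNA graph into a corresponding spectrum-like format. The method not only provides a good visual representation but also converts a DNA sequence into a time series. The CGR coordinates are used to transform the DNA sequence into a sequence of CGR radian values, and a long-memory ARFIMA(p,d,q) model is introduced to fit such sequences. These sequences are found to exhibit significant long-range correlation, and the model fits them well.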

4.
Hai ming Ni  Da wei Qi  Hongbo Mu 《Genomics》2018,110(3):180-190
Converting a DNA sequence to an image by chaos game representation (CGR) is an effective genome sequence pretreatment technique that provides the basis for further comparison between different genes. In this paper, we constructed genetic CGR images for 10 mammal species, 48 hepatitis E viruses (HEV), and 10 kinds of bacteria, and calculated the mean structural similarity (MSSIM) coefficient between every pair of CGR images. Our analysis shows that the MSSIM coefficient of gene CGR images can accurately reflect the degree of similarity between different genomes. Hierarchical clustering analysis was used to calculate the class affiliation and construct a dendrogram. A large number of experiments showed that this method gives results comparable to the traditional Clustal X phylogenetic tree construction method and is significantly faster in the clustering analysis. Moreover, the method combining MSSIM with CGR was also able to efficiently cluster large genome sequences, which traditional multiple sequence alignment methods (e.g. Clustal X, Clustal Omega, Clustal W) cannot classify.

5.
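Analysis of the RHD gene in the Rh blood group system based on the chaos game representation method   Total citations: 3, self-citations: 0, citations by others: 3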
高雷  齐斌  朱平 《生命科学研究》2009,13(5):408-412
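Using a chaos game representation (CGR) method for protein sequences based on the classical HP model, a CGR graph of the protein sequence of the RHD gene is given. It can be regarded as a characteristic map describing the secondary structure of the protein sequence and has some reference value for clinical blood group identification. In addition, following the CGR method for depicting DNA sequences proposed by Jeffrey in 1990, a CGR graph of the DNA sequence of the RHD gene is given. From this graph, the two-step Markov transition probability matrix of the RHD gene is calculated, and the probability matrix shows the RHD gene's usage preference for the third base of the triplets that encode amino acids.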

6.
Chaos game representation (CGR) was proposed recently to visualize nucleotide sequences, as one of the first applications of this technique in the field of biochemistry [1]. In this paper we demonstrate that representations similar to CGR can be generalized and applied to visualizing and analyzing protein databases. Examples of applications are presented for investigating regularities and motifs in the primary structure of proteins, and for analyzing possible structural attachments at the super-secondary structure level of proteins. A further application is presented for testing structure prediction methods using CGR.

7.
Similar to the chaos game representation (CGR) of DNA sequences proposed by Jeffrey (Nucleic Acids Res. 18 (1990) 2163), a new CGR of protein sequences based on the detailed HP model is proposed. Multifractal and correlation analyses of the measures based on the CGR of protein sequences from complete genomes are performed. The Dq spectra of all organisms studied are multifractal-like and sufficiently smooth for the Cq curves to be meaningful. The Cq curves of bacteria resemble a classical phase transition at a critical point. The correlation distance of the difference between the measure based on the CGR of protein sequences and its fractal background is also proposed to construct a more precise phylogenetic tree of bacteria.

8.
The chaos game representation (CGR) is a scatter plot derived from a DNA sequence, with each point of the plot corresponding to one base of the sequence. If the DNA sequence were a random collection of bases, the CGR would be a uniformly filled square; conversely, any patterns visible in the CGR represent some pattern (information) in the DNA sequence. In this paper, patterns previously observed in a variety of DNA sequences are explained solely in terms of nucleotide, dinucleotide and trinucleotide frequencies.
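
The scatter plot described in this abstract can be sketched in a few lines. The corner assignment (A, C, G, T at the four corners of the unit square), the starting point at the centre, and the function name `cgr_points` are assumptions made for illustration; the abstract itself does not fix them, and conventions vary between papers.

```python
# Minimal CGR sketch. The corner layout below is an assumed convention.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return one (x, y) point per base of seq.

    Each point lies halfway between the previous point and the corner
    assigned to the current base.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on symbols other than A, C, G, T
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for base, point in zip("GATTACA", cgr_points("GATTACA")):
        print(base, point)
```

Plotting the returned points for a long sequence gives the CGR image; under this sketch a run of a single base drifts toward that base's corner, which is one simple way visible structure can arise.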

9.
《Genomics》2020,112(2):1847-1852
A novel method is proposed to detect acceptor and donor splice sites using chaos game representation and an artificial neural network. To achieve high accuracy, the inputs to the neural network (the feature vector) should reflect the true nature of the DNA segments. It is therefore important to have a one-to-one numerical representation, i.e. a feature vector should be able to represent the original data. Chaos game representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a respective position on the plane in a one-to-one manner. Using CGR, a DNA sequence can be mapped to a numerical sequence that reflects the true nature of the original sequence. In this research, we propose to use CGR as feature input to a neural network to detect splice sites on the NN269 dataset. Computational experiments indicate that this approach gives good accuracy while being simpler than other methods in the literature, with only one neural network component. The code and data for our method can be accessed from this link: https://github.com/thoang3/portfolio/tree/SpliceSites_ANN_CGR.

10.
We explored DNA structures of genomes by means of a new tool derived from the "chaotic dynamical systems" theory (the so-called chaos game representation [CGR]), which allows the depiction of frequencies of oligonucleotides in the form of images. Using CGR, we observe that subsequences of a genome exhibit the main characteristics of the whole genome, attesting to the validity of the genomic signature concept. Base concentrations, stretches (runs of complementary bases or purines/pyrimidines), and patches (over- or underexpressed words of various lengths) are the main factors explaining the variability observed among sequences. The distance between images may be considered a measure of phylogenetic proximity. Eukaryotes and prokaryotes can be identified merely on the basis of their DNA structures.
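
As a rough illustration of how oligonucleotide frequencies become an image, the sketch below counts every k-letter word in the CGR cell it falls into and compares two such grids. The corner layout, the function names `fcgr` and `image_distance`, and the choice of Euclidean distance are assumptions for illustration only; the abstract does not say which distance the authors used.

```python
# Bit pair (x_bit, y_bit) of the corner assigned to each base.
# This layout is an assumption; the abstract does not specify one.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Count every k-letter word of seq in a 2**k by 2**k grid of CGR cells.

    grid[row][col] uses row for the y index and col for the x index,
    with index 0 at the low end of each axis.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if any(base not in BITS for base in word):
            continue  # skip words containing ambiguous symbols such as N
        col = row = 0
        for j, base in enumerate(word):
            x_bit, y_bit = BITS[base]
            # Later bases dominate the CGR position, so they get higher bits.
            col |= x_bit << j
            row |= y_bit << j
        grid[row][col] += 1
    return grid


def image_distance(grid_a, grid_b):
    """Euclidean distance between two grids after normalising to frequencies."""
    total_a = sum(map(sum, grid_a)) or 1
    total_b = sum(map(sum, grid_b)) or 1
    return sum(
        (a / total_a - b / total_b) ** 2
        for row_a, row_b in zip(grid_a, grid_b)
        for a, b in zip(row_a, row_b)
    ) ** 0.5
```

Normalising to frequencies before comparing keeps sequences of different lengths comparable, which matters if subsequences are to be compared with a whole genome.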

11.
Many studies have demonstrated the presence of scale invariance and long-range correlation in animal and human neuronal spike trains. The methodologies to extract the fractal or scale-invariant properties, however, do not address the issue as to the existence within the train of fine temporal structures embedded in the global fractal organisation. The present study addresses this question in human spike trains by the chaos game representation (CGR) approach, a graphical analysis with which specific temporal sequences reveal themselves as geometric structures in the graphical representation. The neuronal spike train data were obtained from patients whilst undergoing pallidotomy. Using this approach, we observed highly structured regions in the representation, indicating the presence of specific preferred sequences of interspike intervals within the train. Furthermore, we observed that for a given spike train, the higher the magnitude of its scaling exponent, the more pronounced the geometric patterns in the representation and, hence, higher probability of occurrence of specific subsequences. Given its ability to detect and specify in detail the preferred sequences of interspike intervals, we believe that CGR is a useful adjunct to the existing set of methodologies for spike train analysis.

12.
A probabilistic measure for alignment-free sequence comparison   Total citations: 3, self-citations: 0, citations by others: 3
MOTIVATION: Alignment-free sequence comparison methods are still in the early stages of development compared to those of alignment-based sequence analysis. In this paper, we introduce a probabilistic measure of similarity between two biological sequences without alignment. The method is based on the concept of comparing the similarity/dissimilarity between two constructed Markov models. RESULTS: The method was tested against six DNA sequences, which are the thrA, thrB and thrC genes of the threonine operons from Escherichia coli K-12 and from Shigella flexneri; and one random sequence having the same base composition as thrA from E. coli. These results were compared with those obtained from the CLUSTAL W algorithm (alignment-based) and the chaos game representation (alignment-free). The method was further tested against a more complex set of 40 DNA sequences and compared with other existing sequence similarity measures (alignment-free). AVAILABILITY: All datasets and computer codes written in MATLAB are available upon request from the first author.

13.
Summary. Chaos game representation (CGR) is a novel holistic approach that provides a visual image of a DNA sequence quite different from the traditional linear arrangement of nucleotides. Although it is known that CGR patterns depict base composition and sequentiality, the biological significance of the specific features of each pattern is not understood. To systematically examine these features, we have examined the coding sequences of 7 human globin genes and 29 relatively conserved alcohol dehydrogenase (Adh) genes from phylogenetically divergent species. The CGRs of human globin cDNAs were similar to one another and to the entire human globin gene complex. Interestingly, human globin CGRs were also strikingly similar to human Adh CGRs. Adh CGRs were similar for genes of the same or closely related species but were different for relatively conserved Adh genes from distantly related species. Dinucleotide frequencies may account for the self-similar pattern that is characteristic of vertebrate CGRs and the genome-specific features of CGR patterns. Mutational frequencies of dinucleotides may vary among genome types. The special features of CG dinucleotides of vertebrates represent such an example. The CGR patterns examined thus far suggest that the evolution of a gene and its coding sequence should not be examined in isolation. Consideration should be given to genome-specific differential mutation rates for different dinucleotides or specific oligonucleotides. Offprint requests to: S. M. Singh

14.
Analysis of genomic sequences by Chaos Game Representation   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. RESULTS: The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.
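
The claim that the source sequence can be recovered from the coordinates can be illustrated with exact rational arithmetic. This is a sketch under stated assumptions: the corner layout, the centre starting point, and the names `encode` and `decode` are hypothetical, and `Fraction` is used here only because ordinary floating point would lose the earliest bases of a long sequence.

```python
from fractions import Fraction

# Assumed corner layout; the abstract does not specify one.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BY_CORNER = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def encode(seq, start=(HALF, HALF)):
    """Return the exact CGR coordinate reached after the last base of seq."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return x, y


def decode(point, length):
    """Recover the last `length` bases from a CGR coordinate.

    The half of the square a coordinate falls in identifies the most recent
    base; doubling and subtracting the corner undoes one encoding step.
    """
    x, y = point
    bases = []
    for _ in range(length):
        cx, cy = int(x > HALF), int(y > HALF)
        bases.append(BY_CORNER[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy
    return "".join(reversed(bases))
```

For example, `decode(encode("GATTACA"), 7)` returns `"GATTACA"`, and two sequences of the same length that differ in any base end at different coordinates.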

15.
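BACKGROUND: The diagnosis of canine brucellosis currently presents certain difficulties. OBJECTIVE: To screen for and study the specific antigenic epitopes of monoclonal antibody strain 4H3 against Brucella canis. METHODS: Using phage display peptide library technology, Brucella canis monoclonal antibody 4H3 was used as the target molecule to coat ELISA plates, and a 12-mer random peptide library was screened through three rounds of biopanning. After three rounds of screening, the phage yield rose from 5.00×10⁻⁷ to 9.84×10⁻⁶, and the false-positive rate decreased round by round. Fourteen positive clones from the third round were randomly picked and amplified, and their genomic DNA was extracted and sequenced; the affinity and specificity of the positive clones were tested by iELISA and cELISA. RESULTS: The 14 positive monoclonal phages yielded three different short peptide sequences: KMSIRHPIRLPI, ILRRRRKRIIQI and QRIHMRLTTQS. The iELISA results showed that the affinity of the three peptides for the monoclonal antibody ranked KMSIRHPIRLPI > ILRRRRKRIIQI > QRIHMRLTTQS; the cELISA results showed that peptides KMSIRHPIRLPI and ILRRRRKRIIQI had relatively strong specificity. A detailed analysis was carried out on the two peptides with stronger affinity and higher specificity, KMSIRHPIRLPI and ILRRRRKRIIQI; alignment analysis showed...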

16.

Background

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within a distance of 2^-L of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.

Results

The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.

Conclusions

The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
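
The suffix property stated in the Background of this abstract (sequences sharing an L-long suffix end up close together) can be checked numerically. This is a minimal sketch, not the authors' implementation: the corner layout and helper names are assumptions, and the distance is measured per coordinate (Chebyshev distance), which is the form in which the bound follows directly from halving the gap at every shared step.

```python
# Assumed corner layout; conventions differ between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_endpoint(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after the last symbol of seq."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return x, y


def chebyshev(p, q):
    """Largest per-coordinate gap between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def shared_suffix_length(a, b):
    """Number of trailing symbols that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


if __name__ == "__main__":
    a, b = "ACGTTGCAGATTACA", "TTTTGATTACA"
    length = shared_suffix_length(a, b)
    gap = chebyshev(cgr_endpoint(a), cgr_endpoint(b))
    print(length, gap, gap <= 2.0 ** -length)
```

Each shared symbol halves the gap between the two trajectories, so a small gap between endpoints hints at a long common suffix. Turning that hint into an exact longest-common-extension query, as the Results describe, needs more care than this sketch shows.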

17.
Detection of the antimicrobial peptide gene in different Amaranthus species   Total citations: 1, self-citations: 0, citations by others: 1
Using primers to amplify the gene AMP2 in Amaranthus caudatus, we found the gene to be present in seven other species of the Amaranthus genus (A. albus, A. cruentus, A. blitum, A. hybridus, A. hypochondriacus, A. retroflexus and A. tricolor), in which it had not been described previously. The PCR products were sequenced and it was established that all the sequences were identical, except for two polymorphisms. These single nucleotide polymorphisms occurred at nucleotide positions 45 and 246. This exchange of one nucleotide for another was manifested in an amino acid change in both cases. Because both polymorphisms lie outside the region encoding the chitin-binding peptide domain, which is crucial for antimicrobial peptide function, they are unlikely to affect the proper functioning of the peptide. With the exception of the above-mentioned polymorphisms, all sequences were identical to the sequence of the AMP2 gene that codes for the A. caudatus Ac-AMP2 (antimicrobial peptide isolated from Amaranthus caudatus seeds). The detection of sequences with a high degree of sequence similarity to the A. caudatus AMP2 gene leads us to the assumption that an antimicrobial peptide could also be produced by other amaranth species.

18.
In this paper, we intend to predict protein structural classes (α, β, α+β, or α/β) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. Then, a novel and powerful nonlinear analysis technique, recurrence quantification analysis (RQA), is applied to analyze these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, which are treated as the feature representation of protein sequences. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to test and compare our method with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods. Notably, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than those used by other methods. This suggests that the current method may play a complementary role to the existing methods and is promising for the prediction of protein structural classes.

19.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares of size 4^-m (m being the desired resolution), and the point density in each square was counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that reflects the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
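
The four steps in this abstract (grid the CGR, histogram the densities, apply Shannon's formula, repeat per resolution) might be sketched as follows. Several details are assumptions, since the abstract leaves them open: the corner layout, the equal-width integer binning controlled by the hypothetical parameter `n_bins`, the base-2 logarithm, and all function names.

```python
import math
from collections import Counter

# Assumed corner layout; the abstract does not specify one.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return one CGR point per base of seq."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def histogram_entropy(seq, m, n_bins=10):
    """Shannon entropy (in bits) of the histogram of per-square point counts.

    The unit square is cut into 4**m squares of side 2**-m (area 4**-m).
    """
    side = 2 ** m
    counts = Counter(
        (min(int(x * side), side - 1), min(int(y * side), side - 1))
        for x, y in cgr_points(seq)
    )
    densities = [counts[(i, j)] for i in range(side) for j in range(side)]
    top = max(densities)
    # Assumed binning: n_bins equal-width intervals over 0..top.
    histogram = Counter(d * n_bins // (top + 1) for d in densities)
    n_cells = len(densities)
    return sum((c / n_cells) * math.log2(n_cells / c) for c in histogram.values())


def entropic_profile(seq, max_m, n_bins=10):
    """Histogram entropy at each resolution level m = 1 .. max_m."""
    return [histogram_entropy(seq, m, n_bins) for m in range(1, max_m + 1)]
```

In this sketch a sequence that fills every square equally gives an entropy of zero at that resolution, because all densities fall into one histogram bin; uneven occupancy spreads the densities over several bins and raises the value.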

20.
Efforts to predict protein secondary structure have been hampered by the apparent structural plasticity of local amino acid sequences. Kabsch and Sander (1984, Proc. Natl. Acad. Sci. USA 81, 1075–1078) articulated this problem by demonstrating that identical pentapeptide sequences can adopt distinct structures in different proteins. With the increased size of the protein structure database and the availability of new methods to characterize structural environments, we revisit this observation of structural plasticity. Within a set of proteins with less than 50% sequence identity, 59 pairs of identical hexapeptide sequences were identified. These local structures were compared and their surrounding structural environments examined. Within a protein structural class (α/α, β/β, α/β, α + β), the structural similarity of sequentially identical hexapeptides usually is preserved. This study finds eight pairs of identical hexapeptide sequences that adopt β-strand structure in one protein and α-helical structure in the other. In none of the eight cases do the members of these sequence pairs come from proteins within the same folding class. These results have implications for class dependent secondary structure prediction algorithms.
