Similar literature
1.
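Analysis of the RHD gene in the Rh blood group system based on the chaos game representation method   Total citations: 3, self-citations: 0, citations by others: 3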
高雷  齐斌  朱平 《生命科学研究》2009,13(5):408-412
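Using the chaos game representation (CGR) of protein sequences based on the classical HP model, we give the CGR plot of the protein sequence of the RHD gene, which can be regarded as a characteristic map of the protein's secondary structure and may serve as a reference for clinical blood group identification. In addition, following the CGR method for DNA sequences proposed by Jeffrey in 1990, we give the CGR plot of the DNA sequence of the RHD gene. From this plot we computed the corresponding two-step Markov transition probability matrix of the RHD gene; the matrix shows the RHD gene's usage preference for the third base of the triplets that encode amino acids.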
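The DNA CGR construction mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), the starting point at the centre of the square, and the names `CORNERS` and `cgr_points` are all assumptions made for the example.

```python
# Assumed corner layout for the unit square (hypothetical, for illustration).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return one CGR point per base of a DNA sequence.

    Each new point is the midpoint between the previous point and the
    corner that belongs to the current base.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


if __name__ == "__main__":
    for base, point in zip("GATTACA", cgr_points("GATTACA")):
        print(base, point)
```

Plotting the returned points as a scatter plot gives the CGR picture; the sketch deliberately stops at the coordinates.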

2.
A CGR-based time series model for DNA sequences (in English)   Total citations: 1, self-citations: 0, citations by others: 1
高洁  蒋丽丽  徐振源 《生物信息学》2010,8(2):156-160,164
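Using the chaos game representation (CGR) of DNA sequences, a method is proposed for converting the 2-D DNA map into a corresponding spectrum-like format. The method not only provides a good visual representation but also converts a DNA sequence into a time series. CGR coordinates are used to transform a DNA sequence into a CGR radian sequence, and a long-memory ARFIMA(p,d,q) model is introduced to fit such sequences; the sequences show significant long-range correlation and the model fits them well.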
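The long-memory part of an ARFIMA(p,d,q) model is the fractional differencing operator (1 - B)^d, where B is the backshift operator. The sketch below only computes the weights of that operator and applies them to a series; it does not fit a model, and the function names `frac_diff_weights` and `frac_diff` are hypothetical. How the CGR radian sequence itself is defined is not stated in the abstract, so the input here is just a generic list of numbers.

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating at the start of the data."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


if __name__ == "__main__":
    print(frac_diff_weights(0.3, 5))
    print(frac_diff([1.0, 2.0, 4.0, 7.0], 1.0))
```

For d = 1 the weights reduce to ordinary first differencing, which is a convenient sanity check; for 0 < d < 0.5 the weights decay slowly, which is what produces the long-range correlation described above.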

3.
Comprehensive knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether proteins of unknown function are thermophilic is a long-standing problem that is still not fully resolved. Chaos game representation (CGR) can investigate hidden patterns in protein sequences and can also visually reveal previously unknown structure. In this paper, using the general form of pseudo amino acid composition to represent protein samples, we propose a novel method for converting a protein sequence into a CGR picture using the CGR algorithm. A 24-dimensional vector extracted from these CGR segments and the first two PCA features are used to classify thermophilic and mesophilic proteins with a support vector machine (SVM). Our method is evaluated by the jackknife test. For the 24-dimensional vector, the accuracy is 0.8792 and the Matthews correlation coefficient (MCC) is 0.7587. The 26-dimensional vector obtained by adding the PCA components performs highly satisfactorily, with an accuracy of 0.9944 and an MCC of 0.9888. The results show the effectiveness of the new hybrid method.
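The two figures of merit quoted in this abstract, accuracy and MCC, are both functions of the binary confusion matrix. A minimal sketch follows; the function names are hypothetical and the counts in the usage example are made up for illustration, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient of a binary confusion matrix.

    Returns 0.0 when a row or column of the matrix is empty, which is
    the usual convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative counts only.
    print(accuracy(45, 40, 10, 5), mcc(45, 40, 10, 5))
```

MCC ranges from -1 to 1 and stays informative when the two classes are unbalanced, which is why it is commonly reported next to accuracy.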

4.
Hai ming Ni  Da wei Qi  Hongbo Mu 《Genomics》2018,110(3):180-190
Converting a DNA sequence to an image by using chaos game representation (CGR) is an effective genome sequence pretreatment technology, which provides the basis for further analysis between different genes. In this paper, we constructed CGR images for the genomes of 10 mammal species, 48 hepatitis E viruses (HEV), and 10 kinds of bacteria, and calculated the mean structural similarity (MSSIM) coefficient between every two CGR images. From our analysis, the MSSIM coefficient of gene CGR images can accurately reflect the degree of similarity between different genomes. Hierarchical clustering analysis was used to calculate the class affiliation and construct a dendrogram. Large numbers of experiments showed that this method gives results comparable to the traditional Clustal X phylogenetic tree construction method, and is significantly faster in the clustering analysis process. Meanwhile, the MSSIM combined with CGR method was also able to efficiently cluster large genome sequences, which the traditional multiple sequence alignment methods (e.g. Clustal X, Clustal Omega, Clustal W) cannot classify.
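To make the MSSIM coefficient concrete, here is a deliberately simplified sketch in pure Python. It averages the SSIM index over non-overlapping square blocks with population statistics, whereas the usual formulation slides a Gaussian-weighted window over the image, so it should not be read as the procedure used in the paper. The names `mssim` and `_ssim_block`, the block size, and the stabilising constants (0.01 and 0.03 times the dynamic range, squared) are assumptions of this example.

```python
def _ssim_block(a, b, c1, c2):
    """SSIM index of two equally sized lists of pixel values."""
    n = len(a)
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def mssim(img1, img2, block=2, dynamic_range=255.0):
    """Mean SSIM over non-overlapping blocks of two same-sized 2-D lists."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    rows, cols = len(img1), len(img1[0])
    scores = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            cells = [(i, j) for i in range(r, r + block)
                     for j in range(c, c + block)]
            a = [img1[i][j] for i, j in cells]
            b = [img2[i][j] for i, j in cells]
            scores.append(_ssim_block(a, b, c1, c2))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    img = [[0, 64, 128, 255], [10, 20, 30, 40],
           [255, 0, 255, 0], [5, 5, 5, 5]]
    print(mssim(img, img))
```

A pairwise matrix of `1 - mssim(...)` values could then be passed to any hierarchical clustering routine to build a dendrogram of the kind the abstract describes.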

5.
Chaos game representation (CGR) was proposed recently to visualize nucleotide sequences as one of the first applications of this technique in the field of biochemistry [1]. In this paper we would like to demonstrate that representations similar to CGR can be generalized and applied for visualizing and analyzing protein databases. Examples of applications will be presented for investigating regularities and motifs in the primary structure of proteins, and for analyzing possible structural attachments on the super-secondary structure level of proteins. A further application will be presented for testing structure prediction methods using CGR.

6.
Analysis of genomic sequences by Chaos Game Representation   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. RESULTS: The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.
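The link between CGR positions and Markov chain tables can be illustrated for integer orders: if the CGR square is cut into a 2^k by 2^k grid, the cell that a point falls into is fixed by the last k bases, so cell counts are k-mer counts. The sketch below checks this on a toy sequence. The corner layout, the starting point (0.5, 0.5), and all function names are assumptions of the example, and the non-integer orders discussed in the abstract are not covered.

```python
from collections import Counter

# Assumed corner layout (hypothetical, for illustration).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_cell_counts(seq, k):
    """Count CGR points per cell of a 2^k x 2^k grid.

    Points produced by the first k - 1 bases are skipped because they
    do not yet encode a full k-mer.
    """
    x, y = 0.5, 0.5
    size = 2 ** k
    counts = Counter()
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        if i + 1 >= k:
            counts[(int(x * size), int(y * size))] += 1
    return counts


def kmer_cell(kmer):
    """Grid cell (column, row) that a k-mer maps to in the CGR square."""
    col = row = 0
    for base in reversed(kmer):  # the most recent base is the top bit
        cx, cy = CORNERS[base]
        col = col * 2 + cx
        row = row * 2 + cy
    return (col, row)


def kmer_cell_counts(seq, k):
    """Count k-mers directly and index them by their CGR cell."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        counts[kmer_cell(seq[i:i + k])] += 1
    return counts


if __name__ == "__main__":
    seq = "ACGTTGCAACGGTA"
    print(cgr_cell_counts(seq, 2) == kmer_cell_counts(seq, 2))
```

Dividing the count of each (k+1)-mer cell by the count of its k-mer prefix cell would give an order-k transition table, which is the sense in which a Markov table is a coarse-grained view of the CGR plane.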

7.

Background

Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved to numerical systems. Characteristically, CGR coordinates of substrings sharing an L-long suffix will be located within 2^-L distance of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations.

Results

The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm.

Conclusions

The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistic (performance) and analytical (lack of unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems.
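The suffix property stated in the Background of this abstract, that sequences sharing an L-long suffix land within 2 to the power of minus L of each other, follows from the fact that every CGR step halves the distance between two trajectories. The sketch below checks it numerically. The corner layout, the starting point, the per-coordinate (Chebyshev) distance, and the function names are assumptions of this example rather than details taken from the paper.

```python
# Assumed corner layout (hypothetical, for illustration).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_last_point(seq, start=(0.5, 0.5)):
    """CGR coordinate reached after reading the whole sequence."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
    return (x, y)


def chebyshev(p, q):
    """Largest per-coordinate difference between two points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


if __name__ == "__main__":
    suffix = "GATTACA"
    a = cgr_last_point("ACGT" + suffix)
    b = cgr_last_point("TTTTTT" + suffix)
    print(chebyshev(a, b), "<=", 2.0 ** -len(suffix))
```

The converse direction is what makes the coordinate usable as an index: two sequences whose last bases differ end up in different quadrants, so closeness of coordinates signals a shared suffix.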

8.
In this paper, we propose two four-base related 2D curves of DNA primary sequences (termed F-B curves) and their corresponding single-base related 2D curves (termed A-related, G-related, T-related and C-related curves). The constructions of these graphical curves are based on the assignment of individual bases to four different sinusoidal (or tangent) functions; by connecting all the points on these four sinusoidal (tangent) functions, we get the F-B curves; similarly, by connecting the points on each of the four sinusoidal (tangent) functions, we get the single-base related 2D curves. The proposed 2D curves are all strictly non-degenerate. Then, an 8-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the normalized geometric centers of the proposed curves. As examples, we examine similarity among the coding sequences of the first exon of the beta-globin gene from eleven species, similarity of cDNA sequences of the beta-globin gene from eight species, and similarity of the whole mitochondrial genomes of 18 eutherian mammals. The experimental results demonstrate the effectiveness of the proposed method.

9.
Chaos Game Representation (CGR) can recognize patterns in the nucleotide sequences, obtained from databases, of a class of genes using the techniques of fractal structures and by considering DNA sequences as strings composed of four units, G, A, T and C. Such recognition of patterns relies only on visual identification and no mathematical characterization of CGR is known. The present report describes two algorithms that can predict the presence or absence of a stretch of nucleotides in any gene family. The first algorithm can be used to generate DNA sequences represented by any point in the CGR. The second algorithm can simulate known CGR patterns for different gene families by setting the probabilities of occurrence of different di- or trinucleotides by a trial and error process using some guidelines and approximate rules-of-thumb. The validity of the second algorithm has been tested by simulating sequences that can mimic the CGRs of vertebrate non-oncogenes, proto-oncogenes and oncogenes. These algorithms can provide a mathematical basis of the CGR patterns obtained using nucleotide sequences from databases.
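Recovering the bases that lead to a given CGR point amounts to running the chaos game backwards: the quadrant of the point names the most recent base, and doubling the coordinates relative to that corner gives the previous point. The sketch below is one way to do this and is not claimed to be the first algorithm of the paper. The corner layout and the names `encode` and `decode` are assumptions, and exact fractions are used so that the round trip is free of rounding error.

```python
from fractions import Fraction

# Assumed corner layout (hypothetical, for illustration).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASE_AT = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def encode(seq, start=(HALF, HALF)):
    """Exact CGR coordinate reached after reading the sequence."""
    x, y = start
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
    return (x, y)


def decode(point, length):
    """Recover the last `length` bases that lead to a CGR point."""
    x, y = point
    bases = []
    for _ in range(length):
        cx = 1 if x >= HALF else 0
        cy = 1 if y >= HALF else 0
        bases.append(BASE_AT[(cx, cy)])
        # Undo one midpoint step to get the previous point.
        x, y = 2 * x - cx, 2 * y - cy
    # Bases were recovered newest first, so reverse them.
    return "".join(reversed(bases))


if __name__ == "__main__":
    point = encode("GATTACA")
    print(point, decode(point, 7))
```

With floating-point coordinates the same loop works only for a limited number of bases, since each step consumes one bit of precision per coordinate.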

10.
Alignment-free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied for sequence comparison, phylogeny, detection of horizontal transfers or extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to their fully expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications.
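A fixed reverse encoding replaces each amino acid with one predetermined codon, so a protein becomes a nucleic sequence that any DNA CGR routine can process. The abstract does not say which codon is chosen for each residue, so the table below is a hypothetical choice: every entry is a valid codon for its amino acid in the standard genetic code, but a different fixed choice would be equally legitimate. The names `REVERSE_CODON` and `reverse_encode` are also assumptions.

```python
# Hypothetical fixed table: one standard-code codon per amino acid.
REVERSE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def reverse_encode(protein):
    """Map a protein sequence to DNA using the fixed codon table."""
    return "".join(REVERSE_CODON[residue] for residue in protein.upper())


if __name__ == "__main__":
    print(reverse_encode("MKWVTF"))
```

The design point is that the table is fixed rather than sampled from synonymous codons, so identical peptides always produce identical nucleotide strings and therefore identical CGR signatures.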

11.
We propose a new method for classifying and identifying transmembrane (TM) protein functions at proteome scale by applying a single-linkage clustering method based on TM topology similarity, which is calculated simply from comparing the lengths of loop regions. In this study, we focused on 87 prokaryotic TM proteomes consisting of 31 proteobacteria, 22 gram-positive bacteria, 19 other bacteria, and 15 archaea. Prior to performing the clustering, we first categorized individual TM protein sequences as "known," "putative" (similar to "known" sequences), or "unknown" by using the homology search and the sequence similarity comparison against SWISS-PROT to assess the current status of the functional annotation of the TM proteomes based on sequence similarity only. More than three-quarters, that is, 75.7% of the TM protein sequences are functionally "unknown," with only 3.8% and 20.5% of them being classified as "known" and "putative," respectively. Using our clustering approach based on TM topology similarity, we succeeded in increasing the rate of TM protein sequences functionally classified and identified from 24.3% to 60.9%. Obtained clusters correspond well to functional superfamilies or families, and the functional classification and identification are successfully achieved by this approach. For example, in an obtained cluster of TM proteins with six TM segments, 109 sequences out of 119 sequences annotated as "ATP-binding cassette transporter" are properly included and 122 "unknown" sequences are also contained.

12.
Choleresis induced by dehydrocholate (DHC) stimulates the discharge into bile of lysosomes, which are implicated in the biliary excretion of proteins. Contrary to taurocholate-induced choleresis, DHC choleresis is not affected by microtubule (mt) inhibition. Therefore, the role of mt's in the biliary protein excretion during bile salt choleresis was analyzed in this study. Normal rats and rats treated with the mt poisons colchicine or vinblastine or with the acidotropic agent chloroquine (Cq) were used. The analysis of the protein component in bile was made on SDS-polyacrylamide gel, and the individual polypeptides were quantitated by densitometry. The excretion of bile polypeptides was compared with that of lysosomal acid phosphatase. Bile flow and bile salt output did not show changes on account of treatments. The biliary excretion of acid phosphatase was stimulated by DHC, and it was not affected by mt inhibitors but was markedly diminished by Cq. DHC choleresis produced different effects on the bile polypeptides. The biliary excretion of polypeptides of high molecular mass (84-140 kDa) was stimulated by DHC. Cq treatment increased their basal biliary excretions, whereas DHC-induced secretion was qualitatively and quantitatively similar to that of controls. The 69-kDa polypeptide (albumin) also increased during DHC-induced choleresis, but it showed a different excretory pattern. Cq treatment inhibited such an increase but no correlation with the excretory pattern of the lysosomal marker was found. The biliary excretion of polypeptides of low molecular mass (down to 14 kDa) suffered a transitory decrease and then a subsequent increase over basal values during the DHC choleresis.(ABSTRACT TRUNCATED AT 250 WORDS)

13.
The chaos game representation (CGR) is a scatter plot derived from a DNA sequence, with each point of the plot corresponding to one base of the sequence. If the DNA sequence were a random collection of bases, the CGR would be a uniformly filled square; conversely, any patterns visible in the CGR represent some pattern (information) in the DNA sequence. In this paper, patterns previously observed in a variety of DNA sequences are explained solely in terms of nucleotide, dinucleotide and trinucleotide frequencies.

14.
To evaluate the possibility of an unknown protein to be a resistant gene against Xanthomonas oryzae pv. oryzae, a different mode of pseudo amino acid composition (PseAAC) is proposed to formulate the protein samples by integrating the amino acid composition with the chaos game representation (CGR) method. Numerical comparisons of triangle, quadrangle and 12-vertex polygon CGR are carried out to evaluate the efficiency of using these fractal figures in classifiers. The numerical results show that among the three polygon methods, the triangle method gives a good fractal visualization and performs the best in the classifier construction. By using triangle + 12-vertex polygon CGR as the mathematical feature, the classifier achieves 98.13% in the jackknife test and the MCC reaches 0.8462.

15.
Pectins are critical polysaccharides of the cell wall that are involved in key aspects of a plant's life, including cell‐wall stiffness, cell‐to‐cell adhesion, and mechanical strength. Pectins undergo methylesterification, which affects their cellular roles. Pectin methyltransferases are believed to methylesterify pectins in the Golgi, but little is known about their identity. To date, there is only circumstantial evidence to support a role for QUASIMODO2 (QUA2)‐like proteins and an unrelated plant‐specific protein, cotton Golgi‐related 3 (CGR3), in pectin methylesterification. To add to the knowledge of pectin biosynthesis, here we characterized a close homolog of CGR3, named CGR2, and evaluated the effect of loss‐of‐function mutants and over‐expression lines of CGR2 and CGR3 in planta. Our results show that, similar to CGR3, CGR2 is a Golgi protein whose enzyme active site is located in the Golgi lumen where pectin methylesterification occurs. Through phenotypical analyses, we also established that simultaneous loss of CGR2 and CGR3 causes severe defects in plant growth and development, supporting critical but overlapping functional roles of these proteins. Qualitative and quantitative cell‐wall analytical assays of the double knockout mutant demonstrated reduced levels of pectin methylesterification, coupled with decreased microsomal pectin methyltransferase activity. Conversely, CGR2 and CGR3 over‐expression lines have markedly opposite phenotypes to the double knockout mutant, with increased cell‐wall methylesterification levels and microsomal pectin methyltransferase activity. Based on these findings, we propose that CGR2 and CGR3 are critical proteins in plant growth and development that act redundantly in pectin methylesterification in the Golgi apparatus.

16.
17.
Obtaining soluble proteins in sufficient concentrations is a major obstacle in various experimental studies. How to predict the propensity of targets in large-scale proteomics projects to be soluble is a significant scientific problem that is still not fully resolved. Chaos game representation (CGR) can investigate the patterns hidden in protein sequences, and can visually reveal previously unknown structure. Fractal dimensions are good tools to measure sizes of complex, highly irregular geometric objects. In this paper, we convert each protein sequence into a high-dimensional vector by the CGR algorithm and fractal dimension, and then predict protein solubility from these fractal features together with Chou's pseudo amino acid composition features and a support vector machine (SVM). We extract and study six groups of features computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test. In the comparisons, the group with the 445-dimensional vector gets the best results: the average accuracy is 0.8741 and the average MCC is 0.7358. The resulting predictor is also compared with existing methods and shows significant improvement.

18.

Background  

Representing symbolic sequences graphically using iterated maps has enjoyed an enduring popularity since it was first proposed by Jeffrey in 1990 as chaos game representation (CGR). The usefulness of this representation goes beyond the convenience of a scale-independent representation; it also provides a variable memory length representation of transition. This includes the representation of succession with non-integer order, which comes with the promise of generalizing Markovian formalisms. The original proposal targeted genomic sequences only, but since then several generalizations have been proposed, many specifically designed to handle protein data.

19.
We describe a mammalian artificial mini-chromosome lacking human alphoid DNA and mouse minor and major satellite DNA repeats. This mini-chromosome, initially recovered in a mouse embryonic stem (ES) cell line (CGR8), is 2.6 Mb in size and consists of sequences derived from the human Y chromosome and mouse chromosomes 12 and 15. It is not stable in the CGR8 cells but replicates and segregates with high fidelity after transfer into chicken DT40 cells. Combined analysis by immunocytochemistry/fluorescence in situ hybridisation (FISH) on metaphase spreads detected an active neo-centromere on the mini-chromosome in these cells. Further analysis by immunocytochemistry/FISH on stretched chromatin allowed the localisation of the CENP-C protein to the DNA sequence derived from interval 5 of the human Y chromosome.

20.
刘娟  高洁 《生物信息学》2011,9(2):97-101
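Time series models are used to analyze influenza B and influenza C viruses, and a new time series model, the CGR radian sequence, is provided for the DNA sequences of influenza B and C viruses. CGR coordinates are used to convert these DNA sequences into CGR radian sequences, and the long-memory ARFIMA model is introduced to fit the two classes of sequences. Ten randomly selected influenza B sequences and ten influenza C sequences all show long-range correlation and are fitted well; moreover, the two kinds of viral sequences can tentatively be distinguished by different ARFIMA models, namely ARFIMA(0,d,4) and ARFIMA(1,d,1).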
