Similar literature
20 similar articles found
1.
2.
Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities/dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities result from comparing the matrices of the sequences of interest. Three different methods of analysing the resulting difference matrix are considered: a 3-dimensional representation giving both local and global information, a numerical characterization that defines an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammal species, taking human albumin as the reference.
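
The abstract above does not spell out how the CGR matrix is built, so the following is only a minimal sketch under stated assumptions: the corner assignment (A, C, G, T on the unit square), the grid resolution `k`, and the function names `cgr_matrix` and `difference_matrix` are all illustrative choices, not the authors' definitions.

```python
# Hypothetical corner assignment for the chaos game; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_matrix(seq, k=3):
    """Count chaos-game points in each cell of a 2^k x 2^k grid.

    Each point is the midpoint between the previous point and the corner
    of the current base. Symbols other than A, C, G, T are skipped.
    Note: the first k-1 points do not yet encode a full k-letter word.
    """
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid


def difference_matrix(a, b):
    """Element-wise difference of two CGR matrices of equal size."""
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
```

In practice the two matrices would probably be normalized by sequence length before subtraction, so that sequences of different lengths remain comparable; that step is left out here to keep the sketch short.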

3.
We introduce a new approach to comparing DNA primary sequences. The core of our method is a new measure of pairwise distance between sequences. Using the primitive discrimination substrings of sequences S and Q, a discrimination measure DM(S, Q) is defined for their similarity analysis. The proposed method does not require multiple alignments and is fully automatic. To illustrate its utility, we construct phylogenetic trees on two independent data sets. The results indicate that the method is efficient and powerful.

4.
New 3D graphical representation of DNA sequence based on dual nucleotides   Cited 2 times (2 self-citations, 0 by others)
We introduce a 3D graphical representation of DNA sequences based on pairs of dual nucleotides (DNs). Based on this representation, we consider some mathematical invariants and construct two 16-component vectors associated with these invariants. The vectors are used to characterize and compare the complete coding sequence of the beta-globin gene of nine different species. The examination of similarities/dissimilarities illustrates the utility of the approach.

5.
Under the hypothesis of no-strand-bias conditions, the Watson and Crick base-pairing rule decreases the complexity of models of DNA evolution by reducing the maximum number of substitution rates to six. It was shown that intrastrand equimolarity between A and T (A* = T*) and between G and C (G* = C*) is a general asymptotic property of this class of models. This statistical prediction was observed on 60 long genomic fragments (>50 kbp) from various kingdoms, even when the effect of the two opposite orientations for coding sequences is removed. The practical consequence of the model for estimating the expected number of substitutions per site between two homologous DNA sequences is discussed.
Abbreviations: BPR, Watson and Crick base pairing rule (A:T, G:C); PRI, intrastrand type-1 parity rule (i ≠ j, m(i,j) = m(ī,j̄)); PRII, intrastrand type-2 parity rule (A* = T*, G* = C*)
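
The type-2 parity rule described above can be checked on a single strand by counting bases. The sketch below is an illustration only: the function name `parity_skews` and the use of normalized skews (A-T)/(A+T) and (G-C)/(G+C) as the deviation measure are assumptions, not taken from the abstract.

```python
def parity_skews(seq):
    """Return (AT skew, GC skew) for one DNA strand.

    Values near zero are what intrastrand equimolarity (A = T, G = C)
    predicts for long sequences; short sequences fluctuate widely.
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew
```

For example, `parity_skews("AAAT")` gives an AT skew of 0.5 because the strand holds three A and one T.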

6.
A fractal method to distinguish coding and non-coding sequences in a complete genome is proposed, based on the different statistical behaviors of these two kinds of sequences. We first propose a number sequence representation of DNA sequences. Multifractal analysis is then performed on the measure representation of the obtained number sequence. The three exponents C(-1), C1 and C2 are selected from the result of the multifractal analysis. Each DNA sequence may be represented by a point in the three-dimensional space spanned by these three exponents. It is shown that points corresponding to coding and non-coding sequences in the complete genomes of many prokaryotes are roughly distributed in different regions. Fisher's discriminant algorithm can be used to separate these two regions in the spanned space. If the point (C(-1), C1, C2) for a DNA sequence lies in the region corresponding to coding sequences, the sequence is classified as coding; otherwise, it is classified as non-coding. For all 51 prokaryotes we considered, the average discriminant accuracies pc, pnc, qc and qnc reach 72.28%, 84.65%, 72.53% and 84.18%, respectively.

7.
We introduce a novel 2D graphical representation of DNA sequences based on pairs of neighboring nucleotides (PNNs). Then we get the PNNs' distributions and obtain a y-M. The construction of the PNN-curve has some important advantages: (1) it avoids loss of information, and the PNN-curve representing a DNA sequence does not overlap or intersect itself; (2) the novel 2D representation is more sensitive. The utility of this method is illustrated by examining similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of eleven different species in Table 2.

8.
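A time series model of DNA sequences based on CGR (in English)   Cited 1 time (0 self-citations, 1 by others)
Gao Jie, Jiang Lili, Xu Zhenyuan. 生物信息学 (Chinese Journal of Bioinformatics), 2010, 8(2): 156-160, 164
Using the chaos game representation (CGR) of DNA sequences, a method is proposed for converting the 2D DNA map into a corresponding spectrum-like format. The method not only provides a good visual representation but also converts a DNA sequence into a time series. CGR coordinates are used to convert DNA sequences into CGR radian sequences, and a long-memory ARFIMA(p,d,q) model is introduced to fit such sequences. Significant long-range correlation is found in these sequences, and the fit is good.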

9.

Background

A commonplace analysis in high-throughput DNA methylation studies is the comparison of methylation extent between different functional regions, computed by averaging methylation states within region types and then comparing averages between regions. For example, it has been reported that methylation is more prevalent in coding regions as compared to their neighboring introns or UTRs, leading to hypotheses about novel forms of epigenetic regulation.

Results

We have identified and characterized a bias present in these seemingly straightforward comparisons that results in the false detection of differences in methylation intensities across region types. This bias arises from differences in conservation rates, rather than methylation rates, and is broadly present in the published literature. When controlling for conservation at coding start sites, the differences in DNA methylation rates disappear. Moreover, a re-evaluation of methylation rates at intron-exon junctions reveals that the magnitude of previously reported differences is greatly exaggerated. We introduce two correction methods to address this bias, an inference-based matrix completion algorithm and an averaging approach, tailored to address different underlying biological questions. We evaluate how analysis using these corrections affects the detection of differences in DNA methylation across functional boundaries.

Conclusions

We report here on a bias in DNA methylation comparative studies that originates in conservation rate differences and manifests itself in the false discovery of differences in DNA methylation intensities and their extents. We have characterized this bias and its broad implications, and show how to control for it so as to enable the study of a variety of biological questions.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1604-3) contains supplementary material, which is available to authorized users.

10.
In this article, we introduce three 3D graphical representations of DNA primary sequences, which we call the RY-curve, MK-curve and SW-curve, based on three classifications of the DNA bases. The advantages of our representations are that (i) these 3D curves are strictly non-degenerate and no information is lost when a DNA sequence is transferred to its mathematical representation and (ii) the coordinates of every node on these 3D curves have clear biological implications. Two applications of these 3D curves are presented: (a) a simple formula is derived to calculate the content of the four bases (A, G, C and T) from the coordinates of nodes on the curves; and (b) a 12-component characteristic vector is constructed to compare similarity among DNA sequences from different species based on the geometrical centers of the 3D curves. As examples, we examine similarity among the coding sequences of the first exon of the beta-globin gene from eleven species and validate similarity of cDNA sequences of the beta-globin gene from eight species.
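
The abstract does not give the node coordinates or the authors' formula, so the sketch below only illustrates why the three classifications together lose no information. It assumes the standard groupings R = {A, G}, Y = {C, T}, M = {A, C}, K = {G, T}, S = {G, C}, W = {A, T}; the function names are hypothetical. With N the sequence length, R + M + W = 2A + N, and similarly for the other bases, so each base count can be recovered from class counts alone.

```python
CLASSES = {
    "R": "AG", "Y": "CT",  # purine / pyrimidine
    "M": "AC", "K": "GT",  # amino / keto
    "S": "GC", "W": "AT",  # strong / weak hydrogen bonding
}


def class_counts(seq):
    """Count how many bases of seq fall in each of the six classes."""
    seq = seq.upper()
    return {name: sum(seq.count(b) for b in bases)
            for name, bases in CLASSES.items()}


def base_counts_from_classes(counts, n):
    """Recover A, G, C, T counts from class counts and length n.

    Assumes the sequence contains only A, C, G and T.
    """
    return {
        "A": (counts["R"] + counts["M"] + counts["W"] - n) // 2,
        "G": (counts["R"] + counts["K"] + counts["S"] - n) // 2,
        "C": (counts["Y"] + counts["M"] + counts["S"] - n) // 2,
        "T": (counts["Y"] + counts["K"] + counts["W"] - n) // 2,
    }
```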

11.
Choong MK, Yan H. Bioinformation, 2008, 2(7): 273-278
This paper presents a new method for exon detection in DNA sequences based on multi-scale parametric spectral analysis. A forward-backward linear prediction (FBLP) algorithm with singular value decomposition (SVD), FBLP-SVD, is applied to the double-base curves (DB-curves) of a DNA sequence using variable moving window sizes to estimate the signal spectrum at multiple scales. Simulations are done on short human genes in the range of 11 bp to 2032 bp, and the results show that our proposed method outperforms the classical Fourier transform method. The multi-scale approach is shown to be more effective than using a single scale with a fixed window size. In addition, our method is flexible as it requires no training data.

12.
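A six-dimensional representation of protein sequences is given. This representation has three different forms, which are used to construct the information entropy of distance matrices. Similarity between sequences is then compared through the Euclidean distance and the angle between the information-entropy vectors.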
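
The final comparison step, Euclidean distance and angle between two vectors, can be sketched as follows. How the entropy vectors themselves are built from the six-dimensional representation is not described in the abstract, so the inputs here are simply assumed to be equal-length lists of numbers, and the function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos)
```

A smaller distance or a smaller angle indicates more similar sequences. The angle ignores vector length, which may matter when the sequences differ greatly in size.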

13.
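Phylogenetic classification of the genus Medicago based on ribosomal DNA ITS sequences   Cited 4 times (0 self-citations, 4 by others)
Nucleotide sequence variation in the internal transcribed spacer (ITS) region of the ribosomal genes was analysed for 28 species of Medicago and one species of Melilotus. Yellow sweet clover was used as the outgroup. The phylogenetic tree produced by the analysis is largely consistent with the traditional classification of the genus. The study suggests that yellow-flowered alfalfa should be placed in the same species as purple alfalfa. M. hybrida, M. cancellata and M. prostrata are wild species closely related to cultivated alfalfa. The results confirm that plant species previously assigned to the genus Trigonella should be placed in Medicago, and that the classification of groups such as Heynianae, Platycarpae and Spirocarpos within Medicago should be reconsidered.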

14.
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern, and it has been criticized for its local optimization. More accurate software currently requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is used to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal with both large-scale and multilocus barcoding data with accuracy and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/ .
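
The two-stage idea described above (alignment-free screening, then K2P nearest neighbour) can be sketched roughly as below. This is not the VIP Barcoding implementation. Stage one uses plain k-mer counts with cosine similarity as a simplified stand-in for the composition vector method, and stage two assumes the query and references are already aligned to equal length. All function names and the `shortlist` parameter are hypothetical.

```python
import math
from collections import Counter

PURINES = {"A", "G"}


def kmer_profile(seq, k=3):
    """Plain k-mer counts (a simplification of a composition vector)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(p, q):
    """Cosine similarity between two k-mer count profiles."""
    dot = sum(p[w] * q[w] for w in p)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0


def k2p(s1, s2):
    """Kimura two-parameter distance for two pre-aligned sequences."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    diffs = [(a, b) for a, b in pairs if a != b]
    # Transition: both purines or both pyrimidines; otherwise transversion.
    ts = sum(1 for a, b in diffs if (a in PURINES) == (b in PURINES))
    p, q = ts / len(pairs), (len(diffs) - ts) / len(pairs)
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        return math.inf  # too divergent for the formula to apply
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def identify(query, references, k=3, shortlist=2):
    """Screen references by k-mer similarity, then pick the K2P nearest."""
    qp = kmer_profile(query, k)
    ranked = sorted(references,
                    key=lambda name: cosine(qp, kmer_profile(references[name], k)),
                    reverse=True)
    return min(ranked[:shortlist], key=lambda name: k2p(query, references[name]))
```

The screening stage keeps the alignment-based distance, which is the expensive part on real data, confined to a short candidate list.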

15.
B. Liao, T. Wang, K. Ding. Molecular Simulation, 2013, 39(14-15): 1063-1071
In this paper, we propose a seven-dimensional (7D) representation of ribonucleic acid (RNA) secondary structures. The use of the 7D representation is illustrated by constructing structure invariants. Comparisons with the similarity/dissimilarity results based on the 7D representation for a set of RNA secondary structures at the 3′-terminus of different viruses are considered to illustrate the use of our structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal.

16.
In this work, we report thermodynamic, kinetic, and microrheological studies of the formation of PNA- and PNA/DNA-based noncovalent polymeric systems, which are useful tools for biotechnological and bioengineering applications. We realized two kinds of systems: a PNA-based system formed by a self-assembling PNA tridendron, and a PNA/DNA hybrid system formed by a PNA tridendron and a DNA linker. The formation of a three-dimensional polymeric network by means of specific Watson-Crick base pairing was investigated by a detailed UV and CD spectroscopic study. Preliminary microrheology experiments were performed on both systems to evaluate their viscoelastic properties, which were consistent with the formation of soluble hyperbranched polymers that could be useful for drug/gene delivery, as well as for encapsulating organic pollutants of different shapes and sizes in environmental applications. Copyright © 2009 European Peptide Society and John Wiley & Sons, Ltd.

17.
This paper presents a new approach to modeling DNA sequences for the purpose of exon detection. The proposed model adopts the sum-of-sinusoids concept for the representation of DNA sequences. The objective of the modeling process is to represent the DNA sequence with few coefficients. The modeling process can be performed on the DNA signal as a whole or on a segment-by-segment basis. The created models can be used instead of the original sequences in a further spectral estimation process for exon detection. The accuracy of modeling is evaluated using the Root Mean Square Error (RMSE) and R-square metrics. In addition, non-parametric spectral estimation methods are used to estimate the spectra of both original and modeled DNA sequences. The results of exon detection based on original and modeled DNA sequences coincide to a great extent, which supports the success of the proposed sum-of-sinusoids method for modeling DNA sequences.
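
The two accuracy metrics named above are standard and can be computed as in the sketch below. The inputs are assumed to be the original numeric DNA signal and the values produced by the fitted model at the same positions; the function names are illustrative, and the sum-of-sinusoids fitting itself is not shown.

```python
import math


def rmse(actual, modeled):
    """Root mean square error between a signal and its model."""
    n = len(actual)
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(actual, modeled)) / n)


def r_squared(actual, modeled):
    """Coefficient of determination (R-square).

    Undefined for a constant signal, where the total sum of squares
    is zero; this sketch does not handle that case.
    """
    mean = sum(actual) / len(actual)
    ss_res = sum((a - m) ** 2 for a, m in zip(actual, modeled))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

A model that reproduces the signal exactly gives an RMSE of 0 and an R-square of 1, while a model that only predicts the signal mean gives an R-square of 0.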

18.
A new classification scheme is presented that is based on simulated thermal melting profiles of DNA sequences. This method was applied to the classification of (a) several species of mammalian β-globin and (b) α-chain class II MHC genes. Comparison of the thermal melting profiles with the molecular phylogenetic trees constructed from the sequences shows that the melting-temperature-based approach is able to reproduce most of the major features of the sequence-based evolutionary tree. The melting profile method takes into account the inherent structure and dynamics of the DNA molecule, does not require sequence alignment prior to tree construction, and provides a means to verify the results experimentally. Our results therefore show that melting-profile-based classification of DNA sequences could be a useful tool for sequence analysis.

19.
This paper analyses the research progress in the use of molecular techniques based on ribosomal RNA and DNA (rRNA/rDNA) for studying the rumen microbial ecosystem since the first report by Stahl et al. (1988). Because rumen microbial populations can be underestimated by traditional techniques such as the roll-tube technique or most-probable-number estimates, modern molecular techniques based on 16S/18S rRNA/rDNA can provide more accurate molecular characterization, population estimates and classification schemes than traditional methods. Phylogenetic-group-specific probes can be hybridized to samples to detect and quantify rumen microbes, but competitive PCR and real-time PCR quantify rumen microbes more sensitively than hybridization. Molecular fingerprinting techniques, including denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and restriction fragment length polymorphism (RFLP), can be used to explore the diversity of bacteria, protozoa and fungi in the rumen ecosystem. By constructing clone libraries of 16S/18S rRNA/rDNA of rumen microbes, more new microbes can be discovered and identified. For fungi, internal transcribed spacers (ITS) are better than 18S rRNA/rDNA for discriminating operational taxonomic units. In conclusion, 16S/18S rRNA/rDNA procedures have been used successfully on rumen microbes, are quickly gaining acceptance for studying the rumen microbial ecosystem, and will become useful methods for rumen ecology research. However, molecular techniques based on 16S/18S rRNA/rDNA do not preclude classical microbiological techniques; the two should be used together to obtain accurate and satisfactory results.

20.
Mycobacterium leprae has undergone extensive degenerative evolution, with a large number of pseudogenes. It is also the organism with the greatest divergence between gene annotations from independent institutes. Therefore, M. leprae is a good model to verify the currently predicted coding sequence regions between different annotations, to identify new ones and to investigate the expression of pseudogenes. We submitted a total extract of the bacteria isolated from armadillo to Gel-LC-MS/MS using a linear quadrupole ion trap-Orbitrap mass spectrometer. Spectra were analyzed using the Leproma (1614 genes and 1133 pseudogenes) and TIGR (5446 genes) databases and a database containing the full genome translation. We identified a total of 1046 proteins, including five proteins encoded by previously predicted pseudogenes, which upon closer inspection appeared to be proper genes. Only 11 of the additional annotations by TIGR were verified. We also identified six tryptic peptides from five proteins from regions not considered to be coding sequences, in addition to peptides from two unannotated gene candidates that overlap with other genes. Our data show that the Leproma annotation of M. leprae is quite accurate, and there were no peptide observations corresponding to true pseudogenes, except for a new gene candidate overlapping with an essential enolase on the complementary strand.
