Similar literature
 20 similar articles found
1.
In this paper, a novel 3D graphical representation of DNA sequences based on codons is proposed. Because no information is lost through overlapping or loops in the curve, this representation is useful for comparing different DNA sequences, and the 3D curve is especially convenient for comparing DNA mutations. We then give a numerical characterization of DNA sequences based on the new 3D curve. This characterization facilitates quantitative, codon-based analysis of similarities and dissimilarities among DNA sequences.

2.
We consider a novel 2-D graphical representation of DNA sequences according to the chemical structures of bases, reflecting the distribution of bases with different chemical structures, preserving information on the sequential adjacency of bases, and allowing numerical characterization. The representation avoids the loss of information that accompanies alternative 2-D representations in which the curve standing for DNA overlaps and intersects itself. Based on this representation, we present a numerical characterization approach using the leading eigenvalues of the matrices associated with the DNA sequences. The utility of the approach is illustrated on the coding sequences of the first exon of the human beta-globin gene.

3.
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, which estimates the similarity between DNA sequences using the frequency histograms of local bitmap patterns of the images. The time complexity of our method is linear in the length of the DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
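
As a rough illustration of the two measures singled out above, the sketch below compares two frequency histograms. The function names and the toy inputs are assumptions made for illustration. The abstract does not say how the bitmap-pattern histograms are built or normalized, so the histograms here are simply scaled to sum to 1.

```python
def normalize(hist):
    """Scale a histogram of non-negative counts so that it sums to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram must contain at least one count")
    return [h / total for h in hist]


def manhattan_distance(p, q):
    """Sum of absolute bin-wise differences (0 means identical)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(p, q))


def histogram_intersection(p, q):
    """Sum of bin-wise minima (1 means identical for normalized inputs)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical pattern counts for two sequences (illustrative values only).
p = normalize([2, 2])
q = normalize([1, 3])
print(manhattan_distance(p, q), histogram_intersection(p, q))
```

For histograms normalized this way the two measures carry the same information, since the intersection equals 1 minus half the Manhattan distance.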

4.
A genome space is a moduli space of genomes. In this space, each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Currently, there is no method to represent genomes by a point in a space without losing biological information. Here, we propose a new graphical representation for DNA sequences. The key advance is that we can construct moment vectors from DNA sequences using this new graphical method and prove that the correspondence between moment vectors and DNA sequences is one-to-one. Using these moment vectors, we have constructed a novel genome space as a subspace of R^N. It allows us to show that SARS-CoV is most closely related to a coronavirus from the palm civet, not from a bird as initially suspected, and that the newly discovered human coronavirus HCoV-HKU1 is more closely related to SARS than to any other known member of the group 2 coronaviruses. Furthermore, we reconstructed the phylogenetic tree for 34 lentiviruses (including human immunodeficiency virus) based on their whole genome sequences. Our genome space will provide a powerful new tool for analyzing the classification of genomes and their phylogenetic relationships.

5.
DNA sequencing has resulted in an abundance of data on DNA sequences for various species. Hence, the characterization and comparison of sequences become more important but still difficult tasks. In this paper, we first give a 2-D ladderlike graphical representation for the characteristic sequences of a DNA sequence, and then construct a 3-component vector, in which the normalized ALE-indices extracted from such three 2-D graphs via D/D matrices are individual components, to characterize the DNA sequence. The examination of similarities/dissimilarities among sequences of the beta-globin genes of different species illustrates the utility of the approach.

6.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match of graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method can be used for measuring the similarity of DNA sequences based on two important tools: the Yau-Hausdorff distance and the graphical representation of DNA sequences. The graphical representations of DNA sequences conserve all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. The phylogenetic analyses of DNA sequences by the Yau-Hausdorff distance show the accuracy and stability of our approach in similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences by graphical representations, as well as to general two-dimensional shape matching.
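
For orientation, the sketch below computes the classical Hausdorff distance between two finite point sets in the plane. It is not the Yau-Hausdorff method itself: the abstract describes a minimum taken over all translations and rotations, which this brute-force version does not attempt. The function names and sample points are assumptions for illustration.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Classical (symmetric) Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two toy curves given as lists of (x, y) points (illustrative values only).
curve_a = [(0.0, 0.0), (1.0, 0.0)]
curve_b = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff(curve_a, curve_b))
```

This version compares every pair of points, so its cost grows with the product of the two set sizes; it is meant only to show what is being minimized.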

7.
Repetitive DNA sequences in the rice genome comprise more than half of the nuclear DNA. The isolation and characterization of these repetitive DNA sequences should lead to a better understanding of rice chromosome structure and genome organization. We report here the characterization and chromosome localization of a chromosome 5-specific repetitive DNA sequence. This repetitive DNA sequence was estimated to have at least 900 copies. DNA sequence analysis of three genomic clones which contain the repeat unit indicated that the DNA sequences have two sub-repeat units of 37 bp and 19 bp, connected by 30- to 90-bp short sequences with high similarity. RFLP mapping and physical mapping by fluorescence in situ hybridization (FISH) indicated that almost all copies of the repetitive DNA sequence are located in the centromeric heterochromatic region of the long arm of chromosome 5. The strategy for cloning such repetitive DNA sequences and their uses in rice genome research are discussed.

8.
Whole genome base-resolution methylome sequencing allows for the most comprehensive analysis of DNA methylation; however, the considerable sequencing cost often limits its applications. While reduced representation sequencing can be an affordable alternative, over 80% of CpGs in the genome are not covered. Building on our recently developed TET-assisted pyridine borane sequencing (TAPS) method, we here describe endonuclease enrichment TAPS (eeTAPS), which utilizes dihydrouracil (DHU)-cleaving endonuclease digestion of TAPS-converted DNA to enrich methylated CpG sites (mCpGs). eeTAPS can accurately detect 87% of mCpGs in the mouse genome with a sequencing depth equivalent to 4× whole genome sequencing. In comparison, reduced representation TAPS (rrTAPS) detected less than 4% of mCpGs with 2.5× sequencing depth. Our results demonstrate eeTAPS to be a new strategy for cost-effective genome-wide methylation analysis at single-CpG resolution that can fill the gap between whole-genome and reduced representation sequencing.

9.
Using chaos game representation, we introduce a novel and straightforward method for identifying similarities/dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities result from the comparison between the matrices of the sequences of interest. Three different methods of analysis of the resulting difference matrix are considered: a 3-dimensional representation giving both local and global information, a numerical characterization by defining an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to the study of albumin nucleotide sequences from eight mammal species, taking human albumin as the reference.

10.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares 4^-m in size (m being the desired resolution), and the point density counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that goes with the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn, by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
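
The second and third steps above can be sketched as follows, starting from per-square point counts that are assumed to have been obtained already. The equal-width binning, the base-2 logarithm and the function name are assumptions; the abstract only says that "appropriate intervals were adjusted" and does not specify them.

```python
import math
from collections import Counter


def histogram_entropy(cell_counts, n_bins):
    """Shannon entropy (in bits) of the histogram of per-square densities.

    cell_counts: number of CGR points falling in each grid square.
    n_bins: number of equal-width density intervals (an assumed choice).
    """
    if n_bins < 1 or not cell_counts:
        raise ValueError("need at least one bin and one cell count")
    lo, hi = min(cell_counts), max(cell_counts)
    if lo == hi:
        return 0.0  # every square has the same density: one occupied bin
    width = (hi - lo) / n_bins
    # Assign each density to an interval; the maximum goes in the last bin.
    bins = Counter(min(int((c - lo) / width), n_bins - 1) for c in cell_counts)
    n = len(cell_counts)
    return -sum((k / n) * math.log2(k / n) for k in bins.values())


# Toy densities for four grid squares (illustrative values only).
print(histogram_entropy([0, 0, 10, 10], n_bins=2))
```

Repeating the calculation for several resolutions m would give one entropy per level, which is how the abstract describes the entropic profile.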

11.
12.
Whole-genome sequencing and variant discovery in C. elegans   Total citations: 1, self-citations: 0, citations by others: 1
Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage and representation. Massively parallel sequencing facilitates strain-to-reference comparison for genome-wide sequence variant discovery. Owing to the short-read-length sequences produced, we developed a revised approach to determine the regions of the genome to which short reads could be uniquely mapped. We then aligned Solexa reads from C. elegans strain CB4858 to the reference, and screened for single-nucleotide polymorphisms (SNPs) and small indels. This study demonstrates the utility of massively parallel short read sequencing for whole genome resequencing and for accurate discovery of genome-wide polymorphisms.

13.
Douglas L. Vizard 《Biopolymers》1978,17(9):2057-2082
The method of DNA partial denaturation and intramolecular renaturation (in the absence of bimolecular reassociation) is developed analytically and presented as a means by which the supraorganization of the DNA sequences within large complex genomes may be studied. This analysis provides for the comparison of the actual organization of DNA sequences with a random arrangement of the same sequences. The sequence organization of the E. coli genome does not appear to be very different from DNA sequences arranged along the genome without preference to sequence stabilities, whereas an orderly physical arrangement of DNA sequences is implicated for the human genome.

14.
A new global method for computer prediction of functional sites in nucleotide sequences, based on fractal representation, is presented. The fractal representation of a set of sequences (FRS) provides a simple way of generating a recognition matrix for functionally similar sequences and simple estimates of its efficiency in searching for homologous regions in new sequences. Other advantages of the method are that no sequence alignment is needed when generating the base set or searching for new homologous regions, and that it requires little CPU time. Use of the method is illustrated by searching for globin and histone genes, for ALU repeats in the human genome and for long terminal repeats in virus genomes.

15.
Whole genome amplification protocols are revolutionizing the fields of molecular and conservation biology as they open the possibility of obtaining a large number of copies of a complete genome from minute amounts of sample. Multiple displacement amplification (MDA) is a whole genome amplification technique based on the properties of the phi29 DNA polymerase, which leads to a uniform representation of the genome with very low error rates. In this study we performed MDA on 28 macaque DNA samples extracted from blood or non-invasively collected semen from which we obtained mitochondrial control region sequences both before and after MDA. The length of the readable sequences was longer for the original samples than for the MDA products, but the number of unresolved positions was comparable both before and after MDA. We conclude that the MDA technique is useful for increasing the amount of DNA for sequencing mitochondrial regions in the case of non-invasively collected semen samples.

16.
Li C  Xing L  Wang X 《BMB reports》2008,41(3):217-222
Based on a five-letter model of the 20 amino acids, we propose a new 2-D graphical representation of protein sequence. Then we transform the 2-D graphical representation into a numerical characterization that will facilitate quantitative comparisons of protein sequences. As an application, we construct the phylogenetic tree of 56 coronavirus spike proteins. The resulting tree agrees well with the established taxonomic groups.

17.
Chaos game representation of gene structure.   Total citations: 21, self-citations: 2, citations by others: 19
This paper presents a new method for representing DNA sequences. It permits the representation and investigation of patterns in sequences, visually revealing previously unknown structures. Based on a technique from chaotic dynamics, the method produces a picture of a gene sequence which displays both local and global patterns. The pictures have a complex structure which varies depending on the sequence. The method is termed Chaos Game Representation (CGR). CGR raises a new set of questions about the structure of DNA sequences, and is a new tool for investigating gene structure.
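
A minimal sketch of how such a picture can be generated is given below, using the usual chaos-game iteration in which each base moves the current point halfway toward that base's corner of the unit square. The corner assignment, the starting point at the centre and the function name are assumptions; the abstract does not state them.

```python
# Assumed corner assignment for the four bases on the unit square.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence):
    """Return one (x, y) point per base of the sequence."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


print(cgr_points("AG"))
```

Plotting the returned points for a long sequence gives the kind of image the abstract refers to; each point's position depends on the bases that precede it, with the most recent bases carrying the most weight.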

18.
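A study of total DNA extraction methods for Zelkova plants   Total citations: 18, self-citations: 1, citations by others: 17
Zelkova plants contain high levels of secondary substances such as phenolics and polysaccharides, which seriously reduce the yield and quality of the total DNA extracted from them. To address this problem, a total DNA extraction method suited to Zelkova plants was developed, and the purity and concentration of the extracted total DNA were assessed. The results show that the method effectively removes the interference of secondary substances with the DNA, that the sample DNA is of high quality and purity, and that it can be used for random-primer RAPD amplification and the subsequent genetic analyses.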

19.
Complementary strand-specific adenovirus DNA, either full length or from restriction enzyme cleavage fragments, was used to estimate the fractional representation and abundance of viral sequences in two adenovirus type 2 (Ad2)-transformed rat cell lines, A2F19 and A2T2C4. The reassociation method introduced is based on the linear relationship, after exhaustive hybridization, between the inverted fraction of hybrid DNA and the molar ratio of probe to cellular DNA in the reaction mixture. The amount of viral DNA in A2F19 cells represents 12 to 14% of the viral genome at a level of around seven copies per diploid cell equivalent. For the cell line A2T2C4, the pattern of integrated viral DNA sequences is more complex. With full-length Ad2 DNA strands as a probe, about 56% of the probe was represented in cellular DNA. When each of the four BamHI fragment strands of Ad2 DNA was used as a probe, the fraction of the viral DNA present also amounted to around 56% with one to five copies from different regions of the viral genome. The results demonstrate the advantage of using strand-specific viral DNA as a probe in reassociation analysis with denatured cell DNA. The method should be useful in any system in which complementary strand separation of viral DNA sequences can be achieved.

20.
DNA sequence representation without degeneracy   Total citations: 2, self-citations: 0, citations by others: 2
Yau SS  Wang J  Niknejad A  Lu C  Jin N  Ho YK 《Nucleic acids research》2003,31(12):3078-3080
Graphical representation of DNA sequences provides a simple way of viewing, sorting and comparing various gene structures. A new two-dimensional graphical representation method using a two-quadrant Cartesian coordinate system has been derived for mathematical denotation of DNA sequences. The two-dimensional graphical representation resolves sequence degeneracy and is mathematically proven to eliminate circuit formation. Given the x-projection and y-projection of any point on the graphical representation, the numbers of A, G, C and T from the beginning of the sequence to that point can be found. Compared with previous methods, this graphical representation is more in line with the conventional recognition of linear sequences by molecular biologists, and also provides a metaphor in two dimensions for local and global DNA sequence comparison.
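
The idea of a two-quadrant walk can be sketched as below, with each base adding a unit vector that points into either the first or the fourth quadrant. The particular vectors assigned to A, G, C and T, along with the function name, are assumptions for illustration; the abstract does not list them.

```python
import math

S = math.sqrt(3) / 2

# Assumed unit vectors: A and G point into the fourth quadrant,
# C and T into the first quadrant.
STEPS = {
    "A": (0.5, -S),
    "G": (S, -0.5),
    "C": (S, 0.5),
    "T": (0.5, S),
}


def graph_points(sequence):
    """Return the cumulative (x, y) position after each base."""
    x = y = 0.0
    points = []
    for base in sequence.upper():
        dx, dy = STEPS[base]  # raises KeyError for symbols other than A, C, G, T
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


print(graph_points("AT"))
```

Because every assumed step has a positive x-component, the curve always moves to the right and cannot loop back on itself, which is the behaviour the abstract describes as eliminating circuit formation.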
