Similar literature
 Found 20 similar articles (search time: 31 ms)
1.
Directed graphs of DNA sequences and their numerical characterization   Cited by: 1 (self-citations: 0, citations by others: 1)
In this paper we (1) introduce a directed graphical representation of DNA primary sequences; (2) describe a scheme that transforms the directed graph of a DNA sequence into an upper triangular matrix; and (3) investigate whether the existing matrix-based invariants of DNA sequences are compatible with the upper triangular matrix representation. The utility of our method is illustrated by an examination of the similarity between human and seven other species.

2.
We introduce a novel 2D graphical representation of DNA sequences based on the pairs of neighboring nucleotides (PNNs). Then we get the PNNs' distributions and obtain a y-M. The construction of the PNN-curve has some important advantages: (1) it avoids loss of information, and the PNN-curve standing for a DNA sequence does not overlap or intersect with itself; (2) the novel 2D representation is more sensitive. The utility of this method is illustrated by the examination of similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of eleven different species in Table 2.
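The PNN distribution mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the PNNs are the 16 overlapping dinucleotides of the sequence and that the distribution is their relative frequency; the function name `pnn_distribution` is hypothetical, and the construction of the PNN-curve itself is not reproduced because the abstract does not specify it.

```python
from collections import Counter
from itertools import product


def pnn_distribution(seq):
    """Relative frequency of each of the 16 overlapping nucleotide pairs.

    Assumption: pairs are taken with a sliding window of width 2 and
    pairs containing symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product("ACGT", repeat=2)
    }


if __name__ == "__main__":
    dist = pnn_distribution("AACGTTAC")
    for pair, freq in dist.items():
        if freq > 0:
            print(pair, round(freq, 3))
```

Two sequences could then be compared through any distance between their 16-component distributions, which is only one of several possible readings of the abstract.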

3.
In this paper, we first present a new concept of ‘weight’ for the 64 triplets and define a different weight for each kind of triplet. Then, we give a novel 2D graphical representation for DNA sequences, which transforms a DNA sequence into a plot set to facilitate quantitative comparisons of DNA sequences. Thereafter, together with a newly designed measure of similarity, we introduce a novel approach to the analysis of similarities/dissimilarities among DNA sequences. Finally, applications to the similarity/dissimilarity analysis of the complete coding sequences of β-globin genes of 11 species illustrate the utility of our newly proposed method.

4.
Graphical representation of DNA sequences is one of the most popular techniques for alignment-free sequence comparison. Here, we propose a new method for the feature extraction of DNA sequences represented by binary images, by estimating the similarity between DNA sequences using the frequency histograms of local bitmap patterns of images. The time complexity of our method is linear in the length of the DNA sequences, which is practical even when long sequences, such as whole genome sequences, are compared. We tested five distance measures for the estimation of sequence similarities, and found that the histogram intersection and Manhattan distance are the most appropriate ones for phylogenetic analyses.
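The two measures singled out in this abstract, histogram intersection and Manhattan distance, can be sketched as follows. This is an illustration on generic frequency histograms, not the authors' implementation; the function names and the normalization step are assumptions, and the extraction of local bitmap patterns from the binary images is not shown.

```python
def normalize(hist):
    """Scale a histogram of non-negative counts so that it sums to 1."""
    total = sum(hist)
    if total == 0:
        raise ValueError("histogram has no counts")
    return [h / total for h in hist]


def manhattan_distance(p, q):
    """Sum of absolute bin-wise differences (L1 distance)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(p, q))


def histogram_intersection(p, q):
    """Sum of bin-wise minima; equals 1 for identical normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(p, q))


if __name__ == "__main__":
    p = normalize([2, 1, 1])
    q = normalize([1, 1, 2])
    print(manhattan_distance(p, q))
    print(histogram_intersection(p, q))
```

For histograms that both sum to 1, the intersection equals 1 minus half the Manhattan distance, so the two measures rank sequence pairs identically in that setting.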

5.
Hartmut Wohlrab 《BBA》2005,1709(2):157-168
Protein sequence similarities and predicted structures identified 75 mitochondrial transport proteins (37 subfamilies) from among the 28,994 human RefSeq (NCBI) protein sequences. All except two have an E-value of less than 4e−05 with respect to the structure of the single-subunit bovine ADP/ATP carrier/carboxyatractyloside complex (bAAC/CAT) (mGenThreader program). The two 30-kDa exceptions have E-values of 0.003 and 0.005. Twenty-one have been functionally identified and belong to 14 subfamilies. A subset of subfamilies with sequence similarities for each of 12 different protein regions was identified. Many of the 12 protein regions for each tested protein yielded subsets of different sizes. The sum of subfamilies in the 12 subsets was lowest for the phosphate transport protein (PTP) and highest for aralar 1. Transmembrane sequences are the most distinctive. Sequence similarities are highest near the membrane center and matrix. They are highest for the region of transmembrane helices H1, H2 and connecting matrix loop 12 and smallest for transmembrane helices H3, H4 and loop 34. These sequence similarities and the predicted high similarities to the bAAC/CAT structure point to common structural/functional elements that could include subunit/subunit contact sites as they have been identified for PTP and AAC. The four-residue protein segment (SerLysGlnIle) of loop 12 is the only segment projecting into the center of the funnel-like structure of the bAAC/CAT. It is present in its entirety only in the AACs and with some replacements in the large Ca2+-modulated aspartate/glutamate transporters. Other transporters have deletions and replacements in this region of loop 12. This protein segment, with its central location and variation in size and composition, likely contributes to the substrate specificity of the transporters.

6.
In this paper, a bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structures, or proteins. The function provides a comprehensive comparison between sequences, in terms of both the Hamming distance of the two compared sequences and the corresponding location difference. Compared with existing methods for similarity analysis, the examination of similarities/dissimilarities shows that the proposed method, with a computational complexity of O(N), is effective for these three kinds of biological sequences and applies generally to all of them.

7.
In this paper, we propose a new method based on the 2-D graphical representation to analyze the similarity of biological sequences and classify protein secondary structure sequences. Instead of computing some characteristics from the distance matrix, the average area enclosed by the curve and the X axis is computed as a new invariant. The new method is tested on two sets: the coding sequences of 30 mitochondrial genes from NCBI and 12 protein secondary structure sequences. The similarity/dissimilarity and phylogenetic tree (dendrogram) of these sequences verify the validity of our method.

8.
Protein folding occurs in a very high dimensional phase space with an exponentially large number of states, and according to the energy landscape theory it exhibits a topology resembling a funnel. In this statistical approach, the folding mechanism is unveiled by describing the local minima in an effective one-dimensional representation. Other approaches based on potential energy landscapes address the hierarchical structure of local energy minima through disconnectivity graphs. In this paper, we introduce a metric to describe the distance between any two conformations, which also allows us to go beyond the one-dimensional representation and visualize the folding funnel in 2D and 3D. In this way it is possible to assess the folding process in detail, e.g., by identifying the connectivity between conformations and establishing the paths to reach the native state, in addition to regions where trapping may occur. Unlike the disconnectivity maps method, which is based on the kinetic connections between states, our methodology is based on structural similarities inferred from the new metric. The method was developed in a 27-mer protein lattice model, folded into a 3×3×3 cube. Five sequences were studied and distinct funnels were generated in an analysis restricted to conformations from the transition state to the native configuration. Consistent with the expected results from the energy landscape theory, folding routes can be visualized to probe different regions of the phase space, as well as to determine the difficulty in folding of the distinct sequences. Changes in the landscape due to mutations were visualized by comparing wild-type and mutated local minima in a single map, which serves to identify different trapping regions. The extension of this approach to more realistic models and its use in combination with other approaches are discussed.

9.
10.
Discerning significant relationships in small data sets remains challenging. We introduce here the Hamming distance matrix and show that it is a quantitative classifier of similarities among short time-series. Its elements are derived by computing a modified form of the Hamming distance of pairs of symbol sequences obtained from the original data sets. The values from the Hamming distance matrix are then amenable to statistical analysis. Examples from stem cell research are presented to illustrate different aspects of the method. The approach is likely to have applications in many fields.
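A Hamming distance matrix over short time series can be sketched as follows. The abstract specifies neither the symbol encoding nor the modification applied to the Hamming distance, so this sketch assumes a simple up/down/same encoding of consecutive differences and uses the plain Hamming distance; the names `symbolize`, `hamming` and `hamming_matrix` are hypothetical.

```python
def symbolize(series):
    """Encode each step of a series as 'U' (up), 'D' (down) or 'S' (same).

    Assumption: this encoding is illustrative; the abstract does not say
    how the symbol sequences are obtained from the original data.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        if cur > prev:
            symbols.append("U")
        elif cur < prev:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(series_list):
    """Symmetric matrix of pairwise Hamming distances between the series."""
    symbols = [symbolize(s) for s in series_list]
    return [[hamming(a, b) for b in symbols] for a in symbols]


if __name__ == "__main__":
    data = [[1, 2, 3, 2], [1, 2, 1, 0], [3, 2, 1, 2]]
    for row in hamming_matrix(data):
        print(row)
```

The matrix entries are plain integers, so they can be passed directly to clustering or other statistical procedures.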

11.
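Based on a 3D graphical representation of DNA sequences, each DNA sequence is characterized by a three-dimensional vector composed of the normalized leading eigenvalues of its L/L matrices. Using this method, the similarity of 11 species is analyzed on the basis of the first exon of the β-globin gene.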

12.

Background  

Protein-protein interaction (PPI) is essential to most biological processes. Abnormal interactions may have implications in a number of neurological syndromes. Given that the association and dissociation of protein molecules is crucial, computational tools capable of effectively identifying PPI are desirable. In this paper, we propose a simple yet effective method to detect PPI based on pairwise similarity and using only the primary structure of the protein. The PPI based on Pairwise Similarity (PPI-PS) method consists of a representation of each protein sequence by a vector of pairwise similarities against large subsequences of amino acids created by a shifting window which passes over concatenated protein training sequences. Each coordinate of this vector is typically the E-value of the Smith-Waterman score. These vectors are then used to compute the kernel matrix which will be exploited in conjunction with support vector machines.

13.
A genome space is a moduli space of genomes. In this space, each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between these two genomes. Currently, there is no method to represent genomes by a point in a space without losing biological information. Here, we propose a new graphical representation for DNA sequences. The key advance is that we can construct moment vectors from DNA sequences using this new graphical method and prove that the correspondence between moment vectors and DNA sequences is one-to-one. Using these moment vectors, we have constructed a novel genome space as a subspace of R^N. It allows us to show that SARS-CoV is most closely related to a coronavirus from the palm civet, not from a bird as initially suspected, and that the newly discovered human coronavirus HCoV-HKU1 is more closely related to SARS than to any other known member of the group 2 coronaviruses. Furthermore, we reconstructed the phylogenetic tree for 34 lentiviruses (including human immunodeficiency virus) based on their whole genome sequences. Our genome space will provide a powerful new tool for analyzing the classification of genomes and their phylogenetic relationships.

14.
In this paper, a novel 3D graphical representation of DNA sequences based on codons is proposed. Since no information is lost through overlapping or loops, this representation is useful for comparing different DNA sequences. The 3D curve is especially convenient for comparing DNA mutations. We then give a numerical characterization of DNA sequences based on the new 3D curve. This characterization facilitates quantitative, codon-based analysis of similarities/dissimilarities among DNA sequences.

15.
Apis mellifera jemenitica incorporates a few perceived subspecies that vary in their natural properties and farming qualities. The mitochondrial COI gene sequence (mtCOI) has not been used before for bee identification in the southwestern region of Saudi Arabia. The aim of this work was to study the morphometry and analyze the mtCOI of all collected bees. The nucleotide sequence of the mtCOI gene was analyzed. Similarity searches were performed, and distances were computed between each obtained DNA sequence and sequences available in GenBank. Morphometric analysis revealed close similarities among the studied bees, but these similarities differ from those indicated in earlier studies of the same region. Molecular studies revealed that the collected bees are similar to each other and to some other sequences found in GenBank, but that they differ from those previously reported in the same region, indicating the emergence of a new hybrid or subspecies.

16.
We present the results of two sets of experiments designed to express high-methionine proteins in transgenic seeds in three different plant species. In the first approach, two chimeric genes were constructed in which parts of the Arabidopsis 2S albumin gene 1 (AT2S1) were fused at different positions to a Brazil nut 2S albumin cDNA clone. Brazil nut 2S albumin was found to accumulate stably in transgenic Arabidopsis, Brassica napus, and tobacco seeds. In the second approach, methionine-enriched AT2S1 genes were constructed by deleting sequences encoding a region of the protein which is not highly conserved among 2S albumins of different species and replacing them with methionine-rich sequences. Introduction of the modified AT2S1 genes into three different plant species resulted in the accumulation of the methionine-enriched 2S albumins in all three species at levels reaching 1 to 2% of the total high salt-extractable seed protein.

17.
DNA sequence representation without degeneracy   Cited by: 2 (self-citations: 0, citations by others: 2)
Yau SS  Wang J  Niknejad A  Lu C  Jin N  Ho YK 《Nucleic acids research》2003,31(12):3078-3080
Graphical representation of DNA sequences provides a simple way of viewing, sorting and comparing various gene structures. A new two-dimensional graphical representation method using a two-quadrant Cartesian coordinate system has been derived for mathematical denotation of DNA sequences. The two-dimensional graphical representation resolves sequences’ degeneracy and is mathematically proven to eliminate circuit formation. Given the x-projection and y-projection of any point on the graphical representation, the number of A, G, C and T from the beginning of the sequence to that point can be found. Compared with previous methods, this graphical representation is more in line with the conventional recognition of linear sequences by molecular biologists, and also provides a metaphor in two dimensions for local and global DNA sequence comparison.

18.
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber.
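The Chaos Game Representation named in this abstract can be illustrated with a short sketch. It assumes one common corner layout (A, C, G, T at the corners of the unit square) and the usual midpoint rule; conventions differ between implementations, the names `cgr_points` and `cgr_grid` are hypothetical, and the DSSIM image distance and MDS step are not shown.

```python
def cgr_points(seq):
    """Chaos Game Representation points of a DNA sequence.

    Assumption: corners A=(0,0), C=(0,1), G=(1,1), T=(1,0), starting
    from the centre of the unit square. Symbols other than A, C, G, T
    are skipped, which joins the bases on either side of them.
    """
    corners = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in corners:
            continue
        cx, cy = corners[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid(seq, k):
    """Count CGR points in a 2^k x 2^k grid; each cell matches one k-mer."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    # The first k-1 points do not yet encode a full k-mer, so skip them.
    for x, y in cgr_points(seq)[k - 1:]:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    for row in cgr_grid("ACGTACGGTTAC", 2):
        print(row)
```

The resulting grid can be treated as a grayscale image, which is the kind of object an image distance would then compare between two sequences.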

19.
Rat liver nuclei contain a 29-nucleotide-long RNA (fr 3-RNA) which is transcribed from middle repetitive DNA sequences. By Southern analysis of restriction fragments of rat albumin and α-fetoprotein genomic clones, DNA sequences complementary to this RNA were detected on a 4.6 kbp EcoRI fragment located 600 bp downstream from the termination exon of the albumin gene and on a 2 kbp EcoRI-HindIII fragment located 10 kbp downstream from the restriction fragment containing the α-fetoprotein site. No sequence complementary to this RNA was found either in the introns or exons of both genes or in the regions extending 7 kbp upstream from the first albumin exon and 10 kbp upstream of the first α-fetoprotein exon. We concluded that sequences complementary to fr 3-RNA are present at the 3′-end flanking regions of the rat albumin and α-fetoprotein gene complexes.

20.