Similar literature
1.
Comparing DNA or protein sequences plays an important role in the functional analysis of genomes. Although many methods are available for sequence comparison, few retain the information content of the sequences. We propose a new approach, the Yau-Hausdorff method, which considers all translations and rotations when seeking the best match between the graphical curves of DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The Yau-Hausdorff method measures the similarity of DNA sequences using two tools: the Yau-Hausdorff distance and a graphical representation of DNA sequences. The graphical representation conserves all sequence information, and the Yau-Hausdorff distance is mathematically proven to be a true metric. Therefore, the proposed distance can precisely measure the similarity of DNA sequences. Phylogenetic analyses of DNA sequences using the Yau-Hausdorff distance show the accuracy and stability of our approach in the similarity comparison of DNA or protein sequences. This study demonstrates that the Yau-Hausdorff distance is a natural metric for DNA and protein sequences with a high level of stability. The approach can also be applied to similarity analysis of protein sequences via graphical representations, as well as to general two-dimensional shape matching.
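The abstract above builds on the Hausdorff distance between two curves. As a rough illustration only, the following Python sketch computes the ordinary (fixed-position) Hausdorff distance between two finite 2-D point sets. It does not minimize over translations and rotations, so it is not the Yau-Hausdorff distance itself; the function names and the sample points are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite 2-D point sets.

    Note: this is the plain Hausdorff distance for fixed positions.
    The Yau-Hausdorff method additionally searches over translations
    and rotations, which this sketch does not attempt.
    """
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical sample curves, given as lists of (x, y) points.
curve_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
curve_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]

if __name__ == "__main__":
    print(hausdorff(curve_a, curve_b))
```

The brute-force double loop costs O(|a|·|b|) distance evaluations, which is adequate for short curves but says nothing about the complexity claimed for the method in the abstract.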

2.
《BIOSILICO》2003,1(5):169-176
A solid definition and comprehensive graphical representation of biological networks are essential for efficient and accurate dissemination of information on biological models. Several proposals have already been made toward this aim. The best-known representation of this kind is the molecular interaction map, or ‘Kohn Map’. However, although the molecular interaction map is a well-defined and compact notation, it has several drawbacks, such as the difficulty of intuitively understanding temporal changes in reactions and additional complexities arising from particular graphical representations. This article proposes several improvements to the molecular interaction map, as well as the use of the ‘process diagram’ to help understand temporal sequences of reactions.

3.
4.
5.
We find that the traditional numerical characterizations of biological sequences, such as the E matrix, D/D matrix, L/L matrix and their "high order" matrices, cannot characterize biological sequences exactly, even though they are widely used to analyze them. Here, we propose a better numerical characterization for graphical representations of biological sequences, the C(i,j) matrix. It is associated with the curvature at every point and has several advantages: (1) It can characterize the graphical representations of DNA sequences exactly, because it overcomes the limitation of the traditional matrices. (2) If we choose an appropriate fixed point, we can make the elements of the C(i,j) matrix less than or equal to 1.

6.
Abstract

In this paper, we propose a new method based on a 2-D graphical representation to analyze the similarity of biological sequences and classify protein secondary structure sequences. Instead of computing characteristics from the distance matrix, the average area enclosed by the curve and the x-axis is computed as a new invariant. The new method is tested on two sets: the coding sequences of 30 mitochondrial genes from NCBI and 12 protein secondary structure sequences. The similarity/dissimilarity and phylogenetic tree (dendrogram) of these sequences verify the validity of our method.

7.
8.
Lu H  Zhu X  Liu H  Skogerbø G  Zhang J  Zhang Y  Cai L  Zhao Y  Sun S  Xu J  Bu D  Chen R 《Nucleic acids research》2004,32(16):4804-4811
The refinement and increasing throughput of protein interaction detection methods have given us a protein–protein interaction network in yeast. The challenge that comes with the network is to find better ways to make it accessible for biological investigation. Visualization would help extract meaningful biological information from the network. However, traditional ways of visualizing the network are unsuitable because of the large number of proteins. Here, we provide a simple but information-rich approach to visualization that integrates topological and biological information. In our method, topological information such as quasi-cliques or spoke-like modules of the network is extracted into a clustering tree, onto which biological information ranging from protein functional annotation to expression profile correlations can be annotated. We have developed software named PINC based on our approach. Compared with previous clustering methods, our clustering method ADJW performs well both in retaining a meaningful image of the protein interaction network and in enriching the image with biological information, and is therefore more suitable for visualizing the network.

9.
MOTIVATION: Cellular signaling networks are dynamic systems that propagate and process information and, ultimately, cause phenotypical responses. Understanding the circuitry of the information flow in cells is one of the keys to understanding complex cellular processes. The development of computational quantitative models is a promising avenue for attaining this goal. Not only does the analysis of simulation data based on the concentration variations of biological compounds yield information about systemic state changes, but it is also very helpful for obtaining information about the dynamics of signal propagation. RESULTS: This article introduces a new method for analyzing the dynamics of signal propagation in signaling pathways using Petri net theory. The method is demonstrated with the Ca(2+)/calmodulin-dependent protein kinase II (CaMKII) regulation network. The results constitute temporal information about signal propagation in the network, a simplified graphical representation of the network and of the signal propagation dynamics, and a characterization of some signaling routes as regulation motifs.

10.
A genome space is a moduli space of genomes. In this space, each point corresponds to a genome. The natural distance between two points in the genome space reflects the biological distance between the two genomes. Currently, there is no method to represent genomes as points in a space without losing biological information. Here, we propose a new graphical representation for DNA sequences. The key advance is that we can construct moment vectors from DNA sequences using this new graphical method and prove that the correspondence between moment vectors and DNA sequences is one-to-one. Using these moment vectors, we have constructed a novel genome space as a subspace of R^N. It allows us to show that SARS-CoV is most closely related to a coronavirus from the palm civet, not from a bird as initially suspected, and that the newly discovered human coronavirus HCoV-HKU1 is more closely related to SARS than to any other known member of the group 2 coronaviruses. Furthermore, we reconstructed the phylogenetic tree for 34 lentiviruses (including human immunodeficiency virus) based on their whole genome sequences. Our genome space will provide a powerful new tool for analyzing the classification of genomes and their phylogenetic relationships.

11.

Background  

Software tools that model and simulate the dynamics of biological processes and systems are becoming increasingly important. Some of these tools offer sophisticated graphical user interfaces (GUIs), which greatly enhance their acceptance by users. Such GUIs are based on symbolic or graphical notations used to describe, interact with, and communicate the developed models. Typically, these graphical notations are geared towards conventional biochemical pathway diagrams. They permit the user to represent the transport and transformation of chemical species and to define inhibitory and stimulatory dependencies. A critical weakness of existing tools is their lack of support for an integrative representation of transport, transformation, and biological information processing.

12.
DNA-binding proteins are crucial for various cellular processes and hence have become an important target for both basic research and drug development. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to establish an automated method for rapidly and accurately identifying DNA-binding proteins based on their sequence information alone. Because all biological species have developed from a very limited number of ancestral species, it is important to take evolutionary information into account in developing such a high-throughput tool. In view of this, a new predictor was proposed by incorporating evolutionary information into the general form of pseudo amino acid composition via the top-n-gram approach. Comparing the new predictor with existing methods via both a jackknife test and an independent data-set test showed that the new predictor outperformed its counterparts. It is anticipated that the new predictor may become a useful vehicle for identifying DNA-binding proteins. It has not escaped our notice that the novel approach of extracting evolutionary information into the formulation of statistical samples can be used to identify many other protein attributes as well.

13.
DNA sequence representation without degeneracy   Total citations: 2, self-citations: 0, citations by others: 2
Yau SS  Wang J  Niknejad A  Lu C  Jin N  Ho YK 《Nucleic acids research》2003,31(12):3078-3080
Graphical representation of DNA sequences provides a simple way of viewing, sorting and comparing various gene structures. A new two-dimensional graphical representation method using a two-quadrant Cartesian coordinate system has been derived for the mathematical denotation of DNA sequences. The two-dimensional graphical representation resolves sequence degeneracy and is mathematically proven to eliminate circuit formation. Given the x-projection and y-projection of any point on the graphical representation, the numbers of A, G, C and T from the beginning of the sequence to that point can be found. Compared with previous methods, this graphical representation is more in line with the conventional recognition of linear sequences by molecular biologists, and also provides a metaphor in two dimensions for local and global DNA sequence comparison.
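A two-quadrant representation of this kind can be sketched in Python as a walk in which every nucleotide contributes a unit step with a positive x-component. The abstract does not list the step vectors, so the assignments in `STEP` below are an assumption made for illustration (purines pointing into the fourth quadrant, pyrimidines into the first), and the function name `curve` is hypothetical.

```python
import math

S = math.sqrt(3) / 2

# Assumed unit step vectors (not given in the abstract): A and G point into
# the fourth quadrant, C and T into the first. Every step has dx > 0.
STEP = {
    "A": (0.5, -S),
    "G": (S, -0.5),
    "C": (S, 0.5),
    "T": (0.5, S),
}


def curve(seq):
    """Return the list of (x, y) points visited by the walk for `seq`."""
    x = y = 0.0
    points = []
    for base in seq.upper():
        dx, dy = STEP[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in curve("ATGGTGCACC"):
        print(point)
```

Because every step moves strictly to the right, the walk in this sketch can never return to a point it has already visited, which is one simple way to see why such a curve forms no circuits.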

14.
Yau SS  Yu C  He R 《DNA and cell biology》2008,27(5):241-250
Graphical representation of gene sequences provides a simple way of viewing, sorting, and comparing various gene structures. Here we report the first two-dimensional graphical representation for protein sequences. With this method, we constructed moment vectors for protein sequences and mathematically proved that the correspondence between moment vectors and protein sequences is one-to-one. Therefore, each protein sequence can be represented as a point in a map, which we call a protein map, and cluster analysis can be used to compare the points. Sixty-six proteins from five protein families were analyzed using this method. Our data showed that for proteins in the same family, the corresponding points in the map are close to each other. We also illustrate the efficiency of this approach by performing an extensive cluster analysis of the protein kinase C family. These results indicate that the protein map could be used to mathematically specify the similarity of two proteins and to predict properties of an unknown protein based on its amino acid sequence.

15.
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber.
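The Chaos Game Representation mentioned above can be sketched in a few lines of Python: starting from the center of the unit square, each nucleotide moves the current point halfway toward that nucleotide's corner. The corner assignment in `CORNERS` is one common convention and is an assumption here, since the abstract does not specify it; the function name `cgr_points` is hypothetical, and the DSSIM and MDS steps are not covered.

```python
# Assumed corner assignment for the unit square (a common convention,
# not stated in the abstract).
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq):
    """Return the CGR points of `seq`, one (x, y) pair per nucleotide."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("GATTACA"):
        print(point)
```

In this construction, the last k nucleotides determine which of the 2^k by 2^k sub-squares a point falls into, so binning the points on such a grid counts k-mer occurrences; this is the sense in which an image comparison can implicitly compare oligomer composition.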

16.
V. A. Namiot 《Biophysics》2008,53(4):256-259
A method is proposed for examining the surface of small biological objects such as macromolecules and their complexes. Based on the interference of low-energy electrons, it allows the construction of analogs of optical holograms, but with a resolution on the order of interatomic distances. A set of such holograms obtained at different electron energies can provide sufficient information for identifying the surface groups of the object. Thus the method can be used for fast reading of, e.g., polynucleotide sequences.

17.
M Nanard  J Nanard 《Biochimie》1985,67(5):429-432
Learning methods developed by artificial intelligence research teams are very efficient for biological sequence analysis, but they need to run on large computers accessed through terminals. These computers are interfaced with standard displays, which entails long and unpleasant alphanumerical data handling. The "biological work station" is a personal computer with a color graphic screen that provides a user-friendly interface to the artificial intelligence learning programs running on large computers. It provides biologists with a convenient graphical tool for sequence analysis, built with efficient man-machine communication methods such as multiple windows, icons and mouse selection. It allows the biologist to edit and display sequences in an efficient and natural way, showing the data and the results of the learning programs directly in color pictures.

18.
We introduce a new representation and feature extraction method for biological sequences. Named bio-vectors (BioVec) to refer to biological sequences in general with protein-vectors (ProtVec) for proteins (amino-acid sequences) and gene-vectors (GeneVec) for gene sequences, this representation can be widely used in applications of deep learning in proteomics and genomics. In the present paper, we focus on protein-vectors that can be utilized in a wide array of bioinformatics investigations such as family classification, protein visualization, structure prediction, disordered protein identification, and protein-protein interaction prediction. In this method, we adopt artificial neural network approaches and represent a protein sequence with a single dense n-dimensional vector. To evaluate this method, we apply it in classification of 324,018 protein sequences obtained from Swiss-Prot belonging to 7,027 protein families, where an average family classification accuracy of 93%±0.06% is obtained, outperforming existing family classification methods. In addition, we use ProtVec representation to predict disordered proteins from structured proteins. Two databases of disordered sequences are used: the DisProt database as well as a database featuring the disordered regions of nucleoporins rich with phenylalanine-glycine repeats (FG-Nups). Using support vector machine classifiers, FG-Nup sequences are distinguished from structured protein sequences found in Protein Data Bank (PDB) with a 99.8% accuracy, and unstructured DisProt sequences are differentiated from structured DisProt sequences with 100.0% accuracy. These results indicate that by only providing sequence data for various proteins into this model, accurate information about protein structure can be determined. Importantly, this model needs to be trained only once and can then be applied to extract a comprehensive set of information regarding proteins of interest. Moreover, this representation can be considered as pre-training for various applications of deep learning in bioinformatics. The related data is available at Life Language Processing Website: http://llp.berkeley.edu and Harvard Dataverse: http://dx.doi.org/10.7910/DVN/JMFHTN.

19.
20.