Similar literature
 20 similar documents found
1.
In this paper, we introduce a probabilistic measure for computing the similarity between two biological sequences without alignment. The computation of the similarity measure is based on the Kullback-Leibler divergence of two constructed Markov models. We first validate the method by clustering nine chromosomes from three species. Second, we give the results of a similarity search based on our new method. Finally, we apply the measure to the construction of a phylogenetic tree of 48 HEV genome sequences. Our results indicate that the weighted relative entropy is an efficient and powerful alignment-free measure for the analysis of sequences at the genomic scale.
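As a rough illustration of the idea in this abstract, the sketch below estimates a first-order Markov model from each sequence and computes a relative entropy between the two transition matrices, weighted by the base frequencies of the first sequence. The model order, the additive smoothing, the weighting, and all function names are assumptions made for illustration; the abstract does not specify them, so this should not be read as the authors' exact measure.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def transition_probs(seq, pseudocount=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = Counter(zip(seq, seq[1:]))
    probs = {}
    for a in ALPHABET:
        row_total = sum(counts[(a, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
        for b in ALPHABET:
            probs[(a, b)] = (counts[(a, b)] + pseudocount) / row_total
    return probs


def state_freqs(seq, pseudocount=1.0):
    """Estimate smoothed base frequencies, used here as weights (an assumption)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    return {a: (counts[a] + pseudocount) / total for a in ALPHABET}


def relative_entropy(seq_p, seq_q):
    """Weighted Kullback-Leibler divergence between the two transition models."""
    p = transition_probs(seq_p)
    q = transition_probs(seq_q)
    w = state_freqs(seq_p)
    return sum(
        w[a] * p[(a, b)] * math.log(p[(a, b)] / q[(a, b)])
        for a, b in product(ALPHABET, repeat=2)
    )


def symmetric_distance(seq_a, seq_b):
    """Symmetrize the divergence so it can be used as a pairwise dissimilarity."""
    return 0.5 * (relative_entropy(seq_a, seq_b) + relative_entropy(seq_b, seq_a))
```

A matrix of `symmetric_distance` values over all sequence pairs could then be passed to any standard clustering or tree-building routine.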

2.
A complete texture image retrieval system includes two techniques: texture feature extraction and similarity measurement. Specifically, similarity measurement is a key problem for texture image retrieval study. In this paper, we present an effective similarity measurement formula. The MIT vision texture database, the Brodatz texture database, and the Outex texture database were used to verify the retrieval performance of the proposed similarity measurement method. Dual-tree complex wavelet transform and nonsubsampled contourlet transform were used to extract texture features. Experimental results show that the proposed similarity measurement method achieves better retrieval performance than some existing similarity measurement methods.

3.
One of the major functions of vision is to allow for an efficient and active interaction with the environment. In this study, we investigate the capacity of human observers to extract visual information from observation of their own actions, and those of others, from different viewpoints. Subjects discriminated the size of objects by observing a point-light movie of a hand reaching for an invisible object. We recorded real reach-and-grasp actions in three-dimensional space towards objects of different shape and size, to produce two-dimensional 'point-light display' movies, which were used to measure size discrimination for reach-and-grasp motion sequences, release-and-withdraw sequences and still frames, all in egocentric and allocentric perspectives. Visual size discrimination from action was significantly better in egocentric than in allocentric view, but only for reach-and-grasp motion sequences: release-and-withdraw sequences or still frames derived no advantage from egocentric viewing. The results suggest that the system may have access to an internal model of action that helps calibrate the visual sense of size for an accurate grasp.

4.
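A mathematical model of unsupported human body motion
Based on the Hanavan [1] human body model, this paper establishes a general mathematical model of human motion in the air and simulates several actual movements using numerical methods for differential equations. The results show that the model can reflect the process of unsupported human motion in the air and can be applied to the study of movement in astronautics, gymnastics, acrobatics, diving, dance, and similar activities.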

5.
We introduce a new approach to compare DNA primary sequences. The core of our method is a new measure of pairwise distances among sequences. Using the primitive discrimination substrings of sequences S and Q, a discrimination measure DM(S, Q) is defined for their similarity analysis. The proposed method does not require multiple alignments and is fully automatic. To illustrate its utility, we construct phylogenetic trees on two independent data sets. The results indicate that the method is efficient and powerful.

6.
We develop a novel method of assessing the similarity between two biological sequences without the need for alignment. The proposed method uses the free energy of nearest-neighbor interactions as a simple measure of dissimilarity. It is used to perform a search for similarities of a query sequence against three complex datasets. The sensitivity and selectivity are computed and evaluated, and the performance of the proposed distance measure is compared. Real data analysis shows that it is a very efficient, sensitive, and highly selective algorithm for comparing large datasets of DNA sequences.

7.

Background  

The rapid burgeoning of available protein data makes the use of clustering within families of proteins increasingly important. The challenge is to identify subfamilies of evolutionarily related sequences. This identification reveals phylogenetic relationships, which provide prior knowledge to help researchers understand biological phenomena. A good evolutionary model is essential to achieve a clustering that reflects the biological reality, and an accurate estimate of protein sequence similarity is crucial to the building of such a model. Most existing algorithms estimate this similarity using techniques that are not necessarily biologically plausible, especially for hard-to-align sequences such as proteins with different domain structures, which cause many difficulties for the alignment-dependent algorithms. In this paper, we propose a novel similarity measure based on matching amino acid subsequences. This measure, named SMS for Substitution Matching Similarity, is especially designed for application to non-aligned protein sequences. It allows us to develop a new alignment-free algorithm, named CLUSS, for clustering protein families. To the best of our knowledge, this is the first alignment-free algorithm for clustering protein sequences. Unlike other clustering algorithms, CLUSS is effective on both alignable and non-alignable protein families. In the rest of the paper, we use the term "phylogenetic" in the sense of "relatedness of biological functions".

8.
MOTIVATION: Many bioinformatics data resources not only hold data in the form of sequences, but also as annotation. In the majority of cases, annotation is written as scientific natural language: this is suitable for humans, but not particularly useful for machine processing. Ontologies offer a mechanism by which knowledge can be represented in a form capable of such processing. In this paper we investigate the use of ontological annotation to measure the similarities in knowledge content or 'semantic similarity' between entries in a data resource. These allow a bioinformatician to perform a similarity measure over annotation in an analogous manner to those performed over sequences. A measure of semantic similarity for the knowledge component of bioinformatics resources should afford a biologist a new tool in their repertoire of analyses. RESULTS: We present the results from experiments that investigate the validity of using semantic similarity by comparison with sequence similarity. We show a simple extension that enables a semantic search of the knowledge held within sequence databases. AVAILABILITY: Software available from http://www.russet.org.uk.

9.
Due to the increased availability of digital human models, knowledge of human movement has become important in the product design process. If human motion can be derived rapidly as design parameters change, a developer can determine the optimal parameters. For example, the optimal design of the door panel of an automobile can be obtained so that a human operator can perform the easiest ingress and egress motion. However, acquiring motion data with existing methods either yields unrealistic motion or requires a great amount of time. This not only increases product development time, but also makes the overall design process inefficient. To solve such problems, this research proposes an algorithm to rapidly and accurately predict full-body human motion using an artificial neural network (ANN) and a motion database, as the design parameters are varied. To achieve this goal, this study refers to the processes behind human motor learning. According to previous research, humans generate new motion based on past motion experience when they encounter new environments. Based on this principle, we constructed a motion capture database. To construct the database, motion capture experiments were performed in various environments using an optical motion capture system. To generate full-body human motion using this data, a generalized regression neural network (GRNN) was used. The proposed algorithm not only guarantees rapid and accurate results but also overcomes the ambiguity of the human motion objective function, which has been pointed out as a limitation of optimization-based research. Statistical criteria were utilized to confirm the similarity between the generated motion and actual human motion. Our research provides the basis for a rapid motion prediction algorithm that can include a variety of environmental variables.
This research contributes to an increase in the usability of digital human models, and it can be applied to various research fields.

10.
Characterizing enzyme sequences and identifying their active sites is a very important task. The current experimental methods are too expensive and labor intensive to handle the rapidly accumulating protein sequence and structure data. Thus accurate, high-throughput in silico methods for identifying catalytic residues and predicting enzyme function are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method using sequence clustering and an information-theoretic approach. The first step is to perform a sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label). The clustering analysis is used to handle the problem of widely varying sequence similarity levels in enzyme sequences. The clustering analysis constructs a sequence graph where nodes are enzyme sequences and edges connect pairs of sequences with a certain degree of sequence similarity, and uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. Then amino acid subsequences in the common shared regions are aligned, and an information-theoretic approach called the aggregated column related scoring scheme is applied to highlight potential active sites in enzyme sequences. The aggregated information content scoring scheme is shown to highlight residues of active sites effectively. The proposed method of combining the clustering and the aggregated information content scoring methods was successful in highlighting known catalytic sites in enzymes of Escherichia coli K12 in terms of the Catalytic Site Atlas database.
Our method is shown to be not only accurate in predicting potential active sites in the enzyme sequences but also computationally efficient, since the clustering approach utilizes two graph properties that can be computed in time linear in the number of edges in the sequence graph and the computation of mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites of enzyme sequences from many genome projects.

11.
We have developed a pattern comparative method for identifying functionally important motifs in protein sequences. The essence of most standard pattern comparative methods is a comparison of patterns occurring in different sequences using an optimized weight matrix. In contrast, our approach is based on a measure of similarity among all the candidate motifs within the same sequence. This method may prove to be particularly efficient for proteins encoding the same biochemical function, but with different primary sequences, and when tertiary structure information from one or more sequences is available. We have applied this method to a special class of zinc-binding enzymes known as endopeptidases.

12.

Background  

Sequence alignment is one of the most important techniques for analyzing biological systems. However, alignment methods are not yet complete and must be developed further to become more accurate. In particular, the alignment of homologous sequences with low sequence similarity is not at a satisfactory level. The usual methods for aligning protein sequences in recent years use an empirically determined measure. As an example, a measure is usually defined by a combination of two quantities: (1) the sum of substitution scores between two residue segments, and (2) the sum of gap penalties in insertion/deletion regions. Such a measure is determined on the assumption that there is no intersite correlation in the sequences. In this paper, we improve the alignment by taking the correlation of consecutive residues into account.
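The conventional measure described in this abstract, a sum of substitution scores plus a sum of gap penalties, can be sketched as follows. The toy substitution function, the linear gap penalty of -2, and the function names are hypothetical choices for illustration only. Each column is scored independently of its neighbors, which is the "no intersite correlation" assumption the abstract sets out to relax.

```python
def toy_substitution(x, y):
    """Hypothetical substitution score: +1 for identical residues, -1 otherwise."""
    return 1 if x == y else -1


def score_alignment(row_a, row_b, substitution=toy_substitution, gap_penalty=-2):
    """Score a gapped pairwise alignment column by column.

    row_a and row_b are the two aligned rows, with '-' marking a gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap_penalty          # insertion/deletion column
        else:
            total += substitution(x, y)   # substitution column
    return total
```

In practice the substitution function would be a lookup into an empirically derived matrix, and gap penalties are often affine rather than linear.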

13.
Cluster analysis has proven to be a useful tool for investigating the association structure among genes in a microarray data set. There is a rich literature on cluster analysis and various techniques have been developed. Such analyses depend heavily on an appropriate (dis)similarity measure. In this paper, we introduce a general clustering approach based on the confidence interval inferential methodology, which is applied to gene expression data of microarray experiments. Emphasis is placed on data with low replication (three or five replicates). The proposed method makes more efficient use of the measured data and avoids the subjective choice of a dissimilarity measure. This new methodology, when applied to real data, provides an easy-to-use bioinformatics solution for the cluster analysis of microarray experiments with replicates (see the Appendix). Even though the method is presented under the framework of microarray experiments, it is a general algorithm that can be used to identify clusters in any situation. The method's performance is evaluated using simulated and publicly available data sets. Our results also clearly show that our method is not an extension of the conventional clustering method based on correlation or Euclidean distance.

14.
Summary. This paper addresses two questions. 1. Does Schistocerca gregaria detect edges which are defined solely by velocity-contrast, that is, by the difference in the image speeds generated by an object and its background when the locust moves? 2. Is the locust's ability to measure the distance of a target by motion parallax independent of the relative motion between target and background? A locust walking on a circular platform was surrounded by a stationary cylinder which was lined with an irregular texture. Against this background, the insect viewed 3 stationary, equidistant targets. One target was black, one grey and the last was textured like the cylinder. Peers and jumps were aimed preferentially at the textured and black targets, showing that targets can be detected by virtue of their velocity-contrast with the background. When textured targets were wide, jumps were seen to be aimed at the targets' edge. To assess whether velocity-contrast between target and background distorts distance-estimates, we used jump-velocity as a measure of apparent distance and examined how it varied with different arrangements of target and background. When a textured background is close to a target or the target is very wide, velocity-contrast is small. The locust's jump-velocity is then 10% greater than when velocity-contrast is increased by making the background distant or the target narrow. This suggests that the locust is efficient at separating signals encoding absolute motion from those encoding relative motion.

15.
Here we propose a weighted measure for the similarity analysis of DNA sequences. It is based on LZ complexity and the (0,1) characteristic sequences of DNA sequences. This weighted measure enables biologists to extract similarity information from biological sequences according to their requirements. For example, with this weighted measure, one can obtain either the full similarity information or a similarity analysis from a given biological aspect. Moreover, the length of the DNA sequences is not problematic. The application of the weighted measure to the similarity analysis of β-globin genes from nine species shows its flexibility.
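The two ingredients named in this abstract can be sketched as below: a (0,1) characteristic sequence of a DNA string, and the Lempel-Ziv complexity (the number of components in the exhaustive history) of a string. The choice of base classes and the function names are assumptions for illustration; the abstract does not give the weighting formula, so none is shown here.

```python
# Commonly used two-class groupings of the four bases (an assumed choice;
# the abstract does not say which groupings the authors use).
BASE_CLASSES = {
    "purine": "AG",   # purine (A, G) vs pyrimidine (C, T)
    "amino": "AC",    # amino (A, C) vs keto (G, T)
    "weak": "AT",     # weak (A, T) vs strong (C, G) hydrogen bonding
}


def characteristic_sequence(dna, ones):
    """Map each base to '1' if it belongs to `ones`, otherwise to '0'."""
    return "".join("1" if base in ones else "0" for base in dna)


def lz_complexity(s):
    """Count the components of the Lempel-Ziv (1976) exhaustive history of s.

    Each component is the shortest substring starting at the current position
    that cannot be copied from the text preceding its last symbol.
    """
    i, components, n = 0, 0, len(s)
    while i < n:
        k = 1
        # Grow the candidate while it is reproducible from earlier text.
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        components += 1
        i += k
    return components
```

For example, `lz_complexity(characteristic_sequence(dna, BASE_CLASSES["purine"]))` gives the complexity of one characteristic sequence; per-class values like this are the kind of quantity a weighted combination could draw on.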

16.
Atomic level molecular similarity and diversity studies have gained considerable importance through their wide application in Bioinformatics and Chemo-informatics for drug design. The availability of large volumes of data on chemical compounds requires new methodologies for efficient and effective searching of its archives in less time with optimal computational power. We describe an alphabetic algorithm for similarity searching based on atom-atom bonding preference for ligands. We represented 170 cyclin-dependent kinase 2 inhibitors using strings of pre-defined alphabets for searching using known protein sequence alignment tools. Thus, a common pattern was extracted using this set of compounds for database searching to retrieve similar active compounds. The area under the receiver operating characteristic (ROC) curve was used for the discrimination of similar and dissimilar compounds in the databases. An average retrieval rate of about 60% is obtained in cross-validation using the home-grown dataset and the directory of useful decoys (DUD, formally known as the ZINC database) data. This will help in the effective retrieval of similar compounds using database search.

17.
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool to query databases by sequence similarity search. These tools allow users to retrieve sequences by shared keyword or by shared similarity, with many public web servers available. However, with the increasingly large datasets available it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. This server therefore combines the two ways to search sequence databases: similarity and keyword.

18.
To unscramble the relationship between protein function and protein structure, it is essential to assess protein similarity from different aspects. Although many methods have been proposed for protein structure alignment or comparison, alternative similarity measures are still in strong demand because of the need for fast screening and querying in large-scale structure databases. In this paper, we first formulate a novel representation of a protein structure, i.e., Feature Sequence of Surface (FSS). Then, a new scoring scheme is developed to measure the similarity between two representations. To verify the proposed method, numerical experiments are conducted on four different protein data sets. We also classify SARS coronavirus to verify the effectiveness of the new method. Furthermore, preliminary results of fast classification of the whole CATH v2.5.1 database based on the new macrostructure similarity are given as a pilot study. We demonstrate that the proposed approach to measuring the similarities between protein structures is simple to implement, computationally efficient, and surprisingly fast. In addition, the method itself provides a new and quantitative tool to view a protein structure.

19.
20.
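In studies of DNA sequence similarity, the commonly used dynamic programming algorithms lack a theoretical basis for the gap penalty function and are therefore subjective, which leads to differing results. This paper proposes a DNA sequence similarity measure based on the DTW (Dynamic Time Warping) distance that addresses this problem. DNA sequences are converted into time series through a graphical representation, and the DTW distance is then computed to measure sequence similarity and characterize DNA sequence properties, yielding a method for comparing DNA sequence similarity. The method is used to compare and analyze the similarity of seven Buthus martensii Karsch neurotoxin gene sequences, which verifies the validity and accuracy of the measure.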
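The pipeline in this abstract (DNA sequence to time series, then DTW distance) can be sketched as follows. The abstract does not say which graphical representation is used, so the purine/pyrimidine walk in `dna_walk` is a hypothetical stand-in, and the function names are likewise assumptions. `dtw_distance` is the standard dynamic-programming recurrence with an absolute-difference local cost and no window constraint.

```python
def dna_walk(seq):
    """Hypothetical DNA-to-time-series mapping: step +1 for a purine (A, G)
    and -1 for a pyrimidine (C, T), recording the running level."""
    level, series = 0, []
    for base in seq:
        level += 1 if base in "AG" else -1
        series.append(level)
    return series


def dtw_distance(x, y):
    """Dynamic Time Warping distance between two numeric series."""
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] is the minimal cumulative cost of aligning x[:i] with y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # x advances alone
                                 d[i][j - 1],      # y advances alone
                                 d[i - 1][j - 1])  # both advance together
    return d[n][m]
```

Because the warping path may stretch either series, no gap penalty function has to be chosen, which is the property the abstract emphasizes. Two sequences would be compared with `dtw_distance(dna_walk(a), dna_walk(b))`.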
