Found 20 similar articles (search time: 15 ms)
1.
This article presents a new method for the comparison of multiple macromolecular sequences. It is based on a hierarchical sequence synthesis procedure that does not require any a priori knowledge of the molecular structure of the sequences or the phylogenetic relations among the sequences. It differs from the existing methods in that it is capable of: (i) generating a statistical-structural model of the sequences through a synthesis process that detects homologous groups of the sequences, and (ii) aligning the sequences while the taxonomic tree of the sequences is being constructed in one single phase. It produces superior results when compared with some existing methods.
2.
Protein sequence alignment has become an essential task in modern molecular biology research. A number of alignment techniques have been documented in the literature, and their corresponding tools are available as freeware and commercial software. Choosing and using these tools for sequence alignment, including the complete interpretation of alignment results, is often considered non-trivial by end-users with limited skill in bioinformatics algorithm development. Here, we compare sequence alignment techniques based on dynamic programming (N-W, S-W) and heuristics (LFASTA, BL2SEQ) on four sets of sequence data for educational purposes. The analysis suggests that heuristics-based methods are faster than dynamic programming methods.
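As a rough illustration of the dynamic programming family mentioned in this abstract, the sketch below computes a Needleman-Wunsch (N-W) global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration; they are not taken from the article.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Illustrative scoring scheme (assumed, not from the article):
    linear gap penalty, fixed match/mismatch scores.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: a[:i] aligned against an empty prefix of b.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

The table fill costs time proportional to the product of the two sequence lengths, which is the cost that heuristic tools avoid by examining only promising regions.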
3.
General methods of sequence comparison  Total citations: 9, self-citations: 0, citations by others: 9
Michael S. Waterman 《Bulletin of mathematical biology》1984,46(4):473-500
Mathematical methods for comparison of nucleic acid sequences are reviewed. There are two major methods of sequence comparison: dynamic programming and a method referred to here as the regions method. The problem types discussed are comparison of two sequences, location of long matching segments, efficient database searches and comparison of several sequences. This work was supported by a grant from the System Development Foundation.
4.
A comprehensive comparison of multiple sequence alignment programs.  Total citations: 31, self-citations: 4, citations by others: 31
In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.
5.
6.
A comparison of multiple trait selection methods in the mouse  Total citations: 2, self-citations: 0, citations by others: 2
7.
Protein sequence comparison: methods and significance  Total citations: 1, self-citations: 0, citations by others: 1
8.
One of the most endangered assemblages of species in Europe is insects associated with old trees. For that reason there is a need to develop methods to survey this fauna. This study aims at comparing three methods – window trapping, pitfall trapping and wood mould sampling – to assess species richness and composition of the saproxylic beetle fauna in living, hollow oaks. We have used these methods at the same site, and to a large extent in the same trees. Useful information was obtained from all methods, but they partially target different assemblages of species. Window trapping collected the highest number of species. Pitfall trapping collected beetles associated with tree hollows which rarely are collected by window traps and therefore it is profitable to combine these two methods. As wood mould sampling is the cheapest method to use, indicator species should preferably be chosen among species which are efficiently collected with this method.
9.
The most popular way of comparing the performance of multiple sequence alignment programs is to use empirical testing on sets of test sequences. Several such test sets now exist, each with potential strengths and weaknesses. We apply several different alignment packages to 6 benchmark datasets, and compare their relative performances. HOMSTRAD, a collection of alignments of homologous proteins, is regularly used as a benchmark for sequence alignment though it is not designed as such, and lacks annotation of reliable regions within the alignment. We introduce this annotation into HOMSTRAD using protein structural superposition. Results on each database show that method performance is dependent on the input sequences. Alignment benchmarks are regularly used in combination to measure performance across a spectrum of alignment problems. Through combining benchmarks, it is possible to detect whether a program has been over-optimised for a single dataset, or alignment problem type.
10.
11.
SUMMARY: MaxBench is a web-based system available for evaluating the results of sequence and structure comparison methods, based on the SCOP protein domain classification. The system makes it easy for developers to both compare the overall performance of their methods to standard algorithms and investigate the results of individual comparisons. AVAILABILITY: http://www.sanger.ac.uk/Users/lp1/MaxBench/
12.
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into one single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and Balibase. We also show that on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from http://www.tcoffee.org/.
13.
Vihinen Mauno; Euranto Antti; Luostarinen Petri; Nevalainen Olli 《Bioinformatics (Oxford, England)》1992,8(1):35-38
An algorithm for multiple sequence comparison was implemented in FORTRAN 77 for VAX/VMS in GCG-compatible format. The MULTICOMP program package includes several procedures with which one query sequence can be compared simultaneously to several DNA, RNA or amino acid sequences. The same technique was also introduced for comparing propensities of secondary structural features, which can be predicted on the basis of amino acid sequences. The technique has been applied to a wide range of sequence and structural analyses.
14.
A comparison of signal sequence prediction methods using a test set of signal peptides  Total citations: 7, self-citations: 0, citations by others: 7
We describe the creation of a test set containing secretory and non-secretory proteins. Five existing prediction programs for signal sequences and their cleavage sites are compared on the basis of this test set: SPScan, SigCleave, SignalP V1.1, SignalP V2.0.b2-HMM and SignalP V2.0.b2-NN.
15.
MOTIVATION: Studies of efficient and sensitive sequence comparison methods are driven by a need to find homologous regions of weak similarity between large genomes. RESULTS: We describe an improved method for finding similar regions between two sets of DNA sequences. The new method generalizes existing methods by locating word matches between sequences under two or more word models and extending word matches into high-scoring segment pairs (HSPs). The method is implemented as a computer program named DDS2. Experimental results show that DDS2 can find more HSPs by using several word models than by using one word model. AVAILABILITY: The DDS2 program is freely available for academic use in binary code form at http://bioinformatics.iastate.edu/aat/align/align.html and in source code form from the corresponding author.
16.
Chen Z 《Bioinformatics (Oxford, England)》2003,19(18):2456-2460
MOTIVATION: Comprehensive performance assessment is important for improving sequence database search methods. Sensitivity, selectivity and speed are three major yet usually conflicting evaluation criteria. The average precision (AP) measure aims to combine the sensitivity and selectivity features of a search algorithm. It can be easily visualized and extended to analyze results from a set of queries. Finally, the time-AP plot can clearly show the overall performance of different search methods. RESULTS: Experiments are performed based on the SCOP database. Popular sequence comparison algorithms, namely Smith-Waterman (SSEARCH), FASTA, BLAST and PSI-BLAST are evaluated. We find that (1) the low-complexity segment filtration procedure in BLAST actually harms its overall search quality; (2) AP scores of different search methods are approximately proportional to the logarithm of search time; and (3) homologs in protein families with many members tend to be more obscure than those in small families. This measure may be helpful for developing new search algorithms and can guide researchers in selecting the most suitable search methods. AVAILABILITY: Test sets and source code of this evaluation tool are available upon request.
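The abstract does not spell out how AP is computed. The sketch below uses the common information-retrieval definition (the mean of the precision values at the rank of each true hit), which may differ in detail from the measure used in the article. The function name and the `total_homologs` parameter are hypothetical.

```python
from typing import Optional, Sequence


def average_precision(ranked_is_homolog: Sequence[bool],
                      total_homologs: Optional[int] = None) -> float:
    """Average precision of one ranked hit list.

    ranked_is_homolog[i] is True when the hit at rank i+1 is a true homolog.
    total_homologs (hypothetical parameter) is the number of true homologs in
    the database; if omitted, only the homologs present in the list count.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_hit in enumerate(ranked_is_homolog, start=1):
        if is_hit:
            hits += 1
            # Precision measured at the rank of this true hit.
            precision_sum += hits / rank
    denom = hits if total_homologs is None else total_homologs
    return precision_sum / denom if denom else 0.0


if __name__ == "__main__":
    # True homologs found at ranks 1 and 3.
    print(average_precision([True, False, True]))
```

Passing `total_homologs` penalizes a search that misses homologs entirely, which is how a single number can reflect both sensitivity and selectivity.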
17.
Dan Gusfield 《Bulletin of mathematical biology》1993,55(1):141-154
Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the value of the optimal alignment.
18.
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
19.
A comparison of somatotype methods  Total citations: 8, self-citations: 0, citations by others: 8
In order to compare Parnell's and Heath's somatotype methods, the authors independently somatotyped a series of 59 adult male and 61 adult female subjects, (1) using the criteria of Heath's method, (2) using the criteria of Parnell's method, and (3) taking into consideration tentatively adapted Parnell criteria in addition to Heath's criteria. The authors conclude that when they use similar rating criteria their mean differences are smaller, their overall correlations are similar, and their percentage agreements to a half-unit are higher (96%) than for comparisons reported by other investigators. The study considers the potentially important relationships of measurements of subcutaneous fat to ratings of the first component. The similarity of distributions of subcutaneous fat measurements and of first component ratings in selected samples suggests important interrelationships among ratings of the first component, height/weight ratios and subcutaneous fat measurements. The authors feel: (1) that Parnell's method fails to modify the basic weaknesses in Sheldon's somatotype method; and (2) that analyses of the anthropometric data basic to Parnell's method, if guided by the criteria of Heath's method, will further objectify and simplify Heath's method, will improve agreement among independent raters, and will increase the usefulness of somatotyping as a research instrument.
20.
W R Taylor 《Protein engineering》1988,2(2):77-86