共查询到20条相似文献,搜索用时 15 毫秒
1.
Efficient sequence alignment algorithms 总被引:3,自引:0,他引:3
M S Waterman 《Journal of theoretical biology》1984,108(3):333-337
Sequence alignments are becoming more important with the increase of nucleic acid data. Fitch and Smith have recently given an example where multiple insertion/deletions (rather than a series of adjacent single insertion/deletions) are necessary to achieve the correct alignment. Multiple insertion/deletions are known to increase computation time from O(n2) to O(n3) although Gotoh has presented an O(n2) algorithm in the case the multiple insertion/deletion weighting function is linear. It is argued in this paper that it could be desirable to use concave weighting functions. For that case, an algorithm is derived that is conjectured to be O(n2). 相似文献
2.
Bayesian adaptive sequence alignment algorithms 总被引:2,自引:1,他引:2
The selection of a scoring matrix and gap penalty parameters continues to
be an important problem in sequence alignment. We describe here an
algorithm, the 'Bayes block aligner, which bypasses this requirement.
Instead of requiring a fixed set of parameter settings, this algorithm
returns the Bayesian posterior probability for the number of gaps and for
the scoring matrices in any series of interest. Furthermore, instead of
returning the single best alignment for the chosen parameter settings, this
algorithm returns the posterior distribution of all alignments considering
the full range of gapping and scoring matrices selected, weighing each in
proportion to its probability based on the data. We compared the Bayes
aligner with the popular Smith-Waterman algorithm with parameter settings
from the literature which had been optimized for the identification of
structural neighbors, and found that the Bayes aligner correctly identified
more structural neighbors. In a detailed examination of the alignment of a
pair of kinase and a pair of GTPase sequences, we illustrate the
algorithm's potential to identify subsequences that are conserved to
different degrees. In addition, this example shows that the Bayes aligner
returns an alignment-free assessment of the distance between a pair of
sequences.
相似文献
3.
Michael S. Waterman 《Bulletin of mathematical biology》1994,56(4):743-767
Recently algorithms for parametric alignment (Watermanet al., 1992,Natl Acad. Sci. USA
89, 6090–6093; Gusfieldet al., 1992,Proceedings of the Third Annual ACM-SIAM Discrete Algorithms) find optimal scores for all penalty parameters, both for global and local sequence alignment. This paper reviews those techniques.
Then in the main part of this paper dynamic programming methods are used to compute ensemble alignment, finding all alignment
scores for all parameters. Both global and local ensemble alignments are studied, and parametric alignment is used to compute
near optimal ensemble alignments. 相似文献
4.
Notredame C 《PLoS computational biology》2007,3(8):e123
5.
Pei J 《Current opinion in structural biology》2008,18(3):382-386
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality. 相似文献
6.
Protein multiple sequence alignment is an important bioinformatics tool. It has important applications in biological evolution analysis and protein structure prediction. A variety of alignment algorithms in this field have achieved great success. However, each algorithm has its own inherent deficiencies. In this paper, permutation similarity is proposed to evaluate several protein multiple sequence alignment algorithms that are widely used currently. As the permutation similarity method only concerns the relative order of different protein evolutionary distances, without taking into account the slight difference between the evolutionary distances, it can get more robust evaluations. The longest common subsequence method is adopted to define the similarity between different permutations. Using these methods, we assessed Dialign, Tcoffee, ClustalW and Muscle and made comparisons among them. 相似文献
7.
This article presents an immune inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding the multiple alignment was investigated in the study by Bonizzoni and Vedova and Wang and Jiang, and proved to be a NP-hard (non-deterministic polynomial-time hard) problem. The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It is based on the 'weighted sum of pairs' as objective function, to evaluate a given candidate alignment. IMSA was tested using both classical benchmarks of BAliBASE (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms, in terms of quality of alignments, weighted Sums-of-Pairs (SP) and Column Score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment, for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space. 相似文献
8.
Background
Many algorithms exist for protein structural alignment, based on internal protein coordinates or on explicit superposition of the structures. These methods are usually successful for detecting structural similarities. However, current practical methods are seldom supported by convergence theories. In particular, although the goal of each algorithm is to maximize some scoring function, there is no practical method that theoretically guarantees score maximization. A practical algorithm with solid convergence properties would be useful for the refinement of protein folding maps, and for the development of new scores designed to be correlated with functional similarity. 相似文献9.
A new interactive protein sequence alignment program and comparison of its results with widely used algorithms 总被引:1,自引:0,他引:1
A computer program that allows interactive sequence comparisonis described. It graphically displays a search matrix usingresidue physicochemical characteristics and multilength segmentalcomparisons. The user selects through a mousing device and screenpointer the sequence spans to be matched. The results of thismethod are compared with those of ALIGN and BESTFIT. Received on August 23, 1988; accepted on December 6, 1988 相似文献
10.
Background
While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate. 相似文献11.
Bilu Y Agarwal PK Kolodny R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(4):408-422
Multiple sequence alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm 相似文献
12.
The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of Markov Edit Distance (MED) method, this paper therefore proposes a novel Markov pairwise protein sequence alignment (MPPSA) method that takes the local context dependencies into consideration. The numerical results have shown its superiority to the SW for pairwise protein sequence comparison. 相似文献
13.
Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans ; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences. 相似文献
14.
Francisco M. Ortu?o Olga Valenzuela Hector Pomares Fernando Rojas Javier P. Florido Jose M. Urquiza Ignacio Rojas 《Nucleic acids research》2013,41(1):e26
Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics to perform other outstanding tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low similarity sequences. As widely known, these algorithms directly depend on specific features of the sequences, causing relevant influence on the alignment accuracy. Many MSA tools have been recently designed but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms presented in the bibliography and their dependences on several features. A novel intelligent algorithm based on least square support vector machine is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is performed with a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased. 相似文献
15.
W R Taylor 《Journal of molecular biology》1986,188(2):233-258
A pattern-matching procedure is described, based on fitting templates to the sequence, which allows general structural constraints to be imposed on the patterns identified. The templates correspond to structurally conserved regions of the sequence and were initially derived from a small number of related sequences whose tertiary structures are known. The templates were then made more representative by aligning other sequences of unknown structure. Two alignments were built up containing 100 immunoglobulin variable domain sequences and 85 constant domain sequences, respectively. From each of these extended alignments, templates were generated to represent features conserved in all the sequences. These consisted mainly of patterns of hydrophobicity associated with beta-structure. For structurally conserved beta-strands with no conserved features, templates based on general secondary structure prediction principles were used to identify their possible locations. The specificity of the templates was demonstrated by their ability to identify the conserved features in known immunoglobulin and immunoglobulin-related sequences but not in other non-immunoglobulin sequences. 相似文献
16.
17.
Contact-based sequence alignment 总被引:1,自引:1,他引:1
This paper introduces the novel method of contact-based protein sequence alignment, where structural information in the form of contact mutation probabilities is incorporated into an alignment routine using contact-mutation matrices (CAO: Contact Accepted mutatiOn). The contact-based alignment routine optimizes the score of matched contacts, which involves four (two per contact) instead of two residues per match in pairwise alignments. The first contact refers to a real side-chain contact in a template sequence with known structure, and the second contact is the equivalent putative contact of a homologous query sequence with unknown structure. An algorithm has been devised to perform a pairwise sequence alignment based on contact information. The contact scores were combined with PAM-type (Point Accepted Mutation) substitution scores after parameterization of gap penalties and score weights by means of a genetic algorithm. We show that owing to the structural information contained in the CAO matrices, significantly improved alignments of distantly related sequences can be obtained. This has allowed us to annotate eight putative Drosophila IGF sequences. Contact-based sequence alignment should therefore prove useful in comparative modelling and fold recognition. 相似文献
18.
Multiple sequence alignment 总被引:13,自引:0,他引:13
A method has been developed for aligning segments of several sequences at once. The number of search steps depends only polynomially on the number of sequences, instead of exponentially, because most alignments are rejected without being evaluated explicitly. A data structure herein called the "heap" facilitates this process. For a set of n sequence segments, the overall similarity is taken to be the sum of all the constituent segment pair similarities, which are in turn sums of corresponding residue similarity scores from a Table. The statistical models that test alignments for significance make it possible to group sequences objectively, even when most or all of the interrelationships are weak. These tests are very sensitive, while remaining quite conservative, and discourage the addition of "misfit" sequences to an existing set. The new techniques are applied to a set of five DNA-binding proteins, to a group of three enzymes that employ the coenzyme FAD, and to a control set. The alignment previously proposed for the DNA-binding proteins on the basis of structural comparisons and inspection of sequences is supported quite dramatically, and a highly significant alignment is found for the FAD-binding proteins. 相似文献
19.
Homology-extended sequence alignment 总被引:4,自引:1,他引:4
We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading. 相似文献
20.
This paper presents a dynamic programming algorithm for aligning two sequeces when the alignment is constrained to lie between two arbitrary boundary lines in the dynamic programming matrix. For affine gap penalties, the algorithm requires onlyO(F) computation time andO(M+N) space, whereF is the area of the feasible region andM andN are the sequence lengths. The result extends to concave gap penalties, with somewhat increased time and space bounds. K.-M. C. and W. M. were supported in part by grant R01 LM05110 from the National Library of Medicine. R. C. H. was supported by PHS grant R01 DK27635. 相似文献