1.

An iterative method for faster sum-of-pairs multiple sequence alignment

Reinert K Stoye J Will T 《Bioinformatics (Oxford, England)》2000,16(9):808-814

MOTIVATION: Multiple sequence alignment is an important tool in computational biology. In order to solve the task of computing multiple alignments in affordable time, the most commonly used multiple alignment methods have to use heuristics. Nevertheless, the computation of optimal multiple alignments is important in its own right, and it provides a means of evaluating heuristic approaches or serves as a subprocedure of heuristic alignment methods. RESULTS: We present an algorithm that uses the divide-and-conquer alignment approach together with recent results on search space reduction to speed up the computation of multiple sequence alignments. The method is adaptive in that depending on the time one wants to spend on the alignment, a better, up to optimal alignment can be obtained. To speed up the computation in the optimal alignment step, we apply the A* algorithm, which leads to a procedure provably more efficient than previous exact algorithms. We also describe our implementation of the algorithm and present results showing the effectiveness and limitations of the procedure.

2.

Approximate multiple protein structure alignment using the sum-of-pairs distance.

Jieping Ye Ravi Janardan 《Journal of computational biology》2004,11(5):986-1000

An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein for the set. The algorithm is a heuristic in that it computes an approximation to the optimal multiple structure alignment that minimizes the sum of the pairwise distances between the protein structures. The algorithm chooses an input protein as the initial consensus and computes a correspondence between the protein structures (which are represented as sets of unit vectors) using an approach analogous to the center-star method for multiple sequence alignment. From this correspondence, a set of rotation matrices (optimal for the given correspondence) is derived to align the structures and derive the new consensus. The process is iterated until the sum of pairwise distances converges. The computation of the optimal rotations is itself an iterative process that both makes use of the current consensus and generates simultaneously a new one. This approach is based on an interesting result that allows the sum of all pairwise distances to be represented compactly as distances to the consensus. Experimental results on several protein families are presented, showing that the algorithm converges quite rapidly.
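The abstract does not state the exact result that lets pairwise distances be rewritten as distances to a consensus. A classical identity of the same flavour, offered here only as an illustrative assumption, holds for squared Euclidean distances: the sum over all pairs equals n times the sum of squared distances to the centroid. The function names below are hypothetical and the sketch makes no claim to reproduce the paper's distance measure.

```python
from itertools import combinations

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean, standing in for the 'consensus' (an assumption)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def pairwise_sum(points):
    """Sum of squared distances over all unordered pairs: O(n^2) terms."""
    return sum(sq_dist(p, q) for p, q in combinations(points, 2))

def consensus_sum(points):
    """The same quantity computed from n distances to the centroid."""
    c = centroid(points)
    return len(points) * sum(sq_dist(p, c) for p in points)

points = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(pairwise_sum(points), consensus_sum(points))
```

The practical appeal is that the right-hand form needs only n distance evaluations per iteration instead of one per pair.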

3.

Lower bounds on multiple sequence alignment using exact 3-way alignment

Charles J Colbourn Sudhir Kumar 《BMC bioinformatics》2007,8(1):140

Background

Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations.
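To make the upper-bound and lower-bound language concrete, here is a minimal sketch of an SP cost and of the simplest lower bound, the sum of optimal pairwise alignment costs. It is not the 3-way bound of the paper. The scoring values (mismatch 1, linear gap 2, gap-gap columns free) and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

def pair_cost(a, b, mismatch=1, gap=2):
    """Cost of one projected column; a column of two nulls costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return 0 if a == b else mismatch

def sp_cost(rows, mismatch=1, gap=2):
    """Sum-of-pairs cost of an alignment given as equal-length gapped rows."""
    return sum(
        pair_cost(a, b, mismatch, gap)
        for r, s in combinations(rows, 2)
        for a, b in zip(r, s)
    )

def pairwise_optimum(s, t, mismatch=1, gap=2):
    """Minimum cost of aligning two ungapped sequences (linear gap cost)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            sub = prev[j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)
            cur.append(min(sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def pairwise_lower_bound(seqs, mismatch=1, gap=2):
    """No multiple alignment can align any pair better than its own optimum."""
    return sum(
        pairwise_optimum(s, t, mismatch, gap) for s, t in combinations(seqs, 2)
    )

alignment = ["AC-T", "ACGT", "A-GT"]
sequences = [row.replace("-", "") for row in alignment]
print(pairwise_lower_bound(sequences), "<=", sp_cost(alignment))
```

Any heuristic alignment supplies the upper bound via `sp_cost`; the gap between the two numbers limits how far that alignment can be from optimal under this scoring.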

4.

Efficient methods for multiple sequence alignment with guaranteed error bounds (cited by 11: 0 self-citations, 11 by others)

Dan Gusfield 《Bulletin of mathematical biology》1993,55(1):141-154

Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the value of the optimal alignment.
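The abstract quotes the guarantee only numerically. The value 1.33 for three strings is consistent with a bound of the form 2 − 2/k for k strings, which is the guarantee usually quoted for center-star style methods under the sum-of-pairs measure. Assuming that is the bound meant here, with A_c denoting the alignment produced by the method and A* an optimal alignment (both symbols are introduced for illustration):

```latex
\frac{\mathrm{SP}(A_c)}{\mathrm{SP}(A^{*})} \;\le\; 2 - \frac{2}{k},
\qquad
k = 3:\quad 2 - \frac{2}{3} = \frac{4}{3} \approx 1.33,
\qquad
\lim_{k \to \infty}\left(2 - \frac{2}{k}\right) = 2 .
```

Under this reading the factor stays strictly below two for every finite k and approaches two only as the number of strings grows.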

5.

Optimal sequence alignment using affine gap costs (cited by 27: 0 self-citations, 27 by others)

Stephen F. Altschul Bruce W. Erickson 《Bulletin of mathematical biology》1986,48(5-6):603-616

When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, in other words if opening a gap costs v and each null in the gap costs u, the algorithm of Gotoh (1982, J. molec. Biol. 162, 705) finds the minimum cost of aligning two sequences in order MN steps. Gotoh's algorithm attempts to find only one from among possibly many optimal (minimum-cost) alignments, but does not always succeed. This paper provides an example for which this part of Gotoh's algorithm fails and describes an algorithm that finds all and only the optimal alignments. This modification of Gotoh's algorithm still requires order MN steps. A more precise form of path graph than previously used is needed to represent accurately all optimal alignments for affine gap costs.
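The cost model in this abstract (a gap of k nulls costs v + k·u) can be sketched with the usual three-table dynamic program. This is a minimal illustration of the order-MN cost computation only; it does not reproduce the traceback or the path graph that the paper is actually about. The mismatch cost and the default values of v and u are assumptions.

```python
def affine_min_cost(s, t, mismatch=1, v=2, u=1):
    """Minimum alignment cost when a gap of k nulls costs v + k*u."""
    INF = float("inf")
    m, n = len(s), len(t)
    # D: last column pairs two residues; P: last column is a null in t;
    # Q: last column is a null in s.
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    P = [[INF] * (n + 1) for _ in range(m + 1)]
    Q = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0
    for i in range(1, m + 1):
        P[i][0] = v + u * i  # one leading gap of length i
    for j in range(1, n + 1):
        Q[0][j] = v + u * j  # one leading gap of length j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = min(D[i - 1][j - 1], P[i - 1][j - 1], Q[i - 1][j - 1]) + sub
            # Extend an open gap for u, or open a new one for v + u.
            P[i][j] = min(P[i - 1][j] + u, min(D[i - 1][j], Q[i - 1][j]) + v + u)
            Q[i][j] = min(Q[i][j - 1] + u, min(D[i][j - 1], P[i][j - 1]) + v + u)
    return min(D[m][n], P[m][n], Q[m][n])

print(affine_min_cost("ACGT", "AT"))
```

Each cell does constant work, so the whole computation takes order MN steps, matching the complexity stated in the abstract.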

6.

Optimal alignment between groups of sequences and its application to multiple sequence alignment (cited by 13: 2 self-citations, 11 by others)

Gotoh Osamu 《Bioinformatics (Oxford, England)》1993,9(3):361-370

Four algorithms, A–D, were developed to align two groups of biological sequences. Algorithm A is equivalent to the conventional dynamic programming method widely used for aligning ordinary sequences, whereas algorithms B–D are designed to evaluate the cost for a deletion/insertion more accurately when internal gaps are present in either or both groups of sequences. Rigorous optimization of the ‘sum of pairs’ (SP) score is achieved by algorithm D, whose average performance is close to O(MNL²) where M and N are numbers of sequences included in the two groups and L is the mean length of the sequences. Algorithm B uses some approximations to cope with profile-based operations, whereas algorithm C is a simpler variant of algorithm D. These group-to-group alignment algorithms were applied to multiple sequence alignment with two iterative strategies: a progressive method based on a given binary tree and a randomized grouping-realignment method. The advantages and disadvantages of the four algorithms are discussed on the basis of the results of examinations of several protein families.

7.

Local multiple sequence alignment using dead-end elimination (cited by 2: 0 self-citations, 2 by others)

Lukashin AV Rosa JJ 《Bioinformatics (Oxford, England)》1999,15(11):947-953

MOTIVATION: Local multiple sequence alignment is a basic tool for extracting functionally important regions shared by a family of protein sequences. We present an effectively polynomial-time algorithm for rigorously solving the local multiple alignment problem. RESULTS: The algorithm is based on the dead-end elimination procedure that makes it possible to avoid an exhaustive search. In the framework of the sum-of-pairs scoring system, certain rejection criteria are derived in order to eliminate those sequence segments and segment pairs that can be mathematically shown to be inconsistent (dead-ending) with the globally optimal alignment. Iterative application of the elimination criteria results in a rapid reduction of combinatorial possibilities without considering them explicitly. In the vast majority of cases, the procedure converges to a unique globally optimal solution. In contrast to the exhaustive search, whose computational complexity is combinatorial, the algorithm is computationally feasible because the number of operations required to eliminate the dead-ending segments and segment pairs grows quadratically and cubically, respectively, with the total number of sequence elements. The method is illustrated on a set of protein families for which the globally optimal alignments are well recognized. AVAILABILITY: The source code of the program implementing the algorithm is available upon request from the authors. CONTACT: alex_lukashin@biogen.com.

8.

A multiple sequence alignment program. (cited by 23: 7 self-citations, 16 by others)

E Sobel H M Martinez 《Nucleic acids research》1986,14(1):363-374

A program is described for simultaneously aligning two or more molecular sequences which is based on first finding common segments above a specified length and then piecing these together to maximize an alignment scoring function. Optimal as well as near-optimal alignments are found, and there is also provided a means for randomizing the given sequences for testing the statistical significance of an alignment. Alignments may be made in the original alphabets of the sequences or in user-specified alternate ones to take advantage of chemical similarities (such as hydrophobic-hydrophilic).

9.

Segment-based multiple sequence alignment

Rausch T Emde AK Weese D Döring A Notredame C Reinert K 《Bioinformatics (Oxford, England)》2008,24(16):i187-i192

10.

Sequence alignment of citrate synthase proteins using a multiple sequence alignment algorithm and multiple scoring matrices (cited by 1: 0 self-citations, 1 by others)

C M Henneke M J Danson D W Hough D J Osguthorpe 《Protein engineering》1989,2(8):597-604

The alignment of Escherichia coli citrate synthase to pig heart citrate synthase and the multiple alignment of the known sequences of the citrate synthase family of enzymes have been performed using six different amino acid similarity scoring matrices and a large range of gap penalty ratios for insertions and deletions of amino acids. The alignment studies have been performed as the first step in a project aimed at homology modelling E. coli citrate synthase (a hexamer) from pig heart citrate synthase (a dimer) in a molecular modelling approach to the study of multi-subunit enzymes. The effects of several important variables in producing realistic alignments have been investigated. The difference between multiple alignment of the family of enzymes versus simple pairwise alignment of the pig heart and E. coli proteins was explored. The effects of initial separate multiple alignments of the most highly related or most homologous species of the family of enzymes upon a subsequent pairwise alignment between species were evaluated. The value of 'fingerprinting' certain residues to bias the alignment in favour of matching those residues, as well as the worth of the computerized approach compared to an intuitive alignment technique, were assessed.

11.

Strategies for multiple sequence alignment

Nicholas HB Ropelewski AJ Deerfield DW 《BioTechniques》2002,32(3):572-4, 576, 578 passim

We present an overview of multiple sequence alignments to outline the practical consequences for the choices among different techniques and parameters. We begin with a discussion of the scoring methods for quantifying the quality of a multiple sequence alignment, followed by a discussion of the algorithms implemented within a variety of multiple sequence alignment programs. We also discuss additional alignment details such as gap penalty and distance metrics. The paper concludes with a discussion on how to improve alignment quality and the limitations of the techniques described in this paper.

12.

Progressive multiple alignment using sequence triplet optimizations and three-residue exchange costs

Konagurthu AS Whisstock J Stuckey PJ 《Journal of bioinformatics and computational biology》2004,2(4):719-745

In this paper we demonstrate a practical approach to construct progressive multiple alignments using sequence triplet optimizations rather than a conventional pairwise approach. Using the sequence triplet alignments progressively provides a scope for the synthesis of a three-residue exchange amino acid substitution matrix. We develop such a 20 x 20 x 20 matrix for the first time and demonstrate how its use in optimal sequence triplet alignments increases the sensitivity of building multiple alignments. Various comparisons were made between alignments generated using the progressive triplet methods and the conventional progressive pairwise procedure. The assessment of these data reveals that, in general, the triplet based approaches generate more accurate sequence alignments than the traditional pairwise based procedures, especially between more divergent sets of sequences.

13.

Probalign: multiple sequence alignment using partition function posterior probabilities (cited by 2: 0 self-citations, 2 by others)

Roshan U Livesay DR 《Bioinformatics (Oxford, England)》2006,22(22):2715-2721

MOTIVATION: The maximum expected accuracy optimization criterion for multiple sequence alignment uses pairwise posterior probabilities of residues to align sequences. The partition function methodology is one way of estimating these probabilities. Here, we combine these two ideas for the first time to construct maximal expected accuracy sequence alignments. RESULTS: We bridge the two techniques within the program Probalign. Our results indicate that Probalign alignments are generally more accurate than other leading multiple sequence alignment methods (i.e. Probcons, MAFFT and MUSCLE) on the BAliBASE 3.0 protein alignment benchmark. Similarly, Probalign also outperforms these methods on the HOMSTRAD and OXBENCH benchmarks. Probalign ranks statistically highest (P-value < 0.005) on all three benchmarks. Deeper scrutiny of the technique indicates that the improvements are largest on datasets containing N/C-terminal extensions and on datasets containing long and heterogeneous length proteins. These points are demonstrated on both real and simulated data. Finally, our method also produces accurate alignments on long and heterogeneous length datasets containing protein repeats. Here, alignment accuracy scores are at least 10% and 15% higher than the other three methods when standard deviation of length is >300 and 400, respectively. AVAILABILITY: Open source code implementing Probalign as well as for producing the simulated data, and all real and simulated data are freely available from http://www.cs.njit.edu/usman/probalign

14.

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

Shinsuke Yamada Osamu Gotoh Hayato Yamana 《BMC bioinformatics》2006,7(1):524-17

Background

Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels.

15.

A heuristic algorithm for solving multiple sequence alignment based on the progressive multiple sequence alignment method

Zhang Jin, Guo Maozu, Wang Yadong 《生物信息学》2005,3(4):171-174

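Biological sequence alignment occupies an important place in bioinformatics research. Multiple sequence alignment is an NP-complete problem, and exact solutions cannot be computed because of time and space limitations. This paper briefly introduces the basic idea of the multiple sequence alignment algorithm proposed by Feng and Doolittle and improves the algorithm to achieve better alignment accuracy. Experimental results show that the new algorithm is effective against the local-optimum problem encountered by general progressive multiple sequence alignment methods.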

16.

DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

Martin Schmollinger Kay Nieselt Michael Kaufmann Burkhard Morgenstern 《BMC bioinformatics》2004,5(1):128

Background

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics.

17.

A comprehensive comparison of multiple sequence alignment programs. (cited by 35: 4 self-citations, 31 by others)

J D Thompson F Plewniak O Poch 《Nucleic acids research》1999,27(13):2682-2690

In recent years improvements to existing programs and the introduction of new iterative algorithms have changed the state-of-the-art in protein sequence alignment. This paper presents the first systematic study of the most commonly used alignment programs using BAliBASE benchmark alignments as test cases. Even below the 'twilight zone' at 10-20% residue identity, the best programs were capable of correctly aligning on average 47% of the residues. We show that iterative algorithms often offer improved alignment accuracy though at the expense of computation time. A notable exception was the effect of introducing a single divergent sequence into a set of closely related sequences, causing the iteration to diverge away from the best alignment. Global alignment programs generally performed better than local methods, except in the presence of large N/C-terminal extensions and internal insertions. In these cases, a local algorithm was more successful in identifying the most conserved motifs. This study enables us to propose appropriate alignment strategies, depending on the nature of a particular set of sequences. The employment of more than one program based on different alignment techniques should significantly improve the quality of automatic protein sequence alignment methods. The results also indicate guidelines for improvement of alignment algorithms.

18.

Computational complexity of multiple sequence alignment with SP-score.

W Just 《Journal of computational biology》2001,8(6):615-623

It is shown that the multiple alignment problem with SP-score is NP-hard for each scoring matrix in a broad class M that includes most scoring matrices actually used in biological applications. The problem remains NP-hard even if sequences can only be shifted relative to each other and no internal gaps are allowed. It is also shown that there is a scoring matrix M(0) such that the multiple alignment problem for M(0) is MAX-SNP-hard, regardless of whether or not internal gaps are allowed.

19.

Gap costs for multiple sequence alignment (cited by 6: 0 self-citations, 6 by others)

S F Altschul 《Journal of theoretical biology》1989,138(3):297-309

Standard methods for aligning pairs of biological sequences charge for the most common mutations, which are substitutions, deletions and insertions. Because a single mutation may insert or delete several nucleotides, gap costs that are not directly proportional to gap length are usually the most effective. How to extend such gap costs to alignments of three or more sequences is not immediately obvious, and a variety of approaches have been taken. This paper argues that, since gap and substitution costs together specify optimal alignments, they should be defined using a common rationale. Specifically, a new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences.
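One way to read "links a multiple alignment's cost to that of its pairwise projections" is sketched below: project each pair of rows, drop columns where both rows hold a null, and charge each remaining gap an affine cost. The abstract does not give the paper's exact definition, so this is an assumption-laden illustration; the parameter names v (gap opening) and u (per null) and the mismatch cost are hypothetical.

```python
from itertools import combinations

def project(r, s):
    """Pairwise projection: drop columns in which both rows hold a null."""
    return [(a, b) for a, b in zip(r, s) if not (a == "-" and b == "-")]

def affine_pair_cost(cols, mismatch=1, v=2, u=1):
    """Cost of a projected pairwise alignment; a gap of k nulls costs v + k*u."""
    cost = 0
    state = "M"  # type of the previous column: M (residue pair), X or Y (null)
    for a, b in cols:
        if b == "-":
            cost += u + (v if state != "X" else 0)  # open only if not extending
            state = "X"
        elif a == "-":
            cost += u + (v if state != "Y" else 0)
            state = "Y"
        else:
            cost += 0 if a == b else mismatch
            state = "M"
    return cost

def projected_sp_cost(rows, mismatch=1, v=2, u=1):
    """Multiple-alignment cost as the sum of its pairwise projection costs."""
    return sum(
        affine_pair_cost(project(r, s), mismatch, v, u)
        for r, s in combinations(rows, 2)
    )

print(projected_sp_cost(["AC--T", "ACGGT", "A-G-T"]))
```

Dropping the shared-null columns before counting matters: in the projection of "A---T" against "AG-GT" the two nulls on either side of the dropped column merge into a single gap of length two, so only one opening cost is charged.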

20.

Heuristics for multiobjective multiple sequence alignment

Maryam Abbasi Luís Paquete Francisco B. Pereira 《Biomedical engineering online》2016,15(1):70

Background

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment.

Methods

We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments.

Results and conclusions

The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
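For readers unfamiliar with the hypervolume indicator, a minimal two-objective sketch follows. It assumes each alignment is summarised as a (substitution score, indel count) pair, with the score maximised and the indel count minimised, and that every point is at least as good as a user-chosen reference point in both objectives. Function names and the example values are hypothetical and are not taken from the paper.

```python
def nondominated(points):
    """Keep (score, indels) pairs that no other pair matches or beats on both axes."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] <= p[1] for q in points
        )
        if not dominated and p not in front:
            front.append(p)
    return front

def hypervolume_2d(points, ref):
    """Area dominated by the front, bounded by ref = (worst score, worst indels)."""
    # On a non-dominated front, a higher score comes with more indels, so
    # sweeping from the best score downwards also sweeps indels downwards.
    front = sorted(nondominated(points), key=lambda p: -p[0])
    area = 0
    upper_indels = ref[1]
    for score, indels in front:
        area += (score - ref[0]) * (upper_indels - indels)
        upper_indels = indels
    return area

scores = [(10, 5), (8, 3), (4, 1), (3, 4)]  # the last pair is dominated by (8, 3)
print(nondominated(scores))
print(hypervolume_2d(scores, ref=(0, 6)))
```

A larger area means the set of alignments covers the trade-off better, which is why a single number can be used to compare the output of different heuristics.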
