首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Landan G  Graur D 《Gene》2009,441(1-2):141-147
We characterize pairwise and multiple sequence alignment (MSA) errors by comparing true alignments from simulations of sequence evolution with reconstructed alignments. The vast majority of reconstructed alignments contain many errors. Error rates rapidly increase with sequence divergence, thus, for even intermediate degrees of sequence divergence, more than half of the columns of a reconstructed alignment may be expected to be erroneous. In closely related sequences, most errors consist of the erroneous positioning of a single indel event and their effect is local. As sequences diverge, errors become more complex as a result of the simultaneous mis-reconstruction of many indel events, and the lengths of the affected MSA segments increase dramatically. We found a systematic bias towards underestimation of the number of gaps, which leads to the reconstructed MSA being on average shorter than the true one. Alignment errors are unavoidable even when the evolutionary parameters are known in advance. Correct reconstruction can only be guaranteed when the likelihood of true alignment is uniquely optimal. However, true alignment features are very frequently sub-optimal or co-optimal, with the result that optimal albeit erroneous features are incorporated into the reconstructed MSA. Progressive MSA utilizes a guide-tree in the reconstruction of MSAs. The quality of the guide-tree was found to affect MSA error levels only marginally.  相似文献   

2.
While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which include the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy is able to consistently improve over existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy, and the average improvements in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.  相似文献   

3.
4.
Protein multiple sequence alignment is an important bioinformatics tool. It has important applications in biological evolution analysis and protein structure prediction. A variety of alignment algorithms in this field have achieved great success. However, each algorithm has its own inherent deficiencies. In this paper, permutation similarity is proposed to evaluate several protein multiple sequence alignment algorithms that are widely used currently. As the permutation similarity method only concerns the relative order of different protein evolutionary distances, without taking into account the slight difference between the evolutionary distances, it can get more robust evaluations. The longest common subsequence method is adopted to define the similarity between different permutations. Using these methods, we assessed Dialign, Tcoffee, ClustalW and Muscle and made comparisons among them.  相似文献   

5.
This article presents an immune inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding the multiple alignment was investigated in the study by Bonizzoni and Vedova and Wang and Jiang, and proved to be a NP-hard (non-deterministic polynomial-time hard) problem. The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It is based on the 'weighted sum of pairs' as objective function, to evaluate a given candidate alignment. IMSA was tested using both classical benchmarks of BAliBASE (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms, in terms of quality of alignments, weighted Sums-of-Pairs (SP) and Column Score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment, for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space.  相似文献   

6.
MOTIVATION: Due to the importance of considering secondary structures in aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems for the multiple alignment step remain. First, how to combine pairwise sequence-structure alignments into a multiple alignment and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) taking into consideration both the primary and the secondary structures. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low energy conformations. The libraries are then processed by the T-Coffee system, which is a consistency based multiple alignment method. Furthermore, we are able to extract a consensus-sequence and -structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.  相似文献   

7.
多序列比对是一种重要的生物信息学工具,在生物的进化分析以及蛋白质的结构预测方面有着重要的应用。以ClustalW为代表的渐进式多序列比对算法在这个领域取得了很大的成功,成为应用最为广泛的多序列比对程序。但其固有的缺陷阻碍了比对精度的进一步提高,近年来出现了许多渐进式比对算法的改进算法,并取得良好的效果。本文选取了其中比较有代表性的几种算法对其基本比对思想予以描述,并且利用多序列比对程序平台BAliBASE和仿真程序ROSE对它们的精度和速度分别进行了比较和评价。  相似文献   

8.
The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of Markov Edit Distance (MED) method, this paper therefore proposes a novel Markov pairwise protein sequence alignment (MPPSA) method that takes the local context dependencies into consideration. The numerical results have shown its superiority to the SW for pairwise protein sequence comparison.  相似文献   

9.
Multiple sequence alignment by a pairwise algorithm   总被引:1,自引:0,他引:1  
An algorithm is described that processes the results of a conventionalpairwise sequence alignment program to automatically producean unambiguous multiple alignment of many sequences. Unlikeother, more complex, multiple alignment programs, the methoddescribed here is fast enough to be used on almost any multiplesequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987  相似文献   

10.
Efficient sequence alignment algorithms   总被引:3,自引:0,他引:3  
Sequence alignments are becoming more important with the increase of nucleic acid data. Fitch and Smith have recently given an example where multiple insertion/deletions (rather than a series of adjacent single insertion/deletions) are necessary to achieve the correct alignment. Multiple insertion/deletions are known to increase computation time from O(n2) to O(n3) although Gotoh has presented an O(n2) algorithm in the case the multiple insertion/deletion weighting function is linear. It is argued in this paper that it could be desirable to use concave weighting functions. For that case, an algorithm is derived that is conjectured to be O(n2).  相似文献   

11.
12.

Background  

Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy.  相似文献   

13.
序列比对是生物信息学研究的一个重要工具,它在序列拼接、蛋白质结构预测、蛋白质结构功能分析、系统进化分析、数据库检索以及引物设计等问题的研究中被广泛使用。本文详细介绍了在生物信息学中常用的一些序列比对算法,比较了这些算法所需的计算复杂度,优缺点,讨论了各自的使用范围,并指出今后序列比对研究的发展方向。  相似文献   

14.
Nicholas HB  Ropelewski AJ  Deerfield DW 《BioTechniques》2002,32(3):572-4, 576, 578 passim
We present an overview of multiple sequence alignments to outline the practical consequences for the choices among different techniques and parameters. We begin with a discussion of the scoring methods for quantifying the quality of a multiple sequence alignment, followed by a discussion of the algorithms implemented within a variety of multiple sequence alignment programs. We also discuss additional alignment details such as gap penalty and distance metrics. The paper concludes with a discussion on how to improve alignment quality and the limitations of the techniques described in this paper  相似文献   

15.
在生物信息学研究中,生物序列比对问题占有重要的地位。多序列比对问题是一个NPC问题,由于时间和空间的限制不能够求出精确解。文中简要介绍了Feng和Doolittle提出的多序列比对算法的基本思想,并改进了该算法使之具有更好的比对精度。实验结果表明,新算法对解决一般的progressive多序列比对方法中遇到的局部最优问题有较好的效果。  相似文献   

16.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.  相似文献   

17.

Background  

We are interested in the problem of predicting secondary structure for small sets of homologous RNAs, by incorporating limited comparative sequence information into an RNA folding model. The Sankoff algorithm for simultaneous RNA folding and alignment is a basis for approaches to this problem. There are two open problems in applying a Sankoff algorithm: development of a good unified scoring system for alignment and folding and development of practical heuristics for dealing with the computational complexity of the algorithm.  相似文献   

18.
Bayesian adaptive sequence alignment algorithms   总被引:2,自引:1,他引:2  
The selection of a scoring matrix and gap penalty parameters continues to be an important problem in sequence alignment. We describe here an algorithm, the 'Bayes block aligner, which bypasses this requirement. Instead of requiring a fixed set of parameter settings, this algorithm returns the Bayesian posterior probability for the number of gaps and for the scoring matrices in any series of interest. Furthermore, instead of returning the single best alignment for the chosen parameter settings, this algorithm returns the posterior distribution of all alignments considering the full range of gapping and scoring matrices selected, weighing each in proportion to its probability based on the data. We compared the Bayes aligner with the popular Smith-Waterman algorithm with parameter settings from the literature which had been optimized for the identification of structural neighbors, and found that the Bayes aligner correctly identified more structural neighbors. In a detailed examination of the alignment of a pair of kinase and a pair of GTPase sequences, we illustrate the algorithm's potential to identify subsequences that are conserved to different degrees. In addition, this example shows that the Bayes aligner returns an alignment-free assessment of the distance between a pair of sequences.   相似文献   

19.
Distance-based approaches in phylogenetics such as Neighbor-Joining are a fast and popular approach for building trees. These methods take pairs of sequences, and from them construct a value that, in expectation, is additive under a stochastic model of site substitution. Most models assume a distribution of rates across sites, often based on a gamma distribution. Provided the (shape) parameter of this distribution is known, the method can correctly reconstruct the tree. However, if the shape parameter is not known then we show that topologically different trees, with different shape parameters and associated positive branch lengths, can lead to exactly matching distributions on pairwise site patterns between all pairs of taxa. Thus, one could not distinguish between the two trees using pairs of sequences without some prior knowledge of the shape parameter. More surprisingly, this can happen for any choice of distinct shape parameters on the two trees, and thus the result is not peculiar to a particular or contrived selection of the shape parameters. On a positive note, we point out known conditions where identifiability can be restored (namely, when the branch lengths are clocklike, or if methods such as maximum likelihood are used).  相似文献   

20.
Gap costs for multiple sequence alignment   总被引:6,自引:0,他引:6  
Standard methods for aligning pairs of biological sequences charge for the most common mutations, which are substitutions, deletions and insertions. Because a single mutation may insert or delete several nucleotides, gap costs that are not directly proportional to gap length are usually the most effective. How to extend such gap costs to alignments of three or more sequences is not immediately obvious, and a variety of approaches have been taken. This paper argues that, since gap and substitution costs together specify optimal alignments, they should be defined using a common rationale. Specifically, a new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号