首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
Optimal sequence alignment allowing for long gaps   总被引:7,自引:0,他引:7  
A new algorithm for optimal sequence alignment allowing for long insertions and deletions is developed. The algorithm requires O((L+C)MN) computational steps, O(LN) primary memory and O(MN) secondary memory storage, whereM andN(M≥N) are sequence lengths,L (typicallyL≤3) is the number of segment specifying the gap weighting function, andC is a constant. We have also modified our earlier traceback algorithm so that it finds all and only the optimal alignments in a compact form of a directed graph. The current versions accept a set of aligned sequences as input, which facilitates multiple sequence alignment by some iterative procedures. Dedicated to Professor Akiyoshi Wada on the occasion of his 60th birthday.  相似文献   

2.
Given a sequenceA and regular expressionR, theapproximate regular expression matching problem is to find a sequence matchingR whose optimal alignment withA is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in timeO(MN), whereM andN are the lengths ofA andR. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires onlyO(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for substrings ofA that strongly align with a sequence inR, as required for typical data base searches. We also show how to deliver an optimal alignment betweenA andR in onlyO(N+logM) space usingO(MN logM) time. Finally, anO(MN(M+N)+N 2logN) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.  相似文献   

3.

Background  

Studies on the distribution of indel sizes have consistently found that they obey a power law. This finding has lead several scientists to propose that logarithmic gap costs, G (k) = a + c ln k, are more biologically realistic than affine gap costs, G (k) = a + bk, for sequence alignment. Since quick and efficient affine costs are currently the most popular way to globally align sequences, the goal of this paper is to determine whether logarithmic gap costs improve alignment accuracy significantly enough the merit their use over the faster affine gap costs.  相似文献   

4.
Sequence alignment underpins common tasks in molecular biology, including genome annotation, molecular phylogenetics, and homology modeling. Fundamental to sequence alignment is the placement of gaps, which represent character insertions or deletions. We assessed the ability of a generalized affine gap cost model to reliably detect remote protein homology and to produce high-quality alignments. Generalized affine gap alignment with optimal gap parameters performed as well as the traditional affine gap model in remote homology detection. Evaluation of alignment quality showed that the generalized affine model aligns fewer residue pairs than the traditional affine model but achieves significantly higher per-residue accuracy. We conclude that generalized affine gap costs should be used when alignment accuracy carries more importance than aligned sequence length.  相似文献   

5.
MOTIVATION: Sequence alignments obtained using affine gap penalties are not always biologically correct, because the insertion of long gaps is over-penalised. There is a need for an efficient algorithm which can find local alignments using non-linear gap penalties. RESULTS: A dynamic programming algorithm is described which computes optimal local sequence alignments for arbitrary, monotonically increasing gap penalties, i.e. where the cost g(k) of inserting a gap of k symbols is such that g(k) >/= g(k-1). The running time of the algorithm is dependent on the scoring scheme; if the expected score of an alignment between random, unrelated sequences of lengths m, n is proportional to log mn, then with one exception, the algorithm has expected running time O(mn). Elsewhere, the running time is no greater than O(mn(m+n)). Optimisations are described which appear to reduce the worst-case run-time to O(mn) in many cases. We show how using a non-affine gap penalty can dramatically increase the probability of detecting a similarity containing a long gap. AVAILABILITY: The source code is available to academic collaborators under licence.  相似文献   

6.
Gap costs for multiple sequence alignment   总被引:6,自引:0,他引:6  
Standard methods for aligning pairs of biological sequences charge for the most common mutations, which are substitutions, deletions and insertions. Because a single mutation may insert or delete several nucleotides, gap costs that are not directly proportional to gap length are usually the most effective. How to extend such gap costs to alignments of three or more sequences is not immediately obvious, and a variety of approaches have been taken. This paper argues that, since gap and substitution costs together specify optimal alignments, they should be defined using a common rationale. Specifically, a new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences.  相似文献   

7.
Fast, optimal alignment of three sequences using linear gap costs   总被引:2,自引:0,他引:2  
Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation-mismatch, deletion, insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n(k)) time. This is impractical if k>/=3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences, however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983) which he presented for two sequences and simple costs.  相似文献   

8.
Summary The problem of choosing an alignment of two or more nucleotide sequences is particularly difficult for nucleic acids, such as 5S ribosomal RNA, which do not code for protein and for which secondary structure is unknown. Given a set of costs for the various types of replacement mutations and for base insertion or deletion, we present a dynamic programming algorithm which finds the optimal (least costly) alignment for a set of N sequences simultaneously, where each sequence is associated with one of the N tips of a given evolutionary tree. Concurrently, protosequences are constructed corresponding to the ancestral nodes of the tree. A version of this algorithm, modified to be computationally feasible, is implemented to align the sequences of 5S RNA from nine organisms. Complete sets of alignments and proto-sequence reconstructions are done for a large number of different con-figurations of mutation costs. Examination of the family of curves of total replacements inferred versus the ratio of transitions/trans-versions inferred, each curve corresponding to a given number of in-sertions-deletions inferred, provides a method for estimating relative costs and relative frequencies for these different types of mutation.  相似文献   

9.
A major problem in sequence alignments based on the standard dynamic programming method is that the optimal path does not necessarily yield the best equivalencing of residues assessed by structural or functional criteria. An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method. The standard pairwise weight matrix elements are modified in order to penalize, but not eliminate, the equivalencing of residues obtained from previous alignments. The algorithm thereby yields a limited set of alternate alignments that can differ considerably from the optimal. The approach is benchmarked on the alignments of immunoglobulin domains. Without a prior knowledge of the optimal choice of gap penalty, one of the suboptimal alignments is shown to be more accurate than the optimal.  相似文献   

10.
Locally optimal subalignments using nonlinear similarity functions   总被引:2,自引:0,他引:2  
Nonlinear similarity functions are often better than linear functions at distinguishing interesting subalignments from those due to chance. Nonlinear similarity functions useful for comparing biological sequences are developed. Several new algorithms are presented for finding locally optimal subalignments of two sequences. Unlike previous algorithms, they may use any reasonable similarity function as a selection criterion. Among these algorithms are VV-1, which finds all and only the locally optimal subalignments of two sequences, and CC-1, which finds all and only the weakly locally optimal subalignments of two sequences. The VV-1 algorithm is slow and interesting only for theoretical reasons. In contrast, the CC-1 algorithm has average time complexityO(MN) when used to find only very good subalignments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号