共查询到20条相似文献,搜索用时 46 毫秒
1.
Bilu Y Agarwal PK Kolodny R 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(4):408-422
Multiple sequence alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm 相似文献
2.
基于动态规划的快速序列比对算法 总被引:3,自引:0,他引:3
序列比对算法是生物信息学中重要的研究方向之一,而动态规划法是序列比对算法中最有效最基本的方法.由于原有的基本动态规划方法时间和空间复杂度大,不适合实际的生物序列比对,因此本文在分析介绍几种相关动态规划算法的基础上,提出了一种基于动态规划的快速序列比对算法UKK_FA.实验结果表明,该算法有效地降低了时间复杂度,具有一定的实用性。 相似文献
3.
Poleksic A 《Journal of bioinformatics and computational biology》2011,9(3):367-382
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods. 相似文献
4.
5.
6.
Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation-mismatch, deletion, insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n(k)) time. This is impractical if k>/=3 and n is of any reasonable length. Thus, for this problem there are many heuristics for aligning k sequences, however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983) which he presented for two sequences and simple costs. 相似文献
7.
A major problem in sequence alignments based on the standard dynamic programming method is that the optimal path does not necessarily yield the best equivalencing of residues assessed by structural or functional criteria. An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method. The standard pairwise weight matrix elements are modified in order to penalize, but not eliminate, the equivalencing of residues obtained from previous alignments. The algorithm thereby yields a limited set of alternate alignments that can differ considerably from the optimal. The approach is benchmarked on the alignments of immunoglobulin domains. Without a prior knowledge of the optimal choice of gap penalty, one of the suboptimal alignments is shown to be more accurate than the optimal. 相似文献
8.
Hong Changjin Tewfik Ahmed H. 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2009,6(4):570-582
Recomputation of the previously evaluated similarity results between biological sequences becomes inevitable when researchers realize errors in their sequenced data or when the researchers have to compare nearly similar sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we initially predict a new score of the alignment path for each contour to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence shows that it can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate "relative node tolerance bound” (RNTB) in the pruned searching space. Furthermore, we improve the computational performance by quantifying the successful RNTB tolerance probability and switch to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime cost of sparse dynamic programming (sDP) method, and to only 2.55 percent of that of a normal dynamic programming with the Smith-Waterman algorithm. 相似文献
9.
O Gotoh 《Journal of theoretical biology》1986,121(3):327-337
The major algorithms currently used for aligning biological sequences are those based on dynamic programming method. A dynamic programming algorithm consists of two major procedures, forward and traceback routines. This paper describes a dynamic programming algorithm for aligning three sequences at a time. Deletions and insertions are penalized according to their numbers and lengths. A forward process is accomplished in O(L3) computational steps, where L is the average sequence length. On the other hand, a traceback process is done in T steps, where T is the number of elementary configurations involved in the optimal alignment (usually T much less than L). The traceback procedure uses an effective technique for memory management, which is applicable to a wide range of sequence-matching methods. 相似文献
10.
Diego A. Oyarzún Brian P. Ingalls Richard H. Middleton Dimitrios Kalamatianos 《Bulletin of mathematical biology》2009,71(8):1851-1872
The regulation of cellular metabolism facilitates robust cellular operation in the face of changing external conditions. The
cellular response to this varying environment may include the activation or inactivation of appropriate metabolic pathways.
Experimental and numerical observations of sequential timing in pathway activation have been reported in the literature. It
has been argued that such patterns can be rationalized by means of an underlying optimal metabolic design. In this paper we
pose a dynamic optimization problem that accounts for time-resource minimization in pathway activation under constrained total
enzyme abundance. The optimized variables are time-dependent enzyme concentrations that drive the pathway to a steady state
characterized by a prescribed metabolic flux. The problem formulation addresses unbranched pathways with irreversible kinetics.
Neither specific reaction kinetics nor fixed pathway length are assumed.
In the optimal solution, each enzyme follows a switching profile between zero and maximum concentration, following a temporal
sequence that matches the pathway topology. This result provides an analytic justification of the sequential activation previously
described in the literature. In contrast with the existent numerical approaches, the activation sequence is proven to be optimal
for a generic class of monomolecular kinetics. This class includes, but is not limited to, Mass Action, Michaelis–Menten,
Hill, and some Power-law models. This suggests that sequential enzyme expression may be a common feature of metabolic regulation,
as it is a robust property of optimal pathway activation. 相似文献
11.
A dynamic programming algorithm for finding alternative RNA secondary structures. 总被引:14,自引:8,他引:6
下载免费PDF全文
![点击此处可从《Nucleic acids research》网站下载免费的PDF全文](/ch/ext_images/free.gif)
Dynamic programming algorithms that predict RNA secondary structure by minimizing the free energy have had one important limitation. They were able to predict only one optimal structure. Given the uncertainties of the thermodynamic data and the effects of proteins and other environmental factors on structure, the optimal structure predicted by these methods may not have biological significance. We present a dynamic programming algorithm that can determine optimal and suboptimal secondary structures for an RNA. The power and utility of the method is demonstrated in the folding of the intervening sequence of the rRNA of Tetrahymena. By first identifying the major secondary structures corresponding to the lowest free energy minima, a secondary structure of possible biological significance is derived. 相似文献
12.
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6 A RMSD from the native structure) decays linearly as a function of the TM structural-alignment score. 相似文献
13.
Dynamic programming algorithms for restriction map comparison 总被引:1,自引:0,他引:1
For most sequence comparison problems there is a correspondingmap comparison algorithm. While map data may appear to be incompatiblewith dynamic programming, we show in this paper that the rigorand efficiency of dynamic programming algorithms carry overto the map comparison algorithms. We present algorithms forrestriction map comparison that deal with two types of map errors:(i) closely spaced sites for different enzymes can be orderedincorrectly, and (ii) closely spaced sites for the same enzymecan be mapped as a single site. The new algorithms are a naturalextension of a previous map comparison model. Dynamic programmingalgorithms for computing optimal global and local alignmentsunder the new model are described. The new algorithms take aboutthe same order of time as previous map comparison algorithms.Programs implementing some of the new algorithms are used tofind similar regions within the Escherichia coli restrictionmap of Kohara et al. 相似文献
14.
A greedy algorithm for aligning DNA sequences. 总被引:39,自引:0,他引:39
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information. 相似文献
15.
M Zuker 《Journal of molecular biology》1991,221(2):403-420
A molecular sequence alignment algorithm based on dynamic programming has been extended to allow the computation of all pairs of residues that can be part of optimal and suboptimal sequence alignments. The uncertainties inherent in sequence alignment can be displayed using a new form of dot plot. The method allows the qualitative assessment of whether or not two sequences are related, and can reveal what parts of the alignment are better determined than others. It also permits the computation of representative optimal and suboptimal alignments. The relation between alignment reliability and alignment parameters is discussed. Other applications are to cyclical permutations of sequences and the detection of self-similarity. An application to multiple sequence alignment is noted. 相似文献
16.
17.
Protein engineering by combinatorial site-directed mutagenesis evaluates a portion of the sequence space near a target protein, seeking variants with improved properties (e.g., stability, activity, immunogenicity). In order to improve the hit-rate of beneficial variants in such mutagenesis libraries, we develop methods to select optimal positions and corresponding sets of the mutations that will be used, in all combinations, in constructing a library for experimental evaluation. Our approach, OCoM (Optimization of Combinatorial Mutagenesis), encompasses both degenerate oligonucleotides and specified point mutations, and can be directed accordingly by requirements of experimental cost and library size. It evaluates the quality of the resulting library by one- and two-body sequence potentials, averaged over the variants. To ensure that it is not simply recapitulating extant sequences, it balances the quality of a library with an explicit evaluation of the novelty of its members. We show that, despite dealing with a combinatorial set of variants, in our approach the resulting library optimization problem is actually isomorphic to single-variant optimization. By the same token, this means that the two-body sequence potential results in an NP-hard optimization problem. We present an efficient dynamic programming algorithm for the one-body case and a practically-efficient integer programming approach for the general two-body case. We demonstrate the effectiveness of our approach in designing libraries for three different case study proteins targeted by previous combinatorial libraries--a green fluorescent protein, a cytochrome P450, and a beta lactamase. We found that OCoM worked quite efficiently in practice, requiring only 1 hour even for the massive design problem of selecting 18 mutations to generate 10? variants of a 443-residue P450. We demonstrate the general ability of OCoM in enabling the protein engineer to explore and evaluate trade-offs between quality and novelty as well as library construction technique, and identify optimal libraries for experimental evaluation. 相似文献
18.
Elena Taycher Andreas Rolfs Yanhui Hu Dongmei Zuo Stephanie E Mohr Janice Williamson Joshua LaBaer 《BMC bioinformatics》2007,8(1):198
Background
Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time consuming process. The ultimate goal of validation is to determine if a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation. 相似文献19.
Optimal feed control for the fed-batch fermentation process of ethanol production is studied. Additional inequality constraints are introduced in this optimization problem to assure the optimal solution in a reality region. Introducing an updating rule of augmented Lagrange multipliers to handle these inequality constraints, iterative dynamic programming can be used in a straightforward manner for the optimization of fed-batch fermentors. To obtain more accurate solution a method of sequential quadratic programming can be used to solve this problem again. As a result of this optimal control, the maximum production at final time is very close to the theoretical yield. Although sequential quadratic programming can be rapid convergence to the optimal solution, but very good initial starting points has to be used to ensure obtaining the global optimum. Experimental works were used to validate this study. The simulated results could fit the experiments satisfactorily. 相似文献
20.
The Smith-Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the existing notion of local similarity has a serious flaw: it does not discard poorly conserved intermediate segments. The Smith-Waterman algorithm finds the local alignment with maximal score but it is unable to find local alignment with maximum degree of similarity (e.g. maximal percent of matches). Moreover, there is still no efficient algorithm that answers the following natural question: do two sequences share a (sufficiently long) fragment with more than 70% of similarity? As a result, the local alignment sometimes produces a mosaic of well-conserved fragments artificially connected by poorly-conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction as recently pointed out by Zhang et al. (Bioinformatics, 15, 1012-1019, 1999). In this paper we propose a new sequence comparison algorithm (normalized local alignment ) that reports the regions with maximum degree of similarity. The algorithm is based on fractional programming and its running time is O(n2log n). In practice, normalized local alignment is only 3-5 times slower than the standard Smith-Waterman algorithm. 相似文献