Similar literature
20 similar articles found
1.
Multiple sequence alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm.

2.
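A fast sequence alignment algorithm based on dynamic programming   Total citations: 3 (self-citations: 0, citations by others: 3)
Sequence alignment is one of the important research directions in bioinformatics, and dynamic programming is the most effective and most fundamental approach to it. Because the basic dynamic programming method has high time and space complexity, it is ill-suited to aligning real biological sequences. After analyzing and reviewing several related dynamic programming algorithms, this paper proposes UKK_FA, a fast sequence alignment algorithm based on dynamic programming. Experimental results show that the algorithm effectively reduces time complexity and is of practical use.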
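The abstract above does not describe how UKK_FA works, so the following is only a minimal sketch of the general idea its name suggests: an Ukkonen-style cutoff, in which the dynamic programming table is filled only inside a diagonal band of half-width k and k is doubled until the answer fits. Unit edit costs and the function names `banded_edit_distance` and `edit_distance` are assumptions made for illustration; they are not taken from the paper.

```python
def banded_edit_distance(a: str, b: str, k: int):
    """Return the unit-cost edit distance of a and b if it is <= k, else None.

    Only cells with |i - j| <= k are evaluated, so the work is O(k * len(a)).
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the end cell lies outside the band
    inf = k + 1  # any value above k means "over the threshold"
    width = 2 * k + 1
    # Slot d of row i stores D[i][j] for column j = i + d - k.
    prev = [inf] * width
    for d in range(width):
        j = d - k
        if 0 <= j <= m:
            prev[d] = min(j, inf)  # row 0: j insertions
    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue
            if j == 0:
                cur[d] = min(i, inf)  # column 0: i deletions
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            substitute = prev[d] + cost                        # from D[i-1][j-1]
            delete = prev[d + 1] + 1 if d + 1 < width else inf  # from D[i-1][j]
            insert = cur[d - 1] + 1 if d >= 1 else inf          # from D[i][j-1]
            cur[d] = min(substitute, delete, insert, inf)
        prev = cur
    result = prev[m - n + k]
    return result if result <= k else None


def edit_distance(a: str, b: str) -> int:
    """Double the band half-width until the banded computation succeeds."""
    k = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_edit_distance(a, b, k)
        if d is not None:
            return d
        k *= 2
```

When the two sequences are similar, the loop stops at a small k and most of the quadratic table is never touched, which is the kind of saving the abstract reports.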

3.
The problem of finding an optimal structural alignment for a pair of superimposed proteins is often amenable to the Smith-Waterman dynamic programming algorithm, which runs in time proportional to the product of lengths of the sequences being aligned. While the quadratic running time is acceptable for computing a single alignment of two fixed protein structures, the time complexity becomes a bottleneck when running the Smith-Waterman routine multiple times in order to find a globally optimal superposition and alignment of the input proteins. We present a subquadratic running time algorithm capable of computing an alignment that optimizes one of the most widely used measures of protein structure similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. The algorithm presented in this article can be used to significantly improve the speed-accuracy tradeoff in a number of popular protein structure alignment methods.

4.
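A new nucleic acid sequence alignment algorithm and its application to global sequence alignment   Total citations: 1 (self-citations: 0, citations by others: 1)
The dynamic programming algorithms now widely used for sequence alignment yield optimal alignments, but their high computational complexity, O(N^2), greatly reduces computational efficiency. A multistage dynamic programming decision algorithm was applied to pairwise sequence alignment and implemented in Visual BASIC. The new algorithm reduces the computational complexity to O(N) while still achieving fairly good accuracy, and it is expected to play an important role in global sequence alignment.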
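For reference, the O(N^2) baseline that this abstract compares against can be sketched as the standard global alignment recurrence (Needleman-Wunsch). This is not the multistage algorithm proposed in the paper, whose details the abstract does not give. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score in O(len(a) * len(b)) time, O(len(b)) space."""
    # prev[j] is the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = cur[j - 1] + gap   # b[j-1] aligned to a gap
            cur.append(max(diagonal, up, left))
        prev = cur
    return prev[-1]
```

Every one of the roughly N^2 cells is visited once, which is where the quadratic cost comes from. Keeping only two rows gives the score but not the alignment itself; recovering the alignment needs a traceback over the full table or a divide-and-conquer scheme.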

5.
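Sequence alignment is an important task in bioinformatics; it can reveal functional, structural, and evolutionary information in biological sequences. The biological significance of an alignment depends closely on the chosen scoring of matches, mismatches, insertions, and deletions, and on the gap penalty function. This paper introduces a parametric sequence alignment method that treats the optimal alignment as a function of the weights and penalties, so that the effect of parameter choice on the optimal alignment can be obtained systematically. The method is then applied to RNA sequence alignment to analyze how different parameter choices affect the results. Finally, applications of parametric alignment algorithms and directions for future development are outlined.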

6.
Fast, optimal alignment of three sequences using linear gap costs   Total citations: 2 (self-citations: 0, citations by others: 2)
Alignment algorithms can be used to infer a relationship between sequences when the true relationship is unknown. Simple alignment algorithms use a cost function that gives a fixed cost to each possible point mutation: mismatch, deletion, or insertion. These algorithms tend to find optimal alignments that have many small gaps. It is more biologically plausible to have fewer longer gaps rather than many small gaps in an alignment. To address this issue, linear gap cost algorithms are in common use for aligning biological sequence data. More reliable inferences are obtained by aligning more than two sequences at a time. The obvious dynamic programming algorithm for optimally aligning k sequences of length n runs in O(n^k) time. This is impractical if k ≥ 3 and n is of any reasonable length. Thus, there are many heuristics for aligning k sequences; however, they are not guaranteed to find an optimal alignment. In this paper, we present a new algorithm guaranteed to find the optimal alignment for three sequences using linear gap costs. This gives the same results as the dynamic programming algorithm for three sequences, but typically does so much more quickly. It is particularly fast when the (three-way) edit distance is small. Our algorithm uses a speed-up technique based on Ukkonen's greedy algorithm (Ukkonen, 1983), which he presented for two sequences and simple costs.

7.
A major problem in sequence alignments based on the standard dynamic programming method is that the optimal path does not necessarily yield the best equivalencing of residues assessed by structural or functional criteria. An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method. The standard pairwise weight matrix elements are modified in order to penalize, but not eliminate, the equivalencing of residues obtained from previous alignments. The algorithm thereby yields a limited set of alternate alignments that can differ considerably from the optimal. The approach is benchmarked on the alignments of immunoglobulin domains. Without prior knowledge of the optimal choice of gap penalty, one of the suboptimal alignments is shown to be more accurate than the optimal.

8.
Recomputation of previously evaluated similarity results between biological sequences becomes inevitable when researchers find errors in their sequenced data or have to compare nearly identical sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic searching bands which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we initially predict a new score of the alignment path for each contour to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence shows that it can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate the "relative node tolerance bound" (RNTB) in the pruned searching space. Furthermore, we improve the computational performance by quantifying the successful RNTB tolerance probability and switching to rDP on perturbation-resilient columns only. In our searching space derived by a threshold value of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime cost of the sparse dynamic programming (sDP) method, and only 2.55 percent of that of normal dynamic programming with the Smith-Waterman algorithm.

9.
The major algorithms currently used for aligning biological sequences are those based on the dynamic programming method. A dynamic programming algorithm consists of two major procedures, the forward and traceback routines. This paper describes a dynamic programming algorithm for aligning three sequences at a time. Deletions and insertions are penalized according to their numbers and lengths. The forward process is accomplished in O(L^3) computational steps, where L is the average sequence length. The traceback process, on the other hand, is done in T steps, where T is the number of elementary configurations involved in the optimal alignment (usually T much less than L). The traceback procedure uses an effective technique for memory management, which is applicable to a wide range of sequence-matching methods.
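The cubic forward pass can be illustrated with a minimal sketch of three-sequence dynamic programming. To stay short it uses a simplified sum-of-pairs cost, where each column pays a fixed cost per mismatching pair and per residue-gap pair; it does not implement the length-dependent gap penalties or the memory-efficient traceback described in the abstract. The function name and cost values are assumptions for illustration.

```python
from itertools import product


def three_way_sp_cost(a: str, b: str, c: str, mismatch: int = 1, gap: int = 1):
    """Minimum sum-of-pairs alignment cost of three sequences, O(L^3) cells."""

    def pair(x, y):
        # None stands for a gap character in this column.
        if x is None and y is None:
            return 0
        if x is None or y is None:
            return gap
        return 0 if x == y else mismatch

    la, lb, lc = len(a), len(b), len(c)
    inf = float("inf")
    table = [[[inf] * (lc + 1) for _ in range(lb + 1)] for _ in range(la + 1)]
    table[0][0][0] = 0
    # The 7 column types: each sequence either contributes a residue (1) or a gap (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(la + 1):
        for j in range(lb + 1):
            for k in range(lc + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue
                    x = a[pi] if di else None
                    y = b[pj] if dj else None
                    z = c[pk] if dk else None
                    cost = table[pi][pj][pk] + pair(x, y) + pair(x, z) + pair(y, z)
                    if cost < table[i][j][k]:
                        table[i][j][k] = cost
    return table[la][lb][lc]
```

Each of the (L+1)^3 cells looks at seven predecessors, which gives the O(L^3) step count quoted above. The full table also costs O(L^3) memory here, which is the practical problem a careful traceback scheme has to address.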

10.
The regulation of cellular metabolism facilitates robust cellular operation in the face of changing external conditions. The cellular response to this varying environment may include the activation or inactivation of appropriate metabolic pathways. Experimental and numerical observations of sequential timing in pathway activation have been reported in the literature. It has been argued that such patterns can be rationalized by means of an underlying optimal metabolic design. In this paper we pose a dynamic optimization problem that accounts for time-resource minimization in pathway activation under constrained total enzyme abundance. The optimized variables are time-dependent enzyme concentrations that drive the pathway to a steady state characterized by a prescribed metabolic flux. The problem formulation addresses unbranched pathways with irreversible kinetics. Neither specific reaction kinetics nor a fixed pathway length is assumed. In the optimal solution, each enzyme follows a switching profile between zero and maximum concentration, following a temporal sequence that matches the pathway topology. This result provides an analytic justification of the sequential activation previously described in the literature. In contrast with the existing numerical approaches, the activation sequence is proven to be optimal for a generic class of monomolecular kinetics. This class includes, but is not limited to, Mass Action, Michaelis–Menten, Hill, and some Power-law models. This suggests that sequential enzyme expression may be a common feature of metabolic regulation, as it is a robust property of optimal pathway activation.

11.
Dynamic programming algorithms that predict RNA secondary structure by minimizing the free energy have had one important limitation: they were able to predict only one optimal structure. Given the uncertainties of the thermodynamic data and the effects of proteins and other environmental factors on structure, the optimal structure predicted by these methods may not have biological significance. We present a dynamic programming algorithm that can determine optimal and suboptimal secondary structures for an RNA. The power and utility of the method are demonstrated in the folding of the intervening sequence of the rRNA of Tetrahymena. By first identifying the major secondary structures corresponding to the lowest free energy minima, a secondary structure of possible biological significance is derived.

12.
Vallat BK, Pillardy J, Elber R. Proteins, 2008, 72(3): 910-928
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6 Å RMSD from the native structure) decays linearly as a function of the TM structural-alignment score.

13.
Dynamic programming algorithms for restriction map comparison   Total citations: 1 (self-citations: 0, citations by others: 1)
For most sequence comparison problems there is a corresponding map comparison algorithm. While map data may appear to be incompatible with dynamic programming, we show in this paper that the rigor and efficiency of dynamic programming algorithms carry over to the map comparison algorithms. We present algorithms for restriction map comparison that deal with two types of map errors: (i) closely spaced sites for different enzymes can be ordered incorrectly, and (ii) closely spaced sites for the same enzyme can be mapped as a single site. The new algorithms are a natural extension of a previous map comparison model. Dynamic programming algorithms for computing optimal global and local alignments under the new model are described. The new algorithms take about the same order of time as previous map comparison algorithms. Programs implementing some of the new algorithms are used to find similar regions within the Escherichia coli restriction map of Kohara et al.

14.
A greedy algorithm for aligning DNA sequences   Total citations: 39 (self-citations: 0, citations by others: 39)
For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.

15.
A molecular sequence alignment algorithm based on dynamic programming has been extended to allow the computation of all pairs of residues that can be part of optimal and suboptimal sequence alignments. The uncertainties inherent in sequence alignment can be displayed using a new form of dot plot. The method allows the qualitative assessment of whether or not two sequences are related, and can reveal what parts of the alignment are better determined than others. It also permits the computation of representative optimal and suboptimal alignments. The relation between alignment reliability and alignment parameters is discussed. Other applications are to cyclical permutations of sequences and the detection of self-similarity. An application to multiple sequence alignment is noted.

16.
17.
Protein engineering by combinatorial site-directed mutagenesis evaluates a portion of the sequence space near a target protein, seeking variants with improved properties (e.g., stability, activity, immunogenicity). In order to improve the hit-rate of beneficial variants in such mutagenesis libraries, we develop methods to select optimal positions and corresponding sets of the mutations that will be used, in all combinations, in constructing a library for experimental evaluation. Our approach, OCoM (Optimization of Combinatorial Mutagenesis), encompasses both degenerate oligonucleotides and specified point mutations, and can be directed accordingly by requirements of experimental cost and library size. It evaluates the quality of the resulting library by one- and two-body sequence potentials, averaged over the variants. To ensure that it is not simply recapitulating extant sequences, it balances the quality of a library with an explicit evaluation of the novelty of its members. We show that, despite dealing with a combinatorial set of variants, in our approach the resulting library optimization problem is actually isomorphic to single-variant optimization. By the same token, this means that the two-body sequence potential results in an NP-hard optimization problem. We present an efficient dynamic programming algorithm for the one-body case and a practically-efficient integer programming approach for the general two-body case. We demonstrate the effectiveness of our approach in designing libraries for three different case study proteins targeted by previous combinatorial libraries--a green fluorescent protein, a cytochrome P450, and a beta lactamase. We found that OCoM worked quite efficiently in practice, requiring only 1 hour even for the massive design problem of selecting 18 mutations to generate 10? variants of a 443-residue P450. 
We demonstrate the general ability of OCoM in enabling the protein engineer to explore and evaluate trade-offs between quality and novelty as well as library construction technique, and identify optimal libraries for experimental evaluation.

18.

Background  

Whereas the molecular assembly of protein expression clones is readily automated and routinely accomplished in high throughput, sequence verification of these clones is still largely performed manually, an arduous and time-consuming process. The ultimate goal of validation is to determine if a given plasmid clone matches its reference sequence sufficiently to be "acceptable" for use in protein expression experiments. Given the accelerating increase in availability of tens of thousands of unverified clones, there is a strong demand for rapid, efficient and accurate software that automates clone validation.

19.
Optimal feed control for the fed-batch fermentation process of ethanol production is studied. Additional inequality constraints are introduced in this optimization problem to ensure that the optimal solution lies in a realistic region. With an updating rule for augmented Lagrange multipliers to handle these inequality constraints, iterative dynamic programming can be used in a straightforward manner to optimize fed-batch fermentors. To obtain a more accurate solution, sequential quadratic programming can then be used to solve the problem again. As a result of this optimal control, the maximum production at final time is very close to the theoretical yield. Although sequential quadratic programming can converge rapidly to the optimal solution, very good starting points must be used to ensure that the global optimum is obtained. Experimental work was used to validate this study. The simulated results fit the experiments satisfactorily.

20.
A new approach to sequence comparison: normalized sequence alignment   Total citations: 3 (self-citations: 0, citations by others: 3)
The Smith-Waterman algorithm for local sequence alignment is one of the most important techniques in computational molecular biology. This ingenious dynamic programming approach was designed to reveal the highly conserved fragments by discarding poorly conserved initial and terminal segments. However, the existing notion of local similarity has a serious flaw: it does not discard poorly conserved intermediate segments. The Smith-Waterman algorithm finds the local alignment with maximal score but it is unable to find the local alignment with the maximum degree of similarity (e.g. maximal percent of matches). Moreover, there is still no efficient algorithm that answers the following natural question: do two sequences share a (sufficiently long) fragment with more than 70% similarity? As a result, the local alignment sometimes produces a mosaic of well-conserved fragments artificially connected by poorly conserved or even unrelated fragments. This may lead to problems in comparison of long genomic sequences and comparative gene prediction, as recently pointed out by Zhang et al. (Bioinformatics, 15, 1012-1019, 1999). In this paper we propose a new sequence comparison algorithm (normalized local alignment) that reports the regions with maximum degree of similarity. The algorithm is based on fractional programming and its running time is O(n^2 log n). In practice, normalized local alignment is only 3-5 times slower than the standard Smith-Waterman algorithm.
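As a point of comparison, the standard Smith-Waterman score that the abstract takes as its baseline can be sketched as follows. This is the classical recurrence only, not the normalized local alignment algorithm proposed in the paper. The function name and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score in O(len(a) * len(b)) time, O(len(b)) space."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap
            left = cur[j - 1] + gap
            # Clamping at 0 lets an alignment restart anywhere, which is what
            # discards poorly conserved initial and terminal segments.
            cur.append(max(0, diagonal, up, left))
            best = max(best, cur[j])
        prev = cur
    return best
```

The clamp at zero acts only at the ends of an alignment. A weak stretch in the middle is kept whenever the fragments on either side score enough to pay for it, which is the mosaic effect the abstract describes and the reason it maximizes a length-normalized score instead.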
