首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
基于动态规划的快速序列比对算法   总被引:3,自引:0,他引:3  
序列比对算法是生物信息学中重要的研究方向之一,而动态规划法是序列比对算法中最有效最基本的方法.由于原有的基本动态规划方法时间和空间复杂度大,不适合实际的生物序列比对,因此本文在分析介绍几种相关动态规划算法的基础上,提出了一种基于动态规划的快速序列比对算法UKK_FA.实验结果表明,该算法有效地降低了时间复杂度,具有一定的实用性。  相似文献   

2.
Multiple sequence alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm  相似文献   

3.
An artificial neural network with a two-layer feedback topology and generalized recurrent neurons, for solving nonlinear discrete dynamic optimization problems, is developed. A direct method to assign the weights of neural networks is presented. The method is based on Bellmann's Optimality Principle and on the interchange of information which occurs during the synaptic chemical processing among neurons. The neural network based algorithm is an advantageous approach for dynamic programming due to the inherent parallelism of the neural networks; further it reduces the severity of computational problems that can occur in methods like conventional methods. Some illustrative application examples are presented to show how this approach works out including the shortest path and fuzzy decision making problems.  相似文献   

4.
Finding signals for plant promoters   总被引:2,自引:0,他引:2  
The strongest signal of plant promoter is searched with the model of single motif with two types. It turns out that the dominant type is the TATA-box. The other type may be called TATA-less signal, and may be used in gene finders for promoter recognition. While the TATA signals are very close for the monocot and the dicot, their TATA-less signals are significantly different. A general and flexible multi-motif model is also proposed for promoter analysis based on dynamic programming. By extending the Gibbs sampler to the dynamic programming and introducing temperature, an efficient algorithm is developed for searching signals in plant promoters.  相似文献   

5.
The strongest signal of plant promoter is searched with the model of single motif with two types. It turns out that the dominant type is the TATA-box. The other type may be called TATA-less signal, and may be used in gene finders for promoter recognition. While the TATA signals are very close for the monocot and the dicot, their TATA-less signals are significantly different. A general and flexible multi-motif model is also proposed for promoter analysis based on dynamic programming. By extending the Gibbs sampler to the dynamic programming and introducing temperature, an efficient algorithm is developed for searching signals in plant promoters.  相似文献   

6.
Zhou H  Zhou Y 《Proteins》2004,55(4):1005-1013
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential so that highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting the family, superfamily, fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users on http://theory.med.buffalo.edu.  相似文献   

7.
The major algorithms currently used for aligning biological sequences are those based on dynamic programming method. A dynamic programming algorithm consists of two major procedures, forward and traceback routines. This paper describes a dynamic programming algorithm for aligning three sequences at a time. Deletions and insertions are penalized according to their numbers and lengths. A forward process is accomplished in O(L3) computational steps, where L is the average sequence length. On the other hand, a traceback process is done in T steps, where T is the number of elementary configurations involved in the optimal alignment (usually T much less than L). The traceback procedure uses an effective technique for memory management, which is applicable to a wide range of sequence-matching methods.  相似文献   

8.
BCL::Align is a multiple sequence alignment tool that utilizes the dynamic programming method in combination with a customizable scoring function for sequence alignment and fold recognition. The scoring function is a weighted sum of the traditional PAM and BLOSUM scoring matrices, position-specific scoring matrices output by PSI-BLAST, secondary structure predicted by a variety of methods, chemical properties, and gap penalties. By adjusting the weights, the method can be tailored for fold recognition or sequence alignment tasks at different levels of sequence identity. A Monte Carlo algorithm was used to determine optimized weight sets for sequence alignment and fold recognition that most accurately reproduced the SABmark reference alignment test set. In an evaluation of sequence alignment performance, BCL::Align ranked best in alignment accuracy (Cline score of 22.90 for sequences in the Twilight Zone) when compared with Align-m, ClustalW, T-Coffee, and MUSCLE. ROC curve analysis indicates BCL::Align's ability to correctly recognize protein folds with over 80% accuracy. The flexibility of the program allows it to be optimized for specific classes of proteins (e.g. membrane proteins) or fold families (e.g. TIM-barrel proteins). BCL::Align is free for academic use and available online at http://www.meilerlab.org/.  相似文献   

9.
Dong E  Smith J  Heinze S  Alexander N  Meiler J 《Gene》2008,422(1-2):41-46
BCL::Align is a multiple sequence alignment tool that utilizes the dynamic programming method in combination with a customizable scoring function for sequence alignment and fold recognition. The scoring function is a weighted sum of the traditional PAM and BLOSUM scoring matrices, position-specific scoring matrices output by PSI-BLAST, secondary structure predicted by a variety of methods, chemical properties, and gap penalties. By adjusting the weights, the method can be tailored for fold recognition or sequence alignment tasks at different levels of sequence identity. A Monte Carlo algorithm was used to determine optimized weight sets for sequence alignment and fold recognition that most accurately reproduced the SABmark reference alignment test set. In an evaluation of sequence alignment performance, BCL::Align ranked best in alignment accuracy (Cline score of 22.90 for sequences in the Twilight Zone) when compared with Align-m, ClustalW, T-Coffee, and MUSCLE. ROC curve analysis indicates BCL::Align's ability to correctly recognize protein folds with over 80% accuracy. The flexibility of the program allows it to be optimized for specific classes of proteins (e.g. membrane proteins) or fold families (e.g. TIM-barrel proteins). BCL::Align is free for academic use and available online at http://www.meilerlab.org/.  相似文献   

10.
It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems of the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary changes in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, dependent or a genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.  相似文献   

11.
《Journal of biotechnology》1999,67(2-3):173-187
An evolutionary program, based on a real-code genetic algorithm (GA), is applied to calculate optimal control policies for bioreactors. The GA is used as a nonlinear optimizer in combination with simulation software and constraint handling procedures. A new class of GA-operators is introduced to obtain smooth control trajectories, which leads also to a drastic reduction in computational load. The proposed method is easy to understand and has no restrictions on the model type and structure. The performance and optimal trajectories obtained by the extended GA are compared with those calculated with two common methods: (i) dynamic programming, and (ii) a Hamiltonian based gradient algorithm. The GA proved to be a good and often superior alternative for solving optimal control problems.  相似文献   

12.
Kim D  Xu D  Guo JT  Ellrott K  Xu Y 《Protein engineering》2003,16(9):641-650
A new method for fold recognition is developed and added to the general protein structure prediction package PROSPECT (http://compbio.ornl.gov/PROSPECT/). The new method (PROSPECT II) has four key features. (i) We have developed an efficient way to utilize the evolutionary information for evaluating the threading potentials including singleton and pairwise energies. (ii) We have developed a two-stage threading strategy: (a) threading using dynamic programming without considering the pairwise energy and (b) fold recognition considering all the energy terms, including the pairwise energy calculated from the dynamic programming threading alignments. (iii) We have developed a combined z-score scheme for fold recognition, which takes into consideration the z-scores of each energy term. (iv) Based on the z-scores, we have developed a confidence index, which measures the reliability of a prediction and a possible structure-function relationship based on a statistical analysis of a large data set consisting of threadings of 600 query proteins against the entire FSSP templates. Tests on several benchmark sets indicate that the evolutionary information and other new features of PROSPECT II greatly improve the alignment accuracy. We also demonstrate that the performance of PROSPECT II on fold recognition is significantly better than any other method available at all levels of similarity. Improvement in the sensitivity of the fold recognition, especially at the superfamily and fold levels, makes PROSPECT II a reliable and fully automated protein structure and function prediction program for genome-scale applications.  相似文献   

13.
In this paper, an unsupervised learning algorithm is developed. Two versions of an artificial neural network, termed a differentiator, are described. It is shown that our algorithm is a dynamic variation of the competitive learning found in most unsupervised learning systems. These systems are frequently used for solving certain pattern recognition tasks such as pattern classification and k-means clustering. Using computer simulation, it is shown that dynamic competitive learning outperforms simple competitive learning methods in solving cluster detection and centroid estimation problems. The simulation results demonstrate that high quality clusters are detected by our method in a short training time. Either a distortion function or the minimum spanning tree method of clustering is used to verify the clustering results. By taking full advantage of all the information presented in the course of training in the differentiator, we demonstrate a powerful adaptive system capable of learning continuously changing patterns.  相似文献   

14.
Dynamic programming algorithms for restriction map comparison   总被引:1,自引:0,他引:1  
For most sequence comparison problems there is a correspondingmap comparison algorithm. While map data may appear to be incompatiblewith dynamic programming, we show in this paper that the rigorand efficiency of dynamic programming algorithms carry overto the map comparison algorithms. We present algorithms forrestriction map comparison that deal with two types of map errors:(i) closely spaced sites for different enzymes can be orderedincorrectly, and (ii) closely spaced sites for the same enzymecan be mapped as a single site. The new algorithms are a naturalextension of a previous map comparison model. Dynamic programmingalgorithms for computing optimal global and local alignmentsunder the new model are described. The new algorithms take aboutthe same order of time as previous map comparison algorithms.Programs implementing some of the new algorithms are used tofind similar regions within the Escherichia coli restrictionmap of Kohara et al.  相似文献   

15.
Protein sequence alignment has become an essential task in modern molecular biology research. A number of alignment techniques have been documented in literature and their corresponding tools are made available as freeware and commercial software. The choice and use of these tools for sequence alignment through the complete interpretation of alignment results is often considered non-trivial by end-users with limited skill in Bioinformatics algorithm development. Here, we discuss the comparison of sequence alignment techniques based on dynamic programming (N-W, S-W) and heuristics (LFASTA, BL2SEQ) for four sets of sequence data towards an educational purpose. The analysis suggests that heuristics based methods are faster than dynamic programming methods in alignment speed.  相似文献   

16.
ABSTRACT: BACKGROUND: The estimation of parameter values for mathematical models of biological systems is an optimization problem that is particularly challenging due to the nonlinearities involved. One major difficulty is the existence of multiple minima in which standard optimization methods may fall during the search. Deterministic global optimization methods overcome this limitation, ensuring convergence to the global optimum within a desired tolerance. Global optimization techniques are typically classified into stochastic and deterministic. The former typically lead to lower CPU times but offer no guarantee of convergence to the global minimum in a finite number of iterations. In contrast, deterministic methods provide solutions of a given quality (i.e., optimality gap), but tend to lead to large computational burdens. RESULTS: This work presents a deterministic outer approximation-based algorithm for the global optimization of dynamic problems arising in the parameter estimation of models of biological systems. Our approach, which offers a theoretical guarantee of convergence to the global minimum, reformulating the set of ordinary differential equations into an equivalent set of algebraic equations through the use of orthogonal collocation methods, giving rise to a nonconvex nonlinear programming (NLP) problem. This nonconvex NLP is decomposed into two hierarchical levels: a master mixed-integer linear programming problem (MILP) that provides a rigorous lower bound on the optimal solution, and a reduced-space slave NLP that yields an upper bound. The algorithm iterates between these two levels until a termination criterion is satisfied. CONCLUSION: The capabilities of our approach were tested in two benchmark problems, in which the performance of our algorithm was compared with that of the commercial global optimization package BARON. The proposed strategy produced near optimal solutions (i.e., within a desired tolerance) in a fraction of the CPU time required by BARON.  相似文献   

17.
S Vajda  C Delisi 《Biopolymers》1990,29(14):1755-1772
A combinatorial optimization approach is used for solving the multiple-minima problem when determining the low-energy conformations of short polypeptides. Each residue is represented by a finite number of discrete states corresponding to single residue local minima of the energy function. These precomputed values constitute a search table and define the conformational space for discrete minimization by a generalized dynamic programming algorithm that significantly limits the number of intermediate conformations to be generated during the search. Since dynamic programming involves stagewise decisions, it results in buildup-type procedures implemented in two different forms. The first procedure predicts a number of conformations by a completely discrete search and these are subsequently refined by local minimization. The second involves limited continuous local minimization within the combinatorial algorithm, generally restricted to two dihedral angles in a buildup step. Both procedures are tested on 17 short peptides previously studied by other global minimization methods but involving the same potential energy function. The discrete method is extremely fast, but proves to be successful only in 14 of the 17 test problems. The version with limited local minimization finds, however, conformations in all the 17 examples that are close to the ones previously presented in the literature or have lower energies. In addition, results are almost independent of the cutoff energy, the most important parameter governing the search. Although the limited local minimization increases the number of energy evaluations, the method still offers substantial advantages in speed.  相似文献   

18.
A major problem in sequence alignments based on the standard dynamic programming method is that the optimal path does not necessarily yield the best equivalencing of residues assessed by structural or functional criteria. An algorithm is presented that finds suboptimal alignments of protein sequences by a simple modification to the standard dynamic programming method. The standard pairwise weight matrix elements are modified in order to penalize, but not eliminate, the equivalencing of residues obtained from previous alignments. The algorithm thereby yields a limited set of alternate alignments that can differ considerably from the optimal. The approach is benchmarked on the alignments of immunoglobulin domains. Without a prior knowledge of the optimal choice of gap penalty, one of the suboptimal alignments is shown to be more accurate than the optimal.  相似文献   

19.
This article discusses the problem of unloading a sequence of boxes from a single conveyor line with a minimum number of moves. The problem under study is efficiently solvable with dynamic programming if the complete sequence of boxes is known in advance. In practice, however, the problem typically occurs in a real-time setting where the boxes are simultaneously placed on and picked from the conveyor line. Moreover, a large part of the sequence is often not visible. As a result, only a part of the sequence is known when deciding which boxes to move next. We develop an online algorithm that evaluates the quality of each possible move with a scenario-based stochastic method. Two versions of the algorithm are analyzed: in one version, the quality of each scenario is measured with an exact method, while a heuristic technique is applied in the second version. We evaluate the performance of the proposed algorithms using extensive computational experiments and establish a simple policy for determining which version to choose for specific problems. Numerical results show that the proposed approach consistently provides high-quality results, and compares favorably with the best known deterministic online algorithms. Indeed, the new approach typically provides results with relative gaps of 1–5% to the optimum, which is about 20–80% lower than those obtained with the best deterministic approach.  相似文献   

20.
MOTIVATION: Missing data in genotyping single nucleotide polymorphism (SNP) spots are common. High-throughput genotyping methods usually have a high rate of missing data. For example, the published human chromosome 21 data by Patil et al. contains about 20% missing SNPs. Inferring missing SNPs using the haplotype block structure is promising but difficult because the haplotype block boundaries are not well defined. Here we propose a global algorithm to overcome this difficulty. RESULTS: First, we propose to use entropy as a measure of haplotype diversity. We show that the entropy measure combined with a dynamic programming algorithm produces better haplotype block partitions than other measures. Second, based on the entropy measure, we propose a two-step iterative partition-inference algorithm for the inference of missing SNPs. At the first step, we apply the dynamic programming algorithm to partition haplotypes into blocks. At the second step, we use an iterative process similar to the expectation-maximization algorithm to infer missing SNPs in each haplotype block so as to minimize the block entropy. The algorithm iterates these two steps until the total block entropy is minimized. We test our algorithm in several experimental data sets. The results show that the global approach significantly improves the accuracy of the inference. AVAILABILITY: Upon request.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号