Similar Documents
20 similar documents found (search time: 31 ms)
1.
Finding the minimum energy amino acid side-chain conformation is a fundamental problem in both homology modeling and protein design. To address this issue, numerous computational algorithms have been proposed. However, there have been few quantitative comparisons between methods and there is very little general understanding of the types of problems that are appropriate for each algorithm. Here, we study four common search techniques: Monte Carlo (MC) and Monte Carlo plus quench (MCQ); genetic algorithms (GA); self-consistent mean field (SCMF); and dead-end elimination (DEE). Both SCMF and DEE are deterministic, and if DEE converges, it is guaranteed that its solution is the global minimum energy conformation (GMEC). This provides a means to compare the accuracy of SCMF and the stochastic methods. For the side-chain placement calculations, we find that DEE rapidly converges to the GMEC in all the test cases. The other algorithms converge on significantly incorrect solutions; the average fraction of incorrect rotamers for SCMF is 0.12, GA 0.09, and MCQ 0.05. For the protein design calculations, design positions are progressively added to the side-chain placement calculation until the time required for DEE diverges sharply. As the complexity of the problem increases, the accuracy of each method is determined so that the results can be extrapolated into the region where DEE is no longer tractable. We find that both SCMF and MCQ perform reasonably well on core calculations (the fraction of amino acids incorrect is SCMF 0.07, MCQ 0.04), but perform considerably worse on the boundary (SCMF 0.28, MCQ 0.32) and surface calculations (SCMF 0.37, MCQ 0.44).
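As a concrete illustration of the stochastic search being compared above, the sketch below implements a bare-bones Monte Carlo plus quench (MCQ) rotamer search. It is a minimal sketch, not the implementation used in the study: it assumes a pairwise-decomposable energy in which self_e[i][r] is the intrinsic energy of rotamer r at design position i and pair_e[i][j][r][s] (defined for i < j) is the interaction energy between rotamers at positions i and j.

```python
import math
import random

def total_energy(conf, self_e, pair_e):
    """Total energy of a rotamer assignment (one rotamer index per position)."""
    e = sum(self_e[i][r] for i, r in enumerate(conf))
    for i in range(len(conf)):
        for j in range(i + 1, len(conf)):
            e += pair_e[i][j][conf[i]][conf[j]]
    return e

def monte_carlo_quench(self_e, pair_e, n_rot, steps=10000, kT=1.0, seed=0):
    """n_rot[i]: number of rotamers available at position i."""
    rng = random.Random(seed)
    conf = [rng.randrange(n_rot[i]) for i in range(len(n_rot))]
    e = total_energy(conf, self_e, pair_e)
    best_conf, best_e = list(conf), e
    # Metropolis Monte Carlo phase: random single-rotamer moves.
    for _ in range(steps):
        i = rng.randrange(len(conf))
        trial = list(conf)
        trial[i] = rng.randrange(n_rot[i])
        de = total_energy(trial, self_e, pair_e) - e
        if de <= 0 or rng.random() < math.exp(-de / kT):
            conf, e = trial, e + de
            if e < best_e:
                best_conf, best_e = list(conf), e
    # Quench phase: greedy single-position improvements until none remain.
    improved = True
    while improved:
        improved = False
        for i in range(len(conf)):
            for r in range(n_rot[i]):
                trial = list(conf)
                trial[i] = r
                te = total_energy(trial, self_e, pair_e)
                if te < e - 1e-12:
                    conf, e, improved = trial, te, True
    return (conf, e) if e < best_e else (best_conf, best_e)
```

A production implementation would cache per-position energy sums so that single-rotamer moves cost O(n) rather than O(n^2), but the control flow is the same.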

2.
Dead-end elimination with backbone flexibility
MOTIVATION: Dead-End Elimination (DEE) is a powerful algorithm capable of reducing the search space for structure-based protein design by a combinatorial factor. By using a fixed backbone template, a rotamer library, and a potential energy function, DEE identifies and prunes rotamer choices that are provably not part of the Global Minimum Energy Conformation (GMEC), effectively eliminating the majority of the conformations that must be subsequently enumerated to obtain the GMEC. Since a fixed-backbone model biases the algorithm predictions against protein sequences for which even small backbone movements may result in significantly enhanced stability, the incorporation of backbone flexibility can improve the accuracy of the design predictions. If explicit backbone flexibility is incorporated into the model, however, the traditional DEE criteria can no longer guarantee that the flexible-backbone GMEC, the lowest-energy conformation when the backbone is allowed to flex, will not be pruned. RESULTS: We derive a novel DEE pruning criterion, flexible-backbone DEE (BD), that is provably accurate with backbone flexibility, guaranteeing that no rotamers belonging to the flexible-backbone GMEC are pruned; we also present further enhancements to BD for improved pruning efficiency. The results from applying our novel algorithms to redesign the beta1 domain of protein G and to switch the substrate specificity of the NRPS enzyme GrsA-PheA are then compared against the results from previous fixed-backbone DEE algorithms. We confirm experimentally that traditional DEE is indeed not provably accurate with backbone flexibility and that BD is capable of generating conformations with significantly lower energies, thus confirming the feasibility of our novel algorithms. AVAILABILITY: Contact authors for source code.
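For readers unfamiliar with the fixed-backbone criterion that the flexible-backbone BD criterion generalizes, here is a minimal sketch of classic Goldstein-style DEE pruning. It is illustrative only and assumes a precomputed pairwise energy matrix (self_e[i][r], plus a symmetric pair_e[i][j][r][s] defined for every pair of distinct positions); the paper's BD criterion additionally accounts for backbone degrees of freedom.

```python
def goldstein_prune(self_e, pair_e, alive):
    """self_e[i][r]: intrinsic energy of rotamer r at position i.
    pair_e[i][j][r][s]: pairwise energy, assumed defined (and symmetric)
    for every pair of distinct positions i, j.
    alive[i]: set of rotamer indices not yet pruned at position i."""
    pruned_any = False
    for i in range(len(alive)):
        for r in list(alive[i]):
            for t in list(alive[i]):
                if t == r:
                    continue
                # Worst-case energy gap of r over competitor t.
                bound = self_e[i][r] - self_e[i][t]
                for j in range(len(alive)):
                    if j == i:
                        continue
                    bound += min(pair_e[i][j][r][s] - pair_e[i][j][t][s]
                                 for s in alive[j])
                if bound > 0:          # r provably cannot be in the GMEC
                    alive[i].discard(r)
                    pruned_any = True
                    break
    return pruned_any
```

In practice the function is called repeatedly until it returns False, and the surviving rotamers are handed to an enumeration step such as A*.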

3.
A reduction-based exact algorithm for the contact map overlap problem.
Aligning proteins based on their structural similarity is a fundamental problem in molecular biology with applications in many settings, including structure classification, database search, function prediction, and assessment of folding prediction methods. Structural alignment can be done via several methods, including contact map overlap (CMO) maximization that aligns proteins in a way that maximizes the number of common residue contacts. In this paper, we develop a reduction-based exact algorithm for the CMO problem. Our approach solves CMO directly rather than after transformation to other combinatorial optimization problems. We exploit the mathematical structure of the problem in order to develop a number of efficient lower bounding, upper bounding, and reduction schemes. Computational experiments demonstrate that our algorithm runs significantly faster than existing exact algorithms and solves some hard CMO instances that were not solved in the past. In addition, the algorithm produces protein clusters that are in excellent agreement with the SCOP classification. An implementation of our algorithm is accessible as an on-line server at http://eudoxus.scs.uiuc.edu/cmos/cmos.html.  相似文献   
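The quantity being maximized is easy to state even though optimizing it is hard. The sketch below scores a candidate alignment under the CMO objective; it is a toy evaluation routine, not the authors' reduction-based solver, and it assumes contact maps are given as sets of residue-index pairs and the alignment as an order-preserving dictionary from residues of one protein to the other.

```python
def contact_map_overlap(contacts_a, contacts_b, alignment):
    """contacts_a, contacts_b: sets of residue-index pairs (i, j).
    alignment: dict mapping residue indices of protein A to protein B,
    assumed strictly increasing (i.e., non-crossing)."""
    overlap = 0
    for (i, j) in contacts_a:
        if i in alignment and j in alignment:
            pair = (alignment[i], alignment[j])
            if pair in contacts_b or pair[::-1] in contacts_b:
                overlap += 1
    return overlap
```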

4.
Mark A. Hallen. Proteins, 2019, 87(1): 62-73.
Protein design algorithms must search an enormous conformational space to identify favorable conformations. As a result, those that perform this search with guarantees of accuracy generally start with a conformational pruning step, such as dead-end elimination (DEE). However, the mathematical assumptions of DEE-based pruning algorithms have up to now severely restricted the biophysical model that can feasibly be used in protein design. To lift these restrictions, I propose to prune local unrealistic geometries (PLUG) using a linear programming-based method. PLUG's biophysical model consists only of well-known lower bounds on interatomic distances. PLUG is intended as preprocessing for energy-based protein design calculations, whose biophysical model need not support DEE pruning. Based on 96 test cases, PLUG is at least as effective at pruning as DEE for larger protein designs—the type that most require pruning. When combined with the LUTE protein design algorithm, PLUG greatly facilitates designs that account for continuous entropy, large multistate designs with continuous flexibility, and designs with extensive continuous backbone flexibility and advanced nonpairwise energy functions. Many of these designs are tractable only with PLUG, either for empirical reasons (LUTE's machine learning step achieves an accurate fit only after PLUG pruning), or for theoretical reasons (many energy functions are fundamentally incompatible with DEE).  相似文献   
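The following is a deliberately simplified illustration of the kind of geometric test PLUG formalizes: a rotamer pair is rejected when no motion within the allowed continuous flexibility can keep every atom pair above a known lower bound on interatomic distance. The per-atom positional tolerances used here are a crude stand-in for the linear program that the actual method solves; all names (atoms_r, min_dist, tol_r) are illustrative.

```python
import math

def pair_is_pruned(atoms_r, atoms_s, min_dist, tol_r=0.0, tol_s=0.0):
    """atoms_r, atoms_s: lists of (x, y, z) atom positions for the two rotamers.
    min_dist[(a, b)]: lower bound on the allowed distance between atom a of the
    first rotamer and atom b of the second.
    tol_r, tol_s: how far each atom may move under continuous flexibility."""
    for a, pa in enumerate(atoms_r):
        for b, pb in enumerate(atoms_s):
            d = math.dist(pa, pb)
            # Even the most favorable motion within the tolerances cannot open
            # this distance past its lower bound, so the geometry is impossible.
            if d + tol_r + tol_s < min_dist[(a, b)]:
                return True
    return False
```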

5.
Despite significant successes in structure‐based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest‐energy structures and sequences are found. DEE/A*‐based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap‐free list of low‐energy protein conformations, which is necessary for ensemble‐based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*‐based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs. Proteins 2015; 83:1859–1877. © 2015 Wiley Periodicals, Inc.  相似文献   
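To make the A* setting concrete, the sketch below runs A* over the rotamer-assignment tree with the classic admissible bound in which each unassigned position contributes its best-case self, assigned-interaction, and unassigned-interaction energies. It is a baseline sketch under a pairwise energy model (self_e, pair_e for positions i < j), not the dynamically ordered, tighter-bounded search described above.

```python
import heapq

def astar_gmec(self_e, pair_e, n_rot):
    """self_e[i][r]: intrinsic energy; pair_e[i][j][r][s]: pairwise energy
    for positions i < j; n_rot[i]: number of rotamers at position i."""
    n = len(n_rot)

    def g(conf):
        # Exact energy of the assigned prefix.
        e = sum(self_e[i][r] for i, r in enumerate(conf))
        e += sum(pair_e[i][j][conf[i]][conf[j]]
                 for j in range(len(conf)) for i in range(j))
        return e

    def h(conf):
        # Admissible lower bound on the energy still to be added.
        k, bound = len(conf), 0.0
        for j in range(k, n):
            best = float("inf")
            for s in range(n_rot[j]):
                e = self_e[j][s]
                e += sum(pair_e[i][j][conf[i]][s] for i in range(k))
                e += sum(min(pair_e[j][jj][s][t] for t in range(n_rot[jj]))
                         for jj in range(j + 1, n))
                best = min(best, e)
            bound += best
        return bound

    frontier = [(h(()), ())]                 # entries are (f = g + h, prefix)
    while frontier:
        f, conf = heapq.heappop(frontier)
        if len(conf) == n:
            return list(conf), f             # first full leaf popped is the GMEC
        i = len(conf)
        for r in range(n_rot[i]):
            child = conf + (r,)
            heapq.heappush(frontier, (g(child) + h(child), child))
```

Because the heuristic never overestimates the remaining energy, the first fully assigned conformation removed from the priority queue is guaranteed to be the GMEC; the improvements described above reduce how many partial conformations must be expanded before that happens.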

6.
The dead-end elimination (DEE) theorems are powerful tools for the combinatorial optimization of protein side-chain placement in protein design and homology modeling. In order to reach their full potential, the theorems must be extended to handle very hard problems. We present a suite of new algorithms within the DEE paradigm that significantly extend its range of convergence and reduce run time. As a demonstration, we show that a total protein design problem of 10^115 combinations, a hydrophobic core design problem of 10^244 combinations, and a side-chain placement problem of 10^1044 combinations are solved in less than two weeks, a day and a half, and an hour of CPU time, respectively. This extends the range of the method by approximately 53, 144 and 851 log-units, respectively, using modest computational resources. Small to average-sized protein domains can now be designed automatically, and side-chain placement calculations can be solved for nearly all sizes of proteins and protein complexes in the growing field of structural genomics.

7.
Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code.  相似文献   
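The difference between rigid and continuous rotamers can be illustrated with a toy energy evaluation: in the rigid model a rotamer's chi angles are fixed at their library values, while in the continuous model they may relax within a small window around those values. The sketch below uses a crude grid search over that window and a user-supplied energy_fn; it is only meant to show why continuous-rotamer energies are lower, and it is not the iMinDEE pruning machinery itself.

```python
import itertools

def rigid_pair_energy(energy_fn, chis_a, chis_b):
    """Energy with both side chains fixed at their library chi angles."""
    return energy_fn(chis_a, chis_b)

def continuous_pair_energy(energy_fn, chis_a, chis_b, window=9.0, step=3.0):
    """Lowest energy found when every chi angle may move within +/- window
    degrees of its library value (crude exhaustive grid with the given step)."""
    n_steps = int(window // step)
    offsets = [k * step for k in range(-n_steps, n_steps + 1)]
    best = float("inf")
    for da in itertools.product(offsets, repeat=len(chis_a)):
        for db in itertools.product(offsets, repeat=len(chis_b)):
            a = [c + d for c, d in zip(chis_a, da)]
            b = [c + d for c, d in zip(chis_b, db)]
            best = min(best, energy_fn(a, b))
    return best
```

Even this brute-force version makes the point: the minimum found over the window can only be less than or equal to the rigid-rotamer energy.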

8.
The maximum likelihood (ML) method of phylogenetic tree construction is not as widely used as other tree construction methods (e.g., parsimony, neighbor-joining) because of the prohibitive amount of time required to find the ML tree when the number of sequences under consideration is large. To overcome this difficulty, we propose a stochastic search strategy for estimation of the ML tree that is based on a simulated annealing algorithm. The algorithm works by moving through tree space by way of a "local rearrangement" strategy so that topologies that improve the likelihood are always accepted, whereas those that decrease the likelihood are accepted with a probability that is related to the proportionate decrease in likelihood. Besides greatly reducing the time required to estimate the ML tree, the stochastic search strategy is less likely to become trapped in local optima than are existing algorithms for ML tree estimation. We demonstrate the success of the modified simulated annealing algorithm by comparing it with two existing algorithms (Swofford's PAUP* and Felsenstein's DNAMLK) for several theoretical and real data examples.  相似文献   
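A skeleton of the acceptance rule described above is sketched below. It is generic simulated annealing with a geometric cooling schedule, not the authors' program: log_likelihood and propose_local_rearrangement are user-supplied placeholders (for example, a substitution-model likelihood and an NNI-style local rearrangement).

```python
import math
import random

def anneal_tree(tree, log_likelihood, propose_local_rearrangement,
                t0=1.0, cooling=0.995, steps=20000, seed=0):
    """Generic simulated annealing over tree topologies.
    log_likelihood(tree) and propose_local_rearrangement(tree, rng) are
    user-supplied; higher log-likelihood is better."""
    rng = random.Random(seed)
    temp = t0
    current_ll = log_likelihood(tree)
    best_tree, best_ll = tree, current_ll
    for _ in range(steps):
        candidate = propose_local_rearrangement(tree, rng)
        cand_ll = log_likelihood(candidate)
        delta = cand_ll - current_ll            # positive means an improvement
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            tree, current_ll = candidate, cand_ll
            if current_ll > best_ll:
                best_tree, best_ll = tree, current_ll
        temp *= cooling                          # geometric cooling schedule
    return best_tree, best_ll
```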

9.
Tree search and its more complicated variant, tree search and simultaneous multiple DNA sequence alignment, are difficult NP-complete optimization problems, which require the application of advanced computational techniques, if large data sets are to be solved within reasonable computation times. Traditionally tree search has been attacked with a search strategy that is best described as multistart hill-climbing; local search by branch swapping has been performed on several different starting trees. Recently a different tree search strategy was tested in the Parsigal parsimony program, which used a combination of evolutionary optimization and local search. Evolutionary optimization algorithms use principles adopted from biological evolution to solve technical optimization tasks. Evolutionary optimization is a stochastic global search method, which means that the method is able to escape local optima, and is in principle able to produce any solution in the search space (although this may take a long time). Local search techniques, such as branch swapping, employ a completely different search strategy; they exploit local information maximally in order to achieve quick improvement in the value of the objective function. However, local search algorithms lack the ability to escape from local optima, which is a fundamental requirement for any search algorithm that aims to be able to discover the global optimum of a multimodal optimization problem. Hence it seems that an optimization strategy combining the good properties of both evolutionary algorithms and local search would be ideal. In this study, aspects of global optimization and local search are discussed, and the method of simulated evolutionary optimization is reviewed in detail. The application of simulated evolutionary optimization to tree search in Parsigal is then reviewed briefly.  相似文献   
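The combined strategy discussed here can be summarized in a short memetic-search skeleton: an evolutionary outer loop supplies global exploration while a local-search step refines every new individual. This is a generic sketch rather than the Parsigal algorithm; score, crossover, mutate, and local_search are problem-specific placeholders (for tree search they would be parsimony length, tree recombination, topology perturbation, and branch swapping).

```python
import random

def memetic_search(init_population, score, crossover, mutate, local_search,
                   generations=100, seed=0):
    """score: lower is better (e.g., parsimony tree length).
    crossover, mutate, local_search: user-supplied, problem-specific operators."""
    rng = random.Random(seed)
    population = [local_search(ind) for ind in init_population]
    for _ in range(generations):
        population.sort(key=score)
        parents = population[: max(2, len(population) // 2)]
        children = []
        while len(children) < len(population) - len(parents):
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b), rng)
            children.append(local_search(child))   # Lamarckian local refinement
        population = parents + children
    return min(population, key=score)
```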

10.
One bottleneck in NMR structure determination lies in the laborious and time-consuming process of side-chain resonance and NOE assignments. Compared to the well-studied backbone resonance assignment problem, automated side-chain resonance and NOE assignments are relatively less explored. Most NOE assignment algorithms require nearly complete side-chain resonance assignments from a series of through-bond experiments such as HCCH-TOCSY or HCCCONH. Unfortunately, these TOCSY experiments perform poorly on large proteins. To overcome this deficiency, we present a novel algorithm, called Nasca (NOE Assignment and Side-Chain Assignment), to automate both side-chain resonance and NOE assignments and to perform high-resolution protein structure determination in the absence of any explicit through-bond experiment to facilitate side-chain resonance assignment, such as HCCH-TOCSY. After casting the assignment problem into a Markov Random Field (MRF), Nasca extends and applies combinatorial protein design algorithms to compute optimal assignments that best interpret the NMR data. The MRF captures the contact map information of the protein derived from NOESY spectra, exploits the backbone structural information determined by RDCs, and considers all possible side-chain rotamers. The complexity of the combinatorial search is reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is employed to find a set of optimal side-chain resonance assignments that best fit the NMR data. These side-chain resonance assignments are then used to resolve the NOE assignment ambiguity and compute high-resolution protein structures. Tests on five proteins show that Nasca assigns resonances for more than 90% of side-chain protons, and achieves about 80% correct assignments. The final structures computed using the NOE distance restraints assigned by Nasca have backbone RMSD 0.8–1.5 Å from the reference structures determined by traditional NMR approaches.  相似文献   

11.
In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. The algorithm uses both score-based bounding and a novel bounding technique based on the "consistency" of the alignment. A sequence order independent search tree is used in conjunction with a technique for avoiding redundant calculations inherent in the structure of the tree. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a short fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program on a dataset of 300 E. coli promoter sequences and a dataset of 85 lipocalin protein sequences. For a motif width of 4, the optimal alignment of the entire set of sequences can be found. For the more natural motif width of 6, the program can align 21 sequences of length 100, more than twice the number of sequences which can be aligned by the best previous exact algorithm. The algorithm can relax the constraint of requiring each sequence to be aligned, and align 105 of the 300 promoter sequences with a motif width of 6. For the lipocalin dataset, we introduce a technique for reducing the effective alphabet size with a minimal loss of useful information. With this technique, we show that the program can find meaningful motifs in a reasonable amount of time by optimizing the score over three motif positions.  相似文献   
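A toy version of score-based branch and bound for gapless motif alignment is sketched below. It searches over one start position per sequence, scores a motif as the sum over columns of the most common letter's count, and prunes a subtree when even a perfect contribution from the remaining sequences could not beat the best alignment found so far. The real program uses far stronger bounds (including the consistency bound mentioned above) and careful ordering, so this is only a structural illustration.

```python
def motif_bnb(seqs, width):
    """Exhaustive branch and bound over one motif start position per sequence."""
    best = {"score": -1, "starts": None}

    def column_score(starts):
        # Sum over motif columns of the count of the most common letter.
        score = 0
        for c in range(width):
            col = [seqs[i][starts[i] + c] for i in range(len(starts))]
            score += max(col.count(x) for x in set(col))
        return score

    def recurse(starts):
        i = len(starts)
        if i == len(seqs):
            s = column_score(starts)
            if s > best["score"]:
                best.update(score=s, starts=list(starts))
            return
        if i >= 1:
            # Optimistic bound: every remaining sequence matches perfectly.
            if column_score(starts) + (len(seqs) - i) * width <= best["score"]:
                return                      # prune this subtree
        for start in range(len(seqs[i]) - width + 1):
            recurse(starts + [start])

    recurse([])
    return best["score"], best["starts"]
```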

12.
A rapid heuristic algorithm for finding minimum evolution trees
The minimum sum of branch lengths (S), or the minimum evolution (ME) principle, has been shown to be a good optimization criterion in phylogenetic inference. Unfortunately, the number of topologies to be analyzed is computationally prohibitive when a large number of taxa are involved. Therefore, simplified, heuristic methods, such as the neighbor-joining (NJ) method, are usually employed instead. The NJ method analyzes only a small number of trees (compared with the size of the entire search space); so, the tree obtained may not be the ME tree (for which the S value is minimum over the entire search space). Different compromises between very restrictive and exhaustive search spaces have been proposed recently. In particular, the "stepwise algorithm" (SA) utilizes what is known in computer science as the "beam search," whereas the NJ method employs a "greedy search." SA is virtually guaranteed to find the ME trees while being much faster than exhaustive search algorithms. In this study we propose an even faster method for finding the ME tree. The new algorithm adjusts its search exhaustiveness (from greedy to complete) according to the statistical reliability of the tree node being reconstructed. It is also virtually guaranteed to find the ME tree. The performances and computational efficiencies of ME, SA, NJ, and our new method were compared in extensive simulation studies. The new algorithm was found to perform practically as well as the SA (and, therefore, ME) methods and slightly better than the NJ method. For searching for the globally optimal ME tree, the new algorithm is significantly faster than existing ones, thus making it relatively practical for obtaining all trees with an S value equal to or smaller than that of the NJ tree, even when a large number of taxa is involved.  相似文献   
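For reference, the greedy step that the NJ method repeats, and that the approaches above relax, is simple to state: at each iteration join the pair of nodes minimizing the standard Q criterion. The sketch below shows that single pair-selection step for a symmetric distance matrix; it is a textbook formulation, not code from the paper.

```python
def nj_pick_pair(dist):
    """dist: symmetric list-of-lists distance matrix over the current nodes.
    Returns the pair (i, j) minimizing Q(i, j) = (n - 2) d(i, j) - R_i - R_j."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair
```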

13.
This paper introduces a novel class of tree comparison problems strongly motivated by an important and cost-intensive step in the drug discovery pipeline, namely mapping cell-bound receptors to the ligands they bind and vice versa. Tree comparison studies motivated by problems such as virus-host tree comparison, gene-species tree comparison, and the consensus tree problem have been reported. None of these studies is applicable in our context because in all of these problems there is a well-defined mapping, across the set of trees being compared, of the nodes the trees are built on. A new class of tree comparison problems arises in cases where finding the correspondence among the nodes of the trees being compared is itself the problem. The problem arises when trying to find the interclass correspondence between the members of a pair of coevolving classes, e.g., cell-bound receptors and their ligands. Given the evolution of the two classes, the combinatorial problem is to find a mapping among the leaves of the two trees that optimizes a given cost function. In this work we formulate, for the first time, various combinatorial optimization problems motivated by this biological problem. We present hardness results, give an efficient algorithm for a restricted version of the problem, and demonstrate its applicability.

14.
MOTIVATION: Structure-based protein redesign can help engineer proteins with desired novel function. Improving computational efficiency while still maintaining the accuracy of the design predictions has been a major goal for protein design algorithms. The combinatorial nature of protein design results both from allowing residue mutations and from the incorporation of protein side-chain flexibility. Under the assumption that a single conformation can model protein folding and binding, the goal of many algorithms is the identification of the Global Minimum Energy Conformation (GMEC). A dominant theorem for the identification of the GMEC is Dead-End Elimination (DEE). DEE-based algorithms have proven capable of eliminating the majority of candidate conformations, while guaranteeing that only rotamers not belonging to the GMEC are pruned. However, when the protein design process incorporates rotameric energy minimization, DEE is no longer provably-accurate. Hence, with energy minimization, the minimized-DEE (MinDEE) criterion must be used instead. RESULTS: In this paper, we present provably-accurate improvements to both the DEE and MinDEE criteria. We show that our novel enhancements result in a speedup of up to a factor of more than 1000 when applied in redesign for three different proteins: Gramicidin Synthetase A, plastocyanin, and protein G. AVAILABILITY: Contact authors for source code.  相似文献   

15.
Computational protein and drug design generally require accurate modeling of protein conformations. This modeling typically starts with an experimentally determined protein structure and considers possible conformational changes due to mutations or new ligands. The DEE/A* algorithm provably finds the global minimum‐energy conformation (GMEC) of a protein assuming that the backbone does not move and the sidechains take on conformations from a set of discrete, experimentally observed conformations called rotamers. DEE/A* can efficiently find the overall GMEC for exponentially many mutant sequences. Previous improvements to DEE/A* include modeling ensembles of sidechain conformations and either continuous sidechain or backbone flexibility. We present a new algorithm, DEEPer (Dead-End Elimination with Perturbations), that combines these advantages and can also handle much more extensive backbone flexibility and backbone ensembles. DEEPer provably finds the GMEC or, if desired by the user, all conformations and sequences within a specified energy window of the GMEC. It includes the new abilities to handle arbitrarily large backbone perturbations and to generate ensembles of backbone conformations. It also incorporates the shear, an experimentally observed local backbone motion never before used in design. Additionally, we derive a new method to accelerate DEE/A*‐based calculations, indirect pruning, that is particularly useful for DEEPer. In 67 benchmark tests on 64 proteins, DEEPer consistently identified lower‐energy conformations than previous methods did, indicating more accurate modeling. Additional tests demonstrated its ability to incorporate larger, experimentally observed backbone conformational changes and to model realistic conformational ensembles. These capabilities provide significant advantages for modeling protein mutations and protein–ligand interactions. Proteins 2013. © 2012 Wiley Periodicals, Inc.

16.
Venkatraman V, Ritchie DW. Proteins, 2012, 80(9): 2262-2274.
Modeling conformational changes in protein docking calculations is challenging. To make the calculations tractable, most current docking algorithms typically treat proteins as rigid bodies and use soft scoring functions that implicitly accommodate some degree of flexibility. Alternatively, ensembles of structures generated from molecular dynamics (MD) may be cross-docked. However, such combinatorial approaches can produce many thousands or even millions of docking poses, and require fast and sensitive scoring functions to distinguish them. Here, we present a novel approach called "EigenHex," which is based on normal mode analyses (NMAs) of a simple elastic network model of protein flexibility. We initially assume that the proteins to be docked are rigid, and we begin by performing conventional soft docking using the Hex polar Fourier correlation algorithm. We then apply a pose-dependent NMA to each of the top 1000 rigid body docking solutions, and we sample and re-score multiple perturbed docking conformations generated from linear combinations of up to 20 eigenvectors using a multi-threaded particle swarm optimization algorithm. When applied to the 63 "rigid body" targets of the Protein Docking Benchmark version 2.0, our results show that sampling and re-scoring from just one to three eigenvectors gives a modest but consistent improvement for these targets. Thus, pose-dependent NMA avoids the need to sample multiple eigenvectors and it offers a promising alternative to combinatorial cross-docking.  相似文献   
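A compact sketch of the elastic-network normal mode analysis on which this kind of approach rests is given below: build the anisotropic-network-model Hessian from C-alpha coordinates with a distance cutoff, diagonalize it, and keep the low-frequency eigenvectors as candidate deformation directions. The 12 A cutoff, unit spring constant, and 20 retained modes are illustrative defaults, not parameters taken from the paper.

```python
import numpy as np

def anm_modes(coords, cutoff=12.0, gamma=1.0, n_modes=20):
    """coords: (N, 3) array of C-alpha positions. Returns the n_modes
    lowest-frequency non-trivial eigenvectors, shape (3N, n_modes)."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hessian = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > cutoff ** 2:
                continue
            block = -gamma * np.outer(d, d) / r2       # 3x3 off-diagonal block
            hessian[3*i:3*i+3, 3*j:3*j+3] = block
            hessian[3*j:3*j+3, 3*i:3*i+3] = block
            hessian[3*i:3*i+3, 3*i:3*i+3] -= block
            hessian[3*j:3*j+3, 3*j:3*j+3] -= block
    eigvals, eigvecs = np.linalg.eigh(hessian)
    # Skip the six (generically) zero modes: rigid-body translation/rotation.
    return eigvecs[:, 6:6 + n_modes]
```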

17.
In protein–ligand docking, an optimization algorithm is used to find the best binding pose of a ligand against a protein target. This algorithm plays a vital role in determining the docking accuracy. To evaluate the relative performance of different optimization algorithms and provide guidance for real applications, we performed a comparative study of six efficient optimization algorithms, comprising two evolutionary algorithm (EA)-based optimizers (LGA, DockDE) and four particle swarm optimization (PSO)-based optimizers (SODock, varCPSO, varCPSO-ls, FIPSDock), all implemented in the protein–ligand docking program AutoDock. We unified the objective functions by applying the same scoring function, and defined a new fitness-accuracy measure as the evaluation criterion, incorporating optimization accuracy, robustness, and efficiency. The varCPSO and varCPSO-ls algorithms show high efficiency with fast convergence speed. However, their accuracy is not optimal, as they cannot reach very low energies. SODock has the highest accuracy and robustness. In addition, SODock shows good efficiency when optimizing drug-like ligands with fewer than ten rotatable bonds. FIPSDock shows excellent robustness and is close to SODock in accuracy and efficiency. In general, the four PSO-based algorithms perform better than the two EA-based algorithms, especially for highly flexible ligands. Our evaluation approach can be regarded as a reference for validating new optimization algorithms in protein–ligand docking.
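For orientation, the sketch below shows the bare global-best particle swarm optimization loop that the PSO-based docking optimizers above elaborate on (each with its own variations such as local search or a varying swarm size). The inertia weight and acceleration constants are common textbook defaults, not values from the study, and objective stands in for the docking scoring function over the ligand's pose and torsion variables.

```python
import random

def pso_minimize(objective, bounds, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box-bounded search space.
    bounds: list of (low, high) per dimension; objective: lower is better."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(max(pos[k][d] + vel[k][d], bounds[d][0]),
                                bounds[d][1])
            val = objective(pos[k])
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = list(pos[k]), val
                if val < gbest_val:
                    gbest, gbest_val = list(pos[k]), val
    return gbest, gbest_val
```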

18.
We describe an efficient algorithm for determining exactly the minimum number of sires consistent with the multi-locus genotypes of a mother and her progeny. We consider cases where a simple exhaustive search through all possible sets of sires is impossible in practice because it would take too long to complete. Our algorithm for solving this combinatorial optimization problem avoids visiting large parts of search space that would not result in a solution with fewer sires. This improvement is of particular importance when the number of allelic types in the progeny array is large and when the minimum number of sires is expected to be large. Precisely in such cases, it is important to know the minimum number of sires: this number gives an exact bound on the most likely number of sires estimated by a random search algorithm in a parameter region where it may be difficult to determine whether it has converged. We apply our algorithm to data from the marine snail, Littorina saxatilis.  相似文献   

19.
Developing predictive models of multi‐protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi‐protein expression space across a > 10,000‐fold range with tailored search parameters and well‐predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram‐positive and gram‐negative bacterial hosts. We then combined the search algorithm with system‐level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence‐expression‐activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate‐limiting steps in metabolism. Creating sequence‐expression‐activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs.  相似文献   

20.
Protein structure prediction in the hydrophobic-polar (HP) lattice model is a simplified form of protein tertiary structure prediction, usually referred to as the protein folding problem. This paper presents a method that applies an enhanced hybrid search algorithm to protein folding prediction using the three-dimensional (3D) HP lattice model. The enhanced hybrid search algorithm combines the particle swarm optimizer (PSO) and tabu search (TS) algorithms. Because the PSO algorithm is easily trapped in local minima late in the search, we combined PSO with the TS algorithm, which has global optimization properties. Because crossover and mutation operators are applied repeatedly within the PSO and TS components, the enhanced hybrid search algorithm is called the MCMPSO-TS (multiple crossover and mutation PSO-TS) algorithm. Experimental results show that the MCMPSO-TS algorithm finds the best solutions reported so far for the listed benchmarks, which facilitates comparison with future approaches. Moreover, real protein sequences and Fibonacci sequences are evaluated in the 3D HP lattice model for the first time. Compared with previous evolutionary algorithms, the new hybrid search algorithm is effective for predicting 3D protein folding structures and should also contribute to the study of new protein sequences.
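The energy function being optimized in the 3D HP lattice model is simple to state: a conformation is a self-avoiding walk on the cubic lattice, and every pair of hydrophobic residues that are lattice neighbors but not sequence neighbors contributes -1. The sketch below evaluates that energy for a given walk; it is the standard model definition rather than any part of the MCMPSO-TS search itself.

```python
def hp_energy(sequence, coords):
    """sequence: string of 'H'/'P'; coords: list of integer (x, y, z) tuples,
    one per residue, assumed to form a self-avoiding walk."""
    if len(set(coords)) != len(coords):
        return None                    # overlapping residues: invalid conformation
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):      # skip sequence neighbors
            if sequence[i] == 'H' and sequence[j] == 'H':
                dx, dy, dz = (abs(a - b) for a, b in zip(coords[i], coords[j]))
                if dx + dy + dz == 1:              # cubic-lattice contact
                    energy -= 1
    return energy
```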

