共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents a novel linear programming approach to do protein 3-dimensional (3D) structure prediction via threading. Based on the contact map graph of the protein 3D structure template, the protein threading problem is formulated as a large scale integer programming (IP) problem. The IP formulation is then relaxed to a linear programming (LP) problem, and then solved by the canonical branch-and-bound method. The final solution is globally optimal with respect to energy functions. In particular, our energy function includes pairwise interaction preferences and allowing variable gaps which are two key factors in making the protein threading problem NP-hard. A surprising result is that, most of the time, the relaxed linear programs generate integral solutions directly. Our algorithm has been implemented as a software package RAPTOR-RApid Protein Threading by Operation Research technique. Large scale benchmark test for fold recognition shows that RAPTOR significantly outperforms other programs at the fold similarity level. The CAFASP3 evaluation, a blind and public test by the protein structure prediction community, ranks RAPTOR as top 1, among individual prediction servers, in terms of the recognition capability and alignment accuracy for Fold Recognition (FR) family targets. RAPTOR also performs very well in recognizing the hard Homology Modeling (HM) targets. RAPTOR was implemented at the University of Waterloo and it can be accessed at http://www.cs.uwaterloo.ca/~j3xu/RAPTOR_form.htm. 相似文献
2.
In the present paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensity of the protein short sequence derived from a training data set. The protein secondary structure was thus predicted in this way. The average three-state accuracy of the algorithm used was 76.70%. 相似文献
3.
Taylor WR 《Protein science : a publication of the Protein Society》1999,8(3):654-665
A protein structure comparison method is described that allows the generation of large populations of high-scoring alternate alignments. This was achieved by incorporating a random element into an iterative double dynamic programming algorithm. The maximum scores from repeated comparisons of a pair of structures converged on a value that was taken as the global maximum. This lay 15% over the score obtained from the single fixed (unrandomized) calculation. The effect of the gap penalty was observed through the shift of the alignment populations, characterized by their alignment length and root-mean-square deviation (RMSD). The best (lowest RMSD) values found in these populations provided a base-line against which other methods were compared. 相似文献
4.
Classification of newly determined protein structures is important in understanding their function and mechanism of action. Currently available methods employ a global structure alignment strategy and are computationally expensive. We propose a two-step methodology with a quick screen to significantly reduce the number of candidate structures followed by global structure alignment of the query structure with the reduced set. We represent a protein structure as a sequence of local structures, codified in the form of geometric invariants. Geometric invariants are quantities that remain unchanged under transformations such as translation and rotation. Protein structures represented as multi-attribute sequences are aligned via dynamic programming to identify close neighbors of the query structure. The query structure is then compared with this reduced dataset using conventional structure comparison methods to predict its functional class. For a typical protein structure, the screening method was able to reduce the protein data bank to mere 200 proteins while preserving structurally closest neighbor in the reduced set. This has resulted in 30 to 60 fold improvement in the execution time. We present the results of leave-one-out classification experiment on ASTRAL-95 domains and comparison with SCOP classification hierarchy. 相似文献
5.
Multiple sequence alignments are a routine tool in protein fold recognition, but multiple structure alignments are computationally less cooperative. This work describes a method for protein sequence threading and sequence-to-structure alignments that uses multiple aligned structures, the aim being to improve models from protein threading calculations. Sequences are aligned into a field due to corresponding sites in homologous proteins. On the basis of a test set of more than 570 protein pairs, the procedure does improve alignment quality, although no more than averaging over sequences. For the force field tested, the benefit of structure averaging is smaller than that of adding sequence similarity terms or a contribution from secondary structure predictions. Although there is a significant improvement in the quality of sequence-to-structure alignments, this does not directly translate to an immediate improvement in fold recognition capability. 相似文献
6.
7.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc. 相似文献
8.
Xu Y Xu D Crawford OH Einstein Larimer F Uberbacher E Unseren MA Zhang G 《Protein engineering》1999,12(11):899-907
We present an analysis of the protein fold recognition experiment using PROSPECT in The Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP3). PROSPECT is a computer program we have recently developed for finding an optimal alignment between a protein sequence and a protein structural fold. Two unique features of PROSPECT are (a) that it guarantees to find the globally optimal sequence-structure alignment and does so in an efficient manner, when the alignment-scoring function consists of three additive terms: (i) a singleton fitness term, (ii) a pairwise contact preference term between residues that are spatially close (=15 A between their beta-carbons) and (iii) an alignment gap penalty; and (b) that it guarantees to find the globally-optimal alignment under various constraints on the unknown protein specified by the user. In the CASP3 experiment, PROSPECT correctly identified the most similar folds for 11 targets and predicted closely-similar folds for five other targets among the 23 targets which can be classified into the category of fold-recognition problems and also had their experimentally-determined structures available. Among the 11 correctly identified folds, PROSPECT obtained good sequence-structure alignments for nine of them. On three of the five ab initio prediction problems, PROSPECT successfully located partial structures from our template library, which align accurately with the corresponding targets. 相似文献
9.
A model-free deconvolution method is proposed for evaluating the frequency distribution function of organ transit times. The deconvolution is treated as a nonlinear constrained optimization problem and it is solved by using a modified constrained variable metric approach. The only constraint implemented in the solution is that all the discrete transport function values are not allowed to become negative. The method is tested on model mathematical systems of known analytical transport functions. The tests are performed on systems that included noise in both the input and output functions. The criteria of successful deconvolution are the reconvolution error and, most importantly, the deviation of the computed transport function from the known analytical one. The proposed method is then applied, as a pilot experiment, to biological data obtained from an isolated, perfused rabbit lung preparation contained within a plethysmograph. The results indicate that this type of deconvolution produces stable estimates which faithfully follow the analytical function while negating the need to assume either any functional form for the behavior of the transport function or any educated initial guess of its values. 相似文献
10.
Protein threading using PROSPECT: design and evaluation 总被引:14,自引:0,他引:14
The computer system PROSPECT for the protein fold recognition using the threading method is described and evaluated in this article. For a given target protein sequence and a template structure, PROSPECT guarantees to find a globally optimal threading alignment between the two. The scoring function for a threading alignment employed in PROSPECT consists of four additive terms: i) a mutation term, ii) a singleton fitness term, iii) a pairwise-contact potential term, and iv) alignment gap penalties. The current version of PROSPECT considers pair contacts only between core (alpha-helix or beta-strand) residues and alignment gaps only in loop regions. PROSPECT finds a globally optimal threading efficiently when pairwise contacts are considered only between residues that are spatially close (7 A or less between the C(beta) atoms in the current implementation). On a test set consisting of 137 pairs of target-template proteins, each pair being from the same superfamily and having sequence identity = 30%, PROSPECT recognizes 69% of the templates correctly and aligns 66% of the structurally alignable residues correctly. These numbers may be compared with the 55% fold recognition and 64% alignment accuracy for the same test set using only scoring terms i), ii), and (iv), indicating the significant contribution from the contact term. The fold recognition and alignment accuracy are further improved to 72% and 74%, respectively, when the secondary structure information predicted by the PHD program is used in scoring. PROSPECT also allows a user to incorporate constraints about a target protein, e.g., disulfide bonds, active sites, and NOE distance restraints, into the threading process. The system rigorously finds a globally optimal threading under the specified constraints. Test results have shown that the constraints can further improve the performance of PROSPECT. Proteins 2000;40:343-354. 相似文献
11.
Linear programming optimization and a double statistical filter for protein threading protocols 总被引:4,自引:0,他引:4
The design of scoring functions (or potentials) for threading, differentiating native-like from non-native structures with a limited computational cost, is an active field of research. We revisit two widely used families of threading potentials: the pairwise and profile models. To design optimal scoring functions we use linear programming (LP). The LP protocol makes it possible to measure the difficulty of a particular training set in conjunction with a specific form of the scoring function. Gapless threading demonstrates that pair potentials have larger prediction capacity compared with profile energies. However, alignments with gaps are easier to compute with profile potentials. We therefore search and propose a new profile model with comparable prediction capacity to contact potentials. A protocol to determine optimal energy parameters for gaps, using LP, is also presented. A statistical test, based on a combination of local and global Z-scores, is employed to filter out false-positives. Extensive tests of the new protocol are presented. The new model provides an efficient alternative for threading with pair energies, maintaining comparable accuracy. The code, databases, and a prediction server are available at http://www.tc.cornell.edu/CBIO/loopp. 相似文献
12.
Luus R 《Biotechnology and bioengineering》1993,41(5):599-602
By using penalty functions to handle state constraints, iterative dynamic programming can be used in a straightforward manner for the optimization of fedbatch fermentors. No computational difficulties were encountered and better results are obtained than previously reported in the literature for a fed-batch fermentor for biosynthesis of penicillin. (c) 1993 Johy Wiley & Sons, Inc. 相似文献
13.
Ong SL 《Biotechnology and bioengineering》1986,28(6):818-823
This article concerns the development of a simple yet effective procedure for optimizing the design of a reactor system employing CSTRs in series. The basic approach used in this work was to translate the problem of reactor design to a mathematical programming model. The resulting model was then solved by dynamic programming. The procedure was tested on an IBM 3033 computer and an IBM PC-compatible machine, the CORONA PC-II microcomputer. The results of this study indicate that the optimization procedure developed is very effective. 相似文献
14.
Dukka BK Tomita E Suzuki J Horimoto K Akutsu T 《Journal of bioinformatics and computational biology》2006,4(1):19-42
With the advent of experimental technologies like chemical cross-linking, it has become possible to obtain distances between specific residues of a newly sequenced protein. These types of experiments usually are less time consuming than X-ray crystallography or NMR. Consequently, it is highly desired to develop a method that incorporates this distance information to improve the performance of protein threading methods. However, protein threading with profiles in which constraints on distances between residues are given is known to be NP-hard. By using the notion of a maximum edge-weight clique finding algorithm, we introduce a more efficient method called FTHREAD for profile threading with distance constraints that is 18 times faster than its predecessor CLIQUETHREAD. Moreover, we also present a novel practical algorithm NTHREAD for profile threading with Non-strict constraints. The overall performance of FTHREAD on a data set shows that although our algorithm uses a simple threading function, our algorithm performs equally well as some of the existing methods. Particularly, when there are some unsatisfied constraints, NTHREAD (Non-strict constraints threading algorithm) performs better than threading with FTHREAD (Strict constraints threading algorithm). We have also analyzed the effects of using a number of distance constraints. This algorithm helps the enhancement of alignment quality between the query sequence and template structure, once the corresponding template structure is determined for the target sequence. 相似文献
15.
A combinatorial optimization approach is used for solving the multiple-minima problem when determining the low-energy conformations of short polypeptides. Each residue is represented by a finite number of discrete states corresponding to single residue local minima of the energy function. These precomputed values constitute a search table and define the conformational space for discrete minimization by a generalized dynamic programming algorithm that significantly limits the number of intermediate conformations to be generated during the search. Since dynamic programming involves stagewise decisions, it results in buildup-type procedures implemented in two different forms. The first procedure predicts a number of conformations by a completely discrete search and these are subsequently refined by local minimization. The second involves limited continuous local minimization within the combinatorial algorithm, generally restricted to two dihedral angles in a buildup step. Both procedures are tested on 17 short peptides previously studied by other global minimization methods but involving the same potential energy function. The discrete method is extremely fast, but proves to be successful only in 14 of the 17 test problems. The version with limited local minimization finds, however, conformations in all the 17 examples that are close to the ones previously presented in the literature or have lower energies. In addition, results are almost independent of the cutoff energy, the most important parameter governing the search. Although the limited local minimization increases the number of energy evaluations, the method still offers substantial advantages in speed. 相似文献
16.
Summary An algorithm for the optimization of a fermentation process was studied using the combination of dynamic programming and linear predictive procedure by regression analysis. It was applied to the fed-batch culture for glutamic acid production with ethanol feeding, the results of which proved that it was effective for the optimization problems.Nomenclature
a
i
, b
i
, c
i
, d
i
;i = 1 2 or 3:
Partial regression coefficients derived by multiple regression analysis ( - )
-
E :
Euclid distance ( - )
-
F :
Volumetric feed rate (l/hr)
-
f :
Defined by Eq. (13) (g-glutamic acid)
-
G :
Concentration of glutamic acid (g/l)
-
G* :
Aeration rate (l/hr)
-
J :
Objective function defined by Eq. (12) (g-glutamic acid)
-
N :
Number of stage ( - )
-
P :
Defined by Eq. (14) (g-glutamic acid)
-
Q :
Metabolic activity of the culture, in this case Q = Q
CO
2 (mole CO2/g-cell hr)
-
S :
Ethanol concentration in culture broth (g/l)
-
S
Q
:
Ethanol concentration in feed (g/l)
-
S* :
Ethanol concentration in effluent gas (g/l)
-
t :
Culture time (hr)
-
t
o
:
Initial culture time (hr)
-
V :
Volume of culture broth (l)
-
X :
Cell concentration (g/l)
-
X :
State vector (X, S, Q, V)
- :
Rate of Q
CO
2 change (mole CO2/g-cell·h2)
- :
Specific growth rate of microorganisms (hr–1)
- :
Specific production rate of glutamic acid (g-glutamic acid/g-cell·hr)
- :
Specific consumption rate of ethanol (g-ethanol/g-cell·hr)
- :
Standard deviation ( - )
This optimization procedure was presented in preliminary form at the 45th Annual Meeting of the Soc. of Chem. Engrs., Japan, Osaka, C105 (1980). 相似文献
17.
18.
An optimal substrate feeding for an industrial scale fed-batch fermenter is determined through iterative dynamic programming in order to maximize the cell-mass production and to minimize the ethanol formation. An experimentally validated rigorous dynamic model comprises constraints in the optimization problem. A new objective function is proposed to accommodate the competing requirements of maximum yeast production and minimum ethanol formation. The objective function is maximized with iterative dynamic programming with respect to the sugar feed rate. Results prove the effectiveness of dynamic programming for solving such high-dimensional and nonlinear optimization problems, and the resulting optimal policy indicates that considerable increase in yeast production in fed-batch fermenters can be achieved while minimizing the undesired by-product, ethanol. 相似文献
19.
We developed a dynamic programming approach of computing common sequence structure patterns among two RNAs given their primary sequences and their secondary structures. Common patterns between two RNAs are defined to share the same local sequential and structural properties. The locality is based on the connections of nucleotides given by their phosphodiester and hydrogen bonds. The idea of interpreting secondary structures as chains of structure elements leads us to develop an efficient dynamic programming approach in time O(nm) and space O(nm), where n and m are the lengths of the RNAs. The biological motivation is given by detecting common, local regions of RNAs, although they do not necessarily share global sequential and structural properties. This might happen if RNAs fold into different structures but share a lot of local, stable regions. Here, we illustrate our algorithm on Hepatitis C virus internal ribosome entry sites. Our method is useful for detecting and describing local motifs as well. An implementation in C++ is available and can be obtained by contacting one of the authors. 相似文献
20.
A dynamic programming algorithm for finding alternative RNA secondary structures. 总被引:6,自引:8,他引:6 下载免费PDF全文
Dynamic programming algorithms that predict RNA secondary structure by minimizing the free energy have had one important limitation. They were able to predict only one optimal structure. Given the uncertainties of the thermodynamic data and the effects of proteins and other environmental factors on structure, the optimal structure predicted by these methods may not have biological significance. We present a dynamic programming algorithm that can determine optimal and suboptimal secondary structures for an RNA. The power and utility of the method is demonstrated in the folding of the intervening sequence of the rRNA of Tetrahymena. By first identifying the major secondary structures corresponding to the lowest free energy minima, a secondary structure of possible biological significance is derived. 相似文献