Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
In this paper, we describe how a directed graph was constructed and then searched for the optimum path using a dynamic programming approach, based on the secondary structure propensities of short protein sequences derived from a training data set. The optimum path gives the predicted secondary structure of the protein. The average three-state accuracy of the algorithm was 76.70%.
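
The optimum-path search described above can be illustrated with a minimal Viterbi-style sketch. This is not the authors' algorithm: the three states, the per-residue `node_scores`, and the transition `edge_scores` are hypothetical stand-ins for propensities that would be derived from a training set.

```python
from typing import Dict, List, Tuple

STATES = ("H", "E", "C")  # helix, strand, coil (assumed three-state alphabet)


def best_path(node_scores: List[Dict[str, float]],
              edge_scores: Dict[Tuple[str, str], float]) -> Tuple[float, str]:
    """Return (score, path) of the highest-scoring state path.

    node_scores[i][s] is a hypothetical propensity of residue i for state s;
    edge_scores[(a, b)] is a hypothetical score for moving from state a to b
    (missing pairs count as 0.0).
    """
    if not node_scores:
        return 0.0, ""
    # best[s]: best total score of any path ending in state s at the current residue
    best = {s: node_scores[0][s] for s in STATES}
    back: List[Dict[str, str]] = []
    for scores in node_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the score of arriving in s
            prev = max(STATES, key=lambda p: best[p] + edge_scores.get((p, s), 0.0))
            new_best[s] = best[prev] + edge_scores.get((prev, s), 0.0) + scores[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final state to recover the path
    last = max(STATES, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return best[last], "".join(reversed(path))
```

Because each residue only looks back one step, the search costs time linear in sequence length (times the square of the number of states), rather than enumerating every possible path.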

2.
A protein structure comparison method is described that allows the generation of large populations of high-scoring alternate alignments. This was achieved by incorporating a random element into an iterative double dynamic programming algorithm. The maximum scores from repeated comparisons of a pair of structures converged on a value that was taken as the global maximum. This lay 15% over the score obtained from the single fixed (unrandomized) calculation. The effect of the gap penalty was observed through the shift of the alignment populations, characterized by their alignment length and root-mean-square deviation (RMSD). The best (lowest RMSD) values found in these populations provided a baseline against which other methods were compared.

3.
We have carried out numerical experiments to investigate whether the global optimization method of conformational space annealing (CSA) can improve NMR protein structure determination relative to existing PDB structures. NMR protein structure determination is driven by the optimization of collective multiple restraints arising from experimental data and the basic stereochemical properties of a protein-like molecule. By rigorous and straightforward application of CSA to the identical NMR experimental data used to generate existing PDB structures, we redetermined 56 recent PDB protein structures starting from fully randomized structures. The quality of CSA-generated structures and existing PDB structures was assessed by multiobjective functions in terms of their consistency with experimental data and the requirements of protein-like stereochemistry. In 54 out of 56 cases, CSA-generated structures were better than existing PDB structures in a Pareto-dominant manner, while in the remaining two cases, it was a tie with mixed results. As a whole, all structural features tested improved in a statistically meaningful manner. The most improved feature was the Ramachandran favored portion of backbone torsion angles, with an improvement of about 8.6 percentage points from 88.9% to 97.5% (P-value < 10⁻¹⁷). We show that by straightforward application of CSA to the efficient global optimization of an energy function, NMR structures will be of better quality than existing PDB structures. Proteins 2015; 83:2251–2262. © 2015 Wiley Periodicals, Inc.

4.
5.
MOTIVATION: Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. RESULTS: The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. An average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. AVAILABILITY: A C/C++/Perl implementation is available from the authors upon request.

6.
Protein threading by recursive dynamic programming.   Cited by: 4 (self-citations: 0; citations by others: 4)
We present the recursive dynamic programming (RDP) method for the threading approach to three-dimensional protein structure prediction. RDP is based on the divide-and-conquer paradigm and maps the protein sequence whose backbone structure is to be found (the protein target) onto the known backbone structure of a model protein (the protein template) in a stepwise fashion, a technique that is similar to computing local alignments but utilising different cost functions. We begin by mapping parts of the target onto the template that show statistically significant similarity with the template sequence. After mapping, the template structure is modified in order to account for the mapped target residues. Then significant similarities between the as yet unmapped parts of the target and the modified template are searched, and the resulting segments of the target are mapped onto the template. This recursive process of identifying segments in the target to be mapped onto the template and modifying the template is continued until no significant similarities between the remaining parts of target and template are found. Those parts which are left unmapped by the procedure are interpreted as gaps. The RDP method is robust in the sense that different local alignment methods can be used, several alternatives of mapping parts of the target onto the template can be handled and compared in the process, and the cost functions can be dynamically adapted to biological needs. Our computer experiments show that the RDP procedure is efficient and effective. We can thread a typical protein sequence against a database of 887 template domains in about 12 hours even on a low-cost workstation (SUN Ultra 5). In statistical evaluations on databases of known protein structures, RDP significantly outperforms competing methods. RDP has been especially valuable in providing accurate alignments for modeling active sites of proteins. RDP is part of the ToPLign system (GMD Toolbox for protein alignment) and can be accessed via the WWW independently or in concert with other ToPLign tools at http://cartan.gmd.de/ToPLign.html.

7.
A measure of protein structure similarity is calculated from the matching of pairs of secondary structure elements between two proteins. The interaction of each pair was estimated from their axial line segments and combined with other geometric features to produce an optimal discrimination between intrafamily and interfamily relationships. The matching used a fast bipartite graph-matching algorithm that avoids the computational complexity of searching for the full subgraph isomorphism between the two sets of interactions. The main algorithm used was the "stable marriage" algorithm, which works on the ranked "preferences" of one interaction for another. The method takes 1/10 of a second for a typical comparison, making it suitable as a fast pre-filter for slower, more exhaustive approaches. An application to protein structure classification is described.
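
For readers unfamiliar with the "stable marriage" algorithm mentioned above, here is a minimal sketch of the classic Gale-Shapley procedure. It is a generic textbook version, not the authors' code: the string labels, the equal-sized sets, and the complete preference lists are assumptions made for illustration. In the setting of the abstract, the ranked preferences would come from the similarity of one interaction to another.

```python
from typing import Dict, List


def stable_matching(prefs_a: Dict[str, List[str]],
                    prefs_b: Dict[str, List[str]]) -> Dict[str, str]:
    """Gale-Shapley: members of A propose to members of B in preference order.

    Assumes both sides have the same size and rank every member of the
    other side. Returns a mapping from each A member to its B partner.
    """
    # rank_b[b][a]: position of a in b's preference list (lower is better)
    rank_b = {b: {a: i for i, a in enumerate(p)} for b, p in prefs_b.items()}
    next_choice = {a: 0 for a in prefs_a}  # index of the next b each a will try
    engaged_to: Dict[str, str] = {}        # b -> a
    free = list(prefs_a)
    while free:
        a = free.pop()
        b = prefs_a[a][next_choice[a]]
        next_choice[a] += 1
        current = engaged_to.get(b)
        if current is None:
            engaged_to[b] = a              # b was unmatched: accept
        elif rank_b[b][a] < rank_b[b][current]:
            engaged_to[b] = a              # b prefers the new proposer
            free.append(current)
        else:
            free.append(a)                 # b rejects a; a tries its next choice
    return {a: b for b, a in engaged_to.items()}
```

Each member of A proposes to each member of B at most once, so the loop runs at most n² times for sets of size n, which is consistent with the method's use as a fast pre-filter.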

8.
Protein docking using continuum electrostatics and geometric fit   Cited by: 9 (self-citations: 0; citations by others: 9)
The computer program DOT quickly finds low-energy docked structures for two proteins by performing a systematic search over six degrees of freedom. A novel feature of DOT is its energy function, which is the sum of both a Poisson-Boltzmann electrostatic energy and a van der Waals energy, each represented as a grid-based correlation function. DOT evaluates the energy of interaction for many orientations of the moving molecule and maintains separate lists scored by either the electrostatic energy, the van der Waals energy or the composite sum of both. The free energy is obtained by summing the Boltzmann factor over all rotations at each grid point. Three important findings are presented. First, for a wide variety of protein-protein interactions, the composite-energy function is shown to produce larger clusters of correct answers than found by scoring with either van der Waals energy (geometric fit) or electrostatic energy alone. Second, free-energy clusters are demonstrated to be indicators of binding sites. Third, the contributions of electrostatic and attractive van der Waals energies to the total energy term appropriately reflect the nature of the various types of protein-protein interactions studied.
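
The Boltzmann sum mentioned above can be sketched with the standard relation F = -kT ln Σ exp(-E_r / kT), taken over the rotations r at one grid point. This is a generic illustration, not DOT's code: the abstract does not give DOT's exact normalization, and the function name and the kT value (roughly room temperature, in kcal/mol) are assumptions.

```python
import math
from typing import Sequence

KT_KCAL_PER_MOL = 0.593  # approximate kT near 298 K; assumed for illustration


def free_energy(energies: Sequence[float], kt: float = KT_KCAL_PER_MOL) -> float:
    """Return F = -kT * ln(sum_r exp(-E_r / kT)) for one grid point.

    `energies` holds the interaction energy of each rotation at that point.
    """
    if not energies:
        raise ValueError("need at least one rotation energy")
    e_min = min(energies)
    # Shift by the minimum energy so that no exponential can overflow
    total = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(total)
```

The result is never higher than the lowest single-rotation energy, and it decreases as more rotations with comparably low energy are found at the same grid point.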

9.
10.
Yang JM, Tung CH. Nucleic Acids Research 2006, 34(13): 3646-3659
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

11.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-letter code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-letter code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
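
The linear combination of amino acid and secondary structure similarities described above might look like the following sketch. The function name, the dictionary-based similarity tables, and the single mixing `weight` are assumptions for illustration; the abstract does not state the coefficients STRALIGN uses.

```python
from typing import Dict, Tuple


def pair_score(code_a: str, code_b: str,
               aa_sim: Dict[Tuple[str, str], float],
               ss_sim: Dict[Tuple[str, str], float],
               weight: float = 0.5) -> float:
    """Score two two-letter codes such as 'Ah' and 'Gh'.

    The first letter is the amino acid, the second its annotated secondary
    structure. `weight` (hypothetical) sets how much the amino acid
    similarity counts relative to the secondary structure similarity.
    """
    aa_a, ss_a = code_a  # unpack the two characters of each code
    aa_b, ss_b = code_b
    return (weight * aa_sim[(aa_a, aa_b)]
            + (1.0 - weight) * ss_sim[(ss_a, ss_b)])
```

A score of this form can replace the plain amino acid substitution score inside any standard dynamic programming alignment, which is why the secondary structure annotation matters most when primary sequence similarity is low.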

12.

Background  

Dynamic programming is a widely used programming technique in bioinformatics. In sharp contrast to the simplicity of textbook examples, implementing a dynamic programming algorithm for a novel and non-trivial application is a tedious and error-prone task. The algebraic dynamic programming approach seeks to alleviate this situation by clearly separating the dynamic programming recurrences and scoring schemes.

13.

Background

A protein structure can be determined by solving a so-called distance geometry problem whenever a set of inter-atomic distances is available and sufficient. However, the problem is intractable in general and has been proven to be NP-hard. An updated geometric build-up algorithm (UGB) has been developed recently that controls numerical errors and is efficient in protein structure determination for cases where only sparse exact distance data is available. In this paper, the UGB method has been improved and revised with the aim of solving distance geometry problems more efficiently and effectively.

Methods

An efficient algorithm (called the revised updated geometric build-up algorithm (RUGB)) to build up a protein structure from atomic distance data is presented and provides an effective way of determining a protein structure with sparse exact distance data. In the algorithm, the condition to determine an unpositioned atom iteratively is relaxed (when compared with the UGB algorithm) and data structure techniques are used to make the algorithm more efficient and effective. The algorithm is tested on a set of proteins selected randomly from the Protein Structure Database-PDB.

Results

We test a set of proteins selected randomly from the Protein Structure Database-PDB. We show that the numerical errors produced by the new RUGB algorithm are smaller when compared with the errors of the UGB algorithm and that the novel RUGB algorithm has a significantly smaller runtime than the UGB algorithm.

Conclusions

The RUGB algorithm relaxes the condition for updating and incorporates a data structure for accessing the neighbours of an atom. The revisions result in an improvement over the UGB algorithm in two important areas: a reduction in the overall runtime and a decrease in the numerical error.

14.
Recent advances in next-generation sequencing technologies have resulted in an exponential increase in the rate at which protein sequence data are being acquired. The k-gram feature representation, commonly used for protein sequence classification, usually results in prohibitively high dimensional input spaces, for large values of k. Applying data mining algorithms to these input spaces may be intractable due to the large number of dimensions. Hence, using dimensionality reduction techniques can be crucial for the performance and the complexity of the learning algorithms. In this paper, we study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is "reduced" by hashing the features into a low-dimensional space, using a hash function, i.e., by mapping features into hash keys, where multiple features can be mapped (at random) to the same hash key, and "aggregating" their counts. We compare feature hashing with the "bag of k-grams" approach. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
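
The feature hashing idea described above can be sketched in a few lines. This is an illustrative version under stated assumptions, not the authors' implementation: the function name, the choice of SHA-256 as the hash function, and the bucket count are all hypothetical.

```python
import hashlib
from typing import List


def hashed_kgram_counts(sequence: str, k: int, n_buckets: int) -> List[int]:
    """Map every k-gram of `sequence` to one of `n_buckets` hash keys.

    k-grams that collide on the same key have their counts aggregated,
    so the vector length is n_buckets regardless of k.
    """
    counts = [0] * n_buckets
    for i in range(len(sequence) - k + 1):
        kgram = sequence[i:i + k]
        # A fixed cryptographic hash keeps the mapping stable across runs
        digest = hashlib.sha256(kgram.encode("ascii")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        counts[bucket] += 1
    return counts
```

With 20 amino acids, a plain bag of k-grams needs up to 20^k dimensions, whereas the hashed vector stays at `n_buckets` entries at the cost of occasional collisions.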

15.
Smoothing and differentiation of noisy data using spline functions requires the selection of an unknown smoothing parameter. The method of generalized cross-validation provides an excellent estimate of the smoothing parameter from the data itself even when the amount of noise associated with the data is unknown. In the present model only a single smoothing parameter must be obtained, but in a more general context the number may be larger. In an earlier work, smoothing of the data was accomplished by solving a minimization problem using the technique of dynamic programming. This paper shows how the computations required by generalized cross-validation can be performed as a simple extension of the dynamic programming formulas. The results of numerical experiments are also included.

16.
Kolodny R, Levitt M. Biopolymers 2003, 68(3): 278-285
A small set of protein fragments can represent adequately all known local protein structure. This set of fragments, along with a construction scheme that assembles these fragments into structures, defines a discrete (relatively small) conformation space, which approximates protein structures accurately. We generate protein decoys by sampling geometrically valid structures from this conformation space, biased by the secondary structure prediction for the protein. Unlike other methods, secondary structure prediction is the only protein-specific information used for generating the decoys. Nevertheless, these decoys are qualitatively similar to those found by others. The method works well for all-alpha proteins, and shows promising results for alpha and beta proteins.

17.
Sadeghi M, Parto S, Arab S, Ranjbar B. FEBS Letters 2005, 579(16): 3397-3400
We have used a statistical approach for protein secondary structure prediction based on information theory, simultaneously taking into consideration pairwise residue types and conformational states. Since predicting residue secondary structure with a sliding one-residue window leads to ambiguity in state prediction, we used a dynamic programming algorithm to find the path with maximum score. A scoring system for residue pairs in particular conformations is derived for adjacent neighbors up to ten residues apart in sequence. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test with a dataset created from PDBSELECT is more than 70%.

18.
19.
Prediction of the exon-intron structure by a dynamic programming approach   Cited by: 4 (self-citations: 0; citations by others: 4)

20.
We present a method for classifying proteins into families based on short subsequences of amino acids using a new probabilistic model called sparse Markov transducers (SMT). We classify a protein by estimating probability distributions over subsequences of amino acids from the protein. Sparse Markov transducers, similar to probabilistic suffix trees, estimate a probability distribution conditioned on an input sequence. SMTs generalize probabilistic suffix trees by allowing for wild-cards in the conditioning sequences. Since substitutions of amino acids are common in protein families, incorporating wild-cards into the model significantly improves classification performance. We present two models for building protein family classifiers using SMTs. As protein databases become larger, data driven learning algorithms for probabilistic models such as SMTs will require vast amounts of memory. We therefore describe and use efficient data structures to improve the memory usage of SMTs. We evaluate SMTs by building protein family classifiers using the Pfam and SCOP databases and compare our results to previously published results and state-of-the-art protein homology detection methods. SMTs outperform previous probabilistic suffix tree methods and under certain conditions perform comparably to state-of-the-art protein homology methods.
