Similar literature
 20 similar articles found (search time: 78 ms)
1.
D J Ayers  T Huber  A E Torda 《Proteins》1999,36(4):454-461
We describe two ways of optimizing score functions for protein sequence to structure threading. The first method adjusts parameters to improve sequence to structure alignment. The second adjusts parameters so as to improve a score function's ability to rank alignments calculated in the first score function. Unlike those functions known as knowledge-based force fields, the resulting parameter sets do not rely on Boltzmann statistics, have no claim to representing free energies and are purely constructions for recognizing protein folds. The methods give a small improvement, but suggest that functions can be profitably optimized for very specific aspects of protein fold recognition. Proteins 1999;36:454-461.

2.
We present a general method for assessing threading score significance. The threading score of a protein sequence threaded onto a given structure should be compared with the threading score distribution of random amino-acid sequences of the same length threaded onto the same structure; small p-values indicate significantly high scores. We claim that, due to general protein contact map properties, this reference distribution is a Weibull extreme value distribution whose parameters depend on the threading method, the structure, the length of the query and the random sequence simulation model used. These parameters can be estimated off-line with simulated sequence samples, for different sequence lengths. They can further be interpolated at the exact length of a query, enabling the quick computation of the p-value.
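The p-value lookup described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the Weibull parametrization, the tail convention, or the interpolation scheme, so the three-parameter form, the upper-tail p-value (higher score taken as better), the linear interpolation, and the `PARAMS` values are all hypothetical.

```python
import math
from bisect import bisect_left


def weibull_sf(score, shape, scale, loc=0.0):
    """Upper-tail probability P(X >= score) of a three-parameter Weibull."""
    if score <= loc:
        return 1.0
    return math.exp(-(((score - loc) / scale) ** shape))


def interpolate_params(length, table):
    """Linearly interpolate (shape, scale, loc) between tabulated lengths.

    `table` is a list of (length, (shape, scale, loc)) sorted by length;
    lengths outside the table are clamped to the nearest entry.
    """
    lengths = [row[0] for row in table]
    if length <= lengths[0]:
        return table[0][1]
    if length >= lengths[-1]:
        return table[-1][1]
    hi = bisect_left(lengths, length)
    (l0, p0), (l1, p1) = table[hi - 1], table[hi]
    t = (length - l0) / (l1 - l0)
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))


def threading_p_value(score, query_length, table):
    """p-value of a threading score against the random-sequence reference."""
    shape, scale, loc = interpolate_params(query_length, table)
    return weibull_sf(score, shape, scale, loc)


# Hypothetical parameters, as if fitted off-line for one structure and method.
PARAMS = [(100, (2.0, 10.0, 0.0)), (200, (2.0, 20.0, 0.0))]
p = threading_p_value(15.0, 150, PARAMS)
```

Because the fitted parameters are tabulated ahead of time, the per-query cost is one interpolation and one exponential, which is consistent with the "quick computation" the abstract mentions.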

3.
4.
Russell AJ  Torda AE 《Proteins》2002,47(4):496-505
Multiple sequence alignments are a routine tool in protein fold recognition, but multiple structure alignments are computationally less cooperative. This work describes a method for protein sequence threading and sequence-to-structure alignments that uses multiple aligned structures, the aim being to improve models from protein threading calculations. Sequences are aligned into a field due to corresponding sites in homologous proteins. On the basis of a test set of more than 570 protein pairs, the procedure does improve alignment quality, although no more than averaging over sequences. For the force field tested, the benefit of structure averaging is smaller than that of adding sequence similarity terms or a contribution from secondary structure predictions. Although there is a significant improvement in the quality of sequence-to-structure alignments, this does not directly translate to an immediate improvement in fold recognition capability.

5.
Template-based modeling is considered one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of including residue-residue contact information on the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help to improve the accuracy of protein structure prediction.

6.
7.
To facilitate investigation of the molecular and biochemical functions of the adenovirus E4 Orf6 protein, we sought to derive three-dimensional structural information using computational methods, particularly threading and comparative protein modeling. The amino acid sequence of the protein was used for secondary structure and hidden Markov model (HMM) analyses, and for fold recognition by the ProCeryon program. Six alternative models were generated from the top-scoring folds identified by threading. These models were examined by 3D-1D analysis and evaluated in the light of available experimental evidence. The final model of the E4 protein derived from these and additional threading calculations was a chimera, with the tertiary structure of its C-terminal 226 residues derived from a TIM barrel template and a mainly alpha-nonbundle topology for its poorly conserved N-terminal 68 residues. To assess the accuracy of this model, additional threading calculations were performed with E4 Orf6 sequences altered as in previous experimental studies. The proposed structural model is consistent with the reported secondary structure of a functionally important C-terminal sequence and can account for the properties of proteins carrying alterations in functionally important sequences or of those that disrupt an unusual zinc-coordination motif.

8.
A new method for the homology-based modeling of protein three-dimensional structures is proposed and evaluated. The alignment of a query sequence to a structural template produced by threading algorithms usually produces low-resolution molecular models. The proposed method attempts to improve these models. In the first stage, a high-coordination lattice approximation of the query protein fold is built by suitable tracking of the incomplete alignment of the structural template and connection of the alignment gaps. These initial lattice folds are very similar to the structures resulting from standard molecular modeling protocols. Then, a Monte Carlo simulated annealing procedure is used to refine the initial structure. The process is controlled by the model's internal force field and a set of loosely defined restraints that keep the lattice chain in the vicinity of the template conformation. The internal force field consists of several knowledge-based statistical potentials that are enhanced by a proper analysis of multiple sequence alignments. The template restraints are implemented such that the model chain can slide along the template structure or even ignore a substantial fraction of the initial alignment. The resulting lattice models are, in most cases, closer (sometimes much closer) to the target structure than the initial threading-based models. All-atom models could easily be built from the lattice chains. The method is illustrated on 12 examples of target/template pairs whose initial threading alignments are of varying quality. Possible applications of the proposed method for use in protein function annotation are briefly discussed.

9.
The threading approach to protein structure prediction suffers from the limited number of substantially different folds available as templates. A method is presented for the generation of artificial protein structures, amenable to threading, by modification of native ones. The artificial structures so generated are compared to the native ones and it is shown that, within the accuracy of the pseudoenergy function or force field used, these two types of structures appear equally useful for threading. Since a multitude of pseudonative artificial structures can be generated per native structure, the pool of pseudonative template structures for threading can be enormously enlarged by the inclusion of the pseudonative artificial structures. Proteins 28:522–529, 1997. © 1997 Wiley-Liss, Inc.

10.
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.

11.
Protein threading using PROSPECT: design and evaluation   Cited by: 14 (self-citations: 0, citations by others: 14)
Xu Y  Xu D 《Proteins》2000,40(3):343-354
The computer system PROSPECT for the protein fold recognition using the threading method is described and evaluated in this article. For a given target protein sequence and a template structure, PROSPECT guarantees to find a globally optimal threading alignment between the two. The scoring function for a threading alignment employed in PROSPECT consists of four additive terms: i) a mutation term, ii) a singleton fitness term, iii) a pairwise-contact potential term, and iv) alignment gap penalties. The current version of PROSPECT considers pair contacts only between core (alpha-helix or beta-strand) residues and alignment gaps only in loop regions. PROSPECT finds a globally optimal threading efficiently when pairwise contacts are considered only between residues that are spatially close (7 Å or less between the Cβ atoms in the current implementation). On a test set consisting of 137 pairs of target-template proteins, each pair being from the same superfamily and having sequence identity
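The four-term score described in this abstract could be evaluated for one fixed alignment roughly as below. This is a sketch under assumptions, not PROSPECT's code: the function names, the dictionary-based energy tables, the linear gap cost, and the toy values are hypothetical, and the search for the globally optimal alignment is not reproduced. Only the 7 Å Cβ cutoff and the restriction of pair contacts to core residues come from the abstract.

```python
import math


def core_contacts(cb_coords, is_core, cutoff=7.0):
    """Template position pairs (i, j), both core, with C-beta distance
    at most `cutoff` angstroms."""
    n = len(cb_coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if is_core[i] and is_core[j]
        and math.dist(cb_coords[i], cb_coords[j]) <= cutoff
    ]


def threading_score(alignment, seq, tmpl_res, tmpl_env, contacts,
                    mutation, singleton, pairwise, n_gap_positions,
                    gap_cost=1.0):
    """Sum of the four additive terms for one fixed alignment.

    `alignment` maps template position -> sequence index.
    """
    # i) mutation term: query residue versus template residue
    e_mut = sum(mutation[(seq[s], tmpl_res[t])] for t, s in alignment.items())
    # ii) singleton fitness: query residue versus template environment
    e_single = sum(singleton[(seq[s], tmpl_env[t])] for t, s in alignment.items())
    # iii) pairwise contact potential over aligned core contacts
    e_pair = sum(
        pairwise[tuple(sorted((seq[alignment[i]], seq[alignment[j]])))]
        for i, j in contacts
        if i in alignment and j in alignment
    )
    # iv) gap penalty (linear in the number of gap positions: an assumption)
    e_gap = gap_cost * n_gap_positions
    return e_mut + e_single + e_pair + e_gap


# Toy data, purely illustrative.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
contacts = core_contacts(coords, [True, True, True])
total = threading_score(
    alignment={0: 0, 1: 1}, seq="AG", tmpl_res="AAA", tmpl_env="HHH",
    contacts=contacts,
    mutation={("A", "A"): -1.0, ("G", "A"): 0.5},
    singleton={("A", "H"): -0.5, ("G", "H"): 1.0},
    pairwise={("A", "G"): -2.0},
    n_gap_positions=1, gap_cost=3.0,
)
```

Limiting the pair term to a short distance cutoff keeps the contact list sparse, which is the condition under which the abstract says the optimal threading can be found efficiently.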

12.
MOTIVATION: We propose a general method for deriving amino acid substitution matrices from low resolution force fields. Unlike current popular methods, the approach does not rely on evolutionary arguments or alignment of sequences or structures. Instead, residues are computationally mutated and their contribution to the total energy/score is collected. The average of these values over each position within a set of proteins results in a substitution matrix. RESULTS: Example substitution matrices have been calculated from force fields based on different philosophies and their performance compared with conventional substitution matrices. Although this can produce useful substitution matrices, the methodology highlights the virtues, deficiencies and biases of the source force fields. It also allows a rather direct comparison of sequence alignment methods with the score functions underlying protein sequence to structure threading. AVAILABILITY: Example substitution matrices are available from http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html. SUPPLEMENTARY INFORMATION: The list of proteins used for data collection and the optimized parameters for the alignment are given as supplementary material at http://www.rsc.anu.edu.au/~zsuzsa/suppl/matrices.html.
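The mutate-and-average procedure in this abstract might look like the following minimal sketch. The `site_energy` callable stands in for whatever force field is used and is hypothetical, as is representing a protein by its sequence string alone; a real force field would also need the structure.

```python
from collections import defaultdict


def substitution_matrix(proteins, site_energy, alphabet):
    """Average energy contribution of placing `mutant` at sites whose native
    residue is `native`, collected over every position of every protein.

    Returns a dict keyed by (native, mutant).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for protein in proteins:
        for pos, native in enumerate(protein):
            for mutant in alphabet:
                # Computationally mutate one site and record its contribution.
                sums[(native, mutant)] += site_energy(protein, pos, mutant)
                counts[(native, mutant)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def toy_site_energy(protein, pos, residue):
    """Hypothetical stand-in force field: the native residue costs nothing,
    any other residue costs one unit."""
    return 0.0 if protein[pos] == residue else 1.0


matrix = substitution_matrix(["AGA", "GG"], toy_site_energy, "AG")
```

With a real force field the resulting matrix need not be symmetric, since the entry for (native, mutant) is averaged over different sites than the entry for (mutant, native).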

13.
Learning MHC I--peptide binding   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION AND RESULTS: Motivated by the ability of a simple threading approach to predict MHC I--peptide binding, we developed a new and improved structure-based model for which parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond just their sequence, including prediction for alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data is available also separates our approach from epitope prediction methods which treat MHC alleles as symbolic types, rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 sequences in HIV clade B from Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system. AVAILABILITY: http://www.research.microsoft.com/~jojic/hlaBinding.html.

14.
The open reading frames of human cytomegalovirus (human herpesvirus-5, HHV5) encode some 213 unique proteins with mostly unknown functions. Using the threading program, ProCeryon, we calculated possible matches between the amino acid sequences of these proteins and the Protein Data Bank library of three-dimensional structures. Thirty-six proteins were fully identified in terms of their structure and, often, function; 65 proteins were recognized as members of narrow structural/functional families (e.g. DNA-binding factors, cytokines, enzymes, signaling particles, cell surface receptors etc.); and 87 proteins were assigned to broad structural classes (e.g. all-beta, 3-layer αβα, multidomain, etc.). Genes encoding proteins with similar folds, or containing identical structural traits (extreme sequence length, runs of unstructured (Pro and/or Gly-rich) residues, transmembrane segments, etc.) often formed tandem clusters throughout the genome. In the course of this work, benchmarks on about 20 known folds were used to optimize adjustable parameters of threading calculations, i.e. gap penalty weights used in sequence/structure alignments; new scores obtained as simple combinations of existing scoring functions; and number of threading runs conducive to meaningful results. An introduction of summed, per-residue-normalized scores has been essential for discovery of subdomains (EGF-like, SH2, SH3) in longer protein sequences, such as the eight "open sandwich" cytokine domains, 60-70 amino acids long and having the 3β1α fold with one or two disulfide bridges, present in otherwise unrelated proteins.

15.
We present a CPU-efficient protocol for refinement of protein structures in a thin layer of explicit solvent and energy parameters with completely revised dihedral angle terms. Our approach is suitable for protein structures determined by theoretical (e.g., homology modeling or threading) or experimental methods (e.g., NMR). In contrast to other recently proposed refinement protocols, we put a strong emphasis on consistency with widely accepted covalent parameters and computational efficiency. We illustrate the method for NMR structure calculations of three proteins: interleukin-4, ubiquitin, and crambin. We show a comparison of their structure ensembles before and after refinement in water with and without a force field energy term for the dihedral angles; crambin was also refined in DMSO. Our results demonstrate the significant improvement of structure quality by a short refinement in a thin layer of solvent. Further, they show that a dihedral angle energy term in the force field is beneficial for structure calculation and refinement. We discuss the optimal weight for the energy constant for the backbone angle omega and include an extensive discussion of meaning and relevance of the calculated validation criteria, in particular root mean square Z scores for covalent parameters such as bond lengths.

16.
Yang YD  Park C  Kihara D 《Proteins》2008,73(3):581-596
Optimizing weighting factors for a linear combination of terms in a scoring function is a crucial step for success in developing a threading algorithm. Usually weighting factors are optimized to yield the highest success rate on a training dataset, and the determined constant values for the weighting factors are used for any target sequence. Here we explore completely different approaches to handle weighting factors for a scoring function of threading. Throughout this study we use a model system of gapless threading using a scoring function with two terms combined by a weighting factor, a main chain angle potential and a residue contact potential. First, we demonstrate that the optimal weighting factor for recognizing the native structure differs from target sequence to target sequence. Then, we present three novel threading methods which circumvent training dataset-based weighting factor optimization. The basic idea of the three methods is to employ different weighting factor values and finally select a template structure for a target sequence by examining characteristics of the distribution of scores computed by using the different weighting factor values. Interestingly, the success rate of our approaches is comparable to the conventional threading method where the weighting factor is optimized based on a training dataset. Moreover, when the size of the training set available for the conventional threading method is small, our approach often performs better. In addition, we predict a target-specific weighting factor optimal for a target sequence by an artificial neural network from features of the target sequence. Finally, we show that our novel methods can be used to assess the confidence of prediction of a conventional threading with an optimized constant weighting factor by considering consensus prediction between them. Implications for the underlying energy landscape of protein folding are discussed.
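The general idea of scanning several weighting factors instead of fixing one from a training set can be illustrated as below. The selection rule used here (keep the template whose best score is the strongest outlier, by Z-score, over all weights tried) is an illustrative stand-in chosen for this sketch; the abstract does not spell out its three methods, and the energy values are made up.

```python
import statistics


def combined_score(angle_energy, contact_energy, weight):
    """Two-term score: main chain angle potential plus weighted contact
    potential (lower is better)."""
    return angle_energy + weight * contact_energy


def select_template(terms, weights):
    """Scan weighting factors and return (z, template, weight) for the
    template whose best score is the strongest outlier among all templates.

    `terms` maps template name -> (angle_energy, contact_energy).
    """
    best = None
    for w in weights:
        scores = {t: combined_score(a, c, w) for t, (a, c) in terms.items()}
        mean = statistics.mean(scores.values())
        sd = statistics.pstdev(scores.values())
        if sd == 0.0:
            continue  # all templates tie at this weight; nothing to rank
        top = min(scores, key=scores.get)
        z = (scores[top] - mean) / sd
        if best is None or z < best[0]:
            best = (z, top, w)
    return best


# Hypothetical per-template energies for one target sequence.
TERMS = {
    "t1": (-5.0, -1.0),
    "t2": (-1.0, -1.2),
    "t3": (-0.8, -0.9),
    "t4": (-1.1, -1.0),
}
z, template, weight = select_template(TERMS, [0.0, 0.5, 1.0])
```

A rule of this kind needs no training set, which matches the abstract's point that such approaches can help when little training data is available.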

17.
Peng J  Xu J 《Proteins》2011,79(6):1930-1939
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without a solved structure is very likely to have more than one similar template structure. Therefore, a natural question to ask is whether we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models with better quality than single-template models even if they are built from the best single templates (P-value < 10^-6), while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In other words, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structure, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.

18.
MOTIVATION: This paper investigates the sequence-structure specificity of a representative knowledge-based energy function by applying it to threading at the level of secondary structures of proteins. Assessing the strengths and weaknesses of an energy function at this fundamental level provides more detailed and insightful information than at the tertiary structure level and the results obtained can be useful in tertiary level threading. RESULTS: We threaded each of the 293 non-redundant proteins onto the secondary structures contained in its respective native protein (host template). We also used 68 pairs of proteins with similar folds and low sequence identity. For each pair, we threaded the sequence of one protein onto the secondary structures of the other protein. The discerning power of the total energy function and its one-body, pairwise, and mutation components is studied. We then applied our energy function to a recent study which demonstrated how a designed 11-amino acid sequence can replace distinct segments (one segment is an alpha-helix, the other is a beta-sheet) of a protein without changing its fold. We conducted random mutations of the designed sequence to determine the patterns for favorable mutations. We also studied the sequence-structure specificity at the boundaries of a secondary structure. Finally, we demonstrated how to speed up tertiary level threading by filtering out alignments found to be energetically unfavorable during the secondary structure threading. AVAILABILITY: The program is available on request from the authors. CONTACT: xud@ornl.gov

19.
With the advent of experimental technologies like chemical cross-linking, it has become possible to obtain distances between specific residues of a newly sequenced protein. These types of experiments are usually less time consuming than X-ray crystallography or NMR. Consequently, it is highly desirable to develop a method that incorporates this distance information to improve the performance of protein threading methods. However, protein threading with profiles in which constraints on distances between residues are given is known to be NP-hard. By using the notion of a maximum edge-weight clique finding algorithm, we introduce a more efficient method called FTHREAD for profile threading with distance constraints that is 18 times faster than its predecessor CLIQUETHREAD. Moreover, we also present a novel practical algorithm NTHREAD for profile threading with non-strict constraints. The overall performance of FTHREAD on a data set shows that although our algorithm uses a simple threading function, it performs as well as some of the existing methods. In particular, when there are some unsatisfied constraints, NTHREAD (non-strict constraints threading algorithm) performs better than threading with FTHREAD (strict constraints threading algorithm). We have also analyzed the effects of using a number of distance constraints. This algorithm helps enhance the alignment quality between the query sequence and template structure, once the corresponding template structure is determined for the target sequence.

20.
Protein structure prediction is limited by the inaccuracy of the simplified energy functions necessary for efficient sorting over many conformations. It was recently suggested (Finkelstein, Phys Rev Lett 1998;80:4823-4825) that these errors can be reduced by energy averaging over a set of homologous sequences. This conclusion is confirmed in this study by testing protein structure recognition in gapless threading. The accuracy of recognition was estimated by the Z-score values obtained in gapless threading tests. For threading, we used 20 target proteins, each having from 20 to 70 homologs taken from the HSSP sequence base. The energy of the native structures was compared with the energy of 34 to 75 thousand alternative structures generated by threading. The energy calculations were done with our recently developed Cα atom-based phenomenological potentials. We show that averaging of protein energies over homologs reduces the Z-score from approximately -6.1 (average Z-score for individual chains) to approximately -8.1. This means that a correct fold can be found among 3 × 10^9 random folds in the first case and among 3 × 10^15 in the second. Such an increase in selectivity is important for recognition of protein folds.
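The quantities in this abstract can be made concrete with a small sketch. The Z-score definition against the decoy distribution and the column-wise averaging over homologs follow the description above; converting a Z-score into a number of distinguishable random folds through a Gaussian tail is an assumption of this sketch, not something the abstract states. Under that assumption, Z-scores of -6.1 and -8.1 give figures of the same order of magnitude as those quoted.

```python
import math
import statistics


def z_score(native_energy, decoy_energies):
    """Native energy in units of the decoy standard deviation, relative to
    the decoy mean (more negative means better recognition)."""
    mean = statistics.mean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)
    return (native_energy - mean) / sd


def average_over_homologs(energy_rows):
    """Column-wise mean: row h holds the energies of homolog h threaded on
    every structure, so each column is one structure."""
    return [statistics.mean(column) for column in zip(*energy_rows)]


def folds_distinguishable(z):
    """Pool size at which one random fold is expected to score as well as
    the native one, assuming normally distributed decoy energies."""
    tail = 0.5 * math.erfc(-z / math.sqrt(2.0))  # P(Z' <= z), standard normal
    return 1.0 / tail


# Toy numbers, purely illustrative.
z = z_score(-10.0, [0.0, 1.0, -1.0, 2.0, -2.0])
averaged = average_over_homologs([[1.0, 2.0], [3.0, 4.0]])
single_chain = folds_distinguishable(-6.1)
with_homologs = folds_distinguishable(-8.1)
```

Averaging helps because errors of the simplified potential that differ between homologs partly cancel, while the signal from the shared native fold does not.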
