Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity of protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used for improving the prediction performance. We use three novel measures to describe the extent to which a residue is considered an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than that with outliers. Our method is also more accurate than many literature methods on benchmark data sets. From our empirical studies, we found that some outlier interface residues are truly near noninterface regions, and some outlier noninterface residues are close to interface regions.
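As a rough illustration of the first measure (Dist) only, the sketch below computes each instance's distance from the center of its own class. Taking the center as the per-feature mean and using Euclidean distance are assumptions, since the abstract specifies neither, and the function name `centroid_distances` is hypothetical. PCL, IWB and the way the three factors are integrated are not described in enough detail to sketch.

```python
import math
from collections import defaultdict

def centroid_distances(features, labels):
    """Distance of each instance from the mean of its own class (assumed 'center')."""
    groups = defaultdict(list)
    for x, y in zip(features, labels):
        groups[y].append(x)
    # Per-class mean vector, computed feature by feature.
    centers = {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in groups.items()
    }
    return [math.dist(x, centers[y]) for x, y in zip(features, labels)]
```

Instances with a large distance would be candidates for removal, but the cutoff is a free parameter that the abstract does not give.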

2.
Protein structures are stabilized by both local and long-range interactions. In this work, we analyze the residue-residue contacts and the role of medium- and long-range interactions in globular proteins belonging to different structural classes. The results show that while medium-range interactions predominate in all-alpha class proteins, long-range interactions predominate in the all-beta class. Based on this, we analyze the performance of several structure prediction methods in different structural classes of globular proteins and find that all the methods predict the secondary structures of all-alpha proteins more accurately than those of other classes. Also, we observe that residues 21-30 positions apart contribute more towards long-range contacts and that about 85% of residues are involved in long-range contacts. Further, the preference of residue pairs to the folding and stability of globular proteins is discussed.

3.
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
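One way to picture a KL-based distance between short sequence profiles is sketched below. This is not the authors' definition: the symmetrisation, the pseudocount smoothing, and the names `kl_divergence` and `profile_distance` are all assumptions made for illustration. Each profile is taken to be a list of per-position amino-acid frequency dictionaries.

```python
import math

def kl_divergence(p, q, pseudocount=1e-6):
    """D(p || q) for two discrete distributions given as dicts (smoothed, renormalised)."""
    keys = set(p) | set(q)
    ps = {k: p.get(k, 0.0) + pseudocount for k in keys}
    qs = {k: q.get(k, 0.0) + pseudocount for k in keys}
    zp, zq = sum(ps.values()), sum(qs.values())
    return sum(
        (ps[k] / zp) * math.log((ps[k] / zp) / (qs[k] / zq))
        for k in keys
    )

def profile_distance(profile_a, profile_b):
    """Symmetrised KL divergence summed over aligned profile columns (an assumed form)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of columns")
    return sum(
        kl_divergence(a, b) + kl_divergence(b, a)
        for a, b in zip(profile_a, profile_b)
    )
```

Summing both directions makes the measure symmetric, which plain KL divergence is not; whether the original method does this is not stated in the abstract.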

4.
The glycoprotein of vesicular stomatitis virus (VSV G) mediates fusion of the viral envelope with the host cell, with the conformational changes that mediate VSV G fusion activation occurring in a reversible, low pH-dependent manner. Based on its novel structure, VSV G has been classified as a class III viral fusion protein, having a predicted bipartite fusion domain comprising residues Trp-72, Tyr-73, Tyr-116, and Ala-117 that interacts with the host cell membrane to initiate the fusion reaction. Here, we carried out a systematic mutagenesis study of the predicted VSV G fusion loops, to investigate the functional role of the fusion domain. Using assays of low pH-induced cell-cell fusion and infection studies of mutant VSV G incorporated into viral particles, we show a fundamental role for the bipartite fusion domain. We show that Trp-72 is a critical residue for VSV G-mediated membrane fusion. Trp-72 could only tolerate mutation to a phenylalanine residue, which allowed only limited fusion. Tyr-73 and Tyr-116 could be mutated to other aromatic residues without major effect but could not tolerate any other substitution. Ala-117 was a less critical residue, with only charged residues unable to allow fusion activation. These data represent a functional analysis of predicted bipartite fusion loops of VSV G, a founder member of the class III family of viral fusion proteins.

5.
Ishida T  Nakamura S  Shimizu K 《Proteins》2006,64(4):940-947
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on a support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined by the number of residues around a given residue. First, the contact number of each residue is predicted using SVR from an amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures to identify both native structure from other structures and near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.
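The contact number defined above can be sketched as a simple neighbour count. The 10 Å default radius, the use of one representative coordinate per residue, and the name `contact_numbers` are assumptions for illustration; the abstract does not state the cutoff or the atom used.

```python
import math

def contact_numbers(coords, radius=10.0):
    """For each residue, count the other residues within `radius` (same units as coords)."""
    counts = []
    for i, a in enumerate(coords):
        n = sum(
            1
            for j, b in enumerate(coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(n)
    return counts
```

This is the quadratic-time version; for large structures a spatial grid or k-d tree would be the usual replacement.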

6.
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein–protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have uncovered that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker’s yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). Besides, we find that RSA of the residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor, but independent contributions from other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.

7.
Rapid cold‐hardening (RCH) is a unique form of phenotypic plasticity which confers survival advantages at low temperature. The fitness costs of RCH are generally poorly elucidated and are important to understanding the evolution of plastic physiology. This study examined whether RCH responses, induced by ecologically relevant diel temperature fluctuations, carry metabolic, survival, or fecundity costs. We predicted that potential costs in RCH would be manifested as differences in metabolic rate, fecundity, or survival in flies which have hardened versus those which have not, or flies that have experienced more RCH events would show greater costs than those which have experienced fewer events. One group of flies cooled to 10°C for 2 h for 11 consecutive days experienced daily RCH (Hardened), whereas the other group exposed to 15°C for the same 2‐h period each day formed a Control group. Hardened flies had higher survival at –5°C for 2 h than control flies (69 ± 9% vs. 44 ± 19%, P = 0.04). Hardened flies showed no metabolic or fecundity costs, but had reduced average survival (P = 0.0403). Thus, a major cost to repeated low temperature exposures in Ceratitis capitata is through direct mortality caused by chilling injury, although this appears not to be a direct cost of RCH.

8.
MOTIVATION: Most secondary structure prediction programs target only alpha helix and beta sheet structures and summarize all other structures in the random coil pseudo class. However, such an assignment often ignores existing local ordering in so-called random coil regions. Signatures for such ordering are distinct dihedral angle patterns. For this reason, we propose, as an alternative approach, to directly predict dihedral regions for each residue, as this yields more structural information. RESULTS: We propose a multi-step support vector machine (SVM) procedure, dihedral prediction (DHPRED), to predict the dihedral angle state of residues from sequence. Trained on 20,000 residues, our approach leads to dihedral region predictions that, in regions without alpha helices or beta sheets, are more accurate than those from secondary structure prediction programs. AVAILABILITY: DHPRED has been implemented as a web service, which academic researchers can access from our webpage http://www.fz-juelich.de/nic/cbb

9.
Accurate identification of strand residues aids prediction and analysis of numerous structural and functional aspects of proteins. We propose a sequence-based predictor, BETArPRED, which improves prediction of strand residues and β-strand segments. BETArPRED uses a novel design that accepts strand residues predicted by SSpro and predicts the remaining positions utilizing a logistic regression classifier with nine custom-designed features. These are derived from the primary sequence, the secondary structure (SS) predicted by SSpro, PSIPRED and SPINE, and residue depth as predicted by RDpred. Our features utilize certain local (window-based) patterns in the predicted SS and combine information about the predicted SS and residue depth. BETArPRED is evaluated on 432 sequences that share low identity with the training chains, and on the CASP8 dataset. We compare BETArPRED with seven modern SS predictors, and the top-performing automated structure predictor in CASP8, the ZHANG-server. BETArPRED provides statistically significant improvements over each of the SS predictors; it improves prediction of strand residues and β-strands, and it finds β-strands that were missed by the other methods. When compared with the ZHANG-server, we improve predictions of strand segments and predict more actual strand residues, while the other predictor achieves a higher rate of correct strand residue predictions when under-predicting them.

10.
Organisms evolved at high temperatures must maintain their proteins' structures in the face of increased thermal disorder. This challenge results in differences in residue utilization and overall structure. Focusing on thermostable/mesostable pairs of homologous structures, we have examined these differences using novel geometric measures: specifically burial depth (distance from the molecular surface to each atom) and travel depth (distance from the convex hull to the molecular surface that avoids the protein interior). These, along with common metrics like packing and Wadell sphericity, are used to gain insight into the constraints experienced by thermophiles. Mean travel depth of hyperthermostable proteins is significantly less than that of their mesostable counterparts, indicating smaller, less numerous and less deep pockets. The mean burial depth of hyperthermostable proteins is significantly higher than that of mesostable proteins, indicating that they bury more atoms further from the surface. The burial depth can also be tracked on the individual residue level, adding a finer level of detail to the standard exposed surface area analysis. Hyperthermostable proteins are shown for the first time to be more spherical than their mesostable homologues, regardless of when and how they adapted to extreme temperature. Additionally, residue-specific burial depth examinations reveal that charged residues stay unburied, most other residues are slightly more buried, and alanine is more significantly buried in hyperthermostable proteins. Proteins 2010. © 2009 Wiley‐Liss, Inc.

11.
12.
Low-resolution experiments suggest that most membrane helices span 17-25 residues and that most loops between two helices are longer than 15 residues. Both constraints have been used explicitly in the development of prediction methods. Here, we compared the largest possible sequence-unique data sets from high- and low-resolution experiments. For the high-resolution data, we found that only half of the helices fall into the expected length interval and that half of the loops were shorter than 10 residues. We compared the accuracy of detecting short loops and long helices for 28 advanced and simple prediction methods: all methods predicted short loops less accurately than longer ones. In particular, loops shorter than 7 residues appeared to be very difficult to detect by current methods. Similarly, all methods tended to be more accurate for longer than for shorter helices. However, helices with more than 32 residues were predicted less accurately than all other helices. Our findings may suggest particular strategies for improving predictions of membrane helices.

13.
Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions poses many obstacles in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA is comprised of two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an uncommonly high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes involved during catalysis or important interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families that are analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than that obtained by entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which there is limited structural and functional information available.

14.

Background

Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure.

Results

We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper, we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion

Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the URL http://distill.ucd.ie/.
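To make the idea of a multi-class distance map concrete, the sketch below bins pairwise distances into four classes and then collapses the result to a binary contact map. Only the 8 Å contact cutoff comes from the abstract; the other two class boundaries (13 Å and 19 Å), the use of one coordinate per residue, and all function names are placeholders chosen for illustration.

```python
import math
from bisect import bisect_right

# Hypothetical class boundaries in angstroms. Only 8.0 is taken from the
# abstract; 13.0 and 19.0 are illustrative placeholders.
THRESHOLDS = (8.0, 13.0, 19.0)

def multiclass_map(coords, thresholds=THRESHOLDS):
    """Return an N x N matrix of distance-class indices (0 is the closest class)."""
    return [
        [bisect_right(thresholds, math.dist(a, b)) for b in coords]
        for a in coords
    ]

def binary_contact_map(class_map):
    """Collapse a multi-class map to contacts: class 0 means distance below 8 A."""
    return [[1 if c == 0 else 0 for c in row] for row in class_map]
```

Because the first boundary coincides with the contact cutoff, the binary map falls out of the multi-class map without recomputing distances. Note that the diagonal is marked as a contact in this sketch; assessments normally ignore residue pairs that are close in sequence.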

15.
T Imanaka  M Nakae  T Ohta  M Takagi 《Journal of bacteriology》1992,174(4):1423-1425
Pro residues in predicted beta-turn structures were substituted with other amino acids to obtain temperature-sensitive penicillinase repressors (PenI). A mutant repressor (P70L; Pro-70 is substituted with Leu) was inactive at 48 degrees C and penP gene expression was derepressed (1,200 U/OD660 [optical density at 660 nm]), although the mutant was still active at 30 degrees C (27 U). The heat induction ratio (penicillinase activity at 48 degrees C compared with that at 30 degrees C) of the mutant was 98 times higher than that of the wild type (i.e., 44 versus 0.45). This result indicated that the side chain of the Leu residue in P70L destroyed the proper folding of the repressor protein at the elevated temperature, whereas the Pro residue of the wild-type repressor stabilized this predicted beta-turn structure even at 48 degrees C. When the Pro residue was replaced by amino acid residues with smaller side chains (i.e., Gly and Ala), these mutant repressors were less temperature sensitive than P70L. These data suggest that the presence of the Pro residue in the beta-turn structure could be one of the key factors in stabilizing protein structure at elevated temperatures.
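The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that the heat induction ratio is simply activity at 48 degrees C divided by activity at 30 degrees C, as the parenthetical definition suggests; the function name is hypothetical.

```python
def heat_induction_ratio(activity_48c, activity_30c):
    """Penicillinase activity at 48 degrees C divided by activity at 30 degrees C."""
    return activity_48c / activity_30c

# P70L activities quoted in the abstract (U/OD660).
mutant_ratio = heat_induction_ratio(1200, 27)

# Fold difference between the rounded mutant and wild-type ratios quoted in the abstract.
fold_over_wild_type = 44 / 0.45
```

With these inputs the mutant ratio rounds to 44 and the fold difference rounds to 98, matching the values reported in the abstract.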

16.
Protein heterodimer complexes are often involved in catalysis, regulation, assembly, immunity and inhibition. This involves the formation of stable interfaces between the interacting partners. Hence, it is of interest to describe heterodimer interfaces using known structural complexes. We use a non-redundant dataset of 192 heterodimer complex structures from the Protein Data Bank (PDB) to identify interface residues and describe their interfaces using amino-acid residue property preferences. Analysis of the dataset shows that the heterodimer interfaces are often abundant in polar residues. The analysis also shows the presence of two classes of interfaces in heterodimer complexes. The first class of interfaces (class A), with more polar residues than the core but fewer than the surface, is already known. These interfaces are more hydrophobic than surfaces, and protein-protein binding in them is largely hydrophobic. The second class of interfaces (class B), with more polar residues than both core and surface, is shown here. These interfaces are more polar than surfaces, and binding in them is mainly polar. Thus, these findings provide insights into the understanding of protein-protein interactions.

17.
MOTIVATION: The prediction of ligand-binding residues or catalytically active residues of a protein may give important hints that can guide further genetic or biochemical studies. Existing sequence-based prediction methods mostly rank residue positions by evolutionary conservation calculated from a multiple sequence alignment of homologs. A problem hampering more widespread application of these methods is the low per-residue precision, which at 20% sensitivity is around 35% for ligand-binding residues and 20% for catalytic residues. RESULTS: We combine information from the conservation at each site, its amino acid distribution, as well as its predicted secondary structure (ss) and relative solvent accessibility (rsa). First, we measure conservation by how much the amino acid distribution at each site differs from the distribution expected for the predicted ss and rsa states. Second, we include the conservation of neighboring residues in a weighted linear score by analytically optimizing the signal-to-noise ratio of the total score. Third, we use conditional probability density estimation to calculate the probability of each site to be functional given its conservation, the observed amino acid distribution, and the predicted ss and rsa states. We have constructed two large data sets, one based on the Catalytic Site Atlas and the other on PDB SITE records, to benchmark methods for predicting functional residues. The new method FRcons predicts ligand-binding and catalytic residues with higher precision than alternative methods over the entire sensitivity range, reaching 50% and 40% precision at 20% sensitivity, respectively. AVAILABILITY: Server: http://frpred.tuebingen.mpg.de. Data sets: ftp://ftp.tuebingen.mpg.de/pub/protevo/FRpred/.

18.
Shukla A  Guptasarma P 《Proteins》2004,57(3):548-557
We show that residues at the interfaces of protein-protein complexes have higher side-chain energy than other surface residues. Eight different sets of protein complexes were analyzed. For each protein pair, the complex structure was used to identify the interface residues in the unbound monomer structures. Side-chain energy was calculated for each surface residue in the unbound monomer using our previously developed scoring function. The mean energy was calculated for the interface residues and the other surface residues. In 15 of the 16 monomers, the mean energy of the interface residues was higher than that of other surface residues. By decomposing the scoring function, we found that the energy term of the buried surface area of non-hydrogen-bonded hydrophilic atoms is the most important factor contributing to the high energy of the interface regions. In spite of lacking hydrophilic residues, the interface regions were found to be rich in buried non-hydrogen-bonded hydrophilic atoms. Although the calculation results could be affected by the inaccuracy of the scoring function, patch analysis of side-chain energy on the surface of an isolated protein may be helpful in identifying the possible protein-protein interface. A patch was defined as 20 residues surrounding the central residue on the protein surface, and patch energy was calculated as the mean value of the side-chain energy of all residues in the patch. In 12 of the studied monomers, the patch with the highest energy overlaps with the observed interface. The results are more remarkable when only three residues with the highest energy in a patch are averaged to derive the patch energy. All three highest-energy residues of the top energy patch belong to interfacial residues in four of the eight small protomers. We also found that the residue with the highest energy score on the surface of a small protomer is very possibly the key interaction residue.
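The patch analysis described above can be sketched as follows. Several details are assumptions made for illustration: the patch is taken to be the `patch_size` surface residues nearest to the central residue in Euclidean distance, the central residue itself is excluded, and the side-chain energies are supplied by the caller because the scoring function is not given in the abstract. The name `patch_energy` is hypothetical.

```python
import math

def patch_energy(center, coords, energies, patch_size=20, top_n=None):
    """Mean side-chain energy over the residues nearest to `center`.

    If `top_n` is given, only the `top_n` highest-energy patch members are averaged.
    """
    others = [i for i in range(len(coords)) if i != center]
    # Nearest neighbours of the central residue form the patch (assumed definition).
    others.sort(key=lambda i: math.dist(coords[center], coords[i]))
    members = others[:patch_size]
    if not members:
        raise ValueError("patch is empty")
    values = sorted((energies[i] for i in members), reverse=True)
    if top_n is not None:
        values = values[:top_n]
    return sum(values) / len(values)
```

Calling it once per surface residue and taking the maximum gives the highest-energy patch; `top_n=3` corresponds to the variant in which only the three highest-energy residues are averaged.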

19.
Chen H  Kihara D 《Proteins》2008,71(3):1255-1274
The error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most of the current prediction algorithms. Estimated error of a predicted structure is crucial information for experimental biologists to use the prediction model for design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only to alignment shift errors but also to structure-level errors, the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors), and local errors at each residue position. We have further compared SPAD with seven other quality measures, six sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed the best correlation to the RMSD; at the superfamily level, SPAD was the best; and at the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best at all of the family, superfamily and fold levels. Using the discovered correlation, we have also predicted the global and local error of our predicted structures of CASP7 targets by the SPAD. Finally, we proposed a sausage representation of predicted tertiary structures which intuitively indicates the predicted structure and the estimated error range of the structure simultaneously.

20.
Fuchs A  Kirschner A  Frishman D 《Proteins》2009,74(4):857-871
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply to membrane proteins only, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The obtained neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins performing with equal accuracy to state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset consisting of 62 membrane proteins of solved structure, we gained an accuracy of 78.1%. Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.
