Similar Articles
20 similar articles found.
1.
Knowing the coordination number and relative solvent accessibility of all the residues in a protein is crucial for deriving constraints useful in modeling protein folding and protein structure and in scoring remote homology searches. We develop ensembles of bidirectional recurrent neural network architectures to improve the state of the art in both contact and accessibility prediction, leveraging a large corpus of curated data together with evolutionary information. The ensembles are used to discriminate between two states of residue contacts or relative solvent accessibility: higher or lower than a threshold determined by the average value of the residue distribution or by the accessibility cutoff. For coordination numbers, the ensemble achieves performances ranging from 70.6% to 73.9%, depending on the radius adopted to discriminate contacts (6-12 Å). These performances represent gains of 16-20% over the baseline statistical predictor, which always assigns an amino acid to the largest class, and are 4-7% better than any previous method. A combination of predictors for different radii further improves performance. For accessibility thresholds in the relevant 15-30% range, the ensemble consistently achieves a performance above 77%, which is 10-16% above the baseline prediction and better than other existing predictors by up to several percentage points. For both problems, we quantify the improvement due to evolutionary information in the form of PSI-BLAST-generated profiles over BLAST profiles. The prediction programs are implemented as two web servers, CONpro and ACCpro, available at http://promoter.ics.uci.edu/BRNN-PRED/.
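
A minimal sketch of the two-state setup described above, assuming one representative (C-alpha) coordinate per residue and a single global threshold; the abstract derives thresholds from the average of the residue distribution, and per-residue-type averages are not modeled here. The helper names are hypothetical. The majority-class baseline mirrors the "always assign the largest class" predictor the abstract compares against.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def coordination_numbers(coords: Sequence[Coord], radius: float) -> List[int]:
    """For each residue, count the other residues whose (assumed) C-alpha atom lies within `radius` angstroms."""
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts


def two_state_labels(values: Sequence[float], threshold: float) -> List[int]:
    """Label 1 ("higher") if a value exceeds the threshold, else 0 ("lower")."""
    return [1 if v > threshold else 0 for v in values]


def majority_baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy of a predictor that always assigns the larger of the two classes."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)


# Toy chain: four residues spaced 3.8 angstroms apart along the x axis.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cn = coordination_numbers(chain, radius=8.0)
labels = two_state_labels(cn, threshold=sum(cn) / len(cn))
```

Varying `radius` (for example from 6 to 12) reproduces the kind of radius-dependent labeling the abstract evaluates.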

2.
Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure, but the problem of predicting reliable contact maps is far from solved. One of the main pitfalls of existing contact map predictors is that they generally predict unphysical maps, i.e. maps that cannot be embedded into three-dimensional structures or that, at best, violate a number of basic constraints observed in real protein structures, such as the maximum number of contacts for a residue. Here, we focus on the problem of learning to predict more "physical" contact maps. We do so by first predicting contact maps with a traditional system (XXStout) and then filtering these maps with an ensemble of artificial neural networks. The filter takes as input not only the bare predicted map but also a number of global or long-range features extracted from it. In a rigorous cross-validation test, we show that the filter greatly improves the predicted maps it receives. CASP7 results, on which we report here, corroborate this finding. Importantly, since the approach we present here is fully modular, it may benefit any other ab initio contact map predictor.
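
The abstract names the maximum number of contacts per residue as one physical constraint that raw predicted maps can violate. A hedged sketch of extracting that kind of per-residue feature from a binary contact map follows; `max_contacts` is a hypothetical parameter, and this is not the feature set used by the published filter.

```python
from typing import List, Sequence


def contacts_per_residue(contact_map: Sequence[Sequence[int]], min_separation: int = 1) -> List[int]:
    """Row sums of a symmetric 0/1 contact map, ignoring pairs closer than `min_separation` in sequence."""
    n = len(contact_map)
    return [
        sum(contact_map[i][j] for j in range(n) if abs(i - j) >= min_separation)
        for i in range(n)
    ]


def overloaded_residues(
    contact_map: Sequence[Sequence[int]], max_contacts: int, min_separation: int = 1
) -> List[int]:
    """Indices of residues whose predicted contact count exceeds a (hypothetical) physical maximum."""
    counts = contacts_per_residue(contact_map, min_separation)
    return [i for i, c in enumerate(counts) if c > max_contacts]


# Toy predicted map in which residue 0 is in contact with every other residue.
predicted = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
```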

3.
Li J, Wang J, Wang W. Proteins 2008, 71(4):1899-1907.
In the native structure of a protein, all the residues are tightly packed together in a specific order following folding, and every residue contacts some spatially neighboring residues. A residue contact network can be constructed by defining the residues as nodes and the native contacts as edges. During the folding of small single-domain proteins, a set of contacts (or bonds), defined as the folding nucleus (FN), forms around the transition state, i.e., a rate-limiting barrier located roughly midway between the unfolded states and the native state on the free energy landscape. Such an FN plays an essential role in the folding dynamics, and the residues that form its contacts are called folding nucleus residues (FNRs). In this work, FNRs are identified using quantities that characterize the topology of protein residue contact networks. By comparing the specificities of residues with the network quantities K(R), L(R), and D(R), up to 90% of the experimentally determined FNRs of six typical proteins are identified. The FNRs are found to behave as full-closeness centers rather than degree or closeness centers in the residue contact network, implying that they are important to the folding cooperativity of proteins. Our study shows that FNRs can be identified solely from the native structures of proteins, based on analysis of the residue contact network, without any knowledge of the transition state ensemble.
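
To make the network view concrete, here is a hedged sketch that builds a residue contact network from a list of contacts and computes two standard centralities, degree and closeness, using breadth-first search. It does not reproduce the paper's K(R), L(R), and D(R) quantities or its full-closeness measure; the function names are illustrative.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_contact_network(n_residues: int, contacts: Iterable[Tuple[int, int]]) -> Graph:
    """Residues are nodes; each native contact is an undirected edge."""
    graph: Graph = {i: set() for i in range(n_residues)}
    for i, j in contacts:
        graph[i].add(j)
        graph[j].add(i)
    return graph


def degree_centrality(graph: Graph) -> Dict[int, int]:
    """Number of contacts per residue."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[int, float]:
    """(reachable nodes) / (sum of shortest-path lengths to them), via BFS on the unweighted graph."""
    result: Dict[int, float] = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


network = build_contact_network(3, [(0, 1), (1, 2)])
```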

4.
5.
Molecular docking is the method of choice for investigating the molecular basis of recognition in a large number of functional protein complexes. However, correctly scoring the obtained docking solutions (decoys) to rank native‐like (NL) conformations in the top positions is still an open problem. Herein we present CONSRANK, a simple and effective tool to rank multiple docking solutions, which relies on the conservation of inter‐residue contacts in the analyzed decoys ensemble. First it calculates a conservation rate for each inter‐residue contact, then it ranks decoys according to their ability to match the most frequently observed contacts. We applied CONSRANK to 102 targets from three different benchmarks, RosettaDock, DOCKGROUND, and Critical Assessment of PRedicted Interactions (CAPRI). The method performs consistently well, both in terms of NL solutions ranked in the top positions and of values of the area under the receiver operating characteristic curve. Its ideal application is to solutions coming from different docking programs and procedures, as in the case of CAPRI targets. For all the analyzed CAPRI targets where a comparison is feasible, CONSRANK outperforms the CAPRI scorers. The fraction of NL solutions in the top ten positions in the RosettaDock, DOCKGROUND, and CAPRI benchmarks is enriched on average by a factor of 3.0, 1.9, and 9.9, respectively. Interestingly, CONSRANK is also able to specifically single out the high/medium quality (HMQ) solutions from the docking decoys ensemble: it ranks 46.2 and 70.8% of the total HMQ solutions available for the RosettaDock and CAPRI targets, respectively, within the top 20 positions. Proteins 2013. © 2013 Wiley Periodicals, Inc.
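
The two steps described for CONSRANK (a conservation rate per inter-residue contact, then a ranking of decoys by how well they match frequent contacts) can be sketched as follows. This is a hedged illustration: normalizing each decoy's summed conservation rates by its own number of contacts is an assumption, and contacts are represented by hypothetical residue identifier strings.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Contact = Tuple[str, str]  # (residue on partner A, residue on partner B); identifiers are hypothetical


def conservation_rates(decoys: Sequence[Set[Contact]]) -> Dict[Contact, float]:
    """Fraction of decoys in which each inter-residue contact appears."""
    counts = Counter(contact for decoy in decoys for contact in decoy)
    return {contact: n / len(decoys) for contact, n in counts.items()}


def consensus_scores(decoys: Sequence[Set[Contact]]) -> List[float]:
    """Mean conservation rate of each decoy's contacts (the normalization is an assumption)."""
    rates = conservation_rates(decoys)
    return [sum(rates[c] for c in decoy) / len(decoy) if decoy else 0.0 for decoy in decoys]


def rank_decoys(decoys: Sequence[Set[Contact]]) -> List[int]:
    """Decoy indices, best consensus score first (ties keep input order)."""
    scores = consensus_scores(decoys)
    return sorted(range(len(decoys)), key=lambda i: scores[i], reverse=True)


decoys = [
    {("A1", "B1"), ("A2", "B2")},
    {("A1", "B1"), ("A2", "B2")},
    {("A1", "B1"), ("A3", "B3")},
    {("A4", "B4")},
]
```

The last decoy shares no contacts with the others, so it drops to the bottom of the ranking, which is the consensus behavior the abstract relies on.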

6.
Based on available experimental data and using a theoretical model of protein folding, we demonstrate that there is an optimal ratio between the average conformational entropy and the average contact energy per residue for fast protein folding. A statistical analysis of the conformational entropy and the number of contacts per residue for 5829 protein domains from four main classes (α, β, α/β, α+β) shows that each class has its own characteristic average number of contacts per residue and average conformational entropy per residue. These class-specific characteristics determine the protein folding rates: α-proteins are the fastest to fold, β-proteins are the second fastest, α+β-proteins are the third, and α/β-proteins are the last to fold.

7.
Unfolding transitions of an intrinsically unstable annexin domain and the structure of its unfolded state have been examined using multiple ~10-ns molecular dynamics simulations. Three main basins are observed in the configurational space: a native-like state, a compact partially unfolded (intermediate compact) state, and the unfolded state. In the native-like state, fluctuations are observed that are nonproductive for unfolding: after an initial loss of approximately 20% of the native contacts of the core residues, the core of the protein transiently refolds completely to the native state. The transition from the native-like basin to the partially unfolded compact state involves a loss of approximately 75% of native contacts but little change in the radius of gyration or core hydration properties. In one of the trajectories, the intermediate state adopts, for part of the time, a novel, highly compact, salt-bridge-stabilized structure that can be identified as a conformational trap. The intermediate-to-unfolded state transition is characterized by a large increase in the radius of gyration. After an initial relaxation, the unfolded state recovers a native-like topology of the domain. The simulated unfolded state ensemble reproduces experimental nuclear magnetic resonance data in detail and leads to a convincing, complete picture of the unfolded domain.
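
Losses of native contacts like those quoted above are commonly tracked as the fraction of native contacts, Q. A minimal sketch, assuming residue-level contacts defined by a single distance cutoff on one representative atom per residue; the study's actual contact definition is not given in the abstract, and the cutoff and `min_sep` values are illustrative.

```python
import math
from typing import Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Pair = Tuple[int, int]


def contact_set(coords: Sequence[Coord], cutoff: float, min_sep: int = 2) -> Set[Pair]:
    """Residue pairs (i < j, at least `min_sep` apart in sequence) within `cutoff` angstroms."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_sep, len(coords))
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def fraction_native_contacts(
    native: Set[Pair], frame: Sequence[Coord], cutoff: float, min_sep: int = 2
) -> float:
    """Q: share of native contacts still formed in a simulation frame."""
    if not native:
        return 0.0
    return len(native & contact_set(frame, cutoff, min_sep)) / len(native)


native_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (0.0, 4.0, 0.0)]
native = contact_set(native_coords, cutoff=6.0)
partly_open = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (4.0, 4.0, 0.0), (8.0, 4.0, 0.0)]
extended = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
```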

8.
Fuchs A, Kirschner A, Frishman D. Proteins 2009, 74(4):857-871.
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply only to membrane proteins, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The resulting neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins that performs with accuracy equal to that of state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset of 62 membrane proteins of solved structure, we achieved an accuracy of 78.1%.
Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.

9.
Predicted protein residue‐residue contacts can be used to build three‐dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three‐dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input to a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and to detect secondary structure patterns, which are then used to reconstruct the final structural models. This unique two‐stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β‐sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM‐score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM‐score of the best models reconstructed by our method is 0.59, 5.5% higher than that of the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/ . Proteins 2015; 83:1436–1449. © 2015 Wiley Periodicals, Inc.
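
The first stage described above, translating contacts and secondary structure into restraints, might look roughly like this. The numeric bounds and helix dihedral targets below are illustrative assumptions, not the conversion rules published with CONFOLD, strands and hydrogen-bond restraints are left out, and the output is plain Python objects rather than any distance-geometry program's input format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Illustrative values only; CONFOLD's published conversion rules are not reproduced here.
CONTACT_LOWER, CONTACT_UPPER = 3.5, 8.0  # angstroms, C-beta to C-beta
HELIX_PHI, HELIX_PSI, DIHEDRAL_TOL = -57.0, -47.0, 20.0  # degrees


@dataclass(frozen=True)
class DistanceRestraint:
    i: int
    j: int
    atom: str
    lower: float
    upper: float


@dataclass(frozen=True)
class DihedralRestraint:
    residue: int
    angle: str
    target: float
    tolerance: float


def to_restraints(
    contacts: Iterable[Tuple[int, int]], secondary_structure: str
) -> Tuple[List[DistanceRestraint], List[DihedralRestraint]]:
    """Turn predicted contacts and a per-residue H/E/C string into restraint objects."""
    distances = [DistanceRestraint(i, j, "CB", CONTACT_LOWER, CONTACT_UPPER) for i, j in contacts]
    dihedrals: List[DihedralRestraint] = []
    for k, state in enumerate(secondary_structure):
        if state == "H":  # only helices are handled in this sketch
            dihedrals.append(DihedralRestraint(k, "phi", HELIX_PHI, DIHEDRAL_TOL))
            dihedrals.append(DihedralRestraint(k, "psi", HELIX_PSI, DIHEDRAL_TOL))
    return distances, dihedrals


distance_restraints, dihedral_restraints = to_restraints([(0, 5)], "CHHCCC")
```

In a full pipeline these restraints would be handed to a distance geometry program; the sketch stops at building them.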

10.

Background

Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures and understanding protein folding, and therefore for advancing the annotation of protein functions.

Results

In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average of the sequence profiles of residue pairs belonging to the same contact class or non-contact class. The GaCs are trained on multiple different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, roughly the same size as the positive ones, are constructed by random sampling from the original imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC.

Conclusions

Classifiers using sequence profile centers may advance long-range contact prediction. Following this approach, key structural features of proteins could be determined with high efficiency and accuracy.
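
The core SPC idea (average the profiles of each class, then assign a residue pair to the class with the nearest center) together with balanced random undersampling of negatives can be sketched as an ensemble of nearest-center classifiers. This is a swapped-in simplification: the genetic-algorithm training step of the GaCs is deliberately omitted, the short feature vectors stand in for concatenated residue-pair profiles, and all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def profile_center(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a class's feature vectors (a 'sequence profile center')."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_balanced_centers(
    positives: Sequence[Sequence[float]], negatives: Sequence[Sequence[float]], seed: int
) -> Tuple[Vector, Vector]:
    """Undersample negatives to the positive count, then compute one center per class."""
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    sampled = rng.sample(list(negatives), k)
    return profile_center(positives), profile_center(sampled)


def predict(x: Sequence[float], centers: Tuple[Vector, Vector]) -> int:
    """1 (contact) if x is nearer the contact center than the non-contact center."""
    pos, neg = centers
    return 1 if math.dist(x, pos) < math.dist(x, neg) else 0


def ensemble_predict(x: Sequence[float], members: Sequence[Tuple[Vector, Vector]]) -> int:
    """Majority vote over members trained on different negative subsamples."""
    votes = sum(predict(x, centers) for centers in members)
    return 1 if 2 * votes > len(members) else 0


positives = [[1.0, 1.0], [1.0, 0.8]]
negatives = [[0.0, 0.0], [0.0, 0.2], [0.1, 0.0], [0.0, 0.1]]
members = [fit_balanced_centers(positives, negatives, seed) for seed in range(5)]
```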

11.
Inter-residue pair contacts have been analyzed in detail for four proteins whose structures were determined both by X-ray analysis (X-ray) and by nuclear magnetic resonance (NMR). At contact distances ≤ 4.0 Å, the four NMR structures have 4-9% fewer pair contacts overall, and their pair contacts are on average 0.02-0.16 Å shorter than those in the corresponding X-ray structures. In each of the four structure pairs, 83-94% of the pair contacts are common, i.e. formed by the same residues in both structures, while the remaining 6-17% are longer "own" pair contacts formed by different residues in the NMR and X-ray structures. These longer own contacts are more numerous in the X-ray structure of each pair. In each NMR structure there are three types of common pair contacts: those shorter than, longer than, or equal in length to the identical pair contacts in the X-ray structure of the same protein. The shortened common pair contacts, which arise from methodological differences, predominate in the known distance dependence of inter-residue contact densities for the 60-61 NMR/X-ray structure pairs. Among the four pairs analyzed, contact shortening occurs upon energy minimization of the crambin NMR structure and upon structure calculation with the program X-PLOR using decreased atomic van der Waals radii for the NMR structures of ubiquitin, hen lysozyme, and monomeric hemoglobin. The extent of NMR contact shortening decreased as the amount of NMR information used in calculating the NMR structures increased. Among the 60-61 NMR/X-ray structure pairs, the main difference between alpha-helical and beta-structural proteins in the distance dependence of average contact densities arises from the strong alpha/beta difference in local backbone geometry.
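
The comparison above rests on matching residue-pair contacts between two structures of the same protein. A minimal sketch, assuming the minimum interatomic distance for each residue pair has already been computed for both structures: it counts contacts at the 4.0 Å cutoff, finds the common pairs, and reports the mean NMR-minus-X-ray distance over common contacts (negative values mean shorter NMR contacts). The toy distances are invented for illustration.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]


def contacts_within(min_dist: Dict[Pair, float], cutoff: float = 4.0) -> Dict[Pair, float]:
    """Keep residue pairs whose minimum interatomic distance is at most `cutoff` angstroms."""
    return {pair: d for pair, d in min_dist.items() if d <= cutoff}


def compare_structures(
    nmr: Dict[Pair, float], xray: Dict[Pair, float], cutoff: float = 4.0
) -> Dict[str, float]:
    """Contact counts, common contacts and mean NMR-minus-X-ray distance over common contacts."""
    a, b = contacts_within(nmr, cutoff), contacts_within(xray, cutoff)
    common = a.keys() & b.keys()
    shift = sum(a[p] - b[p] for p in common) / len(common) if common else 0.0
    return {
        "n_nmr": len(a),
        "n_xray": len(b),
        "n_common": len(common),
        "mean_nmr_minus_xray": shift,
    }


nmr = {(1, 5): 3.4, (2, 8): 3.7, (3, 9): 4.5}
xray = {(1, 5): 3.6, (2, 8): 3.8, (4, 10): 3.7}
summary = compare_structures(nmr, xray)
```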

12.
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins that integrates a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM, which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP, an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets, using a contact definition of 14 Å between Cα‐Cα atoms. First, using rigorous leave‐one‐protein‐out cross-validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and an MCC of 0.106. COMSAT shows satisfactory results compared with 12 other state‐of‐the‐art predictors and is more robust in terms of prediction accuracy as the length and complexity of TM proteins increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/ . Proteins 2016; 84:332–348. © 2016 Wiley Periodicals, Inc.
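
The four figures reported for COMSAT (accuracy, coverage, specificity, MCC) follow from a confusion matrix over candidate residue pairs. A hedged sketch, assuming "accuracy" is the fraction of predicted contacts that are correct and "coverage" the fraction of true contacts recovered, with candidates restricted to pairs at least six residues apart as in the abstract:

```python
import math
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def candidate_pairs(length: int, min_sep: int = 6) -> Set[Pair]:
    """All residue pairs (i, j) with j - i >= min_sep."""
    return {(i, j) for i in range(length) for j in range(i + min_sep, length)}


def confusion(predicted: Set[Pair], true: Set[Pair], length: int, min_sep: int = 6) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) over the candidate pairs."""
    candidates = candidate_pairs(length, min_sep)
    pred, actual = predicted & candidates, true & candidates
    tp = len(pred & actual)
    fp = len(pred - actual)
    fn = len(actual - pred)
    tn = len(candidates) - tp - fp - fn
    return tp, fp, fn, tn


def contact_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy (precision), coverage (recall), specificity and Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": tp / (tp + fp) if tp + fp else 0.0,
        "coverage": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

Because true contacts are a small minority of candidate pairs, specificity stays near 99% even when coverage is low, which matches the pattern in the reported numbers.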

13.
Rajgaria R, Wei Y, Floudas CA. Proteins 2010, 78(8):1825-1846.
An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β‐sheet alignments. These β‐sheet alignments are used as constraints for contacts between residues of β‐sheets. This model was tested on three independent protein test sets and the CASP8 test proteins, consisting of β, α + β, and α/β proteins, and was found to perform very well. The average accuracy of the predictions (for residues separated by at least six residues) was ~61%. The average true positive and false positive distances were also calculated for each of the test sets; they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be used directly to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first-principles protein tertiary structure prediction approach ASTRO‐FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. Proteins 2010. © 2010 Wiley‐Liss, Inc.

14.
Radius of gyration is indicator of compactness of protein structure
Search and study of the general principles that govern the kinetics and thermodynamics of protein folding generate new insight into the factors controlling this process. Statistical analysis of the radii of gyration of 3769 protein structures from four general structural classes (all-alpha, all-beta, alpha/beta, alpha + beta) demonstrates that each class of proteins has its own class-specific radius of gyration, which determines the compactness of protein structures: alpha-proteins have the largest radius of gyration, indicating that they are less tightly packed than beta- and alpha + beta-proteins, while alpha/beta-proteins are the most tightly packed, with the smallest radius of gyration. It should be emphasized that the radius of gyration normalized by the radius of gyration of a ball of the same volume is, unlike parameters such as compactness and the number of contacts per residue, independent of protein length.
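
The normalization mentioned above divides a protein's radius of gyration by that of a uniform ball of the same volume, whose radius of gyration is sqrt(3/5) times its radius. A minimal sketch under stated assumptions: atoms are unweighted, and the volume is supplied by the caller (for example, estimated from residue volumes); how the study estimated volume is not stated in the abstract.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Coord]) -> float:
    """Unweighted radius of gyration: RMS distance of the points from their centroid."""
    n = len(coords)
    centroid = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


def ball_radius_of_gyration(volume: float) -> float:
    """Rg of a uniform solid ball of the given volume: sqrt(3/5) * R, with R = (3V / 4 pi)^(1/3)."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return math.sqrt(3.0 / 5.0) * radius


def normalized_rg(coords: Sequence[Coord], volume: float) -> float:
    """Rg divided by the Rg of a ball with the same (externally supplied) volume."""
    return radius_of_gyration(coords) / ball_radius_of_gyration(volume)


points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
unit_ball_volume = 4.0 / 3.0 * math.pi
```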

15.
Chen C, Li L, Xiao Y. Biopolymers 2007, 85(1):28-37.
In this paper, we use all-atom potential energy to define and analyze inter-residue contacts in mesophilic and thermophilic proteins. Fifteen protein families are selected, and each family has two representative proteins with greatly different preferred environmental temperatures. We find that both the number and the energy of contacts defined in this way correlate more strongly with the preferred temperatures of proteins than other factors used before. We also find that charged-polar and charged-nonpolar residue contacts are not only more numerous but also have lower single-contact energies. Most importantly, most thermophilic proteins have more charged-polar and charged-nonpolar residue contacts than their mesophilic counterparts. This suggests that, in addition to the usual charged-charged and nonpolar-nonpolar residue contacts, these contacts may play an important role in protein thermostability. Charged residues may exert their profound influence by forming contacts not only with other charged residues but also with polar or nonpolar residues, thereby strengthening the contact network and, in turn, the thermostability of proteins.
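
A minimal sketch of tallying contacts by residue class, the breakdown the abstract uses to compare thermophilic and mesophilic homologs. The class assignment (D, E, K, R as charged; S, T, N, Q, H, Y, C as polar; the rest as nonpolar) is an assumption, and the contact list is taken as given rather than derived from all-atom potential energy as in the study.

```python
from collections import Counter
from typing import Iterable, Tuple

# Assumed residue classes; the study's own assignment may differ.
RESIDUE_CLASS = {
    **{aa: "charged" for aa in "DEKR"},
    **{aa: "polar" for aa in "STNQHYC"},
    **{aa: "nonpolar" for aa in "AVLIMFWPG"},
}


def contact_class_counts(sequence: str, contacts: Iterable[Tuple[int, int]]) -> Counter:
    """Count contacts by unordered pair of residue classes, e.g. 'charged-polar'."""
    counts: Counter = Counter()
    for i, j in contacts:
        a, b = sorted((RESIDUE_CLASS[sequence[i]], RESIDUE_CLASS[sequence[j]]))
        counts[f"{a}-{b}"] += 1
    return counts


counts = contact_class_counts("DKSL", [(0, 1), (0, 2), (1, 3), (2, 3)])
```

Comparing these tallies between the two members of each family would give the kind of thermophile-versus-mesophile contrast the abstract reports.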

16.
17.
Ishida T, Nakamura S, Shimizu K. Proteins 2006, 64(4):940-947.
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined as the number of residues around that residue. First, the contact number of each residue is predicted using SVR from the amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures, both for identifying the native structure among other structures and for identifying near-native structures among nonnative ones. This potential improves not only the ability to identify native structures among other structures but also the ability to discriminate near-native structures from nonnative structures.
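
The potential described above can be sketched as a negative log-likelihood of a structure's observed contact numbers given the predicted ones. Assumptions: predicted contact numbers are rounded to integer bins, the conditional distributions are Laplace-smoothed histograms built from (predicted, native) training pairs, and the cutoff radius is a free parameter; the SVR predictor itself is not shown, and the class name is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_numbers(coords: Sequence[Coord], cutoff: float) -> List[int]:
    """Number of other residues within `cutoff` angstroms of each residue."""
    return [
        sum(1 for j, other in enumerate(coords) if j != i and math.dist(c, other) <= cutoff)
        for i, c in enumerate(coords)
    ]


class ContactNumberPotential:
    """-log P(observed contact number | predicted contact number), from smoothed histograms."""

    def __init__(self, training: Sequence[Tuple[float, int]], max_cn: int, pseudocount: float = 1.0):
        self.max_cn = max_cn
        self.pseudocount = pseudocount
        self.hist: Dict[int, Counter] = defaultdict(Counter)
        for predicted, native in training:
            self.hist[round(predicted)][native] += 1

    def prob(self, predicted: float, observed: int) -> float:
        bin_counts = self.hist.get(round(predicted), Counter())
        total = sum(bin_counts.values()) + self.pseudocount * (self.max_cn + 1)
        return (bin_counts[observed] + self.pseudocount) / total

    def energy(self, predicted_cns: Sequence[float], observed_cns: Sequence[int]) -> float:
        """Lower energy means the structure's contact numbers agree better with the prediction."""
        return -sum(math.log(self.prob(p, o)) for p, o in zip(predicted_cns, observed_cns))


potential = ContactNumberPotential([(2.0, 2), (2.0, 2), (2.0, 3), (5.0, 5)], max_cn=5)
```

A decoy would be scored by computing `contact_numbers` on its coordinates and passing them, with the sequence-based predictions, to `energy`.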

18.
Wang XF, Chen Z, Wang C, Yan RX, Zhang Z, Song J. PLoS ONE 2011, 6(10):e26767.
Integral membrane proteins constitute 25-30% of genomes and play crucial roles in many biological processes. However, membrane proteins account for less than 1% of the structures in the Protein Data Bank. In this context, it is important to develop reliable computational methods for predicting the structures of membrane proteins. Here, we present the first application of random forest (RF) to residue-residue contact prediction in transmembrane proteins, a method we term TMhhcp. Rigorous cross-validation tests indicate that the RF models provide more favorable prediction performance than two state-of-the-art methods, TMHcon and MEMPACK. Using a strict leave-one-protein-out jackknife procedure, the models reached top L/5 prediction accuracies of 49.5% and 48.8% for two different residue contact definitions, respectively. The predicted residue contacts were further employed to predict interacting helical pairs, achieving Matthews correlation coefficients of 0.430 and 0.424 under the two contact definitions, respectively. To serve the academic community, the TMhhcp server has been made freely accessible at http://protein.cau.edu.cn/tmhhcp.
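
Top-L/5 accuracy, the headline metric here, ranks candidate pairs by predicted score and checks how many of the best L/5 are true contacts. A minimal sketch; the function name is hypothetical, and breaking score ties by pair index is an assumption made for determinism.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def top_l_over_k_accuracy(
    scores: Dict[Pair, float], true_contacts: Set[Pair], length: int, k: int = 5
) -> float:
    """Fraction of the top max(1, L // k) scored pairs that are true contacts."""
    ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
    top = ranked[: max(1, length // k)]
    if not top:
        return 0.0
    return sum(1 for pair in top if pair in true_contacts) / len(top)


scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.7}
```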

19.
Given sufficiently large protein families and using a global statistical inference approach, it is possible to obtain sufficient accuracy in protein residue contact predictions to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but actually follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independent of the number of aligned sequences, residue separation, or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is, in comparison with state-of-the-art methods using machine learning, superior for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.

20.

Background

Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts.

Results

We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method achieves accuracies of 0.48 for the top L/5 medium-range contacts and 0.36 for the top L/5 long-range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium- and long-range contact predictions were 0.34 and 0.30, respectively.

Conclusions

When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means of identifying medium- and long-range residue-residue contacts. An immediate benefit of the method is that, when combined with a scoring scheme, it can be used to successfully rank models.
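
The ensemble idea (pool contacts from many models and trust the recurrent ones) can be sketched as contact frequencies across models, followed by top-L/5 selection within a sequence-separation range. The separation ranges used below (12-23 for medium range, 24 or more for long range) follow common CASP convention and are an assumption about this study; models are given as precomputed contact sets.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Set, Tuple

Pair = Tuple[int, int]


def contact_frequencies(models: Sequence[Set[Pair]]) -> Dict[Pair, float]:
    """Fraction of models in which each (i, j) contact occurs."""
    counts = Counter(pair for model in models for pair in model)
    return {pair: n / len(models) for pair, n in counts.items()}


def top_contacts(
    freqs: Dict[Pair, float], length: int, min_sep: int, max_sep: Optional[int] = None, k: int = 5
) -> List[Pair]:
    """Most recurrent contacts within a separation range, capped at max(1, L // k)."""
    in_range = [
        p for p in freqs
        if abs(p[1] - p[0]) >= min_sep and (max_sep is None or abs(p[1] - p[0]) <= max_sep)
    ]
    in_range.sort(key=lambda p: (-freqs[p], p))
    return in_range[: max(1, length // k)]


models = [{(0, 30), (1, 15)}, {(0, 30), (2, 40)}, {(0, 30), (1, 15)}]
freqs = contact_frequencies(models)
long_range = top_contacts(freqs, length=50, min_sep=24)
medium_range = top_contacts(freqs, length=50, min_sep=12, max_sep=23)
```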
