Similar articles
1.
2.
Chameleon sequences (ChSeqs) are identical amino acid sequence strings that adopt different conformations in different protein structures. Researchers have detected and studied ChSeqs to understand the interplay between local and global interactions in protein structure formation. Because a single ChSeq can adopt different secondary structures, ChSeqs challenge sequence-based secondary structure predictors. Taking advantage of the growing number of Protein Data Bank structures, we identify a large set of ChSeqs ranging from 6 to 10 residues in length. The homologous ChSeqs we discovered highlight the structural plasticity involved in biological function. Compared with previous studies, our set of unrelated ChSeqs represents an approximately 20-fold increase in the number of detected sequences and extends the longest ChSeq from 8 to 10 residues. We applied secondary structure predictors to our ChSeqs and found that methods based on a sequence profile outperformed methods based on a single sequence. For the unrelated ChSeqs, the evolutionary information provided by the sequence profile typically allows successful prediction of the prevailing secondary structure adopted in each protein family. Our dataset will facilitate future studies of ChSeqs and the interpretation of the interplay between local and nonlocal interactions. A user-friendly web interface for this ChSeq database is available at prodata.swmed.edu/chseq.
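The detection step can be pictured as indexing every k-residue window by its sequence and recording the secondary-structure string it adopts. The sketch below is a minimal illustration, not the authors' pipeline: it assumes each protein is given as a sequence plus a per-residue DSSP-style string (H, E, C), and it treats any k-mer seen with more than one secondary-structure string as a candidate chameleon. The function name `find_chameleons` and the input format are hypothetical.

```python
from collections import defaultdict


def find_chameleons(entries, k):
    """Return k-mers observed with more than one secondary-structure string.

    entries: iterable of (protein_id, sequence, ss), where ss holds one
    DSSP-style letter (e.g. H, E, C) per residue. Hypothetical input format.
    """
    # k-mer -> secondary-structure string -> set of protein ids
    seen = defaultdict(lambda: defaultdict(set))
    for pid, seq, ss in entries:
        if len(seq) != len(ss):
            raise ValueError(f"{pid}: sequence and SS string differ in length")
        for start in range(len(seq) - k + 1):
            seen[seq[start:start + k]][ss[start:start + k]].add(pid)
    # Keep only k-mers that adopt more than one conformation.
    return {
        kmer: {conf: sorted(ids) for conf, ids in confs.items()}
        for kmer, confs in seen.items()
        if len(confs) > 1
    }


if __name__ == "__main__":
    data = [
        ("p1", "AKLVEAAKL", "HHHHHHHHH"),
        ("p2", "GAKLVEAG", "CEEEEEEC"),
    ]
    print(find_chameleons(data, k=6))
```

A stricter criterion (for example, fully helical in one structure and fully strand in another) would only change the final filter.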

3.
The bacterial elongation factor RfaH promotes the expression of virulence factors by binding specifically to RNA polymerase (RNAP) paused at a DNA signal. This behavior differs from that of its paralog NusG, the major representative of the protein family to which RfaH belongs. Both proteins have an N-terminal domain (NTD) bearing an RNAP-binding site, but the NusG C-terminal domain (CTD) folds as a β-barrel, whereas the RfaH CTD forms an α-hairpin that blocks this site. Upon recognition of the specific DNA exposed by RNAP, RfaH is activated by interdomain dissociation and a complete structural rearrangement of the CTD into a β-barrel structurally identical to the NusG CTD. Although RfaH transformation has been extensively characterized computationally, little attention has been paid to the role of the NTD in the fold-switching process, because its structure remains unchanged. Here, we used Associative Water-mediated Structure and Energy Model (AWSEM) molecular dynamics to characterize the transformation of RfaH, focusing on the sequence-dependent effects of the NTD on CTD fold stabilization. Umbrella sampling simulations guided by native contacts recapitulate the thermodynamic equilibrium observed experimentally for RfaH and its isolated CTD. Temperature refolding simulations of full-length RfaH show a high success rate in reaching the α-folded CTD, whereas the NTD interferes with βCTD folding, leaving the CTD trapped in a β-barrel intermediate. Meanwhile, NusG CTD refolding is unaffected by the presence of the RfaH NTD, showing that these NTD-CTD interactions are encoded in the RfaH sequence. Altogether, these results suggest that the RfaH NTD favors the α-folded state of RfaH by specifically orienting the αCTD upon interdomain binding and by favoring rupture of the β-barrel into an intermediate from which fold switching proceeds.
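Umbrella sampling along native contacts typically uses the fraction of native contacts, Q, as the reaction coordinate, with a harmonic bias holding each window near a target value. The sketch below assumes a simple hard-cutoff definition of Q (residue pairs at least three apart and closer than 8 Å in the native structure, counted as formed when within 1.2 times their native distance). Simulation packages often use a smooth, Gaussian-weighted variant instead, and the cutoff, tolerance, force constant, and function names here are illustrative assumptions.

```python
import numpy as np


def native_contacts(coords, cutoff=8.0, min_sep=3):
    """Residue pairs (i, j), j >= i + min_sep, closer than cutoff in the native structure."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(coords), k=min_sep)
    keep = dist[i, j] < cutoff
    return i[keep], j[keep], dist[i, j][keep]


def fraction_native_contacts(coords, i, j, native_dist, tol=1.2):
    """Q: fraction of native contacts whose current distance is within tol * native distance."""
    if len(i) == 0:
        raise ValueError("no native contacts defined")
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[i] - coords[j], axis=-1)
    return float(np.mean(dist < tol * native_dist))


def umbrella_bias(q, q_center, k=1000.0):
    """Harmonic umbrella-sampling bias restraining Q near the window center."""
    return 0.5 * k * (q - q_center) ** 2


if __name__ == "__main__":
    native = np.array([[0, 0, 0], [3.8, 0, 0], [3.8, 3.8, 0], [0, 3.8, 0], [0, 7.6, 0]])
    i, j, d = native_contacts(native)
    q = fraction_native_contacts(native * 1.5, i, j, d)
    print(q, umbrella_bias(q, q_center=0.5))
```

Each umbrella window would add `umbrella_bias` to the force-field energy with a different `q_center`, and the windows are then recombined (for example with WHAM) to obtain the free-energy profile along Q.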

4.
Designing protein sequences that fold into a given structure is the well-known inverse protein-folding problem. An important property of a protein design program is the ability to recover wild-type sequences given their native backbone structures. The highest average sequence identity achieved on this problem by current protein-design programs, around 30%, was obtained by our previous system, SPIN. SPIN predicts sequences compatible with a provided structure using a neural network with fragment-based local and energy-based nonlocal profiles. Our new model, SPIN2, improves on SPIN by using a deep neural network and additional structural features. SPIN2 achieves over 34% sequence recovery in 10-fold cross-validation and independent tests, an improvement of 4 percentage points over the previous version. The sequence profiles generated by SPIN2 are expected to be useful for improving existing fold recognition and protein design techniques. SPIN2 is available at http://sparks-lab.org.
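Sequence recovery is simply the fraction of positions at which the designed residue equals the wild-type residue. A minimal sketch with hypothetical function names; note that averaging per protein (as below) and pooling all residues give different numbers when protein lengths differ, and the abstract does not say which convention SPIN2 uses.

```python
def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue matches the wild type."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def average_recovery(pairs):
    """Mean per-protein recovery over (designed, native) pairs."""
    return sum(sequence_recovery(d, n) for d, n in pairs) / len(pairs)


if __name__ == "__main__":
    print(sequence_recovery("ACDEFG", "ACDKFA"))
    print(average_recovery([("ACDEFG", "ACDKFA"), ("MKV", "MKL")]))
```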

5.
Lim Heo  Michael Feig 《Proteins》2020,88(5):637-642
Protein structure prediction has long been available as an alternative to experimental structure determination, especially via homology modeling based on templates from related sequences. Recently, models based on distance restraints derived from coevolutionary analysis via machine learning have significantly expanded the ability to predict structures for sequences without templates. One such method, AlphaFold, also performs well on sequences for which templates are available, even though it does not use template information directly. Here we show that combining machine-learning-based models from AlphaFold with state-of-the-art physics-based refinement via molecular dynamics simulations further improves predictions, outperforming every other prediction method tested during the latest round of CASP. The resulting models have highly accurate global and local structures, including high accuracy at functionally important interface residues, and they are highly suitable as initial models for crystal structure determination via molecular replacement.

6.
Annotation of any newly determined protein sequence depends on its pairwise sequence identity with known sequences. However, for twilight-zone sequences, which share only 15–25% identity with known sequences, pairwise comparison methods are inadequate and annotation becomes a challenging task. Such sequences can be annotated by methods that recognize their fold. Bowie et al. described a 3D1D profile method in which amino acid sequences that fold into a known 3D structure are identified by their compatibility with that structure. We improved this method by incorporating predicted secondary structure information and applied it to fold recognition for twilight-zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included when computing the compatibility of the query sequence with known 3D fold structures. In benchmarks, the PSS-3D1D method shows up to a 21% improvement over the 3D1D profile method in correctly predicting the α + β class of folds from sequences with twilight-zone levels of identity. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight-zone sequences. The web-based PSS-3D1D method is freely available in the PredictFold server at .
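The idea of adding a secondary-structure term to a 3D1D compatibility score can be sketched as follows. This is a simplified, gapless illustration under stated assumptions: the environment classes and score table are toy values, the query is assumed to be already aligned to each template position by position (the original 3D1D method aligns with gaps), and the secondary-structure contribution is modeled as `w` times the number of positions where predicted and template states agree. The function names are hypothetical and do not come from the PredictFold server.

```python
def pss_3d1d_score(query_seq, query_pred_ss, template_envs, template_ss,
                   env_table, w=1.0):
    """Gapless 3D1D compatibility score plus a secondary-structure agreement term.

    env_table maps an environment class to {amino acid: score}; the class names
    and the additive form of the SS term are illustrative assumptions.
    """
    lengths = {len(query_seq), len(query_pred_ss), len(template_envs), len(template_ss)}
    if len(lengths) != 1:
        raise ValueError("all inputs must be aligned position by position")
    # 3D1D part: how well each query residue suits the template environment.
    profile = sum(env_table[env].get(aa, 0.0) for aa, env in zip(query_seq, template_envs))
    # Extra term: positions where the predicted SS matches the template SS.
    ss_matches = sum(p == t for p, t in zip(query_pred_ss, template_ss))
    return profile + w * ss_matches


def rank_templates(query_seq, query_pred_ss, templates, env_table, w=1.0):
    """Sort (name, envs, ss) templates of the query's length by descending score."""
    scored = [(pss_3d1d_score(query_seq, query_pred_ss, envs, ss, env_table, w), name)
              for name, envs, ss in templates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    table = {"buried": {"L": 1.0, "K": -1.0}, "exposed": {"L": -0.5, "K": 0.8}}
    templates = [("t1", ["buried", "exposed"], "HH"), ("t2", ["exposed", "buried"], "CC")]
    print(rank_templates("LK", "HH", templates, table))
```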

7.
The current state of the art in protein structure modeling has been assessed on the basis of the CASP (Critical Assessment of protein Structure Prediction) experiments. In comparative modeling, improvements have been made in sequence alignment, side-chain orientation, and loop building. Refinement of the models remains a serious challenge. Improved sequence profile methods have had a large impact on fold recognition. Although alignment quality has improved somewhat, it still limits model usefulness. In ab initio structure prediction, there has been notable progress in building approximately correct structures of 40–60-residue protein fragments. There is still a long way to go before the general ab initio prediction problem is solved. Overall, the field is maturing into a practical technology able to deliver useful models for a large number of sequences.

8.
Sean Burke  Ron Elber 《Proteins》2012,80(2):463-470
Exhaustive enumeration of sequences and folds is conducted for a simple lattice model of conformations, sequences, and energies. Examination of all foldable sequences and their nearest connected neighbors (sequences that differ by no more than a point mutation) illustrates the following: (i) An unusually large number of sequences fold into a few structures (super-folds). The same observation has been made experimentally, and computationally using stochastic sampling and exhaustive enumeration of related models. (ii) Only a few large networks of connected sequences exist that are not restricted to one fold. These networks cover a significant fraction of fold space (super-networks). (iii) Barriers exist in sequence space that prevent foldable sequences of the same structure from “connecting” through a series of single point mutations (super-barriers), even when sequence connections between folds are present. While there is ample experimental evidence for super-folds, evidence for super-networks is only starting to emerge. The predicted sequence barrier is an intriguing characteristic of sequence space, suggesting that the overall sequence space may be disconnected. The implications and limitations of these observations for the evolution of protein structures are discussed. Proteins 2012. © 2011 Wiley Periodicals, Inc.
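Exhaustive enumeration of this kind is easy to reproduce for a toy model. The sketch below assumes the 2D square-lattice HP model (two residue types, energy -1 per non-bonded H-H contact), identifies a structure by its contact map so that rotated and reflected copies count once, and calls a sequence foldable when its lowest-energy structure is unique. The paper's lattice, alphabet, and energy function may differ, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def enumerate_conformations(n):
    """All self-avoiding walks of n residues on the 2D square lattice, starting at the origin."""
    walks = []

    def extend(path, occupied):
        if len(path) == n:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return walks


def contact_map(walk):
    """Non-bonded lattice contacts (i, j) with j > i + 1; used as the structure identity."""
    index = {pos: i for i, pos in enumerate(walk)}
    contacts = set()
    for i, (x, y) in enumerate(walk):
        for dx, dy in MOVES:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                contacts.add((i, j))
    return frozenset(contacts)


def hp_energy(seq, contacts):
    """HP model: -1 for every contact between two hydrophobic (H) residues."""
    return -sum(1 for i, j in contacts if seq[i] == "H" and seq[j] == "H")


def native_structure(seq, structures):
    """Unique lowest-energy structure, or None if the ground state is degenerate."""
    energies = [(hp_energy(seq, s), s) for s in structures]
    best = min(e for e, _ in energies)
    ground = [s for e, s in energies if e == best]
    return ground[0] if len(ground) == 1 else None


def point_mutants(seq):
    """All sequences differing from seq by one H<->P substitution."""
    flip = {"H": "P", "P": "H"}
    return [seq[:i] + flip[c] + seq[i + 1:] for i, c in enumerate(seq)]


def fold_all_sequences(n):
    """Map every HP sequence of length n to its native structure (or None)."""
    structures = {contact_map(w) for w in enumerate_conformations(n)}
    return {"".join(s): native_structure("".join(s), structures)
            for s in product("HP", repeat=n)}


if __name__ == "__main__":
    folds = fold_all_sequences(8)
    designable = Counter(s for s in folds.values() if s is not None)
    print("foldable sequences:", sum(designable.values()))
    if designable:
        print("largest super-fold family:", designable.most_common(1)[0][1])
    linked = sum(1 for seq, s in folds.items()
                 if s is not None and any(folds[m] == s for m in point_mutants(seq)))
    print("foldable sequences with a same-fold neighbor:", linked)
```

`point_mutants` supplies the single-mutation neighbors used to link sequences into networks; following those links only between sequences with the same native structure gives the connected sequence clusters discussed above.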

9.
The prediction of 1D structural properties of proteins is an important step toward the prediction of protein structure and function, not only in the ab initio case but also when homology information to known structures is available. Despite this, the vast majority of 1D predictors do not incorporate homology information into the prediction process. We develop a novel structural alignment method, SAMD, which we use to build alignments of putative remote homologues that we compress into templates of structural frequency profiles. We use these templates as additional input to ensembles of recursive neural networks, which we specialise for the prediction of query sequences that show only remote homology to any Protein Data Bank structure. We predict four 1D structural properties: secondary structure, relative solvent accessibility, backbone structural motifs, and contact density. Secondary structure prediction accuracy, tested by five-fold cross-validation on a large set of proteins with less than 25% sequence identity between training and test sets and between query sequences and templates, exceeds 82%, outperforming its ab initio counterpart, other state-of-the-art secondary structure predictors (Jpred 3 and PSIPRED), and two other systems based on PSI-BLAST and COMPASS templates. We show that structural information from homologues improves prediction accuracy well beyond the Twilight Zone of sequence similarity, even below 5% sequence identity, for all four structural properties. A significant improvement over extracting structural information directly from PDB templates suggests that combining sequence and template information is more informative than using templates alone. Proteins 2009. © 2009 Wiley-Liss, Inc.
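A structural frequency profile summarizes, for each query position, how often the aligned templates are in each secondary-structure state. The sketch below assumes the templates have already been aligned and mapped onto query positions (with '-' for unaligned positions) and uses unweighted frequencies; SAMD's actual alignment procedure and any template weighting are not reproduced, and the function name is hypothetical.

```python
def structural_frequency_profile(aligned_ss, states="HEC"):
    """Per-position frequencies of SS states over templates aligned to the query.

    aligned_ss: template SS strings already mapped onto query positions, all of
    query length, with '-' where a template has no aligned residue.
    Positions without any aligned template get an all-zero row.
    """
    if not aligned_ss:
        raise ValueError("need at least one aligned template")
    length = len(aligned_ss[0])
    if any(len(t) != length for t in aligned_ss):
        raise ValueError("templates must be aligned to the query length")
    profile = []
    for pos in range(length):
        column = [t[pos] for t in aligned_ss if t[pos] != "-"]
        if column:
            profile.append([column.count(s) / len(column) for s in states])
        else:
            profile.append([0.0] * len(states))
    return profile


if __name__ == "__main__":
    for row in structural_frequency_profile(["HHE-", "HCE-", "HHEC"]):
        print(row)
```

An all-zero row signals that no template covers a position, so a downstream predictor can fall back on sequence information there.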

10.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. Prediction accuracy has been shown to improve when refined multiple sequence alignments are used instead of single sequences and when different methods are combined to generate a consensus model. Several meta-servers integrate protein structure predictions from various methods, but they do not allow submission of user-defined multiple sequence alignments and seldom keep the results confidential. We developed a novel WWW gateway for protein structure prediction that combines the useful features of other available meta-servers but offers much greater input flexibility. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary, and tertiary structure prediction. Fold-recognition (FR) results (target-template alignments) are converted into full-atom 3D models, and the quality of these models is assessed uniformly. A consensus between different FR methods is also inferred. The results are presented conveniently on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

11.
Tom Defay  Fred E. Cohen 《Proteins》1995,23(3):431-445
The results of a protein structure prediction contest are reviewed. Twelve groups submitted predictions for 14 proteins of known sequence whose structures had been determined but not yet disseminated to the scientific community; these therefore represent true tests of current structure prediction methodologies. This work makes clear that accurate tertiary structure prediction is not yet possible. However, protein fold and motif prediction are possible when the motif is recognizably similar to another known structure. Internal symmetry and the information inherent in an aligned family of homologous sequences facilitate predictive efforts. Novel folds remain a major challenge for prediction. © 1995 Wiley-Liss, Inc.

12.
Coevolution-based contact prediction, either directly from coevolutionary couplings inferred by global statistical sequence models or through structural supervision and deep learning, has found widespread application in protein structure prediction from sequence. However, one of the basic assumptions of global statistical modeling is that sequences form an at least approximately independent sample from an unknown probability distribution, which is to be learned from data. For protein families, this assumption is clearly violated by the phylogenetic relations between protein sequences, and it has proven notoriously difficult to account for phylogenetic correlations in coevolutionary model learning. Here, we propose a complementary approach: we develop strategies to randomize or resample sequence data such that conservation patterns and phylogenetic relations are preserved, while intrinsic (i.e., structure- or function-based) coevolutionary couplings are removed. A comparison of Direct Coupling Analysis applied to real and to resampled data shows that the largest coevolutionary couplings, i.e., those used for contact prediction, are only weakly influenced by phylogeny. However, the phylogeny-induced spurious couplings in the resampled data are comparable in size to the first false-positive contact predictions from real data. Dissecting functional from phylogeny-induced couplings might therefore extend accurate contact predictions to the range of intermediate-size couplings.
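The simplest null model that keeps conservation but removes couplings is to shuffle each alignment column independently. This is not the authors' method: column shuffling destroys phylogenetic correlations as well as intrinsic couplings, whereas the resampling described above is designed to preserve phylogeny. The sketch also uses plain mutual information as a stand-in coupling measure; Direct Coupling Analysis infers couplings from a global model and is not implemented here. Function names are hypothetical.

```python
import math
import random
from collections import Counter


def shuffle_columns(msa, seed=None):
    """Independently permute each MSA column.

    Preserves per-column residue composition (conservation) but destroys
    inter-column correlations, both coevolutionary and phylogenetic.
    """
    if len({len(s) for s in msa}) != 1:
        raise ValueError("all sequences must have the same length")
    rng = random.Random(seed)
    columns = [[seq[c] for seq in msa] for c in range(len(msa[0]))]
    for col in columns:
        rng.shuffle(col)
    return ["".join(col[r] for col in columns) for r in range(len(msa))]


def mutual_information(msa, a, b):
    """Mutual information (in nats) between alignment columns a and b."""
    n = len(msa)
    pa = Counter(s[a] for s in msa)
    pb = Counter(s[b] for s in msa)
    pab = Counter((s[a], s[b]) for s in msa)
    return sum(c / n * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())


if __name__ == "__main__":
    msa = ["ACK", "ACR", "GTK", "GTR", "ACK", "GTR"]
    print("real MI(0,1):", mutual_information(msa, 0, 1))
    null = shuffle_columns(msa, seed=0)
    print("shuffled MI(0,1):", mutual_information(null, 0, 1))
```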

13.
Detailed primary sequence and secondary structure analyses are reported for the hyaluronate-binding region (G1 domain) and link protein of proteoglycan aggregates. These are based on six full or partial sequences from the chicken, pig, human, rat, and bovine proteins. Determinations of a full pig and a partial human link protein sequence are reported in the Appendix. Five N-terminal sequences from both proteins were compared with the structures of 11 variable immunoglobulin (Ig) fold domains for which crystal structures are available. Despite only modest sequence homology, a clear alignment could be proposed. Analysis of this alignment shows that the equivalents of the first and second hypervariable segments are significantly longer, and both proteins have N-terminal extensions of up to 23 residues. Secondary structure predictions showed that these sequences could be matched to the available crystal structures for the variable Ig fold. However, the hydrophobic residues involved in interactions between the light and heavy chains in Igs are replaced by hydrophilic charged groups in both proteins. These results imply that both proteins are members of the Ig superfamily but exhibit structural differences from the other members of this superfamily whose crystal structures are known. The proteoglycan tandem repeat (PTR) is a 99-residue repeat found twice in the amino acid sequence of link protein and of the proteoglycan G1 domain, adjacent to the Ig fold, and also twice in the proteoglycan G2 domain. A total of 16 PTRs were available for analysis. Compositional analyses show that PTRs are positively charged if they originate from link protein and negatively charged if they originate from the G1 or G2 domains. The 16 Robson secondary structure predictions for the PTRs were averaged to improve the statistics of the prediction and were checked by comparison with Chou-Fasman calculations.
A strong alpha-helix prediction was found at residues 13 to 25, and several beta-strands were predicted. The overall content is 18% alpha-helix and 28% beta-sheet, with 44% of the remaining sequence predicted as turns. These analyses show that both the proteoglycan G1 domain and link protein are constructed from two distinct globular components, which may provide the two functional roles of these proteins in proteoglycan aggregation.
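Averaging the 16 per-repeat predictions amounts to computing, at each aligned position, the fraction of repeats predicted in each state and then reading off the majority state. A minimal sketch, assuming the predictions are already aligned and encoded as H (helix), E (strand), T (turn), and C (coil); the Robson and Chou-Fasman algorithms themselves are not implemented, and the function names are hypothetical.

```python
def average_predictions(predictions, states="HETC"):
    """Average aligned per-residue predictions (H helix, E strand, T turn, C coil).

    predictions: equal-length state strings, one per aligned repeat.
    Returns per-position state fractions and the consensus (most frequent) state.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be aligned and of equal length")
    n = len(predictions)
    fractions = []
    for pos in range(len(predictions[0])):
        column = [p[pos] for p in predictions]
        fractions.append({s: column.count(s) / n for s in states})
    # Ties are broken by the order of `states`.
    consensus = "".join(max(states, key=f.get) for f in fractions)
    return fractions, consensus


def content(consensus, state):
    """Fraction of positions assigned to a given state in the consensus."""
    return consensus.count(state) / len(consensus)


if __name__ == "__main__":
    _, cons = average_predictions(["HHHEETT", "HHCEETC", "HHHECTT"])
    print(cons, {s: round(content(cons, s), 2) for s in "HETC"})
```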

14.
Ancestral sequence reconstruction has recently been successful in decoding the origins and determinants of complex protein functions. However, phylogenetic analyses of remote homologues must handle the extreme amino acid sequence diversity that results from extended periods of evolutionary change. We exploited the wealth of protein structures to develop an evolutionary model based on protein secondary structure. The approach tracks the differences between the discrete secondary structure states observed in modern proteins and those hypothesized in their immediate ancestors. We implemented maximum-likelihood phylogenetic inference to reconstruct ancestral secondary structure. The predictive accuracy of this evolutionary model surpasses that of comparative modeling and sequence-based prediction; the reconstruction extracts information not available from modern structures or from the ancestral sequences alone. Based on a phylogenetic analysis of a sequence-diverse protein family, we showed that the model can highlight relationships that are evolutionarily rooted in structure and not evident in amino acid-based analysis.
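Maximum-likelihood reconstruction of the ancestral state at one alignment column can be computed with Felsenstein's pruning algorithm. The sketch below assumes a three-state alphabet (H, E, C), a symmetric Jukes-Cantor-like substitution model with uniform root frequencies, and a tree written as nested lists of (child, branch length) pairs. The authors' model is derived from observed structures and is not this symmetric one; function names are hypothetical.

```python
import math

STATES = "HEC"  # helix, strand, coil (assumed 3-state alphabet)


def transition_prob(i, j, t, k=len(STATES)):
    """Jukes-Cantor-like k-state model: probability of state i -> j over branch length t."""
    decay = math.exp(-k * t / (k - 1))
    if i == j:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def partial_likelihood(node):
    """Felsenstein pruning: L[a] = P(observed leaves below node | node in state a)."""
    if isinstance(node, str):  # leaf: observed secondary-structure state
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * len(STATES)
    for child, t in node:  # internal node: list of (child, branch_length)
        child_l = partial_likelihood(child)
        for a in range(len(STATES)):
            result[a] *= sum(transition_prob(a, b, t) * child_l[b]
                             for b in range(len(STATES)))
    return result


def root_posterior(tree):
    """Posterior probability of each root state; uniform root frequencies cancel out."""
    like = partial_likelihood(tree)
    total = sum(like)
    return {s: l / total for s, l in zip(STATES, like)}


if __name__ == "__main__":
    tree = [("H", 0.1), ("H", 0.1), ([("E", 0.2), ("H", 0.2)], 0.1)]
    print(root_posterior(tree))
```

Applied column by column over an alignment, the most probable root state at each position gives a reconstructed ancestral secondary-structure string.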

15.
McGuffin LJ  Jones DT 《Proteins》2002,48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of every protein encoded by every gene in a genome in order to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure of every gene product experimentally. Up to a point, reasonably accurate three-dimensional structures can be deduced for proteins with homologous sequences by comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although these methods are relatively time-consuming and limited by the library of template folds currently available. It is therefore appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be more selective at finding novel folds than either a simple fold recognition method (GenTHREADER) or standard sequence alignment when sequences show no detectable homology to proteins of known structure. Proteins 2002;48:44–52. © 2002 Wiley-Liss, Inc.
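Aligning secondary-structure elements means first collapsing a per-residue prediction into an element string (for example, HEE for helix, strand, strand) and then aligning those short strings. A minimal sketch under assumed parameters: elements shorter than three residues are ignored, and the match, mismatch, and gap scores are hypothetical, not the scheme evaluated in the study. Function names are also hypothetical.

```python
from itertools import groupby


def sse_string(ss, min_len=3):
    """Collapse a per-residue SS string (H/E/C) into its helix/strand elements."""
    return "".join(state for state, run in groupby(ss)
                   if state in "HE" and len(list(run)) >= min_len)


def align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of two SSE strings (linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def best_library_match(query_ss, library):
    """Highest (score, name) over a {name: ss_string} fold library."""
    q = sse_string(query_ss)
    return max((align_score(q, sse_string(ss)), name) for name, ss in library.items())


if __name__ == "__main__":
    lib = {"fold_a": "HHHHCEEEE", "fold_b": "EEEECEEEECEEEE"}
    print(best_library_match("CHHHHHCCEEEEC", lib))
```

A query whose best score against every library fold is low would be flagged as a potential novel fold; the threshold is left open here.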

16.
17.
Mehdi Mirzaie 《Proteins》2018,86(4):467-474
Evaluation of protein structures requires a trustworthy potential function. Although several knowledge-based potential functions exist, the impact of different types of amino acids on these scoring functions has not yet been studied. We previously reported the importance of nonlocal interactions in a scoring function (based on Delaunay tessellation) for discriminating native structures. Here, we examine the structural impact of hydrophobic amino acids on protein fold recognition. To this end, a Hydrophobic Reduced Model (HRM) was designed to reduce a protein's full structure (FS) to a reduced structure (RS). The RS consists of only the seven hydrophobic amino acids (L, V, F, I, A, W, Y) and all their interactions. The model was evaluated with four performance metrics: the number of correctly identified natives, the Z-score of the native energy, the RMSD of the minimum-score structure, and the Pearson correlation coefficient between energy and model quality. The results indicate that nonlocal interactions between hydrophobic amino acids alone can be sufficient and accurate enough for protein fold recognition. Interestingly, on eleven decoy sets, the performance of HRM in discriminating native structures is very close to that of the model that considers all amino acids (20-amino-acid model). This indicates that the power of knowledge-based potential functions in protein fold recognition is mostly due to hydrophobic interactions. Hence, we suggest combining HRM with a separate, well-designed scoring function for non-hydrophobic interactions to achieve better performance in fold recognition.
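The reduction step and two of the evaluation metrics are straightforward to express. A minimal sketch, assuming residues are given as (one-letter code, coordinates) pairs, that lower energy is better, and that the Z-score uses the population standard deviation of decoy energies; the Delaunay-based potential itself is not implemented, and the function names are hypothetical.

```python
import statistics

HYDROPHOBIC = frozenset("LVFIAWY")  # the seven residues retained by HRM


def reduce_to_hydrophobic(residues):
    """Keep only hydrophobic residues; residues are (one-letter code, coordinates)."""
    return [r for r in residues if r[0] in HYDROPHOBIC]


def native_z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution."""
    mean = statistics.mean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)
    if sd == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / sd


def native_rank(native_energy, decoy_energies):
    """1-based rank of the native among native + decoys (lower energy is better)."""
    return 1 + sum(e < native_energy for e in decoy_energies)


if __name__ == "__main__":
    decoys = [-2.0, -4.0, -6.0, -5.5]
    print(native_z_score(-10.0, decoys), native_rank(-10.0, decoys))
```

A strongly negative Z-score and a rank of 1 both indicate that the potential separates the native structure from the decoys.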

18.
Reaching the native conformation of a polypeptide chain, characterized by the lowest free energy, is a problem of long-standing interest in protein structure prediction. Because free energy estimates are computationally demanding, scoring functions, whether energy-based or statistical, have received renewed attention in recent years for distinguishing native protein structures from non-native ones. Several cleverly designed decoy sets, CASP (Critical Assessment of Techniques for Protein Structure Prediction) structures, and internet-accessible homology-based 3D model builders are now available for validating scoring functions. We describe here an all-atom, energy-based empirical scoring function and examine its performance on a wide range of publicly available decoys. Except for two protein sequences, for which the native structure is ranked second and seventh, the native structure is identified as the lowest-energy structure in 67 protein sequences among 61,659 decoys belonging to 12 different decoy sets. We further illustrate a potential application of the scoring function in bracketing native-like structures of two small mixed alpha/beta globular proteins starting from sequence and secondary structure information. The scoring function has been web-enabled at www.scfbio-iitd.res.in/utility/proteomics/energy.jsp.

19.
MOTIVATION: The number of protein families has been estimated to be as small as 1000. A recent study shows that the growth in discovery of novel structures deposited in the PDB, and the related rate of increase of SCOP categories, are slowing down. This suggests that protein structure space will soon be covered, so that most of the remaining structures may be derivable from known folding patterns. Current tertiary structure prediction methods perform well when a homologous structure is available but give poorer results when no homologous templates exist. At the same time, some proteins that share only twilight-zone sequence identity can form similar folds. Therefore, determining structural similarity without sequence similarity would benefit tertiary structure prediction. RESULTS: The proposed PFRES method for automated protein fold classification from low-identity (<35%) sequences obtains 66.4% and 68.4% accuracy on two test sets, respectively. PFRES obtains 6.3–12.4% higher accuracy than existing methods, and its prediction accuracy is shown to be statistically significantly better than that of competing methods. Our method adopts a carefully designed ensemble-based classifier and a novel, compact, custom-designed feature representation that includes nearly 90% fewer features than the representation of the most accurate competing method (36 versus 283). The proposed representation combines evolutionary information, via a PSI-BLAST profile-based composition vector, with information extracted from the secondary structure predicted by PSI-PRED. AVAILABILITY: The method is freely available from the authors upon request.
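Features derived from a predicted secondary structure are usually simple summaries of the predicted string. The sketch below computes state composition and segment counts from a 3-state prediction; it is an illustrative, hypothetical feature set, not the 36-feature PFRES representation, and it omits the PSI-BLAST profile-based composition vector.

```python
from itertools import groupby


def ss_features(ss, states="HEC"):
    """Composition and segment counts from a predicted 3-state SS string.

    Returns [fraction of each state] + [number of segments of each state],
    in the order given by `states`. Illustrative features only.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    composition = [ss.count(s) / len(ss) for s in states]
    runs = [state for state, _ in groupby(ss)]  # one entry per contiguous segment
    segments = [runs.count(s) for s in states]
    return composition + segments


if __name__ == "__main__":
    print(ss_features("CCHHHHHHCCEEEECCEEEEHHHHC"))
```

Vectors like this, concatenated with profile-derived features, would be the input to a classifier that assigns each sequence to a fold class.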

20.
Tim J. Hubbard  J. Park 《Proteins》1995,23(3):398-402
Protein structure predictions were submitted for 9 of the target sequences in the competition that ran during 1994. Target sequences were selected that had no known homology to any sequence of known structure and were members of a reasonably sized family of related but divergent sequences. The objective was either to recognize a compatible fold for the target sequence in the database of known structures or to predict its rough 3D topology ab initio. The main tools used were hidden Markov models (HMMs) for fold recognition, a β-strand pair potential to predict β-sheet topology, and the PHD server for secondary structure prediction. Compatible folds were correctly identified in a number of cases, and the β-strand pair potential was shown to be a useful tool for ab initio topology prediction. © 1995 Wiley-Liss, Inc.

