首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
The Wiggle series are support vector machine–based predictors that identify regions of functional flexibility using only protein sequence information. Functionally flexible regions are defined as regions that can adopt different conformational states and are assumed to be necessary for bioactivity. Many advances have been made in understanding the relationship between protein sequence and structure. This work contributes to those efforts by making strides to understand the relationship between protein sequence and flexibility. A coarse-grained protein dynamic modeling approach was used to generate the dataset required for support vector machine training. We define our regions of interest based on the participation of residues in correlated large-scale fluctuations. Even with this structure-based approach to computationally define regions of functional flexibility, predictors successfully extract sequence-flexibility relationships that have been experimentally confirmed to be functionally important. Thus, a sequence-based tool to identify flexible regions important for protein function has been created. The ability to identify functional flexibility using a sequence based approach complements structure-based definitions and will be especially useful for the large majority of proteins with unknown structures. The methodology offers promise to identify structural genomics targets amenable to crystallization and the possibility to engineer more flexible or rigid regions within proteins to modify their bioactivity.  相似文献   

2.
MOTIVATION: Conformational flexibility is essential to the function of many proteins, e.g. catalytic activity. To assist efforts in determining and exploring the functional properties of a protein, it is desirable to automatically identify regions that are prone to undergo conformational changes. It was recently shown that a probabilistic predictor of continuum secondary structure is more accurate than categorical predictors for structurally ambivalent sequence regions, suggesting that such models are suited to characterize protein flexibility. RESULTS: We develop a computational method for identifying regions that are prone to conformational change directly from the amino acid sequence. The method uses the entropy of the probabilistic output of an 8-class continuum secondary structure predictor. Results for 171 unique amino acid sequences with well-characterized variable structure (identified in the 'Macromolecular movements database') indicate that the method is highly sensitive at identifying flexible protein regions, but false positives remain a problem. The method can be used to explore conformational flexibility of proteins (including hypothetical or synthetic ones) whose structure is yet to be determined experimentally. AVAILABILITY: The predictor, sequence data and supplementary studies are available at http://pprowler.itee.uq.edu.au/sspred/ and are free for academic use.  相似文献   

3.
We describe a novel approach for inferring functional relationship of proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationship that might be directly related to protein function. We first exhaustively identify and measure analytically all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of patterns of residues forming pockets and voids are then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance in the form of E and p-values is then estimated for each of the three types of similarity measurements. Our method is fully automated without human intervention and can be used without input of query patterns. It does not assume any prior knowledge of functional residues of a protein, and can detect similarity based on surface patterns small and large. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationship with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins of different fold structures. We envision that this method can be used for discovering novel functional relationship of protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiries on evolutionary origins of structural elements important for protein function.  相似文献   

4.
5.
Advances reported over the last few years and the increasing availability of protein crystal structure data have greatly improved structure-based druggability approaches. However, in practice, nearly all druggability estimation methods are applied to protein crystal structures as rigid proteins, with protein flexibility often not directly addressed. The inclusion of protein flexibility is important in correctly identifying the druggability of pockets that would be missed by methods based solely on the rigid crystal structure. These include cryptic pockets and flexible pockets often found at protein-protein interaction interfaces. Here, we apply an approach that uses protein modeling in concert with druggability estimation to account for light protein backbone movement and protein side-chain flexibility in protein binding sites. We assess the advantages and limitations of this approach on widely-used protein druggability sets. Applying the approach to all mammalian protein crystal structures in the PDB results in identification of 69 proteins with potential druggable cryptic pockets.  相似文献   

6.
Lin HN  Notredame C  Chang JM  Sung TY  Hsu WL 《PloS one》2011,6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently.In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experiment results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non- similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.  相似文献   

7.
Protein flexibility predictions using graph theory   总被引:6,自引:0,他引:6  
Jacobs DJ  Rader AJ  Kuhn LA  Thorpe MF 《Proteins》2001,44(2):150-165
Techniques from graph theory are applied to analyze the bond networks in proteins and identify the flexible and rigid regions. The bond network consists of distance constraints defined by the covalent and hydrogen bonds and salt bridges in the protein, identified by geometric and energetic criteria. We use an algorithm that counts the degrees of freedom within this constraint network and that identifies all the rigid and flexible substructures in the protein, including overconstrained regions (with more crosslinking bonds than are needed to rigidify the region) and underconstrained or flexible regions, in which dihedral bond rotations can occur. The number of extra constraints or remaining degrees of bond-rotational freedom within a substructure quantifies its relative rigidity/flexibility and provides a flexibility index for each bond in the structure. This novel computational procedure, first used in the analysis of glassy materials, is approximately a million times faster than molecular dynamics simulations and captures the essential conformational flexibility of the protein main and side-chains from analysis of a single, static three-dimensional structure. This approach is demonstrated by comparison with experimental measures of flexibility for three proteins in which hinge and loop motion are essential for biological function: HIV protease, adenylate kinase, and dihydrofolate reductase.  相似文献   

8.
The functional characterization of proteins represents a daily challenge for biochemical, medical and computational sciences. Although finally proved on the bench, the function of a protein can be successfully predicted by computational approaches that drive the further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and data provided from a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands that have a high likelihood of binding can be identified by using ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms allow building a list sorted by energy of the lowest energy docking configuration for each ligand of the library. In this review the state-of-the-art of computational approaches in 3D protein comparative modeling and in the study of protein–ligand interactions is provided. Furthermore a possible combined/concerted multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described by using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.  相似文献   

9.
Prediction of disordered regions in proteins based on the meta approach   总被引:1,自引:0,他引:1  
MOTIVATION: Intrinsically disordered regions in proteins have no unique stable structures without their partner molecules, thus these regions sometimes prevent high-quality structure determination. Furthermore, proteins with disordered regions are often involved in important biological processes, and the disordered regions are considered to play important roles in molecular interactions. Therefore, identifying disordered regions is important to obtain high-resolution structural information and to understand the functional aspects of these proteins. RESULTS: We developed a new prediction method for disordered regions in proteins based on the meta approach and implemented a web-server for this prediction method named 'metaPrDOS'. The method predicts the disorder tendency of each residue using support vector machines from the prediction results of the seven independent predictors. Evaluation of the meta approach was performed using the CASP7 prediction targets to avoid an overestimation due to the inclusion of proteins used in the training set of some component predictors. As a result, the meta approach achieved higher prediction accuracy than all methods participating in CASP7.  相似文献   

10.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of subjective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins; serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost.In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure. Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.  相似文献   

11.
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel effort to experimental methods, computational methods are expected to make a significant contribution for functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology based function annotation. Combined together with experimental methods we hope that computational methods will make leading contribution in functional elucidation of the protein structures.  相似文献   

12.
An algorithm has been developed to estimate flexibility for potential hinge motion at specified residues, that is, the mutual movement of two domains by rotation around a set of main-chain dihedral angles with torsion angles of neighboring side chains as variables. Such conformational changes must occur without severe atomic collisions. Flexible hinges have been found that satisfy such criteria. Sequence flexibility charts were obtained by plotting the flexibility of each residue against the residue number. Such charts were calculated for 10 proteins (ovomucoid third domain, cytochrome c, lysozyme, hemoglobin β-chain, α-chymotrypsin, elastase, carboxypeptidase A, dihydrofolate reductase, triosephosphate isomerase, and alcohol dehydrogenase) taken from the Protein Data Bank. The first step of unfolding is likely to occur at the hinge point with the largest flexibility. Following this idea, the polypeptide chain can be dissected into several folding units according to the sequence flexibility chart. When two domains are separated by conformational changes at such a hinge, the sequence flexibility chart for each domain changes, and it is recalculated and used to indicate subsequent unfolding steps. In this process of iterative estimation of flexibile hinges, some well-isolated hinges, or the border line between flexible and inflexible regions, were found to be directly at or close to the positions of splice junctions in the eukaryotic genes. Of a total of 45 splice junctions in the 10 proteins examined in this paper, 38 junctions can be identified as flexible hinges between folding units. We suggest that the iterative estimation of flexible hinges may define an array of possible folding/unfolding paths, and that the exon–intron arrangement in the gene may be closely correlated with the folding process of the protein.  相似文献   

13.
The extracellular regions of many cell surface proteins of the immune system contain distinct domains that may be linked in many different ways and are often only loosely tethered to the transmembrane segment. In efforts to identify regions critical for binding, molecular models of these domains are used to select residues for mutagenesis and to map binding sites. Many immune cell surface proteins belong to protein superfamilies and display only limited sequence identity compared to proteins of known three-dimensional (3D) structure, often 30% or less. Therefore, detailed 3D structures are difficult to predict, and structure-based sequence analysis and model assessment are particularly important components of the model building process. In some cases, experimentally determined structures have made it possible to assess the accuracy of predictions, which illustrates the opportunities and shortcomings of the approach. Herein the model-based identification of binding sites in cell surface proteins is described and representative examples are discussed.  相似文献   

14.
15.
Liu ZP  Wu LY  Wang Y  Zhang XS  Chen L 《Amino acids》2008,35(3):627-650
One of the major goals of molecular and evolutionary biology is to understand the functions of proteins by extracting functional information from protein sequences, structures and interactions. In this review, we summarize the repertoire of methods currently being applied and report recent progress in the field of in silico annotation of protein function based on the accumulation of vast amounts of sequence and structure data. In particular, we emphasize the newly developed structure-based methods, which are able to identify locally structural motifs and reveal their relationship with protein functions. These methods include computational tools to identify the structural motifs and reveal the strong relationship between these pre-computed local structures and protein functions. We also discuss remaining problems and possible directions for this exciting and challenging area.  相似文献   

16.
Protein molecules require both flexibility and rigidity for functioning. The fast and accurate prediction of protein rigidity/flexibility is one of the important problems in protein science. We have determined flexible regions for four homologous pairs from thermophilic and mesophilic organisms by two methods: the fast FoldUnfold which uses amino acid sequence and the time consuming MDFirst which uses three-dimensional structures. We demonstrate that both methods allow determining flexible regions in protein structure. For three of the four thermophile–mesophile pairs of proteins, FoldUnfold predicts practically the same flexible regions which have been found by the MD/First method. As expected, molecular dynamics simulations show that thermophilic proteins are more rigid in comparison to their mesophilic homologues. Analysis of rigid clusters and their decomposition provides new insights into protein stability. It has been found that the local networks of salt bridges and hydrogen bonds in thermophiles render their structure more stable with respect to fluctuations of individual contacts. Such network includes salt bridge triads Agr-Glu-Lys and Arg-Glu-Arg, or salt bridges (such as Arg-Glu) connected with hydrogen bonds. This ionic network connects alpha helices and rigidifies the structure. Mesophiles can be characterized by stand alone salt bridges and hydrogen bonds or small ionic clusters. Such difference in the network of salt bridges results in different flexibility of homologous proteins. Combining both approaches allows characterizing structural features in atomic detail that determine the rigidity/flexibility of a protein structure. This article is a part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.  相似文献   

17.
Induced-fit effects are well known in the binding of small molecules to proteins and other macromolecular targets. Among other targets, protein kinases are particularly flexible proteins, so that such effects should be considered in attempts at structure-based inhibitor design for kinase targets. This paper outlines some recent progress in methods for including target flexibility in computational studies of molecular recognition. A focus is the "relaxed complex method," in which ligands are docked to an ensemble of conformations of the target, and the best complexes are re-scored to provide predictions of optimal binding geometries. Early applications of this method have suggested a new approach to the development of inhibitors of HIV-1 Integrase.  相似文献   

18.
ABSTRACT: BACKGROUND: RNA molecules play diverse functional and structural roles in cells. They function as messengers for transferring genetic information from DNA to proteins, as the primary genetic material in many viruses, as catalysts (ribozymes) important for protein synthesis and RNA processing, and as essential and ubiquitous regulators of gene expression in living organisms. Many of these functions depend on precisely orchestrated interactions between RNA molecules and specific proteins in cells. Understanding the molecular mechanisms by which proteins recognize and bind RNA is essential for comprehending the functional implications of these interactions, but the recognition 'code' that mediates interactions between proteins and RNA is not yet understood. Success in deciphering this code would dramatically impact the development of new therapeutic strategies for intervening in devastating diseases such as AIDS and cancer. Because of the high cost of experimental determination of protein-RNA interfaces, there is an increasing reliance on statistical machine learning methods for training predictors of RNA-binding residues in proteins. However, because of differences in the choice of datasets, performance measures, and data representations used, it has been difficult to obtain an accurate assessment of the current state of the art in protein-RNA interface prediction. RESULTS: We provide a review of published approaches for predicting RNA-binding residues in proteins and a systematic comparison and critical assessment of protein-RNA interface residue predictors trained using these approaches on three carefully curated non-redundant datasets. We directly compare two widely used machine learning algorithms (Naive Bayes (NB) and Support Vector Machine (SVM)) using three different data representations in which features are encoded using either sequence- or structure-based windows. Our results show that (i) Sequence-based classifiers that use a position-specific scoring matrix (PSSM)-based representation (PSSMSeq) outperform those that use an amino acid identity based representation (IDSeq) or a smoothed PSSM (SmoPSSMSeq); (ii) Structure-based classifiers that use smoothed PSSM representation (SmoPSSMStr) outperform those that use PSSM (PSSMStr) as well as sequence identity based representation (IDStr). PSSMSeq classifiers, when tested on an independent test set of 44 proteins, achieve performance that is comparable to that of three state-of-the-art structure-based predictors (including those that exploit geometric features) in terms of Matthews Correlation Coefficient (MCC), although the structure-based methods achieve substantially higher Specificity (albeit at the expense of Sensitivity) compared to sequence-based methods. We also find that the expected performance of the classifiers on a residue level can be markedly different from that on a protein level. Our experiments show that the classifiers trained on three different non-redundant protein-RNA interface datasets achieve comparable cross-validation performance. However, we find that the results are significantly affected by differences in the distance threshold used to define interface residues. CONCLUSIONS: Our results demonstrate that protein-RNA interface residue predictors that use a PSSM-based encoding of sequence windows outperform classifiers that use other encodings of sequence windows. While structure-based methods that exploit geometric features can yield significant increases in the Specificity of protein-RNA interface residue predictions, such increases are offset by decreases in Sensitivity. These results underscore the importance of comparing alternative methods using rigorous statistical procedures, multiple performance measures, and datasets that are constructed based on several alternative definitions of interface residues and redundancy cutoffs as well as including evaluations on independent test sets into the comparisons.  相似文献   

19.
The primary immunological target of COVID-19 vaccines is the SARS-CoV-2 spike (S) protein. S is exposed on the viral surface and mediates viral entry into the host cell. To identify possible antibody binding sites, we performed multi-microsecond molecular dynamics simulations of a 4.1 million atom system containing a patch of viral membrane with four full-length, fully glycosylated and palmitoylated S proteins. By mapping steric accessibility, structural rigidity, sequence conservation, and generic antibody binding signatures, we recover known epitopes on S and reveal promising epitope candidates for structure-based vaccine design. We find that the extensive and inherently flexible glycan coat shields a surface area larger than expected from static structures, highlighting the importance of structural dynamics. The protective glycan shield and the high flexibility of its hinges give the stalk overall low epitope scores. Our computational epitope-mapping procedure is general and should thus prove useful for other viral envelope proteins whose structures have been characterized.  相似文献   

20.
Cell-surface-anchored immunoglobulin superfamily (IgSF) proteins are widespread throughout the human proteome, forming crucial components of diverse biological processes including immunity, cell-cell adhesion, and carcinogenesis. IgSF proteins generally function through protein-protein interactions carried out between extracellular, membrane-bound proteins on adjacent cells, known as trans-binding interfaces. These protein-protein interactions constitute a class of pharmaceutical targets important in the treatment of autoimmune diseases, chronic infections, and cancer. A molecular-level understanding of IgSF protein-protein interactions would greatly benefit further drug development. A critical step toward this goal is the reliable identification of IgSF trans-binding interfaces. We propose a novel combination of structure and sequence information to identify trans-binding interfaces in IgSF proteins. We developed a structure-based binding interface prediction approach that can identify broad regions of the protein surface that encompass the binding interfaces and suggests that IgSF proteins possess binding supersites. These interfaces could theoretically be pinpointed using sequence-based conservation analysis, with performance approaching the theoretical upper limit of binding interface prediction accuracy, but achieving this in practice is limited by the current ability to identify an appropriate multiple sequence alignment for conservation analysis. However, an important contribution of combining the two orthogonal methods is that agreement between these approaches can estimate the reliability of the predictions. This approach was benchmarked on the set of 22 IgSF proteins with experimentally solved structures in complex with their ligands. Additionally, we provide structure-based predictions and reliability scores for the 62 IgSF proteins with known structure but yet uncharacterized binding interfaces.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号