Similar articles
Found 20 similar articles (search time: 46 ms)
1.
Many of the targets of structural genomics will be proteins with little or no structural similarity to those currently in the database. Therefore, novel function prediction methods that do not rely on sequence or fold similarity to other known proteins are needed. We present an automated approach to predict nucleic-acid-binding (NA-binding) proteins, specifically DNA-binding proteins. The method is based on characterizing the structural and sequence properties of large, positively charged electrostatic patches on DNA-binding protein surfaces, which typically coincide with the DNA-binding sites. Using an ensemble of features extracted from these electrostatic patches, we predict DNA-binding proteins with high accuracy. We show that our method does not rely on sequence or structure homology and is capable of predicting proteins with novel binding motifs and protein structures solved in an unbound state. Our method can also distinguish NA-binding proteins from other proteins that have similar, large positive electrostatic patches on their surfaces but do not bind nucleic acids.

2.
Protein structure can provide new insight into the biological function of a protein and can enable the design of better experiments to learn its biological roles. Moreover, deciphering the interactions of a protein with other molecules can contribute to the understanding of the protein's function within cellular processes. In this study, we apply a machine learning approach for classifying RNA-binding proteins based on their three-dimensional structures. The method is based on characterizing unique properties of electrostatic patches on the protein surface. Using an ensemble of general protein features and specific properties extracted from the electrostatic patches, we have trained a support vector machine (SVM) to distinguish RNA-binding proteins from other positively charged proteins that do not bind nucleic acids. Specifically, the method was applied to proteins possessing the RNA recognition motif (RRM) and successfully distinguished RNA-binding proteins from RRM domains involved in protein-protein interactions. Overall, the method achieves 88% accuracy in classifying RNA-binding proteins, yet it cannot distinguish RNA-binding from DNA-binding proteins. Nevertheless, by applying a multiclass SVM approach we were able to classify the RNA-binding proteins based on their RNA targets, specifically, whether they bind a ribosomal RNA (rRNA), a transfer RNA (tRNA), or a messenger RNA (mRNA). Finally, we present here an innovative approach that does not rely on sequence or structural homology and could be applied to identify novel RNA-binding proteins with unique folds and/or binding motifs.

3.
Lee HS, Zhang Y. Proteins 2012, 80(1): 93-110
We developed BSP‐SLIM, a new method for ligand–protein blind docking using low‐resolution protein structures. For a given sequence, protein structures are first predicted by I‐TASSER; putative ligand binding sites are transferred from holo‐template structures which are analogous to the I‐TASSER models; ligand–protein docking conformations are then constructed by shape and chemical match of the ligand with the negative image of binding pockets. BSP‐SLIM was tested on 71 ligand–protein complexes from the Astex diverse set where the protein structures were predicted by I‐TASSER with an average RMSD of 2.92 Å on the binding residues. Using I‐TASSER models, the median ligand RMSD of BSP‐SLIM docking is 3.99 Å, which is 5.94 Å lower than that by AutoDock; the median binding‐site error by BSP‐SLIM is 1.77 Å, which is 6.23 Å lower than that by AutoDock and 3.43 Å lower than that by LIGSITE^CSC. Compared to the models using crystal protein structures, the median ligand RMSD by BSP‐SLIM using I‐TASSER models increases by 0.87 Å, while that by AutoDock increases by 8.41 Å; the median binding‐site error by BSP‐SLIM increases by 0.69 Å, while that by AutoDock and LIGSITE^CSC increases by 7.31 Å and 1.41 Å, respectively. As case studies, BSP‐SLIM was used in virtual screening for six target proteins, where it placed 25% and 50% of the actives in the top 9.2% and 17% of the library on average, respectively. These results demonstrate the usefulness of template‐based coarse‐grained algorithms in low‐resolution ligand–protein docking and drug screening. An on‐line BSP‐SLIM server is freely available at http://zhanglab.ccmb.med.umich.edu/BSP-SLIM. Proteins 2012. © 2011 Wiley Periodicals, Inc.

4.
The FK506-binding proteins (FKBPs) are a unique group of chaperones found in a wide variety of organisms. They perform a number of cellular functions including protein folding, regulation of cytokines, transport of steroid receptor complexes, nucleic acid binding, histone assembly, and modulation of apoptosis. These functions are mediated by specific domains that adopt distinct tertiary conformations. Using the Threading/ASSEmbly/Refinement (TASSER) approach, tertiary structures were predicted for a total of 45 FKBPs in 23 species. These models were compared with previously characterized FKBP solution structures and the predicted structures were employed to identify groups of homologous proteins. The resulting classification may be utilized to infer functional roles of newly discovered FKBPs. The three-dimensional conformations revealed that this family may have undergone several modifications throughout evolution, including loss of N- and C-terminal regions, duplication of FKBP domains as well as insertions of entire functional motifs. Docking simulations suggest that additional sequence segments outside FKBP domains may modulate the binding affinity of FKBPs to immunosuppressive drugs. The docking models also indicate the presence of a helix-loop-helix (HLH) region within a subset of FKBPs, which may be responsible for the interaction between this group of proteins and nucleic acids.

5.
Lee SY, Zhang Y, Skolnick J. Proteins 2006, 63(3): 451-456
The TASSER structure prediction algorithm is employed to investigate whether NMR structures can be moved closer to their corresponding X-ray counterparts by automatic refinement procedures. The benchmark protein dataset includes 61 nonhomologous proteins whose structures have been determined by both NMR and X-ray experiments. Interestingly, by starting from NMR structures, the majority (79%) of TASSER refined models show a structural shift toward their X-ray structures. On average, the TASSER refined models have a root-mean-square deviation (RMSD) from the X-ray structure of 1.785 Å (1.556 Å) over the entire chain (aligned region), while the average RMSD between NMR and X-ray structures (RMSD(NMR_X-ray)) is 2.080 Å (1.731 Å). For all proteins having an RMSD(NMR_X-ray) >2 Å, the TASSER refined structures show consistent improvement. However, for the 34 proteins with an RMSD(NMR_X-ray) <2 Å, there are only 21 cases (60%) where the TASSER model is closer to the X-ray structure than the NMR structure is, which may be due to the inherent resolution of TASSER. We also compare the TASSER models with 12 NMR models in the RECOORD database that have been recalculated recently by Nederveen et al. from original NMR restraints using the newest molecular dynamics tools. In 8 of 12 cases, TASSER models show a smaller RMSD to X-ray structures; in 3 of 12 cases, where RMSD(NMR_X-ray) <1 Å, RECOORD does better than TASSER. These results suggest that TASSER can be a useful tool to improve the quality of NMR structures.
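
The RMSD values quoted in this abstract can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and that atoms are matched one to one; the function name and the toy coordinates are hypothetical and do not come from the paper, which does not describe its implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumption: the structures are already superimposed, and point i in
    coords_a corresponds to point i in coords_b (e.g. matched C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Accumulate the squared Euclidean distance of each matched pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms (not data from the study).
nmr_model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
xray_model = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 2.0)]
print(round(rmsd(nmr_model, xray_model), 3))
```

In practice an optimal superposition (for example with the Kabsch algorithm) would be applied before this step; the sketch leaves that out to keep the formula visible.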

6.
7.
We evaluate tertiary structure predictions on medium to large size proteins by TASSER, a new algorithm that assembles protein structures through rearranging the rigid fragments from threading templates guided by a reduced Cα and side-chain based potential consistent with threading based tertiary restraints. Predictions were generated for 745 proteins 201-300 residues in length that cover the Protein Data Bank (PDB) at the level of 35% sequence identity. With homologous proteins excluded, in 365 cases, the templates identified by our threading program, PROSPECTOR_3, have a root-mean-square deviation (RMSD) to native < 6.5 Å, with >70% alignment coverage. After TASSER assembly, in 408 cases the best of the top five full-length models has an RMSD < 6.5 Å. Among the 745 targets are 18 membrane proteins, with one-third having a predicted RMSD < 5.5 Å. For all representative proteins less than or equal to 300 residues that have corresponding multiple NMR structures in the Protein Data Bank, approximately 20% of the models generated by TASSER are closer to the NMR structure centroid than the farthest individual NMR model. These results suggest that reasonable structure predictions for nonhomologous large size proteins can be automatically generated on a proteomic scale, and that structural as well as functional genomics are promising applications of TASSER.

8.
9.
HI1506 is a 128-residue hypothetical protein of unknown function from Haemophilus influenzae. It was originally annotated as a shorter 85-residue protein, but a more detailed sequence analysis conducted in our laboratory revealed that the full-length protein has an additional 43 residues on the C terminus, corresponding with a region initially ascribed to HI1507. As part of a larger effort to understand the functions of hypothetical proteins from Gram-negative bacteria, and H. influenzae in particular, we report here the three-dimensional solution NMR structure for the corrected full-length HI1506 protein. The structure consists of two well-defined domains, an alpha/beta 50-residue N-domain and a 3-alpha 32-residue C-domain, separated by an unstructured 30-residue linker. Both domains have positively charged surface patches and weak structural homology with folds that are associated with RNA binding, suggesting a possible functional role in binding distal nucleic acid sites.

10.
Yunqi Li, Yang Zhang. Proteins 2009, 76(3): 665-676
Protein structure prediction approaches usually perform modeling simulations based on a reduced representation of protein structures. For biological applications, constructing full atomic models from the reduced structure decoys is an important step. Most current full atomic model reconstruction procedures have defects: they either cannot completely remove the steric clashes among backbone atoms or generate final atomic models whose topology is less similar to the native structures than that of the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen‐bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 nonhomologous proteins with reduced structure decoys generated by I‐TASSER simulations. The results show that REMO has a significant ability to remove steric clashes while retaining the good topology of the reduced model. The hydrogen‐bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I‐TASSER models in both atomic‐level structural refinement and hydrogen‐bonding network construction. Proteins 2009. © 2009 Wiley‐Liss, Inc.

11.
An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP3. Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those whose templates have good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native is >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively.
A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.

12.

Background

Despite sharing 92% sequence identity, paralogous human translation elongation factor 1 alpha-1 (eEF1A1) and elongation factor 1 alpha-2 (eEF1A2) have different but overlapping functional profiles. This may reflect the differential requirements of the cell-types in which they are expressed and is consistent with complex roles for these proteins that extend beyond delivery of tRNA to the ribosome.

Methodology/Principal Findings

To investigate the structural basis of these functional differences, we created and validated comparative three-dimensional (3-D) models of eEF1A1 and eEF1A2 on the basis of the crystal structure of homologous eEF1A from yeast. The spatial location of amino acid residues that vary between the two proteins was thereby pinpointed, and their surface electrostatic and lipophilic properties were compared. None of the variations amongst buried amino acid residues are judged likely to have a major structural effect on the protein fold, or to affect domain-domain interactions. Nearly all the variant surface-exposed amino acid residues lie on one face of the protein, in two proximal but distinct sub-clusters. The result of previously performed mutagenesis in yeast may be interpreted as confirming the importance of one of these clusters in actin-bundling and filament disorganization. Interestingly, some variant residues lie in close proximity to, and in a few cases show differences in interactions with, residues previously inferred to be directly involved in binding GTP/GDP, eEF1Bα and aminoacyl-tRNA. Additional sequence-based predictions, in conjunction with the 3-D models, reveal likely differences in phosphorylation sites that could reconcile some of the functional differences between the two proteins.

Conclusions

The revelation and putative functional assignment of two distinct sub-clusters on the surface of the protein models should enable rational site-directed mutagenesis, including homologous reverse-substitution experiments, to map surface binding patches onto these proteins. The predicted variant-specific phosphorylation sites also provide a basis for experimental verification by mutagenesis. The models provide a structural framework for interpretation of the resulting functional analysis.

13.
Many proteins function by interacting with small molecules (ligands). Identification of ligand‐binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand‐binding protein sequences and functions. Consequently, we classified the patches into ~2000 well‐characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross‐fold similarity. Furthermore, we showed that patches with a higher PatSim score have the potential to be involved in similar biological processes.

14.
We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP(3), original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted alpha-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP(3), TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino acids long from CASP7. Chunk-TASSER is approximately 11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction.
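
The TM-score thresholds quoted here may be easier to read next to the standard definition of the score (Zhang and Skolnick, 2004). The sketch below evaluates that formula for one fixed superposition; the published score takes the maximum over superpositions, which is not attempted here. The function name, argument names and toy distances are hypothetical.

```python
def tm_score(pair_distances, target_length):
    """TM-score of a model for ONE given superposition.

    pair_distances: distances (in angstroms) between aligned residue pairs.
    target_length:  number of residues in the target (native) protein.

    Assumption: the caller has already superimposed the structures; the
    published TM-score additionally maximizes over superpositions.
    """
    if target_length <= 15:
        raise ValueError("the d0 formula needs a target longer than 15 residues")
    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a positive d0")
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in pair_distances)
    return total / target_length


# Hypothetical example: 100-residue target, 80 aligned pairs at 2 angstroms.
score = tm_score([2.0] * 80, 100)
print(round(score, 3), "foldable" if score >= 0.4 else "not foldable")
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is why partial models cannot reach 1.0.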

15.
Protein binding site prediction using an empirical scoring function (total citations: 4; self-citations: 1; citations by others: 3)
Liang S, Zhang C, Liu S, Zhou Y. Nucleic Acids Research 2006, 34(13): 3698-3707
Most biological processes are mediated by interactions between proteins and their interacting partners including proteins, nucleic acids and small molecules. This work establishes a method called PINUP for binding site prediction of monomeric proteins. With only two weight parameters to optimize, PINUP produces not only 42.2% coverage of actual interfaces (percentage of correctly predicted interface residues in actual interface residues) but also 44.5% accuracy in predicted interfaces (percentage of correctly predicted interface residues in the predicted interface residues) in a cross validation using a 57-protein dataset. By comparison, the expected accuracy via random prediction (percentage of actual interface residues in surface residues) is only 15%. The binding sites of the 57-protein set are found to be easier to predict than those of an independent test set of 68 proteins. The average coverage and accuracy for this independent test set are 30.5 and 29.4%, respectively. The significant gain of PINUP over expected random prediction is attributed to (i) effective residue-energy score and accessible-surface-area-dependent interface-propensity, (ii) isolation of functional constraints contained in the conservation score from the structural constraints through the combination of residue-energy score (for structural constraints) and conservation score and (iii) a consensus region built on top-ranked initial patches.
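
The two measures defined in this abstract, coverage and accuracy, correspond to what is often called recall and precision. A minimal sketch of both, following the definitions given in the abstract, is shown below; the function name and the residue numbers are hypothetical and are not PINUP output.

```python
def coverage_and_accuracy(predicted_residues, actual_residues):
    """Return (coverage, accuracy) as defined in the abstract.

    coverage: correctly predicted interface residues / actual interface residues
    accuracy: correctly predicted interface residues / predicted interface residues
    """
    predicted = set(predicted_residues)
    actual = set(actual_residues)
    correct = predicted & actual  # residues that are both predicted and real
    coverage = len(correct) / len(actual) if actual else 0.0
    accuracy = len(correct) / len(predicted) if predicted else 0.0
    return coverage, accuracy


# Hypothetical residue numbers, for illustration only.
predicted = [10, 11, 12, 40]
actual = [11, 12, 13, 14, 15]
cov, acc = coverage_and_accuracy(predicted, actual)
print(f"coverage={cov:.2f}, accuracy={acc:.2f}")
```

Reporting both numbers matters because either one can be inflated alone: predicting every surface residue gives full coverage at low accuracy, while predicting a single confident residue can give full accuracy at very low coverage.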

16.
Computational prediction of RNA‐binding residues is helpful in uncovering the mechanisms underlying protein‐RNA interactions. Traditional algorithms applied either a feature‐based or a template‐based prediction strategy on its own to recognize these crucial residues, which could restrict their predictive power. To improve RNA‐binding residue prediction, herein we propose the first integrative algorithm, termed RBRDetector (RNA‐Binding Residue Detector), which combines these two strategies. We developed a feature‐based approach that is an ensemble learning predictor comprising multiple structure‐based classifiers, in which well‐defined evolutionary and structural features in conjunction with sequential or structural microenvironment were used as the inputs of support vector machines. Meanwhile, we constructed a template‐based predictor to recognize the putative RNA‐binding regions by structurally aligning the query protein to the RNA‐binding proteins with known structures. The final RBRDetector algorithm is an ingenious fusion of our feature‐ and template‐based approaches based on a piecewise function. By validating our predictors with diverse types of structural data, including bound and unbound structures, native and simulated structures, and protein structures binding to different RNA functional groups, we consistently demonstrated that RBRDetector not only had clear advantages over its component methods, but also significantly outperformed the current state‐of‐the‐art algorithms. Nevertheless, the major limitation of our algorithm is that it performed relatively well on DNA‐binding proteins and thus incorrectly predicted the DNA‐binding regions as RNA‐binding interfaces. Finally, we implemented the RBRDetector algorithm as a user‐friendly web server, which is freely accessible at http://ibi.hzau.edu.cn/rbrdetector. Proteins 2014; 82:2455–2471. © 2014 Wiley Periodicals, Inc.

17.
Protein–protein interactions play a key part in most biological processes and understanding their mechanism is a fundamental problem leading to numerous practical applications. The prediction of protein binding sites in particular is of paramount importance since proteins now represent a major class of therapeutic targets. Among other methods, docking simulations between two proteins known to interact can be a useful tool for the prediction of likely binding patches on a protein surface. From the analysis of the protein interfaces generated by a massive cross‐docking experiment using the 168 proteins of the Docking Benchmark 2.0, where all possible protein pairs, and not only experimental ones, have been docked together, we show that it is also possible to predict a protein's binding residues without having any prior knowledge regarding its potential interaction partners. Evaluating the performance of cross‐docking predictions using the area under the specificity‐sensitivity ROC curve (AUC) leads to an AUC value of 0.77 for the complete benchmark (compared to the 0.5 AUC value obtained for random predictions). Furthermore, a new clustering analysis performed on the binding patches that are scattered on the protein surface shows that their distribution and growth depend on the protein's functional group. Finally, in several cases, the binding‐site predictions resulting from the cross‐docking simulations lead to the identification of an alternate interface, which corresponds to the interaction with a biomolecular partner that is not included in the original benchmark. Proteins 2016; 84:1408–1421. © 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.
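
The AUC figures in this abstract (0.77 against 0.5 for random predictions) can be made concrete with a short sketch. It uses the rank-based formulation, in which the AUC equals the probability that a randomly chosen true binding residue scores higher than a randomly chosen non-binding residue. This is a standard equivalent of the area under the ROC curve and is not necessarily how the authors computed it; the function name, scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted binding propensity per residue (higher = more likely).
    labels: 1/True for real interface residues, 0/False otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical per-residue scores and interface labels.
example_scores = [0.9, 0.8, 0.3, 0.1]
example_labels = [1, 0, 1, 0]
print(roc_auc(example_scores, example_labels))
```

The pairwise loop is quadratic in the number of residues, which is acceptable for a single protein; a sort-based version would be preferable for benchmark-scale data.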

18.
Brylinski M, Skolnick J. Proteins 2011, 79(3): 735-751
The rapid accumulation of gene sequences, many of which are hypothetical proteins with unknown function, has stimulated the development of accurate computational tools for protein function prediction with evolution/structure‐based approaches showing considerable promise. In this article, we present FINDSITE‐metal, a new threading‐based method designed specifically to detect metal‐binding sites in modeled protein structures. Comprehensive benchmarks using different quality protein structures show that weakly homologous protein models provide sufficient structural information for quite accurate annotation by FINDSITE‐metal. Combining structure/evolutionary information with machine learning results in highly accurate metal‐binding annotations; for protein models constructed by TASSER, whose average Cα RMSD from the native structure is 8.9 Å, 59.5% (71.9%) of the best of top five predicted metal locations are within 4 Å (8 Å) from a bound metal in the crystal structure. For most of the targets, multiple metal‐binding sites are detected with the best predicted binding site at rank 1 and within the top two ranks in 65.6% and 83.1% of the cases, respectively. Furthermore, for iron, copper, zinc, calcium, and magnesium ions, the binding metal can be predicted with high, typically 70% to 90%, accuracy. FINDSITE‐metal also provides a set of confidence indexes that help assess the reliability of predictions. Finally, we describe the proteome‐wide application of FINDSITE‐metal that quantifies the metal‐binding complement of the human proteome. FINDSITE‐metal is freely available to the academic community at http://cssb.biology.gatech.edu/findsite-metal/. Proteins 2011. © 2010 Wiley‐Liss, Inc.

19.
Yunhui Peng, Emil Alexov. Proteins 2017, 85(2): 282-295
Protein–nucleic acid interactions play a crucial role in many biological processes. This work investigates the changes of pKa values and protonation states of ionizable groups (including nucleic acid bases) that may occur at protein–nucleic acid binding. Taking advantage of the recently developed pKa calculation tool DelphiPka, we utilize the large protein–nucleic acid interaction database (NPIDB database) to model pKa shifts caused by binding. It has been found that the protein's interfacial basic residues experience favorable electrostatic interactions while the protein acidic residues undergo proton uptake to reduce the energy cost upon the binding. This is in contrast with observations made for protein–protein complexes. In terms of DNA/RNA, both base groups and phosphate groups of nucleotides are found to participate in binding. Some DNA/RNA bases undergo pKa shifts at complex formation, with the binding process tending to suppress charged states of nucleic acid bases. In addition, a weak correlation is found between the pH‐optimum of protein–DNA/RNA binding free energy and the pH‐optimum of protein folding free energy. Overall, the pH‐dependence of protein–nucleic acid binding is not predicted to be as significant as that of protein–protein association. Proteins 2017; 85:282–295. © 2016 Wiley Periodicals, Inc.

20.
PI2PE (http://pipe.sc.fsu.edu) is a suite of four web servers for predicting a variety of folding- and binding-related properties of proteins. These include the solvent accessibility of amino acids upon protein folding, the amino acids forming the interfaces of protein–protein and protein–nucleic acid complexes, and the binding rate constants of these complexes. Three of the servers debuted in 2007, and have garnered ~2,500 unique users and finished over 30,000 jobs. The functionalities of these servers are now enhanced, and a new server, for predicting the binding rate constants, has been added. Together, these web servers form a pipeline from protein sequence to tertiary structure, then to quaternary structure, and finally to binding kinetics.
