Similar literature
 20 similar articles found (search time: 15 ms)
1.
2.
A model of the BR96 antibody variable regions is compared to two X-ray structures of a BR96–carbohydrate complex, independently determined after the model was built and analyzed. The comparison illustrates the opportunities and limitations of antibody modeling. Encouraging results were obtained for the prediction of single CDR loop conformations and for the outline of the BR96 antigen binding site. The comparison of CDR loop conformations in the two X-ray structures provides a realistic reference frame for the CDR loop predictions. CDR loop prediction accuracy is lower when not only conformational, but also positional criteria are taken into account.

3.
Structural GenomiX, Inc. (SGX), four New York area institutions, and two University of California schools have formed the New York Structural GenomiX Research Consortium (NYSGXRC), an industrial/academic Research Consortium that exploits individual core competencies to support all aspects of the NIH-NIGMS funded Protein Structure Initiative (PSI), including protein family classification and target selection, generation of protein for biophysical analyses, sample preparation for structural studies, structure determination and analyses, and dissemination of results. At the end of the PSI Pilot Study Phase (PSI-1), the NYSGXRC will be capable of producing 100–200 experimentally determined protein structures annually. All Consortium activities can be scaled to increase production capacity significantly during the Production Phase of the PSI (PSI-2). The Consortium utilizes both centralized and de-centralized production teams with clearly defined deliverables and hand-off procedures that are supported by a web-based target/sample tracking system (SGX Laboratory Information Data Management System, LIMS, and NYSGXRC Internal Consortium Experimental Database, ICE-DB). Consortium management is provided by an Executive Committee, which is composed of the PI and all Co-PIs. Progress to date is tracked on a publicly available Consortium web site (http://www.nysgxrc.org) and all DNA/protein reagents and experimental protocols are distributed freely from the New York City Area institutions. In addition to meeting the requirements of the Pilot Study Phase and preparing for the Production Phase of the PSI, the NYSGXRC aims to develop modular technologies that are transferable to structural biology laboratories in both academe and industry. The NYSGXRC PI and Co-PIs intend the PSI to have a transforming effect on the disciplines of X-ray crystallography and NMR spectroscopy of biological macromolecules. 
Working with other PSI-funded Centers, the NYSGXRC seeks to create the structural biology laboratory of the future. Herein, we present an overview of the organization of the NYSGXRC and describe progress toward development of a high-throughput Gene→Structure platform. An analysis of current and projected consortium metrics reflects progress to date and delineates opportunities for further technology development.

4.
Babor M, Gerzon S, Raveh B, Sobolev V, Edelman M. Proteins 2008, 70(1):208-217
Metal ions are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Current tools for predicting metal-protein interactions are based on proteins crystallized with their metal ions present (holo forms). However, a majority of resolved structures are free of metal ions (apo forms). Moreover, metal binding is a dynamic process, often involving conformational rearrangement of the binding pocket. Thus, effective predictions need to be based on the structure of the apo state. Here, we report an approach that identifies transition metal-binding sites in apo forms with a resulting selectivity >95%. Applying the approach to apo forms in the Protein Data Bank and structural genomics initiative identifies a large number of previously unknown, putative metal-binding sites, and their amino acid residues, in some cases providing a first clue to the function of the protein.

5.
Cai XH, Jaroszewski L, Wooley J, Godzik A. Proteins 2011, 79(8):2389-2402
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogenous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.

6.
Identification of protein biochemical functions based on their three-dimensional structures is now required in the post-genome-sequencing era. Ligand binding is one of the major biochemical functions of proteins, and thus the identification of ligands and their binding sites is the starting point for the function identification. Previously we reported our first trial on structure-based function prediction, based on the similarity searches of molecular surfaces against the functional site database. Here we describe the extension of our first trial by expanding the search database to whole heteroatom binding sites appearing within the Protein Data Bank (PDB) with the new analysis protocol. In addition, we have determined the similarity threshold line, by using 10 structure pairs with solved free and complex structures. Finally, we extensively applied our method to newly determined hypothetical proteins, including some without annotations, and evaluated the performance of our methods.

7.
A method for simultaneous alignment of multiple protein structures   Total citations: 1 (self-citations: 0, citations by others: 1)
Shatsky M, Nussinov R, Wolfson HJ. Proteins 2004, 56(1):143-156
Here, we present MultiProt, a fully automated highly efficient technique to detect multiple structural alignments of protein structures. MultiProt finds the common geometrical cores between input molecules. To date, most methods for multiple alignment start from the pairwise alignment solutions. This may lead to a small overall alignment. In contrast, our method derives multiple alignments from simultaneous superpositions of input molecules. Further, our method does not require that all input molecules participate in the alignment. Actually, it efficiently detects high scoring partial multiple alignments for all possible numbers of molecules in the input. To demonstrate the power of MultiProt, we provide a number of case studies. First, we demonstrate known multiple alignments of protein structures to illustrate the performance of MultiProt. Next, we present various biological applications. These include: (1) a partial alignment of hinge-bent domains; (2) identification of functional groups of G-proteins; (3) analysis of binding sites; and (4) protein-protein interface alignment. Some applications preserve the sequence order of the residues in the alignment, whereas others are order-independent. It is their residue sequence order-independence that allows application of MultiProt to derive multiple alignments of binding sites and of protein-protein interfaces, making MultiProt an extremely useful structural tool.

8.
The ability to predict protein function from structure is becoming increasingly important as the number of structures resolved is growing more rapidly than our capacity to study function. Current methods for predicting protein function are mostly reliant on identifying a similar protein of known function. For proteins that are highly dissimilar or are only similar to proteins also lacking functional annotations, these methods fail. Here, we show that protein function can be predicted as enzymatic or not without resorting to alignments. We describe 1178 high-resolution proteins in a structurally non-redundant subset of the Protein Data Bank using simple features such as secondary-structure content, amino acid propensities, surface properties and ligands. The subset is split into two functional groupings, enzymes and non-enzymes. We use the support vector machine-learning algorithm to develop models that are capable of assigning the protein class. Validation of the method shows that the function can be predicted to an accuracy of 77% using 52 features to describe each protein. An adaptive search of possible subsets of features produces a simplified model based on 36 features that predicts at an accuracy of 80%. We compare the method to sequence-based methods that also avoid calculating alignments and predict a recently released set of unrelated proteins. The most useful features for distinguishing enzymes from non-enzymes are secondary-structure content, amino acid frequencies, number of disulphide bonds and size of the largest cleft. This method is applicable to any structure as it does not require the identification of sequence or structural similarity to a protein of known function.

9.
Minai R, Matsuo Y, Onuki H, Hirota H. Proteins 2008, 72(1):367-381
Many drugs, even ones that are designed to act selectively on a target protein, bind unintended proteins. These unintended bindings can explain side effects or indicate additional mechanisms for a drug's medicinal properties. Structural similarity between binding sites is one of the reasons for binding to multiple targets. We developed a method for the structural alignment of atoms in the solvent-accessible surface of proteins that uses similarities in the local atomic environment, and carried out all-against-all structural comparisons for 48,347 potential ligand-binding regions from a nonredundant protein structure subset (nrPDB, provided by NCBI). The relationships between the similarity of ligand-binding regions and the similarity of the global structures of the proteins containing the binding regions were examined. We found 10,403 known ligand-binding region pairs whose structures were similar despite having different global folds. Of these, we detected 281 region pairs that had similar ligands with similar binding modes. These proteins are good examples of convergent evolution. In addition, we found a significant correlation between Z-score of structural similarity and true positive rate of "active" entries in the PubChem BioAssay database. Moreover, we confirmed the interaction between ibuprofen and a new target, porcine pancreatic elastase, by NMR experiment. Finally, we used this method to predict new drug-target protein interactions. We obtained 540 predictions for 105 drugs (e.g., captopril, lovastatin, flurbiprofen, metyrapone, and salicylic acid), and calculated the binding affinities using AutoDock simulation. The results of these structural comparisons are available at http://www.tsurumi.yokohama-cu.ac.jp/fold/database.html.

10.
Brakoulias A, Jackson RM. Proteins 2004, 56(2):250-260
A method is described for the rapid comparison of protein binding sites using geometric matching to detect similar three-dimensional structure. The geometric matching detects common atomic features through identification of the maximum common sub-graph or clique. These features are not necessarily evident from sequence or from global structural similarity, giving additional insight into molecular recognition not evident from current sequence or structural classification schemes. Here we use the method to produce an all-against-all comparison of phosphate binding sites in a number of different nucleotide phosphate-binding proteins. The similarity search is combined with clustering of similar sites to allow a preliminary structural classification. Clustering by site similarity produces a classification of binding sites for the 476 representative local environments producing ten main clusters representing half of the representative environments. The similarities make sense in terms of both structural and functional classification schemes. The ten main clusters represent a very limited number of unique structural binding motifs for phosphate. These are the structural P-loop, di-nucleotide binding motif [FAD/NAD(P)-binding and Rossmann-like fold] and FAD-binding motif. Similar classification schemes for nucleotide binding proteins have also been arrived at independently by others using different methods.
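
The clique-based matching described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the atom representation (a type label plus x, y, z coordinates), the distance tolerance `tol`, and the function names `correspondence_graph` and `max_clique` are all assumptions made for illustration. Nodes of the graph pair same-type atoms from the two sites, edges join pairs whose inter-atom distances agree, and a plain Bron-Kerbosch search returns one largest clique.

```python
from itertools import combinations
from math import dist


def correspondence_graph(site_a, site_b, tol=1.0):
    """Build a correspondence graph for two sites (hypothetical representation).

    Each atom is a tuple (type, x, y, z). A node (i, j) pairs atom i of
    site_a with atom j of site_b when their types match. Two nodes are
    connected when the distance between their atoms in site_a agrees with
    the distance in site_b to within tol.
    """
    nodes = [(i, j)
             for i, a in enumerate(site_a)
             for j, b in enumerate(site_b)
             if a[0] == b[0]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched only once
        d_a = dist(site_a[i][1:], site_a[k][1:])
        d_b = dist(site_b[j][1:], site_b[l][1:])
        if abs(d_a - d_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Return one maximum clique using the basic Bron-Kerbosch recursion."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)


# Toy phosphate-like sites: the second is a translated copy with one extra atom.
site_a = [("P", 0, 0, 0), ("O", 1.5, 0, 0), ("O", 0, 1.5, 0), ("N", 5, 5, 5)]
site_b = [("O", 10, 1.5, 0), ("P", 10, 0, 0), ("O", 11.5, 0, 0), ("C", 3, 3, 3)]
matches = max_clique(correspondence_graph(site_a, site_b, tol=0.2))
```

Each entry of `matches` is an (index in `site_a`, index in `site_b`) pair. Because the two oxygens are symmetric in this toy case, more than one largest clique exists and the sketch returns whichever it finds first. The exhaustive search is exponential in the worst case, so it is only suitable for small sites.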

11.
Methods for predicting protein function from structure are becoming more important as the rate at which structures are solved increases more rapidly than experimental knowledge. As a result, protein structures now frequently lack functional annotations. The majority of methods for predicting protein function are reliant upon identifying a similar protein and transferring its annotations to the query protein. This method fails when a similar protein cannot be identified, or when any similar proteins identified also lack reliable annotations. Here, we describe a method that can assign function from structure without the use of algorithms reliant upon alignments. Using simple attributes that can be calculated from any crystal structure, such as secondary structure content, amino acid propensities, surface properties and ligands, we describe each enzyme in a non-redundant set. The set is split according to Enzyme Classification (EC) number. We combine the predictions of one-class versus one-class support vector machine models to make overall assignments of EC number to an accuracy of 35% with the top-ranked prediction, rising to 60% accuracy with the top two ranks. In doing so we demonstrate the utility of simple structural attributes in protein function prediction and shed light on the link between structure and function. We apply our methods to predict the function of every currently unclassified protein in the Protein Data Bank.
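
Combining one-class versus one-class models into a ranked assignment, and scoring the top-ranked and top-two predictions, might look like the following sketch. It assumes simple vote counting, which the abstract does not confirm, and the pairwise models here are toy stand-in functions rather than trained support vector machines. The names `rank_classes`, `top_k_accuracy` and the `helix` feature are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rank_classes(features, classes, pairwise_models):
    """Rank classes by the votes they collect from one-versus-one models.

    pairwise_models maps each (class_a, class_b) pair to a callable that
    returns the winning class for the given features.
    """
    votes = Counter({c: 0 for c in classes})
    for pair in combinations(classes, 2):
        votes[pairwise_models[pair](features)] += 1
    # Most votes first; ties are broken by class label for a stable order.
    return sorted(classes, key=lambda c: (-votes[c], c))


def top_k_accuracy(ranked_lists, true_labels, k):
    """Fraction of cases whose true label is among the k top-ranked classes."""
    hits = sum(t in r[:k] for r, t in zip(ranked_lists, true_labels))
    return hits / len(true_labels)


# Toy stand-ins for trained pairwise classifiers (assumptions, not real SVMs).
classes = ["EC1", "EC2", "EC3"]
models = {
    ("EC1", "EC2"): lambda f: "EC1" if f["helix"] > 0.5 else "EC2",
    ("EC1", "EC3"): lambda f: "EC3",
    ("EC2", "EC3"): lambda f: "EC3" if f["helix"] > 0.5 else "EC2",
}
ranked = [rank_classes({"helix": h}, classes, models) for h in (0.6, 0.2)]
```

With true labels `["EC1", "EC2"]`, `top_k_accuracy(ranked, ..., 1)` counts only the top-ranked prediction and `top_k_accuracy(ranked, ..., 2)` also accepts the second rank, mirroring the two accuracy figures quoted in the abstract.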

12.
Structural genomics projects require strategies for rapidly recognizing protein sequences appropriate for routine structure determination. For large proteins, this strategy includes the dissection of proteins into structural domains that form stable native structures. However, protein dissection essentially remains an empirical and often a tedious process. Here, we describe a simple strategy for rapidly identifying structural domains and assessing their structures. This approach combines the computational prediction of sequence regions corresponding to putative domains with an experimental assessment of their structures and stabilities by NMR and biochemical methods. We tested this approach with nine putative domains predicted from a set of 108 Thermus thermophilus HB8 sequences using PASS, a domain prediction program we previously reported. To facilitate the experimental assessment of the domain structures, we developed a generic 6-hour His-tag-based purification protocol, which enables the sample quality evaluation of a putative structural domain in a single day. As a result, we observed that half of the predicted structural domains were indeed natively folded, as judged by their HSQC spectra. Furthermore, two of the natively folded domains were novel, without related sequences classified in the Pfam and SMART databases, which is a significant result with regard to the ability of structural genomics projects to uniformly cover the protein fold space.

13.
Rai BK, Fiser A. Proteins 2006, 63(3):644-661
A major bottleneck in comparative protein structure modeling is the quality of input alignment between the target sequence and the template structure. A number of alignment methods are available, but none of these techniques produce consistently good solutions for all cases. Alignments produced by alternative methods may be superior in certain segments but inferior in others when compared to each other; therefore, an accurate solution often requires an optimal combination of them. To address this problem, we have developed a new approach, Multiple Mapping Method (MMM). The algorithm first identifies the alternatively aligned regions from a set of input alignments. These alternatively aligned segments are scored using a composite scoring function, which determines their fitness within the structural environment of the template. The best scoring regions from a set of alternative segments are combined with the core part of the alignments to produce the final MMM alignment. The algorithm was tested on a dataset of 1400 protein pairs using 11 combinations of two to four alignment methods. In all cases MMM showed statistically significant improvement by reducing alignment errors in the range of 3 to 17%. MMM also compared favorably with two alignment meta-servers. The algorithm is computationally efficient; therefore, it is a suitable tool for genome scale modeling studies.
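
The three steps named in this abstract (find the alternatively aligned regions, score each alternative segment, merge the best segments with the shared core) can be sketched as below. This is an illustration under stated assumptions, not the MMM algorithm itself: alignments are represented as target-to-template index lists, and `score_segment` is a hypothetical stand-in for the composite scoring function, which the abstract does not specify.

```python
def combine_alignments(alignments, score_segment):
    """Merge alternative target-to-template mappings region by region.

    alignments: equal-length lists; entry t is the template residue index
        aligned to target residue t, or None for a gap.
    score_segment(start, segment): returns a number, higher is better.
        It stands in for a structure-based scoring function.
    """
    n = len(alignments[0])
    # A position belongs to the core when every input alignment agrees on it.
    agree = [len({a[t] for a in alignments}) == 1 for t in range(n)]
    merged, t = [], 0
    while t < n:
        if agree[t]:
            merged.append(alignments[0][t])
            t += 1
            continue
        # Extend over the whole run of positions where the inputs differ.
        end = t
        while end < n and not agree[end]:
            end += 1
        candidates = [a[t:end] for a in alignments]
        merged.extend(max(candidates, key=lambda seg: score_segment(t, seg)))
        t = end
    return merged


def fewest_gaps(start, segment):
    """Toy scorer (an assumption): penalize each gap in the segment."""
    return -sum(x is None for x in segment)


aln_1 = [0, 1, 2, 3, 4, 5]
aln_2 = [0, 1, 3, 4, None, 5]
merged = combine_alignments([aln_1, aln_2], fewest_gaps)
```

Positions 0, 1 and 5 form the core here, and positions 2 to 4 are one alternatively aligned region. A real implementation would also need to check that segments taken from different inputs still yield a consistent, order-preserving mapping, which this sketch does not do.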

14.
We introduce a new algorithm, IRECS (Iterative REduction of Conformational Space), for identifying ensembles of most probable side-chain conformations for homology modeling. On the basis of a given rotamer library, IRECS ranks all side-chain rotamers of a protein according to the probability with which each side chain adopts the respective rotamer conformation. This ranking enables the user to select small rotamer sets that are most likely to contain a near-native rotamer for each side chain. IRECS can therefore act as a fast heuristic alternative to the Dead-End-Elimination algorithm (DEE). In contrast to DEE, IRECS allows for the selection of rotamer subsets of arbitrary size, thus being able to define structure ensembles for a protein. We show that the selection of more than one rotamer per side chain is generally meaningful, since the selected rotamers represent the conformational space of flexible side chains. A knowledge-based statistical potential ROTA was constructed for the IRECS algorithm. The potential was optimized to discriminate between side-chain conformations of native and rotameric decoys of protein structures. By restricting the number of rotamers per side chain to one, IRECS can optimize side chains for a single conformation model. The average accuracy of IRECS for the chi1 and chi1+2 dihedral angles amounts to 84.7% and 71.6%, respectively, using a 40° cutoff. When we compared IRECS with SCWRL and SCAP, the performance of IRECS was comparable to that of both methods. IRECS and the ROTA potential are available for download from the URL http://irecs.bioinf.mpi-inf.mpg.de.
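
The chi1 and chi1+2 accuracies quoted above depend on how angular deviation is measured. The sketch below shows one common convention and is an assumption, not the paper's stated protocol: a dihedral counts as correct when it lies within the cutoff of the native value after accounting for the 360° wrap-around, and chi1+2 requires both angles to be correct, evaluated only over residues that have a chi2. The function names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, cutoff=40.0):
    """Return (chi1 accuracy, chi1+2 accuracy) as fractions.

    predicted, native: lists of (chi1, chi2) tuples in degrees, one per
    residue; chi2 is None for side chains without a chi2 angle.
    The chi1+2 value is None when no residue has a chi2 angle.
    """
    chi1_ok = chi12_ok = chi12_total = 0
    for (p1, p2), (n1, n2) in zip(predicted, native):
        ok1 = angle_diff(p1, n1) <= cutoff
        chi1_ok += ok1
        if p2 is not None and n2 is not None:
            chi12_total += 1
            chi12_ok += ok1 and angle_diff(p2, n2) <= cutoff
    chi12 = chi12_ok / chi12_total if chi12_total else None
    return chi1_ok / len(native), chi12


# Toy data (invented for illustration): four residues, one without chi2.
predicted = [(-60, 180), (175, 60), (60, None), (-65, 90)]
native = [(-70, 170), (-170, -60), (-60, None), (-60, 100)]
chi1_acc, chi12_acc = chi_accuracy(predicted, native)
```

The second residue shows why the wrap-around matters: 175° and -170° differ by only 15°, so chi1 is counted as correct even though the raw subtraction gives 345°.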

15.
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes.
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.

16.
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.

17.
Homology modeling is a powerful technique that greatly increases the value of experimental structure determination by using the structural information of one protein to predict the structures of homologous proteins. We have previously described a method of homology modeling by satisfaction of spatial restraints (Li et al., Protein Sci 1997;6:956-970). The Homology Modeling Automatically (HOMA) web site, , is a new tool, using this method to predict 3D structure of a target protein based on the sequence alignment of the target protein to a template protein and the structure coordinates of the template. The user is presented with the resulting models, together with an extensive structure validation report providing critical assessments of the quality of the resulting homology models. The homology modeling method employed by HOMA was assessed and validated using twenty-four groups of homologous proteins. Using HOMA, homology models were generated for 510 proteins, including 264 proteins modeled with correct folds and 246 modeled with incorrect folds. Accuracies of these models were assessed by superimposition on the corresponding experimentally determined structures. A subset of these results was compared with parallel studies of modeling accuracy using several other automated homology modeling approaches. Overall, HOMA provides prediction accuracies similar to other state-of-the-art homology modeling methods. We also provide an evaluation of several structure quality validation tools in assessing the accuracy of homology models generated with HOMA. This study demonstrates that Verify3D (Luthy et al., Nature 1992;356:83-85) and ProsaII (Sippl, Proteins 1993;17:355-362) are most sensitive in distinguishing between homology models with correct or incorrect folds. 
For homology models that have the correct fold, the steric conformational energy (including primarily the Van der Waals energy), MolProbity clashscore (Word et al., Protein Sci 2000;9:2251-2259), and the PROCHECK G-factors (Laskowski et al., J Biomol NMR 1996;8:477-486) provide sensitive and consistent methods for assessing accuracy and can distinguish between homology models of higher and lower accuracy. As demonstrated in the accompanying paper (Bhattacharya et al., accompanying paper), combinations of these scores for models generated with HOMA provide a basis for distinguishing low from high accuracy models.

18.
We present a Model Quality Assessment Program (MQAP), called MQAPsingle, for ranking and assessing the absolute global quality of single protein models. MQAPsingle is a quasi single-model MQAP, a method that combines advantages of both “pure” single-model MQAPs and clustering MQAPs. This approach results in higher accuracy compared to the state-of-the-art single-model MQAPs. Notably, the prediction for a given model is the same regardless of whether this model is submitted to our server alone or together with other models. Proteins 2016; 84:1021–1028. © 2015 Wiley Periodicals, Inc.

19.
Protein function elucidation often relies heavily on amino acid sequence analysis and other bioinformatics approaches. The reliance is extended to structure homology modeling for ligand docking and protein–protein interaction mapping. However, sequence analysis of RPA3313 exposes a large, unannotated class of hypothetical proteins mostly from the Rhizobiales order. In the absence of sequence and structure information, further functional elucidation of this class of proteins has been significantly hindered. A high quality NMR structure of RPA3313 reveals that the protein forms a novel split ββαβ fold with a conserved ligand binding pocket between the first β-strand and the N-terminus of the α-helix. Conserved residue analysis and protein–protein interaction prediction analyses reveal multiple protein binding sites and conserved functional residues. Results of a mass spectrometry proteomic analysis strongly point toward interaction with the ribosome and its subunits. The combined structural and proteomic analyses suggest that RPA3313 by itself or in a larger complex may assist in the transportation of substrates to or from the ribosome for further processing. Proteins 2016; 85:93–102. © 2016 Wiley Periodicals, Inc.

20.
Russell RB, Barton GJ. Proteins 1992, 14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.
