Similar articles
 20 similar articles found
1.
Knowledge-based model building of proteins: concepts and examples.   Total citations: 2, self-citations: 6, citations by others: 2
We describe how to build protein models from structural templates. Methods to identify structural similarities between proteins in cases of significant, moderate to low, or virtually absent sequence similarity are discussed. The detection and evaluation of structural relationships is emphasized as a central aspect of protein modeling, distinct from the more technical aspects of model building. Computational techniques to generate and complement comparative protein models are also reviewed. Two examples, P-selectin and gp39, are presented to illustrate the derivation of protein model structures and their use in experimental studies.

2.
  Total citations: 7, self-citations: 0, citations by others: 7
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
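The accuracy measure quoted in this abstract (the fraction of residues aligned as in the structure-based alignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the benchmark code of the study: the function names `aligned_pairs` and `alignment_accuracy` are hypothetical, each alignment is assumed to be given as two gapped rows of equal length with `-` marking gaps, and accuracy is assumed to be counted over the aligned residue pairs of the reference.

```python
def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return the (i, j) residue index pairs matched by a gapped pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def alignment_accuracy(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)


if __name__ == "__main__":
    reference = ("ACDEF", "ACDEF")   # stand-in for a structure-based alignment
    candidate = ("AC-DEF", "ACD-EF") # one residue pair is shifted out of register
    print(alignment_accuracy(candidate, reference))
```

Counting over reference pairs means that over-aligning residues the reference leaves unpaired is not penalized; a stricter score would also divide by the number of pairs in the test alignment.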

3.
    
A strategy for overexpression in Escherichia coli of the extracellular immunoglobulin domain of human CD8alpha was devised using codon usage alterations in the 5' region of the gene, designed so as to prevent the formation of secondary structures in the mRNA. A fragment of CD8alpha, comprising residues 1-120 of the mature protein, excluding the signal peptide and the membrane-proximal stalk region, was recovered from bacterial inclusion bodies and refolded to produce a single species of homodimeric, soluble receptor. HLA-A2 heavy chain, beta2-microglobulin and a synthetic peptide antigen corresponding to the pol epitope from HIV-1 were also expressed in E. coli, refolded and purified. CD8alpha/HLA-A2 complexes were formed in solution and by co-crystallization with a stoichiometry of one CD8alpha alpha dimer to one HLA-A2-peptide unit.

4.
    
An open question in protein homology modeling is, how well do current modeling packages satisfy the dual criteria of quality of results and practical ease of use? To address this question objectively, we examined homology‐built models of a variety of therapeutically relevant proteins. The sequence identities across these proteins range from 19% to 76%. A novel metric, the difference alignment index (DAI), is developed to aid in quantifying the quality of local sequence alignments. The DAI is also used to construct the relative sequence alignment (RSA), a new representation of global sequence alignment that facilitates comparison of sequence alignments from different methods. Comparisons of the sequence alignments in terms of the RSA and alignment methodologies are made to better understand the advantages and caveats of each method. All sequence alignments and corresponding 3D models are compared to their respective structure‐based alignments and crystal structures. A variety of protein modeling software was used. We find that at sequence identities >40%, all packages give similar (and satisfactory) results; at lower sequence identities (<25%), the sequence alignments generated by Profit and Prime, which incorporate structural information in their sequence alignment, stand out from the rest. Moreover, the model generated by Prime in this low sequence identity region is noted to be superior to the rest. Additionally, we note that DSModeler and MOE, which generate reasonable models for sequence identities >25%, are significantly more functional and easier to use when compared with the other structure‐building software.

5.
The CD28 and CTLA-4 (CD152) receptors on T cells recognize CD80 and CD86 ligands on antigen presenting cells. These interactions provide and control costimulatory signals required for effective T cell activation. CD28 and CTLA-4 belong to the immunoglobulin superfamily (IgSF) and contain a single extracellular ligand binding domain. The three-dimensional (3D) structure of the binding domain of CTLA-4 was modeled previously using a combination of structure-based sequence comparison, IgSF consensus residue analysis, conformational search, and inverse folding calculations. Recently, the 3D structure of CTLA-4 was determined by NMR. Comparison of the modeled and experimentally determined CTLA-4 structure has made it possible to assess the accuracy of our predictions. We found that the overall accuracy of the model was sound and sufficient for a meaningful application of the model in experimental studies. Major errors in the model are limited to the conformation and position of some loops. Our studies on CTLA-4 provide an example for the opportunities and limitations of comparative protein modeling in the presence of low sequence similarity. Electronic Supplementary Material available.

6.
    
In order to study structural aspects of sequence conservation in families of homologous proteins, we have analyzed structurally aligned sequences of 585 proteins grouped into 128 homologous families. The conservation of a residue in a family is defined as the average residue similarity in a given position of aligned sequences. The residue similarities were expressed in the form of log-odd substitution tables that take into account the environments of amino acids in three-dimensional structures. The protein core is defined as those residues that have less than 7% solvent accessibility. The density of a protein core is described in terms of atom packing, which is investigated as a criterion for residue substitution and conservation. Although there is no significant correlation between sequence conservation and average atom packing around nonpolar residues such as leucine, valine and isoleucine, a significant correlation is observed for polar residues in the protein core. This may be explained by the hydrogen bonds in which polar residues are involved; the better their protection from water access the more stable should be the structure in that position. Proteins 33:358–366, 1998. © 1998 Wiley-Liss, Inc.
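The two definitions in this abstract (conservation as average residue similarity at an aligned position, and the core as residues below 7% solvent accessibility) can be illustrated with a short sketch. Several things here are assumptions rather than the authors' procedure: the average is taken over all residue pairs in a column, gaps are ignored, and `identity_similarity` is a hypothetical stand-in for the environment-dependent substitution tables, which the abstract does not reproduce.

```python
from itertools import combinations
from typing import Callable, Sequence


def identity_similarity(a: str, b: str) -> float:
    """Toy similarity score (assumption): 1 for identical residues, else 0."""
    return 1.0 if a == b else 0.0


def column_conservation(column: str, similarity: Callable[[str, str], float]) -> float:
    """Average pairwise similarity of the non-gap residues in one alignment column."""
    residues = [r for r in column if r != "-"]
    pairs = list(combinations(residues, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def core_positions(accessibility: Sequence[float], cutoff: float = 7.0) -> list[int]:
    """Indices of residues whose relative solvent accessibility (%) is below the cutoff."""
    return [i for i, acc in enumerate(accessibility) if acc < cutoff]


if __name__ == "__main__":
    print(column_conservation("LLLV", identity_similarity))
    print(core_positions([2.0, 7.0, 35.5, 0.0]))
```

Passing the similarity function as a parameter keeps the sketch independent of any particular substitution table.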

7.
Testis-specific protein, Y-encoded (TSPY) binds through its SET/NAP domain to eukaryotic translation elongation factor 1 alpha (eEF1A), which is essential for elongation during protein synthesis and is implicated in normal spermatogenesis. eEF1A exists in two forms, eEF1A1 (alpha 1) and eEF1A2 (alpha 2), encoded by separate loci. Despite the critical interplay between the TSPY and eEF1A proteins, the literature has been silent on the residues that play significant roles in these interactions. We deduced 3D structures of TSPY and the eEF1A variants by comparative modeling (Modeller 9.13) and assessed protein–protein interactions using HADDOCK docking. Pairwise alignment of the eEF1A1 and eEF1A2 proteins using EMBOSS Needle revealed a high degree (~92%) of homology. TSPY bound eEF1A2 more efficiently than eEF1A1, despite significant structural similarity between the two variants. We also detected strong interactions of TSPY with domain III, followed by domains II and I, of both eEF1A variants. In the process, seven interacting residues of TSPY’s NAP domain, namely Asp 175, Glu 176, Asp 179, Tyr 183, Asp 240, Glu 244, and Tyr 246, common to both eEF1A variants, were detected. Additionally, six lysine residues observed in eEF1A2 suggest a possible role in the formation of the TSPY–eEF1A2 complex, which is essential for germ cell development and spermatogenesis. Thus, the more efficient binding of TSPY to eEF1A2 than to eEF1A1 established the autonomous functioning of these two variants. Studies of mutated proteins following a similar approach could uncover what obstructs the interaction between the partners, leading to a deeper understanding of the structure–function relationship.

8.
  Total citations: 2, self-citations: 0, citations by others: 2
We developed a variant of the intermediate sequence search method (ISS(new)) for detection and alignment of weakly similar pairs of protein sequences. ISS(new) relates two query sequences by an intermediate sequence that is potentially homologous to both queries. The improvement was achieved by a more robust overlap score for a match between the queries through an intermediate. The approach was benchmarked on a data set of 2369 sequences of known structure with insignificant sequence similarity to each other (BLAST E-value larger than 0.001); 2050 of these sequences had a related structure in the set. ISS(new) performed significantly better than both PSI-BLAST and a previously described intermediate sequence search method. PSI-BLAST could not detect correct homologs for 1619 of the 2369 sequences. In contrast, ISS(new) assigned a correct homolog as the top hit for 121 of these 1619 sequences, while incorrectly assigning homologs for only nine targets; it did not assign homologs for the remainder of the sequences. By estimate, ISS(new) may be able to assign the folds of domains in approximately 29,000 of the approximately 500,000 sequences unassigned by PSI-BLAST, with 90% specificity (1 - false positives fraction). In addition, we show that the 15 alignments with the most significant BLAST E-values include the nearly best alignments constructed by ISS(new).

9.
    
The emerging field of proteomics has created a need for new high-throughput methodologies for the analysis of gene products. An attractive approach is to develop systems that allow for clonal selection of interacting protein pairs from large molecular libraries. In this study, we have characterized a novel approach for identification and selection of protein-protein interactions, denoted SPIRE (selection of protein interactions by receptor engagement), which is based on a mammalian expression system. We have demonstrated proof of concept by creating a general plasma membrane bound decoy receptor, by displaying a protein or a peptide genetically fused to a truncated version of the CD40 molecule. When this decoy receptor is engaged by a ligand to the displayed protein/peptide, the receptor expressing cell is rescued from apoptosis. To design a high-throughput system with a highly parallel capacity, we utilized the B cell line WEHI-231, as carrier of the decoy receptor. One specific peptide-displaying cell could be identified and amplified, based on a specific receptor engagement, in a background of 12 500 wild-type cells after four selections. This demonstrates that the approach may serve as a tool in post-genomic research for identifying protein-protein interactions, without prior knowledge of either component.

10.
    
Cozzetto D, Tramontano A. Proteins 2005, 58(1):151-157
Comparative modeling is the method of choice, whenever applicable, for protein structure prediction, not only because of its higher accuracy compared to alternative methods, but also because it is possible to estimate a priori the quality of the models that it can produce, thereby allowing the usefulness of a model for a given application to be assessed beforehand. By and large, the quality of a comparative model depends on two factors: the extent of structural divergence between the target and the template and the quality of the sequence alignment between the two protein sequences. The latter is usually derived from a multiple sequence alignment (MSA) of as many proteins of the family as possible, and its accuracy depends on the number and similarity distribution of the sequences of the protein family. Here we describe a method to evaluate the expected difficulty, and by extension accuracy, of a comparative model on the basis of the MSA used to build it. The parameter that we derive is used to compare the results obtained in the last two editions of the Critical Assessment of Methods for Structure Prediction (CASP) experiment as a function of the difficulty of the modeling exercise. Our analysis demonstrates that the improvement in the scope and quality of comparative models between the two experiments is largely due to the increased number of available protein sequences and to the consequent increased chance that a large and appropriately spaced set of protein sequences homologous to the proteins of interest is available.

11.
    
Pathogens have evolved numerous strategies to infect their hosts, while hosts have evolved immune responses and other defenses to these foreign challenges. The vast majority of host-pathogen interactions involve protein-protein recognition, yet our current understanding of these interactions is limited. Here, we present and apply a computational whole-genome protocol that generates testable predictions of host-pathogen protein interactions. The protocol first scans the host and pathogen genomes for proteins with similarity to known protein complexes, then assesses these putative interactions, using structure if available, and, finally, filters the remaining interactions using biological context, such as the stage-specific expression of pathogen proteins and tissue expression of host proteins. The technique was applied to 10 pathogens, including species of Mycobacterium, apicomplexa, and kinetoplastida, responsible for "neglected" human diseases. The method was assessed by (1) comparison to a set of known host-pathogen interactions, (2) comparison to gene expression and essentiality data describing host and pathogen genes involved in infection, and (3) analysis of the functional properties of the human proteins predicted to interact with pathogen proteins, demonstrating an enrichment for functionally relevant host-pathogen interactions. We present several specific predictions that warrant experimental follow-up, including interactions from previously characterized mechanisms, such as cytoadhesion and protease inhibition, as well as suspected interactions in hypothesized networks, such as apoptotic pathways. Our computational method provides a means to mine whole-genome data and is complementary to experimental efforts in elucidating networks of host-pathogen protein interactions.

12.
We evaluate 3D models of human nucleoside diphosphate kinase, mouse cellular retinoic acid binding protein I, and human eosinophil neurotoxin that were calculated by MODELLER, a program for comparative protein modeling by satisfaction of spatial restraints. The models have good stereochemistry and are at least as similar to the crystallographic structures as the closest template structures. The largest errors occur in the regions that were not aligned correctly or where the template structures are not similar to the correct structure. These regions correspond predominantly to exposed loops, insertions of any length, and non-conserved side chains. When a template structure with more than 40% sequence identity to the target protein is available, the model is likely to have about 90% of the mainchain atoms modeled with an rms deviation from the X-ray structure of ≈ 1 Å, in large part because the templates are likely to be that similar to the X-ray structure of the target. This rms deviation is comparable to the overall differences between refined NMR and X-ray crystallography structures of the same protein. © 1995 Wiley-Liss, Inc.

13.
The signal recognition particle (SRP) controls the transport of secretory proteins into and across lipid bilayers. SRP-like ribonucleoprotein complexes exist in all organisms, including plants. We characterized the rice SRP RNA and its primary RNA binding protein, SRP19. The secondary structure of the rice SRP RNA was similar to that found in other eukaryotes; however, as in other plant SRP RNAs, a GUUUCA hexamer sequence replaced the highly conserved GNRA-tetranucleotide loop motif at the apex of helix 8. The small domain of the rice SRP RNA was reduced considerably. Structurally, rice SRP19 lacked two small regions that can be present in other SRP19 homologues. Conservative structure prediction and site-directed mutagenesis of rice and human SRP19 polypeptides indicated that binding to the SRP RNAs occurred via a loop that is present in the N-domain of both proteins. Rice SRP19 protein was able to form a stable complex with the rice SRP RNA in vitro. Furthermore, heterologous ribonucleoprotein complexes with components of the human SRP were assembled, thus confirming a high degree of structural and functional conservation between plant and mammalian SRP components.

14.
  Total citations: 8, self-citations: 1, citations by others: 8
CD40 Ligand (CD40L) is transiently expressed on the surface of T-cells and binds to CD40, which is expressed on the surface of B-cells. This binding event leads to the differentiation, proliferation, and isotype switching of the B-cells. The physiological importance of CD40L has been demonstrated by the fact that expression of defective CD40L protein causes an immunodeficiency state characterized by high IgM and low IgG serum levels, indicating faulty T-cell dependent B-cell activation. To understand the structural basis for CD40L/CD40 association, we have used a combination of molecular modeling, mutagenesis, and X-ray crystallography. The structure of the extracellular region of CD40L was determined by protein crystallography, while the CD40 receptor was built using homology modeling based upon a novel alignment of the TNF receptor superfamily, and using the X-ray structure of the TNF receptor as a template. The model shows that the interface of the complex is composed of charged residues, with CD40L presenting basic side chains (K143, R203, R207), and CD40 presenting acidic side chains (D84, E114, E117). These residues were studied experimentally through site-directed mutagenesis, and also theoretically using electrostatic calculations with the program Delphi. The mutagenesis data explored the role of the charged residues in both CD40L and CD40 by switching to Ala (K143A, R203A, R207A of CD40L, and E74A, D84A, E114A, E117A of CD40), charge reversal (K143E, R203E, R207E of CD40L, and D84R, E114R, E117R of CD40), mutation to a polar residue (K143N, R207N, R207Q of CD40L, and D84N, E117N of CD40), and for the basic side chains in CD40L, isosteric substitution to a hydrophobic side chain (R203M, R207M). All the charge-reversal mutants and the majority of the Met and Ala substitutions led to loss of binding, suggesting that charged interactions stabilize the complex. 
This was supported by the Delphi calculations which confirmed that the CD40/CD40L residue pairs E74-R203, D84-R207, and E117-R207 had a net stabilizing effect on the complex. However, the substitution of hydrophilic side chains at several of the positions was tolerated, which suggests that although charged interactions stabilize the complex, charge per se is not crucial at all positions. Finally, we compared the electrostatic surface of TNF/TNFR with CD40L/CD40 and have identified a set of polar interactions surrounded by a wall of hydrophobic residues that appear to be similar but inverted between the two complexes.

15.
  Total citations: 27, self-citations: 0, citations by others: 27
Comparative protein structure prediction is limited mostly by the errors in alignment and loop modeling. We describe here a new automated modeling technique that significantly improves the accuracy of loop predictions in protein structures. The positions of all nonhydrogen atoms of the loop are optimized in a fixed environment with respect to a pseudo energy function. The energy is a sum of many spatial restraints that include the bond length, bond angle, and improper dihedral angle terms from the CHARMM-22 force field, statistical preferences for the main-chain and side-chain dihedral angles, and statistical preferences for nonbonded atomic contacts that depend on the two atom types, their distance through space, and separation in sequence. The energy function is optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the predicted loop conformation corresponds to the lowest energy conformation among 500 independent optimizations. Predictions were made for 40 loops of known structure at each length from 1 to 14 residues. The accuracy of loop predictions is evaluated as a function of thoroughness of conformational sampling, loop length, and structural properties of native loops. When accuracy is measured by local superposition of the model on the native loop, 100, 90, and 30% of 4-, 8-, and 12-residue loop predictions, respectively, had <2 Å RMSD error for the mainchain N, Cα, C, and O atoms; the average accuracies were 0.59 ± 0.05, 1.16 ± 0.10, and 2.61 ± 0.16 Å, respectively. To simulate real comparative modeling problems, the method was also evaluated by predicting loops of known structure in only approximately correct environments with errors typical of comparative modeling without misalignment. When the RMSD distortion of the main-chain stem atoms is 2.5 Å, the average loop prediction error increased by 180, 25, and 3% for 4-, 8-, and 12-residue loops, respectively. 
The accuracy of the lowest energy prediction for a given loop can be estimated from the structural variability among a number of low energy predictions. The relative value of the present method is gauged by (1) comparing it with one of the most successful previously described methods, and (2) describing its accuracy in recent blind predictions of protein structure. Finally, it is shown that the average accuracy of prediction is limited primarily by the accuracy of the energy function rather than by the extent of conformational sampling.
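The RMSD figures reported in this abstract rest on a simple formula, sketched below for readers unfamiliar with it. This is an illustration only, not the evaluation code of the study: it assumes the model and native mainchain atoms are already superposed and listed in the same order (the superposition step itself is omitted), and the names `rmsd` and `fraction_below` are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def rmsd(model: Sequence[Point], native: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally ordered, pre-superposed atom sets."""
    if len(model) != len(native) or not model:
        raise ValueError("atom lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


def fraction_below(errors: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of predictions whose RMSD error is below the cutoff (in Å)."""
    return sum(e < cutoff for e in errors) / len(errors)


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(rmsd(model, native))
    print(fraction_below([0.5, 1.9, 2.0, 3.1]))
```

A full evaluation would first fit the model loop onto the native loop (for example with a least-squares superposition) before applying `rmsd`.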

16.
    
Protein structure prediction by comparative modeling benefits greatly from the use of multiple sequence alignment information to improve the accuracy of structural template identification and the alignment of target sequences to structural templates. Unfortunately, this benefit is limited to those protein sequences for which at least several natural sequence homologues exist. We show here that the use of large diverse alignments of computationally designed protein sequences confers many of the same benefits as natural sequences in identifying structural templates for comparative modeling targets. A large-scale massively parallelized application of an all-atom protein design algorithm, including a simple model of peptide backbone flexibility, has allowed us to generate 500 diverse, non-native, high-quality sequences for each of 264 protein structures in our test set. PSI-BLAST searches using the sequence profiles generated from the designed sequences ("reverse" BLAST searches) give near-perfect accuracy in identifying true structural homologues of the parent structure, with 54% coverage. In 41 of 49 genomes scanned using reverse BLAST searches, at least one novel structural template (not found by the standard method of PSI-BLAST against PDB) is identified. Further improvements in coverage, through optimizing the scoring function used to design sequences and continued application to new protein structures beyond the test set, will allow this method to mature into a useful strategy for identifying distantly related structural templates.

17.
    
Dolan MA, Keil M, Baker DS. Proteins 2008, 72(4):1243-1258
Although the number of known protein structures is increasing, the number of protein sequences without determined structures is still much larger. Three-dimensional (3D) protein structure information helps in the understanding of functional mechanisms, but solving structures by X-ray crystallography or NMR is often a lengthy and difficult process. A relatively fast way of determining a protein's 3D structure is to construct a computer model using homologous sequence and structure information. Much work has gone into algorithms that comprise the ORCHESTRAR homology modeling program in the SYBYL software package. This novel homology modeling tool combines algorithms for modeling conserved cores, variable regions, and side chains. The paradigm of using existing knowledge from multiple templates and the underlying protein environment knowledgebase is used in all of these algorithms, and will become even more powerful as the number of experimentally derived protein structures increases. To determine how ORCHESTRAR compares to Composer (a broadly used but older tool), homology models of 18 proteins were constructed using each program so that a detailed comparison of each step in the modeling process could be carried out. Proteins modeled include kinases, dihydrofolate reductase, HIV protease, and factor Xa. In almost all cases ORCHESTRAR produces models with lower root-mean-squared deviation (RMSD) values when compared with structures determined by X-ray crystallography or NMR. Moreover, ORCHESTRAR produced a homology model for three target sequences where Composer failed to produce any. Data for RMSD comparisons between structurally conserved cores, structurally variable regions, and side-chain conformations are presented, as well as analyses of active site and protein-protein interface configurations.

18.
    
We show that long- and short-range interactions in almost all protein native structures are actually consistent with each other for coarse-grained energy scales; specifically we mean the long-range inter-residue contact energies and the short-range secondary structure energies based on peptide dihedral angles, which are potentials of mean force evaluated from residue distributions observed in protein native structures. This consistency is observed at equilibrium in sequence space rather than in conformational space. Statistical ensembles of sequences are generated by exchanging residues for each of 797 protein native structures with the Metropolis method. It is shown that adding the other category of interaction to either the short- or long-range interactions decreases the means and variances of those energies for essentially all protein native structures, indicating that both interactions consistently work by more-or-less restricting sequence spaces available to one of the interactions. In addition to this consistency, independence by these interaction classes is also indicated by the fact that there are almost no correlations between them when equilibrated using both interactions and significant but small, positive correlations at equilibrium using only one of the interactions. Evidence is provided that protein native sequences can be regarded approximately as samples from the statistical ensembles of sequences with these energy scales and that all proteins have the same effective conformational temperature. Designing protein structures and sequences to be consistent and minimally frustrated among the various interactions is a most effective way to increase protein stability and foldability.
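The sampling idea named in this abstract (generating sequence ensembles for a fixed structure by exchanging residues with the Metropolis method) can be sketched in a few lines. Everything specific below is an assumption made for illustration: residues are swapped between two positions so that composition is preserved, `toy_energy` is a hypothetical burial-based score rather than the contact and secondary-structure potentials of the study, and the temperature is in arbitrary units.

```python
import math
import random
from typing import Callable, Sequence

BURIED = [True, False, True, False, True, False]  # hypothetical burial pattern of a fixed structure
HYDROPHOBIC = set("LIVFM")


def toy_energy(sequence: Sequence[str]) -> float:
    """Toy energy (assumption): -1 for every hydrophobic residue at a buried position."""
    return -sum(1.0 for res, buried in zip(sequence, BURIED) if buried and res in HYDROPHOBIC)


def metropolis_sequence_sampling(
    sequence: str,
    energy: Callable[[Sequence[str]], float],
    temperature: float,
    steps: int,
    rng: random.Random,
) -> list[tuple[str, float]]:
    """Sample sequences on a fixed structure by residue exchange with Metropolis acceptance."""
    current = list(sequence)
    e_current = energy(current)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a residue exchange
        delta = energy(current) - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            e_current += delta  # accept the move
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the exchange
        samples.append(("".join(current), e_current))
    return samples


if __name__ == "__main__":
    run = metropolis_sequence_sampling("KLKLKL", toy_energy, 0.5, 200, random.Random(0))
    print(min(e for _, e in run))
```

Means and variances of the sampled energies could then be compared between runs that use different energy terms, which is the kind of comparison the abstract describes.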

19.
    
Structural characterization of protein–protein interactions is essential for our ability to understand life processes. However, only a fraction of known proteins have experimentally determined structures. Such structures provide templates for modeling of a large part of the proteome, where individual proteins can be docked by template‐free or template‐based techniques. Still, the sensitivity of the docking methods to the inherent inaccuracies of protein models, as opposed to the experimentally determined high‐resolution structures, remains largely untested, primarily due to the absence of appropriate benchmark set(s). Structures in such a set should have predefined inaccuracy levels and, at the same time, resemble actual protein models in terms of structural motifs/packing. The set should also be large enough to ensure statistical reliability of the benchmarking results. We present a major update of the previously developed benchmark set of protein models. For each interactor, six models were generated with the model‐to‐native Cα RMSD in the 1 to 6 Å range. The models in the set were generated by a new approach, which corresponds to the actual modeling of new protein structures in the “real case scenario,” as opposed to the previous set, where a significant number of structures were model‐like only. In addition, the larger number of complexes (165 vs. 63 in the previous set) increases the statistical reliability of the benchmarking. We estimated the highest accuracy of the predicted complexes (according to CAPRI criteria), which can be attained using the benchmark structures. The set is available at http://dockground.bioinformatics.ku.edu . Proteins 2015; 83:891–897. © 2015 Wiley Periodicals, Inc.

20.
In the present study, a novel structural motif of proteins referred to as the phi-motif is considered, and two novel structural trees in which the phi-motif is taken as the root structure have been constructed. The simplest phi-motif is formed by three adjacent beta-strands connected by loops and packed in one beta-sheet so that its overall fold resembles the Greek letter phi. Construction of the structural trees and modeling of folding pathways have shown that all structures of the protein superfamilies can be obtained by stepwise addition of alpha-helices and/or beta-strands to the root phi-motif taking into account a restricted set of rules inferred from known principles of protein structure. The structural trees are a good tool for structure comparison, structural classification of proteins, as well as for searching for all possible protein folds and folding pathways.
