Similar literature
 20 similar documents found (search time: 828 ms)
1.
The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detecting domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within ±20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two-domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity.
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. Used as a first-line predictor, Armadillo could also improve the results of domain meta-predictors.
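
The profile-and-Z-score idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Armadillo implementation: the index values in `TOY_INDEX`, the window size, the Z-score cutoff, and all function names are hypothetical choices made only for illustration, and the published DLI values are not reproduced here.

```python
from statistics import mean, pstdev

# Hypothetical per-residue index (NOT the published DLI); a higher value
# means "more linker-like". Residues missing from the table score 0.0.
TOY_INDEX = {"P": 1.0, "G": 0.8}


def smoothed_profile(seq, index, window=5):
    """Map each residue to its index value, then average over a centred window."""
    raw = [index.get(aa, 0.0) for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo = max(0, i - half)              # the window is truncated at the ends
        hi = min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))
    return profile


def z_scores(profile):
    """Standardise the profile against its own mean and standard deviation."""
    mu = mean(profile)
    sd = pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sd for x in profile]


def candidate_linker_positions(seq, index, window=5, z_cut=1.5):
    """Return 0-based positions whose smoothed score has a Z-score >= z_cut."""
    z = z_scores(smoothed_profile(seq, index, window))
    return [i for i, value in enumerate(z) if value >= z_cut]
```

Calling `candidate_linker_positions("AAAAAAAAAAPGPGPAAAAAAAAAA", TOY_INDEX)` flags the centre of the Pro/Gly-rich stretch. A real predictor would still have to turn such peaks into linker boundaries and calibrate the cutoff on a benchmark set.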

2.
A significant proportion of proteins comprise multiple domains. Domain–domain docking is a tool that predicts multi-domain protein structures when individual domain structures can be accurately predicted but domain orientations cannot. GalaxyDomDock predicts an ensemble of domain orientations from given domain structures by docking. Such information would also be beneficial in elucidating the functions of proteins that have multiple states with different domain orientations. GalaxyDomDock is an ab initio domain–domain docking method based on GalaxyTongDock, a previously developed protein–protein docking method. Infeasible domain orientations for the given linker are effectively screened out from the docked conformations by a geometric filter, using the Dijkstra algorithm. In addition, domain linker conformations are predicted by adopting the loop sampling method FALC. The proposed GalaxyDomDock outperformed existing ab initio domain–domain docking methods, such as AIDA and Rosetta, in performance tests on the Rosetta benchmark set of two-domain proteins. GalaxyDomDock also performed better than or comparable to AIDA on the AIDA benchmark set of two-domain proteins and two-domain proteins containing discontinuous domains, including the benchmark set in which each domain of the set was modeled by the recent version of AlphaFold. The GalaxyDomDock web server is freely available as a part of GalaxyWEB at http://galaxy.seoklab.org/domdock.

3.
Crystal Structure of a Full-Length Autotransporter   (total citations: 1; self-citations: 0; citations by others: 1)
The autotransporter (AT) secretion mechanism is the most common mechanism for the secretion of virulence factors across the outer membrane (OM) from pathogenic Gram-negative bacteria. In addition, ATs have attracted biotechnological and biomedical interest for protein display on bacterial cell surfaces. Despite their importance, the mechanism by which passenger domains of ATs pass the OM is still unclear. The classical view is that the β-barrel domain provides the conduit through which the unfolded passenger moves, with the energy provided by vectorial folding of the β-strand-rich passenger on the extracellular side of the OM. We present here the first structure of a full-length AT, the esterase EstA from Pseudomonas aeruginosa, at a resolution of 2.5 Å. EstA has a relatively narrow, 12-stranded β-barrel that is covalently attached to the passenger domain via a long, curved helix that occupies the lumen of the β-barrel. The passenger has a structure that is dramatically different from that of other known passengers, with a globular fold that is dominated by α-helices and loops. The arrangement of secondary-structure elements suggests that the passenger can fold sequentially, providing the driving force for passenger translocation. The esterase active-site residues are located at the apical surface of the passenger, at the entrance of a large hydrophobic pocket that contains a bound detergent molecule that likely mimics substrate. The EstA structure provides insight into AT mechanism and will facilitate the design of fusion proteins for cell surface display.

4.
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.

5.
Studies on members of protein families with similar structures but divergent sequences provide insights into the effects of sequence composition on the mechanism of folding. Members of the peripheral subunit-binding domain (PSBD) family fold ultrafast and approach the smallest size for cooperatively folding proteins. Φ-Value analysis of the PSBDs E3BD and POB reveals folding via nucleation-condensation through structurally very similar, polarized transition states. Here, we present a Φ-value analysis of the family member BBL and find that it also folds by a nucleation-condensation mechanism. The mean Φ values of BBL, E3BD, and POB were nearly identical, indicating similar fractions of non-covalent interactions being formed in the transition state. Despite the overall conservation of folding mechanism in this protein family, however, the pattern of Φ values determined for BBL revealed a larger dispersion of the folding nucleus across the entire structure, and the transition state was less polarized. The observed plasticity of transition-state structure can be rationalized by the different helix-forming propensities of PSBD sequences. The very strong helix propensity in the first helix of BBL, relative to E3BD and POB, appears to recruit more structure formation in that helix in the transition state at the expense of weaker interactions in the second helix. Differences in sequence composition can modulate transition-state structure of even the smallest natural protein domains.

6.
Protein-engineering methods (Φ-values) were used to investigate the folding transition state of a lysin motif (LysM) domain from Escherichia coli membrane-bound lytic murein transglycosylase D. This domain consists of just 48 structured residues in a symmetrical βααβ arrangement and is the smallest αβ protein yet investigated using these methods. An extensive mutational analysis revealed a highly robust folding pathway with no detectable transition state plasticity, indicating that LysM is an example of an ideal two-state folder. The pattern of Φ-values denotes a highly polarised transition state, with significant formation of the helices but no structure within the β-sheet. Remarkably, this transition state remains polarised after circularisation of the domain, and exhibits an identical Φ-value pattern; however, the interactions within the transition state are uniformly weaker in the circular variant. This observation is supported by results from an Eyring analysis of the folding rates of the two proteins. We propose that the folding pathway of LysM is dominated by enthalpic rather than entropic considerations, and suggest that the lower entropy cost of formation of the circular transition state is balanced, to some extent, by the lower enthalpy of contacts within this structure.

7.
Antibodies are modular proteins consisting of domains that exhibit a β-sandwich structure, the so-called immunoglobulin fold. Despite structural similarity, differences in folding and stability exist between different domains. In particular, the variable domain of the light chain VL is unusual as it is associated with misfolding diseases, including the pathologic assembly of the protein into fibrillar structures. Here, we have analysed the folding pathway of a VL domain with a view to determining features that may influence the relationship between productive folding and fibril formation. The VL domain from MAK33 (murine monoclonal antibody of the subtype κ/IgG1) has not previously been associated with fibrillisation but is shown here to be capable of forming fibrils. The folding pathway of this VL domain is complex, involving two intermediates in different pathways. An obligatory early molten globule-like intermediate with secondary structure but only loose tertiary interactions is inferred. The native state can then be formed directly from this intermediate in a phase that can be accelerated by the addition of prolyl isomerases. However, an alternative pathway involving a second, more native-like intermediate is also significantly populated. Thus, the protein can reach the native state via two distinct folding pathways. Comparisons to the folding pathways of other antibody domains reveal similarities in the folding pathways; however, in detail, the folding of the VL domain is striking, with two intermediates populated on different branches of the folding pathway, one of which could provide an entry point for molecules diverted into the amyloid pathway.

8.
The human glucocorticoid receptor ligand-binding domain (hGR-LBD) is an important drug target for the treatment of various diseases. However, the low intrinsic stability and solubility of hGR-LBD have rendered its purification and biophysical characterization difficult. In order to overcome these problems, we have stabilized hGR-LBD by a combination of random mutagenesis and high-throughput screening using fluorescence-activated cell sorting (FACS) with enhanced green fluorescent protein (eGFP) as folding reporter. Two plasmid-encoded gene libraries of hGR-LBD fused to the egfp gene were expressed in Escherichia coli, followed by eight rounds of FACS screening, in each of which 10⁸ cells were analyzed. The hgr-lbd mutants isolated by this approach contained numerous amino acid exchanges, and four beneficial ones (A605V, V702A, E705G, and M752T) were followed up in detail. Their characterization showed that the fluorescence of hGR-LBD-eGFP fusions is correlated linearly with the stability and solubility of hGR-LBD in the absence of eGFP. When combined, the four exchanges increased the thermal stability of hGR-LBD by more than 8 °C and enhanced its purification yield after expression in E. coli by about 26-fold. The introduction of three beneficial exchanges into the homologous ligand-binding domain of mouse enabled its X-ray structure determination at high resolution, which showed how the exchanges stabilize the protein and revealed atomic details that will guide future drug design. Our results demonstrate that large eGFP fusion libraries can be screened by FACS with extreme sensitivity and efficiency, yielding stabilized eukaryotic proteins suitable for biophysical characterization and structure determination.

9.
The SlyD (sensitive to lysis D) protein of Escherichia coli is a folding enzyme with a chaperone domain and a prolyl isomerase domain of the FK506 binding protein type. Here we investigated how the two domains and their interplay are optimized for function in protein folding. Unfolded protein molecules initially form a highly dynamic complex with the chaperone domain of SlyD, and they are then transferred to the prolyl isomerase domain. The turnover number of the prolyl isomerase site is very high and guarantees that, after transfer, prolyl peptide bonds in substrate proteins are isomerized very rapidly. The Michaelis constant of catalyzed folding reflects the substrate affinity of the chaperone domain, and the turnover number is presumably determined by the rate of productive substrate transfer from the chaperone to the prolyl isomerase site and by the intrinsic propensity of the refolding protein chain to leave the active site with the native prolyl isomer. The efficiency of substrate transfer is high because dissociation from the chaperone site is very fast and because the two sites are close to each other. Protein molecules that left the prolyl isomerase site with an incorrect prolyl isomer can rapidly be re-bound by the chaperone domain because the association rate is very high as well.

10.
An N-terminally truncated and cooperatively folded version (residues 6-39) of the human Pin1 WW domain (hPin1 WW hereafter) has served as an excellent model system for understanding triple-stranded β-sheet folding energetics. Here we report that the negatively charged N-terminal sequence (Met1-Ala-Asp-Glu-Glu5) previously deleted, and which is not conserved in highly homologous WW domain family members from yeast or certain fungi, significantly increases the stability of hPin1 WW (approximately 4 kJ mol⁻¹ at 65 °C), in the context of the 1-39 sequence based on equilibrium measurements. N-terminal truncations and mutations in conjunction with a double mutant cycle analysis and a recently published high-resolution X-ray structure of the hPin1 cis/trans-isomerase suggest that the increase in stability is due to an energetically favorable ionic interaction between the negatively charged side chains in the N terminus of full-length hPin1 WW and the positively charged ε-ammonium group of residue Lys13 in β-strand 1. Our data therefore suggest that the ionic interaction between Lys13 and the charged N terminus is the optimal solution for enhanced stability without compromising function, as ascertained by ligand binding studies. Kinetic laser temperature-jump relaxation studies reveal that this stabilizing interaction has not formed to a significant extent in the folding transition state at near physiological temperature, suggesting a differential contribution of the negatively charged N-terminal sequence to protein stability and folding rate. As neither the N-terminal sequence nor Lys13 is highly conserved among WW domains, our data further suggest that caution must be exercised when selecting domain boundaries for WW domains for structural, functional, or thermodynamic studies.

11.
Oshrit Arviv, Yaakov Levy. Proteins, 2012, 80(12): 2780-2798
Most eukaryotic and a substantial fraction of prokaryotic proteins are composed of more than one domain. The tethering of these evolutionary, structural, and functional units raises, among others, questions regarding the folding process of conjugated domains. Studying the folding of multidomain proteins in silico enables one to identify and isolate the tethering‐induced biophysical determinants that govern crosstalk generated between neighboring domains. For this purpose, we carried out coarse‐grained and atomistic molecular dynamics simulations of two two‐domain constructs from the immunoglobulin‐like β‐sandwich fold. Each of these was experimentally shown to behave as the “sum of its parts,” that is, the thermodynamic and kinetic folding behavior of the constituent domains of these constructs seems to occur independently, with the folding of each domain uncoupled from the folding of its partner in the two‐domain construct. We show that the properties of the individual domains can be significantly affected by conjugation to another domain. The tethering may be accompanied by stabilizing as well as destabilizing factors whose magnitude depends on the size of the interface, the length and flexibility of the linker, and the relative stability of the domains. Accordingly, the folding of a multidomain protein should not be viewed as the sum of the folding patterns of each of its parts, but rather, it involves abrogating several effects that lead to this outcome. An imbalance between these effects may result in either stabilization or destabilization owing to the tethering. Proteins 2012; © 2012 Wiley Periodicals, Inc.

12.
Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence. They are then combined into a single predictor using a support vector machine (SVM). More importantly, domain detection is treated for the first time as an imbalanced data learning problem. A novel undersampling method based on distance-based maximal entropy in the feature space of the SVM is proposed. The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in machine learning systems on general imbalanced datasets.

13.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.

14.
Streptococcus gordonii is a primary colonizer and is involved in the formation of dental plaque. This bacterium expresses several surface proteins. One of them is the adhesin SspB, which is a member of the Antigen I/II family of proteins. SspB is a large multi-domain protein that has interactions with surface molecules on other bacteria and on host cells, and is thus a key factor in the formation of biofilms. Here, we report the crystal structure of a truncated form of the SspB C-terminal domain, solved by single-wavelength anomalous dispersion to 1.5 Å resolution. The structure represents the first of a C-terminal domain from a streptococcal Antigen I/II protein and comprises two structurally related β-sandwich domains, C2 and C3, both with a Ca²⁺ ion bound in equivalent positions. In each of the domains, a covalent isopeptide bond is observed between a lysine and an asparagine, a feature that is believed to be a common stabilization mechanism in Gram-positive surface proteins. S. gordonii biofilms contain attachment sites for the periodontal pathogen Porphyromonas gingivalis and the SspB C-terminal domain has been shown to have one such recognition motif, the SspB adherence region. The motif protrudes from the protein, and serves as a handle for attachment. The structure suggests several additional putative binding surfaces, and other binding clefts may be created when the full-length protein is folded.

15.
Two homologous fibronectin type III (fnIII) domains, FNfn10 (the 10th fnIII domain of human fibronectin) and TNfn3 (the third fnIII domain of human tenascin), have essentially the same backbone structure, although they share only ~24% sequence identity. While they share a similar folding mechanism with a common core of key residues in the folding transition state, they differ in many other physical properties. We use a chimeric protein, FNoTNc, to investigate the molecular basis for these differences. FNoTNc is a core-swapped protein, containing the “outside” (surface and loops) of FNfn10 and the hydrophobic core of TNfn3. Remarkably, FNoTNc retains the structure of the parent proteins despite the extent of redesign, allowing us to gain insight into which components of each parent protein are responsible for different aspects of its behaviour. Naively, one would expect properties that appear to depend principally on the core to be similar to TNfn3, for example, the response to mutations, folding kinetics and side-chain dynamics, while properties apparently determined by differences in the surface and loops, such as backbone dynamics, would be more like FNfn10. While this is broadly true, it is clear that there are also unexpected crosstalk effects between the core and the surface. For example, the anomalous response of FNfn10 to mutation is not solely a property of the core as we had previously suggested.

16.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all-α proteins, while relatively poorer for all-β proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier.
We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.

17.
GroEL is a group I chaperonin that facilitates protein folding and prevents protein aggregation in the bacterial cytosol. Mycobacteria are unusual in encoding two or more copies of GroEL in their genome. While GroEL2 is essential for viability and likely functions as the general housekeeping chaperonin, GroEL1 is dispensable, but its structure and function remain unclear. Here, we present the 2.2-Å resolution crystal structure of a 23-kDa fragment of Mycobacterium tuberculosis GroEL1 consisting of an extended apical domain. Our X-ray structure of the GroEL1 apical domain closely resembles those of Escherichia coli GroEL and M. tuberculosis GroEL2, thus highlighting the remarkable structural conservation of bacterial chaperonins. Notably, in our structure, the proposed substrate-binding site of GroEL1 interacts with the N-terminal region of a symmetry-related neighboring GroEL1 molecule. The latter is consistent with the known GroEL apical domain function in substrate binding and is supported by results obtained from using peptide array technology. Taken together, these data show that the apical domains of M. tuberculosis GroEL paralogs are conserved in three-dimensional structure, suggesting that GroEL1, like GroEL2, is a chaperonin.

18.
The X-ray structure of the C-terminal fragment, containing residues 449-946, of Escherichia coli glutamine synthetase adenylyl transferase (ATase) has been determined. ATase is part of the cascade that regulates the enzymatic activity of E. coli glutamine synthetase, a key component of the cell's machinery for the uptake of ammonia. It has two enzymatic activities, adenylyl removase (AR) and adenylyl transferase (AT), which are located in distinct catalytic domains that are separated by a regulatory (R) domain. We previously reported the three-dimensional structure of the AR domain (residues 1-440). The present structure contains both the R and AT domains. AR and AT share 24% sequence identity and also contain the β-polymerase motif that is characteristic of many nucleotidylyl transferase enzymes. The structures overlap with an rmsd of 2.4 Å when the superhelical R domain is omitted. A model for the complete ATase molecule is proposed, along with some refinements of domain boundaries. A rather more speculative model for the complex of ATase with glutamine synthetase and the nitrogen signal transduction protein PII is also presented.

19.
The delineation of domain boundaries of a given sequence in the absence of known 3D structures or detectable sequence homology to known domains benefits many areas in protein science, such as protein engineering, protein 3D structure determination and protein structure prediction. With the exponential growth of newly determined sequences, our ability to predict domain boundaries rapidly and accurately from sequence information alone is both essential and critical from the viewpoint of gene function annotation. Anyone attempting to predict domain boundaries for a single protein sequence is invariably confronted with a plethora of databases that contain boundary information available from the internet and a variety of methods for domain boundary prediction. How are these derived and how well do they work? What definition of 'domain' do they use? We will first clarify the different definitions of protein domains, and then describe the available public databases with domain boundary information. Finally, we will review existing domain boundary prediction methods and discuss their strengths and weaknesses.

20.
We investigate the average inter-residue folding forces derived from mutational data of 15 proteins: barstar, barnase, chymotrypsin inhibitor 2 (CI2), Src SH3 domain, spectrin R16 domain, Arc repressor, apo-azurin, cold shock protein B (cspB), C-terminal domain of ribosomal protein L9 (CTL9), FKBP12, α-lactalbumin, colicin E7 immunity protein 7 (IM7), colicin E9 immunity protein 9 (IM9), spectrin R17 domain, and ubiquitin. The residue-specific contributions to folding in most of the 15 protein molecules are highly non-uniformly distributed and are typically about 1 piconewton (pN) per interaction. The strongest folding forces often occur in some of the helices and strands of folding nuclei, which suggests that folding nucleation-condensation is partially directed by formation of some secondary structure interactions. The correlation of the energy changes of mutants with inter-residue contact maps of the protein molecules provides a higher resolution than assigning the mutant data to certain positions in the polypeptide strand alone. In contrast to previous Φ-value analysis, we now can partially resolve folding motions. Compaction of at least one α-helix along its axis mediated by internal hydrogen bonds and stabilized by diffuse tertiary structure interactions appears to be one important molecular event during early folding in barstar, CI2, spectrin R16 domain, Arc repressor, α-lactalbumin, IM7, IM9, and spectrin R17 domain. A lateral movement of at least two strands neighboring in sequence towards each other appears to be involved in early folding of the SH3 domain, cspB, CTL9, and FKBP12.
