Similar literature
1.
The structure of MTH538, a previously uncharacterized hypothetical protein from Methanobacterium thermoautotrophicum, has been determined by NMR spectroscopy. MTH538 is one of numerous structural genomics targets selected in a genome-wide survey of uncharacterized sequences from this organism. MTH538 is a so-called singleton, a sequence not closely related to any other (known) sequences. The structure of MTH538 closely resembles the known structures of receiver domains from two-component response regulator systems, such as CheY, and is similar to the structures of flavodoxins and GTP-binding proteins. Tests on MTH538 for characteristic activities of CheY and flavodoxin were negative. MTH538 did not become phosphorylated in the presence of acetyl phosphate and Mg²⁺, although it appeared to bind Mg²⁺. MTH538 also did not bind flavin mononucleotide (FMN) or coenzyme F420. Nevertheless, sequence and structure parallels between MTH538/CheY and two families of ATPase/phosphatase proteins suggest that MTH538 may have a role in a phosphorylation-independent two-component response regulator system.

2.
Gray CH  Good VM  Tonks NK  Barford D 《The EMBO journal》2003,22(14):3524-3535
The Cdc14 family of dual-specificity protein phosphatases (DSPs) is conserved within eukaryotes and functions to down-regulate mitotic Cdk activities, promoting cytokinesis and mitotic exit. We have integrated structural and kinetic analyses to define the molecular mechanism of the dephosphorylation reaction catalysed by Cdc14. The structure of Cdc14 illustrates a novel arrangement of two domains, each with a DSP-like fold, arranged in tandem. The C-terminal domain contains the conserved PTP motif of the catalytic site, whereas the N-terminal domain, which shares no sequence similarity with other DSPs, contributes to substrate specificity, and lacks catalytic activity. The catalytic site is located at the base of a pronounced surface channel formed by the interface of the two domains, and regions of both domains interact with the phosphopeptide substrate. Specificity for a pSer-Pro motif is mediated by a hydrophobic pocket that is capable of accommodating the apolar Pro(P+1) residue of the peptide. Our structural and kinetic data support a role for Cdc14 in the preferential dephosphorylation of proteins modified by proline-directed kinases.

3.
This paper describes a novel computer graphics tool for predicting protein structures. The method is based on structural profiles, which are plots of hydrophobicity, parameters used for secondary structure prediction, or other residue-specific traits against sequence number. Similar structural profiles can indicate similar tertiary structures, even in the absence of sequence homology. The profiles of reference proteins with known structure can be used for prediction. In the method presented here, structural profiles are compared by interactive computer graphics, using the program Multiplot. As a test, a structural profile comparison of several proteins known to have similar 3D structures is presented. Comparison of structural profiles detects similar folding of the two domains of rhodanese, which was not easily detected by sequence homology.
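
A structural profile of the kind described above can be sketched as a sliding-window average of a per-residue property plotted against sequence position. The abstract does not say which hydrophobicity scale or window size Multiplot uses; the Kyte-Doolittle scale, the window of 7, and the function name `structural_profile` below are assumptions made only for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def structural_profile(sequence, window=7):
    """Return (position, mean hydropathy) pairs, one per full window.

    position is the 1-based index of the window's centre residue
    (exact for odd window sizes).
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile
```

Two profiles computed this way for a reference protein and a query could then be overlaid and compared by eye, which is the kind of comparison the abstract attributes to interactive graphics.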

4.
Kosloff M  Kolodny R 《Proteins》2008,71(2):891-902
It is often assumed that in the Protein Data Bank (PDB), two proteins with similar sequences will also have similar structures. Accordingly, it has proved useful to develop subsets of the PDB from which "redundant" structures have been removed, based on a sequence-based criterion for similarity. Similarly, when predicting protein structure using homology modeling, if a template structure for modeling a target sequence is selected by sequence alone, this implicitly assumes that all sequence-similar templates are equivalent. Here, we show that this assumption is often not correct and that standard approaches to create subsets of the PDB can lead to the loss of structurally and functionally important information. We have carried out sequence-based structural superpositions and geometry-based structural alignments of a large number of protein pairs to determine the extent to which sequence similarity ensures structural similarity. We find many examples where two proteins that are similar in sequence have structures that differ significantly from one another. The source of the structural differences usually has a functional basis. The number of such protein pairs that are identified and the magnitude of the dissimilarity depend on the approach that is used to calculate the differences; in particular, sequence-based structure superpositioning will identify a larger number of structurally dissimilar pairs than geometry-based structural alignments. When two sequences can be aligned in a statistically meaningful way, sequence-based structural superpositioning provides a meaningful measure of structural differences. This approach and geometry-based structure alignments reveal somewhat different information, and one or the other might be preferable in a given application. Our results suggest that in some cases, notably homology modeling, the common use of nonredundant datasets, culled from the PDB based on sequence, may mask important structural and functional information.
We have established a database of sequence-similar, structurally dissimilar protein pairs that will help address this problem (http://luna.bioc.columbia.edu/rachel/seqsimstrdiff.htm).

5.
The crystal structure of a hypothetical protein ST2348 (GI: 47118305) from the hyperthermophilic archaeon Sulfolobus tokodaii has been determined using X-ray crystallography. The protein consists of two CBS (cystathionine β-synthase) domains, whose function has been analyzed and reported here. PSI-BLAST shows a conservation of this domain in about 100 proteins in various species. However, none of the close homologs of ST2348 have been functionally characterized so far. Structure and sequence comparison of ST2348 with human AMP-kinase γ1 subunit and the CBS domain pair of bacterial IMP dehydrogenase is suggestive of its binding to AMP and ATP. A highly conserved residue, Asp118, located in a negatively charged patch near the ligand binding cleft, could serve as a site for phosphorylation similar to that found in the chemotactic signal protein CheY, and ST2348 could thereby function as a signal transduction molecule.

6.
7.
Domain Analysis of the FliM Protein of Escherichia coli
The FliM protein of Escherichia coli is required for the assembly and function of flagella. Genetic analyses and binding studies have shown that FliM interacts with several other flagellar proteins, including FliN, FliG, phosphorylated CheY, other copies of FliM, and possibly MotA and FliF. Here, we examine the effects of a set of linker insertions and partial deletions in FliM on its binding to FliN, FliG, CheY, and phospho-CheY and on its functions in flagellar assembly and rotation. The results suggest that FliM is organized into multiple domains. A C-terminal domain of about 90 residues binds to FliN in coprecipitation experiments, is most stable when coexpressed with FliN, and has some sequence similarity to FliN. This C-terminal domain is joined to the rest of FliM by a segment (residues 237 to 247) that is poorly conserved, tolerates linker insertion, and may be an interdomain linker. Binding to FliG occurs through multiple segments of FliM, some in the C-terminal domain and others in an N-terminal domain of 144 residues. Binding of FliM to CheY and phospho-CheY was complex. In coprecipitation experiments using purified FliM, the protein bound weakly to unphosphorylated CheY and more strongly to phospho-CheY, in agreement with previous reports. By contrast, in experiments using FliM in fresh cell lysates, the protein bound to unphosphorylated CheY about as well as to phospho-CheY. Determinants for binding CheY occur both near the N terminus of FliM, which appears most important for binding to the phosphorylated protein, and in the C-terminal domain, which binds more strongly to unphosphorylated CheY. Several different deletions and linker insertions in FliM enhanced its binding to phospho-CheY in coprecipitation experiments with protein from cell lysates. This suggests that determinants for binding phospho-CheY may be partly masked in the FliM protein as it exists in the cytoplasm. 
A model is proposed for the arrangement and function of FliM domains in the flagellar motor.

8.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure.
Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

9.
Here, we discuss the relationship between protein sequence and protein structural similarity. It is established that a protein structural distance (PSD) of 2.0 is a threshold above which two proteins are unlikely to have a detectable pairwise sequence relationship. A precise correlation is established between the level of sequence similarity, defined by a normalized Smith-Waterman score, and the probability that two proteins will have a similar structure (defined by pairwise PSD < 2). This correlation can be used in evaluating the likelihood for success in a comparative modeling procedure. We establish the existence of a correlation between sequence and structural similarity for pairs of proteins that are related in structure but whose sequence relationship is not detectable using standard pairwise sequence alignments. Although it is well known that there is a close relationship between sequence and structural similarity for pairwise sequence identities greater than about 30%, there has been little discussion as to the possible existence of such a relationship for pairs of proteins in or below the twilight zone of sequence similarity (<25% pairwise sequence identity). Possible implications of our results for the evolution of protein structure are discussed.
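
To make the notion of a Smith-Waterman score concrete, here is a minimal sketch of local alignment scoring. It is not the authors' procedure: the abstract does not define its normalization or scoring scheme, so the match/mismatch/gap values and the `normalized_score` helper (raw score divided by the smaller of the two self-alignment scores) are hypothetical choices, and a real analysis would normally use a substitution matrix with affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_score(a, b):
    """Hypothetical normalization: raw score over the smaller self-score."""
    denom = min(smith_waterman_score(a, a), smith_waterman_score(b, b))
    return smith_waterman_score(a, b) / denom if denom else 0.0
```

With this normalization, a sequence compared against itself scores 1.0 and two sequences with no matching residues score 0.0, which gives a length-independent quantity that could be binned against structural distance.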

10.
Although protein Z (PZ) has a domain arrangement similar to the essential coagulation proteins FVII, FIX, FX, and protein C, its serine protease (SP)-like domain is incomplete and does not exhibit proteolytic activity. We have generated a trial sequence of putative activated protein Z (PZa) by identifying amino acid mutations in the SP-like domain that might reasonably resurrect the serine protease catalytic activity of PZ. The structure of the activated form was then modeled based on the proposed sequence using homology modeling and solvent-equilibrated molecular dynamics simulations. In silico docking of inhibitors of FVIIa and FXa to the putative active site of equilibrated PZa, along with structural comparison with its homologous proteins, suggests that the designed PZa can possibly act as a serine protease.

11.
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins of similar structure provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the inverse correlation between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.

12.
The protein (Escherichia coli CheY) that controls the direction of flagellar rotation during bacterial chemotaxis has been shown to be phosphorylated on the aspartate 57 residue. The residue phosphorylated is present within a conserved sequence in every member of a family of bacterial regulatory proteins. The phosphorylation is transient, with a much shorter half-life than that expected of a simple acyl phosphate intermediate, indicating that the sequence and conformation of the protein are designed to achieve a rapid hydrolysis. The CheY-phosphate linkage can be reductively cleaved by sodium borohydride. High-performance tandem mass-spectrometric analysis of proteolytic peptides derived from [3H]borohydride-reduced phosphorylated CheY protein was used to identify the position of phosphorylation. Mutants with altered aspartate 57 exhibited no chemotaxis. When aspartate 13, another conserved residue, was changed, greatly reduced chemotaxis was observed, suggesting an important role for aspartate 13. The rate-determining step of chemotactic signaling is governed by the kinetics of formation and hydrolysis of the CheY protein phosphoaspartate bond. The CheY protein apparently functions as a protein phosphatase that possesses a transient covalent intermediate. Transient phosphorylation of an aspartate residue is an effective mechanism for producing a biochemical signal with a short concentration-independent half-life. The duration of the signal can be controlled by small structural elements within the phosphorylated protein.

13.
Human complement component C9 is a multidomain protein for which a large number of surface topographical features have been determined. We have analyzed the exon-intron boundaries of the human C9 gene and find a good correlation between splice sites and surface features of the protein but little correlation with the putative protein domain structure, even in the cysteine-rich sequence homology with the low-density lipoprotein (LDL) receptor which is likely to be an independently folded structural motif. This is surprising because in the LDL receptor the same sequence is precisely bounded by introns, and it has been assumed that this sequence is present in both proteins as a result of exon shuffling. We deduce that substantial rearrangement of the exon-intron structure of the C9 gene must have occurred before the exchange of cysteine-rich domains, possibly linked to the process of exon duplication which was required to generate the repeats in the LDL receptor.

14.
A DNA binding protein encoded by the filamentous single-stranded DNA phage IKe has been isolated from IKe-infected Escherichia coli cells. Fluorescence and in vitro binding studies have shown that the protein binds co-operatively and with a high specificity to single-stranded but not to double-stranded DNA. From titration of the protein to poly(dA) it has been calculated that approximately four bases of the DNA are covered by one monomer of protein. These binding characteristics closely resemble those of gene V protein encoded by the F-specific filamentous phages M13 and fd. The nucleotide sequence of the gene specifying the IKe DNA binding protein has been established. When compared to the nucleotide sequence of gene V of phage M13, it shows a homology of 58%, indicating that these two phages are evolutionarily related. The IKe DNA binding protein is 88 amino acids long, which is one amino acid residue longer than the gene V protein sequence. When the IKe DNA binding protein sequence was compared with that of gene V protein, it was found that 39 amino acid residues have identical positions in both proteins. The positions of all five tyrosine residues, a number of which are known to be involved in DNA binding, are conserved. Secondary structure predictions indicate that the two proteins contain similar structural domains. It is proposed that the tyrosine residues which are involved in DNA binding are the ones in or next to a beta-turn, at positions 26, 41 and 56 in gene V protein and at positions 27, 42 and 57 in the IKe DNA binding protein.

15.
The monoclonal antibody Lan3-15 identifies a novel protein, Hillarin, that is localized to the axon hillock of leech neurons. Using this antibody we have identified a full-length cDNA coding for leech Hillarin and determined its sequence. The gene encodes a 1274-residue protein with a predicted molecular mass of 144,013 Da. Database searches revealed that leech Hillarin has potential orthologues in fly and nematode and that these proteins share two novel protein domains. The W180 domain is characterized by five conserved tryptophans whereas the H domains share 21 invariant residues. In contrast to the arrangement in fly and nematode, the cassette containing the W180 and H domains is repeated twice in leech Hillarin. This suggests that the leech Hillarin sequence originated from a duplication event of an ancestral protein with single cassette structure.

16.
As part of the Northeast Structural Genomics Consortium pilot project focused on small eukaryotic proteins and protein domains, we have determined the NMR structure of the protein encoded by ORF YML108W from Saccharomyces cerevisiae. YML108W is one of the numerous structural proteomics targets whose biological function is unknown. Moreover, this protein does not have sequence similarity to any other protein. The NMR structure of YML108W consists of a four-stranded β-sheet with strand order 2143 and two α-helices, with an overall topology of ββαββα. Strand β1 runs parallel to β4, and the β2:β1 and β4:β3 pairs are arranged in an antiparallel fashion. Although this fold belongs to the split βαβ family, it appears to be unique among this family; it is a novel arrangement of secondary structure, thereby expanding the universe of protein folds.

17.

Background

As tertiary structure is currently available only for a fraction of known protein families, it is important to assess what parts of sequence space have been structurally characterized. We consider protein domains whose structure can be predicted by sequence similarity to proteins with solved structure and address the following questions. Do these domains represent an unbiased random sample of all sequence families? Do targets solved by structural genomic initiatives (SGI) provide such a sample? What are approximate total numbers of structure-based superfamilies and folds among soluble globular domains?

Results

To make these assessments, we combine two approaches: (i) sequence analysis and homology-based structure prediction for proteins from complete genomes; and (ii) monitoring dynamics of the assigned structure set in time, with the accumulation of experimentally solved structures. In the Clusters of Orthologous Groups (COG) database, we map the growing population of structurally characterized domain families onto the network of sequence-based connections between domains. This mapping reveals a systematic bias suggesting that target families for structure determination tend to be located in highly populated areas of sequence space. In contrast, the subset of domains whose structure is initially inferred by SGI is similar to a random sample from the whole population. To account for the observed bias, we propose a new non-parametric approach to the estimation of the total numbers of structural superfamilies and folds, which does not rely on a specific model of the sampling process. Based on dynamics of robust distribution-based parameters in the growing set of structure predictions, we estimate the total numbers of superfamilies and folds among soluble globular proteins in the COG database.

Conclusion

The set of currently solved protein structures allows for structure prediction in approximately a third of sequence-based domain families. The choice of targets for structure determination is biased towards domains with many sequence-based homologs. The growing SGI output in the future should further contribute to the reduction of this bias. The total numbers of structural superfamilies and folds in the COG database are estimated as ~4000 and ~1700, respectively. These numbers are respectively four and three times higher than the numbers of superfamilies and folds that can currently be assigned to COG proteins.

18.
S K Holland  K Harlos  C C Blake 《The EMBO journal》1987,6(7):1875-1880
The proposed homology between the fibronectin type II domain and the Kringle domains of blood clotting and fibrinolytic proteins has been examined in three dimensions by substituting the type II sequence into the bovine prothrombin Kringle 1 tertiary structure, determined by X-ray crystallography at 3.8 Å. Structural substitution of aligned amino acids of the type II domains and the Kringle produces a compact chain fold, and deletions and insertions in the type II sequence are accommodated within the modelled structure. This confirms the structural homology between the two domains and verifies the sequence alignment and common evolution of the type II and Kringle units. The two structures contain homologous hydrophobic cores, centered around the two disulphide bridges which link conserved beta-type strands. Gross differences between the two domains occur in exterior loops, and potential functional sites in these regions of the type II structures as found in fibronectin, Factor XII and seminal fluid protein PDC-109 are proposed. We suggest that the domains evolved from a common ancestral protein comprising the hydrophobic core and disulphide arrangement which later diverged to bind different macromolecules through adaptation of the external loops.

19.
The trefoil factor family protein, TFF1, forms a homodimer, via a disulphide linkage, that has greater activity in wound healing assays than the monomer. Having previously determined a high-resolution solution structure of a monomeric analogue of TFF1, we now investigate the structure of the homodimer formed by the native sequence. The two putative receptor/ligand recognition domains are found to be well separated, at opposite ends of a flexible linker. This contrasts sharply with the known fixed and compact arrangement of the two trefoil domains of the closely related TFF2, and has significant implications for the mechanism of action and functional specificity of the TFF family of proteins.

20.
Structural genomic projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. 
Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
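
The geometric idea described in this abstract, counting similar sequences at each residue of the query and then looking for elevated, roughly rectangular stretches, can be sketched as follows. This is not the authors' program: the function names, the simple thresholding rule, and the `min_hits` and `min_length` parameters are assumptions, and the hit intervals are taken to be 1-based inclusive query coordinates already extracted from BLAST output.

```python
def coverage_profile(query_length, hits):
    """Count, for each query residue, how many hits cover it.

    hits is an iterable of (start, end) pairs in 1-based inclusive
    query coordinates; intervals are clamped to the query.
    """
    diff = [0] * (query_length + 2)
    for start, end in hits:
        start, end = max(1, start), min(query_length, end)
        if start > end:
            continue  # hit lies entirely outside the query
        diff[start] += 1
        diff[end + 1] -= 1
    profile, running = [], 0
    for position in range(1, query_length + 1):
        running += diff[position]
        profile.append(running)
    return profile


def elevated_segments(profile, min_hits, min_length):
    """Return 1-based (start, end) runs where coverage >= min_hits."""
    if min_hits < 1:
        raise ValueError("min_hits must be at least 1")
    segments, run_start = [], None
    # The trailing 0 is a sentinel that closes a run reaching the last residue.
    for index, count in enumerate(profile + [0], start=1):
        if count >= min_hits and run_start is None:
            run_start = index
        elif count < min_hits and run_start is not None:
            if index - run_start >= min_length:
                segments.append((run_start, index - 1))
            run_start = None
    return segments
```

For example, `elevated_segments(coverage_profile(10, [(2, 5), (3, 6), (3, 5), (9, 10)]), min_hits=3, min_length=2)` returns `[(3, 5)]`, the one stretch covered by at least three hits.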
