Similar articles
 Found 20 similar articles (search time: 712 ms)
1.
Wintjens R  Gilis D  Rooman M 《Proteins》2008,70(4):1564-1577
Fe- and Mn-containing superoxide dismutase (sod) enzymes are closely related and similar in both amino acid sequence and structure, but differ in their mode of oligomerization and in their specificity for the Fe or Mn cofactor. The goal of the present work is to identify and analyze the sequence and structure characteristics that ensure the cofactor specificities and the oligomerization modes. For that purpose, 374 sod sequences and 17 sod crystal structures were collected and aligned. These alignments were searched for residues and inter-residue interactions that are conserved within the whole sod family, or alternatively, that are specific to a given sod subfamily sharing common characteristics. This led us to define key residues and inter-residue interaction fingerprints in each subfamily. The comparison of these fingerprints allows, on a rational basis, the design of mutants likely to modulate the activity and/or specificity of the target sod, in good agreement with the available experimental results on known mutants. The key residues and interaction fingerprints are furthermore used to predict if a novel sequence corresponds to a sod enzyme, and if so, what type of sod it is. The predictions of this fingerprint method reach much higher scores and present much more discriminative power than the commonly used method that uses pairwise sequence comparisons.

2.
The 2.9 Å resolution structure of iron superoxide dismutase (FeSOD) (EC 1.15.1.1) from Pseudomonas ovalis complexed with the inhibitor azide was solved. Comparison of this structure with free enzyme shows that the inhibitor is bound at the open coordination position of the iron, with a bond length of 2.0 Å. The metal moves by 0.4 Å into the trigonal plane to produce an orthogonal geometry at the iron. Binding of the inhibitor also causes a movement of the axial ligand (histidine 26) away from the metal, a lengthening of the iron-histidine bond, and a rotation of the histidine 74 ring. The inhibitor possesses contacts in the binding pocket with a pair of conserved tryptophan residues and with the side chains of tyrosine 34 and glutamine 70. This glutamine is conserved between all FeSODs, but is absent in MnSOD. Comparisons with MnSOD show that a different glutamine which possesses the same interactions in the active site as Gln70 in FeSOD is conserved at position 154 in the overall SOD sequence, implying that while manganese and FeSODs are structural homologues in a global sense, their functional and evolutionary relationship is that of second-site mutation revertants.

3.
This work presents a method to compare local clusters of interacting residues as observed in a known three-dimensional protein structure with corresponding clusters inferred from homologous protein sequences, assuming conserved protein folding. For this purpose the local environment of a selected residue in a known protein structure is defined as the ensemble of amino acids in contact with it in the folded state. Using a multiple sequence alignment to identify corresponding residues in homologous proteins, a detailed comparison can be performed between the local environment of a selected amino acid in the template protein structure and the expected local environments at the sets of equivalent residues, derived from the aligned protein sequences. The comparison makes it possible to detect conserved local features such as hydrogen bonding or complementarity in residue substitution. A global measure of environmental similarity is also defined, to search for conserved amino acid clusters subject to functional or structural constraints. The proposed approach is useful for investigating protein function as well as for site-directed mutagenesis experiments, where appropriate amino acid substitutions can be suggested by observing naturally occurring protein variants.

4.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

5.
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.

6.
Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin, can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and if they do what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. 
These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem co-factor.
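
The "buried spine" spacing quoted in this abstract (i, i+3, i+7, i+10 or i, i+4, i+7, i+11) amounts to alternating steps of 3 and 4 residues along a helix. A minimal sketch of that pattern follows; the function name and its parameters are illustrative assumptions and do not come from the paper.

```python
from itertools import cycle


def spine_positions(start, length, first_step=3):
    """Return helix positions that follow an alternating 3/4 (or 4/3) spacing.

    start      -- index of the first spine residue (i)
    length     -- number of residues to cover, counted from start
    first_step -- 3 gives i, i+3, i+7, i+10; 4 gives i, i+4, i+7, i+11
    """
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = cycle((3, 4) if first_step == 3 else (4, 3))
    positions = []
    pos = start
    while pos < start + length:
        positions.append(pos)
        pos += next(steps)
    return positions
```

Calling `spine_positions(0, 12)` yields the offsets of the first pattern and `spine_positions(0, 12, first_step=4)` those of the second.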

7.
For applications such as comparative modelling one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments and the aligned positions were scored using information from the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect those regions of alignments that are reliably aligned and to help predict the quality of an alignment. For those residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also be used to help predict conserved binding sites from sequence alignments. Residues in the template that were identified as binding sites, that aligned to an identical amino acid residue and where the sequence alignment agreed with the structural alignment were in highly conserved, high scoring regions over 80% of the time.
This suggests that many binding sites that are present in both target and template sequences are in sequence-conserved regions and that there is the possibility of translating reliability to binding site prediction.

8.
Communication between distant sites often defines the biological role of a protein: amino acid long-range interactions are as important in binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. The maintaining of functional and structural coupling of long-range interacting residues requires coevolution of these residues. Networks of interaction between coevolved residues can be reconstructed, and from the networks, one can possibly derive insights into functional mechanisms for the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein which is based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of all pairs of coevolved residues is identified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method drops the constraints on high sequence divergence limiting the range of applicability of the statistical approaches previously proposed. We apply the method to four protein families where we show an accurate detection of functional networks and the possibility to treat sets of protein sequences of variable divergence.

9.
All sequenced peptide toxins of the cecropin, pleurocidin and dermaceptin/ceratotoxin families in the National Center for Biotechnology Information (NCBI) database as of May 2005 were identified and shown to comprise a single superfamily. The peptide sequences were multiply aligned, revealing conserved residues that may play roles in structure and function. Signature sequences were derived for each of the 3 constituent families. Phylogenetic analyses revealed the relationships of these peptides to each other, and average hydropathy/amphipathicity studies provided structural information. This study serves to characterize a large superfamily of toxic peptides that perform host defense functions in a range of animal kingdoms.

10.
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate if the identity between the sequences to be aligned is below 30%. This is because, for such sequences, different residues may play similar structural roles and are aligned incorrectly when the alignment uses a substitution matrix covering all 20 residue types. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar protein regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9.
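
The idea of a reduced alphabet can be sketched in a few lines of Python. The nine groups below are a hypothetical grouping by rough physicochemical class, chosen only to illustrate the mechanism; they are not the alphabet derived from DAPS in the paper, and the function names are assumptions as well.

```python
# Hypothetical 9-letter grouping (NOT the alphabet derived in the paper).
GROUPS = ["AVLIMC", "FWY", "ST", "NQ", "DE", "KR", "H", "G", "P"]

# Each residue is represented by the first letter of its group.
RESIDUE_TO_GROUP = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet.

    Characters outside the 20 standard residues (e.g. gaps) pass through unchanged.
    """
    return "".join(RESIDUE_TO_GROUP.get(aa, aa) for aa in seq.upper())


def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Two aligned fragments such as `LKDF` and `IREW` share no identical residue in the 20-letter alphabet, yet map to the same string once reduced, which is the effect the abstract relies on for low-identity pairs.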

11.
The protein sequences of seven members of the superoxide dismutase (SOD) family from halophilic archaebacteria have been aligned and compared with each other and with the homologous Mn and Fe SOD sequences from eubacteria and the methanogenic archaebacterium Methanobacterium thermoautotrophicum. Of 199 common residues in the SOD proteins from halophilic archaebacteria, 125 are conserved in all seven sequences, and 64 of these are encoded by single unique triplets. The 74 remaining positions exhibit a high degree of variability, and for almost half of these, the encoding triplets are connected by at least two nonsynonymous nucleotide substitutions. The majority of nucleotide substitutions within the seven genes are nonsynonymous and result in amino acid replacement in the respective protein; silent third-codon-position (synonymous) substitutions are unexpectedly rare. Halophilic SODs contain 30 specific residues that are not found at the corresponding positions of the methanogenic or eubacterial SOD proteins. Seven of these are replacements of highly conserved amino acids in eubacterial SODs that are believed to play an important role in the three-dimensional structure of the protein. Residues implicated in formation of the active site, catalysis, and metal ion binding are conserved in all Mn and Fe SODs. Molecular phylogenies based on parsimony and neighbor-joining methods coherently group the halophile sequences but surprisingly fail to distinguish between the Mn SOD of Escherichia coli and the Fe SOD of M. thermoautotrophicum as the outgroup. These comparisons indicate that as a group, the SODs of halophilic archaebacteria have many unique and characteristic features. At the same time, the patterns of nucleotide substitution and amino acid replacement indicate that these genes and the proteins that they encode continue to be subject to strong and changing selection. 
This selection may be related to the presence of oxygen radicals and the inter- and intracellular composition and concentration of metal cations.

12.
Two related mammalian proteins, bactericidal/permeability-increasing protein (BPI) and lipopolysaccharide-binding protein (LBP), share high-affinity binding to lipopolysaccharide (LPS), a glycolipid found in the outer membrane of gram-negative bacteria. The recently determined crystal structure of human BPI permits a structure/function analysis, presented here, of the conserved regions of these two protein sequences. In the seven known sequences of BPI and LBP, 102 residues are completely conserved and may be classified in terms of location, side-chain chemistry, and interactions with other residues. We find that the most highly conserved regions lie at the interfaces between the tertiary structural elements that help create two apolar lipid-binding pockets. Most of the conserved polar and charged residues appear to be involved in inter-residue interactions such as H-bonding. However, in both BPI and LBP a subset of conserved residues with positive charge (lysines 42, 48, 92, 95, and 99 of BPI) have no apparent structural role. These residues cluster at the tip of the NH2-terminal domain, and several coincide with residues known to affect LPS binding; thus, it seems likely that these residues make electrostatic interactions with negatively charged groups of LPS. Overall differences in charge and electrostatic potential between BPI and LBP suggest that BPI's bactericidal activity is related to the high positive charge of its NH2-terminal domain. A model of human LBP derived from the BPI structure provides a rational basis for future experiments, such as site-directed mutagenesis and inhibitor design.

13.

Background  

A large number of PROSITE patterns select false positives and/or miss known true positives. It is possible that – at least in some cases – the weak specificity and/or sensitivity of a pattern is due to the fact that one, or maybe more, functional and/or structural key residues are not represented in the pattern. Multiple sequence alignments are commonly used to build functional sequence patterns. If residues structurally conserved in proteins sharing a function cannot be aligned in a multiple sequence alignment, they are likely to be missed in a standard pattern construction procedure.

14.
An essential function of DNA glycosylases is the recognition and excision of damaged bases in DNA, thereby preserving genomic integrity. Lesion recognition is a multistep process, which is only partially revealed by structural analysis of the catalytically competent complex. The functional role of additional residues can be predicted by combining structural data with analysis of amino acid conservation. The following postulate underlies this approach: if a family or superfamily can be broken into subgroups with different substrate specificities, residues highly conserved between these subgroups represent those important for enzyme catalysis and structure maintenance while residues highly conserved within a subgroup but not between subgroups represent residues important for substrate specificity. We review the bioinformatics approach used for this quantitative analysis and describe its application to the Nth superfamily and Fpg family of DNA glycosylases. These results serve as a starting point in planning site-directed mutagenesis experiments to elucidate the functional role of similar and dissimilar residues in DNA repair and other proteins.
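
The postulate in this abstract can be illustrated with a simplified column classifier. The sketch assumes strict invariance as a stand-in for "highly conserved" (the reviewed approach is quantitative and is not reproduced here); the function name, labels, and input layout are hypothetical.

```python
def classify_columns(subgroups):
    """Label each alignment column under the subgroup-conservation postulate.

    subgroups -- dict mapping a subgroup name to a list of aligned,
                 equal-length sequences ('-' marks a gap)

    Simplifying assumption: a column counts as conserved only if it is
    strictly invariant and gap-free within every subgroup.
    """
    length = len(next(iter(subgroups.values()))[0])
    labels = []
    for col in range(length):
        # Set of residues observed in this column, one set per subgroup.
        per_group = [{seq[col] for seq in seqs} for seqs in subgroups.values()]
        invariant_within = all(len(s) == 1 and "-" not in s for s in per_group)
        if not invariant_within:
            labels.append("variable")
        elif len(set.union(*per_group)) == 1:
            # Same residue in every subgroup: candidate for catalysis/structure.
            labels.append("catalysis/structure")
        else:
            # Invariant within each subgroup, different between subgroups.
            labels.append("specificity")
    return labels
```

With two toy subgroups, `{"A": ["GKDT", "GKDS"], "B": ["GRDA", "GRDC"]}`, the second column (K in one subgroup, R in the other) is the only one flagged as a specificity candidate.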

15.
PAS domains are widespread in archaea, bacteria, and eukaryota, and play important roles in various functions. In this study, we aim to explore functional evolutionary relationship among proteins in the PAS domain superfamily in view of the sequence‐structure‐dynamics‐function relationship. We collected protein sequences and crystal structure data from RCSB Protein Data Bank of the PAS domain superfamily belonging to three biological functions (nucleotide binding, photoreceptor activity, and transferase activity). Protein sequences were aligned and then used to select sequence‐conserved residues and build phylogenetic tree. Three‐dimensional structure alignment was also applied to obtain structure‐conserved residues. The protein dynamics were analyzed using elastic network model (ENM) and validated by molecular dynamics (MD) simulation. The result showed that the proteins with same function could be grouped by sequence similarity, and proteins in different functional groups displayed statistically significant difference in their vibrational patterns. Interestingly, in all three functional groups, conserved amino acid residues identified by sequence and structure conservation analysis generally have a lower fluctuation than other residues. In addition, the fluctuation of conserved residues in each biological function group was strongly correlated with the corresponding biological function. This research suggested a direct connection in which the protein sequences were related to various functions through structural dynamics. This is a new attempt to delineate functional evolution of proteins using the integrated information of sequence, structure, and dynamics.

16.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure.
Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

17.
Restriction endonucleases (REases) are DNA-cleaving enzymes that have become indispensable tools in molecular biology. Type II REases are highly divergent in sequence despite their common structural core, function and, in some cases, common specificities towards DNA sequences. This makes it difficult to identify and classify them functionally based on sequence, and has hampered the efforts of specificity-engineering. Here, we define novel REase sequence motifs, which extend beyond the PD-(D/E)XK hallmark, and incorporate secondary structure information. The automated search using these motifs is carried out with a newly developed fast regular expression matching algorithm that accommodates long patterns with optional secondary structure constraints. Using this new tool, named Scan2S, motifs derived from REases with specificity towards GATC- and CCGG-containing DNA sequences successfully identify REases of the same specificity. Notably, some of these sequences are not identified by standard sequence detection tools. The new motifs highlight potential specificity-determining positions that do not fully overlap for the GATC- and the CCGG-recognizing REases and are candidates for specificity re-engineering.
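
For orientation, the PD-(D/E)XK hallmark mentioned in this abstract can be written as an ordinary regular expression. The sketch below uses Python's standard `re` module and does not reproduce Scan2S or its secondary structure constraints; the spacer bounds between PD and (D/E)XK are illustrative assumptions, as is the function name.

```python
import re


def find_pd_dexk(seq, min_gap=5, max_gap=30):
    """Find PD...(D/E)XK hallmark candidates in a protein sequence.

    min_gap / max_gap -- assumed bounds on the spacer between 'PD' and
                         '(D/E)XK'; illustrative values, not from the paper.
    Returns a list of (start_index, matched_text) tuples.
    """
    # A lookahead lets overlapping candidates be reported; the lazy
    # quantifier keeps the shortest admissible spacer for each start.
    pattern = re.compile(
        rf"(?=(PD[A-Z]{{{min_gap},{max_gap}}}?[DE][A-Z]K))"
    )
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]
```

A plain pattern like this is far less selective than the extended motifs the abstract describes, which is presumably why the authors add further positions and structural constraints.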

18.
Sequence alignment is a common method for finding structurally conserved or similar regions in proteins. However, sequence alignment is often inaccurate if the identity between the sequences to be aligned is below 30%. This is because, for such sequences, different residues may play similar structural roles and are aligned incorrectly when the alignment uses a substitution matrix covering all 20 residue types. Based on the similarity of physicochemical features, residues can be clustered into a few groups. Using such simplified alphabets, the complexity of protein sequences is reduced while the key information encoded in the sequences is retained. As a result, the accuracy of sequence alignment might be improved if the residues are properly clustered. Here, by using a database of aligned protein structures (DAPS), a new clustering method based on the substitution scores is proposed for the grouping of residues, and substitution matrices of residues at different levels of simplification are constructed. The validity of the reduced alphabets is confirmed by relative entropy analysis. The reduced alphabets are applied to the recognition of structurally conserved or similar protein regions by sequence alignment. The results indicate that the accuracy or efficiency of sequence alignment can be improved with the optimal reduced alphabet with N around 9. Supported by the National Natural Science Foundation of China (Grant Nos. 90403120, 10474041 and 10021001) and the Nonlinear Project (973) of the NSM

19.
Farnsworth PN  Singh K 《FEBS letters》2000,482(3):175-179
Small heat shock proteins (sHsp) have been implicated in many cell processes involving the dynamics of protein-protein interactions. Two unusual sequences containing self-complementary motifs (SCM) have been identified within the conserved alpha-crystallin domain of sHsps. When two SCMs are aligned in an anti-parallel direction (N to C and C to N), the charged or polar residues form either salt bridges or hydrogen bonds while the non-polar residues participate in hydrophobic interactions. When aligned in reverse order, the residues of these motifs in alpha-crystallin subunits form either hydrophobic and/or polar interactions. Homology based molecular modeling of the C-terminal domain of alpha-crystallin subunits using the crystal structure of MjHSP16.5 suggests that SCM1 and 2 participate in stabilizing secondary structure and subunit interactions. Also there is overwhelming evidence that these motifs are important in the chaperone-like activity of alpha-crystallin subunits. These sequences are conserved and appear to be characteristic of the entire sHsp superfamily. Similar motifs are also present in the Hsp70 family and the immunoglobulin superfamily.

20.
Detailed analysis of the CuZn superoxide dismutase (SOD) structure provides new results concerning the significance and molecular basis for sequence conservation, intron-exon boundary locations, gene duplication, and Greek key beta-barrel evolution. Using 15 aligned sequences, including a new mouse sequence, specific roles have been assigned to all 23 invariant residues and additional residues exhibiting functional equivalence. Sequence invariance is dominated by 15 residues that form the active site stereochemistry, supporting a primary biological function of superoxide dismutation. The beta-strands have no sequence insertions and deletions, whereas insertions occur within the loops connecting the beta-strands and at both termini. Thus, the beta-barrel with only four invariant residues is apparently over-determined, but dependent on multiple cooperative side chain interactions. The regions encoded by exon I, a proposed nucleation site for protein folding, and exon III, the Zn loop involved in stability and catalysis, are the major structural subdomains not included in the internal twofold axis of symmetry passing near the catalytic Cu ion. This provides strong confirmatory evidence for gene evolution by duplication and fusion followed by the addition of these two exons. The proposed evolutionary pathway explains the structural versatility of the Greek key beta-barrel through functional specialization and subdomain insertions in new loop connections, and provides a rationale for the size of the present day enzyme.
