Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Protein structures are much more conserved than sequences during evolution. Based on this observation, we investigate the consequences of structural conservation on protein evolution. We study seven of the most studied protein folds, determining that an extended neutral network in sequence space is associated with each of them. Within our model, neutral evolution leads to a non-Poissonian substitution process, due to the broad distribution of connectivities in neutral networks. The observation that the substitution process has non-Poissonian statistics has been used to argue against the original Kimura neutral theory, while our model shows that this is a generic property of neutral evolution with structural conservation. Our model also predicts that the substitution rate can strongly fluctuate from one branch to another of the evolutionary tree. The average sequence similarity within a neutral network is close to the threshold of randomness, as observed for families of sequences sharing the same fold. Nevertheless, some positions are more difficult to mutate than others. We compare such structurally conserved positions to positions conserved in protein evolution, suggesting that our model can be a valuable tool to distinguish structural from functional conservation in databases of protein families. These results indicate that a synergy between database analysis and structurally based computational studies can increase our understanding of protein evolution.

2.
We have extended the resolution of the crystal structure of human bactericidal/permeability-increasing protein (BPI) to 1.7 Å. BPI has two domains with the same fold, but with little sequence similarity. To understand the similarity in structure of the two domains, we compare the corresponding residue positions in the two domains by the method of 3D-1D profiles. A 3D-1D profile is a string formed by assigning each position in the 3D structure to one of 18 environment classes. The environment classes are defined by the local secondary structure, the area of the residue which is buried from solvent, and the fraction of the area buried by polar atoms. A structural alignment between the two BPI domains was used to compare the 3D-1D environments of structurally equivalent positions. More than 31% of the aligned positions have conserved 3D-1D environments, but only 13% have conserved residue identities. Analysis of the 3D-1D environmentally conserved positions helps to identify pairs of residues likely to be important in conserving the fold, regardless of the residue similarity. We find examples of 3D-1D environmentally conserved positions with dissimilar residues which nevertheless play similar structural roles. To generalize our findings, we analyzed four other proteins with similar structures yet dissimilar sequences. Together, these examples show that aligned pairs of dissimilar residues often share similar structural roles, stabilizing dissimilar sequences in the same fold.

3.
CORA is a suite of programs for multiply aligning and analyzing protein structural families to identify the consensus positions and capture their most conserved structural characteristics (e.g., residue accessibility, torsional angles, and global geometry as described by inter-residue vectors/contacts). Knowledge of these structurally conserved positions, which are mostly in the core of the fold, and of their properties significantly improves the identification and classification of newly determined relatives. Information is encoded in a consensus three-dimensional (3D) template and relatives found by a sensitive alignment method, which employs a new scoring scheme based on conserved residue contacts. By encapsulating these critical "core" features, templates perform more reliably in recognizing distant structural relatives than searches with representative structures. Parameters for 3D-template generation and alignment were optimized for each structural class (mainly-alpha, mainly-beta, alpha-beta), using representative superfold families. For all families selected, the templates gave significant improvements in sensitivity and selectivity in recognizing distant structural relatives. Furthermore, since templates contain less than 70% of fold positions and compare fewer positions when aligning structures, scans are at least an order of magnitude faster than scans using selected structures. CORA was subsequently tested on eight other broad structural families from the CATH database. Diagnostic plots are generated automatically and provide qualitative assistance for classifying newly determined relatives. They are demonstrated here by application to the large globin-like fold family. CORA templates for both homologous superfamilies and fold families will be stored in CATH and used to improve the classification and analysis of newly determined structures.

4.
In an effort to better understand beta-sheet assembly, we have investigated the evolutionary behavior of neighboring residues on adjacent antiparallel beta-strands. Residue pairs were classified according to solvent exposure as well as by whether their backbone NH and C=O groups are hydrogen bonded. The conservation and covariation of 19,241 pairs in 219 sequence alignments were analyzed. Buried pairs were found to be the most conserved, while stronger covariation was detected in the solvent-exposed pairs. However, residues on neighboring strands showed a degree of conservation and covariation similar to that of well-separated residues on the same strand, suggesting that evolutionary pressure to maintain complementarity between pairs on neighboring strands is weak. Moreover, in spite of the preference of certain amino acid pairs to occupy neighboring positions on adjacent strands, such favored pairs are neither more strongly mutually conserved nor covary more strongly than pairs of the same type in non-interacting positions. Although the beta-sheet pairs did not show outstanding evolutionary coupling, in many protein families significant conservation and covariation patterns were detected for some of the residue pairs. Overall, the weak evolutionary conservation and covariation of the beta-sheet pairs indicates that sheet structure is unlikely to be dictated by specific side-chain interactions.

5.
Reinhardt A, Eisenberg D. Proteins 2004, 56(3):528-538
In fold recognition (FR) a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequence similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.

6.
Protein-protein interactions play an essential role in the functioning of the cell. The importance of charged residues and their diverse role in protein-protein interactions have been well studied using experimental and computational methods. Often, charged residues located in protein interaction interfaces are conserved across the families of homologous proteins and protein complexes. However, on a large scale, it has been recently shown that charged residues are significantly less conserved than other residue types in protein interaction interfaces. The goal of this work is to understand the role of charged residues in the protein interaction interfaces through their conservation patterns. Here, we propose a simple approach where the structural conservation of the charged residue pairs is analyzed among the pairs of homologous binary complexes. Specifically, we determine a large set of homologous interactions using an interaction interface similarity measure and catalog the basic types of conservation patterns among the charged residue pairs. We find an unexpected conservation pattern, which we call the correlated reappearance, occurring among the pairs of homologous interfaces more frequently than the fully conserved pairs of charged residues. Furthermore, the analysis of the conservation patterns across different superkingdoms as well as structural classes of proteins has revealed that the correlated reappearance of charged residues is by far the most prevalent conservation pattern, often occurring more frequently than the unconserved charged residues. We discuss a possible role that the new conservation pattern may play in the long-range electrostatic steering effect.

7.
Hernández G, LeMaster DM. Proteins 2005, 60(4):723-731
Given any operational criterion for pairwise interatomic interactions, for a pair of structurally homologous proteins there exists for both proteins a unique equivalent partitioning of the nonconserved residue positions into mutually non-interacting clusters. In the formation of a chimeric protein derived from these two parental sequences, if nonnative-like interactions are to be avoided in its tertiary structure, then all of the nonconserved residues of each cluster must necessarily be either maintained or interchanged simultaneously. This hybrid native partitioning criterion is applied to known gene shuffling results. When the degree of estimated disruption is modest, the HybNat algorithm provides an efficient predictor of structural integrity. This supports the expectation that a substantial fraction of sequences that conform to the hybrid native partitioning criterion will yield tertiary structures that largely preserve the native-like interactions of the parental proteins.
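The partitioning idea in this abstract can be illustrated with a small sketch. It is not the HybNat implementation: it assumes the interaction criterion has already been reduced to a list of interacting position pairs (here the hypothetical `contacts` argument, taken to cover both parental structures), and it treats a cluster as a connected component of the contact graph restricted to nonconserved positions. The function names are illustrative only.

```python
from collections import defaultdict


def noninteracting_clusters(nonconserved, contacts):
    """Partition nonconserved positions into mutually non-interacting clusters.

    nonconserved: iterable of positions that differ between the two parents.
    contacts: iterable of (i, j) position pairs judged to interact
              (assumed input; contacts touching a conserved position are ignored).
    """
    nodes = set(nonconserved)
    graph = defaultdict(set)
    for i, j in contacts:
        if i in nodes and j in nodes:
            graph[i].add(j)
            graph[j].add(i)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


def conforms(chimera_source, clusters):
    """True if every cluster is inherited as a whole from a single parent.

    chimera_source: dict mapping each nonconserved position to "A" or "B".
    """
    return all(len({chimera_source[p] for p in cluster}) == 1 for cluster in clusters)
```

With clusters in hand, a candidate chimera is checked by asking whether each cluster comes entirely from one parent, which mirrors the "maintained or interchanged simultaneously" condition stated above.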

8.
In order to study structural aspects of sequence conservation in families of homologous proteins, we have analyzed structurally aligned sequences of 585 proteins grouped into 128 homologous families. The conservation of a residue in a family is defined as the average residue similarity in a given position of aligned sequences. The residue similarities were expressed in the form of log-odds substitution tables that take into account the environments of amino acids in three-dimensional structures. The protein core is defined as those residues that have less than 7% solvent accessibility. The density of a protein core is described in terms of atom packing, which is investigated as a criterion for residue substitution and conservation. Although there is no significant correlation between sequence conservation and average atom packing around nonpolar residues such as leucine, valine and isoleucine, a significant correlation is observed for polar residues in the protein core. This may be explained by the hydrogen bonds in which polar residues are involved; the better their protection from water access the more stable should be the structure in that position. Proteins 33:358–366, 1998. © 1998 Wiley-Liss, Inc.

9.
Two new sets of scoring matrices are introduced: H2 for the protein sequence comparison and T2 for the protein sequence-structure correlation. Each element of H2 or T2 measures the frequency with which a pair of amino acid types in one protein, k-residues apart in the sequence, is aligned with another pair of residues, of given amino acid types (for H2) or in given structural states (for T2), in other structurally homologous proteins. There are four types, corresponding to the k-values of 1 to 4, for both H2 and T2. These matrices were set up using a large number of structurally homologous protein pairs, with little sequence homology between the pair, that were recently generated using the structure comparison program SHEBA. The two scoring matrices were incorporated into the main body of the sequence alignment program SSEARCH in the FASTA package and tested in a fold recognition setting in which a set of 107 test sequences were aligned to each of a panel of 3,539 domains that represent all known protein structures. Six procedures were tested: the straight Smith-Waterman (SW) and FASTA procedures, which used the Blosum62 single residue type substitution matrix; BLAST and PSI-BLAST procedures, which also used the Blosum62 matrix; PASH, which used Blosum62 and H2 matrices; and PASSC, which used Blosum62, H2, and T2 matrices. All procedures gave similar results when the probe and target sequences had greater than 30% sequence identity. However, when the sequence identity was below 30%, a similar structure could be found for more sequences using PASSC than using any other procedure. PASH and PSI-BLAST gave the next best results.

10.
Shih CH, Chang CM, Lin YS, Lo WC, Hwang JK. Proteins 2012, 80(6):1647-1657
The knowledge of conserved sequences in proteins is valuable in identifying functionally or structurally important residues. Generating the conservation profile of a sequence requires aligning families of homologous sequences and having knowledge of their evolutionary relationships. Here, we report that the conservation profile at the residue level can be quantitatively derived from a single protein structure with only backbone information. We found that the reciprocal packing density profiles of protein structures closely resemble their sequence conservation profiles. For a set of 554 nonhomologous enzymes, 74% (408/554) of the proteins have a correlation coefficient > 0.5 between these two profiles. Our results indicate that the three-dimensional structure, instead of being a mere scaffold for positioning amino acid residues, exerts such strong evolutionary constraints on the residues of the protein that its profile of sequence conservation essentially reflects that of its structural characteristics.
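As a rough illustration of comparing a structure-derived profile with a sequence-derived one, the sketch below computes a per-residue contact count from C-alpha coordinates, takes a reciprocal, and correlates two profiles with a Pearson coefficient. Everything specific here is an assumption rather than the authors' definition: the abstract does not say how packing density is computed, so the contact-count measure, the 10 Å `cutoff`, and the `1 / (1 + n)` form (chosen only to avoid division by zero) are hypothetical stand-ins.

```python
import math


def contact_numbers(ca_coords, cutoff=10.0):
    """Count, for each residue, the other C-alpha atoms within `cutoff` (assumed value)."""
    counts = []
    for i, a in enumerate(ca_coords):
        counts.append(
            sum(1 for j, b in enumerate(ca_coords) if j != i and math.dist(a, b) <= cutoff)
        )
    return counts


def reciprocal_profile(counts):
    """Hypothetical reciprocal-density profile: larger for loosely packed residues."""
    return [1.0 / (1 + n) for n in counts]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

A per-protein coefficient computed this way could then be compared against the 0.5 threshold quoted in the abstract, keeping in mind that the actual profiles used in the study may be defined differently.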

11.
A method of targeted random mutagenesis has been used to investigate the informational content of 25 residue positions in two alpha-helical regions of the N-terminal domain of lambda repressor. Examination of the functionally allowed sequences indicates that there is a wide range in tolerance to amino acid substitution at these positions. At positions that are buried in the structure, there are severe limitations on the number and type of residues allowed. At most surface positions, many different residues and residue types are tolerated. However, at several surface positions there is a strong preference for hydrophilic amino acids, and at one surface position proline is absolutely conserved. The results reveal the high level of degeneracy in the information that specifies a particular protein fold.

12.
MOTIVATION: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. RESULTS: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35522 positions in 284 alignments from the SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments. AVAILABILITY: The C code of the AL2CO program and its pre-compiled versions for several platforms as well as the details of the analysis are freely available at ftp://iole.swmed.edu/pub/al2co/.
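To make the entropy-based approach and the three-position averaging concrete, here is a minimal Python sketch. It is not the AL2CO code (which is written in C) and makes several simplifying assumptions: frequencies are unweighted, the gap character is counted as an ordinary symbol, and the score is the negative Shannon entropy in nats, so 0 means a fully conserved column and more negative values mean more variable ones. The function names are illustrative.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Entropy-based conservation score for each column of an alignment.

    alignment: list of equal-length aligned sequences (strings).
    Returns sum_a f_a * ln(f_a) per column, using unweighted frequencies
    and treating '-' like any other symbol (a simplification).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        scores.append(sum((n / total) * math.log(n / total) for n in counts.values()))
    return scores


def smooth(scores, window=3):
    """Average each score with its neighbours; the window is truncated at the ends."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out
```

Calling `smooth(column_conservation(alignment))` gives a window-averaged profile in which runs of high values would mark candidate motifs, under the assumptions listed above.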

13.
The diversity of function in some enzyme superfamilies shows that during evolution, enzymes have evolved to catalyze different reactions on the same structural scaffold. In this analysis, we examine in detail how enzymes can modify their chemistry, through a comparison of the catalytic residues and mechanisms in 27 pairs of homologous enzymes of totally different functions. We find that evolution is very economical. Enzymes retain structurally conserved residues to aid catalysis, including residues that bind catalytic metal ions and modulate cofactor chemistry. We examine the conservation of residue type and residue function in these structurally conserved residue pairs. Additionally, enzymes often retain common mechanistic steps catalyzed by structurally conserved residues. We have examined these steps in the context of their overall reactions.

14.
MOTIVATION: Structural alignments of superfamily members often exhibit insertions and deletions of secondary structure elements (SSEs), yet conserved subsets of SSEs appear to be important for maintaining the fold and facilitating common functionalities. RESULTS: A database of aligned SSEs was constructed from the structure-based alignments of protein superfamily members in the CAMPASS database. SSEs were classified into several types on the basis of their length and solvent accessibility and counts were made for the replacements of SSEs in different types at structurally aligned positions. The results, summarized as log-odds substitution matrices, can be used for two types of comparisons: (1) structure against structure, both with secondary structure assignments; and (2) structure against sequence with predicted secondary structures. The conservation of SSEs at each alignment position was defined as the deviation of observed SSE frequencies from the uniform distribution. This offers a useful resource to define and examine the core of superfamily folds. Even when the structure of only a single member of a superfamily is known, the extended method can be used to predict the conservation of SSEs. Such information will be useful when modelling the structure of other members of a superfamily or identifying structurally and functionally important positions in the fold.

15.
Proteins in the intracellular lipid-binding protein (iLBP) family show remarkably high structural conservation despite their low sequence identity. A multiple-sequence alignment using 52 sequences of iLBP family members revealed 15 fully conserved positions, with a disproportionately high number of these (n=7) located in the relatively small helical region. The conserved positions displayed high structural conservation based on comparisons of known iLBP crystal structures. It is striking that the beta-sheet domain had few conserved positions, despite its high structural conservation. This observation prompted us to analyze pair-wise interactions within the beta-sheet region to ask whether structural information was encoded in interacting amino acid pairs. We conducted this analysis on the iLBP family member, cellular retinoic acid-binding protein I (CRABP I), whose folding mechanism is under study in our laboratory. Indeed, an analysis based on a simple classification of hydrophobic and polar amino acids revealed a network of conserved interactions in CRABP I that cluster spatially, suggesting a possible nucleation site for folding. Significantly, a small number of residues participated in multiple conserved interactions, suggesting a key role for these sites in the structure and folding of CRABP I. The results presented here correlate well with available experimental evidence on folding of CRABPs and their family members and suggest future experiments. The analysis also shows the usefulness of considering pair-wise conservation based on a simple classification of amino acids, in analyzing sequences and structures to find common core regions among homologues.

16.
The Structural Motifs of Superfamilies (SMoS) database provides information about the structural motifs of aligned protein domain superfamilies. Such motifs among structurally aligned multiple members of protein superfamilies are recognized by the conservation of amino acid preference and solvent inaccessibility and are examined for the conservation of other features like secondary structural content, hydrogen bonding, non-polar interaction and residue packing. These motifs, along with their sequence and spatial orientation, represent the conserved core structure of each superfamily and also provide the minimal requirement of sequence and structural information to retain each superfamily fold.

17.
Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G‐protein‐coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on the χ² test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two methods out of seven, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a networking structure with a central residue involved in several high‐scoring pairs. This networking structure was observed for the three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and the subfamily specificity. These central residues can be viewed as evolutionary hubs, in relation with an epistasis‐based mechanism of functional divergence within a protein family. Proteins 2014; 82:2141–2156. © 2014 Wiley Periodicals, Inc.
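Of the covariation measures named in this abstract, mutual information is the simplest to write down. The sketch below is a plain, uncorrected estimate from raw column counts and is not taken from the study; the function name and the choice of natural-log units are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i, col_j: equal-length strings holding the residues found at two
    alignment positions, one character per sequence.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in count_ij.items():
        # p_ab / (p_a * p_b) simplifies to n_ab * n / (n_a * n_b).
        mi += (n_ab / n) * math.log(n_ab * n / (count_i[a] * count_j[b]))
    return mi
```

For example, `mutual_information("AAGG", "CCTT")` equals ln 2 because the two columns vary in lockstep, while pairing a fully conserved column such as `"AAAA"` with any other column gives 0. This is one way to see why covariation scores depend on the entropy of the columns involved.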

18.
A long-standing question in molecular biology is whether interfaces of protein-protein complexes are more conserved than the rest of the protein surfaces. Although it has been reported that conservation can be used as an indicator for predicting interaction sites on proteins, there are recent reports stating that the interface regions are only slightly more conserved than the rest of the protein surfaces, with conservation signals not being statistically significant enough for predicting protein-protein binding sites. In order to properly address these controversial reports we have studied a set of 28 well resolved hetero complex structures of proteins that consists of transient and non-transient complexes. The surface positions were classified into four conservation classes and the conservation index of the surface positions was quantitatively analyzed. The results indicate that the surface density of highly conserved positions is significantly higher in the protein-protein interface regions compared with the other regions of the protein surface. However, the average conservation index of the patches in the interface region is not significantly higher compared with other surface regions of the protein structures. This finding demonstrates that the number of conserved residue positions is a more appropriate indicator for predicting protein-protein binding sites than the average conservation index in the interacting region. We have further validated our findings on a set of 59 benchmark complex structures. Furthermore, an analysis of 19 complexes of antigen-antibody interactions shows that there is no conservation of amino acid positions in the interacting regions of these complexes, as expected, with the variable region of the immunoglobulins interacting mostly with the antigens. Interestingly, antigen interacting regions also have a higher number of non-conserved residue positions in the interacting region than the rest of the protein surface.

19.
In an effort to understand the driving forces behind antiparallel beta-sheet assembly, we have investigated the mutational tolerance of four pairs of residues in CspA, the major cold shock protein of E. coli. Two buried pairs and two exposed pairs of neighboring amino acids were separately randomized and the corresponding effects on protein stability were assessed using a protein expression screen. The thermal denaturation of a subset of the recovered proteins was measured by circular dichroism spectroscopy in order to determine the range of stabilities sampled by the expressed mutants. As anticipated, buried sites are substantially less tolerant of substitutions than exposed sites with more than half of the exposed residue combinations giving rise to stably folded proteins. The two exposed residue pairs, however, display different degrees of tolerance to substitution and accept different residue pair combinations. Except for the prohibition of proline from interior strand positions, no obvious correlations of mutant stability with any single parameter such as beta-sheet propensity or hydrophobicity can be detected. Mutant combinations recovered in both orientations (e.g. XY and YX) at a given exposed pair site often show markedly different stabilities, indicating that the local environment plays a substantial role in modulating the pairing preferences of residues in beta-sheets.

20.
Sullivan SA, Landsman D. Proteins 2003, 52(3):454-465
The three-helix, approximately 65-residue histone fold domain is the most structurally conserved part of the core histones H2A, H2B, H3, and H4. However, it evinces a notable degree of sequence variation within and between histone classes. We used two approaches to characterize sequence variation in these histone folds, toward elucidating their structure/function relationships and evolution. On the one hand we asked how much of the sequence variation seen in structure-based alignments of the folds maintains physicochemical properties at a position, and on the other, whether conservation correlates with structural importance, as measured by the number of residue-to-residue contacts a position makes. Strong physicochemical conservation or correlation of conservation with contacts would support the idea that functional constraints, rather than genetic drift, determine the observed range of variants at a given position. We used an 11-state table of physicochemical properties to classify each position in the core histone fold (CHF) alignments, and a public website (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl) to score conservation. We found that, depending on histone class, from 38 to 77% of CHF positions are maximally conserved physicochemically, and that for H2B, H3, and H4 the degree to which a position is conserved correlates positively with the number of contacts made by the residue at that position in the crystal structure of the nucleosome core particle. We also examined the correlation between conservation and the type of contact (e.g., inter- or intrachain, histone-histone, or histone-DNA, etc.). For H2B, H3, and H4 we found a positive correlation between conservation and number of interchain protein contacts. No such correlation or statistical significance was found for DNA or intrachain contacts. This suggests that variations in the CHF sequences could be functionally constrained by requirements to make sufficient interchain histone contacts. We also suggest that inventory of histone residue variants can augment functional studies of histones. An example is presented for histone H3.
