Similar articles
1.
To study structural aspects of sequence conservation in families of homologous proteins, we have analyzed structurally aligned sequences of 585 proteins grouped into 128 homologous families. The conservation of a residue in a family is defined as the average residue similarity at a given position of the aligned sequences. The residue similarities were expressed as log-odds substitution tables that take into account the environments of amino acids in three-dimensional structures. The protein core is defined as those residues that have less than 7% solvent accessibility. The density of a protein core is described in terms of atom packing, which is investigated as a criterion for residue substitution and conservation. Although there is no significant correlation between sequence conservation and average atom packing around nonpolar residues such as leucine, valine and isoleucine, a significant correlation is observed for polar residues in the protein core. This may be explained by the hydrogen bonds in which polar residues are involved: the better they are protected from water, the more stable the structure at that position should be. Proteins 33:358–366, 1998. © 1998 Wiley-Liss, Inc.
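The conservation definition above (average residue similarity at one alignment position) can be sketched as the mean substitution score over all residue pairs in a column. This is a minimal illustration, not the study's method: `TOY_TABLE` is a hypothetical log-odds table with made-up values, and it ignores the structural environments that the study's tables account for.

```python
from itertools import combinations

# Hypothetical toy log-odds table (NOT the environment-specific tables of the study).
TOY_TABLE = {
    ("L", "L"): 4, ("I", "I"): 4, ("V", "V"): 4,
    ("L", "I"): 2, ("L", "V"): 1, ("I", "V"): 3,
}

def score(a, b, table=TOY_TABLE, default=-1):
    """Symmetric table lookup; pairs missing from the table get `default`."""
    return table.get((a, b), table.get((b, a), default))

def column_conservation(column, table=TOY_TABLE):
    """Average similarity over all residue pairs in one alignment column."""
    pairs = list(combinations(column, 2))
    if not pairs:
        raise ValueError("a column needs at least two residues")
    return sum(score(a, b, table) for a, b in pairs) / len(pairs)

print(column_conservation("LLIV"))  # mean of six pairwise scores
```

Averaging over pairs, rather than comparing each residue to a consensus, keeps the score independent of which sequence is taken as the reference.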

2.
Liu XS  Guo WL 《Amino acids》2008,34(4):643-652
Measuring residue conservation at aligned positions has many applications in biology. Recently, a new conservation score has been defined. Unlike previous methods, the new approach considers both residue frequencies and physicochemistries. Specifically, it measures physicochemistries based on BLOSUM matrices but disregards the meaning of the entries in such matrices, which may introduce a log–log probability problem. In this paper we present a conservation measure that also reflects both frequencies and physicochemistries while taking into account that the entries of BLOSUM matrices are already interpreted as log probabilities. When the proposed score is applied to 14 protein examples, the results show that the two conservation scores are equivalent apart from their different score ranges. The method is also used to score the functional sites of three protein families. Compared with the widely used entropy-based methods, the resulting scores are more robust and consistent, in the sense that the functional sites score as much more conserved, as expected from their functional constraints.

3.
The conservation of residues in columns of a multiple sequence alignment (MSA) reflects the importance of these residues for maintaining the structure and function of a protein. To date, many scores have been suggested for quantifying residue conservation, but none has achieved full rigor in both biology and statistics. In this paper, we present a new approach for measuring evolutionary conservation at aligned positions. Our conservation measure is related to the logarithmic probabilities of aligned positions and combines the physicochemical properties and the frequencies of amino acids. Such a measure is both biologically and statistically meaningful. When testing the relationship between an amino acid's evolutionary conservation and its role in protein folding kinetics as defined by Phi-values, our results, obtained with the biological-relevance-weighted statistical scoring method proposed here as an alternative to entropy-based procedures, indicate that folding-nucleus residues may not be significantly more conserved than other residues.

4.
Shih CH  Chang CM  Lin YS  Lo WC  Hwang JK 《Proteins》2012,80(6):1647-1657
Knowledge of conserved sequences in proteins is valuable in identifying functionally or structurally important residues. Generating the conservation profile of a sequence requires aligning families of homologous sequences and having knowledge of their evolutionary relationships. Here, we report that the conservation profile at the residue level can be quantitatively derived from a single protein structure with only backbone information. We found that the reciprocal packing density profiles of protein structures closely resemble their sequence conservation profiles. For a set of 554 nonhomologous enzymes, 74% (408/554) of the proteins have a correlation coefficient > 0.5 between these two profiles. Our results indicate that the three-dimensional structure, instead of being a mere scaffold for positioning amino acid residues, exerts such strong evolutionary constraints on the residues of the protein that its profile of sequence conservation essentially reflects that of its structural characteristics.
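The profile comparison described above can be sketched as follows: count C-alpha neighbours within a cutoff for each residue, take the reciprocal as a packing-based profile, and compute its Pearson correlation with a per-residue conservation profile. This is a minimal sketch under stated assumptions, not the authors' procedure: the 10 Å cutoff, the +1 offset, and the use of simple neighbour counts (rather than any weighted density) are all assumptions, and the conservation profile is assumed to be expressed as per-site variability (for example column entropy), so that sparsely packed sites pair with variable sites.

```python
import math

def contact_counts(coords, cutoff=10.0):
    """For each residue, count the other C-alpha atoms within `cutoff` angstroms."""
    counts = []
    for i, a in enumerate(coords):
        counts.append(sum(1 for j, b in enumerate(coords)
                          if i != j and math.dist(a, b) <= cutoff))
    return counts

def reciprocal_packing_profile(coords, cutoff=10.0):
    """Reciprocal packing density; the +1 avoids division by zero for isolated residues."""
    return [1.0 / (n + 1) for n in contact_counts(coords, cutoff)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Given C-alpha coordinates `ca` and a variability profile `var` of the same length (both hypothetical inputs), `pearson(reciprocal_packing_profile(ca), var) > 0.5` would correspond to the threshold quoted in the abstract.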

5.
Protein interfaces are thought to be distinguishable from the rest of the protein surface by their greater degree of residue conservation. We test the validity of this approach on an expanded set of 64 protein-protein interfaces using conservation scores derived from two multiple sequence alignment types, one of close homologs/orthologs and one of diverse homologs/paralogs. Overall, we find that the interface is slightly more conserved than the rest of the protein surface when using either alignment type, with alignments of diverse homologs showing marginally better discrimination. However, using a novel surface-patch definition, we find that the interface is rarely significantly more conserved than other surface patches when using either alignment type. When an interface is among the most conserved surface patches, it tends to be part of an enzyme active site. The most conserved surface patch overlaps with 39% (+/- 28%) and 36% (+/- 28%) of the actual interface for diverse and close homologs, respectively. Contrary to results obtained from smaller data sets, this work indicates that residue conservation is rarely sufficient for complete and accurate prediction of protein interfaces. Finally, we find that obligate interfaces differ from transient interfaces in that the former have significantly fewer alignment gaps at the interface than the rest of the protein surface, as well as having buried interface residues that are more conserved than partially buried interface residues.
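The patch-based test above can be illustrated with a simplified stand-in: build one patch per surface residue from the surface residues within a radius, pick the patch with the highest mean conservation, and report what fraction of the true interface it covers. The abstract does not spell out its "novel surface-patch definition", so the distance-based patches and the 10 Å radius here are hypothetical choices, not the study's.

```python
import math

def surface_patches(surface_coords, radius=10.0):
    """Hypothetical patch definition: each surface residue plus all surface
    residues within `radius` angstroms of it (surface_coords: id -> (x, y, z))."""
    return {
        i: {j for j, cj in surface_coords.items() if math.dist(ci, cj) <= radius}
        for i, ci in surface_coords.items()
    }

def most_conserved_patch(patches, conservation):
    """Patch with the highest mean conservation (conservation: id -> score)."""
    return max(patches.values(),
               key=lambda p: sum(conservation[r] for r in p) / len(p))

def interface_overlap(patch, interface):
    """Fraction of the actual interface residues covered by the patch."""
    return len(patch & interface) / len(interface)
```

Averaging this overlap over a benchmark would give a figure comparable in kind to the 39% and 36% reported above, though not in value, since the patch definition differs.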

6.
Identifying protein–protein interfaces is crucial for structural biology. Because of the limitations of wet-lab experiments, many computational methods have been proposed. Here, we propose a new method that predicts protein–protein interaction interface residues in heterocomplexes purely from evolutionary information, without any information about the partner chains. Unlike traditional approaches that use multiple sequence alignment profiles to represent the conservation level of each residue, we make predictions based on residue conservation scores, so that the dimension of the feature vector for each residue is drastically reduced, at least 20 times smaller than in conventional methods. With this representation, a simple linear discriminant function is used to make predictions, so the computational complexity of the whole prediction procedure is also greatly decreased. Tests on 69 heterocomplex chains show that the performance of our approach is superior to that of existing methods.

7.
Wang B  Chen P  Huang DS  Li JJ  Lok TM  Lyu MR 《FEBS letters》2006,580(2):380-384
This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments, while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machine algorithm are constructed to predict whether a surface residue is part of a protein-protein interface. The efficiency and effectiveness of the proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

8.
Many protein pairs that share the same fold do not have any detectable sequence similarity, providing a valuable source of information for studying the sequence-structure relationship. In this study, we use a stringent data set of structurally similar, sequence-dissimilar protein pairs to characterize residues that may play a role in the determination of protein structure and/or function. For each protein in the database, we identify amino-acid positions that show residue conservation within both close and distant family members. These positions are termed "persistently conserved". We then determine the "mutually" persistently conserved (MPC) positions: those structurally aligned positions in a protein pair that are persistently conserved in both pair mates. Because of their intra- and interfamily conservation, these positions are good candidates for determining protein fold and function. We find that 45% of the persistently conserved positions are mutually conserved. A significant fraction of them are located in critical positions for secondary structure determination, they are mostly buried, and many of them form spatial clusters within their protein structures. A substitution matrix based on the subset of MPC positions shows two distinct characteristics: (i) it is different from other available matrices, even those that are derived from structural alignments; (ii) its relative entropy is high, emphasizing the special residue restrictions imposed on these positions. Such a substitution matrix should be valuable for protein design experiments.
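The two definitions above reduce to set operations once per-position conservation scores are available. In this minimal sketch the single shared `threshold` and the score dictionaries are hypothetical inputs; the study's actual conservation criteria for close and distant family members are not given in the abstract.

```python
def persistently_conserved(close_scores, distant_scores, threshold):
    """Positions whose conservation meets `threshold` in both the close-homolog
    and the distant-homolog alignments (scores: dict position -> score)."""
    return {pos for pos, s in close_scores.items()
            if s >= threshold
            and distant_scores.get(pos, float("-inf")) >= threshold}

def mutually_persistent(pc_a, pc_b, structural_alignment):
    """Structurally aligned pairs (pos_a, pos_b) that are persistently
    conserved in both pair mates."""
    return [(a, b) for a, b in structural_alignment if a in pc_a and b in pc_b]
```

The fraction `len(mutually_persistent(...)) / len(pc_a)` is the kind of quantity behind the 45% figure quoted above.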

9.
Amino acid sequence alignment is an extremely useful tool in protein family analysis. Most family characteristics, such as the localization of functional residues, structural constraints and evolutionary relationships, may be retrieved by observing the conservation pattern highlighted by the alignments. A quantitative score for the conservation in the alignment allows different stages of an alignment to be compared and, consequently, the alignment information to be exploited efficiently. Many scoring methods have been proposed during the last three decades. Claude Shannon's theory of communication (1948) paved the way for consistent scoring of protein alignments by considering the residue (or symbol) frequency. A number of modifications have been proposed since then, but the core statistical approach is still considered one of the best. The Entropy Calculator software has been developed by combining database-management tools for handling protein sequences, ClustalW integration, flexible symbol handling, and gap-normalization functions. This new tool provides a global and optimal approach to multiple sequence alignment scoring, offering an easy graphical interface and a series of modification options that help in interpreting alignments and allow inferences about conservation patterns to be made.
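The frequency-based Shannon score mentioned above can be sketched for a single alignment column. This is not Entropy Calculator's code: its gap-normalization functions are not described here, so the `(1 - gap_fraction)` weighting and the scaling by log2(20) below are assumptions chosen for illustration.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())

def gap_weighted_conservation(column, gap="-", alphabet_size=20):
    """Assumed normalization: 1 - H/log2(20), down-weighted by the gap fraction."""
    h = column_entropy(column, gap)
    gap_fraction = column.count(gap) / len(column)
    return (1 - h / math.log2(alphabet_size)) * (1 - gap_fraction)
```

A fully conserved, gap-free column scores 1.0; a column that is half gaps can score at most 0.5 under this weighting.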

10.
Sullivan SA  Landsman D 《Proteins》2003,52(3):454-465
The three-helix, approximately 65-residue histone fold domain is the most structurally conserved part of the core histones H2A, H2B, H3, and H4. However, it evinces a notable degree of sequence variation within and between histone classes. We used two approaches to characterize sequence variation in these histone folds, toward elucidating their structure/function relationships and evolution. On the one hand, we asked how much of the sequence variation seen in structure-based alignments of the folds maintains physicochemical properties at a position; on the other, we asked whether conservation correlates with structural importance, as measured by the number of residue-to-residue contacts a position makes. Strong physicochemical conservation or correlation of conservation with contacts would support the idea that functional constraints, rather than genetic drift, determine the observed range of variants at a given position. We used an 11-state table of physicochemical properties to classify each position in the core histone fold (CHF) alignments, and a public website (http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/valdar/scorecons_server.pl) to score conservation. We found that, depending on histone class, 38 to 77% of CHF positions are maximally conserved physicochemically, and that for H2B, H3, and H4 the degree to which a position is conserved correlates positively with the number of contacts made by the residue at that position in the crystal structure of the nucleosome core particle. We also examined the correlation between conservation and the type of contact (e.g., inter- or intrachain, histone-histone, or histone-DNA). For H2B, H3, and H4 we found a positive correlation between conservation and the number of interchain protein contacts. No such correlation or statistical significance was found for DNA or intrachain contacts.
This suggests that variations in the CHF sequences could be functionally constrained by the requirement to make sufficient interchain histone contacts. We also suggest that an inventory of histone residue variants can augment functional studies of histones. An example is presented for histone H3.
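The contact analysis above can be sketched as counting, per residue, contacts with other chains separately from intrachain contacts, then rank-correlating those counts with conservation scores. Everything specific here is an assumption, not the study's setup: one representative atom per residue, a 4 Å cutoff, skipping adjacent residues within a chain, and Spearman correlation as the test statistic.

```python
import math
from collections import defaultdict

def count_contacts(residues, cutoff=4.0):
    """residues: list of (chain_id, position, (x, y, z)), one atom per residue.
    Returns (interchain, intrachain) contact counts keyed by (chain, position)."""
    inter, intra = defaultdict(int), defaultdict(int)
    for i, (ci, pi, xi) in enumerate(residues):
        for j, (cj, pj, xj) in enumerate(residues):
            if i == j or math.dist(xi, xj) > cutoff:
                continue
            if ci != cj:
                inter[(ci, pi)] += 1
            elif abs(pi - pj) > 1:  # ignore sequence neighbours in the same chain
                intra[(ci, pi)] += 1
    return inter, intra

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den
```

A rank correlation is used here because contact counts are small integers with many ties, which a Pearson coefficient handles less gracefully.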

11.
Covariation between positions in a multiple sequence alignment may reflect structural, functional, and/or phylogenetic constraints and can be analyzed by a wide variety of methods. We explored several of these methods for their ability to identify covarying positions related to the divergence of a protein family at different hierarchical levels. Specifically, we compared seven methods on a model system composed of three nested sets of G‐protein‐coupled receptors (GPCRs) in which a divergence event occurred. The covariation methods analyzed were based on: χ2 test, mutual information, substitution matrices, and perturbation methods. We first analyzed the dependence of the covariation scores on residue conservation (measured by sequence entropy), and then we analyzed the networking structure of the top pairs. Two methods out of seven, OMES (Observed minus Expected Squared) and ELSC (Explicit Likelihood of Subset Covariation), favored pairs with intermediate entropy and a networking structure with a central residue involved in several high‐scoring pairs. This networking structure was observed for the three sequence sets. In each case, the central residue corresponded to a residue known to be crucial for the evolution of the GPCR family and the subfamily specificity. These central residues can be viewed as evolutionary hubs, in relation to an epistasis‐based mechanism of functional divergence within a protein family. Proteins 2014; 82:2141–2156. © 2014 Wiley Periodicals, Inc.
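Mutual information, one of the covariation families compared above, can be sketched for a pair of alignment columns. This is plain MI in bits with gapped rows dropped (an assumption); it is not OMES or ELSC, and it applies none of the corrections that published MI variants use.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j, gap="-"):
    """Mutual information (bits) between two alignment columns, ignoring
    rows where either column has a gap."""
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum((nab / n) * math.log2(nab * n / (left[a] * right[b]))
               for (a, b), nab in joint.items())
```

A fully conserved column always scores 0 against any partner, which illustrates why covariation scores depend on column entropy, the dependence analyzed in the abstract.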

12.
Three-dimensional cluster analysis offers a method for the prediction of functional residue clusters in proteins. This method requires a representative structure and a multiple sequence alignment as input data. Individual residues are represented in terms of regional alignments that reflect both their structural environment and their evolutionary variation, as defined by the alignment of homologous sequences. From the overall (global) and the residue-specific (regional) alignments, we calculate the global and regional similarity matrices, containing scores for all pairwise sequence comparisons in the respective alignments. Comparing the matrices yields two scores for each residue. The regional conservation score (C_R(x)) defines the conservation of each residue x and its neighbors in 3D space relative to the protein as a whole. The similarity deviation score (S(x)) detects residue clusters with sequence similarities that deviate from the similarities suggested by the full-length sequences. We evaluated 3D cluster analysis on a set of 35 families of proteins with available cocrystal structures, showing small ligand interfaces, nucleic acid interfaces and two types of protein-protein interfaces (transient and stable). We present two examples in detail: fructose-1,6-bisphosphate aldolase and the mitogen-activated protein kinase ERK2. We found that the regional conservation score (C_R(x)) identifies functional residue clusters better than a scoring scheme that does not take 3D information into account. C_R(x) is particularly useful for the prediction of poorly conserved, transient protein-protein interfaces. Many of the proteins studied contained residue clusters with elevated similarity deviation scores. These residue clusters correlate with specificity-conferring regions: 3D cluster analysis therefore represents an easily applied method for the prediction of functionally relevant spatial clusters of residues in proteins.

13.
As a result of rapid advances in genome sequencing, the pace of discovery of new protein sequences has surpassed that of structure and function determination by orders of magnitude. This is also true for metal-binding proteins, that is, proteins that bind one or more metal atoms necessary for their biological function. While metal binding site geometry and composition have been extensively studied, no large-scale investigation of the conservation of metal-coordinating residues has been pursued so far. In pursuing this analysis, we were able to corroborate anecdotal evidence that certain residues are preferred to others for binding to certain metals. The conservation of most metal-coordinating residues is correlated with residue preference in a statistically significant manner. Additionally, we established a statistically significant difference in conservation between metal-coordinating and noncoordinating residues. These results could provide better insight into the functional importance of metal-coordinating residues, possibly aiding metal binding site prediction and design, metal-protein complex structure prediction, drug discovery, and model fitting to electron-density maps produced by X-ray crystallography.

14.
Bahir I  Linial M 《Proteins》2006,63(4):996-1004
The two ends of each protein are known as the amino (N-) and carboxyl (C-) termini. Short signatures in a protein's termini often carry vital cellular functions. No systematic research has been conducted to address the importance of short signatures (3 to 10 amino acids) in protein termini at the proteomic level. Specifically, it is unknown whether such signatures are evolutionarily conserved, and if so, whether this conservation confers shared biological functions. Current signature detection methods fail to detect such short signatures because of inadequate statistical scores. The findings presented in this study strongly support the notion that the functional significance of protein sets may be captured by short signatures at their termini. A positional search method was applied to over one million proteins from the UniProt database. The result is a collection of about a thousand significant signature groups (SIGs) that include previously identified as well as many novel signatures in protein termini. These SIGs represent protein sets with minimal or no overall sequence similarity except for the similarity at their termini. The most significant SIGs are assigned by their strong correspondence to functional annotations derived from external databases such as Gene Ontology. Each of the SIGs is associated with the statistical significance of its functional association. These SIGs provide a valuable source for testing previously overlooked signatures in protein termini and allow the role played by such signatures throughout evolution to be investigated. The SIGs archive and advanced search options are available at http://www.proteus.cs.huji.ac.il.
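The first step of a positional terminal search can be pictured as grouping proteins that share an identical k-residue N- or C-terminal string. This is a deliberately simplified sketch: exact-match k-mers stand in for the study's positional search, and it omits the statistical scoring and Gene Ontology association that define the published SIGs. The function name and `min_size` parameter are hypothetical.

```python
from collections import defaultdict

def terminal_signature_groups(proteins, k=4, terminus="C", min_size=2):
    """Group protein names by their first (N) or last (C) k residues,
    keeping only groups with at least `min_size` members."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    groups = defaultdict(list)
    for name, seq in proteins.items():
        if len(seq) < k:
            continue  # too short to carry a k-residue signature
        sig = seq[-k:] if terminus == "C" else seq[:k]
        groups[sig].append(name)
    return {sig: names for sig, names in groups.items() if len(names) >= min_size}

print(terminal_signature_groups({"p1": "MSTAKDEL", "p2": "MLLPKDEL", "p3": "MAAAGSKL"}))
```

Here the two toy sequences ending in KDEL (the familiar ER-retention motif) fall into one group, while the third is discarded as a singleton.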

15.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove, where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often (though not always) follow a universal code of amino acid-base recognition, and the effects of amino acid mutations can be most easily understood for these interactions.

16.
A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method for predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
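The single-feature prediction above can be sketched as dropping negatively scored residues from each candidate patch and ranking the patches by the mean electrostatic score of the residues that remain. The input dictionaries and the mean-based patch score are assumptions made for illustration; the abstract does not give the exact scoring formula.

```python
def predict_binding_patch(patches, potential):
    """patches: patch id -> set of residue ids; potential: residue id -> electrostatic score.
    Returns (patch id, score) of the best patch after excluding negative residues."""
    best_id, best_score = None, float("-inf")
    for pid, residues in patches.items():
        kept = [r for r in residues if potential[r] >= 0]  # drop negative residues
        if not kept:
            continue
        score = sum(potential[r] for r in kept) / len(kept)
        if score > best_score:
            best_id, best_score = pid, score
    return best_id, best_score
```

Excluding negative residues before averaging keeps a strongly positive region from being diluted by an acidic neighbour, which matches the motivation stated in the abstract.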

17.
Kuznetsov IB 《Proteins》2008,72(1):74-87
Ordered conformational changes are an important structural property of proteins and are involved in a variety of fundamental biological activities. Large-scale analyses of the implications of such changes for protein function and dysfunction require efficient methods for automated recognition of conformationally variable residue positions. The goal of this work was to study the sequence and low-resolution structural properties of residue positions that change backbone conformation upon changes in protein environment, and the utility of these properties for automated recognition of such conformationally variable positions. This study was performed using a large nonredundant set of experimentally characterized proteins that undergo ordered conformational transitions, obtained from the Database of Macromolecular Movements. The results of this study show that ordered changes in backbone conformation are not limited to solvent-accessible loop regions. A considerable fraction of conformationally variable positions is observed in helices and strands, and in buried positions. Conformationally variable positions are less conserved in evolution. Local patterns of (a) sequence neighbors, (b) evolutionary conservation, and (c) solvent accessibility can be used to predict conformationally variable positions with balanced sensitivity and specificity, albeit with large variance at the level of individual proteins. However, including a pattern of secondary structure in the prediction scheme results in a highly unbalanced performance, in which all conformationally variable positions located in regular secondary structure are misclassified. Application of the present methodology to the prion protein (PrP) shows that conformationally variable positions predicted in its ordered C-terminal domain are located within segments presumed to be involved in refolding of PrP.

18.
A long-standing question in molecular biology is whether interfaces of protein-protein complexes are more conserved than the rest of the protein surface. Although it has been reported that conservation can be used as an indicator for predicting interaction sites on proteins, recent reports state that interface regions are only slightly more conserved than the rest of the protein surface, with conservation signals not statistically significant enough for predicting protein-protein binding sites. To address these conflicting reports, we have studied a set of 28 well-resolved heterocomplex structures of proteins consisting of transient and non-transient complexes. The surface positions were classified into four conservation classes, and the conservation index of the surface positions was quantitatively analyzed. The results indicate that the surface density of highly conserved positions is significantly higher in the protein-protein interface regions than in the other regions of the protein surface. However, the average conservation index of the patches in the interface region is not significantly higher than that of other surface regions of the protein structures. This finding demonstrates that the number of conserved residue positions is a more appropriate indicator for predicting protein-protein binding sites than the average conservation index in the interacting region. We have further validated our findings on a set of 59 benchmark complex structures. Furthermore, an analysis of 19 complexes of antigen-antibody interactions shows that, as expected, there is no conservation of amino acid positions in the interacting regions of these complexes, with the variable region of the immunoglobulins interacting mostly with the antigens. Interestingly, antigen-interacting regions also have a higher number of non-conserved residue positions in the interacting region than the rest of the protein surface.

19.
Summary: Increasing data on Drosophila alcohol dehydrogenase (ADH) sequences have made it possible to calculate the rate of amino acid replacement per year, which is 1.7×10^-9. This value makes this protein suitable for reconstructing phylogenetic relationships within the genus for those species for which no molecular data are available, such as Scaptodrosophila. The amino acid sequence of Drosophila lebanonensis is compared to all of the already known Drosophila ADHs, stressing the unique characteristic features of this protein, such as the conservation of an initiating methionine at the N-terminus, the unique replacement of a glycine by an alanine at a very conserved position in the NAD domain of all dehydrogenases, the lack of a slow-migrating peptide, and the total conservation of the maximally hydrophilic peptide. The functional significance of these features is discussed. Although the percent amino acid identity of the ADH molecule in Drosophila decreases as the number of sequences compared increases, the conservation of residue type in terms of size and hydrophobicity for the ADH molecule is shown to be very high throughout the genus Drosophila. The distance matrix and parsimony methods used to establish the phylogenetic relationships of D. lebanonensis show that the three subgenera, Scaptodrosophila, Drosophila, and Sophophora, separated at approximately the same time.
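A replacement rate like the 1.7×10^-9 quoted above is typically used in a molecular-clock calculation of the form T = K / (2r), where K is the number of amino acid replacements per site separating two sequences and the factor 2 accounts for changes accumulating along both lineages. This sketch assumes the quoted rate is per site per lineage per year and uses the Poisson correction K = -ln(1 - p) for the observed proportion p of differing sites; the abstract does not state which distance correction the authors used.

```python
import math

RATE = 1.7e-9  # replacements per site per year (from the abstract; per-lineage assumed)

def poisson_distance(p):
    """Poisson-corrected replacements per site from the proportion p of differing sites."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1 - p)

def divergence_time(p, rate=RATE):
    """Years since two lineages split, assuming a constant clock: T = K / (2 * rate)."""
    return poisson_distance(p) / (2 * rate)

print(f"{divergence_time(0.10):.3g} years")  # two ADHs differing at 10% of sites
```

Under these assumptions, two ADH sequences differing at 10% of aligned sites would have split roughly 31 million years ago; the 10% figure is a hypothetical input, not a value from the study.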

20.
Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. Nevertheless, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.
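The core step of consensus design, taking the most frequent residue at each aligned position, can be sketched as follows. This is a generic majority-rule consensus, not the authors' pipeline: gap handling and first-seen tie-breaking are assumptions, and real consensus design usually adds weighting and manual review of the proposed mutations.

```python
from collections import Counter

def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.
    Gaps are ignored when counting; an all-gap column stays a gap.
    Ties go to the residue seen first (Counter.most_common ordering)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)

print(consensus(["MKVL", "MRVL", "MKIL"]))
```

With input drawn from a randomized library rather than natural homologs, as the abstract proposes, the same procedure applies; only the source of sequence diversity changes.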

