Similar Literature
 20 similar records found
1.
It has long been suspected that analysis of correlated amino acid substitutions should uncover pairs or clusters of sites that are spatially proximal in mature protein structures. Accordingly, methods based on different mathematical principles, such as information theory, correlation coefficients and maximum likelihood, have been developed to identify co-evolving amino acids from multiple sequence alignments. Sets of site pairs that these methods identify as correlated are often significantly enriched in spatially proximal residues. However, relatively high levels of false-positive predictions typically render such methods, in isolation, of little use in ab initio protein structure prediction. Misleading signal (or problems with estimating significance levels) can arise from phylogenetic correlations between homologous sequences and from correlation due to factors other than spatial proximity (for example, correlation between sites that are not spatially close but are involved in common functional properties of the protein). In recent years, several workers have suggested combining information from correlated substitutions with other sources of information (secondary structure, solvent accessibility, evolutionary rates) to reduce the proportion of false-positive predictions. We review methods for detecting correlated amino acid substitutions, compare their relative performance in contact prediction and predict future directions in the field.
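The information-theoretic family of methods mentioned above can be illustrated with a minimal sketch of the naive mutual information (MI) score between two alignment columns. The function names and the toy alignment are hypothetical; the estimator uses raw frequencies with no pseudocounts, gap treatment or phylogenetic correction, which is precisely the setting in which the false positives described above arise.

```python
import math
from collections import Counter


def column(msa, j):
    """Residues at alignment position j (0-based) across all sequences."""
    return [seq[j] for seq in msa]


def mutual_information(col_i, col_j):
    """Plug-in MI estimate (in nats) between two alignment columns.

    No pseudocounts, gap handling or phylogenetic correction are applied.
    """
    n = len(col_i)
    p_i = Counter(col_i)
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly, column 2 is invariant.
msa = ["AKL", "AKL", "GEL", "GEL"]
print(mutual_information(column(msa, 0), column(msa, 1)))  # ln 2, about 0.693
print(mutual_information(column(msa, 0), column(msa, 2)))  # 0.0
```

In practice, scores like these are computed for every column pair and ranked; the top-ranked pairs are the candidate contacts whose enrichment for spatial proximity the review evaluates.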

2.
The O(R) regions from several lambdoid bacteriophages contain three regulatory sites, O(R)1, O(R)2 and O(R)3, to which the Cro and CI proteins can bind. These sites show imperfect dyad symmetry, have similar sequences and generally lie on the same face of the DNA double helix. We have developed a computational method that analyzes the O(R) regions of additional phages and predicts the locations of these three sites. After tuning the method to predict known O(R) sites accurately, we used it to predict unknown sites and ultimately compiled a database of 32 known and predicted O(R) binding-site sets. We then identified the sequences of the recognition helices (RH) of the cognate Cro proteins through manual inspection of multiple sequence alignments. Comparison of Cro RH and consensus O(R) half-site sequences revealed strong one-to-one correlations between two amino acids at each of three RH positions and two bases at each of three half-site positions (H1→2, H3→5 and H6→6). In each of these three cases, one of the two amino acid/base pairings corresponds to a contact observed in the crystal structure of a lambda Cro/consensus operator complex. The alternative amino acid/base combinations were rationalized using structural models. We suggest that these pairs of amino acid residues act as binary switches that efficiently modulate specificity for different consensus half-site variants during evolution. The observation of structurally reasonable amino acid-to-base correlations suggests that Cro proteins share some common rules of recognition despite their functional and structural diversity.

3.
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we investigated whether covarying sites are clustered and whether such clustering could be used to enhance the predictive power of CMA. A large‐scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters and thereby reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally close together in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners. Proteins 2010. © 2009 Wiley‐Liss, Inc.

4.
A fast method to predict protein interaction sites from sequences   Times cited: 15 (self-citations: 0; citations by others: 15)
A simple method for predicting residues involved in protein interaction sites is proposed. In the absence of any structural data, the procedure identifies linear stretches of sequence as "receptor-binding domains" (RBDs) by analysing the hydrophobicity distribution. Sequences from two databases of non-homologous interaction sites eliciting various biological activities were tested; 59-80% were detected as RBDs. A statistical analysis of amino acid frequencies was carried out in known interaction sites and in predicted RBDs, the latter predicted from the 80,000 sequences of the Swissprot database. In both cases, arginine is the most frequently occurring residue. The RBD procedure can also detect residues involved in specific interaction sites such as DNA-binding domains (95% detected) and Ca-binding domains (83% detected). We report two recent analyses that proceed from the prediction of RBDs in sequences to the experimental demonstration of their functional activities; the examples concern a retroviral Gag protein and a penicillin-binding protein. We propose that this method is a quick way to predict protein interaction sites from sequences and is helpful for guiding experiments such as site-directed mutagenesis, two-hybrid screens or the synthesis of inhibitors.
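The abstract does not spell out how the hydrophobicity distribution is analysed, so the following is only a generic sketch of a sliding-window hydropathy profile with the Kyte-Doolittle scale, plus a hypothetical rule that merges consecutive low-scoring windows into candidate stretches. The scale choice, the window size, the threshold and the direction of flagging are all assumptions, not the published RBD criterion.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982); another scale could be swapped in.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=7):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def flag_stretches(profile, window, threshold):
    """Merge consecutive windows scoring below `threshold` into residue spans.

    Returns (start, end) pairs with `end` exclusive. Flagging the low
    (hydrophilic) side is a hypothetical rule chosen for illustration.
    """
    spans, start = [], None
    for i, score in enumerate(profile):
        if score < threshold and start is None:
            start = i
        elif score >= threshold and start is not None:
            spans.append((start, i - 1 + window))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1 + window))
    return spans


profile = hydropathy_profile("MKRRDEEKLLIVAG", window=5)
print(flag_stretches(profile, window=5, threshold=-1.0))
```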

5.
The identification of protein sites undergoing correlated evolution (coevolution) is of great interest because such pairs may tend to be adjacent in the three-dimensional structure. Identifying such pairs should provide useful information for understanding the evolutionary process, predicting the effects of site-directed substitution and, potentially, predicting protein structure. Here, we develop and apply a maximum likelihood method with the aim of improving the detection of coevolution. Unlike previous methods, which have had limited success, this method allows for correlations induced by phylogenetic relationships and for variation in the rate of evolution along branches, and does not rely on accurate reconstruction of ancestral nodes. To reduce the complexity of coevolutionary relationships and identify the primary component of pairwise coevolution between two sites, we reduce the data to a two-state system at each site, regardless of the actual number of residues observed at that site. Simulations show that this strategy is good at identifying simple correlations and at recognizing cases in which the data are insufficient to distinguish between coevolution and spurious correlations. The new method was tested by using size and charge characteristics to group the residues at each site and then evaluating coevolution in myoglobin sequences. Grouping based on physicochemical characteristics allows coevolving sites to be categorized as showing positive or negative coevolution, depending on the correlation between equilibrium state frequencies. We detected a striking excess of negative coevolution (corresponding to charge) at sites brought into proximity by the periodicity of the alpha-helix, and there was also a tendency for sites with significant likelihood ratios to be close in the three-dimensional structure.
Sites on the surface of the protein appear to coevolve both when they are close in the structure, and when they are distant, implying a role for folding and/or avoidance of quaternary structure in the coevolution process.
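The two-state reduction described above can be illustrated by recoding each alignment column as charged (1) or uncharged (0) and measuring how the two binary columns co-vary. This is a deliberately simpler stand-in, not the authors' maximum likelihood model: the phi coefficient used here ignores phylogeny and branch-specific rates entirely, and treating D, E, K and R as "charged" (histidine as uncharged) is an assumption. Its sign loosely mirrors the positive/negative distinction in the abstract.

```python
import math

CHARGED = set("DEKR")  # assumption: His is counted as uncharged


def to_two_state(col, group=CHARGED):
    """Recode an alignment column as 1 (residue in `group`) or 0 (otherwise)."""
    return [1 if aa in group else 0 for aa in col]


def phi_coefficient(x, y):
    """Pearson correlation of two 0/1 vectors (the phi coefficient).

    Uses the identity n11*n00 - n10*n01 == n*n11 - n1_*n_1.
    """
    n = len(x)
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n1_ = sum(x)
    n_1 = sum(y)
    denom = math.sqrt(n1_ * (n - n1_) * n_1 * (n - n_1))
    if denom == 0:
        return 0.0  # an invariant column carries no covariation signal
    return (n * n11 - n1_ * n_1) / denom


site_a = to_two_state("DDAA")
site_b = to_two_state("LLKK")
print(phi_coefficient(site_a, site_b))  # -1.0: charge is never present at both sites
```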

6.
Membrane proteins function in the diverse environment of the lipid bilayer. Experimental evidence suggests that some lipid molecules bind tightly to specific sites on the membrane protein surface. These lipid molecules often act as co-factors and play important functional roles. In this study, we assessed the evolutionary selection pressure experienced at lipid-binding sites in a set of α-helical and β-barrel membrane proteins using posterior probability analysis of the ratio of nonsynonymous to synonymous substitution rates (ω-ratio). We also carried out a geometric analysis of the membrane protein structures to identify residues in close contact with co-crystallized lipids. We found that residues forming cholesterol-binding sites in both the β2-adrenergic receptor and Na+/K+-ATPase exhibit strong conservation, which can be characterized by an expanded cholesterol consensus motif for GPCRs. Our results suggest the functional importance of aromatic stacking interactions and interhelical hydrogen bonds in facilitating protein-cholesterol interactions, which is now reflected in the expanded motif. We also find that residues forming the cardiolipin-binding site in the formate dehydrogenase-N γ-subunit and the phosphatidylglycerol-binding site in KcsA are under strong purifying selection pressure. Although the lipopolysaccharide (LPS)-binding site in the ferric hydroxamate uptake receptor (FhuA) is only weakly conserved, we show using a statistical mechanical model that LPS binds to the least stable FhuA β-strand and protects it from the bulk lipid. Our results suggest that specific lipid binding may be a general mechanism employed by β-barrel membrane proteins to stabilize weakly stable regions. Overall, we find that residues forming specific lipid-binding sites on the surfaces of membrane proteins often experience strong purifying selection pressure.

7.
Communication between distant sites often defines the biological role of a protein: long-range amino acid interactions are as important in binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires these residues to coevolve. Networks of interactions between coevolved residues can be reconstructed, and these networks may yield insights into the functional mechanisms of the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein, based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of every pair of coevolved residues is quantified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method removes the requirement for high sequence divergence that limits the applicability of previously proposed statistical approaches. We apply the method to four protein families, showing accurate detection of functional networks and the ability to handle sets of protein sequences of variable divergence.

8.
Serum albumin (SA) is a circulating protein that provides a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations, mainly of human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. Multiple sequence analysis of this set of entries leads to a cladistic tree for the molecular evolution of SA orthologs in vertebrates, showing the clustering of the considered species, with the lamprey SAs (Lethenteron japonicum and Petromyzon marinus) forming a separate outgroup. A search for conserved domains revealed that most SA sequences are made up of three repeated domains (about 600 residues), as extensively characterized for human SA. In contrast, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the ligand-binding-site sequences in SA revealed that, in all sites, most residues involved in ligand binding are conserved, although versatility towards different ligands could be peculiar to higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.

9.
La D, Kihara D. Proteins. 2012;80(1):126-141
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be identified well by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as the catalytic sites of enzymes. Consequently, the mutational behavior that leads to weak sequence conservation poses significant challenges for protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through a comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution patterns compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which uses these substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces or of nonbinding surfaces. BindML is shown to perform well compared with alternative methods for protein binding interface prediction. The methodology developed in this study is versatile in that it can be applied generally to predict other types of functional sites, such as DNA-, RNA- and membrane-binding sites in proteins.

10.
11.
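Eukaryotic translation elongation factor 1A (eEF1A) is a multifunctional protein that, during eukaryotic protein translation, delivers aminoacyl-tRNA to the ribosomal A site for the polypeptide elongation reaction. In this study, we used a range of bioinformatics tools to retrieve the protein sequence of translation elongation factor 1A from the planarian Schmidtea mediterranea (SmEF1A) and to search for eEF1A orthologs; based on 90 orthologous proteins, we carried out an evolutionary trace analysis of the eEF1A protein family and a comparative study of the functional sites of SmEF1A. The results show that 338 trace residue sites and 20 trace-residue-enriched regions were identified in the eEF1A family, and that the functional sites of SmEF1A are closely related to the trace residue sites. Important sites involved in GTP/Mg2+ binding, such as S21, T72, D91 and G94, are all trace residues conserved across the whole family. Among protein modification sites such as N-glycosylation and phosphorylation sites, trace residues are often the modified positions themselves or key auxiliary sites needed for the modification to function. The ligand-binding pockets on the molecular surface are closely associated with the trace-residue clusters formed on the surface by the 20 trace-residue-enriched regions. The evolutionary trace analysis of the eEF1A family provides important information for identifying key residues in the functional regions of eEF1A and for predicting functional sites that are still unknown.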

12.
Pairwise alignment incorporating dipeptide covariation   Times cited: 1 (self-citations: 0; citations by others: 1)
MOTIVATION: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations and by assessing the ability of this algorithm to detect remote homologies. RESULTS: Our analysis indicates that local correlations between substitutions are not strong on the average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.

13.
Evolution of protein sequences and structures.   Times cited: 9 (self-citations: 0; citations by others: 9)
The relationship between sequence similarity and structural similarity has been examined in 36 protein families with five or more diverse members whose structures are known. The structural similarity within a family (as determined with the DALI structure comparison program) is linearly related to sequence similarity (as determined by a Smith-Waterman search of the protein sequences in the structure database). The correlation between structural similarity and sequence similarity is very high; 18 of the 36 families had linear correlation coefficients r ≥ 0.878, and only nine had correlation coefficients r …
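The per-family statistic reported above is an ordinary linear correlation coefficient. As a hedged sketch, the following computes Pearson's r and a least-squares line for paired scores within one family; the input lists stand in for hypothetical Smith-Waterman and DALI scores, and no particular score normalization from the paper is assumed.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Hypothetical pairwise scores for one family (one entry per member pair).
seq_scores = [120.0, 250.0, 310.0, 480.0]
struct_scores = [8.1, 14.9, 17.2, 26.0]
r = pearson_r(seq_scores, struct_scores)
print(r, r >= 0.878)  # compare against the threshold quoted in the abstract
```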

14.
Autocorrelation and spectral analyses of amino acid residues along protein chains were performed on a large database. The results reveal the presence of general long-range correlations. Similar analyses of simulated (random) peptides do not exhibit any such long-range correlations. Based on these results, an attempt has been made to model the distribution of residues in protein sequences as a fractional Brownian motion and individual sequences as multifractals. For this purpose, the characteristics of a fractional Brownian motion, namely the scaling parameter H, the spectral exponent β and the fractal dimension D, have been described.
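The autocorrelation comparison above, real chains versus random peptides, can be sketched as follows. The abstract does not say how residues are converted to numbers, so the code assumes the sequence has already been mapped to a numeric series (for example via a hydropathy scale); the shuffled copy serves as a hypothetical "random peptide" control with the same composition. The spectral and fractal estimates (H, β, D) are not reproduced here.

```python
import random


def autocorrelation(values, max_lag):
    """Normalized autocorrelation C(k), k = 1..max_lag, of a numeric series."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev)
    if var == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(1, max_lag + 1)]


def shuffled_baseline(values, max_lag, seed=0):
    """Autocorrelation of a randomly permuted copy: a composition-preserving control."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    return autocorrelation(shuffled, max_lag)


series = [1.8, -4.5, 3.8, 4.2, -3.5, -3.9, 2.8, 1.9, -0.8, 4.5]  # hypothetical encoding
print(autocorrelation(series, 3))
print(shuffled_baseline(series, 3))
```

Long-range correlation would show up as C(k) staying away from zero at large k in real sequences while the shuffled controls decay to noise.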

15.
Three independent biological features, namely genomic organization, diagnostic amino acid sites and rare indels, were combined to elucidate the phylogeny of the vertebrate serpin (serine protease inhibitor) superfamily. A strong correlation between serpin gene families displaying (1) a conserved exon-intron pattern and (2) family-specific combinations of amino acid residues at specific sites suggests that present-day vertebrates encompass six serpin gene families, which evolved from primordial genes by massive intron insertion before or during early vertebrate radiation. Introns placed at homologous positions in gene sequences, combined with diagnostic sequence characters, may also constitute a reliable kinship indicator for other protein superfamilies.

16.
Given the massive increase in the number of new sequences and structures, a critical problem is how to integrate these raw data into meaningful biological information. One approach, the Evolutionary Trace, or ET, uses phylogenetic information to rank the residues in a protein sequence by evolutionary importance and then maps those ranked at the top onto a representative structure. If these residues form structural clusters, they can identify functional surfaces such as those involved in molecular recognition. Now that a number of examples have shown that ET can identify binding sites and focus mutational studies on their relevant functional determinants, we ask whether the method can be improved so as to be applicable on a large scale. To address this question, we introduce a new treatment of gaps resulting from insertions and deletions, which streamlines the selection of sequences used as input. We also introduce objective statistics to assess the significance of the total number of clusters and of the size of the largest one. As a result of the novel treatment of gaps, ET performance improves measurably. We find evolutionarily privileged clusters that are significant at the 5% level in 45 out of 46 (98%) proteins drawn from a variety of structural classes and biological functions. In 37 of the 38 proteins for which a protein-ligand complex is available, the dominant cluster contacts the ligand. We conclude that spatial clustering of evolutionarily important residues is a general phenomenon, consistent with the cooperative nature of residues that determine structure and function. In practice, these results suggest that ET can be applied on a large scale to identify functional sites in a significant fraction of the structures in the Protein Data Bank (PDB). This approach to combining raw sequences and structure to obtain detailed insights into the molecular basis of function should prove valuable in the context of the Structural Genomics Initiative.

17.
We investigate methods of estimating residue correlation within protein sequences. We begin by using the mutual information (MI) of adjacent residues, and improve our methodology by defining the mutual information vector (MIV) to estimate long-range correlations between nonadjacent residues. We also consider correlation based on residue hydropathy rather than protein-specific interactions. Finally, in family classification experiments, the modeling power of MIV was shown to be significantly better than that of the classic MI method, reaching the level where proteins can be classified without alignment information.
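One plausible reading of the mutual information vector is the list of MI values between residues separated by distance k = 1, 2, ..., K along a single sequence; the sketch below implements that reading. The exact MIV definition in the paper may differ (for instance in how residues are grouped or how small-sample bias is handled), so treat the formula as an assumption. Marginals are taken from the pair list itself so that each entry is the MI of a proper joint distribution.

```python
import math
from collections import Counter


def mi_at_distance(seq, k):
    """MI (nats) between residue identities at positions i and i + k."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c / n) / ((left[a] / n) * (right[b] / n)))
               for (a, b), c in joint.items())


def mutual_information_vector(seq, max_distance):
    """[MI(1), ..., MI(max_distance)]: an alignment-free feature vector for one sequence."""
    return [mi_at_distance(seq, k) for k in range(1, max_distance + 1)]


miv = mutual_information_vector("MKTAYIAKQRQISFVKSHFSRQ", 4)
print(miv)
```

Vectors like this could then be fed to any standard classifier; the MI of adjacent residues alone corresponds to the first entry.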

18.
Long-range two-body correlations in a DNA sequence should in theory approach a constant value very rapidly with increasing value of the correlation length. It is shown that for most DNA sequences, the long-range correlations exhibit oscillations superimposed on the constant background. These oscillations persist for very large correlation lengths. The oscillations are shown to be three-point cycles and are related to the coding regions in the DNA. A method for discovering the coding regions in DNA sequences is presented. The limitations of the method are discussed.
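The three-point cycles described above can be made visible with a simple two-body statistic. As a minimal sketch, assuming the correlation at distance k is measured as the excess probability that two bases k apart are identical, C(k) = P(s[i] = s[i+k]) - Σ_a p_a², which is near zero for a random sequence (the paper's exact correlation function may differ), a coding-like sequence shows peaks at k = 3, 6, 9, ...

```python
from collections import Counter


def two_body_correlation(seq, k):
    """C(k) = P(seq[i] == seq[i + k]) - sum over bases of p_a**2."""
    n = len(seq) - k
    if n <= 0:
        raise ValueError("k must be smaller than the sequence length")
    same = sum(1 for i in range(n) if seq[i] == seq[i + k]) / n
    background = sum((c / len(seq)) ** 2 for c in Counter(seq).values())
    return same - background


def correlation_profile(seq, max_k):
    """[C(1), ..., C(max_k)] for one DNA sequence."""
    return [two_body_correlation(seq, k) for k in range(1, max_k + 1)]


profile = correlation_profile("ATG" * 10, 6)
print([round(c, 3) for c in profile])  # peaks at k = 3 and k = 6
```

A coding-region scan would compute such a profile in sliding windows and look for windows where the period-3 peaks stand out; the window size and decision rule would be further assumptions.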

19.
Dihydrofolate reductase (DHFR) has attracted significant recent interest as a target for drugs against parasitic and opportunistic infections. Understanding the factors that influence inhibitor specificity among DHFR homologs is critical for designing compounds that selectively target DHFRs from pathogenic organisms over the human homolog. This paper presents a novel approach for predicting residues involved in ligand discrimination in a protein family, using DHFR as a model system. In this approach, the relationship between inhibitor specificity and amino acid composition is examined for sets of homolog pairs. Similar inhibitor specificity profiles correlate with increased sequence similarity at specific alignment positions. Residue positions that exhibit the strongest correlations are predicted as specificity determinants. Correlation analysis requires a quantitative measure of similarity in inhibitor specificity (S_lig) for a pair of homologs. To this end, a method was developed for calculating S_lig values using, as input, the K_I values of the two homologs against a set of inhibitors. Correlation analysis of S_lig values against amino acid sequence similarity scores, obtained via multiple sequence alignments, was performed for individual alignment positions and sets of residues on 13 DHFRs. Eighteen alignment positions were identified with a strong correlation of S_lig to sequence similarity. Of these, three lie in the active site, four are located near the active site, four are clustered together in the adenosine-binding domain and five are on the βFβG loop. The validity of the method is supported by agreement between experimental findings and current predictions involving active-site residues.
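The per-position correlation analysis can be sketched as follows, under simplifying assumptions: S_lig values for each homolog pair are supplied directly (deriving them from K_I values is not reproduced), and per-position similarity is reduced to 0/1 identity instead of a substitution-matrix score. All names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def position_correlations(seqs, s_lig):
    """Correlate per-position identity with S_lig across all homolog pairs.

    seqs:  dict name -> aligned sequence (all the same length)
    s_lig: dict frozenset({name_a, name_b}) -> inhibitor-specificity similarity
    """
    pairs = list(combinations(sorted(seqs), 2))
    targets = [s_lig[frozenset(p)] for p in pairs]
    length = len(next(iter(seqs.values())))
    result = []
    for j in range(length):
        identity = [1.0 if seqs[a][j] == seqs[b][j] else 0.0 for a, b in pairs]
        result.append(pearson_r(identity, targets))
    return result


seqs = {"h1": "AK", "h2": "AE", "h3": "GK"}
s_lig = {frozenset({"h1", "h2"}): 0.9, frozenset({"h1", "h3"}): 0.1,
         frozenset({"h2", "h3"}): 0.2}
scores = position_correlations(seqs, s_lig)
print(max(range(len(scores)), key=scores.__getitem__))  # position 0 ranks highest
```

Positions would then be ranked by their correlation, with the top-ranked ones proposed as candidate specificity determinants.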

20.
A fundamental goal in cellular signaling is to understand allosteric communication, the process by which signals originating at one site in a protein propagate reliably to affect distant functional sites. The general principles of protein structure that underlie this process remain unknown. Here, we describe a sequence-based statistical method for quantitatively mapping the global network of amino acid interactions in a protein. Application of this method for three structurally and functionally distinct protein families (G protein-coupled receptors, the chymotrypsin class of serine proteases and hemoglobins) reveals a surprisingly simple architecture for amino acid interactions in each protein family: a small subset of residues forms physically connected networks that link distant functional sites in the tertiary structure. Although small in number, residues comprising the network show excellent correlation with the large body of mechanistic data available for each family. The data suggest that evolutionarily conserved sparse networks of amino acid interactions represent structural motifs for allosteric communication in proteins.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417