Similar literature (20 results)
1.
Conservation of residue interactions in a family of Ca-binding proteins (cited 1 time in total: 0 self-citations, 1 by others)
In the TNC family of Ca-binding proteins (calmodulin, parvalbumin, intestinal calcium binding protein and troponin C) approximately 70 well-conserved amino acid sequences and six crystal structures are known. We find a clear correlation between residue contacts in the structures and residue conservation in the sequences: residues with strong sidechain-sidechain contacts in the three-dimensional structure tend to be more conserved in the sequence. This is one way to quantify the intuitive notion of the importance of sidechain interactions for maintaining protein three-dimensional structure in evolution and may usefully be taken into account in planning point mutations in protein engineering.

2.
Communication between distant sites often defines the biological role of a protein: amino acid long-range interactions are as important in binding specificity, allosteric regulation and conformational change as residues directly contacting the substrate. Maintaining the functional and structural coupling of long-range interacting residues requires coevolution of these residues. Networks of interaction between coevolved residues can be reconstructed, and from the networks, one can possibly derive insights into functional mechanisms for the protein family. We propose a combinatorial method for mapping conserved networks of amino acid interactions in a protein which is based on the analysis of a set of aligned sequences, the associated distance tree and the combinatorics of its subtrees. The degree of coevolution of all pairs of coevolved residues is identified numerically, and networks are reconstructed with a dedicated clustering algorithm. The method drops the constraints on high sequence divergence limiting the range of applicability of the statistical approaches previously proposed. We apply the method to four protein families where we show an accurate detection of functional networks and the possibility to treat sets of protein sequences of variable divergence.

3.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

4.
This paper discusses the benefit of mapping paired cysteine mutation patterns as a guide to identifying the positions of protein disulfide bonds. This information can facilitate the computer modeling of protein tertiary structure. First, a simple, paired natural-cysteine-mutation map is presented that identifies the positions of putative disulfide bonds in protein families. The method is based on the observation that if, during the process of evolution, a disulfide-bonded cysteine residue is not conserved, then it is likely that its counterpart will also be mutated. For each target protein, protein databases were searched for the primary amino acid sequences of all known members of distinct protein families. Primary sequence alignment was carried out using PileUp algorithms in the GCG package. To search for correlated mutations, we listed only the positions where cysteine residues were highly conserved and emphasized the mutated residues. In proteins of known three-dimensional structure, a striking pattern of paired cysteine mutations correlated with the positions of known disulfide bridges. For proteins of unknown architecture, the mutation maps showed several positions where disulfide bridging might occur.

5.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., are those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as that linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as that linear combination of the ten factors that is common among a set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family; we find that bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing size of the segment. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently large sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase that is not a member of either family) is found to be similar in the conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use the conserved properties to predict an initial protein structure for subsequent energy minimization for a protein for which the conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.

6.
Comparison of both the DNA and protein sequences of catabolite gene activator protein (CAP) with the sequences of lac and gal repressors shows significant homologies between a sequence that forms a two alpha-helix motif in CAP and sequences near the amino terminus of both repressors. This two-helix motif is thought to be involved in specific DNA sequence recognition by CAP. The region in lac repressor to which CAP is homologous contains many i-d mutations that are defective in DNA binding. Less significant sequence homologies between CAP and phage repressors and activators are also shown. The amino acid residues that are critical to the formation of the two-helix motif are conserved, while those residues expected to interact with DNA are variable. These observations suggest the lac and gal repressors also have a two alpha-helix structural motif which is involved in DNA binding and that this two-helix motif may be generally found in many bacterial and phage repressors. We conclude that one major mechanism by which proteins can recognize specific base sequences in double stranded DNA is via the amino acid side chains of alpha-helices fitting into the major groove of B-DNA.

7.
The nucleotide sequence of the structural gene (nifH) of nitrogenase reductase (Fe protein) from R. meliloti 41 with its flanking ends is reported. The amino acid sequence of nitrogenase reductase was deduced from the DNA sequence. The predicted R. meliloti nitrogenase reductase protein consists of 297 amino acid residues, has a molecular weight of 32,740 daltons and contains 5 cysteine residues. The codon usage in the nifH gene is presented. In the 5' flanking region, sequences resembling the consensus sequences of bacterial control regions were found. Comparison of the R. meliloti nifH nucleotide and amino acid sequences with those from different nitrogen-fixing organisms showed that the amino acid sequences are more conserved than the nucleotide sequences. This structural conservation of nitrogenase reductase may be related to its function and may explain the conservation of the nifH gene during evolution.

8.

Background  

Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities.

9.
10.
As the amino acid sequence of a given protein changes along the phylogenetic tree, enough of the overall folding pattern must be conserved to ensure that the protein still fulfils its biological function. Eighteen published scales which tabulate various side chain properties are compared here by computing the variance of each scale when applied to each of several protein families. The conservation of each scale of side chain properties is examined for the 20,627 residues in 60 mammalian myoglobins, 31 mammalian ribonucleases, insulin A and B chains (29 sequences each), 29 vertebrate and 28 plant cytochrome c's. Those scales which are the most highly conserved through the evolution of each protein family may well be the best predictors of protein folding patterns. The mean-area-buried scale and the optimized matching hydrophobicities scale are more conserved than other scales. An additional result is the relatively poor conservation across evolution of the Chou-Fasman secondary structure predictors.

11.
WW Zhu  C Wang  J Jipp  L Ferguson  SN Lucas  MA Hicks  ME Glasner 《Biochemistry》2012,51(31):6171-6181
Understanding how enzyme specificity evolves will provide guiding principles for protein engineering and function prediction. The o-succinylbenzoate synthase (OSBS) family is an excellent model system for elucidating these principles because it has many highly divergent amino acid sequences that are <20% identical, and some members have evolved a second function. The OSBS family belongs to the enolase superfamily, members of which use a set of conserved residues to catalyze a wide variety of reactions. These residues are the only conserved residues in the OSBS family, so they are not sufficient to determine reaction specificity. Some enzymes in the OSBS family catalyze another reaction, N-succinylamino acid racemization (NSAR). NSARs cannot be segregated into a separate family because their sequences are highly similar to those of known OSBSs, and many of them have both OSBS and NSAR activities. To determine how such divergent enzymes can catalyze the same reaction and how NSAR activity evolved, we divided the OSBS family into subfamilies and compared the divergence of their active site residues. Correlating sequence conservation with the effects of mutations in Escherichia coli OSBS identified two nonconserved residues (R159 and G288) at which mutations decrease efficiency ≥200-fold. These residues are not conserved in the subfamily that includes NSAR enzymes. The OSBS/NSAR subfamily binds the substrate in a different orientation, eliminating selective pressure to retain arginine and glycine at these positions. This supports the hypothesis that specificity-determining residues have diverged in the OSBS family and provides insight into the sequence changes required for the evolution of NSAR activity.

12.
Amino acid residues that play important roles in protein function are often conserved. Here, we analyze thermodynamic and structural data of protein-DNA interactions to explore a relationship between free energy, sequence conservation and structural cooperativity. We observe that the most stabilizing residues or putative hotspots are those which occur as clusters of conserved residues. The higher packing density of the clusters and available experimental thermodynamic data of mutations suggest cooperativity between conserved residues in the clusters. Conserved singlets contribute to the stability of protein-DNA complexes to a lesser extent. We also analyze structural features of conserved residues and their clusters and examine their role in identifying DNA-binding sites. We show that about half of the observed conserved residue clusters are in the interface with the DNA, which could be identified from their amino acid composition; whereas the remaining clusters are at the protein-protein or protein-ligand interface, or embedded in the structural scaffolds. In protein-protein interfaces, conserved residues are highly correlated with experimental residue hotspots, contributing dominantly and often cooperatively to the stability of protein-protein complexes. Overall, the conservation patterns of the stabilizing residues in DNA-binding proteins also highlight the significance of clustering as compared to single residue conservation.

13.
Functional and structural regions inferred from the Escherichia coli RecA protein crystal structure and mutation studies are evaluated in terms of evolutionary conservation across 63 RecA eubacterial sequences. Two paramount segments invariant in specific amino acids correspond to the ATP-binding A site and the functionally unassigned segment from residues 145 to 149 immediately carboxyl to the ATP hydrolysis B site. Not only are residues 145 to 149 conserved individually, but also all three-dimensional structural neighbors of these residues are invariant, strongly attesting to the functional or structural importance of this segment. The conservation of charged residues at the monomer-monomer interface, emphasizing basic residues on one surface and acidic residues on the other, suggests that RecA monomer polymerization is substantially mediated by electrostatic interactions. Different patterns of conservation also allow determination of regions proposed to interact with DNA, of LexA binding sites, and of filament-filament contact regions. Amino acid conservation is also compared with activities and properties of certain RecA protein mutants. Arginine 243 and its strongly cationic structural environment are proposed as the major site of competition for DNA and LexA binding to RecA. The conserved acidic and glycine residues of the disordered loop L1 and its proximity to the RecA acidic monomer interface suggest its involvement in monomer-monomer interactions rather than DNA binding. The conservation of various RecA positions and regions suggests a model for RecA-double-stranded DNA interaction and other functional and structural assignments.

14.
15.
To help elucidate the function of the cystic fibrosis transmembrane conductance regulator (CFTR), we have undertaken a cross-species analysis of the DNA sequence which encodes this protein. We have isolated and characterized the cDNA of the bovine homologue of CFTR. The deduced amino acid sequence shows high overall identity with the published sequences from human and mouse, although there is marked variability between the different potential functional domains. The region around human amino acid 508, which is deleted in 70% of cystic fibrosis chromosomes, is highly conserved across species; of the missense cystic fibrosis mutations reported to date, all of the amino acids in the normal human sequence are conserved in the bovine and mouse sequences. A single amino acid encoded by the human cDNA (Ser-434) is missing in the bovine sequence, and there are two amino acids encoded by the bovine sequence which are absent in the human. These all stem from in-frame 3-base omissions within the sequences. In addition to the cow, we amplified the DNA sequences encoding a portion of the R-domain from sheep, monkey, rabbit, and guinea pig. These sequences show relatively low overall sequence identity (63%), but nearly all of the potential protein kinase A and protein kinase C phosphorylation sites are conserved over all of the species examined. Our results suggest functional significance for certain highly conserved residues and putative domains within CFTR.

16.
Here, we present statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP) folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect concentration of various amino acid types in protein families. Neglecting either of these two factors may make physical and biological "signals" from conservation profiles disappear.

17.
An Escherichia coli strain, ECOR28, was found to have insertions of an identical sequence (1,279 bp in length) at 10 loci in its genome. This insertion sequence (named IS621) has one large open reading frame encoding a putative protein that is 326 amino acids in length. A computer-aided homology search using the DNA sequence as the query revealed that IS621 was homologous to the piv genes, encoding pilin gene invertase (PIV). A homology search using the amino acid sequence of the putative protein encoded by IS621 as the query revealed that the protein also has partial homology to transposases encoded by the IS110/IS492 family elements, which were known to have partial homology to PIV. This indicates that IS621 belongs to the IS110/IS492 family but is most closely related to the piv genes. In fact, a phylogenetic tree constructed on the basis of amino acid sequences of PIV proteins and transposases revealed that IS621 belongs to the piv gene group, which is distinct from the IS110/IS492 family elements, which form several groups. PIV proteins and transposases encoded by the IS110/IS492 family elements, including IS621, have four acidic amino acid residues, which are conserved at positions in their N-terminal regions. These residues may constitute a tetrad D-E(or D)-D-D motif as the catalytic center. Interestingly, IS621 was inserted at specific sites within repetitive extragenic palindromic (REP) sequences at 10 loci in the ECOR28 genome. IS621 may not recognize the entire REP sequence in transposition, but it recognizes a 15-bp sequence conserved in the REP sequences around the target site. There are several elements belonging to the IS110/IS492 family that also transpose to specific sites in the repeated sequences, as does IS621. IS621 does not have terminal inverted repeats like most of the IS110/IS492 family elements.
The terminal sequences of IS621 have homology with the 26-bp inverted repeat sequences of pilin gene inversion sites that are recognized and used for inversion of pilin genes by PIV. This suggests that IS621 initiates transposition through recognition of their terminal regions and cleavage at the ends by a mechanism similar to that used for PIV to promote inversion at the pilin gene inversion sites.

18.
We report 31 point mutations in the factor IX gene and explore the relationship between the level of evolutionary conservation of an amino acid and the probability of a mutation causing hemophilia B. From our total sample of 125 hemophiliacs and from those reported by others, we identify 95 independent missense mutations, 94 of which occur at amino acids that are evolutionarily conserved in the available mammalian factor IX sequences. The likelihood of a missense mutation causing hemophilia B depends on whether the residue is also conserved in the factor IX-related proteases: factor VII, factor X, and protein C. Most of the possible missense mutations in generically conserved residues (i.e., those conserved in factor IX and in all the related proteases) should cause disease. In contrast, missense mutations in factor IX-specific residues (i.e., those conserved in human, cow, dog, and mouse factor IX but not in the related proteases) are sixfold less likely to cause disease. Missense mutations at nonconserved residues are 33-fold less likely to cause disease. At least three models are compatible with these observations. A comparison of sequence alignments from four and nine species of factor IX and an examination of the missense mutations occurring at CpG residues suggest a model in which most residues fall on opposite ends of a spectrum. In a minority of the residues (about 40%), virtually any missense mutation will cause disease, while virtually no missense mutations will cause disease in most of the remaining residues. Thus, many of the residues in factor IX are spacers; that is, the main chains are presumably necessary to keep other amino acid interactions in register, but the nature of the side chain is unimportant.

19.
The mechanisms that determine mechanical stabilities of protein folds remain elusive. Our understanding of these mechanisms is vital to both bioengineering efforts and to the better understanding and eventual treatment of pathogenic mutations affecting mechanically important proteins such as titin. We present a new approach to analyze data from single‐molecule force spectroscopy for different domains of the giant muscle protein titin. The region of titin found in the I‐band of a sarcomere is composed of about 40 Ig‐domains, is exposed to force under normal physiological conditions, and connects the free‐hanging ends of the myosin filaments to the Z‐disc. Recent single‐molecule force spectroscopy data show a mechanical hierarchy in the I‐band domains. Domains near the C‐terminus in this region unfold at forces two to three times greater than domains near the beginning of the I‐band. Though all of these Ig‐domains are thought to share a fold and topology common to members of the Ig‐like fold family, the sequences of neighboring domains vary greatly with an average sequence identity of only 25%. We examine in this study the relation of these unique mechanical stabilities of each I‐band Ig domain to specific, conserved physical–chemical properties of amino acid sequences in related Ig domains. We find that the sequences of each individual titin Ig domain are very highly conserved, with an average sequence identity of 79% across species that are as divergent as humans, chickens, and zebra fish. This indicates that the mechanical properties of each domain are well conserved and tailored to its unique position in the titin molecule. We used the PCPMer software to determine the conservation of amino acid properties in titin Ig domains grouped by unfolding forces into “strong” and “weak” families. We found two motifs unique to each family that may have some role in determining the mechanical properties of these Ig domains.
A detailed statistical analysis of properties of individual residues revealed several positions that displayed differentially conserved properties in strong and weak families. In contrast to previous studies, we find evidence that suggests that the mechanical stability of Ig domains is determined by several residues scattered across the β‐sandwich fold, and force sensitive residues are not only confined to the A′‐G region. Proteins 2009. © 2008 Wiley‐Liss, Inc.

20.
Shih CH  Chang CM  Lin YS  Lo WC  Hwang JK 《Proteins》2012,80(6):1647-1657
The knowledge of conserved sequences in proteins is valuable in identifying functionally or structurally important residues. Generating the conservation profile of a sequence requires aligning families of homologous sequences and having knowledge of their evolutionary relationships. Here, we report that the conservation profile at the residue level can be quantitatively derived from a single protein structure with only backbone information. We found that the reciprocal packing density profiles of protein structures closely resemble their sequence conservation profiles. For a set of 554 nonhomologous enzymes, 74% (408/554) of the proteins have a correlation coefficient > 0.5 between these two profiles. Our results indicate that the three-dimensional structure, instead of being a mere scaffold for positioning amino acid residues, exerts such strong evolutionary constraints on the residues of the protein that its profile of sequence conservation essentially reflects that of its structural characteristics.
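
As an illustration of the kind of comparison described in entry 20, here is a minimal sketch; it is not the authors' implementation. It assumes a small gapped alignment supplied as equal-length strings, scores each column as one minus its Shannon entropy normalized by log2(20), and correlates that profile with a per-residue structural profile provided by the caller. The function names `column_conservation` and `pearson`, the entropy-based score, and the toy inputs are all assumptions made for this example, since the abstract does not say how either profile is computed.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column as 1 - H / log2(20), ignoring gaps ('-').

    This entropy-based score is an assumed stand-in for a conservation
    profile; 1.0 means fully conserved, lower means more variable.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / math.log2(20))
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy alignment and a made-up structural profile (hypothetical values).
    toy_alignment = ["ACDE", "ACDF", "ACEG", "ATQH"]
    toy_structural_profile = [0.9, 0.7, 0.5, 0.2]
    conservation = column_conservation(toy_alignment)
    print(conservation)
    print(pearson(conservation, toy_structural_profile))
```

In practice the structural profile would come from a real structure, and a threshold such as the 0.5 correlation coefficient quoted in the abstract would be applied per protein.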
