Similar documents
20 similar documents found.
1.
The aim of our study was to annotate sequences for 35 putative globins from the nematode Caenorhabditis elegans. All these proteins are expressed, but seven of them differ from the gene predictions in Wormbase. The entire polypeptide sequences of 31 genes and the core globin domains of four proteins were confirmed or corrected. All core globin domains were aligned manually following a procedure designed to fit the putative sequences to the crystal-structure-based alignment of 56 known globin crystal structures. Neighbor-joining analysis of the resulting alignment showed that most of these globins are highly divergent from one another, possibly reflecting a long evolutionary divergence. The surprisingly high number and low sequence conservation of putative globins in this small organism call for a detailed functional analysis.

2.
To determine how different amino acid sequences form similar protein structures, and how proteins adapt to mutations that change the volume of residues buried in their close-packed interiors, we have analysed and compared the atomic structures of nine different globins. The homology of the sequences of the two most distantly related molecules is only 16%. The principal determinants of the three-dimensional structure of these proteins are the approximately 59 residues involved in helix-to-helix and helix-to-haem packings. Half of these residues are buried within the molecules. The observed sequence variations keep the side chains of buried residues non-polar but do not maintain their size: the mean variation in volume among homologous amino acids is 56 Å³. Changes in the volumes of buried residues are accompanied by changes in the geometry of the helix packings. The relative positions and orientations of homologous pairs of helices in the globins differ by rigid-body shifts of up to 7 Å and 30°. To retain functional activity, these shifts are coupled so that the geometry of the residues forming the haem pocket is very similar in all the globins. We discuss the implications of these results for the mechanism of protein evolution.

3.
The concept of a flexible protein sequence pattern is defined. In contrast to conventional pattern matching, template or sequence alignment methods, flexible patterns allow residue patterns typical of a complete protein fold to be developed in terms of residue positions (elements) separated by gaps of defined range. An efficient dynamic programming algorithm is presented to identify the best alignment(s) of a pattern with a sequence. The flexible pattern method is evaluated in detail with reference to the globin protein family, and by comparison with alignment techniques that exploit single-sequence, multiple-sequence and secondary structural information. A flexible pattern derived from seven globins aligned on structural criteria successfully discriminates all 345 globins from non-globins in the Protein Identification Resource database. Furthermore, a pattern that uses helical regions from human alpha-haemoglobin alone identified 337 globins, compared with 318 for the best non-pattern global alignment method. Patterns derived from successively fewer, but more highly conserved, positions in a structural alignment of seven globins show that as few as 38 residue positions (25 buried hydrophobic, 4 exposed and 9 others) may be used to uniquely identify the globin fold. The study suggests that flexible patterns gain discriminating power both by discarding regions known to vary within the protein family and by defining gaps within specific ranges. Flexible patterns therefore provide a convenient and powerful bridge between regular-expression pattern matching techniques and more conventional local and global sequence comparison algorithms.
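The dynamic programming step can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes each element is simply a set of accepted residues scoring 1 for a match and 0 otherwise, and that each gap range `(lo, hi)` bounds the number of residues between consecutive elements. The function name `align_flexible_pattern` is hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

NEG_INF = float("-inf")


def align_flexible_pattern(
    seq: str,
    elements: Sequence[Set[str]],
    gaps: Sequence[Tuple[int, int]],
) -> Tuple[float, List[int]]:
    """Best placement of a flexible pattern on a sequence.

    elements[i] is the set of residues accepted at element i (score 1 if the
    residue matches, 0 otherwise); gaps[i] = (lo, hi) bounds how many residues
    may separate element i from element i + 1.
    Returns (best score, 0-based element positions), or (-inf, []) if the
    sequence cannot accommodate the gap constraints.
    """
    if len(gaps) != len(elements) - 1:
        raise ValueError("need one gap range between each pair of elements")
    n, m = len(seq), len(elements)
    if n == 0:
        return NEG_INF, []

    # best[i][p]: best score of elements 0..i with element i at position p
    best = [[NEG_INF] * n for _ in range(m)]
    back: List[List[Optional[int]]] = [[None] * n for _ in range(m)]
    for p in range(n):
        best[0][p] = 1.0 if seq[p] in elements[0] else 0.0

    for i in range(1, m):
        lo, hi = gaps[i - 1]
        for p in range(n):
            s = 1.0 if seq[p] in elements[i] else 0.0
            # previous element sits at q with lo <= p - q - 1 <= hi
            for q in range(max(0, p - hi - 1), p - lo):
                if best[i - 1][q] + s > best[i][p]:
                    best[i][p] = best[i - 1][q] + s
                    back[i][p] = q

    end = max(range(n), key=lambda p: best[m - 1][p])
    if best[m - 1][end] == NEG_INF:
        return NEG_INF, []

    # trace back from the best final position
    positions = [end]
    for i in range(m - 1, 0, -1):
        prev = back[i][positions[-1]]
        assert prev is not None  # always set when best[i][p] is finite
        positions.append(prev)
    positions.reverse()
    return best[m - 1][end], positions
```

For example, `align_flexible_pattern("AAHKLLFGGH", [{"H"}, {"F"}, {"H"}], [(2, 4), (0, 3)])` places the three elements at positions 2, 6 and 9 with score 3. Restricting each gap to a range, rather than charging a gap penalty, is what keeps the search to a bounded window per element.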

4.
Invariant features of the primary structure of 67 globins are analysed. These features may be responsible for the formation of the secondary structure of these proteins at the first stage of self-organization (in the unfolded chain). It is shown that the primary structures of globins contain 11 sites or regions of one to four residues in which at least one of the residues Asn, Asp, His, Pro, Ser or Thr is located in every globin (haem-linking His residues are excluded from these sites). An unambiguous correlation exists between the positions of these regions and the secondary structure of globins: all these regions (except one) are located near the ends of helices in globins whose three-dimensional structure is known, and the ends of all helices (except helix F) are coded by such regions. Reducing the set of residues listed above leads to a sharp drop in the number of regions invariantly occupied by these residues, while adding residues such as Tyr and Gly to the set does not ultimately increase the number of invariant regions. Five of the six residues that code the ends of helical regions (Asn, Asp, His, Ser and Thr) have polar side groups with few degrees of freedom, capable of forming hydrogen bonds with backbone atoms at a relatively small loss of entropy. The sixth residue (Pro) has no NH group and is therefore less likely to participate in hydrogen bonds between backbone atoms. This corroborates the hypothesis that competition between hydrogen bonds of short polar side groups and hydrogen bonds in the backbone is essential for the formation of secondary structure in unfolded protein chains. Amino acid replacements in the hydrophobic cores of the 67 globins are considered in the Appendix.
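A minimal sketch of the invariant-region search, assuming a gap-free alignment given as equal-length strings. The function name and the rule of reporting only minimal windows (so that overlapping longer hits are not double-counted) are illustrative choices, not taken from the paper; excluding the haem-linking His residues, as the authors do, would require masking those columns first.

```python
from typing import FrozenSet, List, Sequence, Tuple

HELIX_END_RESIDUES = frozenset("NDHPST")  # Asn, Asp, His, Pro, Ser, Thr


def invariant_regions(
    alignment: Sequence[str],
    residues: FrozenSet[str] = HELIX_END_RESIDUES,
    max_len: int = 4,
) -> List[Tuple[int, int]]:
    """Minimal windows [start, end) of 1..max_len aligned columns in which
    every sequence carries at least one residue from `residues`."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to equal length")

    def invariant(start: int, end: int) -> bool:
        return all(any(c in residues for c in s[start:end]) for s in alignment)

    hits = [
        (i, i + w)
        for w in range(1, max_len + 1)
        for i in range(length - w + 1)
        if invariant(i, i + w)
    ]
    # keep a window only if no other hit lies inside it
    return sorted(
        h for h in hits
        if not any(o != h and h[0] <= o[0] and o[1] <= h[1] for o in hits)
    )
```

On the toy alignment `["ANLLSA", "ADLLGS", "GHLLAT"]` this reports a one-column region (N/D/H) and a two-column region (S, S and T in the last two columns).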

5.
We report the derivation of scores, based on the analysis of residue-residue contact matrices from 443 three-dimensional structures aligned structurally as 96 families, that can be used to evaluate sequence-structure matches. Residue-residue contacts, and the more than 3 × 10⁶ amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures, have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when sequence identity approaches 10%. In a comparison involving a globin structure and a search of a sequence databank (> 21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top-scoring sequence-structure matches, eliminating between 69% and 84% of the unrelated matches. Searching a sequence databank with an aligned set of two globins, followed by residue contact-based evaluation of the matches, locates all 618 globin sequences before the first non-globin match. Starting from a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in a search of a sequence databank.
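To make "contact maps are conserved" concrete, the sketch below measures the fraction of one structure's contacts that are preserved at structurally aligned positions in another. The 8 Å cutoff on one representative atom per residue, the minimum sequence separation of 3, and both function names are assumptions for illustration; the paper's own contact definition is not given here.

```python
import math
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_set(
    coords: Sequence[Coord], cutoff: float = 8.0, min_sep: int = 3
) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j), j - i >= min_sep, whose representative atoms
    (e.g. C-alpha) lie closer than `cutoff` angstroms."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + min_sep, len(coords))
        if math.dist(coords[i], coords[j]) < cutoff
    }


def conserved_contact_fraction(
    contacts_a: Set[Tuple[int, int]],
    contacts_b: Set[Tuple[int, int]],
    aligned: Dict[int, int],
) -> float:
    """Fraction of A's contacts between aligned residues that are also
    contacts between the equivalent residues of B (0.0 if none qualify)."""
    scored = [(i, j) for i, j in contacts_a if i in aligned and j in aligned]
    if not scored:
        return 0.0
    kept = 0
    for i, j in scored:
        bi, bj = aligned[i], aligned[j]
        if (min(bi, bj), max(bi, bj)) in contacts_b:
            kept += 1
    return kept / len(scored)
```

`aligned` maps residue indices of structure A to their structurally equivalent indices in B; unaligned residues are simply left out of the mapping.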

6.
The Server for Quick Alignment Reliability Evaluation (SQUARE) is a Web-based version of the method we developed to predict regions of reliably aligned residues in sequence alignments. Given an alignment between a query sequence and a sequence of known structure, SQUARE is able to predict which residues are reliably aligned. The server accesses a database of profiles of sequences of known three-dimensional structures in order to calculate the scores for each residue in the alignment. SQUARE produces a graphical output of the residue profile-derived alignment scores along with an indication of the reliability of the alignment. In addition, the scores can be compared against template secondary structure, conserved residues and important sites.

7.
Translated cDNA for Artemia hemoglobin provided sequence data for almost nine domains, from the fourth residue of the A helix of one domain through 1405 residues to a stop codon after the ninth domain. The domain sequences were all different (homology between pairs 17-38%) but aligned well with each other and with conventional globins, satisfying the requirements for Phe at CD1, His at F8 and most other highly conserved features of globins including His at E7. Features found to be characteristic of Artemia globin and present in all nine domains were Phe at B10, Tyr at C4, Gly at F5, Phe at G5 and Gly at H22. Approximately 14 residues including a consensus -Val-Asp-Pro-Val-Thr-Gly-Leu- were available to form the linker between each pair of domains. The Artemia sequence data were compared with the crystal structures of Chironomus thummi thummi erythrocruorin III and sperm whale myoglobin in order to identify features of structural similarity and to examine the consequences of the differences. The Artemia sequences were compatible with the main helices and critical features of the globin fold. Possible modifications to the C helix, FG turn, and GH turn were studied in terms of molecular coordinates.

8.
The giant extracellular hexagonal bilayer hemoglobin (HBL-Hb) of the deep-sea hydrothermal vent tube worm Riftia pachyptila can simultaneously transport O₂ and H₂S in the blood from the gills to a specific organ, the trophosome, which harbors sulfide-oxidizing endosymbionts. This vascular HBL-Hb is made of 144 globins belonging to four coevolving globin types (A1, A2, B1, and B2). H₂S is bound at a specific location (not the heme site) on two of these globin types. To understand how such a function emerged and evolved in vestimentiferans and other related annelids, six partial cDNAs corresponding to the six globins known to compose the multigenic family of R. pachyptila were identified and sequenced. These partial sequences (ca. 120 amino acids, i.e., 80% of the entire protein) were used to reconstruct molecular phylogenies in order to trace the duplication events that led to the family organization of these globins and to locate the positions of the free cysteine residues known to bind H₂S. Only two free cysteine residues were found in these sequences: at position Cys + 1 (i.e., 1 amino acid from the well-conserved distal histidine) in globin B2 and at position Cys + 11 (i.e., 11 amino acids from the same histidine) in globin A2. These two positions are well conserved in annelids, vestimentiferans, and pogonophorans, which live in sulfidic environments. Structural comparison of the hydrophobic environment surrounding these cysteine residues (the sulfide-binding domain) using hydrophobic cluster analysis plots, together with the cysteine positions in paralogous lineages, suggests that the sulfide-binding function might have emerged before the annelid radiation in order to detoxify this toxic compound. Moreover, globin evolutionary rates differ markedly between paralogous lineages.
This suggests that the two globin subfamilies involved in the sulfide-binding function (A2 and B2) have evolved under strong directional selective constraints (negative selection), whereas the other two globins (A1 and B1) have either accumulated more substitutions through positive selection or evolved neutrally after a relaxation of selection pressures. A likely scenario for the evolution of this multigenic family is proposed and discussed on the basis of this data set.

9.
For applications such as comparative modelling, a major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments, and the aligned positions were scored using the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect the regions of an alignment that are reliably aligned and to help predict the quality of an alignment. For residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions, just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also help predict conserved binding sites from sequence alignments. Template residues that were identified as binding sites, that aligned to an identical amino acid residue, and whose sequence alignment agreed with the structural alignment lay in highly conserved, high-scoring regions over 80% of the time.
This suggests that many binding sites present in both target and template sequences lie in sequence-conserved regions, and that alignment reliability could be translated into binding-site prediction.
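A minimal sketch of profile-derived scoring, assuming a log-odds score of the query residue's frequency at each template profile position against a uniform background with a small pseudocount, smoothed over a sliding window and thresholded at zero. The exact scoring and smoothing used by the authors are not stated here, so these choices and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Profile = Sequence[Dict[str, float]]  # per template position: residue -> frequency


def profile_scores(
    pairs: Sequence[Tuple[int, str]],
    profile: Profile,
    background: float = 0.05,
    pseudocount: float = 0.01,
) -> List[float]:
    """Log-odds score for each aligned (template_position, query_residue)
    pair: how often the query residue is seen at that template position,
    relative to a uniform 1/20 background."""
    return [
        math.log((profile[t].get(aa, 0.0) + pseudocount) / background)
        for t, aa in pairs
    ]


def reliable_flags(
    scores: Sequence[float], window: int = 5, threshold: float = 0.0
) -> List[bool]:
    """Flag positions whose windowed mean score exceeds `threshold`."""
    half = window // 2
    flags = []
    for i in range(len(scores)):
        segment = scores[max(0, i - half): i + half + 1]
        flags.append(sum(segment) / len(segment) > threshold)
    return flags
```

Smoothing matters because a single well-matched residue in an otherwise poorly matched stretch should not by itself mark that stretch as reliable.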

10.
Erythrocytes of the adult axolotl, Ambystoma mexicanum, contain multiple hemoglobins. We separated and purified two kinds of hemoglobin, termed major hemoglobin (Hb M) and minor hemoglobin (Hb m), from a five-year-old male by hydrophobic interaction column chromatography on Alkyl Superose. The hemoglobins have two distinct α-type globin polypeptides (αM and αm) and a common β globin polypeptide, all of which were purified by FPLC on a reversed-phase column after S-pyridylethylation. The complete amino acid sequences of the three globin chains were determined separately by nucleotide sequencing with the assistance of protein sequencing. The mature globin molecules are composed of 141 amino acid residues for αM globin, 143 for αm globin and 146 for β globin. By comparing the primary structures of the five kinds of axolotl globins, including two previously established α-type globins from the same species, with other known globins of amphibians and representatives of other vertebrates, we constructed phylogenetic trees for amphibian hemoglobins and tetrapod hemoglobins. The molecular trees indicated that αM, αm, β and the previously known α major globin are adult-type globins and that the other known α globin is a larval type. The existence of two to four more globins in the axolotl erythrocyte is predicted.

11.
M. J. Sippl, S. Weitckus. Proteins, 1992, 13(3): 258-271
We present an approach that can be used to identify native-like folds in a database of protein conformations in the absence of any sequence homology to proteins in the database. The method is based on a knowledge-based force field derived from a set of known protein conformations. A given sequence is mounted on all conformations in the database and the associated energies are calculated. Using several conformations and sequences from the globin family, we show that the native conformation is identified correctly. In fact, the resolution of the force field is high enough to discriminate between a native fold and several closely related conformations. We then apply the procedure to several globins of known sequence but unknown three-dimensional structure. The homology of these sequences to globins of known structure in the database ranges from 17% to 49%. With one exception, we find that for all globin sequences one of the known globin folds is identified as the most favorable conformation. These results are obtained using a force field derived from a database devoid of globins of known structure. We briefly discuss useful applications in protein structural research and future development of our approach.
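The core loop of the approach (mount one sequence on every stored conformation, score it, keep the lowest energy) can be sketched as below. This is deliberately simplified: whereas the paper derives its force field from known protein conformations, this sketch substitutes a hypothetical three-level contact energy and represents each conformation only by its residue contact set. All names are illustrative.

```python
from typing import Dict, Set, Tuple

HYDROPHOBIC = frozenset("AVLIMFWC")


def toy_pair_energy(a: str, b: str) -> float:
    """Hypothetical contact energy: hydrophobic pairs attract, mixed pairs
    are penalised, polar pairs are neutral."""
    ha, hb = a in HYDROPHOBIC, b in HYDROPHOBIC
    if ha and hb:
        return -1.0
    if ha or hb:
        return 0.5
    return 0.0


def thread(
    sequence: str, folds: Dict[str, Set[Tuple[int, int]]]
) -> Tuple[str, Dict[str, float]]:
    """Mount `sequence` on every fold (given as a contact set of (i, j)
    pairs with i < j) and return the lowest-energy fold plus all energies."""
    energies: Dict[str, float] = {}
    for name, contacts in folds.items():
        if any(j >= len(sequence) for _, j in contacts):
            continue  # fold is longer than the sequence; skip it
        energies[name] = sum(
            toy_pair_energy(sequence[i], sequence[j]) for i, j in contacts
        )
    if not energies:
        raise ValueError("no fold fits the sequence")
    best = min(energies, key=energies.get)
    return best, energies
```

A realistic version would also need to handle sequences and folds of different lengths (alignment or gapped threading), which this sketch sidesteps by skipping folds that do not fit.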

12.
13.
Proteins with good structural, functional and genetic similarities that imply a common evolutionary origin can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites, and if so, what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and the cytochromes c'-b562. Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily contains a long-chain family and a short-chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.), and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved.
These conserved hydrophobic CC residues form only 10-15% of all the residues in the individual structures. A similar analysis of the cytochromes c'-b562, which bind haem and have a function very different from that of the cytokines, gave very similar results. Again, some 30% of the CC sites have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, together with some spine residues, bind the haem cofactor.
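The two spacing patterns quoted for the buried spine alternate steps of 3 and 4 residues, which keeps successive sites on roughly the same face of an α-helix (about 3.6 residues per turn). A small helper that enumerates them; the helix start and length are hypothetical inputs, and the function name is illustrative.

```python
from typing import List


def spine_positions(start: int, helix_length: int, first_step: int = 3) -> List[int]:
    """Buried-spine sites of a helix beginning at `start`, stepping
    alternately by 3 and 4 residues (i, i+3, i+7, i+10, ...) or by 4 and 3
    (i, i+4, i+7, i+11, ...)."""
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = (first_step, 7 - first_step)
    end = start + helix_length
    positions, pos, k = [], start, 0
    while pos < end:
        positions.append(pos)
        pos += steps[k % 2]
        k += 1
    return positions
```

For a 12-residue helix starting at 0, `first_step=3` gives `[0, 3, 7, 10]` and `first_step=4` gives `[0, 4, 7, 11]`, the two patterns quoted above.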

14.
The N-terminal domain (NTD) of the heme-regulated eukaryotic initiation factor (eIF)2α kinase (HRI) was aligned to sequences in the NCBI database using ENTREZ and a PAM250 matrix. Significant similarity was found between amino acids 11-118 in the NTD of rabbit HRI and amino acids 16-120 in mammalian α-globins. Several amino acid residues conserved in globins are also conserved in the NTD of HRI. His83 of HRI was predicted to be equivalent to the proximal heme ligand (HisF8) that is conserved in all globins. Molecular modeling of the NTD indicated that its amino acid sequence is compatible with the globin fold. Recombinant NTD (residues 1-159) was expressed in Escherichia coli. Spectral analysis of affinity-purified recombinant NTD indicated that it contained stably bound hemin. Mutational analysis indicated that His83 played a critical structural role in the stable binding of heme to the NTD and was required to stabilize full-length HRI synthesized de novo in the rabbit reticulocyte lysate. These results indicate that the NTD of HRI is an autonomous heme-binding domain, with His83 possibly serving as the proximal heme ligand.

15.
Seven hundred globin sequences, including 146 nonvertebrate sequences, were aligned on the basis of conservation of secondary structure and avoidance of gap penalties. Of the 182 positions needed to accommodate all the globin sequences, only 84 are common to all, including the absolutely conserved PheCD1 and HisF8. The mean number of amino acid substitutions per position ranges from 8 to 13 for all globins and from 5 to 9 for internal positions. Although total sequence volumes vary by approximately 2-3%, the variation in volume per position ranges from approximately 13% for internal positions to approximately 21% for surface positions. Plausible correlations between amino acid substitution and the variation in volume per position exist for the 84 common positions and for the internal positions, but not for the surface positions. The amino acid substitution matrix derived from the 84 common positions was used to evaluate sequence similarity within the globins, and between the globins and phycocyanins C and colicins A, by calculating pairwise similarity scores. Over the 84 common positions, the scores for globin-globin comparisons overlap the globin-phycocyanin and globin-colicin scores, with the former being intermediate. For the subset of internal positions, overlap between the three groups of scores is minimal. These results imply a continuum of amino acid sequences able to assume the common three-on-three alpha-helical structure and suggest that the determinants of this structure include sites other than those inaccessible to solvent.
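The abstract does not spell out how the substitution matrix was built, so the sketch below assumes a BLOSUM-style log-odds construction: count residue pairs co-occurring in the selected alignment columns, compare with the frequencies expected from residue composition, then sum matrix entries over the same columns to score a pair of aligned sequences. Function names are hypothetical, and scoring unseen pairs as 0 is a simplification.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

Matrix = Dict[Tuple[str, str], float]


def substitution_matrix(alignment: Sequence[str], positions: Sequence[int]) -> Matrix:
    """Log2-odds scores from residue pairs co-occurring in the chosen
    alignment columns (gaps '-' ignored), in the style of BLOSUM."""
    pair_counts: Counter = Counter()
    residue_counts: Counter = Counter()
    for p in positions:
        column = [s[p] for s in alignment if s[p] != "-"]
        residue_counts.update(column)
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    matrix: Matrix = {}
    for (a, b), n in pair_counts.items():
        observed = n / total_pairs
        pa = residue_counts[a] / total_residues
        pb = residue_counts[b] / total_residues
        expected = pa * pa if a == b else 2 * pa * pb
        matrix[(a, b)] = matrix[(b, a)] = math.log2(observed / expected)
    return matrix


def similarity(s1: str, s2: str, positions: Sequence[int], matrix: Matrix) -> float:
    """Sum of matrix scores over the chosen positions; gapped or unseen
    pairs contribute 0 in this simplified sketch."""
    return sum(
        matrix.get((s1[p], s2[p]), 0.0)
        for p in positions
        if s1[p] != "-" and s2[p] != "-"
    )
```

Passing only the indices of the 84 common positions (or only internal ones) to both functions mirrors the abstract's comparison of scores computed over different subsets of sites.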

16.
The intracellular hemoglobin of the polychaete Glycera dibranchiata consists of several components, some of which self-associate into a "polymeric" fraction. The cDNA library constructed from the poly(A+) mRNA of Glycera erythrocytes (Simons, P. C., and Satterlee, J. D. (1989) Biochemistry 28, 8525-8530) was screened with two oligodeoxynucleotide probes corresponding to the amino acid sequences MEEKVP and AMNSKV. Each of the two probes identified a full-length positive insert; these were sequenced using the dideoxynucleotide chain termination method. One clone was 630 bases long and contained 36 bases of 5'-untranslated RNA, a reading frame of 441 bases coding for the 147 amino acids of globin P2 including the residues MEEKVP, and a 3'-untranslated region of 153 bases. The other clone was 540 bases long and contained 24 bases of 5'-untranslated RNA, an open reading frame of 441 bases coding for globin P3 including the residues AMNSKV, and a 3'-untranslated region of 75 bases. The inferred amino acid sequences of the two globins agreed with the partial amino acid sequences obtained by chemical methods. The P2 and P3 globin sequences, together with the previously determined P1 sequence of a complete insert and the partial sequences P4, P5, and P6 obtained from partial inserts (Zafar, R. S., Chow, L. H., Stern, M. S., Vinogradov, S. N., and Walz, D. A. (1990) Biochim. Biophys. Acta, in press), suggest that there are at least six components in the polymeric fraction of Glycera hemoglobin, in agreement with the results of polyacrylamide gel electrophoresis in Tris/glycine buffer, pH 8.3, 6 M urea. Northern and dot blot analyses of Glycera erythrocyte poly(A+) mRNA using the two cDNA probes clearly demonstrated the presence of mature messages encoding both types of globins. Comparison of the polymeric sequences P1, P2, and P3 with the "monomeric" globins M-II and M-IV using the alignment and templates of Bashford et al. (Bashford, D., Chothia, C., and Lesk, A. M. (1987) J. Mol. Biol. 196, 199-216) showed that all five globins have identical residues at 39 positions. At 44 positions, the three polymeric globins share identical residues that differ from the identical residues at the corresponding locations in the monomeric sequences M-II and M-IV, including position E7, where the latter have leucine instead of the distal histidine. At 15 positions, there is a change from polar to nonpolar or from a small nonpolar to a larger nonpolar residue in going from the monomeric to the polymeric globins. (ABSTRACT TRUNCATED AT 400 WORDS)

17.
Structure-based sequence alignment of 728 sequences from different globin subfamilies shows that each subfamily has two clusters of consensually conserved residues. The first is the well-known "functional" cluster, which includes six heme-binding conserved residues (Phe CD1, His F8; aliphatic E11, FG5; hydrophobic F4, G5) and seven other conserved residues (Pro C2; aliphatic H19; hydrophobic B10, B13, B14, CD4, E4) that do not bind the heme but belong to its immediate neighborhood. The second cluster revealed here (aliphatic A8, G16, G12; aromatic A12; hydrophobic H8 and possibly H12) is distant from the heme. It is entirely non-polar and includes one turn (i, i+4 positions) from each of helices A, G, and H. It is known that the A, G, and H helices, which form at the earliest stage of apomyoglobin folding, remain relatively stable in the equilibrium molten globule state and are likely to be tightly packed against each other in this state. We have shown the existence of two similar conserved clusters in c-type cytochromes, one heme-binding and one distal from the heme. The second cluster in c-cytochromes includes one turn from each of the N- and C-terminal alpha-helices. These N- and C-terminal helices in cytochrome c form at the earliest stage of protein folding, remain relatively stable in the molten globule state, and are tightly packed against each other in this state, similar to the behavior observed for the globins. At least these two large protein families (c-type cytochromes and globins) are closely similar in the existence and mutual positions of non-functional conserved residues. We assume that non-functional conserved residues are required for the fast and correct folding of both protein families into their stable 3D structures.

18.
The catalytic or functionally important residues of a protein are known to lie in evolutionarily constrained regions. However, patterns of residue conservation alone are sometimes not very informative, depending on the homologous sequences available for a given query protein. Here, we present an integrated method to locate the catalytic residues of an enzyme from its sequence and structure. Mutations of functional residues usually decrease activity but often increase stability at the same time. Also, catalytic residues tend to occupy partially buried sites in holes or clefts on the molecular surface. After confirming these general tendencies by statistical analyses of 49 representative enzymes, we evaluated these data together with amino acid conservation. This novel method achieved higher prediction sensitivity than traditional methods that consider only residue conservation. We applied it to some so-called "hypothetical" proteins, with known structures but undefined functions. The relationships among catalytic, conserved, and destabilizing residues in enzymes are discussed.

19.
L. Oliveira, P. B. Paiva, A. C. Paiva, G. Vriend. Proteins, 2003, 52(4): 544-552
We introduce sequence entropy-variability plots as a method of analyzing families of protein sequences, and demonstrate this for three well-known sequence families: globins, ras-like proteins, and serine-proteases. The location of an aligned residue position in the entropy-variability plot correlates with structural characteristics, and with known facts about the roles of individual amino acids in the function of these proteins. The large numbers of known sequences in these families allowed us to introduce new filtering methods for variability patterns. The results are discussed in terms of a simple evolutionary model for functional proteins.
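A minimal sketch of the two coordinates of an entropy-variability plot, assuming entropy is the Shannon entropy of the residue distribution at each aligned position and variability is the number of distinct residue types observed there. The paper's exact definitions and its filtering of variability patterns are not reproduced, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy_variability(alignment: Sequence[str]) -> List[Tuple[float, int]]:
    """Per aligned position: (Shannon entropy in nats, number of distinct
    residue types); gap characters '-' are ignored."""
    length = len(alignment[0])
    result = []
    for p in range(length):
        column = [s[p] for s in alignment if s[p] != "-"]
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0
        result.append((entropy, len(counts)))
    return result
```

Each tuple is one point on the plot: a fully conserved position sits at (0, 1), while a position occupied by four residue types in equal proportion sits at (ln 4, 4).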

20.
This study was designed to search for new regions of similarity in the integrase family of recombination proteins, which consists of 28 members found in bacteria and yeast. A computer method based on information content analysis was used to align local regions of homology in the set of unaligned protein sequences from this family. The aligned regions with high information content included those containing the known conserved histidine, arginine and tyrosine residues. In addition, a new region was identified containing another arginine residue that appears to be conserved in all members of the family. To further test the importance of this newly identified arginine residue, mutants of the Cre protein from phage P1, a member of this integrase family, were constructed that alter this residue. The mutations that change arginine to lysine and arginine to cysteine depress catalytic activity but not site-specific binding to the lox site, as expected for a conserved active-site residue. This computer analysis also provides a means of searching for new members of the integrase family.
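To illustrate what "information content" measures for an aligned local region, here is a sketch that scores an ungapped block of aligned segments by relative entropy against a background composition (uniform 1/20 by default). This is an assumption about the scoring only; the paper's procedure for finding high-information alignments among unaligned sequences is not reproduced, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence


def block_information(
    segments: Sequence[str], background: Optional[Dict[str, float]] = None
) -> float:
    """Total information content (bits) of an ungapped block of aligned
    segments: sum over columns of sum_a f_a * log2(f_a / q_a), where q_a is
    the background frequency (uniform 1/20 unless a dict covering every
    observed residue is given)."""
    width = len(segments[0])
    if any(len(s) != width for s in segments):
        raise ValueError("segments must have equal length")
    total = 0.0
    for p in range(width):
        counts = Counter(s[p] for s in segments)
        for residue, c in counts.items():
            f = c / len(segments)
            q = background[residue] if background else 1 / 20
            total += f * math.log2(f / q)
    return total
```

A fully conserved column contributes log2(20), about 4.3 bits, under the uniform background, so blocks anchored on invariant residues such as the conserved His, Arg and Tyr stand out against variable ones.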


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417