Similar Literature
20 similar articles found.
1.
A sequence similarity has been found between two segments of endothiapepsin (acid proteinase, 2APE), bovine pancreatic ribonuclease A, and peptide T, a segment of the gp120 protein of human immunodeficiency virus (HIV) that has been implicated in blocking viral attachment to the T4 receptor. The two similar sequences of the acid proteinase enzyme are Leu-Ile-Asp-Ser-Ser-Ala-Tyr-Thr (residues 169–176) and Tyr-Thr-Gly-Ser-Leu-Asn-Tyr-Thr (residues 175–182). Because the X-ray crystallographic structures of the acid proteinase and ribonuclease are known, it has been possible to determine whether the three-dimensional structures of the segments are similar. Portions of both acid proteinase segments are directly superimposable on the structure of the RNase A 19–26 segment. The fact that three similar sequences from two completely unrelated proteins give rise to almost identical structures raises the possibility that these segments may be involved in nucleating the folding of these proteins. In addition, this provides further support for the concept that the octapeptide sequence of peptide T of HIV, which is also similar in sequence to the 19–26 sequence of RNase A, is also structurally similar to these residues, which adopt a β-bend conformation. Furthermore, comparing the similarities and differences in the structures of these similar sequences provides an explanation for alterations in the biological activity of various truncated or substituted derivatives of peptide T, and additional confirmation of the structural requirements for peptide T in T4-receptor recognition.

2.
Subtilases are members of the family of subtilisin-like serine proteases. More than 50 subtilases are currently known, over 40 of them with complete amino acid sequences. We have compared these sequences and the available three-dimensional structures (subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K). The mature enzymes contain up to 1775 residues, with N-terminal catalytic domains ranging from 268 to 511 residues, and signal and/or activation peptides ranging from 27 to 280 residues. Several members contain C-terminal extensions, relative to the subtilisins, which display additional properties such as sequence repeats, processing sites and membrane anchor segments. Multiple sequence alignment of the N-terminal catalytic domains allows the definition of two main classes of subtilases. A structurally conserved framework of 191 core residues has been defined from a comparison of the four known three-dimensional structures. Eighteen of these core residues are highly conserved, nine of which are glycines. While the alpha-helix and beta-sheet secondary structure elements show considerable sequence homology, this is less so for the peptide loops that connect the core secondary structure elements. These loops can vary in length by more than 150 residues. While the core three-dimensional structure is conserved, insertions and deletions are preferentially confined to surface loops. From the known three-dimensional structures, various predictions are made for the other subtilases concerning essential conserved residues, allowable amino acid substitutions, disulphide bonds, Ca(2+)-binding sites, substrate-binding site residues, ionic and aromatic interactions, proteolytically susceptible surface loops, etc. These predictions form a basis for protein engineering of members of the subtilase family for which no three-dimensional structure is known.

3.
Shestopalov BV 《Tsitologiia》2003,45(7):702-706
Calculating protein three-dimensional structure from the amino acid sequence is a fundamental problem that remains to be solved. This paper presents the principles of the code theory of protein secondary structure and their consequence: the amino acid code of protein secondary structure. The doublet code model of protein secondary structure, developed earlier by the author (Shestopalov, 1990), is part of this theory. The theory is based on the following points: 1) the term secondary structure is assigned to the conformation stabilized only by the nearest (intraresidual) and middle-range (at a distance no greater than that between residues i and i + 5) interactions; 2) the secondary structure consists of regular (alpha-helical and beta-structural) and irregular (coil) segments; 3) alpha-helices, beta-strands and coil segments are encoded, respectively, by residue pairs (i, i + 4), (i, i + 2) and (i, i + 1), according to the numbers of residues per period, 3.6, 2 and 1; 4) all such pairs in the amino acid sequence are codons for elementary structural elements, or structurons; 5) the codons are divided into 21 types depending on their strength, i.e. their encoding capability; 6) overlaps of structurons of the same structure generate longer segments of that structure; 7) overlap of structurons of different structures is forbidden, so codons must be selected, and codon selection is hierarchical; 8) the code theory of protein secondary structure generates six variants of the amino acid code of protein secondary structure. Models can be constructed from the theory in two ways: a physical one, using physical properties of amino acid residues, and a statistical one, using the results of statistical analysis of a large body of structural data.
Some evident consequences of the theory are: a) the theory can be used to calculate secondary structure from the amino acid sequence as a partial solution of the problem of calculating protein three-dimensional structure from the amino acid sequence, and the calculated secondary structure and codon strength distribution can be used to simulate the next step of protein folding; b) one can propose that the same secondary structures can fold into different tertiary structures and, vice versa, different secondary structures can fold into the same tertiary structure, provided codon distributions are also considered; c) codons can be considered the first elements of a protein three-dimensional structure language.
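To make point 3) concrete, the sketch below lists the candidate codons for each structure type, assuming a plain one-letter sequence string. It covers only the pair offsets; codon strengths, the 21 codon types, and hierarchical codon selection are not specified in enough detail here to implement. `CODON_OFFSETS` and `enumerate_codons` are hypothetical names, not the author's software.

```python
# Hypothetical sketch: list the residue pairs that the doublet code treats as
# codons. Offsets follow the abstract: (i, i+4) for alpha-helix,
# (i, i+2) for beta-strand and (i, i+1) for coil. Codon strengths and the
# hierarchical codon selection are NOT modelled here.
CODON_OFFSETS = {"helix": 4, "strand": 2, "coil": 1}


def enumerate_codons(sequence: str) -> dict:
    """Return {structure type: [(start index i, residues at i and i + offset)]}."""
    codons = {}
    for structure, offset in CODON_OFFSETS.items():
        codons[structure] = [
            (i, sequence[i] + sequence[i + offset])
            for i in range(len(sequence) - offset)
        ]
    return codons


if __name__ == "__main__":
    print(enumerate_codons("ACDEFG")["helix"])  # prints [(0, 'AF'), (1, 'CG')]
```

For `ACDEFG`, only two helix codons exist, because each needs a partner four residues downstream.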

4.
M J Rooman, J P Kocher, S J Wodak 《Biochemistry》1992,31(42):10226-10238
A recently developed procedure for predicting backbone structure from the amino acid sequence [Rooman, M., Kocher, J. P., & Wodak, S. (1991) J. Mol. Biol. 221, 961-979] is fine-tuned to identify protein segments, 5-15 residues long, that adopt well-defined conformations in the absence of tertiary interactions. These segments are obtained by requiring that their predicted lowest-energy structures have a sizable energy gap relative to other computed conformations. Applying this procedure to 69 proteins of known structure, we find that the regions with the largest energy gaps (those having highly preferred conformations) are also the most accurately predicted ones. Given previous findings that such regions correlate well with sites that become structured early during folding, our approach provides a means of identifying such sites in proteins without prior knowledge of the tertiary structure. Furthermore, when predictions are performed so as to ignore the influence of residues flanking each segment along the sequence, a situation akin to excising the peptide from the rest of the chain, they offer the possibility of identifying protein segments liable to adopt well-defined conformations on their own. The described approach should have useful applications in experimental and theoretical investigations of protein folding and stability, and aid in designing peptide drugs and vaccines.
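The segment-selection criterion can be sketched as follows. This assumes the gap is measured between the lowest and the second-lowest computed energy of each segment, and that segments are kept when the gap reaches a user-chosen threshold; the function names, segment labels and energies are hypothetical, and the paper's actual gap definition may differ.

```python
# Hypothetical sketch of energy-gap filtering. Assumption: the "gap" is the
# difference between the lowest and second-lowest computed conformational
# energies of a segment; the paper's exact definition and threshold may differ.


def energy_gap(energies: list) -> float:
    """Energy difference between the best and second-best conformation."""
    if len(energies) < 2:
        raise ValueError("need at least two computed conformations")
    lowest, second = sorted(energies)[:2]
    return second - lowest


def select_segments(segment_energies: dict, min_gap: float) -> list:
    """Keep segments whose lowest-energy conformation is separated by >= min_gap."""
    return [
        name
        for name, energies in segment_energies.items()
        if energy_gap(energies) >= min_gap
    ]


if __name__ == "__main__":
    energies = {"seg_10_18": [-3.0, -1.0, 0.5], "seg_40_46": [-2.0, -1.8, -0.3]}
    print(select_segments(energies, min_gap=1.0))  # prints ['seg_10_18']
```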

5.
MOTIVATION: A large body of experimental and theoretical evidence suggests that local structural determinants are frequently encoded in short segments of protein sequence. Although local structural information, once recognized, is particularly useful in protein structural and functional analyses, identifying embedded local structural codes from sequence information alone remains difficult. RESULTS: In this paper, we describe a local structure prediction method that aims to predict the backbone structures of nine-residue sequence segments. Two elements are key to this local structure prediction procedure. The first is the LSBSP1 database, which contains a large number of non-redundant local structure-based sequence profiles for nine-residue structure segments. The second is the consensus approach, which identifies a consensus structure from a set of hit structures. The procedure starts by matching a query segment of nine consecutive amino acid residues to all the sequence profiles in the local structure-based sequence profile database (LSBSP1). The consensus structure, which lies at the center of the largest structural cluster of the hit structures, is predicted to be the native-state structure adopted by the query segment. The method is assessed on a large set of random test protein structures that were not used in constructing the LSBSP1 database. The benchmark results indicate that the new procedure outperforms local backbone structure prediction methods based on the I-sites library by a significant margin. AVAILABILITY: All the computational and assessment procedures have been implemented in the integrated computational system PrISM.1 (Protein Informatics System for Modeling).
The system and associated databases for Linux systems can be downloaded from the website: http://www.columbia.edu/~ay1/.
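The consensus step can be approximated as below: given a pairwise structural distance matrix among the hit structures (for example backbone RMSD), the hit with the most neighbors within a cutoff is taken as the center of the largest cluster. The neighbor-count rule, the cutoff and `consensus_index` are assumptions for illustration, not the PrISM.1 implementation.

```python
# Simplified sketch: pick the hit structure with the most neighbors within a
# distance cutoff (e.g. backbone RMSD). This approximates "the center of the
# largest structural cluster"; the cutoff and neighbor-count rule are
# assumptions, not LSBSP1/PrISM.1 internals.


def consensus_index(dist: list, cutoff: float) -> int:
    """Index of the hit with the most neighbors within `cutoff` (first wins ties)."""
    n = len(dist)
    if n == 0:
        raise ValueError("no hit structures")
    best, best_count = 0, -1
    for i in range(n):
        count = sum(1 for j in range(n) if j != i and dist[i][j] <= cutoff)
        if count > best_count:
            best, best_count = i, count
    return best


if __name__ == "__main__":
    rmsd = [
        [0.0, 1.0, 3.0, 9.0],
        [1.0, 0.0, 1.0, 9.0],
        [3.0, 1.0, 0.0, 9.0],
        [9.0, 9.0, 9.0, 0.0],
    ]
    print(consensus_index(rmsd, cutoff=1.5))  # prints 1
```

In the example matrix, hit 1 lies within the cutoff of both hits 0 and 2, so it is returned as the consensus.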

6.
The antigenic determinants of trichosanthin were predicted by molecular modeling. First, a three-dimensional structural model of the antigen-binding fragment of anti-trichosanthin immunoglobulin E was built on the basis of its amino acid sequence and the known three-dimensional structure of an antibody with a similar sequence. Second, favorable antigen-antibody interactions were obtained from the known three-dimensional structure of trichosanthin and of the hypervariable regions of anti-trichosanthin immunoglobulin E. Two regions on the molecular surface of trichosanthin were found to form extensive interactions with the hypervariable regions of the antibody and are predicted to be possible antigenic determinants: one is composed of two polypeptide segments, Ile201-Glu210 and Ile225-Asp229, which are close to each other in the three-dimensional structure; the other is the segment Lys173-Thr178. The former region appears to be a more plausible antigenic determinant than the latter.

7.
Shestopalov BV 《Tsitologiia》2003,45(7):707-713
In the previous paper (Shestopalov, 2003) we presented the amino acid code of protein secondary structure as a partial solution of the fundamental problem of calculating protein three-dimensional structure from the amino acid sequence. Here a statistical model of the code is described. The model is based on structural data from 2258 protein chains (417,112 amino acid residues). Of the secondary structure calculated with the model, 60% and 61% coincide with the observed secondary structure in the training subset and the test subset (104 protein chains, 21,166 residues), respectively. This equals the threshold value for all secondary structure calculations based on models that, as here, consider only nearest and middle-range interactions. The model can therefore be applied to protein structure prediction from the amino acid sequence, especially when additional information is used along with expert analysis, as in the most successful prediction methods. The model can also be used to analyze secondary structure changes during protein folding by comparing calculated and observed secondary structures. Information about conformationally invariant segments can be used to simulate supersecondary structure formation. One can also try to obtain and examine the subset of proteins in which the calculated and observed secondary structures are very similar.
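As a rough illustration of the agreement figures quoted above, per-residue agreement between calculated and observed secondary structure can be computed as below. Encoding each residue's state as a single letter (for example H, E, C) is an assumption of this sketch; the paper's exact scoring may differ.

```python
def percent_agreement(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state.

    Assumes one state letter per residue (e.g. H = helix, E = strand, C = coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must have equal length")
    if not observed:
        raise ValueError("empty state strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    print(percent_agreement("HHEC", "HHCC"))  # prints 75.0
```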

8.
C Sander, R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly enlarged by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five, to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
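A length-dependent threshold of the kind described in point (4) can be expressed as a small function. The power-law-with-floor shape and every constant below are placeholders chosen for illustration, not the curve reported in the paper; real use would substitute the published parameters.

```python
# Hypothetical sketch of a length-dependent homology threshold: a power law
# that decreases with alignment length and levels off at a floor. The form and
# all constants are placeholders, NOT the published HSSP curve.


def identity_threshold(length: int, scale: float, exponent: float, floor: float) -> float:
    """Minimum percent identity required for an alignment of `length` residues."""
    if length < 1:
        raise ValueError("alignment length must be positive")
    return max(scale * length ** (-exponent), floor)


def is_structurally_homologous(identity: float, length: int, **curve) -> bool:
    """True if `identity` (percent) lies on or above the threshold curve."""
    return identity >= identity_threshold(length, **curve)


if __name__ == "__main__":
    curve = {"scale": 100.0, "exponent": 0.5, "floor": 20.0}
    print(is_structurally_homologous(30.0, 16, **curve))   # threshold is 25% at L=16
    print(is_structurally_homologous(30.0, 400, **curve))  # threshold is the 20% floor
```

Short alignments therefore need much higher identity than long ones before homology is assumed, which is the behavior the threshold curve is meant to capture.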

9.
Prediction of the location of structural domains in globular proteins (times cited: 7; self-citations: 0; citations by others: 7)
The location of structural domains in proteins is predicted from the amino acid sequence, based on the analysis of a computed contact map for the protein, the average distance map (ADM). Interactions between residues i and j in a protein are subdivided into several ranges, according to the separation |i-j| in the amino acid sequence. Within each range, average spatial distances between every pair of amino acid residues are computed from a data base of known protein structures. Infrequently occurring pairs are omitted as being statistically insignificant. The average distances are used to construct a predicted ADM. The ADM is analyzed for the occurrence of regions with high densities of contacts (compact regions). Locations of rapid changes of density between various parts of the map are determined by the use of scanning plots of contact densities. These locations serve to pinpoint the distribution of compact regions. This distribution, in turn, is used to predict boundaries of domains in the protein. The technique provides an objective method for the location of domains both on a contact map derived from a known three-dimensional protein structure, the real distance map (RDM), and on an ADM. While most other published methods for the identification of domains locate them in the known three-dimensional structure of a protein, the technique presented here also permits the prediction of domains in proteins of unknown spatial structure, as the construction of the ADM for a given protein requires knowledge of only its amino acid sequence.
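A much simplified version of the boundary search can be written as a scan over a binary contact matrix, whether derived from a real distance map or an averaged one. It assumes a single boundary splitting the chain into two contiguous domains and scores each candidate boundary by the density of contacts crossing it; this is an illustrative stand-in for the scanning plots described above, and `best_split` is a hypothetical name.

```python
# Simplified single-split scan on a contact map. For each candidate boundary k,
# residues [0, k) form one putative domain and [k, n) the other; the boundary
# with the lowest density of cross-boundary contacts is returned. Illustrative
# stand-in for the paper's contact-density scanning plots, not its criterion.


def best_split(contacts: list, min_size: int = 1) -> int:
    """Return k such that [0, k) and [k, n) share the fewest contacts per pair."""
    n = len(contacts)
    if min_size < 1 or n < 2 * min_size:
        raise ValueError("need min_size >= 1 and room for two domains")
    best_k, best_density = None, float("inf")
    for k in range(min_size, n - min_size + 1):
        cross = sum(contacts[i][j] for i in range(k) for j in range(k, n))
        density = cross / (k * (n - k))  # contacts per possible inter-domain pair
        if density < best_density:
            best_k, best_density = k, density
    return best_k


if __name__ == "__main__":
    # Two fully connected blocks of three residues each.
    two_blocks = [[1 if (i < 3) == (j < 3) else 0 for j in range(6)] for i in range(6)]
    print(best_split(two_blocks))  # prints 3
```

The returned 3 places the boundary between residues 2 and 3 (0-based), where no contacts cross.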

10.
Using only sequence data, a method of computing a low-resolution tertiary structure of a protein is described. The steps are: (a) Estimate the distances of individual residues from the centroid of the molecule, using data on hydrophobicity and additional geometrical constraints. (b) Using these distances, construct a two-valued matrix whose elements indicate whether the distances between residues are greater or less than R, the radius of the molecule. (c) Optimize to obtain a three-dimensional structure. This procedure requires modest computing facilities and is applicable to proteins of 164 residues and presumably more. It produces structures with r (the correlation between inter-residue distances in the computed and native structures) between 0.5 and 0.7. Furthermore, correct inference of two or three long-range contacts suffices to yield structures with r values of 0.8–0.9. Because segments forming parallel or antiparallel folding structures intersect the radius vector at similar angles, some of these long-range contacts can be inferred from centroidal point distances by an elaboration of the procedure used to construct the input matrix. A criterion is also described that can be used to determine the quality of a proposed input matrix even when the native structure is not known.
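The quality measure r can be computed directly from two coordinate sets: collect all inter-residue distances in the model and in the native structure and take their Pearson correlation. Because only distances are compared, no superposition is needed. Representing each residue by a single 3-D point (for example its C-alpha atom) is an assumption of this sketch.

```python
import math
from itertools import combinations


def pairwise_distances(coords: list) -> list:
    """All inter-residue distances d(i, j) for i < j."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_correlation(model: list, native: list) -> float:
    """Pearson correlation r between model and native inter-residue distances."""
    if len(model) != len(native) or len(model) < 3:
        raise ValueError("need two equally sized structures with >= 3 residues")
    x, y = pairwise_distances(model), pairwise_distances(native)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (4.0, 2.0, 1.0)]
    scaled = [tuple(2 * c for c in p) for p in native]
    print(round(distance_correlation(scaled, native), 6))  # prints 1.0
```

A uniformly scaled model still gives r = 1, so r measures the shape of the distance pattern rather than absolute size.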

11.
Comparisons of the primary structures of yeast and horse liver alcohol dehydrogenases reveal that the enzymes are homologous but distantly related. The overall positional identity is 25% between common regions, and several deletions/insertions occur in either enzyme, the longest apparently corresponding to 21 residues, showing that the different subunit sizes are largely explained by internal differences. Variabilities in the structural similarities can be coupled with functional requirements but not directly with whole domains in the previously known tertiary structure of the horse protein. The two most similar regions of the enzymes affect active-site segments, and the two most dissimilar regions seem to affect a loop structure without known function and a segment participating in subunit interactions. The dissimilarities can probably be correlated with changes in zinc-binding properties and quaternary structures. The extra region corresponding to the large internal chain-length difference shows an apparent coincidence in sequence with a following segment of the horse enzyme, and additional elements of internal coincidences, or superficial similarities with other dehydrogenases, are noticed. These characteristics are not fully distinguishable from chance distributions, but in view of the extensive species variations in alcohol dehydrogenases some evolutionary considerations cannot be excluded; in that case, a model relating all regions of these and associated enzymes to a common ancestor is shown to be compatible with all known observations.

12.
The GTP-binding p21 protein encoded by the ras oncogene can be activated to cause malignant transformation of cells by substitution of a single amino acid at critical positions along the polypeptide chain. Substitution of any non-cyclic L-amino acid for Gly 12 in the normal protein results in a transforming protein. This substitution occurs in a hydrophobic sequence (residues 6-15) that is known to be involved in binding the phosphate moieties of GTP (and GDP). Using conformational energy calculations, we find that the 6-15 segment of the normal protein (with Gly 12) adopts structures that contain a bend at residues 11 and 12, with the Gly in the D* conformation, which is not energetically allowed for L-amino acids. Substitution of non-cyclic L-amino acids for Gly 12 shifts this bend to residues 12 and 13. We show that many computed structures for the Gly 12-containing phosphate-binding loop, segment 9-15, are superimposable on the corresponding segment of the recently determined X-ray crystallographic structure for residues 1-171 of the p21 protein. All such structures contain bends at residues 11 and 12, and most of these contain Gly 12 in the C* or D* conformational state. Other computed conformations for the 9-15 segment were superimposable on the structure of the corresponding 18-23 segment of EFtu, the bacterial chain elongation factor that has structural similarities to the p21 protein in the phosphate-binding regions. This segment contains a Val residue where a Gly occurs in the p21 protein. As previously predicted, all of these superimposable conformations contain a bend at positions 12 and 13, not 11 and 12. If these structures that are superimposable on EFtu are introduced into the p21 protein structure, bad contacts occur between the side chain of the residue (here Val) at position 12 and another phosphate-binding loop region around position 61.
These bad contacts between the two segments can be removed by changing the conformation of the 61 region in the p21 protein to that of the homologous region in EFtu. In this new conformation, a large site becomes available for the binding of phosphate residues. In addition, phenomena such as autophosphorylation of the p21 protein by GTP, which cannot be explained by the structure of the non-activated protein, can be explained with this new model structure for the activated protein.

13.
Prediction of protein residue contacts with a PDB-derived likelihood matrix (times cited: 8; self-citations: 0; citations by others: 8)
Proteins with similar folds often display common patterns of residue variability. A widely discussed question is how these patterns can be identified and deconvoluted to predict protein structure. In this respect, correlated mutation analysis (CMA) has shown considerable promise. CMA compares multiple members of a protein family and detects residues that remain constant or mutate in tandem. Often this behavior points to structural or functional interdependence between residues. CMA has been used to predict pairs of amino acids that are distant in the primary sequence but likely to form close contacts in the native three-dimensional structure. Until now these methods have used evolutionary or biophysical models to score the fit between residues. We wished to test whether empirical methods, derived from known protein structures, would provide useful predictive power for CMA. We analyzed 672 known protein structures, derived contact likelihood scores for all possible amino acid pairs, and used these scores to predict contacts. We then tested the method on 118 different protein families for which structures have been solved to atomic resolution. The mean performance was almost seven times better than random prediction. Used in concert with secondary structure prediction, the new CMA method could supply restraints for predicting still undetermined structures.
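One common way to turn counts from known structures into amino acid pair scores is a log-odds ratio, shown below as a sketch. Whether the paper used exactly this normalization is not stated in the abstract; the count dictionaries, their key convention (a fixed-order pair of one-letter codes) and the skipping of zero-contact pairs are assumptions.

```python
import math


def contact_log_odds(contact_counts: dict, pair_counts: dict) -> dict:
    """log[P(contact | residue pair) / P(contact)] for each residue-type pair.

    contact_counts[(a, b)]: number of (a, b) residue pairs found in contact
    pair_counts[(a, b)]:    number of (a, b) residue pairs examined
    Pairs with zero contacts are skipped here to avoid log(0).
    """
    background = sum(contact_counts.values()) / sum(pair_counts.values())
    scores = {}
    for pair, n_pairs in pair_counts.items():
        n_contacts = contact_counts.get(pair, 0)
        if n_contacts == 0 or n_pairs == 0:
            continue
        scores[pair] = math.log((n_contacts / n_pairs) / background)
    return scores


if __name__ == "__main__":
    pairs = {("C", "C"): 10, ("A", "K"): 10}
    contacts = {("C", "C"): 6, ("A", "K"): 2}
    print(contact_log_odds(contacts, pairs))
```

Positive scores mark pairs that are in contact more often than the background rate, negative scores pairs that are in contact less often; a CMA-style predictor could then rank candidate residue pairs by these scores.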

14.
Eight different types of peptide mixtures from [14C]carboxymethylated yeast alcohol dehydrogenase were obtained using trypsin with or without prior maleylation of the substrate, chymotrypsin, pepsin, microbial proteases or CNBr. Each mixture was fractionated by exclusion chromatography, and peptides were further purified on paper. From analyses of all fragments, it seems possible to deduce a primary structure of 347 unique residues in three segments. Together, the segments can account for the whole protein monomer with the exception of a small connecting region. Many unfavourable structures complicated the determination and made individual sequence conclusions tentative, but the known data are consistent, and for most segments of the monomer the results are abundant. Several microheterogeneities in the protein are indicated and one apparent amino acid exchange is characterized, suggesting that different types of subunits occur. This may be correlated with genetic polymorphism in yeast. Multiple deamidations are also characterized, and a few of these affect particularly labile structures. Many residues are unevenly distributed and unexpected patterns are shown. Elements of repetitive sequences occur, reducing the uniqueness of structures. Hydrophobic segments are found, and the uncharacterized region is, at least in some subunits, in a core-like tryptic segment. These and other aspects of the structure may explain some properties of the monomer, and form the background for evolutionary, structural and functional correlations with related enzymes.

15.
The sequence of the starch-binding domain, which is present in 10% of amylolytic enzymes of microbial origin and classified as carbohydrate-binding module family 20, was identified in the equivalent part of the sequence of human genethonin, a skeletal muscle protein of unknown function. The sequence identity between the starch-binding domain of Bacillus sp. strain 1011 cyclodextrin glucanotransferase and the corresponding segment of human genethonin was higher than 28%. The amino acid residues known to be involved in raw starch binding were found to be conserved in the genethonin sequence. The three-dimensional structure of the genethonin 'starch-binding domain' was modelled and its possible function briefly discussed.

16.
The three-dimensional structure of nawaprin has been determined by nuclear magnetic resonance spectroscopy. This 51-residue peptide was isolated from the venom of the spitting cobra, Naja nigricollis, and is the first member of a new family of snake venom proteins referred to as waprins. Nawaprin is relatively flat and disc-like in shape, characterized by a spiral backbone configuration that forms outer and inner circular segments. The two circular segments are held together by four disulfide bonds, three of which are clustered at the base of the molecule. The inner segment contains a short antiparallel beta-sheet, whereas the outer segment is devoid of secondary structure except for a small turn or 3₁₀ helix. The structure of nawaprin is very similar to that of elafin, a human leukocyte elastase-specific inhibitor. Although substantial parts of the nawaprin molecule are well defined, the tips of the outer and inner circular segments, which are hypothesized to be critical for binding interactions, are apparently disordered, similar to those in elafin. The amino acid residues in these important regions of nawaprin differ from those in elafin, suggesting that nawaprin is not an elastase-specific inhibitor and therefore has a different function in snake venom.

17.
A suite of FORTRAN programs, PREF, is described for calculating preference functions from the database of known protein structures and for comparing smoothed profiles of sequence-dependent preferences in proteins of unknown structure. Amino acid preferences for a secondary structure are considered as functions of a sequence environment. The sequence environment of an amino acid residue in a protein is defined as an average of some physical, chemical, or statistical property over its primary structure neighbors. The frequency distribution of sequence environments in the database of soluble protein structures is approximately normal for each amino acid type of known secondary conformation. An analytical expression for the dependence of preferences on sequence environment is obtained after each frequency distribution is replaced by the corresponding Gaussian function. The preference for the α-helical conformation increases for each amino acid type with increasing sequence environment of buried solvent-accessible surface area. We show that a set of preference functions based on buried surface area is useful for predicting folding motifs in α-class proteins and in integral membrane proteins. The prediction accuracy for helical residues is 79% for 5 integral membrane proteins and 74% for 11 α-class soluble proteins. Most residues found in transmembrane segments of membrane proteins with known α-helical structure are predicted to be in the helical conformation because of very high middle-helix preferences. Both extramembrane and transmembrane helices in the photosynthetic reaction center M and L subunits are correctly predicted. In the discussion, we point out that our method of conformational preference functions can identify which physical properties of the amino acids are important in the formation of particular secondary structure elements. © 1993 John Wiley & Sons, Inc.
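The sequence-environment and Gaussian-preference ideas can be sketched in a few lines of Python. The window half-width, excluding the residue itself from its own environment, and defining a preference as the ratio of a conformation-specific Gaussian density to the overall density are assumptions for illustration; PREF's actual FORTRAN formulas may differ, and all names here are hypothetical.

```python
import math


def sequence_environment(values: list, half_window: int) -> list:
    """Mean property value of each residue's sequence neighbors (itself excluded)."""
    n = len(values)
    env = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        neighbors = [values[j] for j in range(lo, hi) if j != i]
        env.append(sum(neighbors) / len(neighbors) if neighbors else 0.0)
    return env


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal probability density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def preference(x: float, mu_state: float, sigma_state: float,
               mu_all: float, sigma_all: float) -> float:
    """Density of environment x for one conformation relative to all residues."""
    return gaussian_pdf(x, mu_state, sigma_state) / gaussian_pdf(x, mu_all, sigma_all)


if __name__ == "__main__":
    print(sequence_environment([1.0, 2.0, 3.0, 4.0], half_window=1))  # prints [2.0, 2.0, 3.0, 3.0]
```

Replacing each empirical distribution by a fitted Gaussian, as the abstract describes, is what makes the preference an analytical function of the environment value x.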

18.
We have previously reported the isolation of the gene coding for a 25-kDa polypeptide present in a purified yeast QH2:cytochrome c oxidoreductase preparation, which was thus identified as the gene for the Rieske iron-sulphur protein [Van Loon et al. (1983) Gene 26, 261-272]. Subsequent DNA sequence analysis reported here reveals, however, that the encoded protein is in fact manganese superoxide dismutase, a mitochondrial matrix protein. Comparison with the known amino acid sequence of the mature protein indicates that it is synthesized with an N-terminal extension of 27 amino acids. In common with the N-terminal extensions of other imported mitochondrial proteins, the presequence has several basic residues but lacks negatively charged residues. The function of these positive charges and of other possible topogenic sequences is discussed. Sequences 5' of the gene contain two elements that may be homologous to the suggested regulatory sites, UAS 1 and UAS 2 in the yeast CYC1 gene [Guarente et al. (1984) Cell 36, 503-511]. The predicted secondary structures in manganese superoxide dismutase appear to be very similar to those reported for iron superoxide dismutase, suggesting similar three-dimensional structures. Making use of the known three-dimensional structure of the Fe enzyme, the Mn ligands are predicted.

19.
Genomics has posed the challenge of determination of protein function from sequence and/or 3-D structure. Functional assignment from sequence relationships can be misleading, and structural similarity does not necessarily imply functional similarity. Proteins in the DJ-1 family, many of which are of unknown function, are examples of proteins with both sequence and fold similarity that span multiple functional classes. THEMATICS (theoretical microscopic titration curves), an electrostatics-based computational approach to functional site prediction, is used to sort proteins in the DJ-1 family into different functional classes. Active site residues are predicted for the eight distinct DJ-1 proteins with available 3-D structures. Placement of the predicted residues onto a structural alignment for six of these proteins reveals three distinct types of active sites. Each type overlaps only partially with the others, with only one residue in common across all six sets of predicted residues. Human DJ-1 and YajL from Escherichia coli have very similar predicted active sites and belong to the same probable functional group. Protease I, a known cysteine protease from Pyrococcus horikoshii, and PfpI/YhbO from E. coli, a hypothetical protein of unknown function, belong to a separate class. THEMATICS predicts a set of residues that is typical of a cysteine protease for Protease I; the prediction for PfpI/YhbO bears some similarity. YDR533Cp from Saccharomyces cerevisiae, of unknown function, and the known chaperone Hsp31 from E. coli constitute a third group with nearly identical predicted active sites. While the first four proteins have predicted active sites at dimer interfaces, YDR533Cp and Hsp31 both have predicted sites contained within each subunit. Although YDR533Cp and Hsp31 form different dimers with different orientations between the subunits, the predicted active sites are superimposable within the monomer structures. 
Thus, the three predicted functional classes form four different types of quaternary structures. The computational prediction of the functional sites for protein structures of unknown function provides valuable clues for functional classification.

20.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as the linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as the linear combination of the ten factors that is common among the set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family: bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing segment size. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently long sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in its conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use them to predict an initial protein structure, for subsequent energy minimization, for any protein whose conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.
