Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Internal homologies in an amino acid sequence of a protein and in amino acid sequences of two different proteins are examined, using correlation coefficients calculated from the sequences when residues are replaced by various quantitative properties of the amino acids such as hydrophobicity. To improve the signal-to-noise ratio, the average correlation coefficient is used to detect homology because the correlation depends on the property considered. In this way, any sequence repetition in a protein and the extent of the similarity and difference among proteins can be estimated quantitatively. The procedure was applied first to the sequences of proteins which have been assumed on other grounds to contain some internal sequence repetitions, α-tropomyosin from rabbit skeletal muscle, calmodulin from bovine brain, troponin C from skeletal and cardiac muscle, and then to the sequences of calcium binding proteins, calmodulin, troponin C, and L2 light chain of myosin. The results show that α-tropomyosin has a markedly periodic sequence at intervals of multiples of seven residues throughout the whole sequence, and calmodulin and skeletal troponin C contain two homologous sequences, the homology of troponin C being weaker than that of calmodulin. Candidates for the calcium binding regions of troponin C, calmodulin, and the L2 light chain are the homologous parts having a high average correlation coefficient (about 0.5) with respect to the sequences of the CD and EF hand regions of carp parvalbumin. The procedure may be a useful method for searching for homologous segments in amino acid sequences.
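
The averaging idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the choice of property scales (Kyte-Doolittle hydropathy plus a crude side-chain charge), the lag-based comparison, and the function names `pearson` and `average_lag_correlation` are all assumptions made for illustration.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values (one possible property scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Crude side-chain charge near neutral pH (an assumption for illustration).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def average_lag_correlation(seq, lag, scales):
    """Average, over property scales, of the correlation between a
    sequence profile and the same profile shifted by `lag` residues."""
    values = []
    for scale in scales:
        profile = [scale[aa] for aa in seq]
        r = pearson(profile[:-lag], profile[lag:])
        if r is not None:  # skip scales with a constant profile
            values.append(r)
    return sum(values) / len(values) if values else 0.0


# A toy, perfectly periodic heptad-like sequence (hypothetical data).
toy = "LKELEDK" * 6
at_period = average_lag_correlation(toy, 7, [HYDROPATHY, CHARGE])
off_period = average_lag_correlation(toy, 3, [HYDROPATHY, CHARGE])
```

On the toy sequence, the average correlation at a lag of seven is essentially 1, while an off-period lag gives a much lower value; real sequences would give far noisier profiles, which is why averaging over several properties is attractive.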

2.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of Cα = 0.41, Cβ = 0.31, and Ccoil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method on homologous proteins is much better than for non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.
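
The two kinds of quality index quoted above can be sketched in a few lines of Python. This assumes the per-class coefficient is the Matthews correlation coefficient, which is commonly used for secondary-structure prediction; the abstract itself does not name the formula. The function names and the toy strings are hypothetical.

```python
from math import sqrt


def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def matthews(predicted, observed, state):
    """Matthews correlation coefficient for one state (e.g. 'H', 'E', 'C')."""
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example: H = helix, E = sheet, C = coil (hypothetical data).
observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
accuracy = q3(predicted, observed)
c_helix = matthews(predicted, observed, "H")
```

Reporting both numbers is useful because overall accuracy can look respectable even when one class is predicted poorly, whereas the per-class coefficient penalizes both over- and under-prediction of that class.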

3.
M J Sippl, S Weitckus. Proteins, 1992, 13(3): 258-271
We present an approach which can be used to identify native-like folds in a data base of protein conformations in the absence of any sequence homology to proteins in the data base. The method is based on a knowledge-based force field derived from a set of known protein conformations. A given sequence is mounted on all conformations in the data base and the associated energies are calculated. Using several conformations and sequences from the globin family we show that the native conformation is identified correctly. In fact the resolution of the force field is high enough to discriminate between a native fold and several closely related conformations. We then apply the procedure to several globins of known sequence but unknown three dimensional structure. The homology of these sequences to globins of known structures in the data base ranges from 49 to 17%. With one exception we find that for all globin sequences one of the known globin folds is identified as the most favorable conformation. These results are obtained using a force field derived from a data base devoid of globins of known structure. We briefly discuss useful applications in protein structural research and future development of our approach.

4.
K W Jackson, J Tang. Biochemistry, 1982, 21(26): 6620-6625
The complete amino acid sequence of streptokinase has been determined by automated Edman degradation of its cyanogen bromide and proteolytic fragments. The protein consists of 415 amino acid residues. Sequence microheterogeneity was found at two positions. The NH2-terminal 245 residues of streptokinase are homologous to the sequences of several serine proteases including bovine trypsin and Streptomyces griseus proteases A and B. The sequence alignment suggests that the active-site histidine-57 has changed to a glycine in streptokinase. The other active-site residues, aspartyl-102 and serine-195, are, however, present at the expected positions. Streptokinase also contains internal sequence homology between the NH2-terminal 173 residues and a COOH-terminal 162-residue region between residues 254 and 415. Moderate homology in predicted secondary structures also exists between these two regions. Although streptokinase is not a protease, these observations suggest that it has evolved from a serine protease by gene duplication and fusion. A COOH-terminal region of about 80 residues is apparently deleted from the second half of the duplicated structures. These observations further suggest that the three-dimensional structure of streptokinase likely contains two independently folded domains, each homologous to serine proteases.

5.
We have determined the sequence of a partial cDNA clone encoding the C-terminal region of bovine cartilage aggregating proteoglycan core protein. The deduced amino acid sequence contains a cysteine-rich region which is homologous with chicken hepatic lectin. This lectin-homologous region has previously been identified in rat and chicken cartilage proteoglycan. The bovine sequence presented here is highly homologous with the rat and chicken amino acid sequences in this apparently globular region. A region containing clusters of Ser-Gly sequences is located N-terminal to the lectin homology domain. These Ser-Gly-rich segments are arranged in tandemly repeated, approx. 100-residue-long, homology domains. Each homology domain consists of an approx. 75-residue-long Ser-Gly-rich region separated by an approx. 25-residue-long segment lacking Ser-Gly dipeptides. These dipeptides are arranged in 10-residue-long segments in the 100-residue-long homology domains. The shorter homologous segments are tandemly repeated some six times in each 100-residue-long homology domain. Serine residues in these repeats are potential attachment sites for chondroitin sulphate chains.

6.
Edward R. Fliss, Peter Setlow. Gene, 1984, 30(1-3): 167-172
The nucleotide sequence of the Bacillus megaterium gene coding for spore-specific protein C-3 has been determined. The gene codes for 65 amino acids and the coding sequence is preceded by an efficient ribosome-binding site. The predicted protein C-3 sequence agrees with both the amino acid composition and the amino terminal sequence of protein C-3, and shows homology (approx. 65% of all residues are identical) with the sequences of the analogous proteins A and C of B. megaterium. Protein C-3 is cleaved by the sequence-specific B. megaterium spore protease, and the amino acid sequence at the new amino-terminus generated is identical to that predicted from the gene sequence, and homologous to the spore protease cleavage sites in the A and C proteins. The protein C-3 gene also shares a number of features with the previously sequenced protein C gene in both upstream and downstream flanking sequence.

7.
8.
C Sander, R Schneider. Proteins, 1991, 9(1): 56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

9.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions.
Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

10.
To investigate the degree of similarity between picornavirus proteases, we cloned the genomic cDNAs of an enterovirus, echovirus 9 (strain Barty), and two rhinoviruses, serotypes 1A and 14LP, and determined the nucleotide sequence of the region which, by analogy to poliovirus, encodes the protease. The nucleotide sequence of the region encoding the genome-linked protein VPg, immediately adjacent to the protease, was also determined. Comparison of nucleotide and deduced amino acid sequences with other available picornavirus sequences showed remarkable homology in proteases and among VPgs. Three highly conserved peptide regions were identified in the protease; one of these is specific for human picornaviruses and has no obvious counterpart in encephalomyocarditis virus, foot-and-mouth disease virus, or cowpea mosaic virus proteases. Within the other two peptide regions two conserved amino acids, Cys 147 and His 161, could be the reactive residues of the active site. We used a statistical method to predict certain features of the secondary structures, such as alpha helices, beta sheets, and turns, and found many of these conformations to be conserved. The hydropathy profiles of the compared proteases were also strikingly similar. Thus, the proteases of human picornaviruses very probably have a similar three-dimensional structure.

11.
Modeling protein loops using a phi(i+1), psi(i) dimer database.
We present an automated method for modeling backbones of protein loops. The method samples a database of phi(i+1) and psi(i) angles constructed from a nonredundant version of the Protein Data Bank (PDB). The dihedral angles phi(i+1) and psi(i) completely define the backbone conformation of a dimer when standard bond lengths, bond angles, and a trans planar peptide configuration are used. For the 400 possible dimers resulting from 20 natural amino acids, a list of allowed phi(i+1), psi(i) pairs for each dimer is created by pooling all such pairs from the loop segments of each protein in the nonredundant version of the PDB. Starting from the N-terminus of the loop sequence, conformations are generated by assigning randomly selected pairs of phi(i+1), psi(i) for each dimer from the respective pool using standard bond lengths, bond angles, and a trans peptide configuration. We use this database to simulate protein loops of lengths varying from 5 to 11 amino acids in five proteins of known three-dimensional structures. Typically, 10,000-50,000 models are simulated for each protein loop and are evaluated for stereochemical consistency. Depending on the length and sequence of a given loop, 50-80% of the models generated have no stereochemical strain in the backbone atoms. We demonstrate that, when simulated loops are extended to include flanking residues from homologous segments, only very few loops from an ensemble of sterically allowed conformations orient the flanking segments consistent with the protein topology. The presence of near-native backbone conformations for loops from five different proteins suggests the completeness of the dimeric database for use in modeling loops of homologous proteins. Here, we take advantage of this observation to design a method that filters near-native loop conformations from an ensemble of sterically allowed conformations.
We demonstrate that our method eliminates the need for a loop-closure algorithm and hence allows for the use of topological constraints of the homologous proteins or disulfide constraints to filter near-native loop conformations.

12.
Proteins with homologous amino acid sequences have similar folds and it has been assumed that an unknown three-dimensional structure can be obtained from a known homologous structure by substituting new side-chains into the polypeptide chain backbone, followed by relatively small adjustment of the model. To examine this approach of structure prediction and, more generally, to isolate the characteristics of native proteins, we constructed two incorrectly folded protein models. Sea-worm hemerythrin and the variable domain of mouse immunoglobulin κ-chain, two proteins with no sequence homology, were chosen for study; the former is composed of a bundle of four alpha-helices and the latter consists of two 4-stranded beta-sheets. Using an automatic computer procedure, hemerythrin side-chains were substituted into the immunoglobulin domain and vice versa. The structures were energy-minimized with the program CHARMM and the resulting structures compared with the correctly folded forms. It was found that the incorrect side-chains can be incorporated readily into both types of structures (alpha-helices, beta-sheets) with only small structural adjustments. After constrained energy-minimization, which led to an average atomic co-ordinate shift of no more than 0.7 to 0.9 Å, the incorrectly folded models arrived at potential energy values comparable to those of the correct structures. Detailed analysis of the energy results shows that the incorrect structures have less stabilizing electrostatic, van der Waals' and hydrogen-bonding interactions. The difference is particularly pronounced when the electrostatic and van der Waals' energy terms are calculated by modified equations that include an approximate representation of solvent effects. The incorrectly folded structures also have a significantly larger solvent-accessible surface and a greater fraction of non-polar side-chain atoms exposed to solvent.
Examination of their interior shows that the packing of side-chains at the secondary structure interfaces, although corresponding to sterically allowed conformations, deviates from the characteristics found in normal proteins. The analysis of incorrectly folded structures has made it clear that the absence of bad non-bonded contacts, though necessary, is not sufficient to demonstrate the validity of model-built structures and that modeling of homologous structures has to be accompanied by a thorough quantitative evaluation of the results. Further, certain features that characterize native proteins are made evident by their absence in misfolded models.

13.
Using several consensus sequences for the 106 amino acid residue alpha-spectrin repeat segment as probes, we searched animal sequence databases using the BLAST program in order to find proteins revealing limited, but significant similarity to spectrin. Among many spectrins and proteins from the spectrin-alpha-actinin-dystrophin family as well as sequences showing a rather high degree of similarity in very short stretches, we found seven homologous animal sequences of low overall similarity to spectrin but showing the presence of one or more spectrin-repeat motifs. The homology relationship of these sequences to alpha-spectrin was further analysed using the SEMIHOM program. Depending on the probe, these segments showed the presence of 6 to 26 identical amino acid residues and a variable number of semihomologous residues. Moreover, we found six protein sequences, which contained a sequence fragment sharing the SH3 (src homology region 3) domain homology of 42-59% similarity. Our data indicate the occurrence of motifs of significant homology to alpha-spectrin repeat segments among animal proteins, which are not classical members of the spectrin-alpha-actinin-dystrophin family. This might indicate that these segments together with the SH3 domain motif are conserved in proteins which possibly at the early stage of evolution were close cognates of spectrin-alpha-actinin-dystrophin progenitors but then evolved separately.

14.
Structural genomic projects envision almost routine protein structure determinations, which are currently imaginable only for small proteins with molecular weights below 25,000 Da. For larger proteins, structural insight can be obtained by breaking them into small segments of amino acid sequences that can fold into native structures, even when isolated from the rest of the protein. Such segments are autonomously folding units (AFU) and have sizes suitable for fast structural analyses. Here, we propose to expand an intuitive procedure often employed for identifying biologically important domains to an automatic method for detecting putative folded protein fragments. The procedure is based on the recognition that large proteins can be regarded as a combination of independent domains conserved among diverse organisms. We thus have developed a program that reorganizes the output of BLAST searches and detects regions with a large number of similar sequences. To automate the detection process, it is reduced to a simple geometrical problem of recognizing rectangular shaped elevations in a graph that plots the number of similar sequences at each residue of a query sequence. We used our program to quantitatively corroborate the premise that segments with conserved sequences correspond to domains that fold into native structures. We applied our program to a test data set composed of 99 amino acid sequences containing 150 segments with structures listed in the Protein Data Bank, and thus known to fold into native structures. Overall, the fragments identified by our program have an almost 50% probability of forming a native structure, and comparable results are observed with sequences containing domain linkers classified in SCOP. Furthermore, we verified that our program identifies AFU in libraries from various organisms, and we found a significant number of AFU candidates for structural analysis, covering an estimated 5 to 20% of the genomic databases. 
Altogether, these results argue that methods based on sequence similarity can be useful for dissecting large proteins into small autonomously folding domains, and such methods may provide an efficient support to structural genomics projects.
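
The geometric step described in this abstract, finding elevated plateaus in a per-residue count of similar sequences, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' program: hits are taken to be 1-based inclusive query intervals, and the thresholds `min_count` and `min_length` as well as both function names are hypothetical.

```python
def coverage_profile(query_length, hits):
    """Count, for each query residue, how many hit intervals cover it.

    `hits` is a list of (start, end) query coordinates, 1-based inclusive.
    """
    counts = [0] * query_length
    for start, end in hits:
        for pos in range(max(start, 1), min(end, query_length) + 1):
            counts[pos - 1] += 1
    return counts


def elevated_segments(counts, min_count, min_length):
    """Return (start, end) runs, 1-based inclusive, where the count stays
    at or above `min_count` for at least `min_length` residues."""
    segments = []
    run_start = None
    for i, count in enumerate(counts):
        if count >= min_count:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_length:
                segments.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the end of the query.
    if run_start is not None and len(counts) - run_start >= min_length:
        segments.append((run_start + 1, len(counts)))
    return segments


# Hypothetical hit intervals on a 60-residue query.
hits = [(5, 20), (6, 22), (4, 19), (40, 50)]
profile = coverage_profile(60, hits)
candidates = elevated_segments(profile, min_count=3, min_length=10)
```

With these toy intervals, only residues 6 to 19 are covered by at least three hits, so a single candidate segment is reported. A fixed threshold is the simplest choice; a real profile would likely need smoothing or an adaptive threshold to recover rectangular elevations robustly.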

15.
We have used antibodies to the basement membrane proteoglycan to screen lambda gt11 expression vector libraries and have isolated two cDNA clones, termed BPG 5 and BPG 7, which encode different portions of the core protein of the heparan sulfate basement membrane proteoglycan. These clones hybridize to a single mRNA species of approximately 12 kilobases. Amino acid sequences obtained on peptides derived from protease digests of the core protein were found in the deduced sequence, confirming the identity of these clones. BPG 5 spans 1986 base pairs and has an open reading frame of 662 amino acids. The amino acid sequence deduced from BPG 5 contains two cysteine-rich domains and two internally homologous domains lacking cysteine. The cysteine-rich domains show homology to the cysteine-rich domains of the laminin chains. A globule-rod structure, similar to that of the short arms of the laminin chains, is proposed for this region of the proteoglycan. The other clone, BPG 7, is 2193 base pairs long and has an open reading frame of 731 amino acids. The deduced sequence contains eight internal repeats with 2 cysteine residues in each repeat. These repeats show homology to the neural-cell adhesion molecule N-CAM and the plasma alpha 1B-glycoprotein. Looping structures similar to these proteins and to other proteins of the immunoglobulin gene superfamily are proposed for this region of the proteoglycan. The sequence DSGEY was found four times in this domain and could be heparan sulfate attachment sites.

16.
From the spectrin gene to the assembly of the membrane skeleton
The complete nucleotide sequence coding for the chicken brain alpha-spectrin was determined. It comprises the entire coding frame, 5'- and 3'-untranslated sequences terminating in a poly(A)-tail. The deduced amino acid sequence shows that the alpha-chain contains 22 segments, 20 of which correspond to the typical 106 residue repeat of the human erythrocyte spectrin. Some segments non-homologous to the repeat structure reside in the middle and COOH-terminal regions. Sequence comparisons with other proteins show that these segments evidently harbour some structural and functional features such as: homology to alpha-actinin and dystrophin, two typical EF-hand structures (calcium-binding) and a putative calmodulin-binding site in the COOH-terminus and a sequence homologous to various src-tyrosine kinases and to phospholipase C in the middle of the molecule. Comparison of our sequence with other partial alpha-spectrin sequences shows that alpha-spectrin is well conserved in different species and that the human erythrocyte alpha-spectrin is divergent.

17.
The degree of similarity in the three-dimensional structures of two proteins can be examined by comparing the patterns of hydrophobicity found in their amino acid sequences. Each type of amino acid residue is assigned a numerical hydrophobicity, and the correlation coefficient rH is computed between all pairs of residues in the two sequences. In tests on sequences from two properly aligned proteins of similar three-dimensional structures, rH is found in the range 0.3 to 0.7. Improperly aligned sequences or unrelated sequences give rH near zero. By considering the observed frequency of amino acid replacements among related structures, a set of optimal matching hydrophobicities (OMHs) was derived. With this set of OMHs, significant correlation coefficients are calculated for similar three-dimensional structures, even though the two sequences contain few identical residues. An example is the two similar folding domains of rhodanese (rH = 0.5). Predictions are made of similar three-dimensional structures for the alpha and beta chains of the various phycobiliproteins, and for delta hemolysin and melittin.

18.
We describe a computer algorithm for predicting the three-dimensional structures of proteins using only their amino acid sequences. The method differs from others in two ways: (1) it uses very few energy parameters, representing hydrophobic and polar interactions, and (2) it uses a new "constraint-based exhaustive" searching method, which appears to be among the fastest and most complete search methods yet available for realistic protein models. It finds a relatively small number of low-energy conformations, among which are native-like conformations, for crambin (1CRN), avian pancreatic polypeptide (1PPT), melittin (2MLT), and apamin. Thus, the lowest-energy states of very simple energy functions may predict the native structures of globular proteins.

19.
Li Q, Zhou C, Liu H. Proteins, 2009, 74(4): 820-836
General and transferable statistical potentials to quantify the compatibility between local structures and local sequences of peptide fragments in proteins were derived. In the derivation, structure clusters of fragments are obtained by clustering five-residue fragments in native proteins based on their conformations represented by a local structure alphabet (de Brevern et al., Proteins 2000;41:271-287), secondary structure states, and solvent accessibilities. On the basis of the native sequences of the structurally clustered fragments, the probabilities of different amino acid sequences were estimated for each structure cluster. From the sequence probabilities, statistical energies as a function of sequence for a given structure were directly derived. The same sequence probabilities were employed in a database-matching approach to derive statistical energies as a function of local structure for a given sequence. Compared with prior models of local statistical potentials, we provided an integrated approach in which local conformations and local environments are treated jointly, structures are treated in units of fragments instead of individual residues so that coupling between the conformations of adjacent residues is included, and strong interdependences between the conformations of overlapping or neighboring fragment units are also considered. In tests including fragment threading, pseudosequence design, and local structure predictions, the potentials performed at least comparably to and, in most cases, better than a number of existing models applicable to the same contexts, indicating the advantages of such an integrated approach for deriving local potentials and suggesting applicability of the statistical potentials derived here in sequence designs and structure predictions.

20.
To estimate how extensively the ensemble of denatured-state conformations is constrained by local side-chain–backbone interactions, propensities of each of the 20 amino acids to occur in mono- and dipeptides mapped to discrete regions of the Ramachandran map are computed from proteins of known structure. In addition, propensities are computed for the trans, gauche−, and gauche+ rotamers, with or without consideration of the values of phi and psi. These propensities are used in scoring functions for fragment threading, which estimates the energetic favorability of fragments of protein sequence to adopt the native conformation as opposed to hundreds of thousands of incorrect conformations. As finer subdivisions of the Ramachandran plot, neighboring residue phi/psi angles, and rotamers are incorporated, scoring functions become better at ranking the native conformation as the most favorable. With the best composite propensity function, the native structure can be distinguished from 300,000 incorrect structures for 71% of the 2130 arbitrary protein segments of length 40, 48% of 2247 segments of length 30, and 20% of 2368 segments of length 20. A majority of fragments of length 30–40 are estimated to be folded into the native conformation a substantial fraction of the time. These data suggest that the variations observed in amino acid frequencies in different phi/psi/chi1 environments in folded proteins reflect energetically important local side-chain–backbone interactions, interactions that may severely restrict the ensemble of conformations populated in the denatured state to a relatively small subset with nativelike structure.
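
A propensity-based threading score of the kind described here can be sketched in Python. The sketch assumes the common definition of propensity as P(residue | region) / P(residue) and a log-sum score; the abstract does not give the exact functional form. The region labels, the `floor` value for unseen pairs, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def propensities(observations):
    """Propensity of each amino acid for each backbone region.

    `observations` is an iterable of (amino_acid, region) pairs taken
    from known structures; propensity = P(aa | region) / P(aa).
    """
    obs = list(observations)
    total = len(obs)
    aa_counts = Counter(aa for aa, _ in obs)
    region_counts = Counter(region for _, region in obs)
    pair_counts = Counter(obs)
    return {
        (aa, region): (n / region_counts[region]) / (aa_counts[aa] / total)
        for (aa, region), n in pair_counts.items()
    }


def fragment_score(fragment, table, floor=0.01):
    """Sum of log propensities over a fragment of (aa, region) pairs.

    Pairs never seen in the observations get the small `floor` value.
    """
    return sum(log(table.get(pair, floor)) for pair in fragment)


# Hypothetical observations; "H", "E", "L" stand for coarse map regions.
observed = [("G", "L"), ("G", "L"), ("A", "H"),
            ("A", "H"), ("A", "L"), ("V", "E")]
table = propensities(observed)
native = fragment_score([("G", "L"), ("A", "H")], table)
decoy = fragment_score([("G", "H"), ("A", "L")], table)
```

In this toy table, glycine in region "L" has a propensity of 2, so the fragment that places it there scores higher than the decoy. Finer region subdivisions, as the abstract notes, would simply mean more region labels in the observation pairs.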
