Similar Articles
20 similar articles found (search time: 15 ms)
1.
The location measure of a residue in a globular protein is defined as the number of Cα atoms surrounding the residue within a sphere of radius 14 Å. This quantity measures the residue's exposure to solvent and is closely related to its distance from the protein's center of mass. In this work, the experimental value for each residue of a protein is obtained from X-ray crystallographic data, and the same quantity is also calculated from the amino acid sequence by applying an empirical parameter set. The correlation between the experimental and computed values is as high as 0.50 on average over 92 proteins of known three-dimensional structure. Therefore, the location measure of every residue in a globular protein is predictable with good accuracy from the sequence.
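To make the definition concrete, here is a minimal Python sketch that computes the observed location measure from Cα coordinates. It assumes the coordinates are already parsed into (x, y, z) tuples in Å, counts neighbors at a distance of at most 14 Å, and excludes the residue's own Cα; the function name `location_measure` is hypothetical, and the sketch does not reproduce the paper's sequence-based empirical parameter set.

```python
import math

# One (x, y, z) C-alpha position in Å per residue (assumed already parsed from a PDB file).
Coord = tuple[float, float, float]


def location_measure(ca_coords: list[Coord], radius: float = 14.0) -> list[int]:
    """Count, for each residue, the other C-alpha atoms within `radius` Å.

    Assumption: the residue's own C-alpha is excluded and the boundary is inclusive.
    """
    counts = []
    for i, ci in enumerate(ca_coords):
        # O(n^2) all-pairs scan; adequate for single proteins of a few hundred residues.
        n = sum(
            1
            for j, cj in enumerate(ca_coords)
            if j != i and math.dist(ci, cj) <= radius
        )
        counts.append(n)
    return counts


# Toy chain: three C-alpha atoms spaced 10 Å apart along x.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(location_measure(coords))  # [1, 2, 1]
```

Residues buried near the center of mass accumulate more neighbors than exposed surface residues, which is why the count works as an exposure proxy.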

2.
Prediction-based fingerprints of protein-protein interactions (total citations: 2; self-citations: 0; citations by others: 2)
Porollo A, Meller J. Proteins, 2007, 66(3): 630-645
The recognition of protein interaction sites is an important intermediate step toward identification of functionally relevant residues and understanding protein function, facilitating experimental efforts in that regard. Toward that goal, the authors propose a novel representation for the recognition of protein-protein interaction sites that integrates enhanced relative solvent accessibility (RSA) predictions with high resolution structural data. An observation that RSA predictions are biased toward the level of surface exposure consistent with protein complexes led the authors to investigate the difference between the predicted and actual (i.e., observed in an unbound structure) RSA of an amino acid residue as a fingerprint of interaction sites. The authors demonstrate that RSA prediction-based fingerprints of protein interactions significantly improve the discrimination between interacting and noninteracting sites, compared with evolutionary conservation, physicochemical characteristics, structure-derived and other features considered before. On the basis of these observations, the authors developed a new method for the prediction of protein-protein interaction sites, using machine learning approaches to combine the most informative features into the final predictor. For training and validation, the authors used several large sets of protein complexes and derived from them nonredundant representative chains, with interaction sites mapped from multiple complexes. Alternative machine learning techniques are used, including Support Vector Machines and Neural Networks, so as to evaluate the relative effects of the choice of a representation and a specific learning algorithm. The effects of induced fit and uncertainty of the negative (noninteracting) class assignment are also evaluated. Several representative methods from the literature are reimplemented to enable direct comparison of the results. 
Using rigorous validation protocols, the authors estimated that the new method yields an overall classification accuracy of about 74% and a Matthews correlation coefficient of 0.42, compared with at most 70% accuracy and a Matthews correlation coefficient of at most 0.3 for methods that do not use RSA prediction-based fingerprints. The new method is available at http://sppider.cchmc.org.
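The two quantitative ingredients of this abstract can be sketched in a few lines of Python: the fingerprint (predicted minus observed RSA per residue) and the accuracy and Matthews correlation coefficient (MCC) used to report performance. The function names are hypothetical and the confusion-matrix counts in the example are made up for illustration; SPPIDER's actual feature set and classifiers are not reproduced.

```python
import math


def rsa_fingerprint(predicted_rsa: list[float], observed_rsa: list[float]) -> list[float]:
    """Per-residue difference between predicted and observed (unbound) RSA.

    Positive values: predicted more exposed than observed; negative: less exposed.
    """
    return [p - o for p, o in zip(predicted_rsa, observed_rsa)]


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Classification accuracy and Matthews correlation coefficient from a confusion matrix."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts for 100 residues.
acc, mcc = accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")  # accuracy=0.80, MCC=0.60
```

MCC is reported alongside accuracy because interface residues are a minority class, and accuracy alone can look good for a predictor that rarely calls interface residues.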

3.
Polyphenol oxidase (PPO), a metalloenzyme containing a type-3 copper center, is produced by many species of plants, fungi, and bacteria. The subunit molecular mass reported for PPO varies widely, even within a single species. In some cases, experimental evidence (usually protein sequencing by Edman degradation) indicates that this variability within a species results from proteolytic processing at the N- and/or C-termini of the protein. To identify the sequence regions where proteolysis occurs in PPO from most species, the experimentally established N- and C-termini of these proteolyzed enzymes were compared with the sequences of other PPOs whose termini have not been established by protein sequencing. In all cases, the N-terminal proteolysis sites were located before a conserved arginine residue, and the C-terminal proteolysis sites were located after a conserved tyrosine motif. Molecular masses were calculated for the enzymes from these proteolysis sites, and the calculated values were used to rationalize the varying molecular masses reported in the literature. To determine the structural implications of N- and C-terminal proteolysis, the proteolysis sites were mapped onto the two available PPO structures: Ipomoea batatas catechol oxidase and Streptomyces castaneoglobisporus tyrosinase. A structural “core” region that appears to be essential for structural stability and enzymatic activity was identified.
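The molecular-mass step can be sketched as follows: sum average residue masses over the segment that remains after N- and C-terminal processing, then add one water for the free termini. The residue masses are standard average (not monoisotopic) values; the trimming positions passed to the hypothetical `processed_mass` function are illustrative inputs, not the sites identified in the paper, and the sketch ignores the copper ions and any other modifications.

```python
# Average residue masses in Da (free amino acid minus one water), standard tables.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def processed_mass(seq: str, n_start: int, c_end: int) -> float:
    """Mass of the segment kept after terminal processing.

    n_start and c_end are 1-based, inclusive positions of the (hypothetical)
    first and last residues retained.
    """
    return average_mass(seq[n_start - 1 : c_end])


print(round(processed_mass("MGAG", 2, 3), 2))  # 146.15 (the Gly-Ala dipeptide)
```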

4.
Limited proteolysis or autolysis of thermolysin under different experimental conditions cleaves a small number of peptide bonds located in exposed surface segments of the polypeptide chain that show the highest mobility, as given by the crystallographically determined temperature factors (B values) [Holmes, M.A., & Matthews, B.W. (1982) J. Mol. Biol. 160, 623-639]. Considering also similar findings with other protein systems, it is proposed that this correlation between segmental mobility and sites of limited proteolysis in globular proteins is quite general. Thus, flexibility of the polypeptide chain at the site of proteolytic attack promotes optimal binding to, and proper interaction with, the active site of the protease. These findings emphasize that the apparent thermal motion seen in protein crystals is relevant to motion in solution, and they appear to be of general significance for protein-protein recognition processes.
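The proposed correlation suggests a simple screen for candidate cleavage sites: rank chain segments by their mean crystallographic B value. The sketch below is a hypothetical illustration of that idea, not the authors' procedure; it assumes per-residue B values have already been extracted (for example, averaged over backbone atoms), the window length is an arbitrary choice, and overlapping windows are not merged.

```python
def most_mobile_segments(b_factors: list[float], window: int = 5, top: int = 1) -> list[int]:
    """Return 0-based start indices of the `top` windows with the highest mean B value."""
    if not 0 < window <= len(b_factors):
        raise ValueError("window must be between 1 and the chain length")
    means = [
        sum(b_factors[i : i + window]) / window
        for i in range(len(b_factors) - window + 1)
    ]
    # Highest mean B (most mobile segment) first.
    ranked = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    return ranked[:top]


b = [10.0, 10.0, 10.0, 50.0, 60.0, 55.0, 10.0, 10.0]
print(most_mobile_segments(b, window=3))  # [3], i.e. residues 3-5
```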

5.
6.
The gene 5 protein (g5p) of bacteriophage Pf1 is a 144-residue single-stranded (ss) DNA binding protein involved in replication and packaging of the viral DNA. Compared with the gene 5 proteins of other filamentous bacteriophages, such as fd, Pf1 g5p has an additional C-terminal sequence (approximately 40 residues) with an unusual amino acid composition, being particularly rich in proline, glutamine and alanine. Unlike the globular N-terminal domain of the protein, this C-terminal sequence is susceptible to limited proteolysis. The C-terminal sequence has been shown to play a role in stabilising the protein-ssDNA complex. In the present study, the DNA sequence corresponding to the 38-residue C-terminal peptide has been cloned and expressed. A variety of biophysical techniques suggest that this peptide has a largely irregular conformation in solution, in contrast to the N-terminal globular domain, which is principally beta-sheet. However, circular dichroism (CD) spectroscopy indicates that the peptide can be induced to form a structure resembling a left-handed polyproline-like (PII) helix, suggesting that the C-terminal tail of the protein may adopt a more structured conformation in the appropriate physiological environment.

7.
Procarboxypeptidase B is converted to enzymatically active carboxypeptidase B by trypsin-catalysed limited proteolysis, which removes the long N-terminal activation segment of 95 amino acids. The three-dimensional crystal structure of procarboxypeptidase B from porcine pancreas has been determined at 2.3 Å resolution and refined to a crystallographic R-factor of 0.169, revealing the functional determinants of its enzymatic inactivity and of its activation by limited proteolysis. The activation segment folds into a globular region with an open-sandwich antiparallel-alpha antiparallel-beta topology and a C-terminal alpha-helix that connects it to the enzyme moiety. The globular region (A7-A82) shields the preformed active site and establishes specific interactions with residues important for substrate recognition. AspA41 forms a salt bridge with Arg145, which in active carboxypeptidase binds the C-terminal carboxyl group of substrate molecules. The connecting region occupies the putative extended substrate binding site. The scissile peptide bond cleaved by trypsin during activation is highly exposed, and its cleavage releases the activation segment and exposes the substrate binding site. An open-sandwich fold has been observed in a number of other proteins and protein domains. One of them is the C-terminal fragment of L7/L12, a ribosomal protein from Escherichia coli whose topology is similar to that of the procarboxypeptidase activation domain.

8.
An exposed "hinge" region of cGMP-dependent protein kinase is known to be susceptible to both limited proteolysis and autophosphorylation. A 91-residue fragment has been isolated from this region, and its amino acid sequence has been compared with the analogous regions of the cAMP-dependent protein kinases. Although the resemblance among these sequences is not striking, the phosphorylation sites lie in corresponding regions toward the NH2 termini, and there are indications of homology in the vicinity of the autophosphorylation sites. As in the cAMP-dependent protein kinase, the autophosphorylation site and the site susceptible to limited proteolysis are very close to each other in the primary structure. The actual autophosphorylation site (the underlined threonine residue in Pro-Arg-Thr-Thr-Arg) differs considerably from those in the regulatory subunit of type II cAMP-dependent kinase and from the site in the type I regulatory subunit that can be phosphorylated by the cGMP-dependent protein kinase.

9.
The classical problem of secondary structure prediction is approached with a new joint algorithm (Q7-JASEP) that combines the best aspects of six different methods: the statistical methods of Chou-Fasman, Nagano, and Burgess-Ponnuswamy-Scheraga, the homology method of Nishikawa, the information theory method of Garnier-Osguthorpe-Robson, and the artificial neural network approach of Qian-Sejnowski. The steps of the algorithm are (i) optimizing each individual method with respect to its correlation coefficient (Q7) for assigning a structural type from the method's predictive score, (ii) weighting each method, (iii) combining the scores from the different methods, and (iv) comparing the combined scores for the alpha-helix, beta-strand, and coil conformational states to assign the secondary structure at each residue position. Applied to 45 globular proteins, the algorithm shows good predictive power in cross-validation testing, with average correlation coefficients per test protein of Q7,α = 0.41, Q7,β = 0.47 and Q7,c = 0.41 for alpha-helix, beta-strand, and coil conformations, respectively. By the criterion of the correlation coefficient (Q7) for each type of secondary structure, Q7-JASEP performs better than any of its component methods. When all protein classes are included in training and testing (by cross-validation), the results equal the best in the literature by the Q7 criterion. More generally, the basic algorithm can be applied to any protein class and to any type of structure/sequence or function/sequence correlation for which multiple predictive methods exist.
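Steps (ii) to (iv) amount to a weighted vote. The minimal Python sketch below assumes each method's scores are already on a comparable scale (step (i), the per-method optimization, is not shown); the weights and scores in the example are invented, and this is not the published Q7-JASEP weighting.

```python
# Secondary-structure states: alpha-helix, beta-strand, coil.
STATES = ("H", "E", "C")

# scores[m][i][s]: score of method m for state s at residue i (assumed pre-normalized).
Scores = list[list[dict[str, float]]]


def joint_assign(scores: Scores, weights: list[float]) -> str:
    """Weighted sum of per-method scores, then the highest-scoring state per residue."""
    n_residues = len(scores[0])
    assignment = []
    for i in range(n_residues):
        combined = {
            s: sum(w * method[i][s] for w, method in zip(weights, scores))
            for s in STATES
        }
        assignment.append(max(STATES, key=combined.__getitem__))
    return "".join(assignment)


# Two hypothetical methods scoring a two-residue segment.
method_a = [{"H": 0.9, "E": 0.1, "C": 0.0}, {"H": 0.2, "E": 0.3, "C": 0.5}]
method_b = [{"H": 0.4, "E": 0.5, "C": 0.1}, {"H": 0.1, "E": 0.8, "C": 0.1}]
print(joint_assign([method_a, method_b], weights=[0.7, 0.3]))  # HE
```

At the second residue, method A alone would call coil, but the weighted evidence from method B tips the assignment to strand, which is the kind of complementarity a joint method relies on.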

10.
Motivation: Protein-protein interactions are important in many biological processes, and a theoretical understanding of the structural factors that determine interaction sites will help clarify the underlying mechanisms of these interactions. Accurate prediction of interaction sites with advanced mathematical methods is therefore useful. Previous studies have addressed the interaction interfaces of protein monomers and the interface residues between the chains of protein dimers, but very few have addressed interface residue prediction for protein multimers, that is, trimers, tetramers and larger assemblies of monomers in a protein complex. Many proteins function as multi-subunit complexes, and the structural complexity of multimers makes predicting their interface residues difficult. We therefore aimed to build a method for predicting interface residue pairs in protein tetramers.
Results: We developed a new deep network that combines a Long Short-Term Memory (LSTM) network with a graph representation to predict interface residue pairs in protein tetramers. Unlike image or video data, which are regularly arranged matrices (the "Euclidean structure" referred to in many studies), protein structure data are non-Euclidean, and translation invariance does not hold for them. To extract spatial features from such data for deep learning, we represent each protein as a topological graph whose vertices and edges encode the relationships between residues, in the sense of graph theory, and combine it with a multilayer LSTM network. First, training and test samples were selected from the Protein Data Bank, and physicochemical and geometric features of surface residues associated with interfacial properties were extracted.
Subsequently, we transformed the protein multimer data into topological graphs and used the model to predict protein interaction interface residue pairs. Several evaluation metrics confirmed the method's validity.
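The graph-construction step can be illustrated by deriving candidate interface residue pairs from coordinates: residues on different chains whose representative atoms lie within a cutoff become interface edges. This hypothetical sketch assumes one representative atom per residue and an 8 Å cutoff (a common choice, not taken from the paper); it produces the kind of pair labels such a model could be trained on and does not implement the LSTM itself.

```python
import math
from itertools import combinations

# (chain_id, residue_number, (x, y, z) of one representative atom in Å)
Residue = tuple[str, int, tuple[float, float, float]]
ResidueId = tuple[str, int]


def interface_pairs(residues: list[Residue], cutoff: float = 8.0) -> list[tuple[ResidueId, ResidueId]]:
    """Inter-chain residue pairs within `cutoff` Å; these become the graph's interface edges."""
    pairs = []
    for (chain_a, num_a, xyz_a), (chain_b, num_b, xyz_b) in combinations(residues, 2):
        # Same-chain pairs are skipped: only contacts between subunits count as interface.
        if chain_a != chain_b and math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append(((chain_a, num_a), (chain_b, num_b)))
    return pairs


res = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (5.0, 0.0, 0.0)),
    ("B", 1, (3.0, 4.0, 0.0)),
    ("C", 1, (30.0, 0.0, 0.0)),
]
print(interface_pairs(res))  # [(('A', 1), ('B', 1)), (('A', 2), ('B', 1))]
```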

11.
This paper proposes a novel method that uses protein residue conservation and evolutionary information, namely the spatial sequence profile, sequence information entropy and evolution rate, to infer protein binding sites. Several predictors based on the support vector machine (SVM) algorithm are constructed to predict the role of surface residues in protein-protein interfaces. Combining these residue features clearly improves prediction performance. We then used the predicted labels of neighboring residues to improve the predictors further. The efficiency and effectiveness of the proposed approach are verified by its better prediction performance on a non-redundant data set of heterodimers.
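Of the three features, sequence information entropy is the easiest to show in isolation. A minimal sketch, assuming the entropy is the Shannon entropy (in bits) of the residues observed in one multiple-alignment column, with gaps dropped; the paper's exact definition, pseudocounts and window handling may differ, and `column_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    residues = [c for c in column.upper() if not (ignore_gaps and c == "-")]
    if not residues:
        return 0.0
    n = len(residues)
    # sum of p * log2(1 / p) over the residue types present in the column
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


print(column_entropy("AAAA"))   # 0.0
print(column_entropy("AACC"))   # 1.0
print(column_entropy("ACDE-"))  # 2.0
```

Low entropy marks conserved positions, which is the signal such predictors use because interface residues tend to be more conserved than the rest of the surface.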

12.
Domain structure of the HSC70 cochaperone, HIP. (total citations: 1; self-citations: 0; citations by others: 1)
The domain structure of the HSC70-interacting protein (HIP), a 43-kDa cytoplasmic cochaperone involved in regulating HSC70 chaperone activity and in the maturation of the progesterone receptor, has been probed by limited proteolysis and by biophysical and biochemical approaches. Proteolysis of HIP by thrombin and chymotrypsin generates essentially two fragments, an NH2-terminal fragment of 25 kDa (N25) and a COOH-terminal fragment of 18 kDa (C18), which appear to be well folded and stable as indicated by circular dichroism and by recombinant expression in Escherichia coli. NH2-terminal amino acid sequencing of the fragments indicates that both proteases cleave HIP within a predicted alpha-helix following the tetratricopeptide repeat (TPR) region, despite their different specificities and the presence of several potential cleavage sites scattered throughout the sequence; this suggests that the region is particularly accessible and may constitute a linker between two structural domains. In size exclusion chromatography, N25 and C18 elute as two distinct, homogeneous species with Stokes radii of 49 and 24 Å, respectively. Equilibrium sedimentation and sedimentation velocity indicate that N25 is a stable dimer, whereas C18 is monomeric in solution; the sedimentation coefficients of 3.2 and 2.3 S and f/f0 values of 1.5 and 1.1 for N25 and C18, respectively, indicate that N25 is elongated whereas C18 is globular in shape. Both domains are able to bind to the ATPase domain of HSC70 and inhibit rhodanese aggregation. Moreover, their effects appear to be additive when used in combination, suggesting that the domains cooperate in the full-length protein not only in HSC70 binding but also in chaperone activity.
Altogether, these results indicate that HIP consists of two structural and functional domains: an NH2-terminal 25-kDa domain responsible for dimerization and for the overall asymmetry of the molecule, and a COOH-terminal 18-kDa globular domain, both involved in binding HSC70 and unfolded proteins.

13.

Background  

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that a residue and its sequence-neighbor information can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even when no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural network-based algorithm that uses evolutionary information in amino acid sequences, in the form of position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.
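A PSSM-based predictor typically feeds the network a window of PSSM rows centered on the residue being classified. The hypothetical helper below shows that encoding, assuming 3 residues on each side and zero padding past the chain termini; the window size the authors used is not stated in this abstract, and real PSSMs have 20 columns rather than the 2 in the toy example.

```python
def window_features(pssm: list[list[float]], center: int, half_width: int = 3) -> list[float]:
    """Concatenate the PSSM rows from center - half_width to center + half_width.

    Rows beyond either chain terminus are replaced by zeros, so every residue
    gets a feature vector of the same length: (2 * half_width + 1) * n_columns.
    """
    n_columns = len(pssm[0])
    features: list[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            features.extend([0.0] * n_columns)
    return features


# Toy 3-residue PSSM with 2 columns.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(window_features(toy, center=0, half_width=1))  # [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

The fixed-length vector is what lets a feed-forward network score every residue, including those at the chain ends.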

14.
The lack of ordered structure in “natively unfolded” proteins raises a general question: are there intrinsic properties of amino acid residues that are responsible for the absence of fixed structure under physiological conditions? In this article, we demonstrate that whether a protein is folded or unfolded may be determined by the ability of its amino acid residues to form a sufficient number of contacts in a globular state. The expected average number of contacts per residue, calculated from the amino acid sequence alone (using the average number of contacts for each of the 20 amino acid residues in globular proteins), can be used as one simple indicator of natively unfolded proteins. For the sets of 80 folded and 90 natively unfolded proteins, prediction accuracy reaches 89% when the expected average number of contacts is used as the parameter and 83% when hydrophobicity is used. An optimal set of artificial parameters for the 20 amino acid residues, obtained with a Monte Carlo algorithm to maximally separate the sets of 90 natively unfolded and 80 folded proteins, demonstrates the upper limit for prediction accuracy, which is 95%.
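Both indicators in this abstract reduce to a sequence average of a per-residue property compared against a threshold. The per-residue contact numbers are not given here, so the sketch below swaps in the Kyte-Doolittle hydropathy scale purely as an example property; the threshold is a hypothetical parameter that would be fitted on labeled folded and unfolded sets.

```python
# Kyte-Doolittle hydropathy values, used here only as an example per-residue property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_average(seq: str, scale: dict[str, float]) -> float:
    """Mean of a per-residue property over the whole sequence."""
    seq = seq.upper()
    return sum(scale[aa] for aa in seq) / len(seq)


def predict_unfolded(seq: str, scale: dict[str, float], threshold: float) -> bool:
    """Flag a sequence as likely natively unfolded when its average falls below `threshold`.

    Assumption: lower values (fewer expected contacts, lower hydrophobicity) favor disorder.
    """
    return sequence_average(seq, scale) < threshold


print(predict_unfolded("EEKK", KYTE_DOOLITTLE, threshold=0.0))  # True
print(predict_unfolded("IVLA", KYTE_DOOLITTLE, threshold=0.0))  # False
```

Replacing `KYTE_DOOLITTLE` with a table of average contact numbers per residue type would give the contact-based indicator without changing any other code.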

15.
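Surface-exposed, flexible loops of native globular proteins are the sites most susceptible to proteolysis, so limited proteolysis can be used to locate these sites on xylose isomerase mutants. Hydrolysis of the W136E monomer by subtilisin gives a broken line on a first-order reaction plot, whereas limited hydrolysis of the T89S and V134I monomers gives straight lines. The magnesium-bound enzyme (MgE) is hydrolyzed more slowly than the apoenzyme (ApoE). The first subtilisin cleavage site in W136E lies between Ala28 and Thr29, and the second lies at the C-terminus of the polypeptide chain.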
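On a first-order plot, the natural logarithm of the fraction of intact enzyme falls linearly with time, so a single straight line (as for T89S and V134I) indicates one kinetic phase, while a broken line (as for W136E) suggests the rate changes during digestion. A minimal sketch of extracting the rate constant by least squares, assuming the fraction of intact protein has been measured at each time point (for example, by densitometry of gel bands); the data in the example are synthetic and the function name is hypothetical.

```python
import math


def first_order_rate(times: list[float], fraction_intact: list[float]) -> float:
    """Rate constant k from a least-squares fit of ln(fraction) = -k * t + c."""
    ys = [math.log(f) for f in fraction_intact]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    return -sxy / sxx  # the fitted slope is -k


# Synthetic single-phase data with k = 0.05 per minute.
t = [0.0, 10.0, 20.0, 30.0]
f = [math.exp(-0.05 * x) for x in t]
print(round(first_order_rate(t, f), 4))  # 0.05
```

For a broken line, fitting the early and late time points separately gives one rate constant per phase.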

16.
A new algorithm, called convex constraint analysis, has been developed to deduce the chiral contributions of the common secondary structures directly from the experimental CD curves of a large number of proteins. The analysis is based on CD data reported by Yang, J.T., Wu, C.-S.C. and Martinez, H.M. [Methods Enzymol., 130, 208-269 (1986)]. Applying the decomposition algorithm to simulated protein data sets yielded component spectra [B(λ, i)] identical to the originals and weights [C(i, k)] with excellent Pearson correlation coefficients (R) [Chang, C.T., Wu, C.-S.C. and Yang, J.T. (1978) Anal. Biochem., 91, 12-31]. Test runs were performed on sets of simulated protein spectra created by the Monte Carlo technique from poly-L-lysine-based pure component spectra. The high correlation coefficients (R > 0.9) demonstrated the power of the algorithm. Applied to globular protein data independently of X-ray data, the algorithm revealed that the CD spectrum of a given protein is composed of at least four independent sources of chirality. Three of the computed component curves closely resemble the CD spectra of known protein secondary structures. Judged against X-ray data, this approach yields a significant improvement in secondary structure estimates over previous methods, as well as a realistic set of pure component spectra. The new method is useful not only for analyzing CD spectra of globular proteins but potentially also for analyzing integral membrane proteins.

17.
Wang B, Chen P, Huang DS, Li JJ, Lok TM, Lyu MR. FEBS Letters, 2006, 580(2): 380-384
This paper proposes a novel method that predicts protein interaction sites in heterocomplexes using residue spatial sequence profiles and evolution rates. The former represents the information in multiple sequence alignments, while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors based on the support vector machine algorithm are constructed to predict whether a surface residue is part of a protein-protein interface. The efficiency and effectiveness of the proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

18.
A database of peptide chemical shifts, computed at the density functional level, has been used to develop an algorithm for predicting 15N and 13C shifts in proteins from their structure; the method is incorporated into a program called SHIFTS (version 4.0). The database was built from the calculated chemical shift patterns of 1335 peptides whose backbone torsion angles are limited to areas of the Ramachandran map around helical and sheet configurations. For each tripeptide in these regions of regular secondary structure (which constitute about 40% of residues in globular proteins), SHIFTS also consults the database for information about side-chain torsion angle effects for the residue of interest and for the preceding residue, and it estimates hydrogen bonding effects through an empirical formula that is also based on density functional calculations on peptides. The program can optionally search for alternative side-chain torsion angles that could significantly improve agreement between calculated and observed shifts. Applied to 20 proteins, the program shows good consistency with experimental data, with correlation coefficients of 0.92, 0.98, 0.99 and 0.90 and r.m.s. deviations of 1.94, 0.97, 1.05 and 1.08 ppm for 15N, 13Cα, 13Cβ and 13C', respectively. Reference shifts fitted to protein data are in good agreement with 'random-coil' values derived from experimental measurements on peptides. This prediction algorithm should be helpful in NMR assignment, crystal and solution structure comparison, and structure refinement.
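The agreement statistics quoted for each nucleus are a Pearson correlation coefficient and a root-mean-square deviation between calculated and observed shifts. A minimal, dependency-free sketch of both; the function names are generic and the shift values in the example are invented.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsd(x: list[float], y: list[float]) -> float:
    """Root-mean-square deviation between calculated and observed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Invented 15N shifts in ppm.
calc = [120.0, 118.5, 125.2, 110.3]
obs = [121.1, 117.9, 126.0, 111.5]
print(f"r = {pearson_r(calc, obs):.3f}, rmsd = {rmsd(calc, obs):.2f} ppm")
```

Both numbers are needed because a constant offset between calculated and observed shifts leaves r at 1.0 while the r.m.s. deviation grows, which is one reason reference shifts are fitted separately.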

19.
Calculation of protein extinction coefficients from amino acid sequence data (total citations: 128; self-citations: 0; citations by others: 128)
Quantitative study of protein-protein and protein-ligand interactions in solution requires accurate determination of protein concentration. For proteins available only in "molecular biological" amounts, it is often difficult or impossible to measure the molar extinction coefficient accurately. Yet without a reliable value of this parameter, protein concentrations cannot be determined by the usual UV spectroscopic means. Fortunately, the amino acid sequence and protomer molecular weight (and thus also the amino acid composition) are generally available from the DNA sequence, which is accurately known for most such proteins. In this paper we present a method for calculating accurate (to ±5% in most cases) molar extinction coefficients for proteins at 280 nm, simply from knowledge of the amino acid composition. The method is calibrated against 18 "normal" globular proteins whose molar extinction coefficients are accurately known, and the assumptions underlying the method, as well as its limitations, are discussed.
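The composition-based calculation can be sketched as a sum of per-chromophore contributions from tryptophan, tyrosine and cystine, followed by the Beer-Lambert law. The coefficients below are the widely used values of Pace et al. (1995) for folded proteins in water, swapped in for illustration; they are not necessarily the values calibrated in this paper. The sketch also assumes that cysteines pair into disulfides when `all_cys_as_cystine` is true.

```python
# Molar absorptivities at 280 nm, in M^-1 cm^-1 (Pace et al., 1995).
EPS_TRP = 5500
EPS_TYR = 1490
EPS_CYSTINE = 125  # per disulfide bond, not per cysteine


def extinction_280(seq: str, all_cys_as_cystine: bool = True) -> int:
    """Molar extinction coefficient at 280 nm estimated from composition alone."""
    s = seq.upper()
    n_cystine = s.count("C") // 2 if all_cys_as_cystine else 0
    return s.count("W") * EPS_TRP + s.count("Y") * EPS_TYR + n_cystine * EPS_CYSTINE


def concentration_molar(a280: float, seq: str, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: c = A / (epsilon * l)."""
    eps = extinction_280(seq)
    if eps == 0:
        raise ValueError("no Trp, Tyr or cystine: A280 cannot give the concentration")
    return a280 / (eps * path_cm)


print(extinction_280("MKWVTYCAC"))  # 7115
```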

20.
An algorithm is presented to identify peptide chain turns from X-ray-elucidated coordinate data. Chain turns are those regions in a globular protein where the backbone folds back on itself. The concept of a turn is important both because turns constitute recognizable structural units in proteins and because turns are situated at the solvent-accessible surface of the molecule.
Current algorithms for turn identification are highly operational in character, often finding false turns and omitting actual ones. The algorithm presented here uses only the Cα coordinates of every residue in the protein. No other information of any kind is required, and notions about hydrogen bonding at these loci are irrelevant to the geometric nature of the argument. In this sense, the algorithm provides an objective criterion for recognizing turns as strictly structural components of proteins.
The algorithm is used to find the turns in a test set of proteins. The results are in excellent agreement with visual turn identification from physical models.
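The paper's own algorithm is not reproduced here. For comparison, the sketch below implements the kind of simple operational criterion the abstract contrasts it with: flag residue i as the start of a turn when the Cα(i) to Cα(i+3) distance is below 7 Å. The 7 Å cutoff and the function name are assumptions; like the described method, the sketch uses only Cα coordinates.

```python
import math

Coord = tuple[float, float, float]


def operational_turns(ca_coords: list[Coord], cutoff: float = 7.0) -> list[int]:
    """0-based indices i where the C-alpha(i) to C-alpha(i+3) distance is below `cutoff` Å."""
    return [
        i
        for i in range(len(ca_coords) - 3)
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff
    ]


# Toy backbone that folds back on itself over residues 0-3, then extends.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0), (0.0, 20.0, 0.0)]
print(operational_turns(coords))  # [0]
```

In an α-helix the Cα(i) to Cα(i+3) distance is only about 5 Å, so this criterion also flags helical residues, one illustration of how operational definitions can report false turns.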


Copyright © Beijing Qinyun Technology Development Co., Ltd. | Beijing ICP Filing No. 09084417