Similar literature
1.
Machine learning approach for the prediction of protein secondary structure   (total citations: 8; self-citations: 0; citations by others: 8)
PROMIS (protein machine induction system), a program for machine learning, was used to generalize rules that characterize the relationship between primary and secondary structure in globular proteins. These rules can be used to predict an unknown secondary structure from a known primary structure. The symbolic induction method used by PROMIS was specifically designed to produce rules that are meaningful in terms of chemical properties of the residues. The rules found were compared with existing knowledge of protein structure: some features of the rules were already recognized (e.g. amphipathic nature of alpha-helices). Other features are not understood, and are under investigation. The rules produced a prediction accuracy for three states (alpha-helix, beta-strand and coil) of 60% for all proteins, 73% for proteins of known alpha domain type, 62% for proteins of known beta domain type and 59% for proteins of known alpha/beta domain type. We conclude that machine learning is a useful tool in the examination of the large databases generated in molecular biology.
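A three-state prediction accuracy of the kind quoted above is commonly computed as the fraction of residues whose predicted state matches the observed one. The following is a minimal sketch under that assumption; the function name `q3_accuracy`, the one-letter state codes (H, E, C) and the toy strings are illustrative and do not come from the PROMIS paper.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the fraction of residues whose predicted state matches the observed state.

    States are assumed to be encoded one character per residue:
    H (alpha-helix), E (beta-strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical toy example (not data from the paper): 8 of 10 residues agree.
observed = "HHHHCCEEEE"
predicted = "HHHCCCEEEC"
print(q3_accuracy(predicted, observed))  # 0.8
```

A per-class breakdown (for example, accuracy restricted to helical residues) would need the same comparison applied to a filtered subset of positions.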

2.
There are constraints on a protein sequence/structure for it to adopt a particular fold. These constraints could be either a local signature involving particular sequences or arrangements of secondary structure or a global signature involving features along the entire chain. To search systematically for protein fold signatures, we have explored the use of Inductive Logic Programming (ILP). ILP is a machine learning technique which derives rules from observation and encoded principles. The derived rules are readily interpreted in terms of concepts used by experts. For 20 populated folds in SCOP, 59 rules were found automatically. The accuracy of these rules, which is defined as the number of true positives plus true negatives over the total number of examples, is 74% (cross-validated value). Further analysis was carried out for 23 signatures covering 30% or more positive examples of a particular fold. The work showed that signatures of protein folds exist; about half of the rules discovered automatically coincide with the level of fold in the SCOP classification. Other signatures correspond to homologous family and may be the consequence of a functional requirement. Examination of the rules shows that many correspond to established principles published in specific literature. However, in general, the list of signatures is not part of standard biological databases of protein patterns. We find that the length of the loops makes an important contribution to the signatures, suggesting that this is an important determinant of the identity of protein folds. With the expansion in the number of determined protein structures, stimulated by structural genomics initiatives, there will be an increased need for automated methods to extract principles of protein folding from coordinates.

3.
This paper investigates whether Chomsky-like grammar representations are useful for learning cost-effective, comprehensible predictors of members of biological sequence families. The Inductive Logic Programming (ILP) Bayesian approach to learning from positive examples is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Collectively, five of the co-authors of this paper have extensive expertise on NPPs and general bioinformatics methods. Their motivation for generating an NPP grammar was that none of the existing bioinformatics methods could provide sufficient cost-savings during the search for new NPPs. Prior to this project, experienced specialists at SmithKline Beecham had tried for many months to hand-code such a grammar but without success. Our best predictor makes the search for novel NPPs more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the ILP Bayesian approach to learning from positive examples. A group of features is derived from this grammar. Other groups of features of NPPs are derived using other learning strategies. Amalgams of these groups are formed. A recognition model is generated for each amalgam using C4.5 and C4.5rules and its performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.
Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
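The point about predictive accuracy can be illustrated with a small calculation. The sketch below compares two hypothetical models on a data set with rare positives; the counts, the function name `accuracy_and_recall` and the use of recall as the "coverage of positives" are assumptions made for illustration. It does not implement the Relative Advantage function, whose definition is not given in the abstract.

```python
def accuracy_and_recall(tp: int, fn: int, fp: int, tn: int) -> tuple[float, float]:
    """Return (predictive accuracy, recall) from confusion-matrix counts.

    accuracy = (TP + TN) / all examples
    recall   = TP / (TP + FN), i.e. the fraction of positives that are covered
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)
    return accuracy, recall


# Hypothetical counts: 10 positives among 1000 examples.
# Model A covers 2 positives, model B covers 8; both reject 985 of 990 negatives.
model_a = accuracy_and_recall(tp=2, fn=8, fp=5, tn=985)
model_b = accuracy_and_recall(tp=8, fn=2, fp=5, tn=985)

print(f"A: accuracy={model_a[0]:.3f}, recall={model_a[1]:.1f}")  # A: accuracy=0.987, recall=0.2
print(f"B: accuracy={model_b[0]:.3f}, recall={model_b[1]:.1f}")  # B: accuracy=0.993, recall=0.8
```

With these toy numbers, recall differs fourfold while accuracy differs by less than one percentage point, because the abundant negatives dominate the accuracy score.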

4.
The receptor, a maltose/maltooligosaccharide-binding protein, has been found to be an excellent system for the study of molecular recognition because its polar and nonpolar binding functions are segregated into two globular domains. The X-ray structures of the "closed" and "open" forms of the protein complexed with maltose and maltotetraitol have been determined. These sugars have approximately 3 times more accessible polar surface (from OH groups) than nonpolar surface (from small clusters of sugar ring CH bonds). In the closed structures, the oligosaccharides are buried in the groove between the two domains of the protein and bound by extensive hydrogen bonding interactions of the OH groups with the polar residues confined mostly in one domain and by nonpolar interactions of the CH clusters with four aromatic residues lodged in the other domain. Substantial contacts between the sugar hydroxyls and aromatic residues are also formed. In the open structures, the oligosaccharides are bound almost exclusively in the domain rich in aromatic residues. This finding, along with the analysis of buried surface area due to complex formations in the open and closed structures, supports a major role for nonpolar interactions in initial ligand binding even when the ligands have significantly greater potential for highly specific polar interactions.

5.
Protein–protein interactions (PPI) are crucial for the establishment of life. However, their basic principles are still elusive and the recognition process is yet to be understood. It is important to look at the biomolecular structural space as a whole in order to understand the principles behind conformation–function relationships. Since the application of an alanine scanning mutagenesis (ASM) study to the growth hormone, it has been demonstrated that only a small subset of residues at a protein–protein interface is essential for binding: the hot-spots (HS). Aromatic residues are some of the most typical HS at a protein–protein interface. To investigate the structural role of the interfacial aromatic residues in protein–protein interactions, we performed molecular dynamics (MD) simulations of protein–protein complexes in a water environment and calculated a variety of physical–chemical characteristics. ASM studies of single residues and of dimers or high-order clusters were performed to check for cooperativity within aromatic residues. Major differences were found between the behavior of non-HS aromatic residues and HS aromatic residues that can be used to design drugs to block the critical interactions or to predict major interactions at protein–protein complexes.

6.
7.
The identification of protein–protein interactions is vital for understanding protein function, elucidating interaction mechanisms, and for practical applications in drug discovery. With the exponentially growing protein sequence data, fully automated computational methods that predict interactions between proteins are becoming essential components of system‐level function inference. A thorough analysis of protein complex structures demonstrated that binding site locations as well as the interfacial geometry are highly conserved across evolutionarily related proteins. Because the conformational space of protein–protein interactions is highly covered by experimental structures, sensitive protein threading techniques can be used to identify suitable templates for the accurate prediction of interfacial residues. Toward this goal, we developed eFindSitePPI, an algorithm that uses the three‐dimensional structure of a target protein, evolutionarily remotely related templates and machine learning techniques to predict binding residues. Using crystal structures, the average sensitivity (specificity) of eFindSitePPI in interfacial residue prediction is 0.46 (0.92). For weakly homologous protein models, these values only slightly decrease to 0.40–0.43 (0.91–0.92) demonstrating that eFindSitePPI performs well not only using experimental data but also tolerates structural imperfections in computer‐generated structures. In addition, eFindSitePPI detects specific molecular interactions at the interface; for instance, it correctly predicts approximately one half of hydrogen bonds and aromatic interactions, as well as one third of salt bridges and hydrophobic contacts. Comparative benchmarks against several dimer datasets show that eFindSitePPI outperforms other methods for protein‐binding residue prediction. It also features a carefully tuned confidence estimation system, which is particularly useful in large‐scale applications using raw genomic data. 
eFindSitePPI is freely available to the academic community at http://www.brylinski.org/efindsiteppi. Copyright © 2014 John Wiley & Sons, Ltd.

8.
In the age of proteomics, the role of certain amino acid residues and some post-translational modifications in noncovalent complex formation is gaining in importance, as the understanding of interactions between biological molecules is at the heart of the structure-function relationship puzzle. In this work, mass spectrometry is used to highlight ammonium- or guanidinium-aromatic interactions through cation-pi bonds and ammonium- or guanidinium-phosphate interactions through salt bridge formation. Such interactions are crucial factors in certain ligand-receptor interactions and receptor-receptor interactions. In addition, the ability of phosphorylated residues and phosphorylated lipids to form noncovalent complexes with guanidinium and quaternary ammonium (mostly through Coulombic interactions) is demonstrated, and could explain the stability of certain membrane-embedded proteins, or a possible role for phosphorylation in protein-protein interactions. Whereas Dougherty's work demonstrates cation-pi interactions in intra-protein interactions and folding, the present work explores inter-peptide interactions, i.e., the formation of noncovalent complexes between peptide epitopes containing adjacent aromatic residues and ones containing adjacent Arg, as a model to better understand the role of cation-pi complexes in protein-protein interaction. Complexes of peptides containing aromatic residues with quaternary amines, as well as the interaction of aromatic compounds with the guanidinium group of Arg, are also investigated. Considering that an inordinate number of therapeutic compounds contain aromatic rings and quaternary amines, the above-described interactions could possibly be of great importance in better understanding their mechanism of action.

9.
Antiparallel beta-sheets present two distinct environments to inter-strand residue pairs: beta(A,HB) sites have two backbone hydrogen bonds; whereas at beta(A,NHB) positions backbone hydrogen bonding is precluded. We used statistical methods to compare the frequencies of amino acid pairs at each site. Only approximately 10% of the 210 possible pairs showed occupancies that differed significantly between the two sites. Trends were clear in the preferred pairs, and these could be explained using stereochemical arguments. Cys-Cys, Aromatic-Pro, Thr-Thr, and Val-Val pairs all preferred the beta(A,NHB) site. In each case, the residues usually adopted sterically favored chi1 conformations, which facilitated intra-pair interactions: Cys-Cys pairs formed disulfide bonds; Thr-Thr pairs made hydrogen bonds; Aromatic-Pro and Val-Val pairs formed close van der Waals contacts. In contrast, to make intimate interactions at a beta(A,HB) site, one or both residues had to adopt less favored chi1 geometries. Nonetheless, pairs containing glycine and/or aromatic residues were favored at this site. Where glycine and aromatic side chains combined, the aromatic residue usually adopted the gauche conformation, which promoted novel aromatic ring-peptide interactions. This work provides rules that link protein sequence and tertiary structure, which will be useful in protein modeling, redesign, and de novo design. Our findings are discussed in light of previous analyses and experimental studies.
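The figure of 210 possible pairs is consistent with counting unordered pairs of the 20 standard amino acids with identical pairs (such as Cys-Cys) included: 190 mixed pairs plus 20 identical pairs. The short sketch below checks that arithmetic; it assumes the pairs are unordered, which the abstract implies but does not state outright.

```python
from itertools import combinations_with_replacement

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Unordered pairs, with identical pairs such as ("C", "C") included.
pairs = list(combinations_with_replacement(AMINO_ACIDS, 2))

print(len(pairs))  # 210
```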

10.
We examine the interaction of aromatic residues of proteins with arginine, an additive commonly used to suppress protein aggregation, using experiments and molecular dynamics simulations. An aromatic-rich peptide, FFYTP (a segment of insulin), and lysozyme and insulin are used as model systems. Mass spectrometry shows that arginine increases the solubility of FFYTP by binding to the peptide, with the simulations revealing the predominant association of arginine to be with the aromatic residues. The calculations further show a positive preferential interaction coefficient, Γ(XP), contrary to conventional thinking that positive Γ(XP)'s indicate aggregation rather than suppression of aggregation. Simulations with lysozyme and insulin also show arginine's preference for aromatic residues, in addition to acidic residues. We use these observations and earlier results reported by us and others to discuss the possible implications of arginine's interactions with aromatic residues on the solubilization of aromatic moieties and proteins. Our results also highlight the fact that explanations based purely on Γ(XP), which measures average affinity of an additive to a protein, could obscure or misinterpret the underlying molecular mechanisms behind additive-induced suppression of protein aggregation.

11.
Samanta U, Pal D, Chakrabarti P. Proteins, 2000, 38(3): 288-300
Although relatively rare, the tryptophan residue (Trp), with its large hydrophobic surface, has a unique role in the folded structure and the binding site of many proteins, and its fluorescence properties make it very useful in studying the structures and dynamics of protein molecules in solution. An analysis has been made of its environment and the geometry of its interaction with neighbors using 719 Trp residues in 180 different protein structures. The distribution of the number of partners interacting with the Trp aromatic ring shows a peak at 6 (considering protein residues only) and 8 (including water and substrate molecules also). The means of the solvent-accessible surface areas of the ring show an exponential decrease with the increase in the number of partners; this relationship can be used to assess the efficiency of packing of residues around Trp. Various residues exhibit different propensities of binding the Trp side chain. The aromatic residues, Met and Pro have high values, whereas the smaller and polar-chain residues have weaker propensities. Most of the interactions are with residues far away in sequence, indicating the importance of Trp in stabilizing the tertiary structure. Of all the ring atoms NE1 shows the highest number of interactions, both along the edge (hydrogen bonding) as well as along the face. Various weak but specific interactions, engendering stability to the protein structure, have been identified.

12.
Aromatic residues have been previously shown to mediate the self-assembly of different soluble proteins through pi-pi interactions (McGaughey, G. B., Gagne, M., and Rappe, A. K. (1998) J. Biol. Chem. 273, 15458-15463). However, their role in transmembrane (TM) assembly is not yet clear. In this study, we performed statistical analysis of the frequency of occurrence of aromatic pairs in a bacterial TM database that provided an initial indication that the appearance of a specific aromatic pattern, Aromatic-XX-Aromatic, is not coincidental, similar to the well characterized QXXS motif. The QXXS motif was previously shown to be both critical and sufficient for stabilizing TM self-assembly. Using the ToxR system, we monitored the dimerization propensities of TM domains that contain mutations of interacting residues to aromatic amino acids and demonstrated that aromatic residues can adequately stabilize self-association. Importantly, we have provided an example of a natural TM domain, the cholera toxin secretion protein EpsM, whose TM self-assembly is mediated by an aromatic motif (WXXW). This is, in fact, the first evidence that aromatic residues are involved in the dimerization of a wild-type TM domain. The association mediated by aromatic residues was found to be sensitive to the TM sequence, suggesting that aromatic residue motifs can provide a general means for specificity in TM assembly. Molecular dynamics provided a structural explanation for this backbone sequence sensitivity.
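A pattern such as Aromatic-XX-Aromatic can be located in a sequence with a regular expression. The sketch below is an illustration only: it assumes the aromatic set is Phe, Trp and Tyr (the abstract does not list the residues counted as aromatic), the function name `find_aromatic_motifs` is hypothetical, and the toy sequence is invented rather than taken from EpsM or from the bacterial TM database used in the study.

```python
import re

# Assumption: "aromatic" means F, W or Y; the paper may use a different set.
AROMATIC = "FWY"

# A lookahead makes every match zero-width, so overlapping motifs are all reported.
MOTIF = re.compile(rf"(?=([{AROMATIC}]..[{AROMATIC}]))")


def find_aromatic_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched four-residue window) for each Aromatic-XX-Aromatic hit."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Invented toy sequence containing three overlapping or adjacent motifs.
toy_sequence = "LLWAAWLLFGGYAL"
print(find_aromatic_motifs(toy_sequence))
# [(2, 'WAAW'), (5, 'WLLF'), (8, 'FGGY')]
```

Counting such hits across a database and comparing them with the frequency expected by chance would be the statistical step described in the abstract; that comparison is not sketched here.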

13.
The constrained backbone torsion angle of a proline (Pro) residue has usually been invoked to explain its three-dimensional context in proteins. Here we show that specific interactions involving the pyrrolidine ring atoms also contribute to its location in a given secondary structure and its binding to another molecule. It is adept at participating in two rather non-conventional interactions, C-H...pi and C-H...O. The geometry of interaction between the pyrrolidine and aromatic rings, vis-à-vis the occurrence of the C-H...pi interactions, has been elucidated. Some of the secondary structural elements stabilized by Pro-aromatic interactions are beta-turns, where a Pro can interact with an adjacent aromatic residue, and antiparallel beta-sheets, where a Pro in an edge strand can interact with an aromatic residue in the adjacent strand at a non-hydrogen-bonded site. The C-H groups at the Calpha and Cdelta positions can form strong C-H...O interactions (as seen from the clustering of points) and such interactions involving a Pro residue at C' position relative to an alpha-helix can cap the hydrogen bond forming potentials of the free carbonyl groups at the helix C terminus. Functionally important Pro residues occurring at the binding site of a protein almost invariably engage aromatic residues (with one of them being held by C-H...pi interaction) from the partner molecule in the complex, and such aromatic residues are highly conserved during evolution.

14.
Cation-pi interactions play an important role in the stability of protein structures. In our earlier work, we analyzed the influence and energetic contribution of cation-pi interactions in three-dimensional structures of membrane proteins. In this work, we investigate the characteristic features of residues that are involved in cation-pi interactions. We have computed several parameters, such as surrounding hydrophobicity, number of long-range contacts, conservation score and normalized B-factor, for all these residues and identified their location, whether in the membrane or at the surface. We found that the cation-pi interactions are mainly formed by long-range interactions. The cationic residues involved in cation-pi interactions have higher surrounding hydrophobicity than their average values in the whole dataset and an opposite trend is observed for aromatic residues. In transmembrane helical proteins, except Phe, all other residues that are responsible for cation-pi interactions are highly conserved with other related protein sequences whereas in transmembrane strand proteins, an appreciable conservation is observed only for Arg. The analysis of the flexibility of residues reveals that the cation-pi interaction forming residues are more stable than other residues. The results obtained in the present study would be helpful for understanding the role of cation-pi interactions in the structure and folding of membrane proteins.

15.
Gettins P. Biochemistry, 1987, 26(5): 1391-1398
1H NMR has been used to characterize and compare the structures of antithrombin III from human, bovine, and porcine plasma as well as to investigate the interactions of each of these proteins with heparin fragments of defined length. The amino acid compositions of the three proteins are very similar, which is reflected in the gross features of their 1H NMR spectra. In addition, aromatic and methyl proton resonances in upfield-shifted positions appear to be common to all three proteins and suggest similar tertiary structures. Human antithrombin III has five histidine residues, bovine has six, and porcine has five. The C(2) proton from each of these residues gives a narrow resonance and titrates with pH; the pKa's are in the range 5.15-7.25. It is concluded that all histidines in each protein are surface residues with considerable independent mobility. The carbohydrate chains in each protein also give sharp resonances consistent with a surface location and motional flexibility. The 1H spectra are sensitive to heparin binding. Although heparin resonances obscure protein resonances in the region 3.2-6.0 ppm, difference spectra between antithrombin III with and without heparin show clear perturbation of a small number of aromatic and aliphatic protein protons. These resonances include those of histidine C(2) and C(4) protons, of 10-20 other aromatic protons, of a methyl group, and also of protons with chemical shifts similar to those of lysine and/or arginine side chains. For human antithrombin III, it was shown that heparin fragments 8, 10, and 16 sugar residues in length result in almost identical perturbations to the protein. In contrast, tetrasaccharide results in fewer perturbations.(ABSTRACT TRUNCATED AT 250 WORDS)

16.
A low-molecular-weight protein induced in the liver of the plaice (Pleuronectes platessa) by exposure to cadmium was purified and characterized. It is closely similar to mammalian metallothioneins in all of its properties in that it is a single-chain cadmium-binding protein of approx. 7000 mol.wt. with a high cysteine content (31 mol%) and no aromatic amino acid residues. The thiol groups of the cysteine residues complex with the cadmium in a SH/Cd molar ratio of 3:1 and produce a characteristic absorption maximum at 250 nm. Unlike the mammalian metallothioneins, however, metal analyses reveal only traces of zinc and copper in addition to cadmium. The presence of carbohydrate previously assumed from a positive reaction with periodic acid/Schiff reagent has now been disproved, and the positive reaction attributed to interaction with the thiol groups in the protein.

17.
We investigated the spectroscopic properties of the aromatic residues in a set of octapeptides with various self-assembly properties. These octapeptides are based on lanreotide, a cyclic peptide analogue of somatostatin-14 that spontaneously self-assembles into very long and monodisperse hollow nanotubes. A previous study on these lanreotide-based derivatives has shown that the disulfide bridge, the peptide hairpin conformation and the aromatic residues are involved in the self-assembly process and that modification of these properties either decreases the self-assembly propensity or modifies the molecular packing, resulting in different self-assembled architectures. In this study we probed the local environment of the aromatic residues, naphthyl-alanine, tryptophan and tyrosine, by Raman and fluorescence spectroscopy, comparing nonassembled peptides at low concentrations with the self-assembled ones at high concentrations. As expected, the spectroscopic characteristics of the aromatic residues were found to be sensitive to the peptide-peptide interactions. Among the most remarkable features, we could record a very unusual Raman spectrum for the tyrosine of lanreotide in relation to its propensity to form H-bonds within the assemblies. In lanreotide nanotubes, and also in the supramolecular architectures formed by its derivatives, the tryptophan side chain is water-exposed. Finally, the low fluorescence polarization of the peptide aggregates suggests that fluorescence energy transfer occurs within the nanotubes.

18.
Qamra R, Prakash P, Aruna B, Hasnain SE, Mande SC. Biochemistry, 2006, 45(23): 6997-7005
Chorismate mutase catalyzes the first committed step toward the biosynthesis of the aromatic amino acids, phenylalanine and tyrosine. While this biosynthetic pathway exists exclusively in the cell cytoplasm, the Mycobacterium tuberculosis enzyme has been shown to be secreted into the extracellular medium. The secretory nature of the enzyme and its existence in M. tuberculosis as a duplicated gene are suggestive of its role in host-pathogen interactions. We report here the crystal structure of homodimeric chorismate mutase (Rv1885c) from M. tuberculosis determined at 2.15 Å resolution. The structure suggests possible gene duplication within each subunit of the dimer (residues 35-119 and 130-199) and reveals an interesting proline-rich region on the protein surface (residues 119-130), which might act as a recognition site for protein-protein interactions. The structure also offers an explanation for its regulation by small ligands, such as tryptophan, a feature previously unknown in the prototypical Escherichia coli chorismate mutase. The tryptophan ligand is found to be sandwiched between the two monomers in a dimer contacting residues 66-68. The active site in the "gene-duplicated" monomer is occupied by a sulfate ion and is located in the first half of the polypeptide, unlike in the Saccharomyces cerevisiae (yeast) enzyme, where it is located in the latter half. We hypothesize that the M. tuberculosis chorismate mutase might have a role to play in host-pathogen interactions, making it an important target for designing inhibitor molecules against the deadly pathogen.

19.
20.
Membrane protein folding and topogenesis are tuned to a given lipid profile since lipids and proteins have co-evolved to follow a set of interdependent rules governing final protein topological organization. Transmembrane domain (TMD) topology is determined via a dynamic process in which topogenic signals in the nascent protein are recognized and interpreted initially by the translocon followed by a given lipid profile in accordance with the Positive Inside Rule. The net zero charged phospholipid phosphatidylethanolamine and other neutral lipids dampen the translocation potential of negatively charged residues in favor of the cytoplasmic retention potential of positively charged residues (Charge Balance Rule). This explains why positively charged residues are more potent topological signals than negatively charged residues. Dynamic changes in orientation of TMDs during or after membrane insertion are attributed to non-sequential cooperative and collective lipid–protein charge interactions as well as long-term interactions within a protein. The proportion of dual topological conformers of a membrane protein varies in a dose-responsive manner with changes in the membrane lipid composition not only in vivo but also in vitro and therefore is determined by the membrane lipid composition. Switching between two opposite TMD topologies can occur in either direction in vivo and also in liposomes (designated as fliposomes) independent of any other cellular factors. Such lipid-dependent post-insertional reversibility of TMD orientation indicates a thermodynamically driven process that can occur at any time and in any cell membrane driven by changes in the lipid composition. This dynamic view of protein topological organization influenced by the lipid environment reveals previously unrecognized possibilities for cellular regulation and understanding of disease states resulting from mis-folded proteins.
This article is part of a Special Issue entitled: Protein trafficking and secretion in bacteria. Guest Editors: Anastassios Economou and Ross Dalbey.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration: 京ICP备09084417号