Similar literature
 Found 20 similar articles (search time: 195 ms)
1.

Background

Prediction and analysis of protein-protein interactions (PPI) and specifically types of PPIs is an important problem in life science research because of the fundamental roles of PPIs in many biological processes in living cells. In addition, electrostatic interactions are important in understanding inter-molecular interactions, since they are long-range, and because of their influence in charged molecules. This is the main motivation for using electrostatic energy for prediction of PPI types.

Results

We propose a prediction model to analyze protein interaction types, namely obligate and non-obligate, using electrostatic energy values as properties. The prediction approach uses electrostatic energy values for pairs of atoms and amino acids present in interfaces where the interaction occurs. The main features of the complexes are found and then the prediction is performed via several state-of-the-art classification techniques, including linear dimensionality reduction (LDR), support vector machine (SVM), naive Bayes (NB) and k-nearest neighbor (k-NN). For an in-depth analysis of classification results, some other experiments were performed by varying the distance cutoffs between atom pairs of interacting chains, ranging from 5Å to 13Å. Moreover, several feature selection algorithms including gain ratio (GR), information gain (IG), chi-square (Chi2) and minimum redundancy maximum relevance (mRMR) are applied on the available datasets to obtain more discriminative pairs of atom types and amino acid types as features for prediction.

Conclusions

Our results on two well-known datasets of obligate and non-obligate complexes confirm that electrostatic energy is an important property for predicting obligate and non-obligate protein interaction types, with accuracies of over 98% across the experiments. Furthermore, a comparison performed by changing the distance cutoff demonstrates that the best values for prediction of PPI types using electrostatic energy range from 9Å to 12Å, which shows that electrostatic interactions are long-range and cover a broader area in the interface. In addition, the results on using feature selection before prediction confirm that (a) a few pairs of atoms and amino acids are appropriate for prediction, and (b) prediction performance can be improved by eliminating irrelevant and noisy features and selecting the most discriminative ones.
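The abstract above does not say how the electrostatic energies were computed. As a rough illustration only, the sketch below sums plain Coulomb energies per atom-type pair for atoms of two chains that lie within a distance cutoff. The function name `pairwise_energy_features`, the atom tuple layout, the constant dielectric and the conversion factor of about 332.06 kcal·Å/(mol·e²) are all assumptions made for this example, not the authors' method.

```python
import math
from collections import defaultdict

# Approximate Coulomb conversion factor in kcal*Angstrom/(mol*e^2) (assumed units).
COULOMB_CONSTANT = 332.06


def pairwise_energy_features(chain_a, chain_b, cutoff=10.0, dielectric=4.0):
    """Sum Coulomb energies per atom-type pair for inter-chain atom pairs.

    Each atom is a tuple (atom_type, partial_charge, (x, y, z)).
    Only pairs separated by at most `cutoff` Angstrom contribute.
    This is a hypothetical stand-in for the energy model of the paper.
    """
    features = defaultdict(float)
    for type_a, q_a, pos_a in chain_a:
        for type_b, q_b, pos_b in chain_b:
            r = math.dist(pos_a, pos_b)
            if 0.0 < r <= cutoff:
                # Sort the key so that (N, O) and (O, N) share one feature.
                key = tuple(sorted((type_a, type_b)))
                features[key] += COULOMB_CONSTANT * q_a * q_b / (dielectric * r)
    return dict(features)


# Toy example: the second oxygen is beyond the cutoff and is ignored.
chain_a = [("N", 1.0, (0.0, 0.0, 0.0))]
chain_b = [("O", -1.0, (5.0, 0.0, 0.0)), ("O", -0.5, (20.0, 0.0, 0.0))]
example = pairwise_energy_features(chain_a, chain_b, cutoff=10.0, dielectric=1.0)
```

Repeating the call with cutoffs from 5 to 13 Å would give one feature vector per cutoff, which is the kind of comparison the abstract describes.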

2.

Background

One of the crucial steps toward understanding the biological functions of a cellular system is to investigate protein–protein interaction (PPI) networks. As an increasing number of reliable PPIs become available, there is a growing need for discovering PPIs to reconstruct the PPI networks of organisms of interest. Some interolog-based methods and homologous PPI families have been proposed for predicting PPIs from the known PPIs of source organisms.

Results

Here, we propose a multiple-strategy scoring method to identify reliable PPIs for reconstructing the mouse PPI network from two well-known organisms: human and fly. We first identified the PPI candidates of target organisms based on homologous PPIs, sharing significant sequence similarities (joint E-value ≤ 1 × 10⁻⁴⁰), from source organisms using generalized interolog mapping. These PPI candidates were evaluated by our multiple-strategy scoring method, combining sequence similarities, normalized ranks, and conservation scores across multiple organisms. According to 106,825 PPI candidates in yeast derived from human and fly, our scoring method can achieve high prediction accuracy and outperform generalized interolog mapping. Experimental results show that our multiple-strategy score can avoid the influence of the protein family size and length to significantly improve PPI prediction accuracy and reflect the biological functions. In addition, the top-ranked and conserved PPIs are often orthologous/essential interactions and share functional similarity. Based on these reliable predicted PPIs, we reconstructed a comprehensive mouse PPI network, which is a scale-free network and can reflect the biological functions and high connectivity of 292 KEGG modules, including 216 pathways and 76 structural complexes.

Conclusions

Experimental results show that our scoring method can improve prediction accuracy based on the normalized rank and evolutionary conservation from multiple organisms. Our predicted PPIs share similar biological processes and cellular components, and the reconstructed genome-wide PPI network can reflect network topology and modularity. We believe that our method is useful for inferring reliable PPIs and reconstructing a comprehensive PPI network of an organism of interest.
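The candidate-generation step of generalized interolog mapping can be sketched as follows: if proteins A and B interact in a source organism and have homologs A' and B' in the target organism, then A'–B' becomes a candidate. The abstract gives only the joint E-value threshold; the geometric-mean definition of the joint E-value used here, the function name `map_interologs` and the input layout are assumptions for illustration.

```python
import math


def map_interologs(source_ppis, homologs, max_joint_evalue=1e-40):
    """Transfer source-organism PPIs to target-organism candidates.

    source_ppis: iterable of (protein_a, protein_b) pairs in the source organism.
    homologs: dict mapping a source protein to a list of
              (target_protein, evalue) tuples.
    Returns a dict {(target_a, target_b): best_joint_evalue}.
    """
    candidates = {}
    for a, b in source_ppis:
        for target_a, e_a in homologs.get(a, []):
            for target_b, e_b in homologs.get(b, []):
                # Assumed definition: geometric mean of the two E-values.
                joint = math.sqrt(e_a * e_b)
                if joint <= max_joint_evalue:
                    pair = tuple(sorted((target_a, target_b)))
                    candidates[pair] = min(joint, candidates.get(pair, math.inf))
    return candidates


# Toy example with hypothetical identifiers.
source_ppis = [("H1", "H2")]
homologs = {
    "H1": [("M1", 1e-50)],
    "H2": [("M2", 1e-60), ("M3", 1e-10)],
}
candidates = map_interologs(source_ppis, homologs)
```

In this toy input only the M1–M2 pair passes the threshold; the M1–M3 pair is rejected because one of its homology hits is weak. The scoring by normalized rank and conservation described in the abstract would then be applied to the surviving candidates.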

3.
For a better comprehension of the structure-function relationship in proteins it is necessary to identify the amino acids that are relevant for measurable protein functions. Because of the numerous contacts that amino acids establish within proteins and the cooperative nature of their interactions, it is difficult to achieve this goal. Thus, the study of protein-ligand interactions is usually focused on local environmental structural differences. Here, using a pair of triosephosphate isomerase enzymes with extremely high homology from two different organisms, we demonstrate that the control of a seventy-fold difference in reactivity of the interface cysteine is located in several amino acids from two structurally unrelated regions that do not contact the cysteine sensitive to the sulfhydryl reagent methylmethane sulfonate, nor the residues in its immediate vicinity. The change in reactivity is due to an increase in the apparent pKa of the interface cysteine produced by the mutated residues. Our work, which involved systematically grafting portions of one protein into the other protein, revealed unsuspected and multisite long-range interactions that modulate the properties of the interface cysteines and has general implications for future studies on protein structure-function relationships.

4.
Sequence comparison of the heterocyst-type ferredoxin (FdxH) from Anabaena 7120 and type-I ferredoxins (PetF) from the same organism and other cyanobacteria revealed a group of positively charged residues characteristic of FdxH. Molecular modeling showed that these basic amino acids are clustered on the surface of FdxH. The corresponding domain of PetF contained acidic or nonpolar residues instead. To identify amino acids that are important for interaction with nitrogenase, we generated site-directed mutations in the fdxH gene and assayed the in vitro activity of the resulting recombinant proteins isolated from Escherichia coli. In addition to the point mutants, two chimeric proteins, FdxH : PetF and PetF : FdxH, were constructed containing the 58 N-terminal amino acids of one ferredoxin fused to the 40 C-terminal amino acids of the other. Exchange of lysines 10 and 11 of FdxH for the corresponding residues of PetF (glutamate 10 and alanine 11) resulted in a ferredoxin with greatly decreased affinity for nitrogenase. This indicates an important function of these basic amino acids in interaction with dinitrogenase reductase (NifH) from Anabaena. In addition, we checked the reactivity of the recombinant ferredoxins with ferredoxin-NADP+ oxidoreductase (FNR) and photosystem I. The experiments with both the chimeric and point-mutated ferredoxins showed that the C-terminal part of this protein determines its activity in NADP+ photoreduction.

5.

Background

Protein-protein interactions play a critical role in protein function. Completion of many genomes is being followed rapidly by major efforts to identify interacting protein pairs experimentally in order to decipher the networks of interacting, coordinated-in-action proteins. Identification of protein-protein interaction sites and detection of specific amino acids that contribute to the specificity and the strength of protein interactions is an important problem with broad applications ranging from rational drug design to the analysis of metabolic and signal transduction networks.

Results

In order to increase the power of predictive methods for protein-protein interaction sites, we have developed a consensus methodology for combining four different methods. These approaches include: data mining using Support Vector Machines, threading through protein structures, prediction of conserved residues on the protein surface by analysis of phylogenetic trees, and the Conservatism of Conservatism method of Mirny and Shakhnovich. Results obtained on a dataset of hydrolase-inhibitor complexes demonstrate that the combination of all four methods yields improved predictions over the individual methods.

Conclusions

We developed a consensus method for predicting protein-protein interface residues by combining sequence and structure-based methods. The success of our consensus approach suggests that similar methodologies can be developed to improve prediction accuracies for other bioinformatic problems.

6.
Analysis of proteins commonly requires the partition of their structure into regions such as the surface, interior, or interface. Despite the frequent use of such categorization, no consensus definition seems to exist. This study thus aims at providing a definition that is general, is simple to implement, and yields new biological insights. This analysis relies on 397, 196, and 701 protein structures from Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens, respectively, and the conclusions are consistent across all three species. A threshold of 25% relative accessible surface area best segregates amino acids at the interior and at the surface. This value is further used to extend the core-rim model of protein-protein interfaces and to introduce a third region called support. Interface core, rim, and support regions contain similar numbers of residues on average, but core residues contribute over two-thirds of the contact surface. The amino acid composition of each region remains similar across different organisms and interface types. The interface core composition is intermediate between the surface and the interior, but the compositions of the support and the rim are virtually identical with those of the interior and the surface, respectively. The support and rim could thus “preexist” in proteins, and evolving a new interaction could require mutations to form an interface core only. Using the interface regions defined, it is shown through simulations that only two substitutions are necessary to shift the average composition of a 1000-Å² surface patch involving ∼28 residues to that of an equivalent interface. This analysis and conclusions will help understand the notion of promiscuity in protein-protein interaction networks.
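A minimal sketch of how the 25% threshold could be turned into region labels is given below. The abstract states only the threshold and the region names; the specific assignment rules (support buried in the monomer, rim still exposed in the complex, core exposed in the monomer but buried in the complex) are an assumption chosen to match the statement that support resembles the interior and rim resembles the surface. The function name `classify_residue` is hypothetical.

```python
RASA_THRESHOLD = 0.25  # 25% relative accessible surface area, from the abstract


def classify_residue(rasa_monomer, rasa_complex, threshold=RASA_THRESHOLD):
    """Assign a residue to interior, surface, support, rim or core.

    rasa_monomer: relative accessible surface area in the isolated chain (0..1).
    rasa_complex: relative accessible surface area in the complex (0..1).
    The rules below are an assumed reading of the three-region model.
    """
    if rasa_complex >= rasa_monomer:
        # No burial upon binding: the residue is not part of the interface.
        return "surface" if rasa_monomer > threshold else "interior"
    if rasa_monomer < threshold:
        return "support"  # already buried in the monomer
    if rasa_complex > threshold:
        return "rim"      # still exposed in the complex
    return "core"         # exposed in the monomer, buried in the complex


labels = [
    classify_residue(0.10, 0.10),
    classify_residue(0.60, 0.60),
    classify_residue(0.20, 0.05),
    classify_residue(0.60, 0.40),
    classify_residue(0.50, 0.10),
]
```

The relative accessibility values would normally come from a solvent-accessibility program run once on the isolated chain and once on the complex; that step is outside this sketch.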

7.
The study of gene and protein interaction networks has improved our understanding of the multiple, systemic levels of regulation found in eukaryotic and prokaryotic organisms. Here we carry out a large-scale analysis of the protein-protein interaction (PPI) network of fission yeast (Schizosaccharomyces pombe) and establish a method to identify ‘linker’ proteins that bridge diverse cellular processes, integrating Gene Ontology and PPI data with network theory measures. We test the method on a highly characterized subset of the genome consisting of proteins controlling the cell cycle, cell polarity and cytokinesis and identify proteins likely to play a key role in controlling the temporal changes in the localization of the polarity machinery. Experimental inspection of one such factor, the polarity-regulating RNB protein Sts5, confirms the prediction that it has a cell-cycle-dependent regulation. Detailed bibliographic inspection of other predicted ‘linkers’ also confirms the predictive power of the method. As the method is robust to network perturbations and can successfully predict linker proteins, it provides a powerful tool to study the interplay between different cellular processes.

8.
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with these data to obtain better and faster convergence to more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.

9.
Porcine pleuropneumonia caused by Actinobacillus pleuropneumoniae has led to severe economic losses in the pig industry worldwide. A. pleuropneumoniae displays various levels of antimicrobial resistance, leading to the dire need to identify new drug targets. Protein–protein interaction (PPI) networks can aid the identification of drug targets by revealing proteins that are essential during the life of bacteria. The aim of this study is to identify drug target candidates of A. pleuropneumoniae from essential proteins in its PPI network. The homologous protein mapping method (HPM) was utilized to construct the A. pleuropneumoniae PPI network. Afterwards, the subnetwork centered on H-NS was selected to verify the PPI network using bacterial two-hybrid assays. Drug target candidates were identified from the hub proteins by analyzing the topology of the network using interaction degree and homologous comparison with the pig proteome. An A. pleuropneumoniae PPI network containing 2737 non-redundant interaction pairs among 533 proteins was constructed. These proteins were distributed in 21 COG functional categories and 28 KEGG metabolic pathways. The A. pleuropneumoniae PPI network was scale-free, and similar topological tendencies were found when it was compared with other bacterial PPI networks. Furthermore, 56.3% of the H-NS subnetwork interactions were validated. 57 highly connected proteins (hub proteins) were identified from the A. pleuropneumoniae PPI network. Finally, 9 potential drug targets with no homologs in swine were identified from the hub proteins. This study provides drug target candidates, which are promising for further investigations to explore lead compounds against A. pleuropneumoniae.
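The hub-selection idea described above (rank proteins by interaction degree, then discard those with a homolog in the host) can be sketched in a few lines. The function names `find_hubs` and `filter_targets`, the protein identifiers and the `top_n` cutoff are hypothetical; the abstract does not say which degree threshold defined the 57 hub proteins.

```python
from collections import defaultdict


def find_hubs(edges, top_n=3):
    """Return the `top_n` proteins with the highest interaction degree.

    edges: iterable of (protein_a, protein_b) pairs; duplicates and
    self-interactions are ignored when counting neighbours.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {protein: len(partners) for protein, partners in neighbors.items()}
    # Sort by descending degree, then by name for a deterministic order.
    ranked = sorted(degree, key=lambda p: (-degree[p], p))
    return [(p, degree[p]) for p in ranked[:top_n]]


def filter_targets(hubs, host_homologs):
    """Keep only hub proteins without a homolog in the host proteome."""
    return [protein for protein, _ in hubs if protein not in host_homologs]


# Toy network with hypothetical identifiers.
edges = [("hns", "a"), ("hns", "b"), ("hns", "c"), ("a", "b"), ("a", "b")]
hubs = find_hubs(edges, top_n=2)
targets = filter_targets(hubs, host_homologs={"a"})
```

Here the set of host homologs is passed in directly; in practice it would come from a sequence comparison against the pig proteome.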

10.
A proteome-wide protein-protein interaction (PPI) network of Methanobrevibacter ruminantium M1 (MRU), a predominant rumen methanogen, was constructed from its metabolic genes using a gene neighborhood algorithm and then compared with closely related rumen methanogens. Using a proteome-wide PPI approach, we constructed a network encompassing 2194 edges and 637 nodes interacting with 634 genes. Network quality and robustness of functional modules were assessed with gene ontology terms. A structure-function-metabolism mapping for each protein has been carried out with efforts to extract experimental PPI concomitant information from the literature. The results of our study revealed that some topological properties of its network were robust for sharing homologous protein interactions across heterotrophic and hydrogenotrophic methanogens. The MRU proteome was shown to establish many PPI sub-networks for associated metabolic subsystems required to survive in the rumen environment. The MRU genome was found to share interacting proteins from its PPI network involved in specific metabolic subsystems distinct to heterotrophic and hydrogenotrophic methanogens. Across these proteomes, the interacting proteins from differential PPI networks were shared in common for the biosynthesis of amino acids, nucleosides, and nucleotides and for energy metabolism, with a larger fraction of protein pairs shared with Methanosarcina acetivorans. Our comparative study advances our understanding of the complex proteome network associated with typical metabolic subsystems of MRU and may help improve its genome-scale reconstruction in the future.

11.
Chen CT  Peng HP  Jian JW  Tsai KC  Chang JY  Yang EW  Chen JB  Ho SY  Hsu WL  Yang AS 《PloS one》2012,7(6):e37706
Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
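The residue-based measures quoted above (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all follow from the four confusion-matrix counts. The sketch below shows the standard formulas; the function name `binary_metrics` and the toy counts are assumptions for illustration and are not the counts behind the reported values.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives (for example, residues predicted as PPI site or not).
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + tn + fn),
        "precision": _safe_div(tp, tp + fp),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "mcc": _safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Toy counts, not taken from the paper.
metrics = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

Because interface residues are usually a minority class, the Matthews correlation coefficient is a more informative summary than accuracy alone, which is presumably why the abstract reports both.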

12.
Protein-protein interactions, a key to almost any biological process, are mediated by molecular mechanisms that are not entirely clear. The study of these mechanisms often focuses on all residues at protein-protein interfaces. However, only a small subset of all interface residues is actually essential for recognition or binding. Commonly referred to as "hotspots," these essential residues are defined as residues that impede protein-protein interactions if mutated. While no in silico tool identifies hotspots in unbound chains, numerous prediction methods were designed to identify all the residues in a protein that are likely to be a part of protein-protein interfaces. These methods typically identify successfully only a small fraction of all interface residues. Here, we analyzed the hypothesis that the two subsets correspond (i.e., that in silico methods may predict few residues because they preferentially predict hotspots). We demonstrate that this is indeed the case and that we can therefore predict directly from the sequence of a single protein which residues are interaction hotspots (without knowledge of the interaction partner). Our results suggested that most protein complexes are stabilized by similar basic principles. The ability to accurately and efficiently identify hotspots from sequence enables the annotation and analysis of protein-protein interaction hotspots in entire organisms and thus may benefit function prediction and drug development. The server for prediction is available at http://www.rostlab.org/services/isis.

13.
14.
This paper presents an in silico characterization of the chitin binding protein CBP50 from B. thuringiensis serovar konkukian S4 through homology modeling and molecular docking. CBP50 has a modular structure containing an N-terminal CBM33 domain, two consecutive fibronectin-III (Fn-III) like domains and a C-terminal CBM5 domain. This unique modular structure could not be modeled using ordinary procedures, so domain-wise modeling using MODELLER and docking analyses using Autodock Vina were performed. The best conformation for each domain was selected using a standard procedure. It was revealed that four amino acid residues, Glu-71, Ser-74, Glu-76 and Gln-90, from the N-terminal domain are involved in protein-substrate interaction. Similarly, amino acid residues Trp-20, Asn-21, Ser-23 and Val-30 of the Fn-III like domains and Glu-15, Ala-17, Ser-18 and Leu-35 of the C-terminal domain were involved in substrate binding. Future site-directed mutagenesis of these proposed residues will elucidate the key amino acids involved in the chitin binding activity of the CBP50 protein.

15.
ALPL encodes the tissue nonspecific alkaline phosphatase (TNSALP), which removes phosphate groups from various substrates. Its function is essential for bone and tooth mineralization. In humans, ALPL mutations lead to hypophosphatasia, a genetic disorder characterized by defective bone and/or tooth mineralization. To date, 275 ALPL mutations have been reported to cause hypophosphatasia, of which 204 were simple missense mutations. Molecular evolutionary analysis has proved to be an efficient method to highlight residues important for the protein function and to predict or validate sensitive positions for genetic disease. Here we analyzed 58 mammalian TNSALP sequences to identify amino acids unchanged, or only substituted by residues sharing similar properties, through 220 million years of mammalian evolution. We found 469 sensitive positions among the 524 residues of human TNSALP, which indicates a highly constrained protein. Any substitution occurring at one of these positions is predicted to lead to hypophosphatasia. We tested the 204 missense mutations resulting in hypophosphatasia against our predictive chart, and validated 99% of them. Most sensitive positions were located in functionally important regions of TNSALP (active site, homodimeric interface, crown domain, calcium site, …). However, some important positions are located in regions whose structure and/or biological function are still unknown. Our chart of sensitive positions in human TNSALP (i) makes it possible to validate or invalidate at low cost any ALPL mutation suspected to be responsible for hypophosphatasia, in contrast to time-consuming and expensive functional tests, and (ii) displays higher predictive power than in silico models of prediction.

16.
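Cloning and sequence analysis of a metallothionein gene from Tamarix   Total citations: 2, self-citations: 0, citations by others: 2
Zhang Yan  Yang Chuanping  Wang Yucheng 《植物研究》 (Bulletin of Botanical Research) 2007,27(3):293-296
Using the amino acid sequence of the metallothionein gene (metallothionein 1) from Casuarina glauca, a tBlastn search was run against a local database of Tamarix EST sequences, yielding the full-length cDNA sequence of a Tamarix metallothionein gene. After removal of the polyA tail, the gene is 366 bp long, comprising a 97 bp 5′ untranslated region, a 59 bp 3′ untranslated region, and a 210 bp open reading frame (ORF) that encodes a polypeptide of 70 amino acids. The protein has a molecular weight of 6.793 kD and a theoretical isoelectric point of 4.99, and contains 10 Cys residues concentrated at the N- and C-termini of the peptide chain. BlastP homology analysis showed that the gene has the highest homology with peanut and the lowest with adzuki bean. The EST sequence of the gene has been deposited in GenBank (accession number: CV792539).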

17.
Protein–protein interaction (PPI) establishes the central basis for complex cellular networks in a biological cell. Association of proteins with other proteins occurs at varying affinities, yet with a high degree of specificity. PPIs lead to diverse functionality such as catalysis, regulation, signaling, immunity, and inhibition, playing a crucial role in functional genomics. The molecular principle of such interactions is often elusive in nature. Therefore, a comprehensive analysis of known protein complexes from the Protein Data Bank (PDB) is essential for the characterization of structural interface features to determine structure–function relationship. Thus, we analyzed a nonredundant dataset of 278 heterodimer protein complexes, categorized into major functional classes, for distinguishing features. Interestingly, our analysis has identified five key features (interface area, interface polar residue abundance, hydrogen bonds, solvation free energy gain from interface formation, and binding energy) that are discriminatory among the functional classes using Kruskal-Wallis rank sum test. Significant correlations between these PPI interface features amongst functional categories are also documented. Salt bridges correlate with interface area in regulator-inhibitors (r = 0.75). These representative features have implications for the prediction of potential function of novel protein complexes. The results provide molecular insights for better understanding of PPIs and their relation to biological functions.
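To make the Kruskal-Wallis rank sum test mentioned above more concrete, the sketch below computes its H statistic for one feature measured in several functional classes. It uses average ranks for tied values but omits the tie-correction factor and the chi-square p-value, so it is a simplified illustration; the function name `kruskal_wallis_h` and the toy values are assumptions, not data from the study.

```python
def kruskal_wallis_h(groups):
    """Compute the Kruskal-Wallis H statistic (without tie correction).

    groups: list of lists of numeric observations, one list per class.
    H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1),
    where R_i is the rank sum of group i and N the total sample size.
    """
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    weighted_rank_sums = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12.0 / (n_total * (n_total + 1)) * weighted_rank_sums - 3 * (n_total + 1)


# Toy example: three perfectly separated classes of three complexes each.
h_separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Two classes with identical values give no evidence of a difference.
h_identical = kruskal_wallis_h([[1, 2], [1, 2]])
```

A large H relative to a chi-square distribution with (number of classes − 1) degrees of freedom indicates that the feature differs between classes; a statistics library would normally be used to obtain the tie-corrected statistic and the p-value.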

18.
Calcium ion is thought to be one of the initial signals in the process of synaptic modification. Various reports have described that the critical amino acids responsible for determining calcium permeability of ion channels are glutamic acid, glutamine, arginine, and asparagine. By using a computational method (MacPROT) distinguishing transmembrane, globular, and surface sequences of proteins, the present work predicts that the critical amino acids exist within surface regions of the proteins. Furthermore, occurrence of β-turn probabilities can be predicted around these critical residues by the protein conformational prediction method of Chou and Fasman. The results suggest that the critical amino acids exist at hydrophilic spaces or canals of membranous channel proteins and that the redirection potential of the protein chain induced by the turn structures provides the conformational change requisite for the ion selectivity and gating (opening/closing) of the channels.

19.
Sequence-based motif prediction is of great interest and remains a challenge. In this work, we develop a local combinational variable approach for sequence-based helix-turn-helix (HTH) motif prediction. First we choose a sequence data set for 88 proteins of 22 amino acids in length to launch an optimized traversal for extracting local combinational segments (LCS) from the data set. Then, after LCS refinement, local combinational variables (LCV) are generated to construct prediction models for HTH motifs. The prediction ability of LCV sets at different thresholds is calculated to settle on a moderate threshold. The large data set we used comprises 13 HTH families, with 17 455 sequences in total. Our approach predicts HTH motifs more precisely using only primary protein sequence information, with 93.29% accuracy, 93.93% sensitivity and 92.66% specificity. Prediction results for newly reported HTH-containing proteins, compared with those of other prediction web services, indicate that the LCV approach yields a good prediction model. Comparisons with profile-HMM models from the Pfam protein families database show that the LCV approach maintains a good balance while dealing with HTH-containing proteins and non-HTH proteins at the same time. The LCV approach is to some extent complementary to the profile-HMM models because of its better identification of false-positive data. Furthermore, genome-wide predictions detect new HTH proteins in both Homo sapiens and Escherichia coli, which broadens the applications of the LCV approach. Software for mining LCVs from a sequence data set can be obtained freely from the anonymous ftp site ftp://cheminfo.tongji.edu.cn/LCV/.

20.
We present a new method for predicting protein–ligand-binding sites based on protein three-dimensional structure and amino acid conservation. This method involves calculation of the van der Waals interaction energy between a protein and many probes placed on the protein surface and subsequent clustering of the probes with low interaction energies to identify the most energetically favorable locus. In addition, it uses amino acid conservation among homologous proteins. Ligand-binding sites were predicted by combining the interaction energy and the amino acid conservation score. The performance of our prediction method was evaluated using a non-redundant dataset of 348 ligand-bound and ligand-unbound protein structure pairs, constructed by filtering entries in a ligand-binding site structure database, LigASite. Ligand-bound structure prediction (bound prediction) indicated that 74.0 % of predicted ligand-binding sites overlapped with real ligand-binding sites by over 25 % of their volume. Ligand-unbound structure prediction (unbound prediction) indicated that 73.9 % of predicted ligand-binding residues overlapped with real ligand-binding residues. The amino acid conservation score improved the average prediction accuracy by 17.0 and 17.6 points for the bound and unbound predictions, respectively. These results demonstrate the effectiveness of the combined use of the interaction energy and amino acid conservation in ligand-binding site prediction.
