Similar literature
20 similar articles found.
1.
2.
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. The resulting 67-dimensional feature vectors are then fed to a support vector machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is promising for predicting PPIs and may at least be a useful supplement to existing methods.

3.
We propose a feature vector approach to characterize the variation in large data sets of biological sequences. Each candidate sequence produces a single feature vector constructed from the number and location of amino acids or nucleic acids in the sequence. The feature vector characterizes the distance between the actual sequence and a model of a theoretical sequence based on the binomial and uniform distributions. This method is distinctive in that it does not rely on sequence alignment for determining protein relatedness, allowing the user to visualize the relationships within a set of proteins without making a priori assumptions about those proteins. We apply our method to two large families of proteins: protein kinase C, and globins, including hemoglobins and myoglobins. We interpret the high-dimensional feature vectors using principal component analysis and agglomerative hierarchical clustering. We find that the feature vector retains much of the information about the original sequence. By using principal component analysis to extract information from collections of feature vectors, we are able to quickly identify the nature of variation in a collection of proteins. Where collections are phylogenetically or functionally related, this is easily detected. Hierarchical agglomerative clustering provides a means of constructing cladograms from the feature vector output.

4.
Protein–protein interactions (PPIs) describe the direct physical contact of two proteins that usually results in specific biological functions or regulatory processes. Characterizing PPIs by investigating their patterns and principles remains an open question in biological studies. Various experimental and computational methods have been used for PPI studies, but most of them are based on sequence similarity with currently validated PPI participants or on cellular localization patterns. Most methods ignore the fact that PPIs are defined by their specific biological functions. In this study, we constructed a novel rule-based computational method using gene ontology and KEGG pathway annotations of PPI participants that correspond to the complicated biological effects of PPIs. Our method identified a group of biological functions that are tightly associated with PPIs and provides a new function-based, rule-based tool for PPI studies.

5.
Protein–protein interactions (PPIs) are essential in the regulation of biological functions and cell events; understanding PPIs has therefore become a key issue in understanding molecular mechanisms and in drug design. Here we highlight the major developments in computational methods for predicting PPIs using various types of artificial intelligence algorithms. The first part introduces the sources of experimental PPI data. The second part is devoted to PPI prediction methods based on sequence information. The third part covers representative methods using structural information as the input feature. The last part covers methods designed by combining different types of features. For each part, the state-of-the-art computational PPI prediction methods are reviewed inclusively. Finally, we discuss the flaws existing in this area and future directions for next-generation algorithms.

6.
Guo Y, Yu L, Wen Z, Li M. Nucleic Acids Research, 2008, 36(9): 3025-3030
Compared to the available protein sequences of different organisms, the number of revealed protein-protein interactions (PPIs) is still very limited. Many computational methods have therefore been developed to facilitate the identification of novel PPIs. However, methods that use only protein sequence information are more universal than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed by combining a new feature representation using auto covariance (AC) with a support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so this method adequately takes the neighbouring effect into account. When applied to the PPI data of the yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. An independent data set of 11,474 yeast PPIs was used to evaluate this prediction model, and the prediction accuracy is 88.09%. The performance of this method is superior to that of the existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
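
As a rough illustration of the auto covariance idea described in this abstract, the sketch below computes lagged covariances of a per-residue property along a sequence. It assumes one common formulation, AC(lag) = (1/(n - lag)) Σ (x_i - mean)(x_{i+lag} - mean), and the property scale `TOY_SCALE` and the function names are hypothetical, chosen only for the example; the abstract does not specify the descriptors or normalization the authors used.

```python
# Minimal sketch of auto covariance (AC) features for one property scale.
# TOY_SCALE is a made-up scale for illustration, not a published descriptor.
TOY_SCALE = {"A": 1.0, "G": 0.0, "L": 2.0, "K": -1.0}


def auto_covariance(values, lag):
    """Average product of mean-centred values that are `lag` positions apart."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    total = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return total / (n - lag)


def ac_features(sequence, scale, max_lag):
    """Map a sequence to numbers via `scale`, then compute AC for lags 1..max_lag."""
    values = [scale[residue] for residue in sequence]
    return [auto_covariance(values, lag) for lag in range(1, max_lag + 1)]
```

With several property scales, the per-scale vectors would be concatenated, which gives a fixed-length representation regardless of sequence length; that is the property that makes such features usable as classifier input.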

7.
Given a new uncharacterized protein sequence, a biologist may want to know whether it is a membrane protein and, if it is, which membrane protein type it belongs to. Knowing the type of an uncharacterized membrane protein often provides useful clues for finding the biological function of the query protein, so developing computational methods to address these questions can be very helpful. In this study, a sequence encoding scheme based on combining the pseudo position-specific score matrix (PsePSSM) and dipeptide composition (DC) is introduced to represent protein samples. However, this sequence encoding scheme corresponds to a very high-dimensional feature vector. A dimensionality reduction algorithm, geometry preserving projections (GPP), is introduced to extract the key features from the high-dimensional space and reduce the original high-dimensional vector to a lower-dimensional one. Finally, K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers are employed to identify the types of membrane proteins based on their reduced low-dimensional features. The jackknife and independent dataset test results are quite encouraging, indicating that these methods deal effectively with the complicated problem of predicting the membrane protein type.
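
The dipeptide composition mentioned in this abstract can be sketched as follows. This is a minimal version assuming the usual definition (relative frequency of each of the 400 ordered pairs of the 20 standard amino acids among adjacent residue pairs); the function name and the choice to skip pairs containing non-standard residues are assumptions made for the example, and the PsePSSM part of the encoding is not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[pair] / total for pair in DIPEPTIDES]
```

A 400-dimensional vector per protein is already large relative to typical dataset sizes, which is consistent with the abstract's motivation for applying dimensionality reduction before classification.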

8.
Protein-protein interactions (PPIs) are crucial to most biochemical processes in human beings. Although many human PPIs have been identified by experiments, the number is still limited compared to the available human protein sequences. Recently, many computational methods have been proposed to facilitate the recognition of novel human PPIs. However, the existing methods concentrated only on the information of individual PPIs, while the systematic characteristics of protein-protein interaction networks (PINs) were ignored. In this study, a new method was proposed that combines the global information of PINs with protein sequence information. The random forest (RF) algorithm was used to develop the prediction model, and a high accuracy of 91.88% was obtained. Furthermore, the RF model performed well on three independent datasets, suggesting that our method is a useful tool for identifying PPIs and for investigating PINs.

9.
Protein-protein interaction (PPI) maps provide insight into cellular biology and have received considerable attention in the post-genomic era. While large-scale experimental approaches have generated large collections of experimentally determined PPIs, technical limitations preclude certain PPIs from detection. Recently, we demonstrated that yeast PPIs can be computationally predicted using re-occurring short polypeptide sequences between known interacting protein pairs. However, the computational requirements and low specificity made this method unsuitable for large-scale investigations. Here, we report an improved approach, which exhibits a specificity of approximately 99.95% and executes 16,000 times faster. Importantly, we report the first all-to-all sequence-based computational screen of PPIs in the yeast Saccharomyces cerevisiae, in which we identify 29,589 high-confidence interactions out of approximately 2 × 10^7 possible pairs. Of these, 14,438 PPIs have not been previously reported and may represent novel interactions. In particular, these results reveal a richer set of membrane protein interactions, not readily amenable to experimental investigations. From the novel PPIs, a novel putative protein complex comprised largely of membrane proteins was revealed. In addition, two novel gene functions were predicted and experimentally confirmed to affect the efficiency of non-homologous end-joining, providing further support for the usefulness of the identified PPIs in biological investigations.

10.
Predicting protein–protein interactions (PPIs) is a challenging task and is essential for constructing protein interaction networks, which are important for understanding the mechanisms of biological systems. Although a number of high-throughput technologies have been proposed to detect PPIs, they have unavoidable shortcomings, including high cost, long run times, and inherently high false positive rates. For these reasons, many computational methods have been proposed for predicting PPIs. However, the problem is still far from being solved. In this article, we propose a novel computational method called RVM-BiGP that combines the relevance vector machine (RVM) model and Bi-gram Probabilities (BiGP) for detecting PPIs from protein sequences. The major improvements are as follows: (1) protein sequences are represented using the Bi-gram Probabilities (BiGP) feature representation on a Position Specific Scoring Matrix (PSSM), which contains the protein evolutionary information; (2) to reduce the influence of noise, Principal Component Analysis (PCA) is used to reduce the dimension of the BiGP vector; (3) the powerful and robust Relevance Vector Machine (RVM) algorithm is used for classification. Five-fold cross-validation experiments on yeast and Helicobacter pylori datasets achieved very high accuracies of 94.57% and 90.57%, respectively. These results are significantly better than those of previous methods. To further evaluate the proposed method, we compare it with the state-of-the-art support vector machine (SVM) classifier on the yeast dataset. The experimental results demonstrate that our RVM-BiGP method is significantly better than the SVM-based method. In addition, we achieved 97.15% accuracy on the imbalanced yeast dataset, which is higher than that on the balanced yeast dataset.
The promising experimental results show the efficiency and robustness of the proposed method, which can serve as an automatic decision support tool for future proteomics research. To facilitate further studies, we developed a freely available web server called RVM-BiGP-PPIs, written in Hypertext Preprocessor (PHP), for predicting PPIs. The web server, including source code and the datasets, is available at http://219.219.62.123:8888/BiGP/ .

11.
Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Features describing physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure and solvent accessibility were used to represent the peptides concerned. We compared our method with existing tools for predicting possible cleavage sites in candidate substrates. Our method makes much more reliable predictions in terms of overall prediction accuracy. In addition, this predictor can be used with a wide range of proteinases.

12.
Global viewing of protein–protein interactions (PPIs) is a useful way to assign biological roles to large numbers of proteins predicted by complete genome sequence. Here, we systematically analyzed PPIs in the nitrogen-fixing soil bacterium Mesorhizobium loti using a modified high-throughput yeast two-hybrid system. This study aims primarily to provide functional clues to M. loti proteins that are relevant to symbiotic nitrogen fixation and conserved in other rhizobium species, especially proteins with regulatory functions and unannotated proteins. By screening 1542 genes as bait, 3121 independent interactions involving 1804 proteins (24% of the total protein coding genes) were identified, and each interaction was evaluated using an interaction generality (IG) measure and the general features of the interacting partners. Most PPIs detected in this study are novel interactions revealing potential functional relationships between genes for symbiotic nitrogen fixation and signal transduction. Furthermore, we have predicted the putative functions of unannotated proteins through their interactions with known proteins. The results described here represent new insight into the protein network of M. loti and provide useful experimental clues to elucidate the biological function of rhizobial genes that cannot be assigned directly from their genomic sequence.

13.
Protein–protein interactions (PPIs) govern numerous cellular functions in terms of signaling, transport, defense and many others. Designing novel PPIs poses a fundamental challenge to our understanding of molecular interactions. The capability to robustly engineer PPIs has immense potential for the development of novel synthetic biology tools and protein-based therapeutics. Over the last decades, many efforts in this area have relied purely on experimental approaches, but more recently, computational protein design has made important contributions. Template-based approaches utilize known PPIs and transplant the critical residues onto heterologous scaffolds. De novo design instead uses computational methods to generate novel binding motifs, allowing for a broader scope of the sites engaged in protein targets. Here, we review successful design cases, giving an overview of the methodological approaches used for templated and de novo PPI design.

14.
Li ZC, Zhou XB, Dai Z, Zou XY. Amino Acids, 2009, 37(2): 415-425
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate determination of protein structural class by computational methods is very important in protein science. One key to such methods is accurate representation of the protein sample. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method combining the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. First, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Second, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant sequence-order information in the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with the set of new WPS feature vectors in the orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality improved, and the current approach to protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.

15.
Protein-protein interactions (PPIs) play an important role in many biological functions. PPIs typically involve binding between domains, the basic units of protein folding, evolution and function. Identifying domain-domain interactions (DDIs) would aid the understanding of PPI networks. Recently, many computational methods have aimed to infer DDIs from databases of interacting proteins and have subsequently used the inferred DDIs to predict new PPIs. We attempt to describe systematically current domain-based approaches, including the association method, maximum likelihood estimation and the parsimonious explanation method. The performance of these methods at inferring DDIs and predicting PPIs was evaluated comparatively. We observe that each method generates artefacts in certain situations and discuss biases in the available benchmark sets.

16.
Protein domains are functional and structural units of proteins. Therefore, identification of domain–domain interactions (DDIs) can provide insight into the biological functions of proteins. In this article, we propose a novel discriminative approach for predicting DDIs based on both protein–protein interactions (PPIs) and the derived information of non-PPIs. We make a threefold contribution to the work in this area. First, we take into account non-PPIs explicitly and treat the domain combinations that can discriminate PPIs from non-PPIs as putative DDIs. Second, DDI identification is formalized as a feature selection problem that seeks a minimum set of informative features (i.e., putative DDIs) discriminating PPIs from non-PPIs, which is biologically plausible and allows DDIs to be predicted in a systematic and accurate manner. Third, multidomain combinations, including two-domain combinations, are taken into account in the proposed method, since multidomain cooperation may help proteins to interact with each other. Numerical results on several DDI prediction benchmark data sets show that the proposed discriminative method performs comparably with other top algorithms with respect to overall performance, and outperforms other methods in terms of precision. The PPI data sets used for prediction of DDIs and the prediction results can be found at http://csb.shu.edu.cn/dipd . Proteins 2010. © 2009 Wiley-Liss, Inc.

17.
In multicellular organisms, several biological processes control the rise and fall of life. Different cell types communicate and co-operate in response to different stimuli through cell-to-cell signaling and regulate biological processes in the cell or organism. Signaling in multicellular organisms has to be highly selective so that only the target cell responds to the signal. Of all the biomolecules, nature chose mainly proteins for the secret delivery of information both inside and outside the cell. During cell signaling, proteins physically interact and "shake hands" to transfer this information through protein–protein interactions (PPIs). PPIs play a crucial role in both extracellular and intracellular signaling processes. PPIs involved in cellular signaling are the primary cause of cell proliferation, differentiation, movement, metabolism, death and various other biological processes. These secret handshakes are very specific to particular functions. Any alteration or malfunction in particular PPIs results in a diseased condition. This review gives an overview of signaling pathways, discusses the importance of PPIs in cellular function, and considers the possibilities of targeting PPIs for novel drug development.

18.
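Bioinformatics methods for predicting functional modules in protein interaction networks   Cited by: 1 (self-citations: 0, citations by others: 1)
Protein interactions are the basis of most life processes. With the development of high-throughput experimental techniques and computational prediction methods, a very large amount of protein interaction data has been obtained for various organisms, and extracting biologically meaningful data from it is a formidable challenge. Deriving interaction networks from protein interaction data and then predicting the functional modules within them is of great help for protein function prediction and for revealing the molecular mechanisms of various biochemical reaction processes. We classify and summarize bioinformatics methods for predicting functional modules of protein interactions, together with evaluations of these methods, and introduce some methods for comparing protein interaction networks.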

19.
As much of the focus of genetics and molecular biology has shifted toward the systems level, it has become increasingly important to accurately extract biologically relevant signal from thousands of related measurements. The common property among these high-dimensional biological studies is that the measured features have a rich and largely unknown underlying structure. One example of much recent interest is identifying differentially expressed genes in comparative microarray experiments. We propose a new approach aimed at optimally performing many hypothesis tests in a high-dimensional study. This approach estimates the optimal discovery procedure (ODP), which has recently been introduced and theoretically shown to optimally perform multiple significance tests. Whereas existing procedures essentially use data from only one feature at a time, the ODP approach uses the relevant information from the entire data set when testing each feature. In particular, we propose a generally applicable estimate of the ODP for identifying differentially expressed genes in microarray experiments. This microarray method consistently shows favorable performance over five widely used existing methods. For example, in testing for differential expression between two breast cancer tumor types, the ODP provides increases from 72% to 185% in the number of genes called significant at a false discovery rate of 3%. Our proposed microarray method is freely available to academic users in the open-source, point-and-click EDGE software package.

20.
Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein–protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR–KNNs–wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, leaving 103 features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by a 10-fold cross-validation test, which is 1.46% higher than with the initial 114 features and 6.51% higher than with the 20 features coded by amino acid composition. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
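
The evaluation protocol named in this abstract, k-nearest neighbors scored by 10-fold cross-validation, can be sketched in a few lines. This is a generic illustration rather than the authors' pipeline: the Euclidean distance, majority vote, round-robin fold assignment and the function names are all assumptions made for the example, and the feature selection step (mRMR plus wrapper) is not included.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to `query`."""
    order = sorted(
        range(len(train_x)), key=lambda i: math.dist(train_x[i], query)
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(xs, ys, folds=10, k=3):
    """Hold out every `folds`-th sample in turn and score k-NN on the held-out part."""
    correct = 0
    for fold in range(folds):
        train_idx = [i for i in range(len(xs)) if i % folds != fold]
        test_idx = [i for i in range(len(xs)) if i % folds == fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
                correct += 1
    return correct / len(xs)
```

In practice the samples would be shuffled (and usually stratified by class) before folds are assigned; the round-robin split here only keeps the sketch deterministic.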
