Similar articles
 20 similar articles found (search time: 15 ms)
1.
Identifying protein–protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. Therefore, it is important to develop computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed by combining correlation coefficient (CC) transformation and support vector machine (SVM). CC transformation not only adequately considers the neighboring effect of the protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of yeast Saccharomyces cerevisiae were mined to objectively evaluate the above method and attenuate the bias. The SVM model combined with CC transformation yielded the best performance, with an accuracy of 87.94% using the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request from smgsmg@mail.ustc.edu.cn.

2.
A novel method is proposed for predicting protein–protein interactions (PPIs) based on a meta approach, which predicts PPIs using a support vector machine that combines the results of six independent state-of-the-art predictors. A significant improvement in prediction performance is observed on the Saccharomyces cerevisiae and Helicobacter pylori datasets. In addition, we used the final prediction model trained on the PPI dataset of S. cerevisiae to predict interactions in other species. The results reveal that our meta model is also capable of performing cross-species predictions. The source code and the datasets are available at

3.
Guo Y, Yu L, Wen Z, Li M. Nucleic Acids Research, 2008, 36(9): 3025-3030
Compared to the available protein sequences of different organisms, the number of revealed protein-protein interactions (PPIs) is still very limited. Many computational methods have therefore been developed to facilitate the identification of novel PPIs. However, methods that use only protein sequence information are more universal than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed by combining a new feature representation using auto covariance (AC) and support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so this method adequately takes the neighbouring effect into account. When applied to the PPI data of yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. An independent data set of 11,474 yeast PPIs was used to evaluate this prediction model, and the prediction accuracy is 88.09%. The performance of this method is superior to that of the existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
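
The auto covariance idea in this abstract can be illustrated with a minimal sketch. It assumes the standard definition AC(lag) = 1/(n - lag) * sum_i (x_i - mean)(x_{i+lag} - mean) applied to a per-residue property profile; whether this matches the authors' exact formulation, normalization, and choice of properties is an assumption. The names `TOY_SCALE`, `auto_covariance`, and `encode_pair` are hypothetical, and the scale values are illustrative, not taken from the paper.

```python
from statistics import fmean

# Hypothetical one-property descriptor (a toy hydrophobicity-like scale).
# The numbers are illustrative only and are not taken from the paper.
TOY_SCALE = {"A": 0.6, "G": 0.5, "L": 1.1, "K": -1.5, "S": -0.2, "D": -0.9}


def auto_covariance(values, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one per-residue property profile.

    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean)
    """
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = fmean(values)
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def encode_pair(seq_a, seq_b, max_lag=3, scale=TOY_SCALE):
    """Concatenate the AC vectors of two sequences into one feature vector.

    The vector length depends only on max_lag, not on sequence length,
    which is what lets variable-length proteins feed a fixed-input classifier.
    """
    features = []
    for seq in (seq_a, seq_b):
        profile = [scale[residue] for residue in seq]
        features.extend(auto_covariance(profile, max_lag))
    return features
```

With several properties, one AC vector per property would be concatenated in the same way; the resulting pair vector is the kind of input an SVM could be trained on.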

4.
Discovering and characterizing protein–protein interactions (PPIs) that contribute to cellular homeostasis, development, and disease is a key priority in proteomics. Numerous assays for protein–protein interactions have been developed, but each one comes with its own strengths, weaknesses, and false‐positive/false‐negative rates. Therefore, it seems rather intuitive that combining multiple assays is beneficial for robust and reliable discovery of interactions. Along those lines, in their recent study, Wanker and colleagues (Trepte et al, 2018) combined two complementary and quantitative interaction assays in one pot. One assay is luminescence‐based and depends on protein proximity in living cells, while the other relies on formation of more stable complexes detected by co‐precipitation with a luminescence‐based readout, which facilitates confident identification and quantitation of interactions in high throughput.

5.
6.
7.
Heterosis is the phenomenon in which hybrid progeny exhibit superior traits in comparison with those of their parents. Genomic variations between the two parental genomes may generate epistatic interactions, which is one of the genetic hypotheses explaining heterosis. We postulate that protein–protein interactions specific to F1 hybrids (F1‐specific PPIs) may occur when two parental genomes combine, as the proteome of each parent may supply novel interacting partners. To test our assumption, an inter‐subspecies hybrid interactome was simulated by in silico PPI prediction between rice japonica (cultivar Nipponbare) and indica (cultivar 9311). A total of 4,612 F1‐specific PPIs, accounting for 20.5% of total PPIs in the hybrid interactome, were found. Genes participating in F1‐specific PPIs tend to encode metabolic enzymes and are generally localized in genomic regions harboring metabolic gene clusters. To test the genetic effect of F1‐specific PPIs in heterosis, genomic selection analysis was performed for trait prediction with additive, dominant and epistatic effects separately considered in the model. We found that the removal of single nucleotide polymorphisms associated with F1‐specific PPIs reduced prediction accuracy when epistatic effects were considered in the model, but no significant changes were observed when additive or dominant effects were considered. In summary, genomic divergence widely dispersed between japonica and indica rice may generate F1‐specific PPIs, part of which may cumulatively contribute to heterosis according to our computational analysis. These candidate F1‐specific PPIs, especially those involved in metabolic biosynthesis pathways, are worthy of experimental validation when large‐scale protein interactome datasets are generated in hybrid rice in the future.

8.
Annotation of protein functions plays an important role in understanding life at the molecular level. High‐throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotations of functions are expensive, time‐consuming and do not keep up with the rapid growth of the sequence numbers. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence‐ and network‐derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low‐dimensional vector which is combined with topological information extracted from protein–protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.

9.
This paper introduces a new subcellular localization system (TSSub) for eukaryotic proteins. This system extracts features from both profiles and amino acid sequences. Four different features are extracted from profiles by four probabilistic neural network (PNN) classifiers, respectively (the amino acid composition from whole profiles; the amino acid composition from the N-terminus of profiles; the dipeptide composition from whole profiles; and the amino acid composition from fragments of profiles). In addition, a support vector machine (SVM) classifier is added to implement the residue-couple feature extracted from amino acid sequences. The results from the five classifiers are fused by an additional SVM classifier. The overall accuracies of TSSub reach 93.0% and 77.4% on Reinhardt and Hubbard's eukaryotic protein dataset and Huang and Li's eukaryotic protein dataset, respectively. Comparison with existing methods shows that TSSub provides better prediction performance. AVAILABILITY: The web server is available from http://166.111.24.5/webtools/TSSub/index.html.

10.
Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers also exhibit improved performance on other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
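
The performance measures named in this abstract (sensitivity, specificity, Matthews correlation coefficient, and F-measure) can be computed from a confusion matrix. The sketch below uses their standard binary definitions; applying them per membrane protein type in a one-vs-rest fashion is an assumption, as the abstract does not say how the multi-class case was handled. The function name `binary_metrics` is hypothetical.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification measures from confusion counts.

    Assumes at least one actual positive, one actual negative, and one
    predicted positive, so that the ratios below are defined.
    """
    sensitivity = tp / (tp + fn)   # recall, true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }
```

Unlike accuracy, the Matthews correlation coefficient uses all four confusion counts symmetrically, which makes it more informative when the classes are unbalanced.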

11.
12.
13.
Deciphering protein‐protein interactions (PPIs) is fundamental for understanding signal transduction pathways in plants. The split firefly luciferase (Fluc) complementation (SLC) assay has been widely used for analyzing PPIs. However, concern has arisen about the bulky halves of Fluc interfering with the functions of their fusion partners. Nano luciferase (Nluc) is the smallest substitute for Fluc with improved stability and luminescence. Here, we developed a dual‐use system enabling the detection of PPIs through the Nluc‐based SLC and co‐immunoprecipitation assays. This was realized by coexpression of two proteins under investigation in fusion with the HA‐ or FLAG‐tagged Nluc halves, respectively. We validated the robustness of this system by reproducing multiple previously documented PPIs in protoplasts or Agrobacterium‐transformed plants. We next applied this system to evaluate the homodimerization of Arabidopsis CERK1, a coreceptor of fungal elicitor chitin, and its heterodimerization with other homologs in the absence or presence of chitin. Moreover, split fragments of Nluc were fused to two cytosolic ends of Arabidopsis calcium channels CNGC2 and CNGC4 to help sense the allosteric change induced by the bacterial elicitor flg22. Collectively, these results demonstrate the usefulness of the Nluc‐based SLC assay for probing constitutive or inducible PPIs and protein allostery in plant cells.

14.
Membrane proteins are a major class of proteins and are encoded by approximately 20% to 30% of genes in most organisms. In this work, a novel two-layer membrane protein prediction system, called Mem-PHybrid, is proposed. In the first layer, it identifies the query protein as a membrane or nonmembrane protein. In the second layer, it further identifies the type of membrane protein. The proposed Mem-PHybrid prediction system is based on hybrid features, whereby a fusion of both the physicochemical and split amino acid composition-based features is performed. This enables the proposed Mem-PHybrid to exploit the discrimination capabilities of both types of feature extraction strategy. In addition, minimum redundancy and maximum relevance has also been applied to reduce the dimensionality of the feature vector. We employ random forest, evidence-theoretic K-nearest neighbor, and support vector machine (SVM) as classifiers and analyze their performance on two datasets. SVM using hybrid features yields the highest accuracy of 89.6% and 97.3% on dataset1 and 91.5% and 95.5% on dataset2 for jackknife and independent dataset tests, respectively. The enhanced prediction performance of Mem-PHybrid is largely attributed to the exploitation of the discrimination power of the hybrid features and of the learning capability of SVM. Mem-PHybrid is accessible at http://www.111.68.99.218/Mem-PHybrid.

15.
Elucidation of signaling events in a pathogen is potentially important for tackling the infection it causes. Such events mediated by protein phosphorylation play important roles in infection, and therefore, to predict the phosphosites and substrates of the serine/threonine protein kinases, we have developed a machine learning-based approach for Mycobacterium tuberculosis serine/threonine protein kinases using kinase-peptide structure–sequence data. This approach utilizes features derived from the kinase three-dimensional-structure environment and known phosphosite sequences to generate support vector machine (SVM)-based kinase-specific predictions of phosphosites of serine/threonine protein kinases (STPKs) with no or scarce data on their substrates. SVM performed best among the four machine learning algorithms we tried (random forest, logistic regression, SVM, and k-nearest neighbors), with an area under the receiver-operating characteristic curve of 0.88 on the independent testing dataset and a 10-fold cross-validation accuracy of ~81.6% for the final model. Our predicted phosphosites of M. tuberculosis STPKs form a useful resource for experimental biologists, enabling elucidation of STPK-mediated posttranslational regulation of important cellular processes.

16.
Despite the great interest in identifying protein–protein interactions (PPIs) in biological systems, only a few attempts have been made at large‐scale PPI screening in planta. Unlike biochemical assays, bimolecular fluorescence complementation allows visualization of transient and weak PPIs in vivo at subcellular resolution. However, when the non‐fluorescent fragments are highly expressed, spontaneous and irreversible self‐assembly of the split halves can easily generate false positives. The recently developed tripartite split‐GFP system was shown to be a reliable PPI reporter in mammalian and yeast cells. In this study, we adapted this methodology, in combination with the β‐estradiol‐inducible expression cassette, for the detection of membrane PPIs in planta. Using a transient expression assay by agroinfiltration of Nicotiana benthamiana leaves, we demonstrate the utility of the tripartite split‐GFP association in plant cells and affirm that the tripartite split‐GFP system yields no spurious background signal even with abundant fusion proteins readily accessible to the compartments of interaction. By validating a few of the Arabidopsis PPIs, including the membrane PPIs implicated in phosphate homeostasis, we proved the fidelity of this assay for detection of PPIs in various cellular compartments in planta. Moreover, the technique combining the tripartite split‐GFP association and dual‐intein‐mediated cleavage of polyprotein precursor is feasible in stably transformed Arabidopsis plants. Our results provide a proof‐of‐concept implementation of the tripartite split‐GFP system as a potential tool for membrane PPI screens in planta.

17.
The photoactivatable amino acid p‐benzoyl‐l‐phenylalanine (pBpa) has been used for the covalent capture of protein–protein interactions (PPIs) in vitro and in living cells. However, this technique often suffers from poor photocrosslinking yields due to the low reactivity of the active species. Here we demonstrate that the incorporation of halogenated pBpa analogs into proteins leads to increased crosslinking yields for protein–protein interactions. The analogs can be incorporated into live yeast and upon irradiation capture endogenous PPIs. Halogenated pBpas will extend the scope of PPIs that can be captured and expand the toolbox for mapping PPIs in their native environment.

18.

Background  

Protein-protein interactions (PPIs) are challenging but attractive targets of small molecule drugs for therapeutic interventions in human diseases. In this era of rapid accumulation of PPI data, there is great need for a methodology that can efficiently select drug target PPIs by holistically assessing the druggability of PPIs. To address this need, we propose here a novel approach based on a supervised machine-learning method, the support vector machine (SVM).

19.
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT‐TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross‐validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template‐based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top‐ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009. © 2008 Wiley‐Liss, Inc.
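
The two evaluation quantities reported in this abstract, the correlation between predicted and true scores and the ranking loss, can be sketched for a single target as follows. The sketch assumes Pearson correlation and reads "loss" as the true score of the best model minus the true score of the model ranked first by predicted score; both readings are interpretations of the abstract, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranking_loss(predicted, true):
    """True score of the best model minus that of the top-ranked model.

    'Top-ranked' means the model with the highest predicted score; a loss
    of 0 means the predictor picked the genuinely best model.
    """
    top_ranked = max(range(len(predicted)), key=predicted.__getitem__)
    return max(true) - true[top_ranked]
```

Averaging each quantity over all targets would give per-dataset figures of the kind quoted above.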

20.
Protein domains are functional and structural units of proteins. Therefore, identification of domain–domain interactions (DDIs) can provide insight into the biological functions of proteins. In this article, we propose a novel discriminative approach for predicting DDIs based on both protein–protein interactions (PPIs) and the derived information of non‐PPIs. We make a threefold contribution to the work in this area. First, we take into account non‐PPIs explicitly and treat the domain combinations that can discriminate PPIs from non‐PPIs as putative DDIs. Second, DDI identification is formalized as a feature selection problem whose goal is to find a minimum set of informative features (i.e., putative DDIs) that discriminate PPIs from non‐PPIs; this formulation is biologically plausible and is able to predict DDIs in a systematic and accurate manner. Third, multidomain combinations, including two‐domain combinations, are taken into account in the proposed method, since multidomain cooperation may help proteins to interact with each other. Numerical results on several DDI prediction benchmark data sets show that the proposed discriminative method performs comparably well with other top algorithms with respect to overall performance, and outperforms other methods in terms of precision. The PPI data sets used for prediction of DDIs and the prediction results can be found at http://csb.shu.edu.cn/dipd. Proteins 2010. © 2009 Wiley‐Liss, Inc.

