Similar literature
20 similar articles found.
1.
Functional annotation of protein sequences with low similarity to well-characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family: its members share low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships among them a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features of the protein sequences include amino acid composition, dipeptide composition, secondary structure composition and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from leave-one-out cross-validation (jackknife test), self-consistency and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than the classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results show that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.
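Two of the feature types named in this abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The function names and the handling of non-standard residues are assumptions made for illustration; this is not the CyclinPred implementation, and the PSSM and secondary-structure features are not covered.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return a 20-element list: fraction of each standard residue in seq.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element list: fraction of each adjacent residue pair.

    Assumption: pairs containing a non-standard character are ignored.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    return [pairs.count(d) / len(pairs) for d in DIPEPTIDES]
```

Each vector sums to 1 for a non-empty sequence of standard residues, so sequences of different lengths map to fixed-length inputs that a classifier such as an SVM can consume.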

2.
3.
Predicting the interactions between all the possible pairs of proteins in a given organism (making a protein-protein interaction map) is a crucial subject in bioinformatics. Most of the previous methods based on supervised machine learning use datasets containing approximately the same number of interacting pairs of proteins (positives) and non-interacting pairs of proteins (negatives) for training a classifier and are estimated to yield a large number of false positives. Reasoning that the negatives used in previous studies cannot adequately represent all the negatives that need to be taken into account, we have developed a method based on multiple Support Vector Machines (SVMs) that uses more negatives than positives for predicting interactions between pairs of yeast proteins and pairs of human proteins. We show that the performance of a single SVM improved as we increased the number of negatives used for training and that, if more than one CPU is available, an approach using multiple SVMs is useful not only for improving the performance of classifiers but also for reducing the time required for training them. Our approach can also be applied to assessing the reliability of high-throughput interactions.

4.
5.
6.
7.
K Nakai, M Kanehisa. Proteins, 1991, 11(2): 95-110
We have developed an expert system that makes use of various kinds of knowledge organized as "if-then" rules for predicting protein localization sites in Gram-negative bacteria, given the amino acid sequence information alone. We considered four localization sites: the cytoplasm, the inner (cytoplasmic) membrane, the periplasm, and the outer membrane. Most rules were derived from experimental observations. For example, the rule to recognize an inner membrane protein is the presence of either a hydrophobic stretch in the predicted mature protein or an uncleavable N-terminal signal sequence. Lipoproteins are first recognized by a consensus pattern and then assumed present at either the inner or outer membrane. These two possibilities are further discriminated by examining an acidic residue in the mature N-terminal portion. Furthermore, we found an empirical rule that periplasmic and outer membrane proteins were successfully discriminated by their different amino acid composition. Overall, our system could predict 83% of the localization sites of proteins in our database.
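The "hydrophobic stretch" branch of the inner-membrane rule can be sketched as a sliding-window hydropathy test. This is only an illustration under stated assumptions: it uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cutoff, values conventionally used with that scale, not the thresholds of the expert system described above. The function names are hypothetical, and the signal-sequence branch of the rule is not modelled.

```python
# Kyte-Doolittle hydropathy values for the 20 standard residues.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(seq, window=19):
    """Return the highest mean hydropathy over any window, or None if seq is too short.

    Assumption: residues missing from the scale are skipped.
    """
    values = [KYTE_DOOLITTLE[r] for r in seq.upper() if r in KYTE_DOOLITTLE]
    if len(values) < window:
        return None
    return max(
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    )


def has_hydrophobic_stretch(seq, window=19, threshold=1.6):
    """If-then rule: IF some window exceeds the threshold THEN flag the protein."""
    score = max_window_hydropathy(seq, window)
    return score is not None and score >= threshold
```

A full rule base would chain several such predicates (signal sequence, lipoprotein consensus pattern, composition) and map their combined outcome to one of the four localization sites.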

8.
The nucleus guides life processes of cells. Many of the nuclear proteins participating in the life processes tend to concentrate on subnuclear compartments. The subnuclear localization of nuclear proteins is hence important for deeply understanding the construction and functions of the nucleus. Recently, Gene Ontology (GO) annotation has been used for prediction of subnuclear localization. However, the effective use of GO terms in solving sequence-based prediction problems remains challenging, especially when query protein sequences have no accession number or annotated GO term. This study obtains homologs of query proteins with known accession numbers using BLAST to retrieve GO terms for sequence-based subnuclear localization prediction. A prediction method PGAC, which involves mining informative GO terms associated with amino acid composition features, is proposed to design a support vector machine-based classifier. PGAC yields 55 informative GO terms with training and test accuracies of 85.7% and 76.3%, respectively, using a data set SNL_35 (561 proteins in 9 localizations) with 35% sequence identity. Upon comparison with Nuc-PLoc, which combines amphiphilic pseudo amino acid composition of a protein with its position-specific scoring matrix, PGAC using the data set SNL_80 yields a leave-one-out cross-validation accuracy of 81.1%, which is better than that of Nuc-PLoc, 67.4%. Experimental results show that the set of informative GO terms are effective features for protein subnuclear localization. The prediction server based on PGAC has been implemented at http://iclab.life.nctu.edu.tw/prolocgac.

9.
10.
According to recent experiments, proteins in budding yeast can be distinctly classified into 22 subcellular locations. Of these proteins, some bear the multi-locational feature, i.e., occur in more than one location. However, so far all the existing methods for predicting protein subcellular location were developed to deal with only the mono-locational case, where a query protein is assumed to belong to one, and only one, subcellular location. To stimulate the development of subcellular location prediction, an augmentation procedure is formulated that will enable the existing methods to tackle the multi-locational problem as well. It has been observed through a jackknife cross-validation test that the success rate obtained by the augmented GO-FunD-PseAA algorithm [BBRC 320 (2004) 1236] is overwhelmingly higher than those of the other augmented methods. It is anticipated that the augmented GO-FunD-PseAA predictor will become a very useful tool in predicting protein subcellular localization for both basic research and practical application.

11.
Wang J, Li C, Wang E, Wang X. PLoS ONE, 2011, 6(1): e14449
Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Coincidentally, data mining methods have been developed and refined in order to handle this experimental windfall, thus allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regression under various statistical measures. Our results show that mFPT gave better performance than other approaches in predicting protein localization. Meanwhile, setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently than the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions for the location of 17 proteins, which currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite generalized and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research.

12.
Wild ruminants require energy and protein for normal function. I developed a system for predicting these energy and protein requirements across ruminant species and life stages. This system defines requirements on the basis of net energy (NE), net protein (NP), and ruminally degraded protein (RDP). Total NE and NP requirements are calculated as the sum of NE and NP required for several functions (maintenance, activity, thermoregulation, gain, lactation, and gestation). To estimate the requirements for each function, I collected data predominantly for wild species and then formulated allometric and other equations that predict requirements across species. I estimated RDP requirements using an equation for cattle. I then related NE, NP, and RDP to quantities more practical for diet formulation (e.g. dry matter intake). I tabulated requirements over a range of body mass and life stages (neonate, juvenile, nonproductive adult, lactating adult, and gestating adult). Tabulated requirements suggest that adults at peak lactation require the greatest quantities of energy and neonates generally require the greatest quantities of protein, agreeing with suggestions that lactation is energetically expensive and protein is most limiting during growth. Equations used in this system were precise (allometric equations had R² generally ≥0.89 and coefficient of variation <31.1%) and are expected to reliably predict requirements across species. Results showed that a system for beef cattle would overestimate NE and either over- or underestimate NP for gain when applied to wild ruminants, showing that systems for wild ruminants should not extrapolate from requirements for domestic ruminants. One prominent system for wild ruminants predicted at times vastly different protein requirements from those predicted by the proposed system. The proposed system should be further evaluated and expanded to include other nutrients. Zoo Biol 30:165–188, 2011. © 2010 Wiley-Liss, Inc.
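The structure described here, per-function requirements predicted from body mass and then summed, can be sketched as follows. The general allometric form y = a·M^b is standard, but every coefficient in the usage note is a placeholder chosen only to make the arithmetic easy to follow; none is taken from the paper, and the function names are hypothetical.

```python
# The six functions listed in the abstract.
FUNCTIONS = ("maintenance", "activity", "thermoregulation",
             "gain", "lactation", "gestation")


def allometric(body_mass_kg, a, b):
    """Generic allometric equation y = a * M**b (a and b are supplied by the caller)."""
    return a * body_mass_kg ** b


def total_requirement(per_function):
    """Sum the requirement over the functions; functions that are absent count as zero."""
    unknown = set(per_function) - set(FUNCTIONS)
    if unknown:
        raise ValueError(f"unknown functions: {sorted(unknown)}")
    return sum(per_function.get(name, 0.0) for name in FUNCTIONS)
```

For example, with placeholder coefficients a = 2.0 and b = 0.5, `allometric(16, 2.0, 0.5)` gives 8.0 for one function, and `total_requirement({"maintenance": 8.0, "activity": 2.0})` gives 10.0. A nonproductive adult would simply omit the gain, lactation and gestation terms.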

13.
An eigenvalue-eigenvector approach to predicting protein folding types (total citations: 1; self-citations: 0; citations by others: 1)
The accuracy of predicting protein folding types can be significantly enhanced by a recently developed algorithm in which the coupling effect among different amino acid components is taken into account [Chou and Zhang (1994) J. Biol. Chem. 269, 22014-22020]. However, in practical calculations using this powerful algorithm, one may sometimes face ill-conditioned matrices. To overcome such a difficulty, an effective eigenvalue-eigenvector approach is proposed. Furthermore, the new approach has been used to predict a recently constructed set of 76 proteins not included in the training set, and the accuracy of prediction is also much higher than those of other methods.

14.
The transmembrane (TM) domains of many integral membrane proteins are composed of alpha-helix bundles. Structure determination at high resolution (<4 Å) of TM domains is still exceedingly difficult experimentally. Hence, some TM-protein structures have only been solved at intermediate (5-10 Å) or low (>10 Å) resolutions using, for example, cryo-electron microscopy (cryo-EM). These structures reveal the packing arrangement of the TM domain, but cannot be used to determine the positions of individual amino acids. The observation that typically, the lipid-exposed faces of TM proteins are evolutionarily more variable and less charged than their core provides a simple rule for orienting their constituent helices. Based on this rule, we developed score functions and automated methods for orienting TM helices, for which locations and tilt angles have been determined using, e.g., cryo-EM data. The method was parameterized with the aim of retrieving the native structure of bacteriorhodopsin among near- and far-from-native templates. It was then tested on proteins that differ from bacteriorhodopsin in their sequences, architectures, and functions, such as the acetylcholine receptor and rhodopsin. The predicted structures were within 1.5-3.5 Å from the native state in all cases. We conclude that the computational method can be used in conjunction with cryo-EM data to obtain approximate model structures of TM domains of proteins for which a sufficiently heterogeneous set of homologs is available. We also show that in those proteins in which relatively short loops connect neighboring helices, the scoring functions can discriminate between near- and far-from-native conformations even without the constraints imposed on helix locations and tilt angles that are derived from cryo-EM.

15.
SLLE for predicting membrane protein types (total citations: 2; self-citations: 0; citations by others: 2)
Introduction of the concept of pseudo amino acid composition (PROTEINS: Structure, Function, and Genetics 43 (2001) 246; Erratum: ibid. 44 (2001) 60) has made it possible to incorporate a considerable amount of sequence-order effects by representing a protein sample in terms of a set of discrete numbers, and hence can significantly enhance the prediction quality of membrane protein type. As a continuous effort along such a line, the Supervised Locally Linear Embedding (SLLE) technique for nonlinear dimensionality reduction is introduced (Science 22 (2000) 2323). The advantage of using SLLE is that it can reduce the operational space by extracting the essential features from the high-dimensional pseudo amino acid composition space, and that the cluster-tolerant capacity can be increased accordingly. As a consequence, by combining these two approaches, high success rates have been observed in the self-consistency, jackknife and independent data set tests, using the simplest nearest neighbour classifier. The current approach represents a new strategy to deal with the problems of protein attribute prediction, and hence may become a useful vehicle in the area of bioinformatics and proteomics.
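The evaluation protocol named in this abstract, a jackknife (leave-one-out) test with a nearest neighbour classifier, can be sketched in a few lines. The sketch assumes the feature vectors have already been computed and reduced; SLLE itself and the pseudo amino acid composition are not implemented here, and the function names are hypothetical.

```python
import math


def nearest_neighbour_label(query, samples):
    """Return the label of the sample closest to query (Euclidean distance).

    samples is a list of (vector, label) pairs.
    """
    _, label = min(samples, key=lambda s: math.dist(query, s[0]))
    return label


def jackknife_accuracy(samples):
    """Leave-one-out test: each sample is predicted from all the others."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out sample i
        if nearest_neighbour_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)
```

Because every sample is held out exactly once, the jackknife result is deterministic for a given data set, unlike a random train/test split.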

16.
Nucleoids, a subnuclear system capable of chain elongation (total citations: 1; self-citations: 0; citations by others: 1)
Nucleoids, prepared by salt extraction of non-DNase-digested nuclei, have properties similar, but not identical, to those of nuclear matrices, which are prepared by salt extraction of DNase-digested nuclei. Nuclear matrices retained less pulse-labelled DNA, slightly less bound DNA polymerase alpha and DNA primase, but had greater in vitro DNA synthesis and in vitro priming. Nucleoids contained larger (110 S) DNA chains than nuclear matrices (30 S). Each type of residual nuclear structure could synthesize 4.5 S Okazaki fragments. When extracted with increasing concentrations of salt, DNase-digested nuclei lost the ability for further elongation of the 4.5 S DNA intermediate after 0.1-0.2 M NaCl, whereas undigested nuclei retained this ability up to 0.9 M NaCl. Chain elongation to 28 S DNA chains could be restored to nucleoids, but not to nuclear matrices, by the addition of nuclear extracts.

17.
Darnell SJ, Page D, Mitchell JC. Proteins, 2007, 68(4): 813-823
Protein-protein interactions can be altered by mutating one or more "hot spots," the subset of residues that account for most of the interface's binding free energy. The identification of hot spots requires a significant experimental effort, highlighting the practical value of hot spot predictions. We present two knowledge-based models that improve the ability to predict hot spots: K-FADE uses shape specificity features calculated by the Fast Atomic Density Evaluation (FADE) program, and K-CON uses biochemical contact features. The combined K-FADE/CON (KFC) model displays better overall predictive accuracy than computational alanine scanning (Robetta-Ala). In addition, because these methods predict different subsets of known hot spots, a large and significant increase in accuracy is achieved by combining KFC and Robetta-Ala. The KFC analysis is applied to the calmodulin (CaM)/smooth muscle myosin light chain kinase (smMLCK) interface, and to the bone morphogenetic protein-2 (BMP-2)/BMP receptor-type I (BMPR-IA) interface. The results indicate a strong correlation between KFC hot spot predictions and mutations that significantly reduce the binding affinity of the interface.

18.
Feng ZP. In silico biology, 2002, 2(3): 291-303
The present paper reviews the problem of predicting the subcellular location of a protein. Five approaches to extracting information from the global sequence, based on the Bayes discriminant algorithm, are reviewed: (1) the auto-correlation functions of amino acid indices along the sequence; (2) the quasi-sequence-order approach; (3) the pseudo-amino acid composition; (4) the unified attribute vector in Hilbert space; and (5) Zp parameters extracted from the Zp curve. The actual predictive accuracy is closely related to the degree of similarity between the training and testing sets, or to the average degree of pairwise similarity within the dataset in a cross-validated study. Many scholars consider that the currently high predictive accuracy cannot guarantee that the available algorithms are effective in practical prediction, because of the high pairwise sequence identity within the datasets. Others argue that the datasets used for developing software should reflect the biological reality that some subcellular locations contain only a small number of proteins, some of which share a high percentage of sequence similarity. Owing to the complexity of the problem itself, some very sophisticated and specialized programs are needed both for constructing datasets and for improving the prediction. In any case, finding the target information in the mature protein sequence and properly combining it with sorting signals may further improve the overall predictive accuracy and bring the prediction into practical use.
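The first of the five approaches, auto-correlation functions of an amino acid index along the sequence, can be sketched with one commonly used definition: for lag n and a sequence of length N, r_n = (1/(N-n)) Σ h_i·h_(i+n), where h_i is the index value of residue i. Whether the reviewed work uses exactly this normalization is an assumption, and the index is passed in by the caller rather than fixed here.

```python
def autocorrelation(seq, index, max_lag):
    """Return [r_1, ..., r_max_lag] for seq under the given residue index.

    index maps each residue letter to a numeric value (for example, a
    hydrophobicity scale). Each r_n is the mean product of index values
    taken n positions apart.
    """
    values = [index[r] for r in seq]
    length = len(values)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    result = []
    for lag in range(1, max_lag + 1):
        products = [values[i] * values[i + lag] for i in range(length - lag)]
        result.append(sum(products) / (length - lag))
    return result
```

With a toy two-letter index `{"A": 1.0, "G": -1.0}`, the alternating sequence "AGAG" gives `[-1.0, 1.0]` for lags 1 and 2: neighbours are anti-correlated and residues two apart are perfectly correlated. The result is a fixed-length vector regardless of sequence length, which is what makes it usable as classifier input.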

19.
20.
Methods for predicting bacterial protein subcellular localization (total citations: 1; self-citations: 0; citations by others: 1)
The computational prediction of the subcellular localization of bacterial proteins is an important step in genome annotation and in the search for novel vaccine or drug targets. Since the 1991 release of PSORT I, the first comprehensive algorithm to predict bacterial protein localization, many other localization prediction tools have been developed. These methods offer significant improvements in predictive performance over PSORT I, and the accuracy of some methods now rivals that of certain high-throughput laboratory methods for protein localization identification.
