Similar literature
 Found 20 similar documents; search took 31 ms
1.
In this paper we present a novel framework for protein secondary structure prediction. First, we propose a novel parameterized semi-probability profile that effectively combines single-sequence and evolutionary information. Second, different semi-probability profiles are each applied as network input to predict protein secondary structure, and the resulting predictions are compared. Finally, naïve Bayes approaches are used to combine these predictions in order to obtain better performance than any individual prediction. The experimental results show that the proposed framework improves prediction accuracy.

2.
Computational characterization of proteins is a necessary first step in understanding the biologic role of a protein. The composite architecture of mammalian proteins makes the prediction of the biologic role rather difficult. Nevertheless, integration of many different prediction methods allows for a more accurate representation. Information on the 3D structure of a protein improves the reliability of predictions of many features. This article reviews existing methods used to characterize proteins and several tools that provide an integrated access to different types of information. The authors point out the increasing importance of structural constraints and an increasing need to integrate different approaches.  相似文献   

4.
ExPASy: The proteomics server for in-depth protein knowledge and analysis   Total citations: 10 (self-citations: 0; citations by others: 10)
The ExPASy (the Expert Protein Analysis System) World Wide Web server (http://www.expasy.org), is provided as a service to the life science community by a multidisciplinary team at the Swiss Institute of Bioinformatics (SIB). It provides access to a variety of databases and analytical tools dedicated to proteins and proteomics. ExPASy databases include SWISS-PROT and TrEMBL, SWISS-2DPAGE, PROSITE, ENZYME and the SWISS-MODEL repository. Analysis tools are available for specific tasks relevant to proteomics, similarity searches, pattern and profile searches, post-translational modification prediction, topology prediction, primary, secondary and tertiary structure analysis and sequence alignment. These databases and tools are tightly interlinked: a special emphasis is placed on integration of database entries with related resources developed at the SIB and elsewhere, and the proteomics tools have been designed to read the annotations in SWISS-PROT in order to enhance their predictions. ExPASy started to operate in 1993, as the first WWW server in the field of life sciences. In addition to the main site in Switzerland, seven mirror sites in different continents currently serve the user community.  相似文献   

5.
Prediction of which peptides can bind major histocompatibility complex (MHC) molecules is commonly used to assist in the identification of T cell epitopes. However, because of the large number of different MHC molecules of interest, each associated with different predictive tools, tool generation and evaluation can be a very resource-intensive task. A methodology commonly used to predict MHC binding affinity is the matrix or linear coefficients method. Herein, we describe Average Relative Binding (ARB) matrix methods that directly predict IC50 values, allowing searches involving different peptide sizes and alleles to be combined into a single global prediction. A computer program was developed to automate the generation and evaluation of ARB predictive tools. Using an in-house MHC binding database, we generated a total of 85 MHC class I and 13 MHC class II matrices. Results from the automated evaluation of tool efficiency are presented. We anticipate that this automation framework will be generally applicable to the generation and evaluation of large numbers of MHC predictive methods and tools, and will be of value in centralizing and rationalizing the evaluation of MHC predictions. MHC binding predictions based on ARB matrices were made available through a web server.

6.
Park Y  Helms V 《Biopolymers》2006,83(4):389-399
Given the difficulty in determining high-resolution structures of helical membrane proteins, sequence-based prediction methods can be useful in elucidating diverse physiological processes mediated by this important class of proteins. Predicting the angular orientations of transmembrane (TM) helices about the helix axes, based on the helix parameters from electron microscopy data, is a classical problem in this regard. This problem has triggered the development of a number of different empirical scales. Recently, sequence conservation patterns have also been used for improved predictions. Empirical scales and sequence conservation patterns (collectively termed "prediction scales") have also found frequent applications in other research areas of membrane proteins: for example, in structure modeling and in prediction of buried TM helices. This trend is expected to grow in the near future unless there are revolutionary developments in the experimental characterization of membrane proteins. Thus, it is timely and imperative to carry out a comprehensive benchmark test of the prediction scales proposed so far to determine their pros and cons. In the current analysis, we use exposure patterns of TM helices as a gold standard, because a prediction scale that correlates perfectly with exposure patterns of TM helices would enable one to predict buried residues (or buried faces) of TM helices with an accuracy of 100%. Our analysis reveals several important points. (1) It demonstrates that sequence conservation patterns are much more strongly correlated with exposure patterns of TM helices than empirical scales. (2) Scales that were specifically parameterized using structure data (structure-based scales) display stronger correlation than hydrophobicity-based scales, as expected. (3) A non-negligible difference is observed among the structure-based scales in their correlational property, suggesting that not every learning algorithm is equally effective. (4) A straightforward framework for optimally combining sequence conservation patterns and empirical scales is proposed, which reveals that improvements gained from combining the two sources of information are not dramatic in almost all cases. In turn, this calls for the development of fundamentally different scales that capture the essentials of membrane protein folding for substantial improvements.

7.
Zhu M  Gao L  Guo Z  Li Y  Wang D  Wang J  Wang C 《Gene》2007,391(1-2):113-119
Determining protein functions is an important task in the post-genomic era. Most of the current methods work on some large-sized functional classes selected from functional categorization systems prior to the prediction processes. GESTs, a prediction approach previously proposed by us, is based on gene expression similarity and taxonomy similarity of the functional classes. Unlike many conventional methods, it does not require pre-selecting the functional classes and can predict specific functions for genes according to the functional annotations of their co-expressed genes. In this paper, we extend this method for analyzing protein-protein interaction data. We introduce gene expression data to filter the interacting neighbors of a protein in order to enhance the degree of functional consensus among the neighbors. Using the taxonomy similarity of protein functional classes, the proposed approach can call on the interacting neighbor proteins annotated to nearby classes to support the predictions for an uncharacterized protein, and automatically select the most appropriate small-sized specific functional classes in Gene Ontology (GO) during the learning process. By three measures particularly designed for the functional classes organized in GO, we evaluate the effects of using different taxonomy similarity scores on the prediction performance. Based on the yeast protein-protein interaction data from MIPS and a dataset of gene expression profiles, we show that this method is powerful for predicting protein function to very specific terms. Compared with the other two taxonomy similarity measures used in this study, if we want to achieve higher prediction accuracy with an acceptable specific level (predicted depth), SB-TS measure proposed by us is a reasonable choice for ontology-based functional predictions.  相似文献   

8.
Recognition of peptides bound to major histocompatibility complex (MHC) class I molecules by T lymphocytes is an essential part of immune surveillance. Each MHC allele has a characteristic peptide binding preference, which can be captured in prediction algorithms, allowing for the rapid scan of entire pathogen proteomes for peptide likely to bind MHC. Here we make public a large set of 48,828 quantitative peptide-binding affinity measurements relating to 48 different mouse, human, macaque, and chimpanzee MHC class I alleles. We use this data to establish a set of benchmark predictions with one neural network method and two matrix-based prediction methods extensively utilized in our groups. In general, the neural network outperforms the matrix-based predictions mainly due to its ability to generalize even on a small amount of data. We also retrieved predictions from tools publicly available on the internet. While differences in the data used to generate these predictions hamper direct comparisons, we do conclude that tools based on combinatorial peptide libraries perform remarkably well. The transparent prediction evaluation on this dataset provides tool developers with a benchmark for comparison of newly developed prediction methods. In addition, to generate and evaluate our own prediction methods, we have established an easily extensible web-based prediction framework that allows automated side-by-side comparisons of prediction methods implemented by experts. This is an advance over the current practice of tool developers having to generate reference predictions themselves, which can lead to underestimating the performance of prediction methods they are not as familiar with as their own. The overall goal of this effort is to provide a transparent prediction evaluation allowing bioinformaticians to identify promising features of prediction methods and providing guidance to immunologists regarding the reliability of prediction tools.  相似文献   

9.
Fold assignments for newly sequenced genomes belong to the most important and interesting applications of the booming field of protein structure prediction. We present a brief survey and a discussion of such assignments completed to date, using as an example several fold assignment projects for proteins from the Escherichia coli genome. This review focuses on steps that are necessary to go beyond the simple assignment projects and into the development of tools extending our understanding of functions of proteins in newly sequenced genomes. This paper also discusses several problems seldom addressed in the literature, such as the problem of domain prediction and complementary predictions (e.g., transmembrane regions and flexible regions) and cross-correlation of predictions from different servers. The influence of sequence and structure database growth on prediction success is also addressed. Finally, we discuss the perspectives of the field in the context of massive sequence and structure determination projects, as well as the development of novel prediction methods.  相似文献   

10.
A crucial step for identifying genes of interest in legume crops is to determine gene function in Medicago truncatula. To facilitate functional genomics in this species, an ecophysiological framework of analysis was developed. Our primary aim was to establish a standard terminology for identifying each organ on the plant. A standard system for the characterization of the vegetative and the reproductive developmental stages was then proposed. Using these tools, the time course of vegetative development of nitrogen-fixing A17 plants was analysed in experiments conducted under different environmental conditions. To take into account the influence of temperature on plant development timing, an original approach was used by modelling vegetative development as a function of thermal time. Interestingly, the use of thermal time highlighted genotypic constants in plant development. Thereafter, to illustrate how this methodology can be used in explaining phenotypic alterations, the phenotype of two allelic mutants was analysed. Because the tools proposed in this paper allow the following: (1) standardization of how the plant material should be characterized to be used for functional genomics; (2) prediction of plant vegetative development; and (3) a more accurate phenotyping, the use of these tools by the M. truncatula community should provide a relevant framework for facilitating the production of reproducible functional genomics data.  相似文献   

11.
Hering JA  Innocent PR  Haris PI 《Proteomics》2003,3(8):1464-1475
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments based on only small quantities of proteins. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a structural classification of proteins (SCOP) class specialized neural networks architecture combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP class specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into two main classes "all alpha proteins" and "all beta proteins" merely based on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, where structure specialized neural networks are trained based on protein spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, an increase of SEP by 2.43% was observed. Those results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.  相似文献   

12.

Background

One of the major challenges in the field of vaccine design is identifying B-cell epitopes in continuously evolving viruses. Various tools have been developed to predict linear or conformational epitopes, each relying on different physicochemical properties and adopting distinct search strategies. We propose a meta-learning approach for epitope prediction based on stacked and cascade generalizations. Through meta learning, we expect a meta learner to be able to integrate multiple prediction models and outperform the single best-performing model. The objective of this study is twofold: (1) to analyze the complementary predictive strengths of different prediction tools, and (2) to introduce a generic computational model to exploit the synergy among various prediction tools. Our primary goal is not to develop any particular classifier for B-cell epitope prediction, but to advocate the feasibility of applying meta learning to epitope prediction. With the flexibility of meta learning, the researcher can construct various meta classification hierarchies that are applicable to epitope prediction in different protein domains.

Results

We developed the hierarchical meta-learning architectures based on stacked and cascade generalizations. The bottom level of the hierarchy consisted of four conformational and four linear epitope prediction tools that served as the base learners. To perform consistent and unbiased comparisons, we tested the meta-learning method on an independent set of antigen proteins that were not used previously to train the base epitope prediction tools. In addition, we conducted correlation and ablation studies of the base learners in the meta-learning model. Low correlation among the predictions of the base learners suggested that the eight base learners had complementary predictive capabilities. The ablation analysis indicated that the eight base learners differentially interacted and contributed to the final meta model. The results of the independent test demonstrated that the meta-learning approach markedly outperformed the single best-performing epitope predictor.

Conclusions

Computational B-cell epitope prediction tools exhibit several differences that affect their performances when predicting epitopic regions in protein antigens. The proposed meta-learning approach for epitope prediction combines multiple prediction tools by integrating their complementary predictive strengths. Our experimental results demonstrate the superior performance of the combined approach in comparison with single epitope predictors.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0378-y) contains supplementary material, which is available to authorized users.  相似文献   

13.
Computational interactomics deals with prediction of functionally related proteins. One approach for solving this problem using comparative genomics consists in analysis of similarities between phylogenetic profiles of proteins. In contrast to most methods, which predict only pairwise interactions between proteins, in the present work we have applied the cluster analysis techniques in order to find modules of functionally related proteins. We have performed the cluster analysis of phylogenetic profiles of E. coli proteins using several clustering techniques and various modes for estimation of distances between profiles. We report here, that the best correspondence in the composition of resultant clusters to known metabolic pathways is achieved using Ward’s clustering together with Hamming’s distance. The proposed technique of assessing predictions of the modules of functionally related proteins can be used for comparative analysis of different algorithms for computational interactomics.  相似文献   
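The distance step mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the Hamming distance between binary phylogenetic profiles only, not the authors' pipeline; the protein names and presence/absence vectors are hypothetical, and the Ward clustering step is not reproduced here.

```python
def hamming_distance(profile_a, profile_b):
    """Count the genomes in which exactly one of the two proteins is present."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def pairwise_distances(profiles):
    """Return the Hamming distance for every unordered pair of proteins."""
    names = sorted(profiles)
    return {
        (p, q): hamming_distance(profiles[p], profiles[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Hypothetical presence/absence profiles across six genomes (1 = homolog present).
profiles = {
    "protA": [1, 1, 0, 1, 0, 0],
    "protB": [1, 1, 0, 1, 0, 1],
    "protC": [0, 0, 1, 0, 1, 1],
}

distances = pairwise_distances(profiles)
```

Proteins with a small distance (here the hypothetical protA and protB) share nearly the same pattern of occurrence across genomes, which is the signal a clustering method would then group into candidate modules.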

14.
A widely studied problem in systems biology is to predict bacterial phenotype from growth conditions, using mechanistic models such as flux balance analysis (FBA). However, the inverse prediction of growth conditions from phenotype is rarely considered. Here we develop a computational framework to carry out this inverse prediction on a computational model of bacterial metabolism. We use FBA to calculate bacterial phenotypes from growth conditions in E. coli, and then we assess how accurately we can predict the original growth conditions from the phenotypes. Prediction is carried out via regularized multinomial regression. Our analysis provides several important physiological and statistical insights. First, we show that by analyzing metabolic end products we can consistently predict growth conditions. Second, prediction is reliable even in the presence of small amounts of impurities. Third, flux through a relatively small number of reactions per growth source (∼10) is sufficient for accurate prediction. Fourth, combining the predictions from two separate models, one trained only on carbon sources and one only on nitrogen sources, performs better than models trained to perform joint prediction. Finally, that separate predictions perform better than a more sophisticated joint prediction scheme suggests that carbon and nitrogen utilization pathways, despite jointly affecting cellular growth, may be fairly decoupled in terms of their dependence on specific assortments of molecular precursors.  相似文献   

15.
MOTIVATION: Assigning functions for unknown genes based on diverse large-scale data is a key task in functional genomics. Previous work on gene function prediction has addressed this problem using independent classifiers for each function. However, such an approach ignores the structure of functional class taxonomies, such as the Gene Ontology (GO). Over a hierarchy of functional classes, a group of independent classifiers where each one predicts gene membership to a particular class can produce a hierarchically inconsistent set of predictions, where for a given gene a specific class may be predicted positive while its inclusive parent class is predicted negative. Taking the hierarchical structure into account resolves such inconsistencies and provides an opportunity for leveraging all classifiers in the hierarchy to achieve higher specificity of predictions. RESULTS: We developed a Bayesian framework for combining multiple classifiers based on the functional taxonomy constraints. Using a hierarchy of support vector machine (SVM) classifiers trained on multiple data types, we combined predictions in our Bayesian framework to obtain the most probable consistent set of predictions. Experiments show that over a 105-node subhierarchy of the GO, our Bayesian framework improves predictions for 93 nodes. As an additional benefit, our method also provides implicit calibration of SVM margin outputs to probabilities. Using this method, we make function predictions for multiple proteins, and experimentally confirm predictions for proteins involved in mitosis. SUPPLEMENTARY INFORMATION: Results for the 105 selected GO classes and predictions for 1059 unknown genes are available at: http://function.princeton.edu/genesite/ CONTACT: ogt@cs.princeton.edu.  相似文献   
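The kind of hierarchical inconsistency described in the abstract above (a specific class predicted positive while its parent class is predicted negative) can be detected with a short check. This sketch only finds such violations; it is not the Bayesian framework from the abstract, and the class names and predictions are hypothetical.

```python
def find_inconsistencies(parents, predictions):
    """Return sorted (child, parent) pairs where the child is predicted
    positive but one of its parent classes is predicted negative."""
    violations = []
    for child, child_parents in parents.items():
        if not predictions.get(child, False):
            continue  # a negative child can never violate the hierarchy
        for parent in child_parents:
            if not predictions.get(parent, False):
                violations.append((child, parent))
    return sorted(violations)


# Hypothetical three-level fragment of a functional class hierarchy.
parents = {
    "specific_class": ["parent_class"],
    "parent_class": ["root_class"],
    "root_class": [],
}

# Hypothetical outputs of three independent per-class classifiers for one gene.
predictions = {
    "specific_class": True,
    "parent_class": False,
    "root_class": True,
}

violations = find_inconsistencies(parents, predictions)
```

A method that combines the classifiers under the taxonomy constraints would have to resolve each reported pair, either by raising the parent to positive or by lowering the child to negative.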

16.
The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species, however they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Considering the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. We here introduce a framework (SMIRP) for creating species-specific miRNA prediction systems, leveraging sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.  相似文献   
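The effect of the 1:1000 class ratio mentioned in the abstract above can be worked through numerically. The sensitivity and specificity values below are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def precision(sensitivity, specificity, positives, negatives):
    """Precision implied by a sensitivity, a specificity, and class sizes."""
    true_positives = sensitivity * positives
    false_positives = (1.0 - specificity) * negatives
    return true_positives / (true_positives + false_positives)


# One true miRNA for every 1000 pseudo-miRNA sequences, as in the abstract.
POSITIVES, NEGATIVES = 1, 1000

# Hypothetical classifier settings: the same sensitivity, two specificities.
high_specificity = precision(0.90, 0.999, POSITIVES, NEGATIVES)
low_specificity = precision(0.90, 0.95, POSITIVES, NEGATIVES)
```

With these assumed numbers, precision stays below one half even at 99.9% specificity and falls below 2% at 95% specificity, which is the sense in which false positives overwhelm the true predictions.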

17.
The genomes of many organisms have been sequenced in the last 5 years. Typically about 30% of predicted genes from a newly sequenced genome cannot be given functional assignments using sequence comparison methods. In these situations three-dimensional structural predictions combined with a suite of computational tools can suggest possible functions for these hypothetical proteins. Suggesting functions may allow better interpretation of experimental data (e.g., microarray data and mass spectroscopy data) and help experimentalists design new experiments. In this paper, we focus on three hypothetical proteins of Shewanella oneidensis MR-1 that are potentially related to iron transport/metabolism based on microarray experiments. The threading program PROSPECT was used for protein structural predictions and functional annotation, in conjunction with literature search and other computational tools. Computational tools were used to perform transmembrane domain predictions, coiled coil predictions, signal peptide predictions, sub-cellular localization predictions, motif prediction, and operon structure evaluations. Combined computational results from all tools were used to predict roles for the hypothetical proteins. This method, which uses a suite of computational tools that are freely available to academic users, can be used to annotate hypothetical proteins in general.  相似文献   

18.
Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular biological functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technological applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. To meet these needs, a number of in silico tools have been developed over the years; however, a need for better prediction tools remains. Therefore, in this study we have developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely, standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosites' patterns (as defined in the methods) having more than 60% similarity. Further, based on our results with several algorithms developed using different machine-learning techniques, we found Support Vector Machine (SVM) to be the optimum tool for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43%, with corresponding MCC values of 0.54, 0.20 and 0.78, for N-, O- and C-linked glycosites, respectively. The best-performing models trained on the advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on the standard datasets and allows users to scan sequons in input protein sequences.
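The abstract above reports both accuracy and the Matthews correlation coefficient (MCC), and the two can diverge sharply on imbalanced data. The sketch below uses the standard MCC formula on hypothetical confusion-matrix counts; the counts are assumptions for illustration and are not taken from the GlycoEP study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced test set: 50 glycosites among 500 candidate residues.
counts = {"tp": 10, "tn": 440, "fp": 10, "fn": 40}

acc = accuracy(**counts)
coefficient = mcc(**counts)
```

With these assumed counts the accuracy is 90% while the MCC is below 0.3, because most of the correct calls are easy negatives. This is one reason a high accuracy can sit alongside a low MCC.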

19.
Protein structure prediction is a cornerstone of bioinformatics research. Membrane proteins require their own prediction methods due to their intrinsically different composition. A variety of tools exist for topology prediction of membrane proteins, many of them available on the Internet. The server described in this paper, BPROMPT (Bayesian PRediction Of Membrane Protein Topology), uses a Bayesian Belief Network to combine the results of other prediction methods, providing a more accurate consensus prediction. Topology predictions with accuracies of 70% for prokaryotes and 53% for eukaryotes were achieved. BPROMPT can be accessed at http://www.jenner.ac.uk/BPROMPT.  相似文献   

20.
Cells exploit signaling pathways during responses to environmental changes, and these processes are often modulated during disease. In particular, relevant human pathologies such as cancer or viral infections require downregulation of apoptosis signaling pathways to progress. As a result, the identification of proteins responsible for these changes is essential for diagnostics and the development of therapeutics. Transferring functional annotation within protein interaction networks has proven useful for identifying such proteins, although this is not a trivial task. Here, we used different scoring methods to transfer annotation from 53 well-studied members of the human apoptosis pathways (as known by 2005) to their protein interactors. All scoring methods produced significant predictions (compared to a random negative model), but their number was too large to be useful. Thus, we made a final prediction using specific combinations of scoring methods and compared it to the proteins related to apoptosis signaling pathways during the last 5 years. We propose 273 candidate proteins that may be relevant in apoptosis signaling pathways. Although some of them have known functions consistent with their proposed apoptosis involvement, the majority have not been annotated yet, leaving room for further experimental studies. We provide our predictions at http://sbi.imim.es/web/Apoptosis.php.
