Similar articles
1.
ProteoCat is a computer program designed to help researchers plan large-scale proteomic experiments. Its central component is a hydrolysis simulation unit that supports four proteases (trypsin, Lys-C, and the endoproteinases Asp-N and Glu-C). For peptides obtained by virtual hydrolysis or loaded from data files, a number of properties important in mass-spectrometric experiments can be calculated or predicted; the resulting data can be analyzed or filtered to reduce the set of peptides. The program uses new and improved versions of the authors' earlier methods for pI prediction; pI can also be predicted using popular pKa scales proposed by other researchers. The algorithm for predicting peptide retention time is implemented along the lines of the algorithm used in the SSRCalc program. With ProteoCat it is possible to estimate the coverage of the amino acid sequences of the analyzed proteins under defined limits on peptide detection, as well as the possibility of assembling peptide fragments with user-defined minimal sizes of “sticky” ends. The program has a graphical user interface, is written in Java, and is available at http://www.ibmc.msk.ru/LPCIT/ProteoCat.
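As a rough illustration of what a virtual hydrolysis step involves, the following Python sketch digests a sequence with simplified textbook cleavage rules. The rules, the function name `digest`, and its parameters are assumptions made for this example and are not taken from ProteoCat itself.

```python
import re

# Simplified cleavage rules (assumptions for illustration only):
# trypsin cuts after K or R unless the next residue is P; Lys-C cuts after K;
# Asp-N cuts before D; Glu-C cuts after E.
CLEAVAGE_PATTERNS = {
    "trypsin": r"(?<=[KR])(?!P)",
    "lys-c": r"(?<=K)",
    "asp-n": r"(?=D)",
    "glu-c": r"(?<=E)",
}


def digest(sequence, protease="trypsin", missed_cleavages=0, min_length=1):
    """Return the peptides of a virtual digest, allowing missed cleavages."""
    pattern = CLEAVAGE_PATTERNS[protease]
    # Cut positions are zero-width matches strictly inside the sequence.
    cuts = [m.start() for m in re.finditer(pattern, sequence)
            if 0 < m.start() < len(sequence)]
    bounds = [0] + cuts + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` adjacent fragments to the right.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```

Filtering the returned list by length, as `min_length` does here, is one simple way to reduce a peptide set before any property prediction.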

2.
Protein nitration and nitrosylation are essential post-translational modifications (PTMs) involved in many fundamental cellular processes. Recent studies have revealed that excessive levels of nitration and nitrosylation in some critical proteins are linked to numerous chronic diseases. Therefore, the identification of substrates that undergo such modifications in a site-specific manner is an important research topic in the community and will provide candidates for targeted therapy. In this study, we aimed to develop a computational tool for predicting nitration and nitrosylation sites in proteins. We first constructed four types of encoding features, including positional amino acid distributions, sequence contextual dependencies, physicochemical properties, and position-specific scoring features, to represent the modified residues. Based on these encoding features, we established a predictor called DeepNitro using deep learning methods for predicting protein nitration and nitrosylation. Using n-fold cross-validation, our evaluation shows AUC values for DeepNitro of 0.65 for tyrosine nitration, 0.80 for tryptophan nitration, and 0.70 for cysteine nitrosylation, demonstrating the robustness and reliability of our tool. Also, when tested on the independent dataset, DeepNitro is substantially superior to other similar tools, with a 7%–42% improvement in prediction performance. Taken together, the application of deep learning methods and novel encoding schemes, especially the position-specific scoring feature, greatly improves the accuracy of nitration and nitrosylation site prediction and may facilitate the prediction of other PTM sites. DeepNitro is implemented in Java and PHP and is freely available for academic research at http://deepnitro.renlab.org.
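For readers less familiar with the AUC metric quoted above, here is a minimal Python sketch of one standard way to compute it: the fraction of positive/negative pairs in which the positive site receives the higher score, with ties counted as one half. The function name and the toy scores are hypothetical and unrelated to DeepNitro's data.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positive_scores) * len(negative_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of modified from unmodified sites.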

3.
It is an established fact that allelic variation and post-translational modifications create different variants of proteins, which are observed as isoelectric and size subspecies in two-dimensional gel based proteomics. Here we explore the stromal proteome of spinach and Arabidopsis chloroplast and show that clustering of mass spectra is a useful tool for investigating such variants and detecting modified peptides with amino acid substitutions or post-translational modifications. This study employs data mining by hierarchical clustering of MALDI-MS spectra, using the web version of the SPECLUST program (http://bioinfo.thep.lu.se/speclust.html). The tool can also be used to remove peaks of contaminating proteins and to improve protein identification, especially for species without a fully sequenced genome. Mutually exclusive peptide peaks within a cluster provide a good starting point for MS/MS investigation of modified peptides, here exemplified by the identification of an A to E substitution that accounts for the isoelectric heterogeneity in protein isoforms.

4.
We propose a novel method for phenotype identification involving a stringent noise analysis and filtering procedure followed by combining the results of several machine learning tools to produce a robust predictor. We illustrate our method on SELDI-TOF MS prostate cancer data (http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp). Our method identified 11 proteomic biomarkers and gave significantly improved predictions over previous analyses with these data. We were able to distinguish cancer from non-cancer cases with a sensitivity of 90.31% and a specificity of 98.81%. The proposed method can be generalized to multi-phenotype prediction and other types of data (e.g., microarray data).
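The abstract does not say how the outputs of the individual learners were combined, so the Python sketch below simply assumes a majority vote and then computes sensitivity and specificity from the combined calls. Both function names and the 0/1 label encoding (1 = cancer) are hypothetical choices for this example.

```python
def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models, sample by sample."""
    n_models = len(predictions_per_model)
    combined = []
    for votes in zip(*predictions_per_model):
        # A sample is called positive when more than half the models agree.
        combined.append(1 if 2 * sum(votes) > n_models else 0)
    return combined


def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for 0/1 labels."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity is the fraction of true cancer cases that are detected, and specificity is the fraction of non-cancer cases that are correctly cleared.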

5.
Investigation of physiological mechanisms at a cellular level often requires production of high-quality antibodies, frequently using synthetic peptides as immunogens. Here we describe a new, web-based software tool called NHLBI-AbDesigner that allows the user to visualize the information needed to choose optimal peptide sequences for peptide-directed antibody production (http://helixweb.nih.gov/AbDesigner/). The choice of an immunizing peptide is generally based on a need to optimize immunogenicity, antibody specificity, multispecies conservation, and robustness in the face of posttranslational modifications (PTMs). AbDesigner displays information relevant to these criteria as follows: 1) "Immunogenicity Score," based on hydropathy and secondary structure prediction; 2) "Uniqueness Score," a predictor of specificity of an antibody against all proteins expressed in the same species; 3) "Conservation Score," a predictor of ability of the antibody to recognize orthologs in other animal species; and 4) "Protein Features" that show structural domains, variable regions, and annotated PTMs that may affect antibody performance. AbDesigner displays the information online in an interactive graphical user interface, which allows the user to recognize the trade-offs that exist for alternative synthetic peptide choices and to choose the one that is best for a proposed application. Several examples of the use of AbDesigner for the display of such trade-offs are presented, including production of a new antibody to Slc9a3. We also used the program in large-scale mode to create a database listing the 15-amino acid peptides with the highest Immunogenicity Scores for all known proteins in five animal species, one plant species (Arabidopsis thaliana), and Saccharomyces cerevisiae.
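To give a feel for the hydropathy component of such a score, the Python sketch below ranks fixed-length windows of a sequence by their mean Kyte-Doolittle hydropathy, with the most hydrophilic window first. This is not AbDesigner's Immunogenicity Score, which also uses secondary structure prediction; the function name and defaults are assumptions for illustration.

```python
# Kyte-Doolittle hydropathy scale (higher values are more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophilic_windows(sequence, window=15, top=3):
    """Return up to `top` (mean_hydropathy, start, peptide) tuples.

    Windows are sorted so that the most hydrophilic (lowest mean) comes first.
    """
    scored = []
    for start in range(len(sequence) - window + 1):
        peptide = sequence[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in peptide) / window
        scored.append((mean, start, peptide))
    scored.sort()
    return scored[:top]
```

Hydrophilic stretches are more likely to be surface-exposed, which is why a low mean hydropathy is treated as favorable in this simplified ranking.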

6.
There is a growing interest in the non-ribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) of microbes, fungi and plants because they can produce bioactive peptides such as antibiotics. The ability to identify the substrate specificity of the enzyme's adenylation (A) and acyl-transferase (AT) domains is essential to rationally deduce or engineer new products. We here report on a Hidden Markov Model (HMM)-based ensemble method to predict the substrate specificity at high quality. We collected a new reference set of experimentally validated sequences. An initial classification based on alignment and Neighbor Joining was performed in line with most of the previously published prediction methods. We then created and tested single substrate-specific HMMs and found that their use significantly improved correct identification for A as well as for AT domains. A major advantage of HMMs is that they remove the dependency on multiple sequence alignment and residue selection that hampers the alignment-based clustering methods. Using our models we obtained a high prediction quality for the substrate specificity of the A domains, similar to two recently published tools that make use of HMMs or Support Vector Machines (NRPSsp and NRPS predictor2, respectively). Moreover, replacement of the single substrate-specific HMMs by ensembles of models caused a clear increase in prediction quality. We argue that the superiority of the ensemble over the single model is caused by the way substrate specificity evolves for the studied systems. It is likely that this also holds true for other protein domains. The ensemble predictor has been implemented in a simple web-based tool that is available at http://www.cmbi.ru.nl/NRPS-PKS-substrate-predictor/.

7.
Several accurate prediction systems have been developed for prediction of class I major histocompatibility complex (MHC):peptide binding. Most of these are trained on binding affinity data of primarily 9mer peptides. Here, we show how prediction methods trained on 9mer data can be used for accurate binding affinity prediction of peptides of length 8, 10 and 11. The method gives the opportunity to predict peptides with a different length than nine for MHC alleles where no such peptides have been measured. As validation, the performance of this approach is compared to predictors trained on peptides of the peptide length in question. In this validation, the approximation method has an accuracy that is comparable to or better than methods trained on a peptide length identical to the predicted peptides. AVAILABILITY: The algorithm has been implemented in the web-accessible servers NetMHC-3.0: http://www.cbs.dtu.dk/services/NetMHC-3.0, and NetMHCpan-1.1: http://www.cbs.dtu.dk/services/NetMHCpan-1.1

8.
Previous work in predicting protein localization to the chloroplast organelle in plants led to the development of an artificial neural network-based approach capable of remarkable accuracy in its prediction (ChloroP). A common criticism against such neural network models is that it is difficult to interpret the criteria that are used in making predictions. We address this concern with several new prediction methods that base predictions explicitly on the abundance of different amino acid types in the N-terminal region of the protein. Our successful prediction accuracy suggests that ChloroP uses little positional information in its decision-making; an unexpected result given the elaborate ChloroP input scheme. By removing positional information, our simpler methods allow us to identify those amino acids that are useful for successful prediction. The identification of important sequence features, such as amino acid content, is advantageous if one of the goals of localization predictors is to gain an understanding of the biological process of chloroplast localization. Our most accurate predictor combines principal component analysis and logistic regression. Web-based prediction using this method is available online at http://apicoplast.cis.upenn.edu/pclr/.
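A position-free feature vector of the kind described above can be sketched in a few lines of Python: count each amino acid type in the N-terminal region and feed the fractions to a logistic model. The 50-residue region length, the function names, and the weights are all assumptions for illustration, and the principal component analysis step of the published predictor is omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence, n_residues=50):
    """Fraction of each amino acid type in the first `n_residues` residues."""
    region = sequence[:n_residues]
    if not region:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(region)
    return [counts[aa] / len(region) for aa in AMINO_ACIDS]


def logistic_score(features, weights, bias=0.0):
    """Logistic regression output: a probability-like score in (0, 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

Because the features carry no positional information, each fitted weight can be read directly as the contribution of one amino acid type, which is the interpretability advantage the abstract argues for.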

9.
A thorough understanding of the fragmentation processes in MS/MS can be a powerful tool in assessing the resulting peptide and protein identifications. We here present the freely available, open-source FragmentationAnalyzer tool (http://fragmentation-analyzer.googlecode.com) that makes it straightforward to analyze large MS/MS data sets for specific types of identified peptides, using a common set of peptide properties. This enables the detection of fragmentation pattern nuances related to specific instruments or due to the presence of post-translational modifications.

10.
Computational approaches for predicting protein-protein interfaces are extremely useful for understanding and modelling the quaternary structure of protein assemblies. In particular, partner-specific binding site prediction methods allow delineating the specific residues that compose the interface of protein complexes. In recent years, new machine learning and other algorithmic approaches have been proposed to solve this problem. However, little effort has been made to find better training datasets to improve the performance of these methods. To demonstrate the importance of the training set compilation procedure, in this work we present BIPSPI+, a new version of our original server trained on carefully curated datasets that outperforms our original predictor. We show how prediction performance can be improved by selecting specific datasets that better describe particular types of protein interactions and interfaces (e.g. homo/hetero). In addition, our upgraded web server offers a new set of functionalities such as the sequence-structure prediction mode, hetero- or homo-complex specialization and a guided docking tool that computes 3D quaternary structure poses using the predicted interfaces. BIPSPI+ is freely available at https://bipspi.cnb.csic.es.

11.
Modified peptides constitute a sub-population among the tryptic peptides analyzed in LC–MS based shotgun proteomics experiments. For larger proteomes including the human proteome, the tryptic peptide pool is very large, which necessitates some form of sample fractionation. By carefully choosing the sample fractionation and separation methods applied, as shown here for the combination of narrow-range immobilized pH gradient isoelectric focusing (IPG-IEF) and nanoUPLC–MS, significantly increased information content can be achieved. Relatively low standard deviations were obtained for such multidimensional separations in terms of peptide pI (<0.05 pI units) and retention time (<0.3 min for a 350 min gradient) for a selection of highly complex proteomics samples. Using narrow-range IPG-IEF, experimental and predicted pI were in relatively good agreement. However, based on our data, retention time prediction algorithms need further improvements in accuracy to match state-of-the-art reversed-phase chromatography performance. General trends of peptide pI shifts induced by common modifications including deamidations and N-terminal modifications are described. Deamidations of glutamine and asparagine shift peptide pI by approximately 1.5 pI units, making the peptides more acidic. Additionally, a novel pI shift (+~0.4 pI units) was found associated with dethiomethyl Met modifications. Further, the effects of these modifications as well as methionine oxidation were investigated in terms of experimentally observed retention time shifts in the chromatographic separation step. Clearly, post-translational modification-induced influences on peptide pI and retention time can be accurately and reproducibly measured using narrow-range IPG-IEF and high-performance nanoLC–MS. Even at modest mass accuracy (±50 ppm), the inclusion of peptide pI (±0.2 pI units) and/or retention time (±20 min) criteria is highly informative for human proteome analyses. The applications of using this information to identify post-translationally modified peptides and improve data analysis workflows are discussed.
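The tolerance criteria quoted above (±50 ppm, ±0.2 pI units, ±20 min) can be read as a simple three-way filter on candidate peptides. The Python sketch below shows one way to express such a filter; the dictionary keys, function names, and example values are hypothetical and do not come from the study's workflow.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def is_consistent(observed, candidate, ppm_tol=50.0, pi_tol=0.2, rt_tol=20.0):
    """True if a candidate peptide matches mass, pI and retention time.

    `observed` holds measured values and `candidate` holds theoretical or
    predicted values; both use the keys "mass" (Da), "pi" and "rt" (min).
    """
    return (abs(ppm_error(observed["mass"], candidate["mass"])) <= ppm_tol
            and abs(observed["pi"] - candidate["pi"]) <= pi_tol
            and abs(observed["rt"] - candidate["rt"]) <= rt_tol)
```

A candidate that passes the mass window alone may still be rejected by the pI or retention time criterion, which is how the extra dimensions add discriminating information at modest mass accuracy.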

12.
We present a neural network based method (ChloroP) for identifying chloroplast transit peptides and their cleavage sites. Using cross-validation, 88% of the sequences in our homology reduced training set were correctly classified as transit peptides or nontransit peptides. This performance level is well above that of the publicly available chloroplast localization predictor PSORT. Cleavage sites are predicted using a scoring matrix derived by an automatic motif-finding algorithm. Approximately 60% of the known cleavage sites in our sequence collection were predicted to within +/-2 residues from the cleavage sites given in SWISS-PROT. An analysis of 715 Arabidopsis thaliana sequences from SWISS-PROT suggests that the ChloroP method should be useful for the identification of putative transit peptides in genome-wide sequence data. The ChloroP predictor is available as a web-server at http://www.cbs.dtu.dk/services/ChloroP/.

13.
We have developed an automated method for predicting signal peptide sequences and their cleavage sites in eukaryotic and bacterial protein sequences. It is a two-layer predictor: the first-layer prediction engine identifies a query protein as secretory or non-secretory; if it is secretory, the process continues automatically with the second-layer prediction engine, which identifies the cleavage site of its signal peptide. The new predictor is called Signal-CF, where C stands for "coupling" and F for "fusion", meaning that Signal-CF is formed by incorporating the subsite coupling effects along a protein sequence and by fusing the results derived from many scaled windows of different widths through a voting system. Signal-CF combines high prediction success rates with short computational time, and hence is particularly useful for the analysis of large-scale datasets. Signal-CF is freely available as a web-server at http://chou.med.harvard.edu/bioinf/Signal-CF/ or http://202.120.37.186/bioinf/Signal-CF/.

14.
Prediction of the types of membrane proteins is of great importance both for genome-wide annotation and for experimental researchers seeking to understand proteins' functions. We describe a new strategy for the prediction of the types of membrane proteins using the Nearest Neighbor Algorithm. We introduced a bipartite feature space consisting of two kinds of disjoint vectors, proteins' domain profile and proteins' physicochemical characters. A jackknife cross-validation test shows that a combination of both features greatly improves the prediction accuracy. Furthermore, the contribution of the physicochemical features to the classification of membrane proteins has also been explored using the feature selection method called "mRMR" (Minimum Redundancy, Maximum Relevance) (IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27 (8), 1226-1238). A more compact set of the features that contribute most to membrane protein classification is obtained. The analyses highlighted both hydrophobicity and polarity as the most important features. The predictor with the 56 most contributive features achieves an acceptable prediction accuracy of 87.02%. Online prediction service is freely available on our Web site http://pcal.biosino.org/TransmembraneProteinClassification.html.
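The combination of a nearest neighbor classifier with a jackknife (leave-one-out) test can be sketched generically in Python as below. The Euclidean distance, the function names, and the toy two-dimensional vectors are assumptions for illustration; the published method uses its own feature space and distance definition.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_label(query, samples):
    """Label of the training sample closest to `query`.

    `samples` is a list of (feature_vector, label) pairs.
    """
    return min(samples, key=lambda s: euclidean(query, s[0]))[1]


def jackknife_accuracy(samples):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i, (vector, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(vector, rest) == label:
            correct += 1
    return correct / len(samples)
```

Because every sample is held out exactly once, the jackknife estimate has no dependence on a random train/test split, which makes accuracies easier to compare across feature sets.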

15.
Because the number of post-translational modifications requiring standardized annotation in the PIR-International Protein Sequence Database was large and steadily increasing, a database of protein structure modifications was constructed in 1993 to assist in producing appropriate feature annotations for covalent binding sites, modified sites and cross-links. In 1995 RESID was publicly released as a PIR-International text database distributed on CD-ROM and accessible through the ATLAS program. In 1998 it was made available on the PIR Web site at http://www-nbrf.georgetown.edu/pir/searchdb++ +.html. The RESID Database includes such information as: systematic and frequently observed alternate names; Chemical Abstracts Service registry numbers; atomic formulas and weights; enzyme activities; indicators for N-terminal, C-terminal or peptide chain cross-link modifications; keywords; and literature citations with database cross-references. The RESID Database can be used to predict atomic masses for peptides, and is being enhanced to provide molecular structures for graphical presentation on the PIR Web site using widely available molecular viewing programs.

16.
MOTIVATION: The success of the consensus approach to the protein structure prediction problem has led to development of several different consensus methods. Most of them only rely on a structural comparison of a number of different models. However, there are other types of information that might be useful such as the score from the server and structural evaluation. RESULTS: Pcons5 is a new and improved version of the consensus predictor Pcons. Pcons5 integrates information from three different sources: the consensus analysis, structural evaluation and the score from the fold recognition servers. We show that Pcons5 is better than the previous version of Pcons and that it performs better than using only the consensus analysis. In addition, we also present a version of Pmodeller based on Pcons5, which performs significantly better than Pcons5. AVAILABILITY: Pcons5 is the first Pcons version available as a standalone program from http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers.

17.
Peptide length-based prediction of peptide-MHC class II binding   (total citations: 2; self-citations: 0; citations by others: 2)
MOTIVATION: Algorithms for predicting peptide-MHC class II binding are typically similar, if not identical, to methods for predicting peptide-MHC class I binding despite known differences between the two scenarios. We investigate whether representing one of these differences, the greater range of peptide lengths binding MHC class II, improves the performance of these algorithms. RESULTS: A non-linear relationship between peptide length and peptide-MHC class II binding affinity was identified in the data available for several MHC class II alleles. Peptide length was incorporated into existing prediction algorithms using one of several modifications: using regression to pre-process the data, using peptide length as an additional variable within the algorithm, or representing register shifting in longer peptides. For several datasets and at least two algorithms these modifications consistently improved prediction accuracy. AVAILABILITY: http://malthus.micro.med.umich.edu/Bioinformatics

18.
Protein and peptide mass analysis and amino acid sequencing by mass spectrometry are widely used for identification and annotation of post-translational modifications (PTMs) in proteins. Modification-specific mass increments, neutral losses or diagnostic fragment ions in peptide mass spectra provide direct evidence for the presence of post-translational modifications, such as phosphorylation, acetylation, methylation or glycosylation. However, the commonly used database search engines are not always practical for exhaustive searches for multiple modifications and concomitant missed proteolytic cleavage sites in large-scale proteomic datasets, since the search space is dramatically expanded. We present a formal definition of the problem of searching databases with tandem mass spectra of peptides that are partially (sub-stoichiometrically) modified. In addition, an improved search algorithm and peptide scoring scheme that includes modification-specific ion information from MS/MS spectra was implemented and tested using the Virtual Expert Mass Spectrometrist (VEMS) software. A set of 2825 peptide MS/MS spectra was searched with 16 variable modifications and 6 missed cleavages. The scoring scheme returned a large set of post-translationally modified peptides including precise information on modification type and position. The scoring scheme was able to extract and distinguish the near-isobaric modifications of trimethylation and acetylation of lysine residues based on the presence and absence of diagnostic neutral losses and immonium ions. In addition, the VEMS software contains a range of new features for analysis of mass spectrometry data obtained in large-scale proteomic experiments. Windows binaries are available at http://www.yass.sdu.dk/.
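To see why trimethylation and acetylation are called near-isobaric, the Python sketch below computes both mass increments from monoisotopic element masses and expresses their difference in ppm of a peptide mass. The net formulas used (C2H2O for acetylation, C3H6 for trimethylation) are the standard ones; the function names and the 1500 Da example mass are assumptions for illustration and are not part of VEMS.

```python
# Monoisotopic masses of the elements involved, in Da.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}

# Net elemental change when the modification is added to a lysine residue.
ACETYLATION = {"C": 2, "H": 2, "O": 1}
TRIMETHYLATION = {"C": 3, "H": 6}


def mass_increment(formula):
    """Monoisotopic mass of an elemental composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_needed_to_resolve(peptide_mass):
    """Mass accuracy (ppm) at which the two increments differ for this peptide."""
    difference = abs(mass_increment(TRIMETHYLATION) - mass_increment(ACETYLATION))
    return difference / peptide_mass * 1e6
```

The two increments differ by only about 0.036 Da, so at modest mass accuracy the precursor mass alone cannot separate them, which is consistent with the abstract's reliance on diagnostic neutral losses and immonium ions.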

19.
Learning MHC I--peptide binding   (total citations: 1; self-citations: 0; citations by others: 1)
MOTIVATION AND RESULTS: Motivated by the ability of a simple threading approach to predict MHC I--peptide binding, we developed a new and improved structure-based model for which parameters can be estimated from additional sources of data about MHC-peptide binding. In addition to the known 3D structures of a small number of MHC-peptide complexes that were used in the original threading approach, we included three other sources of information on peptide-MHC binding: (1) MHC class I sequences; (2) known binding energies for a large number of MHC-peptide complexes; and (3) an even larger binary dataset that contains information about strong binders (epitopes) and non-binders (peptides that have a low affinity for a particular MHC molecule). Our model significantly outperforms the standard threading approach in binding energy prediction. In our approach, which we call adaptive double threading, the parameters of the threading model are learnable, and both MHC and peptide sequences can be threaded onto structures of other alleles. These two properties make our model appropriate for predicting binding for alleles for which very little data (if any) is available beyond just their sequence, including prediction for alleles for which 3D structures are not available. The ability of our model to generalize beyond the MHC types for which training data is available also separates our approach from epitope prediction methods which treat MHC alleles as symbolic types, rather than biological sequences. We used the trained binding energy predictor to study viral infections in 246 HIV patients from the West Australian cohort, and over 1000 sequences in HIV clade B from Los Alamos National Laboratory database, capturing the course of HIV evolution over the last 20 years. Finally, we illustrate short-, medium-, and long-term adaptation of HIV to the human immune system. AVAILABILITY: http://www.research.microsoft.com/~jojic/hlaBinding.html.

20.
Post-translational modifications (PTMs) occur on almost all proteins analyzed to date. The function of a modified protein is often strongly affected by these modifications and therefore increased knowledge about the potential PTMs of a target protein may increase our understanding of the molecular processes in which it takes part. High-throughput methods for the identification of PTMs are being developed, in particular within the fields of proteomics and mass spectrometry. However, these methods are still in their early stages, and it is indeed advantageous to cut down on the number of experimental steps by integrating computational approaches into the validation procedures. Many advanced methods for the prediction of PTMs exist and many are made publicly available. We describe our experiences with the development of prediction methods for phosphorylation and glycosylation sites and the development of PTM-specific databases. In addition, we discuss novel ideas for PTM visualization (exemplified by kinase landscapes) and improvements for prediction specificity (by using evolutionary stable sites, ESS). As an example, we present a new method for kinase-specific prediction of phosphorylation sites, NetPhosK, which extends our earlier and more general tool, NetPhos. The new server, NetPhosK, is made publicly available at the URL http://www.cbs.dtu.dk/services/NetPhosK/. The issues of underestimation, over-prediction and strategies for improving prediction specificity are also discussed.
