Similar literature
Found 20 similar articles (search time: 15 ms)
1.
Short peptides connecting α-helices and β-strands have been analyzed in 240 proteins refined at resolutions of 0.25 nm or better. Connecting peptides of lengths between one and five residues have been classified as part of supersecondary motifs of four types: αα, αβ, βα, and ββ. Careful consideration has been given to the definition of secondary structures on the basis of hydrogen bonds and main-chain conformational angles. Using five classes of residue conformation (a, b, e, l, t) in the nonregular structure regions of φ, ψ space, 34 classes of supersecondary motifs occurring at least five times have been identified. Among these 34 classes, 11 classes that occur more than 25 times are commonly occurring supersecondary structure motifs. The patterns and conformations of the 11 commonly occurring supersecondary structure motifs have been characterized, demonstrating that patterns and conformations adopted by supersecondary structure motifs are limited. The results have relevance to structure prediction, comparative modeling, and protein folding.

2.
Prediction of the β-Hairpins in Proteins Using Support Vector Machine   Cited: 1 (self-citations: 0, citations by others: 1)
Hu XZ  Li QZ 《The protein journal》2008,27(2):115-122
Using a composite vector that combines the increment of diversity and a scoring function to express sequence information, a support vector machine (SVM) algorithm for predicting β-hairpin motifs is proposed. The prediction is done on a dataset of 3,088 non-homologous proteins containing 6,027 β-hairpins. The overall prediction accuracy and Matthews correlation coefficient are 79.9% and 0.59 for the independent testing dataset. In addition, a higher accuracy of 83.3% and a Matthews correlation coefficient of 0.67 are obtained on the independent testing dataset of a dataset previously used by Kumar et al. (Nucleic Acids Res 33:154–159). The performance of the method is also evaluated by predicting the β-hairpins in the CASP6 proteins, and better results are obtained. Moreover, this method is used to predict four kinds of supersecondary structures. The overall accuracy of prediction is 64.5% for the independent testing dataset.
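The two figures of merit quoted in this abstract, overall accuracy and the Matthews correlation coefficient, follow directly from confusion-matrix counts. The sketch below uses the standard definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(accuracy(40, 40, 10, 10))
    print(mcc(40, 40, 10, 10))
```

Unlike accuracy, the coefficient stays informative when hairpin and non-hairpin examples are unbalanced, which is presumably why abstracts in this list report both.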

3.
Helix capping.   Cited: 12 (self-citations: 7, citations by others: 5)
Helix-capping motifs are specific patterns of hydrogen bonding and hydrophobic interactions found at or near the ends of helices in both proteins and peptides. In an alpha-helix, the first four >N-H groups and last four >C=O groups necessarily lack intrahelical hydrogen bonds. Instead, such groups are often capped by alternative hydrogen bond partners. This review enlarges our earlier hypothesis (Presta LG, Rose GD. 1988. Helix signals in proteins. Science 240:1632-1641) to include hydrophobic capping. A hydrophobic interaction that straddles the helix terminus is always associated with hydrogen-bonded capping. From a global survey among proteins of known structure, seven distinct capping motifs are identified: three at the helix N-terminus and four at the C-terminus. The consensus sequence patterns of these seven motifs, together with results from simple molecular modeling, are used to formulate useful rules of thumb for helix termination. Finally, we examine the role of helix capping as a bridge linking the conformation of secondary structure to supersecondary structure.

4.
Background: HLA-DQ alleles are involved in the pathogenesis of hypersensitivity reactions, with HLA-DQ8 associated with several human autoimmune disorders. Limited success has been achieved using sequence-based computational techniques for predicting HLA-DQ8-restricted T cell epitopes, while the accuracy and efficiency of recently developed structure-based models need to be improved. Results: We describe a combined structure-based prediction approach for DQ8-restricted T cell epitope prediction using a recently developed fast and accurate docking protocol, pDOCK, and molecular surface electrostatic potential (MSEP)-based clustering of pMHC binding interfaces. The prediction model was rigorously trained, tested and validated using experimentally verified DQ8 binding and non-binding peptides. High prediction accuracy (average area under the ROC curve, average AROC>0.94) is validated against experimental data. Our model also predicts all binding registers correctly and known T cell activators with 77% accuracy. We also studied the patterns of DQ8-binding peptides and confirm the existence of epitopes not conforming to binding motifs. Conclusions: We have developed a model that can be successfully applied as a generic protocol for easy in silico identification of potential immunogenic T cell epitopes. The current model is therefore applicable for screening vaccine candidates irrespective of sequence motifs. We have also illustrated efficient discrimination of different categories of binders from non-binders as well as different categories of pMHC agonists from non-agonists, while accurately predicting the binding registers of DQ8-restricted peptides. This combined approach provides a set of sensitive and specific computational tools to facilitate high-throughput screening of peptides for immunotherapeutic applications such as controlling allergic and autoimmune responses.

5.
MOTIVATION: Functional annotation of unknown proteins is a major goal in proteomics. A key annotation is the prediction of a protein's subcellular localization. Numerous prediction techniques have been developed, typically focusing on a single underlying biological aspect or predicting a subset of all possible localizations. An important step is taken towards emulating the protein sorting process by capturing and bringing together biologically relevant information, and addressing the clear need to improve prediction accuracy and localization coverage. RESULTS: Here we present a novel SVM-based approach for predicting subcellular localization, which integrates N-terminal targeting sequences, amino acid composition and protein sequence motifs. We show how this approach improves the prediction based on N-terminal targeting sequences, by comparing our method TargetLoc against existing methods. Furthermore, MultiLoc performs considerably better than comparable methods predicting all major eukaryotic subcellular localizations, and shows better or comparable results to methods that are specialized on fewer localizations or for one organism. AVAILABILITY: http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc/

6.
We propose a knowledge-based approach to the prediction of protein structures in cases where there is no sequence-homology to proteins with known spatial structure. Using methods from Artificial Intelligence we attempt to take into account long-range interactions within the prediction process. This allows not only the assignment of secondary but also of supersecondary structure elements. In particular, the patterns used as conditions of prediction rules are generated by learning methods from information contained in the Protein Data Base. Patterns on higher levels of the protein structure hierarchy are used as constraints to reduce the combinatorial search space. These patterns may also be used to search for specified structure motifs by interactive retrieval.

7.
A new approach to estimating the quantal release distribution of transmitter under conditions of high synaptic activity is presented. Postsynaptic responses of the neuromuscular excitatory synapse in the opener muscle of the lobster claw, obtained by focal extracellular recording, are used as the original data set. Based on two data groups (values of evoked and spontaneous postsynaptic responses), a linear regression model is constructed. The parameters of this model completely describe the quantal release distribution. To evaluate the parameters, biased modifications of the least squares method (the penalized least squares method and the principal components method) were applied. As a result, it was possible to obtain estimates of the quantal release distribution with sufficiently low standard errors. Modeling studies have shown that the gain in estimation accuracy due to the decrease in standard error considerably exceeds the losses caused by the bias.

8.
Protein backbones have characteristic secondary structures, including α-helices and β-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of α-helix caps, we test whether the information content of the sequence–structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of ±1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
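The abstract compares motifs by their total information content without saying how that quantity is computed. One common choice, assumed here, is the relative entropy of each motif column against a background distribution, summed over columns. The sketch below uses a uniform background over the 20 amino acids and no pseudocounts; both are simplifying assumptions, and `information_content` is a hypothetical helper name.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(sites, background=None):
    """Total information content (bits) of equal-length aligned motif sites.

    Each column contributes sum_a p(a) * log2(p(a) / q(a)), where p is the
    observed residue frequency and q the background frequency.
    """
    if not sites:
        return 0.0
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    if background is None:
        # Assumption: uniform background over the 20 standard residues.
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    total = 0.0
    for pos in range(width):
        counts = Counter(s[pos] for s in sites)
        for residue, count in counts.items():
            p = count / len(sites)
            total += p * math.log2(p / background[residue])
    return total
```

Under these assumptions a fully conserved column contributes log2(20), about 4.32 bits, so allowing a ±1 frameshift can raise the total only if realigning the sites makes the columns more conserved.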

9.
Successful prediction of the beta-hairpin motif will be helpful for understanding fold recognition. Some algorithms have been proposed for the prediction of beta-hairpin motifs. However, the parameters used by these methods were primarily based on amino acid sequences. Here, we propose a novel model for predicting beta-hairpin structure based on chemical shifts. First, we analyzed the statistical distribution of the chemical shifts of six nuclei in non-beta-hairpin and beta-hairpin motifs. Second, we used these chemical shifts as features, combined with three algorithms, to predict beta-hairpin structure. Finally, we achieved the best prediction, namely a sensitivity of 92% and a specificity of 94% with a Matthews correlation coefficient of 0.85, using the quadratic discriminant analysis algorithm, which is clearly superior to the same method applied to the 20 amino acid compositions in three-fold cross-validation. Our findings show that the chemical shift is an effective parameter for beta-hairpin prediction and suggest that quadratic discriminant analysis is a powerful algorithm for the prediction of beta-hairpins.

10.
Prediction of β-turns from amino acid sequences has long been recognized as an important problem in structural bioinformatics due to their frequent occurrence as well as their structural and functional significance. Because various structural features of proteins are intercorrelated, secondary structure information has often been employed as an additional input for machine learning algorithms when predicting β-turns. Here we present a novel bidirectional Elman-type recurrent neural network with multiple output layers (MOLEBRNN) capable of predicting multiple mutually dependent structural motifs and demonstrate its efficiency in recognizing three aspects of protein structure: β-turns, β-turn types, and secondary structure. The advantage of our method compared to other predictors is that it does not require any external input except for sequence profiles because interdependencies between different structural features are taken into account implicitly during the learning process. In a sevenfold cross-validation experiment on a standard test dataset our method exhibits a total prediction accuracy of 77.9% and a Matthews correlation coefficient of 0.45, the highest performance reported so far. It also outperforms other known methods in delineating individual turn types. We demonstrate how simultaneous prediction of multiple targets influences prediction performance on single targets. The MOLEBRNN presented here is a generic method applicable in a variety of research fields where multiple mutually dependent target classes need to be predicted. Availability: http://webclu.bio.wzw.tum.de/predator-web/.

11.
It is well established that protein structures are more conserved than protein sequences. One-third of all known protein structures can be classified into ten protein folds, which themselves are composed mainly of alpha-helical hairpin, beta hairpin, and beta-alpha-beta supersecondary structural elements. In this study, we explore the ability of a recent Monte Carlo-based procedure to generate the 3D structures of eight polypeptides that correspond to units of supersecondary structure and three-stranded antiparallel beta sheet. Starting from extended or misfolded compact conformations, all Monte Carlo simulations show significant success in predicting the native topology using a simplified chain representation and an energy model optimized on other structures. Preliminary results on model peptides from nucleotide binding proteins suggest that this simple protein folding model can help clarify the relation between sequence and topology.

12.
Tantoso E  Li KB 《Amino acids》2008,35(2):345-353
Identifying a protein's subcellular localization is an important step in understanding its function. However, the experimental work involved is usually laborious, time-consuming and costly. Computational prediction therefore becomes valuable for reducing this inefficiency. Here we provide a method to predict protein subcellular localization using amino acid composition and physicochemical properties. The method concatenates the information extracted from a protein's N-terminal, middle and full sequence. Each part is represented by amino acid composition, weighted amino acid composition, five-level grouping composition and five-level dipeptide composition. We divided our dataset into training and testing sets. The training set is used to determine the best-performing amino acid index by five-fold cross-validation, whereas the testing set acts as the independent dataset to evaluate the performance of our model. With the novel representation method, we achieve an accuracy of approximately 75% on the independent dataset. We conclude that this new representation indeed performs well and is able to extract the protein sequence information. We have developed a web server for predicting protein subcellular localization. The web server is available at http://aaindexloc.bii.a-star.edu.sg.
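As a rough illustration of the representation described in this abstract, the sketch below concatenates plain amino acid composition vectors for the N-terminal segment, the remainder, and the full sequence. The segment length `n_term_len`, the definition of the "middle" part as everything after the N-terminal segment, and the function names are all assumptions; the weighted, five-level grouping and dipeptide compositions used by the authors are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """20-dimensional amino acid composition (fractions summing to 1)."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def segment_features(seq, n_term_len=30):
    """Concatenate compositions of the N-terminal, middle and full sequence.

    n_term_len is a hypothetical cut-off, not a value from the paper.
    """
    seq = seq.upper()
    n_term = seq[:n_term_len]
    middle = seq[n_term_len:]  # assumption: everything after the N-terminus
    return composition(n_term) + composition(middle) + composition(seq)
```

The resulting 60-dimensional vector could then be passed to any standard classifier; treating the N-terminal segment separately reflects the fact that targeting signals are often located there.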

13.
Genomic information is becoming available for an ever-wider range of animals with the genes for several well-characterized peptide families, such as the RFamides, detected in a surprisingly diverse set of these animals. While bioinformatic tools allow the prediction of the RFamide-related prohormones from genetic information, it is more difficult to accurately predict the final processed peptides because of the large number of processing steps required to convert a prohormone into mature bioactive peptides. Several statistical-based methods for predicting basic site cleavages in prohormones are described, and their ability to predict the basic site cleavages in a variety of RFamide-related peptides from vertebrates and invertebrates is reported. Specifically, the cleavages in the invertebrate FMRFamides, and the vertebrate NPFFa, RFRPa, and PrRPa peptide families are modeled. The three models compared here are based on known cleavage motifs, a logistic regression, and artificial neural networks. Improvements in the accuracy and precision of the cleavage estimates will lead to increased utilization of these models for predicting bioactive neuropeptides before experimental verification is available.

14.
Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair–level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.

15.
Identification and characterization of recurrent supersecondary structural elements is central to understanding the rules governing protein tertiary structure. Here, we describe the GD box, a widespread noncontiguous supersecondary element, which we initially found in a group of topologically distinct but homologous β-barrels, the cradle-loop barrels. The GD box is similar both in sequence and structure and comprises two short unpaired β-strands connected by an orthogonal type-II β-turn and a noncontiguous β-strand forming hydrogen bonds with the β-turn. Using structure-based analysis, we have detected 518 instances of the GD box in a nonredundant subset of the SCOP database comprising 3771 domains. Apart from the cradle-loop barrels, this motif is also found in a diverse set of nonhomologous folds including other topologically related β-barrels. Since nonlocal interactions are fundamental in the formation of protein structure, systematic identification and characterization of other noncontiguous supersecondary structural elements is likely to prove valuable to protein structure modeling, validation, and prediction.

16.
Skrabanek L  Niv MY 《Proteins》2008,72(4):1138-1147
Sequence signature databases such as PROSITE, which include protein pattern motifs indicative of a protein's function, are widely used for function prediction studies, cellular localization annotation, and sequence classification. Correct annotation relies on high precision of the motifs. We present a new and general approach for increasing the precision of established protein pattern motifs by including secondary structure constraints (SSCs). We use Scan2S, the first sequence motif-scanning program to optionally include SSCs, to augment PROSITE pattern motifs. The constraints were derived from either the DSSP secondary structure assignment or the PSIPRED predictions for PROSITE-documented true positive hits. The secondary structure-augmented motifs were scanned against all SwissProt sequences, for which secondary structure predictions were precalculated. Against this dataset, motifs with PSIPRED-derived SSCs exhibited improved performance over motifs with DSSP-derived constraints. The precision of 763 of the 782 PSIPRED-augmented motifs remained unchanged or increased compared to the original motifs; 26 motifs showed an absolute precision increase of 10-30%. We provide the complete set of augmented motifs and the Scan2S program at http://physiology.med.cornell.edu/go/scan2s. Our results suggest a general protocol for increasing the precision of protein pattern detection via the inclusion of SSCs.
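The idea of filtering pattern hits by secondary structure can be sketched in a few lines. The code below is not Scan2S and does not reproduce its constraint format: `scan_with_ssc` and the use of a regular expression over a per-residue H/E/C string are hypothetical choices made for illustration. The pattern conversion covers the common PROSITE syntax (`x`, `[...]`, `{...}`, `(n)`, `(n,m)`, `<`, `>`) but not rarer forms such as anchors inside brackets.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern to a Python regular expression."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        # {ABC} means "any residue except A, B, C".
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 of any residue".
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def scan_with_ssc(seq, ss, pattern, ss_constraint):
    """Return (start, match) for motif hits whose secondary structure
    substring fully matches ss_constraint (a regex over H/E/C codes).

    Only non-overlapping hits are reported, as with re.finditer.
    """
    if len(seq) != len(ss):
        raise ValueError("sequence and secondary structure lengths differ")
    motif = re.compile(prosite_to_regex(pattern))
    ssc = re.compile(ss_constraint)
    hits = []
    for m in motif.finditer(seq):
        if ssc.fullmatch(ss[m.start():m.end()]):
            hits.append((m.start(), m.group()))
    return hits


if __name__ == "__main__":
    seq = "MKNGSAHHNATLL"           # made-up sequence
    ss = "CCCCCCHHHHHHH"            # made-up secondary structure string
    # N-glycosylation pattern; keep only hits lying entirely in coil.
    print(scan_with_ssc(seq, ss, "N-{P}-[ST]-{P}.", "C+"))
```

In this toy input the unconstrained pattern matches twice, and the coil-only constraint discards the hit inside the helix, which is the kind of precision gain the abstract describes.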

17.
Accurate prediction of tumor progression is key for adaptive therapy and precision medicine. Cancer progression models (CPMs) can be used to infer dependencies in mutation accumulation from cross-sectional data and provide predictions of tumor progression paths. However, their performance when predicting complete evolutionary trajectories is limited by violations of assumptions and the size of available data sets. Instead of predicting full tumor progression paths, here we focus on short-term predictions, more relevant for diagnostic and therapeutic purposes. We examine whether five distinct CPMs can be used to answer the question “Given that a genotype with n mutations has been observed, what genotype with n + 1 mutations is next in the path of tumor progression?” or, in short, “What genotype comes next?”. Using simulated data we find that under specific combinations of genotype and fitness landscape characteristics CPMs can provide predictions of short-term evolution that closely match the true probabilities, and that some genotype characteristics can be much more relevant than global features. Application of these methods to 25 cancer data sets shows that their use is hampered by a lack of information needed to make principled decisions about method choice. Fruitful use of these methods for short-term predictions requires adapting each method's use to local genotype characteristics and obtaining reliable indicators of performance; it will also be necessary to clarify the interpretation of a method's results when key assumptions do not hold.

18.
Bordner AJ  Abagyan R 《Proteins》2006,63(3):512-526
Since determining the crystallographic structure of all peptide-MHC complexes is infeasible, an accurate prediction of the conformation is a critical computational problem. These models can be useful for determining binding energetics, predicting the structures of specific ternary complexes with T-cell receptors, and designing new molecules interacting with these complexes. The main difficulties are (1) adequate sampling of the large number of conformational degrees of freedom for the flexible peptide, (2) predicting subtle changes in the MHC interface geometry upon binding, and (3) building models for numerous MHC allotypes without known structures. Whereas previous studies have approached the sampling problem by dividing the conformational variables into different sets and predicting them separately, we have refined the Biased-Probability Monte Carlo docking protocol in internal coordinates to optimize a physical energy function for all peptide variables simultaneously. We also imitated the induced fit by docking into a more permissive smooth grid representation of the MHC followed by refinement and reranking using an all-atom MHC model. Our method was tested by a comparison of the results of cross-docking 14 peptides into HLA-A*0201 and 9 peptides into H-2K(b) as well as docking peptides into homology models for five different HLA allotypes with a comprehensive set of experimental structures. The surprisingly accurate prediction (0.75 Å backbone RMSD) for cross-docking of a highly flexible decapeptide, dissimilar to the original bound peptide, as well as docking predictions using homology models for two allotypes with low average backbone RMSDs of less than 1.0 Å illustrate the method's effectiveness. Finally, energy terms calculated using the predicted structures were combined with supervised learning on a large data set to classify peptides as either HLA-A*0201 binders or nonbinders. In contrast with sequence-based prediction methods, this model was also able to predict the binding affinity for peptides to a different MHC allotype (H-2K(b)), not used for training, with comparable prediction accuracy.

19.
To describe the supersecondary structure (SSS) of beta sandwich-like proteins (SPs), we introduce a structural unit called the "strandon." A strandon is defined as a set of sequentially consecutive strands connected by hydrogen bonds in 3D structures. Representing beta-proteins as the assembly of strandons exposes the underlying similarities in their SSS and enables us to construct a novel classification scheme of SPs. Classification of all known SPs is based on shared supersecondary structural features and is presented in the SSS database (http://binfs.umdnj.edu/sssdb/). Analysis of the SSS reveals two common specific patterns. The first pattern defines the arrangement of strandons and was found in 95% of all examined SPs. The second pattern establishes the ordering of strands in the protein domain and was observed in 82% of the analyzed SPs. Knowledge of these two patterns that uncover the spatial arrangement of strands will likely prove useful in protein structure prediction.
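The strandon definition given here can be read as a simple grouping rule, sketched below under one possible interpretation: strands are numbered in sequence order, and a strandon is a maximal run of consecutive strands in which each strand is hydrogen-bonded to the next. This reading, the function name `strandons`, and the input encoding (pairs of bonded strand indices) are assumptions; the authors' actual procedure may differ.

```python
def strandons(n_strands, bonded_pairs):
    """Group strands 0..n_strands-1 into runs of sequence-consecutive,
    hydrogen-bonded strands.

    bonded_pairs: iterable of (i, j) strand index pairs that share
    backbone hydrogen bonds. Bonds between non-consecutive strands are
    ignored when forming runs.
    """
    if n_strands == 0:
        return []
    bonded = {frozenset(pair) for pair in bonded_pairs}
    groups = []
    current = [0]
    for i in range(1, n_strands):
        if frozenset((i - 1, i)) in bonded:
            current.append(i)      # strand i extends the current run
        else:
            groups.append(current)  # run ends; strand i starts a new one
            current = [i]
    groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical five-strand domain: 0-1-2 form a run, 3-4 form another,
    # and strands 0 and 4 pair across the two runs.
    print(strandons(5, [(0, 1), (1, 2), (3, 4), (0, 4)]))
```

Under this reading the pairing between strands 0 and 4 does not merge the two groups, because the strands are not consecutive in sequence.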

20.
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrence of an exhaustively classified library of supersecondary structural elements (Smotifs) in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the 10% most frequent ones have a higher fraction of internal contacts, while some of the rarest motifs are larger and contain a longer loop region.
