Similar literature
 Found 20 similar articles (search time: 0 ms)
1.
Predicting the secondary structure of a protein from its amino acid sequence is an important step towards predicting its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we propose a novel method that uses the binomial distribution to optimize tetrapeptide structural words and the increment of diversity with quadratic discriminant to predict three-state protein secondary structure. A benchmark dataset of 2,640 proteins with less than 25% sequence identity was used to train and test the proposed method. The results indicate that an overall accuracy of 87.8% was achieved in ten-fold cross-validation. Moreover, the accuracy of the predicted secondary structures ranges from 84 to 89% at the residue level. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
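
As a rough illustration of binomial-based feature selection (a sketch under stated assumptions, not the authors' published procedure): if a tetrapeptide word occurs n times in total, k of them in a structural class with prior frequency q, the upper-tail binomial probability P(X >= k) indicates how likely that enrichment is by chance, and 1 - P can serve as a confidence score for ranking words. The function names, the class priors and the toy counts below are all hypothetical.

```python
from math import comb


def binomial_tail(k: int, n: int, q: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, q)."""
    return sum(comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))


def word_confidence(k: int, n: int, q: float) -> float:
    """Confidence that a word is enriched in a class: 1 - P(X >= k)."""
    return 1.0 - binomial_tail(k, n, q)


def rank_words(counts: dict, class_priors: dict) -> list:
    """Rank words by their best per-class confidence (highest first).

    counts maps word -> {class label -> occurrences};
    class_priors maps class label -> background frequency of that class.
    """
    ranked = []
    for word, per_class in counts.items():
        n = sum(per_class.values())
        best_class, best_conf = max(
            ((c, word_confidence(per_class.get(c, 0), n, q))
             for c, q in class_priors.items()),
            key=lambda item: item[1],
        )
        ranked.append((word, best_class, best_conf))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked


# Toy data (hypothetical): H = helix, E = strand, C = coil.
toy_counts = {
    "AAAA": {"H": 9, "E": 0, "C": 1},
    "GPGS": {"H": 1, "E": 1, "C": 8},
}
toy_priors = {"H": 0.35, "E": 0.25, "C": 0.40}
ranking = rank_words(toy_counts, toy_priors)
```

Keeping only the top-ranked words would then give a reduced feature set for a downstream classifier; the cut-off is a tuning choice that this sketch leaves open.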

2.
Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing α-helices and β-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in β-sheets and between the turns of α-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded β-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.
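
The Metropolis acceptance rule underlying such simulations can be sketched as follows. This is a minimal, generic illustration: the one-dimensional harmonic energy, the step size and the temperature are assumptions chosen for the example and have nothing to do with the force field described in the abstract.

```python
import math
import random


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def metropolis_sample(energy, x0: float, n_steps: int,
                      step_size: float, kT: float, seed: int = 0) -> list:
    """Sample a 1D coordinate from the Boltzmann distribution of `energy`."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        # Propose a symmetric random displacement.
        x_new = x + rng.uniform(-step_size, step_size)
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new
        # A rejected move repeats the current state in the chain.
        samples.append(x)
    return samples


def harmonic(x: float) -> float:
    """Toy energy function (hypothetical): E(x) = x^2 / 2."""
    return 0.5 * x * x


chain = metropolis_sample(harmonic, x0=0.0, n_steps=20000, step_size=1.0, kT=1.0)
mean_x = sum(chain) / len(chain)
```

In a real application the state would be a full set of atomic coordinates and the proposal would be a structural move, but the acceptance test stays the same.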

3.
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”.

4.
5.
6.

Background

The ever increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies.

Results

The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization.

Conclusions

The Vfold-based web server provides a user-friendly tool for the prediction of RNA structure and stability. The web server and the source code are freely accessible for public use at “http://rna.physics.missouri.edu”.

7.
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on the analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, the accuracy of protein secondary structure prediction based on multiple sequence alignment drops significantly when low-homology (twilight zone) sequences are considered. To this end, this paper addresses the problem of providing an alternative sequence representation that improves the ability to distinguish secondary structures for twilight zone sequences without using alignment. We consider a novel classification problem in which structural motifs, referred to as structural fragments (SFs), are defined as uniform strand, helix and coil fragments. Classification of SFs makes it possible to design novel sequence representations and to investigate which other factors and prediction algorithms may result in improved discrimination. Comprehensive experimental results show that a statistically significant improvement in classification accuracy can be achieved by (1) improving sequence representations and (2) removing possible noise on the terminal residues in the SFs. Combining these two approaches reduces the error rate on average by 15% compared to classification using the standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms. This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).

8.

Background  

The ability to access, search and analyse secondary structures of a large set of known RNA molecules is very important for deriving improved RNA energy models, for evaluating computational predictions of RNA secondary structures and for a better understanding of RNA folding. Currently there is no database that can easily provide these capabilities for almost all RNA molecules with known secondary structures.

9.

Background

It is well established that only a portion of residues that mediate protein-protein interactions (PPIs), the so-called hot spot, contributes the most to the total binding energy, and thus its identification is an important and relevant question that has clear applications in drug discovery and protein design. The experimental identification of hot spots is however a lengthy and costly process, and thus there is an interest in computational tools that can complement and guide experimental efforts.

Principal Findings

Here, we present Presaging Critical Residues in Protein interfaces-Web server (http://www.bioinsilico.org/PCRPi), a web server that implements a recently described and highly accurate computational tool designed to predict critical residues in protein interfaces: PCRPi. PCRPi depends on the integration of structural, energetic, and evolutionary-based measures by using Bayesian Networks (BNs).

Conclusions

PCRPi-W has been designed to provide easy and convenient access to the broad scientific community. Predictions are readily available for download or presented in a web page that includes, among other information, links to relevant files, sequence information, and a Jmol applet to visualize and analyze the predictions in the context of the protein structure.

10.
Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions further reducing the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 out of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets which often provide restraints for regions of defined secondary structure.

11.
Data Mining of Toxic Chemicals: Structure Patterns and QSAR   Total citations: 1, self-citations: 0, citations by others: 1
We take a two-step strategy to explore noncongeneric toxic chemicals from the RTECS database: screening of structure patterns and generation of a detailed relationship between structure and activity. An efficient similarity comparison is proposed to screen chemical patterns for further QSAR analysis. A CoMFA study is then carried out on one structure pattern as an example of the implementation, and the result shows that QSAR studies of structure patterns can provide an estimate of the activity as well as a detailed relationship between activity and structure. Judging from the performance of the overall procedure, such a stepwise scheme is feasible and effective for mining a database of toxic chemicals.

12.
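Objective: To predict the secondary structure and dominant B-cell epitopes of porcine myostatin (MSTN) lacking its signal peptide, laying a foundation for producing monoclonal antibodies against this protein, constructing phage antibody libraries, and developing epitope-based peptide vaccines and epitope-based nucleic acid vaccines targeting this gene. Methods: Based on the amino acid sequence of porcine myostatin without its signal peptide, seven parameters and methods were used to analyze and predict the secondary structure and antigenic epitopes: Garnier-Robson, Chou-Fasman, Karplus-Schulz, Kyte-Doolittle, Emini, Jameson-Wolf, and Wu's comprehensive prediction method. Results: The MSTN protein without its signal peptide contains multiple potential epitope sites; the dominant B-cell epitopes are likely located in or near residues 1-11, 41-55, 57-64, 62-90, 99-104, 138-144, 193-200, 202-212, and 235-243. These results provide a basis for further identifying and synthesizing peptide vaccines and epitope-based nucleic acid vaccines and for preparing antibodies against porcine MSTN protein, and they lay a foundation for studying the structure and function of MSTN.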
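
One of the listed methods, the Kyte-Doolittle hydropathy scale, is usually applied as a sliding-window average along the sequence, with strongly negative (hydrophilic) windows taken as candidates for surface exposure. The sketch below is illustrative only: the window length of 7 and the function name are assumptions, and it does not reproduce the combined multi-method analysis used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 7) -> list:
    """Return the mean hydropathy of every full window along the sequence.

    Element i covers residues i .. i + window - 1 (0-based).
    Sequences shorter than the window give an empty profile.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    if window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

For example, `hydropathy_profile(sequence)` on a one-letter-code protein sequence gives a profile whose minima could be compared against the epitope regions predicted by the other methods.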

13.
Kun Wang, Feng Gao, Renshan Zhu, Shaoqing Li, Yingguo Zhu. Plant Molecular Biology Reporter, 2011, 29(3): 739-744
Pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins that are particularly prevalent in terrestrial plants. Previous research has reported great difficulty in purifying soluble PPR proteins from Escherichia coli, which has hindered further study of their functions. In this paper, we report the use of the pMAL prokaryotic expression system to obtain soluble expression of a PPR protein, RF1A from rice (Oryza sativa L.). After purification, we identified RF1A by ESI-TOF-MS/MS. We also estimated its secondary structure using circular dichroism spectroscopy. These results support the bioinformatic prediction of a helical-hairpin model for PPR proteins.

14.
15.
From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI.

16.
Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are critical to research on protein misfolding diseases, such as Alzheimer’s and Parkinson’s, as well as biotechnological production of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which collects results from experimental studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 experimentally observed aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD has been supplemented with five different types of additional information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Experimentally validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ, etc. We have set up a web interface with different search and display options so that users can retrieve the data in multiple ways. CPAD is freely available at http://www.iitm.ac.in/bioinfo/CPAD/. The potential applications of CPAD have also been discussed.

17.
Staphylococcus aureus V8 protease has been reported to have a strict specificity for cleavage of the Glu-X bond in ammonium bicarbonate (pH 7.9). With myelin basic protein and one of its major peptic fragments (residues 89-169) as substrates, selective cleavage of Asp(32)-Thr(33), Asp(37)-Ser(38), and Glu(118)-Gly(119) bonds was observed, as well as the unusual cleavage of the Gly(127)-Gly(128) bond. The Asp-Glu and Glu-Asn bonds in the sequence of Gln-Asp-Glu-Asn-Pro(81-84) were resistant to V8 protease attack. The following peptides were identified as products of limited cleavage of basic protein by V8 protease: (1-32), (1-37), (33-169), (38-169), (33-118), (38-118), (33-127), (38-127), (119-169), and (128-169). Cleavage of the peptic peptide (89-169) yielded fragments (89-118), (89-127), (119-169), and (128-169). All peptides were identified by amino acid analysis, as well as NH2- and COOH-terminal analyses. Time course studies with basic protein showed that V8 protease initially attacked the bonds between Asp(32) and Thr(33) and Asp(37) and Ser(38). With peptide (89-169) the initial cleavage was between Glu(118) and Gly(119). Peptides (89-118) and (89-127) were encephalitogenic in the Lewis rat. The activity of these peptides in the rat confirms the presence of a minor encephalitogenic site in guinea pig basic protein. Peptide (89-127) was encephalitogenic in the guinea pig, as expected, because it contains the intact encephalitogenic site. V8 protease digestion of basic protein yields some interesting new fragments, not previously available for biologic studies.
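
The nominal cleavage rule (cutting on the C-terminal side of Glu, optionally also Asp) can be sketched as a simple in-silico digest. This is a deliberately naive model with hypothetical function names and a made-up toy sequence: it cuts at every matching residue, so it captures neither the resistant Asp-Glu and Glu-Asn bonds nor the unusual Gly-Gly cleavage reported above.

```python
def cleavage_sites(seq: str, residues: str = "E") -> list:
    """Return 1-based positions i where the bond between residue i and i+1 is cut.

    A matching residue at the C-terminus is ignored because no bond follows it.
    """
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in residues]


def digest(seq: str, residues: str = "E") -> list:
    """Split the sequence at every predicted cleavage site (complete digestion)."""
    fragments, start = [], 0
    for pos in cleavage_sites(seq, residues):
        fragments.append(seq[start:pos])
        start = pos
    fragments.append(seq[start:])
    return fragments


toy_peptide = "AKEGLDTSEW"  # hypothetical sequence, not myelin basic protein
glu_only = digest(toy_peptide, "E")
glu_and_asp = digest(toy_peptide, "DE")
```

Comparing such a predicted fragment list with the fragments actually observed is one way to spot resistant bonds and unexpected cleavages like those described in the abstract.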

18.
Myelin basic protein (MBP) from the Whaler shark (Carcharhinus obscurus) has been purified from acid extracts of a chloroform/methanol pellet from whole brains. The amino acid sequence of the majority of the protein has been determined and compared with the sequences of other MBPs. The shark protein has only 44% homology with the bovine protein, but, in common with other MBPs, it has basic residues distributed throughout the sequence and no extensive segments that are predicted to have an ordered secondary structure in solution. Shark MBP lacks the triproline sequence previously postulated to form a hairpin bend in the molecule. The region containing the putative consensus sequence for encephalitogenicity in the guinea pig contains several substitutions, thus accounting for the lack of activity of the shark protein. Studies of the secondary structure and self-association have shown that shark MBP possesses solution properties similar to those of the bovine protein, despite the extensive differences in primary structure.

19.
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
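
The monotonicity constraint at the heart of isotonic regression is easiest to see in one dimension, where the pool-adjacent-violators algorithm gives the least-squares non-decreasing fit. The sketch below is that one-dimensional simplification, not POOL itself, which works with a partial order over several features; the function name and toy labels are hypothetical.

```python
def isotonic_fit(y: list) -> list:
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    `y` holds responses already sorted by the predictor (for example,
    0/1 active-site labels ordered by increasing feature value).
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


toy_labels = [0, 1, 0, 1, 1]  # hypothetical labels ordered by feature value
toy_probabilities = isotonic_fit(toy_labels)
```

With 0/1 labels, the fitted values can be read as probability estimates that never decrease as the feature increases, which is the property used to rank residues.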

20.
Abstract

COSY, TOCSY and NOESY experiments have been used to sequentially assign the 500 MHz 1H NMR spectra of the Hydrophobic Protein of Soybean (HPS). Spin system identification combined with sequential assignment allowed the proton resonances of this 80-residue protein to be identified. Analysis of medium-range connectivities showed that its secondary structure involves four helical fragments located similarly to those in the structure deduced from X-ray diffraction. This work sets the basis for a finer comparison between the crystal and solution structures and for a dynamical study of HPS in solution. In addition, a search for secondary structure similarities showed that the global fold of HPS should be rather similar to that found for nonspecific Lipid Transfer Proteins (ns-LTP) of plant origin. The distributions of the helical fragments along the primary sequences of these two classes of proteins were compared.
