1.
Sen TZ Cheng H Kloczkowski A Jernigan RL 《Protein science : a publication of the Protein Society》2006,15(11):2499-2506
The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q(3) ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
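As a rough illustration of the dispatch rule described in this abstract, the sketch below routes each fragment to one of two predictors according to its sequence identity and scores the result with a three-state accuracy. The names `cdm_predict`, `fdm_predict`, `gor_predict`, and `q3`, the fragment representation, and the assumption that fragments do not overlap are hypothetical and are not taken from the authors' software; only the 50% threshold and the Q(3) measure come from the abstract.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical fragment representation: (fragment sequence, highest percent
# identity to any aligned PDB fragment). Fragments are assumed not to overlap.
Fragment = Tuple[str, float]
Predictor = Callable[[str], str]


def cdm_predict(fragments: Iterable[Fragment],
                fdm_predict: Predictor,
                gor_predict: Predictor,
                identity_threshold: float = 50.0) -> str:
    """Concatenate per-fragment predictions chosen by sequence identity."""
    parts = []
    for sequence, identity in fragments:
        # Above the threshold, use the fragment-based predictor;
        # otherwise fall back to the GOR-style predictor.
        if identity > identity_threshold:
            parts.append(fdm_predict(sequence))
        else:
            parts.append(gor_predict(sequence))
    return "".join(parts)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Stand-in predictors for demonstration only; real ones would be
    # FDM-like and GOR-like models.
    fdm = lambda s: "H" * len(s)
    gor = lambda s: "C" * len(s)
    pred = cdm_predict([("ACDE", 80.0), ("FGH", 30.0)], fdm, gor)
    print(pred, q3(pred, "HHHECCC"))
```

Passing the two predictors in as callables keeps the threshold logic separate from either method, so the threshold can be varied on its own, as the abstract describes.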
2.
3.
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is currently about 80%, which is still far from satisfactory. In this study, we propose a novel method that uses the binomial distribution to optimize tetrapeptide structural words and the increment of diversity with quadratic discriminant to predict protein three-state secondary structure. A benchmark dataset of 2,640 proteins with sequence identity of less than 25% was used to train and test the proposed method. The results indicate that an overall accuracy of 87.8% was achieved in secondary structure prediction using ten-fold cross-validation. Moreover, the accuracy of predicted secondary structures ranges from 84% to 89% at the residue level. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words that affect the accuracy of predicted secondary structures.
4.
Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing α-helices and β-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in β-sheets and between the turns of α-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded β-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.
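The Metropolis Monte Carlo scheme named in this abstract can be illustrated with a generic acceptance rule. The sketch below is a minimal, textbook version on a toy one-dimensional energy, not the authors' force field or move set; `metropolis_accept`, `metropolis_sample`, the `energy` and `propose` callables, and the `kT` value are all assumptions made for the example.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

State = TypeVar("State")


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Standard Metropolis criterion: always accept downhill moves,
    accept uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def metropolis_sample(energy: Callable[[State], float],
                      propose: Callable[[State, random.Random], State],
                      state: State,
                      steps: int,
                      kT: float,
                      seed: int = 0) -> Tuple[State, float]:
    """Run a fixed number of Metropolis steps and return the final state
    together with its energy."""
    rng = random.Random(seed)
    current_e = energy(state)
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, kT, rng):
            state, current_e = candidate, candidate_e
    return state, current_e


if __name__ == "__main__":
    # Toy example: a particle in a harmonic well, E(x) = x^2.
    final_x, final_e = metropolis_sample(
        energy=lambda x: x * x,
        propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
        state=5.0,
        steps=2000,
        kT=0.01,
    )
    print(final_x, final_e)
```

In a protein setting, `state` would be a conformation, `propose` a structural move, and `energy` the empirical potential; those parts are deliberately left abstract here.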
5.
Asparagine residues in proteins undergo spontaneous deamidation, a post-translational modification that may act as a molecular clock for the regulation of protein function and turnover. Asparagine deamidation is modulated by protein local sequence, secondary structure and hydrogen bonding. We present NGOME, an algorithm able to predict non-enzymatic deamidation of internal asparagine residues in proteins in the absence of structural data, using sequence-based predictions of secondary structure and intrinsic disorder. Compared to previous algorithms, NGOME does not require three-dimensional structures yet yields better predictions than available sequence-only methods. Four case studies of specific proteins show how NGOME may help the user identify deamidation-prone asparagine residues, often related to protein gain of function, protein degradation or protein misfolding in pathological processes. A fifth case study applies NGOME at a proteomic scale and unveils a correlation between asparagine deamidation and protein degradation in yeast. NGOME is freely available as a webserver at the National EMBnet node Argentina, URL: http://www.embnet.qb.fcen.uba.ar/ in the subpage “Protein and nucleic acid structure and sequence analysis”.
6.
《Journal of molecular biology》1996,263(2):196-208
Previously proposed methods for protein secondary structure prediction from multiple sequence alignments do not efficiently extract the evolutionary information that these alignments contain. The predictions of these methods are less accurate than they could be, because of their failure to consider explicitly the phylogenetic tree that relates aligned protein sequences. As an alternative, we present a hidden Markov model approach to secondary structure prediction that more fully uses the evolutionary information contained in protein sequence alignments. A representative example is presented, and three experiments are performed that illustrate how the appropriate representation of evolutionary relatedness can improve inferences. We explain why similar improvement can be expected in other secondary structure prediction methods and indeed any comparative sequence analysis method.
7.
8.
9.
Background
The ability to access, search and analyse secondary structures of a large set of known RNA molecules is very important for deriving improved RNA energy models, for evaluating computational predictions of RNA secondary structures and for a better understanding of RNA folding. Currently there is no database that can easily provide these capabilities for almost all RNA molecules with known secondary structures.
10.
Background
The ever-increasing discovery of non-coding RNAs leads to unprecedented demand for the accurate modeling of RNA folding, including the predictions of two-dimensional (base pair) and three-dimensional all-atom structures and folding stabilities. Accurate modeling of RNA structure and stability has far-reaching impact on our understanding of RNA functions in human health and our ability to design RNA-based therapeutic strategies.
Results
The Vfold server offers a web interface to predict (a) RNA two-dimensional structure from the nucleotide sequence, (b) three-dimensional structure from the two-dimensional structure and the sequence, and (c) folding thermodynamics (heat capacity melting curve) from the sequence. To predict the two-dimensional structure (base pairs), the server generates an ensemble of structures, including loop structures with the different intra-loop mismatches, and evaluates the free energies using the experimental parameters for the base stacks and the loop entropy parameters given by a coarse-grained RNA folding model (the Vfold model) for the loops. To predict the three-dimensional structure, the server assembles the motif scaffolds using structure templates extracted from the known PDB structures and refines the structure using all-atom energy minimization.
Conclusions
The Vfold-based web server provides a user-friendly tool for the prediction of RNA structure and stability. The web server and the source codes are freely accessible for public use at “http://rna.physics.missouri.edu”.
11.
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on the analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, the accuracy of protein secondary structure prediction based on multiple sequence alignment drops significantly when low-homology (twilight zone) sequences are considered. To this end, this paper addresses the problem of providing an alternative sequence representation that would improve the ability to distinguish secondary structures for twilight zone sequences without using alignment. We consider a novel classification problem in which structural motifs, referred to as structural fragments (SFs), are defined as uniform strand, helix and coil fragments. Classification of SFs makes it possible to design novel sequence representations and to investigate which other factors and prediction algorithms may result in improved discrimination. Comprehensive experimental results show that a statistically significant improvement in classification accuracy can be achieved by: (1) improving sequence representations, and (2) removing possible noise on the terminal residues in the SFs. Combining these two approaches reduces the error rate on average by 15% when compared to classification using the standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms. This research was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
12.
Background
It is well established that only a portion of residues that mediate protein-protein interactions (PPIs), the so-called hot spot, contributes the most to the total binding energy, and thus its identification is an important and relevant question that has clear applications in drug discovery and protein design. The experimental identification of hot spots is however a lengthy and costly process, and thus there is an interest in computational tools that can complement and guide experimental efforts.
Principal Findings
Here, we present Presaging Critical Residues in Protein interfaces-Web server (http://www.bioinsilico.org/PCRPi), a web server that implements a recently described and highly accurate computational tool designed to predict critical residues in protein interfaces: PCRPi. PCRPi depends on the integration of structural, energetic, and evolutionary-based measures by using Bayesian Networks (BNs).
Conclusions
PCRPi-W has been designed to provide easy and convenient access for the broad scientific community. Predictions are readily available for download or presented in a web page that includes, among other information, links to relevant files, sequence information, and a Jmol applet to visualize and analyze the predictions in the context of the protein structure.
13.
14.
Mert Karakaş Nils Woetzel Rene Staritzbichler Nathan Alexander Brian E. Weiner Jens Meiler 《PloS one》2012,7(11)
Computational de novo protein structure prediction is limited to small proteins of simple topology. The present work explores an approach to extend beyond the current limitations through assembling protein topologies from idealized α-helices and β-strands. The algorithm performs a Monte Carlo Metropolis simulated annealing folding simulation. It optimizes a knowledge-based potential that analyzes radius of gyration, β-strand pairing, secondary structure element (SSE) packing, amino acid pair distance, amino acid environment, contact order, secondary structure prediction agreement and loop closure. Discontinuation of the protein chain favors sampling of non-local contacts and thereby creation of complex protein topologies. The folding simulation is accelerated through exclusion of flexible loop regions further reducing the size of the conformational search space. The algorithm is benchmarked on 66 proteins with lengths between 83 and 293 amino acids. For 61 out of these proteins, the best SSE-only models obtained have an RMSD100 below 8.0 Å and recover more than 20% of the native contacts. The algorithm assembles protein topologies with up to 215 residues and a relative contact order of 0.46. The method is tailored to be used in conjunction with low-resolution or sparse experimental data sets which often provide restraints for regions of defined secondary structure.
15.
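Prediction of the secondary structure and B-cell epitopes of the SARS virus M protein. Cited 4 times (self-citations: 0; citations by others: 4)
Based on the SARS virus genome sequence, the secondary structure of the protein was predicted with the Garnier-Robson, Chou-Fasman, and Karplus-Schulz methods, and the B-cell epitopes of the SARS virus M protein were predicted with the Kyte-Doolittle, Emini, and Jameson-Wolf schemes. The predictions indicate that segments 11-20, 27-36, and 133-141 (numbered from the N-terminus) of the SARS virus M protein may be α-helix centers, and that segments 20-27, 34-37, 44-56, 61-64, 70-76, 79-97, 117-132, 142-147, 165-176, and 216-221 may be β-sheet centers. Segments 5-6, 40-44, 105-107, 112-116, 189-190, 202-203, and 210-215 have a relatively flexible structure and may swing or fold to some extent to form a more complex tertiary structure. The dominant B-cell epitope regions most likely lie within or near segments 1-15, 37-47, 99-120, 181-192, and 196-215. With secondary structure prediction as an auxiliary tool, the B-cell epitopes of the SARS coronavirus M protein were predicted from the antigenic index, hydrophilicity, and accessibility parameters, laying a foundation for the experimental determination of these epitopes and for studies of immune recognition.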
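The Kyte-Doolittle scheme mentioned in this abstract is commonly applied as a sliding-window average of per-residue hydropathy values. The sketch below shows that calculation only; the function name `hydropathy_profile`, the default window of 7, and the example peptide are assumptions made for illustration, and the abstract does not state which window size or software the authors used. The scale values are the published Kyte-Doolittle hydropathy indices.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list:
    """Return the mean hydropathy of every full window along the sequence.

    Element i of the result covers residues i .. i + window - 1 (0-based).
    Raises KeyError for non-standard residue letters.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    profile = hydropathy_profile("MKDERSTLIVVAG", window=7)
    for start, value in enumerate(profile, start=1):
        print(start, round(value, 2))
```

Windows with strongly negative averages are hydrophilic and are often treated as candidates for surface exposure; on its own this is a weak signal, which is presumably why the abstract combines it with antigenic index and accessibility.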
16.
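Prediction of the secondary structure and B-cell epitopes of the DMO and DMT proteins of blue tilapia (Oreochromis aureus). Cited 1 time (self-citations: 0; citations by others: 1)
Based on the amino acid sequences of DMO and DMT, the secondary structure of the proteins was predicted with the Garnier-Robson, Chou-Fasman, and Karplus-Schulz methods, and the B-cell epitopes of the DMO and DMT proteins were predicted with the Kyte-Doolittle, Emini, and Jameson-Wolf methods. The predictions indicate that segments 80-112, 144-147, 193-194, 251-255, 260-269, and 279-283 of DMO and segments 61-86, 98-105, 140-146, 239-241, and 269-273 of DMT (all numbered from the N-terminus) may be α-helix centers. Segments 59-61, 69-70, 148-150, and 383-390 of DMO and segments 125-129, 207-213, 255-264, and 281-284 of DMT may be β-sheet centers. Segments 40-41, 44-45, 50-51, 128-129, 189-192, 204-207, 216-222, 226-233, 244-246, 298-299, and 323-326 of DMO and segments 12-13, 26-27, 43-44, 58-60, 93-95, 115-120, 136-139, and 149-151 of DMT have a relatively flexible structure; these segments may swing or fold to some extent to form a more complex tertiary structure. Regions 1-5, 41-51, 65-67, 86-89, 98-110, 154-170, 183-203, 205-248, 258-264, 284-291, 293-298, 270-375, 389-392, and 402-410 of DMO and regions 1-9, 17-28, 77-84, 114-123, 131-139, 157-184, and 196-207 of DMT may be dominant B-cell epitope regions. With secondary structure prediction as an auxiliary tool, the B-cell epitopes of DMO and DMT were predicted from the antigenic index, hydrophilicity, and accessibility parameters, providing clues for the preparation of monoclonal antibodies against DMO and DMT and a reference for systematic study of the sex-regulation mechanism of the DMO and DMT genes in blue tilapia.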
17.
Data Mining of Toxic Chemicals: Structure Patterns and QSAR. Cited 1 time (self-citations: 0; citations by others: 1)
We take a two-step strategy to explore noncongeneric toxic chemicals from the RTECS database: screening of structure patterns, followed by generation of a detailed relationship between structure and activity. An efficient similarity comparison is proposed to screen chemical patterns for further QSAR analysis. A CoMFA study is then carried out on one structure pattern as an example of the implementation, and the result shows that QSAR studies of structure patterns can provide an estimate of the activity as well as a detailed relationship between activity and structure. The performance of the overall procedure demonstrates that such a stepwise scheme is feasible and effective for mining a database of toxic chemicals.
18.
Kun Wang Feng Gao Renshan Zhu Shaoqing Li Yingguo Zhu 《Plant Molecular Biology Reporter》2011,29(3):739-744
Pentatricopeptide repeat (PPR) proteins are putative RNA-binding proteins which are particularly prevalent in terrestrial plants. Previous research has reported great difficulty in purifying soluble PPR proteins in Escherichia coli, which has hindered further study of their functions. In this paper, we report the use of the pMAL™ prokaryotic expression system to obtain soluble expression of a PPR protein, RF1A from rice (Oryza sativa L.). After purification, we identified RF1A by ESI-TOF-MS/MS. We also estimated its secondary structure using circular dichroism spectroscopy. These results supported the bioinformatic prediction of the helical-hairpin model of PPR proteins.
19.
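Secondary structure prediction and B-cell epitope analysis of the porcine myostatin protein without its signal peptide. Cited 1 time (self-citations: 0; citations by others: 1)
Objective: To predict the secondary structure and dominant B-cell epitopes of the porcine myostatin (MSTN) protein lacking its signal peptide, laying a foundation for producing monoclonal antibodies against the protein, building phage antibody libraries, and developing epitope-based peptide vaccines and epitope-based nucleic acid vaccines targeting this gene. Methods: Based on the amino acid sequence of the porcine myostatin protein without its signal peptide, seven parameters and methods were applied to predict the secondary structure and epitopes: Garnier-Robson, Chou-Fasman, Karplus-Schulz, Kyte-Doolittle, Emini, Jameson-Wolf, and Wu's comprehensive prediction method. Results: The MSTN protein without its signal peptide has multiple potential epitope sites; the dominant B-cell epitopes may lie in or near segments 1-11, 41-55, 57-64, 62-90, 99-104, 138-144, 193-200, 202-212, and 235-243. These results provide a basis for further identifying and synthesizing peptide vaccines and epitope-based nucleic acid vaccines and for preparing antibodies against porcine MSTN, and lay a foundation for studying the structure and function of MSTN.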
20.
The Protein Information Resource (PIR) and the PIR-International Protein Sequence Database. Cited 1 time (self-citations: 0; citations by others: 1)
D G George R J Dodson J S Garavelli D H Haft L T Hunt C R Marzec B C Orcutt K E Sidman G Y Srinivasarao L S Yeh L M Arminski R S Ledley A Tsugita W C Barker 《Nucleic acids research》1997,25(1):24-28
From its origin, the PIR has aspired to support research in computational biology and genomics through the compilation of a comprehensive, quality controlled and well-organized protein sequence information resource. The resource originated with the pioneering work of the late Margaret O. Dayhoff in the early 1960s. Since 1988, the Protein Sequence Database has been maintained collaboratively by PIR-International, an association of macromolecular sequence data collection centers dedicated to fostering international cooperation as an essential element in the development of scientific databases. The work of the resource is widely distributed and is available on the World Wide Web, via FTP, E-mail server, CD-ROM and magnetic media. It is widely redistributed and incorporated into many other protein sequence data compilations including SWISS-PROT and the Entrez system of the NCBI.