18 similar documents found
1.
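Design and implementation of a sample-set database for protein secondary structure prediction (total citations: 1; self-citations: 0; citations by others: 1)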
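Database technology was applied to the processing and analysis of sample sets for protein secondary structure prediction, and a sample-set database for secondary structure prediction was built. Using the CB513 sample set as an example, the construction scheme of the database is described. A sample database not only makes data easier to store, manage and retrieve, but also allows some simple sequence analyses to be carried out directly, replacing much of the programming that was previously required. This greatly improves working efficiency and reduces errors.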
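The idea of replacing ad hoc programs with queries can be illustrated with a minimal sketch. It assumes a hypothetical single-table SQLite schema (`proteins` with `pdb_id`, `sequence` and a same-length three-state `dssp` string); this is not the authors' actual design, and the two records are made up. A simple sequence analysis (per-protein helix content) then becomes one SQL query.

```python
import sqlite3

# Hypothetical schema: one row per protein chain, holding its amino acid
# sequence and a same-length secondary structure string over H/E/C.
SCHEMA = """
CREATE TABLE proteins (
    pdb_id   TEXT PRIMARY KEY,
    sequence TEXT NOT NULL,
    dssp     TEXT NOT NULL CHECK (length(sequence) = length(dssp))
)
"""


def build_db(records):
    """Create an in-memory database and load (pdb_id, sequence, dssp) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO proteins VALUES (?, ?, ?)", records)
    return conn


def helix_content(conn):
    """Fraction of residues labelled 'H' per protein, computed entirely in SQL."""
    query = """
    SELECT pdb_id,
           1.0 * (length(dssp) - length(replace(dssp, 'H', ''))) / length(dssp)
    FROM proteins
    ORDER BY pdb_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    db = build_db([
        ("1abc", "MKTAYIAK", "CHHHHHHC"),  # made-up example records
        ("2xyz", "GSVKVTFE", "CEEEECCC"),
    ])
    for pdb_id, fraction in helix_content(db):
        print(pdb_id, fraction)
```

Other summary statistics, such as strand content or amino acid counts per state, follow the same pattern of string functions inside a `SELECT`.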
2.
3.
This paper is concerned with a branch of computational biology related to the prediction and analysis of protein secondary structure. Although traditional methods use simple amino acid composition to predict secondary structure content, hydrophobicity has recently been found to improve results in this and several related prediction tasks. To this end, we propose two new hydrophobicity index-based scales that incorporate information about long-range interactions along the protein sequence, analyze their advantages, and contrast them with the currently used raw hydrophobicity index values. We also compare three leading hydrophobicity indices, i.e., Eisenberg's, Fauchere-Pliska's, and Cid's, using the proposed scales. The analysis is performed using fuzzy cognitive maps that quantify the strength of the relation between the hydrophobicity scales/indices and the secondary structure content values. A set of empirical tests was performed that involved generating fuzzy cognitive map models for a set of 200 low-homology proteins. The results show that secondary structure content along the protein sequence has an approximately 2.5 times stronger relation with the two proposed hydrophobicity scales than with the currently used raw index values. The new scales exhibit a stronger relation irrespective of the hydrophobicity index applied. Analysis of the different scales shows that Eisenberg's hydrophobicity index performs best when used with the new scales. In contrast, Fauchere-Pliska's index performs better than the other two indices when raw hydrophobicity index values, which disregard long-range interactions, are used.
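For readers unfamiliar with fuzzy cognitive maps, the sketch below shows the standard FCM inference rule, in which each concept's activation is recomputed as a sigmoid of the weighted sum of the other concepts' activations. The two concepts and the weight of 0.8 are hypothetical; in the study the weights are learned from protein data, and this code does not reproduce that learning step.

```python
import math


def sigmoid(x, lam=1.0):
    """Logistic squashing function commonly used in FCM inference."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def fcm_step(state, weights, lam=1.0):
    """One FCM update: A_i(t+1) = f(sum_j weights[j][i] * A_j(t)).

    weights[j][i] is the causal strength of concept j on concept i,
    typically in [-1, 1]."""
    n = len(state)
    return [sigmoid(sum(weights[j][i] * state[j] for j in range(n)), lam)
            for i in range(n)]


def fcm_run(state, weights, lam=1.0, max_iter=100, tol=1e-6):
    """Iterate the update until activations stop changing (or max_iter)."""
    for _ in range(max_iter):
        nxt = fcm_step(state, weights, lam)
        if max(abs(a - b) for a, b in zip(state, nxt)) < tol:
            return nxt
        state = nxt
    return state


if __name__ == "__main__":
    # Hypothetical concepts: 0 = a hydrophobicity scale, 1 = helix content.
    w = [[0.0, 0.8],
         [0.0, 0.0]]
    print(fcm_run([0.9, 0.0], w))
```

In this toy map, the steady-state activation of the second concept depends only on the weight linking it to the first, which is the sense in which FCM weights express "strength of relation".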
4.
The effects of the quaternary agent meproadifen on ACh-activated channel currents were studied on myoballs cultured from hind limb muscles of neonatal rats. Meproadifen (0.02-0.1 microM) combined with ACh (0.1-0.3 microM) in the patch pipette caused an increase, followed by a decrease, in the frequency of channel openings. At concentrations greater than 0.2 microM the initial phase was not detected and a rapid and marked reduction in the opening frequency was observed. Meproadifen (up to 2.5 microM) produced no change in the duration or conductance of the open state of ACh-activated channels. In addition, this agent induced the appearance of events with a marked increase in the 'noise' during the opening phase. The lack of effect under inside-out patch conditions suggested that meproadifen binds to a site located at the external portion of the nicotinic macromolecule and has no access to it through the cell membrane. This study indicated that non-competitive antagonists such as meproadifen can facilitate receptor activation and desensitization.
5.
Secondary structures of proteins have been predicted from their Fourier transform infrared (FTIR) spectra using neural networks. To improve the generalization ability of the networks, the training data set was artificially enlarged by linear interpolation. The leave-one-out approach was used to demonstrate the applicability of the method. The networks were trained with Bayesian regularization, and the predictions were further improved by maximum-likelihood estimation. In testing, standard errors of prediction (SEP) of 4.19% for alpha helix, 3.49% for beta sheet, and 3.15% for turns were achieved. These results represent a significant decrease in SEP for each type of structure parameter compared with previous work.
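The data-handling part of this procedure can be sketched independently of the network. The code below assumes spectra are plain lists of floats paired with one structure fraction each, and defines SEP as the root-mean-square prediction error (some authors divide by n - 1 instead). A caller-supplied `fit` function stands in for the Bayesian-regularized network. Note that interpolation is applied only after the test protein has been held out, so no information from it leaks into training.

```python
import math
import random


def interpolate_pairs(spectra, fractions, n_new, seed=0):
    """Enlarge a training set by linear interpolation between random pairs.

    Each new sample is t*a + (1 - t)*b for a random t in [0, 1], applied
    identically to the spectrum and to its structure fraction."""
    rng = random.Random(seed)
    new_x, new_y = [], []
    for _ in range(n_new):
        i, j = rng.sample(range(len(spectra)), 2)
        t = rng.random()
        new_x.append([t * a + (1 - t) * b for a, b in zip(spectra[i], spectra[j])])
        new_y.append(t * fractions[i] + (1 - t) * fractions[j])
    return spectra + new_x, fractions + new_y


def sep(predicted, observed):
    """Standard error of prediction, taken here as the RMS prediction error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def leave_one_out(spectra, fractions, fit, n_new=50):
    """Hold out each protein, augment only the rest, refit, and predict it.

    `fit(x, y)` must return a callable model; it stands in for the network."""
    preds = []
    for k in range(len(spectra)):
        train_x = spectra[:k] + spectra[k + 1:]
        train_y = fractions[:k] + fractions[k + 1:]
        aug_x, aug_y = interpolate_pairs(train_x, train_y, n_new, seed=k)
        model = fit(aug_x, aug_y)
        preds.append(model(spectra[k]))
    return sep(preds, fractions)
```

Because the interpolated targets are convex combinations of real ones, augmentation cannot produce structure fractions outside the range already present in the training set.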
6.
Strong contribution of the aromatic amino acid side chain chromophores to the far-UV circular dichroism (CD) spectra substantially distorts a relatively weak CD signal originating from beta sheet, the main type of immunoglobulin secondary structure. In this study we compared the secondary structure calculated from the far-UV CD spectra with the X-ray data for three antibody Fab fragments. Calculations were performed with three different algorithms, using two sets of reference proteins. Low standard deviations between all six estimates indicate stable mathematical solutions. Despite pronounced differences in the shape and amplitude of the CD spectra, we found a strong correlation between CD and X-ray data in the secondary structure for every protein studied. The number and average length of the secondary structure elements estimated from the CD spectra closely resemble those of the X-ray data. Agreement between spectroscopic and crystallographic results demonstrates that modern methods of secondary structure calculation are resilient to distortions of the far-UV CD spectra of immunoglobulins caused by aromatic side chain chromophores.
7.
Costantini S Colonna G Facchiano AM 《Biochemical and biophysical research communications》2006,342(2):441-451
Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them on datasets of a few tens of proteins and developed a method to predict protein secondary structure. That method is still in use, even though prediction methods have since evolved toward very different approaches with higher reliability. Propensity for secondary structure is an intrinsic property of each amino acid and is used to build new algorithms and prediction methods. Our work therefore aimed to determine which protein dataset is best for evaluating amino acid propensities: larger but heterogeneous sets, or smaller but homogeneous ones, i.e., all-alpha, all-beta, or alpha-beta proteins. As a first analysis, we evaluated amino acid propensities for helix, beta-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave better results than the original method, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset into three structural classes, i.e., all-alpha, all-beta, and alpha-beta proteins. For each class, the amino acid propensities for helix, beta-strand, and coil were calculated and used to predict secondary structure elements of proteins belonging to the same class, using resubstitution and jackknife tests. This second round of predictions further improved on the results of the first. Therefore, amino acid propensities for secondary structures become more reliable as the homogeneity of the protein dataset used to evaluate them increases. Our results also indicate that all algorithms using secondary structure propensities can still be improved to obtain better predictive results.
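Chou-Fasman-style propensities compare how often an amino acid occurs in a given state with how often it occurs overall: P(aa, s) = [n(aa, s) / n(s)] / [n(aa) / N]. The sketch below computes these values from sequences paired with three-state (H/E/C) labels and then makes a deliberately simplified prediction: each residue gets the state with the highest window-averaged propensity. This is not the original Chou-Fasman nucleation and extension procedure. The window size and the neutral default of 1.0 for unseen pairs are assumptions.

```python
from collections import Counter


def propensities(sequences, structures, states="HEC"):
    """P(aa, s) = (fraction of state-s residues that are aa) /
    (fraction of all residues that are aa)."""
    aa_total, pair, state_total = Counter(), Counter(), Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            pair[aa, s] += 1
            state_total[s] += 1
    n = sum(aa_total.values())
    return {
        (aa, s): (pair[aa, s] / state_total[s]) / (aa_total[aa] / n)
        for aa in aa_total for s in states if state_total[s]
    }


def predict(seq, prop, states="HEC", half_window=2):
    """Assign each residue the state with the highest mean propensity in a
    window centred on it (unseen pairs count as neutral, 1.0)."""
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]
        scores = {s: sum(prop.get((aa, s), 1.0) for aa in window) / len(window)
                  for s in states}
        out.append(max(scores, key=scores.get))
    return "".join(out)
```

Computing the propensities separately from all-alpha, all-beta and alpha-beta subsets, and predicting each protein with the table of its own class, corresponds to the refined analysis described above. For a jackknife test, the protein being predicted is left out of the counts.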
8.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological roles. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which remain experimentally uncharacterized. The I-TASSER server is an online workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All predictions are tagged with a confidence score that indicates how accurate they are likely to be in the absence of experimental data. To accommodate special requests from end users, the server accepts user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins, during the structure assembly simulations. Users can collect such structural information from experimental evidence or biological insight in order to improve the quality of I-TASSER predictions. The server was ranked as the best program for protein structure and function prediction in recent community-wide CASP experiments. More than 20,000 registered scientists from over 100 countries currently use the online I-TASSER server.
9.
Summary A simple technique for identifying protein secondary structures through the analysis of backbone 13C chemical shifts is described. It is based on the Chemical-Shift Index [Wishart et al. (1992) Biochemistry, 31, 1647–1651], which was originally developed for the analysis of 1Hα chemical shifts. By extending the Chemical-Shift Index to include 13Cα, 13Cβ and carbonyl 13C chemical shifts, it is now possible to use four independent chemical-shift measurements to identify and locate protein secondary structures. It is shown that by combining the 1H and 13C chemical-shift indices into a consensus estimate of secondary structure, a predictive accuracy in excess of 92% can be achieved. This suggests that the secondary structure of peptides and proteins can be accurately obtained from 1H and 13C chemical shifts, without recourse to NOE measurements. Supplementary material is available in the form of a 10-page table (Table S1) describing the exact location of secondary structures in all 20 proteins as determined using the methods described in this paper. Requests for Table S1 should be directed to the authors.
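The chemical-shift-index idea can be sketched in three steps: a ternary index per nucleus, a per-residue consensus across nuclei, and run-length rules that turn the consensus into elements. This is a simplified sketch under several assumptions. The caller supplies random-coil reference values and thresholds (none are hard-coded here). Each nucleus's index must be oriented, via `sign`, so that -1 is helix-like and +1 is strand-like, because the raw sign convention differs between nuclei. The published "dense grouping" rules are also replaced by plain runs: four or more -1 for a helix, three or more +1 for a strand.

```python
def ternary_index(shifts, random_coil, threshold, sign=1):
    """+1 / -1 / 0 per residue from the deviation (observed - random coil).

    `sign` flips the orientation so that -1 always means helix-like."""
    out = []
    for obs, rc in zip(shifts, random_coil):
        d = sign * (obs - rc)
        out.append(1 if d > threshold else -1 if d < -threshold else 0)
    return out


def consensus(*indices):
    """Per-residue majority vote across several ternary indices (0 on a tie)."""
    out = []
    for votes in zip(*indices):
        total = sum(votes)
        out.append(1 if total > 0 else -1 if total < 0 else 0)
    return out


def assign_elements(index, min_helix=4, min_strand=3):
    """Label runs of >= min_helix '-1' as H and runs of >= min_strand '+1' as E;
    everything else is C."""
    labels = ["C"] * len(index)
    i = 0
    while i < len(index):
        j = i
        while j < len(index) and index[j] == index[i]:
            j += 1
        run = j - i
        if index[i] == -1 and run >= min_helix:
            labels[i:j] = ["H"] * run
        elif index[i] == 1 and run >= min_strand:
            labels[i:j] = ["E"] * run
        i = j
    return "".join(labels)
```

Taking a consensus before assigning elements lets the four independent measurements outvote an occasional anomalous shift on any single nucleus.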
10.
When a protein sequence does not share any significant sequence similarity with a protein of known structure, homology modeling cannot be applied. However, many novel and interesting methods, such as secondary structure prediction, fold recognition, and prediction of long-range interactions, are being developed and have been shown to be reasonably successful in predicting protein structures from sequence data and evolutionary information. The a priori evaluation of the correctness of a prediction obtained by one of these methods is however often problematic. Consequently, it is important to use all available information provided by as many different methods as possible and all the available experimental data about the protein of interest, since the consistency of the results is indicative of the reliability of the prediction. Hence the need has arisen for suitable tools able to compare results provided by different methods and evaluate their consistency. We have therefore constructed GLASS, a general platform to read, visualize, compare, and evaluate prediction results from many different sources and to project these prediction results into three dimensions. In addition, GLASS allows the comparison of selected parameters calculated for a model with the distribution observed in real protein structures, thus providing an easy way to test new methods for evaluating the likelihood of different structural models. GLASS can be considered as a “workbench” for structural predictions useful to both experimentalists and theoreticians. Proteins 30:339–351, 1998. © 1998 Wiley-Liss, Inc.
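The point that agreement between methods signals reliability can be made concrete with a small consistency check. This is not GLASS code, which is a graphical platform. It is a minimal sketch assuming each method's output has already been reduced to an equal-length H/E/C string; it returns a majority-vote consensus and, for each residue, the fraction of methods that agree with that consensus.

```python
from collections import Counter


def consensus_and_agreement(predictions):
    """Majority-vote consensus of several per-residue predictions and the
    fraction of methods agreeing with it at each position."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    consensus, agreement = [], []
    for column in zip(*predictions):
        # most_common keeps first-seen order among ties, so ties go to the
        # state predicted by the earliest method in the list.
        state, count = Counter(column).most_common(1)[0]
        consensus.append(state)
        agreement.append(count / len(predictions))
    return "".join(consensus), agreement
```

Residues with low agreement are the ones where additional evidence, experimental or computational, is most worth seeking.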
11.
A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.
P. K. Mehta J. Heringa P. Argos 《Protein science : a publication of the Protein Society》1995,4(12):2517-2525
To improve secondary structure predictions for protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and execution efficiency, the method has considerable advantages, especially given its sole reliance on amino acid substitutions within structurally related proteins.
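Accuracy figures such as the 72.2% per-residue mean and the per-state percentages above are conventionally computed as in the sketch below. It assumes observed and predicted structures are equal-length three-state strings. Overall accuracy (Q3) is the fraction of residues predicted correctly, and per-state accuracy is the fraction of residues observed in a state that were also predicted in it.

```python
def q3(observed, predicted, states="HEC"):
    """Overall per-residue three-state accuracy plus per-state accuracy."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must be the same length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    per_state = {}
    for s in states:
        idx = [i for i, o in enumerate(observed) if o == s]
        if idx:  # skip states absent from the observed structure
            per_state[s] = sum(predicted[i] == s for i in idx) / len(idx)
    return correct / len(observed), per_state
```

For a jack-knife estimate like the 70.9% figure, each protein family would be removed before the exchange matrices are computed, and then predicted and scored with a function of this kind.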
12.
Zhang W Xiao W Wei H Zhang J Tian Z 《Biochemical and biophysical research communications》2006,349(1):69-78
Codon usage and thermodynamic optimization of the 5'-end of mRNA have been applied to improve the efficiency of human protein production in Escherichia coli. However, high-level expression of human proteins in E. coli remains a challenge that largely depends on the individual target gene. Using human interleukin 10 (huIL-10) and interferon alpha (huIFN-alpha) coding sequences, we systematically analyzed the influence of several major factors on the expression of human proteins in E. coli. The results from huIL-10, reinforced by those from huIFN-alpha, showed that freeing the AUG initiator codon from base-paired structure within the mRNA itself significantly improved translation of the target protein, resulting in 10-fold higher protein expression than with the wild-type genes. It was also noted that translation was not affected by a retained short-range stem-loop structure at the Shine-Dalgarno (SD) sequence. On the other hand, codon-optimized constructs of huIL-10 did not improve protein expression and instead led to marked RNA degradation. Our study demonstrates that exposing the AUG initiator codon by removing long-range intra-strand secondary structure at the 5'-end of mRNA may be used as a general strategy for human protein production in E. coli.
13.
14.
The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
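TORNADO's super-grammar is far richer than anything that fits here. As the simplest illustration of grammar-style dynamic programming for RNA secondary structure, the sketch below implements Nussinov base-pair maximization with a minimum hairpin loop of three unpaired bases and Watson-Crick plus G-U pairs. It maximizes the number of nested pairs. It is not a nearest-neighbor energy model and not a trained SCFG, but those methods fill the same kind of (i, j) table with probabilities or energies instead of pair counts.

```python
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs and one optimal dot-bracket string."""
    n = len(seq)

    @lru_cache(maxsize=None)
    def best(i, j):
        """Best pair count for seq[i..j] (inclusive)."""
        if j - i <= min_loop:
            return 0
        score = best(i, j - 1)  # case 1: j is unpaired
        for k in range(i, j - min_loop):  # case 2: j pairs with some k
            if (seq[k], seq[j]) in PAIRS:
                left = best(i, k - 1) if k > i else 0
                score = max(score, left + 1 + best(k + 1, j - 1))
        return score

    structure = ["."] * n

    def traceback(i, j):
        """Recover one optimal structure by re-deriving each choice."""
        if j - i <= min_loop:
            return
        if best(i, j) == best(i, j - 1):
            traceback(i, j - 1)
            return
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best(i, k - 1) if k > i else 0
                if left + 1 + best(k + 1, j - 1) == best(i, j):
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        traceback(i, k - 1)
                    traceback(k + 1, j - 1)
                    return

    if n == 0:
        return 0, ""
    traceback(0, n - 1)
    return best(0, n - 1), "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

Replacing the "+1 per pair" score with trained parameters is where overfitting can enter. This is why the abstract stresses structurally nonhomologous training and test sets.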
15.
Detailed structural analysis of a protein requires investigation at the primary, secondary and tertiary levels. Insight into protein secondary structure paves the way for understanding the types of secondary structural elements involved (α-helices, β-strands, etc.), the amino acid sequences that encode them, and the number of residues, length and percentage composition of each element in the protein. Here we present a standalone tool, "ExSer", which facilitates automated extraction of the amino acid sequences that encode the secondary structural regions of a protein from its Protein Data Bank (PDB) file. AVAILABILITY: ExSer is freely downloadable from http://code.google.com/p/tool-exser/
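ExSer's own code is not reproduced here. The following is a rough sketch of the same kind of extraction, assuming a classic fixed-column PDB file that contains HELIX and SHEET records. For simplicity it ignores insertion codes and alternate locations, assumes each element starts and ends on the same chain, and reads residue identities from the CA atoms of ATOM records. Nonstandard residues are reported as "X".

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_from_atoms(lines):
    """Map (chain, residue number) -> one-letter code using CA atoms."""
    residues = {}
    for line in lines:
        # Fixed PDB columns: atom name 13-16, resName 18-20, chain 22, resSeq 23-26.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            residues[key] = THREE_TO_ONE.get(line[17:20], "X")
    return residues


def secondary_structure_segments(pdb_text):
    """Return (kind, chain, start, end, sequence) for each HELIX/SHEET record."""
    lines = pdb_text.splitlines()
    residues = residues_from_atoms(lines)
    segments = []
    for line in lines:
        if line.startswith("HELIX "):
            # HELIX: initChainID col 20, initSeqNum 22-25, endSeqNum 34-37.
            chain, start, end = line[19], int(line[21:25]), int(line[33:37])
            kind = "helix"
        elif line.startswith("SHEET "):
            # SHEET: initChainID col 22, initSeqNum 23-26, endSeqNum 34-37.
            chain, start, end = line[21], int(line[22:26]), int(line[33:37])
            kind = "strand"
        else:
            continue
        seq = "".join(residues.get((chain, n), "X") for n in range(start, end + 1))
        segments.append((kind, chain, start, end, seq))
    return segments
```

Lengths and percentage composition per element follow directly from the returned sequences, for example `len(seq)` and a `collections.Counter` over each one.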
16.
R E Martenson 《Journal of neurochemistry》1983,40(4):951-968
The amino acid sequence of the P2 protein of peripheral myelin was analyzed with regard to regions of probable alpha-helix, beta-structure, beta-turn, and unordered conformation by means of several algorithms commonly used to predict secondary structure in proteins. Because of the high beta-sheet content and virtual absence of alpha-helix shown by the circular dichroic spectra of the protein, a bias was introduced into the algorithms to favor the beta-structure over the alpha-helical conformation. In order to define those beta-sheet residues that could lie on the external hydrophilic surface of the protein and those that could lie in its hydrophobic interior, the predicted beta-strands were examined for charged and uncharged amino acids located at alternating positions in the sequence. The sequential beta-strands in the predicted secondary structure were then ordered into beta-sheets and aligned according to generally accepted tertiary folding principles and certain chemical properties peculiar to the P2 protein. The general model of the P2 protein that emerged was a "Greek key" beta-barrel, consisting of eight antiparallel beta-strands with a two-stranded ribbon of antiparallel beta-structure emerging from one end. The model has an uncharged, hydrophobic core and a highly hydrophilic surface. The two Cys residues, which form a disulfide, occur in a loop connecting two adjacent antiparallel strands. Two hydrophilic loops, each containing a cluster of acidic residues and a single Phe, protrude from one end of the molecule. The general model is consistent with many of the properties of the actual protein, including the relatively weak nature of its association with myelin lipids and the positions of amino acid substitutions. Alternative beta-strand orderings yield three specific models having different interstrand connections across the barrel ends.
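In a beta-strand, consecutive side chains point to opposite faces of the sheet, so residues at alternating positions share a face. The sketch below splits a predicted strand into its two faces and flags strands where one face is mostly hydrophobic and the other mostly charged. The residue classes and the 0.5 threshold are illustrative assumptions, not the criteria used in the study.

```python
HYDROPHOBIC = set("AVILMFWC")  # illustrative classification only
CHARGED = set("DEKR")


def strand_faces(segment):
    """Split a strand into its two faces (alternating residues) and count
    hydrophobic and charged residues on each face."""
    faces = (segment[0::2], segment[1::2])
    return [
        {"residues": face,
         "hydrophobic": sum(aa in HYDROPHOBIC for aa in face),
         "charged": sum(aa in CHARGED for aa in face)}
        for face in faces
    ]


def is_amphipathic(segment, threshold=0.5):
    """True if one face is mostly hydrophobic and the other mostly charged
    (a crude, hypothetical criterion)."""
    a, b = strand_faces(segment)

    def frac(face, key):
        return face[key] / len(face["residues"]) if face["residues"] else 0.0

    return ((frac(a, "hydrophobic") >= threshold and frac(b, "charged") >= threshold)
            or (frac(b, "hydrophobic") >= threshold and frac(a, "charged") >= threshold))
```

A strand flagged this way would be placed with its hydrophobic face toward the barrel interior, consistent with the uncharged core and hydrophilic surface of the model described above.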
17.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to the secondary structural elements of proteins, based on the secondary structures generated by DSSP (Dictionary of Secondary Structure of Proteins) for 408 selected nonhomologous proteins. Amino acid triplets not found in the selected dataset are assigned a DP value of zero with respect to each secondary structural element. In total, 15,432 of the 25,260 possible parameters were generated: the deviation parameters are complete for amino acid singlets and doublets and partially complete for amino acid triplets. These parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of a single-sequence method (NNPREDICT) and a multiple-sequence method (PHD). For the proteins in the selected dataset, the average prediction accuracy for α-helix was 57%, 44% and 69% for SSPDP, NNPREDICT and PHD, respectively. For β-strand, the prediction accuracy was 69%, 21% and 53% for SSPDP, NNPREDICT and PHD, respectively. This indicates that our method is comparable to PHD (lower for α-helix, higher for β-strand) and much better than NNPREDICT.
18.
Subbotin SA Sturhan D Vovlas N Castillo P Tambe JT Moens M Baldwin JG 《Molecular phylogenetics and evolution》2007,43(3):881-890
Knowledge of rRNA structure is increasingly important for phylogenetic analysis: it helps reconstruct optimal alignments, provides molecular features as an additional source of data, and helps refine appropriate models of the molecule's evolution. We describe a procedure for optimizing the alignment and a new method for coding nucleotide sequence data using secondary structure models of the D2 and D3 expansion fragments of the LSU-rRNA gene, reconstructed for fifteen nematode species of the agriculturally important and diverse family Hoplolaimidae, order Tylenchida. Using secondary structure information, we converted the original sequence data into twenty-eight symbol codes and submitted the transformed data to maximum parsimony analysis. We also used the original sequence data set for Bayesian inference, applying the doublet model with sixteen states of nucleotide doublets to the stem regions and the standard model of DNA substitution with four nucleotide states to loops and bulges. With this approach, we demonstrate that using structural information in phylogenetic analyses led to trees with less resolved relationships between clades and likely eliminated some artefactual support for misinterpreted relationships, such as paraphyly of Helicotylenchus or Rotylenchus. This study, as well as future phylogenetic analyses, is supported by the development of an online database, NEMrRNA, for nematode rRNA molecules in a structural format. We have also developed a new computer program, RNAstat, for calculating nucleotide statistics, designed for phylogenetic studies.
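The split between stem and loop data used for the Bayesian analysis can be sketched as follows. The code assumes the secondary structure model is given as a dot-bracket string aligned to the sequence. Paired positions are collected as nucleotide doublets (16 possible states under a doublet model), and unpaired positions in loops and bulges as single nucleotides (4 states). The authors' twenty-eight-symbol coding for parsimony is not described in enough detail here to reproduce, so it is not attempted.

```python
def stem_pairs(dot_bracket):
    """Return {i: j} for every base pair (i < j) in a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def partition_for_doublet_model(seq, dot_bracket):
    """Split a sequence into paired doublets (stems) and unpaired single
    nucleotides (loops and bulges)."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    pairs = stem_pairs(dot_bracket)
    paired = set(pairs) | set(pairs.values())
    doublets = [seq[i] + seq[j] for i, j in sorted(pairs.items())]
    singles = [seq[k] for k in range(len(seq)) if k not in paired]
    return doublets, singles
```

Treating each pair as one character reflects the fact that compensatory changes on the two sides of a stem are not independent, which is the motivation for a doublet model.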