Similar articles
 20 similar articles found (search time: 31 ms)
1.
Prediction of the disulfide-bonding state of cysteine in proteins   Total citations: 5 (self-citations: 0, citations by others: 5)
The bonding states of cysteine play important functional and structural roles in proteins. In particular, disulfide bond formation is one of the most important factors influencing the three-dimensional fold of proteins. Proteins of known structure were used to teach computer-simulated neural networks rules for predicting the disulfide-bonding state of a cysteine given only its flanking amino acid sequence. Resulting networks make accurate predictions on sequences different from those used in training, suggesting that local sequence greatly influences cysteines in disulfide bond formation. The average prediction rate after seven independent network experiments is 81.4% for disulfide-bonded and 80.0% for non-disulfide-bonded scenarios. Predictive accuracy is related to the strength of network output activities. Network weights reveal interesting position-dependent amino acid preferences and provide a physical basis for understanding the correlation between the flanking sequence and a cysteine's disulfide-bonding state. Network predictions may be used to increase or decrease the stability of existing disulfide bonds or to aid the search for potential sites to introduce new disulfide bonds.
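As an illustration of the "flanking amino acid sequence" input described in this abstract, the following minimal sketch extracts a fixed-width window around each cysteine. The function name, the half-width `flank`, and the padding symbol `X` are assumptions made for the example; the abstract does not state the window size or encoding the authors used.

```python
def cysteine_windows(sequence, flank=5, pad="X"):
    """Return (position, window) pairs for every cysteine in a sequence.

    Each window holds `flank` residues on either side of the cysteine.
    Positions near the chain termini are padded with `pad` (an assumption).
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # Index i in the original sequence maps to i + flank in the padded
            # one, so the window centred on it starts at index i.
            windows.append((i, padded[i : i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in cysteine_windows("MKCAAGHCLLE", flank=3):
        print(position, window)
```

Each window would then be encoded numerically (for example, one-hot per residue) before being passed to a classifier; that step is left out here.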

2.
3.
4.
MOTIVATION: We focus on the prediction of disulfide bridges in proteins starting from their amino acid sequence and from the knowledge of the disulfide bonding state of each cysteine. The location of disulfide bridges is a structural feature that conveys important information about the protein main chain conformation and can therefore help towards the solution of the folding problem. Existing approaches based on weighted graph matching algorithms do not take advantage of evolutionary information. Recursive neural networks (RNN), on the other hand, can handle in a natural way complex data structures such as graphs whose vertices are labeled by real vectors, allowing us to incorporate multiple alignment profiles in the graphical representation of disulfide connectivity patterns. RESULTS: The core of the method is the use of machine learning tools to rank alternative disulfide connectivity patterns. We develop an ad-hoc RNN architecture for scoring labeled undirected graphs that represent connectivity patterns. In order to compare our algorithm with previous methods, we report experimental results on the SWISS-PROT 39 dataset. We find that using multiple alignment profiles allows us to obtain significant prediction accuracy improvements, clearly demonstrating the important role played by evolutionary information. AVAILABILITY: The Web interface of the predictor is available at http://neural.dsi.unifi.it/cysteines

5.
6.
Based on 639 non-homologous proteins with 2910 cysteine-containing segments from well-resolved three-dimensional structures, a novel approach is proposed to predict the disulfide-bonding state of cysteines in proteins. It constructs a two-stage classifier that combines a first, global linear discriminator based on amino acid composition with a second, local support vector machine classifier. Under the rigorous jackknife procedure, the overall prediction accuracy of this hybrid classifier reached 84.1% on a per-cysteine basis and 80.1% on a per-protein basis. This shows that whether cysteines form disulfide bonds depends not only on the global structural features of proteins but also on the local sequence environment. The result demonstrates the applicability of this novel method, which achieves prediction performance comparable to that of existing methods for predicting the oxidation states of cysteines in proteins.

7.
8.
High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. We first constructed the EcoPDB database, a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we present a novel information-theoretic approach to investigate the correlation between cysteine synonymous codon usage and the local amino acids flanking cysteines, the correlation between cysteine synonymous codon usage and the synonymous codon usage of the local amino acids flanking cysteines, and the correlation between cysteine synonymous codon usage and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues on the C-terminal side, and their synonymous codons, have the greatest influence on the usage of the synonymous codons of cysteines, and that synonymous codon usage has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation of protein structure at the gene sequence level and reflect the biological functional restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for the synonymous codon selection of cysteines when introducing disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be used as a complementary computational method and applied to analyse synonymous codon usage in other model organisms.

9.
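Using 639 protein polypeptide chains from the April 2002 release of the Culled Protein Data Bank as the study set, we statistically analyzed the formation characteristics of the 584 disulfide bonds they contain and found that the redox states of cysteines show marked cooperativity: in proteins that contain disulfide bonds, almost all cysteines exist in the oxidized form. This cooperativity is well accounted for by the percentage content of the 20 amino acid types at the whole-protein level, and predicting the redox state of cysteines from this composition reaches an accuracy of up to 84.5%. The results indicate that whether a cysteine forms a disulfide bond depends mainly on global rather than local structural information of the protein.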
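The global feature this abstract relies on, the percentage content of the 20 standard amino acids, can be computed as in the sketch below. The function name and the choice to ignore non-standard residue symbols are assumptions for illustration; the abstract does not describe the authors' preprocessing or the discriminant applied to these percentages.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the percentage of each standard amino acid in a sequence.

    Symbols outside the 20 standard residues are ignored (an assumption).
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    comp = composition("MKCAAGHCLLE")
    print(round(comp["C"], 1), round(comp["A"], 1))
```

The resulting 20-dimensional vector is the kind of input a global, whole-protein classifier would take.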

10.
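In protein structure prediction research, an important problem is the correct prediction of disulfide bond connectivity. Accurate prediction of disulfide bonds can reduce the search space of protein conformations and benefits the prediction of protein 3D structure. This paper converts the problem of predicting disulfide connectivity into a classification problem over connectivity patterns and successfully introduces the support vector machine method into the prediction task. By classifying the connectivity patterns of the local sequences around cysteines, the disulfide connectivity of a protein can be predicted from its primary sequence. The results show that the disulfide connectivity of a protein is closely related to the connectivity patterns of the local sequences around its cysteines, and that applying support vector machines to disulfide bond prediction gave good results.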

11.
In the eukaryotic cell, the formation of disulfide bonds generally takes place inside the endoplasmic reticulum, which provides a unique folding environment. The DisulfideDB database gathers information about this biological process, with structural, evolutionary and neighborhood information on cysteines in proteins. Mining this information with an association rule discovery program makes it possible to extract strong rules for predicting the disulfide-bonding state of cysteines.

12.
We constructed a gene encoding rCAS, recombinant constant and subrepeat protein, modeled after tandem repeats found in the major silk proteins synthesized by aquatic larvae of the midge, Chironomus tentans. Bacterially synthesized rCAS was purified to near homogeneity and characterized by several biochemical and biophysical methods including amino-terminal sequencing, amino acid compositional analysis, sedimentation equilibrium ultracentrifugation, and mass spectrometry. Complementing these techniques with quantitative sulfhydryl assays, we discovered that the four cysteines present in rCAS form two intramolecular disulfide bonds. Mapping studies revealed that the disulfide bonds are heterogeneous. When reduced and denatured rCAS was allowed to refold and its disulfide bonding state monitored, it again adopted a conformation with two intramolecular disulfide bonds. The inherent ability of rCAS to quantitatively form two intramolecular disulfide bonds may reflect a previously unknown feature of the in vivo silk proteins from which it is derived.

13.
The predictive limits of the amino acid composition for the secondary structural content (percentage of residues in the secondary structural states helix, sheet, and coil) in proteins are assessed quantitatively. For the first time, techniques for prediction of secondary structural content are presented which rely on the amino acid composition as the only information on the query protein. In our first method, the amino acid composition of an unknown protein is represented by the best (in a least-squares sense) linear combination of the characteristic amino acid compositions of the three secondary structural types computed from a learning set of tertiary structures. The second technique is a generalization of the first one and also takes into account possible compositional couplings between any two sorts of amino acids. Its mathematical formulation results in an eigenvalue/eigenvector problem of the second moment matrix describing the amino acid compositional fluctuations of secondary structural types in various proteins of a learning set. Possible correlations of the principal directions of the eigenspaces with physical properties of the amino acids were also checked. For example, the first two eigenvectors of the helical eigenspace correlate with the size and hydrophobicity of the residue types, respectively. As learning and test sets of tertiary structures, we utilized representative, automatically generated subsets of the Protein Data Bank (PDB) consisting of non-homologous protein structures at the resolution thresholds ≤1.8 Å, ≤2.0 Å, ≤2.5 Å, and ≤3.0 Å. We show that the consideration of compositional couplings improves prediction accuracy, albeit not dramatically. Whereas in the self-consistency test (learning with the protein to be predicted) a clear decrease of prediction accuracy with worsening resolution is observed, the jackknife test (leaving the predicted protein out) yielded the best results for the largest dataset (≤3.0 Å, almost no difference from the self-consistency test), i.e., only this set, with more than 400 proteins, is sufficient for stable computation of the parameters in the prediction function of the second method. The average absolute errors in predicting the fractions of helix, sheet, and coil from the amino acid composition of the query protein are 13.7, 12.6, and 11.4%, respectively, with r.m.s. deviations in the range of 8.6-11.8% for the 3.0 Å dataset in a jackknife test. The absolute precision of the average absolute errors is in the range of 1-3% as measured for other representative subsets of the PDB. Secondary structural content prediction methods found in the literature have been clustered in accordance with their prediction accuracies. To our surprise, much more complex secondary structure prediction methods utilized for the same purpose of secondary structural content prediction achieve prediction accuracies very similar to those of the present analytic techniques, implying that all the information beyond the amino acid composition is, in fact, mainly utilized for positioning the secondary structural state in the sequence but not for determination of the overall number of residues in a secondary structural type. This result implies that higher prediction accuracies cannot be achieved relying solely on the amino acid composition of an unknown query protein as prediction input. Our prediction program SSCP has been made available as a World Wide Web and E-mail service. © 1996 Wiley-Liss, Inc.

14.
One of the major contributors to protein structure is the formation of disulphide bonds between selected pairs of cysteines in the oxidized state. Prediction of such disulphide bridges from sequence is challenging because the number of possible cysteine pairings grows rapidly as the number of cysteines in a protein increases. Here, we describe an SVM (support vector machine) model for the prediction of cystine connectivity in a protein sequence with and without a priori knowledge of the bonding state. We make use of a new encoding scheme based on physico-chemical properties and statistical features (probability of occurrence of each amino acid residue in different secondary structure states, along with PSI-BLAST profiles). We evaluate our method on the SPX dataset (an extension of SP39 (Swiss-Prot 39) and SP41 (Swiss-Prot 41) with known disulphide information from PDB) and compare our results with the recursive neural network model described for the same dataset.
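To make the combinatorial difficulty mentioned in this abstract concrete: if all 2B cysteines of a chain are bonded, the number of distinct ways to pair them into B bridges is the number of perfect matchings, (2B - 1)!! = 1 x 3 x 5 x ... x (2B - 1). The sketch below computes this count; the function name is an assumption for illustration, and the formula is a standard counting result rather than something stated in the abstract.

```python
def connectivity_patterns(n_bonds):
    """Number of ways to pair 2 * n_bonds bonded cysteines into n_bonds bridges.

    This is the double factorial (2 * n_bonds - 1)!!, the count of perfect
    matchings on 2 * n_bonds labelled items.
    """
    if n_bonds < 0:
        raise ValueError("n_bonds must be non-negative")
    count = 1
    for k in range(1, 2 * n_bonds, 2):  # 1, 3, 5, ..., 2 * n_bonds - 1
        count *= k
    return count


if __name__ == "__main__":
    for bonds in range(1, 6):
        print(bonds, connectivity_patterns(bonds))
```

With 2, 3, 4 and 5 bridges the count is 3, 15, 105 and 945, which shows why exhaustive enumeration stops being practical for cysteine-rich chains.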

15.
Cheng J, Saigo H, Baldi P. Proteins, 2006, 62(3): 617-629
The formation of disulphide bridges between cysteines plays an important role in protein folding, structure, function, and evolution. Here, we develop new methods for predicting disulphide bridges in proteins. We first build a large curated data set of proteins containing disulphide bridges to extract relevant statistics. We then use kernel methods to predict whether a given protein chain contains intrachain disulphide bridges or not, and recursive neural networks to predict the bonding probabilities of each pair of cysteines in the chain. These probabilities in turn lead to an accurate estimation of the total number of disulphide bridges and to a weighted graph matching problem that can be addressed efficiently to infer the global disulphide bridge connectivity pattern. This approach can be applied both in situations where the bonded state of each cysteine is known and in ab initio mode, where the state is unknown. Furthermore, it can easily cope with chains containing an arbitrary number of disulphide bridges, overcoming one of the major limitations of previous approaches. It can classify individual cysteine residues as bonded or nonbonded with 87% specificity and 89% sensitivity. The estimate for the total number of bridges in each chain is correct 71% of the time, and within one of the true value more than 94% of the time. The prediction of the overall disulphide connectivity pattern is exact in about 51% of the chains. In addition to using profiles in the input to leverage evolutionary information, including true (but not predicted) secondary structure and solvent accessibility information yields small but noticeable improvements. Finally, once the system is trained, predictions can be computed rapidly on a proteomic or protein-engineering scale. The disulphide bridge prediction server (DIpro), software, and datasets are available through www.igb.uci.edu/servers/psss.html.
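The step from pairwise bonding probabilities to a global connectivity pattern can be pictured as choosing the pairing with the highest total score. The sketch below does this by exhaustive recursion over all pairings. It is only a toy illustration under stated assumptions: the function name, the input layout, and the example scores are hypothetical, every listed cysteine is assumed to be bonded, and exhaustive search is substituted for the efficient graph matching algorithm the abstract refers to.

```python
def best_pairing(positions, pair_score):
    """Return (total_score, pairs) for the highest-scoring complete pairing.

    positions  : sorted list of cysteine positions, all assumed bonded
    pair_score : dict mapping (i, j) with i < j to a bonding score
    """
    if len(positions) % 2 != 0:
        raise ValueError("an even number of bonded cysteines is required")
    if not positions:
        return 0.0, []
    first, rest = positions[0], positions[1:]
    best_total, best_pairs = float("-inf"), []
    for k, partner in enumerate(rest):
        # Pair the first cysteine with this partner, then solve the remainder.
        remaining = rest[:k] + rest[k + 1:]
        sub_total, sub_pairs = best_pairing(remaining, pair_score)
        total = pair_score[(first, partner)] + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


if __name__ == "__main__":
    # Hypothetical pairwise scores for four cysteines.
    scores = {
        (3, 10): 0.1, (3, 25): 0.8, (3, 40): 0.2,
        (10, 25): 0.3, (10, 40): 0.9, (25, 40): 0.1,
    }
    print(best_pairing([3, 10, 25, 40], scores))
```

Because the number of pairings grows as a double factorial, this brute-force form is only usable for a handful of bridges; a polynomial-time maximum weight matching algorithm would be needed at scale.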

16.
A hidden neural network-based method is used to predict the bonding state of cysteines starting from the residue sequence of the protein chain. The method scores as high as 89% and 86% per cysteine residue and per protein, respectively, and thus outperforms other predictors of the same category. We then explore the efficacy of our predictor in computing the disulfide content of the whole proteomes of Escherichia coli (K12 and O157), Aeropyrum pernix, Thermotoga maritima, and Homo sapiens. We find that the percentage of extracellular disulfide-containing proteins is higher than that of intracellular ones, and that the human proteome is by far the one with the highest content of sulfur-sulfur linkages in proteins.

17.
Pellequer JL, Chen SW. Proteins, 2006, 65(1): 192-202
The key issue for disulfide bond engineering is to select the most appropriate location in the protein. By surveying the structures of experimentally engineered disulfide bonds, we found that about half of them have a geometry incompatible with any native disulfide bond geometry. To improve on current prediction methods, which tend to apply either ideal geometric or energetic criteria to single three-dimensional structures, we have combined a novel computational protocol with the use of multiple protein structures to take protein backbone flexibility into account. The multiple structures can be selected from independently determined crystal structures of identical proteins, models from nuclear magnetic resonance experiments, or crystal structures of homologous proteins. We have validated our approach by comparing the predictions with known disulfide bonds. The accuracy of prediction for native disulfide bonds reaches 99.6%. In a more stringent test on the reported engineered disulfide bonds, we obtained a success rate of 93%. Our protocol also determines the oxido-reduction state of a predicted disulfide bond and the corresponding mutational cost. From the energy ranking, the user can easily choose the top predicted sites for mutagenesis experiments. Our method provides information about the local stability of the surroundings of the engineered disulfide bond.

18.
Native type III collagen and procollagen were prepared from fetal bovine skin. Examination of the cleavage products produced by digestion with tadpole collagenase demonstrated that the three pα1(III) chains of type III procollagen were linked together by disulfide bonds occurring at both the amino-terminal and carboxy-terminal portions of the molecule. Type III collagen contained interchain disulfide bonds only in the carboxy-terminal region of the molecule. After digestion of procollagen with bacterial collagenase, an amino-terminal, triple-stranded peptide fragment was isolated. The reduced and alkylated chain constituents of this fragment had molecular weights of about 21 000. After digestion of procollagen with cyanogen bromide, a related triple-stranded fragment was isolated. The chains of the cyanogen bromide fragment had a molecular weight of about 27 000. When the collagenase-derived peptide was fully reduced and alkylated, it became susceptible to further digestion with bacterial collagenase. This treatment released a fragment of about 97 amino acid residues which contained 12 cysteine residues and had an amino acid composition typical of globular proteins. A second, non-helical fragment of about 48 amino acid residues contained three cysteines. This latter fragment is formed from sequences that overlap the amino-terminal region of the collagen α1(III) chain by 20 amino acids and possesses an antigenic determinant specific for the α1(III) chain. The collagenase-sensitive region exposed by reduction comprised about 33 amino acid residues. It was recovered as a mixture of small peptides. These results indicate that the amino-terminal region of type III procollagen has the same type of structure as the homologous region of type I procollagen. It consists of a globular, a collagen-like and a non-helical domain. Interchain disulfide bonding and the occurrence of cysteines in the non-helical domain are, however, unique to type III procollagen.

19.
Cloning and molecular modeling of the goose interleukin-2 gene   Total citations: 1 (self-citations: 0, citations by others: 1)
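The nucleotide sequences of chicken, duck, and turkey IL-2 were compared, primers were designed in their conserved regions, and the nucleotide sequence of goose interleukin-2 (goIL-2) was amplified by RT-PCR and cloned. The sequence consists of 768 nt and encodes a precursor protein of 141 amino acids. The goIL-2 nucleotide and amino acid sequences share 90.1% and 83.6% homology, respectively, with those of duck IL-2 (duIL-2), 69.7%-75% and 61.0%-63.1% with those of chicken, turkey, and quail IL-2, and 25%-30% and 14%-17% with those of mammalian IL-2. Amino acid sequence analysis showed a 21-amino-acid signal peptide at the N-terminus and 4 cysteines that form 2 intrachain disulfide bonds. Kinetic analysis of in vitro goIL-2 mRNA expression showed that goIL-2 mRNA could be detected in splenic T lymphocytes from 2 h to 24 h after Con A induction. Three-dimensional structure prediction indicated that the goIL-2 protein consists of four α-helices (A, B, C, and D) and two β-sheets. Phylogenetic analysis showed that goIL-2 is most closely related to duIL-2.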

20.
Integral membrane proteins (iMPs) are challenging targets for structure determination because of the substantial experimental difficulties involved in their sample preparation. Accordingly, success rates of large-scale structural genomics consortia are much lower for this class of molecules than for globular targets, underscoring the pressing need for predictive strategies to identify iMPs that are more likely to overcome laboratory bottlenecks. On the basis of the target status information available in the TargetDB repository, we describe the first large-scale analysis of the experimental behavior of iMPs. Using information on recalcitrant and propagating iMP targets as negative and positive sets, respectively, we present naive Bayes classifiers capable of predicting, from sequence alone, those proteins that are more amenable to cloning, expression, and solubilization studies. Protein sequences are represented in a space of 72 features, including amino acid composition, occurrence of amino acid groups, ratios between residue groups, and hydrophobicity measures. Taking into account the unequal representation of the main taxonomic groups in the TargetDB sequence database had a beneficial effect on the prediction results. The classifiers achieve accuracies of 70%, 63-70%, and 61% in predicting the amenability of iMPs to cloning, expression, and solubilization, respectively, thus making them useful tools in target selection for structure determination. Our assessment of prediction results clearly demonstrates that classifiers based on single features do not possess acceptable discriminative power and that the experimental behavior of iMPs is imprinted in their primary sequence through relationships between a restricted set of key properties. In most cases, sets of 10-20 protein features were found to be actually relevant, most notably the content of isoleucine, valine, and positively charged residues.
