
1.

Prediction of protein homo-oligomer types by pseudo amino acid composition: Approached with an improved feature extraction and Naive Bayes Feature Fusion

Zhang SW Pan Q Zhang HC Shao ZC Shi JY 《Amino acids》2006,30(4):461-468

Summary. The interaction of non-covalently bound monomeric protein subunits forms oligomers. The oligomeric proteins are superior to the monomers within the scope of functional evolution of biomacromolecules. Such complexes are involved in various biological processes and play an important role. It is highly desirable to predict oligomer types automatically from their sequence. Here, based on the concept of pseudo amino acid composition, an improved feature extraction method using a weighted auto-correlation function of amino acid residue index, together with a Naive Bayes multi-feature fusion algorithm, is proposed and applied to predict protein homo-oligomer types. We used support vector machines (SVMs) as base classifiers in order to obtain better results. For example, the total accuracies of the A, B, C, D and E sets based on this improved feature extraction method are 77.63, 77.16, 76.46, 76.70 and 75.06%, respectively, in the jackknife test, which are 6.39, 5.92, 5.22, 5.46 and 3.82% higher than that of the G set based on the conventional amino acid composition method with the same SVM. Compared with Chou’s feature extraction method of incorporating the quasi-sequence-order effect, our method can increase the total accuracy by 3.51 to 1.01%. The total accuracy improves from 79.66 to 80.83% by using the Naive Bayes Feature Fusion algorithm. These results show: 1) the improved feature extraction method is effective and feasible, and the feature vectors based on this method may contain more protein quaternary structure information and appear to capture essential information about the composition and hydrophobicity of residues in the surface patches that are buried in the interfaces of associated subunits; 2) the Naive Bayes Feature Fusion algorithm and SVM can be regarded as a powerful computational tool for predicting protein homo-oligomer types.
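As a rough illustration of the auto-correlation idea mentioned in this abstract, the sketch below computes a plain (unweighted) auto-correlation of a per-residue index over several sequence lags. The abstract does not specify the weighting scheme or the residue index used, so the function name `autocorrelation_features`, the `toy_index` values, and the normalization by the number of residue pairs are all assumptions made for illustration only.

```python
from typing import Dict, List


def autocorrelation_features(sequence: str, index: Dict[str, float], max_lag: int) -> List[float]:
    """Return one auto-correlation value per lag 1..max_lag.

    Each value is the mean product of index values for residues `lag`
    positions apart. Residues missing from `index` are skipped.
    """
    values = [index[r] for r in sequence if r in index]
    features = []
    for lag in range(1, max_lag + 1):
        n_pairs = len(values) - lag
        if n_pairs <= 0:
            # Sequence too short for this lag: emit a neutral value.
            features.append(0.0)
            continue
        total = sum(values[i] * values[i + lag] for i in range(n_pairs))
        features.append(total / n_pairs)
    return features


# Hypothetical two-letter index, used only to keep the example easy to check.
toy_index = {"A": 1.0, "C": -1.0}
features = autocorrelation_features("ACAC", toy_index, max_lag=3)
```

In a pseudo amino acid composition setting, values like these would typically be appended to the 20 amino acid frequencies to form the final feature vector.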

2.

Prediction of protein relative solvent accessibility with a two-stage SVM approach

Nguyen MN Rajapakse JC 《Proteins》2005,59(1):30-37

Information on relative solvent accessibility (RSA) of amino acid residues in proteins provides valuable clues to the prediction of protein structure and function. A two-stage approach with support vector machines (SVMs) is proposed, where an SVM predictor is introduced to the output of the single-stage SVM approach to take into account the contextual relationships among solvent accessibilities for the prediction. By using the position-specific scoring matrices (PSSMs) generated by PSI-BLAST, the two-stage SVM approach achieves accuracies up to 90.4% and 90.2% on the Manesh data set of 215 protein structures and the RS126 data set of 126 nonhomologous globular proteins, respectively, which are better than the highest published scores on both data sets to date. A Web server for protein RSA prediction using a two-stage SVM method has been developed and is available (http://birc.ntu.edu.sg/~pas0186457/rsa.html).
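The two-stage idea described here, feeding first-stage outputs for neighboring residues into a second predictor, can be sketched as a simple windowing step. This is only an illustration: the window half-width, the zero padding at the sequence ends, and the name `window_features` are assumptions, and the abstract does not state how the authors encoded the first-stage output.

```python
from typing import List


def window_features(scores: List[float], half_width: int = 2, pad: float = 0.0) -> List[List[float]]:
    """Build second-stage inputs from first-stage per-residue scores.

    For residue i, the feature row holds the scores of residues
    i - half_width .. i + half_width, padded with `pad` past either end.
    """
    n = len(scores)
    features = []
    for i in range(n):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.append(scores[j] if 0 <= j < n else pad)
        features.append(row)
    return features


# Hypothetical first-stage decision values for a five-residue fragment.
first_stage = [0.9, -0.4, 0.1, 0.7, -0.8]
second_stage_inputs = window_features(first_stage, half_width=1)
```

Each row of `second_stage_inputs` would then be passed to the second-stage classifier, which can learn that a residue's state depends on the predicted states of its neighbors.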

3.

SVM based prediction of RNA‐binding proteins using binding residues and evolutionary information

Manish Kumar M. Michael Gromiha Gajendra P. S. Raghava 《Journal of molecular recognition : JMR》2011,24(2):303-313


4.

Prediction of zinc-binding sites in proteins from sequence.

Nanjiang Shu Tuping Zhou Sven Hovmöller 《Bioinformatics (Oxford, England)》2008,24(6):775-782

MOTIVATION: Motivated by the abundance, importance and unique functionality of zinc, both biologically and physiologically, we have developed an improved method for the prediction of zinc-binding sites in proteins from their amino acid sequences. RESULTS: By combining support vector machine (SVM) and homology-based predictions, our method predicts zinc-binding Cys, His, Asp and Glu with 75% precision (86% for Cys and His only) at 50% recall according to a 5-fold cross-validation on a non-redundant set of protein chains from the Protein Data Bank (PDB) (2727 chains, 235 of which bind zinc). Consequently, our method predicts zinc-binding Cys and His with 10% higher precision at different recall levels compared to a recently published method when tested on the same dataset. AVAILABILITY: The program is available for download at www.fos.su.se/~nanjiang/zincpred/download/

5.

Predicting sub-cellular localization of tRNA synthetases from their primary structures

Panwar B Raghava GP 《Amino acids》2012,42(5):1703-1713

Since the endosymbiotic events occurred, all genes of mitochondrial aminoacyl tRNA synthetase (AARS) have been lost or transferred from the ancestral mitochondrial genome into the nucleus. The canonical pattern is that both cytosolic and mitochondrial AARSs coexist in the nuclear genome. In the present scenario, all mitochondrial AARSs are nucleus-encoded, synthesized on cytosolic ribosomes and post-translationally imported from the cytosol into the mitochondria in eukaryotic cells. The site-based discrimination between similar types of enzymes is very challenging because they have almost the same physico-chemical properties. It is very important to predict the sub-cellular location of AARSs in order to understand mitochondrial protein synthesis. We have analyzed and optimized the distinguishable patterns between cytosolic and mitochondrial AARSs. Firstly, support vector machine (SVM)-based modules were developed using amino acid and dipeptide compositions and achieved Matthews correlation coefficients (MCC) of 0.82 and 0.73, respectively. Secondly, we developed SVM modules using the position-specific scoring matrix and achieved a maximum MCC of 0.78. Thirdly, we developed SVM modules using N-terminal, intermediate residues, C-terminal and split amino acid composition (SAAC) and achieved MCC of 0.82, 0.70, 0.39 and 0.86, respectively. Finally, an SVM module was developed using the selected attributes of split amino acid composition (SA-SAAC) approach and achieved an MCC of 0.92 with an accuracy of 96.00%. All modules were trained and tested on a non-redundant data set and evaluated using the fivefold cross-validation technique. On the independent data sets, the SA-SAAC based prediction model achieved an MCC of 0.95 with an accuracy of 97.77%. The web-server 'MARSpred' based on the above study is available at http://www.imtech.res.in/raghava/marspred/.
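For readers unfamiliar with the metric quoted throughout this abstract, the Matthews correlation coefficient can be computed from a binary confusion matrix as shown below. The counts in the example are made up for illustration and are not taken from the paper; returning 0.0 when the denominator vanishes is a common convention, adopted here as an assumption.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when any marginal total is zero; report 0.0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts: 50 mitochondrial and 50 cytosolic sequences.
example_mcc = mcc(tp=45, tn=45, fp=5, fn=5)
example_acc = accuracy(tp=45, tn=45, fp=5, fn=5)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy.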

6.

Design of accurate predictors for DNA-binding sites in proteins using hybrid SVM-PSSM method

Ho SY Yu FC Chang CY Huang HL 《Bio Systems》2007,90(1):234-241

In this paper, we investigate the design of accurate predictors for DNA-binding sites in proteins from amino acid sequences. As a result, we propose a hybrid method using a support vector machine (SVM) in conjunction with evolutionary information of amino acid sequences in terms of their position-specific scoring matrices (PSSMs) for prediction of DNA-binding sites. Considering that the numbers of binding and non-binding residues in proteins are significantly unequal, two additional weights as well as SVM parameters are analyzed and adopted to maximize net prediction (NP, an average of sensitivity and specificity) accuracy. To evaluate the generalization ability of the proposed method SVM-PSSM, a DNA-binding dataset PDC-59 consisting of 59 protein chains with low sequence identity to each other is additionally established. The SVM-based method using the same six-fold cross-validation procedure and PSSM features has NP=80.15% for the training dataset PDNA-62 and NP=69.54% for the test dataset PDC-59, which are much better than the existing neural network-based method by increasing the NP values for training and test accuracies up to 13.45% and 16.53%, respectively. Simulation results reveal that SVM-PSSM performs well in predicting DNA-binding sites of novel proteins from amino acid sequences.
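The net prediction (NP) measure defined in this abstract, the average of sensitivity and specificity, can be written out as follows. The example counts are hypothetical and chosen only to show why NP is more informative than plain accuracy when binding residues are rare.

```python
def net_prediction(tp: int, tn: int, fp: int, fn: int) -> float:
    """NP = (sensitivity + specificity) / 2, as defined in the abstract."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("NP needs at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    return (sensitivity + specificity) / 2


# Hypothetical data: 100 binding and 900 non-binding residues, with a
# classifier that labels every residue as non-binding.
trivial_np = net_prediction(tp=0, tn=900, fp=0, fn=100)
trivial_accuracy = (0 + 900) / 1000
```

Here the trivial classifier reaches 90% accuracy but only 50% NP, which is the behavior that motivates optimizing NP on imbalanced data.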

7.

Structure-based identification of catalytic residues

Yahalom R Reshef D Wiener A Frankel S Kalisman N Lerner B Keasar C 《Proteins》2011,79(6):1952-1963

The identification of catalytic residues is an essential step in functional characterization of enzymes. We present a purely structural approach to this problem, which is motivated by the difficulty of evolution-based methods to annotate structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors in learning catalytic residues more than errors in learning non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and comparable to classifiers that use both structural and evolutionary features. In addition to the evaluation of the performance of catalytic residue identification, we also present detailed case studies on three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our own-designed database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
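Step (2) of the imbalance handling described above, under-sampling the non-catalytic residues before training, might look like the minimal sketch below. The negative-to-positive `ratio`, the fixed random seed, and the name `undersample` are assumptions for illustration; the abstract does not say what ratio the authors used.

```python
import random
from typing import List, Sequence, Tuple


def undersample(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    ratio: float = 1.0,
    seed: int = 0,
) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep all positives (label 1) and a random subset of negatives (label 0).

    At most `ratio` negatives are kept per positive example.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(negatives), int(round(ratio * len(positives))))
    chosen = sorted(positives + rng.sample(negatives, n_keep))
    return [features[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical data: 3 catalytic residues among 30, one feature each.
all_labels = [1] * 3 + [0] * 27
all_features = [[float(i)] for i in range(30)]
sub_features, sub_labels = undersample(all_features, all_labels, ratio=2.0)
```

Step (3), penalizing errors on catalytic residues more heavily, is usually handled separately through per-class misclassification costs in the SVM trainer rather than by resampling.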

8.

Thin-layer separation and quantification of bile acids

William T. Beher Sofia Stradnieks Grace J. Lin 《Steroids》1982,39(3):313-323

A method has been developed for quantification of total free and conjugated bile acids separated on silica gel HR coated thin-layer chromatography plates. Aliquots of bile acid solutions are applied to channeled plates which are developed with either ethyl acetate:isooctane:glacial acetic acid 10:10:2 v/v for free bile acid separation, or chloroform:methanol:glacial acetic acid:water 130:50:4:8 v/v for conjugated bile acid separation. Bile acids are determined directly in serial areas of silica gel by treating gel areas suspended in tris buffer with resazurin reagent. The method is quantitative and as little as 0.1 μg of bile acid is readily determined. Application of the method to determinations of bile acids in crude fecal extracts is described.

9.

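Protein secondary structure prediction combining support vector machines and Bayesian methods

王宝文 王水星 刘文远 于家新 《生物信息学》2010,8(1):75-77,81

A two-stage classifier is constructed for protein secondary structure prediction. The first stage consists of support vector machine classifiers; the second stage applies Bayesian discrimination to the results predicted in the first stage. The improvement in prediction performance shows that combining support vector machines with the Bayesian method outperforms using support vector machines alone. It also demonstrates that residues influence one another when forming secondary structure.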

10.

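A support vector machine model for predicting the activity of Bacillus thuringiensis insecticidal crystal proteins

林毅 蔡福营 张光亚 《生物工程学报》2007,23(1):127-132

Using the uniform design (UD) method, a support vector machine (SVM) model was built relating the amino acid composition features of Bacillus thuringiensis (Bt) insecticidal crystal proteins to their insecticidal activity. With a penalty coefficient of 0.01, an epsilon value of 0.2, a gamma value of 0.05, and a threshold of 0.5, the model predicted the insecticidal activity of Bt insecticidal crystal proteins with an average accuracy of 73%.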

11.

Distribution analyses of chain substituents of lipoteichoic acids by chemical degradation

J Schurek W Fischer 《European journal of biochemistry》1989,186(3):649-655

The lipoteichoic acid from Lactococcus lactis Kiel 48337 was analyzed. It had 61% of its glycerophosphate residues substituted with alpha-D-galactopyranosyl residues. Non-substituted glycerophosphate residues were split off by two alkaline hydrolyses and an intermediate enzymatic phosphomonoester cleavage. The resulting (GalGroP)nGroGal and (GalGroP)nGlc2Gro oligomers were separated by chromatography on DEAE-Sephadex into 10 pairs of molecular species with n from 1 to 10. The relative frequencies of GalGro and these oligomers were close to the values calculated by computer simulation for a random distribution of chain substituents. A similar series of oligomers was obtained in one step by hydrolysis of the lipoteichoic acid with 98% (by vol.) acetic acid. Due to side reactions, the picture was less precise but nevertheless indicative of the same distribution pattern. The data provide indirect evidence that the alanine ester substituents of the native lipoteichoic acid (Ala/P = 0.38) occupy the free positions between the galactosylated oligomers and are therefore themselves distributed randomly.

12.

Formation of anhydrosugars in the chemical depolymerization of heparin.

J E Shively H E Conrad 《Biochemistry》1976,15(18):3932-3942

In the reactions used to break heparin down to mono- and oligosaccharides, anhydrosugars are formed at two stages. The first of these is the well-known cleavage of heparin with nitrous acid to convert the N-sulfated D-glucosamines to anhydro-D-mannose residues; this reaction has been studied in detail. It is demonstrated here that only low pH (less than 2.5) reaction conditions favor the deamination of N-sulfated D-glucosamine residues; the reaction proceeds very slowly at pH 3.5 or above. On the other hand, N-unsubstituted amino sugars are deaminated at a maximum rate at pH 4 with markedly reduced rates at pH 2 or pH 6. At room temperature solutions of nitrous acid lose one-fourth to one-third of their capacity to deaminate amino sugars in 1 h at all pHs. A low pH nitrous acid reagent which will convert heparin quantitatively to its deamination products in 10 min at room temperature is described, and a comparison of the effectiveness of this reagent with other commonly used nitrous acid reagents is presented. It is also shown that conditions used for acid hydrolysis of heparin convert approximately one-fourth of the L-iduronosyluronic acid 2-sulfate residues to a 2,5-anhydrouronic acid. This product is an artifact of the reaction conditions, and its formation represents one of several pathways followed in the acid-catalyzed cleavage of the glycosidic bond of the sulfated L-idosyluronic acid residues.

13.

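Prediction of protein-protein interactions based on segmented amino acid composition

罗丽 张绍武 陈伟 潘泉 《生物物理学报》2009,25(4):282-286

Research on protein interactions helps reveal many fundamental questions about life processes, aids disease prevention and diagnosis, and provides an important reference for drug development. This paper first constructs a protein interaction database and then proposes a segmented amino acid composition feature extraction method for predicting protein interactions. Under 10-fold cross-validation (10CV), the three-segment amino acid composition feature extraction method with a support vector machine achieves a total prediction accuracy of 86.2%, 2.31 percentage points higher than the conventional amino acid composition method. Using Guo's database and test method, the three-segment method achieves a total prediction accuracy of 90.11%, 2.75 percentage points higher than Guo's auto-correlation function feature extraction method. This shows that the segmented amino acid composition feature extraction method can be applied effectively to protein interaction prediction.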
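A minimal sketch of a segmented amino acid composition feature, assuming the sequence is cut into equal-length consecutive segments and the 20 amino acid frequencies are computed per segment and concatenated. The abstract does not describe how segment boundaries are chosen or how the two proteins of a pair are combined, so the boundary rule and the names `composition` and `segmented_composition` are assumptions.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition(segment: str) -> List[float]:
    """Frequency of each standard amino acid in `segment` (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in segment:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def segmented_composition(sequence: str, n_segments: int = 3) -> List[float]:
    """Concatenate per-segment compositions into one 20 * n_segments vector."""
    n = len(sequence)
    # Integer boundaries that split the sequence as evenly as possible.
    bounds = [k * n // n_segments for k in range(n_segments + 1)]
    vector: List[float] = []
    for start, end in zip(bounds, bounds[1:]):
        vector.extend(composition(sequence[start:end]))
    return vector


# Toy sequence whose three segments are AAA, CCC and DDD.
vector = segmented_composition("AAACCCDDD", n_segments=3)
```

Unlike the conventional 20-dimensional composition, this vector retains coarse positional information, since the same residue contributes to different entries depending on which segment it falls in.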

14.

Computational prediction of heme-binding residues by exploiting residue interaction network

Liu R Hu J 《PloS one》2011,6(10):e25560

Computational identification of heme-binding residues is beneficial for predicting and designing novel heme proteins. Here we propose a novel method for heme-binding residue prediction by exploiting topological properties of these residues in the residue interaction networks derived from three-dimensional structures. Comprehensive analysis showed that key residues located in heme-binding regions are generally associated with nodes of higher degree, closeness and betweenness, but lower clustering coefficient in the network. HemeNet, a support vector machine (SVM) based predictor, was developed to identify heme-binding residues by combining topological features with existing sequence and structural features. The results showed that incorporation of network-based features significantly improved the prediction performance. We also compared the residue interaction networks of heme proteins before and after heme binding and found that the topological features can characterize the heme-binding sites of apo structures as well as those of holo structures, which led to reliable performance improvement when we applied HemeNet to predicting the binding residues of proteins in the heme-free state. The HemeNet web server is freely accessible at http://mleg.cse.sc.edu/hemeNet/.
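Three of the topological features named in this abstract (degree, clustering coefficient and closeness) can be computed on a small undirected graph as sketched below. The residue labels and contacts are hypothetical, how contacts are derived from a 3D structure is not covered, and betweenness is omitted to keep the example short; none of this is taken from the HemeNet implementation.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from a list of residue-residue contacts."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph: Graph, node: str) -> int:
    return len(graph[node])


def clustering(graph: Graph, node: str) -> float:
    """Fraction of neighbor pairs of `node` that are themselves in contact."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in graph[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def closeness(graph: Graph, node: str) -> float:
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical contact list for five residues R1..R5.
network = build_graph([("R1", "R2"), ("R1", "R3"), ("R2", "R3"), ("R1", "R4"), ("R4", "R5")])
```

In this toy network R1 has the highest degree and closeness but a low clustering coefficient, matching the qualitative profile the abstract reports for heme-binding residues.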

15.

Predicting protein solubility with a hybrid approach by pseudo amino acid composition

Xiaohui N Nana L Feng S Xuehai H Jingbo X Huijuan X 《Protein and peptide letters》2010,17(12):1466-1472

Protein solubility plays a major role in understanding the crystal growth and crystallization process of proteins. How to predict the propensity of a protein to be soluble or to form inclusion bodies is a long-standing problem that has not been well resolved. After choosing almost 10,000 protein sequences from the NCBI database and eliminating the sequences with 90% homologous similarity by CD-HIT, 5692 sequences remained. Using Chou's pseudo amino acid composition features, we predict soluble proteins with three methods: support vector machine (SVM), back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by a re-substitution test and a 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 0.9288 and a Matthews Correlation Coefficient (MCC) of 0.8513. Meanwhile, the other two methods are better than the BP Neural Network in the 10-fold cross-validation test. The hybrid method based on SVM and BP Neural Network is the best, with an average accuracy of 0.8678 and an average MCC of 0.7233. Although all three methods achieve considerable performance, the hybrid method is deemed to be the best according to the performance comparison.

16.

Structural studies of Bacillus subtilis glutamine synthetase. Further purification, sulfhydryl groups, and the NH2-terminal amino acid sequence.

R Hsu S J Singer P Keim T F Deuel R L Heinrikson 《Archives of biochemistry and biophysics》1977,178(2):644-651

A new procedure for the isolation of Bacillus subtilis glutamine synthetase in a high state of purity is described. Automated Edman degradation of the reduced and carboxymethylated protein revealed a single NH₂-terminal amino acid sequence: H₂N-Ala-Lys-Tyr-Thr-Arg⁵-Glu-Asp-Ile-Gln-Lys¹⁰-Leu-Val-Ser-Glu-Ser¹⁵-CM-Cys-Val-Thr-Tyr-Ile²⁰-Ser-Leu-Gly-Phe-Ser²⁵-Asn-Ser-Leu-Gly- -. The recovery of phenylthiohydantoin (PTH)-amino acids and the single sequence obtained are consistent with the view that the dodecameric enzyme of molecular weight 600,000 is composed of identical subunits. Earlier observations of multiple sequences (80% PTH-Ala and 20% PTH-Gly as NH₂-terminal residues) appear to have been due to impurities removed by the final purification step described herein, which involves column chromatography on hydroxyapatite. Evidence for the existence of one disulfide bond and two free cysteine residues per subunit of dodecameric glutamine synthetase was obtained by alkylation of the denatured enzyme in the presence and absence of reducing agents. This distribution of the four cysteine residues in the enzyme monomer was confirmed by titration of the enzyme denatured in sodium dodecyl sulfate with 5,5′-dithiobis(2-nitrobenzoic acid).

17.

Yeast phenylalanyl-tRNA synthetase. Properties of the histidyl residues.

J P Raffin P Remy 《Biochimica et biophysica acta》1978,520(1):164-174

Reactivity of the histidyl groups of yeast phenylalanyl-tRNA synthetase was studied in the absence or presence of substrates. In the absence of substrates about 10 histidine residues were found to react with similar kinetic constants. Phenylalanine at 10⁻³ M was found to protect two histidyl residues; increasing the amino acid concentration to 5 × 10⁻³ M resulted in the protection of two more histidyl groups. tRNAPhe did not afford any protection to histidine residues, but acylated phenylalanyl-tRNA (Phe-tRNAPhe) protected two of the four histidyl groups already protected by phenylalanine. These results suggest the existence of two different sets of accepting sites for phenylalanine: one specific for the free amino acid, the other one specific for the amino acid linked to the tRNA, but being accessible to free phenylalanine, with a somewhat lower binding constant. ATP was found to mask around four histidyl residues against diethylpyrocarbonate modification. By photoirradiation of the enzyme-phenylalanine complex in the presence of rose bengal, a significant amount of amino acid was bound to the alpha subunit (Mr = 73 000) of phenylalanyl-tRNA synthetase, confirming that the amino acid binding site is located on this subunit, as previously suggested by modification of thiol groups. Upon irradiation of an enzyme-tRNA complex, almost no covalent binding of tRNA occurred during enzyme inactivation, suggesting that the histidyl residues involved in the enzymic activity are not required for tRNA binding.

18.

Identification of catalytic residues from protein structure using support vector machine with sequence and structural features

Pugalenthi G Kumar KK Suganthan PN Gangal R 《Biochemical and biophysical research communications》2008,367(3):630-634

Identification of catalytic residues can provide valuable insights into protein function. With the increasing number of protein 3D structures solved by X-ray crystallography and NMR techniques, it is highly desirable to develop an efficient method to identify their catalytic sites. In this paper, we present an SVM method for the identification of catalytic residues using sequence and structural features. The algorithm was applied to the 2096 catalytic residues derived from the Catalytic Site Atlas database. We obtained an overall prediction accuracy of 88.6% from 10-fold cross validation and 95.76% from the resubstitution test. Testing on the 254 catalytic residues shows our method can correctly predict all 254 residues. This result suggests the usefulness of our approach for facilitating the identification of catalytic residues from protein structures.

19.

Prediction of transmembrane regions of beta-barrel proteins using ANN- and SVM-based methods

Natt NK Kaur H Raghava GP 《Proteins》2004,56(1):11-18

This article describes a method developed for predicting transmembrane beta-barrel regions in membrane proteins using machine learning techniques: artificial neural network (ANN) and support vector machine (SVM). The ANN used in this study is a feed-forward neural network with a standard back-propagation training algorithm. The accuracy of the ANN-based method improved significantly, from 70.4% to 80.5%, when evolutionary information was added to a single sequence as a multiple sequence alignment obtained from PSI-BLAST. We have also developed an SVM-based method using a primary sequence as input and achieved an accuracy of 77.4%. The SVM model was modified by adding 36 physicochemical parameters to the amino acid sequence information. Finally, ANN- and SVM-based methods were combined to utilize the full potential of both techniques. The accuracy and Matthews correlation coefficient (MCC) value of SVM, ANN, and combined method are 78.5%, 80.5%, and 81.8%, and 0.55, 0.63, and 0.64, respectively. These methods were trained and tested on a nonredundant data set of 16 proteins, and performance was evaluated using "leave one out cross-validation" (LOOCV). Based on this study, we have developed a Web server, TBBPred, for predicting transmembrane beta-barrel regions in proteins (available at http://www.imtech.res.in/raghava/tbbpred).

20.

A two-stage classifier for identification of protein-protein interface residues

Yan C Dobbs D Honavar V 《Bioinformatics (Oxford, England)》2004,20(Z1):i371-i378

MOTIVATION: The ability to identify protein-protein interaction sites and to detect specific amino acid residues that contribute to the specificity and affinity of protein interactions has important implications for problems ranging from rational drug design to analysis of metabolic and signal transduction networks. RESULTS: We have developed a two-stage method consisting of a support vector machine (SVM) and a Bayesian classifier for predicting surface residues of a protein that participate in protein-protein interactions. This approach exploits the fact that interface residues tend to form clusters in the primary amino acid sequence. Our results show that the proposed two-stage classifier outperforms previously published sequence-based methods for predicting interface residues. We also present results obtained using the two-stage classifier on an independent test set of seven CAPRI (Critical Assessment of PRedicted Interactions) targets. The success of the predictions is validated by examining the predictions in the context of the three-dimensional structures of protein complexes.