1.

TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences

Song J Tan H Wang M Webb GI Akutsu T 《PloS one》2012,7(2):e30361

The protein backbone torsion angles Phi and Psi describe rotation about the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those obtained with ANGLOR, one of the state-of-the-art prediction tools. Moreover, the predictions of TANGLE are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values of <1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
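The MAE figures quoted above are averages of per-residue angle errors. The following is a minimal sketch of how such an error could be computed, assuming that errors are wrapped to the range 0° to 180° so that 175° and -175° count as 10° apart. The abstract does not state how periodicity is handled, so the wrapping, the function names and the sample values are all illustrative assumptions.

```python
def angular_error(predicted_deg, observed_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    diff = abs(predicted_deg - observed_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_error(predicted, observed):
    """Mean of the per-residue angular errors of two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [angular_error(p, o) for p, o in zip(predicted, observed)]
    return sum(errors) / len(errors)


# Hypothetical Phi predictions and observed values for four residues.
predicted_phi = [-60.0, -120.0, 175.0, -70.0]
observed_phi = [-65.0, -100.0, -175.0, -50.0]
print(mean_absolute_error(predicted_phi, observed_phi))  # 13.75
```

Without the wrap-around step the third residue would contribute an error of 350° instead of 10°, which would inflate the average considerably.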

2.

ANGLOR: a composite machine-learning algorithm for protein backbone torsion angle prediction

Wu S Zhang Y 《PloS one》2008,3(10):e3400

We developed a composite machine-learning based algorithm, called ANGLOR, to predict real-value protein backbone torsion angles from amino acid sequences. The input features of ANGLOR include sequence profiles, predicted secondary structure and solvent accessibility. In a large-scale benchmarking test, the mean absolute error (MAE) of the phi/psi prediction is 28°/46°, which is approximately 10% lower than that generated by software in the literature. The prediction is statistically different from a random predictor (or a purely secondary-structure-based predictor) with p-value <1.0 × 10^-300 (or <1.0 × 10^-148) by the Wilcoxon signed rank test. For some residues (ILE, LEU, PRO and VAL) and especially the residues in helix and buried regions, the MAE of phi angles is much smaller (10-20°) than that in other environments. Thus, although the average accuracy of the ANGLOR prediction is still low, the portion of the accurately predicted dihedral angles may be useful in assisting protein fold recognition and ab initio 3D structure modeling.

3.

Using predicted shape string to enhance the accuracy of γ-turn prediction

Zhu Y Li T Li D Zhang Y Xiong W Sun J Tang Z Chen G 《Amino acids》2012,42(5):1749-1755

Numerous methods for predicting γ-turns in proteins have been developed. However, their results are generally poor, with a Matthews correlation coefficient (MCC) ≤0.18. Here, an attempt has been made to develop a method to improve the accuracy of γ-turn prediction. First, we employ the geometric mean metric as the optimization criterion to evaluate the performance of a support vector machine on the highly imbalanced γ-turn dataset. This metric tries to maximize both the sensitivity and the specificity while keeping them balanced. Second, a predictor that generates a protein shape string by structure alignment against the protein structure database has been designed, and the predicted shape string is introduced as a new variable for γ-turn prediction. On this basis, we have developed a new method for γ-turn prediction. After training and testing on the benchmark dataset of 320 non-homologous protein chains using fivefold cross-validation, the present method achieves excellent performance. The overall prediction accuracy Q_total reaches 92.2% and the MCC is 0.38, which outperform the existing γ-turn prediction methods. Our results indicate that the protein shape string is useful for predicting protein tight turns and that it is reasonable to use the dihedral angle information as a variable for machine learning to predict protein folding. The dataset used in this work and the software to generate predicted shape strings from the structure database can be obtained freely from an anonymous ftp site.
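The abstract relies on three standard quantities: overall accuracy, the geometric mean of sensitivity and specificity, and the MCC. The sketch below computes them from a binary confusion matrix using their textbook definitions. It is not the authors' code, and the counts in the example are invented to show how a rare positive class can combine high accuracy with a modest MCC.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, G-mean and MCC from raw counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Invented counts for an imbalanced problem: 50 positives among 1000 residues.
example = binary_metrics(tp=30, fp=50, tn=900, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because the geometric mean drops to zero when either sensitivity or specificity is zero, it penalizes a classifier that simply predicts the majority class, which plain accuracy does not.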

4.

Dihedral angle and secondary structure database of short amino acid fragments

Dayalan S Gooneratne ND Bevinakoppa S Schroder H 《Bioinformation》2006,1(3):78-80

Dihedral angles of amino acids are of considerable importance in protein tertiary structure prediction as they define the backbone of a protein and hence almost define the protein's entire conformation. Most ab initio protein structure prediction methods predict the secondary structure of a protein before predicting the tertiary structure because three-dimensional fold consists of repeating units of secondary structures. Hence, both dihedral angles and secondary structures are important in tertiary structure prediction of proteins. Here we describe a database called DASSD (Dihedral Angle and Secondary Structure Database of Short Amino acid Fragments) that contains dihedral angle values and secondary structure details of short amino acid fragments of lengths 1, 3 and 5. Information stored in this database was extracted from a set of 5,227 non-redundant high resolution (less than 2-angstroms) protein structures. In total, DASSD stores details for about 733,000 fragments. This database finds application in the development of ab initio protein structure prediction methods using fragment libraries and fragment assembly techniques. It is also useful in protein secondary structure prediction.

Availability

5.

Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks

Glennie Helles Rasmus Fonseca 《BMC bioinformatics》2009,10(1):338

Background

Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent and helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and do not have any apparent recurrent patterns, which reduces the overall accuracy of protein structure prediction methods. Fortunately, previous work has indicated that coil segments are in fact not completely random in structure and that flanking residues do seem to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While attempts to predict dihedral angles of coil segments have been made previously, none have, to our knowledge, presented comparable results for the probability distribution of dihedral angles.

6.

Prediction of RNA binding sites in a protein using SVM and PSSM profile

Kumar M Gromiha MM Raghava GP 《Proteins》2008,71(1):189-194

7.

BIPSPI+: Mining Type-Specific Datasets of Protein Complexes to Improve Protein Binding Site Prediction

《Journal of molecular biology》2022,434(11):167556

Computational approaches for predicting protein-protein interfaces are extremely useful for understanding and modelling the quaternary structure of protein assemblies. In particular, partner-specific binding site prediction methods allow delineating the specific residues that compose the interface of protein complexes. In recent years, new machine learning and other algorithmic approaches have been proposed to solve this problem. However, little effort has been made to find better training datasets to improve the performance of these methods. With the aim of demonstrating the importance of the training set compilation procedure, in this work we present BIPSPI+, a new version of our original server trained on carefully curated datasets that outperforms our original predictor. We show how prediction performance can be improved by selecting specific datasets that better describe particular types of protein interactions and interfaces (e.g. homo/hetero). In addition, our upgraded web server offers a new set of functionalities such as the sequence-structure prediction mode, hetero- or homo-complex specialization and a guided docking tool that computes 3D quaternary structure poses using the predicted interfaces. BIPSPI+ is freely available at https://bipspi.cnb.csic.es.

8.

Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training

Dor O Zhou Y 《Proteins》2007,66(4):838-845

An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q(3) (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained.
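Q(3) is defined above as the percentage of residues whose three-state class is predicted correctly. A minimal sketch of that calculation follows. The one-letter state codes (H for helix, E for strand, C for coil) and the two example strings are illustrative assumptions and are not taken from the paper.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical ten-residue chain: two of the ten states are mispredicted.
observed_states = "HHHHEEEECC"
predicted_states = "HHHCEEEECH"
print(q3_accuracy(predicted_states, observed_states))  # 80.0
```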

9.

Fluctuations of backbone torsion angles obtained from NMR‐determined structures and their prediction

Tuo Zhang Eshel Faraggi Yaoqi Zhou 《Proteins》2010,78(16):3353-3362

10.

New Insights into the Interdependence between Amino Acid Stereochemistry and Protein Structure

Alice Qinhua Zhou Diego Caballero Corey S. O’Hern Lynne Regan 《Biophysical journal》2013,105(10):2403-2411

To successfully design new proteins and understand the effects of mutations in natural proteins, we must understand the geometric and physicochemical principles underlying protein structure. The side chains of amino acids in peptides and proteins adopt specific dihedral angle combinations; however, we still do not have a fundamental quantitative understanding of why some side-chain dihedral angle combinations are highly populated and others are not. Here we employ a hard-sphere plus stereochemical constraint model of dipeptide mimetics to enumerate the side-chain dihedral angles of leucine (Leu) and isoleucine (Ile), and identify those conformations that are sterically allowed versus those that are not as a function of the backbone dihedral angles ϕ and ψ. We compare our results with the observed distributions of side-chain dihedral angles in proteins of known structure. With the hard-sphere plus stereochemical constraint model, we obtain agreement between the model predictions and the observed side-chain dihedral angle distributions for Leu and Ile. These results quantify the extent to which local, geometrical constraints determine protein side-chain conformations.

11.

Prediction of backbone dihedral angles and protein secondary structure using support vector machines

Petros Kountouris Jonathan D Hirst 《BMC bioinformatics》2009,10(1):437

Background

The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure.

12.

Exploiting multi-layered information to iteratively predict protein functions

Zhu W Hou J Chen YP 《Mathematical biosciences》2012,236(2):108-116

Background

Similarity-based computational methods are a useful tool for predicting protein functions from protein–protein interaction (PPI) datasets. Although various similarity-based prediction algorithms have been proposed, their prediction results have often been unsatisfactory. The purpose of this type of algorithm is to predict the functions of an unannotated protein from the functions of proteins that are similar to it. Therefore, the prediction quality largely depends on how to select a set of proper proteins (i.e., a prediction domain) from which the functions of an unannotated protein are predicted, and how to measure the similarity between proteins. Another issue with existing algorithms is that they treat function prediction as a one-off procedure, ignoring the fact that interactions amongst proteins are mutual and dynamic in terms of similarity when predicting functions. How to resolve these major issues to increase prediction quality remains a challenge in computational biology.

Results

In this paper, we propose an innovative approach to predict protein functions of unannotated proteins iteratively from a PPI dataset. The iterative approach takes into account the mutual and dynamic features of protein interactions when predicting functions, and addresses the issues of protein similarity measurement and prediction domain selection by introducing into the prediction algorithm a new semantic protein similarity and a method of selecting the multi-layer prediction domain. The new protein similarity is based on the multi-layered information carried by protein functions. The evaluations conducted on real protein interaction datasets demonstrated that the proposed iterative function prediction method outperformed other similar or non-iterative methods, and provided better prediction results.

Conclusions

The new protein similarity derived from multi-layered information of protein functions more reasonably reflects the intrinsic relationships among proteins, and significant improvement to the prediction quality can occur through incorporation of mutual and dynamic features of protein interactions into the prediction algorithm.

13.

Cyclic coordinate descent: A robotics algorithm for protein loop closure

Canutescu AA Dunbrack RL 《Protein science : a publication of the Protein Society》2003,12(5):963-972

In protein structure prediction, it is often the case that a protein segment must be adjusted to connect two fixed segments. This occurs during loop structure prediction in homology modeling as well as in ab initio structure prediction. Several algorithms for this purpose are based on the inverse Jacobian of the distance constraints with respect to dihedral angle degrees of freedom. These algorithms are sometimes unstable and fail to converge. We present an algorithm developed originally for inverse kinematics applications in robotics. In robotics, an end effector in the form of a robot hand must reach for an object in space by altering adjustable joint angles and arm lengths. In loop prediction, dihedral angles must be adjusted to move the C-terminal residue of a segment to superimpose on a fixed anchor residue in the protein structure. The algorithm, referred to as cyclic coordinate descent or CCD, involves adjusting one dihedral angle at a time to minimize the sum of the squared distances between three backbone atoms of the moving C-terminal anchor and the corresponding atoms in the fixed C-terminal anchor. The result is an equation in one variable for the proposed change in each dihedral. The algorithm proceeds iteratively through all of the adjustable dihedral angles from the N-terminal to the C-terminal end of the loop. CCD is suitable as a component of loop prediction methods that generate large numbers of trial structures. It succeeds in closing loops in a large test set 99.79% of the time, and fails occasionally only for short, highly extended loops. It is very fast, closing loops of length 8 in 0.037 sec on average.
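The one-angle-at-a-time idea can be illustrated with the robotics setting the abstract mentions. The sketch below is a deliberately simplified planar analogue, not the paper's method: it uses a 2D chain of unit-length links and a single target point, whereas the paper rotates 3D backbone dihedrals and superimposes three anchor atoms. All names, the link length and the example values are assumptions made for illustration.

```python
import math


def joint_positions(angles, link_length=1.0):
    """Forward kinematics: positions of every joint plus the chain end."""
    x = y = heading = 0.0
    points = [(x, y)]
    for angle in angles:
        heading += angle  # each angle is relative to the previous link
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
        points.append((x, y))
    return points


def end_distance(angles, target):
    """Distance between the chain end and the target point."""
    end = joint_positions(angles)[-1]
    return math.hypot(end[0] - target[0], end[1] - target[1])


def ccd_close(angles, target, tol=1e-6, max_sweeps=1000):
    """Adjust one joint at a time, first to last, until the end reaches target."""
    angles = list(angles)
    for sweep in range(1, max_sweeps + 1):
        for i in range(len(angles)):
            points = joint_positions(angles)
            pivot, end = points[i], points[-1]
            ax, ay = end[0] - pivot[0], end[1] - pivot[1]
            bx, by = target[0] - pivot[0], target[1] - pivot[1]
            # One-variable optimum: rotate about the pivot so that the
            # pivot-to-end vector points along the pivot-to-target vector.
            angles[i] += math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        if end_distance(angles, target) < tol:
            return angles, sweep
    return angles, max_sweeps


# Hypothetical four-link chain and a target well within its reach.
start_angles = [0.3, 0.3, 0.3, 0.3]
target_point = (2.0, 1.5)
closed_angles, sweeps_used = ccd_close(start_angles, target_point)
print(sweeps_used, end_distance(closed_angles, target_point))
```

Each single-joint update can only reduce or preserve the distance to the target, which is why the procedure avoids the instability that the abstract attributes to Jacobian-based methods. It can still stall, for instance when the chain is fully extended in a straight line through the target.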

14.

How Many 3D Structures Do We Need to Train a Predictor?

Pantelis G. Bagos Georgios N. Tsaousis Stavros J. Hamodrakas 《Genomics, Proteomics & Bioinformatics》2009,7(3):128-137

It has been shown that progress in the determination of membrane protein structures grows exponentially, with approximately the same growth rate as that of water-soluble proteins. In order to investigate the effect of this on the performance of prediction algorithms for both α-helical and β-barrel membrane proteins, we conducted a prospective study based on historical records. We trained separate hidden Markov models with different sized training sets and evaluated their performance on topology pred...

15.

Reliability of transmembrane predictions in whole-genome data

Käll L Sonnhammer EL 《FEBS letters》2002,532(3):415-418

Transmembrane prediction methods are generally benchmarked on a set of proteins with experimentally verified topology. We have investigated if the accuracy measured on such datasets can be expected in an unbiased genomic analysis, or if there is a bias towards 'easily predictable' proteins in the benchmark datasets. As a measurement of accuracy, the concordance of the results from five different prediction methods was used (TMHMM, PHD, HMMTOP, MEMSAT, and TOPPRED). The benchmark dataset showed significantly higher levels (up to five times) of agreement between different methods than in 10 tested genomes. We have also analyzed which programs are most prone to make mispredictions by measuring the frequency of one-out-of-five disagreeing predictions.

16.

CONFOLD: Residue‐residue contact‐guided ab initio protein folding

Badri Adhikari Debswapna Bhattacharya Renzhi Cao Jianlin Cheng 《Proteins》2015,83(8):1436-1449

Predicted protein residue–residue contacts can be used to build three‐dimensional models and consequently to predict protein folds from scratch. A considerable amount of effort is currently being spent to improve contact prediction accuracy, whereas few methods are available to construct protein tertiary structures from predicted contacts. Here, we present an ab initio protein folding method to build three‐dimensional models using predicted contacts and secondary structures. Our method first translates contacts and secondary structures into distance, dihedral angle, and hydrogen bond restraints according to a set of new conversion rules, and then provides these restraints as input for a distance geometry algorithm to build tertiary structure models. The initially reconstructed models are used to regenerate a set of physically realistic contact restraints and detect secondary structure patterns, which are then used to reconstruct final structural models. This unique two‐stage modeling approach of integrating contacts and secondary structures improves the quality and accuracy of structural models and in particular generates better β‐sheets than other algorithms. We validate our method on two standard benchmark datasets using true contacts and secondary structures. Our method improves the TM‐score of reconstructed protein models by 45% and 42% over the existing method on the two datasets, respectively. On the dataset for benchmarking reconstruction methods with predicted contacts and secondary structures, the average TM‐score of the best models reconstructed by our method is 0.59, 5.5% higher than the existing method. The CONFOLD web server is available at http://protein.rnet.missouri.edu/confold/.

17.

Prediction of protein solvent accessibility using fuzzy k-nearest neighbor method

Sim J Kim SY Lee J 《Bioinformatics (Oxford, England)》2005,21(12):2844-2849

MOTIVATION: The solvent accessibility of amino acid residues plays an important role in tertiary structure prediction, especially in the absence of significant sequence similarity of a query protein to those with known structures. The prediction of solvent accessibility is less accurate than secondary structure prediction in spite of improvements in recent research. The k-nearest neighbor method, a simple but powerful classification algorithm, has never been applied to the prediction of solvent accessibility, although it has been used frequently for the classification of biological and medical data. RESULTS: We applied the fuzzy k-nearest neighbor method to solvent accessibility prediction, using PSI-BLAST profiles as feature vectors, and achieved high prediction accuracies. With leave-one-out cross-validation on the ASTRAL SCOP reference dataset constructed by sequence clustering, our method achieved 64.1% accuracy for a 3-state (buried/intermediate/exposed) prediction (thresholds of 9% for buried/intermediate and 36% for intermediate/exposed) and 86.7, 82.0, 79.0 and 78.5% accuracies for 2-state (buried/exposed) predictions (buried/exposed thresholds of 0, 5, 16 and 25%, respectively). Our method also showed slightly better accuracies than other methods, by about 2-5%, on the RS126 dataset and a benchmarking dataset with 229 proteins. AVAILABILITY: Program and datasets are available at http://biocom1.ssu.ac.kr/FKNNacc/ CONTACT: jul@ssu.ac.kr.
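A fuzzy k-nearest neighbor classifier returns a degree of membership in each class rather than a single vote. The sketch below follows the commonly used inverse-distance weighting with fuzzifier m and crisp training labels. The abstract does not give the authors' exact formulation, so the weighting scheme, the Euclidean distance, the parameter defaults and the toy 2D features (standing in for PSI-BLAST profile vectors) are all assumptions.

```python
import math
from collections import defaultdict


def fuzzy_knn(query, training, k=3, m=2.0):
    """Class memberships of `query` from its k nearest training examples.

    `training` is a list of (feature_vector, label) pairs. Each neighbour
    contributes a weight of 1 / distance ** (2 / (m - 1)) to its own label.
    """
    neighbours = sorted(
        (math.dist(query, features), label) for features, label in training
    )[:k]
    exponent = 2.0 / (m - 1.0)
    memberships = defaultdict(float)
    total = 0.0
    for distance, label in neighbours:
        # Guard against division by zero when the query equals a training point.
        weight = 1.0 / max(distance, 1e-12) ** exponent
        memberships[label] += weight
        total += weight
    return {label: value / total for label, value in memberships.items()}


# Toy 2D feature vectors with two-state labels (hypothetical data).
training_set = [
    ((0.0, 0.0), "buried"),
    ((1.0, 0.0), "buried"),
    ((0.0, 2.0), "exposed"),
    ((5.0, 5.0), "exposed"),
]
result = fuzzy_knn((0.0, 1.0), training_set, k=3, m=2.0)
print(result)
```

For this query the memberships come out at roughly 0.6 for buried and 0.4 for exposed, and they always sum to 1. A crisp prediction can be obtained by taking the label with the largest membership.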

18.

A smoothed backbone-dependent rotamer library for proteins derived from adaptive kernel density estimates and regressions

Shapovalov MV Dunbrack RL 《Structure (London, England : 1993)》2011,19(6):844-858

Rotamer libraries are used in protein structure determination, prediction, and design. The backbone-dependent rotamer library consists of rotamer frequencies, mean dihedral angles, and variances as a function of the backbone dihedral angles. Structure prediction and design methods that employ backbone flexibility would strongly benefit from smoothly varying probabilities and angles. A new version of the backbone-dependent rotamer library has been developed using adaptive kernel density estimates for the rotamer frequencies and adaptive kernel regression for the mean dihedral angles and variances. This formulation allows for evaluation of the rotamer probabilities, mean angles, and variances as a smooth and continuous function of phi and psi. Continuous probability density estimates for the nonrotameric degrees of freedom of amides, carboxylates, and aromatic side chains have been modeled as a function of the backbone dihedrals and rotamers of the remaining degrees of freedom. New backbone-dependent rotamer libraries at varying levels of smoothing are available from http://dunbrack.fccc.edu.

19.

Transmembrane Protein Alignment and Fold Recognition Based on Predicted Topology

Han Wang Zhiquan He Chao Zhang Li Zhang Dong Xu 《PloS one》2013,8(7)

Background

Although Transmembrane Proteins (TMPs) are highly important in various biological processes and pharmaceutical developments, general prediction of TMP structures is still far from satisfactory. Because TMPs have significantly different physicochemical properties from soluble proteins, current protein structure prediction tools for soluble proteins may not work well for TMPs. With the increasing number of experimental TMP structures available, template-based methods have the potential to become broadly applicable for TMP structure prediction. However, the current fold recognition methods for TMPs are not as well developed as they are for soluble proteins.

Methodology

We developed a novel TMP Fold Recognition method, TMFR, to recognize TMP folds based on sequence-to-structure pairwise alignment. The method utilizes topology-based features in alignment together with sequence profile and solvent accessibility. It also incorporates a gap penalty that depends on predicted topology structure segments. Given the difference between α-helical transmembrane protein (αTMP) and β-strands transmembrane protein (βTMP), parameters of scoring functions are trained respectively for these two protein categories using 58 αTMPs and 17 βTMPs in a non-redundant training dataset.

Results

We compared our method with HHalign, a leading alignment tool, using a non-redundant testing dataset including 72 αTMPs and 30 βTMPs. Our method achieved 10% and 9% better accuracies than HHalign in αTMPs and βTMPs, respectively. The raw score generated by TMFR is negatively correlated with the structure similarity between the target and the template, which indicates its effectiveness for fold recognition. The result demonstrates that TMFR provides an effective TMP-specific fold recognition and alignment method.

20.

Predicting protein–protein interactions from protein sequences using meta predictor

Jun-Feng Xia Xing-Ming Zhao De-Shuang Huang 《Amino acids》2010,39(5):1595-1599

A novel method is proposed for predicting protein–protein interactions (PPIs) based on the meta approach, which predicts PPIs using a support vector machine that combines the results of six independent state-of-the-art predictors. A significant improvement in prediction performance is observed on Saccharomyces cerevisiae and Helicobacter pylori datasets. In addition, we used the final prediction model trained on the PPI dataset of S. cerevisiae to predict interactions in other species. The results reveal that our meta model is also capable of performing cross-species predictions. The source code and the datasets are available at