Similar articles
1.
Support vector machine for predicting alpha-turn types   Cited by: 3 (self-citations: 0; citations by others: 3)
Cai YD, Feng KY, Li YX, Chou KC. Peptides, 2003, 24(4): 629-630
Tight turns play an important role in globular proteins from both the structural and functional points of view. Of the tight turns, beta-turns and gamma-turns have been extensively studied, but alpha-turns have received little attention. Recently, a systematic search for alpha-turns classified them into nine different types according to their backbone trajectory features. In this paper, the support vector machine (SVM), a new machine learning method, is proposed for predicting the alpha-turn types in proteins. The high rates of correct prediction imply that the formation of different alpha-turn types is evidently correlated with the sequence of a pentapeptide, and hence can be approximately predicted based on the sequence information of the pentapeptide alone, although the incorporation of its interaction with the other parts of the protein, the so-called "long distance interaction", will further improve the prediction quality.

2.
Prediction of tight turns and their types in proteins   Cited by: 6 (self-citations: 0; citations by others: 6)
A tight turn in protein structure is defined as a site where (i) a polypeptide chain reverses its overall direction, i.e., leads the chain to fold back on itself by nearly 180 degrees, and (ii) the amino acid residues directly involved in forming the turn are no more than six. Tight turns are generally categorized as delta-turn, gamma-turn, beta-turn, alpha-turn, and pi-turn, which are formed by two, three, four, five, and six amino acid residues, respectively. According to the folding mode, each of these tight turns can be further classified into several different types. Tight turns play an important role in globular proteins from both the structural and functional points of view. In view of this, various efforts have been made to predict tight turns and their types. This review summarizes the developments in this area, with an emphasis on the most recent work, which features the sequence-coupled model. The future challenges in this area are also briefly addressed.
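
The length-based categories in the abstract above can be written down as a small lookup. This is only an illustrative sketch: the function name `tight_turn_name` and the table name are hypothetical, and the mapping is taken solely from the residue counts stated in the abstract.

```python
# Turn category by the number of residues forming the turn,
# as listed in the abstract (2 to 6 residues).
TIGHT_TURN_BY_LENGTH = {
    2: "delta-turn",
    3: "gamma-turn",
    4: "beta-turn",
    5: "alpha-turn",
    6: "pi-turn",
}


def tight_turn_name(n_residues: int) -> str:
    """Return the tight-turn category for a turn formed by n_residues residues."""
    if n_residues not in TIGHT_TURN_BY_LENGTH:
        raise ValueError(f"tight turns involve 2 to 6 residues, got {n_residues}")
    return TIGHT_TURN_BY_LENGTH[n_residues]
```

The lookup covers only the top-level category; the finer turn types that the abstract mentions depend on the folding mode and are not modeled here.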

3.
Dasgupta B, Pal L, Basu G, Chakrabarti P. Proteins, 2004, 55(2): 305-315
Like the beta-turns, which are characterized by a limiting distance between residues two positions apart (i, i+3), a distance criterion (involving residues at positions i and i+4) is used here to identify alpha-turns from a database of known protein structures. At least 15 classes of alpha-turns have been enumerated based on the location in the phi,psi space of the three central residues (i+1 to i+3), one of the major classes being AAA, where the residues occupy the conventional helical backbone torsion angles. However, moving towards the C-terminal end of the turn, there is a shift in the phi,psi angles towards more negative phi, such that the electrostatic repulsion between two consecutive carbonyl oxygen atoms is reduced. Except for the last position (i+4), there is not much similarity in residue composition at different positions of hydrogen-bonded and non-hydrogen-bonded AAA turns. The presence or absence of Pro at the i+1 position of alpha- and beta-turns has a bearing on whether the turn is hydrogen-bonded or without a hydrogen bond. In the tertiary structure, alpha-turns are more likely to be found in beta-hairpin loops. The residue composition at the beginning of the hydrogen-bonded AAA alpha-turn is similar to that of the type I beta-turn and the N-terminal positions of helices, but the last position matches the C-terminal capping position of helices, suggesting that the existence of a "helix cap signal" at the i+4 position prevents alpha-turns from growing into helices. Our results also provide new insights into alpha-helix nucleation and folding.
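
A distance criterion of the kind described above can be sketched as a scan over alpha-carbon coordinates. The abstract does not state the cutoff or the atoms used, so the 7.0 angstrom default, the use of alpha-carbon positions, and the function name `candidate_alpha_turns` are all assumptions made for illustration. A real analysis would also exclude residues already inside helices, which this sketch does not attempt.

```python
import math


def candidate_alpha_turns(ca_coords, cutoff=7.0):
    """Return start indices i where residues i and i+4 lie closer than cutoff.

    ca_coords: list of (x, y, z) alpha-carbon coordinates in angstroms,
               one per residue, in chain order.
    cutoff:    assumed distance threshold in angstroms (not from the abstract).
    """
    hits = []
    for i in range(len(ca_coords) - 4):
        # Compare residue i with the residue four positions further along.
        if math.dist(ca_coords[i], ca_coords[i + 4]) < cutoff:
            hits.append(i)
    return hits
```

For a fully extended chain with 3.8 angstrom spacing between consecutive alpha-carbons, residues i and i+4 are 15.2 angstroms apart, so no positions are reported; a chain that folds back on itself brings them within the cutoff.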

4.
Wang Y, Xue Z, Xu J. Proteins, 2006, 65(1): 49-54
We have developed a novel method named AlphaTurn to predict alpha-turns in proteins based on the support vector machine (SVM). The prediction was done on a data set of 469 nonhomologous proteins containing 967 alpha-turns. A great improvement in prediction performance was achieved by using multiple sequence alignments generated by PSI-BLAST as input instead of the single amino acid sequence. The introduction of secondary structure information predicted by PSIPRED also improved the prediction performance. Moreover, we handled the very uneven data set by combining the cost factor j with the "state-shifting" rule. This further improved the prediction quality of our method. The final SVM model yielded a Matthews correlation coefficient (MCC) of 0.25 in 10-fold cross-validation. To our knowledge, this MCC value is the highest obtained so far for predicting alpha-turns. An online Web server based on this method has been developed and can be freely accessed at http://bmc.hust.edu.cn/bioinformatics/ or http://210.42.106.80/.
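
Several abstracts in this list report the Matthews correlation coefficient rather than plain accuracy because turn residues are a small minority class. The sketch below computes both from confusion-matrix counts using the standard MCC formula; the counts in the example are hypothetical and are not taken from any of the papers, and returning 0.0 when the denominator vanishes is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: an undefined MCC (an empty row or column) is reported as 0.
        return 0.0
    return numerator / denominator


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical imbalanced case: a predictor that never predicts a turn.
counts = dict(tp=0, tn=950, fp=0, fn=50)
print(accuracy(**counts))           # 0.95
print(matthews_corrcoef(**counts))  # 0.0
```

The example shows why a high accuracy alone says little on an uneven data set: always predicting the majority class scores 95% accuracy but an MCC of zero.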

5.
Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method is introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach are performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 in 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.

6.
As a structural class, tight turns can control molecular recognition, enzymatic activity, and nucleation of folding. They have been extensively characterized in soluble proteins but have not been characterized in outer membrane proteins (OMPs), where they also support critical functions. We clustered the 4 to 6 residue tight turns of 110 OMPs to characterize the phi/psi angles, sequence, and hydrogen bonding of these structures. We find significant differences between reports of soluble protein tight turns and OMP tight turns. Since OMP strands are less twisted than soluble strands, they favor different turn structure types. Moreover, the membrane localization of OMPs yields different sequence hallmarks for their tight turns relative to soluble protein turns. We also characterize the differences in phi/psi angles, sequence, and hydrogen bonding between OMP extracellular loops and OMP periplasmic turns. As previously noted, the extracellular loops tend to be much longer than the periplasmic turns. We find that this difference in length is due to the broader distribution of lengths of the extracellular loops, not to a large difference in the median length. Extracellular loops also tend to have more charged residues, as predicted by the charge-out rule. Finally, in all OMP tight turns, hydrogen bonding between the side chain and backbone 2 to 4 residues away from that side chain plays an important role. These bonds preferentially use an Asp, Asn, Ser, or Thr residue in a beta or pro phi/psi conformation. We anticipate that this study will be applicable to future design and structure prediction of OMPs.

7.
Kaur H, Raghava GP. Proteins, 2004, 55(1): 83-90
In this paper a systematic attempt has been made to develop a better method for predicting alpha-turns in proteins. Most of the commonly used approaches in the field of protein structure prediction have been tried in this study, including the statistical "Sequence Coupled Model" approach and the following machine learning approaches: (i) artificial neural network (ANN); (ii) Weka (Waikato Environment for Knowledge Analysis) classifiers; and (iii) Parallel Exemplar Based Learning (PEBLS). We have also used multiple sequence alignment obtained from PSI-BLAST and secondary structure information predicted by PSIPRED. The training and testing of all methods have been performed on a data set of 193 non-homologous protein X-ray structures using five-fold cross-validation. It has been observed that ANN with multiple sequence alignment and predicted secondary structure information outperforms the other methods. Based on our observations we have developed an ANN-based method for predicting alpha-turns in proteins. The main components of the method are two feed-forward back-propagation networks with a single hidden layer. The first, sequence-structure network is trained with the multiple sequence alignment in the form of PSI-BLAST-generated position-specific scoring matrices. The initial predictions obtained from the first network and the PSIPRED-predicted secondary structure are used as input to the second, structure-structure network to refine the predictions obtained from the first net. The final network yields an overall prediction accuracy of 78.0% and an MCC of 0.16. A web server, AlphaPred (http://www.imtech.res.in/raghava/alphapred/), has been developed based on this approach.

8.
MOTIVATION: Beta-turns play an important role from a structural and functional point of view. Beta-turns are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein. Most of these prediction methods are based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. RESULTS: This paper describes a web server called BetaTPred, developed for predicting beta-turns in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows the prediction of different types of beta-turns, e.g. type I, I', II, II', VI, VIII and non-specific. This server assists users in predicting the consensus beta-turns in a protein. AVAILABILITY: The server is accessible from http://imtech.res.in/raghava/betatpred/

9.
An algorithm to predict membrane protein types based on the multi-residue-pair effect in the Markov model is proposed. For a newly constructed dataset of 835 membrane proteins with very low sequence similarity, the overall prediction accuracy reaches 81.1% and 71.7% in the resubstitution and jackknife tests, respectively, for the prediction of type I single-pass, type II single-pass, multi-pass, lipid chain-anchored and GPI-anchored membrane proteins. An improvement of about 11% in the jackknife test is achieved compared with the component-coupled algorithm based merely on the amino acid composition (AAC approach). The improvement is also confirmed on a high-similarity dataset and in another extrapolating test. The result implies that, to design more incisive analysis tools, one should develop algorithms based on a representative dataset with lower sequence similarity. The present algorithm is useful for expediting the determination of the types and functions of new membrane proteins and may be useful for the systematic analysis of functional genome data on a large scale. The computer program is available on request.

10.
Cell membranes are vitally important to living cells. Although the infrastructure of biological membranes is provided by the lipid bilayer, membrane proteins perform most of the specific functions. Knowledge of membrane protein types often provides crucial hints toward determining the function of an uncharacterized membrane protein. With the avalanche of new protein sequences generated in the post-genomic era, there is high demand for a high-throughput tool for identifying the type of newly found membrane proteins according to their primary sequences, so that they can be annotated in a timely manner for reference in both basic research and drug discovery. To realize this, the key is to establish a powerful identifier that can catch the characteristic sequence patterns of different membrane protein types. However, this is not easy because the patterns are buried in a pile of long and complicated sequences. In this paper, based on the concept of the pseudo-amino acid composition [K.C. Chou, PROTEINS: Struct., Funct., Genet. 43 (2001) 246-255], the low-frequency Fourier spectrum analysis is introduced. The merits of doing so are that the sequence pattern information can be more effectively incorporated into a set of discrete components, and that all the existing prediction algorithms can be straightforwardly used on such a formulation for protein samples. High success rates were observed in the re-substitution test, jackknife test, and independent dataset test, indicating that the low-frequency Fourier spectrum approach may become a very useful tool for membrane protein type prediction. The novel approach also holds high potential for predicting many other attributes of proteins.

11.

Motivation

Turns are a critical element of the structure of a protein; they play a crucial role in loops, folds, and interactions. Current prediction methods are well developed for the prediction of individual turn types, including α-turns, β-turns, and γ-turns. However, for further protein structure and function prediction it is necessary to develop a uniform model that can accurately predict all types of turns simultaneously.

Results

In this study, we present a novel approach, TurnP, which offers the ability to investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) the use of newly exploited features of structural evolution information (secondary structure and shape string of the protein) based on structure homologies, (ii) the consideration of all types of turns in a unified model, and (iii) the practical capability of accurately predicting all turns simultaneously for a query. TurnP utilizes predicted secondary structures and predicted shape strings, both of which have greater accuracy and are based on innovative technologies developed by our group. Then, sequence and structural evolution features, namely the sequence profile, the secondary structure profile, and the shape string profile, are generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, we achieved an accuracy of 88.8% and a sensitivity of 71.8%, exceeding the most advanced predictors of individual turn types. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests; the results were outstanding for turn prediction and confirmed the good performance of TurnP for practical applications.

12.
Membrane proteins are an important component of the cell membrane. Given a membrane protein sequence, identifying its type(s) is very important because the type is closely correlated with its functions. According to previous studies, membrane proteins can be divided into the following eight types: single-pass type I, single-pass type II, single-pass type III, single-pass type IV, multipass, lipid-anchor, GPI-anchor, and peripheral membrane protein. With the avalanche of newly found protein sequences in the post-genomic age, it is urgent to develop an automatic and effective computational method for rapid and reliable prediction of the types of membrane proteins. At present, most of the existing methods are based on the assumption that one membrane protein belongs to only one type. In fact, a membrane protein may simultaneously belong to two or more different functional types. In this study, a new method that hybridizes the pseudo amino acid composition with a multi-label algorithm called LIFT (multi-label learning with label-specific features) is proposed to predict the functional types of both singleplex and multiplex animal membrane proteins. Experimental results on a stringent benchmark dataset of membrane proteins by the jackknife test show that the absolute-true value obtained was 0.6342, indicating that our approach is quite promising. It may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying functional types of membrane proteins.

13.
Numerous studies have been performed for the analysis and prediction of β‐turns in a protein. This study focuses on the analysis, prediction, and design of β‐turns to understand the preference of amino acids in β‐turn formation. We analyzed around 20,000 PDB chains to understand the preference of residues or pairs of residues at different positions in β‐turns. Based on the results, a propensity‐based method has been developed for predicting β‐turns with an accuracy of 82%. We introduced a new approach entitled “Turn level prediction method,” which predicts the complete β‐turn rather than focusing on the residues in a β‐turn. Finally, we developed BetaTPred3, a random forest‐based method for predicting β‐turns by utilizing various features of the four residues present in β‐turns. BetaTPred3 achieved an accuracy of 79% with an MCC of 0.51, which is comparable to or better than existing methods on the BT426 dataset. Additionally, models were developed to predict β‐turn types with better performance than other methods available in the literature. In order to improve the quality of turn prediction, we developed prediction models on a large, up-to-date dataset of 6376 nonredundant protein chains. Based on this study, a web server has been developed for the prediction of β‐turns and their types in proteins. This web server also predicts the minimum number of mutations required to initiate or break a β‐turn at a specified location in a protein. Proteins 2015; 83:910–921. © 2015 Wiley Periodicals, Inc.

14.
Tight turns have long been recognized as one of the three important features of proteins, together with the alpha-helix and beta-sheet. Tight turns play an important role in globular proteins from both the structural and functional points of view. More than 90% of tight turns are beta-turns and most of the rest are gamma-turns. Analysis and prediction of beta-turns and gamma-turns are very useful for the design of new molecules such as drugs, pesticides, and antigens. In this paper we investigated two aspects of applying the support vector machine (SVM), a promising machine learning method for bioinformatics, to the prediction and analysis of beta-turns and gamma-turns. First, we developed two SVM-based methods, called BTSVM and GTSVM, which predict beta-turns and gamma-turns in a protein from its sequence. When compared with other methods, BTSVM has a superior performance and GTSVM is competitive. Second, we used SVMs with a linear kernel to estimate the support of amino acids for the formation of beta-turns and gamma-turns depending on their position in a protein. Our analysis results are more comprehensive and easier to use than the previous results in designing turns in proteins.

15.
The genome of the diurnal cyanobacterium Cyanothece sp. PCC 51142 has recently been sequenced and observed to contain 35 pentapeptide repeat proteins (PRPs). These proteins, while present throughout the prokaryotic and eukaryotic kingdoms, are most abundant in cyanobacteria. The sheer number of PRPs in cyanobacteria coupled with their predicted location in every cellular compartment argues for important, yet unknown, physiological and biochemical functions. To gain biochemical insights, the crystal structure for Rfr32, a 167-residue PRP with an N-terminal 29-residue signal peptide, was determined at 2.1 Å resolution. The structure is dominated by 21 tandem pentapeptide repeats that fold into a right-handed quadrilateral beta-helix, or Rfr-fold, as observed for the tandem pentapeptide repeats in the only other PRP structure, the mycobacterial fluoroquinolone resistance protein MfpA from Mycobacterium tuberculosis. Sitting on top of the Rfr-fold are two short, antiparallel alpha-helices, bridged with a disulfide bond, that perhaps prevent edge-to-edge aggregation at the C terminus. Analysis of the main-chain (Phi,Psi) dihedral orientations for the pentapeptide repeats in Rfr32 and MfpA makes it possible to recognize the structural details for the two distinct types of four-residue turns adopted by the pentapeptide repeats in the Rfr-fold. These turns, labeled type II and type IV beta-turns, may be universal motifs that shape the Rfr-fold in all PRPs.

16.
Membrane transporters are critical in living cells. Therefore, discriminating the types of membrane proteins based on their functions is of great importance, both for helping genome annotation and for supporting experimental researchers in gaining insight into the function of membrane proteins. Many computational methods exist to facilitate the identification of the functional types of membrane proteins. However, in these methods, the local sequence environment is not integrated into the constructed model. In this study, we describe a new strategy to predict the functional types of membrane proteins using a model based on auto covariance and the position-specific scoring matrix. The novelty of the presented approach lies in considering the distribution of different positions of functional conservation sites in protein sequences. The model thereby adequately takes into account the long-range correlation between such sites during sequential evolution. A fivefold cross-validation test shows that this method greatly improves the prediction accuracy and achieves an acceptable prediction accuracy of 87.51%. The result indicates that the current approach might be an effective tool for predicting the functional types of membrane proteins using only the primary sequences. The code and dataset used in this article are freely available at .
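
The abstract above does not give its formula, so the following is only a sketch of one commonly used form of the auto covariance transform applied to a position-specific scoring matrix: for each column, the lagged product of deviations from the column mean is averaged along the sequence. The function name `auto_covariance`, the normalization by (L - lag), and the plain list-of-lists matrix representation are assumptions.

```python
def auto_covariance(pssm, lag):
    """Auto covariance of each PSSM column at a given lag.

    pssm: list of rows, one per sequence position; each row holds one
          score per amino acid column.
    lag:  distance in residues between the two positions compared.
    Returns one value per column, so a variable-length protein is mapped
    to a fixed-length feature vector.
    """
    length = len(pssm)
    if not 0 < lag < length:
        raise ValueError("lag must be between 1 and sequence length - 1")
    features = []
    for j in range(len(pssm[0])):
        column = [row[j] for row in pssm]
        mean = sum(column) / length
        # Sum of products of deviations for positions i and i + lag.
        total = sum(
            (column[i] - mean) * (column[i + lag] - mean)
            for i in range(length - lag)
        )
        features.append(total / (length - lag))
    return features
```

Concatenating the outputs for several lag values gives a descriptor that captures correlations between positions far apart in the sequence, which is the kind of long-range information the abstract refers to.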

17.
The relationship between the amino acid sequence and the three-dimensional structure of proteins with internal repeats is discussed. In particular, correlations between the amino acid composition and the ability to fold into a unique structure, as well as classification of the structures based on their repeat length, are described. This analysis suggests rules that can be used for the structural prediction of repeat-containing proteins. The paper is focused on the prediction and modeling of solenoid-like proteins with a repeat length ranging between 5 and 40 residues. The models of leucine-rich repeat proteins and bacterial proteins with pentapeptide repeats are examined in light of the recently solved structures of the related molecules.

18.
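Predicting membrane protein types with the random forest method   Cited by: 2 (self-citations: 0; citations by others: 2)
The type of a membrane protein is closely related to its function, so predicting membrane protein types is an important means of studying their function, and predicting the type directly from the amino acid sequence is of great significance. Based on the amino acid sequence of the protein, this article combines the combined increment of diversity with pseudo-amino acid composition information as prediction parameters and uses a random forest classifier to predict eight classes of membrane proteins. The prediction accuracy is 86.3% in the jackknife test and 93.8% in the independent test, which is better than previously reported results.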

19.
The pentapeptide repeat is a recently discovered protein fold. Mycobacterium tuberculosis MfpA is a founding member of the pentapeptide repeat protein (PRP) family that confers resistance to the antibiotic fluoroquinolone by binding to DNA gyrase and inhibiting its activity. The size, shape, and surface potential of MfpA mimic duplex DNA. As an initial step in a comprehensive biophysical analysis of the role of PRPs in the regulation of cellular topoisomerase activity and in conferring antibiotic resistance, we have explored the solution structure and refolding of MfpA by fluorescence spectroscopy, CD, and analytical centrifugation. A unique CD spectrum for the pentapeptide repeat fold is described. This spectrum reveals a native structure whose beta-strands and turns within the right-handed quadrilateral beta-helix that define the PRP fold differ from canonical secondary structure types. MfpA refolded from urea or guanidinium by dialysis or dilution forms stable aggregates of monomers whose secondary and tertiary structures are not native. In contrast, MfpA refolded using a novel "time-dependent renaturation" protocol yields protein with native secondary, tertiary, and quaternary structure. The generality of "time-dependent renaturation" to other proteins and denaturation methods is discussed.

20.
Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting novel examples of the membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which include seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor.
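
Amino acid composition, the first of the feature sets named above and the baseline in several other abstracts in this list, is the fraction of each of the 20 standard residues in a sequence. The sketch below is a minimal illustration; the function name `amino_acid_composition` and the choice to ignore non-standard letters are assumptions, and the other six CPSR feature sets are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions, ordered as AMINO_ACIDS.

    Letters outside the 20 standard residues (for example X) are ignored,
    so the fractions sum to 1 over the recognized residues.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in counts]
```

Because the vector always has 20 entries regardless of protein length, it can be fed directly to classifiers such as an SVM or a neural network, at the cost of discarding all sequence-order information.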
