1.

Prediction of functional sites in proteins using conserved functional group analysis

Innis CA Anand AP Sowdhamini R 《Journal of molecular biology》2004,337(4):1053-1068

A detailed knowledge of a protein's functional site is an absolute prerequisite for understanding its mode of action at the molecular level. However, the rapid pace at which sequence and structural information is being accumulated for proteins greatly exceeds our ability to determine their biochemical roles experimentally. As a result, computational methods are required which allow for the efficient processing of the evolutionary information contained in this wealth of data, in particular that related to the nature and location of functionally important sites and residues. The method presented here, referred to as conserved functional group (CFG) analysis, relies on a simplified representation of the chemical groups found in amino acid side-chains to identify functional sites from a single protein structure and a number of its sequence homologues. We show that CFG analysis can fully or partially predict the location of functional sites in approximately 96% of the 470 cases tested and that, unlike other methods available, it is able to tolerate wide variations in sequence identity. In addition, we discuss its potential in a structural genomics context, where automation, scalability and efficiency are critical, and an increasing number of protein structures are determined with no prior knowledge of function. This is exemplified by our analysis of the hypothetical protein Ydde_Ecoli, whose structure was recently solved by members of the North East Structural Genomics consortium. Although the proposed active site for this protein needs to be validated experimentally, this example illustrates the scope of CFG analysis as a general tool for the identification of residues likely to play an important role in a protein's biochemical function. Thus, our method offers a convenient solution to rapidly and automatically process the vast amounts of data that are beginning to emerge from structural genomics projects.

2.

Prediction of functional phosphorylation sites by incorporating evolutionary information

Shen Niu Zhen Wang Dongya Ge Guoqing Zhang Yixue Li 《Protein & Cell》2012,3(9):675

Protein phosphorylation is a ubiquitous protein post-translational modification, which plays an important role in cellular signaling systems underlying various physiological and pathological processes. Current in silico methods focus mainly on the prediction of phosphorylation sites, and few consider whether a phosphorylation site is functional or not. Since functional phosphorylation sites are more valuable for further experimental research and a proportion of phosphorylation sites have no direct functional effects, the prediction of functional phosphorylation sites is necessary for this research area. Previous studies have shown that functional phosphorylation sites are more conserved in evolution than non-functional phosphorylation sites. We therefore developed a web server that integrates existing phosphorylation site prediction methods with both absolute and relative evolutionary conservation scores to predict the most likely functional phosphorylation sites. Using our method, we predicted the most likely functional sites of the human, rat and mouse proteomes and built a database for the predicted sites. By analyzing the overall prediction results, we demonstrated that protein phosphorylation plays an important role in all the enriched KEGG pathways. By analyzing protein-specific prediction results, we demonstrated the usefulness of our method for individual protein studies. Our method should help to characterize the most likely functional phosphorylation sites for further studies in this research area.

3.

Prediction of amino acid positions specific for functional groups in a protein family based on local sequence similarity

Dmitry A. Karasev Alexander V. Veselovsky Nina Yu. Oparina Dmitry A. Filimonov Boris N. Sobolev 《Journal of molecular recognition : JMR》2016,29(4):159-169

4.

Exploiting sequence and structure homologs to identify protein-protein binding sites

Chung JL Wang W Bourne PE 《Proteins》2006,62(3):630-640

A rapid increase in the number of experimentally derived three-dimensional structures provides an opportunity to better understand and subsequently predict protein-protein interactions. In this study, structurally conserved residues were derived from multiple structure alignments of the individual components of known complexes and the assigned conservation score was weighted based on the crystallographic B factor to account for the structural flexibility that will result in a poor alignment. Sequence profile and accessible surface area information was then combined with the conservation score to predict protein-protein binding sites using a Support Vector Machine (SVM). The incorporation of the conservation score significantly improved the performance of the SVM. About 52% of the binding sites were precisely predicted (greater than 70% of the residues in the site were identified); 77% of the binding sites were correctly predicted (greater than 50% of the residues in the site were identified), and 21% of the binding sites were partially covered by the predicted residues (some residues were identified). The results support the hypothesis that in many cases protein interfaces require some residues to provide rigidity to minimize the entropic cost upon complex formation.
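
The coverage thresholds quoted in this abstract (more than 70% of site residues for "precisely predicted", more than 50% for "correctly predicted", at least one residue for "partially covered") can be illustrated with a minimal sketch. The function name, the labels, and the residue numbering are hypothetical, and the function returns only the strictest label that applies; it is not the authors' evaluation code.

```python
def coverage_category(site_residues, predicted_residues):
    """Label a predicted binding site by the fraction of true site residues recovered.

    Thresholds follow the abstract: >70% "precise", >50% "correct",
    any overlap "partial". Names and labels are illustrative assumptions.
    """
    site = set(site_residues)
    if not site:
        raise ValueError("site_residues must not be empty")
    # Fraction of the true site residues that appear among the predictions.
    fraction = len(site & set(predicted_residues)) / len(site)
    if fraction > 0.70:
        return "precise"
    if fraction > 0.50:
        return "correct"
    if fraction > 0.0:
        return "partial"
    return "missed"


# Hypothetical site of 10 residues with 6 of them predicted.
print(coverage_category(range(10), range(6)))
```

Because every "precise" site also passes the 50% threshold, the 52% figure in the abstract is naturally read as a subset of the 77% figure; the sketch reports only the strictest category for a given site.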

5.

Prediction of kinase-specific phosphorylation sites with sequence features by a log-odds ratio approach

Li T Li F Zhang X 《Proteins》2008,70(2):404-414

Protein phosphorylation plays important roles in a variety of cellular processes. Detecting possible phosphorylation sites and their corresponding protein kinases is crucial for studying the function of many proteins. This article presents a new prediction system, called PhoScan, to predict phosphorylation sites in a kinase-family-specific way. Common phosphorylation features and kinase-specific features are extracted from substrate sequences of different protein kinases based on the analysis of published experiments, and a scoring system is developed for evaluating the possibility that a peptide can be phosphorylated by the protein kinase at the specific site in its sequence context. PhoScan can achieve a specificity of above 90% with sensitivity around 90% at the kinase-family level on the data tested. The system is applied to a set of human proteins collected from Swiss-Prot, and sets of putative phosphorylation sites are predicted for the protein kinase A, cyclin-dependent kinase, and casein kinase 2 families. PhoScan is available at http://bioinfo.au.tsinghua.edu.cn/phoscan/.

6.

A computational method for the analysis and prediction of protein:phosphopeptide-binding sites

Joughin BA Tidor B Yaffe MB 《Protein science : a publication of the Protein Society》2005,14(1):131-139

Phosphopeptide-binding domains, including the FHA, SH2, WW, WD40, MH2, and Polo-box domains, as well as the 14-3-3 proteins, exert control functions in important processes such as cell growth, division, differentiation, and apoptosis. Structures and mechanisms of phosphopeptide binding are generally diverse, revealing few general principles. A computational method for analysis of phosphopeptide-binding domains was therefore developed to elucidate the physical and chemical nature of phosphopeptide binding, given this lack of structural similarity. The surfaces of nine phosphopeptide-binding proteins, representing seven distinct classes of phosphopeptide-binding modules, were discretized, and encoded with information about amino acid identity, surface curvature, and electrostatic potential at every point on the surface in order to identify local surface properties enriched in phosphoresidue contact sites. Cross-validation indicated that propensities corresponding to this enrichment calculated from a subset of the training data could be used to predict the phosphoresidue contact site on proteins not used in training with no false negative results, and with few unconfirmed positive predictions. The locations of phosphoresidue contact sites were then predicted on the surfaces of the checkpoint kinase Chk1 and the BRCA1 BRCT repeat domain, and these predictions are consistent with recent experimental evidence.

7.

Prediction of protein secondary structure from amino acid sequence

Jen Tsi Yang 《Journal of Protein Chemistry》1996,15(2):185-191

The conformational parameters $P_k$ for each amino acid species ($j = 1$–$20$) of sequential peptides in proteins are presented as the product of $P_{i,k}$, where $i$ is the number of the sequential residues in the $k$th conformational state ($k$ = α-helix, β-sheet, β-turn, or unordered structure). Since the average parameter for an $n$-residue segment is related to the average probability of finding the segment in the $k$th state, it becomes a geometric mean, $(P_k)_{av} = \left(\prod_{i=1}^{n} P_{i,k}\right)^{1/n}$, with amino acid residue $i$ increasing from 1 to $n$. We then used $\ln (P_k)_{av}$ to convert a multiplicative process to a summation, i.e., $\ln (P_k)_{av} = (1/n)\sum_{i=1}^{n} \ln P_{i,k}$, for ease of operation. However, this is unlike the popular Chou-Fasman algorithm, which has the flaw of using the arithmetic mean for relative probabilities. The Chou-Fasman algorithm happens to be close to our calculations in many cases mainly because the difference between their $P_k$ and our $\ln P_k$ is nearly constant for about one-half of the 20 amino acids. When stronger conformation formers and breakers exist, the difference becomes larger and the prediction at the N- and C-terminal α-helix or β-sheet could differ. If the average conformational parameters of the overlapping segments of any two states are too close for a unique solution, our calculations could lead to a different prediction.
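
The contrast this abstract draws between a geometric mean (computed as the mean of logarithms) and the arithmetic mean attributed to Chou-Fasman can be sketched as follows. The parameter values are invented for illustration and are not taken from the paper or from any published propensity table.

```python
import math


def geometric_mean(params):
    """Geometric mean, computed as exp of the mean of the natural logs."""
    return math.exp(sum(math.log(p) for p in params) / len(params))


def arithmetic_mean(params):
    """Plain arithmetic mean of the same parameters."""
    return sum(params) / len(params)


# Hypothetical per-residue helix parameters for a 4-residue segment;
# 0.6 plays the role of a strong helix breaker.
segment = [1.5, 1.2, 0.6, 1.4]

print(geometric_mean(segment))   # mean taken in log space
print(arithmetic_mean(segment))  # mean taken directly
```

For this made-up segment the geometric mean is about 1.11 and the arithmetic mean about 1.18. The geometric mean never exceeds the arithmetic mean, and the gap widens as the parameters become more spread out, which is consistent with the abstract's remark that the two approaches diverge when strong formers and breakers are present.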

8.

Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation

A D Wilkins R Lua S Erdin R M Ward O Lichtarge 《Protein science : a publication of the Protein Society》2010,19(7):1296-1311

Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top‐ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first, in 110 proteins, for which the overlap between top‐ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure‐function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with better sensitivity (40% to 53%) and positive predictive value (93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional site and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.

9.

Prediction, conservation analysis, and structural characterization of mammalian mucin-type O-glycosylation sites

Julenius K Mølgaard A Gupta R Brunak S 《Glycobiology》2005,15(2):153-164

O-GalNAc-glycosylation is one of the main types of glycosylation in mammalian cells. No consensus recognition sequence for the O-glycosyltransferases is known, making prediction methods necessary to bridge the gap between the large number of known protein sequences and the small number of proteins experimentally investigated with regard to glycosylation status. From O-GLYCBASE a total of 86 mammalian proteins experimentally investigated for in vivo O-GalNAc sites were extracted. Mammalian protein homolog comparisons showed that a glycosylated serine or threonine is less likely to be precisely conserved than a nonglycosylated one. The Protein Data Bank was analyzed for structural information, and 12 glycosylated structures were obtained. All positive sites were found in coil or turn regions. A method for predicting the location for mucin-type glycosylation sites was trained using a neural network approach. The best overall network used as input amino acid composition, averaged surface accessibility predictions together with substitution matrix profile encoding of the sequence. To improve prediction on isolated (single) sites, networks were trained on isolated sites only. The final method combines predictions from the best overall network and the best isolated site network; this prediction method correctly predicted 76% of the glycosylated residues and 93% of the nonglycosylated residues. NetOGlyc 3.1 can predict sites for completely new proteins without losing its performance. The fact that the sites could be predicted from averaged properties together with the fact that glycosylation sites are not precisely conserved indicates that mucin-type glycosylation in most cases is a bulk property and not a very site-specific one. NetOGlyc 3.1 is made available at www.cbs.dtu.dk/services/netoglyc.

10.

An analysis approach to identify specific functional sites in orthologous proteins using sequence and structural information: Application to neuroserpin reveals regions that differentially regulate inhibitory activity

Tet Woo Lee Annie Shu‐Ping Yang Thomas Brittain Nigel P. Birch 《Proteins》2015,83(1):135-152

The analysis of sequence conservation is commonly used to predict functionally important sites in proteins. We have developed an approach that first identifies highly conserved sites in a set of orthologous sequences using a weighted substitution‐matrix‐based conservation score and then filters these conserved sites based on the pattern of conservation present in a wider alignment of sequences from the same family and structural information to identify surface‐exposed sites. This allows us to detect specific functional sites in the target protein and exclude regions that are likely to be generally important for the structure or function of the wider protein family. We applied our method to two members of the serpin family of serine protease inhibitors. We first confirmed that our method successfully detected the known heparin binding site in antithrombin while excluding residues known to be generally important in the serpin family. We next applied our sequence analysis approach to neuroserpin and used our results to guide site‐directed polyalanine mutagenesis experiments. The majority of the mutant neuroserpin proteins were found to fold correctly and could still form inhibitory complexes with tissue plasminogen activator (tPA). Kinetic analysis of tPA inhibition, however, revealed altered inhibitory kinetics in several of the mutant proteins, with some mutants showing decreased association with tPA and others showing more rapid dissociation of the covalent complex. Altogether, these results confirm that our sequence analysis approach is a useful tool that can be used to guide mutagenesis experiments for the detection of specific functional sites in proteins. Proteins 2015; 83:135–152. © 2014 Wiley Periodicals, Inc.

11.

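An evolutionary imprint method for genome function prediction

Xie Tao Ding Dafu 《Acta biochimica et biophysica Sinica》1999,31(4):433-439

Improving schemes for genome function prediction is a pressing problem in functional genomics. The course of biological evolution leaves corresponding evolutionary imprints on molecular sequences: motifs specific to clusters of orthologs. Building on this biological fact, we propose a new method for genome function prediction. Evolutionary analysis is first used to construct clusters of orthologs, and the functional motifs of each cluster are then identified, yielding a library of specific functional motifs. The function of an unknown gene can then be predicted efficiently and accurately by searching this motif library. Tests on five families provide preliminary evidence that the scheme is feasible.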

12.

Conservation of orientation and sequence in protein domain--domain interactions

Littler SJ Hubbard SJ 《Journal of molecular biology》2005,345(5):1265-1279

The repertoire of naturally occurring protein structures is usually characterised in structural terms at the domain level by their constituent folds. As structure is acknowledged to be an important stepping stone to the understanding of protein function, an appreciation of how individual domain interactions are built to form complete, functional protein structures is essential. A comprehensive study of protein domain interactions has been undertaken, covering all those observed in known structures, as well as those predicted to occur in 46 completed genome sequences from all three domains of life. In particular, we examine the promiscuity of protein domains characterised by SCOP superfamilies in terms of their interacting partners, the surface they use to form these interactions, and the relative orientations of their domain partners. Protein domains are shown to display a variety of behaviours, ranging from high promiscuity to absolute monogamy of domain surface employed, with both multiple and single domain partners. In addition, the conservation of sequence and volume at domain interface surfaces is observed to be significantly higher than at accessible surface in general, acting as a powerful potential predictor for domain interactions. We also examine the separation of interacting domains in protein sequence, showing that standard thresholds of 30 amino acid residues lead to a significant false positive rate, and an even more significant false negative rate of approximately 40%. These data suggest that there may be many more than the 2000 domain--domain interactions that have not yet been observed structurally, and we provide a top 30 hit-list of putative domain interactions which should be targeted.

13.

Prediction of glutathionylation sites in proteins using minimal sequence information and their experimental validation

Debojyoti Pal Deepak Sharma Mukesh Kumar 《Free radical research》2016,50(9):1011-1021

S-glutathionylation of proteins plays an important role in various biological processes and is known to be a protective modification during oxidative stress. Since experimental detection of S-glutathionylation is labor intensive and time consuming, a bioinformatics-based approach is a viable alternative. Available methods require relatively long sequence information, which may prevent prediction if sequence information is incomplete. Here, we present a model to predict glutathionylation sites from pentapeptide sequences. It is based upon the differential association of amino acids with glutathionylated and non-glutathionylated cysteines from a database of experimentally verified sequences. These data were used to calculate position-dependent F-scores, which measure how a particular amino acid at a particular position may affect the likelihood of a glutathionylation event. A glutathionylation score (G-score), indicating the propensity of a sequence to undergo glutathionylation, was calculated using position-dependent F-scores for each amino acid. Cut-off values were used for prediction. Our model returned an accuracy of 58% with a Matthews correlation coefficient (MCC) value of 0.165. On an independent dataset, our model outperformed the currently available model, in spite of needing much less sequence information. Pentapeptide motifs having high abundance among glutathionylated proteins were identified. A list of potential glutathionylation hotspot sequences was obtained by assigning G-scores, and subsequent Protein-BLAST analysis revealed a total of 254 putative glutathionable proteins, a number of which were already known to be glutathionylated. Our model predicted glutathionylation sites in 93.93% of experimentally verified glutathionylated proteins. The outcome of this study may assist in discovering novel glutathionylation sites and finding candidate proteins for glutathionylation.
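
To show how the two performance figures reported in this abstract relate, here is a minimal sketch that computes accuracy and the Matthews correlation coefficient from a binary confusion matrix using the standard formula. The confusion-matrix counts are invented for illustration; the abstract does not report the underlying counts.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC for a binary confusion matrix; 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts chosen only to give 58% accuracy on 100 sites.
tp, tn, fp, fn = 30, 28, 22, 20
print(accuracy(tp, tn, fp, fn))
print(matthews_corrcoef(tp, tn, fp, fn))
```

With these made-up counts the accuracy is 0.58 and the MCC is roughly 0.16. MCC ranges from -1 to 1 with 0 corresponding to chance-level agreement, so an accuracy somewhat above 50% on a balanced set goes together with a small positive MCC.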

14.

Predicting functional residues in Plasmodium falciparum plasmepsins by combining sequence and structural analysis with molecular dynamics simulations

Valiente PA Batista PR Pupo A Pons T Valencia A Pascutti PG 《Proteins》2008,73(2):440-457

Plasmepsins are aspartic proteases involved in the initial steps of the hemoglobin degradation pathway, a critical stage in the Plasmodium falciparum life cycle during human infection. Thus, they are attractive targets for novel therapeutic compounds to treat malaria, which remains one of the world's biggest health problems. The three-dimensional structures available for P. falciparum plasmepsins II and IV make structure-based drug design of antimalarial compounds that focus on inhibiting plasmepsins possible. However, the structural flexibility of the plasmepsin active site cavity combined with insufficient knowledge of the functional residues and of those determining the specificity of parasitic enzymes is a drawback when designing specific inhibitors. In this study, we have combined a sequence and structural analysis with molecular dynamics simulations to predict the functional residues in P. falciparum plasmepsins. The careful analysis of X-ray structures and 3D models carried out here suggests that residues Y17, V105, T108, L191, L242, Q275, and T298 are important for plasmepsin function. These seven amino acids are conserved across the malarial strains but not in human aspartic proteases. Residues V105 and T108 are localized in a flap of an interior pocket and they only establish contacts with a specific non-peptide achiral inhibitor. We also observed a rapid conformational change in the L3 region of plasmepsins that closes the active site of the enzyme, which explains earlier experimental findings. These results shed light on the role of V105 and T108 residues in plasmepsin specificities, and they should be useful in structure-based design of novel, selective inhibitors that may serve as antimalarial drugs.

15.

iProtGly‐SS: Identifying protein glycation sites using sequence and structure based features

Md Mofijul Islam Sanjay Saha Md Mahmudur Rahman Swakkhar Shatabda Dewan Md Farid Abdollah Dehzangi 《Proteins》2018,86(7):777-789

Glycation is a chemical reaction in which a sugar molecule bonds with a protein without the help of enzymes. It is often a cause of many diseases, so knowledge about glycation is very important. In this paper, we present iProtGly‐SS, a protein lysine glycation site identification method based on features extracted from sequence and secondary structural information. In our experiments, the best combination of feature groups was Amino Acid Composition, Secondary Structure Motifs, and Polarity. We trained our model with a support vector machine classifier on an optimal set of features chosen by a group-based forward feature selection technique. On standard benchmark datasets, our method is able to significantly outperform existing methods for glycation prediction. A web server for iProtGly‐SS is implemented and publicly available at http://brl.uiu.ac.bd/iprotgly-ss/ .

16.

Functional flexibility of human cyclin-dependent kinase-2 and its evolutionary conservation

Bártová I Koca J Otyepka M 《Protein science : a publication of the Protein Society》2008,17(1):22-33

Cyclin-dependent kinase 2 (CDK2) is the most thoroughly studied of the cyclin-dependent kinases that regulate essential cellular processes, including the cell cycle, and it has become a model for studies of regulatory mechanisms at the molecular level. This contribution identifies flexible and rigid regions of CDK2 based on temperature B-factors acquired from both X-ray data and molecular dynamics simulations. In addition, the biological relevance of the identified flexible regions and their motions is explored using information from the essential dynamics analysis related to conformational changes of CDK2 and knowledge of its biological function(s). The conserved regions of CMGC protein kinases' primary sequences are located in the most rigid regions identified in our analyses, with the sole exception of the absolutely conserved G13 in the tip of the glycine-rich loop. The conserved rigid regions are important for nucleotide binding, catalysis, and substrate recognition. In contrast, the most flexible regions correlate with those where large conformational changes occur during CDK2 regulation processes. The rigid regions flank and form a rigid skeleton for the flexible regions, which appear to provide the plasticity required for CDK2 regulation. Unlike the rigid regions (which as mentioned are highly conserved) no evidence of evolutionary conservation was found for the flexible regions.

17.

Conservation of structural fluctuations in homologous protein kinases and its implications on functional sites

Raju Kalaivani Alexandre G. de Brevern Narayanaswamy Srinivasan 《Proteins》2016,84(7):957-978

Our aim is to explore the similarities in structural fluctuations of homologous kinases. Gaussian Network Model based Normal Mode Analysis was performed on 73 active conformation structures in the Ser/Thr/Tyr kinase superfamily. Categories of kinases with progressive evolutionary divergence, viz. (i) Same kinase with many crystal structures, (ii) Within‐Subfamily, (iii) Within‐Family, (iv) Within‐Group, and (v) Across‐Group, were analyzed. We identified a flexibility signature conserved in all kinases involving residues in and around the catalytic loop with consistent low‐magnitude fluctuations. However, the overall structural fluctuation profiles are conserved better in closely related kinases (Within‐Subfamily and Within‐Family) than in distant ones (Within‐Group and Across‐Group). A substantial 65.4% of variation in flexibility was not accounted for by variation in sequences or structures. Interestingly, we identified substructural residue‐wise fluctuation patterns characteristic of kinases of different categories. Specifically, we recognized statistically significant fluctuations unique to families of protein kinase A, cyclin‐dependent kinases, and nonreceptor tyrosine kinases. These fluctuation signatures localized to sites known to participate in protein‐protein interactions typical of these kinase families. We report for the first time that residues characterized by fluctuations unique to the group/family are involved in interactions specific to the group/family. As highlighted for the Src family, local regions with differential fluctuations are proposed as attractive targets for drug design. Overall, our study underscores the importance of considering fluctuations, over and above sequence and structural features, in understanding the roles of sites characteristic of kinases. Proteins 2016; 84:957–978. © 2016 Wiley Periodicals, Inc.

18.

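A review of research on the effects of plant functional traits on soil conservation

Wang Jing Zhao Wenwu Liu Yue Jia Lizhi 《Acta Ecologica Sinica》2019,39(9):3355-3364

Vegetation has an important influence on soil conservation, but few studies have reviewed this influence from the perspective of plant functional traits. This paper reviews the effects of aboveground and belowground plant functional traits on soil conservation, as well as the relationship between aboveground and belowground traits, and draws the following conclusions: (1) aboveground functional traits affect soil conservation mainly through their influence on splash erosion and sheet erosion and by indirectly altering soil physical and chemical properties; the main trait indicators include leaf area, leaf length, leaf width, number of branches, and vegetation height; (2) belowground functional traits affect soil conservation mainly by anchoring soil, increasing soil shear strength, improving soil resistance to erosion, and enhancing soil permeability; soil anchoring by roots is closely related to root tensile strength, and the physical and hydrological properties of root-zone soil are closely related to traits such as fine-root proportion, root length density, and root surface area; (3) aboveground functional traits can serve as indirect indicators of belowground traits, but existing studies are mostly qualitative; (4) research on plant functional traits and soil conservation urgently needs stronger long-term, fixed-site monitoring of aboveground and belowground traits, a deeper understanding of the mechanisms linking functional traits, especially root characteristics, to soil conservation, better quantitative description of aboveground and belowground traits, quantitative relationships between functional traits and soil conservation functions, and dynamic linkage between the two.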

19.

Prediction of protein-protein interactions by combining structure and sequence conservation in protein interfaces

Aytuna AS Gursoy A Keskin O 《Bioinformatics (Oxford, England)》2005,21(12):2850-2855

MOTIVATION: Elucidation of the full network of protein-protein interactions is crucial for understanding of the principles of biological systems and processes. Thus, there is a need for in silico methods for predicting interactions. We present a novel algorithm for automated prediction of protein-protein interactions that employs a unique bottom-up approach combining structure and sequence conservation in protein interfaces. RESULTS: Running the algorithm on a template dataset of 67 interfaces and a sequentially non-redundant dataset of 6170 protein structures, 62 616 potential interactions are predicted. These interactions are compared with the ones in two publicly available interaction databases (Database of Interacting Proteins and Biomolecular Interaction Network Database) and also the Protein Data Bank. A significant number of predictions are verified in these databases. The unverified ones may correspond to (1) interactions that are not covered in these databases but known in literature, (2) unknown interactions that actually occur in nature and (3) interactions that do not occur naturally but may possibly be realized synthetically in laboratory conditions. Some unverified interactions, supported significantly with studies found in the literature, are discussed. AVAILABILITY: http://gordion.hpc.eng.ku.edu.tr/prism CONTACT: agursoy@ku.edu.tr; okeskin@ku.edu.tr.

20.

Accurate prediction of protein structural classes using functional domains and predicted secondary structure sequences

Amin Ahmadi Adl Abbas Nowzari-Dalini Bin Xue Vladimir N. Uversky 《Journal of biomolecular structure & dynamics》2013,31(6):1127-1137

Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods directly based on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of the novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results based on several benchmark data-sets have shown that the integration of new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed prediction method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.