20 similar articles found.
1.
Methods for predicting protein function from structure are becoming more important as the rate at which structures are solved increases more rapidly than experimental knowledge. As a result, protein structures now frequently lack functional annotations. The majority of methods for predicting protein function are reliant upon identifying a similar protein and transferring its annotations to the query protein. This method fails when a similar protein cannot be identified, or when any similar proteins identified also lack reliable annotations. Here, we describe a method that can assign function from structure without the use of algorithms reliant upon alignments. Using simple attributes that can be calculated from any crystal structure, such as secondary structure content, amino acid propensities, surface properties and ligands, we describe each enzyme in a non-redundant set. The set is split according to Enzyme Classification (EC) number. We combine the predictions of one-class versus one-class support vector machine models to make overall assignments of EC number to an accuracy of 35% with the top-ranked prediction, rising to 60% accuracy with the top two ranks. In doing so we demonstrate the utility of simple structural attributes in protein function prediction and shed light on the link between structure and function. We apply our methods to predict the function of every currently unclassified protein in the Protein Data Bank.
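The combination of one-class versus one-class models into a ranked assignment can be illustrated with a simple voting scheme. The sketch below is only one plausible way to do this and is not taken from the paper: `rank_classes`, `top_k_hit` and the stand-in `toy_winner` function (which replaces trained pairwise SVMs) are hypothetical names introduced for illustration.

```python
from collections import Counter
from itertools import combinations


def rank_classes(pairwise_winner, classes):
    """Rank classes by how many one-versus-one contests each one wins."""
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        # pairwise_winner stands in for a trained binary model for (a, b).
        votes[pairwise_winner(a, b)] += 1
    # Most votes first; ties are broken alphabetically for determinism.
    return sorted(classes, key=lambda c: (-votes[c], c))


def top_k_hit(ranking, true_class, k):
    """True if the correct class appears among the k top-ranked predictions."""
    return true_class in ranking[:k]


def toy_winner(a, b):
    # Hypothetical stand-in: a fixed preference instead of a real SVM decision.
    preference = {"EC1": 2, "EC2": 3, "EC3": 1}
    return a if preference[a] >= preference[b] else b


ranking = rank_classes(toy_winner, ["EC1", "EC2", "EC3"])
print(ranking, top_k_hit(ranking, "EC1", 1), top_k_hit(ranking, "EC1", 2))
```

Scoring with `top_k_hit` for k = 1 and k = 2 corresponds to the "top-ranked" and "top two ranks" accuracies quoted in the abstract.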
2.
Appala Raju Kotaru Khader Shameer Pandurangan Sundaramurthy Ramesh Chandra Joshi 《Bioinformation》2013,9(7):368-374
Predicting the functions of proteins and alternatively spliced isoforms encoded in a genome is one of the important applications of bioinformatics in the post-genome era. Because experimentally characterizing every protein encoded in a genome by biochemical studies is impractical, bioinformatics methods provide powerful tools for function annotation and prediction. These methods also help narrow the growing sequence-to-function gap. Phylogenetic profiling is a bioinformatics approach to identify the influence of a trait across species and can be employed to infer the evolutionary history of proteins encoded in genomes. Here we propose an improved phylogenetic profile-based method which considers the co-evolution of the reference genome to derive the basic similarity measure, the background phylogeny of target genomes for profile generation, and the assignment of weights to target genomes. The ordering of genomes and the runs of consecutive matches between the proteins were used to define phylogenetic relationships in the approach. We used the Escherichia coli K12 genome as the reference genome, and its 4195 proteins were used in the current analysis. We compared our approach with two existing methods, and our initial results show that its predictions outperform both. In addition, we validated our method using a targeted protein-protein interaction network derived from the protein-protein interaction database STRING. Our preliminary results indicate that function prediction can be improved by using coevolution-based similarity measures and by bringing the runs onto the same scale instead of computing them on different scales. Our method can be applied at the whole-genome level for annotating hypothetical proteins from prokaryotic genomes.
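To make the idea of profile comparison concrete, the sketch below compares two presence/absence profiles over an ordered list of target genomes. The abstract does not give the actual formulas, so both functions (`longest_match_run` and `weighted_similarity`) and the example weights are assumptions chosen only to illustrate "runs of consecutive matches" and per-genome weighting.

```python
def longest_match_run(profile_a, profile_b):
    """Length of the longest stretch of consecutive genomes where both profiles agree."""
    best = current = 0
    for a, b in zip(profile_a, profile_b):
        current = current + 1 if a == b else 0
        best = max(best, current)
    return best


def weighted_similarity(profile_a, profile_b, weights):
    """Fraction of total genome weight carried by genomes where the profiles agree."""
    agree = sum(w for a, b, w in zip(profile_a, profile_b, weights) if a == b)
    return agree / sum(weights)


# 1 = homolog present in that target genome, 0 = absent (genomes in a fixed order).
protein_x = [1, 1, 0, 1, 0, 0]
protein_y = [1, 1, 0, 0, 0, 1]
genome_weights = [2, 1, 1, 1, 1, 2]  # hypothetical weights for the target genomes

print(longest_match_run(protein_x, protein_y))
print(weighted_similarity(protein_x, protein_y, genome_weights))
```

Because the run length depends on the order of the genomes, the ordering itself carries phylogenetic information, which is presumably why the abstract mentions it alongside the runs.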
3.
Avinash Mishra Satyanarayan Rao Aditya Mittal B. Jayaram 《Biochimica et Biophysica Acta - Proteins and Proteomics》2013,1834(8):1520-1531
Specification of the three-dimensional structure of a protein from its amino acid sequence, also called a “Grand Challenge” problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades based on the development of scoring functions (e.g. mimicking free energy) that can capture native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. A scoring function must be fast enough in discriminating the native from unfolded/misfolded structures, and requires validation on large data sets to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that detects the true native structure in the top 5 with 93% accuracy from an ensemble of candidate structures. If we eliminate the native from the ensemble of decoys, pcSM is able to capture a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems consisting of 142,698 decoys (public and CASP; the largest reported hitherto in the literature). The average rank for the native is 2.38, a significant improvement over that existing in the literature. In-silico protein structure prediction requires robust scoring techniques. Therefore, pcSM is easily amenable to integration into a successful protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
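The evaluation quantities quoted above (top-N accuracy and average rank of the native) can be computed as in the following sketch. It does not implement pcSM itself: the scores are made up, and the convention that a lower score is better is an assumption made here for illustration.

```python
def native_rank(scores, native_id):
    """1-based rank of the native structure, assuming lower scores are better."""
    ordered = sorted(scores, key=scores.get)
    return ordered.index(native_id) + 1


def top_n_accuracy(ranks, n):
    """Fraction of systems whose native structure is ranked within the top n."""
    return sum(r <= n for r in ranks) / len(ranks)


# Two hypothetical systems, each mapping structure id -> score.
systems = [
    {"native": -10.0, "decoy1": -8.0, "decoy2": -12.0},
    {"native": -5.0, "decoy1": -1.0},
]
ranks = [native_rank(s, "native") for s in systems]

print(ranks, top_n_accuracy(ranks, 1), sum(ranks) / len(ranks))
```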
4.
Qudsia Yousafi Ayesha Sarfaraz Muhammad Saad Khan Shahzad Saleem Umbreen Shahzad Azhar Abbas Khan Mazhar Sadiq Allah Ditta Abid Muhammad Sohail Shahzad Najam ul Hassan 《Saudi Journal of Biological Sciences》2021,28(4):2197-2209
Lepidoptera is the second most diverse insect order, outnumbered only by the Coleoptera. Acetylcholinesterase (AChE) is the major target site for insecticides. Extensive use of insecticides to inhibit the function of this enzyme has resulted in the development of insecticide resistance. Complete knowledge of the target proteins is very important for understanding the cause of resistance. Computational annotation of insect acetylcholinesterase can be helpful for the characterization of this important protein. Acetylcholinesterase of fourteen lepidopteran insect pest species was annotated by using different bioinformatics tools. AChE in all the species was hydrophilic and thermostable. All the species showed lower values for the instability index except L. orbonalis, S. exigua and T. absoluta. The highest percentages of Arg, Asp, Asn, Gln and Cys were recorded in P. rapae. The high percentage of Cys and Gln might be a reason for insecticide resistance development in P. rapae. Phylogenetic analysis revealed that the AChE sequences of T. absoluta, L. orbonalis and S. exigua are closely related and emerged from the same primary branch. Three functional motifs were predicted in eleven species, while only two were found in L. orbonalis, S. exigua and T. absoluta. AChE in eleven species followed the secretory pathway and had signal peptides. No signal peptides were predicted for S. exigua, L. orbonalis and T. absoluta, which follow a non-secretory pathway. Arginine methylation and cysteine palmitoylation were found in all species except S. exigua, L. orbonalis and T. absoluta. A glycosylphosphatidylinositol (GPI) anchor was predicted in only nine species.
5.
6.
Verspoor K Cohn J Mniszewski S Joslyn C 《Protein science : a publication of the Protein Society》2006,15(6):1544-1549
Automated function prediction (AFP) methods increasingly use knowledge discovery algorithms to map sequence, structure, literature, and/or pathway information about proteins whose functions are unknown into functional ontologies, typically (a portion of) the Gene Ontology (GO). While there are a growing number of methods within this paradigm, the general problem of assessing the accuracy of such prediction algorithms has not been seriously addressed. We present first an application for function prediction from protein sequences using the POSet Ontology Categorizer (POSOC) to produce new annotations by analyzing collections of GO nodes derived from annotations of protein BLAST neighborhoods. We then also present hierarchical precision and hierarchical recall as new evaluation metrics for assessing the accuracy of any predictions in hierarchical ontologies, and discuss results on a test set of protein sequences. We show that our method provides substantially improved hierarchical precision (measure of predictions made that are correct) when applied to the nearest BLAST neighbors of target proteins, as compared with simply imputing that neighborhood's annotations to the target. Moreover, when our method is applied to a broader BLAST neighborhood, hierarchical precision is enhanced even further. In all cases, such increased hierarchical precision performance is purchased at a modest expense of hierarchical recall (measure of all annotations that get predicted at all).
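A common set-based way to compute hierarchical precision and recall is to compare the ancestor closures of the predicted and true nodes. The sketch below uses that formulation on a toy ontology; the exact definitions used by the authors may differ, and the `parents` mapping, node names and the choice to count the root are assumptions for illustration.

```python
def ancestors(node, parents):
    """Return the node together with all of its ancestors (child -> parents mapping)."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def hierarchical_pr(predicted, true, parents):
    """Hierarchical (precision, recall) over ancestor closures, root included."""
    pred_closure = set().union(*(ancestors(n, parents) for n in predicted))
    true_closure = set().union(*(ancestors(n, parents) for n in true))
    overlap = len(pred_closure & true_closure)
    precision = overlap / len(pred_closure) if pred_closure else 0.0
    recall = overlap / len(true_closure) if true_closure else 0.0
    return precision, recall


# Toy ontology: root has children A and B; A has children A1 and A2.
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}

sibling_case = hierarchical_pr({"A1"}, {"A2"}, parents)  # wrong leaf, right branch
general_case = hierarchical_pr({"A"}, {"A1"}, parents)   # correct but less specific
print(sibling_case, general_case)
```

In the second case the prediction is fully correct as far as it goes (precision 1.0) but misses the most specific term, which mirrors the precision-for-recall trade-off described in the abstract.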
7.
Jordi Mestres 《Journal of molecular modeling》2000,6(7-8):539-549
The use of a Gaussian-based representation of protein structures for evaluating protein-structure similarities and deriving three-dimensional superpositions is presented. The approach, as implemented in the program GAPS, is applied to three pairs of proteins with different topological characteristics (rich α-helix, mixed α-helix/β-strand, and rich β-strand), low sequence identities (10–30%), and recognized difficulties in defining a unique optimum alignment. Validation of the GAPS superpositions is done by comparison with superpositions obtained by the TOP, GA_FIT, and ALIGN programs and those directly extracted from the FSSP database. Results suggest that a Gaussian-based methodology offers an objective means to, depending on the Gaussian-based representation, derive a consensus three-dimensional superposition when alternative superposition solutions exist.
8.
The PABP-interacting motif PAM2 has been identified in various eukaryotic proteins as an important binding site for the PABC domain. This domain is contained in homologs of the poly(A)-binding protein PABP and the ubiquitin-protein ligase HYD. Despite the importance of the PAM2 motif, a comprehensive analysis of its occurrence in different proteins has been missing. Using iterated sequence profile searches, we obtained an extensive list of proteins carrying the PAM2 motif. We discuss their functional context and domain architecture, which often consists of RNA-binding domains. Our list of PAM2 motif proteins includes eukaryotic homologs of eRF3/GSPT1/2, PAIP1/2, Tob1/2, Ataxin-2, RBP37, RBP1, Blackjack, HELZ, TPRD, USP10, ERD15, C1D4.14, and the viral protease P29. The identification of the PAM2 motif in as yet uncharacterized proteins can give valuable hints with respect to their cellular function and potential interaction partners and suggests further experimentation. It is also striking that the PAM2 motif appears to occur solely outside globular protein domains.
9.
A new topological method to measure protein structure similarity (total citations: 5; self-citations: 0; citations by others: 5)
A method for the quantitative evaluation of structural similarity between protein pairs is developed that makes use of a Delaunay-based topological mapping. The result of the mapping is a three-dimensional array which is representative of the global structural topology and whose elements can be used to construe an integral scoring scheme. This scoring scheme was tested for its dependence on the protein length difference in a pairwise comparison, its ability to provide a reasonable means for structural similarity comparison within a family of structural neighbors of similar length, and its sensitivity to the differences in protein conformation. It is shown that such a topological evaluation of similarity is capable of providing insight into these points of interest. Protein structure comparison using the method is computationally efficient and the topological scores, although providing different information about protein similarity, correlate well with the distance root-mean-square deviation values calculated by rigid-body structural alignment.
10.
11.
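Gene expression profiles measured under a limited number of experimental conditions support effective prediction only for the gene functional classes related to those conditions, so the range of predictable functional classes needs to be restricted. Accordingly, we first use the Gene Ontology (GO) to select functional classes that are enriched in differentially expressed genes and are therefore related to the experimental conditions. We then use support vector machine classifiers to refine the prediction, deciding whether genes that have so far been annotated only to the parent node of a condition-related functional class belong to that class. Applied to a set of yeast gene expression profiles, and after removing highly imbalanced training sets, the method achieved an average precision above 71% and an average recall above 47%.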
12.
Back-propagation, feed-forward neural networks are used to predict the secondary structures of membrane proteins whose structures are known to atomic resolution. These networks are trained on globular proteins and can predict globular protein structures having no homology to those of the training set with correlation coefficients (C) of 0.45, 0.32 and 0.43 for α-helix, β-strand and random coil structures, respectively. When tested on membrane proteins, neural networks trained on globular proteins do, on average, correctly predict (Qi) 62%, 38% and 69% of the residues in the α-helix, β-strand and random coil structures. These scores rank higher than those obtained with the currently used statistical methods and are comparable to those obtained with the joint approaches tested so far on membrane proteins. The lower success score for β-strand as compared to the other structures suggests that the sample of β-strand patterns contained in the training set is less representative than those of α-helix and random coil. Our analysis, which includes the effects of the network parameters and of the structural composition of the training set on the prediction, shows that regular patterns of secondary structures can be successfully extrapolated from globular to membrane proteins. Correspondence to: R. Casadio
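The two per-state scores quoted above can be computed from aligned observed and predicted state strings. The sketch below assumes the usual definitions: Qi as the percentage of residues observed in state i that are predicted in state i, and C as the Matthews correlation coefficient for that state. The function name and the toy strings are illustrative only.

```python
from math import sqrt


def per_state_scores(observed, predicted, state):
    """Return (Qi in percent, Matthews correlation C) for one secondary-structure state."""
    tp = fp = fn = tn = 0
    for obs, pred in zip(observed, predicted):
        if obs == state and pred == state:
            tp += 1
        elif obs == state:
            fn += 1
        elif pred == state:
            fp += 1
        else:
            tn += 1
    q = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    c = (tp * tn - fp * fn) / denom if denom else 0.0
    return q, c


# H = helix, E = strand, C = coil; toy 8-residue example.
observed = "HHHHEECC"
predicted = "HHHCEHCC"
q_helix, c_helix = per_state_scores(observed, predicted, "H")
print(q_helix, c_helix)
```

Qi alone rewards over-prediction of a state, which is why a correlation coefficient that also penalizes false positives is reported next to it.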
13.
Macdonald JR Johnson WC 《Protein science : a publication of the Protein Society》2001,10(6):1172-1177
We have investigated amino acid features that determine secondary structure: (1) the solvent accessibility of each side chain, and (2) the interaction of each side chain with others one to four residues apart. Solvent accessibility is a simple model that distinguishes residue environment. The pairwise interactions represent a simple model of local side chain to side chain interactions. To test the importance of these features we developed an algorithm to separate alpha-helices, beta-strands, and "other" structure. Single residue and pairwise probabilities were determined for 25,141 samples from proteins with <30% homology. Combining the features of solvent accessibility with pairwise probabilities allows us to distinguish the three structures after cross validation at the 82.0% level. We gain 1.4% to 2.0% accuracy by optimizing the propensities, demonstrating that probabilities do not necessarily reflect propensities. Optimization of residue exposures, weights of all probabilities, and propensities increased accuracy to 84.0%.
14.
Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects but it requires a low rate of false positive hits in addition to a sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures has revealed motif properties in addition to the known PROSITE motif that are utilized in a new predictor described here. The composite prediction function (with separate ad hoc parameterization (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequence positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlation within the motif sequence. The algorithm has been validated with a self-consistency and two jack-knife tests for the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95 % for both taxon-specific versions. The corresponding rate of false positive prediction (for sequences with an N-terminal glycine residue) is close to 0.5 %; thus, the technique is applicable for large-scale automated sequence database annotation. The predictor is available as public WWW-server with the URL http://mendel.imp.univie.ac.at/myristate/.
Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the protein fragments. A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for biological function. Among others, the list includes kinases, phosphatases, proteasomal regulatory subunit 4, kinase interacting proteins KIP1/KIP2, protozoan flagellar proteins, homologues of mitochondrial translocase TOM40, of the neuronal calcium sensor NCS-1 and of the cytochrome c-type heme lyase CCHL. Analyses of complete eukaryote genomes indicate that about 0.5 % of all encoded proteins are apparent NMT substrates except for a higher fraction in Arabidopsis thaliana (approximately 0.8 %).
15.
Jayasinghe S Hristova K White SH 《Protein science : a publication of the Protein Society》2001,10(2):455-458
The reliability of the transmembrane (TM) sequence assignments for membrane proteins (MPs) in standard sequence databases is uncertain because the vast majority are based on hydropathy plots. A database of MPs with dependable assignments is necessary for developing new computational tools for the prediction of MP structure. We have therefore created MPtopo, a database of MPs whose topologies have been verified experimentally by means of crystallography, gene fusion, and other methods. Tests using MPtopo strongly validated four existing MP topology-prediction algorithms. MPtopo is freely available over the internet and can be queried by means of an SQL-based search engine.
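For readers unfamiliar with hydropathy plots, the sketch below shows the basic sliding-window calculation using the Kyte-Doolittle scale. It is not part of MPtopo or of the algorithms tested in the paper; the window length of 19 and the cutoff of 1.6 are conventional choices treated here as adjustable assumptions, and the test sequence is artificial.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows hydrophobic enough to suggest a TM segment."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, h in enumerate(profile) if h >= threshold]


# Artificial sequence: a 19-residue poly-Leu stretch flanked by Asp residues.
toy_sequence = "D" * 10 + "L" * 19 + "D" * 10
profile = hydropathy_profile(toy_sequence)
candidates = candidate_tm_windows(toy_sequence)
print(len(profile), max(profile), candidates)
```

A threshold-based call like this depends only on local hydrophobicity, which illustrates why assignments derived from hydropathy plots alone need experimental verification of the kind collected in MPtopo.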
16.
Copley RR Russell RB Ponting CP 《Protein science : a publication of the Protein Society》2001,10(2):285-292
Sequence similarity is the most common measure currently used to infer homology between proteins. Typically, homologous protein domains show sequence similarity over their entire lengths. Here we identify Asp box motifs, initially found as repeats in sialidases and neuraminidases, in new structural and sequence contexts. These motifs represent significantly similar sequences, localized to beta hairpins within proteins that are otherwise different in sequence and three-dimensional structure. By performing a combined sequence- and structure-based analysis we detect Asp boxes in more than nine protein families, including bacterial ribonucleases, sulfite oxidases, reelin, netrins, some lipoprotein receptors, and a variety of glycosyl hydrolases. Although the function common to each of these proteins, if any, remains unclear, we discuss possible functions of Asp boxes on the basis of previously determined experimental results and discuss different evolutionary scenarios for the origin of Asp-box containing proteins.
17.
Conformational switching in the secondary structure of RNAs has recently attracted considerable attention, fostered by the discovery of ‘riboswitches’ in living organisms. These are genetic control elements that were found in bacteria and offer a unique regulation mechanism based on switching between two highly stable states separated by an energy barrier. In riboswitches, the energy barrier is crossed by direct metabolite binding, which facilitates regulation by allosteric means. However, other event triggers can cause switching to occur, such as single-point mutations and slight variations in temperature. Examples of switches with these event triggers have already been reported experimentally. Here, the goal is to computationally design small RNA switches that rely on these triggers. Towards this end, our computer simulations utilize a variety of different similarity measures to assess the distances between an initial state and triggered states, based on the topology of the secondary structure itself. We describe these combined similarity measures, which rely on both coarse-grained and fine-grained graph representations of the RNA secondary structure. As a result of our simulations, we provide some candidate sequences of approximately 30–50 nt, along with the exact triggers that drive the switching. The event triggers under consideration can be modelled by Zuker's mfold or the Vienna package. The proposed methodology, which relies on shape measures, can further be used to computationally generate more candidates by simulating various event triggers and calculating their effect on the shape.
18.
The ROP2 family of Toxoplasma gondii rhoptry proteins: proteomic and genomic characterization and molecular modeling (total citations: 1; self-citations: 0; citations by others: 1)
El Hajj H Demey E Poncet J Lebrun M Wu B Galéotti N Fourmaux MN Mercereau-Puijalon O Vial H Labesse G Dubremetz JF 《Proteomics》2006,6(21):5773-5784
Four rhoptry proteins (ROP) of Toxoplasma gondii previously identified with mAb have been affinity purified and analyzed by MS; the data obtained allowed the genomic sequences to be assigned to these proteins. As previously suggested for some of them by antibody crossreactivity, these proteins were shown to belong to a family, the prototype of which is ROP2. We describe here the proteins ROP2, 4, 5, and 7. These four proteins correspond to the most abundant products of a gene family that comprises several members which we have identified in genomic and EST libraries. Eight additional sequences were found and we have cloned four of them. All members of the ROP2 family contain a protein-kinase-like domain, but only some of them possess a bona fide kinase catalytic site. Molecular modeling of the kinase domain demonstrates the conservation of residues critical for the stabilization of the protein-kinase fold, especially within a hydrophobic segment described so far as transmembrane and which appears as a helix buried inside the protein. The concomitant synthesis of these ROPs by T. gondii tachyzoites suggests a specific role for each of these proteins, especially in the early interaction with the host cell upon invasion.
19.
L. Jaroszewski L. Rychlewski B. Zhang A. Godzik 《Protein science : a publication of the Protein Society》1998,7(6):1431-1440
Several fold recognition algorithms are compared to each other in terms of prediction accuracy and significance. It is shown that on standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, the sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this adds a new significance to fold recognition methods as a possible first step in function prediction. Despite hybrid methods being more accurate at fold prediction than either the sequence or threading methods, each of the methods is correct in some cases where others have failed. This partly reflects a different perspective on the sequence/structure relationship embedded in various methods. To combine predictions from different methods, estimates of significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a "jury" method, which has accuracy higher than any of the single methods. Finally, building full three-dimensional models for all top predictions helps to eliminate possible false positives where alignments, which are optimal in the one-dimensional sequences, lead to unsolvable steric conflicts for the full three-dimensional models.
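One simple way to picture a "jury" over several methods is to let each method vote for its top fold with a weight equal to its estimated significance. The abstract does not describe how the authors actually combine the estimates, so the summation rule, the function name and the numbers below are all assumptions used purely for illustration.

```python
from collections import defaultdict


def jury_prediction(method_predictions):
    """Pick the fold with the largest summed significance across methods.

    method_predictions maps a method name to (predicted_fold, significance),
    where significance is assumed to be on a comparable scale for all methods.
    """
    totals = defaultdict(float)
    for fold, significance in method_predictions.values():
        totals[fold] += significance
    # Iterate folds in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=totals.get)


# Hypothetical outputs of three methods for one query protein.
predictions = {
    "sequence": ("foldA", 0.6),
    "threading": ("foldB", 0.7),
    "hybrid": ("foldA", 0.5),
}
consensus = jury_prediction(predictions)
print(consensus)
```

Here the single most confident method favors foldB, yet the jury selects foldA because two methods agree on it, which is the kind of case where a combined prediction can beat any individual method.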
20.
Due to advances in molecular biology the DNA sequences of structural genes coding for proteins are often known before a protein is characterized or even isolated. The function of a protein whose amino acid sequence has been deduced from a DNA sequence may not even be known. This has created greater interest in the development of methods to predict the tertiary structures of proteins. The a priori prediction of a protein's structure from its amino acid sequence is not yet possible. However, since proteins with similar amino acid sequences are observed to have similar three-dimensional structures, it is possible to use an analogy with a protein of known structure to draw some conclusions about the structure and properties of an uncharacterized protein. The process of predicting the tertiary structure of a protein relies very much upon computer modeling and analysis of the structure. The prediction of the structure of the bacteriophage 434 cro repressor is used as an example illustrating current procedures.