Similar literature
1.
2.
Protein function prediction using local 3D templates   Total citations: 8 (self-citations: 0, citations by others: 8)
The prediction of a protein's function from its 3D structure is becoming more and more important as the worldwide structural genomics initiatives gather pace and continue to solve 3D structures, many of which are of proteins of unknown function. Here, we present a methodology for predicting function from structure that shows great promise. It is based on 3D templates that are defined as specific 3D conformations of small numbers of residues. We use four types of template, covering enzyme active sites, ligand-binding residues, DNA-binding residues and reverse templates. The latter are templates generated from the target structure itself and scanned against a representative subset of all known protein structures. Together, the templates provide a fairly thorough coverage of the known structures and ensure that if there is a match to a known structure it is unlikely to be missed. A new scoring scheme provides a highly sensitive means of discriminating between true positive and false positive template matches. In all, the methodology provides a powerful new tool for function prediction to complement those already in use.

3.
4.
5.
Motif-based searching in TOPS protein topology databases.   Total citations: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: TOPS cartoons are a schematic abstraction of protein three-dimensional structures in two dimensions, and are used for understanding and manual comparison of protein folds. Recently, an algorithm that produces the cartoons automatically from protein structures has been devised and cartoons have been generated to represent all the structures in the structural databank. There is now a need to be able to define target topological patterns and to search the database for matching domains. RESULTS: We have devised a formal language for describing TOPS diagrams and patterns, and have designed an efficient algorithm to match a pattern to a set of diagrams. A pattern-matching system has been implemented, and tested on a database derived from all the current entries in the Protein Data Bank (15,000 domains). Users can search on patterns selected from a library of motifs or, alternatively, they can define their own search patterns. AVAILABILITY: The system is accessible over the Web at http://tops.ebi.ac.uk/tops

6.
We have recently developed a fast approach to comparisons of 3-dimensional structures. Our method is unique, treating protein structures as collections of unconnected points (atoms) in space. It is completely independent of the amino acid sequence order. It is unconstrained by insertions, deletions, and chain directionality. It matches single, isolated amino acids between 2 different structures strictly by their spatial positioning regardless of their relative sequential position in the amino acid chain. It automatically detects a recurring 3D motif in protein molecules. No predefinition of the motif is required. The motif can be either in the interior of the proteins or on their surfaces. In this work, we describe an enhancement over our previously developed technique, which considerably reduces the complexity of the algorithm. This results in an extremely fast technique. A typical pairwise comparison of 2 protein molecules requires less than 3 s on a workstation. We have scanned the structural database with dozens of probes, successfully detecting structures that are similar to the probe. To illustrate the power of this method, we compare the structure of a trypsin-like serine protease against the structural database. Besides detecting homologous trypsin-like proteases, we automatically obtain 3D, sequence order-independent, active-site similarities with subtilisin-like and sulfhydryl proteases. These similarities equivalence isolated residues, not conserving the linear order of the amino acids in the chains. The active-site similarities are well known and have been detected by manually inspecting the structures in a time-consuming, laborious procedure. This is the first time such equivalences are obtained automatically from the comparison of full structures. The far-reaching advantages and the implications of our novel algorithm to studies of protein folding, to evolution, and to searches for pharmacophoric patterns are discussed.

7.
8.
MSAT     
This article describes the development of a new method for multiple sequence alignment based on fold-level protein structure alignments, which provides an improvement in accuracy compared with the most commonly used sequence-only-based techniques. This method integrates the widely used, progressive multiple sequence alignment approach ClustalW with the Topology of Protein Structure (TOPS) topology-based alignment algorithm. The TOPS approach produces a structural alignment for the input protein set by using a topology-based pattern discovery program, providing a set of matched sequence regions that can be used to guide a sequence alignment using ClustalW. The resulting alignments are more reliable than a sequence-only alignment, as determined by 20-fold cross-validation with a set of 106 protein examples from the CATH database, distributed in seven superfold families. The method is particularly effective for sets of proteins that have similar structures at the fold level but low sequence identity. The aim of this research is to contribute towards bridging the gap between protein sequence and structure analysis, in the hope that this can be used to assist the understanding of the relationship between sequence, structure and function. The tool is available at http://balabio.dcs.gla.ac.uk/msat/.

9.
We present a model-based parallel algorithm for origin and orientation refinement for 3D reconstruction in cryoTEM. The algorithm is based upon the Projection Theorem of the Fourier Transform. Rather than projecting the current 3D model and searching for the best match between an experimental view and the calculated projections, the algorithm computes the Discrete Fourier Transform (DFT) of each projection and searches for the central section ("cut") of the 3D DFT that best matches the DFT of the projection. Factors that affect the efficiency of a parallel program are first reviewed and then the performance and limitations of the proposed algorithm are discussed. The parallel program that implements this algorithm, called PO²R, has been used for the refinement of several virus structures, including those of the 500 Å diameter dengue virus (to 9.5 Å resolution), the 850 Å mammalian reovirus (to better than 7 Å), and the 1800 Å Paramecium bursaria chlorella virus (to 15 Å).
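The Projection Theorem that this abstract relies on can be checked numerically on a toy volume. The sketch below is not the PO²R program: it is a minimal, pure-Python illustration that assumes an axis-aligned projection along z, a hypothetical 4 x 4 x 4 density (`toy_volume`), and a naive DFT. Real refinement searches over arbitrary orientations, which requires interpolating oblique sections of the 3D DFT.

```python
import cmath

def toy_volume(n):
    """Deterministic n x n x n density, indexed vol[z][y][x] (illustrative data only)."""
    return [[[float((x + 2 * y + 3 * z * z) % 5) for x in range(n)]
             for y in range(n)] for z in range(n)]

def project_z(vol):
    """Sum the density along z to obtain the 2D projection proj[y][x]."""
    n = len(vol)
    return [[sum(vol[z][y][x] for z in range(n)) for x in range(n)]
            for y in range(n)]

def dft2(img):
    """Naive 2D DFT; the result is indexed [v][u] for frequencies (u, v)."""
    n = len(img)
    k = -2j * cmath.pi / n
    return [[sum(img[y][x] * cmath.exp(k * (u * x + v * y))
                 for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]

def central_section(vol):
    """The w = 0 plane of the naive 3D DFT, indexed [v][u]."""
    n = len(vol)
    k = -2j * cmath.pi / n
    return [[sum(vol[z][y][x] * cmath.exp(k * (u * x + v * y))
                 for z in range(n) for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]

def max_abs_diff(a, b):
    """Largest element-wise difference between two complex 2D arrays."""
    return max(abs(p - q) for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

vol = toy_volume(4)
# The DFT of the projection should equal the central section up to rounding.
print(max_abs_diff(dft2(project_z(vol)), central_section(vol)) < 1e-9)
```

Because the two quantities agree, an experimental view can be compared with sections of the model's 3D DFT directly, without re-projecting the model for every trial orientation.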

10.
Searching for protein structure-function relationships using three-dimensional (3D) structural coordinates represents a fundamental approach for determining the function of proteins with unknown functions. Since protein structure databases are rapidly growing in size, the development of a fast search method to find similar protein substructures by comparison of protein 3D structures is essential. In this article, we present a novel protein 3D structure search method to find all substructures with root mean square deviations (RMSDs) to the query structure that are lower than a given threshold value. Our new algorithm runs in O(m + N/m^0.5) time, after O(N log N) preprocessing, where N is the database size and m is the query length. The new method is 1.8 to 41.6 times faster than the practically best known O(N) algorithm, according to computational experiments using a huge database (i.e., >20,000,000 C-alpha coordinates).

11.
A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
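The monotonicity constraint at the heart of this approach can be illustrated in one dimension with the pool-adjacent-violators procedure. This is only a sketch under stated assumptions: POOL itself works with a multidimensional partial order over several features, whereas the hypothetical `isotonic_fit` below handles a single feature whose values have already been sorted in increasing order, with labels 1 (active-site residue) and 0 (other residue).

```python
def isotonic_fit(labels):
    """Pool-adjacent-violators: non-decreasing least-squares fit to `labels`.

    `labels` must already be ordered by increasing feature value. For 0/1
    labels the fitted values are monotone probability estimates.
    """
    blocks = []  # each block holds [sum of labels, number of items]
    for value in labels:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

# Five residues sorted by a single (hypothetical) feature score.
print(isotonic_fit([0, 1, 0, 1, 1]))
```

The out-of-order pair in the middle is pooled into a shared estimate, so the fitted probabilities never decrease as the feature grows; residues can then be ranked by these values.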

12.
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the Cα atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA+, FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are either comparable or statistically significantly more accurate than all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. These new biological insights include: (i) detecting hinge regions in proteins where domain or sub-domains show flexibility; (ii) discovering similar small molecule binding sites from proteins of different folds and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates as exemplified by the RNA structure superimposition benchmark.

13.
A software suite, SABER (Selection of Active/Binding sites for Enzyme Redesign), has been developed for the analysis of atomic geometries in protein structures, using a geometric hashing algorithm (Barker and Thornton, Bioinformatics 2003;19:1644–1649). SABER is used to explore the Protein Data Bank (PDB) to locate proteins with a specific 3D arrangement of catalytic groups to identify active sites that might be redesigned to catalyze new reactions. As a proof‐of‐principle test, SABER was used to identify enzymes that have the same catalytic group arrangement present in o‐succinylbenzoate synthase (OSBS). Among the highest‐scoring scaffolds identified by the SABER search for enzymes with the same catalytic group arrangement as OSBS were L‐Ala‐D/L‐Glu epimerase (AEE) and muconate lactonizing enzyme II (MLE), both of which have been redesigned to become effective OSBS catalysts, as demonstrated by experiments. Next, we used SABER to search for naturally existing active sites in the PDB with catalytic groups similar to those present in the designed Kemp elimination enzyme KE07. From over 2000 geometric matches to the KE07 active site, SABER identified 23 matches that corresponded to residues from known active sites. The best of these matches, with a 0.28 Å catalytic atom RMSD to KE07, was then redesigned to be compatible with the Kemp elimination using RosettaDesign. We also used SABER to search for potential Kemp eliminases using a theozyme predicted to provide a greater rate acceleration than the active site of KE07, and used Rosetta to create a design based on the proteins identified.

14.
Many increasingly prevalent diseases share a common risk factor: age. However, little is known about pharmaceutical interventions against aging, despite many genes and pathways shown to be important in the aging process and numerous studies demonstrating that genetic interventions can lead to a healthier aging phenotype. An important challenge is to assess the potential to repurpose existing drugs for initial testing on model organisms, where such experiments are possible. To this end, we present a new approach to rank drug‐like compounds with known mammalian targets according to their likelihood to modulate aging in the invertebrates Caenorhabditis elegans and Drosophila. Our approach combines information on genetic effects on aging, orthology relationships and sequence conservation, 3D protein structures, drug binding and bioavailability. Overall, we rank 743 different drug‐like compounds for their likelihood to modulate aging. We provide various lines of evidence for the successful enrichment of our ranking for compounds modulating aging, despite sparse public data suitable for validation. The top ranked compounds are thus prime candidates for in vivo testing of their effects on lifespan in C. elegans or Drosophila. As such, these compounds are promising as research tools and ultimately a step towards identifying drugs for healthier human aging.

15.
Three-dimensional structure of the mini-M conotoxin mr3a   Total citations: 2 (self-citations: 0, citations by others: 2)
Conotoxin mr3a from the venom of Conus marmoreus, a novel peptide that induces rolling seizures in mice, has the peptide sequence GCCGSFACRFGCVOCCV, where O is trans-4-hydroxyproline, and the chain is cross-linked with disulfide bonds between Cys-2 and Cys-16, Cys-3 and Cys-12, and Cys-8 and Cys-15. The tertiary structure of mr3a was determined by 2D ¹H NMR in combination with a standard distance-geometry algorithm. The final set of 22 structures for the peptide had a mean global backbone RMS deviation of 0.53 ± 0.22 Å based on 51 NOE, 6 hydrogen bond, 6 phi dihedral angle, and 3 disulfide bond constraints. Conotoxin mr3a is the first example of the new mini-M branch of conopeptides in the M superfamily. Members of the maxi-M branch, whose structures are known, include the mu- and psi-conotoxins, both of which share a common disulfide bond connectivity. Although mr3a has the same arrangement of Cys residues as the mu- and psi-conotoxins, its disulfide connectivity is different. This gives mr3a a distinctive "triple-turn" backbone.
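The stated disulfide connectivity can be cross-checked against the sequence given in the abstract. The short sketch below uses only the sequence and the three bond pairs quoted above; the function names are illustrative, and residue numbering is assumed to start at 1 at the N-terminal glycine.

```python
SEQUENCE = "GCCGSFACRFGCVOCCV"  # O = trans-4-hydroxyproline, as in the abstract
DISULFIDES = [(2, 16), (3, 12), (8, 15)]  # 1-based residue numbers

def cysteine_positions(seq):
    """Return the 1-based positions of every cysteine in the sequence."""
    return [i for i, residue in enumerate(seq, start=1) if residue == "C"]

def connectivity_is_consistent(seq, bonds):
    """True if every bonded residue is a Cys and each Cys is used exactly once."""
    bonded = sorted(position for bond in bonds for position in bond)
    return bonded == cysteine_positions(seq)

print(cysteine_positions(SEQUENCE))
print(connectivity_is_consistent(SEQUENCE, DISULFIDES))
```

All six cysteines are accounted for by the three bonds, which is what the "3 disulfide bond constraints" in the structure calculation require.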

16.
17.
The database reported here is derived using the Combinatorial Extension (CE) algorithm which compares pairs of protein polypeptide chains and provides a list of structurally similar proteins along with their structure alignments. Using CE, structure-structure alignments can provide insights into biological function. When a protein of known function is shown to be structurally similar to a protein of unknown function, a relationship might be inferred; a relationship not necessarily detectable from sequence comparison alone. Establishing structure-structure relationships in this way is of great importance as we enter an era of structural genomics where there is a likelihood of an increasing number of structures with unknown functions being determined. Thus the CE database is an example of a useful tool in the annotation of protein structures of unknown function. Comparisons can be performed on the complete PDB or on a structurally representative subset of proteins. The source protein(s) can be from the PDB (updated monthly) or uploaded by the user. CE provides sequence alignments resulting from structural alignments and Cartesian coordinates for the aligned structures, which may be analyzed using the supplied Compare3D Java applet, or downloaded for further local analysis. Searches can be run from the CE web site, http://cl.sdsc.edu/ce.html, or the database and software downloaded from the site for local use.

18.
We report the derivation of scores that are based on the analysis of residue-residue contact matrices from 443 3-dimensional structures aligned structurally as 96 families, which can be used to evaluate sequence-structure matches. Residue-residue contacts and the more than 3 × 10^6 amino acid substitutions that take place between pairs of these contacts at aligned positions within each family of structures have been tabulated and segregated according to the solvent accessibility of the residues involved. Contact maps within a family of structures are shown to be highly conserved (approximately 75%) even when the sequence identity is approaching 10%. In a comparison involving a globin structure and the search of a sequence databank (>21,000 sequences), the contact probability scores are shown to provide a very powerful secondary screen for the top scoring sequence-structure matches, where between 69% and 84% of the unrelated matches are eliminated. The search of an aligned set of 2 globins against a sequence databank and the subsequent residue contact-based evaluation of matches locates all 618 globin sequences before the first non-globin match. From a single bacterial serine proteinase structure, the structural template approach coupled with residue-residue contact substitution data leads to the detection of the mammalian serine proteinase family among the top matches in the search of a sequence databank.
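A residue-residue contact matrix of the kind analysed here can be sketched in a few lines. The abstract does not give the contact definition, so everything specific below is an assumption made for illustration: one representative point per residue, an 8 Å distance cutoff, a minimum sequence separation of 2, and conservation measured as shared contacts divided by contacts present in either map.

```python
import math

def contact_map(coords, cutoff=8.0, min_separation=2):
    """Boolean matrix: True where two residues lie within `cutoff` angstroms.

    `coords` holds one (x, y, z) point per residue. The cutoff and the minimum
    sequence separation are illustrative choices, not values from the paper.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap

def contact_pairs(cmap):
    """Set of (i, j) index pairs with i < j that are in contact."""
    n = len(cmap)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if cmap[i][j]}

def contact_conservation(map_a, map_b):
    """Shared contacts divided by contacts found in either map (assumed measure)."""
    a, b = contact_pairs(map_a), contact_pairs(map_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Toy chain: five points spaced 3.8 angstroms apart along a straight line.
chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]
print(sorted(contact_pairs(contact_map(chain))))
```

For two structurally aligned family members, the same function applied to maps indexed by aligned position would give a conservation figure of the kind the abstract reports.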

19.
With the rapid increase in the size of the genome sequence database, computational analysis of RNA will become increasingly important in revealing structure-function relationships and potential drug targets. RNA secondary structure prediction for a single sequence is 73% accurate on average for a large database of known secondary structures. This level of accuracy provides a good starting point for determining a secondary structure either by comparative sequence analysis or by the interpretation of experimental studies. Dynalign is a new computer algorithm that improves the accuracy of structure prediction by combining free energy minimization and comparative sequence analysis to find a low free energy structure common to two sequences without requiring any sequence identity. It uses a dynamic programming construct suggested by Sankoff. Dynalign, however, restricts the maximum distance, M, allowed between aligned nucleotides in the two sequences. This makes the calculation tractable because the complexity is simplified to O(M^3 N^3), where N is the length of the shorter sequence. The accuracy of Dynalign was tested with sets of 13 tRNAs, seven 5S rRNAs, and two R2 3' UTR sequences. On average, Dynalign predicted 86.1% of known base-pairs in the tRNAs, as compared to 59.7% for free energy minimization alone. For the 5S rRNAs, the average accuracy improves from 47.8% to 86.4%. The secondary structure of the R2 3' UTR from Drosophila takahashii is poorly predicted by standard free energy minimization. With Dynalign, however, the structure predicted in tandem with the sequence from Drosophila melanogaster nearly matches the structure determined by comparative sequence analysis.
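Two quantities in this abstract lend themselves to a small illustration: the effect of limiting the distance M between aligned nucleotides, and accuracy expressed as the percentage of known base pairs that are predicted. The sketch below is not Dynalign. It assumes the restriction can be read as |i - k| <= M for position i in one sequence and k in the other, and the function names and example pairs are hypothetical.

```python
def allowed_alignment_pairs(length_1, length_2, max_distance):
    """Count position pairs (i, k) with |i - k| <= max_distance.

    Assumes the restriction on aligned nucleotides is a simple band around the
    diagonal; this is one reading of the abstract, not the published code.
    """
    return sum(1 for i in range(length_1) for k in range(length_2)
               if abs(i - k) <= max_distance)

def percent_known_pairs_predicted(known, predicted):
    """Percentage of known base pairs that also appear in the prediction."""
    known_set = {tuple(sorted(pair)) for pair in known}
    predicted_set = {tuple(sorted(pair)) for pair in predicted}
    return 100.0 * len(known_set & predicted_set) / len(known_set)

# The band keeps far fewer candidate alignments than the full N x N grid.
print(allowed_alignment_pairs(76, 76, 6), "of", 76 * 76)

# Hypothetical base-pair lists, written as (5' position, 3' position).
known_pairs = [(1, 10), (2, 9), (3, 8)]
predicted_pairs = [(1, 10), (9, 2), (4, 7)]
print(percent_known_pairs_predicted(known_pairs, predicted_pairs))
```

The count of permitted position pairs grows linearly in sequence length for fixed M instead of quadratically, which is the intuition behind the reduced complexity quoted in the abstract.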

20.
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.
