Similar Documents
1.
Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure.

2.
Searching for protein structure-function relationships using three-dimensional (3D) structural coordinates is a fundamental approach for determining the function of proteins of unknown function. Because protein structure databases are growing rapidly, a fast method for finding similar protein substructures by comparing 3D structures is essential. In this article, we present a novel protein 3D structure search method that finds all substructures whose root mean square deviation (RMSD) from the query structure is below a given threshold. Our new algorithm runs in O(m + N/√m) time after O(N log N) preprocessing, where N is the database size and m is the query length. In computational experiments on a very large database (more than 20,000,000 C-alpha coordinates), the new method was 1.8-41.6 times faster than the best known O(N) algorithm in practice.
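As a point of reference for the threshold search described above, the naive baseline that indexed methods improve on can be sketched as follows: slide a window of the query length along the database, superimpose each window onto the query, and keep windows whose minimal RMSD is at or below the cutoff. This is a minimal sketch, not the authors' O(m + N/√m) algorithm. It computes the optimal-superposition RMSD with the quaternion (Horn/Theobald) largest-eigenvalue formulation and a small Jacobi eigenvalue routine so that it needs only the Python standard library; all function names are hypothetical.

```python
import math


def _centered(points):
    """Translate a list of (x, y, z) tuples so that their centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in points]


def _max_eigenvalue(sym, sweeps=50):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in sym]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-20:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def rmsd_after_superposition(a_pts, b_pts):
    """Minimal RMSD between two equal-size point sets over rotations and translations."""
    if len(a_pts) != len(b_pts) or not a_pts:
        raise ValueError("point sets must be non-empty and equal in size")
    a_c, b_c = _centered(a_pts), _centered(b_pts)
    g_a = sum(x * x + y * y + z * z for x, y, z in a_c)
    g_b = sum(x * x + y * y + z * z for x, y, z in b_c)
    # s[i][j] = sum over atoms of a_i * b_j (cross-covariance of centered coordinates).
    s = [[sum(pa[i] * pb[j] for pa, pb in zip(a_c, b_c)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _max_eigenvalue(key)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, (g_a + g_b - 2.0 * lam) / len(a_c)))


def scan_contiguous(query, database, cutoff):
    """Brute-force O(N*m) scan: start indices of every window with RMSD <= cutoff."""
    m = len(query)
    return [i for i in range(len(database) - m + 1)
            if rmsd_after_superposition(query, database[i:i + m]) <= cutoff]
```

In practice a linear-algebra library (for example an SVD-based Kabsch superposition in NumPy) would replace the hand-written Jacobi routine; it is spelled out here only to keep the sketch dependency-free.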

3.
G Vriend, C Sander. Proteins, 1991, 11(1):52-58
We present a fully automatic algorithm for three-dimensional alignment of protein structures and for the detection of common substructures and structural repeats. Given two proteins, the algorithm first identifies all pairs of structurally similar fragments and then clusters pairs of fragments that are compatible in three dimensions into larger units. The detection of similar substructures is independent of insertion/deletion penalties and can optionally be made independent of the topology of loop connections and allow for reversal of chain direction. Using distance geometry filters and other approximations, the algorithm, implemented in the WHAT IF program, is fast enough that comparing a single protein with the entire database of known protein structures can be performed routinely on a workstation. The method reproduces known non-trivial superpositions such as plastocyanin on azurin. In addition, we report a surprising structural similarity between ubiquitin and a [2Fe-2S] ferredoxin.
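The idea of screening fragment pairs with distance geometry can be illustrated with a simple, rotation-invariant test: two fragments are treated as similar when all of their internal Cα-Cα distances agree within a tolerance. This is an illustrative assumption, not the actual WHAT IF filter; the function names, the fragment length k and the tolerance tol are hypothetical.

```python
import math
from itertools import combinations


def internal_distances(frag):
    """All pairwise distances within a fragment, in a fixed (i < j) order."""
    return [math.dist(p, q) for p, q in combinations(frag, 2)]


def similar_fragment_pairs(coords_a, coords_b, k=4, tol=0.5):
    """Return (i, j) start pairs whose length-k fragments have matching internal
    distances within tol (a superposition-free, rotation-invariant filter)."""
    da = [internal_distances(coords_a[i:i + k]) for i in range(len(coords_a) - k + 1)]
    db = [internal_distances(coords_b[j:j + k]) for j in range(len(coords_b) - k + 1)]
    pairs = []
    for i, u in enumerate(da):
        for j, v in enumerate(db):
            if max(abs(x - y) for x, y in zip(u, v)) <= tol:
                pairs.append((i, j))
    return pairs
```

Because distances are unchanged by reflection, this filter cannot tell a fragment from its mirror image, so a real pipeline would follow it with a proper superposition before clustering the surviving pairs.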

4.
We present a comprehensive evaluation of a new structure mining method called PB-ALIGN. It is based on encoding protein structure as a 1D sequence over 16 short structural motifs, or protein blocks (PBs). PBs are short motifs capable of representing most of the local structural features of a protein backbone. Using a derived PB substitution matrix and a simple dynamic programming algorithm, PB sequences are aligned in the same way as amino acid sequences to yield a structure alignment. Aligning these local features as sequences of symbols enables fast detection of structural similarities between two proteins. The method's ability to characterize and align regions beyond regular secondary structures, for example the N- and C-caps of helices and the loops connecting regular structures, puts it a step ahead of existing methods, which rely strongly on secondary structure elements. PB-ALIGN achieved an efficiency of 85% in extracting the true fold from a large database of 7259 SCOP domains and identified true superfamily members in 82% of cases. In a comparison with 13 existing structure comparison/mining methods, PB-ALIGN performed best on the general ability test dataset and was on par with methods such as YAKUSA and CE on the nontrivial test dataset. Furthermore, the proposed method performed well compared with a flexible structure alignment method such as FATCAT and outperformed it in processing speed (less than 45 s per database scan). This work also establishes a reliable cut-off value for the demarcation of similar folds. Finally, it shows that global alignment scores of unrelated structures using PBs follow an extreme value distribution. PB-ALIGN is freely available on a web server called Protein Block Expert (PBE) at http://bioinformatics.univ-reunion.fr/PBE/.
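The core alignment step, aligning two PB strings by dynamic programming, can be sketched as a score-only Needleman-Wunsch recurrence with a linear gap penalty. The 16 PBs are conventionally labelled a to p; the substitution function toy_pb_score and the gap value are hypothetical stand-ins, not the PB-ALIGN substitution matrix or its gap parameters.

```python
def global_align_score(s, t, sub, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only one previous DP row, so memory is O(len(t))."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(prev[j - 1] + sub(s[i - 1], t[j - 1]),  # align s[i-1] with t[j-1]
                         prev[j] + gap,                          # gap in t
                         cur[j - 1] + gap)                       # gap in s
        prev = cur
    return prev[-1]


def toy_pb_score(x, y):
    # Hypothetical stand-in for a PB substitution matrix: +2 identity, -1 otherwise.
    return 2 if x == y else -1
```

Recovering the actual residue-to-residue correspondence would additionally require keeping the full DP matrix and tracing back from its final cell.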

5.
The availability of fast and robust algorithms for protein structure comparison provides an opportunity to produce a database of three-dimensional comparisons, called families of structurally similar proteins (FSSP). The database currently contains an extended structural family for each of 154 representative (below 30% sequence identity) protein chains. Each data set contains: the search structure; all its relatives with 30-70% sequence identity, aligned structurally; and all other proteins from the representative set that contain substructures significantly similar to the search structure. Very close relatives (above 70% sequence identity) rarely have significant structural differences and are excluded. The alignments of remote relatives are the result of pairwise all-against-all structural comparisons in the set of 154 representative protein chains. The comparisons were carried out with each of three novel automatic algorithms that cover different aspects of protein structure similarity. The user of the database can choose between strict rigid-body comparisons and comparisons that take into account interdomain motion or geometrical distortions, and between comparisons that require strictly sequential ordering of segments and comparisons that allow altered topology of loop connections or chain reversals. The data sets report the structurally equivalent residues in the form of a multiple alignment and as a list of matching fragments to facilitate inspection by three-dimensional graphics. If substructures are ignored, the result is a database of structure alignments of full-length proteins, including those in the twilight zone of sequence similarity. (ABSTRACT TRUNCATED AT 250 WORDS)

6.
We present the development of a web server, Protein Short Motif Search, that allows users to search simultaneously for a protein sequence motif and its secondary structure assignments. The web server can search for very short motifs in PDB structural data from the RCSB Protein Data Bank, with users defining the secondary structure type of each amino acid in the sequence motif. The output uses 3D visualisation to highlight the position of the motif in the structure and in the corresponding sequence. Researchers can easily observe the locations and conformations of multiple motifs among the results. Protein Short Motif Search also provides an application programming interface (API) for interfacing with other bioinformatics tools. AVAILABILITY: The database is freely available at http://birg3.fbb.utm.my/proteinsms.
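A combined sequence-plus-secondary-structure motif search of this kind can be sketched as a window scan that requires both the residues and the per-residue secondary structure labels to match. The wildcard characters ('X' for any residue, '-' for any secondary structure state) and the function name are hypothetical; the server's actual query syntax is not described in the abstract.

```python
def find_motif_with_ss(sequence, ss_string, motif, ss_pattern):
    """Start indices where residues match `motif` and per-residue secondary
    structure labels match `ss_pattern` ('X' and '-' act as wildcards)."""
    if len(sequence) != len(ss_string) or len(motif) != len(ss_pattern):
        raise ValueError("sequence/ss_string and motif/ss_pattern must pair up")
    m = len(motif)
    hits = []
    for i in range(len(sequence) - m + 1):
        seq_ok = all(p == "X" or p == r for p, r in zip(motif, sequence[i:i + m]))
        ss_ok = all(p == "-" or p == s for p, s in zip(ss_pattern, ss_string[i:i + m]))
        if seq_ok and ss_ok:
            hits.append(i)
    return hits
```

Here ss_string is assumed to be a per-residue assignment string (for example H for helix, E for strand, C for coil) aligned one-to-one with the sequence.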

7.
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and comprises 604 families of homologous proteins involving 2739 protein domain structures, with each family made up of at least two members. Each member of a family has been structurally aligned with every other member of the same family (pairwise alignment), and all members of the family are also aligned using simultaneous superposition (multiple alignment). The structural alignments are performed largely automatically, with manual intervention especially for distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5): one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on the similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence- and structure-based relationships among the members of a family. Structure-based alignments with details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently through a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource for analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single-member families). Using the web interface involving PSI-BLAST and PHYLIP, it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database, with its web-interfaced search and dendrogram generation tools, can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.

8.
Proteins that contain similar structural elements often have analogous functions, regardless of the degree of sequence similarity or of structural connectivity in space. In general, protein structure comparison (PSC) gives biologists a straightforward way to determine critical aspects of structure and function. Here, we developed a novel PSC technique based on angle-distance image (A-D image) transformation and matching, which is independent of sequence similarity and of the connectivity of secondary structure elements (SSEs). An A-D image is constructed from protein secondary structure information. Depending on the SSE types involved, the mutual SSE pairs of the query protein are classified into three different types of sub-images. Corresponding sub-images of the query and target protein structures are then compared using modified cross-correlation approaches to identify similar patterns. Structural relationships among proteins are displayed as hierarchical clustering trees, which facilitate establishing evolutionary relationships between the structures and functions of various proteins. Four standard test datasets and one newly created dataset were used to evaluate the proposed method. The results demonstrate that proteins from these five datasets can be categorized in accordance with the spatial distribution of their SSEs. Moreover, for proteins with low sequence identity but high structural similarity, the proposed algorithm is an efficient and effective method for structural comparison.
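The comparison step rests on cross-correlation between image patterns. As a hedged illustration, plain normalized cross-correlation of two same-shaped grids is shown below; the authors' "modified" cross-correlation is not specified in the abstract, so this is a generic stand-in, and the inputs are assumed to be already-built sub-images represented as lists of rows.

```python
import math


def normalized_cross_correlation(img_a, img_b):
    """Pearson-style NCC between two same-shape 2D grids.

    Returns 1.0 for identical patterns (up to brightness scale and offset),
    -1.0 for inverted ones, and 0.0 if either grid is constant."""
    if [len(r) for r in img_a] != [len(r) for r in img_b]:
        raise ValueError("images must have the same shape")
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0
```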

9.
Shih ES, Hwang MJ. Proteins, 2004, 56(3):519-527
Comparing two protein structures often yields not only a global alignment but also a number of distinct local alignments; the latter, referred to as alternative alignments, are usually ignored in existing protein structure comparison analyses. Here, we used a novel protein structure comparison method to extensively identify and characterize the alternative alignments obtained for structure pairs from a fold classification database. We showed that all alternative alignments can be classified into one of just a few types, and we used them to illustrate the potential of alternative alignments for identifying recurring protein substructures, including internal structural repeats within a protein. Furthermore, we showed that among the alternative alignments obtained, permuted alignments, which include both circular and scrambled permutations, are as prevalent as topological alignments. These results demonstrate that the so far largely neglected alternative alignments of protein structures have implications and applications for research on protein classification and evolution.

10.
Based on a study involving structural comparisons of proteins sharing 25% or less sequence identity, three rounds of PSI-BLAST appear capable of identifying remote evolutionary homologs with greater than 95% confidence, provided that more than 50% of the query sequence can be aligned with the target sequence. Since more than 80% of all homologous protein pairs appear to lack significant sequence similarity, experimental biologists often get little guidance from conventional homology searches based on pairwise sequence comparisons. The ability to disregard levels of sequence identity and expect values in PSI-BLAST when at least 50% of the query sequence has been aligned allows new hypotheses to be generated from matches that are conventionally disregarded. In one example, we suggest a possible evolutionary link between the cupredoxin and immunoglobulin fold families. In a second example, a thermostable hypothetical protein of unknown function may be a circularly permuted homolog of phosphotriesterase, an enzyme capable of detoxifying organophosphate nerve agents. In a third example, the amino acid sequence of another hypothetical protein of unknown function reveals an ATP-binding site, a metal-binding site, and a catalytic side chain consistent with kinase activity of unknown specificity. This approach significantly expands the utility of existing sequence data for defining the primary structure degeneracy of binding sites for substrates, cofactors, and other proteins.
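The coverage criterion described above can be expressed as a filter that keeps a hit when its alignment spans more than half of the query, irrespective of E-value. The hit record fields (target, q_start, q_end) and function names are hypothetical; real PSI-BLAST tabular output would first have to be parsed into this shape.

```python
def query_coverage(q_start, q_end, query_len):
    """Fraction of the query spanned by an alignment (1-based, inclusive coordinates)."""
    return (q_end - q_start + 1) / query_len


def coverage_filtered_hits(hits, query_len, min_coverage=0.5):
    """Keep targets whose alignment spans MORE than min_coverage of the query,
    deliberately ignoring E-value and percent identity."""
    return [h["target"] for h in hits
            if query_coverage(h["q_start"], h["q_end"], query_len) > min_coverage]
```

Note the strict inequality: an alignment covering exactly half of the query is not kept, matching the "more than 50%" wording.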

11.
C Sander, R Schneider. Proteins, 1991, 9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

12.
We have recently developed a fast approach to comparing 3-dimensional structures. Our method is unique in treating protein structures as collections of unconnected points (atoms) in space. It is completely independent of the amino acid sequence order. It is unconstrained by insertions, deletions, and chain directionality. It matches single, isolated amino acids between two different structures strictly by their spatial positions, regardless of their relative sequential positions in the amino acid chain. It automatically detects a recurring 3D motif in protein molecules; no predefinition of the motif is required. The motif can be either in the interior of the proteins or on their surfaces. In this work, we describe an enhancement of our previously developed technique that considerably reduces the complexity of the algorithm, resulting in an extremely fast technique. A typical pairwise comparison of two protein molecules requires less than 3 s on a workstation. We have scanned the structural database with dozens of probes, successfully detecting structures that are similar to the probe. To illustrate the power of this method, we compare the structure of a trypsin-like serine protease against the structural database. Besides detecting homologous trypsin-like proteases, we automatically obtain 3D, sequence order-independent, active-site similarities with subtilisin-like and sulfhydryl proteases. These similarities equate isolated residues without conserving the linear order of the amino acids in the chains. The active-site similarities are well known but had been detected only by manually inspecting the structures, a time-consuming and laborious procedure. This is the first time such equivalences have been obtained automatically from the comparison of full structures. The far-reaching advantages and implications of our novel algorithm for studies of protein folding and evolution, and for searches for pharmacophoric patterns, are discussed.

13.
Mooney SD, Liang MH, DeConde R, Altman RB. Proteins, 2005, 61(4):741-747
A primary challenge for structural genomics is the automated functional characterization of protein structures. We have developed a sequence-independent method called S-BLEST (Structure-Based Local Environment Search Tool) for the annotation of previously uncharacterized protein structures. S-BLEST encodes the local environment of an amino acid as a vector of structural property values. It has been applied to all amino acids in a nonredundant database of protein structures to generate a searchable structural resource. Given a query amino acid from an experimentally determined or modeled structure, S-BLEST quickly identifies similar amino acid environments using a K-nearest neighbor search. In addition, the method estimates the statistical significance of each result. We validated S-BLEST on X-ray crystal structures from the ASTRAL 40 nonredundant dataset. We then applied it to 86 crystallographically determined proteins in the Protein Data Bank (PDB) with unknown function and no significant sequence neighbors in the PDB. S-BLEST was able to associate 20 proteins with at least one local structural neighbor and identify the amino acid environments that are most similar between those neighbors.
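The retrieval step, a K-nearest-neighbour search over vectors of structural property values, can be sketched with Euclidean distance and heapq.nsmallest. The feature vectors and labels are hypothetical, S-BLEST's actual property set and its significance estimate are not reproduced, and a real index would normally standardise features and use a spatial data structure rather than this linear scan.

```python
import heapq
import math


def k_nearest_environments(query_vec, database, k=3):
    """Return the k (distance, label) pairs closest to query_vec by Euclidean distance.

    database: iterable of (label, vector) pairs, where each vector holds
    hypothetical structural-property values for one residue environment."""
    return heapq.nsmallest(k, ((math.dist(query_vec, vec), label) for label, vec in database))
```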

14.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit known protein 3D structures using a structural alphabet known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. The alphabet is used to encode 3D protein structures into 1D PB sequences and to capture sequence-to-structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences onto protein folds encoded as PB sequences. It uses no information from residue contacts or sequence-search methods, nor does it explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivities of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in the PDB: 38 compatible templates were identified by our approach, and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotation at the genomic level. It has been implemented as a web server that is freely available at http://www.bo-protscience.fr/forsa.
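Scoring a sequence against a PB-encoded fold can be sketched as an ungapped global threading score: sum a per-PB amino-acid log-odds value over aligned positions, then convert the score to a Z-score against shuffled versions of the query. The log-odds table, the shuffle count and the shuffle-based Z-score are assumptions for illustration; the published method's occurrence matrices and its local threading are not reproduced.

```python
import random
import statistics


def threading_score(seq, pb_seq, log_odds):
    """Ungapped score of placing amino acids `seq` onto a fold encoded as PB letters.

    log_odds: hypothetical mapping PB letter -> {amino acid: score}; missing
    amino acids contribute 0."""
    if len(seq) != len(pb_seq):
        raise ValueError("this sketch threads without gaps; lengths must match")
    return sum(log_odds[pb].get(aa, 0.0) for aa, pb in zip(seq, pb_seq))


def z_score(seq, pb_seq, log_odds, n_shuffles=200, seed=0):
    """Z-score of the real threading score against shuffled query sequences."""
    rng = random.Random(seed)
    real = threading_score(seq, pb_seq, log_odds)
    shuffled = []
    for _ in range(n_shuffles):
        residues = list(seq)
        rng.shuffle(residues)
        shuffled.append(threading_score("".join(residues), pb_seq, log_odds))
    sd = statistics.pstdev(shuffled)
    return (real - statistics.mean(shuffled)) / sd if sd else 0.0
```

Shuffling preserves amino-acid composition, so the Z-score measures how well the residue order, rather than the composition, fits the fold.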

15.
We present a novel notion of binding site local similarity based on the analysis of complete protein environments of ligand fragments. Comparison of a query protein binding site (target) against the 3D structure of another protein (analog) in complex with a ligand enables ligand fragments from the analog complex to be transferred to positions in the target site, so that the complete protein environments of the fragment and its image are similar. The revealed environments are similarity regions and the fragments transferred to the target site are considered as binding patterns. The set of such binding patterns derived from a database of analog complexes forms a cloud-like structure (fragment cloud), which is a powerful tool for computational drug design. It has been shown on independent test sets that the combined use of a traditional energy-based score together with the cloud-based score responsible for the quality of embedding of a ligand into the fragment cloud improves the self-docking and screening results dramatically. The usage of a fragment cloud as a source of positioned molecular fragments fitting the binding protein environment has been validated by reproduction of experimental ligand optimization results.

16.
Reddy BV, Li WW, Shindyalov IN, Bourne PE. Proteins, 2001, 42(2):148-163
An all-against-all protein structure comparison using the Combinatorial Extension (CE) algorithm applied to a representative set of PDB structures revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method to identify conserved amino acid positions and residue-dependent property clusters within these subsequences, starting from structure alignments. Each of the subsequences is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each subsequence are used to score conserved key amino acid positions (CKAAPs). We have set the top 20% of the high-scoring positions in each substructure to be CKAAPs. It is hypothesized that CKAAPs may be responsible for the common folding patterns from either a local or a global view of the protein-folding pathway. Where a significant number of structures exist, CKAAPs have also been identified in structure alignments of complete polypeptide chains from the same protein family or superfamily. Evidence supporting the presence of CKAAPs comes from other computational approaches and from experimental mutation and protein-folding studies, notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non-CKAAPs is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure prediction studies, and depends on the availability of structures and an accurate structure alignment methodology.
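Selecting the top 20% of positions can be sketched as below. The abstract does not give the scoring formula, so the per-column score here is a hypothetical stand-in (log2(20) minus the Shannon entropy of the residues observed at that position across the aligned homologues); only the top-fraction selection rule comes from the text.

```python
import math
from collections import Counter


def column_conservation(column):
    """Hypothetical conservation score: log2(20) minus the Shannon entropy (bits)
    of the residues in one alignment column (a string, one residue per homologue)."""
    counts = Counter(column)
    n = len(column)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(20) - entropy


def top_fraction_positions(alignment_columns, fraction=0.2):
    """Indices of the highest-scoring `fraction` of positions (at least one), in order."""
    scores = [column_conservation(col) for col in alignment_columns]
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

With a weighted scheme such as the one the abstract mentions, each homologue would contribute its weight instead of a count of one.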

17.
Searches using position-specific scoring matrices (PSSMs) are commonly used in remote homology detection procedures such as PSI-BLAST and RPS-BLAST. A PSSM is typically generated using one of the sequences of a family as the reference sequence. In PSI-BLAST searches, the reference sequence is the query itself. We have recently shown that searches against a database of multiple family profiles, with each member of the family used in turn as the reference sequence, are more effective than searches against the classical database of single family profiles. Despite a relatively better overall performance compared with common sequence-profile matching procedures, searches against the multiple family-profile database produce a few false positives and false negatives. Here we show that profile length and the divergence of the sequences used to construct a PSSM strongly influence the performance of the multiple-profile search approach. We also show that a simple parameter, defined as the number of a family's PSSMs hit by a query divided by the total number of PSSMs in that family, effectively distinguishes true positives from false positives in the multiple-profile search approach.
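The discriminating parameter described above is a per-family hit fraction. A minimal sketch, assuming hypothetical PSSM and family identifiers, is:

```python
from collections import defaultdict


def family_hit_fractions(hit_pssm_ids, pssm_family):
    """For each family hit at least once: (# of its PSSMs hit) / (# PSSMs in the family).

    pssm_family: mapping PSSM id -> family id (hypothetical identifiers).
    Repeated hits on the same PSSM are counted once."""
    totals = defaultdict(int)
    for fam in pssm_family.values():
        totals[fam] += 1
    hits = defaultdict(set)
    for pid in hit_pssm_ids:
        hits[pssm_family[pid]].add(pid)
    return {fam: len(ids) / totals[fam] for fam, ids in hits.items()}
```

Which fraction should separate true from false positives is a calibration question that the abstract does not answer, so no cutoff is hard-coded here.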

18.
Structural comparison reveals remote homology that often fails to be detected by sequence comparison. The DALI web server ( http://ekhidna2.biocenter.helsinki.fi/dali ) is a platform for structural analysis that provides database searches and interactive visualization, including structural alignments annotated with secondary structure, protein families and sequence logos, and 3D structure superimposition supported by color-coded sequence and structure conservation. Here, we use DALI to mine version 1 of the AlphaFold Database, which increased the structural coverage of protein families by 20%. We found 100 remote homologous relationships not yet reported in the current reference database for protein domains, Pfam 35.0. In particular, we linked 35 domains of unknown function (DUFs) to previously characterized families, generating functional hypotheses that can be explored downstream in structural biology studies. Other findings include gene fusions, tandem duplications, and adjustments to domain boundaries. The evidence for homology can be browsed interactively through live examples on DALI's website.

19.
To address many challenges in RNA structure/function prediction, the characterization of RNA's modular architectural units is required. Using the RNA-As-Graphs (RAG) database, we have previously explored the existence of secondary structure (2D) submotifs within larger RNA structures. Here we present RAG-3D, a dataset of RNA tertiary (3D) structures and substructures plus a web-based search tool, designed to exploit graph representations of RNAs for the goal of searching for similar 3D structural fragments. The objects in RAG-3D consist of 3D structures translated into 3D graphs, cataloged based on the connectivity between their secondary structure elements. Each graph is additionally described in terms of its subgraph building blocks. The RAG-3D search tool then compares a query RNA 3D structure to those in the database to obtain structurally similar structures and substructures. This comparison reveals conserved 3D RNA features and thus may suggest functional connections. Though RNA search programs based on similarity in sequence, 2D, and/or 3D structural elements are available, our graph-based search tool may be advantageous for illuminating similarities that are not obvious; using motifs rather than sequence space also reduces search times considerably. Ultimately, such substructuring could be useful for RNA 3D structure prediction, structure/function inference and inverse folding.

20.
Yang JM, Tung CH. Nucleic Acids Research, 2006, 34(13):3646-3659
As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
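3D-BLAST first converts backbone fragments into letters of a structural alphabet. Assuming each fragment is summarised by two backbone angles in degrees and each letter by a prototype angle pair (the prototypes and function names below are hypothetical, not the 23 published states), the encoding step amounts to a nearest-prototype assignment with periodic angle differences:

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode_fragments(angle_pairs, prototypes):
    """Map each (angle1, angle2) pair to the letter of its nearest prototype.

    prototypes: list of (letter, (angle1, angle2)) entries; distance is the
    squared periodic difference summed over both angles."""
    letters = []
    for a1, a2 in angle_pairs:
        best = min(prototypes,
                   key=lambda item: angle_diff(a1, item[1][0]) ** 2 + angle_diff(a2, item[1][1]) ** 2)
        letters.append(best[0])
    return "".join(letters)
```

The resulting strings form a structural alphabet sequence database that can then be searched with BLAST-style local alignment under a substitution matrix such as the SASM described above.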


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417