Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
We carry out a systematic analysis of the correlation between the similarity of protein three-dimensional structures and their evolutionary relationships. Structural similarity is quantified by an all-against-all comparison of the spatial arrangement of secondary structural elements in 967 nonredundant representative proteins, and the evolutionary relationship is judged according to the definition of superfamily in the SCOP database. We find the following symmetry rule: a pair of proteins that have similar folds but belong to different superfamilies has (with a very rare exception) a certain internal symmetry in the fold they share. Possible reasons behind the symmetry rule are discussed.

2.
Similarity search for protein 3D structures has become complex and computationally expensive because protein structure databases continue to grow rapidly. Fast structural similarity search systems are needed for practical use in protein structure classification, since existing comparison systems do not deliver results in time. Our approach uses multi-step processing composed of a preprocessing step that represents the geometry of protein structures as spatial objects, a filter step that generates a small candidate set using approximate topological string matching, and a refinement step that computes a structural alignment. This paper describes the preprocessing and filtering for fast similarity search, based on the discovery of topological patterns of secondary structure elements from their spatial relations. Our system is fully implemented using Oracle 8i Spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures by means of the spatial relations of spatial databases is practical for fast structural similarity search for proteins.

3.
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.

4.
Discovery of local packing motifs in protein structures   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a language for describing structural patterns of residues in protein structures and a method for the discovery of such patterns that recur in a set of protein structures. The patterns impose restrictions on the spatial position of each residue, their order along the amino acid chain, and which amino acids are allowed in each position. Unlike other methods for comparing sets of protein structures, our method is not based on pairwise structure comparisons, which are often time-consuming and can produce inconsistent results. Instead, the method simultaneously takes into account information from all structures in the search for conserved structure patterns, which are potential structure motifs. The method is based on describing the spatial neighborhoods of each residue in each structure as a string and applying a sequence pattern discovery method to find patterns common to subsets of these strings. Finally, it is checked whether the similarities between the neighborhood strings correspond to spatially similar substructures. We apply the method to analyze sets of very disparate proteins from four different protein families: serine proteases, cuprodoxins, cysteine proteinases, and ferredoxins. The motifs found by the method correspond well to the site and motif information given in the annotation of these proteins in PDB, Swiss-Prot, and PROSITE. Furthermore, the motifs are confirmed by using the motif data to constrain the structural alignment of the proteins obtained with the program SAP. This gave the best superposition/alignment of the proteins given the motif assignment.

5.
RNA binding proteins recognize RNA targets in a sequence specific manner. Apart from the sequence, the secondary structure context of the binding site also affects the binding affinity. Binding sites are often located in single-stranded RNA regions and it was shown that the sequestration of a binding motif in a double-strand abolishes protein binding. Thus, it is desirable to include knowledge about RNA secondary structures when searching for the binding motif of a protein. We present the approach MEMERIS for searching sequence motifs in a set of RNA sequences and simultaneously integrating information about secondary structures. To abstract from specific structural elements, we precompute position-specific values measuring the single-strandedness of all substrings of an RNA sequence. These values are used as prior knowledge about the motif starts to guide the motif search. Extensive tests with artificial and biological data demonstrate that MEMERIS is able to identify motifs in single-stranded regions even if a stronger motif located in double-strand parts exists. The discovered motif occurrences in biological datasets mostly coincide with known protein-binding sites. This algorithm can be used for finding the binding motif of single-stranded RNA-binding proteins in SELEX or other biological sequence data.
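
The idea of turning single-strandedness values into a prior over motif start positions can be illustrated with a minimal sketch. This is not the MEMERIS implementation: it assumes that per-nucleotide unpaired probabilities are already available (for example from an RNA folding program), scores each window by its mean unpaired probability, and normalizes the scores. The function name, the pseudocount, and the input values are hypothetical.

```python
def start_position_prior(unpaired, motif_len, pseudocount=0.01):
    """Return a prior over motif start positions.

    unpaired    -- per-nucleotide probabilities of being unpaired (assumed input)
    motif_len   -- width of the motif window
    pseudocount -- hypothetical smoothing term so that fully paired windows
                   keep a small, nonzero prior
    """
    n_starts = len(unpaired) - motif_len + 1
    if n_starts <= 0:
        raise ValueError("sequence is shorter than the motif")

    scores = []
    for start in range(n_starts):
        window = unpaired[start:start + motif_len]
        # Mean unpaired probability of the window, plus smoothing.
        scores.append(sum(window) / motif_len + pseudocount)

    total = sum(scores)
    return [score / total for score in scores]


# Toy input: the first three bases are mostly unpaired, the last three mostly paired.
prior = start_position_prior([0.9, 0.9, 0.9, 0.1, 0.1, 0.1], motif_len=3)
```

In this toy input the window starting at position 0 receives the largest prior, so a motif search weighted by `prior` would favor the single-stranded region.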

6.
7.
RNAMotif, an RNA secondary structure definition and search algorithm   Cited by: 26 (self-citations: 7, citations by others: 19)
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including the complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important aspects: first, the structure definition language is more flexible and can specify any type of base–base interaction; second, RNAMotif provides a user controlled scoring section that can be used to add capabilities that patterns alone cannot provide.

8.
Structural trees for large protein superfamilies, such as β proteins with the aligned β sheet packing, β proteins with the orthogonal packing of α helices, two-layer and three-layer α/β proteins, have been constructed. The structural motifs having unique overall folds and a unique handedness are taken as root structures of the trees. The larger protein structures of each superfamily are obtained by a stepwise addition of α helices and/or β strands to the corresponding root motif, taking into account a restricted set of rules inferred from known principles of the protein structure. Among these rules, prohibition of crossing connections, attention to handedness and compactness, and a requirement for α helices to be packed in α-helical layers and β strands in β layers are the most important. Proteins and domains whose structures can be obtained by stepwise addition of α helices and/or β strands to the same root motif can be grouped into one structural class or a superfamily. Proteins and domains found within branches of a structural tree can be grouped into subclasses or subfamilies. Levels of structural similarity between different proteins can easily be observed by visual inspection. Within one branch, protein structures having a higher position in the tree include the structures located lower. Proteins and domains of different branches have the structure located in the branching point as the common fold. Proteins 28:241–260, 1997. © 1997 Wiley-Liss Inc.

9.
Proteins that contain similar structural elements often have analogous functions regardless of the degree of sequence similarity or structure connectivity in space. In general, protein structure comparison (PSC) provides a straightforward methodology for biologists to determine critical aspects of structure and function. Here, we developed a novel PSC technique based on angle-distance image (A-D image) transformation and matching, which is independent of sequence similarity and connectivity of secondary structure elements (SSEs). An A-D image is constructed by utilizing protein secondary structure information. According to the types of SSEs involved, the mutual SSE pairs of the query protein are classified into three different types of sub-images. Subsequently, corresponding sub-images between query and target protein structures are compared using modified cross-correlation approaches to identify the similarity of various patterns. Structural relationships among proteins are displayed by hierarchical clustering trees, which facilitate the establishment of the evolutionary relationships between structure and function of various proteins. Four standard testing datasets and one newly created dataset were used to evaluate the proposed method. The results demonstrate that proteins from these five datasets can be categorized in conformity with their spatial distribution of SSEs. Moreover, for proteins with low sequence identity that share high structure similarity, the proposed algorithm is an efficient and effective method for structural comparison.

10.
Han K, Nepal C. FEBS Letters, 2007, 581(9):1881-1890
A complete understanding of protein and RNA structures and their interactions is important for determining the binding sites in protein-RNA complexes. Computational approaches exist for identifying secondary structural elements in proteins from atomic coordinates. However, similar methods have not been developed for RNA, due in part to the very limited structural data so far available. We have developed a set of algorithms for extracting and visualizing secondary and tertiary structures of RNA and for analyzing protein-RNA complexes. These algorithms have been implemented in a web-based program called PRI-Modeler (protein-RNA interaction modeler). Given one or more protein data bank files of protein-RNA complexes, PRI-Modeler analyzes the conformation of the RNA, calculates the hydrogen bond (H bond) and van der Waals interactions between amino acids and nucleotides, extracts secondary and tertiary RNA structure elements, and identifies the patterns of interactions between the proteins and RNAs. This paper presents PRI-Modeler and its application to the hydrogen bond and van der Waals interactions in the most representative set of protein-RNA complexes. The analysis reveals several interesting interaction patterns at various levels. The information provided by PRI-Modeler should prove useful for determining the binding sites in protein-RNA complexes. PRI-Modeler is accessible at http://wilab.inha.ac.kr/primodeler/, and supplementary materials are available in the analysis results section at http://wilab.inha.ac.kr/primodeler/.

11.
Boberg J, Salakoski T, Vihinen M. Proteins, 1992, 14(2):265-276
Reliable structural and statistical analyses of three-dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together, and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three-dimensional structures of proteins of the same structural class are generally more conserved than their sequences, the proteins were also clustered according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from a subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view led to the reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.
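
The step of grouping similar sequences and keeping one representative per group can be sketched as a greedy clustering. This is only an illustration under stated assumptions, not the procedure of the paper: the `identity` function is a toy stand-in for a pairwise alignment score, the threshold of 0.5 is arbitrary (the paper derives its threshold from random sequences), and the first sequence seen in a group serves as its representative rather than a "typical" one.

```python
def identity(a, b):
    """Fraction of identical positions; a toy stand-in for an alignment score."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def pick_representatives(seqs, threshold=0.5, similarity=identity):
    """Greedy clustering: each sequence joins the first representative it resembles.

    Returns a list of (representative, members) pairs.
    """
    clusters = []
    for seq in seqs:
        for rep, members in clusters:
            if similarity(rep, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new group.
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical toy sequences.
clusters = pick_representatives(["ACDEFG", "ACDEFA", "WWWWWW", "ACDQFG"])
representatives = [rep for rep, _ in clusters]
```

Because the result depends on input order, a real pipeline would typically sort the sequences first (for example by structure resolution) so that the preferred entry becomes the representative.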

12.
Liu X, Zhao YP, Zheng WM. Proteins, 2008, 71(2):728-736
CLEMAPS is a tool for multiple alignment of protein structures. It distinguishes itself from other existing algorithms for multiple structure alignment by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of three angles formed by C(alpha) pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. The input 3D structures are first converted to sequences of conformational letters. Each string of a fixed length is then taken as the center seed to search other sequences for neighbors of the seed, which are strings similar to the seed. A seed and its neighbors form a center-star, which corresponds to a fragment set of local structural similarity shared by many proteins. The detection of center-stars using CLESUM is extremely efficient. Local similarity is a necessary, but insufficient, condition for structural alignment. Once center-stars are found, the spatial consistency between any two stars is examined using atomic coordinates to find consistent star duads. Consistent duads are later joined to create a core for multiple alignment, which is further polished to produce the final alignment. The utility of CLEMAPS is tested on various protein structure ensembles.
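
The three angles that underlie a conformational letter can be computed from four consecutive C(alpha) positions. The following sketch is an illustration only and not the CLEMAPS code: it assumes the three angles are the two bend angles between successive pseudobond vectors and the torsion angle around the middle pseudobond, and it stops before the discretization into letters. All function names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bend_angle(u, v):
    """Angle in degrees between two pseudobond vectors."""
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def pseudobond_angles(p0, p1, p2, p3):
    """Return (bend, torsion, bend) in degrees for four C-alpha positions."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # Signed torsion around the middle pseudobond b2, via atan2.
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    torsion = math.degrees(math.atan2(y, x))
    return bend_angle(b1, b2), torsion, bend_angle(b2, b3)


# Hypothetical planar test geometry (coordinates in arbitrary units).
cis = pseudobond_angles((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = pseudobond_angles((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

For the planar test geometry, both bend angles are 90 degrees; the torsion is 0 degrees in the cis arrangement and has magnitude 180 degrees in the trans arrangement. Mapping such angle triples to letters would require the cluster definitions from the CLEMAPS authors, which are not reproduced here.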

13.
14.
Accurate free energy estimation is essential for RNA structure prediction. The widely used Turner's energy model works well for nested structures. For pseudoknotted RNAs, however, there is no effective rule for estimation of loop entropy and free energy. In this work we present a new free energy estimation method, termed the pseudoknot predictor in three-dimensional space (pk3D), which goes beyond Turner's model. Our approach treats nested and pseudoknotted structures alike in one unifying physical framework, regardless of how complex the RNA structures are. We first test the ability of pk3D in selecting native structures from a large number of decoys for a set of 43 pseudoknotted RNA molecules, with lengths ranging from 23 to 113. We find that pk3D performs slightly better than the Dirks and Pierce extension of Turner's rule. We then test pk3D for blind secondary structure prediction, and find that pk3D gives the best sensitivity and comparable positive predictive value (related to specificity) in predicting pseudoknotted RNA secondary structures, when compared with other methods. A unique strength of pk3D is that it also generates spatial arrangement of structural elements of the RNA molecule. Comparison of three-dimensional structures predicted by pk3D with the native structure measured by nuclear magnetic resonance or X-ray experiments shows that the predicted spatial arrangement of stems and loops is often similar to that found in the native structure. These close-to-native structures can be used as starting points for further refinement to derive accurate three-dimensional structures of RNA molecules, including those with pseudoknots.

15.
Detection of similarity is particularly difficult for small proteins and thus connections between many of them remain unnoticed. Structure and sequence analysis of several metal-binding proteins reveals unexpected similarities in structural domains classified as different protein folds in SCOP and suggests unification of seven folds that belong to two protein classes. The common motif, termed treble clef finger in this study, forms the protein structural core and is 25-45 residues long. The treble clef motif is assembled around the central zinc ion and consists of a zinc knuckle, loop, beta-hairpin and an alpha-helix. The knuckle and the first turn of the helix each incorporate two zinc ligands. Treble clef domains constitute the core of many structures such as ribosomal proteins L24E and S14, RING fingers, protein kinase cysteine-rich domains, nuclear receptor-like fingers, LIM domains, phosphatidylinositol-3-phosphate-binding domains and His-Me finger endonucleases. The treble clef finger is a uniquely versatile motif adaptable for various functions. This small domain with a 25 residue structural core can accommodate eight different metal-binding sites and can have many types of functions from binding of nucleic acids, proteins and small molecules, to catalysis of phosphodiester bond hydrolysis. Treble clef motifs are frequently incorporated in larger structures or occur in doublets. Present analysis suggests that the treble clef motif defines a distinct structural fold found in proteins with diverse functional properties and forms one of the major zinc finger groups.

16.
Sujatha MS, Balaji PV. Proteins, 2004, 55(1):44-65
Galactose-binding proteins characterize an important subgroup of sugar-binding proteins that are involved in a variety of biological processes. Structural studies have shown that the Gal-specific proteins encompass a diverse range of primary and tertiary structures. The binding sites for galactose also seem to vary in different protein-galactose complexes. No common binding site features that are shared by the Gal-specific proteins to achieve ligand specificity are so far known. With the assumption that common recognition principles will exist for common substrate recognition, the present study was undertaken to identify and characterize any unique galactose-binding site signature by analyzing the three-dimensional (3D) structures of 18 protein-galactose complexes. These proteins belong to 7 nonhomologous families; thus, there is no sequence or structural similarity across the families. Within each family, the binding site residues and their relative distances were well conserved, but there were no similarities across families. A novel, yet simple, approach was adopted to characterize the binding site residues by representing their relative spatial dispositions in polar coordinates. A combination of the deduced geometrical features with the structural characteristics, such as solvent accessibility and secondary structure type, furnished a potential galactose-binding site signature. The signature was evaluated by incorporation into the program COTRAN to search for potential galactose-binding sites in proteins that share the same fold as the known galactose-binding proteins. COTRAN is able to detect galactose-binding sites with a very high specificity and sensitivity. The deduced galactose-binding site signature is strongly validated and can be used to search for galactose-binding sites in proteins. PROSITE-type signature sequences have also been inferred for galectin and C-type animal lectin-like fold families of Gal-binding proteins.
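
Representing the position of a binding-site residue in polar coordinates amounts to a change of coordinates relative to a chosen origin. The sketch below is an assumption-laden illustration and not the COTRAN procedure: the abstract does not state the reference frame, so the origin (for example a ligand atom) and the use of the global z axis as the polar axis are hypothetical choices, as is the function name.

```python
import math


def to_spherical(point, origin=(0.0, 0.0, 0.0)):
    """Convert a Cartesian point to (r, theta, phi) relative to origin.

    r     -- distance from the origin
    theta -- polar angle from the +z axis, in degrees
    phi   -- azimuth in the xy-plane from the +x axis, in degrees
    """
    x, y, z = (p - o for p, o in zip(point, origin))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        # The angles are undefined at the origin; return zeros by convention.
        return 0.0, 0.0, 0.0
    theta = math.degrees(math.acos(z / r))
    phi = math.degrees(math.atan2(y, x))
    return r, theta, phi


# Hypothetical residue atom at (2, 1, 1) and ligand reference atom at (1, 1, 1).
r, theta, phi = to_spherical((2.0, 1.0, 1.0), origin=(1.0, 1.0, 1.0))
```

Comparing binding sites from different structures would additionally require placing each site in a common, ligand-based frame before the conversion; that alignment step is outside this sketch.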

17.
18.
MOTIVATION: Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing the compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence and compactness. RESULTS: The new program, SCALI (Structural Core ALIgnment), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the three-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology, but more specific than architecture as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.

19.
We have developed a new method and program, SARF2, for fast comparison of protein structures, which can detect topological as well as nontopological similarities. The method searches for large ensembles of secondary structure elements, which are mutually compatible in two proteins. These ensembles consist of small fragments of Cα-trace, similarly arranged in three-dimensional space in two proteins, but not necessarily equally-ordered along the polypeptide chains. The program SARF2 is available for everyone through the World-Wide Web (WWW). We have performed an exhaustive pairwise comparison of all the entries from a recent issue of the Protein Data Bank (PDB) and report here on the results of an automated hierarchical cluster analysis. In addition, we report on several new cases of significant structural resemblance between proteins. To this end, a new definition of the significance of structural similarity is introduced, which effectively distinguishes the biologically meaningful equivalences from those occurring by chance. Analyzing the distribution of sequence similarity in significant structural matches, we show that sequence similarity as low as 20% in structurally-prealigned proteins can be a strong indication for the biological relevance of structural similarity. © 1996 Wiley-Liss, Inc.

20.
The three-dimensional structures of homologous proteins are usually conserved during evolution, as are critical residues in a few short sequence motifs that often constitute the active site in enzymes. The precise spatial organization of such sites depends on the lengths and positions of the secondary structural elements connecting the motifs. We show how members of protein superfamilies, such as kinesins, myosins, and G(alpha) subunits of trimeric G proteins, are identified and classed by simply counting the number of amino acid residues between important sequence motifs in their nucleotide triphosphate-hydrolyzing domains. Subfamily-specific landmark patterns (motif to motif scores) are principally due to inserts and gaps in surface loops. Unusual protein sequences and possible sequence prediction errors are detected.
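
Counting the residues between consecutive sequence motifs can be sketched in a few lines. This is a minimal illustration, not the authors' tool: the regular expressions and the toy sequence are hypothetical examples of nucleotide-binding motif patterns, and the function simply takes the first match of each motif after the previous one.

```python
import re


def motif_spacings(sequence, motifs):
    """Count residues between consecutive motifs found in order along the sequence.

    Returns a list with one spacing per adjacent motif pair, or None if any
    motif is not found after the previous one.
    """
    spacings = []
    search_from = 0
    prev_end = None
    for pattern in motifs:
        match = re.compile(pattern).search(sequence, search_from)
        if match is None:
            return None
        if prev_end is not None:
            # Residues strictly between the previous motif and this one.
            spacings.append(match.start() - prev_end)
        prev_end = match.end()
        search_from = match.end()
    return spacings


# Hypothetical toy sequence and motif patterns.
toy_sequence = "MAGAGGVGKSALTIQDTAGQEEYSNKCDL"
toy_motifs = [r"G.{4}GK[ST]", r"D.{2}G", r"NK.D"]
print(motif_spacings(toy_sequence, toy_motifs))  # prints [5, 5]
```

A vector of such spacings could then be compared across sequences to group them into subfamilies; how the spacings are scored and classed is described in the paper and is not reproduced here.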
