Similar literature
Found 20 similar articles; search time: 15 ms
1.
Brakoulias A  Jackson RM 《Proteins》2004,56(2):250-260
A method is described for the rapid comparison of protein binding sites using geometric matching to detect similar three-dimensional structure. The geometric matching detects common atomic features through identification of the maximum common sub-graph or clique. These features are not necessarily evident from sequence or from global structural similarity, giving additional insight into molecular recognition not evident from current sequence or structural classification schemes. Here we use the method to produce an all-against-all comparison of phosphate binding sites in a number of different nucleotide phosphate-binding proteins. The similarity search is combined with clustering of similar sites to allow a preliminary structural classification. Clustering by site similarity produces a classification of binding sites for the 476 representative local environments, producing ten main clusters representing half of the representative environments. The similarities make sense in terms of both structural and functional classification schemes. The ten main clusters represent a very limited number of unique structural binding motifs for phosphate. These are the structural P-loop, di-nucleotide binding motif [FAD/NAD(P)-binding and Rossmann-like fold] and FAD-binding motif. Similar classification schemes for nucleotide binding proteins have also been arrived at independently by others using different methods.
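The clique-based matching described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: sites are lists of (atom type, coordinates) pairs, only same-type atoms may be paired, two pairings are compatible when the corresponding inter-atomic distances agree within a hypothetical tolerance `tol`, and the largest clique is found with a plain Bron-Kerbosch search.

```python
import math
from itertools import combinations

def correspondence_graph(site_a, site_b, tol=0.5):
    """Nodes pair same-type atoms; edges join pairings with compatible distances."""
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched only once
        dist_a = math.dist(site_a[i][1], site_a[k][1])
        dist_b = math.dist(site_b[j][1], site_b[l][1])
        if abs(dist_a - dist_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def max_clique(adj):
    """Bron-Kerbosch without pivoting; returns one largest clique as a list."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        for v in list(p):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best

# Hypothetical toy sites: site_b is site_a shifted by (1, 1, 1), reordered,
# with one extra unmatched atom.
site_a = [("N", (0, 0, 0)), ("O", (3, 0, 0)), ("O", (0, 4, 0))]
site_b = [("O", (4, 1, 1)), ("N", (1, 1, 1)), ("O", (1, 5, 1)), ("C", (9, 9, 9))]
matches = sorted(max_clique(correspondence_graph(site_a, site_b)))
```

Each element of `matches` is an (index in site_a, index in site_b) pair. The exhaustive clique search is exponential in the worst case, so a sketch like this is only practical for small sites.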

2.
L Ellingson  J Zhang 《PloS one》2012,7(7):e40540
Comparison of the binding sites of proteins is an effective means for predicting protein functions based on their structure information. Despite the importance of this problem and much research in the past, it is still very challenging to predict the binding ligands from the atomic structures of protein binding sites. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where any pair of superposed atoms has a distance smaller than a given threshold. The search starts from similar tetrahedra between two binding sites obtained from 3D Delaunay triangulation and uses the Hungarian algorithm to find additional matched atoms. We found that, due to the plasticity of protein binding sites, matching the rigid body of point clouds of protein binding sites is not adequate for satisfactory binding ligand prediction. We further incorporated global geometric information, the radius of gyration of binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved a performance comparable to the best methods in the literature, while simultaneously providing the common atom set and atom correspondences.

3.
The EU-funded STAR project provided an opportunity to analyse 1418 macroinvertebrate samples from 310 sampling sites throughout Europe. At most of the sites, samples were taken in two seasons using both national protocols and the project’s STAR-AQEM protocol. At a subset of sites (86), two replicate samples were taken by each method in each of the two seasons. The resulting taxa lists were analysed in terms of community similarity using the Bray–Curtis, Jaccard, and Renkonen indices. A new concept of sample ‘coherence’ is used to measure the relative strength of within-site, within-season and within-method similarity and to determine their importance for variability in community composition. Site-coherence (i.e., highest similarity to another sample from the same site) was much higher where replicate samples were available. Season-coherence of samples was nearly 100% even if different methods were compared. Season appeared to be one of the major determinants of in-stream fauna. The STAR-AQEM method is most comparable to the Nordic, Portuguese and Czech (PERLA) national methods and less comparable to the Italian (IBE) and Latvian methods. Samples collected by these latter methods had higher similarities to other sites sampled with the same methods than to samples of the same site using the STAR-AQEM method; thus there was low site-coherence. In three stream types from Italy, Latvia and Greece, 28–38% of the samples were more similar to a sample from a different site than to a replicate sample from the same site. This fact could have serious consequences for follow-up bioassessments or impact assessments by cluster analysis based on similarity measures. Replicate samples are less coherent within site, season or method if the taxonomic resolution is family rather than species. Electronic supplementary material is available for this article and is accessible to authorised users.
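For readers unfamiliar with the three community similarity measures named in this abstract, the sketch below computes them for two taxa lists given as count dictionaries. The function names and the sample counts are hypothetical, not data from the STAR project, and Bray–Curtis is shown in its similarity form (one minus the dissimilarity).

```python
def jaccard(a, b):
    """Jaccard similarity on presence/absence: shared taxa / all taxa."""
    present_a = {taxon for taxon, n in a.items() if n > 0}
    present_b = {taxon for taxon, n in b.items() if n > 0}
    union = present_a | present_b
    return len(present_a & present_b) / len(union) if union else 0.0

def bray_curtis(a, b):
    """Bray-Curtis similarity: 2 * sum of minimum counts / total count."""
    taxa = set(a) | set(b)
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

def renkonen(a, b):
    """Renkonen (percentage) similarity: sum of minimum relative abundances."""
    total_a, total_b = sum(a.values()), sum(b.values())
    if total_a == 0 or total_b == 0:
        return 0.0
    taxa = set(a) | set(b)
    return sum(min(a.get(t, 0) / total_a, b.get(t, 0) / total_b) for t in taxa)

# Hypothetical counts for two samples.
sample_a = {"Baetis": 30, "Gammarus": 10, "Hydropsyche": 10}
sample_b = {"Baetis": 10, "Gammarus": 10, "Simulium": 30}
```

Jaccard uses only presence or absence, whereas Bray–Curtis and Renkonen weight taxa by abundance, which is one reason the three indices can rank sample pairs differently.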

4.
Collagen phagocytosis is a critical mediator of extracellular matrix remodeling. Whereas the binding step of collagen phagocytosis is facilitated by Ca2+-dependent, gelsolin-mediated severing of actin filaments, the regulation of the collagen internalization step is not defined. We determined here whether phosphatidylinositol-4,5-bisphosphate [PI(4,5)P2] regulation of gelsolin is required for collagen internalization. In gelsolin null fibroblasts transfected with gelsolin severing mutants, actin severing and collagen binding were strongly impaired but internalization and actin monomer addition at collagen bead sites were much less affected. PI(4,5)P2 accumulated around collagen during internalization and was associated with gelsolin. Cell-permeable peptides mimicking the PI(4,5)P2 binding site of gelsolin blocked actin monomer addition, the association of gelsolin with actin at phagosomes, and collagen internalization but did not affect collagen binding. Collagen beads induced recruitment of type 1 gamma phosphatidylinositol phosphate kinase (PIPK1gamma661) to internalization sites. Dominant negative constructs and RNA interference demonstrated a requirement for catalytically active PIPK1gamma661 for collagen internalization. We conclude that separate functions of gelsolin mediate sequential stages of collagen phagocytosis: Ca2+-dependent actin severing facilitates collagen binding, whereas PI(4,5)P2-dependent regulation of gelsolin promotes the actin assembly required for internalization of collagen fibrils.

5.
Recognition of binding patterns common to a set of protein structures is important for recognition of function, prediction of binding, and drug design. We consider protein binding sites represented by a set of 3D points with assigned physico-chemical and geometrical properties important for protein-ligand interactions. We formulate the multiple binding site alignment problem as detection of the largest common set of such 3D points. We discuss the computational problem of multiple common point set detection and, particularly, the matching problem in K-partite-epsilon graphs, where K partitions are associated with K structures and edges are defined between epsilon-close points. We show that the K-partite-epsilon matching problem is NP-hard in the Euclidean space with dimension larger than one. Consequently, we show that the largest common point set problem between three point sets is NP-hard. On the practical side, we present a novel computational method, MultiBind, for recognition of binding patterns common to a set of protein structures. It performs a multiple alignment between protein binding sites in the absence of overall sequence, fold, or binding partner similarity. Despite the NP-hardness results, in our applications, we practically overcome the exponential number of multiple alignment combinations by applying an efficient branch-and-bound filtering procedure. We show applications of MultiBind to several biological targets. The method recognizes patterns which are responsible for binding small molecules, such as estradiol, ATP/ANP, and transition state analogues.

6.
Two Akv murine leukemia virus-based retroviral vectors with primer binding sites matching tRNA(Gln-1) and tRNA(Lys-3) were constructed. The transduction efficiency of these mutated vectors was found to be comparable to that of a vector carrying the wild-type primer binding site matching tRNA(Pro). Polymerase chain reaction amplification and sequence analysis of transduced proviruses confirmed the transfer of vectors with mutated primer binding sites and further showed that tRNA(Gln-2) may act efficiently in conjunction with the tRNA(Gln-1) primer binding site. We conclude that murine leukemia virus can replicate by using various tRNA molecules as primers and propose primer binding site-tRNA primer interactions to be of major importance for tRNA primer selection. However, efficient primer selection does not require perfect Watson-Crick base pairing at all 18 positions of the primer binding site.

7.
Comparing the similarity of binding sites between different proteins on the basis of their amino acids can facilitate protein function identification. However, a binding site usually consists of several crucial amino acids that are frequently dispersed among different regions of a protein, which makes the comparison of binding sites difficult. In this study, we introduce a new method, named chemical and geometric similarity of binding site (CGS-BSite), to compute ligand binding site similarity based on discrete amino acids with a maximum-weight bipartite matching algorithm. The principle of computing the similarity is to find a Euclidean transformation that brings similar amino acids close to each other in geometric space, and vice versa. CGS-BSite permits site and ligand flexibility and provides stable prediction performance on flexible ligand binding sites. Binding site prediction on three test datasets with the CGS-BSite method has similar performance to the Patch-Surfer method but outperforms the other five tested methods, reaching 0.80, 0.71 and 0.85, respectively, in terms of the area under the receiver operating characteristic curve. It performs marginally better than Patch-Surfer on binding sites with small volume and higher hydrophobicity, and shows good robustness to variance in the volume and hydrophobicity of ligand binding sites. Overall, our method provides an alternative approach to compute ligand binding site similarity and to predict potential special ligand binding sites from the existing ligand targets based on target similarity.

8.
MOTIVATION: An approach for identifying similarities of protein-protein binding sites is presented. The geometric shape of a binding site is described by computing a feature vector based on moment invariants. In order to search for similarities, feature vectors of binding sites are compared. Similar feature vectors indicate binding sites with similar shapes. RESULTS: The approach is validated on a representative set of protein-protein binding sites, extracted from the SCOPPI database. When querying binding sites from a representative set, we search for known similarities among 2819 binding sites. A median area under the ROC curve of 0.98 is observed. For half of the queries, a similar binding site is identified among the first two of 2819 when sorting all binding sites according to the proposed similarity measure. Typical examples identified by this method are analyzed and discussed. The nitrogenase iron protein-like SCOP family is clustered hierarchically according to the proposed similarity measure as a case study. AVAILABILITY: Python code is available on request from the authors.

9.
Src homology 2 (SH2) domains are approximately 100 residue phosphotyrosyl peptide binding modules found in signalling proteins and are important targets for therapeutic intervention. The peptide binding site is evolutionarily well conserved, particularly at the two major binding pockets, pTyr and pTyr + 3. We present a computational analysis of diversity within the peptide binding region and discuss molecular recognition beyond the conventional binding motif, drawing attention to novel conserved ligand interaction sites which may be exploitable in ligand binding studies. The peptide binding site is defined by selecting crystal contacts and domains are clustered according to binding site residue similarity. Comparison with a classification based on experimental peptide screening reveals a high level of qualitative agreement, indicating that the method is able independently to generate functional information. A conservation scoring method reveals extensive patches of conservation in some groups not present across the whole family, challenging the notion that the domains recognise only a linear phosphopeptide sequence. Conservation difference maps determine group-dependent clusters of conserved residues that are not seen when considering a larger experimentally determined group. Many of these residues contact the peptide outside the pTyr to pTyr + 3 motif, challenging the conventional view that this motif is largely responsible for ligand recognition and discrimination.

10.
Class I phosphoinositide (PI) 3-kinases act through effector proteins whose 3-PI selectivity is mediated by a limited repertoire of structurally defined, lipid recognition domains. We describe here the lipid preferences and crystal structure of a new class of PI binding modules exemplified by select IQGAPs (IQ motif containing GTPase-activating proteins) known to coordinate cellular signaling events and cytoskeletal dynamics. This module is defined by a C-terminal 105-107 amino acid region, which in IQGAP1 and -2, but not IQGAP3, binds preferentially to phosphatidylinositol 3,4,5-trisphosphate (PtdInsP(3)). The binding affinity for PtdInsP(3), together with other, secondary target-recognition characteristics, is comparable with that of the pleckstrin homology domain of cytohesin-3 (general receptor for phosphoinositides 1), an established PtdInsP(3) effector protein. Importantly, the IQGAP1 C-terminal domain and the cytohesin-3 pleckstrin homology domain, each tagged with enhanced green fluorescent protein, were both re-localized from the cytosol to the cell periphery following the activation of PI 3-kinase in Swiss 3T3 fibroblasts, consistent with their common, selective recognition of endogenous 3-PI(s). The crystal structure of the C-terminal IQGAP2 PI binding module reveals unexpected topological similarity to an integral fold of C2 domains, including a putative basic binding pocket. We propose that this module integrates select IQGAP proteins with PI 3-kinase signaling and constitutes a novel, atypical phosphoinositide binding domain that may represent the first of a larger group, each perhaps structurally unique but collectively dissimilar from the known PI recognition modules.

11.
Decamethonium and d-tubocurarine displace N-methylacridinium ion, a potent fluorescent inhibitor of acetylcholinesterase, from the surface of the enzyme. Decamethonium is competitive with N-methylacridinium which indicates that the binding sites for these ligands overlap. However, the displacement of N-methylacridinium ion by d-tubocurarine requires the existence of a binding site for d-tubocurarine in addition to the active site. Since the affinities for d-tubocurarine at both sites are comparable, two well defined ligand binding sites must exist for each catalytic site that is titratable by 7-dimethylcarbamyl-N-methylquinolinium iodide.

12.
The rapid growth in protein structural data and the emergence of structural genomics projects have increased the need for automatic structure analysis and tools for function prediction. Small molecule recognition is critical to the function of many proteins; therefore, determination of ligand binding site similarity is important for understanding ligand interactions and may allow their functional classification. Here, we present a binding sites database (SitesBase) that, given a known protein-ligand binding site, allows rapid retrieval of other binding sites with similar structure independent of overall sequence or fold similarity. However, each match is also annotated with sequence similarity and fold information to aid interpretation of structure and functional similarity. Similarity in ligand binding sites can indicate common binding modes and recognition of similar molecules, allowing potential inference of function for an uncharacterised protein or providing additional evidence of common function where sequence or fold similarity is already known. Alternatively, the resource can provide valuable information for detailed studies of molecular recognition including structure-based ligand design and in understanding ligand cross-reactivity. Here, we show examples of atomic similarity between superfamily or more distant fold relatives as well as between seemingly unrelated proteins. Assignment of unclassified proteins to structural superfamilies is also undertaken and in most cases substantiates assignments made using sequence similarity. Correct assignment is also possible where sequence similarity fails to find significant matches, illustrating the potential use of binding site comparisons for newly determined proteins.

13.
Cappello V  Tramontano A  Koch U 《Proteins》2002,47(2):106-115
Comparative analysis of protein binding sites for similar ligands yields information about conserved interactions, relevant for ligand affinity, and variable interactions, which are important for specificity. The pattern of variability can indicate new targets for a pharmacologically validated class of compounds binding to a similar site. A particularly vast group of therapeutically interesting proteins using the same or similar substrates are those that bind adenine-containing ligands. Drug development is focusing on compounds occupying the adenine-binding site and their specificity is an issue of paramount importance. We use a simple scheme to characterize and classify the adenine-binding sites in terms of their intermolecular interactions, and show that this classification does not necessarily correspond to protein classifications based on either sequence or structural similarity. We find that only a limited number of the different hydrogen bond patterns possible for adenine binding are used, which can be utilized as an effective classification scheme. Closely related protein families usually share similar hydrogen bond patterns, whereas non-polar interactions are less well conserved. Our classification scheme can be used to select groups of proteins with a similar ligand-binding site, thus facilitating the definition of the properties that can be exploited to design specific inhibitors.

14.
Kinjo AR  Nakamura H 《PloS one》2012,7(2):e31437
Most biological processes are described as a series of interactions between proteins and other molecules, and interactions are in turn described in terms of atomic structures. To annotate protein functions as sets of interaction states at atomic resolution, and thereby to better understand the relation between protein interactions and biological functions, we conducted exhaustive all-against-all atomic structure comparisons of all known binding sites for ligands including small molecules, proteins and nucleic acids, and identified recurring elementary motifs. By integrating the elementary motifs associated with each subunit, we defined composite motifs that represent context-dependent combinations of elementary motifs. It is demonstrated that function similarity can be better inferred from composite motif similarity compared to the similarity of protein sequences or of individual binding sites. By integrating the composite motifs associated with each protein function, we define meta-composite motifs each of which is regarded as a time-independent diagrammatic representation of a biological process. It is shown that meta-composite motifs provide richer annotations of biological processes than sequence clusters. The present results serve as a basis for bridging atomic structures to higher-order biological phenomena by classification and integration of binding site structures.

15.
Zhang Z  Grigorov MG 《Proteins》2006,62(2):470-478
Increasing attention has been devoted to the characterization of complex networks within the protein world. This work reports how we uncovered networked structures that reflect the structural similarities among protein binding sites. First, a dataset of 211 binding sites was compiled by removing the redundant proteins in the Protein Ligand Database (PLD) (http://www-mitchell.ch.cam.ac.uk/pld/). Using a clique detection algorithm, we performed all-against-all comparisons among the 211 binding sites. Within the set of nodes representing the binding sites, an edge was added whenever a pair of binding sites had a similarity higher than a threshold value. The generated similarity networks revealed that many nodes had few links and only a few were highly connected, but due to the limited data available it was not possible to definitively prove a scale-free architecture. Within the same dataset, the binding site similarity networks were compared with the sequence and fold similarity networks. In the protein world, indications were found that structure is better conserved than sequence, but on its own, sequence was better conserved than the subset of functional residues forming the binding site. Because a binding site is strongly linked with protein function, the identification of protein binding site similarity networks could accelerate the functional annotation of newly identified genes. In view of this, we have discussed several potential applications of binding site similarity networks, such as the construction of novel binding site classification databases, as well as the implications for protein molecular design in general and computational chemogenomics in particular.
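The network construction described in this abstract (one node per binding site, an edge whenever pairwise similarity exceeds a threshold) can be sketched as follows. The score dictionary, the threshold of 0.5 and the function names are hypothetical values chosen for illustration; the abstract does not state the similarity scale or the threshold used.

```python
from collections import Counter

def similarity_network(scores, threshold):
    """Return undirected edges for pairs whose similarity exceeds the threshold.

    scores maps (site_i, site_j) tuples to a similarity value.
    """
    return {frozenset(pair)
            for pair, value in scores.items()
            if pair[0] != pair[1] and value > threshold}

def degree_distribution(nodes, edges):
    """Count how many nodes have each degree (number of links)."""
    degree = {node: 0 for node in nodes}
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return Counter(degree.values())

# Hypothetical all-against-all scores for four binding sites.
nodes = ["A", "B", "C", "D"]
scores = {("A", "B"): 0.9, ("A", "C"): 0.8, ("A", "D"): 0.7,
          ("B", "C"): 0.2, ("B", "D"): 0.1, ("C", "D"): 0.3}
edges = similarity_network(scores, threshold=0.5)
distribution = degree_distribution(nodes, edges)
```

In this toy case one hub (A) has three links and the other three nodes have one link each, which mirrors in miniature the "few highly connected nodes" pattern the abstract reports.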

16.
17.
MOTIVATION: The development of epitope-based vaccines crucially relies on the ability to classify Human Leukocyte Antigen (HLA) molecules into sets that have similar peptide binding specificities, termed supertypes. In their seminal work, Sette and Sidney defined nine HLA class I supertypes and claimed that these provide an almost perfect coverage of the entire repertoire of HLA class I molecules. HLA alleles are highly polymorphic and polygenic and therefore experimentally classifying each of these molecules to supertypes is at present an impossible task. Recently, a number of computational methods have been proposed for this task. These methods are based on defining protein similarity measures, derived from analysis of binding peptides or from analysis of the proteins themselves. RESULTS: In this paper we define both peptide derived and protein derived similarity measures, which are based on learning distance functions. The peptide derived measure is defined using a peptide-peptide distance function, which is learned using information about known binding and non-binding peptides. The protein derived similarity measure is defined using a protein-protein distance function, which is learned using information about alleles previously classified to supertypes by Sette and Sidney (1999). We compare the classification obtained by these two complementary methods to previously suggested classification methods. In general, our results are in excellent agreement with the classifications suggested by Sette and Sidney (1999) and with those reported by Buus et al. (2004). The main advantage of our proposed distance-based approach is that it makes use of two different and important immunological sources of information: HLA alleles and peptides that are known to bind or not bind to these alleles. Since each of our distance measures is trained using a different source of information, their combination can provide a more confident classification of alleles to supertypes.

18.
19.
Sample Variability Influences on the Precision of Predictive Bioassessment   Total citations: 1, self-citations: 0, citations by others: 1
The rapid bioassessment technique we investigate (AUSRIVAS) requires a nationally standardized sampling protocol that uses a single collection of macroinvertebrates (without replication) taken from 10 m of specific habitats (e.g. stream edge and/or riffle) and sub-samples of 200 animals. The macroinvertebrate data are run through predictive models that provide an assessment of biological condition based on a comparison of the animals found in the collection (the observed) and those expected to be there given the site-specific characteristics of the stream (the O/E taxa score). The important questions are related to the conclusions regarding river condition that can be drawn from the biological assessment. Rapid bioassessment studies are generally of two types: those for assessment of individual sites and those where many sites are selected to collectively assess the potential impacts of some human activity such as forestry or agriculture. We wanted to identify the effects of sample variability on the outputs of this predictive bioassessment technique. We found that a single collection of benthic macroinvertebrates was sufficient for bioassessment when taken from a site that had a large area of nearly uniform substrate and was in good condition. Also, collections taken from larger and smaller areas of substrate (1.75, 3.5 or 7 m²) gave the same bioassessment. In other sites, not in such good condition, the variability in bioassessment from different collections could result in different interpretations of biological condition. For all sites, regardless of condition, much of the variation in bioassessment was derived from sub-sampling the macroinvertebrates. We develop a statistical sub-sampling and solver algorithm that provides a measure of variability and a statistically valid probability of impairment for a single site, without the need to actually collect the hundreds of replicated collections needed for this study.
We found that for assessments at impaired sites where only one collection and one sub-sample are taken (a common situation in rapid assessment), the 95% confidence interval for O/E taxa scores is estimated to be as much as ±0.22. At sites in reference condition, the 95% confidence interval may be much narrower (~±0.1 O/E units). Therefore, assessments of sites at, or near, reference condition will be more precise than those of impaired sites. Based on power analysis, where single sites are being assessed we recommend a sample collected from 3.5 m² of habitat; however, replicate collections should be taken at a site (rather than only one), and we recommend replicate sub-samples of each collection (a total of six sub-samples from a site). However, this would remove a ‘rapid’ component of the bioassessment. We recommend the addition of sub-sampling and solver algorithms to predictive models such as AUSRIVAS to provide a statistical measure of the probability of impairment. An adaptive sub-sampling regime could then be used to optimize sampling effort. For example, a single sub-sample may be sufficient for screening, or the agency could use the sub-sample and solver algorithms to sub-sample the parent sample for a more precise estimate of the biological condition. Replication should be maximized at the spatial scale required for reporting: site or regional. As a general rule, catchment or land-use scale studies should maximize replicate sites, and site-scale assessments should maximize replication within sites.

20.
Information content of binding sites on nucleotide sequences   Total citations: 73, self-citations: 0, citations by others: 73
Repressors, polymerases, ribosomes and other macromolecules bind to specific nucleic acid sequences. They can find a binding site only if the sequence has a recognizable pattern. We define a measure of the information (R_sequence) in the sequence patterns at binding sites. It allows one to investigate how information is distributed across the sites and to compare one site to another. One can also calculate the amount of information (R_frequency) that would be required to locate the sites, given that they occur with some frequency in the genome. Several Escherichia coli binding sites were analyzed using these two independent empirical measurements. The two amounts of information are similar for most of the sites we analyzed. In contrast, bacteriophage T7 RNA polymerase binding sites contain about twice as much information as is necessary for recognition by the T7 polymerase, suggesting that a second protein may bind at T7 promoters. The extra information can be accounted for by a strong symmetry element found at the T7 promoters. This element may be an operator. If this model is correct, these promoters and operators do not share much information. The comparisons between R_sequence and R_frequency suggest that the information at binding sites is just sufficient for the sites to be distinguished from the rest of the genome.
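The two information measures in this abstract can be illustrated with a simplified sketch. It assumes equiprobable bases (a maximum of 2 bits per position) and omits the small-sample correction, so it is an approximation of the idea, not the authors' exact procedure; the function names, the aligned sites and the genome size are hypothetical.

```python
import math
from collections import Counter

def r_sequence(sites, alphabet="ACGT"):
    """Sum over positions of (maximum entropy - observed entropy), in bits."""
    max_entropy = math.log2(len(alphabet))
    total = 0.0
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += max_entropy - entropy
    return total

def r_frequency(num_sites, genome_positions):
    """Bits needed to pick num_sites sites out of genome_positions positions."""
    return math.log2(genome_positions / num_sites)

# Hypothetical aligned sites: three conserved positions, one fully variable.
aligned_sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
info_in_pattern = r_sequence(aligned_sites)   # 2 + 2 + 2 + 0 bits
info_needed = r_frequency(16, 4096)           # log2(4096 / 16) bits
```

Comparing the two numbers is the kind of check the abstract describes: a pattern carrying about as many bits as are needed to locate the sites is "just sufficient", whereas a large excess, as reported for the T7 promoters, hints at an additional recognizer.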
