Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Brakoulias A  Jackson RM 《Proteins》2004,56(2):250-260
A method is described for the rapid comparison of protein binding sites using geometric matching to detect similar three-dimensional structure. The geometric matching detects common atomic features through identification of the maximum common sub-graph or clique. These features are not necessarily evident from sequence or from global structural similarity, giving additional insight into molecular recognition not evident from current sequence or structural classification schemes. Here we use the method to produce an all-against-all comparison of phosphate binding sites in a number of different nucleotide phosphate-binding proteins. The similarity search is combined with clustering of similar sites to allow a preliminary structural classification. Clustering by site similarity produces a classification of binding sites for the 476 representative local environments producing ten main clusters representing half of the representative environments. The similarities make sense in terms of both structural and functional classification schemes. The ten main clusters represent a very limited number of unique structural binding motifs for phosphate. These are the structural P-loop, the di-nucleotide binding motif [FAD/NAD(P)-binding and Rossmann-like fold] and the FAD-binding motif. Similar classification schemes for nucleotide binding proteins have also been arrived at independently by others using different methods.

2.
A method for simultaneous alignment of multiple protein structures   Cited by: 1 (self-citations: 0, citations by others: 1)
Shatsky M  Nussinov R  Wolfson HJ 《Proteins》2004,56(1):143-156
Here, we present MultiProt, a fully automated, highly efficient technique to detect multiple structural alignments of protein structures. MultiProt finds the common geometrical cores between input molecules. To date, most methods for multiple alignment start from the pairwise alignment solutions. This may lead to a small overall alignment. In contrast, our method derives multiple alignments from simultaneous superpositions of input molecules. Further, our method does not require that all input molecules participate in the alignment. Actually, it efficiently detects high scoring partial multiple alignments for all possible numbers of molecules in the input. To demonstrate the power of MultiProt, we provide a number of case studies. First, we demonstrate known multiple alignments of protein structures to illustrate the performance of MultiProt. Next, we present various biological applications. These include: (1) a partial alignment of hinge-bent domains; (2) identification of functional groups of G-proteins; (3) analysis of binding sites; and (4) protein-protein interface alignment. Some applications preserve the sequence order of the residues in the alignment, whereas others are order-independent. It is their residue sequence order-independence that allows application of MultiProt to derive multiple alignments of binding sites and of protein-protein interfaces, making MultiProt an extremely useful structural tool.

3.
Targeting non‐native‐ligand binding sites for potential investigative and therapeutic applications is an attractive strategy in proteins that share common native ligands, as in Rab1 protein. Rab1 is a subfamily member of Rab proteins, which are members of the Ras GTPase superfamily. All Ras GTPase superfamily members bind to the native ligands GTP and GDP, which switch the proteins on and off, respectively. Rab1 is physiologically essential for autophagy and transport between endoplasmic reticulum and Golgi apparatus. Pathologically, Rab1 is implicated in human cancers, a neurodegenerative disease, cardiomyopathy, and bacteria‐caused infectious diseases. We have performed structural analyses on Rab1 protein using a unique ensemble of clustering methods, including multi‐step principal component analysis, non‐negative matrix factorization, and independent component analysis, to better identify representative Rab1 proteins than the application of a single clustering method alone does. We then used the identified representative Rab1 structures, resolved in multiple ligand states, to map their known and novel binding sites. We report here at least one novel binding site on Rab1, involving Rab1‐specific residues that could be further explored for the rational design and development of investigative probes and/or therapeutic small molecules against the Rab1 protein. Proteins 2017; 85:859–871. © 2016 Wiley Periodicals, Inc.

4.

Background

Much progress has been made in understanding the 3D structure of proteins using methods such as NMR and X-ray crystallography. The resulting 3D structures are extremely informative, but do not always reveal which sites and residues within the structure are of special importance. Recently, there are indications that multiple-residue, sub-domain structural relationships within the larger 3D consensus structure of a protein can be inferred from the analysis of the multiple sequence alignment data of a protein family. These intra-dependent clusters of associated sites are used to indicate hierarchical inter-residue relationships within the 3D structure. To reveal the patterns of associations among individual amino acids or sub-domain components within the structure, we apply a k-modes attribute (aligned site) clustering algorithm to the ubiquitin and transthyretin families in order to discover associations among groups of sites within the multiple sequence alignment. We then observe what these associations imply within the 3D structure of these two protein families.

Results

The k-modes site clustering algorithm we developed maximizes the intra-group interdependencies based on a normalized mutual information measure. The clusters formed correspond to sub-structural components or binding and interface locations. Applying this data-directed method to the ubiquitin and transthyretin protein family multiple sequence alignments as a test bed, we located numerous interesting associations of interdependent sites. These clusters were then arranged into cluster tree diagrams which revealed four structural sub-domains within the single domain structure of ubiquitin and a single large sub-domain within transthyretin associated with the interface among transthyretin monomers. In addition, several clusters of mutually interdependent sites were discovered for each protein family, each of which appear to play an important role in the molecular structure and/or function.

Conclusions

Our results demonstrate that the method we present here, which uses a k-modes site clustering algorithm based on interdependency evaluation among sites obtained from a sequence alignment of homologous proteins, can provide significant insights into the complex, hierarchical inter-residue structural relationships within the 3D structure of a protein family.
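The interdependency measure mentioned in this abstract can be illustrated with a minimal sketch. It computes a normalized mutual information between two aligned columns; the normalization by joint entropy and the function names are assumptions made for illustration, not necessarily the authors' exact definition.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def normalized_mutual_information(col_a, col_b):
    """Mutual information of two alignment columns divided by their joint entropy.

    Assumption: normalizing by the joint entropy H(A, B); other
    normalizations exist and the abstract does not say which was used.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    h_a = entropy(col_a)
    h_b = entropy(col_b)
    h_ab = entropy(list(zip(col_a, col_b)))
    if h_ab == 0:
        return 0.0  # both columns fully conserved: no information to share
    return (h_a + h_b - h_ab) / h_ab


# Toy columns (hypothetical data): perfectly co-varying vs. independent sites.
covarying = normalized_mutual_information(list("AAGGCC"), list("LLVVII"))
independent = normalized_mutual_information(list("AAGG"), list("LVLV"))
print(covarying, independent)
```

A k-modes style site clustering would then group columns so that this score is high within each group; that grouping step is not shown here.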

5.
Han Si  Lee SG  Kim KH  Choi CJ  Kim YH  Hwang KS 《Bio Systems》2006,84(3):175-182
Most multiple gene sequence alignment methods rely on conventions regarding the score of a multiple alignment in pairwise fashion. Therefore, as the number of sequences increases, the runtime of sequencing expands exponentially. In order to solve the problem, this paper presents a multiple sequence alignment method using a linear-time suffix tree algorithm to cluster similar sequences at one time without pairwise alignment. After searching for common subsequences, cross-matching common subsequences were generated, and sometimes inexact matching was found. So, a procedure aimed at masking the inexact cross-matching pairs was suggested here. In addition, BLAST was combined with a clustering tool in order to annotate the clusters generated by suffix tree clustering. The proposed method for clustering and annotating genes consists of the following steps: (1) construction of a suffix tree; (2) searching and overlapping common subsequences; (3) grouping subsequence pairs; (4) masking cross-matching pairs; (5) clustering gene sequences; (6) annotating gene clusters by the BLAST search. The performance of the proposed system, CLAGen, was successfully evaluated with 42 gene sequences in a TCA cycle (a citrate cycle) of bacteria. The system generated 11 clusters and found the longest subsequences of each cluster, which are biologically significant.  相似文献   

6.
Here we present an algorithm designed to carry out multiple structure alignment and to detect recurring substructural motifs. So far we have implemented it for comparison of protein structures. However, this general method is applicable to comparisons of RNA structures and to detection of a pharmacophore in a series of drug molecules. Further, its sequence order independence permits its application to detection of motifs on protein surfaces, interfaces, and binding/active sites. While there are many methods designed to carry out pairwise structure comparisons, there are only a handful geared toward the multiple structure alignment task. Most of these tackle multiple structure comparison as a collection of pairwise structure comparison tasks. The multiple structural alignment algorithm presented here automatically finds the largest common substructure (core) of atoms that appears in all the molecules in the ensemble. The detection of the core and the structural alignment are done simultaneously. The algorithm begins by finding small substructures that are common to all the proteins in the ensemble. One of the molecules is considered the reference; the others are the source molecules. The small substructures are stored in special arrays termed combinatorial buckets, which define sets of multistructural alignments from the source molecules that coincide with the same small set of reference atoms (C(alpha)-atoms here). These substructures are initial small fragments that have congruent copies in each of the proteins. The substructures are extended, through the processing of the combinatorial buckets, by clustering the superpositions (transformations). The method is very efficient.  相似文献   

7.
We report the largest and most comprehensive comparison of protein structural alignment methods. Specifically, we evaluate six publicly available structure alignment programs: SSAP, STRUCTAL, DALI, LSQMAN, CE and SSM by aligning all 8,581,970 protein structure pairs in a test set of 2930 protein domains specially selected from CATH v.2.4 to ensure sequence diversity. We consider an alignment good if it matches many residues, and the two substructures are geometrically similar. Even with this definition, evaluating structural alignment methods is not straightforward. At first, we compared the rates of true and false positives using receiver operating characteristic (ROC) curves with the CATH classification taken as a gold standard. This proved unsatisfactory in that the quality of the alignments is not taken into account: sometimes a method that finds less good alignments scores better than a method that finds better alignments. We correct this intrinsic limitation by using four different geometric match measures (SI, MI, SAS, and GSAS) to evaluate the quality of each structural alignment. With this improved analysis we show that there is a wide variation in the performance of different methods; the main reason for this is that it can be difficult to find a good structural alignment between two proteins even when such an alignment exists. We find that STRUCTAL and SSM perform best, followed by LSQMAN and CE. Our focus on the intrinsic quality of each alignment allows us to propose a new method, called "Best-of-All" that combines the best results of all methods. Many commonly used methods miss 10-50% of the good Best-of-All alignments. By putting existing structural alignments into proper perspective, our study allows better comparison of protein structures. By highlighting limitations of existing methods, it will spur the further development of better structural alignment methods. 
This will have significant biological implications now that structural comparison has come to play a central role in the analysis of experimental work on protein structure, protein function and protein evolution.
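As a quick sanity check on the pair count quoted in this abstract, the sketch below (variable names are ours) shows that 8,581,970 is the number of ordered pairs among 2930 domains, that is, each pair counted in both directions, and twice the number of unordered pairs.

```python
from math import comb

n_domains = 2930

# Ordered pairs (A aligned to B and B aligned to A counted separately).
ordered_pairs = n_domains * (n_domains - 1)

# Unordered pairs, for comparison.
unordered_pairs = comb(n_domains, 2)

print(ordered_pairs, unordered_pairs)  # 8581970 4290985
```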

8.
9.
Molecular recognition by protein mostly occurs in a local region on the protein surface. Thus, an efficient computational method for accurate characterization of protein local structural conservation is necessary to better understand biology and drug design. We present a novel local structure alignment tool, G‐LoSA. G‐LoSA aligns protein local structures in a sequence order independent way and provides a GA‐score, a chemical feature‐based and size‐independent structure similarity score. Our benchmark validation shows the robust performance of G‐LoSA to the local structures of diverse sizes and characteristics, demonstrating its universal applicability to local structure‐centric comparative biology studies. In particular, G‐LoSA is highly effective in detecting conserved local regions on the entire surface of a given protein. In addition, the applications of G‐LoSA to identifying template ligands and predicting ligand and protein binding sites illustrate its strong potential for computer‐aided drug design. We hope that G‐LoSA can be a useful computational method for exploring interesting biological problems through large‐scale comparison of protein local structures and facilitating drug discovery research and development. G‐LoSA is freely available to academic users at http://im.compbio.ku.edu/GLoSA/ .  相似文献   

10.
The binding properties of cibacron blue F3GA (CB-F3GA) bound to a model NAD(P)H/FAD(H2)-dependent protein system, namely cytosolic quinone reductase (QR), were characterized with AMBER in an attempt to address the binding properties of immobilized CB-F3GA used in the separation of serum albumin. A favorable binding free energy of -4.52 kcal/mol (KD = 5.09 × 10⁻⁴ M) was determined for CB-F3GA binding by the MM-PBSA method, which was found to be a ballpark estimate of empirical values reported in the literature (ΔG ≈ -6 kcal/mol). We propose that CB-F3GA primarily follows a class III binding motif in the presence of FAD in the binding site of QR in solution, while a class II binding motif is observed in the crystal form. It was found that favorable van der Waals/hydrophobic interactions take place in the binding site, making a major contribution to a favorably dominating enthalpy of binding (ΔHtot = -25.87 kcal/mol) as compared to an unfavorable binding entropy term (TΔStot = -21.35 kcal/mol). Additional MM-PBSA experiments in the absence of FAD gave rise to an unfavorable binding free energy for CB in complex with QR, suggesting that FAD is an essential determinant of CB-F3GA binding. This is in contrast to an earlier observation of Denizli et al. on separation of human serum albumin (HSA) by immobilized CB-F3GA in the absence of FAD. Therefore, a class I binding model for CB-F3GA is proposed here to account for the efficient separation of HSA in affinity chromatography systems.
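The thermodynamic figures in this abstract are mutually consistent, which the following sketch checks. It assumes the standard relations ΔG = ΔH − TΔS and KD = exp(ΔG/RT), treats KD as a dissociation constant in molar units, and assumes a temperature of 300 K, which the abstract does not state.

```python
from math import exp

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 300.0  # assumption: not given in the abstract

delta_h = -25.87    # enthalpy term from the abstract, kcal/mol
t_delta_s = -21.35  # entropy term T*dS from the abstract, kcal/mol

# Binding free energy from enthalpy and entropy contributions.
delta_g = delta_h - t_delta_s

# Dissociation constant implied by that free energy (1 M standard state).
k_d = exp(delta_g / (R_KCAL * TEMPERATURE_K))

print(f"dG = {delta_g:.2f} kcal/mol, KD ~ {k_d:.2e} M")
```

Under these assumptions the computed free energy reproduces the reported -4.52 kcal/mol, and the implied KD is close to the reported 5.09 × 10⁻⁴.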

11.
Recent progress in structure determination techniques has led to a significant growth in the number of known membrane protein structures, and the first structural genomics projects focusing on membrane proteins have been initiated, warranting an investigation of appropriate bioinformatics strategies for optimal structural target selection for these molecules. What determines a membrane protein fold? How many membrane structures need to be solved to provide sufficient structural coverage of the membrane protein sequence space? We present the CAMPS database (Computational Analysis of the Membrane Protein Space) containing almost 45,000 proteins with three or more predicted transmembrane helices (TMH) from 120 bacterial species. This large set of membrane proteins was subjected to single‐linkage clustering using only sequence alignments covering at least 40% of the TMH present in a given family. This process yielded 266 sequence clusters with at least 15 members, roughly corresponding to membrane structural folds, sufficiently structurally homogeneous in terms of the variation of TMH number between individual sequences. These clusters were further subdivided into functionally homogeneous subclusters according to the COG (Clusters of Orthologous Groups) system as well as more stringently defined families sharing at least 30% identity. The CAMPS sequence clusters are thus designed to reflect three main levels of interest for structural genomics: fold, function, and modeling distance. We present a library of Hidden Markov Models (HMM) derived from sequence alignments of TMH at these three levels of sequence similarity. Given that 24 out of 266 clusters corresponding to membrane folds already have associated known structures, we estimate that 242 additional new structures, one for each remaining cluster, would provide structural coverage at the fold level of roughly 70% of prokaryotic membrane proteins belonging to the currently most populated families. Proteins 2006. 
© 2006 Wiley‐Liss, Inc.

12.
Structure comparison is widely used to quantify protein relationships. Although there are several approaches to calculate structural similarity, specifying significance thresholds for similarity metrics is difficult due to the inherent likeness of common secondary structure elements. In this study, metal co‐factor location is used to assess the biological relevance of structural alignments. The distance between the centroids of bound co‐factors adds a chemical and function‐relevant constraint to the structural superimposition of two proteins. This additional dimension can be used to define cut‐off values for discriminating valid and spurious alignments in large alignment sets. The hypothesis underlying our approach is that metal coordination sites constrain structural evolution, thus revealing functional relationships between distantly related proteins. A comparison of three related nitrogenases shows the sequence and fold constraints imposed on the protein structures up to 18 Å away from the centers of their bound metal clusters. Proteins 2014; 82:648–656. © 2013 Wiley Periodicals, Inc.  相似文献   
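The co‐factor centroid criterion described in this abstract can be sketched as follows. The coordinates and the 6 Å cut‐off are hypothetical values chosen for illustration, and both co‐factors are assumed to be expressed in the same frame after the proteins have been superimposed.

```python
from math import dist


def centroid(coords):
    """Arithmetic mean position of a list of (x, y, z) atom coordinates."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def cofactor_centroid_distance(cofactor_a, cofactor_b):
    """Distance between the centroids of two bound co-factors (same units as input)."""
    return dist(centroid(cofactor_a), centroid(cofactor_b))


# Toy metal-site coordinates in angstroms (hypothetical data).
site_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
site_b = [(1.0, 3.0, 4.0), (1.0, 3.0, 4.0)]

MAX_DISTANCE = 6.0  # hypothetical cut-off, not taken from the paper

d = cofactor_centroid_distance(site_a, site_b)
alignment_is_plausible = d <= MAX_DISTANCE
print(d, alignment_is_plausible)  # 5.0 True
```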

13.
An Y  Friesner RA 《Proteins》2002,48(2):352-366
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) a SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left in the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.

14.
A novel approach was proposed to evaluate the steadiness of polar clusters containing the RNA-binding sites on the protein surface. The degree of clustering of RNA-binding polar residues was used as a measure of the steadiness of the corresponding polar clusters. Escherichia coli ribosomal protein L25 utilizes two binding sites, S1 and S2, to complexate with a 5S rRNA fragment. The cluster distribution of RNA-contacting polar residues on the protein surface was studied using the structural data on the complex (in crystal and in solution) and the free state (in solution). The degree of polar residue clustering in S1 and S2 in crystal was estimated at 71.4 and 100%, respectively. For the free state in solution, the degree of clustering of the two sites was 22.8 and 68.6%, respectively. Thus, the steadiness was quantitatively estimated for the RNA-binding sites of two different types, one preexisting in the protein and the other induced by the RNA structure upon complexation. The difference between the protein structures in crystal and in solution was found to be functionally significant. The results can be extrapolated to numerous complexes of proteins with double-stranded RNA and DNA.  相似文献   

15.
Plant physiological and biochemical processes are significantly affected by gamma irradiation stress. In addition, gamma‐ray (GA) differentially affects gene expression across the whole genome. In this study, we identified radio marker genes (RMGs) responding only to GA stress compared with six abiotic stresses (chilling, cold, anoxia, heat, drought and salt) in rice. To analyze the expression patterns of differentially expressed genes (DEGs) in gamma‐irradiated rice plants against six abiotic stresses, we conducted a hierarchical clustering analysis by using a complete linkage algorithm. The up‐ and downregulated DEGs were observed against six abiotic stresses in three and four clusters among a total of 31 clusters, respectively. The common gene ontology functions of upregulated DEGs in clusters 9 and 19 are associated with oxidative stress. In a Pearson's correlation coefficient analysis, GA stress showed highly negative correlation with salt stress. On the basis of specific data about the upregulated DEGs, we identified the 40 candidate RMGs that are induced by gamma irradiation. These candidate RMGs, except two genes, were more highly induced in rice roots than in other tissues. In addition, we obtained other 38 root‐induced genes by using a coexpression network analysis of the specific upregulated candidate RMGs in an ARACNE algorithm. Among these genes, we selected 16 RMGs and 11 genes coexpressed with three RMGs to validate coexpression network results. RT‐PCR assay confirmed that these genes were highly upregulated in GA treatment. All 76 genes (38 root‐induced genes and 38 candidate RMGs) might be useful for the detection of GA sensitivity in rice roots.  相似文献   

16.
We isolated cDNAs encoding type 2 and type 3 inositol 1,4,5-trisphosphate (IP3) receptors (IP3R2 and IP3R3, respectively) from mouse lung and found a novel alternative splicing segment, SI(m2), at 176-208 of IP3R2. The long form (IP3R2 SI(m2)(+)) was dominant, but the short form (IP3R2 SI(m2)(-)) was detected in all tissues examined. IP3R2 SI(m2)(-) has neither IP3 binding activity nor Ca2+ releasing activity. In addition to its reticular distribution, IP3R2 SI(m2)(+) is present in the form of clusters in the endoplasmic reticulum of resting COS-7 cells, and after ATP or Ca2+ ionophore stimulation, most of the IP3R2 SI(m2)(+) is in clusters. IP3R3 is localized uniformly on the endoplasmic reticulum of resting cells and forms clusters after ATP or Ca2+ ionophore stimulation. IP3R2 SI(m2)(-) does not form clusters in either resting or stimulated cells. IP3 binding-deficient site-directed mutants of IP3R2 SI(m2)(+) and IP3R3 fail to form clusters, indicating that IP3 binding is involved in the cluster formation by these isoforms. Coexpression of IP3R2 SI(m2)(-) prevents stimulus-induced IP3R clustering, suggesting that IP3R2 SI(m2)(-) functions as a negative coordinator of stimulus-induced IP3R clustering. Expression of IP3R2 SI(m2)(-) in CHO-K1 cells significantly reduced ATP-induced Ca2+ entry, but not Ca2+ release, suggesting that the novel splice variant of IP3R2 specifically influences the dynamics of the sustained phase of Ca2+ signals.

17.
Many proteins function by interacting with other small molecules (ligands). Identification of ligand‐binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, a similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently compared the representative patches exhaustively and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand‐binding protein sequences and functions. Consequently, we classified the patches into ~2000 well‐characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross‐fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.

18.
Deep sequencing of PCR amplicon libraries facilitates the detection of low‐abundance populations in environmental DNA surveys of complex microbial communities. At the same time, deep sequencing can lead to overestimates of microbial diversity through the generation of low‐frequency, error‐prone reads. Even with sequencing error rates below 0.005 per nucleotide position, the common method of generating operational taxonomic units (OTUs) by multiple sequence alignment and complete‐linkage clustering significantly increases the number of predicted OTUs and inflates richness estimates. We show that a 2% single‐linkage preclustering methodology followed by an average‐linkage clustering based on pairwise alignments more accurately predicts expected OTUs in both single and pooled template preparations of known taxonomic composition. This new clustering method can reduce the OTU richness in environmental samples by as much as 30–60% but does not reduce the fraction of OTUs in long‐tailed rank abundance curves that defines the rare biosphere.  相似文献   
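The idea of single‐linkage grouping at a 2% threshold can be illustrated with a generic sketch. This is not the authors' pipeline: it uses a simple mismatch fraction on equal‐length toy reads as a stand‐in for alignment‐based distances, and all names and data are hypothetical.

```python
def fraction_different(a, b):
    """Fraction of mismatched positions between two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("this toy distance requires equal-length reads")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs, threshold=0.02):
    """Group reads so that any two reads within `threshold` share a cluster.

    Single linkage means membership is transitive: A~B and B~C put A, B and C
    together even if A and C differ by more than the threshold.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if fraction_different(seqs[i], seqs[j]) <= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# Toy reads of length 50: a chain of 2% steps plus one unrelated read.
reads = ["A" * 50, "A" * 49 + "C", "A" * 48 + "CC", "G" * 50]
clusters = single_linkage_clusters(reads, threshold=0.02)
print(sorted(len(c) for c in clusters))  # [1, 3]
```

The chained example shows why single linkage is used only as a preclustering step here: reads that differ by 4% still end up together when an intermediate read connects them.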

19.
20.
It is suspected that correlated motions among a subset of spatially separated residues drive conformational dynamics not only in multidomain but also in single domain proteins. Sequence and structure‐based methods have been proposed to determine covariation between two sites on a protein. The statistical coupling analysis (SCA), which compares the changes in probability at two sites in a multiple sequence alignment (MSA) and a subset of the MSA, has been used to infer the network of residues that encodes allosteric signals in protein families. The structural perturbation method (SPM), which probes the response of a local perturbation at all other sites, has been used to probe the allostery wiring diagram in biological machines and enzymes. To assess the efficacy of the SCA, we used an exactly soluble two dimensional lattice model and performed double‐mutant cycle (DMC) calculations to predict the extent of physical coupling between two sites. The predictions of the SCA and the DMC results show that only residues that are in contact in the native state are accurately identified. In addition, covariations among strongly interacting residues are most easily identified by the SCA. These conclusions are consistent with the DMC experiments on the PDZ family. Good correlation between the SCA and the DMC is only obtained by performing multiple experiments that vary the nature of amino acids at a given site. In contrast, the energetic coupling found in experiments for the PDZ domain is recovered using the SPM. We also predict, using the SPM, several residues that are coupled energetically. Proteins 2009. © 2009 Wiley‐Liss, Inc.
