首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We have developed a tool, named "SCOPExplorer", for browsing and analyzing SCOP information. SCOPExplorer 1) contains a tree-style viewer to display an overview of protein structure data, 2) is able to employ a variety of options to analyze SCOP data statistically, and 3) provides a function to link protein domains to protein data bank (PDB) resources. SCOPExplorer uses an XML-based structural document format, named "SCOPML", derived from the SCOP data. To evaluate SCOPExplorer, proteins containing more than 20 domains were analyzed. The Skp1-Skp2 protein complex and the Fab fragment of IgG2 contain the largest numbers of domains in the current eukaryotic SCOP database. These proteins are known to either bind to various proteins or generate diversity. This suggests that the more domains a protein has, the more interactions or more variability it will be capable of. (SCOPExplorer is available for download at http://scopexplorer.ulsan.ac.kr).  相似文献   

2.
The analysis and prediction of protein-protein interaction sites from structural data are restricted by the limited availability of structural complexes that represent the complete protein-protein interaction space. The domain classification schemes CATH and SCOP are normally used independently in the analysis and prediction of protein domain-domain interactions. In this article, the effect of different domain classification schemes on the number and type of domain-domain interactions observed in structural data is systematically evaluated for the SCOP and CATH hierarchies. Although there is a large overlap in domain assignments between SCOP and CATH, 23.6% of CATH interfaces had no SCOP equivalent and 37.3% of SCOP interfaces had no CATH equivalent in a nonredundant set. Therefore, combining both classifications gives an increase of between 23.6 and 37.3% in domain-domain interfaces. It is suggested that if possible, both domain classification schemes should be used together, but if only one is selected, SCOP provides better coverage than CATH. Employing both SCOP and CATH reduces the false negative rate of predictive methods, which employ homology matching to structural data to predict protein-protein interaction by an estimated 6.5%.  相似文献   

3.
4.
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non‐SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015; 83:2025–2038. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc.  相似文献   

5.
One of the major research directions in bioinformatics is that of predicting the protein superfamily in large databases and classifying a given set of protein domains into superfamilies. The classification reflects the structural, evolutionary and functional relatedness. These relationships are embodied in hierarchical classification such as Structural Classification of Protein (SCOP), which is manually curated. Such classification is essential for the structural and functional analysis of proteins. Yet, a large number of proteins remain unclassified. We have proposed an unsupervised machine-learning FuzzyART neural network algorithm to classify a given set of proteins into SCOP superfamilies. The proposed method is fast learning and uses an atypical non-linear pattern recognition technique. In this approach, we have constructed a similarity matrix from p-values of BLAST all-against-all, trained the network with FuzzyART unsupervised learning algorithm using the similarity matrix as input vectors and finally the trained network offers SCOP superfamily level classification. In this experiment, we have evaluated the performance of our method with existing techniques on six different datasets. We have shown that the trained network is able to classify a given similarity matrix of a set of sequences into SCOP superfamilies at high classification accuracy.  相似文献   

6.
Dividing protein structures into domains is proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides structure into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions are given by the authors who solved protein structures), as well as to popular SCOP and CATH annotations by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations (90% agreement in number of domains and 88% agreement in both number of domains and at least 85% overlap in domain assignment of residues) if its three adjustable parameters are trained by the AUTHORS annotations. By comparison, the agreement is 83% (81% with at least 85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations goes beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to two other recently developed, SCOP-trained, automatic methods PDP (protein domain parser), and DomainParser 2. We also tested a simple consensus method made of PDP, DomainParser 2, and DDOMAIN and a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section on http://sparks.informatics.iupui.edu.  相似文献   

7.
A natural way to study protein sequence, structure, and function is to put them in the context of evolution. Homologs inherit similarities from their common ancestor, while analogs converge to similar structures due to a limited number of energetically favorable ways to pack secondary structural elements. Using novel strategies, we previously assembled two reliable databases of homologs and analogs. In this study, we compare these two data sets and develop a support vector machine (SVM)-based classifier to discriminate between homologs and analogs. The classifier uses a number of well-known similarity scores. We observe that although both structure scores and sequence scores contribute to SVM performance, profile sequence scores computed based on structural alignments are the best discriminators between remote homologs and structural analogs. We apply our classifier to a representative set from the expert-constructed database, Structural Classification of Proteins (SCOP). The SVM classifier recovers 76% of the remote homologs defined as domains in the same SCOP superfamily but from different families. More importantly, we also detect and discuss interesting homologous relationships between SCOP domains from different superfamilies, folds, and even classes.  相似文献   

8.

Background  

Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in python to enable construction of datasets from these resources.  相似文献   

9.
We carry out a systematic analysis of the correlation between similarity of protein three-dimensional structures and their evolutionary relationships. The structural similarity is quantitatively identified by an all-against-all comparison of the spatial arrangement of secondary structural elements in nonredundant 967 representative proteins, and the evolutionary relationship is judged according to the definition of superfamily in the SCOP database. We find the following symmetry rule: a protein pair that has similar folds but belong to different superfamilies has (with a very rare exception) certain internal symmetry in its common similar folds. Possible reasons behind the symmetry rule are discussed.  相似文献   

10.
Conformational analysis of alternative protein structures   总被引:1,自引:0,他引:1  
MOTIVATION: Alternative structural models determined experimentally are available for an increasing number of proteins. Structural and functional studies of these proteins need to take these models into consideration as they can present considerable structural differences. The characterization of the structural differences and similarities between these models is a fundamental task in structural biology requiring appropriate methods. RESULTS: We propose a method for characterizing sets of alternative structural models. Three types of analysis are performed: grouping according to structural similarity, visualization and detection of structural variation and comparison of subsets for identifying and locating distinct conformational states. The alpha carbon atoms are used in order to analyse the backbone conformations. Alternatively, side-chain atoms are used for detailed conformational analysis of specific sites. The method takes into account estimates of atom coordinate uncertainty. The invariant regions are used to generate optimal superpositions of these models. We present the results obtained for three proteins showing different degrees of conformational variability: relative motion of two structurally conserved subdomains, a disordered subdomain and flexibility in the functional site associated with ligand binding. The method has been applied in the analysis of the alternative models available in SCOP. Considerable structural variability can be observed for most proteins. AVAILABILITY: The results of the analysis of the SCOP alternative models, the estimates of coordinate uncertainty as well as the source code of the implementation are available in the STRuster web site: http://struster.bioinf.mpi-inf.mpg.de.  相似文献   

11.

Background  

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins with an equal number of secondary structure elements (SSEs).  相似文献   

12.
We have developed a program for automatic identification of domains in protein three-dimensional structures. Performance of the program was assessed by three different benchmarks: (i) by comparison with the expert-curated SCOP database of structural domains; (ii) by comparison with a collection of manual domain assignments; and (iii) by comparison with a set of 55 proteins, frequently used as a benchmark for automatic domain assignment. In all these benchmarks PDP identified domains correctly in more than 80% of proteins. AVAILABILITY: http://123d.ncifcrf.gov/.  相似文献   

13.
14.
Chu CK  Feng LL  Wouters MA 《Proteins》2005,60(4):577-583
Structural data mining studies attempt to deduce general principles of protein structure from solved structures deposited in the protein data bank (PDB). The entire database is unsuitable for such studies because it is not representative of the ensemble of protein folds. Given that novel folds continue to be unearthed, some folds are currently unrepresented in the PDB while other folds are overrepresented. Overrepresentation can easily be avoided by filtering the dataset. PDB_SELECT is a well-used representative subset of the PDB that has been deduced by sequence comparison. Specifically, structures with sequences that exhibit a pairwise sequence identity above a threshold value are weeded from the dataset. Although length criteria for pairwise alignments have a structural basis, this automated method of pruning is essentially sequence-based and runs into problems in the twilight zone, possibly resulting in some folds being overrepresented. The value-added structure databases SCOP and CATH are also a potential source of a nonredundant dataset. Here we compare the sequence-derived dataset PDB_SELECT with the structural databases SCOP (Structural Classification Of Proteins) and CATH (Class-Architecture-Topology-Homology). We show that some folds remain overrepresented in the PDB_SELECT dataset while other folds are not represented at all. However, SCOP and CATH also have their own problems such as the labor-intensiveness of the update process and the problem of determining whether all folds are equally or sufficiently distant. We discuss areas where further work is required.  相似文献   

15.
Suprachiasmatic nucleus circadian oscillatory protein (SCOP) is a member of the leucine-rich repeat (LRR)-containing protein family. In addition to circadian expression in the rat hypothalamic suprachiasmatic nucleus, SCOP is constitutively expressed in neurons throughout the rat brain. Here we found that a substantial amount of SCOP was localized in the brain membrane rafts, in which only K-Ras was abundant among Ras isoforms. SCOP interacted directly through its LRR domain with a subset of K-Ras in the guanine nucleotide-free form that was present in the raft fraction. This interaction interfered with the binding of added guanine nucleotide to K-Ras in vitro. A negative regulatory role of SCOP for K-Ras function was examined in PC12 cell lines stably overexpressing SCOP or its deletion mutants. Overexpression of full-length SCOP markedly down-regulated ERK1/ERK2 activation induced by depolarization or phorbol ester stimulation, and this inhibitory effect of overexpressed SCOP was dependent on its LRR domain. These results strongly suggest that SCOP negatively regulates K-Ras signaling in the membrane rafts, identifying a novel mechanism for regulation of the Ras-MAPK pathway.  相似文献   

16.
MOTIVATION: The sequence patterns contained in the available motif and hidden Markov model (HMM) databases are a valuable source of information for protein sequence annotation. For structure prediction and fold recognition purposes, we computed mappings from such pattern databases to the protein domain hierarchy given by the ASTRAL compendium and applied them to the prediction of SCOP classifications. Our aim is to make highly confident predictions also for non-trivial cases if possible and abstain from a prediction otherwise, and thus to provide a method that can be used as a first step in a pipeline of prediction methods. We describe two successful examples for such pipelines. With the AutoSCOP approach, it is possible to make predictions in a large-scale manner for many domains of the available sequences in the well-known protein sequence databases. RESULTS: AutoSCOP computes unique sequence patterns and pattern combinations for SCOP classifications. For instance, we assign a SCOP superfamily to a pattern found in its members whenever the pattern does not occur in any other SCOP superfamily. Especially on the fold and superfamily level, our method achieves both high sensitivity (above 93%) and high specificity (above 98%) on the difference set between two ASTRAL versions, due to being able to abstain from unreliable predictions. Further, on a harder test set filtered at low sequence identity, the combination with profile-profile alignments improves accuracy and performs comparably even to structure alignment methods. Integrating our method with structure alignment, we are able to achieve an accuracy of 99% on SCOP fold classifications on this set. In an analysis of false assignments of domains from new folds/superfamilies/families to existing SCOP classifications, AutoSCOP correctly abstains for more than 70% of the domains belonging to new folds and superfamilies, and more than 80% of the domains belonging to new families. These findings show that our approach is a useful additional filter for SCOP classification prediction of protein domains in combination with well-known methods such as profile-profile alignment. AVAILABILITY: A web server where users can input their domain sequences is available at http://www.bio.ifi.lmu.de/autoscop.  相似文献   

17.
BACKGROUND: Several methods of structural classification have been developed to introduce some order to the large amount of data present in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three unique methods of classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. In order to develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures. RESULTS: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. Discrepancies and inconsistencies are accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented. CONCLUSIONS: Using these databases requires an understanding of the rules upon which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also has an impact on reliability of prediction methods that employ these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database, encompassing agreements between SCOP, CATH and FSSP.  相似文献   

18.
Getz G  Vendruscolo M  Sachs D  Domany E 《Proteins》2002,46(4):405-415
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method makes it possible to update also CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic in any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that to resolve these ambiguous cases, other criteria of classification, based also on information about sequence and function, must be used.  相似文献   

19.
MOTIVATION: It is commonly believed that sequence determines structure, which in turn determines function. However, the presence of many proteins with the same structural fold but different functions suggests that global structure and function do not always correlate well. RESULTS: We propose a method for accurate functional annotation, based on identification of functional signatures from structural alignments (FSSA) using the Structural Classification of Proteins (SCOP) database. The FSSA method is superior at function discrimination and classification compared with several methods that directly inherit functional annotation information from homology inference, such as Smith-Waterman, PSI-BLAST, hidden Markov models and structure comparison methods, for a large number of structural fold families. Our results indicate that the contributions of amino acid residue types and positions to structure and function are largely separable for proteins in multi-functional fold families.  相似文献   

20.
The analysis of disulphide bond containing proteins in the Protein Data Bank (PDB) revealed that out of 27,209 protein structures analyzed, 12,832 proteins contain at least one intra-chain disulphide bond and 811 proteins contain at least one inter-chain disulphide bond. The intra-chain disulphide bond containing proteins can be grouped into 256 categories based on the number of disulphide bonds and the disulphide bond connectivity patterns (DBCPs) that were generated according to the position of half-cystine residues along the protein chain. The PDB entries corresponding to these 256 categories represent 509 unique SCOP superfamilies. A simple web-based computational tool is made freely available at the website http://www.ccmb.res.in/bioinfo/dsbcp that allows flexible queries to be made on the database in order to retrieve useful information on the disulphide bond containing proteins in the PDB. The database is useful to identify the different SCOP superfamilies associated with a particular disulphide bond connectivity pattern or vice versa. It is possible to define a query based either on a single field or a combination of the following fields, i.e., PDB code, protein name, SCOP superfamily name, number of disulphide bonds, disulphide bond connectivity pattern and the number of amino acid residues in a protein chain and retrieve information that match the criterion. Thereby, the database may be useful to select suitable protein structural templates in order to model the more distantly related protein homologs/analogs using the comparative modeling methods.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号