The analysis and prediction of protein-protein interaction sites from structural data are restricted by the limited availability of structural complexes that represent the complete protein-protein interaction space. The domain classification schemes CATH and SCOP are normally used independently in the analysis and prediction of protein domain-domain interactions. In this article, the effect of different domain classification schemes on the number and type of domain-domain interactions observed in structural data is systematically evaluated for the SCOP and CATH hierarchies. Although there is a large overlap in domain assignments between SCOP and CATH, in a nonredundant set 23.6% of CATH interfaces had no SCOP equivalent and 37.3% of SCOP interfaces had no CATH equivalent. Therefore, combining both classifications increases the number of domain-domain interfaces by between 23.6% and 37.3%. It is suggested that, if possible, both domain classification schemes should be used together, but if only one is selected, SCOP provides better coverage than CATH. Employing both SCOP and CATH reduces, by an estimated 6.5%, the false negative rate of predictive methods that use homology matching to structural data to predict protein-protein interactions.
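The overlap figures above are simple set arithmetic once interfaces from the two schemes have been matched. A minimal Python sketch, assuming each interface has already been mapped to a shared identifier so that equivalent SCOP and CATH interfaces compare equal; the identifiers and the helper names (`unmatched_fraction`, `compare_schemes`) are hypothetical, and the hard step of reconciling SCOP and CATH domain boundaries is not shown.

```python
def unmatched_fraction(interfaces, other):
    """Fraction of `interfaces` with no equivalent in `other` (0.0 for an empty set)."""
    if not interfaces:
        return 0.0
    return len(interfaces - other) / len(interfaces)


def compare_schemes(cath, scop):
    """Summarise how two sets of domain-domain interfaces overlap (hypothetical helper)."""
    return {
        "cath_without_scop": unmatched_fraction(cath, scop),
        "scop_without_cath": unmatched_fraction(scop, cath),
        "combined_count": len(cath | scop),
    }


# Toy interface identifiers; real ones would come from mapping SCOP and CATH
# domain boundaries onto the same pairs of interacting residues.
cath_interfaces = {"i1", "i2", "i3", "i4"}
scop_interfaces = {"i2", "i3", "i4", "i5", "i6"}
print(compare_schemes(cath_interfaces, scop_interfaces))
```

In this toy input, 1 of 4 CATH interfaces (25%) and 2 of 5 SCOP interfaces (40%) lack a counterpart, and the union contains 6 interfaces.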

Getz G, Vendruscolo M, Sachs D, Domany E. Proteins, 2002, 46(4):405-415
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification down to the fold level. Because the FSSP database is updated weekly, this method makes it possible to update the CATH and SCOP assignments with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic to any protein structure classification that relies on structural information alone; hence, we introduce the "twilight zone for structure classification." We further suggest that, to resolve these ambiguous cases, other classification criteria, based also on sequence and function information, must be used.
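The abstract does not spell out the assignment rule, so the following is only a generic illustration of transferring a label from structural neighbours and discarding ambiguous cases, not the authors' procedure. It assumes a dictionary of FSSP-style similarity scores to already-classified domains; the `margin` value, the label strings and the function name `assign_label` are hypothetical.

```python
def assign_label(scores, labels, margin=2.0):
    """Transfer a classification from the best-scoring classified neighbour.

    scores: neighbour id -> similarity score (higher is better), FSSP-style
    labels: neighbour id -> known CATH topology or SCOP fold label
    margin: hypothetical minimum score gap between the best label and the
            best competing label; smaller gaps are treated as ambiguous.
    Returns the label, or None if no neighbour is classified or the case is ambiguous.
    """
    best_per_label = {}
    for nid, score in scores.items():
        label = labels.get(nid)
        if label is None:
            continue  # neighbour without a known classification
        if score > best_per_label.get(label, float("-inf")):
            best_per_label[label] = score
    if not best_per_label:
        return None
    ranked = sorted(best_per_label.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] < margin:
        return None  # competing labels score too similarly: discard as ambiguous
    return ranked[0][0]


print(assign_label({"d1": 12.0, "d2": 11.5, "d3": 4.0},
                   {"d1": "3.40.50", "d2": "3.40.50", "d3": "2.60.40"}))
print(assign_label({"d1": 6.0, "d3": 5.5}, {"d1": "3.40.50", "d3": "2.60.40"}))
```

The first call has a clear winner; the second is left unassigned because two different labels score within the margin, mirroring the idea of discarding ambiguous cases.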

We have used GRATH, a graph-based structure comparison algorithm, to map the similarities between the different folds observed in the CATH domain structure database. Statistical analysis of the distributions of fold similarities has allowed us to assess the significance of any similarity. We have therefore examined whether it is best to represent folds as discrete entities or whether a more accurate model would be a continuum in which folds overlap via common motifs. To do this, we have introduced a new statistical measure of fold similarity, termed gregariousness. For a particular fold, gregariousness measures how many other folds have a significant structural overlap with that fold, typically comprising 40% or more of the larger structure. Gregarious folds often contain commonly occurring super-secondary structural motifs, such as beta-meanders, Greek keys, alpha-beta plait motifs or alpha-hairpins, which match similar motifs in other folds. Apart from one example, all of the most gregarious folds (those matching 20% or more of the other folds in the database) are alpha-beta proteins. They also occur in highly populated architectural regions of fold space, adopting sandwich-like arrangements containing two or more layers of alpha-helices and beta-strands. Domains that exhibit low gregariousness are those with very distinctive folds, containing few common motifs or motifs packed in unusual arrangements. Most of the superhelices exhibit low gregariousness despite containing some commonly occurring super-secondary structural motifs. In these folds, the common motifs are combined in an unusual way and represent a small proportion of the fold (<10%). Our results suggest that fold space may be considered continuous for some architectural arrangements (e.g. alpha-beta sandwiches), in that super-secondary motifs can be used to link neighbouring fold groups. However, in other regions of fold space, much more discrete topologies are observed, with little similarity between folds.
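Gregariousness, as described above, is essentially a count of significant fold-fold overlaps. A minimal sketch, assuming a precomputed matrix of overlap fractions (as a structure comparison program such as GRATH might produce) and treating the 40% figure as a fixed significance threshold; the paper instead derives significance from score distributions, so the threshold handling and the helper names are assumptions.

```python
def gregariousness(overlap, threshold=0.4):
    """For each fold, count the other folds whose overlap with it reaches `threshold`.

    overlap[i][j]: fraction of the larger of folds i and j covered by their best
    structural match (assumed precomputed and symmetric).
    """
    n = len(overlap)
    return [sum(1 for j in range(n) if j != i and overlap[i][j] >= threshold)
            for i in range(n)]


def most_gregarious(overlap, threshold=0.4, min_share=0.2):
    """Indices of folds that significantly match at least `min_share` of all other folds."""
    n = len(overlap)
    if n < 2:
        return []
    counts = gregariousness(overlap, threshold)
    return [i for i, c in enumerate(counts) if c / (n - 1) >= min_share]


# Hypothetical overlap fractions for four folds.
fold_overlap = [
    [1.0, 0.5, 0.45, 0.1],
    [0.5, 1.0, 0.2, 0.05],
    [0.45, 0.2, 1.0, 0.0],
    [0.1, 0.05, 0.0, 1.0],
]
print(gregariousness(fold_overlap), most_gregarious(fold_overlap, min_share=0.5))
```

Fold 0 overlaps two of the other three folds, while fold 3 overlaps none, the pattern the abstract associates with distinctive folds.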

BACKGROUND: Several methods of structural classification have been developed to introduce some order to the large amount of data in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three distinct approaches to classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. To develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures. RESULTS: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. The discrepancies and inconsistencies can be accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented. CONCLUSIONS: Using these databases requires an understanding of the rules on which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also affects the reliability of prediction methods that use these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database encompassing the agreements between SCOP, CATH and FSSP.
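Building a consensus set reduces to keeping the chains that all three databases classify and on which they agree. A sketch under a strong assumption: SCOP folds, CATH topologies and FSSP clusters have already been translated into one shared label vocabulary (the labels "F1", "F2", ... and the chain IDs below are hypothetical). The function name is illustrative.

```python
def consensus_folds(scop, cath, fssp):
    """Return (chains present in all three databases, chains on which all three agree).

    Each argument maps a chain id to a fold label that is assumed to be
    expressed in a shared vocabulary; building that mapping is not shown.
    """
    common = scop.keys() & cath.keys() & fssp.keys()
    agreed = {c: scop[c] for c in common if scop[c] == cath[c] == fssp[c]}
    return common, agreed


scop = {"1abcA": "F1", "2defB": "F2", "3ghiA": "F3"}
cath = {"1abcA": "F1", "2defB": "F9", "3ghiA": "F3", "4jklA": "F4"}
fssp = {"1abcA": "F1", "2defB": "F2", "4jklA": "F4"}
common, agreed = consensus_folds(scop, cath, fssp)
print(sorted(common), agreed)
```

Here only two chains appear in all three sources, and only one of them is classified identically everywhere, so only that chain would enter the consensus template library.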

Carugo O. Bioinformation, 2010, 4(8):347-351
Several non-redundant ensembles of protein three-dimensional structures were analyzed to estimate their natural clustering tendency by means of the Cox-Lewis coefficient. It was observed that, although proteins tend to aggregate into different and well-separated groups, some overlap between different clusters occurs. This suggests that classifications based only on structural data cannot provide a systematic classification of proteins. In particular, additional information is needed to fully capture the complex evolutionary relationships between proteins.

MOTIVATION: The success of the consensus approach to the protein structure prediction problem has led to the development of several different consensus methods. Most of them rely only on a structural comparison of a number of different models. However, other types of information might also be useful, such as the score reported by the server and structural evaluation. RESULTS: Pcons5 is a new and improved version of the consensus predictor Pcons. Pcons5 integrates information from three different sources: the consensus analysis, structural evaluation and the score from the fold recognition servers. We show that Pcons5 is better than the previous version of Pcons and that it performs better than using the consensus analysis alone. In addition, we present a version of Pmodeller based on Pcons5, which performs significantly better than Pcons5. AVAILABILITY: Pcons5 is the first Pcons version available as a standalone program, from http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers.
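One simple way to integrate the three information sources named above is a weighted sum of per-model scores. This is a hedged sketch, not Pcons5's actual scoring: the consensus term is taken to be each model's average structural similarity to the other models, the evaluation and server scores are assumed to be rescaled to [0, 1], and the weights are placeholders.

```python
def consensus_scores(similarity):
    """Average similarity of each model to every other model.

    similarity[i][j]: pairwise model-model structural similarity in [0, 1]
    (assumed precomputed, e.g. from structural superposition).
    """
    n = len(similarity)
    if n < 2:
        return [0.0] * n
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


def combined_scores(similarity, evaluation, server, weights=(0.6, 0.25, 0.15)):
    """Linear mix of consensus, structural-evaluation and server scores.

    The weights are hypothetical placeholders, not fitted Pcons5 parameters.
    """
    w_cons, w_eval, w_srv = weights
    cons = consensus_scores(similarity)
    return [w_cons * c + w_eval * e + w_srv * s
            for c, e, s in zip(cons, evaluation, server)]


sim = [[1.0, 0.8, 0.7], [0.8, 1.0, 0.2], [0.7, 0.2, 1.0]]
scores = combined_scores(sim, evaluation=[0.5, 0.9, 0.4], server=[0.3, 0.9, 0.8])
best = max(range(len(scores)), key=scores.__getitem__)
print([round(s, 3) for s in scores], best)
```

With these toy numbers, model 0 has the strongest consensus support, yet model 1 wins overall because its evaluation and server scores are high; that is the kind of case where extra information can change the pick made by consensus alone.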

Wu Xinzhi. Acta Anthropologica Sinica, 2009, 28(3):217-236
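This paper reports a series of measurements of the Dali cranium and compares them with the corresponding data for Middle Pleistocene humans from China, Europe and Africa. Most of the Dali measurements fall within the range of variation of Middle Pleistocene humans from China and from Europe/Africa; some are close to those of Chinese Middle Pleistocene humans, while others are closer to European and/or African specimens. Considering these results together with other metric and non-metric features that the Dali cranium shares with ancient Chinese humans, the paper proposes that the Dali population was a member of the continuous evolutionary lineage of ancient humans in China, and that it provides morphological evidence of gene flow between ancient humans in China and those in Europe and Africa.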

Tobi D. Proteins, 2012, 80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins". We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics.
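The two steps described above (GNM modes, then Needleman-Wunsch alignment) can be sketched with NumPy. Assumptions: C-alpha coordinates as input, a 7 Å contact cutoff (a commonly used GNM value), the squared components of the slowest non-trivial mode as a per-residue profile, and a hypothetical match score of 1 - 2|x - y| on max-normalised profiles with a linear gap penalty. The paper's actual choice of modes and scoring may differ; the function names are illustrative.

```python
import numpy as np


def gnm_slowest_mode(coords, cutoff=7.0):
    """Per-residue profile of the slowest non-trivial GNM mode (sums to 1)."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    kirchhoff = -(dist < cutoff).astype(float)           # -1 for every contact
    np.fill_diagonal(kirchhoff, 0.0)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))  # diagonal = contact count
    _, eigvecs = np.linalg.eigh(kirchhoff)               # ascending eigenvalues
    # Column 0 is the zero-eigenvalue rigid-body mode; column 1 is the slowest
    # internal mode, assuming the contact network is connected.
    return eigvecs[:, 1] ** 2


def align_profiles(a, b, gap=-0.5):
    """Needleman-Wunsch global alignment of two real-valued profiles.

    Returns (alignment score, list of aligned index pairs).
    """
    a = np.asarray(a, dtype=float) / np.max(a)
    b = np.asarray(b, dtype=float) / np.max(b)
    n, m = len(a), len(b)

    def match(i, j):
        return 1.0 - 2.0 * abs(a[i] - b[j])  # hypothetical score in [-1, 1]

    score = np.zeros((n + 1, m + 1))
    score[:, 0] = gap * np.arange(n + 1)
    score[0, :] = gap * np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(score[i - 1, j - 1] + match(i - 1, j - 1),
                              score[i - 1, j] + gap,
                              score[i, j - 1] + gap)
    # Trace back from the bottom-right cell to recover the aligned positions.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + match(i - 1, j - 1)):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] + gap):
            i -= 1
        else:
            j -= 1
    return float(score[n, m]), pairs[::-1]


# Two idealised chains of evenly spaced C-alpha atoms (3.8 angstroms apart).
chain_a = [[3.8 * k, 0.0, 0.0] for k in range(10)]
chain_b = [[3.8 * k, 0.0, 0.0] for k in range(12)]
score, pairs = align_profiles(gnm_slowest_mode(chain_a), gnm_slowest_mode(chain_b))
print(round(score, 2), len(pairs))
```

Aligning per-residue profiles rather than raw coordinates is what lets domains of different length, or with different folds, be compared purely on their global dynamics.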

The cell bodies of ascending noradrenergic neurons in the brain are located predominantly in the locus coeruleus. An in vitro model of locus coeruleus neurons could prove to be a useful tool for investigating noradrenergic neural networks and their associated pathophysiologies. The CATH.a cell line demonstrates some of the properties expected of locus coeruleus neurons, and the present study investigated the neurotransmitter uptake and release properties of CATH.a cells. Surprisingly, the CATH.a cells failed to accumulate [3H]noradrenaline ([3H]NA), suggesting the lack of a functional NA transporter. RT-PCR supported this finding by demonstrating the absence of NA transporter mRNA. Treatment of CATH.a cells with various differentiating agents failed to increase [3H]NA uptake. Endogenous NA release was studied using HPLC detection, which revealed no depolarisation-induced increase in endogenous NA release. A CATH.a cell line transfected with the human NA transporter was generated (termed RUNT), and [3H]NA uptake studies revealed that RUNT cells displayed significant uptake that could be blocked by cocaine (10 microM). Furthermore, the uptake capacity could be dramatically increased by differentiating the cells with dibutyryl cyclic AMP (1 mM) for 24 h. In dibutyryl cyclic AMP-differentiated RUNT cells, high K+ concentrations (50 mM) significantly increased [3H]NA release above basal levels.

The floristic characteristics, position in the Chinese floristic division, and origin and development of the flora of the Cangshan Mountain Range (the Dali Range) were discussed from different aspects, based on the local seed plants, comprising 2503 species, 45 subspecies and 194 varieties belonging to 852 genera in 164 families. Preliminary conclusions are as follows. In the Chinese floristic division, the Cangshan Mountain Range belongs to the Three River Gorges Subregion of the Hengduan Mountains Region, Sino-Himalayan Forest Subkingdom, East Asiatic Kingdom. The seed-plant flora of the Cangshan Mountain Range is temperate in character. As a result of the collision between the Eurasian and Indian Plates, the disappearance of the Tethys and the uplift of the Himalayas, elements from Gondwanaland, the Mediterranean and the Arctic Tertiary flora gradually developed into its modern flora. Endemism within the flora is rich, with neoendemic elements dominant. This shows that the Cangshan Mountain Range is not only a refuge for some ancient floristic elements but also a differentiation center for young floristic elements. It is an important floristic junction connecting elements from different directions, and many elements also reach their distribution limits here.

Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., the number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also rarely proved successful. Meanwhile, previous computational efforts to map protein structure space on a large scale were limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from the structural classification of protein domains reproduces 63% of the existing structural clusters, mostly as hubs for many nonredundant sequences, and reveals newly discovered clusters as islands adopted by only a few sequences. Results further show that there exist a significant number of novel, potentially stable clusters for medium-size or large-size single-domain proteins, in particular those with more than 100 amino acid residues, that are either not yet adopted by nature or adopted by only a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implications for template-based structure prediction of recovering new-fold targets from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) by MLP are also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
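A multiple-loop permutation reconnects the same secondary-structure segments in a new order, and a circular permutation is the special case in which the new order is a rotation of the native one. The sketch below enumerates candidate orders and computes the TM-score from per-residue distances of an existing superposition, using the standard d0 = 1.24(L - 15)^(1/3) - 1.8 normalisation (floored at 0.5 Å here). It deliberately omits what makes the real method work: checking that new loops can physically span the gaps, building them, and performing the superposition. Function names are hypothetical.

```python
from itertools import permutations


def segment_orders(n_segments):
    """Yield every non-native order in which n segments could be reconnected."""
    native = tuple(range(n_segments))
    for order in permutations(native):
        if order != native:
            yield order


def is_circular_permutation(order):
    """True if `order` is a rotation of the native order (a single-loop permutation)."""
    n = len(order)
    return all(order[i] == (order[0] + i) % n for i in range(n))


def tm_score(distances, length):
    """TM-score from distances (angstroms) between aligned residue pairs.

    `length` is the reference length; unaligned residues contribute nothing.
    """
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / length


# Four segments give 23 alternative connection orders, 3 of them circular permutations.
orders = list(segment_orders(4))
print(len(orders), sum(is_circular_permutation(o) for o in orders))
print(round(tm_score([0.5, 1.0, 2.0, 4.0], length=120), 3))
```

The number of orders grows factorially with the number of segments, which is why a practical MLP search has to prune orders whose new loops cannot close before any structure is built.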

Amyloid – a state in many guises: survival of the fittest fibril fold
Under appropriate conditions, essentially all proteins are able to aggregate to form long, well-ordered, beta-sheet-rich arrays known as amyloid-like fibrils. These fibrils consist of varying numbers of intertwined protofibrils and can, for any given protein, exhibit a wealth of different forms at the ultrastructural level. Traditionally, this structural variability or polymorphism has been attributed to differences in the assembly of a common protofibril structure. However, recent work on glucagon, insulin and the Abeta peptide suggests that this polymorphism can occur at the level of secondary structure. Simple variations in solution conditions, such as temperature, protein concentration and ionic strength, or in external mechanical influences, such as agitation, can lead to the formation of fibrils with markedly different characteristics. In some cases, these characteristics can be passed on to new fibrils in a strain-specific manner, similar to what is known for prions. The preferred fibril structure can be explained in terms of selective pressure and survival of the fittest: the most populated types of fibrils we observe at the end of an experiment are those that had the fastest overall growth rate under the given conditions. Fibrillar polymorphism is probably a consequence of the lack of structural restraints on a nonfunctional conformational state.
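The "survival of the fittest" argument can be illustrated with a toy kinetic model in which several polymorphs draw on one shared monomer pool and each grows in proportion to its own fibril mass (seeded elongation). The model, the rate constants and the seed amounts are assumptions chosen for illustration, not values from the paper; forward-Euler integration keeps it short, and `compete` is a hypothetical name.

```python
def compete(rates, seeds, monomer=1.0, dt=0.01, steps=5000):
    """Forward-Euler simulation of fibril polymorphs competing for one monomer pool.

    Polymorph i grows at rate rates[i] * free_monomer * mass[i] (toy seeded elongation).
    Returns the final fibril mass of each polymorph and the remaining free monomer.
    """
    mass = list(seeds)
    free = monomer
    for _ in range(steps):
        growth = [k * free * x * dt for k, x in zip(rates, mass)]
        total = sum(growth)
        if total > free:  # never consume more monomer than remains
            growth = [g * free / total for g in growth]
            total = free
        mass = [x + g for x, g in zip(mass, growth)]
        free -= total
    return mass, free


# Two polymorphs seeded equally; the second elongates twice as fast.
final_mass, leftover = compete(rates=[1.0, 2.0], seeds=[0.01, 0.01])
print([round(x, 3) for x in final_mass], round(leftover, 4))
```

Because each polymorph's relative growth rate is its rate constant times the same free-monomer concentration, the faster-growing form ends up dominating once the monomer is used up, even though both start from equal seeds.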

Teyra J, Hawkins J, Zhu H, Pisabarro MT. Proteins, 2011, 79(2):499-508
The emerging picture of a continuous protein fold space highlights the existence of non-obvious structural similarities between proteins with apparently different topologies. The identification of structural resemblances across fold space and the analysis of similar recognition regions may be a valuable source of information for the structure-based functional characterization of proteins. In this work, we use non-sequential structural alignment methods (ns-SAs) to identify structural similarities between protein pairs independently of their SCOP hierarchy, and we calculate the significance of binding region conservation using the overlap of interacting residues in the ns-SA. We cluster the binding inferences for each family to distinguish already known family binding regions from putative new ones. Our methodology exploits the enormous amount of data available in the PDB to identify binding region similarities within protein families and to propose putative binding regions. Our results indicate that there is a plethora of structurally common binding regions among proteins, independently of current fold classifications. We obtain a 6- to 8-fold enrichment of novel binding regions and identify binding inferences for 728 protein families that so far lack binding information in the PDB. We explore binding mode analogies between ligands from commonly clustered binding regions to investigate the utility of our methodology. A comprehensive analysis of the obtained binding inferences may help in the functional characterization of protein recognition and assist rational engineering. The data obtained in this work are available via the download link at www.scowlp.org.
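The abstract does not state which significance test is used for binding-region overlap, so this sketch swaps in a one-sided hypergeometric test: among the aligned residue pairs, how surprising is it that at least the observed number pair an interacting residue of protein A with an interacting residue of protein B? The alignment is a plain dictionary, so residues may map in any order, as in a non-sequential alignment. The function name and the test choice are assumptions.

```python
from math import comb


def binding_overlap_pvalue(alignment, binding_a, binding_b):
    """Overlap of interacting residues under a (possibly non-sequential) alignment.

    alignment: residue index in A -> residue index in B (any order allowed)
    binding_a, binding_b: sets of interacting residue indices in A and B
    Returns (overlap, p), where p is the hypergeometric probability of at least
    `overlap` shared interacting pairs arising by chance among the aligned pairs.
    """
    pairs = list(alignment.items())
    total = len(pairs)
    k_a = sum(1 for i, _ in pairs if i in binding_a)
    k_b = sum(1 for _, j in pairs if j in binding_b)
    overlap = sum(1 for i, j in pairs if i in binding_a and j in binding_b)
    p = sum(comb(k_a, x) * comb(total - k_a, k_b - x)
            for x in range(overlap, min(k_a, k_b) + 1)) / comb(total, k_b)
    return overlap, p


# Ten aligned pairs; three interacting residues in A land exactly on the three in B.
alignment = {i: i + 9 for i in range(1, 11)}
print(binding_overlap_pvalue(alignment, {1, 2, 3}, {10, 11, 12}))
```

In the toy case, a perfect overlap of 3 interacting residues out of 10 aligned pairs has a chance probability of 1/120, so it would count as a conserved binding region under this assumed test.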

Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues into remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by taking a concept from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and none of these algorithms can make a confident (95%) correct assignment. A meta-server prediction method (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. Compared with the improvement that the single best fold recognition algorithm (according to training) achieves over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships and a 46.2% increase in the number of accurately annotated queries. It is well recognised in fold prediction, other bioinformatics applications and many other areas that ensemble predictions are generally more accurate than any of their component methods. However, little is known about why ensemble methods are superior, and this question has never been systematically addressed in fold recognition. Here we show that the power of ensembles stems from noise reduction, by filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
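The noise-reduction effect described above can be illustrated with a 3D-Jury-style vote, a swapped-in technique rather than Phyre's own clustering: a template is supported by each method whose top hit is that template or one judged structurally similar to it, and templates with little support are discarded as probable false positives. The template IDs, the similarity pairs, `min_support` and the function name are hypothetical.

```python
def consensus_rank(predictions, similar_pairs, min_support=2):
    """Rank candidate templates by how many methods' top hits support them.

    predictions: method name -> list of template ids, best first
    similar_pairs: set of frozensets {t1, t2} for structurally similar templates
    Templates supported by fewer than `min_support` methods are dropped.
    """
    top_hits = [hits[0] for hits in predictions.values() if hits]
    candidates = {t for hits in predictions.values() for t in hits}

    def support(t):
        # A top hit supports t if it is t itself or structurally similar to t.
        return sum(1 for h in top_hits if h == t or frozenset((h, t)) in similar_pairs)

    kept = [(t, support(t)) for t in candidates if support(t) >= min_support]
    return sorted(kept, key=lambda ts: (-ts[1], ts[0]))


predictions = {
    "method_1": ["1abc", "2xyz"],
    "method_2": ["3def", "1abc"],
    "method_3": ["2xyz"],
    "method_4": ["1abd"],
}
similar = {frozenset(("1abc", "1abd"))}
print(consensus_rank(predictions, similar))
```

Two methods independently point at the same structural neighbourhood (1abc/1abd), so those templates survive, while the lone top hits 2xyz and 3def are filtered out as likely noise.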

We report herein the NMR structure of Tm0979, a structural proteomics target from Thermotoga maritima. The Tm0979 fold consists of four beta/alpha units, which form a central parallel beta-sheet with strand order 1234. The first three helices pack against one face of the sheet and the fourth helix packs against the other face. The protein forms a dimer by adjacent parallel packing of the fourth helices, sandwiched between the two beta-sheets. This fold is of interest for several reasons. First, it represents the first structure determination for the DsrH family of conserved hypothetical proteins, which are involved in the oxidation of intracellular sulfur but have no defined molecular function. Possible functions are discussed on the basis of structure and sequence analysis. Second, the fold of Tm0979 most closely resembles YchN-like folds; however, the proteins that adopt these folds differ in secondary structural elements and quaternary structure. Comparison of these proteins provides insight into a possible mechanism for the evolution of quaternary structure through hydrophobicity-changing mutations of only one or two residues. Third, the Tm0979 fold is similar to flavodoxin-like folds and beta/alpha barrel proteins, and may provide a link between these very abundant folds and putative ancestral half-barrel proteins.

Recent progress in structure determination techniques has led to a significant growth in the number of known membrane protein structures, and the first structural genomics projects focusing on membrane proteins have been initiated, warranting an investigation of appropriate bioinformatics strategies for optimal structural target selection for these molecules. What determines a membrane protein fold? How many membrane structures need to be solved to provide sufficient structural coverage of the membrane protein sequence space? We present the CAMPS database (Computational Analysis of the Membrane Protein Space), containing almost 45,000 proteins with three or more predicted transmembrane helices (TMH) from 120 bacterial species. This large set of membrane proteins was subjected to single-linkage clustering using only sequence alignments covering at least 40% of the TMH present in a given family. This process yielded 266 sequence clusters with at least 15 members, roughly corresponding to membrane structural folds and sufficiently structurally homogeneous in terms of the variation in TMH number between individual sequences. These clusters were further subdivided into functionally homogeneous subclusters according to the COG (Clusters of Orthologous Groups) system, as well as into more stringently defined families sharing at least 30% identity. The CAMPS sequence clusters are thus designed to reflect three main levels of interest for structural genomics: fold, function and modeling distance. We present a library of Hidden Markov Models (HMM) derived from sequence alignments of TMH at these three levels of sequence similarity. Given that 24 of the 266 clusters corresponding to membrane folds already have associated known structures, we estimate that 242 additional new structures, one for each remaining cluster, would provide fold-level structural coverage of roughly 70% of prokaryotic membrane proteins belonging to the currently most populated families. Proteins 2006.
© 2006 Wiley-Liss, Inc.
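Single-linkage clustering as described above amounts to finding connected components in a graph whose edges are sufficiently long alignments. A minimal union-find sketch, assuming each alignment has already been reduced to one TMH coverage fraction (the exact coverage definition used for CAMPS may differ); the 40% coverage and 15-member thresholds come from the text, while the function name and input format are hypothetical.

```python
def single_linkage(proteins, alignments, min_coverage=0.4, min_size=15):
    """Cluster proteins by single linkage over sufficiently covering alignments.

    alignments: iterable of (protein_a, protein_b, tmh_coverage) with coverage in [0, 1]
    Returns clusters with at least `min_size` members, largest first.
    """
    parent = {p: p for p in proteins}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, coverage in alignments:
        if coverage >= min_coverage:  # only long-enough alignments create links
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), []).append(p)
    big = [sorted(c) for c in clusters.values() if len(c) >= min_size]
    return sorted(big, key=len, reverse=True)


links = [("A", "B", 0.5), ("B", "C", 0.45), ("D", "E", 0.3)]
print(single_linkage(list("ABCDE"), links, min_size=2))
```

A and C end up together through B even though they were never aligned directly, which is the chaining behaviour of single linkage; the D-E alignment covers too few TMH to link them.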

Copper and iron play important roles in a variety of biological processes, especially when chelated by proteins. The proteins involved in metal binding, transport and metabolism have aroused much interest. To facilitate the study of this topic, we constructed two databases (DCCP and DICP) containing the known copper- and iron-chelating proteins, which are freely available from the website http://sdbi.sdut.edu.cn/en. Users can conveniently search and browse all of the entries in the databases. Based on the two databases, bioinformatic analyses were performed, which provided some novel insights into metalloproteins.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417