Similar literature
1.
Seven protein structure comparison methods and two sequence comparison programs were evaluated on their ability to detect either protein homologs or domains with the same topology (fold) as defined by the CATH structure database. The structure alignment programs Dali, Structal, Combinatorial Extension (CE), VAST, and Matras were tested along with SGM and PRIDE, which calculate a structural distance between two domains without aligning them. We also tested two sequence alignment programs, SSEARCH and PSI-BLAST. Depending upon the level of selectivity and error model, structure alignment programs can detect roughly twice as many homologous domains in CATH as sequence alignment programs. Dali finds the most homologs, 321-533 of 1120 possible true positives (28.7%-45.7%), at an error rate of 0.1 errors per query (EPQ), whereas PSI-BLAST finds 365 true positives (32.6%), regardless of the error model. At an EPQ of 1.0, Dali finds 42%-70% of possible homologs, whereas Matras finds 49%-57%; PSI-BLAST finds 36.9%. However, Dali achieves >84% coverage before the first error for half of the families tested. Dali and PSI-BLAST find 9.2% and 5.2%, respectively, of the 7056 possible topology pairs at an EPQ of 0.1, and 19.5% and 5.9% at an EPQ of 1.0. Most statistical significance estimates reported by the structural alignment programs overestimate the significance of an alignment by orders of magnitude when compared with the actual distribution of errors. These results help quantify the statistical distinction between analogous and homologous structures, and provide a benchmark for structure comparison statistics.
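
The coverage figures quoted above (fraction of true positives found at a given number of errors per query) can be illustrated with a minimal sketch. It assumes a single pooled ranking of all hits and a simple cutoff at EPQ times the number of queries; the function name, the input layout, and this pooling are illustrative assumptions, not the error models used in the study.

```python
def coverage_at_epq(hits, n_queries, n_true_pairs, epq):
    """Fraction of true pairs recovered before the error budget is exceeded.

    hits: list of (score, is_true_positive); a higher score means more similar.
    The error budget is epq * n_queries false positives (an assumed, pooled model).
    """
    max_errors = epq * n_queries
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    true_pos = 0
    errors = 0
    for _score, is_tp in ranked:
        if is_tp:
            true_pos += 1
        else:
            errors += 1
            if errors > max_errors:
                break  # budget exceeded: stop counting further hits
    return true_pos / n_true_pairs


# Toy data: six ranked hits pooled from 10 hypothetical queries.
toy_hits = [(9, True), (8, True), (7, False), (6, True), (5, False), (4, True)]
print(coverage_at_epq(toy_hits, n_queries=10, n_true_pairs=4, epq=0.1))  # 0.75
print(coverage_at_epq(toy_hits, n_queries=10, n_true_pairs=4, epq=0.0))  # 0.5
```

With `epq=0.0` the function returns the coverage before the first error, the quantity the abstract reports per family.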

2.
Getz G  Vendruscolo M  Sachs D  Domany E 《Proteins》2002,46(4):405-415
We present an automated procedure to assign CATH and SCOP classifications to proteins whose FSSP score is available. CATH classification is assigned down to the topology level, and SCOP classification is assigned to the fold level. Because the FSSP database is updated weekly, this method makes it possible to update CATH and SCOP with the same frequency. Our predictions have a nearly perfect success rate when ambiguous cases are discarded. These ambiguous cases are intrinsic to any protein structure classification that relies on structural information alone. Hence, we introduce the "twilight zone for structure classification." We further suggest that to resolve these ambiguous cases, other classification criteria, based also on information about sequence and function, must be used.

3.

Background

Since experimental techniques are time-consuming and costly, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues, including poor performance when targets are longer than 100 residues, excessive running times, and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints into their fragment selection process.

Results

Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH- or SCOP-based structural class annotations, enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although CATH and SCOP classifications are different, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations that are more relevant to the user and converge more quickly towards the best model as estimated by GDT_TS (up to 10% on average). This substantiates our hypothesis that using structurally relevant templates not only reduces the size of the conformation space to be explored but also focuses the search on a more relevant area.

Conclusions

Since our methodology produces models whose quality is, on average, up to 7% higher than that of models generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for medium-sized and large proteins. Apart from improving search strategies and energy functions, integration of additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0576-2) contains supplementary material, which is available to authorized users.
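
The two quality measures reported in the Results above, GDT_TS and RMSD, can be sketched as follows. This is a simplified illustration that assumes the per-residue Cα distances come from one fixed superposition of model and reference; full GDT_TS implementations search over superpositions to maximize each term, which this sketch does not attempt. The function names are illustrative.

```python
import math


def gdt_ts(distances):
    """Mean percentage of residues within 1, 2, 4, and 8 angstroms.

    distances: per-residue C-alpha distances (angstroms) between model and
    reference, assumed to come from a single fixed superposition.
    """
    n = len(distances)
    percents = [
        100.0 * sum(1 for d in distances if d <= cutoff) / n
        for cutoff in (1.0, 2.0, 4.0, 8.0)
    ]
    return sum(percents) / 4.0


def rmsd(distances):
    """Root-mean-square deviation over the same per-residue distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


toy = [0.5, 1.5, 3.0, 7.0, 10.0]
print(gdt_ts(toy))  # 50.0
print(round(rmsd([3.0, 4.0]), 4))  # 3.5355
```

A higher GDT_TS and a lower RMSD both indicate a model closer to the reference, which is why the abstract reports a positive change for one and a negative change for the other.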

4.
We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web.

5.
Tobi D 《Proteins》2012,80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins". We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics.
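
The Needleman and Wunsch dynamic programming step mentioned above can be sketched generically. The version below aligns any two sequences given a pairwise scoring function; the scoring function, the linear gap penalty, and the character-based toy input are assumptions for illustration, since the abstract does not say how mode similarity was scored.

```python
def needleman_wunsch(a, b, score, gap=-1.0):
    """Global alignment of sequences a and b.

    score(x, y) returns the similarity of two elements; gap is a linear gap
    penalty (both are illustrative assumptions). Returns the optimal score
    and the list of aligned index pairs (i, j).
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the best score for aligning a[:i] with b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align pair
                dp[i - 1][j] + gap,  # gap in b
                dp[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


def match_score(x, y):
    """Toy scoring: +1 for identical elements, -1 otherwise."""
    return 1.0 if x == y else -1.0


print(needleman_wunsch("ACGT", "AGT", match_score))
# (2.0, [(0, 0), (2, 1), (3, 2)])
```

For aligning modes of motion, `a` and `b` would be lists of mode vectors and `score` some measure of their similarity; that choice is left open here.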

6.
Huang Y  Cao H  Liu Z 《Proteins》2012,80(6):1610-1619
Since the proposal of three-dimensional (3D) domain swapping, many 3D domain-swapped structures have been reported. However, when compared with the vast protein structure space, it is still unclear whether 3D domain swapping is a general mechanism for protein assembly. Here, we investigated this possibility by constructing a dataset consisting of more than 500 domain-swapped structures. The domain-swapped structures were mapped into the protein structure space. We found that about 10% of protein folds and 5% of protein families contain domain-swapped structures. When comparing the domain-swapped structures in a family/superfamily, we found that proteins within a family/superfamily can swap in different ways. Interface analysis revealed that the hinge loops contributed more than half of the open interface in 70% of bona fide domain-swapped dimers, indicating that the hinge loops play an important role in stabilizing the domain-swapped conformations. Our study supports the suggestion that domain swapping is a general property of all proteins and will facilitate further understanding of the mechanism of 3D domain swapping.

7.
Peter R. Jungblut 《Proteomics》2013,13(21):3103-3105
In recent years, proteomics has focused on high throughput and on identifying large numbers of proteins, within the basic discourse of protein expression. To avoid the impression of producing mere lists, attempts are usually made to correlate proteins that change in amount between two biological situations with different pathways or protein interactions. This discourse is based on two simplifications, which drastically limit the applicability of proteomics: (i) it is sufficient to quantify a protein from several enzymatic digestion products; (ii) a biological situation is sufficiently described if a peptide with its PTM is identified, resulting in long lists of modified peptides with amounts of data that are no longer made accessible to the reader of a publication. The elucidation of N‐terminal methylation of proteasome subunit Rpt1 in yeast by Kimura et al. (Proteomics 2013, 13, 3167–3174), which focuses on one protein, shows the value of solid chemical analysis with complete data documentation and paves the way to proteomics based on the protein speciation discourse.

8.
9.
The rate of membrane protein (MP) structure determination has been examined for the 18-year period following the publication of the first high-resolution crystal structure. The growth is solidly exponential, but lags behind the rate for soluble proteins during the equivalent time period.

10.
  1. Repeatability is the cornerstone of science, and it is particularly important for systematic reviews. However, little is known about how researchers’ choice of database and search platform influences the repeatability of systematic reviews. Here, we aim to unveil how the computer environment and the location from which the search was initiated influence hit results.
  2. We present a comparative analysis of time‐synchronized searches at different institutional locations in the world and evaluate the consistency of hits obtained within each of the search terms using different search platforms.
  3. We revealed a large variation among search platforms and showed that PubMed and Scopus returned consistent results to identical search strings from different locations. Google Scholar and Web of Science's Core Collection varied substantially both in the number of returned hits and in the list of individual articles depending on the search location and computing environment. Inconsistency in Web of Science results has most likely emerged from the different licensing packages at different institutions.
  4. To maintain scientific integrity and consistency, especially in systematic reviews, action is needed from both the scientific community and scientific search platforms to increase search consistency. Researchers are encouraged to report the search location and the databases used for systematic reviews, and database providers should make search algorithms transparent and revise access rules to titles behind paywalls. Additional options for increasing the repeatability and transparency of systematic reviews are storing both search metadata and hit results in open repositories and using Application Programming Interfaces (APIs) to retrieve standardized, machine‐readable search metadata.

11.
For over 2 decades, continuous efforts to organize the jungle of available protein structures have been underway. Although a number of discrepancies between different classification approaches for soluble proteins have been reported, the classification of membrane proteins has so far not been comparatively studied because of the limited amount of available structural data. Here, we present an analysis of α‐helical membrane protein classification in the SCOP and CATH databases. In the current set of 63 α‐helical membrane protein chains having between 1 and 13 transmembrane helices, we observed a number of differently classified proteins with regard to both their domain and fold assignment. The majority of all discrepancies affect single transmembrane helix, two helix hairpin, and four helix bundle domains, while domains with more than five helices are mostly classified consistently between SCOP and CATH. It thus appears that the structural constraints imposed by the lipid bilayer complicate the classification of membrane proteins with only a few membrane‐spanning regions. This problem seems to be specific to membrane proteins, as soluble four helix bundles, not restrained by the membrane, are more consistently classified by SCOP and CATH. Our findings indicate that the structural space of small membrane helix bundles is highly continuous such that even minor differences in individual classification procedures may lead to a significantly different classification. Membrane proteins with few helices and limited structural diversity only seem to be reasonably classifiable if the definition of a fold is adapted to include more fine‐grained structural features such as helix–helix interactions and reentrant regions. Proteins 2010. © 2010 Wiley‐Liss, Inc.

12.
Undergraduate biology curricula are being modified to model and teach the activities of scientists better. The assignment described here, one that investigates protein structure and function, was designed for use in a sophomore-level cell physiology course at Earlham College. Students work in small groups to read and present in poster format on the content of a single research article reporting on the structure and/or function of a protein. Goals of the assignment include highlighting the interdependence of protein structure and function; asking students to review, integrate, and apply previously acquired knowledge; and helping students see protein structure/function in a context larger than cell physiology. The assignment also is designed to build skills in reading scientific literature, oral and written communication, and collaboration among peers. Assessment of student perceptions of the assignment in two separate offerings indicates that the project successfully achieves these goals. Data specifically show that students relied heavily on their peers to understand their article. The assignment was also shown to require students to read articles more carefully than previously. In addition, the data suggest that the assignment could be modified and used successfully in other courses and at other institutions.

13.
Reduced amino acid alphabets are useful for understanding molecular evolution because they reveal basal, shared properties of amino acids on which the structures and functions of proteins rely. Several previous studies derived such reduced alphabets and linked them to the origin of life and biotechnological applications. However, all this previous work presupposes that only direct contacts of amino acids in native protein structures are relevant. We show in this work, using information–theoretical measures, that an appropriate alphabet reduction scheme is in fact a function of the maximum distance at which amino acids interact. Although for small distances our results agree with previous ones, we show how long‐range interactions change the overall picture and prompt a revised understanding of the protein design process. Proteins 2010. © 2010 Wiley‐Liss, Inc.

14.
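The study of protein folding rules is one of the important frontier topics in the life sciences, and the classification of protein fold types is the foundation of that study. Based on the fold-type classification of the SCOP database, and taking as its subject the fold types belonging to the α, β, α+β, and α/β classes in the Astral SCOPe 2.05 database with similarity below 40%, this study constructed templates for 989 protein fold types and assembled them into a template database. A fold-type classification method was built on the designed fold-type templates, achieving automated classification of protein fold types in the SCOP database. For the family templates, the mean sensitivity, specificity, and MCC were 95.00%, 99.99%, and 0.94 in the self-consistency test and 90.00%, 99.97%, and 0.92 in the independent test. For the fold-type templates, the corresponding values were 93.71%, 99.97%, and 0.91 in the self-consistency test and 86.00%, 99.93%, and 0.87 in the independent test. The results show that the templates are reasonably designed and can be used effectively to classify proteins of known structure.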
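
The three measures reported above (sensitivity, specificity, and the Matthews correlation coefficient, MCC) follow from the four counts of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the toy counts are illustrative and are not taken from the study.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, mcc) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true members recovered
    specificity = tn / (tn + fp)  # fraction of non-members rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Toy counts: 100 true members of a fold, 10000 non-members.
sens, spec, mcc = classification_metrics(tp=90, fp=10, tn=9990, fn=10)
print(f"{sens:.2%} {spec:.2%} {mcc:.3f}")  # 90.00% 99.90% 0.899
```

Because non-members vastly outnumber members of any single fold, specificity stays close to 100% even with some false positives, which is one reason MCC is reported alongside it.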

15.
16.
17.
PsiCSI is a highly accurate and automated method of assigning secondary structure from NMR data, which is a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross-validation procedure in which the target and homologous proteins were removed from the databases used for training the neural networks, an average 89% Q3 accuracy (per residue) was observed. This is an increase of 6.2% and 5.5% (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and is able to use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without 13C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil and not helix with strand (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regard to sparse data make PsiCSI ideal for high-throughput applications, and should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available for users to submit data and have the assignment returned.
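
The link between the accuracy gains and the "fewer errors" figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes the gains are absolute percentage points relative to 89% Q3, and it also shows a plain per-residue Q3 computation over three-state labels; the function names and the toy strings are illustrative.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E, C) matches."""
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)


def relative_error_reduction(new_accuracy, gain):
    """Fraction of the baseline's errors removed by an absolute gain.

    new_accuracy and gain are in percentage points; the baseline accuracy is
    assumed to be new_accuracy - gain.
    """
    baseline_error = 100.0 - (new_accuracy - gain)
    return gain / baseline_error


print(round(100 * relative_error_reduction(89.0, 6.2)))  # 36
print(round(100 * relative_error_reduction(89.0, 5.5)))  # 33
print(round(q3_accuracy("HHHEEC", "HHHECC"), 1))  # 83.3
```

Under that assumption the baselines are 82.8% and 83.5%, so gains of 6.2 and 5.5 points remove 6.2/17.2 and 5.5/16.5 of the errors, matching the 36% and 33% stated in the abstract.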

18.
We describe a database of protein structure alignments as well as methods and tools that use this database to improve comparative protein modeling. The current version of the database contains 105 alignments of similar proteins or protein segments. The database comprises 416 entries, 78,495 residues, 1,233 equivalent entry pairs, and 230,396 pairs of equivalent alignment positions. At present, the main application of the database is to improve comparative modeling by satisfaction of spatial restraints implemented in the program MODELLER (Šali A, Blundell TL, 1993, J Mol Biol 234:779–815). To illustrate the usefulness of the database, the restraints on the conformation of a disulfide bridge provided by an equivalent disulfide bridge in a related structure are derived from the alignments; the prediction success of the disulfide dihedral angle classes is increased to approximately 80%, compared to approximately 55% for modeling that relies on the stereochemistry of disulfide bridges alone. The second example of the use of the database is the derivation of the probability density function for comparative modeling of the cis/trans isomerism of the proline residues; the prediction success is increased from 0% to 82.9% for cis-proline and from 93.3% to 96.2% for trans-proline. The database is available via electronic mail.

19.
The structure of protein evolution and the evolution of protein structure
The observed distribution of protein structures can give us important clues about the underlying evolutionary process, imposing important constraints on possible models. The availability of results from an increasing number of genome projects has made the development of these models an active area of research. Models explaining the observed distribution of structures have focused on the inherent functional capabilities and structural properties of different folds and on the evolutionary dynamics. Increasingly, these elements are being combined.

20.