Similar articles
1.

Background  

Inference of remote homology between proteins is very challenging and remains a prerogative of an expert. Thus a significant drawback to the use of evolutionary-based protein structure classifications is the difficulty in assigning new proteins to unique positions in the classification scheme with automatic methods. To address this issue, we have developed an algorithm to map protein domains to an existing structural classification scheme and have applied it to the SCOP database.

2.

Background  

Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non-redundant subsets of SCOP domains on the basis of sequence similarity, such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together, these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy-to-use API written in Python to enable construction of datasets from these resources.
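
The non-redundancy property described above (no two domains in a subset exceed a given similarity) can be illustrated with a minimal sketch. This is not the API presented in the abstract, nor the procedure ASTRAL uses to pick representatives. The function names, the toy `identity` measure, and the domain identifiers are all hypothetical, and domains are simply visited in sorted order.

```python
from typing import Callable, Dict, List


def identity(a: str, b: str) -> float:
    """Toy similarity: matching positions divided by the longer length.

    Hypothetical stand-in for a real alignment-based sequence identity.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def non_redundant_subset(
    domains: Dict[str, str],
    threshold: float,
    similarity: Callable[[str, str], float] = identity,
) -> List[str]:
    """Greedily keep domains whose similarity to every kept domain
    does not exceed the threshold."""
    kept: List[str] = []
    for dom_id in sorted(domains):
        seq = domains[dom_id]
        if all(similarity(seq, domains[k]) <= threshold for k in kept):
            kept.append(dom_id)
    return kept


# Made-up identifiers and sequences, for illustration only.
example = {"d1": "ACDEFG", "d2": "ACDEFH", "d3": "WWWWWW"}
print(non_redundant_subset(example, 0.5))  # ['d1', 'd3']
```

A greedy pass depends on visiting order, so a real pipeline would presumably rank domains by some quality criterion before filtering.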

3.

Background  

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins with an equal number of secondary structure elements (SSEs).
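
To make "deepest shared SCOP level" concrete, the sketch below compares two already-classified domains by their SCOP sccs strings, which have the form class.fold.superfamily.family (for example `a.1.1.2`). It only defines the target quantity and is not the prediction method of the abstract. The function name is hypothetical.

```python
from typing import Optional

# Order of the four levels encoded in an sccs string such as "a.1.1.2".
LEVELS = ("class", "fold", "superfamily", "family")


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the deepest SCOP level at which two sccs strings agree,
    or None if they differ already at the class level."""
    deepest = None
    for level, a, b in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if a != b:
            break
        deepest = level
    return deepest


print(deepest_shared_level("a.1.1.2", "a.1.1.3"))  # superfamily
print(deepest_shared_level("a.1.1.2", "b.1.1.2"))  # None
```

The comparison stops at the first mismatch because a lower level only has meaning within its parent: fold 1 of class `a` is unrelated to fold 1 of class `b`.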

4.

Background  

SCOP and CATH are widely used as gold standards to benchmark novel protein structure comparison methods as well as to train machine learning approaches for protein structure classification and prediction. The two hierarchies result from different protocols, which may lead to differing classifications of the same protein. Ignoring such differences causes problems when the hierarchies are used to train or benchmark automatic structure classification methods. Here, we propose a method to compare SCOP and CATH in detail and discuss possible applications of this analysis.

5.

Background  

Structure alignment methods offer the possibility of measuring distant evolutionary relationships between proteins that are not visible by sequence-based analysis. However, the question of how structural differences and similarities ought to be quantified in this regard remains open. In this study we construct a training set of sequence-unique CATH and SCOP domains, from which we develop a scoring function that can reliably identify domains with the same CATH topology and SCOP fold classification. The score is implemented in the ASH structure alignment package, for which the source code and a web service are freely available from the PDBj website.

6.

Background  

We apply a new machine learning method, the so-called Support Vector Machine method, to predict the protein structural class. The Support Vector Machine method is applied to a database derived from SCOP, in which protein domains are classified based on known structures, their evolutionary relationships, and the principles that govern their 3-D structure.

7.

Background

Prediction of the function of proteins on the basis of structure, and vice versa, is a partially solved problem, largely in the domain of biophysics and biochemistry. This underscores the need for computational and bioinformatics approaches to solve the problem. Large and organized latent knowledge on protein classification exists in the form of independently created protein classification databases. By creating probabilistic maps between classes of structural classification databases (e.g. SCOP [1]) and classes of functional classification databases (e.g. PROSITE [2]), structure and function of proteins could be probabilistically related.

Results

We demonstrate that PROSITE and SCOP have significant semantic overlap, in spite of independent classification schemes. By training classifiers of SCOP using classes of PROSITE as attributes and vice versa, the accuracy of Support Vector Machine classifiers for both SCOP and PROSITE was improved. Novel attributes, 2-D elastic profiles and Blocks, were used to improve time complexity and accuracy. Many relationships were extracted between classes of SCOP and PROSITE using decision trees.

Conclusion

We demonstrate that the presented approach can discover new probabilistic relationships between classes of different taxonomies and render a more accurate classification. Extensive mappings between existing protein classification databases can be created to link the large amount of organized data. Probabilistic maps were created between classes of SCOP and PROSITE, allowing predictions of structure from function and vice versa. In our experiments, we also found that functions are indeed more strongly related to structure than structures are to functions.
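
One simple way to picture a probabilistic map between two taxonomies is a table of conditional frequencies estimated from proteins labelled in both. The sketch below is a plain frequency estimate, swapped in for illustration. The abstract itself reports Support Vector Machines and decision trees. The function name and all class labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def conditional_map(pairs: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(target class | source class) from (source, target) pairs,
    one pair per protein annotated in both taxonomies."""
    joint = Counter(pairs)
    source_totals = Counter(src for src, _ in pairs)
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src, tgt), n in joint.items():
        table[src][tgt] = n / source_totals[src]
    return dict(table)


# Made-up labels: "F*" stands for a functional class, "S*" for a structural one.
observed = [("F1", "S1"), ("F1", "S1"), ("F1", "S2"), ("F2", "S2")]
function_to_structure = conditional_map(observed)
structure_to_function = conditional_map([(s, f) for f, s in observed])
print(function_to_structure["F2"])  # {'S2': 1.0}
print(structure_to_function["S1"])  # {'F1': 1.0}
```

The two directions are estimated separately and need not be equally sharp, which is one way an asymmetry between function-to-structure and structure-to-function prediction can show up.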

8.

Background  

The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to result in sequence superfamilies of known structure. Subsequently, all-against-all family profile matches are made to deduce a list of new potential superfamilies of yet unknown structure.

9.

Background  

Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of an increasing volume of available structures. Automatic methods, such as FSSP or the Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification.

10.
11.

Background  

Current classifications of protein folds are based, ultimately, on visual inspection of similarities. Previous attempts to use computerized structure comparison methods show only partial agreement with curated databases, but have failed to provide detailed statistical and structural analysis of the causes of these divergences.

12.

Background  

Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, the need for automated and accurate methods for protein classification is increasingly important.

13.

Background  

In protein evolution, the mechanism by which novel protein domains emerge is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures, and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families.

14.

Background  

The Asp-box is a short sequence and structure motif that folds as a well-defined β-hairpin. It is present in different folds, but occurs most prominently as repeats in β-propellers. Asp-box β-propellers are known to be characteristically irregular and to occur in many medically important proteins, most of which are glycosidase enzymes, but they are otherwise not well characterized and are only rarely treated as a distinct β-propeller family. We have analyzed the sequence, structure, function and occurrence of the Asp-box and the s-Asp-box, a related shorter variant, and provide a comprehensive classification and computational analysis of the Asp-box β-propeller family.

15.

Background  

Partitioning of a protein into structural components, known as domains, is an important initial step in protein classification and for functional and evolutionary studies. While the systematic assignments of domains by human experts exist (CATH and SCOP), the introduction of high throughput technologies for structure determination threatens to overwhelm expert approaches. A variety of algorithmic methods have been developed to expedite this process, allowing almost instant structural decomposition into domains. The performance of algorithmic methods can approach 85% agreement on the number of domains with the consensus reached by experts. However, each algorithm takes a somewhat different conceptual approach, each with unique strengths and weaknesses. Currently there is no simple way to automatically compare assignments from different structure-based domain assignment methods, thereby providing a comprehensive understanding of possible structure partitioning as well as providing some insight into the tendencies of particular algorithms. Most importantly, a consensus assignment drawn from multiple assignment methods can provide a singular and presumably more accurate view.
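
The two quantities mentioned above, a consensus drawn from several methods and the agreement on the number of domains, can be sketched as follows. This is only an illustration on domain counts, assuming a simple majority vote. The abstract does not specify how its consensus is built, and the method names, protein identifiers, and function names are hypothetical.

```python
from collections import Counter
from typing import Dict


def consensus_domain_count(counts_by_method: Dict[str, int]) -> int:
    """Majority vote over the number of domains each method reports.

    Ties are broken in favour of the smaller count (an arbitrary choice).
    """
    tally = Counter(counts_by_method.values())
    top = max(tally.values())
    return min(count for count, votes in tally.items() if votes == top)


def agreement_rate(predicted: Dict[str, int], reference: Dict[str, int]) -> float:
    """Fraction of proteins, among those present in both mappings,
    for which the predicted domain count equals the reference count."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        return 0.0
    return sum(predicted[p] == reference[p] for p in shared) / len(shared)


# Made-up data: three methods on one protein, then four proteins vs. a reference.
print(consensus_domain_count({"m1": 2, "m2": 2, "m3": 3}))  # 2
print(agreement_rate({"p1": 2, "p2": 1, "p3": 3, "p4": 1},
                     {"p1": 2, "p2": 1, "p3": 2, "p4": 1}))  # 0.75
```

Agreement on counts is a coarse measure: two methods can report the same number of domains while placing the boundaries differently.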

16.

Background  

The breadth of biological databases and their information content continue to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them.

17.

Background  

Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet, the critical features for successful modelling are not fully known. In the present work we approached this by using two of the most popular HMM packages: SAM and HMMER. The programs' abilities to build models and score sequences were compared on a SCOP/Pfam based test set. The comparison was done separately for local and global HMM scoring.

18.

Background

The function of proteins is a direct consequence of their three-dimensional structure. The structural classification of proteins describes the folding patterns that proteins can adopt. Although protein folds have been described in many ways, the functional properties of individual folds have not been studied.

Results

We have analyzed two β-barrel folds, generally adopted by small proteins, that look similar but have different topologies. On the basis of topology, they can be divided into two different folds, named the SH3-fold and the OB-fold. There was no sequence homology between any of the proteins considered. Sequence diversity and loop variability were found to be important for various binding functions.

Conclusions

The function of oligonucleotide/oligosaccharide-binding (OB) fold proteins was restricted to either DNA/RNA binding or sugar binding, whereas the Src homology 3 (SH3) domain-like proteins bind to a variety of ligands through loop modulations. A question was raised whether the evolution of these two folds was through DNA shuffling.

19.

Background  

Most profile and motif databases strive to classify protein sequences into a broad spectrum of protein families. The next step of such database studies should include the development of classification systems capable of distinguishing between subfamilies within a structurally and functionally diverse superfamily. This would be helpful in elucidating sequence-structure-function relationships of proteins.

20.

Background  

Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures.
