20 similar articles found (search time: 15 ms)
1.
Alignment free methods based on Chaos Game Representation (CGR), also known as sequence signature approaches, have proven of great interest for DNA sequence analysis. Indeed, they have been successfully applied for sequence comparison, phylogeny, detection of horizontal transfers or extraction of representative motifs in regulation sequences. Transposing such methods to proteins poses several fundamental questions related to representation space dimensionality. Several studies have tackled these points, but none has, so far, brought the application of CGRs to proteins to their fully expected potential. Yet, several studies have shown that techniques based on n-peptide frequencies can be relevant for proteins. Here, we investigate the effectiveness of a strategy based on the CGR approach using a fixed reverse encoding of amino acids into nucleic sequences. We first explore its relevance to protein classification into functional families. We then attempt to apply it to the prediction of protein structural classes. Our results suggest that the reverse encoding approach could be relevant in both cases. We show that it is able to classify functional families of proteins by extracting signatures close to the ProSite patterns. Applied to structural classification, the approach reaches scores of correct classification close to 84%, i.e. close to the scores of related methods in the field. Various optimizations of the approach are still possible, which open the door for future applications.
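The reverse-encoding idea in this abstract can be sketched roughly as follows: back-translate the protein into DNA with one fixed codon per amino acid, trace the CGR trajectory, and count points per grid cell to obtain a signature. This is only an illustration under stated assumptions: the codon table (one arbitrary standard-code codon per residue), the corner assignment, and all function names are hypothetical, since the abstract does not say which encoding or resolution the authors used.

```python
from collections import Counter

# Hypothetical fixed reverse encoding: one standard-code codon per amino acid.
# Which codon is picked for each residue is an assumption for illustration only.
REVERSE_CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}

# Corners of the unit square assigned to each nucleotide (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def reverse_encode(protein):
    """Back-translate a protein string into DNA using the fixed codon table."""
    return "".join(REVERSE_CODON[aa] for aa in protein)


def cgr_points(dna):
    """Return the CGR trajectory: each point lies halfway to the next base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in dna:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_signature(dna, k=3):
    """Count CGR points per cell of a 2^k x 2^k grid, skipping the first k-1 points."""
    size = 2 ** k
    counts = Counter()
    for x, y in cgr_points(dna)[k - 1:]:
        cell = (min(int(x * size), size - 1), min(int(y * size), size - 1))
        counts[cell] += 1
    return counts
```

Two proteins could then be compared by any distance between their signature counts; the choice of distance and of `k` is left open here.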
2.
In this paper, we present a new scheme named ProtClass for automatic classification of three-dimensional (3D) protein structures. It is a dedicated and unified multiclass classification scheme. Neither detailed structural alignment nor multiple binary classifications are required in this scheme. We adopt a nearest neighbor-based classification strategy. We use a filter-and-refine scheme. In the first step, we filter out the improbable answers using the precalculated parameters from the training data. In the second, we perform a relatively more detailed nearest neighbor search on the remaining answers. We use very concise and effective encoding schemes of the 3D protein structures in both steps. We compare our proposed method against two other dedicated protein structure classification schemes, namely SGM and CPMine. The experimental results show that ProtClass is slightly better in accuracy than SGM and much faster. In comparison with CPMine, ProtClass is much more accurate, while their running times are about the same. We also compare ProtClass against a structural alignment-based classification scheme named DALI, which is found to be more accurate, but extremely slow. The software is available upon request from the authors. The supplementary information on ProtClass method can be found at: http://xena1.ddns.comp.nus.edu.sg/~genesis/PClass.htm.
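A filter-and-refine nearest neighbor classifier of the general kind described here might look like the sketch below. It is not the ProtClass implementation: the abstract does not specify the structure encoding or the precalculated parameters, so this sketch assumes plain numeric feature vectors and uses per-class centroids as the precomputed filter. All names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_centroids(training):
    """Precompute one centroid per class label from (vector, label) pairs."""
    sums, counts = {}, {}
    for vec, label in training:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def classify(query, training, centroids, keep=2):
    """Filter to the `keep` classes with the nearest centroids, then refine by 1-NN."""
    # Filter step: cheap comparison against one precomputed vector per class.
    ranked = sorted(centroids, key=lambda lab: euclidean(query, centroids[lab]))
    allowed = set(ranked[:keep])
    # Refine step: exact nearest neighbor search restricted to surviving classes.
    best = min(
        (item for item in training if item[1] in allowed),
        key=lambda item: euclidean(query, item[0]),
    )
    return best[1]
```

The filter costs one distance per class rather than one per training structure, which is where the speed-up of such a scheme would come from; a small `keep` trades accuracy for speed.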
3.
Mark B. Swindells Christine A. Orengo David T. Jones E. Gail Hutchinson Janet M. Thornton 《BioEssays : news and reviews in molecular, cellular and developmental biology》1998,20(11):884-891
In a similar manner to sequence database searching, it is also possible to compare three-dimensional protein structures. Such methods can be extremely useful because a structural similarity may represent a distant evolutionary relationship that is undetectable by sequence analysis. In this review, we summarise the most popular structure comparison methods, show how they can be used for database searching, and then describe some of the most advanced attempts to develop comprehensive protein structure classifications. With such data, it is possible to identify distant evolutionary relationships, provide libraries of unique folds for structure prediction, estimate the total number of folds that exist, and investigate the preference for certain types of structures over others. BioEssays 20:884–891, 1998. © 1998 John Wiley & Sons, Inc.
4.
Fourier transform infrared (FTIR) spectroscopy is a very flexible technique for characterization of protein secondary structure. Measurements can be carried out rapidly in a number of different environments based on only small quantities of proteins. For this technique to become more widely used for protein secondary structure characterization, however, further developments in methods to accurately quantify protein secondary structure are necessary. Here we propose a structural classification of proteins (SCOP) class specialized neural networks architecture combining an adaptive neuro-fuzzy inference system (ANFIS) with SCOP class specialized backpropagation neural networks for improved protein secondary structure prediction. Our study shows that proteins can be accurately classified into two main classes "all alpha proteins" and "all beta proteins" merely based on the amide I band maximum position of their FTIR spectra. ANFIS is employed to perform the classification task to demonstrate the potential of this architecture with moderately complex problems. Based on studies using a reference set of 17 proteins and an evaluation set of 4 proteins, improved predictions were achieved compared to a conventional neural network approach, where structure specialized neural networks are trained based on protein spectra of both "all alpha" and "all beta" proteins. The standard errors of prediction (SEPs) in % structure were improved by 4.05% for helix structure, by 5.91% for sheet structure, by 2.68% for turn structure, and by 2.15% for bend structure. For other structure, an increase of SEP by 2.43% was observed. Those results were confirmed by a "leave-one-out" run with the combined set of 21 FTIR spectra of proteins.
5.
Ison JC 《Briefings in bioinformatics》2000,1(3):305-312
The protein databank contains coordinates of over 10,000 protein structures, which constitute more than 25,000 structural domains in total. The investigation of protein structural, functional and evolutionary relationships is fundamental to many important fields in bioinformatics research, and will be crucial in determining the function of the human and other genomes. This review describes the SCOP and CATH databases of protein structure classification, which define, classify and annotate each domain in the protein databank. The hierarchical structure, use and annotation of the databases are explained. Other tools for exploring protein structure relationships are also described.
6.
Vichetra Sam Chin-Hsien Tai Jean Garnier Jean-Francois Gibrat Byungkook Lee Peter J Munson 《BMC bioinformatics》2008,9(1):74
Background
Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of increasing volume of available structures. Automatic methods, such as FSSP or Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification.
7.
Gomide J Melo-Minardi R Dos Santos MA Neshich G Meira W Lopes JC Santoro M 《Genetics and molecular biology》2009,32(3):645-651
In this article, we describe a novel methodology to extract semantic characteristics from protein structures using linear algebra in order to compose structural signature vectors which may be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system which is able to find conserved contacts in samples of myoglobin fold family and to retrieve these proteins among proteins of varied folds with precision of up to 80%. The classifier is a web tool available at our laboratory website. Users can search for similar chains from a specific PDB, view and compare their contact maps and browse their structures using a JMol plug-in.
8.
Taylor WR 《Molecular & cellular proteomics : MCP》2002,1(4):334-339
A measure of protein structure similarity is calculated from the matching of pairs of secondary structure elements between two proteins. The interaction of each pair was estimated from their axial line segments and combined with other geometric features to produce an optimal discrimination between intrafamily and interfamily relationships. The matching used a fast bipartite graph-matching algorithm that avoids the computational complexity of searching for the full subgraph isomorphism between the two sets of interactions. The main algorithm used was the "stable marriage" algorithm, which works on the ranked "preferences" of one interaction for another. The method takes 1/10 of a second for a typical comparison making it suitable as a fast pre-filter for slower, more exhaustive approaches. An application to protein structure classification is described.
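The "stable marriage" matching named in this abstract can be illustrated with the textbook Gale-Shapley procedure below. This is a generic sketch, not the author's code: it assumes two equal-sized sets with complete ranked preference lists, and it does not show how the preferences would be derived from the geometric features of the interactions. All names are hypothetical.

```python
def stable_matching(proposer_prefs, receiver_prefs):
    """Textbook Gale-Shapley; returns a dict mapping each proposer to a receiver.

    Assumes both sides have the same size and complete preference lists.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better).
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # index of next receiver to try
    engaged_to = {}  # receiver -> proposer currently held
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p
        elif rank[r][p] < rank[r][current]:
            # The receiver prefers the new proposer; the old one becomes free again.
            engaged_to[r] = p
            free.append(current)
        else:
            free.append(p)
    return {p: r for r, p in engaged_to.items()}
```

Each proposer works down its list at most once, so the procedure needs at most n² proposals for n items per side, which is consistent with its use as a fast pre-filter.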
9.
The Berkeley Phylogenomics Group presents PhyloFacts, a structural phylogenomic encyclopedia containing almost 10,000 'books'
for protein families and domains, with pre-calculated structural, functional and evolutionary analyses. PhyloFacts enables
biologists to avoid the systematic errors associated with function prediction by homology through the integration of a variety
of experimental data and bioinformatics methods in an evolutionary framework. Users can submit sequences for classification
to families and functional subfamilies. PhyloFacts is available as a worldwide web resource from .
10.
Structural biology and structural genomics are expected to produce many three-dimensional protein structures in the near future. Each new structure raises questions about its function and evolution. Correct functional and evolutionary classification of a new structure is difficult for distantly related proteins and error-prone using simple statistical scores based on sequence or structure similarity. Here we present an accurate numerical method for the identification of evolutionary relationships (homology). The method is based on the principle that natural selection maintains structural and functional continuity within a diverging protein family. The problem of different rates of structural divergence between different families is solved by first using structural similarities to produce a global map of folds in protein space and then further subdividing fold neighborhoods into superfamilies based on functional similarities. In a validation test against a classification by human experts (SCOP), 77% of homologous pairs were identified with 92% reliability. The method is fully automated, allowing fast, self-consistent and complete classification of large numbers of protein structures. In particular, the discrimination between analogy and homology of close structural neighbors will lead to functional predictions while avoiding overprediction.
11.
A A Adzhubei F Eisenmenger V G Tumanyan M Zinke S Brodzinski N G Esipova 《Journal of biomolecular structure & dynamics》1987,5(3):689-704
A complete classification of types of the protein secondary structure is developed on the basis of computer analysis of the crystallographic structural data deposited in the Protein Data Bank. The majority of amino acid residues fall into five conformation types. A conclusion is drawn that the number of sequence variants of torsion angles phi, psi in globular proteins is limited and is essentially less than the number of possible amino acid sequences for this chain length. Along with alpha-helix and beta-structure, the distribution analysis assigning every maximum of distribution of amino acid conformations on the Ramachandran map to a certain type of the secondary structure exposed a third type of the secondary structure that was previously neglected. This type of structure is an extended left-handed helical conformation, designated as mobile (M-) conformation. A full set of M-conformation fragments that seems to play a major role in protein globule dynamics has been obtained, and a small radius of correlation for the polypeptide chain in M-conformation is demonstrated. It explains the prevalence of short segments of mobile conformation revealed in globular proteins. For secondary structure types, the frequency of occurrence of amino acid residues has been computed.
12.
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four subunits or less, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.
13.
14.
Borro LC Oliveira SR Yamagishi ME Mancini AL Jardine JG Mazoni I Santos EH Higa RH Kuser PR Neshich G 《Genetics and molecular research : GMR》2006,5(1):193-202
Predicting enzyme class from protein structure parameters is a challenging problem in protein analysis. We developed a method to predict enzyme class that combines the strengths of statistical and data-mining methods. This method has a strong mathematical foundation and is simple to implement, achieving an accuracy of 45%. A comparison with the methods found in the literature designed to predict enzyme class showed that our method outperforms the existing methods.
15.
Background
The classification of protein domains in the CATH resource is primarily based on structural comparisons, sequence similarity and manual analysis. One of the main bottlenecks in the processing of new entries is the evaluation of 'borderline' cases by human curators with reference to the literature, and better tools for helping both expert and non-expert users quickly identify relevant functional information from text are urgently needed. A text based method for protein classification is presented, which complements the existing sequence and structure-based approaches, especially in cases exhibiting low similarity to existing members and requiring manual intervention. The method is based on the assumption that textual similarity between sets of documents relating to proteins reflects biological function similarities and can be exploited to make classification decisions.
16.
Summary
A method of stabilizing folded proteins is described, which allows NMR studies under conditions where a protein would normally be unfolded. This enables stable proteins to be examined at elevated temperatures, or spectra recorded on samples that are insufficiently stable under normal conditions. Up to two molar perdeuterated glycine, a potent osmolyte, can be added to aqueous protein NMR samples without altering the folded three-dimensional structure or function of the protein. However, the stability of the folded form is dramatically increased. This is illustrated for the protein lysozyme at high temperature (348 K) where the structural integrity is destroyed in standard aqueous solution, but is retained in the osmolyte solution. We hope that the technique will be of value to those studying by NMR the structural biology of protein fragments and mutants, which are often of reduced stability compared with the original proteins.
17.
Exploring the range of protein flexibility, from a structural proteomics perspective
Changes in protein conformation play a vital role in biochemical processes, from biopolymer synthesis to membrane transport. Initial systematizations of protein flexibility, in a database framework, concentrated on the movement of domains and linkers. Movements were described in terms of simple sliding and hinging mechanisms of individual secondary structural elements. Recently, the accelerated pace and sophistication of methods for structural characterization of proteins has allowed high-resolution studies of increasingly complex assemblies and conformational changes. New data emphasize a breadth of possible structural mechanisms, particularly the ability to drastically alter protein architecture and the native flexibility of many structures.
18.
In order to probe the relative contribution of local and non-local interactions to the thermodynamic stability of proteins, we have devised an experimental approach based on a combination of motif engineering and sequence shuffling. Candidate chain segments in an immunoglobulin V(L) domain were identified whose conformation is proposed to be dominated by non-local interactions. Locally interacting structural motifs of a different conformation were then constructed as replacements, by introducing motif consensus sequences. We find that all nine replacements we constructed systematically reduce the folding cooperativity. By comparing this destabilising effect with the folding transitions of shuffled sequences for three of these motifs, we estimate the contribution of local, native interactions to the free energy of folding. Our results suggest that local and non-local interactions contribute to stability by an approximately equal amount, but that local interactions stabilise by increasing the resistance to denaturation while non-local interactions increase folding cooperativity. The systematic loss of stability by sequence shuffling in these host-guest experiments suggests that the designed interactions indeed are present in the native state, thus consensus sequence engineering may be a useful tool in structure design, but non-local interactions must be taken into account for global stability engineering. Statistical approaches are powerful tools for engineering protein structure and stability, but an analysis based on local sequence propensities alone does not adequately represent the balance of sequence and context in protein structures.
19.
Background
Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, the need for automated and accurate methods for protein classification is increasingly important.
20.