Similar articles
 20 similar articles found
1.
We have determined consensus protein-fold classifications on the basis of three classification methods, SCOP, CATH, and Dali. These classifications make use of different methods of defining and categorizing protein folds that lead to different views of protein-fold space. Pairwise comparisons of domains on the basis of their fold classifications show that much of the disagreement between the classification systems is due to differing domain definitions rather than assigning the same domain to different folds. However, there are significant differences in the fold assignments between the three systems. These remaining differences can be explained primarily in terms of the breadth of the fold classifications. Many structures may be defined as having one fold in one system, whereas far fewer are defined as having the analogous fold in another system. By comparing these folds for a nonredundant set of proteins, the consensus method breaks up broad fold classifications and combines restrictive fold classifications into metafolds, creating, in effect, an averaged view of fold space. This averaged view requires that the structural similarities between proteins having the same metafold be recognized by multiple classification systems. Thus, the consensus map is useful for researchers looking for fold similarities that are relatively independent of the method used to compare proteins. The 30 most populated metafolds, representing the folds of about half of a nonredundant subset of the PDB, are presented here. The full list of metafolds is presented on the Web.
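The pairwise comparison of fold assignments described in this abstract can be illustrated with a minimal sketch. It is not the authors' consensus procedure: the function name `pairwise_agreement`, the dictionary layout, and the toy labels are all assumptions made for illustration. A pair of domains counts as an agreement when two systems both put the pair in the same fold or both put it in different folds.

```python
from itertools import combinations


def pairwise_agreement(system_a, system_b):
    """Fraction of domain pairs on which two classifications agree.

    Each argument maps a domain identifier to a fold label. Only domains
    present in both systems are compared (a simplifying assumption).
    """
    shared = sorted(set(system_a) & set(system_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    agree = 0
    for d1, d2 in pairs:
        same_a = system_a[d1] == system_a[d2]
        same_b = system_b[d1] == system_b[d2]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy labels, not taken from SCOP, CATH, or Dali.
toy_a = {"d1": "fold1", "d2": "fold1", "d3": "fold2", "d4": "fold2"}
toy_b = {"d1": "X", "d2": "X", "d3": "X", "d4": "Y"}
print(pairwise_agreement(toy_a, toy_b))
```

In the toy data, system B has one broad fold ("X") that spans two of system A's folds, which is the kind of breadth difference the abstract identifies as the main source of disagreement.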

2.
MOTIVATION: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. RESULTS: We have tried to assess the comparative probabilities for a number of known structural changes, and to relate the probabilities of such changes with the distance between protein sequences. We have formalized these structural changes using a topological representation of structures (TOPS), and have developed an algorithm for measuring structural distances that involve few evolutionary steps. The probabilities of structural changes then were estimated on the basis of all-against-all comparisons of the sequence and structure of protein domains from the CATH-95 representative set. The results obtained are reasonably consistent for a number of different data subsets and permit the identification of several 'most popular' types of evolutionary changes in protein structure. The results also suggest that alterations in protein structure are more likely to occur when the sequence similarity is >10% (the average similarity being approximately 6% for the data sets employed in this study), and that the distribution of probabilities of structural changes is fairly uniform within the interval of 15-50% sequence similarity. AVAILABILITY: The algorithms have been implemented on the Windows operating system in C++ and using the Borland Visual Component Library. 
The source code is available on request from the first author. The data sets used for this study (representative sets of protein domains, matrices of sequence similarities and structural distances) are available on http://bioinf.mii.lu.lv/epsrc_project/struct_ev.html.

3.
A common evolutionary origin of two elementary enzyme folds   (total citations: 1; self-citations: 0; citations by others: 1)
The (βα)8-barrel is the most frequent and most versatile fold among enzymes [Höcker et al., Curr. Opin. Biotechnol. 12 (2001) 376-381; Wierenga, FEBS Lett. 492 (2001) 193-198]. Structural and functional evidence suggests that (βα)8-barrels evolved from an ancestral half-barrel, which consisted of four (βα) units stabilized by dimerization [Lang et al., Science 289 (2000) 1546-1550; Höcker et al., Nat. Struct. Biol. 8 (2001) 32-36; Gerlt and Babbitt, Nat. Struct. Biol. 8 (2001) 5-7]. Here, by performing a comprehensive database search, we detect a striking and unexpected structural and amino acid sequence similarity between (βα)4 half-barrels and members of the (βα)5 flavodoxin-like fold. These findings provoke the hypothesis that a large fraction of modern-day enzymes evolved from a basic structural building block, which can be identified by a combination of sequence and structural analyses.

4.
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins. Within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is done at different levels of confidence, and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies in families of known proteins. In addition, it brings forth interesting relationships between protein families, upon which local maps for the neighborhood of protein families can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il

5.
Single-particle analysis is a structure-determination method that uses electron microscopic (EM) images and does not require protein crystals. In this method, projections are picked and used to reconstruct a three-dimensional (3D) structure. When the conical tilting method is not available, the particle images are usually classified and averaged to improve the signal-to-noise ratio. The Euler angles of these average images must then be assigned a posteriori to create a primary 3D model. We developed a new, fully automatic, unsupervised Euler angle assignment method that does not require an initial 3D reference and is applicable to asymmetric molecules. In this method, the Euler angle of each average image is initially set randomly and then automatically corrected in relation to those of the other averages by iterated optimization using the simulated annealing (SA) algorithm. At each iteration, the 3D structure is reconstructed based on the current Euler angles and reprojected back in the average-input directions. A modified cross-correlation between each reprojection and its corresponding original average is then calculated. The correlations are summed as a total 3D echo-correlation score to evaluate the Euler angles at this iteration. Then, one of the projections is selected, its Euler angle is changed randomly, and the score is recalculated. Based on the change in score, the new angle is accepted or rejected using the SA algorithm, which is introduced to overcome local minima. After a certain number of iterations of this process, the angles of all averages converge so as to create a reliable primary 3D model. This echo-correlated 3D reconstruction with simulated annealing also has potential for wide application to general 3D reconstruction from various types of 2D images.
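The accept-or-reject loop described in this abstract can be sketched as follows. This is only an illustration of generic simulated annealing, not the authors' implementation: the 3D reconstruction and echo-correlation score are replaced by a hypothetical `toy_score`, each image carries a single angle rather than a full Euler triple, and the geometric cooling schedule, the function names, and all parameter values are assumptions.

```python
import math
import random


def accept_move(old_score, new_score, temperature, rng=random):
    """Metropolis rule for a score that is being maximized."""
    delta = new_score - old_score
    if delta >= 0:
        return True
    # A worse score is still accepted with probability exp(delta / T),
    # which lets the search leave a locally optimal set of angles.
    return rng.random() < math.exp(delta / temperature)


def anneal(angles, score_fn, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Randomly perturb one angle per step and keep the best set seen."""
    rng = random.Random(seed)
    current = list(angles)
    current_score = score_fn(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(current))
        candidate = list(current)
        candidate[i] = rng.uniform(0.0, 360.0)
        candidate_score = score_fn(candidate)
        if accept_move(current_score, candidate_score, t, rng):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


# Hypothetical stand-in for the echo-correlation score: the negative
# squared angular distance to a known target set of angles.
TARGET = [30.0, 120.0, 250.0]


def toy_score(angles):
    return -sum(
        min(abs(a - t), 360.0 - abs(a - t)) ** 2 for a, t in zip(angles, TARGET)
    )


best_angles, best_value = anneal([0.0, 0.0, 0.0], toy_score)
print(best_angles, best_value)
```

In a real setting `score_fn` would have to rebuild the 3D model and recompute the reprojection correlations at every step, so the cost of one iteration is dominated by the reconstruction rather than by the acceptance test.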

6.

Background  

Proteins are composed of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identification process, called EVEREST (EVolutionary Ensembles of REcurrent SegmenTs), begins by constructing a library of protein segments that emerge in an all vs. all pairwise sequence comparison. It then proceeds to cluster these segments into putative domain families. The selection of the best putative families is done using machine learning techniques. A statistical model is then created for each of the chosen families. This procedure is then iterated: the aforementioned statistical models are used to scan all protein sequences, to recreate a library of segments and to cluster them again.

7.
Huang HL, Chang FL. Bio Systems, 2007, 90(2): 516-528
An optimal design of support vector machine (SVM)-based classifiers for prediction aims to optimize the combination of feature selection, parameter setting of the SVM, and cross-validation methods. However, SVMs do not offer a mechanism for automatic internal detection of relevant features, and the appropriate setting of their control parameters is often treated as a separate, independent problem. This paper proposes an evolutionary approach to designing an SVM-based classifier (named ESVM) by simultaneous optimization of automatic feature selection and parameter tuning using an intelligent genetic algorithm, combined with k-fold cross-validation regarded as an estimator of generalization ability. To illustrate and evaluate the efficiency of ESVM, a typical application to microarray classification using 11 multi-class datasets is adopted. To account for model uncertainty, a frequency-based technique that votes on multiple sets of potentially informative features is used to identify the most effective subset of genes. It is shown that ESVM can obtain a high accuracy of 96.88% with a small number (10.0) of selected genes, averaged over the 11 datasets using 10-fold cross-validation. The merits of ESVM are three-fold: (1) automatic feature selection and parameter setting embedded into ESVM can advance prediction abilities, compared to traditional SVMs; (2) ESVM can serve not only as an accurate classifier but also as an adaptive feature extractor; (3) ESVM is developed as an efficient tool so that various SVMs can be used conveniently as the core of ESVM for bioinformatics problems.
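The k-fold cross-validation used here as an estimator of generalization ability can be sketched in a few lines. This is a generic illustration, not ESVM: the majority-class "classifier", the round-robin fold assignment, and every function name below are assumptions, and a real experiment would plug an SVM into `fit` and `predict`.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; sample i goes to fold i mod k."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validated_accuracy(features, labels, fit, predict, k=10):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test)
    return correct / len(labels)


# Hypothetical placeholder model: always predict the majority training label.
def fit_majority(_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, _sample):
    return model


toy_features = list(range(10))
toy_labels = ["a"] * 8 + ["b"] * 2
print(cross_validated_accuracy(toy_features, toy_labels,
                               fit_majority, predict_majority, k=5))
```

Because every sample is held out exactly once, the pooled accuracy is computed over the whole dataset while no prediction is ever made by a model that saw that sample during training.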

8.
Rapid increases in taxonomic diversity are generally described as adaptive or evolutionary radiations. Such radiations differ widely in the rate and extent of morphologic innovation, taxonomic diversification and phylogenetic breadth, suggesting that several patterns, and likely processes, are involved. At least four distinct patterns of evolutionary radiation can be identified: novelty events, which generate new morphological complexity (altering the body plan of the group under consideration) but not necessarily with the associated production of many lower taxa; broad diversification events involving many independent lineages that undergo diversification, generate many new species and are driven by new ecological opportunities; economic radiations of a limited group of ecologically (but not necessarily phylogenetically) related clades exploiting a limited new ecologic opportunity; and adaptive radiations that may occur at any taxonomic level, but involve a rapid increase in diversity within a single clade, including "true" adaptive radiations. Many events produce simple diversity increases with no corresponding increase in genetic/developmental/morphological/behavioral sophistication, but the most evolutionarily interesting events add new levels of complexity.

9.
A protein function is carried out by a specific domain localized at a specific position. In the present study, we report that, within a gene, a specific amino acid sequence can move from one position to another. This was discovered when the sequences of restriction-modification systems within the bacterial species Helicobacter pylori were compared. In the specificity subunit of Type I restriction-modification systems, DNA sequence recognition is mediated by target recognition domain 1 (TRD1) and TRD2. To our surprise, several sequences are shared by TRD1 and TRD2 of genes (alleles) at the same locus (chromosomal location); these domains appear to have moved between the two positions. The gene/protein organization can be represented as x-(TRD1)-y-x-(TRD2)-y, where x and y represent repeat sequences. Movement probably occurs by recombination at these flanking DNA repeats. In accordance with this hypothesis, recombination at these repeats also appears to reduce the two TRDs to one or increase them to three (TRD1-TRD2-TRD2), and to allow TRD movement between genes even at different loci. Similar movement of domains between TRD1 and TRD2 was observed for the specificity subunit of a Type IIG restriction enzyme, and for Type I restriction-modification enzyme specificity genes in two more eubacterial species, Streptococcus pyogenes and Mycoplasma agalactiae. Lateral domain movements within a protein, which we have designated DOMO (domain movement), represent novel routes for the diversification of proteins.

10.
MOTIVATION: Evolutionary relationships of proteins have long been derived from the alignment of protein sequences. From a functional point of view, however, most restraints on evolutionary divergence operate at the level of tertiary structure. It has been demonstrated that quantitative measures of dissimilarity in families of structurally similar proteins can be applied to the construction of trees from a comparison of their three-dimensional structures. However, no convenient tool is publicly available to carry out such analyses. RESULTS: We developed STRUCLA (STRUcture CLAssification), a WWW tool for generation of trees based on evolutionary distances inferred from protein structures according to various methods. The server takes as input a list of PDB files or an initial alignment of protein coordinates provided by the user (for instance, exported from SWISS PDB VIEWER). The user specifies the distance cutoff and selects the distance measures. The server returns a series of unrooted trees in the NEXUS format and the corresponding distance matrices, as well as a consensus tree. The results can be used as an alternative and a complement to the fixed hierarchy of current protein structure databases. They can also complement sequence-based phylogenetic analysis in the 'twilight zone of homology', where amino acid sequences are too divergent to provide reliable relationships.

11.
We have introduced the mutation glycine 29 to alanine, designed to increase the rate of protein folding, into the B-domain of protein A (BdpA). From NMR lineshape analysis, we find that the G29A mutation increases the folding rate constant threefold; the folding time is 3 μs. Although wild-type BdpA folds extremely fast, simple point mutations can still speed up the folding; thus, the folding rate is not evolutionarily maximized. The short folding time of G29A BdpA (the shortest time yet reported) makes it an attractive candidate for an all-atom molecular dynamics simulation that could potentially show a complete folding reaction starting from an extended chain. We also constructed a fluorescent variant of BdpA by mutating phenylalanine 13 to tryptophan, allowing fluorescence-based time-resolved temperature-jump measurements. Temperature jumps and NMR complement each other, and give a very complete picture of the folding kinetics.

12.
Here, we provide an analysis of molecular evolution of five of the most populated protein folds: immunoglobulin fold, oligonucleotide-binding fold, Rossmann fold, alpha/beta plait, and TIM barrels. In order to distinguish between "historic", functional and structural reasons for amino acid conservations, we consider proteins that acquire the same fold and have no evident sequence homology. For each fold we identify positions that are conserved within each individual family and coincide when non-homologous proteins are structurally superimposed. As a baseline for statistical assessment we use the conservatism expected based on the solvent accessibility. The analysis is based on a new concept of "conservatism-of-conservatism". This approach allows us to identify the structural features that are stabilized in all proteins having a given fold, despite the fact that actual interactions that provide such stabilization may vary from protein to protein. Comparison with experimental data on thermodynamics, folding kinetics and function of the proteins reveals that such universally conserved clusters correspond to either: (i) super-sites (common location of active site in proteins having common tertiary structures but not function) or (ii) folding nuclei whose stability is an important determinant of folding rate, or both (in the case of Rossmann fold). The analysis also helps to clarify the relation between folding and function that is apparent for some folds.

13.
MOTIVATION: The number of protein families has been estimated to be as small as 1000. A recent study shows that the growth in discovery of novel structures deposited into the PDB and the related rate of increase of SCOP categories are slowing down. This indicates that the protein structure space will soon be covered, and thus we may be able to derive most of the remaining structures by using the known folding patterns. Present tertiary structure prediction methods perform well when a homologous structure is available, but give poorer results when no homologous templates are available. At the same time, some proteins that share twilight-zone sequence identity can form similar folds. Therefore, determination of structural similarity without sequence similarity would be beneficial for prediction of tertiary structures. RESULTS: The proposed PFRES method for automated protein fold classification from low-identity (<35%) sequences obtains 66.4% and 68.4% accuracy for two test sets, respectively. PFRES obtains 6.3-12.4% higher accuracy than the existing methods. The prediction accuracy of PFRES is shown to be statistically significantly better than the accuracy of competing methods. Our method adopts a carefully designed, ensemble-based classifier, and a novel, compact and custom-designed feature representation that includes nearly 90% fewer features than the representation of the most accurate competing method (36 versus 283). The proposed representation combines evolutionary information by using the PSI-BLAST profile-based composition vector and information extracted from the secondary structure predicted with PSI-PRED. AVAILABILITY: The method is freely available from the authors upon request.

14.
Mezei M. Protein Engineering, 2003, 16(10): 713-715
A novel fingerprint, defined without the use of distances, is introduced to characterize protein folds. It is of the form of binary matrices whose elements are defined by angles between the C=O direction, the backbone axis and the line connecting the alpha-carbons of the various residues. It is shown that matches in the fingerprint matrices correspond to low r.m.s.d.

15.
One of the goals of structural genomics is to obtain a structural representative of almost every fold in nature. A recent estimate suggests that 70%-80% of soluble protein domains identified in the first 1000 genome sequences should be covered by about 25,000 structures, a reasonably achievable goal. As no current estimates exist for the number of membrane protein families, however, it is not possible to know whether family coverage is a realistic goal for membrane proteins. Here we find that virtually all polytopic helical membrane protein families are present in the already known sequences, so we can make an estimate of the total number of families. We find that only approximately 700 polytopic membrane protein families account for 80% of structured residues and approximately 1700 cover 90% of structured residues. While apparently a finite and reachable goal, we estimate that it will likely take more than three decades to obtain the structures needed for 90% residue coverage, if current trends continue.

16.
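Research on the rules of protein folding is one of the important frontier topics in the life sciences, and the classification of protein fold types is the foundation of that research. Building on the fold classification of the SCOP database, this study takes as its subject the fold types belonging to the α, β, α+β and α/β classes in the Astral SCOPe 2.05 database with less than 40% similarity. Templates were constructed for 989 protein fold types and assembled into a template database; a fold-type classification method was then established on the basis of the designed fold-type templates, achieving automated classification of the protein fold types in the SCOP database. For the family templates, the mean sensitivity, specificity and MCC were 95.00%, 99.99% and 0.94 in the self-consistency test and 90.00%, 99.97% and 0.92 in the independence test; for the fold-type templates, they were 93.71%, 99.97% and 0.91 in the self-consistency test and 86.00%, 99.93% and 0.87 in the independence test. The results indicate that the template design is sound and can be used effectively to classify proteins of known structure.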
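The three evaluation measures reported in this abstract (sensitivity, specificity, and the Matthews correlation coefficient, MCC) follow from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical and are not the study's data.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts for one template tested against 1100 domains.
sens, spec, mcc = classification_metrics(tp=90, fp=10, tn=990, fn=10)
print(f"sensitivity={sens:.2%} specificity={spec:.2%} MCC={mcc:.2f}")
```

When negatives greatly outnumber positives, as they do when one fold template is scored against every other fold, specificity stays close to 100% even with a noticeable number of false positives, which is why MCC is a useful companion measure.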

17.
18.
Geno3D: automatic comparative molecular modelling of protein   (total citations: 14; self-citations: 0; citations by others: 14)
Geno3D (http://geno3d-pbil.ibcp.fr) is an automatic web server for protein molecular modelling. Starting with a query protein sequence, the server performs the homology modelling in six successive steps: (i) identify homologous proteins with known 3D structures by using PSI-BLAST; (ii) present the user with all potential templates through a very convenient user interface for target selection; (iii) perform the alignment of both query and subject sequences; (iv) extract geometrical restraints (dihedral angles and distances) for corresponding atoms between the query and the template; (v) perform the 3D construction of the protein by using a distance geometry approach; and (vi) finally send the results by e-mail to the user.

19.
Yona G, Linial N, Linial M. Proteins, 1999, 37(3): 360-378
We investigate the space of all protein sequences in search of clusters of related proteins. Our aim is to automatically detect these sets, and thus obtain a classification of all protein sequences. Our analysis, which uses standard measures of sequence similarity as applied to an all-vs.-all comparison of SWISSPROT, gives a very conservative initial classification based on the highest scoring pairs. The many classes in this classification correspond to protein subfamilies. Subsequently we merge the subclasses using the weaker pairs in a two-phase clustering algorithm. The algorithm makes use of transitivity to identify homologous proteins; however, transitivity is applied restrictively in an attempt to prevent unrelated proteins from clustering together. This process is repeated at varying levels of statistical significance. Consequently, a hierarchical organization of all proteins is obtained. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Different indices of validity were applied to assess the quality of our classification and compare it with the protein families in the PROSITE and Pfam databases. Our classification agrees with these domain-based classifications for between 64.8% and 88.5% of the proteins. It also finds many new clusters of protein sequences which were not classified by these databases. The hierarchical organization suggested by our analysis reveals finer subfamilies in families of known proteins as well as many novel relations between protein families.
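The role of transitivity and of repeated clustering at several significance levels can be illustrated with a minimal sketch. It implements plain single-linkage clustering with a union-find structure, which is deliberately simpler than the two-phase algorithm in the abstract: it does not apply transitivity restrictively, so weakly linked groups may merge. The function name, the similarity scores, and the thresholds are all hypothetical.

```python
def cluster_by_transitivity(items, scored_pairs, threshold):
    """Group items linked, directly or transitively, by pairs scoring
    at least `threshold`. Returns a sorted list of sorted clusters."""
    parent = {x: x for x in items}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for x in items:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical proteins and similarity scores (higher means more similar).
proteins = ["A", "B", "C", "D"]
pairs = [("A", "B", 0.9), ("B", "C", 0.5), ("C", "D", 0.2)]

# Lowering the threshold merges clusters, giving a simple hierarchy.
for level in (0.8, 0.4, 0.1):
    print(level, cluster_by_transitivity(proteins, pairs, level))
```

At the loosest level all four proteins fall into one cluster even though A and D are connected only through a chain of progressively weaker links; preventing that kind of chaining is the motivation for restricting transitivity.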

20.

Background  

Formal classification of a large collection of protein structures aids the understanding of evolutionary relationships among them. Classifications involving manual steps, such as SCOP and CATH, face the challenge of an increasing volume of available structures. Automatic methods, such as FSSP or the Dali Domain Dictionary, yield divergent classifications, for reasons not yet fully investigated. One possible reason is that the pairwise similarity scores used in automatic classification do not adequately reflect the judgments made in manual classification. Another possibility is the difference between manual and automatic classification procedures. We explore the degree to which these two factors might affect the final classification.
