Similar literature
Found 20 similar articles (search time: 15 ms)
1.
PALI is a database of structure-based sequence alignments and phylogenetic relationships derived from the three-dimensional structures of homologous proteins. The database enables grouping of pairs of homologous protein structures on the basis of their sequence identity calculated from the structure-based alignment. PALI also enables association of a new sequence with a family and automatic generation of a dendrogram combining the query sequence and homologous protein structures.
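The sequence identity "calculated from the structure-based alignment" can be illustrated with a minimal sketch. It assumes the alignment is given as two equal-length strings with `-` marking gaps, and that identity is taken over the columns where both proteins have a residue; the function name `percent_identity` and this choice of denominator are illustrative assumptions, not PALI's documented definition.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where both rows hold a residue.

    Assumption: the denominator is the number of gap-free columns;
    other conventions (e.g. shorter sequence length) also exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the equivalent positions (no gap in either row).
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy alignment: 4 gap-free columns, 3 of them identical.
    print(percent_identity("ACD-EF", "ACEQ-F"))
```

Pairs could then be grouped by the returned value, for example into 10% identity bins, which is one way to read the grouping described above.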

2.
Several studies based on the known three-dimensional (3-D) structures of proteins show that two homologous proteins with insignificant sequence similarity could adopt a common fold and may perform the same or similar biochemical functions. Hence, it is appropriate to use similarities in the 3-D structures of proteins rather than amino acid sequence similarities in modelling the evolution of distantly related proteins. Here we present an assessment of using 3-D structures in modelling the evolution of homologous proteins. Using a dataset of 108 protein domain families of known structure with at least 10 members per family, we present a comparison of the extent of structural and sequence dissimilarities among pairs of proteins, which are inputs into the construction of phylogenetic trees. We find that the correlation between the structure-based dissimilarity measures and the sequence-based dissimilarity measures is usually good if the sequence similarity among the homologues is about 30% or more. For protein families with low sequence similarity among the members, the correlation coefficient between the sequence-based and the structure-based dissimilarities is poor. In these cases the structure-based dendrogram clusters proteins with the most similar biochemical functional properties better than the sequence-similarity based dendrogram. In multi-domain protein families and disulphide-rich protein families the correlation coefficient for the match of sequence-based and structure-based dissimilarity (SDM) measures can be poor even though the sequence identity could be higher than 30%. Hence it is suggested that protein evolution is best modelled using 3-D structures if the sequence similarities (SSM) of the homologues are very low.
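The comparison described in this abstract amounts to correlating two sets of pairwise dissimilarities within a family. The sketch below shows one way this could be computed, assuming each measure is available as a symmetric matrix and that a Pearson coefficient is used; the abstract does not state which coefficient was applied, and the helper names `upper_triangle` and `pearson` are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a symmetric pairwise matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up dissimilarities for a three-member family (not real data).
    seq_dist = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.5], [0.6, 0.5, 0.0]]
    str_dist = [[0.0, 0.1, 0.4], [0.1, 0.0, 0.3], [0.4, 0.3, 0.0]]
    print(pearson(upper_triangle(seq_dist), upper_triangle(str_dist)))
```

A family whose coefficient is low would be a candidate for the case the authors describe, where the structure-based dendrogram is preferred.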

3.
SCOP: a structural classification of proteins database   Total citations: 17 (self-citations: 0; citations by others: 17)

4.
The current pace of structural biology now means that protein three-dimensional structure can be known before protein function, making methods for assigning homology via structure comparison of growing importance. Previous research has suggested that sequence similarity after structure-based alignment is one of the best discriminators of homology and often functional similarity. Here, we exploit this observation, together with a merger of protein structure and sequence databases, to predict distant homologous relationships. We use the Structural Classification of Proteins (SCOP) database to link sequence alignments from the SMART and Pfam databases. We thus provide new alignments that could not be constructed easily in the absence of known three-dimensional structures. We then extend the method of Murzin (1993b) to assign statistical significance to sequence identities found after structural alignment and thus suggest the best link between diverse sequence families. We find that several distantly related protein sequence families can be linked with confidence, showing the approach to be a means for inferring homologous relationships and thus possible functions when proteins are of known structure but of unknown function. The analysis also finds several new potential superfamilies, where inspection of the associated alignments and superimpositions reveals conservation of unusual structural features or co-location of conserved amino acids and bound substrates. We discuss implications for Structural Genomics initiatives and for improvements to sequence comparison methods.

5.
According to the method developed previously (Kubota, Y., Takahashi, S., Nishikawa, K. and Ooi, T. (1981) J. Theor. Biol. 91, 347-361), homology among proteins may be estimated quantitatively. We extended the method to investigate the relationship of an amino acid sequence to its tertiary structure and to identify homologous segments which have homologous native conformations in proteins. First, we selected proper indices for the computation of correlation coefficients from 32 properties inherent to amino acids, such as hydrophobicity. The arithmetic average of correlation coefficients using six indices gave rise to a good correlation for the CD- and EF-hand regions (Ca2+ binding sites) in carp parvalbumin, but poor ones for other segments. We then applied the method to homologous proteins, the three-dimensional structures of which are known: horse hemoglobin alpha-chain and beta-chain; cytochrome c and c2; serine proteases, chymotrypsinogen and elastase; alpha-lytic protease and protease A from prokaryotic organisms. The results show that the sequence homology estimated by the present method corresponds well to the homology in three-dimensional structures, and therefore the method is promising for the identification of important sites in sequences which have similar native conformations. As an example of the application of the method, two sequences of human interferon, one from fibroblast and the other from leukocyte, are compared, suggesting functional sites in the molecule.

6.
18th Sir Hans Krebs lecture. Knowledge-based protein modelling and design   Total citations: 12 (self-citations: 0; citations by others: 12)
A systematic technique for protein modelling that is applicable to the design of drugs, peptide vaccines and novel proteins is described. Our approach is knowledge-based, depending on the structures of homologous or analogous proteins and more generally on a relational database of protein three-dimensional structures. The procedure simultaneously aligns the known tertiary structures, selects fragments from the structurally conserved regions on the basis of sequence homology, aligns these with the 'average structure' or 'framework', builds on the loops selected from homologous proteins or a wider database, substitutes sidechains and energy minimises the resultant model. Applications to modelling an homologous structure, tissue plasminogen activator on the basis of another serine proteinase, and to modelling an analogous protein, HIV viral proteinase on the basis of aspartic proteinases, are described. The converse problem of ab initio design is also addressed: this involves the selection of an amino acid sequence to give a particular tertiary structure, in this case a symmetrical domain of two Greek-key motifs.

7.
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Even with the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures have been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.

8.
Material remains of ancestor nucleotides and proteins are largely unavailable, thus sequence comparison among homologous genes in present-day organisms forms the core of current knowledge of molecular evolution. Variation in protein three-dimensional structure is a basis for functional diversity. To study the evolution of three-dimensional structures in related proteins would significantly improve our understanding of protein evolution and function. A protein may contain ancestor conformations that have been allosterically suppressed by evolutionarily additive structures. Using monoclonal antibody probes to detect such conformation in proteins after removing the suppressor structure, our study demonstrates three-dimensional structure evidence for the evolutionary relationship between troponin I and troponin T, two subunits of the troponin complex in the Ca2+-regulatory system of striated muscle, and among their muscle type-specific isoforms. The experimental data show the feasibility of detecting evolutionarily suppressed history-telling structural states in proteins by removing conformational modulator segments added during evolution. In addition to identifying structural modifications that were critical to the emergence of diverged proteins, investigating this novel mode of evolution will help us to understand the origin and functional potential of protein structures.

9.
An approach is described for modelling the three-dimensional structure of a protein from the tertiary structures of several homologous proteins that have been determined by X-ray analysis. A method is developed for the simultaneous superposition of several protein molecules and for the calculation of an 'average structure' or 'framework'. Investigation of the convergence properties of this method, in the case of both weighted and unweighted least squares, demonstrates that both give a unique answer and the latter is robust for an homologous family of proteins. Multi-dimensional scaling is used to subgroup the proteins with respect to structural homology. The framework calculated on the basis of the family of homologous proteins, or of an appropriate subgroup, is used to align fragments of the known protein structures of high sequence homology with the unknown. This alignment provides a basis for model building the tertiary structure. Different techniques for using the framework to model the mainchain of various globins and an immunoglobulin domain in the structurally conserved regions are investigated.

10.
Similarities in amino acid sequences, three-dimensional structures, and the exon-intron patterns of their genes have indicated that c-type lysozymes and α-lactalbumins are homologous proteins, i.e., descended by divergent evolution from a common ancestor. Like the α-lactalbumins, echidna milk, horse milk, and pigeon eggwhite lysozymes all bind Ca(II). Models of their three-dimensional structures, based on their amino acid sequences and the known crystal structures of domestic hen eggwhite and human lysozymes and baboon and human α-lactalbumins, have been built. The several structures have been compared and their relationships discussed.

11.
PALI (release 1.2) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of homologous protein domains in various families. The data set of homologous protein structures has been derived by consulting the SCOP database (release 1.50) and the data set comprises 604 families of homologous proteins involving 2739 protein domain structures with each family made up of at least two members. Each member in a family has been structurally aligned with every other member in the same family (pairwise alignment) and all the members in the family are also aligned using simultaneous super-position (multiple alignment). The structural alignments are performed largely automatically, with manual interventions especially in the cases of distantly related proteins, using the program STAMP (version 4.2). Every family is also associated with two dendrograms, calculated using PHYLIP (version 3.5), one based on a structural dissimilarity metric defined for every pairwise alignment and the other based on similarity of topologically equivalent residues. These dendrograms enable easy comparison of sequence and structure-based relationships among the members in a family. Structure-based alignments with the details of structural and sequence similarities, superposed coordinate sets and dendrograms can be accessed conveniently using a web interface. The database can be queried for protein pairs with sequence or structural similarities falling within a specified range. Thus PALI forms a useful resource to help in analysing the relationship between sequence and structure variation at a given level of sequence similarity. PALI also contains over 653 'orphans' (single member families). Using the web interface involving PSI_BLAST and PHYLIP it is possible to associate the sequence of a new protein with one of the families in PALI and generate a phylogenetic tree combining the query sequence and proteins of known 3-D structure. The database with the web interfaced search and dendrogram generation tools can be accessed at http://pauling.mbu.iisc.ernet.in/~pali.

12.
A neural network based predictor of residue contacts in proteins   Total citations: 1 (self-citations: 0; citations by others: 1)
We describe a method based on neural networks for predicting contact maps of proteins using as input chemicophysical and evolutionary information. Neural networks are trained on a data set comprising the contact maps of 200 non-homologous proteins of well resolved three-dimensional structures. The systems learn the association rules between the covalent structure of each protein and its corresponding contact map by means of a standard back propagation algorithm. Validation of the predictor on the training set and on 408 proteins of known structure which are not homologous to those contained in the training set indicates that this method scores higher than statistical approaches previously described and based on correlated mutations and sequence information.
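For readers unfamiliar with the prediction target, the following sketch shows how a contact map could be derived from a known structure. It is not the neural network predictor from the abstract. It assumes one representative point per residue (for example the Cα atom), a distance cutoff of 8 Å, and a minimum sequence separation of one residue; all three are illustrative assumptions, and the abstract does not give the authors' contact definition.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords: list of (x, y, z) tuples, one per residue (assumed Calpha).
    cutoff: distance threshold in angstroms (assumed value).
    min_separation: smallest |i - j| that counts as a contact.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


if __name__ == "__main__":
    # Three made-up residue positions on a line (not real coordinates).
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    for row in contact_map(toy):
        print(row)
```

`math.dist` requires Python 3.8 or later. A predictor such as the one described would be trained to output a matrix of this shape from sequence-derived features.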

13.
C. Sander, R. Schneider. Proteins, 1991, 9(1): 56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.

14.
A new, automated, knowledge-based method for the construction of three-dimensional models of proteins is described. Geometric restraints on target structures are calculated from a consideration of homologous template structures and the wider knowledge base of unrelated protein structures. Three-dimensional structures are calculated from initial partly folded states by high-temperature molecular dynamics simulations followed by slow cooling of the system (simulated annealing) using nonphysical potentials. Three-dimensional models for the biotinylated domain from the pyruvate carboxylase of yeast and the lipoylated H-protein from the glycine cleavage system of pea leaf were constructed, based on the known structures of two lipoylated domains of 2-oxo acid dehydrogenase multienzyme complexes. Despite their weak sequence similarity, the three proteins are predicted to have similar three-dimensional structures, representative of a new protein module. Implications for the mechanisms of posttranslational modification of these proteins and their catalytic function are discussed.

15.
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and also simultaneous superposition (multiple) of all the members has been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well and variations are seen only in the low sequence similarity cases, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

16.
The GF14 family of proteins in Arabidopsis thaliana consists of a homologous group of polypeptides ranging in size from 27 kDa to 32 kDa. As a group, GF14 proteins are also homologous to a family of mammalian proteins most commonly referred to as 14-3-3 proteins. Several distinct and different biochemical activities have been historically attributed to the various isoforms of the mammalian 14-3-3 proteins. These data present the possibility that the various activities are performed by functionally distinct lineages of the gene family. Here we present phylogenetic analyses based on the derived amino acid sequences of five GF14 isoforms expressed in Arabidopsis suspension-cultured cells. A high degree of sequence integrity is apparent in the various Arabidopsis isoforms, and the overall structures of the plant forms are quite conserved with regard to the structures of the known mammalian forms. These gene phylogenies indicate no evolutionary conservation of specific isoform lineages within both plants and animals. Rather, the evolutionary history of this protein appears to be characterized by a separate radiation of plant and animal forms from a common ancestral sequence. Even though the plant and animal forms have evolved independently since that ancestral split, large domains are conserved in both major lineages.

17.

Background  

Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.

18.
In order to bridge the gap between proteins with three-dimensional (3-D) structural information and those without 3-D structures, extensive experimental and computational efforts for structure recognition are being invested. One of the rapid and simple computational approaches for structure recognition makes use of sequence profiles with sensitive profile matching procedures to identify remotely related homologous families. While adopting this approach we used profiles that are generated from structure-based sequence alignment of homologous protein domains of known structures integrated with sequence homologues. We present an assessment of this fast and simple approach. About one year ago, using this approach, we had identified structural homologues for 315 sequence families, which were not known to have any 3-D structural information. The subsequent experimental structure determination for at least one of the members in 110 of 315 sequence families allowed a retrospective assessment of the correctness of structure recognition. We demonstrate that correct folds are detected with an accuracy of 96.4% (106/110). Most (81/106) of the associations are made correctly to the specific structural family. For 23/106, the structure associations are valid at the superfamily level. Thus, profiles of protein families of known structure, when used with a sensitive profile-based search procedure, result in structure association of high confidence. Further assignment at the level of superfamily or family would provide clues to probable functions of new proteins. Importantly, the public availability of these profiles could enable one to perform genome-wide structure assignment on a local machine in a fast and accurate manner.
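The figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the counts given above (110 assessed families, 106 correct folds, 81 at family level, 23 at superfamily level); the helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


assessed = 110        # families with a later experimental structure
correct_fold = 106    # correct fold detected
family_level = 81     # associated with the specific structural family
superfamily_level = 23  # valid at the superfamily level

if __name__ == "__main__":
    print(percent(correct_fold, assessed))            # fold-level accuracy
    print(percent(family_level, correct_fold))        # share at family level
    print(percent(superfamily_level, correct_fold))   # share at superfamily level
    # Correct-fold associations not covered by either quoted level.
    print(correct_fold - family_level - superfamily_level)
```

The first value reproduces the 96.4% stated in the abstract. The family-level and superfamily-level counts sum to 104, so two of the 106 correct-fold associations are not assigned to either level in the text above.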

19.
The technique of model-building a protein of known sequence but unknown tertiary structure from the structures of homologous proteins is probably the most reliable means so far of mapping from primary to tertiary structure. A key step towards the realization of this aim is to develop ways of aligning three-dimensional structures of homologous proteins, thereby deriving the rules useful for protein modelling. We have developed a generalized differential-geometric representation of protein local conformation for use in a protein comparison program which aligns protein sequences on the basis of their sequence and conformational knowledge. Because the differential-geometric distance measure between local conformations is independent of the coordinate frame and retains chirality information, the comparison program is easily implemented, relatively rational and reasonably fast. The utility of this program for aligning closely and distantly related homologous proteins is demonstrated by multiple alignment of globins, serine proteinases and aspartic proteinase domains. In particular, the method yields a rational alignment of the mammalian and microbial serine proteinases when compared with many published alignment programs.

20.
The catalytic or functionally important residues of a protein are known to exist in evolutionarily constrained regions. However, the patterns of residue conservation alone are sometimes not very informative, depending on the homologous sequences available for a given query protein. Here, we present an integrated method to locate the catalytic residues in an enzyme from its sequence and structure. Mutations of functional residues usually decrease the activity, but concurrently often increase stability. Also, catalytic residues tend to occupy partially buried sites in holes or clefts on the molecular surface. After confirming these general tendencies by carrying out statistical analyses on 49 representative enzymes, these data together with amino acid conservation were evaluated. This novel method exhibited better sensitivity in the prediction accuracy than traditional methods that consider only the residue conservation. We applied it to some so-called "hypothetical" proteins, with known structures but undefined functions. The relationships among the catalytic, conserved, and destabilizing residues in enzymatic proteins are discussed.
