Similar literature
1.
Li W, Liu Z, Lai L. Biopolymers 1999, 49(6):481-495
A general problem in comparative modeling and protein design is the conformational evaluation of loops with a certain sequence in specific environmental protein frameworks. Loops of different sequences and structures on similar scaffolds are common in the Protein Data Bank (PDB). In order to explore both their structural and sequential diversity, a database of loops connecting similar secondary structure fragments is constructed by searching the database of families of structurally similar proteins and the PDB. A total of 84 loop families having 2-13 residues are found among the well-determined structures of resolution better than 2.5 Å. Eight alpha-alpha, 20 alpha-beta, 19 beta-alpha, and 37 beta-beta families are identified. Every family contains more than 5 loop motifs. In each family, no two loops share the same sequence and all the frameworks are well superimposed. Forty-three new loop classes are distinguished in the database. The structural variability of loops in homologous proteins is examined and shown in 44 families. Motif families are characterized with geometric parameters and sequence patterns. The conformations of loops in each family are clustered into subfamilies using the average linkage cluster analysis method. Information such as geometric properties, sequence profile, sequential and structural variability in loops, structural alignment parameters, sequence similarities, and clustering results is provided. Correlations between the conformation of loops and loop sequence, motif sequence, and global sequence of the PDB chain are examined in order to find how loop structures depend on their sequences and how they are affected by the local and global environment. Strong correlations (R > 0.75) are found in only 24 families. The best R value is 0.98. The database is available through the Internet.

2.
Proteins 2018, 86(5):566-580
The ω-Transaminase Engineering Database (oTAED) was established as a publicly accessible resource on sequences and structures of the biotechnologically relevant ω-transaminases (ω-TAs) from Fold types I and IV. The oTAED integrates sequence and structure data, provides a classification based on fold type and sequence similarity, and applies a standard numbering scheme to identify equivalent positions in homologous proteins. The oTAED includes 67 210 proteins (114 655 sequences) which are divided into 169 homologous families based on global sequence similarity. The 44 and 39 highly conserved positions which were identified in Fold type I and IV, respectively, include the known catalytic residues and a large fraction of glycines and prolines in loop regions, which might have a role in protein folding and stability. However, for most of the conserved positions the function is still unknown. Literature information on positions that mediate substrate specificity and stereoselectivity was systematically examined. The standard numbering schemes revealed that many positions which have been described in different enzymes are structurally equivalent. For some positions, multiple functional roles have been suggested based on experimental data in different enzymes. The proposed standard numbering schemes for Fold type I and IV ω-TAs assist with analysis of literature data, facilitate annotation of ω-TAs, support prediction of promising mutation sites, and enable navigation in ω-TA sequence space. Thus, it is a useful tool for enzyme engineering and the selection of novel ω-TA candidates with desired biochemical properties.

3.
May AC. Protein Engineering 2001, 14(4):209-217
Hierarchical classification is probably the most popular approach to group related proteins. However, there are a number of problems associated with its use for this purpose. One is that the resulting tree showing a nested sequence of groups may not be the most suitable representation of the data. Another is that visual inspection is the most common method to decide the most appropriate number of subsets from a tree. In fact, classification of proteins in general is bedevilled with the need for subjective thresholds to define group membership (e.g., 'significant' sequence identity for homologous families). Such arbitrariness is not only intellectually unsatisfying but also has important practical consequences. For instance, it hinders meaningful identification of protein targets for structural genomics. I describe an alternative approach to cluster related proteins without the need for an a priori threshold: one that, through its use of dynamic programming, is guaranteed to produce globally optimal solutions at all levels of partition granularity. Grouping proteins according to weights assigned to their aligned sequences makes it possible to delineate dynamically a 'core-periphery' structure within families. The 'core' of a protein family comprises the most typical sequences while the 'periphery' consists of the atypical ones. Further, a new sequence weighting scheme that combines the information in all the multiply aligned positions of an alignment in a novel way is put forward. Instead of averaging over all positions, this procedure takes into account directly the distribution of sequence variability along an alignment. The relationships between sequence weights and sequence identity are investigated for 168 families taken from HOMSTRAD, a database of protein structure alignments for homologous families. An exact solution is presented for the problem of how to select the most representative pair of sequences for a protein family. Extension of this approach by a greedy algorithm allows automatic identification of a minimal set of aligned sequences. The results of this analysis are available on the Web at http://mathbio.nimr.mrc.ac.uk/~amay.

4.
Summary: A database search has revealed significant and extensive sequence similarities among prokaryotic and eukaryotic pyridoxal phosphate (PLP)-dependent decarboxylases, including Drosophila glutamic acid decarboxylase (GAD) and bacterial histidine decarboxylase (HDC). Based on these findings, the sequences of seven PLP-dependent decarboxylases from five different organisms have been aligned to derive a consensus sequence for this family of enzymes. In addition, quantitative methods have been employed to calculate the relative evolutionary distances between pairs of the decarboxylases comprising this family. The multiple sequence analysis together with the quantitative results strongly suggests an ancient and common origin for all PLP-dependent decarboxylases. This analysis also indicates that prokaryotic and eukaryotic HDC activities evolved independently. Finally, a sensitive search algorithm (PROFILE) was unable to detect additional members of this decarboxylase family in protein sequence databases.

5.
Thiamin diphosphate (ThDP) is the biologically active form of vitamin B1, and ThDP-dependent enzymes are found in all forms of life. The catalytic mechanism of this family requires the formation of a common intermediate, the 2α-carbanion–enamine, regardless of whether the enzyme is involved in C–C bond formation or breakdown, or even formation of C−N, C−O and C−S bonds. This demands that the enzymes must screen substrates prior to, and/or after, formation of the common intermediate. This review is focused on the group for which the second step is the protonation of the 2α-carbanion, i.e., the ThDP-dependent decarboxylases. Based on kinetic data, sequence/structure alignments and mutagenesis studies the factors involved in substrate specificity have been identified.

6.
Family profile analysis (FPA), described in this paper, compares all available homologous amino acid sequences of a target family with the profile of a probe family, while conventional sequence profile analysis (Gribskov M, Lüthy R, Eisenberg D. Meth Enzymol 1990;183:146-159) considers only a single target sequence in comparison with the probe family. The increased input of sequence information in FPA expands the range for sequence-based recognition of structural relationships. In the FPA algorithm, Zscores of each of the target sequences, obtained from a probe profile search over all known amino acid sequences, are averaged and then compared with the scores for sequences of 100 reference families in the same probe family search. The resulting F-Zscore of the target family is expressed in "effective standard deviations" of the mean Zscores of the reference families; a value above a threshold of 3.5 indicates a statistically significant evolutionary relationship between the target and probe families. The sensitivity of FPA to sequence information was tested with several protein families where distant relationships have been verified from known tertiary protein architectures, which included vitamin B6-dependent enzymes, (beta/alpha)8-barrel proteins, beta-trefoil proteins, and globins. In comparison to other methods, FPA proved to be significantly more sensitive, finding numerous new homologies. The FPA technique is not only useful to test a suspected relationship between probe and target families but also identifies possible target families in profile searches over all known primary structures.
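
The averaging and comparison step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define "effective standard deviations" precisely, so the sketch assumes the plain sample standard deviation of the reference-family mean Zscores. The function names `family_zscore` and `is_related` are hypothetical.

```python
from statistics import mean, stdev


def family_zscore(target_zscores, reference_family_zscores):
    """Return an F-Zscore-like value for a target family.

    target_zscores: Zscores of the target-family sequences from one probe search.
    reference_family_zscores: one list of Zscores per reference family.
    Assumption: the spread is the sample standard deviation of the
    reference-family means (the abstract's exact definition is not given).
    """
    target_mean = mean(target_zscores)
    reference_means = [mean(scores) for scores in reference_family_zscores]
    return (target_mean - mean(reference_means)) / stdev(reference_means)


def is_related(f_zscore, threshold=3.5):
    """Apply the significance threshold of 3.5 quoted in the abstract."""
    return f_zscore > threshold


if __name__ == "__main__":
    references = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
    score = family_zscore([8.0, 10.0], references)
    print(score, is_related(score))
```

In practice the reference set would hold the 100 reference families mentioned in the abstract; the three toy families here only keep the example short.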

7.
The 3D structural comparison of families of divergent homologous domains revealed two main populations of hydrophobic amino acids, one with a low and the other with a significantly higher mean solvent accessibility, allowing two regions of the core of protein globular domains to be distinguished. The side chains of hydrophobic amino acids in topologically conserved positions (positions in the structural alignment where only hydrophobic amino acids are found), which we call topohydrophobic positions, are considerably less dispersed than those of the other amino acids (hydrophobic or not). Mean distances between gravity centers of amino acids in topohydrophobic positions are significantly shorter than those for non-topohydrophobic positions and show that the corresponding amino acids are almost all in direct contact in the inner core of globular domains. This study also showed that the small number of topohydrophobic positions is a characteristic of the structural differences between proteins of a family. This criterion is independent of the sequence identity between the sequences and of the root-mean-square distance between their corresponding structures. Using sensitive sequence alignment processes it will be possible, for many protein families, to identify topohydrophobic positions from sequences only. Proteins 33:329–342, 1998. © 1998 Wiley-Liss, Inc.

8.
Coordinated amino acid changes in homologous protein families
In the tobamovirus coat protein family, amino acid residues at some spatially close positions are found to be substituted in a coordinated manner [Altschuh et al. (1987) J. Mol. Biol., 193, 693]. Therefore, these positions show an identical pattern of amino acid substitutions when amino acid sequences of these homologous proteins are aligned. Based on this principle, coordinated substitutions have been searched for in three additional protein families: serine proteases, cysteine proteases and the haemoglobins. Coordinated changes have been found in all three protein families mostly within structurally constrained regions. This method works with a varying degree of success depending on the function of the proteins, the range of sequence similarities and the number of sequences considered. By relaxing the criteria for residue selection, the method was adapted to cover a broader range of protein families and to study regions of the proteins having weaker structural constraints. The information derived by these methods provides a general guide for engineering of a large variety of proteins to analyse structure-function relationships.
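
The principle that coordinated positions "show an identical pattern of amino acid substitutions" can be illustrated with a short sketch. It is an assumption-laden simplification, not the published method: it groups alignment columns whose residues partition the sequences in exactly the same way, and it ignores spatial proximity and the relaxed selection criteria mentioned in the abstract. The names `column_pattern` and `coordinated_positions` are hypothetical.

```python
from collections import defaultdict


def column_pattern(column):
    """Relabel residues by order of first appearance, e.g. 'KKRR' -> (0, 0, 1, 1)."""
    labels = {}
    return tuple(labels.setdefault(residue, len(labels)) for residue in column)


def coordinated_positions(alignment):
    """Group variable alignment columns that share an identical substitution pattern.

    alignment: list of equal-length aligned sequences.
    Returns lists of 0-based column indices; only groups with at least
    two columns are reported.
    """
    groups = defaultdict(list)
    for position, column in enumerate(zip(*alignment)):
        pattern = column_pattern(column)
        if len(set(pattern)) > 1:  # invariant columns carry no substitution pattern
            groups[pattern].append(position)
    return [positions for positions in groups.values() if len(positions) > 1]


if __name__ == "__main__":
    toy_alignment = ["AKDL", "AKDL", "ARDM", "ARDM"]
    print(coordinated_positions(toy_alignment))
```

Exact pattern identity is a strict criterion; a tolerance for a few disagreeing sequences would be one way to approximate the relaxed criteria the abstract refers to.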

9.
We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
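
The abstract reports ROC50 scores without saying how they were computed. The sketch below assumes the commonly used truncated-ROC definition: for each of the first n false positives in a ranked hit list, count the true positives ranked above it, then normalize by n times the total number of true positives. Padding short hit lists as if the missing false positives ranked last is a further assumption, and `roc_n` is a hypothetical name.

```python
def roc_n(ranked_labels, n=50, total_positives=None):
    """Truncated ROC score for a ranked hit list.

    ranked_labels: booleans ordered from best to worst score;
        True marks a genuine family member, False a false positive.
    n: number of false positives at which the curve is truncated.
    total_positives: number of true members in the database; defaults to
        the number of True entries in ranked_labels.
    """
    if total_positives is None:
        total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("at least one true positive is required")

    true_seen = 0
    false_seen = 0
    area = 0
    for is_member in ranked_labels:
        if is_member:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                break

    # Assumption: false positives missing from a short list rank after every listed hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_positives)


if __name__ == "__main__":
    print(roc_n([True, True, False, True, False], n=2))
```

A score of 1.0 means every true member outranks the first n false positives.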

10.
The crystal structure of the E1 component from the Escherichia coli pyruvate dehydrogenase multienzyme complex (PDHc) has been determined with phosphonolactylthiamin diphosphate (PLThDP) in its active site. PLThDP serves as a structural and electrostatic analogue of the natural intermediate alpha-lactylthiamin diphosphate (LThDP), in which the carboxylate from the natural substrate pyruvate is replaced by a phosphonate group. This represents the first example of an experimentally determined, three-dimensional structure of a thiamin diphosphate (ThDP)-dependent enzyme containing a covalently bound, pre-decarboxylation reaction intermediate analogue and should serve as a model for the corresponding intermediates in other ThDP-dependent decarboxylases. Regarding the PDHc-specific reaction, the presence of PLThDP induces large scale conformational changes in the enzyme. In conjunction with the E1-PLThDP and E1-ThDP structures, analysis of a H407A E1-PLThDP variant structure shows that an interaction between His-407 and PLThDP is essential for stabilization of two loop regions in the active site that are otherwise disordered in the absence of intermediate analogue. This ordering completes formation of the active site and creates a new ordered surface likely involved in interactions with the lipoyl domains of E2s within the PDHc complex. The tetrahedral intermediate analogue is tightly held in the active site through direct hydrogen bonds to residues His-407, Tyr-599, and His-640 and reveals a new, enzyme-induced, strain-related feature that appears to aid in the decarboxylation process. This feature is almost certainly present in all ThDP-dependent decarboxylases; thus its inclusion in our understanding of general thiamin catalysis is important.

11.
The group IV pyridoxal-5'-phosphate (PLP)-dependent decarboxylases belong to the beta/alpha barrel structural family, and include enzymes with substrate specificity for a range of basic amino acids. A unique homolog of this family, the Paramecium bursaria Chlorella virus arginine decarboxylase (cvADC), shares about 40% amino acid sequence identity with the eukaryotic ornithine decarboxylases (ODCs). The X-ray structure of cvADC has been solved to 1.95 and 1.8 Å resolution for the free and agmatine (product)-bound enzymes. The global structural differences between cvADC and eukaryotic ODC are minimal (rmsd of 1.2-1.4 Å); however, the active site has significant structural rearrangements. The key "specificity element" is identified as the 3₁₀-helix that contains and positions substrate-binding residues such as E296 in cvADC (D332 in T. brucei ODC). In comparison to the ODC structures, the 3₁₀-helix in cvADC is shifted over 2 Å away from the PLP cofactor, thus accommodating the larger arginine substrate. Within the context of this conserved fold, the protein is designed to be flexible in the positioning and amino acid sequence of the 3₁₀-helix, providing a mechanism to evolve different substrate preferences within the family without large structural rearrangements. Also, in the structure, the "K148-loop" (homologous to the "K169-loop" of ODC) is observed in a closed, substrate-bound conformation for the first time. Apparently the K148-loop is a mobile loop, analogous to those observed in triose phosphate isomerase and tryptophan synthetase. In conjunction with prior structural studies these data predict that this loop adopts different conformations throughout the catalytic cycle, and that loop movement may be kinetically linked to the rate-limiting step of product release.

12.
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying 'priority proteins' for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/~supfam.

13.
Summary: Close structural resemblances between several mammalian highly or moderately repetitive families and some specific tRNAs were detected. The rodent type 2 Alu family, rat identifier (ID) sequences, rabbit C family, and bovine or goat 73-bp repeat are most homologous with lysine tRNA5, phenylalanine tRNA, glycine tRNA, and glycine tRNA, respectively. The homologies extend to secondary structures, and the homologous nucleotides are located on nearly the same secondary structures. The repetitive families mentioned have a common structural organization, with a tRNA-like sequence devoid of an aminoacyl stem region. These features suggest that these repetitive families may be generated by nonhomologous recombination between a tRNA gene and a tRNA-unrelated block.

14.
The lysA gene encodes meso-diaminopimelate (DAP) decarboxylase (EC 4.1.1.20), the last enzyme of the lysine biosynthetic pathway in bacteria. We have determined the nucleotide sequence of the lysA gene from Pseudomonas aeruginosa. Comparison of the deduced amino acid sequence of the lysA gene product revealed extensive similarity with the sequences of the functionally equivalent enzymes from Escherichia coli and Corynebacterium glutamicum. Even though both P. aeruginosa and E. coli are Gram-negative bacteria, sequence comparisons indicate a greater similarity between the enzymes of P. aeruginosa and the Gram-positive bacterium C. glutamicum than between those of P. aeruginosa and E. coli. Comparison of DAP decarboxylase with protein sequences present in databases revealed that bacterial DAP decarboxylases are homologous to mouse (Mus musculus) ornithine decarboxylase (EC 4.1.1.17), the key enzyme in polyamine biosynthesis in mammals. On the other hand, no similarity was detected between DAP decarboxylases and other bacterial amino acid decarboxylases.

15.
A set of aligned homologous protein sequences is divided into two groups consisting of the m and n most closely related sequences. The position variability for homologous protein sequences is defined as the number of mismatches in the intergroup comparison of all possible m × n pairs of amino acid residues at that position, divided by m × n. The position variability plotted against the sequence position number with a window of 10 positions gives the intergroup local variability profile. The area S of the figure enclosed between the local variability profile and the straight line corresponding to the mean local variability value is compared with the average area Sr for 1000 random homologous protein families. If S is greater than Sr by more than 2 standard deviation units σr, the local variability profile is assumed to contain peaks and hollows corresponding to significantly variable and conserved regions of the sequences. The profile extrema containing the area surplus ΔS = S − (Sr + 2σr) are cut off by two straight lines to locate the significant regions. The difference (S − Sr), expressed in standard deviation units σr, is taken as the overall irregularity of amino acid substitution along the homologous protein sequences, OI = (S − Sr)/σr. The significantly conserved and variable regions of six homologous sequence families (phospholipase A2, cytochromes b, alpha-subunits of Na,K-ATPase, L- and M-subunits of the photosynthetic bacterial photoreaction centre, and human rhodopsins) were identified. It was shown that for artificial homologous protein sequences derived by k-fold lengthening of natural protein sequences, the OI value rises as the square root of k. To compare the degree of substitution irregularity in homologous protein sequence families of different lengths L, the value of the standard overall substitution irregularity for L = 250 is proposed.
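
The quantities defined in this abstract translate fairly directly into code. The sketch below is illustrative only and makes several assumptions: the windowed profile is a simple sliding mean, the area S is approximated by the sum of absolute deviations from the profile mean, and σr is the sample standard deviation of the areas from random families. All function names are hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def position_variability(group_a, group_b):
    """Fraction of mismatching residue pairs per column over all m*n intergroup pairs."""
    m, n = len(group_a), len(group_b)
    length = len(group_a[0])
    profile = []
    for pos in range(length):
        mismatches = sum(
            1 for a, b in product(group_a, group_b) if a[pos] != b[pos]
        )
        profile.append(mismatches / (m * n))
    return profile


def windowed_profile(profile, window=10):
    """Sliding-window mean of the per-position variability (assumed smoothing)."""
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]


def area_between(profile):
    """Discrete stand-in for the area S between the profile and its mean line."""
    baseline = mean(profile)
    return sum(abs(value - baseline) for value in profile)


def overall_irregularity(s, random_areas):
    """OI = (S - Sr) / sigma_r, with Sr and sigma_r estimated from random families."""
    return (s - mean(random_areas)) / stdev(random_areas)


if __name__ == "__main__":
    profile = position_variability(["ACD", "ACE"], ["ACD", "GCF"])
    print(profile, windowed_profile(profile, window=2))
```

Generating the 1000 random homologous families that supply `random_areas` is left out, because the abstract does not describe how they were constructed.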

16.
We represent all DNA sequences as points in twelve-dimensional space in such a way that homologous DNA sequences are clustered together, from which a new genomic space is created for global DNA sequence comparison of millions of genes simultaneously. More specifically, based on the contents of the four nucleotides, their distances from the origin and their distribution along the sequences, a twelve-dimensional vector is assigned to any DNA sequence. The applicability of this analysis to global comparison of gene structures was tested on the myoglobin, beta-globin, histone-4, lysozyme, and rhodopsin families. Members of each family exhibit smaller vector distances relative to the distances between members of different families. The vector distance also distinguishes random sequences generated with the same base composition. Sequence comparisons showed consistency with the BLAST method. Once a new gene is discovered, we can compute its location in our genomic space. It is natural to predict that the properties of this new gene are similar to the properties of known genes that are located nearby. Biologists can do various experiments to test these properties.
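
One way to read the twelve dimensions is three numbers for each of the four nucleotides: its count, its mean position, and a measure of how its positions are spread along the sequence. The sketch below follows that reading. The exact normalization is not given in the abstract, so the spread term (variance of positions divided by sequence length) and the per-base ordering of the vector are assumptions, and the function names are hypothetical.

```python
import math


def twelve_dim_vector(sequence):
    """Map a DNA sequence to 12 numbers: (count, mean position, spread) for A, C, G, T.

    Positions are 1-based distances from the start of the sequence.
    Assumption: spread = sum((p - mean)^2) / (count * sequence length).
    """
    sequence = sequence.upper()
    length = len(sequence)
    vector = []
    for base in "ACGT":
        positions = [i + 1 for i, ch in enumerate(sequence) if ch == base]
        count = len(positions)
        if count == 0:
            vector.extend([0.0, 0.0, 0.0])  # absent base contributes zeros
            continue
        mean_position = sum(positions) / count
        spread = sum((p - mean_position) ** 2 for p in positions) / (count * length)
        vector.extend([float(count), mean_position, spread])
    return vector


def vector_distance(u, v):
    """Euclidean distance between two 12-dimensional vectors."""
    return math.dist(u, v)


if __name__ == "__main__":
    a = twelve_dim_vector("ACGTACGT")
    b = twelve_dim_vector("ACGTTCGA")
    print(vector_distance(a, b))
```

Because each sequence is reduced to a fixed-length vector, comparing a new gene against a large collection needs only distance computations, with no pairwise alignment.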

17.
Study of structure/function relationships constitutes an important field of research, especially for modification of protein function and drug design. However, the fact that rational design (i.e. the modification of amino acid sequences by means of directed mutagenesis, based on knowledge of the three-dimensional structure) appears to be much less efficient than irrational design (i.e. random mutagenesis followed by in vitro selection) clearly indicates that we understand little about the relationships between primary sequence, three-dimensional structure and function. The use of evolutionary approaches and concepts will bring insights to this difficult question. The increasing availability of multigene family sequences that has resulted from genome projects has inspired the creation of novel in silico evolutionary methods to predict details of protein function in duplicated (paralogous) proteins. The underlying principle of all such approaches is to compare the evolutionary properties of homologous sequence positions in paralogs. It has been proposed that the positions that show switches in substitution rate over time (i.e., 'heterotachous sites') are good indicators of functional divergence. However, it appears that heterotachy is a much more general process, since most variable sites of homologous proteins with no evidence of functional shift are heterotachous. Similarly, it appears that switches in substitution rate are as frequent when paralogous sequences are compared as when orthologous sequences are compared. Heterotachy, instead of being indicative of functional shift, may more generally reflect a less specific process related to the many intra- and inter-molecular interactions compatible with a range of more or less equally viable protein conformations. These interactions will lead to different constraints on the nature of the primary sequences, consistent with theories suggesting the non-independence of substitutions in proteins. However, a specific type of amino acid variation might constitute a good indicator of functional divergence: substitutions occurring at positions that are generally slowly evolving. Such substitutions at constrained sites are indeed much more frequent soon after gene duplication. The identification and analysis of these sites by complementing structural information with evolutionary data may represent a promising direction for future studies dealing with the functional characterization of an ever increasing number of multi-gene families identified by complete genome analysis.

18.
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and a simultaneous superposition (multiple) of all the members has also been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well and variations are seen only in the low sequence similarity cases, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.

19.
Structural genomics projects are producing many three-dimensional structures of proteins that have been identified only from their gene sequences. It is therefore important to develop computational methods that will predict sites involved in productive intermolecular interactions that might give clues about functions. Techniques based on evolutionary conservation of amino acids have the advantage over physicochemical methods in that they are more general. However, the majority of techniques neither use all available structural and sequence information, nor are able to distinguish between evolutionary restraints that arise from the need to maintain structure and those that arise from function. Three methods to identify evolutionary restraints on protein sequence and structure are described here. The first identifies those residues that have a higher degree of conservation than expected: this is achieved by comparing for each amino acid position the sequence conservation observed in the homologous family of proteins with the degree of conservation predicted on the basis of amino acid type and local environment. The second uses information theory to identify those positions where environment-specific substitution tables make poor predictions of the overall amino acid substitution pattern. The third method identifies those residues that have highly conserved positions when three-dimensional structures of proteins in a homologous family are superposed. The scores derived from these methods are mapped onto the protein three-dimensional structures and contoured, allowing identification of clusters of residues with strong evolutionary restraints that are sites of interaction in proteins involved in a variety of functions. Our method differs from other published techniques by making use of structural information to identify restraints that arise from the structure of the protein and differentiating these restraints from others that derive from intermolecular interactions that mediate functions in the whole organism.

20.
Arf proteins are important regulators of cellular traffic and the founding members of an expanding family of homologous proteins and genomic sequences. They depart from other small GTP-binding proteins by a unique structural device, which we call the 'interswitch toggle', that implements front–back communication from the N-terminus to the nucleotide binding site. Here we define the sequence and structural determinants that propagate information across the protein and identify them in all of the Arf family proteins other than Arl6 and Arl4/Arl7. The positions of these determinants lead us to propose that Arf family members with the interswitch toggle device are activated by a bipartite mechanism acting on opposite sides of the protein. The presence of this communication device might provide a more useful basis for unifying Arf homologs as a family than do the cellular functions of these proteins, which are mostly unrelated. We review available genomic sequences and functional data from this perspective, and identify a novel subfamily that we call Arl8.
