Similar articles
20 similar articles found
1.
MOTIVATION: Most proteins have evolved to perform specific functions that depend on the adoption of well-defined three-dimensional (3D) structures. Specific patterns of conserved residues are frequently observed in the amino acid sequences of divergently evolved proteins; these may reflect evolutionary constraints arising both from the need to maintain tertiary structure and from the requirement to conserve residues more directly involved in function. Databases of such sequence patterns are valuable for identifying distant homologues, predicting function and studying evolution. RESULTS: A fully automated database of protein sequence patterns, the Functional Protein Sequence Pattern Database (FPSPD), has been derived from an analysis of the conserved residues that are predicted to be functional in structurally aligned homologous families in the HOMSTRAD database. Environment-dependent substitution tables, evolutionary trace analysis, solvent accessibility calculations and 3D structures were used to obtain the FPSPD. The method yielded 3584 patterns that are considered functional and 3049 patterns that are probably functional. FPSPD could be useful for assigning a protein to a homologous superfamily and thereby providing clues about its function. AVAILABILITY: FPSPD is available at http://www-cryst.bioc.cam.ac.uk/~fpspd/
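As a minimal illustration of what a conserved-residue sequence pattern looks like, the Python sketch below turns an ungapped toy alignment into a PROSITE-style pattern, keeping a residue only where it is conserved in a chosen fraction of the sequences. It is not the FPSPD procedure, which also draws on environment-dependent substitution tables, evolutionary trace ranks and solvent accessibility; the function name conserved_pattern, the min_fraction threshold and the toy alignment are hypothetical.

```python
from collections import Counter


def conserved_pattern(alignment, min_fraction=1.0):
    """Build a PROSITE-style pattern from equal-length aligned sequences.

    A column keeps its most common residue if that residue (not a gap)
    occurs in at least `min_fraction` of the sequences; otherwise it is
    written as 'x'. Consecutive 'x' columns are collapsed into x(n).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    tokens = []
    for col in range(length):
        residue, count = Counter(seq[col] for seq in alignment).most_common(1)[0]
        conserved = residue != "-" and count / len(alignment) >= min_fraction
        tokens.append(residue if conserved else "x")

    # Collapse runs of variable positions, e.g. x-x -> x(2).
    parts, i = [], 0
    while i < len(tokens):
        if tokens[i] != "x":
            parts.append(tokens[i])
            i += 1
            continue
        j = i
        while j < len(tokens) and tokens[j] == "x":
            j += 1
        parts.append("x" if j - i == 1 else f"x({j - i})")
        i = j
    return "-".join(parts)


family = ["GDSGGP", "GDAGGP", "GESGGP"]  # toy alignment, not FPSPD data
print(conserved_pattern(family))                    # G-x(2)-G-G-P
print(conserved_pattern(family, min_fraction=0.6))  # G-D-S-G-G-P
```

Lowering min_fraction trades specificity for sensitivity: more positions become fixed residues, so the pattern matches fewer, more closely related sequences.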

2.
The ever-increasing speed of DNA sequencing widens the gap between the number of known gene products and knowledge of their function and structure. Proper annotation of protein sequences is therefore crucial if the missing information is to be deduced from sequence‐based similarity comparisons. These comparisons become exceedingly difficult as pairwise sequence identities drop to very low values. To improve the accuracy of domain identification, we exploit the fact that the three‐dimensional structures of domains are much more conserved than their sequences. Based on structure‐anchored multiple sequence alignments of low-identity homologues, we constructed 850 structure‐anchored hidden Markov models (saHMMs), each representing one domain family. Because the saHMMs are highly family-specific, they can assign a domain to its correct family and clearly distinguish it from domains belonging to other families, even within the same superfamily. This task is not trivial and becomes particularly difficult if the unknown domain is only distantly related to the other domain sequences in the family. In a search of full-length protein sequences, each harbouring at least one domain as defined by the Structural Classification of Proteins database (SCOP) version 1.71, against the saHMM database based on SCOP version 1.69, we achieve an accuracy of 99.0%. All of the few hits outside the correct family fall within the correct superfamily. Compared with Pfam_ls HMMs, the saHMMs achieve about 11% higher coverage. A comparison with BLAST and PSI‐BLAST demonstrates that the saHMMs consistently make fewer errors per query at a given coverage. Within our recommended E‐value range, the same holds for a comparison with SUPERFAMILY. Furthermore, we were able to annotate 232 human proteins labelled “unknown” in the NCBI protein database with 530 nonoverlapping domains belonging to 102 different domain families.
Our results demonstrate that the saHMM database is a versatile and reliable tool for identifying domains in protein sequences. With the aid of saHMMs, homology at the family level can be assigned even for distantly related sequences. Because of the way the saHMMs are constructed, every hit they provide is associated with a high-quality crystal structure. The saHMM database can be accessed via the FISH server at http://babel.ucmp.umu.se/fish/ . Proteins 2009. © 2008 Wiley‐Liss, Inc.
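The accuracy, coverage and errors-per-query figures above depend on how hits are counted at an E-value cutoff. The Python sketch below shows one plausible bookkeeping under assumed definitions (accuracy as the fraction of accepted hits in the true family, coverage as the fraction of queries with at least one correct accepted hit). The Hit record, the evaluate function and the search results are hypothetical and are not the saHMM benchmark code.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    predicted_family: str
    true_family: str
    evalue: float


def evaluate(hits, n_queries, max_evalue):
    """Return (accuracy, coverage, errors_per_query) at an E-value cutoff.

    accuracy: fraction of accepted hits assigned to the true family
    coverage: fraction of queries with at least one correct accepted hit
    errors_per_query: accepted wrong-family hits divided by number of queries
    """
    accepted = [h for h in hits if h.evalue <= max_evalue]
    correct = [h for h in accepted if h.predicted_family == h.true_family]
    accuracy = len(correct) / len(accepted) if accepted else 0.0
    coverage = len({h.query for h in correct}) / n_queries
    errors_per_query = (len(accepted) - len(correct)) / n_queries
    return accuracy, coverage, errors_per_query


# Hypothetical search results; family labels are SCOP-style placeholders.
hits = [
    Hit("q1", "a.1.1.1", "a.1.1.1", 1e-10),
    Hit("q1", "a.1.1.2", "a.1.1.1", 0.5),
    Hit("q2", "b.1.1.1", "b.1.1.1", 1e-3),
    Hit("q3", "c.1.1.1", "c.1.1.2", 1e-4),
]
print(evaluate(hits, n_queries=4, max_evalue=0.01))  # (0.6666666666666666, 0.5, 0.25)
```

Sweeping max_evalue over a range of cutoffs and plotting errors per query against coverage gives the kind of comparison curve the abstract describes.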

3.
4.
The overall function of a multi‐domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment‐based methods commonly use domain‐level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains and of the domain‐linker regions in a protein, and therefore cannot classify multi‐domain proteins as a whole. An alignment‐free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi‐domain protein sequences without requiring the definition of domain boundaries or of the sequential order of domains. With this method we aim to achieve a biologically meaningful classification scheme for multi‐domain protein sequences. In this article, CLAP‐based classification has been explored on 5 datasets of multi‐domain proteins, and we present detailed analyses for proteins containing (1) a tyrosine phosphatase domain and (2) an SH3 domain. At the domain level, the CLAP‐based classification scheme produced a clustering similar to that obtained from an alignment‐based method. CLAP‐based clusters obtained for the full‐length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi‐domain proteins could be classified effectively by considering full‐length sequences, without first identifying the domains in the sequence.
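CLAP's own scoring scheme is not described here, so the sketch below swaps in a generic alignment-free measure, cosine similarity of k-mer counts, only to show why such comparisons need neither domain boundaries nor domain order: reordering two toy domains changes only the k-mers that span the junction. The functions, k = 3 and the sequences are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cosine(a, b, k=3):
    """Cosine similarity of k-mer count vectors (1.0 = identical composition)."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    dot = sum(pa[m] * pb[m] for m in pa.keys() & pb.keys())
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return dot / norm if norm else 0.0


# Toy "domains": swapping their order changes only the junction k-mers.
d1, d2 = "MKTAYIAKQR", "GSHMLEDPVW"
print(round(kmer_cosine(d1 + d2, d2 + d1), 3))  # 0.889
print(kmer_cosine(d1 + d2, "P" * 20))            # 0.0
```

Because the score ignores where each k-mer sits, it is order-independent by design, which is the property that matters for comparing full-length multi-domain proteins.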

5.
Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variation. Generally, top‐ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first in 110 proteins, for which the overlap between top‐ranked residues and actual functional sites rose by 8% in significance. Then, on a structural proteomic scale, optimized ET led to better 3D structure‐function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with sensitivity improved from 40% to 53% and positive predictive value from 93% to 94%. This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selection for comparative analysis and, via ET, for better prediction of functional sites and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
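To make the idea of rank continuity concrete, the Python sketch below ranks alignment columns by conservation and then scores how often top-ranked positions have a top-ranked sequence neighbour. It is a simplification under stated assumptions: per-column Shannon entropy stands in for the tree-based evolutionary trace rank, only sequence (not structural) neighbours are considered, and the functions and toy alignment are hypothetical rather than the published continuity functions.

```python
from collections import Counter
from math import log


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def rank_positions(alignment):
    """Column indices ordered from most to least conserved (lowest entropy first)."""
    length = len(alignment[0])
    entropy = [column_entropy([seq[i] for seq in alignment]) for i in range(length)]
    return sorted(range(length), key=lambda i: (entropy[i], i))


def sequence_continuity(top_positions):
    """Fraction of top-ranked positions with a top-ranked sequence neighbour."""
    top = set(top_positions)
    if not top:
        return 0.0
    return sum(1 for i in top if i - 1 in top or i + 1 in top) / len(top)


aln = ["ACDEFG", "ACDKLG", "ACDRMG"]  # toy alignment
top4 = rank_positions(aln)[:4]
print(top4)                       # [0, 1, 2, 5]
print(sequence_continuity(top4))  # 0.75
```

A structural version would replace the i - 1 / i + 1 test with a residue contact map derived from 3D coordinates.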

6.
In the postgenomic era it is essential that protein sequences are annotated correctly to help in the assignment of their putative functions. Over 1300 proteins in current protein sequence databases are predicted to contain a PAS domain on the basis of amino acid sequence alignments. One problem with the current annotation of the PAS domain is that it exhibits limited similarity at the amino acid sequence level. For sequences with such low similarity it is therefore essential to use profile hidden Markov model searches, as in the PFAM database, to identify PAS domain-containing proteins. From recent 3D X-ray and NMR structures, however, PAS domains appear to have a conserved 3D fold, as shown here by structural alignment of six representative 3D structures from the PDB database. Large-scale modelling of the PAS sequences from the PFAM database against the 3D structures of these six structural prototypes was performed. All 3D models generated (> 5700) were evaluated using ProsaII. We conclude from our large-scale modelling studies that the PAS and PAC motifs (which are defined separately in the PFAM database) are directly linked and that these two motifs together form the PAS fold. The existing subdivision into PAS and PAC motifs, as used by the PFAM and SMART databases, appears to be caused by major sequence differences in the region connecting these two motifs. This region, as has been shown by Gardner and coworkers for human PAS kinase (Amezcua, C.A., Harper, S.M., Rutter, J. & Gardner, K.H. (2002) Structure 10, 1349-1361, [1]), is very flexible and adopts different conformations depending on the bound ligand. Some PAS sequences present in the PFAM database did not produce a good structural model, even after realignment using a structure-based alignment method, suggesting that these representatives are unlikely to have a fold resembling any of the structural prototypes of the PAS domain superfamily.

7.
Protein kinases, which phosphorylate Ser/Thr/Tyr residues in many cellular proteins, exert tight control over the biological functions of those proteins. They constitute the largest protein family in most eukaryotic species. When classified by sequence similarity in their catalytic domains, protein kinases cluster into subfamilies that share gross functional properties. Many protein kinases are associated with, or covalently tethered to, domains that serve as adapter or regulatory modules, aiding substrate recruitment and specificity, and also serving as scaffolds. Hence the modular organisation of protein kinases provides a guide to their functional and molecular properties. Analyses of the genomic repertoires of protein kinases in eukaryotes have revealed a wide spectrum of domain organisation across the various kinase subfamilies. The occurrence of organism-specific novel domain combinations suggests functional diversity achieved by protein kinases in order to regulate a variety of biological processes. In addition, the domain architecture of protein kinases revealed the existence of hybrid protein kinase subfamilies and their emerging roles in signaling in eukaryotic organisms. In this review we discuss the repertoire of non-kinase domains tethered to multi-domain kinases in the metazoans. Similarities and differences in the domain architectures of protein kinases in these organisms indicate conserved and unique features that are critical to functional specialization.

8.
Prediction of amino acid sequence from structure
We have developed a method for the prediction of an amino acid sequence that is compatible with a three-dimensional backbone structure. Using only a backbone structure of a protein as input, the algorithm is capable of designing sequences that closely resemble natural members of the protein family to which the template structure belongs. In general, the predicted sequences are shown to have multiple sequence profile scores that are dramatically higher than those of random sequences, and sometimes better than some of the natural sequences that make up the superfamily. As anticipated, highly conserved but poorly predicted residues are often those that contribute to the functional rather than structural properties of the protein. Overall, our analysis suggests that statistical profile scores of designed sequences are a novel and valuable figure of merit for assessing and improving protein design algorithms.
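The profile-score comparison described above can be sketched as follows: build a log-odds position-specific scoring matrix from a family alignment and compare a candidate sequence's score with the scores of random sequences. The uniform background, the additive pseudocount, the toy family and the function names are all assumptions for illustration; this is not the authors' design algorithm or scoring code.

```python
import random
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds profile vs. a uniform background, with additive pseudocounts."""
    n, background = len(alignment), 1 / len(AMINO_ACIDS)
    denom = n + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        pssm.append({aa: log((column.count(aa) + pseudocount) / denom / background)
                     for aa in AMINO_ACIDS})
    return pssm


def profile_score(pssm, seq):
    """Sum of per-position log-odds scores for an ungapped sequence."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match profile length")
    return sum(row[aa] for row, aa in zip(pssm, seq))


def random_scores(pssm, n, seed=0):
    """Scores of n uniformly random sequences, as a null distribution."""
    rng = random.Random(seed)
    return [profile_score(pssm, "".join(rng.choice(AMINO_ACIDS) for _ in pssm))
            for _ in range(n)]


family = ["MKVLA", "MKILA", "MRVLA"]  # toy family alignment
pssm = build_pssm(family)
designed = "MKVLA"                    # stands in for a designed sequence
null = random_scores(pssm, 1000)
print(profile_score(pssm, designed) >= max(null))  # True
```

In a real assessment the null distribution would more likely come from shuffled or natural non-family sequences with a realistic background composition.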

9.
Selengut JD. Biochemistry, 2001, 40(42): 12704-12711
MDP-1 is a eukaryotic magnesium-dependent acid phosphatase with little sequence homology to previously characterized phosphatases. The presence of a conserved motif (Asp-X-Asp-X-Thr) in the N terminus of MDP-1 suggested a relationship to the haloacid dehalogenase (HAD) superfamily, which contains a number of magnesium-dependent acid phosphatases. These phosphatases use an aspartate nucleophile and contain a number of conserved active-site residues and hydrophobic patches, which can be plausibly aligned with conserved residues in MDP-1. Seven site-specific point mutants of MDP-1 were produced by changing the catalytic aspartate, serine, and lysine residues to asparagine or glutamate, alanine, and arginine, respectively. The activity of these mutants confirms the assignment of MDP-1 as a member of the HAD superfamily. Detailed comparison of 15 MDP-1 sequences from various organisms with other HAD superfamily sequences suggests that MDP-1 is not closely related to any particular member of the superfamily. The crystal structures of several HAD family enzymes identify a domain proximal to the active site that is responsible for important interactions with low-molecular-weight substrates. The absence from MDP-1 of this domain, or of any other that might perform the same function, suggests an "open" active site capable of interacting with large substrates such as proteins. This suggestion was confirmed experimentally by demonstrating that MDP-1 can catalyze the dephosphorylation of tyrosine-phosphorylated proteins.
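The Asp-X-Asp-X-Thr motif that first pointed MDP-1 towards the HAD superfamily can be located with a simple regular expression. The sketch below is a minimal scanner, assuming one-letter residue codes; the lookahead lets overlapping matches be reported, and the test sequences are hypothetical rather than MDP-1 itself.

```python
import re

# Asp-X-Asp-X-Thr; the lookahead lets overlapping occurrences be reported.
DXDXT = re.compile(r"(?=(D.D.T))")


def find_dxdxt(seq):
    """Return (1-based start, matched motif) for every D-x-D-x-T occurrence."""
    return [(m.start() + 1, m.group(1)) for m in DXDXT.finditer(seq)]


print(find_dxdxt("MKLVFDLDGTLW"))  # [(6, 'DLDGT')]
print(find_dxdxt("DDDDTT"))        # [(1, 'DDDDT'), (2, 'DDDTT')]
```

A motif this short occurs by chance in many unrelated proteins, which is why the abstract backs the assignment with alignment of other conserved residues and with mutagenesis.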

10.
In multi‐domain proteins, the domains typically run end‐to‐end, that is, one domain follows the C‐terminus of another domain. However, approximately 10% of multi‐domain proteins are formed by insertion of one domain sequence into that of another domain. Detecting such insertions within protein sequences is a fundamental challenge in structural biology. The haloacid dehalogenase superfamily (HADSF) serves as a challenging model system wherein a variable cap domain (~5–200 residues in length) accessorizes the ubiquitous Rossmann‐fold core domain, with variations in insertion site and topology corresponding to different classes of cap types. Herein, we describe a comprehensive computational strategy, CapPredictor, for determining large, variable domain insertions in protein sequences. Using a novel sequence‐alignment algorithm in conjunction with a structure‐guided sequence profile from 154 core‐domain‐only structures, more than 40,000 HADSF member sequences were assigned cap types. The resulting data set afforded insight into HADSF evolution. Notably, a similar distribution of cap‐type classes across different phyla was observed, indicating that all cap types existed in the last universal common ancestor. In addition, comparative analyses of the predicted cap‐type and functional assignments showed that different cap types carry out similar chemistries. Thus, while cap domains play a role in substrate recognition and chemical reactivity, cap‐type does not strictly define functional class. Through this example, we have shown that CapPredictor is an effective new tool for the study of form and function in protein families where domain insertion occurs. Proteins 2014; 82:1896–1906. © 2014 Wiley Periodicals, Inc.
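CapPredictor's alignment algorithm is not reproduced here. As a hedged illustration of the final step only, the sketch below assumes a pairwise alignment between a core-domain profile row and a query has already been computed, and reports query segments that sit in core-gap columns, i.e. candidate cap insertions. The function name, the min_length cutoff and the toy alignment are hypothetical.

```python
def find_insertions(core_aligned, query_aligned, min_length=5):
    """Query segments inserted relative to a core-domain alignment row.

    Both strings are rows of one pairwise alignment ('-' = gap). A run of
    columns where the core has a gap and the query has residues is an
    insertion; returns 1-based, inclusive (start, end) query coordinates
    for runs of at least `min_length` residues.
    """
    if len(core_aligned) != len(query_aligned):
        raise ValueError("alignment rows must have equal length")
    insertions, query_pos, start, end = [], 0, None, None
    for core_char, query_char in zip(core_aligned, query_aligned):
        if query_char != "-":
            query_pos += 1
        if core_char == "-" and query_char != "-":
            if start is None:
                start = query_pos
            end = query_pos
        elif core_char != "-" and start is not None:
            # A core residue closes the current insertion run.
            if end - start + 1 >= min_length:
                insertions.append((start, end))
            start = None
    if start is not None and end - start + 1 >= min_length:
        insertions.append((start, end))  # insertion running to the C-terminus
    return insertions


print(find_insertions("MKV-----LAGD", "MKVCAPDELAGD"))  # [(4, 8)]
```

The min_length cutoff separates short loop insertions from domain-sized ones; the abstract's lower bound of about 5 residues for caps suggests it must be kept small.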

11.
12.
Biotin carboxyl carrier protein (BCCP) is the small biotinylated subunit of Escherichia coli acetyl-CoA carboxylase, the enzyme that catalyzes the first committed step of fatty acid synthesis. E. coli BCCP is a member of a large family of protein domains modified by covalent attachment of biotin. In most biotinylated proteins, the biotin moiety is attached to a lysine residue located about 35 residues from the carboxyl terminus of the protein, which lies in the center of a strongly conserved sequence that forms a tightly folded anti-parallel beta-barrel structure. Located upstream of the conserved biotinoyl domain sequence are proline/alanine-rich sequences of varying lengths, which have been proposed to act as flexible linkers. In E. coli BCCP, this putative linker extends for about 42 residues with over half of the residues being proline or alanine. I report that deletion of the 30 linker residues located adjacent to the biotinoyl domain resulted in a BCCP species that was defective in function in vivo, although it was efficiently biotinylated. Expression of this BCCP species failed to restore normal growth and fatty acid synthesis to a temperature-sensitive E. coli strain that lacks BCCP when grown at nonpermissive temperatures. In contrast, replacement of the deleted BCCP linker with a linker derived from E. coli pyruvate dehydrogenase gave a chimeric BCCP species that had normal in vivo function. Expression of BCCPs having deletions of various segments of the linker region of the chimeric protein showed that some deletions of up to 24 residues had significant or full biological activity, whereas others had very weak or no activity. The inactive deletion proteins all lacked an APAAAAA sequence located adjacent to the tightly folded biotinyl domain, whereas deletions that removed only upstream linker sequences remained active. 
Deletions within the linker of the wild type BCCP protein also showed that the residues adjacent to the tightly folded domain play an essential role in protein function, although in this case some proteins with deletions within this region retained activity. Retention of activity was due to fusion of the domain to upstream sequences. These data provide new evidence for the functional and structural similarities of biotinylated and lipoylated proteins and strongly support a common evolutionary origin of these enzyme subunits.
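The two sequence features tested above, the high proline/alanine content of the linker and the APAAAAA stretch next to the biotinoyl domain, can be checked with a few lines of Python. This is a minimal sketch assuming the folded domain lies C-terminal to the linker; the function names, the 10-residue window and the linker sequence are hypothetical and are not the E. coli BCCP sequence.

```python
def pro_ala_fraction(segment):
    """Fraction of residues in a segment that are Pro or Ala."""
    return sum(aa in "PA" for aa in segment) / len(segment) if segment else 0.0


def motif_next_to_domain(linker, motif="APAAAAA", window=10):
    """True if `motif` lies within the last `window` residues of the linker,
    i.e. next to a folded domain assumed to follow it C-terminally."""
    return motif in linker[-window:]


linker = "MEAPAPKSEAPAAAAA"  # hypothetical linker, not the E. coli BCCP sequence
print(pro_ala_fraction(linker))           # 0.6875
print(motif_next_to_domain(linker))       # True
print(motif_next_to_domain(linker[:-7]))  # False
```

Removing the last seven residues mimics, in spirit, the inactive deletions described above, which all lacked the stretch adjacent to the folded domain.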

13.
We describe a novel approach for inferring functional relationships between proteins by detecting sequence and spatial patterns of protein surfaces. Well-formed concave surface regions in the form of pockets and voids are examined to identify similarity relationships that might be directly related to protein function. We first exhaustively identify and analytically measure all 910,379 surface pockets and interior voids on 12,177 protein structures from the Protein Data Bank. The similarity of the residue patterns forming pockets and voids is then assessed in sequence, in spatial arrangement, and in orientational arrangement. Statistical significance, in the form of E- and p-values, is then estimated for each of the three types of similarity measurement. Our method is fully automated and can be used without input of query patterns. It does not assume any prior knowledge of the functional residues of a protein, and can detect similarity based on both small and large surface patterns. It also tolerates, to some extent, conformational flexibility of functional sites. We show with examples that this method can detect functional relationships with specificity for members of the same protein family and superfamily, as well as remotely related functional surfaces from proteins with different folds. We envision that this method can be used for discovering novel functional relationships between protein surfaces, for functional annotation of protein structures with unknown biological roles, and for further inquiry into the evolutionary origins of structural elements important for protein function.
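The abstract does not say how the E- and p-values are estimated. One common option, used here purely as an assumption, is an empirical null: the p-value is the fraction of background comparison scores at least as high as the observed score, and the E-value scales it by the number of comparisons made. The functions and background scores below are hypothetical.

```python
def empirical_p_value(score, background_scores):
    """P(background score >= score), with a +1 correction so p is never 0."""
    exceed = sum(1 for s in background_scores if s >= score)
    return (exceed + 1) / (len(background_scores) + 1)


def e_value(score, background_scores, n_comparisons):
    """Expected number of background hits scoring at least `score`."""
    return empirical_p_value(score, background_scores) * n_comparisons


background = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical null similarity scores
print(empirical_p_value(8.5, background))  # 0.2
```

With a realistic database the background would hold scores from many unrelated pocket pairs, and a fitted distribution (rather than raw counts) would be needed to resolve very small p-values.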

14.
Libraries of de novo proteins provide an opportunity to explore the structural and functional potential of biological molecules that have not been biased by billions of years of evolutionary selection. Given the enormity of sequence space, a rational approach to library design is likely to yield a higher fraction of folded and functional proteins than a stochastic sampling of random sequences. We previously investigated the potential of library design by binary patterning of hydrophobic and hydrophilic amino acids. The structure of the most stable protein from a binary patterned library of de novo 4-helix bundles was solved previously and shown to be consistent with the design. One structure, however, cannot fully assess the potential of the design strategy, nor can it account for differences in the stabilities of individual proteins. To more fully probe the quality of the library, we now report the NMR structure of a second protein, S-836. Protein S-836 proved to be a 4-helix bundle, consistent with design. The similarity between the two solved structures reinforces previous evidence that binary patterning can encode stable, 4-helix bundles. Despite their global similarities, the two proteins have cores that are packed at different degrees of tightness. The relationship between packing and dynamics was probed using the Modelfree approach, which showed that regions containing a high frequency of chemical exchange coincide with less well-packed side chains. These studies show (1) that binary patterning can drive folding into a particular topology without the explicit design of residue-by-residue packing, and (2) that within a superfamily of binary patterned proteins, the structures and dynamics of individual proteins are modulated by the identity and packing of residues in the hydrophobic core.
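Binary patterning fixes only the polar/nonpolar class at each position and leaves the residue identity free. The sketch below samples library members from such a pattern and checks that they conform to it. The residue alphabets, the 14-residue pattern (nonpolar positions spaced i, i+3, i+4, i+7, as in an amphipathic helix) and the function names are assumptions for illustration, not the published library design.

```python
import random

# Assumed residue alphabets for illustration: polar (P) and nonpolar (N).
POLAR = "EKQDNH"
NONPOLAR = "FILMV"


def sample_from_pattern(pattern, rng):
    """Draw one sequence that follows a binary P/N pattern."""
    if set(pattern) - {"P", "N"}:
        raise ValueError("pattern may contain only 'P' and 'N'")
    return "".join(rng.choice(POLAR if c == "P" else NONPOLAR) for c in pattern)


def matches_pattern(seq, pattern):
    """True if each residue belongs to the class its pattern position demands."""
    return len(seq) == len(pattern) and all(
        aa in (POLAR if c == "P" else NONPOLAR) for aa, c in zip(seq, pattern))


HELIX_PATTERN = "PNPPNNPPNPPNNP"  # hypothetical; nonpolar at i, i+3, i+4, i+7, ...
rng = random.Random(42)
library = [sample_from_pattern(HELIX_PATTERN, rng) for _ in range(3)]
print(all(matches_pattern(s, HELIX_PATTERN) for s in library))  # True
```

Every member shares the same hydrophobic/hydrophilic layout yet differs in residue identity, which is exactly the variation the abstract links to differences in core packing and dynamics.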

15.
Southampton virus (SHV) is a member of the Norwalk-like viruses (NLVs), one of four genera of the family Caliciviridae. The genome of SHV contains three open reading frames (ORFs). ORF 1 encodes a polyprotein that is autocatalytically processed into six proteins, one of which is p41. p41 shares sequence motifs with protein 2C of picornaviruses and superfamily 3 helicases. We have expressed p41 of SHV in bacteria. Purified p41 exhibited nucleoside triphosphate (NTP)-binding and NTP hydrolysis activities. The NTPase activity was not stimulated by single-stranded nucleic acids. SHV p41 had no detectable helicase activity. Protein sequence comparison between the consensus sequences of NLV p41 and enterovirus protein 2C revealed regions of high similarity. According to secondary structure prediction, the conserved regions were located within a putative central domain of alpha helices and beta strands. This study reveals for the first time an NTPase activity associated with a calicivirus-encoded protein. Based on enzymatic properties and sequence information, a functional relationship between NLV p41 and enterovirus 2C is discussed in regard to the role of 2C-like proteins in virus replication.
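Regions of high similarity between two aligned consensus sequences, such as NLV p41 and enterovirus 2C, can be located with a sliding-window identity profile. The sketch below assumes the sequences are already aligned to equal length; the window size, threshold, function names and toy fragments are hypothetical.

```python
def window_identity(seq_a, seq_b, window=10):
    """Percent identity in every window of two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as identities.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    identities = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        same = sum(1 for a, b in pairs if a == b and a != "-")
        identities.append(100.0 * same / window)
    return identities


def high_similarity_windows(identities, threshold=50.0):
    """Start indices (0-based) of windows at or above the identity threshold."""
    return [i for i, value in enumerate(identities) if value >= threshold]


a = "GPPGVGKSTL"  # toy aligned fragments, not real p41 or 2C sequences
b = "GAPGSGKSTL"
scores = window_identity(a, b, window=5)
print(scores)                                            # [60.0, 60.0, 80.0, 80.0, 80.0, 100.0]
print(high_similarity_windows(scores, threshold=80.0))  # [2, 3, 4, 5]
```

Overlapping high-scoring windows can then be merged into contiguous regions and compared with the predicted secondary structure.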

16.
Hemerythrin‐like proteins have generally been studied for their ability to reversibly bind oxygen through their binuclear nonheme iron centers. However, in recent years, it has become increasingly evident that some members of the hemerythrin‐like superfamily also participate in many other biological processes. For instance, the binuclear nonheme iron site of YtfE, a hemerythrin‐like protein involved in the repair of iron centers in Escherichia coli, catalyzes the reduction of nitric oxide to nitrous oxide, and the human F‐box/LRR‐repeat protein 5, which contains a hemerythrin‐like domain, is involved in intracellular iron homeostasis. Furthermore, structural data on hemerythrin‐like domains from two proteins of unknown function, PF0695 from Pyrococcus furiosus and NMB1532 from Neisseria meningitidis, show that the cation‐binding sites, typical of hemerythrin, can be absent or be occupied by metal ions other than iron. To systematically investigate this functional and structural diversity of the hemerythrin‐like superfamily, we have collected hemerythrin‐like sequences from a database comprising fully sequenced proteomes and generated a cluster map based on their all‐against‐all pairwise sequence similarity. Our results show that the hemerythrin‐like superfamily comprises a large number of protein families which can be classified into three broad groups on the basis of their cation‐coordinating residues: (a) signal‐transduction and oxygen‐carrier hemerythrins (H‐HxxxE‐HxxxH‐HxxxxD); (b) hemerythrin‐like (H‐HxxxE‐H‐HxxxE); and, (c) metazoan F‐box proteins (H‐HExxE‐H‐HxxxE). Interestingly, all but two hemerythrin‐like families exhibit internal sequence and structural symmetry, suggesting that a duplication event may have led to the origin of the hemerythrin domain.
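The three coordinating-residue signatures quoted above can be turned into ordered regular expressions for a crude first-pass screen. This sketch assumes that each dash separates motif elements by a spacer of arbitrary length and that 'x' means any residue; in practice the coordinating positions come from an alignment, since a free-text regex over a whole sequence will produce false positives. The function names and the toy sequences are hypothetical.

```python
import re

# Coordinating-residue signatures as quoted in the abstract. Each dash-separated
# element must occur in order; the spacer between elements is assumed arbitrary.
SIGNATURES = {
    "signal-transduction/oxygen-carrier": "H-HxxxE-HxxxH-HxxxxD",
    "hemerythrin-like": "H-HxxxE-H-HxxxE",
    "metazoan F-box": "H-HExxE-H-HxxxE",
}


def signature_to_regex(signature):
    """'H-HxxxE' -> 'H.*?H...E': 'x' is any residue, '-' any spacer."""
    elements = [element.replace("x", ".") for element in signature.split("-")]
    return re.compile(".*?".join(elements))


def matching_groups(sequence):
    """Names of all signature groups whose ordered motif elements occur in sequence."""
    return [name for name, sig in SIGNATURES.items()
            if signature_to_regex(sig).search(sequence)]


toy1 = "MAHGGGGHLLAEGGGGHAAAHGGGGHAAAADGG"  # hypothetical test sequences
toy2 = "MAHGGHLLAEGGHGGHKKKE"
print(matching_groups(toy1))  # ['signal-transduction/oxygen-carrier']
print(matching_groups(toy2))  # ['hemerythrin-like']
```

Returning every matching group, rather than the first, makes ambiguous sequences visible instead of silently assigning them.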

17.
WD40‐repeat proteins are abundant and play important roles in forming protein complexes. The domain usually has seven WD40 repeats, which fold into a seven‐bladed β‐propeller in which each blade is a four‐stranded β‐sheet. An analysis of 20 WD40‐repeat proteins available in the Protein Data Bank reveals that each protein has at least one Asp‐His‐Ser/Thr‐Trp (D‐H‐S/T‐W) hydrogen‐bonded tetrad, and some proteins have up to six or seven such tetrads. The relative positions of the four residues in the tetrads are also found to be conserved. A sequence alignment analysis of 560 human WD40‐repeat protein sequences reveals very similar features, indicating that such tetrads may be a general feature of WD40‐repeat proteins. We carried out density functional theory calculations and found that these tetrads can lead to significant stabilization, including hydrogen‐bonding cooperativity. The hydrogen bond involving Trp is significant. These results lead us to propose that the tetrads may be critical to the stability and folding mechanism of these proteins. Proteins 2010. © 2009 Wiley‐Liss, Inc.

18.
Ward RE, Schweizer L, Lamb RS, Fehon RG. Genetics, 2001, 159(1): 219-228
Coracle is a member of the Protein 4.1 superfamily of proteins, whose members include Protein 4.1, the Neurofibromatosis 2 tumor suppressor Merlin, Expanded, the ERM proteins, protein tyrosine phosphatases, and unconventional myosins. Recent evidence suggests that members of this family participate in cell signaling events, including those that regulate cell proliferation and the cytoskeleton. Previously, we demonstrated that Coracle protein is localized to the septate junction in epithelial cells and is required for septate junction integrity. Loss of coracle function leads to defects in embryonic development, including failure in dorsal closure, and to proliferation defects. In addition, we determined that the N-terminal 383 amino acids define an essential functional domain possessing membrane-organizing properties. Here we investigate the full range of functions provided by this highly conserved domain and find that it is sufficient to rescue all embryonic defects associated with loss of coracle function. In addition, this domain is sufficient to rescue the reduced cell proliferation defect in imaginal discs, although it is incapable of rescuing null mutants to the adult stage. This result suggests the presence of a second functional domain within Coracle, a notion supported by molecular characterization of a series of coracle alleles.

19.
Nine proteins have been assigned to date to the superfamily of mammalian small heat shock proteins (sHsps): Hsp27 (HspB1, Hsp25), myotonic dystrophy protein kinase-binding protein (MKBP) (HspB2), HspB3, alphaA-crystallin (HspB4), alphaB-crystallin (HspB5), Hsp20 (p20, HspB6), cardiovascular heat shock protein (cvHsp [HspB7]), Hsp22 (HspB8), and HspB9. The most pronounced structural feature of sHsps is the alpha-crystallin domain, a conserved stretch of approximately 80 amino acid residues in the C-terminal half of the molecule. Using the alpha-crystallin domain of human Hsp27 as query in a BLAST search, we found sequence similarity with another mammalian protein, the sperm outer dense fiber protein (ODFP). ODFP occurs exclusively in the axoneme of sperm cells. Multiple alignment of human ODFP with the other human sHsps reveals that the primary structure of ODFP fits into the sequence pattern that is typical for this protein superfamily: alpha-crystallin domain (conserved), N-terminal domain (less conserved), central region (variable), and C-terminal tails (variable). In a phylogenetic analysis of 167 proteins of the sHsp superfamily, using Bayesian inference, mammalian ODFPs form a clade and are nested within previously identified sHsps, some of which have been implicated in cytoskeletal functions. Both the multiple alignment and the phylogeny suggest that ODFP is the 10th member of the superfamily of mammalian sHsps, and we propose to name it HspB10 in analogy with the other sHsps. The C-terminal tail of HspB10 has a remarkable low-complexity structure consisting of 10 repeats of the motif C-X-P. A BLAST search using the C-terminal tail as query revealed similarity with sequence elements in a number of Drosophila male sperm proteins, and mammalian type I keratins and cornifin-alpha. 
Taken together, the following findings suggest a specialized role for HspB10 in the cytoskeleton: (1) its exclusive location in sperm cell tails, (2) its phylogenetic relationship with sHsps implicated in cytoskeletal functions, and (3) its partial similarity with cytoskeletal proteins.
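Counting the C-X-P repeats in a low-complexity tail is a short regular-expression task. The sketch below counts non-overlapping motifs and the longest back-to-back run; the function names are hypothetical, and the tail sequence is a stand-in, not the real HspB10 sequence.

```python
import re


def count_cxp(tail):
    """Number of non-overlapping C-X-P motifs (Cys, any residue, Pro)."""
    return len(re.findall(r"C.P", tail))


def longest_cxp_run(tail):
    """Longest run of back-to-back C-X-P units, counted in motifs."""
    runs = re.findall(r"(?:C.P)+", tail)
    return max((len(run) // 3 for run in runs), default=0)


tail = "GS" + "CKP" * 10 + "GSK"  # hypothetical tail, not the real HspB10 sequence
print(count_cxp(tail), longest_cxp_run(tail))  # 10 10
print(count_cxp("CKPCAPGGCRP"), longest_cxp_run("CKPCAPGGCRP"))  # 3 2
```

The run length distinguishes a true tandem repeat from scattered single motifs, which a plain count cannot do.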

20.
Proteins of the nucleic acid‐binding proteins superfamily perform such functions as processing, transport, storage, stretching, translation, and degradation of RNA. This is one of the 16 superfamilies containing the OB‐fold in protein structures. Here, we have analyzed the superfamily of nucleic acid‐binding proteins (more than 200,000 sequences) and found that it consists predominantly of proteins containing the cold shock DNA‐binding domain (ca. 131,000 protein sequences). Proteins containing the S1 domain make up 57% of the cold shock DNA‐binding domain family. Furthermore, we found that, among the proteins available in the UniProt database, the S1 domain occurs mainly in bacterial proteins (ca. 83%) rather than in eukaryotic and archaeal proteins. We have found that the number of S1 domain repeats in S1 domain‐containing proteins depends on taxonomic affiliation. All archaeal proteins contain one copy of the S1 domain, while the number of repeats in eukaryotic proteins varies between 1 and 15 and correlates with protein size. In bacterial proteins, the number of repeats is no more than 6, regardless of protein size. The large variation in the repeat number of the S1 domain, one of the structural variants of the OB‐fold, is a distinctive feature of S1 domain‐containing proteins. Proteins from the other families and superfamilies either have a single OB‐fold or show only slight variation in repeat number. On the whole, it can be supposed that the repeat number is vital for the multifunctional activity of S1 domain‐containing proteins. Proteins 2017; 85:602–613. © 2016 Wiley Periodicals, Inc.
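The per-taxon repeat statistics described above (the repeat range per kingdom and the correlation between repeat number and protein length) can be tabulated with a short Python sketch. The record format, the function names and the data are hypothetical, not UniProt results; Pearson's r is used as an assumed correlation measure.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def summarize_repeats(records):
    """records: iterable of (kingdom, protein_length, s1_repeat_count).

    Returns, per kingdom, the repeat range and the length/repeat correlation
    (None when either variable is constant, since r is then undefined).
    """
    by_kingdom = defaultdict(list)
    for kingdom, length, repeats in records:
        by_kingdom[kingdom].append((length, repeats))
    summary = {}
    for kingdom, rows in by_kingdom.items():
        lengths = [length for length, _ in rows]
        repeats = [count for _, count in rows]
        varies = len(set(lengths)) > 1 and len(set(repeats)) > 1
        summary[kingdom] = {
            "min": min(repeats),
            "max": max(repeats),
            "r": pearson(lengths, repeats) if varies else None,
        }
    return summary


records = [  # hypothetical records, not UniProt data
    ("Archaea", 250, 1), ("Archaea", 700, 1),
    ("Eukaryota", 300, 1), ("Eukaryota", 900, 4), ("Eukaryota", 1800, 9),
    ("Bacteria", 560, 1), ("Bacteria", 1200, 6),
]
for kingdom, stats in summarize_repeats(records).items():
    print(kingdom, stats)
```

Returning None for constant groups mirrors the archaeal case in the abstract, where every protein has a single S1 copy and a correlation is meaningless.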
