Similar articles
 20 similar articles found (search time: 15 ms)
1.
The lipocalins and fatty acid-binding proteins (FABPs) are two recently identified protein families that both function by binding small hydrophobic molecules. We have sought to clarify relationships within and between these two groups through an analysis of both structure and sequence. Within a similar overall folding pattern, we find large parts of the lipocalin and FABP structures to be quantitatively equivalent. The three largest structurally conserved regions within the lipocalin common core correspond to characteristic sequence motifs that we have used to determine the constitution of this family using an iterative sequence analysis procedure. This afforded a new interpretation of the family, which highlighted the difficulties of determining a comprehensive and coherent classification of the lipocalins. The first of the three conserved sequence motifs is also common to the FABPs and corresponds to a conserved structural element characteristic of both families. Similarities of structure and sequence within the two families suggest that they form part of a larger "structural superfamily"; we have christened this overall group the calycins to reflect the cup-shaped structure of its members.

2.
MOTIVATION: Protein families can be defined based on structure or sequence similarity. We wanted to compare two protein family databases, one based on structural and one on sequence similarity, to investigate to what extent they overlap, the similarity in definition of corresponding families, and to create a list of large protein families with unknown structure as a resource for structural genomics. We also wanted to increase the sensitivity of fold assignment by exploiting protein family HMMs. RESULTS: We compared Pfam, a protein family database based on sequence similarity, to Scop, which is based on structural similarity. We found that 70% of the Scop families exist in Pfam while 57% of the Pfam families exist in Scop. Most families that occur in both databases correspond well to each other, but in some cases they are different. Such cases highlight situations in which structure and sequence approaches differ significantly. The comparison enabled us to compile a list of the largest families that do not occur in Scop; these are suitable targets for structure prediction and determination, and may be useful to guide projects in structural genomics. It can be noted that 13 out of the 20 largest protein families without a known structure are likely transmembrane proteins. We also exploited Pfam to increase the sensitivity of detecting homologs of proteins with known structure, by comparing query sequences to Pfam HMMs that correspond to Scop families. For SWISSPROT+TREMBL, this yielded an increase in fold assignment from 31% to 42% compared to using FASTA only. This method assigned a structure to 22% of the proteins in Saccharomyces cerevisiae, 24% in Escherichia coli, and 16% in Methanococcus jannaschii.
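
The two overlap figures in this abstract (70% of Scop families in Pfam, 57% of Pfam families in Scop) are measured against different denominators, which is why they need not agree. The sketch below illustrates that asymmetry on toy data. The family names, the mapping, and the function `overlap_fraction` are hypothetical placeholders, not real Pfam or Scop entries and not the authors' procedure.

```python
def overlap_fraction(families, counterparts):
    """Fraction of `families` with at least one counterpart in the other database."""
    families = list(families)
    if not families:
        return 0.0
    matched = sum(1 for fam in families if counterparts.get(fam))
    return matched / len(families)


# Hypothetical toy mapping: each sequence-based family points to the
# structure-based families it corresponds to (possibly none).
pfam_to_scop = {
    "pfam_A": {"scop_1"},
    "pfam_B": {"scop_1"},
    "pfam_C": set(),
    "pfam_D": {"scop_2"},
}
scop_families = ["scop_1", "scop_2", "scop_3"]

# Invert the mapping so the same question can be asked from the other side.
scop_to_pfam = {fam: set() for fam in scop_families}
for pfam_fam, scop_hits in pfam_to_scop.items():
    for scop_fam in scop_hits:
        scop_to_pfam[scop_fam].add(pfam_fam)

pfam_in_scop = overlap_fraction(pfam_to_scop, pfam_to_scop)   # 3 of 4 families
scop_in_pfam = overlap_fraction(scop_families, scop_to_pfam)  # 2 of 3 families
print(f"Pfam families found in Scop: {pfam_in_scop:.0%}")
print(f"Scop families found in Pfam: {scop_in_pfam:.0%}")
```

Because several sequence families can map to one structural family (and vice versa), the two fractions generally differ even when the underlying correspondence is the same.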

3.
The lipocalin protein family: structural and sequence overview   Total citations: 20 (self-citations: 0; citations by others: 20)
Lipocalins are remarkably diverse at the sequence level yet have highly conserved structures. Most lipocalins share three characteristic conserved sequence motifs - the kernel lipocalins - while others are more divergent family members - the outlier lipocalins - typically sharing only one or two. This classification is a useful tool for analysing the family, and within these large sets are smaller groups sharing much higher levels of sequence similarity. The lipocalins are also part of a larger protein superfamily: the calycins, which includes the fatty acid binding proteins, avidins, a group of metalloproteinase inhibitors, and triabin. The superfamily is characterised by a similar structure (a repeated +1 topology beta-barrel) and by the conservation of a remarkable structural signature.

4.
Han LY, Cai CZ, Ji ZL, Cao ZW, Cui J, Chen YZ. Nucleic acids research, 2004, 32(21): 6437-6444
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is desirable to explore methods that are not based on sequence similarity. One approach is to assign the functional family of a protein to provide a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), for predicting protein functional family directly from sequence irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment. But its capability for assignment of distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here SVM is tested for functional family assignment of two groups of enzymes. One consists of 50 enzymes that have no homolog of known function from PSI-BLAST search of protein databases. The other contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

5.
Lysine acetylation is a well-studied post-translational modification on both histone and nonhistone proteins. More than 2000 acetylated proteins and 4000 lysine acetylation sites have been identified by large scale mass spectrometry or traditional experimental methods. Although over 20 lysine (K)-acetyl-transferases (KATs) have been characterized, which KAT is responsible for a given protein or lysine site acetylation is mostly unknown. In this work, we collected KAT-specific acetylation sites manually and analyzed sequence features surrounding the acetylated lysine of substrates from three main KAT families (CBP/p300, GCN5/PCAF, and the MYST family). We found that each of the three KAT families acetylates lysines with different sequence features. Based on these differences, we developed a computer program, the Acetylation Set Enrichment Based method, to predict which KAT families are responsible for acetylation of a given protein or lysine site. Finally, we evaluated the efficiency of our method, and experimentally detected four proteins that were predicted to be acetylated by two KAT families when one representative member of the KAT family is overexpressed. We conclude that our approach, combined with more traditional experimental methods, may be useful for identifying KAT families responsible for acetylated substrates proteome-wide.

6.
Pánek J, Eidhammer I, Aasland R. Proteins, 2005, 58(4): 923-934
Structural similarity among proteins is reflected in the distribution of hydropathicity along the amino acids in the protein sequence. Similarities in the hydropathy distributions are obvious for homologous proteins within a protein family. They also were observed for proteins with related structures, even when sequence similarities were undetectable. Here we present a novel method that employs the hydropathy distribution in proteins for identification of (sub)families in a set of (homologous) proteins. We represent proteins as points in a generalized hydropathy space, represented by vectors of specifically defined features. The features are derived from hydropathy of the individual amino acids. Projection of this space onto principal axes reveals groups of proteins with related hydropathy distributions. The groups identified correspond well to families of structurally and functionally related proteins. We found that this method accurately identifies protein families in a set of proteins, or subfamilies in a set of homologous proteins. Our results show that protein families can be identified by the analysis of hydropathy distribution, without the need for sequence alignment.
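
As an illustration of what a "hydropathy distribution along the sequence" can look like in practice, the sketch below computes a sliding-window hydropathy profile. It covers only that first step. The abstract does not name a hydropathy scale or define its features, so the Kyte-Doolittle scale, the window size, and the function `hydropathy_profile` are assumptions made here for illustration; the authors' feature vectors and principal-axis projection are not reproduced.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Mean hydropathy over each window of consecutive residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical toy sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("LLVVIIAADDEEKKRR", window=4)
print([round(x, 2) for x in profile])
```

Profiles of this kind can be compared between proteins without aligning their sequences, which is the property the abstract relies on.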

7.
8.
Protein–protein interactions (PPIs) are involved in diverse functions in a cell. To optimize functional roles of interactions, proteins interact with a spectrum of binding affinities. Interactions are conventionally classified into permanent and transient, where the former denotes tight binding between proteins that results in strong complexes, whereas the latter comprises relatively weak interactions that can dissociate after binding to regulate functional activity at specific time points. Knowing the type of interactions has significant implications for understanding the nature and function of PPIs. In this study, we constructed amino acid substitution models that capture mutation patterns at permanent and transient types of protein interfaces, which were found to be different with statistical significance. Using the substitution models, we developed a novel computational method that predicts permanent and transient protein binding interfaces (PBIs) in protein surfaces. Without knowledge of the interacting partner, the method uses a single query protein structure and a multiple sequence alignment of the sequence family. Using a large dataset of permanent and transient proteins, we show that our method, BindML+, performs very well in protein interface classification. A very high area under the curve (AUC) value of 0.957 was observed when predicted protein binding sites were classified. Remarkably, near-perfect accuracy was achieved with an AUC of 0.991 when actual binding sites were classified. The developed method will also be useful for protein design of permanent and transient PBIs. © Proteins 2013. © 2012 Wiley Periodicals, Inc.

9.
Ubiquitin-like domains are present, apart from ubiquitin-like proteins themselves, in many multidomain proteins involved in different signal transduction processes. The sequence conservation for all ubiquitin superfold family members is rather poor, even between subfamily members, leading to mistakes in sequence alignments using conventional sequence alignment methods. However, a correct alignment is essential, especially for in silico methods that predict binding partners on the basis of sequence and structure. In this study, using 3D-structural information we have generated and manually corrected sequence alignments for proteins of the five ubiquitin superfold subfamilies. On the basis of this alignment, we suggest domains for which structural information will be useful to allow homology modelling. In addition, we have analysed the energetic and electrostatic properties of ubiquitin-like domains in complex with various functional binding proteins using the protein design algorithm FoldX. On the basis of an in silico alanine-scanning mutagenesis, we provide a detailed binding epitope mapping of the hotspots of the ubiquitin domain fold, involved in the interaction with different domains and proteins. Finally, we provide a consensus fingerprint sequence that identifies all sequences described to belong to the ubiquitin superfold family. It is possible that the method that we describe may be applied to other domain families sharing a similar fold but having low levels of sequence homology.

10.
11.
Schultz J, Pils B. FEBS letters, 2002, 529(2-3): 179-182
N-Acetyl-beta-D-glucosaminidase (O-GlcNAcase) is a key enzyme in the posttranslational modification of intracellular proteins by O-linked N-acetylglucosamine (O-GlcNAc). Here, we show that this protein contains two catalytic domains, one homologous to bacterial hyaluronidases and one belonging to the GCN5-related family of acetyltransferases (GNATs). Using sequence and structural information, we predict that the GNAT homologous region contains the O-GlcNAcase activity. Thus, O-GlcNAcase is the first member of the GNAT family not involved in transfer of acetyl groups, adding a new mode of evolution to this large protein family. Comparison with solved structures of different GNATs led to a reliable structure prediction and mapping of residues involved in binding of the GlcNAc-modified proteins and catalysis.

12.
13.
Plasmodium vivax and Plasmodium knowlesi merozoites invade human erythrocytes that express Duffy blood group surface determinants. A soluble parasite protein of 135 kd binds specifically to a human Duffy antigen. Using antisera affinity purified on the 135 kd protein, we cloned a gene that encodes a member of a P. knowlesi family of erythrocyte binding proteins. The gene is a member of a family that includes three homologous genes located on separate chromosomes. Two genes are expressed as major membrane-bound products that give rise to soluble erythrocyte binding proteins: the 135 kd Duffy binding protein and a 138 kd protein that binds only rhesus erythrocytes. These different erythrocyte binding specificities may result from sequence divergence of the homologous genes. The Duffy receptor family is localized in micronemes, an organelle found in all organisms of the phylum Apicomplexa.

14.
15.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often follow (though not always) a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.
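
Per-position conservation of the kind discussed in this abstract can be estimated from a multiple sequence alignment. The abstract does not say which conservation measure was used, so the sketch below assumes a simple one: one minus the Shannon entropy of each column, normalised by the maximum entropy for 20 amino acids. The function `column_conservation` and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column from 0 (maximally variable) to 1 (invariant).

    Gaps ("-") are ignored; an all-gap column scores 0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # uniform distribution over 20 amino acids
    scores = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (n / total) * math.log2(n / total)
            for n in Counter(residues).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy alignment: column 0 is invariant, column 2 fully variable.
toy_alignment = ["RKA", "RKG", "RQS", "RKT"]
scores = column_conservation(toy_alignment)
print([round(s, 3) for s in scores])
```

Under this measure, the backbone-contacting positions described in the abstract would be expected to score near 1, while base-contacting positions in multi-specific families would score lower.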

16.
The HMG1/2 family is a large group of proteins that share a conserved sequence of ~80 amino acids rich in basic, aromatic and proline side chains, referred to as an HMG box. Previous studies show that HMG boxes can bind to DNA in a structure-specific manner. To define the basis for DNA recognition by HMG boxes, we characterize the interaction of two model HMG boxes, one a structure-specific box, rHMGb from the rat HMG1 protein, the other a sequence-specific box, Rox1 from yeast, with oligodeoxynucleotide substrates. Both proteins interact with single-stranded oligonucleotides in this study to form 1:1 complexes. The stoichiometry of binding of rHMGb to duplex or branched DNAs differs: for a 16mer duplex we find a weak 2:1 complex, while a 4:1 protein:DNA complex is detected with a four-way DNA junction of 16mers in the presence of Mg2+. In the case of the sequence-specific Rox1 protein we find tight 1:1 and 2:1 complexes with its cognate duplex sequence and again a 4:1 complex with four-way branched DNA. If the DNA branching is reduced to three arms, both proteins form 3:1 complexes. We believe that these multimeric complexes are relevant for HMG1/2 proteins in vivo, since Mg2+ is present in the nucleus and these proteins are expressed at a very high level.

17.
18.
19.
Shi H, Fan X, Ni Z, Lis JT. RNA (New York, N.Y.), 2002, 8(11): 1461-1470
Iterative cycles of in vitro selection and amplification allow rare functional nucleic acid molecules, aptamers, to be isolated from large sequence pools. Here we present an analysis of the progression of a selection experiment that simultaneously yielded two families of RNA aptamers against two disparate targets: the intended target protein (B52/SRp55) and the partitioning matrix. We tracked the sequence abundance and binding activity to reveal the enrichment of the aptamers through successive generations of selected pools. The two aptamer families showed distinct trajectories of evolution, as did members within a single family. We also developed a method to control the relative abundance of an aptamer family in selected pools. This method, involving specific ribonuclease digestion, can be used to reduce the background selection for aptamers that bind the matrix. Additionally, it can be used to isolate a full spectrum of aptamers in a sequential and exhaustive manner for all the different targets in a mixture.

20.
An increasing number of functional studies of proteins have shown that sequence and structural similarities alone may not be sufficient for reliable prediction of their interaction properties. This is particularly true for proteins recognizing specific antibodies, where the prediction of antibody-binding sites, called epitopes, has proven challenging. The antibody-binding properties of an antigen depend on its structure and related dynamics. Aiming to predict the antibody-binding regions of a protein, we investigate a new approach based on the integrated analysis of the dynamical and energetic properties of antigens, to identify nonoptimized, low-intensity energetic interaction networks in the protein structure isolated in solution. The method is based on the idea that recognition sites may correspond to localized regions with low-intensity energetic couplings with the rest of the protein, which allows them to undergo conformational changes, to be recognized by a binding partner, and to tolerate mutations with minimal energetic expense. Upon analyzing the results on isolated proteins and benchmarking against antibody complexes, it is found that the method successfully identifies binding sites located on the protein surface that are accessible to putative binding partners. The combination of dynamics and energetics can thus discriminate between epitopes and other substructures based only on physical properties. We discuss implications for vaccine design.
