Similar articles
 20 similar articles found
1.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

2.
Ubiquitin E3 ligases are a diverse family of protein complexes that mediate the ubiquitination and subsequent proteolytic turnover of proteins in a highly specific manner. Among the several classes of ubiquitin E3 ligases, the Skp1-Cullin-F-box (SCF) class generally comprises three 'core' subunits: Skp1 and Cullin, plus at least one F-box protein (FBP) subunit that imparts specificity for the ubiquitination of selected target proteins. Recent genetic and biochemical evidence in Arabidopsis thaliana suggests that post-translational turnover of proteins mediated by SCF complexes is important for the regulation of diverse developmental and environmental response pathways. In this report, we extend a previous annotation of the Arabidopsis Skp1-like (ASK) and FBP gene families to include the Cullin family of proteins. Analysis of the protein interaction profiles involving the products of all three gene families suggests a functional distinction between ASK proteins in that selected members of the protein family interact generally while others interact more specifically with members of the F-box protein family. Analysis of the interaction of Cullins with FBPs indicates that CUL1 and CUL2, but not CUL3A, persist as components of selected SCF complexes, suggesting some degree of functional specialization for these proteins. Yeast two-hybrid analyses also revealed binary protein interactions between selected members of the FBP family in Arabidopsis. These and related results are discussed in terms of their implications for subunit composition, stoichiometry and functional diversity of SCF complexes in Arabidopsis.

3.
The "A Disintegrin And Metalloproteinase" (ADAM) protein family and the "A Disintegrin-like And Metalloproteinase with ThromboSpondin motifs" (ADAMTS) protein family are two related families of human proteins. The similarities and differences between these two families have been investigated using phylogenetic trees and homology modeling. The phylogenetic analysis indicates that the two families are well differentiated, even when only the common metalloprotease domain is taken into account. Within the ADAM family, several proteins are lacking the binding motif for the catalytic zinc in the active site and thus presumably lack any catalytic activity. These proteins tend to cluster within the ADAM phylogenetic tree and are expressed in specific tissues, suggesting a functional differentiation. The present analysis allows us to propose the following: (i) ADAMTS proteins have a conserved role in the human organism as proteases, with some differentiation in terms of substrate specificity; (ii) ADAM proteins can act as proteases and/or mediators of intermolecular interactions; (iii) proteolytically active ADAMs tend to be more ubiquitously expressed than the inactive ones.

4.
MOTIVATION: Databases of protein families often exhibit drastically different properties of the protein family space. RESULTS: We compared the properties of protein family space as reflected by exhaustive protein family databases and databases with predefined families. We used TRIBES, Protomap, ProDom and COGs as representatives of the exhaustive databases, and Pfam-A and Superfamily as databases that predefine families. We observe a power-law distribution of family sizes in all these databases, albeit in predefined databases the power-law line collapses before reaching smaller sized families. We discuss the future trends of this power-law distribution and suggest that saturation in the sampling of protein family space will result in a distortion of the power law in small family sizes. For larger genome sizes, predefined databases show logarithmic growth of the number of families per genome, whereas exhaustive databases exhibit a virtually linear relationship. All databases consistently differ in the proportion of protein families shared between taxa. Predefined databases have a larger number of protein families shared between the three domains of life, while exhaustive databases show a much more fragmented distribution. We argue that these discrepancies reflect alternative approaches to the trade-off issue of sensitivity versus specificity in the detection of homologous proteins. We conclude that these properties are complementary rather than contradictory, while describing the protein universe from different perspectives.

5.
6.
We have analyzed structure-sequence relationships in 32 families of flavin adenine dinucleotide (FAD)-binding proteins, to prepare for genomic-scale analyses of this family. Four different FAD-family folds were identified, each containing two or more protein families. Three of these families, exemplified by glutathione reductase (GR), ferredoxin reductase (FR), and p-cresol methylhydroxylase (PCMH) were previously defined, and a family represented by pyruvate oxidase (PO) is newly defined. For each of the families, several conserved sequence motifs have been characterized. Several newly recognized sequence motifs are reported here for the PO, GR, and PCMH families. Each FAD fold can be uniquely identified by the presence of distinctive conserved sequence motifs. We also analyzed cofactor properties, some of which are conserved within a family fold while others display variability. Among the conserved properties is cofactor directionality: in some FAD-structural families, the adenine ring of the FAD points toward the FAD-binding domain, whereas in others the isoalloxazine ring points toward this domain. In contrast, the FAD conformation and orientation are conserved in some families while in others they display some variability. Nevertheless, there are clear correlations among the FAD-family fold, the shape of the pocket, and the FAD conformation. Our general findings are as follows: (a) no single protein 'pharmacophore' exists for binding FAD; (b) in every FAD-binding family, the pyrophosphate moiety binds to the most strongly conserved sequence motif, suggesting that pyrophosphate binding is a significant component of molecular recognition; and (c) sequence motifs can identify proteins that bind phosphate-containing ligands.

7.
Liu J  Hegyi H  Acton TB  Montelione GT  Rost B 《Proteins》2004,56(2):188-200
A central goal of structural genomics is to experimentally determine representative structures for all protein families. At least 14 structural genomics pilot projects are currently investigating the feasibility of high-throughput structure determination; the National Institutes of Health funded nine of these in the United States. Initiatives differ in the particular subset of "all families" on which they focus. At the NorthEast Structural Genomics consortium (NESG), we target eukaryotic protein domain families. The automatic target selection procedure has three aims: 1) identify all protein domain families from the five eukaryotic target organisms that are currently entirely sequenced, based on their sequence homology, 2) discard those families that can be modeled on the basis of structural information already present in the PDB, and 3) target representatives of the remaining families for structure determination. To guarantee that all members of one family share a common fold-like region, we had to begin by dissecting proteins into structural domain-like regions before clustering. Our hierarchical approach, CHOP, utilizing homology to PrISM, Pfam-A, and SWISS-PROT chopped the 103,796 eukaryotic proteins/ORFs into 247,222 fragments. Of these fragments, 122,999 appeared suitable targets that were grouped into >27,000 singletons and >18,000 multifragment clusters. Thus, our results suggested that it might be necessary to determine >40,000 structures to minimally cover the subset of five eukaryotic proteomes.

8.
SUMMARY: In eukaryotes, membranous proteins account for 20-30% of the proteome. Most of these proteins contain one or more transmembrane (TM) domains. These are short segments that traverse the lipid bilayer membrane. Various properties of the TM domains, such as their number, their topology and their arrangement within the membrane, are closely related to the protein's cellular functions. The properties of the TM domains also determine the cellular targeting and localization of these proteins. It is not known, however, whether the information encoded by TM domains suffices for the purpose of classifying proteins into their functional families. This is the question we address here. We introduce an algorithm that creates a profile of each functional family of membranous proteins based only on the amino acid composition of their TM domains. This is complemented by a classifier program for each such family (to determine whether a given protein belongs to it or not). We find that in most instances TM domains contain enough information to allow an accurate discrimination of approximately 80% sensitivity and approximately 90% specificity among unrelated polytopic functional families with the same number of TM domains. SUPPLEMENTARY INFORMATION: Available at www.protonet.cs.huji.ac.il/TM/

9.
Cell surface protein receptors in oral streptococci
Streptococci have a vast repertoire of adherence properties which include binding to human tissue components, epithelial cells and to other bacterial cells. These interactions are determined by the expression of cell-surface receptors some of which are species-specific. In the oral streptococci, two families of surface protein receptors with highly conserved amino acid sequences have been identified. The antigen I/II family of polypeptides are wall-associated high molecular mass proteins (158–166 kDa) with several binding functions that may be attributed to different domains of the receptor molecules. The LraI family of polypeptides are surface-associated lipoproteins (32–33 kDa) involved in adherence of streptococci to salivary glycoprotein pellicle and to oral Actinomyces. A region of amino acid sequence similarity is evident amongst members of the two protein families in Streptococcus gordonii. Ligand-binding specificities of these receptor polypeptides may account for species-specific adherence and site-directed colonization of streptococci within the human oral cavity.

10.
Membrane proteins serve as cellular gatekeepers, regulators, and sensors. Prior studies have explored the functional breadth and evolution of proteins and families of particular interest, such as the diversity of transport-associated membrane protein families in prokaryotes and eukaryotes, the composition of integral membrane proteins, and family classification of all human G-protein coupled receptors. However, a comprehensive analysis of the content and evolutionary associations between membrane proteins and families in a diverse set of genomes is lacking. Here, a membrane protein annotation pipeline was developed to define the integral membrane genome and associations between 21,379 proteins from 34 genomes; most, but not all, of these proteins belong to 598 defined families. The pipeline was used to provide target input for a structural genomics project that successfully cloned, expressed, and purified 61 of our first 96 selected targets in yeast. Furthermore, the methodology was applied (1) to explore the evolutionary history of the substrate-binding transmembrane domains of the human ABC transporter superfamily, (2) to identify the multidrug resistance-associated membrane proteins in whole genomes, and (3) to identify putative new membrane protein families.

11.
The 106 small molecule metabolic (SMM) pathways in Escherichia coli are formed by the protein products of 581 genes. We can define 722 domains, nearly all of which are homologous to proteins of known structure, that form all or part of 510 of these proteins. This information allows us to answer general questions on the structural anatomy of the SMM pathway proteins and to trace family relationships and recruitment events within and across pathways. Half the gene products contain a single domain and half are formed by combinations of between two and six domains. The 722 domains belong to one of 213 families that have between one and 51 members. Family members usually conserve their catalytic or cofactor binding properties; substrate recognition is rarely conserved. Of the 213 families, members of only a quarter occur in isolation, i.e. they form single-domain proteins. Most members of the other families combine with domains from just one or two other families and a few more versatile families can combine with several different partners. Excluding isoenzymes, more than twice as many homologues are distributed across pathways as within pathways. However, serial recruitment, with two consecutive enzymes both being recruited to another pathway, is rare and recruitment of three consecutive enzymes is not observed. Only eight of the 106 pathways have a high number of homologues. Homology between consecutive pairs of enzymes with conservation of the main substrate-binding site but change in catalytic mechanism (which would support a simple model of retrograde pathway evolution) occurs only six times in the whole set of enzymes. Most of the domains that form SMM pathways have homologues in non-SMM pathways. Taken together, these results imply a pervasive "mosaic" model for the formation of protein repertoires and pathways.

12.
Computer analysis of the complete genome of Deinococcus radiodurans R1 reveals a number of protein families that are over-represented in this organism compared to most other bacteria with known genome sequences. These families include both previously characterized and uncharacterized proteins. Most of the families whose functions are known or could be predicted seem to be related to stress-response and elimination of damage products (cell-cleaning). The two most prominent family expansions are the Nudix (MutT) family of pyrophosphohydrolases and a previously unnoticed family of proteins related to Bacillus subtilis DinB that could possess a metal-dependent enzymatic activity whose exact nature remains to be determined. Several proteins of the expanded families, particularly the Nudix family, are fused to other domains and form multidomain proteins that are so far unique for Deinococcus. The domain composition of some of these proteins indicates that they could be involved in novel DNA-repair pathways. Such unique proteins are good targets for knock-out and gene expression studies aimed at shedding light on the unusual features of this interesting bacterium.

13.
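Sheng Jia  Zheng Siyuan  Hao Pei 《生物信息学》2010,8(2):124-126,133
Drug target discovery is currently both a hot topic and a difficult problem in biological research. Identifying patterns among known drug targets can provide a basis for discovering new ones. With the development of functional genomics, the accumulation of omics data has created an opportunity to study this problem. This paper examines the distribution of known targets in the protein network and analyzes the composition of their protein functional domains. The results show that target genes tend to lie in the core region of the network and are concentrated in a few specific protein families. These patterns should be of some help in selecting drug targets during drug development.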

14.
Type III secretion systems are used by many Gram-negative bacteria to inject effector proteins into eukaryotic cells to subvert their normal activities. Structurally conserved portions of the type III secretion apparatus include: a basal body located within the bacterial envelope; an exposed needle with tip complex that delivers effectors across the target cell membrane; and a cytoplasmic sorting platform that selects cargo and powers secretion. While structurally conserved, the individual proteins that make up this nanomachine are typically not interchangeable though they do tend to fall into families. Here we selected a single domain from the inner membrane ring of the basal body from six different type III secretion systems (called SctD using a proposed unifying nomenclature). The selected domain creates an integral interface between the basal body and the sorting platform. Therefore, it represents a pivotal point between two distinct assemblies. All six protein domains possess a structural motif called a forkhead-associated-like (FHA-like) domain but differ greatly in their sequences and solution behaviors. These differences are used here to define family boundaries for these FHA-like domains. The data parallel, though not precisely, family boundaries defined by other proteins within the apparatus and by phylogenetic analysis. Ultimately, differences in the families are likely to reflect differences in the activities of these type III secretion systems or the host niches in which these pathogens are found.

15.
Family profile analysis (FPA), described in this paper, compares all available homologous amino acid sequences of a target family with the profile of a probe family while conventional sequence profile analysis (Gribskov M, Lüthy R, Eisenberg D. Meth Enzymol 1990;183:146-159) considers only a single target sequence in comparison with the probe family. The increased input of sequence information in FPA expands the range for sequence-based recognition of structural relationships. In the FPA algorithm, Zscores of each of the target sequences, obtained from a probe profile search over all known amino acid sequences, are averaged and then compared with the scores for sequences of 100 reference families in the same probe family search. The resulting F-Zscore of the target family is expressed in "effective standard deviations" of the mean Zscores of the reference families; a value above a threshold of 3.5 indicates a statistically significant evolutionary relationship between the target and probe families. The sensitivity of FPA to sequence information was tested with several protein families where distant relationships have been verified from known tertiary protein architectures, which included vitamin B6-dependent enzymes, (beta/alpha)8-barrel proteins, beta-trefoil proteins, and globins. In comparison to other methods, FPA proved to be significantly more sensitive, finding numerous new homologies. The FPA technique is not only useful to test a suspected relationship between probe and target families but also identifies possible target families in profile searches over all known primary structures.
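
The averaging-and-comparison step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define "effective standard deviations", so the sample standard deviation of the reference-family means is used here as a stand-in, and the function name `family_z_score` and the toy numbers are hypothetical, not taken from the paper.

```python
from statistics import mean, stdev

# Significance threshold quoted in the abstract.
SIGNIFICANCE_THRESHOLD = 3.5


def family_z_score(target_z_scores, reference_families):
    """Family-level score for a target family (hypothetical helper).

    target_z_scores: per-sequence Z-scores of the target family from the
        probe-profile search.
    reference_families: one list of per-sequence Z-scores per reference
        family from the same search (the abstract uses 100 families).

    Assumption: "effective standard deviations" is approximated by the
    sample standard deviation of the reference-family mean Z-scores.
    """
    target_mean = mean(target_z_scores)
    reference_means = [mean(scores) for scores in reference_families]
    return (target_mean - mean(reference_means)) / stdev(reference_means)


# Toy data, invented for illustration only.
target = [8.0, 10.0]
references = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]

score = family_z_score(target, references)
print(score, score > SIGNIFICANCE_THRESHOLD)  # prints: 6.0 True
```

Averaging over all target sequences before comparing is what separates this family-level score from a single-sequence profile search, where one atypical sequence could decide the outcome.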

16.
Spassov DS  Jurecic R 《IUBMB life》2003,55(7):359-366
Drosophila Pumilio (Pum) protein is a founder member of a novel family of RNA-binding proteins, known as the PUF family. The PUF proteins constitute an evolutionarily highly conserved family of proteins present from yeast to humans and plants, and are characterized by a highly conserved C-terminal RNA-binding domain, composed of eight tandem repeats. The conserved biochemical features and genetic function of PUF family members have emerged from studies of model organisms. PUF proteins bind to related sequence motifs in the 3' untranslated region (3'UTR) of specific target mRNAs and repress their translation. Frequently, PUF proteins function asymmetrically to create protein gradients, thus causing asymmetric cell division and regulating cell fate specification. Thus, it was recently proposed that the primordial role of PUF proteins is to sustain mitotic proliferation of stem cells. Here we review the evolution and the conserved genetic and biochemical properties of the PUF family of proteins, and discuss protein interactions, upstream regulators and downstream targets of PUF proteins. We also suggest that a conserved mechanism of PUF function extends to the newly described mammalian members of the PUF family (human PUM1 and PUM2, and mouse Pum1 and Pum2), which show extensive homology to Drosophila Pum and could have an important role in cell development, fate specification and differentiation.

17.
We analyzed length differences of eukaryotic, bacterial and archaeal proteins in relation to function, conservation and environmental factors. Comparing Eukaryotes and Prokaryotes, we found that the greater length of eukaryotic proteins is pervasive over all functional categories and involves the vast majority of protein families. The magnitude of these differences suggests that the evolution of eukaryotic proteins was influenced by processes of fusion of single-function proteins into extended multi-functional and multi-domain proteins. Comparing Bacteria and Archaea, we determined that the small but significant length difference observed between their proteins results from a combination of three factors: (i) bacterial proteomes include a greater proportion than archaeal proteomes of longer proteins involved in metabolism or cellular processes, (ii) within most functional classes, protein families unique to Bacteria are generally longer than protein families unique to Archaea and (iii) within the same protein family, homologs from Bacteria tend to be longer than the corresponding homologs from Archaea. These differences are interpreted with respect to evolutionary trends and prevailing environmental conditions within the two prokaryotic groups.

18.
Exobiology, the study of the origin, evolution and distribution of life (including life on earth) within the context of cosmic evolution, is being given a remarkable boost by genome sequencing projects, which are now making the evolutionary histories of protein families routinely available. These histories comprise a multiple alignment for their protein sequences and the corresponding DNA sequences, an evolutionary tree showing the pedigree of these sequences, and reconstructed ancestral sequences for each node in the tree. In a post-genomic world having genomic sequences from an unlimited number of organisms, these histories will be used to connect structure, chemical reactivity, and physiological function to these families. This paper describes several “post-genomic” tools that exploit these evolutionary histories. They can be used to confirm or deny long distance homology between two protein families, identify proteins within a family that have new functions, and identify specific in vitro properties of the protein that are important for its physiological role. Evolution-based data structures for organizing large sequence databases are also described.

19.
20.
MOTIVATION: The completion of the Arabidopsis genome offers the first opportunity to analyze all of the membrane protein sequences of a plant. The majority of integral membrane proteins including transporters, channels, and pumps contain hydrophobic alpha-helices and can be selected based on TransMembrane Spanning (TMS) domain prediction. By clustering the predicted membrane proteins based on sequence, it is possible to sort the membrane proteins into families of known function, based on experimental evidence or homology, or unknown function. This provides a way to identify target sequences for future functional analysis. RESULTS: An automated approach was used to select potential membrane protein sequences from the set of all predicted proteins and cluster the sequences into related families. The recently completed sequence of Arabidopsis thaliana, a model plant, was analyzed. Of the 25,470 predicted protein sequences 4589 (18%) were identified as containing two or more membrane spanning domains. The membrane protein sequences clustered into 628 distinct families containing 3208 sequences. Of these, 211 families (1764 sequences) either contained proteins of known function or showed homology to proteins of known function in other species. However, 417 families (1444 sequences) contained only sequences with no known function and no homology to proteins of known function. In addition, 1381 sequences did not cluster with any family and no function could be assigned to 1337 of these.
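
The counts reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the variable names and the helper `membrane_summary` are hypothetical and serve only to show that the subtotals add up as stated.

```python
# Figures quoted in the abstract.
PREDICTED_PROTEINS = 25470
MEMBRANE_PROTEINS = 4589               # two or more membrane-spanning domains
KNOWN = {"families": 211, "sequences": 1764}
UNKNOWN = {"families": 417, "sequences": 1444}
UNCLUSTERED_SEQUENCES = 1381


def membrane_summary():
    """Recompute the totals implied by the quoted subtotals (hypothetical helper)."""
    clustered_sequences = KNOWN["sequences"] + UNKNOWN["sequences"]
    return {
        "families": KNOWN["families"] + UNKNOWN["families"],
        "clustered_sequences": clustered_sequences,
        "membrane_proteins": clustered_sequences + UNCLUSTERED_SEQUENCES,
        "membrane_percent": round(100 * MEMBRANE_PROTEINS / PREDICTED_PROTEINS),
    }


summary = membrane_summary()
print(summary)
```

Under these figures the two family groups sum to the 628 families and 3208 clustered sequences, and adding the 1381 unclustered sequences recovers the 4589 membrane proteins, which is about 18% of the 25,470 predicted proteins.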
