Similar literature
 Found 20 similar articles; search took 15 ms.
1.
SMART (Simple Modular Architecture Research Tool, http://smart.embl-heidelberg.de) is a web-based resource used for the annotation of protein domains and the analysis of domain architectures, with particular emphasis on mobile eukaryotic domains. Extensive annotation for each domain family is available, providing information relating to function, subcellular localization, phyletic distribution and tertiary structure. The January 2002 release has added more than 200 hand-curated domain models. This brings the total to over 600 domain families that are widely represented among nuclear, signalling and extracellular proteins. Annotation now includes links to the Online Mendelian Inheritance in Man (OMIM) database in cases where a human disease is associated with one or more mutations in a particular domain. We have implemented new analysis methods and updated others. New advanced queries provide direct access to the SMART relational database using SQL. This database now contains information on intrinsic sequence features such as transmembrane regions, coiled-coils, signal peptides and internal repeats. SMART output can now be easily included in users’ documents. A SMART mirror has been created at http://smart.ox.ac.uk.

2.
3.
Fan JS  Zhang M 《Neuro-Signals》2002,11(6):315-321
As one of the most abundant protein domains in the genomes of metazoans, PDZ domains play important roles in the targeting of proteins to specific cell membranes, as well as in assembling proteins into supramolecular signaling complexes. The structures of individual PDZ domains, along with their diverse co-occurrence with a great variety of other protein domains, provide the biochemical basis for the functional diversity of PDZ proteins. In this review, we first briefly summarize the structure and target-binding properties of PDZ domains. After surveying the SMART protein domain database, we attempt to classify PDZ domain proteins into three general categories. We end the review by presenting several recent studies showing some novel features of PDZ domain proteins.

4.
A diverse family of PDZ domains has been identified, but the rules that govern their ligand specificity are not clear. Here we propose a novel classification of PDZ domains based on the nature of amino acids in the two critical positions in the PDZ domain fold. Using these principles, we classified PDZ domains present in the SMART database. Using yeast two-hybrid, in vitro pull-down and surface plasmon resonance assays, we demonstrated that, in agreement with their position in the proposed classification, the Mint1-1, hINADL-5, and PAR6 PDZ domains display similar dual ligand specificity. The proposed classification helps to organize PDZ domain-containing proteins.

5.
Topology predictions for integral membrane proteins can be substantially improved if parts of the protein can be constrained to a given in/out location relative to the membrane using experimental data or other information. Here, we have identified a set of 367 domains in the SMART database that, when found in soluble proteins, have compartment-specific localization of a kind relevant for membrane protein topology prediction. Using these domains as prediction constraints, we are able to provide high-quality topology models for 11% of the membrane proteins extracted from 38 eukaryotic genomes. Two-thirds of these proteins are single spanning, a group of proteins for which current topology prediction methods perform particularly poorly.

6.
Domains are considered the basic units of protein folding, evolution, and function. Decomposing each protein into modular domains is thus a basic prerequisite for accurate functional classification of biological molecules. Here, we present ADDA, an automatic algorithm for domain decomposition and clustering of all protein domain families. We use alignments derived from an all-on-all sequence comparison to define domains within protein sequences based on a global maximum likelihood model. In all, 90% of domain boundaries are predicted within 10% of domain size when compared with the manual domain definitions given in the SCOP database. A representative database of 249,264 protein sequences was decomposed into 450,462 domains. These domains were clustered on the basis of sequence similarities into 33,879 domain families containing at least two members with less than 40% sequence identity. Validation against family definitions in the manually curated databases SCOP and PFAM indicates almost perfect unification of various large domain families while contamination by unrelated sequences remains at a low level. The global survey of protein-domain space by ADDA confirms that most large and universal domain families are already described in PFAM and/or SMART. However, a survey of the complete set of mobile modules leads to the identification of 1479 new interesting domain families which shuffle around in multi-domain proteins. The data are publicly available at ftp://ftp.ebi.ac.uk/pub/contrib/heger/adda.
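
The boundary criterion quoted in the ADDA abstract above ("within 10% of domain size") can be illustrated with a small check. This is only a sketch of that evaluation criterion, not of the ADDA algorithm itself; the function name, the inclusive comparison, and the example coordinates are assumptions made for illustration.

```python
def boundary_within_tolerance(predicted, reference, domain_size, tolerance=0.10):
    """Return True if a predicted boundary lies within `tolerance` * domain_size
    residues of the reference boundary (inclusive comparison is an assumption)."""
    return abs(predicted - reference) <= tolerance * domain_size


# Hypothetical example: a 120-residue reference domain allows a 12-residue margin.
close_call = boundary_within_tolerance(predicted=108, reference=100, domain_size=120)
far_call = boundary_within_tolerance(predicted=115, reference=100, domain_size=120)
```

Under this reading, a benchmark would count the fraction of predicted boundaries for which the check is true; the abstract reports that fraction as 90% against SCOP.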

7.
In the postgenomic era it is essential that protein sequences are annotated correctly in order to help in the assignment of their putative functions. Over 1300 proteins in current protein sequence databases are predicted to contain a PAS domain based upon amino acid sequence alignments. One of the problems with the current annotation of the PAS domain is that this domain exhibits limited similarity at the amino acid sequence level. It is therefore essential, when using proteins with low sequence similarity, to apply profile hidden Markov model searches for the PAS domain-containing proteins, as for the PFAM database. From recent 3D X-ray and NMR structures, however, PAS domains appear to have a conserved 3D fold as shown here by structural alignment of the six representative 3D-structures from the PDB database. Large-scale modelling of the PAS sequences from the PFAM database against the 3D-structures of these six structural prototypes was performed. All 3D models generated (> 5700) were evaluated using prosaii. We conclude from our large-scale modelling studies that the PAS and PAC motifs (which are separately defined in the PFAM database) are directly linked and that these two motifs form the PAS fold. The existing subdivision into PAS and PAC motifs, as used by the PFAM and SMART databases, appears to be caused by major differences in sequences in the region connecting these two motifs. This region, as has been shown by Gardner and coworkers for human PAS kinase (Amezcua, C.A., Harper, S.M., Rutter, J. & Gardner, K.H. (2002) Structure 10, 1349-1361, [1]), is very flexible and adopts different conformations depending on the bound ligand. Some PAS sequences present in the PFAM database did not produce a good structural model, even after realignment using a structure-based alignment method, suggesting that these representatives are unlikely to have a fold resembling any of the structural prototypes of the PAS domain superfamily.

8.
The exponential growth of sequence data has become a challenge to database curators and end-users alike, and biologists seeking to utilize the data effectively are faced with numerous analysis methods. Here, with practical examples from our bioinformatics analysis of the protein tyrosine phosphatases (PTPs), we show how computational analysis can be exploited to fuel hypothesis-driven experimental research through the exploration of online databases. We cover the following elements: (i) similarity searches and strategies to collect a non-redundant database of tyrosine-specific PTP domains; (ii) utilization of this database to classify human, fly, and worm PTPs (based on alignments and phylogenetic analysis); (iii) three-dimensional structural analysis to identify conserved regions (structure-function) and non-conserved selectivity-determining regions (substrate specificity); and (iv) genomic analysis, including mapping of exon structure, identification of pseudogenes, and exploration of disease databases. We discuss the importance of manual curation, illustrating examples in which pseudogenes give rise to predicted proteins in GenBank, and note that domain servers, such as PFAM and SMART, erroneously include dual-specificity and lipid phosphatases in their collection of tyrosine-specific PTPs. To capitalize on our annotated set of 402 PTP domains (from 47 species and five phyla), we identify sequence conservation across taxonomic categories and explore structure-function relationships among tandem domain receptor-like PTPs. We define three Src homology 2 domain-containing PTP genes in stingray, zebrafish, and fugu and speculate on their evolutionary relationship with human pseudogenes. Our annotated sequences, along with a web service for phylogenetic classification of PTP domains, are available online (http://ptp.cshl.edu and http://science.novonordisk.com/ptp).

9.
Structures for protein domains have increased rapidly in recent years owing to advances in structural biology and structural genomics projects. New structures are often similar to those solved previously, and such similarities can give insights into function by linking poorly understood families to those that are better characterized. They also allow the possibility of combining information to find still more proteins adopting a similar structure and sometimes a similar function, and to reprioritize families in structural genomics pipelines. We explore this possibility here by preparing merged profiles for pairs of structurally similar, but not necessarily sequence-similar, domains within the SMART and Pfam databases by way of the Structural Classification of Proteins (SCOP). We show that such profiles are often able to successfully identify further members of the same superfamily and thus can be used to increase the sensitivity of database searching methods like HMMer and PSI-BLAST. We perform detailed benchmarks using the SMART and Pfam databases with four complete genomes frequently used as annotation benchmarks. We quantify the associated increase in structural information in Swissprot and discuss examples illustrating the applicability of this approach to understanding functional and evolutionary relationships between protein families.

10.
11.
Filamins are large actin-binding and cross-linking proteins which act as linkers between the cytoskeleton and various signaling proteins. Filamin A (FLNa) is the most abundant of the three filamin isoforms found in humans. FLNa contains an N-terminal actin-binding domain and 24 immunoglobulin-like (Ig) domains. The Ig domains are responsible for FLNa dimerization and for most of the interactions that FLNa has with numerous other proteins. There are several crystal and solution structures of isolated single Ig domains of filamins in the PDB database, but only a few of longer constructs. Here, we present nearly complete chemical shift assignments of FLNa tandem Ig domains 16–17 and 18–19. Chemical shift mapping between FLNa tandem Ig domain 16–17 and isolated domain 17 suggests a novel domain–domain interaction mode.

12.
The actin cytoskeleton presents the basic force in processes such as cytokinesis, endocytosis, vesicular trafficking and cell migration. Here, we list 30 human singlet CH (calponin homology/actin binding) containing multidomain molecules, each encoded by one gene. We show the domain distributions as given by the SMART program. These mosaic proteins geographically organize the placement of selected proteins in proximity within the cell. In most instances, their precise location, their actin binding capacity by way of the singlet CH (or by other domains?) and their physiological functions need further elucidation. A dendrogram based solely on the relationship of the human singlet CH domains (in terms of AA sequences) for the various molecules that possess the domain implies that the singlet descended from a common ancestor, which in turn sprouted three main branches of protein products. Each branch bifurcated multiple times, thus accounting for a cornucopia of products. Wherever additional (unassigned), highly homologous regions exist in related proteins (e.g., in LIM and LMO7 or in Tangerin and EH/BP1), these unrecognized domain regions await assignment as specific functional domains. Genes coding for multidomain proteins frequently duplicated. The varying modular nature within multidomain proteins should have accelerated evolutionary changes to a degree not feasible to achieve by means of mere post-duplication mutational changes.

13.
Helicases are motor proteins of biological systems that catalyze the opening of energetically stable duplex nucleic acids in an ATP-dependent manner and are thereby involved in almost all aspects of nucleic acid metabolism, including cell cycle progression. They contain several conserved domains, including the DEAD-box, and also several unique domains associated with these. The Pfam database (http://pfam.janelia.org/) is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs). A diverse range of proteins is found in nature, and the functional specificity of each protein is, to a great extent, imparted by its domain architecture. To this end, a DEAD-box ATP-dependent RNA helicase (LOC_Os01g36890; Genomic sequence length: 6284 nucleotides; CDS length: 1299 nucleotides; Protein length: 432 amino acids) was studied. The protein sequence was imported for domain search on Pfam. This particular Pfam entry, after covering a large proportion of the sequences in the underlying database, has generated a more comprehensive coverage across a wide range of phyla of the known domains that are associated with the typical DEAD-box helicase motif. A total of 362 domain architectures were retrieved from the Pfam database for the Family: DEAD (PF00270). We have therefore systematically analyzed the domains closely associated with the DEAD motif, which occur in a variety of proteins and can provide insights into their function.

14.
Rice tungro disease is caused by a combination of two viruses: Rice tungro spherical virus (RTSV) and Rice tungro bacilliform virus (RTBV). RTSV has a capsid comprising three coat protein (CP) species. The three CP genes of the RTSV-AP isolate were sequenced and compared with those of 9 other isolates reported worldwide in a phylogenetic survey of recombination events, which revealed that, in general, Indian isolates form one cluster while those of the Philippines and Malaysia form a different cluster. A significant proportion of recombination sites were found in the CP1 gene, followed by CP2 and CP3, suggesting that recombination is a major phenomenon in the evolution of the various RTSV isolates. Some interesting domains and motifs were also found in the RTSV coat proteins, such as 3,4-dihydroxy-2-butanone 4-phosphate synthase in CP1, a Type 1 glutamine amidotransferase domain and RNA binding motifs in CP2, domains of receptor proteins in CP3, and a glycosylation motif in CP2 and CP3. In addition, simple modular architecture research tool (SMART) analysis of the RTSV coat proteins predicted the coat protein domain of calicivirus, suggesting evolutionary linkages between plant and animal viruses. This study provides an opportunity to establish the molecular evolution and sequence-function relationship of RTSV.

15.
Chen TW  Gan RR  Wu TH  Lin WC  Tang P 《Genomics》2012,100(3):149-156
During the viral infection and replication processes, viral proteins are highly regulated and may interact with host proteins. However, the functions and interaction partners of many viral proteins have yet to be explored. Here, we compiled a VIral Protein domain DataBase (VIP DB) to associate viral proteins with putative functions and interaction partners. We systematically assign domains and infer the functions of proteins and their protein interaction partners from their domain annotations. A total of 2,322 unique domains that were identified from 2,404 viruses are used as a starting point to correlate GO classification, KEGG metabolic pathway annotation and domain-domain interactions. Of the unique domains, 42.7% have GO records, 39.6% have at least one domain-domain interaction record and 26.3% can also be found in either mammals or plants. This database provides a resource to help virologists identify potential roles for viral proteins. All of the information is available at http://vipdb.cgu.edu.tw.

16.
The number of amino acid residues contained in the S1 ribosomal protein of various bacteria varies over a wide range: from 111 to 863 residues in Spiroplasma kunkelii and Treponema pallidum, respectively. The architecture of this protein is traditionally (in particular, because of unknown spatial structure) represented as repeated S1 domains, the copy number of which depends on the protein length. Data on the copy number and boundaries of these domains are available in specialized databases, such as SMART, Pfam, and PROSITE; however, these data can be rather different for the same object. In this work, we used an approach based on analysis of predicted secondary structure (PsiPred program). This allowed us to detect the structural domains in S1 protein sequences; their copy number varied from one to six. Alignment of the S1 proteins containing different numbers of domains with the S1 RNA-binding domain of Escherichia coli polynucleotide phosphorylase revealed a domain within this family displaying the maximal homology to the E. coli domain. This conserved domain migrates along the chain, and its location in the proteins with different numbers of domains follows a certain pattern. Similar to the S1 domain of polynucleotide phosphorylase, residues Phe19, Phe22, His34, Asp64, and Arg68 in this conserved domain are clustered on the surface to form an RNA-binding site.

17.
Dengler U  Siddiqui AS  Barton GJ 《Proteins》2001,42(3):332-344
The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed, from September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously. A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html.

18.
Hard RL  Liu J  Shen J  Zhou P  Pei D 《Biochemistry》2010,49(50):10737-10746
The BUZ/Znf-UBP domain is a protein module found in the cytoplasmic deacetylase HDAC6, E3 ubiquitin ligase BRAP2/IMP, and a subfamily of ubiquitin-specific proteases. Although several BUZ domains have been shown to bind ubiquitin with high affinity by recognizing its C-terminal sequence (RLRGG-COOH), it is currently unknown whether the interaction is sequence-specific or whether the BUZ domains are capable of binding to proteins other than ubiquitin. In this work, the BUZ domains of HDAC6 and Ubp-M were subjected to screening against a one-bead-one-compound (OBOC) peptide library that exhibited random peptide sequences with free C-termini. Sequence analysis of the selected binding peptides as well as alanine scanning studies revealed that the BUZ domains require a C-terminal Gly-Gly motif for binding. At the more N-terminal positions, the two BUZ domains have distinct sequence specificities, allowing them to bind to different peptides and/or proteins. A database search of the human proteome on the basis of the BUZ domain specificities identified 11 and 24 potential partner proteins for Ubp-M and HDAC6 BUZ domains, respectively. Peptides corresponding to the C-terminal sequences of four of the predicted binding partners (FBXO11, histone H4, PTOV1, and FAT10) were synthesized and tested for binding to the BUZ domains by fluorescence polarization. All four peptides bound to the HDAC6 BUZ domain with low micromolar K(D) values and less tightly to the Ubp-M BUZ domain. Finally, in vitro pull-down assays showed that the Ubp-M BUZ domain was capable of binding to the histone H3-histone H4 tetramer protein complex. Our results suggest that BUZ domains are sequence-specific protein-binding modules, with each BUZ domain potentially binding to a different subset of proteins.

19.
MOTIVATION: Ideally, only proteins that exhibit highly similar domain architectures should be compared with one another as homologues or be classified into a single family. By combining three different indices, the Jaccard index, the Goodman-Kruskal gamma function and the domain duplicate index, into a single similarity measure, we propose a method for comparing proteins based on their domain architectures. RESULTS: Evaluation of the method using the eukaryotic orthologous groups of proteins (KOGs) database indicated that it allows the automatic and efficient comparison of multiple-domain proteins, which are usually refractory to classic approaches based on sequence similarity measures. As a case study, the PDZ and LRR_1 domains are used to demonstrate how proteins containing promiscuous domains can be clearly compared using our method. For the convenience of users, a web server was set up where three different query interfaces were implemented to compare different domain architectures or proteins with domain(s), and to identify the relationships among domain architectures within a given KOG from the Clusters of Orthologous Groups of Proteins database. CONCLUSION: The approach we propose is suitable for estimating the similarity of domain architectures of proteins, especially those of multidomain proteins. AVAILABILITY: http://cmb.bnu.edu.cn/pdart/.
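
Of the three indices named in the abstract above, the Jaccard index is the simplest to illustrate. The sketch below computes only that component, treating each architecture as a set of domain names; it is not the authors' combined measure, which also uses the Goodman-Kruskal gamma function and a domain duplicate index. The function name, the handling of empty architectures, and the example architectures are assumptions.

```python
def jaccard_index(arch_a, arch_b):
    """Jaccard index of two domain architectures, each given as a list of
    domain names and compared as sets (order and repeats are ignored)."""
    set_a, set_b = set(arch_a), set(arch_b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty architectures are identical
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical architectures that share only the PDZ domain.
protein_1 = ["PDZ", "SH3", "GuKc"]
protein_2 = ["PDZ", "PDZ", "LRR_1"]
similarity = jaccard_index(protein_1, protein_2)  # 1 shared / 4 distinct domains
```

Because sets discard domain order and copy number, this component alone cannot distinguish PDZ-PDZ-LRR_1 from LRR_1-PDZ, which is presumably why the abstract combines it with order-sensitive and duplicate-sensitive indices.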

20.
Background: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact, units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2). Results: We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts. Conclusions: Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein’s gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0656-7) contains supplementary material, which is available to authorized users.
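
The length criterion in the partial-domain abstract above (a domain annotation shorter than 50% of its family model length) can be expressed as a simple coverage check. This is a minimal sketch under stated assumptions: the function name, the use of 1-based inclusive model coordinates, and the example values are hypothetical and are not taken from the paper or from any Pfam tool.

```python
def is_partial_domain(model_start, model_end, model_length, threshold=0.5):
    """Flag a domain hit as partial when the aligned part of the family model
    covers less than `threshold` of the full model length.
    Coordinates are assumed to be 1-based and inclusive."""
    covered = model_end - model_start + 1
    return covered / model_length < threshold


# Hypothetical hits against a family model of length 100.
short_hit = is_partial_domain(model_start=1, model_end=40, model_length=100)
half_hit = is_partial_domain(model_start=1, model_end=50, model_length=100)
```

A hit covering exactly half of the model is not flagged here, since the abstract speaks of domains "shorter than 50%" of the model length.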
