The Structural Motifs of Superfamilies (SMoS) database provides information about the structural motifs of aligned protein domain superfamilies. Such motifs among structurally aligned multiple members of protein superfamilies are recognized by the conservation of amino acid preference and solvent inaccessibility and are examined for the conservation of other features like secondary structural content, hydrogen bonding, non-polar interaction and residue packing. These motifs, along with their sequence and spatial orientation, represent the conserved core structure of each superfamily and also provide the minimal requirement of sequence and structural information to retain each superfamily fold.



SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to result in sequence superfamilies of known structure. Subsequently all-against-all family profile matches are made to deduce a list of new potential superfamilies of yet unknown structure.



Inference of remote homology between proteins is very challenging and remains a prerogative of an expert. Thus a significant drawback to the use of evolutionary-based protein structure classifications is the difficulty in assigning new proteins to unique positions in the classification scheme with automatic methods. To address this issue, we have developed an algorithm to map protein domains to an existing structural classification scheme and have applied it to the SCOP database.

A database comprising all ligand-binding sites of known structure aligned with all related protein sequences and structures is described. Currently, the database contains approximately 50000 ligand-binding sites for small molecules found in the Protein Data Bank (PDB). The structure-structure alignments are obtained by the Combinatorial Extension (CE) program (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998) and sequence-structure alignments are extracted from the ModBase database of comparative protein structure models for all known protein sequences (Sanchez et al., Nucleic Acids Res., 28, 250-253, 2000). It is possible to search for binding sites in LigBase by a variety of criteria. LigBase reports summarize ligand data including relevant structural information from the PDB file, such as ligand type and size, and contain links to all related protein sequences in the TrEMBL database. Residues in the binding sites are graphically depicted for comparison with other structurally defined family members. LigBase provides a resource for the analysis of families of related binding sites.



The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins.

The Pfam protein families database   Total citations: 105 (self-citations: 12; citations by others: 93)
Pfam is a large collection of protein multiple sequence alignments and profile hidden Markov models. Pfam is available on the WWW in the UK at http://www.sanger.ac.uk/Software/Pfam/, in Sweden at http://www.cgr.ki.se/Pfam/ and in the US at http://pfam.wustl.edu/. The latest version (4.3) of Pfam contains 1815 families. These Pfam families match 63% of proteins in SWISS-PROT 37 and TrEMBL 9. For complete genomes Pfam currently matches up to half of the proteins. Genomic DNA can be directly searched against the Pfam library using the Wise2 package.

The PRINTS database: a resource for identification of protein families   Total citations: 4 (self-citations: 0; citations by others: 4)
The PRINTS database houses a collection of protein fingerprints, which may be used to assign family and functional attributes to uncharacterised sequences, such as those currently emanating from the various genome-sequencing projects. The April 2002 release includes 1,700 family fingerprints, encoding approximately 10,500 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Fingerprints are groups of conserved motifs that, taken together, provide diagnostic protein family signatures. They derive much of their potency from the biological context afforded by matching motif neighbours; this makes them at once more flexible and powerful than single-motif approaches. The technique further departs from other pattern-matching methods by readily allowing the creation of fingerprints at superfamily-, family- and subfamily-specific levels, thereby allowing more fine-grained diagnoses. Here, we provide an overview of the method of protein fingerprinting and how the results of fingerprint analyses are used to build PRINTS and its relational cousin, PRINTS-S.
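
The idea that a fingerprint is a group of motifs whose neighbours give each other context can be illustrated with a small Python sketch. This is only a toy under stated assumptions: motifs are treated as exact substrings that must occur in order, whereas PRINTS itself derives motifs from aligned sequences, and the names `match_fingerprint` and `fingerprint_score` are hypothetical.

```python
from typing import List, Optional


def match_fingerprint(sequence: str, motifs: List[str]) -> List[Optional[int]]:
    """Find each motif in order along the sequence.

    Returns the start index of every motif, or None when a motif is
    not found downstream of the previously matched motif.
    Assumption: motifs are exact substrings (a simplification).
    """
    positions: List[Optional[int]] = []
    cursor = 0  # search only to the right of the last matched motif
    for motif in motifs:
        idx = sequence.find(motif, cursor)
        if idx == -1:
            positions.append(None)
        else:
            positions.append(idx)
            cursor = idx + len(motif)
    return positions


def fingerprint_score(sequence: str, motifs: List[str]) -> float:
    """Fraction of the fingerprint's motifs matched in the expected order."""
    if not motifs:
        raise ValueError("a fingerprint needs at least one motif")
    hits = match_fingerprint(sequence, motifs)
    return sum(p is not None for p in hits) / len(motifs)


if __name__ == "__main__":
    seq = "MKTAYIAKQRGGSLLHHWWP"
    print(match_fingerprint(seq, ["TAY", "GGS", "WWP"]))
    print(fingerprint_score(seq, ["TAY", "XXX", "WWP"]))
```

A partial score, rather than an all-or-nothing match, is one way to reflect the point that several matching motif neighbours are more diagnostic than any single motif.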

TIGRFAMs is a collection of manually curated protein families consisting of hidden Markov models (HMMs), multiple sequence alignments, commentary, Gene Ontology (GO) assignments, literature references and pointers to related TIGRFAMs, Pfam and InterPro models. These models are designed to support both automated and manually curated annotation of genomes. TIGRFAMs contains models of full-length proteins and shorter regions at the levels of superfamilies, subfamilies and equivalogs, where equivalogs are sets of homologous proteins conserved with respect to function since their last common ancestor. The scope of each model is set by raising or lowering cutoff scores and choosing members of the seed alignment to group proteins sharing specific function (equivalog) or more general properties. The overall goal is to provide information with maximum utility for the annotation process. TIGRFAMs is thus complementary to Pfam, whose models typically achieve broad coverage across distant homologs but end at the boundaries of conserved structural domains. The database currently contains over 1600 protein families. TIGRFAMs is available for searching or downloading at www.tigr.org/TIGRFAMs.

We describe a database of protein structure alignments for homologous families. The database HOMSTRAD presently contains 130 protein families and 590 aligned structures, which have been selected on the basis of quality of the X-ray analysis and accuracy of the structure. For each family, the database provides a structure-based alignment derived using COMPARER and annotated with JOY in a special format that represents the local structural environment of each amino acid residue. HOMSTRAD also provides a set of superposed atomic coordinates obtained using MNYFIT, which can be viewed with a graphical user interface or used for comparative modeling studies. The database is freely available on the World Wide Web at: http://www-cryst.bioc.cam.ac.uk/-homstrad/, with search facilities and links to other databases.

PALI is a database of structure-based sequence alignments and phylogenetic relationships derived on the basis of three-dimensional structures of homologous proteins. The database enables grouping of pairs of homologous protein structures on the basis of their sequence identity calculated from the structure-based alignment. PALI also enables association of a new sequence with a family and automatic generation of a dendrogram combining the query sequence and homologous protein structures.
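
As an illustration of sequence identity calculated from an alignment, the following Python sketch computes percent identity for one aligned pair. It is a minimal sketch, not PALI's own procedure: the function name `percent_identity`, the use of `-` as the gap character, and the choice to normalise by the number of columns in which both sequences have a residue are all assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both sequences carry a residue.

    Assumption: the two strings are rows of the same alignment, so they
    have equal length and share column positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    print(percent_identity("ACD-EF", "ACE-EF"))
```

Other normalisations (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the denominator should be stated whenever pairs are grouped by identity.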

Large-scale genome projects generate an unprecedented number of protein sequences, most of which are experimentally uncharacterized. Predicting the 3D structures of sequences provides important clues as to their functions. We constructed the Genomes TO Protein structures and functions (GTOP) database, containing protein fold predictions of a huge number of sequences. Predictions are mainly carried out with the homology search program PSI-BLAST, currently the most popular among high-sensitivity profile search methods. GTOP also includes the results of other analyses, e.g. homology and motif search, detection of transmembrane helices and repetitive sequences. We have completed analyzing the sequences of 41 organisms, with the number of proteins exceeding 120 000 in total. GTOP uses a graphical viewer to present the analytical results of each ORF in one page in a ‘color-bar’ format. The assigned 3D structures are presented by Chime plug-in or RasMol. The binding sites of ligands are also included, providing functional information. The GTOP server is available at http://spock.genes.nig.ac.jp/~genome/gtop.html.

Infectious diseases are a major threat to global public health and prosperity. The causative agents consist of a suite of pathogens, ranging from bacteria to viruses, including fungi, helminths and protozoa. Although these organisms are extremely varied in their biological structure and interactions with the host, they share similar methods of evading the host immune system. Antigenic variation and drift are mechanisms by which pathogens change their exposed epitopes while maintaining protein function. Accordingly, these traits enable pathogens to establish chronic infections in the host. The varDB database was developed to serve as a central repository of protein and nucleotide sequences as well as associated features (e.g. field isolate data, clinical parameters, etc.) involved in antigenic variation. The data currently contained in varDB were mined from GenBank as well as multiple specialized data repositories (e.g. PlasmoDB, GiardiaDB). Family members and ortholog groups were identified using a hierarchical search strategy, including literature/author-based searches and HMM profiles. Included in the current release are >29,00 sequences from 39 gene families from 25 different pathogens. This resource will enable researchers to compare antigenic variation within and across taxa with the goal of identifying common mechanisms of pathogenicity to assist in the fight against a range of devastating diseases. AVAILABILITY: varDB is freely accessible at http://www.vardb.org/

MOTIVATION: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. RESULTS: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5'-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects. AVAILABILITY: MASIA WEB site: http://www.scsb.utmb.edu/masia/masia.html SUPPLEMENTARY INFORMATION: The dendrogram of 42 APE sequences used to derive motifs is available on http://www.scsb.utmb.edu/comp_biol.html/DNA_repair/publication.html
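
The notion of property-based conservation in aligned sequences can be sketched in Python as follows. This is not the MASIA method: it substitutes a single property (the Kyte-Doolittle hydropathy scale) for the 237 physical-chemical properties, uses a plain standard-deviation threshold instead of the Bayesian scoring mentioned above, and the names `column_spread` and `conserved_runs` and the thresholds are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List, Optional, Tuple

# Kyte-Doolittle hydropathy values, used here as one stand-in property.
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_spread(alignment: List[str],
                  scale: Dict[str, float] = KD) -> List[Optional[float]]:
    """Standard deviation of the property in each alignment column.

    Columns containing a gap or unknown symbol yield None.
    """
    spreads: List[Optional[float]] = []
    for column in zip(*alignment):
        values = [scale[r] for r in column if r in scale]
        spreads.append(pstdev(values) if len(values) == len(column) else None)
    return spreads


def conserved_runs(spreads: List[Optional[float]],
                   max_spread: float = 1.0,
                   min_len: int = 3) -> List[Tuple[int, int]]:
    """Half-open (start, end) ranges of consecutive low-spread columns."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    # A trailing None acts as a sentinel that closes any open run.
    for i, s in enumerate(spreads + [None]):
        ok = s is not None and s <= max_spread
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


if __name__ == "__main__":
    aln = ["ILVAK", "LIVDR", "VLIWK"]
    print(conserved_runs(column_spread(aln)))
```

In the toy alignment the first three columns differ in residue identity but agree in hydropathy, which is the kind of signal a property-based motif is meant to capture when overall sequence similarity is low.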

Electrostatic interactions play a key role in enzyme catalytic function. At long range, electrostatics steer the incoming ligand/substrate to the active site, and at short distances, electrostatics provide the specific local interactions for catalysis. In cases in which electrostatics determine enzyme function, orthologs should share the electrostatic properties to maintain function. Often, electrostatic potential maps are employed to depict how conserved surface electrostatics preserve function. We expand on previous efforts to explain conservation of function, using novel electrostatic sequence and structure analyses of four enzyme families and one enzyme superfamily. We show that the spatial charge distribution is conserved within each family and superfamily. Conversely, phylogenetic analysis of key electrostatic residues provides the evolutionary origins of functionality.

GlycoSuiteDB is a relational database that curates information from the scientific literature on glycoprotein-derived glycan structures, their biological sources, the references in which the glycan was described and the methods used to determine the glycan structure. To date, the database includes most published O-linked oligosaccharides from the last 50 years and most N-linked oligosaccharides that were published in the 1990s. For each structure, information is available concerning the glycan type, linkage and anomeric configuration, mass and composition. Detailed information is also provided on native and recombinant sources, including tissue and/or cell type, cell line, strain and disease state. Where known, the proteins to which the glycan structures are attached are reported, and cross-references to the SWISS-PROT/TrEMBL protein sequence databases are given if applicable. The GlycoSuiteDB annotations include literature references which are linked to PubMed, and detailed information on the methods used to determine each glycan structure is noted to help the user assess the quality of the structural assignment. GlycoSuiteDB has a user-friendly web interface which allows the researcher to query the database using mono-isotopic or average mass, monosaccharide composition, glycosylation linkages (e.g. N- or O-linked), reducing terminal sugar, attached protein, taxonomy, tissue or cell type and GlycoSuiteDB accession number. Advanced queries using combinations of these parameters are also possible. GlycoSuiteDB can be accessed on the web at http://www.glycosuite.com.

The ProDom database of protein domain families.   Total citations: 11 (self-citations: 1; citations by others: 11)
F. Corpet, J. Gouzy and D. Kahn. Nucleic Acids Research, 1998, 26(1): 323-326
The ProDom database contains protein domain families generated from the SWISS-PROT database by automated sequence comparisons. It can be searched on the World Wide Web (http://protein.toulouse.inra.fr/prodom.html) or by E-mail (prodom@toulouse.inra.fr) to study domain arrangements within known families or new proteins. Strong emphasis has been put on the graphical user interface which allows for interactive analysis of protein homology relationships. Recent improvements to the server include: ProDom search by keyword; links to PROSITE and PDB entries; more sensitive ProDom similarity search with BLAST or WU-BLAST; alignments of query sequences with homologous ProDom domain families; and links to the SWISS-MODEL server (http://www.expasy.ch/swissmod/SWISS-MODEL.html) for homology based 3-D domain modelling where possible.

Holins are small “hole-forming” transmembrane proteins that mediate bacterial cell lysis during programmed cell death or following phage infection. We have identified fifty-two families of established or putative holins and have included representative members of these proteins in the Transporter Classification Database (TCDB; www.tcdb.org). We have identified the organismal sources of members of these families, calculated their average protein sizes, estimated their topologies and determined their relative family sizes. Topological analyses suggest that these proteins can have 1, 2, 3 or 4 transmembrane α-helical segments (TMSs), and members of a single family are frequently, but not always, of a single topology. In one case, proteins of a family proved to have either 2 or 4 TMSs, and the latter arose by intragenic duplication of a primordial 2 TMS protein-encoding gene resembling the former. Using established statistical approaches, some of these families have been shown to be related by common descent. Seven superfamilies, including 21 of the 52 recognized families, were identified. Conserved motif and Pfam analyses confirmed most superfamily assignments. These results serve to expand upon the scope of channel-forming bacterial holins.

The CATH database of domain structures has been used to explore the structural variation of homologous domains in 294 well populated domain structure superfamilies, each containing at least three sequence diverse relatives. Our analyses confirm some previously detected trends relating sequence divergence to structural variation but for a much larger dataset and in some superfamilies the new data reveal exceptional structural variation. Use of a new algorithm (2DSEC) to analyse variability in secondary structure compositions across a superfamily sheds new light on how structures evolve. 2DSEC detects inserted secondary structures that embellish the core of conserved secondary structures found throughout the superfamily. Analysis showed that for 56% of highly populated superfamilies (>9 sequence diverse relatives), there are twofold or more increases in the numbers of secondary structures in some relatives. In some families fivefold increases occur, sometimes modifying the fold of the domain. Manual inspection of secondary structure insertions or embellishments in 48 particularly variable superfamilies revealed that although these insertions were usually discontiguous in the sequence they were often co-located in 3D resulting in a larger structural motif that often modified the geometry of the active site or the surface conformation promoting diverse domain partnerships and protein interactions. These observations, supported by automatic analysis of all well populated CATH families, suggest that accretion of small secondary structure insertions may provide a simple mechanism for evolving new functions in diverse relatives. Some layered domain architectures (e.g. mainly-beta and alpha-beta sandwiches) that recur highly in the genomes more frequently exploit these types of embellishments to modify function. In these architectures, aggregation occurs most often at the edges, top or bottom of the beta-sheets. 
Information on structural variability across domain superfamilies has been made available through the CATH Dictionary of Homologous Structures (DHS).
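
The "twofold or more increase in the numbers of secondary structures" statistic can be made concrete with a short Python sketch. It is an illustration only, not the 2DSEC algorithm: it assumes that secondary structure counts per relative are already available, and that the increase is measured as the ratio between the richest and the sparsest relative of a superfamily. The names `embellishment_ratio` and `fraction_embellished` and the toy counts are hypothetical.

```python
from typing import Dict


def embellishment_ratio(sse_counts: Dict[str, int]) -> float:
    """Ratio of the largest to the smallest secondary structure count.

    Assumption: the relative with the fewest elements approximates the
    conserved core shared by the whole superfamily.
    """
    counts = list(sse_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("need at least one relative with a positive count")
    return max(counts) / min(counts)


def fraction_embellished(superfamilies: Dict[str, Dict[str, int]],
                         fold: float = 2.0) -> float:
    """Fraction of superfamilies whose ratio reaches the given fold."""
    if not superfamilies:
        return 0.0
    flagged = sum(1 for counts in superfamilies.values()
                  if embellishment_ratio(counts) >= fold)
    return flagged / len(superfamilies)


if __name__ == "__main__":
    toy = {
        "sfA": {"r1": 5, "r2": 7, "r3": 12},  # made-up counts
        "sfB": {"r1": 6, "r2": 8},
    }
    print(fraction_embellished(toy))
```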
