Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
HSSP (http://www.sander.embl-ebi.ac.uk/hssp/) is a derived database merging structure (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), we provide a multiple sequence alignment of putative homologues and a sequence profile characteristic of the protein family, centered on the known structure. The list of homologues is the result of an iterative database search in SWISS-PROT using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed putative homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 33% of all sequences in SWISS-PROT.

2.
The HSSP database of protein structure-sequence alignments.   Cited by: 4 (self-citations: 0; citations by others: 4)
HSSP is a derived database merging structural (3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in SwissProt using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 29% of all SwissProt-stored sequences.

3.
The HSSP database of protein structure-sequence alignments.   Cited by: 2 (self-citations: 0; citations by others: 2)
HSSP is a derived database merging three-dimensional (3-D) structural and one-dimensional (1-D) sequence information. For each protein of known 3-D structure from the Protein Data Bank (PDB), the database has a multiple sequence alignment of all available homologues and a sequence profile characteristic of the family. The list of homologues is the result of a database search in Swissprot using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). The database is updated frequently. The listed homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures covering 27% of all Swissprot-stored sequences.

4.
SUMMARY: PDZBase is a database that aims to contain all known PDZ-domain-mediated protein-protein interactions. Currently, PDZBase contains approximately 300 such interactions, which have been manually extracted from >200 articles. The database can be queried through both sequence motif and keyword-based searches, and the sequences of interacting proteins can be visually inspected through alignments (for the comparison of several interactions), or as residue-based diagrams including schematic secondary structure information (for individual complexes).

5.
A quantitative biochronological study by Cody et al. (2008) integrates comprehensive diatom biostratigraphy, magnetostratigraphy, and tephrostratigraphy from 32 Neogene sections around the Southern Ocean and Antarctic continental margin. A recent method, known as Constrained Optimization (CONOP), which can be viewed as a multidimensional version of graphic correlation, is applied to that very interesting database. The goal of the present paper is to discuss some theoretical aspects of quantitative biochronology and to compare constrained optimization with the deterministic method called Unitary Associations (UAM), a graph-theoretical model. We illustrate that the UAM is an extremely powerful and unique theory allowing an in-depth analysis of the internal conflicting inter-taxon stratigraphic relationships inherent to any complex biostratigraphical database.

6.
C. Sander, R. Schneider. Proteins, 1991, 9(1): 56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
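
The abstract above describes a homology threshold that depends on alignment length but does not state the curve itself. As a rough sketch only: the form commonly quoted for the HSSP curve is a power law in alignment length that levels off for long alignments. The coefficients used as defaults below (290.15, -0.562, and a 24.8% plateau above 80 residues) are assumptions recalled from the literature, not values given in this abstract, and should be checked against the paper. The function names are hypothetical.

```python
def homology_threshold(length, a=290.15, b=-0.562, plateau=24.8, cutoff=80):
    """Percent identity required to infer structural homology.

    Assumed form: a * length**b for alignments up to `cutoff` residues,
    and a flat `plateau` for longer alignments. All default values are
    assumptions, not taken from the abstract.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length > cutoff:
        return plateau
    return a * length ** b


def implies_structural_homology(percent_identity, length):
    """True if an alignment lies on or above the assumed threshold curve."""
    return percent_identity >= homology_threshold(length)
```

Under these assumed values, 30% identity is enough for a 100-residue alignment but not for a 20-residue one, which matches observation (4) in the abstract: short alignments need much higher similarity. For very short alignments the assumed power law exceeds 100%, so no alignment passes.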

7.
The merozoite is the invasive form of the asexual stage of Plasmodium species. At least two polymorphic glycoproteins have been found on its surface in the human malaria parasite Plasmodium falciparum. The best-characterized of these is known as merozoite surface antigen-1 (MSA1) (185-200 kDa) (Ref. 1). Similar molecules are found in other malaria species. The other merozoite surface antigen, MSA2 (35-48 kDa) (Ref. 2), is distinct from MSA1 but is equally polymorphic. In this review, Juan Cooper condenses the body of structural information on MSA1 known to date. A database compiled from MSA1 sequences from several species used together with sequence comparisons and predicted secondary structure reveals interesting features of this molecule.

8.
A representative cDNA library from mRNA obtained from lipopolysaccharide- and concanavalin-A-induced head kidney cells of carp, Cyprinus carpio, was constructed. Two hundred single-pass, partially sequenced clones (AU183343 to AU183542) were generated from expressed sequence tags (ESTs), and these were searched for homology in DDBJ/GenBank with the blastN and blastX programs. Clones matching known genes were classified according to their function and distribution. One hundred and twenty-nine genes showed homology with known genes in databases, whereas 71 (35.5%) clones did not show any significant homology to sequences in the public database. Known genes also showed homology to fish genes deposited in the database. Twenty-two clones (11%), encoding 16 different sequences, were identified as putative biodefense genes and oncogenes associated with an immune response. High expression of lysozyme (3%) was detected. Putatively identified biodefense-related sequences such as Lectin type 2, MHC class II invariant chain, mcl-1a and lysozyme were aligned with known homologues from the database and the percentage identity determined. A time-course evaluation of gene expression due to mitogen stimulation by RT-PCR revealed that the above-mentioned gene homologues were switched on early during cell proliferation.

9.
One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are composed largely of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open-reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes, so every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor-quality sequence, and it assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results is provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to allow users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses.

10.
Peptide mass fingerprinting (PMF) has become one of the most widely used methods for rapid identification of proteins in proteomics research. Many peaks, however, remain unassigned after PMF analysis, partly because of post-translational modification and the limited scope of protein sequences. Almost all PMF tools employ only known or predicted protein sequences and do not include open reading frames (ORFs) in the genome, which eliminates the chance of finding novel functional peptides. Unlike most tools, which search protein sequences from known coding sequences, we developed a database of theoretical small ORFs (tsORFs), tsORFdb, and a PMF application that uses it. The tsORFdb is an ORFeome database that encompasses all potential tsORFs derived from whole genome sequences as well as the predicted ones. The massProphet system tries to extend the search scope to include the ORFeome using the tsORFdb. The tsORFdb and massProphet should be useful for proteomics research by providing information about unknown small ORFs as well as predicted and registered proteins.
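
The abstract does not say how the theoretical small ORFs are enumerated. As a minimal sketch of the general idea, assuming a tsORF is simply an ATG-to-stop stretch of limited length on the forward strand in any of three reading frames, one could scan a genome sequence as follows. The function name, the `max_codons` cutoff, and the forward-strand-only simplification are all hypothetical and are not taken from tsORFdb.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_small_orfs(dna, max_codons=100):
    """Return (start, end, sequence) for each short ORF on the forward strand.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). `max_codons` is an illustrative cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons <= max_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ORF in this frame
    return orfs
```

A fuller version would also scan the reverse complement and translate each ORF before computing peptide masses for PMF matching.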

11.
DNA sequences from orthologous loci can provide universal characters for taxonomic identification. Molecular taxonomy is of particular value for groups in which distinctive morphological features are difficult to observe or compare. To assist in species identification for the little known family Ziphiidae (beaked whales), we compiled a reference database of mitochondrial DNA (mtDNA) control region (437 bp) and cytochrome b (384 bp) sequences for all 21 described species in this group. This mtDNA database is complemented by a nuclear database of actin intron sequences (925 bp) for 17 of the 21 species. All reference sequences were derived from specimens validated by diagnostic skeletal material or other documentation, and included four holotypes. Phylogenetic analyses of mtDNA sequences confirmed the genetic distinctiveness of all beaked whale species currently recognized. Both mitochondrial loci were well suited for species identification, with reference sequences for all known ziphiids forming robust species-specific clades in phylogenetic reconstructions. The majority of species were also distinguished by nuclear alleles. Phylogenetic comparison of sequence data from "test" specimens to these reference databases resulted in three major taxonomic discoveries involving animals previously misclassified from morphology. Based on our experience with this family and the order Cetacea as a whole, we suggest that a molecular taxonomy should consider the following components: comprehensiveness, validation, locus sensitivity, genetic distinctiveness and exclusivity, concordance, and universal accessibility and curation.

12.
SCOP: a structural classification of proteins database   Cited by: 17 (self-citations: 0; citations by others: 17)

13.
Ricin is known as a potent toxin against animals. It consists of two chains, Ricin Toxin A (RTA) and Ricin Toxin B (RTB). The toxic effect is known to be caused by RTA. The RTA inhibitors reported so far have low efficiency. Hence, it is of interest to identify new inhibitors. Virtual screening methods (computer-aided drug design) that find similar molecules in a drug database were used to screen for new inhibitors of RTA. We used the structure of RTA in complex with pteroic acid (PDB code: 1BR6) as the target molecule. A ligand-based virtual screening approach was used in which the known inhibitory molecule pteroic acid (PTA) served as a template to identify similar ligands from the ZINC database. These ligands were docked inside the binding pocket of RTA using MVD (Molegro Virtual Docker). This approach successfully identified six novel compounds. These docked ligands interacted with Asn78, Ala79, Val81, Gly121 and Ser176, which are key residues of the RTA active site. Three compounds in particular, ZINC05156321 (6,7-diphenylpteridin-4-ol), ZINC05156324 (6,7-bis(3-fluorophenyl)pteridin-4-ol) and ZINC08555900 (6,7-bis(4-fluorophenyl)-1H-pteridin-4-one), showed higher binding affinity than PTA, with high interaction energy, better space fitting and electrostatic interactions. These molecules should be tested for in vitro and in vivo activity in future for consideration as effective inhibitors.

14.
A database comprising all ligand-binding sites of known structure aligned with all related protein sequences and structures is described. Currently, the database contains approximately 50,000 ligand-binding sites for small molecules found in the Protein Data Bank (PDB). The structure-structure alignments are obtained by the Combinatorial Extension (CE) program (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998) and sequence-structure alignments are extracted from the ModBase database of comparative protein structure models for all known protein sequences (Sanchez et al., Nucleic Acids Res., 28, 250-253, 2000). It is possible to search for binding sites in LigBase by a variety of criteria. LigBase reports summarize ligand data including relevant structural information from the PDB file, such as ligand type and size, and contain links to all related protein sequences in the TrEMBL database. Residues in the binding sites are graphically depicted for comparison with other structurally defined family members. LigBase provides a resource for the analysis of families of related binding sites.

15.
Liisa Holm, Chris Sander. Proteins, 1994, 19(3): 165-173
The number of protein structures known in atomic detail has increased from one in 1960 (Kendrew, J. C., Strandberg, B. E., Hart, R. G., Davies, D. R., Phillips, D. C., Shore, V. C. Nature (London) 185:422–427, 1960) to more than 1000 in 1994. The rate at which new structures are being published exceeds one a day as a result of recent advances in protein engineering, crystallography, and spectroscopy. More and more frequently, a newly determined structure is similar in fold to a known one, even when no sequence similarity is detectable. A new generation of computer algorithms has now been developed that allows routine comparison of a protein structure with the database of all known structures. Such structure database searches are already used daily and they are beginning to rival sequence database searches as a tool for discovering biologically interesting relationships. © 1994 Wiley-Liss, Inc.

16.
Helices in membrane-spanning regions are more tightly packed than the helices in soluble proteins. Thus, we introduce a method that uses a simple scale of burial propensity and a new algorithm to predict transmembrane helical (TMH) segments and a positive-inside rule to predict amino-terminal orientation. The method (the topology predictor of transmembrane helical proteins using mean burial propensity [THUMBUP]) correctly predicted the topology of 55 of 73 proteins (or 75%) with known three-dimensional structures (the 3D helix database). This level of accuracy can be reached by MEMSAT 1.8 (a 200-parameter model-recognition method) and a new HMM-based method (a 111-parameter hidden Markov model, UMDHMM(TMHP)) if they are retrained with the 73-protein database. Thus, a method based on a physicochemical property can provide topology prediction as accurate as methods based on more complicated statistical models and learning algorithms for proteins with accurately known structures. Commonly used HMM-based methods and MEMSAT 1.8 were trained with a combination of the partial 3D helix database and a 1D helix database of TMH proteins in which topology information was obtained by gene fusion and other experimental techniques. These methods provide a significantly poorer prediction for the topology of TMH proteins in the 3D helix database. This suggests that the 1D helix database, because of its inaccuracy, should be avoided as either a training or testing database. A Web server of THUMBUP and UMDHMM(TMHP) is established for academic users at http://www.smbs.buffalo.edu/phys_bio/service.htm. The 3D helix database is also available from the same Web site.
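
The positive-inside rule mentioned above can be illustrated with a minimal sketch. It assumes the transmembrane helices are already known, that loops alternate between the two sides of the membrane, and that the side carrying more lysine and arginine residues is the cytoplasmic (inside) one. This is a generic illustration of the rule, not the THUMBUP implementation; the function names and the tie-breaking choice are hypothetical.

```python
def count_positive(segment):
    """Number of positively charged residues (Lys, Arg) in a segment."""
    return sum(segment.count(residue) for residue in "KR")


def n_terminus_inside(sequence, helices):
    """Predict whether the N-terminus is on the inside (cytoplasmic) side.

    `helices` is a sorted list of (start, end) pairs, 0-based and
    half-open, marking the transmembrane segments. Ties are resolved
    as "inside", which is an arbitrary choice for this sketch.
    """
    loops = []
    previous_end = 0
    for start, end in helices:
        loops.append(sequence[previous_end:start])
        previous_end = end
    loops.append(sequence[previous_end:])
    # Loops 0, 2, 4, ... lie on the N-terminal side; loops 1, 3, ... on the other.
    n_side = sum(count_positive(loop) for loop in loops[0::2])
    other_side = sum(count_positive(loop) for loop in loops[1::2])
    return n_side >= other_side
```

For a single-helix protein with a K/R-rich N-terminal tail, the function returns `True`; moving the charged residues to the C-terminal tail flips the prediction.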

17.
The HSSP database of protein structure-sequence alignments.   Cited by: 3 (self-citations: 0; citations by others: 3)
HSSP (homology-derived structures of proteins) is a derived database merging structural (2-D and 3-D) and sequence (1-D) information. For each protein of known 3-D structure from the Protein Data Bank, the database has a file with all sequence homologues, properly aligned to the PDB protein. Homologues are very likely to have the same 3-D structure as the PDB protein to which they have been aligned. As a result, the database is not only a database of aligned sequence families, but also a database of implied secondary and tertiary structures.

18.
KEGG: Kyoto Encyclopedia of Genes and Genomes.   Cited by: 14 (self-citations: 0; citations by others: 14)
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database, which consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with a graphical genome map of chromosomal locations that is displayed by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as tools for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

19.
MHCPEP, a database of MHC-binding peptides: update 1997.   Cited by: 10 (self-citations: 1; citations by others: 10)
MHCPEP (http://wehih.wehi.edu.au/mhcpep/) is a curated database comprising over 13 000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and, where available, experimental method, observed activity, binding affinity, source protein and anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via the Internet using WWW or FTP.

20.