Similar literature
 20 similar articles found (search time: 15 ms)
1.
Three-dimensional structures are now known for roughly half of all protein families. It is thus quite likely, in searching sequence databases, that one will encounter a homolog with known structure and be able to use this information to infer structure-function properties. The goal of Entrez's 3D structure database is to make this information accessible and useful to molecular biologists. To this end, Entrez's search engine provides three powerful features: (i) Links between databases; one may search by term matching in Medline(R), for example, and link to 3D structures reported in these articles. (ii) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view a combined molecular-graphic and alignment display, to infer approximate 3D structure. Entrez's MMDB (Molecular Modeling DataBase) may be accessed at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html

2.
Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure.

3.
MMDB: Entrez's 3D structure database.   Total citations: 5, self-citations: 1, citations by others: 4
The three dimensional structures for representatives of nearly half of all protein families are now available in public databases. Thus, no matter which protein one investigates, it is increasingly likely that the 3D structure of a homolog will be known and may reveal unsuspected structure-function relationships. The goal of Entrez's 3D-structure database is to make this information accessible and usable by molecular biologists (http://www.ncbi.nlm.nih.gov/Entrez). To this end Entrez provides two major analysis tools, a search engine based on sequence and structure 'neighboring' and an integrated visualization system for sequence and structure alignments. From a protein's sequence 'neighbors' one may rapidly identify other members of a protein family, including those where 3D structure is known. By comparing aligned sequences and/or structures in detail, using the visualization system, one may identify conserved features and perhaps infer functional properties. Here we describe how these analysis tools may be used to investigate the structure and function of newly discovered proteins, using the PTEN gene product as an example.

4.
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.
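
The idea of scoring a query against a Position Specific Score Matrix can be illustrated with a minimal Python sketch. It is not RPS-BLAST, which also handles gaps and hit statistics; it only shows position-specific scoring over ungapped windows. The matrix `TOY_PSSM`, the penalty `DEFAULT_SCORE`, and both function names are hypothetical and chosen for illustration.

```python
# Hypothetical toy PSSM: one dict of residue scores per alignment column.
# Real PSSMs derived from CDD alignments cover all 20 amino acids and many more columns.
TOY_PSSM = [
    {"A": 2, "G": 1, "S": 0},
    {"C": 5, "S": -1},
    {"G": 3, "A": 0},
]
DEFAULT_SCORE = -2  # assumed penalty for a residue not listed in a column


def window_score(pssm, window):
    """Sum the column-specific scores for one ungapped window of the query."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(pssm, window))


def best_hit(pssm, sequence):
    """Return (score, start) of the best ungapped window, or None if the query is too short."""
    width = len(pssm)
    if len(sequence) < width:
        return None
    scored = [
        (window_score(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    # Highest score wins; on ties the earliest start position is kept.
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    print(best_hit(TOY_PSSM, "MKACGT"))
```

The same residue can score differently in different columns, which is what distinguishes a PSSM from a single substitution matrix.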

5.
Software to automate the process of extracting molecular interactions from three-dimensional (3D) structures has been developed that records these as Biomolecular Interaction Network Database (BIND) pairwise interaction records. Full annotation of BIND records is provided through a database processing tool called MMDBind, including detailed atom-atom and residue-residue level interaction information. BIND three-dimensional interaction annotation is synthesized by combining information from the Molecular Modeling Database (MMDB), and the HET (heterogen) group dictionary of small molecules in the macromolecular Crystallographic Information Format (mmCIF). Interactions are validated using the Protein Quaternary Structure (PQS) system. A total of 18,166 interactions were removed as being redundant or biologically irrelevant after PQS validation. This first pass MMDBind annotation creates two new divisions of BIND, 3D Biopolymers (BIND-3DBP) comprising 16,737 initial interaction records, and 3D Small Molecules (BIND-3DSM) comprising 48,219 records. Visualization of interacting residues and nucleotides within a macromolecular structure is possible directly from the BIND database owing to added 3D feature annotation within the BIND records that can be conveniently seen using Cn3D ("see-in-3D") after query from the BIND Data Manager. These interaction records provide a further demonstration of the completeness of the BIND data specification and its capabilities as storage and exchange format for all kinds of molecular interactions, including RNA, DNA, protein, and small molecules. Data from the 3DBP and 3DSM sets are available for downloading in Abstract Syntax Notation.1 (ASN.1) or Extensible Markup Language (XML) formats at ftp://ftp.bind.ca/DB/MMDBBind. Data from the 3DBP set is available for interactive query from the BIND Data Manager at www.bind.ca.

6.
HCVDB   Total citations: 2, self-citations: 0, citations by others: 2
To date, more than 30 000 hepatitis C virus (HCV) sequences have been deposited in the generalist databases DNA Data Bank of Japan (DDBJ), EMBL Nucleotide Sequence Database (EMBL) and GenBank. The main difficulties with HCV sequences in these databases are their retrieval, annotation and analyses. To help HCV researchers face the increasing needs of HCV sequence analyses, we developed a specialised database of computer-annotated HCV sequences, called HCVDB. HCVDB is re-built every month from an up-to-date EMBL database by an automated process. HCVDB provides key data about the HCV sequences (e.g. genotype, genomic region, protein names and functions, known 3-dimensional structures) and ensures consistency of the annotations, which enables reliable keyword queries. The database is highly integrated with sequence and structure analysis tools and the SRS (LION bioscience) keywords query system. Thus, any user can extract subsets of sequences matching particular criteria or enter their own sequences and analyse them with various bioinformatics programs available on the same server. AVAILABILITY: HCVDB is available from http://hepatitis.ibcp.fr.

7.
iSPOT (http://cbm.bio.uniroma2.it/ispot) is a web tool developed to infer the recognition specificity of protein module families; it is based on the SPOT procedure that utilizes information from position-specific contacts, derived from the available domain/ligand complexes of known structure, and experimental interaction data to build a database of residue-residue contact frequencies. iSPOT is available to infer the interaction specificity of PDZ, SH3 and WW domains. For each family of protein domains, iSPOT evaluates the probability of interaction between a query domain of the specified families and an input protein/peptide sequence and makes it possible to search for potential binding partners of a given domain within the SWISS-PROT database. The experimentally derived interaction data utilized to build the PDZ, SH3 and WW databases of residue-residue contact frequencies are also accessible. Here we describe the application to the WW family of protein modules.

8.
The Yeast Protein Database (YPD) is a database for the proteins of the budding yeast, Saccharomyces cerevisiae. YPD is the first annotated database for the complete proteome of any organism. Now that the complete genome sequence of yeast is available, YPD contains entries for each of the characterized proteins and for each of the uncharacterized proteins predicted from the sequence. Contained in YPD are the calculated properties of each protein such as molecular weight and isoelectric point, experimentally determined properties such as subcellular localization and post-translational modifications, and extensive annotations from the yeast literature. YPD contains 25 000 lines of textual annotation that describe the known functions, mutant phenotypes, interactions, and other properties for the approximately 6000 proteins in the yeast proteome. The information in YPD is updated daily, and it is available on the World Wide Web at http://www.proteome.com/YPDhome.html.

9.
10.

Background

Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity.

Results

To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine.

Conclusion

This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.

11.
12.
13.
Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences which have only 15–25% identity, the pair-wise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility to that known 3D structure. We have improved the above method by using the predicted secondary structure information and employ it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence to the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in predicting correctly the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web based PSS-3D1D method is freely available in the PredictFold server at .
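
One way to picture adding a secondary-structure term to a 3D1D compatibility score is the Python sketch below. The abstract does not give the actual scoring formula, so everything here is an assumption made for illustration: the additive form, the pre-aligned ungapped inputs, the function name `pss_3d1d_score`, and the parameters `profile`, `fold_ss`, `predicted_ss`, `w` and `default`.

```python
def pss_3d1d_score(profile, fold_ss, query_seq, predicted_ss, w=1.0, default=0.0):
    """Toy compatibility score between a query and a known fold.

    profile      -- hypothetical 3D1D profile: one dict of residue scores per fold position
    fold_ss      -- secondary-structure string of the known fold (e.g. 'H', 'E', 'C')
    query_seq    -- query residues, assumed already aligned position by position
    predicted_ss -- predicted secondary structure of the query
    w            -- assumed bonus added wherever predicted and fold states agree
    """
    if not (len(profile) == len(fold_ss) == len(query_seq) == len(predicted_ss)):
        raise ValueError("all inputs must describe the same number of positions")
    total = 0.0
    for column, fold_state, residue, query_state in zip(
        profile, fold_ss, query_seq, predicted_ss
    ):
        total += column.get(residue, default)  # 3D1D-style environment term
        if fold_state == query_state:
            total += w  # secondary-structure agreement term
    return total


if __name__ == "__main__":
    toy_profile = [{"A": 1.0}, {"L": 2.0}, {"G": 0.5}]
    print(pss_3d1d_score(toy_profile, "HHC", "ALG", "HEC", w=0.5))
```

With `w=0` the sketch reduces to the plain profile sum, which makes the contribution of the secondary-structure term easy to isolate.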

14.
A procedure that automatically provides an evaluation of the diagnostic ability of a protein sequence functional pattern is described. The procedure relies on the identification of the closest definable set in terms of a (protein sequence) database functional annotation to the set of database instances containing a given pattern. Assuming annotation correctness and completeness in the protein sequence database, the degree of statistical association between these sets provides an appropriate measure of the diagnostic ability of the pattern. An experimental implementation of the procedure, using the NBRF/PIR protein database, has been applied to a diverse collection of published sequence patterns. Results obtained reveal that frequently it is not possible to define (in NBRF/PIR database terminology) the set of database instances containing a given pattern, suggesting either lack of pattern diagnostic ability or protein database annotation incompleteness and/or inconsistencies. Received on November 30, 1989; accepted on July 20, 1990
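
The comparison between the set of entries matching a pattern and the sets defined by annotation terms can be sketched in Python as follows. The abstract does not say which association statistic was used; the phi coefficient of a 2x2 contingency table is assumed here purely for illustration, and the function names, entry identifiers and annotation terms are hypothetical. Both sets are assumed to be subsets of a database of `database_size` entries.

```python
import math


def phi_coefficient(pattern_hits, annotated, database_size):
    """Phi coefficient between two sets of database entries (1.0 means identical sets)."""
    a = len(pattern_hits & annotated)   # entries with pattern and annotation
    b = len(pattern_hits - annotated)   # pattern only
    c = len(annotated - pattern_hits)   # annotation only
    d = database_size - a - b - c       # neither
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return 0.0 if denom == 0 else (a * d - b * c) / denom


def closest_annotation(pattern_hits, annotation_sets, database_size):
    """Return (term, phi) for the annotation set most associated with the pattern hits."""
    term, entries = max(
        annotation_sets.items(),
        key=lambda item: phi_coefficient(pattern_hits, item[1], database_size),
    )
    return term, phi_coefficient(pattern_hits, entries, database_size)


if __name__ == "__main__":
    hits = {"P1", "P2", "P3"}
    terms = {"kinase": {"P1", "P2", "P3"}, "protease": {"P4", "P5"}}
    print(closest_annotation(hits, terms, database_size=10))
```

A low best score would correspond to the situation the abstract describes, where no annotation-defined set matches the pattern's instances well.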

15.
Increasingly large numbers of proteins require methods for functional annotation. This is typically based on pairwise inference from the homology of either protein sequence or structure. Recently, similarity networks have been presented to leverage both the ability to visualize relationships between proteins and assess the transferability of functional inference. Here we present PANADA, a novel toolkit for the visualization and analysis of protein similarity networks in Cytoscape. Networks can be constructed based on pairwise sequence or structural alignments either on a set of proteins or, alternatively, by database search from a single sequence. The Panada web server, an executable for download, examples and extensive help files are available at http://protein.bio.unipd.it/panada/.

16.
WebACT--an online companion for the Artemis Comparison Tool   Total citations: 4, self-citations: 0, citations by others: 4
SUMMARY: WebACT is an online resource which enables the rapid provision of simultaneous BLAST comparisons between up to five genomic sequences in a format amenable for visualization with the well-known Artemis Comparison Tool (ACT). Comparisons can be generated on-the-fly using sequences directly retrieved via EMBL database queries, or by entering or uploading user sequences. Furthermore, pre-computed comparisons are available between all publicly available, completed prokaryotic genomes and plasmids currently contained within the Genome Reviews database (372 sequences, representing 175 different species). The system is designed to minimize the volume of downloaded data and maximize performance. Genome sequences, annotation and pre-computed comparisons are stored in a relational database allowing flexible querying based on user-defined sequence regions, from whole genome to a defined region flanking a specified gene. Comparison and sequence files, whether computed online or retrieved from the database of pre-computed genome comparisons, can be viewed online using ACT and are available for download. AVAILABILITY: Freely accessible at http://www.webact.org. SUPPLEMENTARY INFORMATION: User guide and worked examples are available at http://www.webact.org/WebACT/docs.

17.
MOTIVATION: There has been great expectation that the knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs) that lead to an amino acid change in the protein product are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs from a large number of neutral nsSNPs, it is important to develop computational tools to predict the phenotypic effects of nsSNPs. RESULTS: We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the 3D homologous structure as well as evolutionary information derived from the multiple sequence alignment were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (no fewer than 10 homologous sequences), the performance of our method is comparable with the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (<10 homologous sequences), our method outperforms the SIFT algorithm significantly. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available. AVAILABILITY: The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/

18.
PartiGene--constructing partial genomes   Total citations: 4, self-citations: 0, citations by others: 4
Expressed sequence tags (ESTs) offer a low-cost approach to gene discovery and are being used by an increasing number of laboratories to obtain sequence information for a wide variety of organisms. The challenge lies in processing and organizing this data within a genomic context to facilitate large scale analyses. Here we present PartiGene, an integrated sequence analysis suite that uses freely available public domain software to (1) process raw trace chromatograms into sequence objects suitable for submission to dbEST; (2) place these sequences within a genomic context; (3) perform customizable first-pass annotation of the data; and (4) present the data as HTML tables and an SQL database resource. PartiGene has been used to create a number of non-model organism database resources including NEMBASE (http://www.nematodes.org) and LumbriBase (http://www.earthworms.org/). The packages are readily portable, freely available and can be run on simple Linux-based workstations. AVAILABILITY: PartiGene is available from http://www.nematodes.org/PartiGene and also forms part of the EST analysis software, associated with the Natural Environmental Research Council (UK) Bio-Linux project (http://envgen.nox.ac.uk/biolinux.html).

19.

Background

Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals.

Results

We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog).

Conclusions

We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.

20.