Similar articles
Found 20 similar articles (search time: 15 ms)
1.
EXProt (database for EXPerimentally verified Protein functions) is a new non-redundant database containing protein sequences for which the function has been experimentally verified. It is a selection of 3976 entries from the Prokaryotes section of the EMBL Nucleotide Sequence Database, Release 66, and 375 entries from the Pseudomonas Community Annotation Project (PseudoCAP). Each EXProt entry has a unique ID number and provides information about the organism, protein sequence, functional annotation, a link to the entry in the original database and, if known, the gene name and links to references in PubMed/Medline. The EXProt web page (http://www.cmbi.nl/EXProt) provides further details of the database and a link to a BLAST search (blastp and blastx) of the database. The EXProt entries are indexed in SRS (http://www.cmbi.nl/srs/) and can be searched by keyword. The authors can be reached by email (exprot@cmbi.kun.nl).

2.
EXProt is a non-redundant protein database containing a selection of entries from genome annotation projects and public databases, aimed at including only proteins with an experimentally verified function. In EXProt release 2.0 we have collected entries from the Pseudomonas aeruginosa community annotation project (PseudoCAP), the Escherichia coli genome and proteome database (GenProtEC) and the translated coding sequences from the Prokaryotes division of the EMBL nucleotide sequence database that are described as having an experimentally verified function. Each entry in EXProt has a unique ID number and contains information about the species, amino acid sequence, functional annotation and, in most cases, links to references in MEDLINE/PubMed and to the entry in the original database. EXProt is indexed in SRS at CMBI (http://www.cmbi.kun.nl/srs/) and can be searched with BLAST and FASTA through the EXProt web page (http://www.cmbi.kun.nl/EXProt/).

3.
SRS (Sequence Retrieval System) is a widely used keyword search engine for querying biological databases. BLAST2 is the most widely used tool for querying databases by sequence similarity. These tools allow users to retrieve sequences by shared keyword or by shared similarity, and many public web servers offer them. However, with increasingly large datasets available, it is now quite common that a user is interested in some subset of homologous sequences but has no efficient way to restrict retrieval to that set. By allowing the user to control SRS from the BLAST output, BLAST2SRS (http://blast2srs.embl.de/) aims to meet this need. The server therefore combines the two ways of searching sequence databases: similarity and keyword.

4.
EMBL-Search: a CD-ROM based database query system   Total citations: 1 (self-citations: 0, citations by others: 1)
This paper describes a system of generally applicable index files provided on the EMBL sequence databases CD-ROM to facilitate the development of front-end software to the sequence databases available on this CD-ROM. The index files are used by a new versatile and user-friendly database retrieval program for the Apple Macintosh, EMBL-Search, which allows the easy construction of complex database queries. EMBL-Search utilizes cross-reference information contained in the databases to support navigation between different information resources. The ability to run EMBL-Search on a local computer network accessing a shared database CD-ROM makes its use particularly cost effective.

5.
Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed, and data have been collected in several specialised databases. In order to collect information on nuclear-coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome, where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information are also available. The MitoAln database, related to MitoNuc in the previous release and reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among MitoNuc entries for homologous proteins, a new field has been defined in the database: the cluster identifier, an alphanumeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logical scheme of the MitoNuc database has been implemented in the ORACLE DBMS. This will allow end-users to retrieve data through a friendly interface that will soon be implemented.

6.
7.
We have written two programs for searching biological sequence databases that run on Intel hypercube computers. PSCANLIB compares a single sequence against a sequence library, and PCOMPLIB compares all the entries in one sequence library against a second library. The programs provide a general framework for similarity searching; they include functions for reading in query sequences, search parameters and library entries, and reporting the results of a search. We have isolated the code for the specific function that calculates the similarity score between the query and library sequence; alternative searching algorithms can be implemented by editing two files. We have implemented the rapid FASTA sequence comparison algorithm and the more rigorous Smith-Waterman algorithm within this framework. The PSCANLIB program on a 16-node iPSC/2 80386-based hypercube can compare a 229 amino acid protein sequence with a 3.4 million residue sequence library in ~16 s with the FASTA algorithm. Using the Smith-Waterman algorithm, the same search takes 35 min. The PCOMPLIB program can compare a 0.8 million amino acid protein sequence library with itself in 5.3 min with FASTA on a third-generation 32-node Intel iPSC/860 hypercube. Received on September 8, 1990; accepted on December 15, 1990
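
The idea of isolating the scoring function from the library scan can be illustrated with a small serial sketch. This is not the PSCANLIB code: the function names, the match/mismatch/gap values and the linear gap penalty are assumptions chosen for illustration, and nothing here is parallelised.

```python
from typing import Callable, Iterable, List, Tuple


def smith_waterman_score(query: str, subject: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are illustrative assumptions, not those of the paper.
    """
    prev = [0] * (len(subject) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def scan_library(query: str,
                 library: Iterable[Tuple[str, str]],
                 score_fn: Callable[[str, str], int] = smith_waterman_score
                 ) -> List[Tuple[str, int]]:
    """Score the query against every (name, sequence) entry, best hits first.

    Swapping `score_fn` changes the search algorithm without touching the scan.
    """
    hits = [(name, score_fn(query, seq)) for name, seq in library]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

A faster heuristic scorer could be passed as `score_fn` in place of the dynamic-programming one, which mirrors the trade-off between speed and rigour that the abstract reports.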

8.

Background  

The MEDLINE database contains over 12 million references to scientific literature, with about 3/4 of recent articles including an abstract of the publication. Retrieval of entries using keyword queries is useful for human users who need to obtain small selections. However, particular analyses of the literature or database developments may need a complete ranking of all the references in the MEDLINE database by their relevance to a topic of interest. This report describes a method that performs this ranking using the differences in word content between MEDLINE entries related to a topic and the whole of MEDLINE, in a computational time appropriate for an article search query engine.
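
One way to picture ranking by differences in word content is the sketch below. It is only an assumption about how such a score might be built, not the method of the report: the smoothed log-ratio weighting, the tokenizer and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Distinct lowercase alphanumeric words of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_weights(topic_docs: List[str], all_docs: List[str]) -> Dict[str, float]:
    """Log-ratio of a word's document frequency in the topic set vs. the whole set.

    Add-one smoothing keeps the ratio finite for words absent from the topic set.
    """
    topic_df = Counter(w for doc in topic_docs for w in tokenize(doc))
    all_df = Counter(w for doc in all_docs for w in tokenize(doc))
    weights = {}
    for word, df_all in all_df.items():
        p_topic = (topic_df.get(word, 0) + 1) / (len(topic_docs) + 2)
        p_all = (df_all + 1) / (len(all_docs) + 2)
        weights[word] = math.log(p_topic / p_all)
    return weights


def rank(docs: List[str], weights: Dict[str, float]) -> List[Tuple[float, str]]:
    """Score each document by summing its word weights; highest score first."""
    scored = [(sum(weights.get(w, 0.0) for w in tokenize(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Because every document receives a score, the whole collection is ranked rather than filtered, which is the property the abstract emphasises.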

9.
Although a post-genomic era is emerging for many plants, the bacterial artificial chromosome (BAC) library is still a valuable tool for genomic studies and preservation of precious genetic resources. Construction of non-gridded BAC libraries would dramatically reduce cost and save storage space. A non-gridded BAC library composed of approximately 96,000 insert-containing clones in 80 pools with an average insert size of 75 kb was constructed. This library represented 5.2 genome equivalents. We successfully developed a unique procedure to retrieve positive clones from the non-gridded pools. With this retrieval protocol, the non-gridded library system can be adapted to different species and can serve various research needs.

10.
Database scanning programs such as BLAST and FASTA are now used by most biologists for the post-genomic processing of DNA or protein sequence information (in particular to retrieve the structure/function of uncharacterized proteins). Unfortunately, their results can be polluted by identical alignments (called redundancies) coming from the same protein or DNA sequences present in different entries of the database. This makes efficient use of the listed alignments difficult. Pretreatment of databases has been proposed to suppress strictly identical entries. However, many identical alignments still remain, since redundancies may occur locally for entries corresponding to various fragments of the same sequence, or for entries corresponding to highly homologous sequences that differ in only a few residues, such as orthologous proteins. In the present work, we show that redundant alignments can indeed be numerous even when working with a pretreated non-redundant data bank, reaching as much as 60% of the output results depending on the query and the bank. The accuracy and efficiency of post-genomic work will therefore be greatly increased if these redundancies are removed. To solve this previously unaddressed problem, we have developed an algorithm that allows the efficient and safe suppression of all the redundancies with no loss of information. This algorithm is based on various filtering steps that we describe here in the context of the Automat similarity search program, and such an algorithm should also be added to the other similarity search programs (BLAST, FASTA, etc.).
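
The abstract does not spell out its filtering steps, so the following is only a minimal sketch of one plausible step: collapsing hits whose aligned segment is identical while remembering which other entries carried it, so that no information is lost. The `Hit` record and its fields are hypothetical and do not reflect the Automat output format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Hit:
    """One alignment from a similarity search (hypothetical record layout)."""
    entry_id: str
    query_start: int
    query_end: int
    aligned_subject: str
    also_in: List[str] = field(default_factory=list)  # entries with the same alignment


def collapse_identical(hits: List[Hit]) -> List[Hit]:
    """Keep the first hit for each distinct alignment; record duplicates on it."""
    kept: Dict[Tuple[int, int, str], Hit] = {}
    for hit in hits:
        key = (hit.query_start, hit.query_end, hit.aligned_subject)
        if key in kept:
            kept[key].also_in.append(hit.entry_id)  # redundant: note the source entry
        else:
            kept[key] = hit
    return list(kept.values())
```

Hits that differ by even one residue are kept apart here; treating near-identical alignments as redundant would need an additional, explicitly defined similarity criterion.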

11.
We have characterized at the molecular level seven chromosome-specific libraries constructed in phage lambda Charon 21A from flow-sorted human chromosomes. The purity of libraries prepared from chromosomes sorted from hamster × human cells was estimated by species-specific hybridization and ranged from 48% to 83% of clones containing human inserts. Among libraries of chromosomes from human cells, mass screenings were made for repetitive sequences, and 20 clones from the #18 and #20 libraries were analyzed in detail. Ten to fifteen percent of all clones contain sequences which can be mapped; 80-100% of these derive from the intended chromosome of origin, demonstrating very high purity and a 35-fold enrichment of chromosome-specific sequences over a total genomic library. The two libraries contain a high, though dissimilar, percentage of repeat-containing clones; the #18 library has 55% repetitive clones and the #20 library 85%. This dissimilarity may be due to a difference in insert size distribution, since the #18 library has smaller inserts than the #20. This could be caused by variation in the extent of digestion of insert DNA and/or differences in sequence organization between the two chromosomes. A method more sensitive than conventional plaque-lift screening was used to detect repetitive inserts; in this way nearly all repetitive clones could be eliminated before purification of their DNAs.

12.
The red alga Porphyra purpurea (Roth) C. Agardh has a life cycle that alternates between shell-boring, filamentous sporophytes and free-living, foliose gametophytes. The significant morphological differences between these two phases suggest that many genes should be developmentally regulated and expressed in a phase-specific manner. In this study, we prepared and screened subtracted complementary DNA (cDNA) libraries specific for the sporophyte and gametophyte of P. purpurea. This involved the construction of cDNA libraries from each phase, followed by the removal of common clones through subtractive hybridization. Sampling of the subtracted libraries indicated that 8–10% of the recombinant colonies in each library were specific for the appropriate phase. Of 20 putative phase-specific cDNAs selected from each subtracted library, eight unique clones were obtained for the sporophyte and seven for the gametophyte. After confirming their phase-specificities by hybridization to gametophyte and sporophyte messenger RNA, these 15 phase-specific cDNAs were sequenced, and the deduced amino acid sequences were used to search protein databanks. Two proteins encoded by the sporophyte-specific cDNAs and two by the gametophyte-specific cDNAs were identified by their similarity to databank entries.

13.
HGVbase (Human Genome Variation database; http://hgvbase.cgb.ki.se, formerly known as HGBASE) is an academic effort to provide a high quality and non-redundant database of available genomic variation data of all types, mostly comprising single nucleotide polymorphisms (SNPs). Records include neutral polymorphisms as well as disease-related mutations. Online search tools facilitate data interrogation by sequence similarity and keyword queries, and searching by genome coordinates is now being implemented. Downloads are freely available in XML, FASTA, SRS, SQL and tagged-text file formats. Each entry is presented in the context of its surrounding sequence, and many records are related to neighboring human genes and affected features therein. Population allele frequencies are included wherever available. Thorough semi-automated data checking ensures internal consistency and addresses common errors in the source information. To keep pace with recent growth in the field, we have developed tools for fully automated annotation. All variants have been uniquely mapped to the draft genome sequence and are referenced to positions in EMBL/GenBank files. Data utility is enhanced by provision of genotyping assays and functional predictions. Recent data structure extensions allow the capture of haplotype and genotype information, and a new initiative (along with BiSC and HUGO-MDI) aims to create a central repository for the broad collection of clinical mutations and associated disease phenotypes of interest.

14.
Active Sequences Collection (ASC) is a collection of amino acid sequences with a unique feature: only short sequences with a demonstrated biological activity are collected. The current version of ASC consists of three sections: DORRS, a collection of active RGD-containing peptides; TRANSIT, a collection of protein regions active as substrates of the transglutaminase enzyme (TGase); and BAC, a collection of short peptides with demonstrated biological activity. Literature references are reported for each entry, as well as cross references to other databases when available. The current version of ASC includes more than 800 different entries. The main aim of this collection is to offer a new tool for investigating the structural features of protein active sites, in addition to similarity searches against large protein databases or searches for known functional patterns. The ASC database is available at the web address http://crisceb.unina2.it/ASC/, which also offers a dedicated query interface for comparing user-defined protein sequences with the database, as well as an updating interface that allows contribution of new referenced active sequences.

15.
16.
Fly larvae living on corpses can be used to estimate post-mortem intervals. The identification of these flies is decisive in forensic casework and can be facilitated by using DNA barcodes, provided that a representative and comprehensive reference library of DNA barcodes is available. We constructed a local (Belgium and France) reference library of 85 sequences of the COI DNA barcode fragment (mitochondrial cytochrome c oxidase subunit I gene) from 16 fly species of forensic interest (Calliphoridae, Muscidae, Fanniidae). This library was then used to evaluate the ability of two public libraries (GenBank and the Barcode of Life Data Systems, BOLD) to identify specimens from Belgian and French forensic cases. The public libraries do allow correct identification of most specimens. Yet some of the identifications remain ambiguous, and some forensically important fly species are not, or insufficiently, represented in the reference libraries. Several search options offered by GenBank and BOLD can be used to further improve the identifications obtained from both libraries using DNA barcodes.

17.
Serial analysis of gene expression (SAGE) is a powerful technique for quantifying gene expression. The huge amount of tag data in SAGE libraries of samples is difficult to analyze with current SAGE analysis tools. Data are often not provided in a biologically significant way for cross-analysis and cross-comparison, which limits their application. Hence, an integrated software platform that can perform such a complex task is required. Here, we implement set theory for cross-analyzing gene expression data among different SAGE libraries of tissue sources; up- or down-regulated tissue-specific tags can be identified computationally. Extract-SAGE employs a genetic algorithm (GA) to reduce the number of genes among the SAGE libraries. Its representative tag mining will facilitate the discovery of candidate genes with discriminating gene expression.
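
The set-theoretic cross-analysis can be sketched with ordinary set operations. This is an illustrative assumption, not Extract-SAGE itself: each library is modelled as a hypothetical mapping from tag sequence to count, and `min_count` is an invented presence threshold. The genetic-algorithm step is not shown.

```python
from typing import Dict, Set

Library = Dict[str, int]  # tag sequence -> observed count (assumed layout)


def present(library: Library, min_count: int = 1) -> Set[str]:
    """Tags observed at least `min_count` times in one library."""
    return {tag for tag, count in library.items() if count >= min_count}


def specific_tags(libraries: Dict[str, Library], target: str,
                  min_count: int = 1) -> Set[str]:
    """Tags present in the target library and absent from all others (set difference)."""
    others: Set[str] = set()
    for name, library in libraries.items():
        if name != target:
            others |= present(library, min_count)
    return present(libraries[target], min_count) - others


def shared_tags(libraries: Dict[str, Library], min_count: int = 1) -> Set[str]:
    """Tags present in every library (set intersection)."""
    sets = [present(library, min_count) for library in libraries.values()]
    return set.intersection(*sets) if sets else set()
```

For example, with a brain library `{"AAAC": 5, "CCGT": 2}` and a liver library `{"CCGT": 7, "GGTA": 1}`, `specific_tags` for brain yields `{"AAAC"}` and `shared_tags` yields `{"CCGT"}`.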

18.
Most current implementations of motif matching in biological sequences have sacrificed the generality of weight matrix scoring for shorter run times. The program MOTIF incorporates a weight matrix and a rapid, backtracking tree-search algorithm to score motif compliance with greatly enhanced performance while placing no constraints on the motif. In addition, any positions within a motif can be marked as 'inviolate', thereby requiring an exact match. MOTIF allows a choice of regular expression formats and can use both motif and sequence libraries as either targets or queries. Nucleic acid sequences can optionally be translated by MOTIF in any frame(s) and used against peptide motifs.
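
Weight matrix scoring with inviolate positions can be sketched as a plain sliding-window scan. This is a deliberately simple stand-in and does not reproduce MOTIF's backtracking tree search; the matrix layout, the `inviolate` mapping and the threshold parameter are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def scan_motif(sequence: str,
               matrix: List[Dict[str, float]],
               inviolate: Dict[int, str],
               threshold: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window that satisfies the motif.

    matrix    -- one dict per motif position mapping residue -> weight
                 (residues missing from a column score 0.0)
    inviolate -- motif position -> residue that must match exactly
    threshold -- minimum summed weight for a window to be reported
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Reject early if any inviolate position is not an exact match.
        if any(window[pos] != residue for pos, residue in inviolate.items()):
            continue
        score = sum(column.get(residue, 0.0) for column, residue in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

Checking the inviolate positions before summing weights is the only pruning done here; a tree search over a motif library would share that work across motifs with common prefixes.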

19.
Combinatorial peptide libraries have been playing a major role in the search for new drugs, ligands, enzyme substrates, and other specifically interacting molecules. The principal requirements for these libraries are a versatile repertoire, an easily identifiable tag for each of the library members, a simple method of synthesis, and compatibility with the biochemical milieu. Two types of combinatorial libraries are in use: synthetic libraries and biological (mainly phage display) ones. An advantage of the biological libraries lies in the ability of each library member to replicate itself and in the fact that the members carry their own coding sequences. The uniqueness of filamentous phage is that, of its five virion proteins, three can tolerate the insertion of foreign peptides, each in a distinctive manner. The major coat protein, pVIII, is capable of displaying hundreds of peptide copies over the phage virion; pIII can display either one or five copies; and pVI, as opposed to the first two, displays its peptides such that the carboxy terminus is oriented outward. A major drawback of filamentous phage is its size. The length of an intact phage particle is 930 nm and it contains an ssDNA of 6400 bp. 2800 copies of the major coat protein form a "fish scale" cover over most of the virion DNA, whereas five copies of pIII, which has been the major protein used for library display, and five copies of pVI are located at one end of the filamentous virion. There is no doubt that, in order to improve the quality of filamentous phage libraries, the size of the phage should be drastically reduced. Comprehensive research on the phage life cycle and its structure will lead us to the construction of miniature phage and to other methods that will enable in vivo expansion of the library repertoire as well as binding-induced specific clone proliferation.

20.