Similar Documents
20 similar documents found.
1.
2.
3.
Words appearing in abstracts of scientific articles are often useful as search terms, particularly those words and word patterns that are unique to the relevant field of endeavour. In view of the heightened interest in obtaining information about alternatives to animal testing, efforts directed toward enhancing retrieval of pertinent references from the biomedical literature are warranted. Words, phrases, and word-phrase co-occurrences describing methods of experimentation in abstracts about alternatives to skin-irritation testing in animals were evaluated with regard to retrieval efficiency in the National Library of Medicine database Toxline. Retrieval precision was defined as the proportion of pertinent references among all citations retrieved. Retrieval precision values ranged from 0.25% to 100%.
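The precision measure defined above is simple arithmetic. The sketch below assumes precision is reported as a percentage (pertinent citations retrieved divided by all citations retrieved, times 100); the function name and the search-term counts are hypothetical and are not data from the study.

```python
def retrieval_precision(pertinent_retrieved: int, total_retrieved: int) -> float:
    """Precision (%) = pertinent citations retrieved / all citations retrieved * 100."""
    if total_retrieved <= 0:
        raise ValueError("total_retrieved must be positive")
    if not 0 <= pertinent_retrieved <= total_retrieved:
        raise ValueError("pertinent_retrieved must be between 0 and total_retrieved")
    return 100.0 * pertinent_retrieved / total_retrieved


# Hypothetical counts for two search terms (illustrative only, not the study's data).
for term, (pertinent, total) in {"term A": (1, 400), "term B": (12, 12)}.items():
    print(f"{term}: {retrieval_precision(pertinent, total):.2f}%")
```

A term that retrieves 1 pertinent citation out of 400 scores 0.25%, while a highly specific phrase whose every hit is pertinent scores 100%, which is how a single database can show the wide range reported above.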

4.
5.
6.
GenBank.   Total citations: 19; self-citations: 15; citations by others: 19
D Benson, D J Lipman, J Ostell. Nucleic Acids Research, 1993, 21(13): 2963-2965
The GenBank sequence database has undergone an expansion in data coverage, annotation content and the development of new services for the scientific community. In addition to nucleotide sequences, data from the major protein sequence and structural databases, and from U.S. and European patents is now included in an integrated system. MEDLINE abstracts from published articles describing the sequences provide an important new source of biological annotation for sequence entries. In addition to the continued support of existing services, new CD-ROM and network-based systems have been implemented for literature retrieval and sequence similarity searching. Major releases of GenBank are now more frequent and the data are distributed in several new forms for both end users and software developers.

7.
The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes, including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI.
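Entrez can also be queried programmatically. As a hedged sketch, the snippet below only builds request URLs for NCBI's E-utilities (ESearch to find record IDs matching a query, ELink to follow links between Entrez databases such as Nucleotide and PubMed). It does not contact NCBI; the helper names are our own, and the UID in the usage lines is a placeholder, not a real record. Real clients should also follow NCBI's E-utilities usage policy (rate limits, `tool` and `email` parameters).

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL returning up to `retmax` IDs in `db` that match `term`."""
    return EUTILS_BASE + "esearch.fcgi?" + urlencode({"db": db, "term": term, "retmax": retmax})


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Build an ELink URL that follows Entrez links from record `uid` in `dbfrom` into `db`."""
    return EUTILS_BASE + "elink.fcgi?" + urlencode({"dbfrom": dbfrom, "db": db, "id": uid})


# Usage (URLs are only printed, never fetched):
print(esearch_url("nucleotide", "Arabidopsis thaliana[Organism] AND rbcL[Gene]"))
print(elink_url("nucleotide", "pubmed", "123456"))  # placeholder UID, hypothetical
```

Keeping URL construction separate from the HTTP call makes the query logic easy to test offline; any HTTP client can then fetch the URL.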

8.
9.
Automated extraction of information in molecular biology   Total citations: 3; self-citations: 0; citations by others: 3
Andrade MA, Bork P. FEBS Letters, 2000, 476(1-2): 12-17
We review data mining techniques in molecular biology, specifically those that extract information from the scientific literature itself. As more of the biological literature is published electronically, there is an opportunity, and even a need, to automatically summarize the literature in a customized way, for example by associating keywords with a topic. These keywords can be extracted from relevant publications. The process of keyword extraction can be automated and optimized to keep literature pointers automatically up-to-date or to filter relevant information from the literature. To illustrate these points, OMIM (Online Mendelian Inheritance in Man), a database of human inherited diseases, was linked to the literature, and keywords were derived that covered distinct aspects such as genetic information on the one hand and disease-specific protein and phenotypic information on the other. These keywords were used to extract information that helps keep disease entries up-to-date.
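One common way to automate keyword extraction is to score each word by how much more frequently it occurs in abstracts about a topic than in a background set. The sketch below uses a smoothed log-ratio of document frequencies; this scoring is an assumption for illustration, not necessarily the statistic used by the authors, and the toy abstracts are invented.

```python
import math
import re
from collections import Counter


def doc_freq(docs):
    """Count, for each word, how many documents contain it at least once."""
    counts = Counter()
    for text in docs:
        counts.update(set(re.findall(r"[a-z][a-z0-9-]+", text.lower())))
    return counts


def keyword_scores(topic_docs, background_docs, min_docs=2):
    """Rank words by log(topic doc-frequency / background doc-frequency).

    Add-one smoothing on the background keeps words absent from it finite.
    """
    topic, bg = doc_freq(topic_docs), doc_freq(background_docs)
    n_t, n_b = len(topic_docs), len(background_docs)
    scores = {}
    for word, k in topic.items():
        if k < min_docs:
            continue  # ignore words seen in too few topic abstracts
        p_topic = k / n_t
        p_bg = (bg[word] + 1) / (n_b + 1)
        scores[word] = math.log(p_topic / p_bg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy abstracts: a "topic" set and a general background set.
topic = ["cystic fibrosis CFTR mutation chloride", "CFTR chloride channel mutation", "the CFTR gene"]
background = ["the cell cycle", "the protein kinase", "a mutation in the gene"]
print(keyword_scores(topic, background))
```

Words common everywhere (such as "mutation") score low even when frequent in the topic set, which is what keeps generic vocabulary out of the keyword list.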

10.
Perez-Iratxeta C, Keer HS, Bork P, Andrade MA. BioTechniques, 2002, 32(6): 1380-2, 1384-5
The growing volume of information in biology makes it difficult for researchers in any field to keep current with the literature. The MEDLINE database of scientific abstracts can be scanned quickly by electronic means, and potentially interesting abstracts can be selected by matching words joined by Boolean operators. However, this way of selecting documents is not optimal: nonspecific queries often have to be run, returning large numbers of irrelevant abstracts that must be scanned manually. To facilitate this analysis, we have developed a system that compiles a summary of subjects and related documents from the results of a MEDLINE query. To do this, we apply a fuzzy binary relation formalism that deduces relations between words present in a set of abstracts preprocessed with a standard grammatical tagger. These relations are used to derive ensembles of related words and their associated subsets of abstracts. The algorithm is publicly available at http://www.bork.embl-heidelberg.de/xplormed/.
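A minimal sketch of a fuzzy binary relation between words, assuming the abstracts have already been reduced to sets of content words (for example by the tagging step). Here R(a, b) is taken to be the fraction of abstracts containing word a that also contain word b, a value in [0, 1] that need not be symmetric. This definition, the threshold, and the helper names are assumptions for illustration, not the published XplorMed formalism.

```python
from itertools import permutations


def fuzzy_relation(doc_words):
    """R[(a, b)] = |abstracts with a and b| / |abstracts with a| (asymmetric, in [0, 1])."""
    docs_with = {}
    for i, words in enumerate(doc_words):
        for w in words:
            docs_with.setdefault(w, set()).add(i)
    rel = {}
    for a, b in permutations(docs_with, 2):
        shared = len(docs_with[a] & docs_with[b])
        if shared:
            rel[(a, b)] = shared / len(docs_with[a])
    return rel, docs_with


def related_ensemble(seed, rel, docs_with, threshold=0.5):
    """Words strongly related to `seed` (R >= threshold) plus the abstracts mentioning any of them."""
    words = {seed} | {b for (a, b), r in rel.items() if a == seed and r >= threshold}
    abstracts = set().union(*(docs_with[w] for w in words))
    return words, abstracts


# Invented word sets for four abstracts (indices 0 to 3).
abstracts = [
    {"apoptosis", "caspase", "mitochondria"},
    {"apoptosis", "caspase"},
    {"apoptosis", "p53"},
    {"kinase", "p53"},
]
rel, docs_with = fuzzy_relation(abstracts)
print(related_ensemble("apoptosis", rel, docs_with))
```

The asymmetry is deliberate in this sketch: every abstract mentioning "caspase" also mentions "apoptosis" (R = 1.0), but not the reverse (R = 2/3).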

11.
Many biological databases that provide comparative genomics information and tools are now available on the internet. While certainly quite useful, to our knowledge none of the existing databases combine results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be extremely useful to typical experimental biologists wanting to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful because they can use the P-POD database infrastructure when developing their own comparative genomics resources and database tools.
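The Jaccard coefficient behind Jaccard-style clustering measures how much two sets overlap: |A ∩ B| / |A ∪ B|. As a hedged sketch, the code below groups genes whose feature sets (here, hypothetical sets of sequence-similarity hit IDs) have Jaccard similarity at or above a threshold, using single linkage via a union-find structure. This illustrates the coefficient only; it is not the P-POD pipeline, and the gene names, hit IDs, and threshold are invented.

```python
def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_clusters(features: dict, threshold: float = 0.5):
    """Group items whose feature sets reach Jaccard >= threshold (single linkage)."""
    parent = {k: k for k in features}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    names = sorted(features)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(features[a], features[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups
    groups = {}
    for k in names:
        groups.setdefault(find(k), set()).add(k)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical genes described by the IDs of their similarity hits.
features = {
    "yeast_A": {"h1", "h2", "h3"},
    "worm_A": {"h1", "h2", "h4"},
    "fly_A": {"h2", "h3", "h4"},
    "yeast_B": {"h9"},
}
print(jaccard_clusters(features))
```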

12.
GlycoSuiteDB is a relational database that curates information from the scientific literature on glycoprotein-derived glycan structures, their biological sources, the references in which each glycan was described, and the methods used to determine the glycan structure. To date, the database includes most published O-linked oligosaccharides from the last 50 years and most N-linked oligosaccharides published in the 1990s. For each structure, information is available concerning the glycan type, linkage and anomeric configuration, mass and composition. Detailed information is also provided on native and recombinant sources, including tissue and/or cell type, cell line, strain and disease state. Where known, the proteins to which the glycan structures are attached are reported, and cross-references to the SWISS-PROT/TrEMBL protein sequence databases are given where applicable. The GlycoSuiteDB annotations include literature references linked to PubMed, and detailed information on the methods used to determine each glycan structure is noted to help users assess the quality of the structural assignment. GlycoSuiteDB has a user-friendly web interface that allows researchers to query the database by monoisotopic or average mass, monosaccharide composition, glycosylation linkage (e.g. N- or O-linked), reducing terminal sugar, attached protein, taxonomy, tissue or cell type, and GlycoSuiteDB accession number. Advanced queries using combinations of these parameters are also possible. GlycoSuiteDB can be accessed on the web at http://www.glycosuite.com.
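A query by monoisotopic mass can be understood as computing each stored glycan's mass from its monosaccharide composition and keeping entries within a tolerance of the target. The sketch below assumes free, underivatized glycans (sum of monoisotopic residue masses plus one water) and a ±0.05 Da window; it is not GlycoSuiteDB's implementation, and the record IDs are hypothetical.

```python
# Monoisotopic residue masses (Da): free monosaccharide minus one H2O.
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. mannose, galactose, glucose
    "HexNAc": 203.07937,  # e.g. GlcNAc, GalNAc
    "dHex": 146.05791,    # e.g. fucose
    "NeuAc": 291.09542,   # N-acetylneuraminic (sialic) acid
}
WATER = 18.01056  # added once for the free reducing-end glycan (assumption: no derivatization)


def glycan_mass(composition: dict) -> float:
    """Monoisotopic mass of a free glycan from its monosaccharide composition."""
    return WATER + sum(RESIDUE_MASS[name] * n for name, n in composition.items())


def query_by_mass(records: dict, target: float, tol: float = 0.05):
    """Return IDs of records whose computed mass lies within +/- tol Da of target."""
    return [rid for rid, comp in records.items() if abs(glycan_mass(comp) - target) <= tol]


# Hypothetical records: an oligomannose N-glycan core and a core-1 O-glycan.
records = {
    "hypothetical-1": {"Hex": 5, "HexNAc": 2},
    "hypothetical-2": {"Hex": 1, "HexNAc": 1},
}
print(query_by_mass(records, 1234.43))
```

Average-mass queries would work the same way with a table of average residue masses instead.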

13.
GenBank.   Total citations: 8; self-citations: 3; citations by others: 5
The GenBank sequence database continues to expand its data coverage, quality control, annotation content and retrieval services for the scientific community. Besides handling direct submissions of sequence data from authors, GenBank also incorporates DNA sequences from all available public sources; an integrated retrieval system, known as Entrez, also makes available data from the major protein sequence and structural databases, and from U.S. and European patents. MEDLINE abstracts from published articles describing the sequences are also included as an additional source of biological annotation for sequence entries. GenBank supports distribution of the data via FTP, CD-ROM, and E-mail servers. Network client-server programs provide access to an integrated database for literature retrieval and sequence similarity searching.

14.
Recent technological advances in lasers and optical detectors have enabled a variety of new single-molecule technologies to be developed. Using intense and highly collimated laser light sources together with super-sensitive cameras, the fluorescence of single fluorophores can now be imaged in aqueous solution. Laser optical tweezers have also enabled the piconewton forces produced by a pair of interacting biomolecules to be measured directly. However, for a researcher new to the field, beginning to use such techniques might seem a daunting prospect, because most of the equipment in use is custom-built. Yet most of this equipment is in essence fairly simple, and the aim of this article is to provide an entry point to the field for a newcomer. It focuses mainly on practical aspects that are not well covered in the literature, and aims to provide an overview of the field as a whole, with references and web links to more detailed sources elsewhere. Indeed, publishing an article such as this on the Internet affords many new opportunities (and more space!) for presenting scientific ideas and information. For example, we have illustrated the nature of optical trap data with an interactive Java simulation, provided links to relevant web sites and technical documents, and included a large number of colour figures and plots. Our group's research focuses on molecular motors, and the emphasis of this article reflects this. Molecular motors have been a paradigm (or prototype) for single-molecule research, and the field has seen rapid development of the techniques. We hope that the methods described here will be broadly applicable to other biological systems. This is an interactive contribution, which can be accessed at:
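Piconewton force measurement with optical tweezers usually treats the trap as a linear spring: once the trap stiffness κ is known, the force on a bead displaced by x from the trap centre is F = κx. One standard calibration, the equipartition method, sets κ = k_BT / ⟨x²⟩ from the thermal variance of the bead position. The sketch below assumes positions already calibrated in nanometres along one axis and a temperature of 295 K; the simulated trace is invented, and this is not the article's Java simulation or necessarily its calibration procedure (power-spectrum methods are also widely used).

```python
import random
import statistics

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def trap_stiffness(positions_nm, temperature_k=295.0):
    """Equipartition estimate kappa = k_B*T / var(x), in pN/nm.

    positions_nm: bead positions along one axis (nm), recorded with the trap held fixed.
    """
    kbt_pn_nm = K_B * temperature_k / 1e-21  # 1 pN*nm = 1e-21 J, so k_B*T is about 4.07 pN*nm
    return kbt_pn_nm / statistics.pvariance(positions_nm)


def trap_force(kappa_pn_per_nm, displacement_nm):
    """Hookean restoring force (pN) on a bead displaced from the trap centre."""
    return kappa_pn_per_nm * displacement_nm


# Simulated thermal motion with a 10 nm standard deviation (invented data).
random.seed(1)
positions = [random.gauss(0.0, 10.0) for _ in range(20000)]
kappa = trap_stiffness(positions)
print(f"stiffness ~ {kappa:.4f} pN/nm; force at 50 nm ~ {trap_force(kappa, 50.0):.2f} pN")
```

In practice the position signal must itself be calibrated, and detector bandwidth or filtering can bias the measured variance, so this estimate is only as good as those inputs.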

15.
One of the main goals in proteomics is to answer biological and molecular questions about a set of identified proteins. To achieve this goal, one has to extract and collect the existing biological data for every protein from public repositories and then analyze and organize the collected data. Because of the complexity of this task and the huge amount of data available, it is not feasible to gather this information by hand, so automatic methods of data collection are needed. Within a proteomic context, we have developed the Protein Information and Knowledge Extractor (PIKE), which solves this problem by automatically accessing several public information systems and databases across the Internet. PIKE starts with a set of identified proteins, given as accession codes from the most common protein databases, and retrieves all relevant, up-to-date information from the most relevant databases. Once the search is complete, PIKE summarizes the information for every protein in several file formats that allow the information to be shared and exchanged with other software tools. In our opinion, PIKE represents a great step forward for information procurement and drastically reduces manual database validation in large proteomic studies. It is available at http://proteo.cnb.csic.es/pike.
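Before any remote lookup, a tool of this kind has to recognize which database each input accession code belongs to. The sketch below shows only that first step plus a JSON summary of the grouping; it is not PIKE's code. It assumes the accession format documented by UniProt for UniProtKB and the usual RefSeq protein prefixes; the helper names are hypothetical, and no network access is performed.

```python
import json
import re

# UniProtKB accession format (as documented by UniProt) and common RefSeq protein prefixes.
PATTERNS = {
    "UniProtKB": re.compile(r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"),
    "RefSeq": re.compile(r"(?:NP|XP|YP|WP|AP)_[0-9]+(?:\.[0-9]+)?"),
}


def classify(accession: str) -> str:
    """Return the first database whose accession pattern matches the whole code."""
    acc = accession.strip()
    for db, pattern in PATTERNS.items():
        if pattern.fullmatch(acc):
            return db
    return "unknown"


def summarize(accessions):
    """Group accession codes by source database and serialize the grouping as JSON."""
    groups = {}
    for acc in accessions:
        groups.setdefault(classify(acc), []).append(acc.strip())
    return json.dumps(groups, indent=2, sort_keys=True)


print(summarize(["P04637", "NP_000537.3", "A0A023GPI8", "foo123"]))
```

Emitting a neutral format such as JSON (or CSV) mirrors the abstract's point about exchanging results with other software tools.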

16.
17.
DNA barcoding is based on the use of short DNA sequences to provide taxonomic tags for rapid, efficient identification of biological specimens. Currently, reference databases are being compiled. In the future, it will be important to facilitate access to these databases, especially for nonspecialist users. The method described here provides a rapid, web-based, user-friendly link between the DNA sequence from an unidentified biological specimen and various types of biological information, including the species name. Specifically, we use a customized, Google-type search algorithm to quickly match an unknown DNA sequence to a list of verified DNA barcodes in the reference database. In addition to retrieving the species name, our web tool also provides automatic links to a range of other information about that species. As the DNA barcode database becomes more populated, it will become increasingly important for the broader user community to be able to exploit it for the rapid identification of unknown specimens and to easily obtain relevant biological information about these species. The application presented here meets that need.
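The abstract does not specify the search algorithm beyond calling it "Google-type". As an assumption, the sketch below uses a k-mer inverted index, a common search-engine-style technique: each reference barcode is indexed by its overlapping k-mers, and a query is ranked against references by the fraction of its k-mers they share. The sequences and species names are invented, and k = 8 is an arbitrary choice.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int):
    """Set of overlapping k-mers in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class BarcodeIndex:
    """Inverted index from k-mer to reference barcode IDs."""

    def __init__(self, references: dict, k: int = 8):
        self.k = k
        self.species = {}
        self.postings = defaultdict(set)
        for ref_id, (species, seq) in references.items():
            self.species[ref_id] = species
            for km in kmers(seq, k):
                self.postings[km].add(ref_id)

    def search(self, query: str, top: int = 3):
        """Return (species, fraction of query k-mers shared) for the best references."""
        q = kmers(query, self.k)
        hits = Counter()
        for km in q:
            for ref_id in self.postings.get(km, ()):
                hits[ref_id] += 1
        return [(self.species[r], n / len(q)) for r, n in hits.most_common(top)]


# Invented reference barcodes; real barcodes (e.g. ~650 bp COI fragments) are much longer.
refs = {
    "r1": ("Species alpha", "ACGTACGTTAGCCGATAGGCTA"),
    "r2": ("Species beta", "TTGACCGGTAACCGTTAGGACT"),
}
index = BarcodeIndex(refs)
print(index.search("acgtacgttagccgat"))
```

Only references sharing at least one k-mer are ever touched, which is what makes index lookup fast compared with aligning the query against every barcode; a production tool would typically confirm top hits with an alignment.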

18.
SRS (Sequence Retrieval System), an indexing system for flat-file libraries, provides fast access to individual library entries via retrieval by keywords from various data fields. SRS is now also able to build indices using the cross-references that most libraries provide. Fifteen libraries of DNA and protein sequences and structures have been selected. Each of these libraries interacts with at least one other by means of cross-references. Indexing these cross-references allows a complete network of libraries to be built. In the network, an entry from one library can in principle be linked to every other library. If two libraries are not directly cross-referenced, the linkage can be made through a succession of single links between neighbouring, cross-referenced libraries. A new operator has been added to the SRS query language for convenient specification of links amongst complete libraries or entry sets generated by previous queries on particular libraries. All the information in the network can now be used to retrieve an entry in a specific library; e.g. the full information given in amino acid sequence entries from SwissProt can now be used to retrieve related tertiary structure entries from PDB. Furthermore, a search in a single library can be extended to a search in the complete library network; e.g. all entries in all databases pertaining to elastase can be found.
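Linking two libraries that are not directly cross-referenced amounts to finding a path in a graph whose nodes are libraries and whose edges are cross-references. The breadth-first-search sketch below returns the shortest such chain, assuming each indexed cross-reference can be followed in both directions. The library graph in the usage lines is illustrative only, not SRS's actual configuration, and this is not SRS's link operator.

```python
from collections import deque


def link_path(xrefs: dict, source: str, target: str):
    """Shortest chain of libraries from source to target via cross-references (BFS).

    xrefs maps each library to the libraries it cross-references; links are treated
    as usable in both directions. Returns None when no chain exists.
    """
    graph = {}
    for lib, targets in xrefs.items():
        for other in targets:
            graph.setdefault(lib, set()).add(other)
            graph.setdefault(other, set()).add(lib)
    prev = {source: None}
    queue = deque([source])
    while queue:
        lib = queue.popleft()
        if lib == target:
            path = []
            while lib is not None:  # walk predecessors back to the source
                path.append(lib)
                lib = prev[lib]
            return path[::-1]
        for nxt in sorted(graph.get(lib, ())):
            if nxt not in prev:
                prev[nxt] = lib
                queue.append(nxt)
    return None


# Illustrative cross-reference graph (hypothetical, simplified).
xrefs = {
    "SWISSPROT": ["EMBL", "PROSITE", "PDB"],
    "EMBL": ["MEDLINE"],
    "PDB": ["MEDLINE"],
    "ENZYME": ["SWISSPROT"],
}
print(link_path(xrefs, "PROSITE", "MEDLINE"))
```

Breadth-first search guarantees the fewest hops, which keeps the chain of intermediate lookups as short as possible.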

19.

Background  

The majority of information in the biological literature resides in full text articles, instead of abstracts. Yet, abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants which are used to refer to biological entities in the literature.
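One partial mitigation for the name-variant problem is to normalize both lexicon entries and text mentions to a common key before lookup. The sketch below handles only orthographic variation (case, spaces, hyphens, punctuation, and a few Greek letters), so that "TNF-α", "TNF alpha" and "tnfalpha" collide; it does not handle true synonyms or abbreviations. The lexicon content and helper names are hypothetical.

```python
import re

# Small, illustrative Greek-letter map; a real system would need a fuller table.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize_name(name: str) -> str:
    """Lowercase, spell out Greek letters, and drop all non-alphanumeric characters."""
    key = name.lower()
    for letter, spelled in GREEK.items():
        key = key.replace(letter, spelled)
    return re.sub(r"[^a-z0-9]", "", key)


def build_lexicon(entries: dict) -> dict:
    """Map normalized key -> canonical name, for each canonical name and its listed variants."""
    return {
        normalize_name(variant): canonical
        for canonical, variants in entries.items()
        for variant in [canonical, *variants]
    }


def lookup(mention: str, lexicon: dict):
    """Canonical name for a text mention, or None if no normalized match exists."""
    return lexicon.get(normalize_name(mention))


# Hypothetical lexicon: canonical name -> variants recorded in a curated database.
lexicon = build_lexicon({"TNF": ["TNF-alpha"], "IL2": ["interleukin 2"]})
print(lookup("TNF α", lexicon), lookup("IL-2", lexicon))
```

Aggressive normalization raises coverage but can merge distinct names that differ only in punctuation, so it trades some precision for recall.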

20.
A limitation of many gene expression analytic approaches is that they do not incorporate comprehensive background knowledge about the genes into the analysis. We present a computational method that leverages the peer-reviewed literature in the automatic analysis of gene expression data sets. Including the literature in the analysis of gene expression data offers an opportunity to incorporate functional information about the genes when defining expression clusters. We have created a method that associates gene expression profiles with known biological functions. Our method has two steps. First, we apply hierarchical clustering to the given gene expression data set. Second, we use text from abstracts about genes to (i) resolve hierarchical cluster boundaries to optimize the functional coherence of the clusters and (ii) recognize those clusters that are most functionally coherent. Where a gene has not been investigated and therefore lacks primary literature, articles about well-studied homologous genes are added as references. We apply our method to two large gene expression data sets with different properties. The first contains measurements for a subset of well-studied Saccharomyces cerevisiae genes with multiple literature references, and the second contains newly discovered genes in Drosophila melanogaster, many of which have no literature references at all. In both cases, we are able to rapidly define and identify the biologically relevant gene expression profiles without manual intervention, and we identified novel clusters that the original investigators had not noted.
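The two literature-based steps can be illustrated with a toy tree cut. As a hedged sketch, the code below scores a cluster's functional coherence as the share of its genes whose abstract-derived keyword sets contain the cluster's most common keyword, then walks the hierarchical-clustering tree top-down and keeps the largest subtrees scoring at or above a threshold. This coherence score is a simple stand-in, not the authors' measure; the tree, gene names, keywords, and threshold are all hypothetical.

```python
from collections import Counter


def coherence(genes, keywords):
    """(share of genes carrying the cluster's most common keyword, that keyword)."""
    counts = Counter(kw for g in genes for kw in keywords.get(g, set()))
    if not counts:
        return 0.0, None
    kw, n = counts.most_common(1)[0]
    return n / len(genes), kw


def leaves(node):
    """Gene names under a tree node given as nested tuples, e.g. (("a", "b"), "c")."""
    return [node] if isinstance(node, str) else [g for child in node for g in leaves(child)]


def resolve(node, keywords, threshold=0.75, min_size=2):
    """Cut the tree top-down, keeping the largest subtrees whose coherence >= threshold."""
    genes = leaves(node)
    score, kw = coherence(genes, keywords)
    if isinstance(node, str) or (score >= threshold and len(genes) >= min_size):
        return [(genes, score, kw)]
    return [c for child in node for c in resolve(child, keywords, threshold, min_size)]


# Hypothetical dendrogram from hierarchical clustering, and keywords mined from abstracts.
tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
keywords = {
    "g1": {"ribosome", "translation"},
    "g2": {"ribosome"},
    "g3": {"ribosome", "rna"},
    "g4": {"dna repair"},
    "g5": {"dna repair", "checkpoint"},
}
for genes, score, kw in resolve(tree, keywords):
    print(genes, round(score, 2), kw)
```

Here the root cluster mixes two functions (coherence 0.6), so it is split, and each child is kept because all of its genes share one keyword; ranking the kept clusters by score corresponds to step (ii).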


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417