Similar literature
1.

Background  

With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: BM25 and the ranking algorithm implemented in the open-source Lucene search engine.
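As a rough illustration of the first retrieval model named above, the following is a minimal sketch of BM25 scoring over pre-tokenized documents. The function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the particular smoothed IDF variant are assumptions made for this example; the abstract does not say which BM25 variant or parameter settings were used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    `query` is a list of terms and `docs` is a list of token lists.
    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

The same function can score abstracts, full-text articles, or paragraph spans, depending on what is passed in as `docs`; only the unit of indexing changes.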

2.

Background

Graph analysis algorithms such as PageRank and HITS have been successful in Web environments because they are able to extract important inter-document relationships from manually-created hyperlinks. We consider the application of these techniques to biomedical text retrieval. In the current PubMed® search interface, a MEDLINE® citation is connected to a number of related citations, which are in turn connected to other citations. Thus, a MEDLINE record represents a node in a vast content-similarity network. This article explores the hypothesis that these networks can be exploited for text retrieval, in the same manner as hyperlink graphs on the Web.

Results

We conducted a number of reranking experiments using the TREC 2005 genomics track test collection in which scores extracted from PageRank and HITS analysis were combined with scores returned by an off-the-shelf retrieval engine. Experiments demonstrate that incorporating PageRank scores yields significant improvements in terms of standard ranked-retrieval metrics.

Conclusion

The link structure of content-similarity networks can be exploited to improve the effectiveness of information retrieval systems. These results generalize the applicability of graph analysis algorithms to text retrieval in the biomedical domain.
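The reranking idea described in this abstract can be sketched as follows, assuming a simple linear interpolation between min-max normalized retrieval scores and PageRank scores. The abstract does not state how the scores were combined, so the interpolation, the `weight` parameter, and the function names `pagerank` and `rerank` are hypothetical choices for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    new_rank[t] += damping * rank[node] / n
        rank = new_rank
    return rank


def rerank(retrieval_scores, rank, weight=0.5):
    """Order documents by an interpolation of retrieval and PageRank scores."""

    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}

    r = normalize(retrieval_scores)
    p = normalize({doc: rank.get(doc, 0.0) for doc in retrieval_scores})
    combined = {doc: (1 - weight) * r[doc] + weight * p[doc] for doc in retrieval_scores}
    return sorted(combined, key=combined.get, reverse=True)
```

Here `links` stands in for the related-citation network, with each citation pointing to its related citations. Setting `weight=0` reproduces the retrieval engine's original ordering, which gives a baseline for comparison.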

3.

Background  

Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available.

4.

Background  

The recombination of homologous genes is an effective protein engineering tool to evolve proteins. DNA shuffling by gene fragmentation and reassembly has dominated the literature since its first publication, but this fragmentation-based method is labor intensive. Recently, a fragmentation-free PCR based protocol has been published, termed recombination-dependent PCR, which is easy to perform. However, a detailed comparison of both methods is still missing.

5.

Background  

The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner.

6.

Background  

One of the most powerful methods for the prediction of protein structure from sequence information alone is the iterative construction of profile-type models. Because profiles are built from sequence alignments, the sequences included in the alignment and the method used to align them will be important to the sensitivity of the resulting profile. The inclusion of highly diverse sequences will presumably produce a more powerful profile, but distantly related sequences can be difficult to align accurately using only sequence information. Therefore, it would be expected that the use of protein structure alignments to improve the selection and alignment of diverse sequence homologs might yield improved profiles. However, the actual utility of such an approach has remained unclear.

7.

Background  

From a systems biology perspective, protein-protein interactions (PPI) are encoded in machine-readable formats to avoid issues encountered in their retrieval for the reconstruction of comprehensive interaction maps and biological pathways. However, the information stored in the electronic formats currently in use does not allow a valid automatic reconstruction of biological pathways.

8.
9.

Background  

Ontologies such as the Gene Ontology can enable the construction of complex queries over biological information in a conceptual way; however, existing systems for doing this are too technical. Within the biological domain there is an increasing need for software that facilitates the flexible retrieval of information. OntoDas aims to fulfil this need by allowing the definition of queries by selecting valid ontology terms.

10.

Background  

With the amount of influenza genome sequence data growing rapidly, researchers need machine assistance in selecting datasets and exploring the data. Enhanced visualization tools are required to represent results of the exploratory analysis on the web in an easy-to-comprehend form and to facilitate convenient information retrieval.

11.

Background  

Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on information content. Topic detection will benefit many other natural language processing tasks including information retrieval, text summarization and question answering; and is a necessary step towards the building of an information system that provides an efficient way for biologists to seek information from an ocean of literature.

12.

Background  

Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges faced due to ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

13.

Background  

Manual curation of experimental data from the biomedical literature is an expensive and time-consuming endeavor. Nevertheless, most biological knowledge bases still rely heavily on manual curation for data extraction and entry. Text mining software that can semi- or fully automate information retrieval from the literature would thus provide a significant boost to manual curation efforts.

14.

Background

DiGeorge syndrome (DGS) presents with a wide spectrum of thymic pathologies. Nationwide neonatal screening programs of lymphocyte production using T-cell recombination excision circles (TREC) have repeatedly identified patients with DGS. We tested what proportion of DGS patients could be identified at birth by combined TREC and kappa-deleting element recombination circle (KREC) screening. Furthermore, we followed TREC/KREC levels in peripheral blood (PB) to monitor postnatal changes in lymphocyte production.

Methods

TREC/KREC copies were assessed by quantitative PCR (qPCR) and were related to the albumin control gene in dry blood spots (DBSs) from control (n = 56), severe combined immunodeficiency (SCID, n = 10) and DGS (n = 13) newborns. PB was evaluated in DGS children (n = 32), in diagnostic samples from SCID babies (n = 5) and in 91 controls.

Results

All but one DGS patient had TREC levels in the normal range at birth, albeit quantitative TREC values were significantly lower in the DGS cohort. One patient had slightly reduced KREC at birth. Postnatal DGS samples revealed reduced TREC numbers in 5 of 32 (16%) patients, whereas KREC copy numbers were similar to controls. Both TREC and KREC levels showed a more pronounced decrease with age in DGS patients than in controls (p<0.0001 for both in a linear model). DGS patients had higher percentages of NK cells at the expense of T cells (p<0.0001). The patients with reduced TREC levels had repeated infections in infancy and developed allergy and/or autoimmunity, but they were not strikingly different from other patients. In 12 DGS patients with paired DBS and blood samples, the TREC/KREC levels were mostly stable or increased and showed similar kinetics in respective patients.

Conclusions

The combined TREC/KREC approach with correction via a control gene identified 1 of 13 (8%) DiGeorge syndrome patients at birth in our cohort. The majority of patients had TREC/KREC levels in the normal range.

15.

Background  

With the vast amounts of biomedical data being generated by high-throughput analysis methods, controlled vocabularies and ontologies are becoming increasingly important to annotate units of information for ease of search and retrieval. Each scientific community tends to create its own locally available ontology. The interfaces to query these ontologies tend to vary from group to group. We saw the need for a centralized location to perform controlled vocabulary queries that would offer both a lightweight web-accessible user interface and a consistent, unified SOAP interface for automated queries.

16.

Background  

Advances in biotechnology and in high-throughput methods for gene analysis have contributed to an exponential increase in the number of scientific publications in these fields of study. While much of the data and results described in these articles are entered and annotated in the various existing biomedical databases, the scientific literature is still the major source of information. There is, therefore, a growing need for text mining and information retrieval tools to help researchers find the relevant articles for their study. To tackle this, several tools have been proposed to provide alternative solutions for specific user requests.

17.

Background  

In order to perform a 3D reconstruction of electron microscopic images of viruses, it is necessary to determine the orientation (Euler angles) of the 2D projections of the virus. The projections containing high resolution information are usually very noisy. This paper proposes a new method, based on weighted-projection matching in wavelet space, for virus orientation determination. In order to speed the retrieval of the best match between projections from a model and a real virus particle, a hierarchical correlation matching method is also proposed.

18.

Background  

Until today, analysis of 16S ribosomal RNA (rRNA) sequences has been the de facto gold standard for the assessment of phylogenetic relationships among prokaryotes. However, the branching order of the individual phyla is not well resolved in 16S rRNA-based trees. In search of an improvement, new phylogenetic methods have been developed alongside the growing availability of complete genome sequences. Unfortunately, only a few genes in prokaryotic genomes qualify as universal phylogenetic markers and almost all of them have a lower information content than the 16S rRNA gene. Therefore, emphasis has been placed on methods that are based on multiple genes or even entire genomes. The concatenation of ribosomal protein sequences is one method which has been ascribed an improved resolution. Since there is neither a comprehensive database for ribosomal protein sequences nor a tool that assists in sequence retrieval and generation of respective input files for phylogenetic reconstruction programs, RibAlign has been developed to fill this gap.

19.

Background  

Lately, there has been great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods require annotated domain corpora.

20.

Background  

The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition, species-specific document retrieval, and semantic enrichment of biomedical articles.
