Similar Documents
20 similar documents found (search time: 31 ms)
1.
MOTIVATION: Searching relevant publications for manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents. RESULTS: With a Probabilistic Latent Categoriser (PLC) we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the Kullback-Leibler symmetric divergence to determine the most discriminating words for Swiss-Prot medical annotation. This information should allow curators to understand classification results better. It also has great value for fine-tuning the linguistic pre-processing of documents, which in turn can improve the overall classifier performance.
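Ranking terms by their contribution to the Kullback-Leibler symmetric divergence can be illustrated without the PLC model itself. The sketch below assumes add-one smoothed unigram distributions estimated from a hypothetical set of relevant and non-relevant tokenized abstracts; the function names and toy data are illustrative only. Each term contributes (p - q) * log(p / q) to D(P||Q) + D(Q||P), which is never negative, so high-scoring terms are discriminating for either class.

```python
import math
from collections import Counter

def term_distribution(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over `vocab` for tokenized docs."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def symmetric_kl_ranking(relevant_docs, other_docs):
    """Rank terms by their contribution (p - q) * log(p / q) to D(P||Q) + D(Q||P).

    Each per-term contribution is non-negative, so a high score marks a term
    that is much more frequent in one class than in the other.
    """
    vocab = sorted({tok for doc in relevant_docs + other_docs for tok in doc})
    p = term_distribution(relevant_docs, vocab)
    q = term_distribution(other_docs, vocab)
    scores = {t: (p[t] - q[t]) * math.log(p[t] / q[t]) for t in vocab}
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy corpus: tokenized abstracts labelled relevant / not relevant.
relevant = [["mutation", "disease", "protein"], ["mutation", "patients", "disease"]]
other = [["protein", "structure", "crystal"], ["structure", "binding", "protein"]]
ranking = symmetric_kl_ranking(relevant, other)
print(ranking[:3])
```

Here "mutation" and "disease" (typical of the relevant class) and "structure" (typical of the other class) come out on top, while "protein", which occurs in both classes, scores low.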

2.
3.
4.

Background  

Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges posed by ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

5.
6.
BACKGROUND: The PubMed database contains nearly 15 million references from more than 4,800 biomedical journals. In general, authors of scientific articles are referred to by their last name and forename initial. DISCUSSION: Names are often too common, and not unique enough, to serve as search criteria. Today, Ph.D. students, other researchers and women publish scientific work, and a person may have several names and publish under each of them. A Unique Scientist ID could help to identify people in peer-to-peer (P2P) networks. As a starting point, PubMed could perhaps generate and manage such a scientist ID. SUMMARY: A Unique Scientist ID would improve knowledge management in science. Unfortunately, in some publications, and consequently in the online databases, the author's forename is abbreviated to a single letter. A search on a common name with only one initial can retrieve pertinent citations but also many false drops (records that match the search criteria but are clearly irrelevant).

7.
Sequence variants, in particular single nucleotide polymorphisms (SNPs), are key elements for the identification of genes associated with complex diseases and with particular drug responses. The search for literature about sequence variation is hampered by the large number of allelic variants reported for many genes and by the variability of both gene and sequence-variant nomenclature. We describe OSIRIS, a search tool that integrates different sources of information with the aim of retrieving literature about the sequence variation of a gene. In addition, it provides a method to link a dbSNP entry with the articles referring to it. AVAILABILITY: OSIRIS is available for public use at http://ibi.imim.es/
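A simple way to link dbSNP entries to articles is to scan article text for dbSNP reference SNP identifiers, which have the form "rs" followed by digits. The sketch below assumes abstracts are already available as a hypothetical PMID-to-text mapping; it is not the OSIRIS method, which also has to cope with variant nomenclature that a regular expression like this one does not recognize.

```python
import re
from collections import defaultdict

# dbSNP reference SNP IDs look like "rs" followed by digits, e.g. rs1042522.
RSID_PATTERN = re.compile(r"\brs(\d+)\b", re.IGNORECASE)

def link_rsids_to_articles(abstracts):
    """Map each rsID (normalized to lowercase "rs<digits>") to the sorted PMIDs mentioning it."""
    index = defaultdict(set)
    for pmid, text in abstracts.items():
        for match in RSID_PATTERN.finditer(text):
            index["rs" + match.group(1)].add(pmid)
    return {rsid: sorted(pmids) for rsid, pmids in index.items()}

# Hypothetical PMIDs and abstract snippets, for illustration only.
abstracts = {
    "111": "The codon 72 polymorphism (rs1042522) was genotyped.",
    "222": "We found no association for rs1042522 or RS4680.",
    "333": "No dbSNP identifiers are mentioned here.",
}
print(link_rsids_to_articles(abstracts))
```

The word boundaries keep the pattern from matching inside longer tokens, and upper-case "RS" mentions are normalized so that both spellings map to the same dbSNP entry.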

8.
Online searches for research topics in thermal physiology usually do not retrieve more than 40% of the existing publications. In order to determine whether this low retrieval rate is due to deficient coverage of the literature or to problems in storage and retrieval of information, a list of 25 major papers, 25 minor papers and 15 obscure papers on thermal physiology was compiled. The 65 documents were searched online in the MEDLINE database. Retrieval rate was more than 60% for obscure papers, 80% for minor papers, and 90% for major papers. It is concluded that MEDLINE has an appropriate coverage of the literature on thermal physiology and that the low retrieval rate in online searches is due mostly to problems in information processing.

9.
Little is known about the mechanisms underlying hepatocellular carcinoma (HCC). Some studies have focused on the role of HCV viral proteins in hepatocyte transformation. In this work we compiled and analysed current articles on the impact of polymorphisms in the HCV core gene and protein on the development of HCC. An exhaustive search for full-text articles published up to November 2016 was performed in the PubMed database using the MeSH keywords ‘hepatitis C’, ‘polymorphisms’, ‘core’, ‘hepatocellular cancer’ and ‘hepatocarcinogenesis’. Nineteen full-text articles published between 2000 and 2016 were considered. Different articles associate polymorphisms at residues 70 and 91 of the core protein not only with HCC development but also with mortality and treatment response. Different polymorphisms related to HCC development were also found in the core and other viral proteins. Eleven articles reported that HCC development is significantly associated with Gln/His70, four associated it with Leu91, and two more associated it with both markers together. Additional studies, including studies in different populations worldwide, are necessary to validate the usefulness and influence of these markers in chronically HCV-infected patients and to examine their interaction with other risk or prognostic factors and with host genetic markers.

10.
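Single nucleotide polymorphisms (SNPs) can be divided into two main categories: SNPs located in the coding regions of genes and SNPs in non-coding regions. SNPs in coding regions can be further divided into two subclasses: synonymous SNPs, which do not change the amino acid sequence, and non-synonymous SNPs, which do. Non-synonymous SNPs therefore alter the amino acid sequence, giving rise to single amino acid polymorphisms (SAPs). Using proteomic approaches, SAPs in plasma samples from an Asian population were studied systematically, and a given SAP was found to be potentially associated with physiological or pathological traits in different ways in homozygous and heterozygous individuals. More importantly, recent studies have found that discrepancies between RNA and DNA sequences are widespread in organisms. The main cause of these differences is large-scale RNA editing at the transcriptional level (known as the RNA editome). This finding suggests that some of the SAPs an individual carries may be unrelated to genomic SNPs and instead originate from the RNA editome. By further inference, there may be entirely new SAPs at the translational level that depend on neither the DNA nor the RNA sequence.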
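The synonymous/non-synonymous distinction described above can be checked mechanically by translating the reference and variant codons with the standard genetic code. The sketch below assumes the SNP has already been mapped to a position (0 to 2) within an in-frame codon; the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base change at `position` (0-2) of an in-frame codon."""
    codon = codon.upper()
    variant = codon[:position] + alt_base.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[variant]
    if ref_aa == alt_aa:
        return "synonymous", ref_aa, alt_aa
    return "non-synonymous (SAP)", ref_aa, alt_aa

print(classify_coding_snp("CTG", 2, "A"))  # CTG -> CTA: Leu -> Leu
print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG: Glu -> Val
```

A change to a stop codon ("*") is a nonsense rather than a missense change; this sketch does not separate the two and labels both as non-synonymous.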

11.
JR Wu, R Zeng. FEBS Letters, 2012, 586(18): 2841-2845
Single nucleotide polymorphisms (SNPs) are one type of genomic DNA variation in a population. Correspondingly, single amino-acid polymorphisms (SAPs) derived from non-synonymous SNPs represent protein variations in a population. Recently, using proteomic approaches, SAPs in the plasma proteomes of an Asian population were systematically identified for the first time. That study showed that heterozygous and homozygous proteins with various SAPs have different associations with particular traits in the population. Recent discoveries of widespread differences between RNA and DNA sequences indicate that RNA editing is also a source of SAPs, one that is independent of genomic SNPs. Furthermore, we argue that there are de novo SAPs that are not encoded by either DNA or RNA sequences.

12.
A commonly used strategy to improve search accuracy is feedback. Most existing work on feedback relies on positive information and has been extensively studied in information retrieval. However, when a query topic is difficult and the results of the first-pass retrieval are very poor, it is impossible to extract enough useful terms from a few positive documents, so the positive feedback strategy cannot improve retrieval in this situation. In contrast, there are relatively many negative documents at the top of the result list, and several recent studies have confirmed that negative feedback is an important and useful strategy for this scenario. In this paper, we consider a scenario in which the search results are so poor that there are at most three relevant documents among the top twenty. We then conduct a novel study of multiple relevance feedback strategies that use both positive and negative examples from the first-pass retrieval to improve retrieval accuracy for such difficult queries. Experimental results on TREC collections show that the proposed language-model-based multiple-model feedback method is generally more effective than both the baseline method and the methods using only a positive or a negative model.
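The abstract's own method is language-model based. As a simpler, swapped-in illustration of using positive and negative examples together, the classic Rocchio update moves the query vector toward the centroid of judged-relevant documents and away from the centroid of judged-non-relevant ones. The sketch below works on sparse term-weight dictionaries; the weights alpha, beta and gamma are conventional illustrative values, not values from the paper, and the documents are hypothetical.

```python
from collections import defaultdict

def centroid(docs):
    """Average of sparse term-weight vectors (dicts)."""
    total = defaultdict(float)
    for doc in docs:
        for term, w in doc.items():
            total[term] += w
    return {t: w / len(docs) for t, w in total.items()} if docs else {}

def rocchio(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback: alpha*q + beta*centroid(pos) - gamma*centroid(neg)."""
    updated = defaultdict(float)
    for term, w in query.items():
        updated[term] += alpha * w
    for term, w in centroid(positives).items():
        updated[term] += beta * w
    for term, w in centroid(negatives).items():
        updated[term] -= gamma * w
    # Terms pushed to zero or below by negative feedback are dropped from the query.
    return {t: w for t, w in updated.items() if w > 0}

query = {"jaguar": 1.0}
positives = [{"jaguar": 0.5, "cat": 0.8}]  # hypothetical judged-relevant document
negatives = [{"jaguar": 0.4, "car": 0.9}, {"car": 0.7, "engine": 0.6}]  # hypothetical non-relevant
print(rocchio(query, positives, negatives))
```

With a single positive document and several negatives, the negative centroid still contributes information: terms that appear only in non-relevant documents ("car", "engine") are kept out of the expanded query.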

13.

Background  

As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to identify disease-causing SAPs based on both structural and sequence features of the mutation site. One limitation of these methods is that they are not applicable when protein structures are not available. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from protein sequence.
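As an illustration of features that can be computed from protein sequence alone, the sketch below derives the Kyte-Doolittle hydropathy change caused by a substitution and the mean hydropathy of a window around the mutated residue. These two features, the window size and the peptide are assumptions chosen for illustration; they are not the feature set used in the study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def sap_sequence_features(sequence, position, alt_aa, half_window=3):
    """Sequence-only features for a substitution at 0-based `position`."""
    ref_aa = sequence[position]
    start = max(0, position - half_window)
    # Window of reference residues around the site (clipped at the sequence ends).
    window = sequence[start:position + half_window + 1]
    return {
        "hydropathy_change": KYTE_DOOLITTLE[alt_aa] - KYTE_DOOLITTLE[ref_aa],
        "window_mean_hydropathy": sum(KYTE_DOOLITTLE[a] for a in window) / len(window),
    }

# Hypothetical peptide with a Glu -> Val substitution at index 6.
print(sap_sequence_features("MVHLTPEEKS", 6, "V"))
```

In a classifier, features like these would be computed for every SAP and combined with others (for example conservation scores) before training; that step is not shown here.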

14.
15.
Single amino acid polymorphisms (SAPs), also known as non-synonymous single nucleotide polymorphisms (nsSNPs), are responsible for most human genetic diseases. Discriminating deleterious SAPs from neutral ones can help identify disease genes and understand disease mechanisms. In this work, a system-level method for predicting deleterious SAPs was established. Unlike most existing methods, our method considers not only sequence and structure information but also network information, and integrating network information can improve the performance of deleterious SAP prediction. To make the method available to the public, we developed SySAP (a System-level predictor of deleterious Single Amino acid Polymorphisms), an easy-to-use and highly accurate web server. SySAP is freely available at http://www.biosino.org/SySAP/ and http://lifecenter.sgst.cn/SySAP/.

16.

Background:

Physicians face challenges when searching PubMed for research evidence, and they may miss relevant articles while retrieving too many nonrelevant articles. We investigated whether the use of search filters in PubMed improves searching by physicians.

Methods:

We asked a random sample of Canadian nephrologists to answer unique clinical questions derived from 100 systematic reviews of renal therapy. Physicians provided the search terms that they would type into PubMed to locate articles to answer these questions. We entered the physician-provided search terms into PubMed and applied two types of search filters alone or in combination: a methods-based filter designed to identify high-quality studies about treatment (clinical queries “therapy”) and a topic-based filter designed to identify studies with renal content. We evaluated the comprehensiveness (proportion of relevant articles found) and efficiency (ratio of relevant to nonrelevant articles) of the filtered and nonfiltered searches. Primary studies included in the systematic reviews served as the reference standard for relevant articles.
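The study's two outcome measures can be computed from sets of article identifiers. The sketch below assumes hypothetical PMID sets and an illustrative function name. Comprehensiveness is the share of reference-standard articles retrieved (recall). Efficiency is reported both as precision and as the number of nonrelevant articles per relevant one, that is, the k in a 1:k ratio.

```python
def search_performance(retrieved, relevant):
    """Comprehensiveness (recall) and efficiency of one search, from ID sets."""
    hits = retrieved & relevant
    nonrelevant = len(retrieved) - len(hits)
    return {
        "comprehensiveness": len(hits) / len(relevant),
        "precision": len(hits) / len(retrieved) if retrieved else 0.0,
        # Efficiency as "1 : k", i.e. nonrelevant articles per relevant article found.
        "nonrelevant_per_relevant": nonrelevant / len(hits) if hits else float("inf"),
    }

relevant = {f"R{i}" for i in range(10)}  # hypothetical reference-standard PMIDs
retrieved = {f"R{i}" for i in range(5)} | {f"N{i}" for i in range(80)}
print(search_performance(retrieved, relevant))
```

In this toy example 5 of 85 retrieved articles are relevant, so the ratio is 1:16 and precision is about 5.9%. A 1:16 ratio and roughly 6% relevant retrieved articles therefore describe the same search.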

Results:

The average physician-provided search terms retrieved 46% of the relevant articles, while 6% of the retrieved articles were relevant (the ratio of relevant to nonrelevant articles was 1:16). The use of both filters together produced a marked improvement in efficiency, resulting in a ratio of relevant to nonrelevant articles of 1:5 (16 percentage point improvement; 99% confidence interval 9% to 22%; p < 0.003) with no substantive change in comprehensiveness (44% of relevant articles found; p = 0.55).

Interpretation:

The use of PubMed search filters improves the efficiency of physician searches. Improved search performance may enhance the transfer of research into practice and improve patient care.

Retrieving health literature is a cornerstone of evidence-based practice. With the rapid increase in available evidence, physicians can no longer rely on one or two key journals to stay current. Increasingly, physicians search bibliographic databases, such as PubMed, for research evidence, which is dispersed across hundreds of journals. Each year, physicians perform over 200 million searches in PubMed [1,2]. Physicians face challenges while searching PubMed and often miss relevant articles while retrieving too many nonrelevant articles [3-6]. Clinical decision-making based on evidence from a search may be impaired if relevant articles are missed. Retrieving many nonrelevant articles impedes the efficiency of searching. Improved search strategies are therefore necessary to retrieve a manageable amount of information. The use of PubMed search filters may help solve this problem. Filters are objectively derived, pretested strategies optimized to help users efficiently retrieve articles for a specific purpose [7,8].

PubMed provides two types of clinical search filters: methods-based and topic-based. Methods-based filters (known as clinical queries) were designed to retrieve articles on therapy, diagnosis, prognosis and etiology [9-13]. For example, the clinical queries “therapy” filter is optimized to retrieve publications of randomized controlled trials. Methods-based filters can be used for any clinical discipline and are available for general use in PubMed (www.ncbi.nlm.nih.gov/pubmed/clinical). Topic-based filters, in contrast, are designed to retrieve articles within a specific discipline or topic. For example, the recently developed nephrology filters were optimized to retrieve articles with renal content [1]. Physicians can use methods- and topic-based filters alone or in combination.
For example, Figure 1A shows a search without search filters for studies about the effectiveness of hepatitis B vaccination in patients with chronic kidney disease. Alternatively, this search could be performed with search filters (Figure 1B). Using filters removes the task of generating and including method-specific or topic-specific terms in a search strategy because the filters act as optimized substitutes. For example, applying the nephrology filter eliminates the need to enter renal terms and synonyms in a search query (e.g., chronic kidney disease, end-stage renal disease, chronic renal failure). Instead, the nephrology filter maximizes the retrieval of all renal content (see the nephrology filter strategy in Figure 1B).

Figure 1: PubMed searches without (A) and with (B) filters. This figure was created from the PubMed clinical queries Web interface; this page currently does not feature a “clinical category” section. When we performed searches with the nephrology filter (B), we removed the term “chronic kidney disease” because the filter acts as an optimized substitute for clinical content terms.

In theory, filters should make searching more efficient; however, empirical evidence of this among physicians is lacking. We conducted this study to determine whether the use of methods-based filters and topic-based filters (alone and in combination) improves the efficiency of physician searches in PubMed. Renal medicine is an excellent test model because the literature in this field is dispersed across 400 multidisciplinary journals, and many nephrologists search PubMed for information to guide patient care [14,15].

17.
The aim was to develop search filters to retrieve information for estimating Croatian scientific output (SO) in MEDLINE, focusing on Public Health (PH) and Preventive Medicine (PM). A PubMed search of the MEDLINE database was performed to retrieve articles added to this database between 2000 and 2007. Search filters inspired by previous strategies were applied, involving 'geographical', 'place of publication', 'subject' and 'language of publication' aspects. The performance of the geographical filter was evaluated by calculating its sensitivity and specificity. Publications were retrieved in several languages, originating in Croatia and published in Croatia and/or abroad. The Croatian SO in the field of PH-PM for the same period was obtained by combining search filters. The evaluation of filter performance showed a sensitivity of 95.56% and a specificity of 100%. The constructed filters enabled retrieval of eight years of Croatian research output. An increasing trend was observed in overall SO and in the PH-PM area as well. The main languages of publication were English and Croatian. This study contributes to research in the field of scientific documentation, and further work is recommended on constructing and developing search filters to retrieve and focus on specific information.
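Sensitivity and specificity of a search filter are computed against a manually judged sample of records. The sketch below uses hypothetical counts and an illustrative function name; the abstract does not report the underlying numbers behind its 95.56% and 100% figures.

```python
def filter_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity of a search filter from a 2x2 table of counts."""
    return {
        # Share of truly relevant records (e.g. of Croatian origin) the filter retrieves.
        "sensitivity": tp / (tp + fn),
        # Share of non-relevant records the filter correctly excludes.
        "specificity": tn / (tn + fp),
    }

# Hypothetical judged sample: 50 relevant and 50 non-relevant records.
print(filter_accuracy(tp=45, fn=5, tn=48, fp=2))
```

A specificity of 100% simply means no non-relevant record in the judged sample was retrieved (fp = 0).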

18.
Words appearing in abstracts of scientific articles are often useful as search terms, particularly those words and word patterns that are unique to the relevant field of endeavour. In view of the heightened interest in obtaining information about alternatives to animal testing, efforts directed toward enhancing retrieval of pertinent references from the biomedical literature are warranted. Words and phrases, and word-phrase co-occurrences, describing methods of experimentation in abstracts about alternatives to skin-irritation testing in animals were evaluated with regard to retrieval efficiency in the National Library of Medicine database Toxline. Precision of retrieval was defined as the proportion of pertinent references among the total number of citations retrieved. Retrieval precision values ranged from 0.25% to 100%.

19.
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform hand-crafted rules written by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.

20.

Background:

The biomedical literature is the primary information source for manual protein-protein interaction annotations. Text-mining systems have been implemented to extract binary protein interactions from articles, but a comprehensive comparison between the different techniques as well as with manual curation was missing.

Results:

We designed a community challenge, the BioCreative II protein-protein interaction (PPI) task, based on the main steps of a manual protein interaction annotation workflow. It was structured into four distinct subtasks related to: (a) detection of protein interaction-relevant articles; (b) extraction and normalization of protein interaction pairs; (c) retrieval of the interaction detection methods used; and (d) retrieval of actual text passages that provide evidence for protein interactions. A total of 26 teams submitted runs for at least one of the proposed subtasks. In the interaction article detection subtask, the top scoring team reached an F-score of 0.78. In the interaction pair extraction and mapping to SwissProt, a precision of 0.37 (with recall of 0.33) was obtained. For associating articles with an experimental interaction detection method, an F-score of 0.65 was achieved. As for the retrieval of the PPI passages best summarizing a given protein interaction in full-text articles, 19% of the submissions returned by one of the runs corresponded to curator-selected sentences. Curators extracted only the passages that best summarized a given interaction, implying that many of the automatically extracted ones could contain interaction information but did not correspond to the most informative sentences.
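The F-score used to compare submissions is the harmonic mean of precision and recall. As a worked example (a derived figure, not one reported by the challenge), the interaction pair extraction result of precision 0.37 and recall 0.33 corresponds to an F-score of about 0.35. The function name is illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_score(0.37, 0.33), 3))  # 0.349
```

Because the harmonic mean is pulled toward the smaller of the two values, a system cannot reach a high F-score by maximizing only precision or only recall.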

Conclusion:

The BioCreative II PPI task is the first attempt to compare the performance of text-mining tools specific to each of the basic steps of the PPI extraction pipeline. The challenges identified range from problems in the format conversion of full-text articles to difficulties in detecting interactor protein pairs and then linking them to their database records. Some limitations were also encountered when using a single (and possibly incomplete) reference database for protein normalization, or when limiting the search for interactor proteins to co-occurrence within a single sentence, since a mention might span neighboring sentences. Finally, distinguishing between novel, experimentally verified interactions (annotation relevant) and previously known interactions adds further complexity to these tasks.


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417