Similar literature
Found 20 similar documents (search time: 31 ms)
1.
Images (e.g., figures) are important experimental results that are typically reported in bioscience full-text articles. Biologists need to access images to validate research facts and to formulate or test novel research hypotheses. At the same time, biologists live in an age of information explosion. As thousands of biomedical articles are published every day, systems that help biologists efficiently access images in the literature would greatly facilitate biomedical research. We hypothesize that much of the image content reported in a full-text article can be summarized by the sentences in the abstract of the article. In our study, more than one hundred biologists tested this hypothesis and more than 40 biologists evaluated a novel user interface, BioEx, that allows biologists to access images directly from abstract sentences. Our results show that 87.8% of biologists were in favor of BioEx over two other baseline user interfaces. We further developed systems that explored hierarchical clustering algorithms to automatically identify abstract sentences that summarize the images. One of the systems achieves a precision of 100% at a recall of 4.6%.

2.
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine [1], a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 said undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface.

3.
MOTIVATION: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no 'average biologist' client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective, if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks. RESULTS: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, where three individual annotators independently tagged each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss here the issues involved in this task, and present an overview of the results. The latter strongly suggest that automatic annotation along most of the dimensions is highly feasible, and that this new framework for scientific sentence categorization is applicable in practice.

4.
MOTIVATION: Phosphorylation is an important biochemical reaction that plays a critical role in signal transduction pathways and cell-cycle processes. A text mining system to extract the phosphorylation relation from the literature is reported. The focus of this paper is on the new methods developed and implemented to connect and merge pieces of information about phosphorylation mentioned in different sentences in the text. The effectiveness and accuracy of the system as a whole, as well as those of the methods for extraction beyond a clause/sentence, are evaluated using an independently annotated dataset, the Phospho.ELM database. The new methods developed to merge pieces of information from different sentences are shown to be effective in significantly raising the recall without much difference in precision.

5.

Background  

The exploitation of information extraction (IE), a technology aiming to provide instances of structured representations from free-form text, has been rapidly growing within the molecular biology (MB) research community to keep track of the latest results reported in literature. IE systems have traditionally used shallow syntactic patterns for matching facts in sentences but such approaches appear inadequate to achieve high accuracy in MB event extraction due to complex sentence structure. A consensus in the IE community is emerging on the necessity for exploiting deeper knowledge structures such as through the relations between a verb and its arguments shown by predicate-argument structure (PAS). PAS is of interest as structures typically correspond to events of interest and their participating entities. For this to be realized within IE a key knowledge component is the definition of PAS frames. PAS frames for non-technical domains such as newswire are already being constructed in several projects such as PropBank, VerbNet, and FrameNet. Knowledge from PAS should enable more accurate applications in several areas where sentence understanding is required like machine translation and text summarization. In this article, we explore the need to adapt PAS for the MB domain and specify PAS frames to support IE, as well as outlining the major issues that require consideration in their construction.

6.
Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1≈0.93, MCC≈0.74, iAUC≈0.99) and sentences (F1≈0.76, MCC≈0.65, iAUC≈0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification. 
We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.
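
The abstract above names word bigram features as important for classifier performance. As a reader's aid, here is a minimal sketch of what such features might look like. The whitespace tokenization, the underscore-joined feature names, and the example sentence are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def word_bigrams(sentence):
    """Return adjacent word pairs joined by '_'.

    Assumption: lowercase, whitespace tokenization; the paper's actual
    tokenizer and feature transforms are not described in the abstract.
    """
    tokens = sentence.lower().split()
    return [f"{left}_{right}" for left, right in zip(tokens, tokens[1:])]


def bigram_counts(sentence):
    """Count each bigram, giving a sparse feature vector for a linear classifier."""
    return Counter(word_bigrams(sentence))


# Hypothetical example sentence, written for illustration only.
example = "Ketoconazole increased the AUC of midazolam"
features = bigram_counts(example)
```

A count vector of this kind can be passed to any linear classifier; which classifiers and weighting schemes the authors used is not stated beyond "several linear classifiers and a variety of feature transforms".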

7.

Background

Biomedical literature is expanding rapidly, and tools that help locate information of interest are needed. To this end, a multitude of different approaches for classifying sentences in biomedical publications according to their coarse semantic and rhetoric categories (e.g., Background, Methods, Results, Conclusions) have been devised, with recent state-of-the-art results reported for a complex deep learning model. Recent evidence showed that shallow and wide neural models such as fastText can provide results that are competitive or superior to complex deep learning models while requiring drastically lower training times and having better scalability. We analyze the efficacy of the fastText model in the classification of biomedical sentences in the PubMed 200k RCT benchmark, and introduce a simple pre-processing step that enables the application of fastText on sentence sequences. Furthermore, we explore the utility of two unsupervised pre-training approaches in scenarios where labeled training data are limited.

Results

Our fastText-based methodology yields a state-of-the-art F1 score of 0.917 on the PubMed 200k benchmark when sentence ordering is taken into account, with a training time of only 73 s on standard hardware. Applying fastText on single sentences, without taking sentence ordering into account, yielded an F1 score of 0.852 (training time 13 s). Unsupervised pre-training of N-gram vectors greatly improved the results for small training set sizes, with an increase of F1 score of 0.21 to 0.74 when trained on only 1000 randomly picked sentences without taking sentence ordering into account.

Conclusions

Because of its ease of use and performance, fastText should be among the first choices of tools when tackling biomedical text classification problems with large corpora. Unsupervised pre-training of N-gram vectors on domain-specific corpora also makes it possible to apply fastText when labeled training data are limited.
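
The abstract above mentions a pre-processing step that lets fastText use sentence order, but does not describe it. The sketch below shows one possible scheme, purely as an assumption: tokens from neighboring sentences are appended with a position prefix, so a bag-of-words model can tell context from the target sentence. Only the `__label__` prefix is the standard fastText supervised input format; the function name, the `prev`/`next` prefixes, and the sample sentences are hypothetical.

```python
def to_fasttext_lines(sentences, labels, window=1):
    """Format labeled sentences as fastText supervised-training lines.

    Each output line is '__label__<LABEL> <tokens>'. Tokens of up to
    `window` neighboring sentences are appended with 'prevK_' / 'nextK_'
    prefixes (a hypothetical context encoding, not the authors' method).
    """
    lines = []
    for i, (text, label) in enumerate(zip(sentences, labels)):
        parts = [text.lower()]
        for offset in range(1, window + 1):
            if i - offset >= 0:
                prev_tokens = sentences[i - offset].lower().split()
                parts.append(" ".join(f"prev{offset}_{t}" for t in prev_tokens))
            if i + offset < len(sentences):
                next_tokens = sentences[i + offset].lower().split()
                parts.append(" ".join(f"next{offset}_{t}" for t in next_tokens))
        lines.append(f"__label__{label} " + " ".join(parts))
    return lines


# Hypothetical two-sentence abstract, for illustration only.
demo = to_fasttext_lines(["We studied X", "Y improved"], ["METHODS", "RESULTS"])
```

Setting `window=0` reduces this to the single-sentence setting the abstract also reports, where each sentence is classified without its neighbors.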

8.
Neuropeptides are an important class of signaling molecules that result from complex and variable posttranslational processing of precursor proteins and thus are difficult to identify based solely on genomic information. Bioinformatics prediction of precursor cleavage sites can support effective biochemical characterization of neuropeptides. Neuropeptide cleavage models were developed using comprehensive human, mouse, rat, and cattle precursor data sets and used to compare predicted neuropeptide processing across these species. Logistic regression and artificial neural network models were used to predict cleavages based on amino acids and the physicochemical properties of amino acids at precursor sequence locations proximal to cleavage. Correct cleavage classification rates across species and models ranged from 85% to 100%, suggesting that amino acids and amino acid properties have a major impact on the probability of cleavage and that these factors have comparable effects in human, mouse, rat, and cattle. The variable accuracy of each species-specific model in predicting cleavage sites indicated that there are species- and precursor-specific processing patterns. Prediction of mouse cleavages using rat models was highly accurate, yet the reverse was not observed. Sensitivity and specificity revealed that logistic models are well suited to maximize the rate of true noncleavage predictions with moderate rates of true cleavage predictions; meanwhile, artificial neural networks maximize the rate of true cleavage predictions with moderate to low true noncleavage predictions. Logistic models also provided insights into the strength of the amino acid associations with cleavage. Predictions of neuropeptide cleavage sites using human, mouse, rat, and cattle models are available at .
Allison Tegge and Bruce Southey contributed equally to this work.

9.
SUMMARY: BioIE is a rule-based system that extracts informative sentences relating to protein families, their structures, functions and diseases from the biomedical literature. Based on manual definition of templates and rules, it aims at precise sentence extraction rather than wide recall. After uploading source text or retrieving abstracts from MEDLINE, users can extract sentences based on predefined or user-defined template categories. BioIE also provides a brief insight into the syntactic and semantic context of the source text by looking at word, N-gram and MeSH-term distributions. Important applications of BioIE include, for example, annotation of microarray data and of protein databases. AVAILABILITY: http://umber.sbs.man.ac.uk/dbbrowser/bioie/

10.
Qingrong C, Yan H. PLoS ONE, 2012, 7(4): e35517
This article reports the results of an eye-tracking experiment that investigated the processing of coordinate structures in Chinese sentence comprehension. The study tracked the eye movements of native Chinese readers as they read sentences consisting of two independent clauses connected by the word huo zhe. The data strongly confirmed readers' preference for an initial noun phrase (NP)-coordination parsing in Chinese coordination structure. When huo zhe was absent from the beginning of a sentence, we identified a cost associated with abandoning the NP-coordination analysis, which was evident with regard to the second NP when the coordination was unambiguous. Otherwise, this cost was evident with regard to the verb, the syntactically disambiguating region, when the coordination was ambiguous. However, the presence of a sentence-initial huo zhe reduced reading times and regressions in the huo zhe NP and the verb regions. We believe that the word huo zhe at the beginning of a sentence helps the reader predict that the sentence contains a parallel structure. Before the corresponding phrases appear, the readers can use the word huo zhe and the language structure thereafter to predictively construct the syntactic structure. Such predictive capability can eliminate the reader's preference for NP-coordination analysis. Implications for top-down parsing theory and models of initial syntactic analysis and reanalysis are discussed.

11.
During text reading, the parafoveal word is usually presented between 2° and 5° from the point of fixation. Whether semantic information of parafoveal words can be processed during sentence reading is a critical and long-standing issue. Recently, studies using the RSVP-flanker paradigm have shown that the incongruent parafoveal word, presented as right flanker, elicited a more negative N400 compared with the congruent parafoveal word. This suggests that the semantic information of parafoveal words can be extracted and integrated during sentence reading, because the N400 effect is a classical index of semantic integration. However, as most previous studies did not control the word-pair congruency of the parafoveal and the foveal words that were presented in the critical triad, it is still unclear whether such integration happened at the sentence level or just at the word-pair level. The present study addressed this question by manipulating verbs in Chinese sentences to yield either a semantically congruent or semantically incongruent context for the critical noun. In particular, the interval between the critical nouns and verbs was controlled to be 4 or 5 characters. Thus, to detect the incongruence of the parafoveal noun, participants had to integrate it with the global sentential context. The results revealed that the N400 time-locked to the critical triads was more negative in incongruent than in congruent sentences, suggesting that parafoveal semantic information can be integrated at the sentence level during Chinese reading.

12.
Event-related potentials were used to investigate whether semantic integration in discourse is influenced by the number of intervening sentences between the endpoints of integration. Readers read discourses in which the last sentence contained a critical word that was either congruent or incongruent with the information introduced in the first sentence. Furthermore, for the short discourses, the first and last sentences were separated by only one sentence, while for the long discourses, they were separated by three sentences. We found that the incongruent words elicited an N400 effect for both the short and long discourses. However, a P600 effect was observed only for the long discourses, not for the short ones. These results suggest that although readers can successfully integrate upcoming words into the existing discourse representation, the effort required for this integration process is modulated by the number of intervening sentences. Thus, discourse distance as measured by the number of intervening sentences should be taken as an important factor for semantic integration in discourse.

13.
Hundreds of millions of figures are available in biomedical literature, representing important biomedical experimental evidence. Since text is a rich source of information in figures, automatically extracting such text may assist in the task of mining figure information. A high-quality ground truth standard can greatly facilitate the development of an automated system. This article describes DeTEXT: A database for evaluating text extraction from biomedical literature figures. It is the first publicly available, human-annotated, high quality, and large-scale figure-text dataset with 288 full-text articles, 500 biomedical figures, and 9308 text regions. This article describes how figures were selected from open-access full-text biomedical articles and how annotation guidelines and annotation tools were developed. We also discuss the inter-annotator agreement and the reliability of the annotations. We summarize the statistics of the DeTEXT data and make available evaluation protocols for DeTEXT. Finally, we lay out challenges we observed in the automated detection and recognition of figure text and discuss research directions in this area. DeTEXT is publicly available for downloading at http://prir.ustb.edu.cn/DeTEXT/.

14.

Background

Theories of embodied language suggest that the motor system is differentially called into action when processing motor-related versus abstract content words or sentences. It has been recently shown that processing negative polarity action-related sentences modulates neural activity of premotor and motor cortices.

Methods and Findings

We sought to determine whether reading negative polarity sentences brought about differential modulation of cortico-spinal motor excitability depending on processing hand-action related or abstract sentences. Facilitatory paired-pulses Transcranial Magnetic Stimulation (pp-TMS) was applied to the primary motor representation of the right-hand and the recorded amplitude of induced motor-evoked potentials (MEP) was used to index M1 activity during passive reading of either hand-action related or abstract content sentences presented in both negative and affirmative polarity. Results showed that the cortico-spinal excitability was affected by sentence polarity only in the hand-action related condition. Indeed, in keeping with previous TMS studies, reading positive polarity, hand action-related sentences suppressed cortico-spinal reactivity. This effect was absent when reading hand action-related negative polarity sentences. Moreover, no modulation of cortico-spinal reactivity was associated with either negative or positive polarity abstract sentences.

Conclusions

Our results indicate that grammatical cues prompting motor negation reduce the cortico-spinal suppression associated with reading affirmative action sentences, and thus suggest that the motor simulative processes underlying embodiment may involve even syntactic features of language.

15.
SUMMARY: METIS is a web-based integrated annotation tool. From single query sequences, the PRECIS component allows users to generate structured protein family reports from sets of related Swiss-Prot entries. These reports may then be augmented with pertinent sentences extracted from online biomedical literature via support vector machine and rule-based sentence classification systems. AVAILABILITY: http://umber.sbs.man.ac.uk/dbbrowser/metis/

16.
Topographic maps are a fundamental and ubiquitous feature of the sensory and motor regions of the brain. There is less evidence for the existence of conventional topographic maps in associational areas of the brain such as the prefrontal cortex and parietal cortex. The existence of topographically arranged anatomical projections is far more widespread and occurs in associational regions of the brain as well as sensory and motor regions: this points to a more widespread existence of topographically organised maps within associational cortex than currently recognised. Indeed, there is increasing evidence that abstract topographic representations may also occur in these regions. For example, a topographic mnemonic map of visual space has been described in the dorsolateral prefrontal cortex and topographically arranged visuospatial attentional signals have been described in parietal association cortex. This article explores how abstract representations might be extracted from sensory topographic representations and subsequently code abstract information. Finally, a simple model is presented that shows how abstract topographic representations could be integrated with other information within the brain to solve problems or form abstract associations. The model uses correlative firing to detect associations between different types of stimuli. It is flexible because it can produce correlations between information represented in a topographic or non-topographic coordinate system. It is proposed that a similar process could be used in high-level cognitive operations such as learning and reasoning.

17.
18.
To elucidate the relationships between syntactic and semantic processes, one interesting question is how syntactic structures are constructed by the argument structure of a verb, where each argument corresponds to a semantic role of each noun phrase (NP). Here we examined the effects of possessivity [sentences with or without a possessor] and canonicity [canonical or noncanonical word orders] using Japanese ditransitive sentences. During a syntactic decision task, the syntactic structure of each sentence would be constructed in an incremental manner based on the predicted argument structure of the ditransitive verb in a verb-final construction. Using magnetoencephalography, we found a significant canonicity effect on the current density in the left inferior frontal gyrus (IFG) at 530-550 ms after the verb onset. This effect was selective to canonical sentences, and significant even when the preceding NP was physically identical. We suggest that the predictive effects associated with syntactic processing became larger for canonical sentences, where the NPs and verb were merged with a minimum structural distance, leading to the left IFG activations. For monotransitive and intransitive verbs, in which structural computation of the sentences was simpler than that of ditransitive sentences, we observed a significant effect selective to noncanonical sentences in the temporoparietal regions during 480-670 ms. This effect probably reflects difficulty in semantic processing of noncanonical sentences. These results demonstrate that the left IFG plays a predictive role in syntactic processing, which depends on the canonicity determined by argument structures, whereas other temporoparietal regions would subserve more semantic aspects of sentence processing.

19.
To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.
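
The abstract above reports precision and recall (0.71/0.43) but no combined score. For readers who want a single number to compare against F1 values reported elsewhere, the sketch below computes the standard F1 (harmonic mean) from those two figures. The resulting value is derived here and is not reported by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract.
f1 = f1_score(0.71, 0.43)
print(round(f1, 3))  # 2 * 0.71 * 0.43 / 1.14 = 0.536
```

Because the harmonic mean is pulled toward the lower of the two values, the combined score sits closer to the recall of 0.43 than to the precision of 0.71.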

20.
Previous behavioral evidence suggests that instructed strategy use benefits associative memory formation in paired associate tasks. Two such effective encoding strategies, visual imagery and sentence generation, facilitate memory through the production of different types of mediators (e.g., mental images and sentences). Neuroimaging evidence suggests that regions of the brain support memory reflecting the mental operations engaged at the time of study. That work, however, has not taken into account self-reported encoding task success (i.e., whether participants successfully generated a mediator). It is unknown, therefore, whether task-selective memory effects specific to each strategy might be found when encoding strategies are successfully implemented. In this experiment, participants studied pairs of abstract nouns under either visual imagery or sentence generation encoding instructions. At the time of study, participants reported their success at generating a mediator. Outside of the scanner, participants further reported the quality of the generated mediator (e.g., images, sentences) for each word pair. We observed task-selective memory effects for visual imagery in the left middle occipital gyrus, the left precuneus, and the lingual gyrus. No such task-selective effects were observed for sentence generation. Intriguingly, activity at the time of study in the left precuneus was modulated by the self-reported quality (vividness) of the generated mental images with greater activity for trials given higher ratings of quality. These data suggest that regions of the brain support memory in accord with the encoding operations engaged at the time of study.
