Similar Documents
1.

Background  

The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and use the knowledge developed in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily on named entity recognition of well-defined molecular objects such as genes, while less work has been done to identify disease-related objects and concepts. Furthermore, the promise of these efforts has been tempered by an inability to scale approaches efficiently in ways that minimize manual effort while still performing with high accuracy. Here, we applied a machine-learning approach previously successful for identifying molecular entities to a disease concept, to determine whether the underlying probabilistic model generalizes effectively to unrelated concepts with minimal manual intervention for model retraining.

2.

Background  

Bioinformatics tools for automatic processing of biomedical literature are invaluable for both the design and interpretation of large-scale experiments. Many information extraction (IE) systems that incorporate natural language processing (NLP) techniques have therefore been developed for the biomedical field. A key IE task in this field is the extraction of biomedical relations, such as protein-protein and gene-disease interactions. However, most biomedical relation extraction systems ignore adverbial and prepositional phrases and words identifying location, manner, timing, and condition, which are essential for describing biomedical relations. Semantic role labeling (SRL) is an NLP technique that identifies the semantic roles of these words or phrases in sentences and expresses them as predicate-argument structures. We constructed a biomedical SRL system called BIOSMILE that uses a maximum entropy (ME) machine-learning model to extract biomedical relations. BIOSMILE is trained on BioProp, our semi-automatically annotated biomedical proposition bank. Currently, we focus on 30 biomedical verbs that are frequently used or considered important for describing molecular events.

3.

Background  

Information Extraction (IE) is a component of text mining that facilitates knowledge discovery by automatically locating instances of interesting biomedical events in huge document collections. As events are usually centred on verbs and nominalised verbs, understanding the syntactic and semantic behaviour of these words is highly important. Corpora annotated with information about this behaviour can be a valuable resource for training IE components and resources.

4.
5.
MOTIVATION: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no 'average biologist' client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end users and become more effective if they can first identify text regions rich in the type of scientific content that interests the user, retrieve documents that have many such regions, and focus fact extraction on those regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements while supporting specific biomedical retrieval and extraction tasks. RESULTS: The annotation scheme was applied to a large corpus in a controlled effort by eight independent annotators, with three annotators independently tagging each sentence. We then trained and tested machine learning classifiers to automatically categorize sentence fragments based on the annotation. We discuss the issues involved in this task and present an overview of the results, which strongly suggest that automatic annotation along most of the dimensions is highly feasible and that this new framework for scientific sentence categorization is applicable in practice.
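The abstract does not say how the three independent tags per sentence were combined into training labels. One common option, assumed here purely for illustration, is a strict majority vote per annotation dimension. The label names and sentence ids below are hypothetical placeholders, not the scheme's actual categories.

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by a strict majority of annotators, or None.

    With three annotators per sentence, a label wins when at least two agree.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def aggregate(annotations):
    """Map sentence id -> majority label (None when no strict majority exists)."""
    return {sid: majority_label(labels) for sid, labels in annotations.items()}


# Hypothetical tags from three annotators for one annotation dimension.
annotations = {
    "s1": ["methods", "methods", "findings"],
    "s2": ["findings", "generic", "methods"],
    "s3": ["findings", "findings", "findings"],
}
print(aggregate(annotations))
```

Sentences without a majority (like `s2`) come back as `None`, so they can be routed to adjudication or excluded from training rather than silently assigned a label.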

6.

Background  

The breadth of biological databases and their information content continue to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, automatically classify database entries, and retrieve them efficiently.

7.

Background  

Text mining can help biomedical researchers reduce information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method that extends the capabilities of knowledge extraction by analyzing the network structure created by symbol co-occurrences. The method was applied to the task of automatically extracting gene and protein name synonyms.
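As a rough illustration of how network structure can point to synonyms, the sketch below assumes one plausible signal: two symbols that share most of their co-occurrence neighbors may name the same gene or protein. It is not the authors' scoring method, and the example sentences (already reduced to the symbols they mention) are hypothetical.

```python
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Undirected graph: symbol -> set of symbols it co-occurs with in a sentence."""
    graph = {}
    for symbols in sentences:
        unique = set(symbols)
        for s in unique:
            graph.setdefault(s, set())
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbor_jaccard(graph, a, b):
    """Jaccard overlap of two nodes' neighborhoods, ignoring the pair's own edge."""
    na = graph.get(a, set()) - {b}
    nb = graph.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def rank_synonym_candidates(graph, top_k=3):
    """Score every symbol pair and return the top_k pairs by neighborhood overlap."""
    scored = [((a, b), neighbor_jaccard(graph, a, b))
              for a, b in combinations(sorted(graph), 2)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical sentences, each reduced to the symbols it mentions.
sentences = [
    ["p53", "MDM2", "apoptosis"],
    ["TP53", "MDM2", "apoptosis"],
    ["MDM2", "ubiquitin"],
    ["p53", "DNA damage"],
    ["TP53", "DNA damage"],
]
graph = build_cooccurrence_graph(sentences)
print(rank_synonym_candidates(graph, top_k=3))
```

Here `p53` and `TP53` never co-occur but have identical neighborhoods, so they rank first; a real system would combine such structural scores with other evidence before proposing a synonym.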

8.

Background  

The OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section containing a list of the known phenotypes that make up a clinical syndrome. The phenotypes are given in free text, and different phrases are often used to describe the same phenotype; the differences arise from spelling variations or typing errors, varying sentence structures, and terminological variants.
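Two of the variant sources listed above, spelling or typing errors and differing word order, can be partly absorbed by simple string normalization. The sketch assumes lowercasing, token sorting, and a `difflib` similarity threshold of 0.85 (an arbitrary illustrative value). It is a baseline, not a method described in the abstract, and it does not handle true terminological variants (synonyms); the phrases are hypothetical examples.

```python
import difflib
import re


def normalize(phrase):
    """Lowercase, strip punctuation, and sort tokens so word-order variants match."""
    tokens = re.findall(r"[a-z0-9]+", phrase.lower())
    return " ".join(sorted(tokens))


def group_variants(phrases, threshold=0.85):
    """Greedily cluster phrases whose normalized forms are near-identical.

    SequenceMatcher.ratio() tolerates small spelling or typing differences.
    Each group is represented by the normalized form of its first member.
    """
    groups = []  # list of (representative normalized form, [original phrases])
    for phrase in phrases:
        key = normalize(phrase)
        for rep, members in groups:
            if difflib.SequenceMatcher(None, key, rep).ratio() >= threshold:
                members.append(phrase)
                break
        else:
            groups.append((key, [phrase]))
    return [members for _, members in groups]


phrases = ["Hypertelorism", "hypertelorisn", "Cleft palate", "Palate, cleft", "Short stature"]
print(group_variants(phrases))
```

The greedy pass compares each phrase only against group representatives, which keeps it simple but makes results depend on input order; a curated vocabulary would still be needed for variants such as synonyms or abbreviations.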

9.
Shang Y, Li Y, Lin H, Yang Z. PLoS ONE. 2011;6(8):e23862
Automatic text summarization for a biomedical concept can help researchers efficiently grasp the key points of a topic from a large body of biomedical literature. In this paper, we present a method, based on semantic relation extraction, for generating a text summary for a given biomedical concept (e.g., H1N1 disease) from multiple documents. Our approach has three stages: 1) we extract the semantic relations in each sentence using the semantic knowledge representation tool SemRep; 2) we develop a relation-level retrieval method to select the relations most relevant to each query concept and visualize them in a graphical representation; 3) for the relations in the relevant set, we use an information retrieval based method to extract informative sentences from the document collection that interpret them, and assemble these into the summary. Our main focus in this work is to investigate the contribution of semantic relation extraction to biomedical text summarization. Experimental results on summarization for a set of diseases show that introducing semantic knowledge improves performance and that our results are better than those of MEAD, a well-known text summarization system.

10.
A survey of current work in biomedical text mining   Total citations: 3, self-citations: 0, citations by others: 3
The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5-10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

11.
The media play a key role in forming opinions by influencing people's understanding and perception of a topic. People gather information about topics of interest from the internet and print media, which employ various news frames to attract attention. One common news frame is the human-interest frame, which emotionalizes and dramatizes information and often accentuates individual affectedness. Our study investigated the effects of human-interest frames, compared with a neutral-text condition, on perceived risk, emotions, and knowledge acquisition, and tested whether these effects can be "generalized" to common variants of the human-interest frame. Ninety-one participants read either one variant of the human-interest frame or a neutrally formulated version of a newspaper article describing the effects of invasive species in general and the Asian ladybug (an invasive species) in particular. The framing was achieved by varying the headline and the opening and concluding paragraphs (about invasive species). The core text (about the Asian ladybug) was the same across all conditions, and all outcome variables on framing effects referred to this common core text. We found that all versions of the human-interest frame increased perceived risk and the strength of negative emotions compared with the neutral text. Furthermore, participants in the human-interest frame conditions displayed better (quantitative) learning outcomes but also biased knowledge, highlighting a potential dilemma: human-interest frames may increase learning, but they also lead to a rather unbalanced view of the topic on a “deeper level”.

12.
The emergence of increasingly complex data in industrial ecology (IE) has sparked scholarly interest in interactive visualization (IV). IV allows users to interact with data, aiding in processing and interpreting complex datasets, processes, and simulations. Consequently, IV can help IE practitioners communicate the complexities of their methods and results, shed light on underlying research assumptions, and enable more transparent monitoring of data quality and error. This can significantly increase the reach and impact of research; promote transparency, reproducibility, and open science; and improve the clarity and presentation of IE research. A review of current IV applications reveals that, while data exploration has received some attention among IE practitioners, IV applications in scientific communication are clearly lacking. With the help of a working example, we explore the value of IV, discuss its operationalization, and highlight challenges that the IE community must face during IV uptake. These challenges include technical and knowledge limitations, limits on user interaction, and implementation strategies. With these challenges in mind, we outline key aspects needed to bring the IE field to the forefront of scientific communication in the coming years. Among these, we draft the basic principles of a “Hub for Interactive Visualization in Industrial Ecology” (HIVE), a meeting point where IE practitioners could find an array of data visualization tools geared toward IE datasets. IV is here to stay, and its nascent stage presents IE practitioners with many opportunities to shape its operationalization and benefit from early adoption.

13.
BioRAT: extracting biological information from full-length papers   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to convenience of access and the quality of the data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). Full-text papers contain more information but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool specifically designed to perform biomedical IE, which is able to locate and analyse both abstracts and full-length papers. BioRAT (Biological Research Assistant for Text mining) combines a document search capability with domain-specific IE. RESULTS: We show, first, that BioRAT performs as well as existing systems when applied to abstracts and, second, that significantly more information is available to BioRAT through full-length papers than through abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers.
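The abstract reports precision and recall but no combined score. Assuming the standard balanced F-measure (F1, the harmonic mean of precision and recall), the quoted figures can be combined as follows; these F1 values are derived here for illustration and are not reported by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall figures quoted in the BioRAT abstract.
abstracts_f1 = f1_score(0.5507, 0.2031)
full_text_f1 = f1_score(0.5125, 0.436)
print(f"abstracts: F1 = {abstracts_f1:.3f}")  # prints: abstracts: F1 = 0.297
print(f"full text: F1 = {full_text_f1:.3f}")  # prints: full text: F1 = 0.471
```

On this derived measure, the gain from full-length papers comes almost entirely from recall, since precision is similar in both settings.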

14.
The photosensitizer methylene blue (MB) generates singlet oxygen that irreversibly inhibits Torpedo californica acetylcholinesterase (TcAChE). In the dark, it inhibits reversibly. Binding is accompanied by a bathochromic absorption shift, which was used to demonstrate displacement by other acetylcholinesterase inhibitors that interact with the catalytic "anionic" subsite (CAS), the peripheral "anionic" subsite (PAS), or both by bridging them. MB is a noncompetitive inhibitor of TcAChE, competing with reversible inhibitors directed at both "anionic" subsites, but a single site is involved in inhibition. MB also quenches TcAChE's intrinsic fluorescence. It binds to TcAChE covalently inhibited by a small organophosphate (OP), but not to TcAChE inhibited by an OP containing a bulky pyrene. Differential scanning calorimetry shows an ~8 °C increase in the denaturation temperature of the MB/TcAChE complex relative to native TcAChE, and a less than twofold increase in the cooperativity of the transition. The crystal structure reveals a single MB stacked against Trp279 in the PAS, oriented down the gorge toward the CAS; it is plausible that irreversible inhibition is associated with photooxidation of this residue and others within the active-site gorge. The kinetic and spectroscopic data showing that inhibitors binding at the CAS can impede binding of MB are reconciled by docking studies showing that the conformation adopted by Phe330, midway down the gorge, in the MB/TcAChE crystal structure precludes simultaneous binding of a second MB at the CAS. Conversely, binding of ligands at the CAS dislodges MB from its preferred locus at the PAS. These data demonstrate that TcAChE is a valuable model for understanding the molecular basis of local photooxidative damage.

15.

Background  

Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.
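To make concrete what an NMF tool computes, the following dependency-free sketch factors a nonnegative matrix V into nonnegative W and H using the Lee-Seung multiplicative updates for squared Frobenius error. This is one standard NMF algorithm, assumed here for illustration; it is not necessarily what any particular implementation uses, and the toy matrix is hypothetical.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative V (n x m) into W (n x rank) and H (rank x m).

    Lee-Seung multiplicative updates for the Frobenius objective:
        H <- H * (W^T V) / (W^T W H)
        W <- W * (V H^T) / (W H H^T)
    eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num_h = matmul(wt, v)
        den_h = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num_h[i][j] / (den_h[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num_w = matmul(v, ht)
        den_w = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num_w[i][j] / (den_w[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical 4 x 3 nonnegative matrix built from two underlying patterns.
V = [[1, 2, 0], [2, 4, 0], [0, 1, 1], [0, 3, 3]]
w, h = nmf(V, rank=2, iterations=500)
print(frobenius_error(V, w, h))
```

In practice an array library would replace the list-based helpers; the multiplicative form is shown because, starting from nonnegative values, it keeps every entry of W and H nonnegative without an explicit projection step.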

16.

Background:

Reliable information extraction applications have long been sought by the biomedical text mining community; if achieved, they would give benchside biologists valuable tools for the increasingly difficult task of assimilating the knowledge contained in the biomedical literature. We present an integrated approach to concept recognition in biomedical text. Concept recognition provides key information that has been largely missing from previous biomedical information extraction efforts, namely direct links to well-defined knowledge resources that explicitly cement the concept's semantics. The BioCreative II tasks discussed in this special issue have provided a unique opportunity to demonstrate the effectiveness of concept recognition in the field of biomedical language processing.

Results:

Through the modular construction of a protein interaction relation extraction system, we present several use cases of concept recognition in biomedical text, and relate these use cases to potential uses by the benchside biologist.

Conclusion:

Current information extraction technologies are approaching performance standards at which concept recognition can begin to deliver high-quality data to the benchside biologist. Our system is available as part of the BioCreative Meta-Server project and online at http://bionlp.sourceforge.net.

17.

Background  

A neoplastic tumor cannot grow beyond a millimeter or so in diameter without recruiting endothelial cells and new blood vessels to supply the nutrition and oxygen needed for tumor cell survival. This study was designed to investigate the formation of new blood vessels within a growing human breast cancer tumor model (MDA-MB-231 cells in the mammary fat pad of nude female mice). Once the tumor grew to 35 mm³, it developed a well-vascularized capsule. Histological sections of tumors larger than 35 mm³ were stained with PAS, with CD-31 antibody (an endothelial cell marker), or with hypoxia-inducible factor 1α (HIF) antibody. The volume density of blood vessels and endothelial cell pseudopods was measured by ocular grid intercept counting in the PAS-stained slides.

18.
We investigated the use of bacterial cells isolated from paddy crab for extracting oil from Jatropha seed kernels in aqueous media while preserving the protein structures of this protein-rich endosperm. A bacterial strain, designated MB4 and identified by 16S rDNA sequencing and physiological characterization as either Bacillus pumilus or Bacillus altitudinis, enhanced the extraction yield of Jatropha oil. Incubating an MB4 starter culture with preheated kernel slurry in aqueous media at an initial pH of 5.5 and 37 °C for 6 h liberated 73% w/w of the Jatropha oil. Since MB4 produces xylanases, it is suggested that strain MB4 facilitates oil liberation by degrading the hemicelluloses that form the oil-containing cell wall structure of the kernel. After MB4-assisted oil extraction, SDS-PAGE analysis showed that most of the Jatropha proteins were preserved in the solid phase of the extraction residues. The advantages of this process are that the protein in the residue can be further processed for other applications, no purified enzyme preparation is needed, and the resulting oil can be used for biodiesel production.

19.

Background  

Despite increasing interest in applying Natural Language Processing (NLP) to biomedical text, whether this technology can facilitate tasks such as database curation remains unclear.

20.
Recent advances in DNA sequencing technology have allowed the collection of high-dimensional data from human-associated microbial communities on an unprecedented scale. A major goal of these studies is the identification of important groups of microorganisms that vary according to physiological or disease states in the host, but the incidence of rare taxa and the large number of taxa observed make that goal difficult to achieve with traditional approaches. Fortunately, similar problems have been addressed by the machine learning community in other fields of study, such as microarray analysis and text classification. In this review, we demonstrate that several existing supervised classifiers can be applied effectively to microbiota classification, both for selecting subsets of taxa that are highly discriminative of the community type and for building models that can accurately classify unlabeled data. To encourage the development of new approaches to supervised classification of microbiota, we discuss several structures inherent in microbial community data that novel approaches may be able to exploit, and we include several benchmark classification tasks as supplemental information for use by the community.
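Among the simplest supervised classifiers that could be applied to such data is a nearest-centroid rule on relative-abundance profiles. The sketch below assumes that rule for illustration only; the abstract does not list which classifiers were evaluated, and the taxa, counts, and class labels are hypothetical.

```python
import math


def relative_abundance(counts):
    """Convert raw taxon counts for one sample into proportions summing to 1."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def train_centroids(samples, labels):
    """Average the relative-abundance profile of each class."""
    sums, sizes = {}, {}
    for counts, label in zip(samples, labels):
        profile = relative_abundance(counts)
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, p in enumerate(profile):
            acc[i] += p
        sizes[label] = sizes.get(label, 0) + 1
    return {label: [x / sizes[label] for x in acc] for label, acc in sums.items()}


def predict(centroids, counts):
    """Assign the class whose centroid is closest in Euclidean distance."""
    profile = relative_abundance(counts)
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


# Hypothetical counts for three taxa in four labeled samples.
samples = [[80, 15, 5], [70, 20, 10], [10, 20, 70], [5, 15, 80]]
labels = ["healthy", "healthy", "disease", "disease"]
centroids = train_centroids(samples, labels)
print(predict(centroids, [60, 30, 10]), predict(centroids, [5, 5, 90]))
```

Converting counts to proportions first is a deliberate choice: sequencing depth varies between samples, so raw counts are not directly comparable.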


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417