Similar literature
 20 similar articles found
1.

Background  

The majority of information in the biological literature resides in full-text articles rather than in abstracts. Yet abstracts remain the focus of many publicly available literature data mining tools. Most literature mining tools rely on pre-existing lexicons of biological names, often extracted from curated gene or protein databases. This is a limitation, because such databases have low coverage of the many name variants that are used to refer to biological entities in the literature.

2.
In this paper, we present a novel approach, Bio-IEDM (biomedical information extraction and data mining), that integrates text mining and predictive modeling to analyze biomolecular networks from biomedical literature databases. Our method consists of two phases. In phase 1, we discuss an efficient semisupervised learning approach that automatically extracts biological relationships, such as protein-protein and protein-gene interactions, from the biomedical literature databases to construct the biomolecular network. Our method automatically learns patterns from a few user-supplied seed tuples and then extracts new tuples from the biomedical literature based on the discovered patterns. The derived biomolecular network forms a large scale-free network graph. In phase 2, we present a novel clustering algorithm that analyzes the biomolecular network graph to identify biologically meaningful subnetworks (communities). The clustering algorithm takes the characteristics of scale-free network graphs into account and is based on the local density of a vertex and its neighborhood functions, which can be used to find more meaningful clusters with different density levels. The experimental results indicate that our approach is very effective in extracting biological knowledge from a huge collection of biomedical literature. The integration of data mining and information extraction provides a promising direction for analyzing the biomolecular network.
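The seed-tuple idea in phase 1 can be illustrated with a minimal sketch. This is not the Bio-IEDM implementation: the function names, the toy sentences, and the choice of "text between two entities" as the pattern are all assumptions made for illustration, and entities are approximated as single tokens.

```python
import re
from collections import Counter


def learn_patterns(sentences, seeds, min_count=1):
    """Collect the text found between two seed entities as candidate patterns."""
    counts = Counter()
    for sentence in sentences:
        for a, b in seeds:
            match = re.search(re.escape(a) + r"\s+(.+?)\s+" + re.escape(b), sentence)
            if match:
                counts[match.group(1)] += 1
    return {pattern for pattern, n in counts.items() if n >= min_count}


def extract_tuples(sentences, patterns):
    """Apply learned patterns to find (entity, entity) pairs.

    Entities are approximated as single alphanumeric tokens (an assumption);
    a real system would rely on a named entity recognizer instead.
    """
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            regex = r"(\w+)\s+" + re.escape(pattern) + r"\s+(\w+)"
            for a, b in re.findall(regex, sentence):
                found.add((a, b))
    return found


# Toy sentences written for this sketch, not taken from any corpus.
sentences = [
    "RAD51 interacts with BRCA2 in human cells.",
    "TP53 interacts with MDM2 under stress.",
    "CDK1 binds to CCNB1 during mitosis.",
]
seeds = {("RAD51", "BRCA2")}

patterns = learn_patterns(sentences, seeds)
new_tuples = extract_tuples(sentences, patterns) - seeds
print(patterns)    # patterns learned from the seed pair
print(new_tuples)  # pairs discovered beyond the seeds
```

In a fuller bootstrapping loop, the newly found tuples would be fed back in as seeds for another round, with some scoring step to keep unreliable patterns from drifting.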

3.

Background  

Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic information, among others. Despite dedication to these problems and effective solutions being reported, easily integrated tools to perform these tasks are not readily available.

4.
A survey of current work in biomedical text mining (total citations: 3; self-citations: 0; citations by others: 3)
The volume of published biomedical research, and therefore the underlying biomedical knowledge base, is expanding at an increasing rate. Among the tools that can aid researchers in coping with this information overload are text mining and knowledge extraction. Significant progress has been made in applying text mining to named entity recognition, text classification, terminology extraction, relationship extraction and hypothesis generation. Several research groups are constructing integrated flexible text-mining systems intended for multiple uses. The major challenge of biomedical text mining over the next 5-10 years is to make these systems useful to biomedical researchers. This will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.

5.
In this study, based on the resonator model and the exciplex model of electromagnetic radiation within the human body, a mathematical model of the biological order state, also referred to as syndrome in traditional Chinese medicine, was established and expressed as $\mathrm{Sy} = \nu / \ln(6I + 1)$. This model provides a theoretical foundation for experimental research on the order state of living systems, especially quantitative research on syndromes in traditional Chinese medicine.
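For readers who want to evaluate the expression numerically, the following is a small sketch. The abstract does not define the symbols or their units, so the function and parameter names here are placeholders, and the positivity check on I is an assumption added only to keep the logarithm positive.

```python
import math


def syndrome_order_state(nu, i):
    """Evaluate Sy = nu / ln(6*I + 1) exactly as written in the abstract.

    The meaning and units of nu and I are not given in the abstract;
    both are treated here as plain positive numbers (an assumption).
    """
    if i <= 0:
        raise ValueError("I must be positive so that ln(6*I + 1) > 0")
    return nu / math.log(6 * i + 1)


# Sanity check: choosing I = (e - 1) / 6 makes ln(6*I + 1) = 1, so Sy equals nu.
print(syndrome_order_state(2.0, (math.e - 1) / 6))
```

Because the denominator grows only logarithmically in I, Sy under this expression is far more sensitive to ν than to I.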

6.
A huge amount of important biomedical information is hidden in the bulk of research articles in biomedical fields. At the same time, the publication of databases of biological information and of experimental datasets generated by high-throughput methods is expanding rapidly, and a wealth of annotated gene databases and of chemical, genomic (including microarray datasets), clinical and other types of data repositories is now available on the Web. Thus a current challenge of bioinformatics is to develop targeted methods and tools that integrate scientific literature, biological databases and experimental data in order to reduce the time needed for database curation and to access evidence, either in the literature or in the datasets, that is useful for the analysis at hand. Against this background, this article reviews the knowledge discovery systems that fuse information from the literature, gathered by text mining, with microarray data in order to enrich the lists of down- and upregulated genes with elements for biological understanding and to generate and validate new biological hypotheses. Finally, an easy-to-use and freely accessible tool, GeneWizard, that exploits text mining and microarray data fusion to support researchers in discovering gene-disease relationships is described.

7.
With biomedical literature increasing at a rate of several thousand papers per week, it is impossible to keep abreast of all developments; therefore, automated means to manage the information overload are required. Text mining techniques, which involve the processes of information retrieval, information extraction and data mining, provide a means of solving this. By adding meaning to text, these techniques produce a more structured analysis of textual knowledge than simple word searches, and can provide powerful tools for the production and analysis of systems biology models.

8.

Background  

DNA methylation is an important epigenetic modification of the genome. Abnormal DNA methylation may result in silencing of tumor suppressor genes and is common in a variety of human cancer cells. As more epigenetics research is published electronically, it is desirable to extract relevant information from biological literature. To facilitate epigenetics research, we have developed a database called MeInfoText to provide gene methylation information from text mining.

9.
MOTIVATION: Phosphorylation is an important biochemical reaction that plays a critical role in signal transduction pathways and cell-cycle processes. A text mining system to extract the phosphorylation relation from the literature is reported. The focus of this paper is on the new methods developed and implemented to connect and merge pieces of information about phosphorylation mentioned in different sentences in the text. The effectiveness and accuracy of the system as a whole, as well as those of the methods for extraction beyond a clause/sentence, are evaluated using an independently annotated dataset, the Phospho.ELM database. The new methods developed to merge pieces of information from different sentences are shown to be effective in significantly raising the recall without much difference in precision.

10.

Background  

Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.

11.
This article collects opinions from leading scientists about how text mining can provide better access to the biological literature, how the scientific community can help with this process, what the next steps are, and what role future BioCreative evaluations can play. The responses identify several broad themes, including the possibility of fusing literature and biological databases through text mining; the need for user interfaces tailored to different classes of users and supporting community-based annotation; the importance of scaling text mining technology and inserting it into larger workflows; and suggestions for additional challenge evaluations, new applications, and additional resources needed to make progress.

12.
The cell division control protein (Cdc2) kinase is a catalytic subunit of a protein kinase complex, called the M phase promoting factor, which induces entry into mitosis and is universal among eukaryotes. This protein is believed to play a major role in cell division and control. The lives of biological cells are controlled by proteins interacting in metabolic and signaling pathways, in complexes that replicate genes and regulate gene activity, and in the assembly of the cytoskeletal infrastructure. Our knowledge of protein–protein (P–P) interactions has been accumulated from biochemical and genetic experiments, including the widely used yeast two-hybrid test. In this paper we examine whether P–P interactions in regenerating tissues and cells of the anuran Xenopus laevis can be discovered from biomedical literature using computational and literature mining techniques. Using literature mining techniques, we have identified a set of implicitly interacting proteins in regenerating tissues and cells of Xenopus laevis that may interact with Cdc2 to control cell division. Genome sequence based bioinformatics tools were then applied to validate a set of proteins that appear to interact with the Cdc2 protein. Pathway analysis of these proteins suggests that Myc proteins function as the regulator of M phase initiation by controlling expression of the Akt1 molecule that ultimately inhibits the Cdc2-cyclin B complex in cells. P–P interactions that appear only implicitly in the literature can be effectively discovered using literature mining techniques. By applying evolutionary principles to the P–P interacting pairs, it is possible to quantitatively analyze the significance of the associations with biological relevance. The developed BioMap system allows implicit P–P interactions to be discovered from large quantities of biomedical literature data. The unique similarities and differences observed within the interacting proteins can lead to the development of new hypotheses that can be used to design further laboratory experiments.

13.
14.

Background  

To date, many of the methods for extracting biological information from scientific articles are restricted to the abstract of the article. However, full-text articles in electronic form, which offer larger sources of data, are now available. Several questions arise as to whether the effort of scanning full-text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant.

15.

Background  

Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck.

16.

Background

Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large amounts of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. A relation extraction system achieving high performance is expensive to develop because of the substantial time and effort required for its design and implementation. Here, we report a novel framework to facilitate the development of a pattern-based biomedical relation extraction system. It has several unique design features: (1) leveraging syntactic variations possible in a language and automatically generating extraction patterns in a systematic manner, (2) applying sentence simplification to improve the coverage of extraction patterns, and (3) identifying referential relations between a syntactic argument of a predicate and the actual target expected in the relation extraction task.

Results

A relation extraction system derived using the proposed framework achieved overall F-scores of 72.66% for the Simple events and 55.57% for the Binding events on the BioNLP-ST 2011 GE test set, comparing favorably with the top performing systems that participated in the BioNLP-ST 2011 GE task. We obtained similar results on the BioNLP-ST 2013 GE test set (80.07% and 60.58%, respectively). We conducted additional experiments on the training and development sets to provide a more detailed analysis of the system and its individual modules. This analysis indicates that without increasing the number of patterns, simplification and referential relation linking play a key role in the effective extraction of biomedical relations.
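As a reminder of how the reported metric is defined, the sketch below computes an F-score as the weighted harmonic mean of precision and recall. The abstract reports only F-scores, so the precision and recall values used here are hypothetical and are not the paper's numbers.

```python
def f_score(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical values chosen for illustration only.
print(f_score(0.80, 0.60))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-score by maximizing precision or recall alone.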

Conclusions

In this paper, we present a novel framework for fast development of relation extraction systems. The framework requires only a list of triggers as input, and does not need information from an annotated corpus. Thus, we reduce the involvement of domain experts, who would otherwise have to provide manual annotations and help with the design of hand-crafted patterns. We demonstrate how our framework is used to develop a system which achieves state-of-the-art performance on a public benchmark corpus.

17.
For the average biologist, hands-on literature mining currently means a keyword search in PubMed. However, methods for extracting biomedical facts from the scientific literature have improved considerably, and the associated tools will probably soon be used in many laboratories to automatically annotate and analyse the growing number of system-wide experimental data sets. Owing to the increasing body of text and the open-access policies of many journals, literature mining is also becoming useful for both hypothesis generation and biological discovery. However, the latter will require the integration of literature and high-throughput data, which should encourage close collaborations between biologists and computational linguists.

18.
BIOSILICO, 2003, 1(2): 69-80
The information age has made the electronic storage of large amounts of data effortless. The proliferation of documents available on the Internet, corporate intranets, news wires and elsewhere is overwhelming. Search engines only exacerbate this overload problem by making ever more documents available with only a few keystrokes. This information overload also exists in the biomedical field, where scientific publications and other forms of text-based data are produced at an unprecedented rate. Text mining is the combined, automated process of analyzing unstructured, natural language text to discover information and knowledge that are typically difficult to retrieve. Here, we focus on text mining as applied to the biomedical literature. We focus in particular on finding relationships among genes, proteins, drugs and diseases, to facilitate an understanding and prediction of complex biological processes. The LitMiner™ system, developed specifically for this purpose, is described in relation to the Knowledge Discovery and Data Mining Cup 2002, which serves as a formal evaluation of the system.

19.
Apo and holo forms of lactoferrin (LF) from caprine and bovine species have been characterized and compared with regard to structural stability, as determined by thermal denaturation temperature (Tm) values at pH 2.0–8.0. Bovine lactoferrin (bLF) showed the highest thermal stability, with a Tm of 90 ± 1°C at pH 7.0, whereas caprine lactoferrin (cLF) showed a lower Tm of 68 ± 1°C. The holo form was much more stable than the apo form for bLF, compared with cLF. When pH was gradually reduced to 3.0, the Tm values of both holo bLF and holo cLF decreased, to 49 ± 1 and 40 ± 1°C, respectively. Both apo and holo forms of cLF and bLF were found to be most stable at pH 7.0. A significant loss in the iron content of both holo and apo forms of cLF and bLF was observed when pH was decreased from 7.0 to 2.0. At the same time, a gradual unfolding of the apo and holo forms of both cLF and bLF was shown by maximum exposure of hydrophobic regions at pH 3.0. This was supported by a loss in α-helix structure together with an increase in the content of unordered (aperiodic) structure, while β structure seemed unchanged at all pH values. Since LF is used today as a fortifier in many products, such as infant formulas, and exerts many biological functions in humans, the structural changes and the iron binding and release affected by pH and thermal denaturation temperature are important factors to be clarified for more than the bovine species.

20.
Extraction of regulatory gene/protein networks from Medline   总被引:2,自引:0,他引:2  
MOTIVATION: We have previously developed a rule-based approach for extracting information on the regulation of gene expression in yeast. The biomedical literature, however, contains information on several other equally important regulatory mechanisms, in particular phosphorylation, which we have now extended our rule-based system to extract. RESULTS: This paper presents new results for extraction of relational information from biomedical text. We have improved our system, STRING-IE, to capture both new types of linguistic constructs and new types of biological information [i.e. (de-)phosphorylation]. The precision remains stable with a slight increase in recall. From almost one million PubMed abstracts related to four model organisms, we manage to extract regulatory networks and binary phosphorylations comprising 3,319 relation chunks. The accuracy is 83-90% and 86-95% for gene expression and (de-)phosphorylation relations, respectively. To achieve this, we made use of an organism-specific resource of gene/protein names considerably larger than those used in most other biology-related information extraction approaches. These names were included in the lexicon when retraining the part-of-speech (POS) tagger on the GENIA corpus. For the domain in question, an accuracy of 96.4% was attained on POS tags. It should be noted that the rules were developed for yeast and successfully applied to both abstracts and full-text articles related to other organisms with comparable accuracy. AVAILABILITY: The revised GENIA corpus, the POS tagger, the extraction rules and the full sets of extracted relations are available from http://www.bork.embl.de/Docu/STRING-IE
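To give a feel for what a rule that yields binary (de-)phosphorylation relations might look like, here is a deliberately simple sketch based on surface regular expressions. It is not STRING-IE: the abstract describes rules built on POS tagging and a large name lexicon, whereas the two patterns, the function name, and the toy sentences below are assumptions made for illustration.

```python
import re

# Hypothetical toy rules: an active and a passive surface pattern.
# Entities are approximated as single word tokens (an assumption).
ACTIVE = re.compile(r"\b(\w+) (de)?phosphorylates (\w+)\b")
PASSIVE = re.compile(r"\b(\w+) is (de)?phosphorylated by (\w+)\b")


def extract_relations(text):
    """Return (agent, relation, target) triples found by the two toy rules."""
    relations = []
    for agent, de, target in ACTIVE.findall(text):
        kind = "dephosphorylation" if de else "phosphorylation"
        relations.append((agent, kind, target))
    for target, de, agent in PASSIVE.findall(text):
        kind = "dephosphorylation" if de else "phosphorylation"
        relations.append((agent, kind, target))
    return relations


# Toy sentences written for this sketch.
text = "Cdc28 phosphorylates Swe1. Sic1 is phosphorylated by Cdc28."
for triple in extract_relations(text):
    print(triple)
```

Surface patterns like these miss most real phrasing (nominalizations, coordination, multiword names), which is one reason systems of the kind described above work on tagged and chunked text rather than raw strings.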
