Similar literature
 20 similar articles found (search time: 31 ms)
1.

Background

Microbes play a crucial role in the functioning of ecosystems. Identifying the interactions among microbes is an important step towards understanding the structure and function of microbial communities, as well as the impact of microbes on human health and disease. Despite this importance, there is currently no gold-standard dataset of microbial interactions. Traditional approaches such as growth and co-culture analysis must be performed in the laboratory, which is time-consuming and costly. Computational methods can alleviate this problem by providing predicted candidate interactions for experimental verification. Mining microbial interactions from large volumes of medical text is one such computational method. Identifying named entities for bacteria and related entities in the text is the basis for microbial relation extraction. In previous work, a bacteria named entity recognition system based on a dictionary and conditional random fields was proposed. However, it is inefficient when dealing with large-scale text.

Results

We implemented bacteria named entity recognition on the Spark platform and designed comparison experiments to verify the correctness and validity of the proposed system. The experimental results show that it achieves a higher F-measure in the correctness comparison. Moreover, prediction is much faster than with the previous version on large-scale biomedical datasets, with computational efficiency improved by about 3.1 to 6.7 times.

Conclusions

The system for bacteria named entity recognition resolves the inefficiency of the previously proposed system on large-scale datasets. The proposed system performs well in both accuracy and scalability.

2.

Motivation

Biomedical entities, their identifiers and names, are essential in the representation of biomedical facts and knowledge. In the same way, the complete set of biomedical and chemical terms, i.e. the biomedical “term space” (the “Lexeome”), forms a key resource to achieve the full integration of the scientific literature with biomedical data resources: any identified named entity can immediately be normalized to the correct database entry. This goal does not only require that we are aware of all existing terms, but would also profit from knowing all their senses and their semantic interpretation (ambiguities, nestedness).

Result

This study compiles a resource for lexical terms of biomedical interest in a standard format (called “LexEBI”), determines the overall number of terms, their reuse in different resources and the nestedness of terms. LexEBI comprises references for protein and gene entries and their term variants and chemical entities amongst other terms. In addition, disease terms have been identified from Medline and PubmedCentral and added to LexEBI. Our analysis demonstrates that the baseforms of terms from the different semantic types show only little polysemous use. Nonetheless, the term variants of protein and gene names (PGNs) frequently contain species mentions, which should have been avoided according to protein annotation guidelines. Furthermore, the protein and gene entities as well as the chemical entities, both do comprise enzymes leading to hierarchical polysemy, and a large portion of PGNs make reference to a chemical entity. Altogether, according to our analysis based on the Medline distribution, 401,869 unique PGNs in the documents contain a reference to 25,022 chemical entities, 3,125 disease terms or 1,576 species mentions.

Conclusion

LexEBI delivers the complete biomedical and chemical Lexeome in a standardized representation (http://www.ebi.ac.uk/Rebholz-srv/LexEBI/). The resource provides the disease terms as open source content, and fully interlinks terms across resources.

3.
We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
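
The exact match F-score mentioned in this abstract can be illustrated with a minimal sketch. It assumes that entities are represented as (start, end, type) tuples and that a prediction counts as correct only if all three fields agree with a gold annotation; the function name, the tuple layout and the toy data are illustrative assumptions, not the scoring scripts used in the BioCreative or BioNLP evaluations.

```python
from typing import Iterable, Tuple

# Assumed representation: (start_token, end_token, entity_type)
Span = Tuple[int, int, str]


def exact_match_prf(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) where only exact span-and-type matches count."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): the second prediction has a wrong right boundary,
# so under exact matching it is both a false positive and a missed gold entity.
gold = [(0, 2, "protein"), (5, 6, "DNA"), (9, 11, "cell_type")]
pred = [(0, 2, "protein"), (5, 7, "DNA"), (9, 11, "cell_type")]
precision, recall, f1 = exact_match_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under this strict criterion a single boundary error is penalised twice, which is one reason attention to correct boundary identification matters for the reported scores.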

4.

Background

Automatic recognition of relations between a specific disease term and its relevant genes or protein terms is an important practice of bioinformatics. Considering the utility of the results of this approach, we identified prostate cancer and gene terms with the ID tags of public biomedical databases. Moreover, considering that genetics experts will use our results, we classified them based on six topics that can be used to analyze the type of prostate cancers, genes, and their relations.

Methods

We developed a maximum entropy-based named entity recognizer and a relation recognizer and applied them in a corpus-based approach. We collected prostate cancer-related abstracts from MEDLINE, and biologists constructed an annotated corpus of gene and prostate cancer relations based on six topics. We used this corpus to train the maximum entropy-based named entity recognizer and relation recognizer.

Results

Topic-classified relation recognition achieved 92.1% precision for the relation (an increase of 11.0% from that obtained in a baseline experiment). For all topics, the precision was between 67.6 and 88.1%.

Conclusion

A series of experimental results revealed two important findings: a carefully designed relation recognition system using named entity recognition can improve the performance of relation recognition, and topic-classified relation recognition can be effectively addressed through a corpus-based approach using manual annotation and machine learning techniques.

5.

Background

Congenital myasthenic syndromes (CMS) are a heterogeneous group of inherited neuromuscular disorders sharing the common feature of fatigable weakness due to defective neuromuscular transmission. Despite rapidly increasing knowledge about the genetic origins, specific features and potential treatments for the known CMS entities, the lack of standardized classification at the most granular level has hindered the implementation of computer-based systems for knowledge capture and reuse. Where individual clinical or genetic entities do not exist in disease coding systems, they are often invisible in clinical records and inadequately annotated in information systems, and features that apply to one disease but not another cannot be adequately differentiated.

Results

We created a detailed classification of all CMS disease entities suitable for use in clinical and genetic databases and decision support systems. To avoid conflict with existing coding systems as well as with expert-defined group-level classifications, we developed a collaboration with the Orphanet nomenclature for rare diseases, creating a clinically understandable name for each entity and placing it within a logical hierarchy that paves the way towards computer-aided clinical systems and improved knowledge bases for CMS that can adequately differentiate between types and ascribe relevant expert knowledge to each.

Conclusions

We suggest that data science approaches can be used effectively in the clinical domain in a way that does not disrupt preexisting expert classification and that enhances the utility of existing coding systems. Our classification provides a comprehensive view of the individual CMS entities in a manner that supports differential diagnosis and understanding of the range and heterogeneity of the disease but that also enables robust computational coding and hierarchy for machine-readability. It can be extended as required in the light of future scientific advances, but already provides the starting point for the creation of FAIR (Findable, Accessible, Interoperable and Reusable) knowledge bases of data on the congenital myasthenic syndromes.

6.
The cladistic species concept proposed by Ridley (1989) rests on an undefined notion of speciation and its meaning is thus indeterminate. If the cladistic concept is made determinate through the definition of speciation, then it reduces to a form of whatever species concept is implicit in the definition of speciation and fails to be a truly alternative species concept. The cladistic formalism advocated by Ridley is designed to ensure that species are monophyletic, that they are objectively real entities, and that they are individuals. It is argued that species need not be monophyletic in order to be real entities, and that ancestor-descendant relations are not the only relations that confer individuality on entities. The species problem is recast in terms of a futile quest for a definition of that single kind of entity to which the term species should uniquely apply.

7.

Background

Although ample evidence suggests that emotion and response inhibition are interrelated at the behavioral and neural levels, neural substrates of response inhibition to negative facial information remain unclear. Thus we used event-related potential (ERP) methods to explore the effects of explicit and implicit facial expression processing in response inhibition.

Methods

We used implicit (gender categorization) and explicit emotional Go/Nogo tasks (emotion categorization) in which neutral and sad faces were presented. Electrophysiological markers at the scalp and the voxel level were analyzed during the two tasks.

Results

We detected a task, emotion and trial type interaction effect in the Nogo-P3 stage. Larger Nogo-P3 amplitudes during sad conditions versus neutral conditions were detected with explicit tasks. However, the amplitude differences between the two conditions were not significant for implicit tasks. Source analyses on P3 component revealed that right inferior frontal junction (rIFJ) was involved during this stage. The current source density (CSD) of rIFJ was higher with sad conditions compared to neutral conditions for explicit tasks, rather than for implicit tasks.

Conclusions

The findings indicated that response inhibition was modulated by sad facial information at the action inhibition stage when facial expressions were processed explicitly rather than implicitly. The rIFJ may be a key brain region in emotion regulation.

8.

Background

Biomedical named entity recognition (Bio-NER) is a fundamental task in handling biomedical text terms, such as RNA, protein, cell type, cell line, and DNA. Bio-NER is one of the most elementary and core tasks in biomedical knowledge discovery from texts. The system described here is developed by using the BioNLP/NLPBA 2004 shared task. Experiments are conducted on a training and evaluation set provided by the task organizers.

Results

Our results show that, compared with a baseline having a 70.09% F1 score, the RNN Jordan- and Elman-type algorithms have F1 scores of approximately 60.53% and 58.80%, respectively. When we use CRF as a machine learning algorithm, CCA, GloVe, and Word2Vec have F1 scores of 72.73%, 72.74%, and 72.82%, respectively.

Conclusions

By using word embeddings constructed through unsupervised learning, the time and cost required to construct the learning data can be saved.

9.

Background

Web-based, free-text documents on science and technology have been growing in number on the web. However, most of these documents are not immediately processable by computers, which slows down the acquisition of useful information. Computational ontologies might represent a possible solution by enabling semantically machine-readable data sets. However, the process of ontology creation, instantiation and maintenance is still based on manual methodologies and is thus time- and cost-intensive.

Method

We focused on a large corpus containing information on researchers, research fields, and institutions. We based our strategy on traditional entity recognition, social computing and correlation. We devised a semi-automatic approach for the recognition, correlation and extraction of named entities and relations from textual documents, which are then used to create, instantiate, and maintain an ontology.

Results

We present a prototype demonstrating the applicability of the proposed strategy, along with a case study describing how direct and indirect relations can be extracted from academic and professional activities registered in a database of curriculum vitae in free-text format. We present evidence that this system can identify entities to assist in the process of knowledge extraction and representation to support ontology maintenance. We also demonstrate the extraction of relationships among ontology classes and their instances.

Conclusion

We have demonstrated that our system can be used to convert research information in free-text format into a database with a semantic structure. Future studies should test this system using the growing amount of free-text information available at the institutional and national levels.

10.

Background

Human movement can be guided automatically (implicit control) or attentively (explicit control). Explicit control may be engaged when learning a new movement, while implicit control enables simultaneous execution of multiple actions. Explicit and implicit control can often be assigned arbitrarily: we can simultaneously drive a car and tune the radio, seamlessly allocating implicit or explicit control to either action. This flexibility suggests that sensorimotor signals, including those that encode spatially overlapping perception and behavior, can be accurately segregated to explicit and implicit control processes.

Methodology/Principal Findings

We tested human subjects' ability to segregate sensorimotor signals to parallel control processes by requiring dual (explicit and implicit) control of the same reaching movement and testing for interference between these processes. Healthy control subjects were able to engage dual explicit and implicit motor control without degradation of performance compared to explicit or implicit control alone. We then asked whether segregation of explicit and implicit motor control can be selectively disrupted by studying dual-control performance in subjects with no clinically manifest neurologic deficits in the presymptomatic stage of Huntington's disease (HD). These subjects performed successfully under either explicit or implicit control alone, but were impaired in the dual-control condition.

Conclusion/Significance

The human nervous system can exert dual control on a single action, and is therefore able to accurately segregate sensorimotor signals to explicit and implicit control. The impairment observed in the presymptomatic stage of HD points to a possible crucial contribution of the striatum to the segregation of sensorimotor signals to multiple control processes.

11.

Background

A variety of options and techniques for causing implicit and explicit motor learning have been described in the literature. The aim of the current paper was to provide clearer guidance for practitioners on how to apply motor learning in practice by exploring experts’ opinions and experiences, using the distinction between implicit and explicit motor learning as a conceptual departure point.

Methods

A survey was designed to collect and aggregate informed opinions and experiences from 40 international respondents who had demonstrable expertise related to motor learning in practice and/or research. The survey was administered through an online survey tool and addressed potential options and learning strategies for applying implicit and explicit motor learning. Responses were analysed in terms of consensus (≥ 70%) and trends (≥ 50%). A summary figure was developed to illustrate a taxonomy of the different learning strategies and options indicated by the experts in the survey.

Results

Answers of experts were widely distributed. No consensus was found regarding the application of implicit and explicit motor learning. Some trends were identified: Explicit motor learning can be promoted by using instructions and various types of feedback, but when promoting implicit motor learning, instructions and feedback should be restricted. Further, for implicit motor learning, an external focus of attention should be considered, as well as practicing the entire skill. Experts agreed on three factors that influence motor learning choices: the learner’s abilities, the type of task, and the stage of motor learning (94.5%; n = 34/36). Most experts agreed with the summary figure (64.7%; n = 22/34).

Conclusion

The results provide an overview of possible ways to cause implicit or explicit motor learning, signposting examples from practice and factors that influence day-to-day motor learning decisions.
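
The consensus (≥ 70%) and trend (≥ 50%) thresholds described in the Methods of this abstract can be sketched as a small classification rule. The function name, the label strings and the example counts other than 22/34 are assumptions made for illustration; this is not the authors' analysis code.

```python
def classify_agreement(n_agree: int, n_respondents: int,
                       consensus: float = 0.70, trend: float = 0.50) -> str:
    """Label a survey item by the share of respondents who agreed.

    Thresholds follow the abstract: consensus at >= 70%, trend at >= 50%.
    """
    if n_respondents <= 0:
        raise ValueError("n_respondents must be positive")
    share = n_agree / n_respondents
    if share >= consensus:
        return "consensus"
    if share >= trend:
        return "trend"
    return "none"


# 22 of 34 respondents (about 64.7%) falls between the two thresholds.
label_summary_figure = classify_agreement(22, 34)
print(label_summary_figure)
```

Keeping the thresholds as parameters makes it easy to check how sensitive the labels are to the chosen cut-offs.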

12.
Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.

13.

Background  

The number of corpora, collections of structured texts, has been increasing, as a result of the growing interest in the application of natural language processing methods to biological texts. Many named entity recognition (NER) systems have been developed based on these corpora. However, in the biomedical community, there is yet no general consensus regarding named entity annotation; thus, the resources are largely incompatible, and it is difficult to compare the performance of systems developed on resources that were divergently annotated. On the other hand, from a practical application perspective, it is desirable to utilize as many existing annotated resources as possible, because annotation is costly. Thus, it becomes a task of interest to integrate the heterogeneous annotations in these resources.

14.
The exponential growth of the biomedical literature is making the need for efficient, accurate text-mining tools increasingly clear. The identification of named biological entities in text is a central and difficult task. We have developed an efficient algorithm and implementation of a dictionary-based approach to named entity recognition, which we here use to identify names of species and other taxa in text. The tool, SPECIES, is more than an order of magnitude faster than, and as accurate as, existing tools. The precision and recall were assessed both on an existing gold-standard corpus and on a new corpus of 800 abstracts, which were manually annotated after the development of the tool. The corpus comprises abstracts from journals selected to represent many taxonomic groups, which gives insights into which types of organism names are hard to detect and which are easy. Finally, we have tagged organism names in the entire Medline database and developed a web resource, ORGANISMS, that makes the results accessible to the broad community of biologists. The SPECIES software is open source and can be downloaded from http://species.jensenlab.org along with dictionary files and the manually annotated gold-standard corpus. The ORGANISMS web resource can be found at http://organisms.jensenlab.org.
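
To make the idea of dictionary-based named entity recognition concrete, here is a minimal sketch of a greedy longest-match tagger. It is not the SPECIES algorithm or its implementation; the function names, the tokenisation rule, the `max_len` parameter and the tiny dictionary are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Tuple

TOKEN = re.compile(r"\w+")


def build_dictionary(names: Dict[str, str]) -> Dict[Tuple[str, ...], str]:
    """Index each name by its lower-cased token sequence."""
    return {tuple(t.lower() for t in TOKEN.findall(name)): ident
            for name, ident in names.items()}


def tag(text: str, dictionary: Dict[Tuple[str, ...], str],
        max_len: int = 4) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, identifier) for greedy longest matches."""
    tokens = [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            if key in dictionary:
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append((start, end, text[start:end], dictionary[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return matches


# Illustrative dictionary; the identifiers are example labels only.
names = {"Escherichia coli": "taxon:562", "E. coli": "taxon:562",
         "Homo sapiens": "taxon:9606"}
dictionary = build_dictionary(names)
hits = tag("Escherichia coli was isolated from Homo sapiens samples; "
           "E. coli counts were recorded.", dictionary)
for hit in hits:
    print(hit)
```

Hash lookups on token tuples keep each match attempt cheap, which hints at why dictionary-based taggers can be fast on large collections; a production tool would additionally need to handle ambiguity, abbreviations and case-sensitive names.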

15.
The Botrychium lunaria (Ophioglossaceae) complex worldwide includes the named species B. lunaria, B. crenulatum, B. tunux, and B. yaaxudakeit. These species have been distinguished from each other morphologically and genetically. This study further investigates the genetic diversity and geographic distribution of this complex, examining a large number of plants worldwide. Enzyme electrophoresis was used to examine allelic variation of 22 loci for 1574 plants of putative B. lunaria, B. crenulatum and B. tunux from North America, Eurasia, and New Zealand, and B. dusenii from the Falkland Islands. Variation in allelic composition assessed by genetic identity and cluster analysis using the programs PopGene and STRUCTURE as well as morphology and geography indicated that the complex is composed of six distinct entities; two of which warrant recognition as new species, B. neolunaria, endemic to North America, and B. nordicum, sister to the B. lunaria complex, from Iceland and Norway; and a new combination, B. lunaria var. melzeri, endemic to Greenland, Iceland, and Norway. The new taxa are described in this paper. Three entities within B. tunux are discussed but not proposed for recognition at this time. Botrychium lanceolatum, included in this study, is composed of three morphologically and genetically distinct entities warranting taxonomic recognition.

16.
View from the top: hierarchies and reverse hierarchies in the visual system   Total citations: 33, self-citations: 0, citations by others: 33
Hochstein S, Ahissar M. Neuron, 2002, 36(5): 791-804
We propose that explicit vision advances in reverse hierarchical direction, as shown for perceptual learning. Processing along the feedforward hierarchy of areas, leading to increasingly complex representations, is automatic and implicit, while conscious perception begins at the hierarchy's top, gradually returning downward as needed. Thus, our initial conscious percept--vision at a glance--matches a high-level, generalized, categorical scene interpretation, identifying "forest before trees." For later vision with scrutiny, reverse hierarchy routines focus attention to specific, active, low-level units, incorporating into conscious perception detailed information available there. Reverse Hierarchy Theory dissociates between early explicit perception and implicit low-level vision, explaining a variety of phenomena. Feature search "pop-out" is attributed to high areas, where large receptive fields underlie spread attention detecting categorical differences. Search for conjunctions or fine discriminations depends on reentry to low-level specific receptive fields using serial focused attention, consistent with recently reported primary visual cortex effects.

17.
18.
19.
Age-group membership effects on explicit recognition of emotional facial expressions have been widely demonstrated. In this study we investigated whether age-group membership could also affect implicit physiological responses, such as facial mimicry and autonomic regulation, to the observation of emotional facial expressions. To this aim, facial electromyography (EMG) and respiratory sinus arrhythmia (RSA) were recorded from teenager and adult participants during the observation of facial expressions performed by teenager and adult models. Results highlighted that teenagers exhibited greater facial EMG responses to peers' facial expressions, whereas adults showed higher RSA responses to adult facial expressions. The different physiological modalities through which teenagers and adults respond to peers' emotional expressions are likely to reflect two different ways of engaging in social interactions with people of the same age. Findings confirmed that age is an important and powerful social feature that modulates interpersonal interactions by influencing low-level physiological responses.

20.

Background  

The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can only efficiently be accessed by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity such that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, the classification of the type of relation is also of great importance and this is the focus of this work. In this paper we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition.
