Similar literature
 20 similar articles found (search time: 31 ms)
1.
This article surveys efforts on text mining of the pharmacogenomics literature, mainly from the period 2008 to 2011. Pharmacogenomics (or pharmacogenetics) is the field that studies how human genetic variation impacts drug response. Therefore, publications span the intersection of research in genotypes, phenotypes and pharmacology, a topic that has increasingly become a focus of active research in recent years. This survey covers efforts dealing with the automatic recognition of relevant named entities (e.g. genes, gene variants and proteins, diseases and other pathological phenomena, drugs and other chemicals relevant for medical treatment), as well as various forms of relations between them. A wide range of text genres is considered, such as scientific publications (abstracts, as well as full texts), patent texts and clinical narratives. We also discuss infrastructure and resources needed for advanced text analytics, e.g. document corpora annotated with corresponding semantic metadata (gold standards and training data), biomedical terminologies and ontologies providing domain-specific background knowledge at different levels of formality and specificity, software architectures for building complex and scalable text analytics pipelines and Web services grounded to them, as well as comprehensive ways to disseminate and interact with the typically huge amounts of semiformal knowledge structures extracted by text mining tools. Finally, we consider some of the novel applications that have already been developed in the field of pharmacogenomic text mining and point out perspectives for future research.

2.
SUMMARY: CNplot is a simple technique for the visualization of global connectivity within pre-clustered network data. CNplot is easy to implement and in most cases produces an informative and satisfactory summary of the data. AVAILABILITY: A Java implementation is available that allows users to modify graphics parameters and produces a LaTeX output. This software is free and is available at http://csb.stanford.edu/nbatada/VCN.html

3.
In this article, we offer a critical view of Thibodeau and Boroditsky, who report an effect of metaphorical framing on readers' preference for political measures after exposure to a short text on the increase of crime in a fictitious town: when crime was metaphorically presented as a beast, readers became more enforcement-oriented than when crime was metaphorically framed as a virus. We argue that the design of the study has left room for alternative explanations. We report four experiments comprising a follow-up study, remedying several shortcomings in the original design while collecting more encompassing sets of data. Our experiments include three additions to the original studies: (1) a non-metaphorical control condition, which is contrasted to the two metaphorical framing conditions used by Thibodeau and Boroditsky, (2) text versions that do not have the other, potentially supporting metaphors of the original stimulus texts, (3) a pre-exposure measure of political preference (Experiments 1–2). We do not find a metaphorical framing effect but instead show that there is another process at play across the board which presumably has to do with simple exposure to textual information. Reading about crime increases people's preference for enforcement irrespective of metaphorical frame or metaphorical support of the frame. These findings suggest the existence of boundary conditions under which metaphors can have differential effects on reasoning. Thus, our four experiments provide converging evidence raising questions about when metaphors do and do not influence reasoning.

4.

Background

Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, …) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts, constructed by concatenating random characters including blanks behaving as word delimiters, exhibit a Zipf's law-like word rank distribution.

Methodology/Principal Findings

In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid for both the simplest random texts composed of equally likely characters as well as more elaborate and realistic versions where character probabilities are borrowed from a real text.

Conclusions/Significance

The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.
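As a rough illustration of the simplest random texts discussed in this abstract, the sketch below concatenates equally likely characters (the blank acting as a word delimiter) and tabulates word frequency against rank. The alphabet, text length, seed and function names are assumptions chosen for illustration; the abstract does not specify them, and none of the article's three statistical tests is reproduced here.

```python
import random
from collections import Counter


def random_text(n_chars, alphabet="abcd ", seed=0):
    """Concatenate n_chars characters drawn uniformly from alphabet.

    The blank in the alphabet behaves as a word delimiter.
    """
    rng = random.Random(seed)
    return "".join(rng.choice(alphabet) for _ in range(n_chars))


def rank_frequency(text):
    """Return (rank, frequency) pairs; rank 1 is the most frequent word."""
    counts = Counter(text.split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


if __name__ == "__main__":
    # Print the ten highest-ranked words' frequencies for a random text.
    for rank, freq in rank_frequency(random_text(200_000))[:10]:
        print(rank, freq)
```

Plotting the logarithm of frequency against the logarithm of rank for such output, next to the same table for a real text, gives the kind of visual comparison whose adequacy the article questions.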

5.
Methods: We developed an interactive 3D PDF report document format and implemented a software tool to create these reports automatically. After more than 1000 liver CASP cases had been reported in clinical routine using our 3D PDF report, an international user survey was carried out online to evaluate the user experience. Results: Our solution enables the user to interactively explore the anatomical configuration and to have different analyses and various resection proposals displayed within a 3D PDF document covering only a single page that acts more like a software application than like a typical PDF file (“PDF App”). The new 3D PDF report offers many advantages over the previous solutions. According to the results of the online survey, the users have assessed the pragmatic quality (functionality, usability, perspicuity, efficiency) as well as the hedonic quality (attractiveness, novelty) very positively. Conclusion: The usage of 3D PDF for reporting and sharing CASP results is feasible and well accepted by the target audience. Using interactive PDF with embedded 3D models is an enabler for presenting and exchanging complex medical information in an easy and platform-independent way. Medical staff as well as patients can benefit from the possibilities provided by 3D PDF. Our results open the door for a wider use of this new technology, since the basic idea can and should be applied for many medical disciplines and use cases.

6.
7.
Cooperation is central to human existence, forming the bedrock of everyday social relationships and larger societal structures. Thus, understanding the psychological underpinnings of cooperation is of both scientific and practical importance. Recent work using a dual-process framework suggests that intuitive processing can promote cooperation while deliberative processing can undermine it. Here we add to this line of research by more specifically identifying deliberative and intuitive processes that affect cooperation. To do so, we applied automated text analysis using the Linguistic Inquiry and Word Count (LIWC) software to investigate the association between behavior in one-shot anonymous economic cooperation games and the presence of inhibition (a deliberative process) and positive emotion (an intuitive process) in free-response narratives written after (Study 1, N = 4,218) or during (Study 2, N = 236) the decision-making process. Consistent with previous results, across both studies inhibition predicted reduced cooperation while positive emotion predicted increased cooperation (even when controlling for negative emotion). Importantly, there was a significant interaction between positive emotion and inhibition, such that the most cooperative individuals had high positive emotion and low inhibition. This suggests that inhibition (i.e., reflective or deliberative processing) may undermine cooperative behavior by suppressing the prosocial effects of positive emotion.

8.
Cabrera León N. BioTechniques 1999, 27(6):1228-1231
This paper describes a Microsoft Word 97 macro designed for restriction endonuclease analysis. Selected DNA fragments in the active Word document can be analyzed through a dynamic dialog box that formats the enzyme restriction lists for further analysis. The results can be obtained in a new Word document with the name of the enzymes, number of cuts and positions. This macro has several advantages: the results can be printed in a format suitable for record keeping, no additional programs are required and it is simple to use.
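The macro itself is not shown in the abstract, so the following is only a minimal Python sketch of the kind of report it describes: enzyme name, number of cuts and positions for a selected DNA fragment. The function name, the small enzyme table and the convention of reporting the 1-based start of each recognition site (rather than the exact cleavage position within it) are assumptions made for this illustration.

```python
def find_sites(dna, enzymes):
    """Map each enzyme name to the 1-based start positions of its site."""
    dna = dna.upper()
    results = {}
    for name, site in enzymes.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start + 1)  # convert 0-based index to 1-based
            start = dna.find(site, start + 1)  # also catches overlapping sites
        results[name] = positions
    return results


# Recognition sequences of three commonly used enzymes.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

if __name__ == "__main__":
    fragment = "ttgaattcagGGATCCaagaattcc"
    for name, positions in find_sites(fragment, ENZYMES).items():
        print(name, len(positions), positions)
```

A plain substring search is enough for sites without ambiguity codes; enzymes with degenerate recognition sequences would need pattern matching instead.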

9.
The structure of patient information leaflets (PILs) supplied with medicines in the European Union is largely determined by a regulatory template, requiring a fixed sequence of pre-formulated headings and sub-headings. The template has been criticized on various occasions, but it has never been tested with users. This paper proposes an alternative template, informed by templates used in the USA and Australia, and by previous user testing. The main research question is whether the revision better enables users to find relevant information. In addition, the paper proposes a methodology for testing templates. Testing document templates is complex, as they are “empty”. For both the current and the alternative template, we produced a document with bogus text and real headings (reflecting the empty template) and a real-life document with readable text (reflecting the “filled” template). The documents were tested both in Dutch and in English, with 64 British and 64 Dutch users. The test used a set of scenario questions that covers the full range of template (sub)topics; users needed to indicate the text locations where they expected each question to be answered. The revised template improved findability of information; this effect was strongest for the “filled” template with readable text. When participants were shown both filled templates, there was a clear preference for the revised template. A closer analysis of the findability data revealed question-specific effects of topic grouping, topic ordering, subtopic granularity and wording of headings. Most of these favoured the revised template, but our revision led to adverse effects as well, for instance in the new heading “Check with your doctor”. Language-specific effects showed that the wording of the headings is a delicate task. Generally, we conclude that document template designs can be analyzed in terms of the four parameters grouping, ordering, granularity and wording.
Furthermore, they need to be tested on their effects on information findability, with template translations requiring separate testing. The methodology used in this study seems an appropriate one for such tests. More specifically, we find that the new patient information leaflet template proposed here provides better information findability.

10.
How can one understand the increasing interest in “urban invasions”, or biological invasions in urban environments? We argue that interest in urban invasions echoes a broader evolution in how ecologists view “the city” in relation to “the natural”. Previously stark categorical distinctions between urban and natural, human and wild, city and ecology have floundered. Drawing on conceptual material and an analysis of key texts, we first show how the ecological sciences in general—and then invasion science in particular—previously had a blind spot for cities, despite a number of important historical and continental European exceptions. Then, we document the advent of an urban turn in ecology and, more recently, in invasion ecology, and how this has challenged fundamental concepts about “nativity”, “naturalness”, and human agency in nature. The urban turn necessitates more explicit and direct attention to human roles and judgements. Ecology has moved from contempt (or indifference) for cities, towards interest or even sympathy.

11.
MOTIVATION: Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol–gene name combinations that can resolve gene-symbol ambiguity. RESULTS: We analyzed a set of 3902 biomedical full-text articles. Different keyword measures indicate that information density is highest in abstracts, but that the information coverage in full texts is much greater than in abstracts. Analysis of five different standard sections of articles shows that the highest information coverage is located in the results section. Still, 30-40% of the information mentioned in each section is unique to that section. Only 30% of the gene symbols in the abstract are accompanied by their corresponding names, and a further 8% of the gene names are found in the full text. In the full text, only 18% of the gene symbols are accompanied by their gene names.

12.
Objective: To compare the demographic characteristics and risk behaviors for hepatitis B infection among injection drug users younger than 30 years with those aged 30 or older and to evaluate participants' knowledge, attitudes, and experiences of infection, screening, and vaccination against hepatitis B virus. Design: A systematic sample of injection drug users not currently in a treatment program was recruited and interviewed at needle exchange programs and community sites. Participants: 135 injection drug users younger than 30 years and 96 injection drug users aged 30 or older. Results: Injection drug users younger than 30 were twice as likely as drug users aged 30 or older to report having shared needles in the past 30 days (36/135 [27%] vs 12/96 [13%]). Injection drug users younger than 30 were also twice as likely to report having had more than two sexual partners in the past 6 months (80/135 [59%] vs 29/96 [30%]). Although 88 of 135 (68%) young injection drug users reported having had contact with medical providers within the past 6 months, only 13 of 135 (10%) had completed the hepatitis B vaccine series and only 16 of (13%) perceived themselves as being at high risk of becoming infected with the virus. Conclusion: Few young injection drug users have been immunized even though they have more frequent contact with medical providers and are at a higher risk for new hepatitis B infection than older drug users. Clinicians caring for young injection drug users and others at high risk of infection should provide education, screening, and vaccination to reduce an important source of hepatitis B infection.

13.
Tool making or modification to produce a tool of apparent improved functionality has rarely been reported in monkeys, especially when tools are used outside the context of food acquisition. We report on an observation of selection, modification and use of splinters for hygiene purposes in a male mandrill. The zoo-housed animal was video-recorded breaking splinters in sequence to use them underneath his toenails. This record brings forward new evidence that the ability to use and modify tools is not limited to apes and some New World monkeys but is also apparent in Old World monkeys.

14.
BACKGROUND: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. RESULTS: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognition of scientific names including the discovery of new species names from text that will also handle misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using the contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) compared to a popular dictionary based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Additionally, we present the comparison results of various machine learning algorithms on our annotated corpus. Naive Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the top two performing algorithms.
CONCLUSIONS: We present NetiNeti, a machine learning based approach for identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
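The two-step design described in this abstract (rule-based candidate generation followed by probabilistic classification) can be illustrated with a deliberately crude first step. The regular expression and function name below are hypothetical and are not NetiNeti's actual rules; the sketch only shows why a classification step is needed after candidates have been generated.

```python
import re

# Hypothetical rule: a capitalised word (genus-like) followed by a
# lowercase word of at least three letters (epithet-like).
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]{3,}\b")


def candidate_names(text):
    """Return every two-word span that looks like a binomial name."""
    return CANDIDATE.findall(text)


if __name__ == "__main__":
    sentence = "Mandrillus sphinx was filmed. The animal used splinters."
    print(candidate_names(sentence))
```

On this sentence the rule returns both a genuine binomial ("Mandrillus sphinx") and an ordinary sentence opening ("The animal"). A classifier that uses structural and contextual features, as the abstract describes, would be responsible for separating the two.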

15.
Despite being a paradigm of quantitative linguistics, Zipf’s law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been tested against a representatively large number of texts. So, we can summarize the current support of Zipf’s law in texts as anecdotal. We try to solve these issues by studying three different versions of Zipf’s law and fitting them to all available English texts in the Project Gutenberg database (consisting of more than 30 000 texts). To do so we use state-of-the-art tools in fitting and goodness-of-fit tests, carefully tailored to the peculiarities of text statistics. Remarkably, one of the three versions of Zipf’s law, consisting of a pure power-law form in the complementary cumulative distribution function of word frequencies, is able to fit more than 40% of the texts in the database (at the 0.05 significance level), for the whole domain of frequencies (from 1 to the maximum value), and with only one free parameter (the exponent).

16.
As people increasingly communicate via asynchronous non-spoken modes on mobile devices, particularly text messaging (e.g., SMS), longstanding assumptions and practices of social measurement via telephone survey interviewing are being challenged. In the study reported here, 634 people who had agreed to participate in an interview on their iPhone were randomly assigned to answer 32 questions from US social surveys via text messaging or speech, administered either by a human interviewer or by an automated interviewing system. Ten interviewers from the University of Michigan Survey Research Center administered voice and text interviews; automated systems launched parallel text and voice interviews at the same time as the human interviews were launched. The key question was how the interview mode affected the quality of the response data, in particular the precision of numerical answers (how many were not rounded), variation in answers to multiple questions with the same response scale (differentiation), and disclosure of socially undesirable information. Texting led to higher quality data—fewer rounded numerical answers, more differentiated answers to a battery of questions, and more disclosure of sensitive information—than voice interviews, both with human and automated interviewers. Text respondents also reported a strong preference for future interviews by text. The findings suggest that people interviewed on mobile devices at a time and place that is convenient for them, even when they are multitasking, can give more trustworthy and accurate answers than those in more traditional spoken interviews. The findings also suggest that answers from text interviews, when aggregated across a sample, can tell a different story about a population than answers from voice interviews, potentially altering the policy implications from a survey.

17.
Expanding digital data sources, including social media, online news articles and blogs, provide an opportunity to understand better the context and intensity of human-nature interactions, such as wildlife exploitation. However, online searches encompassing large taxonomic groups can generate vast datasets, which can be overwhelming to filter for relevant content without the use of automated tools. The variety of machine learning models available to researchers, and the need for manually labelled training data with an even balance of labels, can make applying these tools challenging. Here, we implement and evaluate a hierarchical text classification pipeline which brings together three binary classification tasks with increasingly specific relevancy criteria. Crucially, the hierarchical approach facilitates the filtering and structuring of a large dataset, of which relevant sources make up a small proportion. Using this pipeline, we also investigate how the accuracy with which text classifiers identify relevant and irrelevant texts is influenced by the use of different models, training datasets, and the classification task. To evaluate our methods, we collected data from Facebook, Twitter, Google and Bing search engines, with the aim of identifying sources documenting the hunting and persecution of bats (Chiroptera). Overall, the ‘state-of-the-art’ transformer-based models were able to identify relevant texts with an average accuracy of 90%, with some classifiers achieving accuracy of >95%. Whilst this demonstrates that application of more advanced models can lead to improved accuracy, comparable performance was achieved by simpler models when applied to longer documents and less ambiguous classification tasks. Hence, the benefits from using more computationally expensive models are dependent on the classification context. 
We also found that stratification of training data, according to the presence of key search terms, improved classification accuracy for less frequent topics within datasets, and therefore improves the applicability of classifiers to future data collection. Overall, whilst our findings reinforce the usefulness of automated tools for facilitating online analyses in conservation and ecology, they also highlight that the effectiveness and appropriateness of such tools is determined by the nature and volume of data collected, the complexity of the classification task, and the computational resources available to researchers.
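The hierarchical idea in this abstract, three binary tasks with increasingly specific relevancy criteria applied in sequence, can be sketched as a chain of filters. The stage names and keyword rules below are hypothetical stand-ins for trained classifiers; the abstract does not state what the three tasks are, and the study itself used machine learning models rather than keyword matching.

```python
import re
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[str], bool]]


def words(text: str) -> set:
    """Lowercased set of alphabetic tokens in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_hierarchically(text: str, stages: List[Stage]) -> str:
    """Return the first stage that rejects text, or 'relevant' if none does."""
    for name, is_relevant in stages:
        if not is_relevant(text):
            return f"rejected at {name}"
    return "relevant"


# Hypothetical keyword rules standing in for three binary classifiers.
STAGES: List[Stage] = [
    ("stage 1: mentions bats",
     lambda t: bool(words(t) & {"bat", "bats"})),
    ("stage 2: refers to the animal",
     lambda t: not (words(t) & {"baseball", "cricket"})),
    ("stage 3: hunting or persecution",
     lambda t: bool(words(t) & {"hunt", "hunted", "hunting", "killed"})),
]

if __name__ == "__main__":
    for doc in ["Fruit bats are hunted for food in the village",
                "He swung the baseball bat",
                "Bats roost in the old church"]:
        print(doc, "->", classify_hierarchically(doc, STAGES))
```

Because each stage only sees texts that passed the previous one, the later and more specific tasks work on a much smaller set in which relevant sources are less rare, which is one way to read the filtering and structuring benefit the abstract reports.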

18.

Background  

A biomedical entity mention in articles and other free texts is often ambiguous. For example, 13% of the gene names (aliases) might refer to more than one gene. The task of Gene Symbol Disambiguation (GSD) – a special case of Word Sense Disambiguation (WSD) – is to assign a unique gene identifier for all identified gene name aliases in biology-related articles. Supervised and unsupervised machine learning WSD techniques have been applied in the biomedical field with promising results. Here we examine, through graph-based semi-supervised methods for the GSD task, the potential of exploiting one of the special features of biological articles: the fact that the authors of the documents are known.

19.
STORM is a recently developed super-resolution microscopy technique with up to 10 times better resolution than standard fluorescence microscopy techniques. However, as the image is acquired in a very different way than normal, by building up an image molecule-by-molecule, there are some significant challenges for users in trying to optimize their image acquisition. In order to aid this process and gain more insight into how STORM works, we present the preparation of 3 test samples and the methodology of acquiring and processing STORM super-resolution images with typical resolutions of between 30 and 50 nm. By combining the test samples with the use of the freely available rainSTORM processing software it is possible to obtain a great deal of information about image quality and resolution. Using these metrics it is then possible to optimize the imaging procedure from the optics, to sample preparation, dye choice, buffer conditions, and image acquisition settings. We also show examples of some common problems that result in poor image quality, such as lateral drift, where the sample moves during image acquisition, and density-related problems resulting in the 'mislocalization' phenomenon.

20.