Similar articles
20 similar articles found (search time: 31 ms)
1.

Background

Within the structural and grammatical bounds of a common language, all authors develop their own distinctive writing styles. Whether the relative occurrence of common words can be measured to produce accurate models of authorship is of particular interest. This work introduces a new score that helps to highlight such variations in word occurrence, and is applied to produce models of authorship of a large group of plays from the Shakespearean era.

Methodology

A text corpus containing 55,055 unique words was generated from 168 plays from the Shakespearean era (16th and 17th centuries) of undisputed authorship. A new score, CM1, is introduced to measure variation patterns based on the frequency of occurrence of each word for the authors John Fletcher, Ben Jonson, Thomas Middleton and William Shakespeare, compared to the rest of the authors in the study (which provides a reference of relative word usage at that time). A total of 50 WEKA methods were applied for Fletcher, Jonson and Middleton, to identify those which were able to produce models yielding over 90% classification accuracy. This ensemble of WEKA methods was then applied to model Shakespearean authorship across all 168 plays, yielding a Matthews correlation coefficient (MCC) performance of over 90%. Furthermore, the best model yielded an MCC of 99%.
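For readers unfamiliar with the MCC reported above, the following is a minimal Python sketch of the standard definition for a binary classifier (for example, "Shakespeare" versus "not Shakespeare"). The function name and the counts in the usage line are hypothetical illustrations, not values or code from the study, and the CM1 score itself is not reproduced here because the abstract does not define it.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. The result lies in [-1, 1]; 1 is perfect agreement.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not results from the study).
print(matthews_corrcoef(tp=30, tn=130, fp=4, fn=4))
```

Unlike plain accuracy, the MCC uses all four cells of the confusion matrix, which is why it is often preferred when one class (here, plays by a single author) is much smaller than the other.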

Conclusions

Our results suggest that different authors, while adhering to the structural and grammatical bounds of a common language, develop measurably distinct styles by the tendency to over-utilise or avoid particular common words and phrasings. Considering language and the potential of words as an abstract chaotic system with a high entropy, similarities can be drawn to the Maxwell's demon thought experiment; authors subconsciously favour or filter certain words, modifying the probability profile in ways that could reflect their individuality and style.

2.
OBJECTIVE: To assess knowledge, views, and behaviour of researchers on criteria for authorship and causes and control of gift authorship. DESIGN: Interview survey of stratified sample of researchers. SETTING: University medical faculty. SUBJECTS: 66 staff (94% response rate) comprising several levels of university academic and research appointments. MAIN OUTCOME MEASURES: Awareness and use of criteria for authorship, views on which contributions to research merit authorship, perceptions about gift authorship and strategies for reducing it, and experiences of authorship problems. RESULTS: 50 (76%) respondents supported criteria for authorship, but few knew about or used available criteria. Of the five people who could specify all three criteria of the International Committee of Medical Journal Editors, only one knew that all criteria had to be met. Forty one respondents (62%) disagreed with this stipulation. A range of practical and academic contributions were seen as sufficient for authorship. Gift authorship was perceived as common, promoted by pressure to publish, to motivate research teams, and to maintain working relationships. A signed statement justifying authorship and a published statement of the contribution of each author were perceived as practical ways of tackling gift authorship. Most researchers had experienced problems with authorship, most commonly the perception that authorship had been deserved but not awarded (49%). CONCLUSION: There seems to be a gap between editors' criteria for authorship and researchers' practice. Lack of awareness of criteria is only a partial explanation. Researchers give more weight than editors to practical research contributions. Future criteria should be agreed by researchers and not be imposed by editors.

3.
4.
5.
The Homeric epics are among the greatest masterpieces of literature, but when they were produced is not known with certainty. Here we apply evolutionary‐linguistic phylogenetic statistical methods to differences in Homeric, Modern Greek and ancient Hittite vocabulary items to estimate a date of approximately 710–760 BCE for these great works. Our analysis compared a common set of vocabulary items among the three pairs of languages, recording for each item whether the words in the two languages were cognate – derived from a shared ancestral word – or not. We then used a likelihood‐based Markov chain Monte Carlo procedure to estimate the most probable times in years separating these languages given the percentage of words they shared, combined with knowledge of the rates at which different words change. Our date for the epics is in close agreement with historians' and classicists' beliefs derived from historical and archaeological sources.

6.
7.

Background

Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, …) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random characters including blanks behaving as word delimiters - exhibit a Zipf's law-like word rank distribution.
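The random-text construction described above can be sketched in a few lines of Python. This is an illustrative toy only: the alphabet, the blank probability, the seed and both function names are assumptions chosen for the example, not the parameters or code used in the article.

```python
import random
from collections import Counter


def random_text(n_chars: int, alphabet: str = "abcd",
                p_space: float = 0.2, seed: int = 0) -> str:
    """Concatenate random characters; a blank acts as the word delimiter.

    Assumption: letters are equally likely and a blank is drawn with
    probability p_space at each position.
    """
    rng = random.Random(seed)
    chars = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            chars.append(" ")
        else:
            chars.append(rng.choice(alphabet))
    return "".join(chars)


def rank_frequency(text: str) -> list:
    """Return (rank, word, frequency) triples, most frequent word first."""
    counts = Counter(text.split())
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(counts.most_common(), start=1)]


# Show the top of the rank-frequency table for a toy random text.
for rank, word, freq in rank_frequency(random_text(20000))[:5]:
    print(rank, word, freq)
```

Plotting frequency against rank on double logarithmic axes for such a text, and for a real text of similar length, gives a visual version of the comparison that the article goes on to test statistically.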

Methodology/Principal Findings

In this article, we examine the flaws of such putative good fits of random texts. We demonstrate - by means of three different statistical tests - that ranks derived from random texts and ranks derived from real texts are statistically inconsistent with the parameters employed to argue for such a good fit, even when the parameters are inferred from the target real text. Our findings are valid for both the simplest random texts composed of equally likely characters as well as more elaborate and realistic versions where character probabilities are borrowed from a real text.

Conclusions/Significance

The good fit of random texts to real Zipf's law-like rank distributions has not yet been established. Therefore, we suggest that Zipf's law might in fact be a fundamental law in natural languages.

8.
Systemic lupus erythematosus (SLE) is an autoimmune disorder with several clinical manifestations. SLE etiology has a strong genetic component, which plays a key role in disease predisposition, as well as participation of environmental factors, such as UV light exposure. In this regard, we investigated whether polymorphisms in STK17A, a DNA repair related gene encoding serine/threonine-protein kinase 17A, are associated with SLE susceptibility. A total of 143 SLE patients and 177 healthy controls from Southern Brazil were genotyped for five STK17A TagSNPs. Our results indicated association of the rs7805969 SNP (A and G/A genotype, OR = 1.40 and OR = 1.73, respectively) with SLE predisposition and the following clinical manifestations: arthritis, cutaneous and immunological alterations. When analyzing haplotype distribution, we found association between the TGGTC, TAGTC and AAGAT haplotypes and risk of developing SLE. When considering clinical manifestations, the haplotypes TGGTT and TAGTC were associated with protection against cutaneous alterations and the haplotype TAGTC with protection against hematological alterations. We also observed association between SLE clinical manifestations and ethnicity, with the European-derived patients being more susceptible to cutaneous and hematological alterations.

9.
10.
Over the past 10 years the AIDS crisis has produced a large volume of writing. Much of this is documentary. Dozens of studies of AIDS from various clinical and political perspectives have been complemented by just as many published diaries, autobiographies, novels, plays, and poems. A few of these works have risen to the surface not only as extraordinarily valuable testimonies to the changes AIDS has wrought in individual and collective life but also as first-rate literary works, worth reading because beyond their immediate purposes they articulate with extraordinary lucidity and compassion some deep truths about the human--and the modern--condition. Paul Monette's Borrowed Time is among the most distinctive of those. It speaks not only for the community of people with AIDS and those who support them but for a generation.

11.
The aim of this methods paper is to describe how to implement a neuroimaging technique to examine complementary brain processes engaged by two similar tasks. Participants' behavior during task performance in an fMRI scanner can be correlated with brain activity using the blood-oxygen-level-dependent (BOLD) signal. We measure behavior so that we can sort out the correct trials, in which the subject performed the task correctly, and then examine the brain signals related to correct performance. Conversely, if error trials were included in the same analysis as the correct trials, the analysis would no longer reflect correct performance alone. In many cases the error trials can instead be analyzed in their own right and correlated with brain activity. We describe two complementary tasks that are used in our lab to examine the brain during suppression of an automatic response: the Stroop [1] and anti-saccade tasks. The emotional Stroop paradigm instructs participants to report either the emotional 'word' superimposed across the affective faces or the facial 'expression' of the face stimuli [1,2]. When the word and the facial expression refer to different emotions, a conflict arises between what must be said and what is automatically read. The participant has to resolve the conflict between two simultaneously competing processes, word reading and processing of the facial expression. Our urge to read out a word leads to strong 'stimulus-response (SR)' associations; hence inhibiting these strong SR associations is difficult and participants are prone to making errors. Overcoming this conflict and directing attention away from the face or the word requires the subject to inhibit bottom-up processes, which typically direct attention to the more salient stimulus.
Similarly, in the anti-saccade task [3,4,5,6], an instruction cue is used to direct attention alone to a peripheral stimulus location, but the eye movement is then made to the mirror-opposite position. Here too we measure behavior by recording participants' eye movements, which allows the behavioral responses to be sorted into correct and error trials [7] that can then be correlated with brain activity. Neuroimaging now allows researchers to measure the different behaviors on correct and error trials, which are indicative of different cognitive processes, and to pinpoint the different neural networks involved.

12.
In this paper, we develop two automated authorship attribution schemes, one based on Multiple Discriminant Analysis (MDA) and the other based on a Support Vector Machine (SVM). The classification features we exploit are based on word frequencies in the text. We adopt an approach of preprocessing each text by stripping it of all characters except a-z and space. This is in order to increase the portability of the software to different types of texts. We test the methodology on a corpus of undisputed English texts, and use leave-one-out cross validation to demonstrate classification accuracies in excess of 90%. We further test our methods on the Federalist Papers, which have a partly disputed authorship and a fair degree of scholarly consensus. And finally, we apply our methodology to the question of the authorship of the Letter to the Hebrews by comparing it against a number of original Greek texts of known authorship. These tests identify where some of the limitations lie, motivating a number of open questions for future work. An open source implementation of our methodology is freely available for use at https://github.com/matthewberryman/author-detection.
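The preprocessing and word-frequency feature step described above might look roughly like the following Python sketch. It is not taken from the linked repository: the function names are hypothetical, and lowercasing before stripping is an assumption made here so that capitalised words survive the a-z filter.

```python
import re
from collections import Counter


def preprocess(text: str) -> str:
    """Keep only the characters a-z and space.

    Assumption: the text is lowercased first, and any run of whitespace
    (newlines, tabs) is collapsed to a single space before filtering.
    """
    lowered = text.lower()
    spaced = re.sub(r"\s+", " ", lowered)
    return re.sub(r"[^a-z ]", "", spaced)


def relative_frequencies(text: str, vocabulary: list) -> list:
    """Relative frequency of each vocabulary word in the cleaned text."""
    words = preprocess(text).split()
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


# Hypothetical vocabulary of common function words, for illustration only.
features = relative_frequencies("The cat and the dog.", ["the", "and", "of"])
print(features)
```

Each text then becomes a fixed-length feature vector that a classifier such as MDA or an SVM can take as input; in leave-one-out cross validation, each text in turn is held out, the classifier is trained on the remainder, and the held-out text is classified.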

13.
The human gut microbiome plays a crucial role in human health, and efforts are needed to cultivate and characterise bacteria with potential health benefits. Here, we isolated a bacterium from the faeces of a healthy Indian adult and investigated its potential as a probiotic. The cultured bacterial strain 17OM39 was identified as Enterococcus faecium by 16S rRNA gene sequencing. The strain 17OM39 exhibited tolerance to acidic pH, showed antimicrobial activity and displayed strong cell surface traits such as hydrophobicity and autoaggregation capacity. The strain was able to tolerate bile salts and showed bile salt hydrolytic (BSH) activity, exopolysaccharide production and adherence to the human HT-29 cell line. Importantly, partial haemolytic activity was detected and the strain was susceptible to human serum. Genomic investigation of strain 17OM39 revealed the presence of diverse genes encoding proteolytic enzymes, stress response systems and the ability to produce essential amino acids, vitamins and the antimicrobial compound Bacteriocin-A. No virulence factors or plasmids were found in the genome of strain 17OM39. Collectively, these physiological and genomic features of 17OM39 confirm the potential of this strain as a candidate probiotic.

14.
This report describes the 17th Chromosome-Centric Human Proteome Project symposium, which was held in Tehran, Iran, on April 27 and 28, 2017. A brief summary of the symposium's talks is presented, covering new technical and computational approaches for the identification of novel proteins from non-coding genomic regions, physicochemical and biological causes of missing proteins, and the close interactions between the Chromosome- and Biology/Disease-driven Human Proteome Projects. Also discussed is a synopsis of decisions made on prospective programs to maintain collaborative work, share resources and information, and establish a newly organized working group, the task force for missing protein analysis.

15.
It is well known that word frequencies arrange themselves according to Zipf's law. However, little is known about the dependency between the parameters of the law and the complexity of a communication system. Many models of the evolution of language assume that the exponent of the law remains constant as the complexity of a communication system increases. Using longitudinal studies of child language, we analysed the word rank distribution for the speech of children and adults participating in conversations. The adults typically included family members (e.g., parents) or the investigators conducting the research. Our analysis of the evolution of Zipf's law yields two main unexpected results. First, in children the exponent of the law tends to decrease over time while this tendency is weaker in adults, thus suggesting this is not a mere mirror effect of adult speech. Second, although the exponent of the law is more stable in adults, their exponents fall below 1, which is the typical value of the exponent assumed in both children and adults. Our analysis also shows a tendency of the mean length of utterances (MLU), a simple estimate of syntactic complexity, to increase as the exponent decreases. The parallel evolution of the exponent and a simple indicator of syntactic complexity (MLU) supports the hypothesis that the exponent of Zipf's law and linguistic complexity are inter-related. The assumption that Zipf's law for word ranks is a power-law with a constant exponent of one in both adults and children needs to be revised.
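The two quantities tracked in this abstract, the Zipf exponent and the MLU, can be illustrated with a rough Python sketch. Both functions are simplifications introduced here: the exponent is estimated by an ordinary least-squares fit on log-log axes, which is a crude estimator and not necessarily the one the authors used, and MLU is counted in words per utterance, whereas it is often measured in morphemes.

```python
import math
from collections import Counter


def zipf_exponent(words: list) -> float:
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log axes."""
    freqs = sorted(Counter(words).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope is negative, so alpha is its negation


def mean_length_of_utterance(utterances: list) -> float:
    """Mean number of words per utterance (a word-based simplification)."""
    lengths = [len(u.split()) for u in utterances]
    return sum(lengths) / len(lengths)


# Toy transcript, for illustration only.
sample = ["more juice", "I want more juice", "more"]
print(zipf_exponent(" ".join(sample).split()))
print(mean_length_of_utterance(sample))
```

Applying both functions to successive recording sessions of the same speaker would give the kind of paired time series (exponent and MLU) whose joint trend the abstract describes.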

16.
The olfactory system of the pigeon (Columba livia) was examined. Our electrophysiological and experimental neuroanatomical (Fink-Heimer technique) data showed that axons from the olfactory bulb terminated in both sides of the forebrain. The cortex prepiriformis (olfactory cortex), the hyperstriatum ventrale and the lobus parolfactorius comprised the uncrossed terminal field. The crossed field included the paleostriatum primitivum and the caudal portion of the lobus parolfactorius, areas which were reached through the anterior commissure. In this report the relationships between areas that receive olfactory information and the possible roles that olfaction plays in the birds' behavior are discussed.

17.
Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.

18.
The word tradition has a very specific meaning in linguistics: the passing down of a text, which may have been completed or corrected by different copyists at different times, when the concept of authorship was not the same as it is today. When reading an ancient text the word tradition must be in the reader's mind. To discuss one of the problems an ancient text poses to its modern readers, this work deals with one of the first printed medical texts in Portuguese, the Regimento proueytoso contra ha pestenença, and draws a parallel between it and two related texts, A moche profitable treatise against the pestilence, and the Recopilaçam das cousas que conuem guardar se no modo de preseruar à Cidade de Lixboa E os sãos, & curar os que esteuerem enfermos de Peste. The problems which arise out of the textual structure of those books show how difficult it is to establish a tradition of another type, the medical tradition. The linguistic study of the innumerable medieval plague treatises may throw light on the continuities and on the disruptions of the so-called Hippocratic-Galenic medical tradition.

19.

Background

The journal Impact factor (IF) is generally accepted to be a good measurement of the relevance/quality of articles that a journal publishes. In spite of an, apparently, homogenous peer-review process for a given journal, we hypothesize that the country affiliation of authors from developing Latin American (LA) countries affects the IF of a journal detrimentally.

Methodology/Principal Findings

Seven prestigious international journals, one multidisciplinary journal and six serving specific branches of science, were examined in terms of their IF in the Web of Science. Two subsets of each journal were then selected to evaluate the influence of authors' affiliation on the IF. They comprised contributions (i) with authorship from four Latin American (LA) countries (Argentina, Brazil, Chile and Mexico) and (ii) with authorship from five developed countries (England, France, Germany, Japan and USA). Both subsets were further subdivided into two groups: articles with authorship from one country only and collaborative articles with authorship from other countries. Articles from the five developed countries had IF close to the overall IF of the journals and the influence of collaboration on this value was minor. In the case of LA articles the effect of collaboration (virtually all with developed countries) was significant. The IFs for non-collaborative articles averaged 66% of the overall IF of the journals whereas the articles in collaboration raised the IFs to values close to the overall IF.

Conclusion/Significance

The study shows a significantly lower IF in the group of the subsets of non-collaborative LA articles and thus that country affiliation of authors from non-developed LA countries does affect the IF of a journal detrimentally. There are no data to indicate whether the lower IFs of LA articles were due to their inherent inferior quality/relevance or psycho-social trend towards under-citation of articles from these countries. However, further study is required since there are foreseeable consequences of this trend as it may stimulate strategies by editors to turn down articles that tend to be under-cited.

20.
Traditionally, language processing has been attributed to a separate system in the brain, which supposedly works in an abstract propositional manner. However, there is increasing evidence suggesting that language processing is strongly interrelated with sensorimotor processing. Evidence for such an interrelation is typically drawn from interactions between language and perception or action. In the current study, the effect of words that refer to entities in the world with a typical location (e.g., sun, worm) on the planning of saccadic eye movements was investigated. Participants had to perform a lexical decision task on visually presented words and non-words. They responded by moving their eyes to a target in an upper (lower) screen position for a word (non-word) or vice versa. Eye movements were faster to locations compatible with the word's referent in the real world. These results provide evidence for the importance of linguistic stimuli in directing eye movements, even if the words do not directly transfer directional information.
