Similar literature
 20 similar articles found
1.
By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), as well as our system for filtering general Twitter data based on health keywords and supervised classification. We show how ATAM and other topic models can automatically infer health topics in 144 million Twitter messages from 2011 to 2013. ATAM discovered 13 coherent clusters of Twitter messages, some of which correlate with seasonal influenza (r = 0.689) and allergies (r = 0.810) temporal surveillance data, as well as exercise (r = 0.534) and obesity (r = −0.631) related geographic survey data in the United States. These results demonstrate that it is possible to automatically discover topics that attain statistically significant correlations with ground truth data, despite using minimal human supervision and no historical data to train the model, in contrast to prior work. Additionally, these results demonstrate that a single general-purpose model can identify many different health topics in social media.
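The correlations reported in entry 1 compare how often a topic appears in tweets with an external measurement such as surveillance data. The following is a minimal sketch of that comparison only, not the authors' pipeline; the function name `pearson_r` and the weekly numbers are hypothetical and chosen purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly data: share of tweets assigned to a topic, and a
# surveillance rate for the same weeks (made-up numbers, not from the paper).
topic_share = [0.010, 0.014, 0.021, 0.030, 0.026, 0.015]
surveillance_rate = [1.1, 1.6, 2.4, 3.2, 2.9, 1.5]
r = pearson_r(topic_share, surveillance_rate)
```

A value of `r` near 1 would indicate that the topic's prevalence rises and falls with the external series; significance testing is a separate step that this sketch leaves out.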

2.
Micro-blogging services, such as Twitter, offer opportunities to analyse user behaviour. Discovering and distinguishing behavioural patterns in micro-blogging services is valuable. However, it is difficult and challenging to distinguish users, and to track the temporal development of collective attention within distinct user groups in Twitter. In this paper, we formulate this problem as tracking matrices decomposed by Nonnegative Matrix Factorisation for time-sequential matrix data, and propose a novel extension of Nonnegative Matrix Factorisation, which we refer to as Time Evolving Nonnegative Matrix Factorisation (TENMF). In our method, we describe users and words posted in some time interval by a matrix, and use several matrices as time-sequential data. Subsequently, we apply Time Evolving Nonnegative Matrix Factorisation to these time-sequential matrices. TENMF can decompose time-sequential matrices, and can track the connection among decomposed matrices, whereas previous NMF decomposes a matrix into two lower dimension matrices arbitrarily, which might lose the time-sequential connection. Our proposed method has an adequately good performance on artificial data. Moreover, we present several results and insights from experiments using real data from Twitter.

3.
The pervasiveness of mobile devices, which is increasing daily, is generating a vast amount of geo-located data allowing us to gain further insights into human behaviors. In particular, this new technology enables users to communicate through mobile social media applications, such as Twitter, anytime and anywhere. Thus, geo-located tweets offer the possibility to carry out in-depth studies on human mobility. In this paper, we study the use of Twitter in transportation by identifying tweets posted from roads and rails in Europe between September 2012 and November 2013. We compute the percentage of highway and railway segments covered by tweets in 39 countries. The coverages are very different from country to country and their variability can be partially explained by differences in Twitter penetration rates. Still, some of these differences might be related to cultural factors regarding mobility habits and interacting socially online. Analyzing particular road sectors, our results show a positive correlation between the number of tweets on the road and the Average Annual Daily Traffic on highways in France and in the UK. Transport modality can be studied with these data as well, for which we discover very heterogeneous usage patterns across the continent.

4.
Social media are increasingly reflecting and influencing behavior of other complex systems. In this paper we investigate the relations between a well-known micro-blogging platform Twitter and financial markets. In particular, we consider, in a period of 15 months, the Twitter volume and sentiment about the 30 stock companies that form the Dow Jones Industrial Average (DJIA) index. We find a relatively low Pearson correlation and Granger causality between the corresponding time series over the entire time period. However, we find a significant dependence between the Twitter sentiment and abnormal returns during the peaks of Twitter volume. This is valid not only for the expected Twitter volume peaks (e.g., quarterly announcements), but also for peaks corresponding to less obvious events. We formalize the procedure by adapting the well-known “event study” from economics and finance to the analysis of Twitter data. The procedure makes it possible to automatically identify events as Twitter volume peaks, to compute the prevailing sentiment (positive or negative) expressed in tweets at these peaks, and finally to apply the “event study” methodology to relate them to stock returns. We show that sentiment polarity of Twitter peaks implies the direction of cumulative abnormal returns. The amount of cumulative abnormal returns is relatively low (about 1–2%), but the dependence is statistically significant for several days after the events.
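The event-study procedure in entry 4 has two mechanical steps that a short sketch can make concrete: flagging days with unusually high tweet volume, and accumulating abnormal returns after such a day. The code below is an illustration under stated assumptions, not the authors' method: the peak rule (trailing mean plus `k` standard deviations) and the market-adjusted definition of abnormal return (stock return minus market return) are both assumptions, and all function and parameter names are hypothetical.

```python
import statistics


def volume_peaks(volumes, window=5, k=2.0):
    """Indices where volume exceeds the trailing mean + k * trailing stdev.

    The thresholding rule is an assumption made for this sketch.
    """
    peaks = []
    for i in range(window, len(volumes)):
        past = volumes[i - window:i]
        threshold = statistics.mean(past) + k * statistics.pstdev(past)
        if volumes[i] > threshold:
            peaks.append(i)
    return peaks


def cumulative_abnormal_returns(stock_returns, market_returns):
    """Running sum of (stock - market) returns over an event window.

    Uses a market-adjusted abnormal return, which is an assumption here.
    """
    if len(stock_returns) != len(market_returns):
        raise ValueError("return series must have equal length")
    car, total = [], 0.0
    for stock, market in zip(stock_returns, market_returns):
        total += stock - market
        car.append(total)
    return car


# Hypothetical daily tweet counts with one obvious spike on day 5.
daily_volume = [10, 11, 9, 10, 10, 50, 10, 11]
events = volume_peaks(daily_volume)

# Hypothetical returns for the three days following an event.
car = cumulative_abnormal_returns([0.02, 0.01, -0.01], [0.01, 0.0, 0.0])
```

In a fuller analysis, the sign of the prevailing sentiment at each detected peak would then be compared with the sign of the resulting cumulative abnormal return.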

5.
6.
Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach.

7.

Objective

This study explores the presence and actions of an electronic cigarette (e-cigarette) brand, Blu, on Twitter to observe how marketing messages are sent and diffused through the retweet (i.e., message forwarding) functionality. Retweet networks enable messages to reach additional Twitter users beyond the sender’s local network. We follow messages from their origin through multiple retweets to identify which messages have more reach, and the different users who are exposed.

Methods

We collected three months of publicly available data from Twitter. A combination of techniques in social network analysis and content analysis was applied to determine the various networks of users who are exposed to e-cigarette messages and how the retweet network can affect which messages spread.

Results

The Blu retweet network expanded during the study period. Analysis of user profiles combined with network cluster analysis showed that messages of certain topics were only circulated within a community of e-cigarette supporters, while other topics spread further, reaching more general Twitter users who may not support or use e-cigarettes.

Conclusions

Retweet networks can serve as proxy filters for marketing messages, as Twitter users decide which messages they will continue to diffuse among their followers. As certain e-cigarette messages extend beyond their point of origin, the audience being exposed expands beyond the e-cigarette community. Potential implications for health education campaigns include utilizing Twitter and targeting important gatekeepers or hubs that would maximize message diffusion.

8.
Theodore D. Cosco 《CMAJ》2015,187(18):1353-1357

Background:

Twitter is an increasingly popular means of research dissemination. I sought to examine the relation between scientific merit and mainstream popularity of general medical journals.

Methods:

I extracted impact factors and citations for 2014 for all general medical journals listed in the Thomson Reuters InCites Journal Citation Reports. I collected Twitter statistics (number of followers, number following, number of tweets) between July 25 and 27, 2015 from the Twitter profiles of journals that had Twitter accounts. I calculated the ratio of observed to expected Twitter followers according to citations via the Kardashian Index. I created the (Fifty Shades of) Grey Scale to calculate the analogous ratio according to impact factor.

Results:

Only 28% (43/153) of journals had Twitter profiles. The scientific and social media impact of journals were correlated: in adjusted models, Twitter followers increased by 0.78% (95% confidence interval [CI] 0.38%–1.18%) for every 1% increase in impact factor and by 0.62% (95% CI 0.34%–0.90%) for every 1% increase in citations. Kardashian Index scores above the 99% CI were observed in 16% (7/43) of journals, including 6 of the 7 highest ranked journals by impact factor, whereas 58% (25/43) had scores below this interval. For the Grey Scale, 12% (5/43) of journals had scores above and 35% (15/43) had scores below the 99% CI.

Interpretation:

The size of a general medical journal’s Twitter following is strongly linked to its impact factor and citations, suggesting that higher quality research received more mainstream attention. Many journals have not capitalized on this dissemination method, although others have used it to their advantage.

Social media has reached near ubiquity; medical research, researchers and journals are no exception to its pervasiveness.17 One of the most popular social media platforms is Twitter, with an estimated 302 million monthly active users sending 500 million tweets per day.8 Twitter differs from other social media platforms in that posts are limited to 140 or fewer characters. With an emphasis on brevity, Twitter provides a unique opportunity for medical knowledge to be disseminated to the general public.9,10 However, people with questionable medical research pedigrees are the stars of Twitter.

With more than 65 million followers, Canadian @justinbieber is one of the most popular Twitter celebrities, narrowly edging out the most popular physician (@bengoldacre) by a margin of 64.5 million followers. Despite widespread distaste among non-Beliebers, it can be argued that The Biebs does display a modicum of musical talent (assuming multiplatinum albums and chart-topping singles11 are a proxy for talent). Conversely, the woefully popular @KimKardashian, who trumps top scientists by more than 33 million followers, falls into the “famous for being famous” trope, alongside fellow Glitterati Paris Hilton and Nicole Richie. The notion that people with dubious levels of talent and questionable means of attaining celebrity can become immensely popular is worrisome. This notion has sparked debate within the scientific community. Do these self-perpetuated self-promoters exist in academia? Are any scientists “renowned for being renowned”?12

There has been increasing use of alternative means of quantifying journals’ impact, notably using the Altmetric statistic, which conglomerates an article’s social media presence through blogs, news outlets, Facebook and Twitter. In response to the meteoric unmeritocratic rise of social media celebrities via Twitter, @neilhall_uk developed the playfully dubbed Kardashian Index (K-index) to address these issues in an academic context.12 The K-index quantifies the discrepancy between mainstream popularity and scientific merit by examining one’s social media profile in relation to one’s citations in peer reviewed works.

Continuing in this vein, I propose the (Fifty Shades of) Grey Scale for use with medical journals (in reference to the book, which has sold more than 125 million copies to date, despite being critically lambasted).13 Using a similar equation to the K-index, the Grey Scale calculates the ratio of the number of actual to expected followers using journal impact factor (rather than citations, as in the K-index) as the predictor variable. Journal impact factor and total citations are closely related. Impact factor is the ratio of total citations to the number of articles published by the journal, which adjusts for journals that have many more, or fewer, citable publications (e.g., weekly or bimonthly journals).14

Unpacking the mechanisms of Twitter celebrity is difficult. Personal Twitter profiles often include humour, wit and other attributes not normally attributed to the reporting of a new paper, as per general medical journal Tweets. By eliminating the individuality of the Tweet, looking only at medical journals’ Twitter profiles rather than individual researchers’, a more direct examination of the relation between Twitter celebrity and scientific merit is possible.

Although Tweets linking to papers have been associated with greater citations than non-Tweeted papers,15 whether or not this translates into greater Twitter followings for the authors and the journal in which the paper was published has yet to be explored. The relation between the number of Twitter followers and impact factor scores has recently been investigated in urology journals, where nonsignificant correlations between the number of Twitter followers and the impact factor of the journal were found.16

The current study seeks to examine whether scientific merit (captured by journal impact factor and citations) translates into Twitter celebrity (i.e., number of followers) in general medical journals.
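Both indices in entry 8 are ratios of observed to expected Twitter followers, with the expectation driven by citations (Kardashian Index) or impact factor (Grey Scale). The entry does not give the exact equations, so the sketch below assumes a simple log-log least-squares fit as the source of the expected value; this is an illustrative stand-in, not the published formula. The function names and the toy data are hypothetical.

```python
import math


def fit_log_log(predictor, followers):
    """Least-squares fit of log(followers) = a + b * log(predictor).

    The slope b can be read as the percentage change in followers per
    1% change in the predictor (citations or impact factor).
    """
    xs = [math.log(p) for p in predictor]
    ys = [math.log(f) for f in followers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def observed_to_expected(predictor_value, observed_followers, a, b):
    """Ratio of observed followers to the followers predicted by the fit."""
    expected = math.exp(a) * predictor_value ** b
    return observed_followers / expected


# Toy data lying exactly on followers = 100 * predictor (made-up numbers).
a, b = fit_log_log([1, 10, 100], [100, 1000, 10000])

# A journal with predictor value 10 and 2000 followers has twice the
# followers the fitted line predicts.
ratio = observed_to_expected(10, 2000, a, b)
```

A ratio above 1 marks a journal with more followers than its citations or impact factor would predict, and a ratio below 1 marks the opposite.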

9.
In this paper we take advantage of recent developments in identifying the demographic characteristics of Twitter users to explore the demographic differences between those who do and do not enable location services and those who do and do not geotag their tweets. We discuss the collation and processing of two datasets, one focusing on enabling geoservices and the other on tweet geotagging. We then investigate how opting in to either of these behaviours is associated with gender, age, class, the language in which tweets are written and the language in which users interact with the Twitter user interface. We find statistically significant differences for both behaviours for all demographic characteristics, although the magnitude of association differs substantially by factor. We conclude that there are significant demographic variations between those who opt in to geoservices and those who geotag their tweets. Notwithstanding the limitations of the data, we suggest that Twitter users who publish geographical information are not representative of the wider Twitter population.

10.
Previous research has shown that political leanings correlate with various psychological factors. While surveys and experiments provide a rich source of information for political psychology, data from social networks can offer more naturalistic and robust material for analysis. This research investigates psychological differences between individuals of different political orientations on a social networking platform, Twitter. Based on previous findings, we hypothesized that the language used by liberals emphasizes their perception of uniqueness, contains more swear words, more anxiety-related words and more feeling-related words than conservatives’ language. Conversely, we predicted that the language of conservatives emphasizes group membership and contains more references to achievement and religion than liberals’ language. We analysed Twitter timelines of 5,373 followers of three Twitter accounts of the American Democratic and 5,386 followers of three accounts of the Republican parties’ Congressional Organizations. The results support most of the predictions and previous findings, confirming that Twitter behaviour offers valid insights into offline behaviour.

11.
The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results show that the methods presented are able to successfully identify a target audience with high accuracy. In addition, we show that using a statistical inference approach such as bootstrapping in over-sampling, instead of using random sampling, to construct training datasets can achieve a better classifier in an SVM ensemble. We conclude that such an ensemble system can take advantage of data diversity, which enables real-world applications for differentiating prospective customers from the general audience, leading to business advantage in the crowded social media space.

12.
Twitter is a major social media platform in which users send and read messages (“tweets”) of up to 140 characters. In recent years this communication medium has been used by those affected by crises to organize demonstrations or find relief. Because traffic on this media platform is extremely heavy, with hundreds of millions of tweets sent every day, it is difficult to differentiate between times of turmoil and times of typical discussion. In this work we present a new approach to addressing this problem. We first assess several possible “thermostats” of activity on social media for their effectiveness in finding important time periods. We compare methods commonly found in the literature with a method from economics. By combining methods from computational social science with methods from economics, we introduce an approach that can effectively locate crisis events in the mountains of data generated on Twitter. We demonstrate the strength of this method by using it to locate the social events relating to the Occupy Wall Street movement protests at the end of 2011.

13.
Online social media such as Twitter are widely used for mining public opinions and sentiments on various issues and topics. The sheer volume of the data generated and the eager adoption by the online-savvy public are helping to raise the profile of online media as a convenient source of news and public opinions on social and political issues as well. Due to the uncontrollable biases in the population who heavily use the media, however, it is often difficult to measure how accurately the online sphere reflects the offline world at large, undermining the usefulness of online media. One way of identifying and overcoming the online–offline discrepancies is to apply a common analytical and modeling framework to comparable data sets from online and offline sources and to cross-analyze the patterns found therein. In this paper we study the political spectra constructed from Twitter and from legislators' voting records as an example to demonstrate the potential limits of online media as the source for accurate public opinion mining, and how to overcome the limits by using offline data simultaneously.

14.
Social media like blogs, micro-blogs or social networks are increasingly being investigated and employed to detect and predict trends for not only social and physical phenomena, but also to capture environmental information. Here we argue that opportunistic biodiversity observations published through Twitter represent one promising and until now unexplored example of such data mining. As we elaborate, it can contribute real-time information to traditional ecological monitoring programmes including those sourced via citizen science activities. Using Twitter data collected for a generic assessment of social media data in ecological monitoring we investigated a sample of what we denote biodiversity observations with species determination requests (N = 191). These entail images posted as messages on the micro-blog service Twitter. As we show, these frequently trigger conversations leading to taxonomic determinations of those observations. All analysed Tweets were posted with species determination requests, which generated replies for 64% of Tweets, 86% of those contained at least one suggested determination, of which 76% were assessed as correct. All posted observations included or linked to images with the overall image quality categorised as satisfactory or better for 81% of the sample and leading to taxonomic determinations at the species level in 71% of provided determinations. We claim that the original message authors and conversation participants can be viewed as implicit or embryonic citizen science communities which have valuable contributions to offer both as an opportunistic data source in ecological monitoring and as potential active contributors to citizen science programmes.

15.
Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other's "tweets," or short, 140-character messages. The service has more than 190 million registered users and processes about 55 million tweets per day. Useful information about news and geopolitical events lies embedded in the Twitter stream, which embodies, in the aggregate, Twitter users' perspectives and reactions to current events. By virtue of sheer volume, content embedded in the Twitter stream may be useful for tracking or even forecasting behavior if it can be extracted in an efficient manner. In this study, we examine the use of information embedded in the Twitter stream to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity. We also show that Twitter can be used as a measure of public interest or concern about health-related events. Our results show that estimates of influenza-like illness derived from Twitter chatter accurately track reported disease levels.

16.
Social networking services (e.g., Twitter, Facebook) are now major sources of World Wide Web (called “Web”) dynamics, together with Web search services (e.g., Google). These two types of Web services mutually influence each other but generate different dynamics. In this paper, we distinguish two modes of Web dynamics: the reactive mode and the default mode. It is assumed that Twitter messages (called “tweets”) and Google search queries react to significant social movements and events, but they also demonstrate signs of becoming self-activated, thereby forming a baseline Web activity. We define the former as the reactive mode and the latter as the default mode of the Web. In this paper, we investigate these reactive and default modes of the Web's dynamics using transfer entropy (TE). The amount of information transferred between a time series of 1,000 frequent keywords in Twitter and the same keywords in Google queries is investigated across an 11-month time period. Study of the information flow on Google and Twitter revealed that information is generally transferred from Twitter to Google, indicating that Twitter time series have some preceding information about Google time series. We also studied the information flow among different Twitter keywords time series by taking keywords as nodes and flow directions as edges of a network. An analysis of this network revealed that frequent keywords tend to become an information source and infrequent keywords tend to become sink for other keywords. Based on these findings, we hypothesize that frequent keywords form the Web's default mode, which becomes an information source for infrequent keywords that generally form the Web's reactive mode. We also found that the Web consists of different time resolutions with respect to TE among Twitter keywords, which will be another focal point of this paper.
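Transfer entropy, the measure used in entry 16, quantifies how much knowing the past of one series reduces uncertainty about the next value of another, beyond what the second series' own past already explains. The sketch below is a plug-in estimate for discrete series with a history length of one step. The history length, the discretisation into 0/1 symbols, and the function name are assumptions made for illustration, and the entry does not state which estimator the authors used.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in transfer entropy (in bits) from source to target.

    Both inputs are equal-length sequences of discrete symbols; the
    estimate conditions on one past step of each series.
    """
    if len(source) != len(target) or len(source) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(target) - 1
    triples = Counter()   # (y_next, y_now, x_now)
    pairs_yx = Counter()  # (y_now, x_now)
    pairs_yy = Counter()  # (y_next, y_now)
    singles = Counter()   # y_now
    for t in range(n):
        y_next, y_now, x_now = target[t + 1], target[t], source[t]
        triples[(y_next, y_now, x_now)] += 1
        pairs_yx[(y_now, x_now)] += 1
        pairs_yy[(y_next, y_now)] += 1
        singles[y_now] += 1
    te = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / pairs_yx[(y_now, x_now)]
        p_given_self = pairs_yy[(y_next, y_now)] / singles[y_now]
        te += p_joint * math.log2(p_given_both / p_given_self)
    return te


# Hypothetical binary series (e.g. keyword frequency up = 1, down = 0),
# where the second series repeats the first with a one-step delay.
leader = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
follower = [0] + leader[:-1]
te_forward = transfer_entropy(leader, follower)
```

Because `follower` is fully determined by the previous value of `leader`, `te_forward` is positive; a constant source series carries no information and yields zero.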

17.
Despite the potential of social media for environmental monitoring, concerns remain about the quality and reliability of the information automatically extracted. Notably there are many observations of wildlife on Twitter, but their automated detection is a challenge due to the frequent use of wildlife related words in messages that have no connection with wildlife observation. We investigate whether and what type of supervised machine learning methods can be used to create a fully automated text classification model to identify genuine wildlife observations on Twitter, irrespective of species type or whether Tweets are geo-tagged. We perform experiments with various techniques for building feature vectors that serve as input to the classifiers, and consider how they affect classification performance. We compare three classification approaches and perform an analysis of the types of features that are indicative for genuine wildlife observations on Twitter. In particular, we compare some classical machine learning algorithms, widely used in ecology studies, with state-of-the-art neural network models. Results showed that the neural network-based model Bidirectional Encoder Representations from Transformers (BERT) outperformed the classical methods. Notably this was the case for a relatively small training corpus, consisting of fewer than 3000 instances. This reflects the fact that the BERT classifier uses a transfer learning approach that benefits from prior learning on a very much larger collection of generic text. BERT performed particularly well even for Tweets that employed specialised language relating to wildlife observations. The analysis of possible indicative features for wildlife Tweets revealed interesting trends in the usage of hashtags that are unrelated to official citizen science campaigns. The findings from this study facilitate more accurate identification of wildlife-related data on social media which can in turn be used for enriching citizen science data collections.

18.
19.
Chew C  Eysenbach G 《PloS one》2010,5(11):e14118

Background

Surveys are popular methods to measure public perceptions in emergencies but can be costly and time consuming. We suggest and evaluate a complementary “infoveillance” approach using Twitter during the 2009 H1N1 pandemic. Our study aimed to: 1) monitor the use of the terms “H1N1” versus “swine flu” over time; 2) conduct a content analysis of “tweets”; and 3) validate Twitter as a real-time content, sentiment, and public attention trend-tracking tool.

Methodology/Principal Findings

Between May 1 and December 31, 2009, we archived over 2 million Twitter posts containing keywords “swine flu,” “swineflu,” and/or “H1N1” using Infovigil, an infoveillance system. Tweets using “H1N1” increased from 8.8% to 40.5% (R² = .788; p < .001), indicating a gradual adoption of World Health Organization-recommended terminology. 5,395 tweets were randomly selected from 9 days, 4 weeks apart and coded using a tri-axial coding scheme. To track tweet content and to test the feasibility of automated coding, we created database queries for keywords and correlated these results with manual coding. Content analysis indicated resource-related posts were most commonly shared (52.6%). 4.5% of cases were identified as misinformation. News websites were the most popular sources (23.2%), while government and health agencies were linked only 1.5% of the time. 7/10 automated queries correlated with manual coding. Several Twitter activity peaks coincided with major news stories. Our results correlated well with H1N1 incidence data.

Conclusions

This study illustrates the potential of using social media to conduct “infodemiology” studies for public health. 2009 H1N1-related tweets were primarily used to disseminate information from credible sources, but were also a source of opinions and experiences. Tweets can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns.

20.
This research examines how information about an oil spill, its impacts, and the use of dispersants to treat the oil, moved through social media and the surrounding Internet during the 2010 BP Deepwater Horizon oil spill. Using a collection of tweets captured during the spill, we employ a mixed-method approach including an in-depth qualitative analysis to examine the content of Twitter posts, the connections that Twitter users made with each other, and the links between Twitter content and the surrounding Internet. This article offers a range of findings to help practitioners and others understand how social media is used by a variety of different actors during a slow-moving, long-term, environmental disaster. We enumerate some of the most salient themes in the Twitter data, noting that concerns about health impacts were more likely to be communicated in tweets about dispersant use than in the larger conversation. We describe the accounts and behaviors of highly retweeted Twitter users, noting how locals helped to shape the network and the conversation. Importantly, our results show the online crowd wanting to participate in and contribute to response efforts, a finding with implications for future oil spill response.
