Similar literature
1.
Predicting what items will be selected by a target user in the future is an important function for recommendation systems. Matrix factorization techniques have been shown to achieve good performance on temporal rating-type data, but little is known about temporal item selection data. In this paper, we developed a unified model that combines Multi-task Non-negative Matrix Factorization and Linear Dynamical Systems to capture the evolution of user preferences. Specifically, user and item features are projected into latent factor space by factoring co-occurrence matrices into a common basis item-factor matrix and multiple factor-user matrices. Moreover, we represented the relationships both within and between the multiple factor-user matrices using a state transition matrix to capture the changes in user preferences over time. The experiments show that our proposed algorithm outperforms the other algorithms on two real datasets, which were extracted from Netflix movies and Last.fm music. Furthermore, our model provides a novel dynamic topic model for tracking the evolution of the behavior of a user over time.

2.
In this paper we take advantage of recent developments in identifying the demographic characteristics of Twitter users to explore the demographic differences between those who do and do not enable location services and those who do and do not geotag their tweets. We discuss the collation and processing of two datasets, one focusing on enabling geoservices and the other on tweet geotagging. We then investigate how opting in to either of these behaviours is associated with gender, age, class, the language in which tweets are written and the language in which users interact with the Twitter user interface. We find statistically significant differences for both behaviours for all demographic characteristics, although the magnitude of association differs substantially by factor. We conclude that there are significant demographic variations between those who opt in to geoservices and those who geotag their tweets. Notwithstanding the limitations of the data, we suggest that Twitter users who publish geographical information are not representative of the wider Twitter population.

3.
In the Twitter blogosphere, the number of followers is probably the most basic and succinct quantity for measuring popularity of users. However, the number of followers can be manipulated in various ways; we can even buy follows. Therefore, alternative popularity measures for Twitter users on the basis of, for example, users' tweets and retweets, have been developed. In the present work, we take a purely network approach to this fundamental question. First, we find that two relatively distinct types of users possessing a large number of followers exist, in particular for Japanese, Russian, and Korean users among the seven language groups that we examined. A first type of user follows a small number of other users. A second type of user follows approximately the same number of other users as the number of follows that the user receives. Then, we compare local (i.e., egocentric) followership networks around the two types of users with many followers. We show that the second type, which presumably consists of uninfluential users despite their large number of followers, is characterized by high link reciprocity, a large number of friends (i.e., those whom a user follows) for the followers, followers' high link reciprocity, large clustering coefficient, large fraction of the second type of users among the followers, and a small PageRank. Our network-based results support the view that the number of followers used alone is a misleading measure of a user's popularity. We propose that the number of friends, which is simple to measure, also helps us to assess the popularity of Twitter users.

4.
Twitter is a free social networking and micro-blogging service that enables its millions of users to send and read each other's "tweets," or short, 140-character messages. The service has more than 190 million registered users and processes about 55 million tweets per day. Useful information about news and geopolitical events lies embedded in the Twitter stream, which embodies, in the aggregate, Twitter users' perspectives and reactions to current events. By virtue of sheer volume, content embedded in the Twitter stream may be useful for tracking or even forecasting behavior if it can be extracted in an efficient manner. In this study, we examine the use of information embedded in the Twitter stream to (1) track rapidly-evolving public sentiment with respect to H1N1 or swine flu, and (2) track and measure actual disease activity. We also show that Twitter can be used as a measure of public interest or concern about health-related events. Our results show that estimates of influenza-like illness derived from Twitter chatter accurately track reported disease levels.

5.

Objective

This study explores the presence and actions of an electronic cigarette (e-cigarette) brand, Blu, on Twitter to observe how marketing messages are sent and diffused through the retweet (i.e., message forwarding) functionality. Retweet networks enable messages to reach additional Twitter users beyond the sender’s local network. We follow messages from their origin through multiple retweets to identify which messages have more reach, and the different users who are exposed.

Methods

We collected three months of publicly available data from Twitter. A combination of social network analysis and content analysis techniques was applied to determine the various networks of users who are exposed to e-cigarette messages and how the retweet network can affect which messages spread.

Results

The Blu retweet network expanded during the study period. Analysis of user profiles combined with network cluster analysis showed that messages of certain topics were only circulated within a community of e-cigarette supporters, while other topics spread further, reaching more general Twitter users who may not support or use e-cigarettes.

Conclusions

Retweet networks can serve as proxy filters for marketing messages, as Twitter users decide which messages they will continue to diffuse among their followers. As certain e-cigarette messages extend beyond their point of origin, the audience being exposed expands beyond the e-cigarette community. Potential implications for health education campaigns include utilizing Twitter and targeting important gatekeepers or hubs that would maximize message diffusion.

6.
Sports fans are able to watch games from many locations using TV services while interacting with other fans online. In this paper, we identify the factors that affect sports viewers’ online interactions. Using a large-scale dataset of more than 25 million chat messages from a popular social TV site for baseball, we extract various game-related factors, and investigate the relationships between these factors and fans’ interactions using a series of multiple regression analyses. As a result, we identify several factors that are significantly related to viewer interactions. In addition, we determine that the influence of these factors varies according to the user group; i.e., active vs. less active users, and loyal vs. non-loyal users.

7.
Social networking services (e.g., Twitter, Facebook) are now major sources of World Wide Web (called “Web”) dynamics, together with Web search services (e.g., Google). These two types of Web services mutually influence each other but generate different dynamics. In this paper, we distinguish two modes of Web dynamics: the reactive mode and the default mode. It is assumed that Twitter messages (called “tweets”) and Google search queries react to significant social movements and events, but they also demonstrate signs of becoming self-activated, thereby forming a baseline Web activity. We define the former as the reactive mode and the latter as the default mode of the Web. In this paper, we investigate these reactive and default modes of the Web's dynamics using transfer entropy (TE). The amount of information transferred between a time series of 1,000 frequent keywords in Twitter and the same keywords in Google queries is investigated across an 11-month time period. Study of the information flow on Google and Twitter revealed that information is generally transferred from Twitter to Google, indicating that Twitter time series carry some information that precedes the Google time series. We also studied the information flow among different Twitter keyword time series by taking keywords as nodes and flow directions as edges of a network. An analysis of this network revealed that frequent keywords tend to become information sources and infrequent keywords tend to become sinks for other keywords. Based on these findings, we hypothesize that frequent keywords form the Web's default mode, which becomes an information source for infrequent keywords that generally form the Web's reactive mode. We also found that the Web consists of different time resolutions with respect to TE among Twitter keywords, which will be another focal point of this paper.
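As an illustration of the transfer entropy measure named in this abstract, the following is a minimal sketch for two discrete time series. It assumes a history length of one step and symbol-valued (for example binarized) series; the abstract does not state how the keyword series were discretized or which history length was used, so both choices, and the function name `transfer_entropy`, are hypothetical.

```python
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Estimate TE(source -> target) in bits with history length 1 (assumed).

    TE = sum p(x_next, x_prev, y_prev)
             * log2( p(x_next | x_prev, y_prev) / p(x_next | x_prev) )
    where x is the target series and y is the source series.
    """
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("series must have equal length of at least 2")

    x_next, x_prev, y_prev = target[1:], target[:-1], source[:-1]
    n = len(x_next)

    # Plug-in (frequency count) estimates of the required distributions.
    triples = Counter(zip(x_next, x_prev, y_prev))
    prev_pairs = Counter(zip(x_prev, y_prev))
    self_pairs = Counter(zip(x_next, x_prev))
    prev_single = Counter(x_prev)

    te = 0.0
    for (xn, xp, yp), count in triples.items():
        p_joint = count / n
        p_given_both = count / prev_pairs[(xp, yp)]
        p_given_self = self_pairs[(xn, xp)] / prev_single[xp]
        te += p_joint * log2(p_given_both / p_given_self)
    return te
```

If series `x` simply repeats series `y` with a one-step delay, `transfer_entropy(y, x)` should be large while `transfer_entropy(x, y)` stays near zero, which is the kind of asymmetry the abstract reads as a direction of information flow.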

8.

Background  

Nonnegative Matrix Factorization (NMF) is an unsupervised learning technique that has been applied successfully in several fields, including signal processing, face recognition and text mining. Recent applications of NMF in bioinformatics have demonstrated its ability to extract meaningful information from high-dimensional data such as gene expression microarrays. Developments in NMF theory and applications have resulted in a variety of algorithms and methods. However, most NMF implementations have been on commercial platforms, while those that are freely available typically require programming skills. This limits their use by the wider research community.

9.
In this study, starting from human dental pulp cells cultured in vitro, we simulated reparative dentinogenesis using a medium supplemented with different odontogenic inductors. The differentiation of dental pulp cells into odontoblast-like cells was evaluated by means of staining, and ultramorphological, biochemical and biomolecular methods. Alizarin red staining showed mineral deposition while transmission electron microscopy revealed a synthesis of extracellular matrix fibers during the differentiation process. Biochemical assays demonstrated that the differentiated phenotype expressed odontoblast markers, such as Dentin Matrix Protein 1 (DMP1) and Dentin Sialoprotein (DSP), as well as type I collagen. Quantitative data regarding the mRNA expression of DMP1, DSP and type I collagen were obtained by Real Time PCR. Immunofluorescence data demonstrated the various localizations of DSP and DMP1 during odontoblast differentiation. Based on our results, we obtained odontoblast-like cells which simulated the reparative dentin processes in order to better investigate the mechanism of odontoblast differentiation, and dentin extracellular matrix deposition and mineralization.
Key words: dental tissue, in vitro differentiation, DMP1, DSP, type I collagen

10.
Matrix correlation represents an innovative methodology to evaluate the explanatory power of several hypotheses by measuring their correspondence with observed morphological variation. In this paper, we view the origins of Patagonians from a matrix correlation approach. Personal and published data on nonmetric cranial traits were used to estimate a biological distance matrix involving five major groups from Patagonia and two from the northwest and northeast regions of Argentina. To evaluate correspondence with other important factors, we used a geographic distance matrix and four design matrices, representing several patterns of settlement and differentiation. Biological distance was found to be strongly associated with spatial separation; the correlation between geography and nonmetric cranial distances was highly significant. When geographic distance is held constant, correlation between a model representing high levels of heterogeneity between the samples and morphological (nonmetric) variation becomes highly significant.

11.
miRNAs are small non-coding RNAs that are involved in a number of complex biological processes. Numerous studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). In order to effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were carried out to demonstrate the prediction accuracy of SCMFMDA. Experimental reports confirming miRNAs among the top 50 predictions indicated that SCMFMDA was effective for predicting relationships between miRNAs and diseases.

12.
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices to choose from, and in the literature it is sometimes recommended to try several scoring matrices on the sequences of interest. The significance of an alignment is usually assessed by looking at E-values and p-values. While sequence lengths and database sizes enter the standard calculations of significance, it is much less common to take the use of several scoring matrices on the same sequences into account. Altschul proposed corrections of the p-value that account for the simultaneous use of an infinite number of PAM matrices. Here we consider the more realistic situation where the user may choose from a finite set of popular PAM and BLOSUM matrices, in particular the ones available in BLAST. It turns out that the significance of a result can be considerably overestimated if a set of substitution matrices is used in an alignment problem and the most significant alignment is then quoted. RESULTS: Based on extensive simulations, we study the multiple testing problem that occurs when several scoring matrices for local sequence alignment are used. We consider a simple Bonferroni correction of the p-values and investigate its accuracy. Finally, we propose a more accurate correction based on extreme value distributions fitted to the maximum of the normalized scores obtained from different scoring matrices. For various sets of matrices we provide correction factors which can be easily applied to adjust p- and E-values reported by software packages.
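The simple Bonferroni correction mentioned in this abstract can be sketched as follows. This is only the textbook correction (multiply the smallest p-value by the number of scoring matrices tried and cap the result at 1), not the more accurate extreme-value-based correction the authors propose; the function name `bonferroni_best` and the example p-values are hypothetical.

```python
def bonferroni_best(p_values):
    """Bonferroni-adjust the most significant of several p-values.

    p_values holds one p-value per scoring matrix tried on the same pair of
    sequences. Quoting only the smallest one overstates significance, so it
    is multiplied by the number of matrices and capped at 1.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    return min(1.0, min(p_values) * len(p_values))


# Hypothetical p-values from three different scoring matrices.
adjusted = bonferroni_best([0.004, 0.02, 0.01])
```

Because Bonferroni assumes nothing about how the tests depend on each other, it tends to be conservative when the scores from different matrices are strongly correlated, which is consistent with the abstract's interest in a more accurate correction.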

13.
Cloud computing is an emerging computing paradigm in which IT resources and capacities are provided as services over the Internet. Promising as it is, this paradigm also brings forth new challenges for security when users want to securely outsource the computation of cryptographic operations to untrusted cloud servers. Modular exponentiation is one of the basic operations in most current cryptosystems. In this paper, we present generic secure outsourcing schemes that enable users to securely outsource the computation of exponentiations to untrusted cloud servers. With our techniques, a batch of exponentiations (e.g. t exponentiations) can be efficiently computed by the user with only O(n+t) multiplications, where n is the number of bits of the exponent. Compared with the state-of-the-art algorithm, the proposed schemes are superior in both efficiency and verifiability. Furthermore, no complicated pre-computations are required on the user side. Finally, the schemes are proved to be secure under the Subset Sum Problem.

14.
In this paper, we conduct a study about differences between female and male discursive strategies when posting in the microblogging service Twitter, with a particular focus on the hashtag designation process during political debate. The fact that men and women use language in distinct ways, reverberating practices linked to their expected roles in the social groups, is a linguistic phenomenon known to happen in several cultures and that can now be studied on the Web and on online social networks on a large scale enabled by computing power. Here, for instance, after analyzing tweets with political content posted during the Brazilian presidential campaign, we found that male Twitter users, when expressing their attitude toward a given candidate, are more prone to use imperative verbal forms in hashtags, while female users tend to employ declarative forms. This difference can be interpreted as a sign of distinct approaches in relation to other network members: for example, if political hashtags are seen as strategies of persuasion in Twitter, imperative tags could be understood as more overt ways of persuading and declarative tags as more indirect ones. Our findings help to understand human gendered behavior in social networks and contribute to research on the new fields of computer-enabled Internet linguistics and social computing, besides being useful for several computational tasks such as developing tag recommendation systems based on users' collective preferences and tailoring targeted advertising strategies, among others.

15.
In September 2013 the Intergovernmental Panel on Climate Change published its Working Group 1 report, the first comprehensive assessment of physical climate science in six years, constituting a critical event in the societal debate about climate change. This paper analyses the nature of this debate in one public forum: Twitter. Using statistical methods, tweets were analyzed to discover the hashtags used when people tweeted about the IPCC report, and how Twitter users formed communities around their conversational connections. In short, the paper presents the topics and tweeters at this particular moment in the climate debate. The most used hashtags related to themes of science, geographical location and social issues connected to climate change. Particularly noteworthy were tweets connected to Australian politics, US politics, geoengineering and fracking. Three communities of Twitter users were identified. Researcher coding of Twitter users showed how these varied according to geographical location and whether users were supportive, unsupportive or neutral in their tweets about the IPCC. Overall, users were most likely to converse with users holding similar views. However, qualitative analysis suggested the emergence of a community of Twitter users, predominantly based in the UK, where greater interaction between contrasting views took place. This analysis also illustrated the presence of a campaign by the non-governmental organization Avaaz, aimed at increasing media coverage of the IPCC report.

16.
Matrix population models are among the most common mathematical models in ecology; they describe the dynamics of stage-structured populations and provide many population statistics. One of these statistics, the elasticity of the population growth rate, is frequently used and represents the relative impact of life history parameters on the population growth rate. Due to the utility of elasticities for cross-taxonomic comparisons, Silvertown and his coauthors have published multiple papers and reported the relationship between elasticities and life forms (or life history) in multiple plant species, using a triangle map (called “ternary plot”). To understand why their elasticities are located in specific regions of the ternary plot, we constructed four archetypes of population matrices, from which we simulated 24,000 randomly generated population matrices and obtained the consequent elasticities. We found a large discrepancy when comparing our results to those in Silvertown et al.'s study (Conserv Biol 10:591–597, 1996): for our simulated matrices where rapid transitions were not allowed (e.g., trees), the elasticity distribution resulted in a line across the ternary plot. We provided the mathematical proof for this result, and found that its slope depends on matrix dimension. We also used 1230 matrices from the COMPADRE Plant Matrix Database and calculated the elasticities. Our simulated results were validated with field data from COMPADRE: two straight lines appeared in the ternary plot. Furthermore, we answered several questions, such as, “Is there any special elasticity distribution in matrices with high population growth rates?” and “Why are the elasticities of natural populations concentrated in the upper half of the ternary plot?”.
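For readers unfamiliar with elasticities, the sketch below computes them for a small stage-structured matrix using the standard textbook definition e_ij = (a_ij / λ) · v_i · w_j / ⟨v, w⟩, where λ is the dominant eigenvalue (the population growth rate) and w and v are the corresponding right and left eigenvectors. It assumes this is the definition the abstract refers to, uses plain power iteration (which requires a primitive nonnegative matrix), and the 3-stage example matrix is invented for illustration.

```python
def dominant_eigen(a, n_iter=2000):
    """Power iteration: dominant eigenvalue and right eigenvector of `a`.

    Assumes `a` is a primitive nonnegative square matrix (list of rows).
    The eigenvector is scaled so that its entries sum to 1.
    """
    n = len(a)
    x = [1.0 / n] * n
    lam = 0.0
    for _ in range(n_iter):
        y = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(y)  # x sums to 1, so sum(a @ x) approaches the eigenvalue
        x = [yi / lam for yi in y]
    return lam, x


def elasticities(a):
    """Elasticity matrix of the growth rate with respect to each entry of `a`."""
    n = len(a)
    lam, w = dominant_eigen(a)                      # stable stage distribution
    a_t = [[a[j][i] for j in range(n)] for i in range(n)]
    _, v = dominant_eigen(a_t)                      # reproductive values
    scale = lam * sum(vi * wi for vi, wi in zip(v, w))
    return [[a[i][j] * v[i] * w[j] / scale for j in range(n)] for i in range(n)]


# Invented 3-stage matrix: top row is fecundity, other entries are
# survival, stasis and growth probabilities.
example = [
    [0.0, 1.5, 3.0],
    [0.5, 0.2, 0.0],
    [0.0, 0.4, 0.8],
]
e = elasticities(example)
```

The entries of `e` are nonnegative and sum to 1, which is what makes it possible to group them into three components and place each matrix as a point on a ternary plot.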

17.
As an archive of sequence data for over 165,000 species, GenBank is an indispensable resource for phylogenetic inference. Here we describe an informatics processing pipeline and online database, the PhyLoTA Browser (http://loco.biosci.arizona.edu/pb), which offers a view of GenBank tailored for molecular phylogenetics. The first release of the Browser is computed from 2.6 million sequences representing the taxonomically enriched subset of GenBank sequences for eukaryotes (excluding most genome survey sequences, ESTs, and other high-throughput data). In addition to summarizing sequence diversity and species diversity across nodes in the NCBI taxonomy, it reports 87,000 potentially phylogenetically informative clusters of homologous sequences, which can be viewed or downloaded, along with provisional alignments and coarse phylogenetic trees. At each node in the NCBI hierarchy, the user can display a "data availability matrix" of all available sequences for entries in a subtaxa-by-clusters matrix. This matrix provides a guidepost for subsequent assembly of multigene data sets or supertrees. The database allows for comparison of results from previous GenBank releases, highlighting recent additions of either sequences or taxa to GenBank and letting investigators track progress on data availability worldwide. Although the reported alignments and trees are extremely approximate, the database reports several statistics correlated with alignment quality to help users choose from alternative data sources.

18.
In the last decade, advances in high-throughput technologies such as DNA microarrays have made it possible to simultaneously measure the expression levels of tens of thousands of genes and proteins. This has resulted in large amounts of biological data requiring analysis and interpretation. Nonnegative matrix factorization (NMF) was introduced as an unsupervised, parts-based learning paradigm involving the decomposition of a nonnegative matrix V into two nonnegative matrices, W and H, via a multiplicative updates algorithm. In the context of a p × n gene expression matrix V consisting of observations on p genes from n samples, each column of W defines a metagene, and each column of H represents the metagene expression pattern of the corresponding sample. NMF has been primarily applied in an unsupervised setting in image and natural language processing. More recently, it has been successfully utilized in a variety of applications in computational biology. Examples include molecular pattern discovery, class comparison and prediction, cross-platform and cross-species analysis, functional characterization of genes and biomedical informatics. In this paper, we review this method as a data analytical and interpretive tool in computational biology with an emphasis on these applications.
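The decomposition V ≈ WH described in this abstract can be sketched with the classic multiplicative update rules. The sketch assumes the squared Frobenius error as the objective (the abstract does not say which objective is used), keeps W as p × k and H as k × n so that columns of W play the role of metagenes, and uses plain Python lists; the function names, the rank, the iteration count and the toy matrix are all illustrative choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for vr, wr in zip(v, wh) for x, y in zip(vr, wr))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative p x n matrix V into W (p x rank) and H (rank x n).

    Multiplicative updates for the Frobenius objective (assumed here):
        H <- H * (W^T V) / (W^T W H)
        W <- W * (V H^T) / (W H H^T)
    eps keeps the denominators strictly positive.
    """
    rng = random.Random(seed)
    p, n = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(p)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(p)]
    return w, h


# Toy 4 x 3 "expression" matrix (4 genes, 3 samples), invented for illustration.
toy = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
w, h = nmf(toy, rank=2)
```

Because each update only multiplies by a ratio of nonnegative quantities, W and H stay nonnegative throughout, which is the property behind the parts-based interpretation of metagenes.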

19.
Hypatia-trackRadar is a standalone Java application designed to help biologists extract and process bird movement data from marine surveillance radars. This application integrates simultaneous collection of radar data and field observations by allowing the user to link information gathered from visual observers (such as bird species and flock size) to the radar echoes. A virtual transparent sheet positioned on the radar screen allows the user to visually follow and track the echoes on the radar screen. The application translates the position of the echoes on the screen into a metric coordinate system. Based on the time and spatial position of the echoes, the software automatically calculates multiple flight parameters, such as ground speed, track length and duration. We validated Hypatia-trackRadar using an unmanned aerial vehicle. Here we present the features of this application software and its first use in a real case study in a raptor migration bottleneck.
