Wikipedia is a huge global repository of human knowledge that can be leveraged to investigate how cultures are intertwined. With this aim, we apply Markov chain and Google matrix methods to the hyperlink networks of 24 Wikipedia language editions, and rank all their articles with the PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of person names, we obtain the top 100 historical figures for each edition and each algorithm. We investigate their spatial, temporal, and gender distributions depending on their cultural origins. Our study demonstrates not only a skew toward local figures, mainly recognized only in their own cultures, but also the existence of global historical figures appearing in a large number of editions. By determining the birth time and place of these persons, we analyze the evolution of such figures over 35 centuries of human history for each language, thus recovering interactions and entanglement of cultures over time. We also obtain the distributions of historical figures over world countries, highlighting geographical aspects of cross-cultural links. Treating historical figures who appear in multiple editions as interactions between cultures, we construct a network of cultures and identify the most influential cultures according to this network.
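The PageRank and CheiRank rankings mentioned above can be illustrated with power iteration on the Google matrix G = αS + (1 − α)/N. This is a minimal sketch under stated assumptions: a damping factor α = 0.85, articles numbered 0..n−1, and dangling articles (no out-links) that spread their weight uniformly. CheiRank is computed as PageRank on the network with every link reversed. 2DRank, which combines the two rankings, is not shown, and the function names are illustrative rather than taken from the study.

```python
def pagerank(edges, n, alpha=0.85, iters=200):
    """Power iteration on the Google matrix G = alpha*S + (1 - alpha)/n.

    edges: list of (src, dst) hyperlinks between article ids 0..n-1.
    Dangling articles (no out-links) spread their weight uniformly.
    """
    out_deg = [0] * n
    for src, _ in edges:
        out_deg[src] += 1
    p = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(p[i] for i in range(n) if out_deg[i] == 0)
        nxt = [0.0] * n
        for src, dst in edges:
            nxt[dst] += p[src] / out_deg[src]
        p = [alpha * (x + dangling / n) + (1 - alpha) / n for x in nxt]
    return p

def cheirank(edges, n, alpha=0.85, iters=200):
    """CheiRank: PageRank of the same network with every link reversed."""
    return pagerank([(dst, src) for src, dst in edges], n, alpha, iters)

# Toy hyperlink network (hypothetical): article 0 links out the most,
# article 3 is linked to the most.
links = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3), (3, 0)]
pr = pagerank(links, 4)
cr = cheirank(links, 4)
print(max(range(4), key=pr.__getitem__))  # 3
print(max(range(4), key=cr.__getitem__))  # 0
```

Articles that many pages link to rise in PageRank, while articles with many outgoing links rise in CheiRank.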

Given the relationship between trends in society and their representation in online systems, can online media predict new and emerging trends? While several recent studies have used Google Trends as the leading online information source to answer such research questions, we focus on the online encyclopedia Wikipedia, which is often used for deeper topical reading. Wikipedia grants open access to all traffic data and, beyond single keywords, provides a wealth of additional (semantic) information in a context network. Specifically, we propose and study context-normalized, time-dependent measures of a topic's importance based on page-view time series of Wikipedia articles in different languages and of the articles related to them by internal links. As an example, we present a study of the recently emerging Big Data market, focusing on the Hadoop ecosystem, and compare the capabilities of Wikipedia and Google in predicting its popularity and life cycles. To support further applications, we have developed an open web platform to share results of Wikipedia analytics, providing context-rich and language-independent relevance measures for emerging trends.
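A context-normalized importance measure of the kind described above could, for instance, divide an article's daily page views by the mean daily views of the articles it links to. The sketch below assumes exactly that definition, which is an illustrative choice and not necessarily the authors' measure; the function name, the input layout (one list of daily counts per article) and the toy numbers are hypothetical.

```python
def context_normalized_relevance(target_views, neighbor_views):
    """Daily ratio of an article's page views to the mean views of its linked articles.

    target_views: list of daily view counts for the article of interest.
    neighbor_views: list of daily view-count lists, one per internally linked
    article, each covering the same days as target_views.
    """
    ratios = []
    for day, views in enumerate(target_views):
        context = sum(series[day] for series in neighbor_views) / len(neighbor_views)
        # The ratio is undefined (NaN) when the linked articles had no traffic that day.
        ratios.append(views / context if context > 0 else float("nan"))
    return ratios

# Hypothetical three-day series for one article and two linked articles.
trend = context_normalized_relevance([120, 150, 300], [[100, 100, 100], [140, 160, 200]])
```

A ratio that keeps rising signals that the topic is gaining attention faster than its topical neighborhood; one motivation for normalizing this way is that traffic shared by the whole neighborhood (or language edition) divides out.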

The online encyclopedia Wikipedia has become one of the most important online references in the world and has substantial and growing scientific content. A Google search for many RNA-related keywords returns a Wikipedia article as the top hit. We believe that the RNA community has an important and timely opportunity to maximize the content and quality of RNA information in Wikipedia. To this end, we have formed the RNA WikiProject (http://en.wikipedia.org/wiki/Wikipedia:WikiProject_RNA) as part of the larger Molecular and Cellular Biology WikiProject. We have created over 600 new Wikipedia articles describing families of noncoding RNAs based on the Rfam database, and invite the community to update, edit, and correct these articles. The Rfam database now redistributes this Wikipedia content as the primary textual annotation of its RNA families. Users can therefore, for the first time, directly edit the content of one of the major RNA databases. We believe that this Wikipedia/Rfam link serves as a working model for incorporating community annotation into molecular biology databases.

Using a longitudinal network analysis approach, we investigate the structural development of the knowledge base of Wikipedia in order to explain the appearance of new knowledge. The data consist of the articles in two adjacent knowledge domains: psychology and education. We analyze the development of knowledge networks of interlinked articles at seven yearly snapshots from 2006 to 2012. Longitudinal data on the topological position of each article in the networks are used to model the appearance of new knowledge over time, thereby relating the structural dimension of knowledge to its dynamics. Using multilevel modeling together with eigenvector and betweenness centrality measures, we explain the significance, for the future development of new knowledge in the knowledge base, of pivotal articles that at a given point in time are either central within one of the knowledge domains or cross the boundary between the two domains.
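Betweenness centrality, one of the measures used above to find boundary-crossing articles, counts how many shortest paths between other pairs of nodes pass through a node. Below is a minimal sketch of Brandes' algorithm for an unweighted, undirected graph, assuming the network is stored as a dict mapping each article to its linked articles (a hypothetical representation, not the study's data format).

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict mapping each node to an iterable of neighbors.
    Returns unnormalized betweenness (each unordered pair counted once).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: b / 2 for v, b in bc.items()}
```

An article that is the only bridge between a psychology cluster and an education cluster lies on every cross-domain shortest path and therefore receives a high score.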

We develop a three-step computing approach to explore a hierarchical ranking network for a society of captive rhesus macaques. The computed network is informative enough to address the question: is the ranking network of a rhesus macaque society more like a kingdom or a corporation? The three steps are devised to deal with the tremendous challenges stemming from the transitivity of dominance, a necessary constraint on the ranking relations among all individual macaques, and from the very high sampling heterogeneity of the behavioral conflict data. The first step simultaneously infers the ranking potentials among all network members, which requires accommodating the heterogeneous measurement error inherent in behavioral data. The second step estimates the social rank of every individual by minimizing the network-wide errors in the ranking potentials. The third step computes confidence bounds for selected empirical features of the social ranking. We apply this approach to two sets of conflict data from two captive societies of adult rhesus macaques. The resulting ranking network for each society is found to be a sophisticated mixture of a kingdom and a corporation. For validation, we also reanalyze conflict data from twenty longhorn sheep and demonstrate that our three-step approach correctly computes a ranking network by eliminating all ranking error.
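The second step, finding a social order that minimizes network-wide ranking errors under transitive dominance, resembles the classic minimum-violations ranking problem. The sketch below swaps in that simpler technique: it assumes conflict outcomes are summarized as a win-count matrix and searches all linear orders exhaustively for the one with the fewest wins by lower-ranked over higher-ranked animals. It ignores the measurement-error model and the confidence bounds of the authors' approach, and it is only feasible for a handful of animals.

```python
from itertools import permutations

def min_violation_ranking(wins):
    """Exhaustive minimum-violations ranking.

    wins[i][j] = number of conflicts individual i won against individual j.
    Returns (order, violations): order lists individuals from highest to lowest
    rank; violations counts wins by a lower-ranked over a higher-ranked individual.
    """
    n = len(wins)
    best_order, best_cost = None, float("inf")
    for order in permutations(range(n)):
        # order[upper] outranks order[lower] whenever upper < lower.
        cost = sum(wins[order[lower]][order[upper]]
                   for upper in range(n) for lower in range(upper + 1, n))
        if cost < best_cost:
            best_order, best_cost = list(order), cost
    return best_order, best_cost

# Hypothetical win counts for three animals.
order, violations = min_violation_ranking([[0, 5, 3], [1, 0, 4], [0, 0, 0]])
```

Because any linear order is transitive by construction, the dominance constraint is enforced automatically; the remaining violations measure how far the data depart from a strict hierarchy.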

In this paper we present a statistical analysis of English texts from Wikipedia. We address the issue of language complexity empirically by comparing the Simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use simplified language with a limited vocabulary, and editors are explicitly asked to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part-of-speech tags) shows that the language of Simple is less complex than that of Main primarily because of shorter sentences, rather than drastically simplified syntax or vocabulary. A comparison of the two language varieties using the Gunning fog readability index supports this conclusion. We also report on the topical dependence of language complexity: the language is more advanced in conceptual articles than in person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, and conclude that controversy reduces language complexity.
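The Gunning fog readability index combines average sentence length with the share of "complex" words of three or more syllables: 0.4 × (words/sentences + 100 × complex words/words). This is a minimal sketch, assuming a crude vowel-group syllable counter and sentence splitting on periods, exclamation marks and question marks. Gunning's original rules (for example, excluding proper nouns and some suffixes) are not implemented, so scores will differ slightly from published tools.

```python
import re

def count_syllables(word):
    """Rough English heuristic: number of vowel runs (a, e, i, o, u, y), at least 1."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """0.4 * (average sentence length + percentage of words with >= 3 syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

short_text = gunning_fog("The cat sat. The dog ran.")
```

Because the first term is the mean sentence length, shortening sentences lowers the score even when the vocabulary is unchanged.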

Although chronobiology is of growing interest to scientists, physicians, and the general public, access to recent discoveries and historical perspectives is limited. Wikipedia is an online, user-written encyclopedia that could enhance public access to current understanding in chronobiology. However, Wikipedia lacks important information and is not universally trusted. Here, 46 students in a university course edited Wikipedia to enhance public access to important discoveries in chronobiology. Students worked an average of 9 h each to evaluate the primary literature and the available Wikipedia information, nominated pages for editing, and, after voting, edited the 15 Wikipedia pages they judged to be the highest priorities. This assignment (http://www.nslc.wustl.edu/courses/Bio4030/wikipedia_project.html) was easy to implement, required relatively short time commitments from the professor and students, and had measurable impacts on Wikipedia and the students. Students created 3 new Wikipedia pages, edited 12 additional pages, and cited 347 peer-reviewed articles. The targeted pages all became top hits in online search engines. Because their writing is read by a worldwide audience, students found the experience rewarding. Students reported significantly increased comfort with reading, critiquing, and summarizing primary literature and benefited from seeing their work edited by other scientists and Wikipedia editors. We conclude that, in a short project, students can help make chronobiology widely accessible and learn from the editorial process.



Optimal ranking of literature importance is vital for overcoming article overload. Existing ranking methods are typically based on raw citation counts, which sum ‘inbound’ links without considering the importance of each citation. PageRank, an algorithm originally developed for ranking web pages in the Google search engine, could potentially be adapted to bibliometrics to quantify the relative importance of articles in a citation network. This article seeks to validate such an approach on the freely available PubMed Central open access subset (PMC-OAS) of biomedical literature.


On-demand cloud computing infrastructure was used to extract a citation network from over 600,000 full-text PMC-OAS articles. PageRanks and citation counts were calculated for each node in this network. PageRank is highly correlated with citation count (R = 0.905, P < 0.01) and we thus validate the former as a surrogate of literature importance. Furthermore, the algorithm can be run in trivial time on cheap, commodity cluster hardware, lowering the barrier of entry for resource-limited open access organisations.


PageRank can be trivially computed on commodity cluster hardware and is linearly correlated with citation count. Given its putative benefits in quantifying relative importance, we suggest it may enrich the citation network, thereby overcoming the existing inadequacy of citation counts alone. We thus suggest PageRank as a feasible supplement to, or replacement of, existing bibliometric ranking methods.
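The validation step above amounts to computing each article's citation count (its in-degree in the citation network) and correlating it with its PageRank. Below is a minimal sketch of that comparison with Pearson's r; the edge list and the PageRank values are made-up toy numbers, not PMC-OAS data, and the P value reported in the study is not computed here.

```python
import math

def citation_counts(edges, n):
    """In-degree of each article; edges are (citing, cited) pairs over ids 0..n-1."""
    counts = [0] * n
    for _, cited in edges:
        counts[cited] += 1
    return counts

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

citations = [(1, 0), (2, 0), (3, 0), (2, 1), (3, 1), (3, 2)]  # toy citation graph
counts = citation_counts(citations, 4)   # [3, 2, 1, 0]
scores = [0.40, 0.30, 0.18, 0.12]        # hypothetical PageRank scores
r = pearson_r(counts, scores)
```

On Python 3.10 or later, `statistics.correlation(counts, scores)` returns the same coefficient.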

Dynamics of conflicts in Wikipedia   Total citations: 1; self-citations: 0; citations by others: 1
In this work we study the dynamical features of editorial wars in Wikipedia (WP). Using our previously established algorithm, we build samples of controversial and peaceful articles and analyze the temporal characteristics of the activity in these samples. On short time scales, we show that there is a clear correspondence between conflict and burstiness of activity patterns, and that memory effects play an important role in controversies. On long time scales, we identify three distinct developmental patterns for the overall behavior of the articles. We are able to distinguish cases eventually leading to consensus from those where a compromise is far from achievable. Finally, we analyze discussion networks and conclude that edit wars are mainly fought by only a few editors.
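Burstiness and memory of editing activity are commonly quantified from inter-event times τ (gaps between consecutive edits). This is a minimal sketch, assuming the widely used definitions B = (σ − μ)/(σ + μ), where μ and σ are the mean and standard deviation of τ, and a memory coefficient M defined as the correlation between consecutive inter-event times. Whether the study used exactly these estimators is an assumption.

```python
import statistics

def inter_event_times(timestamps):
    """Gaps between consecutive, time-ordered events (e.g. edit times in hours)."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def burstiness(taus):
    """B = (sigma - mu) / (sigma + mu) of the inter-event times."""
    mu = statistics.fmean(taus)
    sigma = statistics.pstdev(taus)
    return (sigma - mu) / (sigma + mu)

def memory_coefficient(taus):
    """Pearson correlation between consecutive inter-event times (tau_i, tau_{i+1})."""
    first, second = taus[:-1], taus[1:]
    m1, m2 = statistics.fmean(first), statistics.fmean(second)
    s1, s2 = statistics.pstdev(first), statistics.pstdev(second)
    cov = sum((a - m1) * (b - m2) for a, b in zip(first, second)) / len(first)
    return cov / (s1 * s2)
```

B ranges from −1 for perfectly regular activity through about 0 for a Poisson process to values near 1 for highly bursty activity; M > 0 means long gaps tend to follow long gaps.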

Interactions between oil-collecting bees and oil-producing flowers are a very specialized mutualism, whose natural history is well known at the organism and population levels. In this study, we assessed these interactions at the biome level with a network approach, and hypothesized that widespread bee and plant species would occupy different ecological functional roles (Eltonian niches) in different biomes. Furthermore, we expected the most important functional roles in each network to be occupied more frequently by Byrsonima oil flowers and Centris oil bees, which share the longest coevolutionary history in the Neotropics. By compiling data from 40 articles on oil flower interactions within the Malpighiaceae family, we built six networks for different Brazilian biomes. We assessed the ecological functional role of each species in the pollination networks of oil flowers through the metric known as ‘network functional role’. Although 90 percent of the species occupied peripheral roles in each network, some occupied highly central roles. Oil flowers of the genera Byrsonima and Banisteriopsis and oil bees of the genera Centris and Epicharis were the most important species in all networks, as they had a disproportionately high number of interactions (hubs) or helped bind together different modules (connectors). Our findings suggest that functional roles vary geographically and seem to be affected by local conditions in different biomes. Furthermore, coevolutionary history seems to play an important role in determining functional roles in oil flower networks, although other factors are probably also important, especially the degree of specialization in this kind of interaction.
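Hubs and connectors are often identified with two quantities from Guimerà and Amaral's cartography of modular networks: the within-module degree z (how far a node's number of links inside its own module lies above the module average, in standard deviations) and the participation coefficient P = 1 − Σ_s (k_is/k_i)², which approaches 1 when a node's links are spread evenly over many modules. This is a minimal sketch, assuming the modules have already been detected and using the thresholds z ≥ 2.5 and P ≥ 0.62 adopted in some pollination-network studies; the exact metric and thresholds used above may differ.

```python
import statistics

def functional_roles(adj, module, z_hub=2.5, p_connector=0.62):
    """Return {node: (z, P, role)} for an undirected graph.

    adj: dict mapping each node to a list of neighbors.
    module: dict mapping each node to its module label.
    """
    # Within-module degree: links to nodes in the same module.
    k_in = {v: sum(1 for u in adj[v] if module[u] == module[v]) for v in adj}
    per_module = {}
    for v in adj:
        per_module.setdefault(module[v], []).append(k_in[v])
    stats = {m: (statistics.fmean(ks), statistics.pstdev(ks)) for m, ks in per_module.items()}

    roles = {}
    for v in adj:
        mean, sd = stats[module[v]]
        z = (k_in[v] - mean) / sd if sd > 0 else 0.0
        k = len(adj[v])
        links_to = {}
        for u in adj[v]:
            links_to[module[u]] = links_to.get(module[u], 0) + 1
        # Participation coefficient: 0 when all links stay in one module.
        p = 1.0 - sum((c / k) ** 2 for c in links_to.values()) if k else 0.0
        if z >= z_hub:
            role = "network hub" if p >= p_connector else "module hub"
        else:
            role = "connector" if p >= p_connector else "peripheral"
        roles[v] = (z, p, role)
    return roles
```

In a bee-flower network, a bee species whose visits are split evenly across several flower modules gets a high P and is classified as a connector even if its total number of interactions is modest.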

Studies of spatial proximity between individuals are important for understanding social structure because animals are more likely to interact with individuals in close spatial proximity. Here, we apply social network analysis to proximity data collected between 2001 and 2003 from an individually identified, provisioned, free-ranging band of Sichuan snub-nosed monkeys (Rhinopithecus roxellana) in the Qinling Mountains, central China. We aimed to quantify the social network structure and to gain insight into each individual’s position within the social network. The overall network connectivity of the study band was sparse, with a low group density of 0.17. Using hierarchical cluster analysis, we identified nine one-male-multifemale units (OMUs) in the study band, which confirms that this species forms a multilevel society in its natural habitat. Sex differences in eigenvector and betweenness centralities indicate that adult females have more important social roles than males. Among females, lactating females had higher betweenness and eigenvector centralities than other females. However, our results do not suggest the existence of key individual(s) in the social network of the study band. The global clustering coefficient of the band was 0.3 ± 0.1, with little variation between individuals, suggesting that the removal or death of any specific individual would not significantly disrupt its general network structure. Our results also show that proximity commonly occurs among unit members, but can also occur between females of different OMUs. These observations suggest that snub-nosed monkeys have a loose-knit or fluid rather than a rigid female-bonded social system, which may be a common trend for species living in multilevel societies.

The Internet has dramatically expanded citizens’ access to and ability to engage with political information. On many websites, any user can contribute and edit “crowd-sourced” information about important political figures. One of the most prominent examples of crowd-sourced information on the Internet is Wikipedia, a free and open encyclopedia created and edited entirely by users, and one of the world’s most accessed websites. While previous studies of crowd-sourced information platforms have found them to be accurate, few have considered biases in what kinds of information are included. We report the results of four randomized field experiments that sought to explore what biases exist in the political articles of this collaborative website. By randomly adding factually true information that was either positive or negative, and either cited or uncited, to the Wikipedia pages of U.S. senators, we uncover substantial evidence of an editorial bias toward positivity on Wikipedia: negative facts are 36% more likely than positive facts to be removed by Wikipedia editors within 12 hours and 29% more likely within 3 days. Although citations substantially increase an edit’s survival time, including a citation does not eliminate the editorial bias toward positivity. We replicate this study on the Wikipedia pages of deceased senators as well as recently retired but living senators and find no evidence of an editorial bias in either. Our results demonstrate that crowd-sourced information is subject to an editorial bias that favors the politically active.

Ghrelin is a hormone, initially described as a gastric peptide stimulating appetite and growth hormone secretion, that also has an important role in the regulation of many other processes, including higher brain functions. Ghrelin has been described in situ in different parts of the brain, but so far there have been no data about its expression in cell cultures. Therefore, in this study we aimed to investigate the levels of ghrelin in dissociated cortical neurons at various times in culture. We applied the ABC immunocytochemical method to detect ghrelin in one-day-, one-week-, and two-week-old cultures. Our results clearly show that at early stages after plating, 86.2% (± 8.93) of the neurons are ghrelin-positive, and that their number decreases during the culturing period. As ghrelin is present in the majority of cultured newborn neurons while neuronal differentiation and network formation take place, it may also influence early synapse formation and cell-to-cell interactions, which are both very important for network functions such as learning and memory.

MOTIVATION: A global view of the protein space is essential for functional and evolutionary analysis of proteins. In order to achieve this, a similarity network can be built using pairwise relationships among proteins. However, existing similarity networks employ a single similarity measure and therefore their utility depends highly on the quality of the selected measure. A more robust representation of the protein space can be realized if multiple sources of information are used. RESULTS: We propose a novel approach for analyzing multi-attribute similarity networks by combining random walks on graphs with Bayesian theory. A multi-attribute network is created by combining sequence-based and structure-based similarity measures. For each attribute of the similarity network, one can compute a measure of affinity from a given protein to every other protein in the network using random walks. This process makes use of the implicit clustering information of the similarity network, and we show that it is superior to naive, local ranking methods. We then combine the computed affinities using a Bayesian framework. In particular, when we train a Bayesian model for automated classification of a novel protein, we achieve high classification accuracy and outperform single-attribute networks. In addition, we demonstrate the effectiveness of our technique by comparison with a competing kernel-based information integration approach.
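The affinity computation described above can be sketched as a random walk with restart: a walker moves along similarity-weighted edges and, with probability r at each step, jumps back to the query protein, so the stationary probabilities measure affinity to it. This is a minimal sketch, assuming one symmetric similarity matrix per attribute and a restart probability of 0.3 (both illustrative choices); the Bayesian combination of several attributes is not shown.

```python
def random_walk_with_restart(W, start, restart=0.3, tol=1e-12, max_iter=1000):
    """Stationary visiting probabilities of a walk that returns to `start` with prob. `restart`.

    W: square list-of-lists similarity matrix (W[i][j] = similarity of proteins i and j).
    Columns are normalized to transition probabilities; all-zero columns are skipped.
    """
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    e = [1.0 if i == start else 0.0 for i in range(n)]
    p = e[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] / col_sums[j] * p[j] for j in range(n) if col_sums[j] > 0)
            + restart * e[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

# Toy similarity matrix (hypothetical): proteins 0-2 form a cluster, protein 3 is distant.
W = [[0, 1, 1, 0.1], [1, 0, 1, 0], [1, 1, 0, 0], [0.1, 0, 0, 0]]
affinity = random_walk_with_restart(W, start=0)
```

Unlike a ranking by direct similarity alone, proteins reachable through many strong paths inside the query's cluster accumulate more probability, which is how the walk exploits implicit clustering. Running it once per attribute yields one affinity vector per similarity measure.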

We introduce a new method for detecting communities of arbitrary size in an undirected weighted network. Our approach is based on tracing the path of closest friendship between nodes in the network using the recently proposed Generalized Erdős Numbers. This method does not require the choice of any arbitrary parameters or null models, and does not suffer from a system-size resolution limit. Our closest-friend community detection is able to accurately reconstruct the true network structure for a large number of real-world and artificial benchmarks, and can be adapted to study the multi-level structure of hierarchical communities as well. We also use the closeness between nodes to define a degree of robustness for each node, which assesses how robustly that node is assigned to its community. To test the efficacy of these methods, we deploy them on a variety of well-known benchmarks, on a hierarchically structured artificial benchmark with a known community and robustness structure, and on real-world networks of coauthorships between the faculty at a major university and of citations of articles published in Physical Review. In all cases, microcommunities, a hierarchy of communities, and variable node robustness are observed, providing insights into the structure of the network.

Counterparty risk denotes the risk that a party defaults in a bilateral contract. This risk depends not only on the two parties involved, but also on the risk from various other contracts each of these parties holds. In rather informal markets, such as the OTC (over-the-counter) derivatives market, institutions only report their aggregated quarterly risk exposure, but no details about their counterparties. Hence, little is known about the diversification of counterparty risk. In this paper, we reconstruct the weighted and time-dependent network of counterparty risk in the OTC derivatives market of the United States between 1998 and 2012. To proxy unknown bilateral exposures, we first study the co-occurrence patterns of institutions based on their quarterly activity and ranking in the official report. The network obtained this way is then analysed by a weighted k-core decomposition to reveal a core-periphery structure. This allows us to compare the activity-based ranking with a topology-based ranking, and to identify the most important institutions and their mutual dependencies. We also analyse correlations in these activities to show strong similarities in the behavior of the core institutions. Our analysis clearly demonstrates the clustering of counterparty risk in a small set of about a dozen US banks. This increases not only the default risk of the central institutions, but also the default risk of peripheral institutions that have contracts with the central ones. Hence, all institutions indirectly have to bear (part of) the counterparty risk of all others, which needs to be better reflected in the price of OTC derivatives.
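A k-core (k-shell) decomposition repeatedly peels off the least-connected nodes; nodes that survive to high levels form the core. Below is a minimal sketch of a weighted variant, assuming the weighted degree k′ = √(k·s) (k = number of neighbors, s = sum of link weights), which is one published choice (Garas, Schweitzer and Havlin). The thresholds are stepped in integer units for simplicity, and the exact weighting used in the study may differ.

```python
import math

def weighted_k_shells(adj):
    """Return {node: shell index} for an undirected weighted graph.

    adj: dict mapping node -> {neighbor: weight}, with symmetric entries.
    Weighted degree k' = sqrt(k * s), k = number of neighbors, s = total weight.
    """
    remaining = {v: dict(nbrs) for v, nbrs in adj.items()}

    def weighted_degree(v):
        return math.sqrt(len(remaining[v]) * sum(remaining[v].values()))

    shell, level = {}, 0
    while remaining:
        pruned = True
        while pruned:  # keep peeling until no node is at or below the current level
            pruned = False
            for v in [u for u in remaining if weighted_degree(u) <= level]:
                shell[v] = level
                for u in remaining[v]:
                    del remaining[u][v]
                del remaining[v]
                pruned = True
        level += 1
    return shell
```

With unit weights, k′ equals the ordinary degree and the procedure reduces to the standard k-shell decomposition; heavier exposures push institutions into higher shells.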

In the present paper, we have created several novel journal similarity metrics. The MeSH odds ratio measures the topical similarity of any pair of journals, based on the major MeSH headings assigned to articles in MEDLINE. The second metric uses the 2009 Author-ity author name disambiguation dataset as a gold standard for estimating the author odds ratio. This gives a straightforward, intuitive answer to the question: given two articles in PubMed that share the same author name (last name, first initial), how does knowing only the identity of the journals in which they were published predict the relative likelihood that they were written by the same person rather than by different persons? The article pair odds ratio detects the tendency of authors to publish repeatedly in the same journal, as well as in specific pairs of journals. The metrics can be applied not only to estimate the similarity of a pair of journals, but also to provide novel profiles of individual journals. For example, for each journal, one can define the MeSH cloud as the number of other journals that are topically more similar to it than expected by chance, and the author cloud as the number of other journals that share more authors than expected by chance. These metrics for journal pairs and individual journals have been provided as public datasets that can be readily studied and used by others.
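An odds ratio compares the odds of an outcome in two groups from a 2×2 table: with counts a and b (outcome present/absent) in the first group and c and d in the second, OR = (a/b)/(c/d) = ad/(bc). For the author odds ratio above, one plausible reading (an assumption, not confirmed by the text) is: first group = same-name article pairs published in the journal pair of interest, second group = all other same-name pairs, outcome = same person according to the gold standard. The sketch also counts a journal's "cloud" as the number of other journals whose odds ratio exceeds 1, a simplification of "more than expected by chance", which may rely on a significance test in the original. Function names and counts are hypothetical.

```python
def odds_ratio(a, b, c, d, pseudocount=0.0):
    """Odds ratio (a/b) / (c/d) from a 2x2 table.

    a, b: outcome present / absent in the group of interest
    c, d: outcome present / absent in the reference group
    pseudocount: optional value added to every cell to avoid division by zero.
    """
    a, b, c, d = (x + pseudocount for x in (a, b, c, d))
    return (a * d) / (b * c)

def cloud_size(odds_by_other_journal, threshold=1.0):
    """Number of other journals whose odds ratio with a given journal exceeds threshold."""
    return sum(1 for value in odds_by_other_journal.values() if value > threshold)

# Hypothetical counts: 30 same-person vs 10 different-person pairs in the journal pair,
# 20 vs 40 among all other pairs.
author_or = odds_ratio(30, 10, 20, 40)   # 6.0
```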

The online resource Wikipedia is increasingly used by students for knowledge acquisition and learning. However, the lack of a formal editorial review and the heterogeneous expertise of contributors often result in skepticism among educators about whether Wikipedia should be recommended to students as an information source. In this study we systematically analyzed the accuracy and completeness of drug information in the German and English language versions of Wikipedia in comparison to standard textbooks of pharmacology. In addition, references, revision history and readability were evaluated. Readability was analyzed using the Amstad readability index and the Erste Wiener Sachtextformel (first Vienna formula for non-fiction texts). The data on indication, mechanism of action, pharmacokinetics, adverse effects and contraindications for 100 curricular drugs were retrieved from standard German textbooks of general pharmacology and compared with the corresponding articles in the German language version of Wikipedia. Quantitative analysis revealed that the accuracy of drug information in Wikipedia was 99.7% ± 0.2% compared with the textbook data. The overall completeness of drug information in Wikipedia was 83.8% ± 1.5% (p<0.001). Completeness varied between categories, and was lowest in the category “pharmacokinetics” (68.0% ± 4.2%; p<0.001) and highest in the category “indication” (91.3% ± 2.0%) relative to the overlap with the textbook data. Similar results were obtained for the English language version of Wikipedia. Of the drug information missing in Wikipedia, 62.5% was rated as didactically non-relevant in a qualitative re-evaluation study. Drug articles in Wikipedia had an average of 14.6 ± 1.6 references and 262.8 ± 37.4 edits performed by 142.7 ± 17.6 editors. Both the Wikipedia and textbook samples had comparably low readability. Our study suggests that Wikipedia is an accurate and comprehensive source of drug-related information for undergraduate medical education.
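Both German readability formulas named above are linear combinations of simple text statistics. This is a minimal sketch, assuming the commonly cited forms: the Amstad index 180 − ASL − 58.5 × ASW (ASL = words per sentence, ASW = syllables per word; higher means easier) and the first Wiener Sachtextformel 0.1935·MS + 0.1672·SL + 0.1297·IW − 0.0327·ES − 0.875 (MS = percentage of words with three or more syllables, SL = words per sentence, IW = percentage of words longer than six letters, ES = percentage of one-syllable words; the result approximates a school grade, so higher means harder). Counting syllables in German text is left to the caller.

```python
def amstad(words, sentences, syllables):
    """Amstad reading ease for German (roughly 0-100, higher = easier)."""
    asl = words / sentences   # average sentence length in words
    asw = syllables / words   # average syllables per word
    return 180 - asl - 58.5 * asw

def wiener_sachtextformel_1(ms, sl, iw, es):
    """First Vienna formula; result roughly corresponds to a school grade (higher = harder).

    ms: % of words with >= 3 syllables, sl: mean sentence length in words,
    iw: % of words with > 6 letters, es: % of one-syllable words.
    """
    return 0.1935 * ms + 0.1672 * sl + 0.1297 * iw - 0.0327 * es - 0.875

# Hypothetical text statistics: 100 words, 10 sentences, 150 syllables.
ease = amstad(100, 10, 150)   # 82.25
```

Note the opposite orientation of the two scales: "low readability" corresponds to a low Amstad score but a high Wiener Sachtextformel score.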

Primary neural cultures from the fruit fly, Drosophila melanogaster, provide a high-resolution view of cellular processes and neuronal interaction. The development of the culture, along with its vitality and functionality, can be continuously monitored, and the abundance of available tools for D. melanogaster research can greatly assist in characterizing different aspects of the culture. The fly primary neural culture preparation thus offers a promising platform for studying a variety of processes relating to nervous system development, activity and pathology. Our data reveal that neural cultures derived from the CNS of third-instar D. melanogaster larvae undergo an organization process that is specific and consistent across different cultures, and culminates in the creation of an elaborate neural network. We demonstrate that this process is accompanied by detectable changes in the protein expression profile of the culture, indicating the involvement of multi-protein processes specific to each stage of the network's development. As a further proof of concept, we demonstrate differential expression of a particular protein family, the gap-junction-forming innexins, throughout the network's life.



Living systems are associated with social networks: networks made up of nodes, some of which may be more important than others in various respects. While different quantitative measures labeled as “centralities” have previously been used in the network analysis community to identify influential nodes in a network, it is debatable how valid these centrality measures actually are. In other words, the research question that remains unanswered is: how exactly do these measures perform in the real world? For example, if a centrality measure identifies a particular node as important, is the node actually important?


The goal of this paper is not just to perform a traditional social network analysis but rather to evaluate different centrality measures through an empirical study of exactly how network centralities correlate with data from published multidisciplinary network data sets.


We use standard published network data sets, with a random network as a baseline. These data sets include Zachary's Karate Club network, a dolphin social network and the neural network of the nematode Caenorhabditis elegans. Each data set was analyzed in terms of different centrality measures and compared with existing knowledge from the associated published articles to review the role of each centrality measure in determining influential nodes.


Our empirical analysis demonstrates that in the chosen network data sets, nodes with high Closeness Centrality also had high Eccentricity Centrality. Likewise, high Degree Centrality correlated closely with high Eigenvector Centrality. Betweenness Centrality, by contrast, varied with network topology and did not show any noticeable pattern. In terms of identifying key nodes, we found that, compared with other centrality measures, Eigenvector and Eccentricity Centralities were better able to identify important nodes.
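The four centralities compared above (degree, closeness, eccentricity, eigenvector) can be computed on a small unweighted graph with breadth-first search and power iteration. This is a minimal sketch, assuming a connected undirected graph stored as a dict of neighbor lists. Closeness is (n − 1) divided by the sum of distances, eccentricity centrality is the reciprocal of a node's largest distance, and eigenvector centrality is scaled so the top node scores 1. These are common textbook definitions, not necessarily the exact normalizations used in the study.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def centralities(adj, iterations=200):
    """Degree, closeness, eccentricity and eigenvector centrality for a connected undirected graph."""
    degree = {v: len(adj[v]) for v in adj}
    closeness, eccentricity = {}, {}
    for v in adj:
        others = [d for u, d in bfs_distances(adj, v).items() if u != v]
        closeness[v] = len(others) / sum(others) if others else 0.0
        eccentricity[v] = 1.0 / max(others) if others else 0.0
    # Power iteration on A + I: same leading eigenvector as A, but no oscillation on bipartite graphs.
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: value / top for v, value in nxt.items()}
    return {"degree": degree, "closeness": closeness,
            "eccentricity": eccentricity, "eigenvector": x}

star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = centralities(star)
```

Closeness and eccentricity are both derived from the same shortest-path distances, one from their sum and one from their maximum, which helps explain why the two rankings tend to agree.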


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417