Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Identifying influential nodes in very large-scale directed networks is a big challenge relevant to disparate applications, such as accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding the hierarchical organization of social and biological networks. Known methods range from node centralities, such as degree, closeness and betweenness, to diffusion-based processes, like PageRank and LeaderRank. Some of these methods already take into account the influences of a node’s neighbors but do not directly make use of the interactions among its neighbors. Local clustering is known to have a negative impact on information spreading. We further show empirically that it also plays a negative role in generating local connections. Inspired by these facts, we propose a local ranking algorithm named ClusterRank, which takes into account not only the number of neighbors and the neighbors’ influences, but also the clustering coefficient. Subject to the susceptible-infected-recovered (SIR) spreading model with constant infectivity, experimental results on two directed networks, a social network extracted from delicious.com and a large-scale short-message communication network, demonstrate that ClusterRank outperforms some benchmark algorithms such as PageRank and LeaderRank. Furthermore, ClusterRank can also be applied to undirected networks, where its superiority is significant compared with degree centrality and k-core decomposition. In addition, ClusterRank, which makes use of only local information, is much more efficient than global methods: it takes only 191 seconds for a network with about nodes, more than 15 times faster than PageRank.
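
The abstract names the three ingredients of ClusterRank (number of neighbors, neighbors' influence, clustering coefficient) but not the formula. The sketch below assumes a score of the form s_i = 10^(-c_i) · Σ_j (k_j + 1), summed over the neighbors j of node i, and simplifies to an undirected graph with the standard local clustering coefficient c_i. The exponential penalty, the function names and the toy graph are all illustrative assumptions, not taken from the listing.

```python
def local_clustering(adj, node):
    """Standard undirected local clustering coefficient of `node`."""
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges that exist among the neighbors of `node`.
    links = sum(
        1
        for i, u in enumerate(neighbors)
        for v in neighbors[i + 1:]
        if v in adj[u]
    )
    return 2.0 * links / (k * (k - 1))


def cluster_rank(adj):
    """Assumed form: 10^(-c_i) * sum over neighbors j of (degree_j + 1)."""
    scores = {}
    for node, neighbors in adj.items():
        penalty = 10.0 ** (-local_clustering(adj, node))
        scores[node] = penalty * sum(len(adj[j]) + 1 for j in neighbors)
    return scores


# Hypothetical toy graph: a triangle a-b-c attached to a star centred on d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"), ("d", "e"), ("d", "f"), ("d", "g")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = cluster_rank(adj)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```

Only a node's neighbors and their degrees are inspected, which is why a local score of this kind can be cheaper than a global iteration such as PageRank.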

2.
Identifying influential spreaders in networks, which contributes to optimizing the use of available resources and to the efficient spreading of information, is of great theoretical significance and practical value. The random-walk-based algorithm LeaderRank has been shown to be an effective and efficient method for recognizing leaders in social networks, and it even outperforms the well-known PageRank method. As LeaderRank was initially developed for binary directed networks, extensions to weighted networks should be studied. In this paper, a generalized algorithm, PhysarumSpreader, is proposed by combining LeaderRank with a positive feedback mechanism inspired by an amoeboid organism called Physarum polycephalum. By taking edge weights into consideration and adding the positive feedback mechanism, PhysarumSpreader is applicable to both directed and undirected weighted networks. Using two real networks as examples, the effectiveness of the proposed method is demonstrated by comparison with other standard centrality measures.

3.
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people's actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights into a wide range of disciplines, from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases, to anticipate the dynamics of social phenomena. Successful examples include unemployment levels, car and home sales, and the spreading of epidemics. A few recent works have applied this approach to stock prices and market sentiment. However, it remains unclear whether trends in financial markets can be anticipated by the collective wisdom of on-line users on the web. Here we show that daily trading volumes of stocks traded in the NASDAQ-100 are correlated with daily volumes of queries related to the same stocks. In particular, query volumes anticipate in many cases peaks of trading by one day or more. Our analysis is carried out on a unique dataset of queries submitted to an important web search engine, which also enables us to investigate user behavior. We show that the query volume dynamics emerges from the collective but seemingly uncoordinated activity of many users. These findings contribute to the debate on the identification of early warnings of financial systemic risk, based on the activity of users of the www.

4.
Peptide mass fingerprinting (PMF) is a valuable method for rapid and high-throughput protein identification using the proteomics approach. Automated search engines, such as Ms-Fit, Mascot, ProFound, and PeptIdent, have facilitated protein identification through PMF. The potential to obtain a true MS protein identification result depends on the choice of algorithm as well as on experimental factors that influence the information content in MS data. When mass spectral data are incomplete and/or have low mass accuracy, the “number of matches” approach may be inadequate for a useful identification. Several studies have evaluated factors influencing the quality of mass spectrometry (MS) experiments. Missed cleavages, posttranslational modifications of peptides and contaminants (e.g., keratin) are important factors that can affect the results of MS analyses by influencing the identification process as well as the quality of the MS spectra. We compared search engines frequently used to identify proteins from Homo sapiens and Halobacterium salinarum by evaluating factors, including data-based and mass tolerance, to develop an improved search engine for PMF. This study may provide information to help develop a more effective algorithm for protein identification in each species through PMF.

5.
The user-based collaborative filtering (CF) algorithm is one of the most popular approaches for making recommendations. Despite its success, the traditional user-based CF algorithm suffers from one serious problem: it measures the influence between two users only through symmetric similarities calculated from their consumption histories. This means that, for a pair of users, the influences on each other are the same, which may not be true. Intuitively, an expert may have an impact on a novice user, but a novice user may not affect an expert at all. Besides, each user may possess a global importance factor that affects his/her influence on the remaining users. To this end, in this paper, we propose an asymmetric user influence model to measure the directed influence between two users and adopt the PageRank algorithm to calculate the global importance value of each user. The directed influence values and the global importance values are then integrated to deduce the final influence values between two users. Finally, we use the final influence values to improve the performance of the traditional user-based CF algorithm. Extensive experiments confirm that both the asymmetric user influence model and the global importance value play key roles in improving recommendation accuracy, and hence the proposed method significantly outperforms the existing recommendation algorithms, in particular the user-based CF algorithm, on datasets of high rating density.

6.

Background

Optimal ranking of literature importance is vital in overcoming article overload. Existing ranking methods are typically based on raw citation counts, giving a sum of ‘inbound’ links with no consideration of citation importance. PageRank, an algorithm originally developed for ranking webpages at the search engine, Google, could potentially be adapted to bibliometrics to quantify the relative importance weightings of a citation network. This article seeks to validate such an approach on the freely available, PubMed Central open access subset (PMC-OAS) of biomedical literature.

Results

On-demand cloud computing infrastructure was used to extract a citation network from over 600,000 full-text PMC-OAS articles. PageRanks and citation counts were calculated for each node in this network. PageRank is highly correlated with citation count (R = 0.905, P < 0.01) and we thus validate the former as a surrogate of literature importance. Furthermore, the algorithm can be run in trivial time on cheap, commodity cluster hardware, lowering the barrier of entry for resource-limited open access organisations.

Conclusions

PageRank can be trivially computed on commodity cluster hardware and is linearly correlated with citation count. Given its putative benefits in quantifying relative importance, we suggest it may enrich the citation network, thereby overcoming the existing inadequacy of citation counts alone. We thus suggest PageRank as a feasible supplement to, or replacement of, existing bibliometric ranking methods.
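
To make the contrast between raw citation counts and PageRank concrete, here is a minimal power-iteration sketch on a five-article citation graph. The damping factor of 0.85, the uniform redistribution of rank from articles that cite nothing, and the article labels are assumptions chosen for illustration; the abstract does not describe the implementation actually used.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; `links` maps each node to the nodes it cites."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical citation graph: "A cites C", "C cites D", and so on.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": [], "E": ["D"]}
ranks = pagerank(citations)

# Raw citation count = number of inbound links.
counts = {v: 0 for v in ranks}
for targets in citations.values():
    for dst in targets:
        counts[dst] += 1

for v in sorted(ranks, key=ranks.get, reverse=True):
    print(v, counts[v], round(ranks[v], 3))
```

In this toy graph C and D are each cited twice, yet D ranks higher because one of its citations comes from the well-cited C. That is the "citation importance" a raw count ignores.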

7.
Tandem mass spectrometry-based proteomics experiments produce large amounts of raw data, and different database search engines are needed to reliably identify all the proteins from this data. Here, we present Compid, an easy-to-use software tool that can be used to integrate and compare protein identification results from two search engines, Mascot and Paragon. Additionally, Compid enables extraction of information from large Mascot result files that cannot be opened via the Web interface, and calculation of general statistical information about peptide and protein identifications in a data set. To demonstrate the usefulness of this tool, we used Compid to compare Mascot and Paragon database search results for a mitochondrial proteome sample of human keratinocytes. The reports generated by Compid can be exported and opened as Excel documents or as text files using configurable delimiters, allowing the analysis and further processing of Compid output with a multitude of programs. Compid is freely available and can be downloaded from http://users.utu.fi/lanatr/compid. It is released under an open source license (GPL), enabling modification of the source code. Its modular architecture allows for the creation of supplementary software components, e.g. to enable support for additional input formats and report categories.

8.
LC‐MS experiments can generate large quantities of data, for which a variety of database search engines are available to make peptide and protein identifications. Decoy databases are becoming widely used to place statistical confidence in result sets, allowing the false discovery rate (FDR) to be estimated. Different search engines produce different identification sets, so employing more than one search engine could result in an increased number of peptides (and proteins) being identified, if an appropriate mechanism for combining data can be defined. We have developed a search-engine-independent score based on FDR, called the FDR Score, which allows peptide identifications from different search engines to be combined. The results demonstrate that the observed FDR is significantly different when analysing the set of identifications made by all three search engines, by each pair of search engines or by a single search engine. Our algorithm assigns identifications to groups according to the set of search engines that have made the identification, and re‐assigns the score (combined FDR Score). The combined FDR Score can differentiate between correct and incorrect peptide identifications with high accuracy, allowing on average 35% more peptide identifications to be made at a fixed FDR than using a single search engine.
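
The decoy-database idea mentioned above can be sketched in a few lines. This is the common target-decoy estimate (decoy hits divided by target hits above a score threshold) followed by the usual monotone q-value step. It is not the FDR Score or the combined FDR Score of the abstract, whose definitions are not given here. The function name and the example scores are hypothetical.

```python
def decoy_q_values(psms):
    """psms: list of (score, is_decoy), higher score = better match.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this score.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this or any more permissive threshold.
    q_values = []
    best = float("inf")
    for fdr in reversed(fdrs):
        best = min(best, fdr)
        q_values.append(best)
    q_values.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, q_values)]


# Hypothetical peptide-spectrum matches: (search score, matched a decoy?)
psms = [(9.1, False), (8.7, False), (8.2, False), (7.9, True),
        (7.5, False), (6.8, False), (6.1, True), (5.5, False)]
results = decoy_q_values(psms)
accepted = [s for s, is_decoy, q in results if not is_decoy and q <= 0.2]
print(accepted)
```

With a cut-off of q ≤ 0.2 the five best-scoring target matches are accepted, while the lowest-scoring target (5.5) is rejected.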

9.
Text similarity: an alternative way to search MEDLINE   Cited by: 1 (self-citations: 0, citations by others: 1)
MOTIVATION: The most widely used literature search techniques, such as those offered by NCBI's PubMed system, require significant effort on the part of the searcher, and inexperienced searchers do not use these systems as effectively as experienced users. Improved literature search engines can save researchers time and effort by making it easier to locate the most important and relevant literature. RESULTS: We have created and optimized a new, hybrid search system for Medline that takes natural text as input and then delivers results with high precision and recall. The system combines a fast, low-sensitivity, weighted keyword-based first-pass algorithm, which casts a wide net to gather an initial set of literature, with a unique sentence-alignment-based similarity algorithm that rank-orders those results; the combination is sensitive, fast and easy to use. Several text similarity search algorithms, both standard and novel, were implemented and tested in order to determine which obtained the best results in information retrieval exercises. AVAILABILITY: Literature searching algorithms are implemented in a system called eTBLAST, freely accessible over the web at http://invention.swmed.edu. A variety of other derivative systems and visualization tools provide the user with an enhanced experience and additional capabilities. CONTACT: Harold.Garner@UTSouthwestern.edu.

10.
The dissemination of biological information has become critically dependent on the Internet and World Wide Web (WWW), which enable distributed access to information in a platform-independent manner. The mode of interaction between biologists and on-line information resources, however, has been mostly limited to simple interface technologies such as hypertext links, tables and forms. The introduction of platform-independent runtime environments facilitates the development of more sophisticated WWW-based user interfaces. Until recently, most such interfaces have been tightly coupled to the underlying computation engines, and not separated as reusable components. We believe that many subdisciplines of biology have intuitive and familiar graphical representations of knowledge that can serve as multipurpose user interface elements. We call such graphical idioms “domain graphics”. In order to illustrate the power of such graphics, we have built a reusable interface based on the standard two-dimensional (2D) layout of RNA secondary structure. The interface can be used to represent any pre-computed layout of RNA, and takes as parameters the sets of actions to be performed as a user interacts with the interface. It can provide any associated application program with information about the base, helix, or subsequence selected by the user. We show the versatility of this interface by using it as a special-purpose interface to the BLAST, Medline and RNA MFOLD search/compute engines. These demonstrations are available at: http://www-smi.stanford.edu/projects/helix/pubs/gene-combis-96/

11.
Granholm V, Käll L. Proteomics 2011, 11(6): 1086-1093
The peptide identification process in shotgun proteomics is most frequently solved with search engines. Such search engines assign scores that reflect similarity between the measured fragmentation spectrum and the theoretical spectra of the peptides of a given database. However, the scores from most search engines do not have a direct statistical interpretation. To understand and make use of the significance of peptide identifications, one must thus be familiar with some statistical concepts. Here, we discuss different statistical scores used to show the confidence of an identification and a set of methods to estimate these scores. We also describe the variance of statistical scores and imperfections of scoring functions of peptide-spectrum matches.

12.
As the speed of mass spectrometers, sophistication of sample fractionation, and complexity of experimental designs increase, the volume of tandem mass spectra requiring reliable automated analysis continues to grow. Software tools that quickly, effectively, and robustly determine the peptide associated with each spectrum with high confidence are sorely needed. Currently available tools that postprocess the output of sequence-database search engines use three techniques to distinguish the correct peptide identifications from the incorrect: statistical significance re-estimation, supervised machine learning scoring and prediction, and combining or merging of search engine results. We present a unifying framework that encompasses each of these techniques in a single model-free machine-learning framework that can be trained in an unsupervised manner. The predictor is trained on the fly for each new set of search results without user intervention, making it robust for different instruments, search engines, and search engine parameters. We demonstrate the performance of the technique using mixtures of known proteins and by using shuffled databases to estimate false discovery rates, from data acquired on three different instruments with two different ionization technologies. We show that this approach outperforms machine-learning techniques applied to a single search engine’s output, and demonstrate that combining search engine results provides additional benefit. We show that the performance of the commercial Mascot tool can be bested by the machine-learning combination of two open-source tools X!Tandem and OMSSA, but that the use of all three search engines boosts performance further still. The Peptide identification Arbiter by Machine Learning (PepArML) unsupervised, model-free, combining framework can be easily extended to support an arbitrary number of additional searches, search engines, or specialized peptide–spectrum match metrics for each spectrum data set. 
PepArML is open-source and is available from .

13.
The biomedical literature contains a wealth of information on associations between many different types of objects, such as protein-protein interactions, gene-disease associations and subcellular locations of proteins. When searching for such information using conventional search engines, e.g. PubMed, users see the data only one abstract at a time and 'hidden' in natural language text. AliBaba is an interactive tool for graphical summarization of search results. It parses the set of abstracts that fit a PubMed query and presents extracted information on biomedical objects and their relationships as a graphical network. AliBaba extracts associations between cells, diseases, drugs, proteins, species and tissues. Several filter options allow for a more focused search. Thus, researchers can grasp complex networks described in various articles at a glance. AVAILABILITY: http://alibaba.informatik.hu-berlin.de/

14.

Background  

Interpretation of simple microarray experiments is usually based on the fold-change of gene expression between a reference and a "treated" sample where the treatment can be of many types from drug exposure to genetic variation. Interpretation of the results usually combines lists of differentially expressed genes with previous knowledge about their biological function. Here we evaluate a method – based on the PageRank algorithm employed by the popular search engine Google – that tries to automate some of this procedure to generate prioritized gene lists by exploiting biological background information.

15.
The summed Alberta Stroke Program Early CT Score (ASPECTS) is useful for predicting stroke outcome. The anatomical information in the CT template is rarely used for this purpose because traditional regression methods are not adept at handling collinearity (relatedness) among brain regions. While penalized logistic regression (PLR) can handle collinearity, it does not provide an intuitive understanding of the interaction among network structures in the way that an eigenvector method such as PageRank (used in the Google search engine) can. In this exploratory analysis we applied graph theoretical analysis to explore the relationship among ASPECTS regions with respect to disability outcome. The Virtual International Stroke Trials Archive (VISTA) was searched for patients who had an infarct in at least one ASPECTS region (ASPECTS ≤9; ASPECTS=10 were excluded) and a disability measure (modified Rankin score/mRS). A directed graph was created from a cross-correlation matrix (thresholded at a false discovery rate of 0.01) of the ASPECTS regions, demographic variables and disability (mRS>2). We estimated the network-based importance of each ASPECTS region by comparing PageRank and node strength measures. These results were compared with those from PLR. There were 185 subjects, with an average age of 67.5 ± 12.8 years (55% male). In Model 1 (demographic variables having no direct connection with disability), the highest PageRank was for M2 (0.225, bootstrap 95% CI 0.215-0.347). In Model 2 (demographic variables having a direct connection with disability), the highest PageRanks were for M2 (0.205, bootstrap 95% CI 0.194-0.367) and M5 (0.125, bootstrap 95% CI 0.096-0.204). Both models illustrate the importance of the M2 region to disability. The PageRank method reveals complex interaction among ASPECTS regions with respect to disability. This approach may help to understand the infarcted brain network involved in stroke disability.

16.
Confident identification of peptides via tandem mass spectrometry underpins modern high-throughput proteomics. This has motivated considerable recent interest in the postprocessing of search engine results to increase confidence and calculate robust statistical measures, for example through the use of decoy databases to calculate false discovery rates (FDR). FDR-based analyses allow for multiple testing and can assign a single confidence value for both sets and individual peptide spectrum matches (PSMs). We recently developed an algorithm for combining the results from multiple search engines, integrating FDRs for sets of PSMs made by different search engine combinations. Here we describe a web server and a downloadable application that make this routinely available to the proteomics community. The web server offers a range of outputs including informative graphics to assess the confidence of the PSMs and any potential biases. The underlying pipeline also provides a basic protein inference step, integrating PSMs into protein ambiguity groups where peptides can be matched to more than one protein. Importantly, we have also implemented full support for the mzIdentML data standard, recently released by the Proteomics Standards Initiative, providing users with the ability to convert native formats to mzIdentML files, which are available to download.

17.
This paper presents a novel framework for Visual Exploratory Search of Relationship Graphs on Smartphones (VESRGS) that is composed of three major components: inference and representation of semantic relationship graphs on the Web via meta-search, visual exploratory search of relationship graphs through both querying and browsing strategies, and human-computer interaction via the multi-touch interface and mobile Internet on smartphones. In comparison with traditional lookup search methodologies, the proposed VESRGS system is characterized by the following perceived advantages: 1) it infers rich semantic relationships between the query keywords and other related concepts from large-scale meta-search results from the Google, Yahoo! and Bing search engines, and represents semantic relationships via graphs; 2) the exploratory search approach empowers users to naturally and effectively explore and discover knowledge in a rich information world of interlinked relationship graphs in a personalized fashion; 3) it effectively takes advantage of smartphones’ user-friendly interfaces, ubiquitous Internet connection and portability. Our extensive experimental results have demonstrated that the VESRGS framework can significantly improve users’ ability to find the relationship information most relevant to their own specific needs. We envision that the VESRGS framework can be a starting point for future exploration of novel, effective search strategies in the mobile Internet era.

18.
Recent analyses of Internet search behaviour conclude that the public’s interest in environmental issues is falling (McCallum and Bury, Biodiv Conserv 22:1355–1367, 2013). Ficetola (Biodiv Conserv 22:2983–2988, 2013) argued that the nature of the underpinning data processing may create an artificially declining trend, even when the absolute number of searches increases and public interest is growing. These findings are highly relevant for applied conservation strategies, and the public media have quickly picked up the message of alarming fading interest worldwide, the possibility of devastating repercussions and calls for rapid responses in conservation communication. We challenge both analyses by evaluating Internet searches of English and non-English speaking users. The inclusion of information on the linguistic background reveals a much more differentiated picture, with some cultures displaying an increasing interest and others a decreasing interest. These analyses allow a better understanding of the importance of global–local viewpoints, cultural knowledge and cultural differences for the interpretation of underlying human interest from Internet search patterns. Despite methodological problems limiting the utility of summary data provided by search engines, they can offer powerful information when applied in a spatially and temporally restricted manner and analysed alongside suitable benchmark indicators. We argue that due consideration of methodological caveats is essential to inform the general public about the relevance for conservation without triggering sensationalist or over-generalizing conclusions. Conservation communication needs to consider that Internet search engines do not necessarily mirror the interest of many people who are essential for the conservation of biodiversity.

19.
Research on collective movements has often focused on the sociodemographic parameters explaining the success of some individuals as leaders or initiators of collective movements. Several of these studies have shown the influence of social structure, through kinship and affiliative relationships, on the organization of collective movements. However, these studies have been conducted on semi-free-ranging groups of macaques that were not faced with a natural environment and its constraints. In the socially intolerant rhesus macaque (Macaca mulatta) the success of an initiator correlates with its hierarchical rank, the most dominant individuals being the most successful. We investigated the collective movements of another socially intolerant macaque species, Japanese macaques, in the wild, to assess whether the social structure was still a determinant factor under natural conditions. In line with previous studies of macaques, we found that social structure drove the organization of collective movements. More dominant individuals initiated more collective movements. However, dominance did not affect the success of an initiation, i.e., the number of individuals joining. In addition, kinship strongly constrained the associations observed in females during collective movements. These results reflect the social structure of Japanese macaques, in which strong power asymmetry and kinship relationships constrain the majority of interactions between individuals within the group. Moreover, these results are similar to those observed in semi-free-ranging rhesus macaques and support the hypothesis of an effect of social determinants on collective movements of primates even under natural conditions.

20.
Search engines running on MEDLINE abstracts have been widely used by biologists to find publications that are related to their research. Existing search engines such as PubMed, however, have limitations when applied to the task of seeking textual evidence of relations between given concepts. The limitations arise mainly because these search engines do not deal effectively with multi-term queries, which may imply semantic relations between the terms. To address this problem, we present MedEvi, a novel search engine that imposes positional restrictions on occurrences matching multi-term queries, based on the observation that terms with semantic relations which are explicitly stated in text are not found too far from each other. MedEvi further identifies additional keywords of biological and statistical significance from the local context of matching occurrences in order to help users reformulate their queries for better results. AVAILABILITY: http://www.ebi.ac.uk/tc-test/textmining/medevi/
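
The positional restriction described above can be illustrated with a simple proximity filter: a passage counts as a hit only if every query term occurs within a window of `max_span` tokens. This is a generic sketch, not MedEvi's actual matching logic. The tokenizer, the single-word terms, the window size and the sample text are all assumptions.

```python
import re


def proximity_matches(text, terms, max_span=10):
    """Find windows of at most `max_span` tokens containing all `terms`.

    Returns (start_token, end_token, window_text) tuples. Terms are
    assumed to be single words and are matched case-insensitively.
    """
    tokens = re.findall(r"\w+", text.lower())
    wanted = {t.lower() for t in terms}
    last_seen = {}  # most recent position of each query term
    matches = []
    for i, tok in enumerate(tokens):
        if tok not in wanted:
            continue
        last_seen[tok] = i
        if len(last_seen) == len(wanted):
            # Smallest window ending at i that contains every term.
            start = min(last_seen.values())
            if i - start + 1 <= max_span:
                matches.append((start, i, " ".join(tokens[start:i + 1])))
    return matches


# Hypothetical abstract text and query.
text = ("BRCA1 mutations increase the risk of breast cancer. Decades of "
        "unrelated work on imaging, screening and public health policy "
        "followed before BRCA1 was mentioned again.")
matches = proximity_matches(text, ["BRCA1", "cancer"], max_span=8)
print(matches)
```

The second mention of BRCA1 is 15 tokens away from "cancer", so it is not reported. A plain Boolean AND query would treat both mentions alike.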
