Similar articles
20 similar articles found (search time: 31 ms)
1.
Dengue is a common and growing problem worldwide, with an estimated 70–140 million cases per year. Traditional, healthcare-based, government-implemented dengue surveillance is resource intensive and slow. As global Internet use has increased, novel, Internet-based disease monitoring tools have emerged. Google Dengue Trends (GDT) uses near real-time search query data to create an index of dengue incidence that is a linear proxy for traditional surveillance. Studies have shown that GDT correlates highly with dengue incidence in multiple countries on a large spatial scale. This study addresses the heterogeneity of GDT at smaller spatial scales, assessing its accuracy at the state-level in Mexico and identifying factors that are associated with its accuracy. We used Pearson correlation to estimate the association between GDT and traditional dengue surveillance data for Mexico at the national level and for 17 Mexican states. Nationally, GDT captured approximately 83% of the variability in reported cases over the 9 study years. The correlation between GDT and reported cases varied from state to state, capturing anywhere from 1% of the variability in Baja California to 88% in Chiapas, with higher accuracy in states with higher dengue average annual incidence. A model including annual average maximum temperature, precipitation, and their interaction accounted for 81% of the variability in GDT accuracy between states. This climate model was the best indicator of GDT accuracy, suggesting that GDT works best in areas with intense transmission, particularly where local climate is well suited for transmission. Internet accessibility (average ∼36%) did not appear to affect GDT accuracy. While GDT seems to be a less robust indicator of local transmission in areas of low incidence and unfavorable climate, it may indicate cases among travelers in those areas. 
Identifying the strengths and limitations of novel surveillance systems is critical if these types of data are to be used in public health decisions and forecasting models.
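The "variability captured" figures quoted above are squared Pearson correlations (r²). A minimal sketch of that calculation, using invented numbers rather than the study's GDT and case data:

```python
# Hypothetical illustration: "variability captured" is the square of the
# Pearson correlation between a search-based index and reported cases.
# Both series below are made up, not the study's data.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

gdt_index = [1.0, 2.1, 3.2, 2.7, 4.1, 5.0]   # hypothetical GDT values
reported = [12, 25, 33, 30, 44, 51]          # hypothetical case counts

r = pearson_r(gdt_index, reported)
variance_captured = r ** 2   # e.g. the study's 83% at the national level
print(round(variance_captured, 3))
```

On this scale, the study's national figure of 83% corresponds to r ≈ 0.91.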

2.
Economies are instances of complex socio-technical systems that are shaped by the interactions of large numbers of individuals. The individual behavior and decision-making of consumer agents is determined by complex psychological dynamics that include their own assessment of present and future economic conditions as well as those of others, potentially leading to feedback loops that affect the macroscopic state of the economic system. We propose that the large-scale interactions of a nation's citizens with its online resources can reveal the complex dynamics of their collective psychology, including their assessment of future system states. Here we introduce a behavioral index of Chinese Consumer Confidence (C3I) that computationally relates large-scale online search behavior recorded by Google Trends data to the macroscopic variable of consumer confidence. Our results indicate that such computational indices may reveal the components and complex dynamics of consumer psychology as a collective socio-economic phenomenon, potentially leading to improved and more refined economic forecasting.

3.

Background

In South Korea, there is currently no syndromic surveillance system using internet search data, including Google Flu Trends. The purpose of this study was to investigate the correlation between national influenza surveillance data and Google Trends in South Korea.

Methods

Our study was based on a publicly available search engine database, Google Trends, using 12 influenza-related queries, from September 9, 2007 to September 8, 2012. National surveillance data were obtained from the Korea Centers for Disease Control and Prevention (KCDC) influenza-like illness (ILI) and virologic surveillance system. Pearson's correlation coefficients were calculated to compare the national surveillance and the Google Trends data for the overall period and for 5 influenza seasons.

Results

The correlation coefficient between the KCDC ILI and virologic surveillance data was 0.72 (p<0.05). The highest correlation was between the Google Trends query of H1N1 and the ILI data, with a correlation coefficient of 0.53 (p<0.05), for the overall study period. When compared with the KCDC virologic data, the Google Trends query of bird flu had the highest correlation with a correlation coefficient of 0.93 (p<0.05) in the 2010-11 season. The following queries showed a statistically significant correlation coefficient compared with ILI data for three consecutive seasons: Tamiflu (r = 0.59, 0.86, 0.90, p<0.05), new flu (r = 0.64, 0.43, 0.70, p<0.05) and flu (r = 0.68, 0.43, 0.77, p<0.05).

Conclusions

In our study, we found that Google Trends data for certain influenza-related queries correlated with national surveillance data in South Korea. The results of this study showed that Google Trends in the Korean language can be used as complementary data for influenza surveillance, but it was insufficient for use in predictive models such as Google Flu Trends.

4.
The goal of influenza-like illness (ILI) surveillance is to determine the timing, location and magnitude of outbreaks by monitoring the frequency and progression of clinical case incidence. Advances in computational and information technology have allowed for automated collection of higher volumes of electronic data and more timely analyses than previously possible. Novel surveillance systems, including those based on internet search query data like Google Flu Trends (GFT), are being used as surrogates for clinically-based reporting of influenza-like-illness (ILI). We investigated the reliability of GFT during the last decade (2003 to 2013), and compared weekly public health surveillance with search query data to characterize the timing and intensity of seasonal and pandemic influenza at the national (United States), regional (Mid-Atlantic) and local (New York City) levels. We identified substantial flaws in the original and updated GFT models at all three geographic scales, including completely missing the first wave of the 2009 influenza A/H1N1 pandemic, and greatly overestimating the intensity of the A/H3N2 epidemic during the 2012/2013 season. These results were obtained for both the original (2008) and the updated (2009) GFT algorithms. The performance of both models was problematic, perhaps because of changes in internet search behavior and differences in the seasonality, geographical heterogeneity and age-distribution of the epidemics between the periods of GFT model-fitting and prospective use. We conclude that GFT data may not provide reliable surveillance for seasonal or pandemic influenza and should be interpreted with caution until the algorithm can be improved and evaluated. Current internet search query data are no substitute for timely local clinical and laboratory surveillance, or national surveillance based on local data collection. 
New-generation surveillance systems such as GFT should incorporate near-real-time electronic health data and computational methods for continued model-fitting and ongoing evaluation and improvement.

5.
The estimation of disease prevalence from online search engine data (e.g., Google Flu Trends (GFT)) has received a considerable amount of scholarly and public attention in recent years. While the utility of search engine data for disease surveillance has been demonstrated, the scientific community still seeks ways to identify and reduce biases that are embedded in search engine data. The primary goal of this study is to explore new ways of improving the accuracy of disease prevalence estimations by combining traditional disease data with search engine data. A novel method, Biased Sentinel Hospital-based Area Disease Estimation (B-SHADE), is introduced to reduce search engine data bias from a geographical perspective. To monitor search trends on Hand, Foot and Mouth Disease (HFMD) in Guangdong Province, China, we tested our approach by selecting 11 keywords from the Baidu index platform, a Chinese big-data analytics platform similar to GFT. The correlation between the number of real cases and the composite index was 0.8. After decomposing the composite index at the city level, we found that only 10 cities presented a correlation of close to 0.8 or higher. These cities were found to be more stable with respect to search volume, and they were selected as sample cities in order to estimate the search volume of the entire province. After the estimation, the correlation improved from 0.8 to 0.864. After fitting the revised search volume with historical cases, the mean absolute error was 11.19% lower than it was when the original search volume and historical cases were combined. To our knowledge, this is the first study to reduce search engine data bias levels through the use of rigorous spatial sampling strategies.
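A rough sketch of the spatial-sampling idea described above (not the authors' B-SHADE implementation): keep only cities whose search volume tracks local case counts, then build the provincial index from those stable sample cities alone. City names and all numbers are invented.

```python
# Toy illustration of correlation-based sample-city selection.
# "CityA" tracks its cases well; "CityB" does not, so it is dropped.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

city_search = {"CityA": [1, 2, 3, 4], "CityB": [4, 1, 3, 2]}
city_cases = {"CityA": [10, 20, 30, 40], "CityB": [10, 20, 30, 40]}

# keep cities whose search volume correlates strongly with local cases
stable = [c for c in city_search
          if pearson_r(city_search[c], city_cases[c]) >= 0.8]

# provincial index built only from the stable sample cities
province_index = [sum(vals) for vals in zip(*(city_search[c] for c in stable))]
print(stable, province_index)
```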

6.
Cyberinfrastructure is a product of the information age that provides a framework for informing adaptive management of ecological entities under the impact of regional and global change. It supports proximity monitoring, user-friendly data management, knowledge discovery by data synthesis, and decision making by forecasting. A workflow is proposed that suits the iterative nature of adaptive management. It takes advantage of novel sensor, genomics, and communication technology for ecological monitoring; of ontologies, semantic webs, and blockchain for data management; and of hybrid, machine, and deep learning concepts for data synthesis and forecasting. Forecasting at different time horizons guides decision making for adjusting management and continuing monitoring. This review aims to make researchers, decision makers, and stakeholders aware of currently existing technology to make better use of ecological data and models for timely and evidence-based decisions.

7.
Chikungunya, a mosquito-borne disease, is a growing threat in Brazil, where over 640,000 cases have been reported since 2017. However, there are often long delays between diagnoses of chikungunya cases and their entry in the national monitoring system, leaving policymakers without the up-to-date case count statistics they need. In contrast, weekly data on Google searches for chikungunya is available with no delay. Here, we analyse whether Google search data can help improve rapid estimates of chikungunya case counts in Rio de Janeiro, Brazil. We build on a Bayesian approach suitable for data that is subject to long and varied delays, and find that including Google search data reduces both model error and uncertainty. These improvements are largest during epidemics, which are particularly important periods for policymakers. Including Google search data in chikungunya surveillance systems may therefore help policymakers respond to future epidemics more quickly.

8.
Public interest in most aspects of the environment is sharply declining relative to other subjects, as measured by internet searches performed on Google. Changes in the public's search behavior are closely tied to its interests, and those interests are critical to driving public policy. Google Insights for Search (GIFS) was a tool that provided access to search data; it has since been merged into Google Trends. We used GIFS to obtain data for 19 environment-related terms from 2001 to 2009. The only environment-related term with a large positive slope was climate change. All other terms that we queried had strong negative slopes, indicating that searches for these topics dropped over the last decade. Our results suggest that the public is growing less interested in the environment.
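The slope-based reading of search interest above can be sketched as a least-squares trend fit per term; the annual index values below are invented for illustration:

```python
# Fit a least-squares slope to annual search volume for a term;
# a negative slope indicates declining public interest.
def slope(years, volume):
    n = len(years)
    mx, my = sum(years) / n, sum(volume) / n
    num = sum((x - mx) * (y - my) for x, y in zip(years, volume))
    den = sum((x - mx) ** 2 for x in years)
    return num / den

years = list(range(2001, 2010))
pollution = [90, 85, 78, 70, 66, 60, 55, 50, 47]        # hypothetical index
climate_change = [10, 12, 15, 20, 26, 33, 41, 50, 60]   # hypothetical index

# declining vs. rising interest, as in the abstract's contrast
print(slope(years, pollution) < 0, slope(years, climate_change) > 0)
```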

9.
Experimental studies in the area of Psychology and Behavioral Economics have suggested that people change their search pattern in response to positive and negative events. Using Internet search data provided by Google, we investigated the relationship between stock-specific events and related Google searches. We studied daily data from 13 stocks from the Dow-Jones and NASDAQ100 indices, over a period of 4 trading years. Focusing on periods in which stocks were extensively searched (Intensive Search Periods), we found a correlation between the magnitude of stock returns at the beginning of the period and the volume, peak, and duration of search generated during the period. This relation between magnitudes of stock returns and subsequent searches was considerably magnified in periods following negative stock returns. Yet, we did not find that intensive search periods following losses were associated with more Google searches than periods following gains. Thus, rather than increasing search, losses improved the fit between people’s search behavior and the extent of real-world events triggering the search. The findings demonstrate the robustness of the attentional effect of losses.

10.
ABSTRACT: BACKGROUND: In systems biology, the task of reverse engineering gene pathways from data has been limited not just by the curse of dimensionality (the interaction space is huge) but also by systematic error in the data. The gene expression barcode reduces spurious association driven by batch effects and probe effects. The binary nature of the resulting expression calls lends itself perfectly to modern regularization approaches that thrive with dimensionality. RESULTS: The Partitioned LASSO-Patternsearch algorithm is proposed to identify patterns of multiple dichotomous risk factors for outcomes of interest in genomic studies. A partitioning scheme is used to identify promising patterns by solving many LASSO-Patternsearch subproblems in parallel. All variables that survive this stage proceed to an aggregation stage, where the most significant patterns are identified by solving a reduced LASSO-Patternsearch problem in just these variables. This approach was applied to genetic data sets with expression levels dichotomized by the gene expression barcode. Most of the genes and second-order interactions thus selected are known to be related to the outcomes. CONCLUSIONS: We demonstrate with simulations and data analyses that the proposed method not only selects variables and patterns more accurately, but also provides smaller models with better prediction accuracy, in comparison to several competing methodologies.
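The two-stage partition-then-aggregate scheme can be sketched on simulated binary data. This is a toy version only: a simple univariate association score stands in for the LASSO-Patternsearch subproblems, and every feature name is invented.

```python
# Stage 1: screen binary features within partitions (in the paper, each
# partition is a LASSO-Patternsearch subproblem solved in parallel).
# Stage 2: aggregate the survivors and pick the strongest signal.
import random

def association(feature, outcome):
    # fraction of samples where the binary feature matches the outcome;
    # 0.5 is chance level for independent features
    return sum(f == o for f, o in zip(feature, outcome)) / len(outcome)

random.seed(0)
n = 200
outcome = [random.randint(0, 1) for _ in range(n)]

features = {}
# one informative feature (matches the outcome ~90% of the time)...
features["gene_signal"] = [o if random.random() < 0.9 else 1 - o
                           for o in outcome]
# ...and nine pure-noise features
for j in range(9):
    features[f"gene_noise{j}"] = [random.randint(0, 1) for _ in range(n)]

names = list(features)
partitions = [names[i::3] for i in range(3)]      # stage 1: split the work
survivors = [f for part in partitions for f in part
             if abs(association(features[f], outcome) - 0.5) > 0.2]
# stage 2: refit on survivors only (here: rank by association strength)
best = max(survivors, key=lambda f: abs(association(features[f], outcome) - 0.5))
print(best)
```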

11.
Theory suggests that human behavior has implications for disease spread. We examine the hypothesis that individuals engage in voluntary defensive behavior during an epidemic. We estimate the number of passengers missing previously purchased flights as a function of concern for swine flu or A/H1N1 influenza using 1.7 million detailed flight records, Google Trends, and the World Health Organization's FluNet data. We estimate that concern over “swine flu,” as measured by Google Trends, accounted for 0.34% of missed flights during the epidemic. The Google Trends data correlate strongly with media attention, but poorly (at times negatively) with reported cases in FluNet. Passengers show no response to reported cases. Passengers skipping their purchased trips forwent at least $50 million in travel-related benefits; responding to actual cases would have cut this estimate in half. Thus, people appear to respond to an epidemic by voluntarily engaging in self-protective behavior, but this behavior may not be responsive to objective measures of risk. Clearer risk communication could substantially reduce epidemic costs. That people undertake costly risk-reduction behavior, such as forgoing nonrefundable flights, suggests they may also make less costly behavioral adjustments to avoid infection. Accounting for defensive behaviors may be important for forecasting epidemics, but linking behavior with epidemics likely requires consideration of risk communication.

12.
With the continuous growth of internet usage, Google Trends has emerged as a source of information to investigate how social trends evolve over time. Knowing how the level of interest in conservation topics—approximated using Google search volume—varies over time can help support targeted conservation science communication. However, the evolution of search volume over time and the mechanisms that drive peaks in searches are poorly understood. We conducted time series analyses on Google search data from 2004 to 2013 to investigate: (i) whether interests in selected conservation topics have declined and (ii) the effect of news reporting and academic publishing on search volume. Although trends were sensitive to the term used as benchmark, we did not find that public interest towards conservation topics such as climate change, ecosystem services, deforestation, orangutan, invasive species and habitat loss was declining. We found, however, a robust downward trend for endangered species and an upward trend for ecosystem services. The quantity of news articles was related to patterns in Google search volume, whereas the number of research articles was not a good predictor but lagged behind Google search volume, indicating the role of news in the transfer of conservation science to the public.

13.
Background

Web queries are now widely used for modeling, nowcasting and forecasting influenza-like illness (ILI). However, given that ILI attack rates vary significantly across ages, in terms of both magnitude and timing, little is known about whether the association between ILI morbidity and ILI-related queries is comparable across different age-groups. The present study aimed to investigate features of the association between ILI morbidity and ILI-related query volume from the perspective of age.

Methods

Since Google Flu Trends is unavailable in Italy, Google Trends was used to identify entry terms that correlated highly with official ILI surveillance data. All-age and age-class-specific modeling was performed by means of linear models with generalized least-squares estimation. Hold-out validation was used to quantify prediction accuracy. For purposes of comparison, predictions generated by exponential smoothing were computed.

Results

Five search terms showed high correlation coefficients of > 0.6. In comparison with exponential smoothing, the all-age query-based model correctly predicted the peak time and yielded a higher correlation coefficient with observed ILI morbidity (0.978 vs. 0.929). However, query-based prediction of ILI morbidity was associated with a greater error. Age-class-specific query-based models varied significantly in terms of prediction accuracy. In the 0–4 and 25–44-year age-groups, the models did well and outperformed exponential smoothing predictions; in the 15–24 and ≥ 65-year age-classes, however, the query-based models were inaccurate and greatly overestimated peak height. In all but one age-class, peak timing predicted by the query-based models coincided with observed timing.

Conclusions

The accuracy of web query-based models in predicting ILI morbidity rates can differ across age-groups. Greater age-specific detail may be useful in flu query-based studies in order to account for age-specific features of the epidemiology of ILI.
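The two predictor types compared above can be sketched as follows. This is a minimal illustration with invented numbers: ordinary least squares stands in for the paper's generalized least-squares fits.

```python
# Contrast: exponential smoothing extrapolates from past ILI values only,
# while a query-based linear model exploits the current week's searches.
def exp_smooth_forecast(series, alpha=0.5):
    # simple exponential smoothing: the final level is the one-step forecast
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def linear_fit_predict(queries, ili, new_query):
    # fit ILI ~ query volume by least squares, predict for a new query value
    n = len(queries)
    mq, mi = sum(queries) / n, sum(ili) / n
    b = (sum((q - mq) * (y - mi) for q, y in zip(queries, ili))
         / sum((q - mq) ** 2 for q in queries))
    a = mi - b * mq
    return a + b * new_query

ili = [5, 8, 14, 22, 30]        # hypothetical weekly ILI rate
queries = [10, 18, 29, 45, 61]  # hypothetical search volume, same weeks

# smoothing lags behind the rising epidemic...
print(round(exp_smooth_forecast(ili), 1))
# ...while the query-based model extrapolates from this week's searches
print(round(linear_fit_predict(queries, ili, 80), 1))
```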

14.
Infectious disease is a leading threat to public health, economic stability, and other key social structures. Efforts to mitigate these impacts depend on accurate and timely monitoring to measure the risk and progress of disease. Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social internet data, such as social media and search queries, are emerging. These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness. We examine a freely available, open data source for this use: access logs from the online encyclopedia Wikipedia. Using linear models, language as a proxy for location, and a systematic yet simple article selection procedure, we tested 14 location-disease combinations and demonstrate that these data feasibly support an approach that overcomes these challenges. Specifically, our proof-of-concept yields models with goodness-of-fit of up to 0.92, forecasting value up to the 28 days tested, and several pairs of models similar enough to suggest that transferring models from one location to another without re-training is feasible. Based on these preliminary results, we close with a research agenda designed to overcome these challenges and produce a disease monitoring and forecasting system that is significantly more effective, robust, and globally comprehensive than the current state of the art.

15.
Malaria is one of the most severe problems faced by the world even today. Understanding causative factors such as age, sex, social factors, and environmental variability, as well as the underlying transmission dynamics of the disease, is important for epidemiological research on malaria and its eradication. Thus, the development of a suitable modeling approach and methodology, based on the available data on the incidence of the disease and other related factors, is of utmost importance. In this study, we developed a simple non-linear regression methodology for modeling and forecasting malaria incidence in Chennai city, India, and predicted future disease incidence with a high confidence level. We considered three types of data to develop the regression methodology: a longer time series of Slide Positivity Rates (SPR) of malaria; a shorter, one-year time series of deaths due to Plasmodium vivax; and spatial data (zonal distribution of P. vivax deaths) for the city, along with climatic factors, population, and previous incidence of the disease. We performed variable selection by a simple correlation study, identified initial relationships between variables through non-linear curve fitting, and used multi-step methods for the induction of variables in the non-linear regression analysis, along with Gauss-Markov models and ANOVA for testing the predictions, assessing validity, and constructing confidence intervals. The results demonstrate the applicability of our method to different types of data and the autoregressive nature of the forecasting, and show high prediction power for both SPR and P. vivax deaths, with the one-lag SPR values playing an influential role and proving useful for better prediction. Different climatic factors are identified as playing a crucial role in shaping the disease curve. Further, disease incidence at the zonal level, and the effect of causative factors on different zonal clusters, indicate the pattern of malaria prevalence in the city.
The study also demonstrates that, with excellent climatic forecast models readily available, this method can predict disease incidence at long forecasting horizons with a high degree of efficiency, and that such a technique could underpin a useful region- or nation-wide early warning system for disease prevention and control activities.
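As an illustration of the non-linear curve-fitting step (a sketch only, not the authors' model), one common approach fits y = a·e^(b·x) between a climate variable and incidence by log-linearisation; the rainfall and SPR values below are invented:

```python
# Fit y = a * exp(b * x) by regressing log(y) on x (log-linearisation).
from math import exp, log

def fit_exponential(x, y):
    ly = [log(v) for v in y]
    n = len(x)
    mx, ml = sum(x) / n, sum(ly) / n
    b = (sum((xi - mx) * (li - ml) for xi, li in zip(x, ly))
         / sum((xi - mx) ** 2 for xi in x))
    a = exp(ml - b * mx)
    return a, b

rainfall = [1, 2, 3, 4, 5]            # hypothetical climate covariate
spr = [2.2, 4.1, 8.0, 16.5, 31.9]     # hypothetical SPR, roughly doubling

a, b = fit_exponential(rainfall, spr)
print(round(a, 2), round(b, 2))
```

Since the toy series roughly doubles per unit of rainfall, the fitted b should land near ln 2 ≈ 0.69.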

16.
Hyracoids have been allied with either perissodactyls or tethytheres (i.e., Proboscidea + Sirenia) based on morphological data. The latter hypothesis, termed Paenungulata, is corroborated by numerous molecular studies. However, molecular studies have failed to support Tethytheria, a group that is supported by morphological data. We examined relationships among living paenungulate orders using a multigene data set that included sequences from four mitochondrial genes (12S rRNA, tRNA valine, 16S rRNA, cytochrome b) and four nuclear genes (aquaporin, A2AB, IRBP, vWF). Nineteen maximum-likelihood models were employed, including models with process partitions for base composition and substitution parameterizations. With the inclusion of partitions with a heterogeneous base composition, 18 of 19 models favored Hyracoidea + Sirenia. All 19 models favored Hyracoidea + Sirenia after excluding heterogeneous base composition partitions. Most of the support for Hyracoidea + Sirenia derived from the mitochondrial genes (bootstrap support ranged from 51 to 99%); Tethytheria, in turn, received 0 to 19% support in different analyses. Bootstrap support deriving from the nuclear genes was more evenly split among the competing hypotheses (3 to 45% for Tethytheria; 17.5 to 62% for Hyracoidea + Sirenia). Lineage-specific rate variation among both mitochondrial and nuclear genes may contribute to the different results that were obtained with mitochondrial versus nuclear data. Whether Tethytheria or a competing hypothesis is correct, short internodes on the molecular phylogenies suggest that paenungulate orders diverged from each other over a 5- to 8-million-year time window extending from the late Paleocene into the early Eocene. We also used likelihood-ratio tests to compare different models of sequence evolution. A gamma distribution of rates results in a greater improvement in likelihood scores than does an allowance for invariant sites. 
Twenty-one rate partitions corresponding to stems, loops, and codon positions of different genes result in higher likelihood scores than a gamma distribution of rates and/or an allowance for invariant sites. Process partitions of the data that incorporate base composition and substitution parameterizations result in significant improvements in likelihood scores in comparison to models that allow only for relative rate differences among partitions.

17.

Background

The use of internet search data has been demonstrated to be effective at predicting influenza incidence. This approach may be more successful for dengue, which has large variation in annual incidence and a more distinctive clinical presentation and mode of transmission.

Methods

We gathered freely-available dengue incidence data from Singapore (weekly incidence, 2004–2011) and Bangkok (monthly incidence, 2004–2011). Internet search data for the same period were downloaded from Google Insights for Search. Search terms were chosen to reflect three categories of dengue-related search: nomenclature, signs/symptoms, and treatment. We compared three models to predict incidence: a step-down linear regression, generalized boosted regression, and negative binomial regression. Logistic regression and Support Vector Machine (SVM) models were used to predict a binary outcome defined by whether dengue incidence exceeded a chosen threshold. Incidence prediction models were assessed using the Pearson correlation between predicted and observed dengue incidence. Logistic and SVM model performance was assessed by the area under the receiver operating characteristic curve. Models were validated using multiple cross-validation techniques.

Results

The linear model selected by AIC step-down was found to be superior to the other models considered. In Bangkok, the model achieved a correlation of 0.869 between fitted and observed incidence; in Singapore, the correlation was 0.931. In both Singapore and Bangkok, SVM models outperformed logistic regression in predicting periods of high incidence. The AUC for the SVM models using the 75th percentile cutoff is 0.906 in Singapore and 0.960 in Bangkok.

Conclusions

Internet search terms predict dengue incidence, including periods of high incidence, with high accuracy and may prove useful in areas with underdeveloped surveillance systems. The methods presented here use freely available data and analysis tools and can be readily adapted to other settings.
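The AUC used above to assess the logistic and SVM classifiers can be computed directly from ranked scores. A minimal sketch with invented labels and scores (standing in for either classifier's output on the threshold-exceedance task):

```python
# Rank-based AUC: the probability that a randomly chosen positive week
# scores higher than a randomly chosen negative week (ties count half).
def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 1 = week exceeded the 75th-percentile incidence threshold (hypothetical)
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.5, 0.8, 0.2, 0.7, 0.9, 0.4, 0.3]  # hypothetical model scores

print(auc(labels, scores))
```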

18.
Biophysical Journal, 2019, 116(12): 2367–2377
A one-dimensional (1D) search is an essential step in DNA target recognition. Theoretical studies have suggested that the sequence dependence of 1D diffusion can help resolve the competing demands of a fast search and high target affinity, a conflict known as the speed-selectivity paradox. The resolution requires that the diffusion energy landscape is correlated with the underlying specific binding energies. In this work, we report observations of a 1D search by quantum dot-labeled EcoRI. Our data supports the view that proteins search DNA via rotation-coupled sliding over a corrugated energy landscape. We observed that whereas EcoRI primarily slides along DNA at low salt concentrations, at higher concentrations, its diffusion is a combination of sliding and hopping. We also observed long-lived pauses at genomic star sites, which differ by a single nucleotide from the target sequence. To reconcile these observations with prior biochemical and structural data, we propose a model of search in which the protein slides over a sequence-independent energy landscape during fast search but rapidly interconverts with a “hemispecific” binding mode in which a half site is probed. This half site interaction stabilizes the transition to a fully specific mode of binding, which can then lead to target recognition.

19.
Sequence data often have competing signals that are detected by network programs or Lento plots. Such data can be formed by generating sequences on more than one tree, and combining the results, a mixture model. We report that with such mixture models, the estimates of edge (branch) lengths from maximum likelihood (ML) methods that assume a single tree are biased. Based on the observed number of competing signals in real data, such a bias of ML is expected to occur frequently. Because network methods can recover competing signals more accurately, there is a need for ML methods allowing a network. A fundamental problem is that mixture models can have more parameters than can be recovered from the data, so that some mixtures are not, in principle, identifiable. We recommend that network programs be incorporated into best practice analysis, along with ML and Bayesian trees.

20.
Seasonal influenza epidemics cause consistent, considerable, widespread loss annually in terms of economic burden, morbidity, and mortality. With access to accurate and reliable forecasts of a current or upcoming influenza epidemic’s behavior, policy makers can design and implement more effective countermeasures. This past year, the Centers for Disease Control and Prevention hosted the “Predict the Influenza Season Challenge”, with the task of predicting key epidemiological measures for the 2013–2014 U.S. influenza season with the help of digital surveillance data. We developed a framework for in-season forecasts of epidemics using a semiparametric Empirical Bayes framework, and applied it to predict the weekly percentage of outpatient doctors visits for influenza-like illness, and the season onset, duration, peak time, and peak height, with and without using Google Flu Trends data. Previous work on epidemic modeling has focused on developing mechanistic models of disease behavior and applying time series tools to explain historical data. However, tailoring these models to certain types of surveillance data can be challenging, and overly complex models with many parameters can compromise forecasting ability. Our approach instead produces possibilities for the epidemic curve of the season of interest using modified versions of data from previous seasons, allowing for reasonable variations in the timing, pace, and intensity of the seasonal epidemics, as well as noise in observations. Since the framework does not make strict domain-specific assumptions, it can easily be applied to some other diseases with seasonal epidemics. This method produces a complete posterior distribution over epidemic curves, rather than, for example, solely point predictions of forecasting targets. We report prospective influenza-like-illness forecasts made for the 2013–2014 U.S. 
influenza season, and compare the framework’s cross-validated prediction error on historical data to that of a variety of simpler baseline predictors.
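The curve-generation idea described above can be sketched as perturbing a historical season's curve in timing, intensity, and noise to form a prior over plausible epidemic curves. This is a toy version under those assumptions, not the authors' framework:

```python
# Build candidate epidemic curves by shifting, scaling, and adding noise
# to one historical season, then treat the candidates as prior draws.
import random

random.seed(1)

historical = [1, 2, 5, 9, 14, 10, 6, 3, 1]   # hypothetical past ILI curve

def candidate(curve):
    shift = random.choice([-1, 0, 1])    # peak a week earlier/later
    scale = random.uniform(0.8, 1.2)     # milder/stronger season
    shifted = curve[-shift:] + curve[:-shift] if shift else list(curve)
    # small observation noise on every week, floored at zero
    return [max(0.0, scale * v + random.gauss(0, 0.2)) for v in shifted]

prior_draws = [candidate(historical) for _ in range(100)]
peak_heights = [max(c) for c in prior_draws]
print(min(peak_heights) > 8, max(peak_heights) < 20)
```

Conditioning these draws on in-season observations would then yield the posterior over curves (and hence over onset, peak time, and peak height) that the abstract describes.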


Copyright©北京勤云科技发展有限公司  京ICP备09084417号