Similar Articles (20 matches)
1.
Disease prevalence is ideally estimated using a 'gold standard' to ascertain true disease status on all subjects in a population of interest. In practice, however, the gold standard may be too costly or invasive to be applied to all subjects, in which case a two-phase design is often employed. Phase 1 data consisting of inexpensive and non-invasive screening tests on all study subjects are used to determine the subjects that receive the gold standard in the second phase. Naive estimates of prevalence in two-phase studies can be biased (verification bias). Imputation and re-weighting estimators are often used to avoid this bias. We contrast the forms and attributes of the various prevalence estimators. Distribution theory and simulation studies are used to investigate their bias and efficiency. We conclude that the semiparametric efficient approach is the preferred method for prevalence estimation in two-phase studies. It is more robust and comparable in its efficiency to imputation and other re-weighting estimators. It is also easy to implement. We use this approach to examine the prevalence of depression in adolescents with data from the Great Smoky Mountain Study.
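The re-weighting idea behind these estimators can be sketched in a few lines. The following is a toy inverse-probability-weighted estimator on simulated data with made-up screening and verification probabilities; it illustrates verification bias and its correction, not the semiparametric efficient estimator the abstract recommends.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-phase study: everyone gets a cheap screen (phase 1);
# screen-positives are verified by the gold standard more often (phase 2).
n = 10_000
disease = rng.random(n) < 0.10                     # true prevalence 10%
screen = np.where(disease, rng.random(n) < 0.9,    # imperfect screening test
                  rng.random(n) < 0.2)
p_verify = np.where(screen, 0.8, 0.1)              # verification depends on screen
verified = rng.random(n) < p_verify

# Naive estimate uses only the verified subjects -> verification bias
naive = disease[verified].mean()

# Inverse-probability re-weighting corrects the bias
ipw = np.sum(disease[verified] / p_verify[verified]) / n
```

Because screen-positives are both enriched for disease and verified more often, the naive estimate is inflated well above 10%, while the re-weighted estimate recovers the true prevalence.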

2.

Background

Google Flu Trends (GFT) uses anonymized, aggregated internet search activity to provide near-real time estimates of influenza activity. GFT estimates have shown a strong correlation with official influenza surveillance data. The 2009 influenza virus A (H1N1) pandemic [pH1N1] provided the first opportunity to evaluate GFT during a non-seasonal influenza outbreak. In September 2009, an updated United States GFT model was developed using data from the beginning of pH1N1.

Methodology/Principal Findings

We evaluated the accuracy of each U.S. GFT model by comparing weekly estimates of ILI (influenza-like illness) activity with the U.S. Outpatient Influenza-like Illness Surveillance Network (ILINet). For each GFT model we calculated the correlation and RMSE (root mean square error) between model estimates and ILINet for four time periods: pre-H1N1, Summer H1N1, Winter H1N1, and H1N1 overall (Mar 2009–Dec 2009). We also compared the number of queries, query volume, and types of queries (e.g., influenza symptoms, influenza complications) in each model. Both models' estimates were highly correlated with ILINet pre-H1N1 and over the entire surveillance period, although the original model underestimated the magnitude of ILI activity during pH1N1. The updated model was more correlated with ILINet than the original model during Summer H1N1 (r = 0.95 and 0.29, respectively). The updated model included more search query terms than the original model, with more queries directly related to influenza infection, whereas the original model contained more queries related to influenza complications.
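The two evaluation metrics are standard and easy to reproduce. A minimal sketch, using illustrative weekly values rather than actual GFT or ILINet data:

```python
import numpy as np

def correlation_and_rmse(model, ilinet):
    """Compare weekly model estimates against surveillance values."""
    model, ilinet = np.asarray(model, float), np.asarray(ilinet, float)
    r = np.corrcoef(model, ilinet)[0, 1]            # Pearson correlation
    rmse = np.sqrt(np.mean((model - ilinet) ** 2))  # root mean square error
    return r, rmse

# Made-up weekly %ILI series, not real GFT/ILINet numbers
gft = [1.1, 1.4, 2.0, 3.1, 4.0, 3.2]
ili = [1.0, 1.5, 2.2, 3.0, 4.2, 3.0]
r, rmse = correlation_and_rmse(gft, ili)
```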

Conclusions

Internet search behavior changed during pH1N1, particularly in the categories “influenza complications” and “term for influenza.” The complications associated with pH1N1, the fact that pH1N1 began in the summer rather than winter, and changes in health-seeking behavior each may have played a part. Both GFT models performed well prior to and during pH1N1, although the updated model performed better during pH1N1, especially during the summer months.

3.
Google Flu Trends (GFT) uses Internet search queries in an effort to provide early warning of increases in influenza-like illness (ILI). In the United States, GFT estimates the percentage of physician visits related to ILI (%ILINet) reported by the Centers for Disease Control and Prevention (CDC). However, during the 2012–13 influenza season, GFT overestimated %ILINet by an appreciable amount and estimated the peak in incidence three weeks late. Using data from 2010–14, we investigated the relationship between GFT estimates (%GFT) and %ILINet. Based on the relationship between the relative change in %GFT and the relative change in %ILINet, we transformed %GFT estimates to better correspond with %ILINet values. In 2010–13, our transformed %GFT estimates were within ±10% of %ILINet values for 17 of the 29 weeks that %ILINet was above the seasonal baseline value determined by the CDC; in contrast, the original %GFT estimates were within ±10% of %ILINet values for only two of these 29 weeks. Relative to the %ILINet peak in 2012–13, the peak in our transformed %GFT estimates was 2% lower and one week later, whereas the peak in the original %GFT estimates was 74% higher and three weeks later. The same transformation improved %GFT estimates using the recalibrated 2013 GFT model in early 2013–14. Our transformed %GFT estimates can be calculated approximately one week before %ILINet values are reported by the CDC and the transformation equation was stable over the time period investigated (2010–13). We anticipate our results will facilitate future use of GFT.
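The abstract does not give the fitted transformation equation, but its core idea can be sketched. The hypothetical form below assumes the relative change in %GFT directly tracks the relative change in %ILINet, so the week-over-week change in %GFT is applied to the last reported %ILINet value; the actual paper fits this relationship to data.

```python
def transform_gft(gft_now, gft_prev, ilinet_prev):
    """One-week-ahead %ILINet estimate from the relative change in %GFT.

    Hypothetical simplification: assumes the relative change in %GFT
    equals the relative change in %ILINet over the same week.
    """
    return ilinet_prev * (gft_now / gft_prev)

# If %GFT rose 20% and last week's reported %ILINet was 2.5,
# the transformed estimate is 3.0.
est = transform_gft(gft_now=3.6, gft_prev=3.0, ilinet_prev=2.5)
```

This kind of anchoring to the most recent surveillance value is one reason a transformed estimate can be available about a week before the official %ILINet report.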

4.
The goal of influenza-like illness (ILI) surveillance is to determine the timing, location and magnitude of outbreaks by monitoring the frequency and progression of clinical case incidence. Advances in computational and information technology have allowed for automated collection of higher volumes of electronic data and more timely analyses than previously possible. Novel surveillance systems, including those based on internet search query data like Google Flu Trends (GFT), are being used as surrogates for clinically based reporting of ILI. We investigated the reliability of GFT during the last decade (2003 to 2013), and compared weekly public health surveillance with search query data to characterize the timing and intensity of seasonal and pandemic influenza at the national (United States), regional (Mid-Atlantic) and local (New York City) levels. We identified substantial flaws in the original and updated GFT models at all three geographic scales, including completely missing the first wave of the 2009 influenza A/H1N1 pandemic, and greatly overestimating the intensity of the A/H3N2 epidemic during the 2012/2013 season. These results were obtained for both the original (2008) and the updated (2009) GFT algorithms. The performance of both models was problematic, perhaps because of changes in internet search behavior and differences in the seasonality, geographical heterogeneity and age-distribution of the epidemics between the periods of GFT model-fitting and prospective use. We conclude that GFT data may not provide reliable surveillance for seasonal or pandemic influenza and should be interpreted with caution until the algorithm can be improved and evaluated. Current internet search query data are no substitute for timely local clinical and laboratory surveillance, or national surveillance based on local data collection. New generation surveillance systems such as GFT should incorporate the use of near-real time electronic health data and computational methods for continued model-fitting and ongoing evaluation and improvement.

5.
This paper considers questions of standard error and questions of bias in the maximum likelihood estimation of parameters associated with an HLA-linked disease. It is shown that a considerable reduction in standard error is possible using data on population prevalence and parental disease status, if available. Comparison is made with standard errors arising in the shared haplotypes method. The biases considered relate to misspecification of the ascertainment scheme, to incorrect assumptions about parameter values, to the possibility that affected parents have lower fitness than unaffected parents, and to the possibility of within family correlation of penetrance values due to effects of a common environment.

6.
The Coronavirus Disease (COVID-19) pandemic has increased mortality in countries worldwide. To evaluate the impact of the pandemic on mortality, the use of excess mortality rather than reported COVID-19 deaths has been suggested. Excess mortality, however, requires estimation of mortality under nonpandemic conditions. Although many methods exist to forecast mortality, they are either complex to apply, require many sources of information, ignore serial correlation, and/or are influenced by historical excess mortality. We propose a linear mixed model that is easy to apply, requires only historical mortality data, allows for serial correlation, and down-weights the influence of historical excess mortality. Appropriateness of the linear mixed model is evaluated with fit statistics and forecasting accuracy measures for Belgium and the Netherlands. Unlike the commonly used 5-year weekly average, the linear mixed model forecasts year-specific mortality, and as a result improves the estimation of excess mortality for Belgium and the Netherlands.
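The excess-mortality calculation itself is simple once a baseline is chosen. The sketch below uses the 5-year weekly average baseline that the abstract criticizes, with made-up counts; the paper's contribution is replacing this baseline with a linear mixed model forecast (not shown here).

```python
import numpy as np

def excess_mortality(observed, historical):
    """Excess deaths = observed deaths minus an expected baseline.

    Baseline here is the commonly used 5-year weekly average; the paper
    instead forecasts year-specific mortality with a linear mixed model.
    """
    historical = np.asarray(historical, float)  # shape (years, weeks)
    baseline = historical.mean(axis=0)          # average per calendar week
    return np.asarray(observed, float) - baseline

# Hypothetical weekly death counts: 5 historical years x 2 weeks
hist = [[100, 110], [102, 108], [98, 112], [101, 109], [99, 111]]
obs = [130, 115]                                # pandemic-year counts
excess = excess_mortality(obs, hist)            # deaths above the average
```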

7.
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.

8.
Area disease estimation based on sentinel hospital records
Wang JF, Reis BY, Hu MG, Christakos G, Yang WZ, Sun Q, Li ZJ, Li XZ, Lai SJ, Chen HY, Wang DC. PLoS ONE 2011, 6(8): e23428

Background

Population health attributes (such as disease incidence and prevalence) are often estimated using sentinel hospital records, which are subject to multiple sources of uncertainty. Commonly used estimation techniques applied to such biased records can lead to false conclusions and ineffective disease intervention and control. Although some estimators can account for measurement error (in the form of white noise, usually after de-trending), most mainstream health statistics techniques cannot generate unbiased and minimum error variance estimates when the available data are biased.

Methods and Findings

A new technique, called the Biased Sample Hospital-based Area Disease Estimation (B-SHADE), is introduced that generates space-time population disease estimates using biased hospital records. The effectiveness of the technique is empirically evaluated in terms of hospital records of disease incidence (for hand-foot-mouth disease and fever syndrome cases) in Shanghai (China) during a two-year period. The B-SHADE technique uses a weighted summation of sentinel hospital records to derive unbiased and minimum error variance estimates of area incidence. The calculation of these weights is the outcome of a process that combines the available space-time information with a rigorous assessment of both the horizontal relationships between hospital records and the vertical links between each hospital's records and the overall disease situation in the region. In this way, the representativeness of the sentinel hospital records was improved, the possible biases of these records were corrected, and the generated area incidence estimates were best linear unbiased estimates (BLUE). Using the same hospital records, the performance of the B-SHADE technique was compared against two mainstream estimators.

Conclusions

The B-SHADE technique involves a hospital network-based model that blends the optimal estimation features of the Block Kriging method and the sample bias correction efficiency of the ratio estimator method. In this way, B-SHADE can overcome the limitations of both methods: Block Kriging's inadequacy concerning the correction of sample bias and spatial clustering; and the ratio estimator's limitation as regards error minimization. The generality of the B-SHADE technique is further demonstrated by the fact that it reduces to Block Kriging in the case of unbiased samples; to the ratio estimator if there is no correlation between hospitals; and to a simple statistic if the hospital records are neither biased nor space-time correlated. In addition to the theoretical advantages of the B-SHADE technique over the two other methods above, two real world case studies (hand-foot-mouth disease and fever syndrome cases) demonstrated its empirical superiority, as well.
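The estimator's basic form, a weighted summation over sentinel hospitals, is straightforward. In the sketch below the weights are fixed by hand purely for illustration; in B-SHADE they are derived from space-time covariances subject to an unbiasedness constraint (which is why they sum to one here).

```python
import numpy as np

# Hypothetical incidence values reported by three sentinel hospitals
records = np.array([12.0, 30.0, 18.0])

# Illustrative weights; B-SHADE computes these from the horizontal
# (hospital-to-hospital) and vertical (hospital-to-region) relationships.
weights = np.array([0.2, 0.5, 0.3])   # sum to 1 for unbiasedness

area_estimate = float(weights @ records)   # weighted summation
```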

9.
Understanding infectious disease dynamics and the effect on prevalence and incidence is crucial for public health policies. Disease incidence and prevalence are typically not observed directly and increasingly are estimated through the synthesis of indirect information from multiple data sources. We demonstrate how an evidence synthesis approach to the estimation of human immunodeficiency virus (HIV) prevalence in England and Wales can be extended to infer the underlying HIV incidence. Diverse time series of data can be used to obtain yearly "snapshots" (with associated uncertainty) of the proportion of the population in 4 compartments: not at risk, susceptible, HIV positive but undiagnosed, and diagnosed HIV positive. A multistate model for the infection and diagnosis processes is then formulated by expressing the changes in these proportions by a system of differential equations. By parameterizing incidence in terms of prevalence and contact rates, HIV transmission is further modeled. Use of additional data or prior information on demographics, risk behavior change and contact parameters allows simultaneous estimation of the transition rates, compartment prevalences, contact rates, and transmission probabilities.
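The multistate structure can be sketched with a minimal discrete-time update. This is a simplified stand-in for the paper's system of differential equations, covering only three of the four compartments with made-up rates:

```python
def step(S, U, D, incidence_rate, diagnosis_rate, dt=1.0):
    """One Euler step of a minimal susceptible -> undiagnosed -> diagnosed
    HIV model. Rates and structure are illustrative, not the paper's."""
    new_inf = incidence_rate * S * dt   # susceptibles becoming infected
    new_dx = diagnosis_rate * U * dt    # undiagnosed becoming diagnosed
    return S - new_inf, U + new_inf - new_dx, D + new_dx

# Hypothetical compartment proportions and rates
S1, U1, D1 = step(0.9, 0.05, 0.05, incidence_rate=0.01, diagnosis_rate=0.2)
```

Because each flow leaves one compartment and enters another, the proportions always sum to the same total, which is the bookkeeping that lets yearly "snapshots" constrain the transition rates.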

10.
Accurately estimating infection prevalence is fundamental to the study of population health, disease dynamics, and infection risk factors. Prevalence is estimated as the proportion of infected individuals (“individual-based estimation”), but is also estimated as the proportion of samples in which evidence of infection is detected (“anonymous estimation”). The latter method is often used when researchers lack information on individual host identity, which can occur during noninvasive sampling of wild populations or when the individual that produced a fecal sample is unknown. The goal of this study was to investigate biases in individual-based versus anonymous prevalence estimation theoretically and to test whether mathematically derived predictions are evident in a comparative dataset of gastrointestinal helminth infections in nonhuman primates. Using a mathematical model, we predict that anonymous estimates of prevalence will be lower than individual-based estimates when (a) samples from infected individuals do not always contain evidence of infection and/or (b) false negatives occur. The mathematical model further predicts that no difference in bias should exist between anonymous estimation and individual-based estimation when one sample is collected from each individual. Using data on helminth parasites of primates, we find that anonymous estimates of prevalence are significantly and substantially (12.17%) lower than individual-based estimates of prevalence. We also observed that individual-based estimates of prevalence from studies employing single sampling are on average 6.4% higher than anonymous estimates, suggesting a bias toward sampling infected individuals. We recommend that researchers use individual-based study designs with repeated sampling of individuals to obtain the most accurate estimate of infection prevalence. Moreover, to ensure accurate interpretation of their results and to allow for prevalence estimates to be compared among studies, it is essential that authors explicitly describe their sampling designs and prevalence calculations in publications.
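The two estimators differ only in their denominator, which is easy to see in code. A minimal sketch on hypothetical data (host IDs and sample results are invented):

```python
def individual_prevalence(samples_by_host):
    """Proportion of hosts with at least one positive sample."""
    infected = sum(any(s) for s in samples_by_host.values())
    return infected / len(samples_by_host)

def anonymous_prevalence(samples_by_host):
    """Proportion of samples that are positive, ignoring host identity."""
    flat = [s for host in samples_by_host.values() for s in host]
    return sum(flat) / len(flat)

# True = evidence of infection detected in a sample. Host A is infected
# but one of its samples is negative, so the anonymous estimate falls
# below the individual-based one, as the model predicts.
data = {"A": [True, False], "B": [True, True], "C": [False, False]}
ind = individual_prevalence(data)    # 2 of 3 hosts
anon = anonymous_prevalence(data)    # 3 of 6 samples
```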

11.
We consider the possible bias in cancer risk estimation from A-bomb survivors due to selection of the cohort by survival. The paper considers both relevant information from the data and basic theoretical issues involved. The most direct information from the data comes from making various restrictions on the dose-distance range, partly to reduce differential selection and partly just to reduce the magnitude of the selection. These analyses suggest that there are no serious biases, but they are not conclusive. Theoretical considerations include laying out more explicitly than usual just how biases could result from the selection. This involves heterogeneities in the ability to survive acute effects, in baseline and radiogenic cancer rates, and most importantly the correlation between survival-related and cancer-related heterogeneities. Following on this, idealized modeling is used to quantify the extent of possible bias in terms of the assumed values of the magnitude of these heterogeneities and their correlation. It is indicated that these values would need to be very large to introduce substantial bias. Based on all these considerations, it seems unlikely that the bias in cancer risk estimation could be large in relation to other uncertainties in generalizing from what is seen among A-bomb survivors; in particular, indications are that the bias in relative risks is unlikely to be as large as 0.05 to 0.07. For solid cancer this would correspond to bias in the excess relative risk at 1 Sv of at most about 15-20%.

12.
Risk assessment of rainstorm climate and associated agricultural disaster losses in central and eastern China
Using rainstorm climate data from 292 stations in central and eastern China for 1961–2008 together with provincial agricultural disaster records, a rainstorm climate index, an agricultural relative disaster index, and corresponding risk estimation models were constructed using principal component analysis, soft histogram estimation, grey relational analysis, and normal information diffusion, and the risks of rainstorm climate and its agricultural disaster losses in central and eastern China were studied. The results show that rainstorm climate risk in central and eastern China decreases gradually from south to north. The highest risk occurs in Hainan and the coastal areas of Guangdong and Guangxi, followed by central and northern Guangxi and Guangdong; Hubei, Anhui and Jiangxi in the Jianghuai region; and Hunan in the Xiang-Gan region. The lowest risk lies mainly in Northeast China (excluding coastal Liaoning) and in Hebei and Shanxi in northern North China. Agricultural disaster risk is highest in Anhui and Hubei in the Jianghuai region, Hunan in the Xiang-Gan region, and coastal Guangdong, and lowest in Hebei and Henan in North China and Liaoning in Northeast China. Except for Guangdong, the correlation coefficient between the rainstorm climate index and the agricultural relative disaster index exceeded 0.6 in every representative province (P < 0.01). Validation against multi-year agricultural disaster records shows that the two indices can effectively assess actual rainstorm intensity and its impact on agriculture.

13.
SUMMARY: Modern biological experiments create vast amounts of data which are geographically distributed. These datasets consist of petabytes of raw data and billions of documents. Yet to the best of our knowledge, a search engine technology that searches and cross-links all different data types in life sciences does not exist. We have developed a prototype distributed scientific search engine technology, 'Sciencenet', which facilitates rapid searching over this large data space. By 'bringing the search engine to the data', we do not require server farms. This platform also allows users to contribute to the search index and publish their large-scale data to support e-Science. Furthermore, a community-driven method guarantees that only scientific content is crawled and presented. Our peer-to-peer approach is sufficiently scalable for the science web without performance or capacity tradeoff. AVAILABILITY AND IMPLEMENTATION: The free to use search portal web page and the downloadable client are accessible at: http://sciencenet.kit.edu. The web portal for index administration is implemented in ASP.NET, the 'AskMe' experiment publisher is written in Python 2.7, and the backend 'YaCy' search engine is based on Java 1.6.

14.
The study of the effect of large-scale drivers (e.g., climate) of human diseases typically relies on aggregate disease data collected by the government surveillance network. The usual approach to analyze these data, however, often ignores (a) changes in the total number of individuals examined, (b) the bias towards symptomatic individuals in routine government surveillance, and (c) the influence that observations can have on disease dynamics. Here, we highlight the consequences of ignoring the problems listed above and develop a novel modeling framework to circumvent them, which is illustrated using simulations and real malaria data. Our simulations reveal that trends in the number of disease cases do not necessarily imply similar trends in infection prevalence or incidence, due to the strong influence of concurrent changes in sampling effort. We also show that ignoring decreases in the pool of infected individuals due to the treatment of part of these individuals can hamper reliable inference on infection incidence. We propose a model that avoids these problems, being a compromise between phenomenological statistical models and mechanistic disease dynamics models; in particular, a cross-validation exercise reveals that it has better out-of-sample predictive performance than both of these alternative models. Our case study in the Brazilian Amazon reveals that infection prevalence was high in 2004–2008 (prevalence of 4% with 95% CI of 3–5%), with outbreaks (prevalence up to 18%) occurring during the dry season of the year. After this period, infection prevalence decreased substantially (0.9% with 95% CI of 0.8–1.1%), which is due to a large reduction in infection incidence (i.e., incidence in 2008–2010 was approximately one fifth of the incidence in 2004–2008). We believe that our approach to modeling government surveillance disease data will be useful to advance current understanding of large-scale drivers of several diseases.

15.
Hao K, Cawley S. Human Heredity 2007, 63(3-4): 219–228
BACKGROUND: Current biotechnologies are able to achieve high accuracy and call rates. Concerns have been raised about how differential performance on various genotypes may bias association tests. Quantitatively, we define the differential dropout rate as the ratio of the no-call rate among heterozygotes to that among homozygotes. METHODS: The hazard of differential dropout is examined for population- and family-based association tests through a simulation study. Also, we investigate detection approaches such as Hardy-Weinberg Equilibrium (HWE) and testing for correlation between sample call rate and sample heterozygosity. Finally, we analyze two public datasets and evaluate the magnitudes of differential dropout. RESULTS: In case-control settings, differential dropout has a negligible effect on power and odds ratio (OR) estimation. However, the impact on family-based tests ranges from minor to severe depending on the disease parameters. Such impact is more prominent when disease allele frequency is relatively low (e.g., 5%), where a differential dropout rate of 2.5 can dramatically bias OR estimation and reduce power even at a decent 98% overall call rate and moderate effect size (e.g., OR(true) = 2.11). Both public datasets follow HWE; however, the HapMap data carries detectable differential dropout that may endanger family-based studies. CONCLUSIONS: The case-control approach appears to be robust to differential dropout; however, family-based association tests can be heavily biased. Both public genotype datasets show high call rates, but differential dropout is detected in the HapMap data. We suggest researchers carefully control this potential confounder even when using data of high accuracy and high overall call rate.
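The quantity defined in the abstract is a simple ratio. A minimal sketch with illustrative no-call rates:

```python
def differential_dropout_rate(het_nocall, hom_nocall):
    """Ratio of the no-call rate among heterozygotes to that among
    homozygotes, as defined in the abstract."""
    return het_nocall / hom_nocall

# Hypothetical example: 5% no-calls among heterozygotes vs 2% among
# homozygotes gives a ratio of 2.5, the level the abstract flags as
# able to bias family-based OR estimation.
rate = differential_dropout_rate(het_nocall=0.05, hom_nocall=0.02)
```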

16.
Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.

17.
Outbreaks of infectious viruses resulting from spillover events from bats have brought much attention to bat-borne zoonoses, which has motivated increased ecological and epidemiological studies on bat populations. Field sampling methods often collect pooled samples of bat excreta from plastic sheets placed under roosts. However, positive bias is introduced because multiple individuals may contribute to pooled samples, making studies of viral dynamics difficult. Here, we explore the general issue of bias in spatial sample pooling using Hendra virus in Australian bats as a case study. We assessed the accuracy of different under-roost sampling designs using generalized additive models and field data from individually captured bats and pooled urine samples. We then used theoretical simulation models of bat density and under-roost sampling to understand the mechanistic drivers of bias. The most commonly used sampling design estimated viral prevalence 3.2 times higher than individual-level data, with positive bias 5–7 times higher than other designs due to spatial autocorrelation among sampling sheets and clustering of bats in roosts. Simulation results indicate that using a stratified random design to collect 30–40 pooled urine samples from 80 to 100 sheets, each with an area of 0.75–1 m², would allow estimation of true prevalence with minimum sampling bias and false negatives. These results show that widely used under-roost sampling techniques are highly sensitive to viral presence, but lack specificity, providing limited information regarding viral dynamics. Improved estimation of true prevalence can be attained with minor changes to existing designs such as reducing sheet size, increasing sheet number, and spreading sheets out within the roost area. Our findings provide insight into how spatial sample pooling is vulnerable to bias for a wide range of systems in disease ecology, where optimal sampling design is influenced by pathogen prevalence, host population density, and patterns of aggregation.

18.
One widely used measure of familial aggregation is the sibling recurrence-risk ratio, which is defined as the ratio of risk of disease manifestation, given that one's sibling is affected, as compared with the disease prevalence in the general population. Known as lambdaS, it has been used extensively in the mapping of complex diseases. In this paper, I show that, for a fictitious disease that is strictly nongenetic and nonenvironmental, lambdaS can be dramatically inflated because of misunderstanding of the original definition of lambdaS, ascertainment bias, and overreporting. Therefore, for a disease of entirely environmental origin, the lambdaS inflation due to ascertainment bias and/or overreporting is expected to be more prominent if the risk factor also is familially aggregated. This suggests that, like segregation analysis, the estimation of lambdaS also is prone to ascertainment bias and should be performed with great care. This is particularly important if one uses lambdaS for exclusion mapping, for discrimination between different genetic models, and for association studies, since these practices hinge tightly on an accurate estimation of lambdaS.
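The definition at the start of the abstract translates directly into a one-line calculation. A minimal sketch with hypothetical risk figures:

```python
def lambda_s(risk_given_affected_sib, population_prevalence):
    """Sibling recurrence-risk ratio:
    P(affected | sibling affected) / population prevalence."""
    return risk_given_affected_sib / population_prevalence

# Hypothetical numbers: 8% risk when a sibling is affected vs a 1%
# population prevalence gives lambdaS = 8. Note that ascertainment bias
# or overreporting inflates the numerator and hence lambdaS.
ls = lambda_s(risk_given_affected_sib=0.08, population_prevalence=0.01)
```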

19.
Diseased animals may exhibit behavioral shifts that increase or decrease their probability of being randomly sampled. In harvest-based sampling approaches, animal movements, changes in habitat utilization, changes in breeding behaviors during harvest periods, or differential susceptibility to harvest via behaviors like hiding or decreased sensitivity to stimuli may result in a non-random sample that biases prevalence estimates. We present a method that can be used to determine whether bias exists in prevalence estimates from harvest samples. Using data from harvested mule deer (Odocoileus hemionus) sampled in northcentral Colorado (USA) during fall hunting seasons 1996-98 and Akaike's information criterion (AIC) model selection, we detected within-yr trends indicating potential bias in harvest-based prevalence estimates for chronic wasting disease (CWD). The proportion of CWD-positive deer harvested slightly increased through time within a yr. We speculate that differential susceptibility to harvest or breeding season movements may explain the positive trend in proportion of CWD-positive deer harvested during fall hunting seasons. Detection of bias may provide information about temporal patterns of a disease, suggest biological hypotheses that could further understanding of a disease, or provide wildlife managers with information about when diseased animals are more or less likely to be harvested. Although AIC model selection can be useful for detecting bias in data, it has limited utility in determining underlying causes of bias. In cases where bias is detected in data using such model selection methods, then design-based methods (i.e., experimental manipulation) may be necessary to assign causality.

20.
The goal of this work is to introduce new metrics to assess risk of Alzheimer's disease (AD) which we call AD Pattern Similarity (AD-PS) scores. These metrics are the conditional probabilities modeled by large-scale regularized logistic regression. The AD-PS scores derived from structural MRI and cognitive test data were tested across different situations using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study. The scores were computed across groups of participants stratified by cognitive status, age and functional status. Cox proportional hazards regression was used to evaluate associations with the distribution of conversion times from mild cognitive impairment to AD. The performances of classifiers developed using data from different types of brain tissue were systematically characterized across cognitive status groups. We also explored the performance of anatomical and cognitive-anatomical composite scores generated by combining the outputs of classifiers developed using different types of data. In addition, we report the performance of the AD-PS scores relative to other metrics used in the field, including the Spatial Pattern of Abnormalities for Recognition of Early AD (SPARE-AD) index and total hippocampal volume, for the variables examined.
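Since an AD-PS score is a conditional probability from a fitted logistic model, scoring a new participant reduces to a sigmoid of a weighted sum. The weights and features below are made-up stand-ins; in the paper they come from large-scale regularized logistic regression on MRI and cognitive data.

```python
import math

def ad_ps_score(features, weights, bias):
    """Conditional probability of AD from a fitted logistic model.

    `weights` and `bias` are hypothetical here; the paper estimates them
    with large-scale regularized logistic regression.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid -> probability in (0, 1)

# Two made-up standardized features (e.g., an MRI-derived value and a
# cognitive test value) scored with made-up coefficients
score = ad_ps_score(features=[1.2, -0.5], weights=[0.8, 0.4], bias=-0.3)
```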
