Similar Articles (20 found)
1.
We develop a Bayesian simulation-based approach for determining the sample size required for estimating a binomial probability, and the difference between two binomial probabilities, when we allow for dependence between two fallible diagnostic procedures. Examples include estimating the prevalence of disease in a single population based on results from two imperfect diagnostic tests applied to sampled individuals, or surveys designed to compare the prevalences of two populations using diagnostic outcomes that are subject to misclassification. We propose a two-stage procedure in which the tests are initially assumed to be independent conditional on true disease status (i.e. conditionally independent). An interval-based sample size determination scheme is performed under this assumption, and data are collected and used to test the conditional independence assumption. If the data reveal the diagnostic tests to be conditionally dependent, structure is added to the model to account for the dependence, and the sample size routine is repeated in order to properly satisfy the criterion under the correct model. We also examine the impact on required sample size of adding an extra heterogeneous population to a study.
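The interval-based criterion at the heart of such schemes can be illustrated for the simplest case of a single binomial prevalence with a perfect test and a conjugate Beta prior; the paper's full model adds misclassification and between-test dependence. A minimal Monte Carlo sketch (function names, grid, and target width are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_interval_width(n, a=1.0, b=1.0, n_sim=1000, n_post=2000):
    """Average width of the 95% posterior credible interval for a binomial
    prevalence, averaged over data sets simulated from the Beta(a, b) prior."""
    widths = []
    for _ in range(n_sim):
        theta = rng.beta(a, b)                       # prevalence from the prior
        y = rng.binomial(n, theta)                   # simulated study data
        post = rng.beta(a + y, b + n - y, n_post)    # conjugate posterior draws
        lo, hi = np.quantile(post, [0.025, 0.975])
        widths.append(hi - lo)
    return float(np.mean(widths))

def minimum_n(target_width=0.10, grid=range(50, 1001, 50)):
    """Smallest n on the grid whose average interval width meets the target."""
    for n in grid:
        if avg_interval_width(n) <= target_width:
            return n
```

In the two-stage scheme above, the same search would be rerun under the conditionally dependent model once the independence assumption is rejected.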

2.
Accurately estimating infection prevalence is fundamental to the study of population health, disease dynamics, and infection risk factors. Prevalence is estimated as the proportion of infected individuals (“individual‐based estimation”), but is also estimated as the proportion of samples in which evidence of infection is detected (“anonymous estimation”). The latter method is often used when researchers lack information on individual host identity, which can occur during noninvasive sampling of wild populations or when the individual that produced a fecal sample is unknown. The goal of this study was to investigate biases in individual‐based versus anonymous prevalence estimation theoretically and to test whether mathematically derived predictions are evident in a comparative dataset of gastrointestinal helminth infections in nonhuman primates. Using a mathematical model, we predict that anonymous estimates of prevalence will be lower than individual‐based estimates when (a) samples from infected individuals do not always contain evidence of infection and/or (b) when false negatives occur. The mathematical model further predicts that no difference in bias should exist between anonymous estimation and individual‐based estimation when one sample is collected from each individual. Using data on helminth parasites of primates, we find that anonymous estimates of prevalence are significantly and substantially (12.17%) lower than individual‐based estimates of prevalence. We also observed that individual‐based estimates of prevalence from studies employing single sampling are on average 6.4% higher than anonymous estimates, suggesting a bias toward sampling infected individuals. We recommend that researchers use individual‐based study designs with repeated sampling of individuals to obtain the most accurate estimate of infection prevalence. 
Moreover, to ensure accurate interpretation of their results and to allow prevalence estimates to be compared among studies, it is essential that authors explicitly describe their sampling designs and prevalence calculations in publications.
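The model's first prediction is easy to reproduce in simulation: when each sample from an infected host contains evidence of infection with probability below one, pooling samples anonymously dilutes the estimate, while one sample per individual makes the two estimators coincide. A toy simulation (parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def prevalence_estimates(n_ind=5000, k=3, pi=0.3, detect=0.6):
    """Compare individual-based and anonymous prevalence estimates when each
    sample from an infected host shows infection with probability `detect`."""
    infected = rng.random(n_ind) < pi
    # k samples per individual; a sample is positive only if its host is
    # infected AND the sample happens to contain evidence of infection
    positive = (rng.random((n_ind, k)) < detect) & infected[:, None]
    individual = positive.any(axis=1).mean()   # each host counted once
    anonymous = positive.mean()                # every sample counted
    return float(individual), float(anonymous)
```

With k = 3 the anonymous estimate converges to pi × detect, well below the individual-based estimate pi × (1 − (1 − detect)^k).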

3.
Planning studies involving diagnostic tests is complicated by the fact that virtually no test provides perfectly accurate results. The misclassification induced by imperfect sensitivities and specificities of diagnostic tests must be taken into account, whether the primary goal of the study is to estimate the prevalence of a disease in a population or to investigate the properties of a new diagnostic test. Previous work on sample size requirements for estimating the prevalence of disease in the case of a single imperfect test showed very large discrepancies in size when compared to methods that assume a perfect test. In this article we extend these methods to include two conditionally independent imperfect tests, and apply several different criteria for Bayesian sample size determination to the design of such studies. We consider both disease prevalence studies and studies designed to estimate the sensitivity and specificity of diagnostic tests. As the problem is typically nonidentifiable, we investigate the limits on the accuracy of parameter estimation as the sample size approaches infinity. Through two examples from infectious diseases, we illustrate the changes in sample sizes that arise when two tests are applied to individuals in a study rather than a single test. Although smaller sample sizes are often found in the two-test situation, they can still be prohibitively large unless accurate information is available about the sensitivities and specificities of the tests being used.
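The core identity linking true and apparent prevalence for a single imperfect test (the classical Rogan-Gladen relationship) explains why sample sizes differ so sharply from the perfect-test case; a minimal sketch:

```python
def apparent_prevalence(pi, se, sp):
    """Probability a randomly sampled subject tests positive, given true
    prevalence pi and test sensitivity se / specificity sp."""
    return pi * se + (1 - pi) * (1 - sp)

def rogan_gladen(p_app, se, sp):
    """Invert the identity for the true prevalence; valid when se + sp > 1."""
    return (p_app + sp - 1) / (se + sp - 1)
```

Because the denominator se + sp − 1 is below one for any imperfect test, the corrected estimate inflates the sampling variability of the apparent proportion, which is what drives up the required sample size.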

4.
ABSTRACT: BACKGROUND: Many problems in bioinformatics involve classification based on features such as sequence, structure or morphology. Given multiple classifiers, two crucial questions arise: how does their performance compare, and how can they best be combined to produce a better classifier? A classifier can be evaluated in terms of sensitivity and specificity using benchmark, or gold standard, data, that is, data for which the true classification is known. However, a gold standard is not always available. Here we demonstrate that a Bayesian model for comparing medical diagnostics without a gold standard can be successfully applied in the bioinformatics domain, to genomic-scale data sets. We present a new implementation which, unlike previous implementations, is applicable to any number of classifiers. We apply this model, for the first time, to the problem of finding the globally optimal logical combination of classifiers. RESULTS: We compared three classifiers of protein subcellular localisation, and evaluated our estimates of sensitivity and specificity against estimates obtained using a gold standard. The method overestimated sensitivity and specificity with only a small discrepancy, and correctly ranked the classifiers. Diagnostic tests for swine flu were then compared on a small data set. Lastly, classifiers for a genome-wide association study of macular degeneration with 541094 SNPs were analysed. In all cases, run times were feasible, and results precise. The optimal logical combination of classifiers was also determined for all three data sets. Code and data are available from http://bioinformatics.monash.edu.au/downloads/. CONCLUSIONS: The examples demonstrate that the methods are suitable for both small and large data sets, applicable to a wide range of bioinformatics classification problems, and robust to dependence between classifiers.
In all three test cases, the globally optimal logical combination of the classifiers was found to be their union, according to three out of four ranking criteria. We propose as a general rule of thumb that the union of classifiers will be close to optimal.
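Under an assumed conditional-independence model, the operating characteristics of any logical combination of classifiers can be enumerated directly, which is one way to search for the optimal combination; a sketch (the paper's Bayesian model also handles dependence, which this toy version does not):

```python
from itertools import product

def combo_performance(se_list, sp_list, rule):
    """Sensitivity and specificity of a logical combination of classifiers,
    assuming conditional independence given the true class."""
    se = fpr = 0.0
    for outcome in product([0, 1], repeat=len(se_list)):
        p_pos = p_neg = 1.0
        for o, s, t in zip(outcome, se_list, sp_list):
            p_pos *= s if o else 1 - s      # P(outcome | true positive)
            p_neg *= (1 - t) if o else t    # P(outcome | true negative)
        if rule(outcome):
            se += p_pos
            fpr += p_neg
    return se, 1 - fpr

# the union rule (call positive if any classifier fires) is just `any`
union_se, union_sp = combo_performance([0.8, 0.7], [0.9, 0.95], any)
```

The union trades specificity for sensitivity: here it yields sensitivity 0.94 at specificity 0.855, versus 0.56 and 0.995 for the intersection (`all`).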

5.
McNamee R (2004). Biometrics 60(3): 783-792.
Two-phase designs for estimation of prevalence, where the first-phase classification is fallible and the second is accurate but relatively expensive, are not necessarily justified on efficiency grounds. However, they might be advantageous for dual-purpose studies, for example where prevalence estimation is followed by a clinical trial or case-control study, if they can identify cases of disease for the second study in a cost-effective way. Alternatively, they may be justified on ethical grounds if they can identify more, previously undetected but treatable cases of disease, than a simple random sample design. An approach to sampling is proposed, which formally combines the goals of efficient prevalence estimation and case detection by setting different notional study costs for investigating cases and noncases. Two variants of the method are compared with an "ethical" two-phase scheme proposed by Shrout and Newman (1989, Biometrics 45, 549-555), and with the most efficient scheme for prevalence estimation alone, in terms of the standard error of the prevalence estimate, the expected number of cases, and the fraction of cases among second-phase subjects, given a fixed budget. One variant yields the highest fraction and expected number of cases but also the largest standard errors. The other yields a higher fraction than Shrout and Newman's scheme and a similar number of cases but appears to do so more efficiently.

6.
We assessed proteomic patterns in breast cancer using MALDI MS and laser capture microdissected cells. Protein and peptide expression in invasive mammary carcinoma versus normal mammary epithelium, and in estrogen-receptor-positive versus estrogen-receptor-negative tumors, was compared. Biomarker candidates were identified by statistical analysis, and classifiers were developed and validated in blinded test sets. Several of the m/z features used in the classifiers were identified by LC-MS/MS, and two were confirmed by immunohistochemistry.

7.
In heterogametic species, biological differences between the two sexes are ubiquitous, and hence, errors in sex identification can be a significant source of noise and bias in studies where sex‐related sources of variation are of interest or need to be controlled for. We developed and validated a universal multimarker assay for reliable sex identification of three‐spined sticklebacks (Gasterosteus aculeatus). The assay makes use of genotype scores from three sex‐linked loci and utilizes Bayesian probabilistic inference to identify sex of the genotyped individuals. The results, validated with 286 phenotypically sexed individuals from six populations of sticklebacks representing all major genetic lineages (cf. Pacific, Atlantic and Japan Sea), indicate that in contrast to commonly used single‐marker‐based sex identification assays, the developed multimarker assay should be 100% accurate. As the markers in the assay can be scored from agarose gels, it provides a quick and cost‐efficient tool for universal sex identification of three‐spined sticklebacks. The general principle of combining information from multiple markers to improve the reliability of sex identification is transferable and can be utilized to develop and validate similar assays for other species.

8.
doi: 10.1111/j.1741-2358.2010.00446.x
Analysis of socio-demographic and systemic health factors and the normative conditions of oral health care in a population of Brazilian elderly. Objective: To investigate the association of socio-demographic and systemic health factors with the normative conditions of oral health care (dental caries, edentulism, periodontal disease and oral mucosal lesions) in elderly individuals. Material and methods: A cross-sectional study was carried out in a group of elderly individuals with access to community health care (n = 200). The normative conditions of oral health were investigated according to the WHO and SB Brazil criteria. Bivariate analyses were evaluated by the chi-square test and Fisher's exact test. Prevalence estimates for the covariates were obtained using Poisson regression models. Results: Edentulism and oral mucosal lesions were detected in 58% and 21.5% of elderly patients, respectively. In the dentate subjects, the prevalence of dental caries and periodontal disease was 51.2% and 20.8%, respectively. Older men and individuals from lower-income groups exhibited a higher prevalence of dental caries. Elderly women, illiterate individuals, and individuals over the age of 65 years exhibited a higher prevalence of edentulism. Individuals aged 60-64 years and those who were employed showed a significant association with periodontal disease. Conclusion: Socio-demographic factors were associated with some notable oral diseases in the elderly.

9.
Disease prevalence is ideally estimated using a 'gold standard' to ascertain true disease status on all subjects in a population of interest. In practice, however, the gold standard may be too costly or invasive to be applied to all subjects, in which case a two-phase design is often employed. Phase 1 data consisting of inexpensive and non-invasive screening tests on all study subjects are used to determine the subjects that receive the gold standard in the second phase. Naive estimates of prevalence in two-phase studies can be biased (verification bias). Imputation and re-weighting estimators are often used to avoid this bias. We contrast the forms and attributes of the various prevalence estimators. Distribution theory and simulation studies are used to investigate their bias and efficiency. We conclude that the semiparametric efficient approach is the preferred method for prevalence estimation in two-phase studies. It is more robust and comparable in its efficiency to imputation and other re-weighting estimators. It is also easy to implement. We use this approach to examine the prevalence of depression in adolescents with data from the Great Smoky Mountain Study.
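The verification bias and one standard re-weighting fix can be illustrated in a few lines; the semiparametric efficient estimator favoured by the abstract augments this inverse-probability-weighted estimator and is not shown. All parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_two_phase(n=20000, pi=0.15, se=0.85, sp=0.90,
                       verify_pos=0.9, verify_neg=0.1):
    """Two-phase study: screen everyone, verify screen-positives more often.
    Returns the naive and the inverse-probability-weighted prevalence."""
    d = rng.random(n) < pi                                 # true disease status
    screen = np.where(d, rng.random(n) < se, rng.random(n) < 1 - sp)
    p_verify = np.where(screen, verify_pos, verify_neg)    # design probabilities
    verified = rng.random(n) < p_verify
    naive = d[verified].mean()                 # prevalence among verified: biased
    ipw = np.mean((verified & d) / p_verify)   # Horvitz-Thompson re-weighting
    return float(naive), float(ipw)
```

Because screen-positives are verified far more often, the naive estimate is grossly inflated, while re-weighting each verified subject by the inverse of its verification probability restores an unbiased estimate.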

10.
Dendukuri N, Joseph L (2001). Biometrics 57(1): 158-167.
Many analyses of results from multiple diagnostic tests assume the tests are statistically independent conditional on the true disease status of the subject. This assumption may be violated in practice, especially in situations where none of the tests is a perfectly accurate gold standard. Classical inference for models accounting for the conditional dependence between tests requires that results from at least four different tests be used in order to obtain an identifiable solution, but it is not always feasible to have results from this many tests. We use a Bayesian approach to draw inferences about the disease prevalence and test properties while adjusting for the possibility of conditional dependence between tests, particularly when we have only two tests. We propose both fixed and random effects models. Since with fewer than four tests the problem is nonidentifiable, the posterior distributions are strongly dependent on the prior information about the test properties and the disease prevalence, even with large sample sizes. If the degree of correlation between the tests is known a priori with high precision, then our methods adjust for the dependence between the tests. Otherwise, our methods provide adjusted inferences that incorporate all of the uncertainty inherent in the problem, typically resulting in wider interval estimates. We illustrate our methods using data from a study on the prevalence of Strongyloides infection among Cambodian refugees to Canada.
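A fixed-effects parameterization along these lines adds a covariance term to each cell probability of the two-test cross-classification; a minimal sketch (parameter names are illustrative, and the covariances are subject to range constraints not checked here):

```python
def cell_prob(a, b, pi, se1, sp1, se2, sp2, cov_p=0.0, cov_n=0.0):
    """P(T1 = a, T2 = b) for two possibly dependent imperfect tests.
    cov_p / cov_n are the within-class covariances among diseased and
    non-diseased subjects; both zero gives conditional independence."""
    sign = 1 if a == b else -1            # concordant cells gain, discordant lose
    p1 = se1 if a else 1 - se1            # P(T1 = a | diseased)
    p2 = se2 if b else 1 - se2            # P(T2 = b | diseased)
    q1 = (1 - sp1) if a else sp1          # P(T1 = a | non-diseased)
    q2 = (1 - sp2) if b else sp2          # P(T2 = b | non-diseased)
    return pi * (p1 * p2 + sign * cov_p) + (1 - pi) * (q1 * q2 + sign * cov_n)
```

The covariance terms cancel across the four cells, so the probabilities always sum to one; a positive covariance inflates the concordant cells, which is exactly the pattern that biases analyses assuming conditional independence.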

11.
  1. Obtaining accurate estimates of disease prevalence is crucial for the monitoring and management of wildlife populations but can be difficult if different diagnostic tests yield conflicting results and if the accuracy of each diagnostic test is unknown. Bayesian latent class analysis (BLCA) modeling offers a potential solution, providing estimates of prevalence levels and diagnostic test accuracy under the realistic assumption that no diagnostic test is perfect.
  2. In typical applications of this approach, the specificity of one test is fixed at or close to 100%, allowing the model to simultaneously estimate the sensitivity and specificity of all other tests, in addition to infection prevalence. In wildlife systems, a test with near‐perfect specificity is not always available, so we simulated data to investigate how decreasing this fixed specificity value affects the accuracy of model estimates.
  3. We used simulations to explore how the trade‐off between diagnostic test specificity and sensitivity impacts prevalence estimates and found that directional biases depend on pathogen prevalence. Both the precision and accuracy of results depend on the sample size, the diagnostic tests used, and the true infection prevalence, so these factors should be considered when applying BLCA to estimate disease prevalence and diagnostic test accuracy in wildlife systems. A wildlife disease case study, focusing on leptospirosis in California sea lions, demonstrated the potential for Bayesian latent class methods to provide reliable estimates under real‐world conditions.
  4. We delineate conditions under which BLCA improves upon the results from a single diagnostic across a range of prevalence levels and sample sizes, demonstrating when this method is preferable for disease ecologists working in a wide variety of pathogen systems.

12.
Ecological diffusion is a theory that can be used to understand and forecast spatio‐temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white‐tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression‐based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
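The ecological-diffusion process model, du/dt = laplacian(mu(x) * u), where mu is a spatially varying motility, can be sketched with a plain explicit finite-difference step on a periodic grid. This is a toy discretization under an assumed stable step size, not the authors' implementation:

```python
import numpy as np

def diffusion_step(u, mu, dt=0.1, dx=1.0):
    """One explicit finite-difference step of du/dt = laplacian(mu * u) on a
    periodic grid; stable when dt * max(mu) / dx**2 <= 0.25 (assumed here)."""
    v = mu * u                            # motility enters inside the Laplacian
    lap = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
           np.roll(v, 1, 1) + np.roll(v, -1, 1) - 4 * v) / dx**2
    return u + dt * lap
```

Because the discrete Laplacian sums to zero on a periodic grid, each step conserves total abundance, a useful sanity check before embedding such a solver inside a hierarchical Bayesian model.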

13.
McClish D, Quade D (1985). Biometrics 41(1): 81-89.
Suppose a screening or diagnostic test with unknown properties is to be used, not primarily for classifying individuals, but for estimating the prevalence of disease. Its sensitivity and specificity may be enhanced by applying it repeatedly to the same individuals, thus bringing the proportion of individuals with overall positive results closer to the true prevalence. Repeated testing also makes it possible to estimate the prevalence by maximum likelihood. Some simple designs for estimation are evaluated in terms of their accuracy and cost.
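The effect of repeated testing on the overall operating characteristics can be sketched for two simple decision rules, assuming independent repeats within a subject; the adjusted error rates can then be plugged into a standard misclassification-corrected prevalence estimator:

```python
def repeat_test_properties(se, sp, k, rule="all"):
    """Effective sensitivity and specificity when a subject is declared
    positive only if all k repeats are positive ("all"), or if at least
    one repeat is positive ("any"); repeats assumed independent."""
    if rule == "all":
        return se**k, 1 - (1 - sp)**k     # stricter rule: specificity rises
    return 1 - (1 - se)**k, sp**k         # looser rule: sensitivity rises
```

The "all" rule trades sensitivity for specificity and the "any" rule does the reverse, so the choice of rule (and of k) depends on whether false positives or false negatives dominate the apparent-prevalence bias.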

14.
In many areas of the world, Potato virus Y (PVY) is one of the most economically important disease problems in seed potatoes. In Taiwan, generation 2 (G2) class certified seed potatoes are required by law to be free of detectable levels of PVY. To meet this standard, it is necessary to perform accurate tests at a reasonable cost. We used a two‐stage testing design involving group testing which was performed in Taiwan's Seed Improvement and Propagation Station to identify plants infected with PVY. At the first stage of this two‐stage testing design, plants are tested in groups. The second stage involves no retesting for negative test groups and exhaustive testing of all constituent individual samples from positive test groups. In order to minimise costs while meeting government standards, it is imperative to estimate optimal group size. However, because of limited test accuracy, classification errors for diagnostic tests are inevitable; to get a more accurate estimate, it is necessary to adjust for these errors. Therefore, this paper describes an analysis of diagnostic test data in which specimens are grouped for batched testing to offset costs. The optimal batch size is determined by various cost parameters as well as test sensitivity, specificity and disease prevalence. Here, the Bayesian method is employed to deal with uncertainty in these parameters. Moreover, we developed a computer program to determine optimal group size for PVY tests such that the expected cost is minimised even when using imperfect diagnostic tests of pooled samples. Results from this research show that, compared with error free testing, when the presence of diagnostic testing errors is taken into account, the optimal group size becomes smaller. Higher diagnostic testing costs, lower costs of false negatives or smaller prevalence can all lead to a larger optimal group size. 
Regarding the effects of sensitivity and specificity, optimal group size increases as sensitivity increases; however, specificity has little effect on determining optimal group size. Our simulation study makes it apparent that the Bayesian method can genuinely update the prior information to more closely approximate the intrinsic characteristics of the parameters of interest. We believe that the results of this study will be useful in the implementation of seed potato certification programmes, particularly those which require zero tolerance for quarantine diseases in certified tubers.
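The retesting arithmetic behind the optimal group size can be sketched for the classical Dorfman two-stage design; the paper's Bayesian cost model additionally prices false negatives and parameter uncertainty, which this toy version omits:

```python
def expected_tests_per_item(m, p, se=1.0, sp=1.0):
    """Dorfman two-stage design: one group test on a pool of size m, then
    m individual tests whenever the group tests positive."""
    p_group_pos = se * (1 - (1 - p)**m) + (1 - sp) * (1 - p)**m
    return 1 / m + p_group_pos

def optimal_group_size(p, se=1.0, sp=1.0, m_max=100):
    """Group size minimizing expected tests per specimen."""
    return min(range(2, m_max + 1),
               key=lambda m: expected_tests_per_item(m, p, se, sp))
```

At a prevalence of 1% a pool of about 11 specimens minimizes testing effort, and higher prevalence pushes the optimum toward smaller pools, in line with the trends the abstract reports.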

15.
To solve the class imbalance problem in the classification of pre-miRNAs with the ab initio method, we developed a novel sample selection method based on the characteristics of pre-miRNAs. Real and pseudo pre-miRNAs are clustered based on their stem similarity and on their distribution in high-dimensional sample space, respectively. The training samples are selected according to the sample density of each cluster. Experimental results are validated by cross-validation and by other testing datasets composed of human real/pseudo pre-miRNAs. When compared with the previous method, microPred, our classifier miRNAPred is nearly 12% more accurate. The selected training samples can also be used to train other SVM classifiers, such as triplet-SVM, MiPred, miPred, and microPred, to improve their classification performance. The sample selection algorithm is useful for constructing a more efficient classifier for discriminating real pre-miRNAs from pseudo hairpin sequences.

16.
Accurate prediction of cancer patient survival is still a key open problem in clinical research. Recently, many large-scale gene expression analyses have identified sets of genes reportedly predictive of prognosis; however, those gene sets shared few genes in common and were poorly validated using independent data. We have developed a systems biology-based approach that uses either combined gene sets and the protein interaction network (Method A) or the protein network alone (Method B) to identify common prognostic genes based on microarray gene expression data of glioblastoma multiforme, and compared it with differential gene expression clustering (Method C). Validations of prediction performance show that the 23-prognostic-gene classifier identified by Method A outperforms other gene classifiers identified by Methods B and C, or previously reported for gliomas, on 17 of 20 independent sample cohorts across five tumor types. We also find that of the 23 genes, 21 are related to cellular proliferation and two to response to stress/immune response. We further find that increased expression of the 21 genes and decreased expression of the other two are associated with poorer survival, consistent with the notion that cellular proliferation and immune response contribute a significant portion of the predictive power of prognostic classifiers. Our results demonstrate that the systems biology-based approach makes it possible to identify common survival-associated genes.

17.
We consider the estimation of the prevalence of a rare disease, and of the log-odds ratio between two specified groups of individuals, from group testing data. For a low-prevalence disease, the maximum likelihood estimate of the log-odds ratio is severely biased. However, the Firth correction to the score function leads to a considerable improvement in the estimator. Also, for a low-prevalence disease, if the diagnostic test is imperfect, group testing is found to yield a more precise estimate of the log-odds ratio than individual testing.
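For intuition, the uncorrected MLE from group testing with a perfect assay has a simple closed form; the paper's contribution, the Firth-corrected score for an imperfect test, is not shown here:

```python
import math

def group_test_mle(x, g, m):
    """MLE of prevalence from x positive groups out of g groups of size m,
    assuming a perfect assay: a group is positive iff it contains a case."""
    return 1 - (1 - x / g) ** (1 / m)

def log_odds_ratio(x1, g1, x2, g2, m):
    """Uncorrected log-odds ratio between two groups of individuals, each
    estimated from its own group testing data (no Firth correction)."""
    p1, p2 = group_test_mle(x1, g1, m), group_test_mle(x2, g2, m)
    return math.log(p1 / (1 - p1)) - math.log(p2 / (1 - p2))
```

When prevalence is low and few groups test positive, this plug-in log-odds ratio is exactly the estimator whose severe small-sample bias motivates the Firth correction.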

18.
Parentage assignment is defined as the identification of the true parents of one focal offspring among a list of candidates, and has been commonly used in zoological, ecological, and agricultural studies. Although likelihood-based parentage assignment is the preferred method in most cases, it requires genotyping a predefined set of DNA markers and providing their population allele frequencies. In the present study, we propose an alternative method of parentage assignment that does not depend on genotype data or prior information on allele frequencies. Our method uses restriction site-associated DNA sequencing (RAD-seq) reads, clustering them into the RAD loci shared among the compared individuals, after which the likelihood ratio for parentage assignment can be calculated directly from two parameters: the genome heterozygosity and the sequencing-read error rate. The method was validated on one simulated and two real data sets, with accurate assignment of true parents to focal offspring. However, our method cannot attach a statistical confidence to the conclusion that the first-ranked candidate is a true parent.

19.
Insights into latent class analysis of diagnostic test performance
Latent class analysis is used to assess diagnostic test accuracy when a gold standard assessment of disease is not available but results of multiple imperfect tests are. We consider the simplest setting, where 3 tests are observed and conditional independence (CI) is assumed. Closed-form expressions for maximum likelihood parameter estimates are derived. They show explicitly how observed 2- and 3-way associations between test results are used to infer disease prevalence and test true- and false-positive rates. Although interesting and reasonable under CI, the estimators clearly have no basis when it fails. Intuition for the bias induced by conditional dependence follows from the analytic expressions. Further intuition derives from an Expectation-Maximization (EM) approach to calculating the estimates. We discuss implications of our results and related work for settings where more than 3 tests are available. We conclude that careful justification of assumptions about the dependence between tests in diseased and nondiseased subjects is necessary in order to ensure unbiased estimates of prevalence and test operating characteristics, and to give these estimates valid clinical interpretations. Such justification must be based in part on a clear clinical definition of disease and biological knowledge about mechanisms giving rise to test results.
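With exactly three conditionally independent tests the model is just-identified (seven free parameters, seven degrees of freedom), and the MLE can be computed by the EM route the abstract mentions. A small EM sketch, not the paper's closed-form algebra; the starting values are illustrative:

```python
import numpy as np

def em_three_tests(counts, n_iter=2000):
    """EM for the 3-test latent class model under conditional independence.
    counts: dict mapping each (t1, t2, t3) result pattern to its frequency.
    Returns (prevalence, sensitivities, specificities)."""
    patterns = np.array(list(counts.keys()), dtype=float)
    n = np.array([counts[tuple(int(v) for v in p)] for p in patterns], float)
    pi, se, sp = 0.3, np.full(3, 0.7), np.full(3, 0.8)   # illustrative start
    for _ in range(n_iter):
        # E-step: posterior probability of disease for each result pattern
        p_d = pi * np.prod(np.where(patterns == 1, se, 1 - se), axis=1)
        p_n = (1 - pi) * np.prod(np.where(patterns == 1, 1 - sp, sp), axis=1)
        w = p_d / (p_d + p_n)
        # M-step: weighted updates of prevalence, sensitivities, specificities
        pi = np.sum(n * w) / n.sum()
        se = (n * w) @ patterns / np.sum(n * w)
        sp = (n * (1 - w)) @ (1 - patterns) / np.sum(n * (1 - w))
    return pi, se, sp
```

As the abstract stresses, these updates are only trustworthy under CI; fitting the same algorithm to conditionally dependent data converges to biased values.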

20.
Individual-based data sets tracking organisms over space and time are fundamental to answering broad questions in ecology and evolution. A 'permanent' genetic tag circumvents the need to invasively mark or tag animals, especially when there are few phenotypic differences among individuals. However, genetic tracking of individuals does not come without its limits; correctly matching genotypes, and the error rates associated with laboratory work, can make it difficult to parse out matched individuals. In addition, defining a sampling design that effectively matches individuals in the wild can be a challenge for researchers. Here, we combine the two objectives of defining a sampling design and reducing genotyping error through an efficient Python-based computer-modelling program, wisepair. We describe the methods used to develop the program and assess its effectiveness on three empirical data sets, with and without reference genotypes. Our results show that wisepair outperformed similar genotype-matching programs on previously published reference genotype data of diurnal poison frogs (Allobates femoralis) and on reference-free (faecal) genotype data sets of harbour seals (Phoca vitulina) and Eurasian otters (Lutra lutra). In addition, owing to limited sampling effort in the harbour seal data, we present optimal sampling designs for future projects. wisepair sacrifices little relative to available methods, as it incorporates sample rerun error data, allelic pairwise comparisons and probabilistic simulations to determine matching thresholds. Our program is the lone tool available to researchers for defining parameters a priori for genetic tracking studies.

