Similar Articles
20 similar articles found (search time: 31 ms)
1.
We consider the estimation of the prevalence of a rare disease, and of the log-odds ratio for two specified groups of individuals, from group testing data. For a low-prevalence disease, the maximum likelihood estimate of the log-odds ratio is severely biased; however, applying the Firth correction to the score function leads to a considerable improvement in the estimator. Also, for a low-prevalence disease with an imperfect diagnostic test, group testing is found to yield a more precise estimate of the log-odds ratio than individual testing.
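The pooled-testing setup behind results like this can be sketched in a few lines. The function below is a minimal illustration of the plain (uncorrected) maximum likelihood estimate of prevalence under a perfect assay, not the Firth-corrected estimator the abstract describes; the pool counts are invented for the example.

```python
# Sketch: MLE of per-individual prevalence p from pooled testing with a
# perfect assay, where a pool of k individuals tests positive iff at
# least one member is infected: P(pool positive) = 1 - (1 - p)^k.
# This is the plain MLE, not the Firth-corrected estimator.
def prevalence_mle(positive_pools, total_pools, pool_size):
    if positive_pools == total_pools:
        return 1.0  # boundary case: every pool positive
    pool_rate = positive_pools / total_pools
    return 1.0 - (1.0 - pool_rate) ** (1.0 / pool_size)

# Invented example: 8 of 50 pools of size 10 test positive.
p_hat = prevalence_mle(8, 50, 10)  # about 0.017
```

Note how the pooled design concentrates information: 50 tests cover 500 individuals, and for small p the estimate remains well behaved.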

2.
Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
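A stripped-down version of the selection-bias phenomenon can be simulated directly. The setup below is an invented toy, not the paper's landscape-based model: disease is concentrated in a remote area, and a convenience sample drawn only from the easily accessed area underestimates prevalence, while a simple random sample of the whole population does not.

```python
import random

# Toy simulation (invented setup): two areas with different prevalence.
random.seed(1)
accessible = [1 if random.random() < 0.01 else 0 for _ in range(5000)]
remote = [1 if random.random() < 0.15 else 0 for _ in range(5000)]
population = accessible + remote
true_prev = sum(population) / len(population)

# Convenience sample: accessible animals only (indices 0..4999).
# Probability sample: simple random sample of the whole population.
conv = random.sample(range(5000), 500)
prob = random.sample(range(10000), 500)
conv_est = sum(population[i] for i in conv) / 500
prob_est = sum(population[i] for i in prob) / 500
```

With this configuration the convenience estimate sits near the accessible-area prevalence, far below the population value, illustrating the coverage error the abstract describes.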

3.
Tung L, Gordon D, Finch SJ. Human Heredity 2007;63(2):101-110
This paper extends gene-environment (G x E) interaction study designs in which the gene (G) is known and the environmental variable (E) is specified to the analysis of 'time-to-event' data, using Cox proportional hazards (PH) modeling. The objectives are to assess whether a random sample of subjects can be used to detect a specific G x E interaction and to study the sensitivity of the power of PH modeling to genotype misclassification. We find that a random sample of 2,100 is sufficient to detect a moderate G x E interaction. The increase in sample size necessary (SSN) to maintain Type I and Type II error rates is calculated for each of the 6 genotyping errors for both dominant and recessive modes of inheritance (MOI). The increase in SSN required is relatively small when each genotyping error rate is less than 1% and the disease allele frequency is between 0.2 and 0.5. The genotyping errors that require the greatest increase in SSN are any misclassification of a subject without the at-risk genotype as having the at-risk genotype. Such errors require an indefinitely large increase in SSN as the disease allele frequency approaches 0, suggesting that it is especially important that subjects recorded as having the at-risk genotype be correctly genotyped. Additionally, for a dominant MOI, large increases in SSN can occur with large disease allele frequency.

4.
Kim W, Gordon D, Sebat J, Ye KQ, Finch SJ. PLoS ONE 2008;3(10):e3475
Recent studies suggest that copy number polymorphisms (CNPs) may play an important role in disease susceptibility and onset. Currently, the detection of CNPs mainly depends on microarray technology. For case-control studies, conventionally, subjects are assigned to a specific CNP category based on the continuous quantitative measure produced by microarray experiments, and cases and controls are then compared using a chi-square test of independence. The purpose of this work is to specify the likelihood ratio test statistic (LRTS) for case-control sampling design based on the underlying continuous quantitative measurement, and to assess its power and relative efficiency (as compared to the chi-square test of independence on CNP counts). The sample size and power formulas of both methods are given. For the latter, the CNPs are classified using the Bayesian classification rule. The LRTS is more powerful than this chi-square test for the alternatives considered, especially alternatives in which the at-risk CNP categories have low frequencies. An example of the application of the LRTS is given for a comparison of CNP distributions in individuals of Caucasian or Taiwanese ethnicity, where the LRTS appears to be more powerful than the chi-square test, possibly due to misclassification of the most common CNP category into a less common category.

5.
Simple Bayesian statistical models are introduced to estimate the proportion of identifiable individuals and group sizes in photographic identification, or photo-ID, studies of animals that are found in groups. The models require a simple random photographic sampling of animals, where the photographic captures are treated as sampling with replacement within each group. The total number of images, including those that cannot be identified, and the number of images that contain identifiable individuals are used to make inference about the proportion of identifiable individuals within each group, and in the population when a number of groups are sampled. The numbers of images for individuals within each group are used to make inference about the group size. Based on analyses of simulated and real data, the models perform well with respect to accuracy and precision of posterior distributions of the parameters. Widths of posterior intervals were affected by the number of groups sampled, sampling duration, and the proportion of identifiable individuals in each group that was sampled. The structure of the models can accommodate covariates, which may affect photographic efficiency, defined in this study as the probability of photographically capturing individuals.
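For a single group, the simplest version of this inference is a conjugate Beta-binomial update on the proportion of identifiable images. This is an assumed simplification for illustration, not the paper's full model (which handles multiple groups, group sizes, and covariates); the image counts are invented.

```python
from math import sqrt

# Conjugate Beta-binomial sketch (assumed simplification): infer the
# proportion of identifiable individuals from the number of identifiable
# images among all images taken of one group.
n_images, n_identifiable = 40, 28
a0, b0 = 1.0, 1.0                      # uniform Beta(1, 1) prior
a = a0 + n_identifiable
b = b0 + n_images - n_identifiable
post_mean = a / (a + b)                # posterior mean of the proportion
post_sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
```

The posterior interval narrows as more images are collected, consistent with the abstract's observation that sampling duration affects interval widths.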

6.
Accurately estimating infection prevalence is fundamental to the study of population health, disease dynamics, and infection risk factors. Prevalence is estimated as the proportion of infected individuals ("individual-based estimation"), but is also estimated as the proportion of samples in which evidence of infection is detected ("anonymous estimation"). The latter method is often used when researchers lack information on individual host identity, which can occur during noninvasive sampling of wild populations or when the individual that produced a fecal sample is unknown. The goal of this study was to investigate biases in individual-based versus anonymous prevalence estimation theoretically and to test whether mathematically derived predictions are evident in a comparative dataset of gastrointestinal helminth infections in nonhuman primates. Using a mathematical model, we predict that anonymous estimates of prevalence will be lower than individual-based estimates when (a) samples from infected individuals do not always contain evidence of infection and/or (b) when false negatives occur. The mathematical model further predicts that no difference in bias should exist between anonymous estimation and individual-based estimation when one sample is collected from each individual. Using data on helminth parasites of primates, we find that anonymous estimates of prevalence are significantly and substantially (12.17%) lower than individual-based estimates of prevalence. We also observed that individual-based estimates of prevalence from studies employing single sampling are on average 6.4% higher than anonymous estimates, suggesting a bias toward sampling infected individuals. We recommend that researchers use individual-based study designs with repeated sampling of individuals to obtain the most accurate estimate of infection prevalence. Moreover, to ensure accurate interpretation of their results and to allow for prevalence estimates to be compared among studies, it is essential that authors explicitly describe their sampling designs and prevalence calculations in publications.
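Both model predictions can be checked with two lines of arithmetic. The parameters below are invented: s is the probability that a sample from an infected individual shows evidence of infection, k is the number of samples per individual, and pi is the true prevalence.

```python
# Toy check of the two predictions (invented parameters).
pi, s, k = 0.30, 0.6, 3

anonymous = pi * s                       # expected fraction of positive samples
individual = pi * (1 - (1 - s) ** k)     # individual flagged if >= 1 of k samples positive
individual_k1 = pi * (1 - (1 - s) ** 1)  # single sample per individual

# anonymous < individual when s < 1 and k > 1 (prediction a), while with
# k = 1 the two estimators coincide (the model's second prediction).
```

Note that even the individual-based estimate with repeated sampling stays below pi whenever s < 1, which is why the abstract also recommends repeated sampling rather than single samples.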

7.
Technological developments allow increasing numbers of markers to be deployed in case-control studies searching for genetic factors that influence disease susceptibility. However, with vast numbers of markers, true 'hits' may become lost in a sea of false positives. This problem may be particularly acute for infectious diseases, where the control group may contain unexposed individuals with susceptible genotypes. To explore this effect, we used a series of stochastic simulations to model a scenario based loosely on bovine tuberculosis. We find that a candidate gene approach tends to have greater statistical power than studies that use large numbers of single nucleotide polymorphisms (SNPs) in genome-wide association tests, almost regardless of the number of SNPs deployed. Both approaches struggle to detect genetic effects when these are either weak or if an appreciable proportion of individuals are unexposed to the disease when modest sample sizes (250 each of cases and controls) are used, but these issues are largely mitigated if sample sizes can be increased to 2000 or more of each class. We conclude that the power of any genotype-phenotype association test will be improved if the sampling strategy takes account of exposure heterogeneity, though this is not necessarily easy to do.

8.
One of the first and most important steps in planning a genetic association study is the accurate estimation of the statistical power under a proposed study design and sample size. In association studies for candidate genes or in fine-mapping applications, allele and genotype frequencies are often assumed to be known when, in fact, they are unknown (i.e., random variables from some distribution). For example, if we consider a diallelic marker with allele frequencies of 0.5 and 0.5 and Hardy-Weinberg proportions, the three genotype frequencies are often assumed to be 0.25, 0.50, and 0.25, and the statistical power is calculated. Unfortunately, ignoring this source of variation can inflate the estimated power of the study. In the present article, we propose averaging the estimates of power over the distribution of the genotype frequencies to calculate the true estimate of power for a fixed allele frequency. For the usual situation, in which allele frequencies in a population are not known, we propose placing a prior distribution on the allele frequency, taking advantage of any available genotype information. This Bayesian approach provides a more accurate estimate of power. We present examples for quantitative and qualitative traits in cohort studies of unrelated individuals and results from an extensive series of examples that show that ignoring the uncertainty in allele frequencies can inflate the estimated power of the study. We also present the results from case-control studies and show that standard methods may also overestimate power. As discussed in this article, the approach of fixing allele frequencies even if they are not known is the common approach to power calculations. We show that ignoring the sources of variation in allele frequencies tends to result in overestimates of power and, consequently, in studies that are underpowered. Software in C is available at http://www.ambrosius.net/Power/.
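The core idea, averaging power over a prior on the allele frequency rather than plugging in a point estimate, can be sketched numerically. Everything below is an invented toy (an approximate two-sample z-test of allele frequencies and a Beta(6, 14) prior with mean 0.3), not the authors' software or models; the point is only that the fixed-frequency number and the prior-averaged number differ.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_at(q, d=0.05, n=1000, z_crit=1.959963984540054):
    # Approximate power of a two-sample z-test comparing allele
    # frequency q (controls) against q + d (cases), n alleles per group.
    se = math.sqrt(2 * q * (1 - q) / n)
    return norm_cdf(d / se - z_crit)

# Power with the allele frequency fixed at its mean, 0.3 ...
fixed = power_at(0.3)

# ... versus power averaged over a Beta(6, 14) prior (mean 0.3),
# approximated on a grid.
grid = [i / 1000 for i in range(1, 1000)]
weights = [q ** 5 * (1 - q) ** 13 for q in grid]  # Beta(6, 14) kernel
total = sum(weights)
averaged = sum(w * power_at(q) for w, q in zip(weights, grid)) / total
```

The direction and size of the gap depend on the shape of the power curve over the prior's support; the article's finding is that for standard association-study power functions, the plug-in calculation tends to overstate power.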

9.
Angiotensin II is the major effector molecule of the renin-angiotensin system; its production, catalyzed by angiotensin-converting enzyme (ACE), can be conveniently interrupted by ACE inhibitors. Plasma levels of ACE are associated with the I/D polymorphism; however, a controversy exists as to whether the DD genotype of the ACE polymorphism affects the risk for the development of coronary artery disease (CAD) and to what extent the ACE polymorphism is associated with CAD in different populations. We compared the I/D polymorphism in 212 CAD patients younger than 50 years with 165 healthy control individuals, all from the Tuzla region in Bosnia and Herzegovina. Patients with CAD had a higher prevalence of the DD genotype (36.3%) than controls (25.6%). The odds ratio for the ACE DD genotype in CAD patients was 1.7 (95% confidence interval 1.0-2.7; p < 0.05). We conclude that the DD genotype of the ACE gene polymorphism is associated with an increased risk for CAD in the Bosnian population.
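For reference, the reported odds ratio and interval can be reproduced from the 2x2 genotype table. The counts below are approximate reconstructions from the reported percentages (36.3% of 212 cases and 25.6% of 165 controls carrying DD), so they are illustrative rather than the authors' exact data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Wald 95% CI from a 2x2 table:
    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Approximate reconstruction: 77 DD among 212 cases, 42 DD among 165 controls.
or_, lo, hi = odds_ratio_ci(77, 135, 42, 123)  # roughly OR 1.7, CI 1.1-2.6
```

The Wald interval on the log scale is the standard textbook construction; with these counts it matches the reported 1.7 (1.0-2.7) to rounding.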

10.
Our goal was to compare methods for tagging single-nucleotide polymorphisms (tagSNPs) with respect to the power to detect disease association under differing haplotype-disease association models. We were also interested in the effect that SNP selection samples, consisting of either cases, controls, or a mixture, would have on power. We investigated five previously described algorithms for choosing tagSNPs: two that picked SNPs based on haplotype structure (Chapman-haplotypic and Stram), two that picked SNPs based on pair-wise allelic association (Chapman-allelic and Cousin), and one control method that chose equally spaced SNPs (Zhai). In two disease-associated regions from the Genetic Analysis Workshop 14 simulated data, we tested the association between tagSNP genotype and disease over the tagSNP sets chosen by each method for each sampling scheme. This was repeated for 100 replicates to estimate power. The two allelic methods chose essentially all SNPs in the region and had nearly optimal power. The two haplotypic methods chose about half as many SNPs and performed poorly compared to the allelic methods in both regions. We expected an improvement in power when the selection sample contained cases; however, there was only moderate variation in power between the sampling approaches for each method. Finally, the equally spaced control method performed as well as or worse than the haplotypic methods in the region with ancestral disease haplotype structure.

11.
Colorectal cancer represents a complex disease whose susceptibility may be influenced by genetic polymorphisms in the DNA repair system. In the present study we investigated the role of nine single nucleotide polymorphisms in eight DNA repair genes on the risk of colorectal cancer in a hospital-based case-control population (532 cases and 532 sex- and age-matched controls). Data analysis showed that variant allele homozygotes for the Asn148Glu polymorphism in the APE1 gene were at a statistically non-significant increased risk of colorectal cancer. The risk was more pronounced for colon cancer (odds ratio, OR: 1.50; 95% confidence interval, CI: 1.01-2.22; p=0.05). Data stratification showed an increased risk of colorectal cancer in the age group 64-86 years in individuals both heterozygous (OR: 1.79; 95% CI: 1.04-3.07; p=0.04) and homozygous (OR: 2.57; 95% CI: 1.30-5.06; p=0.007) for the variant allele of the APE1 Asn148Glu polymorphism. Smokers homozygous for the variant allele of the hOGG1 Ser326Cys polymorphism showed an increased risk of colorectal cancer (OR: 4.17; 95% CI: 1.17-15.54; p=0.03). The analysis of binary genotype combinations showed an increased colorectal cancer risk in individuals simultaneously homozygous for the variant alleles of APE1 Asn148Glu and hOGG1 Ser326Cys (OR: 6.37; 95% CI: 1.40-29.02; p=0.02). Given the subtle effect of the DNA repair polymorphisms on the risk of colorectal cancer, exploration of gene-gene and gene-environment interactions in large samples with sufficient statistical power is recommended.

12.
Genetic variants of interleukin-3 (IL-3), a well-studied cytokine, may have a role in the pathophysiology of rheumatoid arthritis (RA), but reports on this association sometimes conflict. A case-control study was designed to investigate the association between RA and a single-nucleotide polymorphism (SNP) in the IL-3 promoter region. Comparison of RA cases versus control individuals yielded a chi-square value of 14.28 (P=.0002), with a genotype odds ratio of 2.24 (95% confidence interval [95%CI] 1.44-3.49). When female cases with earlier onset were compared with female control individuals, the SNP revealed an even more significant correlation, with chi-square=21.75 (P=.000004) and a genotype odds ratio of 7.27 (95%CI 2.80-18.89). The stronger association that we observed in this clinically distinct subgroup (females with early onset), within a region where linkage disequilibrium was not significantly extended, suggested that the genuine RA locus should lie either within or close to the IL-3 gene. Genotype data on SNPs in eight other candidate genes were combined with our IL-3 results to estimate relationships between pairs of loci and RA by maximum-likelihood analysis. The utility of combining the genotype data in this way to identify possible contributions of various genes to this disease is discussed.

13.
Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set.
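The sampling model can be sketched with a simple grid posterior. Everything below is an assumed setup for illustration, not the paper's closed-form derivations: pools of size k are tested until r positive pools are observed (inverse/negative binomial sampling), N pools are tested in total, and a uniform prior is placed on the prevalence p.

```python
# Grid posterior for prevalence p under inverse (negative binomial)
# pooled sampling: pools of size k are tested until r are positive;
# N pools were tested in total. Assumed toy model with a flat prior.
k, r, N = 10, 5, 60

def likelihood(p):
    theta = 1.0 - (1.0 - p) ** k          # P(a pool tests positive)
    return theta ** r * (1.0 - theta) ** (N - r)

grid = [i / 2000 for i in range(1, 400)]  # p in (0, 0.2)
post = [likelihood(p) for p in grid]      # flat prior: posterior ~ likelihood
total = sum(post)
post = [w / total for w in post]
post_mean = sum(p * w for p, w in zip(grid, post))  # posterior mean of p
```

The paper replaces this numerical step with closed-form posteriors under conjugate-style priors and several loss functions; the grid version is just a way to see the shape of the inference.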

14.
Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease, given the heavy burden of the multiple-test correction necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remain prohibitive for many scientific investigators eager to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage, and which preserves many of the advantages inherent when using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease the cost of performing a GWA study while controlling the type-I error rate, which can be inflated when using public controls due to differences in ancestry and batch genotype effects.

15.
Designs for synthetic case-control studies in open cohorts
Several designs are proposed for case-control studies within cohorts when the cohort is open to late entry. These and previously proposed designs are examined with respect to consistency and efficiency of relative risk parameter estimation, and a small simulation study is reported. If study costs increase in proportion to the total number of "at-risk" controls, the most efficient design, Design C, is as follows. For a case failing at time t, controls are selected at random (and without regard to "at-risk" status) from among cohort members who are (i) known not to have failed prior to t and (ii) have not been previously selected as controls. At each t, control sampling proceeds until a prespecified number of controls who are "at risk" at t have been obtained. The efficiency advantage of Design C over that of the standard case-control design proposed by Thomas (in Appendix to Liddell, McDonald, and Thomas, 1977, Journal of the Royal Statistical Society, Series B 140, 469-490) will often be small. If, on the other hand, the costs increase in proportion to the number of distinct "at-risk" controls, Design C is no longer the most efficient design. In this case, several alternative designs are proposed.

16.
We consider a problem of testing mixture proportions using two-sample data, one sample from group one and the other from a mixture of groups one and two with unknown proportion, λ, of being in group two. Various statistical applications, including microarray studies, infectious disease epidemiology, case-control studies with contaminated controls, clinical trials allowing "nonresponders," genetic studies of gene mutation, and fishery applications can be formulated in this setup. Under the assumption that the log ratio of probability (density) functions from the two groups is linear in the observations, we propose a generalized score test statistic to test the mixture proportion. Under some regularity conditions, it is shown that this statistic converges to a weighted chi-squared random variable under the null hypothesis of λ = 0, where the weight depends only on the sampling fractions of both groups. The permutation method is used to provide a more reliable finite-sample approximation. Simulation results and two real data applications are presented.

17.
Objectives: To estimate the independent association between the wearing of removable partial dentures (RPDs) and the presence of root caries in a population of older adults. Design: Multivariate logistic regression modeling of root caries prevalence using different measures of root caries as dependent variables. The model included measures of disease history as indicators of historical risk. Setting: Data collected in the field from three areas of England. Subjects: Random sample of adults aged 60 years and over, drawn from lists of patients registered with general medical practitioners. Intervention: Field measurements of a range of oral health variables including oral disease, disease history, oral status, and various social and demographic measures. Main outcome measures: The presence of root caries, and of unsound and sound root restorations. Results: Of the five models of root caries prevalence that were used, RPDs featured as an independent risk indicator for root surface caries in the three that were related to the presence of untreated disease. The odds ratios for the contribution made by RPDs were all over 1.6 and, when considered alone, exceeded 2 in one model. These models were generally well fitting. RPDs did not feature as a risk indicator in the two models that related only to the presence of root surface restorations. Conclusions: In this study, where RPDs were present, the odds of untreated disease being present increased substantially.

18.
The distribution of divergence times between member species of a community reflects the pattern of species composition. In this study, we contrast the species composition of a community against the meta-community, which we define as the species composition of a set of target communities. We regard the collection of species that comprise a community as a sample from the set of member species of the meta-community, and interpret the pattern of the community species composition in terms of the type of species sampled from the meta-community. A newly defined effective species sampling proportion explains the amount of difference between the divergence time distributions of the community and of the meta-community, assuming random sampling. We propose a new index of phylogenetic skew (PS), defined as the ratio of the maximum-likelihood estimate of the effective species sampling proportion to the observed sampling proportion. A PS value of 1 is interpreted as random sampling. If the value is >1, the sampling is suspected to be phylogenetically skewed; if it is <1, systematic thinning of species is likely. Unlike other indices, the PS does not depend on species richness as long as the community has more than a few member species. Because it is possible to compare partially observed communities, the index may be used effectively in exploratory analysis to detect candidate communities with unique species compositions from a large number of communities.

19.
Selecting a control group that is perfectly matched for ethnic ancestry with a group of affected individuals is a major problem in studying the association of a candidate gene with a disease. This problem can be avoided by a design that uses parental data in place of nonrelated controls. Schaid and Sommer presented two methods for the statistical analysis of this approach: (1) a likelihood method (the Hardy-Weinberg equilibrium [HWE] method), which rests on the assumption that HWE holds, and (2) a conditional likelihood method (conditional on parental genotype; the CPG method), appropriate when HWE is absent. Schaid and Sommer claimed that the CPG method can be more efficient than the HWE method, even when equilibrium holds. It can be shown, however, that in the equilibrium situation the HWE method is always more efficient than the CPG method. For a dominant disease the differences are slim, but for a recessive disease the CPG method requires a much larger sample size to achieve a prescribed power than the HWE method. Additionally, we show how the relative risks for the various candidate-gene genotypes can be estimated without relying on iterative methods. For the CPG method, we present an asymptotic power approximation that is sufficiently precise for planning the sample size of an association study.

20.
Species diversity may be additively partitioned within and among samples (alpha and beta diversity) from hierarchically scaled studies to assess the proportion of the total diversity (gamma) found in different habitats, landscapes, or regions. We developed a statistical approach for testing null hypotheses that observed partitions of species richness or diversity indices differed from those expected by chance, and we illustrate these tests using data from a hierarchical study of forest-canopy beetles. Two null hypotheses were implemented using individual- and sample-based randomization tests to generate null distributions for alpha and beta components of diversity at multiple sampling scales. The two tests differed in their null distributions and power to detect statistically significant diversity components. Individual-based randomization was more powerful at all hierarchical levels and was sensitive to departures between observed and null partitions due to intraspecific aggregation of individuals. Sample-based randomization had less power but still may be useful for determining whether different habitats show a higher degree of differentiation in species diversity compared with random samples from the landscape. Null hypothesis tests provide a basis for inferences on partitions of species richness or diversity indices at multiple sampling levels, thereby increasing our understanding of how alpha and beta diversity change across spatial scales.
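A one-level version of the individual-based randomization test can be sketched as follows. The tiny community matrix is invented; each inner list holds the species identities of individuals in one sample, alpha is mean within-sample richness, and beta = gamma - alpha under the additive partition.

```python
import random

# Additive partition of species richness with an individual-based
# randomization test (simplified one-level sketch; data are invented).
samples = [["a", "a", "b"], ["b", "b", "c"], ["d", "d", "d"]]

def richness(s):
    return len(set(s))

def partition(samples):
    alpha = sum(richness(s) for s in samples) / len(samples)
    gamma = richness([x for s in samples for x in s])
    return alpha, gamma - alpha           # (alpha, beta = gamma - alpha)

alpha_obs, beta_obs = partition(samples)

# Null: shuffle individuals among samples (sizes fixed) and re-partition;
# aggregation of conspecifics inflates beta relative to this null.
random.seed(0)
pool = [x for s in samples for x in s]
sizes = [len(s) for s in samples]
null_betas = []
for _ in range(999):
    random.shuffle(pool)
    shuffled, i = [], 0
    for n in sizes:
        shuffled.append(pool[i:i + n])
        i += n
    null_betas.append(partition(shuffled)[1])
p_value = (1 + sum(b >= beta_obs for b in null_betas)) / (1 + 999)
```

Because species "d" is aggregated into one sample, observed beta sits high in the null distribution, which is exactly the intraspecific-aggregation sensitivity the abstract attributes to individual-based randomization.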


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司) | 京ICP备09084417号