Similar literature
 20 similar articles found (search time: 15 ms)
1.
Data analytic methods for matched case-control studies   Cited by: 3 (self-citations: 0; citations by others: 3)
D Pregibon 《Biometrics》1984,40(3):639-651
The recent introduction of complex multivariate statistical models in matched case-control studies is a mixed blessing. Their use can lead to a better understanding of the way in which many variables contribute to the risk of disease. On the other hand, these powerful methods can obscure salient features in the data that might have been detected by other, less sophisticated methods. This shortcoming is due to a lack of support methodology for the routine use of these models. Satisfactory computation of estimated relative risks and their standard errors is not sufficient justification for the fitted model. Goodness of fit must be examined if inferences are to be trusted. This paper is concerned with the analysis of matched case-control studies with logistic models. Analogies of these models to linear regression models are emphasized. In particular, basic concepts such as analysis of variance, multiple correlation coefficient, one-degree-of-freedom tests, and residual analysis are discussed. The fairly new field of regression diagnostics is also introduced. All procedures are illustrated on a study of bladder cancer in males.
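
As a minimal illustration of the logistic model for matched sets discussed in this abstract, the sketch below evaluates the conditional log-likelihood and its score for a single covariate. The function names and the data are hypothetical and are not taken from the paper. The check at the end relies on the standard result that, for 1:1 matching with a binary exposure, the conditional estimate of the odds ratio is the ratio of the two discordant-pair counts.

```python
import math


def conditional_log_likelihood(beta, matched_sets):
    """Conditional log-likelihood for matched sets with one case each.

    Each matched set is a list of covariate values with the case first.
    """
    total = 0.0
    for members in matched_sets:
        denom = sum(math.exp(beta * x) for x in members)
        total += beta * members[0] - math.log(denom)
    return total


def conditional_score(beta, matched_sets):
    """Derivative of the conditional log-likelihood with respect to beta."""
    score = 0.0
    for members in matched_sets:
        weights = [math.exp(beta * x) for x in members]
        expected = sum(w * x for w, x in zip(weights, members)) / sum(weights)
        score += members[0] - expected
    return score


# Hypothetical 1:1 data with a binary exposure: 6 pairs with only the case
# exposed, 2 pairs with only the control exposed, and 3 concordant pairs.
pairs = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 2 + [[0, 0]]

# Closed-form estimate for this special case: log of the discordant-pair ratio.
beta_hat = math.log(6 / 2)
```

Concordant pairs contribute a constant to the likelihood, so only the eight discordant pairs determine `beta_hat`; the score evaluated there should be zero up to rounding.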

2.
Chen J  Rodriguez C 《Biometrics》2007,63(4):1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.

3.
D Zelterman  C T Le 《Biometrics》1991,47(2):751-755
We examine several tests of homogeneity of the odds ratio in the analysis of 2 × 2 tables arising from epidemiologic 1:R matched case-control studies. The T4 and T5 statistics proposed by Liang and Self (1985, Biometrika 72, 353-358) are unable to detect obvious inhomogeneity in two numerical examples and in simulation studies. The null hypothesis is rejected by the chi-square statistic of Ejigou and McHugh (1984, Biometrika 71, 408-411) and by a newly proposed method whose significance level must be simulated.

4.
We propose a conditional scores procedure for obtaining bias-corrected estimates of log odds ratios from matched case-control data in which one or more covariates are subject to measurement error. The approach involves conditioning on sufficient statistics for the unobservable true covariates that are treated as fixed unknown parameters. For the case of Gaussian nondifferential measurement error, we derive a set of unbiased score equations that can then be solved to estimate the log odds ratio parameters of interest. The procedure successfully removes the bias in naive estimates, and standard error estimates are obtained by resampling methods. We present an example of the procedure applied to data from a matched case-control study of prostate cancer and serum hormone levels, and we compare its performance to that of regression calibration procedures.

5.
Kim I  Cohen ND  Carroll RJ 《Biometrics》2003,59(4):1158-1169
We develop semiparametric methods for matched case-control studies using regression splines. Three methods are developed: 1) an approximate cross-validation scheme to estimate the smoothing parameter inherent in regression splines, as well as 2) Monte Carlo expectation maximization (MCEM) and 3) Bayesian methods to fit the regression spline model. We compare the approximate cross-validation approach, MCEM, and Bayesian approaches using simulation, showing that they appear approximately equally efficient; the approximate cross-validation method is computationally the most convenient. An example from equine epidemiology that motivated the work is used to demonstrate our approaches.

6.
Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed methods for locus-wide inference using nonnegative single-variant test statistics based on unrealistic comparisons. However, such methods are robust to the inclusion of neutral and protective variants and therefore may be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics to two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than another existing method closely related to some recently proposed methods. 
We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests in sequence data with rare variants.

7.
Exact inference for matched case-control studies   Cited by: 1 (self-citations: 0; citations by others: 1)
K F Hirji  C R Mehta  N R Patel 《Biometrics》1988,44(3):803-814
In an epidemiological study with a small sample size or a sparse data structure, the use of an asymptotic method of analysis may not be appropriate. In this paper we present an alternative method of analyzing data for case-control studies with a matched design that does not rely on large-sample assumptions. A recursive algorithm to compute the exact distribution of the conditional sufficient statistics of the parameters of the logistic model for such a design is given. This distribution can be used to perform exact inference on model parameters, the methodology of which is outlined. To illustrate the exact method, and compare it with the conventional asymptotic method, analyses of data from two case-control studies are also presented.
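
The idea of building an exact distribution one matched set at a time can be sketched as follows. This is a simple convolution under stated assumptions, not the authors' recursive algorithm: it handles a single integer-valued covariate, one case per matched set, and only the null hypothesis beta = 0, under which every member of a set is equally likely to be the case. All names and the data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def exact_null_distribution(matched_sets):
    """Exact null distribution of T, the sum of the cases' covariate values.

    Each matched set is a list of integer covariate values with the case
    listed first. The distribution is built by convolving one set at a time.
    """
    dist = {0: Fraction(1)}
    for members in matched_sets:
        weight = Fraction(1, len(members))
        updated = defaultdict(Fraction)
        for total, prob in dist.items():
            for x in members:
                updated[total + x] += prob * weight
        dist = dict(updated)
    return dist


def exact_upper_p_value(matched_sets):
    """Observed statistic and the exact one-sided p-value P(T >= observed)."""
    observed = sum(members[0] for members in matched_sets)
    dist = exact_null_distribution(matched_sets)
    p_value = sum(p for t, p in dist.items() if t >= observed)
    return observed, p_value


# Hypothetical 1:1 data with a binary exposure (case first in each pair).
pairs = [[1, 0], [1, 0], [1, 0], [0, 1], [1, 1]]
observed, p_value = exact_upper_p_value(pairs)
```

Exact rational arithmetic with `Fraction` avoids rounding error in small samples, which is the setting where an exact method is of most interest. In this example the concordant pair always contributes 1, so the p-value reduces to a binomial tail over the four discordant pairs.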

8.
The paper considers the problem of determining the number of matched sets in 1 : M matched case-control studies with a categorical exposure having k + 1 categories, k ≥ 1. The basic interest lies in constructing a test statistic to test whether the exposure is associated with the disease. Estimates of the k odds ratios for 1 : M matched case-control studies with dichotomous exposure and for 1 : 1 matched case-control studies with exposure at several levels are presented in Breslow and Day (1980), but results holding in full generality were not available so far. We propose a score test for testing the hypothesis of no association between disease and the polychotomous exposure. We exploit the power function of this test statistic to calculate the required number of matched sets to detect specific departures from the null hypothesis of no association. We also consider the situation when there is a natural ordering among the levels of the exposure variable. For ordinal exposure variables, we propose a test for detecting trend in disease risk with increasing levels of the exposure variable. Our methods are illustrated with two datasets: a real dataset on colorectal cancer in rats and a simulated dataset for studying disease-gene association.

9.
Sensitivity analysis for matched case-control studies   Cited by: 1 (self-citations: 0; citations by others: 1)
P R Rosenbaum 《Biometrics》1991,47(1):87-100
A sensitivity analysis in an observational study indicates the degree to which conclusions would be altered by hidden biases of various magnitudes. A method of sensitivity analysis previously proposed for cohort studies is extended for use in matched case-control studies with multiple controls, where slightly different derivations and calculations are required. Also discussed is a sensitivity analysis for case-control studies that have two distinct types of controls, say hospital and neighborhood controls, where the two types may be affected by different biases. For illustration, the method is applied to five case-control studies, including a study of herniated lumbar disc in which there are three types of cases, and a study of breast cancer with two types of controls.

10.
The problem of exact conditional inference for discrete multivariate case-control data has two forms. The first is grouped case-control data, where Monte Carlo computations can be done using the importance sampling method of Booth and Butler (1999, Biometrika 86, 321-332), or a proposed alternative sequential importance sampling method. The second form is matched case-control data. For this analysis we propose a new exact sampling method based on the conditional-Poisson distribution for conditional testing with one binary and one integral ordered covariate. This method makes computations on data sets with large numbers of matched sets fast and accurate. We provide detailed derivation of the constraints and conditional distributions for conditional inference on grouped and matched data. The methods are illustrated on several new and old data sets.

11.
Ghosh D 《Biometrics》2003,59(3):721-726
In tumorigenicity experiments, a complication is that the time to event is generally not observed, so that the time to tumor is subject to interval censoring. One of the goals in these studies is to properly model the effect of dose on risk. Thus, it is important to have goodness of fit procedures available for assessing the model fit. While several estimation procedures have been developed for current-status data, relatively little work has been done on model-checking techniques. In this article, we propose numerical and graphical methods for the analysis of current-status data using the additive-risk model, primarily focusing on the situation where the monitoring times are dependent. The finite-sample properties of the proposed methodology are examined through numerical studies. The methods are then illustrated with data from a tumorigenicity experiment.

12.
Population structure has been presumed to cause many of the unreplicated disease-marker associations reported in the literature, yet few actual case-control studies have been evaluated for the presence of structure. Here, we examine four moderate case-control samples, comprising 3,472 individuals, to determine if detectable population subdivision is present. The four population samples include: 500 U.S. whites and 236 African Americans with hypertension; and 500 U.S. whites and 500 Polish whites with type 2 diabetes, all with matched control subjects. Both diabetes populations were typed for the PPARγ Pro12Ala polymorphism, to replicate this well-supported association (Altshuler et al. 2000). In each of the four samples, we tested for structure, using the sum of the case-control allele frequency χ² statistics for 9 STR and 35 SNP markers (Pritchard and Rosenberg 1999). We found weak evidence for population structure in the African American sample only, but further refinement of the sample, to include only individuals with U.S.-born parents and grandparents, eliminated the stratification. Our examples provide insight into the factors affecting the replication of association studies and suggest that carefully matched, moderate-sized case-control samples in cosmopolitan U.S. and European populations are unlikely to contain levels of structure that would result in significantly inflated numbers of false-positive associations. We explore the role that extreme differences in power among studies, due to sample size and risk-allele frequency differences, may play in the replication problem.

13.
We introduce a liability-threshold mixed linear model (LTMLM) association statistic for case-control studies and show that it has a well-controlled false-positive rate and more power than existing mixed-model methods for diseases with low prevalence. Existing mixed-model methods suffer a loss in power under case-control ascertainment, but no solution has been proposed. Here, we solve this problem by using a χ² score statistic computed from posterior mean liabilities (PMLs) under the liability-threshold model. Each individual's PML is conditional not only on that individual's case-control status but also on every individual's case-control status and the genetic relationship matrix (GRM) obtained from the data. The PMLs are estimated with a multivariate Gibbs sampler; the liability-scale phenotypic covariance matrix is based on the GRM, and a heritability parameter is estimated via Haseman-Elston regression on case-control phenotypes and then transformed to the liability scale. In simulations of unrelated individuals, the LTMLM statistic was correctly calibrated and achieved higher power than existing mixed-model methods for diseases with low prevalence, and the magnitude of the improvement depended on sample size and severity of case-control ascertainment. In a Wellcome Trust Case Control Consortium 2 multiple sclerosis dataset with >10,000 samples, LTMLM was correctly calibrated and attained a 4.3% improvement (p = 0.005) in χ² statistics over existing mixed-model methods at 75 known associated SNPs, consistent with simulations. Larger increases in power are expected at larger sample sizes. In conclusion, case-control studies of diseases with low prevalence can achieve power higher than that in existing mixed-model methods.
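
To make the notion of a posterior mean liability concrete, the sketch below computes it in the simplest possible setting: a single individual with a standard normal liability, conditioning only on that individual's own case-control status. This deliberately ignores the GRM and the statuses of other individuals, so it is a univariate simplification rather than the multivariate Gibbs sampler described in the abstract. The function name is hypothetical; the formulas are the usual truncated-normal means.

```python
from statistics import NormalDist


def posterior_mean_liability(prevalence):
    """Threshold and posterior mean liabilities for a case and a control.

    Assumes liability ~ N(0, 1) and that an individual is a case exactly
    when the liability exceeds the threshold implied by the prevalence.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    density = std_normal.pdf(threshold)
    case_mean = density / prevalence              # E[l | l > threshold]
    control_mean = -density / (1.0 - prevalence)  # E[l | l < threshold]
    return threshold, case_mean, control_mean


# Hypothetical prevalence of 1%.
threshold, case_mean, control_mean = posterior_mean_liability(0.01)
```

For a rare disease the case mean lies far above zero while the control mean is only slightly negative, which gives some intuition for why a liability-based statistic may carry more information than the 0/1 phenotype at low prevalence.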

14.
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
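
The aggregation of study-specific score statistics can be sketched for the burden-test case. The example assumes genetic effects are homogeneous across studies, so that per-variant scores and their covariance matrices can simply be summed; it does not cover the variance component tests or the heterogeneous-effects extensions mentioned in the abstract. Function names, weights, and summary statistics are hypothetical.

```python
import math


def combine_scores(studies):
    """Sum per-study score vectors and covariance matrices.

    `studies` is a list of (scores, covariance) pairs, where `scores` is a
    list of length m and `covariance` is an m x m list of lists.
    """
    m = len(studies[0][0])
    total_u = [0.0] * m
    total_v = [[0.0] * m for _ in range(m)]
    for scores, covariance in studies:
        for i in range(m):
            total_u[i] += scores[i]
            for j in range(m):
                total_v[i][j] += covariance[i][j]
    return total_u, total_v


def burden_meta_test(studies, weights):
    """Burden statistic (w'U)^2 / (w'Vw) and its 1-df chi-square p-value."""
    u, v = combine_scores(studies)
    m = len(weights)
    numerator = sum(weights[i] * u[i] for i in range(m)) ** 2
    denominator = sum(weights[i] * v[i][j] * weights[j]
                      for i in range(m) for j in range(m))
    statistic = numerator / denominator
    # Upper tail of a chi-square with 1 df: P(Z^2 > q) = erfc(sqrt(q / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical summary statistics for two studies and two variants.
studies = [
    ([1.0, 2.0], [[2.0, 0.0], [0.0, 2.0]]),
    ([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]]),
]
statistic, p_value = burden_meta_test(studies, weights=[1.0, 1.0])
```

Only the null model has to be fitted in each study to obtain these inputs, which is why the approach sidesteps unstable per-variant regression coefficients.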

15.
The accessibility of high-throughput genotyping technology enables genome-wide association studies of common complex diseases. This paper addresses two challenges commonly facing such studies: (i) searching an enormous number of possible gene interactions and (ii) finding reproducible associations. These challenges have traditionally been addressed with statistical methods; here we apply two computational approaches, optimization and cross-validation. A complex risk factor is modeled as a subset of single nucleotide polymorphisms (SNPs) with specified alleles, and the optimization formulation asks for the subset with the maximum odds ratio. To measure and compare the ability of search methods to find reproducible risk factors, we propose to apply a cross-validation scheme usually used for prediction validation. We have applied and cross-validated known search methods with proposed enhancements on real case-control studies for several diseases (Crohn's disease, autoimmune disorder, tick-borne encephalitis, lung cancer, and rheumatoid arthritis). The proposed methods compare favorably with exhaustive search: they are faster, more frequently find statistically significant risk factors, and have a significantly higher leave-half-out cross-validation rate.

16.
Case-control studies offer a rapid and efficient way to evaluate hypotheses. On the other hand, proper selection of the controls is challenging, and the potential for selection bias is a major weakness. Valid inferences about parameters of interest cannot be drawn if selection bias exists. Furthermore, the selection bias is difficult to evaluate. Even in situations where selection bias can be estimated, few methods are available. In the matched case-control Northern Manhattan Stroke Study (NOMASS), stroke-free controls are sampled in two stages. First, a telephone survey ascertains demographic and exposure status from a large random sample. Then, in an in-person interview, detailed information is collected for the selected controls to be used in a matched case-control study. The telephone survey data provide information about the selection probability and the potential selection bias. In this article, we propose bias-corrected estimators in a case-control study using a joint estimating equation approach. The proposed bias-corrected estimate and its standard error can be easily obtained by standard statistical software.

17.
In this paper, we develop Poisson-type regression methods that require that the durations of exposure be measured only on a possibly nonrandom subset of the cohort members. These methods can be used to make inferences about the incidence density during exposure as well as the ratio of incidence densities during exposure versus not during exposure. Numerical studies demonstrate that the proposed methods yield reliable results in practical settings. We describe an application to a population-based case-control study assessing the transient increase in the risk of primary cardiac arrest during leisure-time physical activity.

18.
We propose a simple method for comparison of series of matched observations. While in all our examples we address “individual bioequivalence” (IBE), which is the subject of much discussion in pharmaceutical statistics, the methodology can be applied to a wide class of cross-over experiments, including cross-over imaging. From the statistical point of view the considered models belong to the class of the “error-in-variables” models. In computational statistics the corresponding optimization method is referred to as the “least squares distance” and the “total least squares” method. The derived confidence regions for both intercept and slope provide the basis for formulation of the IBE criteria and methods for its assessment. Simple simulations show that the proposed approach is very intuitive and transparent, and, at the same time, has a solid statistical and computational background.
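
The "total least squares" fit named in this abstract can be illustrated with the closed-form solution for a straight line, which minimizes the sum of squared perpendicular distances from the points to the line. The sketch assumes equal error variances in both variables and a nonzero sample covariance; it does not reproduce the paper's confidence regions or IBE criteria, and the function name and data are hypothetical.

```python
import math


def orthogonal_regression(xs, ys):
    """Intercept and slope of the total least squares (orthogonal) line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    # Closed-form slope for equal error variances in x and y.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical paired measurements lying exactly on y = 2x + 1.
intercept, slope = orthogonal_regression([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Unlike ordinary least squares, this fit treats the two series symmetrically, which suits matched observations where both measurements are subject to error.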

19.
In this article, we propose a family of semiparametric transformation models with time-varying coefficients for recurrent event data in the presence of a terminal event such as death. The new model offers great flexibility in formulating the effects of covariates on the mean functions of the recurrent events among survivors at a given time. For the inference on the proposed models, a class of estimating equations is developed and asymptotic properties of the resulting estimators are established. In addition, a lack-of-fit test is provided for assessing the adequacy of the model, and some tests are presented for investigating whether or not covariate effects vary with time. The finite-sample behavior of the proposed methods is examined through Monte Carlo simulation studies, and an application to a bladder cancer study is also illustrated.

20.
In a preceding paper, we (Nurminen et al. 1981) advocated the use of the sole referent series as the basis of estimating moments in the construction of test statistics for comparative studies. Three simple test statistics, two metric approaches and one procedure based on ranks, incorporating this principle are introduced for small matched samples with ordinal outcome variables. Associated methods for computing an “exact” probability value are derived. The techniques are illustrated by real data from a study in the field of occupational health epidemiology.
