Similar Documents
20 similar documents found (search time: 31 ms)
1.
Rosenbaum PR. Biometrics. 2007;63(2):456-464
Huber's m-estimates use an estimating equation in which observations are permitted a controlled level of influence. The family of m-estimates includes least squares and maximum likelihood, but typical applications give extreme observations limited weight. Maritz proposed methods of exact and approximate permutation inference for m-tests, confidence intervals, and estimators, which can be derived from random assignment of paired subjects to treatment or control. In contrast, in observational studies, where treatments are not randomly assigned, subjects matched for observed covariates may differ in terms of unobserved covariates, so differing outcomes may not be treatment effects. In observational studies, a method of sensitivity analysis is developed for m-tests, m-intervals, and m-estimates: it shows the extent to which inferences would be altered by biases of various magnitudes due to nonrandom treatment assignment. The method is developed for both matched pairs, with one treated subject matched to one control, and for matched sets, with one treated subject matched to one or more controls. The method is illustrated using two studies: (i) a paired study of damage to DNA from exposure to chromium and nickel and (ii) a study with one or two matched controls comparing side effects of two drug regimes to treat tuberculosis. The approach yields sensitivity analyses for: (i) m-tests with Huber's weight function and other robust weight functions, (ii) the permutational t-test which uses the observations directly, and (iii) various other procedures such as the sign test, Noether's test, and the permutation distribution of the efficient score test for a location family of distributions. Permutation inference with covariance adjustment is briefly discussed.
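The permutation m-test that these sensitivity analyses build on can be sketched in a few lines: under random assignment within pairs, each matched-pair difference is equally likely to carry either sign, and the statistic sums Huber's ψ applied to scaled differences. A minimal Monte Carlo sketch (the robust scale estimate and Huber constant below are illustrative choices, not the paper's exact specification):

```python
import numpy as np

def huber_psi(x, k=1.345):
    """Huber's psi function: identity near zero, clipped at +/- k."""
    return np.clip(x, -k, k)

def m_test_pvalue(d, n_perm=100_000, seed=0):
    """Monte Carlo permutation m-test for matched-pair differences d:
    under random assignment, each difference is equally likely to
    appear with either sign."""
    rng = np.random.default_rng(seed)
    s = np.median(np.abs(d))                # illustrative robust scale
    t_obs = huber_psi(d / s).sum()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    t_null = huber_psi(signs * d / s).sum(axis=1)
    return np.mean(np.abs(t_null) >= abs(t_obs))

# Toy paired differences (e.g., treated-minus-control damage scores).
d = np.array([0.8, 1.2, -0.3, 2.1, 0.9, 1.7, -0.1, 1.1])
print(m_test_pvalue(d))
```

The sensitivity analysis then asks how large a bias in the sign-flip probabilities (away from 1/2) would be needed to overturn this inference.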

2.
The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence, for example, of biological plausibility or of biases in the studies. A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a "prior" probability distribution for the value of interest (for example, a true relative risk), based on previous knowledge, and adds the new evidence (via a model) to produce a "posterior" probability distribution. Because different experts will have different prior beliefs, sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence.
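As a concrete illustration of the prior-to-posterior step, here is a minimal sketch of conjugate normal updating on the log relative-risk scale; the prior and study numbers are hypothetical, and a real analysis would rerun it across a range of priors, which is exactly the sensitivity analysis recommended above:

```python
import numpy as np

def posterior_log_rr(prior_mean, prior_sd, est_log_rr, se_log_rr):
    """Conjugate normal update on the log relative-risk scale."""
    w_prior, w_data = 1 / prior_sd**2, 1 / se_log_rr**2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est_log_rr)
    return post_mean, np.sqrt(post_var)

# Hypothetical numbers: a sceptical prior centred on RR = 1, and a
# study estimate of RR = 2.0 with 95% CI roughly (1.4, 2.9).
m, s = posterior_log_rr(prior_mean=0.0, prior_sd=0.3,
                        est_log_rr=np.log(2.0), se_log_rr=0.19)
print(f"posterior RR = {np.exp(m):.2f}, 95% interval = "
      f"({np.exp(m - 1.96 * s):.2f}, {np.exp(m + 1.96 * s):.2f})")
```

Varying `prior_sd` (and `prior_mean`) over plausible expert beliefs produces the family of posterior distributions the authors argue should guide policy.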

3.
There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies.
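A hedged sketch of the supervised approach, using scikit-learn's SVM on synthetic genotype data (the allele-frequency shift and sample sizes are invented for illustration; the paper's actual data and tuning will differ). Cross-validated classification accuracy above chance is the signal that the two candidate populations are distinguishable:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic genotypes: two populations with slightly different allele
# frequencies at a subset of SNPs (values 0/1/2 = minor-allele count).
rng = np.random.default_rng(1)
n, n_snps = 200, 1000
freqs_a = rng.uniform(0.1, 0.5, n_snps)
freqs_b = freqs_a.copy()
freqs_b[:50] += 0.05                      # weak structure at 50 SNPs
X = np.vstack([rng.binomial(2, freqs_a, (n, n_snps)),
               rng.binomial(2, freqs_b, (n, n_snps))])
y = np.repeat([0, 1], n)

# Supervised detection: accuracy reliably above 0.5 signals structure.
acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```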

4.
There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels (e.g. differences in labeling efficiency between the two fluorescent dyes). The term normalization refers to the process of removing such variation. A constant adjustment is often used to force the distribution of the intensity log ratios to have a median of zero for each slide. However, such global normalization approaches are not adequate in situations where dye biases can depend on spot overall intensity and/or spatial location within the array. This article proposes normalization methods that are based on robust local regression and account for intensity and spatial dependence in dye biases for different types of cDNA microarray experiments. The selection of appropriate controls for normalization is discussed and a novel set of controls (microarray sample pool, MSP) is introduced to aid in intensity-dependent normalization. Lastly, to allow for comparisons of expression levels across slides, a robust method based on maximum likelihood estimation is proposed to adjust for scale differences among slides.
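The within-slide, intensity-dependent step can be sketched as fitting a robust local regression of the log-ratio M on the average log-intensity A and subtracting the fit. The original methods are implemented in R; this Python/lowess version, with an invented dye bias, is an illustrative approximation that omits the spatial and between-slide scale adjustments:

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def intensity_normalize(red, green, frac=0.4):
    """Subtract a lowess fit of M = log2(R/G) on A = mean log2 intensity."""
    m = np.log2(red) - np.log2(green)
    a = 0.5 * (np.log2(red) + np.log2(green))
    fit = lowess(m, a, frac=frac, return_sorted=False)  # fitted M at each A
    return m - fit    # normalized log-ratios, centred on the local trend

# Toy slide with an intensity-dependent dye bias.
rng = np.random.default_rng(0)
g = rng.lognormal(8, 1, 5000)
r = g * 2 ** (0.5 - 0.05 * np.log2(g) + rng.normal(0, 0.2, 5000))
print(np.median(intensity_normalize(r, g)))   # close to 0 after normalization
```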

5.
Thomas DC, Blettner M, Day NE. Biometrics. 1992;48(3):781-794
A method is proposed for analysis of nested case-control studies that combines the matched comparison of covariate values between cases and controls and a comparison of the observed numbers of cases in the nesting cohort with expected numbers based on external rates and average relative risks estimated from the controls. The former comparison is based on the conditional likelihood for matched case-control studies and the latter on the unconditional likelihood for Poisson regression. It is shown that the two likelihoods are orthogonal and that their product is an estimator of the full survival likelihood that would have been obtained on the total cohort, had complete covariate data been available. Parameter estimation and significance tests follow in the usual way by maximizing this product likelihood. The method is illustrated using data on leukemia following irradiation for cervical cancer. In this study, the original cohort study showed a clear excess of leukemia in the first 15 years after exposure, but it was not feasible to obtain dose estimates on the entire cohort. However, the subsequent nested case-control study failed to demonstrate significant differences between alternative dose-response relations and effects of time-related modifiers. The combined analysis allows much clearer discrimination between alternative dose-time-response models.
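Schematically, the combined likelihood described here factorizes into two orthogonal pieces. The notation below is a reconstruction from the abstract, not the paper's exact formulation: r(x; β) is the relative risk for each case against its matched risk set R_i, and O_k and E_k are observed and expected case counts in cohort strata:

```latex
\[
  L(\beta,\theta)
    = \underbrace{\prod_{i}
        \frac{r(\mathbf{x}_{i0};\beta)}
             {\sum_{j \in R_i} r(\mathbf{x}_{ij};\beta)}}_{L_{\mathrm{cond}}:\ \text{matched case--control}}
      \;\times\;
      \underbrace{\prod_{k}
        \frac{E_k(\beta,\theta)^{O_k}\, e^{-E_k(\beta,\theta)}}{O_k!}}_{L_{\mathrm{Poisson}}:\ \text{cohort observed vs.\ expected}}
\]
```

Maximizing the product then yields the usual estimates and tests, as the abstract states.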

6.
In many case-control genetic association studies, a set of correlated secondary phenotypes that may share common genetic factors with disease status are collected. Examination of these secondary phenotypes can yield valuable insights about the disease etiology and supplement the main studies. However, due to unequal sampling probabilities between cases and controls, standard regression analysis that assesses the effect of SNPs (single nucleotide polymorphisms) on secondary phenotypes using cases only, controls only, or combined samples of cases and controls can yield inflated type I error rates when the test SNP is associated with the disease. To solve this issue, we propose a Gaussian copula-based approach that efficiently models the dependence between disease status and secondary phenotypes. Through simulations, we show that our method yields correct type I error rates for the analysis of secondary phenotypes under a wide range of situations. To illustrate the effectiveness of our method in the analysis of real data, we applied our method to a genome-wide association study on high-density lipoprotein cholesterol (HDL-C), where "cases" are defined as individuals with extremely high HDL-C level and "controls" are defined as those with low HDL-C level. We treated 4 quantitative traits with varying degrees of correlation with HDL-C as secondary phenotypes and tested for association with SNPs in LIPG, a gene that is well known to be associated with HDL-C. We show that when the correlation between the primary and secondary phenotypes is >0.2, the P values from the case-control combined unadjusted analysis are much more significant than those from methods that aim to correct for ascertainment bias. Our results suggest that to avoid false-positive associations, it is important to appropriately model secondary phenotypes in case-control genetic association studies.
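The ascertainment problem is easy to reproduce by simulation: if the secondary phenotype raises disease risk and the SNP independently raises disease risk, case-control sampling on disease status induces a spurious SNP-phenotype association. A minimal sketch of the naive combined-sample analysis (all effect sizes and sampling fractions are invented; this illustrates the problem, not the authors' copula correction):

```python
import numpy as np
from scipy import stats

def one_replicate(rng, n_cases=500, n_controls=500,
                  b_g=np.log(1.5), b_y=np.log(2.0)):
    """Secondary phenotype Y raises disease risk; SNP G raises disease
    risk; G has NO direct effect on Y (the null being tested)."""
    n_pop = 200_000
    g = rng.binomial(2, 0.3, n_pop)
    y = rng.normal(0, 1, n_pop)
    p = 1 / (1 + np.exp(-(-4 + b_g * g + b_y * y)))
    d = rng.random(n_pop) < p
    cases = rng.choice(np.where(d)[0], n_cases, replace=False)
    controls = rng.choice(np.where(~d)[0], n_controls, replace=False)
    idx = np.concatenate([cases, controls])
    slope, _, _, pval, _ = stats.linregress(g[idx], y[idx])  # naive Y ~ G
    return pval

rng = np.random.default_rng(2)
pvals = np.array([one_replicate(rng) for _ in range(500)])
print("type I error at 0.05:", np.mean(pvals < 0.05))  # well above 0.05
```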

7.
The most simple and commonly used approach for genetic associations is the case-control study design of unrelated people. This design is susceptible to population stratification. This problem is obviated in family-based studies, but it is usually difficult to accumulate large enough samples of well-characterized families. We addressed empirically whether the two designs give similar estimates of association in 93 investigations where both unrelated case-control and family-based designs had been employed. Estimated odds ratios differed beyond chance between the two designs in only four instances (4%). The summary relative odds ratio (ROR) (the ratio of odds ratios obtained from unrelated case-control and family-based studies) was close to unity (0.96 [95% confidence interval, 0.91-1.01]). There was no heterogeneity in the ROR across studies (amount of heterogeneity beyond chance I² = 0%). Differences on whether results were nominally statistically significant (p < 0.05) or not with the two designs were common (opposite classification rates 14% and 17%); this reflected largely differences in power. Conclusions were largely similar in diverse subgroup analyses. Unrelated case-control and family-based designs give overall similar estimates of association. We cannot rule out rare large biases or common small biases.
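The summary ROR reported here is a standard inverse-variance combination on the log scale; a minimal sketch with hypothetical inputs (per-study ORs and standard errors of the log ORs):

```python
import numpy as np

def summary_ror(or_cc, se_cc, or_fam, se_fam):
    """Fixed-effect summary of the relative odds ratio (ROR): the ratio
    of the case-control OR to the family-based OR, combined on the log
    scale with inverse-variance weights. Inputs are per-study ORs and
    standard errors of the log ORs."""
    log_ror = np.log(or_cc) - np.log(or_fam)
    var = np.asarray(se_cc) ** 2 + np.asarray(se_fam) ** 2
    w = 1 / var
    m = np.sum(w * log_ror) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    return np.exp(m), (np.exp(m - 1.96 * se), np.exp(m + 1.96 * se))

# Hypothetical three-study example.
print(summary_ror(or_cc=[1.4, 0.9, 1.1], se_cc=[0.15, 0.20, 0.10],
                  or_fam=[1.5, 1.0, 1.1], se_fam=[0.25, 0.30, 0.18]))
```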

8.
In this paper we propose a method to be used in the planning stage of a case-control study. An allocation rule for controls in multicenter case-control studies is proposed which would assure a simple, efficient and unbiased estimation of the odds ratio in the pooled data. It is shown that the efficiency of the design increases with increasing correlation between study center and risk factor. Sources of bias and their implications for relative risk estimation are discussed. The method is demonstrated with data from a case-control study.

9.
A large number of factors can affect the statistical power and bias of analyses of data from large cohort studies, including misclassification, correlated data, follow-up time, prevalence of the risk factor of interest, and prevalence of the outcome. This paper presents a method for simulating cohorts where individuals' risk is correlated within communities, recruitment is staggered over time, and outcomes are observed after different follow-up periods. Covariates and outcomes are misclassified, and Cox proportional hazards models are fit with a community-level frailty term. The effects on study power of varying effect sizes, prevalences, correlation, and misclassification are explored, as well as of varying the proportion of controls in nested case-control studies.
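A stripped-down version of such a simulator: gamma frailties shared within communities, staggered entry, administrative censoring, and misclassified exposure. All rates and parameters below are invented, and fitting the community-frailty Cox model to the output is left to a survival package:

```python
import numpy as np
import pandas as pd

def simulate_cohort(n_communities=50, n_per=100, hr_exposed=1.5,
                    frailty_var=0.3, miscls=0.05, seed=3):
    """Simulate community-clustered survival data with staggered entry
    and misclassified exposure (a sketch of the abstract's design)."""
    rng = np.random.default_rng(seed)
    frailty = rng.gamma(1 / frailty_var, frailty_var, n_communities)  # mean 1
    rows = []
    for c in range(n_communities):
        exposed = rng.random(n_per) < 0.3
        rate = 0.01 * frailty[c] * np.where(exposed, hr_exposed, 1.0)
        t_event = rng.exponential(1 / rate)
        entry = rng.uniform(0, 2, n_per)            # staggered recruitment
        followup = np.minimum(t_event, 5 - entry)   # admin censoring, year 5
        observed = exposed ^ (rng.random(n_per) < miscls)  # misclassification
        rows.append(pd.DataFrame({
            "community": c, "exposed_obs": observed,
            "time": followup, "event": t_event <= 5 - entry}))
    return pd.concat(rows, ignore_index=True)

print(simulate_cohort().groupby("exposed_obs")["event"].mean())
```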

10.
Lu SE, Wang MC. Biometrics. 2002;58(4):764-772
Cohort case-control design is an efficient and economical design to study risk factors for disease incidence or mortality in a large cohort. In the last few decades, a variety of cohort case-control designs have been developed and theoretically justified. These designs have been exclusively applied to the analysis of univariate failure-time data. In this work, a cohort case-control design adapted to multivariate failure-time data is developed. A risk set sampling method is proposed to sample controls from nonfailures in a large cohort for each case matched by failure time. This method leads to a pseudolikelihood approach for the estimation of regression parameters in the marginal proportional hazards model (Cox, 1972, Journal of the Royal Statistical Society, Series B 34, 187-220), where the correlation structure between individuals within a cluster is left unspecified. The performance of the proposed estimator is demonstrated by simulation studies. A bootstrap method is proposed for inferential purposes. This methodology is illustrated by a data example from a child vitamin A supplementation trial in Nepal (Nepal Nutrition Intervention Project-Sarlahi, or NNIPS).

11.
Case-control studies offer a rapid and efficient way to evaluate hypotheses. On the other hand, proper selection of the controls is challenging, and the potential for selection bias is a major weakness. Valid inferences about parameters of interest cannot be drawn if selection bias exists. Furthermore, the selection bias is difficult to evaluate. Even in situations where selection bias can be estimated, few methods are available. In the matched case-control Northern Manhattan Stroke Study (NOMASS), stroke-free controls are sampled in two stages. First, a telephone survey ascertains demographic and exposure status from a large random sample. Then, in an in-person interview, detailed information is collected for the selected controls to be used in a matched case-control study. The telephone survey data provides information about the selection probability and the potential selection bias. In this article, we propose bias-corrected estimators in a case-control study using a joint estimating equation approach. The proposed bias-corrected estimate and its standard error can be easily obtained by standard statistical software.

12.
A power calculation is crucial in planning genetic studies. In genetic association studies, the power is often calculated using the expected number of individuals with each genotype calculated from an assumed allele frequency under Hardy-Weinberg equilibrium. Since the allele frequency is often unknown, the number of individuals with each genotype is random, and so a power calculation assuming a known allele frequency may be incorrect. Ambrosius et al. recently showed that a power calculation ignoring this randomness may lead to studies with insufficient power, and proposed averaging the power over the randomness. We extend the method of averaging power in two directions. First, for testing association in case-control studies, we use the Cochran-Armitage trend test and find that the time needed for calculating the averaged power is much reduced compared to the chi-square test with two degrees of freedom studied by Ambrosius et al. A real study is used for illustration of the method. Second, we extend the method to linkage analysis, where the number of identical-by-descent alleles shared by siblings is random. The distribution of identical-by-descent numbers depends on the underlying genetic model rather than the allele frequency. The robust test for linkage analysis is also examined using the averaged powers. We also recommend a sensitivity analysis when the true allele frequency or the number of identical-by-descent alleles is unknown.
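The averaging idea is straightforward to sketch: instead of fixing genotype counts at their Hardy-Weinberg expectations, draw them from the multinomial they actually follow and average the rejection rate. A Monte Carlo sketch for the Cochran-Armitage trend test, using the usual normal approximation and a multiplicative per-allele risk model (all parameter values invented):

```python
import numpy as np
from scipy import stats

def trend_test_p(cases, controls):
    """Cochran-Armitage trend test (additive scores 0, 1, 2) via the
    standard normal approximation."""
    x = np.array([0.0, 1.0, 2.0])
    r, s = np.asarray(cases, float), np.asarray(controls, float)
    n = r + s
    R, N = r.sum(), n.sum()
    p_hat = R / N
    num = np.sum(x * (r - n * R / N))
    var = p_hat * (1 - p_hat) * (np.sum(x**2 * n) - np.sum(x * n) ** 2 / N)
    z = num / np.sqrt(var)
    return 2 * stats.norm.sf(abs(z))

def averaged_power(q=0.3, or_per_allele=1.3, n_case=500, n_ctrl=500,
                   alpha=5e-4, n_sim=2000, seed=4):
    """Average power over the randomness of genotype counts drawn under
    HWE (allele frequency q) rather than fixing them at expectations."""
    rng = np.random.default_rng(seed)
    p_ctrl = np.array([(1 - q)**2, 2 * q * (1 - q), q**2])  # HWE in controls
    w = p_ctrl * or_per_allele ** np.arange(3)              # multiplicative risk
    p_case = w / w.sum()
    hits = 0
    for _ in range(n_sim):
        cases = rng.multinomial(n_case, p_case)
        ctrls = rng.multinomial(n_ctrl, p_ctrl)
        hits += trend_test_p(cases, ctrls) < alpha
    return hits / n_sim

print(averaged_power())
```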

13.
A method of inverse sampling of controls in a matched case-control study is described in which, for each case, controls are sampled until a discordant set is achieved. For a binary exposure, inverse sampling is used to determine the number of controls for each case. When most individuals in a population have the same exposure, standard case-control sampling may result in many case-control sets being concordant with respect to exposure and thus uninformative in the conditional logistic analysis. The method using inverse control sampling is proposed as a solution to this problem in situations when it is practically feasible. In many circumstances, inverse control sampling is found to offer improved statistical efficiency relative to a comparable study with a fixed number of controls per case.
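Under inverse sampling, the number of controls drawn for a case is geometric: sampling continues until a control's exposure differs from the case's. A small simulation sketch (the 90% exposure prevalence is an invented example of the "most individuals share the same exposure" setting):

```python
import numpy as np

def controls_needed(exposure_prev, case_exposed, rng):
    """Sample controls one at a time until the matched set is discordant,
    i.e. at least one control differs from the case on the binary
    exposure; return the number of controls sampled."""
    n = 0
    while True:
        n += 1
        control_exposed = rng.random() < exposure_prev
        if control_exposed != case_exposed:
            return n

rng = np.random.default_rng(5)
prev = 0.9  # most of the population is exposed
draws = [controls_needed(prev, case_exposed=True, rng=rng)
         for _ in range(10_000)]
# With a fixed 1:1 design, ~90% of sets for exposed cases would be
# concordant and uninformative; inverse sampling guarantees discordance.
print(np.mean(draws))   # roughly 1 / (1 - prev) = 10 controls per exposed case
```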

14.
In biomedical cohort studies for assessing the association between an outcome variable and a set of covariates, usually, some covariates can only be measured on a subgroup of study subjects. An important design question is which subjects to select into the subgroup to increase statistical efficiency. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design where cases and controls are further matched on a small number of complete discrete covariates. While the latter achieves success in estimating odds ratio (OR) parameters for the matching covariates, similar two-phase design options have not been explored for the remaining covariates, especially the incompletely collected ones. This is of great importance in studies where the covariates of interest cannot be completely collected. To this end, assuming that an external model is available to relate the outcome and complete covariates, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit based on the external model and further matches them on complete covariates similarly to the balanced design. We develop a pseudolikelihood method for estimating OR parameters. Through simulation studies and explorations in a real-cohort study, we find that our design generally leads to reduced asymptotic variances of the OR estimates and the reduction for the matching covariates is comparable to that of the balanced design.

15.
A very common polymorphism of p53, that of codon 72, codes either for a proline (P72) or an arginine (R72). The two alleles differ in their biological properties: P72 is a stronger inducer of p21, while R72 induces 5-10 times more apoptosis. It is not known, however, whether this polymorphism influences genome stability. The influence of the p53 codon 72 polymorphism on cancer risk has been studied for different types of cancer with mixed and inconsistent results. With respect to sporadic non-melanoma skin cancer (NMSC), there are few studies, with small sample sizes, and none in a Latin American population. These studies have found no association between p53 genotype at codon 72 and NMSC. We analyzed whether p53 codon 72 genotype influences genomic stability and the sensitivity of cells to UVB. We also carried out a case-control study of NMSC in a Mexican population which included 204 basal cell carcinoma (BCC) cases, 42 squamous cell carcinoma (SCC) cases, and 238 controls. There was no association between p53 genotype and basal levels of DNA damage, oxidative DNA damage sensitivity, or DNA repair capacity. R72 dominantly increased the in vitro sensitivity of cells to UVB-induced apoptosis. There was no significant association between p53 genotype and BCC, SCC, or both combined.

16.
Time-averaged field measurements produced by a Positron dosimeter worn by study subjects were the primary method of exposure evaluation in two Canadian studies of childhood leukemia and AC magnetic field exposure. Statistically significant but mutually contradictory results obtained in the two studies, done in different locales but under similar study conditions, have not been explained. This report examines operational features of the Positron meter, including an unanticipated sensitivity to wearer motion. If the convalescent cases studied were less active than their healthy controls, as one might expect, then the meter's characteristic responses to motion, particularly as they would affect case-control distributions above and below the different referent group cutpoints used in the two studies, could help to explain both the unexpected inverse risks reported in the larger study and the unusually high risks reported in the smaller study.

17.
Exact inference for matched case-control studies
Hirji KF, Mehta CR, Patel NR. Biometrics. 1988;44(3):803-814
In an epidemiological study with a small sample size or a sparse data structure, the use of an asymptotic method of analysis may not be appropriate. In this paper we present an alternative method of analyzing data for case-control studies with a matched design that does not rely on large-sample assumptions. A recursive algorithm to compute the exact distribution of the conditional sufficient statistics of the parameters of the logistic model for such a design is given. This distribution can be used to perform exact inference on model parameters, the methodology of which is outlined. To illustrate the exact method, and compare it with the conventional asymptotic method, analyses of data from two case-control studies are also presented.

18.
19.
The large variety of clustering algorithms and their variants can be daunting to researchers wishing to explore patterns within their microarray datasets. Furthermore, each clustering method has distinct biases in finding patterns within the data, and clusterings may not be reproducible across different algorithms. A consensus approach utilizing multiple algorithms can show where the various methods agree and expose robust patterns within the data. In this paper, we present a software package, Consense, written for R/Bioconductor, that utilizes such an approach to explore microarray datasets. Consense produces clustering results for each of the clustering methods and produces a report of metrics comparing the individual clusterings. A feature of Consense is identification of genes that cluster consistently with an index gene across methods. Utilizing simulated microarray data, the sensitivity of the metrics to the biases of the different clustering algorithms is explored. The framework is easily extensible, allowing this tool to be used with other functional genomic data types, as well as other high-throughput OMICS data types generated from metabolomic and proteomic experiments. It also provides a flexible environment to benchmark new clustering algorithms. Consense is currently available as an installable R/Bioconductor package (http://www.ohsucancer.com/isrdev/consense/).
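The consensus idea translates directly into a few lines of scikit-learn (a hedged Python sketch, not the Consense package itself, which is R/Bioconductor): run several clusterers, report pairwise agreement, and flag genes that co-cluster with an index gene under every method. Data and cluster counts below are invented:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, SpectralClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(m, 1, (50, 20)) for m in (0, 3, 6)])  # 150 "genes"

methods = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "ward": AgglomerativeClustering(n_clusters=3),
    "spectral": SpectralClustering(n_clusters=3, random_state=0),
}
labels = {name: m.fit_predict(X) for name, m in methods.items()}

# Pairwise agreement between clusterings (one kind of comparison metric).
names = list(labels)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        ari = adjusted_rand_score(labels[names[i]], labels[names[j]])
        print(f"ARI({names[i]}, {names[j]}) = {ari:.2f}")

# Genes that cluster with an index gene under every method.
index_gene = 0
consistent = np.all(
    [labels[n] == labels[n][index_gene] for n in names], axis=0)
print("genes consistently co-clustered with gene 0:", np.sum(consistent))
```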

20.
Here I report that, when partnered people judge the facial attractiveness of potential mates for a short- and a long-term relationship, the order in which the two conditions are presented biases responses in a systematic manner. Women and men display symmetrical biases. Women find men less attractive as new long-term partners if they have first imagined them as one-night stands. Men find women less attractive as one-night stands if they have first imagined them as new long-term partners. On a total sample of over 3000 individuals from different studies, I show that both biases are robust and replicable in partnered people and that neither is found in singles. Alas, so far no study has statistically controlled for the effect of the order in which participants consider the two types of relationships. Whatever their interpretation, these biases are capable of producing spurious or inconsistent associations and can mislead us when we compare studies that on the surface appear similar, most notably direct and conceptual replications.
