Similar literature
20 similar documents found
1.
The association between a binary variable Y and a variable X having an at least ordinal measurement scale might be examined by selecting a cutpoint in the range of X and then performing an association test for the resulting 2 x 2 contingency table using the chi-square statistic. The distribution of the maximally selected chi-square statistic (i.e., the maximal chi-square statistic over all possible cutpoints) under the null hypothesis of no association between X and Y differs from the standard chi-square distribution. In recent decades, this topic has been extensively studied for continuous X variables, but not for non-continuous variables of at least ordinal measurement scale (which include, e.g., classical ordinal or discretized continuous variables). In this paper, we suggest an exact method to determine the finite-sample distribution of maximally selected chi-square statistics in this context. This novel approach can be seen as a method to measure the association between a binary variable and variables having an at least ordinal scale of different types (ordinal, discretized continuous, etc.). As an illustration, this method is applied to a new data set describing pregnancy and birth for 811 babies.
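As an illustrative sketch of the statistic described in this abstract (not the authors' exact finite-sample method), the following Python code computes the maximal 2 x 2 chi-square statistic over all cutpoints of an ordinal X and approximates its null distribution by Monte Carlo permutation of Y. The function names, the 0/1 coding of Y, and the permutation approach are assumptions introduced here for illustration.

```python
import random


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty margin carries no evidence of association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi_square(x, y):
    """Return (maximal statistic, cutpoint) over all splits x <= cut versus x > cut.

    Assumes y is coded 0/1 and x takes values on an at least ordinal scale.
    """
    best_stat, best_cut = 0.0, None
    # The largest observed value is skipped: it would leave the "x > cut" group empty.
    for cut in sorted(set(x))[:-1]:
        a = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:
            best_stat, best_cut = stat, cut
    return best_stat, best_cut


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo p-value: permute y to mimic the null of no association."""
    rng = random.Random(seed)
    observed, _ = max_selected_chi_square(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        stat, _ = max_selected_chi_square(x, y_perm)
        if stat >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because the maximum is taken over several cutpoints, the resulting p-value is typically larger than the one obtained by referring the maximal statistic to a chi-square distribution with one degree of freedom, which is the point the abstract makes.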

2.
It is common in epidemiologic analyses to summarize continuous outcomes as falling above or below a threshold. With paired data and with a threshold chosen without reference to the outcomes, McNemar's test of marginal homogeneity may be applied to the resulting dichotomous pairs when testing for equality of the marginal distributions of the underlying continuous outcomes. If the threshold is chosen to maximize the test statistic, however, referring the resulting test statistic to the nominal χ² distribution is incorrect; instead, the p-value must be adjusted for the multiple comparisons. Here the distribution of a maximally selected McNemar's statistic is derived, and it is shown that an approximation due to Durbin (1985, Journal of Applied Probability 22, 99-122) may be used to obtain approximate p-values. The methodology is illustrated by an application to measurements of insulin-like growth factor-I (IGF-I) in matched prostate cancer cases and controls from the Physicians' Health Study. The results of simulation experiments that assess the accuracy of the approximation in moderate sample sizes are reported.
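A minimal sketch of the idea in this abstract, under stated assumptions: the code below dichotomizes each pair at every observed threshold, computes McNemar's statistic (b - c)^2 / (b + c) from the discordant counts, and takes the maximum. The null distribution is approximated here by randomly swapping the two members of each pair, which is a simple Monte Carlo stand-in and not the derivation or the Durbin approximation used in the paper. All function names are hypothetical.

```python
import random


def mcnemar_stat(pairs, threshold):
    """McNemar's statistic for pairs (u, v) dichotomized as 'value > threshold'."""
    # b: first member above the threshold, second member not; c: the reverse.
    b = sum(1 for u, v in pairs if u > threshold and v <= threshold)
    c = sum(1 for u, v in pairs if u <= threshold and v > threshold)
    return (b - c) ** 2 / (b + c) if b + c > 0 else 0.0


def max_selected_mcnemar(pairs):
    """Maximum of McNemar's statistic over all observed values used as thresholds."""
    thresholds = sorted({value for pair in pairs for value in pair})
    return max(mcnemar_stat(pairs, t) for t in thresholds)


def swap_p_value(pairs, n_perm=2000, seed=0):
    """Monte Carlo p-value obtained by swapping members within pairs at random."""
    rng = random.Random(seed)
    observed = max_selected_mcnemar(pairs)
    hits = 0
    for _ in range(n_perm):
        swapped = [(v, u) if rng.random() < 0.5 else (u, v) for u, v in pairs]
        if max_selected_mcnemar(swapped) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For example, `max_selected_mcnemar([(5, 1), (6, 2), (7, 3), (8, 4)])` is attained at the threshold 4, where all four pairs are discordant in the same direction.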

3.
It is common in epidemiologic analyses to summarize continuous outcomes as falling above or below a threshold. With such a dichotomized outcome, the usual χ² statistics for association or trend can be used to test for equality of proportions across strata of the study population. However, if the threshold is chosen to maximize the test statistic, the nominal χ² reference distributions are incorrect. In this paper, the asymptotic distributions of maximally selected χ² statistics for association and for trend for the k x 2 table are derived. The methodology is illustrated with data from an AIDS clinical trial. The results of simulation experiments that assess the accuracy of the asymptotic distributions in moderate sample sizes are also reported.

4.
Significance testing for correlated binary outcome data (cited 1 time in total: 0 self-citations, 1 by others)
B Rosner, R C Milton. Biometrics 1988, 44(2):505-512
Multiple logistic regression is a commonly used multivariate technique for analyzing data with a binary outcome. One assumption needed for this method of analysis is the independence of outcome for all sample points in a data set. In ophthalmologic data and other types of correlated binary data, this assumption is often grossly violated and the validity of the technique becomes an issue. A technique has been developed (Rosner, 1984) that utilizes a polychotomous logistic regression model to allow one to look at multiple exposure variables in the context of a correlated binary data structure. This model is an extension of the beta-binomial model, which has been widely used to model correlated binary data when no covariates are present. In this paper, a relationship is developed between the two techniques, whereby it is shown that use of ordinary logistic regression in the presence of correlated binary data can result in true significance levels that are considerably larger than nominal levels in frequently encountered situations. This relationship is explored in detail in the case of a single dichotomous exposure variable. In this case, the appropriate test statistic can be expressed as an adjusted chi-square statistic based on the 2 x 2 contingency table relating exposure to outcome. The test statistic is easily computed as a function of the ordinary chi-square statistic and the correlation between eyes (or more generally between cluster members) for outcome and exposure, respectively. This generalizes some previous results obtained by Koval and Donner (1987, in Festschrift for V. M. Joshi, I. B. MacNeill (ed.), Vol. V, 199-224. (ABSTRACT TRUNCATED AT 250 WORDS)

5.
We address the problem of tests of homogeneity in two-way contingency tables in case-control studies when the case category is subdivided into k subcategories. In this situation, we have two cells with large frequencies and 2 x k cells with frequencies that become small as k increases. We propose two ad hoc statistics in which a statistic for the sparse cells is combined with a statistic for the cells with large frequencies. We study these tests along with the Pearson test (using a chi-square approximation) in a Monte Carlo simulation study. Two sets of null hypothesis models and two sets of alternative hypothesis models are considered. The best test for the models considered is the usual Pearson test (using an approximate chi-square distribution), although the ad hoc models are more powerful under one alternative model considered.

6.
Halpern AL. Biometrics 1999, 55(4):1044-1050
A novel changepoint statistic based on the minimum value, over possible changepoint locations, of Fisher's Exact Test, is introduced. Specific points in the exact distribution of the minimally selected Fisher's value may be rapidly calculated as a lattice-path counting problem via known recurrence methods. The test is compared to the Kolmogorov-Smirnov two-sample test, the maximally selected chi-square, and a likelihood ratio test. The tests are applied to assessing recombination in genetic sequences of HIV.
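The statistic described in this abstract can be sketched as follows, under stated assumptions: the data are taken to be a single 0/1 sequence, each candidate changepoint k splits it into seq[:k] and seq[k:], and a two-sided Fisher's exact p-value is computed for the resulting 2 x 2 table of segment by value. The minimum over k is returned. This sketch computes only the statistic; it does not reproduce the lattice-path calculation of its exact null distribution, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def min_selected_fisher(seq):
    """Return (minimal p-value, split index k) over splits seq[:k] | seq[k:].

    If no split gives a p-value below 1, the index is None.
    """
    best_p, best_k = 1.0, None
    total_ones = sum(seq)
    ones_left = 0
    for k in range(1, len(seq)):
        ones_left += seq[k - 1]
        a = ones_left                      # ones before the split
        b = k - ones_left                  # zeros before the split
        c = total_ones - ones_left         # ones after the split
        d = (len(seq) - k) - c             # zeros after the split
        p = fisher_exact_two_sided(a, b, c, d)
        if p < best_p:
            best_p, best_k = p, k
    return best_p, best_k
```

As with the maximally selected chi-square, the minimal p-value cannot be read as an ordinary p-value because it is selected over many candidate locations; that is why the abstract turns to the exact distribution of the minimum.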

7.
Hothorn T, Zeileis A. Biometrics 2008, 64(4):1263-1269
SUMMARY: Maximally selected statistics for the estimation of simple cutpoint models are embedded into a generalized conceptual framework based on conditional inference procedures. This powerful framework contains most of the published procedures in this area as special cases, such as maximally selected χ² and rank statistics, but also allows for direct construction of new test procedures for less standard test problems. As an application, a novel maximally selected rank statistic is derived from this framework for a censored response partitioned with respect to two ordered categorical covariates and potential interactions. This new test is employed to search for a high-risk group of rectal cancer patients treated with a neo-adjuvant chemoradiotherapy. Moreover, a new efficient algorithm for the evaluation of the asymptotic distribution for a large class of maximally selected statistics is given, enabling the fast evaluation of a large number of cutpoints.

8.
The degree of agreement between two raters is re-examined. An alternative statistic which uses the chi-square distribution is proposed. We conclude that this statistic is better than the usual κ-statistic (kappa) when the classification variable is at least ordinal.

9.
The use of the Pearson chi-square statistic for testing hypotheses on biological populations is not appropriate when the individuals are distributed by clusters. In the case where the clusters are distributed independently of each other, we propose an asymptotically chi-square distributed test statistic taking into account the cluster size distribution. An example provided by European Corn Borer eggs data is used to illustrate the test procedure.

10.
In calibration experiments, an estimated relationship between covariate information for a sample and an observed response is used to infer the covariate information for unknown samples from their responses. In some situations, this covariate information comprises a nominal variable (e.g., identity of a chemical, sex of an animal) and a real-valued variable (e.g., concentration of the chemical, age of animal). If the calibrating relationship can be estimated separately for each candidate identity, the first step in analyzing unknown samples is to correctly determine their identity. A discrimination statistic is suggested for use in this situation and its asymptotic distribution is derived. The investigation is motivated by the possibility of using multiple immunoassays in environmental monitoring to identify and quantitate contaminated samples in situations where there are several candidate pollutants that cross-react significantly to single assays. An example is given of the use of a four-antibody assay for the simultaneous monitoring of the levels in water samples of several of the commonly used triazine herbicides and their derivatives.

11.
Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often exists in obtaining general predictive capacity or in finding causal inferences from predictor variables. Because of a lack of solid knowledge on a studied phenomenon, scientists explore predictor variables in order to find the most meaningful (i.e., discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can be selected through comparison of several methods: univariate Pearson chi-square screening, principal components analysis (PCA) and step-wise analysis, as well as combinations of some methods. We expected PCA to perform best. The selected methods were evaluated through fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset, at P < 0.05, followed by a step-wise sub-selection, gave the best results. In contrast to expectations, PCA performed poorly, as did step-wise analysis. The different chi-square subset methods all yielded ecologically meaningful variables, while probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset; the former because it does not take into account the response variable, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.

12.
J Rochon. Biometrics 1989, 45(1):193-205
Grizzle, Starmer, and Koch (1969, Biometrics 25, 489-503) presented a unified approach for data analysis when the outcome variable is measured on a nominal or ordinal scale. The technique uses a weighted least squares methodology, and hypotheses are tested using asymptotic chi-square statistics. In this paper, we adapt these procedures to the problem of determining the minimum sample size required for an applied research effort, and use the noncentral versions of these chi-square statistics. The results are compared against several procedures widely used in the literature, and are found to concur well with these techniques. As well, some new situations are considered.

13.
Bilder CR, Loughin TM. Biometrics 2004, 60(1):241-248
Questions that ask respondents to "choose all that apply" from a set of items occur frequently in surveys. Categorical variables that summarize this type of survey data are called both pick any/c variables and multiple-response categorical variables. It is often of interest to test for independence between two categorical variables. When both categorical variables can have multiple responses, traditional Pearson chi-square tests for independence should not be used because of the within-subject dependence among responses. An intuitively constructed version of the Pearson statistic is proposed to perform the test using bootstrap procedures to approximate its sampling distribution. First- and second-order adjustments to the proposed statistic are given in order to use a chi-square distribution approximation. A Bonferroni adjustment is proposed to perform the test when the joint set of responses for individual subjects is unavailable. Simulations show that the bootstrap procedures hold the correct size more consistently than the other procedures.

14.
A superpopulation model generates the probabilities of a Bernoulli random variable. The ranks of the involved variables are considered as survey weights. The distribution of each linear rank statistic is derived under the null hypothesis for the two-sample problem and for the case k > 2 when simple random sampling or stratified sampling is used. The growth of a population of insects and the behavior of patients with insomnia are studied using these procedures.

15.
In multivariate matching, fine balance constrains the marginal distributions of a nominal variable in treated and matched control groups to be identical without constraining who is matched to whom. In this way, a fine balance constraint can balance a nominal variable with many levels while focusing efforts on other more important variables when pairing individuals to minimize the total covariate distance within pairs. Fine balance is not always possible; that is, it is a constraint on an optimization problem, but the constraint is not always feasible. We propose a new algorithm that returns a minimum distance finely balanced match when one is feasible, and otherwise minimizes the total distance among all matched samples that minimize the deviation from fine balance. Perhaps we can come very close to fine balance when fine balance is not attainable; moreover, in any event, because our algorithm is guaranteed to come as close as possible to fine balance, the investigator may perform one match, and on that basis judge whether the best attainable balance is adequate or not. We also show how to incorporate an additional constraint. The algorithm is implemented in two similar ways, first as an optimal assignment problem with an augmented distance matrix, second as a minimum cost flow problem in a network. The case of knee surgery in the Obesity and Surgical Outcomes Study motivated the development of this algorithm and is used as an illustration. In that example, 2 of 47 hospitals had too few nonobese patients to permit fine balance for the nominal variable with 47 levels representing the hospital, but our new algorithm came very close to fine balance. Moreover, in that example, there was a shortage of nonobese diabetic patients, and incorporation of an additional constraint forced the match to include all of these nonobese diabetic patients, thereby coming as close as possible to balance for this important but recalcitrant covariate.

16.
Sample size determination for case-control studies of chronic disease is often based on the simple 2 x 2 tabular cross-classification of exposure and disease, thereby ignoring stratification which may be considered in the analysis. One consequence of this approach is that the sample size may be inadequate to attain a specified power and size when performing a statistical analysis on J 2 x 2 tables using Cochran's (1954, Biometrics 10, 417-451) statistic or the Mantel-Haenszel (1959, Journal of the National Cancer Institute 22, 719-748) statistic. A sample size formula is derived from Cochran's statistic and it is compared with the corresponding one derived when the data are treated as unstratified, and also with two other formulas proposed for stratified data analysis. The formula developed yields values slightly higher than one recently proposed by Muñoz and Rosner (1984, Biometrics 40, 995-1004), which assumes that both margins of each 2 x 2 table are fixed, while the present study considers only the case-control margin to be fixed.
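For orientation, the two stratified test statistics named in this abstract can be sketched in their commonly stated forms (without continuity correction). Each stratum is a 2 x 2 table (a, b, c, d); the observed count a is compared with its expectation under no association, and the statistics differ only in the per-stratum variance: n^3 in the denominator for Cochran's statistic (one margin fixed) and n^2 (n - 1) for the Mantel-Haenszel statistic (both margins fixed). The function name and input layout are assumptions, and the sample size formula derived in the paper is not reproduced here.

```python
def stratified_statistics(tables):
    """Return (Cochran, Mantel-Haenszel) statistics for 2 x 2 tables (a, b, c, d).

    Assumes at least one stratum has non-degenerate margins, so that the
    summed variances are positive.
    """
    diff = 0.0          # sum over strata of observed minus expected count in cell a
    var_cochran = 0.0   # binomial-type variance, one margin fixed
    var_mh = 0.0        # hypergeometric variance, both margins fixed
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        prod = row1 * row2 * col1 * col2
        var_cochran += prod / n ** 3
        if n > 1:
            var_mh += prod / (n ** 2 * (n - 1))
    return diff ** 2 / var_cochran, diff ** 2 / var_mh
```

With a single stratum, Cochran's statistic in this form coincides with the ordinary Pearson chi-square for the 2 x 2 table, and the Mantel-Haenszel value is smaller by the factor (n - 1) / n.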

17.
Donner A, Klar N, Zou G. Biometrics 2004, 60(4):919-925
Split-cluster designs are frequently used in the health sciences when naturally occurring clusters such as multiple sites or organs in the same subject are assigned to different treatments. However, statistical methods for the analysis of binary data arising from such designs are not well developed. The purpose of this article is to propose and evaluate a new procedure for testing the equality of event rates in a design dividing each of k clusters into two segments having multiple sites (e.g., teeth, lesions). The test statistic proposed is a generalization of a previously published procedure based on adjusting the standard Pearson chi-square statistic, but can also be derived as a score test using the approach of generalized estimating equations.

18.
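Genetic analysis of quantitative traits can be carried out by means of "selective genotyping". This paper proposes a statistic T that uses extreme samples for association analysis of quantitative trait loci (QTL). The statistic T compares the differences in trait values among individuals carrying homozygous marker genotypes in the upper extreme sample. The distribution of T and its type I error rate in the absence of association were examined by computer simulation; the results show that, under various sample selection strategies, the distribution of T is approximately χ² and the type I error rate is close to the nominal significance level. The effects of heritability, sample size, and sample selection threshold on the statistical power of T were also examined under various genetic models. The results show that the power of T increases as the linkage disequilibrium between the marker and the QTL becomes stronger and as heritability and sample size increase; power is also greater when the sample selection threshold is more stringent.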

19.
A new statistical test for linkage heterogeneity. (cited 6 times in total: 5 self-citations, 1 by others)
A new statistical test for linkage heterogeneity is described. It is a likelihood-ratio test based on a beta distribution for the prior distribution of the recombination fraction among families (or individuals). The null distribution for this statistic (called the B-test) is derived under a broad range of circumstances. Two other heterogeneity test statistics--the admixture test or A-test first described by Smith and Morton's test (here referred to as the K-test)--are also examined. The probability distribution for the K-test statistic is very sensitive to family size, whereas the other two statistics are not. All three statistics are somewhat sensitive to the magnitude of the recombination fraction theta. Critical values for each of the test statistics are given. A conservative approximation for both the A-test and B-test is given by a χ² distribution when P/2 instead of P is used for the observed significance level. In terms of power, the B-test performs best among the three tests over a broad range of alternate heterogeneity hypotheses--except for the specific case of admixture with loose linkage, in which the A-test performs best. Overall, the difference in power among the three tests is not large. An application to some recently published data on the fragile-X syndrome and X-chromosome markers is given.

20.
On assessing interrater agreement for multiple attribute responses (cited 2 times in total: 0 self-citations, 2 by others)
L L Kupper, K B Hafner. Biometrics 1989, 45(3):957-967
New methods are developed for assessing the extent of interrater agreement when each unit to be rated is characterized by a (possibly empty) subset of a specified set of distinct nominal attributes. For such multiple attribute response data, a two-rater concordance statistic is derived, and associated statistical inference-making procedures are provided. This concordance statistic is corrected for chance agreement by using an underlying hypergeometric model. Numerical examples are given to illustrate the proposed methodology, and comparisons to other agreement statistics (e.g., kappa) are made.
