Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The association between a binary variable Y and a variable X having an at least ordinal measurement scale might be examined by selecting a cutpoint in the range of X and then performing an association test for the obtained 2 × 2 contingency table using the chi-square statistic. The distribution of the maximally selected chi-square statistic (i.e. the maximal chi-square statistic over all possible cutpoints) under the null hypothesis of no association between X and Y is different from the known chi-square distribution. In recent decades, this topic has been extensively studied for continuous X variables, but not for non-continuous variables of at least ordinal measurement scale (which include e.g. classical ordinal or discretized continuous variables). In this paper, we suggest an exact method to determine the finite-sample distribution of maximally selected chi-square statistics in this context. This novel approach can be seen as a method to measure the association between a binary variable and variables having an at least ordinal scale of different types (ordinal, discretized continuous, etc). As an illustration, this method is applied to a new data set describing pregnancy and birth for 811 babies.
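The statistic described in this abstract can be sketched in a few lines. The following is a minimal illustration only, not the authors' exact method: it computes the maximal Pearson chi-square over all cutpoints of an ordinal X, and the function names `chi_square_2x2` and `max_selected_chi_square` are hypothetical. As the abstract notes, the resulting maximum should not be referred to the usual chi-square distribution.

```python
from typing import Optional, Sequence, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no evidence of association
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi_square(
    x: Sequence[float], y: Sequence[int]
) -> Tuple[float, Optional[float]]:
    """Return (maximal chi-square, cutpoint attaining it) over splits X <= cut vs. X > cut."""
    best_stat, best_cut = 0.0, None
    # The largest observed value is skipped: it would leave the group X > cut empty.
    for cut in sorted(set(x))[:-1]:
        a = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
        b = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
        c = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
        d = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:  # ties keep the smallest cutpoint
            best_stat, best_cut = stat, cut
    return best_stat, best_cut


if __name__ == "__main__":
    x = [1, 1, 2, 2, 3, 3]   # ordinal predictor (illustrative data)
    y = [0, 0, 0, 1, 1, 1]   # binary outcome
    print(max_selected_chi_square(x, y))  # (3.0, 1)
```

Because the cutpoint is chosen to maximize the statistic, a p-value taken from the chi-square distribution with one degree of freedom would be too small; the exact finite-sample distribution proposed in the paper addresses that problem.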

2.
B Rosner 《Biometrics》1992,48(3):721-731
Clustered binary data occur frequently in biostatistical work. Several approaches have been proposed for the analysis of clustered binary data. In Rosner (1984, Biometrics 40, 1025-1035), a polychotomous logistic regression model was proposed that is a generalization of the beta-binomial distribution and allows for unit- and subunit-specific covariates, while controlling for clustering effects. One assumption of this model is that all pairs of subunits within a cluster are equally correlated. This is appropriate for ophthalmologic work where clusters are generally of size 2, but may be inappropriate for larger cluster sizes. A beta-binomial mixture model is introduced to allow for multiple subclasses within a cluster and to estimate odds ratios relating outcomes for pairs of subunits within a subclass as well as in different subclasses. To include covariates, an extension of the polychotomous logistic regression model is proposed, which allows one to estimate effects of unit-, class-, and subunit-specific covariates, while controlling for clustering using the beta-binomial mixture model. This model is applied to the analysis of respiratory symptom data in children collected over a 14-year period in East Boston, Massachusetts, in relation to maternal and child smoking, where the unit is the child and symptom history is divided into early-adolescent and late-adolescent symptom experience.

3.
C T Le 《Biometrics》1988,44(1):299-303
This paper is concerned with the issue of testing for trend with correlated binary data. We consider the problem where one has either one or two ears (or eyes) available for analysis at baseline and one wishes to look at changes over time in a dichotomous outcome taking into account the correlation between responses from two ears. A reparameterization of Rosner's (1982, Biometrics 38, 105-114) correlated binary data model is presented and applied to a test for trend where the stratifying variable is age (or any other subject-specific variable). Observed and expected values are calculated for the trend statistic separately for both unilateral and bilateral cases and are then summed to obtain an overall summary statistic. The proposed method is illustrated by a reanalysis of data presented in a published study of the efficacy of antibiotics for the treatment of otitis media.

4.
Begg MD 《Biometrics》1999,55(1):302-307
In many data analytic applications, such as ophthalmologic, longitudinal, or periodontal studies, multiple observations are recorded over several sites (or timepoints) within the same subject, bringing about dependence between measurements. This correlation, in turn, precludes the use of standard statistical methods that assume independence between outcome measurements. For example, the Mantel-Haenszel statistic, used to assess association between a binary outcome and a binary exposure while adjusting for a categorical covariate, does not follow the usual chi-squared distribution under the null hypothesis when there is correlation between observations. A modified Mantel-Haenszel procedure, which makes adjustment for dependence, is proposed. No particular correlation structure is assumed for responses within a cluster. This closed-form adjustment stems from Liang and Zeger's (1986, Biometrika 73, 13-22) generalized estimating equations approach for clustered data. The difference between this tabular (i.e., noniterative) technique and many earlier tabular methods is that the current method allows for consideration of site-specific exposure and covariate information. An example from a periodontal research study illustrates application of the method.

5.
Parzen M  Lipsitz SR 《Biometrics》1999,55(2):580-584
In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary biliary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. There are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

6.
Summary. Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern "retrospective" methods, including the "case-only" approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze case–control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to a weighted average of the standard case-only and case–control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399–418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.
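For the binary-gene, binary-exposure case mentioned in this abstract, the two component estimators can be computed directly from the 2 × 2 gene-by-exposure tables in cases and controls. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the counts are invented, and the weight in `weighted_average` is left as a user-supplied placeholder because the abstract does not give the paper's data-adaptive weight.

```python
import math
from typing import Dict, Tuple

# A table is (n11, n10, n01, n00): counts for gene=1/exposure=1, gene=1/exposure=0,
# gene=0/exposure=1, gene=0/exposure=0. All counts are assumed to be positive.
Table = Tuple[int, int, int, int]


def log_odds_ratio(table: Table) -> float:
    """Log odds ratio of gene and exposure within one group."""
    n11, n10, n01, n00 = table
    return math.log((n11 * n00) / (n10 * n01))


def interaction_estimates(cases: Table, controls: Table) -> Dict[str, float]:
    """Standard case-only and case-control estimates of the interaction log odds ratio."""
    theta_cases = log_odds_ratio(cases)
    theta_controls = log_odds_ratio(controls)
    return {
        # Valid only if gene and exposure are independent in the source population.
        "case_only": theta_cases,
        # No independence assumption, but typically less precise.
        "case_control": theta_cases - theta_controls,
    }


def weighted_average(case_only: float, case_control: float, weight_case_only: float) -> float:
    """Generic convex combination; the weight is a placeholder, not the paper's formula."""
    if not 0.0 <= weight_case_only <= 1.0:
        raise ValueError("weight_case_only must lie in [0, 1]")
    return weight_case_only * case_only + (1.0 - weight_case_only) * case_control


if __name__ == "__main__":
    est = interaction_estimates(cases=(40, 20, 30, 60), controls=(20, 20, 20, 40))
    print(est)
    print(weighted_average(est["case_only"], est["case_control"], weight_case_only=0.5))
```

A weight near 1 trusts the independence assumption and favors the more precise case-only estimate; a weight near 0 falls back on the case–control estimate. The paper's contribution is to choose that weight from the data.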

7.
In many studies, the association between longitudinal measurements of a continuous response and a binary outcome is of interest. A convenient framework for this type of problem is the joint model, which is formulated to investigate the association between a binary outcome and features of longitudinal measurements through a common set of latent random effects. The joint model, which is the focus of this article, is a logistic regression model with covariates defined as the individual‐specific random effects in a non‐linear mixed‐effects model (NLMEM) for the longitudinal measurements. We discuss different estimation procedures, which include two‐stage, best linear unbiased predictors, and various numerical integration techniques. The proposed methods are illustrated using a real data set where the objective is to study the association between longitudinal hormone levels and the pregnancy outcome in a group of young women. The numerical performance of the estimating methods is also evaluated by means of simulation.

8.
Rieger RH  Weinberg CR 《Biometrics》2002,58(2):332-341
Conditional logistic regression (CLR) is useful for analyzing clustered binary outcome data when interest lies in estimating a cluster-specific exposure parameter while treating the dependency arising from random cluster effects as a nuisance. CLR aggregates unmeasured cluster-specific factors into a cluster-specific baseline risk and is invalid in the presence of unmodeled heterogeneous covariate effects or within-cluster dependency. We propose an alternative, resampling-based method for analyzing clustered binary outcome data, within-cluster paired resampling (WCPR), which allows for within-cluster dependency not solely due to baseline heterogeneity. For example, dependency may be in part caused by heterogeneity in response to an exposure across clusters due to unmeasured cofactors. When both CLR and WCPR are valid, our simulations suggest that the two methods perform comparably. When CLR is invalid, WCPR continues to have good operating characteristics. For illustration, we apply both WCPR and CLR to a periodontal data set where there is heterogeneity in response to exposure across clusters.

9.
Large-scale surveys, such as national forest inventories and vegetation monitoring programs, usually have complex sampling designs that include geographical stratification and units organized in clusters. When models are developed using data from such programs, a key question is whether or not to utilize design information when analyzing the relationship between a response variable and a set of covariates. Standard statistical regression methods often fail to account for complex sampling designs, which may lead to severely biased estimators of model coefficients. Furthermore, ignoring that data are spatially correlated within clusters may underestimate the standard errors of regression coefficient estimates, with a risk of drawing wrong conclusions. We first review general approaches that account for complex sampling designs, e.g. methods using probability weighting, and stress the need to explore the effects of the sampling design when applying logistic regression models. We then use Monte Carlo simulation to compare the performance of the standard logistic regression model with two approaches to model correlated binary responses, i.e. cluster-specific and population-averaged logistic regression models. As an example, we analyze the occurrence of epiphytic hair lichens in the genus Bryoria, an indicator of forest ecosystem integrity. Based on data from the National Forest Inventory (NFI) for the period 1993–2014 we generated a data set on hair lichen occurrence on >100,000 Picea abies trees distributed throughout Sweden. The NFI data included ten covariates representing forest structure and climate variables potentially affecting lichen occurrence. Our analyses show the importance of taking complex sampling designs and correlated binary responses into account in logistic regression modeling to avoid the risk of obtaining notably biased parameter estimators and standard errors, and erroneous interpretations about factors affecting e.g. hair lichen occurrence. We recommend comparisons of unweighted and weighted logistic regression analyses as an essential step in development of models based on data from large-scale surveys.

10.
Association Models for Clustered Data with Binary and Continuous Responses   Total citations: 1, self-citations: 0, citations by others: 1
Summary. We consider analysis of clustered data with mixed bivariate responses, i.e., where each member of the cluster has a binary and a continuous outcome. We propose a new bivariate random effects model that induces associations among the binary outcomes within a cluster, among the continuous outcomes within a cluster, between a binary outcome and a continuous outcome from different subjects within a cluster, as well as the direct association between the binary and continuous outcomes within the same subject. For ease of interpretation of the regression effects, the marginal model of the binary response probability integrated over the random effects preserves the logistic form and the marginal expectation of the continuous response preserves the linear form. We implement maximum likelihood estimation of our model parameters using standard software such as PROC NLMIXED in SAS. Our simulation study demonstrates the robustness of our method with respect to the misspecification of the regression model as well as the random effects model. We illustrate our methodology by analyzing a developmental toxicity study of ethylene glycol in mice.

11.
Joint regression analysis of correlated data using Gaussian copulas   Total citations: 2, self-citations: 0, citations by others: 2
Song PX  Li M  Yuan Y 《Biometrics》2009,65(1):60-68
Summary. This article concerns a new joint modeling approach for correlated data analysis. Utilizing Gaussian copulas, we present a unified and flexible machinery to integrate separate one-dimensional generalized linear models (GLMs) into a joint regression analysis of continuous, discrete, and mixed correlated outcomes. This essentially leads to a multivariate analogue of the univariate GLM theory and hence an efficiency gain in the estimation of regression coefficients. The availability of joint probability models enables us to develop a full maximum likelihood inference. Numerical illustrations are focused on regression models for discrete correlated data, including multidimensional logistic regression models and a joint model for mixed normal and binary outcomes. In the simulation studies, the proposed copula-based joint model is compared to the popular generalized estimating equations approach, which is a moment-based estimating equation method to join univariate GLMs. Two real-world data examples are used in the illustration.

12.
We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a birth data set.

13.
J M Neuhaus  N P Jewell 《Biometrics》1990,46(4):977-990
Recently a great deal of attention has been given to binary regression models for clustered or correlated observations. The data of interest are of the form of a binary dependent or response variable, together with independent variables X1, ..., Xk, where sets of observations are grouped together into clusters. A number of models and methods of analysis have been suggested to study such data. Many of these are extensions in some way of the familiar logistic regression model for binary data that are not grouped (i.e., each cluster is of size 1). In general, the analyses of these clustered data models proceed by assuming that the observed clusters are a simple random sample of clusters selected from a population of clusters. In this paper, we consider the application of these procedures to the case where the clusters are selected randomly in a manner that depends on the pattern of responses in the cluster. For example, we show that ignoring the retrospective nature of the sample design, by fitting standard logistic regression models for clustered binary data, may result in misleading estimates of the effects of covariates and the precision of estimated regression coefficients.

14.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.

15.
A protective effect of breastfeeding on overweight (binary) has been reported by meta-analyses using logistic regression, whereas studies using linear regression and BMI (continuous) detected no significant association. To assess the relationship of these differences with different outcome classification, we compared results for linear, logistic, and quantile regression models in a cross-sectional data set of considerable size. Height, weight, and questionnaire data on 9,368 preschool children were collected during school-entry examinations in 1999 and 2002 in Bavaria, Southern Germany. We calculated multivariable linear, logistic, and quantile regression models with outcomes BMI, overweight, obesity, and BMI quantiles (as appropriate). Models considered the covariates breastfeeding (breastfed vs. never breastfed), gender, age, smoking in pregnancy, TV watching, maternal BMI, parental education, and early infant weight gain. No significant association was found in the linear regression model. In the logistic model, a significant association was observed for obesity (odds ratio: 0.72 (95% confidence interval (CI) 0.55, 0.94)). In quantile regression no significant point estimates were observed for the percentiles of 0.4-0.8. However, breastfeeding reduced the BMI of children having values on the 90th and 97th percentiles by -0.23 (95% CI -0.39, -0.07) and -0.26 (95% CI -0.45, -0.07) kg/m², respectively, on average. In contrast, breastfeeding was significantly associated with a low shift toward higher BMI values for BMI quantiles of 0.03 and from 0.1 to 0.3. The detection of associations between breastfeeding and childhood body composition might be related to the coding of the response variable (continuous or binary) and the statistical method used (linear, logistic, or quantile regression). Quantile regression should additionally be applied in such studies.

16.
J Nam  J J Gart 《Biometrics》1985,41(2):455-466
The general method of the discrepancy or heterogeneity chi-square is applied to ABO-like data in which there are no observed double blanks in either the disease or the control group. When the recessive gene frequency is assumed zero, this method leads to an approximate chi-square test identical to that suggested by Smouse and Williams (1982, Biometrics 38, 757-768). When this assumption is relaxed, there arise two cases which are determined by whether the maximum likelihood estimate of this frequency is zero or not. It is shown that the value of the simple score statistic of Gart and Nam (1984, Biometrics 40, 887-894) discriminates between the two cases. The various omnibus test statistics for comparing groups are shown to differ little in several practical examples. However, under the more general assumption the appropriate degrees of freedom is one more than the number previously suggested.

17.
The distributions of the Hosmer-Lemeshow chi-square type goodness-of-fit tests (?g, ?g) for the logistic regression model are examined via simulations designed to examine their behavior when most of the estimated probabilities are small or are expected to fall in a few deciles. The results of the simulations show statistic ?g should be used when the two outcome groups (y = 0, 1) are not well separated, Δ ≤ 2, where Δ² is the Mahalanobis distance. Statistic ?g should be used when Δ ≥ 8. Either statistic may be used when 2 ≤ Δ ≤ 8. All tests should be used with caution when the proportion in the sample with y = 1 is less than 0.1.
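A Hosmer-Lemeshow-type statistic can be sketched as follows. This is a minimal illustration of the common "deciles of risk" grouping (observations sorted by fitted probability and split into roughly equal-sized groups), not a reconstruction of either of the two statistics compared in this abstract. The function name and the toy data are hypothetical, and fitted probabilities are assumed to lie strictly between 0 and 1.

```python
from typing import Sequence


def hosmer_lemeshow_statistic(
    probs: Sequence[float], outcomes: Sequence[int], groups: int = 10
) -> float:
    """Chi-square type goodness-of-fit statistic using groups of sorted fitted probabilities."""
    n = len(probs)
    if n != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not 1 <= groups <= n:
        raise ValueError("groups must be between 1 and the number of observations")

    # Sort observation indices by fitted probability, then cut into near-equal groups.
    order = sorted(range(n), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        expected_1 = sum(probs[i] for i in idx)      # expected number with y = 1
        observed_1 = sum(outcomes[i] for i in idx)   # observed number with y = 1
        expected_0 = n_g - expected_1
        observed_0 = n_g - observed_1
        # Pearson contributions from the y = 1 and y = 0 cells of this group.
        stat += (observed_1 - expected_1) ** 2 / expected_1
        stat += (observed_0 - expected_0) ** 2 / expected_0
    return stat


if __name__ == "__main__":
    fitted = [0.2, 0.2, 0.8, 0.8]   # illustrative fitted probabilities
    y = [0, 1, 1, 1]                # observed binary outcomes
    print(hosmer_lemeshow_statistic(fitted, y, groups=2))
```

In practice the statistic is usually referred to a chi-square distribution with (groups − 2) degrees of freedom; the simulations summarized in the abstract examine how well that approximation behaves when most fitted probabilities are small.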

18.
Summary. Case-parent trio studies concerned with children affected by a disease and their parents aim to detect single nucleotide polymorphisms (SNPs) showing a preferential transmission of alleles from the parents to their affected offspring. A popular statistical test for detecting such SNPs associated with disease in this study design is the genotypic transmission/disequilibrium test (gTDT) based on a conditional logistic regression model, which usually needs to be fitted by an iterative procedure. In this article, we derive exact closed-form solutions for the parameter estimates of the conditional logistic regression models when testing for an additive, a dominant, or a recessive effect of a SNP, and show that such analytic parameter estimates also exist when considering gene-environment interactions with binary environmental variables. Because the genetic model underlying the association between a SNP and a disease is typically unknown, it might further be beneficial to use the maximum over the gTDT statistics for the possible effects of a SNP as test statistic. We therefore propose a procedure enabling a fast computation of the test statistic and the permutation-based p-value of this MAX gTDT. All these methods are applied to whole-genome scans of the case-parent trios from the International Cleft Consortium. These applications show our procedures dramatically reduce the required computing time compared to the conventional iterative methods, allowing, for example, the analysis of hundreds of thousands of SNPs in a few minutes instead of several hours.

19.
Different types of random binary topological trees (like neuronal processes and rivers) occur with relative frequencies that can be explained in terms of growth models. It will be shown how the model parameter determining the mode of growth can be estimated with the maximum likelihood procedure from observed data. Monte Carlo simulations were used to study the distributional properties of this estimator which appeared to have a negligible bias. It is shown that the minimum chi-square procedure yields an estimate that is very close to the maximum likelihood estimate. Moreover, the goodness-of-fit of the growth model can be inferred directly from the chi-square statistic. To illustrate the procedures we examined axonal trees from the goldfish tectum. A notion of complete partition randomness is presented as an alternative to our growth hypotheses.

20.
Use of runs statistics for pattern recognition in genomic DNA sequences.   Total citations: 2, self-citations: 0, citations by others: 2
In this article, the use of the finite Markov chain imbedding (FMCI) technique to study patterns in DNA under a hidden Markov model (HMM) is introduced. With a vision of studying multiple runs-related statistics simultaneously under an HMM through the FMCI technique, this work establishes an investigation of a bivariate runs statistic under a binary HMM for DNA pattern recognition. An FMCI-based recursive algorithm is derived and implemented for the determination of the exact distribution of this bivariate runs statistic under an independent identically distributed (IID) framework, a Markov chain (MC) framework, and a binary HMM framework. With this algorithm, we have studied the distributions of the bivariate runs statistic under different binary HMM parameter sets; probabilistic profiles of runs are created and shown to be useful for trapping HMM maximum likelihood estimates (MLEs). This MLE-trapping scheme offers good initial estimates to jump-start the expectation-maximization (EM) algorithm in HMM parameter estimation and helps prevent the EM estimates from landing on a local maximum or a saddle point. Applications of the bivariate runs statistic and the probabilistic profiles in conjunction with binary HMMs for pattern recognition in genomic DNA sequences are illustrated via case studies on DNA bendability signals using human DNA data.
