Found 20 similar articles (search time: 15 ms)
1.
We study a semiparametric estimation method for the random effects logistic regression when there is auxiliary covariate information about the main exposure variable. We extend the semiparametric estimator of Pepe and Fleming (1991, Journal of the American Statistical Association 86, 108-113) to the random effects model using the best linear unbiased prediction approach of Henderson (1975, Biometrics 31, 423-448). The method can be used to handle the missing covariate or mismeasured covariate data problems in a variety of real applications. Simulation study results show that the proposed method outperforms the existing methods. We analyzed a data set from the Collaborative Perinatal Project using the proposed method and found that the use of DDT increases the risk of preterm births among U.S. children.
2.
Predictive models of habitat suitability for the Common Crane Grus grus in a wintering area of southern Portugal were derived using logistic multiple regression and Geographic Information Systems. The study area was characterized by landscape variables and surveyed uniformly for the presence of cranes. The most important variables were distance to roosts, to open Holm Oak woods and to villages, and the occurrence of unpaved roads, shrubby vegetation, slope and orchards. Two models were built, the second having one variable fewer than the first. The selection of the best model was based on statistical and biological criteria. Crane distribution was negatively related to distance to open Holm Oak Quercus rotundifolia woods and to roosts. Additionally, unsuitable vegetation and orchard areas were avoided. In these areas movement is difficult, food availability is reduced and the risk of predation is increased. We also found that villages and roads were avoided; disturbance is a significant factor for this species. Some management guidelines are proposed for the area: (1) maintenance of open Holm Oak woodlands, (2) incentives to avoid the abandonment of traditional agriculture and pastoral use of the area, which would lead to an increase of shrubby vegetation areas, (3) preservation of suitable roosting places and (4) management of new patches of forest and orchards.
3.
Summary. In medical research, there is great interest in developing methods for combining biomarkers. We argue that selection of markers should also be considered in the process. Traditional model/variable selection procedures ignore the underlying uncertainty after model selection. In this work, we propose a novel model-combining algorithm for classification in biomarker studies. It works by considering weighted combinations of various logistic regression models; five different weighting schemes are considered in the article. The weights and algorithm are justified using decision theory and risk-bound results. Simulation studies are performed to assess the finite-sample properties of the proposed model-combining method. It is illustrated with an application to data from an immunohistochemical study in prostate cancer.
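To make the idea of a weighted combination of logistic regression models concrete, here is a minimal sketch. It is not one of the five weighting schemes from the article, which the abstract does not describe. It assumes a simple exponential weighting in which each candidate model receives a weight proportional to its likelihood on held-out data. The function names, the two candidate models, and the toy probabilities are all hypothetical.

```python
import math

def log_loss(y, p):
    # Mean negative Bernoulli log-likelihood; probabilities are clipped for stability.
    eps = 1e-12
    return -sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    ) / len(y)

def exponential_weights(y, candidate_probs):
    # Assumed scheme: weight proportional to exp(-n * log-loss), i.e. to each
    # candidate's likelihood on the held-out data. Weights sum to 1.
    n = len(y)
    scores = [-n * log_loss(y, p) for p in candidate_probs]
    top = max(scores)  # subtract the maximum before exponentiating
    raw = [math.exp(s - top) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

def combine(weights, candidate_probs):
    # Weighted average of the candidates' predicted probabilities, per sample.
    n = len(candidate_probs[0])
    return [sum(w * p[i] for w, p in zip(weights, candidate_probs)) for i in range(n)]

# Hypothetical held-out labels and predictions from two fitted logistic models.
y_val = [1, 0, 1, 1, 0]
probs_a = [0.9, 0.2, 0.8, 0.7, 0.1]  # e.g. a model using two markers
probs_b = [0.6, 0.5, 0.5, 0.6, 0.4]  # e.g. a model using one marker
weights = exponential_weights(y_val, [probs_a, probs_b])
combined = combine(weights, [probs_a, probs_b])
```

In this toy setup the sharper model receives most of the weight, but the weaker model still contributes, which is how a combination differs from selecting a single model.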
4.
5.
Abstract. Natural forest expansion is one of the most relevant landscape changes in many temperate countries. Although large areas are involved, relatively few studies have been carried out with the objective of unravelling the specific impact of the individual factors characterising the sites prone to such a process. The aim of this article is to present a research tool for assessing the factors characterising farmland sites prone to natural conversion from crop growing and pasture to forests and other wooded land (OWL), and for predicting the probability of such a land-use change. The methodological approach is based on multinomial logistic regression. As a case study, the approach was applied to land-use classification repeated on the same sites in a large area of central Italy on two successive occasions, spanning two decades, from the beginning of the 1980s up to 2002. Of all the factors assessed, landscape attributes were identified as a sufficient subset for quantitative prediction of change from farmland to OWL or to forest. The tested modelling approach is explicitly empirical and planning-oriented. From a quantitative point of view, the precision of the models may be only indicative for assessing land-use change probability for single observations, while it is appropriate for predicting mean probabilities at a landscape mapping level, where it is possible to sample a number of sites. At this level, the approach is a useful tool for simulating future landscape scenarios related to natural forest expansion.
6.
For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15, 719–730) and Xie and Zhang (2017, Int J Biostat 13, 1–20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.
7.
8.
We propose using a variant of logistic regression (LR) with L2-regularization to fit gene–gene and gene–environment interaction models. Studies have shown that many common diseases are influenced by interaction of certain genes. LR models with quadratic penalization not only correctly characterize the influential genes along with their interaction structures but also yield additional benefits in handling high-dimensional, discrete factors with a binary response. We illustrate the advantages of using an L2-regularization scheme and compare its performance with that of "multifactor dimensionality reduction" and "FlexTree," two recent tools for identifying gene–gene interactions. Through simulated and real data sets, we demonstrate that our method outperforms other methods in the identification of the interaction structures as well as prediction accuracy. In addition, we validate the significance of the factors selected through bootstrap analyses.
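As a rough illustration of logistic regression with a quadratic penalty and interaction terms, the sketch below fits main effects plus all pairwise products by plain gradient descent. It is not the authors' algorithm. The optimizer, the penalty weight `lam`, the step size, the unpenalized intercept, and the toy data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def add_interactions(row):
    # Main effects followed by all pairwise products x_j * x_k with j < k.
    pairs = [row[j] * row[k] for j in range(len(row)) for k in range(j + 1, len(row))]
    return list(row) + pairs

def fit_l2_logistic(X, y, lam=0.1, lr=0.5, n_iter=2000):
    # Gradient descent on: mean negative log-likelihood + (lam / 2) * ||w||^2.
    # The intercept b is left unpenalized (an assumption of this sketch).
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data: two binary factors; the response is 1 only when both are present.
raw = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
y = [1 if a == 1 and c == 1 else 0 for a, c in raw]
X = [add_interactions(r) for r in raw]  # columns: x1, x2, x1*x2
w, b = fit_l2_logistic(X, y, lam=0.01)

def predict(row):
    # Predicted probability for one raw (uninteracted) factor pair.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, add_interactions(row))))
```

On this toy data the coefficient of the product term `w[2]` comes out larger than either main effect. The penalty keeps every coefficient finite even though the classes are perfectly separable, a case where unpenalized maximum likelihood has no finite solution.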
9.
10.
A goodness-of-fit test for multinomial logistic regression (total citations: 1; self-citations: 0; citations by others: 1)
This article presents a score test to check the fit of a logistic regression model with two or more outcome categories. The null hypothesis that the model fits well is tested against the alternative that residuals of samples close to each other in covariate space tend to deviate from the model in the same direction. We propose a test statistic that is a sum of squared smoothed residuals, and show that it can be interpreted as a score test in a random effects model. By specifying the distance metric in covariate space, users can choose the alternative against which the test is directed, making it either an omnibus goodness-of-fit test or a test for lack of fit of specific model variables or outcome categories.
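The phrase "sum of squared smoothed residuals" can be sketched for the binary case as follows. This is an illustration under stated assumptions, not the article's test. It uses a Gaussian kernel on Euclidean distance as the smoother, handles only two outcome categories, and omits the standardization and null distribution needed for an actual p-value. The function names and the `bandwidth` parameter are hypothetical.

```python
import math

def gaussian_kernel(a, b, bandwidth=1.0):
    # Weight decays with squared Euclidean distance in covariate space.
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * bandwidth ** 2))

def smoothed_residual_statistic(X, y, p_hat, bandwidth=1.0):
    # r_i = y_i - p_hat_i
    # s_i = sum_j K(x_i, x_j) * r_j   (residuals smoothed over neighbours)
    # T   = sum_i s_i^2
    resid = [yi - pi for yi, pi in zip(y, p_hat)]
    smoothed = [
        sum(gaussian_kernel(xi, xj, bandwidth) * rj for xj, rj in zip(X, resid))
        for xi in X
    ]
    return sum(s * s for s in smoothed)

# Toy check: identical residual magnitudes, different arrangement in covariate space.
X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
p_hat = [0.5] * 6
y_clustered = [1, 1, 1, 0, 0, 0]    # neighbours deviate in the same direction
y_alternating = [1, 0, 1, 0, 1, 0]  # neighbouring deviations largely cancel
t_clustered = smoothed_residual_statistic(X, y_clustered, p_hat)
t_alternating = smoothed_residual_statistic(X, y_alternating, p_hat)
```

The statistic is larger when nearby samples deviate from the model in the same direction, which is the alternative the abstract describes. Changing the kernel or the distance changes which kind of lack of fit the statistic is sensitive to.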
11.
12.
13.
14.
15.
A computer program for linear logistic regression analysis (total citations: 1; self-citations: 0; citations by others: 1)
E T Lee. Computer Programs in Biomedicine, 1974, 4(2): 80-92
16.
Background
Many mathematical and statistical models and algorithms have been proposed for biomarker identification in recent years. However, the biomarkers inferred from different datasets suffer from a lack of reproducibility due to the heterogeneity of the data generated from different platforms or laboratories. This motivates us to develop robust biomarker identification methods by integrating multiple datasets.
Methods
In this paper, we developed an integrative method for classification based on logistic regression. Different constant terms are set in the logistic regression model to measure the heterogeneity of the samples. By minimizing the differences of the constant terms within the same dataset, both the homogeneity within the same dataset and the heterogeneity in multiple datasets can be kept. The model is formulated as an optimization problem with a network penalty measuring the differences of the constant terms. The L1 penalty, elastic penalty and network related penalties are added to the objective function for biomarker discovery. Algorithms based on the proximal Newton method are proposed to solve the optimization problem.
Results
We first applied the proposed method to simulated datasets. Both the AUC of the prediction and the biomarker identification accuracy are improved. We then applied the method to two breast cancer gene expression datasets. By integrating both datasets, the prediction AUC is improved over directly merging the datasets and over MetaLasso, and it is comparable to the best AUC obtained when doing biomarker identification in an individual dataset. The identified biomarkers using network related penalty for variables were further analyzed. Meaningful subnetworks enriched by breast cancer were identified.
Conclusion
A network-based integrative logistic regression model is proposed in the paper. It improves both the prediction and biomarker identification accuracy.
17.
18.
19.
Minimum distance estimation for the logistic regression model (total citations: 1; self-citations: 0; citations by others: 1)
20.
We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data.