Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
2.
The fence method (Jiang and others 2008. Fence methods for mixed model selection. Annals of Statistics 36, 1669-1692) is a recently proposed strategy for model selection. It was motivated by the limitations of traditional information criteria in selecting parsimonious models in some nonconventional situations, such as mixed model selection. Jiang and others (2009. A simplified adaptive fence procedure. Statistics & Probability Letters 79, 625-629) simplified the adaptive fence method of Jiang and others (2008) to make it more suitable and convenient to use in a wide variety of problems. Still, the simplified procedure encounters computational difficulties when applied to high-dimensional and complex problems. To address this concern, we propose a restricted fence procedure that combines the idea of the fence with that of restricted maximum likelihood. Furthermore, we propose to use the wild bootstrap to choose adaptively the tuning parameter used in the restricted fence. We focus on problems of longitudinal studies and, in simulation studies, demonstrate the performance of the new procedure and compare it with other variable selection procedures, including the information criteria and shrinkage methods. The method is further illustrated by an example of real-data analysis.

3.
4.
DNA methylation is a widely studied epigenetic mechanism, and alterations in methylation patterns may be involved in the development of common diseases. Unlike inherited changes in genetic sequence, site-specific methylation varies by tissue, developmental stage, and disease status, and may be affected by aging and exposure to environmental factors, such as diet or smoking. These non-genetic factors are typically included in epigenome-wide association studies (EWAS) because they may confound the association between methylation and disease. However, missing values in these variables can reduce the sample size and decrease the statistical power of EWAS. We propose a site selection and multiple imputation (MI) method to impute missing covariate values and to perform association tests in EWAS. Then, we compare this method to an alternative projection-based method. Through simulations, we show that the MI-based method is slightly conservative, but provides consistent estimates for effect size. We also illustrate these methods with data from the Atherosclerosis Risk in Communities (ARIC) study to carry out an EWAS between methylation levels and smoking status, in which missing cell type compositions and white blood cell counts are imputed.
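The abstract does not say how results from the imputed datasets are combined. A common choice in multiple imputation is Rubin's rules, sketched below as an assumption and not as the authors' procedure. The function name `pool_rubin` and its inputs are hypothetical: one effect estimate and one squared standard error per imputed dataset.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation results with Rubin's rules (assumed here, not taken from the abstract).

    estimates: effect estimates, one per imputed dataset (at least two).
    variances: squared standard errors, one per imputed dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)          # average of the m estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```

The between-imputation term is what makes the pooled variance larger than a single-imputation analysis would report, which is consistent with a slightly conservative test.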

5.
6.
Horton NJ, Laird NM. Biometrics 2001, 57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The new method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

7.
Claeskens G, Consentino F. Biometrics 2008, 64(4):1062-1069
SUMMARY: Application of classical model selection methods such as Akaike's information criterion (AIC) becomes problematic when observations are missing. In this article we propose some variations on the AIC that are applicable to missing covariate problems. The method is directly based on the expectation maximization (EM) algorithm and is readily available for EM-based estimation methods, without much additional computational effort. The missing data AIC criteria are formally derived and shown to work in a simulation study and by application to data on diabetic retinopathy.
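For orientation, the classical complete-data criterion that these variations start from is AIC = 2k - 2 log L, where k is the number of estimated parameters and L the maximized likelihood. The sketch below shows only that classical form; the missing-data variants are not given in the abstract and are not reproduced here. The names `aic` and `select_by_aic` are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Classical AIC: 2k - 2 log L (complete-data form only)."""
    return 2 * n_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the name of the candidate model with the smallest AIC.

    candidates: dict mapping a model name to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any study.
    models = {"small": (-105.0, 2), "large": (-104.5, 5)}
    print(select_by_aic(models))
```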

8.
9.
Analysis of time-to-event data in clinical and epidemiological studies often encounters missing covariate values. The missing at random assumption is commonly adopted, which assumes that missingness depends on the observed data, including the observed outcome, which is the minimum of the survival and censoring times. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time. This is especially so when covariate missingness is related to an unmeasured variable affected by the patient's illness and prognosis factors at baseline. If this is the case, then the covariate missingness is not at random because the survival time is censored, and this creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well-known Cox proportional hazards model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.

10.
11.
An extension of the usual mixture model of heterogeneity (two family types, one with and one without linkage) is proposed by introducing age at onset as a covariate. The extended model defines age-dependent penetrances where the exact parametrization of age-at-onset distributions depends on the given genotype and family type (linked or unlinked). This extension was applied to breast cancer families. We postulated that the mean age at onset in individuals affected by the linked gene was lower than the mean age at onset in all other affected individuals. Linkage heterogeneity for breast cancer families was detected at a significance level of 0.003.

12.
Wang CY, Huang WT. Biometrics 2000, 56(1):98-105
We consider estimation in logistic regression where some covariates may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the sum of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zero. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.

13.
Wang P, Puterman ML, Cockburn I, Le N. Biometrics 1996, 52(2):381-400
This paper studies a class of Poisson mixture models that includes covariates in rates. This model contains Poisson regression and independent Poisson mixtures as special cases. Estimation methods based on the EM and quasi-Newton algorithms, properties of these estimates, a model selection procedure, residual analysis, and a goodness-of-fit test are discussed. A Monte Carlo study investigates implementation and model choice issues. This methodology is used to analyze seizure frequency and Ames salmonella assay data.
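As a rough illustration of the EM step mentioned above, the sketch below fits the simplest special case named in the abstract, an independent Poisson mixture with no covariates. It is not the authors' algorithm: the covariate-dependent rates and the quasi-Newton step are omitted, and the function names and starting values are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability of count y with rate lam > 0."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def log_likelihood(counts, weights, rates):
    """Log-likelihood of the counts under a finite Poisson mixture."""
    return sum(
        math.log(sum(w * poisson_pmf(y, lam) for w, lam in zip(weights, rates)))
        for y in counts
    )


def em_poisson_mixture(counts, weights, rates, n_iter=200):
    """Plain EM for a Poisson mixture without covariates.

    weights, rates: starting values, one entry per component.
    Returns (weights, rates, log-likelihood after each iteration).
    """
    weights, rates = list(weights), list(rates)
    history = [log_likelihood(counts, weights, rates)]
    for _ in range(n_iter):
        # E-step: posterior probability that each count came from each component.
        resp = []
        for y in counts:
            joint = [w * poisson_pmf(y, lam) for w, lam in zip(weights, rates)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update mixing weights and responsibility-weighted mean rates.
        for k in range(len(weights)):
            mass = sum(r[k] for r in resp)
            weights[k] = mass / len(counts)
            rates[k] = sum(r[k] * y for r, y in zip(resp, counts)) / mass
        history.append(log_likelihood(counts, weights, rates))
    return weights, rates, history


if __name__ == "__main__":
    # Made-up counts with a low-rate and a high-rate group, for illustration only.
    data = [0, 1, 1, 2, 0, 1, 2, 1, 9, 10, 11, 12, 10, 9, 11, 10]
    w, lam, ll = em_poisson_mixture(data, [0.5, 0.5], [1.0, 8.0])
    print(w, lam, ll[-1])
```

Tracking the log-likelihood after each iteration gives a simple convergence check, since EM should never decrease it.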

14.
Epistasis, or gene-gene interaction, is a fundamental component of the genetic architecture of complex traits such as disease susceptibility. Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free method to detect epistasis when there are no significant marginal genetic effects. However, in many studies of complex disease, other covariates such as age of onset and smoking status could have a strong main effect and may interfere with MDR's ability to achieve its goal. In this paper, we present a simple and computationally efficient sampling method to adjust for covariate effects in MDR. We use simulation to show that after adjustment, MDR has sufficient power to detect true gene-gene interactions. We also compare our method with the state-of-the-art technique in covariate adjustment. The results suggest that our proposed method performs similarly, but is more computationally efficient. We then apply this new method to an analysis of a population-based bladder cancer study in New Hampshire.

15.
Markov models for covariate dependence of binary sequences   Total citations: 2, self-citations: 1, citations by others: 2
Suppose that a heterogeneous group of individuals is followed over time and that each individual can be in state 0 or state 1 at each time point. The sequence of states is assumed to follow a binary Markov chain. In this paper we model the transition probabilities for the 0 to 0 and 1 to 0 transitions by two logistic regressions, thus showing how the covariates relate to changes in state. With p covariates, there are 2(p + 1) parameters including intercepts, which we estimate by maximum likelihood. We show how to use transition probability estimates to test hypotheses about the probability of occupying state 0 at time i (i = 2, ..., T) and the equilibrium probability of state 0. These probabilities depend on the covariates. A recursive algorithm is suggested to estimate regression coefficients when some responses are missing. Extensions of the basic model which allow time-dependent covariates and nonstationary or second-order Markov chains are presented. An example shows the model applied to a study of the psychological impact of breast cancer in which women did or did not manifest distress at four time points in the year following surgery.
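The structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the coefficient vectors are assumed to be already estimated, the function names are hypothetical, and the equilibrium formula follows from solving pi0 = pi0 * P(0->0) + (1 - pi0) * P(1->0) for a stationary chain.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def transition_probs(beta_00, beta_10, covariates):
    """Return (P(0->0), P(1->0)) from two logistic regressions.

    beta_00, beta_10: [intercept, coef_1, ..., coef_p], so 2(p + 1) parameters in total.
    covariates: the p covariate values for one individual.
    """
    x = [1.0] + list(covariates)
    p00 = logistic(sum(b * v for b, v in zip(beta_00, x)))
    p10 = logistic(sum(b * v for b, v in zip(beta_10, x)))
    return p00, p10


def state0_probs(initial_p0, p00, p10, n_times):
    """Probability of occupying state 0 at times 1..n_times, by recursion."""
    probs = [initial_p0]
    for _ in range(n_times - 1):
        prev = probs[-1]
        probs.append(prev * p00 + (1.0 - prev) * p10)
    return probs


def equilibrium_p0(p00, p10):
    """Equilibrium probability of state 0 for a stationary binary chain."""
    return p10 / (1.0 - p00 + p10)


if __name__ == "__main__":
    # Hypothetical coefficients for a single covariate (p = 1).
    p00, p10 = transition_probs([1.0, 0.5], [-1.0, 0.2], [0.3])
    print(state0_probs(1.0, p00, p10, 4), equilibrium_p0(p00, p10))
```

Because both transition probabilities depend on the covariates, the occupancy and equilibrium probabilities differ between individuals, which is the covariate dependence the abstract refers to.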

16.
Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimize nonhomology are to be favored. Therefore, among all possible alignments, the one that best satisfies the phylogenetic optimality criterion should be considered the best alignment. Since all homology statements are subject to testing and explanation this way, consistency of optimality criteria is desirable. This consistency is based on the treatment of alignment gaps as character information and the consistent use of a cost function (e.g., insertion-deletion, transversion, and transition) through analysis from alignment to phylogeny reconstruction. Cost functions are not subject to testing via inspection; hence the assumptions they make should be examined by varying the assumed values in a sensitivity analysis context to test for the robustness of results. Agreement among data may be used to choose an optimal solution set from all of those examined through parameter variation. This idea of consistency between assumption and analysis through alignment and cladogram reconstruction is not limited to parsimony analysis and could and should be applied to other forms of analysis such as maximum likelihood.

17.
Dai JY, LeBlanc M, Kooperberg C. Biometrics 2009, 65(1):178-187
Summary. Recent results for case–control sampling suggest that when the covariate distribution is constrained by gene-environment independence, semiparametric estimation exploiting such independence yields a great deal of efficiency gain. We consider the efficient estimation of the treatment–biomarker interaction in two-phase sampling nested within randomized clinical trials, incorporating the independence between a randomized treatment and the baseline markers. We develop a Newton–Raphson algorithm based on the profile likelihood to compute the semiparametric maximum likelihood estimate (SPMLE). Our algorithm accommodates both continuous phase-one outcomes and continuous phase-two biomarkers. The profile information matrix is computed explicitly via numerical differentiation. In certain situations where computing the SPMLE is slow, we propose a maximum estimated likelihood estimator (MELE), which is also capable of incorporating the covariate independence. This estimated likelihood approach uses a one-step empirical covariate distribution and is thus straightforward to maximize. It offers a closed-form variance estimate with limited increase in variance relative to the fully efficient SPMLE. Our results suggest that exploiting the covariate independence in two-phase sampling increases the efficiency substantially, particularly for estimating treatment–biomarker interactions.

18.
19.
20.
Li L, Shao J, Palta M. Biometrics 2005, 61(3):824-830
Covariate measurement error in regression is typically assumed to act in an additive or multiplicative manner on the true covariate value. However, such an assumption does not hold for the measurement error of sleep-disordered breathing (SDB) in the Wisconsin Sleep Cohort Study (WSCS). The true covariate is the severity of SDB, and the observed surrogate is the number of breathing pauses per unit time of sleep, which has a nonnegative semicontinuous distribution with a point mass at zero. We propose a latent variable measurement error model for the error structure in this situation and implement it in a linear mixed model. The estimation procedure is similar to regression calibration but involves a distributional assumption for the latent variable. Modeling and model-fitting strategies are explored and illustrated through an example from the WSCS.
