Similar literature
Found 20 similar documents (search time: 375 ms)
1.
For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15, 719–730) and Xie and Zhang (2017, Int J Biostat 13, 1–20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.

2.
Semiparametric regression estimation in the presence of dependent censoring   (total citations: 5; self-citations: 0; citations by others: 5)
We propose a semiparametric estimation procedure for estimating the regression of an outcome Y, measured at the end of a fixed follow-up period, on baseline explanatory variables X, measured prior to start of follow-up, in the presence of dependent censoring given X. The proposed estimators are consistent when the data are ‘missing at random’ but not ‘missing completely at random’ (Rubin, 1976), and do not require full specification of the complete data likelihood. Specifically, we assume that the probability of censoring at time t is independent of the outcome Y conditional on the recorded history up to t of a vector of time-dependent covariates that are correlated with Y. Our estimators can be used to adjust for dependent censoring and nonrandom noncompliance in randomised trials studying the effect of a treatment on the mean of a response variable of interest. Even with independent censoring, our methods allow the investigator to increase efficiency by exploiting the correlation of the outcome with a vector of time-dependent covariates.

3.
Horton NJ, Laird NM. Biometrics 2001, 57(1): 34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

4.
Given the normal multivariate linear regression model Y = BX + E, with B subjected to the linear restrictions H BJ = W A, J known, W and H unknown, A known, the maximum likelihood estimates of H, B, W, are obtained. A likelihood ratio test criterion for testing H = H0, W = W0 is provided. The results are extended to the GMANOVA model. All results are obtained in terms of the original variates directly, unlike Healy (1980) who obtains the results for the MANOVA model in terms of the canonical transformations of the original variates.

5.
Accurately estimating biological sex from the human skeleton can be especially difficult for fragmentary or incomplete remains often encountered in bioarchaeological contexts. Where typical anatomically dimorphic skeletal regions are incomplete or absent, observers often take their best guess to classify biological sex. Latent profile analysis (LPA) is a mixture modeling technique which uses observed continuous data to estimate unobserved categorical group membership using posterior probabilities. In this study, sex is the latent variable (male and female are the two latent classes), and the indicator variables used here were eight standard linear measurements (long bone lengths, diaphyseal and articular breadths, and circumferences). Mplus (Muthén and Muthén: Mplus user's guide, 6th ed. Los Angeles: Muthén & Muthén, 2010) was used to obtain maximum likelihood estimates for latent class membership from a known sample of individuals from the forensic data bank (FDB) (Jantz and Moore‐Jansen: Database for forensic anthropology in the United States 1962–1991, Ann Arbor, MI: Interuniversity Consortium for Political and Social Research, 2000) (n = 1,831), yielding 87% correct classification for sex. Then, a simulation extracted 5,000 different random samples of 206 complete cases each from the FDB (these cases also had known sex). We then artificially imposed patterns of missing data similar to those observed in a poorly preserved bioarchaeological sample from Medieval Asturias, Spain (n = 206), and ran LPA on each sample. This tested the efficacy of LPA under extreme conditions of poor preservation (missing data, 42%). The simulation yielded an average of 82% accuracy, indicating that LPA is robust to large amounts of missing data when analyzing incomplete skeletons. Am J Phys Anthropol 151:538–543, 2013. © 2013 Wiley Periodicals, Inc.

6.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes, and either the covariates alone or both the covariates and the outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We showed via simulation that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components. We illustrate our findings with the analysis of LSAC data.
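
As a rough illustration of the FCS idea described in this abstract (cycling through one univariate imputation model per incomplete variable), here is a minimal sketch. It is not one of the seven methods the paper compares: the function name `fcs_impute`, the plain linear-regression imputation model, and the defaults are assumptions made for illustration, and the sketch ignores clustering, random effects, and parameter uncertainty.

```python
import numpy as np


def fcs_impute(data, n_cycles=10, seed=0):
    """Hypothetical chained-equations sketch: impute each incomplete column
    from all other columns with a linear regression, cycling several times.

    Simplification (assumption): regression coefficients are not drawn from
    their posterior, so this yields one stochastic imputation only.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    missing = np.isnan(x)

    # Start from observed column means so every regression has full predictors.
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.where(missing)[1])

    n, p = x.shape
    for _ in range(n_cycles):
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # fully observed column: nothing to impute
            obs_j = ~miss_j
            # Univariate model: column j given all the others (plus intercept).
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            beta, *_ = np.linalg.lstsq(design[obs_j], x[obs_j, j], rcond=None)
            resid = x[obs_j, j] - design[obs_j] @ beta
            dof = max(int(obs_j.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace missing entries with prediction plus residual noise.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            x[miss_j, j] = design[miss_j] @ beta + noise
    return x
```

Calling `fcs_impute` several times with different `seed` values would give several completed data sets, which is the starting point for multiple imputation; a proper implementation would also propagate uncertainty in the regression parameters.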

7.
Boosting is a powerful approach to fitting regression models. This article describes a boosting algorithm for likelihood‐based estimation with incomplete data. The algorithm combines boosting with a variant of stochastic approximation that uses Markov chain Monte Carlo to deal with the missing data. Applications to fitting generalized linear and additive models with missing covariates are given. The method is applied to the Pima Indians Diabetes Data where over half of the cases contain missing values.

8.
Satten GA, Carroll RJ. Biometrics 2000, 56(2): 384-388
We consider methods for analyzing categorical regression models when some covariates (Z) are completely observed but other covariates (X) are missing for some subjects. When data on X are missing at random (i.e., when the probability that X is observed does not depend on the value of X itself), we present a likelihood approach for the observed data that allows the same nuisance parameters to be eliminated in a conditional analysis as when data are complete. An example of a matched case-control study is used to demonstrate our approach.

9.
Given a randomized treatment Z, a clinical outcome Y, and a biomarker S measured some fixed time after Z is administered, we may be interested in addressing the surrogate endpoint problem by evaluating whether S can be used to reliably predict the effect of Z on Y. Several recent proposals for the statistical evaluation of surrogate value have been based on the framework of principal stratification. In this article, we consider two principal stratification estimands: joint risks and marginal risks. Joint risks measure causal associations (CAs) of treatment effects on S and Y, providing insight into the surrogate value of the biomarker, but are not statistically identifiable from vaccine trial data. Although marginal risks do not measure CAs of treatment effects, they nevertheless provide guidance for future research, and we describe a data collection scheme and assumptions under which the marginal risks are statistically identifiable. We show how different sets of assumptions affect the identifiability of these estimands; in particular, we depart from previous work by considering the consequences of relaxing the assumption of no individual treatment effects on Y before S is measured. Based on algebraic relationships between joint and marginal risks, we propose a sensitivity analysis approach for assessment of surrogate value, and show that in many cases the surrogate value of a biomarker may be hard to establish, even when the sample size is large.

10.
Two interesting results encountered in the literature concerning the Poisson and the negative binomial distributions are due to Moran (1952) and Patil & Seshadri (1964), respectively. Moran's result provided a fundamental property of the Poisson distribution. Roughly speaking, he has shown that if Y, Z are independent, non-negative, integer-valued random variables with X = Y + Z then, under some mild restrictions, the conditional distribution of Y | X is binomial if and only if Y, Z are Poisson random variables. Motivated by Moran's result, Patil & Seshadri obtained a general characterization. A special case of this characterization suggests that, with conditions similar to those imposed by Moran, Y | X is negative hypergeometric if and only if Y, Z are negative binomials. In this paper we examine the results of Moran and Patil & Seshadri in the case where the conditional distribution of Y | X is truncated at an arbitrary point k – 1 (k = 1, 2, …). In fact we attempt to answer the question as to whether Moran's property of the Poisson distribution, and subsequently Patil & Seshadri's property of the negative binomial distribution, can be extended, in one form or another, to the case where Y | X is binomial truncated at k – 1 and negative hypergeometric truncated at k – 1 respectively.
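
The easy ("if") direction of Moran's characterization can be checked directly. The short derivation below is a sketch, not the paper's argument; the symbols $\lambda$ and $\mu$ for the Poisson means are assumed here for illustration. With $Y \sim \mathrm{Poisson}(\lambda)$ and $Z \sim \mathrm{Poisson}(\mu)$ independent, $X = Y + Z \sim \mathrm{Poisson}(\lambda + \mu)$, and for $0 \le y \le x$:

```latex
\begin{aligned}
P(Y = y \mid X = x)
  &= \frac{P(Y = y)\, P(Z = x - y)}{P(X = x)} \\
  &= \frac{\dfrac{e^{-\lambda}\lambda^{y}}{y!} \cdot \dfrac{e^{-\mu}\mu^{x-y}}{(x-y)!}}
          {\dfrac{e^{-(\lambda+\mu)}(\lambda+\mu)^{x}}{x!}} \\
  &= \binom{x}{y}
     \left(\frac{\lambda}{\lambda+\mu}\right)^{y}
     \left(\frac{\mu}{\lambda+\mu}\right)^{x-y}.
\end{aligned}
```

So $Y \mid X = x$ is binomial with index $x$ and success probability $\lambda/(\lambda+\mu)$. The converse, and the truncated versions studied in the paper, are the nontrivial parts and are not reproduced here.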

11.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

12.
The final step in the process of conidiation, conidial pigmentation, was studied in the fungus Trichoderma viride. Twenty-nine auxotrophic color mutants, isolated from the same green wild-type strain, were paired to produce stable heterokaryons in all possible combinations and grouped according to their complementation behavior. No complementation (green pigmentation) was found in any of the heterokaryons formed by pairs of white (W) mutants. However, these mutants could be separated into two groups with respect to their behavior when paired with yellow (Y) and brown (Br) mutants. When Wc mutants were paired with any of the Y or Br mutants, complementation took place. However, Wd mutants displayed this reaction with only one group of yellow mutants (Ya) and not with the other (Yb) nor with Br mutants. In noncomplementing heterokaryons such as Yb/Wd, only yellow and white conidia were produced, pigmentation being autonomous. On the other hand, in heterokaryons in which complementation did take place, as for instance Ya/Wd, green as well as white and yellow conidia were produced. Differential sensitivity to UV irradiation was used to show that the green conidia were of either Wd or Ya genotype, indicating a nonautonomous type of gene action. It is suggested that the genes Wc, Ya, Yb and Br have a sequential structural role in the biosynthesis of the green pigment, while Wd controls the activity of three (Wc, Yb, Br) of these genes.

13.
Shin Y, Raudenbush SW. Biometrics 2007, 63(4): 1262-1268
The development of model-based methods for incomplete data has been a seminal contribution to statistical practice. Under the assumption of ignorable missingness, one estimates the joint distribution of the complete data for $\theta \in \Theta$ from the incomplete or observed data $y_{\mathrm{obs}}$. Many interesting models involve one-to-one transformations of $\theta$. For example, with $y_i \sim N(\mu, \Sigma)$ for $i = 1, \ldots, n$ and $\theta = (\mu, \Sigma)$, an ordinary least squares (OLS) regression model is a one-to-one transformation of $\theta$. Inferences based on such a transformation are equivalent to inferences based on OLS using data multiply imputed from $f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \theta)$ for missing $y_{\mathrm{mis}}$. Thus, identification of $\theta$ from $y_{\mathrm{obs}}$ is equivalent to identification of the regression model. In this article, we consider a model for two-level data with continuous outcomes where the observations within each cluster are dependent. The parameters of the hierarchical linear model (HLM) of interest, however, lie in a subspace of $\Theta$ in general. This identification of the joint distribution overidentifies the HLM. We show how to characterize the joint distribution so that its parameters are a one-to-one transformation of the parameters of the HLM. This leads to efficient estimation of the HLM from incomplete data using either the transformation method or the method of multiple imputation. The approach allows outcomes and covariates to be missing at either of the two levels, and the HLM of interest can involve the regression of any subset of variables on a disjoint subset of variables conceived as covariates.

14.
Regression trees allow one to search for meaningful explanatory variables that have a nonlinear impact on the dependent variable. Often they are used when there are many covariates and one does not want to restrict attention to only a few of them. To grow a tree, at each stage one has to select a cut point for splitting a group into two subgroups. The basis for this is the maxima of the test statistics related to the possible splits due to every covariate. These maxima, or the resulting P-values, are compared as a measure of importance. If covariates have different numbers of missing values, ties, or even different measurement scales, the covariates lead to different numbers of tests. Those with a higher number of tests have a greater chance of achieving a smaller P-value if they are not adjusted. This can lead to erroneous splits even if the P-values are looked at informally. There is some theoretical work by Miller and Siegmund (1982) and Lausen and Schumacher (1992) that gives an adjustment rule. But the asymptotics are based on a continuum of split points and may not lead to a fair splitting rule when applied to smaller data sets or to covariates with only a few different values. Here we develop an approach that allows determination of P-values for any number of splits. The only approximation that is used is the normal approximation of the test statistics. The starting point for this investigation has been a prospective study on the development of AIDS. This is presented here as the main application.

15.
In deletion-mapping of W-specific RAPD (W-RAPD) markers and putative female determinant gene (Fem), we used X-ray irradiation to break the translocation-carrying W chromosome (W Ze ). We succeeded in obtaining a fragment of the W Ze chromosome designated as Ze W, having 3 of 12 W-RAPD markers (W-Bonsai, W-Yukemuri-S, W-Yukemuri-L). Inheritance of the Ze W fragment by males indicates that it does not include the Fem gene. On the basis of these results, we determined the relative positions of W-Yukemuri-S and W-Yukemuri-L, and we narrowed down the region where Fem gene is located. In addition to the Ze W fragment, the Z chromosome was also broken into a large fragment (Z1) having the + sch (1-21.5) and a small fragment (Z2) having the + od (1-49.6). Moreover, a new chromosomal fragment (Ze WZ2) was generated by a fusion event between the Ze W and the Z2 fragments. We analyzed the genetic behavior of the Z1 fragment and the Ze WZ2 fragment during male (Z/Z1 Ze WZ2) and female (Z1 Ze WZ2/W) meiosis using phenotypic markers. It was observed that the Z1 fragment and the Z or the W chromosomes separate without fail. On the other hand, non-disjunction between the Ze WZ2 fragment and the Z chromosome and also between the Ze WZ2 fragment and the W chromosome occurred. Furthermore, the females (2A: Z/Ze WZ2/W) and males (2A: Z/Z1) resulting from non-disjunction between the Ze WZ2 fragment and the W chromosome had phenotypic defects: namely, females exhibited abnormal oogenesis and males were flapless due to abnormal indirect flight muscle structure. These results suggest that Z2 region of the Z chromosome contains dose-sensitive gene(s), which are involved in oogenesis and indirect flight muscle development.

16.
Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.

17.
It is not uncommon for biological anthropologists to analyze incomplete bioarcheological or forensic skeleton specimens. As many quantitative multivariate analyses cannot handle incomplete data, missing data imputation or estimation is a common preprocessing practice for such data. Using William W. Howells' Craniometric Data Set and the Goldman Osteometric Data Set, we evaluated the performance of multiple popular statistical methods for imputing missing metric measurements. Results indicated that multiple imputation methods outperformed single imputation methods, such as Bayesian principal component analysis (BPCA). Multiple imputation with Bayesian linear regression implemented in the R package norm2, the Expectation–Maximization (EM) with Bootstrapping algorithm implemented in Amelia, and the Predictive Mean Matching (PMM) method and several of the derivative linear regression models implemented in mice, perform well regarding accuracy, robustness, and speed. Based on the findings of this study, we suggest a practical procedure for choosing appropriate imputation methods.

18.
The Cox proportional hazards regression model is the most popular approach to model covariate information for survival times. In this context, the development of high‐dimensional models where the number of covariates is much larger than the number of observations ($p \gg n$) is an ongoing challenge. A practicable approach is to use ridge penalized Cox regression in such situations. Besides focusing on finding the best prediction rule, one is often interested in determining a subset of covariates that are the most important ones for prognosis. This could be a gene set in the biostatistical analysis of microarray data. Covariate selection can then, for example, be done by L1‐penalized Cox regression using the lasso (Tibshirani (1997). Statistics in Medicine 16, 385–395). Several approaches beyond the lasso that incorporate covariate selection have been developed in recent years. These include modifications of the lasso as well as nonconvex variants such as smoothly clipped absolute deviation (SCAD) (Fan and Li (2001). Journal of the American Statistical Association 96, 1348–1360; Fan and Li (2002). The Annals of Statistics 30, 74–99). The purpose of this article is to implement them practically into the model building process when analyzing high‐dimensional data with the Cox proportional hazards model. To evaluate penalized regression models beyond the lasso, we included SCAD variants and the adaptive lasso (Zou (2006). Journal of the American Statistical Association 101, 1418–1429). We compare them with “standard” applications such as ridge regression, the lasso, and the elastic net. Predictive accuracy, features of variable selection, and estimation bias will be studied to assess the practical use of these methods. We observed that the performance of SCAD and adaptive lasso is highly dependent on nontrivial preselection procedures. A practical solution to this problem does not yet exist.
Since there is a high risk of missing relevant covariates when SCAD or adaptive lasso is applied after an inappropriate initial selection step, we recommend staying with the lasso or the elastic net in actual data applications. But with respect to the promising results for truly sparse models, we see some advantage of SCAD and adaptive lasso, if better preselection procedures were available. This requires further methodological research.
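
For orientation, the penalties named in this abstract can be written in one common form. This is a sketch under assumed notation that does not come from the abstract: $\ell(\beta)$ is the Cox partial log-likelihood, $\lambda \ge 0$ the tuning parameter, $\alpha \in [0, 1]$ the elastic-net mixing weight, $\tilde\beta_j$ an initial estimate with exponent $\gamma > 0$, and $a > 2$ the SCAD shape parameter. Scaling constants differ between references and software.

```latex
\hat\beta = \arg\max_{\beta} \Big\{ \ell(\beta) - \sum_{j=1}^{p} p_{\lambda}(|\beta_j|) \Big\},
\qquad
\begin{array}{ll}
\text{ridge:}          & p_{\lambda}(|\beta_j|) = \lambda \beta_j^{2} \\[2pt]
\text{lasso:}          & p_{\lambda}(|\beta_j|) = \lambda |\beta_j| \\[2pt]
\text{elastic net:}    & p_{\lambda}(|\beta_j|) = \lambda \big( \alpha |\beta_j| + (1 - \alpha) \beta_j^{2} \big) \\[2pt]
\text{adaptive lasso:} & p_{\lambda}(|\beta_j|) = \lambda\, |\beta_j| \,/\, |\tilde\beta_j|^{\gamma} \\[2pt]
\text{SCAD:}           & p_{\lambda}(|\beta_j|) =
  \begin{cases}
    \lambda |\beta_j|, & |\beta_j| \le \lambda, \\[2pt]
    \dfrac{2 a \lambda |\beta_j| - \beta_j^{2} - \lambda^{2}}{2 (a - 1)}, & \lambda < |\beta_j| \le a \lambda, \\[6pt]
    \dfrac{\lambda^{2} (a + 1)}{2}, & |\beta_j| > a \lambda.
  \end{cases}
\end{array}
```

The SCAD penalty is flat beyond $a\lambda$, so large coefficients are not shrunk further, whereas the lasso penalty keeps growing linearly. The adaptive lasso depends on the initial estimates $\tilde\beta_j$, which is one way to see why the abstract stresses the role of the preselection step.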

19.
Molecular methods are a necessary tool for sexing monomorphic birds. These molecular approaches are usually reliable, but sexing protocols should be evaluated carefully because biochemical interactions may lead to errors. We optimized laboratory protocols for genetic sexing of a monomorphic shorebird, the upland sandpiper (Bartramia longicauda), using two independent sets of primers, P2/P8 and 2550F/2718R, to amplify regions of the sex‐linked CHD‐Z and CHD‐W genes. We discovered polymorphisms in the region of the CHD‐Z intron amplified by the primers P2/P8 which caused four males to be misidentified as females (n = 90 mated pairs). We cloned and sequenced one CHD‐W allele (370 bp) and three CHD‐Z alleles in our population: Z° (335 bp), Z′ (331 bp) and Z″ (330 bp). Normal (Z°Z°) males showed one band in agarose gel analysis and were easily differentiated from females (Z°W), which showed two bands. However, males heterozygous for CHD‐Z alleles (Z′Z″) unexpectedly showed two bands in a pattern similar to females. While the Z′ and Z″ fragments contained only short deletions, they annealed together during the polymerase chain reaction (PCR) process and formed heteroduplex molecules that were similar in size to the W fragment. Errors previously reported for molecular sex‐assignment have usually been due to allelic dropout, causing females to be misidentified as males. Here, we report evidence that events in PCRs can lead to the opposite error, with males misidentified as females. We recommend use of multiple primer sets and large samples of known‐sex birds for validation when designing protocols for molecular sex analysis.

20.
An electrometrical technique was used to investigate flash-induced electron transfer reactions between Mn-depleted spinach photosystem II core particles incorporated into liposomes and redox mediators. Besides the fast increase in the transmembrane electric potential difference associated with electron transfer between the redox active tyrosine (YZ) and the primary quinone acceptor QA, an additional electrogenic phase was observed in the presence of N,N,N′,N′-tetramethyl-p-phenylenediamine and 2,6-dichlorophenol-indophenol. The latter phase is attributed to vectorial electron transfer from the redox dye(s) to the protein-embedded YZ. The data obtained suggest an electrically isolated location of the YZ from the external water phase.
