Similar Articles (20 results)
1.
Estimation of pairwise correlation from incomplete and replicated molecular profiling data is a ubiquitous problem in pattern discovery analyses such as clustering and network inference. However, existing methods solve this problem by ad hoc data imputation, followed by correlation coefficient-type approaches, which may destroy important patterns present in the molecular profiling data. Moreover, these approaches do not consider or exploit the underlying experimental design information that specifies the replication mechanism. We develop an Expectation-Maximization (EM) type algorithm to estimate the correlation structure from incomplete and replicated molecular profiling data with an a priori known replication mechanism. The approach is sufficiently general to be applicable to any known replication mechanism; when the replication mechanism is unknown, it reduces to a previously introduced parsimonious model. The efficacy of our approach was first evaluated by comprehensively comparing various bivariate and multivariate imputation approaches in simulation studies. Results from real-world data analysis further confirmed the superior performance of the proposed approach over commonly used approaches, where we assessed the robustness of the method using data sets with up to 30 percent missing values.
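To make the EM idea concrete, here is a minimal sketch of EM estimation of a correlation matrix from incomplete multivariate normal data. It ignores the replication-mechanism information that the proposed method exploits, so it is a generic baseline rather than the paper's algorithm; all names are illustrative.

```python
# EM for the covariance/correlation of a multivariate normal with missing
# entries: E-step imputes conditional means and adds the conditional
# covariance of the missing block; M-step updates (mu, Sigma).
import numpy as np

def em_correlation(X, n_iter=100, tol=1e-8):
    """X: (n, p) array with np.nan for missing entries.
    Assumes each row has at least one observed value."""
    n, p = X.shape
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        exx = np.zeros((p, p))
        ex = np.zeros((n, p))
        for i in range(n):
            obs = ~np.isnan(X[i])
            mis = ~obs
            xi = X[i].copy()
            ci = np.zeros((p, p))
            if mis.any():
                s_oo = sigma[np.ix_(obs, obs)]
                s_mo = sigma[np.ix_(mis, obs)]
                s_mm = sigma[np.ix_(mis, mis)]
                w = np.linalg.solve(s_oo, X[i, obs] - mu[obs])
                xi[mis] = mu[mis] + s_mo @ w
                # conditional covariance of the missing block given observed
                ci[np.ix_(mis, mis)] = s_mm - s_mo @ np.linalg.solve(s_oo, s_mo.T)
            ex[i] = xi
            exx += np.outer(xi, xi) + ci
        mu_new = ex.mean(axis=0)
        sigma_new = exx / n - np.outer(mu_new, mu_new)
        done = np.max(np.abs(sigma_new - sigma)) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)   # correlation matrix
```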

2.
In cohort studies the outcome is often time to a particular event, and subjects are followed at regular intervals. Periodic visits may also monitor a secondary irreversible event that influences the event of primary interest, and a significant proportion of subjects develop the secondary event over the period of follow-up. The status of the secondary event serves as a time-varying covariate, but is recorded only at the scheduled visits, generating incomplete time-varying covariates. While information on a typical time-varying covariate is missing for the entire follow-up period except at the visit times, the status of the secondary event is unavailable only between the visits where the status changed; it is thus interval-censored. One may view the interval-censored covariate of secondary event status as a missing time-varying covariate, yet the missingness is partial, since partial information is provided throughout the follow-up period. The current practice of carrying forward the latest observed status produces biased estimators, and existing missing-covariate techniques cannot accommodate this special feature of missingness due to interval censoring. To handle interval-censored covariates in the Cox proportional hazards model, we propose an available-data estimator and a doubly robust-type estimator, as well as the maximum likelihood estimator via the EM algorithm, and present their asymptotic properties. We also present practical approaches that remain valid. We demonstrate the proposed methods using our motivating example from the Northern Manhattan Study.
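For intuition, a toy sketch (not one of the paper's three estimators) of how one might handle the interval-censored switch time: impute a change point uniformly inside the bracketing visit interval and build start/stop episodes suitable for a time-varying Cox fit. The function and column names are invented for illustration.

```python
# Build episode (start, stop, covariate, event) rows for one subject whose
# binary covariate status is only observed at visits; the unknown switch
# time is imputed uniformly within the interval where the change occurred.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

def episodes(subject_id, visits, status, follow_up, event):
    """visits: sorted visit times; status: 0/1 covariate seen at each visit."""
    rows, z, t0 = [], status[0], 0.0
    for t_prev, t_next, s in zip(visits, visits[1:], status[1:]):
        if s != z:  # status changed somewhere in (t_prev, t_next]: impute it
            t_switch = rng.uniform(t_prev, t_next)
            rows.append((subject_id, t0, t_switch, z, 0))
            t0, z = t_switch, s
    rows.append((subject_id, t0, follow_up, z, event))
    return rows

rows = episodes(1, visits=[0, 1, 2, 3], status=[0, 0, 1, 1],
                follow_up=3.4, event=1)
df = pd.DataFrame(rows, columns=["id", "start", "stop", "z", "event"])
print(df)  # episode data usable by, e.g., lifelines' CoxTimeVaryingFitter
```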

3.
We consider the problem of estimating the intensity functions of a continuous-time illness-death model with intermittently observed data. In such a setting, a subject may become diseased between two visits and die without the disease ever being observed; consequently, there is uncertainty about the precise number of transitions. Estimating the intensity of the transition from health to illness by survival analysis (treating death as censoring) is therefore biased downwards. Furthermore, the dates of transitions between states are not known exactly. We propose to estimate the intensity functions by maximizing a penalized likelihood; the method yields smooth estimates without parametric assumptions. This is illustrated using data from a large cohort study on cerebral ageing, in which the age-specific incidence of dementia is estimated using both an illness-death approach and a survival approach.
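A small simulation, assuming constant intensities purely for illustration, shows the downward bias that motivates the paper: subjects who fall ill and die between two visits are never observed to be ill, so treating death as censoring undercounts health-to-illness transitions. This is a toy demonstration, not the penalized-likelihood estimator.

```python
# Illness-death world with constant intensities, yearly visits, and illness
# detected only if the subject is alive at a visit after onset.
import numpy as np

rng = np.random.default_rng(0)
a01, a02, a12 = 0.10, 0.05, 0.50   # health->ill, health->death, ill->death
tau = 10.0                          # administrative end of follow-up
visits = np.arange(0.0, tau + 1.0, 1.0)

n = 50_000
events = exposure = 0.0
for _ in range(n):
    t_ill = rng.exponential(1 / a01)
    t_dh = rng.exponential(1 / a02)          # death time if never ill
    if (t_ill < t_dh) and (t_ill < tau):     # illness truly occurs in study
        t_death = t_ill + rng.exponential(1 / a12)
        seen = [v for v in visits if t_ill <= v < t_death]
        if seen:                              # alive at a visit after onset
            events += 1
            exposure += seen[0]               # onset dated at detecting visit
        else:                                 # ill and dead between visits:
            exposure += min(t_death, tau)     # death treated as censoring
    else:
        exposure += min(t_dh, tau)            # dies healthy or reaches tau

print(f"naive estimate {events / exposure:.3f}  vs  true a01 = {a01}")
```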

4.
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing-data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on data being missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and a simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where "predictable" indicates that missingness is not associated with the outcome. We build on previous results in the literature to show that MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable conditions when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.
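A hedged sketch of the CCA-versus-MI comparison using statsmodels' chained-equations implementation as one parametric MI-FCS engine (the paper also evaluates machine-learning imputers, not shown here). Variable names and the MAR mechanism are invented for illustration.

```python
# Compare complete case analysis with MI-FCS on a logistic model where x1
# is made MAR (missingness depends only on the fully observed x2).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(7)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + 0.5 * x2))))
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})
df.loc[rng.random(n) < 1 / (1 + np.exp(-x2)), "x1"] = np.nan  # MAR in x1

# complete case analysis
cca = sm.Logit.from_formula("y ~ x1 + x2", df.dropna()).fit(disp=0)

# MI-FCS: chained-equations imputation with Rubin's-rules pooling
mi_res = mice.MICE("y ~ x1 + x2", sm.Logit, mice.MICEData(df)).fit(10, 20)

print(cca.params)
print(mi_res.summary())
```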

5.
Background: Population-based net survival by tumour stage at diagnosis is a key measure in cancer surveillance. Unfortunately, data on tumour stage are often missing for a non-negligible proportion of patients, and the mechanism giving rise to the missingness is usually anything but completely at random. In this setting, restricting analysis to the subset of complete records typically gives biased results. Multiple imputation is a promising practical approach to the issues raised by missing data, but its use in conjunction with the Pohar-Perme method for estimating net survival has not been formally evaluated.
Methods: We performed a resampling study using colorectal cancer population-based registry data to evaluate the ability of multiple imputation, used along with the Pohar-Perme method, to deliver unbiased estimates of stage-specific net survival and recover missing stage information. We created 1000 independent data sets, each containing 5000 patients. Stage data were then made missing at random under two scenarios (30% and 50% missingness).
Results: Complete records analysis showed substantial bias and poor confidence interval coverage. Across both scenarios our multiple imputation strategy virtually eliminated the bias and greatly improved confidence interval coverage.
Conclusions: In the presence of missing stage data, complete records analysis often gives severely biased results. We showed that combining multiple imputation with the Pohar-Perme estimator provides a valid practical approach for the estimation of stage-specific colorectal cancer net survival. As usual, when the percentage of missing data is high the results should be interpreted cautiously and sensitivity analyses are recommended.
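The Pohar-Perme estimator itself is implemented in R (e.g. in the relsurv package) rather than Python; the sketch below covers only the Rubin's-rules step that pools a stage-specific net survival estimate across imputed datasets. Inputs are illustrative, and in practice pooling is often done on a transformed (e.g. log(-log)) scale rather than the probability scale.

```python
# Rubin's rules: pool per-imputation estimates and variances into a single
# estimate, total variance, and t-based confidence interval.
import numpy as np
from scipy import stats

def rubin_combine(estimates, variances, alpha=0.05):
    """estimates, variances: length-m arrays of per-imputation results."""
    est = np.asarray(estimates, float)
    var = np.asarray(variances, float)
    m = len(est)
    qbar = est.mean()
    ubar = var.mean()                       # within-imputation variance
    b = est.var(ddof=1)                     # between-imputation variance
    t = ubar + (1 + 1 / m) * b              # total variance
    nu = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2  # df (Rubin, 1987)
    half = stats.t.ppf(1 - alpha / 2, nu) * np.sqrt(t)
    return qbar, t, (qbar - half, qbar + half)

# e.g. five imputations of 5-year net survival for one stage (made-up numbers)
qbar, total_var, ci = rubin_combine([0.52, 0.55, 0.50, 0.54, 0.53],
                                    [0.0004, 0.0005, 0.0004, 0.0005, 0.0004])
print(qbar, ci)
```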

6.
Environmental DNA (eDNA) sampling is prone to both false-positive and false-negative errors. We review statistical methods to account for such errors in the analysis of eDNA data and use simulations to compare the performance of different modelling approaches. Our simulations illustrate that even low false-positive rates can produce biased estimates of occupancy and detectability. We further show that removing or classifying single PCR detections in an ad hoc manner under the suspicion that such records represent false positives, as sometimes advocated in the eDNA literature, also results in biased estimation of occupancy, detectability and false-positive rates. We advocate alternative approaches to account for false-positive errors that rely on prior information, or the collection of ancillary detection data at a subset of sites using a sampling method that is not prone to false-positive errors. We illustrate the advantages of these approaches over ad hoc classifications of detections and provide practical advice and code for fitting these models in maximum likelihood and Bayesian frameworks. Given the severe bias induced by false-negative and false-positive errors, the methods presented here should be more routinely adopted in eDNA studies.
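As a minimal illustration of the model-based alternative to ad hoc classification, here is a sketch of a site-occupancy likelihood with a false-positive detection rate, in the spirit of Royle and Link (2006), fitted by maximum likelihood with scipy. It is not the exact set of models compared in the paper, and the constraint p11 > p10 resolves the label-switching ambiguity.

```python
# Mixture likelihood per site: occupied (prob psi) with true-positive rate
# p11, or unoccupied with false-positive rate p10, over K PCR replicates.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit
from scipy.stats import binom

def negloglik(theta, y, K):
    psi, p11, p10 = expit(theta)         # occupancy, true +, false + rates
    lik = psi * binom.pmf(y, K, p11) + (1 - psi) * binom.pmf(y, K, p10)
    return -np.sum(np.log(lik + 1e-300))

# simulate: 300 sites, 8 PCR replicates each
rng = np.random.default_rng(3)
psi, p11, p10, K = 0.4, 0.6, 0.03, 8
z = rng.binomial(1, psi, 300)                       # latent occupancy
y = rng.binomial(K, np.where(z == 1, p11, p10))     # positive replicates

fit = minimize(negloglik, x0=np.zeros(3), args=(y, K), method="Nelder-Mead")
psi_hat, p11_hat, p10_hat = expit(fit.x)
if p11_hat < p10_hat:                   # enforce the p11 > p10 labelling
    p11_hat, p10_hat, psi_hat = p10_hat, p11_hat, 1 - psi_hat
print(psi_hat, p11_hat, p10_hat)
```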

7.
To test for association between a disease and a set of linked markers, or to estimate relative risks of disease, several different methods have been developed. Many methods for family data require that individuals be genotyped at the full set of markers and that phase can be reconstructed, and individuals with missing data are excluded from the analysis. This can result in an important decrease in sample size and a loss of information. A possible solution to this problem is to use missing-data likelihood methods. We propose an alternative approach, namely the use of multiple imputation. Briefly, this method consists of estimating from the available data all possible phased genotypes and their respective posterior probabilities; these posterior probabilities are then used to generate replicate imputed data sets via a data augmentation algorithm. We performed simulations to test the efficiency of this approach for case/parent trio data and found that the multiple imputation procedure generally gave unbiased parameter estimates with correct type I error and confidence interval coverage. Multiple imputation had some advantages over missing-data likelihood methods with regard to ease of use and model flexibility. Multiple imputation methods represent promising tools in the search for disease susceptibility variants.
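A toy sketch of the core imputation step: phased genotype configurations compatible with an individual's observed genotypes are sampled in proportion to their posterior probabilities, each draw completing one replicate dataset. The configuration labels and probabilities below are invented, not output of a real data-augmentation run.

```python
# Sample phased configurations by posterior probability to build m
# completed datasets for downstream association analysis.
import numpy as np

rng = np.random.default_rng(11)

# for one trio: candidate phased configurations and posterior probabilities
configs = ["A-B|a-b", "A-b|a-B", "a-B|A-b"]
post = np.array([0.70, 0.25, 0.05])

m = 10                                    # number of imputed datasets
imputations = rng.choice(configs, size=m, p=post)
print(imputations)   # each draw completes one replicate dataset
```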

8.
Zhiguo Li, Peter Gilbert, and Bin Nan. Biometrics 2008, 64(4):1247–1255
Summary: Grouped failure time data arise often in HIV studies. In a recent preventive HIV vaccine efficacy trial, immune responses generated by the vaccine were measured from a case–cohort sample of vaccine recipients, who were subsequently evaluated for the study endpoint of HIV infection at prespecified follow-up visits. Gilbert et al. (2005, Journal of Infectious Diseases 191, 666–677) and Forthal et al. (2007, Journal of Immunology 178, 6596–6603) analyzed the association between the immune responses and HIV incidence with a Cox proportional hazards model, treating the HIV infection diagnosis time as a right-censored random variable. The data, however, take the form of grouped failure time data with case–cohort covariate sampling, and we propose an inverse selection probability-weighted likelihood method for fitting the Cox model to these data. The method allows covariates to be time dependent, and uses multiple imputation to accommodate covariate data that are missing at random. We establish asymptotic properties of the proposed estimators and present simulation results showing their good finite-sample performance. We apply the method to the HIV vaccine trial data, showing that higher antibody levels are associated with a lower hazard of HIV infection.
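A hedged sketch of the inverse-selection-probability-weighting idea using lifelines' per-row weights; it treats time as continuous and omits the grouped-failure-time likelihood and the MI step, so it is a crude stand-in for the proposed method. All names are illustrative.

```python
# Case-cohort design: covariates known for all cases plus a random subcohort;
# rows are weighted by inverse selection probabilities in the Cox fit.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(5)
n = 4000
x = rng.normal(size=n)
t = rng.exponential(1 / np.exp(0.5 * x))          # true log-HR = 0.5
c = rng.exponential(2.0, size=n)
df = pd.DataFrame({"time": np.minimum(t, c),
                   "event": (t <= c).astype(int), "x": x})

# covariate measured for all cases + a 20% random subcohort
p_sel = np.where(df["event"] == 1, 1.0, 0.2)
sel = rng.random(n) < p_sel
cc = df[sel].copy()
cc["w"] = 1.0 / p_sel[sel]                        # inverse selection prob.

fit = CoxPHFitter().fit(cc, duration_col="time", event_col="event",
                        weights_col="w", robust=True)
fit.print_summary()
```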

9.
Longitudinal data often encounter missingness with monotone and/or intermittent missing patterns. Multiple imputation (MI) has been popularly employed for the analysis of missing longitudinal data. In particular, the MI-GEE method has been proposed for inference with generalized estimating equations (GEE) when missing data are imputed via MI. However, little is known about how to perform model selection with multiply imputed longitudinal data. In this work, we extend the existing GEE model selection criteria, including the "quasi-likelihood under the independence model criterion" (QIC) and the "missing longitudinal information criterion" (MLIC), to accommodate multiply imputed datasets for selection of the MI-GEE mean model. Based on real data analyses from a schizophrenia study and an AIDS study, as well as simulations under nonmonotone missingness with a moderate proportion of missing observations, we conclude that: (i) more than a few imputed datasets are required for stable and reliable model selection in MI-GEE analysis; (ii) the MI-based GEE model selection methods with a suitable number of imputations generally perform well, while naive application of existing model selection methods that simply ignores missing observations can perform very poorly; (iii) model selection criteria based on improper (frequentist) multiple imputation generally perform better than their analogues based on proper (Bayesian) multiple imputation.
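A sketch of one simple way to carry a GEE selection criterion across imputations: fit the candidate mean model on each completed dataset and average the QIC. It assumes a statsmodels version whose GEE results expose a qic() method; the paper's MI-adapted QIC/MLIC criteria are more refined than this plain average.

```python
# Average QIC of a candidate GEE mean model over m imputed datasets.
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

def mi_qic(imputed_dfs, formula, groups):
    qics = []
    for d in imputed_dfs:
        res = smf.gee(formula, groups, d,
                      family=sm.families.Binomial(),
                      cov_struct=sm.cov_struct.Exchangeable()).fit()
        qics.append(res.qic()[0])   # assumed: qic() returns (QIC, QICu)
    return np.mean(qics)

# usage sketch: choose the mean model with the smallest averaged QIC
# best = min(candidate_formulas, key=lambda f: mi_qic(imps, f, "subject"))
```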

10.
The theory of competing risks has been developed to assess a specific risk in the presence of other risk factors. In this paper we consider the parametric estimation of different failure modes under partially complete time and type-of-failure data, using latent failure time and cause-specific hazard function models. Uniformly minimum variance unbiased estimators and maximum likelihood estimators are obtained when the latent failure times and cause-specific hazard functions are exponentially distributed; we also consider the case where they follow Weibull distributions. One data set is used to illustrate the proposed techniques.
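For the exponential case with fully observed failure types, the MLE has a closed form: each cause-specific hazard is estimated by the number of failures of that cause divided by the total time at risk. The worked example below illustrates this baseline; the paper's contribution, handling partially missing failure types, is not reproduced here.

```python
# Closed-form MLE for exponential cause-specific hazards with complete data:
# lambda_j_hat = (failures of cause j) / (total observed time).
import numpy as np

rng = np.random.default_rng(2)
lam = np.array([0.2, 0.5])                 # true cause-specific hazards
n = 10_000
t1 = rng.exponential(1 / lam[0], n)        # latent failure times, cause 0
t2 = rng.exponential(1 / lam[1], n)        # latent failure times, cause 1
time = np.minimum(t1, t2)                  # observed failure time
cause = np.where(t1 <= t2, 0, 1)           # observed failure cause

total_time = time.sum()
lam_hat = np.array([(cause == j).sum() / total_time for j in (0, 1)])
print(lam_hat)                             # close to [0.2, 0.5]
```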

11.
Individuals may experience more than one type of recurrent event, as well as a terminal event, during the course of a disease. Follow-up may be interrupted for several reasons, including the end of a study or patients lost to follow-up, which are noninformative censoring events. Death can also stop the follow-up and is therefore considered a dependent terminal event. We propose a multivariate frailty model that jointly analyzes two types of recurrent events with a dependent terminal event. Two estimation methods are proposed: a semiparametric approach using penalized likelihood estimation, in which the baseline hazard functions are approximated by M-splines, and another with piecewise-constant baseline hazard functions. Finally, we derive martingale residuals to check goodness-of-fit. We illustrate our proposals with a real dataset on breast cancer, where the main objective was to model the dependence between the two types of recurrent events (locoregional and metastatic) and the terminal event (death) after breast cancer.
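A toy simulation of the data structure the model targets: two recurrent event processes and a terminal event share a gamma frailty, so high-frailty subjects accumulate more recurrences and die earlier. Parameter names and values are illustrative; this is not the estimation procedure.

```python
# Simulate one subject: shared gamma frailty multiplies the rates of two
# recurrent event processes and the death hazard, inducing dependence.
import numpy as np

rng = np.random.default_rng(4)

def simulate_subject(r1=0.5, r2=0.3, d=0.1, theta=1.0, tau=10.0):
    u = rng.gamma(1 / theta, theta)        # frailty: mean 1, variance theta
    t_death = min(rng.exponential(1 / (d * u)), tau)
    events = {}
    for name, rate in (("locoregional", r1), ("metastatic", r2)):
        times, t = [], 0.0
        while True:                         # Poisson process given frailty
            t += rng.exponential(1 / (rate * u))
            if t >= t_death:
                break
            times.append(round(t, 2))
        events[name] = times
    return u, t_death, events

print(simulate_subject())
```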

12.
Summary: In estimation of the ROC curve, when the true disease status is subject to nonignorable missingness, the observed likelihood involves the missing-data mechanism given by a selection model. In this article, we propose a likelihood-based approach to estimate the ROC curve and the area under the ROC curve when the verification bias is nonignorable. We specify a parametric disease model in order to make the nonignorable selection model identifiable. With the estimated verification and disease probabilities, we construct four types of empirical estimates of the ROC curve and its area, based on imputation and reweighting methods. In practice, a reasonably large sample size is required to estimate the nonignorable selection model in our settings. Simulation studies showed that all four estimators of the ROC area performed well, and the imputation estimators were generally more efficient than the other estimators proposed. We applied the proposed method to a data set from research in Alzheimer's disease.
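As a sketch of one of the four estimator types, here is a reweighting (inverse verification probability) estimate of the ROC area, assuming the verification probabilities have already been estimated from the selection model. The data and probabilities are simulated for illustration.

```python
# Weighted Mann-Whitney estimate of AUC computed on verified subjects only,
# with weights equal to inverse verification probabilities.
import numpy as np

rng = np.random.default_rng(6)
n = 5000
d = rng.binomial(1, 0.3, n)                      # true disease status
score = rng.normal(d * 1.0, 1.0)                 # diagnostic test score
p_verify = np.clip(0.2 + 0.6 * (score > 0), 0.05, 1.0)  # verification prob.
v = rng.random(n) < p_verify                     # who gets verified

w = 1.0 / p_verify
cases = v & (d == 1)
ctrls = v & (d == 0)
s1, w1 = score[cases], w[cases]
s0, w0 = score[ctrls], w[ctrls]
cmp = (s1[:, None] > s0[None, :]) + 0.5 * (s1[:, None] == s0[None, :])
auc_ipw = np.sum(w1[:, None] * w0[None, :] * cmp) / (w1.sum() * w0.sum())
print(auc_ipw)
```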

13.
14.
For confidentiality reasons, US federal death certificate data are incomplete with regard to the dates of birth and death of the decedents, making calculation of a decedent's total lifetime impossible and thus estimation of mortality incidence difficult. This paper proposes the use of natality data and an imputation-based method to estimate age-specific mortality incidence rates in the face of this missing information. By utilizing previously determined probabilities of birth, a birth date and death date are imputed for every decedent in the dataset; thus the birth cohort of each individual is imputed and the total on-study time can be calculated. This idea is implemented in two approaches for estimating mortality incidence rates: the first extends a person-time approach, while the second extends a life table approach. Monte Carlo simulations showed that both approaches perform well in comparison with the ideal complete-data methods, but that the person-time method is preferred. An application to Tay-Sachs disease is demonstrated. It is concluded that the proposed imputation methods provide valid estimates of the incidence of death from death certificate data, without the need for the additional assumptions under which usual mortality rates provide valid estimates.
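A toy sketch of the imputation idea: draw a birth month from natality-based probabilities and combine it with the death record to impute a birth date, hence a birth cohort and on-study time. The probability table, the year arithmetic (which ignores whether the birthday preceded death), and all names are simplifications.

```python
# Impute a birth date for a decedent from month-of-birth probabilities.
import numpy as np

rng = np.random.default_rng(8)
# P(birth month) from natality data (illustrative, slightly non-uniform)
p_month = np.array([0.080, 0.075, 0.083, 0.081, 0.084, 0.083,
                    0.088, 0.089, 0.087, 0.086, 0.082, 0.082])
p_month = p_month / p_month.sum()

def impute_birth(death_year, age_years):
    """Impute birth month/day; birth year crudely set from age at death."""
    month = rng.choice(12, p=p_month) + 1
    day = rng.integers(1, 29)       # crude: sidesteps month-length issues
    return death_year - age_years, month, day

# one decedent: died in 2001 aged 4 -> imputed birth date and cohort
print(impute_birth(2001, 4))
```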

15.
Summary: The power bias model, a generalization of length-biased sampling, is introduced and investigated in detail, with attention focused on order-restricted inference. We show that the power bias model is an example of the density ratio model; in other words, it is a semiparametric model specified by assuming that the ratio of several unknown probability density functions has a parametric form. Estimation and testing procedures under constraints are developed in detail. It is shown that the power bias model can be used for testing for, or against, the likelihood ratio ordering among multiple populations without resorting to any parametric assumptions. Examples and a real data analysis demonstrate the usefulness of this approach.
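A hedged reconstruction, in illustrative notation, of why the power bias model is a density ratio model: weighting a reference density f by the power x^{theta} makes the log density ratio linear in log x.

```latex
\[
  g_i(x) \;=\; \frac{x^{\theta_i}\, f(x)}{\mu_i},
  \qquad
  \mu_i \;=\; \int t^{\theta_i} f(t)\, dt,
\]
\[
  \log \frac{g_i(x)}{f(x)} \;=\; -\log \mu_i \;+\; \theta_i \log x .
\]
```

Thus the ratio of each biased density to the reference has the parametric (here log-linear) form required by the density ratio model, with length-biased sampling recovered as the special case theta_i = 1.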

16.
Summary: Logistic regression is an important statistical procedure used in many disciplines. The standard software packages for data analysis are generally equipped with this procedure, where the maximum likelihood estimates of the regression coefficients are obtained iteratively. It is well known that the estimates from analyses of small- or medium-sized samples are biased. Also, in finding such estimates, a separation is often encountered, in which the likelihood converges but at least one of the parameter estimates diverges to infinity. Standard approaches to finding such estimates do not address these problems. Moreover, missingness in the covariates adds an extra layer of complexity to the whole process. In this article, we address these three practical issues (bias, separation, and missing covariates) by means of simple adjustments. We have applied the proposed technique to real and simulated data. The proposed method always finds a solution, and the estimates are less biased. A SAS macro that implements the proposed method can be obtained from the authors.
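A hedged sketch of one standard simple adjustment for both bias and separation, Firth's (1993) penalized likelihood, implemented generically with a modified Newton iteration. This is not the authors' SAS macro, and it does not address missing covariates.

```python
# Firth-type bias-reduced logistic regression: the score is modified with
# hat values, which also keeps estimates finite under separation.
import numpy as np
from scipy.special import expit

def firth_logit(X, y, n_iter=50, tol=1e-8):
    """X: (n, p) design matrix including an intercept column; y: 0/1."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = expit(X @ beta)
        w = mu * (1 - mu)
        info = (X.T * w) @ X                   # Fisher information
        Xw = X * np.sqrt(w)[:, None]
        h = np.einsum("ij,ij->i", Xw @ np.linalg.inv(info), Xw)  # hat values
        score = X.T @ (y - mu + h * (0.5 - mu))  # Firth-modified score
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# completely separated toy data: ordinary ML diverges, Firth stays finite
X = np.column_stack([np.ones(8), [1, 2, 3, 4, 5, 6, 7, 8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(firth_logit(X, y))
```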

17.
Chen HY, Xie H, Qian Y. Biometrics 2011, 67(3):799–809
Multiple imputation is a practically useful approach to handling incompletely observed data in statistical analysis. Parameter estimation and inference based on imputed full data have been made easy by Rubin's rule for combining results. However, creating proper imputations that accommodate flexible models for statistical analysis in practice can be very challenging. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed imputation framework is more flexible and robust than the imputation approach based on the normal model and, in contrast to the approach based on fully conditionally specified models, it is compatible: its conditional models are mutually consistent with a joint distribution. The proposed algorithms for multiple imputation through Markov chain Monte Carlo sampling can be carried out straightforwardly. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data.
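A much-simplified sketch of imputation from conditional models: a missing binary covariate is redrawn from an assumed conditional model given another variable. In the paper, the conditionals are semiparametric odds ratio models cycled within an MCMC sampler; here a single known logistic conditional stands in for that machinery.

```python
# Draw missing binary x from an assumed conditional model P(x = 1 | z).
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(9)

def draw_missing_x(z_missing, beta=(0.2, 0.8)):
    """Sample missing x entries from the assumed conditional P(x=1 | z)."""
    return rng.binomial(1, expit(beta[0] + beta[1] * z_missing))

z = rng.normal(size=100)
x = rng.binomial(1, expit(0.2 + 0.8 * z)).astype(float)
missing = rng.random(100) < 0.3
x[missing] = draw_missing_x(z[missing])    # one completed dataset
# with several incomplete variables, such draws are cycled (Gibbs-style)
# within a Markov chain and repeated m times for multiple imputation
```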

18.
Latent class regression (LCR) is a popular method for analyzing multiple categorical outcomes. While nonresponse to the manifest items is a common complication, inferences from LCR can be evaluated using maximum likelihood, multiple imputation, and two-stage multiple imputation. Under similar missing-data assumptions, the estimates and variances from all three procedures are quite close. However, multiple imputation and two-stage multiple imputation can provide additional information: estimates of the rates of missing information. The methodology is illustrated using an example from a study on racial and ethnic disparities in breast cancer severity.
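A sketch of the extra output MI provides here: an estimate of the rate of missing information for a parameter, computed from Rubin's between- and within-imputation variance decomposition. Inputs are illustrative.

```python
# Rate of missing information (Rubin, 1987) from per-imputation results.
import numpy as np

def missing_info_rate(estimates, variances):
    est = np.asarray(estimates, float)
    var = np.asarray(variances, float)
    m = len(est)
    b = est.var(ddof=1)                    # between-imputation variance
    ubar = var.mean()                      # within-imputation variance
    r = (1 + 1 / m) * b / ubar             # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2        # degrees of freedom
    return (r + 2 / (nu + 3)) / (r + 1)

print(missing_info_rate([1.02, 0.95, 1.10, 0.99, 1.05],
                        [0.04, 0.05, 0.04, 0.05, 0.04]))
```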

19.
Fully Bayesian methods for Cox models specify a model for the baseline hazard function. Parametric approaches generally provide monotone estimates, while semiparametric choices allow for more flexible patterns but can suffer from overfitting and instability. Regularization through prior distributions with correlated structures usually gives reasonable answers in these situations. We discuss Bayesian regularization for Cox survival models defined via flexible baseline hazards specified by a mixture of piecewise constant functions and by a cubic B-spline function. For these "semiparametric" proposals, different prior scenarios, ranging from prior independence to particular correlated structures, are examined in a real study with microvirulence data and in an extensive simulation that includes different data sample and time-axis partition sizes in order to capture risk variations. The posterior distribution of the parameters was approximated using Markov chain Monte Carlo methods, and model selection was performed in accordance with the deviance information criterion and the log pseudo-marginal likelihood. The results reveal that, in general, Cox models are very robust in covariate effects and survival estimates regardless of the baseline hazard specification. Regarding the "semiparametric" baseline hazard specifications, the B-spline hazard function is less dependent on the regularization process than the piecewise specification, because it demands a smaller time-axis partition to estimate similar risk behaviour.
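A hedged sketch of the regularization idea for the piecewise-constant specification: a first-order Gaussian random-walk prior on the log hazard levels penalizes jumps between adjacent intervals. For brevity it computes a MAP estimate with scipy instead of running MCMC, and it ignores covariates; the interval summaries are invented.

```python
# Piecewise-exponential log-likelihood plus a random-walk prior penalty on
# adjacent log-hazard levels; MAP estimate via scipy.optimize.
import numpy as np
from scipy.optimize import minimize

def neg_log_post(log_h, d, t_at_risk, tau2=0.5):
    """log_h: log hazard per interval; d: events per interval;
    t_at_risk: total exposure per interval; tau2: prior RW variance."""
    loglik = np.sum(d * log_h - np.exp(log_h) * t_at_risk)
    rw_penalty = np.sum(np.diff(log_h) ** 2) / (2 * tau2)
    return -(loglik - rw_penalty)

# illustrative interval summaries (events, exposure) over 6 intervals
d = np.array([5, 8, 12, 9, 4, 2])
t_at_risk = np.array([100.0, 95.0, 85.0, 70.0, 50.0, 30.0])
fit = minimize(neg_log_post, x0=np.log(d / t_at_risk), args=(d, t_at_risk))
print(np.exp(fit.x))                 # smoothed piecewise hazard levels
```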

20.
In longitudinal randomised trials and observational studies within a medical context, a composite outcome, which is a function of several individual patient-specific outcomes, may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcomes, due to patient drop-out or other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has seldom been discussed. Whilst standard multiple imputation methodology can be applied directly to the composite outcome, the distribution of a composite outcome may have a complicated form and may not be amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of the composite outcome, considering two imputation approaches. One approach models each component of the composite outcome using standard likelihood-based models; the other uses linear increments methods. A linear increments approach can provide an appealing alternative, as its assumptions concerning both the missingness structure within the data and the imputation models differ from those of the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that the two approaches are comparable and that, for each, separate imputation offers some improvement over direct imputation of the composite outcome.
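A sketch contrasting the two strategies on a toy composite (the mean of three component scores), using scikit-learn's IterativeImputer as a generic single-imputation engine, a stand-in for the paper's likelihood-based and linear-increments multiple-imputation approaches. All names and the missingness mechanism are illustrative.

```python
# Compare (1) imputing the composite directly with (2) imputing each
# component and recomputing the composite, both from a fully observed x.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(10)
n = 1000
x = rng.normal(size=n)                       # fully observed covariate
comps = 0.8 * x[:, None] + rng.normal(size=(n, 3))  # three components
composite = comps.mean(axis=1)
miss = rng.random(n) < 0.3                   # drop-out: components missing

# strategy 1: impute the composite outcome directly from x
direct = np.column_stack([np.where(miss, np.nan, composite), x])
direct_imp = IterativeImputer(random_state=0).fit_transform(direct)[:, 0]

# strategy 2: impute each component from x, then recompute the composite
c = comps.copy()
c[miss, :] = np.nan
sep = IterativeImputer(random_state=0).fit_transform(np.column_stack([c, x]))
separate_imp = sep[:, :3].mean(axis=1)

print(composite.mean(), direct_imp.mean(), separate_imp.mean())
```

In real data the components are often only partially missing for a given subject, which is where separate imputation can recover information that direct imputation of the composite discards.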
