Similar articles
20 similar articles found.
1.
Reiter  Jerome P. 《Biometrika》2007,94(2):502-508
When performing multi-component significance tests with multiply-imputed datasets, analysts can use a Wald-like test statistic and a reference F-distribution. The currently employed degrees of freedom in the denominator of this F-distribution are derived assuming an infinite sample size. For modest complete-data sample sizes, this degrees of freedom can be unrealistic; for example, it may exceed the complete-data degrees of freedom. This paper presents an alternative denominator degrees of freedom that is always less than or equal to the complete-data denominator degrees of freedom, and equals the currently employed denominator degrees of freedom for infinite sample sizes. Its advantages over the currently employed degrees of freedom are illustrated with a simulation.

2.
Chen B  Zhou XH 《Biometrics》2011,67(3):830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.

3.
4.
5.
Background: Cancer stage can be missing in national cancer registry records. We explored whether missing prostate cancer stage can be imputed using specific clinical assumptions. Methods: Prostate cancer patients diagnosed between 2010 and 2013 were identified in English cancer registry data and linked to administrative hospital and mortality data (n = 139,807). Missing staging items were imputed based on specific assumptions: men with recorded N-stage but missing M-stage have no distant metastases (M0); low/intermediate-risk men with missing N- and/or M-stage have no nodal disease (N0) or metastases; and high-risk men with missing M-stage have no metastases. We tested these clinical assumptions by comparing 4-year survival in men with the same recorded and imputed cancer stage. Multi-variable Cox regression was used to test the validity of the clinical assumptions and multiple imputation. Results: Survival was similar for men with recorded N-stage but missing M-stage and corresponding men with M0 (89.5% vs 89.6%); for low/intermediate-risk men with missing M-stage and corresponding men with M0 (92.0% vs 93.1%); and for low/intermediate-risk men with missing N-stage and corresponding men with N0 (90.9% vs 93.7%). However, survival was different for high-risk men with missing M-stage and corresponding men with M0. Imputation based on clinical assumptions performs as well as statistical multiple imputation. Conclusion: Specific clinical assumptions can be used to impute missing information on nodal involvement and distant metastases in some patients with prostate cancer.
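The three imputation rules listed in the Methods of this abstract can be sketched as a small function. This is only an illustration of the rules as the abstract words them, not the authors' code: the function name `impute_stage`, the string coding of risk groups (`"low"`, `"intermediate"`, `"high"`), the use of `None` for a missing item, and the `high_risk_rule` switch are all assumptions made for this sketch.

```python
from typing import Optional, Tuple


def impute_stage(
    n_stage: Optional[str],
    m_stage: Optional[str],
    risk: str,
    high_risk_rule: bool = True,
) -> Tuple[Optional[str], Optional[str]]:
    """Apply the abstract's clinical assumptions to one record.

    None marks a missing staging item. The risk coding and the
    high_risk_rule switch are hypothetical, chosen for illustration.
    """
    n, m = n_stage, m_stage

    # Rule 1: recorded N-stage but missing M-stage -> no distant metastases.
    if n is not None and m is None:
        m = "M0"

    # Rule 2: low/intermediate risk with missing N and/or M -> N0 and/or M0.
    if risk in ("low", "intermediate"):
        if n is None:
            n = "N0"
        if m is None:
            m = "M0"

    # Rule 3: high risk with missing M-stage -> no metastases (optional).
    if high_risk_rule and risk == "high" and m is None:
        m = "M0"

    return n, m
```

The `high_risk_rule` switch is included because the Results report that survival differed for high-risk men with missing M-stage, so a reader may prefer to leave those records unimputed; for example, `impute_stage(None, None, "high", high_risk_rule=False)` leaves both items missing.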

6.
In this paper, we consider mean comparisons for paired samples in which a certain portion of the observations are missing. This type of data commonly arises in medical research where the outcomes are assessed at two time points after the application of treatments. New methods for statistical inference are proposed by making a finiteness correction based on asymptotic expansions of some intuitive statistics. The comparison methods naturally extend to the two‐group case after some suitable manipulations. A simulation study is carried out to demonstrate the numerical accuracy of the proposed methods. Data from a smoking‐cessation trial are used to illustrate the application of the methods.

7.
8.
Summary. Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered as a special type of measurement error, for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately given that statistical modeling for them is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

9.
Lee SM  Gee MJ  Hsieh SH 《Biometrics》2011,67(3):788-798
Summary. We consider the estimation problem of a proportional odds model with missing covariates. Based on the validation and nonvalidation data sets, we propose a joint conditional method that is an extension of Wang et al. (2002, Statistica Sinica 12, 555–574). The proposed method is semiparametric since it requires neither an additional model for the missingness mechanism, nor the specification of the conditional distribution of missing covariates given observed variables. Under the assumption that the observed covariates and the surrogate variable are categorical, we derived the large sample property. The simulation studies show that in various situations, the joint conditional method is more efficient than the conditional estimation method and weighted method. We also use a real data set that came from a survey of cable TV satisfaction to illustrate the approaches.

10.
11.
AZZALINI  A. 《Biometrika》1994,81(4):767-775

12.
Background: Treatment by immune checkpoint blockade (ICB) provides a remarkable survival benefit for multiple cancer types. However, disease aggravation occurs in a proportion of patients after the first couple of treatment cycles. Methods: RNA sequencing data were retrospectively collected. Six tumour-immune related features were extracted and combined to build a lung cancer-specific predictive model to distinguish responses as progressive disease (PD) or non-PD. This model was trained on three public pan-cancer datasets and a lung cancer cohort from our institute, and generated a lung cancer-specific integrated gene expression score, which we call LITES. It was finally tested in another lung cancer dataset. Results: LITES is a promising predictor for checkpoint blockade (area under the curve [AUC]=0.86), superior to traditional biomarkers. It is independent of PD-L1 expression and tumour mutation burden. The sensitivity and specificity of LITES were 85.7% and 70.6%, respectively. Progression-free survival (PFS) was longer in the high-score group than in the low-score group (median PFS: 6.0 vs. 2.4 months, hazard ratio=0.45, P=0.01). The mean AUC of the six features was 0.70 (range=0.61-0.75), lower than that of LITES, indicating that the combination of features had synergistic effects. Among the genes identified in the features, patients with high expression of NRAS and PDPK1 tended to have a PD response (P=0.001 and 0.01, respectively). Our model also functioned well for patients with advanced melanoma and was specific for ICB therapy. Conclusions: LITES is a promising biomarker for predicting an impaired response in lung cancer patients and for clarifying the biological mechanism underlying ICB therapy.

13.
Background: Diagnostic timeliness in cancer patients is important for clinical outcomes and patient satisfaction but, to date, continuous monitoring of diagnostic intervals in nationwide incident cohorts has been impossible in England. Methods: We developed a new methodology for measuring the secondary care diagnostic interval (SCDI - first relevant secondary care contact to diagnosis) using linked cancer registration and healthcare utilisation data. Using this method, we subsequently examined diagnostic timeliness in colorectal and lung cancer patients (2014–15) by socio-demographic characteristics, diagnostic route and stage at diagnosis. Results: The approach assigned SCDIs to 94.4% of all incident colorectal cancer cases [median length (90th centile) of 25 (104) days] and 95.3% of lung cancer cases [36 (144) days]. Advanced stage patients had shorter intervals (median, colorectal: stage 1 vs 4 - 34 vs 19 days; lung stage 1&2 vs 3B&4 - 70 vs 27 days). Routinely referred patients had the longest (colorectal: 61, lung: 69 days) and emergency presenters the shortest intervals (colorectal: 3, lung: 14 days). Comorbidities and additional diagnostic tests were also associated with longer intervals. Conclusion: This new method can enable repeatable nationwide measurement of cancer diagnostic timeliness in England and identifies actionable variation to inform early diagnosis interventions and target future research.

14.
The receiver operating characteristic (ROC) curve is commonly used to evaluate and compare the accuracy of classification methods or markers. Estimating ROC curves has been an important problem in various fields including biometric recognition and diagnostic medicine. In real applications, classification markers are often developed under two or more ordered conditions, such that a natural stochastic ordering exists among the observations. Incorporating such a stochastic ordering into estimation can improve statistical efficiency (Davidov and Herman, 2012). In addition, clustered and correlated data arise when multiple measurements are gleaned from the same subject, making estimation of ROC curves complicated due to within-cluster correlations. In this article, we propose to model the ROC curve using a weighted empirical process to jointly account for the order constraint and within-cluster correlation structure. The algebraic properties of the resulting summary statistics of the ROC curve, such as its area and partial area, are also studied. The algebraic expressions reduce to the ones by Davidov and Herman (2012) for independent observations. We derive asymptotic properties of the proposed order-restricted estimators and show that they have smaller mean-squared errors than the existing estimators. Simulation studies also demonstrate better performance of the newly proposed estimators over existing methods for finite samples. The proposed method is further exemplified with the fingerprint matching data from the National Institute of Standards and Technology Special Database 4.
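For orientation, the area under the ROC curve mentioned in this abstract has a simple empirical form when observations are independent and no order constraint is imposed. The sketch below computes that plain, unweighted estimate (the Mann-Whitney form); it is not the order-restricted, cluster-weighted estimator the article proposes. The function name `empirical_auc` and the convention that larger marker values indicate the positive class are assumptions made for this illustration.

```python
from typing import Sequence


def empirical_auc(controls: Sequence[float], cases: Sequence[float]) -> float:
    """Unweighted empirical AUC: P(case > control) + 0.5 * P(case == control).

    Assumes independent observations and that larger values point
    towards the positive (case) class.
    """
    if len(controls) == 0 or len(cases) == 0:
        raise ValueError("both groups must contain at least one observation")

    score = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                score += 1.0   # case ranked above control
            elif y == x:
                score += 0.5   # ties count as half
    return score / (len(controls) * len(cases))
```

With perfectly separated groups, such as `empirical_auc([1, 2, 3], [4, 5, 6])`, every case outranks every control and the estimate is 1.0.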

15.
Receiver operating characteristic (ROC) analysis is widely used to assess the ability of diagnostic markers to classify subjects correctly into one of two disease classes. ROC surfaces and umbrella surfaces generalize the utility of ROC analysis when there are three disease classes. Identification of lung cancer diagnostic markers is an active area of research since prognosis for those diagnosed with lung cancer is so poor and there is no accurate method for early detection of lung cancer. A study conducted for the assessment of DNA methylation markers motivated the comparison of ROC umbrella surfaces, which is developed in this article using U-statistics and bootstrap methodology.

16.
We develop hierarchical models for spatial multinomial data with missing categories, to analyse a database of HLA-A and -B gene and haplotype frequencies from Papua New Guinea, with a highly variable number of samples per spatial unit. The spatial structure of the multinomial data is incorporated by adopting conditional autoregressive (CAR) priors for the random effects, reflecting extra-multinomial variation. Different spatial structures are investigated, and covariate effects are evaluated using a novel model selection criterion. Tables and maps reveal strong spatial association and the importance of altitude, a covariate anticipated to be significant in explaining genetic variation. Our approach can be used in identifying associations with environmental factors, linguistic or epidemiological patterns and hence potential causes of genetic diversity (population movements, natural selection, stochastic effects).

17.
Predictive models based on radiomics and machine learning (ML) need large, annotated datasets for training, which are often difficult to collect. We designed an operative pipeline for model training to exploit data already available to the scientific community. The aim of this work was to explore the capability of radiomic features in predicting tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-validation (CV) strategies for evaluating the performance of ML predictive models on modestly sized datasets. We carried out two classification tasks: histology classification (three classes) and overall stage classification (two classes: stage I and II). In the first task, the best performance was obtained by a Random Forest classifier, once the analysis was restricted to stage I and II tumors of the merged Lung1 and L-RT dataset (AUC = 0.72 ± 0.11). For the overall stage classification, the best results were obtained when training on Lung1 and testing on the L-RT dataset (AUC = 0.72 ± 0.04 for Random Forest and AUC = 0.84 ± 0.03 for linear-kernel Support Vector Machine). Depending on the classification task to be accomplished and on the heterogeneity of the available dataset(s), different CV strategies have to be explored and compared to make a robust assessment of the potential of a predictive model based on radiomics and ML.

18.
19.
20.
In an experiment to understand colon carcinogenesis, all animals were exposed to a carcinogen, with half the animals also being exposed to radiation. Spatially, we measured the existence of what are referred to as aberrant crypt foci (ACF), namely, morphologically changed colonic crypts that are known to be precursors of colon cancer development. The biological question of interest is whether the locations of these ACFs are spatially correlated: if so, this indicates that damage to the colon due to carcinogens and radiation is localized. Statistically, the data take the form of binary outcomes (corresponding to the existence of an ACF) on a regular grid. We develop score-type methods based upon the Matérn and conditionally autoregressive (CAR) correlation models to test for the spatial correlation in such data, while allowing for nonstationarity. Because of a technical peculiarity of the score-type test, we also develop robust versions of the method. The methods are compared to a generalization of Moran's test for continuous outcomes, and are shown via simulation to have the potential for increased power. When applied to our data, the methods indicate the existence of spatial correlation, and hence indicate localization of damage.
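As background for the comparison mentioned in this abstract, the classical Moran's I statistic can be computed directly on a regular grid of binary outcomes. The sketch below is the textbook statistic with rook (up, down, left, right) neighbour weights; it is neither the authors' score-type test nor their generalization of Moran's test. The function name `morans_i` and the choice of rook adjacency are assumptions made for this illustration.

```python
from typing import List


def morans_i(grid: List[List[int]]) -> float:
    """Classical Moran's I on a rectangular grid with rook adjacency.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where w_ij = 1 for cells sharing an edge and W is the sum of all w_ij.
    """
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[value - mean for value in row] for row in grid]

    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant grid")

    cross = 0.0      # sum of w_ij * dev_i * dev_j over ordered neighbour pairs
    weight_sum = 0   # W: number of ordered neighbour pairs
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                if 0 <= a < rows and 0 <= b < cols:
                    cross += dev[i][j] * dev[a][b]
                    weight_sum += 1

    return (n / weight_sum) * cross / denom
```

Positive values suggest that like outcomes cluster together, and negative values suggest alternation: a 2 x 2 checkerboard `[[1, 0], [0, 1]]` gives -1.0, while a 4 x 4 grid whose left half is all ones gives a positive value.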
