Similar Articles
20 similar articles found.
1.
Nie H, Cheng J, Small DS. Biometrics. 2011;67(4):1397-1405.
In many clinical studies with a survival outcome, administrative censoring occurs when follow-up ends at a prespecified date and many subjects are still alive. An additional complication in some trials is noncompliance with the assigned treatment. For this setting, we study the estimation of the causal effect of treatment on survival probability up to a given time point among those subjects who would comply with the assignment to both treatment and control. We first discuss the standard instrumental variable (IV) method for survival outcomes and parametric maximum likelihood methods, and then develop an efficient plug-in nonparametric empirical maximum likelihood estimation (PNEMLE) approach. The PNEMLE method does not make any assumptions on outcome distributions, and makes use of the mixture structure in the data to gain efficiency over the standard IV method. Theoretical results of the PNEMLE are derived, and the method is illustrated by an analysis of data from a breast cancer screening trial. From our limited mortality analysis with administrative censoring 10 years into the follow-up, we find a significant benefit of screening (at the 5% level) after 4 years, and this benefit persists at 10 years of follow-up.
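A minimal sketch of the standard Wald-type IV estimator mentioned above, on toy data: under one-sided noncompliance, the intention-to-treat (ITT) effect on survival at a fixed time point, divided by the compliance rate, estimates the effect among compliers. All variable names and probabilities are hypothetical, survival status at the time point is assumed known for everyone (as under administrative censoring at a later date), and this illustrates the standard IV method, not PNEMLE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy trial with one-sided noncompliance: assigned controls cannot
# obtain the screening intervention, so they are all untreated.
z = rng.integers(0, 2, n)                 # randomized assignment
complier = rng.random(n) < 0.7            # latent compliance type
a = (z == 1) & complier                   # treatment actually received

# Survival status at the fixed time point t: treatment raises the
# compliers' survival probability from 0.80 to 0.85 (toy values).
p_surv = np.where(a, 0.85, 0.80)
alive = rng.random(n) < p_surv

# Wald-type IV estimate: ITT effect divided by the compliance rate
# recovers the effect among compliers.
itt = alive[z == 1].mean() - alive[z == 0].mean()
p_comply = a[z == 1].mean()
print(f"ITT = {itt:.3f}, compliance = {p_comply:.3f}, "
      f"complier effect = {itt / p_comply:.3f}")
```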

2.
Microarray studies, in order to identify genes associated with an outcome of interest, usually produce noisy measurements for a large number of gene expression features from a small number of subjects. One common approach to analyzing such high-dimensional data is to use linear errors-in-variables (EIV) models; however, current methods for fitting such models are computationally expensive. In this paper, we present two efficient screening procedures, namely, corrected penalized marginal screening (PMSc) and corrected sure independence screening (SISc), to reduce the number of variables for final model building. Both screening procedures are based on fitting corrected marginal regression models relating the outcome to each contaminated covariate separately, which can be computed efficiently even with a large number of features. Under mild conditions, we show that these procedures achieve screening consistency and reduce the number of features substantially, even when the number of covariates grows exponentially with sample size. In addition, if the true covariates are weakly correlated, we show that PMSc can achieve full variable selection consistency. Through a simulation study and an analysis of gene expression data for bone mineral density of Norwegian women, we demonstrate that the two new screening procedures make estimation of linear EIV models computationally scalable in high-dimensional settings, and improve finite sample estimation and selection performance compared with estimators that do not employ a screening stage.
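The corrected marginal regression idea behind these procedures can be sketched as follows, assuming additive measurement error with known variance: the attenuated marginal slope cov(W_j, y)/var(W_j) is corrected by subtracting the error variance from the denominator, and features are ranked by the absolute corrected slope. A toy numpy sketch (not the authors' implementation; all dimensions and the noise level are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 200, 5_000, 50           # subjects, features, screening budget
sigma_u = 0.5                       # assumed known measurement-error SD

X = rng.standard_normal((n, p))                 # true covariates
W = X + sigma_u * rng.standard_normal((n, p))   # contaminated observations
beta = np.zeros(p); beta[:5] = 1.0              # 5 active features
y = X @ beta + rng.standard_normal(n)

# Corrected marginal slope for feature j:
#   cov(W_j, y) / (var(W_j) - sigma_u^2),
# which removes the attenuation caused by the measurement error.
Wc = W - W.mean(axis=0)
yc = y - y.mean()
cov_wy = Wc.T @ yc / (n - 1)
var_w = Wc.var(axis=0, ddof=1) - sigma_u**2
slopes = cov_wy / var_w

# Keep the top-d features by absolute corrected slope.
keep = np.argsort(-np.abs(slopes))[:d]
print("all active features retained:", np.isin(np.arange(5), keep).all())
```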

3.
For the analysis of ultrahigh-dimensional data, the first step is often to perform screening and feature selection to effectively reduce the dimensionality while retaining all the active or relevant variables with high probability. Many methods have been developed for this under various frameworks, but most apply only to complete data. In this paper, we consider an incomplete data situation, case II interval-censored failure time data, for which there seems to be no existing screening procedure. Based on the idea of cumulative residuals, a model-free or nonparametric method is developed and shown to have the sure independence screening property. In particular, the approach is shown to tend to rank the active variables above the inactive ones in terms of their association with the failure time of interest. A simulation study is conducted to demonstrate the usefulness of the proposed method; in particular, it indicates that the method works well with general survival models and is capable of capturing nonlinear covariate effects with interactions. The approach is also applied to the childhood cancer survivor study that motivated this investigation.
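The paper's cumulative-residual statistic is not reproduced here, but the workflow it plugs into (compute a marginal association score per covariate under case II interval censoring, rank, keep the top d) can be illustrated with a deliberately naive stand-in: impute interval midpoints and rank covariates by absolute Kendall tau. A toy sketch under these stated substitutions:

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)
n, p, d = 150, 1_000, 30

X = rng.standard_normal((n, p))
# True failure times depend on the first two covariates only.
T = np.exp(0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.standard_normal(n))

# Case II interval censoring: two examination times per subject; we
# only learn which of (0,U], (U,V], (V,inf) contains T.
U = rng.uniform(0.2, 1.0, n)
V = U + rng.uniform(0.5, 2.0, n)
left = np.where(T <= U, 0.0, np.where(T <= V, U, V))
right = np.where(T <= U, U, np.where(T <= V, V, np.inf))

# Naive surrogate response: interval midpoints (an arbitrary value
# for the right-open interval), NOT the paper's cumulative residual.
mid = np.where(np.isinf(right), 2 * left, (left + right) / 2)

score = np.array([abs(kendalltau(X[:, j], mid)[0]) for j in range(p)])
keep = np.argsort(-score)[:d]
print("active covariates 0 and 1 retained:", {0, 1} <= set(keep))
```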

4.
Houseman EA, Coull BA, Betensky RA. Biometrics. 2006;62(4):1062-1070.
Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.

5.
In this article, we address a missing data problem that occurs in transplant survival studies. Recipients of organ transplants are followed up from transplantation and their survival times recorded, together with various explanatory variables. Due to differences in data collection procedures in different centers or over time, a particular explanatory variable (or set of variables) may only be recorded for certain recipients, which results in this variable being missing for a substantial number of records in the data. The variable may also turn out to be an important predictor of survival, so it is important to handle this missing-by-design problem appropriately. The consensus in the literature is to handle this problem with complete case analysis, as the missing data can reasonably be assumed to arise under a missing at random mechanism for which complete case analysis gives consistent estimates; specifically, the missing values can reasonably be assumed to be unrelated to the survival time. In this article, we investigate the potential for multiple imputation to handle this problem in a relevant study on survival after kidney transplantation, and show that it comprehensively outperforms complete case analysis on a range of measures. This is a particularly important finding in the medical context, as imputing large amounts of missing data is often viewed with scepticism.
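A minimal sketch of the comparison described above, with ordinary linear regression standing in for the survival model: multiply impute the missing-by-design covariate with scikit-learn's IterativeImputer and pool estimates with Rubin's rules, against a complete-case fit. The data, missingness rate, and number of imputations are all toy choices:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
n, m = 1_000, 20                       # subjects, number of imputations

x1 = rng.standard_normal(n)
x2 = 0.5 * x1 + rng.standard_normal(n)       # covariate missing-by-design
y = 1.0 * x1 + 0.5 * x2 + rng.standard_normal(n)
x2_obs = x2.copy()
x2_obs[rng.random(n) < 0.4] = np.nan         # 40% of records lack x2

def fit_slope(x1, x2, y):
    """OLS coefficient of x2 and its variance from (x1, x2) -> y."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    var_b = XtX_inv * (resid @ resid / (len(y) - 3))
    return b[2], var_b[2, 2]

# Complete-case analysis: drop records with missing x2.
cc = ~np.isnan(x2_obs)
b_cc, v_cc = fit_slope(x1[cc], x2_obs[cc], y[cc])

# Multiple imputation: impute m times, fit, pool with Rubin's rules.
data = np.column_stack([x1, x2_obs, y])
est, var = [], []
for k in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=k)
    filled = imp.fit_transform(data)
    b, v = fit_slope(filled[:, 0], filled[:, 1], filled[:, 2])
    est.append(b); var.append(v)
b_mi = np.mean(est)
v_mi = np.mean(var) + (1 + 1 / m) * np.var(est, ddof=1)  # within + between

print(f"complete case:       {b_cc:.3f} (SE {v_cc**0.5:.3f})")
print(f"multiple imputation: {b_mi:.3f} (SE {v_mi**0.5:.3f})")
```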

6.
Statistical models support medical research by facilitating individualized outcome prognostication conditional on independent variables or by estimating effects of risk factors adjusted for covariates. The theory of statistical models is well established if the set of independent variables to consider is fixed and small; hence, we can assume that effect estimates are unbiased and the usual methods for confidence interval estimation are valid. In routine work, however, it is not known a priori which covariates should be included in a model, and we are often confronted with 10 to 30 candidate variables, a number often too large for all of them to be entered into a statistical model. We provide an overview of the available variable selection methods that are based on significance or information criteria, penalized likelihood, the change-in-estimate criterion, background knowledge, or combinations thereof. These methods were usually developed in the context of a linear regression model and then transferred to more general types of models, such as generalized linear models or models for censored survival data. Variable selection, in particular if used in explanatory modeling where effect estimates are of central interest, can compromise the stability of a final model, the unbiasedness of regression coefficients, and the validity of p-values or confidence intervals. Therefore, we give pragmatic recommendations for the practicing statistician on applying variable selection methods in general (low-dimensional) modeling problems and on performing stability investigations and inference. We also propose some quantities, based on resampling the entire variable selection process, that should be routinely reported by software packages offering automated variable selection algorithms.
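One of the recommended stability investigations, resampling the entire selection process, can be sketched in a few lines: run a forward selection by AIC inside a bootstrap loop and report how often each candidate variable is selected. A toy linear-model sketch (the choice of forward AIC selection is illustrative, not a recommendation from the article):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, B = 200, 10, 200
X = rng.standard_normal((n, p))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(n)

def aic(X, y):
    """AIC of an OLS fit (Gaussian likelihood, up to a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ b) ** 2).sum()
    return len(y) * np.log(rss / len(y)) + 2 * X.shape[1]

def forward_aic(X, y):
    """Greedy forward selection: add the variable that lowers AIC most."""
    selected, current = [], aic(np.ones((len(y), 1)), y)
    while True:
        cands = [j for j in range(X.shape[1]) if j not in selected]
        scores = [(aic(np.column_stack([np.ones(len(y)),
                                        X[:, selected + [j]]]), y), j)
                  for j in cands]
        best, j = min(scores)
        if best >= current:
            return selected
        selected.append(j); current = best

# Resample the entire selection process, record inclusion frequencies.
freq = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)
    for j in forward_aic(X[idx], y[idx]):
        freq[j] += 1
print("bootstrap inclusion frequencies:", np.round(freq / B, 2))
```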

7.
8.
Current statistical methods for estimating nest survival rates assume that nests are identical in their propensity to succeed. However, there are several biological reasons to question this assumption. For example, experience of the nest builder, number of nest helpers, genetic fitness of individuals, and site effects may contribute to an inherent disparity between nests with respect to their daily mortality rates. Ignoring such heterogeneity can lead to incorrect survival estimates. Our results show that constant-survival models can seriously underestimate overall survival in the presence of heterogeneity. This paper presents a flexible random-effects approach to modeling heterogeneous nest survival data. We illustrate our methods with data on red-winged blackbirds.
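The underestimation can be checked with a toy simulation (not the authors' random-effects model): draw nest-specific daily survival rates from a beta distribution, then compare the true mean nest success with the success implied by a single pooled, Mayfield-type daily rate. All parameter values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)
n_nests, T = 100_000, 25              # nests; days needed to fledge

# Heterogeneous daily survival rates, mean 0.95 but spread across nests.
s = rng.beta(19, 1, n_nests)

# Day of failure for each nest (geometric); success if it exceeds T.
fail_day = rng.geometric(1 - s)
success = fail_day > T
exposure = np.where(success, T, fail_day)     # observed exposure days

# Constant-survival (Mayfield-type) estimate: one daily rate for all,
# failures divided by total exposure days.
s_hat = 1 - (~success).sum() / exposure.sum()
print(f"constant-model nest success: {s_hat**T:.3f}")
print(f"true nest success:           {success.mean():.3f}")
# The constant model typically understates success, as the abstract notes.
```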

9.
Beta-lactamase-resistant bacteria, and especially ESBL-producing Enterobacteriaceae, are an increasing problem worldwide. For this reason, there is major interest in efficient and reliable methods for rapid screening of large numbers of samples. A multiplex real-time PCR was therefore developed to detect the predominant class A beta-lactamase genes blaCTX-M, blaSHV, and blaTEM, as well as CIT-type AmpCs, in a one-step reaction. A set of 114 Enterobacteriaceae with previously identified resistance gene subtypes, plus 20 uncharacterized animal and environmental isolates, was used to validate this assay. To confirm its applicability in variable settings, the real-time runs were performed in parallel in two different laboratories using different real-time cyclers. The results showed complete accordance between the real-time data and the predetermined genotypes. Although sequence analysis remains necessary for a comprehensive characterization, this method proved reliable for rapid screening of large numbers of samples and could therefore be an important tool for, e.g., epidemiological purposes, or could support infection control measures.

10.
The optimal schedules for breast cancer screening in terms of examination frequency and ages at examination are of practical interest. A decision-theoretic approach is explored to search for optimal cancer screening programs which should achieve maximum survival benefit while balancing the associated cost to the health care system. We propose a class of utility functions that account for costs associated with screening examinations and value of survival benefit under a non-stable disease model. We consider two different optimization criteria: optimize the number of screening examinations with equal screening intervals between exams but without a prefixed total cost; and optimize the ages at which screening should be given for a fixed total cost. We show that an optimal solution exists under each of the two frameworks. The proposed methods may consider women at different levels of risk for breast cancer so that the optimal screening strategies will be tailored according to a woman’s risk of developing the disease. Results of a numerical study are presented and the proposed models are illustrated with various data inputs. We also use the data inputs from the Health Insurance Plan of New York (HIP) and Canadian National Breast Screening Study (CNBSS) to illustrate the proposed models and to compare the utility values between the optimal schedules and the actual schedules in the HIP and CNBSS trials. Here, the utility is defined as the difference in cure rates between cases found at screening examinations and cases found between screening examinations while accounting for the cost of examinations, under a given screening schedule.
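A toy version of the first optimization criterion (choose the number of equally spaced exams, with no prefixed total cost) can be sketched by Monte Carlo: simulate preclinical onset and sojourn times, score a schedule by the cure-rate gain from screen detection minus a per-exam cost, and grid-search the number of exams. Every parameter below is hypothetical, and the disease model is far simpler than the paper's non-stable model:

```python
import numpy as np

rng = np.random.default_rng(6)
n_women = 100_000
age0, age1 = 40, 75                  # screening window (years)

# Toy natural history: age at preclinical onset, and the sojourn time
# during which an exam can detect the tumor before symptoms.
onset = rng.uniform(30, 90, n_women)
sojourn = rng.exponential(3.0, n_women)
clinical = onset + sojourn           # age at symptomatic diagnosis

delta_cure = 0.15                    # cure-rate gain if screen-detected
cost = 0.002                         # cost per exam, in utility units

def utility(n_exams):
    """Expected utility of n equally spaced exams in [age0, age1]."""
    exams = np.linspace(age0, age1, n_exams)
    detected = np.zeros(n_women, dtype=bool)
    for t in exams:
        # Screen-detected if any exam falls in the preclinical window.
        detected |= (onset <= t) & (t < clinical)
    return detected.mean() * delta_cure - cost * n_exams

best = max(range(1, 36), key=utility)
print(f"optimal number of exams: {best}, utility: {utility(best):.4f}")
```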

11.

Background

Variable selection is an important step in building a multivariate regression model for which several methods and statistical packages are available. A comprehensive approach for variable selection in complex multivariate regression analyses within HIV cohorts is explored by utilizing both epidemiological and biostatistical procedures.

Methods

Three different methods for variable selection were illustrated in a study comparing survival time between subjects in the Department of Defense’s Natural History Study and the Atlanta Veterans Affairs Medical Center’s HIV Atlanta VA Cohort Study. The first two methods were stepwise selection procedures, based either on significance tests (score test) or on information theory (Akaike Information Criterion), while the third method employed a Bayesian argument (Bayesian Model Averaging).

Results

All three methods resulted in a similar parsimonious survival model. Three of the covariates previously used in the multivariate model were not included in the final model suggested by the three approaches. When comparing the parsimonious model to the previously published model, there was evidence of less variance in the main survival estimates.

Conclusions

The variable selection approaches considered in this study allowed a model to be built based on significance tests, on an information criterion, and on averaging models using their posterior probabilities. A parsimonious model that balanced these three approaches was found to provide a better fit than the previously reported model.
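The Bayesian Model Averaging step can be sketched, for a small candidate set, by enumerating all subsets, weighting each model by exp(-BIC/2) as the usual approximation to its posterior probability under a uniform model prior, and reading off posterior inclusion probabilities. A toy linear-model sketch (the study's survival model and data are not reproduced here):

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
n, p = 300, 8
X = rng.standard_normal((n, p))
y = 1.2 * X[:, 0] + 0.8 * X[:, 3] + rng.standard_normal(n)

def bic(cols):
    """BIC of an OLS fit on the given columns (plus intercept)."""
    Xm = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    b, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    rss = ((y - Xm @ b) ** 2).sum()
    return n * np.log(rss / n) + Xm.shape[1] * np.log(n)

# Enumerate all 2^p candidate models and weight by exp(-BIC/2).
models = [cols for r in range(p + 1)
          for cols in itertools.combinations(range(p), r)]
bics = np.array([bic(m) for m in models])
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()

# Posterior inclusion probability of each candidate variable.
pip = np.array([sum(wi for wi, m in zip(w, models) if j in m)
                for j in range(p)])
print("posterior inclusion probabilities:", np.round(pip, 2))
```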

12.
Selecting an appropriate variable subset in linear multivariate methods is an important methodological issue for ecologists. Interest often lies in obtaining general predictive capacity or in drawing causal inferences from predictor variables. Because solid knowledge of a studied phenomenon is often lacking, scientists explore predictor variables in order to find the most meaningful (i.e., discriminating) ones. As an example, we modelled the response of the amphibious softwater plant Eleocharis multicaulis using canonical discriminant function analysis. We asked how variables can best be selected, comparing several methods: univariate Pearson chi-square screening, principal components analysis (PCA), and step-wise analysis, as well as combinations of some of these. We expected PCA to perform best. The selected methods were evaluated through the fit and stability of the resulting discriminant functions and through correlations between these functions and the predictor variables. The chi-square subset at P < 0.05, followed by a step-wise sub-selection, gave the best results. Contrary to expectations, PCA performed poorly, as did step-wise analysis. The various chi-square subset methods all yielded ecologically meaningful variables, whereas probable noise variables were also selected by PCA and step-wise analysis. We advise against the simple use of PCA or step-wise discriminant analysis to obtain an ecologically meaningful variable subset: the former because it does not take the response variable into account, the latter because noise variables are likely to be selected. We suggest that univariate screening techniques are a worthwhile alternative for variable selection in ecology.
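The winning recipe (univariate chi-square screening at P < 0.05, then a discriminant analysis on the retained subset) can be sketched with scipy and scikit-learn on toy presence/absence data; LinearDiscriminantAnalysis stands in for the canonical discriminant function analysis, and the step-wise sub-selection is omitted:

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(8)
n, p = 400, 40

# Species presence/absence and binary habitat predictors; only the
# first three predictors carry signal (they agree with y 75% of the time).
y = rng.integers(0, 2, n)
X = rng.integers(0, 2, (n, p))
for j in range(3):
    X[:, j] = np.where(rng.random(n) < 0.75, y, 1 - y)

def chi2_p(x, y):
    """P-value of the Pearson chi-square test on the 2x2 table."""
    table = np.array([[np.sum((x == a) & (y == b)) for b in (0, 1)]
                      for a in (0, 1)])
    return chi2_contingency(table)[1]

# Step 1: univariate chi-square screening at P < 0.05.
keep = [j for j in range(p) if chi2_p(X[:, j], y) < 0.05]
print("retained predictors:", keep)

# Step 2: discriminant analysis on the screened subset.
lda = LinearDiscriminantAnalysis().fit(X[:, keep], y)
print("training accuracy:", lda.score(X[:, keep], y).round(3))
```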

13.
Relative survival ratios (RSRs) can be useful for evaluating the impact of changes in cancer care on the prognosis of cancer patients or for comparing the prognosis of different subgroups of patients, but their use is problematic for cancer sites where screening has been introduced, due to the potential for lead-time bias. Lead time is survival time added to a patient's observed survival by an earlier diagnosis, irrespective of whether the time of death is postponed. In the presence of screening it is difficult to disentangle how much of an observed improvement in survival is real and how much is due to lead-time bias. Even so, RSRs are often presented for breast cancer, a site where screening has led to early diagnosis, under the assumption that the lead-time bias is small. We describe a simulation-based framework for studying the lead-time bias due to mammography screening on RSRs of breast cancer, based on a natural history model developed in a Swedish setting. Using this framework, we performed simulations under different assumptions about screening sensitivity and breast cancer survival with the aim of estimating the lead-time bias. Screening every second year at ages 40-75 was introduced, assuming that screening had no effect on survival except for lead-time bias. Relative survival was estimated both with and without screening to enable quantification of the lead-time bias. Scenarios with low, moderate, and high breast cancer survival, and low, moderate, and high screening sensitivity, were simulated, and the lead-time bias was assessed in all scenarios.
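The core mechanism can be sketched in a few lines, far more simply than the Swedish natural-history model: assume screening advances the date of diagnosis by a lead time without moving the date of death, and compare five-year survival measured from diagnosis with and without screening. All distributions and parameters below are toy choices:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500_000

# Toy natural history: time from clinical diagnosis to death, and the
# lead time a screen-detected diagnosis would gain over waiting for
# symptoms.
surv_from_dx = rng.exponential(8.0, n)      # years, diagnosis -> death
lead_time = rng.uniform(0.0, 2.0, n)        # years gained by screening

# By assumption, screening does not postpone death: survival measured
# from the earlier, screen-detected diagnosis grows only by lead time.
surv_screened = surv_from_dx + lead_time

t = 5.0
print(f"5-year survival without screening: {(surv_from_dx > t).mean():.3f}")
print(f"5-year survival with screening:    {(surv_screened > t).mean():.3f}")
# The difference between the two proportions is pure lead-time bias.
```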

14.
15.
In this article, scenarios were developed that simulate screening effects in ecological and cohort studies of thyroid cancer incidence among Ukrainians whose thyroids were exposed to 131I in the aftermath of the Chernobyl accident. Where possible, the scenarios were based on directly observed data, such as population size, dose distributions, and thyroid cancer cases. Two scenarios were considered, in which the screening effect on baseline cases is either equal to or larger than that on radiation-related thyroid cancer cases. For ecological studies in settlements with more than ten measurements of 131I activity in the human thyroid in May-June 1986, the screening bias appeared small (<19%) for all risk quantities. In the cohort studies, the excess absolute risk per dose was larger by a factor of 4 than in the general population. For an equal screening effect on baseline and radiation-related cancer (Scenario 1), the excess relative risk was about the same as in the general population; however, a differential screening effect (Scenario 2) produced a risk smaller by a factor of 2.5. A comparison with first results of the Ukrainian-American cohort study gave no indication that a differential screening effect has a marked influence on the risk estimates. The differences between the risk estimates from ecological studies and cohort studies were explained by the different screening patterns in the general population and in the much smaller cohort. The present investigations are characterized by dose estimates for many settlements that are only weakly correlated with screening, the confounding variable. The results show that under these conditions ecological studies may provide risk estimates with an acceptable bias.

16.
Delayed separation of survival curves is a common occurrence in confirmatory studies in immuno-oncology. Many novel statistical methods that aim to efficiently capture potential long-term survival improvements have been proposed in recent years. However, the vast majority do not consider stratification, which is a major limitation considering that most large confirmatory studies currently employ a stratified primary analysis. In this article, we combine recently proposed weighted log-rank tests that have been designed to work well under a delayed separation of survival curves, with stratification by a baseline variable. The aim is to increase the efficiency of the test when the stratifying variable is highly prognostic for survival. As there are many potential ways to combine the two techniques, we compare several possibilities in an extensive simulation study. We also apply the techniques retrospectively to two recent randomized clinical trials.
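One straightforward way to combine the two techniques, computing a Fleming-Harrington G(rho, gamma) weighted log-rank score and variance within each stratum and summing across strata before standardizing, can be sketched in numpy. This is only one of the combinations the article compares, and the toy data below are hypothetical:

```python
import numpy as np

def fh_stratum(time, event, group, rho=0.0, gamma=1.0):
    """Fleming-Harrington G(rho,gamma) log-rank score U and variance V
    for a single stratum; group is coded 0/1."""
    U = V = 0.0
    S = 1.0                                   # left-continuous pooled KM
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        w = S**rho * (1 - S) ** gamma          # weight uses S(t-)
        U += w * (d1 - n1 * d / n)
        if n > 1:
            V += w**2 * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        S *= 1 - d / n
    return U, V

def stratified_fh_test(time, event, group, stratum, rho=0.0, gamma=1.0):
    """Sum stratum-level scores and variances, then standardize."""
    U = V = 0.0
    for s in np.unique(stratum):
        m = stratum == s
        u, v = fh_stratum(time[m], event[m], group[m], rho, gamma)
        U += u; V += v
    return U / np.sqrt(V)

# Toy trial with a treatment benefit and a prognostic stratifying variable.
rng = np.random.default_rng(10)
n = 600
group = rng.integers(0, 2, n)
stratum = rng.integers(0, 2, n)
scale = np.where(stratum == 1, 4.0, 8.0)       # stratum is prognostic
benefit = np.where(group == 1, rng.exponential(3.0, n), 0.0)
time = rng.exponential(scale, n) + benefit
event = time < 12.0                            # administrative censoring
time = np.minimum(time, 12.0)
print("stratified G(0,1) z-statistic:",
      round(stratified_fh_test(time, event, group, stratum, 0.0, 1.0), 2))
```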

17.
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.
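The central contrast can be sketched with scikit-learn: simulate a trait controlled by a few QTL, then compare the prediction accuracy of the LASSO (variable selection) against a ridge penalty (RR-BLUP-like shrinkage). The sketch uses independent markers, i.e., no LD, and arbitrary penalty strengths, so it shows the favourable case for variable selection that the paper argues rarely holds in real breeding populations:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(11)
n, p, n_qtl = 1_000, 2_000, 10        # individuals, markers, causal loci

# Independent biallelic markers coded 0/1/2 (no LD, unlike real panels).
X = rng.binomial(2, 0.3, (n, p)).astype(float)
beta = np.zeros(p)
beta[rng.choice(p, n_qtl, replace=False)] = rng.normal(0, 1, n_qtl)
g = X @ beta                           # true genetic values
h2 = 0.5                               # heritability
y = g + rng.normal(0, np.sqrt(g.var() * (1 - h2) / h2), n)

Xtr, Xte, ytr, yte, gtr, gte = train_test_split(X, y, g, random_state=0)
# Penalty strengths are arbitrary; cross-validation would be used in practice.
for model in (Lasso(alpha=0.1, max_iter=5000), Ridge(alpha=float(p))):
    yhat = model.fit(Xtr, ytr).predict(Xte)
    acc = np.corrcoef(yhat, gte)[0, 1]  # accuracy vs true genetic value
    print(f"{type(model).__name__:5s} prediction accuracy: {acc:.2f}")
```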

18.
GlgB (α-1,4-glucan branching enzyme) is the key enzyme involved in the biosynthesis of α-glucan, which plays a significant role in the virulence and pathogenesis of Mycobacterium tuberculosis. Because α-glucans are implicated in the survival of both replicating and non-replicating bacteria, there is an urgent need to identify and develop novel inhibitors targeting enzymes in this pathway, such as GlgB. We used the existing structural information on M. tuberculosis GlgB for high-throughput virtual screening and molecular docking. A diverse database of 330,000 molecules was used to identify novel and efficacious therapeutic agents targeting GlgB. We also used three-dimensional shape as well as two-dimensional similarity matrix methods to identify diverse molecular scaffolds that inhibit M. tuberculosis GlgB activity. Virtual hits were generated after structure- and ligand-based screening, followed by filters based on interaction with human GlgB and on in silico pharmacokinetic parameters. These hits were evaluated experimentally, resulting in the discovery of a number of structurally diverse chemical scaffolds that target M. tuberculosis GlgB. Although a number of inhibitors demonstrated in vitro enzyme inhibition, two compounds in particular showed excellent inhibition of M. tuberculosis survival in vivo and of its ability to be phagocytosed. This work shows that in silico docking and three-dimensional chemical similarity can be an important therapeutic approach for developing inhibitors that specifically target the M. tuberculosis GlgB enzyme.
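The two-dimensional similarity step can be illustrated with the Tanimoto coefficient on binary fingerprints, here stored as sets of on-bits. The fingerprints and compound names below are hypothetical, and this is a sketch of the similarity measure only, not the authors' screening pipeline:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical on-bit sets for a known GlgB hit and two library compounds.
known_hit = {3, 17, 42, 56, 90, 128}
library = {
    "cmpd_A": {3, 17, 42, 60, 90, 128, 200},
    "cmpd_B": {5, 21, 77, 130},
}

# Rank the library by similarity to the known hit.
for name, fp in sorted(library.items(),
                       key=lambda kv: -tanimoto(known_hit, kv[1])):
    print(name, round(tanimoto(known_hit, fp), 2))
```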

19.
20.
White spot syndrome virus (WSSV) is one of the most significant viral pathogens causing high mortality and economic damage in shrimp aquaculture. Although intensive efforts were undertaken to detect and characterize WSSV infection in shrimp during the last decade, we still lack methods to either prevent or cure white spot disease. Most studies on neutralizing antibodies from sera have been performed using in vivo assays. For the first time, we report the use of an in vitro screening method to obtain a neutralizing scFv antibody against WSSV from a previously constructed anti-WSSV single-chain fragment variable (scFv) antibody phage display library. From clones that were positive for WSSV by ELISA, one neutralizing scFv antibody was identified using an in vitro screening method based on shrimp primary lymphoid cell cultures. The availability of a neutralizing antibody against the virus should accelerate identification of infection-related genes and the host cell receptor, and may also enable new approaches to the prevention and cure of white spot disease.
