期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Targeted minimum loss based estimator that outperforms a given estimator

Gruber S van der Laan MJ 《The international journal of biostatistics》2012,8(1):Article 11

Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model (van der Laan and Rubin (2006), van der Laan (2008), van der Laan and Rose (2011)). In this article we demonstrate how to construct a TMLE that also satisfies the property that it is at least as efficient as a user supplied asymptotically linear estimator. In particular it is shown that this type of TMLE can incorporate empirical efficiency maximization as in Rubin and van der Laan (2008), Tan (2008, 2010), Rotnitzky et al. (2012), and retain double robustness. For the sake of illustration we focus on estimation of the additive average causal effect of a point treatment on an outcome, adjusting for baseline covariates. 相似文献

2.

Targeted maximum likelihood estimation for dynamic treatment regimes in sequentially randomized controlled trials

Chaffee PH van der Laan MJ 《The international journal of biostatistics》2012,8(1):Article 14

Sequential Randomized Controlled Trials (SRCTs) are rapidly becoming essential tools in the search for optimized treatment regimes in ongoing treatment settings. Analyzing data for multiple time-point treatments with a view toward optimal treatment regimes is of interest in many types of afflictions: HIV infection, Attention Deficit Hyperactivity Disorder in children, leukemia, prostate cancer, renal failure, and many others. Methods for analyzing data from SRCTs exist but they are either inefficient or suffer from the drawbacks of estimating equation methodology. We describe an estimation procedure, targeted maximum likelihood estimation (TMLE), which has been fully developed and implemented in point treatment settings, including time to event outcomes, binary outcomes and continuous outcomes. Here we develop and implement TMLE in the SRCT setting. As in the former settings, the TMLE procedure is targeted toward a pre-specified parameter of the distribution of the observed data, and thereby achieves important bias reduction in estimation of that parameter. As with the so-called Augmented Inverse Probability of Censoring Weight (A-IPCW) estimator, TMLE is double-robust and locally efficient. We report simulation results corresponding to two data-generating distributions from a longitudinal data structure. 相似文献

3.

Efficient targeted learning of heterogeneous treatment effects for multiple subgroups

Waverly Wei Maya Petersen Mark J van der Laan Zeyu Zheng Chong Wu Jingshen Wang 《Biometrics》2023,79(3):1934-1946

In biomedical science, analyzing treatment effect heterogeneity plays an essential role in assisting personalized medicine. The main goals of analyzing treatment effect heterogeneity include estimating treatment effects in clinically relevant subgroups and predicting whether a patient subpopulation might benefit from a particular treatment. Conventional approaches often evaluate the subgroup treatment effects via parametric modeling and can thus be susceptible to model mis-specifications. In this paper, we take a model-free semiparametric perspective and aim to efficiently evaluate the heterogeneous treatment effects of multiple subgroups simultaneously under the one-step targeted maximum-likelihood estimation (TMLE) framework. When the number of subgroups is large, we further expand this path of research by looking at a variation of the one-step TMLE that is robust to the presence of small estimated propensity scores in finite samples. From our simulations, our method demonstrates substantial finite sample improvements compared to conventional methods. In a case study, our method unveils the potential treatment effect heterogeneity of rs12916-T allele (a proxy for statin usage) in decreasing Alzheimer's disease risk. 相似文献

4.

Targeted maximum likelihood estimation of effect modification parameters in survival analysis

Stitelman OM Wester CW De Gruttola V van der Laan MJ 《The international journal of biostatistics》2011,7(1):19

The Cox proportional hazards model or its discrete time analogue, the logistic failure time model, posit highly restrictive parametric models and attempt to estimate parameters which are specific to the model proposed. These methods are typically implemented when assessing effect modification in survival analyses despite their flaws. The targeted maximum likelihood estimation (TMLE) methodology is more robust than the methods typically implemented and allows practitioners to estimate parameters that directly answer the question of interest. TMLE will be used in this paper to estimate two newly proposed parameters of interest that quantify effect modification in the time to event setting. These methods are then applied to the Tshepo study to assess if either gender or baseline CD4 level modify the effect of two cART therapies of interest, efavirenz (EFV) and nevirapine (NVP), on the progression of HIV. The results show that women tend to have more favorable outcomes using EFV while males tend to have more favorable outcomes with NVP. Furthermore, EFV tends to be favorable compared to NVP for individuals at high CD4 levels. 相似文献

5.

Nonparametric estimation of the causal effect of a stochastic threshold-based intervention

Lars van der Laan Wenbo Zhang Peter B. Gilbert 《Biometrics》2023,79(2):1014-1028

Identifying a biomarker or treatment-dose threshold that marks a specified level of risk is an important problem, especially in clinical trials. In view of this goal, we consider a covariate-adjusted threshold-based interventional estimand, which happens to equal the binary treatment–specific mean estimand from the causal inference literature obtained by dichotomizing the continuous biomarker or treatment as above or below a threshold. The unadjusted version of this estimand was considered in Donovan et al.. Expanding upon Stitelman et al., we show that this estimand, under conditions, identifies the expected outcome of a stochastic intervention that sets the treatment dose of all participants above the threshold. We propose a novel nonparametric efficient estimator for the covariate-adjusted threshold-response function for the case of informative outcome missingness, which utilizes machine learning and targeted minimum-loss estimation (TMLE). We prove the estimator is efficient and characterize its asymptotic distribution and robustness properties. Construction of simultaneous 95% confidence bands for the threshold-specific estimand across a set of thresholds is discussed. In the Supporting Information, we discuss how to adjust our estimator when the biomarker is missing at random, as occurs in clinical trials with biased sampling designs, using inverse probability weighting. Efficiency and bias reduction of the proposed estimator are assessed in simulations. The methods are employed to estimate neutralizing antibody thresholds for virologically confirmed dengue risk in the CYD14 and CYD15 dengue vaccine trials. 相似文献

6.

A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome

Gruber S van der Laan MJ 《The international journal of biostatistics》2010,6(1):Article 26

Targeted maximum likelihood estimation of a parameter of a data generating distribution, known to be an element of a semi-parametric model, involves constructing a parametric model through an initial density estimator with parameter ? representing an amount of fluctuation of the initial density estimator, where the score of this fluctuation model at ? = 0 equals the efficient influence curve/canonical gradient. The latter constraint can be satisfied by many parametric fluctuation models since it represents only a local constraint of its behavior at zero fluctuation. However, it is very important that the fluctuations stay within the semi-parametric model for the observed data distribution, even if the parameter can be defined on fluctuations that fall outside the assumed observed data model. In particular, in the context of sparse data, by which we mean situations where the Fisher information is low, a violation of this property can heavily affect the performance of the estimator. This paper presents a fluctuation approach that guarantees the fluctuated density estimator remains inside the bounds of the data model. We demonstrate this in the context of estimation of a causal effect of a binary treatment on a continuous outcome that is bounded. It results in a targeted maximum likelihood estimator that inherently respects known bounds, and consequently is more robust in sparse data situations than the targeted MLE using a naive fluctuation model. When an estimation procedure incorporates weights, observations having large weights relative to the rest heavily influence the point estimate and inflate the variance. Truncating these weights is a common approach to reducing the variance, but it can also introduce bias into the estimate. We present an alternative targeted maximum likelihood estimation (TMLE) approach that dampens the effect of these heavily weighted observations. As a substitution estimator, TMLE respects the global constraints of the observed data model. For example, when outcomes are binary, a fluctuation of an initial density estimate on the logit scale constrains predicted probabilities to be between 0 and 1. This inherent enforcement of bounds has been extended to continuous outcomes. Simulation study results indicate that this approach is on a par with, and many times superior to, fluctuating on the linear scale, and in particular is more robust when there is sparsity in the data. 相似文献

7.

The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis

Yu-Wen Wen Yi-Wen Tsai David Bin-Chia Wu Pei-Fen Chen 《PloS one》2013,8(6)

Ordinary least square (OLS) in regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, the estimates, inference and decision making in the economic evaluation based on OLS estimation may be biased by the presence of outliers. Instead, robust estimation can remain unaffected and provide result which is resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution by using robust estimations, i.e. Huber M-estimation, Hampel M-estimation, Tukey''s bisquare M-estimation, MM-estimation and least trimming square estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and five robust estimations. Empirical size and empirical power of both OLS and robust estimations were then compared in the context of hypothesis testing.Simulations showed that the five robust approaches compared with OLS estimation led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. Using real example of antiplatelet therapy, the estimated incremental net-benefit by OLS estimation was lower than those by robust approaches because of outliers in cost data. Robust estimations demonstrated higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and its interpretations. It is recommended that the use of robust estimation in NBR can be an appropriate method to avoid such biased decision making. 相似文献

8.

A semiparametric odds ratio model for measuring association

Yun Chen H 《Biometrics》2007,63(2):413-421

We propose a semiparametric odds ratio model to measure the association between two variables taking discrete values, continuous values, or a mixture of both. Methods for estimation and inference with varying degrees of robustness to model assumptions are studied. Semiparametric efficient estimation and inference procedures are also considered. The estimation methods are compared in a simulation study and applied to the study of associations among genital tract bacterial counts in HIV infected women. 相似文献

9.

Effects of sampling close relatives on some elementary population genetics analyses

下载免费PDF全文

Jinliang Wang 《Molecular ecology resources》2018,18(1):41-54

Many molecular ecology analyses assume the genotyped individuals are sampled at random from a population and thus are representative of the population. Realistically, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from fecund species. Our knowledge is limited about how ECR affect the routinely conducted elementary genetics analyses, and how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on some popular population genetics analyses of marker data, including the estimation of allele frequencies, F‐statistics, expected heterozygosity (H_e), effective and observed numbers of alleles, and the tests of Hardy–Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR to mitigate their impact and to yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low‐precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g., H_e) by accounting for ECR and using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR in estimating allele frequencies and thus target parameters relying on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data for understanding population properties and marker properties. 相似文献

10.

Shrinkage Estimators for Covariance Matrices 总被引：1，自引：0，他引：1

Michael J. Daniels Robert E. Kass 《Biometrics》2001,57(4):1173-1184

Estimation of covariance matrices in small samples has been studied by many authors. Standard estimators, like the unstructured maximum likelihood estimator (ML) or restricted maximum likelihood (REML) estimator, can be very unstable with the smallest estimated eigenvalues being too small and the largest too big. A standard approach to more stably estimating the matrix in small samples is to compute the ML or REML estimator under some simple structure that involves estimation of fewer parameters, such as compound symmetry or independence. However, these estimators will not be consistent unless the hypothesized structure is correct. If interest focuses on estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under misspecification of the covariance structure. With large matrices, however, the inefficiency of the sandwich estimator becomes worrisome. We consider here two general shrinkage approaches to estimating the covariance matrix and regression coefficients. The first involves shrinking the eigenvalues of the unstructured ML or REML estimator. The second involves shrinking an unstructured estimator toward a structured estimator. For both cases, the data determine the amount of shrinkage. These estimators are consistent and give consistent and asymptotically efficient estimates for regression coefficients. Simulations show the improved operating characteristics of the shrinkage estimators of the covariance matrix and the regression coefficients in finite samples. The final estimator chosen includes a combination of both shrinkage approaches, i.e., shrinking the eigenvalues and then shrinking toward structure. We illustrate our approach on a sleep EEG study that requires estimation of a 24 x 24 covariance matrix and for which inferences on mean parameters critically depend on the covariance estimator chosen. We recommend making inference using a particular shrinkage estimator that provides a reasonable compromise between structured and unstructured estimators. 相似文献

11.

Probabilistic segmentation and intensity estimation for microarray images

Gottardo R Besag J Stephens M Murua A 《Biostatistics (Oxford, England)》2006,7(1):85-99

We describe a probabilistic approach to simultaneous image segmentation and intensity estimation for complementary DNA microarray experiments. The approach overcomes several limitations of existing methods. In particular, it (a) uses a flexible Markov random field approach to segmentation that allows for a wider range of spot shapes than existing methods, including relatively common 'doughnut-shaped' spots; (b) models the image directly as background plus hybridization intensity, and estimates the two quantities simultaneously, avoiding the common logical error that estimates of foreground may be less than those of the corresponding background if the two are estimated separately; and (c) uses a probabilistic modeling approach to simultaneously perform segmentation and intensity estimation, and to compute spot quality measures. We describe two approaches to parameter estimation: a fast algorithm, based on the expectation-maximization and the iterated conditional modes algorithms, and a fully Bayesian framework. These approaches produce comparable results, and both appear to offer some advantages over other methods. We use an HIV experiment to compare our approach to two commercial software products: Spot and Arrayvision. 相似文献

12.

Genomic clocks and evolutionary timescales

Blair Hedges S Kumar S 《Trends in genetics : TIG》2003,19(4):200-206

For decades, molecular clocks have helped to illuminate the evolutionary timescale of life, but now genomic data pose a challenge for time estimation methods. It is unclear how to integrate data from many genes, each potentially evolving under a different model of substitution and at a different rate. Current methods can be grouped by the way the data are handled (genes considered separately or combined into a 'supergene') and the way gene-specific rate models are applied (global versus local clock). There are advantages and disadvantages to each of these approaches, and the optimal method has not yet emerged. Fortunately, time estimates inferred using many genes or proteins have greater precision and appear to be robust to different approaches. 相似文献

13.

A bayesian approach to improved estimation of causal effect predictiveness for a principal surrogate endpoint

Zigler CM Belin TR 《Biometrics》2012,68(3):922-932

Summary The literature on potential outcomes has shown that traditional methods for characterizing surrogate endpoints in clinical trials based only on observed quantities can fail to capture causal relationships between treatments, surrogates, and outcomes. Building on the potential-outcomes formulation of a principal surrogate, we introduce a Bayesian method to estimate the causal effect predictiveness (CEP) surface and quantify a candidate surrogate's utility for reliably predicting clinical outcomes. In considering the full joint distribution of all potentially observable quantities, our Bayesian approach has the following features. First, our approach illuminates implicit assumptions embedded in previously-used estimation strategies that have been shown to result in poor performance. Second, our approach provides tools for making explicit and scientifically-interpretable assumptions regarding associations about which observed data are not informative. Through simulations based on an HIV vaccine trial, we found that the Bayesian approach can produce estimates of the CEP surface with improved performance compared to previous methods. Third, our approach can extend principal-surrogate estimation beyond the previously considered setting of a vaccine trial where the candidate surrogate is constant in one arm of the study. We illustrate this extension through an application to an AIDS therapy trial where the candidate surrogate varies in both treatment arms. 相似文献

14.

Using the Whole Cohort in the Analysis of Case-Control Data

Norman E. Breslow Gustavo Amorim Mary B. Pettinger Jacques Rossouw 《Statistics in biosciences》2013,5(2):232-249

Standard analyses of data from case-control studies that are nested in a large cohort ignore information available for cohort members not sampled for the sub-study. This paper reviews several methods designed to increase estimation efficiency by using more of the data, treating the case-control sample as a two or three phase stratified sample. When applied to a study of coronary heart disease among women in the hormone trials of the Women’s Health Initiative, modest but increasing gains in precision of regression coefficients were observed depending on the amount of cohort information used in the analysis. The gains were particularly evident for pseudo- or maximum likelihood estimates whose validity depends on the assumed model being correct. Larger standard errors were obtained for coefficients estimated by inverse probability weighted methods that are more robust to model misspecification. Such misspecification may have been responsible for an important difference in one key regression coefficient estimated using the weighted compared with the more efficient methods. 相似文献

15.

Parameter estimation in systems biology models using spline approximation

Choujun Zhan Lam F Yeung 《BMC systems biology》2011,5(1):14

Background

Mathematical models for revealing the dynamics and interactions properties of biological systems play an important role in computational systems biology. The inference of model parameter values from time-course data can be considered as a "reverse engineering" process and is still one of the most challenging tasks. Many parameter estimation methods have been developed but none of these methods is effective for all cases and can overwhelm all other approaches. Instead, various methods have their advantages and disadvantages. It is worth to develop parameter estimation methods which are robust against noise, efficient in computation and flexible enough to meet different constraints. 相似文献

16.

One-step targeted maximum likelihood estimation for time-to-event outcomes

Weixin Cai Mark J. van der Laan 《Biometrics》2020,76(3):722-733

Researchers in observational survival analysis are interested in not only estimating survival curve nonparametrically but also having statistical inference for the parameter. We consider right-censored failure time data where we observe n independent and identically distributed observations of a vector random variable consisting of baseline covariates, a binary treatment at baseline, a survival time subject to right censoring, and the censoring indicator. We assume the baseline covariates are allowed to affect the treatment and censoring so that an estimator that ignores covariate information would be inconsistent. The goal is to use these data to estimate the counterfactual average survival curve of the population if all subjects are assigned the same treatment at baseline. Existing observational survival analysis methods do not result in monotone survival curve estimators, which is undesirable and may lose efficiency by not constraining the shape of the estimator using the prior knowledge of the estimand. In this paper, we present a one-step Targeted Maximum Likelihood Estimator (TMLE) for estimating the counterfactual average survival curve. We show that this new TMLE can be executed via recursion in small local updates. We demonstrate the finite sample performance of this one-step TMLE in simulations and an application to a monoclonal gammopathy data. 相似文献

17.

Adjusting for selection bias in assessing treatment effect estimates from multiple subgroups

Ekkehard Glimm 《Biometrical journal. Biometrische Zeitschrift》2019,61(1):216-229

This paper discusses a number of methods for adjusting treatment effect estimates in clinical trials where differential effects in several subpopulations are suspected. In such situations, the estimates from the most extreme subpopulation are often overinterpreted. The paper focusses on the construction of simultaneous confidence intervals intended to provide a more realistic assessment regarding the uncertainty around these extreme results. The methods from simultaneous inference are compared with shrinkage estimates arising from Bayesian hierarchical models by discussing salient features of both approaches in a typical application. 相似文献

18.

Increasing data transparency and estimating phylogenetic uncertainty in supertrees: Approaches using nonparametric bootstrapping 总被引：3，自引：0，他引：3

Moore BR Smith SA Donoghue MJ 《Systematic biology》2006,55(4):662-676

The estimation of ever larger phylogenies requires consideration of alternative inference strategies, including divide-and-conquer approaches that decompose the global inference problem to a set of smaller, more manageable component problems. A prominent locus of research in this area is the development of supertree methods, which estimate a composite tree by combining a set of partially overlapping component topologies. Although promising, the use of component tree topologies as the primary data dissociates supertrees from complexities within the underling character data and complicates the evaluation of phylogenetic uncertainty. We address these issues by exploring three approaches that variously incorporate nonparametric bootstrapping into a common supertree estimation algorithm (matrix representation with parsimony, although any algorithm might be used), including bootstrap-weighting, source-tree bootstrapping, and hierarchical bootstrapping. We illustrate these procedures by means of hypothetical and empirical examples. Our preliminary experiments suggest that these methods have the potential to improve the correspondence of supertree estimates to those derived from simultaneous analysis of the combined data and to allow uncertainty in supertree topologies to be quantified. The ability to increase the transparency of supertrees to the underlying character data has several practical implications and sheds new light on an old debate. These methods have been implemented in the freely available program, tREeBOOT. 相似文献

19.

Analyzing time-to-event data in a clinical trial when an unknown proportion of subjects has experienced the event at entry

Balasubramanian R Lagakos SW 《Biometrics》2004,60(2):335-343

In some clinical trials, where the outcome is the time until development of a silent event, an unknown proportion of subjects who have already experienced the event will be unknowingly enrolled due to the imperfect nature of the diagnostic tests used to screen potential subjects. For example, commonly used diagnostic tests for evaluating HIV infection status in infants, such as DNA PCR and HIV Culture, have low sensitivity when given soon after infection. This can lead to the inclusion of an unknown proportion of HIV-infected infants into clinical trials aimed at the prevention of transmission from HIV-positive mothers to their infants through breastfeeding. The infection status of infants at the end of the trial, when they are more than a year of age, can be determined with certainty. For those infants found to be infected with HIV at the end of the trial, it cannot be determined whether this occurred during the study or whether they were already infected when they were enrolled. In these settings, estimates of the cumulative risk of the event by the end of the study will overestimate the true probability of event during the study period and hypothesis tests comparing two or more intervention strategies can also be biased. We present inference methods for the distribution of time until the event of interest in these settings, and investigate issues in the design of such trials when there is a choice of using both imperfect and perfect diagnostic tests. 相似文献

20.

Counting Cats: The integration of expert and citizen science data for unbiased inference of population abundance

Jenni L. McDonald Dave Hodgson 《Ecology and evolution》2021,11(9):4325

Free‐roaming animal populations are hard to count, and professional experts are a limited resource. There is vast untapped potential in the data collected by nonprofessional scientists who volunteer their time to population monitoring, but citizen science (CS) raises concerns around data quality and biases. A particular concern in abundance modeling is the presence of false positives that can occur due to misidentification of nontarget species. Here, we introduce Integrated Abundance Models (IAMs) that integrate citizen and expert data to allow robust inference of population abundance meanwhile accounting for biases caused by misidentification. We used simulation experiments to confirm that IAMs successfully remove the inflation of abundance estimates caused by false‐positive detections and can provide accurate estimates of both bias and abundance. We illustrate the approach with a case study on unowned domestic cats, which are commonly confused with owned, and infer their abundance by analyzing a combination of CS data and expert data. Our case study finds that relying on CS data alone, either through simple summation or via traditional modeling approaches, can vastly inflate abundance estimates. IAMs provide an adaptable framework, increasing the opportunity for further development of the approach, tailoring to specific systems and robust use of CS data. 相似文献