Similar Articles (20 results found)
1.
Inverse-probability-weighted estimators are the oldest and potentially most commonly used class of procedures for the estimation of causal effects. By adjusting for selection biases via a weighting mechanism, these procedures estimate an effect of interest by constructing a pseudopopulation in which selection biases are eliminated. Despite their ease of use, these estimators require the correct specification of a model for the weighting mechanism, are known to be inefficient, and suffer from the curse of dimensionality. We propose a class of nonparametric inverse-probability-weighted estimators in which the weighting mechanism is estimated via undersmoothing of the highly adaptive lasso, a nonparametric regression function proven to converge at nearly $n^{-1/3}$-rate to the true weighting mechanism. We demonstrate that our estimators are asymptotically linear with variance converging to the nonparametric efficiency bound. Unlike doubly robust estimators, our procedures require neither derivation of the efficient influence function nor specification of the conditional outcome model. Our theoretical developments have broad implications for the construction of efficient inverse-probability-weighted estimators in large statistical models and a variety of problem settings. We assess the practical performance of our estimators in simulation studies and demonstrate use of our proposed methodology with data from a large-scale epidemiologic study.
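Below is a minimal sketch, not the paper's estimator, of the basic inverse-probability-weighted point estimate this abstract builds on, assuming a binary treatment A, outcome Y, and covariates X; an ordinary logistic regression stands in for the undersmoothed highly adaptive lasso weighting mechanism.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, A, Y):
    """Horvitz-Thompson-style IPW estimate of the average treatment effect.

    The propensity model here is plain logistic regression, used only as a
    placeholder for the undersmoothed highly adaptive lasso in the paper.
    """
    ps = LogisticRegression(max_iter=1000).fit(X, A).predict_proba(X)[:, 1]
    ps = np.clip(ps, 1e-3, 1 - 1e-3)          # guard against extreme weights
    return np.mean(A * Y / ps) - np.mean((1 - A) * Y / (1 - ps))

# toy example with a known treatment effect of 2
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2 * A + X[:, 0] + rng.normal(size=n)
print(ipw_ate(X, A, Y))
```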

2.
Inference of population structure from genetic data plays an important role in population and medical genetics studies. With the advancement and decreasing cost of sequencing technology, increasingly available whole-genome sequencing data provide much richer information about the underlying population structure. The traditional method for computing and selecting the top principal components (PCs) that capture population structure, originally developed for array-based genotype data, may not perform well on sequencing data for two reasons. First, the number of genetic variants p is much larger than the sample size n in sequencing data, so the sample-to-marker ratio $n/p$ is nearly zero, violating the assumption of the Tracy-Widom test used in that method. Second, that method might not handle the linkage disequilibrium in sequencing data well. To resolve these two practical issues, we propose a new method called ERStruct to determine the number of top informative PCs based on sequencing data. More specifically, we propose to use the ratio of consecutive eigenvalues as a more robust test statistic, and we approximate its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets from the HapMap 3 and the 1000 Genomes Projects demonstrate the empirical performance of our ERStruct method.
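A rough illustration of the eigenvalue-ratio idea behind ERStruct, on simulated data: standardize the genotype matrix, compute its spectrum, and look at ratios of consecutive eigenvalues, which drop sharply once the informative PCs are exhausted. The "sharpest drop" rule used here is an ad hoc placeholder, not the random-matrix-theory null approximation derived in the paper.

```python
import numpy as np

def eigen_ratio_statistics(G):
    """Consecutive-eigenvalue ratios of a standardized genotype matrix G (n x p)."""
    Gs = (G - G.mean(axis=0)) / (G.std(axis=0) + 1e-12)   # center/scale each marker
    # eigenvalues of the n x n sample covariance (cheap when n << p), largest first
    eigvals = np.linalg.eigvalsh(Gs @ Gs.T / Gs.shape[1])[::-1]
    return eigvals[1:] / eigvals[:-1]          # ratio lambda_{k+1} / lambda_k

# toy data: 200 samples, 5000 markers, 2 latent ancestry components
rng = np.random.default_rng(1)
F = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5000))
G = F + rng.normal(size=(200, 5000))
ratios = eigen_ratio_statistics(G)
print(np.argmin(ratios[:10]) + 1, "informative PCs suggested by the sharpest drop")
```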

3.
Pragmatic trials evaluating health care interventions often adopt cluster randomization due to scientific or logistical considerations. Systematic reviews have shown that coprimary endpoints are not uncommon in pragmatic trials but are seldom recognized in sample size or power calculations. While methods for power analysis based on $K$ ($K \ge 2$) binary coprimary endpoints are available for cluster randomized trials (CRTs), to our knowledge, methods for continuous coprimary endpoints are not yet available. Assuming a multivariate linear mixed model (MLMM) that accounts for multiple types of intraclass correlation coefficients among the observations in each cluster, we derive the closed-form joint distribution of the $K$ treatment effect estimators to facilitate sample size and power determination with different types of null hypotheses under equal cluster sizes. We characterize the relationship between the power of each test and different types of correlation parameters. We further relax the equal cluster size assumption and approximate the joint distribution of the $K$ treatment effect estimators through the mean and coefficient of variation of cluster sizes. Our simulation studies with a finite number of clusters indicate that the power predicted by our method agrees well with the empirical power when the parameters in the MLMM are estimated via the expectation-maximization algorithm. An application to a real CRT is presented to illustrate the proposed method.
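A hedged sketch of the kind of joint power calculation described above, assuming the K standardized treatment effect statistics are jointly normal with a known correlation matrix and that success requires every endpoint to reject; the effect sizes and correlation below are illustrative placeholders, not quantities derived from the paper's MLMM.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def joint_power(means, corr, alpha=0.05):
    """P(all K endpoint tests reject) when the Z statistics are jointly normal.

    means: standardized effect sizes (noncentrality) for the K endpoints
    corr:  assumed correlation matrix of the K test statistics
    """
    z_crit = norm.ppf(1 - alpha / 2)
    lower = z_crit - np.asarray(means)           # thresholds for the centered statistics
    # P(W_k > lower_k for all k) with W ~ N(0, corr) equals the MVN CDF at -lower
    return multivariate_normal(mean=np.zeros(len(means)), cov=corr).cdf(-lower)

corr = np.array([[1.0, 0.3], [0.3, 1.0]])        # illustrative correlation between endpoints
print(joint_power(means=[2.8, 3.0], corr=corr))
```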

4.
Ye He, Ling Zhou, Yingcun Xia, Huazhen Lin. Biometrics, 2023, 79(3): 2157-2170
The existing methods for subgroup analysis can be roughly divided into two categories: finite mixture models (FMM) and regularization methods with an ℓ1-type penalty. In this paper, by introducing group centers and an ℓ2-type penalty into the loss function, we propose a novel center-augmented regularization (CAR) method; this method can be regarded as a unification of the regularization method and FMM, and hence is more efficient, more robust, and computationally simpler than existing methods. In particular, its computational complexity is reduced from the $O(n^2)$ of the conventional pairwise-penalty method to only $O(nK)$, where n is the sample size and K is the number of subgroups. The asymptotic normality of CAR is established, and the convergence of the algorithm is proven. CAR is applied to a dataset from a multicenter clinical trial, Buprenorphine in the Treatment of Opiate Dependence; it produces a larger $R^2$ and identifies three additional significant variables compared with the existing methods.
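A toy sketch of the center-augmented penalty idea: each subject-level effect is shrunk toward the nearest of K group centers, so the penalty only touches n × K pairs rather than the roughly n²/2 pairs of a pairwise fusion penalty. The quadratic form, fixed centers, and data below are illustrative assumptions, not the paper's full estimation algorithm.

```python
import numpy as np

def car_penalty(mu, centers, lam=1.0):
    """l2-type center-augmented penalty: lam * sum_i min_k (mu_i - c_k)^2.

    Cost is O(n*K): one n x K distance matrix, versus O(n^2) for a
    pairwise fusion penalty over all subject pairs.
    """
    d2 = (mu[:, None] - centers[None, :]) ** 2      # n x K squared distances
    assign = d2.argmin(axis=1)                      # nearest center per subject
    return lam * d2.min(axis=1).sum(), assign

rng = np.random.default_rng(2)
mu = np.concatenate([rng.normal(-2, 0.3, 50), rng.normal(2, 0.3, 50)])  # two latent subgroups
penalty, labels = car_penalty(mu, centers=np.array([-2.0, 2.0]))
print(penalty, np.bincount(labels))
```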

5.
This paper is motivated by studying differential brain activities in response to multiple experimental condition presentations in intracranial electroencephalography (iEEG) experiments. Contrasting effects of experimental conditions are often zero in most regions and nonzero in some local regions, yielding locally sparse functions. Such studies are essentially a function-on-scalar regression problem, with interest focused not only on estimating nonparametric functions but also on recovering their supports. We propose a weighted group bridge approach for simultaneous function estimation and support recovery in function-on-scalar mixed effect models, while accounting for heterogeneity present in functional data. We use B-splines to transform the sparsity of functions to its sparse vector counterpart of increasing dimension, and propose a fast nonconvex optimization algorithm using a nested alternating direction method of multipliers (ADMM) for estimation. Large sample properties are established. In particular, we show that the estimated coefficient functions are rate optimal in the minimax sense under the $L_2$ norm and resemble a phase transition phenomenon. For support estimation, we derive a convergence rate under the $L_\infty$ norm that leads to a selection consistency property under δ-sparsity, and obtain a result under strict sparsity using a simple sufficient regularity condition. An adjusted extended Bayesian information criterion is proposed for parameter tuning. The developed method is illustrated through simulations and an application to a novel iEEG data set to study multisensory integration.

6.
Rosenbaum PR. Biometrics, 2011, 67(3): 1017-1027
In an observational or nonrandomized study of treatment effects, a sensitivity analysis indicates the magnitude of bias from unmeasured covariates that would need to be present to alter the conclusions of a naïve analysis that presumes adjustments for observed covariates suffice to remove all bias. The power of a sensitivity analysis is the probability that it will reject a false hypothesis about treatment effects allowing for a departure from random assignment of a specified magnitude; in particular, if this specified magnitude is "no departure," then this is the same as the power of a randomization test in a randomized experiment. A new family of u-statistics is proposed that includes Wilcoxon's signed rank statistic but also includes other statistics with substantially higher power when a sensitivity analysis is performed in an observational study. Wilcoxon's statistic has high power to detect small effects in large randomized experiments (that is, it often has good Pitman efficiency), but small effects are invariably sensitive to small unobserved biases. Members of this family of u-statistics that emphasize medium to large effects can have substantially higher power in a sensitivity analysis. For example, in one situation with 250 pair differences that are Normal with expectation 1/2 and variance 1, the power of a sensitivity analysis that uses Wilcoxon's statistic is 0.08 while the power of another member of the family of u-statistics is 0.66. The topic is examined by performing a sensitivity analysis in three observational studies, using an asymptotic measure called the design sensitivity, and by simulating power in finite samples. The three examples are drawn from epidemiology, clinical medicine, and genetic toxicology.
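A Monte Carlo sketch of a sensitivity-analysis power calculation of this kind, using the standard large-sample bound for Wilcoxon's signed-rank statistic: under sensitivity parameter Γ the worst-case null mean and variance are built from p₊ = Γ/(1+Γ). The simulation setting (250 pair differences, Normal with mean 1/2 and variance 1) follows the abstract, but the normal approximation and the Γ values printed are illustrative, not the paper's exact computations.

```python
import numpy as np
from scipy.stats import norm

def signed_rank_stat(d):
    """Wilcoxon signed-rank statistic: sum of ranks of |d| over pairs with d > 0."""
    ranks = np.argsort(np.argsort(np.abs(d))) + 1
    return ranks[d > 0].sum()

def sens_power(n=250, effect=0.5, gamma=2.0, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo power of a one-sided sensitivity analysis at a given Gamma."""
    p_plus = gamma / (1 + gamma)
    mu0 = p_plus * n * (n + 1) / 2                              # worst-case null mean
    v0 = p_plus * (1 - p_plus) * n * (n + 1) * (2 * n + 1) / 6  # worst-case null variance
    crit = mu0 + norm.ppf(1 - alpha) * np.sqrt(v0)
    rng = np.random.default_rng(seed)
    rejections = sum(signed_rank_stat(rng.normal(effect, 1, n)) > crit for _ in range(reps))
    return rejections / reps

print(sens_power(gamma=1.0))   # Gamma = 1 reduces to the randomization-test power
print(sens_power(gamma=2.0))   # power of the sensitivity analysis at Gamma = 2
```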

7.
For ordinal outcomes, the average treatment effect is often ill-defined and hard to interpret. Echoing Agresti and Kateri, we argue that the relative treatment effect can be a useful measure, especially for ordinal outcomes; it is defined as $\gamma = \mathrm{pr}\{Y_i(1) > Y_i(0)\} - \mathrm{pr}\{Y_i(1) < Y_i(0)\}$, with $Y_i(1)$ and $Y_i(0)$ being the potential outcomes of unit $i$ under treatment and control, respectively. Given the marginal distributions of the potential outcomes, we derive the sharp bounds on $\gamma$, which are identifiable parameters based on the observed data. Agresti and Kateri focused on modeling strategies under the assumption of independent potential outcomes, but we allow for arbitrary dependence.
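One hedged way to compute sharp bounds of this type numerically for an ordinal outcome: given the two marginal distributions, optimize γ over all joint couplings with those marginals, a small linear program. The marginal probabilities below are invented for illustration, and the LP route is a generic approach rather than necessarily the closed-form bounds derived in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def gamma_bounds(p1, p0):
    """Sharp bounds on gamma = P(Y(1) > Y(0)) - P(Y(1) < Y(0)) given marginals p1, p0."""
    J = len(p1)
    # objective weight for joint cell q[j, k]: +1 if level j > k, -1 if j < k
    sign = np.sign(np.subtract.outer(np.arange(J), np.arange(J))).ravel()
    A_eq = np.vstack([np.kron(np.eye(J), np.ones(J)),       # row sums equal p1
                      np.kron(np.ones(J), np.eye(J))])      # column sums equal p0
    b_eq = np.concatenate([p1, p0])
    lo = linprog(sign, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    hi = -linprog(-sign, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).fun
    return lo, hi

p1 = np.array([0.1, 0.3, 0.6])   # hypothetical marginal of Y(1) over 3 ordinal levels
p0 = np.array([0.4, 0.4, 0.2])   # hypothetical marginal of Y(0)
print(gamma_bounds(p1, p0))
```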

8.
Data-driven methods for personalizing treatment assignment have garnered much attention from clinicians and researchers. Dynamic treatment regimes formalize this through a sequence of decision rules that map individual patient characteristics to a recommended treatment. Observational studies are commonly used for estimating dynamic treatment regimes due to the potentially prohibitive costs of conducting sequential multiple assignment randomized trials. However, estimating a dynamic treatment regime from observational data can lead to bias in the estimated regime due to unmeasured confounding. Sensitivity analyses are useful for assessing how robust the conclusions of the study are to a potential unmeasured confounder. A Monte Carlo sensitivity analysis is a probabilistic approach that involves positing and sampling from distributions for the parameters governing the bias. We propose a method for performing a Monte Carlo sensitivity analysis of the bias due to unmeasured confounding in the estimation of dynamic treatment regimes. We demonstrate the performance of the proposed procedure with a simulation study and apply it to an observational study examining tailoring the use of antidepressant medication for reducing symptoms of depression using data from Kaiser Permanente Washington.
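A bare-bones sketch of the Monte Carlo sensitivity-analysis idea in a far simpler setting than dynamic treatment regimes: posit a prior for a bias parameter, repeatedly sample it, subtract the implied bias from the naive effect estimate, and summarize the resulting distribution. The additive-bias form and the normal priors are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def mc_sensitivity(naive_effect, naive_se, bias_mean, bias_sd, reps=10000, seed=0):
    """Distribution of the bias-corrected effect under a posited prior for the bias."""
    rng = np.random.default_rng(seed)
    bias = rng.normal(bias_mean, bias_sd, reps)              # sampled confounding bias
    est = rng.normal(naive_effect, naive_se, reps) - bias    # corrected effect draws
    return np.quantile(est, [0.025, 0.5, 0.975])

# naive estimate of 1.2 (SE 0.3); unmeasured confounding believed to inflate it by about 0.4
print(mc_sensitivity(1.2, 0.3, bias_mean=0.4, bias_sd=0.2))
```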

9.
K.O. Ekvall, M. Bottai. Biometrics, 2023, 79(3): 2286-2297
We propose a unified framework for likelihood-based regression modeling when the response variable has finite support. Our work is motivated by the fact that, in practice, observed data are discrete and bounded. The proposed methods assume a model which includes models previously considered for interval-censored variables with log-concave distributions as special cases. The resulting log-likelihood is concave, which we use to establish asymptotic normality of its maximizer as the number of observations n tends to infinity with the number of parameters d fixed, and rates of convergence of $L_1$-regularized estimators when the true parameter vector is sparse and d and n both tend to infinity with $\log(d)/n \rightarrow 0$. We consider an inexact proximal Newton algorithm for computing estimates and give theoretical guarantees for its convergence. The range of possible applications is wide, including but not limited to survival analysis in discrete time, the modeling of outcomes on scored surveys and questionnaires, and, more generally, interval-censored regression. The applicability and usefulness of the proposed methods are illustrated in simulations and data examples.

10.
Unmeasured confounders are a common problem in drawing causal inferences in observational studies. VanderWeele (Biometrics 2008, 64, 702–706) presented a theorem that allows researchers to determine the sign of the unmeasured confounding bias when monotonic relationships hold between the unmeasured confounder and the treatment, and between the unmeasured confounder and the outcome. He showed that his theorem can be applied to causal effects with the total group as the standard population, but he did not mention the causal effects with treated and untreated groups as the standard population. Here, we extend his results to these causal effects, and apply our theorems to an observational study. When researchers have a sense of what the unmeasured confounder may be, conclusions can be drawn about the sign of the bias.

11.
In identifying subgroups of a heterogeneous disease or condition, it is often desirable to identify both the observations and the features which differ between subgroups. For instance, it may be that there is a subgroup of individuals with a certain disease who differ from the rest of the population based on the expression profile for only a subset of genes. Identifying the subgroup of patients and subset of genes could lead to better-targeted therapy. We can represent the subgroup of individuals and genes as a bicluster, a submatrix, U, of a larger data matrix, X, such that the features and observations in U differ from those not contained in U. We present a novel two-step method, SC-Biclust, for identifying U. In the first step, the observations in the bicluster are identified to maximize the sum of the weighted between-cluster feature differences. In the second step, features in the bicluster are identified based on their contribution to the clustering of the observations. This versatile method can be used to identify biclusters that differ on the basis of feature means, feature variances, or more general differences. The bicluster identification accuracy of SC-Biclust is illustrated through several simulation studies. Application of SC-Biclust to pain research illustrates its ability to identify biologically meaningful subgroups.

12.
Lok JJ, Degruttola V. Biometrics, 2012, 68(3): 745-754
We estimate how the effect of antiretroviral treatment depends on the time from HIV infection to initiation of treatment, using observational data. A major challenge in making inferences from such observational data arises from biases associated with the nonrandom assignment of treatment, for example bias induced by dependence of the time of initiation on disease status. To address this concern, we develop a new class of Structural Nested Mean Models (SNMMs) to estimate the impact of the time of initiation of treatment after infection on an outcome measured a fixed duration after initiation, compared to the effect of not initiating treatment. This leads to an SNMM that models the effect of multiple dosages of treatment on a time-dependent outcome, in contrast to most existing SNMMs, which focus on the effect of one dosage of treatment on an outcome measured at the end of the study. Our identifying assumption is that there are no unmeasured confounders. We illustrate our methods using the observational Acute Infection and Early Disease Research Program (AIEDRP) Core01 database on HIV. The current standard of care in HIV-infected patients is Highly Active Anti-Retroviral Treatment (HAART); however, the optimal time to start HAART has not yet been identified. The new class of SNMMs allows estimation of the dependence of the effect of 1 year of HAART on the time between the estimated date of infection and treatment initiation, and on patient characteristics. Results of fitting this model imply that early use of HAART substantially improves immune reconstitution in the early and acute phase of HIV infection.

13.
Vanderweele TJ. Biometrics, 2008, 64(3): 702-706
Unmeasured confounding variables are a common problem in drawing causal inferences in observational studies. A theorem is given which in certain circumstances allows the researcher to draw conclusions about the sign of the bias of unmeasured confounding. Specifically, it is possible to determine the sign of the bias when monotonicity relationships hold between the unmeasured confounding variable and the treatment, and between the unmeasured confounding variable and the outcome. Some discussion is given of the conditions under which the theorem applies and the strengths and limitations of using the theorem to assess the sign of the bias of unmeasured confounding.

14.
No tillage (NT) has been proposed as a practice to reduce the adverse effects of tillage on contaminant (e.g., sediment and nutrient) losses to waterways. Nonetheless, previous reports on the impacts of NT on nitrate (NO3−) leaching are inconsistent. A global meta-analysis was conducted to test the hypothesis that the response of NO3− leaching under NT, relative to tillage, is associated with tillage type (inversion vs. non-inversion tillage), soil properties (e.g., soil organic carbon [SOC]), climate factors (i.e., water input), and management practices (e.g., NT duration and nitrogen fertilizer inputs). Overall, compared with all forms of tillage combined, NT had 4% and 14% greater area-scaled and yield-scaled NO3− leaching losses, respectively. The NO3− leaching under NT tended to be 7% greater than that of inversion tillage but comparable to non-inversion tillage. Greater NO3− leaching under NT, compared with inversion tillage, was most evident under short-duration NT (<5 years), where water inputs were low (<2 mm day−1), in medium-texture and low-SOC (<1%) soils, and at both higher (>200 kg ha−1) and lower (0–100 kg ha−1) rates of nitrogen addition. Of these, SOC was the most important factor affecting the risk of NO3− leaching under NT compared with inversion tillage. Globally, on average, the greater amount of NO3− leached under NT, compared with inversion tillage, was mainly attributed to corresponding increases in drainage. The percentage of global cropping land with lower risk of NO3− leaching under NT, relative to inversion tillage, increased with NT duration from 3 years (31%) to 15 years (54%). This study highlighted that the benefits of NT adoption for mitigating NO3− leaching are most likely in long-term NT cropping systems on high-SOC soils.

15.
We study the effect of misclassification of a binary covariate on the parameters of a logistic regression model. In particular, we consider 2 × 2 × 2 tables. We assume that a binary covariate is subject to misclassification that may depend on the observed outcome. This type of misclassification is known as (outcome-dependent) differential misclassification. We examine the resulting asymptotic bias in the parameters of the model and derive formulas for the biases and their approximations as a function of the odds and misclassification probabilities. Conditions for unbiased estimation are also discussed. The implications are illustrated numerically using a case-control study. For completeness, we briefly examine the effect of covariate-dependent misclassification of exposures and of outcomes.
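A small numeric illustration, with made-up numbers, of the bias being studied: a binary covariate is misclassified with sensitivity and specificity that depend on the outcome, and the observed odds ratio is compared with the true one. This collapses the problem to a single 2 × 2 comparison rather than the paper's 2 × 2 × 2 setting.

```python
import numpy as np

def observed_odds_ratio(p_exp, sens, spec):
    """Observed exposure odds ratio under (possibly differential) misclassification.

    p_exp[g]: true exposure prevalence in outcome group g (0 = control, 1 = case)
    sens[g], spec[g]: misclassification parameters, allowed to differ by outcome
    """
    p_obs = sens * p_exp + (1 - spec) * (1 - p_exp)   # apparent exposure prevalence
    odds = p_obs / (1 - p_obs)
    return odds[1] / odds[0]

p_exp = np.array([0.20, 0.40])           # true exposure prevalence in controls, cases
true_or = (0.40 / 0.60) / (0.20 / 0.80)  # true odds ratio = 2.67
# differential misclassification: cases report exposure more accurately than controls
obs_or = observed_odds_ratio(p_exp, sens=np.array([0.70, 0.95]), spec=np.array([0.95, 0.90]))
print(round(true_or, 2), round(obs_or, 2))
```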

16.
In this work, we applied a multi-information-source modeling technique to solve a multi-objective Bayesian optimization problem involving the simultaneous minimization of cost and maximization of growth for serum-free C2C12 cells, using a hypervolume improvement acquisition function. In sequential batches of custom media experiments designed using our Bayesian criteria and collected using multiple assays targeting different cellular growth dynamics, the algorithm learned to identify the trade-off relationship between long-term growth and cost. We were able to identify several media with >100% more growth of C2C12 cells than the control, as well as a medium with 23% more growth at only 62.5% of the cost of the control. These algorithmically generated media also maintained growth far past the study period, indicating that the modeling approach approximates the cell growth well from an extremely limited data set.

17.
A control theory perspective on determination of optimal dynamic treatment regimes is considered. The aim is to adapt statistical methodology that has been developed for medical or other biostatistical applications to incorporate powerful control techniques that have been designed for engineering or other technological problems. Data tend to be sparse and noisy in the biostatistical area and interest has tended to be in statistical inference for treatment effects. In engineering fields, experimental data can be more easily obtained and reproduced and interest is more often in performance and stability of proposed controllers rather than modeling and inference per se. We propose that modeling and estimation should be based on standard statistical techniques but subsequent treatment policy should be obtained from robust control. To bring focus, we concentrate on A-learning methodology as developed in the biostatistical literature and $H_\infty$-synthesis from control theory. Simulations and two applications demonstrate robustness of the $H_\infty$ strategy compared to standard A-learning in the presence of model misspecification or measurement error.

18.
Unbiased estimates of burrowing owl (Athene cunicularia) populations are essential to achieving diverse management and conservation objectives. We conducted visibility trials and developed logistic regression models to identify and correct for visibility bias associated with single, vehicle-based, visual survey occasions of breeding male owls during daylight hours in an agricultural landscape in California between 30 April and 2 May 2007. Visibility was predicted best by a second-degree polynomial function of time of day and 7 categorical perch types. Probability of being visible was highest in the afternoon, and individuals that flushed, flew, or perched on hay bales were highly visible (>0.85). Visibility was lowest in agricultural fields (<0.46) and nonagricultural vegetation (<0.77). We used the results from this model to compute unbiased maximum likelihood estimates of visibility bias, and combined these with estimated probabilities of availability bias to validate our model by correcting for visibility and availability biases in 4 independent datasets collected during morning hours. Correcting for both biases produced reliable estimates of abundance in all 4 independent validation datasets. We recommend that estimates of burrowing owl abundance from surveys in the southwest United States correct for both visibility and availability biases. © 2011 The Wildlife Society.
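A hedged sketch of the general correction idea applied here: fit a logistic model for the probability that a present owl is visible, then weight each detection by the inverse of its predicted visibility to estimate abundance. The covariates (a quadratic in time of day plus a categorical perch type) mirror the abstract's description, but the data and coefficients below are simulated placeholders, not the study's fitted model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# simulated visibility-trial data: known owls, whether each was seen,
# time of day (hours) and perch type at the time of the trial
rng = np.random.default_rng(3)
n = 400
trials = pd.DataFrame({
    "hour": rng.uniform(6, 20, n),
    "perch": rng.choice(["bale", "field", "vegetation"], n),
})
perch_effect = np.where(trials.perch == "bale", 1.5,
                        np.where(trials.perch == "field", -1.0, 0.0))
p_true = 1 / (1 + np.exp(-(-6 + 0.9 * trials.hour - 0.03 * trials.hour**2 + perch_effect)))
trials["seen"] = rng.binomial(1, p_true)

# second-degree polynomial in time of day plus categorical perch type
fit = smf.logit("seen ~ hour + I(hour**2) + C(perch)", data=trials).fit(disp=0)

# correct a survey count: each detection contributes 1 / predicted visibility
survey = trials.sample(100, random_state=1)
p_hat = fit.predict(survey)
corrected_abundance = (survey.seen / p_hat).sum()    # Horvitz-Thompson-style correction
print(round(corrected_abundance, 1), "corrected vs", int(survey.seen.sum()), "raw detections")
```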

19.
Hung Hung. Biometrics, 2019, 75(2): 650-662
Identification of differentially expressed genes (DE genes) is commonly conducted in modern biomedical research. However, unwanted variation inevitably arises during the data collection process, which can make the detection results heavily biased. Various methods have been suggested for removing the unwanted variation while keeping the biological variation to ensure a reliable analysis result. Removing unwanted variation (RUV) has recently been proposed for this purpose, which works by virtue of negative control genes. On the other hand, outliers frequently appear in modern high-throughput genetic data, which can heavily affect the performance of RUV and its downstream analysis. In this work, we propose a robust RUV-testing procedure (a robust RUV step to remove unwanted variation, followed by a robust testing procedure to identify DE genes) via γ-divergence. The advantages of our method are twofold: (a) it does not involve any modeling of the outlier distribution, which makes it applicable to various situations; (b) it is easy to implement, in the sense that its robustness is controlled by a single tuning parameter γ of the γ-divergence, and a data-driven criterion is developed to select γ. When applied to real data sets, our method successfully removes unwanted variation and identifies more DE genes than conventional methods.

20.
Linda M. Haines. Biometrics, 2020, 76(2): 540-548
Multinomial N-mixture models are commonly used to fit data from a removal sampling protocol. If the mixing distribution is negative binomial, the distribution of the counts does not appear to have been identified, and practitioners approximate the requisite likelihood by placing an upper bound on the embedded infinite sum. In this paper, the distribution which underpins the multinomial N-mixture model with a negative binomial mixing distribution is shown to belong to the broad class of multivariate negative binomial distributions. Specifically, the likelihood can be expressed in closed form as the product of conditional and marginal likelihoods, and the information matrix is shown to be block diagonal. As a consequence, the nature of the maximum likelihood estimates of the unknown parameters and their attendant standard errors can be examined, and tests of the hypothesis of the Poisson against the negative binomial mixing distribution can be formulated. In addition, appropriate multinomial N-mixture models for data sets which include zero site totals can also be constructed. Two illustrative examples are provided.
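A hedged sketch of the conditional-times-marginal factorization described above, for a single site with J removal passes, per-pass capture probability p, and negative binomial abundance with size r and mean μ: conditional on the total caught, the pass counts are multinomial, and the total caught is itself negative binomial with a thinned mean. The single-site setting and parameter names are illustrative simplifications, not the paper's general formulation.

```python
import numpy as np
from scipy.stats import multinomial, nbinom

def removal_loglik(y, p, r, mu):
    """Log-likelihood of removal counts y = (y_1, ..., y_J) at one site.

    Factorization: a multinomial for how the total splits across passes,
    times a negative binomial marginal for the total caught.
    """
    y = np.asarray(y)
    J, total = len(y), y.sum()
    pi_j = p * (1 - p) ** np.arange(J)      # probability of first capture on pass j
    pi = pi_j.sum()                         # probability of capture on any pass
    cond = multinomial.logpmf(y, n=total, p=pi_j / pi)
    # thinning a NB(size r, mean mu) by pi gives NB(size r, mean mu * pi)
    marg = nbinom.logpmf(total, n=r, p=r / (r + mu * pi))
    return cond + marg

print(removal_loglik(y=[14, 7, 3], p=0.5, r=2.0, mu=30.0))
```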
