Similar Articles
2.
We are interested in the estimation of average treatment effects based on right-censored data of an observational study. We focus on causal inference of differences between t-year absolute event risks in a situation with competing risks. We derive doubly robust estimation equations and implement estimators for the nuisance parameters based on working regression models for the outcome, censoring, and treatment distribution conditional on auxiliary baseline covariates. We use the functional delta method to show that these estimators are regular asymptotically linear estimators and estimate their variances based on estimates of their influence functions. In empirical studies, we assess the robustness of the estimators and the coverage of confidence intervals. The methods are further illustrated using data from a Danish registry study.
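The doubly robust construction can be sketched in the simpler uncensored setting (the paper's version additionally models the censoring distribution and handles competing risks). The data-generating model, working models, and all variable names below are illustrative, not from the study:

```python
# Hedged sketch: augmented IPW (doubly robust) estimation of an average
# treatment effect, with the variance estimated from the empirical
# influence function, as the abstract describes for the censored case.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)                       # baseline covariate
p = 1 / (1 + np.exp(-x))                     # true propensity P(A=1|X)
a = rng.binomial(1, p)                       # treatment indicator
y = 2.0 * a + x + rng.normal(size=n)         # outcome; true ATE = 2

# Working models (here set to the true functional forms for clarity):
e_hat = p                                    # propensity model
m1_hat = 2.0 + x                             # outcome model E[Y | A=1, X]
m0_hat = x                                   # outcome model E[Y | A=0, X]

# AIPW estimating equation: average the efficient influence-function terms.
psi1 = m1_hat + a * (y - m1_hat) / e_hat
psi0 = m0_hat + (1 - a) * (y - m0_hat) / (1 - e_hat)
ate_hat = float(np.mean(psi1 - psi0))

# Variance from the estimated influence function, as in the abstract.
se_hat = float(np.std(psi1 - psi0, ddof=1) / np.sqrt(n))
```

The estimator stays consistent if either the outcome models or the propensity model is correct, which is the "doubly robust" property the abstract refers to.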

3.
We have investigated simulation-based techniques for parameter estimation in chaotic intercellular networks. The proposed methodology combines a synchronization-based framework for parameter estimation in coupled chaotic systems with state-of-the-art computational inference methods borrowed from the field of computational statistics. The first method is a stochastic optimization algorithm, known as the accelerated random search method, and the other two techniques are based on approximate Bayesian computation (ABC), a general methodology for non-parametric inference that can be applied to practically any system of interest. The first ABC-based method is a Markov chain Monte Carlo scheme that generates a series of random parameter realizations for which a low synchronization error is guaranteed. We show that accurate parameter estimates can be obtained by averaging over these realizations. The second ABC-based technique is a sequential Monte Carlo scheme. The algorithm generates a sequence of "populations", i.e., sets of randomly generated parameter values, where the members of a given population attain a synchronization error smaller than the error attained by members of the previous population. Again, we show that accurate estimates can be obtained by averaging over the parameter values in the last population of the sequence. We have also analysed how effective these methods are from a computational perspective. For the numerical simulations we considered a network consisting of two modified repressilators with identical parameters, coupled by the fast diffusion of the autoinducer across the cell membranes.
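The ABC idea underlying both schemes can be sketched with a deliberately simple toy model: a Gaussian mean instead of coupled repressilators, plain rejection sampling instead of the MCMC or SMC variants, and a summary-statistic distance standing in for the synchronization error. All numbers below are illustrative:

```python
# Hedged toy sketch of approximate Bayesian computation by rejection:
# draw parameters from the prior, simulate the model, and keep draws
# whose simulated summary is close to the observed one.
import numpy as np

rng = np.random.default_rng(1)
theta_true = 3.0
data = rng.normal(theta_true, 1.0, size=200)
s_obs = data.mean()                           # observed summary statistic

accepted = []
for _ in range(20000):
    theta = rng.uniform(0.0, 6.0)             # draw from the prior
    sim = rng.normal(theta, 1.0, size=200)    # simulate the model
    if abs(sim.mean() - s_obs) < 0.05:        # rejection step with tolerance
        accepted.append(theta)

# As in the abstract, the point estimate averages the accepted realizations.
theta_hat = float(np.mean(accepted))
```

ABC-MCMC and ABC-SMC improve on this by proposing parameters near previously accepted ones and by shrinking the tolerance across successive populations, respectively.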

4.
Many estimators of the average effect of a treatment on an outcome require estimation of the propensity score, the outcome regression, or both. It is often beneficial to utilize flexible techniques, such as semiparametric regression or machine learning, to estimate these quantities. However, optimal estimation of these regressions does not necessarily lead to optimal estimation of the average treatment effect, particularly in settings with strong instrumental variables. A recent proposal addressed these issues via the outcome-adaptive lasso, a penalized regression technique for estimating the propensity score that seeks to minimize the impact of instrumental variables on treatment effect estimators. However, a notable limitation of this approach is that its application is restricted to parametric models. We propose a more flexible alternative that we call the outcome highly adaptive lasso. We discuss the large sample theory for this estimator and propose closed-form confidence intervals based on the proposed estimator. We show via simulation that our method offers benefits over several popular approaches.

5.
Practical identifiability of systems biology models has received much attention in recent research. It addresses a question crucial for a model's predictive power: how accurately can the model's parameters be recovered from the available experimental data? Methods based on the profile likelihood are among the most reliable approaches to practical identifiability. However, these methods are often computationally demanding or lead to inaccurate estimates of parameters' confidence intervals. Developing methods that can accurately produce parameters' confidence intervals in reasonable computational time is of utmost importance for systems biology and QSP modeling. We propose an algorithm, Confidence Intervals by Constraint Optimization (CICO), based on the profile likelihood and designed to speed up confidence interval estimation and reduce computational cost. The numerical implementation of the algorithm includes settings to control the accuracy of the confidence interval estimates. The algorithm was tested on a number of systems biology models, including the Taxol treatment model and the STAT5 dimerization model discussed in the current article. The CICO algorithm is implemented in a software package freely available in Julia (https://github.com/insysbio/LikelihoodProfiler.jl) and Python (https://github.com/insysbio/LikelihoodProfiler.py).
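The profile-likelihood criterion that such methods work with can be illustrated on a one-parameter Poisson model, where there are no nuisance parameters to profile out and the confidence bounds are simply the points where the log-likelihood drops by half the chi-squared quantile. This is a sketch of the criterion only, not of the CICO constraint-optimization algorithm; the data and cutoff are illustrative:

```python
# Hedged sketch: likelihood-based confidence bounds for a Poisson rate,
# found by bisection on the log-likelihood drop from its maximum.
import math

counts = [3, 7, 4, 6, 5, 4, 8, 5, 6, 2]      # toy data
n, s = len(counts), sum(counts)
lam_mle = s / n                               # MLE of the Poisson rate

def loglik(lam):
    # log-likelihood up to an additive constant: sum(y)*log(lam) - n*lam
    return s * math.log(lam) - n * lam

CUTOFF = 3.841 / 2.0                          # chi^2_1 0.95 quantile / 2

def boundary(lo, hi):
    # bisection for the root of loglik(lam_mle) - loglik(lam) - CUTOFF
    f = lambda lam: loglik(lam_mle) - loglik(lam) - CUTOFF
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

ci_lo = boundary(1e-6, lam_mle)               # left boundary, below the MLE
ci_hi = boundary(lam_mle, 50.0)               # right boundary, above the MLE
```

With nuisance parameters present, each evaluation of the profile requires a constrained optimization, which is the expensive step CICO is designed to accelerate.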

6.
Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88–0.98), only 5 of 10 cancer types showed successful within-tissue prediction (within-tissue ρ = 0.11–0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as exclusion of between-tissue signals leads to a decrease in Spearman's ρ from a range of 0.43–0.62 to 0.30–0.51. In practice, joint analysis of multiple cancer types usually has a larger sample size, and hence greater power, than an analysis of a single cancer type; and we observe that the higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to this sample-size advantage. The success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group individual variation is essential for robust inference.

7.
Accurate prediction of RNA pseudoknotted secondary structures from the base sequence is a challenging computational problem. Since prediction algorithms rely on thermodynamic energy models to identify low-energy structures, prediction accuracy relies in large part on the quality of free energy change parameters. In this work, we use our earlier constraint generation and Boltzmann likelihood parameter estimation methods to obtain new energy parameters for two energy models for secondary structures with pseudoknots, namely, the Dirks–Pierce (DP) and the Cao–Chen (CC) models. To train our parameters, and also to test their accuracy, we create a large data set of both pseudoknotted and pseudoknot-free secondary structures. In addition to structural data our training data set also includes thermodynamic data, for which experimentally determined free energy changes are available for sequences and their reference structures. When incorporated into the HotKnots prediction algorithm, our new parameters result in significantly improved secondary structure prediction on our test data set. Specifically, the prediction accuracy when using our new parameters improves from 68% to 79% for the DP model, and from 70% to 77% for the CC model.

8.
When predicting population dynamics, the predicted value alone is not enough: it should be accompanied by a confidence interval that integrates the whole chain of errors, from observations to predictions via the estimates of the model's parameters. Matrix models are often used to predict the dynamics of age- or size-structured populations; their parameters are vital rates. This study aims (1) to assess the impact of the variability of observations on vital rates, and then on the model's predictions, and (2) to compare three methods for computing confidence intervals for values predicted from the model. The first method is the bootstrap. The second method is analytic: it approximates the standard error of predictions by their asymptotic variance as the sample size tends to infinity. The third method combines the bootstrap, to estimate the standard errors of the vital rates, with the analytic method, to then estimate the errors of the model's predictions. Computations are done for an Usher matrix model that predicts the asymptotic (as time goes to infinity) stock recovery rate for three timber species in French Guiana. Little difference is found between the hybrid and the analytic method. Their estimates of bias and standard error converge towards the bootstrap estimates when the error on vital rates becomes small enough, which corresponds in the present case to more than 5000 observed trees.
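Two of the three interval methods (the bootstrap and the analytic delta-method approximation) can be sketched for a much simpler derived "model output" than an Usher-matrix recovery rate. The data and the derived quantity below are synthetic and purely illustrative:

```python
# Hedged sketch: bootstrap vs. delta-method standard errors for a
# prediction derived from estimated rates, here theta = log(mean rate).
import numpy as np

rng = np.random.default_rng(2)
rates = rng.uniform(0.6, 0.9, size=400)       # toy "vital rate" observations

mean_hat = rates.mean()
theta_hat = np.log(mean_hat)                  # prediction derived from the rate

# Analytic (delta method): Var(log m) ~= Var(m) / m^2
se_mean = rates.std(ddof=1) / np.sqrt(len(rates))
se_delta = float(se_mean / mean_hat)

# Bootstrap: resample observations and recompute the prediction each time
boot = [np.log(rng.choice(rates, size=len(rates), replace=True).mean())
        for _ in range(2000)]
se_boot = float(np.std(boot, ddof=1))
```

As the abstract reports for the Usher model, the two standard errors agree closely once the sampling error in the underlying rates is small.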

9.
Background: Knowledge of accurate gestational age is required for comprehensive pregnancy care and is an essential component of research evaluating causes of preterm birth. In industrialised countries gestational age is determined with the help of fetal biometry in early pregnancy. Lack of ultrasound and late presentation to antenatal clinics limits this practice in low-resource settings. Instead, clinical estimators of gestational age are used, but their accuracy remains a matter of debate.
Methods: In a cohort of 688 singleton pregnancies from rural Papua New Guinea, gestational age at delivery was calculated from Ballard score, last menstrual period, symphysis-pubis fundal height at first visit and quickening, as well as mid- and late-pregnancy fetal biometry. Published models using sequential fundal height measurements and corrected last menstrual period to estimate gestational age were also tested. Novel linear models that combined clinical measurements for gestational age estimation were developed. Predictions were compared with the reference early-pregnancy ultrasound (<25 gestational weeks) using correlation, regression and Bland-Altman analyses, and ranked for their capability to predict preterm birth using the harmonic mean of recall and precision (F-measure).
Results: Average bias between reference ultrasound and clinical methods ranged from 0–11 days (95% confidence levels: 14–42 days). Preterm birth was best predicted by mid-pregnancy ultrasound (F-measure: 0.72), and the neuromuscular Ballard score provided the least reliable preterm birth prediction (F-measure: 0.17). The best clinical methods to predict gestational age and preterm birth were last menstrual period and fundal height (F-measure: 0.35). A linear model combining both measures improved prediction of preterm birth (F-measure: 0.58).
Conclusions: Estimation of gestational age without ultrasound is prone to significant error. In the absence of ultrasound facilities, last menstrual period and fundal height are among the more reliable clinical measures. This study underlines the importance of strengthening ultrasound facilities and developing novel ways to estimate gestational age.

10.
Marques TA. Biometrics 2004, 60(3): 757–763
Line transect sampling is one of the most widely used methods for animal abundance assessment. Standard estimation methods assume certain detection on the transect, no animal movement, and no measurement errors. Failure of these assumptions can cause substantial bias. In this work, the effect of measurement error on line transect estimators is investigated. Based on considerations of the process generating the errors, a multiplicative error model is presented and a simple way of correcting estimates based on knowledge of the error distribution is proposed. Using beta models for the error distribution, the effect of errors and of the proposed correction is assessed by simulation. Adequate confidence intervals for the corrected estimates are obtained using a bootstrap variance estimate for the correction and the delta method. As noted by Chen (1998, Biometrics 54, 899–908), even unbiased estimators of the distances might lead to biased density estimators, depending on the actual error distribution. In contrast with the findings of Chen, who used an additive model, unbiased estimation of distances under a multiplicative model leads to overestimation of density. Some error distributions result in observed distance distributions that make efficient estimation impossible, by removing the shoulder present in the original detection function. This indicates the need to improve field methods to reduce measurement error. An application of the new methods to a real data set is presented.

12.
Li Y, Guolo A, Hoffman FO, Carroll RJ. Biometrics 2007, 63(4): 1226–1236
In radiation epidemiology, it is often necessary to use mathematical models in the absence of direct measurements of individual doses. When complex models are used as surrogates for direct measurements to estimate individual doses that occurred almost 50 years ago, dose estimates will be associated with considerable error, this error being a mixture of (a) classical measurement error due to individual data such as diet histories and (b) Berkson measurement error associated with various aspects of the dosimetry system. In the Nevada Test Site (NTS) Thyroid Disease Study, the Berkson measurement errors are correlated within strata. This article concerns the development of statistical methods for inference about the effect of radiation dose on thyroid disease risk, methods that account for the complex error structure inherent in the problem. Bayesian methods using Markov chain Monte Carlo and Monte Carlo expectation-maximization methods are described, both sharing a key Metropolis-Hastings step. Regression calibration is also considered, but we show that regression calibration does not use the correlation structure of the Berkson errors. Our methods are applied to the NTS Study, where we find a strong dose-response relationship between dose and thyroiditis. We conclude that full consideration of the mixture of Berkson and classical uncertainties in reconstructed individual doses is important for quantifying the dose response and its credibility/confidence interval. Using regression calibration and expectation values for individual doses can lead to a substantial underestimation of the excess relative risk per gray and its 95% confidence intervals.

13.
Case–control designs are commonly employed in genetic association studies. In addition to the case–control status, data on secondary traits are often collected. Directly regressing secondary traits on genetic variants from a case–control sample often leads to biased estimation. Several statistical methods have been proposed to address this issue; the inverse probability weighting (IPW) approach and the semiparametric maximum-likelihood (SPML) approach are the most commonly used. A new weighted estimating equation (WEE) approach is proposed to provide unbiased estimation of genetic associations with secondary traits, by combining observed and counterfactual outcomes. Compared to the existing approaches, WEE is more robust against biased sampling and disease model misspecification. We conducted simulations to evaluate the performance of the WEE under various models and sampling schemes. The WEE demonstrated robustness in all scenarios investigated, had appropriate type I error, and was as powerful as or more powerful than the IPW and SPML approaches. We applied the WEE to an asthma case–control study to estimate the associations between the thymic stromal lymphopoietin gene and two secondary traits: overweight status and serum IgE level. The WEE identified two SNPs associated with overweight in logistic regression, three SNPs associated with serum IgE levels in linear regression, and, in quantile regression, an additional four SNPs associated with the 75th percentile of IgE that linear regression had missed. The WEE approach provides a general and robust framework for secondary analysis, which complements the existing approaches and should serve as a valuable tool for identifying new associations with secondary traits.

14.
Nonlinear mixed effects models are now widely used in biometrical studies, especially in pharmacokinetic research or for the analysis of growth traits for agricultural and laboratory species. Most of these studies, however, are based on ML estimation procedures, which are known to be biased downwards. A few REML extensions have been proposed, but only for approximate methods. The aim of this paper is to present a REML implementation for nonlinear mixed effects models within an exact estimation scheme, based on an integration of the fixed effects and a stochastic estimation procedure. This method was implemented via a stochastic EM algorithm, namely the SAEM algorithm. The simulation study showed that the proposed REML estimation procedure considerably reduced the bias observed with ML estimation, as well as the residual mean squared error of the variance parameter estimates, especially in the unbalanced cases. ML- and REML-based estimators of fixed effects were also compared via simulation. Although the two kinds of estimates were very close in terms of bias and mean square error, predictions of individual profiles were clearly improved when using REML vs. ML. An application of this estimation procedure is presented for the modelling of growth in chicken lines.

15.
Hwang WH, Huang SY. Biometrics 2003, 59(4): 1113–1122
We consider estimation problems in capture-recapture models when the covariates or the auxiliary variables are measured with errors. The naive approach, which ignores measurement errors, is found to be unacceptable in the estimation of both regression parameters and population size: it yields estimators with biases increasing with the magnitude of errors, and flawed confidence intervals. To account for measurement errors, we derive a regression parameter estimator using a regression calibration method. We develop modified estimators of the population size accordingly. A simulation study shows that the resulting estimators are more satisfactory than those from either the naive approach or the simulation extrapolation (SIMEX) method. Data from a bird species Prinia flaviventris in Hong Kong are analyzed with and without the assumption of measurement errors, to demonstrate the effects of errors on estimations.
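The regression-calibration idea can be sketched outside the capture-recapture setting: under a classical additive error model, replacing the error-prone covariate W by E[X|W] undoes the attenuation of the naive slope. The error variance is assumed known here, and all quantities are illustrative:

```python
# Hedged sketch: regression calibration for classical measurement error
# W = X + U.  The naive slope is attenuated; calibration restores it.
import numpy as np

rng = np.random.default_rng(3)
n = 50000
x = rng.normal(0.0, 1.0, size=n)              # true covariate
u = rng.normal(0.0, 0.8, size=n)              # measurement error
w = x + u                                     # observed, error-prone covariate
y = 1.5 * x + rng.normal(size=n)              # outcome; true slope = 1.5

def ols_slope(z, resp):
    return float(np.cov(z, resp)[0, 1] / np.var(z, ddof=1))

slope_naive = ols_slope(w, y)                 # attenuated toward zero

# Regression calibration: E[X|W] = lam * W, lam = var_x / (var_x + var_u),
# assuming the error variance (0.8^2) is known.
lam = 1.0 / (1.0 + 0.8 ** 2)
slope_rc = ols_slope(lam * w, y)              # bias removed
```

In the capture-recapture models of the abstract the same substitution is made inside the likelihood for the detection/regression parameters, with the population-size estimator modified accordingly.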

16.
Two methods are commonly employed for evaluating the extent of the uncertainty of evolutionary distances between sequences: either some estimator of the variance of the distance estimator, or the bootstrap method. However, both approaches can be misleading, particularly when the evolutionary distance is small. We propose using another statistical method which does not have the same defect: interval estimation. We show how confidence intervals may be constructed for the Jukes and Cantor (1969) and Kimura two-parameter (1980) estimators. We compare the exact confidence intervals thus obtained with the approximate intervals derived by the two previous methods, using artificial and biological data. The results show that the usual methods clearly underestimate the variability when the substitution rate is low and when sequences are short. Moreover, our analysis suggests that similar results may be expected for other evolutionary distance estimators.
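For the Jukes-Cantor estimator, an interval on the distance can be obtained by mapping an interval for the mismatch proportion p through the monotone JC transform. The sketch below uses a normal-approximation interval for p rather than the exact construction of the paper, with illustrative counts:

```python
# Hedged sketch: Jukes-Cantor distance with an interval obtained by
# transforming an interval for the observed mismatch proportion p.
import math

def jc_distance(p):
    # d = -(3/4) * ln(1 - 4p/3)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

n_sites, mismatches = 500, 40                 # illustrative alignment counts
p_hat = mismatches / n_sites
se_p = math.sqrt(p_hat * (1 - p_hat) / n_sites)

d_hat = jc_distance(p_hat)
# The JC transform is monotone in p, so interval endpoints map directly.
d_lo = jc_distance(p_hat - 1.96 * se_p)
d_hi = jc_distance(p_hat + 1.96 * se_p)
```

An exact interval replaces the normal approximation for p by binomial (Clopper-Pearson-style) bounds before transforming, which matters precisely in the short-sequence, low-rate regime the abstract highlights.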

17.
The current approach to using machine learning (ML) algorithms in healthcare is to either require clinician oversight for every use case or use their predictions without any human oversight. We explore a middle ground that lets ML algorithms abstain from making a prediction, to simultaneously improve their reliability and reduce the burden placed on human experts. To this end, we present a general penalized loss minimization framework for training selective prediction-set (SPS) models, which choose to either output a prediction set or abstain. The resulting models abstain when the outcome is difficult to predict accurately, such as on subjects who are too different from the training data, and achieve higher accuracy on those for which they do give predictions. We then introduce a model-agnostic statistical inference procedure for the coverage rate of an SPS model that ensembles individual models trained using K-fold cross-validation. We find that SPS ensembles attain prediction-set coverage rates closer to the nominal level and have narrower confidence intervals for the marginal coverage rate. We apply our method to train neural networks that abstain more often on out-of-sample images in the MNIST digit prediction task and achieve higher predictive accuracy for ICU patients compared to existing approaches.
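The abstention mechanism can be sketched with a simple confidence-threshold rule. The paper's SPS models learn when to abstain via penalized loss minimization rather than thresholding; the rule, probabilities, and numbers below are illustrative:

```python
# Hedged toy sketch: abstain on low-confidence cases and compare accuracy
# on the kept subset with overall accuracy.
import numpy as np

rng = np.random.default_rng(4)
n = 20000
p = rng.uniform(0.0, 1.0, size=n)             # model's P(Y=1) per subject
y = rng.binomial(1, p)                        # outcomes drawn from those probs

pred = (p >= 0.5).astype(int)
conf = np.maximum(p, 1 - p)                   # confidence of each prediction

keep = conf >= 0.8                            # abstain below the threshold
acc_all = float(np.mean(pred == y))
acc_kept = float(np.mean(pred[keep] == y[keep]))
```

With well-calibrated probabilities, accuracy on the kept subset exceeds overall accuracy by construction, which is the trade (coverage for reliability) that SPS models formalize and for which the paper provides coverage-rate inference.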

18.
The choice of guide RNA (gRNA) for CRISPR-based gene targeting is an essential step in gene editing applications, but the prediction of gRNA specificity remains challenging. A lack of transparency, together with a focus on point estimates of efficiency that disregards information on possible error sources in the model, limits the power of existing deep learning-based methods. To overcome these problems, we present a new approach: a hybrid of capsule networks and Gaussian processes. Our method predicts the cleavage efficiency of a gRNA with a corresponding confidence interval, which allows the user to incorporate information about possible model errors into the experimental design. We provide the first use of uncertainty estimation in computational gRNA design, a critical step toward accurate decision-making for future CRISPR applications. The proposed solution demonstrates acceptable confidence intervals for most test sets and shows regression quality similar to existing models. We introduce a set of criteria for gRNA selection based on off-target cleavage efficiency and its variance, and present a collection of pre-computed gRNAs for human chromosome 22. Using neural network interpretation methods, we show that our model rediscovers an established biological factor underlying cleavage efficiency: the importance of the seed region in the gRNA.

19.
Online Prediction of the Running Time of Tasks
We describe and evaluate the Running Time Advisor (RTA), a system that can predict the running time of a compute-bound task on a typical shared, unreserved commodity host. The prediction is computed from linear time series predictions of host load and takes the form of a confidence interval that neatly expresses the error associated with the measurement and prediction processes – error that must be captured to make statistically valid decisions based on the predictions. Adaptive applications make such decisions in pursuit of consistent high performance, choosing, for example, the host where a task is most likely to meet its deadline. We begin by describing the system and summarizing the results of our previously published work on host load prediction. We then describe our algorithm for computing predictions of running time from host load predictions. We next evaluate the system using over 100,000 randomized testcases run on 39 different hosts, finding that it is indeed capable of computing correct and useful confidence intervals. Finally, we report on our experience with using the RTA in application-oriented real-time scheduling in distributed systems.

20.
Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulate, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data, or only a subset of it, should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using the SSI with additive relationships (GSSI) led to 5–17% increases in accuracy relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.
Subject terms: Quantitative trait, Genetic models
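The GBLUP-vs-KBLUP contrast reduces to swapping the similarity matrix in the same BLUP equations. A minimal sketch on simulated markers follows; the dimensions, allele frequency, Gaussian-kernel bandwidth, and regularization parameter are all illustrative, and the SSI machinery (per-individual training-set selection) is omitted entirely:

```python
# Hedged sketch: additive (GBLUP-style) vs. Gaussian-kernel relationship
# matrices plugged into the same ridge/BLUP solve.
import numpy as np

rng = np.random.default_rng(6)
n, p = 100, 200
M = rng.binomial(2, 0.3, size=(n, p)).astype(float)   # marker matrix (0/1/2)
Z = (M - M.mean(axis=0)) / (M.std(axis=0) + 1e-9)     # centred/scaled markers

G = Z @ Z.T / p                                       # additive genomic kernel
D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2) / p
K = np.exp(-D2)                                       # Gaussian kernel

# Simulated phenotype: small additive marker effects plus noise.
y = Z @ rng.normal(0.0, 0.1, size=p) + rng.normal(0.0, 1.0, size=n)

def kernel_blup(kernel, resp, lam=1.0):
    # BLUP of genetic values: u_hat = K (K + lam I)^{-1} y
    return kernel @ np.linalg.solve(kernel + lam * np.eye(len(resp)), resp)

u_g = kernel_blup(G, y)                               # GBLUP-style fit
u_k = kernel_blup(K, y)                               # KBLUP-style fit
```

The kernel-based SSI of the abstract would, in addition, zero out the contribution of unhelpful training individuals separately for each prediction genotype.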
