Similar Articles
20 similar articles found (search time: 31 ms).
1.
Many variables of interest in agricultural or economic surveys have skewed distributions and can equal zero. Our data are Revised Universal Soil Loss Equation 2 (RUSLE2) measures of sheet and rill erosion. Small area estimates of mean RUSLE2 erosion are of interest. We use a zero-inflated lognormal mixed effects model for small area estimation. The model combines a unit-level lognormal model for the positive RUSLE2 responses with a unit-level logistic mixed effects model for the binary indicator that the response is nonzero. In the Conservation Effects Assessment Project (CEAP) data, counties with a higher probability of nonzero responses also tend to have a higher mean among the positive RUSLE2 values. We capture this property of the data by assuming that the pair of random effects for a county is correlated. We develop empirical Bayes (EB) small area predictors and a bootstrap estimator of the mean squared error (MSE). In simulations, the proposed predictor is superior to simpler alternatives. We then apply the method to construct EB predictors of mean RUSLE2 erosion for South Dakota counties. To obtain auxiliary variables for the population of cropland in South Dakota, we integrate a satellite-derived land cover map with a geographic database of soil properties. We provide an R Shiny application called viscover (available at https://lyux.shinyapps.io/viscover/) to visualize the overlay operations required to construct the covariates. On the basis of bootstrap estimates of the MSE, we conclude that the EB predictors of mean RUSLE2 erosion are superior to direct estimators.
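Under the two-part model described above, the mean of one unit's response factors into the probability of a nonzero response times the lognormal mean of the positive part, p * exp(mu + sigma^2 / 2). The following is a minimal sketch of that calculation only. It assumes the linear predictors (including the county random effects) are already given; the function names are hypothetical, and the EB predictor in the abstract additionally averages over the conditional distribution of the correlated random effects, which this sketch omits.

```python
import math


def inv_logit(eta: float) -> float:
    """Logistic link: P(response is nonzero) for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def zi_lognormal_mean(eta_nonzero: float, mu_log: float, sigma2_log: float) -> float:
    """Mean of a zero-inflated lognormal response for one unit.

    eta_nonzero: logit of P(Y > 0), including the county random effect (assumed known)
    mu_log, sigma2_log: mean and variance of log(Y) given Y > 0 (assumed known)
    E[Y] = P(Y > 0) * E[Y | Y > 0] = p * exp(mu + sigma^2 / 2)
    """
    p = inv_logit(eta_nonzero)
    return p * math.exp(mu_log + sigma2_log / 2.0)


def area_mean(units):
    """Average the unit-level means over (eta, mu, sigma2) tuples for one area."""
    values = [zi_lognormal_mean(eta, mu, s2) for eta, mu, s2 in units]
    return sum(values) / len(values)


print(zi_lognormal_mean(0.0, 0.0, 0.0))  # 0.5: half the units are zero, positives have mean 1
```

Working on the log scale for the positive part is what makes the sigma^2 / 2 term appear; omitting it would systematically underpredict mean erosion.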

2.
Microarray technology allows investigators to measure the expression levels of thousands of genes simultaneously. However, investigators also face the challenge of simultaneously estimating expression differences for thousands of genes from very small samples. Traditional estimators of differences between treatment means (ordinary least squares, or OLS, estimators) are not the best estimators when interest lies in estimating expression differences for an ensemble of genes. When gene expression differences are regarded as exchangeable samples from a common population, estimators are available that yield a much smaller average mean-square error across the population of estimated differences. We simulated the application of one such estimator, namely an empirical Bayes (EB) estimator of random effects in a hierarchical (normal-normal) linear model. In the simulations, the mean-square error was as low as 0.05 times that of the OLS estimators (i.e., the differences between treatment means). We applied the analysis to an example dataset to demonstrate the shrinkage of the EB estimators and the reduction in mean-square error (i.e., the increase in precision) associated with EB estimation in this analysis. The method described here is implemented in software that is available at .
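The shrinkage behind the EB estimator can be sketched for the simplest normal-normal case: each observed difference is d_g = theta_g + e_g with known sampling variance s2, and theta_g ~ N(mu, tau2). The sketch below estimates mu and tau2 by the method of moments and pulls each d_g toward the overall mean by the factor B = s2 / (s2 + tau2). The common, known s2 and the moment estimator of tau2 are simplifying assumptions for illustration, not necessarily the estimator used in the paper or its software.

```python
import statistics


def eb_shrink(d, s2):
    """Empirical Bayes estimates of gene-level differences (normal-normal model).

    d  : observed OLS differences between treatment means, one per gene
    s2 : common sampling variance of each difference (assumed known, > 0)
    """
    mu = statistics.fmean(d)
    # Method-of-moments between-gene variance, truncated at zero.
    tau2 = max(0.0, statistics.variance(d) - s2)
    b = s2 / (s2 + tau2)  # shrinkage factor: 1 means full pooling to mu
    return [mu + (1.0 - b) * (x - mu) for x in d]


print(eb_shrink([1.0, -1.0, 3.0, -3.0], 4.0))  # each value shrunk by about 60% toward 0
```

When the estimated tau2 is zero, every estimate collapses to the overall mean; as tau2 grows relative to s2, the EB estimates approach the OLS differences.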

3.
Independent censoring is a crucial assumption in survival analysis. However, it is unrealistic in many medical studies, where dependent censoring makes it difficult to analyze covariate effects on disease outcomes. The semicompeting risks framework offers one approach to handling dependent censoring. In this data structure there are two representative estimators based on an artificial censoring technique; however, neither is uniformly more efficient (in terms of standard error) than the other. In this paper, we propose a new weighted estimator for the accelerated failure time (AFT) model under dependent censoring. One advantage of our approach is that the weights are optimal among all linear combinations of the two estimators mentioned above. To calculate these weights, we employ a novel resampling-based scheme. We establish the asymptotic properties of the estimator. In addition, simulation studies and an application to real data show the efficiency gains of our estimator.
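For two estimators b1 and b2 of the same parameter, the combination w * b1 + (1 - w) * b2 has variance w^2 V1 + (1 - w)^2 V2 + 2 w (1 - w) C, which is minimized at w* = (V2 - C) / (V1 + V2 - 2C). The sketch below plugs in variances and the covariance estimated from paired resampling replicates. It assumes such replicates are already available and does not reproduce the paper's specific resampling scheme; the function names are hypothetical.

```python
import statistics


def _cov(a, b):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def optimal_weight(reps1, reps2):
    """Variance-minimizing w for w*b1 + (1-w)*b2, from paired replicates."""
    v1, v2, c = _cov(reps1, reps1), _cov(reps2, reps2), _cov(reps1, reps2)
    denom = v1 + v2 - 2.0 * c  # sample variance of b1 - b2, never negative
    if denom <= 0.0:
        return 0.5  # the two estimators coincide; any weight gives the same result
    return (v2 - c) / denom


def combine(b1, b2, w):
    """Weighted estimator; the weights sum to one."""
    return w * b1 + (1.0 - w) * b2


w = optimal_weight([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, -2.0, 2.0])
print(w)  # about 0.8: the less variable first estimator gets more weight
```

Because w is unconstrained, it can fall outside [0, 1] when the two estimators are strongly positively correlated; the result is still the minimum-variance linear combination.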

4.
A simple procedure for estimating the false discovery rate
MOTIVATION: The most widely used criterion in microarray data analysis is now the false discovery rate (FDR). Among estimation procedures that are based on the marginal distribution of the P-values and make no assumption about gene expression changes, FDR estimators are necessarily conservatively biased. Indeed, only an upper-bound estimate can be obtained for the key quantity pi0, the probability that a gene is unmodified. In this paper, we propose a novel family of estimators for pi0 that allows the FDR to be calculated. RESULTS: A very simple method for estimating pi0, called LBE (Location Based Estimator), is presented together with results on its variability. Simulation results indicate that the proposed estimator performs well in finite samples and has the smallest mean square error in most cases compared with the QVALUE, BUM and SPLOSH procedures. The procedures are then applied to real datasets. AVAILABILITY: The R function LBE is available at http://ifr69.vjf.inserm.fr/lbe CONTACT: broet@vjf.inserm.fr.
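Why only an upper bound for pi0 is available can be seen from a moment argument: for a null gene the p-value is uniform, so -log(1 - p) follows an Exp(1) distribution whose a-th moment is Gamma(a + 1), while non-null genes only add non-negative terms. The sketch below assumes that LBE takes this moment form, pi0 ~= mean((-log(1 - p))^a) / Gamma(a + 1), capped at 1, and then plugs the estimate into the usual FDR estimate at a p-value threshold t. Treat both the form and the default a = 1 as assumptions, and check the paper or the R function LBE for the exact definition and the recommended choice of a.

```python
import math


def lbe_pi0(pvalues, a=1.0):
    """Moment-type, conservatively biased estimate of pi0 (assumed LBE form).

    Under the null, -log(1 - p) ~ Exp(1), whose a-th moment is Gamma(a + 1).
    Non-null p-values add non-negative mass, so the estimate tends to exceed pi0.
    """
    eps = 1e-12  # our choice: keep p = 1 away from log(0)
    terms = [(-math.log(1.0 - min(p, 1.0 - eps))) ** a for p in pvalues]
    return min(1.0, (sum(terms) / len(terms)) / math.gamma(a + 1.0))


def fdr_at(pvalues, t, pi0):
    """Plug-in FDR estimate when rejecting all p-values <= t."""
    rejected = sum(1 for p in pvalues if p <= t)
    return min(1.0, pi0 * len(pvalues) * t / max(1, rejected))


null = [(i + 0.5) / 1000 for i in range(1000)]  # uniform-looking p-values
mixed = [1e-6] * 500 + [(i + 0.5) / 500 for i in range(500)]  # half strong signals
print(round(lbe_pi0(null), 2), round(lbe_pi0(mixed), 2))
```

For the all-null grid the estimate is close to 1, and for the half-signal mixture it is close to 0.5; with weaker signals the non-null terms grow and the estimate drifts upward, which is the conservative bias described in the abstract.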

5.
We investigate methods for regression analysis when covariates are measured with error. In a subset of the whole cohort (the calibration sample), a surrogate variable is available for the true, unobserved exposure variable. The surrogate satisfies the classical measurement error model but may not have repeated measurements. In addition to the surrogate, we assume that an instrumental variable (IV) is available for all study subjects. An IV is correlated with the unobserved true exposure and hence can be useful in estimating the regression coefficients. We propose a robust best linear estimator that uses all available data and is the most efficient within a class of consistent estimators. The proposed estimator is shown to be consistent and asymptotically normal under very weak distributional assumptions. For Poisson or linear regression, the proposed estimator is consistent even if the measurement error in the surrogate or IV is heteroscedastic. The finite-sample performance of the proposed estimator is examined and compared with that of other estimators via extensive simulation studies. The proposed method and other methods are applied to a bladder cancer case-control study.

6.
Liu D, Zhou XH. Biometrics, 2011, 67(3):906-916
Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness because of its high cost or harm to the patient. In this article, we propose semiparametric estimation of covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates' effects, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. Under the assumption that the gold standard is missing at random (MAR), we consider weighted estimating equations for the location-scale parameters and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared: imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease study.

7.
Zhao and Tsiatis (1997) consider the problem of estimating the distribution of the quality-adjusted lifetime when the chronological survival time is subject to right censoring. The quality-adjusted lifetime is typically defined as a weighted sum of the times spent in certain states up until death or some other failure time. They propose an estimator and establish the relevant asymptotics under the assumption of independent censoring. In this paper we extend the data structure with a covariate process observed until the end of follow-up and identify the optimal estimation problem. Because of the curse of dimensionality, no globally efficient nonparametric estimator with good practical performance at moderate sample sizes exists. Given a correctly specified model for the hazard of censoring conditional on the observed quality-of-life and covariate processes, we propose a closed-form one-step estimator of the distribution of the quality-adjusted lifetime whose asymptotic variance attains the efficiency bound if we can correctly specify a lower-dimensional working model for the conditional distribution of quality-adjusted lifetime given the observed quality-of-life and covariate processes. The estimator remains consistent and asymptotically normal even if this latter submodel is misspecified. The practical performance of the estimators is illustrated with a simulation study. We also extend the proposed one-step estimator to the case where treatment assignment is confounded by observed risk factors, so that it can be used to test a treatment effect in an observational study.
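The definition of quality-adjusted lifetime, and the simple inverse-probability-of-censoring weighting (IPCW) idea that censored-data estimators of its distribution build on, can be sketched as follows. This is not the one-step estimator proposed in the abstract: it assumes the censoring survival function K(t) = P(C > t) is supplied (for example, from a Kaplan-Meier fit to the censoring times), and the state names and weights in the usage lines are hypothetical.

```python
def quality_adjusted_lifetime(sojourns, weights):
    """Weighted sum of the time spent in each health state up to failure.

    sojourns: chronological list of (state, duration) pairs
    weights : dict mapping state -> quality weight, typically in [0, 1]
    """
    return sum(weights[state] * duration for state, duration in sojourns)


def ipcw_cdf(qals, event_times, died, censor_surv, x):
    """Simple IPCW estimate of P(QAL <= x).

    Only subjects whose failure was observed (died[i] is True) contribute,
    each weighted by 1 / K(T_i), where censor_surv(t) returns P(C > t).
    """
    n = len(qals)
    total = 0.0
    for q, t, d in zip(qals, event_times, died):
        if d and q <= x:
            total += 1.0 / censor_surv(t)
    return total / n


weights = {"toxicity": 0.5, "healthy": 1.0, "relapse": 0.5}
path = [("toxicity", 2.0), ("healthy", 10.0), ("relapse", 3.0)]
print(quality_adjusted_lifetime(path, weights))  # 12.5
```

With no censoring (K = 1 and every failure observed), the weighted estimate reduces to the empirical distribution function of the quality-adjusted lifetimes.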

8.
We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model, and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter that describes the association between the time to event and the longitudinal covariate. The term nonparametric refers to the fact that no assumptions are needed about the distribution of the random effects or of the measurement error. The finite-sample performance of the approximate nonparametric corrected-score estimator is examined through simulation studies, and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.

9.
10.
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of inference algorithms depends on a multitude of factors. In this paper we study two of them. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study 4 MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with 3 discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with equal width discretization. Second, our numerical analysis can be considered a systems approach, because we simulate gene expression data from an underlying gene regulatory network instead of making a distributional assumption and sampling from it. We demonstrate that, despite the popularity of the latter approach, which is the traditional way of studying MI estimators, it is not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for the efficient design of simulation studies in the context of network inference, supporting a systems approach.
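Two of the four estimators compared above, the empirical (plug-in) estimator and its Miller-Madow bias correction, can be sketched with equal width discretization in a few lines: each entropy is estimated from bin frequencies, Miller-Madow adds (m - 1) / (2n) for m non-empty bins, and MI = H(X) + H(Y) - H(X, Y). This is a minimal illustration with hypothetical function names; it does not reproduce the Shrink or Schürmann-Grassberger estimators, global equal width discretization, or C3NET itself.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant vector: put everything in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(counts, n, miller_madow=False):
    """Plug-in entropy (nats) from bin counts, optionally Miller-Madow corrected."""
    counts = [c for c in counts if c > 0]
    h = -sum((c / n) * math.log(c / n) for c in counts)
    if miller_madow:
        h += (len(counts) - 1) / (2.0 * n)
    return h


def mutual_information(x, y, n_bins=3, miller_madow=False):
    """MI = H(X) + H(Y) - H(X, Y) after equal width discretization."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    hx = entropy(Counter(bx).values(), n, miller_madow)
    hy = entropy(Counter(by).values(), n, miller_madow)
    hxy = entropy(Counter(zip(bx, by)).values(), n, miller_madow)
    return hx + hy - hxy


x = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
print(mutual_information(x, x), mutual_information(x, x, miller_madow=True))
```

Applying the Miller-Madow correction to each of the three entropies, rather than to the MI directly, is one common convention; in this example it adds 1/6 nat to the plug-in value of log 3.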

11.
Many late-phase clinical trials recruit subjects at multiple study sites. This introduces a hierarchical structure into the data that can result in a loss of power compared with a more homogeneous single-center trial. Building on a recently proposed approach to sample size determination, we suggest a sample size recalculation procedure for multicenter trials with continuous endpoints. The procedure estimates nuisance parameters at interim from noncomparative data and recalculates the required sample size based on these estimates. In contrast to other sample size calculation methods for multicenter trials, our approach assumes a mixed effects model and does not rely on balanced data within centers. It is therefore advantageous, especially for sample size recalculation at interim. We illustrate the proposed methodology with a study evaluating a diabetes management system. Monte Carlo simulations are carried out to evaluate the operating characteristics of the sample size recalculation procedure using comparative as well as noncomparative data, assessing their dependence on parameters such as between-center heterogeneity, residual variance of observations, treatment effect size and number of centers. We compare two estimators for between-center heterogeneity, an unadjusted and a bias-adjusted estimator, both based on quadratic forms. For the proposed unadjusted estimator, the type 1 error probability and statistical power are close to their nominal levels for all parameter combinations considered in our simulation study, whereas the adjusted estimator exhibits some type 1 error rate inflation. Overall, the sample size recalculation procedure can be recommended to mitigate risks arising from misspecified nuisance parameters at the planning stage.

12.
Wei GC, Tanner MA. Biometrics, 1991, 47(4):1297-1309
The first part of the article reviews the Data Augmentation algorithm and presents two approximations to it for the analysis of missing-data problems: the Poor Man's Data Augmentation algorithm and the Asymptotic Data Augmentation algorithm. These two algorithms are then implemented in the context of censored regression data to obtain a semiparametric methodology. The performance of the censored regression algorithms is examined in a simulation study. Within the precision of the study, the bias of the Poor Man's and Asymptotic Data Augmentation estimators, as well as that of the Buckley-James estimator, does not appear to differ from zero. However, over a wide range of settings examined in this simulation study, the two Data Augmentation estimators have a smaller mean squared error than the Buckley-James estimator. In addition, the two Data Augmentation estimators come with a natural device for estimating the standard error of the estimated regression parameters. It is shown how this device can be used to estimate the standard error of either Data Augmentation estimate of any parameter (e.g., the correlation coefficient) associated with the model. In the simulation study, the estimated standard error of the Asymptotic Data Augmentation estimate of the regression parameter agrees with the Monte Carlo standard deviation of the corresponding parameter estimate. The algorithms are illustrated using the updated Stanford heart transplant data set.

13.
Antoniadou T, Wallach D. Biometrics, 2000, 56(2):420-426
Correctly dosing nitrogen fertilizer for crop growth is important both for farmer profit and for the environment. Fertilizer recommendations are embodied in decision rules, which give a recommended dose of nitrogen (N) as a function of the information available at the time the decision is made. In this paper, we first propose a criterion for evaluating decision rules: the expectation of the objective function when the decision rule is implemented. The major problem is estimating this criterion. Two estimators are considered, a model-based estimator and a nonparametric estimator. A simulation study shows that, in essentially all cases, the nonparametric estimator is at least as good as the model-based estimator. The bias of the nonparametric estimator is always very small.

14.
Clinically relevant cardiovascular parameters, such as pulmonary blood volume (PBV) and ejection fraction (EF), can be assessed through indicator dilution techniques. These techniques are typically invasive because they require central catheterization; contrast ultrasonography offers an emerging, minimally invasive alternative. PBV and EF are then measured by a dilution system identification algorithm after multiple dilution curves are detected by an ultrasound scanner. In this paper, dilution systems are represented by parametric models. Because the measured indicator dilution curves (IDCs) are corrupted by measurement artifacts and outliers, the conventional least squares error (LSE) estimator is not optimal for estimating the system parameters. Different estimators are therefore proposed for estimating the system parameters and are compared with the LSE estimator in assessing EF and PBV on simulated, in vitro and patient data. The results show that the proposed total least absolute deviation (TLAD) estimator outperforms the other estimators. The measured IDCs are highly corrupted by noise, which affects the estimation of EF and PBV. Therefore, a two-stage denoising method capable of removing outliers is also proposed for the IDCs.

15.
Doubly robust estimation in missing data and causal inference models
Bang H, Robins JM. Biometrics, 2005, 61(4):962-973
The goal of this article is to construct doubly robust (DR) estimators in ignorable missing data and causal inference models. In a missing data model, an estimator is DR if it remains consistent when either (but not necessarily both) a model for the missingness mechanism or a model for the distribution of the complete data is correctly specified. Because with observational data one can never be sure that either a missingness model or a complete data model is correct, perhaps the best that can be hoped for is to find a DR estimator. DR estimators, in contrast to standard likelihood-based or (nonaugmented) inverse probability-weighted estimators, give the analyst two chances, instead of only one, to make a valid inference. In a causal inference model, an estimator is DR if it remains consistent when either a model for the treatment assignment mechanism or a model for the distribution of the counterfactual data is correctly specified. Because with observational data one can never be sure that a model for the treatment assignment mechanism or a model for the counterfactual data is correct, inference based on DR estimators should improve upon previous approaches. Indeed, we present the results of simulation studies which demonstrate that the finite sample performance of DR estimators is as impressive as theory would predict. The proposed method is applied to a cardiovascular clinical trial.
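For the simplest missing-data case, estimating a population mean E[Y] when some outcomes are missing at random, a standard doubly robust estimator is the augmented inverse probability weighted (AIPW) form mean(m(X) + R (Y - m(X)) / pi(X)), where pi(X) is a fitted model for the probability of observing Y and m(X) is a fitted outcome regression. The sketch below assumes both sets of fitted values are already computed; it illustrates the "two chances" property and is a generic AIPW form, not necessarily the specific construction used in the paper.

```python
def dr_mean(y, observed, pi_hat, m_hat):
    """Augmented IPW (doubly robust) estimate of E[Y] with missing outcomes.

    y        : outcomes; entries where observed[i] is False are never read
    observed : response indicators R_i
    pi_hat   : fitted P(R = 1 | X_i) from a missingness model (each > 0)
    m_hat    : fitted E[Y | X_i] from an outcome model
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, pi_hat, m_hat):
        # Outcome-model prediction plus an IPW correction from observed residuals.
        total += mi + ((yi - mi) / pi if ri else 0.0)
    return total / len(y)


y_obs = [1.0, None, 3.0, None]          # missing outcomes stored as None
seen = [True, False, True, False]
truth = [1.0, 2.0, 3.0, 4.0]            # full outcomes, used here as a "perfect" model
print(dr_mean(y_obs, seen, [0.5] * 4, [0.0] * 4))  # 2.0: useless outcome model, pure IPW
print(dr_mean(y_obs, seen, [0.9] * 4, truth))      # 2.5: correct outcome model
```

The first call shows that with an uninformative outcome model (m = 0) the estimate relies entirely on the weights; the second shows that a correct outcome model makes the weighted residual term vanish, whatever values pi takes.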

16.
Pan W, Lin X, Zeng D. Biometrics, 2006, 62(2):402-412
We propose a new class of models, transition measurement error models, to study the effects of covariates and past responses on the current response in longitudinal studies when one of the covariates is measured with error. We show that the response variable conditional on the error-prone covariate follows a complex transition mixed effects model. The naive model obtained by ignoring the measurement error correctly specifies the transition part of the model but misspecifies the covariate effect structure and ignores the random effects. We next study the asymptotic bias of the naive estimator obtained by ignoring the measurement error, for both continuous and discrete outcomes. We show that the naive estimator of the regression coefficient of the error-prone covariate is attenuated, while the naive estimators of the regression coefficients of the past responses are generally inflated. We then develop a structural modeling approach for parameter estimation using maximum likelihood. Because full maximum likelihood estimation requires multidimensional integration, an EM algorithm is developed to calculate the maximum likelihood estimators, in which Monte Carlo simulations are used to evaluate the conditional expectations in the E-step. We evaluate the performance of the proposed method through a simulation study and apply it to a longitudinal social support study of elderly women with heart disease. An additional simulation study shows that the Bayesian information criterion (BIC) performs well in choosing the correct transition orders of the models.
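The attenuation result has a familiar special case that is easy to check by simulation: in a cross-sectional linear regression with classical error W = X + U, the naive slope on W converges to lambda * beta, with reliability ratio lambda = Var(X) / (Var(X) + Var(U)). The sketch below verifies only this textbook case; it does not model the transition structure or the inflation of past-response coefficients described in the abstract, and all parameter values are arbitrary.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=2.0, var_x=1.0, var_u=1.0, n=50_000, seed=1):
    """Return (naive slope on the error-prone covariate, lambda * beta)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, var_x ** 0.5) for _ in range(n)]  # true covariate
    w = [xi + rng.gauss(0.0, var_u ** 0.5) for xi in x]   # observed with error
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]     # outcome depends on x
    reliability = var_x / (var_x + var_u)
    return ols_slope(w, y), reliability * beta


naive, predicted = simulate_attenuation()
print(f"naive slope {naive:.3f} vs predicted {predicted:.3f} (true beta 2.0)")
```

With equal signal and error variances the reliability ratio is 0.5, so the naive slope lands near 1.0 rather than the true 2.0.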

17.
When cluster sampling is used to collect data with matched pairs, the assumption that all matched pairs are independent is unlikely to hold. This paper notes that applying interval estimators that do not account for the intraclass correlation between matched pairs to estimate the simple difference between two proportions of response can be quite misleading, especially when both the number of matched pairs per cluster and the intraclass correlation between matched pairs within clusters are large. This paper develops two asymptotic interval estimators of the simple difference that accommodate cluster-sampled data with correlated matched pairs. It further uses Monte Carlo simulation to compare the finite-sample performance of these estimators and demonstrates that the interval estimator derived from a quadratic equation proposed here performs quite well in a variety of situations.

18.
Understanding the functional relationship between sample size and the performance of species richness estimators is necessary to balance limited sampling resources against estimation error. Nonparametric estimators such as Chao and Jackknife perform strongly, but there is no consensus on which estimator performs better under constrained sampling. We explore a method to improve the estimators in such scenarios. The method randomly splits the species-abundance data from a single sample into two equally sized samples and then uses an appropriate incidence-based estimator to estimate richness. To test this method, we assume a lognormal species-abundance distribution (SAD) with varying coefficients of variation (CV), generate samples using MCMC simulations, and use the expected mean-squared error as the performance criterion. We test this method for the Chao, Jackknife, ICE, and ACE estimators. Comparing abundance-based estimators applied to the single sample with incidence-based estimators applied to the split-in-two samples, Chao2 performed best when CV < 0.65 and the incidence-based Jackknife performed best when CV > 0.65, provided that the ratio of sample size to observed species richness exceeds a critical value given by a power function of CV with respect to the abundance of the sampled population. The proposed method substantially improves the performance of the estimators and is more effective when an assemblage contains more rare species. We also show that the splitting method works qualitatively similarly well when the SADs are log series, geometric series, and negative binomial. We demonstrate an application of the proposed method by estimating the richness of zooplankton communities in samples of ballast water.
The proposed splitting method is an alternative to sampling a large number of individuals to increase the accuracy of richness estimates; therefore, it is appropriate for a wide range of resource-limited sampling scenarios in ecology.
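The splitting step and an incidence-based estimator can be sketched as follows. The sample is a list of species labels, one per individual; it is shuffled and cut into two halves, and each half is treated as one incidence sample. For the estimator we use the bias-corrected form of Chao2, S_obs + ((m - 1) / m) * Q1 (Q1 - 1) / (2 (Q2 + 1)), where Q1 and Q2 count species found in exactly one and in both halves. This variant and the function names are our assumptions; the paper may use the classic Q1^2 / (2 Q2) form or other incidence-based estimators.

```python
import random
from collections import Counter


def split_in_two(individuals, seed=0):
    """Randomly split individual-level species labels into two halves.

    With an odd count, the second half holds one extra individual.
    """
    rng = random.Random(seed)
    shuffled = list(individuals)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def chao2_bias_corrected(samples):
    """Bias-corrected Chao2 richness estimate from m incidence samples."""
    m = len(samples)
    incidence = Counter()
    for sample in samples:
        incidence.update(set(sample))  # presence/absence only
    s_obs = len(incidence)
    q1 = sum(1 for k in incidence.values() if k == 1)  # found in one sample only
    q2 = sum(1 for k in incidence.values() if k == 2)  # found in two samples
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / (2 * (q2 + 1))


abundance_sample = list("aaaaabbbbccdefg")  # hypothetical single sample
halves = split_in_two(abundance_sample, seed=42)
print(chao2_bias_corrected(halves))
```

Because the split is random, repeating it with different seeds and averaging the estimates is a natural way to reduce the extra variability it introduces.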

19.
BACKGROUND: The ratio of two measured fluorescence signals (called x and y) is used in various applications in fluorescence microscopy. Multiple instances of both signals can be combined in different ways to construct different ratio estimators. METHODS: The mean and variance of three estimators for the ratio between two random variables, x and y, are discussed. Given n samples of x and y, two estimators can be constructed intuitively: the mean of the ratios of each x and y, and the ratio between the mean of x and the mean of y. The former is biased and the latter is only asymptotically unbiased. Using the statistical characteristics of the latter estimator, a third, unbiased estimator can be constructed. RESULTS: We tested the three estimators on simulated data, real-world fluorescence test images, and comparative genomic hybridization (CGH) data. The results on the simulated and real-world test images confirm the presented theory. The CGH experiments show that our new estimator performs better than the existing estimators. CONCLUSIONS: We have derived an unbiased ratio estimator that outperforms intuitive ratio estimators.
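The two intuitive estimators can be compared directly by simulation. The sketch below draws independent gamma-distributed (hence strictly positive) signals so that the true ratio of means is 2; the distributions and parameter values are arbitrary assumptions. Because x and y are independent, E[x / y] = E[x] E[1 / y], which by Jensen's inequality exceeds E[x] / E[y], so the mean of ratios is biased upward while the ratio of means is close to 2 for large n. The paper's third, unbiased estimator is not reproduced here.

```python
import random


def mean_of_ratios(x, y):
    """Average of the per-sample ratios x_i / y_i."""
    return sum(a / b for a, b in zip(x, y)) / len(x)


def ratio_of_means(x, y):
    """Mean of x divided by mean of y."""
    return (sum(x) / len(x)) / (sum(y) / len(y))


rng = random.Random(7)
n = 100_000
# Independent gamma signals: E[x] = 10, E[y] = 5, so the target ratio is 2.
x = [rng.gammavariate(100.0, 0.1) for _ in range(n)]
y = [rng.gammavariate(25.0, 0.2) for _ in range(n)]
# For this y, E[1/y] = 1 / (0.2 * 24), so E[x / y] = 10 / 4.8, about 2.083.
print(f"mean of ratios {mean_of_ratios(x, y):.3f}, ratio of means {ratio_of_means(x, y):.3f}")
```

The roughly 4% upward bias of the mean of ratios here comes entirely from the variability of y; it shrinks as y becomes less variable relative to its mean.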

20.
The relative risk (RR) is one of the most frequently used indices for measuring the strength of association between a disease and a risk factor in etiological studies, or the efficacy of an experimental treatment in clinical trials. In this paper, we focus on interval estimation of RR for sparse data, in which there are only a few patients per stratum but a moderate or large number of strata. We consider five asymptotic interval estimators for RR: a weighted least-squares (WLS) interval estimator with an ad hoc adjustment procedure for sparse data, an interval estimator proposed elsewhere for rare events, an interval estimator based on the Mantel-Haenszel (MH) estimator with a logarithmic transformation, an interval estimator calculated from a quadratic equation, and an interval estimator derived from the ratio estimator with a logarithmic transformation. Using Monte Carlo simulations, we evaluate and compare the performance of these five interval estimators in a variety of situations. We note that, except when the underlying common RR across strata is around 1, using the WLS interval estimator with the adjustment procedure for sparse data can be misleading. We further note that the interval estimator suggested elsewhere for rare events tends to be conservative and hence leads to a loss of efficiency. We find that the other three interval estimators consistently perform well even when the mean number of patients for a given treatment is approximately 3 per stratum and the number of strata is as small as 20. Finally, we use a mortality data set comparing two chemotherapy treatments in patients with multiple myeloma to illustrate the use of the estimators discussed in this paper.
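The MH-based interval with a logarithmic transformation can be sketched as below. Each stratum contributes a 2 x 2 table (events a and size n1 under treatment, events c and size n0 under control). The point estimate is the Mantel-Haenszel risk ratio; for the variance of log(RR_MH) we assume the Greenland-Robins (1985) estimator, which is designed to remain consistent with many sparse strata, although the paper may use a different variance formula. The estimate is undefined if either arm has no events in any stratum.

```python
import math


def mh_rr_log_ci(strata, z=1.959963984540054):
    """Mantel-Haenszel risk ratio with a Wald interval on the log scale.

    strata: iterable of (a, n1, c, n0): events and size in the treatment arm,
            then events and size in the control arm, for each stratum.
    z     : standard normal quantile (the default gives a 95% interval).
    """
    num = den = var_num = 0.0
    for a, n1, c, n0 in strata:
        t = n1 + n0
        num += a * n0 / t
        den += c * n1 / t
        # Greenland-Robins numerator term for Var(log RR_MH)
        var_num += (n1 * n0 * (a + c) - a * c * t) / t ** 2
    rr = num / den
    se_log = math.sqrt(var_num / (num * den))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical sparse data: 20 strata with 3 patients per arm.
strata = [(1, 3, 0, 3), (0, 3, 1, 3), (2, 3, 1, 3), (1, 3, 1, 3)] * 5
print(mh_rr_log_ci(strata))
```

With a single stratum, the Greenland-Robins variance reduces algebraically to the familiar 1/a - 1/n1 + 1/c - 1/n0 for a crude log risk ratio.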
