Similar documents (20 results)
1.
Wang X, Wang K, Lim J. Biometrics 2012, 68(1), 194–202.
In applications that require cost efficiency, sample sizes are typically small, so the problem of empty strata often occurs in judgment poststratification (JPS), an important variant of balanced ranked set sampling. In this article, we consider estimation of the population cumulative distribution function (CDF) from JPS samples with empty strata. In the literature, the standard and restricted CDF estimators (Stokes and Sager, 1988, Journal of the American Statistical Association 83, 374–381; Frey and Ozturk, 2011, Annals of the Institute of Statistical Mathematics, to appear) do not perform well when empty strata are simply ignored. We show that the original isotonized estimator (Ozturk, 2007, Journal of Nonparametric Statistics 19, 131–144) can handle empty strata automatically through two methods, MinMax and MaxMin. However, using them blindly can produce undesirable results in either tail of the CDF. We thoroughly examine MinMax and MaxMin and report interesting findings about their behavior and performance in the presence of empty strata. Motivated by these results, we propose modified isotonized estimators to improve estimation efficiency. Through simulation and empirical studies, we show that our estimators work well in different regions of the CDF and also improve the overall performance of estimating the whole function.

2.
Ranked set sampling, in which ranking is based on visual judgment of the differences between the sizes of pairs of units or on a concomitant variable, is reviewed. An alternative model for judgment ranking based on ratios of sizes of pairs of units is presented. Computation of the variance of a visual ranked set sampling estimator of the mean of a distribution is enabled via maximum likelihood estimation of the visual judgment error variance. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
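A balanced ranked set sample quantifies, for each rank, only one judgment-ranked unit from a freshly drawn set. A minimal sketch of the resulting mean estimator, assuming perfect judgment ranking by the true values (an idealization; the abstract's model instead uses visual judgment with an error variance estimated by maximum likelihood):

```python
import random

def ranked_set_sample_mean(population, set_size, cycles, rng=None):
    """Balanced ranked set sampling estimate of the population mean.

    In each cycle, for each rank r = 0..set_size-1, draw `set_size`
    units, rank them, and quantify only the r-th ranked unit.  Ranking
    here uses the true values (perfect judgment), an idealizing
    assumption; in practice ranking is by visual judgment or a
    concomitant variable.
    """
    rng = rng or random.Random(0)
    measured = []
    for _ in range(cycles):
        for r in range(set_size):
            judgment_set = rng.sample(population, set_size)
            judgment_set.sort()               # perfect judgment ranking
            measured.append(judgment_set[r])  # measure one unit per set
    return sum(measured) / len(measured)
```

With imperfect judgment the sort step would be replaced by a noisy ordering, which is exactly the error the abstract's model quantifies.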

3.
Lakhal L, Rivest LP, Abdous B. Biometrics 2008, 64(1), 180–188.
In many follow-up studies, patients are subject to concurrent events. In this article, we consider semicompeting risks data as defined by Fine, Jiang, and Chappell (2001, Biometrika 88, 907–919), where one event is censored by the other but not vice versa. The proposed model involves marginal survival functions for the two events and a parametric family of copulas for their dependency. This article suggests a general method for estimating the dependence parameter when the dependency is modeled with an Archimedean copula. It uses the copula-graphic estimator of Zheng and Klein (1995, Biometrika 82, 127–138) for estimating the survival function of the nonterminal event, subject to dependent censoring. Asymptotic properties of these estimators are derived. Simulations show that the new methods work well with finite samples. The copula-graphic estimator is shown to be more accurate than the estimator proposed by Fine et al. (2001); its performance is similar to that of the self-consistent estimator of Jiang, Fine, Kosorok, and Chappell (2005, Scandinavian Journal of Statistics 33, 1–20). The analysis of a data set, emphasizing the estimation of characteristics of the observable region, is presented as an illustration.

4.
1. The most straightforward way to assess diversity in a site is the species count. However, a relatively large sample is needed for a reliable result because of the presence of many rare species in rich assemblages. The use of richness estimation methods is suggested by many authors as a solution for this problem in many cases.
2. We examined the performance of 13 methods for estimating richness of stream macroinvertebrates inhabiting riffles both at local (stream) and regional (catchment) scales. The evaluation was based on (1) the smallest sub-sample size needed to estimate total richness in the sample, (2) constancy of this size, (3) lack of erratic behaviour in curve shape and (4) similarity in curve shape through different data sets. Samples were from three single stream sites (local) and three from several streams within the same catchment basin (regional). All collections were made from protected forest areas in south-east Brazil.
3. All estimation methods were dependent on sub-sample size, producing higher estimates when using larger sub-sample sizes. The Stout and Vandermeer method estimated total richness in the samples with the smallest sub-sample size, but showed some erratic behaviour at small sub-sample sizes, and the estimated curves were not similar among the six samples. The Bootstrap method was the best estimator in relation to constancy of sub-sample sizes, but needed an unacceptably large sub-sample to estimate total richness in the samples. The second-order Jackknife method was the second best estimator both for minimum sub-sample size and constancy of this size, and we suggest its use in future studies of diversity in tropical streams. Despite the inferior performance of several other methods, some produced acceptable results. Comments are made on the utility of using these estimators for predicting species richness in an area and for comparative purposes in diversity studies.
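The second-order Jackknife favored above has a simple closed form: with n sub-samples, Q1 species detected in exactly one sub-sample, and Q2 in exactly two, the estimate is S_obs + Q1(2n − 3)/n − Q2(n − 2)²/(n(n − 1)). A sketch, where the per-sub-sample incidence representation is an assumed data layout:

```python
def jackknife2_richness(sample_units):
    """Second-order jackknife species richness estimate.

    `sample_units` is a list of sets, one set of detected species per
    sub-sample (an assumed layout).  With n sub-samples, Q1 species
    seen in exactly one unit and Q2 in exactly two:
        S = S_obs + Q1*(2n - 3)/n - Q2*(n - 2)**2 / (n*(n - 1))
    """
    n = len(sample_units)
    counts = {}
    for unit in sample_units:
        for species in unit:
            counts[species] = counts.get(species, 0) + 1
    s_obs = len(counts)
    q1 = sum(1 for c in counts.values() if c == 1)  # "uniques"
    q2 = sum(1 for c in counts.values() if c == 2)  # "duplicates"
    return s_obs + q1 * (2 * n - 3) / n - q2 * (n - 2) ** 2 / (n * (n - 1))
```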

5.
The validity of limiting dilution assays can be compromised or negated by the use of statistical methodology that does not consider all issues surrounding the biological process. This study critically evaluates statistical methods for estimating the mean frequency of responding cells in multiple sample limiting dilution assays. We show that methods that pool limiting dilution assay data, or samples, are unable to estimate the variance appropriately. In addition, we use Monte Carlo simulations to evaluate an unweighted mean of the maximum likelihood estimator, an unweighted mean based on the jackknife estimator, and a log transform of the maximum likelihood estimator. For small culture replicate sizes, the log transform outperforms both unweighted mean procedures. For moderate culture replicate sizes, the unweighted mean based on the jackknife produces the most acceptable results. This study also addresses the important issue of experimental design in multiple sample limiting dilution assays. In particular, we demonstrate that optimization of multiple sample limiting dilution assays is achieved by increasing the number of biological samples at the expense of repeat cultures.
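Under the single-hit Poisson model that underlies limiting dilution assays, the probability that a culture is negative at dose d is exp(−f·d), where f is the frequency of responding cells. A sketch of the per-sample maximum likelihood estimator, found here by bisection on the score function (the dose and replicate layout is illustrative):

```python
import math

def lda_frequency_mle(doses, positives, replicates):
    """MLE of the responding-cell frequency f in a single-sample
    limiting dilution assay, under the single-hit Poisson model
    P(culture negative | dose d) = exp(-f*d).

    doses[i]: cells per culture; positives[i]: responding cultures out
    of replicates[i].  The score root is found by bisection, since the
    score is decreasing in f.
    """
    def score(f):
        s = 0.0
        for d, pos, n in zip(doses, positives, replicates):
            s -= (n - pos) * d                 # negative cultures
            if pos:
                p = math.exp(-f * d)
                s += pos * d * p / (1.0 - p)   # positive cultures
        return s
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a single dose this reduces to f = −log(fraction negative)/dose; the multiple-sample estimators the abstract compares aggregate such per-sample MLEs.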

6.
Cai J, Sen PK, Zhou H. Biometrics 1999, 55(1), 182–189.
A random effects model for analyzing multivariate failure time data is proposed. The work is motivated by the need for assessing the mean treatment effect in a multicenter clinical trial study, assuming that the centers are a random sample from an underlying population. An estimating equation for the mean hazard ratio parameter is proposed. The proposed estimator is shown to be consistent and asymptotically normally distributed. A variance estimator, based on large sample theory, is proposed. Simulation results indicate that the proposed estimator performs well in finite samples. The proposed variance estimator effectively corrects the bias of the naive variance estimator, which assumes independence of individuals within a group. The methodology is illustrated with a clinical trial data set from the Studies of Left Ventricular Dysfunction. This shows that the variability of the treatment effect is higher than found by means of simpler models.

7.
Metagenomic sequencing projects from environments dominated by a small number of species produce genome-wide population samples. We present a two-site composite likelihood estimator of the scaled recombination rate, ρ = 2Nec, that operates on metagenomic assemblies in which each sequenced fragment derives from a different individual. This new estimator properly accounts for sequencing error, as quantified by per-base quality scores, and missing data, as inferred from the placement of reads in a metagenomic assembly. We apply our estimator to data from a sludge metagenome project to demonstrate how this method will elucidate the rates of exchange of genetic material in natural microbial populations. Surprisingly, for a fixed amount of sequencing, this estimator has lower variance than similar methods that operate on more traditional population genetic samples of comparable size. In addition, we can infer variation in recombination rate across the genome because metagenomic projects sample genetic diversity genome-wide, not just at particular loci. The method itself makes no assumption specific to microbial populations, opening the door for application to any mixed population sample where the number of individuals sampled is much greater than the number of fragments sequenced.

8.
Understanding the functional relationship between the sample size and the performance of species richness estimators is necessary to optimize limited sampling resources against estimation error. Nonparametric estimators such as Chao and Jackknife demonstrate strong performance, but consensus is lacking as to which estimator performs better under constrained sampling. We explore a method to improve the estimators under such a scenario. The method we propose involves randomly splitting species-abundance data from a single sample into two equally sized samples, and using an appropriate incidence-based estimator to estimate richness. To test this method, we assume a lognormal species-abundance distribution (SAD) with varying coefficients of variation (CV), generate samples using MCMC simulations, and use the expected mean-squared error as the performance criterion of the estimators. We test this method for the Chao, Jackknife, ICE, and ACE estimators. Between abundance-based estimators with the single sample and incidence-based estimators with the split-in-two samples, Chao2 performed best when CV < 0.65, and the incidence-based Jackknife performed best when CV > 0.65, given that the ratio of sample size to observed species richness is greater than a critical value given by a power function of CV with respect to the abundance of the sampled population. The proposed method increases the performance of the estimators substantially and is more effective when more rare species are in an assemblage. We also show that the splitting method works qualitatively similarly well when the SADs are log series, geometric series, and negative binomial. We demonstrate an application of the proposed method by estimating richness of zooplankton communities in samples of ballast water.

The proposed splitting method is an alternative to sampling a large number of individuals to increase the accuracy of richness estimations; therefore, it is appropriate for a wide range of resource-limited sampling scenarios in ecology.
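A minimal sketch of the splitting idea: each individual is assigned at random to one of two halves, and the classic Chao2 incidence estimator is applied to the two resulting "sampling units". The paper's MCMC evaluation and its CV-based rule for choosing between estimators are not reproduced here.

```python
import random

def split_chao2(abundance, rng=None):
    """Estimate richness by splitting one abundance sample in two and
    applying the classic Chao2 incidence estimator to the halves.

    `abundance` maps species -> individual count (assumed layout).
    Each individual is assigned to a half at random, giving two
    incidence 'sampling units'.
    """
    rng = rng or random.Random(42)
    halves = [set(), set()]
    for species, count in abundance.items():
        for _ in range(count):
            halves[rng.randrange(2)].add(species)
    q1 = len(halves[0] ^ halves[1])   # detected in exactly one half
    q2 = len(halves[0] & halves[1])   # detected in both halves
    s_obs = len(halves[0] | halves[1])
    if q2 == 0:                       # bias-corrected fallback when Q2 = 0
        return s_obs + q1 * (q1 - 1) / 2.0
    return s_obs + q1 * q1 / (2.0 * q2)
```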

9.
The Petersen–Lincoln estimator has been used to estimate the size of a population in a single mark–release experiment. However, the estimator is not valid when the capture sample and recapture sample are not independent. We provide an intuitive interpretation of "independence" between samples based on 2 × 2 categorical data formed by capture/non-capture in each of the two samples. From this interpretation, we review a general measure of "dependence" and quantify the correlation bias of the Petersen–Lincoln estimator when two types of dependence (local list dependence and heterogeneity of capture probability) exist. An important implication for the census undercount problem is that instead of using a post-enumeration sample to assess the undercount of a census, one should conduct a prior enumeration sample to avoid correlation bias. We extend the Petersen–Lincoln method to the case of two populations. A new estimator of the size of the shared population is proposed and its variance is derived. We discuss a special case in which the correlation bias of the proposed estimator due to dependence between samples vanishes. The proposed method is applied to a study of the relapse rate of illicit drug use in Taiwan. (© 2008 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
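The Petersen–Lincoln estimator itself is one line: with n1 marked individuals, a second sample of n2, and m marked individuals recaptured, the estimate is n1·n2/m. A sketch, alongside Chapman's standard bias-reduced variant (the latter is not discussed in the abstract and is included only for comparison):

```python
def petersen_lincoln(n1, n2, m):
    """Petersen-Lincoln estimate of population size: n1 marked and
    released, n2 captured in the second sample, m of them marked.
    Valid only under independence of the two samples."""
    if m == 0:
        raise ValueError("no marked recaptures; estimate is undefined")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Chapman's bias-reduced variant, finite even when m = 0
    (a comparison point, not part of the abstract's method)."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

The abstract's point is that when capture and recapture are positively dependent, m is inflated relative to independence and this ratio underestimates the true size (and vice versa), which is the correlation bias it quantifies.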

10.
In the capture–recapture problem for two independent samples, the traditional estimator, calculated as the product of the two sample sizes divided by the number of sampled subjects appearing in both samples, is well known to be a biased estimator of the population size and to have no finite variance under direct or binomial sampling. To alleviate these theoretical limitations, inverse sampling, in which we continue sampling subjects in the second sample until we obtain a desired number of marked subjects who appeared in the first sample, has been proposed elsewhere. In this paper, we consider five interval estimators of the population size: the most commonly used interval estimator using Wald's statistic, the interval estimator using the logarithmic transformation, an interval estimator derived from a quadratic equation developed here, the interval estimator using the χ2-approximation, and the interval estimator based on the exact negative binomial distribution. To evaluate and compare the finite-sample performance of these estimators, we employ Monte Carlo simulation to calculate the coverage probability and the standardized average length of the resulting confidence intervals in a variety of situations. To study the location of these interval estimators, we calculate the non-coverage probability in the two tails of the confidence intervals. Finally, we briefly discuss optimal sample size determination for a given precision to minimize the expected total cost. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
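Two of the five intervals are easy to sketch generically: the Wald interval is symmetric around the point estimate, while the logarithmic transformation builds the interval on the log scale and back-transforms, keeping the lower limit positive. A sketch using a delta-method standard error for log(N̂); the inverse-sampling variance formula from the paper is not reproduced here:

```python
import math

def wald_ci(n_hat, se, z=1.96):
    """Symmetric Wald interval for the population size."""
    return n_hat - z * se, n_hat + z * se

def log_transform_ci(n_hat, se, z=1.96):
    """Interval built on the log scale and back-transformed.  By the
    delta method, SE(log n_hat) is approximately se / n_hat, so the
    interval is n_hat / factor to n_hat * factor: asymmetric, with a
    lower limit that stays positive."""
    factor = math.exp(z * se / n_hat)
    return n_hat / factor, n_hat * factor
```

The asymmetry of the log-transformed interval is one reason such intervals often have better tail behavior for skewed estimators, which is exactly what the paper's non-coverage study examines.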

11.
Du J, MacEachern SN. Biometrics 2008, 64(2), 345–354.
In many scientific studies, information that is not easily translated into covariates is ignored in the analysis. However, this type of information may significantly improve inference. In this research, we apply the idea of judgment post-stratification to utilize such information. Specifically, we consider experiments that are conducted under a completely randomized design. Sets of experimental units are formed, and the units in a set are ranked. Estimation is performed conditional on the sets and ranks. We propose a new estimator for a treatment contrast. We improve the new estimator by Rao–Blackwellization. Asymptotic distribution theory and corresponding inferential procedures for both estimators are developed. Simulation studies quantify the superiority of the new estimators and show their desirable properties for small and moderate sample sizes. The impact of the new techniques is illustrated with data from a clinical trial.

12.
Standard prospective logistic regression analysis of case–control data often leads to very imprecise estimates of gene-environment interactions due to small numbers of cases or controls in cells of crossing genotype and exposure. In contrast, under the assumption of gene-environment independence, modern "retrospective" methods, including the "case-only" approach, can estimate the interaction parameters much more precisely, but they can be seriously biased when the underlying assumption of gene-environment independence is violated. In this article, we propose a novel empirical Bayes-type shrinkage estimator to analyze case–control data that can relax the gene-environment independence assumption in a data-adaptive fashion. In the special case involving a binary gene and a binary exposure, the method leads to an estimator of the interaction log odds ratio parameter in a simple closed form that corresponds to a weighted average of the standard case-only and case–control estimators. We also describe a general approach for deriving the new shrinkage estimator and its variance within the retrospective maximum-likelihood framework developed by Chatterjee and Carroll (2005, Biometrika 92, 399–418). Both simulated and real data examples suggest that the proposed estimator strikes a balance between bias and efficiency depending on the true nature of the gene-environment association and the sample size for a given study.

13.
Zhou XH, Tu W. Biometrics 2000, 56(4), 1118–1125.
In this paper, we consider the problem of interval estimation for the mean of diagnostic test charges. Diagnostic test charge data may contain zero values, and the nonzero values can often be modeled by a log-normal distribution. Under such a model, we propose three different interval estimation procedures: a percentile-t bootstrap interval based on sufficient statistics and two likelihood-based confidence intervals. For theoretical properties, we show that the two likelihood-based one-sided confidence intervals are only first-order accurate and that the bootstrap-based one-sided confidence interval is second-order accurate. For two-sided confidence intervals, all three proposed methods are second-order accurate. A simulation study with finite sample sizes suggests that all three proposed intervals outperform a widely used interval based on the minimum variance unbiased estimator (MVUE), except in the case of one-sided lower end-point intervals when the skewness is very small. Among the proposed one-sided intervals, the bootstrap interval has the best coverage accuracy. For the two-sided intervals, when the sample size is small, the bootstrap method still yields the best coverage accuracy unless the skewness is very small, in which case the bias-corrected ML method has the best accuracy. When the sample size is large, all three proposed intervals have similar coverage accuracy. Finally, we apply the proposed methods to a real example assessing diagnostic test charges among older adults with depression.
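As a simpler cousin of the percentile-t interval studied here, a plain percentile bootstrap for the mean of charge data containing zeros can be sketched as follows. This is illustrative only: the paper's interval additionally studentizes each resample, which is what buys its second-order accuracy.

```python
import random

def percentile_bootstrap_ci(data, level=0.95, n_boot=2000, rng=None):
    """Percentile bootstrap interval for the mean of nonnegative
    charge data that may contain zeros.  An illustrative sketch, not
    the paper's percentile-t interval (which studentizes resamples)."""
    rng = rng or random.Random(0)
    n = len(data)
    # resample with replacement, record each resample's mean
    means = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    return means[int(alpha * n_boot)], means[int((1.0 - alpha) * n_boot) - 1]
```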

14.
Outcome misclassification occurs frequently in binary-outcome studies and can result in biased estimation of quantities such as the incidence, prevalence, cause-specific hazards, cumulative incidence functions, and so forth. A number of remedies have been proposed to address the potential misclassification of the outcomes in such data. The majority of these remedies lie in the estimation of misclassification probabilities, which are in turn used to adjust analyses for outcome misclassification. A number of authors advocate using a gold-standard procedure on a sample internal to the study to learn about the extent of the misclassification. With this type of internal validation, the problem of quantifying the misclassification also becomes a missing data problem as, by design, the true outcomes are ascertained only on a subset of the entire study sample. Although the process of estimating misclassification probabilities appears simple conceptually, the estimation methods proposed so far have several methodological and practical shortcomings. Most methods rely on the missing outcome data being missing completely at random (MCAR), a rather stringent assumption that is unlikely to hold in practice. Some of the existing methods also tend to be computationally intensive. To address these issues, we propose a computationally efficient, easy-to-implement, pseudo-likelihood estimator of the misclassification probabilities under a missing at random (MAR) assumption, in studies with an available internal-validation sample. We present the estimator through the lens of studies with competing-risks outcomes, though the estimator extends beyond this setting. We describe the consistency and asymptotic distributional properties of the resulting estimator, and derive a closed-form estimator of its variance. The finite-sample performance of this estimator is evaluated via simulations.

Using data from a real-world study with competing-risks outcomes, we illustrate how the proposed method can be used to estimate misclassification probabilities. We also show how the estimated misclassification probabilities can be used in an external study to adjust for possible misclassification bias when modeling cumulative incidence functions.

15.
MOTIVATION: The false discovery rate (FDR) is defined as the expected percentage of false positives among all claimed positives. In practice, with the true FDR unknown, an estimated FDR can serve as a criterion to evaluate the performance of various statistical methods, provided the estimated FDR approximates the true FDR well or, at least, does not improperly favor or disfavor any particular method. Permutation methods have become popular for estimating the FDR in genomic studies. The purpose of this paper is twofold. First, we investigate theoretically and empirically whether the standard permutation-based FDR estimator is biased, and if so, whether the bias inappropriately favors or disfavors any method. Second, we propose a simple modification of the standard permutation to yield a better FDR estimator, which can in turn serve as a fairer criterion to evaluate various statistical methods. RESULTS: Both simulated and real data examples are used for illustration and comparison. Three commonly used test statistics, the sample mean, the SAM statistic, and Student's t-statistic, are considered. The results show that the standard permutation method overestimates the FDR. The overestimation is most severe for the sample mean statistic and least for the t-statistic, with the SAM statistic lying between the two extremes, suggesting that one has to be cautious when using standard permutation-based FDR estimates to evaluate various statistical methods. In addition, our proposed FDR estimation method is simple and outperforms the standard method.
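The standard permutation-based estimator analyzed here has a simple form at a fixed cutoff: the average number of permutation-null statistics exceeding the cutoff, divided by the number of observed statistics exceeding it. A sketch, taking the null proportion pi0 as 1, which is one source of the overestimation the abstract describes:

```python
def permutation_fdr(observed, permuted, cutoff):
    """Standard permutation-based FDR estimate at a fixed cutoff.

    observed: list of observed test statistics (one per gene).
    permuted: list of permutation replicates, each a list of null
    statistics of the same length.  pi0 is taken as 1, a conservative
    choice that contributes to overestimation of the true FDR.
    """
    r = sum(1 for t in observed if abs(t) >= cutoff)       # claimed positives
    if r == 0:
        return 0.0
    exceed = [sum(1 for t in perm if abs(t) >= cutoff) for perm in permuted]
    v_hat = sum(exceed) / len(exceed)                      # est. false positives
    return min(1.0, v_hat / r)
```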

16.
There are two cases in double sampling: case (i), when the second sample is a sub-sample from the preliminary large sample, and case (ii), when the second sample is not a sub-sample from the preliminary large sample. SISODIA and DWIVEDI (1981) proposed a ratio-cum-product-type estimator in double sampling and studied its properties under case (i). In this paper, we study the properties of the same estimator under case (ii). The estimator is found to be superior to the double sampling linear regression estimator, the usual ratio estimator, the product estimator, and others. The estimator is also compared with the simple mean per unit for a given cost of the survey.
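For reference, the classic ratio estimator under double sampling can be sketched as follows: the preliminary large sample is used only for the auxiliary mean, while the second sample supplies both variables. The SISODIA–DWIVEDI ratio-cum-product estimator is a refinement of this form and is not reproduced here.

```python
def double_sampling_ratio_estimate(y2, x2, x1_aux):
    """Classic ratio estimator under double sampling: the preliminary
    large sample contributes only the auxiliary variable's mean, the
    second sample contributes both variables.  (A baseline sketch;
    the ratio-cum-product estimator of the abstract refines it.)"""
    ybar = sum(y2) / len(y2)            # study variable, second sample
    xbar2 = sum(x2) / len(x2)           # auxiliary variable, second sample
    xbar1 = sum(x1_aux) / len(x1_aux)   # auxiliary variable, large sample
    return ybar * xbar1 / xbar2
```

The estimator exploits the correlation between y and x: when the second-sample auxiliary mean underestimates the large-sample mean, the study-variable mean is scaled up proportionally, and vice versa.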

17.
Song X, Wang CY. Biometrics 2008, 64(2), 557–566.
We study joint modeling of survival and longitudinal data. There are two regression models of interest. The primary model is for survival outcomes, which are assumed to follow a time-varying coefficient proportional hazards model. The second model is for longitudinal data, which are assumed to follow a random effects model. Based on the trajectory of a subject's longitudinal data, some covariates in the survival model are functions of the unobserved random effects. Estimated random effects are generally different from the unobserved random effects and hence this leads to covariate measurement error. To deal with covariate measurement error, we propose a local corrected score estimator and a local conditional score estimator. Both approaches are semiparametric methods in the sense that there is no distributional assumption needed for the underlying true covariates. The estimators are shown to be consistent and asymptotically normal. However, simulation studies indicate that the conditional score estimator outperforms the corrected score estimator for finite samples, especially in the case of relatively large measurement error. The approaches are demonstrated by an application to data from an HIV clinical trial.

18.
Motivated by recent work involving the analysis of biomedical imaging data, we present a novel procedure for constructing simultaneous confidence corridors for the mean of imaging data. We propose to use flexible bivariate splines over triangulations to handle an irregular domain of the images that is common in brain imaging studies and in other biomedical imaging applications. The proposed spline estimators of the mean functions are shown to be consistent and asymptotically normal under some regularity conditions. We also provide a computationally efficient estimator of the covariance function and derive its uniform consistency. The procedure is also extended to the two-sample case in which we focus on comparing the mean functions from two populations of imaging data. Through Monte Carlo simulation studies, we examine the finite sample performance of the proposed method. Finally, the proposed method is applied to analyze brain positron emission tomography data in two different studies. One data set used in preparation of this article was obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

19.
Qin GY, Zhu ZY. Biometrics 2009, 65(1), 52–59.
In this article, we study robust estimation of both mean and variance components in generalized partial linear mixed models, based on the construction of a robustified likelihood function. Under some regularity conditions, the asymptotic properties of the proposed robust estimators are established. Simulations are carried out to investigate the performance of the proposed robust estimators. As expected, they perform better than those resulting from robust estimating equations involving conditional expectation, such as Sinha (2004, Journal of the American Statistical Association 99, 451–460) and Qin and Zhu (2007, Journal of Multivariate Analysis 98, 1658–1683). Finally, the proposed robust method is illustrated by the analysis of a real data set.

20.
This article considers the problem of assessing causal effect moderation in longitudinal settings in which treatment (or exposure) is time varying, and so are the covariates said to moderate its effect. Intermediate causal effects, which describe time-varying causal effects of treatment conditional on past covariate history, are introduced and considered as part of Robins' structural nested mean model. Two estimators of the intermediate causal effects, and their standard errors, are presented and discussed: the first is a proposed two-stage regression estimator; the second is Robins' G-estimator. The results of a small simulation study that begins to shed light on the small- versus large-sample performance of the estimators, and on the bias–variance trade-off between the two estimators, are presented. The methodology is illustrated using longitudinal data from a depression study.
