Similar literature
 20 similar articles found; search time 836 ms
1.
Ranked set sampling with unequal samples   Cited by: 3 (self-citations: 0; cited by others: 3)
Bhoj DS 《Biometrics》2001,57(3):957-962
A ranked set sampling procedure with unequal samples (RSSU) is proposed and used to estimate the population mean. This estimator is then compared with the estimators based on the ranked set sampling (RSS) and median ranked set sampling (MRSS) procedures. It is shown that the relative precisions of the estimator based on RSSU are higher than those of the estimators based on RSS and MRSS. An example of estimating the mean diameter at breast height of longleaf-pine trees on the Wade Tract in Thomas County, Georgia, is presented.
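For orientation, the baseline that RSSU is compared against can be sketched as follows. This is a minimal sketch of standard balanced RSS with perfect ranking, not the RSSU procedure of the paper; the function name `ranked_set_sample`, the simulated normal population, and the choice of set size 3 with 4 cycles are all illustrative assumptions.

```python
import random
import statistics

def ranked_set_sample(population, set_size, cycles, rng):
    """Draw a balanced ranked set sample (standard RSS, hypothetical helper).

    Ranking here uses the true values (perfect ranking); in the field the
    ranking would be done by judgment or a cheap auxiliary variable.
    """
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            # Draw a fresh set of `set_size` units and order them.
            candidates = sorted(rng.sample(population, set_size))
            # Only the unit holding the current rank is actually measured.
            measured.append(candidates[rank])
    return measured

rng = random.Random(0)
# Simulated population, purely for illustration.
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]
rss = ranked_set_sample(population, set_size=3, cycles=4, rng=rng)
rss_mean = statistics.fmean(rss)  # RSS estimator of the population mean
```

Each cycle inspects `set_size * set_size` units but measures only `set_size` of them, which is why RSS is attractive when ranking is cheap and measurement is expensive.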

2.
Understanding the functional relationship between the sample size and the performance of species richness estimators is necessary to optimize limited sampling resources against estimation error. Nonparametric estimators such as Chao and Jackknife demonstrate strong performances, but consensus is lacking as to which estimator performs better under constrained sampling. We explore a method to improve the estimators under such a scenario. The method we propose involves randomly splitting species‐abundance data from a single sample into two equally sized samples, and using an appropriate incidence‐based estimator to estimate richness. To test this method, we assume a lognormal species‐abundance distribution (SAD) with varying coefficients of variation (CV), generate samples using MCMC simulations, and use the expected mean‐squared error as the performance criterion of the estimators. We test this method for Chao, Jackknife, ICE, and ACE estimators. Between abundance‐based estimators with the single sample, and incidence‐based estimators with the split‐in‐two samples, Chao2 performed the best when CV < 0.65, and incidence‐based Jackknife performed the best when CV > 0.65, given that the ratio of sample size to observed species richness is greater than a critical value given by a power function of CV with respect to abundance of the sampled population. The proposed method increases the performance of the estimators substantially and is more effective when more rare species are in an assemblage. We also show that the splitting method works qualitatively similarly well when the SADs are log series, geometric series, and negative binomial. We demonstrate an application of the proposed method by estimating richness of zooplankton communities in samples of ballast water.
The proposed splitting method is an alternative to sampling a large number of individuals to increase the accuracy of richness estimations; therefore, it is appropriate for a wide range of resource‐limited sampling scenarios in ecology.
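The split-then-estimate idea can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' code: individuals are assigned to the two halves by a plain shuffle, the estimator shown is the commonly used bias-corrected form of Chao2, and the function names `split_in_two` and `chao2_bias_corrected` are hypothetical.

```python
import random

def split_in_two(abundances, rng):
    """Randomly assign individuals to two equally sized halves.

    `abundances[i]` is the count of species i in the single sample.
    If the total is odd, one individual is left out (an assumption).
    """
    individuals = [sp for sp, n in enumerate(abundances) for _ in range(n)]
    rng.shuffle(individuals)
    half = len(individuals) // 2
    first = [0] * len(abundances)
    second = [0] * len(abundances)
    for sp in individuals[:half]:
        first[sp] += 1
    for sp in individuals[half:2 * half]:
        second[sp] += 1
    return first, second

def chao2_bias_corrected(samples):
    """Bias-corrected Chao2 from a list of per-sample species counts."""
    m = len(samples)
    n_species = len(samples[0])
    # Incidence: in how many samples does each species occur?
    incidence = [sum(1 for s in samples if s[i] > 0) for i in range(n_species)]
    s_obs = sum(1 for k in incidence if k > 0)
    q1 = sum(1 for k in incidence if k == 1)  # species found in exactly one sample
    q2 = sum(1 for k in incidence if k == 2)  # species found in exactly two samples
    return s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))

halves = split_in_two([12, 7, 3, 1, 1, 1], random.Random(42))
richness_estimate = chao2_bias_corrected(list(halves))
```

Because the split is random, the estimate varies from run to run; averaging over repeated splits would be one way to stabilize it, though the abstract does not say whether the authors do so.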

3.
Barabesi L  Pisani C 《Biometrics》2002,58(3):586-592
In practical ecological sampling studies, a certain design (such as plot sampling or line-intercept sampling) is usually replicated more than once. For each replication, the Horvitz-Thompson estimator of the objective parameter is considered. Finally, an overall estimator is obtained by averaging the single Horvitz-Thompson estimators. Because the design replications are drawn independently and under the same conditions, the overall estimator is simply the sample mean of the Horvitz-Thompson estimators under simple random sampling. This procedure may be improved by using ranked set sampling. Hence, we propose the replicated protocol under ranked set sampling, which gives rise to a more accurate estimation than the replicated protocol under simple random sampling.
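The simple-random-sampling version of the replicated protocol described above amounts to averaging Horvitz-Thompson estimates. A minimal sketch, assuming the target is a population total and that first-order inclusion probabilities are known for every sampled unit; the function names and the toy numbers are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i."""
    return sum(y / p for y, p in zip(values, inclusion_probs))

def replicated_estimate(replications):
    """Average the per-replication Horvitz-Thompson estimates.

    `replications` is a list of (values, inclusion_probs) pairs, one pair
    per independent replication of the design.
    """
    estimates = [horvitz_thompson_total(v, p) for v, p in replications]
    return sum(estimates) / len(estimates)

# Two toy replications with made-up values and inclusion probabilities.
overall = replicated_estimate([
    ([2.0, 3.0], [0.5, 0.25]),
    ([1.0, 4.0], [0.5, 0.25]),
])
```

The ranked set sampling variant proposed in the abstract would replace the plain average with an average over replications selected by rank; that step is not shown here.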

4.
Unit nonresponse is often a problem in sample surveys. It arises when the values of the survey variable cannot be recorded for some sampled units. In this paper, the use of nonresponse calibration weighting to treat nonresponse is considered in a complete design‐based framework. Nonresponse is viewed as a fixed characteristic of the units. The approach is suitable in environmental and forest surveys when sampled sites cannot be reached by field crews. Approximate expressions of design‐based bias and variance of the calibration estimator are derived and design‐based consistency is investigated. Choice of auxiliary variables to perform calibration is discussed. Sen–Yates–Grundy, Horvitz–Thompson, and jackknife estimators of the sampling variance are proposed. Analytical and Monte Carlo results demonstrate the validity of the procedure when the relationship between survey and auxiliary variables is similar in respondent and nonrespondent strata. An application to a forest survey performed in Northeastern Italy is considered.

5.
The existence of missing observations when the difference of means is estimated creates the need for subsampling among the nonrespondents. Ranked set sampling is used for subsampling. The information provided on one of the variables by the nonrespondents at the first attempt makes it possible to rank them. The behavior of a ranked set sampling model with respect to other alternatives is studied in this paper. An unbiased estimator is derived and its expected variance is obtained. The proposed model is compared with the use of simple random sampling and two‐phase sampling for stratification.

6.
Precision of the estimate of the population mean using a ranked set sample (RSS) relative to using a simple random sample (SRS), with the same number of quantified units, depends upon the population and success in ranking. In practice, even ranking a sample of moderate size and observing the ith ranked unit (other than the extremes) is a difficult task. Therefore, in this paper we introduce a variety of extreme ranked set samples (ERSSs) to estimate the population mean. ERSSs are more practical than ordinary ranked set sampling, since with an even sample size we need to identify successfully only the first and/or the last ordered unit, or with an odd sample size the median unit. We show that ERSSs give an unbiased estimate of the population mean in the case of symmetric populations and are more efficient than SRS, using the same number of quantified units. An example using real data is given. Parametric examples are also given.

7.
A nonparametric selected ranked set sampling is suggested. The estimator of population mean based on the new approach is compared with that using the simple random sampling (SRS), the ranked set sampling (RSS) and the median ranked set sampling (MRSS) methods. The estimator of population mean using the new approach is found to be more efficient than its counterparts for almost all the cases considered.

8.
Wang YG  Chen Z  Liu J 《Biometrics》2004,60(2):556-561
Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971) considered optimal set size for ranked set sampling (RSS) with fixed operational costs. This framework can be very useful in practice to determine whether RSS is beneficial and to obtain the optimal set size that minimizes the variance of the population estimator for a fixed total cost. In this article, we propose a scheme of general RSS in which more than one observation can be taken from each ranked set. This is shown to be more cost-effective in some cases when the cost of ranking is not so small. Using the example in Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971), we demonstrate that taking two or more observations from one set can be more beneficial, even with the optimal set size from the RSS design.

9.
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ('Hill diversities'), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.
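The diversity family mentioned above is commonly written as the Hill number of order q, which reduces to species richness at q = 0, the exponential of Shannon entropy at q = 1, and the inverse Simpson index at q = 2. The sketch below computes the plain plug-in value from observed counts; it is not the lower/upper estimation procedure of the paper, whose exact parameterization may differ, and the function name `hill_number` is an assumption.

```python
import math

def hill_number(abundances, q):
    """Plug-in Hill number of order q from observed species counts."""
    total = sum(abundances)
    p = [n / total for n in abundances if n > 0]
    if math.isclose(q, 1.0):
        # Limit q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

counts = [80, 10, 10]  # made-up community, one dominant species
richness = hill_number(counts, 0)        # counts every observed species equally
shannon_div = hill_number(counts, 1)     # weights species by frequency
simpson_div = hill_number(counts, 2)     # dominated by common species
```

Orders q >= 1 down-weight rare species, which is consistent with the abstract's point that Shannon and Simpson diversity are less sensitive than richness to the unobserved tail of the abundance distribution.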

10.
AGARWAL and KUMAR (1980) proposed an estimator combining the ratio and pps estimators of the population mean, and proved that the proposed estimator would always be better (in the minimum mean square error sense) than the pps estimator or the ratio estimator under a pps sampling scheme for the optimum value of the constant k (a parameter). The optimum value of k is rarely known in practice; hence the alternative is to estimate k from the sample values. In this paper, an estimator depending on the estimated optimum value of k based on sample values, under a pps sampling scheme, is proposed and studied.

11.
For clustered multinomial responses fit using the generalized logistic link, Morel (1989) introduced a small sample correction in the Taylor series based estimator of the covariance matrix of the parameter estimates. The correction reduces the bias of the Type I error rates in small samples and guarantees positive definiteness of the estimated variance‐covariance matrix. It is well known that small sample bias in the use of the Delta method persists in any application of the Generalized Estimating Equations (GEE) methodology. In this article, we extend the correction originally suggested for the generalized logistic link, to other link functions and distributions, when parameters are estimated by GEE. In a Monte Carlo study with correlated data generated under different sampling schemes, the small sample correction has been shown to be effective in reducing the Type I error rates when the number of clusters is relatively small.

12.
Lakhal L  Rivest LP  Abdous B 《Biometrics》2008,64(1):180-188
Summary. In many follow-up studies, patients are subject to concurrent events. In this article, we consider semicompeting risks data as defined by Fine, Jiang, and Chappell (2001, Biometrika 88, 907–919) where one event is censored by the other but not vice versa. The proposed model involves marginal survival functions for the two events and a parametric family of copulas for their dependency. This article suggests a general method for estimating the dependence parameter when the dependency is modeled with an Archimedean copula. It uses the copula-graphic estimator of Zheng and Klein (1995, Biometrika 82, 127–138) for estimating the survival function of the nonterminal event, subject to dependent censoring. Asymptotic properties of these estimators are derived. Simulations show that the new methods work well with finite samples. The copula-graphic estimator is shown to be more accurate than the estimator proposed by Fine et al. (2001); its performance is similar to that of the self-consistent estimator of Jiang, Fine, Kosorok, and Chappell (2005, Scandinavian Journal of Statistics 33, 1–20). The analysis of a data set, emphasizing the estimation of characteristics of the observable region, is presented as an illustration.

13.
Ranked set sampling is a method which may be used to increase the efficiency of the estimator of the mean of a population. Ranked set sampling with size biased probability of selection (i.e., the items are selected with probability proportional to their size) is combined with the line intercept method to increase the efficiency of estimating cover, density and total amount of some variable of interest (e.g. biomass). A two-stage sampling plan is suggested with line intercept sampling in the first stage. Simple random sampling and ranked set sampling are compared in the second stage to show that the unbiased estimators of density, cover and total amount of some variable of interest based on ranked set sampling have smaller variances than the usual unbiased estimator based on simple random sampling. Efficiency is increased by reducing the number of items which are measured on a transect or by increasing the number of independent transects utilized in a study area. An application procedure is given for estimation of coverage, density and number of stems of mountain mahogany (Cercocarpus montanus) in a study area east of Laramie, Wyoming.

14.
Wang X  Lim J  Stokes L 《Biometrics》2008,64(2):355-363
Summary. MacEachern, Stasny, and Wolfe (2004, Biometrics 60, 207–215) introduced a data collection method, called judgment poststratification (JPS), based on ideas similar to those in ranked set sampling, and proposed methods for mean estimation from JPS samples. In this article, we propose an improvement to their methods, which exploits the fact that the distributions of the judgment poststrata are often stochastically ordered, so as to form a mean estimator using isotonized sample means of the poststrata. This new estimator is strongly consistent with similar asymptotic properties to those in MacEachern et al. (2004). It is shown to be more efficient for small sample sizes, which appears to be attractive in applications requiring cost efficiency. Further, we extend our method to JPS samples with imprecise ranking or multiple rankers. The performance of the proposed estimators is examined on three data examples through simulation.

15.
Median ranked set sampling may be combined with size biased probability of selection. A two-phase sample is assumed. In the first phase, units are selected with probability proportional to their size. In the second phase, units are selected using median ranked set sampling to increase the efficiency of the estimators relative to simple random sampling. There is also an increase in the efficiency relative to ranked set sampling (for some probability distribution functions). There will be a loss in efficiency depending on the amount of error in ranking the units; median ranked set sampling can be used to reduce the errors in ranking the units selected from the population. Estimators of the population mean and the population size are considered. Median ranked set sampling with probability proportional to size and with errors in ranking is considered and compared with ranked set sampling with errors in ranking. Computer simulation results for some probability distributions are also given.

16.
Robust estimation of a location parameter is considered when the data from an unknown symmetric population are subject to arbitrary right-censorship. Comparisons are made between various M-estimators, several L-estimators (trimmed means), and the Kaplan-Meier median. Ten sampling distributions, two uniform censoring distributions, and three sample sizes are examined. A Cauchy censoring distribution is also considered when the sample size is equal to twenty for each of the ten sampling distributions. Performance is based on the estimated mean square error.

17.
We study relationships between extreme ranked set samples (ERSSs) and median ranked set samples (MRSS) with simple random samples (SRS). For a random variable X, we show that the distribution function estimators when using ERSSs and MRSS are more efficient than when using SRS and ranked set sampling for some values of a given x. It is shown that using ERSSs can reduce the necessary sample size by a factor of 1.33 to 4 when estimating the median of the distribution. Asymptotic results for the estimation of the distribution function are given for the center of the distribution function. Data on the bilirubin level of babies in neonatal intensive care are used to illustrate the method.

18.
In the era of big data, univariate models have widely been used as a workhorse tool for quickly producing marginal estimators; and this is true even in a high-dimensional dense setting, in which many features are “true” but weak signals. Genome-wide association studies (GWAS) epitomize this type of setting. Although the GWAS marginal estimator is popular, it has long been criticized for ignoring the correlation structure of genetic variants (i.e., the linkage disequilibrium [LD] pattern). In this paper, we study the effects of LD pattern on the GWAS marginal estimator and investigate whether or not additionally accounting for the LD can improve the prediction accuracy of complex traits. We consider a general high-dimensional dense setting for GWAS and study a class of ridge-type estimators, including the popular marginal estimator and the best linear unbiased prediction (BLUP) estimator as two special cases. We show that the performance of the GWAS marginal estimator depends on the LD pattern through the first three moments of its eigenvalue distribution. Furthermore, we uncover that the relative performance of the GWAS marginal and BLUP estimators highly depends on the ratio of GWAS sample size over the number of genetic variants. Particularly, our finding reveals that the marginal estimator can easily become near-optimal within this class when the sample size is relatively small, even though it ignores the LD pattern. On the other hand, the BLUP estimator has substantially better performance than the marginal estimator as the sample size increases toward the number of genetic variants, which is typically in the millions. Therefore, adjusting for the LD (such as in the BLUP) is most needed when GWAS sample size is large. We illustrate the importance of our results by using simulated data and real GWAS.

19.
The relative risk (RR) is one of the most frequently used indices to measure the strength of association between a disease and a risk factor in etiological studies or the efficacy of an experimental treatment in clinical trials. In this paper, we concentrate attention on interval estimation of RR for sparse data, in which we have only a few patients per stratum, but a moderate or large number of strata. We consider five asymptotic interval estimators for RR, including a weighted least-squares (WLS) interval estimator with an ad hoc adjustment procedure for sparse data, an interval estimator proposed elsewhere for rare events, an interval estimator based on the Mantel-Haenszel (MH) estimator with a logarithmic transformation, an interval estimator calculated from a quadratic equation, and an interval estimator derived from the ratio estimator with a logarithmic transformation. On the basis of Monte Carlo simulations, we evaluate and compare the performance of these five interval estimators in a variety of situations. We note that, except for the cases in which the underlying common RR across strata is around 1, using the WLS interval estimator with the adjustment procedure for sparse data can be misleading. We note further that using the interval estimator suggested elsewhere for rare events tends to be conservative and hence leads to loss of efficiency. We find that the other three interval estimators can consistently perform well even when the mean number of patients for a given treatment is approximately 3 patients per stratum and the number of strata is as small as 20. Finally, we use a mortality data set comparing two chemotherapy treatments in patients with multiple myeloma to illustrate the use of the estimators discussed in this paper.
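Of the five estimators compared above, the Mantel-Haenszel one has a point estimate that is easy to state. The sketch below computes only that pooled point estimate for stratified two-arm data; the interval construction (the logarithmic transformation and its variance) is not shown, and the function name, tuple layout, and toy strata are assumptions made for illustration.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel pooled relative risk across strata.

    Each stratum is (events_trt, n_trt, events_ctl, n_ctl).
    Raises ZeroDivisionError if no control-arm events occur in any stratum.
    """
    num = 0.0
    den = 0.0
    for a, n1, c, n0 in strata:
        total = n1 + n0
        num += a * n0 / total  # treated events, weighted by control-arm size
        den += c * n1 / total  # control events, weighted by treated-arm size
    return num / den

# Sparse toy data: three patients per arm per stratum (made-up numbers).
sparse_strata = [(1, 3, 0, 3), (2, 3, 1, 3), (0, 3, 1, 3)]
rr_mh = mantel_haenszel_rr(sparse_strata)
```

Pooling numerators and denominators before dividing is what lets the estimate stay defined when individual strata have zero events, which is the sparse-data situation the abstract focuses on.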

20.
Outcome-dependent sampling (ODS) schemes can be a cost effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS sampling schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator has asymptotic normality properties. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.
