Similar Documents
20 similar documents found.
1.
Chen H, Stasny EA, Wolfe DA. Biometrics, 2006, 62(1): 150-158
The application of ranked set sampling (RSS) techniques to data from a dichotomous population is currently an active research topic, and it has been shown that balanced RSS improves precision over simple random sampling (SRS) for estimating a population proportion. Balanced RSS, however, is not in general optimal in terms of variance reduction in this setting. The objective of this article is to investigate the use of unbalanced RSS for estimating a population proportion under perfect ranking, where the success probabilities of the order statistics are functions of the underlying population proportion. In particular, the Neyman allocation, which assigns sample units to each order statistic in proportion to its standard deviation, is shown to be optimal in the sense that it yields minimum variance within the class of RSS estimators that are simple averages of the means of the order statistics. We also use a substantial data set, the National Health and Nutrition Examination Survey III (NHANES III) data, to demonstrate the feasibility and benefits of Neyman allocation in RSS for binary variables.
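A minimal Python sketch of the allocation described above, assuming perfect ranking, a known proportion p, and the estimator that averages the k order-statistic sample means with equal weight. Under perfect ranking of k Bernoulli(p) units, the r-th smallest value is 1 exactly when at least k - r + 1 units are successes, so P(X_(r) = 1) = sum over j from k - r + 1 to k of C(k, j) p^j (1 - p)^(k - j). The function names are hypothetical, and the real-valued allocations would still need rounding to whole units in practice.

```python
from math import comb, sqrt


def order_stat_success_probs(p: float, k: int) -> list[float]:
    """P(X_(r) = 1), r = 1..k, for a perfectly ranked set of k Bernoulli(p) units.

    The r-th smallest of k binary values equals 1 exactly when at least
    k - r + 1 of the k units are successes.
    """
    return [
        sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k - r + 1, k + 1))
        for r in range(1, k + 1)
    ]


def neyman_allocation(p: float, k: int, n: int) -> list[float]:
    """Split n measured units across the k ranks in proportion to each rank's SD."""
    sds = [sqrt(q * (1 - q)) for q in order_stat_success_probs(p, k)]
    total = sum(sds)
    return [n * s / total for s in sds]


def estimator_variance(p: float, k: int, alloc: list[float]) -> float:
    """Variance of (1/k) * sum_r (mean of the n_r measurements on rank r)."""
    probs = order_stat_success_probs(p, k)
    return sum(q * (1 - q) / m for q, m in zip(probs, alloc)) / k**2


if __name__ == "__main__":
    p, k, n = 0.2, 4, 100
    neyman = neyman_allocation(p, k, n)
    balanced = [n / k] * k
    print([round(m, 1) for m in neyman])
    print(estimator_variance(p, k, neyman), estimator_variance(p, k, balanced))
```

By the Cauchy-Schwarz inequality, (sum of SDs)^2 never exceeds k times the sum of variances, which is why the Neyman variance in this sketch can never exceed the balanced one.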

2.
Wang YG, Chen Z, Liu J. Biometrics, 2004, 60(2): 556-561
Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971) considered the optimal set size for ranked set sampling (RSS) with fixed operational costs. This framework can be very useful in practice for determining whether RSS is beneficial and for obtaining the optimal set size that minimizes the variance of the population estimator for a fixed total cost. In this article, we propose a general RSS scheme in which more than one observation can be taken from each ranked set. This is shown to be more cost-effective in some cases when the cost of ranking is not small. Using the example in Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971), we demonstrate that taking two or more observations from one set, even with the optimal set size from the RSS design, can be more beneficial.

3.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.
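One cycle of balanced RSS for a proportion, with ranking by fitted success probability, can be sketched as follows. This assumes a hypothetical population in which a single covariate x is standard normal and y follows a logistic model whose coefficients are treated as known; with a single covariate, any increasing ranking model orders units exactly as x does, so the sketch does not reproduce the different-population setting of the study. All names and parameter values are illustrative.

```python
import math
import random
import statistics


def logistic(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def draw_unit(rng: random.Random, b0: float, b1: float) -> tuple[float, int]:
    """Hypothetical population: x ~ N(0, 1), y | x ~ Bernoulli(logistic(b0 + b1 * x))."""
    x = rng.gauss(0.0, 1.0)
    return x, int(rng.random() < logistic(b0 + b1 * x))


def rss_proportion(rng: random.Random, k: int, b0: float, b1: float) -> float:
    """Balanced RSS, one cycle: in set r, measure y only for the unit whose
    fitted probability logistic(b0 + b1 * x) is the r-th smallest."""
    measured = []
    for r in range(k):
        units = [draw_unit(rng, b0, b1) for _ in range(k)]
        units.sort(key=lambda u: logistic(b0 + b1 * u[0]))  # rank without measuring y
        measured.append(units[r][1])
    return statistics.fmean(measured)


def srs_proportion(rng: random.Random, k: int, b0: float, b1: float) -> float:
    """SRS with the same number (k) of measured units."""
    return statistics.fmean(draw_unit(rng, b0, b1)[1] for _ in range(k))


def simulate(reps: int = 5000, k: int = 4, seed: int = 7):
    rng = random.Random(seed)
    b0, b1 = 0.0, 3.0  # with b0 = 0 the population proportion is 0.5 by symmetry
    rss = [rss_proportion(rng, k, b0, b1) for _ in range(reps)]
    srs = [srs_proportion(rng, k, b0, b1) for _ in range(reps)]
    return statistics.fmean(rss), statistics.variance(rss), statistics.variance(srs)
```

Calling `simulate()` returns the Monte Carlo mean of the RSS estimates and the variances of the RSS and SRS estimators at equal numbers of measured units.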

4.
Median ranked set sampling may be combined with size-biased probability of selection. A two-phase sample is assumed. In the first phase, units are selected with probability proportional to their size. In the second phase, units are selected using median ranked set sampling to increase the efficiency of the estimators relative to simple random sampling. There is also an increase in efficiency relative to ranked set sampling (for some probability distribution functions). Errors in ranking the units cause a loss in efficiency that depends on the amount of error; median ranked set sampling can be used to reduce the errors in ranking the units selected from the population. Estimators of the population mean and the population size are considered. Median ranked set sampling with probability proportional to size and with errors in ranking is considered and compared with ranked set sampling with errors in ranking. Computer simulation results for some probability distributions are also given.
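The median-RSS stage can be illustrated with a small simulation. This sketch assumes a hypothetical normal population, an odd set size, and perfect ranking, and it omits the size-biased first phase and ranking errors entirely; it only compares plain median RSS with SRS at the same number of measured units.

```python
import random
import statistics


def mrss_estimate(rng: random.Random, k: int, mu: float, sigma: float) -> float:
    """One median-RSS estimate of the mean (odd set size k, perfect ranking).

    k independent sets of k units are drawn from N(mu, sigma^2); only the
    median unit of each set is measured.
    """
    if k % 2 == 0:
        raise ValueError("median RSS as sketched here needs an odd set size")
    medians = []
    for _ in range(k):
        ranked_set = sorted(rng.gauss(mu, sigma) for _ in range(k))
        medians.append(ranked_set[k // 2])
    return statistics.fmean(medians)


def srs_estimate(rng: random.Random, k: int, mu: float, sigma: float) -> float:
    """SRS mean of k measured units, for comparison at equal measurement cost."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(k))


def simulate(reps: int = 5000, k: int = 5, mu: float = 10.0, sigma: float = 2.0, seed: int = 1):
    rng = random.Random(seed)
    mrss = [mrss_estimate(rng, k, mu, sigma) for _ in range(reps)]
    srs = [srs_estimate(rng, k, mu, sigma) for _ in range(reps)]
    return statistics.fmean(mrss), statistics.variance(mrss), statistics.variance(srs)
```

For a symmetric population like this one, the median of each set is unbiased for the mean, so the comparison isolates the variance reduction.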

5.
The precision of estimating the population mean with a ranked set sample (RSS) relative to a simple random sample (SRS), with the same number of quantified units, depends on the population and on the success of the ranking. In practice, even ranking a sample of moderate size and observing the ith ranked unit (other than the extremes) is a difficult task. Therefore, in this paper we introduce a variety of extreme ranked set samples (ERSSs) to estimate the population mean. ERSSs are more practical than ordinary ranked set sampling, since for an even sample size we need to identify successfully only the first and/or the last ordered unit or, for an odd sample size, the median unit. We show that ERSSs give an unbiased estimate of the population mean for symmetric populations and are more efficient than SRS with the same number of quantified units. An example using real data is given, together with parametric examples.
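The even-set-size case can be sketched as follows, assuming perfect identification of the extremes and a hypothetical normal population: half of the k sets contribute their smallest unit and half their largest. This is only one of the ERSS variants mentioned above; for skewed populations this estimator is generally biased, consistent with the restriction to symmetric populations.

```python
import random
import statistics


def erss_estimate(rng: random.Random, k: int, mu: float, sigma: float) -> float:
    """One extreme-RSS estimate of the mean (even set size k, perfect ranking).

    Of k independent sets of size k, the first k/2 contribute their smallest
    unit and the remaining k/2 their largest, so only extremes are identified.
    """
    if k % 2:
        raise ValueError("this ERSS variant needs an even set size")
    measured = []
    for i in range(k):
        ranked_set = sorted(rng.gauss(mu, sigma) for _ in range(k))
        measured.append(ranked_set[0] if i < k // 2 else ranked_set[-1])
    return statistics.fmean(measured)


def srs_estimate(rng: random.Random, k: int, mu: float, sigma: float) -> float:
    """SRS mean of k measured units, for comparison at equal measurement cost."""
    return statistics.fmean(rng.gauss(mu, sigma) for _ in range(k))


def simulate(reps: int = 10000, k: int = 4, mu: float = 10.0, sigma: float = 2.0, seed: int = 3):
    rng = random.Random(seed)
    erss = [erss_estimate(rng, k, mu, sigma) for _ in range(reps)]
    srs = [srs_estimate(rng, k, mu, sigma) for _ in range(reps)]
    return statistics.fmean(erss), statistics.variance(erss), statistics.variance(srs)
```

Because the minima and maxima of a symmetric distribution are mirror images, their biases cancel in the average, which is the source of the unbiasedness claimed for symmetric populations.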

6.
Ranked set sampling with unequal samples (total citations: 3; self-citations: 0; citations by others: 3)
Bhoj DS. Biometrics, 2001, 57(3): 957-962
A ranked set sampling procedure with unequal samples (RSSU) is proposed and used to estimate the population mean. This estimator is then compared with the estimators based on the ranked set sampling (RSS) and median ranked set sampling (MRSS) procedures. It is shown that the relative precisions of the estimator based on RSSU are higher than those of the estimators based on RSS and MRSS. An example of estimating the mean diameter at breast height of longleaf-pine trees on the Wade Tract in Thomas County, Georgia, is presented.

7.
Chen Z, Wang YG. Biometrics, 2004, 60(4): 997-1004
This article is motivated by a lung cancer study in which a regression model is involved and the response variable is too expensive to measure, but the predictor variable can be measured easily at relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase the efficiency of the regression analysis in this situation. The developed method is applied retrospectively to a lung cancer study, in which the aim is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatid exchanges. Optimal sampling schemes with different optimality criteria, such as A-, D-, and integrated mean square error (IMSE)-optimality, are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is substantial. For instance, with the IMSE-optimal scheme, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred with SRS.

8.
Ranked set sampling where sampling is based on visual judgment of the differences between the sizes of pairs of units or on a concomitant variable is reviewed. An alternative model for judgment ranking based on ratios of sizes of pairs of units is presented. Computation of the variance of a visual ranked set sampling estimator of the mean of a distribution is enabled via maximum likelihood estimation of the visual judgment error variance. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

9.
A diagnostic cut-off point of a biomarker measurement is needed to classify a random subject as either diseased or healthy. However, the cut-off point is usually unknown and must be estimated by some optimization criterion. One important criterion is the Youden index, which has been widely adopted in practice. The Youden index, defined as the maximum of (sensitivity + specificity - 1), directly measures the largest total diagnostic accuracy a biomarker can achieve. It is therefore desirable to estimate the optimal cut-off point associated with the Youden index. Sometimes taking the actual measurements of a biomarker is very difficult and expensive, while ranking them without actual measurement can be relatively easy. In such cases, ranked set sampling can give more precise estimation than simple random sampling, as ranked set samples are more likely to span the full range of the population. In this study, kernel density estimation is used to solve numerically for an estimate of the optimal cut-off point. The asymptotic distributions of the kernel estimators based on the two sampling schemes are derived analytically, and we prove that both estimators are asymptotically unbiased and that the estimators based on ranked set sampling are relatively more efficient than those based on simple random sampling. Furthermore, asymptotic confidence intervals are derived. Extensive simulations are carried out to compare the proposed method using ranked set sampling with simple random sampling, and the proposed method outperforms simple random sampling in all cases. A real data set is analyzed to illustrate the proposed method.
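The cut-off search can be sketched with a kernel estimate of each group's CDF. Assumptions not taken from the abstract: higher biomarker values indicate disease, a Gaussian kernel with Silverman's rule-of-thumb bandwidth, and a simple grid search. The sketch treats its input as generic measured values and does not reproduce the paper's RSS-specific estimator or its asymptotics.

```python
from statistics import NormalDist, stdev

_STD_NORMAL = NormalDist()


def silverman_bandwidth(data: list[float]) -> float:
    """Rule-of-thumb bandwidth 1.06 * s * n^(-1/5) (an assumed choice)."""
    return 1.06 * stdev(data) * len(data) ** (-1 / 5)


def kernel_cdf(data: list[float]):
    """Gaussian-kernel CDF estimate: F(c) = mean over i of Phi((c - x_i) / h)."""
    h = silverman_bandwidth(data)
    n = len(data)
    return lambda c: sum(_STD_NORMAL.cdf((c - x) / h) for x in data) / n


def youden_cutoff(healthy: list[float], diseased: list[float], grid_size: int = 401):
    """Grid search for the cut-off c maximizing J(c) = Se(c) + Sp(c) - 1.

    Classifying 'diseased if value > c' gives Sp(c) = F_healthy(c) and
    Se(c) = 1 - F_diseased(c), hence J(c) = F_healthy(c) - F_diseased(c).
    """
    f_h, f_d = kernel_cdf(healthy), kernel_cdf(diseased)
    lo, hi = min(healthy + diseased), max(healthy + diseased)
    best_c, best_j = lo, float("-inf")
    for i in range(grid_size):
        c = lo + (hi - lo) * i / (grid_size - 1)
        j = f_h(c) - f_d(c)
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

As a sanity check, for healthy values from N(0, 1) and diseased values from N(2, 1) the true optimal cut-off is 1 with a maximum Youden index of 2Φ(1) - 1, about 0.68; kernel smoothing slightly lowers the estimated maximum.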

10.
Wang X, Lim J, Stokes L. Biometrics, 2008, 64(2): 355-363
Summary. MacEachern, Stasny, and Wolfe (2004, Biometrics 60, 207–215) introduced a data collection method, called judgment poststratification (JPS), based on ideas similar to those in ranked set sampling, and proposed methods for mean estimation from JPS samples. In this article, we propose an improvement to their methods that exploits the fact that the distributions of the judgment poststrata are often stochastically ordered, forming a mean estimator from isotonized sample means of the poststrata. This new estimator is strongly consistent, with asymptotic properties similar to those in MacEachern et al. (2004). It is shown to be more efficient for small sample sizes, which makes it attractive in applications requiring cost efficiency. Further, we extend our method to JPS samples with imprecise ranking or multiple rankers. The performance of the proposed estimators is examined on three data examples through simulation.
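A rough sketch of the isotonization idea: poststratum means are made nondecreasing in judgment rank with a weighted pool-adjacent-violators (PAVA) pass, weighting by the poststratum counts, and then averaged over the non-empty poststrata. The weighting and the handling of empty poststrata are assumptions here and may differ from the estimator in the paper; the function names are hypothetical.

```python
def pava(values: list[float], weights: list[float]) -> list[float]:
    """Weighted pool-adjacent-violators: the nondecreasing sequence closest to
    `values` in weighted least squares."""
    blocks: list[list[float]] = []  # each block: [weighted mean, total weight, entry count]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted: list[float] = []
    for mean, _, count in blocks:
        fitted.extend([mean] * int(count))
    return fitted


def jps_estimates(values: list[float], ranks: list[int], k: int) -> tuple[float, float]:
    """Return (plain JPS mean, isotonized JPS mean) for judgment ranks in 1..k."""
    strata: dict[int, list[float]] = {r: [] for r in range(1, k + 1)}
    for y, r in zip(values, ranks):
        strata[r].append(y)
    nonempty = [r for r in range(1, k + 1) if strata[r]]  # empty poststrata are skipped
    means = [sum(strata[r]) / len(strata[r]) for r in nonempty]
    counts = [len(strata[r]) for r in nonempty]
    plain = sum(means) / len(means)
    isotonized = pava(means, counts)
    return plain, sum(isotonized) / len(isotonized)
```

For example, with values [1, 5, 3, 2, 4, 6] and judgment ranks [1, 1, 1, 2, 3, 3], the poststratum means are 3, 2, and 5; the first two violate the ordering and are pooled to 2.75, so the plain estimate is 10/3 while the isotonized one is 3.5.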

11.
Ranked set sampling (RSS) as suggested by McIntyre (1952) may be modified to introduce a new sampling method called pair rank set sampling (PRSS), which might be used in some areas of application instead of RSS to increase the efficiency of the estimators relative to the simple random sampling (SRS) method. Estimators of the population mean are considered. An example using real data is presented to illustrate the computations.

12.
A nonparametric selected ranked set sampling method is suggested. The estimator of the population mean based on the new approach is compared with those using simple random sampling (SRS), ranked set sampling (RSS), and median ranked set sampling (MRSS). The estimator of the population mean using the new approach is found to be more efficient than its counterparts in almost all the cases considered.

13.
In recent years, in silico protein structure prediction has reached a level where fully automated servers can generate large pools of near-native structures. However, identifying and further refining the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target-specific selective refinement (SR) protocol and (ii) a molecular dynamics (MD) simulation-based ranking (SMDR) method. In SR, the all-atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best refined models are selected with SMDR by testing their relative stability against gradual heating through all-atom MD simulations. Through extensive testing, we have found that Mufold-MD, our fully automated protein structure prediction server updated with the SR and SMDR modules, consistently outperformed its previous versions. Proteins 2015; 83:1823–1835. © 2015 Wiley Periodicals, Inc.

14.
Ensembles are a well-established machine learning paradigm that leads to accurate and robust models and is predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield improved predictive performance compared with an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to, and evaluate its performance on, a set of problems of automated predictive modeling in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient.

15.
Measurement error in explanatory variables used in sightability models can result in biased population estimates and associated measures of precision. We developed a Monte Carlo simulation procedure that can be implemented within the sightability model framework when measurement error is present. Additionally, we developed simulation and sample survey methods for determining the optimal allocation of survey effort to maximize the precision of population estimates for a fixed survey cost when a complete survey of a study area is not feasible. We used data from aerial surveys of elk during 2004–2006 in Michigan to demonstrate the application of these techniques. By accounting for measurement error and applying appropriate survey design practices, managers employing sightability models may be able to generate more accurate and cost-effective population estimates and accompanying measures of precision than is possible if these techniques are ignored. © 2011 The Wildlife Society.

16.
In this paper, we identified the best species–area relationship (SAR) models from among 28 different models gathered from the literature, using an artificial predator–prey simulation (EcoSim), and investigated how sampling approaches and sampling scales affect SARs. Further, we attempted to determine a plausible interpretation of the SAR model coefficients for the best-performing SAR models. This is the most extensive quantitatively based investigation of the species–area relationship undertaken in the literature so far. We gathered 28 different models from the literature and fitted them to sampling data from EcoSim using non-linear regression, with ΔAICc as the goodness-of-fit criterion. Afterwards, we proposed a machine-learning approach to find plausible relationships between the models' coefficients and the spatial information that likely affects SARs, as a basis for extracting rules that provide an interpretation of SAR coefficients. We found the power function family to be a reasonable choice, in particular the Plotkin function based on the ΔAICc ranking. The Plotkin function was consistently among the top three best-ranked SAR functions. Furthermore, the simple power function was the best-ranked model with two coefficients in nested sampling. We found that the Plotkin, quadratic power, Morgan–Mercer–Flodin, and generalized cumulative Weibull functions are the best-ranked models for small, intermediate, large, and very large scales, respectively, in nested sampling, while Plotkin (at small to intermediate scales) and Chapman–Richards (at large to very large scales) are the best-ranked functions in random sampling. Finally, based on rule extraction using machine-learning techniques, we were able to find interpretations of the coefficients of the simple and extended power functions.
For instance, function coefficients corresponded to sampling scale size, patch number, fractal dimension, average patch size, and spatial complexity. Our main conclusions are that SAR models are highly dependent on sampling scale and sampling approach, and that the shape of the best-ranked SAR model is convex without an asymptote at smaller scales (small, intermediate) and sigmoid at larger scales (large and very large). For some of the SAR model coefficients there are clear correlations with spatial information, thereby providing an interpretation of these coefficients. In addition, the slope z, which measures the rate of species increase for SAR models in the power function family, was found to be directly proportional to beta diversity, which confirms the view that beta diversity and SAR models are to some extent both measures of species richness.
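Ranking candidate SAR models by AICc can be illustrated with two of the simplest forms, the power function S = cA^z and the semi-log (logarithmic) model S = a + b ln A. This is a simplified sketch: the power model is fitted by least squares on the log-log scale rather than by the non-linear regression used in the study, residuals for both models are computed on the original species scale, and counting the error variance as a third parameter is an assumption.

```python
import math


def ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Simple linear regression y = a + b*x by ordinary least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def aicc(rss: float, n: int, k: int) -> float:
    """AICc for a least-squares fit with k estimated parameters."""
    return n * math.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def fit_sar_models(areas: list[float], species: list[float]) -> dict:
    """Fit S = c*A^z and S = a + b*ln(A); return coefficients and AICc of each."""
    log_a = [math.log(a) for a in areas]
    # Power model, linearized as ln S = ln c + z ln A (a shortcut, not non-linear regression).
    ln_c, z = ols(log_a, [math.log(s) for s in species])
    c = math.exp(ln_c)
    rss_power = sum((s - c * a**z) ** 2 for a, s in zip(areas, species))
    # Semi-log model, linear in ln A, fitted directly on the species scale.
    a0, b = ols(log_a, species)
    rss_semilog = sum((s - (a0 + b * la)) ** 2 for la, s in zip(log_a, species))
    n = len(areas)
    # Both models: two curve coefficients plus the error variance.
    return {"power": (c, z, aicc(rss_power, n, 3)), "semilog": (a0, b, aicc(rss_semilog, n, 3))}
```

The model with the lower AICc is preferred; ΔAICc is each model's AICc minus the minimum over all candidates.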

17.
Judgement post-stratification, which is based on ideas similar to those in ranked set sampling, relies on the ability of a ranker to forecast the ranks of potential observations on a set of units. In practice, the authors sometimes find it difficult to assign these ranks. This note shows how one can borrow techniques from the literature on finite population sampling to allow a probabilistic ranking of the units in a set, thus facilitating use of these sampling plans and improving estimation. The same techniques provide one approach to estimation using a judgement post-stratified sample with multiple rankers. The technique is illustrated on allometric data relating brain weight to body weight in different species of mammals, and on a study of student performance in graduate school.

18.
With the aim of creating a simplified sampling scheme that would retain the accuracy of standard mark–release–recapture (MRR) sampling, but at a greatly reduced cost, we analysed 23 capture–recapture data sets from spatially closed populations of six Lepidoptera species using constrained Cormack–Jolly–Seber models. Subsequently, the relationships between the estimates of population parameters were investigated in order to develop a regression equation that would enable us to calculate seasonal population size without sampling the population throughout the entire flight period. The proportion of individuals flying at peak population was highly variable (CV=0.39), but the variation decreased considerably (CV=0.14) after differences in life span and flight period length were accounted for. Over 90% of the variance of this proportion was explained by the life span:flight period length ratio. Simulations of hypothetical sampling schemes showed that schemes covering the second and third quarters of the flight period performed much better than those restricted to the second quarter only. The accuracy of seasonal population size estimated with the developed regression equation was comparable for intensive schemes (daily sampling) and non-intensive ones (sampling once every 2 or 3 days). We propose a simplified method of surveying butterfly populations based on checking for the presence of flying adults at the beginning and end of the flight period to assess its length, and on MRR sampling covering its middle part, with intervals between capture days corresponding to the average life span of the investigated butterflies.

19.
20.
In connectivity models, land cover types are assigned cost values characterizing their resistance to species movements. Landscape genetic methods infer these values from the relationship between genetic differentiation and cost distances. The spatial heterogeneity of population sizes, and consequently genetic drift, is rarely included in this inference although it influences genetic differentiation. Similarly, migration rates and population spatial distributions potentially influence this inference. Here, we assessed the reliability of cost value inference under several migration rates, population spatial patterns and degrees of population size heterogeneity. Additionally, we assessed whether considering intra-population variables, here using gravity models, improved the inference when drift is spatially heterogeneous. We simulated several gene flow intensities between populations with varying local sizes and spatial distributions. We then fit gravity models of genetic distances as a function of (i) the ‘true’ cost distances driving simulations or alternative cost distances, and (ii) intra-population variables (population sizes, patch areas). We determined the conditions making the identification of the ‘true’ costs possible and assessed the contribution of intra-population variables to this objective. Overall, the inference ranked cost scenarios reliably in terms of similarity with the ‘true’ scenario (cost distance Mantel correlations), but this ‘true’ scenario rarely provided the best model goodness of fit. Ranking inaccuracies and failures to identify the ‘true’ scenario were more pronounced when migration was very restricted (<4 dispersal events/generation), population sizes were most heterogeneous and some populations were spatially aggregated. In these situations, considering intra-population variables helps identify cost scenarios reliably, thereby improving cost value inference from genetic data.

