Similar Literature
1.
A random sample is drawn from a distribution which admits a minimal sufficient statistic for the parameters. The Gibbs sampler is proposed to generate samples, called conditionally sufficient or co-sufficient samples, from the conditional distribution of the sample given its value of the sufficient statistic. The procedure is illustrated for the gamma distribution. Co-sufficient samples may be used to give exact tests of fit; for the gamma distribution these are compared for size and power with approximate tests based on the parametric bootstrap.
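As a hedged illustration of the comparator mentioned above (not the authors' Gibbs co-sufficient sampler), the following sketch runs a parametric-bootstrap Kolmogorov-Smirnov test of gamma fit; the function name and all tuning values are illustrative.

```python
import numpy as np
from scipy import stats

def gamma_gof_pboot(x, n_boot=999, seed=None):
    """KS test of gamma fit, calibrated by the parametric bootstrap."""
    rng = np.random.default_rng(seed)
    # Fit shape and scale by maximum likelihood (location fixed at 0).
    shape, _, scale = stats.gamma.fit(x, floc=0)
    d_obs = stats.kstest(x, stats.gamma(shape, loc=0, scale=scale).cdf).statistic
    d_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = stats.gamma.rvs(shape, scale=scale, size=len(x), random_state=rng)
        sb, _, scb = stats.gamma.fit(xb, floc=0)
        d_boot[b] = stats.kstest(xb, stats.gamma(sb, loc=0, scale=scb).cdf).statistic
    # p-value: proportion of bootstrap statistics at least as extreme.
    return (1 + np.sum(d_boot >= d_obs)) / (n_boot + 1)

x = stats.gamma.rvs(2.0, scale=3.0, size=50, random_state=1)
print(gamma_gof_pboot(x, n_boot=199, seed=2))
```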

2.
An approximate method for estimating the sample size in simple random sampling, and a systematic way of transforming sample data, are derived using the parameters α and β of the regression of mean crowding on mean density in the spatial distribution per quadrat of animal populations (Iwao, 1968). If the values of α and β are known for the species concerned, the sample size needed to attain a desired precision can be estimated simply by knowing the approximate level of mean density of the population to be sampled. An appropriate variance-stabilizing transformation of sample data can also be obtained by the method given here, without restrictions on the distribution pattern of the frequency counts.
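A minimal sketch of the sample-size step, assuming Iwao's variance relation var = (α+1)m + (β−1)m² and taking the precision D to be the standard error expressed as a fraction of the mean; the function name and example parameter values are illustrative.

```python
import math

def iwao_sample_size(mean_density, alpha, beta, precision=0.1):
    # Per-quadrat variance implied by the mean crowding-mean density regression.
    variance = (alpha + 1.0) * mean_density + (beta - 1.0) * mean_density**2
    # Quadrats needed so that SE(mean)/mean equals the desired precision D.
    n = variance / (precision**2 * mean_density**2)
    return math.ceil(n)

# e.g. a moderately aggregated species at mean density 2.5 per quadrat
print(iwao_sample_size(2.5, alpha=0.5, beta=1.4, precision=0.1))
```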

3.
Estimating optimal sample size for microbiological surveys is a challenge for laboratory managers. When insufficient sampling is conducted, biased inferences are likely; however, when excessive sampling is conducted, valuable laboratory resources are wasted. This report presents a statistical model for estimating the sample size appropriate for accurate identification of the bacterial subtypes of interest in a specimen. This applied model for microbiology laboratory use is based on a Bayesian mode of inference, which combines two inputs: (i) a prespecified estimate, or prior distribution statement, based on available scientific knowledge, and (ii) observed data. The specific inputs for the model are a prior distribution statement of the number of strains per specimen provided by an informed microbiologist and data from a microbiological survey indicating the number of strains per specimen. The model output is an updated probability distribution of strains per specimen, which can be used to estimate the probability of observing all strains present according to the number of colonies that are sampled. Two scenarios illustrating the use of the model to estimate bacterial colony sample size requirements are presented. In the first scenario, bacterial colony sample size is estimated to correctly identify Campylobacter amplified restriction fragment length polymorphism types on broiler carcasses. The second scenario estimates bacterial colony sample size to correctly identify Salmonella enterica serotype Enteritidis phage types in fecal drag swabs from egg-laying poultry flocks. An advantage of the model is that as updated inputs from ongoing surveys are incorporated, increasingly precise sample size estimates are likely to be made.
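The model's final step can be sketched as follows, under the simplifying assumption (ours, not necessarily the paper's) that strains within a specimen are equally frequent; the function names and example posterior are invented for illustration.

```python
from math import comb

def p_all_strains_seen(n_colonies, k_strains):
    # Inclusion-exclusion: P(no strain is missed) for k equally likely strains.
    return sum((-1)**j * comb(k_strains, j) * ((k_strains - j) / k_strains)**n_colonies
               for j in range(k_strains + 1))

def expected_coverage(n_colonies, strain_pmf):
    # strain_pmf: dict {number of strains: posterior probability}
    return sum(p * p_all_strains_seen(n_colonies, k) for k, p in strain_pmf.items())

posterior = {1: 0.5, 2: 0.3, 3: 0.2}   # illustrative posterior on strains/specimen
for n in (3, 5, 10):
    print(n, "colonies:", round(expected_coverage(n, posterior), 3))
```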

4.
We study relationships between extreme ranked set sampling (ERSS) and median ranked set sampling (MRSS) and simple random sampling (SRS). For a random variable X, we show that the distribution function estimators based on ERSS and MRSS are more efficient than those based on SRS and ranked set sampling for some values of a given x. It is shown that using ERSS can reduce the necessary sample size by a factor of 1.33 to 4 when estimating the median of the distribution. Asymptotic results for the estimation of the distribution function are given at the center of the distribution. Data on the bilirubin level of babies in neonatal intensive care are used to illustrate the method.
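A small Monte Carlo sketch of the MRSS-versus-SRS comparison for median estimation; the set size, cycle count, and standard-normal population are illustrative choices, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
m, cycles, reps = 5, 4, 2000        # measured sample size = m * cycles = 20

def mrss_draw():
    # For each measured unit, rank a set of m units and keep the median one.
    sets = rng.standard_normal((m * cycles, m))
    return np.sort(sets, axis=1)[:, m // 2]

srs_medians  = [np.median(rng.standard_normal(m * cycles)) for _ in range(reps)]
mrss_medians = [np.median(mrss_draw()) for _ in range(reps)]
# True median is 0, so the variance approximates the MSE.
print("MSE SRS :", np.var(srs_medians))
print("MSE MRSS:", np.var(mrss_medians))   # typically several times smaller
```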

5.
We consider sample size calculations for testing differences in means between two samples and allowing for different variances in the two groups. Typically, the power functions depend on the sample size and a set of parameters assumed known, and the sample size needed to obtain a prespecified power is calculated. Here, we account for two sources of variability: we allow the sample size in the power function to be a stochastic variable, and we consider estimating the parameters from preliminary data. An example of the first source of variability is nonadherence (noncompliance). We assume that the proportion of subjects who will adhere to their treatment regimen is not known before the study, but that the proportion is a stochastic variable with a known distribution. Under this assumption, we develop simple closed-form sample size calculations based on asymptotic normality. The second source of variability is in parameter estimates that are estimated from prior data. For example, we account for variability in estimating the variance of the normal response from existing data which are assumed to have the same variance as the study for which we are calculating the sample size. We show that we can account for the variability of the variance estimate by simply using a slightly larger nominal power in the usual sample size calculation, which we call the calibrated power. We show that the calculation of the calibrated power depends only on the sample size of the existing data, and we give a table of calibrated power by sample size. Further, we consider the calculation of the sample size in the rarer situation where we account for the variability in estimating the standardized effect size from some existing data. This latter situation, as well as several of the previous ones, is motivated by sample size calculations for a Phase II trial of a malaria vaccine candidate.
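A sketch of the usual two-sample calculation with a crude dilution adjustment for expected adherence. This is a simplified stand-in for the paper's closed-form results and calibrated-power table: the `mean_adherence` dilution and the illustrative "calibrated" power value below are our assumptions.

```python
import math
from scipy.stats import norm

def n_per_group(delta, sd1, sd2, alpha=0.05, power=0.9, mean_adherence=1.0):
    # Nonadherers are assumed to show no treatment effect, diluting delta.
    delta_eff = mean_adherence * delta
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(z**2 * (sd1**2 + sd2**2) / delta_eff**2)

# Nominal power 0.90 versus a slightly larger "calibrated" power (e.g. 0.92),
# the abstract's device for absorbing variability in the variance estimate.
print(n_per_group(0.5, 1.0, 1.2, power=0.90, mean_adherence=0.8))
print(n_per_group(0.5, 1.0, 1.2, power=0.92, mean_adherence=0.8))
```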

6.
Medical diagnostic tests are used to classify subjects as non-diseased or diseased. The classification rule usually consists of classifying subjects using the values of a continuous marker that is dichotomised by means of a threshold. Here, the optimum threshold estimate is found by minimising a cost function that accounts for both decision costs and sampling uncertainty. The cost function is optimised either analytically, in a normal-distribution setting, or empirically, in a distribution-free setting where the underlying probability distributions of diseased and non-diseased subjects are unknown. Inference for the threshold estimates is based on approximate analytical standard errors and on bootstrap approaches. The performance of the proposed methodology is assessed by means of a simulation study, and the sample size required for a given confidence interval precision and sample size ratio is also calculated. Finally, a case example based on previously published data concerning the diagnosis of Alzheimer's patients is provided to illustrate the procedure.
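An illustrative sketch of the normal-setting optimisation: pick the threshold minimising expected decision cost. The means, costs, and prevalence are invented, and the paper's sampling-uncertainty term is omitted here.

```python
from scipy.stats import norm
from scipy.optimize import minimize_scalar

mu0, sd0 = 0.0, 1.0      # marker distribution, non-diseased
mu1, sd1 = 1.5, 1.2      # marker distribution, diseased
prev     = 0.2           # disease prevalence
c_fp, c_fn = 1.0, 4.0    # costs of a false positive / false negative

def expected_cost(t):
    fp = 1 - norm.cdf(t, mu0, sd0)   # non-diseased classified positive
    fn = norm.cdf(t, mu1, sd1)       # diseased classified negative
    return c_fp * (1 - prev) * fp + c_fn * prev * fn

res = minimize_scalar(expected_cost, bounds=(mu0 - 3 * sd0, mu1 + 3 * sd1),
                      method="bounded")
print("optimal threshold:", round(res.x, 3), " expected cost:", round(res.fun, 4))
```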

7.
Richard R. Hudson. Genetics, 1985, 109(3): 611-631.
The sampling distributions of several statistics that measure the association of alleles on gametes (linkage disequilibrium) are estimated under a two-locus neutral infinite-allele model using an efficient Monte Carlo method. An often-used approximation for the mean squared linkage disequilibrium is shown to be inaccurate unless the proper statistical conditioning is used. The joint distribution of linkage disequilibrium and the allele frequencies in the sample is studied. This estimated joint distribution is sufficient for obtaining an approximate maximum likelihood estimate of C = 4Nc, where N is the population size and c is the recombination rate. It has been suggested that observations of high linkage disequilibrium might be a good basis for rejecting a neutral model in favor of a model in which natural selection maintains genetic variation. It is found that a single sample of chromosomes examined at two loci cannot provide sufficient information for such a test if C < 10, because with C this small, very high levels of linkage disequilibrium are not unexpected under the neutral model. In samples of size 50, it is found that, even when C is as large as 50, the distribution of linkage disequilibrium conditional on the allele frequencies is substantially different from the distribution when there is no linkage between the loci. When conditioned on the number of alleles at each locus in the sample, all of the sample statistics examined are nearly independent of θ = 4Nμ, where μ is the neutral mutation rate.
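For orientation, a sketch of the basic two-locus summaries behind the abstract, the disequilibrium coefficient D and r², computed from haplotype counts; the paper's Monte Carlo estimation of their sampling distributions under the neutral model is not reproduced.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    # Haplotype counts for alleles A/a at locus 1 and B/b at locus 2.
    n = n_AB + n_Ab + n_aB + n_ab
    pA = (n_AB + n_Ab) / n
    pB = (n_AB + n_aB) / n
    D = n_AB / n - pA * pB                     # disequilibrium coefficient
    r2 = D**2 / (pA * (1 - pA) * pB * (1 - pB))  # squared correlation of alleles
    return D, r2

print(ld_stats(40, 10, 10, 40))   # strong positive association
```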

8.
Designs of the double sampling (DS) chart are traditionally based on the average run length (ARL) criterion. However, the shape of the run length distribution changes with the process mean shift, ranging from highly skewed when the process is in control to almost symmetric when the mean shift is large. We therefore argue that the ARL is a complicated performance measure and that the median run length (MRL) is more meaningful, because the MRL provides an intuitive and fair representation of central tendency, especially for a right-skewed run length distribution. Since the DS chart can effectively reduce the sample size without reducing statistical efficiency, this paper proposes two optimal designs of the MRL-based DS chart, minimizing (i) the in-control average sample size (ASS) and (ii) both the in-control and out-of-control ASSs. Comparisons with the optimal MRL-based EWMA and Shewhart charts demonstrate the superiority of the proposed optimal MRL-based DS chart, which requires a smaller sample size on average while maintaining the same detection speed as the two former charts. An example involving potassium sorbate added during a yoghurt manufacturing process illustrates the effectiveness of the proposed MRL-based DS chart in reducing the sample size needed.
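A sketch of why the MRL differs markedly from the ARL for a skewed run-length distribution, assuming a geometrically distributed run length with per-sample signal probability p; the p values below are illustrative, not DS-chart derivations.

```python
import math

def median_run_length(p_signal):
    # For a geometric run length, MRL = ceil(log 0.5 / log(1 - p)).
    return math.ceil(math.log(0.5) / math.log(1.0 - p_signal))

for p in (0.0027, 0.05, 0.5):        # in-control, small shift, large shift
    print(f"p={p}:  ARL={1/p:7.1f}  MRL={median_run_length(p)}")
# In control, the MRL is far below the ARL, illustrating the skewness argument.
```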

9.
Heterogeneity and small sample size are problems that affect many paleodemographic studies. The former can cause the overall distribution of age at death to be an amalgam that does not accurately reflect the distributions of any of the groups composing the heterogeneous population. The latter can make it difficult to separate significant from nonsignificant demographic differences between groups. Survival analysis, a methodology that involves the survival distribution function and various regression models, can be applied to distributions of age at death in order to reveal statistically significant demographic differences and to control for heterogeneity. Survival analysis was used on demographic data from a heterogeneous sample of skeletons of low-status Maya who lived in and around Copan, Honduras, between A.D. 400 and 1200. Results contribute to understanding the collapse of Classic Maya civilization.
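A minimal sketch of the survival-distribution-function comparison described here, using fabricated ages at death for two groups; the regression models the paper uses to control for heterogeneity are not reproduced.

```python
import numpy as np

def empirical_survival(ages):
    # S(t) = proportion of individuals surviving beyond age t.
    ages = np.sort(np.asarray(ages, dtype=float))
    t = np.unique(ages)
    s = 1.0 - np.searchsorted(ages, t, side="right") / len(ages)
    return t, s

group_a = [2, 5, 16, 23, 27, 31, 35, 35, 42, 50]   # illustrative ages at death
group_b = [1, 3, 4, 12, 19, 24, 29, 33, 38, 45]
for name, g in (("A", group_a), ("B", group_b)):
    t, s = empirical_survival(g)
    print(name, dict(zip(t[:4], np.round(s[:4], 2))))
```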

10.
An improved understanding of how particle size distribution relates to enzymatic hydrolysis performance and rheological properties could enable enhanced biochemical conversion of lignocellulosic feedstocks. Particle size distribution can change as a result of either physical or chemical manipulation of a biomass sample. In this study, we employed image processing techniques to measure slurry particle size distribution and validated the results by showing that they are comparable to those from laser diffraction and sieving. Particle size and chemical changes of biomass slurries were manipulated independently, and the resulting yield stress and enzymatic digestibility of slurries with different size distributions were measured. Interestingly, reducing particle size by mechanical means from about 1 mm to 100 μm did not reduce the yield stress of the slurries over a broad range of concentrations or increase the digestibility of the biomass over the range of size reduction studied here. This is in stark contrast to the increase in digestibility and decrease in yield stress when particle size is reduced by dilute-acid pretreatment over similar size ranges.

11.
Fifteen previously proposed similarity indices are examined for the effects of sample size and/or group size (the number of samples included in a cluster). Three indices, Cλ, NESS, and C′λ, are free from these effects, but the former two are unsuitable for arithmetic averaging unless all of the sample sizes are equal. Thus clustering using C′λ is found to be superior to the combination of any other similarity index and the group-average strategy. Unfortunately, none of these measures has the desirable property of measuring the difference in component species among samples independently of the alpha-diversity. A new index of similarity (HR) is developed based on the assumption that the community from which samples are taken is described by a log-series distribution. This new index measures the beta-diversity among samples without the influence of sample size and group size, and has the advantage that the significance of fusing samples can be tested statistically. An example clustering with HR is shown and compared with those obtained by other clustering strategies.
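A sketch of two of the examined indices as they are commonly defined: Morisita's Cλ and a modified C′λ in the Morisita-Horn form (that correspondence is our assumption). The new log-series-based HR index is not reproduced here.

```python
import numpy as np

def morisita(x, y):
    # Morisita's C_lambda from species-abundance vectors of two samples.
    x, y = np.asarray(x, float), np.asarray(y, float)
    X, Y = x.sum(), y.sum()
    l1 = (x * (x - 1)).sum() / (X * (X - 1))
    l2 = (y * (y - 1)).sum() / (Y * (Y - 1))
    return 2 * (x * y).sum() / ((l1 + l2) * X * Y)

def morisita_horn(x, y):
    # Modified form using relative-abundance "lambdas" (Morisita-Horn).
    x, y = np.asarray(x, float), np.asarray(y, float)
    X, Y = x.sum(), y.sum()
    return 2 * (x * y).sum() / (((x**2).sum() / X**2 + (y**2).sum() / Y**2) * X * Y)

a = [20, 10, 5, 1, 0]
b = [15, 12, 3, 0, 2]
print(round(morisita(a, b), 3), round(morisita_horn(a, b), 3))
```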

12.
L. Wang and X. H. Zhou. Biometrics, 2007, 63(4): 1218-1225.
Heteroscedastic data arise in many applications. In heteroscedastic regression analysis, the variance is often modeled as a parametric function of the covariates or of the regression mean. We propose a kernel-smoothing type nonparametric test for checking the adequacy of a given parametric variance structure. The test does not need to specify a parametric distribution for the random errors. It is shown that the test statistic has an asymptotically normal distribution under the null hypothesis and is powerful against a large class of alternatives. We suggest a simple bootstrap algorithm to approximate the distribution of the test statistic in finite samples. Numerical simulations demonstrate the satisfactory performance of the proposed test. We also illustrate the application with the analysis of a radioimmunoassay data set.
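A toy version in the spirit of the abstract: a kernel-smoothing check of a constant-variance null, calibrated by a residual bootstrap. The linear mean, Gaussian kernel, bandwidth, and null model are all our illustrative choices, not the paper's test.

```python
import numpy as np

rng = np.random.default_rng(0)

def nw_smooth(x, z, h):
    # Nadaraya-Watson smooth of z against x with a Gaussian kernel.
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w @ z / w.sum(axis=1)

def test_stat(x, y, h=0.3):
    # Fit a linear mean and a constant variance (the parametric null), then
    # measure the gap between smoothed squared residuals and that constant.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    theta = np.mean(resid**2)
    return np.mean((nw_smooth(x, resid**2, h) - theta) ** 2), X @ beta, resid, theta

n = 100
x = rng.uniform(0, 1, n)
y = 1 + 2 * x + (0.5 + 1.5 * x) * rng.standard_normal(n)  # heteroscedastic truth
t_obs, fitted, resid, theta = test_stat(x, y)
t_boot = []
for _ in range(199):  # residual bootstrap under the constant-variance null
    yb = fitted + np.sqrt(theta) * rng.choice(resid / resid.std(), n)
    t_boot.append(test_stat(x, yb)[0])
print("bootstrap p-value:", np.mean(np.array(t_boot) >= t_obs))
```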

13.
Confidence intervals for spectral mean and ratio statistics
Xiaofeng Shao. Biometrika, 2009, 96(1): 107-117.
We propose a new method to construct confidence intervals for spectral mean and related ratio statistics of a stationary process that avoids direct estimation of their asymptotic variances. By introducing a bandwidth, a self-normalization procedure is adopted and the distribution of the new statistic is asymptotically nuisance-parameter free. The bandwidth is chosen using information criteria and a moving average sieve approximation. Through a simulation study, we demonstrate good finite-sample performance of our method when the sample size is moderate, and a comparison with an empirical-likelihood-based method for ratio statistics confirms the wider applicability of our method.
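A sketch of the self-normalization idea for the simplest functional, the mean of a stationary series: the statistic needs no long-run variance estimate, and the critical value of the nonstandard pivotal limit is simulated. The paper's treatment of general spectral means, ratio statistics, and bandwidth selection is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def sn_stat(x, mu):
    # Self-normalized t-type statistic built from centered partial sums.
    n = len(x)
    s = np.cumsum(x - x.mean())
    v = np.sqrt(np.sum(s**2)) / n
    return np.sqrt(n) * (x.mean() - mu) / v

# The limit of |T| is pivotal, so its critical value can be simulated once.
crit = np.quantile([abs(sn_stat(rng.standard_normal(500), 0.0))
                    for _ in range(5000)], 0.95)

# Confidence interval for the mean of an AR(1) series, no variance estimation.
x = np.empty(500)
x[0] = rng.standard_normal()
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
n = len(x)
v = np.sqrt(np.sum(np.cumsum(x - x.mean())**2)) / n
half = crit * v / np.sqrt(n)
print("95% CI for the mean:", (round(x.mean() - half, 3), round(x.mean() + half, 3)))
```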

14.
Y. Cheng and Y. Shen. Biometrics, 2004, 60(4): 910-918.
For confirmatory trials used in regulatory decision making, it is important that adaptive designs under consideration provide inference at the correct nominal level, as well as unbiased estimates and confidence intervals for the treatment comparisons in the actual trials. However, the naive point estimate and its confidence interval are often biased in adaptive sequential designs. We develop a new procedure for estimation following a test from a sample size reestimation design. The method for obtaining an exact confidence interval and point estimate is based on a general distributional property of a pivot function of the self-designing group sequential clinical trial of Shen and Fisher (1999, Biometrics 55, 190-197). A modified estimate is proposed to account explicitly for the futility stopping boundary, with reduced bias when block sizes are small. The proposed estimates are shown to be consistent, and their computation is straightforward. We also provide a modified weight function to improve the power of the test. Extensive simulation studies show that the exact confidence intervals have accurate nominal coverage probability, and the proposed point estimates are nearly unbiased for practical sample sizes.

15.
A simulation study of sample size in exposure assessment
Four right-skewed distributions commonly used in exposure assessment were selected, and the relationship between sample size and the estimation of the high percentiles of interest was studied by simulation; the lognormal distribution was then examined in detail from the standpoint of distributional shape and variability. The results show that: (1) for right-skewed distributions, the higher the percentile, the larger the sample size required for accurate estimation; the estimates approach their theoretical values and gain precision as the sample size increases, and with a sample size of 500, all percentiles of the four right-skewed distributions examined here, except P99.9, were estimated fairly accurately; (2) to estimate the same percentile, the lognormal distribution requires a much larger sample size than the normal distribution, and the greater the variability of the distribution, the larger the required sample size. This study offers guidance for sampling surveys in exposure assessment.
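A simulation sketch of finding (1) for the lognormal case: higher percentiles need larger samples for accurate estimation. The log-scale standard deviation, sample sizes, and replication count are illustrative.

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(0)
dist = lognorm(s=1.0)                       # lognormal with log-scale sd = 1
for q in (0.90, 0.95, 0.99, 0.999):
    true_q = dist.ppf(q)
    for n in (100, 500):
        est = [np.quantile(dist.rvs(size=n, random_state=rng), q)
               for _ in range(1000)]
        bias = np.mean(est) - true_q
        print(f"P{q*100:g}  n={n:3d}  true={true_q:6.2f}  "
              f"bias={bias:6.2f}  sd={np.std(est):5.2f}")
```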

16.
Breadth of the interproximal wear facet between lower P2 and M1 and between lower M1 and M2 was measured in human skeletal samples representing the Archaic, Woodland, and Mississippian periods of Tennessee River Valley prehistory, with the aim of assessing relative magnitudes of applied masticatory forces. When stratified by level of occlusal wear, mean interproximal facet breadth was consistently larger in the Archaic sample than in the Mississippian sample, with the Woodland sample intermediate. An analysis of covariance demonstrated that there were significant (p ≤ 0.01) differences in facet size among the three groups even when differences in crown breadth were taken into account. Similar results were obtained in regressions of facet size on chronological age (Archaic larger than Mississippian at p ≤ 0.01). Since the rate of occlusal wear appears to be somewhat greater in the Archaic sample than in the later samples, the differences in interproximal wear are probably underestimated. It is suggested that the high levels of interproximal wear in the Archaic are indicative of the large occlusal forces and repetitive chewing required to masticate a diet of seeds, wild plant foods, and small animals for which prior preparation (e.g., grinding, cooking) was minimal or nonexistent (as indicated by paleofecal samples). The lower amounts of interproximal wear observed in the Woodland and Mississippian samples imply considerable reductions in strenuous mastication, perhaps due to the widespread adoption during these periods of pottery and the earth oven, together with ethnographically documented techniques of food preparation that transformed most foods to a soft consistency.

17.
We consider the problem of jointly modeling survival time and longitudinal data subject to measurement error. The survival times are modeled through the proportional hazards model, and a random effects model is assumed for the longitudinal covariate process. Under this framework, we propose an approximate nonparametric corrected-score estimator for the parameter that describes the association between the time-to-event and the longitudinal covariate. The term nonparametric refers to the fact that assumptions regarding the distribution of the random effects and that of the measurement error are unnecessary. The finite-sample performance of the approximate nonparametric corrected-score estimator is examined through simulation studies, and its asymptotic properties are also developed. Furthermore, the proposed estimator and some existing estimators are applied to real data from an AIDS clinical trial.

18.
MOTIVATION: Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. RESULTS: Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram, and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear, and bimodal. In addition, real patient data from a large breast-cancer study are considered. To mitigate the combinatorial search for optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases; these are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. AVAILABILITY: http://public.tgen.org/tamu/ofs/ CONTACT: e-dougherty@ee.tamu.edu
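A small simulation sketch of the peaking phenomenon the abstract studies, using one of the seven rules (linear discriminant analysis): for fixed training-sample size, test error falls and then rises as noisy features are added. The Gaussian model and sizes are illustrative and far smaller than the paper's massively parallel study.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_train, n_test = 30, 2000

def draw(n, d):
    # Two classes differing only in the first (up to) 3 features.
    mu = np.zeros(d)
    mu[:3] = 0.8
    y = rng.integers(0, 2, n)
    X = rng.standard_normal((n, d)) + np.outer(y, mu)
    return X, y

for d in (2, 5, 10, 20, 25):
    errs = []
    for _ in range(100):
        Xtr, ytr = draw(n_train, d)
        Xte, yte = draw(n_test, d)
        clf = LinearDiscriminantAnalysis().fit(Xtr, ytr)
        errs.append(np.mean(clf.predict(Xte) != yte))
    print(f"d={d:2d}  mean test error: {np.mean(errs):.3f}")
```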

19.
An estimation procedure using the idea of sample coverage is proposed to estimate population size for capture-recapture experiments in continuous time. The capture rates (intensities) are allowed to vary with time and among individuals (heterogeneity). The capture frequency history alone is sufficient for estimating population size, while the capture times and the sequential order of animals caught are irrelevant to the analysis. An example is given for illustration, and the performance of the proposed estimation procedure is investigated by simulation.
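A sketch of a coverage-based estimator of this general type, in a commonly used form: coverage is estimated by Ĉ = 1 − f1/n and heterogeneity enters through a squared coefficient of variation. This is an illustration under those assumptions, not necessarily the paper's exact estimator.

```python
import numpy as np

def coverage_estimate(freq_counts):
    # freq_counts[k] = number of animals captured exactly k+1 times.
    f = np.asarray(freq_counts, dtype=float)
    k = np.arange(1, len(f) + 1)
    D, n = f.sum(), (k * f).sum()      # distinct animals, total captures
    C = 1.0 - f[0] / n                 # estimated sample coverage
    N0 = D / C                         # homogeneous-rates estimate
    # Heterogeneity adjustment via an estimated squared CV of capture rates.
    gamma2 = max(N0 * (k * (k - 1) * f).sum() / (n * (n - 1)) - 1.0, 0.0)
    return N0 + n * (1.0 - C) / C * gamma2

# e.g. 43 animals caught once, 16 twice, 8 three times, ...
print(round(coverage_estimate([43, 16, 8, 6, 0, 2, 1]), 1))
```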
