Similar Literature
20 similar documents found (search time: 27 ms)
1.
In the capture-recapture problem for two independent samples, the traditional estimator, calculated as the product of the two sample sizes divided by the number of sampled subjects appearing in both samples, is well known to be a biased estimator of the population size and to have no finite variance under direct or binomial sampling. To alleviate these theoretical limitations, inverse sampling, in which we continue sampling subjects in the second sample until we obtain a desired number of marked subjects who appeared in the first sample, has been proposed elsewhere. In this paper, we consider five interval estimators of the population size: the most commonly used interval estimator based on Wald's statistic, the interval estimator using the logarithmic transformation, the interval estimator derived from a quadratic equation developed here, the interval estimator using the χ2-approximation, and the interval estimator based on the exact negative binomial distribution. To evaluate and compare the finite-sample performance of these estimators, we employ Monte Carlo simulation to calculate the coverage probability and the standardized average length of the resulting confidence intervals in a variety of situations. To study the location of these interval estimators, we calculate the non-coverage probability in the two tails of the confidence intervals. Finally, we briefly discuss optimal sample size determination for a given precision to minimize the expected total cost.
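A minimal Monte Carlo sketch of this setting, under assumed values for the true population size N, the first-sample size n1, and the target number m0 of marked recaptures (all hypothetical): it simulates inverse sampling in the second sample and applies the traditional estimator n1·n2/m0. It illustrates the design only and is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def lincoln_petersen(n1, n2, m):
    """Traditional estimator N_hat = n1 * n2 / m (undefined when m = 0)."""
    return n1 * n2 / m if m > 0 else np.inf

def inverse_sample(N, n1, m0):
    """Draw subjects for the second sample one at a time until the m0-th
    marked subject (one of the n1 marked in sample 1) appears; return n2."""
    marked = rng.permutation(N) < n1      # exactly n1 randomly placed marks
    draws = rng.permutation(N)            # order in which sample 2 is drawn
    hits = np.cumsum(marked[draws])
    return int(np.argmax(hits == m0)) + 1

N, n1, m0 = 1000, 200, 30
est = [lincoln_petersen(n1, inverse_sample(N, n1, m0), m0) for _ in range(5000)]
print(f"mean estimate {np.mean(est):.1f} vs true N = {N}")
```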

2.
MOTIVATION: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for power and sample size calculations. It is desirable to have exact formulas for these calculations and to have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large. RESULTS: In this study, we have characterized quantitatively the effect of pooling samples on the efficiency of microarray experiments for the detection of differential gene expression between two classes. We present exact formulas for calculating the power of microarray experimental designs involving sample pooling and technical replications. The formulas can be used to determine the total number of arrays and biological subjects required in an experiment to achieve the desired power at a given significance level. The conditions under which a pooled design becomes preferable to a non-pooled design can then be derived, given the unit costs associated with a microarray and with a biological subject. This paper thus serves to provide guidance on sample pooling and cost-effectiveness. The formulation in this paper is outlined in the context of microarray comparative studies, but its applicability is not limited to microarray experiments; it extends to a wide range of biomedical comparative studies where sample pooling may be involved.
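The paper derives exact formulas; the sketch below uses only the common normal approximation, in which pooling r subjects per array reduces the per-array variance to σ_b²/r + σ_e². Parameter names and values are hypothetical.

```python
from scipy.stats import norm

def pooled_power(delta, sigma_b, sigma_e, n_pool, n_arrays, alpha=0.001):
    """Approximate power for detecting a mean difference delta between two
    classes, with n_pool subjects pooled per array and n_arrays arrays per
    class (two-sided z-test, normal approximation)."""
    var_array = sigma_b**2 / n_pool + sigma_e**2   # per-array variance
    se = (2 * var_array / n_arrays) ** 0.5         # SE of the class difference
    z = norm.ppf(1 - alpha / 2)
    return norm.cdf(abs(delta) / se - z)

# Pooling 3 subjects per array vs no pooling, 10 arrays per class:
print(pooled_power(1.0, sigma_b=0.8, sigma_e=0.4, n_pool=3, n_arrays=10))
print(pooled_power(1.0, sigma_b=0.8, sigma_e=0.4, n_pool=1, n_arrays=10))
```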

3.
In a randomized clinical trial (RCT), noncompliance with an assigned treatment can occur due to serious side effects, while outcomes may be missing due to patient withdrawal or loss to follow-up. To avoid a loss of power to detect a given risk difference (RD) of interest between two treatments, it is essential to incorporate information on noncompliance and missing outcomes into the sample size calculation. Under the compound exclusion restriction model proposed elsewhere, we first derive the maximum likelihood estimator (MLE) of the RD among compliers between two treatments for an RCT with noncompliance and missing outcomes, together with its asymptotic variance in closed form. Based on the MLE with the tanh⁻¹(x) transformation, we develop an asymptotic test procedure for testing equality of two treatment effects among compliers. We further derive a sample size calculation formula accounting for both noncompliance and missing outcomes for a desired power 1 − β at a nominal α-level. To evaluate the performance of the test procedure and the accuracy of the sample size formula, we employ Monte Carlo simulation to calculate the estimated Type I error and power of the proposed test procedure at the resulting sample size in a variety of situations. We find that both the test procedure and the sample size formula developed here perform well. Finally, we discuss the effects of various parameters, including the proportion of compliers, the probability of non-missing outcomes, and the ratio of sample size allocation, on the minimum required sample size.
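The paper's formula is closed-form and model-based; as a rough stand-in only, the sketch below inflates a standard two-proportion sample size by a complier proportion pc (dilution of the effect by pc under intention-to-treat is a common heuristic) and a probability pm of observing the outcome. All numbers are hypothetical.

```python
import math
from scipy.stats import norm

def n_two_proportions(p1, p2, alpha=0.05, power=0.8):
    """Standard per-arm sample size for a two-sided test of two proportions."""
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    return (za + zb) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

def n_adjusted(p1, p2, pc, pm, **kw):
    """Inflate for a complier proportion pc (effect diluted by pc under
    intention-to-treat) and a probability pm of observing the outcome."""
    return n_two_proportions(p1, p2, **kw) / (pc**2 * pm)

print(math.ceil(n_adjusted(0.30, 0.45, pc=0.8, pm=0.9)))   # per-arm size
```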

4.
The criterion currently used for sample size calculation in reference interval studies is not well stated and leads to imprecise control of the ratio in question. We propose a generalization of the criterion used to determine a sufficient sample size in reference interval studies. The generalization allows better estimation of the required sample size when the reference interval is estimated using a power transformation or nonparametrically. Bootstrap methods are presented to estimate the sample sizes required by the generalized criterion. Simulations of several distributions, both symmetric and positively skewed, are presented to compare the sample size estimators. The new method is illustrated on a data set of plasma glucose values from a 50-g oral glucose tolerance test. The sample sizes calculated from the generalized criterion lead to more reliable control of the desired ratio.
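A sketch of the bootstrap idea under one plausible reading of the criterion: for a candidate n, estimate the ratio of the confidence interval width of the upper reference limit to the width of the reference interval itself, and grow n until the ratio is acceptably small. The log-normal pilot distribution and the 500-resample budget are assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def ratio_criterion(sample, n_boot=500):
    """Bootstrap estimate of (CI width of the upper 97.5% reference limit)
    divided by (width of the central 95% reference interval)."""
    lims = [np.quantile(rng.choice(sample, size=sample.size), 0.975)
            for _ in range(n_boot)]
    ci_width = np.quantile(lims, 0.975) - np.quantile(lims, 0.025)
    ri_width = np.quantile(sample, 0.975) - np.quantile(sample, 0.025)
    return ci_width / ri_width

# Hypothetical log-normal pilot samples of increasing size:
for n in (120, 240, 480):
    sample = rng.lognormal(mean=1.6, sigma=0.3, size=n)
    print(n, round(ratio_criterion(sample), 3))
```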

5.
In case-control studies with matched pairs, the traditional point estimator of the odds ratio (OR) is well known to be biased, with no exact finite variance, under binomial sampling. In this paper, we consider the use of inverse sampling, in which we continue to sample subjects to form matched pairs until we obtain a pre-determined number (>0) of index pairs with the case unexposed but the control exposed. In contrast to binomial sampling, we show that the uniformly minimum variance unbiased estimator (UMVUE) of the OR does exist under inverse sampling. We further derive an exact confidence interval for the OR in closed form. Finally, we develop an exact test and an asymptotic test of the null hypothesis H0: OR = 1, and discuss determination of the minimum number of index pairs required for a desired power at the α-level.
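A small simulation of the design, with hypothetical discordant-pair probabilities p10 and p01 (so the conditional OR is p10/p01): pairs accrue until a fixed number of index pairs (case unexposed, control exposed) is reached, and the OR is estimated by the simple ratio x/y. Whether this ratio coincides with the paper's UMVUE is not asserted here; the sketch only demonstrates the sampling scheme.

```python
import numpy as np

rng = np.random.default_rng(5)

def inverse_sampling_or(p10, p01, y_target):
    """Accrue matched pairs until y_target index pairs (case unexposed,
    control exposed) are observed; return the simple ratio estimate x/y,
    where x counts the opposite discordant type."""
    x = y = 0
    while y < y_target:
        u = rng.random()
        if u < p10:            # case exposed, control unexposed
            x += 1
        elif u < p10 + p01:    # index pair
            y += 1
    return x / y_target

est = [inverse_sampling_or(0.30, 0.15, y_target=25) for _ in range(2000)]
print(f"mean estimate {np.mean(est):.3f} vs conditional OR = {0.30 / 0.15}")
```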

6.
Medical diagnostic tests are used to classify subjects as non-diseased or diseased. The classification rule usually consists of classifying subjects using the values of a continuous marker that is dichotomised by means of a threshold. Here, the optimum threshold estimate is found by minimising a cost function that accounts for both decision costs and sampling uncertainty. The cost function is optimised either analytically, in a normal-distribution setting, or empirically, in a distribution-free setting where the underlying probability distributions of diseased and non-diseased subjects are unknown. Inference for the threshold estimates is based on approximate analytical standard errors and on bootstrap-based approaches. The performance of the proposed methodology is assessed by means of a simulation study, and the sample size required for a given confidence interval precision and sample size ratio is also calculated. Finally, a case example based on previously published data concerning the diagnosis of Alzheimer's patients illustrates the procedure.
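A sketch of the normal-distribution case, minimizing expected decision cost over the threshold. The prevalence, the two marker distributions, and the unit costs of false positives and false negatives are all hypothetical.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import norm

prev = 0.2                # hypothetical disease prevalence
c_fp, c_fn = 1.0, 5.0     # hypothetical unit costs of false +/- decisions

def expected_cost(c):
    """Expected decision cost at threshold c: marker ~ N(0,1) in the
    non-diseased and N(1.5,1) in the diseased (hypothetical binormal model)."""
    fp = (1 - prev) * (1 - norm.cdf(c, 0, 1))   # non-diseased above threshold
    fn = prev * norm.cdf(c, 1.5, 1)             # diseased below threshold
    return c_fp * fp + c_fn * fn

res = minimize_scalar(expected_cost, bounds=(-3, 5), method="bounded")
print(f"optimal threshold ~ {res.x:.3f}, expected cost {res.fun:.4f}")
```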

7.
S. L. Beal. Biometrics 1989, 45(3): 969–977.
Sample size determination is usually based on the premise that a hypothesis test is to be used. A confidence interval can sometimes serve better than a hypothesis test. In this paper a method is presented for sample size determination based on the premise that a confidence interval for a simple mean, or for the difference between two means, with normally distributed data is to be used. For this purpose, a concept of power relevant to confidence intervals is given. Some useful tables giving required sample size using this method are also presented.
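In the spirit of this idea (though not necessarily Beal's exact formulation), the sketch below defines the "power" of a confidence interval as the probability that its half-width is at most a target δ, and searches for the smallest n achieving a desired power; it uses the fact that (n−1)S²/σ² is chi-squared with n−1 degrees of freedom.

```python
from scipy.stats import chi2, t

def ci_power(n, delta, sigma, conf=0.95):
    """P(half-width of the conf-level CI for a normal mean <= delta) at size n,
    using (n-1)*S^2/sigma^2 ~ chi-square(n-1)."""
    tq = t.ppf(1 - (1 - conf) / 2, n - 1)
    return chi2.cdf((n - 1) * n * delta**2 / (tq**2 * sigma**2), n - 1)

def n_for_ci(delta, sigma, power=0.8, conf=0.95):
    n = 2
    while ci_power(n, delta, sigma, conf) < power:
        n += 1
    return n

print(n_for_ci(delta=0.5, sigma=1.0))   # smallest n with an 80% chance
                                        # of a CI no wider than +/- 0.5
```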

8.
Approximate estimation of the minimum sample size required in marker-assisted backcross breeding
Backcross breeding is an effective method for transferring favorable genes from a donor parent to a recipient parent, and marker-assisted selection can accelerate the process. To design a sensible marker-assisted selection program, breeders must know the required size of the progeny population. This paper proposes a method for estimating the population size required when foreground and background selection are carried out simultaneously in marker-assisted backcrossing. Under the simplifying assumption that the target loci to be transferred are independent of the genetic background, the probability of selecting the desired genotype in each generation can be approximated by combining an analytical method (for foreground selection) with a simulation method based on the graphical genotype of the backcross parent (for background selection); from this, the minimum sample size needed to obtain at least one qualifying individual at a given probability level can be estimated. A hypothetical example illustrates the use of the method, which can be conveniently applied in practical backcross breeding.
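The final step of the abstract, computing the minimum sample size n such that at least one qualifying individual is obtained with probability P given a per-individual success probability p, reduces to n ≥ ln(1 − P)/ln(1 − p). A sketch with a hypothetical p:

```python
import math

def min_sample_size(p, P=0.99):
    """Smallest n such that at least one individual with the desired genotype
    appears among n progeny with probability >= P, given per-individual
    probability p of the desired genotype."""
    return math.ceil(math.log(1 - P) / math.log(1 - p))

# Hypothetical per-individual probability p = 0.05 in one backcross generation:
print(min_sample_size(0.05))   # -> 90 individuals for 99% assurance
```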

9.
Simple, defensible sample sizes based on cost efficiency
Summary. The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices. Costs, however, are often impossible for investigators and funders to ignore in actual practice. Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost. By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size. The first is to choose the sample size that minimizes the average cost per subject. The second is to choose the sample size that minimizes total cost divided by the square root of the sample size. This latter method is theoretically more justifiable for innovative studies, but it also performs reasonably well and has some justification in other cases. For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does. These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.
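With a linear cost model C(n) = c0 + c1·n (intercept and slope hypothetical), the second rule has a closed form: (c0 + c1·n)/√n is minimized at n* = c0/c1. A sketch:

```python
import math

c0, c1 = 100_000.0, 400.0        # hypothetical fixed cost and cost per subject

# Minimizing (c0 + c1*n)/sqrt(n): setting the derivative to zero gives n* = c0/c1.
n_star = c0 / c1
print("total cost / sqrt(n) is minimized at n =", round(n_star))   # 250

for n in (100, 250, 1000):       # check: the ratio is lowest at n = 250
    print(n, round((c0 + c1 * n) / math.sqrt(n)))
```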

10.
We present new inference methods for the analysis of low- and high-dimensional repeated measures data from two-sample designs that may be unbalanced, where the number of repeated measures per subject may be larger than the number of subjects, and where the covariance matrices are not assumed to be spherical and may differ between the two samples. In comparison, we demonstrate how crucial it is for the popular Huynh-Feldt (HF) method to make the restrictive and often unrealistic or unjustifiable assumption of equal covariance matrices. The new method is shown to maintain desired α-levels better than the well-known HF correction, as demonstrated in several simulation studies. The proposed test gains power when the number of repeated measures is increased in a manner that is consistent with the alternative; thus, even increasing the number of measurements on the same subject may lead to an increase in power. Application of the new method is illustrated in detail using two different real data sets. In one of them, the number of repeated measures per subject is smaller than the sample size, while in the other it is larger.

11.
Scientists often need to test hypotheses and construct corresponding confidence intervals. In designing a study to test a particular null hypothesis, traditional methods lead to a sample size large enough to provide sufficient statistical power. In contrast, traditional methods based on constructing a confidence interval lead to a sample size likely to control the width of the interval. With either approach, a sample size so large as to waste resources or introduce ethical concerns is undesirable. This work was motivated by the concern that existing sample size methods often make it difficult for scientists to achieve their actual goals. We focus on situations which involve a fixed, unknown scalar parameter representing the true state of nature. The width of the confidence interval is defined as the difference between the (random) upper and lower bounds. An event width is said to occur if the observed confidence interval width is less than a fixed constant chosen a priori. An event validity is said to occur if the parameter of interest is contained between the observed upper and lower confidence interval bounds. An event rejection is said to occur if the confidence interval excludes the null value of the parameter. In our opinion, scientists often implicitly seek to have all three occur: width, validity, and rejection. New results illustrate that neglecting rejection or width (and less so validity) often provides a sample size with a low probability of the simultaneous occurrence of all three events. We recommend considering all three events simultaneously when choosing a criterion for determining a sample size. We provide new theoretical results for any scalar (mean) parameter in a general linear model with Gaussian errors and fixed predictors. Convenient computational forms are included, as well as numerical examples to illustrate our methods.
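A Monte Carlo sketch of the three events for the t-based confidence interval of a normal mean, with hypothetical values for the true mean, the null value, and the maximum acceptable width: it estimates the probability that width, validity, and rejection all occur at a given n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def joint_probability(n, mu=0.4, sigma=1.0, width_max=0.8, null=0.0,
                      conf=0.95, reps=20_000):
    """Monte Carlo estimate of P(width AND validity AND rejection) for the
    t-based confidence interval of a normal mean at sample size n."""
    x = rng.normal(mu, sigma, size=(reps, n))
    m, s = x.mean(axis=1), x.std(axis=1, ddof=1)
    h = stats.t.ppf(1 - (1 - conf) / 2, n - 1) * s / np.sqrt(n)   # half-width
    width = 2 * h < width_max                     # narrower than the target
    validity = (m - h <= mu) & (mu <= m + h)      # covers the true mean
    rejection = (null < m - h) | (m + h < null)   # excludes the null value
    return np.mean(width & validity & rejection)

for n in (30, 60, 120):
    print(n, joint_probability(n))
```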

12.
A. Roy, D. K. Bhaumik, S. Aryal, R. D. Gibbons. Biometrics 2007, 63(3): 699–707.
Summary. We consider the problem of sample size determination for three-level mixed-effects linear regression models for the analysis of clustered longitudinal data. Three-level designs are used in many areas, but in particular, multicenter randomized longitudinal clinical trials in medical or health-related research. In this case, level 1 represents measurement occasion, level 2 represents subject, and level 3 represents center. The model we consider involves random effects of the time trends at both the subject level and the center level. In the most common case, we have two random effects (constant and a single trend), at both subject and center levels. The approach presented here is general with respect to sampling proportions, number of groups, and attrition rates over time. In addition, we also develop a cost model, as an aid in selecting the most parsimonious of several possible competing models (i.e., different combinations of centers, subjects within centers, and measurement occasions). We derive sample size requirements (i.e., power characteristics) for a test of treatment-by-time interaction(s) for designs based on either subject-level or cluster-level randomization. The general methodology is illustrated using two characteristic examples.

13.
We develop a Bayesian simulation-based approach for determining the sample size required for estimating a binomial probability, or the difference between two binomial probabilities, while allowing for dependence between two fallible diagnostic procedures. Examples include estimating the prevalence of disease in a single population based on results from two imperfect diagnostic tests applied to sampled individuals, or surveys designed to compare the prevalences of two populations using diagnostic outcomes that are subject to misclassification. We propose a two-stage procedure in which the tests are initially assumed to be independent conditional on true disease status (i.e., conditionally independent). An interval-based sample size determination scheme is performed under this assumption, and data are collected and used to test the conditional independence assumption. If the data reveal the diagnostic tests to be conditionally dependent, structure is added to the model to account for the dependence, and the sample size routine is repeated in order to properly satisfy the criterion under the correct model. We also examine the impact on the required sample size of adding an extra heterogeneous population to a study.

14.
Questions: The quality of any inferences derived from field studies or monitoring programmes depends on the expenditure of time and effort to make the underlying observations. Here, we used a long-term data set from a succession-monitoring scheme to assess the effect of different survey scenarios. We asked: (1) how well does a survey reflect successional processes if sampling effort varies (a) in space, (b) in length of the total observation period, (c) in observation frequency, and (d) with a combination of these factors? (2) What are the practical implications for devising monitoring programmes? Location: Lignite mining region of Central Germany, post-mining landscape of Goitzsche (Saxony-Anhalt). Methods: Based on our full data set, we constructed subsamples. For the full data set and all subsets, we constructed Markov models and compared them based on the predictions made. We assessed the effects of survey intensity on model performance using generalized linear models and multiple logistic regressions. Results: Exploring the effects of different survey scenarios revealed significant effects of all three main features of survey intensity (sample size, length, frequency). The most important sampling feature was study length. However, we found interactive effects of sample size with study length and observation interval on model predictions. This indicates that, for long-term observations with multiple recording intervals, a lower sample size in space is required to reveal the same amount of information as in a shorter study or one with fewer intervals. Conversely, a high sample size may, to some degree, compensate for relatively short study periods. Conclusions: Monitoring activities should not be restricted to intensive sampling over only a few years. With clearly limited resources, decreasing sampling intensity in space and stretching these resources over a longer period would probably pay off much better than abandoning monitoring activities entirely after an intensive but short campaign.

15.
Questions: Uncertainty in detecting disturbance histories has long been ignored in dendrochronological studies of forest ecosystems. Our goal was to characterize this uncertainty in relation to the key parameters of forest ecosystems and sample size. In addition, we aimed to provide a method to define uncertainty bounds in specific forest ecosystems with known parameters, and to provide a (conservative) minimal sample size required to achieve a pre-defined level of uncertainty when no actual key forest parameters are known. Location: Training data were collected from Žofínský Prales (48°40′N, 14°42′E, 735–830 m a.s.l., granite, Czech Republic). Methods: We used probability theory and expressed uncertainty as the length (the difference between the upper and lower bounds) of the 95% confidence interval. We studied the uncertainty of (i) the initial growth of trees, i.e. whether they originated under canopy or in a gap; and (ii) the responses to disturbance events during subsequent growth, on the basis of release detection in the radial growth of trees. These two variables provide different information, which together give a picture of the disturbance history: while initial growth dates the existence of a gap in a given decade (recent as well as older gaps are included), a release marks the moment of a disturbance event. Results: With the help of general mathematical deduction, we have obtained results valid across vegetation types. The length of a confidence interval depends on the sample size, the proportion of released trees in a population, and the variability of tree-layer features (e.g., the crown areas of suppressed and released trees). Conclusions: Most studies to date have evaluated the initial growth of trees with higher uncertainty than the canopy disturbed area. The length of the 95% confidence interval for detecting initial growth has rarely been shorter than 0.1 (error ± 5%) and has mostly been much longer. To reach a 95% confidence interval length of 0.1 (error ± 5%) when detecting the canopy disturbed area, at least 485 tree cores should be evaluated in the studied time period, while to reach an interval length of 0.05 (error ± 2.5%) at least 1925 tree cores are required. Our approach can be used to find the sample size required in each specific forest ecosystem to achieve pre-defined levels of uncertainty when detecting disturbance history.
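For orientation, the generic binomial (Wald) calculation below gives the sample size at which a 95% confidence interval for a proportion attains a given total length; the paper's larger figures (485 and 1925 cores) presumably reflect forest-specific parameters such as crown-area variability, so this is only a lower-bound style sketch.

```python
import math
from scipy.stats import norm

def n_for_ci_length(p, length, conf=0.95):
    """Wald sample size at which the conf-level CI for a proportion p
    has the given total length."""
    z = norm.ppf(1 - (1 - conf) / 2)
    return math.ceil(z**2 * p * (1 - p) / (length / 2) ** 2)

print(n_for_ci_length(0.5, 0.10))   # worst case p = 0.5 -> 385 samples
print(n_for_ci_length(0.5, 0.05))   # -> 1537 samples
```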

16.
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.
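A minimal sketch of the resampling idea behind an improved single-step method: permutation-based maxT adjusted p-values, which preserve the dependence structure among genes while controlling the family-wise error rate. The data dimensions and effect sizes are hypothetical, and this is a generic Westfall-Young style implementation rather than the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

def maxT_adjusted_pvalues(x, y, n_perm=2000):
    """Single-step maxT adjusted p-values for gene-wise two-sample
    t statistics; x and y have shape (samples, genes)."""
    def tstats(a, b):
        se = np.sqrt(a.var(0, ddof=1) / len(a) + b.var(0, ddof=1) / len(b))
        return np.abs(a.mean(0) - b.mean(0)) / se
    obs = tstats(x, y)
    pooled, n1 = np.vstack([x, y]), len(x)
    max_null = np.empty(n_perm)
    for b in range(n_perm):                       # permute group labels
        idx = rng.permutation(len(pooled))
        max_null[b] = tstats(pooled[idx[:n1]], pooled[idx[n1:]]).max()
    return np.array([(max_null >= t0).mean() for t0 in obs])

# Toy data: 6 vs 6 samples, 100 genes, the first 5 genes shifted.
x = rng.normal(size=(6, 100)); x[:, :5] += 2.5
y = rng.normal(size=(6, 100))
print((maxT_adjusted_pvalues(x, y) < 0.05).sum(), "genes significant at FWER 0.05")
```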

17.
A central goal in designing clinical trials is to find the test that maximizes power (or equivalently minimizes required sample size) for finding a false null hypothesis subject to the constraint of type I error. When there is more than one test, such as in clinical trials with multiple endpoints, the issues of optimal design and optimal procedures become more complex. In this paper, we address the question of how such optimal tests should be defined and how they can be found. We review different notions of power and how they relate to study goals, and also consider the requirements of type I error control and the nature of the procedures. This leads us to an explicit optimization problem with objective and constraints that describe its specific desiderata. We present a complete solution for deriving optimal procedures for two hypotheses, which have desired monotonicity properties, and are computationally simple. For some of the optimization formulations this yields optimal procedures that are identical to existing procedures, such as Hommel's procedure or the procedure of Bittman et al. (2009), while for other cases it yields completely novel and more powerful procedures than existing ones. We demonstrate the nature of our novel procedures and their improved power extensively in a simulation and on the APEX study (Cohen et al., 2016).

18.
The current development of densely spaced collections of single nucleotide polymorphisms (SNPs) will lead to genomewide association studies for a wide range of diseases in many different populations. Determinations of the appropriate number of SNPs to genotype involve a balancing of power and cost. Several variables are important in these determinations. We show that there are different combinations of sample size and marker density that can be expected to achieve the same power. Within certain bounds, investigators can choose between designs with more subjects and fewer markers or those with more markers and fewer subjects. Which designs are more cost-effective depends on the cost of phenotyping versus the cost of genotyping. We show that, under the assumption of a set cost for genotyping, one can calculate a "threshold cost" for phenotyping; when phenotyping costs per subject are less than this threshold, designs with more subjects will be more cost-effective than designs with more markers. This framework for determining a cost-effective study will aid in the planning of studies, especially if there are choices to be made with respect to phenotyping methods or study populations.
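The threshold-cost logic can be made concrete: if two designs achieve equal power, the phenotyping cost per subject at which their total costs coincide follows from total cost = n·(c_pheno + m·c_geno). The design pair and per-genotype cost below are hypothetical.

```python
def threshold_phenotyping_cost(n1, m1, n2, m2, c_geno):
    """Phenotyping cost at which two equally powered designs cost the same:
    design 1 = (n1 subjects, m1 markers), design 2 = (n2 subjects, m2 markers).
    Below this threshold, the design with more subjects is cheaper."""
    return c_geno * (n2 * m2 - n1 * m1) / (n1 - n2)

# Hypothetical equally powered designs: 2000 subjects x 300k SNPs
# vs 1500 subjects x 500k SNPs, at $0.0001 per genotype:
c_star = threshold_phenotyping_cost(2000, 300_000, 1500, 500_000, 1e-4)
print(f"threshold phenotyping cost ~ ${c_star:.2f} per subject")   # ~ $30
```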

19.
M. L. Tang, N. S. Tang, I. S. Chan, B. P. Chan. Biometrics 2002, 58(4): 957–963.
In this article, we propose approximate sample size formulas for establishing the equivalence or noninferiority of two treatments in a matched-pairs design. Using the ratio of two proportions as the equivalence measure, we derive sample size formulas based on a score statistic for two types of analyses: hypothesis testing and confidence interval estimation. Depending on the purpose of a study, these formulas can be used to provide a sample size estimate that guarantees a prespecified power of a hypothesis test at a certain significance level or controls the width of a confidence interval at a certain confidence level. Our empirical results confirm that these score methods are reliable in terms of true size, coverage probability, and skewness. A liver scan detection study is used to illustrate the proposed methods.

20.
Many medical and biological studies entail classifying a number of observations according to two factors, where one has two and the other three possible categories. This is the case, for example, in genetic association studies of complex traits with single-nucleotide polymorphisms (SNPs), where a priori statistical planning, analysis, and interpretation of results are of critical importance. Here, we present methodology to determine the minimum sample size required to detect dependence in 2 x 3 tables based on Fisher's exact test, assuming that neither of the two margins is fixed and only the grand total N is known in advance. We provide the numerical tools necessary to determine these sample sizes for a desired power, significance level, and effect size, where only the computational time can be a limitation for extreme parameter values. These programs can be accessed at . This solution of the sample size problem for an exact test will permit experimentalists to plan efficient sampling designs, determine the extent of statistical support for their hypotheses, and gain insight into the repeatability of their results. We apply this solution of the sample size problem to three empirical studies, and discuss the results at specified power and nominal significance levels.
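A Monte Carlo sketch of power for a 2 x 3 table when only the grand total N is fixed, matching the abstract's multinomial design. Since a 2 x 3 Fisher exact test is not available in scipy, the chi-square test stands in for it here; the cell probabilities under the alternative are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(4)

def power_2x3(cell_probs, N, alpha=0.05, reps=2000):
    """Monte Carlo power for detecting dependence in a 2 x 3 table with only
    the grand total N fixed (chi-square test as a stand-in for the exact test)."""
    p = np.asarray(cell_probs).ravel()
    hits = 0
    for _ in range(reps):
        table = rng.multinomial(N, p).reshape(2, 3)
        if table.sum(axis=0).all() and table.sum(axis=1).all():  # skip degenerate
            hits += chi2_contingency(table)[1] < alpha
    return hits / reps

# Hypothetical genotype-by-status cell probabilities under the alternative:
probs = [[0.10, 0.25, 0.15],
         [0.20, 0.20, 0.10]]
for N in (100, 200, 400):
    print(N, power_2x3(probs, N))
```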
