Similar Articles
20 similar articles found.
1.
Theoretical models are often applied to population genetic data sets without fully considering the effect of missing data. Researchers can deal with missing data by removing individuals that have failed to yield genotypes and/or by removing loci that have failed to yield allelic determinations, but despite their best efforts, most data sets still contain some missing data. As a consequence, realized sample size differs among loci, and this poses a problem for unbiased methods that must explicitly account for random sampling error. One commonly used solution for the calculation of contemporary effective population size (Ne) is to calculate the effective sample size as an unweighted mean or harmonic mean across loci. This is not ideal because it fails to account for the fact that loci with different numbers of alleles have different information content. Here we consider this problem for genetic estimators of contemporary Ne. To evaluate the bias and precision of several statistical approaches for dealing with missing data, we simulated populations with known Ne and various degrees of missing data. Across all scenarios, one method of correcting for missing data (the fixed inverse-variance-weighted harmonic mean) consistently performed the best for both single-sample and two-sample (temporal) methods of estimating Ne and outperformed some methods currently in widespread use. The approach adopted here may be a starting point for adjusting other population genetics methods that include per-locus sample size components.
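
A minimal sketch (in Python) of the general idea behind the missing-data corrections compared above: summarize per-locus sample sizes with an unweighted mean, a plain harmonic mean, and a weighted harmonic mean. The weights below are hypothetical placeholders, not the paper's fixed inverse-variance weights.

```python
# Summarizing per-locus sample sizes when some genotypes are missing.
# The weights are placeholders (e.g. more alleles -> more weight); the
# paper's "fixed inverse-variance" weights are estimator-specific.
import numpy as np

n_per_locus = np.array([50, 48, 35, 50, 42])  # individuals genotyped per locus

mean_n = n_per_locus.mean()
harmonic_n = len(n_per_locus) / np.sum(1.0 / n_per_locus)

weights = np.array([1.0, 1.0, 2.0, 2.0, 1.5])  # hypothetical information weights
weighted_harmonic_n = weights.sum() / np.sum(weights / n_per_locus)

print(f"unweighted mean:   {mean_n:.1f}")
print(f"harmonic mean:     {harmonic_n:.1f}")
print(f"weighted harmonic: {weighted_harmonic_n:.1f}")
```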

2.
Clinical trials are often planned with high uncertainty about the variance of the primary outcome variable. A poor estimate of the variance, however, may lead to an over- or underpowered study. In the internal pilot study design, the sample variance is calculated at an interim step and the sample size can be adjusted if necessary. The available recalculation procedures base the sample size recalculation only on the data of patients who have already completed the study. In this article, we consider a variance estimator that takes into account both the data at the endpoint and at an intermediate point of the treatment phase. We derive asymptotic properties of this estimator and the related sample size recalculation procedure. In a simulation study, the performance of the proposed approach is evaluated and compared with the procedure that uses only long-term data. Simulation results demonstrate that the sample size resulting from the proposed procedure shows in general a smaller variability. At the same time, the Type I error rate is not inflated and the achieved power is close to the desired value.
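
To make the internal-pilot mechanism concrete, here is a sketch of the standard recalculation step using the usual normal-approximation sample size formula; it uses only completed (long-term) data, not the paper's intermediate-endpoint estimator. The effect size delta, alpha, power, and pilot size are assumed design inputs.

```python
# Internal-pilot sketch: recalculate the per-group sample size from an
# interim variance estimate via
#   n = 2 * sigma^2 * (z_{1-alpha/2} + z_{1-beta})^2 / delta^2.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha, power, delta = 0.05, 0.80, 0.5   # design assumptions
n_pilot = 20                            # patients per group at the interim look

pilot = rng.normal(0.0, 1.2, size=(2, n_pilot))   # simulated interim data
s2 = pilot.var(axis=1, ddof=1).mean()             # pooled sample variance

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_new = int(np.ceil(2 * s2 * z**2 / delta**2))
print(f"interim variance estimate: {s2:.2f} -> recalculated n per group: {n_new}")
```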

3.
Yin G, Shen Y. Biometrics 2005, 61(2): 362-369.
Clinical trial designs involving correlated data often arise in biomedical research. The intracluster correlation needs to be taken into account to ensure the validity of sample size and power calculations. In contrast to the fixed-sample designs, we propose a flexible trial design with adaptive monitoring and inference procedures. The total sample size is not predetermined, but adaptively re-estimated using observed data via a systematic mechanism. The final inference is based on a weighted average of the block-wise test statistics using generalized estimating equations, where the weight for each block depends on cumulated data from the ongoing trial. When there are no significant treatment effects, the devised stopping rule allows for early termination of the trial and acceptance of the null hypothesis. The proposed design updates information regarding both the effect size and within-cluster correlation based on the cumulated data in order to achieve a desired power. Estimation of the parameter of interest and its confidence interval are proposed. We conduct simulation studies to examine the operating characteristics and illustrate the proposed method with an example.
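
The role of the intracluster correlation can be seen in the classical fixed-sample "design effect" correction sketched below; the paper's adaptive procedure re-estimates these quantities from accumulating data, which this sketch does not attempt. The ICC rho, cluster size m, and effect size are assumed values.

```python
# Variance-inflation ("design effect") adjustment for intracluster correlation:
#   n_clustered = n_independent * (1 + (m - 1) * rho)
import math
from scipy.stats import norm

alpha, power, effect_size = 0.05, 0.80, 0.4   # standardized effect (assumed)
m, rho = 2, 0.5                               # subunits per cluster, ICC

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_indep = 2 * (z / effect_size) ** 2          # per group, ignoring clustering
deff = 1 + (m - 1) * rho                      # design effect
n_clustered = math.ceil(n_indep * deff)
print(f"independent n/group: {math.ceil(n_indep)}, with clustering: {n_clustered}")
```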

4.
Rosner B, Glynn RJ. Biometrics 2011, 67(2): 646-653.
The Wilcoxon rank sum test is widely used for two-group comparisons of nonnormal data. An assumption of this test is independence of sampling units both within and between groups, which will be violated in the clustered data setting such as in ophthalmological clinical trials, where the unit of randomization is the subject, but the unit of analysis is the individual eye. For this purpose, we have proposed the clustered Wilcoxon test to account for clustering among multiple subunits within the same cluster (Rosner, Glynn, and Lee, 2003, Biometrics 59, 1089-1098; 2006, Biometrics 62, 1251-1259). However, power estimation is needed to plan studies that use this analytic approach. We have recently published methods for estimating power and sample size for the ordinary Wilcoxon rank sum test (Rosner and Glynn, 2009, Biometrics 65, 188-197). In this article we present extensions of this approach to estimate power for the clustered Wilcoxon test. Simulation studies show a good agreement between estimated and empirical power. These methods are illustrated with examples from randomized trials in ophthalmology. Enhanced power is achieved with use of the subunit as the unit of analysis instead of the cluster using the ordinary Wilcoxon rank sum test.
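
A rough Monte Carlo power sketch for clustered two-group data under an assumed random-effects outcome model. As a simplified stand-in for the Rosner-Glynn-Lee clustered Wilcoxon statistic, it applies the ordinary rank-sum test to cluster means; the variance components and shift are assumptions.

```python
# Monte Carlo power estimate for a clustered two-group comparison,
# reducing each cluster (e.g. a subject's two eyes) to its mean.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(7)
n_clusters, m, rho, shift, alpha = 30, 2, 0.6, 0.5, 0.05

def simulate_group(n, delta):
    cluster_effect = rng.normal(0, np.sqrt(rho), size=(n, 1))
    noise = rng.normal(0, np.sqrt(1 - rho), size=(n, m))
    return (delta + cluster_effect + noise).mean(axis=1)  # cluster means

n_sims, rejections = 2000, 0
for _ in range(n_sims):
    x = simulate_group(n_clusters, 0.0)
    y = simulate_group(n_clusters, shift)
    if mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        rejections += 1
print(f"estimated power: {rejections / n_sims:.2f}")
```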

5.
Miller F. Biometrics 2005, 61(2): 355-361.
We consider clinical studies with a sample size re-estimation based on unblinded variance estimation at some interim point of the study. Because the sample size is determined in such a flexible way, the usual variance estimator at the end of the trial is biased. We derive sharp bounds for this bias. These bounds have a quite simple form and can help in deciding whether this bias is negligible for the actual study or whether a correction should be applied. An exact formula for the bias is also provided. We discuss ways to remove this bias, or at least to reduce it substantially. For this purpose, we propose an additive correction of the bias. An example shows that the significance level of the test can be controlled when this additive correction is used.
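
The bias can be visualized with a small simulation, sketched below under assumed design constants: an unblinded interim variance estimate determines the final sample size, and the mean of the end-of-trial pooled variance estimator is compared with the true variance.

```python
# Simulating the bias of the end-of-trial variance estimator when an
# unblinded interim look drives sample size re-estimation.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma2, delta, alpha, power = 1.0, 0.5, 0.05, 0.80
n1 = 15                                        # first-stage size per group
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

final_vars = []
for _ in range(20000):
    stage1 = rng.normal(0, np.sqrt(sigma2), size=(2, n1))
    s2_interim = stage1.var(axis=1, ddof=1).mean()
    n_total = max(n1, int(np.ceil(2 * s2_interim * z**2 / delta**2)))
    stage2 = rng.normal(0, np.sqrt(sigma2), size=(2, n_total - n1))
    full = np.concatenate([stage1, stage2], axis=1)
    final_vars.append(full.var(axis=1, ddof=1).mean())

print(f"mean final variance estimate: {np.mean(final_vars):.3f} (true: {sigma2})")
```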

6.
Risch and Zhang (1995; Science 268: 1584-9) reported a simple sample size and power calculation approach for the Haseman-Elston method and based their computations on the null hypothesis of no genetic effect. We argue that the more reasonable null hypothesis is that of no recombination. For this null hypothesis, we provide a general approach for sample size and power calculations within the Haseman-Elston framework. We demonstrate the validity of our approach in a Monte-Carlo simulation study and illustrate the differences using data from published segregation analyses on body weight and heritability estimates on carotid artery atherosclerotic lesions.

7.
Determining sample sizes for microarray experiments is important, but the complexity of these experiments and the large amounts of data they produce can make the sample size issue seem daunting and tempt researchers to use rules of thumb in place of formal calculations based on the goals of the experiment. Here we present formulae for determining sample sizes to achieve a variety of experimental goals, including class comparison and the development of prognostic markers. Results are derived which describe the impact of pooling, technical replicates and dye-swap arrays on sample size requirements. These results are shown to depend on the relative sizes of different sources of variability. A variety of common types of experimental situations and designs used with single-label and dual-label microarrays are considered. We discuss procedures for controlling the false discovery rate. Our calculations are based on relatively simple yet realistic statistical models for the data, and provide straightforward sample size calculation formulae.
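
As one concrete instance of such a formula, the sketch below computes a per-group array count for class comparison using a stringent per-gene alpha (a Bonferroni-style simplification standing in for formal false discovery rate control). The fold change, per-gene standard deviation, and gene count are assumed values.

```python
# Per-gene two-sample size calculation with a multiplicity-adjusted alpha.
import math
from scipy.stats import norm

n_genes, alpha_family, power = 10000, 0.05, 0.90
alpha_per_gene = alpha_family / n_genes   # Bonferroni-style simplification
delta = 1.0                               # mean log2 fold change to detect
sigma = 0.7                               # per-gene SD of log2 expression

z = norm.ppf(1 - alpha_per_gene / 2) + norm.ppf(power)
n_per_group = math.ceil(2 * (sigma * z / delta) ** 2)
print(f"arrays per group: {n_per_group}")
```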

8.
Microarray experiments are being increasingly used in molecular biology. A common task is to detect genes with differential expression across two experimental conditions, such as two different tissues or the same tissue at two time points of biological development. To take proper account of statistical variability, some statistical approaches based on the t-statistic have been proposed. In constructing the t-statistic, one needs to estimate the variance of gene expression levels. With a small number of replicated array experiments, the variance estimation can be challenging. For instance, although the sample variance is unbiased, it may have large variability, leading to a large mean squared error. For duplicated array experiments, a new approach based on simple averaging has recently been proposed in the literature. Here we consider two more general approaches based on nonparametric smoothing. Our goal is to assess the performance of each method empirically. The three methods are applied to a colon cancer data set containing 2,000 genes. Using two arrays, we compare the variance estimates obtained from the three methods. We also consider their impact on the t-statistics. Our results indicate that the three methods give variance estimates close to each other. Due to its simplicity and generality, we recommend the use of the smoothed sample variance for data with a small number of replicates.
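
A minimal sketch of the smoothing idea: order genes by mean expression and replace each gene's noisy two-replicate sample variance with a running-window average over its neighbours. The window width and data-generating model are assumptions, and the paper's specific nonparametric smoothers may differ.

```python
# Smoothed variance estimation across genes ordered by mean expression.
import numpy as np

rng = np.random.default_rng(5)
n_genes, n_reps, window = 2000, 2, 101

means = rng.uniform(4, 12, n_genes)
data = means[:, None] + rng.normal(0, 0.5, size=(n_genes, n_reps))

gene_mean = data.mean(axis=1)
gene_var = data.var(axis=1, ddof=1)      # very noisy with only 2 replicates

order = np.argsort(gene_mean)
kernel = np.ones(window) / window
# Running mean of variances along expression-ordered genes (edge effects ignored)
smoothed = np.convolve(gene_var[order], kernel, mode="same")
smoothed_var = np.empty(n_genes)
smoothed_var[order] = smoothed
print(f"raw variance SD: {gene_var.std():.3f}, smoothed: {smoothed_var.std():.3f}")
```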

9.
Genetic drift and estimation of effective population size
Nei M, Tajima F. Genetics 1981, 98(3): 625-640.
The statistical properties of the standardized variance of gene frequency changes (a quantity equivalent to Wright's inbreeding coefficient) in a random mating population are studied, and new formulae for estimating the effective population size are developed. The accuracy of the formulae depends on the ratio of sample size to effective size, the number of generations involved (t), and the number of loci or alleles used. It is shown that the standardized variance approximately follows the chi-square distribution unless t is very large, and the confidence interval of the estimate of effective size can be obtained by using this property. Application of the formulae to data from an isolated population of Dacus oleae has shown that the effective size of this population is about one tenth of the minimum census size, though there was a possibility that the procedure of sampling genes was improper.
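
The temporal estimator can be sketched in a few lines: compute the standardized variance Fc of allele-frequency change and invert the familiar relation Ne ~ t / (2(Fc - 1/(2*S0) - 1/(2*St))), which subtracts the expected contribution of sampling error. The frequencies and sample sizes below are made-up values.

```python
# Temporal-method estimate of Ne from allele frequencies sampled t
# generations apart, using Nei & Tajima's standardized variance Fc.
import numpy as np

x = np.array([0.40, 0.25, 0.35])   # allele frequencies, generation 0
y = np.array([0.50, 0.20, 0.30])   # allele frequencies, generation t
S0, St, t = 100, 100, 10           # sampled individuals and elapsed generations

# Fc averaged over alleles: (x - y)^2 / ((x + y)/2 - x*y)
fc = np.mean((x - y) ** 2 / ((x + y) / 2 - x * y))
ne_hat = t / (2 * (fc - 1 / (2 * S0) - 1 / (2 * St)))
print(f"Fc = {fc:.4f}, estimated Ne = {ne_hat:.0f}")
```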

10.
Statistical sample size calculation is a crucial part of planning nonhuman animal experiments in basic medical research. The 3R principle intends to reduce the number of animals to a sufficient minimum. When planning experiments, one may consider the impact of less rigorous assumptions during sample size determination as it might result in a considerable reduction in the number of required animals. Sample size calculations conducted for 111 biometrical reports were repeated. The original effect size assumptions remained unchanged, but the basic properties (type 1 error 5%, two-sided hypothesis, 80% power) were varied. The analyses showed that a less rigorous assumption on the type 1 error level (one-sided 5% instead of two-sided 5%) was associated with a savings potential of 14% regarding the original number of required animals. Animal experiments are predominantly exploratory studies. In light of the demonstrated potential reduction in the numbers of required animals, researchers should discuss whether less rigorous assumptions during the process of sample size calculation may be reasonable for the purpose of optimizing the number of animals in experiments according to the 3R principle.
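
The one-sided versus two-sided comparison is easy to reproduce with the normal-approximation formula, sketched below for an assumed standardized effect size; the 14% figure above comes from the 111 real reports, so the savings printed here will differ.

```python
# Per-group sample size at one-sided vs two-sided 5% (normal approximation).
import math
from scipy.stats import norm

power, effect_size = 0.80, 0.8     # assumed standardized effect
z_beta = norm.ppf(power)

n_two = 2 * ((norm.ppf(0.975) + z_beta) / effect_size) ** 2
n_one = 2 * ((norm.ppf(0.95) + z_beta) / effect_size) ** 2
print(f"two-sided: {math.ceil(n_two)}/group, one-sided: {math.ceil(n_one)}/group "
      f"({100 * (1 - n_one / n_two):.0f}% fewer animals)")
```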

11.
The Cochran-Armitage trend test is commonly used as a genotype-based test for candidate gene association. Corresponding to each underlying genetic model there is a particular set of scores assigned to the genotypes that maximizes its power. When the variance of the test statistic is known, the formulas for approximate power and associated sample size are readily obtained. In practice, however, the variance of the test statistic needs to be estimated. We present formulas for the required sample size to achieve a prespecified power that account for the need to estimate the variance of the test statistic. When the underlying genetic model is unknown, one can incur a substantial loss of power when a test suitable for one mode of inheritance is used where another mode is the true one. Thus, tests having good power properties relative to the optimal tests for each model are useful. These tests are called efficiency robust, and we study two of them: the maximin efficiency robust test, a linear combination of the standardized optimal tests that has high efficiency; and the MAX test, the maximum of the standardized optimal tests. Simulation results on the robustness of these two tests indicate that the more computationally involved MAX test is preferable.
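
A sketch of the trend test with the three classical score sets and the resulting MAX statistic. The genotype counts are made up, and note that the MAX statistic requires a simulated or joint-asymptotic null distribution rather than the plain normal critical value.

```python
# Cochran-Armitage trend test (score-test form) under recessive, additive,
# and dominant scores, plus the MAX statistic over the three.
import numpy as np
from scipy.stats import norm

cases = np.array([120, 250, 130])      # genotype counts aa, Aa, AA in cases
controls = np.array([160, 240, 100])   # and in controls

def catt_z(r, s, scores):
    t = np.asarray(scores, float)
    n = r + s
    N, R = n.sum(), r.sum()
    u = np.sum(t * (r - R * n / N))
    var = (R / N) * (1 - R / N) * (np.sum(t**2 * n) - np.sum(t * n) ** 2 / N)
    return u / np.sqrt(var)

score_sets = {"recessive": (0, 0, 1), "additive": (0, 0.5, 1), "dominant": (0, 1, 1)}
z = {model: catt_z(cases, controls, s) for model, s in score_sets.items()}
for model, zval in z.items():
    print(f"{model:9s} Z = {zval:5.2f}, two-sided p = {2 * norm.sf(abs(zval)):.4f}")
print(f"MAX statistic = {max(abs(v) for v in z.values()):.2f}")
```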

12.
In a randomized clinical trial (RCT), noncompliance with an assigned treatment can occur due to serious side effects, while missing outcomes on patients may happen due to patients' withdrawal or loss to follow-up. To avoid the possible loss of power to detect a given risk difference (RD) of interest between two treatments, it is essential to incorporate the information on noncompliance and missing outcomes into the sample size calculation. Under the compound exclusion restriction model proposed elsewhere, we first derive the maximum likelihood estimator (MLE) of the RD among compliers between two treatments for an RCT with noncompliance and missing outcomes, and its asymptotic variance in closed form. Based on the MLE with the tanh^{-1}(x) transformation, we develop an asymptotic test procedure for testing equality of two treatment effects among compliers. We further derive a sample size calculation formula accounting for both noncompliance and missing outcomes for a desired power 1 - beta at a nominal alpha-level. To evaluate the performance of the test procedure and the accuracy of the sample size calculation formula, we employ Monte Carlo simulation to calculate the estimated Type I error and power of the proposed test procedure corresponding to the resulting sample size in a variety of situations. We find that both the test procedure and the sample size formula developed here perform well. Finally, we include a discussion on the effects of various parameters, including the proportion of compliers, the probability of non-missing outcomes, and the ratio of sample size allocation, on the minimum required sample size.
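
For intuition only, the sketch below applies a crude textbook-style inflation, dividing a complete-data sample size by the squared compliance proportion (effect dilution among noncompliers) and by the probability of observing the outcome. This heuristic is not the paper's MLE-based formula under the compound exclusion restriction; all rates are assumed values.

```python
# Naive sample size inflation for noncompliance and missing outcomes.
import math
from scipy.stats import norm

p1, p2 = 0.30, 0.45             # outcome probabilities; RD = 0.15 (assumed)
alpha, power = 0.05, 0.80
compliance, p_obs = 0.85, 0.90  # complier proportion, non-missing probability

z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
pbar = (p1 + p2) / 2
n_complete = (z**2 * 2 * pbar * (1 - pbar)) / (p2 - p1) ** 2  # per arm
n_adjusted = math.ceil(n_complete / (compliance**2 * p_obs))
print(f"complete-data n/arm: {math.ceil(n_complete)}, adjusted: {n_adjusted}")
```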

13.
We consider power and sample size calculation for diagnostic studies with normally distributed, multiple correlated test results. We derive test statistics and obtain power and sample size formulas. The methods are illustrated using an example comparing CT and PET scanners for detecting extra-hepatic disease in colorectal cancer.

14.
A key question in sexual selection is whether the ability of males to fertilize eggs under sperm competition exhibits heritable genetic variation. Addressing this question poses a significant problem, however, because a male's ability to win fertilizations ultimately depends on the competitive ability of rival males. Attempts to partition genetic variance in sperm competitiveness, as estimated from measures of fertilization success, must therefore account for stochastic effects due to the random sampling of rival sperm competitors. In this contribution, we suggest a practical solution to this problem. We advocate the use of simple cross-classified breeding designs for partitioning sources of genetic variance in sperm competitiveness and fertilization success and show how these designs can be used to avoid stochastic effects due to the random sampling of rival sperm competitors. We illustrate the utility of these approaches by simulating various scenarios for estimating genetic parameters in sperm competitiveness, and show that the probability of detecting additive genetic variance in this trait is restored when stochastic effects due to the random sampling of rival sperm competitors are controlled. Our findings have important implications for the study of the evolutionary maintenance of polyandry.

15.
Standard sample size calculation formulas for stepped wedge cluster randomized trials (SW-CRTs) assume that cluster sizes are equal. When cluster sizes vary substantially, ignoring this variation may lead to an under-powered study. We investigate the relative efficiency of a SW-CRT with varying cluster sizes to equal cluster sizes, and derive variance estimators for the intervention effect that account for this variation under a mixed effects model, a commonly used approach for analyzing data from cluster randomized trials. When cluster sizes vary, the power of a SW-CRT depends on the order in which clusters receive the intervention, which is determined through randomization. We first derive a variance formula that corresponds to any particular realization of the randomized sequence and propose efficient algorithms to identify upper and lower bounds of the power. We then obtain an “expected” power based on a first-order approximation to the variance formula, where the expectation is taken with respect to all possible randomization sequences. Finally, we provide a variance formula for more general settings where only the cluster size arithmetic mean and coefficient of variation, instead of exact cluster sizes, are known in the design stage. We evaluate our methods through simulations and illustrate that the average power of a SW-CRT decreases as the variation in cluster sizes increases, and the impact is largest when the number of clusters is small.
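
A sketch of the variance computation for one given randomization sequence under a Hussey-Hughes-type mixed model, working with cluster-period means whose covariance is sigma_a^2*J + (sigma_e^2/m_i)*I for a cluster of size m_i; the treatment-effect variance is the corresponding diagonal element of the inverse GLS information matrix. The sequence, cluster sizes, and variance components are assumed values.

```python
# GLS variance of the intervention effect in a cross-sectional stepped wedge
# with unequal cluster sizes, for a fixed randomization sequence.
import numpy as np

T = 5                                   # periods; clusters cross over at 1..4
cross_over = [1, 1, 2, 3, 4, 4]         # first treated period for each cluster
m = np.array([10, 40, 25, 25, 60, 15])  # unequal cluster sizes
sigma_a2, sigma_e2 = 0.02, 1.0          # between-cluster and residual variances

def var_theta(cross_over, m):
    A = np.zeros((T + 1, T + 1))        # accumulates X' V^-1 X
    for c, mi in zip(cross_over, m):
        # columns: T period indicators, then the treatment indicator
        X = np.hstack([np.eye(T), (np.arange(T) >= c).astype(float)[:, None]])
        V = sigma_a2 * np.ones((T, T)) + (sigma_e2 / mi) * np.eye(T)
        A += X.T @ np.linalg.solve(V, X)
    return np.linalg.inv(A)[-1, -1]     # Var(theta_hat)

v_unequal = var_theta(cross_over, m)
v_equal = var_theta(cross_over, np.full(len(m), m.mean()))
print(f"Var(theta): unequal sizes {v_unequal:.4f} vs equal sizes {v_equal:.4f}")
```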

16.
It is crucial for researchers to optimize RNA-Seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including area under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more effective than increasing sequencing depth, especially once the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNAs) yield lower power than protein-coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.
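
A single-gene Monte Carlo power sketch under the negative binomial model; a t-test on log-transformed counts serves as a crude stand-in for the surveyed packages, and the mean, dispersion, and fold change are assumed values.

```python
# Power vs sample size for one gene with negative binomial counts.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(11)
mu, dispersion, fold_change, alpha = 100, 0.2, 2.0, 0.05

def nb_counts(mean, disp, size):
    # numpy parameterization (n, p): n = 1/disp, p = n / (n + mean)
    n = 1.0 / disp
    return rng.negative_binomial(n, n / (n + mean), size)

def power(n_per_group, n_sims=2000):
    hits = 0
    for _ in range(n_sims):
        a = np.log1p(nb_counts(mu, dispersion, n_per_group))
        b = np.log1p(nb_counts(mu * fold_change, dispersion, n_per_group))
        hits += ttest_ind(a, b).pvalue < alpha
    return hits / n_sims

for n in (3, 5, 10):
    print(f"n = {n:2d} per group -> power: {power(n):.2f}")
```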

17.
In the management of most chronic conditions characterized by the lack of universally effective treatments, adaptive treatment strategies (ATSs) have grown in popularity as they offer a more individualized approach. As a result, sequential multiple assignment randomized trials (SMARTs) have gained attention as the most suitable clinical trial design to formalize the study of these strategies. While the number of SMARTs has increased in recent years, sample size and design considerations have generally been carried out in frequentist settings. However, standard frequentist formulae require assumptions on interim response rates and variance components. Misspecifying these can lead to incorrect sample size calculations and correspondingly inadequate levels of power. The Bayesian framework offers a straightforward path to alleviate some of these concerns. In this paper, we provide calculations in a Bayesian setting to allow more realistic and robust estimates that account for uncertainty in inputs through the ‘two priors’ approach. Additionally, compared to the standard frequentist formulae, this methodology allows us to rely on fewer assumptions, integrate pre-trial knowledge, and switch the focus from the standardized effect size to the MDD. The proposed methodology is evaluated in a thorough simulation study and is implemented to estimate the sample size for a full-scale SMART of an internet-based adaptive stress management intervention on cardiovascular disease patients using data from its pilot study conducted in two Canadian provinces.
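
The 'two priors' idea can be sketched as follows: a design prior captures uncertainty about the true response rates, a vague analysis prior is used at the end of each simulated trial, and the resulting "power" is an assurance averaged over the design prior. A simple two-arm binary-outcome trial stands in for a full SMART here; all priors, rates, and thresholds are assumptions.

```python
# Bayesian assurance via a design prior (simulation) and an analysis prior
# (posterior comparison of two arms).
import numpy as np

rng = np.random.default_rng(13)

def assurance(n_per_arm, n_sims=2000, n_post=1000):
    success = 0
    for _ in range(n_sims):
        p_ctrl = rng.beta(20, 30)   # design prior, centred near 0.40
        p_trt = rng.beta(27, 27)    # design prior, centred near 0.50
        y_c = rng.binomial(n_per_arm, p_ctrl)
        y_t = rng.binomial(n_per_arm, p_trt)
        # Analysis prior Beta(1, 1); posterior draws per arm
        post_c = rng.beta(1 + y_c, 1 + n_per_arm - y_c, n_post)
        post_t = rng.beta(1 + y_t, 1 + n_per_arm - y_t, n_post)
        if np.mean(post_t > post_c) > 0.975:   # decision threshold (assumed)
            success += 1
    return success / n_sims

for n in (100, 200, 400):
    print(f"n = {n} per arm -> assurance: {assurance(n):.2f}")
```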

18.
It is well known that, for estimating a linear treatment effect with constant variance, the optimal design divides the units equally between the two extremes of the design space. If the dose-response relation may be nonlinear, however, intermediate measurements may be useful in order to estimate the effects of partial treatments. We consider the decision of whether to gather data at an intermediate design point: do the gains from learning about nonlinearity outweigh the loss in efficiency in estimating the linear effect? Under reasonable assumptions about nonlinearity, we find that, unless sample size is very large, the design with no interior measurements is best, because with moderate total sample sizes, any nonlinearity in the dose-response will be difficult to detect. We discuss in the context of a simplified version of the problem that motivated this work: a study of pest-control treatments intended to reduce asthma symptoms in children.
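
The efficiency loss is immediate from the ordinary least squares slope variance, sigma^2 / sum((x - xbar)^2), sketched below for an assumed allocation that moves a third of the units to the midpoint.

```python
# Slope variance for an extremes-only design vs one with a midpoint arm.
import numpy as np

def slope_variance(x):
    return 1.0 / np.sum((x - x.mean()) ** 2)   # sigma^2 = 1

n = 30
extremes_only = np.repeat([0.0, 1.0], n // 2)
with_midpoint = np.repeat([0.0, 0.5, 1.0], n // 3)

v0, v1 = slope_variance(extremes_only), slope_variance(with_midpoint)
print(f"extremes only: {v0:.4f}; with midpoint: {v1:.4f} "
      f"({100 * (v1 / v0 - 1):.0f}% larger), the price paid to probe nonlinearity")
```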

19.
Propensity-score matching is frequently used in the medical literature to reduce or eliminate the effect of treatment selection bias when estimating the effect of treatments or exposures on outcomes using observational data. In propensity-score matching, pairs of treated and untreated subjects with similar propensity scores are formed. Recent systematic reviews of the use of propensity-score matching found that the large majority of researchers ignore the matched nature of the propensity-score matched sample when estimating the statistical significance of the treatment effect. We conducted a series of Monte Carlo simulations to examine the impact of ignoring the matched nature of the propensity-score matched sample on Type I error rates, coverage of confidence intervals, and variance estimation of the treatment effect. We examined estimating differences in means, relative risks, odds ratios, rate ratios from Poisson models, and hazard ratios from Cox regression models. We demonstrated that accounting for the matched nature of the propensity-score matched sample tended to result in type I error rates that were closer to the advertised level compared to when matching was not incorporated into the analyses. Similarly, accounting for the matched nature of the sample tended to result in confidence intervals with coverage rates that were closer to the nominal level, compared to when matching was not taken into account. Finally, accounting for the matched nature of the sample resulted in estimates of standard error that more closely reflected the sampling variability of the treatment effect compared to when matching was not taken into account.
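
A small simulation, sketched below, illustrates the point for a difference in means: with positively correlated outcomes within matched pairs, the paired test's type I error sits near the nominal level, while the independent-samples test misstates the standard error. The correlation and pair counts are assumptions.

```python
# Type I error of paired vs independent-samples tests on correlated pairs.
import numpy as np
from scipy.stats import ttest_rel, ttest_ind

rng = np.random.default_rng(17)
n_pairs, rho, n_sims = 200, 0.4, 5000
cov = [[1, rho], [rho, 1]]

paired_rej = indep_rej = 0
for _ in range(n_sims):
    y = rng.multivariate_normal([0, 0], cov, size=n_pairs)  # null: no effect
    paired_rej += ttest_rel(y[:, 0], y[:, 1]).pvalue < 0.05
    indep_rej += ttest_ind(y[:, 0], y[:, 1]).pvalue < 0.05

print(f"type I error -> paired: {paired_rej / n_sims:.3f}, "
      f"independent: {indep_rej / n_sims:.3f} (nominal 0.05)")
```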

20.
Regulatory authorities require that the sample size of a confirmatory trial is calculated prior to the start of the trial. However, the sample size quite often depends on parameters that might not be known in advance of the study. Misspecification of these parameters can lead to under- or overestimation of the sample size. Both situations are unfavourable, as the former decreases the power and the latter leads to a waste of resources. Hence, designs have been suggested that allow a re-assessment of the sample size in an ongoing trial. These methods usually focus on estimating the variance. However, for some methods the performance depends not only on the variance but also on the correlation between measurements. We develop and compare different methods for blinded estimation of the correlation coefficient that are less likely to introduce operational bias when the blinding is maintained. Their performance with respect to bias and standard error is compared to that of the unblinded estimator. We simulated two different settings: one assuming that all group means are the same, and one assuming that different groups have different means. Simulation results show that the naïve (one-sample) estimator is only slightly biased and has a standard error comparable to that of the unblinded estimator. However, if the group means differ, other estimators have better performance, depending on the sample size per group and the number of groups.
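
A sketch contrasting the naïve (one-sample) blinded estimator with the unblinded within-group estimator of the correlation between two repeated measurements: when group means differ, pooling across groups distorts the estimate. All parameter values are assumptions.

```python
# Blinded (pooled) vs unblinded (within-group) correlation estimation.
import numpy as np

rng = np.random.default_rng(19)
rho, n_per_group, shift = 0.6, 100, 1.5
cov = [[1, rho], [rho, 1]]

g1 = rng.multivariate_normal([0, 0], cov, size=n_per_group)
g2 = rng.multivariate_normal([shift, shift], cov, size=n_per_group)

pooled = np.vstack([g1, g2])
r_blinded = np.corrcoef(pooled[:, 0], pooled[:, 1])[0, 1]    # ignores groups
r_unblinded = np.mean([np.corrcoef(g[:, 0], g[:, 1])[0, 1] for g in (g1, g2)])
print(f"blinded one-sample r: {r_blinded:.2f}, "
      f"unblinded within-group r: {r_unblinded:.2f} (true rho = {rho})")
```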
