Related Articles
20 similar articles found.
1.
Several independent clinical trials are usually conducted to demonstrate and support the evidence of the efficacy of a new drug. When not all the trials demonstrate a treatment effect because of a lack of statistically significant findings, the sponsor sometimes conducts a post hoc pooled test and uses the pooled result as extra statistical evidence. In this paper, we study the extent of type I error rate inflation with the post hoc pooled analysis and the power of the interaction test in assessing the homogeneity of the trials with respect to treatment effect size. We also compare the power of several test procedures with or without a pooled test involved and discuss the appropriateness of pooled tests under different alternative hypotheses.
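A quick Monte Carlo sketch (not from the paper; all settings hypothetical) of the inflation described above: under the two-trial paradigm, success nominally requires both trials to win on their own, and adding a post hoc pooled test as a fallback path raises the false positive rate far above that nominal level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha_one_sided = 0.025
crit = stats.norm.ppf(1 - alpha_one_sided)  # about 1.96

def claim_success(z1, z2):
    # Two-trial paradigm: each trial must win on its own (one-sided).
    both_win = (z1 > crit) and (z2 > crit)
    # Post hoc fallback: pooled z over the two equal-size trials.
    pooled_win = (z1 + z2) / np.sqrt(2) > crit
    return both_win or pooled_win

reps = 20_000
z = rng.standard_normal((reps, 2))  # per-trial z-statistics under H0
rate = np.mean([claim_success(a, b) for a, b in z])
print(f"false positive rate with pooled fallback: {rate:.4f}")
# the two-trial paradigm alone has rate 0.025**2 = 0.000625
```

The pooled path alone rejects 2.5% of the time under the null, so the fallback dominates the nominal two-trial error rate by roughly a factor of 40.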

2.
A sequential multiple assignment randomized trial (SMART) facilitates the comparison of multiple adaptive treatment strategies (ATSs) simultaneously. Previous studies have established a framework to test the homogeneity of multiple ATSs by a global Wald test through inverse probability weighting. SMARTs are generally lengthier than classical clinical trials due to the sequential nature of treatment randomization in multiple stages. Thus, it would be beneficial to add interim analyses allowing for an early stop if overwhelming efficacy is observed. We introduce group sequential methods to SMARTs to facilitate interim monitoring based on the multivariate chi-square distribution. Simulation studies demonstrate that the proposed interim monitoring in SMART (IM-SMART) maintains the desired type I error and power with reduced expected sample size compared to the classical SMART. Finally, we illustrate our method by reanalyzing a SMART assessing the effects of cognitive behavioral and physical therapies in patients with knee osteoarthritis and comorbid subsyndromal depressive symptoms.

3.
Modification of sample size in group sequential clinical trials
Cui L, Hung HM, Wang SJ. Biometrics 1999; 55(3): 853-857
In group sequential clinical trials, sample size reestimation can be a complicated issue when it allows the change of sample size to be influenced by an observed sample path. Our simulation studies show that increasing the sample size based on an interim estimate of the treatment difference can substantially inflate the probability of type I error in most practical situations. A new group sequential test procedure is developed by modifying the weights used in the traditional repeated significance two-sample mean test. The new test preserves the type I error probability at the target level and can provide a substantial gain in power with the increase of sample size. Generalization of the new procedure is discussed.
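The weight modification the abstract describes can be illustrated as follows; this is a sketch of the weighted two-stage statistic commonly associated with this paper, with all sample sizes hypothetical.

```python
import numpy as np

def chw_statistic(z_stage1, z_stage2, n1, n2_planned):
    """Weighted two-stage z-statistic with weights fixed by the ORIGINAL
    design (n1 first-stage, n2_planned second-stage subjects), so the null
    distribution stays N(0, 1) even if the actual second-stage sample size
    was increased after the interim look; z_stage2 is computed from the
    second-stage data alone."""
    total = n1 + n2_planned
    w1, w2 = np.sqrt(n1 / total), np.sqrt(n2_planned / total)
    # w1**2 + w2**2 == 1, so the combination of two independent N(0,1)
    # stage statistics is again N(0,1) under H0.
    return w1 * z_stage1 + w2 * z_stage2

# Example: equally sized planned stages, both stage statistics equal to 1.
print(chw_statistic(1.0, 1.0, 50, 50))  # sqrt(2) = 1.4142...
```

Because the weights ignore the realized second-stage size, the type I error is controlled at the target level regardless of the data-driven sample size increase, which is the key property claimed in the abstract.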

4.
When analyzing clinical trials with a stratified population, homogeneity of treatment effects is a common assumption in survival analysis. However, in the context of recent developments in clinical trial design, which aim to test multiple targeted therapies in corresponding subpopulations simultaneously, the assumption that there is no treatment-by-stratum interaction seems inappropriate. It becomes an issue if the expected sample size of the strata makes it infeasible to analyze the trial arms individually. Alternatively, one might choose as the primary aim to prove efficacy of the overall (targeted) treatment strategy. When testing for the overall treatment effect, a violation of the no-interaction assumption renders it necessary to deviate from standard methods that rely on this assumption. We investigate the performance of different methods for sample size calculation and data analysis under heterogeneous treatment effects. The commonly used sample size formula by Schoenfeld is compared to another formula by Lachin and Foulkes, and to an extension of Schoenfeld's formula allowing for stratification. Beyond the widely used (stratified) Cox model, we explore the lognormal shared frailty model and a two-step analysis approach as potential alternatives that attempt to adjust for interstrata heterogeneity. We carry out a simulation study for a trial with three strata and violations of the no-interaction assumption. The extension of Schoenfeld's formula to heterogeneous strata effects provides the most reliable sample size with respect to desired versus actual power. The two-step analysis and frailty model prove to be more robust against loss of power caused by heterogeneous treatment effects than the stratified Cox model and should be preferred in such situations.
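For reference, Schoenfeld's formula mentioned above can be sketched as follows (unstratified version; the stratified extension studied in the paper is not reproduced here).

```python
import math
from scipy import stats

def schoenfeld_events(hr, alpha=0.05, power=0.80, prop_experimental=0.5):
    """Required number of events for a two-arm log-rank test (Schoenfeld).
    Divide by the overall event probability to get the number of patients."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    p = prop_experimental  # allocation proportion to the experimental arm
    return (z_a + z_b) ** 2 / (p * (1 - p) * math.log(hr) ** 2)

d = schoenfeld_events(hr=0.7)
print(f"events needed for HR = 0.7: {math.ceil(d)}")  # 247
```

Note that the formula returns events, not patients; under heterogeneous strata effects, the paper's point is that plugging a single overall hazard ratio into this formula can misstate the required size.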

5.
Nam JM. Biometrics 2003; 59(4): 1027-1035
When the intraclass correlation coefficient, or the equivalent version of the kappa agreement coefficient, has been estimated from several independent studies or from a stratified study, we face the problem of comparing the kappa statistics and of combining the information into a common kappa when the assumption of homogeneity of kappa coefficients holds. In this article, using the likelihood score theory extended to nuisance parameters (Tarone, 1988, Communications in Statistics - Theory and Methods 17(5), 1549-1556), we present an efficient homogeneity test for comparing several independent kappa statistics and also give a modified homogeneity score method using a noniterative and consistent estimator as an alternative. We provide the sample size using the modified homogeneity score method and compare it with that using the goodness-of-fit (GOF) method (Donner, Eliasziw, and Klar, 1996, Biometrics 52, 176-183). A simulation study for small and moderate sample sizes showed that the actual level of the homogeneity score test using the maximum likelihood estimators (MLEs) of the parameters is satisfactorily close to the nominal level and smaller than those of the modified homogeneity score and goodness-of-fit tests. We investigated the statistical properties of several noniterative estimators of a common kappa. The estimator of Donner et al. (1996) is essentially efficient and can be used as an alternative to the iterative MLE. An efficient interval estimation of a common kappa using the likelihood score method is presented.

6.
This paper explores the extent to which application of statistical stopping rules in clinical trials can create an artificial heterogeneity of treatment effects in overviews (meta-analyses) of related trials. For illustration, we concentrate on overviews of identically designed group sequential trials, using either fixed nominal or O'Brien and Fleming two-sided boundaries. Some analytic results are obtained for two-group designs and simulation studies are otherwise used, with the following overall findings. The use of stopping rules leads to biased estimates of treatment effect so that the assessment of heterogeneity of results in an overview of trials, some of which have used stopping rules, is confounded by this bias. If the true treatment effect being studied is small, as is often the case, then artificial heterogeneity is introduced, thus increasing the Type I error rate in the test of homogeneity. This could lead to erroneous use of a random effects model, producing exaggerated estimates and confidence intervals. However, if the true mean effect is large, then between-trial heterogeneity may be underestimated. When undertaking or interpreting overviews, one should ascertain whether stopping rules have been used (either formally or informally) and should consider whether their use might account for any heterogeneity found.
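The bias mechanism described above is easy to reproduce in a small simulation (illustrative settings, not the paper's designs): estimates reported at a data-driven stopping time are selected by the boundary and hence biased upward.

```python
import numpy as np

rng = np.random.default_rng(1)

def trial_estimate(true_delta=0.1, n_per_look=50, looks=4, z_stop=2.0):
    """One trial with a fixed nominal upper boundary at every look: stop
    the first time the cumulative z exceeds z_stop and report the estimate
    at the stopping time (unit-variance outcomes, hypothetical settings)."""
    x = rng.normal(true_delta, 1.0, n_per_look * looks)
    for k in range(1, looks + 1):
        est = x[: k * n_per_look].mean()
        if est * np.sqrt(k * n_per_look) > z_stop:
            return est, k < looks  # estimate, stopped-before-final flag
    return est, False  # never crossed: report the final-look estimate

results = [trial_estimate() for _ in range(5000)]
ests = np.array([e for e, _ in results])
early = np.array([e for e, stopped in results if stopped])
print(f"overall mean estimate: {ests.mean():.3f} (true effect 0.100)")
print(f"mean estimate among early-stopped trials: {early.mean():.3f}")
```

Trials that stop early necessarily report estimates above the boundary, so an overview that pools such trials with fixed-size trials sees spurious between-trial spread, as the abstract warns.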

7.
Lui KJ, Kelly C. Biometrics 2000; 56(1): 309-315
Lipsitz et al. (1998, Biometrics 54, 148-160) discussed testing the homogeneity of the risk difference for a series of 2 x 2 tables. They proposed and evaluated several weighted test statistics, including the commonly used weighted least squares test statistic. Here we suggest various important improvements on these test statistics. First, we propose using the one-sided analogues of the test procedures proposed by Lipsitz et al. because we should only reject the null hypothesis of homogeneity when the variation of the estimated risk differences between centers is large. Second, we generalize their study by redesigning the simulations to include the situations considered by Lipsitz et al. (1998) as special cases. Third, we consider a logarithmic transformation of the weighted least squares test statistic to improve the normal approximation of its sampling distribution. On the basis of Monte Carlo simulations, we note that, as long as the mean treatment group size per table is moderate or large (≥ 16), this simple test statistic, in conjunction with the commonly used adjustment procedure for sparse data, can be useful when the number of 2 x 2 tables is small or moderate (≤ 32). In these situations, in fact, we find that our proposed method generally outperforms all the statistics considered by Lipsitz et al. Finally, we include a general guideline about which test statistic should be used in a variety of situations.

8.
Cheng Y, Shen Y. Biometrics 2004; 60(4): 910-918
For confirmatory trials in regulatory decision making, it is important that adaptive designs under consideration provide inference at the correct nominal level, as well as unbiased estimates and confidence intervals for the treatment comparisons in the actual trials. However, the naive point estimate and its confidence interval are often biased in adaptive sequential designs. We develop a new procedure for estimation following a test from a sample size reestimation design. The method for obtaining an exact confidence interval and point estimate is based on a general distribution property of a pivot function of the self-designing group sequential clinical trial of Shen and Fisher (1999, Biometrics 55, 190-197). A modified estimate is proposed to explicitly account for a futility stopping boundary, with reduced bias when block sizes are small. The proposed estimates are shown to be consistent. The computation of the estimates is straightforward. We also provide a modified weight function to improve the power of the test. Extensive simulation studies show that the exact confidence intervals have accurate nominal coverage probability, and the proposed point estimates are nearly unbiased with practical sample sizes.

9.
Feuer EJ, Kessler LG. Biometrics 1989; 45(2): 629-636
McNemar's (1947, Psychometrika 12, 153-157) test of marginal homogeneity is generalized to a two-sample situation where the hypothesis of interest is that the marginal changes in each of two independently sampled tables are equal. This situation is especially applicable to two cohorts (a control and an intervention cohort), each measured at baseline and after the intervention on a binary outcome variable. Some assumptions that are often realistic in this situation simplify the calculation of sample size. The calculation of sample size in a study designed to increase utilization of breast cancer screening is demonstrated.
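A minimal sketch of the two-sample setting (hypothetical counts; the paper's simplifying assumptions for sample size are not reproduced): each cohort's marginal change is driven by its discordant cells, and the two changes are compared with a Wald-type z-test.

```python
import math
from scipy import stats

def marginal_change(b, c, n):
    """Change in the marginal proportion for one cohort's paired 2x2 table,
    where b and c are the two discordant cell counts, with its variance."""
    d = (b - c) / n
    v = (b + c - (b - c) ** 2 / n) / n ** 2
    return d, v

def two_sample_mcnemar(b1, c1, n1, b2, c2, n2):
    """Z-test that the marginal changes in two independent cohorts are
    equal (a sketch of the setting only; counts are hypothetical)."""
    d1, v1 = marginal_change(b1, c1, n1)
    d2, v2 = marginal_change(b2, c2, n2)
    z = (d1 - d2) / math.sqrt(v1 + v2)
    return z, 2 * stats.norm.sf(abs(z))

# Intervention cohort shifts (30 vs 10 discordant); control does not.
z, p = two_sample_mcnemar(30, 10, 100, 10, 10, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```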

10.
It is suggested to test against anisoasymmetry rather than against marginal homogeneity if (1) a pre-post treatment matched-pairs contingency table is ordinally scaled by clinical ratings and if (2) the alternative to marginal homogeneity is a shift in location from pre- to post-treatment ratings. Testing against anisoasymmetry may detect treatment effects that are not manifest as marginal inhomogeneities.

11.
Clinical trials with Poisson distributed count data as the primary outcome are common in various medical areas such as relapse counts in multiple sclerosis trials or the number of attacks in trials for the treatment of migraine. In this article, we present approximate sample size formulae for testing noninferiority using asymptotic tests which are based on restricted or unrestricted maximum likelihood estimators of the Poisson rates. The Poisson outcomes are allowed to be observed for unequal follow-up schemes, and both the situations that the noninferiority margin is expressed in terms of the difference and the ratio are considered. The exact type I error rates and powers of these tests are evaluated and the accuracy of the approximate sample size formulae is examined. The test statistic using the restricted maximum likelihood estimators (for the difference test problem) and the test statistic that is based on the logarithmic transformation and employs the maximum likelihood estimators (for the ratio test problem) show favorable type I error control and can be recommended for practical application. The approximate sample size formulae show high accuracy even for small sample sizes and provide power values identical or close to the aspired ones. The methods are illustrated by a clinical trial example from anesthesia.
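As a rough illustration of the rate difference case, here is a generic Wald-type sample size approximation under equal allocation. This is an assumption-laden sketch using the unrestricted variance, not the restricted-MLE formula the paper recommends; all parameter values are hypothetical.

```python
import math
from scipy import stats

def n_per_group_poisson_ni(lam_e, lam_c, margin, t_e=1.0, t_c=1.0,
                           alpha=0.025, power=0.80):
    """Per-group sample size for non-inferiority on the rate difference,
    H0: lam_e - lam_c <= -margin, using a simple unrestricted Wald
    variance evaluated at the alternative (a generic approximation only).
    t_e and t_c are per-subject follow-up times."""
    z_a = stats.norm.ppf(1 - alpha)
    z_b = stats.norm.ppf(power)
    var_per_subject = lam_e / t_e + lam_c / t_c
    effect = lam_e - lam_c + margin  # distance from the H0 boundary
    return math.ceil((z_a + z_b) ** 2 * var_per_subject / effect ** 2)

print(n_per_group_poisson_ni(lam_e=2.0, lam_c=2.0, margin=0.5))  # 126
```

Unequal follow-up enters only through the per-subject variance term here; the paper's formulae handle this more carefully for both the difference and ratio margins.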

12.
Bayesian design and analysis of active control clinical trials
Simon R. Biometrics 1999; 55(2): 484-487
We consider the design and analysis of active control clinical trials, i.e., clinical trials comparing an experimental treatment E to a control treatment C considered to be effective. Direct comparison of E to placebo P, or to no treatment, is sometimes ethically unacceptable. Much discussion of the design and analysis of such clinical trials has focused on whether the comparison of E to C should be based on a test of the null hypothesis of equivalence, on a test of a nonnull hypothesis that the difference is of some minimally medically important size delta, or on one- or two-sided confidence intervals. These approaches are essentially the same for study planning. They all suffer from arbitrariness in specifying the size of the difference delta that must be excluded. We propose an alternative Bayesian approach to the design and analysis of active control trials. We derive the posterior probability that E is superior to P, or that E is at least k% as good as C and C is more effective than P. We also derive approximations for use with logistic and proportional hazards models. Selection of prior distributions is discussed, and results are illustrated using data from an active control trial of a drug for the treatment of unstable angina.

13.
Li Z, Murphy SA. Biometrika 2011; 98(3): 503-518
Two-stage randomized trials are growing in importance in developing adaptive treatment strategies, i.e., treatment policies or dynamic treatment regimes. Usually, the first stage involves randomization to one of several initial treatments. The second stage of treatment begins when an early nonresponse or response criterion is met, and nonresponding subjects are re-randomized among the second-stage treatments. Sample size calculations for planning these two-stage randomized trials with failure time outcomes are challenging because the variances of common test statistics depend in a complex manner on the joint distribution of the time to the early nonresponse or response criterion and the primary failure time outcome. We produce simple, albeit conservative, sample size formulae by using upper bounds on the variances. The resulting formulae require only the working assumptions needed to size a standard single-stage randomized trial and, in common settings, are only mildly conservative. These sample size formulae are based on either a weighted Kaplan-Meier estimator of survival probabilities at a fixed time point or a weighted version of the log-rank test.

14.
A FORTRAN program is provided for testing linear trend and homogeneity in proportions. Trend is evaluated by the Cochran-Armitage method, and homogeneity is tested by an overall chi-square test as well as by multiple pairwise comparisons using the Fisher-Irwin exact method. The program should be easy to implement on any size of computer with a FORTRAN compiler.
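The trend part of the program can be sketched in Python as follows (a standard Cochran-Armitage implementation with default equally spaced scores; the FORTRAN source itself is not reproduced here).

```python
import math
from scipy import stats

def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage z-test for linear trend in proportions across
    ordered groups; default scores are the equally spaced 0, 1, 2, ..."""
    scores = scores if scores is not None else list(range(len(events)))
    n_total = sum(totals)
    p_bar = sum(events) / n_total
    # Score-weighted deviation of observed events from their expectation.
    t = sum(s * (x - n * p_bar) for s, x, n in zip(scores, events, totals))
    s_bar = sum(s * n for s, n in zip(scores, totals)) / n_total
    var = p_bar * (1 - p_bar) * sum(
        n * (s - s_bar) ** 2 for s, n in zip(scores, totals))
    z = t / math.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))  # two-sided p-value

# Hypothetical dose groups with rising event proportions.
z, p = cochran_armitage([2, 10, 25], [50, 50, 50])
print(f"trend z = {z:.2f}, p = {p:.2e}")
```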

15.
We consider statistical testing for non-inferiority of a new treatment compared with the standard one under a matched-pair setting in a stratified study or in several trials. A non-inferiority test based on the efficient scores and a Mantel-Haenszel (M-H) like procedure with restricted maximum likelihood estimators (RMLEs) of nuisance parameters, together with their corresponding sample size formulae, are presented. We evaluate the above tests and the M-H type Wald test in terms of level and power. The stratified score test is conservative and provides the best power. The M-H like procedure with RMLEs gives an accurate level. However, the Wald test is anti-conservative, and we suggest caution when it is used. The unstratified score test is not biased, but it is less powerful than the stratified score test when the baseline probabilities of the strata differ. This investigation shows that the stratified score test possesses optimum statistical properties in testing non-inferiority. Since a common difference between two proportions across strata is the basic assumption of the stratified tests, we present appropriate tests to validate this assumption, along with related remarks.

16.
Young W, Farrow D, Pyne D, McGregor W, Handke T. Validity and reliability of agility tests in junior Australian football players. J Strength Cond Res 2011; 25(12): 3399-3403
The importance of sport-specific stimuli in reactive agility tests (RATs) compared to other agility tests is not known. The purpose of this research was to determine the validity and reliability of agility tests. Fifty junior Australian football players aged 15-17 years, members of either an elite junior squad (n = 35) or a secondary school team (n = 15), were assessed on a new RAT that involved a change-of-direction sprint in response to the movements of an attacking player projected life size on a screen. These players also underwent the planned Australian Football League agility test, and a subgroup (n = 13) underwent a test requiring a change of direction in response to a left or right arrow stimulus. The elite players were significantly better than the school group players on the RAT (2.81 ± 0.08 seconds vs. 3.07 ± 0.12 seconds; difference 8.5%) but not in the arrow stimulus test or the planned agility test. The data were log transformed and the reliability of the new RAT estimated using typical error (TE) expressed as a coefficient of variation. The TE for the RAT was 2.7% (2.0-4.3, 90% confidence interval) or 0.07 seconds (0.5-1.0), with an intraclass correlation coefficient (ICC) of 0.33. For the test using the arrow stimulus, the TE was 3.4% (2.4-6.2), 0.09 (0.06-0.15) seconds, and the ICC was 0.10. The sport-specific stimulus provided by the new RAT is a crucial component of an agility test; however, adoption of the new RAT for routine testing is likely to require more accessible equipment and several familiarization trials to improve its reliability.

17.
Rosner B, Glynn RJ. Biometrics 2011; 67(2): 646-653
The Wilcoxon rank sum test is widely used for two-group comparisons of nonnormal data. An assumption of this test is independence of sampling units both within and between groups, which is violated in clustered data settings such as ophthalmological clinical trials, where the unit of randomization is the subject but the unit of analysis is the individual eye. For this purpose, we previously proposed the clustered Wilcoxon test to account for clustering among multiple subunits within the same cluster (Rosner, Glynn, and Lee, 2003, Biometrics 59, 1089-1098; 2006, Biometrics 62, 1251-1259). However, power estimation is needed to plan studies that use this analytic approach. We have recently published methods for estimating power and sample size for the ordinary Wilcoxon rank sum test (Rosner and Glynn, 2009, Biometrics 65, 188-197). In this article we present extensions of this approach to estimate power for the clustered Wilcoxon test. Simulation studies show good agreement between estimated and empirical power. These methods are illustrated with examples from randomized trials in ophthalmology. Enhanced power is achieved by using the subunit as the unit of analysis instead of the cluster with the ordinary Wilcoxon rank sum test.

18.
Clinical trials are typically designed with an aim to reach sufficient power to test a hypothesis about the relative effectiveness of two or more interventions. Their role in informing evidence-based decision-making demands, however, that they be considered in the context of the existing evidence. Consequently, their planning can be informed by characteristics of relevant systematic reviews and meta-analyses. In the presence of multiple competing interventions, the evidence base has the form of a network of trials, which provides information not only about the required sample size but also about the interventions that should be compared in a future trial. In this paper we present a methodology to evaluate the impact of new studies, their information size, the comparisons involved, and the anticipated heterogeneity on the conditional power (CP) of the updated network meta-analysis. The methods presented are an extension of the idea of CP initially suggested for pairwise meta-analysis, and we show how to estimate the required sample size using various combinations of direct and indirect evidence in future trials. We apply the methods to two previously published networks and show that the CP for a treatment comparison depends on the magnitude of heterogeneity and the ratio of direct to indirect information in existing and future trials for that comparison. Our methodology can help investigators calculate the required sample size under different assumptions about heterogeneity and make decisions about the number and design of future studies (set of treatments compared).

19.
This paper investigates homogeneity tests of rate ratios in stratified matched-pair studies on the basis of asymptotic and bootstrap-resampling methods. Based on the efficient score approach, we develop a simple and computationally tractable score test statistic. Several other homogeneity test statistics are also proposed on the basis of the weighted least-squares estimate and a logarithmic transformation. Sample size formulae are derived to guarantee a pre-specified power for the proposed tests at a given significance level. Empirical results confirm that (i) the modified score statistic based on the bootstrap-resampling method performs better, in the sense that its empirical type I error rate is much closer to the pre-specified nominal level than those of the other tests and its power is greater, and it is hence recommended, whereas the statistics based on the weighted least-squares estimate and logarithmic transformation are slightly conservative under some of the considered settings; and (ii) the derived sample size formulae are rather accurate, in the sense that the empirical powers obtained from the estimated sample sizes are very close to the pre-specified nominal powers. A real example is used to illustrate the proposed methodologies.

20.
Brown RP. Genetica 1997; 101(1): 67-74
Heterogeneous phenotypic correlations may be suggestive of underlying changes in genetic covariance among life-history, morphological, and behavioural traits, and their detection is therefore relevant to many biological studies. Two new statistical tests are proposed and their performance is compared with existing methods. Of all the tests considered, the existing approximate test of homogeneity of product-moment correlations provides the greatest power to detect heterogeneous correlations when based on Hotelling's z*-transformation. The use of this transformation and test is recommended under conditions of bivariate normality. A new distribution-free randomisation test of homogeneity of Spearman's rank correlations is described and recommended for use when the bivariate samples are taken from populations with non-normal or unknown distributions. An alternative randomisation test of homogeneity of product-moment correlations is shown to be a useful compromise between the approximate tests and the randomisation tests on Spearman's rank correlations: it is not as sensitive to departures from normality as the approximate tests, but has greater power than the rank correlation test. An example is provided that shows how the choice of test can have a considerable influence on the conclusions of a particular study.
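The classical approximate test referred to above can be sketched as follows, using the basic Fisher z-transformation rather than Hotelling's z* refinement.

```python
import math
from scipy import stats

def corr_homogeneity(rs, ns):
    """Approximate chi-square homogeneity test for k independent
    product-moment correlations via Fisher's z-transformation (basic
    version; Hotelling's z* refinement is not applied here)."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]  # 1 / Var(z_i) for sample size n_i
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    chi2 = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return chi2, stats.chi2.sf(chi2, df=len(rs) - 1)

# Two hypothetical samples with clearly different correlations.
chi2, p = corr_homogeneity([0.1, 0.8], [50, 50])
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The randomisation alternatives the abstract recommends for non-normal data replace this chi-square reference distribution with a permutation distribution of the same statistic.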
