Similar Documents
Retrieved 20 similar records.
1.
Switching between testing for superiority and non-inferiority has been an important statistical issue in the design and analysis of active controlled clinical trials. In practice, it is often conducted with a two-stage testing procedure. It has been assumed that no type I error rate adjustment is required when either switching to test for non-inferiority once the data fail to support the superiority claim, or switching to test for superiority once the null hypothesis of non-inferiority is rejected with a pre-specified non-inferiority margin in a generalized historical control approach. However, when a cross-trial comparison approach is used for non-inferiority testing, controlling the type I error rate can become an issue with the conventional two-stage procedure. We propose to adopt the single-stage simultaneous testing concept of Ng (2003) to test the non-inferiority and superiority hypotheses simultaneously. The proposed procedure is based on Fieller's confidence interval procedure as proposed by Hauschke et al. (1999).
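To illustrate the single-stage idea of reading one confidence interval against both the non-inferiority margin and zero, here is a minimal sketch using a generic Wald-type interval for a treatment difference. It does not implement the Fieller ratio-based interval of Hauschke et al. (1999), and all numbers are hypothetical.

```python
import numpy as np
from scipy import stats

def simultaneous_ni_superiority(diff_hat, se, margin, alpha=0.05):
    """Classify the result from a single (1 - alpha) two-sided CI for (test - control).

    One interval is read against both the non-inferiority margin and zero,
    so no multiplicity adjustment is needed for the switch between claims.
    """
    z = stats.norm.ppf(1 - alpha / 2)
    lower = diff_hat - z * se
    if lower > 0:
        return "superiority demonstrated (and hence non-inferiority)"
    if lower > -margin:
        return "non-inferiority demonstrated"
    return "neither demonstrated"

# hypothetical estimate, standard error, and margin
print(simultaneous_ni_superiority(diff_hat=0.03, se=0.025, margin=0.10))
```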

2.
Hung et al. (2007) considered the problem of controlling the type I error rate for a primary and secondary endpoint in a clinical trial using a gatekeeping approach in which the secondary endpoint is tested only if the primary endpoint crosses its monitoring boundary. They considered a two-look trial and showed by simulation that the naive method of testing the secondary endpoint at full level α at the time the primary endpoint reaches statistical significance does not control the familywise error rate at level α. Tamhane et al. (2010) derived analytic expressions for familywise error rate and power and confirmed the inflated error rate of the naive approach. Nonetheless, many people mistakenly believe that the closure principle can be used to prove that the naive procedure controls the familywise error rate. The purpose of this note is to explain in greater detail why there is a problem with the naive approach and show that the degree of alpha inflation can be as high as that of unadjusted monitoring of a single endpoint.
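A minimal Monte Carlo sketch of the naive two-look procedure described above: the primary endpoint is monitored with O'Brien-Fleming-type boundaries and the truly null secondary endpoint is tested at full α at whichever look the primary first crosses. The boundaries, drift, and correlation are illustrative assumptions, so the printed familywise error rate will vary with those choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim = 200_000
alpha = 0.025                        # one-sided familywise level
t1, t2 = 0.5, 1.0                    # information fractions at the two looks
b1, b2 = 2.963, 1.969                # O'Brien-Fleming-type boundaries for the primary
rho = 0.8                            # correlation between primary and secondary statistics
drift = 2.5                          # standardized drift of the primary at full information
c_naive = stats.norm.ppf(1 - alpha)  # secondary tested naively at full alpha

cov = np.array([[1.0, rho], [rho, 1.0]])
# Brownian-motion increments for the (primary, secondary) score processes
inc1 = rng.multivariate_normal([0, 0], t1 * cov, size=n_sim)
inc2 = rng.multivariate_normal([0, 0], (t2 - t1) * cov, size=n_sim)
B1 = inc1 + drift * t1 * np.array([1, 0])        # secondary has zero drift (its null is true)
B2 = inc1 + inc2 + drift * t2 * np.array([1, 0])
Z1, Z2 = B1 / np.sqrt(t1), B2 / np.sqrt(t2)

cross1 = Z1[:, 0] >= b1                          # primary crosses at look 1
cross2 = ~cross1 & (Z2[:, 0] >= b2)              # or at look 2
false_pos = (cross1 & (Z1[:, 1] >= c_naive)) | (cross2 & (Z2[:, 1] >= c_naive))
print(f"empirical FWER of the naive procedure: {false_pos.mean():.4f}")
```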

3.
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n/(V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n/(V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
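For reference, here is a minimal sketch of the classical Benjamini-Hochberg linear step-up procedure mentioned above as the standard comparator; it is not the resampling-based empirical Bayes procedure itself.

```python
import numpy as np

def bh_step_up(pvals, q=0.05):
    """Benjamini-Hochberg linear step-up procedure.

    Rejects the hypotheses with the k smallest p-values, where k is the
    largest index such that p_(k) <= k * q / m. Returns a boolean vector
    in the original order of the p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()       # largest step meeting the criterion
        reject[order[: k + 1]] = True
    return reject

# example with ten hypothetical p-values
print(bh_step_up([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.07, 0.2, 0.5, 0.9]))
```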

4.
Changing frequency of interim analysis in sequential monitoring
K. K. Lan, D. L. DeMets. Biometrics 1989, 45(3): 1017-1020.
In clinical trial data monitoring, one can either introduce a discrete sequential boundary for a set of specified decision times or adopt a use function and then derive the boundary when data are monitored. If the use function approach is employed, one can adjust the frequency of data monitoring as long as the decision is not data-dependent. However, if the frequency of future data monitoring is affected by the observed data, then the probability of Type I error will no longer be preserved exactly. But the effect on the significance level and power is very small, perhaps negligible, as indicated by simulation results.
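A minimal sketch of the use-function (alpha spending) idea for a two-look design, assuming an O'Brien-Fleming-type spending function and equally spaced information fractions; both choices are illustrative.

```python
import numpy as np
from scipy import stats, optimize

alpha = 0.025
t1, t2 = 0.5, 1.0                        # information fractions at the two looks

def obf_spending(t, a=alpha):
    """O'Brien-Fleming-type spending function: alpha*(t) = 2 - 2*Phi(z_{a/2} / sqrt(t))."""
    return 2 - 2 * stats.norm.cdf(stats.norm.ppf(1 - a / 2) / np.sqrt(t))

# first boundary: spend alpha*(t1) at look 1
b1 = stats.norm.ppf(1 - obf_spending(t1))

# second boundary: spend the remainder, using the joint distribution of
# (Z1, Z2), which is bivariate normal with correlation sqrt(t1 / t2)
rho = np.sqrt(t1 / t2)
joint = stats.multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])

def remaining_spend(b2):
    # P(Z1 < b1, Z2 >= b2) minus the increment of the spending function
    cont = stats.norm.cdf(b1) - joint.cdf([b1, b2])
    return cont - (obf_spending(t2) - obf_spending(t1))

b2 = optimize.brentq(remaining_spend, 0, 10)
print(f"boundaries: z1 >= {b1:.3f}, z2 >= {b2:.3f}")
```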

5.
A general multistage (stepwise) procedure is proposed for dealing with arbitrary gatekeeping problems including parallel and serial gatekeeping. The procedure is very simple to implement since it does not require the application of the closed testing principle and the consequent need to test all nonempty intersections of hypotheses. It is based on the idea of carrying forward the Type I error rate for any rejected hypotheses to test hypotheses in the next ordered family. This requires the use of a so-called separable multiple test procedure (MTP) in the earlier family. The Bonferroni MTP is separable, but other standard MTPs such as Holm, Hochberg, Fallback and Dunnett are not. Their truncated versions are proposed which are separable and more powerful than the Bonferroni MTP. The proposed procedure is illustrated by a clinical trial example.
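As one concrete example of a separable "truncated" MTP of the kind described above, here is a sketch of a truncated Holm step-down test with truncation parameter gamma (gamma = 1 recovers ordinary Holm, gamma = 0 recovers Bonferroni). The critical constants follow the usual truncated-Holm form; the carry-forward of unused error rate to the next family is not shown.

```python
import numpy as np

def truncated_holm(pvals, alpha=0.025, gamma=0.5):
    """Truncated Holm step-down test.

    The i-th smallest p-value is compared with
        c_i = gamma * alpha / (m - i + 1) + (1 - gamma) * alpha / m,
    so gamma = 1 gives ordinary Holm and gamma = 0 gives Bonferroni.
    Returns a boolean rejection vector in the original order.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(np.argsort(p), start=1):
        crit = gamma * alpha / (m - step + 1) + (1 - gamma) * alpha / m
        if p[idx] <= crit:
            reject[idx] = True
        else:
            break                        # step-down: stop at the first failure
    return reject

# example: three hypotheses in the first (gatekeeper) family
print(truncated_holm([0.004, 0.011, 0.200], alpha=0.025, gamma=0.5))
```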

6.
Binomial tests are commonly used in sensory difference and preference testing under the assumptions that choices are independent and choice probabilities do not vary from trial to trial. This paper addresses violations of the latter assumption (often referred to as overdispersion) and accounts for variation in inter-trial choice probabilities following the Beta distribution. Such variation could arise as a result of differences in test substrate from trial to trial, differences in sensory acuity among subjects or the existence of latent preference segments. In fact, it is likely that overdispersion occurs ubiquitously in product testing. Using the Binomial model for data in which there is inter-trial variation may lead to seriously misleading conclusions from a sensory difference or preference test. A simulation study in this paper based on product testing experience showed that when using a Binomial model for overdispersed Binomial data, Type I error may be 0.44 for a Binomial test specification corresponding to a level of 0.05. Underestimation of Type I error using the Binomial model may seriously undermine legal claims of product superiority in situations where overdispersion occurs. The Beta-Binomial (BB) model, an extension of the Binomial distribution, was developed to fit overdispersed Binomial data. Procedures for estimating and testing the parameters as well as testing for goodness of fit are discussed. Procedures for determining sample size and for calculating estimate precision and test power based on the BB model are given. Numerical examples and simulation results are also given in the paper. The BB model should improve the validity of sensory difference and preference testing.
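A minimal simulation sketch of the kind of inflation described above: overdispersed counts are generated from a Beta-Binomial model with no true preference and analyzed with an ordinary binomial test. The panel size, number of replications, and dispersion level are illustrative assumptions, so the printed error rate will not reproduce the 0.44 figure from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim, n_panelists, n_reps = 20_000, 50, 4    # hypothetical panel layout
mu, gamma = 0.5, 0.3                          # no true preference; gamma = inter-trial correlation
a = mu * (1 - gamma) / gamma                  # Beta(a, b) with mean mu and correlation gamma
b = (1 - mu) * (1 - gamma) / gamma

n_total = n_panelists * n_reps
crit = stats.binom.ppf(0.975, n_total, 0.5)   # two-sided binomial test at nominal level 0.05

p_i = rng.beta(a, b, size=(n_sim, n_panelists))     # per-panelist choice probabilities
x = rng.binomial(n_reps, p_i).sum(axis=1)           # pooled preference counts
reject = (x > crit) | (x < n_total - crit)
print(f"empirical Type I error of the naive binomial test: {reject.mean():.3f}")
```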

7.
D. M. Zucker, J. Denne. Biometrics 2002, 58(3): 548-559.
Clinical trialists recently have shown interest in two-stage procedures for updating the sample-size calculation at an interim point in a trial. Because many clinical trials involve repeated measures designs, it is desirable to have available practical two-stage procedures for such designs. Shih and Gould (1995, Statistics in Medicine 14, 2239-2248) discuss sample-size redetermination for repeated measures studies but under a highly simplified setup. We develop two-stage procedures under the general mixed linear model, allowing for dropouts and missed visits. We present a range of procedures and compare their Type I error and power by simulation. We find that, in general, the achieved power is brought considerably closer to the required level without inflating the Type I error rate. We also derive an inflation factor that ensures the power requirement is more closely met.

8.
C. T. Thach, L. D. Fisher. Biometrics 2002, 58(2): 432-438.
In the design of clinical trials, the sample size for the trial is traditionally calculated from estimates of parameters of interest, such as the mean treatment effect, which can often be inaccurate. However, recalculation of the sample size based on an estimate of the parameter of interest that uses accumulating data from the trial can lead to inflation of the overall Type I error rate of the trial. The self-designing method of Fisher, also known as the variance-spending method, allows the use of all accumulating data in a sequential trial (including the estimated treatment effect) in determining the sample size for the next stage of the trial without inflating the Type I error rate. We propose a self-designing group sequential procedure to minimize the expected total cost of a trial. Cost is an important parameter to consider in the statistical design of clinical trials due to limited financial resources. Using Bayesian decision theory on the accumulating data, the design specifies sequentially the optimal sample size and proportion of the test statistic's variance needed for each stage of a trial to minimize the expected cost of the trial. The optimality is with respect to a prior distribution on the parameter of interest. Results are presented for a simple two-stage trial. This method can extend to nonmonetary costs, such as ethical costs or quality-adjusted life years.

9.
R. L. Kodell, J. J. Chen. Biometrics 1991, 47(1): 139-146.
A method is proposed for classifying various experimental outcomes associated with statistically significant trend tests, using a sequence of tests within a family of one-sided tests that is closed under intersections. The intent of the procedure is to characterize the general shape of implied dose-response relationships, taking care neither to inflate the false-positive (Type I) error rate by overtesting nor to sacrifice power by overadjusting for multiple comparisons.

10.
For the approval of biosimilars, it is, in most cases, necessary to conduct large Phase III clinical trials in patients to convince the regulatory authorities that the product is comparable in terms of efficacy and safety to the originator product. As the originator product has already been studied in several trials beforehand, it seems natural to include this historical information in the demonstration of equivalent efficacy. Since all studies for the regulatory approval of biosimilars are confirmatory studies, it is required that the statistical approach has reasonable frequentist properties, most importantly, that the Type I error rate is controlled, at least in all scenarios that are realistic in practice. However, it is well known that the incorporation of historical information can lead to an inflation of the Type I error rate in the case of a conflict between the distribution of the historical data and the distribution of the trial data. We illustrate this issue and confirm, using the Bayesian robustified meta-analytic-predictive (MAP) approach as an example, that it is not possible simultaneously to control the Type I error rate over the complete parameter space and to gain power in comparison with a standard frequentist approach that considers only the data in the new study. We propose a hybrid Bayesian-frequentist approach for binary endpoints that controls the Type I error rate in the neighborhood of the center of the prior distribution, while improving the power. We study the properties of this approach in an extensive simulation study and provide a real-world example.
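A minimal sketch of the kind of robustified prior discussed above for a binary endpoint: a two-component Beta mixture (an informative component summarizing historical data plus a vague Beta(1, 1) component), updated by conjugacy with the mixture weight re-weighted by each component's marginal likelihood. All parameter values are illustrative, and this is not the authors' hybrid Bayesian-frequentist procedure.

```python
import numpy as np
from scipy.special import betaln

def robust_map_posterior(x, n, w, a_inf, b_inf, a_vag=1.0, b_vag=1.0):
    """Posterior for a two-component Beta-mixture prior after observing
    x responders out of n patients in the new trial's control arm.

    Returns the updated mixture weights and the Beta parameters of the
    two posterior components."""
    def log_marg(a, b):
        # beta-binomial marginal likelihood (the binomial coefficient cancels)
        return betaln(a + x, b + n - x) - betaln(a, b)

    lw = np.log([w, 1 - w]) + np.array([log_marg(a_inf, b_inf), log_marg(a_vag, b_vag)])
    w_post = np.exp(lw - np.logaddexp(lw[0], lw[1]))
    components = [(a_inf + x, b_inf + n - x), (a_vag + x, b_vag + n - x)]
    return w_post, components

# example: historical data summarized as Beta(30, 70) (mean 0.30), prior weight 0.8;
# the new control data (45/100) are in conflict with the historical rate
weights, comps = robust_map_posterior(x=45, n=100, w=0.8, a_inf=30, b_inf=70)
print("posterior mixture weights:", np.round(weights, 3))
```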

11.
K. K. Lan, J. M. Lachin. Biometrics 1990, 46(3): 759-770.
To control the Type I error probability in a group sequential procedure using the logrank test, it is important to know the information times (fractions) at the times of interim analyses conducted for purposes of data monitoring. For the logrank test, the information time at an interim analysis is the fraction of the total number of events to be accrued in the entire trial. In a maximum information trial design, the trial is concluded when a prespecified total number of events has been accrued. For such a design, therefore, the information time at each interim analysis is known. However, many trials are designed to accrue data over a fixed duration of follow-up on a specified number of patients. This is termed a maximum duration trial design. Under such a design, the total number of events to be accrued is unknown at the time of an interim analysis. For a maximum duration trial design, therefore, these information times need to be estimated. A common practice is to assume that a fixed fraction of information will be accrued between any two consecutive interim analyses, and then employ a Pocock or O'Brien-Fleming boundary. In this article, we describe an estimate of the information time based on the fraction of total patient exposure, which tends to be slightly negatively biased (i.e., conservative) if survival is exponentially distributed. We then present a numerical exploration of the robustness of this estimate when nonexponential survival applies. We also show that the Lan-DeMets (1983, Biometrika 70, 659-663) procedure for constructing group sequential boundaries with the desired level of Type I error control can be computed using the estimated information fraction, even though it may be biased. Finally, we discuss the implications of employing a biased estimate of study information for a group sequential procedure.
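A minimal sketch of the exposure-based estimate described above for a maximum duration design: the information fraction at an interim look is approximated by the ratio of patient exposure accrued so far to the exposure projected at the scheduled end of follow-up. Entry times are assumed known for the enrolled patients, and the numbers are hypothetical.

```python
import numpy as np

def exposure_information_fraction(entry_times, interim_time, end_time):
    """Estimate the information fraction at an interim analysis as the ratio of
    patient exposure accrued by `interim_time` to the exposure projected at the
    scheduled end of follow-up, `end_time` (both on the calendar-time scale)."""
    entry = np.asarray(entry_times, dtype=float)
    exposure_now = np.clip(interim_time - entry, 0.0, None).sum()
    exposure_end = np.clip(end_time - entry, 0.0, None).sum()
    return exposure_now / exposure_end

# example: staggered entry over the first 12 months, interim at month 18, trial ends at month 36
entries = np.linspace(0, 12, 200)
print(f"estimated information fraction: {exposure_information_fraction(entries, 18, 36):.3f}")
```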

12.
In a randomized clinical trial (RCT), noncompliance with an assigned treatment can occur due to serious side effects, while missing outcomes on patients may happen due to patients' withdrawal or loss to follow-up. To avoid the possible loss of power to detect a given risk difference (RD) of interest between two treatments, it is essential to incorporate the information on noncompliance and missing outcomes into sample size calculation. Under the compound exclusion restriction model proposed elsewhere, we first derive the maximum likelihood estimator (MLE) of the RD among compliers between two treatments for an RCT with noncompliance and missing outcomes and its asymptotic variance in closed form. Based on the MLE with the tanh^{-1}(x) transformation, we develop an asymptotic test procedure for testing equality of two treatment effects among compliers. We further derive a sample size calculation formula accounting for both noncompliance and missing outcomes for a desired power 1 - beta at a nominal alpha level. To evaluate the performance of the test procedure and the accuracy of the sample size calculation formula, we employ Monte Carlo simulation to calculate the estimated Type I error and power of the proposed test procedure corresponding to the resulting sample size in a variety of situations. We find that both the test procedure and the sample size formula developed here can perform well. Finally, we include a discussion on the effects of various parameters, including the proportion of compliers, the probability of non-missing outcomes, and the ratio of sample size allocation, on the minimum required sample size.
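A minimal sketch of a Wald-type test on the tanh^{-1} (Fisher's z) scale of the sort referred to above. It takes a risk-difference estimate and its asymptotic standard error as given inputs (for instance, the MLE of the RD among compliers and its closed-form variance) and is not the authors' full procedure.

```python
import numpy as np
from scipy import stats

def arctanh_rd_test(rd_hat, se_rd, rd_null=0.0):
    """Two-sided Wald test of H0: RD = rd_null on the arctanh scale.

    Delta method: Var[arctanh(RD_hat)] ~= Var[RD_hat] / (1 - RD_hat**2)**2."""
    z = (np.arctanh(rd_hat) - np.arctanh(rd_null)) / (se_rd / (1 - rd_hat ** 2))
    return z, 2 * stats.norm.sf(abs(z))

# example with hypothetical values for the estimated RD among compliers and its SE
z, p = arctanh_rd_test(rd_hat=0.12, se_rd=0.05)
print(f"z = {z:.2f}, p = {p:.4f}")
```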

13.
G. O. Zerbe, J. R. Murphy. Biometrics 1986, 42(4): 795-804.
Two multiple-comparisons procedures are suggested for supplementing randomization analysis of growth and response curves. One controls the experimentwise Type I error rate for all possible contrast curves via an extension of the Scheffé method. The other controls a family of Type I error rates via a stepwise testing procedure. Both can be approximated by standard F tests without costly recomputation of all of the test statistics for a large number of permutations.

14.
A randomization approach to multiple comparisons is developed for comparing several growth curves in randomized experiments. The exact Type I error probability for these comparisons may be prespecified, and a Type I error probability for each component test can be evaluated. These procedures are free of many of the standard assumptions for analyzing growth curves and for making multiple comparisons. An application of the procedure gives all pairwise comparisons among the mean growth curves associated with four treatments in an animal experiment using a Youden square design, where growth curves are obtained by monitoring hormone levels over time.

15.
Preference testing is commonly used in consumer sensory evaluation. Traditionally, it is done without replication, effectively leading to a single 0/1 (binary) measurement on each panelist. However, to understand the nature of the preference, replicated preference tests are a better approach, resulting in binomial counts of preferences on each panelist. Variability among panelists then leads to overdispersion of the counts when the binomial model is used and to an inflated Type I error rate for statistical tests of preference. Overdispersion can be adjusted by Pearson correction or by other models such as the correlated binomial or beta-binomial. Several methods are suggested or reviewed in this study for analyzing replicated preference tests, and their Type I error rates and power are compared. Simulation studies show that all methods have reasonable Type I error rates and similar power. Among them, the binomial model with Pearson adjustment is probably the safest way to analyze replicated preference tests, while a normal model in which the binomial distribution is not assumed is the easiest.
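A minimal sketch of a Pearson-type overdispersion correction for replicated preference counts, assuming a simple quasi-binomial style adjustment of the pooled test statistic; the exact correction studied in the paper may differ in detail.

```python
import numpy as np
from scipy import stats

def pearson_adjusted_preference_test(x, n, p0=0.5):
    """Test H0: preference probability = p0 from replicated counts (x_i out of n_i
    per panelist), inflating the binomial variance by a Pearson dispersion
    estimate computed across panelists."""
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    p_hat = x.sum() / n.sum()
    c_hat = np.sum((x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))) / (x.size - 1)
    c_hat = max(c_hat, 1.0)                   # never deflate below the binomial variance
    z = (x.sum() - n.sum() * p0) / np.sqrt(c_hat * n.sum() * p0 * (1 - p0))
    return z, 2 * stats.norm.sf(abs(z))

# example: 10 panelists, 4 replicated choices each (counts are hypothetical)
counts = [4, 4, 3, 0, 1, 4, 0, 4, 3, 1]
z, p = pearson_adjusted_preference_test(counts, [4] * 10)
print(f"z = {z:.2f}, p = {p:.3f}")
```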

16.
D. M. Post. Oecologia 2007, 153(4): 973-984.
Understanding and explaining the causes of variation in food-chain length is a fundamental challenge for community ecology. The productive-space hypothesis, which suggests food-chain length is determined by the combination of local resource availability and ecosystem size, is central to this challenge. Two different approaches currently exist for testing the productive-space hypothesis: (1) the dual gradient approach that tests for significant relationships between food-chain length and separate gradients of ecosystem size (e.g., lake volume) and per-unit-size resource availability (e.g., g C m−1 year−2), and (2) the single gradient approach that tests for a significant relationship between food-chain length and the productive space (product of ecosystem size and per-unit-size resource availability). Here I evaluate the efficacy of the two approaches for testing the productive-space hypothesis. Using simulated data sets, I estimate the Type 1 and Type 2 error rates for single and dual gradient models in recovering a known relationship between food-chain length and ecosystem size, resource availability, or the combination of ecosystem size and resource availability, as specified by the productive-space hypothesis. The single gradient model provided high power (low Type 2 error rates) but had a very high Type 1 error rate, often erroneously supporting the productive-space hypothesis. The dual gradient model had a very low Type 1 error rate but suffered from low power to detect an effect of per-unit-size resource availability because the range of variation in resource availability is limited. Finally, I performed a retrospective power analysis for the Post et al. (Nature 405:1047–1049, 2000) data set, which tested and rejected the productive-space hypothesis using the dual gradient approach. I found that Post et al. (Nature 405:1047–1049, 2000) had sufficient power to reject the productive-space hypothesis in north temperate lakes; however, the productive-space hypothesis must be tested in other ecosystems before its generality can be fully addressed.
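A minimal sketch of the two competing regression formulations described above, run on one simulated data set in which food-chain length is driven by ecosystem size alone; the distributions and coefficients are invented for illustration and are not the simulation design of the paper.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 40
log_size = rng.normal(0, 1, n)              # log ecosystem size (e.g., lake volume)
log_resource = rng.normal(0, 0.3, n)        # log per-unit-size resource availability (narrow range)
fcl = 3.5 + 0.4 * log_size + rng.normal(0, 0.3, n)   # food-chain length depends on size only

# single gradient model: FCL ~ log(productive space) = log(size) + log(resource)
single = sm.OLS(fcl, sm.add_constant(log_size + log_resource)).fit()
# dual gradient model: FCL ~ log(size) + log(resource) as separate predictors
dual = sm.OLS(fcl, sm.add_constant(np.column_stack([log_size, log_resource]))).fit()

print("single-gradient slope p-value:", single.pvalues[1])
print("dual-gradient p-values (size, resource):", dual.pvalues[1:])
```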

17.
This paper presents a look at the underused procedure of testing for Type II errors when "negative" results are encountered during research. It recommends setting a statistical alternative hypothesis based on anthropologically derived information and calculating the probability of committing this type of error. In this manner, the process is similar to that used for testing Type I errors, which is clarified by examples from the literature. It is hoped that researchers will use the information presented here as a means of attaching levels of probability to acceptance of null hypotheses.
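A minimal sketch of the kind of calculation advocated above: after a "negative" two-sample comparison, compute beta, the probability of a Type II error, against a substantively (e.g., anthropologically) derived alternative. The normal approximation, sample sizes, and effect size are illustrative.

```python
import numpy as np
from scipy import stats

def type_ii_error(effect_size, n1, n2, alpha=0.05):
    """Probability of a Type II error for a two-sided two-sample z-test of the
    standardized difference `effect_size` (normal approximation)."""
    se = np.sqrt(1 / n1 + 1 / n2)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    noncentrality = effect_size / se
    # beta = P(|Z| < z_crit) under the specified alternative
    return stats.norm.cdf(z_crit - noncentrality) - stats.norm.cdf(-z_crit - noncentrality)

# example: standardized difference of 0.5 with 30 observations per group
print(f"beta = {type_ii_error(effect_size=0.5, n1=30, n2=30):.3f}")
```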

18.
The proportion ratio (PR) of responses between an experimental treatment and a control treatment is one of the most commonly used indices to measure the relative treatment effect in a randomized clinical trial. We develop asymptotic and permutation-based procedures for testing equality of treatment effects as well as derive confidence intervals of PRs for multivariate binary matched-pair data under a mixed-effects exponential risk model. To evaluate and compare the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. When the number of matched pairs is large, we find that all test procedures presented here can perform well with respect to Type I error. When the number of matched pairs is small, the permutation-based test procedures developed in this paper are of use. Furthermore, using test procedures (or interval estimators) based on a weighted linear average estimator of treatment effects can improve power (or gain precision) when the treatment effects on all response variables of interest are known to fall in the same direction. Finally, we apply the data taken from a crossover clinical trial that monitored several adverse events of an antidepressive drug to illustrate the practical use of test procedures and interval estimators considered here.

19.
One method for demonstrating disease modification is a delayed-start design, consisting of a placebo-controlled period followed by a delayed-start period wherein all patients receive active treatment. To address methodological issues in previous delayed-start approaches, we propose a new method that is robust across conditions of drug effect, discontinuation rates, and missing data mechanisms. We propose a modeling approach and test procedure to test the hypothesis of noninferiority, comparing the treatment difference at the end of the delayed-start period with that at the end of the placebo-controlled period. We conducted simulations to identify the optimal noninferiority testing procedure to ensure the method was robust across scenarios and assumptions, and to evaluate the appropriate modeling approach for analyzing the delayed-start period. We then applied this methodology to Phase 3 solanezumab clinical trial data for mild Alzheimer's disease patients. Simulation results showed that a testing procedure using a proportional noninferiority margin was robust for detecting disease-modifying effects under conditions of high and moderate discontinuation and with various missing data mechanisms. Using all data from all randomized patients in a single model over both the placebo-controlled and delayed-start study periods demonstrated good statistical performance. In the analysis of solanezumab data using this methodology, the noninferiority criterion was met, indicating the treatment difference at the end of the placebo-controlled studies was preserved at the end of the delayed-start period within a pre-defined margin. The proposed noninferiority method for delayed-start analysis controls the Type I error rate well and addresses many challenges posed by previous approaches. Delayed-start studies employing the proposed analysis approach could be used to provide evidence of a disease-modifying effect. This method has been communicated with the FDA and has been successfully applied to actual clinical trial data accrued from the Phase 3 clinical trials of solanezumab.

20.
This paper proposes a procedure for testing and classifying data with multiple factors. A two-way analysis of covariance is used to classify the differences among the batches as well as another factor such as package type and/or product strength. In the test procedure, slopes and intercepts of the main effects are tested using a combination of simultaneous and sequential F-tests. Based on the test procedure results, the data are classified into one of four different groups. For each group, shelf life can be calculated accordingly. We examine whether the procedure provides satisfactory control of the Type I error probability and adequate power to detect differences in degradation rates and intercepts at different nominal levels. The method is evaluated with a Monte Carlo simulation study. The proposed procedure is compared with the current FDA procedure using real data.
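Once the classification step described above has grouped the batches, shelf life for a group is commonly estimated as the time at which the one-sided 95% lower confidence bound of the fitted mean degradation line crosses the specification limit. Below is a minimal single-batch sketch with invented data; it is not the authors' two-way ANCOVA procedure itself.

```python
import numpy as np
from scipy import stats

def shelf_life(time, assay, spec_limit, conf=0.95):
    """Time at which the one-sided lower confidence bound for the mean
    degradation line first falls below `spec_limit` (simple linear regression)."""
    t = np.asarray(time, dtype=float)
    y = np.asarray(assay, dtype=float)
    n = t.size
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    s2 = resid @ resid / (n - 2)                          # residual variance
    tcrit = stats.t.ppf(conf, n - 2)
    grid = np.linspace(0, 5 * t.max(), 2000)              # search horizon (capped)
    se_mean = np.sqrt(s2 * (1 / n + (grid - t.mean()) ** 2 / np.sum((t - t.mean()) ** 2)))
    lower = intercept + slope * grid - tcrit * se_mean
    crossing = grid[lower < spec_limit]
    return crossing[0] if crossing.size else grid[-1]     # cap at the grid end if no crossing

# hypothetical stability data: assay (% of label claim) over months
months = [0, 3, 6, 9, 12, 18]
potency = [100.2, 99.5, 99.1, 98.4, 97.9, 96.8]
print(f"estimated shelf life: {shelf_life(months, potency, spec_limit=95):.1f} months")
```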
