Similar Articles (20 results)
1.
In pharmaceutical drug development, there has been extensive discussion of how to establish statistically significant results that demonstrate the efficacy of a new treatment on multiple co‐primary endpoints. When designing a clinical trial with multiple co‐primary endpoints, it is critical to determine a sample size that demonstrates statistical significance for all of the co‐primary endpoints while preserving the desired overall power, because the type II error rate increases with the number of co‐primary endpoints. We consider overall power functions and sample size determination for multiple co‐primary endpoints consisting of mixed continuous and binary variables, and provide numerical examples to illustrate the behavior of the overall power functions and sample sizes. In formulating the problem, we assume that the response variables follow a multivariate normal distribution, with the binary variables obtained by dichotomizing latent normal variables at fixed cut points. The numerical examples show that the sample size decreases as the correlation increases when the individual powers of the endpoints are approximately equal.
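The driving idea, that overall power is the joint probability that every co‐primary endpoint reaches significance, can be sketched for the simpler case of two continuous endpoints compared with one‐sided z‐tests. The sketch below assumes equal group sizes, standardized effect sizes `deltas`, and a known correlation `rho` between the two test statistics; it illustrates the general principle, not the paper's mixed continuous–binary method.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def overall_power(n, deltas, rho, alpha=0.025):
    """P(both one-sided z-tests significant) for two continuous co-primary
    endpoints with standardized effects `deltas`, correlation `rho`,
    and n subjects per group (two equal arms)."""
    z_alpha = norm.ppf(1 - alpha)
    ncp = np.sqrt(n / 2) * np.asarray(deltas)           # mean of each z-statistic
    cov = [[1.0, rho], [rho, 1.0]]
    # P(Z1 > z_alpha, Z2 > z_alpha) = CDF of the centered statistics at ncp - z_alpha
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(ncp - z_alpha)

def sample_size(deltas, rho, power=0.80, alpha=0.025):
    """Smallest per-group n achieving the target overall power."""
    n = 2
    while overall_power(n, deltas, rho, alpha) < power:
        n += 1
    return n

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.8):
        # with equal effects, the required n shrinks as the correlation grows
        print(rho, sample_size([0.3, 0.3], rho))
```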

2.
We consider a clinical trial with a primary and a secondary endpoint where the secondary endpoint is tested only if the primary endpoint is significant. The trial uses a group sequential procedure with two stages. The familywise error rate (FWER) of falsely concluding significance on either endpoint is to be controlled at a nominal level α. The type I error rate for the primary endpoint is controlled by choosing any α‐level stopping boundary, e.g., the standard O'Brien–Fleming or the Pocock boundary. Given any particular α‐level boundary for the primary endpoint, we study the problem of determining the boundary for the secondary endpoint to control the FWER. We study this FWER analytically and numerically and find that it is maximized when the correlation coefficient ρ between the two endpoints equals 1. For the four combinations of O'Brien–Fleming and Pocock boundaries for the primary and secondary endpoints, the critical constants required to control the FWER are computed for different values of ρ. An ad hoc boundary is proposed for the secondary endpoint to address a practical concern that may be at issue in some applications. Numerical studies indicate that the combination of an O'Brien–Fleming boundary for the primary endpoint and a Pocock boundary for the secondary endpoint generally gives the best primary as well as secondary power performance. The Pocock boundary may be replaced by the ad hoc boundary for the secondary endpoint with very little loss of secondary power if the practical concern is at issue. A clinical trial example is given to illustrate the methods.

3.
In two‐stage group sequential trials with a primary and a secondary endpoint, the overall type I error rate for the primary endpoint is often controlled by an α‐level boundary, such as an O'Brien‐Fleming or Pocock boundary. Following a hierarchical testing sequence, the secondary endpoint is tested only if the primary endpoint achieves statistical significance either at an interim analysis or at the final analysis. To control the type I error rate for the secondary endpoint, it is tested using a Bonferroni procedure or any α‐level group sequential method. In comparison with marginal testing, there is an overall power loss for the test of the secondary endpoint, since a claim of a positive result depends on the significance of the primary endpoint in the hierarchical testing sequence. We propose two group sequential testing procedures with improved secondary power: the improved Bonferroni procedure and the improved Pocock procedure. The proposed procedures use the correlation between the interim and final statistics for the secondary endpoint while applying graphical approaches to transfer the significance level from the primary endpoint to the secondary endpoint. The procedures control the familywise error rate (FWER) in the strong sense by construction, and this is confirmed via simulation. We also compare the proposed procedures with other commonly used group sequential procedures in terms of control of the FWER and the power of rejecting the secondary hypothesis. An example is provided to illustrate the procedures.

4.
The two‐sided Simes test is known to control the type I error rate with bivariate normal test statistics. For one‐sided hypotheses, control of the type I error rate requires that the correlation between the bivariate normal test statistics is non‐negative. In this article, we introduce a trimmed version of the one‐sided weighted Simes test for two hypotheses which rejects if (i) the one‐sided weighted Simes test rejects and (ii) both p‐values are below one minus the respective weighted Bonferroni adjusted level. We show that the trimmed version controls the type I error rate at nominal significance level α if (i) the common distribution of test statistics is point symmetric and (ii) the two‐sided weighted Simes test at level 2α controls the level. These assumptions apply, for instance, to bivariate normal test statistics with arbitrary correlation. In a simulation study, we compare the power of the trimmed weighted Simes test with the power of the weighted Bonferroni test and the untrimmed weighted Simes test. An additional result of this article ensures type I error rate control of the usual weighted Simes test under a weak version of the positive regression dependence condition for the case of two hypotheses. This condition is shown to apply to the two‐sided p‐values of one‐ or two‐sample t‐tests for bivariate normal endpoints with arbitrary correlation and to the corresponding one‐sided p‐values if the correlation is non‐negative. The Simes test for such types of bivariate t‐tests has not been considered before. According to our main result, the trimmed version of the weighted Simes test then also applies to the one‐sided bivariate t‐test with arbitrary correlation.
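For two one‐sided p‐values, the trimmed test described above reduces to two simple checks. The sketch below assumes weights that sum to 1 and a one‐sided level α; the function names and example numbers are illustrative, not from the paper.

```python
def weighted_simes_reject(p1, p2, w1, w2, alpha=0.025):
    """One-sided weighted Simes test of the intersection hypothesis:
    order by p-value (weights travel with their hypotheses) and reject if
    p_(1) <= alpha * w_(1) or p_(2) <= alpha * (w_(1) + w_(2))."""
    (q1, v1), (q2, v2) = sorted([(p1, w1), (p2, w2)])
    return q1 <= alpha * v1 or q2 <= alpha * (v1 + v2)

def trimmed_weighted_simes_reject(p1, p2, w1, w2, alpha=0.025):
    """Trimmed version: additionally require both p-values to fall below
    one minus the respective weighted Bonferroni adjusted level."""
    trimmed = (p1 < 1 - w1 * alpha) and (p2 < 1 - w2 * alpha)
    return trimmed and weighted_simes_reject(p1, p2, w1, w2, alpha)

# Equal weights, p-values 0.020 and 0.024 at one-sided alpha = 0.025:
# the Simes condition holds via the larger p-value, and neither p-value is
# close to 1, so trimming does not interfere -> True
print(trimmed_weighted_simes_reject(0.020, 0.024, 0.5, 0.5))
```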

5.
Most existing phase II clinical trial designs focus on conventional chemotherapy with binary tumor response as the endpoint. The advent of novel therapies, such as molecularly targeted agents and immunotherapy, has made the endpoint of phase II trials more complicated, often involving ordinal, nested, and coprimary endpoints. We propose a simple and flexible Bayesian optimal phase II predictive probability (OPP) design that handles binary and complex endpoints in a unified way. The Dirichlet-multinomial model is employed to accommodate different types of endpoints. At each interim, given the observed interim data, we calculate the Bayesian predictive probability of success, should the trial continue to the maximum planned sample size, and use it to make the go/no-go decision. The OPP design controls the type I error rate, maximizes power or minimizes the expected sample size, and is easy to implement, because the go/no-go decision boundaries can be enumerated and included in the protocol before the onset of the trial. Simulation studies show that the OPP design has satisfactory operating characteristics.
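For the simplest case of a single binary endpoint, the Bayesian predictive probability used for an interim go/no-go decision can be sketched with a beta-binomial model. This is a generic illustration; the prior, threshold, and success criterion below are assumptions, not the calibrated values of the OPP design.

```python
from scipy.stats import beta, betabinom

def predictive_prob(x, n, n_max, p0=0.20, theta=0.95, a=1.0, b=1.0):
    """Predictive probability that, after n_max patients, the posterior
    probability P(response rate > p0) exceeds theta, given x responses
    among the first n patients and a Beta(a, b) prior."""
    m = n_max - n                                   # patients still to enroll
    pp = 0.0
    for y in range(m + 1):                          # y = future responses
        # posterior after all n_max patients if y further responses occur
        success = beta.sf(p0, a + x + y, b + n_max - x - y) > theta
        if success:
            # predictive (beta-binomial) probability of observing y responses
            pp += betabinom.pmf(y, m, a + x, b + n - x)
    return pp

# Interim with 4 responses in 20 patients, maximum sample size 40; the
# go/no-go rule would compare this value with pre-specified cut-offs.
print(round(predictive_prob(x=4, n=20, n_max=40), 3))
```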

6.
In a clinical trial with an active treatment and a placebo, the situation may occur that two (or even more) primary endpoints are necessary to describe the active treatment's benefit. The focus of our interest is a more specific situation with two primary endpoints, in which superiority in one of them would suffice given that non-inferiority is observed in the other. Several proposals exist in the literature for dealing with this or similar problems, but prove insufficient or inadequate on closer inspection (e.g. Bloch et al. (2001, 2006) or Tamhane and Logan (2002, 2004)). For example, we were unable to find a good reason why a bootstrap p-value for superiority should depend on the initially selected non-inferiority margins or on the initially selected type I error alpha. We propose a hierarchical three-step procedure, where non-inferiority in both variables must be proven in the first step, superiority has to be shown by a bivariate test (e.g. Holm (1979), O'Brien (1984), Hochberg (1988), a bootstrap (Wang (1998)), or Läuter (1996)) in the second step, and superiority in at least one variable has to be verified in the third step by a corresponding univariate test. All statistical tests are performed at the same one-sided significance level alpha. From the above-mentioned bivariate superiority tests we prefer Läuter's SS test and the Holm procedure, because these have been proven to control the type I error strictly, irrespective of the correlation structure among the primary variables and the sample size. A simulation study reveals that the power of the bivariate test depends to a considerable degree on the correlation and on the magnitude of the expected effects of the two primary endpoints. Therefore, the recommendation of which test to choose depends on knowledge of the possible correlation between the two primary endpoints. In general, Läuter's SS procedure in step 2 shows the best overall properties, whereas Holm's procedure shows an advantage if both a positive correlation between the two variables and a considerable difference between their standardized effect sizes can be expected.
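The three-step hierarchy can be sketched with summary z-statistics. The version below uses an O'Brien-type combined z-statistic (with an assumed known correlation) for the bivariate step rather than Läuter's SS test, which requires individual-level data, so it illustrates the hierarchy rather than the authors' preferred implementation; all inputs are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def three_step_decision(z_ni, z_sup, rho, alpha=0.025):
    """Hierarchical three-step decision from one-sided z-statistics.
    z_ni : non-inferiority z-statistics for the two endpoints
    z_sup: superiority z-statistics for the two endpoints
    rho  : assumed correlation between the two test statistics
    All steps use the same one-sided level alpha."""
    crit = norm.ppf(1 - alpha)
    # Step 1: non-inferiority must be shown for BOTH endpoints
    if not all(z > crit for z in z_ni):
        return "no claim"
    # Step 2: bivariate superiority via an O'Brien-type combined statistic
    z_comb = (z_sup[0] + z_sup[1]) / np.sqrt(2 + 2 * rho)
    if z_comb <= crit:
        return "non-inferiority only"
    # Step 3: confirm superiority in at least one endpoint individually
    if any(z > crit for z in z_sup):
        return "superiority in at least one endpoint"
    return "non-inferiority only"

print(three_step_decision(z_ni=(2.4, 2.1), z_sup=(2.3, 0.9), rho=0.3))
```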

7.
Hung et al. (2007) considered the problem of controlling the type I error rate for a primary and secondary endpoint in a clinical trial using a gatekeeping approach in which the secondary endpoint is tested only if the primary endpoint crosses its monitoring boundary. They considered a two-look trial and showed by simulation that the naive method of testing the secondary endpoint at full level α at the time the primary endpoint reaches statistical significance does not control the familywise error rate at level α. Tamhane et al. (2010) derived analytic expressions for familywise error rate and power and confirmed the inflated error rate of the naive approach. Nonetheless, many people mistakenly believe that the closure principle can be used to prove that the naive procedure controls the familywise error rate. The purpose of this note is to explain in greater detail why there is a problem with the naive approach and show that the degree of alpha inflation can be as high as that of unadjusted monitoring of a single endpoint.
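A small Monte Carlo sketch makes the inflation visible. The setup below (two looks at information fractions 1/2 and 1, classical O'Brien-Fleming boundaries for the primary endpoint, secondary tested at full one-sided α = 0.025 at whichever look the primary rejects, primary drift fixed near an unfavourable value) is an illustrative assumption, not the exact configuration studied in the cited papers.

```python
import numpy as np

def naive_secondary_error(rho, delta=1.2, n_sim=400_000, seed=11):
    """Type I error for the secondary endpoint under the 'naive' rule:
    test it at full one-sided alpha = 0.025 at the look where the primary
    crosses its O'Brien-Fleming boundary. `delta` is the drift of the
    primary z-statistic at the final look; the secondary is under its null."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(0.5)                              # correlation between looks
    c1, c2 = 1.977 * np.sqrt(2), 1.977            # two-look O'Brien-Fleming bounds
    z_full = 1.960                                # unadjusted one-sided 0.025 cut
    cov = np.array([[1,       r,       rho,     rho * r],
                    [r,       1,       rho * r, rho    ],
                    [rho,     rho * r, 1,       r      ],
                    [rho * r, rho,     r,       1      ]])
    mean = [delta * r, delta, 0.0, 0.0]           # (X1, X2, Y1, Y2)
    x1, x2, y1, y2 = rng.multivariate_normal(mean, cov, size=n_sim).T
    stop1 = x1 > c1                               # primary rejects at look 1
    stop2 = ~stop1 & (x2 > c2)                    # primary rejects at look 2
    return ((stop1 & (y1 > z_full)) | (stop2 & (y2 > z_full))).mean()

for rho in (0.0, 0.5, 0.8, 0.99):
    print(rho, round(naive_secondary_error(rho), 4))   # grows with rho, exceeds 0.025
```

As ρ approaches 1, the estimated error approaches that of unadjusted two-look monitoring of a single endpoint, which is the worst case described above.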

8.
We consider the problem of comparing two treatments on multiple endpoints where the goal is to identify the endpoints that have treatment effects, while controlling the familywise error rate. Two current approaches for this are (i) applying a global test within a closed testing procedure, and (ii) adjusting individual endpoint p‐values for multiplicity. We propose combining the two current methods. We compare the combined method with several competing methods in a simulation study. It is concluded that the combined approach maintains higher power under a variety of treatment effect configurations than the other methods and is thus more power‐robust.

9.
Multiple endpoints are tested to assess an overall treatment effect and also to identify which endpoints or subsets of endpoints contributed to treatment differences. The conventional p‐value adjustment methods, such as single‐step, step‐up, or step‐down procedures, sequentially identify each significant individual endpoint. Closed test procedures can also detect individual endpoints that have effects via a step‐by‐step closed strategy. This paper proposes a global‐based statistic for testing an a priori specified number, say r, of the k endpoints, as opposed to the conventional approach of testing a single endpoint (r = 1). The proposed test statistic is an extension of the single‐step p‐value‐based statistic based on the distribution of the smallest p‐value. The test maintains strong control of the familywise error (FWE) rate under the null hypothesis of no difference in any (sub)set of r endpoints among all possible combinations of the k endpoints. After rejecting the null hypothesis, the individual endpoints in the sets that are rejected can be tested further, using a univariate test statistic in a second step, if desired. However, the second step test only weakly controls the FWE. The proposed method is illustrated by application to a psychosis data set.

10.
In many phase III clinical trials, it is desirable to separately assess the treatment effect on two or more primary endpoints. Consider the MERIT-HF study, where two endpoints of primary interest were time to death and the earliest of time to first hospitalization or death (The International Steering Committee on Behalf of the MERIT-HF Study Group, 1997, American Journal of Cardiology 80[9B], 54J-58J). It is possible that treatment has no effect on death but a beneficial effect on first hospitalization time, or it has a detrimental effect on death but no effect on hospitalization. A good clinical trial design should permit early stopping as soon as the treatment effect on both endpoints becomes clear. Previous work in this area has not resolved how to stop the study early when one or more endpoints have no treatment effect or how to assess and control the many possible error rates for concluding wrong hypotheses. In this article, we develop a general methodology for group sequential clinical trials with multiple primary endpoints. This method uses a global alpha-spending function to control the overall type I error and a multiple decision rule to control error rates for concluding wrong alternative hypotheses. The method is demonstrated with two simulated examples based on the MERIT-HF study.
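The building block behind any alpha-spending approach is the standard single-endpoint error-spending calculation. The sketch below derives two-look boundaries from a Lan-DeMets O'Brien-Fleming-type spending function; it is a generic illustration of that machinery, not the multi-endpoint global spending rule proposed in the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def obf_spending(t, alpha=0.025):
    """Lan-DeMets O'Brien-Fleming-type error-spending function (one-sided)."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)))

def two_look_boundaries(t1=0.5, alpha=0.025):
    """Rejection boundaries (c1, c2) for one endpoint with looks at
    information fractions t1 and 1, spending obf_spending(t1) at look 1."""
    a1 = obf_spending(t1, alpha)
    c1 = norm.ppf(1 - a1)
    corr = np.sqrt(t1)                               # Corr(Z1, Z2) = sqrt(t1)
    bvn = multivariate_normal(mean=[0, 0], cov=[[1, corr], [corr, 1]])
    # choose c2 so that P(Z1 < c1, Z2 >= c2) equals the remaining error
    def excess(c2):
        return norm.cdf(c1) - bvn.cdf([c1, c2]) - (alpha - a1)
    return c1, brentq(excess, 0.5, 6.0)

print(two_look_boundaries())    # roughly (2.96, 1.97) for t1 = 0.5
```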

11.
Weight‐of‐evidence is the process by which multiple measurement endpoints are related to an assessment endpoint to evaluate whether significant risk of harm is posed to the environment. In this paper, a methodology is offered for reconciling or balancing multiple lines of evidence pertaining to an assessment endpoint. Weight‐of‐evidence is reflected in three characteristics of measurement endpoints: (a) the weight assigned to each measurement endpoint; (b) the magnitude of response observed in the measurement endpoint; and (c) the concurrence among outcomes of multiple measurement endpoints. First, weights are assigned to measurement endpoints based on attributes related to: (a) strength of association between assessment and measurement endpoints; (b) data quality; and (c) study design and execution. Second, the magnitude of response in the measurement endpoint is evaluated with respect to whether the measurement endpoint indicates the presence or absence of harm, as well as the magnitude of that response. Third, concurrence among measurement endpoints is evaluated by plotting the findings of the two preceding steps on a matrix for each measurement endpoint evaluated. The matrix allows easy visual examination of agreements or divergences among measurement endpoints, facilitating interpretation of the collection of measurement endpoints with respect to the assessment endpoint. A qualitative adaptation of the weight‐of‐evidence approach is also presented.

12.
In a typical clinical trial, there are one or two primary endpoints, and a few secondary endpoints. When at least one primary endpoint achieves statistical significance, there is considerable interest in using results for the secondary endpoints to enhance characterization of the treatment effect. Because multiple endpoints are involved, regulators may require that the familywise type I error rate be controlled at a pre-set level. This requirement can be achieved by using "gatekeeping" methods. However, existing methods suffer from logical oddities such as allowing results for secondary endpoint(s) to impact the likelihood of success for the primary endpoint(s). We propose a novel and easy-to-implement gatekeeping procedure that is devoid of such deficiencies. A real data example and simulation results are used to illustrate efficiency gains of our method relative to existing methods.

13.
Case‐control studies are primary study designs used in genetic association studies. Sasieni (Biometrics 1997, 53, 1253–1261) pointed out that the allelic chi‐square test used in genetic association studies is invalid when Hardy‐Weinberg equilibrium (HWE) is violated in the combined population. It is therefore important to know how far the type I error rate deviates from the nominal level when HWE is violated. We examine bounds on the type I error rate of the allelic chi‐square test. We also investigate the power of the goodness‐of‐fit test for HWE, which can be used as a guideline for choosing between the allelic chi‐square test and the modified allelic chi‐square test, the latter of which was proposed for cases of violated HWE. In small samples, the power is not large enough to detect Wright's inbreeding model with small values of the inbreeding coefficient. Therefore, when the null hypothesis of HWE is only barely accepted, the modified test should be considered as an alternative method.
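As a concrete illustration, both the allelic chi-square test and the HWE goodness-of-fit test can be computed from genotype counts. This is a generic sketch of the standard tests (not the modified allelic test), and the counts in the example are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, chisquare

def allelic_chisq(case_geno, control_geno):
    """Allelic chi-square test from genotype counts (AA, Aa, aa).
    Each subject contributes two alleles; validity relies on HWE
    holding in the combined population."""
    def allele_counts(g):
        aa, ab, bb = g
        return [2 * aa + ab, ab + 2 * bb]          # counts of allele A and a
    table = np.array([allele_counts(case_geno), allele_counts(control_geno)])
    stat, p, _, _ = chi2_contingency(table, correction=False)
    return stat, p

def hwe_gof(geno):
    """Goodness-of-fit test for HWE in one sample of genotype counts."""
    aa, ab, bb = geno
    n = aa + ab + bb
    p_a = (2 * aa + ab) / (2 * n)                  # estimated allele A frequency
    expected = n * np.array([p_a**2, 2 * p_a * (1 - p_a), (1 - p_a)**2])
    return chisquare([aa, ab, bb], expected, ddof=1)   # 1 df after estimating p_a

print(allelic_chisq(case_geno=(30, 50, 20), control_geno=(45, 40, 15)))
print(hwe_gof((75, 90, 35)))
```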

14.
In clinical trials, several endpoints (EPs) are often evaluated to compare treatments in a therapeutic area. Suppose that there are two EPs in a clinical trial. We propose a new set of composite hypotheses for continuous variables that takes the relative clinical importance of the EPs into account. The main hypotheses are formulated to show that a treatment is sufficiently superior to the control treatment, which is not necessarily a placebo, on one EP that possible inferiority of the treatment by at most a certain margin on the other EP is adequately compensated from a clinical point of view. The maximum non‐inferiority margin for one EP need not be restricted to a clinically unimportant difference when it is exchanged for a large enough superiority on the other EP. This formulation leads to a new composite EP and a very simple test statistic. The intersection‐union principle is employed to derive the proposed test.

15.
Regarding the paper "Sample size determination in clinical trials with multiple co‐primary endpoints including mixed continuous and binary variables" by T. Sozu, T. Sugimoto, and T. Hamasaki, Biometrical Journal (2012) 54(5): 716–729. Article: http://dx.doi.org/10.1002/bimj.201100221; Authors' Reply: http://dx.doi.org/10.1002/bimj.201300032. This paper recently introduced a methodology for calculating the sample size in clinical trials with multiple mixed binary and continuous co‐primary endpoints modeled by the so‐called conditional grouped continuous model (CGCM). The purpose of this note is to clarify certain aspects of the methodology and propose an alternative approach based on latent means tests for the binary endpoints. We demonstrate that our approach is more powerful, yielding smaller sample sizes at powers comparable to those used in the paper.

16.
Designs incorporating more than one endpoint have become popular in drug development. One such design allows short‐term information to be incorporated in an interim analysis if the long‐term primary endpoint has not yet been observed for some of the patients. We first consider a two‐stage design with binary endpoints allowing for futility stopping only, based on conditional power under both the fixed and the observed effect. Design characteristics of three estimators are compared: one using the long‐term primary endpoint only, one using the short‐term endpoint only, and one combining data from both. For each approach, equivalent cut‐off values for the fixed‐ and observed‐effect conditional power calculations can be derived that result in the same overall power. While in trials stopping for futility the type I error rate cannot be inflated (it usually decreases), there is a loss of power. In this study, we consider different scenarios, including different thresholds for conditional power, different amounts of information available at the interim, and different correlations and probabilities of success. We further extend the methods to adaptive designs with unblinded sample size reassessment based on conditional power, using the inverse normal method as the combination function. Two different futility stopping rules are considered: one based on the conditional power, and one based on P‐values from the Z‐statistics of the estimators. Average sample size, probability of stopping for futility, and overall power of the trial are compared, and the influence of the choice of weights is investigated.
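Conditional power at an interim analysis, under either a fixed planning effect or the observed interim effect, has a simple closed form in the usual Brownian-motion approximation. The sketch below is that generic formula, not the paper's short-term/long-term combination estimator; the numbers in the example are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def conditional_power(z1, t, delta=None, alpha=0.025):
    """Conditional power given interim z-statistic z1 at information fraction t.
    `delta` is the assumed drift of the z-statistic at the final analysis;
    if None, the observed-effect estimate z1 / sqrt(t) is used instead."""
    if delta is None:
        delta = z1 / np.sqrt(t)                    # 'observed effect' version
    z_alpha = norm.ppf(1 - alpha)
    num = z_alpha - z1 * np.sqrt(t) - delta * (1 - t)
    return norm.sf(num / np.sqrt(1 - t))

# Futility sketch: stop if conditional power under the observed effect
# drops below, say, 0.20 at the interim (t = 0.5).
z1 = 0.60
print(conditional_power(z1, t=0.5))                # observed effect
print(conditional_power(z1, t=0.5, delta=2.8))     # fixed planning effect
```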

17.
The functional importance of bacteria and fungi in terrestrial systems is recognized widely. However, microbial population, community, and functional measurement endpoints change rapidly and across very short spatial scales. Measurement endpoints of microbes tend to be highly responsive to typical fluxes of temperature, moisture, oxygen, and many other noncontaminant factors. Functional redundancy across broad taxonomic groups enables wild swings in community composition without remarkable change in rates of decomposition or community respiration. Consequently, it is exceedingly difficult to relate specific microbial activities with indications of adverse and unacceptable environmental conditions. Moreover, changes in microbial processes do not necessarily result in consequences to plant and animal populations or communities, which in the end are the resources most commonly identified as those to be protected. Therefore, unless more definitive linkages are made between specific microbial effects and an adverse condition for typical assessment endpoint species, microbial endpoints will continue to have limited use in risk assessments; they will not drive the process as primary assessment endpoints.

18.
The different proteins of any proteome evolve at enormously different rates. One of the primary factors influencing rates of protein evolution is expression level, with highly expressed proteins tending to evolve at slow rates. This phenomenon, known as the expression level–evolutionary rate (E–R) anticorrelation, has been attributed to the abundance‐dependent deleterious effects of misfolding or misinteraction. We have recently shown that secreted proteins either lack an E–R anticorrelation or exhibit a significantly reduced E–R anticorrelation. This effect may be due to the strict quality control to which secreted proteins are subject in the endoplasmic reticulum (which is expected to reduce the rate of misfolding and its deleterious effects) or to their extracellular location (expected to reduce the rate of misinteraction and its deleterious effects). Among secreted proteins, N‐glycosylated ones are under particularly strong quality control. Here, we investigate how N‐linked glycosylation affects the E–R anticorrelation. Strikingly, we observe a positive E–R correlation among N‐glycosylated proteins. That is, N‐glycoproteins that are highly expressed evolve at faster rates than lowly expressed N‐glycoproteins, in contrast to what is observed among intracellular proteins.

19.
Directly standardized rates continue to be an integral tool for presenting rates for diseases that are highly dependent on age, such as cancer. Statistically, these rates are modeled as a weighted sum of Poisson random variables. This is a difficult statistical problem, because there are k observed Poisson variables and k unknown means. The gamma confidence interval has been shown through simulations to have at least nominal coverage in all simulated scenarios, but it can be overly conservative. Previous modifications to that method have closer to nominal coverage on average, but they do not achieve the nominal coverage bound in all situations. Further, those modifications are not central intervals, and the upper coverage error rate can be substantially more than half the nominal error. Here we apply a mid‐p modification to the gamma confidence interval. Typical mid‐p methods forsake guaranteed coverage to get coverage that is sometimes higher and sometimes lower than the nominal coverage rate, depending on the values of the parameters. The mid‐p gamma interval does not have guaranteed coverage in all situations; however, in the (not rare) situations where the gamma method is overly conservative, the mid‐p gamma interval often has at least nominal coverage. The mid‐p gamma interval is especially appropriate when one wants a central interval, since simulations show that in many situations both the upper and lower coverage error rates are on average less than or equal to half the nominal error rate.
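The baseline (unmodified) gamma interval of Fay and Feuer for a directly standardized rate has a short closed form, sketched below with hypothetical counts and weights; the mid-p modification discussed above changes how conservative the limits are and is not reproduced here.

```python
import numpy as np
from scipy.stats import gamma

def gamma_ci(counts, pop, weights, conf=0.95):
    """Fay-Feuer gamma confidence interval for a directly standardized rate.
    counts  : observed event counts per age stratum (Poisson)
    pop     : person-years per stratum
    weights : standard-population weights (normalized to sum to 1)."""
    counts, pop = np.asarray(counts, float), np.asarray(pop, float)
    w = np.asarray(weights, float)
    w = w / w.sum() / pop                          # per-stratum weight on each count
    y = np.sum(w * counts)                         # the standardized rate
    v = np.sum(w**2 * counts)                      # its estimated variance
    wm = w.max()
    a = (1 - conf) / 2
    lower = 0.0 if y == 0 else gamma.ppf(a, y**2 / v, scale=v / y)
    upper = gamma.ppf(1 - a, (y + wm)**2 / (v + wm**2), scale=(v + wm**2) / (y + wm))
    return y, lower, upper

# Toy example with three age strata (hypothetical numbers):
print(gamma_ci(counts=[5, 12, 30], pop=[10_000, 8_000, 5_000],
               weights=[0.4, 0.35, 0.25]))
```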

20.
The development of oncology drugs progresses through multiple phases, and after each phase a decision is made about whether to move a molecule forward. Early phase efficacy decisions are often made on the basis of single-arm studies, using a set of rules to define whether the tumor improves ("responds"), remains stable, or progresses (response evaluation criteria in solid tumors [RECIST]). These decision rules implicitly assume some form of surrogacy between tumor response and long-term endpoints like progression-free survival (PFS) or overall survival (OS). With the emergence of new therapies, for which the link between RECIST tumor response and long-term endpoints is either not yet accessible or weaker than with classical chemotherapies, tumor response-based rules may not be optimal. In this paper, we explore the use of a multistate model for decision-making based on single-arm early phase trials. The multistate model makes it possible to account for more information than the simple RECIST response status, namely the time to response, the duration of response, the PFS time, and the time to death. We propose to base the efficacy decision on the OS hazard ratio (HR) comparing a historical control to data from the experimental treatment, with the latter predicted from a multistate model fitted to early phase data with limited survival follow-up. Using two case studies, we illustrate the feasibility of estimating such an OS HR. We argue that, in the presence of limited follow-up and small sample size, and making realistic assumptions within the multistate model, the OS prediction is acceptable and may lead to better early decisions within the development of a drug.
