## Summary

This case study extends the straightforward setting presented in Case study 1 to a more complex setting involving two trial endpoints and three treatment arms. Case study 5 illustrates the process of performing power calculations in clinical trials with multiple, hierarchically structured objectives and “multivariate” multiplicity adjustment strategies (gatekeeping procedures).

Consider a three-arm Phase III clinical trial for the treatment of rheumatoid arthritis (RA). Two co-primary endpoints will be used to evaluate the effect of a novel treatment on clinical response and on physical function. The endpoints are defined as follows:

• Endpoint 1: Response rate based on the American College of Rheumatology definition of improvement (ACR20).

• Endpoint 2: Change from baseline in the Health Assessment Questionnaire-Disability Index (HAQ-DI).

The two endpoints have different marginal distributions. The first endpoint is binary whereas the second one is continuous and follows a normal distribution.

The efficacy profile of two doses of a new treatment (Dose L and Dose H) will be compared to that of a placebo, and a successful outcome will be defined as a significant treatment effect at either or both doses. A hierarchical structure has been established within each dose so that Endpoint 2 will be tested if and only if there is evidence of a significant effect on Endpoint 1.

Three treatment effect scenarios for each endpoint are displayed in the table below. The scenarios define three outcome parameter sets. The first set represents a rather conservative treatment effect scenario, the second set is a standard (most plausible) scenario and the third set represents an optimistic scenario. Note that a reduction in the HAQ-DI score indicates a beneficial effect and thus the mean changes are assumed to be negative for Endpoint 2.

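| Endpoint | Outcome parameter set | Placebo | Dose L | Dose H |
|---|---|---|---|---|
| ACR20 (%) | Conservative | 30% | 40% | 50% |
| ACR20 (%) | Standard | 30% | 45% | 55% |
| ACR20 (%) | Optimistic | 30% | 50% | 60% |
| HAQ-DI (mean (SD)) | Conservative | −0.10 (0.50) | −0.20 (0.50) | −0.30 (0.50) |
| HAQ-DI (mean (SD)) | Standard | −0.10 (0.50) | −0.25 (0.50) | −0.35 (0.50) |
| HAQ-DI (mean (SD)) | Optimistic | −0.10 (0.50) | −0.30 (0.50) | −0.40 (0.50) |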

## Define a Data Model

As in Case study 4, two endpoints are evaluated for each patient in this clinical trial example, which means that their joint distribution needs to be specified. The MVMixedDist method will be utilized for specifying a bivariate distribution with binomial and normal marginals (var.type = list("BinomDist", "NormalDist")). In general, this function is used for modeling correlated normal, binomial and exponential endpoints and relies on the copula method, i.e., random variables are generated from a multivariate normal distribution and converted into variables with pre-specified marginal distributions.
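The copula idea described above can be illustrated with a minimal base R sketch. This is not Mediana's internal code; the correlation of 0.5, the number of simulated patients and all variable names are assumptions chosen for illustration, while the marginal parameters are taken from the standard scenario for Dose L.

```r
# Minimal sketch of the Gaussian copula idea (base R only; not Mediana's internal code)
set.seed(1234)

n    <- 10000   # number of simulated patients (illustrative)
rho  <- 0.5     # assumed correlation of the latent normals (not stated in the text)
prop <- 0.45    # ACR20 response rate, Dose L, standard scenario
mu   <- -0.25   # mean HAQ-DI change, Dose L, standard scenario
sdev <- 0.5     # SD of the HAQ-DI change

# Step 1: draw correlated standard normal pairs using the Cholesky factor
corr.matrix <- matrix(c(1, rho, rho, 1), 2, 2)
z <- matrix(rnorm(2 * n), n, 2) %*% chol(corr.matrix)

# Step 2: map each column to the uniform scale
u <- pnorm(z)

# Step 3: apply the inverse CDF of each target marginal distribution
acr20 <- qbinom(u[, 1], size = 1, prob = prop)   # binary endpoint
haqdi <- qnorm(u[, 2], mean = mu, sd = sdev)     # continuous endpoint

# Empirical marginals should be close to the target values
round(c(mean(acr20), mean(haqdi), sd(haqdi)), 2)
```

Note that the correlation is specified on the latent normal scale, so the observed correlation between the binary and continuous endpoints will generally be smaller than the value placed in the correlation matrix.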

Three parameters must be defined to specify the joint distribution of Endpoints 1 and 2 in this clinical trial example:

• Variable types (binomial and normal).

• Outcome distribution parameters (proportion for Endpoint 1, mean and SD for Endpoint 2) based on the assumptions listed in the table above.

• Correlation matrix of the multivariate normal distribution used in the copula method.

These parameters are combined to define the outcome parameter sets for each treatment arm (e.g., outcome1.plac, outcome1.dosel and outcome1.doseh for the first set) that will be included in the Sample objects in the data model.

These outcome parameter sets are then combined within each Sample object, and the common sample size per treatment arm ranges between 100 and 120:
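A sketch of such a data model is shown below, assuming the Mediana package is installed. The correlation of 0.5 is an assumed value (the text does not state it), make.outcome is a hypothetical helper introduced only to shorten the example, the sample identifiers are illustrative, and only the two ends of the sample size range are evaluated.

```r
library(Mediana)

# Variable types: binary ACR20 response and normally distributed HAQ-DI change
var.type = list("BinomDist", "NormalDist")

# Correlation matrix used by the copula; 0.5 is an assumption, not taken from the text
corr.matrix = matrix(c(1.0, 0.5,
                       0.5, 1.0), 2, 2)

# Hypothetical helper: builds the outcome parameter set for one arm
make.outcome = function(prop, mean) {
  parameters(type = var.type,
             par = parameters(parameters(prop = prop),
                              parameters(mean = mean, sd = 0.5)),
             corr = corr.matrix)
}

# Sets 1, 2 and 3: conservative, standard and optimistic scenarios
outcome1.plac  = make.outcome(0.30, -0.10)
outcome2.plac  = make.outcome(0.30, -0.10)
outcome3.plac  = make.outcome(0.30, -0.10)
outcome1.dosel = make.outcome(0.40, -0.20)
outcome2.dosel = make.outcome(0.45, -0.25)
outcome3.dosel = make.outcome(0.50, -0.30)
outcome1.doseh = make.outcome(0.50, -0.30)
outcome2.doseh = make.outcome(0.55, -0.35)
outcome3.doseh = make.outcome(0.60, -0.40)

# Data model: one Sample object per arm, each carrying both endpoints
case.study5.data.model = DataModel() +
  OutcomeDist(outcome.dist = "MVMixedDist") +
  SampleSize(c(100, 120)) +
  Sample(id = list("Placebo ACR20", "Placebo HAQ-DI"),
         outcome.par = parameters(outcome1.plac, outcome2.plac, outcome3.plac)) +
  Sample(id = list("DoseL ACR20", "DoseL HAQ-DI"),
         outcome.par = parameters(outcome1.dosel, outcome2.dosel, outcome3.dosel)) +
  Sample(id = list("DoseH ACR20", "DoseH HAQ-DI"),
         outcome.par = parameters(outcome1.doseh, outcome2.doseh, outcome3.doseh))
```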

## Define an Analysis Model

To set up the analysis model in this clinical trial example, note that the treatment comparisons for Endpoints 1 and 2 will be carried out based on two different statistical tests:

• Endpoint 1: Two-sample test for comparing proportions (method = "PropTest").

• Endpoint 2: Two-sample t-test (method = "TTest").

It was pointed out earlier on this page that the two endpoints will be tested hierarchically within each dose. The figure below provides a visual summary of the testing strategy used in this clinical trial. The circles in this figure denote the four null hypotheses of interest:

• H1: Null hypothesis of no difference between Dose L and placebo with respect to Endpoint 1.

• H2: Null hypothesis of no difference between Dose H and placebo with respect to Endpoint 1.

• H3: Null hypothesis of no difference between Dose L and placebo with respect to Endpoint 2.

• H4: Null hypothesis of no difference between Dose H and placebo with respect to Endpoint 2.

A multiple testing procedure known as the multiple-sequence gatekeeping procedure will be applied to account for the hierarchical structure of this multiplicity problem. This procedure belongs to the class of mixture-based gatekeeping procedures introduced in Dmitrienko et al. (2015). This gatekeeping procedure is specified by defining the following three parameters:

• Families of null hypotheses (family).

• Component procedures used in the families (component.procedure).

• Truncation parameters used in the families (gamma).

These parameters are included in the MultAdjProc object defined below. The tests to which the multiplicity adjustment will be applied are defined in the tests argument. This argument is optional if the adjustment applies to all tests included in the analysis model. The argument family states that the null hypotheses will be grouped into two families:

• Family 1: H1 and H2.

• Family 2: H3 and H4.

Note that the order of the hypotheses corresponds to the order of the tests defined in the analysis model, unless the tests are explicitly listed in the tests argument of the MultAdjProc object.

The families will be tested sequentially and a truncated Holm procedure will be applied within each family (component.procedure). Lastly, the truncation parameter will be set to 0.8 in Family 1 and to 1 in Family 2 (gamma). The resulting parameters are included in the par argument of the MultAdjProc object and, as before, the proc argument is used to specify the multiple testing procedure (MultipleSequenceGatekeepingAdj).
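To make the role of the truncation parameter concrete, the following base R sketch computes the critical values of a standalone truncated Holm procedure using its standard form, in which the i-th ordered p-value out of m is compared with (gamma / (m - i + 1) + (1 - gamma) / m) * alpha. The function name truncated.holm.crit and the one-sided level of 0.025 are assumptions made for illustration. Within the full gatekeeping procedure, the level available to Family 2 depends on the outcome in Family 1, so alpha is only a placeholder for that family.

```r
# Hypothetical helper: critical values of a truncated Holm procedure
# m     = number of hypotheses in the family
# gamma = truncation parameter (0 gives Bonferroni, 1 gives the regular Holm)
# alpha = significance level available to the family
truncated.holm.crit = function(m, gamma, alpha) {
  i = seq_len(m)
  (gamma / (m - i + 1) + (1 - gamma) / m) * alpha
}

alpha = 0.025  # assumed one-sided level (not stated in the text)

# Family 1 (gamma = 0.8): thresholds for the ordered p-values
crit.family1 = truncated.holm.crit(m = 2, gamma = 0.8, alpha = alpha)

# Family 2 (gamma = 1): regular Holm thresholds
crit.family2 = truncated.holm.crit(m = 2, gamma = 1, alpha = alpha)

print(crit.family1)
print(crit.family2)
```

With gamma below 1 in Family 1, the largest threshold stays below alpha, which is what leaves a fraction of the error rate available for Family 2 when only one Family 1 hypothesis is rejected.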

The tests are then specified in the analysis model and the overall analysis model is defined as follows:
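A sketch of this specification, assuming the Mediana package is installed, could look as follows. The test and sample identifiers are illustrative names, and the tests are listed in the order H1 to H4 so that the family indices match the hypotheses defined above.

```r
library(Mediana)

# Tests to which the multiplicity adjustment applies, in the order H1, H2, H3, H4
test.list = tests("Pl vs DoseL ACR20",
                  "Pl vs DoseH ACR20",
                  "Pl vs DoseL HAQ-DI",
                  "Pl vs DoseH HAQ-DI")

# Families of null hypotheses (indices refer to positions in test.list)
family = families(family1 = c(1, 2),
                  family2 = c(3, 4))

# Component procedure used in each family (Holm, truncated through gamma)
component.procedure = families(family1 = "HolmAdj",
                               family2 = "HolmAdj")

# Truncation parameter for each family
gamma = families(family1 = 0.8,
                 family2 = 1)

# Multiple-sequence gatekeeping procedure
mult.adj = MultAdjProc(proc = "MultipleSequenceGatekeepingAdj",
                       par = parameters(family = family,
                                        proc = component.procedure,
                                        gamma = gamma),
                       tests = test.list)

# Analysis model: proportion tests for ACR20, t-tests for HAQ-DI
case.study5.analysis.model = AnalysisModel() +
  mult.adj +
  Test(id = "Pl vs DoseL ACR20",
       method = "PropTest",
       samples = samples("Placebo ACR20", "DoseL ACR20")) +
  Test(id = "Pl vs DoseH ACR20",
       method = "PropTest",
       samples = samples("Placebo ACR20", "DoseH ACR20")) +
  Test(id = "Pl vs DoseL HAQ-DI",
       method = "TTest",
       samples = samples("DoseL HAQ-DI", "Placebo HAQ-DI")) +
  Test(id = "Pl vs DoseH HAQ-DI",
       method = "TTest",
       samples = samples("DoseH HAQ-DI", "Placebo HAQ-DI"))
```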

Recall that a numerically lower value indicates a beneficial effect for the HAQ-DI score and, as a result, the experimental treatment arm must be listed before the placebo arm in the samples argument of the tests corresponding to the HAQ-DI endpoint, e.g., samples = samples("DoseL HAQ-DI", "Placebo HAQ-DI").

## Define an Evaluation Model

In order to assess the probability of success in this clinical trial, a hybrid criterion based on the conjunctive criterion (both trial endpoints must be significant) and disjunctive criterion (at least one dose-placebo comparison must be significant) can be considered.

This criterion will be met if a significant effect is established at one or two doses on Endpoint 1 (ACR20) and also at one or two doses on Endpoint 2 (HAQ-DI). However, due to the hierarchical structure of the testing strategy (see Figure), this is equivalent to demonstrating a significant difference between Placebo and at least one dose with respect to Endpoint 2. The corresponding criterion is a subset disjunctive criterion based on the two Endpoint 2 tests (subset disjunctive power was briefly mentioned in Case study 2).
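The equivalence stated above can be checked by brute force. The base R sketch below enumerates every rejection pattern for the four hypotheses, keeps only those compatible with the hierarchy (H3 can be rejected only if H1 is, and H4 only if H2 is), and compares the two criteria. All object names are illustrative.

```r
# All possible rejection patterns for (H1, H2, H3, H4); TRUE means "rejected"
patterns = expand.grid(H1 = c(FALSE, TRUE),
                       H2 = c(FALSE, TRUE),
                       H3 = c(FALSE, TRUE),
                       H4 = c(FALSE, TRUE))

# Keep only the patterns permitted by the hierarchical testing strategy
allowed = subset(patterns, (!H3 | H1) & (!H4 | H2))

# Hybrid criterion: at least one dose on Endpoint 1 AND at least one dose on Endpoint 2
hybrid = with(allowed, (H1 | H2) & (H3 | H4))

# Subset disjunctive criterion based on the two Endpoint 2 tests only
disjunctive.e2 = with(allowed, H3 | H4)

# The two criteria agree on every permitted pattern
identical(hybrid, disjunctive.e2)
```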

In addition, the sponsor may also be interested in evaluating marginal power as well as subset disjunctive power based on the Endpoint 1 tests. The latter criterion will be met if a significant difference between Placebo and at least one dose is established with respect to Endpoint 1. Additionally, as in Case study 2, the user could consider defining custom evaluation criteria. The three resulting evaluation criteria (marginal power, subset disjunctive criterion based on the Endpoint 1 tests and subset disjunctive criterion based on the Endpoint 2 tests) are included in the following evaluation model.
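A sketch of such an evaluation model, assuming the Mediana package is installed, is given below. The one-sided significance level of 0.025 is an assumption (the text does not state it), and the test identifiers and criterion labels are illustrative names that must match those used in the analysis model.

```r
library(Mediana)

alpha = 0.025  # assumed one-sided significance level

case.study5.evaluation.model = EvaluationModel() +
  # Marginal power for each of the four tests
  Criterion(id = "Marginal power",
            method = "MarginalPower",
            tests = tests("Pl vs DoseL ACR20",
                          "Pl vs DoseH ACR20",
                          "Pl vs DoseL HAQ-DI",
                          "Pl vs DoseH HAQ-DI"),
            labels = c("Pl vs DoseL ACR20",
                       "Pl vs DoseH ACR20",
                       "Pl vs DoseL HAQ-DI",
                       "Pl vs DoseH HAQ-DI"),
            par = parameters(alpha = alpha)) +
  # Subset disjunctive power: at least one dose significant on Endpoint 1
  Criterion(id = "Disjunctive power - ACR20",
            method = "DisjunctivePower",
            tests = tests("Pl vs DoseL ACR20",
                          "Pl vs DoseH ACR20"),
            labels = "Disjunctive power - ACR20",
            par = parameters(alpha = alpha)) +
  # Subset disjunctive power: at least one dose significant on Endpoint 2
  Criterion(id = "Disjunctive power - HAQ-DI",
            method = "DisjunctivePower",
            tests = tests("Pl vs DoseL HAQ-DI",
                          "Pl vs DoseH HAQ-DI"),
            labels = "Disjunctive power - HAQ-DI",
            par = parameters(alpha = alpha))
```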