How to create a data model by resampling from an existing data set?
Summary
The Mediana R package uses Monte-Carlo simulations to generate patient outcomes in a clinical trial. The simulation parameters are specified in a data model. In addition, as explained below, a data model can also be created by resampling from an existing data set, e.g., a clinical trial database.
The following case study will be used to illustrate how to create a data model by resampling from an existing data set. We will consider a database of three Phase II clinical trials. This database contains data on the primary endpoint to be used in an upcoming Phase III trial for the experimental treatment. Furthermore, a large database with control data collected in several previously conducted trials with other investigational treatments is available. The sponsor wishes to estimate statistical power in the Phase III trial by sampling from these existing databases.
For simplicity, we will consider a two-arm Phase III clinical trial with a normally distributed endpoint, e.g., a trial in patients with pulmonary arterial hypertension where the primary endpoint is the change in the six-minute walk distance.
Data model
The data model will be constructed by sampling from several pre-existing datasets. For the purposes of illustration, we will generate these datasets.
Beginning with the database containing treatment data, we consider the outcome data collected from three Phase II clinical trials with 75, 75 and 50 patients, respectively. The observed means and SDs of the primary endpoint in each trial will be used to generate the treatment database as shown below:
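As a minimal sketch of this step, the treatment database could be generated with rnorm. The means, SDs and seed used here are hypothetical placeholders, not the values observed in the Phase II trials, and the object names (study1, study2, study3, data.treatment) are chosen only for illustration.

```r
# Arbitrary seed so that the sketch is reproducible
set.seed(1234)

# One vector of outcomes per Phase II trial (75, 75 and 50 patients).
# The means and SDs are hypothetical placeholders.
study1 = rnorm(n = 75, mean = 40, sd = 70)
study2 = rnorm(n = 75, mean = 35, sd = 65)
study3 = rnorm(n = 50, mean = 45, sd = 75)

# Pool the three trials into a single treatment database
data.treatment = c(study1, study2, study3)

# Number of patients in the pooled database (75 + 75 + 50 = 200)
length(data.treatment)
```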
For the control database, consider a database set up by pooling outcome data from several development programs with the same indications (there are 3000 patients in this database). The control database will be generated using an approach similar to the one utilized above:
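A possible sketch of the control database is given here. The number of development programs, their sizes, means and SDs are assumptions made for illustration; only the total of 3000 patients comes from the description above.

```r
# Arbitrary seed so that the sketch is reproducible
set.seed(5678)

# Hypothetical development programs contributing control data.
# The sizes add up to the 3000 patients mentioned in the text.
programs = data.frame(n    = c(1200, 1000, 800),
                      mean = c(0, 5, -5),
                      sd   = c(70, 65, 75))

# Generate the outcomes program by program and pool them
data.placebo = unlist(Map(rnorm, programs$n, programs$mean, programs$sd))

# Number of patients in the control database
length(data.placebo)
```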
The data model in this case study will be constructed by sampling data from these two databases. Several sample size scenarios will be evaluated to compute power in the Phase III trial, from 40 patients per treatment arm to 70, with a step of 10 patients.
In order to sample from the treatment and control data sets, a new outcome distribution function needs to be implemented. The key idea behind this function is simply to enable sampling of outcome data from the two data sets. To create a custom outcome distribution function, please refer to this page. In addition to the sample size per arm, this function requires two parameters: the name of the existing data set and a Boolean variable indicating whether the sampling will be done with or without replacement. The custom function, named SamplingDist, is presented below.
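The following is a sketch of such a function, written to follow the calling convention that Mediana uses for custom outcome distributions as we understand it: the function receives a list whose first element is either the number of observations or the string "description", and whose second element holds the outcome parameters. The parameter names dataset and replace, the error checks, and the choice to pass the data set as a numeric vector are assumptions of this sketch.

```r
SamplingDist = function(parameter) {

  # Determine the type of call: generate outcomes or return a description
  call = (parameter[[1]] == "description")

  if (call == FALSE) {

    # Block 1: get the function's parameters
    n = parameter[[1]]                  # number of observations to generate
    dataset = parameter[[2]]$dataset    # existing data set (numeric vector)
    replace = parameter[[2]]$replace    # TRUE = sample with replacement

    # Basic error checks (assumed, others could be added)
    if (n %% 1 != 0 || n <= 0)
      stop("SamplingDist: Number of observations must be a positive integer.")
    if (!replace && n > length(dataset))
      stop("SamplingDist: Cannot draw more values than available without replacement.")

    # Block 2: sample n values from the data set.
    # Indexing via sample.int avoids the special behaviour of sample()
    # when the data set contains a single number.
    result = dataset[sample.int(length(dataset), size = n, replace = replace)]

  } else {

    # Block 3: labels of the parameters and of the distribution,
    # used when a simulation report is generated
    result = list(list(dataset = "dataset", replace = "replace"),
                  list("Sampling"))
  }
  return(result)
}

# Direct call outside of Mediana: draw 5 values with replacement
SamplingDist(list(5, list(dataset = c(10, 20, 30), replace = TRUE)))
```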
The first block in the SamplingDist function is used to get the function’s parameters, i.e., the data set’s name and the Boolean indicator. The second block focuses on sampling n values from the dataset, with or without replacement (based on replace). Lastly, the third block creates an object that will be used in a simulation report and will not be discussed here.
For the purpose of illustration, we will consider a sampling scheme with replacement. The outcome parameters for each trial arm and the data model can be defined as follows. The outcome distribution specified in the OutcomeDist object is SamplingDist.
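A sketch of this definition is shown here. It assumes that the two databases are stored in numeric vectors named data.placebo and data.treatment (hypothetical names; small stand-in vectors are created if they do not exist) and that the parameters of the custom function are called dataset and replace. The sample IDs "Placebo" and "Treatment" are also assumptions. The SamplingDist function itself must be defined before the simulations are run.

```r
library(Mediana)

# Stand-in databases with hypothetical values, created only if the
# treatment and control databases are not already available
if (!exists("data.treatment")) data.treatment = rnorm(n = 200, mean = 40, sd = 70)
if (!exists("data.placebo")) data.placebo = rnorm(n = 3000, mean = 0, sd = 70)

# Outcome parameters: data set to sample from, sampling with replacement
outcome.placebo = parameters(dataset = data.placebo, replace = TRUE)
outcome.treatment = parameters(dataset = data.treatment, replace = TRUE)

# Data model: custom outcome distribution and four sample size
# scenarios (40, 50, 60 and 70 patients per arm)
data.model = DataModel() +
  OutcomeDist(outcome.dist = "SamplingDist") +
  SampleSize(seq(40, 70, 10)) +
  Sample(id = "Placebo",
         outcome.par = parameters(outcome.placebo)) +
  Sample(id = "Treatment",
         outcome.par = parameters(outcome.treatment))
```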
After the data model has been set up, the analysis model and the evaluation model can be defined using a standard approach.
Analysis model
The analysis model defines a single significance test that will be carried out in the Phase III trial (treatment versus placebo). The treatment effect will be assessed using the one-sided two-sample t-test:
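A sketch of this analysis model is given here. The test ID and the sample IDs ("Placebo", "Treatment") are assumptions and must match the IDs used in the data model. To our understanding, the TTest method in Mediana is one-sided and by default treats larger values in the second sample as beneficial, so the order of the samples matters.

```r
library(Mediana)

# Analysis model with a single one-sided two-sample t-test
# (control sample listed first, treatment sample second)
analysis.model = AnalysisModel() +
  Test(id = "Placebo vs treatment",
       samples = samples("Placebo", "Treatment"),
       method = "TTest")
```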
Evaluation model
The data and analysis models specified above define the Clinical Scenarios that will be examined in the Phase III trial. In general, clinical scenarios are evaluated using success criteria based on the trial’s clinical objectives. Regular power, also known as marginal power, will be computed in this trial. This success criterion is specified in the following evaluation model.
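A sketch of such an evaluation model follows. The criterion ID, the label and the one-sided significance level of 0.025 are assumptions, and the test ID must match the one used in the analysis model.

```r
library(Mediana)

# Evaluation model: marginal power of the treatment comparison
evaluation.model = EvaluationModel() +
  Criterion(id = "Marginal power",
            method = "MarginalPower",
            tests = tests("Placebo vs treatment"),
            labels = c("Placebo vs treatment"),
            par = parameters(alpha = 0.025))  # assumed one-sided alpha
```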
Perform Clinical Scenario Evaluation
After the clinical scenarios (data and analysis models) and evaluation model have been defined, the user is ready to evaluate the success criteria specified in the evaluation model by invoking the CSE function. The simulation parameters need to be defined in a SimParameters object:
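For instance, the simulation parameters could be set up as follows. The number of simulation runs, the processor load and the seed are illustrative values only.

```r
library(Mediana)

# Simulation parameters (illustrative values)
sim.parameters = SimParameters(n.sims = 1000,      # number of simulation runs
                               proc.load = "full", # use all available cores
                               seed = 42938001)    # seed for reproducibility
```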
The CSE call specifies the individual components of Clinical Scenario Evaluation in this case study as well as the simulation parameters:
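A guarded sketch of the call is shown here. It assumes that the objects data.model, analysis.model, evaluation.model and sim.parameters, as well as the custom SamplingDist function, have already been created in the current session under these (hypothetical) names; if any of them is missing, the sketch only reports which ones.

```r
library(Mediana)

# Objects assumed to have been created in the previous steps
required = c("SamplingDist", "data.model", "analysis.model",
             "evaluation.model", "sim.parameters")
available = sapply(required, exists)

if (all(available)) {
  # Perform Clinical Scenario Evaluation
  results = CSE(data.model,
                analysis.model,
                evaluation.model,
                sim.parameters)

  # Power for each sample size scenario
  print(summary(results))
} else {
  message("Define the following objects first: ",
          paste(required[!available], collapse = ", "))
}
```

The resulting summary is expected to contain one row per sample size scenario, so power can be compared across 40, 50, 60 and 70 patients per arm.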