Bayesian Adaptive Trial Design for a Continuous Biomarker with Possibly Nonlinear or Nonmonotone Prognostic or Predictive Effects

Journal Article

Department of Human Genetics, University of Chicago, Chicago, Illinois, USA

Department of Biostatistics, University of Florida, Gainesville, Florida, USA

Division of Biostatistics, University of Southern California and Children's Oncology Group, Los Angeles, California, USA

Correspondence: Lindsay A. Renfro, Division of Biostatistics, University of Southern California and Children's Oncology Group, Los Angeles, CA 90007, USA. Email: lrenfro@usc.edu

Received: 21 February 2020

Published: 20 August 2021

Citation: Yusha Liu, John A. Kairalla, Lindsay A. Renfro, Bayesian Adaptive Trial Design for a Continuous Biomarker with Possibly Nonlinear or Nonmonotone Prognostic or Predictive Effects, Biometrics, Volume 78, Issue 4, December 2022, Pages 1441–1453, https://doi.org/10.1111/biom.13550


Abstract

As diseases like cancer are increasingly understood on a molecular level, clinical trials are being designed to reveal or validate subpopulations in which an experimental therapy has enhanced benefit. Such biomarker-driven designs, particularly “adaptive enrichment” designs that initially enroll an unselected population and then allow accrual to be restricted to “marker-positive” patients based on interim results, are increasingly popular. Many biomarkers of interest are naturally continuous, however, and most existing design approaches either require upfront dichotomization or force monotonicity through algorithmic searches for a single marker threshold. Both strategies exclude the possibility that the continuous biomarker has a truly nonlinear or nonmonotone (possibly disjoint) prognostic relationship with outcome or predictive relationship with treatment effect. To address this, we propose a novel trial design that leverages both the actual shapes of any continuous marker effects (both prognostic and predictive) and their corresponding posterior uncertainty in an adaptive decision-making framework. At interim analyses, this marker knowledge is updated and overall or marker-driven decisions are reached, such as continuing enrollment to the next interim analysis or terminating early for efficacy or futility. Using simulations and patient-level data from a multi-center Children's Oncology Group trial in Acute Lymphoblastic Leukemia, we derive the operating characteristics of our design and compare its performance to a traditional approach that identifies and applies a dichotomizing marker threshold.

1 Introduction

The last two decades of clinical cancer research have brought a major paradigm shift in how cancer is understood on a molecular level, which in turn has changed how novel therapeutic strategies are developed and evaluated. Although specific cancers and their associated treatments were once classified predominantly by a tumor's stage and location (e.g., colon or breast), modern characterization of a patient's disease now routinely includes germline or tumor mutations, laboratory serum measurements, immune markers, or other characteristics, which are being studied or leveraged to select the best therapeutic strategy for each patient. Meanwhile, cancer clinical trial biostatisticians have had to keep pace with the clinical and translational science, because the now-common strategy of evaluating novel therapies and biomarkers in combination requires a parallel shift in clinical trial design (Renfro et al., 2017). Specifically, the classical randomized clinical trial framework must be modernized to accommodate biomarker-driven adaptation, decision-making, and possible enrichment of accrual to those patients discovered mid-trial to benefit most from the experimental treatment under study.

There is a rich literature on biomarker-based adaptive enrichment designs, which initially enroll and randomize patients to the experimental versus control arms regardless of their biomarker status, and allow accrual to a marker-defined patient subset to be terminated mid-trial if interim analyses suggest that these patients derive no enhanced benefit from the experimental treatment. However, almost all of these adaptive enrichment designs assume that the biomarker is dichotomous, that is, that “marker-positive” and “marker-negative” subgroups are defined before the trial starts. One of the first such designs was proposed by Wang et al. (2007), who developed a two-stage adaptive design in the presence of a binary genomic biomarker that allowed mid-trial adjustment to recruit only the marker-positive subgroup based on an interim analysis. Subsequently, others have proposed similar designs or extensions for dichotomous or categorical markers, including Karuri and Simon (2012), Song (2014), Brannath et al. (2009), Jenkins et al. (2011), Friede et al. (2012), and Mehta et al. (2014).

To our knowledge, there is a paucity of trial designs that both assess the predictive effect of a truly continuous biomarker and carry out adaptive decision rules according to interim assessments of the marker's current relationships with outcome and treatment effect. Renfro et al. (2014) proposed a randomized trial design in which, at an interim analysis, candidate cut-points along a continuous biomarker are evaluated and the threshold yielding the strongest treatment-by-(dichotomized) marker interaction is potentially used for subsequent adaptations, including marker-based enrichment of accrual. Ohwada and Morita (2016) applied a four-parameter Bayesian change-point model to the relationship between a continuous biomarker and the treatment effect, measured by the hazard ratio between the two arms, assuming that the marker is possibly predictive but not prognostic. At interim analyses, they used the posterior distributions of the model parameters to identify the cutoff value for the sensitive subpopulation and to determine whether to stop the trial early or restrict enrollment to the sensitive subpopulation. Both Renfro et al. (2014) and Ohwada and Morita (2016) assumed a known, monotone direction of benefit along the continuous marker, whereas in practice such prior knowledge might not be available and the treatment effect might turn out to be a nonmonotone function of the biomarker value. Moreover, although cut-point and change-point models are simple in form and interpretable to clinicians, they discard critical information about the shape of the relationship of interest, which may prevent identification of a truly responsive subpopulation with sufficient power and an adequately controlled type I error rate.

In this paper, we propose a new Bayesian adaptive trial design for continuous biomarkers that may have truly nonlinear or nonmonotone prognostic relationships with outcome or predictive relationships with treatment effect. Importantly, our design does not dichotomize the marker through an algorithmic grid search for thresholds; instead, it estimates continuous biomarker relationships directly and quantifies the posterior uncertainty around these estimates. At one or more interim analyses, this information is used to evaluate whether any marker-based subpopulation shows a sufficiently enhanced treatment effect, and whether early termination for efficacy or futility is warranted, either overall or in a marker-defined subgroup. Throughout, a marker's continuous relationships with outcome and treatment effect are preserved, and their quantified posterior uncertainty is leveraged to yield better trial decisions.

The remainder of the paper is organized as follows. In Section 2, we present the notation, models, and algorithm associated with our trial design. In Section 3, we perform a simulation study to assess the operating characteristics of our design under a range of true underlying marker scenarios, and compare its performance to a more typical marker-driven design that includes automated marker dichotomization and cut-point selection. In Section 4, we apply our design and the comparator design to individual patient data from a clinical trial in Acute Lymphoblastic Leukemia, and in Section 5, we offer some concluding remarks. Throughout this paper, we assume there is one continuous biomarker of interest, but our proposed framework can be extended to handle a small number of continuous biomarkers if we further assume their prognostic and predictive effects are independent and additive; details are given in Web Appendix C.

2 Methods

Assume that a randomized clinical trial is planned to study the effect of an experimental treatment, relative to a control treatment, on a time-to-event endpoint $T$. Let $Z$ denote the treatment assignment, with $Z = 1$ for the experimental arm and $Z = 0$ for the control arm. We further assume that a continuous and potentially predictive biomarker $X$, taking values in the range $[a, b]$ for some $a < b$ in $\mathbb{R}$, is measured at baseline for each patient.

2.1 Modeling the Continuous Biomarker-Driven Effect Using Bayesian Penalized Splines

2.1.1 Model Formulation

The effect of the continuous biomarker X on the endpoint T is modeled by a proportional hazards model. Mathematically, this can be expressed as

$$
\lambda(t \mid x, z) = \lambda_0(t)\exp\{g(x) + h(x)z\},
\tag{2.1}
$$

where $\lambda_0(t)$ and $\lambda(t \mid x, z)$ denote, respectively, the baseline hazard function and the conditional hazard function given biomarker value $x$ and treatment assignment $z$; $g(x)$ represents the prognostic effect of the marker in the control arm; and $h(x)$ represents the marker-driven effect of the experimental treatment relative to the control, on the log hazard ratio scale. To adequately characterize the effect of the continuous marker on the clinical outcome, we use B-splines to model $g(x)$ and $h(x)$. For a patient with biomarker value $x$,

$$
g(x) = \sum_{m=1}^{M+4} B_m(x)\,\nu_{g,m}, \qquad h(x) = \sum_{m=1}^{M+4} B_m(x)\,\nu_{h,m},
\tag{2.2}
$$

where $B_1(x), \dots, B_{M+4}(x)$ are the cubic B-spline basis functions defined by the knot sequence $\eta_1, \dots, \eta_{M+8}$ such that

$$
a = \eta_1 = \eta_2 = \eta_3 = \eta_4 < \eta_5 < \cdots < \eta_{M+4} < \eta_{M+5} = \eta_{M+6} = \eta_{M+7} = \eta_{M+8} = b.
\tag{2.3}
$$

The $M$ interior knots $\eta_5, \dots, \eta_{M+4}$ can be chosen in a data-adaptive manner; for example, the $m$th interior knot can be placed at the $m/(M+1)$ sample quantile of the observed $x$ values.
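A minimal NumPy sketch of the knot placement in (2.3) and the cubic B-spline basis in (2.2). It assumes, for illustration only, that the boundary knots $a$ and $b$ are the sample minimum and maximum of the marker, and it evaluates the basis with the standard Cox-de Boor recursion. `knot_sequence` and `bspline_basis` are hypothetical helper names, not the authors' software.

```python
import numpy as np

def knot_sequence(x, M):
    """Knots eta_1, ..., eta_{M+8} as in (2.3): a and b each repeated four times,
    with the m-th interior knot at the m/(M+1) sample quantile of x."""
    x = np.asarray(x, dtype=float)
    a, b = x.min(), x.max()          # assumption: [a, b] = observed marker range
    interior = np.quantile(x, np.arange(1, M + 1) / (M + 1))
    return np.concatenate([np.full(4, a), interior, np.full(4, b)])

def bspline_basis(x, knots, degree=3):
    """Design matrix with (i, m) entry B_m(x_i), via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicators of the half-open intervals [t_j, t_{j+1}).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (x >= t[j]) & (x < t[j + 1])
    # Close the last non-empty interval on the right so that x = b is covered.
    last = np.nonzero(t[:-1] < t[1:])[0].max()
    B[x == t[-1], last] = 1.0
    # Raise the degree one step at a time; zero-length spans contribute nothing.
    for k in range(1, degree + 1):
        B_next = np.zeros((x.size, t.size - 1 - k))
        for j in range(t.size - 1 - k):
            if t[j + k] > t[j]:
                B_next[:, j] += (x - t[j]) / (t[j + k] - t[j]) * B[:, j]
            if t[j + k + 1] > t[j + 1]:
                B_next[:, j] += (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * B[:, j + 1]
        B = B_next
    return B

rng = np.random.default_rng(2024)
x = rng.uniform(0.0, 1.0, size=200)     # marker values on (0, 1)
M = 10
knots = knot_sequence(x, M)
B = bspline_basis(x, knots)
print(B.shape)                          # n rows and M + 4 columns
print(np.allclose(B.sum(axis=1), 1.0))  # cubic B-splines sum to 1 on [a, b]
```

The repeated boundary knots are what make the basis a partition of unity on the whole interval $[a, b]$, which the second check confirms.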

For $n$ patients with marker values $x_1, \dots, x_n$, (2.2) can be rewritten in matrix form as

$$
g(\mathbf{x}) = B\nu_g, \qquad h(\mathbf{x}) = B\nu_h,
\tag{2.4}
$$

where $\mathbf{x} = (x_1, \dots, x_n)'$, $g(\mathbf{x}) = (g(x_1), \dots, g(x_n))'$, $h(\mathbf{x}) = (h(x_1), \dots, h(x_n))'$, $B$ is the $n \times (M+4)$ B-spline design matrix with $(i,m)$th entry $B_{im} = B_m(x_i)$, $\nu_g = (\nu_{g,1}, \dots, \nu_{g,M+4})'$, and $\nu_h = (\nu_{h,1}, \dots, \nu_{h,M+4})'$. As is typical in the smoothing spline literature (Wand and Ormerod, 2008), an $L_2$ penalty is imposed on the integrated squared second derivative of each of $g(x)$ and $h(x)$ to induce smoothness and prevent overfitting, that is,

$$
\int_a^b \{\ddot g(x)\}^2\,dx = \nu_g'\Omega\nu_g, \qquad \int_a^b \{\ddot h(x)\}^2\,dx = \nu_h'\Omega\nu_h,
\tag{2.5}
$$

where $\Omega$ is the $(M+4) \times (M+4)$ penalty matrix with $(m,m')$th entry $\Omega_{mm'} = \int_a^b \ddot B_m(x)\,\ddot B_{m'}(x)\,dx$. Although the penalized spline model can be fit in a frequentist framework, here we adopt a fully Bayesian approach that quantifies the posterior uncertainty around the estimates of $g(x)$ and $h(x)$. In a Bayesian framework, imposing the $L_2$ penalty in (2.5) is equivalent to placing the following priors on $\nu_g$ and $\nu_h$:

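$$
p(\nu_g) \propto \exp\left(-\frac{1}{2\sigma_g^2}\,\nu_g'\Omega\nu_g\right), \qquad p(\nu_h) \propto \exp\left(-\frac{1}{2\sigma_h^2}\,\nu_h'\Omega\nu_h\right),
\tag{2.6}
$$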

where $\sigma_g^2$ and $\sigma_h^2$ are regularization parameters.

It is known from the smoothing spline literature (Wand and Ormerod, 2008) that $\operatorname{rank}(\Omega) = M + 2$. Let $d_1, \dots, d_{M+2}$ denote the $M+2$ strictly positive eigenvalues of $\Omega$, and let $W_\Omega$ denote the $(M+4) \times (M+2)$ matrix whose columns are the corresponding eigenvectors of $\Omega$. With some algebraic rearrangements provided in Web Appendix A, we can reformulate (2.4) as

$$
g(\mathbf{x}) = \mathbf{x}\beta_g + W_B u_g, \qquad h(\mathbf{x}) = \mathbf{1}_n\beta_{h,1} + \mathbf{x}\beta_{h,2} + W_B u_h,
\tag{2.7}
$$

where $\mathbf{1}_n$ is an $n \times 1$ vector of 1's; $\beta_g$, $\beta_{h,1}$, and $\beta_{h,2}$ are scalars with diffuse priors; $u_g$ and $u_h$ are $(M+2) \times 1$ random vectors with mutually independent Gaussian priors $u_g \sim \mathrm{MVN}(0, \sigma_g^2 I_{M+2})$ and $u_h \sim \mathrm{MVN}(0, \sigma_h^2 I_{M+2})$; and $W_B = B W_\Omega \operatorname{diag}(d_1^{-1/2}, \dots, d_{M+2}^{-1/2})$. To complete the model specification, uninformative inverse-gamma priors are placed on $\sigma_g^2$ and $\sigma_h^2$, respectively:

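$$
p(\sigma_g^2) \propto (\sigma_g^2)^{-a_{0,g}-1}\exp\left(-\frac{b_{0,g}}{\sigma_g^2}\right), \qquad p(\sigma_h^2) \propto (\sigma_h^2)^{-a_{0,h}-1}\exp\left(-\frac{b_{0,h}}{\sigma_h^2}\right).
\tag{2.8}
$$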

In our simulation and data application, we set $a_{0,g} = b_{0,g} = a_{0,h} = b_{0,h} = 0.01$.
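The reparameterization in (2.7) can be checked numerically. The self-contained Python sketch below includes small hypothetical helpers that place the knots and evaluate the cubic B-spline basis (Cox-de Boor recursion), and approximates $\Omega$ with central second differences and the midpoint rule, a numerical stand-in for the exact integrals in (2.5). It confirms that $\operatorname{rank}(\Omega) = M + 2$, builds $W_B = B W_\Omega \operatorname{diag}(d_1^{-1/2}, \dots, d_{M+2}^{-1/2})$, and verifies that an arbitrary coefficient vector $\nu$ splits into a penalized part $u$ with $\nu'\Omega\nu = u'u$ plus an unpenalized part that is a straight line in $x$, the role played by the $\mathbf{1}_n$ and $\mathbf{x}$ terms in (2.7).

```python
import numpy as np

def knot_sequence(x, M):
    """Boundary knots repeated 4 times; interior knots at the m/(M+1) sample quantiles."""
    interior = np.quantile(x, np.arange(1, M + 1) / (M + 1))
    return np.concatenate([np.full(4, x.min()), interior, np.full(4, x.max())])

def bspline_basis(x, t, degree=3):
    """Cubic B-spline design matrix via the Cox-de Boor recursion."""
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (x >= t[j]) & (x < t[j + 1])
    B[x == t[-1], np.nonzero(t[:-1] < t[1:])[0].max()] = 1.0
    for k in range(1, degree + 1):
        B_next = np.zeros((x.size, t.size - 1 - k))
        for j in range(t.size - 1 - k):
            if t[j + k] > t[j]:
                B_next[:, j] += (x - t[j]) / (t[j + k] - t[j]) * B[:, j]
            if t[j + k + 1] > t[j + 1]:
                B_next[:, j] += (t[j + k + 1] - x) / (t[j + k + 1] - t[j + 1]) * B[:, j + 1]
        B = B_next
    return B

def penalty_matrix(t, n_cells=4000):
    """Approximate Omega_{mm'} = int_a^b B''_m(x) B''_m'(x) dx numerically."""
    a, b = t[0], t[-1]
    dx = (b - a) / n_cells
    mid = a + (np.arange(n_cells) + 0.5) * dx   # midpoint-rule nodes
    s = 0.4 * dx                                # keeps mid +/- s inside [a, b]
    d2 = (bspline_basis(mid + s, t) - 2.0 * bspline_basis(mid, t)
          + bspline_basis(mid - s, t)) / s ** 2
    return d2.T @ d2 * dx

rng = np.random.default_rng(7)
M = 10
x = rng.uniform(0.0, 1.0, size=300)
knots = knot_sequence(x, M)
B = bspline_basis(x, knots)
Omega = penalty_matrix(knots)

eigval, eigvec = np.linalg.eigh(Omega)             # eigenvalues in ascending order
rank = int(np.sum(eigval > 1e-8 * eigval.max()))   # expected: M + 2
d, W_Omega = eigval[-rank:], eigvec[:, -rank:]     # strictly positive eigenpairs
null_vecs = eigvec[:, :-rank]                      # null space: linear functions of x
W_B = B @ W_Omega @ np.diag(d ** -0.5)

nu = rng.normal(size=M + 4)                        # an arbitrary coefficient vector
u = np.sqrt(d) * (W_Omega.T @ nu)                  # penalized (random-effect) part
linear_part = B @ (null_vecs @ (null_vecs.T @ nu))
X1 = np.column_stack([np.ones_like(x), x])
coef = np.linalg.lstsq(X1, linear_part, rcond=None)[0]

print(rank == M + 2)
print(np.isclose(nu @ Omega @ nu, u @ u))             # nu' Omega nu = u'u
print(np.allclose(B @ nu, W_B @ u + linear_part))     # B nu = W_B u + unpenalized part
print(np.allclose(X1 @ coef, linear_part, atol=1e-4)) # that part is intercept + slope * x
```

Because the finite-difference approximation annihilates linear functions exactly (up to rounding), its null space matches that of the exact $\Omega$, so the rank check does not depend on the choice of `n_cells`.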

2.1.2 Posterior Computation

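Let $D = \{(t_i, \delta_i, x_i, z_i); i = 1, \dots, n\}$ be the patient data observed up to a given analysis time point, where $t_i$ is the follow-up time for endpoint $T$ of patient $i$, $\delta_i$ equals 1 if patient $i$ experiences an event and 0 if right-censored, $x_i$ is the biomarker value, and $z_i$ is the treatment assignment of patient $i$. Let $\theta_g = (\beta_g, u_g')'$ and $\theta_h = (\beta_{h,1}, \beta_{h,2}, u_h')'$. Given (2.1), (2.7), and (2.8), posterior sampling proceeds differently depending on the assumption made about $\lambda_0(t)$. If $\lambda_0(t)$ is left unspecified, the likelihood is replaced by the Cox partial likelihood $PL(\theta_g, \theta_h \mid D)$, and the joint posterior is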

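$$
\begin{aligned}
\pi(\theta_g, \theta_h, \sigma_g^2, \sigma_h^2 \mid D)
&\propto PL(\theta_g, \theta_h \mid D)\, p(u_g \mid \sigma_g^2)\, p(u_h \mid \sigma_h^2)\, p(\sigma_g^2)\, p(\sigma_h^2) \\
&\propto PL(\theta_g, \theta_h \mid D) \times \prod_{m=1}^{M+2} \frac{1}{\sigma_g}\exp\left(-\frac{u_{g,m}^2}{2\sigma_g^2}\right) \times \prod_{m=1}^{M+2} \frac{1}{\sigma_h}\exp\left(-\frac{u_{h,m}^2}{2\sigma_h^2}\right) \\
&\quad \times (\sigma_g^2)^{-a_{0,g}-1}\exp\left(-\frac{b_{0,g}}{\sigma_g^2}\right) \times (\sigma_h^2)^{-a_{0,h}-1}\exp\left(-\frac{b_{0,h}}{\sigma_h^2}\right).
\end{aligned}
\tag{2.9}
$$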

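If instead the baseline hazard is assumed constant, $\lambda_0(t) = \lambda_0$, with a gamma prior $p(\lambda_0) \propto \lambda_0^{a_{0,\lambda_0}-1}\exp(-b_{0,\lambda_0}\lambda_0)$, the full likelihood $L$ is used and the joint posterior becomes

$$
\begin{aligned}
\pi(\theta_g, \theta_h, \sigma_g^2, \sigma_h^2, \lambda_0 \mid D)
&\propto L(\theta_g, \theta_h, \lambda_0 \mid D)\, p(u_g \mid \sigma_g^2)\, p(u_h \mid \sigma_h^2)\, p(\sigma_g^2)\, p(\sigma_h^2)\, p(\lambda_0) \\
&\propto \prod_{i=1}^{n} \left[\lambda_0 \exp\{g(x_i) + h(x_i) z_i\}\right]^{\delta_i} \exp\left[-\lambda_0 \exp\{g(x_i) + h(x_i) z_i\}\, t_i\right] \\
&\quad \times \prod_{m=1}^{M+2} \frac{1}{\sigma_g}\exp\left(-\frac{u_{g,m}^2}{2\sigma_g^2}\right) \times \prod_{m=1}^{M+2} \frac{1}{\sigma_h}\exp\left(-\frac{u_{h,m}^2}{2\sigma_h^2}\right) \\
&\quad \times (\sigma_g^2)^{-a_{0,g}-1}\exp\left(-\frac{b_{0,g}}{\sigma_g^2}\right) \times (\sigma_h^2)^{-a_{0,h}-1}\exp\left(-\frac{b_{0,h}}{\sigma_h^2}\right) \times \lambda_0^{a_{0,\lambda_0}-1}\exp(-b_{0,\lambda_0}\lambda_0).
\end{aligned}
\tag{2.10}
$$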

The parameters $\sigma_g^2$, $\sigma_h^2$, and (when the baseline hazard is constant) $\lambda_0$ have conjugate full conditional distributions and are updated via Gibbs steps. $\theta_g$ and $\theta_h$ do not have closed-form full conditional distributions and are therefore updated using adaptive Metropolis samplers. To avoid small Metropolis acceptance probabilities, the number of interior knots $M$ is set to be no larger than 10, which we find sufficient to model the biomarker effects $g(x)$ and $h(x)$ well under the various marker scenarios considered in our simulations. In addition to estimation, the posterior samples of $\theta_g$ and $\theta_h$ can be used for Bayesian inference, including the construction of credible bands for $g(x)$ and $h(x)$.
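The conjugate updates follow from (2.8) to (2.10): given the $M+2$ entries of $u_g$, $\sigma_g^2$ has an inverse-gamma full conditional with shape $a_{0,g} + (M+2)/2$ and scale $b_{0,g} + u_g'u_g/2$ (likewise for $\sigma_h^2$), and a constant $\lambda_0$ has a gamma full conditional with shape $a_{0,\lambda_0} + \sum_i \delta_i$ and rate $b_{0,\lambda_0} + \sum_i \exp\{g(x_i) + h(x_i)z_i\}\,t_i$. The Python sketch below implements these two Gibbs steps and, as a simplified stand-in for the adaptive Metropolis samplers, a plain random-walk Metropolis step with a fixed proposal scale; the function names and the toy target are hypothetical.

```python
import numpy as np

def draw_sigma2(u, a0, b0, rng):
    """Gibbs step: sigma^2 | u ~ Inverse-Gamma(a0 + len(u)/2, b0 + u'u/2)."""
    shape = a0 + u.size / 2.0
    scale = b0 + 0.5 * float(u @ u)
    return scale / rng.gamma(shape)   # scale / Gamma(shape, 1) is Inverse-Gamma(shape, scale)

def draw_lambda0(eta, t, delta, a0, b0, rng):
    """Gibbs step for a constant baseline hazard, with eta_i = g(x_i) + h(x_i) z_i:
    lambda0 | rest ~ Gamma(shape a0 + sum(delta), rate b0 + sum(exp(eta) * t))."""
    shape = a0 + np.sum(delta)
    rate = b0 + np.sum(np.exp(eta) * t)
    return rng.gamma(shape, 1.0 / rate)   # NumPy parameterizes the gamma by scale = 1/rate

def metropolis_step(theta, log_target, step, rng):
    """Random-walk Metropolis update for a block such as theta_g or theta_h.
    The step size is fixed here; an adaptive sampler would tune it during burn-in."""
    proposal = theta + step * rng.standard_normal(theta.size)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        return proposal, True
    return theta, False

rng = np.random.default_rng(1)
u_g = rng.normal(0.0, 0.5, size=12)                 # M + 2 = 12 entries when M = 10
draws = np.array([draw_sigma2(u_g, 0.01, 0.01, rng) for _ in range(100_000)])
shape, scale = 0.01 + 6.0, 0.01 + 0.5 * float(u_g @ u_g)
print(draws.mean(), scale / (shape - 1.0))          # Monte Carlo vs. exact mean

theta, n_accept, chain = np.zeros(2), 0, []
for _ in range(20_000):                             # toy target: standard bivariate normal
    theta, accepted = metropolis_step(theta, lambda v: -0.5 * float(v @ v), 1.0, rng)
    n_accept += accepted
    chain.append(theta)
chain = np.array(chain)
print(n_accept / 20_000, chain.mean(axis=0))
```

In a full sampler, `log_target` would be the log of (2.9) or (2.10) viewed as a function of $\theta_g$ or $\theta_h$ with the other parameters held at their current values.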

2.2 Bayesian Adaptive Design

Throughout the paper, we assume 1:1 randomization to the two treatment arms. Suppose there are $K$ interim analyses during the trial. In practice, the timing of an interim analysis is usually tied to a prespecified percentage of the total number of events required for the final analysis. Let $D_k = \{(t_i, \delta_i, x_i, z_i); i = 1, \dots, n_k\}$ denote the patient data observed by interim time point $k \in \{1, \dots, K\}$, where $n_k$ is the number of patients enrolled by time $k$.

2.2.1 Predictive Marker Effect Evaluation for Potential Subpopulations

At interim check $k$, we first use the posterior distribution of $h(x)$ to decide whether there is sufficient evidence of a differential treatment effect to classify patients by marker $X$ into a marker-positive cohort $X^+$ and a marker-negative cohort $X^-$. Based on this decision, we then perform either marker-cohort-specific or overall interim analyses of the treatment effect.

To determine whether and how to classify patients based on X, one sensible strategy is to define

$$
X^{+} := \{x \in [a, b] : P(h(x) > 0 \mid D_k) < \alpha\},
\tag{2.11}
$$

for a prespecified significance level $\alpha$, with $X^-$ defined as the complement of $X^+$. This strategy separates the marker values at which the experimental treatment is highly likely to be efficacious (that is, to lower the hazard) from the other marker values. To maintain a reasonable sample size and other operating characteristics for both marker cohorts, we additionally require that the prevalence of $X^+$, estimated from the patients enrolled so far, fall within the interval $[\varepsilon_{\text{prev}}, 1 - \varepsilon_{\text{prev}}]$ for some prespecified value $\varepsilon_{\text{prev}}$. In practice, $\varepsilon_{\text{prev}}$ might take a value between 0.05 and 0.25 and can be chosen by the user based on other constraints of the trial. If a sensitive subpopulation is identified, we subsequently check for early efficacy and futility independently within each marker cohort. Compared with an overall cohort analysis, this allows us to draw marker-cohort-specific conclusions and to distinguish the marker-defined subpopulation that truly benefits from the experimental treatment from patients who do not. If the estimated marker prevalence falls outside $[\varepsilon_{\text{prev}}, 1 - \varepsilon_{\text{prev}}]$, we do not dichotomize the marker and instead perform an interim analysis for efficacy and futility on the overall cohort.
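Rule (2.11) and the prevalence check can be applied directly to MCMC output. The sketch below assumes posterior draws of $h(x)$ evaluated on a grid of marker values and, as a simplifying assumption of ours, estimates the prevalence of $X^+$ by assigning each enrolled patient to the nearest grid point. The function names and the synthetic draws in the usage example are hypothetical; the defaults $\alpha = 0.05$ and $\varepsilon_{\text{prev}} = 0.20$ are the values used in the simulation study.

```python
import numpy as np

def marker_positive_grid(h_draws, alpha):
    """Rule (2.11) on a grid: True where P(h(x) > 0 | D_k) < alpha.
    h_draws has shape (n_draws, n_grid)."""
    prob_harm = np.mean(h_draws > 0.0, axis=0)
    return prob_harm < alpha

def positive_prevalence(x_enrolled, grid, positive):
    """Share of enrolled patients whose (nearest grid) marker value lies in X+."""
    nearest = np.abs(x_enrolled[:, None] - grid[None, :]).argmin(axis=1)
    return float(positive[nearest].mean())

def classify_marker(h_draws, grid, x_enrolled, alpha=0.05, eps_prev=0.20):
    """Return (X+ indicator on the grid, prevalence), or (None, prevalence) when the
    prevalence falls outside [eps_prev, 1 - eps_prev] and the overall cohort is analyzed."""
    positive = marker_positive_grid(h_draws, alpha)
    prevalence = positive_prevalence(x_enrolled, grid, positive)
    if eps_prev <= prevalence <= 1.0 - eps_prev:
        return positive, prevalence
    return None, prevalence

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 1.0, 101)
true_h = np.log(0.4) * (grid > 0.5)                               # a dichotomous shape, for illustration
h_draws = true_h + rng.normal(0.0, 0.2, size=(2000, grid.size))   # hypothetical posterior draws
x_enrolled = rng.uniform(0.0, 1.0, size=250)
positive, prevalence = classify_marker(h_draws, grid, x_enrolled)
print(prevalence)
```

Nothing in the rule forces $X^+$ to be a single interval: with a nonmonotone $h(x)$, the positive grid points can form several disjoint runs.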

Model for the overall cohort. Suppose that at interim analysis time point $k$ we find no evidence of sensitive versus nonsensitive subpopulations based on the continuous marker $X$, and therefore compare the two treatment arms in the overall cohort using the current data. To check for early efficacy and futility, we specify a parametric model for the baseline hazard, $\lambda_0(t) = f(\gamma, t)$, where $f$ is a known function of the finite-dimensional parameter $\gamma$ and follow-up time $t$. In practice, previous similar trials can help determine which parametric model best fits time-to-event data of this type. For example, $\lambda_0(t) = r t^{r-1}$ denotes a Weibull model with shape parameter $r$, in which case $\gamma = r$. For treatment assignment $z$, $\lambda(t \mid z) = \lambda_0(t)\exp(\alpha_0 + \alpha_1 z) = f(\gamma, t)\exp(\alpha_0 + \alpha_1 z)$, where $\alpha_1$ is the overall experimental treatment effect on the log hazard ratio scale. Vague priors are placed on the overall model's parameters $\gamma$, $\alpha_0$, and $\alpha_1$. A reasonable choice of prior for $\alpha_0$ and $\alpha_1$ is $N(0, 1/\tau)$, where $\tau$ is a precision parameter set to a small value.

The joint posterior density of $(\alpha_0, \alpha_1, \gamma)$ based on data $D_k$ can be written as

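$$
\begin{aligned}
\pi_k(\alpha_0, \alpha_1, \gamma \mid D_k)
&\propto L_k(\alpha_0, \alpha_1, \gamma \mid D_k)\, p(\alpha_0)\, p(\alpha_1)\, p(\gamma) \\
&\propto \prod_{i=1}^{n_k} \left\{f(\gamma, t_i)\exp(\alpha_0 + \alpha_1 z_i)\right\}^{\delta_i} \exp\left\{-\exp(\alpha_0 + \alpha_1 z_i)\int_0^{t_i} f(\gamma, t)\,dt\right\} \\
&\quad \times N(\alpha_0 \mid 0, 1/\tau)\, N(\alpha_1 \mid 0, 1/\tau)\, p(\gamma).
\end{aligned}
\tag{2.12}
$$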

Model for the marker-specific cohort. Suppose that at interim check $k$, modeling of the continuous marker $X$ identifies a sensitive patient subpopulation $X^+$. Then $X$ can be reduced to an indicator variable $X_d$, with $X_d = 1$ for patients in the marker-positive cohort $X^+$ and $X_d = 0$ for patients in the marker-negative cohort $X^-$. The model used to check for marker-cohort-specific efficacy and futility is identical to that used for the overall cohort analysis, except that the patient data $D_{k,X_d}$ and model parameters $\gamma_{X_d}$, $\alpha_{0,X_d}$, $\alpha_{1,X_d}$, and $\tau_{X_d}$ are specific to marker subgroup $X_d$.

2.2.2 Interim Analysis Algorithm

At each analysis time point $k$ ($k = 1, \dots, K$), our proposed trial design proceeds as follows.

To summarize, the trial first recruits and randomizes an unselected population of patients. At a series of interim analyses, the penalized splines model is applied to the accumulated patient data to study the treatment effect as a function of the continuous biomarker. If there is sufficient evidence of a predictive effect despite remaining posterior uncertainty, we consider sensitive versus nonsensitive subpopulations, check for efficacy and futility, and allow for early stopping separately within each marker subgroup. In the case of early stopping by one marker subgroup, the other marker subgroup is allowed to continue, with entirely independent subsequent evaluations for treatment effect. If the continuous marker is not found to be predictive at an interim analysis, we evaluate the treatment effect across the entire (unselected) patient population at that timepoint. When an interim decision is to continue enrollment of the whole patient population, subsequent analysis timepoints will include updated estimation of the marker effect curves.

In practice, the posterior thresholds $P_{\text{eff,group}}$, $P_{\text{eff,overall}}$, $P_{\text{fut,group}}$, and $P_{\text{fut,overall}}$ are calibrated through simulation studies and depend on independently chosen trial factors such as the maximum sample size, the maximum tolerable type I error rate, the desired power, target or plausible biomarker effect sizes, and the number of planned interim analyses. This calibration process is illustrated in Web Appendix E, and Figure 1 provides a high-level schema of the overall trial design.
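The text does not spell out the exact form of the posterior stopping rules, so the Python sketch below encodes one plausible reading as an explicit assumption: within a cohort, efficacy is declared when $P(\alpha_1 < 0 \mid D_k) > P_{\text{eff}}$ and futility when that probability falls below $P_{\text{fut}}$, with the BCM threshold values from the simulation study as defaults. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

@dataclass(frozen=True)
class Thresholds:
    # Defaults follow the BCM settings used in the simulation study.
    p_eff_overall: float = 0.975
    p_eff_group: float = 0.975
    p_fut_overall: float = 0.05
    p_fut_group: float = 0.05

def cohort_decision(prob_benefit: float, p_eff: float, p_fut: float) -> str:
    """Assumed rule, with prob_benefit = P(alpha1 < 0 | D_k) for one cohort."""
    if prob_benefit > p_eff:
        return "stop for efficacy"
    if prob_benefit < p_fut:
        return "stop for futility"
    return "continue"

def interim_decision(prob_overall: Optional[float] = None,
                     prob_pos: Optional[float] = None,
                     prob_neg: Optional[float] = None,
                     thr: Thresholds = Thresholds()) -> Dict[str, str]:
    """Separate decisions in X+ and X- if the marker was dichotomized at this
    interim (prob_pos and prob_neg given); otherwise one overall decision."""
    if prob_pos is not None and prob_neg is not None:
        return {
            "X+": cohort_decision(prob_pos, thr.p_eff_group, thr.p_fut_group),
            "X-": cohort_decision(prob_neg, thr.p_eff_group, thr.p_fut_group),
        }
    if prob_overall is None:
        raise ValueError("need prob_overall or both subgroup probabilities")
    return {"overall": cohort_decision(prob_overall, thr.p_eff_overall, thr.p_fut_overall)}

def prob_benefit_from_draws(alpha1_draws: Sequence[float]) -> float:
    """Monte Carlo estimate of P(alpha1 < 0 | D_k) from posterior draws of alpha1."""
    return sum(a < 0 for a in alpha1_draws) / len(alpha1_draws)

print(interim_decision(prob_overall=0.60))
print(interim_decision(prob_pos=0.99, prob_neg=0.02))
```

Returning one decision per cohort mirrors the design's rule that a subgroup stopping early does not stop the other subgroup.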

Figure 1

Flowchart for the interim analysis algorithm in the adaptive trial design with a continuous biomarker

3 Simulation Study

We conducted simulations to evaluate the performance of our proposed Bayesian continuous marker (BCM) design under a variety of “true” biomarker effect scenarios.

3.1 Data Generation and General Trial Settings

In all simulations, we consider a maximum trial size of $N = 500$ patients with equal allocation to the two treatment arms, and assume that patients are enrolled uniformly over an accrual period of 2.5 years. The time-to-event outcome $T$ is independently exponentially distributed for each patient, with the median event time in the control arm fixed at 6 months. One interim analysis and one final analysis are planned when 40% and 80% of patients, respectively, have experienced events, with the remaining patients right-censored at their current length of follow-up. Throughout, we assume that the continuous biomarker $X$ is uniformly distributed on $(0, 1)$, with marker relationships following 10 representative scenarios, as described in Table 1 and shown in Figure S1.
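A minimal sketch of one simulated trial under these settings, working in months (30-month uniform accrual, control-arm exponential rate $\log 2/6$). The interim and final data cuts are taken at the calendar times of the 200th and 400th events, that is, 40% and 80% of the 500 patients. Treating $g$ and $h$ as log-hazard shifts of the common control-arm baseline follows (2.1); the function names and the seed are hypothetical.

```python
import numpy as np

def simulate_trial(g, h, n=500, accrual_months=30.0, median_control=6.0, seed=0):
    """Draw marker values, 1:1 arms, uniform entry times, and exponential event times
    with hazard lambda0 * exp{g(x) + h(x) z}, where lambda0 = log(2) / median_control."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    z = rng.permutation(np.repeat([0, 1], n // 2))
    entry = rng.uniform(0.0, accrual_months, size=n)
    rate = np.log(2.0) / median_control * np.exp(g(x) + h(x) * z)
    time_to_event = rng.exponential(1.0 / rate)
    return x, z, entry, time_to_event

def cut_data(entry, time_to_event, event_fraction):
    """Data available at the calendar time of the event that brings the total to
    event_fraction * n; patients without an event are censored at current follow-up."""
    n = entry.size
    calendar_event = entry + time_to_event
    target = int(round(event_fraction * n))
    cutoff = np.sort(calendar_event)[target - 1]
    enrolled = entry <= cutoff
    follow_up = np.minimum(time_to_event, cutoff - entry)[enrolled]
    delta = (calendar_event <= cutoff)[enrolled].astype(int)
    return cutoff, enrolled, follow_up, delta

def no_prognostic(x):
    return np.zeros_like(x)

def dichotomous_h(x):
    return np.log(0.4) * (x > 0.5)      # perfectly dichotomous predictive effect

x, z, entry, tte = simulate_trial(no_prognostic, dichotomous_h, seed=11)
for fraction in (0.40, 0.80):
    cutoff, enrolled, follow_up, delta = cut_data(entry, tte, fraction)
    print(fraction, round(cutoff, 1), enrolled.sum(), delta.sum())
```

Because the cuts are event-driven, the interim may occur before accrual is complete, in which case only patients already enrolled enter $D_k$.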

Table 1

Simulation scenarios with descriptions, marker effect functions on the log hazard ratio scale, and true parameter values, including the maximum marker effect size Δ (on the hazard ratio scale)

| Scenario | Description | Control arm $g(x)$ | Treatment arm $g(x) + h(x)$ | Parameter values |
|---|---|---|---|---|
| 1 | No treatment effect, no marker effect | $0$ | $0$ | - |
| 2 | Constant treatment effect, no marker effect | $0$ | $\log\Delta$ | $\Delta = 0.5$ |
| 3 | Prognostic marker effect, no treatment effect | $x\log\Delta$ | $x\log\Delta$ | $\Delta = 0.4$ |
| 4 | Predictive marker effect (perfectly dichotomous) | $0$ | $1\{x > x_0\}\log\Delta$ | $x_0 = 0.5$, $\Delta = 0.4$ |
| 5 | Predictive marker effect (nearly dichotomous) | $0$ | $\frac{\exp\{25(x - x_0)\}}{1 + \exp\{25(x - x_0)\}}\log\Delta$ | $x_0 = 0.5$, $\Delta = 0.4$ |
| 6 | Predictive marker effect (linear) | $0$ | $x\log\Delta$ | $\Delta = 0.4$ |
| 7 | Predictive marker effect (nonlinear, monotone) | $0$ | $\{1 - \exp(-6x)\}\log\Delta$ | $\Delta = 0.4$ |
| 8 | Predictive marker effect (nonlinear, monotone) | $0$ | $x^3\log\Delta$ | $\Delta = 0.4$ |
| 9 | Predictive marker effect (nonlinear, nonmonotone) | $0$ | If $x \le x_0$: $\frac{\exp\{30(x - x_1)\}}{1 + \exp\{30(x - x_1)\}}\log\Delta$; if $x > x_0$: $\frac{1}{1 + \exp\{30(x - x_2)\}}\log\Delta$ | $x_0 = 0.5$, $x_1 = 0.3$, $x_2 = 0.7$, $\Delta = 0.4$ |
| 10 | Predictive marker effect (nonlinear, nonmonotone) | $0$ | If $x \le x_0$: $\frac{1}{1 + \exp\{30(x - x_1)\}}\log\Delta$; if $x > x_0$: $\frac{\exp\{30(x - x_2)\}}{1 + \exp\{30(x - x_2)\}}\log\Delta$ | $x_0 = 0.5$, $x_1 = 0.3$, $x_2 = 0.7$, $\Delta = 0.4$ |
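For simulation code, the Table 1 scenarios translate directly into vectorized functions. The sketch below (hypothetical names) returns the treatment-arm log hazard $g(x) + h(x)$ and the control-arm $g(x)$ relative to the common baseline, so the predictive effect $h(x)$ is their difference. It uses $\exp(v)/\{1 + \exp(v)\} = \operatorname{expit}(v)$ and $1/\{1 + \exp(v)\} = \operatorname{expit}(-v)$.

```python
import numpy as np

def expit(v):
    """Inverse logit: exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + np.exp(-v))

LOG_D = np.log(0.4)   # Delta = 0.4 in Scenarios 3-10
X0, X1, X2 = 0.5, 0.3, 0.7

TREATMENT = {  # g(x) + h(x), on the log hazard ratio scale
    1: lambda x: np.zeros(np.shape(x)),
    2: lambda x: np.full(np.shape(x), np.log(0.5)),
    3: lambda x: x * LOG_D,
    4: lambda x: (x > X0) * LOG_D,
    5: lambda x: expit(25 * (x - X0)) * LOG_D,
    6: lambda x: x * LOG_D,
    7: lambda x: (1 - np.exp(-6 * x)) * LOG_D,
    8: lambda x: x ** 3 * LOG_D,
    9: lambda x: np.where(x <= X0, expit(30 * (x - X1)), expit(-30 * (x - X2))) * LOG_D,
    10: lambda x: np.where(x <= X0, expit(-30 * (x - X1)), expit(30 * (x - X2))) * LOG_D,
}

def control(scenario, x):
    """g(x): zero except in Scenario 3, where the marker is purely prognostic."""
    return x * LOG_D if scenario == 3 else np.zeros(np.shape(x))

def predictive_effect(scenario, x):
    """h(x) = [g(x) + h(x)] - g(x); exp(h(x)) is the marker-specific hazard ratio."""
    return TREATMENT[scenario](x) - control(scenario, x)

x = np.linspace(0.0, 1.0, 5)
for s in (4, 9, 10):
    print(s, np.round(np.exp(predictive_effect(s, x)), 3))
```

Scenario 9 concentrates benefit in a central band of the marker, while Scenario 10 places it at both extremes, which is why no single threshold can recover the responsive region in Scenario 10.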

Comparator design. To assess the relative advantages of (1) treating a continuous biomarker as continuous and nonmonotone in modeling and decision-making and (2) taking posterior uncertainty into account when modeling a marker's interaction with treatment effect, we compare our BCM design against a more typical adaptive enrichment design strategy that assumes marker monotonicity and employs artificial dichotomization before marker effects are assessed. Specifically, this frequentist dichotomizing marker (FDM) design uses a grid search algorithm to find the best dichotomizing cut-point at an interim analysis, and then evaluates efficacy and futility using a classical hypothesis testing framework. We outline the distinguishing features of this design algorithm below.

Parameter specification. With the total sample size and accrual parameters described above, we selected decision thresholds to yield a fair comparison of the two designs while keeping the trial-wise type I error rate of our proposed design (incorrect declaration of efficacy, overall or within a marker group, at any time point) below 10%, as follows. The marker prevalence threshold $\varepsilon_{\text{prev}}$ was set to 0.20 for both designs. For the BCM design, we allowed a maximum of $M = 10$ interior knots for the penalized splines used to model the biomarker-driven effects, and used $\alpha = 0.05$ in (2.11) to define marker positivity. For efficacy evaluations, we set $P_{\text{eff,overall}} = P_{\text{eff,group}} = 0.975$ for the BCM design and, correspondingly, $P_{\text{eff,overall,FDM}} = P_{\text{eff,group,FDM}} = 0.025$ for the FDM design. For futility analyses, we fixed $P_{\text{fut,overall}} = P_{\text{fut,group}} = 0.05$ for the BCM design and $HR_{\text{fut,overall,FDM}} = HR_{\text{fut,group,FDM}} = 1$ for the FDM design. We chose $P_{\text{int,FDM}} = 0.05$, which yields a marker dichotomization rate similar to that of our BCM design when the marker effect is perfectly dichotomous (i.e., Scenario 4). We simulated 1000 hypothetical trials for each marker scenario.
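The FDM algorithm's exact test statistics are not reproduced in this section, so the following is only a hedged illustration of a grid search for a dichotomizing cut-point. For each candidate threshold $c$ between the 20th and 80th percentiles of the marker (matching $\varepsilon_{\text{prev}} = 0.20$), it defines $X^+ = \{x > c\}$ (the monotonicity assumption), estimates the subgroup log hazard ratios with an exponential-model maximum likelihood estimate (a simplification we substitute for a Cox model), and keeps the cut-point with the largest absolute Wald statistic for the treatment-by-subgroup interaction. The function names and toy data are hypothetical.

```python
import math
import numpy as np

def exp_log_hr(t_obs, delta, z):
    """Exponential-model log hazard ratio (z = 1 vs. z = 0) and its approximate variance.
    Returns None if either arm has no events."""
    d1, d0 = delta[z == 1].sum(), delta[z == 0].sum()
    if d1 == 0 or d0 == 0:
        return None
    rate1 = d1 / t_obs[z == 1].sum()
    rate0 = d0 / t_obs[z == 0].sum()
    return math.log(rate1 / rate0), 1.0 / d1 + 1.0 / d0

def best_cutpoint(x, t_obs, delta, z, eps_prev=0.20, n_grid=61):
    """Grid search over candidate cut-points c with X+ = {x > c}.
    Returns (cut-point, interaction z-statistic, two-sided p-value)."""
    best = None
    for c in np.quantile(x, np.linspace(eps_prev, 1.0 - eps_prev, n_grid)):
        pos = x > c
        est_pos = exp_log_hr(t_obs[pos], delta[pos], z[pos])
        est_neg = exp_log_hr(t_obs[~pos], delta[~pos], z[~pos])
        if est_pos is None or est_neg is None:
            continue
        stat = (est_pos[0] - est_neg[0]) / math.sqrt(est_pos[1] + est_neg[1])
        if best is None or abs(stat) > abs(best[1]):
            best = (float(c), stat)
    if best is None:
        raise ValueError("no candidate cut-point had events in every subgroup-arm")
    c, stat = best
    return c, stat, math.erfc(abs(stat) / math.sqrt(2.0))

rng = np.random.default_rng(5)
n = 20_000
x = rng.uniform(0.0, 1.0, size=n)
z = np.tile([0, 1], n // 2)
rate = 0.1 * np.exp(np.log(0.4) * (x > 0.5) * z)   # dichotomous predictive effect at 0.5
t_obs = rng.exponential(1.0 / rate)
delta = np.ones(n, dtype=int)                       # no censoring in this toy example
c, stat, p = best_cutpoint(x, t_obs, delta, z)
print(round(c, 3), round(stat, 2), p)   # p would then be compared with P_int,FDM
```

Because the search only considers sets of the form $\{x > c\}$, it cannot represent a responsive region made of two disjoint pieces, which is the structural limitation examined in Scenario 10.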

3.2 Simulation Results

Simulation results are displayed graphically in Figures 2 and 3 and tabulated in Web Table 1. Figure 2 presents the relative frequency of all possible decisions made by the two trial designs in each scenario, ranging from the totally correct decision (turquoise) through the partially correct decision (yellow) to the totally wrong decision (orange). Because the correct conclusion is itself scenario-dependent, decisions are standardized on this common scale so that design performance, judged by the accuracy of the conclusions reached, can be more easily assessed and compared. For each scenario, Figure 3 shows the average rates at which each trial's final conclusion correctly or incorrectly classifies individual patients whose marker values truly fall in (1) regions of high efficacy (HR ≤ 0.80), where true marker positives are shown in dark turquoise and false marker negatives in dark orange; and (2) regions of minimal or no efficacy (HR > 0.95), where true marker negatives are shown in light turquoise and false marker positives in light orange. A third region, shown in black, represents the proportion of patients whose marker-specific treatment effects truly lie between strong efficacy and no efficacy (0.80 < HR ≤ 0.95) and are thus difficult to classify in general, particularly in a trial of this size. In scenarios with a truly predictive effect, the true marker prevalence can be read directly from the y-axis, and the areas of the three marker effect regions under each trial design sum to 1. As such, the true positive rate (TPR; dark turquoise) and true negative rate (TNR; light turquoise) can be readily compared between designs. Web Figure 1 shows the true arm-specific marker effects for each scenario, together with posterior estimates of the treatment effect curves $h(x)$ and corresponding credible intervals at the interim and final analyses from one trial iteration under each marker scenario.
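The Figure 3 summaries can be computed per simulated trial from the true marker-specific hazard ratio $\exp\{h(x)\}$ and the set of patients that the trial's final conclusion treats as marker positive. As our assumption, that set is everyone after an overall efficacy conclusion and no one after overall futility. A small sketch with hypothetical names, using the HR cut-offs 0.80 and 0.95 stated above:

```python
import numpy as np

def classification_shares(true_hr, declared_positive):
    """Shares of patients, summing to 1, in the five categories of Figure 3."""
    efficacy = true_hr <= 0.80          # region of high efficacy
    no_efficacy = true_hr > 0.95        # region of minimal or no efficacy
    return {
        "true positive": float(np.mean(efficacy & declared_positive)),
        "false negative": float(np.mean(efficacy & ~declared_positive)),
        "true negative": float(np.mean(no_efficacy & ~declared_positive)),
        "false positive": float(np.mean(no_efficacy & declared_positive)),
        "intermediate": float(np.mean(~efficacy & ~no_efficacy)),
    }

x = (np.arange(1000) + 0.5) / 1000                    # evenly spread marker values on (0, 1)
true_hr = np.exp(np.log(0.4) * (x > 0.5))             # perfectly dichotomous effect at 0.5
print(classification_shares(true_hr, x > 0.5))        # subgroup conclusion at the true threshold
print(classification_shares(true_hr, np.ones_like(x, dtype=bool)))  # overall efficacy conclusion
```

Averaging these shares over the simulated trials of a scenario gives bar heights of the kind displayed in Figure 3.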

Figure 2

For each marker scenario, the relative frequency of all possible decisions made by each trial design (BCM and FDM) is shown. These range from an entirely correct decision in turquoise (e.g., declaring efficacy in a subgroup when subgroup efficacy is the truth), to a partially correct decision in yellow, where the truth might later be recoverable (e.g., declaring efficacy overall when subgroup efficacy is the truth), to a completely incorrect decision in orange (e.g., concluding efficacy overall when overall futility is the truth)

Figure 3

For each marker scenario, the average rates are shown by which each design (BCM and FDM) correctly versus incorrectly classifies individual patients as “marker positive” or “marker negative” according to the design's final decision and any applicable marker thresholds detected. Correct classifications are shown in turquoise, while incorrect classifications are shown in orange

Scenarios 1–3: No Predictive Marker Effects. In Scenario 1, where no marker or treatment effects exist, the BCM design correctly concludes overall futility more than 90% of the time, compared with approximately 80% for the FDM design. Consequently, the BCM design achieves a higher TNR than the FDM design (96.1% vs. 90.5%). Similar results are observed for Scenario 3, where there is a prognostic marker effect but no predictive (treatment) effect. In Scenario 2, where a constant treatment effect but no marker effects are present, the two designs have similar power (about 95%) to conclude overall efficacy, and both achieve a TPR above 98%.

Scenarios 4–8: Predictive, Monotone Marker Effects. In Scenario 4, the biomarker is predictive and perfectly dichotomous, as is often assumed in practice by trial designs that artificially dichotomize a continuous biomarker. In this “ideal” marker scenario, both designs correctly conclude marker-subgroup-specific efficacy more than 90% of the time and have similar TPR and TNR. These results are expected, since the threshold $P_{\text{int,FDM}}$ was chosen to mirror the posterior credible interval marker threshold of the BCM design.

In Scenarios 5–8, the treatment effect is assumed to be a continuous, monotone function of the biomarker value. In each of these scenarios, the BCM design makes an entirely correct decision at least as often as the FDM design and achieves similar or greater TPR and TNR. In particular, in Scenarios 6 and 8, the FDM design incorrectly concludes overall efficacy in more than 5% more of the simulated trials than the BCM design and consequently has a lower TNR. This improved performance of the BCM design is plausibly explained by its keeping the biomarker continuous rather than dichotomizing it.

Scenarios 9–10: Predictive, Nonmonotone Marker Effects. In Scenarios 9 and 10, we assume that the continuous biomarker has a nonlinear, nonmonotone predictive relationship with treatment effect. In Scenario 9, where the treatment effect is strongest at central marker values near $X = 0.50$, the BCM design correctly concludes marker-subgroup-specific efficacy more often than the FDM design and achieves a substantially higher TNR. In Scenario 10, where the treatment effect is strong only at the extreme low and high marker values, the BCM design detects that these disjoint regions are both “biomarker positive” approximately 70% of the time, while the FDM design incorrectly concludes overall efficacy more than 90% of the time. Consequently, the BCM design produces a considerably higher TNR at the cost of a slightly lower TPR. These results indicate that, by dichotomizing the continuous marker, the FDM design cannot capture a nonmonotone marker-by-treatment interaction, which severely impairs its ability to identify the truly responsive subpopulation defined by the marker.
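The Scenario 10 result can be illustrated with a toy calculation in Python. The setting is a hypothetical predictive effect, not the paper's simulation scenario, in which only patients with marker values below 0.2 or above 0.8 benefit. The sketch does two things:

- It searches every one-sided cutpoint rule (x > c or x ≤ c) for the best Youden index (TPR + TNR − 1).
- It compares that result with a rule that flags the region where the hypothetical h(x) is negative.

For simplicity, the sketch ignores the [0.25, 0.75] prevalence constraint of Table 2, which would only restrict the cutpoint search further.

```python
from typing import Sequence, Tuple


def h_true(x: float) -> float:
    """Hypothetical nonmonotone predictive effect (log-HR): benefit only at the extremes."""
    return -0.7 if (x < 0.2 or x > 0.8) else 0.0


def youden(truth: Sequence[bool], pred: Sequence[bool]) -> float:
    """Youden index TPR + TNR - 1 for patient-level classifications."""
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    return tp / n_pos + tn / n_neg - 1.0


def best_single_cutpoint(xs: Sequence[float], truth: Sequence[bool]) -> Tuple[float, float, str]:
    """Exhaustively search one-sided rules 'x > c' and 'x <= c' over observed values c."""
    best = (float("-inf"), float("nan"), "")
    for c in xs:
        for direction in (">", "<="):
            pred = [(x > c) if direction == ">" else (x <= c) for x in xs]
            j = youden(truth, pred)
            if j > best[0]:
                best = (j, c, direction)
    return best


if __name__ == "__main__":
    xs = [(i + 0.5) / 200 for i in range(200)]  # evenly spread marker percentiles
    truth = [h_true(x) < 0 for x in xs]
    j_cut, c, direction = best_single_cutpoint(xs, truth)
    j_region = youden(truth, [h_true(x) < 0 for x in xs])
    print(f"best single cutpoint: x {direction} {c:.4f}, Youden = {j_cut:.3f}")
    print(f"region where h(x) < 0: Youden = {j_region:.3f}")
```

Here the best one-sided cutpoint reaches a Youden index of only 0.5, because it captures just one of the two benefiting regions. Flagging the region where h(x) < 0 attains 1.0.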

4 Example: Acute Lymphoblastic Leukemia Trial with Continuous Biomarkers

4.1 Children's Oncology Group Trial AALL0434

In the completed Children's Oncology Group study AALL0434, 1031 children and young adults with T-cell acute lymphoblastic leukemia were enrolled and randomized to one of two methotrexate regimens: Capizzi-style escalating methotrexate (C-MTX) or high-dose methotrexate (HDMTX) (Winter et al., 2018). The primary endpoint was event-free survival (EFS), defined as the time from randomization to the earliest of relapse, disease progression, secondary malignancy, or death. Patients were followed for up to 10 years; those who did not experience an event, including those lost to follow-up, were right-censored. Although this study was not designed to be biomarker-driven, we use it to demonstrate our design in an actual trial setting. We separately show implementation using two different “biomarkers” that may be predictive of the degree of benefit from high-dose methotrexate: (1) age at diagnosis and (2) baseline white blood-cell count (WBC), a marker of inflammation.

Using the actual enrollment dates, arm assignments, age and WBC measurements (normalized to their percentiles), and EFS event or right-censoring dates for each patient, we re-ran the AALL0434 trial under both our BCM design and the competing FDM design. This trial differed from our simulation study in overall sample size, event rate (which was lower), and desired power. We therefore adjusted the decision thresholds of each design to those given in Table 2. We also assumed a single interim analysis, performed once 75% of the 122 total EFS events had been observed.
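Re-running a completed trial requires two steps at each analysis: locating the calendar time at which the target number of events is reached, and censoring follow-up at that data cut. A minimal Python sketch follows. It rests on these assumptions:

- Times are in years from trial start.
- The interim target is rounded up: 75% of 122 events is 91.5, so the interim is triggered by the 92nd event.
- The function names are hypothetical.

It does not reproduce the authors' R code.

```python
import math
from typing import Optional, Sequence, Tuple


def interim_event_target(total_events: int, fraction: float) -> int:
    """Number of events that triggers the interim; rounding up is an assumption."""
    return math.ceil(fraction * total_events)


def interim_calendar_time(
    enroll: Sequence[float], efs_time: Sequence[float], event: Sequence[bool], target: int
) -> float:
    """Calendar time (years since trial start) at which the target-th EFS event occurs."""
    event_dates = sorted(e + t for e, t, d in zip(enroll, efs_time, event) if d)
    if len(event_dates) < target:
        raise ValueError("target number of events is never reached")
    return event_dates[target - 1]


def data_cut(
    enroll: float, efs_time: float, event: bool, cutoff: float
) -> Optional[Tuple[float, bool]]:
    """Observed (time, event) for one patient at a calendar cutoff; None if not yet enrolled."""
    if enroll > cutoff:
        return None
    # Compare calendar dates directly so the triggering event is never lost to rounding.
    if enroll + efs_time <= cutoff:
        return efs_time, event
    return cutoff - enroll, False  # administratively censored at the data cut


if __name__ == "__main__":
    print(interim_event_target(122, 0.75))
    enroll = [0.0, 0.5, 1.0, 1.5]
    efs = [2.0, 0.7, 3.0, 0.2]
    had_event = [True, True, False, True]
    t_int = interim_calendar_time(enroll, efs, had_event, target=2)
    print(t_int, [data_cut(e, t, d, t_int) for e, t, d in zip(enroll, efs, had_event)])
```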

Table 2

Decision thresholds used to re-run the acute lymphoblastic leukemia trial under the BCM design and the FDM design, and the conclusions reached under each design

| | | BCM design | FDM design |
|---|---|---|---|
| Thresholds | Overall efficacy | $P_{\text{eff,overall}} = 0.95$ | $P_{\text{eff,overall,FDM}} = 0.05$ |
| | Subgroup efficacy | $P_{\text{eff,group}} = 0.90$ | $P_{\text{eff,group,FDM}} = 0.10$ |
| | Overall futility | $P_{\text{fut,overall}} = 0.10$ | $\mathrm{HR}_{\text{fut,overall,FDM}} = 1$ |
| | Subgroup futility | $P_{\text{fut,group}} = 0.10$ | $\mathrm{HR}_{\text{fut,group,FDM}} = 1$ |
| | Marker dichotomization | $\alpha = 0.025$ | $P_{\text{int,FDM}} = 0.05$ |
| | Marker prevalence | [0.25, 0.75] | [0.25, 0.75] |
| Trial conclusions: age | $X^+$ | Efficacy (interim) | Overall: efficacy (final) |
| | $X^-$ | Futility (final) | |
| | Sensitive subgroup | Age > 8.5 years | None |
| Trial conclusions: WBC | $X^+$ | Efficacy (interim) | Overall: efficacy (final) |
| | $X^-$ | Efficacy (final) | |
| | Sensitive subgroup | None | None |
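The Python sketch below shows roughly how the thresholds in Table 2 could enter the interim and final decisions. It is our simplified reading, not the exact specification. We assume that:

- The BCM thresholds are compared with a posterior probability of benefit, either overall or within a marker subgroup.
- The FDM efficacy thresholds are compared with a one-sided p-value.
- The FDM futility thresholds are compared with the estimated hazard ratio.
- A comparison that reaches the final analysis without an efficacy conclusion is declared futile.

The function and dictionary names are hypothetical.

```python
from typing import Dict


def bcm_decision(post_prob_benefit: float, p_eff: float, p_fut: float, is_final: bool) -> str:
    """Compare a posterior probability of benefit with BCM efficacy/futility thresholds."""
    if post_prob_benefit >= p_eff:
        return "efficacy"
    if post_prob_benefit <= p_fut or is_final:
        return "futility"
    return "continue"


def fdm_decision(p_value: float, hr_hat: float, p_eff: float, hr_fut: float, is_final: bool) -> str:
    """Compare a one-sided p-value and an estimated hazard ratio with FDM thresholds."""
    if p_value <= p_eff:
        return "efficacy"
    if hr_hat >= hr_fut or is_final:
        return "futility"
    return "continue"


# Threshold values transcribed from Table 2 (AALL0434 re-analysis).
BCM_THRESHOLDS: Dict[str, float] = {
    "eff_overall": 0.95, "eff_group": 0.90, "fut_overall": 0.10, "fut_group": 0.10,
}
FDM_THRESHOLDS: Dict[str, float] = {
    "eff_overall": 0.05, "eff_group": 0.10, "hr_fut_overall": 1.0, "hr_fut_group": 1.0,
}

if __name__ == "__main__":
    # Hypothetical interim summaries.
    b = BCM_THRESHOLDS
    f = FDM_THRESHOLDS
    print(bcm_decision(0.93, b["eff_group"], b["fut_group"], is_final=False))
    print(fdm_decision(0.20, 0.85, f["eff_overall"], f["hr_fut_overall"], is_final=False))
```

Applied separately to the marker-positive and marker-negative subgroups, rules of this kind can produce split conclusions such as those reported for age. There, the $X^+$ subgroup reached efficacy at the interim analysis and the $X^-$ subgroup reached futility at the final analysis.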

4.2 AALL0434 Results under Alternative Designs

Results of the re-analysis of AALL0434 are shown in Table 2. Figure 4 shows the Bayesian penalized spline fits of the predictive marker effect h(x) at the interim and final analyses, for age and WBC. When age was the continuous biomarker of interest, the FDM design failed to detect a biomarker effect and concluded overall efficacy at the final analysis. The BCM design, however, detected a predictive effect of age at the interim analysis (Figure 4). It concluded early efficacy in patients older than 8.5 years and, at the final analysis, futility in patients younger than 8.5 years. When WBC was the marker of interest, both designs concluded overall efficacy. However, the BCM design reached this conclusion for a patient subset at the interim analysis, 2 years earlier in calendar time than the FDM design. Although this paper focuses on a trial design for a single continuous biomarker, Web Appendix D details a design extension that considers both age and WBC (assuming independence of their effects).
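The spline fits in Figure 4 rest on a standard equivalence. Assume fixed variance components, Gaussian priors on the knot coefficients and flat priors on the intercept and slope. Then the posterior mean of a penalized spline equals a ridge-penalized least-squares fit with penalty λ = σ²/σ_u². The Python sketch below shows that penalized fit for a Gaussian outcome with a truncated-linear basis. This is a deliberate simplification. The design itself models the marker effects within a time-to-event model with posterior sampling; see Wand and Ormerod (2008) for the O'Sullivan spline construction cited by the paper. The knots, λ, and function names here are hypothetical.

```python
import math
import random
from typing import List, Sequence


def basis_row(x: float, knots: Sequence[float]) -> List[float]:
    """Truncated-linear spline basis: [1, x, (x - k1)_+, ..., (x - kK)_+]."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_penalized_spline(
    xs: Sequence[float], ys: Sequence[float], knots: Sequence[float], lam: float
) -> List[float]:
    """Minimize ||y - C theta||^2 + lam * sum(knot coefficients^2); intercept and slope unpenalized."""
    rows = [basis_row(x, knots) for x in xs]
    p = len(rows[0])
    ctc = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    cty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for i in range(2, p):  # penalize only the knot (random-effect) coefficients
        ctc[i][i] += lam
    return solve_linear(ctc, cty)


def predict(coef: Sequence[float], x: float, knots: Sequence[float]) -> float:
    return sum(c * b for c, b in zip(coef, basis_row(x, knots)))


if __name__ == "__main__":
    rng = random.Random(7)
    xs = [rng.random() for _ in range(300)]
    # Hypothetical nonmonotone signal (bump in the middle) plus Gaussian noise.
    ys = [math.sin(math.pi * x) + rng.gauss(0.0, 0.2) for x in xs]
    knots = [k / 11 for k in range(1, 11)]
    coef = fit_penalized_spline(xs, ys, knots, lam=1.0)
    print([round(predict(coef, g, knots), 2) for g in (0.1, 0.5, 0.9)])
```

Larger λ shrinks the fit toward a straight line, while smaller λ lets the knots capture nonmonotone shapes. In the Bayesian formulation, this trade-off is learned from the data through the variance components.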

Figure 4

Estimates of h(x), the predictive effect of age (top) and WBC (bottom), at the interim and final analyses of the acute lymphoblastic leukemia trial data set are shown in blue. The corresponding 95% pointwise credible bands are shown in dark gray, and the 95% simultaneous credible bands in light gray.
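The difference between the pointwise and simultaneous bands in Figure 4 can be made concrete with posterior draws of h on a grid of marker values. The Python sketch below computes two kinds of band:

- Pointwise bands, from per-grid-point quantiles.
- Simultaneous bands, by a common approach that scales the posterior standard deviation by the 95th percentile of the maximum standardized deviation across the grid.

We assume, without confirmation from the text, that the paper's simultaneous bands are constructed in a similar way. The draws in the example are simulated and hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Band = List[Tuple[float, float]]


def quantile(values: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def pointwise_band(draws: Sequence[Sequence[float]], level: float = 0.95) -> Band:
    """Equal-tailed credible interval computed separately at each grid point."""
    tail = (1.0 - level) / 2.0
    return [(quantile(col, tail), quantile(col, 1.0 - tail)) for col in zip(*draws)]


def simultaneous_band(draws: Sequence[Sequence[float]], level: float = 0.95) -> Band:
    """Band mean +/- q * sd, with q the `level` quantile of max_j |h_j - mean_j| / sd_j."""
    cols = list(zip(*draws))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    max_dev = [max(abs(d[j] - means[j]) / sds[j] for j in range(len(cols))) for d in draws]
    q = quantile(max_dev, level)
    return [(m - q * s, m + q * s) for m, s in zip(means, sds)]


def simulate_draws(n_draws: int, grid: Sequence[float], seed: int = 1) -> List[List[float]]:
    """Hypothetical posterior draws: shared curve uncertainty plus grid-point noise."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        shift = rng.gauss(0.0, 0.1)
        draws.append([-0.5 * x + shift + rng.gauss(0.0, 0.1) for x in grid])
    return draws


def mean_width(band: Band) -> float:
    return statistics.fmean(hi - lo for lo, hi in band)


if __name__ == "__main__":
    grid = [j / 20 for j in range(21)]
    draws = simulate_draws(2000, grid)
    pw, sim = pointwise_band(draws), simultaneous_band(draws)
    print(f"mean width: pointwise {mean_width(pw):.3f}, simultaneous {mean_width(sim):.3f}")
```

By construction, about 95% of entire posterior curves lie inside the simultaneous band. This is why it is wider than the pointwise band, and why it is the appropriate band for statements about the whole marker range.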

5 Discussion

In this paper, we proposed a new randomized adaptive trial design that uses the possibly nonlinear or nonmonotone relationships of a truly continuous biomarker with outcome and treatment effect to assess differential response to experimental therapy. In simulation studies, our BCM design always fared at least as well as a more traditional, dichotomization-based cutpoint search approach. In scenarios where the true predictive marker effect was far from dichotomous, the BCM design usually yielded better operating characteristics and more accurate patient classification. Notably, our design fared especially well when truly nonlinear or nonmonotone marker effects were present, even though the shape of these effects did not need to be known a priori. We considered only a time-to-event endpoint and a single interim analysis in our simulations. In practice, however, the BCM design is readily extendable to additional interim analyses. It also extends to any endpoint type that can be analyzed with a linear or generalized linear model, using nested Bayesian penalized splines or other nonlinear modeling methods for the biomarker effects.

Some limitations to our design, similar to those affecting all adaptive enrichment designs, do exist. First, this design should only be used in settings that can afford a sample size large enough to detect a range of feasible and clinically relevant effect sizes, across various distributions or prevalences of the underlying biomarker. Simulation studies are critical for determining how the design might function for moderate sample sizes or marker effects. Second, the primary endpoint of the trial should be observable “quickly,” at least relative to the rate of enrollment. This ensures that interim analyses and adaptations are well informed by observed early outcomes, and that early decisions yield tangible savings (in time, money, or patients enrolled). Finally, our design is intended for a marker whose relationship to the experimental therapy and its mechanism of action is already well understood. It is not intended for markers or signatures that are purely exploratory, or for which preliminary evidence of a predictive effect (for example, from earlier or smaller studies) has not yet been established.

Planned extensions of our design include the setting of more than one continuous biomarker where prognostic and predictive marker effects may not be independent or additive, time-to-event endpoints that may not follow proportional hazards (as is commonly observed in immunotherapy trials, for example), and a marker-adaptive randomization process that randomizes a patient with a particular value of the marker X with greater probability to the treatment arm that appears thus far in the trial to be more favorable for that marker value. In the latter case, the probability of random assignment to the experimental arm (versus the control arm) could be represented as a continuous function over the range of marker values that incoming patients may have, with this function updated continuously over the course of the trial based on evidence of the marker effect and associated posterior uncertainty. These extensions are ongoing and will be presented in future work.
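The marker-adaptive randomization envisioned above could take many forms. As one hypothetical illustration, the Python sketch below works in two steps for an incoming patient with marker value x:

- It estimates, from posterior draws, the posterior probability P(x) that the experimental arm is favorable (h(x) < 0 on the log hazard ratio scale).
- It converts P(x) to an allocation probability with the power transform P(x)^c / {P(x)^c + (1 − P(x))^c}, a form often used in Bayesian response-adaptive randomization, clipped to [0.1, 0.9] so that neither arm stops accruing.

The tuning constant c and the clipping limits are assumptions, not choices from the paper.

```python
import random
from typing import Sequence


def prob_favorable(draws_at_x: Sequence[float]) -> float:
    """Posterior probability that h(x) < 0, estimated from draws of h at one marker value."""
    return sum(1 for d in draws_at_x if d < 0.0) / len(draws_at_x)


def allocation_probability(
    p_fav: float, c: float = 0.5, floor: float = 0.1, ceiling: float = 0.9
) -> float:
    """Power-transformed, clipped probability of assignment to the experimental arm."""
    a = p_fav ** c
    b = (1.0 - p_fav) ** c
    return min(max(a / (a + b), floor), ceiling)


def assign_arm(draws_at_x: Sequence[float], rng: random.Random, c: float = 0.5) -> str:
    """Randomize one incoming patient given posterior draws of h at the patient's marker value."""
    pi = allocation_probability(prob_favorable(draws_at_x), c)
    return "experimental" if rng.random() < pi else "control"


if __name__ == "__main__":
    rng = random.Random(11)
    draws = [rng.gauss(-0.3, 0.25) for _ in range(4000)]  # hypothetical draws of h(x) at one x
    p = prob_favorable(draws)
    print(round(p, 3), round(allocation_probability(p), 3), assign_arm(draws, rng))
```

With c = 0, the rule reduces to 1:1 randomization; larger values of c tilt allocation more aggressively toward the arm that currently appears favorable at that marker value. Recomputing the draws at each interim update would make the allocation function evolve over the course of the trial.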

Open Research Badges

This article has earned an Open Materials badge for making publicly available the components of the research methodology needed to reproduce the reported procedure and analysis. All materials are available at xxxxxxx.

Data Availability Statement

De-identified clinical trial data as analyzed in this paper are available by completing Data Use and Data Transfer agreements with Children's Oncology Group at https://childrensoncologygroup.org/index.php/data-sharing or may be downloaded from the National Clinical Trials Network (NCTN) Data Archive of the National Cancer Institute (NCI) at https://nctn-data-archive.nci.nih.gov.

Supporting Information

Web Appendices, Tables, and Figures referenced in Sections 2–4 are available with this paper at the Biometrics website on Wiley Online Library. The R code to implement the proposed trial design and to reproduce the results of this paper can be found at https://github.com/YushaLiu/Continuous_marker_adaptive_trial_design.

Acknowledgments

This work was supported by the National Institutes of Health [NIH/NCI 2U10CA180899-06 and NIH KL2 TR002379]. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. The authors would also like to acknowledge the Children's Oncology Group ALL Committee for permitting use of the AALL0434 data as well as Zhiguo (Bruce) Chen for his contributions to that study.

References

Brannath, W., Zuber, E., Branson, M., Bretz, F., Gallo, P., Posch, M., et al. (2009) Confirmatory adaptive designs with Bayesian decision tools for a targeted therapy in oncology. Statistics in Medicine, 28, 1445–1463.

Friede, T., Parsons, N. & Stallard, N. (2012) A conditional error function approach for subgroup selection in adaptive clinical trials. Statistics in Medicine, 31, 4309–4320.

Jenkins, M., Stone, A. & Jennison, C. (2011) An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharmaceutical Statistics, 10, 347–356.

Karuri, S.W. & Simon, R. (2012) A two-stage Bayesian design for co-development of new drugs and companion diagnostics. Statistics in Medicine, 31, 901–914.

Mehta, C., Schäfer, H., Daniel, H. & Irle, S. (2014) Biomarker driven population enrichment for adaptive oncology trials with time to event endpoints. Statistics in Medicine, 33, 4515–4531.

Ohwada, S. & Morita, S. (2016) Bayesian adaptive patient enrollment restriction to identify a sensitive subpopulation using a continuous biomarker in a randomized phase 2 trial. Pharmaceutical Statistics, 15, 420–429.

Renfro, L.A., Coughlin, C.M., Grothey, A.M. & Sargent, D.J. (2014) Adaptive randomized phase II design for biomarker threshold selection and independent evaluation. Chinese Clinical Oncology, 3, 1–14.

Renfro, L.A., An, M.W. & Mandrekar, S.J. (2017) Precision oncology: a new era of cancer clinical trials. Cancer Letters, 387, 121–126.

Song, J.X. (2014) A two-stage patient enrichment adaptive design in phase II oncology trials. Contemporary Clinical Trials, 37, 148–154.

Wang, S.J., O'Neill, R.T. & Hung, H.M. (2007) Approaches to evaluation of treatment effect in randomized clinical trials with genomic subset. Pharmaceutical Statistics, 6, 227–244.

Wand, M.P. & Ormerod, J.T. (2008) On semiparametric regression with O'Sullivan penalized splines. Australian & New Zealand Journal of Statistics, 50, 179–198.

Winter, S.S., Dunsmore, K.P., Devidas, M., Wood, B.L., Esiashvili, N., Chen, Z., et al. (2018) Improved survival for children and young adults with T-lineage acute lymphoblastic leukemia: results from the Children's Oncology Group AALL0434 methotrexate randomization. Journal of Clinical Oncology, 36, 2926–2934.

© 2021 The International Biometric Society.
