Correlates of Protection

Linking antibody titres to infection risk — Post 2 in the Digital Twins for Vaccine Trials series

digital twin

vaccine

correlate of protection

immunology

Author

Jong-Hoon Kim

Published

April 22, 2026

1 The bridge from immune model to clinical endpoint

Post 1 built a within-host ODE model that predicts the antibody titre trajectory \(\text{Ab}(t)\) for a vaccinated individual. We can run that model for any individual whose parameters we know and get a full 12-month titre curve.

But the question a clinical trial cares about is not "what is this person’s titre?" but "will this person get infected?" Connecting the two requires a correlate of protection (CoP): a relationship between an immune measurement and the probability of being protected from infection.

The CoP is the second essential building block of a vaccine digital twin. Without it, the within-host model produces immunological outputs that float free of any clinical meaning. With it, every point on the antibody trajectory becomes a probability of protection at that moment in time — exactly what trial simulation requires.

2 What is a correlate of protection?

Plotkin defined a correlate of protection as “an immune response that is responsible for and statistically interrelated with protection” (1). The formal statistical version, due to Prentice (2) and later refined by Qin et al. (3), requires the immune marker to capture all of the treatment effect on the clinical outcome. Plotkin and Gilbert later distinguished (4):

Correlate of protection (CoP): a marker that statistically predicts protection, without necessarily causing it.
Mechanism of protection (MoP): the immune effector that directly prevents infection or disease.

For most vaccines, neutralising antibody serves as both — it is the effector that neutralises the pathogen and also the marker that predicts protection. For vaccines where T cells are the primary effector (e.g., tuberculosis), identifying a CoP has been much harder.

Why does this distinction matter for digital twins? Because:

A statistical CoP is sufficient to simulate a trial: if you know the titre-protection relationship, you can predict clinical outcomes from simulated titres.
A mechanistic CoP is needed if you want the model to extrapolate to new vaccine platforms, variants, or dosing regimens not seen in the training data.

COVID-19 vaccine trials provided the clearest demonstration of a titre-based CoP in recent history. Khoury et al. (5) analysed seven Phase 3 trials and found that neutralising antibody titre at peak immunogenicity explained 93% of the variation in vaccine efficacy across platforms. Earle et al. (6) reached a similar conclusion using a different meta-analytic approach. Gilbert et al. (7) then confirmed this in a pre-specified correlates analysis of the mRNA-1273 COVE trial, satisfying formal statistical CoP criteria.

3 Mathematical framework

3.1 The Hill model

The most widely used quantitative CoP model is the Hill (sigmoidal) function:

\[ P(\text{protected} \mid T) = \frac{T^k}{T^k + \text{EC}_{50}^k} \tag{1} \]

where:

\(T\) is the antibody titre (or any immune marker)
\(\text{EC}_{50}\) is the titre at which 50% of individuals are protected
\(k\) is the Hill coefficient, controlling the steepness of the transition

This is the same function used in pharmacology to describe dose-response relationships. For vaccine CoP analysis, \(T\) is typically the neutralising antibody titre measured at peak immunogenicity (day 28–30 after last dose), and protection is defined as absence of symptomatic infection during the follow-up period.

3.2 Connection to logistic regression

The Hill model is algebraically identical to logistic regression on the log-titre. Rewriting equation (1):

\[ \log\frac{P}{1-P} = k \cdot \log T - k \cdot \log \text{EC}_{50} = k \cdot (\log T - \log \text{EC}_{50}) \]

This is a logistic regression on the natural-log titre, \(\text{logit}(P) = \beta_0 + \beta_1 \log T\), with slope \(\beta_1 = k\) and intercept \(\beta_0 = -k \log \text{EC}_{50}\). The Hill coefficient therefore equals the logistic regression slope, and \(\text{EC}_{50} = \exp(-\beta_0 / \beta_1)\). The two parameterisations are interchangeable: the Hill form is more interpretable biologically, while the logistic form connects directly to standard statistical software.
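A minimal sketch of this equivalence in R, on simulated data. The sample size, seed and parameter values are illustrative assumptions, not values taken from any trial; the point is only that `glm()` on log-titre returns the Hill parameters after a change of variables.

```r
# Illustrative check: logistic regression on log-titre recovers Hill parameters.
# All numbers here (n, seed, EC50, k, titre distribution) are assumptions.
set.seed(1)
n     <- 5000
EC50  <- 120
k     <- 2.2
titre <- rlnorm(n, meanlog = log(150), sdlog = 0.7)

hill      <- titre^k / (titre^k + EC50^k)
protected <- rbinom(n, 1, prob = hill)      # 1 = protected

# The Hill probability is exactly the inverse-logit of k * (log T - log EC50)
identity_ok <- isTRUE(all.equal(hill, plogis(k * (log(titre) - log(EC50)))))

# Standard logistic regression with natural-log titre as the covariate
fit <- glm(protected ~ log(titre), family = binomial)
b   <- unname(coef(fit))

k_glm    <- b[2]                 # slope = Hill coefficient
EC50_glm <- exp(-b[1] / b[2])    # because intercept = -k * log(EC50)

cat(sprintf("identity holds: %s\nk = %.2f, EC50 = %.1f\n",
            identity_ok, k_glm, EC50_glm))
```

The estimates should land close to the assumed values of 2.2 and 120, up to sampling error. In practice the logistic form is convenient when covariates (age, prior infection) need to be added to the linear predictor.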

3.3 Effect of \(k\) on threshold sharpness

Code

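# Packages used throughout this post
library(dplyr)      # mutate(), tibble()
library(ggplot2)
library(patchwork)  # stacked plots (p_hist / p_prot) in section 6
library(deSolve)    # ode() in section 6

titre_grid <- seq(0.1, 500, length.out = 500)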
EC50_ref   <- 100

k_vals <- c(0.5, 1, 2, 4, 8)
df_hill <- expand.grid(titre = titre_grid, k = k_vals) |>
  mutate(
    P_protect = titre^k / (titre^k + EC50_ref^k),
    k_label   = factor(paste0("k = ", k), levels = paste0("k = ", k_vals))
  )

pal_k <- c("k = 0.5" = "#D7191C", "k = 1" = "#FDAE61",
           "k = 2"   = "#3B6EA8", "k = 4" = "#1A9641", "k = 8" = "#7B2D8B")

ggplot(df_hill, aes(x = titre, y = P_protect, colour = k_label)) +
  geom_line(linewidth = 0.9) +
  geom_vline(xintercept = EC50_ref, linetype = "dashed", colour = "grey40") +
  annotate("text", x = EC50_ref + 8, y = 0.08, label = "EC50",
           colour = "grey40", size = 3.5, hjust = 0) +
  scale_x_log10(breaks = c(1, 10, 100, 500),
                labels = c("1", "10", "100", "500")) +
  scale_colour_manual(values = pal_k) +
  labs(x = "Antibody titre (AU/mL, log scale)",
       y = "P(protected)",
       colour = NULL) +
  theme(legend.position = "top")

The Hill protection model for EC50 = 100 and varying Hill coefficients k. Low k (red, orange) gives a gentle, gradual protection curve: even individuals with low titres have some protection and even high titres give incomplete protection. High k (green, purple) gives a near-binary threshold: most individuals with titre above EC50 are protected, most below are not. For COVID-19 vaccines, k ≈ 2–3 was estimated from meta-analysis.

The Hill coefficient \(k\) has a direct practical interpretation: it determines whether protection is graded (low \(k\)) or threshold-like (high \(k\)). A graded CoP means that even low-titre individuals have meaningful partial protection; a threshold CoP means the vaccine either works or does not, with little in-between. For COVID-19, empirical estimates cluster around \(k \approx 2\)–3 (5,8), intermediate between these extremes but closer to threshold-like.
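One way to put a number on "graded versus threshold-like" is to invert equation (1): the titre that gives protection probability \(p\) is \(T_p = \text{EC}_{50}\,(p/(1-p))^{1/k}\), so the fold-change in titre needed to move from 10% to 90% protection is \(81^{1/k}\), whatever the value of \(\text{EC}_{50}\). The short sketch below tabulates this; the helper name `titre_at` is introduced here for illustration.

```r
# Titre at which the Hill model gives protection probability p
# (obtained by solving equation (1) for T)
titre_at <- function(p, EC50, k) EC50 * (p / (1 - p))^(1 / k)

k_vals <- c(0.5, 1, 2, 4, 8)

# Fold-change in titre between 10% and 90% protection; EC50 cancels out
fold_10_90 <- titre_at(0.9, 100, k_vals) / titre_at(0.1, 100, k_vals)

data.frame(k = k_vals, fold_10_90 = signif(fold_10_90, 3))
```

For \(k = 1\) the 10%-to-90% window spans an 81-fold range of titres; for \(k = 2\) it is 9-fold, and for \(k = 8\) it shrinks to about 1.7-fold.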

4 Fitting a CoP model from trial data

4.1 Simulated trial data

We generate a synthetic cohort of 500 vaccinated trial participants. Each participant has a peak antibody titre drawn from a log-normal distribution (approximating BNT162b2 immunogenicity data), and an infection outcome determined by the Hill model with known ground-truth parameters.

Code

set.seed(2024)
n_trial <- 500

# Titre distribution (log-normal; median ≈ 150 AU/mL, ~4-fold inter-individual spread)
mu_log    <- log(150)
sigma_log <- 0.7
titre_obs <- rlnorm(n_trial, meanlog = mu_log, sdlog = sigma_log)

# True CoP parameters (ground truth; inspired by Khoury et al. 2021)
EC50_true <- 120
k_true    <- 2.2

# Argument named `titre` rather than `T`, which is R's alias for TRUE
hill_p <- function(titre, EC50, k) titre^k / (titre^k + EC50^k)

p_protect_true <- hill_p(titre_obs, EC50_true, k_true)
infected       <- rbinom(n_trial, 1, prob = 1 - p_protect_true)  # 1 = infected

cat(sprintf(
  "Trial: %d participants, %d infected (%.1f%%), %d protected (%.1f%%)\n",
  n_trial, sum(infected), 100 * mean(infected),
  sum(infected == 0), 100 * mean(infected == 0)
))

Trial: 500 participants, 201 infected (40.2%), 299 protected (59.8%)

4.2 Maximum likelihood estimation

The Hill model is fitted by maximising the Bernoulli log-likelihood over \(\text{EC}_{50}\) and \(k\). Working on the log scale ensures parameters remain positive throughout optimisation.

Code

neg_ll <- function(params) {
  EC50 <- exp(params[1])
  k    <- exp(params[2])
  p    <- hill_p(titre_obs, EC50, k)
  ll   <- sum(log(ifelse(infected == 0, p, 1 - p) + 1e-12))
  -ll
}

opt <- optim(c(log(100), log(2)), neg_ll, method = "BFGS", hessian = TRUE)

EC50_hat <- exp(opt$par[1])
k_hat    <- exp(opt$par[2])

cat(sprintf("True:      EC50 = %.1f,  k = %.2f\n", EC50_true, k_true))

True:      EC50 = 120.0,  k = 2.20

Code

cat(sprintf("Estimated: EC50 = %.1f, k = %.2f\n", EC50_hat,  k_hat))

Estimated: EC50 = 122.0, k = 2.26
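The fit above requests `hessian = TRUE` but does not use the result. One possible use, sketched here under the usual large-sample assumptions, is a Wald interval: the Hessian of the negative log-likelihood at the optimum approximates the observed information on the log-parameter scale, and its inverse approximates the covariance matrix. The block re-simulates data with the same settings as section 4.1 so that it runs on its own; the object names are local to this sketch.

```r
# Self-contained sketch: Wald intervals from the optim() Hessian.
# Data are re-simulated with assumed settings so the block runs on its own.
set.seed(2024)
n        <- 500
titre    <- rlnorm(n, meanlog = log(150), sdlog = 0.7)
hill     <- function(titre, EC50, k) titre^k / (titre^k + EC50^k)
infected <- rbinom(n, 1, prob = 1 - hill(titre, 120, 2.2))   # 1 = infected

# Negative Bernoulli log-likelihood, parameters on the log scale
nll <- function(par) {
  p <- hill(titre, exp(par[1]), exp(par[2]))
  -sum(log(ifelse(infected == 0, p, 1 - p) + 1e-12))
}
fit <- optim(c(log(100), log(2)), nll, method = "BFGS", hessian = TRUE)

# Inverse Hessian of the negative log-likelihood ~ covariance (log scale)
se_log <- sqrt(diag(solve(fit$hessian)))

# Wald interval on the log scale, back-transformed to the natural scale
wald <- exp(cbind(estimate = fit$par,
                  lower    = fit$par - qnorm(0.975) * se_log,
                  upper    = fit$par + qnorm(0.975) * se_log))
rownames(wald) <- c("EC50", "k")
round(wald, 2)
```

Wald intervals are cheap to compute but rely on the log-likelihood being roughly quadratic near the optimum; they need not agree exactly with the bootstrap percentile intervals in the next subsection.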

4.3 Bootstrap confidence intervals

Code

set.seed(99)
n_boot <- 1000

boot_pars <- t(replicate(n_boot, {
  idx <- sample(n_trial, n_trial, replace = TRUE)
  neg_ll_b <- function(params) {
    EC50 <- exp(params[1]); k <- exp(params[2])
    p    <- hill_p(titre_obs[idx], EC50, k)
    -sum(log(ifelse(infected[idx] == 0, p, 1 - p) + 1e-12))
  }
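  # If a resample fails to fit, fall back to the full-data estimate; this is
  # expected to be rare, since frequent fallbacks would make the interval too narrow
  res <- tryCatch(
    optim(opt$par, neg_ll_b, method = "BFGS")$par,
    error = function(e) opt$par
  )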
  exp(res)
}))

ci_EC50 <- quantile(boot_pars[, 1], c(0.025, 0.975))
ci_k    <- quantile(boot_pars[, 2], c(0.025, 0.975))

cat(sprintf("EC50: %.1f  (95%% CI: %.1f – %.1f)\n", EC50_hat, ci_EC50[1], ci_EC50[2]))

EC50: 122.0  (95% CI: 111.3 – 134.0)

Code

cat(sprintf("k:    %.2f  (95%% CI: %.2f – %.2f)\n", k_hat,    ci_k[1],    ci_k[2]))

k:    2.26  (95% CI: 1.86 – 2.74)

Code

titre_seq <- exp(seq(log(1), log(max(titre_obs) * 1.2), length.out = 400))

# Bootstrap uncertainty band
band_mat <- apply(boot_pars, 1, function(p) hill_p(titre_seq, p[1], p[2]))
p_lo <- apply(band_mat, 1, quantile, 0.025)
p_hi <- apply(band_mat, 1, quantile, 0.975)

df_curve <- tibble(
  titre  = titre_seq,
  p_fit  = hill_p(titre_seq, EC50_hat, k_hat),
  p_lo   = p_lo,
  p_hi   = p_hi
)

df_pts <- tibble(
  titre    = titre_obs,
  outcome  = factor(ifelse(infected == 0, "Protected", "Infected"),
                    levels = c("Protected", "Infected")),
  y_jitter = ifelse(infected == 0, 1, 0) + runif(n_trial, -0.04, 0.04)
)

ggplot() +
  geom_ribbon(data = df_curve,
              aes(x = titre, ymin = p_lo, ymax = p_hi),
              fill = "grey70", alpha = 0.4) +
  geom_line(data = df_curve, aes(x = titre, y = p_fit),
            colour = "#3B6EA8", linewidth = 1.0) +
  geom_point(data = df_pts,
             aes(x = titre, y = y_jitter, colour = outcome),
             alpha = 0.35, size = 1.2) +
  geom_vline(xintercept = EC50_hat, linetype = "dashed", colour = "grey40") +
  annotate("text", x = EC50_hat * 1.08, y = 0.08,
           label = sprintf("EC50 = %.0f", EC50_hat),
           colour = "grey30", size = 3.5, hjust = 0) +
  scale_x_log10(breaks = c(5, 20, 50, 100, 200, 500),
                labels = c("5", "20", "50", "100", "200", "500")) +
  scale_colour_manual(values = c("Protected" = "#3B6EA8", "Infected" = "#E87722")) +
  labs(x = "Antibody titre at day 28 (AU/mL, log scale)",
       y = "P(protected)",
       colour = NULL) +
  theme(legend.position = "top")

Fitted Hill model (solid curve) against simulated trial data. Points are jittered vertically; orange = infected, blue = protected. The grey band is the 95% bootstrap confidence interval around the fitted curve. The dashed line marks the estimated EC50.

The MLE recovers the ground-truth parameters well: both true values fall inside their bootstrap intervals, and the band is reassuringly narrow across the central range of titres, where most of the data lie. At the extremes the band is squeezed toward 0 and 1 on the probability scale, but the uncertainty in the log-odds of protection is larger there, because individuals who are almost certainly protected (high titre) or almost certainly susceptible (very low titre) carry little information about the shape of the curve.

5 Time-varying protection

The CoP model so far treats titre as a static snapshot at day 28. But titre changes over time — as shown in Post 1 — and so does protection. Combining the within-host ODE with the CoP gives us a continuous protection trajectory: \(P(\text{protected} \mid t) = P(\text{protected} \mid \text{Ab}(t))\).

Code

# Scale the Ab trajectory (out_pb, the prime-boost ODE output from Post 1) to titre units:
# set the peak prime-boost Ab equal to the 75th-percentile titre of our cohort
peak_ab    <- max(out_pb$Ab)
ref_titre  <- unname(quantile(titre_obs, 0.75))   # 75th percentile titre
scale_fac  <- ref_titre / peak_ab

df_traj <- out_pb |>
  mutate(
    titre_scaled = Ab * scale_fac,
    P_protect    = hill_p(titre_scaled, EC50_hat, k_hat)
  )

# Dual-axis plot
y_ab_max <- max(df_traj$titre_scaled)
y_p_max  <- 1

p_traj <- ggplot(df_traj, aes(x = time)) +
  geom_area(aes(y = titre_scaled / y_ab_max),
            fill = "grey80", alpha = 0.5) +
  geom_line(aes(y = P_protect),
            colour = "#E87722", linewidth = 1.0) +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey40") +
  geom_vline(xintercept = 21, linetype = "dotted", colour = "#3B6EA8",
             linewidth = 0.7) +
  annotate("text", x = 23, y = 0.97, label = "Boost", colour = "#3B6EA8",
           size = 3.5, hjust = 0) +
  annotate("text", x = 340, y = 0.53, label = "50% protection",
           colour = "grey40", size = 3, hjust = 1) +
  scale_y_continuous(
    name    = "P(protected)",
    limits  = c(0, 1),
    sec.axis = sec_axis(~ . * y_ab_max,
                        name = "Antibody titre (AU/mL, scaled)")
  ) +
  labs(x = "Days post-first dose") +
  theme(axis.title.y.right = element_text(colour = "grey50"),
        axis.text.y.right  = element_text(colour = "grey50"))

p_traj

Time-varying protection for a representative vaccinated individual (prime-boost at days 0 and 21). The antibody trajectory from Post 1’s ODE model (grey, right axis) is scaled to titre units and fed into the fitted Hill CoP. The orange curve shows P(protected) over 12 months. The dashed horizontal line marks the 50% protection level corresponding to EC50. Protection peaks at ~85% around day 35–40, then wanes slowly, remaining above 50% for most of the year thanks to the long-lived plasma cell pool.

This figure encodes a key insight for trial design: the time window of adequate protection is not simply determined by when titres fall, but by the shape of the CoP curve and where the individual sits on it. An individual whose titre starts high may remain above 50% protection for 10+ months; someone in the bottom quartile of the immunogenicity distribution may drop below that threshold within 3–4 months post-boost.

A digital twin makes this explicit and individual-specific — unlike population-average efficacy estimates that obscure the heterogeneity.
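To make the dependence on starting titre concrete, here is a deliberately simplified sketch. It assumes that titre decays from its peak as a single exponential with a hypothetical half-life of 90 days, which is not the Post 1 ODE (that model has a long-lived plasma cell component and wanes more slowly). Under this assumption the time until protection falls to a level \(p\) has a closed form: \(t_p = h \log_2(T_0 / T_p)\), with \(T_p = \text{EC}_{50}\,(p/(1-p))^{1/k}\). The parameter values and the helper `days_above` are illustrative.

```r
hill_p <- function(titre, EC50, k) titre^k / (titre^k + EC50^k)

EC50      <- 120   # assumed, matching the ground truth of section 4.1
k         <- 2.2   # assumed
half_life <- 90    # days; hypothetical value for illustration only

# Days after peak until protection falls to p_min (0 if it never exceeds p_min)
days_above <- function(peak, p_min = 0.5) {
  titre_min <- EC50 * (p_min / (1 - p_min))^(1 / k)   # titre giving P = p_min
  pmax(half_life * log2(peak / titre_min), 0)
}

peaks <- c(low = 90, median = 150, high = 240, very_high = 480)
days  <- days_above(peaks)
round(days)

# Consistency check: protection at the computed time equals the threshold
p_at_days <- hill_p(peaks * 2^(-days / half_life), EC50, k)
```

With these assumed values, a peak titre of 240 stays above 50% protection for 90 days and a peak of 480 for 180 days, whereas a peak of 90 never reaches 50%. Each doubling of peak titre buys one extra half-life of protection, regardless of \(k\) when the threshold is 50%.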

6 From individual titres to population vaccine efficacy

6.1 Building a mini virtual patient cohort

A single representative patient is illustrative but not sufficient for trial design. We need a distribution of patients — a virtual patient cohort. Post 3 will build this properly using Latin hypercube sampling across all ODE parameters. For now, we demonstrate the concept by varying the SLPC stimulation rate \(k_S\), the dominant determinant of peak antibody titre.

Code

set.seed(777)
n_vp <- 300

# Log-normal multiplier on kS: ~2-fold individual variation in peak titre
kS_mult <- rlnorm(n_vp, meanlog = 0, sdlog = 0.45)

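# Simulate ODE for each virtual patient; extract peak titre
# (parms_wh, state0, times_wh, withinhost_vax and dose2 are not defined in this
#  post; they come from the Post 1 model setup)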
vp_peaks <- vapply(kS_mult, function(m) {
  p_i <- parms_wh
  p_i["kS"] <- parms_wh["kS"] * m
  out_i <- as.data.frame(ode(y = state0, times = times_wh,
                              func = withinhost_vax, parms = p_i,
                              events = list(data = dose2)))
  max(out_i$Ab) * scale_fac   # convert to titre units
}, numeric(1))

# Protection probability at peak titre for each virtual patient
vp_protect <- hill_p(vp_peaks, EC50_hat, k_hat)

6.2 Population VE from individual heterogeneity

In a randomised trial comparing vaccinated to unvaccinated (placebo) participants, vaccine efficacy is:

\[ \text{VE} = 1 - \frac{P(\text{infection} \mid \text{vaccinated})}{P(\text{infection} \mid \text{unvaccinated})} \]

Under 100% exposure (every participant encounters the pathogen), \(P(\text{infection} \mid \text{vaccinated}) = 1 - \bar{P}\) where \(\bar{P}\) is the mean individual protection probability, and \(P(\text{infection} \mid \text{unvaccinated}) = 1\). Therefore:

\[ \text{VE} = \bar{P} = \frac{1}{n} \sum_i P(\text{protected} \mid T_i) \]
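The 100% exposure assumption is stronger than the result needs. As a short sketch, suppose instead that each participant has a single exposure with probability \(a\) (a symbol introduced here), and that exposure is independent of trial arm and of titre. Then \(P(\text{infection} \mid \text{vaccinated}) = a\,(1 - \bar{P})\) and \(P(\text{infection} \mid \text{unvaccinated}) = a\), so

\[ \text{VE} = 1 - \frac{a\,(1 - \bar{P})}{a} = \bar{P} \]

for any \(a > 0\). The exposure probability cancels as long as it is balanced across arms, which is what randomisation is designed to provide. If exposure were correlated with titre within the vaccinated arm, the simple average would no longer apply.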

Code

VE_pop      <- mean(vp_protect)
VE_mean_tit <- hill_p(mean(vp_peaks), EC50_hat, k_hat)   # naive: plug in mean titre

cat(sprintf("Population VE (mean of P_i):   %.1f%%\n", 100 * VE_pop))

Population VE (mean of P_i):   63.7%

Code

cat(sprintf("Naive VE (Hill at mean titre): %.1f%%\n",  100 * VE_mean_tit))

Naive VE (Hill at mean titre): 100.0%

Code

df_vp <- tibble(titre = vp_peaks, P_protect = vp_protect)

p_hist <- ggplot(df_vp, aes(x = titre)) +
  geom_histogram(bins = 40, fill = "#3B6EA8", colour = "white", linewidth = 0.2) +
  scale_x_log10(breaks = c(20, 50, 100, 200, 500),
                labels = c("20", "50", "100", "200", "500")) +
  labs(x = NULL, y = "Count") +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p_prot <- ggplot(df_vp, aes(x = titre, y = P_protect)) +
  geom_point(alpha = 0.35, colour = "#E87722", size = 1.2) +
  geom_line(data = df_curve, aes(x = titre, y = p_fit),
            colour = "#3B6EA8", linewidth = 0.8) +
  geom_hline(yintercept = VE_pop, linetype = "dashed", colour = "grey30") +
  annotate("text", x = 400, y = VE_pop + 0.03,
           label = sprintf("Population VE = %.0f%%", 100 * VE_pop),
           colour = "grey30", size = 3.5, hjust = 1) +
  scale_x_log10(breaks = c(20, 50, 100, 200, 500),
                labels = c("20", "50", "100", "200", "500")) +
  labs(x = "Peak antibody titre (AU/mL, log scale)", y = "P(protected)")

p_hist / p_prot

Virtual patient cohort: distribution of peak antibody titres (top) and individual protection probabilities (bottom). The population VE (dashed line) is the mean of the individual protection probabilities — but its value is sensitive to the entire distribution, not just the mean titre. The long left tail of poorly protected individuals (low P_protect) pulls population VE below what you would calculate by plugging the mean titre into the Hill model.

The calculation surfaces a subtle but important result: plugging the mean titre into the Hill model overestimates population VE. The correct calculation integrates over the titre distribution, and the non-linearity of the Hill model means the left tail (low-titre, poorly protected individuals) drags the population average down disproportionately. This Jensen’s inequality effect is one reason why “mean immunogenicity → mean efficacy” reasoning can be misleading in vaccine development.

The discrepancy is largest when:

1. The titre distribution is wide (high \(\sigma_{\log}\)), so the left tail is large.
2. The Hill coefficient \(k\) is moderate (the CoP curve is non-linear but not binary).
3. The mean titre sits near \(\text{EC}_{50}\), in the steepest part of the curve.

A digital twin that samples individual patients from their titre distribution — rather than working with population averages — gets this right automatically.
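The dependence on the width of the titre distribution can be checked without any ODE. The sketch below assumes a log-normal peak titre distribution with median 150 and the ground-truth CoP parameters of section 4.1, and uses evenly spaced quantiles as a deterministic stand-in for a large cohort. The function name `jensen_gap` and the three `sdlog` values are illustrative choices.

```r
hill_p <- function(titre, EC50, k) titre^k / (titre^k + EC50^k)

EC50 <- 120; k <- 2.2        # assumed CoP parameters (ground truth of section 4.1)
median_titre <- 150          # assumed median peak titre

# Deterministic stand-in for a large cohort: evenly spaced probability points
u <- ppoints(20000)

jensen_gap <- function(sdlog) {
  titre    <- qlnorm(u, meanlog = log(median_titre), sdlog = sdlog)
  ve_pop   <- mean(hill_p(titre, EC50, k))     # average of individual P_i
  ve_naive <- hill_p(mean(titre), EC50, k)     # Hill evaluated at the mean titre
  c(sdlog = sdlog, ve_pop = ve_pop, ve_naive = ve_naive, gap = ve_naive - ve_pop)
}

# One row per assumed spread of the titre distribution
gaps <- t(sapply(c(0.3, 0.7, 1.0), jensen_gap))
round(gaps, 3)
```

Under these assumptions the naive plug-in value exceeds the population average at every spread, and the gap grows as `sdlog` increases. This is the pattern described in the first item of the list above.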

7 Implications for trial design

The CoP model quantifies several things that matter directly for trial planning:

1. Predicted vaccine efficacy before Phase 3. If you have Phase 1/2 immunogenicity data (titre distributions) and a CoP estimated from earlier trials or challenge studies, you can predict Phase 3 VE before enrolling a single patient. This is the most commercially valuable use of the CoP model: it informs go/no-go decisions and dose selection.

2. The correlate as a primary endpoint. Regulatory guidance increasingly allows immunogenicity endpoints (titre above a CoP threshold) as primary endpoints for bridging studies and booster approvals. FDA used this framework to authorise SARS-CoV-2 booster doses in children without new efficacy trials.

3. Time-to-booster decisions. The time-varying protection curve shows when the mean (or any percentile) patient drops below a given protection threshold. This is how booster timing policies are now being designed — not from expert opinion but from quantitative model predictions.

4. Subgroup analysis. The CoP curve applied to different subgroup titre distributions predicts differential efficacy in elderly, immunocompromised, or previously infected populations — a direct input to trial stratification and eligibility criteria.

8 What comes next

We now have two of the three building blocks of a vaccine digital twin:

✅ Post 1: A within-host ODE that generates \(\text{Ab}(t)\) for any given parameter set.
✅ Post 2 (this post): A CoP model that maps \(\text{Ab}(t) \to P(\text{protected at time } t)\).

The missing piece is inter-individual variability: the fact that different people have very different within-host parameters, and we need to represent this variation realistically. In Post 3, we build a proper virtual patient cohort by sampling parameter uncertainty with Latin hypercube methods, calibrating to observed immunogenicity distributions, and validating that the resulting virtual population reproduces trial immunogenicity data.

References

1. Plotkin SA. Correlates of protection induced by vaccination. Clinical and Vaccine Immunology. 2010;17(7):1055–65. doi:10.1128/CVI.00131-10

2. Prentice RL. Surrogate endpoints in clinical trials: Definition and operational criteria. Statistics in Medicine. 1989;8(4):431–40. doi:10.1002/sim.4780080407

3. Qin L, Gilbert PB, Corey L, McElrath MJ, Self SG. A framework for assessing immunological correlates of protection in vaccine trials. Journal of Infectious Diseases. 2007;196(9):1304–12. doi:10.1086/522428

4. Plotkin SA, Gilbert PB. Nomenclature for immune correlates of protection after vaccination. Clinical Infectious Diseases. 2012;54(11):1615–7. doi:10.1093/cid/cis238

5. Khoury DS, Cromer D, Reynaldi A, Schlub TE, Wheatley AK, Juno JA, et al. Neutralizing antibody levels are highly predictive of immune protection from symptomatic SARS-CoV-2 infection. Nature Medicine. 2021;27:1205–11. doi:10.1038/s41591-021-01377-8

6. Earle KA, Ambrosino DM, Fiore-Gartland A, Goldblatt D, Gilbert PB, Siber GR, et al. Evidence for antibody as a protective correlate for COVID-19 vaccines. Vaccine. 2021;39(32):4423–8. doi:10.1016/j.vaccine.2021.05.063

7. Gilbert PB, Montefiori DC, McDermott AB, Fong Y, Benkeser D, Deng W, et al. Immune correlates analysis of the mRNA-1273 COVID-19 vaccine efficacy clinical trial. Science. 2022;375(6576):43–50. doi:10.1126/science.abm3425

8. Cromer D, Steain M, Reynaldi A, Schlub TE, Wheatley AK, Juno JA, et al. Neutralising antibody titres as predictors of protection against SARS-CoV-2 variants and the impact of boosting. npj Vaccines. 2022;7:17. doi:10.1038/s41541-022-00441-3

R version 4.5.3 (2026-03-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Asia/Seoul
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.3.2 tidyr_1.3.2     dplyr_1.2.0     ggplot2_4.0.2  
[5] deSolve_1.42   

loaded via a namespace (and not attached):
 [1] vctrs_0.7.2        cli_3.6.5          knitr_1.51         rlang_1.1.7       
 [5] xfun_0.57          otel_0.2.0         purrr_1.2.1        generics_0.1.4    
 [9] S7_0.2.1           jsonlite_2.0.0     labeling_0.4.3     glue_1.8.0        
[13] htmltools_0.5.9    scales_1.4.0       rmarkdown_2.31     grid_4.5.3        
[17] tibble_3.3.1       evaluate_1.0.5     fastmap_1.2.0      yaml_2.3.12       
[21] lifecycle_1.0.5    compiler_4.5.3     codetools_0.2-20   RColorBrewer_1.1-3
[25] pkgconfig_2.0.3    htmlwidgets_1.6.4  farver_2.1.2       digest_0.6.39     
[29] R6_2.6.1           tidyselect_1.2.1   pillar_1.11.1      magrittr_2.0.4    
[33] withr_3.0.2        tools_4.5.3        gtable_0.3.6