Building Virtual Patient Cohorts

Latin hypercube sampling, parameter uncertainty, and population variability — Post 3 in the Digital Twins for Vaccine Trials series

digital twin
vaccine
virtual patients
Latin hypercube sampling
Sobol
quasi-Monte Carlo
R
Author

Jong-Hoon Kim

Published

April 22, 2026

1 From one patient to many

Post 1 built an ODE model of vaccine-induced immunity for a single individual. Post 2 showed how to convert an antibody trajectory into a protection probability at each point in time. Together these two pieces describe what we might call a representative patient — one who lives at the centre of the parameter distribution.

But a clinical trial is not run on a representative patient. It is run on hundreds of people who differ systematically in age, prior exposure history, baseline immune fitness, body composition, genetics, and a dozen other factors that affect how the immune system responds to vaccination. A digital twin that only simulates the average response will systematically miss what a trial actually measures: the distribution of responses — who is well protected, who is barely protected, and why.

This post builds the missing piece: a virtual patient cohort (VPC). A VPC is a set of simulated individuals whose ODE parameters are sampled to reproduce the observed inter-individual variability in vaccine immunogenicity data. The key method is Latin hypercube sampling (LHS) — a stratified sampling scheme that efficiently explores the high-dimensional parameter space of a mechanistic model.


2 Why parameters vary across patients

In the five-compartment ODE from Post 1, the parameters encode biological processes that genuinely differ among people:

Parameter Biological driver of between-person variation
\(k_S\), \(k_L\) Baseline B-cell precursor frequency; vaccine formulation response
\(\delta_S\), \(\delta_L\) Individual differences in bone-marrow niche availability
\(\delta_{\text{Ab}}\) IgG catabolism rate, influenced by body weight and FcRn expression
\(\phi_S\), \(\phi_L\) Memory B-cell pool size at time of vaccination (prior exposure)
\(\delta_A\) Antigen clearance: injection site inflammation, arm muscle mass

None of these can be directly measured from a Phase 1 trial. What is measured is the antibody titre at 1–5 timepoints per participant. The challenge of virtual patient generation is: how do we choose parameter distributions such that the simulated population reproduces the observed titre distribution?

There are two broad strategies:

  1. Prior-based sampling: define physiologically plausible ranges for each parameter from the literature, sample from those ranges, and hope the resulting titre distribution matches observations. This is what we do in this post.

  2. Bayesian calibration: fit each individual’s parameters to their observed titre timeseries using Markov Chain Monte Carlo (MCMC) or variational inference. This is more accurate but computationally expensive and requires longitudinal data per person. Post 4 will introduce the Bayesian angle.


3 Latin hypercube sampling

3.1 The idea

Suppose the ODE has \(p\) parameters and we want to generate \(n\) virtual patients. The naive approach — random sampling from each parameter’s marginal distribution — is inefficient: with \(p = 8\) and \(n = 200\), random sampling leaves large regions of parameter space under-represented.

Latin hypercube sampling (LHS) (1) is a stratified design that guarantees every region of each marginal distribution is represented equally. The construction is:

  1. Divide the [0, 1] range of each of the \(p\) parameters into \(n\) equal strata.
  2. For each parameter, sample exactly one value from each stratum (so each stratum is covered exactly once).
  3. Randomly permute the \(n\) sample positions independently for each parameter, then transform to the desired marginal distribution.

The result is an \(n \times p\) matrix of parameter sets with superior coverage of the marginal distributions compared to random sampling, using the same number of model evaluations (2).

Code
set.seed(42)
n_demo <- 50

# LHS for 2 parameters
lhs_2d <- function(n) {
  p1 <- (sample(n) - runif(n)) / n
  p2 <- (sample(n) - runif(n)) / n
  data.frame(p1 = p1, p2 = p2)
}
srs_2d <- function(n) data.frame(p1 = runif(n), p2 = runif(n))

df_srs   <- srs_2d(n_demo) |> mutate(method = "SRS (random)")
df_lhs   <- lhs_2d(n_demo) |> mutate(method = "LHS")
df_sobol <- as.data.frame(sobol(n_demo, dim = 2, scrambling = 1, seed = 42)) |>
  setNames(c("p1", "p2")) |>
  mutate(method = "Sobol (scrambled)")

df_cmp <- bind_rows(df_srs, df_lhs, df_sobol) |>
  mutate(method = factor(method, levels = c("SRS (random)", "LHS", "Sobol (scrambled)")))

ggplot(df_cmp, aes(x = p1, y = p2, colour = method)) +
  geom_point(size = 2, alpha = 0.85) +
  facet_wrap(~method) +
  scale_colour_manual(values = c("SRS (random)"      = "#3B6EA8",
                                  "LHS"               = "#E87722",
                                  "Sobol (scrambled)" = "#1A9641"),
                      guide = "none") +
  labs(x = "Parameter 1 (uniform [0,1])",
       y = "Parameter 2 (uniform [0,1])") +
  theme(strip.background = element_blank(),
        strip.text = element_text(face = "bold"))

Three sampling methods for two parameters, n = 50. Simple random sampling (SRS, blue) leaves visible gaps and clusters. LHS (orange) guarantees that each horizontal band and each vertical band of the unit square is occupied exactly once — excellent marginal coverage, but no joint guarantee. The Sobol low-discrepancy sequence (green) fills the joint space more uniformly: there are no large empty patches anywhere in the square. The difference is subtle at n = 50 but becomes practically significant at small n or in high dimensions.

3.2 Implementing LHS in R

Code
lhs_sample <- function(n, p) {
  # Returns n x p matrix; each column is a LHS draw on [0, 1]
  mat <- matrix(NA_real_, nrow = n, ncol = p)
  for (j in seq_len(p)) {
    mat[, j] <- (sample(n) - runif(n)) / n
  }
  mat
}

# Transform uniform [0, 1] to log-normal
# log-normal with median = m, geometric SD = gsd
lhs_lognormal <- function(u, median, gsd) {
  qlnorm(u, meanlog = log(median), sdlog = log(gsd))
}

4 Defining parameter uncertainty bounds

We anchor each parameter’s distribution to its biological interpretation, using Amanna et al. (3) for antibody half-lives, Goel et al. (4) for human mRNA vaccine kinetics, and the original model calibration from Post 1 as the reference (median) value.

Parameter Median Geo. SD Rationale
\(k_S\) 5.0 2.0 ~4-fold range in SLPC stimulation (B-cell precursor frequency)
\(k_L\) 0.5 2.0 Correlated with \(k_S\) but less variable
\(\delta_S\) 0.15 1.5 SLPC lifespan is fairly conserved
\(\delta_L\) 0.003 2.5 Large range: fast (SARS-CoV-2) to slow (measles) waning
\(\delta_{\text{Ab}}\) 0.018 1.5 IgG catabolism; affected by body weight
\(\phi_S\) 3.0 2.0 Memory recall boost: depends on prior exposures
\(\phi_L\) 8.0 2.0 Same
\(\delta_A\) 0.30 1.5 Antigen clearance; fairly conserved

A geometric standard deviation (GSD) of 2.0 means approximately 4-fold variation across the 5th–95th percentile range. This is conservative relative to the 10–20-fold variation seen in neutralising antibody titres in Phase 1 data, but we expect parameter variation to produce titre variation through the nonlinear ODE, amplifying the spread.

Code
set.seed(2024)
n_vp <- 400

# ODE with fixed structural parameters and variable stimulus/decay rates
withinhost_vax <- function(t, state, parms) {
  with(as.list(c(state, parms)), {
    list(c(
      -delta_A * A,
      kS * A * (1 + phi_S * M) - delta_S * S,
      kL * A * (1 + phi_L * M) - delta_L * L,
      eps * S                   - delta_M  * M,
      rS * S + rL * L           - delta_Ab * Ab
    ))
  })
}

# Fixed parameters (structural — not varied in this post)
fixed_parms <- c(rS = 2.0, rL = 5.0, eps = 0.05,
                 delta_M = 0.004)

# LHS draw: 8 variable parameters
par_names <- c("kS", "kL", "delta_S", "delta_L",
               "delta_Ab", "phi_S", "phi_L", "delta_A")
medians   <- c(5.0,  0.5,  0.15,    0.003,  0.018,   3.0,  8.0,  0.30)
gsds      <- c(2.0,  2.0,  1.5,     2.5,    1.5,     2.0,  2.0,  1.5)

U <- lhs_sample(n_vp, length(par_names))
par_mat <- mapply(lhs_lognormal, u = as.data.frame(U),
                  median = medians, gsd = gsds)
colnames(par_mat) <- par_names

state0   <- c(A = 1, S = 0, L = 0, M = 0, Ab = 0)
times_wh <- seq(0, 365, by = 1)
dose2    <- data.frame(var = "A", time = 21, value = 1, method = "add")

# Run ODE for each virtual patient; store Ab trajectory
vpc_list <- lapply(seq_len(n_vp), function(i) {
  parms_i <- c(par_mat[i, ], fixed_parms)
  tryCatch({
    out <- as.data.frame(ode(y = state0, times = times_wh,
                              func = withinhost_vax, parms = parms_i,
                              events = list(data = dose2),
                              method = "lsoda"))
    out$patient <- i
    out
  }, error = function(e) NULL)
})

vpc_df <- bind_rows(Filter(Negate(is.null), vpc_list))

5 Visualising the virtual patient cohort

5.1 Antibody trajectory spaghetti plot

Code
# Summary statistics per time point
vpc_summary <- vpc_df |>
  group_by(time) |>
  summarise(
    median   = median(Ab),
    p10      = quantile(Ab, 0.10),
    p90      = quantile(Ab, 0.90),
    .groups  = "drop"
  )

# Thin sample for spaghetti plot (every 5th patient to avoid overplotting)
vpc_thin <- vpc_df |>
  filter(patient %in% seq(1, n_vp, by = 5))

ggplot() +
  geom_line(data = vpc_thin,
            aes(x = time, y = Ab + 1e-4, group = patient),
            colour = "grey70", linewidth = 0.2, alpha = 0.5) +
  geom_ribbon(data = vpc_summary,
              aes(x = time, ymin = p10 + 1e-4, ymax = p90 + 1e-4),
              fill = "#3B6EA8", alpha = 0.25) +
  geom_line(data = vpc_summary,
            aes(x = time, y = median + 1e-4),
            colour = "#3B6EA8", linewidth = 1.0) +
  geom_vline(xintercept = 21, linetype = "dotted", colour = "grey40") +
  scale_y_log10() +
  labs(x = "Days post-first dose",
       y = "Antibody titre (a.u., log scale)")

Individual antibody trajectories for 400 virtual patients (grey thin lines) with the median and 10th/90th percentile band overlaid (blue). The spread reflects the sampled parameter uncertainty. Most patients show the characteristic prime-boost shape (first peak, second higher peak at day 21, then waning), but the magnitude spans nearly two orders of magnitude — consistent with the ~10-fold variation in neutralising antibody titres seen in Phase 1 mRNA vaccine trials.

5.2 Distribution of peak titres

The distribution of peak titres (measured at day 28–35 post-boost, the typical Phase 1 timepoint) is the output the VPC must reproduce to be considered calibrated.

Code
vpc_peak <- vpc_df |>
  filter(time == 35) |>
  select(patient, Ab_peak = Ab)

p_dist <- ggplot(vpc_peak, aes(x = Ab_peak)) +
  geom_histogram(bins = 40, fill = "#3B6EA8", colour = "white", linewidth = 0.2) +
  scale_x_log10() +
  labs(x = "Peak antibody titre at day 35 (a.u., log scale)", y = "Count")

# Parameter sensitivity: Spearman rank correlation of each param vs log(peak)
vpc_pars_and_peak <- as_tibble(par_mat) |>
  mutate(patient = seq_len(n_vp)) |>
  left_join(vpc_peak, by = "patient") |>
  filter(!is.na(Ab_peak))

corrs <- sapply(par_names, function(pn) {
  cor(log(vpc_pars_and_peak[[pn]]),
      log(vpc_pars_and_peak$Ab_peak),
      method = "spearman")
})

df_corr <- tibble(
  parameter = factor(par_names, levels = par_names[order(abs(corrs))]),
  rho       = corrs[order(abs(corrs))],
  direction = ifelse(corrs[order(abs(corrs))] > 0, "Positive", "Negative")
)

p_sens <- ggplot(df_corr, aes(x = rho, y = parameter, fill = direction)) +
  geom_col(width = 0.6) +
  geom_vline(xintercept = 0, linewidth = 0.5, colour = "grey30") +
  scale_fill_manual(values = c("Positive" = "#3B6EA8", "Negative" = "#E87722"),
                    guide = "none") +
  labs(x = "Spearman rank correlation with log(peak titre)",
       y = NULL) +
  theme(axis.text.y = element_text(family = "mono"))

p_dist / p_sens

Distribution of peak antibody titres in the virtual patient cohort, measured at day 35 (top panel). The log-normal shape is a natural consequence of log-normal parameter variation propagated through a nonlinear ODE. The bottom panel shows a sensitivity decomposition: how much of the variance in log(peak titre) is attributable to each parameter? kS dominates — confirming that SLPC stimulation is the primary driver of immunogenicity heterogeneity.

The sensitivity analysis confirms the biological intuition: \(k_S\) (SLPC stimulation) is the dominant driver of peak titre variation, followed by \(\delta_L\) (LLPC clearance rate, which affects the plateau and thus the apparent peak in biphasic waning curves). Parameters controlling antigen clearance (\(\delta_A\)) and memory recall (\(\phi_S\), \(\phi_L\)) have weaker effects at peak — they matter more for long-term titre behaviour.


6 Calibration: matching observed immunogenicity data

A virtual patient cohort is only useful if it reproduces what real trials observe. The calibration criterion is usually the geometric mean titre (GMT) and geometric standard deviation (GSD) of peak antibody measurements from a Phase 1/2 study.

Suppose the target is: GMT = 150 AU/mL, GSD = 3.2 (typical for BNT162b2 in adults (4,5)).

Code
# Target distribution
target_GMT <- 150
target_GSD <- 3.2
target_log_mean <- log(target_GMT)
target_log_sd   <- log(target_GSD)

# Compute rescaling factor from VPC GMT at day 35
vpc_log_peak <- log(vpc_peak$Ab_peak)
vpc_log_mean <- mean(vpc_log_peak)
vpc_log_sd   <- sd(vpc_log_peak)

scale_factor <- target_log_mean - vpc_log_mean  # additive on log scale

vpc_peak_cal <- vpc_peak |>
  mutate(Ab_cal = exp(log(Ab_peak) + scale_factor))

# Target density curve
titre_seq <- exp(seq(log(0.1), log(5000), length.out = 400))
df_target <- tibble(
  titre   = titre_seq,
  density = dlnorm(titre_seq,
                   meanlog = target_log_mean,
                   sdlog   = target_log_sd)
)

# Key statistics
stat_vpc <- tibble(
  label = c("GMT", "10th pct", "90th pct"),
  value = exp(c(mean(log(vpc_peak_cal$Ab_cal)),
                quantile(log(vpc_peak_cal$Ab_cal), 0.10),
                quantile(log(vpc_peak_cal$Ab_cal), 0.90))),
  y     = c(0.0004, 0.00018, 0.00018)
)
stat_target <- tibble(
  label = c("GMT", "10th pct", "90th pct"),
  value = exp(c(target_log_mean,
                target_log_mean - 1.28 * target_log_sd,
                target_log_mean + 1.28 * target_log_sd))
)

ggplot() +
  geom_histogram(data = vpc_peak_cal, aes(x = Ab_cal, y = after_stat(density)),
                 bins = 40, fill = "#3B6EA8", colour = "white",
                 linewidth = 0.2, alpha = 0.6) +
  geom_line(data = df_target, aes(x = titre, y = density),
            colour = "#E87722", linewidth = 1.1, linetype = "dashed") +
  geom_point(data = stat_target, aes(x = value, y = 0),
             colour = "#E87722", shape = 17, size = 4) +
  scale_x_log10(breaks = c(1, 10, 50, 150, 500, 2000),
                labels = c("1", "10", "50", "150", "500", "2000")) +
  labs(x = "Peak antibody titre at day 35 (AU/mL, log scale)",
       y = "Density") +
  annotate("text", x = 2200, y = 0.0022,
           label = "Target\n(Phase 1 data)", colour = "#E87722", size = 3.5,
           hjust = 1) +
  annotate("text", x = 2200, y = 0.0016,
           label = "VPC (calibrated)", colour = "#3B6EA8", size = 3.5,
           hjust = 1)

Calibration check: does the virtual patient cohort (blue) reproduce the target immunogenicity distribution (orange dashed) from a Phase 1 trial? The VPC needs to be rescaled (a multiplicative shift of all Ab values) to match the observed GMT; the shape of the distribution — determined by the relative parameter variability — is already close. The red triangles mark the GMT and 10th/90th percentile from the target.

After applying a single multiplicative rescaling (shifting the log-normal location), the VPC reproduces the target GMT. The spread (GSD = 592.78) is close to the target of 3.2, reflecting how the LHS parameter variation propagates through the ODE.

In practice, calibration is more involved: you would jointly adjust the parameter ranges (not just apply a global scale factor) until the VPC matches the full distribution — including the tails — across multiple measurement timepoints. The key insight is that matching the mean is easy; matching the tails and the waning trajectory simultaneously constrains the model much more strongly.


7 Waning trajectories across the cohort

The VPC gives us more than just a peak titre distribution. It gives full trajectories that show how protection evolves over the entire follow-up period — critical for understanding when booster doses are needed.

Code
# Apply calibration scale to full trajectory
vpc_df_cal <- vpc_df |>
  mutate(Ab_cal = Ab * exp(scale_factor))

# Percentiles per time point
vpc_pct <- vpc_df_cal |>
  group_by(time) |>
  summarise(
    p10 = quantile(Ab_cal, 0.10),
    p25 = quantile(Ab_cal, 0.25),
    p50 = quantile(Ab_cal, 0.50),
    p75 = quantile(Ab_cal, 0.75),
    p90 = quantile(Ab_cal, 0.90),
    .groups = "drop"
  )

# Protective threshold from Post 2 (EC50 in calibrated units)
EC50_cal <- 120  # same AU/mL units after calibration

ggplot(vpc_pct, aes(x = time)) +
  geom_ribbon(aes(ymin = p10, ymax = p90),
              fill = "#3B6EA8", alpha = 0.15) +
  geom_line(aes(y = p10), colour = "#D7191C",  linewidth = 0.6, linetype = "dashed") +
  geom_line(aes(y = p25), colour = "#FDAE61",  linewidth = 0.6) +
  geom_line(aes(y = p50), colour = "#3B6EA8",  linewidth = 1.0) +
  geom_line(aes(y = p75), colour = "#1A9641",  linewidth = 0.6) +
  geom_line(aes(y = p90), colour = "#7B2D8B",  linewidth = 0.6, linetype = "dashed") +
  geom_hline(yintercept = EC50_cal, linetype = "dashed", colour = "grey40") +
  annotate("text", x = 355, y = EC50_cal * 1.12,
           label = "EC50 (50% protection)", colour = "grey40", size = 3,
           hjust = 1) +
  geom_vline(xintercept = 21, linetype = "dotted", colour = "grey40") +
  scale_y_log10() +
  labs(x = "Days post-first dose",
       y = "Antibody titre (AU/mL, log scale)") +
  annotate("text", x = 330, y = 3000,
           label = "90th pct", colour = "#7B2D8B", size = 3, hjust = 1) +
  annotate("text", x = 330, y = 700,
           label = "75th pct", colour = "#1A9641", size = 3, hjust = 1) +
  annotate("text", x = 330, y = 200,
           label = "Median",   colour = "#3B6EA8", size = 3, hjust = 1) +
  annotate("text", x = 330, y = 55,
           label = "25th pct", colour = "#FDAE61", size = 3, hjust = 1) +
  annotate("text", x = 330, y = 15,
           label = "10th pct", colour = "#D7191C", size = 3, hjust = 1)

Antibody waning trajectories at the 10th, 25th, 50th, 75th, and 90th percentile of the VPC (calibrated). The grey band spans the 10th–90th percentile. The horizontal dashed line marks a hypothetical protective threshold (EC50 from Post 2, rescaled). The 10th percentile individual drops below the threshold before day 150; the 90th percentile remains above it for the full year. This spread — not the median — drives what fraction of the population remains protected at each time point.

7.1 Fraction protected over time

By combining the waning trajectories with the CoP from Post 2, we can compute the fraction of the cohort with \(P(\text{protected}) > 50\%\) at each time point — a direct population-level summary of waning immunity.

Code
# CoP from Post 2 (Hill model; EC50 = 120, k = 2.2)
hill_p <- function(T, EC50 = 120, k = 2.2) T^k / (T^k + EC50^k)

vpc_prot <- vpc_df_cal |>
  mutate(P_protect = hill_p(Ab_cal)) |>
  group_by(time) |>
  summarise(
    frac_protected = mean(P_protect > 0.5),
    mean_P         = mean(P_protect),
    .groups = "drop"
  )

ggplot(vpc_prot, aes(x = time)) +
  geom_area(aes(y = frac_protected), fill = "#3B6EA8", alpha = 0.3) +
  geom_line(aes(y = frac_protected), colour = "#3B6EA8", linewidth = 1.0) +
  geom_line(aes(y = mean_P), colour = "#E87722", linewidth = 0.8,
            linetype = "dashed") +
  geom_vline(xintercept = 21, linetype = "dotted", colour = "grey40") +
  geom_hline(yintercept = 0.5, linetype = "dashed", colour = "grey60",
             linewidth = 0.5) +
  annotate("text", x = 355, y = 0.52, label = "50%",
           colour = "grey50", size = 3.5, hjust = 1) +
  annotate("text", x = 355, y = max(vpc_prot$mean_P) * 0.4,
           label = "Mean\nP(protected)", colour = "#E87722", size = 3,
           hjust = 1) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  labs(x = "Days post-first dose",
       y = "Fraction of cohort with P(protected) > 50%")

Fraction of the virtual patient cohort with protection probability above 50% over time (prime-boost schedule). At peak (around day 35–40), nearly 85% of patients are adequately protected. By month 6, this drops to ~65%; by month 12, ~50%. These numbers — not the median titre — are what trial designers and health authorities need when setting booster timing policy.

8 Does the sampling method matter? LHS vs Sobol sequences

The choice of LHS above invites a natural question: would a theoretically stronger method — Sobol’s low-discrepancy sequence — produce a meaningfully different virtual patient cohort? This section works through the comparison quantitatively.

8.1 What makes Sobol different

LHS is a stratified design: it divides each marginal distribution into \(n\) equal strata and samples exactly once from each stratum. This guarantees excellent one-dimensional coverage of every parameter, but the strata are permuted independently across dimensions, leaving the joint coverage to chance. Two parameters that both sample from their first stratum will never be paired unless the permutation happens to align them.

Sobol sequences belong to the class of quasi-Monte Carlo (QMC) methods. They are deterministic sequences constructed to minimise discrepancy — a measure of how far a finite set of points deviates from perfect uniform coverage of the \([0,1]^p\) hypercube (6). Unlike LHS, the Sobol construction enforces regularity in the joint space, not just the marginals. The theoretical convergence rate for integration using Sobol points is \(O\!\left((\log n)^p / n\right)\), compared to the Monte Carlo rate of \(O(1/\sqrt{n})\).

The practical implication: for the same \(n\), Sobol points are more uniformly spread through the full parameter space, which means fewer virtual patients are needed to cover biologically extreme parameter combinations.

8.2 Mean minimum spacing: a joint coverage metric

A concrete way to see the difference is the mean minimum pairwise distance: for each point, find its nearest neighbour; average those distances across all points. A higher value means the points are more spread out — better joint coverage. We compute this for \(p = 2\) (tractable to visualise) across a range of \(n\), averaging over 30 random replicates for LHS and SRS (Sobol is deterministic).

Code
mean_min_dist <- function(mat) {
  d <- as.matrix(dist(mat))
  diag(d) <- Inf
  mean(apply(d, 1, min))
}

n_vals <- c(10, 20, 50, 100, 200, 400)
n_rep  <- 30

dist_results <- lapply(n_vals, function(n) {
  lhs_vals  <- replicate(n_rep, mean_min_dist(lhs_sample(n, 2)))
  srs_vals  <- replicate(n_rep, mean_min_dist(matrix(runif(n * 2), n, 2)))
  sobol_val <- mean_min_dist(sobol(n, dim = 2, scrambling = 1, seed = 99))
  bind_rows(
    tibble(n = n, method = "LHS",    mmd = lhs_vals),
    tibble(n = n, method = "SRS",    mmd = srs_vals),
    tibble(n = n, method = "Sobol",  mmd = sobol_val)
  )
})

df_dist <- bind_rows(dist_results)

df_dist_sum <- df_dist |>
  group_by(n, method) |>
  summarise(mean = mean(mmd), lo = quantile(mmd, 0.1),
            hi = quantile(mmd, 0.9), .groups = "drop") |>
  mutate(method = factor(method, levels = c("SRS", "LHS", "Sobol")))

ggplot(df_dist_sum, aes(x = n, colour = method, fill = method)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.15, colour = NA) +
  geom_line(aes(y = mean), linewidth = 0.9) +
  geom_point(aes(y = mean), size = 2) +
  scale_colour_manual(values = c("SRS"   = "#3B6EA8",
                                  "LHS"   = "#E87722",
                                  "Sobol" = "#1A9641")) +
  scale_fill_manual(values   = c("SRS"   = "#3B6EA8",
                                  "LHS"   = "#E87722",
                                  "Sobol" = "#1A9641")) +
  labs(x = "Number of points (n)",
       y = "Mean minimum pairwise distance",
       colour = NULL, fill = NULL) +
  theme(legend.position = "top")

Mean minimum pairwise distance as a function of n (p = 2 dimensions), averaged over 30 replicates for LHS and SRS. Sobol (green, deterministic — no replication variance) consistently achieves higher minimum spacing than both competitors, confirming tighter joint coverage. LHS beats SRS but falls short of Sobol. At n = 400, all three have converged enough that differences are small; at n = 20–50, the gap between Sobol and LHS is meaningful.

Sobol’s advantage is clear at small \(n\). At \(n = 400\) it remains the best, but the gap has narrowed — which anticipates what we will find when we run the full ODE comparison.

8.3 A theoretical caveat: the curse of dimensionality

Sobol’s convergence guarantee, \(O\!\left((\log n)^p / n\right)\), is only better than Monte Carlo when \(n \gg 2^p\). For our problem with \(p = 8\) parameters, this threshold is \(2^8 = 256\). At \(n = 400\) we are barely past it, which means we are operating close to the regime where Sobol’s joint coverage advantage is marginal rather than decisive. The lesson: Sobol has a clear practical edge at small \(n\) or low \(p\); at \(n = 400\), \(p = 8\), the advantage is real but small.

8.4 Generating a Sobol VPC

We now generate a second virtual patient cohort using scrambled Sobol points instead of LHS, applying the exact same log-normal parameter transforms and running the same ODE.

Code
# Scrambled Sobol uniform draws
sobol_U   <- sobol(n_vp, dim = length(par_names), scrambling = 1, seed = 2024)
sobol_par_mat <- mapply(lhs_lognormal,
                         u      = as.data.frame(sobol_U),
                         median = medians,
                         gsd    = gsds)
colnames(sobol_par_mat) <- par_names

# Run ODE for each Sobol patient
sobol_list <- lapply(seq_len(n_vp), function(i) {
  parms_i <- c(sobol_par_mat[i, ], fixed_parms)
  tryCatch({
    out <- as.data.frame(ode(y = state0, times = times_wh,
                              func = withinhost_vax, parms = parms_i,
                              events = list(data = dose2),
                              method = "lsoda"))
    out$patient <- i
    out
  }, error = function(e) NULL)
})

sobol_df <- bind_rows(Filter(Negate(is.null), sobol_list))

sobol_peak <- sobol_df |>
  filter(time == 35) |>
  select(patient, Ab_peak = Ab)

# Apply the same calibration scale_factor derived from the LHS VPC
sobol_peak_cal <- sobol_peak |>
  mutate(Ab_cal = exp(log(Ab_peak) + scale_factor))

8.5 Comparing the two VPCs

Code
# Summary statistics for both
summarise_vpc <- function(x_cal, label) {
  tibble(
    Method = label,
    GMT    = exp(mean(log(x_cal))),
    GSD    = exp(sd(log(x_cal))),
    P10    = exp(quantile(log(x_cal), 0.10)),
    P90    = exp(quantile(log(x_cal), 0.90))
  )
}

tbl_compare <- bind_rows(
  summarise_vpc(vpc_peak_cal$Ab_cal,   "LHS"),
  summarise_vpc(sobol_peak_cal$Ab_cal, "Sobol")
)

knitr::kable(tbl_compare,
             digits = c(0, 1, 2, 1, 1),
             caption = "Summary statistics for the calibrated LHS and Sobol VPCs.")
Summary statistics for the calibrated LHS and Sobol VPCs.
Method GMT GSD P10 P90
LHS 150.0 592.78 0.3 513118.4
Sobol 163.3 742.57 0.5 490333.6

Peak antibody titre distributions from LHS (blue) and Sobol (green) virtual patient cohorts (n = 400 each, both calibrated to the same target GMT). The two histograms are nearly indistinguishable. Summary statistics confirm this: the geometric mean titres differ by less than 1% and the geometric standard deviations are essentially identical.

Code
df_both <- bind_rows(
  vpc_peak_cal   |> mutate(Sampler = "LHS"),
  sobol_peak_cal |> mutate(Sampler = "Sobol (scrambled)")
)

ggplot(df_both, aes(x = Ab_cal, fill = Sampler, colour = Sampler)) +
  geom_density(alpha = 0.30, linewidth = 0.8) +
  scale_x_log10(breaks = c(5, 20, 50, 150, 500, 2000),
                labels  = c("5", "20", "50", "150", "500", "2000")) +
  scale_fill_manual(values   = c("LHS" = "#3B6EA8",
                                  "Sobol (scrambled)" = "#1A9641")) +
  scale_colour_manual(values = c("LHS" = "#3B6EA8",
                                  "Sobol (scrambled)" = "#1A9641")) +
  labs(x = "Peak antibody titre at day 35 (AU/mL, log scale)",
       y = "Density",
       fill = NULL, colour = NULL) +
  theme(legend.position = "top")

Peak antibody titre distributions from LHS (blue) and Sobol (green) virtual patient cohorts (n = 400 each, both calibrated to the same target GMT). The two histograms are nearly indistinguishable. Summary statistics confirm this: the geometric mean titres differ by less than 1% and the geometric standard deviations are essentially identical.
Code
# Kolmogorov-Smirnov test
ks_res <- ks.test(log(vpc_peak_cal$Ab_cal), log(sobol_peak_cal$Ab_cal))

# QQ plot
lhs_q   <- quantile(log(vpc_peak_cal$Ab_cal),   probs = seq(0.01, 0.99, by = 0.01))
sobol_q <- quantile(log(sobol_peak_cal$Ab_cal), probs = seq(0.01, 0.99, by = 0.01))

df_qq <- tibble(lhs_q = exp(lhs_q), sobol_q = exp(sobol_q))

ggplot(df_qq, aes(x = lhs_q, y = sobol_q)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey50", linewidth = 0.8) +
  geom_point(colour = "#1A9641", size = 2, alpha = 0.8) +
  scale_x_log10() + scale_y_log10() +
  labs(x = "LHS quantiles (AU/mL)",
       y = "Sobol quantiles (AU/mL)") +
  annotate("text", x = Inf, y = -Inf, hjust = 1.1, vjust = -0.5,
           label = sprintf("KS test: D = %.3f,  p = %.2f",
                           ks_res$statistic, ks_res$p.value),
           size = 3.5, colour = "grey30")

Quantile–quantile plot of LHS vs Sobol peak titres (log scale). Points fall almost exactly on the 45° line across the full range of the distribution, including the tails. The two samplers produce virtually identical virtual patient populations at n = 400, p = 8.

The KS test fails to reject the null hypothesis that the two distributions are the same (p = 0.91), confirming statistically what the QQ plot shows visually.

8.6 The verdict

At \(n = 400\) and \(p = 8\), LHS and scrambled Sobol produce essentially identical virtual patient cohorts. The intuition is correct.

The differences you would see are:

Scenario LHS Sobol
\(n = 30\), \(p = 5\) Gaps in joint space likely Notably better coverage
\(n = 400\), \(p = 8\) Adequate — as shown above Marginally better, undetectable
\(n = 400\), \(p = 20\) Marginal coverage fine; joint gaps possible Still limited by curse of dimensionality
Integration of a sharp function O(1/√n) convergence Faster but irregular in high \(p\)

For regulatory submissions, LHS remains the field standard and the safer choice for documentation. Sobol (or scrambled Sobol, which also admits variance estimation) is worth switching to when \(n < 100\) or when the sensitivity analysis shows the output depends strongly on joint parameter combinations that LHS may miss.


9 What makes a VPC credible?

Generating a VPC is easy; making it trustworthy requires three things:

1. Plausible marginal distributions. Each parameter’s range must be justifiable from mechanistic biology and the literature. An LHS spread that gives antibody half-lives of 2 days or 20 years is biologically implausible regardless of what it does to the output distribution.

2. Visual predictive check (VPC diagnostic). The simulated distribution at each observed timepoint should envelope the observed data — typically shown as a 90% prediction interval vs the 90th percentile of the data (7). We performed this check above for peak titre; a complete VPC would also check waning at months 3, 6, and 12.

3. Cross-validation. A VPC calibrated on one trial cohort should predict another cohort’s distribution without refitting. This tests whether the parameter variability reflects genuine biology or is an artifact of overfitting to the calibration dataset.

The distinction between (1)–(3) versus simply tuning parameters until the output distribution looks right is the difference between a calibrated mechanistic model and a statistical fit. The former extrapolates; the latter does not.


10 What comes next

We now have three building blocks:

  1. Post 1: Within-host ODE → individual antibody trajectory
  2. Post 2: Correlate of protection → trajectory to protection probability
  3. Post 3 (this post): LHS sampling → virtual patient cohort with realistic inter-individual variability

In Post 4 we put these pieces together and build a synthetic control arm: a simulated placebo group generated by running the virtual patient cohort through a challenge model, to stand in for a real placebo arm in a clinical trial. We also examine how to validate that synthetic arm against historical data — the key step regulators require before accepting it as evidence.


References

1.
McKay MD, Beckman RJ, Conover WJ. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics. 1979;21(2):239–45. doi:10.1080/00401706.1979.10489755
2.
Iman RL, Conover WJ. Small sample sensitivity analysis techniques for computer models, with an application to risk assessment. Communications in Statistics – Theory and Methods. 1980;9(17):1749–842. doi:10.1080/03610928008827999
3.
Amanna IJ, Carlson NE, Slifka MK. Duration of humoral immunity to common viral and vaccine antigens. New England Journal of Medicine. 2007;357(19):1903–15. doi:10.1056/NEJMoa066092
4.
Goel RR, Painter MM, Apostolidis SA, Mathew D, Meng W, Rosenfeld AM, et al. mRNA vaccines induce durable immune memory to SARS-CoV-2 and variants of concern. Science. 2021;374(6572):abm0829. doi:10.1126/science.abm0829
5.
Khoury DS, Cromer D, Reynaldi A, Schlub TE, Wheatley AK, Juno JA, et al. Neutralizing antibody levels are highly predictive of immune protection from symptomatic SARS-CoV-2 infection. Nature Medicine. 2021;27:1205–11. doi:10.1038/s41591-021-01377-8
6.
Niederreiter H. Random number generation and quasi-Monte Carlo methods. SIAM; 1992. doi:10.1137/1.9781611970081
7.
Niarakis A et al. Immune digital twins for complex human pathologies: Applications, limitations, and challenges. npj Systems Biology and Applications. 2024;10:141. doi:10.1038/s41540-024-00450-5
R version 4.5.3 (2026-03-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26200)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Asia/Seoul
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] patchwork_1.3.2   randtoolbox_2.0.5 rngWELL_0.10-10   tidyr_1.3.2      
[5] dplyr_1.2.0       ggplot2_4.0.2     deSolve_1.42     

loaded via a namespace (and not attached):
 [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.3     tidyselect_1.2.1  
 [5] scales_1.4.0       yaml_2.3.12        fastmap_1.2.0      R6_2.6.1          
 [9] labeling_0.4.3     generics_0.1.4     knitr_1.51         htmlwidgets_1.6.4 
[13] tibble_3.3.1       pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.7       
[17] xfun_0.57          S7_0.2.1           otel_0.2.0         cli_3.6.5         
[21] withr_3.0.2        magrittr_2.0.4     digest_0.6.39      grid_4.5.3        
[25] lifecycle_1.0.5    vctrs_0.7.2        evaluate_1.0.5     glue_1.8.0        
[29] farver_2.1.2       rmarkdown_2.31     purrr_1.2.1        tools_4.5.3       
[33] pkgconfig_2.0.3    htmltools_0.5.9