Simulating Correlated Outbreak Parameters Using Copulas
Author
Jong-Hoon Kim
Published
July 25, 2025
I wanted to simulate synthetic outbreaks based on realistic distributions for the following three parameters:
\(t_{\text{peak}}\): Time to peak (weeks after outbreak start)
\(I_{\text{peak}}\): Peak weekly incidence (number of cases)
\(D\): Total outbreak duration (in weeks)
These parameters are typically right-skewed, strictly positive, and correlated. A naive approach that samples each parameter independently fails to capture their interdependencies. Instead, we can use a copula-based approach to model their joint distribution realistically.
In this post, we will:
Explain copulas and how they help capture correlations between different distributions.
Simulate fake outbreak parameter data using a Gaussian copula.
Fit a hierarchical Bayesian copula model using rstan.
1 What is a Copula?
A copula is a multivariate distribution that allows us to model dependence structures separately from marginal distributions. Given random variables \(X_1, X_2, X_3\) with different marginal distributions (e.g., log-normal, negative binomial, gamma), a copula binds them together using their ranks (uniform CDF values).
The Gaussian copula uses a multivariate normal distribution on the inverse CDF-transformed variables:
mod <-stan_model("copula_hier.stan")fit <-sampling(mod, data = stan_data,chains =4, iter =1500, warmup =750,seed =2025,control =list(adapt_delta =0.9))print(fit, pars =c("mu_t0","mu_I0","mu_D0","sigma_t","phi_I","shape_D","rate_D"))
Inference for Stan model: anon_model.
4 chains, each with iter=1500; warmup=750; thin=1;
post-warmup draws per chain=750, total post-warmup draws=3000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
mu_t0 1.52 0.00 0.17 1.19 1.41 1.52 1.63 1.88 1231 1
mu_I0 3.18 0.01 0.33 2.52 2.97 3.17 3.37 3.84 743 1
mu_D0 2.73 0.02 1.44 -0.03 1.74 2.72 3.71 5.47 5547 1
sigma_t 0.59 0.00 0.08 0.45 0.53 0.58 0.63 0.77 1470 1
phi_I 0.27 0.00 0.08 0.15 0.22 0.26 0.32 0.45 2711 1
shape_D 3.28 0.01 0.62 2.18 2.84 3.23 3.67 4.59 1970 1
rate_D 0.25 0.00 0.05 0.16 0.21 0.24 0.28 0.35 1966 1
Samples were drawn using NUTS(diag_e) at Fri Jul 25 16:08:17 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Using copulas allows us to flexibly simulate and estimate correlated epidemiological parameters without assuming a joint normal distribution. This approach can be extended to include additional outbreak features (e.g., start week, region, climate covariates) and can be combined with downstream simulations for vaccine impact or ORI strategies.
6 Step 5: Compare Posterior Estimates vs. True Values