Using fluEvidenceSynthesis to infer epidemological parameters.

Edwin van Leeuwen

2020-01-14

–>

Introduction

This package implements all the tools needed to analyse and compare the effectiveness of different Influenza vaccine programs (see Baguelin et al. (2013) for more details). This analysis has two main steps.

  1. Parameter inference using existing model
  2. Simulation of different possible vaccination strategies using the inferred parameters

This vignette will focus on how the fluEvidenceSynthesis package can be used to perform the first step of parameter inference.

The method we use here combines data from numerous sources that are then used to compute the likelihood of the predicted number of influenza cases in a given week. Given the data and the likelihood function we use MCMC to obtain the posterior distribution of the parameters of an underlying epidemiological model. In this vignette we give practical examples on the required data and how to use the package to perform the MCMC fitting.

The epidemiological model is a compartmental model with Susceptible, Exposed, Infection and Recovered (SEIR) compartments. Waiting time between Exposed and Infectious and between Infectious and Recovered are assumed to be gamma distributed. The epidemiological parameters used in the model are the transmissibility, susceptibility, time latent (time between exposed and susceptible) and the time infectious. Further details of the underlying model and techniques can be found in Baguelin et al. (2013).

The methods are computationally intensive and fitting the model can easily take a couple of hours on a powerfull machine. For example an inference run of 11 million samples (1 million burn in) will take about 8-9 hours on a relatively modern machine (2015, Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz). Of course one can run different years/strains in parallell.

Examples

This vignette includes three examples. The first one has a model of two age groups and one risk group. The second example is based on the UK analysis, which includes 7 age groups, and two risk groups. The final example implements the UK analysis in pure R. This can be used as a starting point on how to implement your own analysis based on our analysis.

Two age groups

In this example we will run parameter inference for a model with two age groups ([0,65) and [65,+)) and one risk group. The four main steps needed are 1) prepare the data, 2) load a vaccination calendar, 3) decide on parameterisation of the model and 4) run the inference.

Prepare the data

The main sets of data needed are the contact data (e.g. from the polymod study), Influenza Like Illness (ILI) counts throughout the year and virological confirmation samples for (a subset) of the ILI diagnosed patients. Note that the example data included in the package is part of the UK data (see example below). For this example we load and simplify the data into two age groups. When you are using your own data then you will need to load that into R instead.

For more details on the input data see ?data_name, e.g. ?polymod_uk will give an overview of the polymod data layout required.

Parameterisation

There are five main parameters included in the model. For more details on this see the modelling vignette and the manuscript (Baguelin et al., 2013).

  • Ascertainment probabilty for three age groups (\(\epsilon_i\))
  • Outside infection (\(\psi\))
  • Transmissibility (\(q\))
  • Susceptibility for three age groups (\(\sigma_i\))
  • Initial number of infections (log transformed; \(I\))

The ascertianment probality and susceptibility are age group specific, so we need two parameters for that. Next we chose relevant initial parameter values. The exact values chosen here are not that relevant, although the closer to the correct values the faster the inference of the parameters inference will converge.

Inference

The inference function returns a list with the accepted parameters (inference.results$batch) and the corresponding log-likelihood values (inference.results$llikelihoods) as well as a matrix (inference.results$contact.ids) containing the row numbers of the contacts data used to build the contact matrix.

Table continues below
epsilon_1 epsilon_2 psi transmissibility susceptibility_1
0.1004 0.09865 0.0002652 0.1591 0.4995
0.1004 0.09865 0.0002652 0.1591 0.4995
0.1004 0.09865 0.0002652 0.1591 0.4995
0.1004 0.09864 0.000263 0.159 0.4995
0.1004 0.09864 0.000263 0.159 0.4995
0.1004 0.09864 0.000263 0.159 0.4995
susceptibility_2 initial_infected
0.4994 -0.1499
0.4994 -0.1499
0.4994 -0.1499
0.4994 -0.15
0.4994 -0.15
0.4994 -0.15

UK based example

The UK model is more complicated than the example above, because we model 7 different age groups (\([0,1), [1,5), [5,15), [15,25), [25,45), [45,65), [65,+)\)) and two risk groups (low risk and high risk), so our vaccination calendar represents that fact. In contrast our ILI and virological data is only recorded in five different age groups (\([0,5), [5,15), [15,45), [45,65), [65,+)\)) and is not split into multiple age groups. This means that we need to map our epidemiological model output (7 age groups, 2 risk groups) to the data (5 age groups). This can be done using the age_group_mapping and risk_group_mapping functions.

Another detail is that we reduced the models complexity by assuming that age group [0,15) and [15,65) had the same ascertainment rate (\(\epsilon\)) and susceptibility. To do this we pass a parameter_map to the inference function, which will map the needed parameters to certain indices in the parameter list (see ?parameter_mapping for more details).

library(fluEvidenceSynthesis)
data("demography")
data("polymod_uk")
data("ili")
# Virologically confirmed infections
data("confirmed.samples")

# UK vaccine calendar
data("vaccine_calendar")

vaccine_calendar$calendar[,15:21] <- 0

age_map <- age_group_mapping(c(1,5,15,25,45,65), c(5,15,45,65))
risk_map <- risk_group_mapping(c("LowRisk", "HighRisk"), c("All"))
# The percentage of each age group in the high risk group
risk_ratios <- matrix(c(0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45), ncol = 7, byrow = T)

par_map <- parameter_mapping(
  epsilon = c(1,1,2,2,3), # The first parameter in the initial.parameters is used for the first 2 age groups, etc.
  psi = 4,
  transmissibility = 5,
  susceptibility = c(6,6,6,7,7,7,8),
  initial_infected = c(9))

initial.parameters <- c(0.01188150, 0.01831852, 0.05434378,
                        1.049317e-05, 0.1657944,
                        0.3855279, 0.9269811, 0.5710709,
                        -0.1543508)
# Adding names for clarity, is not actually needed
names(initial.parameters) <- c("espilon_1", "epsilon_2", "epsilon_3", "psi",
                               "transmissibility", "susceptibility_1", "susceptibility_2", "suceptibility_3",
                               "initial_infected")

inference.results <- inference(demography = demography,
                      vaccine_calendar = vaccine_calendar,
                      polymod_data = as.matrix(polymod_uk),
                      ili = ili$ili,
                      mon_pop = ili$total.monitored,
                      n_pos = confirmed.samples$positive,
                      n_samples = confirmed.samples$total.samples,
                      initial = initial.parameters,
                      age_group_map = age_map,
                      risk_group_map = risk_map,
                      parameter_map = par_map,
                      risk_ratios = risk_ratios,
                      nbatch = 1000,
                      nburn = 1000, blen = 5 )
Table continues below
espilon_1 epsilon_2 epsilon_3 psi transmissibility
0.0112 0.01692 0.05442 2.074e-05 0.1665
0.0112 0.01692 0.05442 2.074e-05 0.1665
0.0112 0.01692 0.05442 2.074e-05 0.1665
0.0112 0.01692 0.05442 2.074e-05 0.1665
0.0112 0.01715 0.05456 1.975e-06 0.1664
0.0112 0.01715 0.05456 1.975e-06 0.1664
susceptibility_1 susceptibility_2 suceptibility_3 initial_infected
0.3858 0.9254 0.5706 -0.1542
0.3858 0.9254 0.5706 -0.1542
0.3858 0.9254 0.5706 -0.1542
0.3858 0.9254 0.5706 -0.1542
0.3859 0.9255 0.5707 -0.1539
0.3859 0.9255 0.5707 -0.1539

More in depth example

Above we used the general inference function to perform parameter inference using the data. In this section we implement this function in R, which allows adjustment of the inference process to allow different models, additional data etc.. This method gives the user more control, but it will be slower than calling the inference function directly.

To perform parameter inference we first need to define a function that returns the (log) likelihood for given parameter values, as dependent on the data and a function which returns the log prior probability of the parameters. These functions are then passed to adaptive.mcmc which returns a posterior sample for the parameter values. By combining a number of data sources, the likelihood function becomes relatively complex, and so needs to perform the following steps (for full details see Baguelin et al., 2013):

Following these steps we can implement our custom inference function as follows:

library(fluEvidenceSynthesis)

# The custom inference function. In this example the custom inference function 
# performs exactly the same inference as the original C++ function (above). 
# It is up to the user to change this in a way that works for their analysis.
custom_inference <- function(demography, vaccine_calendar, polymod_data, ili, 
                             mon_pop, n_pos, n_samples, initial, mapping,
                             nbatch, nburn, blen) {
  current.contact.ids <- seq(1,nrow(polymod_uk))
  proposed.contact.ids <- current.contact.ids
  
  # Seven age groups used in the model
  age.group.limits <- c(1,5,15,25,45,65)
  
  # Sum all populations with a certain age into their corresponding age group
  age.group.sizes.5 <- stratify_by_age(demography, c(5,15,45,65))
  
  if (missing(mapping))
    mapping <- age_group_mapping(age.group.limits, c(5,15,45,65))
  
  # Define the actual log likelihood function
  llikelihood <- function( pars ) {
    # Resample contact ids 
    proposed.contact.ids <<- current.contact.ids
    if (runif(1,0,1) < 0.1) {
      rs <- round(runif(2,1,length(proposed.contact.ids)))
      proposed.contact.ids[rs[1]] <<- rs[2]
    }
    
    contacts <- contact_matrix(as.matrix(polymod_uk[proposed.contact.ids,]),
                               demography, age.group.limits )
    
    age.groups <- stratify_by_age(demography, 
                                  age.group.limits )
    
    # Fraction of each age group classified as high risk
    # We can classify a third risk group, but we are not doing
    # that here (the second row is 0 in our risk.ratios matrix)
    risk.ratios <- matrix(c(
      0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45, 
      0, 0, 0, 0, 0, 0, 0                          
    ), ncol = 7, byrow = T)
    
    # Population sizes in each age and risk group
    popv <- stratify_by_risk(
      age.groups, risk.ratios );
    
    # Population size initially infected by age and risk group
    initial.infected <- rep( 10^pars[9], 7 ) 
    initial.infected <- stratify_by_risk(
      initial.infected, risk.ratios );
    
    # Run simulation
    # Note that to reduce complexity 
    # we are using the same susceptibility parameter for multiple age groups
    odes <- infectionODEs( popv, initial.infected,
                           vaccine_calendar,
                           contacts,
                           c(pars[6], pars[6], pars[6],
                             pars[7], pars[7], pars[7], pars[8]),
                           transmissibility = pars[5],
                           c(0.8,1.8), 7 )
    
    # Ignore times row
    odes <- odes[,2:22]
    
    # Convert the 7 age groups for each risk group to 5 groups
    from <- as.numeric(mapping$from)
    to <- as.numeric(mapping$to)
    converted.odes <- matrix(0, nrow = nrow(odes), ncol = max(to))
    for (i in 1:nrow(mapping)) {
      # all three age groups
      fv <- c(0,7,14) + from[i]
      converted.odes[,to[i]] <- converted.odes[,to[i]] + mapping$weight[i]*rowSums(odes[,fv]) 
    }
    
    # For each week and each group sum log likelihood
    epsilons <- c(pars[1], pars[1], pars[2], pars[2], pars[3])
    ll <- log_likelihood_cases(
      epsilons,pars[4], as.matrix(converted.odes),
      age.group.sizes.5, ili, mon_pop,
      n_pos, n_samples)
    return(ll)
  }
  llprior <- function(pars) {
    if (any(pars[1:8] < 0) || any(pars[1:4] > 1) || any(pars[6:8] > 1)
        || pars[9] < log(0.00001) || pars[9] > log(10) )
      return(-Inf)
    
    lprob <- dnorm(pars[5], 0.1653183, 0.02773053, 1)
    lprob <- lprob + dlnorm(pars[1], -4.493789, 0.2860455, 1)
    lprob <- lprob + dlnorm(pars[2], -4.117028, 0.4751615, 1)
    lprob <- lprob + dlnorm(pars[3], -2.977965, 1.331832, 1)
    
    return(lprob)
  }
  
  # Store the contact ids used during inference
  contact.ids <- list()
  
  # Run adaptive.mcmc
  mcmc.result <- adaptive.mcmc(lprior = llprior, llikelihood = llikelihood, 
                               outfun = function() { 
                                 contact.ids[[length(contact.ids)+1]] <<-  current.contact.ids
                               },
                               acceptfun = function() {
                                 current.contact.ids <<- proposed.contact.ids
                               },
                               nburn = nburn, 
                               initial = initial,
                               nbatch = nbatch, blen = blen)
  mcmc.result$contact.ids <- t(data.frame(contact.ids))
  mcmc.result
}

The resulting custom inference function can be called similarly to the original inference function.

Analysing the results

Plotting the resulting posterior parameter values1.

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
library(ggplot2)
colnames(inference.results$batch) <- c("eps1", "eps2", "eps3", "psi", "q",
                                       "susc1", "susc2", "susc3", "I0")

ggplot(data=melt(inference.results$batch)) + facet_wrap( ~ Var2, ncol=3, scales="free" ) + geom_histogram(aes(x=value), bins=25)

Posterior model results

Here we plot the credibility intervals of our models. Each plot is the result for one of the seven age groups and shows the number of new cases each week.

library(fluEvidenceSynthesis)
library(ggplot2)

# Function that runs the model given a set of parameters. Most of the function
# has todo with loading the correct inputs for the ODE model
ode.results <- function( pars ) 
{
  data("demography")
  
  age.group.limits <- c(1,5,15,25,45,65)
  contacts <- contact_matrix(as.matrix(polymod_uk),
                             demography, age.group.limits )
  
  age.groups <- stratify_by_age(demography, age.group.limits)
  
  # Fraction of each age group classified as high risk
  # We can classify a third risk group, but we are not doing
  # that here (the second row is 0 in our risk.ratios matrix)
  risk.ratios <- matrix( c(
    0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45, 
    0, 0, 0, 0, 0, 0, 0                          
  ), ncol=7, byrow=T )
  
  # Population sizes in each age and risk group
  popv <- stratify_by_risk(
    age.groups, risk.ratios )
  
  # Population size initially infected by age and risk group
  initial.infected <- rep( 10^pars[9], 7 )
  initial.infected <- stratify_by_risk(
    initial.infected, risk.ratios )
  
  # Run simulation
  # Note that to reduce complexity 
  # by using the same susceptibility parameter for multiple age groups
  odes <- infectionODEs( popv, initial.infected,
                         vaccine_calendar,
                         contacts,
                         c(pars[6],pars[6],pars[6],
                           pars[7],pars[7],pars[7],pars[8]),
                         transmissibility=pars[5],
                         c(0.8,1.8), 7 )
  
  # For simplicity we sum the low and high risk group
  simplify.odes <- odes[,2:8]+odes[,9:15]+odes[,16:22]
  rownames(simplify.odes) <- odes[,1]
  return( simplify.odes )
}

# Calculate the credibility intervals for each time point. By default the function
# calculate it for the (equal tailed) credibility interval of 0% (median), 50% and 98%
cim <- credible.interval.model(ode.results, inference.results$batch, intervals=c(0,0.5, 0.98))
cim$row.ID <- as.Date(as.character(cim$row.ID)) 

ggplot( data=cim ) + facet_wrap( ~ column.ID, ncol=3, scales="free" ) +
  stat_ci( aes(row.ID,value, ci = aggregate) ) +
  xlab("Time") + ylab("New infections")

Possible exercise(s)

  1. Plot the covariance between parameters

References

Baguelin, Marc, Stefan Flasche, Anton Camacho, Nikolaos Demiris, Elizabeth Miller, and W. John Edmunds. ‘Assessing Optimal Target Populations for Influenza Vaccination Programmes: An Evidence Synthesis and Modelling Study.’ PLoS Med 10, no. 10 (2013): e1001527. doi:10.1371/journal.pmed.1001527.


  1. Be aware that these results are for a very short mcmc run and are not realistic