–>
This package implements all the tools needed to analyse and compare the effectiveness of different Influenza vaccine programs (see Baguelin et al. (2013) for more details). This analysis has two main steps.
This vignette will focus on how the fluEvidenceSynthesis package can be used to perform the first step of parameter inference.
The method we use here combines data from numerous sources that are then used to compute the likelihood of the predicted number of influenza cases in a given week. Given the data and the likelihood function we use MCMC to obtain the posterior distribution of the parameters of an underlying epidemiological model. In this vignette we give practical examples on the required data and how to use the package to perform the MCMC fitting.
The epidemiological model is a compartmental model with Susceptible, Exposed, Infection and Recovered (SEIR) compartments. Waiting time between Exposed and Infectious and between Infectious and Recovered are assumed to be gamma distributed. The epidemiological parameters used in the model are the transmissibility, susceptibility, time latent (time between exposed and susceptible) and the time infectious. Further details of the underlying model and techniques can be found in Baguelin et al. (2013).
The methods are computationally intensive and fitting the model can easily take a couple of hours on a powerfull machine. For example an inference run of 11 million samples (1 million burn in) will take about 8-9 hours on a relatively modern machine (2015, Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz). Of course one can run different years/strains in parallell.
This vignette includes three examples. The first one has a model of two age groups and one risk group. The second example is based on the UK analysis, which includes 7 age groups, and two risk groups. The final example implements the UK analysis in pure R. This can be used as a starting point on how to implement your own analysis based on our analysis.
In this example we will run parameter inference for a model with two age groups ([0,65) and [65,+)) and one risk group. The four main steps needed are 1) prepare the data, 2) load a vaccination calendar, 3) decide on parameterisation of the model and 4) run the inference.
The main sets of data needed are the contact data (e.g. from the polymod study), Influenza Like Illness (ILI) counts throughout the year and virological confirmation samples for (a subset) of the ILI diagnosed patients. Note that the example data included in the package is part of the UK data (see example below). For this example we load and simplify the data into two age groups. When you are using your own data then you will need to load that into R instead.
# Here we load the needed data and regroup it from 5 into two age groups.
library(fluEvidenceSynthesis)
data("demography")
data("polymod_uk")
data("ili")
# Virologically confirmed infections
data("confirmed.samples")
# Group all the example data into two age groups:
polymod <- polymod_uk[,c(1,2)]
polymod[,3] <- rowSums(polymod_uk[,c(3,4,5,6,7,8)])
polymod[,4] <- polymod_uk[,9]
ili_df <- ili
ili_df$ili <- ili$ili[,c(1,2)]
ili_df$ili[,1] <- rowSums(ili$ili[,c(1,2,3,4)])
ili_df$ili[,2] <- ili$ili[,5]
ili_df$total.monitored <- ili$total.monitored[,c(1,2)]
ili_df$total.monitored[,1] <- rowSums(ili$total.monitored[,c(1,2,3,4)])
ili_df$total.monitored[,2] <- ili$total.monitored[,5]
confirmed.samples_df <- confirmed.samples
confirmed.samples_df$positive <- confirmed.samples$positive[,c(1,2)]
confirmed.samples_df$positive[,1] <- rowSums(confirmed.samples$positive[,c(1,2,3,4)])
confirmed.samples_df$positive[,2] <- confirmed.samples$positive[,5]
confirmed.samples_df$total.samples <- confirmed.samples$total.samples[,c(1,2)]
confirmed.samples_df$total.samples[,1] <- rowSums(confirmed.samples$total.samples[,c(1,2,3,4)])
confirmed.samples_df$total.samples[,2] <- confirmed.samples$total.samples[,5]
For more details on the input data see ?data_name
, e.g. ?polymod_uk
will give an overview of the polymod data layout required.
We will use a similar vaccination calendar as the first example in the vaccination vignette but simplified to only have one risk group:
There are five main parameters included in the model. For more details on this see the modelling vignette and the manuscript (Baguelin et al., 2013).
The ascertianment probality and susceptibility are age group specific, so we need two parameters for that. Next we chose relevant initial parameter values. The exact values chosen here are not that relevant, although the closer to the correct values the faster the inference of the parameters inference will converge.
inference.results <- inference(demography = demography,
vaccine_calendar = vaccine_calendar,
polymod_data = as.matrix(polymod),
ili = ili_df$ili,
mon_pop = ili_df$total.monitored,
n_pos = confirmed.samples_df$positive,
n_samples = confirmed.samples_df$total.samples,
initial = initial_parameters,
age_groups = c(65),
nbatch = 1000,
nburn = 1000, blen = 5 )
The inference
function returns a list with the accepted parameters (inference.results$batch
) and the corresponding log-likelihood values (inference.results$llikelihoods
) as well as a matrix (inference.results$contact.ids
) containing the row numbers of the contacts data used to build the contact matrix.
epsilon_1 | epsilon_2 | psi | transmissibility | susceptibility_1 |
---|---|---|---|---|
0.1004 | 0.09865 | 0.0002652 | 0.1591 | 0.4995 |
0.1004 | 0.09865 | 0.0002652 | 0.1591 | 0.4995 |
0.1004 | 0.09865 | 0.0002652 | 0.1591 | 0.4995 |
0.1004 | 0.09864 | 0.000263 | 0.159 | 0.4995 |
0.1004 | 0.09864 | 0.000263 | 0.159 | 0.4995 |
0.1004 | 0.09864 | 0.000263 | 0.159 | 0.4995 |
susceptibility_2 | initial_infected |
---|---|
0.4994 | -0.1499 |
0.4994 | -0.1499 |
0.4994 | -0.1499 |
0.4994 | -0.15 |
0.4994 | -0.15 |
0.4994 | -0.15 |
The UK model is more complicated than the example above, because we model 7 different age groups (\([0,1), [1,5), [5,15), [15,25), [25,45), [45,65), [65,+)\)) and two risk groups (low risk and high risk), so our vaccination calendar represents that fact. In contrast our ILI and virological data is only recorded in five different age groups (\([0,5), [5,15), [15,45), [45,65), [65,+)\)) and is not split into multiple age groups. This means that we need to map our epidemiological model output (7 age groups, 2 risk groups) to the data (5 age groups). This can be done using the age_group_mapping
and risk_group_mapping
functions.
Another detail is that we reduced the models complexity by assuming that age group [0,15) and [15,65) had the same ascertainment rate (\(\epsilon\)) and susceptibility. To do this we pass a parameter_map
to the inference function, which will map the needed parameters to certain indices in the parameter list (see ?parameter_mapping
for more details).
library(fluEvidenceSynthesis)
data("demography")
data("polymod_uk")
data("ili")
# Virologically confirmed infections
data("confirmed.samples")
# UK vaccine calendar
data("vaccine_calendar")
vaccine_calendar$calendar[,15:21] <- 0
age_map <- age_group_mapping(c(1,5,15,25,45,65), c(5,15,45,65))
risk_map <- risk_group_mapping(c("LowRisk", "HighRisk"), c("All"))
# The percentage of each age group in the high risk group
risk_ratios <- matrix(c(0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45), ncol = 7, byrow = T)
par_map <- parameter_mapping(
epsilon = c(1,1,2,2,3), # The first parameter in the initial.parameters is used for the first 2 age groups, etc.
psi = 4,
transmissibility = 5,
susceptibility = c(6,6,6,7,7,7,8),
initial_infected = c(9))
initial.parameters <- c(0.01188150, 0.01831852, 0.05434378,
1.049317e-05, 0.1657944,
0.3855279, 0.9269811, 0.5710709,
-0.1543508)
# Adding names for clarity, is not actually needed
names(initial.parameters) <- c("espilon_1", "epsilon_2", "epsilon_3", "psi",
"transmissibility", "susceptibility_1", "susceptibility_2", "suceptibility_3",
"initial_infected")
inference.results <- inference(demography = demography,
vaccine_calendar = vaccine_calendar,
polymod_data = as.matrix(polymod_uk),
ili = ili$ili,
mon_pop = ili$total.monitored,
n_pos = confirmed.samples$positive,
n_samples = confirmed.samples$total.samples,
initial = initial.parameters,
age_group_map = age_map,
risk_group_map = risk_map,
parameter_map = par_map,
risk_ratios = risk_ratios,
nbatch = 1000,
nburn = 1000, blen = 5 )
espilon_1 | epsilon_2 | epsilon_3 | psi | transmissibility |
---|---|---|---|---|
0.0112 | 0.01692 | 0.05442 | 2.074e-05 | 0.1665 |
0.0112 | 0.01692 | 0.05442 | 2.074e-05 | 0.1665 |
0.0112 | 0.01692 | 0.05442 | 2.074e-05 | 0.1665 |
0.0112 | 0.01692 | 0.05442 | 2.074e-05 | 0.1665 |
0.0112 | 0.01715 | 0.05456 | 1.975e-06 | 0.1664 |
0.0112 | 0.01715 | 0.05456 | 1.975e-06 | 0.1664 |
susceptibility_1 | susceptibility_2 | suceptibility_3 | initial_infected |
---|---|---|---|
0.3858 | 0.9254 | 0.5706 | -0.1542 |
0.3858 | 0.9254 | 0.5706 | -0.1542 |
0.3858 | 0.9254 | 0.5706 | -0.1542 |
0.3858 | 0.9254 | 0.5706 | -0.1542 |
0.3859 | 0.9255 | 0.5707 | -0.1539 |
0.3859 | 0.9255 | 0.5707 | -0.1539 |
Above we used the general inference
function to perform parameter inference using the data. In this section we implement this function in R, which allows adjustment of the inference process to allow different models, additional data etc.. This method gives the user more control, but it will be slower than calling the inference
function directly.
To perform parameter inference we first need to define a function that returns the (log) likelihood for given parameter values, as dependent on the data and a function which returns the log prior probability of the parameters. These functions are then passed to adaptive.mcmc
which returns a posterior sample for the parameter values. By combining a number of data sources, the likelihood function becomes relatively complex, and so needs to perform the following steps (for full details see Baguelin et al., 2013):
Following these steps we can implement our custom inference function as follows:
library(fluEvidenceSynthesis)
# The custom inference function. In this example the custom inference function
# performs exactly the same inference as the original C++ function (above).
# It is up to the user to change this in a way that works for their analysis.
custom_inference <- function(demography, vaccine_calendar, polymod_data, ili,
mon_pop, n_pos, n_samples, initial, mapping,
nbatch, nburn, blen) {
current.contact.ids <- seq(1,nrow(polymod_uk))
proposed.contact.ids <- current.contact.ids
# Seven age groups used in the model
age.group.limits <- c(1,5,15,25,45,65)
# Sum all populations with a certain age into their corresponding age group
age.group.sizes.5 <- stratify_by_age(demography, c(5,15,45,65))
if (missing(mapping))
mapping <- age_group_mapping(age.group.limits, c(5,15,45,65))
# Define the actual log likelihood function
llikelihood <- function( pars ) {
# Resample contact ids
proposed.contact.ids <<- current.contact.ids
if (runif(1,0,1) < 0.1) {
rs <- round(runif(2,1,length(proposed.contact.ids)))
proposed.contact.ids[rs[1]] <<- rs[2]
}
contacts <- contact_matrix(as.matrix(polymod_uk[proposed.contact.ids,]),
demography, age.group.limits )
age.groups <- stratify_by_age(demography,
age.group.limits )
# Fraction of each age group classified as high risk
# We can classify a third risk group, but we are not doing
# that here (the second row is 0 in our risk.ratios matrix)
risk.ratios <- matrix(c(
0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45,
0, 0, 0, 0, 0, 0, 0
), ncol = 7, byrow = T)
# Population sizes in each age and risk group
popv <- stratify_by_risk(
age.groups, risk.ratios );
# Population size initially infected by age and risk group
initial.infected <- rep( 10^pars[9], 7 )
initial.infected <- stratify_by_risk(
initial.infected, risk.ratios );
# Run simulation
# Note that to reduce complexity
# we are using the same susceptibility parameter for multiple age groups
odes <- infectionODEs( popv, initial.infected,
vaccine_calendar,
contacts,
c(pars[6], pars[6], pars[6],
pars[7], pars[7], pars[7], pars[8]),
transmissibility = pars[5],
c(0.8,1.8), 7 )
# Ignore times row
odes <- odes[,2:22]
# Convert the 7 age groups for each risk group to 5 groups
from <- as.numeric(mapping$from)
to <- as.numeric(mapping$to)
converted.odes <- matrix(0, nrow = nrow(odes), ncol = max(to))
for (i in 1:nrow(mapping)) {
# all three age groups
fv <- c(0,7,14) + from[i]
converted.odes[,to[i]] <- converted.odes[,to[i]] + mapping$weight[i]*rowSums(odes[,fv])
}
# For each week and each group sum log likelihood
epsilons <- c(pars[1], pars[1], pars[2], pars[2], pars[3])
ll <- log_likelihood_cases(
epsilons,pars[4], as.matrix(converted.odes),
age.group.sizes.5, ili, mon_pop,
n_pos, n_samples)
return(ll)
}
llprior <- function(pars) {
if (any(pars[1:8] < 0) || any(pars[1:4] > 1) || any(pars[6:8] > 1)
|| pars[9] < log(0.00001) || pars[9] > log(10) )
return(-Inf)
lprob <- dnorm(pars[5], 0.1653183, 0.02773053, 1)
lprob <- lprob + dlnorm(pars[1], -4.493789, 0.2860455, 1)
lprob <- lprob + dlnorm(pars[2], -4.117028, 0.4751615, 1)
lprob <- lprob + dlnorm(pars[3], -2.977965, 1.331832, 1)
return(lprob)
}
# Store the contact ids used during inference
contact.ids <- list()
# Run adaptive.mcmc
mcmc.result <- adaptive.mcmc(lprior = llprior, llikelihood = llikelihood,
outfun = function() {
contact.ids[[length(contact.ids)+1]] <<- current.contact.ids
},
acceptfun = function() {
current.contact.ids <<- proposed.contact.ids
},
nburn = nburn,
initial = initial,
nbatch = nbatch, blen = blen)
mcmc.result$contact.ids <- t(data.frame(contact.ids))
mcmc.result
}
The resulting custom inference function can be called similarly to the original inference function.
inference.results <- custom_inference(demography = demography,
vaccine_calendar = vaccine_calendar,
polymod_data = as.matrix(polymod_uk),
ili = ili$ili,
mon_pop = ili$total.monitored,
n_pos = confirmed.samples$positive,
n_samples = confirmed.samples$total.samples,
initial = initial.parameters,
nbatch = 1000,
nburn = 1000, blen = 5 )
Plotting the resulting posterior parameter values1.
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
library(ggplot2)
colnames(inference.results$batch) <- c("eps1", "eps2", "eps3", "psi", "q",
"susc1", "susc2", "susc3", "I0")
ggplot(data=melt(inference.results$batch)) + facet_wrap( ~ Var2, ncol=3, scales="free" ) + geom_histogram(aes(x=value), bins=25)
Here we plot the credibility intervals of our models. Each plot is the result for one of the seven age groups and shows the number of new cases each week.
library(fluEvidenceSynthesis)
library(ggplot2)
# Function that runs the model given a set of parameters. Most of the function
# has todo with loading the correct inputs for the ODE model
ode.results <- function( pars )
{
data("demography")
age.group.limits <- c(1,5,15,25,45,65)
contacts <- contact_matrix(as.matrix(polymod_uk),
demography, age.group.limits )
age.groups <- stratify_by_age(demography, age.group.limits)
# Fraction of each age group classified as high risk
# We can classify a third risk group, but we are not doing
# that here (the second row is 0 in our risk.ratios matrix)
risk.ratios <- matrix( c(
0.021, 0.055, 0.098, 0.087, 0.092, 0.183, 0.45,
0, 0, 0, 0, 0, 0, 0
), ncol=7, byrow=T )
# Population sizes in each age and risk group
popv <- stratify_by_risk(
age.groups, risk.ratios )
# Population size initially infected by age and risk group
initial.infected <- rep( 10^pars[9], 7 )
initial.infected <- stratify_by_risk(
initial.infected, risk.ratios )
# Run simulation
# Note that to reduce complexity
# by using the same susceptibility parameter for multiple age groups
odes <- infectionODEs( popv, initial.infected,
vaccine_calendar,
contacts,
c(pars[6],pars[6],pars[6],
pars[7],pars[7],pars[7],pars[8]),
transmissibility=pars[5],
c(0.8,1.8), 7 )
# For simplicity we sum the low and high risk group
simplify.odes <- odes[,2:8]+odes[,9:15]+odes[,16:22]
rownames(simplify.odes) <- odes[,1]
return( simplify.odes )
}
# Calculate the credibility intervals for each time point. By default the function
# calculate it for the (equal tailed) credibility interval of 0% (median), 50% and 98%
cim <- credible.interval.model(ode.results, inference.results$batch, intervals=c(0,0.5, 0.98))
cim$row.ID <- as.Date(as.character(cim$row.ID))
ggplot( data=cim ) + facet_wrap( ~ column.ID, ncol=3, scales="free" ) +
stat_ci( aes(row.ID,value, ci = aggregate) ) +
xlab("Time") + ylab("New infections")
Baguelin, Marc, Stefan Flasche, Anton Camacho, Nikolaos Demiris, Elizabeth Miller, and W. John Edmunds. ‘Assessing Optimal Target Populations for Influenza Vaccination Programmes: An Evidence Synthesis and Modelling Study.’ PLoS Med 10, no. 10 (2013): e1001527. doi:10.1371/journal.pmed.1001527.
Be aware that these results are for a very short mcmc run and are not realistic↩