Modeling mortality tables

In this section, we attempt to fit the Gompertz-Makeham and the Heligman-Pollard models to a selected mortality table.

The Gompertz-Makeham model

In the Gompertz-Makeham model, the force of mortality (akin to the hazard function) $\mu_x$ is given by

\[ \mu_x = Ae^{Bx} + C,\]

for age $x$. The force of mortality represents the instantaneous rate of mortality at a given age $x$, on an annual basis. It is closely related to the mortality rate $q_x$, the fraction of a population dying within a year, which can be interpreted as the probability that a person aged $x$ years dies before reaching age $x+1$. Under certain assumptions, the two are related by

\[ \mu_x = \frac{q_x}{1 - q_x}, \qquad q_x = \frac{\mu_x}{1 + \mu_x}.\]
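As a quick sanity check (an aside of ours, in plain Julia), this odds-type conversion and the constant-force conversion $\mu_x = -\ln(1 - q_x)$ agree to first order in $q_x$, so for small mortality rates the choice between them matters little:

q = 0.01
q / (1 - q)   # 0.010101..., the odds-type conversion above
-log(1 - q)   # 0.010050..., the constant-force conversion
# both expand as q + O(q^2), so they nearly coincide for small q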

The terms $A$ and $B$ are associated with the Gompertz law $Ae^{Bx}$, while the constant $C$ is the additional term introduced by Makeham; together they form the Gompertz-Makeham model.

The Heligman-Pollard model

The Gompertz-Makeham model approximates the force of mortality reasonably well, especially the growth seen in the adult years, but a better model is the Heligman-Pollard model, which also captures well the childhood and youth groups and the elderly years. This model takes the form

\[ \mu_x = A^{(x + B)^C} + D e^{-E \left(\ln(x/F)\right)^2} + \frac{GH^x}{1 + KGH^x},\]

for suitable coefficients $A, B, C, D, E, F, G, H, K$. It is common to see the Heligman-Pollard model with $K=0$, which is the main model suggested by the authors, but in the same paper they also discuss two extensions of the model, one of them being the form above.

We can clearly distinguish three terms in the model. The first term, with parameters $A, B, C$, models the steep exponential decline in mortality in the early childhood years, due in part to relatively high mortality among newborns. The second term, with parameters $D, E, F$, represents the log-normal bump in mortality at the young adult ages (the so-called accident hump). The last term, with parameters $G, H, K$, captures the exponential growth in mortality from middle to older ages. Notice that, with $K = 0$, the last term reduces to the original Gompertz law (without the additional term due to Makeham).
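To make the role of each term concrete, here is a small sketch of ours, evaluating the three components separately with illustrative, hand-picked parameter values (not fitted ones):

# each term of the Heligman-Pollard model, with illustrative values
childhood(x; A=0.0015, B=0.02, C=0.1) = A^((x + B)^C)                  # decays as x grows
accident_hump(x; D=0.001, E=17.0, F=20.0) = D * exp(-E * log(x / F)^2) # peaks at x = F
senescence(x; G=5.0e-5, H=1.1, K=1.0) = G * H^x / (1 + K * G * H^x)    # grows with x
# each term dominates at a different stage of life:
childhood(1.0), accident_hump(20.0), senescence(80.0)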

A mortality table

We start by loading the necessary packages

using Distributions, Turing, StatsPlots

We illustrate the modeling process with the United States 1997 mortality table. We removed the data for ages 0-1 and 100+, hoping for a better fit.

x = collect(1:99)
# mortality rate
qx = [
    0.00055
    0.00036
    0.00029
    0.00023
    0.00021
    0.00020
    0.00019
    0.00017
    0.00015
    0.00014
    0.00014
    0.00019
    0.00028
    0.00041
    0.00055
    0.00068
    0.00078
    0.00085
    0.00089
    0.00093
    0.00098
    0.00101
    0.00101
    0.00101
    0.00100
    0.00099
    0.00100
    0.00103
    0.00108
    0.00114
    0.00119
    0.00126
    0.00133
    0.00140
    0.00149
    0.00157
    0.00167
    0.00178
    0.00192
    0.00206
    0.00222
    0.00239
    0.00257
    0.00278
    0.00300
    0.00325
    0.00352
    0.00380
    0.00411
    0.00444
    0.00482
    0.00524
    0.00571
    0.00623
    0.00685
    0.00755
    0.00833
    0.00916
    0.01005
    0.01101
    0.01208
    0.01321
    0.01439
    0.01560
    0.01679
    0.01802
    0.01948
    0.02127
    0.02338
    0.02565
    0.02799
    0.03043
    0.03297
    0.03563
    0.03843
    0.04147
    0.04494
    0.04904
    0.05385
    0.05938
    0.06555
    0.07241
    0.07990
    0.08812
    0.09653
    0.10556
    0.11539
    0.12616
    0.13802
    0.15085
    0.16429
    0.17813
    0.19250
    0.20764
    0.22354
    0.23999
    0.25653
    0.27295
    0.28915
]
# mx = - log.(1.0 .- qx) # force of mortality # roughly the same as below
mx = qx ./ (1.0 .- qx) # force of mortality
scatter(x, qx, yscale=:log10, legend=:topleft, label="qx")
scatter!(x, mx, yscale=:log10, legend=:topleft, label="mx")

The Gompertz-Makeham model in Turing.jl

We start by defining the function that characterizes the Gompertz-Makeham model

function gompertz_makeham(x, p)
    A, B, C = p
    m = A * exp(B * x) + C
    return m
end
gompertz_makeham (generic function with 1 method)

Now we define the compound probability model, assigning Beta distributions to the parameters. As mentioned before, the initial prior is very important, and it is not easy to get a good fit. So we first approximate by hand, with trial and error, obtaining the following rough fit:

let (A, B, C) = (0.00001, 0.1, 0.001)
    m = A * exp.(B * x) .+ C
    plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
    plot!(plt, x, m, label="Gompertz-Makeham hand-fit")
    scatter!(plt, x, mx, label="data")
end
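To translate a target mean into Beta shape parameters, recall that $\mathrm{Beta}(\alpha, \beta)$ has mean $\alpha / (\alpha + \beta)$. The helper below is our own, introduced only for illustration:

# with α fixed, solve α / (α + β) = target for β
beta_for_mean(target; α=2) = Beta(α, α / target - α)
mean(beta_for_mean(0.1))    # 0.1, cf. the prior B ~ Beta(2, 18) below
mean(beta_for_mean(0.001))  # 0.001, cf. the prior C ~ Beta(2, 1998) below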

With these values, we define prior Beta distributions for the compound probability model, with shape parameters that yield means near those hand-fit values.

@model function gompertz_makeham_model(x, m)
    A ~ Beta(2, 99998)
    B ~ Beta(2, 18)
    C ~ Beta(2, 1998)
    σ² ~ InverseGamma()
    σ = sqrt(σ²)
    p = (A, B, C)

    for i in eachindex(x)
        y = gompertz_makeham(x[i], p)
        m[i] ~ Normal(y, σ)
    end
end
gompertz_makeham_model (generic function with 2 methods)

Now we instantiate the Turing model

model_gm = gompertz_makeham_model(x, mx)
DynamicPPL.Model{typeof(Main.gompertz_makeham_model), (:x, :m), (), (), Tuple{Vector{Int64}, Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(Main.gompertz_makeham_model, (x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  90, 91, 92, 93, 94, 95, 96, 97, 98, 99], m = [0.0005503026664665567, 0.0003601296466728022, 0.00029008412439607484, 0.00023005291216979904, 0.00021004410926294523, 0.00020004000800160032, 0.00019003610686030346, 0.00017002890491383537, 0.0001500225033755063, 0.0001400196027443842  …  0.17764823647176592, 0.19658733292649364, 0.21673744022777328, 0.23839009287925697, 0.2620526023524661, 0.2878963501017438, 0.31577216089261984, 0.34504418470146736, 0.3754212227494671, 0.4067665470915102]), NamedTuple(), DynamicPPL.DefaultContext())

and fit it:

# chain_gm = sample(model_gm, HMC(0.05, 10), 40_000)
chain_gm = sample(model_gm, NUTS(0.65), 5_000)
Chains MCMC chain (5000×16×1 Array{Float64, 3}):

Iterations        = 1001:1:6000
Number of chains  = 1
Samples per chain = 5000
Wall duration     = 8.72 seconds
Compute duration  = 8.72 seconds
parameters        = A, B, C, σ²
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           A    0.0000    0.0000    0.0000   2319.3678   1923.6552    1.0002   ⋯
           B    0.1019    0.0078    0.0002   2322.4045   1976.5187    1.0003   ⋯
           C    0.0010    0.0007    0.0000   2682.2130   1992.0565    1.0000   ⋯
          σ²    0.0205    0.0030    0.0001   3239.9145   2762.0779    1.0000   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           A    0.0000    0.0000    0.0000    0.0000    0.0001
           B    0.0886    0.0964    0.1010    0.1066    0.1196
           C    0.0001    0.0005    0.0008    0.0014    0.0028
          σ²    0.0155    0.0184    0.0202    0.0223    0.0273
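As an aside, the rhat diagnostic above is more informative when computed across several chains. Turing can sample chains in parallel (a sketch of ours, not run here; it requires starting Julia with multiple threads):

# chain_gm_multi = sample(model_gm, NUTS(0.65), MCMCThreads(), 5_000, 4)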

Here is the result of the MCMC:

plot(chain_gm)

We can see the mean values of the parameters as follows

mean(chain_gm)
Mean
  parameters      mean
      Symbol   Float64

           A    0.0000
           B    0.1019
           C    0.0010
          σ²    0.0205

The mean fit is given by

m_gm = [gompertz_makeham(xi, mean(chain_gm, [:A, :B, :C]).nt.mean) for xi in x]
99-element Vector{Float64}:
 0.0010279596724221227
 0.001030513151744429
 0.0010333404873858415
 0.001036471049971319
 0.001039937360075447
 0.001043775426049186
 0.0010480251180779601
 0.0010527305823568437
 0.0010579406996853308
 0.001063709593245615
 ⋮
 0.22945562053442967
 0.2539566525560491
 0.28108537863617566
 0.31112361449828724
 0.3443834001234147
 0.3812102412504186
 0.42198669852165843
 0.4671363615584368
 0.5171282492492901

and we plot it

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
plot!(plt, x, m_gm, label="Gompertz-Makeham fit")
scatter!(plt, x, mx, label="data")

It remains to compute the 95% credible interval,

quantiles_gm = reduce(
    hcat,
    quantile(
        [
            gompertz_makeham(xi, (A, B, C)) for (A, B, C) in eachrow(view(chain_gm.value.data, :, 1:3, 1))
        ],
        [0.025, 0.975]
        )
    for xi in x
)
2×99 Matrix{Float64}:
 0.00014535  0.000147215  0.000150652  …  0.236819  0.261927  0.288608
 0.00279306  0.00279493   0.00279701      0.440955  0.491764  0.5468

and plot it

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
plot!(plt, x, m_gm, ribbon=(m_gm .- view(quantiles_gm, 1, :), view(quantiles_gm, 2, :) .- m_gm), label="Gompertz-Makeham fit")
scatter!(plt, x, mx, label="data")

Notice how the curve built from the means of the parameters falls outside the quantiles, which are based on the function values of the parameter samples. Let's check the last portion of the ensemble.

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=nothing)
plot!(plt, x, m_gm, label="Bayesian fitted line", color=2)
for (A, B, C) in eachrow(view(chain_gm.value.data, 4500:5000, 1:3, 1))
    plot!(plt, x, x -> gompertz_makeham(x, (A, B, C)), alpha=0.1, color=2, label=false)
end
scatter!(plt, x, mx, color=1)
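One reason the two can disagree: the model is nonlinear in the parameters, so the curve evaluated at the posterior means of the parameters need not coincide with the pointwise posterior mean of the sampled curves. Here is a small sketch of ours comparing the two:

# pointwise posterior mean of the sampled curves, to compare with m_gm,
# which instead evaluates the model at the posterior means of A, B, C
samples_gm = collect(eachrow(view(chain_gm.value.data, :, 1:3, 1)))
m_gm_pointwise = [
    mean(gompertz_makeham(xi, (A, B, C)) for (A, B, C) in samples_gm)
    for xi in x
]
# plot!(plt, x, m_gm_pointwise, label="pointwise posterior mean")  # optional overlay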

Let's look at just a few samples to get a better sense of the dependence of the function on the sampled values (see also the note at the end of the section):

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=nothing)
plot!(plt, x, m_gm, label="Bayesian fitted line", color=2)
for (A, B, C) in eachrow(view(chain_gm.value.data, 1000:50:2000, 1:3, 1))
    plot!(plt, x, x -> gompertz_makeham(x, (A, B, C)), alpha=0.4, color=3, label=false)
end
scatter!(plt, x, mx, color=1)

The Heligman-Pollard model in Turing.jl

Now we consider the Heligman-Pollard model.

We start by defining the function that characterizes the model:

function heligman_pollard(x, p)
    A, B, C, D, E, F, G, H, K = p
    m = A^((x + B)^C) + D * exp(-E * log(x / F)^2) + (G * H^x) / (1 + K * G * H^x)
    return m
end
heligman_pollard (generic function with 1 method)

Now we define the compound probability model. As mentioned before, the initial prior is very important. We start with numbers of the order of those given in the original article by Heligman and Pollard. They considered a number of examples, but, as a starting point, we borrow only the data from the 1970-1972 period, separated by gender:

Parameter   Males 1970-72   Females 1970-72
A           0.00160         0.00142
B           0.00112         0.0350
C           0.1112          0.1345
D           0.00163         0.00038
E           16.71           21.86
F           20.03           18.27
G           0.0000502       0.0000507
H           1.1074          1.0937
K           2.416           -2.800
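As a concrete check, we can evaluate heligman_pollard at the published male parameters, transcribed from the table above into the order the function expects:

p_males_1970 = (0.00160, 0.00112, 0.1112, 0.00163, 16.71, 20.03, 0.0000502, 1.1074, 2.416)
heligman_pollard(40, p_males_1970)  # at mid-life, the G, H, K term dominates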

The difference in the sign of $K$ between the male and female populations is justified by the difference in mortality patterns for the elderly, although the data shown in the article for both genders look more like the description of the male mortality. Let's see what an approximation of that looks like, keeping a positive sign for $K$ because of this observation.

let (A, B, C, D, E, F, G, H, K) = (0.0015, 0.018, 0.012, 0.001, 19.0, 19.0, 0.00005, 1.1, 1.0)
    m = [heligman_pollard(xi, (A, B, C, D, E, F, G, H, K)) for xi in x]
    plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
    plot!(plt, x, m, label="Heligman-Pollard hand-fit")
    scatter!(plt, x, mx, label="data")
end

Ok, that seems like a reasonable starting point. So we choose the following priors for each parameter:

Parameter   Prior            Mean
A           Beta(15, 9985)   0.0015
B           Beta(18, 982)    0.018
C           Beta(12, 988)    0.012
D           Beta(2, 1998)    0.001
E           Gamma(38, 0.5)   19.0
F           Gamma(38, 0.5)   19.0
G           Beta(5, 99995)   0.00005
H           Gamma(2, 0.5)    1.0
K           Gamma(2, 0.5)    1.0
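A quick check confirms that these priors have the means listed in the table:

mean(Beta(15, 9985)), mean(Beta(18, 982)), mean(Gamma(38, 0.5))  # (0.0015, 0.018, 19.0)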

But these turned out not to be so good. We either look for better informative priors or use slightly less informative ones. We fiddle a little with the parameters to get the following improvement.

let (A, B, C, D, E, F, G, H, K) = (0.0003, 0.02, 0.08, 0.0008, 15.0, 20.0, 0.00005, 1.1, 1.0)
    m = [heligman_pollard(xi, (A, B, C, D, E, F, G, H, K)) for xi in x]
    plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
    plot!(plt, x, m, label="Heligman-Pollard hand-fit")
    scatter!(plt, x, mx, label="data")
end

Based on that, we choose the following priors:

@model function heligman_pollard_model(x, m)
    A ~ Beta(3, 9997) # Beta(1, 660) # Beta(15, 9985)
    B ~ Beta(2, 98) # Beta(1, 50) # Beta(18, 982)
    C ~ Beta(8, 92) # Beta(12, 988)
    D ~ Beta(8, 9992) # Beta(1, 999) # Beta(2, 1998)
    E ~ Gamma(30, 0.5) # Gamma(19, 1) # Gamma(38, 0.5)
    F ~ Gamma(40, 0.5) # Gamma(19, 1) # Gamma(38, 0.5)
    G ~ Beta(5, 99995) # Beta(1, 19999) # Beta(5, 99995)
    H ~ Gamma(2.2, 0.5) # Gamma(1, 1) # Gamma(2, 0.5)
    K ~ Gamma(2.0, 0.5) # Gamma(1, 1) # Gamma(2, 0.5)
    σ² ~ InverseGamma()
    σ = sqrt(σ²)

    for i in eachindex(x)
        # with the denominator commented out below, the third term is plain G * H^x,
        # so the sampled K does not actually affect the likelihood here
        y = A ^ ((x[i] + B) ^ C) + D * exp( - E * (log(x[i]) - log(F)) ^ 2) + G * H ^ (x[i]) # / (1 + K * G * H ^ (x[i]) )
        m[i] ~ Normal(y, σ)
    end
end
heligman_pollard_model (generic function with 2 methods)

Now we instantiate the Heligman-Pollard Turing model

model_hp = heligman_pollard_model(x, mx)
DynamicPPL.Model{typeof(Main.heligman_pollard_model), (:x, :m), (), (), Tuple{Vector{Int64}, Vector{Float64}}, Tuple{}, DynamicPPL.DefaultContext}(Main.heligman_pollard_model, (x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10  …  90, 91, 92, 93, 94, 95, 96, 97, 98, 99], m = [0.0005503026664665567, 0.0003601296466728022, 0.00029008412439607484, 0.00023005291216979904, 0.00021004410926294523, 0.00020004000800160032, 0.00019003610686030346, 0.00017002890491383537, 0.0001500225033755063, 0.0001400196027443842  …  0.17764823647176592, 0.19658733292649364, 0.21673744022777328, 0.23839009287925697, 0.2620526023524661, 0.2878963501017438, 0.31577216089261984, 0.34504418470146736, 0.3754212227494671, 0.4067665470915102]), NamedTuple(), DynamicPPL.DefaultContext())

and fit it:

# chain_hp = sample(model_hp, HMC(0.05, 10), 4_000) # Pure HMC didn't converge
# chain_hp = sample(model_hp, MH(), 5_000) # Metropolis-Hastings didn't converge
# chain_hp = sample(model_hp, Gibbs(MH(:A, :B, :C), MH(:D, :E, :F), HMC(0.65, 5, :G, :H, :K)), 5_000) # I am afraid I am not getting this right
chain_hp = sample(model_hp, NUTS(0.85), 5_000)
Chains MCMC chain (5000×22×1 Array{Float64, 3}):

Iterations        = 1001:1:6000
Number of chains  = 1
Samples per chain = 5000
Wall duration     = 27.61 seconds
Compute duration  = 27.61 seconds
parameters        = A, B, C, D, E, F, G, H, K, σ²
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           A    0.0003    0.0002    0.0000   6288.4819   2407.0987    1.0005   ⋯
           B    0.0199    0.0140    0.0002   5297.4639   2343.2760    0.9999   ⋯
           C    0.0800    0.0273    0.0003   6412.9666   3318.2054    1.0000   ⋯
           D    0.0008    0.0003    0.0000   6840.5086   3520.0791    1.0009   ⋯
           E   15.0611    2.7855    0.0306   8066.9082   3006.4320    1.0002   ⋯
           F   20.0503    3.1971    0.0371   7380.3479   3494.9658    1.0003   ⋯
           G    0.0000    0.0000    0.0000   4118.4309   2831.3746    0.9999   ⋯
           H    1.0961    0.0056    0.0001   4055.0930   2675.5635    0.9999   ⋯
           K    0.9975    0.7109    0.0089   4842.3014   2357.3357    1.0006   ⋯
          σ²    0.0204    0.0030    0.0000   7146.8734   3374.2040    1.0009   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           A    0.0001    0.0002    0.0003    0.0004    0.0007
           B    0.0026    0.0093    0.0166    0.0271    0.0553
           C    0.0353    0.0603    0.0772    0.0965    0.1412
           D    0.0004    0.0006    0.0008    0.0010    0.0014
           E   10.0060   13.1441   14.8909   16.8106   21.0734
           F   14.2694   17.8391   19.8751   22.1139   26.7570
           G    0.0000    0.0000    0.0000    0.0001    0.0001
           H    1.0859    1.0923    1.0958    1.0996    1.1083
           K    0.1141    0.4733    0.8374    1.3451    2.7937
          σ²    0.0153    0.0183    0.0201    0.0222    0.0271

Here is the result of the MCMC:

plot(chain_hp)

We can see the mean values of the parameters as follows

mean(chain_hp)
Mean
  parameters      mean
      Symbol   Float64

           A    0.0003
           B    0.0199
           C    0.0800
           D    0.0008
           E   15.0611
           F   20.0503
           G    0.0000
           H    1.0961
           K    0.9975
          σ²    0.0204

The mean fit is given by

m_hp = [heligman_pollard(xi, mean(chain_hp, [:A, :B, :C, :D, :E, :F, :G, :H, :K]).nt.mean) for xi in x]
99-element Vector{Float64}:
 0.00034968012338853555
 0.000246488537808511
 0.00020636590783560657
 0.00018638293067518293
 0.00017594612993119857
 0.00017108071285336205
 0.00016995535637307591
 0.00017162388344596095
 0.00017560641079663637
 0.0001820092419380469
 ⋮
 0.1731683176040796
 0.1867136428394152
 0.20106169770159063
 0.2162201728911004
 0.23219045338601163
 0.24896688723562782
 0.2665361213372277
 0.28487653614007535
 0.303957812133525

and we plot it

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
plot!(plt, x, m_hp, label="Heligman-Pollard fit")
scatter!(plt, x, mx, label="data")

It remains to compute the 95% credible interval,

quantiles_hp = reduce(
    hcat,
    quantile(
        [
            heligman_pollard(xi, p) for p in eachrow(view(chain_hp.value.data, :, 1:9, 1))
        ],
        [0.025, 0.975]
        )
    for xi in x
)
2×99 Matrix{Float64}:
 0.000104923  7.96105e-5   6.99194e-5   …  0.16353   0.172546  0.182292
 0.000755473  0.000544423  0.000458298     0.360687  0.393052  0.428256

and plot it

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=:topleft)
plot!(plt, x, m_hp, ribbon=(m_hp .- view(quantiles_hp, 1, :), view(quantiles_hp, 2, :) .- m_hp), label="Heligman-Pollard fit")
scatter!(plt, x, mx, label="data")

Notice how the curve built from the means of the parameters is again outside the quantiles, which are based on the function values of the parameter samples. Let's check the last portion of the ensemble:

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=nothing)
plot!(plt, x, m_hp, label="Bayesian fitted line", color=2)
for p in eachrow(view(chain_hp.value.data, 4500:5000, 1:9, 1))
    plot!(plt, x, x -> heligman_pollard(x, p), alpha=0.1, color=2, label=false)
end
scatter!(plt, x, mx, color=1)

Let's look at just a few samples to get a better sense of the dependence of the function on the sampled values:

plt = plot(yscale=:log10, title="Force of mortality", titlefont=10, xlabel="age", ylabel="force of mortality", legend=nothing)
plot!(plt, x, m_hp, label="Bayesian fitted line", color=2)
for p in eachrow(view(chain_hp.value.data, 1000:50:2000, 1:9, 1))
    plot!(plt, x, x -> heligman_pollard(x, p), alpha=0.4, color=3, label=false)
end
scatter!(plt, x, mx, color=1)
Note: how often do you see $x \mapsto f_{\mathrm{mean}(p)}(x)$ fall outside the credible interval of the family $x \mapsto \{f_p(x)\}_p$, where $p$ runs over the sampled parameters? It happens.