Alometry law for the Nile Tilapia

nile tilapia

(image from Fishsource)

We use the alometry law $y = A x^B$ to correlate the length $x$ of the fish tilapia with its mass $y$. The following table for the length-weight relationship of growing-finishing cage-farmed Nile tilapia (Oreocromis niloticus)is provided in the article T. S. de Castro Silva, L. D. dos Santos, L. C. R. da Silva, M. Michelato, V. R. B. Furuya, W. M. Furuya, Length-weight relationship and prediction equations of body composition for growing-finishing cage-farmed Nile tilapia, R. Bras. Zootec. vol.44 no.4 Viçosa Apr. 2015):

Days of culture120406080100
Mass (g)28.6 ± 4.288.6 ± 1.4177.6 ± 3.6313.8 ± 12.8423.7 ± 12.7774.4 ± 23.6
Length (cm)10.9±0.415.3±0.419.1±0.222.8±0.526.3±0.631.3±0.4

We use Turing.jl with the compound model

\[ \begin{align*} A & \sim \mathrm{InverseGamma}(\alpha_A, \beta_A) \\ B & \sim \mathrm{InverseGamma}(\alpha_B, \beta_B) \\ \sigma^2 & \sim \mathrm{InverseGamma}(2, 1) \\ y & \sim \mathrm{Normal}(A x^B, \sigma^2) \end{align*}\]

with suitable hyperparameters $(\alpha_A, \beta_A)$ and $(\alpha_B, \beta_B)$ to be chosen shortly.

We start by loading the necessary packages:

using Distributions, Turing, StatsPlots

Then we collect the adimensionalized data in vector form and plot it for the fun of it:

xx = [10.9, 15.3, 19.1, 22.8, 26.3, 31.3]
yy = [28.6, 88.6, 177.6, 313.8, 423.7, 774.4]

scatter(xx, yy, xlabel="Body length (cm)", ylabel="Body weight (g)", xlims=(0.0, 35.0), ylims=(0.0, 1000.0), title="Length-weight relationship of growing-finishing cage-farmed Nile tilapia", titlefont=10, legend=nothing)
Example block output

We define the Turing.jl model with parameters $A$ and $B$ as Inverse Gamma functions with hyperparameters Ah = (Ah.α, Ah.β) and Bh=(Bh.α, Bh.β).

@model function alometry(x, q; Ah, Bh)
    A ~ InverseGamma(Ah.α, Ah.β)
    B ~ InverseGamma(Bh.α, Bh.β)
    σ² ~ InverseGamma(2, 1)
    σ = sqrt(σ²)

    for i in eachindex(x)
        y = A * x[i] ^ B
        q[i] ~ Normal(y, σ)
    end
end
alometry (generic function with 2 methods)

As in any nonlinear optimization problem, the starting point is crucial. Here, the starting point is our prior. The following prior does not work properly, as we can see.

model = alometry(xx, yy; Ah=(α=1,β=1), Bh=(α=1,β=1))
chain = sample(model, NUTS(0.65), 1_000)
Chains MCMC chain (1000×15×1 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 1
Samples per chain = 1000
Wall duration     = 3.61 seconds
Compute duration  = 3.61 seconds
parameters        = A, B, σ²
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters        mean         std       mcse   ess_bulk   ess_tail      rha ⋯
      Symbol     Float64     Float64    Float64    Float64    Float64   Float6 ⋯

           A      0.2748      0.1859     0.0202    77.3210    94.0721    1.018 ⋯
           B      2.3303      0.1701     0.0196    77.6471    93.8946    1.019 ⋯
          σ²   1810.1147   1392.6146   158.0125    16.9285    17.5254    1.061 ⋯
                                                               2 columns omitted

Quantiles
  parameters       2.5%      25.0%       50.0%       75.0%       97.5%
      Symbol    Float64    Float64     Float64     Float64     Float64

           A     0.0908     0.1515      0.2109      0.3371      0.8258
           B     1.9471     2.2133      2.3592      2.4551      2.6086
          σ²   480.0587   811.3111   1403.1287   2347.0384   5610.4958
plt = scatter(xx, yy, xlabel="Body length (cm)", ylabel="Body weight (g)", xlims=(0.0, 35.0), ylims=(0.0, 1000.0), title="Length-weight relationship of growing-finishing cage-farmed Nile tilapia", titlefont=10, legend=nothing)
xxx = range(0.9*first(xx), 1.1*last(xx), length=200)
yyy = mean(chain, :A) * xxx .^ mean(chain, :B)
plot!(plt, xxx, yyy)
Example block output

If we start with a more informative prior, we get a suitable result. If we pick two data points $(x_1, y_1)$ and $(x_2, y_2)$, assuming $y \approx Ax^B$, we have $y_2/y_1 = (x_2/x_1)^B$, so that

\[ B = \frac{\ln(y_2/y_1)}{\ln(x_2/x_1)}.\]

If we choose the second and third points, we get

    B = log(yy[3]/yy[2])/log(xx[3]/xx[2])
3.1347640575458167

From that, we can also estimate $A$ from $A = y/x^B$, so that, choosing the second point

    A = yy[2]/xx[2]^B
0.017127958628527618

With that in mind, we choose the hyperparameters for the prior as $(\alpha_A, \beta_A) = (57, 1)$, with mean $1/(57+1) \approx 0.017$, and $(\alpha_B, \beta_B) = (1, 6)$, with mean $6/(1+1) = 3.0$.

model = alometry(xx, yy; Ah=(α=57,β=1), Bh=(α=1,β=6))
DynamicPPL.Model{typeof(Main.alometry), (:x, :q), (:Ah, :Bh), (), Tuple{Vector{Float64}, Vector{Float64}}, Tuple{@NamedTuple{α::Int64, β::Int64}, @NamedTuple{α::Int64, β::Int64}}, DynamicPPL.DefaultContext}(Main.alometry, (x = [10.9, 15.3, 19.1, 22.8, 26.3, 31.3], q = [28.6, 88.6, 177.6, 313.8, 423.7, 774.4]), (Ah = (α = 57, β = 1), Bh = (α = 1, β = 6)), DynamicPPL.DefaultContext())

With this prior, we attempt again to fit the model.

# chain = sample(model, HMC(0.05, 10), 4_000) # HMC seems quite unstable here
chain = sample(model, NUTS(0.65), MCMCSerial(), 1000, 3; progress=false)
Chains MCMC chain (1000×15×3 Array{Float64, 3}):

Iterations        = 501:1:1500
Number of chains  = 3
Samples per chain = 1000
Wall duration     = 1.4 seconds
Compute duration  = 1.18 seconds
parameters        = A, B, σ²
internals         = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size

Summary Statistics
  parameters       mean        std      mcse   ess_bulk    ess_tail      rhat  ⋯
      Symbol    Float64    Float64   Float64    Float64     Float64   Float64  ⋯

           A     0.0184     0.0023    0.0001   711.2213   1002.9183    1.0041  ⋯
           B     3.0923     0.0372    0.0014   717.3784   1002.8145    1.0042  ⋯
          σ²   212.8771   144.5667    5.3585   802.0512    502.0103    1.0010  ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%      25.0%      50.0%      75.0%      97.5%
      Symbol   Float64    Float64    Float64    Float64    Float64

           A    0.0144     0.0168     0.0182     0.0198     0.0235
           B    3.0178     3.0676     3.0923     3.1181     3.1633
          σ²   74.6209   126.4572   174.6941   253.7830   555.5175

Here is the result of the chain.

plot(chain)
Example block output

Taking the mean of the parameters $A$ and $B$ we plot the fitted curve.

plt = scatter(xx, yy, xlabel="Body length (cm)", ylabel="Body weight (g)", xlims=(0.0, 35.0), ylims=(0.0, 1000.0), title="Length-weight relationship of growing-finishing cage-farmed Nile tilapia", titlefont=10, legend=nothing)
xxx = range(0.9*first(xx), 1.1*last(xx), length=200)
yyy = mean(chain, :A) * xxx .^ mean(chain, :B)
plot!(plt, xxx, yyy)
Example block output

This seems successful. Now we compute the 95% credible interval

quantiles = reduce(
    hcat,
    quantile(
        [
            A * x^B for (A, B) in eachrow(view(chain.value.data, :, 1:2, 1))
        ],
        [0.025, 0.975]
        )
    for x in xxx
)
2×200 Matrix{Float64}:
 19.804   20.5959  21.411   22.2546  23.1207  …   976.612   987.523   998.517
 23.1048  23.9897  24.8968  25.8339  26.7986     1048.51   1060.29   1072.16

and plot it along the data:

plt = plot(xlabel="Body length (cm)", ylabel="Body weight (g)", xlims=(0.0, 35.0), ylims=(0.0, 1000.0), title="Length-weight relationship of growing-finishing cage-farmed Nile tilapia", titlefont=10, legend=nothing)
plot!(plt, xxx, yyy, ribbon=(yyy .- view(quantiles, 1, :), view(quantiles, 2, :) .- yyy), label="Bayesian fitted line", color=2)
scatter!(plt, xx, yy, color=1)
Example block output

We end this section plotting an ensemble of lines generated with the chain.

plt = plot(xlabel="Body length (cm)", ylabel="Body weight (g)", xlims=(0.0, 35.0), ylims=(0.0, 1000.0), title="Length-weight relationship of growing-finishing cage-farmed Nile tilapia", titlefont=10, legend=nothing)
plot!(plt, xxx, yyy, label="Bayesian fitted line", color=2)
for (a, b) in eachrow(view(chain.value.data, :, 1:2, 1))
    plot!(plt, xxx, a .* xxx .^b, alpha=0.01, color=2, label=false)
end
scatter!(plt, xx, yy, color=1)
Example block output