Deriving Secret Sales Volumes with Bayesian Inference

Using RxInfer to decouple two signals

Retail Industry
Bayesian Inference
Active Inference
RxInfer
Julia
Author

Kobus Esterhuysen

Published

October 12, 2024

Modified

October 12, 2024

Back to Blog |  LearnableLoopAI.com |  Portfolio of Projects |  LinkedIn


versioninfo() ## Julia version
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_NUM_THREADS = 
import Pkg
Pkg.add(Pkg.PackageSpec(;name="RxInfer"))
Pkg.add("Plots")
Pkg.add("StableRNGs")
Pkg.add("LaTeXStrings")
Pkg.add("Distributions")

using RxInfer, Random, Plots, StableRNGs, LaTeXStrings, Distributions
    Updating registry at `~/.julia/registries/General.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.10/Project.toml`
  No Changes to `~/.julia/environments/v1.10/Manifest.toml`
Pkg.status()
Status `~/.julia/environments/v1.10/Project.toml`
  [31c24e10] Distributions v0.25.112
  [b964fa9f] LaTeXStrings v1.3.1
  [91a5bcdd] Plots v1.40.8
  [86711068] RxInfer v3.7.1
  [860ef19b] StableRNGs v1.0.2

Deriving Secret Sales Volumes

This use case of active inference comes from the retail industry. We use the published sales volume of a company over 250 time intervals. They state that this volume is composed of two components:

  • Product sales
  • Rental sales

For strategic reasons they only publish the combined volume. Their competitor came to us with a request to try and decouple the underlying signals. This will help the competitor to decide on a strategic repositioning in the marketplace.

To test our approach, we will use a simple random walk model to simulate both the product sales volume as well as the rental sales volume. We also add these two signals to simulate the published combined sales volume. Then we will use Bayesian inference inside the RxInfer Julia package to attempt to decouple the two sales volume signals - one for product volume and the other for rental volume. During inference we will not have access to the separate product and rental sales signals, but only to their combination. If it works well enough, we will take the real published data and attempt to decouple the two sales volume signals.

The Generative Process

Provision function (\(f_p\))

The provision function, provides another covariate vector. Because this is a sequential system, the provision function defines the transition between the previous state and the current state. This special case of the provision function is known as a transition function:

\[\tilde{p}_{1t} = f_p(\tilde{s}_{1,t-1}) = \tilde{s}_{1,t-1}\]

\[\tilde{p}_{2t} = f_p(\tilde{s}_{2,t-1}) = \tilde{s}_{2,t-1}\]

When we add system noise the next state is produced:

\[\tilde{s}_{1t} = \mathcal{N}(\tilde{p}_{1t}, \tilde{\tau}_{s_1})\]

\[\tilde{s}_{2t} = \mathcal{N}(\tilde{p}_{2t}, \tilde{\tau}_{s_2})\]

The tildes indicate that the parameters and variables are hidden and not observed.

## provision function, provides another covariate vector
function fₚ(; s̃ₜ₋₁)
    return s̃ₜ₋₁
end
## fₚ(s̃ₜ₋₁=_s̃₁₍ₜ₋₁₎), fₚ(s̃ₜ₋₁=_s̃₂₍ₜ₋₁₎) 
fₚ (generic function with 1 method)

Response function (\(f_r\))

The response function, provides the response to the covariate vector. \[\tilde{r}_{t} = f_r(\tilde{s}_{1t}, \tilde{s}_{2t}) = f(\tilde{s}_{1t}, \tilde{s}_{2t}) = \tilde{s}_{1t} + \tilde{s}_{2t} \]

After combining with observation noise the observation is produced:

\[y_{t} = \mathcal{N}(\tilde{r}_{t}, \tilde{\sigma}^2_V)\]

The tildes indicate that the parameters and variables are hidden and not observed.

## response function, provides the response to covariate vector
function fᵣ(; f, s̃₁ₜ, s̃₂ₜ)
    return f(s̃₁ₜ, s̃₂ₜ)
end
## fᵣ(; f=+, s̃₁ₜ=7.0, s̃₂ₜ=8.0)
fᵣ (generic function with 1 method)

Next, we create the sim_batch_data(...) function that accepts an argument \(f\) which is a function that captures how the two signals are coupled or combined. We will call the sim_batch_data() function with the \(+\) function because the product sales volume is simply added to the rental sales volume.

The function will return the real (but hidden) signals \(s̃₁\) and \(s̃₂\) (used for comparison later) and their coupled/combined version \(y\). We will use \(y\) (the published combined sales volume) as our observations during the inference. We also assume that \(y\) is corrupted with some observation noise.

## Data comes from either a simulation/lab (sim|lab) OR from the field (fld)
## Data are handled either in batches (batch) OR online as individual points (point)
## Batch data accumulates either
    ## along the depth/examples dimension/axis (into the screen/page), OR
        ## typical for supervised & unsupervised learning
    ## along the time dimension/axis (down the screen page)
        ## typical for sequential decision learning (reinforcement learning & active inference)
# function sim_batch_data(f, T; seed=123, s̃₁₍ₜ₋₁₎, s̃₂₍ₜ₋₁₎, τ̃_s₁, τ̃_s₂, σ̃²ᵥ)
function sim_batch_data(f, T; seed, s̃₁₍ₜ₋₁₎, s̃₂₍ₜ₋₁₎, τ̃_s₁, τ̃_s₂, σ̃²ᵥ)
    rng = StableRNG(seed)
    p̃₁ = Vector{Float64}(undef, T)
    s̃₁ = Vector{Float64}(undef, T)
    p̃₂ = Vector{Float64}(undef, T)
    s̃₂ = Vector{Float64}(undef, T)
= Vector{Float64}(undef, T)
    y = Vector{Float64}(undef, T)
    for t in 1:T
        p̃₁[t] = fₚ(s̃ₜ₋₁=s̃₁₍ₜ₋₁₎)
        s̃₁[t] = rand(rng, Normal(p̃₁[t], sqrt(1.0/τ̃_s₁)))
        p̃₂[t] = fₚ(s̃ₜ₋₁=s̃₂₍ₜ₋₁₎)
        s̃₂[t] = rand(rng, Normal(p̃₂[t], sqrt(1.0/τ̃_s₂)))
        r̃[t] = fᵣ(f=f, s̃₁ₜ=s̃₁[t], s̃₂ₜ=s̃₂[t])
        y[t] = rand(rng, Normal(r̃[t], sqrt(σ̃²ᵥ)))
        s̃₁₍ₜ₋₁₎ = s̃₁[t]
        s̃₂₍ₜ₋₁₎ = s̃₂[t]
    end
    return s̃₁, s̃₂, y
end
sim_batch_data (generic function with 1 method)

First, let us create a synthetic dataset:

_s̃₁₍ₜ₋₁₎ = 40.0
_s̃₂₍ₜ₋₁₎ = 20.0

_τ̃_s₁ = 1.0
_τ̃_s₂ = 1.0

_σ̃²ᵥ = 20.0 ##observation noise
_seed = 222
_T = 250
250
_s̃₁, _s̃₂, _y = sim_batch_data(
    +, _T; 
    seed=_seed,
    s̃₁₍ₜ₋₁₎=_s̃₁₍ₜ₋₁₎, s̃₂₍ₜ₋₁₎=_s̃₂₍ₜ₋₁₎, 
    τ̃_s₁=_τ̃_s₁, τ̃_s₂=_τ̃_s₂, 
    σ̃²ᵥ=_σ̃²ᵥ);
_s̃₁
250-element Vector{Float64}:
 38.17351536500037
 38.77424066555762
 38.0649901128261
 37.88494841327824
 38.32316266962699
 37.9709503143836
 37.92445739678425
 39.242134600059345
 39.75171798148435
 39.00489489933928
  ⋮
  5.229725647688231
  6.1355497052670325
  6.004826916013331
  6.285939765621888
  6.169350874135184
  5.982917077260892
  8.04753374496972
  6.996226689895888
  6.879849314673978
_s̃₂
250-element Vector{Float64}:
 20.827514379359563
 21.836968985674375
 23.319198323841803
 23.800911955617885
 23.690922682811433
 22.885366024383806
 22.872966512102174
 21.67113389505316
 22.684780998372805
 19.758996186852205
  ⋮
 31.085429600320868
 32.03089890414834
 31.826225724635243
 30.535300132388816
 30.68245473546646
 30.705134833295908
 30.23491684777188
 30.10640420518613
 30.761763390055563
_y
250-element Vector{Float64}:
 51.055772694424824
 59.95111148432222
 55.05654278445259
 62.45589589491548
 57.256104599566946
 66.24912791331924
 63.3860861039736
 66.46907679463577
 65.17701380960303
 59.18125677519563
  ⋮
 34.39902507080608
 38.54427167243298
 40.46984799252074
 43.82615990686573
 38.467290944324475
 35.19221692251506
 39.40013064479152
 40.61920355195525
 39.98054648387227
_p1 = plot(title="Hidden product volume and rental volume")
_p1 = plot!(_p1, _s̃₁, label=L"\mathrm{Product\ sales\ volume\ } \tilde{s}_1", c="red")
_p1 = plot!(_p1, _s̃₂, label=L"\mathrm{Rental\ sales\ volume\ } \tilde{s}_2", c="blue")

_p2 = plot(title="Published combined volume with noise " * L"y = \tilde{s}_1 + \tilde{s}_2")
_p2 = scatter!(_p2, _y, ms=3, color=:purple, label=L"y")
plot(_p1, _p2, size=(700, 600), layout=(2,1))

The Generative Model

Next, we create a probabilistic model to perform inference. We use the RxInfer package:

@model function decouple_model(f, y, m₁₀, τ₁₀, a₁, b₁, m₂₀, τ₂₀, a₂, b₂, a_y, b_y)
    s₁₀ ~ Normal(mean=m₁₀, precision=τ₁₀)
    τ_s₁ ~ Gamma(shape=a₁, rate=b₁)
    s₂₀ ~ Normal(mean=m₂₀, precision=τ₂₀)
    τ_s₂ ~ Gamma(shape=a₂, rate=b₂)
    τ_y ~ Gamma(shape=a_y, rate=b_y)
    s₁₍ₜ₋₁₎ = s₁₀
    s₂₍ₜ₋₁₎ = s₂₀
    local s₁
    local s₂
    local z
    for t in 1:length(y)
        s₁[t] ~ Normal(mean=s₁₍ₜ₋₁₎, precision=τ_s₁)
        s₂[t] ~ Normal(mean=s₂₍ₜ₋₁₎, precision=τ_s₂)
        z[t] ~ f(s₁[t], s₂[t])
        y[t] ~ Normal(mean=z[t], precision=τ_y)
        s₁₍ₜ₋₁₎ = s₁[t]
        s₂₍ₜ₋₁₎ = s₂[t]
    end
end

RxInfer runs Bayesian inference as a variational optimisation procedure between the real solution and its variational proxy q. In our model specification we assumed noise components to be unknown so we need to enforce a structured mean-field assumption for the variational family of distributions q. This inevitably reduces the accuracy of the result, but makes the task easier and allows for fast and analytical message passing-based variational inference:

_constraints = @constraints begin
    q(s₁₀, s₂₀, s₁, s₂, τ_s₁, τ_s₂, τ_y, z) = q(s₁, s₁₀, s₂, s₂₀, z)q(τ_s₂)q(τ_s₁)q(τ_y)
end
Constraints: 
  q(s₁₀, s₂₀, s₁, s₂, τ_s₁, τ_s₂, τ_y, z) = q(s₁, s₁₀, s₂, s₂₀, z)q(τ_s₂)q(τ_s₁)q(τ_y)

Next, we need to assign priors, initialize the messages needed, and the marginals, and then call the infer function:

_m₁₀, _τ₁₀ = 40.0, 0.10
_m₂₀, _τ₂₀ = 20.0, 0.10

## strong priors for the random walk noise components
_a₁, _b₁ = 0.02, 0.1var(_s̃₁)
_a₂, _b₂ = 0.02, 0.1var(_s̃₂)

## vague prior for the observation noise
_a_y, _b_y = 1.0, 1.0

## strong priors for messages
_s₁_init = map(r -> NormalMeanPrecision(r, _τ₁₀), reverse(range(-50, -20, length=_T)))
_s₂_init = map(r -> NormalMeanPrecision(r, _τ₂₀), range(20, 50, length=_T))

_init = @initialization begin
    μ(s₁) = _s₁_init
    μ(s₂) = _s₂_init
    q(τ_s₁) = GammaShapeRate(_a₁, _b₁)
    q(τ_s₂) = GammaShapeRate(_a₂, _b₂)
    q(τ_y) = GammaShapeRate(_a_y, _b_y)
end

_result = infer(
    model = decouple_model(
        f=+, 
        m₁₀=_m₁₀, τ₁₀=_τ₁₀, a₁=_a₁, b₁=_b₁, 
        m₂₀=_m₂₀, τ₂₀=_τ₂₀, a₂=_a₂, b₂=_b₂, 
        a_y=_a_y, b_y=_b_y),
    data  = (y = _y,), 
    options = (limit_stack_depth = 500, ), 
    constraints = _constraints, 
    initialization = _init,
    iterations = 50
)
Inference results:
  Posteriors       | available for (s₂, τ_s₂, s₁₀, s₂₀, τ_s₁, s₁, z, τ_y)
_result.posteriors
Dict{Symbol, Vector} with 8 entries:
  :s₂   => Vector{NormalWeightedMeanPrecision{Float64}}[[NormalWeightedMeanPrec…
  :τ_s₂ => GammaShapeRate{Float64}[GammaShapeRate{Float64}(a=125.02, b=1057.12)…
  :s₁₀  => NormalWeightedMeanPrecision{Float64}[NormalWeightedMeanPrecision{Flo…
  :s₂₀  => NormalWeightedMeanPrecision{Float64}[NormalWeightedMeanPrecision{Flo…
  :τ_s₁ => GammaShapeRate{Float64}[GammaShapeRate{Float64}(a=125.02, b=1670.59)…
  :s₁   => Vector{NormalWeightedMeanPrecision{Float64}}[[NormalWeightedMeanPrec…
  :z    => Vector{NormalWeightedMeanPrecision{Float64}}[[NormalWeightedMeanPrec…
  :τ_y  => GammaShapeRate{Float64}[GammaShapeRate{Float64}(a=126.0, b=126.383),…

The inference results:

_τ_s₁_marginals = _result.posteriors[:τ_s₁]
_τ_s₂_marginals = _result.posteriors[:τ_s₂]
_τ_y_marginals = _result.posteriors[:τ_y]

_z_marginals = _result.posteriors[:z]
_s₁_marginals = _result.posteriors[:s₁]
_s₂_marginals = _result.posteriors[:s₂];
_p1 = plot(legend=:bottomleft, title="Estimated product volume and rental volume")
_p1 = plot!(_p1, _s̃₁, label=L"\mathrm{Product\ sales\ volume\ } \tilde{s}_1", c="red", linestyle=:dash)
_p1 = plot!(_p1, mean.(_s₁_marginals[end]), ribbon=var.(_s₁_marginals[end]), label=L"\mathrm{Estimated\ Product\ sales\ volume\ } \tilde{s}_1", color=:pink)

_p1 = plot!(_p1, _s̃₂, label=L"\mathrm{Rental\ sales\ volume\ } \tilde{s}_2", color=:blue, linestyle=:dash)
_p1 = plot!(_p1, mean.(_s₂_marginals[end]), ribbon=var.(_s₂_marginals[end]), label=L"\mathrm{Estimated\ Rental\ sales\ volume\ } \tilde{s}_2", color=:lightblue)

_p2 = plot(legend=:bottomright, title="Estimated combined volume")
_p2 = scatter!(_p2, _y, label="Observations (published combined volume with noise))", ms=2, alpha=0.5, color=:orange)
_p2 = plot!(_p2, mean.(_z_marginals[end]), ribbon=std.(_z_marginals[end]), label="Combined estimated volume", color="plum")

plot(_p1, _p2, size=(700, 600), layout=(2,1))

The inferred hidden signals track the their hidden counterparts initially, but then progressively deviates increasingly. We can conclude that the product volume declines consistently, but that the rental volume keeps on increasing, or at least does not show a declining trend. This kind of information could be valuable for the competitor that approached us. Based on the simulation results, they may just decide to position themselves to compete more aggresively in the product space (due to reduced competition), rather than the rental space.