Day 2 – Joint modelling and Non-stationary processes

Olatunji Johnson

Day 2 goals

Today we focus on two ideas that come up constantly in disease mapping:

Joint modelling
- multiple related malaria outcomes
- shared and outcome-specific spatial structure
- multiple likelihoods (Binomial + Poisson)
Non-stationary spatial processes
- when “one Matérn everywhere” is not realistic
- two practical constructions:
  - mixture of SPDEs (smooth/rough)
  - structured range (range changes with a covariate)

Why do we need joint models?

You often have multiple imperfect views of the same latent risk:

prevalence surveys (direct infection measurements)
facility case counts (incidence proxy)
different risk groups (children vs pregnant women)
different diagnostics / instruments

Separate models can disagree and waste information.

Joint models let us:

borrow strength (share information)
separate common risk from process-specific effects
improve predictions where one dataset is sparse

Key idea: latent spatial process

We posit a latent spatial surface (S(s)) that captures residual dependence:

nearby locations share similar unobserved risk drivers
induces spatial correlation in the data

In MBG terms:

\[\eta(s) = \beta_0 + \beta^\top x(s) + S(s)\]

and data are conditionally independent given (S(s)).

Part A — Joint modelling (same likelihood, two groups)

Example: two prevalence processes

Suppose we have:

children <5 prevalence data
pregnant women prevalence data

At each location \(s_i\), group \(g\in\{c,p\}\):

\[ Y_{ig} \mid p_{ig} \sim \text{Binomial}(N_{ig}, p_{ig}) \]

\[ \text{logit}(p_{ig}) = \eta_{ig} \]

Decomposition: shared + group-specific spatial fields

A common and interpretable joint structure:

\[ \eta_{ig} = \beta_{0g} + \beta^\top x_i + S_0(s_i) + S_g(s_i) \]

\(S_0(s)\): shared spatial field (common malaria risk)
\(S_g(s)\): group-specific field (deviations for group (g))
\(\beta_{0g}\): group-specific baseline

Interpretation

If \(S_g(s)\approx 0\), groups share the same spatial pattern
Where \(S_g(s)\neq 0\), groups differ spatially

Why is this useful?

This structure answers practical questions:

What spatial variation is shared across groups?
Where do children and pregnant women show different risk?
Do covariates explain both outcomes similarly?

It also improves predictions:

locations with sparse pregnant data can borrow information from children data (through \(S_0(s)\))

Identifiability and scaling

Potential issue: \(S_0\) and \(S_g\) can compete.

Typical solutions:

set priors so that group-specific fields are “smaller” than the shared field
consider constraints or priors on variance:
- \(\sigma^2_{S_g} < \sigma^2_{S_0}\) (probabilistically)

Conceptually: shared structure should capture the dominant signal.

What do we assume about \(S_0(s)\) and \(S_g(s)\)?

A standard choice: Matérn Gaussian random fields

\[ S(s) \sim \text{GRF}\left(0, \text{Matérn}(\rho, \sigma^2)\right) \]

\(\rho\): range (how quickly correlation decays)
\(\sigma^2\): marginal variance

Stationarity here means \(\rho,\sigma^2\) are constant over space.

Mesh and SPDE

Mesh nodes: basis functions
Field represented as:

\[ S(s) \approx \sum_{k=1}^K w_k \phi_k(s) \]

where \(w_k\) are GMRF weights.

Joint model uses the same mesh for \(S_0\) and \(S_g\), but:

\(S_0\): one field
\(S_g\): replicated by group

Part B — Joint modelling with multiple likelihoods

Motivation: prevalence + case counts

Often you have:

Binomial prevalence surveys:
- \(Y_i\) positives out of \(N_i\) tested
Poisson facility case counts:
- \(C_i\) cases with exposure \(E_i\) (population, person-time, etc.)

Each dataset is incomplete on its own.

Joint modelling combines them while respecting their data-generating mechanisms.

The two likelihoods

Prevalence

\[ Y_i \mid p_i \sim \text{Binomial}(N_i, p_i),\quad \text{logit}(p_i)=\eta_i^{(B)} \]

Counts

\[ C_i \mid \lambda_i \sim \text{Poisson}(E_i\lambda_i),\quad \log(\lambda_i)=\eta_i^{(P)} \]

Different link functions:

logit for probabilities
log for rates

Linking them through shared latent structure

A flexible shared-structure joint model:

\[ \eta_i^{(B)} = \beta_0^{(B)} + \beta^\top x_i + S_0(s_i) + S_B(s_i) \]

\[ \eta_i^{(P)} = \beta_0^{(P)} + \beta^\top x_i + \alpha S_0(s_i) + S_P(s_i) \]

\(S_0(s)\): shared field
\(S_B(s)\): prevalence-specific residual spatial field
\(S_P(s)\): incidence-specific residual spatial field
\(\alpha\): scaling (how strongly the shared risk appears in counts)

Interpretation of \(\alpha \in [0, 1]\)

\(\alpha>0\): shared field increases counts where it increases prevalence
\(\alpha\approx 0\): weak coupling between processes
large \(|\alpha|\): strong coupling

This parameter is often crucial:

it encodes the relationship between the two measurements

Why include outcome-specific fields \(S_B, S_P\)?

Because the two processes are not identical:

prevalence and incidence are related but not the same
facility data can reflect access and reporting biases
survey prevalence can reflect age structure and diagnostics

Outcome-specific fields mop up these systematic differences.

What can go wrong?

Two common practical pitfalls:

Confounding:
- covariates and spatial fields compete
Mis-specified exposure \(E_i\):
- Poisson component can dominate if \(E_i\) is wrong scale
Different spatial supports:
- facility counts might be aggregated (administrative units)
- prevalence might be point-referenced

Solutions:

careful covariate selection
informative priors (especially for \(\alpha\))
match spatial support where possible

Part C — Non-stationarity

Why stationarity can be unrealistic

Stationary Matérn assumes:

same smoothness everywhere
same correlation length everywhere

But malaria transmission may be:

smooth in the north (broad ecological gradients)
rough in the south (heterogeneous land use, urbanisation)
different across ecozones

Non-stationarity: the covariance structure changes over space.

Two practical nonstationary constructions

Mixture of SPDEs (smooth + rough)
- simple, and robust
Structured range in the SPDE
- range varies continuously with a covariate
- closer to “mechanistic” covariance modelling

Both are useful; choose based on:

interpretability
stability
computational cost
available covariates

Part C1 — Mixture of two SPDEs

Idea: blend a smooth and a rough field

Let:

\(S_{\text{smooth}}(s)\): long-range Matérn
\(S_{\text{rough}}(s)\): short-range Matérn
\(\omega(s)\in[0,1]\): spatial weight (e.g., function of latitude)

Define:

\[ S(s)=\omega(s)S_{\text{smooth}}(s) + (1-\omega(s))S_{\text{rough}}(s) \]

Then:

\[ \eta(s)=\beta_0+\beta^\top x(s)+S(s) \]

What does \(\omega(s)\) do?

If \(\omega(s)\approx 1\): field behaves like smooth long-range
If \(\omega(s)\approx 0\): field behaves like rough short-range
If \(\omega(s)\) changes with location: correlation structure changes

Interpretation - \(\omega(s)\) is a nonstationarity driver.

Choosing \(\omega(s)\)

Simple choices:

monotone latitude function (north–south structure)
ecozone indicator (piecewise structure)
logistic function of a covariate: \[ \omega(s) = \text{logit}^{-1}(a+bz(s)) \]

In the practical session:

start with latitude-based \(\omega(s)\)
it is easy to explain and visualise

Advantages and limitations

Advantages:

stable in computation
intuitive interpretation (smooth vs rough regions)
easy to communicate to non-statisticians

Limitations:

mixture is a construction, not a single Matérn
interpretation of “local range” is indirect
may require choosing \(\omega(s)\) externally

Part C2 — Structured range in the SPDE

Goal: let the Matérn range vary with a covariate

Recall: for Matérn SPDE models, range is approximately:

\[ \rho(s)\approx \frac{\sqrt{8}}{\kappa(s)} \]

So if we model:

\[ \log \kappa(s)=\theta_0+\theta_1 z(s) \]

then

\[ \rho(s)\text{ varies smoothly with }z(s). \]

What does the sign of \(\theta_1\) mean?

Because \(\rho(s)\propto 1/\kappa(s)\):

If \(\theta_1>0\): \(\kappa(s)\) increases with \(z\) ⇒ range \(\rho(s)\) decreases
If \(\theta_1<0\): \(\kappa(s)\) decreases with \(z\) ⇒ range \(\rho(s)\) increases

So you can encode statements like:

“range is shorter in high-heterogeneity regions”
“range is longer in smooth ecological gradient regions”

Why use mesh-vertex covariates?

In the SPDE approximation:

the field is defined through weights on mesh vertices
so nonstationary parameters need to be specified at the same support (vertices)

Thus \(z(s)\) is evaluated at mesh nodes:

\(z_k = z(\text{vertex}_k)\)

This makes the range a spatially varying parameter.

What changes compared to stationary SPDE?

Stationary SPDE:

one \(\kappa\), one \(\tau\) for all space

Structured-range SPDE:

\(\kappa(s)\) and \(\tau(s)\) can depend on covariates
introduces additional parameters \(\theta\) controlling how they vary

Conceptual result:

correlation length changes smoothly across the map

Relation between \(\tau(s)\), \(\kappa(s)\), and variance

In SPDE Matérn models:

\(\kappa(s)\): controls range
\(\tau(s)\): controls marginal variance (together with \(\kappa\))

So structured modelling can target:

range only (via \(\kappa\))
variance only (via \(\tau\))
or both

Today we emphasise structured range.

Practical guidance for modelling

Start with a simple covariate \(z(s)\) (e.g. northing or ecozone index)
Scale it to have mean 0, sd 1
Use weak-to-moderate priors on the slope parameter to avoid extreme range variation

In interpretation:

show the covariate map
show the predicted prevalence map
(optionally) show a derived “implied range” surface

Summary

Joint models - share spatial information across related outcomes - can handle: - multiple groups (same likelihood) - multiple likelihoods (binomial + poisson)

Nonstationarity - mixture SPDE: simple and robust - structured range SPDE: mechanistic, covariate-driven

Take-away - choose the simplest model that answers your scientific question - use nonstationarity when stationary assumptions are visibly violated