Practice Exercises

Extensions of Geostatistical Modelling with inlabru

Author

Olatunji Johnson

1 Introduction

These exercises are designed as follow-up practice for the workshop
“Geostatistical Modelling with inlabru”, using synthetic onchocerciasis data for Nigeria.

All data are in the oncho_data/ folder and documented in oncho_data/README.md.
You can download the onchocerciasis practice data as a zip: Onchocerciasis practice data (ZIP)
You do not need the simulation code.
Many tasks are extensions of what was taught in the workshop.
They are marked as Extension and may require you to combine ideas, check documentation, or be creative in your modelling.

You are encouraged to:

Work in small groups,
Document your modelling choices,
Compare different model specifications,
Reflect on model assumptions and limitations.

You will use the following files:

Day 1:
- spatial_oncho_binomial_data.csv
- spatial_oncho_poisson_data.csv
- spatiotemporal_oncho_binomial_data.csv
Day 2:
- joint_oncho_processes.csv
- nonstationary_oncho_data.csv
- (Optionally) spatial_oncho_poisson_data.csv for multi-likelihood extensions
Day 3:
- hybrid_ml_geostatistical_oncho_data.csv
- oncho_prediction_grid_utm.csv

2 Day 1 – Spatial & Spatio-Temporal Modelling (Onchocerciasis)

1.1 Getting started with the spatial binomial data

Use: spatial_oncho_binomial_data.csv.

Tasks

Load the data and inspect:
- Plot locations (utm_x, utm_y) on a map (optionally overlay nga_shapefile).
- Summarise n_examined, n_pos, and empirical prevalence.
Fit a non-spatial binomial model:
- Response: n_pos positives out of n_examined trials (empirical prevalence n_pos / n_examined).
- Predictors: at least river_index, north_index, elevation, ndvi, urban, and 1–2 others of your choice (rainfall, temperature, pop_density).
Inspect model residuals:
- Map residuals.
- Compute an empirical variogram of residuals.
Fit a spatial binomial model in inlabru:
- Use UTM coordinates to build a mesh and SPDE model.
- Include the same fixed effects as in the non-spatial binomial model, plus a spatial random field.
Compare:
- How do coefficient estimates and uncertainties change when you add the spatial field?
- Does the spatial field capture meaningful residual structure?
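
As a rough illustration of the residual variogram step, the sketch below bins pairwise semivariances by distance. It is plain Python on made-up values rather than workshop code (the workshop itself uses R); the function name, bin settings, coordinates, and residuals are all hypothetical, and in practice you would pass the residuals from your non-spatial model with the utm_x and utm_y coordinates.

```python
import math

def empirical_variogram(coords, residuals, bin_width, max_dist):
    """Binned semivariogram: mean of 0.5 * (r_i - r_j)^2 over pairs in each distance bin."""
    n_bins = int(math.ceil(max_dist / bin_width))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue  # ignore pairs beyond the maximum distance
            b = int(d // bin_width)
            sums[b] += 0.5 * (residuals[i] - residuals[j]) ** 2
            counts[b] += 1
    # (bin midpoint, semivariance, number of pairs); empty bins are skipped
    return [((b + 0.5) * bin_width, sums[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b] > 0]

# Toy data: hypothetical coordinates (km) and residuals
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
residuals = [0.2, 0.1, 0.3, -0.4]
variogram = empirical_variogram(coords, residuals, bin_width=2.0, max_dist=10.0)
for h, gamma, n_pairs in variogram:
    print(f"h={h:.1f}  gamma={gamma:.4f}  pairs={n_pairs}")
```

A semivariance that rises with distance before levelling off suggests residual spatial correlation; a flat variogram suggests little remaining structure. Bins with very few pairs are noisy, so check the pair counts before interpreting them.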

Extension A (not taught explicitly)

Fit a binomial model with a non-standard link (e.g., complementary log–log) and compare:
- Does this change the interpretation of covariate effects?
- How do fitted probabilities change in high-risk vs low-risk areas?
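
To see why the link matters, it can help to compare the two inverse links on the same linear predictor. The sketch below is a small plain-Python illustration, not workshop code; the values of eta are arbitrary.

```python
import math

def inv_logit(eta):
    # Inverse logit link: p = 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    # Inverse complementary log-log link: p = 1 - exp(-exp(eta))
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-3.0, -1.0, 0.0, 1.0):
    print(f"eta={eta:+.1f}  logit p={inv_logit(eta):.3f}  cloglog p={inv_cloglog(eta):.3f}")
```

For very negative eta both links give probabilities close to exp(eta), so they behave similarly in low-risk areas. The cloglog link is asymmetric and approaches 1 faster, so differences show up mainly where prevalence is high. Coefficients under the logit link are log odds ratios, whereas under cloglog they act on log(-log(1 - p)).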

1.2 Mesh design and sensitivity

Use: spatial_oncho_binomial_data.csv.

Tasks

Construct two different meshes:
- Mesh 1: coarser (fewer nodes, larger max edge).
- Mesh 2: finer (more nodes, smaller max edge, better boundary resolution).
Fit the same spatial binomial model under both meshes.
Compare:
- Posterior range and marginal variance of the spatial field.
- Predicted prevalence surface on a grid (using oncho_prediction_grid_utm.csv).
- Computation time and stability.

Extension B (not taught explicitly)

Experiment with different PC priors for range and marginal standard deviation:
- Set priors that favour a smaller vs a larger range a priori.
- Discuss how prior choices influence the spatial field and predictions.
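
Before refitting, it can be useful to check what a given PC prior actually implies. The sketch below assumes the two-dimensional Matérn PC prior parameterisation used by INLA's inla.spde2.pcmatern, where prior.range = c(rho0, p) encodes P(range < rho0) = p and prior.sigma = c(sigma0, p) encodes P(sigma > sigma0) = p. Under that assumption the range has CDF exp(-lambda / r) with lambda = -log(p) * rho0, and sigma is exponential with rate -log(p) / sigma0. The function names and the numerical values (in km) are hypothetical.

```python
import math

def pc_range_quantile(q, rho0, alpha):
    """Quantile of the 2D PC prior on the range, where P(range < rho0) = alpha."""
    lam = -math.log(alpha) * rho0
    return -lam / math.log(q)

def pc_sigma_quantile(q, sigma0, alpha):
    """Quantile of the PC prior on sigma (exponential), where P(sigma > sigma0) = alpha."""
    lam = -math.log(alpha) / sigma0
    return -math.log(1.0 - q) / lam

# Two hypothetical range priors: P(range < 50 km) = 0.05 vs P(range < 200 km) = 0.05
for rho0 in (50.0, 200.0):
    median = pc_range_quantile(0.5, rho0, 0.05)
    print(f"rho0={rho0:.0f} km  prior median range={median:.0f} km")

# Hypothetical sigma prior: P(sigma > 1) = 0.05
print(f"prior median sigma={pc_sigma_quantile(0.5, 1.0, 0.05):.3f}")
```

Comparing prior medians (or other quantiles) across candidate settings makes it easier to describe each prior in words when you discuss its influence on the fitted field.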

1.3 Spatial Poisson incidence model

Use: spatial_oncho_poisson_data.csv.

Tasks

Fit a Poisson spatial model:
- Response: cases, offset: log(population).
- Predictors: at least river_index, north_index, ndvi, rainfall, temperature, and 1–2 others.
- Include a spatial SPDE field.
Map the predicted incidence rate and compare to observed crude rates.
Inspect posterior uncertainty (e.g., posterior SD) of the spatial field.
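
The relationship between the offset, the linear predictor, and the crude rate can be sketched as follows. This is a plain-Python illustration with made-up values; only the column names cases and population come from the exercise, and the helper expected_cases is hypothetical.

```python
import math

# Toy rows using the column names from the exercise (values are made up)
rows = [
    {"cases": 12, "population": 4000},
    {"cases": 3, "population": 1500},
    {"cases": 0, "population": 800},
]

for row in rows:
    # Crude incidence rate per 1000 people, for comparison with model predictions
    row["crude_rate_per_1000"] = 1000.0 * row["cases"] / row["population"]
    # Offset that enters the linear predictor with a fixed coefficient of 1
    row["log_offset"] = math.log(row["population"])

def expected_cases(eta, population):
    # Poisson mean with a log(population) offset: E[cases] = population * exp(eta)
    return math.exp(eta + math.log(population))

for row in rows:
    print(row)
```

With the offset in place, exp(eta) is the incidence rate per person, so predicted rates should be compared with cases / population rather than with raw counts.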

Extension C (not taught explicitly)

Fit a negative binomial spatial model (if you know how to specify one in INLA/inlabru):
- Compare fit (WAIC or similar) with the Poisson model.
- Discuss whether overdispersion appears to matter for this dataset.
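
A quick, informal check for overdispersion is the Pearson dispersion statistic from a fitted Poisson model. The sketch below is a plain-Python illustration on made-up counts and fitted means; the function name and the parameter count are hypothetical, and for a model with a spatial random field the degrees of freedom are only approximate.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / (len(observed) - n_params)

# Toy observed counts and fitted Poisson means (made up)
observed = [0, 2, 9, 1, 15, 4]
fitted = [1.5, 2.5, 5.0, 2.0, 8.0, 4.0]
dispersion = pearson_dispersion(observed, fitted, n_params=2)
print(f"Pearson dispersion: {dispersion:.3f}")
```

Values well above 1 suggest more variability than the Poisson model allows, which is one motivation for trying the negative binomial alternative. Treat this as a screening tool alongside WAIC, not as a formal test.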

1.4 Spatio-temporal prevalence of onchocerciasis

Use: spatiotemporal_oncho_binomial_data.csv.

Tasks

Explore:
- Plot prevalence over time for a few randomly selected sites.
- Map aggregated prevalence by time (e.g., mean per time point).
Fit a separable spatio-temporal model:
- Spatial SPDE field.
- Temporal random walk effect of order 1 or 2 (RW1 or RW2) on the time index.
- Fixed effects: a subset of the covariates (e.g., river_index, north_index, ndvi).
Generate predicted prevalence surfaces for at least two time points
(e.g., earliest and latest).
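
For the exploratory step, aggregated prevalence by time can be computed in two ways that do not always agree: pooling all individuals, or averaging site-level prevalences. The sketch below is a plain-Python illustration on made-up rows; the column name time is an assumption (check the README for the actual name), while n_pos and n_examined follow the exercise.

```python
from collections import defaultdict

# Toy rows (made up); "time" is an assumed column name
rows = [
    {"time": 1, "n_pos": 12, "n_examined": 50},
    {"time": 1, "n_pos": 5, "n_examined": 40},
    {"time": 2, "n_pos": 8, "n_examined": 50},
    {"time": 2, "n_pos": 2, "n_examined": 40},
]

totals = defaultdict(lambda: [0, 0])   # time -> [sum of n_pos, sum of n_examined]
site_prev = defaultdict(list)          # time -> list of site-level prevalences
for row in rows:
    totals[row["time"]][0] += row["n_pos"]
    totals[row["time"]][1] += row["n_examined"]
    site_prev[row["time"]].append(row["n_pos"] / row["n_examined"])

# Pooled prevalence weights sites by sample size; the site mean weights them equally
pooled_prevalence = {t: pos / n for t, (pos, n) in sorted(totals.items())}
mean_site_prevalence = {t: sum(p) / len(p) for t, p in sorted(site_prev.items())}

for t in pooled_prevalence:
    print(t, round(pooled_prevalence[t], 3), round(mean_site_prevalence[t], 3))
```

State which of the two summaries you report, since they can differ noticeably when sample sizes vary a lot between sites.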

Extension D (not taught explicitly)

Add a space–time interaction term:
- For example, a group-indexed spatial field varying by a simple time grouping
  (early vs late), or another structure not covered in detail in the workshop.
- Compare model fit and predicted dynamics to the separable model:
  - Do you see evidence that the spatial pattern changes over time?

3 Day 2 – Joint & Non-Stationary Processes (Onchocerciasis)

2.1 Joint modelling of multiple onchocerciasis processes

Use: joint_oncho_processes.csv.

Tasks

Explore the data:
- How do n_examined and n_pos differ between group levels?
- Are there obvious spatial differences between groups?
Fit a joint binomial model with:
- Shared spatial field across groups,
- Group-specific intercepts,
- Common covariate effects (e.g., river_index, north_index, ndvi).
Extend to a model with group-specific covariate effects for at least one key predictor
(e.g., river_index × group).
Map group-specific predicted prevalence and group-specific spatial fields (if used).

Extension E (not fully taught)

Fit a model that includes a group-specific spatial field component in addition to the shared field:
- Compare the magnitude of shared vs group-specific spatial variability.
- Discuss when adding group-specific spatial fields is justified and when it might overfit.

2.2 Non-stationary spatial structure

Use: nonstationary_oncho_data.csv.

Tasks

Explore:
- Plot weight_north against latitude (after converting UTM to lat/long), or simply inspect the distribution of weight_north.
- Map prevalence and S_nonstationary.
Fit a stationary spatial model (single SPDE field):
- Response: n_pos positives out of n_examined trials (empirical prevalence n_pos / n_examined),
- Fixed effects: some covariates (e.g., river_index, north_index, ndvi),
- One spatial field.
Compare the model’s spatial residuals with S_nonstationary (a rough visual comparison is enough) and comment on where the stationary model seems to misrepresent the pattern.

Extension F (not taught explicitly)

Propose and fit a non-stationary model, for example:
- Two spatial fields with different ranges, combined with a spatially varying weight
  (e.g., function of north_index),
- Or spatially varying coefficients for river_index (e.g., effect stronger in the north).
Compare the stationary and non-stationary models in terms of:
- Predictions,
- Uncertainty,
- A model selection criterion (e.g., WAIC).

2.3 Multi-likelihood modelling (binomial + Poisson)

Use: spatial_oncho_binomial_data.csv and spatial_oncho_poisson_data.csv.

These datasets were not simulated at identical coordinates, but you can link them approximately.

Tasks

Construct an approximate joint dataset:
- For each binomial location, find the closest Poisson location (e.g., using nearest neighbour based on UTM coordinates).
- Create a combined data frame that records:
  - Binomial outcome (n_pos, n_examined),
  - Poisson outcome (cases, population),
  - Shared covariates (e.g., from the binomial dataset).
Fit a joint model with:
- Binomial likelihood for the prevalence data.
- Poisson likelihood (with offset) for the count data.
- A shared spatial field.
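
The nearest-neighbour linkage step can be sketched as below. This is a brute-force plain-Python illustration on made-up sites; it assumes both files carry utm_x and utm_y columns, and the helper nearest_index and the match_dist field are hypothetical names.

```python
import math

def nearest_index(point, candidates):
    """Index and distance of the candidate closest to point (Euclidean, in UTM units)."""
    best = min(range(len(candidates)), key=lambda k: math.dist(point, candidates[k]))
    return best, math.dist(point, candidates[best])

# Toy sites (made up)
binomial_sites = [
    {"utm_x": 100.0, "utm_y": 200.0, "n_pos": 7, "n_examined": 60},
    {"utm_x": 400.0, "utm_y": 250.0, "n_pos": 2, "n_examined": 55},
]
poisson_sites = [
    {"utm_x": 390.0, "utm_y": 260.0, "cases": 4, "population": 2000},
    {"utm_x": 110.0, "utm_y": 190.0, "cases": 9, "population": 3500},
    {"utm_x": 900.0, "utm_y": 900.0, "cases": 1, "population": 1200},
]
poisson_xy = [(s["utm_x"], s["utm_y"]) for s in poisson_sites]

joint = []
for site in binomial_sites:
    k, dist = nearest_index((site["utm_x"], site["utm_y"]), poisson_xy)
    # Keep the binomial site's coordinates and attach the matched Poisson outcome
    joint.append({**site,
                  "cases": poisson_sites[k]["cases"],
                  "population": poisson_sites[k]["population"],
                  "match_dist": dist})

for row in joint:
    print(row)
```

Keeping the matching distance lets you inspect how approximate each link is and, if needed, drop pairs that are too far apart to be treated as the same location.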

Extension G (not taught in detail)

Compare:
- Separate models for each outcome vs the joint model.
- Spatial fields and covariate effects.
Discuss how the joint model “borrows strength” between data sources and when that might be beneficial or risky.

4 Day 3 – Hybrid Machine Learning + Geostatistics

3.1 ML-only models

Use: hybrid_ml_geostatistical_oncho_data.csv.

Tasks

Split data:
- Randomly assign ~70% of locations to a training set and the rest to a test set.
Fit at least two ML models (outside inlabru), for example:
- Random Forest,
- XGBoost,
- Gradient Boosting Machine, or others you know.
Use covariates such as: elevation, ndvi, urban, rainfall, temperature, pop_density, river_index, north_index.
Evaluate ML predictions on the test set:
- Use appropriate metrics (e.g., RMSE on prevalence or the Brier score).
- Inspect variable importance (if available).
Compute and map residuals (observed – predicted) on the full dataset.
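
The split and the two example metrics can be sketched independently of the ML library you choose. The code below is a plain-Python illustration; the function names, the seed, and the toy numbers are hypothetical, and the Brier score shown is one possible version for aggregated data, counting each examined individual once.

```python
import random

def train_test_split(n, train_frac=0.7, seed=1):
    """Randomly split location indices 0..n-1 into training and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train_frac * n)
    return idx[:n_train], idx[n_train:]

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted prevalence."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5

def brier_score(n_pos, n_examined, predicted):
    """Individual-level Brier score: n_pos positives and n_examined - n_pos negatives per site."""
    total = sum(y * (1 - p) ** 2 + (n - y) * p ** 2
                for y, n, p in zip(n_pos, n_examined, predicted))
    return total / sum(n_examined)

train_idx, test_idx = train_test_split(10)
print("train:", sorted(train_idx), "test:", sorted(test_idx))
print("RMSE:", rmse([0.2, 0.4], [0.1, 0.5]))
print("Brier:", brier_score([1], [2], [0.5]))
```

Fixing the seed keeps the split reproducible, which matters when you later compare the ML-only and hybrid models on the same test set.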

Extension H (not taught explicitly)

Compute a spatial variogram of residuals and assess:
- Whether there is remaining spatial structure.
- Which ML method leaves the least spatial structure.

3.2 Hybrid ML + spatial residuals in inlabru

Use: hybrid_ml_geostatistical_oncho_data.csv.

Tasks

For your best ML model from exercise 3.1 (ML-only models):
- Save the fitted values at all locations.
- Add them to the data as a new covariate (e.g. ml_pred).
Fit a hybrid model in inlabru:
- Use ml_pred as an offset or fixed covariate.
- Add a spatial SPDE residual field.
Compare:
- Spatial residual field from the hybrid model,
- Residuals from the ML-only model.
Map:
- Hybrid model predictions on the oncho_prediction_grid_utm.csv grid.
- Posterior SD of the spatial residual field.
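
If ml_pred is a predicted prevalence and you use it as an offset in a binomial model with a logit link, it needs to be moved to the logit scale first. The sketch below illustrates that idea in plain Python; it assumes ml_pred lies in [0, 1], and the function names and the clipping constant eps are hypothetical choices.

```python
import math

def logit_offset(ml_pred, eps=1e-4):
    """Clip an ML-predicted prevalence away from 0 and 1, then map it to the logit scale."""
    p = min(max(ml_pred, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def hybrid_prevalence(ml_pred, spatial_effect, intercept=0.0):
    # Offset formulation: logit(p) = logit(ml_pred) + intercept + spatial residual
    eta = logit_offset(ml_pred) + intercept + spatial_effect
    return 1.0 / (1.0 + math.exp(-eta))

# With no spatial residual and zero intercept, the hybrid prediction equals ml_pred
print(hybrid_prevalence(0.3, spatial_effect=0.0))
# A positive spatial residual raises the predicted prevalence above ml_pred
print(hybrid_prevalence(0.3, spatial_effect=0.5))
```

Using ml_pred as a fixed covariate instead lets the model estimate a coefficient for it, whereas the offset fixes that coefficient at 1; it is worth stating which of the two you chose and why.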

Extension I (not taught explicitly)

Try a two-stage hybrid:
- Fit the ML model on the training set only.
- Use the spatial residual model on top of out-of-sample ML predictions.
- Evaluate on the test set whether the hybrid model systematically improves predictions.

3.3 Robustness and scenario experiments

Use: hybrid_ml_geostatistical_oncho_data.csv.

Tasks

Create modified versions of the dataset with covariate issues:
- Version A: randomly set 20–30% of rainfall values to NA.
- Version B: add strong noise to ndvi in the north only (e.g., for high north_index).
Refit the following on each version:
- ML-only model,
- Hybrid model (ML + spatial).
Compare:
- Prediction performance (on the original uncorrupted test subset, if you keep one aside).
- Stability of spatial fields and covariate effects.
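
The two corrupted versions can be generated reproducibly along the following lines. This is a plain-Python sketch on made-up rows; the function names, the noise level, the seed, and the north_index cutoff of 0.7 are hypothetical (the cutoff assumes north_index is scaled to roughly 0 to 1), and None stands in for NA.

```python
import copy
import random

def make_version_a(rows, frac_missing=0.25, seed=1):
    """Version A: set a random fraction of rainfall values to None (stand-in for NA)."""
    rng = random.Random(seed)
    out = copy.deepcopy(rows)  # leave the original data untouched
    n_missing = round(frac_missing * len(out))
    for i in rng.sample(range(len(out)), n_missing):
        out[i]["rainfall"] = None
    return out

def make_version_b(rows, noise_sd=0.3, north_cutoff=0.7, seed=1):
    """Version B: add Gaussian noise to ndvi only where north_index exceeds a cutoff."""
    rng = random.Random(seed)
    out = copy.deepcopy(rows)
    for row in out:
        if row["north_index"] > north_cutoff:
            row["ndvi"] += rng.gauss(0.0, noise_sd)
    return out

# Toy rows (made up)
rows = [{"rainfall": 900.0 + 10 * i, "ndvi": 0.5, "north_index": i / 19} for i in range(20)]
version_a = make_version_a(rows)
version_b = make_version_b(rows)
print(sum(r["rainfall"] is None for r in version_a), "rainfall values set to missing")
```

Working on copies keeps the original data available, so the uncorrupted test subset can still be used for evaluation.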

Extension J (not taught explicitly)

Write a short report (~1–2 pages) summarising:
- When the spatial residual component helps buffer against covariate problems,
- When it fails to compensate,
- Implications for real-world onchocerciasis mapping in Nigeria.

5 Optional: Cross-Day Extension Projects

If you have time and interest, you can combine ideas from multiple days:

Full pipeline for onchocerciasis risk mapping:
- Start from a spatial binomial model,
- Add non-stationarity or multiple groups,
- Integrate ML for complex covariate effects,
- Produce prediction and uncertainty maps for Nigeria,
- Reflect on choices at each step.
Model comparison and reporting:
- Pick 3–4 models (e.g., stationary vs non-stationary, GLM vs spatial vs hybrid ML),
- Compare them using a consistent set of metrics (WAIC, cross-validation, residual diagnostics),
- Produce a short “modelling report” as if for a real onchocerciasis mapping project.