Practice Exercises
Extensions of Geostatistical Modelling with inlabru
1 Introduction
These exercises are designed as follow-up practice for the workshop
“Geostatistical Modelling with inlabru”, using synthetic onchocerciasis data for Nigeria.
- All data are in the oncho_data/ folder and documented in oncho_data/README.md.
- You can download the onchocerciasis practice data as a zip: Onchocerciasis practice data (ZIP)
- You do not need the simulation code.
- Many tasks are extensions of what was taught in the workshop.
- They are marked as Extension and may require you to combine ideas, check documentation, or be creative in your modelling.
You are encouraged to:
- Work in small groups,
- Document your modelling choices,
- Compare different model specifications,
- Reflect on model assumptions and limitations.
You will use the following files:
- Day 1:
  - spatial_oncho_binomial_data.csv
  - spatial_oncho_poisson_data.csv
  - spatiotemporal_oncho_binomial_data.csv
- Day 2:
  - joint_oncho_processes.csv
  - nonstationary_oncho_data.csv
  - (Optionally) spatial_oncho_poisson_data.csv for multi-likelihood extensions
- Day 3:
  - hybrid_ml_geostatistical_oncho_data.csv
  - oncho_prediction_grid_utm.csv
2 Day 1 – Spatial & Spatio-Temporal Modelling (Onchocerciasis)
2.1 1.1 Getting started with the spatial binomial data
Use: spatial_oncho_binomial_data.csv.
Tasks
1. Load the data and inspect:
   - Plot locations (utm_x, utm_y) on a map (optionally overlay nga_shapefile).
   - Summarise n_examined, n_pos, and empirical prevalence.
2. Fit a non-spatial binomial model:
   - Response: n_pos / n_examined.
   - Predictors: at least river_index, north_index, elevation, ndvi, urban, and 1–2 others of your choice (rainfall, temperature, pop_density).
3. Inspect model residuals:
   - Map residuals.
   - Compute an empirical variogram of residuals.
4. Fit a spatial binomial model in inlabru:
   - Use UTM coordinates to build a mesh and SPDE model.
   - Include the same fixed effects as in (2), plus a spatial random field.
5. Compare:
   - How do coefficient estimates and uncertainties change when you add the spatial field?
   - Does the spatial field capture meaningful residual structure?
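For the residual variogram task above, one rough way to look for leftover spatial structure is a binned empirical semivariogram. The sketch below is a plain-Python illustration rather than the workshop's method: the function name `empirical_variogram`, the bin width, and the toy coordinates and residuals are all assumptions made here, and in practice a dedicated geostatistics package would normally be used. It applies the classical estimator, half the mean squared difference between residual pairs whose separation falls in each distance bin.

```python
import math
from collections import defaultdict

def empirical_variogram(coords, resid, bin_width, max_dist):
    """Classical semivariogram: for each distance bin,
    gamma = sum of squared residual differences / (2 * number of pairs)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue                      # ignore very distant pairs
            b = int(d // bin_width)           # index of the distance bin
            sums[b] += (resid[i] - resid[j]) ** 2
            counts[b] += 1
    # (bin midpoint, semivariance, number of pairs), ordered by distance
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]), counts[b])
            for b in sorted(counts)]

# Hypothetical toy example: four sites on a line, 10 km apart,
# with coordinates in metres as for utm_x / utm_y.
coords = [(0.0, 0.0), (10_000.0, 0.0), (20_000.0, 0.0), (30_000.0, 0.0)]
resid = [1.0, 0.8, -0.5, -1.0]
vario = empirical_variogram(coords, resid, bin_width=12_000.0, max_dist=36_000.0)
for mid, gamma, npairs in vario:
    print(f"{mid:8.0f} m  gamma={gamma:.3f}  pairs={npairs}")
```

A semivariance that rises with distance, as in this toy example, suggests residual spatial correlation that a spatial random field could absorb; a flat curve suggests little remaining structure.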
Extension A (not taught explicitly)
Fit a binomial model with a non-standard link (e.g., complementary log–log) and compare:
- Does this change the interpretation of covariate effects?
- How do fitted probabilities change in high-risk vs low-risk areas?
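Before fitting anything for Extension A, it can help to compare the two inverse links directly. The short sketch below is only an illustration (the helper names `inv_logit` and `inv_cloglog` are chosen here): it evaluates both links on the same linear predictor values.

```python
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Same linear predictor values, two different links.
for eta in (-3.0, -1.0, 0.0, 1.0, 2.0):
    print(f"eta={eta:+.1f}  logit p={inv_logit(eta):.3f}  "
          f"cloglog p={inv_cloglog(eta):.3f}")
```

The two links nearly agree when prevalence is low but diverge when it is high, because the complementary log–log link is asymmetric. Under the logit link a coefficient is a log odds ratio, whereas under the complementary log–log link exp(coefficient) multiplies -log(1 - p), so the coefficients are not directly comparable between the two fits.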
2.2 1.2 Mesh design and sensitivity
Use: spatial_oncho_binomial_data.csv.
Tasks
1. Construct two different meshes:
   - Mesh 1: coarser (fewer nodes, larger max edge).
   - Mesh 2: finer (more nodes, smaller max edge, better boundary resolution).
2. Fit the same spatial binomial model under both meshes.
3. Compare:
   - Posterior range and marginal variance of the spatial field.
   - Predicted prevalence surface on a grid (using oncho_prediction_grid_utm.csv).
   - Computation time and stability.
Extension B (not taught explicitly)
Experiment with different PC priors for range and marginal standard deviation:
- Make range smaller vs larger a priori.
- Discuss how prior choices influence the spatial field and predictions.
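As a reminder of what "smaller vs larger a priori" means in practice, PC priors for a Matérn field are specified through two tail-probability statements, one for the range $\rho$ and one for the marginal standard deviation $\sigma$. The symbols $\rho_0$, $\sigma_0$, $\alpha_\rho$ and $\alpha_\sigma$ are notation introduced here for illustration; in INLA's `inla.spde2.pcmatern()` they correspond to the arguments `prior.range = c(rho_0, alpha_rho)` and `prior.sigma = c(sigma_0, alpha_sigma)`.

```latex
\[
  \Pr(\rho < \rho_0) = \alpha_\rho,
  \qquad
  \Pr(\sigma > \sigma_0) = \alpha_\sigma .
\]
```

Increasing $\rho_0$ while holding $\alpha_\rho$ fixed shifts prior mass towards longer-range, smoother fields; decreasing it does the opposite. Remember that $\rho_0$ is in the units of the coordinates (metres for UTM).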
2.3 1.3 Spatial Poisson incidence model
Use: spatial_oncho_poisson_data.csv.
Tasks
1. Fit a Poisson spatial model:
   - Response: cases, offset: log(population).
   - Predictors: at least river_index, north_index, ndvi, rainfall, temperature, and 1–2 others.
   - Include a spatial SPDE field.
2. Map the predicted incidence rate and compare to observed crude rates.
3. Inspect posterior uncertainty (e.g., posterior SD) of the spatial field.
Extension C (not taught explicitly)
Fit a negative binomial spatial model (if you know how to specify one in INLA/inlabru):
- Compare fit (WAIC or similar) with the Poisson model.
- Discuss whether overdispersion appears to matter for this dataset.
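Before fitting a negative binomial model, a crude first look at overdispersion can be informative. The sketch below assumes the column names `cases` and `population` and uses made-up counts; the helper name `pearson_dispersion` is chosen here. It computes the Pearson dispersion statistic under a constant-rate Poisson model, which ignores covariates and spatial structure, so it is a screening check only and not a substitute for comparing fitted models.

```python
def pearson_dispersion(cases, population):
    """Pearson chi-square / (n - 1) under a constant-rate Poisson model,
    where the expected count at each site is population * pooled rate."""
    rate = sum(cases) / sum(population)           # pooled crude rate
    expected = [p * rate for p in population]
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(cases, expected))
    return chi2 / (len(cases) - 1)

# Hypothetical data: five sites with very uneven crude rates.
cases = [2, 30, 5, 0, 41]
population = [1000, 1200, 900, 1100, 1000]
crude_rates = [y / p for y, p in zip(cases, population)]
dispersion = pearson_dispersion(cases, population)
print("crude rates per 1000:", [round(1000 * r, 1) for r in crude_rates])
print("Pearson dispersion:", round(dispersion, 2))
```

Values well above 1 point to extra-Poisson variation, but part of it may disappear once covariates and the spatial field are in the model, which is why the comparison of fitted Poisson and negative binomial models remains the real test.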
2.4 1.4 Spatio-temporal prevalence of onchocerciasis
Use: spatiotemporal_oncho_binomial_data.csv.
Tasks
1. Explore:
   - Plot prevalence over time for a few randomly selected sites.
   - Map aggregated prevalence by time (e.g., mean per time point).
2. Fit a separable spatio-temporal model:
   - Spatial SPDE field.
   - Temporal RW1 or RW2 effect (time index).
   - Fixed effects: a subset of the covariates (e.g., river_index, north_index, ndvi).
3. Generate predicted prevalence surfaces for at least two time points (e.g., earliest and latest).
Extension D (not taught explicitly)
Add a space–time interaction term:
- For example, a group-indexed spatial field varying by a simple time grouping (early vs late), or another structure not covered in detail in the workshop.
- Compare model fit and predicted dynamics to the separable model:
  - Do you see evidence that the spatial pattern changes over time?
3 Day 2 – Joint & Non-Stationary Processes (Onchocerciasis)
3.1 2.1 Joint modelling of multiple onchocerciasis processes
Use: joint_oncho_processes.csv.
Tasks
1. Explore the data:
   - How do n_examined and n_pos differ between group levels?
   - Are there obvious spatial differences between groups?
2. Fit a joint binomial model with:
   - Shared spatial field across groups,
   - Group-specific intercepts,
   - Common covariate effects (e.g., river_index, north_index, ndvi).
3. Extend to a model with group-specific covariate effects for at least one key predictor (e.g., river_index × group).
4. Map group-specific predicted prevalence and group-specific spatial fields (if used).
Extension E (not fully taught)
Fit a model that includes a group-specific spatial field component in addition to the shared field:
- Compare the magnitude of shared vs group-specific spatial variability.
- Discuss when adding group-specific spatial fields is justified and when it might overfit.
3.2 2.2 Non-stationary spatial structure
Use: nonstationary_oncho_data.csv.
Tasks
1. Explore:
   - Plot weight_north against latitude (converting UTM to lat/long first), or simply inspect weight_north.
   - Map prevalence and S_nonstationary.
2. Fit a stationary spatial model (single SPDE field):
   - Response: n_pos / n_examined,
   - Fixed effects: some covariates (e.g., river_index, north_index, ndvi),
   - One spatial field.
3. Compare the model’s spatial residuals to S_nonstationary (roughly) and comment on where the stationary model seems to misrepresent the pattern.
Extension F (not taught explicitly)
Propose and fit a non-stationary model, for example:
- Two spatial fields with different ranges, combined with a spatially varying weight (e.g., a function of north_index),
- Or spatially varying coefficients for river_index (e.g., effect stronger in the north).
Compare the stationary and non-stationary models on:
- Predictions,
- Uncertainty,
- A model selection criterion (e.g., WAIC).
3.3 2.3 Multi-likelihood modelling (binomial + Poisson)
Use: spatial_oncho_binomial_data.csv and spatial_oncho_poisson_data.csv.
These datasets were not simulated at identical coordinates, but you can link them approximately.
Tasks
1. Construct an approximate joint dataset:
   - For each binomial location, find the closest Poisson location (e.g., using nearest neighbour based on UTM coordinates).
   - Create a combined data frame that records:
     - Binomial outcome (n_pos, n_examined),
     - Poisson outcome (cases, population),
     - Shared covariates (e.g., from the binomial dataset).
2. Fit a joint model with:
   - Binomial likelihood for the prevalence data.
   - Poisson likelihood (with offset) for the count data.
   - A shared spatial field.
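The nearest-neighbour linkage used to build the joint dataset can be sketched as follows. This is a brute-force illustration on hypothetical toy records, assuming both files carry `utm_x` and `utm_y` columns; the helper name `link_nearest` and the `link_dist` field are made up here. For thousands of locations a spatial index such as a KD-tree would be much faster.

```python
import math

def link_nearest(binom_rows, pois_rows):
    """For each binomial record, attach the closest Poisson record
    (Euclidean distance on UTM coordinates, in metres)."""
    linked = []
    for b in binom_rows:
        here = (b["utm_x"], b["utm_y"])
        nearest = min(
            pois_rows,
            key=lambda p: math.dist(here, (p["utm_x"], p["utm_y"])),
        )
        linked.append({
            **b,                                   # binomial outcome + covariates
            "cases": nearest["cases"],
            "population": nearest["population"],
            # keep the match distance for quality checks
            "link_dist": math.dist(here, (nearest["utm_x"], nearest["utm_y"])),
        })
    return linked

# Hypothetical toy records standing in for the two CSV files.
binom = [
    {"utm_x": 0.0, "utm_y": 0.0, "n_pos": 4, "n_examined": 50},
    {"utm_x": 9000.0, "utm_y": 1000.0, "n_pos": 12, "n_examined": 60},
]
pois = [
    {"utm_x": 300.0, "utm_y": 400.0, "cases": 7, "population": 1500},
    {"utm_x": 10000.0, "utm_y": 1000.0, "cases": 20, "population": 2100},
]
joint = link_nearest(binom, pois)
for row in joint:
    print(row)
```

Keeping `link_dist` makes it possible to drop or flag pairs that are matched over long distances. Note that several binomial sites can be matched to the same Poisson site, which duplicates that count in the combined data; an alternative is to keep the two datasets at their own coordinates and let only the spatial field be shared.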
Extension G (not taught in detail)
Compare:
- Separate models for each outcome vs the joint model.
- Spatial fields and covariate effects.
- Discuss how the joint model “borrows strength” between data sources and when that might be beneficial or risky.
4 Day 3 – Hybrid Machine Learning + Geostatistics
4.1 3.1 ML-only models
Use: hybrid_ml_geostatistical_oncho_data.csv.
Tasks
1. Split data:
   - Randomly assign ~70% of locations to a training set and the rest to a test set.
2. Fit at least two ML models (outside inlabru), for example:
   - Random Forest,
   - XGBoost,
   - Gradient Boosting Machine, or others you know.
3. Use covariates such as: elevation, ndvi, urban, rainfall, temperature, pop_density, river_index, north_index.
4. Evaluate ML predictions on the test set:
   - Use appropriate metrics (e.g., RMSE on prevalence, Brier score).
   - Inspect variable importance (if available).
5. Compute and map residuals (observed – predicted) on the full dataset.
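The random split and the evaluation metrics can be sketched without committing to a particular ML library. Everything below is illustrative: the helper names, the seed, and the toy counts and predictions are assumptions, not part of the workshop material. The Brier score is computed at the individual level from site counts, so that sites with more people examined carry more weight.

```python
import math
import random

def train_test_split_ids(n_sites, train_frac=0.7, seed=42):
    """Randomly assign site indices to train/test; returns two sorted lists."""
    ids = list(range(n_sites))
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * n_sites)
    return sorted(ids[:n_train]), sorted(ids[n_train:])

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted prevalence."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

def brier_from_counts(n_pos, n_examined, predicted):
    """Individual-level Brier score from site counts: each positive
    contributes (1 - p)^2 and each negative contributes p^2."""
    total = sum(k * (1 - p) ** 2 + (n - k) * p ** 2
                for k, n, p in zip(n_pos, n_examined, predicted))
    return total / sum(n_examined)

train_ids, test_ids = train_test_split_ids(n_sites=10)

# Hypothetical test-set values (not real data).
n_pos = [5, 20, 0]
n_examined = [50, 40, 30]
predicted = [0.12, 0.45, 0.03]
observed = [k / n for k, n in zip(n_pos, n_examined)]
print("train:", train_ids, "test:", test_ids)
print("RMSE:", round(rmse(observed, predicted), 4))
print("Brier:", round(brier_from_counts(n_pos, n_examined, predicted), 4))
```

Splitting by location index keeps each site entirely on one side of the split. A purely random split still leaves test sites close to training sites, so the test error can look optimistic when the data are spatially correlated.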
Extension H (not taught explicitly)
Compute a spatial variogram of residuals and assess:
- Whether there is remaining spatial structure.
- Which ML method leaves the least spatial structure.
4.2 3.2 Hybrid ML + spatial residuals in inlabru
Use: hybrid_ml_geostatistical_oncho_data.csv.
Tasks
1. For your best ML model from 3.1:
   - Save the fitted values at all locations.
   - Add them to the data as a new covariate (e.g. ml_pred).
2. Fit a hybrid model in inlabru:
   - Use ml_pred as an offset or fixed covariate (if the ML model predicts prevalence, transform it to the link scale, e.g. logit, before using it as an offset).
   - Add a spatial SPDE residual field.
3. Compare:
   - Spatial residual field from the hybrid model,
   - Residuals from the ML-only model.
4. Map:
   - Hybrid model predictions on the oncho_prediction_grid_utm.csv grid.
   - Posterior SD of the spatial residual field.
Extension I (not taught explicitly)
Try a two-stage hybrid:
- Fit the ML model on the training set only.
- Use the spatial residual model on top of out-of-sample ML predictions.
- Evaluate on the test set whether the hybrid model systematically improves predictions.
4.3 3.3 Robustness and scenario experiments
Use: hybrid_ml_geostatistical_oncho_data.csv.
Tasks
1. Create modified versions of the dataset with covariate issues:
   - Version A: randomly set 20–30% of rainfall values to NA.
   - Version B: add strong noise to ndvi in the north only (e.g., for high north_index).
2. Refit the following to each version:
   - ML-only model,
   - Hybrid model (ML + spatial).
3. Compare:
   - Prediction performance (on the original uncorrupted test subset, if you keep one aside).
   - Stability of spatial fields and covariate effects.
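The two corrupted versions of the dataset might be generated along these lines. This is a sketch under stated assumptions: rows are held as dictionaries, `north_index` is assumed to lie roughly between 0 and 1, and the cutoff (0.7), noise standard deviation (0.3), missing fraction (25%), seeds, and function names are arbitrary choices made here, not values from the workshop.

```python
import random

def make_version_a(rows, frac_missing=0.25, seed=1):
    """Version A: set a random fraction of `rainfall` values to None (NA)."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]                  # copy; original stays intact
    n_missing = round(frac_missing * len(out))
    for i in rng.sample(range(len(out)), n_missing):
        out[i]["rainfall"] = None
    return out

def make_version_b(rows, north_cutoff=0.7, noise_sd=0.3, seed=2):
    """Version B: add Gaussian noise to `ndvi` where north_index > cutoff."""
    rng = random.Random(seed)
    out = [dict(r) for r in rows]
    for r in out:
        if r["north_index"] > north_cutoff:
            r["ndvi"] += rng.gauss(0.0, noise_sd)
    return out

# Hypothetical toy rows; real data would come from the CSV file.
rows = [{"rainfall": 900.0 + 10 * i, "ndvi": 0.5, "north_index": i / 19}
        for i in range(20)]
version_a = make_version_a(rows)
version_b = make_version_b(rows)
print(sum(r["rainfall"] is None for r in version_a), "rainfall values set to NA")
print(sum(a["ndvi"] != b["ndvi"] for a, b in zip(rows, version_b)),
      "ndvi values perturbed")
```

Fixing the seeds makes the corrupted versions reproducible, so the ML-only and hybrid models are compared on exactly the same damaged data. Unclipped noise can push ndvi outside its usual range; whether to clip it is a modelling choice worth documenting.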
Extension J (not taught explicitly)
Write a short report (~1–2 pages) summarising:
- When the spatial residual component helps buffer against covariate problems,
- When it fails to compensate,
- Implications for real-world onchocerciasis mapping in Nigeria.
5 Optional: Cross-Day Extension Projects
If you have time and interest, you can combine ideas from multiple days:
1. Full pipeline for onchocerciasis risk mapping:
   - Start from a spatial binomial model,
   - Add non-stationarity or multiple groups,
   - Integrate ML for complex covariate effects,
   - Produce prediction and uncertainty maps for Nigeria,
   - Reflect on choices at each step.
2. Model comparison and reporting:
   - Pick 3–4 models (e.g., stationary vs non-stationary, GLM vs spatial vs hybrid ML),
   - Compare them using a consistent set of metrics (WAIC, cross-validation, residual diagnostics),
   - Produce a short “modelling report” as if for a real onchocerciasis mapping project.