Spatial Probit Model Using Gaussian Random Field

Mar 14, 2019 11 min read

Introduction

Spatial probit models is very popular in spatial econometrics and the book of J. LeSage and Pace (2009) gives a very good overview. This is basically an extension of probit model when one is interested to adjust for both fixed and spatial random effect. J. P. LeSage (2000) model the spatial random effect using the class of Gaussian Markov Random Field (GMRF) introduced by Besag, York, and Mollié (1991) and provides a Bayesian procedure for fitting this class of model. Smith and LeSage (2004) further extends the probit model to allow for spatial random effect and individual random effect.

Most of the extension to allow for spatial dependence structure utilises Simultatenous Autoregressive model (SAR) and Conditional autoregressive model (CAR) - a class of Gaussian markov random field (GMRF). The use of Guassian Random Field (GFR) is not popular for probit model, infact we cannot find any in literature. It was pointed out in Brooks et al. (2011) that estimating the covariance parameter of the GRF in a spatial probit model suffers from identifiability issues. And noted that a binary outcome contain less information about the magnitude of dependence. Also suggest using GMRF because “it aggregates pieces of binary information from the neigbouring regions to better estimate spatial depencies”. However, this is true because the the spatial dependence parameter is linear in the linear predictor and there exist a nice framework proposed by Besag, York, and Mollié (1991) that allows for this. It is important to note that there are several debates on the interpretation of the spatial dependence paramter of a CAR or SAR model. By using GMRF models, an explicit assumption that is made is that the spatial dependence parameter reflects a situation where values observed at one region, say observation $i$, depend on the values of neighboring observations at nearby regions. And using a GRF, we basically assume that there is a spatial continuous process that generate any observation at any region.

Methodological Framework

Let $Y_{ijk}$ be the binary outcome for an admission $i= 1, \ldots, L$ for patient $k= 1, \ldots, N$ living in region $j= 1, \ldots, M$. A augumented probit model proposed by Albert and Chib (1993) usually assume that the outcome is generated through an underlying latent variable, $Z_{ijk}, with the following hierarchical structure.

\[ Y_{ijk} = \begin{cases} 1 & Z_{ijk} > 0 \\ 0 & Z_{ijk} \leq 0 \end{cases} \]

\[Z_{ijk} = d_{ijk}^\top \beta + S_j + U_{k} + \epsilon_{ijk}\]

with the following prior distributions

\[\epsilon_{ijk} \sim \mathcal{N}(0, 1)\]

\[\beta \sim \mathcal{N}(\mu_\beta, \Sigma_\beta)\]

\[S \sim \mathcal{N}(0, \Sigma(\theta))\]

\[Z \sim \mathcal{N}(0, \tau^2I) \]

where $d_{ijk}$ is the vector of $p$ covariates with associated coefficient $\beta$, $S_j$ is the region specific random effect and it is define using GRF and $U_{k}$ is an patient or individual random effect, $\epsilon_{ijk}$ is the error term.

can be re-written in vector form as

\[Z = D^\top \beta + \Delta_1S + \Delta_2 U + \epsilon\] where $Z=(Z_{ijk}: i = 1, \ldots, L, j= 1, \ldots, M, k = 1, \ldots, N)'$, $S=(S_{j}: j= 1, \ldots, M)'$, $U=(U_{j}: j= 1, \ldots, U)'$, \[ \Delta_1 = \begin{pmatrix} 1 & & \\ &\ddots &\\ & & 1 \end{pmatrix}\]

The joint posterior distribution of $\beta$, $\theta$, $\tau^2$, $S$, $U$ and $Z$ is given by,

2 \[ \label{eq:post} p(\beta, \theta, \tau^2, S, U, Z | Y) = \pi(\beta) \pi(\theta) \pi(\tau^2) f(S| \theta) f(U | \tau^2) f(Z | S, U, \beta, \theta, \tau^2) f(Y|Z) \]

We use MCMC algorithm to update these parameters in turn

Initialise $\beta$, $\theta$, $\tau^2$, $S$, $U$ and $Z$
Update joint posterior of $\theta$ using random-walk Metropolis Hastings (RWMH)
Update joint posterior of $\tau$ using random-walk Metropolis Hastings (RWMH)
Update joint posterior of $\beta$ using a Gibbs step. The posterior distribution for $\beta$ is Gaussian since we assign a conjugate prior on $\beta$, hence \[ [\beta| \cdot] = \mathcal{N}(M_\beta, V_\beta),\] where \[V_\beta = (D^\top D + \Sigma_\beta^{-1})^{-1} \] and \[M_\beta = V_\beta (D^\top (Z- (S+U)) + \Sigma_\beta^{-1} \mu_\beta) \]
Update joint posterior of $S$ using a Gibbs step. The posterior distribution for $S$ is Gaussian since we assign a conjugate prior on $S$, hence \[ [S| \cdot] \sim \mathcal{N}(M_S, V_S),\] where \[V_S = (\Delta_1^\top \Delta_1 + \Sigma_S^{-1})^{-1} \] and \[M_S = V_S (\Delta_1^\top (Z- (D\beta+ \Delta_2U))) \]
Update joint posterior of $U$ using a Gibbs step. The posterior distribution for $U$ is Gaussian since we assign a conjugate prior on $U$, hence \[ [U| \cdot] \sim \mathcal{N}(M_U, V_U),\] where \[V_U = (\Delta_2^\top \Delta_2 + \Sigma_S^{-1})^{-1} \] and \[M_U = V_U (\Delta_2^\top (Z- (D\beta+ \Delta_1 S))) \]
Update joint posterior of $Z$ using a Gibbs step. The posterior distribution for $Z$ is truncated Gaussian, hence \[ [Z| \cdot] \sim \mathcal{TN}(D\beta+ \Delta_1 S + \Delta_2 U, 1, I(Y, Z)),\] where \[I(Y, Z) = I(Y=1)I(Z > 0) + I(Y=0)I(Z \leq 0)\].

Simulation Study

Generating synthetic data

#load the required package
library(geoR, quietly = TRUE)

## --------------------------------------------------------------
##  Analysis of Geostatistical Data
##  For an Introduction to geoR go to http://www.leg.ufpr.br/geoR
##  geoR version 1.7-5.2.1 (built on 2016-05-02) is now loaded
## --------------------------------------------------------------

require(Matrix, quietly = TRUE)
#simulating my type of data
n_add <- 250
n_pat <- 150
n_reg <- 50
adm <- 1:n_add
pat <- c(1:n_pat, sample(1:n_pat, n_add-n_pat))
reg <- c(1:n_reg, sample(1:n_reg, n_add-n_reg, replace=TRUE))
data <- data.frame(adm, pat, reg)
#####
x <- rnorm(n_add)
# Create n x D design matrix
D <- 2
# We learn a linear function
X <- cbind(rep(1, n_add), x)
# True values of regression coeffiecients theta
true_theta <- c(-0.5, 0.5)
# simulate the gaussian process
S <- grf(n=n_reg, grid = "irreg", cov.pars = c(1, 0.5), xlims = c(0, 6), ylims = c(0, 6))

## grf: simulation(s) on randomly chosen locations with  50  points
## grf: process with  1  covariance structure(s)
## grf: nugget effect is: tausq= 0 
## grf: covariance model 1 is: exponential(sigmasq=1, phi=0.5)
## grf: decomposition algorithm used is:  cholesky 
## grf: End of simulation procedure. Number of realizations: 1

xy <- S$coords
u <- rnorm(n_pat, sd=sqrt(0.2))
# create the mapping matrix
inc_mat_reg <-t(as(as.factor(reg),Class="sparseMatrix"))
inc_mat_pat <-t(as(as.factor(pat),Class="sparseMatrix"))
# Obtain the vector with probabilities of success p using the probit link
p <- pnorm(as.numeric(X %*% true_theta + inc_mat_reg%*%cbind(S$data) + inc_mat_pat%*%cbind(u)))
# Generate binary observation data y
y <- cbind(rbinom(n_add, 1, p))
data <- data.frame(data, x, y)
head(data)

##   adm pat reg          x y
## 1   1   1   1  0.5402883 0
## 2   2   2   2  0.1756355 0
## 3   3   3   3  1.1559319 1
## 4   4   4   4  0.8420048 0
## 5   5   5   5 -0.4327365 0
## 6   6   6   6 -0.9763709 0

Fiting a non spatial model

# fit the non-spatial model
fit <- glm(formula = y~x, family = binomial(link = "probit"), data=data)
summary(fit)

## 
## Call:
## glm(formula = y ~ x, family = binomial(link = "probit"), data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3398  -0.8973  -0.7382   1.3171   2.0473  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.46790    0.08364  -5.594 2.22e-08 ***
## x            0.25979    0.08494   3.058  0.00222 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 313.43  on 249  degrees of freedom
## Residual deviance: 303.73  on 248  degrees of freedom
## AIC: 307.73
## 
## Number of Fisher Scoring iterations: 4

Fiting a full bayesian spatial probit model using GRF

# Library for sampling from Multivariate Normal distribution
require(mvtnorm, quietly = TRUE)
# Library for sampling from Truncated Normal distribution
require(truncnorm, quietly = TRUE)
# Variables that we will need later
N1  <- sum(y)  # Number of successes
N0  <- n_add - N1  # Number of failures

# Conjugate prior on the coefficients \beta ~ N(mu_beta, Sigma_beta)
mu_beta <- c(fit$coefficients)
# Sigma_beta <- diag(10, 2)
Sigma_beta <- as.matrix(vcov(fit)) * 10

# Compute the distance matrix over the region
U <- as.matrix(dist(xy))

# proposal covariance
prop_Sigma <- matrix(c(0.01, 0, 0, 0.01), 2)
prop_Sigma_tau2 <- 0.01

# [,1]       [,2]
# [1,] 0.14450882 0.03176559
# [2,] 0.03176559 0.34874799

# Initialize parameters
re_para <- c(1, 0.5, 0.2)
beta <- c(-0.5, 0.5)
Sigma_s <-re_para[1]*exp(-U/re_para[2])
S <- rmvnorm(n = 1, mean = rep(0, n_reg), sigma = Sigma_s)
u <- rnorm(n = n_pat, mean = 0, sd=sqrt(re_para[3]))
mu_z <- X %*% beta + inc_mat_reg%*%as.numeric(S) + inc_mat_pat%*%u
z <- rep(0, n_add)
z[y == 0] <- rtruncnorm(N0, mean = mu_z[y == 0], sd = 1, a = -Inf, b = 0)
z[y == 1] <- rtruncnorm(N1, mean = mu_z[y == 1], sd = 1, a = 0, b = Inf)


# Number of simulations for Gibbs sampler
N_sim <- 110000
# Burn in period
burn_in <- 10000
# Matrix storing samples of the \beta parameter
beta_chain <- matrix(beta, nrow = N_sim, ncol = D)
# Matrix storing samples of the latent variable, z
z_chain <- matrix(0, nrow = N_sim, ncol = n_add)
# Matrix storing samples of the parameter of S and U, sigma^2, phi and tau^2
re_para_chain <- matrix(re_para, nrow = N_sim, ncol = 3)



# ---------------------------------
# Gibbs sampling algorithm
# ---------------------------------
# Compute posterior variance of beta
prec_beta <- solve(Sigma_beta)
V <- solve(prec_beta + crossprod(X, X))
#acceptance probability
accept1 <- 0
accept2 <- 0
for (t in 2:N_sim) {
  
  #  get the random effect parameters
  sigma2 <- re_para[1]
  phi <- re_para[2]
  tau2 <- re_para[3]
  
  # update the proposal variance
  if (t > 1000){
    prop_Sigma <- cov(log(re_para_chain[,1:2]))*((2.38)^2)/2
    prop_Sigma_tau2 <- cov(log(re_para_chain))[3,3]*((2.38)^2)
  }
  
  # Simulate each of the spatial random effects parameters using metropolis-hasting
  # the proposal
  sigma2_phi_proposal <- rmvnorm(n=1, mean=log(c(sigma2, phi)), sigma = prop_Sigma)
  # the acceptance probability
  alpha <- exp(dmvnorm(x = as.numeric(S), mean = rep(0, n_reg), sigma=exp(sigma2_phi_proposal[1])*exp(-U/exp(sigma2_phi_proposal[2])), log = TRUE) + 
                 dnorm(x = exp(sigma2_phi_proposal[1]), mean = 1, sd = 1.00005, log = TRUE) +  #prior of sigma2
                 dnorm(x= exp(sigma2_phi_proposal[2]), mean = 0.5, sd = 1.00001, log = TRUE)  -  #prior of phi
                 dmvnorm(x = as.numeric(S), mean = rep(0, n_reg), sigma= Sigma_s, log = TRUE)- 
                 dnorm(x = sigma2, mean = 1, sd = 1.00005, log = TRUE) -
                 dnorm(x= phi, mean = 0.5, sd = 1.00001, log = TRUE)  +
                 log(exp(sigma2_phi_proposal[1])/sigma2) + 
                 log(exp(sigma2_phi_proposal[2])/phi))
  
  if (alpha > runif(1)) {
    accept1 <- accept1 + 1
    sigma2 <- exp(sigma2_phi_proposal[1])
    phi <- exp(sigma2_phi_proposal[2])
    #cat("accept =", c(sigma2,phi), "\n")
  }
  
  # Simulate each of the random effects parameters using metropolis-hasting
  # the likelihood
  # tau2.lik <- sum(dnorm(x = u, mean = rep(0, n_pat), sd= sqrt(tau2), log = TRUE))
  # the proposal
  tau2_proposal <- rnorm(n=1, mean=log(c(tau2)), sd = sqrt(prop_Sigma_tau2))
  # the acceptance probability
  alpha <- exp(sum(dnorm(x = u, mean = rep(0, n_pat), sd= sqrt(exp(tau2_proposal)), log = TRUE)) +  
                 dnorm(x= exp(tau2_proposal), mean = 0.2, sd = 1.0001, log = TRUE) -
                     sum(dnorm(x = u, mean = rep(0, n_pat), sd= sqrt(tau2), log = TRUE)) - 
                 dnorm(x= tau2, mean = 0.2, sd = 1.0001, log = TRUE) +
                 log(exp(tau2_proposal)/tau2)) 
  
  if (alpha > runif(1)) {
    accept2 <- accept2 + 1
    tau2 <- exp(tau2_proposal)
    #cat("accept =", tau2, "\n")
  }
  
  #######
  # Compute posterior of S
  Sigma_s <-sigma2*exp(-U/phi)
  # Compute mean and variance of S
  V_s <- solve(diag(table(reg)) + solve(Sigma_s))
  mu_s <- V_s%*%(crossprod(inc_mat_reg, (z-(X%*%beta + inc_mat_pat%*%u)))) 
  S <- t(rmvnorm(n = 1, mean = mu_s, sigma = as.matrix(V_s)))
  
  # Compute posterior of u
  # Compute mean and variance of u
  V_u <- solve(diag(table(pat)) + diag(1/tau2, n_pat))
  mu_u <- V_u%*%(crossprod(inc_mat_pat, (z-(X%*%beta + inc_mat_reg%*%S)))) 
  u <- t(rmvnorm(n = 1, mean = mu_u, sigma = as.matrix(V_u)))
  
  # Compute posterior mean of beta
  M <- V %*% (prec_beta %*% mu_beta + crossprod(X, (z-(inc_mat_reg%*%as.numeric(S) + inc_mat_pat%*%u))))
  # M <- V %*% (prec_beta %*% mu_beta + crossprod(X, (z)))
  # Draw variable \theta from its full conditional: \beta | z, X
  beta <- c(rmvnorm(1, M, as.matrix(V)))
  
  # Update Mean of z
  mu_z <- X %*% beta + inc_mat_reg%*%as.numeric(S) + inc_mat_pat%*%u
  # Draw latent variable z from its full conditional: z | \theta, y, X
  z[y == 0] <- rtruncnorm(N0, mean = mu_z[y == 0], sd = 1, a = -Inf, b = 0)
  z[y == 1] <- rtruncnorm(N1, mean = mu_z[y == 1], sd = 1, a = 0, b = Inf)
  
  # Store the \theta draws
  beta_chain[t, ] <- beta
  # Store the z draws
  z_chain[t, ] <- z
  
  # Store the covariance parameters draws
  re_para[1] <- sigma2
  re_para[2] <- phi
  re_para[3] <- tau2
  re_para_chain[t, ] <- re_para
  #cat("iter =", t, "\n")
}

post_beta <- colMeans(beta_chain[-(1:burn_in), ])
post_beta

## [1] -0.5684900  0.3836226

post_rf <- colMeans(re_para_chain[-(1:burn_in), ])
post_rf

## [1] 1.4133263 1.0859733 0.3904089

Ploting the diagnostic of the inference

#load the package
require(ggmcmc, quietly = TRUE)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:Matrix':
## 
##     expand

require(coda, quietly = TRUE)
df <- data.frame(beta_chain, re_para_chain)
colnames(df) <- c("beta0", "beta1", "sigma2", "phi", "tau2")
df <- df[-1,]
df <- df[-(1:(burn_in-1)),]   #burn in
df <- df[seq(1, nrow(df), 100),]  #thining
bbb <- ggs(mcmc(df))   
# ggmcmc(bbb)
ggs_traceplot(bbb)

ggs_density(bbb)

ggs_autocorrelation(bbb)

# compute the acceptance rate
accept1/N_sim # for sigma2 and phi

## [1] 0.09990909

accept2/N_sim # for tau2

## [1] 0.07552727

References

Albert, James H, and Siddhartha Chib. 1993. “Bayesian Analysis of Binary and Polychotomous Response Data.” Journal of the American Statistical Association 88 (422). Taylor & Francis Group: 669–79.

Besag, Julian, Jeremy York, and Annie Mollié. 1991. “Bayesian image restoration, with two applications in spatial statistics.” Annals of the Institute of Statistical Mathematics 43 (1): 1–20. https://econpapers.repec.org/RePEc:spr:aistmt✌43:y:1991:i:1:p:1-20.

Brooks, Steve, Andrew Gelman, Galin Jones, and Xiao-Li Meng. 2011. Handbook of Markov Chain Monte Carlo. CRC press.

LeSage, James P. 2000. “Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models.” Geographical Analysis 32 (1). Wiley Online Library: 19–35.

LeSage, James, and Robert Kelley Pace. 2009. Introduction to Spatial Econometrics. Chapman; Hall/CRC.

Smith, Tony E, and James P LeSage. 2004. “A Bayesian Probit Model with Spatial Dependencies.” In Spatial and Spatiotemporal Econometrics, 127–60. Emerald Group Publishing Limited.

Lecturer in Statistics

My research interests include Geospatial epidemiology, Spatio-temporal statistics, Disease mapping and Predictive modelling