Fitting Geostatistical Model Using TensorFlow API from R

Mar 2, 2019 4 min read

Introduction

This tutorial simply estimate the parameter of a geostatistical model using the TensorFlow API from R. There are many tutorial and links online on how to use TensorFlow in R, see https://www.r-bloggers.com/an-introduction-to-tensorflow/ for an introduction to TensorFlow and https://kyleclo.github.io/maximum-likelihood-in-tensorflow-pt-1/ for TensorFlow’s autodifferentiation toolbox for maximum likelihood estimation.

TensorFlow is popularly used machine learning and deep learning. It is very essential that statisticians begin to benefit from its features. Some of the features that can be useful are:

autodifferentiation.
It automatically suports Multicore CPU and GPU.
It can be used from R. I will show this very soon.
Graphical probabilistic modelling is very possible with the use of TensorFlow Probability.
One can monitor the learning process through the TensorBoard.

The section that follows gives a general description typical geostatistical model.

Linear geostatistical model

Let \(y_i\) for \(i = 1, \ldots, n\) denote measurements at locations \(x_i \in A \subset \mathcal{R}^2\) , where \(A\) corresponds to the area of interest. Let \(S(x)\) and \(Z(x)\) denote a stationary zero-mean Gaussian process and Gaussian noise, respectively. The standard linear geostatsitical model for the data can then be expressed as \[Y(x_i) = d(x_i)^\top \beta + S(x_i) + Z(x_i)\] where \(Y(x_i)\) is the random variable associated with the data \(y(x_i)\) , and \(d(x_i)\) is a vector of the covariates with corresponding vector of regression coefficients \(\beta\). We also assume that the stationary process \(S(x)\) is isotropic, \(cov{S(x), S(x 0 )} = \sigma^2 \rho(u; \phi),\) where \(\sigma^2\) is the variance of \(S(x)\), \(\rho(\cdot; \phi)\) is a correlation function with parameter \(\phi\) (assumed to be exponential in this case) and \(u\) is the Euclidean distance between \(x\) and \(x'\) . Finally, we use \(\tau^2\) to denote the variance of the Gaussian noise \(Z(x_i)\). \(\theta = (\beta, \sigma^2, \phi, \tau^2)\) is the vector of the parameter in the model.

Geostatistics with TensorFlow

Simulating the data

n <- 625  # number of data point
x <- rnorm(n)  # simulating a covariate
### generate the data location
xy <- expand.grid(x=seq(0, 3, length.out = sqrt(n)), y=seq(0, 3, length.out = sqrt(n)))
d <- as.matrix(dist(xy))  # compute the distance matrix

# The parameters of the model
beta0 <- 0.5
beta1 <- 2.5
sigma2 <- 1
phi <- 0.3
tau2 <- 0.1

# Generate the covariance matrix of the process
Sigma <- sigma2*exp(-d/phi)
S <- t(chol(Sigma))%*%rnorm(n)

# Finally simulate the data
y <- as.numeric(beta0 + beta1 *x  + S + rnorm(n = n, mean = 0, sd = sqrt(tau2)))

Loading the R packages

#Load the package
library(tensorflow)
tfp <- import("tensorflow_probability") # import the tensorflow probability
tfd <- tfp$distributions   # get the tensorflow distributions

Defining the model

sess <- tf$Session()

x_data <- tf$placeholder(dtype = "float",
                         shape = n,
                         name = "x") # Placeholder for Petal.Length
y_data <- tf$placeholder(dtype = "float",
                         shape = n,
                         name = "y")  # Placeholder for Petal.Width

d_mat <- tf$placeholder(dtype = tf$float32,  name="d")

beta0 <- tf$Variable(0.1,   name = "Intercept")
beta1 <- tf$Variable(0.1,   name = "coefficient")

#initialise the parameters
sigma2 <- tf$Variable(0.1, name = "sigma2")
phi <- tf$Variable(0.1, name = "phi")
tau2 <- tf$Variable(0.1, name = "tau2")

# The expectation of the model
y_hat <- beta0 + beta1 * x_data

Maximum Likelihood Estimation

# Setting up the model
n <- tf$constant(n)
# define a Gaussian distribution with mean = y_hat and covariance matrix using an exponential covariance function
gaussian_dist <- tfd$MultivariateNormalFullCovariance(loc = y_hat, covariance_matrix = sigma2*tf$exp(-d_mat/phi) + tf$eye(n)*tau2)
# log_likelihood (y_data | theta)
log_prob <- gaussian_dist$log_prob(value = y_data)
# negative_log_likelihood (y_data | theta)
neg_log_likelihood <- -1.0 * tf$reduce_sum(log_prob)

# gradient of neg_log_likelihood wrt (theta)
grad <- tf$gradients(neg_log_likelihood,c(beta0, beta1, sigma2, phi, tau2))

# optimizer
optimizer <- tf$train$AdamOptimizer(learning_rate = 0.01)
train_op <- optimizer$minimize(loss = neg_log_likelihood)

Creating an interface of the tensorboard

MSE_hist <- tf$summary$scalar("neg_log_likelihood", neg_log_likelihood)   # save all values of neg_log_likelihood
beta0_hist <- tf$summary$scalar("Intercept", beta0)
beta1_hist <- tf$summary$scalar("Coefficient", beta1)
sigma2_hist <- tf$summary$scalar("sigma2", sigma2)
phi_hist <- tf$summary$scalar("phi", phi)
tau2_hist <- tf$summary$scalar("tau2", tau2)
merged <- tf$summary$merge_all()            # Merges all summaries collected in the default graph.
train_writer <- tf$summary$FileWriter(logdir = "logs")
train_writer$add_graph(sess$graph)          # add a graph structure

Running the session

sess$run(tf$global_variables_initializer())
for (epoch in 1:200) {
  result <- sess$run(list(merged,              # the merged parameters and 
                          train_op,            # Min neg_log_likelihood
                          neg_log_likelihood,  # neg_log_likelihood
                          grad),               # Gradient
                     feed_dict = dict(x_data = x,
                                      y_data =  y, d_mat=d))
  summary <- result[[1]]                    # extract the summary result of merged
  train_writer$add_summary(summary, epoch)  # write summary to disk
}
#print the maximum likelihood estimate of the parameters
cat("Intercept: ", sess$run(beta0), "\n Coefficient: ", sess$run(beta1), "\n sigma2: ", 
    sess$run(sigma2), "\n phi: ", sess$run(phi), "\n tau2: ", sess$run(tau2), "\n")

## Intercept:  0.2836692 
##  Coefficient:  1.201712 
##  sigma2:  0.5700233 
##  phi:  0.110626 
##  tau2:  0.5315924

# print the gradient of the log-likelihood with respect to the parameters 
cat("Gradient wrt: d.beta0 ", result[[4]][[1]], "\n d.beta1: ", result[[4]][[2]], "\n d.sigma2: ", result[[4]][[3]], 
    "\n d.phi: ", result[[4]][[4]], "\n d.tau2: ", result[[4]][[5]], " \n")

## Gradient wrt: d.beta0  -3.351908 
##  d.beta1:  -768.9117 
##  d.sigma2:  -328.6408 
##  d.phi:  190.2612 
##  d.tau2:  -392.9058

sess$close()
tf$reset_default_graph()

Monitoring with TensorBoard

tensorboard(log_dir = "logs") # Play with tensorboard

## Started TensorBoard at http://127.0.0.1:3902

Lecturer in Statistics

My research interests include Geospatial epidemiology, Spatio-temporal statistics, Disease mapping and Predictive modelling