# Gaussian Process Regression

In this notebook, Gaussian process regression is performed on noise-free data in order to show the basic principles behind the method.


# Lab 2 - Gaussian Processes

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
import pandas as pd
from matplotlib.pyplot import figure


## Exercise 1 - Sampling from a Gaussian process prior

### (a)

Use numpy.linspace to construct a vector x∗ with m = 101 elements equally spaced between [−5, 5].

In [2]:
x = np.linspace(-5,5,101)
print(x)

[-5.  -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4.  -3.9 -3.8 -3.7
-3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3.  -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3
-2.2 -2.1 -2.  -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.  -0.9
-0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.   0.1  0.2  0.3  0.4  0.5
0.6  0.7  0.8  0.9  1.   1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9
2.   2.1  2.2  2.3  2.4  2.5  2.6  2.7  2.8  2.9  3.   3.1  3.2  3.3
3.4  3.5  3.6  3.7  3.8  3.9  4.   4.1  4.2  4.3  4.4  4.5  4.6  4.7
4.8  4.9  5. ]


### (b)

Construct a mean vector m(x∗) with 101 elements all equal to zero and the 101 × 101 covariance matrix K(x∗, x∗). The expression for κ(·, ·) is given above. Let the hyperparameters be ℓ = 2 and σ²_f = 1. Hint: You might find it useful to define a function that returns κ(x, x′), taking x and x′ as arguments. Hint: One solution to construct the matrix rapidly using numpy is to compute the squared distance as numpy.abs(xs[:,numpy.newaxis]-xs[:,numpy.newaxis].T)**2, where xs is the grid x∗, i.e. the output of numpy.linspace.
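The kernel expression referred to as "given above" did not survive in this export; judging from the hyperparameter names and the implementation below, it is presumably the standard squared-exponential kernel

κ(x, x′) = σ²_f · exp( −|x − x′|² / (2ℓ²) ),

with lengthscale ℓ = 2 and signal variance σ²_f = 1 here.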

In [3]:
x_0 = np.zeros(101)
print(x_0)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0.]

In [4]:
def covariance_matrix(a, h1, h2):  # h1: lengthscale l, h2: signal variance sigma_f^2
    # squared-exponential kernel evaluated on all pairs of grid points
    s1 = -np.abs(a[:, np.newaxis] - a[:, np.newaxis].T) ** 2 / (2 * h1 ** 2)
    s2 = h2 * np.exp(s1)  # h2 scales the amplitude
    return s2

In [5]:
cov = covariance_matrix(x,h1=2,h2=1)
print(cov.shape)

(101, 101)


### (c)

Use scipy.stats.multivariate_normal (you might need to use the option allow_singular=True) to draw 25 realizations of the function on the 101 grid points, i.e. f^(1)(x∗), ..., f^(25)(x∗), sampled independently from the multivariate normal distribution f(x∗) ∼ N(m(x∗), K(x∗, x∗)).

In [6]:
def draw_samples(mean, cov, amount):
    # allow_singular=True tolerates a numerically singular covariance matrix
    multiv_object = multivariate_normal(mean=mean, cov=cov, allow_singular=True)
    res = multiv_object.rvs(amount).T  # draw `amount` random variates, shape (grid, amount)
    return res

In [7]:
samples = draw_samples(x_0,cov,25)
print(samples.shape)

(101, 25)


### (d)

Plot these samples f^(1)(x∗), ..., f^(25)(x∗) versus the input vector x∗.

In [8]:
# plot samples
figure(figsize=(6, 4), dpi=120)
plt.plot(x,samples,color="gray")
plt.plot(x,x_0,color="orange",label="Input Vector",lw=3)
plt.legend()
plt.title("25 sample realizations for h1: 2")
plt.show()


### (e)

Try another value of l and repeat steps (b)-(d)

In [9]:
for i in [0.1, 0.5, 1, 2, 5, 10, 100]:

    cov_temp = covariance_matrix(x, h1=i, h2=1)
    samples_temp = draw_samples(x_0, cov_temp, 25)

    # plot samples
    figure(figsize=(6, 4), dpi=120)
    plt.plot(x, samples_temp, color="lightgray")
    plt.plot(x, x_0, color="orange", label="Input Vector", lw=3)
    plt.ylim(-5, 5)
    plt.legend()
    plt.title("25 sample realizations for h1: " + str(i))
    plt.show()


#### How do the two plots differ and why?

The lengthscale ℓ controls how "stretched" the sampled curves are: it sets the distance over which function values remain strongly correlated. Larger values of ℓ therefore produce smoother, more slowly varying samples, while smaller values yield rapidly oscillating ones, with crests and troughs packed closer together. Chosen well, ℓ should match the scale of variation in the data being modelled.
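As a quick sanity check of this intuition (a standalone sketch, not part of the lab code): under the squared-exponential kernel, the correlation between function values a distance d apart is exp(−d²/(2ℓ²)), so the same separation can mean near-independence or near-perfect correlation depending on ℓ.

```python
import numpy as np

def se_correlation(d, l):
    """Correlation between f(x) and f(x + d) under a squared-exponential kernel."""
    return np.exp(-d**2 / (2 * l**2))

# points one unit apart are almost independent for l = 0.1,
# but almost perfectly correlated for l = 10
for l in [0.1, 0.5, 2, 10]:
    print(f"l = {l}: corr(f(x), f(x+1)) = {se_correlation(1.0, l):.4f}")
```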

## Exercise 2 - Posterior of the Gaussian process

### (a)

In [10]:
x1 = np.array([-4,-3,-1, 0, 2])  # observation locations
f = np.array([-2, 0, 1, 2,-1])   # observed function values

# dense prediction grid
x1_v = np.linspace(-5,5,1001)


### (b)

In [11]:
# using previous cov matrix

x1_cov = covariance_matrix(x1,1,1)
x1_cov.shape

Out[11]:
(5, 5)
In [12]:
# new function taking both vectors instead of just one as previously
def k(a, b, h1=1, h2=1):
    s1 = -np.abs(a[:, np.newaxis] - b[:, np.newaxis].T) ** 2 / (2 * h1 ** 2)
    s2 = h2 * np.exp(s1)
    return s2

In [13]:
k1 = k(x1_v,x1_v,1,1) # x*, x*
k2 = k(x1,x1_v,1,1)   # x , x*
k3 = k(x1,x1,1,1)     # x , x


### (c)

In [14]:
K_inv = np.linalg.inv(k3)        # invert k(x, x) once and reuse it
mu_post = k2.T @ K_inv @ f
K_post = k1 - k2.T @ K_inv @ k2
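Explicitly inverting k(x, x) is fine at this size, but for larger or near-singular kernel matrices a Cholesky factorization with a small jitter term is the numerically safer route. A self-contained sketch (function and variable names are illustrative, not from the lab):

```python
import numpy as np

def gp_posterior(K_ss, K_s, K, y, jitter=1e-10):
    """Posterior mean and covariance without forming an explicit inverse.

    K_ss: k(x*, x*), K_s: k(x, x*), K: k(x, x), y: observed values.
    """
    L = np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    v = np.linalg.solve(L, K_s)                          # L^{-1} k(x, x*)
    mu = K_s.T @ alpha
    cov = K_ss - v.T @ v
    return mu, cov
```

With the notebook's kernel function, `gp_posterior(k1, k2, k3, f)` reproduces mu_post and K_post up to the jitter term.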

In [15]:
mu_post

Out[15]:
array([-1.6838634 , -1.69789962, -1.71184871, ..., -0.01594348,
-0.01547481, -0.01501841])
In [16]:
K_post

Out[16]:
array([[ 5.44092657e-01,  5.40005479e-01,  5.35834847e-01, ...,
-4.79536160e-05, -4.65461212e-05, -4.51753672e-05],
[ 5.40005479e-01,  5.35980052e-01,  5.31871228e-01, ...,
-4.81138030e-05, -4.67016066e-05, -4.53262737e-05],
[ 5.35834847e-01,  5.31871228e-01,  5.27824294e-01, ...,
-4.82646171e-05, -4.68479941e-05, -4.54683502e-05],
...,
[-4.79536160e-05, -4.81138030e-05, -4.82646171e-05, ...,
9.99857141e-01,  9.99811343e-01,  9.99665453e-01],
[-4.65461212e-05, -4.67016066e-05, -4.68479941e-05, ...,
9.99811343e-01,  9.99865420e-01,  9.99819392e-01],
[-4.51753672e-05, -4.53262737e-05, -4.54683502e-05, ...,
9.99665453e-01,  9.99819392e-01,  9.99873244e-01]])

### (d)

In [17]:
samples2 = draw_samples(mu_post,K_post,25)

In [18]:
figure(figsize=(6, 4), dpi=120)
plt.plot(x1_v,samples2,color="lightgray")
#plt.plot(x1_v,K_post,color="black")
plt.plot(x1_v,mu_post,color="blue",lw=3,label="mu_post")
plt.scatter(x1,f,color="orange",label="observations",lw=3,zorder=100)
plt.legend()
plt.show()
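A natural extension, not asked for in the lab but sketched here under the same kernel and data: the diagonal of the posterior covariance gives the pointwise posterior variance, so a ±2σ band around the posterior mean visualizes the uncertainty directly, collapsing to zero at the observations.

```python
import numpy as np
import matplotlib.pyplot as plt

def k(a, b, h1=1, h2=1):
    # squared-exponential kernel, as in the exercise
    return h2 * np.exp(-np.abs(a[:, np.newaxis] - b[:, np.newaxis].T) ** 2 / (2 * h1 ** 2))

x1 = np.array([-4., -3., -1., 0., 2.])
f = np.array([-2., 0., 1., 2., -1.])
xs = np.linspace(-5, 5, 1001)

K = k(x1, x1) + 1e-10 * np.eye(len(x1))  # tiny jitter for numerical stability
Ks = k(x1, xs)
mu = Ks.T @ np.linalg.solve(K, f)
var = np.clip(np.diag(k(xs, xs) - Ks.T @ np.linalg.solve(K, Ks)), 0, None)
std = np.sqrt(var)

plt.fill_between(xs, mu - 2 * std, mu + 2 * std, color="lightblue", label="±2 std")
plt.plot(xs, mu, color="blue", lw=2, label="posterior mean")
plt.scatter(x1, f, color="orange", zorder=100, label="observations")
plt.legend()
plt.show()
```

The band pinches to zero width at each observed point and widens between and beyond them, which is exactly the behaviour the sampled posterior realizations above hint at.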