In this notebook, Gaussian process regression is performed on both noise-free and noisy data to illustrate the basic principles behind the method.
Lab 2 - Gaussian Processes¶
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
import pandas as pd
from matplotlib.pyplot import figure
x = np.linspace(-5,5,101)
print(x)
[-5. -4.9 -4.8 -4.7 -4.6 -4.5 -4.4 -4.3 -4.2 -4.1 -4. -3.9 -3.8 -3.7 -3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3. -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2. -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1. -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3. 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4. 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5. ]
(b)¶
Construct a mean vector $m(\mathbf{x}_*)$ with 101 elements all equal to zero and the $101 \times 101$ covariance matrix $K(\mathbf{x}_*, \mathbf{x}_*)$. The expression for $\kappa(\cdot, \cdot)$ is given above. Let the hyperparameters be $\ell = 2$ and $\sigma_f^2 = 1$.
Hint: You might find it useful to define a function that returns $\kappa(x, x')$, taking $x$ and $x'$ as arguments.
Hint: One way to construct the matrix rapidly with numpy is to compute the squared distances as numpy.abs(xs[:,numpy.newaxis]-xs[:,numpy.newaxis].T)**2, where xs is the grid $\mathbf{x}_*$, i.e. the output of numpy.linspace.
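The expression for $\kappa(\cdot,\cdot)$ referred to above is not reproduced in this notebook; judging from the covariance function implemented below, it is presumably the squared-exponential kernel

$$\kappa(x, x') = \sigma_f^2 \exp\!\left(-\frac{(x - x')^2}{2\ell^2}\right),$$

with length-scale $\ell$ (h1 in the code) and signal variance $\sigma_f^2$ (h2 in the code).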
x_0 = np.zeros(101)
print(x_0)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
def covariance_matrix(a, h1, h2):  # h1: length-scale l, h2: signal variance sigma_f^2
    # exponent -|x - x'|^2 / (2 l^2) for every pair of grid points
    s1 = -1/(2*h1**2) * np.abs(a[:, np.newaxis] - a[:, np.newaxis].T)**2
    s2 = h2*np.exp(s1)  # scale by h2 (not h1), so that k(x, x) = sigma_f^2
    return s2
cov = covariance_matrix(x,h1=2,h2=1)
print(cov.shape)
(101, 101)
(c)¶
Use scipy.stats.multivariate_normal (you might need to use the option allow_singular=True) to draw 25 realizations of the function on the 101 grid points, i.e. sample $f^{(1)}(\mathbf{x}_*), \ldots, f^{(25)}(\mathbf{x}_*)$ independently from the multivariate normal distribution $f(\mathbf{x}_*) \sim \mathcal{N}\big(m(\mathbf{x}_*), K(\mathbf{x}_*, \mathbf{x}_*)\big)$.
def draw_samples(mean, cov, amount):
    multiv_object = multivariate_normal(mean=mean, cov=cov, allow_singular=True)
    res = multiv_object.rvs(amount).T  # draw random variates; transpose to shape (grid points, samples)
    return res
samples = draw_samples(x_0,cov,25)
print(samples.shape)
(101, 25)
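A quick check (a sketch that is not part of the lab sheet) of why allow_singular=True is needed: the squared-exponential covariance matrix on a dense grid is numerically rank-deficient, which shows up in its smallest eigenvalue.
# smallest eigenvalue of the covariance matrix; expect a value near zero
# (possibly slightly negative due to floating-point round-off)
eigvals = np.linalg.eigvalsh(cov)
print(eigvals.min())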
(d)¶
Plot these samples $f^{(1)}(\mathbf{x}_*), \ldots, f^{(25)}(\mathbf{x}_*)$ versus the input vector $\mathbf{x}_*$.
# plot samples
figure(figsize=(6, 4), dpi=120)
plt.plot(x,samples,color="gray")
plt.plot(x,x_0,color="orange",label="Zero mean",lw=3)
plt.legend()
plt.title("25 sample realizations for h1: 1")
plt.show()
(e)¶
Try another value of $\ell$ and repeat steps (b)-(d).
for i in [0.1, 0.5, 1, 2, 5, 10, 100]:
    cov_temp = covariance_matrix(x, h1=i, h2=1)
    samples_temp = draw_samples(x_0, cov_temp, 25)
    # plot samples
    figure(figsize=(6, 4), dpi=120)
    plt.plot(x, samples_temp, color="lightgray")
    plt.plot(x, x_0, color="orange", label="Zero mean", lw=3)
    plt.ylim(-5, 5)
    plt.legend()
    plt.title("25 sample realizations for l = " + str(i))
    plt.show()
How do the two plots differ and why?¶
The length-scale $\ell$ determines how "stretched" the curves are: it sets the distance over which the covariance decays, and therefore how strongly nearby function values are correlated. Larger values give smoother, more slowly varying samples, while small values give rapidly oscillating, almost noise-like curves. In effect, $\ell$ controls the typical distance between crests and troughs. If chosen well, it should reflect how quickly the underlying function varies.
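For example, with the squared-exponential kernel assumed above, the correlation between two points one unit apart is

$$\frac{\kappa(x, x')}{\sigma_f^2} = \exp\!\left(-\frac{1}{2\ell^2}\right) \approx \begin{cases} 0.88 & \ell = 2,\\ 0.14 & \ell = 0.5,\\ 0.00 & \ell = 0.1,\end{cases}$$

which matches the smooth curves seen for large $\ell$ and the nearly independent, noise-like curves for small $\ell$.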
Exercise 2 - Posterior of the Gaussian process¶
(a)¶
x1 = np.array([-4,-3,-1, 0, 2])
f = np.array([-2, 0, 1, 2,-1])
# dense grid of prediction inputs x*
x1_v = np.linspace(-5,5,1001)
(b)¶
# using the previous covariance function
x1_cov = covariance_matrix(x1,1,1)
x1_cov.shape
(5, 5)
# new function that takes two input vectors instead of just one, as previously
def k(a, b, h1=1, h2=1):
    # squared-exponential kernel between every pair of points in a and b
    s1 = -1/(2*h1**2) * np.abs(a[:, np.newaxis] - b[:, np.newaxis].T)**2
    s2 = h2*np.exp(s1)
    return s2
k1 = k(x1_v,x1_v,1,1) # x*, x*
k2 = k(x1,x1_v,1,1) # x , x*
k3 = k(x1,x1,1,1) # x , x
(c)¶
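The cell below appears to implement the standard noise-free GP posterior; under that assumption, the posterior mean and covariance on the grid $\mathbf{x}_*$ are

$$\bar{\mathbf{f}}_* = K(\mathbf{x}_*, \mathbf{x})\, K(\mathbf{x}, \mathbf{x})^{-1} \mathbf{f}, \qquad \operatorname{cov}(\mathbf{f}_*) = K(\mathbf{x}_*, \mathbf{x}_*) - K(\mathbf{x}_*, \mathbf{x})\, K(\mathbf{x}, \mathbf{x})^{-1} K(\mathbf{x}, \mathbf{x}_*),$$

where k2.T plays the role of $K(\mathbf{x}_*, \mathbf{x})$, k3 is $K(\mathbf{x}, \mathbf{x})$, and k1 is $K(\mathbf{x}_*, \mathbf{x}_*)$.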
mu_post = (k2.T @ np.linalg.inv(k3)) @ f  # posterior mean on the grid x*
K_post = k1 - k2.T @ np.linalg.inv(k3) @ k2  # posterior covariance on the grid x*
mu_post
array([-1.6838634 , -1.69789962, -1.71184871, ..., -0.01594348, -0.01547481, -0.01501841])
K_post
array([[ 5.44092657e-01,  5.40005479e-01,  5.35834847e-01, ..., -4.79536160e-05, -4.65461212e-05, -4.51753672e-05],
       [ 5.40005479e-01,  5.35980052e-01,  5.31871228e-01, ..., -4.81138030e-05, -4.67016066e-05, -4.53262737e-05],
       [ 5.35834847e-01,  5.31871228e-01,  5.27824294e-01, ..., -4.82646171e-05, -4.68479941e-05, -4.54683502e-05],
       ...,
       [-4.79536160e-05, -4.81138030e-05, -4.82646171e-05, ...,  9.99857141e-01,  9.99811343e-01,  9.99665453e-01],
       [-4.65461212e-05, -4.67016066e-05, -4.68479941e-05, ...,  9.99811343e-01,  9.99865420e-01,  9.99819392e-01],
       [-4.51753672e-05, -4.53262737e-05, -4.54683502e-05, ...,  9.99665453e-01,  9.99819392e-01,  9.99873244e-01]])
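As a side note (an optional sketch, not part of the original solution), the explicit matrix inverse can be avoided with np.linalg.solve, which is generally more numerically stable:
# same posterior as above, but solving linear systems instead of forming the inverse
alpha = np.linalg.solve(k3, f)                    # K(x, x)^{-1} f
mu_post_alt = k2.T @ alpha                        # posterior mean
K_post_alt = k1 - k2.T @ np.linalg.solve(k3, k2)  # posterior covariance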
(d)¶
samples2 = draw_samples(mu_post,K_post,25)
figure(figsize=(6, 4), dpi=120)
plt.plot(x1_v,samples2,color="lightgray")
#plt.plot(x1_v,K_post,color="black")
plt.plot(x1_v,mu_post,color="blue",lw=3,label="mu_post")
plt.scatter(x1,f,color="orange",label="observed data points",lw=3,zorder=100)
plt.legend()
plt.show()