In this video, we will talk about Gaussian processes for regression.

In the next video,

we will use Gaussian processes for Bayesian optimization.

Let's start with a regression problem example with a set of observations.

The goal of this example is to learn this function using Gaussian processes.

Suppose that the observations are noisy, as shown on this slide.

y is a vector of the target observations,

f is a vector of the true function values,

and epsilon is the noise term.

We also suppose that the noise has a normal distribution.

So the vector of the target observations y

is distributed according to a multivariate normal distribution.

The mean of the distribution is the vector of true function values f,

and the covariance matrix is the identity matrix multiplied by a parameter alpha.
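As a small illustration of this observation model, here is a sketch in NumPy; the true function (a sine), the noise variance alpha, and the number of points are my own illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    # hypothetical true function, used only for illustration
    return np.sin(2 * np.pi * x)

alpha = 0.01                                  # noise variance (the parameter alpha)
x = rng.uniform(0.0, 1.0, size=10)            # input points
f = f_true(x)                                 # true function values
# y ~ N(f, alpha * I): each observation is the true value plus Gaussian noise
y = f + rng.normal(0.0, np.sqrt(alpha), size=x.shape)
```
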

Also, suppose that the vector of

the true function values has a normal distribution with

zero mean and covariance matrix K.

The matrix K is chosen to express the property that,

for points x_n and x_m that are similar,

the values f(x_n) and f(x_m) will be more strongly correlated than for dissimilar points.

Let's consider this matrix in more detail.

The covariance between two points x_i and x_j is based on

the Euclidean distance between them, as shown on the slide,

and this expression encodes the property that the function values at

similar points are more strongly correlated than at dissimilar points.
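A covariance of this kind is often written as a squared-exponential (RBF) kernel; a minimal sketch, where the length scale 0.3 and the test points are illustrative choices of mine:

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=0.3):
    # squared Euclidean distances between all pairs of 1-D points
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    # nearby points get covariance close to 1, distant points close to 0
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

x = np.array([0.0, 0.1, 1.0])
K = rbf_kernel(x, x)
# K[0, 1] (similar points) is much larger than K[0, 2] (dissimilar points)
```
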

The distributions of the target observations

and the true function values, shown on the previous slides,

allow us to derive the distribution of the observations:

it has zero mean and the following covariance

matrix C. Let's consider how to use

this distribution to estimate the function value at a given point x.
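Putting the two pieces together, the covariance of the observations combines the kernel matrix with the noise term, C = K + alpha * I; a sketch, with illustrative values for alpha and the input points:

```python
import numpy as np

def rbf(xa, xb, ls=0.3):
    # squared-exponential kernel, as in the earlier sketch
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ls ** 2)

alpha = 0.01
x = np.linspace(0.0, 1.0, 5)
K = rbf(x, x)                           # prior covariance of the function values f
C = K + alpha * np.eye(len(x))          # covariance of the observations y
# y ~ N(0, C): sample a few noisy observation vectors from this distribution
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), C, size=3)
```
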

Let's estimate the function value y_N plus 1 for a given point x_N plus 1.

The vector of the target observations with N plus 1 points has a normal distribution,

where mean and covariance matrix have the following structure.

The covariance matrix for the N plus 1 points consists of the covariance matrix for

the previous N points plus one additional row and column as it's shown on the slide.

The mean and covariance for the N points are known.

The conditional distribution of the observation value y_N plus 1,

given the N previous points,

is a normal distribution with the following mean and standard deviation,

as shown on the slide.

On this slide, I just would like to remind you one more time of

the properties of the conditional distribution for the normal distribution,

and we will use these properties on the next slide.
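For reference, the standard conditional-Gaussian identities being recalled here can be written as follows; the block partition notation is mine, not from the slides:

```latex
% For a jointly Gaussian vector partitioned as
%   (z_a, z_b) ~ N( (mu_a, mu_b), [[Sigma_aa, Sigma_ab], [Sigma_ba, Sigma_bb]] ),
% the conditional p(z_a | z_b) is Gaussian with
\begin{align}
  \mu_{a\mid b}    &= \mu_a + \Sigma_{ab}\,\Sigma_{bb}^{-1}\,(z_b - \mu_b) \\
  \Sigma_{a\mid b} &= \Sigma_{aa} - \Sigma_{ab}\,\Sigma_{bb}^{-1}\,\Sigma_{ba}
\end{align}
```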

The conditional distribution properties give us expressions

for the mean and the standard deviation of the observation y_N plus 1,

given the N previous points, as shown on the slide.
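Under the model above, the predictive mean and variance for a single new point can be sketched as follows; the kernel, length scale, noise level, and data are illustrative choices of mine:

```python
import numpy as np

def rbf(xa, xb, ls=0.3):
    # squared-exponential kernel between two sets of 1-D points
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(0)
alpha = 0.01                                  # noise variance
x = rng.uniform(0.0, 1.0, 10)                 # N observed inputs
y = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(alpha), 10)

C = rbf(x, x) + alpha * np.eye(len(x))        # covariance of the N observations
x_new = np.array([0.25])                      # the point x_{N+1}
k = rbf(x, x_new)[:, 0]                       # covariances with the new point
c = 1.0 + alpha                               # k(x_new, x_new) + noise variance

mean = k @ np.linalg.solve(C, y)              # predictive mean of y_{N+1}
var = c - k @ np.linalg.solve(C, k)           # predictive variance of y_{N+1}
std = np.sqrt(var)
```
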

Let's plot the mean values for the observation y_N plus 1,

given the previous N points.

The mean values are shown as a green line in the figure.

This example shows that the 10 observations estimate the function very well.

The three-sigma confidence region of

the distribution is shown in the figure as a green region.
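To reproduce a figure like this one, the predictive mean (the green line) and the three-sigma band can be evaluated on a grid of points; a sketch, where the data, kernel, and hyperparameters are illustrative and the plotting itself is omitted:

```python
import numpy as np

def rbf(xa, xb, ls=0.3):
    # squared-exponential kernel between two sets of 1-D points
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / ls ** 2)

rng = np.random.default_rng(0)
alpha = 0.01
x = rng.uniform(0.0, 1.0, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, np.sqrt(alpha), 10)

C_inv = np.linalg.inv(rbf(x, x) + alpha * np.eye(len(x)))
grid = np.linspace(0.0, 1.0, 100)
Ks = rbf(x, grid)                             # covariances between data and grid
mean = Ks.T @ C_inv @ y                       # predictive mean on the grid
# predictive variance at each grid point: c - k^T C^{-1} k
var = 1.0 + alpha - np.einsum('ij,ik,kj->j', Ks, C_inv, Ks)
std = np.sqrt(var)
lower, upper = mean - 3 * std, mean + 3 * std # three-sigma confidence region
```
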

In this video, we have learned about Gaussian processes for regression.

In the next video,

we will use them for Bayesian optimization.