[MUSIC]

In this video,

we're going to cover sampling from one-dimensional distributions.

And let's start with the simplest case, a one-dimensional discrete distribution.

So say we have a random variable like this,

which takes the value of a1 with probability 0.6,

value a2 with probability 0.1, and the value a3 with probability 0.3.

How can we sample from it?

Well, we have to start from something.

It's hard to generate randomness from nothing on a deterministic computer.

So we're going to use this kind of function which can generate your random

number on the interval from zero to one.

And this function is included in almost any possible programming language's

library.

So it's usually there, and

we can use it to base our further sampling techniques on it.

But anyway, we have this function.

How can we generate a sample from this discrete distribution?

Well, let's start with a interval and let's decompose it into three chunks,

where the first chunk has length 0.6,

the second chunk has length 0.1, and the third chunk has length 0.3.

And let's say that we're going to generate a random number in this interval.

And let's say that if it appears in the first chunk, in the right one,

then we will say that we have just generated the value of a1.

So let's generate a random number from our function which just [INAUDIBLE]

point on this interval.

And let's say that we were to be somewhere around 0.9 for example.

It means we have just generated the value of a3.

So, why does it work?

Well, because, obviously, if you throw a random point on an interval,

then the probability that it's appeared in some chunk is

proportional to the length of the chunk.

And the length of the first chunk is 0.6,

which is exactly the probability we wanted to, a1 to be generated, and etc.

So this indeed is the correct procedure for generating random numbers

from discrete distributions with finite number of values.

And the bottom line here is that it's really easy to generate samples

from discrete distributions.

But note that here we had just three values.

If we for example had 1 million values, or 1 billion values, it would be much harder.

Because we would have to spend amount of time proportional to

1 billion to separate this interval into 1 billion chunks of

appropriate sizes and then to find in which particular interval,

in which particular chunk the random value appeared.

But if you have finite number of discrete events and this number is not that large,

it's like 1,000, then you can really efficiently and

easily generate samples from this distribution.

Okay, let's move to continuous distributions.

So let's say for beginning that we want to

sample a point from a Gaussian, like this.

How can we do it?

Well, one way is to use the central limit theorem.

That says that if we have a few independent random values, and

if we sum them up together, we'll get something like Gaussian.

So if we just throw, say, 12 random numbers from the uniform

distribution on an interval from 0 to 1 and then add them all together,

we'll get something with expected value being 6.

Because expected value of each x i is 0.5, it's the center of the interval.

And the sum of 12 x i's will be will be from 0 to 12.

And the center of this interval is 6.

So I have to subtract 6 to make the expected value of this

distribution to be 0.

And in this way, our distribution will be actually not from minus infinity to

plus infinity like in the Gaussian case, but rather from minus 6 to 6.

So it's indeed an approximation, it can't generate you anything like

an actual Gaussian, but it's a pretty close one.

So although it cannot ever generate for example, a number of 100,

but the Gaussian itself is also very, very improbable to generate you 100.

So we can be okay with losing these highly improbable parts of your sample space.

And you may ask, why 12 here?

Well, just to make the variance of the resulting

random variable z to be equal to 1 as of standard Gaussian.

If you use for example not 12 but, I don't know, 100 values,

you would have to divide this thing by something to make its variance to be 1.

Or if you don't want to use these kind of loose approximations,

you can ask your favorite programming language for a function which

just generates you numbers from Gaussian, and it probably has it.

So it's a pretty standard thing to have in a programming language

to generate numbers from a standard Gaussian.

And this is kind of more efficient and

more accurate than just generating it from central limit theorem.

So, okay, Gaussians are easy.

But what if we have a more complicated distribution like this and

we don't know how to sample from it?

Again, as we discussed in the previous video, it can happen if for example,

this is a posterior distribution on some parameters or some latent variables, and

we may even know it up to normalization constant, not exactly.

So what do we do in this situation?