plus epsilon. So if gamma were equal to this denominator term, the square root of

sigma squared plus epsilon, and if beta were equal to mu, this value up here,

then the effect of gamma z norm plus beta is that it would exactly invert this equation.

So if this is true, then z tilde i is actually equal to zi.

And so by an appropriate setting of the parameters gamma and beta,

this normalization step, that is,

these four equations, is essentially just computing the identity function.

But by choosing other values of gamma and beta, this allows you to make the hidden

unit values have other means and variances as well.
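The identity-recovery argument above can be checked numerically. Here's a minimal sketch of the four batch norm equations (mean, variance, normalize, rescale) applied to a toy vector of z values; the function name and the toy data are my own illustration, not from the lecture:

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-8):
    # The four equations: compute mean, variance, normalize, then rescale/shift.
    mu = z.mean()
    sigma2 = z.var()
    z_norm = (z - mu) / np.sqrt(sigma2 + eps)
    return gamma * z_norm + beta

# Toy z values for one hidden unit across a mini-batch.
z = np.array([1.0, 2.0, 3.0, 4.0])
mu, sigma2 = z.mean(), z.var()
eps = 1e-8

# Setting gamma = sqrt(sigma^2 + eps) and beta = mu exactly inverts
# the normalization, so z_tilde equals z: the identity function.
z_tilde = batch_norm(z, gamma=np.sqrt(sigma2 + eps), beta=mu, eps=eps)
print(np.allclose(z_tilde, z))  # True
```

With any other choice of gamma and beta, the same function instead produces values with a different mean and variance, which is the point of making them learnable.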

And so the way you fit this into your neural network is,

whereas previously you were using these values z1, z2, and so

on, you would now use z tilde i instead of zi for

the later computations in your neural network.

And if you want to put back this superscript [l] to explicitly denote which layer it is in,

you can put it back there.

So the intuition I hope you'll take away from this is that we saw how

normalizing the input features x can help learning in a neural network.

And what batch norm does is it applies that normalization process not just

to the input layer, but

to the values even deep in some hidden layer in the neural network.

So it will apply this type of normalization to normalize the mean and

variance of some of your hidden units' values, z.

But one difference between the training input and these hidden unit values is you

might not want your hidden unit values to be forced to have mean 0 and variance 1.

For example, if you have a sigmoid activation function,

you don't want your values to always be clustered here.

You might want them to have a larger variance or have a mean that's different

than 0, in order to better take advantage of the nonlinearity of

the sigmoid function rather than have all your values be in just this linear regime.

So that's why with the parameters gamma and beta,

you can now make sure that your zi values have the range of values that you want.

But what it really does is ensure that your hidden units have

a standardized mean and variance, where the mean and

variance are controlled by two explicit parameters gamma and

beta which the learning algorithm can set to whatever it wants.

So what it really does is normalize the mean and variance of these hidden

unit values, really the zis, to have some fixed mean and variance.

And that mean and variance could be 0 and 1, or it could be some other value,

and it's controlled by these parameters gamma and beta.
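To make that concrete, here's a small sketch showing that after the normalize-and-rescale step, the transformed values end up with mean beta and standard deviation gamma; the specific numbers chosen for gamma and beta are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated pre-activation values z with some arbitrary mean and spread.
z = rng.normal(loc=5.0, scale=3.0, size=10_000)
eps = 1e-8

# Normalize to zero mean, unit variance.
z_norm = (z - z.mean()) / np.sqrt(z.var() + eps)

# In practice gamma and beta are learned; here they are set by hand
# to show how they control the resulting distribution.
gamma, beta = 0.5, 2.0
z_tilde = gamma * z_norm + beta

# The rescaled values have mean beta and standard deviation gamma.
print(round(z_tilde.mean(), 3), round(z_tilde.std(), 3))
```

So choosing gamma = 1 and beta = 0 keeps mean 0 and variance 1, while other settings give the hidden units whatever mean and variance the learning algorithm finds useful, for example pushing sigmoid inputs out of the purely linear regime.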

So I hope that gives you a sense of the mechanics of how to implement batch norm,

at least for a single layer in the neural network.

In the next video, I'm going to show you how to fit batch norm into a neural

network, even a deep neural network, and how to make it work for

the many different layers of a neural network.

And after that, we'll get some more intuition about why batch norm could

help you train your neural network.

So if why it works still seems a little bit mysterious, stay with me, and

I think in two videos from now we'll really make that clearer.