You now know pretty much all the building blocks for building a full convolutional neural network.

Let's look at an example.

Let's say you're inputting an image which is 32 x 32 x 3, so

it's an RGB image and maybe you're trying to do handwritten digit recognition.

So you have a number like 7 in a 32 x 32 RGB image, and you're trying to recognize which one of the 10 digits from zero to nine this is.

Let's build a neural network to do this.

And what I'm going to use in this slide is inspired by, and actually quite similar to, one of the classic neural networks called LeNet-5, which was created by Yann LeCun many years ago.

What I'll show here isn't exactly LeNet-5, but many of the parameter choices were inspired by it.

So with a 32 x 32 x 3 input let's say that the first

layer uses a 5 x 5 filter and a stride of 1, and no padding.

So the output of this layer,

if you use 6 filters would be 28 x 28 x 6,

and we're going to call this layer conv 1.

So you apply 6 filters, add a bias, apply the non-linearity,

maybe a ReLU non-linearity, and that's the Conv 1 output.
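As a quick sanity check on the 28 x 28 x 6 figure, here's a minimal sketch (not from the lecture) of the standard conv output-size formula, floor((n + 2p - f) / s) + 1:

```python
def conv_output_size(n, f, s=1, p=0):
    """Height/width of a conv layer's output for input size n,
    filter size f, stride s, and padding p."""
    return (n + 2 * p - f) // s + 1

# 32 x 32 input, 5 x 5 filter, stride 1, no padding -> 28 x 28,
# and 6 filters give 6 output channels: 28 x 28 x 6.
print(conv_output_size(32, 5))  # 28
```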

Next, let's apply a pooling layer. I'm going to apply max pooling here with a 2 x 2 filter and a stride of 2, so f = 2, s = 2, and when I don't write a padding, it means the padding is 0.

So this should reduce the height and width of the representation by a factor of 2.

So 28 x 28 now becomes 14 x 14, and

the number of channels remains the same so 14 x 14 x 6,

and we're going to call this the Pool 1 output.
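The pooling output size follows the same arithmetic as the conv formula, with no padding. A small sketch (my own, not from the lecture) confirming that 28 shrinks to 14:

```python
def pool_output_size(n, f=2, s=2):
    # Max pooling uses the same size formula as a convolution
    # with no padding: floor((n - f) / s) + 1.
    return (n - f) // s + 1

# 28 x 28 x 6 with f = 2, s = 2 -> 14 x 14, channels unchanged: 14 x 14 x 6.
print(pool_output_size(28))  # 14
```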

So, it turns out that in the ConvNet literature there are two slightly inconsistent conventions about what you call a layer.

One convention is that this is called one layer, so this would be Layer 1 of the neural network. The other convention would be to count the conv layer as one layer and the pooling layer as a separate layer.

When people report the number of layers in a neural network, usually they just count the layers that have weights, that have parameters.

And because the pooling layer has no weights, no parameters, only a few hyperparameters, I'm going to use the convention of grouping Conv 1 and Pool 1 together.

I'm going to treat that as Layer 1, although sometimes when you read articles online or read research papers, you'll hear about the conv layer and the pooling layer as if they are two separate layers.

These are two slightly inconsistent terminologies, but when I count layers, I'm just going to count layers that have weights.

So I'll treat both of these together as Layer 1.

And in the names Conv 1 and Pool 1 used here, the 1 at the end also refers to the fact that I view both of these as part of Layer 1 of the neural network.

And Pool 1 is grouped into Layer 1 because it doesn't have its own weights.


Now let's apply another convolutional layer to this 14 x 14 x 6 volume. I'm going to use a 5 x 5 filter, so f = 5, a stride of 1, and when I don't write the padding, that means there's no padding. Let's use 16 filters this time, and this will give you the Conv 2 output.

So this would be a 10 x 10 x 16 dimensional output.

So that's the Conv 2 layer.

And then let's apply max pooling to this with f=2, s=2.

You can probably guess the output of this: we're at 10 x 10 x 16, and max pooling with f = 2, s = 2 should halve the height and width, so you end up with a 5 x 5 x 16 volume, the same number of channels as before.

We're going to call this Pool 2.

And in our convention this is Layer 2, because it has just one set of weights, in the Conv 2 layer.
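Putting the whole pipeline so far together, here's a sketch (my own helper functions, not from the lecture) that traces the volume shapes layer by layer:

```python
def conv(shape, f, s, n_filters):
    # Output shape of a conv layer with no padding:
    # spatial dims shrink by the size formula, channels = number of filters.
    h, w, _ = shape
    return ((h - f) // s + 1, (w - f) // s + 1, n_filters)

def pool(shape, f=2, s=2):
    # Max pooling halves height and width (for f=2, s=2); channels unchanged.
    h, w, c = shape
    return ((h - f) // s + 1, (w - f) // s + 1, c)

x = (32, 32, 3)                      # input RGB image
x = conv(x, f=5, s=1, n_filters=6)   # Conv 1 -> (28, 28, 6)
x = pool(x)                          # Pool 1 -> (14, 14, 6), Layer 1
x = conv(x, f=5, s=1, n_filters=16)  # Conv 2 -> (10, 10, 16)
x = pool(x)                          # Pool 2 -> (5, 5, 16), Layer 2
print(x)  # (5, 5, 16)
```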

Now, 5 x 5 x 16 is equal to 400. So let's now flatten our Pool 2 output into a 400 x 1 dimensional vector. So think of this as flattening it out into a set of neurons, like so.

And what we're going to do is then take these 400 units and

let's build the next layer as having 120 units.

So this is actually our first fully connected layer.

I'm going to call this FC3 because we have

400 units densely connected to 120 units.
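The arithmetic behind FC3 can be sketched as follows (the parameter count isn't stated in the lecture; it just follows from 400 units densely connected to 120 units, with one bias per output unit):

```python
# Flatten the 5 x 5 x 16 Pool 2 volume into a 400-dimensional vector,
# then connect it densely to 120 units.
flattened = 5 * 5 * 16           # 400 input units to FC3
fc3_units = 120
weights = flattened * fc3_units  # 400 * 120 = 48,000 weights
biases = fc3_units               # 120 biases, one per output unit
print(weights + biases)          # 48120 parameters in FC3
```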