
You now know pretty much all the building blocks of a full convolutional neural network, so let's look at an example. Let's say you're inputting an image which is 32 x 32 x 3, so it's an RGB image, and maybe you're trying to do handwritten digit recognition: you have a number like 7 in a 32 x 32 RGB image, and you're trying to recognize which one of the 10 digits from zero to nine it is. Let's build a neural network to do this.

What I'm going to use in this slide is inspired by, and actually quite similar to, one of the classic neural networks, LeNet-5, which was created by Yann LeCun many years ago. What I'll show here isn't exactly LeNet-5, but many of the parameter choices were inspired by it.

So with a 32 x 32 x 3 input, let's say the first layer uses a 5 x 5 filter, a stride of 1, and no padding. The output of this layer, if you use 6 filters, would be 28 x 28 x 6, and we're going to call this layer Conv 1. So you apply 6 filters, add a bias, apply a non-linearity, maybe a ReLU non-linearity, and that's the Conv 1 output.

Next, let's apply a pooling layer. I'm going to apply max pooling here with f = 2 and s = 2; whenever I don't write a padding, that means we're using a padding of 0. Max pooling with a 2 x 2 filter and a stride of 2 should reduce the height and width of the representation by a factor of 2, so 28 x 28 now becomes 14 x 14, and the number of channels remains the same, so 14 x 14 x 6. We're going to call this the Pool 1 output.
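As a side note, both of these output sizes (28 from the conv layer, 14 from the pooling layer) follow the usual formula ⌊(n + 2p − f)/s⌋ + 1, which applies to conv and pooling layers alike. Here's a minimal sketch in Python (the helper name is mine, just for illustration):

```python
def output_size(n, f, stride=1, pad=0):
    """Height/width after a conv or pooling layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1

# Conv 1: 32 x 32 input, 5 x 5 filter, stride 1, no padding -> 28 x 28
print(output_size(32, 5, stride=1, pad=0))   # 28
# Pool 1: 28 x 28 input, f = 2, s = 2 -> 14 x 14
print(output_size(28, 2, stride=2))          # 14
```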

So, it turns out that in the ConvNet literature there are two slightly inconsistent conventions about what you call a layer. One convention is to call this combination one layer, so it would be Layer 1 of the neural network; the other convention is to count the conv layer as one layer and the pool layer as a separate layer. When people report the number of layers in a neural network, they usually just count the layers that have weights, that have parameters. And because the pooling layer has no weights, no parameters, only a few hyperparameters, I'm going to use the convention that Conv 1 and Pool 1, taken together, are Layer 1, although sometimes when you read articles online or research papers, you'll hear about the conv layer and the pooling layer as if they were two separate layers. These are just two slightly inconsistent terminologies, but when I count layers, I'm only going to count layers that have weights, so I'll treat both of these together as Layer 1. The names Conv 1 and Pool 1 that I'm using here, with the 1 at the end, also reflect the fact that I view both of these as part of Layer 1 of the neural network; Pool 1 is grouped into Layer 1 because it doesn't have its own weights.

Next, given this 14 x 14 x 6 volume, let's apply another convolutional layer to it. Let's use a 5 x 5 filter, so f = 5, a stride of 1, and no padding (remember, when I don't write the padding, it means there's no padding), and let's use 16 filters this time. So now you end up with a 10 x 10 x 16 volume; this is the Conv 2 output. Then let's apply max pooling to this with f = 2, s = 2 again. You can probably guess the result: max pooling with f = 2, s = 2 should halve the height and width, so you end up with a 5 x 5 x 16 volume, with the same number of channels as before. We're going to call this Pool 2, and in our convention this is Layer 2 of the neural network, because it has one set of weights, in the Conv 2 layer.

Now, 5 x 5 x 16 is equal to 400, so let's flatten our Pool 2 output into a 400 x 1 dimensional vector. Think of this as flattening it out into a set of 400 neurons, like so. What we're going to do then is take these 400 units and build the next layer as having 120 units. This is actually our first fully connected layer; I'm going to call it FC3, because we have 400 units densely connected to 120 units.

Â 6:46

So this fully connected layer is just like the single neural network layer that you saw in Courses 1 and 2. It's just a standard neural network layer where you have a weight matrix, call it W3, of dimension 120 x 400. It's fully connected because each of the 400 units here is connected to each of the 120 units here, and you also have a bias parameter, which is going to be 120 dimensional, since there are 120 outputs.
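As a hedged sketch of what that flatten-plus-fully-connected step looks like (the variable names are mine, and this assumes NumPy with random values standing in for real activations):

```python
import numpy as np

rng = np.random.default_rng(0)

a_pool2 = rng.standard_normal((5, 5, 16))   # the 5 x 5 x 16 Pool 2 output
a_flat = a_pool2.reshape(400, 1)            # flattened into a 400 x 1 vector

W3 = rng.standard_normal((120, 400))        # FC3 weight matrix, 120 x 400
b3 = rng.standard_normal((120, 1))          # one bias per output unit
a3 = np.maximum(0, W3 @ a_flat + b3)        # linear step plus a ReLU

print(a3.shape)   # (120, 1)
```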

Then lastly, let's take these 120 units and add another layer, this time smaller; let's say we have 84 units here. I'm going to call this fully connected Layer 4, FC4. And finally we now have 84 real numbers that we can feed to a softmax unit. If you're trying to do handwritten digit recognition, recognizing whether the digit is 0, 1, 2, and so on up to 9, then this would be a softmax with 10 outputs.

So this is a reasonably typical example of what a convolutional neural network might look like. And I know it seems like there are a lot of hyperparameters; we'll give you some more specific suggestions later for how to choose them. One common guideline is to not try to invent your own settings of hyperparameters, but to look in the literature to see what hyperparameters have worked for others, and to just choose an architecture that has worked well for someone else; there's a chance it will work for your application as well. We'll see more about that next week.

But for now I'll just point out that as you go deeper in the neural network, usually n_H and n_W, the height and width, will decrease. I pointed this out earlier: it goes from 32 x 32, to 28 x 28, to 14 x 14, to 10 x 10, to 5 x 5. So as you go deeper, usually the height and width decrease, whereas the number of channels increases; it's gone from 3 to 6 to 16. And then you have the fully connected layers at the end.

Another pretty common pattern you see in neural networks is to have one or more conv layers followed by a pooling layer, then one or more conv layers followed by another pooling layer, and then at the end a few fully connected layers, followed by maybe a softmax.

So for this neural network, let's go through some more details: the activation shape, the activation size, and the number of parameters in this network. The input was 32 x 32 x 3, and if you multiply out those numbers you get 3,072, so the activation a0 has dimension 3,072; well, really it's 32 x 32 x 3. And there are no parameters at the input layer. As you look at the different layers, feel free to work out the details yourself; these are the activation shapes and activation sizes of the different layers.
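If you'd like to check those shapes and sizes yourself, here's a minimal sketch (the helper names are mine) that traces the activations through the network described above:

```python
def conv(shape, f, n_filters, stride=1, pad=0):
    """Shape after a conv layer with square f x f filters (assumes square input)."""
    h, w, _ = shape
    out = (h + 2 * pad - f) // stride + 1
    return (out, out, n_filters)

def pool(shape, f=2, stride=2):
    """Shape after max pooling; the channel count is unchanged."""
    h, w, c = shape
    out = (h - f) // stride + 1
    return (out, out, c)

a0 = (32, 32, 3)                    # input: activation size 3,072
c1 = conv(a0, f=5, n_filters=6)     # (28, 28, 6):  size 4,704
p1 = pool(c1)                       # (14, 14, 6):  size 1,176
c2 = conv(p1, f=5, n_filters=16)    # (10, 10, 16): size 1,600
p2 = pool(c2)                       # (5, 5, 16):   size 400
assert p2[0] * p2[1] * p2[2] == 400  # flattens to the 400-unit FC input
```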

Â 10:15

So, just to point out a few things. First, notice that the max pooling layers don't have any parameters. Second, notice that the conv layers tend to have relatively few parameters, as we discussed in earlier videos; in fact, a lot of the parameters tend to be in the fully connected layers of the neural network. You'll also notice that the activation size tends to go down gradually as you go deeper in the neural network; if it drops too quickly, that's usually not great for performance either. So it starts at 3,072, then 4,704, then slowly falls through 1,600, 400, and 120 down to 84, until finally you have your softmax output. You'll find that a lot of ConvNets have patterns similar to these.
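To make the first two observations concrete, here is a hedged sketch of the parameter counts for this network (the helper names are mine; this uses the convention of one bias per filter for conv layers and one bias per unit for fully connected layers):

```python
def conv_params(f, channels_in, n_filters):
    """Each filter has f*f*channels_in weights plus 1 bias."""
    return (f * f * channels_in + 1) * n_filters

def fc_params(n_in, n_out):
    """An n_out x n_in weight matrix, plus one bias per output unit."""
    return n_in * n_out + n_out

params = {
    "conv1":   conv_params(5, 3, 6),    # 456
    "conv2":   conv_params(5, 6, 16),   # 2,416
    "fc3":     fc_params(400, 120),     # 48,120
    "fc4":     fc_params(120, 84),      # 10,164
    "softmax": fc_params(84, 10),       # 850
}
# The two conv layers together hold under 3,000 parameters,
# while the fully connected layers hold nearly 60,000.
```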

So you've now seen the basic building blocks of convolutional neural networks: the conv layer, the pooling layer, and the fully connected layer. A lot of computer vision research has gone into figuring out how to put these basic building blocks together to build effective neural networks, and putting them together actually requires quite a bit of insight. I think one of the best ways for you to gain intuition about how to put these things together is to see a number of concrete examples of how others have done it. So what I want to do next week is show you a few concrete examples, beyond this first one you just saw, of how people have successfully put these things together to build very effective neural networks. Through those videos next week, I hope you'll develop your own intuitions about how these things are built, and you'll see concrete examples of architectures that maybe you can use, exactly as developed by someone else, for your own application.

So we'll do that next week. But before wrapping up this week's videos, just one last thing: in the next video I'll talk a little bit about why you might want to use convolutions, some of the benefits and advantages of using convolutions, as well as how to put everything together, how to take a neural network like the one you just saw and actually train it on a training set to perform image recognition or some other task. So with that, let's go on to the last video of this week.