You've seen how convolutions over 2D images work. Now, let's see how you can implement convolutions over, not just 2D images, but over three-dimensional volumes.

Let's start with an example. Let's say you want to detect features, not just in a grayscale image, but in an RGB image. So, an RGB image might be, instead of a six by six image, six by six by three, where the three here corresponds to the three color channels. So, you can think of this as a stack of three six by six images.

In order to detect edges or some other feature in this image, you convolve this, not with a three by three filter, as we had previously, but now with a 3D filter, that's going to be three by three by three. So the filter itself will also have three layers corresponding to the red, green, and blue channels.

So to give these things some names, this first six here, that's the height of the image, the second six, that's the width, and this three is the number of channels. And your filter also similarly has a height, a width, and a number of channels. And the number of channels in your image must match the number of channels in your filter, so these two numbers have to be equal.

We'll see on the next slide how this convolution operation actually works, but the output of this will be a four by four image. And notice this is four by four by one, there's no longer a three at the end.

Let's go through in detail how this works, but let's use a more nicely drawn image. So here's the six by six by three image, and here's a three by three by three filter, and this last number, the number of channels, matches between the 3D image and the filter. So to simplify the drawing of this three by three by three filter, instead of drawing it as a stack of matrices, I'm also going to, sometimes, just draw it as this three-dimensional cube, like that.

So to compute the output of this convolution operation, what you would do is take the three by three by three filter and first place it in that upper-left-most position. So, notice that this three by three by three filter has 27 numbers, or 27 parameters, that's three cubed. And so, what you do is take each of these 27 numbers and multiply them with the corresponding numbers from the red, green, and blue channels of the image: so take the first nine numbers from the red channel, then the nine beneath them from the green channel, then the nine beneath them from the blue channel, and multiply them with the corresponding 27 numbers that get covered by this yellow cube shown on the left.

Then add up all those numbers, and this gives you the first number in the output. Then, to compute the next output, you take this cube and slide it over by one, and again do the 27 multiplications, add up the 27 numbers, and that gives you the next output. Do it for the next position over, and that gives the third output, and so on. That gives you the fourth, and then one row down, and then the next one, and the next one, and so on, you get the idea, until at the very end, that's the position you'll have for the final output.
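
The sliding procedure just described can be sketched in a few lines of NumPy. This is only an illustrative sketch, not code from the course: the function name `conv_volume` and the random test data are assumptions, and it covers only the case described here (stride of one, no padding, square filter).

```python
import numpy as np

def conv_volume(image, filt):
    """Slide an f x f x C filter over an H x W x C image (stride 1, no padding).

    conv_volume is a hypothetical helper name used only for this illustration.
    """
    h, w, c = image.shape
    f, f2, fc = filt.shape
    assert f == f2, "this sketch assumes a square filter"
    assert c == fc, "image and filter must have the same number of channels"

    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(h - f + 1):          # one row down at a time
        for j in range(w - f + 1):      # slide over by one
            patch = image[i:i + f, j:j + f, :]   # the f*f*C numbers under the cube
            out[i, j] = np.sum(patch * filt)     # multiply elementwise, then add up
    return out

# Made-up data: a 6 x 6 x 3 "image" and a 3 x 3 x 3 filter (27 numbers).
rng = np.random.default_rng(0)
image = rng.integers(0, 10, size=(6, 6, 3)).astype(float)
filt = rng.integers(-1, 2, size=(3, 3, 3)).astype(float)

out = conv_volume(image, filt)
print(out.shape)  # (4, 4)
```

Each output entry is a single number because all 27 products, across all three channels, are summed together. That is why the result is four by four rather than four by four by three.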

So, what does this allow you to do? Well, here's an example. This filter is three by three by three. So, if you want to detect edges in the red channel of the image, then you could have the first filter be the one, one, one, minus one, minus one, minus one as usual, and have the green channel be all zeros, and have the blue channel be all zeros. And if you have these three stacked together to form your three by three by three filter, then this would be a filter that detects edges, vertical edges, but only in the red channel.

Alternatively, if you don't care what color the vertical edge is in, then you might have a filter that's like this, with this one, one, one, minus one, minus one, minus one in all three channels. So, by setting this second alternative set of parameters, you then have an edge detector, a three by three by three edge detector, that detects edges in any color. And with different choices of these parameters you can get different feature detectors out of this three by three by three filter.
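
To make the two filter choices concrete, here is a small sketch. It assumes the usual vertical edge filter with columns of 1, 0, and -1, and a made-up image whose only edge is in the green channel. The helper `conv_volume` and all variable names are hypothetical.

```python
import numpy as np

def conv_volume(image, filt):
    """Stride-1, no-padding convolution of an H x W x C image with an f x f x C filter."""
    h, w, _ = image.shape
    f = filt.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(h - f + 1):
        for j in range(w - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * filt)
    return out

# Assumed 3 x 3 vertical edge filter: columns of 1, 0, -1.
vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)

# Filter 1: vertical edges in the red channel only (green and blue layers all zeros).
red_only = np.zeros((3, 3, 3))
red_only[:, :, 0] = vertical

# Filter 2: the same 3 x 3 pattern in all three channels, so edges of any color.
any_color = np.stack([vertical, vertical, vertical], axis=-1)

# Made-up 6 x 6 x 3 image with a vertical edge in the green channel only.
image = np.zeros((6, 6, 3))
image[:, :3, 1] = 10.0

red_response = conv_volume(image, red_only)    # sees nothing: the red channel is flat
any_response = conv_volume(image, any_color)   # responds to the green edge

print(red_response[0])  # [0. 0. 0. 0.]
print(any_response[0])  # [ 0. 30. 30.  0.]
```

The red-only filter ignores the green edge entirely, while the all-channel filter responds strongly where the filter window straddles the edge.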

And by convention, in computer vision, when you have an input with a certain height, a certain width, and a certain number of channels, then your filter will have a potentially different height and a different width, but the same number of channels. And in theory it's possible to have a filter that maybe only looks at the red channel, or maybe a filter that looks at only the green and the blue channels. And once again, notice that convolving a volume, a six by six by three convolved with a three by three by three, gives a four by four, a 2D output.

Now that you know how to convolve on volumes, there is one last idea that will be crucial for building convolutional neural networks, which is: what if we don't just want to detect vertical edges? What if we want to detect vertical edges and horizontal edges and maybe 45 degree edges and maybe 70 degree edges as well? In other words, what if you want to use multiple filters at the same time?

So, here's the picture we had from the previous slide: we had six by six by three convolved with the three by three by three, which gives four by four, and maybe this is a vertical edge detector, or maybe it's learned to detect some other feature. Now, maybe a second filter, denoted by this orange-ish color, could be a horizontal edge detector. So, convolving with the first filter gives you this first four by four output, and convolving with the second filter gives you a different four by four output.

And what we can do is then take these two four by four outputs, take this first one and put it in the front, and take this second filter output and, well, let me draw it here, put it at the back as follows, so that by stacking these two together, you end up with a four by four by two output volume, right? And you can think of the volume, if we draw this as a box, I guess it would look like this. So this would be a four by four by two output volume, which is the result of taking your six by six by three image and convolving it with, or applying, two different three by three by three filters to it, resulting in two four by four outputs that then get stacked up to form a four by four by two volume. And the two here comes from the fact that we used two different filters.
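
Stacking the per-filter outputs can be sketched as follows. Again this is only an illustration under assumptions: `conv_volume` and `conv_multi` are hypothetical helper names, the vertical and horizontal edge filters use the usual 1, 0, -1 pattern, and the image is random data.

```python
import numpy as np

def conv_volume(image, filt):
    """Stride-1, no-padding convolution of an H x W x C image with an f x f x C filter."""
    h, w, _ = image.shape
    f = filt.shape[0]
    out = np.zeros((h - f + 1, w - f + 1))
    for i in range(h - f + 1):
        for j in range(w - f + 1):
            out[i, j] = np.sum(image[i:i + f, j:j + f, :] * filt)
    return out

def conv_multi(image, filters):
    """Apply each filter separately, then stack the 2D outputs along a new last axis."""
    return np.stack([conv_volume(image, filt) for filt in filters], axis=-1)

vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)
horizontal = vertical.T  # rows of 1, 0, -1 instead of columns

# Two 3 x 3 x 3 filters: the same 2D pattern repeated in every channel.
filters = [np.stack([vertical] * 3, axis=-1),
           np.stack([horizontal] * 3, axis=-1)]

image = np.random.default_rng(0).random((6, 6, 3))  # made-up 6 x 6 x 3 image
out = conv_multi(image, filters)
print(out.shape)  # (4, 4, 2)
```

The last dimension of the output is two because two filters were used. Adding a third filter to the list would give a four by four by three volume.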

So, let's just summarize the dimensions. If you have an n by n by nC input image, so for example a six by six by three, where nC is the number of channels, and you convolve that with an f by f by, and again, this should be the same nC, so this was three by three by three, and by convention this and this have to be the same number. Then, what you get is n minus f plus one by n minus f plus one by, and I'm going to use this nC prime, or it's really nC of the next layer, but this is the number of filters that you use. So this in our example would be four by four by two. And I wrote this assuming that you use a stride of one and no padding. But if you used a different stride or padding, then this n minus f plus one would be affected in the usual way, as you saw in the previous videos.
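
The dimension summary can be written as a small helper. The function name `conv_output_shape` is hypothetical, and the stride and padding case uses the standard formula floor((n + 2p - f) / s) + 1, which is assumed here to be the "usual way" the earlier videos refer to.

```python
def conv_output_shape(n, f, n_filters, stride=1, pad=0):
    """Output shape for an n x n x nC input convolved with n_filters filters of size f x f x nC.

    With stride=1 and pad=0 the side length reduces to n - f + 1.
    The general case uses the standard formula floor((n + 2*pad - f) / stride) + 1.
    """
    side = (n + 2 * pad - f) // stride + 1
    return (side, side, n_filters)

print(conv_output_shape(6, 3, 2))             # (4, 4, 2), the example from this video
print(conv_output_shape(6, 3, 2, pad=1))      # (6, 6, 2)
print(conv_output_shape(7, 3, 10, stride=2))  # (3, 3, 10)
```

Note that the number of input channels nC does not appear in the output shape. It only has to match between the image and each filter.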

So this idea of convolution on volumes turns out to be really powerful. Only a small part of it is that you can now operate directly on RGB images with three channels. But even more important is that you can now detect two features, like vertical and horizontal edges, or 10, or maybe 128, or maybe several hundred different features. And the output will then have a number of channels equal to the number of features you are detecting.

And as a note on notation, I've been using number of channels to denote this last dimension. In the literature, people will also often call this the depth of this 3D volume, and both notations, channels or depth, are commonly used in the literature. But I find depth more confusing because you usually talk about the depth of the neural network as well, so I'm going to use the term channels in these videos to refer to the size of this third dimension of these filters.

So now that you know how to implement convolutions over volumes, you are now ready to implement one layer of a convolutional neural network. Let's see how to do that in the next video.