
For this final video for this week, let's talk a bit about why convolutions are so useful when you include them in your neural networks. And then finally, let's briefly talk about how to put this all together and how you could train a convolutional neural network when you have a labeled training set.

I think there are two main advantages of convolutional layers over just using fully connected layers. And the advantages are parameter sharing and sparsity of connections.

Let me illustrate with an example. Let's say you have a 32 by 32 by 3 dimensional image, and this actually comes from the example from the previous video, but let's say you use a five by five filter with six filters. And so, this gives you a 28 by 28 by 6 dimensional output. So, 32 by 32 by 3 is 3,072, and 28 by 28 by 6, if you multiply all those numbers, is 4,704.

And so, if you were to create a neural network with 3,072 units in one layer, and with 4,704 units in the next layer, and if you were to connect every one of these neurons, then the number of parameters in the weight matrix would be 3,072 times 4,704, which is about 14 million. So, that's just a lot of parameters to train. And today you can train neural networks with even more parameters than 14 million, but considering that this is just a pretty small image, this is a lot of parameters to train.

And of course, if this were a 1,000 by 1,000 image, then your weight matrix would just become infeasibly large.

But if you look at the number of parameters in this convolutional layer, each filter is five by five. So, each filter has 25 parameters, plus a bias parameter, which makes 26 parameters per filter. And you have six filters, so the total number of parameters is 26 times 6, which is equal to 156 parameters. And so, the number of parameters in this conv layer remains quite small.
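The arithmetic above can be checked in a few lines of Python. This is just a sketch of the counting done in the lecture: the conv count follows the lecture's tally of five by five weights plus one bias per filter.

```python
def fc_params(n_in, n_out):
    # Fully connected layer: one weight per (input unit, output unit) pair.
    return n_in * n_out

def conv_params(filter_size, n_filters):
    # Per the lecture's tally: filter_size * filter_size weights
    # plus one bias, for each filter.
    return (filter_size * filter_size + 1) * n_filters

n_in = 32 * 32 * 3    # 3,072 input units
n_out = 28 * 28 * 6   # 4,704 output units

print(fc_params(n_in, n_out))  # 14,450,688 -- about 14 million
print(conv_params(5, 6))       # 156
```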

And the reason that a convnet ends up with such a small number of parameters really comes down to two things. One is parameter sharing.

And parameter sharing is motivated by the observation that a feature detector, such as a vertical edge detector, that's useful in one part of the image is probably useful in another part of the image. And what that means is that, if you've figured out, say, a three by three filter for detecting vertical edges, you can then apply the same three by three filter over here, and then the next position over, and the next position over, and so on. And so, each of these feature detectors, each of these outputs, can use the same parameters in lots of different positions in your input image in order to detect, say, a vertical edge or some other feature.

And I think this is true for low-level features like edges, as well as for higher-level features, like maybe detecting the eye that indicates a face or a cat or something there. But being able to share, in this case, the same nine parameters to compute all 16 of these outputs is one of the ways the number of parameters is reduced.

And it also just seems intuitive that if a feature detector, like a vertical edge detector, is computed for the upper left-hand corner of the image, the same feature detector seems like it will probably be useful, or has a good chance of being useful, for the lower right-hand corner of the image. So, maybe you don't need to learn separate feature detectors for the upper left and the lower right-hand corners of the image. And maybe you do have a dataset where the upper left-hand corner and the lower right-hand corner have different distributions, so they may look a little bit different, but they might be similar enough that sharing feature detectors all across the image works just fine.
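Parameter sharing can be made concrete with a toy sketch (hypothetical numbers, not from the lecture's slides): the same nine kernel weights are reused at every one of the 16 positions of a 6 by 6 input.

```python
import numpy as np

# One 3x3 vertical-edge filter: just 9 shared parameters.
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))  # stand-in for a 6x6 input

out = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        # The *same* 9 parameters are applied at each of the 16 positions.
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)

print(kernel.size, out.size)  # 9 parameters compute 16 outputs
```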

The second way that convnets get away with having relatively few parameters is by having sparse connections.

So, here's what I mean. If you look at this zero in the output, it is computed via a three by three convolution. And so, it depends only on this three by three grid of input cells. So, it is as if this output unit on the right is connected only to nine out of these six by six, or 36, input features. And in particular, the rest of these pixel values do not have any effect on this output. So, that's what I mean by sparsity of connections. As another example, this output depends only on these nine input features. And so, it's as if only those nine input features are connected to this output, and the other pixels just don't affect this output at all.
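Sparsity of connections can be demonstrated with a small sketch (a toy example under the same 6 by 6, 3 by 3 setup): perturbing a pixel outside an output's three by three patch leaves that output unchanged, because each output is connected to only 9 of the 36 inputs.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Naive "valid" convolution (really cross-correlation, as in the course).
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(1)
image = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))

before = conv2d_valid(image, kernel)
image[5, 5] += 100.0          # change one pixel in the lower-right corner
after = conv2d_valid(image, kernel)

print(before[0, 0] == after[0, 0])  # True: output (0,0) never sees pixel (5,5)
print(before[3, 3] == after[3, 3])  # False: output (3,3) does use pixel (5,5)
```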

And so, through these two mechanisms, a neural network has a lot fewer parameters, which allows it to be trained with smaller training sets, and it is less prone to overfitting.

And so, sometimes you also hear about convolutional neural networks being very good at capturing translation invariance. And that's the observation that a picture of a cat shifted a couple of pixels to the right is still pretty clearly a cat.

And the convolutional structure helps the neural network encode the fact that an image shifted a few pixels should result in pretty similar features and should probably be assigned the same output label. And the fact that you are applying the same filter across all the positions of the image, both in the early layers and in the late layers, helps a neural network automatically learn to be more robust to, or to better capture, the desirable property of translation invariance.

So, these are maybe a couple of the reasons why convolutions, or convolutional neural networks, work so well in computer vision.

Finally, let's put it all together and see how you can train one of these networks. Let's say you want to build a cat detector, and you have a labeled training set as follows, where now x is an image, and the y's can be binary labels, or one of K classes.

And let's say you've chosen a convolutional neural network structure, maybe inputting the image and then having some convolutional and pooling layers, and then some fully connected layers, followed by a softmax output that then outputs y hat. The conv layers and the fully connected layers will have various parameters, W, as well as biases b.

And so, any setting of the parameters, therefore, lets you define a cost function similar to what we have seen in the previous courses, where we've randomly initialized the parameters W and b. You can compute the cost J as the sum of losses of the neural network's predictions on your entire training set, maybe divided by m. So,

to train this neural network, all you need to do is then use gradient descent, or some other algorithm like gradient descent with momentum, or RMSProp, or Adam, or something else, in order to optimize all the parameters of the neural network to try to reduce the cost function J. And you'll find that if you do this, you can build a very effective cat detector or some other detector.
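The training loop just described can be sketched in miniature. This is a hypothetical toy model with a single parameter w standing in for the convnet's W and b; the structure of the loop (cost J as the mean of per-example losses, plus plain gradient descent steps) is the point, not the model itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_J(y_hat, y):
    # Sum of per-example losses over the training set, divided by m.
    eps = 1e-12
    losses = -(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    return np.mean(losses)

rng = np.random.default_rng(0)
x = rng.standard_normal(100)      # stand-in for the images X
y = (x > 0).astype(float)         # binary cat / not-cat labels

w = 0.0                           # initialized parameter
learning_rate = 0.5
for step in range(200):
    y_hat = sigmoid(w * x)
    grad = np.mean((y_hat - y) * x)   # dJ/dw for this logistic model
    w -= learning_rate * grad         # plain gradient descent step

print(cost_J(sigmoid(w * x), y))  # well below the initial cost of about 0.69
```

Swapping the one-parameter model for a convnet, and plain gradient descent for momentum, RMSProp, or Adam, changes the details but not this overall shape.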

So, congratulations on finishing this week's videos. You've now seen all the basic building blocks of a convolutional neural network, and how to put them together into an effective image recognition system. In this week's programming exercises, I think all of these things will become more concrete, and you'll get a chance to practice implementing these things yourself and seeing them work for yourself.

Next week, we'll continue to go deeper into convolutional neural networks. I mentioned earlier that there are just a lot of hyperparameters in convolutional neural networks. So, what I want to do next week is show you a few concrete examples of some of the most effective convolutional neural networks, so you can start to recognize the patterns of what types of network architectures are effective.

And one thing that people often do is just take an architecture that someone else has found and published in a research paper and use that for your application. And so, by seeing some more concrete examples next week, you'll also learn how to do that better.

And beyond that, next week, we'll also gain some intuitions about what makes convnets work well, and then in the rest of the course, we'll also see a variety of other computer vision applications, such as object detection, and neural style transfer, which lets you create new forms of artwork using these sets of algorithms.

So, that's it for this week. Best of luck with the homework, and I look forward to seeing you next week.
