In terms of designing ConvNet architectures, one of the ideas that really helps is using a one-by-one convolution. Now, you might be wondering, what does a one-by-one convolution do? Isn't that just multiplying by a number? That seems like a funny thing to do. It turns out it's not quite like that. Let's take a look. Here's a one-by-one filter; I put the number 2 there. If you take this six-by-six image, six by six by one, and convolve it with this one-by-one-by-one filter, you end up just taking the image and multiplying it by two. So 1, 2, 3 ends up being 2, 4, 6, and so on. A convolution with a one-by-one filter doesn't seem particularly useful; you just multiply by some number. But that's only the case for six-by-six-by-one-channel images. If you have a six-by-six-by-32 volume instead, then a convolution with a one-by-one filter can do something that makes much more sense. In particular, what a one-by-one convolution will do is look at each of the 36 different positions here, take the element-wise product between the 32 numbers on the left and the 32 numbers in the filter, and then apply a ReLU nonlinearity after that. Looking at one of the 36 positions, one slice through this volume, you take those 32 numbers, multiply them by the one-by-one-by-32 filter, and you end up with a single real number, which then gets placed in one of the output positions. In fact, one way to think about the 32 numbers you have in this one-by-one-by-32 filter is as a single neuron that takes as input 32 numbers, the values in one slice at the same height and width but across these 32 different channels, multiplies them by 32 weights, applies a ReLU nonlinearity, and then outputs the corresponding value over there.
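As a minimal NumPy sketch of this, assuming random values just to show the shapes: a single 1x1x32 filter applied to a 6x6x32 volume is just a dot product over the 32 channels at each of the 36 positions, followed by a ReLU.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(6, 6, 32)   # input volume: height 6, width 6, 32 channels
w = np.random.randn(32)         # one 1x1x32 filter is just 32 weights

# At each of the 36 spatial positions, take the dot product of the
# 32 channel values with the 32 filter weights, then apply ReLU.
out = np.maximum(0.0, x @ w)    # x @ w contracts the channel axis

print(out.shape)  # (6, 6): one real number per spatial position
```

So with one filter, the channel dimension collapses to a single number at each position, exactly like one neuron looking at a 32-number slice.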
More generally, if you have not just one filter but multiple filters, then it's as if you have not just one unit but multiple units, each taking as input all the numbers in one slice and building them up into an output that is six by six by the number of filters. So one way to think about a one-by-one convolution is that it is basically a fully connected neural network that applies to each of the 36 different positions. What that fully connected network does is take 32 numbers as input and output a number of values equal to the number of filters. In our notation, this is really n_C of layer l+1, if that's the next layer. By doing this at each of the 36 positions, each of the six-by-six positions, you end up with an output that is six by six by the number of filters. This can carry out a pretty non-trivial computation on your input volume. This idea is often called a one-by-one convolution, but it's sometimes also called network in network, and it's described in this paper by Min Lin, Qiang Chen, and Shuicheng Yan. Even though the details of the architecture in that paper aren't used widely, this idea of a one-by-one convolution, of this sometimes-called network-in-network idea, has been very influential and has influenced many other neural network architectures, including the Inception network, which we'll see in the next video. But to give you an example of where a one-by-one convolution is useful, here's something you could do with it. Let's say you have a 28 by 28 by 192 volume. If you want to shrink the height and width, you can use a pooling layer, so we know how to do that. But what if the number of channels has gotten too big and you want to shrink that? How do you shrink it to a 28 by 28 by 32 dimensional volume? Well, what you can do is use 32 filters that are one by one, and technically each filter would be of dimension one by one by 192, because the number of channels in your filter has to match the number of channels in your input volume.
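Stacking the filters, the channel-shrinking example can be sketched the same way, again with random weights purely to illustrate the shapes: 32 filters of size 1x1x192 form a 192-by-32 weight matrix that acts like a fully connected layer at every spatial position.

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(28, 28, 192)   # input volume with 192 channels
W = np.random.randn(192, 32)       # 32 filters, each of dimension 1x1x192
b = np.zeros(32)                   # one bias per filter

# A 1x1 convolution is a fully connected layer applied at every
# spatial position: each output channel is a weighted sum over
# the 192 input channels, followed by ReLU.
out = np.maximum(0.0, x @ W + b)

print(out.shape)  # (28, 28, 32): height and width unchanged, channels shrunk
```

Note that the spatial dimensions pass through untouched; only the channel dimension changes, from 192 down to the number of filters.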
So if you use 32 filters, the output of this process will be a 28 by 28 by 32 volume. This is a way to let you shrink n_C as well, whereas pooling layers are used just to shrink n_H and n_W, the height and width of these volumes. We'll see later how this idea of one-by-one convolutions allows you to shrink the number of channels and therefore save on computation in some networks. But of course, if you want to keep the number of channels at 192, that's fine too. The effect of the one-by-one convolution is that it adds a nonlinearity; it allows you to learn a more complex function of your network by adding another layer that inputs 28 by 28 by 192 and outputs 28 by 28 by 192. So you've now seen how a one-by-one convolution is actually doing a pretty non-trivial operation, and it allows you to shrink the number of channels in your volumes, or keep it the same, or even increase it if you want. In the next video, you'll see how this can be used to help build up the Inception network. Let's go on to the next video.
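To make the shrink-keep-grow point concrete, here is a small sketch, with a hypothetical helper `conv_1x1` and random weights used only to demonstrate the output shapes: the same 1x1 operation can reduce, preserve, or expand the channel count just by choosing the number of filters.

```python
import numpy as np

np.random.seed(2)
x = np.random.randn(28, 28, 192)   # input volume

def conv_1x1(x, n_filters):
    """1x1 convolution with random weights plus ReLU (shape illustration only)."""
    W = np.random.randn(x.shape[-1], n_filters)
    return np.maximum(0.0, x @ W)

# The same operation can shrink, keep, or grow the channel dimension.
print(conv_1x1(x, 32).shape)    # (28, 28, 32)  shrink n_C
print(conv_1x1(x, 192).shape)   # (28, 28, 192) keep n_C, but add a nonlinearity
print(conv_1x1(x, 256).shape)   # (28, 28, 256) grow n_C
```

Even in the keep-the-same case, the layer is not a no-op: the ReLU adds a nonlinearity, so the network can learn a more complex function.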