0:00

In the last video, you saw how to define the content cost function for neural style transfer. Next, let's take a look at the style cost function. So, what does the style of an image mean?

Let's say you have an input image like this, and you use a convnet like the ones you've been seeing to compute features at its different layers. And let's say you've chosen some layer l, maybe that layer, to define the measure of the style of an image.

What we're going to do is define the style as the correlation between activations across the different channels in this layer l activation. So here's what I mean by that. Let's say you take that layer l activation, which is going to be an n_h by n_w by n_c block of activations, and we're going to ask: how correlated are the activations across the different channels?

To explain what I mean by this maybe slightly cryptic phrase, let's take this block of activations and let me shade the different channels with different colors. So in this example below, we have, say, five channels, which is why I have five shades of color here. In practice, of course, a neural network usually has a lot more channels than five, but using just five makes the drawing easier.

To capture the style of an image, here's what you're going to do. Let's look at the first two channels, say the red channel and the yellow channel, and ask how correlated the activations in these first two channels are. So, for example, in the lower right-hand corner, you have some activation in the first channel and some activation in the second channel, and that gives you a pair of numbers.

What you do is look at different positions across this block of activations, and at each position read off a pair of numbers: one from the first channel, the red channel, and one from the second channel, the yellow channel. Then, looking across all of these n_h by n_w positions, you ask how correlated these two numbers are.
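To make the pairing concrete, here is a small sketch in NumPy, using a made-up 4 by 4 by 5 block of random activations, of reading off those two channels at every position and multiplying the pairs together:

```python
import numpy as np

# Hypothetical activation block for one layer: n_h x n_w x n_c.
# The shape and random values are just for illustration.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4, 5))  # 4x4 spatial positions, 5 channels

# Read off the pair of numbers at every one of the n_h * n_w positions
# for the first two channels, then multiply and sum across positions.
ch_red = a[:, :, 0].ravel()     # channel k = 0 at all 16 positions
ch_yellow = a[:, :, 1].ravel()  # channel k' = 1 at the same positions
co_activation = np.dot(ch_red, ch_yellow)
```

A large value of `co_activation` means the two channels tend to be active at the same positions; this sum of products is exactly the quantity the style matrix below will collect for every pair of channels.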

So, why does this capture style? Let's look at another example. Here's one of the visualizations from the earlier video; it comes, again, from the paper by Matthew Zeiler and Rob Fergus that I referenced earlier. Let's say, for the sake of argument, that the red channel corresponds to this neuron, which tries to figure out whether there's this little vertical texture at a particular position, and that the second, yellow channel corresponds to this neuron, which is vaguely looking for orange-colored patches.

What does it mean for these two channels to be highly correlated? Well, if they're highly correlated, it means that whatever part of the image has this type of subtle vertical texture will probably also have an orange-ish tint. And what does it mean for them to be uncorrelated? Well, it means that wherever there is this vertical texture, there probably won't be an orange-ish tint.

So the correlation tells you which of these high-level texture components tend to occur, or not occur, together in parts of the image. The degree of correlation gives you one way of measuring how often these different high-level features, such as this vertical texture or this orange tint, occur together and don't occur together in different parts of an image.

And so, if we use the degree of correlation between channels as a measure of style, then what you can do is measure the degree to which, in your generated image, the first channel is correlated or uncorrelated with the second channel. That will tell you how often this type of vertical texture occurs, or doesn't occur, together with this orange-ish tint in the generated image, and it gives you a measure of how similar the style of the generated image is to the style of the input style image.

So let's now formalize this intuition. What you're going to do is, given an image, compute something called a style matrix, which will measure all those correlations we talked about on the last slide.

More formally, let a^[l]_(i,j,k) denote the activation at position i,j,k in hidden layer l, so that i indexes into the height, j indexes into the width, and k indexes across the different channels. In the previous slide, we had five channels, so k would index across those five channels.

What the style matrix does is this: you're going to compute a matrix called G superscript [l]. This is going to be an n_c by n_c dimensional matrix, so it's a square matrix. Remember, you have n_c channels, and so you need an n_c by n_c matrix in order to measure how correlated each pair of them is. In particular, G^[l]_(k,k') will measure how correlated the activations in channel k are with the activations in channel k'. Here, k and k' range from 1 through n_c, the number of channels in that layer.

More formally, here's how you compute G^[l]; I'll just write down the formula for computing one element, the (k,k') element:

G^[l]_(k,k') = sum over i of sum over j of a^[l]_(i,j,k) * a^[l]_(i,j,k')

Here, remember, i and j index over the different positions in the block, over the height and width: i sums from 1 to n_h, and j sums from 1 to n_w, while k and k' index over the channels, so they range from 1 to the total number of channels in that layer of the neural network. So all this is doing is summing over the different positions of the image, over the height and width, and multiplying together the activations of channels k and k'. That's the definition of G_(k,k'), and you do this for every value of k and k' to compute the matrix G, also called the style matrix.

Notice that if both of these activations tend to be large together, then G_(k,k') will be large, whereas if they're uncorrelated, then G_(k,k') might be small. And technically, I've been using the term correlation to convey the intuition, but this is actually the unnormalized cross-covariance, because we're not subtracting out the mean; we're just multiplying these elements directly.
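Putting the formula together, here is a minimal NumPy sketch of computing the style matrix; the activation block here is randomly generated just for illustration:

```python
import numpy as np

def style_matrix(a):
    """Unnormalized style (Gram) matrix of an n_h x n_w x n_c activation block:
    G[k, k'] = sum over all positions (i, j) of a[i, j, k] * a[i, j, k']."""
    n_h, n_w, n_c = a.shape
    a_flat = a.reshape(n_h * n_w, n_c)  # rows are positions, columns are channels
    return a_flat.T @ a_flat            # n_c x n_c matrix of channel products

# Illustrative random activations: 4x4 spatial grid, 5 channels.
rng = np.random.default_rng(1)
a = rng.standard_normal((4, 4, 5))
G = style_matrix(a)
```

Note that `G` is symmetric, since multiplying channel k by channel k' is the same as multiplying k' by k.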

So this is how you compute the style of an image. And you'd actually do this for both the style image S and the generated image G. So, just to distinguish that this is the style image, maybe let me add a round bracket (S) there, to denote that this is the style matrix for the image S, and those are the activations on the image S. You then compute the same thing for the generated image: it's really the same sum over i and sum over j of a^[l](G)_(i,j,k) times a^[l](G)_(i,j,k'), with the same summation indices, and to denote that this is for the generated image, I'll just put the round bracket (G) there.

So now you have two matrices that capture the style of the image S and the style of the image G. By the way, we've been using the capital letter G to denote these matrices. In linear algebra, these are also called Gram matrices, but in this video I'm just going to use the term style matrix; it's because of the term Gram matrix that most papers use capital G to denote these matrices.

Finally, the cost function — the style cost function. If you're computing this on layer l between S and G, you can now define it as just the difference between these two matrices; since these are matrices, you take the squared Frobenius norm. Concretely, this is the sum of squares of the element-wise differences between the two matrices: the sum over k and the sum over k' of (G^[l](S)_(k,k') minus G^[l](G)_(k,k')) squared. The authors actually used as the normalization constant 1 over (2 times n_h^[l] times n_w^[l] times n_c^[l]) squared, and you can put that out front here as well. But the normalization constant doesn't matter that much, because this cost gets multiplied by some hyperparameter beta anyway.

So, just to finish up, this is the style cost function defined using layer l. As you saw on the previous slide, it's basically the squared Frobenius norm of the difference between the two style matrices computed on the image S and on the image G, divided by that normalization constant, which isn't that important.
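As a sketch, the per-layer style cost could be computed like this in NumPy, assuming each activation block is stored as an n_h by n_w by n_c array:

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Style cost for one layer: squared Frobenius norm of the difference
    between the style matrices of the style image S and the generated image
    G, scaled by the 1 / (2 * n_h * n_w * n_c)^2 normalization constant."""
    n_h, n_w, n_c = a_S.shape
    G_S = a_S.reshape(-1, n_c).T @ a_S.reshape(-1, n_c)  # style matrix of S
    G_G = a_G.reshape(-1, n_c).T @ a_G.reshape(-1, n_c)  # style matrix of G
    scale = 1.0 / (2.0 * n_h * n_w * n_c) ** 2
    return scale * np.sum((G_S - G_G) ** 2)

# Illustrative random activations standing in for one layer's output.
rng = np.random.default_rng(3)
a_S = rng.standard_normal((4, 4, 5))
a_G = rng.standard_normal((4, 4, 5))
J_layer = layer_style_cost(a_S, a_G)
```

The cost is zero when the two style matrices match exactly, and grows as the channel correlations of the generated image drift away from those of the style image.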

And finally, it turns out that you get more visually pleasing results if you use the style cost function from multiple different layers. So the overall style cost function, you can define as the sum over all the different layers of the style cost function for that layer, weighted by some set of additional hyperparameters, which we'll denote as lambda^[l] here.

What this does is allow you to use different layers in the neural network: both the early ones, which measure relatively simple low-level features like edges, and some later layers, which measure high-level features. It causes the neural network to take both low-level and high-level correlations into account when computing style.

And in the following exercise, you'll gain more intuition about what might be reasonable choices for these hyperparameters lambda as well.
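A minimal sketch of that weighted sum over layers, assuming you've already extracted the activation blocks for each chosen layer into dictionaries keyed by layer name (the layer names, shapes, and lambda values here are all hypothetical):

```python
import numpy as np

def layer_style_cost(a_S, a_G):
    """Per-layer style cost: normalized squared Frobenius norm of the
    difference between the two style (Gram) matrices."""
    n_h, n_w, n_c = a_S.shape
    G_S = a_S.reshape(-1, n_c).T @ a_S.reshape(-1, n_c)
    G_G = a_G.reshape(-1, n_c).T @ a_G.reshape(-1, n_c)
    return np.sum((G_S - G_G) ** 2) / (2.0 * n_h * n_w * n_c) ** 2

def total_style_cost(acts_S, acts_G, lambdas):
    """Weighted sum of the per-layer style costs; `lambdas` maps each
    layer name to its hyperparameter lambda^[l]."""
    return sum(lam * layer_style_cost(acts_S[layer], acts_G[layer])
               for layer, lam in lambdas.items())

# Hypothetical example with two layers of random activations.
rng = np.random.default_rng(2)
acts_S = {"conv2": rng.standard_normal((8, 8, 16)),
          "conv4": rng.standard_normal((4, 4, 32))}
acts_G = {"conv2": rng.standard_normal((8, 8, 16)),
          "conv4": rng.standard_normal((4, 4, 32))}
lambdas = {"conv2": 0.5, "conv4": 0.5}
J_style = total_style_cost(acts_S, acts_G, lambdas)
```

Including an early layer such as "conv2" weights low-level correlations like edges, while a later layer such as "conv4" weights higher-level texture correlations.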

And so, just to wrap this up, you can now define the overall cost function as alpha times the content cost between C and G, plus beta times the style cost between S and G, and then use gradient descent, or a more sophisticated optimization algorithm if you want, to try to find an image G that minimizes this cost function J(G). And if you do that, you'll be able to generate some pretty nice novel artwork.
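That overall objective is just a weighted sum; as a sketch (the default alpha and beta values here are arbitrary placeholders, not recommendations):

```python
def overall_cost(j_content, j_style, alpha=10.0, beta=40.0):
    """Overall cost J(G) = alpha * J_content(C, G) + beta * J_style(S, G).
    alpha and beta are the hyperparameters trading off content fidelity
    against style fidelity."""
    return alpha * j_content + beta * j_style
```

You would evaluate this inside the optimization loop at each step, then take a gradient step on the pixels of the generated image G.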

So that's it for neural style transfer; I hope you have fun implementing it in this week's programming exercise. Before wrapping up this week, there's just one last thing I want to share with you, which is how to do convolutions over 1D or 3D data, rather than over only 2D images. Let's go on to the last video.
