So if your input image is 64 pixels by 64 pixels,

then you would have three 64 by 64 matrices

corresponding to the red, green and blue pixel intensity values for your image.

Although to make this little slide I drew these as much smaller matrices, so

these are actually 5 by 4 matrices rather than 64 by 64.

So to turn these pixel intensity values into a feature vector, what we're

going to do is unroll all of these pixel values into an input feature vector x,

defined for this image as follows.

We're just going to take all the pixel values 255, 231, and so on,

until we've listed all the red pixels.

And then eventually 255, 134, and so

on until we get a long feature vector listing out all the red,

green and blue pixel intensity values of this image.

If this image is a 64 by 64 image, the total dimension

of this vector x will be 64 times 64 times 3, because that's

the total number of values we have in all of these matrices,

which in this case turns out to be 12,288;

that's what you get if you multiply all those numbers.

And so we're going to use nx = 12,288

to represent the dimension of the input features x.

And sometimes for brevity, I will also just use lowercase n

to represent the dimension of this input feature vector.
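
To make the unrolling concrete, here is a minimal NumPy sketch. The random image and its (height, width, channel) layout are assumptions made purely for illustration; the lecture does not specify how the image is stored.

```python
import numpy as np

# Hypothetical 64x64 RGB image with random intensities in [0, 255].
# The (height, width, channel) layout is an assumption for this sketch.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Move the channel axis to the front so that all red values come first,
# then all green, then all blue, and flatten into a column vector.
x = image.transpose(2, 0, 1).reshape(-1, 1)

n_x = x.shape[0]
print(n_x)  # 12288
```

The transpose is only there so that the ordering matches the one described above, red values first, then green, then blue.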

So in binary classification, our goal is to learn a classifier that can take as input

an image represented by this feature vector x

and predict whether the corresponding label y is 1 or 0,

that is, whether this is a cat image or a non-cat image.

Let's now lay out some of the notation that we'll

use throughout the rest of this course.

A single training example is represented by a pair,

(x, y), where x is an nx-dimensional feature

vector and y, the label, is either 0 or 1.

Your training set will comprise lowercase m training examples.

And so your training set will be written (x(1), y(1)), which is the input and

output for your first training example, (x(2), y(2)) for

the second training example, up to (x(m), y(m)), which is your last training example.

And then that altogether is your entire training set.

So I'm going to use lowercase m to denote the number of training examples.

And sometimes, to emphasize that this is the number of training examples,

I might write this as m = m subscript train.

And when we talk about a test set,

we might sometimes use m subscript test to denote the number of test examples.

So that's the number of test examples.

Finally, to put all of the training examples into a more compact notation,

we're going to define a matrix, capital X,

defined by taking your training set inputs x(1), x(2), and

so on and stacking them in columns.

So we take x(1) and put that as the first column of this matrix,

x(2), put that as the second column, and so on down to x(m);

then this is the matrix capital X.

So this matrix X will have m columns, where m is the number of training

examples, and the number of rows, or the height of this matrix, is nx.

Notice that in other courses, you might see the matrix capital

X defined by stacking up the training examples in rows like so,

x(1) transpose down to x(m) transpose.

It turns out that when you're implementing neural networks, using

the convention I have on the left will make the implementation much easier.

So just to recap, X is an nx by m dimensional matrix, and

when you implement this in Python,

you'll see that X.shape, which is the Python command for

finding the shape of the matrix, is (nx, m).

That just means it is an nx by m dimensional matrix.
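
As a rough sketch of this column-stacking convention in NumPy: the training set size m = 5 and the random feature vectors below are placeholders chosen for illustration, not values from the lecture.

```python
import numpy as np

n_x, m = 12288, 5  # m = 5 is an arbitrary illustrative training set size

rng = np.random.default_rng(0)
# Hypothetical list of m feature vectors, each of shape (n_x, 1).
examples = [rng.random((n_x, 1)) for _ in range(m)]

# Stack the examples side by side so that column i holds x(i).
X = np.hstack(examples)
print(X.shape)  # (12288, 5)

# The row-stacked convention seen elsewhere is just the transpose.
X_rows = X.T
print(X_rows.shape)  # (5, 12288)
```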

So that's how you group the training inputs x into a matrix X.

How about the output labels y?

It turns out that to make your implementation of a neural network easier,

it would be convenient to also stack y in columns.

So we're going to define capital Y to be equal to y(1), y(2),

up to y(m), like so.

So Y here will be a 1 by m dimensional matrix.

And again, to use the Python notation, the shape of Y will be (1, m).

Which just means this is a 1 by m matrix.
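
A small sketch of the same idea for the labels, assuming five made-up labels (1 for cat, 0 for non-cat). Note that a plain one-dimensional NumPy array would have shape (m,), so the reshape is what gives the 1 by m matrix.

```python
import numpy as np

# Hypothetical labels for m = 5 training examples: 1 = cat, 0 = non-cat.
labels = [1, 0, 0, 1, 1]

# Reshape into a single row so that column i holds y(i).
Y = np.array(labels).reshape(1, -1)
print(Y.shape)  # (1, 5)
```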

And as you implement your neural network later in this course, you'll find that a useful

convention is to take the data associated with different training

examples, and by data I mean either x or y, or other quantities you'll see later,

and to stack them in different columns, like we've done here for both x and y.