For the problem of dimensionality reduction, by far the most popular

and most commonly used algorithm is something called

principal components analysis, or PCA.

In this video, I'd like to start talking about the problem formulation for PCA.

In other words, let's try to formulate,

precisely, exactly what we would like PCA to do.

Let's say we have a data set like this.

So, this is a data set of examples x in R^2, and let's say I want to

reduce the dimension of the data from two-dimensional to one-dimensional.

In other words, I would like to find a line onto which to project the data.

So what seems like a good line onto which to project the data?

A line like this might be a pretty good choice.

And the reason we think this might be a good choice is that if you look at where

the projected versions of the points fall, so I take this point and

project it down here,

and these points get projected here, and here, and here, and here,

what we find is that the distance between each point and

its projected version is pretty small.

That is, these blue line segments are pretty short.
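
To make the picture concrete, here is a minimal sketch in Python with NumPy of projecting 2D points onto a line through the origin and measuring the length of each segment between a point and its projection. The data values, the direction `u`, and the helper name `project_onto_line` are all made up for illustration; none of them come from the lecture.

```python
import numpy as np

def project_onto_line(X, u):
    """Project each row of X onto the line through the origin with direction u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)      # make the direction a unit vector
    z = X @ u                      # 1D coordinate of each point along the line
    X_proj = np.outer(z, u)        # projected points, expressed back in 2D
    return z, X_proj

# Hypothetical 2D data with zero mean in each feature (illustrative values only).
X = np.array([[-2.0, -1.8],
              [-1.0, -1.2],
              [ 0.0,  0.1],
              [ 1.0,  0.9],
              [ 2.0,  2.0]])

z, X_proj = project_onto_line(X, u=[1.0, 1.0])

# Length of each segment joining a point to its projection on the line.
segment_lengths = np.linalg.norm(X - X_proj, axis=1)
print(segment_lengths)
```

For this made-up data the line in direction (1, 1) passes close to every point, so all of the segment lengths come out small.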

So what PCA does, formally, is it tries to find a lower-dimensional surface,

really a line in this case, onto which to project the data so

that the sum of squares of these little blue line segments is minimized.

The length of those blue line segments,

that's sometimes also called the projection error.

And so what PCA does is it tries to find a surface onto which to project the data so

as to minimize that.
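
As a rough sketch of this objective, the Python snippet below computes the sum of squared projection errors for a candidate direction and compares a brute-force search over directions against the direction given by a singular value decomposition. Using the SVD here is an assumption of this sketch (this part of the lecture has not said how PCA is computed); it is one standard way to obtain the minimizing direction for zero-mean data. The data values and the function name `sum_squared_projection_error` are hypothetical.

```python
import numpy as np

def sum_squared_projection_error(X, u):
    """Sum of squared distances from each row of X to the line spanned by u."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)              # unit direction of the line
    residual = X - np.outer(X @ u, u)      # point minus its projection
    return float(np.sum(residual ** 2))

# Hypothetical zero-mean 2D data (illustrative values only).
X = np.array([[-2.0, -1.8],
              [-1.0, -1.2],
              [ 0.0,  0.1],
              [ 1.0,  0.9],
              [ 2.0,  2.0]])

# Assumption: take the first right singular vector of the zero-mean data
# as the direction of the line.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
u_best = Vt[0]
best_error = sum_squared_projection_error(X, u_best)

# Brute-force check: try lines at one-degree steps from 0 to 180 degrees.
angles = np.linspace(0.0, np.pi, 181)
candidate_errors = [
    sum_squared_projection_error(X, [np.cos(a), np.sin(a)]) for a in angles
]

print("error for SVD direction:", best_error)
print("smallest brute-force error:", min(candidate_errors))
```

None of the candidate lines should do better than the SVD direction, which is what "minimizing the projection error" means here.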

As an aside, before applying PCA, it's standard practice to first

perform mean normalization and feature scaling so that the features x1 and

x2 have zero mean and comparable ranges of values.

I've already done this for this example, but I'll come back to this later and

talk more about feature scaling and mean normalization in the context of PCA.
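
The mean normalization and feature scaling step mentioned above might look like the following minimal Python sketch. Dividing by the standard deviation is an assumption of this sketch; the lecture only asks for comparable ranges, and other scalings (for example, dividing by max minus min) would also fit. The function name `mean_normalize_and_scale` and the raw data values are hypothetical.

```python
import numpy as np

def mean_normalize_and_scale(X):
    """Subtract each feature's mean, then divide by its standard deviation."""
    mu = X.mean(axis=0)                        # per-feature mean
    sigma = X.std(axis=0)                      # per-feature standard deviation
    sigma = np.where(sigma == 0, 1.0, sigma)   # avoid dividing by zero for constant features
    return (X - mu) / sigma, mu, sigma

# Hypothetical raw data whose two features have very different ranges.
X_raw = np.array([[2104.0, 3.0],
                  [1600.0, 3.0],
                  [2400.0, 4.0],
                  [1416.0, 2.0],
                  [3000.0, 4.0]])

X_norm, mu, sigma = mean_normalize_and_scale(X_raw)
print(X_norm.mean(axis=0))   # approximately zero for each feature
print(X_norm.std(axis=0))    # approximately one for each feature
```

Returning `mu` and `sigma` alongside the normalized data is a design choice that lets the same shift and scale be reapplied to new examples later.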