In this session, we're going to cover a couple of topics.

One main idea is,

if we don't rely on R-squared,

if we don't rely on statistical tests but just have some data

that we set aside for testing,

how can we use that data to improve our models?

That's one part of it.

The second part is,

how do we do regression when

the response variable is not a continuous number,

but a 0-1 response

or a red-blue-green kind of response?

We're going to do these two big ideas in this session.

So here, I'll break this up into four parts.

The first part may seem like splitting hairs,

but the question is whether there is a difference between

making a prediction and explaining a phenomenon.

That's the first thing we want to look at,

and then we want to use the idea

of a hold-out sample in this context.

The second thing we want to do is,

how do we use regression

when the goal is to predict?

How will we use the hold-out sample?

How will we measure performance with the hold-out sample?
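
To make the hold-out idea concrete, here is a minimal sketch in Python. It assumes made-up data with one predictor, an 80/20 split, and root mean squared error as the performance measure. The helper names fit_line and rmse are hypothetical, chosen only for illustration; the session itself may use different tools and measures.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rmse(xs, ys, a, b):
    """Root mean squared prediction error of the line a + b*x on (xs, ys)."""
    squared_errors = [(y - (a + b * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up data (an assumption for this sketch): y = 1 + 2x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Shuffle the row indices, then keep 20% aside as the hold-out sample.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, holdout_idx = indices[:80], indices[80:]

train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
holdout_x = [xs[i] for i in holdout_idx]
holdout_y = [ys[i] for i in holdout_idx]

# Fit on the training rows only; the hold-out rows are never used for fitting.
a, b = fit_line(train_x, train_y)

# Compare in-sample error with error on data the model has not seen.
train_rmse = rmse(train_x, train_y, a, b)
holdout_rmse = rmse(holdout_x, holdout_y, a, b)
print("train RMSE:", train_rmse)
print("hold-out RMSE:", holdout_rmse)
```

The number to watch is the hold-out RMSE, because it is computed on rows that played no part in fitting the line.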

Very quickly, we will pass

through one other idea, which is:

how do we improve our model by

deciding which variables to keep in our regression

and which variables to leave out,

and why is that important, especially with big data?

The last thing would be,

how do we extend our models to handle binary variables?

So we'll be talking about

a new type of regression called logistic regression.
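
As a preview, here is a minimal sketch of logistic regression in Python with a single predictor. It models the probability of a 1 response as 1 / (1 + exp(-(a + b*x))). The tiny data set, the function name fit_logistic, the learning rate, and the number of steps are all assumptions made for illustration, and plain gradient descent stands in for whatever fitting routine a statistics package would use.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(a + b*x) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a + b * x) - y  # predicted probability minus 0-1 label
            grad_a += err
            grad_b += err * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Made-up 0-1 responses (an assumption): larger x makes a 1 more likely.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(xs, ys)
p_low = sigmoid(a + b * 1.0)   # predicted probability of a 1 at x = 1
p_high = sigmoid(a + b * 5.0)  # predicted probability of a 1 at x = 5
print("P(y=1 | x=1):", p_low)
print("P(y=1 | x=5):", p_high)
```

The output is a probability rather than a number on the original scale, which is what makes this form suitable for a 0-1 response.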