If you run a learning algorithm and it doesn't do as long as you are hoping,

almost all the time,

it will be because you have either a high bias problem or a high variance problem,

in other words, either an underfitting problem or an overfitting problem.

In this case, it's very important to figure out which of

these two problems is bias or variance or a bit of both that you actually have.

Because knowing which of these two things is happening would give

a very strong indicator for

whether the useful and promising ways to try to improve your algorithm.

In this video, I'd like to delve more deeply into

this bias and variance issue and understand them better as was

figure out how to look in a learning algorithm and

evaluate or diagnose whether we might have a bias problem or

a variance problem since this will be critical to figuring out

how to improve the performance of a learning algorithm that you will implement.

So, you've already seen this figure a few times where if you fit

two simple hypothesis like a straight line that underfits the data,

if you fit a two complex hypothesis,

then that might fit the training set perfectly but overfit the data and this

may be hypothesis of some intermediate level of complexities

of some maybe degree two polynomials or not too low and not too high degree

that's like just right and gives you the best generalization error over these options.

Now that we're armed with the notion of chain training and validation in test sets,

we can understand the concepts of bias and variance a little bit better.

Concretely, let's let our training error and cross

validation error be defined as in the previous videos.

Just say the squared error,

the average squared error,

as measured on the training sets or as measured on the cross validation set.

Now, let's plot the following figure.

On the horizontal axis I'm going to plot the degree of polynomial.

So, as I go to the right I'm going to be fitting higher and higher order polynomials.

So where the left of this figure where maybe d equals one,

we're going to be fitting

very simple functions whereas we're here on the right of the horizontal axis,

I have much larger values of ds,

of a much higher degree polynomial.

So here, that's going to correspond to fitting

much more complex functions to your training set.

Let's look at the training error and

the cross validation error and plot them on this figure.

Let's start with the training error.

As we increase the degree of the polynomial,

we're going to be able to fit our training set better and better and so if d equals one,

then there is high training error,

if we have a very high degree of polynomial our training error is going to be really low,

maybe even 0 because will fit the training set really well.

So, as we increase the degree of polynomial,

we find typically that the training error decreases.

So I'm going to write J subscript train of theta there,

because our training error tends to decrease with

the degree of the polynomial that we fit to the data.

Next, let's look at the cross-validation error or for that matter,

if we look at the test set error,

we'll get a pretty similar result as if we were to plot the cross validation error.

So, we know that if d equals one,

we're fitting a very simple function and so we may be

underfitting the training set and so it's going to be very high cross-validation error.

If we fit an intermediate degree polynomial,

we had d equals two in our example in the previous slide,

we're going to have a much lower cross-validation error because

we're finding a much better fit to the data.

Conversely, if d were too high.

So if d took on say a value of four,

then we're again overfitting,

and so we end up with a high value for cross-validation error.

So, if you were to vary this smoothly and plot a curve,

you might end up with a curve like that where that's JCV of theta.

Again, if you plot J test of theta you get something very similar.

So, this sort of plot also helps us

to better understand the notions of bias and variance.

Concretely, suppose you have applied

a learning algorithm and it's not performing as well as you are hoping,

so if your cross-validation set error or your test set error is high,

how can we figure out if the learning algorithm is suffering

from high bias or suffering from high variance?

So, the setting of a cross-validation error being high

corresponds to either this regime or this regime.

So, this regime on the left corresponds to a high bias problem.

That is, if you are fitting a overly low order polynomial such as

a d equals one when we really needed a higher order polynomial to fit to data,

whereas in contrast this regime corresponds to a high variance problem.

That is, if d the degree of polynomial was too large for the data set that we have,

and this figure gives us a clue for how to distinguish between these two cases.

Concretely, for the high bias case,

that is the case of underfitting,

what we find is that

both the cross validation error and the training error are going to be high.

So, if your algorithm is suffering from a bias problem,

the training set error will be

high and you might find that the cross validation error will also be high.

It might be close,

maybe just slightly higher, than the training error.

So, if you see this combination,

that's a sign that your algorithm may be suffering from high bias.

In contrast, if your algorithm is suffering from high variance,

then if you look here,

we'll notice that J train,

that is the training error,

is going to be low.

That is, you're fitting the training set very well,

whereas your cross validation error assuming that this is, say,

the squared error which we're trying to minimize say,

whereas in contrast your error on a cross validation set or your cross function or cross

validation set will be much bigger than your training set error.

So, this is a double greater than sign.

That's the map symbol for much greater thans,

denoted by two greater than signs.

So if you see this combination of values,

then that's a clue that

your learning algorithm may be suffering from high variance and might be overfitting.

The key that distinguishes these two cases is,

if you have a high bias problem,

your training set error will also be high is

your hypothesis just not fitting the training set well.

If you have a high variance problem,

your training set error will usually be low,

that is much lower than your cross-validation error.

So hopefully that gives you

a somewhat better understanding of the two problems of bias and variance.

I still have a lot more to say about bias and variance in the next few videos,

but what we'll see later is that by diagnosing whether

a learning algorithm may be suffering from high bias or high variance,

I'll show you even more details on how to do that in later videos.

But we'll see that by figuring out whether a learning algorithm may be

suffering from high bias or high variance or combination of both,

that that would give us much better guidance for what might be promising things to

try in order to improve the performance of a learning algorithm.