For those of you who are more familiar with linear algebra,

what some students have asked me is,

when computing theta equals (X transpose X) inverse times X transpose y,

what if the matrix X transpose X is non-invertible?

So for those of you that know a bit more linear algebra

you may know that only some matrices are invertible and

some matrices do not have an inverse. We call those non-invertible matrices,

or singular or degenerate matrices.

The problem of

X transpose X being non-invertible should happen pretty rarely.

And in Octave if you implement this to compute theta,

it turns out that this will actually do the right thing.

I'm getting a little technical now, and I don't want to go into the details,

but Octave has two functions for inverting matrices.

One is called pinv, and the other is called inv.

And the differences between these two are somewhat technical.

One's called the pseudo-inverse, one's called the inverse.

But you can show mathematically that, so

long as you use the pinv function, this will actually compute

the value of theta that you want, even if X transpose X is non-invertible.

The specific details of the difference between pinv and inv

are somewhat advanced numerical computing concepts

that I don't really want to get into.
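To make the pinv point concrete, here is a minimal sketch. It is written in Python with NumPy rather than Octave, which is an assumption made for illustration; `np.linalg.pinv` stands in for Octave's pinv. The toy data, the variable names, and the `rcond` cutoff are all made up for this example and do not come from the lecture.

```python
import numpy as np

# Hypothetical toy data: the third column is exactly twice the second,
# so the columns of X are linearly dependent and X^T X is singular.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x1), x1, 2.0 * x1])
y = 1.0 + x1                      # made-up targets that a linear model can fit exactly

A = X.T @ X                       # singular 3 x 3 matrix

# Normal equation with the pseudo-inverse instead of the ordinary inverse.
# rcond=1e-10 is an assumed cutoff chosen for this toy problem: singular values
# below rcond times the largest one are treated as zero.
theta = np.linalg.pinv(A, rcond=1e-10) @ X.T @ y

predictions = X @ theta
print(theta)
print(predictions)
```

Many different theta vectors fit this data equally well, because any change in the second parameter can be offset by the third. The pseudo-inverse picks one of them, and the resulting predictions still match y. The ordinary inverse is not guaranteed to fail loudly on a matrix like this; it can also return numerically meaningless values.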

But I thought in this optional video, I'll try to give you a little bit of intuition

about what it means for X transpose X to be non-invertible.

Those of you who know a bit more linear algebra might be interested.

I'm not going to prove this mathematically, but if X transpose X is non-invertible,

there are usually two common causes for this.

The first cause is if somehow in your learning problem you have redundant

features.

Concretely, if you're trying to predict housing prices and if x1 is the size of

the house in square feet and x2 is the size of the house in square meters,

then you know 1 meter is equal to 3.28 feet, rounded to two decimals.

And so your two features will always satisfy the constraint x1

equals 3.28 squared times x2.

And for those of you that are somewhat advanced in linear algebra,

you can actually show that if your two features

are related by a linear equation like this,

then the matrix X transpose X will be non-invertible.
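For those who want to see why, here is a short sketch of the argument. It assumes the design matrix X has three columns, the all-ones intercept column x0 followed by x1 and x2. The symbols c and v are introduced here for illustration and are not from the lecture; c stands for 3.28 squared.

```latex
x_1 = c\,x_2 \ \text{for every training example}
\;\Longrightarrow\;
X v = 0 \quad\text{with}\quad v = \begin{pmatrix} 0 \\ 1 \\ -c \end{pmatrix} \neq 0
\;\Longrightarrow\;
(X^{\top} X)\,v = X^{\top}(X v) = 0 .
```

A square matrix that sends a nonzero vector to zero cannot have an inverse, so X transpose X is non-invertible.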

The second thing that can cause X transpose X to be non-invertible is if you

are trying to run the learning algorithm with a lot of features.

Concretely, this happens if m, the number of training examples, is less than or equal to n, the number of features.

For example, if you imagine that you have m = 10 training examples

and that you have n = 100 features, then you're trying to fit

a parameter vector theta which is, you know, n plus one dimensional.

So this is 101-dimensional:

you're trying to fit 101 parameters from just 10 training examples.
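The m = 10, n = 100 case can be checked numerically. The sketch below again uses Python with NumPy as an assumed stand-in for Octave, and fills the design matrix with seeded random numbers, which are made-up values for illustration only. It shows that X transpose X is a 101 by 101 matrix whose rank is at most 10, so it cannot be inverted.

```python
import numpy as np

m, n = 10, 100                    # 10 training examples, 100 features (numbers from the lecture)
rng = np.random.default_rng(0)    # seeded generator; the feature values are made up

# Design matrix with a leading column of ones for the intercept: shape (m, n + 1).
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])

A = X.T @ X                       # shape (n + 1, n + 1), i.e. 101 x 101
rank = np.linalg.matrix_rank(A)   # numerical rank; mathematically it cannot exceed m

print(A.shape, rank)
```

A 101 by 101 matrix needs rank 101 to be invertible. With only 10 rows in X, the rank of X transpose X is capped at 10, no matter what the feature values are.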