In this video, we're going to make some final comments

on the least squares regression fitting of data.

We're going to look at how we really do this in anger in

the real world, using computational tools like

Matlab or Python or R. So, you've made the gradient descent least squares minimiser,

and then you used that to solve the sandpit problem already.

There are a few comments to make before we move on.

In reality, there are a huge number of solvers for non-linear least squares problems.

For instance, we can observe that if we do

a Taylor series expansion of chi-squared, then the second-order term

contains the second derivative, the Hessian, which gives us information about

the curvature: it's the gradient of the gradient, that is, the gradient of the Jacobian.

And therefore, we can shoot directly for where

the Jacobian is zero, just as in the Newton-Raphson episode, using that second derivative.
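As a minimal sketch of that idea, here is a Newton iteration on a toy chi-squared surface. The surface, its gradient, and its Hessian are all invented for illustration; real problems would compute them from the residuals.

```python
import numpy as np

# Newton's method: jump towards where the gradient (the Jacobian) is zero,
# using the Hessian to account for the curvature of the surface.
def newton_minimise(grad, hess, p0, steps=10):
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        # Solve H @ delta = grad rather than inverting H explicitly.
        p = p - np.linalg.solve(hess(p), grad(p))
    return p

# Toy chi-squared surface: a quadratic bowl with its minimum at (2, -1).
grad = lambda p: np.array([2.0 * (p[0] - 2.0), 6.0 * (p[1] + 1.0)])
hess = lambda p: np.array([[2.0, 0.0], [0.0, 6.0]])

p_min = newton_minimise(grad, hess, [10.0, 10.0])
# For a purely quadratic surface, one Newton step lands exactly on the minimum.
```

That one-step behaviour on a quadratic bowl is exactly why the Hessian is attractive near the minimum, where chi-squared is approximately quadratic.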

Now, using the Hessian would be faster than

simply taking steps with a steepest descent algorithm.

Effectively, we'd be using the Hessian to give us

a guess as to the size of the step we should take in gradient descent.

The problem is that often the Hessian isn't

very stable, especially far from the minimum.

So the Levenberg–Marquardt method uses steepest descent far from the minimum,

and then switches to use the Hessian as it gets close to

the minimum, based on whether chi-squared is getting better or not.

If it is getting better, it uses the Hessian.

And if it's in trouble,

it uses steepest descent.
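In practice you rarely code Levenberg-Marquardt yourself; scipy, for instance, exposes it through `least_squares` with `method='lm'`. Here is a sketch on made-up exponential-decay data; the model and the "true" parameters 2.5 and 1.3 are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Residuals of a made-up model y = a * exp(-b * x) against the data.
def residuals(p, x, y):
    a, b = p
    return y - a * np.exp(-b * x)

# Synthetic noisy data generated from a = 2.5, b = 1.3.
rng = np.random.default_rng(0)
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(-1.3 * x) + 0.02 * rng.standard_normal(x.size)

# method='lm' selects Levenberg-Marquardt (a wrapper around MINPACK).
fit = least_squares(residuals, x0=[1.0, 1.0], args=(x, y), method='lm')
# fit.x should come out close to the true (2.5, 1.3).
```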

There is also the Gauss-Newton method and the BFGS method amongst many others that either

use the Hessian directly or build up

information about the Hessian over successive iterations.

And depending on the convergence,

then different methods may be better than others.
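As one example of that family, scipy's general-purpose minimiser can apply BFGS directly to chi-squared. The model and noiseless data below are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Chi-squared for a made-up model y = a * exp(-b * x).
def chi2(p, x, y):
    a, b = p
    return np.sum((y - a * np.exp(-b * x)) ** 2)

# Noiseless synthetic data with a = 2.5, b = 1.3.
x = np.linspace(0, 4, 50)
y = 2.5 * np.exp(-1.3 * x)

# BFGS never forms the Hessian directly; it builds up an approximation
# to it from the gradients seen over successive iterations.
res = minimize(chi2, x0=[1.0, 1.0], args=(x, y), method='BFGS')
```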

Robust fitting is another topic you should

be aware of in case you need to look it up later.

If we come back to Anscombe's quartet here,

we see that the bottom-left dataset has just that one flyer data point.

A truly robust fitting method will be unbothered by such a data point.

One approach to robust fitting minimises the absolute values of

the deviations instead of their squares.

So it doesn't weight the points that are far away from the line as strongly.

And that means it produces a fit that visually looks a bit better.
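As a sketch of the idea, scipy's `least_squares` accepts robust loss functions, for example `loss='soft_l1'`, a smooth relative of the absolute deviation. The straight-line data and the single flyer point below are invented to mimic the Anscombe situation.

```python
import numpy as np
from scipy.optimize import least_squares

# Residuals of a straight line y = m * x + c.
def residuals(p, x, y):
    return y - (p[0] * x + p[1])

# Clean line y = 0.5 * x + 1 with one wild "flyer" point added.
x = np.arange(10, dtype=float)
y = 0.5 * x + 1.0
y[3] += 10.0

# Ordinary least squares is dragged off by the flyer...
plain = least_squares(residuals, [0.0, 0.0], args=(x, y))
# ...while a robust loss down-weights large residuals.
robust = least_squares(residuals, [0.0, 0.0], args=(x, y), loss='soft_l1')
```

The robust slope in `robust.x[0]` should land much closer to the true 0.5 than the plain least squares slope does.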

Now, let's turn to look at how you do this in the real world.

In Matlab, it's easy.

You simply import your data using the import data tab at the top of the screen,

and then flick over to the apps tab and start up the curve fitting app.

There, you can even symbolically define your own fitting function.

You have to pick a starting guess and it will fit your function for you.

Or you can use a pre-built function that already knows its Jacobian,

so will therefore be faster and more efficient.

In Python, it's very nearly as simple.

In scientific Python,

the scipy set of modules,

the optimize module includes a least squares curve-fitting minimiser,

curve_fit, which is at the link below this video.

Accompanying this video, we've left you one of the scipy examples

for fitting some data so you can see how to use it. This is it.

And it's amazing: the fitting really works very well,

producing this nice graph here,

fitting this apparently noisy data with this crazy function.

And the fitting only takes three lines,

the three bold ones here.

Two lines to define the function and one line to do the fit.

The rest of that code is all about plotting and importing the data,

or, in this case, actually generating it.
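The accompanying file isn't reproduced here, but a sketch in the same spirit looks like this; the model and the generated data are stand-ins, not the course's own example.

```python
import numpy as np
from scipy.optimize import curve_fit

# The "three lines" of actual fitting:
# two lines define the model function...
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# (Everything below, apart from the curve_fit call, is just
# generating stand-in data, as the video describes.)
xdata = np.linspace(0, 4, 50)
rng = np.random.default_rng(1)
ydata = func(xdata, 2.5, 1.3, 0.5) + 0.05 * rng.standard_normal(xdata.size)

# ...and one line does the fit.
popt, pcov = curve_fit(func, xdata, ydata)
```

`popt` holds the best-fit parameters and `pcov` their covariance matrix, which tells you how confident to be in each one.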

Effectively, the set of routines in scipy is the modern implementation of

what's called MINPACK which was a Fortran set of

routines published by Jorge Moré in 1980,

and described in the book Numerical Recipes.

And it's absolutely astonishing how easy it is to do this stuff now.

When I was a student, we used to read this long, difficult textbook.

Now, you just write one line in python.

Just as in Python,

in the R statistical programming language,

there's also a minimiser for doing non-linear least squares fitting of models to data.

So if you like R for looking at data,

you can also do all this stuff just as easily in R. So what I want you to do

now is write a Python code block to fit the Gaussian distribution shown here.

You'll need to give it a starting guess and we'll give you

the input data for the height distribution in the population.

This is the final height distribution dataset

that we've been talking about schematically all the way through these two courses.

And what you'll find is that the mean height B here is

about 178 centimeters and

the characteristic width for this distribution is about 11 centimeters.

That's parameter C in the equation here.
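As a sketch of what that code block might look like, here is a fit assuming a Gaussian of the form A exp(-(x-B)²/(2C²)); the exact form in the exercise may differ, and the data below are a synthetic stand-in for the real height dataset supplied with the course.

```python
import numpy as np
from scipy.optimize import curve_fit

# Gaussian model: A is the peak height, B the mean, C the width.
def gaussian(x, A, B, C):
    return A * np.exp(-((x - B) ** 2) / (2 * C ** 2))

# Synthetic stand-in for the height-distribution data:
# heights in cm, mean 178, width 11, plus a little noise.
x = np.linspace(140, 220, 81)
rng = np.random.default_rng(2)
y = gaussian(x, 1.0, 178.0, 11.0) + 0.01 * rng.standard_normal(x.size)

# A deliberately rough, but overlapping, starting guess.
p0 = [1.0, 170.0, 20.0]
popt, pcov = curve_fit(gaussian, x, y, p0=p0)
# popt[1] should come out near 178 and popt[2] near 11.
```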

Now it's useful at this point to think about why we need to have the starting guess.

If we started with a guess here for the mean of 100 centimeters,

the model curve wouldn't overlap with the data at all.

So if we made a little move in B,

we'd get no change,

and therefore the gradient of chi-squared with respect to the mean B would be zero.

So, the algorithm wouldn't know what direction to go in.

We wouldn't get a sensible answer for the Jacobian or for grad.

And therefore, our algorithm wouldn't know where to go to find the minimum.

So in doing any of this data fitting, it's vital to come up

with a good means for generating a starting guess.

Here it's easy: you could pick the location of the biggest value, for instance.

Relatedly, it's equally important to compare

the fit to the data and ask whether you believe the final fit.

So, what we've done in this video

is finish our little discussion of using vectors and

multivariate calculus together to help us

do optimisations of functions and to fit functions to data.

And it turns out in the end to be really easy computationally.

We can fit a function in just a few lines of code in Python, Matlab, or R.

But now, after all this work you understand

something of how those algorithms work under the hood,

and that means that hopefully you'll be much better at

figuring out how to fix them when they go wrong,

for instance, when the Jacobian doesn't make any sense.

And also, you've got all the maths you need to

access the next course in this specialisation on machine learning.