Welcome to calculus. I'm Professor Ghrist, and we're about to

begin lecture 14, Bonus Material. Well let's consider a more involved

example, this one motivated by a problem in statistics.

Let's say that you run an experiment and it gives you some data that is of the

following form: y equals m times x. That is, you measure x values and y

values. And you know that there's some linear

relationship between them, but you don't know the value of m.

Perhaps this is coming from a physical experiment, like trying to measure a

spring constant by measuring force and deflection.

Whatever the physical motivation, you are given some collection of data points, but

these data points are noisy. They don't fit perfectly on the line.

How do you determine the appropriate value of m?

Well, you could just draw a line, try to make it fit, and see if it looks right.

But wouldn't it be nice to have a more principled approach?

This is what statistics is meant for. So, let's assume that the inputs to your

problem are a collection of data points: x values, x sub i, and y values, y sub i,

that are paired. Now, in order to find the appropriate

value of m, we're going to write this as an optimization problem.

The method of least squares is a wonderful technique for determining the

optimal m. Consider the function s depending on m

that is given by the following. I'm going to look at the vertical

distance between the data points, and the line of slope m.

This vertical distance is given by y sub i minus m times x sub i.

What I'm going to want to do is add up all of those distances and then minimize.

Now there's a bit of a problem in that these distances are signed: they're

positive or negative because I'm really just looking at the change in y values.

So let's square that term, we have y sub i minus mx sub i, quantity squared.

And now let's sum all of those terms up over i.

This is going to give you a deviation of the data from the line of slope m.
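In symbols, the deviation just described is:

```latex
S(m) \;=\; \sum_i \bigl(y_i - m\,x_i\bigr)^2
```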

This function depends on m. If we chose a value of m like 0, well,

that would give a very large value of s.

In this case, what we want to do is find the value of m that minimizes this

deviation s. So let's proceed.
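Before we do, this deviation is easy to sketch in code. Here is a minimal version in Python; the names deviation, xs, and ys are my own choices, not from the lecture.

```python
def deviation(m, xs, ys):
    """Sum of squared vertical distances from the paired data
    points (x_i, y_i) to the line y = m*x."""
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

# Points lying exactly on y = 2x give zero deviation,
# while a poor slope such as m = 0 gives a large one.
print(deviation(2, [1, 2], [2, 4]))  # 0
print(deviation(0, [1, 2], [2, 4]))  # 20
```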

If we compute the derivative of s with respect to m, what would we get?

This looks scary, but it's not so bad. Differentiation is linear, so we can pass

the derivative inside the summation sign. Now, using the chain rule, what do we get?

Well we get twice quantity y sub i minus m times x sub i, times the derivative of

that quantity with respect to m. That derivative is negative x sub i.

Now if we distribute this multiplication and expand out into two sums, we get minus 2

times the sum over i of xi times yi, plus 2 times m times the sum over i of xi

squared. We can factor out that two and that m

because they appear in every summation term.
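Written out, the computation above reads:

```latex
\frac{dS}{dm}
  \;=\; \sum_i 2\,(y_i - m\,x_i)\,(-x_i)
  \;=\; -2\sum_i x_i y_i \;+\; 2m \sum_i x_i^2 .
```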

Now our goal is to compute the minimum. So we find the critical point by setting

this derivative equal to zero. Moving one sum over to the other side, we

see that 2 times the sum over i of xi times yi is equal to 2m times the sum over i of xi

squared. What is it that we're trying to solve

for? We're trying to solve for m, and so

cancelling the 2s and then dividing both sides by the sum of xi squared gives a

value of m equal to the sum over i of xi times yi divided by the sum over i of xi

squared. The question remains: is this critical

point a local min or a local max? Well you might guess that it's a local

min, but how would you show it for sure? Well, if we compute the second derivative

of s with respect to m, what will we get? It looks complicated, but there's really

only one m in that first derivative. And so, treating everything else as a

constant, we get that the second derivative is simply 2 times the sum over

i of xi squared. What do we note about that?

Well, we don't care what the xi values are.

When we square them, we get something non-negative.

So as long as at least one of the xi is nonzero, that sum of squares is positive, and we get a positive second

derivative, and hence a minimum. This value of m is going to minimize our

deviation and give us a best fit line. Now, what happens if our experiment is a

little bit different? The line that we're looking for doesn't

necessarily pass through the origin. Well it doesn't seem as though the

problem has really changed much at all. We're just again looking for a straight

line. But now we have to worry about not only

the slope, but also the y intercept which we might call b.

We're looking for a line of the form y equals m x plus b.

I wonder, could we do the same thing? Well the vertical distance would involve

a b term in this s function. And now, this function would depend not

only on m but on b. And this leads us to some very

interesting questions because we do not know how to find a max or min of a

function that depends on more than one input.
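To make the contrast concrete, here is a short sketch in Python; all of the names and the data are my own invention. The two-input deviation is easy enough to write down, even though minimizing it is beyond our current tools; the one-input problem, by contrast, is completely solved by the formula we derived, and we can check that numerically.

```python
def deviation(m, xs, ys):
    """Deviation for the line y = m*x through the origin."""
    return sum((y - m * x) ** 2 for x, y in zip(xs, ys))

def deviation2(m, b, xs, ys):
    """Deviation for a general line y = m*x + b: a function of
    two inputs, whose minimization awaits multivariable calculus."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Made-up noisy data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# The slope from the lecture: m = (sum of x_i y_i) / (sum of x_i^2).
m_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Nearby slopes give strictly larger deviation, confirming a minimum.
assert deviation(m_star, xs, ys) < deviation(m_star + 0.1, xs, ys)
assert deviation(m_star, xs, ys) < deviation(m_star - 0.1, xs, ys)
```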

Optimization with several inputs is really a problem that you're going to come back to in multivariable

calculus. When you have a function with several

inputs, how do you do optimization? Well I've got to tell you, some unusual

things can happen in that context. But those unusual situations wind up

opening a whole new world of interesting questions and applications. For example,

game theory deals with optimization of multivariate functions.

Linear programming, machine learning, all of these fascinating subjects are deeply

concerned with optimization: finding maxima, minima, and other types of

critical points. There are some wonderful fields out there

that will rely on the intuition that we've learned in single variable

calculus.