In this video, we demonstrate how the response surface strategy changes as we reach the optimum.

Issues of curvature and non-linearity become important at the peak of the mountain.

One advantage of response surface methods is that we learn about the region around us

as we go. Remember that analogy of walking with a ski pole in your hand? Well, we never

really know the region around us. So when we use that ski pole to figure out what the

terrain looks like, we need to have a way to know when we've reached the top.

Let's just quickly contrast the response surface approach with the OFAT approach. The COST

approach, or the OFAT approach, makes you think that you're at the optimum, but you

can never really be sure. In this case that we saw earlier with two factors, you would

alternate between optimizing factor A, then factor B, then optimize factor A again, then

B again. And you'll eventually get to an optimum, but will you be sure you're at the peak? How

do you know you don't need to do another round of optimizing in A and B again?

Also, if I'd optimized B first and then A, I would have arrived at the optimum faster.

This seems like a lottery! Sometimes you get to the peak quickly, and sometimes slower.

Not surprisingly, statisticians don't like this sort of thing.

Furthermore, this approach doesn't scale well. If you had five factors, for example, A, B,

C, D, and E, then this haphazard searching across the five factors leads to inefficient

experimentation.

By using the COST approach you will not learn about the interactions in your system. Recall

from an earlier video in this module that learning more about our systems was the first

way we can use data to improve our processes.

So let's resume and continue with the model built on points 11, 12, 13, 14, and the baseline

at point 10. We pointed out that the contour plots exhibit curvature. The lines are not

parallel. These curved lines come from the interaction term, indicating that the interaction

coefficient is important relative to the main effects.

In prior models, the interaction term was small. Notice though, that the steepest ascent

method will still send us up in the correct direction if we ignore the interaction terms.

The interaction term, if we had accounted for it, would send us at a slightly different

angle.

But in this example, the discrepancy is not so bad. Had the interaction term sent us

in a very different direction, we would definitely follow that direction instead of the steepest

ascent that is determined only with the linear terms. But more on that to come with this

topic of "curvature".

Let's quickly go take a step in that direction for run number 15. And because you are good

at this now, I am going to take a step of "Delta x_T" equal to two, and the

corresponding "Delta x_P" equal to minus two-thirds. You can do the rest of the calculations

yourself and show that the predicted value of profit at this location is $742, and that

corresponds to these real world values and these coded values. When we run the actual

experiment, we record a profit of $735. That's an overestimate of $7. This overestimate is

comparable to the main effect.
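The coded-to-real-world conversion used for a step like this can be sketched in a few lines. Note that the center and half-range values below are hypothetical placeholders, not the actual settings from this example:

```python
def coded_to_real(x_coded, center, half_range):
    """Convert a coded factor value to real-world units:
    real = center + coded * (half the factor's range)."""
    return center + x_coded * half_range

# Hypothetical baseline and half-range values, for illustration only:
T_center, T_half = 330.0, 10.0   # throughput, parts per hour
P_center, P_half = 1.70, 0.05    # selling price, dollars

# The step taken above, in coded units:
delta_xT, delta_xP = 2.0, -2.0 / 3.0

T_run15 = coded_to_real(delta_xT, T_center, T_half)
P_run15 = coded_to_real(delta_xP, P_center, P_half)
print(T_run15, round(P_run15, 3))
```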

And we also have visual evidence now of curvature. This is starting to tell me that I should

change my strategy. When we start to enter a region of curvature in response surface

methods, the change in the surface's linearity becomes apparent.

We're becoming more nonlinear, and likely approaching an optimum. It is desirable to

know when this is happening. And one indication of that already is that our interaction terms

are large, they cannot be ignored. And visually, we see that as these non-parallel lines in

the contour plot.

The second indication that an optimum is close by is that we are levelling out. Levelling

out means that my outcome values, in the neighbourhood, are getting closer and closer, even when I'm

taking reasonable step changes.

Let's see this. The spread in profit values in the first factorial was around a $300 difference.

In the second factorial over here, that spread was around $150. And now in this third factorial,

my spreads are only $15 to $20.

We're not making the gains we had made earlier. And if we're not careful, we can be affected

by noise. If we don't know the level of noise around us, we might be misled. How do I know

whether that spread of $15 to $20 is any different to the noise in the system? Another way to

ask that is if we repeated those corner experiments, would we get similar values or different values?

So let's go calculate what the noise level is. Run at least three or four repeated experiments

at the same condition. And we typically use the baseline; so here, at the baseline of the factorial.

I previously had an outcome of $732, and two more runs give me an outcome of $733 and $737.

So there's a spread of about $5. That spread is very different to the spread over the corner

points of the factorial, indicating I'm still seeing signal above the noise.
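That signal-versus-noise check is simple enough to sketch; the three center-point profits are the ones quoted above, and using the range of the repeats as the noise estimate is a deliberate simplification:

```python
# Repeated runs at the baseline (center) condition, from the video:
center_runs = [732, 733, 737]

# The noise level, estimated crudely as the range of the repeats:
noise_spread = max(center_runs) - min(center_runs)
print(noise_spread)  # about $5

# The spread over the corner points of this factorial was $15 to $20,
# so the corner-to-corner differences still stand out above the noise:
print(15 > noise_spread)
```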

The third indication of an optimum is whether our predictions are too high or too low.

We saw here at point 15, we had a prediction error of $7, just over our level of noise.

This indicates the model can be improved.

We often observe strong changes in the model's surface near the optimum. For example, if

you're making a product, you want to make it long enough to bring out the beautiful

colours and caramelization flavours that occur. But go just a little bit too far and it becomes

burnt.

We also see this in engineering systems. Often, our optimal point of operation is right at

the edge of a cliff, and if we go just a little bit further, we fall over the edge and

see our outcome value drop down rapidly. Another good reason to take small

steps near the optimum.

A fourth way to detect curvature is that our model does not fit the surface very well.

A linear model cannot fit a curved surface well. And we use the terminology, "lack of

fit", to quantify that. Let me show you. In our first factorial, the center point was

$407, but the predicted center point was $390. That's a difference of $17.

Now that might seem large, but it really isn't when we compare it to the main effects of 55

and 134. Recall what the interpretation of that number 55 is again? So a $17 difference

really is small, indicating a small lack of fit.

In the second factorial, the actual center was $657 while the predicted center was $645.

A difference of $12. That again is small when compared to the neighbourhood we're in.

In this third factorial though, the actual center is at the average of these three baseline

values, $734. Compare that to the predicted center value of $724. That's a difference

of $10, which, when compared to the largest effect of 7.5 and to the level of noise of

about $5, indicates an important deviation between the model and the actual surface that

we're on, at least in the center.
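A crude lack-of-fit screen along these lines, using the center-point numbers quoted above, might look as follows. The comparison threshold (deviation at least as large as a reference effect) is my own simplification, not a formal test:

```python
def center_deviation_large(actual_center, predicted_center, reference_effect):
    """Compare the center-point deviation to a reference effect size.
    Returns (deviation, True if the deviation looks important)."""
    dev = abs(actual_center - predicted_center)
    return dev, dev >= reference_effect

# First factorial: $17 off, against main effects of 55 and 134 -> small lack of fit
print(center_deviation_large(407, 390, 55))
# Third factorial: $10 off, against a largest effect of 7.5 -> important deviation
print(center_deviation_large(734, 724, 7.5))
```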

So if we're getting large deviations at the center, we cannot hope to get good predictions

outside of the range of the model. And good predictions are essential to optimize in the

correct direction.

So there are four ways that we've shown to check for inadequacy in the model. And those

of you with a statistical background can go calculate the confidence intervals on the

model coefficients, and observe that they're very wide. None of the terms in the model

are statistically significant.

Well, as we saw in the single-variable popcorn example, when faced with a poorly predicting

model in a region that has curvature, we can add terms to account for the nonlinearity:

"quadratic terms". So let's go add these now.

There are two options: adding points on the face of the cube, or adding points a little

bit further called "axial points" or "star points". These points are at a distance denoted

as alpha from the center. Alpha is a value greater than 1 to ensure they are outside

the cube.

The design on the left works well if you hit a constraint, or cannot leave the factorial

space. The design on the right comes from a class of designs called central composite

designs, or CCD, and they're preferred for a statistical property called rotatability.

Just a quick aside, rotatability simply means that the prediction error is equal for any

two points that are the same distance from the center. And it's a desirable statistical

property.

Now, there are various choices on the distance alpha and the number of center points to use,

but that's a messy discussion that you can research quite easily. The general advice

is this though: run the factorials first; then run the star points afterwards at a distance

of alpha equal to (2^k)^0.25.

So, if you have two factors, alpha = 1.41, and if you have three factors, you would have

alpha = 1.68. Also, add three to four center points to assess lack of fit. And run these

center points at different times, not all after each other.
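That rule of thumb for the axial distance, and the resulting star points in coded units, can be sketched like this:

```python
def ccd_alpha(k):
    """Rotatable axial distance for a CCD built on a 2^k factorial."""
    return (2 ** k) ** 0.25

print(round(ccd_alpha(2), 2))  # 1.41 for two factors
print(round(ccd_alpha(3), 2))  # 1.68 for three factors

def star_points(k):
    """The 2k axial (star) points: +/- alpha on one factor, zero on the rest."""
    alpha = ccd_alpha(k)
    points = []
    for i in range(k):
        for sign in (1, -1):
            point = [0.0] * k
            point[i] = sign * alpha
            points.append(point)
    return points

print(star_points(2))  # four star points in coded units
```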

Notice this though: from the individual perspective of factor T and of factor P, each of these

has runs at five distinct levels, and that's what helps us accurately fit that quadratic

model.

Let's go do this! The first star point is run number 18 at a value of +alpha for factor

T in coded units, and a value of zero in factor P. Let's add that to the table, and also calculate

the real world units for it in the usual way. So that's 343 parts per hour, and a sales

price of $1.63. You can go practice reproducing the other three star points, and let's go

add one final center point experiment, number 22, so that we have a total of four center

points.

Now we go run these experiments, in random order of course, and report the values here

in standard order. Notice firstly that the center point 22 is similar to the prior values,

indicating that the system is still stable and reproducible.

Well we've got quite the collection of data here. A central composite design (CCD) always

has the factorial points, center points, and star points. Now I've arranged them in that

order in the R code.
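The video uses R for this fit; an equivalent least-squares sketch in Python, with entirely made-up profit values standing in for the real data, might look like:

```python
import numpy as np

alpha = 2 ** 0.5  # axial distance for a two-factor CCD

# Coded (xT, xP) runs in standard CCD order: factorial, center, star points.
runs = np.array([
    [-1, -1], [1, -1], [-1, 1], [1, 1],                # factorial corners
    [0, 0], [0, 0], [0, 0], [0, 0],                    # center points
    [alpha, 0], [-alpha, 0], [0, alpha], [0, -alpha],  # star points
])
# Hypothetical profits (placeholders, NOT the video's data):
y = np.array([720., 728., 716., 730., 733., 734., 732., 735., 731., 715., 729., 722.])

# Full quadratic model: y = b0 + b1*xT + b2*xP + b12*xT*xP + b11*xT^2 + b22*xP^2
xT, xP = runs[:, 0], runs[:, 1]
X = np.column_stack([np.ones(len(y)), xT, xP, xT * xP, xT**2, xP**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # fitted coefficients b0, b1, b2, b12, b11, b22
```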

When we run that code, we get the quadratic model from them all. I will leave it as a

small challenge to you, to go prove the following two things.

Firstly, the model's prediction of the center point, when compared to the average of the

four center points has a very small deviation. So this model fits well, at least at the center.

Secondly, this quadratic model's prediction of the other points, for example, one of the

corner points, or one of the star points, or even experiment 15 over here, is a very

good prediction. There is little prediction error. So we have confidence in this model's

prediction.

Now let's go visualize those as contour plots. And right away, we can see we are in

fact near the optimum. Visually, the axial point is pretty close to the predicted optimum

region from the model. That's good enough to stop here and use as our optimum.

But let's say the quadratic model had looked like this one instead. Then you would go run

your next experiment over here based on the model at that predicted optimum. And then

you would go verify the model's prediction ability at that point to check that you've

reached the optimum.

Now we can be a bit more precise -- for those of you who don't like to trust the visual

judgement. We can take this quadratic equation, differentiate it with respect to the coded

variables, set it equal to zero, and you will get a set of two linear equations and two

unknowns, which you can then solve using your favourite linear algebra software, or by hand.
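Setting that gradient to zero and solving the resulting two-by-two linear system looks like this; the coefficient values are hypothetical placeholders, not the fitted model from the video:

```python
import numpy as np

# Hypothetical quadratic-model coefficients in coded units (placeholders):
# y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1**2 + b22*x2**2
b1, b2, b12, b11, b22 = 2.0, -1.5, 1.0, -3.0, -4.0

# Differentiating and setting the gradient to zero gives two linear equations:
#   dy/dx1 = b1 + b12*x2 + 2*b11*x1 = 0
#   dy/dx2 = b2 + b12*x1 + 2*b22*x2 = 0
A = np.array([[2 * b11, b12],
              [b12, 2 * b22]])
rhs = np.array([-b1, -b2])
x_stationary = np.linalg.solve(A, rhs)
print(x_stationary)  # stationary point in coded units

# With both pure-quadratic coefficients negative (and det(A) > 0),
# this stationary point is a maximum.
```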

When you go do that, you get the predicted optimum at 343 parts per hour, and a selling

price of $1.59. The quadratic model tells us to expect a profit of $740 at this point.

Running that 23rd experiment gives an actual profit of $739; that's very close agreement.

This is definitely the largest value we've observed over the entire approach we followed.

So this video has answered the last question we had in an earlier video in this module:

"How do we know when to stop?" We know that we can stop when our model matches the surface

well; and the model predicts an optimum. Using the model, we know that we've reached the

peak of the mountain, even though we cannot see the actual mountain around us.

So let's recap our entire approach. Start by building successive linear models, shown

here in blue, green, and orange, respectively. I'm showing you the prediction contours in

those colours for the local region around each model. Each of those local models had

their baseline or 0-0 value.

These past videos have also shown that we should incorporate the baseline points, as

well as other points in the neighbourhood in our model, to help improve their estimates.

We use our models as long as we have confidence in their predictions. We rebuild the model

once we demonstrate those predictions are poor, judged by comparing the predictions

to the actual values, and taking noise into account.

As we approach the optimum, issues regarding curvature, which we studied through four indicators,

become apparent. We have to change our strategy. If we pick up that we have curvature, based on

these criteria, we have to start decreasing our step size and start fitting quadratic

models.

The principle of an optimum is that it's nonlinear: points around us must be lower. And so our

last prediction model that we build, shown here in red, illustrates that quite nicely.

To end off with though, let me show you the true surface in a grey colour. This is obviously

something you would never see in practice. But seeing it here gives you good confidence

that we were doing the right thing all along.

You can see how the models in blue, green, and orange approximated the non-linear surface

very well in their local region. Outside their local neighbourhood, they start to deviate.

The non-linear model fits the surface over a wider region. That isn't too surprising.

The information to build that non-linear model required four plus four plus four, or 12,

experiments. And we used that non-linear model to place our final experiment(s) very close

to the true optimum.

To end this video, I will add one point: the real optimum may move.

Our system could deteriorate and change, so the optimum that you found won't stay there.

There are experimental tools that continually keep searching and moving towards the optimum.

We won't have time to cover them in this course, but the topic of Evolutionary Operation (EVOP)

is what you should search for if that interests you. It is particularly applicable to manufacturing

systems that are never stable. That mountain is moving and you have to move as well in

order to remain at the peak.