In fact, the order-of-growth classifications are so important that they've led to an enormous amount of research in recent years, and I'll talk briefly about that now. So, life is a little bit more complicated than the last example suggested, and one problem is that the input can cause the performance of the algorithm to vary widely. So often we have to think about different ways of analyzing the algorithm depending on the input. The running time is going to be somewhere between the best case and the worst case. The best case is a lower bound on the cost: it guarantees that the running time is never going to be less than that. Then there's the worst case, which comes from the most difficult input; if we analyze that, then we can guarantee that the running time of the algorithm is not going to be bigger than that. And in a lot of situations we might consider our input to be random. We need some way to model what we mean by random for the problem we're solving, but there are a lot of situations where we can do that, and then we have a way to predict performance even when the input might vary widely. So, for example, for 3-sum, it's kind of always the same.

With the tilde notation, the only variability in that algorithm is the number of times the counter is incremented, and that's in the low-order terms, so it doesn't even show up in our analysis. For binary search, you might find the thing right away, in which case it's constant time, and we can show that the average case and the worst case are both about lg N (log base 2 of N).
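As a concrete sketch of that variability, here's a standard binary search in Python (my own illustration, not code from the lecture) that counts probes of the array: the best case finds the key on the very first probe, while the worst case uses about lg N probes.

```python
def binary_search(a, key):
    """Return (index, probes) for key in sorted list a; index is -1 if absent."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                  # one probe of the array per iteration
        if key < a[mid]:
            hi = mid - 1
        elif key > a[mid]:
            lo = mid + 1
        else:
            return mid, probes
    return -1, probes

a = list(range(1024))                # sorted input, N = 1024
print(binary_search(a, 511))         # best case: the middle element, 1 probe
print(binary_search(a, 0))           # worst case: about lg 1024 = 10 probes
```

The same code, run on different inputs, costs anywhere from 1 probe to about lg N probes, which is exactly why best-case, worst-case, and average-case analyses can differ.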

In other examples, there can be even more variability than that. So we have these different types of analysis depending on the input. But the question is: what about the actual problem that the client is trying to solve? We have to understand that, too, in order to be able to understand the performance of the algorithm. There are two approaches that are successful here. One is to design for the worst case, making sure that your algorithm always runs quickly, and that's definitely ideal. The other, if you can't do that, is to randomize, and then depend on some kind of probabilistic guarantee. We'll see examples of both of these as we go through the course.
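As a tiny sketch of the randomized approach (my own illustration, not the lecture's code): shuffling the input before quicksort turns any fixed ordering, however adversarial, into a random one, so the expected N lg N guarantee holds no matter what input the client supplies.

```python
import random

def quicksort(a):
    """Sort list a in place. Shuffling first gives an expected ~N lg N
    probabilistic guarantee for ANY input ordering."""
    random.shuffle(a)              # the randomization step: no input is a bad case
    _sort(a, 0, len(a) - 1)

def _sort(a, lo, hi):
    if hi <= lo:
        return
    pivot, i, j = a[lo], lo, hi + 1
    while True:                    # standard partitioning around the pivot
        i += 1
        while i <= hi and a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[lo], a[j] = a[j], a[lo]      # put the pivot into its final position
    _sort(a, lo, j - 1)
    _sort(a, j + 1, hi)

data = list(range(1000, 0, -1))    # a bad ordering for non-randomized quicksort
quicksort(data)
```

Without the shuffle, an already-ordered input like this one would drive this partitioning scheme to quadratic time; with it, we get a probabilistic guarantee instead of a worst-case one.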

Now, those kinds of considerations, the idea of order of growth, lead to a discussion of what I call the theory of algorithms. Here our goals are: we have a problem to solve, like the 3-sum problem, we want to know how difficult it is, and we want to find the best algorithm for solving it. The approach that computer scientists use for this is to try to suppress as many details as possible in the analysis: just analyze the running time to within a constant factor, which is what order of growth is getting at, and also not worry about the input model at all, focusing on worst-case design. Then we can talk about the performance of algorithms just in terms of the order of growth, and it's actually possible to do that in a very rigorous way that has taught us a lot about the difficulty of solving problems. Our goal is to find an optimal algorithm: one where we can guarantee, to within a constant factor, a certain performance for any input, because we've covered the worst case, but where we also have a proof that no algorithm could provide a better performance guarantee. I'll give a couple of easy examples of this.

Now, in order to do this, there are commonly used notations, called the big theta, big O, and big omega notations, and the definitions are given here. Big theta notation is just the way to describe the order of growth: Theta(N^2) is kind of shorthand for anything N^2, bounded above and below by a constant times N^2, and that's what we really use to classify algorithms. Then there's big O notation, which gives upper bounds on performance: when we say O(N^2), we mean that it's less than some constant times N^2 as N grows. And big omega is used for lower bounds: Omega(N^2) means greater than some constant times N^2 as N grows.
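To make those definitions concrete, here's a small numeric check (my own illustration, with arbitrarily chosen constants) that a running-time function like T(N) = 2N^2 + 30N is Theta(N^2): it sits between c1*N^2 and c2*N^2 for all sufficiently large N.

```python
def T(N):
    # hypothetical running-time function with an N^2 leading term
    return 2 * N * N + 30 * N

c1, c2 = 2, 3       # chosen constants: c1*N^2 <= T(N) <= c2*N^2 ...
N0 = 30             # ... for all N >= N0, since 30N <= N^2 once N >= 30
for N in range(N0, 10_000):
    assert c1 * N * N <= T(N) <= c2 * N * N
```

The low-order 30N term is swallowed by the constants, which is exactly why order-of-growth classifications ignore such terms.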

So those three notations are what we use to classify algorithms, and I'll show them in the following examples, from our 1-sum, 2-sum, and 3-sum problems, where the goals are easy to articulate: establish the difficulty of the problem, and develop an optimal algorithm. So, the 1-sum problem is: is there a 0 in the array? Well, an upper bound on the difficulty of the problem comes from some specific algorithm. For example, the brute-force algorithm that looks at every array entry is a specific algorithm, and it takes O(N) time: it's less than a constant times N, for some constant. So the running time of the optimal algorithm has to be O(N); that specific algorithm provides an upper bound on the running time of the optimal algorithm. But in this case it's also easy to develop a lower bound, that is, a proof that no algorithm can do better. For 1-sum, you have to examine all entries in the array: if you miss one, then that one might be zero. So the optimal algorithm has to have a running time of at least some constant times N, and we say the running time is Omega(N). Now, in this case, the upper bound and the lower bound match, to within a constant factor, so that's a proof that the brute-force algorithm for 1-sum is optimal: its running time is Theta(N), both O(N) and Omega(N).
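The brute-force 1-sum algorithm is simple enough to sketch directly (in Python here, just as an illustration, not the course's own code). In the worst case it examines every entry, which is where the O(N) upper bound comes from, and the "you can't skip an entry, it might be zero" argument gives the matching Omega(N) lower bound.

```python
def one_sum(a):
    """Return True if some entry of a is 0.
    Worst case (no zero present): examines all N entries, hence O(N)."""
    for x in a:        # a skipped entry might be the 0 we're looking for
        if x == 0:
            return True
    return False

print(one_sum([5, 0, -3, 7]))   # True
print(one_sum([5, 1, -3, 7]))   # False: had to look at every entry
```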

So for that simple problem, it was easy to get the optimal algorithm. For more complicated problems, it's going to be more difficult to get upper bounds and lower bounds, and particularly upper bounds and lower bounds that match. For example, let's look at 3-sum. An upper bound for 3-sum: our first brute-force algorithm was a proof that the running time of the optimal algorithm is O(N^3), but we found a better, improved algorithm whose running time is O(N^2 lg N), so that's a better upper bound. Lower bound: well, we have to examine all entries, because again we might miss one that makes a 3-sum equal to zero, and that's a proof that the running time of the optimal algorithm is Omega(N). But nobody knows a higher lower bound for 3-sum. So there's a gap between the upper bound and the lower bound, and an open problem: is there an optimal algorithm for 3-sum? We don't know what it is. We don't even know if there's an algorithm whose running time is less than quadratic, and we don't know of any lower bound higher than linear. So that's an example of an open problem in the theory of algorithms: we don't know how difficult it is to solve the 3-sum problem.
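The improved algorithm behind that O(N^2 lg N) upper bound can be sketched like this (a Python illustration of the standard sort-then-binary-search idea, assuming distinct entries): sort the array, then for each pair binary-search for the value that would complete a zero-sum triple. That's about N^2/2 searches at about lg N each.

```python
from bisect import bisect_left

def three_sum_count(a):
    """Count triples i < j < k with a[i] + a[j] + a[k] == 0.
    Assumes distinct entries. Sort ~N lg N, then ~N^2/2 binary
    searches of ~lg N each: O(N^2 lg N) overall."""
    a = sorted(a)
    n, count = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            target = -(a[i] + a[j])
            k = bisect_left(a, target)        # binary search for the third value
            if j < k < n and a[k] == target:  # k > j avoids double counting
                count += 1
    return count

print(three_sum_count([30, -40, -20, -10, 40, 0, 10, 5]))   # 4 triples
```

Whether anything fundamentally faster than quadratic exists is exactly the open question described above.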

Now, this point of view has been extremely successful in recent decades. We have a new problem, we develop some algorithm, we prove some lower bound. If there's a gap, we look for a new algorithm that will lower the upper bound, or we try to find a way to raise the lower bound. Usually it's very difficult to prove nontrivial lower bounds. A trivial lower bound, like having to look at every input item, is not so hard; nontrivial lower bounds, like for example the one we talked about for the union-find problem, are much more difficult. And in the last several decades, people have learned a lot about the computational difficulty of problems by examining steadily decreasing upper bounds, so there are algorithms with better worst-case running times for lots and lots of important problems, plenty of optimal algorithms, and plenty of gaps that still remain. It's a fascinating field of research that many people are engaged in.

Now, there are a couple of caveats on this, in the context of this course. The first one is that maybe it's overly pessimistic to be focusing on the worst case. We've got data out there, we've got problems to solve, and maybe it's not worst-case data. In lots of fields of engineering and science, we don't focus on the worst case: the worst case for this course would be for lightning to strike and for it to be over, and we don't plan for that. Something similar is true for algorithms: maybe we should be focusing on understanding the properties of the input, and finding algorithms that are efficient for that input. The other thing is that, in order to really predict performance and compare algorithms, we need to do a closer analysis than to within a constant factor. So we talked about the tilde notation and the big theta, big O, and big omega notations that are used in the theory of algorithms. And really, there's so much published research in the theory of algorithms that a lot of people make the mistake of interpreting big O results, which are supposed to give improved upper bounds on the difficulty of a problem, as approximate models for the running time, and that's really a mistake.

So in this course, we're going to focus on approximate models, by making sure that we use the tilde notation, and we'll try to give specific results for certain quantities of interest. Any unspecified constant in the running time will have to do with properties of the machine and the system, so we'll be able to use these results to predict performance and to compare algorithms.
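As a sketch of how a tilde-style model predicts performance (my own illustration, with a made-up machine-dependent constant): if the running time is ~ a*N^b, then doubling N multiplies the cost by 2^b, so a doubling experiment recovers the exponent b even though the constant a is unknown.

```python
def cost(N):
    # stand-in for a measured running time ~ a * N^2; the constant a
    # is a hypothetical machine/system-dependent value
    a = 3.7e-9
    return a * N * N

prev = cost(1000)
for N in (2000, 4000, 8000):
    cur = cost(N)
    ratio = cur / prev        # approaches 2^b = 4 for a quadratic algorithm
    print(N, round(ratio, 2))
    prev = cur
```

Once the doubling ratio is known, the model predicts, for instance, that a run on input twice as large will take about four times as long, which is exactly the kind of performance prediction and comparison this course aims for.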