Not all programs are created equal. In this course, we'll focus on writing quality code that runs correctly and efficiently. We'll design, code and validate our programs and learn how to compare programs that are addressing the same task.

Loading...

來自 University of Toronto 的課程

学习编程：撰写高质量代码

323 個評分

Not all programs are created equal. In this course, we'll focus on writing quality code that runs correctly and efficiently. We'll design, code and validate our programs and learn how to compare programs that are addressing the same task.

從本節課中

Week 3

- Jennifer CampbellAssociate Professor, Teaching Stream

Department of Computer Science - Paul GriesAssociate Professor, Teaching Stream

Department of Computer Science

You've learned about two ways to search for a value in a list.

One of them, linear search, will work on any list, regardless of whether the lists

are sorted. Binary search requires that the list is

sorted but, as you've seen, it looks like it can be a lot faster than linear

search. Let's poke at these two algorithms a

little bit more, so that we can get a feel for how good they are and when we

would use one or the other. Computers play a lot of tricks in order

to make code really fast, so it's hard to know exactly what to attempt.

However, we do know that there is a comparison that happens each time through

the linear search while loop, and that there's an assignment statement.

So let's count these comparisons and the assignment statements that happen each

time through a loop. For linear search, we see that there are

two that happens each iteration. One comparison and one assignment

statement. Again, this is a rough, rough estimate of

what goes on in code, but it turns out that this rough estimate is going to be

good enough for us to conclude that binary search is just wildly faster than

linear search. At least when lists are large.

In binary search, we have a much simpler comparison.

But we'll still count it as just one. We have one assignment statement, a

second comparison, and then the result of that comparison will lead us to one

assignment statement or the other. So, we have one comparison here, one

assignment statement, another comparison. And then one of these is going to happen.

So in binary search, we actually have sort of four steps on each iteration of

the loop. Because the initialization and the if

statement, at the end of each, happen only once, no matter how many items there

are on the list. And there are about the same number of

steps in linearly search and binary search.

We're going to ignore these in our analysis.

In the linear search then, we figured there was roughly 2 steps per iteration,

and in binary steps there was roughly 4 steps per iteration.

We're going to compare these algorithms in the worst case.

That happens when the value is not in the list, for the linear search.

Here's an example for linear search if my list is 1, 2, 3, 4, 5, so there are 5

items in the list and am looking for value maybe 6.

Then I look here, here, here, here, and here.

And then I'm done. So that looked at all 5 items.

Let's make a table of the number of items and the number of iterations.

If there's one item, then there's one iteration.

If there's two items, there's two iterations.

In general if there are k items, there are k iterations.

And now for binary search. If there is one item, then on the first

iteration, we said m, we do a comparison, we move either b or e.

And then the loop terminates. Let's start building our table.

If there is one item then there's one iteration.

If there are two items, then were going to throw away one half of the first time

to the loop leaving us with one item. And we know how many steps that's going

to take. So if there were two items, it takes us

two iterations. Here's where things start to get kind of

interesting. I'm actually going to skip 3 and go right

to 4. If I have 4 items in my list, then in one

comparison, I'm throwing away about half. And that means that I'm, I end up with a

two item list. Well, it took me one step to get to this

point, so that means that I just have one more iteration than I had when I had two

items in my list. This means that I can process twice as

many items with just one more iteration. Using this knowledge, let's fill in a few

more rows of the table. 8s twice 4, 16 is twice 8, 32 is twice

16, 64 is twice 32, and 128 is twice 64. With 128 values, we can find the index of

v in this list in only 8 iterations. Linear search would take 128.

Let's plot this on a graph. Our axes are the size of the list, or the

number of items, and the number of iterations.

If the list has 1 item in it then there is 1 iteration.

2 items, 2 iterations. 4 items, 3 iterations.

8 items and 4 iterations. When we double the size of the list, we

increment the number of iterations by 1. So thinking about this backwards, if I

have 128 items in my list, then after one comparison, one iteration of binary

search, I end up with only 64 items left to consider.

When I have 64 items left to consider, then in one iteration I'm throwing away

half of them so that after that iteration I only have 32 items to consider, and so

on. This is awfully powerful.

Mathematicians have a name for this function.

It's called the logarithm base 2 of the number of items.

Here's one more way to think about logarithms.

It's the number of times you can divide by 2 in order to reach 1.

So, how many times can I divide by 2 by 2 in order to reach 1?

Once. What about 4?

I can divide 4 by 2 to get 2. And 2 by 2 to get 1.

That was 2 divides. 8 is going to be 1 division by 2 you got

4. 4 divided by 2 is 2.

2 divided by 2 is 1, so that gives us 3 divisions.

16 is 4 divisions. 32 is 5 divisions.

And so on. So, let's try to calculate the logarithm

of 1,024. Well, 1,024 is 512 times 2.

When we divide 512 by 2, we get 256. Then, 128.

Then 64, then 32, then 16, then 8, then 4, then 2, and then 1.

So there's one division, two, three divisions, four divisions, five

divisions, six divisions, seven divisions, eight divisions, nine

divisions, ten divisions. So the log base 2 of 1024 is going to

produce, what did I say, one, two, three, four, five, six, seven, eight, nine, ten.

Module math has a log function. This function returns the logarithm of a

number with a given base. Here are base is 2, because we are

dealing with binary. If I ask what the logarithm of 2 is in

base 2 it tells me 1. If I ask what the logarithm of 2, of 4 in

base 2 it tells me 2. 8 is 3, the logarithm is 16 and base 2 is

4. The logarithm of 32 and base 2 is 5.

So let's see if we can get an impression for just how much faster binary search is

by looking at the number, well that's 1 million.

Let's do 1 billion. So I'm just going to delete these

comments, because I'm not allowed to use them when I'm writing an integer.

Now linear search on a billion items is going to take a billion iterations.

What is the logarithm of 1 billion, in base 2?

Just under 30. So, it's going to take 30 iterations of

binary search in order to find a number in a billion items.

Let's call a binary search and linear search on a long list of numbers, to see

just how different these 2 are. We'll search for a value we know is not

in the list. So, with 10 million items, binary search

is nearly instantaneous. Linear search is noticeably slower, here

we go. cProfile is a module that contains code

that allows us to profile other code. To profile means to measure in some way.

To see how much memory it uses, how much time it takes to run.

Function run in module cProfile takes a statement to execute, and then presents

profiling information. It uses something called exec, which

takes an object, which can be a string or a code object, and then executes it.

A code object is something we haven't encountered yet, nor do we need it

because we can just use a string. Let's try calling exec.

if we're, if we give it something that isn't a string, we're told that argument

one must be a string, lights, or code object.

Okay let's put 3 plus 4 inside quotes. There's no ever, but we still don't know

exactly what's going on. Well, 1 picks a statement, perhaps exec

can pick a statement as well, maybe an assignment statement.

When we do the function call, indeed variable x is created.

Let's try assigning a string to variable y.

Remember that l refers to a list with 10 million items in it.

Let's call function run in module cProfile with a call on binary search

looking in that list for a value that isn't there.

We see a table. There were 6 function calls.

It took very little time. There was one call that took the total

time of no seconds and it was on the string that we passed in.

There was one call on binary search, there was one call on exec, one call on

len, and one call on a method disable of the profile.

That last one is completely irrelevant to our analysis.

Here we see that in linear search there was one call on the function linear

search. It took nearly 3 seconds to complete.

There were ten million and two calls on method len.

That took nearly a second just by itself. That gives us a hint that we could

optimize linear search. Let's call len on l outside of the y

loop, and then use variable length everywhere instead of the call on len.

When we run this again and profile it, we see that we went from nearly 3 seconds

for a linear search down to just over 1 and a half.

That's a remarkable speedup. However, that doesn't even come close to

how fast binary search is. This is all if the list is sorted and all

in the worst case. In the best case, where we search for the

first item in the list, linear search is also blazingly fast.

Mathematical analysis of algorithms and profiling can give us significant insight

into how algorithms work and why they are so good or bad.

This analysis of algorithms is actually a discipline within computer science.

It's a sub-field of computer science and people study this kind of thing for a

living.