So I'm starting with another copy of the last slide from the end of the previous
lecture, the hill climbing lecture. And what we said in that slide was that
the big idea, the two big ideas, of this new method for placement with random hill
climbing were that there was this temperature control parameter that we
started at a very hot value, and we cooled very slowly over the course of many, many, many random swaps that drive the placement process forward. And that this temperature parameter was a hill climbing control parameter: when the temperature was hot, we were more tolerant of changes in the gate locations that made the placement wire length worse, and when the temperature control parameter was cold, we were not so tolerant.
And what tolerant meant was that we actually used some random numbers in some
interesting ways to calculate some probabilities to decide whether we should
or should not keep a gate swap that made things worse.
This is an instance of a very famous general optimization method that has a
very famous name. This is an algorithm called Simulated Annealing, and so in this lecture, we are going to talk about Simulated Annealing in the context of placement, but we're also going to give you a little background about why this thing has this name.
So let's just talk about a little context here.
So here's a physical analogy for you. Suppose you want to make a perfect
crystal. What does perfect mean for a crystal?
It means that all the atoms are lined up in their crystal lattice sites.
And that's actually the lowest energy state.
That's what the atoms really want to do. So I've got a little example over here of
a crystal that has lousy order. It is a disordered crystal.
And I've got basically a little 8 by 8 grid of pink circles that are trying to look like atoms. And then I've got four little red atoms that are in sort of strange locations. They're clearly not where they want to be. This disordered state has a higher energy.
How do you take this physical system and actually make it a really good high
quality crystal, a low energy crystal, a crystal where all the atoms are on their
lattice sites? The answer is you get the material really, really hot, so the atoms have the energy to move around even to bad places, which is to say, places not on their crystal lattice sites. And then you cool it very, very slowly so the atoms settle into good locations. And the idea is that when there's a lot of energy, the atoms can move around even to bad places, and when it's not hot, the atoms have to sort of stay where they are.
And by getting it really hot and cooling it really slowly, you actually encourage
it to go into this, from this highly disordered state to an ordered state.
This process has a name, it's called annealing.
So, you anneal it, you get it really hot and you cool it really slowly and
hopefully you get a crystal with, well, it's probably not going to be perfect, and it's probably not going to be the minimum energy state, but it's going to be really good.
It's going to have very few dislocations in the crystal lattice.
So that's physically how you do that. What if you actually wanted to write a
computer program to simulate that thing? What kind of physics do you need to know to make that actually work? Well, you would have to have some kind of
a data structure that would represent the crystal lattice.
And you know, that could just be a bunch of atoms with x and y, or x, y, and z coordinates. I'm just going to draw it in two dimensions here. And you would have, just like our placer had, an inner loop where you would randomly grab two gates and swap their locations and evaluate the change in a number that mattered to you, and that change was the wirelength.
For the physical system, you grab an atom and you move it someplace, that's what
I'm showing here. You grab the blue atom and you move it
someplace, and the thing you calculate is the energy, then in particular you
calculate the change in energy, because that's what the crystal cares about.
And you also need to know how hot it is, because whether or not this is a good
thing to do, whether or not this is a likely thing to do, depends on how hot it
is. And so let's ask a question.
To model a real physical crystal, where we grab an atom, and we move it to a new
location and we calculate the change in energy, delta e.
What is the probability that that actually should happen in my computer
program? And the answer is 1. If delta e goes down, if this change in the atomic location makes the energy better, you should keep it. That's a good thing to do. That's a physically reasonable thing to do. What if, to model a real physical
crystal, what if the energy goes up? What if you grab the atom and you move it
to a worse place in terms of the overall crystal lattice organization?
What if delta e is bigger than zero? Now, in your computer program, what's the probability that you ought to make that happen?
And interestingly, the answer is e to the minus delta e over kT.
That's the physics of the real universe of taking that atom and putting it
someplace where the energy got locally, microscopically at the atomic level, a
little worse. Delta e is the change in the energy, T is the temperature, and k, amazingly enough, is the Boltzmann constant.
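To make that rule concrete, here is a minimal sketch of the acceptance probability in Python. The function name and the numeric values are just for illustration; k is left as a parameter, since (as discussed below) for placement you can simply take k = 1.

```python
import math

def accept_probability(delta_e, temperature, k=1.0):
    """Probability of keeping a move that changes the energy by delta_e.

    In a real physical simulation k would be the Boltzmann constant;
    for placement we just use k = 1 and ignore units entirely.
    """
    if delta_e <= 0:
        return 1.0  # downhill (energy-improving) moves are always kept
    return math.exp(-delta_e / (k * temperature))

print(accept_probability(-5.0, 10.0))   # downhill: probability 1.0
print(accept_probability(20.0, 10.0))   # uphill: e^-2, about 0.135
```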
In real physics the units all have to line up.
If you took a physics class in your life, you probably got yelled at by some professor, lecturer, instructor, or teaching assistant because your units didn't add up. You know, you can't take newtons and add meters. It's just not okay, and similarly here, the thing that's in the exponent has to be without units, and so the units of the thing on the top and the thing on the bottom all have to cancel.
And so if the thing on the top is energy and the thing on the bottom is temperature, there has to be something that links the way we measure temperature to the way we measure energy, for this to give the real probability that the real atom makes this real physical move. And amazingly enough, that's a famous number. That's a famous physical constant. That's the Boltzmann constant, and that's Ludwig Boltzmann over there on the right, a picture from Wikipedia. The Boltzmann constant, in real physics, converts the units of temperature to the units of energy.
So what, you know, Ludwig was doing back there in the 19th century, was pioneering
some of the initial mathematics of statistical thermodynamics to be able to
sort of write these equations about what happens with what probabilities with what
energies. This is a long time ago.
What's amazing is this e to the minus energy over temperature thing is sort of the foundational model of how the probabilities work.
Now, I caution you that you don't need a k when you're doing a placement, because a placement isn't real physics. Nobody cares what the units of wire length and temperature are. So as far as you're concerned, pretend
the temperature is measured in meters. Right?
It doesn't matter. It's all artificial.
It just has to be a probability. It has to be a number between 0 and 1.
We tune the parameters so that everything works.
But if you're doing real physics, with real atoms and real crystals and you want
real probabilities with real energies, the units all have to work and that's
where Ludwig comes in. This new method is called simulated
annealing. It's a very famous general optimization
method. You can optimize lots of different things
with it. It is, however, widely used in VLSI CAD.
It was invented at IBM in the early 1980s by Kirkpatrick, Gelatt, and Vecchi, in this very famous paper from Science magazine, which is a very interesting paper to read. I recommend it, if you can get yourself a copy. And in fact, some of the earliest examples that our friends at IBM used, and in fact I know Scott pretty well, were actually electronic design examples, chip design examples. They were actually designing parts of IBM computers back there in the early 1980s,
and getting fabulously good results. And it's this idea of annealing something, getting it really hot, allowing both downhill changes that make things better and uphill changes that make things worse, with a probability controlled by the temperature, but applying this in the form of software, of a computer program, to things that are not physical systems, things like gates moving around on the surface of chips.
So, here's some pseudo-code for a
simulated annealing placer and it looks kind of complicated.
And what I want to say is that we're going to read this sort of from the
inside out to convince you that it's actually something very close to what you
already know. So, ignore the stuff with color behind it. Ignore the yellow stuff and ignore the kind of pink-brown stuff, and let's just start with the simple white parts here. And so up at the top, it says to start with a random initial placement.
Okay, that's just like the greedy random iterative improvement stuff that you
know. And then it's got this part here that says, for s equals 1 to m times the number of gates, where m is just how many swaps we're going to do per gate, so pick a number, say 1,000. I'm going to do 1,000 times the number of gates swaps, and that's going to be my placement.
And then look at the code inside the for loop: swap two random gates gi and gj, and calculate delta l, the change in the wire length. If delta l is less than 0, then accept this swap, else, and then ignoring the stuff with the brown background for now, else undo this swap. Okay, well, great, that's just the greedy placer. What happens now when we add the simulated annealing stuff is that the first thing we do is add this inner loop stuff that says, if the wire length went up, if delta l is greater than zero, then we generate a uniform random number r.
That's this thing here, and we calculate that probability p, which is e to the minus delta l over t. And we ask if r is less than p, and if that's true, then we accept this uphill swap. It's a new, worse placement, but our hill climbing control parameter t says it's okay to do. Alright, so that's the probabilistic swap
acceptance, and then the yellow stuff, is this new outer loop, this annealing
temperature control loop. And that says that you start with the
temperature being hot. And then you have this, this basically
this Boolean right, just a flag, that says frozen is false, and it says while
not frozen, do, we do a whole bunch of gate swaps at this temperature.
And then at the end it says, if the overall wire length is still decreasing
over the last few temperatures, then okay we're still annealing, what do you do?
You make the temperature a little cooler because as the annealing runs, you have
to make the temperature control go down so you don't keep taking all the big
uphill moves. So this is an arbitrary number.
T is 0.9T. We cool it.
And if instead it is the case that the total sum of the wire length has stopped changing, so let's say it hasn't changed more than, you know, one hundredth of a percent over the last five temperatures.
Okay, we're done, right? I mean, we're just not going to find any more improvement in the wire length. Then we set frozen to true, we drop out
of the loop and we return the final placement as the best solution.
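The whole loop structure just described can be sketched in Python. This is a toy version, not a real placer: the netlist representation (a list of nets, each a list of gate indices), the simplified frozen test, and the full wire length recomputation per swap are all illustrative choices of mine, though the T = 0.9T cooling and the hot-start/frozen-stop structure follow the pseudo-code from the lecture.

```python
import math
import random

def anneal_placement(nets, num_gates, grid_w, grid_h, t_hot=800.0,
                     moves_per_gate=100, cool=0.9, patience=5, seed=0):
    """Toy simulated annealing placer following the lecture's pseudo-code.

    nets: list of nets, each a list of gate indices; wire length is the
    sum of half-perimeter bounding boxes over all nets.
    """
    rng = random.Random(seed)
    # Start with a random initial placement: one gate per grid slot.
    slots = [(x, y) for x in range(grid_w) for y in range(grid_h)]
    rng.shuffle(slots)
    loc = slots[:num_gates]

    def wirelength():
        # Half-perimeter wirelength, recomputed from scratch each time.
        # (A real placer would update delta l incrementally per swap.)
        total = 0
        for net in nets:
            xs = [loc[g][0] for g in net]
            ys = [loc[g][1] for g in net]
            total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        return total

    t = t_hot
    wl = wirelength()
    history = []          # wirelength at the end of each temperature
    frozen = False
    while not frozen:     # annealing temperature control loop
        for _ in range(moves_per_gate * num_gates):
            i, j = rng.randrange(num_gates), rng.randrange(num_gates)
            loc[i], loc[j] = loc[j], loc[i]        # swap 2 random gates
            new_wl = wirelength()
            delta_l = new_wl - wl
            if delta_l < 0 or rng.random() < math.exp(-delta_l / t):
                wl = new_wl                        # accept the swap
            else:
                loc[i], loc[j] = loc[j], loc[i]    # undo the swap
        history.append(wl)
        # Simple frozen test: wirelength essentially unchanged over the
        # last few temperatures -> stop; otherwise cool and keep going.
        if len(history) > patience and \
           abs(history[-1 - patience] - wl) <= 1e-4 * max(wl, 1):
            frozen = True
        else:
            t = cool * t                           # T = 0.9T: we cool it
    return loc, wl
```

Recomputing the full wirelength after every swap is the main thing you would never do in practice; real annealing placers evaluate only the delta from the nets touching the two swapped gates.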
So, it's not that much different. It's just the temperature outer loop, the yellow stuff in this diagram, and the brown-pink stuff, the, you know, e-to-the-minus-delta-l-over-t, let-me-go-uphill-with-a-certain-probability stuff, that's really new here.
This probabilistic acceptance-of-swap stuff is really famous. So this little snippet of code: if a uniform random number is less than e to the minus delta l over t, then accept the swap, else undo the swap.
This little piece of code is sort of a version of a very famous idea.
So this is the idea that randomly accepting a perturbation of a system with
a specific probability will correctly simulate the physics of that real system.
This has a name. This is called the Metropolis Criterion.
And it's named after this guy, Nicholas Metropolis, Nick Metropolis, a very
famous guy. what was Nick doing when he was sort of
doing this? Nick was working with some very famous
people like John Von Neuman and Stanislaw Ulam, who's a very famous mathematical
physicist. Way back at the dawn of atomic physics in
the 1950's, when people had computers the size of football fields built out of
thousands of vacuum tubes. And they were trying to write the first
computer, some of the first computer programs, to simulate how the physics of
things, you know, with atoms doing interesting things, worked.
And these are the guys who invented the first Monte Carlo simulations, if you've
ever heard that word. And this sort of inner loop thing that is
generating particular probability distributions, checking that the
perturbation you make either does or doesn't fit within some probabilistic
acceptance criterion. That's really famous, and so when you see lots of computer algorithms that are based on random sorts of acceptance methods for how things make progress, you'll often see the Metropolis name show up: the Metropolis criterion, Metropolis algorithms.
Lots of interesting places we use this idea.
This is really famous. How does this Metropolis criterion make our overall algorithm behave? And the answer is, in some very unusual ways, in some ways that you probably haven't experienced before. So it really is random.
You might accept the swap. You might not accept the swap.
It depends on the random number you generate.
And the way to maybe explain that is, suppose the temperature is 10 and delta l is 20, plus 20. So e to the minus delta l over t is e to the minus 2. It's about 0.14. So there's about a 14% probability you accept the swap.
You know, what does that mean? Let me give you a concrete example of what that means. Suppose at this temperature you're trying a really large number of swaps in the placement. So, I mean, suppose you're trying a million swaps at this temperature.
And suppose it turns out that 5,000 of those different swaps all happen to evaluate such that they change the wire length by delta l equals plus 20. Then of those swaps that all have a wire length degradation of 20, approximately 14% of them, which is 0.14 × 5,000 = 700 of them, will be accepted. A random 700 of those 5,000, which all change the placement in exactly the same way, they all make the placement 20 units of wire length longer. About 700 of them will get accepted and about 4,300 of them won't.
And if you change the random number seed so that you get a different stream of random numbers and you run your placer again, you will get a different set of placement swaps proposed. And let's assume that you again try about 5,000 of them that have delta l of plus 20.
You will still expect about 700 of them to be accepted, but it'll be a completely
different set of swaps with a completely different set of results.
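As a quick sanity check on that arithmetic, here is a small simulation of just the acceptance decision, using the example numbers from the lecture (5,000 identical uphill swaps of delta l = +20 at temperature 10); the function name and seed values are arbitrary choices of mine.

```python
import math
import random

def count_accepted(num_swaps, delta_l, temperature, seed):
    """Count how many identical uphill swaps of cost +delta_l get accepted."""
    rng = random.Random(seed)
    p = math.exp(-delta_l / temperature)   # e^-2 here, about 0.135
    return sum(1 for _ in range(num_swaps) if rng.random() < p)

# Different seeds accept a *different* random subset of the swaps, but
# always roughly 0.135 * 5000, i.e. somewhere around 700 of them.
for seed in (0, 1, 2):
    print(seed, count_accepted(5000, 20.0, 10.0, seed))
```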
It's a very different universe when you have probabilistic algorithms like this,
and the thing that's amazing is how well this stuff works.
So, how well does it work? And the answer is, it works great.
There is nothing else to say other than it works great.
So this is the same 2,500 gate benchmark, this is an annealer that looks very much
like the annealer pseudo-code that I showed you a couple slides ago.
The hot temperature is 800, and the m number, for how many moves per gate, how many swaps per gate, is 100. So we try 2,500 × 100, which is 250,000 swaps per temperature. And then we lower the temperature, and this is the curve of the progress.
So the first thing I'll just say is we got about 30% less half-perimeter
wirelength total than the random iterative improvement algorithm, and
that's a gigantic number. That's a very big deal.
So let me tell you how you read an annealing curve, because this is a pretty
standard looking annealing curve. The x axis is temperature.
Okay? And the thing that's interesting is that,
it's a log scale. So what you see is the temperature is
0.1, 1, 10, 100, 1000, going from left to right.
And the other thing to remember, right? Is that you read the curve from right to
left. Because the temperature starts hot.