In this video, we're going to look at the equilibria of infinitely repeated games.
So in order to talk about equilibria, we need to begin by asking: what are we going
to mean by a pure strategy in an infinitely repeated game? And you may like
at this point to pause the video and see if you can answer that question before I
go on and tell you the answer. If you need a hint, you should remember that the rule
of thumb that we've used to define strategies as we've changed game
representations has been to say that a pure strategy should be whatever you need
to tell another person to play in the game on your behalf and to end up doing exactly
what you would have done. So a pure strategy has always kind of been the policy that you would follow, making all of the decisions in the same way as you would have actually done them; whatever you would need to communicate about that policy. Well, in an infinitely repeated game, your strategy is going to be a choice of action at every decision point that you would have, which means an action that you would take in every stage game. And bear in mind that when
you take those actions, you actually get to reason about everything that you've
seen before in the game, which is to say you can remember all of your own previous
actions and you can also remember all of the actions in previous stage games by
other players. So really, your pure strategy is a mapping from every history to an action choice that you would make. Clearly, there is an infinite number of histories, and so an infinite number of action choices to specify. So, unlike the previous games we've looked at, extensive form games and
normal form games, you're not even going to have a finite set of pure strategies in
an infinitely repeated game. Nevertheless, let me give you some examples of some famous pure strategies in infinitely repeated games, to give you the sense that we can say interesting things about pure strategies, even though the set of possible pure strategies is infinite. So think about playing repeated prisoner's dilemma; I'm going to play the prisoner's dilemma game infinitely. One famous strategy is called tit-for-tat. It turns out that there were famous competitions where people submitted programs that played the prisoner's dilemma repeatedly, and then they looked at how these programs did against each other. And tit-for-tat was a strategy that did famously well in those competitions. The way it works is it starts out cooperating, and then if it observes that its opponent defected in the previous round, tit-for-tat defects in the next round; then it goes back to cooperation. So if it defected but its opponent cooperated, it'll respond to that by cooperating. So if tit-for-tat plays itself, it'll cooperate forever, but if it plays against somebody who defects, it's going to intuitively kind of punish that defection by defecting itself in the next round. But it's very forgiving: it's going to go back to cooperation after one punishment. In contrast, the trigger strategy is really a mean-spirited version of tit-for-tat. It starts out by cooperating, but if its opponent ever defects, that's it. It's just going to defect forever. It's going to pull a trigger and say, I'm never going to forgive you; you've wronged me once, and I'm going to wrong you back until the end of time. That's why we call it trigger.
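To make these two strategies concrete, here is a minimal sketch in Python; this is my own illustration, not something from the lecture. "C" stands for cooperate and "D" for defect, and each strategy is written as a map from the opponent's observed history to a next action (for these two strategies, only the opponent's past actions matter).

```python
def tit_for_tat(opponent_history):
    # Cooperate in the first round; afterwards, copy whatever
    # the opponent did in the previous round.
    if not opponent_history:
        return "C"
    return opponent_history[-1]

def grim_trigger(opponent_history):
    # Cooperate until the opponent defects once; then defect forever.
    return "D" if "D" in opponent_history else "C"

# Example: play the two strategies against each other for five rounds.
history1, history2 = [], []
for _ in range(5):
    a1 = tit_for_tat(history2)   # player 1 reacts to player 2's history
    a2 = grim_trigger(history1)  # player 2 reacts to player 1's history
    history1.append(a1)
    history2.append(a2)
print(history1, history2)  # both cooperate in every round
```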
So you can see that, in both of these cases, I've been able to describe to you how these strategies would respond to anything from the infinite history that they would see, in algorithmic terms, without actually writing out the strategy in a formal language. So you can see that it is possible to think coherently about interesting strategies in infinitely repeated games. Now, of course, what we would really like to do as game
theorists is describe the Nash equilibria of infinitely repeated games. The kind of
approach that we've taken in the past has been to first of all show how we can make an induced normal form of the game. We figure out what a strategy is in the game, we show how to make the induced normal form. And then, we just appeal to Nash's
theorem, and say, because we've got a normal form game, we know that there is a
Nash equilibrium, and so, everything kind of goes through the way it did before.
Well, that worked well for us in the past, because we always ended up with finite-sized induced normal forms. Unfortunately, because we now have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix, even
in the two player case. And so, we're not going to have something to which Nash's
theorem applies anymore, because now we have an infinite-sized game and Nash's
theorem only works for finite games, which means games whose matrices are finite and
contain a finite number of numbers. And that means we don't have any reason, based on what we know so far, even to be sure that equilibria exist in these games. On the other hand, because there are an infinite number of strategies, there might even be an infinite number of pure-strategy equilibria. We've seen in the past that it's possible to have an infinite number of mixed-strategy equilibria. For example, if in a normal form game I have two pure strategies that I'm indifferent between, then any mixture between them can also be a best response for me, so there's a sense in which I can have an infinite number of mixed-strategy equilibria. But here, I can potentially have an infinite number of qualitatively different pure-strategy equilibria. That seems like a problem. So we're not going to be able to somehow
list off the equilibria, necessarily, of a repeated game. Interestingly, there is
still something that we can coherently do to give a satisfying answer about the Nash
equilibria of infinitely repeated games. And in this video, I'm going to tell you
what it is, and actually, because it's so satisfying, I'm even going to prove this
theorem to you, so you really will understand how this one works.
And here is the idea of the theorem that we will eventually prove: we can
characterize what payoffs can be achieved under equilibrium. And, we can give an
example of equilibrium strategies that result in those payoffs, without giving
every equilibrium that results in those payoffs. So, we're going to kind of
characterize which strategies are in equilibrium in infinitely repeated
games, sort of via their payoffs. I'm going to be able to tell you precisely
which payoff vectors are achievable in equilibrium. So to do that, I need to give
you a little bit of notation so we can talk about these payoff vectors. This is kind of our slide of notation; once we get through this, we're going to have all the building blocks we need to prove our theorem. So, we're going to start with some n-player game; this is just our stage game, some normal form game. And
we're also going to start with some payoff vector here, and by this, what I mean is
the utilities that the players get in the game. Now, in this video, I'm going to talk about the average reward case, where each of these numbers encodes the utility that I get on average by following my strategy in the infinitely repeated game. And so that's going to say I care in just the same way about payoffs that I get now and payoffs that I get a million iterations into the future.
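To be concrete, this is the standard average reward definition for this setting (the notation here is mine): if player $i$ receives the stage-game payoff $r_i^{(t)}$ in stage $t$, then $i$'s average reward is $\lim_{k \to \infty} \frac{1}{k} \sum_{t=1}^{k} r_i^{(t)}$, so a payoff a million iterations from now carries exactly the same weight as a payoff today.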
We can do something similar in the discounted reward case, and that's going to be the topic of a different video. For this video, we're going to talk only about the average reward case, and the reason is that the proof is a little bit easier to think about in the average reward case, even though the discounted reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this proof, and this one's a bit easier. So let's do it. To do that, I need to remind you of the concept of the minmax value, which is something that we introduced in the context of zero-sum games, but it turns out to be very important for repeated games. So I'm going to remind you of what it is here. This is the mathematical definition of the minmax value, but let me start by saying in words what it means, because the nested min and max operators are a little bit hard to think about.
Essentially, i's minmax value is the amount of utility that player i is able to obtain for himself if all of the other players, who we call minus i, are completely unconcerned with their own utility, and instead they're just trying to hurt i as much as possible. So they all play the strategy from their joint mixed strategy space such that, if i responds to it by doing the best thing that i can for himself, then i's utility is as low as possible. They're trying to minimize i's utility, given that i is trying to maximize in response to their minimization, and the number that comes out at the end is the amount of utility that i can get by doing the best he can against everybody trying to hurt him as much as possible. So that's i's minmax value. Intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, then the amount of utility that you get is your minmax value.
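In symbols, the definition being described here is $v_i = \min_{s_{-i} \in S_{-i}} \max_{s_i \in S_i} u_i(s_i, s_{-i})$, where $S_{-i}$ is the set of joint mixed strategies of the players other than $i$, and $u_i$ is $i$'s expected utility.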
So I will say that a payoff profile (remember, a payoff profile is a payoff amount for everybody) is enforceable if it's the case that everybody's payoff in that payoff profile is at least their minmax value. If anybody is getting less than their minmax value in a given payoff profile, I will say that that's an unenforceable payoff profile.
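In symbols: a payoff profile $r = (r_1, \ldots, r_n)$ is enforceable if $r_i \ge v_i$ for every player $i$, where $v_i$ is $i$'s minmax value.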
And I will say that a payoff profile is feasible, I'll use this technical term feasible, if the following condition is true. Intuitively, what I want to say about feasibility is that it's possible to actually get that payoff profile by combining together payoffs in the actual game that's being played. Notice that enforceability doesn't actually require that it's possible. You know, I could say the payoff of a million for me and a million for you in prisoner's dilemma is an enforceable payoff profile, because it's above both of our minmax values, but there's no way we can actually each get a million in prisoner's dilemma, because the numbers don't go that high. So feasibility is going to talk about what it's actually
possible to do. And the way we're going to say this is, I'm going to restrict myself to rational numbers, and I'm going to say: there must exist rational and non-negative values alpha a, one for each action profile a in the normal form game, such that for every player i, I can express player i's element of this payoff vector as the sum, over all action profiles a, of alpha a times i's utility for that a. And then, of course, I need this last part of the condition: that these alphas all sum to one across all of the action profiles.
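In symbols, the condition being described is: $r = (r_1, \ldots, r_n)$ is feasible if there exist rational, non-negative values $\alpha_a$ for each action profile $a \in A$ such that $\sum_{a \in A} \alpha_a = 1$ and, for every player $i$, $r_i = \sum_{a \in A} \alpha_a \, u_i(a)$.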
So let me explain to you what this means. I have kind of a normal form game, and what I want to say is I have some weight on each cell in the game. These are the alpha a's, and they all sum to one. And what I want to do is take my payoff
from this cell weighted by the alpha for that cell, my payoff for this next cell
weighted by my alpha for that cell, and sum all of those weighted utilities for me up, and then I get ri. So what I want to say is there exist alphas that I
could use that would blend together the payoffs in the actual game in such a way
that I get ri. And furthermore, that has to simultaneously be true for everybody
else, and in particular, it has to be true with the same alpha a's for everybody
else. So I can come up with some alphas that get me my ri, but then I have to use those same alphas to get rj for player j. I have to have the same weights on all of the cells here so that I get the right numbers for everybody. So let's think about the following game, which will give you an
example of how feasibility works. So, let's say I have a game that looks like this. In this game, I claim the payoff profile 1,1 is feasible, and the reason is I can put a weight of 50% on this cell, a weight of 50% on this cell, and weights of zero on these cells. And you can see my payoff in this game is 50% times 2, plus 0 times 0, plus 0 times 0, plus 50% times 0, which is 1. The other player's payoff in this game is 50% times 0, plus 0 times 0, plus 0 times 0, plus 50% times 2, which is also 1. So that is a feasible payoff in this game. On the other
hand, the payoff 2,2 is not feasible in this game, and the reason is there's no way that I can put weights on these cells that sum up to one and that give both of us two. For me to get a payoff of two, this weight would have to be one; that's the only cell where I get a payoff at all. And for my opponent to get two, this one would have to be one, and then they would sum up to more than one. This condition over here would be violated, and so we just can't do it. So that's not a feasible payoff.
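As a quick sanity check, here is that calculation in Python. This is my own illustration, assuming the game on the slide is the 2x2 game with payoffs (2,0) in the top-left cell, (0,2) in the bottom-right, and (0,0) elsewhere, as the numbers read out above suggest.

```python
# Payoffs for the four cells, listed row by row:
# (top-left, top-right, bottom-left, bottom-right).
u1 = [2, 0, 0, 0]          # my payoff in each cell
u2 = [0, 0, 0, 2]          # the other player's payoff in each cell
alpha = [0.5, 0, 0, 0.5]   # weights on the cells; they sum to one

r1 = sum(a * u for a, u in zip(alpha, u1))
r2 = sum(a * u for a, u in zip(alpha, u2))
print(r1, r2)  # 1.0 1.0 -> the payoff profile (1, 1) is feasible
```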
Okay. Now, we're ready to prove the folk theorem. And it's kind of funny that it's called the folk theorem. It's kind of like a folk
song for mathematicians. So a folk song is a song that everybody knows, but nobody
really knows who wrote it first. And a folk theorem is a theorem that people all
kind of knew and talked about before they got around to writing it down and they're
not really sure quite where it came from. So, this is the folk theorem of game
theory, and despite having uncertain origins, it's very important. The folk theorem basically tells us what the Nash equilibria are of an infinitely
repeated game in terms of these payoffs and in terms of the definitions that I
just told you. So there are different folk theorems for different settings. This is
the folk theorem for infinitely repeated games with average rewards. So, the folk
theorem has two parts, which basically stems from the fact that I've made a restriction here to rational alphas. It turns out we don't actually need that, but the math gets more annoying if we have to deal with real values. So, to keep things
simple, I'm doing this just for rational numbers. So, the folk theorem says that in
any n-player game, which we're going to repeat infinitely, and, for any payoff