21. Repeated games: cooperation vs. the end game

Professor Ben Polak:
Okay, let's make a start. So I hope everyone had a good
break. We're going to spend this week
looking at repeated interaction. We already saw last time,
before the break, that once we repeat games–once
games go on for a while–we can sustain behavior that's quite
interesting. So, for example,
before the break, we saw that we could sustain
fighting by players that were rational in a war of attrition.
Another thing we learned before the break is when we're
analyzing these potentially very long games,
it helps sometimes to break the analysis up into what we might
call "stage games," each period of the game,
and break the payoffs up into: the payoffs that are associated
with that stage; payoffs that are associated
with the past (they're sunk, so they don't really matter);
and payoffs that are going to come in the future from future
equilibrium play.

So those are some ideas we're
going to pick up today, but for the most part what we
do today will be new. Now whereas last time we
focused on fighting, for the whole of today I want
to focus on the issue of cooperation.
In fact, for the whole of this week, I want to focus on the
issue of cooperation. The question behind everything
this week is going to be: can repeated interaction among
players both induce and sustain
cooperative behavior, or if you like "good behavior"?
Our canonical example is going to be Prisoners' Dilemma.
Way back in the very first class, we talked about
Prisoners' Dilemma, and we mentioned that playing
the game repeatedly might be able to get us out of the
dilemma. It might be able to enable us
to sustain cooperation. And what's going to be good
about that is not just sustained cooperation, but sustained
cooperation without the use of outside payments such as
contracts or the mafia or whatever.
So why does this matter? Well one reason it matters is
that most interactions in society either don't or perhaps
even can't rely on contracts.

Most relationships are not
contractual. However, many relationships are
repeated. So this is going to be of more
importance perhaps in general life–though perhaps less so in
business–more important in general life than thinking about
contracts. So let's think about some
obvious examples, think about your own
friendships. I don't know if you have any
friendships–I assume you do–but for those of you who do
your friendships are typically not contractual.
You don't have a contract that says if you're nice to me,
I'll be nice to you. Similarly, think about
interactions among nations. Interactions among nations
typically cannot be contractual because there's no court to
enforce those would-be contracts,
although you can have treaties I suppose.
But most interactions among nations–cooperation among
nations is sustained by the fact that those relationships are
going to go on forever.

Even in business,
even where we have contracts, and even in a very litigious
society like the U.S. which is probably the most
litigious society in the world, we can't really rely on
contracts for everyday business relationships.
So, in some sense, we need a way to model a way to
sustain cooperation and good behavior that forms,
if you like, the social fabric of our
society and prevents always going to court about everything.
Now, why might repeated interaction work?
Why do we think way back in day one of the class that repeated
interaction might be able to enable us to behave well,
even in situations like Prisoner's Dilemmas or
situations involving moral hazard where bad behavior is
going to occur in one shot games?
So the lesson that's going to be underlying things today
and all week is this one. In ongoing relationships the
promise of future rewards and the threat of future punishments
may–let's be careful–may sometimes provide incentives for
good behavior today.

Just leave a gap here in your
notes because we're going to come back to this.
So this is a very general idea. And the idea is that future
behavior in the relationship can generate the possibility of
future rewards and/or future punishments,
and those promises or threats may sometimes provide incentives
for people to behave well today. The reason I want to leave a
gap here is I want–part of the purpose of this week's lectures
is to try and get beyond this. This is kind of,
almost a platitude, right?
Most of you knew this already. So I want to get beyond this.
I want to see when is this going to work?
When is it not going to work? How is it going to work?
So I don't want people to leave this week of classes or leave
the course thinking: oh well,
we're going to interact more than once so everything's fine.
That's not true.

I want to make sure we
understand when things work, how they work,
and more importantly, when they don't work and how
they don't work. So we're going to try and fill
in the gap that we just left on that board as we go on today.
Nevertheless, we do have this very strong
intuition that repeated interaction will get us,
as it were, out of the Prisoner's Dilemma.
So why don't we start with the Prisoner's Dilemma.
I'll put this up out of the way and we'll come back to it.
Let's just remind ourselves what the Prisoner's Dilemma is
because you guys are all full of turkey and cranberry sauce and
you've probably forgotten what Game Theory is entirely.
Let's name these strategies, rather than alpha and beta,
let's call them cooperation and defect.
And that will be our convention this week.
We'll call them cooperation and defect.
This is Player A and this is Player B, and the payoffs are
something like this (2,2), (-1,3), (3, -1) and (0,0).
It doesn't have to be exactly this but this will do.
This is the game we're going to play.
And, to try and see if we get cooperation out of it by having
repeated interaction, we're going to play it more
than once.
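
The one-shot game on the board can be written out and checked directly. The following is a minimal sketch, assuming the four payoffs are listed in the usual order (cooperate, cooperate), (cooperate, defect), (defect, cooperate), (defect, defect), with Player A's payoff first. The names `PD`, `ACTIONS`, `strictly_dominates` and `pure_nash` are illustrative and not from the lecture.

```python
# Prisoners' Dilemma payoffs as (Player A payoff, Player B payoff).
# Assumption: the lecture's four cells are listed in the order (C,C), (C,D), (D,C), (D,D).
ACTIONS = ["cooperate", "defect"]
PD = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "defect"): (-1, 3),
    ("defect", "cooperate"): (3, -1),
    ("defect", "defect"): (0, 0),
}

def strictly_dominates(game, actions, mine, other):
    """True if row action `mine` does strictly better than `other` against every column action."""
    return all(game[(mine, c)][0] > game[(other, c)][0] for c in actions)

def pure_nash(game, actions):
    """All pure-strategy profiles where neither player gains by deviating alone."""
    found = []
    for r in actions:
        for c in actions:
            row_ok = all(game[(r, c)][0] >= game[(d, c)][0] for d in actions)
            col_ok = all(game[(r, c)][1] >= game[(r, d)][1] for d in actions)
            if row_ok and col_ok:
                found.append((r, c))
    return found

defect_dominates = strictly_dominates(PD, ACTIONS, "defect", "cooperate")
equilibria = pure_nash(PD, ACTIONS)
print(defect_dominates)
print(equilibria)
```

With these numbers, defect does strictly better than cooperate against either action, and mutual defection is the only pure-strategy equilibrium of the one-shot game.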

So let me go and find some
players to play here. This should be a familiar game
to everybody here. So why don't I pick some people
who are kind of close to the front row.
So what's your name again? Student: Brooke.
Professor Ben Polak: Brooke.
Okay so Brooke is going to be Player B.
And I've forgotten your name, by this stage I should know it.
Patrick you're going to be Player A.
And the difference between playing this game now and
playing this game earlier on in the class is we're going to play
not once, but twice.

We're going to play it twice.
So write down what you're going to do the first time and show it
to your neighbor. Don't show it to each other.
And let's find out what they did the first time.
So is it written down? Something's written down?
So Brooke. Student: I cooperated.
Professor Ben Polak: You cooperated.
Patrick? Student: I defected.
Professor Ben Polak: Patrick defected,
okay. Okay let's play it a second
time. So write down what you're going
to do the second time. Brooke?
Student: This time I'm going to defect.
Student: Me too. Professor Ben Polak:
All right, so we had the play this
time–let's just put it up here–so when we played it this
time, we had A and B.

And the first time we had
(defect, cooperate) and the second time we had (defect,
defect). Let's try another pair.
We'll just play this a couple times and we'll talk about it.
So that's fair enough. Why don't we go to your
neighbors. That's fair enough. It's easy.
So you are? Student: Ben.
Professor Ben Polak: That's a good name,
very good okay. You are? Student: Edwina.
Professor Ben Polak: Edwina, Edwina and Ben
okay. So we're going to make Ben
Player B and Edwina Player A. And why don't you write down
what you're going to do for the first time.
Again, we're going to play it twice.
Why don't we mix it up, we can play it three times.
We'll play it three times this time okay.
We'll play it three times.

Both people are happy with
their decisions. Okay so the first time Edwina
what did you choose? Student: Defect.
Student: Cooperate. Professor Ben Polak:
All right so we had–let's put down this time–so we've got
Edwina will play A and Ben B. And we had (defect, cooperate).
Second time please, Edwina? Student: Cooperate.
Student: Defect. Professor Ben Polak:
Okay, so we're going to and fro now.
So this was cooperate and defect and one more time:
write it down.

Both players written down?
Edwina? Student: Cooperate.
Student: Defect. Professor Ben Polak:
Okay, so we flipped round again okay.
Okay, so we're seeing some pretty odd behavior here.
Who did what that time? Edwina what did you do?
Student: I cooperated. Professor Ben Polak: So
we had this. Is that right?
So keep the microphones for a minute, and we'll just talk
about it a second. All right, so first of all
let's start with Ben here. Ben you were cooperating on the
first go. So why did you choose to
cooperate the first turn? Student: I felt that if
I established a reputation for cooperating we could end up in
the cooperate, cooperate.
Professor Ben Polak: All right,
so you thought that by playing cooperate early you could
establish some kind of reputation.
And what about later on when you played defect thereafter,
what were you thinking there? Student: I realized that
she established a reputation for defecting a second time.
Professor Ben Polak: All right,
so you switched strategies mid-course.
Edwina you started off by defecting.
Why did you start off by defecting?
Shout it out so people can hear you.
Student: Because his friend defected so I thought he
might defect.

Professor Ben Polak:
Okay, his friend defected. Okay, so he's been tainted by
his friend there. There's a shortage of space in
the class. They could have just been
sitting next to each other. Thereafter you cooperated.
Why was that? Student: Because I
thought he cooperated. Maybe he was going to keep
cooperating. Professor Ben Polak:
All right, so in fact your reputation
works in some sense. By cooperating early you
convinced Edwina you would cooperate.
And then you went on cooperating even after he
defected, so what were you doing in the third round?
Shout out. Student: I thought he
might cooperate because I cooperated.
Professor Ben Polak: All right,
he might come back. Let's talk about it to your
neighbors. So Brooke, shout out why you
cooperated in the first round. Student: Because I was
hopeful that he would cooperate. Professor Ben Polak:
You were hoping he would cooperate all right,
and why did you defect thereafter?
Student: Because I thought he would continue to
defect after he defected.

Professor Ben Polak:
Because he defected, he would continue to defect.
Patrick, you're the person who just defected throughout here.
Grab the mic that's next to you. Why did you just defect?
Student: It's such a short game that it makes sense
to defect in the last period so the second last period and the
first period. Professor Ben Polak:
All right, that's an interesting idea.
So Patrick's saying, actually if we look at the last
period of this game, if we look at this last period
of the game, what does the game look like in
the last period? Student: It's a single
period game.

Professor Ben Polak:
In the last period, this actually is the game.
If I drew out the game with two periods, it would be kind of a
hard thing to draw, it would be an annoying diagram
to draw. But in the last period of the
game, whatever happened in the first period is what?
It's sunk, is that right? Everything that happened in the
first period is sunk.

So in the last period of the
game these are the only relevant payoffs, is that right?
Since these are the only relevant payoffs looking
forward, in the last period of the game, we know that there's
actually a dominant strategy. And what is that dominant
strategy in the last period of the game?
To do what? In Prisoner's Dilemma,
what's the dominant strategy? Shout it out.
Defect, okay. So what we should see in this
game–we didn't actually because we had some kindness over here
from Edwina but what we should see in general–is we know that
in the last period of the game, in period two,
we're going to get both people defecting.
The reason we're going to get both people defecting is because
the last period of the game is just a one shot game.
There's nothing particularly exciting about it.
There is no tomorrow, and so people are going to defect.

But now let's go back and
revisit some of the arguments that Edwina, Brooke,
and–I've forgotten what your neighbor is called again,
Ben, (I should remember that)–and Ben said earlier.
They gave quite elaborate reasons for cooperating:
cooperating to establish reputation;
cooperating because the other person might cooperate;
whatever. But most of these behaviors
were designed to either induce or promise cooperation in period
two, is that right? What we've just argued is that
in period two everyone's going to defect.
Period two is just a trivial one stage Prisoner's Dilemma.
We actually analyzed it in the very first week of the class.
And provided we believe these payoffs, we're done.
Period two people are going to defect.
Since they're going to defect in period two,
nothing I can do in period one is going to affect that behavior
and therefore I should defect also in period one.
In order to belabor this point, we can actually draw up what
the matrix looks like in period one–so let's do that–using the
style we did last week–two weeks ago–before we went away.
So here once again is the matrix we had before,
and I want to analyze the first stage game.
In the first stage game, what I'm going to do is I'm
going to add in the payoffs I'm going to get from tomorrow.
The payoffs I'm going to get from tomorrow are from
tomorrow's equilibrium.

Well this isn't going to do
very much for me as we'll see because I'll get 2 + 0 tomorrow
because we know I'm playing defect tomorrow,
2 + 0 tomorrow, – 1 + 0 tomorrow,
3 + 0 tomorrow, 3 + 0 tomorrow,
– 1 + 0 tomorrow, and 0 + 0 tomorrow,
0 + 0 tomorrow. So just as we did with the war
of attrition game two weeks ago, we can put in the payoffs from
tomorrow, we can roll back those
equilibrium payoffs to today, it's just in this particular
exercise it's rather a boring thing,
because we're just adding 0 to everything.
When I add 0 to everything, I then just cancel out the
zeros and I'm back where I started, and of course I should
defect. So what I'm going to see is
because I'm going to defect anyway tomorrow,
today is just like a one shot game as well.
And I'm going to get defect again.
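
The roll-back step can be sketched in a few lines. This is only an illustration under the same payoff assumptions as the matrix on the board; `STAGE`, `CONTINUATION` and `augmented` are names made up for the sketch.

```python
# Stage game as (Player A payoff, Player B payoff).
ACTIONS = ["cooperate", "defect"]
STAGE = {
    ("cooperate", "cooperate"): (2, 2),
    ("cooperate", "defect"): (-1, 3),
    ("defect", "cooperate"): (3, -1),
    ("defect", "defect"): (0, 0),
}

# Tomorrow both players defect whatever happened today, so the
# continuation payoff is the (defect, defect) payoff in every cell.
CONTINUATION = STAGE[("defect", "defect")]

# First-period matrix with tomorrow's equilibrium payoff rolled back into each cell.
augmented = {
    profile: (a + CONTINUATION[0], b + CONTINUATION[1])
    for profile, (a, b) in STAGE.items()
}

# Defect is still strictly better for Player A against either action of Player B.
defect_dominant = all(
    augmented[("defect", c)][0] > augmented[("cooperate", c)][0] for c in ACTIONS
)
print(augmented == STAGE)
print(defect_dominant)
```

Because the same 0 is added to every cell, the first-period matrix is identical to the one-shot matrix and defect remains dominant.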
Now here we played the game twice and got defect,
defect, what about if we played the game three times?
It's the same thing. The first pair didn't play three times, but
we did play the game three times between Edwina and Ben.
There we know we're going to defect in the third round.
Therefore we may as well defect in the second to last round.
Therefore we may as well defect in the first round.
If we played it five times, we know we're going to all
defect in the fifth round.

Therefore we may as well defect
in the fourth round. Therefore we may as well defect
in the third, and so on.
If we played it 500 times, we wouldn't have time in the
class, but if we played it 500 times,
we know in that 500th period it's a one shot game and people
are going to defect. And therefore,
in the 499th period people are going to defect.
And therefore in the 498th period people are going to
defect, and so on. So the problem here is that we
get unraveling, something we've seen before in
this class, we get unraveling from the back.
I have a worry that there might only be one L in unraveling in
America, is that right? How many L's do we put in
unraveling in America? One, I've just come back from
England and my spelling is somewhere in the mid-Atlantic
right now.

I'll leave it as one.
All right: unraveling from the back.
Essentially this is a backward induction argument,
only instead of using backward induction we're really using
sub-game perfection. We're looking at the equilibria
in the last games and as we roll back up the game,
we get unraveling. So here's bad news.
The bad news is, we'd hoped that by having
repeated interaction in the Prisoners' Dilemma,
we would be able to sustain cooperation.
That's been our hope since day one of the class.
In fact, we stated it kind of confidently in the first day of
the class, and we kind of intuitively believe it.
But what we're discovering is, even if you played this game
for 500 times and then stopped, you wouldn't be able to sustain
cooperation in equilibrium because we're going to get
unraveling in the last stage and so on and so forth.
So it seems like our big hope that repeated interaction would
induce cooperation in society is going down the plug hole.
That's bad.
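
The unraveling argument can be sketched as a loop that starts in the last period and walks backwards. This is a rough illustration, not a general solver: it assumes the symmetric Prisoners' Dilemma payoffs from the board, no discounting, and that the continuation value does not depend on today's play (which holds here because the stage game has a unique equilibrium). The names `ROW`, `dominant_action` and `unravel` are hypothetical.

```python
# Row player's stage payoffs: ROW[my action][other player's action].
ACTIONS = ["cooperate", "defect"]
ROW = {
    "cooperate": {"cooperate": 2, "defect": -1},
    "defect": {"cooperate": 3, "defect": 0},
}

def dominant_action(continuation):
    """Row action that is a best reply to every column action, given a fixed continuation value."""
    for r in ACTIONS:
        if all(
            ROW[r][c] + continuation >= ROW[d][c] + continuation
            for c in ACTIONS
            for d in ACTIONS
        ):
            return r
    return None

def unravel(periods):
    """Walk back from the final period; return the per-period plan and the total payoff."""
    continuation = 0  # nothing comes after the final period
    plan = []
    for _ in range(periods):
        action = dominant_action(continuation)
        plan.append(action)
        # By symmetry both players choose `action`; carry that stage payoff back one period.
        continuation += ROW[action][action]
    return list(reversed(plan)), continuation

plan, total = unravel(500)
print(set(plan), len(plan), total)
```

The continuation value is an additive constant in every period, so it never changes which action is best, and the plan is defect in all 500 periods.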

So let's come back and modify
our lesson a little bit. So what went wrong here was,
in the last period of the game, there were no incentives
generated by the future, so there was no promise of
future rewards or future punishments, and therefore
cooperation broke down and then we had unraveling.
So the lesson here is what? The lesson is:
but for this to work it helps to have a future. This whole idea of repeated
interaction was the future was going to create incentives for
the present. But if the games come to an
end, there's going to be some point when there isn't a future
anymore and then we get unraveling.
Now this is not just a formal technical point to be made in
the ivory tower of Yale. This is a true idea.
So for example, if we think about CEO's or
presidents, or managers of sports teams,
there's a term we use, there's a word we use–at least
in the states–to describe such leaders when they're getting
towards the end of their term and everyone knows it.
What's the expression we use? "Lame duck."
So we have this lame duck effect.
The lame duck effect at the end of somebody's term undermines
their ability to cooperate, their ability to provide
incentives for people to cooperate with them,
and causes a problem.

So this lame duck effect
affects presidents but it also affects CEO's of companies.
But it's not just leaders who run into this problem.
So if you have an employee, if you're employing somebody,
you may have a contract with the person you're employing,
but basically you're sustaining cooperation with this person
because you interact with them often.
You know you're always going to interact with them.
But then this employee approaches retirement.
Everyone knows that in April or something they're going to
retire, then the future can't provide incentives anymore.
And you have to switch over from the implicit incentive of
knowing you're going to be interacting in the future,
to an explicit incentive of putting incentive clauses in the

So retirement can cause,
if you like, a lame duck effect. This is even true in personal
relationships. In your personal relationships
with your friends, if you think that those
friendships are going to go on for a long time,
be they with your significant other or just with the people
you hang out with, you're likely to get a lot of
cooperation. But if, as with perhaps most
economics majors, most of your significant others
are only going to last for a day at most,
you're not going to get great cooperation.
You're going to get cheating. No one's rising to that one but
I guess it's true. So what do we call these:
"economics majors' relationships." These are kind of "end effects."
All of these things are caused by the fact that the
relationship is coming to an end.
And once the relationship is coming to an end,
all those threats and promises of future behavior,
implicit or otherwise, are going to basically disappear.

So at this point we might think
the following. You might conclude the
following. You might conclude that if a
relationship has a known end, if everyone knows the
relationship's going to end at a certain time,
then we're done and we basically can't sustain
cooperation through repeated interaction.
That's kind of what the example we looked at seems to suggest.
However, that's not quite true. So let's look at another
example where a relationship is going to have a known end but
nevertheless we are able to sustain some cooperation.
And we'll see how. So again, I'm being careful
here, I've said it helps to have a future.
I haven't said it's necessary to have a
future. So that's good news for the
Economics majors again. So let's do this example to
illustrate that, even a finite interaction,
even an interaction that's going to end and everyone knows
it's going to end, might still have some hope for cooperation.

So look at this slightly more
complicated game here. And this game has three
strategies and we'll call them A, B, and C for each player.
The payoffs are as follows: in the top row (4,4), (0,5),
(0,0); down here in the bottom row (0,0), (0,0) and (3,3);
and in the middle row (5,0), (1,1), (0,0). We're going to assume that this
game, just like we did with the first time we did Prisoner's
Dilemma, this game is going to be played twice.
It's going to be repeated, it's going to be played twice,
repeated once. So let's just make sure we
understand what the point of this game is.
In this game, in the one shot game I hope
it's clear that (A, A) is kind of the cooperative
thing to do.

We'd like to sustain play of
(A, A), because then both players get 4 and that looks
pretty good for everybody. However, in the one shot game
(A, A) is not a Nash equilibrium.
Why is (A, A) not a Nash equilibrium?
Let me grab those mikes again. Why is (A, A) not a Nash
equilibrium? Anybody?
I'm even getting to know the names at this stage of the term.
This is Katie, right? So shout out.
Student: The best response to the other guy
playing A is playing B. Professor Ben Polak:
Good, so if I think the other person's going to play A,
I'm going to want to defect and play B and obtain a gain,
a gain of 1. So basically I'll get 5 rather
than 4. I'll defect to playing B and
get 5 rather than 4 for a gain of 1, is that right?
So (A, A) is not a Nash equilibrium in the one shot
game–we're sometimes going to call it–it's fine–in the one
shot game. So now imagine we play this
game twice. Instead of just playing once,
we're going to play this game two times.
I'll come back to that.

Before I do that what are the
pure strategy Nash equilibria in this game?
Anybody? So the Nash equilibria in this
one shot game are (B, B) and (C, C).
There's some mixed ones as well but this will do.
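
A quick way to check the pure-strategy claim is to test all nine cells of the matrix. The sketch below assumes the row player's payoffs are as read off the board and that the game is symmetric, so the column player's payoff at (r, c) is the row player's payoff at (c, r). `ROW`, `pure_nash` and `gain_at_AA` are illustrative names.

```python
ACTIONS = ["A", "B", "C"]
# Row player's payoffs: ROW[my action][other player's action].
ROW = {
    "A": {"A": 4, "B": 0, "C": 0},
    "B": {"A": 5, "B": 1, "C": 0},
    "C": {"A": 0, "B": 0, "C": 3},
}

def pure_nash():
    """Pure-strategy profiles of the one-shot game where neither player can gain by deviating."""
    found = []
    for r in ACTIONS:
        for c in ACTIONS:
            row_ok = all(ROW[r][c] >= ROW[d][c] for d in ACTIONS)
            # Column player's payoff at (r, c) is ROW[c][r] by symmetry.
            col_ok = all(ROW[c][r] >= ROW[d][r] for d in ACTIONS)
            if row_ok and col_ok:
                found.append((r, c))
    return found

# Best one-shot gain from deviating when the other player sticks to A.
gain_at_AA = max(ROW[d]["A"] for d in ACTIONS) - ROW["A"]["A"]

equilibria = pure_nash()
print(equilibria)
print(gain_at_AA)
```

The check finds only (B, B) and (C, C), and confirms that deviating from (A, A) to B gains 1.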
So (B, B) and (C, C) are the pure strategy Nash
equilibria. Now consider playing this game
twice. And last time we looked at a
game played twice, it was Prisoner's Dilemma,
and we noticed that we couldn't sustain cooperation because in
the last stage people weren't going to cooperate and hence in
the first stage people weren't going to cooperate.
But let's look what happens here.
If this game is played twice is there any hope of sustaining
cooperation, i.e.

A, in both stages?
Could we have people play A in the first stage and then play A
again in the second stage? So Patrick's shaking his head.
So that's right to shake his head.
So let me grab the other mike. So why is that not going to
work? Why can't we get people to
cooperate and play A in both periods?
Shout out. Student: In the second
period you're still going to defect and play B.
Professor Ben Polak: Good, so in the second
period, exactly the argument that Katie produced just now in
the one shot game applies because the second period game
is a one shot game. So we've got no hope of
sustaining cooperation in both periods.
Let's call this cooperation.

We can't sustain (A,
A) in period two, in the second period.
However, I claim that we may be able to get people to cooperate
in the first period of the game. Now how are we going to do that?
So to see that let's consider the following strategy.
But consider the strategy–the strategy's going to be–play A
and then play C if (A, A) was played;
and play B otherwise. So this strategy is an
instruction telling the player how to play.
Now, before we consider whether this is an equilibrium or not,
let's just check that this actually is a strategy.
So what does a strategy have to do?
It has to tell me what I should do–it should give me an
instruction–at each of my information sets.
In this two period game, each of us, each of the players
in the game has two information sets.
They have an information set at the beginning of the game and
they have another information set at the beginning of period two.

Is that right?
So it has to tell you what to do at the first information set
and at the second information set and it does.
It says play A at the first one, and then at the beginning
of period two–I said there's only one information set there,
but actually there's nine possible information sets
depending on what happened in the first period.
So each thing that happened in the first period is associated
with a different information set.
I always know what happened in the first period.
And at each of those nine information sets it tells me
what to do at the beginning of period two.
In particular, it says if it turns out that
(A, A) was played then play C now.
And otherwise, all the other eight possible
information sets I could find myself in, play B.
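
One way to see that this really is a complete strategy is to write it as a lookup table with one entry per information set. The sketch below is illustrative only; the key `"period 1"` and the function name `second_period_action` are invented for the example.

```python
from itertools import product

ACTIONS = ["A", "B", "C"]

# Nine possible first-period outcomes, each written as (my action, other player's action).
histories = list(product(ACTIONS, repeat=2))

def second_period_action(history):
    """Play C after (A, A); play B after anything else."""
    return "C" if history == ("A", "A") else "B"

# One instruction for the start of the game, plus one for each period-two history.
strategy = {"period 1": "A"}
strategy.update({h: second_period_action(h) for h in histories})

number_of_sets = len(strategy)
plays_C_after = [h for h in histories if strategy[h] == "C"]
plays_B_after = [h for h in histories if strategy[h] == "B"]
print(number_of_sets, len(plays_C_after), len(plays_B_after))
```

The table has ten entries: one for the first period and nine for the second, of which one says C and eight say B.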
So this is a strategy. Now of course the big question
is, is this strategy an equilibrium, and in particular
is it a sub-game perfect equilibrium? Let me be a bit more precise:
if both players were playing this strategy would that be a
sub-game perfect equilibrium? Well let's have a look.

(Of course I don't see it now
so let's pull both these boards down.) So to check whether this
is a sub-game perfect equilibrium,
we're going to have to check what?
We're going to have to check that it induces Nash behavior in
each sub-game. (I think the battery is going
on that. Shall I get rid of that.
Okay, I'm going to shout. Can people still hear me?
People in the balcony can they hear me?
Yup, okay.) So we're going to have to see if we can sustain
Nash behavior in each sub-game. So let's start with the
sub-games associated with the second period.
Technically, there are nine such sub-games
depending on what happened in the past, depending on what
happened in the first period. There's a sub-game following (A,A).

There's a sub-game following
(A,B). There's a sub-game following
(A,C), and so on. So for each activity in the
first period, for each profile in the first
period there's a sub-game. However, it doesn't really
matter to distinguish all of these sub-games particularly
carefully here, since the costs from the past,
what happened in the past is sunk, so we'll just look at them
as a whole. So in period two after
(A,A)–so you have one in particular of those nine
sub-games–after (A,A), this strategy induces (C,
C). If people play A,
if both people play A in the first period,
then in the sub-game following people are supposed to play (C,
C). Is that a Nash equilibrium of
the sub-game? Well was (C,C) a Nash
equilibrium? Yeah, it's one of our Nash equilibria.

Let's just look up there,
we've got it listed. There it is.
So we're playing this Nash equilibrium.
So that is a Nash equilibrium, so we're okay.
After the other choices in period one, this strategy
induces (B, B). That's good news too because
(B, B), we already agreed, was a Nash equilibrium in the
one shot game. So in all of those nine
sub-games, the one after (A,A), and the eight after everything
else, we're playing Nash behavior, so that's good.
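
The second-period check can be run over all nine sub-games mechanically. This is a minimal sketch under the same symmetric-payoff assumption as before; `is_stage_nash` and `prescribed` are hypothetical helper names.

```python
from itertools import product

ACTIONS = ["A", "B", "C"]
# Row player's payoffs: ROW[my action][other player's action]; the game is symmetric.
ROW = {
    "A": {"A": 4, "B": 0, "C": 0},
    "B": {"A": 5, "B": 1, "C": 0},
    "C": {"A": 0, "B": 0, "C": 3},
}

def is_stage_nash(r, c):
    """True if (r, c) is a Nash equilibrium of the one-shot game."""
    row_ok = all(ROW[r][c] >= ROW[d][c] for d in ACTIONS)
    col_ok = all(ROW[c][r] >= ROW[d][r] for d in ACTIONS)
    return row_ok and col_ok

def prescribed(history):
    """Second-period action the strategy assigns after a first-period outcome."""
    return "C" if history == ("A", "A") else "B"

# Both players use the same strategy, so after history h they play (prescribed(h), prescribed(h)).
checks = {
    h: is_stage_nash(prescribed(h), prescribed(h))
    for h in product(ACTIONS, repeat=2)
}
all_subgames_ok = all(checks.values())
print(len(checks), all_subgames_ok)
```

Past payoffs are sunk, so each sub-game is just the one-shot game, and the prescribed play is a stage-game equilibrium in all nine.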
What about in the whole game? In the whole game,
starting from period one, we have to ask do you do better
to play the strategy as designated,
in particular to choose A, or would you do better to
defect? Well let's have a look.
So if I choose A–remember the other person is playing this
strategy–so if I choose A then my payoff in this period comes
from (A,A) and is 4.

If I choose A then we're both
playing A in this period and I get 4.
Tomorrow, according to this strategy–tomorrow since we both
played A–both of us will now play C.
Since we're both playing C, I'll get an additional payoff
of 3. So tomorrow (C,
C) will occur and I'll get 3 for a total of 7.
What about if I defect? We could consider lots of
possible defections, but let's just consider the
obvious defection. You can check the other ones at
home. So if I defect and choose B
now, then, in this period, I will be playing B and my
opponent or my pair will be playing A.
So in this period I will get 5. And tomorrow,
since (A,A) did not occur, both of us will play B and get
a continuation payoff of 1. So the continuation payoff this
time will be following from (B, B), and I'll get 1.
Why don't I do what I've been doing before in this class and
put boxes around the continuation payoffs,
just to indicate that they are, in fact, continuation payoffs.
So if I play A, I get 4 now and a continuation
payoff of 3, for a total of 7.

If I play B now,
I gain something now, I get 5 now,
but tomorrow I'll only get 1 for a total of 6. So in fact, 7 is bigger than 6,
so I'm okay and I won't want to do this defection. I just want to write this one
other way because it's going to be useful for later.
So one other way to write this, I think we've convinced
ourselves that this is an equilibrium,
but one other way to write this is, and it's a more general way
in repeated games, is to write it explicitly
comparing the temptations to cheat today with the rewards and
punishments from tomorrow.

So what we want to do is,
in general, we can just rewrite this as checking that the
temptation to cheat or defect today is smaller than the value
of the reward minus the value of the punishment.
But the key words here are: defecting occurs today;
rewards and punishments occur tomorrow.
If we just rewrite it this way, we'll see exactly the same
thing, just rearranging slightly.
The temptation to defect today is I get 5 rather than 4,
or if you like a gain of 1. And the value of the reward
tomorrow–the reward was to play (C, C) tomorrow and get 3.
The value of the punishment tomorrow was to play (B,
B) tomorrow and get 1, and that difference is 2.
So here the fact that the temptation is outweighed by the
difference between the value of the reward and the value of the
punishment is what enabled us to sustain cooperation.
I'm just writing that in a more general way because this is the
way that we can apply in games from here on.
We're going to compare temptations to cheat with
tomorrow's promises.
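
The first-period check, including the deviations left for home, can be sketched as follows. It assumes the other player follows the strategy, there is no discounting, and payoffs are as on the board; `total_if_first_action` and the other names are illustrative.

```python
ACTIONS = ["A", "B", "C"]
# Row player's payoffs: ROW[my action][other player's action].
ROW = {
    "A": {"A": 4, "B": 0, "C": 0},
    "B": {"A": 5, "B": 1, "C": 0},
    "C": {"A": 0, "B": 0, "C": 3},
}

def total_if_first_action(mine):
    """My two-period payoff if I choose `mine` today and both of us follow the strategy tomorrow."""
    today = ROW[mine]["A"]  # the other player plays A in period one
    tomorrow_action = "C" if mine == "A" else "B"  # (A, A) happened only if I played A
    tomorrow = ROW[tomorrow_action][tomorrow_action]  # continuation payoff
    return today + tomorrow

totals = {a: total_if_first_action(a) for a in ACTIONS}

# The same check written as temptation today versus reward minus punishment tomorrow.
temptation = max(ROW[d]["A"] for d in ACTIONS) - ROW["A"]["A"]
reward = ROW["C"]["C"]
punishment = ROW["B"]["B"]
no_profitable_deviation = temptation <= reward - punishment

print(totals)
print(temptation, reward - punishment, no_profitable_deviation)
```

Playing A gives 7, deviating to B gives 6, and deviating to C gives 1, so the temptation of 1 is outweighed by the difference of 2 between reward and punishment.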


Patrick, let me get you a mike.
Student: I don't understand why it's reasonable
to think you would play (B, B) in the second period though.
In the second period you have a temptation to play C,
C even if the person defected on you.
Professor Ben Polak: Good, that's a very good
point. So what Patrick's saying is,
it's all very well to say we're sustaining cooperation in the
first period here, but the way in which we
sustained cooperation was by going along with,
as it were, the punishment tomorrow.
It required me, tomorrow, to go along with the
strategy of choosing B if I cheated in the first period.
I want to answer this twice, once disagreeing with him and
once agreeing with him.

So let me just disagree with
him first. So notice tomorrow,
if the other person, the other player is going to
play B then I'm going to want to play B.
So the key idea here is, as always in Nash equilibrium,
if I take the other person's play as given and just look at
my own behavior–if I think the other person is playing this
strategy and hence he's going to play B tomorrow after I've
cheated–then I want to play B myself.
So that check is just our standard check,
and actually that's the check that makes sure that it really
is a sub-game perfect equilibrium.
We're not putting some punishments down the tree that
are arising out of equilibrium.

It has to be I want to do
tomorrow what I'm told to do tomorrow.
So that idea seems right and I'm glad Patrick raised it
because that was the next thing in my notes.
I want to go along with this punishment because if the other
person's playing B I want to play B myself.
Nevertheless, I think Patrick's onto
something and let me come back to it in a minute.
What I want to do before I do that is just draw out a general
lesson from this game.

The general lesson is we can
sustain cooperation even in a finitely repeated game,
but to do so we need there to be more than one Nash
equilibrium in the stage game. What we need there to be is
several Nash equilibria, one at least of which we can
use as a reward and another one which we can use as a
punishment. So even if a game is only
played a finite number of times, if there are several equilibria
in the stage game, both (B,B) and (C,C),
we can use one of them as a reward and the other one as a
punishment, and use that difference to try
and get people to resist temptations today.
So that's the general idea here, and let's just write that
down. Patrick don't let me get away
with not coming back to your point, I want to come back to it
in a second. So the lesson here is,
if a stage game–a stage game is the game that's going to be
repeated–if a stage game has more than one Nash equilibrium
in it, then we may be able to use the
prospect of playing different equilibria tomorrow to provide
incentives–and we could think of these incentives as rewards
and punishments–for cooperation today.
In the game we just saw, there were exactly two pure
strategy Nash equilibria in the sub-game.
We used one of them as a reward and the other one as a
punishment, and we were able to sustain cooperation in a
sub-game perfect equilibrium.
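The requirement that the stage game have more than one Nash equilibrium can be checked mechanically. Below is a minimal Python sketch that enumerates the pure-strategy Nash equilibria of a 3x3 stage game. The payoff matrix is an assumption: it is chosen to be consistent with the payoffs quoted in this lecture (1 each at (B,B), 3 each at (C,C), nothing for a player who is cheated on), but the remaining entries and the helper name `pure_nash` are illustrative, not taken from the board.

```python
from itertools import product

ACTIONS = ["A", "B", "C"]

# Hypothetical stage-game payoffs: PAYOFFS[(row, col)] = (row payoff, col payoff).
# Assumed for illustration; only the 1s at (B,B) and the 3s at (C,C) are quoted in the lecture.
PAYOFFS = {
    ("A", "A"): (4, 4), ("A", "B"): (0, 5), ("A", "C"): (0, 0),
    ("B", "A"): (5, 0), ("B", "B"): (1, 1), ("B", "C"): (0, 0),
    ("C", "A"): (0, 0), ("C", "B"): (0, 0), ("C", "C"): (3, 3),
}

def pure_nash(payoffs, actions):
    """Return the action pairs at which neither player gains by deviating alone."""
    equilibria = []
    for row, col in product(actions, repeat=2):
        row_pay, col_pay = payoffs[(row, col)]
        # Row player: no alternative row does strictly better against this column.
        row_ok = all(payoffs[(r, col)][0] <= row_pay for r in actions)
        # Column player: no alternative column does strictly better against this row.
        col_ok = all(payoffs[(row, c)][1] <= col_pay for c in actions)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria

print(pure_nash(PAYOFFS, ACTIONS))  # [('B', 'B'), ('C', 'C')]
```

Under these assumed payoffs the cooperative outcome (A,A) is not itself an equilibrium of the stage game, which is why it needs tomorrow's reward and punishment to support it.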

Now, a question arises here,
and I think it's behind Patrick's question,
and that is how plausible is this?
How plausible is this? Formally, if we write down the
game and do the math, this comes out.
But how plausible is this as a model of what's going on in
society? I think the worry–I'm guessing
this is the worry that was behind Patrick's question–is this.
Suppose I'm playing this game with Patrick and suppose Patrick
cheats on me the first period, so Patrick chooses B while I
wanted him to choose A in the first period.
Now in the second period, according to the equilibrium
instructions, we're supposed to play (B,
B) and get payoffs of 1 rather than (C,C) and get payoffs of 3.
So let's make that visible again.

But suppose Patrick comes to me
in the meantime. So between period one and
period two, Patrick shows up at my office hours and he says:
yeah, I know I cheated on you
yesterday, but why should we punish ourselves today?
Why should we, both of us, lose today by
playing the (B,B) equilibrium? Why don't we both switch to the
(C,C) equilibrium? After all, that's better for
both of us. Patrick's saying to me,
it's true that I cheated you yesterday, but "let bygones be
bygones," or "why cry over spilt milk,"
or he'll use some other saying plucked out of the book of
platitudes, and say to me:
well why go along with the punishment.
Let's just play the good equilibrium now.
And, if I look at things and I say well, actually,
it's true I got nothing in the first period because Patrick
kind of cheated me in the first period–so it's true I got
nothing yesterday–and it's true it was Patrick who caused me to
get nothing yesterday, but nevertheless that's a sunk
cost and I'm comparing getting 1 now with getting 3 now.
Why don't we just go along and get 3?
Moreover, I'm not in danger of being cheated again because if
Patrick believes I'm going to play C, he's going to play C.

So that kind of argument
involves what? It involves some kind of
communication between stages, but it sounds like that's going
to be a problem. Why?
Well, suppose it's the case that we are going to get
communication between periods and suppose it's the case that
someone with the gift of the gab,
someone on his way to law school like Patrick,
is going to be able to persuade me to go back to the good
equilibrium for everybody in period two,
then we know we're going to play the good equilibrium in
period two and now we've lost any incentive to cooperate in
period one. The only reason I was willing
to cooperate in period one was because the temptation to defect
was outweighed by the difference between the value of the reward
and the value of the punishment. If we're going to get the
reward anyway, I'll go ahead and defect today.
So the problem here is this notion of "renegotiation," this
notion of communicating between periods can undermine this kind
of equilibrium. There's a problem that arises
if we have renegotiation.
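The renegotiation problem can be seen in the same temptation-versus-promises comparison. The sketch below uses the continuation payoffs quoted above (3 from (C,C), 1 from (B,B)). The size of the first-period temptation, set to 1 here, is an assumption for illustration, as is the function name.

```python
def cooperation_sustainable(temptation, reward, punishment):
    """First-period cooperation holds if the gain from cheating today
    does not exceed what cheating costs tomorrow (reward - punishment)."""
    return temptation <= reward - punishment

TEMPTATION = 1   # assumed gain from cheating in period one (hypothetical value)
REWARD = 3       # payoff from (C,C) tomorrow, as quoted in the lecture
PUNISHMENT = 1   # payoff from (B,B) tomorrow, as quoted in the lecture

# Without renegotiation: a cheater really is sent to (B,B) tomorrow.
print(cooperation_sustainable(TEMPTATION, REWARD, PUNISHMENT))  # True

# With renegotiation: the players talk their way back to (C,C) after cheating,
# so the "punishment" pays exactly what the reward pays.
print(cooperation_sustainable(TEMPTATION, REWARD, REWARD))  # False
```

Once the punishment is expected to be renegotiated away, the difference between reward and punishment is zero, so any positive temptation is enough to break cooperation.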

So there may be a problem of
renegotiation. Now, this may not be
such a big problem. For example,
it may be, say, I'll be so angry at Patrick
because he screwed me over in period one that I won't go along
with the renegotiation. It may also be the case,
and we'll see some examples of this on the homework assignment,
that the many equilibria in the second stage of the game are not
such that a punishment for Patrick is also a punishment for
me. What really caused the problem
here was, in trying to punish Patrick, I had to punish myself.
But you could imagine games, or see some concrete examples
on the next homework assignment, in which punishing Patrick is
rather fun for me, and punishing me is rather fun
for Patrick, and that's going to be much
harder to renegotiate our way out of.
There was a question, let me get a mike out to the
question. Yeah?
Student: If we're ruling out renegotiation,
can't we devise a strategy for Prisoner's Dilemma as well even
though it doesn't have multiple Nash equilibriums?
Professor Ben Polak: Yeah, okay good,
so the issue there is, in Prisoner's Dilemma,
we established in the first week that if we're not allowed
to make side payments, we're not allowed to bring in
outside contracts, then no amount of communication
is going to help us.

So you're right if we can rely
on the courts or the mafia to enforce the contracts that would
be fine and then communication would have bite.
But you remember way back in the first week when we tried to
talk our way out of bad behavior in the Prisoner's Dilemma it
didn't help precisely because it's a dominant strategy.
Whereas, here, Patrick's conversation,
Patrick's verbal agreement to play the other equilibrium is an
agreement to play a Nash equilibrium.
That's what is getting us into trouble.
So what may help us here, what may avoid renegotiation is
simply I'm not going to go along with that renegotiation–I'm too
angry about having been cheated on–and it may be for other
reasons it may actually be that I enjoy the punishment.
Nevertheless, this is a real problem in
society and I don't think we should pretend that this problem
isn't there.

So a good example is in
bankruptcy, which is one of those words I can never spell.
It seems I have too many consonants in it,
is that right? It's approximately right
anyway. So bankruptcy law in the U.S.
for the last 200 odd years has gone through cycles.
One way to view these cycles is, they're cycles of relaxing
the law and making life "easier for borrowers" and then
tightening up again.

This is not only a recent
phenomenon:
it occurred throughout the nineteenth century.
So what typically happened was there was either explicit
renegotiation between parties or renegotiation through act of
Congress or sometimes through the acts of the states,
in which bankrupt debtors were basically let off or given
easier terms. The argument was always the
same. These people are not going to
pay back now. It's clear from the nineteenth
century, often if you were bankrupt you were in jail,
actually worse than that. Sometimes in the nineteenth
century in England not only if you were bankrupt were you in
jail but your creditors were having to pay the fees to feed
you in jail. So there you were sitting in
jail, you weren't paying that money back to your creditor,
and you're actually costing money to your creditor by being
in jail. This seems like a situation
that you want to renegotiate your way out of.
You say, hey let's let these guys out of jail.
Let them be productive again, and then they'll pay back part
of the loans.

So you had these waves of
bankruptcy reform in which the debtors' prisons were closed
down, people were let out, people were relieved of debt.
What's the problem with doing that?
That seems like a good idea, right?
After all, you don't want all these people bankrupt,
in debt, not paying money back to their creditors anyway.
That doesn't seem like a good situation in society.
It seems like a renegotiation that's a win-win situation:
it's better for everybody. What's the problem with it
though? Let's get a mike down here.
What's the problem with this? Student: It incentivizes
bankruptcy. Professor Ben Polak:
Right, it creates an incentive for people not to
repay in the first place. It creates an incentive for
people to take big risks now, and hence, it makes bankruptcy,
if you like, or makes non-repayment of debt
more likely. So this has been going on for a
while, but you see it very much today if you read the financial
pages of the papers in the last few weeks.
There's a big worry in the U.S.
right now about people failing
to repay what kind of debt? What kind of debt is the big
worry about? Mortgage debt,
right, so those people who are house owners failing to pay back
mortgage debt and, equally worrying,
financial institutions that have lent a lot of,
for example, sub-prime debt now find
themselves in financial trouble. You're going to read a lot in
the papers about not letting people lightly out of those
situations of being in debt, or not letting people
lightly out of bankruptcy.

The term you're going to hear
is "bail out." So bail out–the argument
you're going to read is, you don't want the government
or the central bank bailing out those financial institutions who
have apparently taken too large risks on sub-prime mortgage
debts, even though we all agree it's
better right now for those financial institutions not to go
under. Why are we not going to–Even
though it's better for everybody for it not to go under,
why are we not going to bail them out?
Because it undermines the incentives for them not to make
bad loans to start with. To a lesser extent you're going
to hear that on the debtor side as well.
You're going to hear some people say we shouldn't be
bailing out people who took on bad loans,
took on bad mortgages to finance their houses,
again for bail out reasons.

So this is an important trade
off. If you go on to law school,
you're going to see a lot about this kind of discussion,
and this is the discussion of trading off ex-ante efficiency
and ex-post efficiency. Sometimes, as Patrick has
pointed out in the game just now, the ex-post efficient thing
to do is to go back to the good equilibrium,
or if you like to bail out these firms who've made bad
loans. However, from an ex-ante point
of view, it creates bad incentives for people to make
those loans in the first place; and, from the ex-ante point of
view, it created the incentive for people to defect in the
first period of that game.

So this theme of ex-ante versus
ex-post efficiency is not one we're going to go into anymore
in this class, but it should be there in the
back of your minds when you all end up in law school in a few
years' time. Okay, so, so far what have we
done? We've been looking at repeated
interaction and seeing if it can sustain cooperation.
The first thing we learned was that if the repeated interaction
is a finite interaction, if we know when it's going to
end–we know when the interaction's going to end–then
sustaining cooperation is going to be hard because in the last
period there will be an incentive to defect.
We saw we could get around that to some extent if games have
multiple equilibria, but in a game like Prisoner's
Dilemma, we're really in trouble.
Things will unravel from the back.
So now let's mix things up a little bit by looking at a more
complicated variety of repeated interactions.
Rather than just play the game once or twice,
or three times, let's play the game under the
following rules.

We'll go back to our same
players, how many mikes are still out here?
I took them both back, is that right?
I'm taking both the green and the blue mike,
and I'm giving them back to our players.
So this is to Brooke and this is to Patrick.
And we're going to have Brooke and Patrick play Prisoner's
Dilemma again. I'm hoping I haven't deleted it.
Maybe I did. It doesn't matter we know the
payoffs. We're going to have them play
Prisoner's Dilemma again, but this time,
in between every play of the game, I'm going to toss a coin.
Actually I'll toss the coin twice and if that coin comes up
heads both times then the game will end, but otherwise they'll
play again. So everyone understand what
we're going to do? We're going to play Prisoner's
Dilemma. At the end of every period I'll
toss a coin twice. I might get Jake to toss it.
Jake will toss a coin twice. If it comes up heads both times
the game's over but otherwise the game continues.
So both Brooke and Patrick should get ready to play,
and the payoffs of this game are just what we had before.
So let's just remind ourselves what the payoffs of that game are.

So we've got cooperate,
defect, cooperate, defect, (2,2),
(-1,3), (3, -1) and (0,0). And we'll keep score here:
so this is Brooke and Patrick. So, putting pressure on these
guys, let's write down what you're going to do the first
time. Brooke? Student: Defect.
Professor Ben Polak: Patrick?
Student: Cooperate. Professor Ben Polak:
All right. I think we're getting some
payback from earlier, right.
Round two. Student: Are you going
to toss the coin? Professor Ben Polak:
Oh I have to toss the coin, you're absolutely right,
thank you. Now I have to find a coin.
Look at that, thank you Ale. Twice: toss it twice. Heads, heads again,
so the game is over. That didn't last long.
Just for the sake of the class, let's pretend that it came up
tails. Okay we'll cheat a little bit.
Okay, so we're playing a second time–just with a little bit of

I need someone else,
someone less honest to toss the coin.
Brooke what do you choose? Student: Oh I'm
defecting. Professor Ben Polak:
Defecting again, Patrick?
Student: Cooperate. Professor Ben Polak:
Cooperate, Patrick seems very trusting
here, all right let's toss the coin a third time.
All right, Brooke? Student: I'm going to
defect again. Student: Defect.
Professor Ben Polak: All right,
heads, heads, so this time we'll end it.
So what happened this time, let's just talk about it a bit.
So Brooke and Patrick were playing, Patrick cooperated a
bit in the beginning, Brooke's defected throughout.
Brooke why did you defect? Shout out so everyone can hear
you. Why did you defect right from
the start of the game? Student: Because last
time it didn't work so well cooperating.
Professor Ben Polak: Last time it didn't work
so well, okay.

Fair enough but even after
Patrick was sort of cooperating you went on defecting.
So why then? Student: Because I
wanted to get the higher payoff, I thought either he would
continue cooperating and I could defect,
Professor Ben Polak: All right,
if he had gone on cooperating, which in fact he did.
Patrick why were you cooperating early on here?
Shout out so people can hear you.
Student: So with a two head rule, like you have a 75%
chance at having another game. So with those payoffs,
even one period the payoff of cooperating twice is the same as
defecting once, so it's better if you can
continue cooperating, and the percentage is high
enough that it would make sense to do so.
Professor Ben Polak: All right,
if you figure there's a good enough chance of getting–even
after Brooke's defected the first period you went on
cooperating, but then after the second
period you gave up and started defecting.
If it had gone on to the fourth period what would you have done?
Student: Defected.

Professor Ben Polak:
You would have defected again, all right.
Fifth period? Student: Well if she
kept defecting, I would keep defecting.
Professor Ben Polak: All right,
so what Patrick's saying is he started off cooperating but once
he saw that Brooke was defecting,
he was going to switch to defect.
And basically as long as she went on defecting,
he was going to stick with defecting.
Let's try a different pair. So why don't we switch it over
to your partners there. So Ben here and Edwina. So why don't you stand up.
I want everybody to see these people.
So stand up a second. So these are our players,
I want people at the back to essentially know who are
playing, this is Edwina and this is Ben.
So Edwina–sit down so you can actually write things down.
So Edwina and Ben, Edwina, have you both written
down a strategy? Ben, have you written down a
strategy? Edwina what did you choose?
Student: Cooperate.

Professor Ben Polak:
So Edwina's cooperating, Ben?
Student: Cooperate. Professor Ben Polak:
Okay, let's toss the coin. So we're okay,
so we're still playing. Edwina?
Student: Cooperate. Professor Ben Polak:
Ben? Student: I chose
cooperate. Professor Ben Polak:
All right, so they're cooperating.
Tails again, so you're still playing.
Student: Cooperate. Student: Cooperate.
Professor Ben Polak: All right,
so they're still cooperating. Some pain in the voice this
time. Heads and then tails,
write down what you're going to do.
Edwina? Student: Defect. Professor Ben Polak:
Ben. Student: Cooperate.
Professor Ben Polak: Things were going so nicely
there. We had such a nice class going
on there–. All right, so we're still
playing. Edwina? Student: Defect.
Professor Ben Polak: Ben?
Student: Defect. Professor Ben Polak:
All right, Jake?
Tails, tails, we're still going. Student: Defect.
Student: Defect. Professor Ben Polak:
All right, let me stop it there,
we'll pretend that we had two heads.
So let's talk about this. We had some cooperation going
on here, both people started cooperating.
So Ben, why did you cooperate at the beginning?
Student: Well, going along with Patrick's
reasoning I felt that if we could have the cooperate,
cooperate in the long term with the 75% chance of continuing
playing, that it would be a worthwhile investment.
Professor Ben Polak: All right.
Student: Until I realized that Edwina had started defecting.

Professor Ben Polak:
Let's come back a second. Let's get you guys to stand up
so people can hear you. When you stand up you shout
more. So stand up again.
Edwina, so you also started cooperating, why did you start
cooperating? Student: For the same
reason. Professor Ben Polak:
Same reason, okay.
So the key thing here is why did you start defecting?
You heard the big sigh in the class.
Why did you start defecting at this stage?
Student: Because we'd had so many, I mean the coin
toss had to come to heads, heads sometime,
so I started thinking that maybe- Professor Ben Polak:
The reversion to the mean of the coin.
Student: Yeah, I just thought that it.
I thought, I mean, I don't know.

Professor Ben Polak:
So what did I say about the relationships of Economics majors
that are in the class? Anyway, all right,
so Edwina defected and then Ben you switched after,
why did you switch? Student: Because once
Edwina started defecting I felt that we'd revert back to the
defect, defect equilibrium. Professor Ben Polak:
All right, so thank you guys.
So there's another good strategy here.
People started off cooperating and I claim that at least
Ben–Ben can contradict me in a second–but I think Ben's
strategy here was something like this.
I'm going to cooperate and I'm going to go on cooperating as
long as we're cooperating.

But if at some point Edwina
defects–or for that matter I defect–then this relationship's
over and we're going to play defect forever.
Is that right? That's kind of a rough
description of your strategy? Edwina was more or less playing
the same thing. In fact it was her who
defected, but once she defected she realized that it was over
and she went on defecting. So this strategy has a name.
Let's just be clear what the strategy is.
This strategy says play C which is cooperate,
and then play C if no one has played D;
and play D otherwise. So start off by cooperating.
Keep cooperating as long as nobody's cheated.
But if somebody cheats, this relationship's over:
we're just going to defect forever.
Now this strategy is a famous strategy.
It has a name.

Anyone know what the name is?
This is called the "Grim Trigger Strategy."
So this strategy again, it says we're going to
cooperate, but if that cooperation breaks down ever,
even if it's me who breaks it down, then I'm just going to
defect forever. Now, we're going to come back
next time to see if this is an equilibrium, but there's a few
things to do first. First let's just check that it
actually is a strategy. What does it mean to be a
strategy again? It has to tell us what to do at
every information set I could find myself at.
And this game is potentially infinite, so potentially there's
an infinite number of information sets I could reach.
So you might think that writing down a strategy that gives me an
instruction at every single information set is going to be
incredibly complicated once we go to games that are potentially
infinite, because there needs to be an
infinite number of instructions.

But it turns out,
actually it's possible to write down such strategies rather
simply, at least if they're simple strategies.
This example is one. This tells me what to do at the
first information set, it says play C.
It then tells me for every information set I find myself
at, in which only cooperation has ever occurred in the history
of the game, I'm going to go on cooperating:
play C. And it says for all other
histories, for all other information sets I might find
myself at, play D. So it really is a strategy.
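To see how compactly such a strategy can be written down even though the game has infinitely many information sets, here is a minimal Python sketch of the Grim Trigger rule just described. Representing a history as a list of (my action, their action) pairs, and the function name `grim_trigger`, are assumptions made for illustration.

```python
def grim_trigger(history):
    """history: list of (my_action, their_action) pairs from earlier periods.
    Returns "C" if no one has ever played "D", and "D" otherwise."""
    for mine, theirs in history:
        if mine == "D" or theirs == "D":
            return "D"
    return "C"

print(grim_trigger([]))                        # first period: C
print(grim_trigger([("C", "C"), ("C", "C")]))  # only cooperation so far: C
print(grim_trigger([("C", "C"), ("C", "D")]))  # the other player defected: D
print(grim_trigger([("D", "C"), ("C", "C")]))  # my own past defection also triggers: D
```

One short rule covers every possible history, of any length, which is what it means for this to be a complete strategy.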
Now this is very different behavior–we played with the
same players–this kind of behavior is very different,
in both games actually, is very different than the
behavior we saw in the game that ended,
the game with two periods or three periods.
What is it essentially that made this different?
What's different about this way of playing Prisoner's Dilemma,
where we had Jake toss the coin versus the way we played before
and we just played for five periods and then stopped?
What's different about it? Somebody?
Let's talk to our players, Patrick why is this different?
Student: We don't know when the game is going to end or if
it's going to end, so there's no last period.
Professor Ben Polak: Good, so our analysis of
the game before, the analysis of the Prisoner's
Dilemma when we knew it was going to end after two periods,
after five periods, whatever it was,
was we all knew it was going to end.
There was a clearly defined last period.
When people are going to retire, we know the month in
which they're going to retire.

When Presidents are going to
step down, we know they're going to step down that period.
When CEOs are going to go, we know they're going to go–or
actually we don't always know they're going to go but let's
just pretend we do. So what's different about this
game is, every time we play the game, there is a probability,
in this case a .75 probability that the game is going to
continue to the next period. Every time we play the game,
with probability of .75 there's going to be a future.
There's no obvious last period from which we can unravel the
game in the way we did before. Just to remind ourselves,
the way in which cooperation–our analysis of
cooperation–broke down in the finitely repeated Prisoner's
Dilemma, was when we looked at the last
period, we know people are going to defect.
And once that thread is loose we can unravel it all the way
back to the beginning.
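The .75 continuation probability follows from the two-coin rule, and it also says how long the game can be expected to last. The following is a small Python sketch of that arithmetic; it assumes a fair coin and independent tosses, which the lecture implies but does not state.

```python
from fractions import Fraction

p_heads = Fraction(1, 2)       # assumes a fair coin
p_end = p_heads * p_heads      # two heads in a row ends the game
delta = 1 - p_end              # probability the game continues to the next period

print(p_end, delta)            # 1/4 3/4

# Probability that period t is actually played (period 1 is always played).
for t in range(1, 6):
    print(t, delta ** (t - 1))

# Expected number of periods played: 1 + delta + delta^2 + ... = 1 / (1 - delta).
print(1 / (1 - delta))         # 4
```

The chance of reaching any given period shrinks geometrically but never hits zero, so there is no period that the players know to be the last.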

But here, since there is no
last period that unraveling argument never gets to hold.
Now instead we're able to see strategies emerge like the Grim
Trigger Strategy, and notice that the Grim
Trigger Strategy has a pretty good chance of actually
sustaining cooperation. So in particular,
as long as people play this strategy they are cooperating.
It turns out that Edwina eventually gave up that
strategy, but had she gone on playing it, they would have gone
on cooperating forever. But of course there's a
question here, and the question is:
is this in fact an equilibrium? We know that if people play
this way, we get cooperation, but the question–the thousand
dollar question or whatever–is: is this an equilibrium?
So what do we have to do to check whether this is an equilibrium
or not? We have to mimic the argument
we had before.

We have to compare the
temptation to defect today and compare that with the value of
the reward (to cooperating) and the value of the punishment
(from defecting) tomorrow. So this basic idea is going to
re-emerge. Having said that,
let me now delete it so I have some room. To show this is an equilibrium,
we need to show that the temptation to defect–the
temptation to cheat in the short term–is outweighed by the
difference between the value of the reward and the value of the
punishment. All right, so let's set that up. Let's put the temptation here
first. So the temptation in Prisoner's
Dilemma, the temptation to cheat today is what?
I'll get 3 rather than 2, is that right?
So if I defect–when Edwina defected: here's Edwina
defecting in this period–she got a payoff of 3 rather than
the payoff of 2 she would have got from cooperating.
So the temptation here is just 3 – 2 and let's be clear,
this is a temptation today and we want to compare this with the
value of the reward minus the value of the punishment,
but the key observation is that these occur tomorrow.
So since they occur tomorrow we have to weight them a little bit.

So in general,
the way in which we're going to weight them tomorrow is we're
going to discount them just like we did in our bargaining game.
We're going to weight tomorrow's payments by δ,
where δ < 1.
Now why is δ < 1? Why are we weighing tomorrow
less than payment today? Why are payments tomorrow worth
less than payments today? Because tomorrow might not
happen. There are other reasons why,
by the way. It might be that we are
impatient to get the money today.
Edwina just wanted the payoff in a hurry, or it might be that
she wanted to take the payment today and put it in the bank and
earn interest. There are other reasons why
money today might be more valuable than money tomorrow,
but, in games, the most important reason is:
tomorrow may not happen.

By tomorrow you might be dead,
or, if not dead, at least Jake's thrown two
heads in the coins. So δ is less than 1
because the game may end. Now, what's the value of the
reward? The value of the reward is
going to be the value of C "for ever," but you want to be
careful about "for ever." It's C for ever,
but of course it isn't really for ever because the game may
end. So by "for ever" I mean until
the game ends. Let me be a bit more careful
actually, it's (C, C) isn't it?
The value of (C, C)–(cooperate,
cooperate)–for ever.

Here we're going to have the
value of (D, D) for ever. And once again,
the for ever here means until the game ends.
So this is the calculation we're going to have to do.
We're going to have to compare the temptation,
that was easy, that was just 1 with the
discounted difference between the value of cooperation and the
value of defecting. Let's do the easy bits now,
and then we'll leave you in suspense until Wednesday.
So let's do all the easy bits. So what's this δ
in this case? In this particular game what
was the probability that the game was going to continue?
What was the probability that the game was going to end?
The probability of it ending was .25, so δ
here was .75, that's easy.
The second bit that's relatively easy is what's the
value of playing (D, D) until the game ends?
Once people have cheated you're going to play D for ever–here
we are: Edwina's cheating here.

You're going to get (D,
D) in this period, (D, D) in this period,
and so on and so forth until the game ends.
In each of those periods you're going to earn 0,
so this is just 0. Which leaves us with a messy
bit: what's the value of cooperating forever?
Let's try and do it. We've got one minute.
Let's do it. So in every period in which we
both cooperate what do we earn? Throughout the beginning of the
game: we cooperated in the first period;
now in the second period, we cooperate again.
What payoff do we get from cooperating again?
We get 2 and then Jake tosses his coin and with probability
δ we continue and we're going to cooperate again.
So with probability δ we cooperate again and get what
payoff the next period? 2 again, and then Jake tosses
the coin again, so now he's tossed the coin
twice, so with probability
δ² we're still playing and we get 2,
and then Jake tosses the coin again and it comes up other than
heads, heads again,
so with probability δ³ we get 2, and so on.

So your exercise between now
and Wednesday is to figure out what the value of cooperation
forever is: figure out this equation and find whether in
fact it was an equilibrium for people to cooperate.
We'll pick it up on Wednesday.
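For readers who want to check their answer to the exercise numerically, here is a minimal Python sketch. It assumes the payoffs above (2 per period from (C,C), 0 per period from (D,D), a temptation of 3 - 2 = 1) and δ = .75. It also uses the standard geometric-series sum 2/(1 - δ), which the lecture has not yet derived, and the function name `value_forever` is hypothetical.

```python
DELTA = 0.75          # continuation probability from the two-coin rule
COOPERATE_PAYOFF = 2  # per-period payoff from (C, C)
DEFECT_PAYOFF = 0     # per-period payoff from (D, D)
TEMPTATION = 3 - 2    # extra payoff today from defecting against a cooperator

def value_forever(per_period, delta, periods=1000):
    """Approximate per_period * (1 + delta + delta^2 + ...) by a long partial sum."""
    return sum(per_period * delta ** t for t in range(periods))

value_cc = value_forever(COOPERATE_PAYOFF, DELTA)
value_dd = value_forever(DEFECT_PAYOFF, DELTA)
closed_form = COOPERATE_PAYOFF / (1 - DELTA)   # geometric series: 2 / (1 - delta)

print(round(value_cc, 6), closed_form)         # both approximately 8

# Tomorrow's promises are discounted once more because tomorrow may not happen.
promise_gap = DELTA * (value_cc - value_dd)
print(TEMPTATION <= promise_gap)
```

Under these assumptions the discounted gap between reward and punishment is about 6, well above the temptation of 1, which suggests that the temptation does not outweigh tomorrow's promises at δ = .75.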

As found on YouTube
