66. Markov Perfect Equilibrium#

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

66.1. Overview#

This lecture describes the concept of Markov perfect equilibrium.

Markov perfect equilibrium is a key notion for analyzing economic problems involving dynamic strategic interaction, and a cornerstone of applied game theory.

In this lecture, we teach Markov perfect equilibrium by example.

We will focus on settings with

  • two players

  • quadratic payoff functions

  • linear transition rules for the state

Other references include chapter 7 of [Ljungqvist and Sargent, 2018].

Let’s start with some standard imports:

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5)  # set default figure size
import numpy as np
import quantecon as qe

66.2. Background#

Markov perfect equilibrium is a refinement of the concept of Nash equilibrium.

It is used to study settings where multiple decision-makers interact non-cooperatively over time, each pursuing its own objective.

The agents in the model face a common state vector, the time path of which is influenced by – and influences – their decisions.

In particular, the transition law for the state that confronts each agent is affected by decision rules of other agents.

Individual payoff maximization requires that each agent solve a dynamic programming problem that includes this transition law.

Markov perfect equilibrium prevails when no agent wishes to revise its policy, taking as given the policies of all other agents.

Well known examples include

  • choice of price, output, location, or capacity for firms in an industry

  • rate of extraction from a shared natural resource, such as a fishery

Let’s examine a model of the first type.

66.2.1. Example: A Duopoly Model#

Two firms are the only producers of a good, the demand for which is governed by a linear inverse demand function

(66.1)# $p = a_0 - a_1 (q_1 + q_2)$

Here $p = p_t$ is the price of the good, $q_i = q_{it}$ is the output of firm $i = 1, 2$ at time $t$, and $a_0 > 0$, $a_1 > 0$.

In (66.1) and what follows,

  • the time subscript is suppressed when possible to simplify notation

  • $\hat{x}$ denotes a next period value of variable $x$

Each firm recognizes that its output affects total output and therefore the market price.

The one-period payoff function of firm i is price times quantity minus adjustment costs:

(66.2)# $\pi_i = p q_i - \gamma (\hat{q}_i - q_i)^2, \qquad \gamma > 0,$

Substituting the inverse demand curve (66.1) into (66.2) lets us express the one-period payoff as

(66.3)# $\pi_i(q_i, q_{-i}, \hat{q}_i) = a_0 q_i - a_1 q_i^2 - a_1 q_i q_{-i} - \gamma (\hat{q}_i - q_i)^2,$

where $q_{-i}$ denotes the output of the firm other than $i$.

The objective of the firm is to maximize $\sum_{t=0}^{\infty} \beta^t \pi_{it}$.

Firm $i$ chooses a decision rule that sets next period quantity $\hat{q}_i$ as a function $f_i$ of the current state $(q_i, q_{-i})$.

An essential aspect of a Markov perfect equilibrium is that each firm takes the decision rule of the other firm as known and given.

Given $f_{-i}$, the Bellman equation of firm $i$ is

(66.4)# $v_i(q_i, q_{-i}) = \max_{\hat{q}_i} \left\{ \pi_i(q_i, q_{-i}, \hat{q}_i) + \beta v_i(\hat{q}_i, f_{-i}(q_{-i}, q_i)) \right\}$

Definition A Markov perfect equilibrium of the duopoly model is a pair of value functions $(v_1, v_2)$ and a pair of policy functions $(f_1, f_2)$ such that, for each $i \in \{1, 2\}$ and each possible state,

  • The value function $v_i$ satisfies Bellman equation (66.4).

  • The maximizer on the right side of (66.4) equals $f_i(q_i, q_{-i})$.

The adjective “Markov” denotes that the equilibrium decision rules depend only on the current values of the state variables, not other parts of their histories.

“Perfect” means complete, in the sense that the equilibrium is constructed by backward induction and hence builds in optimizing behavior for each firm at all possible future states.

  • These include many states that will not be reached when we iterate forward on the pair of equilibrium strategies $f_i$ starting from a given initial state.

66.2.2. Computation#

One strategy for computing a Markov perfect equilibrium is iterating to convergence on pairs of Bellman equations and decision rules.

In particular, let $v_i^j, f_i^j$ be the value function and policy function for firm $i$ at the $j$-th iteration.

Imagine constructing the iterates

(66.5)# $v_i^{j+1}(q_i, q_{-i}) = \max_{\hat{q}_i} \left\{ \pi_i(q_i, q_{-i}, \hat{q}_i) + \beta v_i^j(\hat{q}_i, f_{-i}(q_{-i}, q_i)) \right\}$

These iterations can be challenging to implement computationally.

However, they simplify for the case in which one-period payoff functions are quadratic and transition laws are linear — which takes us to our next topic.

66.3. Linear Markov Perfect Equilibria#

As we saw in the duopoly example, the study of Markov perfect equilibria in games with two players leads us to an interrelated pair of Bellman equations.

In linear-quadratic dynamic games, these “stacked Bellman equations” become “stacked Riccati equations” with a tractable mathematical structure.

We’ll lay out that structure in a general setup and then apply it to some simple problems.

66.3.1. Coupled Linear Regulator Problems#

We consider a general linear-quadratic regulator game with two players.

For convenience, we’ll start with a finite horizon formulation, where $t_0$ is the initial date and $t_1$ is the common terminal date.

Player $i$ takes $\{u_{-it}\}$ as given and minimizes

(66.6)# $\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' R_i x_t + u_{it}' Q_i u_{it} + u_{-it}' S_i u_{-it} + 2 x_t' W_i u_{it} + 2 u_{-it}' M_i u_{it} \right\}$

while the state evolves according to

(66.7)# $x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$

Here

  • $x_t$ is an $n \times 1$ state vector and $u_{it}$ is a $k_i \times 1$ vector of controls for player $i$

  • $R_i$ is $n \times n$

  • $S_i$ is $k_{-i} \times k_{-i}$

  • $Q_i$ is $k_i \times k_i$

  • $W_i$ is $n \times k_i$

  • $M_i$ is $k_{-i} \times k_i$

  • $A$ is $n \times n$

  • $B_i$ is $n \times k_i$

66.3.2. Computing Equilibrium#

We formulate a linear Markov perfect equilibrium as follows.

Player $i$ employs linear decision rules $u_{it} = -F_{it} x_t$, where $F_{it}$ is a $k_i \times n$ matrix.

A Markov perfect equilibrium is a pair of sequences $\{F_{1t}, F_{2t}\}$ over $t = t_0, \ldots, t_1 - 1$ such that

  • $\{F_{1t}\}$ solves player 1’s problem, taking $\{F_{2t}\}$ as given, and

  • $\{F_{2t}\}$ solves player 2’s problem, taking $\{F_{1t}\}$ as given

If we take $u_{2t} = -F_{2t} x_t$ and substitute it into (66.6) and (66.7), then player 1’s problem becomes minimization of

(66.8)# $\sum_{t=t_0}^{t_1 - 1} \beta^{t - t_0} \left\{ x_t' \Pi_{1t} x_t + u_{1t}' Q_1 u_{1t} + 2 u_{1t}' \Gamma_{1t} x_t \right\}$

subject to

(66.9)# $x_{t+1} = \Lambda_{1t} x_t + B_1 u_{1t},$

where

  • $\Lambda_{it} := A - B_{-i} F_{-it}$

  • $\Pi_{it} := R_i + F_{-it}' S_i F_{-it}$

  • $\Gamma_{it} := W_i' - M_i' F_{-it}$

This is an LQ dynamic programming problem that can be solved by working backwards.

Decision rules that solve this problem are

(66.10)# $F_{1t} = (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t})$

where $P_{1t}$ solves the matrix Riccati difference equation

(66.11)# $P_{1t} = \Pi_{1t} - (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t})' (Q_1 + \beta B_1' P_{1t+1} B_1)^{-1} (\beta B_1' P_{1t+1} \Lambda_{1t} + \Gamma_{1t}) + \beta \Lambda_{1t}' P_{1t+1} \Lambda_{1t}$

Similarly, decision rules that solve player 2’s problem are

(66.12)# $F_{2t} = (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t})$

where $P_{2t}$ solves

(66.13)# $P_{2t} = \Pi_{2t} - (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t})' (Q_2 + \beta B_2' P_{2t+1} B_2)^{-1} (\beta B_2' P_{2t+1} \Lambda_{2t} + \Gamma_{2t}) + \beta \Lambda_{2t}' P_{2t+1} \Lambda_{2t}$

Here, in all cases $t = t_0, \ldots, t_1 - 1$ and the terminal conditions are $P_{it_1} = 0$.

The solution procedure is to use equations (66.10), (66.11), (66.12), and (66.13), and “work backwards” from time $t_1 - 1$.

Since we’re working backward, $P_{1t+1}$ and $P_{2t+1}$ are taken as given at each stage.

Moreover, since

  • some terms on the right-hand side of (66.10) contain $F_{2t}$

  • some terms on the right-hand side of (66.12) contain $F_{1t}$

we need to solve these $k_1 + k_2$ equations simultaneously.

66.3.2.1. Key Insight#

A key insight is that, once $P_{1t+1}$ and $P_{2t+1}$ are taken as given, equations (66.10) and (66.12) are linear in $F_{1t}$ and $F_{2t}$, as shown below.
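
Concretely, substituting $\Lambda_{1t} = A - B_2 F_{2t}$ and $\Gamma_{1t} = W_1' - M_1' F_{2t}$ into (66.10), doing the same for (66.12), and collecting the terms involving $F_{1t}$ and $F_{2t}$ on the left gives the stacked linear system

$\begin{bmatrix} Q_1 + \beta B_1' P_{1t+1} B_1 & \beta B_1' P_{1t+1} B_2 + M_1' \\ \beta B_2' P_{2t+1} B_1 + M_2' & Q_2 + \beta B_2' P_{2t+1} B_2 \end{bmatrix} \begin{bmatrix} F_{1t} \\ F_{2t} \end{bmatrix} = \begin{bmatrix} \beta B_1' P_{1t+1} A + W_1' \\ \beta B_2' P_{2t+1} A + W_2' \end{bmatrix}$

which can be solved for $(F_{1t}, F_{2t})$ in one step, provided the matrix on the left is invertible (an assumption we maintain throughout).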

After these equations have been solved, we can take $F_{it}$ and solve for $P_{it}$ in (66.11) and (66.13).

66.3.2.2. Infinite Horizon#

We often want to compute the solutions of such games for infinite horizons, in the hope that the decision rules $F_{it}$ settle down to be time-invariant as $t_1 \rightarrow +\infty$.

In practice, we usually fix $t_1$ and compute the equilibrium of an infinite horizon game by driving $t_0 \rightarrow -\infty$.

This is the approach we adopt in the next section.

66.3.3. Implementation#

We will use the function nnash from QuantEcon.py, which computes a Markov perfect equilibrium of the infinite horizon linear-quadratic dynamic game in the manner described above.
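
To make the recursion concrete, here is a minimal sketch of the same computation written directly from (66.10)–(66.13); it is not the QuantEcon implementation, the name solve_mpe_backward is our own, and all inputs are assumed to be conformable 2D NumPy arrays (a scalar such as $\gamma$ would be passed as a $1 \times 1$ array).

def solve_mpe_backward(A, B1, B2, R1, R2, Q1, Q2, S1, S2, W1, W2, M1, M2,
                       beta=1.0, max_iter=1000, tol=1e-8):
    """Iterate (66.10)-(66.13) backwards from the terminal conditions
    P_{i t_1} = 0 until the decision rules F1, F2 stop changing."""
    n = A.shape[0]
    k1, k2 = B1.shape[1], B2.shape[1]
    P1, P2 = np.zeros((n, n)), np.zeros((n, n))   # terminal conditions
    F1, F2 = np.zeros((k1, n)), np.zeros((k2, n))
    for _ in range(max_iter):
        # Given P1, P2, equations (66.10) and (66.12) are linear in (F1, F2)
        H1 = Q1 + beta * B1.T @ P1 @ B1
        H2 = Q2 + beta * B2.T @ P2 @ B2
        lhs = np.block([[H1, beta * B1.T @ P1 @ B2 + M1.T],
                        [beta * B2.T @ P2 @ B1 + M2.T, H2]])
        rhs = np.vstack([beta * B1.T @ P1 @ A + W1.T,
                         beta * B2.T @ P2 @ A + W2.T])
        F = np.linalg.solve(lhs, rhs)
        F1_new, F2_new = F[:k1, :], F[k1:, :]
        # Riccati updates (66.11) and (66.13), with P1, P2 playing the role of P_{i,t+1}
        Λ1, Λ2 = A - B2 @ F2_new, A - B1 @ F1_new
        Π1 = R1 + F2_new.T @ S1 @ F2_new
        Π2 = R2 + F1_new.T @ S2 @ F1_new
        Γ1 = W1.T - M1.T @ F2_new
        Γ2 = W2.T - M2.T @ F1_new
        T1 = beta * B1.T @ P1 @ Λ1 + Γ1
        T2 = beta * B2.T @ P2 @ Λ2 + Γ2
        P1 = Π1 - T1.T @ np.linalg.solve(H1, T1) + beta * Λ1.T @ P1 @ Λ1
        P2 = Π2 - T2.T @ np.linalg.solve(H2, T2) + beta * Λ2.T @ P2 @ Λ2
        if max(np.max(np.abs(F1_new - F1)), np.max(np.abs(F2_new - F2))) < tol:
            F1, F2 = F1_new, F2_new
            break
        F1, F2 = F1_new, F2_new
    return F1, F2, P1, P2

Iterating backwards in this way mimics driving $t_0 \rightarrow -\infty$; for well-behaved problems the limiting rules should coincide with those returned by qe.nnash.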

66.4. Application#

Let’s use these procedures to treat some applications, starting with the duopoly model.

66.4.1. A Duopoly Model#

To map the duopoly model into coupled linear-quadratic dynamic programming problems, define the state and controls as

$x_t := \begin{bmatrix} 1 \\ q_{1t} \\ q_{2t} \end{bmatrix} \quad \text{and} \quad u_{it} := q_{i,t+1} - q_{it}, \quad i = 1, 2$

If we write

$x_t' R_i x_t + u_{it}' Q_i u_{it}$

where $Q_1 = Q_2 = \gamma$,

$R_1 := \begin{bmatrix} 0 & -\frac{a_0}{2} & 0 \\ -\frac{a_0}{2} & a_1 & \frac{a_1}{2} \\ 0 & \frac{a_1}{2} & 0 \end{bmatrix} \quad \text{and} \quad R_2 := \begin{bmatrix} 0 & 0 & -\frac{a_0}{2} \\ 0 & 0 & \frac{a_1}{2} \\ -\frac{a_0}{2} & \frac{a_1}{2} & a_1 \end{bmatrix}$

then we recover the one-period payoffs in expression (66.3).
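
As a quick numerical sanity check (a small sketch with arbitrary test values), we can verify that $x_t' R_1 x_t + u_{1t}' Q_1 u_{1t}$ equals minus the payoff $\pi_1$ in (66.3), so that minimizing the LQ criterion is the same as maximizing profit.

# Test values for a_0, a_1, γ and a test point (q_1, q_2, q̂_1)
a0, a1, γ = 10.0, 2.0, 12.0
q1, q2, q1_hat = 2.0, 3.0, 2.5

R1 = np.array([[0,     -a0/2,  0],
               [-a0/2,  a1,    a1/2],
               [0,      a1/2,  0]])
x = np.array([1.0, q1, q2])
u1 = q1_hat - q1

lq_cost = x @ R1 @ x + γ * u1**2                            # x'R_1 x + u_1'Q_1 u_1
payoff = a0*q1 - a1*q1**2 - a1*q1*q2 - γ*(q1_hat - q1)**2   # π_1 from (66.3)
np.isclose(lq_cost, -payoff)   # expected: True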

The law of motion for the state $x_t$ is $x_{t+1} = A x_t + B_1 u_{1t} + B_2 u_{2t}$ where

$A := \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B_1 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad B_2 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$

The optimal decision rule of firm $i$ will take the form $u_{it} = -F_i x_t$, inducing the following closed-loop system for the evolution of $x$ in the Markov perfect equilibrium:

(66.14)# $x_{t+1} = (A - B_1 F_1 - B_2 F_2) x_t$

66.4.2. Parameters and Solution#

Consider the previously presented duopoly model with parameter values of:

  • $a_0 = 10$

  • $a_1 = 2$

  • $\beta = 0.96$

  • $\gamma = 12$

From these, we compute the infinite horizon MPE using the following code.

import numpy as np
import quantecon as qe

# Parameters
a0 = 10.0
a1 = 2.0
β = 0.96
γ = 12.0

# In LQ form
A = np.eye(3)
B1 = np.array([[0.], [1.], [0.]])
B2 = np.array([[0.], [0.], [1.]])


R1 = [[      0.,     -a0 / 2,          0.],
      [-a0 / 2.,          a1,     a1 / 2.],
      [       0,     a1 / 2.,          0.]]

R2 = [[     0.,           0.,      -a0 / 2],
      [     0.,           0.,      a1 / 2.],
      [-a0 / 2,      a1 / 2.,           a1]]

Q1 = Q2 = γ
S1 = S2 = W1 = W2 = M1 = M2 = 0.0

# Solve using QE's nnash function
F1, F2, P1, P2 = qe.nnash(A, B1, B2, R1, R2, Q1, 
                          Q2, S1, S2, W1, W2, M1, 
                          M2, beta=β)

# Display policies
print("Computed policies for firm 1 and firm 2:\n")
print(f"F1 = {F1}")
print(f"F2 = {F2}")
print("\n")
Computed policies for firm 1 and firm 2:

F1 = [[-0.66846615  0.29512482  0.07584666]]
F2 = [[-0.66846615  0.07584666  0.29512482]]

Running the code produces the output shown above.

One way to see that $F_1$ is indeed optimal for firm 1 taking $F_2$ as given is to use QuantEcon.py’s LQ class.

In particular, let’s take $F_2$ as computed above, plug it into (66.8) and (66.9) to get firm 1’s problem, and solve it using LQ.

We hope that the resulting policy will agree with $F_1$ as computed above.

# Transition law facing firm 1 when firm 2 follows the MPE rule F2
Λ1 = A - B2 @ F2
lq1 = qe.LQ(Q1, R1, Λ1, B1, beta=β)
P1_ih, F1_ih, d = lq1.stationary_values()
F1_ih
array([[-0.66846613,  0.29512482,  0.07584666]])

This is close enough for rock and roll, as they say in the trade.

Indeed, np.allclose agrees with our assessment

np.allclose(F1, F1_ih)
True
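
The same check can be run for firm 2; the short sketch below mirrors the block above and, by the symmetry of the model, should also return True.

# Analogous check that F2 is a best response to F1
Λ2 = A - B1 @ F1
lq2 = qe.LQ(Q2, R2, Λ2, B2, beta=β)
P2_ih, F2_ih, d2 = lq2.stationary_values()
np.allclose(F2, F2_ih)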

66.4.3. Dynamics#

Let’s now investigate the dynamics of price and output in this simple duopoly model under the MPE policies.

Given our optimal policies F1 and F2, the state evolves according to (66.14).

The following program

  • imports F1 and F2 from the previous program along with all parameters.

  • computes the evolution of $x_t$ using (66.14).

  • extracts and plots industry output $q_t = q_{1t} + q_{2t}$ and price $p_t = a_0 - a_1 q_t$.

AF = A - B1 @ F1 - B2 @ F2
n = 20
x = np.empty((3, n))
x[:, 0] = 1, 1, 1
for t in range(n-1):
    x[:, t+1] = AF @ x[:, t]
q1 = x[1, :]
q2 = x[2, :]
q = q1 + q2       # Total output, MPE
p = a0 - a1 * q   # Price, MPE

fig, ax = plt.subplots(figsize=(9, 5.8))
ax.plot(q, 'b-', lw=2, alpha=0.75, label='total output')
ax.plot(p, 'g-', lw=2, alpha=0.75, label='price')
ax.set_title('Output and prices, duopoly MPE')
ax.legend(frameon=False)
plt.show()
[Figure: Output and prices, duopoly MPE]

Note that the initial condition has been set to $q_{10} = q_{20} = 1.0$.
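
As a quick follow-up, the long-run values that the series converge to can be computed directly from the closed-loop matrix in (66.14); the short sketch below reuses the arrays defined above.

# Steady state of x_{t+1} = AF x_t, with the first component of x fixed at 1
q_bar = np.linalg.solve(np.eye(2) - AF[1:, 1:], AF[1:, 0])
print("steady state outputs q1, q2:", q_bar)
print("steady state price:", a0 - a1 * q_bar.sum())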

To gain some perspective we can compare this to what happens in the monopoly case.

The first panel in the next figure compares output of the monopolist and industry output under the MPE, as a function of time.

The second panel shows analogous curves for price.

[Figure: monopolist vs duopoly MPE — output (first panel) and price (second panel) over time]

Here parameters are the same as above for both the MPE and monopoly solutions.

The monopolist initial condition is $q_0 = 2.0$ to mimic the industry initial condition $q_{10} = q_{20} = 1.0$ in the MPE case.

As expected, output is higher and prices are lower under duopoly than monopoly.

66.5. Exercises#

Exercise 66.1

Replicate the pair of figures showing the comparison of output and prices for the monopolist and duopoly under MPE.

Parameters are as in duopoly_mpe.py and you can use that code to compute MPE policies under duopoly.

The optimal policy in the monopolist case can be computed using QuantEcon.py’s LQ class.
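
One way to cast the monopolist’s problem in LQ form (a sketch; this mapping is ours and is not part of the exercise statement) is to let the state be $x_t = \begin{bmatrix} q_t \\ 1 \end{bmatrix}$ and the control be $u_t = q_{t+1} - q_t$, where $q_t$ is total output. The per-period payoff $p_t q_t - \gamma u_t^2$ with $p_t = a_0 - a_1 q_t$ is then recovered by minimizing

$\sum_{t=0}^{\infty} \beta^t \left\{ x_t' R x_t + u_t' Q u_t \right\}, \qquad R = \begin{bmatrix} a_1 & -\frac{a_0}{2} \\ -\frac{a_0}{2} & 0 \end{bmatrix}, \quad Q = \gamma,$

subject to $x_{t+1} = A x_t + B u_t$ with $A = I$ and $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, which can be passed directly to the LQ class.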

Exercise 66.2

In this exercise, we consider a slightly more sophisticated duopoly problem.

It takes the form of an infinite horizon linear-quadratic game proposed by Judd [Judd, 1990].

Two firms set prices and quantities of two goods interrelated through their demand curves.

Relevant variables are defined as follows:

  • $I_{it}$ = inventories of firm $i$ at beginning of $t$

  • $q_{it}$ = production of firm $i$ during period $t$

  • $p_{it}$ = price charged by firm $i$ during period $t$

  • $S_{it}$ = sales made by firm $i$ during period $t$

  • $E_{it}$ = costs of production of firm $i$ during period $t$

  • $C_{it}$ = costs of carrying inventories for firm $i$ during $t$

The firms’ cost functions are

  • $C_{it} = c_{i1} + c_{i2} I_{it} + 0.5 c_{i3} I_{it}^2$

  • $E_{it} = e_{i1} + e_{i2} q_{it} + 0.5 e_{i3} q_{it}^2$, where $e_{ij}, c_{ij}$ are positive scalars

Inventories obey the laws of motion

$I_{i,t+1} = (1 - \delta) I_{it} + q_{it} - S_{it}$

Demand is governed by the linear schedule

$S_t = D p_t + b$

where

  • $S_t = \begin{bmatrix} S_{1t} & S_{2t} \end{bmatrix}'$ and $p_t = \begin{bmatrix} p_{1t} & p_{2t} \end{bmatrix}'$

  • $D$ is a $2 \times 2$ negative definite matrix and

  • $b$ is a vector of constants

Firm i maximizes the undiscounted sum

$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} \left( p_{it} S_{it} - E_{it} - C_{it} \right)$

We can convert this to a linear-quadratic problem by taking

$u_{it} = \begin{bmatrix} p_{it} \\ q_{it} \end{bmatrix} \quad \text{and} \quad x_t = \begin{bmatrix} I_{1t} \\ I_{2t} \\ 1 \end{bmatrix}$

Decision rules for price and quantity take the form $u_{it} = -F_i x_t$.

The Markov perfect equilibrium of Judd’s model can be computed by filling in the matrices appropriately.

The exercise is to calculate these matrices and compute the following figures.

The first figure shows the dynamics of inventories for each firm when the parameters are

δ = 0.02
D = np.array([[-1, 0.5], [0.5, -1]])
b = np.array([25, 25])
c1 = c2 = np.array([1, -2, 1])
e1 = e2 = np.array([10, 10, 3])
[Figure: inventories of the two firms over time, $\delta = 0.02$]

Inventories trend to a common steady state.

If we increase the depreciation rate to $\delta = 0.05$, then we expect steady state inventories to fall.

This is indeed the case, as the next figure shows

[Figure: inventories of the two firms over time, $\delta = 0.05$]

In this exercise, reproduce the figure when $\delta = 0.02$.