53. Optimal Savings III: Stochastic Returns#
53.1. Overview#
In this lecture, we continue our study of optimal savings problems, building on Optimal Savings I: Cake Eating and Optimal Savings II: Numerical Cake Eating.
The key difference from the previous lectures is that wealth now evolves stochastically.
We can think of wealth as a harvest that regrows if we save some seeds.
Specifically, if we save and invest part of today’s harvest \(x_t\), it grows into next period’s harvest \(x_{t+1}\) according to a stochastic production process.
The extensions in this lecture introduce several new elements:
nonlinear returns to saving, through a production function, and
stochastic returns, due to shocks to production.
Despite these additions, the model remains relatively tractable.
As a first pass, we will solve the model using dynamic programming and value function iteration (VFI).
Note
In later lectures we’ll explore more efficient methods for this class of problems.
At the same time, VFI is foundational and globally convergent.
Hence we want to be sure we can use this method too.
More information on this savings problem can be found in
[Ljungqvist and Sargent, 2018], Section 3.1
EDTC, Chapter 1
[Sundaram, 1996], Chapter 12
Let’s start with some imports:
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d
from scipy.optimize import minimize_scalar
from typing import NamedTuple, Callable
53.2. The Model#
Here we describe the new model and the associated optimization problem.
53.2.1. Setup#
Consider an agent who owns an amount \(x_t \in \mathbb R_+ := [0, \infty)\) of a consumption good at time \(t\).
This output can either be consumed or saved and used for production.
Production is stochastic, in that it also depends on a shock \(\xi_{t+1}\) realized at the end of the current period.
Next period output is

\[
x_{t+1} := f(s_t) \, \xi_{t+1},
\]

where \(f \colon \mathbb R_+ \to \mathbb R_+\) is the production function and

\[
s_t := x_t - c_t
\]

is current savings, with \(c_t\) denoting current consumption.

All variables are required to be nonnegative.
In what follows,
The sequence \(\{\xi_t\}\) is assumed to be IID.
The common distribution of each \(\xi_t\) will be denoted by \(\phi\).
The production function \(f\) is assumed to be increasing and continuous.
53.2.2. Optimization#
Taking \(x_0\) as given, the agent wishes to maximize

\[
\mathbb E \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right]
\]

subject to

\[
x_{t+1} = f(x_t - c_t) \, \xi_{t+1}
\quad \text{and} \quad
0 \leq c_t \leq x_t
\quad \text{for all } t,
\]

where
\(u\) is a bounded, continuous and strictly increasing utility function and
\(\beta \in (0, 1)\) is a discount factor.
In summary, the agent’s aim is to select a path \(c_0, c_1, c_2, \ldots\) for consumption that is
nonnegative,
feasible,
optimal, in the sense that it maximizes (53.2) relative to all other feasible consumption sequences, and
adapted, in the sense that the current action \(c_t\) depends only on current and historical outcomes, not on future outcomes such as \(\xi_{t+1}\).
In the present context
\(x_t\) is called the state variable — it summarizes the “state of the world” at the start of each period.
\(c_t\) is called the control variable — a value chosen by the agent each period after observing the state.
53.2.3. The Policy Function Approach#
One way to think about solving this problem is to look for the best policy function.
A policy function is a map from past and present observables into current action.
We’ll be particularly interested in Markov policies, which are maps from the current state \(x_t\) into a current action \(c_t\).
For dynamic programming problems such as this one, the optimal policy is always a Markov policy (see, e.g., DP1).
In other words, the current state \(x_t\) provides a sufficient statistic for the history in terms of making an optimal decision today.
In our context, a Markov policy is a function \(\sigma \colon \mathbb R_+ \to \mathbb R_+\), with the understanding that states are mapped to actions via

\[
c_t = \sigma(x_t) \quad \text{for all } t.
\]

In what follows, we will call \(\sigma\) a feasible consumption policy if it satisfies

\[
0 \leq \sigma(x) \leq x
\quad \text{for all } x \in \mathbb R_+.
\]
In other words, a feasible consumption policy is a Markov policy that respects the resource constraint.
The set of all feasible consumption policies will be denoted by \(\Sigma\).
Each \(\sigma \in \Sigma\) determines a continuous state Markov process \(\{x_t\}\) for output via

\[
x_{t+1} = f(x_t - \sigma(x_t)) \, \xi_{t+1},
\qquad x_0 \text{ given}.
\]
This is the time path for output when we choose and stick with the policy \(\sigma\).
We insert this process into the objective function to get

\[
\mathbb E \left[ \sum_{t=0}^{\infty} \beta^t u(c_t) \right]
=
\mathbb E \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(x_t)) \right].
\]
This is the total expected present value of following policy \(\sigma\) forever, given initial income \(x_0\).
The aim is to select a policy that makes this number as large as possible.
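To make this expected present value concrete, here is a small Monte Carlo sketch (an illustration added here, not part of the formal treatment) that estimates it for one hypothetical feasible policy \(\sigma(x) = 0.75 x\), using illustrative primitives — log utility, Cobb-Douglas production and lognormal shocks — chosen to anticipate the examples used later in the lecture.

import numpy as np

# Illustrative primitives -- assumptions for this sketch only
α, β, μ, ν = 0.4, 0.96, 0.0, 0.1
u = np.log                        # utility
f = lambda k: k**α                # production function
σ = lambda x: 0.75 * x            # a hypothetical feasible policy

def estimate_policy_value(σ, x0=1.0, ts_length=500, num_paths=1_000, seed=42):
    """
    Monte Carlo estimate of the expected discounted value of
    following policy σ forever, starting from x0.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_paths):
        x, discount, path_value = x0, 1.0, 0.0
        for t in range(ts_length):
            c = σ(x)                                   # consume according to the policy
            path_value += discount * u(c)
            ξ = np.exp(μ + ν * rng.standard_normal())  # lognormal shock
            x = f(x - c) * ξ                           # law of motion for wealth
            discount *= β
        total += path_value
    return total / num_paths

print(estimate_policy_value(σ))

Trying different policies \(\sigma\) and comparing the resulting estimates is a brute-force way of thinking about the problem; the rest of the lecture develops a far more systematic approach.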
The next section covers these ideas more formally.
53.2.4. Optimality#
The lifetime value \(v_{\sigma}\) associated with a given policy \(\sigma\) is the mapping defined by

\[
v_{\sigma}(x) = \mathbb E \left[ \sum_{t=0}^{\infty} \beta^t u(\sigma(x_t)) \right]
\]
when \(\{x_t\}\) is given by (53.5) with \(x_0 = x\).
In other words, it is the lifetime value of following policy \(\sigma\) forever, starting at initial condition \(x\).
The value function is then defined as

\[
v^*(x) := \sup_{\sigma \in \Sigma} \; v_{\sigma}(x).
\]
The value function gives the maximal value that can be obtained from state \(x\), after considering all feasible policies.
A policy \(\sigma \in \Sigma\) is called optimal if it attains the supremum in (53.8) for all \(x \in \mathbb R_+\).
53.2.5. The Bellman Equation#
The following equation is called the Bellman equation associated with this dynamic programming problem:

\[
v(x) = \max_{0 \leq c \leq x}
    \left\{
        u(c) + \beta \int v(f(x - c) z) \phi(dz)
    \right\}
\quad \text{for all } x \in \mathbb R_+.
\]
This is a functional equation in \(v\), in the sense that a given \(v\) can either satisfy it or not satisfy it.
The term \(\int v(f(x - c) z) \phi(dz)\) can be understood as the expected next period value when
\(v\) is used to measure value
the state is \(x\)
consumption is set to \(c\)
As shown in EDTC, Theorem 10.1.11 and a range of other texts, the value function \(v^*\) satisfies the Bellman equation.
In other words, (53.9) holds when \(v=v^*\).
The intuition is that maximal value from a given state can be obtained by optimally trading off
current reward from a given action, vs
expected discounted future value of the state resulting from that action
The Bellman equation is important because it
gives us more information about the value function and
suggests a way of computing the value function, which we discuss below.
53.2.6. Greedy Policies#
The value function can be used to compute optimal policies.
Given a continuous function \(v\) on \(\mathbb R_+\), we say that \(\sigma \in \Sigma\) is \(v\)-greedy if \(\sigma(x)\) is a solution to

\[
\max_{0 \leq c \leq x}
    \left\{
        u(c) + \beta \int v(f(x - c) z) \phi(dz)
    \right\}
\]
for every \(x \in \mathbb R_+\).
In other words, \(\sigma \in \Sigma\) is \(v\)-greedy if it optimally trades off current and future rewards when \(v\) is taken to be the value function.
In our setting, we have the following key result
A feasible consumption policy is optimal if and only if it is \(v^*\)-greedy.
The intuition is similar to the intuition for the Bellman equation, which was provided after (53.9).
See, for example, Theorem 10.1.11 of EDTC.
Hence, once we have a good approximation to \(v^*\), we can compute the (approximately) optimal policy by computing the corresponding greedy policy.
The advantage is that we are now solving a much lower dimensional optimization problem.
53.2.7. The Bellman Operator#
How, then, should we compute the value function?
One way is to use the so-called Bellman operator.
(The term operator is usually reserved for functions that send functions into functions!)
The Bellman operator is denoted by \(T\) and defined by

\[
(Tv)(x) := \max_{0 \leq c \leq x}
    \left\{
        u(c) + \beta \int v(f(x - c) z) \phi(dz)
    \right\}
\quad \text{for all } x \in \mathbb R_+.
\]
In other words, \(T\) sends the function \(v\) into the new function \(Tv\) defined by (53.11).
By construction, the set of solutions to the Bellman equation (53.9) exactly coincides with the set of fixed points of \(T\).
For example, if \(Tv = v\), then, for any \(x \geq 0\),

\[
v(x) = (Tv)(x) = \max_{0 \leq c \leq x}
    \left\{
        u(c) + \beta \int v(f(x - c) z) \phi(dz)
    \right\},
\]
which says precisely that \(v\) is a solution to the Bellman equation.
It follows that \(v^*\) is a fixed point of \(T\).
53.2.8. Review of Theoretical Results#
One can also show that \(T\) is a contraction mapping on the set of continuous bounded functions on \(\mathbb R_+\) under the supremum distance

\[
\rho(v, w) := \sup_{x \geq 0} |v(x) - w(x)|.
\]

See EDTC, Lemma 10.1.18.
Hence, it has exactly one fixed point in this set, which we know is equal to the value function.
It follows that
The value function \(v^*\) is bounded and continuous.
Starting from any bounded and continuous \(v\), the sequence \(v, Tv, T^2v, \ldots\) generated by iteratively applying \(T\) converges uniformly to \(v^*\).
This iterative method is called value function iteration.
We also know that a feasible policy is optimal if and only if it is \(v^*\)-greedy.
It’s not too hard to show that a \(v^*\)-greedy policy exists (see EDTC, theorem 10.1.11 if you get stuck).
Hence, at least one optimal policy exists.
Our problem now is how to compute it.
53.2.9. Unbounded Utility#
The results stated above assume that \(u\) is bounded.
In practice economists often work with unbounded utility functions — and so will we.
In the unbounded setting, various optimality theories exist.
Nevertheless, their main conclusions are usually in line with those stated for the bounded case just above (as long as we drop the word “bounded”).
Note
Consult the following references for more on the unbounded case:
The lecture The Income Fluctuation Problem IV: Stochastic Returns on Assets.
Section 12.2 of EDTC.
53.3. Computation#
Let’s now look at computing the value function and the optimal policy.
Our implementation in this lecture will focus on clarity and flexibility.
(In subsequent lectures we will focus on efficiency and speed.)
We will use fitted value function iteration, which was already described in Optimal Savings II: Numerical Cake Eating.
53.3.1. Scalar Maximization#
To maximize the right hand side of the Bellman equation (53.9), we are going to use
the minimize_scalar routine from SciPy.
To keep the interface tidy, we will wrap minimize_scalar in an outer function as follows:
def maximize(g, a, b, args):
"""
Maximize the function g over the interval [a, b].
We use the fact that the maximizer of g on any interval is
also the minimizer of -g. The tuple args collects any extra
arguments to g.
Returns the maximal value and the maximizer.
"""
objective = lambda x: -g(x, *args)
result = minimize_scalar(objective, bounds=(a, b), method='bounded')
maximizer, maximum = result.x, -result.fun
return maximizer, maximum
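As a quick usage check (an aside added here), we can apply maximize to a simple concave function whose peak is known in advance:

def g_test(x, a):
    # Concave function with known maximizer x = a
    return -(x - a)**2

x_max, g_max = maximize(g_test, 0.0, 2.0, args=(1.5,))
print(x_max, g_max)   # should be approximately 1.5 and 0.0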
53.3.2. Model#
We will assume for now that \(\phi\) is the distribution of \(\xi := \exp(\mu + \nu \zeta)\) where
\(\zeta\) is standard normal,
\(\mu\) is a shock location parameter and
\(\nu\) is a shock scale parameter.
We will store the primitives of the model in a NamedTuple.
class Model(NamedTuple):
u: Callable # utility function
f: Callable # production function
β: float # discount factor
μ: float # shock location parameter
ν: float # shock scale parameter
grid: np.ndarray # state grid
shocks: np.ndarray # shock draws
def create_model(u: Callable,
f: Callable,
β: float = 0.96,
μ: float = 0.0,
ν: float = 0.1,
grid_max: float = 4.0,
grid_size: int = 120,
shock_size: int = 250,
seed: int = 1234) -> Model:
"""
Creates an instance of the optimal savings model.
"""
# Set up grid
grid = np.linspace(1e-4, grid_max, grid_size)
# Store shocks (with a seed, so results are reproducible)
np.random.seed(seed)
shocks = np.exp(μ + ν * np.random.randn(shock_size))
return Model(u, f, β, μ, ν, grid, shocks)
def state_action_value(c: float,
model: Model,
x: float,
v_array: np.ndarray) -> float:
"""
Right hand side of the Bellman equation.
"""
u, f, β, shocks = model.u, model.f, model.β, model.shocks
grid = model.grid
v = interp1d(grid, v_array)
return u(c) + β * np.mean(v(f(x - c) * shocks))
In the second last line we are using linear interpolation.
In the last line, the expectation in (53.11) is computed via Monte Carlo, using the approximation

\[
\int v(f(x - c) z) \phi(dz)
    \approx \frac{1}{n} \sum_{i=1}^n v(f(x - c) \xi_i)
\]
where \(\{\xi_i\}_{i=1}^n\) are IID draws from \(\phi\).
Monte Carlo is not always the most efficient way to compute integrals numerically but it does have some theoretical advantages in the present setting.
(For example, it preserves the contraction mapping property of the Bellman operator — see, e.g., [Pál and Stachurski, 2013].)
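As a small side check on this Monte Carlo approach (an addition, using the default shock parameters \(\mu = 0\) and \(\nu = 0.1\) assumed below), we can confirm that the sample mean of lognormal draws is close to the analytical mean \(\exp(\mu + \nu^2 / 2)\):

μ_check, ν_check = 0.0, 0.1                           # default shock parameters
draws = np.exp(μ_check + ν_check * np.random.randn(100_000))
print(draws.mean(), np.exp(μ_check + ν_check**2 / 2))  # the two numbers should be close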
53.3.3. The Bellman Operator#
The next function implements the Bellman operator.
def T(v: np.ndarray, model: Model) -> tuple[np.ndarray, np.ndarray]:
"""
The Bellman operator. Updates the guess of the value function
and also computes a v-greedy policy.
* model is an instance of Model
* v is an array representing a guess of the value function
"""
grid = model.grid
v_new = np.empty_like(v)
v_greedy = np.empty_like(v)
for i in range(len(grid)):
x = grid[i]
# Maximize RHS of Bellman equation at state x
c_star, v_max = maximize(state_action_value, 1e-10, x, (model, x, v))
v_new[i] = v_max
v_greedy[i] = c_star
return v_greedy, v_new
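As an optional numerical illustration of the contraction property discussed above (this check is an addition; the test_model instance and the two guesses are assumptions made only for this purpose), the sup-distance between \(Tv\) and \(Tw\) should be no greater than \(\beta\) times the sup-distance between \(v\) and \(w\):

# A throwaway model instance just for this check
test_model = create_model(u=np.log, f=lambda k: k**0.4)
tg = test_model.grid
v_guess = np.log(tg)              # one guess of the value function
w_guess = np.sqrt(tg)             # another guess
_, Tv = T(v_guess, test_model)
_, Tw = T(w_guess, test_model)
print(np.max(np.abs(Tv - Tw)))                            # should be bounded by ...
print(test_model.β * np.max(np.abs(v_guess - w_guess)))   # ... this number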
53.3.4. An Example#
Let’s suppose now that

\[
f(k) = k^{\alpha}
\quad \text{and} \quad
u(c) = \ln c.
\]

For this particular problem, an exact analytical solution is available (see [Ljungqvist and Sargent, 2018], Section 3.1.2), with

\[
v^*(x) =
    \frac{\ln (1 - \alpha \beta)}{1 - \beta}
    + \frac{\mu + \alpha \ln (\alpha \beta)}{1 - \alpha}
        \left(
            \frac{1}{1 - \beta} - \frac{1}{1 - \alpha \beta}
        \right)
    + \frac{1}{1 - \alpha \beta} \ln x
\]

and optimal consumption policy

\[
\sigma^*(x) = (1 - \alpha \beta) x.
\]
It is valuable to have these closed-form solutions because they let us check whether our code works for this particular case.
In Python, the functions above can be expressed as:
def v_star(x, α, β, μ):
"""
True value function
"""
c1 = np.log(1 - α * β) / (1 - β)
c2 = (μ + α * np.log(α * β)) / (1 - α)
c3 = 1 / (1 - β)
c4 = 1 / (1 - α * β)
return c1 + c2 * (c3 - c4) + c4 * np.log(x)
def σ_star(x, α, β):
"""
True optimal policy
"""
return (1 - α * β) * x
Next let’s create an instance of the model with the above primitives and assign it to the variable model.
α = 0.4
def fcd(s):
return s**α
model = create_model(u=np.log, f=fcd)
Now let’s see what happens when we apply our Bellman operator to the exact solution \(v^*\) in this case.
In theory, since \(v^*\) is a fixed point, the resulting function should again be \(v^*\).
In practice, we expect some small numerical error.
grid = model.grid
v_init = v_star(grid, α, model.β, model.μ) # Start at the solution
v_greedy, v = T(v_init, model) # Apply T once
fig, ax = plt.subplots()
ax.set_ylim(-35, -24)
ax.plot(grid, v, lw=2, alpha=0.6, label='$Tv^*$')
ax.plot(grid, v_init, lw=2, alpha=0.6, label='$v^*$')
ax.legend()
plt.show()
The two functions are essentially indistinguishable, so we are off to a good start.
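If you prefer a numerical summary of the gap to a visual one, the maximum absolute deviation between \(Tv^*\) and \(v^*\) over the grid can be computed directly (an added check):

print(np.max(np.abs(v - v_init)))   # deviation between Tv* and v*; should be small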
Now let’s have a look at iterating with the Bellman operator, starting from an arbitrary initial condition.
The initial condition we’ll start with is, somewhat arbitrarily, \(v(x) = 5 \ln (x)\).
v = 5 * np.log(grid) # An initial condition
n = 35
fig, ax = plt.subplots()
ax.plot(grid, v, color=plt.cm.jet(0),
lw=2, alpha=0.6, label='Initial condition')
for i in range(n):
v_greedy, v = T(v, model) # Apply the Bellman operator
ax.plot(grid, v, color=plt.cm.jet(i / n), lw=2, alpha=0.6)
ax.plot(grid, v_star(grid, α, model.β, model.μ), 'k-', lw=2,
alpha=0.8, label='True value function')
ax.legend()
ax.set(ylim=(-40, 10), xlim=(np.min(grid), np.max(grid)))
plt.show()
The figure shows
the first 36 functions generated by the fitted value function iteration algorithm, with hotter colors given to higher iterates
the true value function \(v^*\) drawn in black
The sequence of iterates converges towards \(v^*\).
We are clearly getting closer.
53.3.5. Iterating to Convergence#
We can write a function that iterates until the difference is below a particular tolerance level.
def solve_model(model,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):
    """
    Solve model by iterating with the Bellman operator.
    """

    # Set up loop
    v = model.u(model.grid)  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_greedy, v_new = T(v, model)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if error > tol:
        print("Failed to converge!")
    elif verbose:
        print(f"\nConverged in {i} iterations.")

    return v_greedy, v_new
Let’s use this function to compute an approximate solution at the defaults.
v_greedy, v_solution = solve_model(model)
Error at iteration 25 is 0.40975776844489786.
Error at iteration 50 is 0.1476753540823772.
Error at iteration 75 is 0.05322171277213883.
Error at iteration 100 is 0.01918093054865011.
Error at iteration 125 is 0.006912744396018411.
Error at iteration 150 is 0.0024913303848137502.
Error at iteration 175 is 0.000897867291303811.
Error at iteration 200 is 0.00032358842396718046.
Error at iteration 225 is 0.0001166202056204213.
Converged in 229 iterations.
Now we check our result by plotting it against the true value:
fig, ax = plt.subplots()
ax.plot(grid, v_solution, lw=2, alpha=0.6,
label='Approximate value function')
ax.plot(grid, v_star(grid, α, model.β, model.μ), lw=2,
alpha=0.6, label='True value function')
ax.legend()
ax.set_ylim(-35, -24)
plt.show()
The figure shows that we are pretty much on the money.
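For a quantitative counterpart to the figure (again, an added check rather than part of the original exposition), we can compute the largest pointwise gap between the approximate and true value functions over the grid:

print(np.max(np.abs(v_solution - v_star(grid, α, model.β, model.μ))))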
53.3.6. The Policy Function#
The policy v_greedy computed above corresponds to an approximate optimal policy.
The next figure compares it to the exact solution, which, as mentioned above, is \(\sigma^*(x) = (1 - \alpha \beta) x\).
fig, ax = plt.subplots()
ax.plot(grid, v_greedy, lw=2,
alpha=0.6, label='approximate policy function')
ax.plot(grid, σ_star(grid, α, model.β), '--',
lw=2, alpha=0.6, label='true policy function')
ax.legend()
plt.show()
The figure shows that we’ve done a good job in this instance of approximating the true policy.
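The same kind of added check works for the policy: the largest pointwise deviation between the approximate and exact policies over the grid is

print(np.max(np.abs(v_greedy - σ_star(grid, α, model.β))))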
53.4. Exercises#
Exercise 53.1
A common choice for the utility function in this kind of work is the CRRA specification

\[
u(c) = \frac{c^{1 - \gamma} - 1}{1 - \gamma}.
\]
Maintaining the other defaults, including the Cobb-Douglas production function, solve the optimal savings model with this utility specification.
Setting \(\gamma = 1.5\), compute and plot an estimate of the optimal policy.
Solution
Here we set up the model.
γ = 1.5 # Preference parameter
def u_crra(c):
return (c**(1 - γ) - 1) / (1 - γ)
model = create_model(u=u_crra, f=fcd)
Now let’s run it, with a timer.
%%time
v_greedy, v_solution = solve_model(model)
Error at iteration 25 is 0.5528151810417512.
Error at iteration 50 is 0.19923228425591333.
Error at iteration 75 is 0.07180266113800826.
Error at iteration 100 is 0.02587744333580133.
Error at iteration 125 is 0.0093261456189353.
Error at iteration 150 is 0.003361112262005861.
Error at iteration 175 is 0.0012113338242301097.
Error at iteration 200 is 0.000436560733326985.
Error at iteration 225 is 0.00015733505506432266.
Converged in 237 iterations.
CPU times: user 29.4 s, sys: 7.99 ms, total: 29.4 s
Wall time: 29.4 s
Let’s plot the policy function just to see what it looks like:
fig, ax = plt.subplots()
ax.plot(grid, v_greedy, lw=2,
alpha=0.6, label='Approximate optimal policy')
ax.legend()
plt.show()
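As an optional extension (not part of the original exercise), we can feed the approximate policy back into the law of motion \(x_{t+1} = f(x_t - \sigma(x_t)) \, \xi_{t+1}\) and simulate a wealth path; the interpolation of the policy, the initial condition and the simulation length below are all choices made for this sketch.

# Simulate a wealth path implied by the approximate CRRA policy
σ_func = interp1d(grid, v_greedy, bounds_error=False,
                  fill_value=(v_greedy[0], v_greedy[-1]))

ts_length = 100                   # simulation length (an arbitrary choice)
rng = np.random.default_rng(0)
x_path = np.empty(ts_length)
x_path[0] = 1.0                   # initial wealth (an arbitrary choice)
for t in range(ts_length - 1):
    c = float(σ_func(x_path[t]))             # consumption from the approximate policy
    s = max(x_path[t] - c, 1e-10)            # savings, guarded against rounding error
    ξ = np.exp(model.μ + model.ν * rng.standard_normal())
    x_path[t+1] = fcd(s) * ξ                 # law of motion for wealth

fig, ax = plt.subplots()
ax.plot(x_path, lw=2, alpha=0.7)
ax.set_xlabel('time')
ax.set_ylabel('wealth')
plt.show()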