{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A First Look at the Kalman Filter\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [A First Look at the Kalman Filter](#A-First-Look-at-the-Kalman-Filter) \n", " - [Overview](#Overview) \n", " - [The Basic Idea](#The-Basic-Idea) \n", " - [Convergence](#Convergence) \n", " - [Implementation](#Implementation) \n", " - [Exercises](#Exercises) \n", " - [Solutions](#Solutions) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to what’s in Anaconda, this lecture will need the following libraries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "!conda install -y quantecon" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "This lecture provides a simple and intuitive introduction to the Kalman filter, for those who either\n", "\n", "- have heard of the Kalman filter but don’t know how it works, or \n", "- know the Kalman filter equations, but don’t know where they come from \n", "\n", "\n", "For additional (more advanced) reading on the Kalman filter, see\n", "\n", "- [[LS18](https://python.quantecon.org/zreferences.html#id143)], section 2.7 \n", "- [[AM05](https://python.quantecon.org/zreferences.html#id103)] \n", "\n", "\n", "The second reference presents a comprehensive treatment of the Kalman filter.\n", "\n", "Required knowledge: Familiarity with matrix manipulations, multivariate normal distributions, covariance matrices, etc.\n", "\n", "We’ll need the following imports:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.rcParams[\"figure.figsize\"] = (11, 5) #set default figure size\n", "from scipy import 
linalg\n", "import numpy as np\n", "import matplotlib.cm as cm\n", "from quantecon import Kalman, LinearStateSpace\n", "from scipy.stats import norm\n", "from scipy.integrate import quad\n", "from numpy.random import multivariate_normal\n", "from scipy.linalg import eigvals" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Basic Idea\n", "\n", "The Kalman filter has many applications in economics, but for now\n", "let’s pretend that we are rocket scientists.\n", "\n", "A missile has been launched from country Y and our mission is to track it.\n", "\n", "Let $x \\in \\mathbb{R}^2$ denote the current location of the missile—a\n", "pair indicating latitude-longitude coordinates on a map.\n", "\n", "At the present moment in time, the precise location $x$ is unknown, but\n", "we do have some beliefs about $x$.\n", "\n", "One way to summarize our knowledge is a point prediction $\\hat x$\n", "\n", "- But what if the President wants to know the probability that the missile is currently over the Sea of Japan? \n", "- Then it is better to summarize our initial beliefs with a bivariate probability density $p$ \n", " - $\\int_E p(x)dx$ indicates the probability that we attach to the missile being in region $E$. \n", "\n", "\n", "The density $p$ is called our *prior* for the random variable $x$.\n", "\n", "To keep things tractable in our example, we assume that our prior is Gaussian.\n", "\n", "In particular, we take\n", "\n", "\n", "\n", "$$\n", "p = N(\\hat x, \\Sigma) \\tag{18.1}\n", "$$\n", "\n", "where $\\hat x$ is the mean of the distribution and $\\Sigma$ is a\n", "$2 \\times 2$ covariance matrix. 
In our simulations, we will suppose that\n", "\n", "\n", "\n", "$$\n", "\\hat x\n", "= \\left(\n", "\\begin{array}{c}\n", " 0.2 \\\\\n", " -0.2\n", "\\end{array}\n", " \\right),\n", "\\qquad\n", "\\Sigma\n", "= \\left(\n", "\\begin{array}{cc}\n", " 0.4 & 0.3 \\\\\n", " 0.3 & 0.45\n", "\\end{array}\n", " \\right) \\tag{18.2}\n", "$$\n", "\n", "This density $p(x)$ is shown below as a contour map, with the center of the red ellipse being equal to $\\hat x$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Set up the Gaussian prior density p\n", "Σ = [[0.4, 0.3], [0.3, 0.45]]\n", "Σ = np.matrix(Σ)\n", "x_hat = np.matrix([0.2, -0.2]).T\n", "# Define the matrices G and R from the equation y = G x + N(0, R)\n", "G = [[1, 0], [0, 1]]\n", "G = np.matrix(G)\n", "R = 0.5 * Σ\n", "# The matrices A and Q\n", "A = [[1.2, 0], [0, -0.2]]\n", "A = np.matrix(A)\n", "Q = 0.3 * Σ\n", "# The observed value of y\n", "y = np.matrix([2.3, -1.9]).T\n", "\n", "# Set up grid for plotting\n", "x_grid = np.linspace(-1.5, 2.9, 100)\n", "y_grid = np.linspace(-3.1, 1.7, 100)\n", "X, Y = np.meshgrid(x_grid, y_grid)\n", "\n", "def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):\n", " \"\"\"\n", " Compute and return the probability density function of bivariate normal\n", " distribution of normal random variables x and y\n", "\n", " Parameters\n", " ----------\n", " x : array_like(float)\n", " Random variable\n", "\n", " y : array_like(float)\n", " Random variable\n", "\n", " σ_x : array_like(float)\n", " Standard deviation of random variable x\n", "\n", " σ_y : array_like(float)\n", " Standard deviation of random variable y\n", "\n", " μ_x : scalar(float)\n", " Mean value of random variable x\n", "\n", " μ_y : scalar(float)\n", " Mean value of random variable y\n", "\n", " σ_xy : array_like(float)\n", " Covariance of random variables x and y\n", "\n", " \"\"\"\n", "\n", " x_μ = x - μ_x\n", " y_μ = y - 
μ_y\n", "\n", " ρ = σ_xy / (σ_x * σ_y)\n", " z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)\n", " denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)\n", " return np.exp(-z / (2 * (1 - ρ**2))) / denom\n", "\n", "def gen_gaussian_plot_vals(μ, C):\n", " \"Z values for plotting the bivariate Gaussian N(μ, C)\"\n", " # μ is a 2 x 1 column vector, so pick out each coordinate explicitly\n", " m_x, m_y = float(μ[0, 0]), float(μ[1, 0])\n", " s_x, s_y = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])\n", " s_xy = C[0, 1]\n", " return bivariate_normal(X, Y, s_x, s_y, m_x, m_y, s_xy)\n", "\n", "# Plot the figure\n", "\n", "fig, ax = plt.subplots(figsize=(10, 8))\n", "ax.grid()\n", "\n", "Z = gen_gaussian_plot_vals(x_hat, Σ)\n", "ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)\n", "cs = ax.contour(X, Y, Z, 6, colors=\"black\")\n", "ax.clabel(cs, inline=1, fontsize=10)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Filtering Step\n", "\n", "We are now presented with some good news and some bad news.\n", "\n", "The good news is that the missile has been located by our sensors, which report that the current location is $y = (2.3, -1.9)$.\n", "\n", "The next figure shows the original prior $p(x)$ and the new reported\n", "location $y$." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10, 8))\n", "ax.grid()\n", "\n", "Z = gen_gaussian_plot_vals(x_hat, Σ)\n", "ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)\n", "cs = ax.contour(X, Y, Z, 6, colors=\"black\")\n", "ax.clabel(cs, inline=1, fontsize=10)\n", "# Mark the observation y (a 2 x 1 column vector) on the map\n", "ax.text(float(y[0, 0]), float(y[1, 0]), \"$y$\", fontsize=20, color=\"black\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The bad news is that our sensors are imprecise.\n", "\n", "In particular, we should interpret the output of our sensor not as\n", "$y=x$, but rather as\n", "\n", "\n", "\n", "$$\n", "y = G x + v, \\quad \\text{where} \\quad v \\sim N(0, R) \\tag{18.3}\n", "$$\n", 
"\n", "Here $G$ and $R$ are $2 \\times 2$ matrices with $R$\n", "positive definite. Both are assumed known, and the noise term $v$ is assumed\n", "to be independent of $x$.\n", "\n", "How then should we combine our prior $p(x) = N(\\hat x, \\Sigma)$ and this\n", "new information $y$ to improve our understanding of the location of the\n", "missile?\n", "\n", "As you may have guessed, the answer is to use Bayes’ theorem, which tells\n", "us to update our prior $p(x)$ to $p(x \\,|\\, y)$ via\n", "\n", "$$\n", "p(x \\,|\\, y) = \\frac{p(y \\,|\\, x) \\, p(x)} {p(y)}\n", "$$\n", "\n", "where $p(y) = \\int p(y \\,|\\, x) \\, p(x) dx$.\n", "\n", "In solving for $p(x \\,|\\, y)$, we observe that\n", "\n", "- $p(x) = N(\\hat x, \\Sigma)$. \n", "- In view of [(18.3)](#equation-kl-measurement-model), the conditional density $p(y \\,|\\, x)$ is $N(Gx, R)$. \n", "- $p(y)$ does not depend on $x$, and enters into the calculations only as a normalizing constant. \n", "\n", "\n", "Because we are in a linear and Gaussian framework, the updated density can be computed by calculating population linear regressions.\n", "\n", "In particular, the solution is known  to be\n", "\n", "$$\n", "p(x \\,|\\, y) = N(\\hat x^F, \\Sigma^F)\n", "$$\n", "\n", "where\n", "\n", "\n", "\n", "$$\n", "\\hat x^F := \\hat x + \\Sigma G' (G \\Sigma G' + R)^{-1}(y - G \\hat x)\n", "\\quad \\text{and} \\quad\n", "\\Sigma^F := \\Sigma - \\Sigma G' (G \\Sigma G' + R)^{-1} G \\Sigma \\tag{18.4}\n", "$$\n", "\n", "Here $\\Sigma G' (G \\Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \\hat x$ on the surprise $y - G \\hat x$.\n", "\n", "This new density $p(x \\,|\\, y) = N(\\hat x^F, \\Sigma^F)$ is shown in the next figure via contour lines and the color map.\n", "\n", "The original density is left in as contour lines for comparison" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "fig, ax = 
plt.subplots(figsize=(10, 8))\n", "ax.grid()\n", "\n", "Z = gen_gaussian_plot_vals(x_hat, Σ)\n", "cs1 = ax.contour(X, Y, Z, 6, colors=\"black\")\n", "ax.clabel(cs1, inline=1, fontsize=10)\n", "# Filtering update: M is the matrix of regression coefficients Σ G' (G Σ G' + R)^{-1}\n", "M = Σ * G.T * linalg.inv(G * Σ * G.T + R)\n", "x_hat_F = x_hat + M * (y - G * x_hat)\n", "Σ_F = Σ - M * G * Σ\n", "new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)\n", "cs2 = ax.contour(X, Y, new_Z, 6, colors=\"black\")\n", "ax.clabel(cs2, inline=1, fontsize=10)\n", "ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)\n", "ax.text(float(y[0, 0]), float(y[1, 0]), \"$y$\", fontsize=20, color=\"black\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our new density twists the prior $p(x)$ in a direction determined by the new\n", "information $y - G \\hat x$.\n", "\n", "In generating the figure, we set $G$ to the identity matrix and $R = 0.5 \\Sigma$ for $\\Sigma$ defined in [(18.2)](#equation-kalman-dhxs).\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Forecast Step\n", "\n", "What have we achieved so far?\n", "\n", "We have obtained probabilities for the current location of the state (missile) given prior and current information.\n", "\n", "This is called “filtering” rather than forecasting because we are filtering\n", "out noise rather than looking into the future.\n", "\n", "- $p(x \\,|\\, y) = N(\\hat x^F, \\Sigma^F)$ is called the *filtering distribution* \n", "\n", "\n", "But now let’s suppose that we are given another task: to predict the location of the missile after one unit of time (whatever that may be) has elapsed.\n", "\n", "To do this we need a model of how the state evolves.\n", "\n", "Let’s suppose that we have one, and that it’s linear and Gaussian. 
In particular,\n", "\n", "\n", "\n", "$$\n", "x_{t+1} = A x_t + w_{t+1}, \\quad \\text{where} \\quad w_t \\sim N(0, Q) \\tag{18.5}\n", "$$\n", "\n", "Our aim is to combine this law of motion and our current distribution $p(x \\,|\\, y) = N(\\hat x^F, \\Sigma^F)$ to come up with a new *predictive* distribution for the location in one unit of time.\n", "\n", "In view of [(18.5)](#equation-kl-xdynam), all we have to do is introduce a random vector $x^F \\sim N(\\hat x^F, \\Sigma^F)$ and work out the distribution of $A x^F + w$ where $w$ is independent of $x^F$ and has distribution $N(0, Q)$.\n", "\n", "Since linear combinations of Gaussians are Gaussian, $A x^F + w$ is Gaussian.\n", "\n", "Elementary calculations and the expressions in [(18.4)](#equation-kl-filter-exp) tell us that\n", "\n", "$$\n", "\\mathbb{E} [A x^F + w]\n", "= A \\mathbb{E} x^F + \\mathbb{E} w\n", "= A \\hat x^F\n", "= A \\hat x + A \\Sigma G' (G \\Sigma G' + R)^{-1}(y - G \\hat x)\n", "$$\n", "\n", "and\n", "\n", "$$\n", "\\operatorname{Var} [A x^F + w]\n", "= A \\operatorname{Var}[x^F] A' + Q\n", "= A \\Sigma^F A' + Q\n", "= A \\Sigma A' - A \\Sigma G' (G \\Sigma G' + R)^{-1} G \\Sigma A' + Q\n", "$$\n", "\n", "The matrix $A \\Sigma G' (G \\Sigma G' + R)^{-1}$ is often written as\n", "$K_{\\Sigma}$ and called the *Kalman gain*.\n", "\n", "- The subscript $\\Sigma$ has been added to remind us that $K_{\\Sigma}$ depends on $\\Sigma$, but not $y$ or $\\hat x$. 
\n", "\n", "\n", "Using this notation, we can summarize our results as follows.\n", "\n", "Our updated prediction is the density $N(\\hat x_{new}, \\Sigma_{new})$ where\n", "\n", "$$\n", "\\begin{aligned}\n", " \\hat x_{new} &:= A \\hat x + K_{\\Sigma} (y - G \\hat x) \\\\\n", " \\Sigma_{new} &:= A \\Sigma A' - K_{\\Sigma} G \\Sigma A' + Q \\nonumber\n", "\\end{aligned} \\tag{18.6}\n", "$$\n", "\n", "- The density $p_{new}(x) = N(\\hat x_{new}, \\Sigma_{new})$ is called the *predictive distribution* \n", "\n", "\n", "The predictive distribution is the new density shown in the following figure, where\n", "the update has used the parameters\n", "\n", "$$\n", "A\n", "= \\left(\n", "\\begin{array}{cc}\n", " 1.2 & 0.0 \\\\\n", " 0.0 & -0.2\n", "\\end{array}\n", " \\right),\n", " \\qquad\n", "Q = 0.3 \\Sigma\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "fig, ax = plt.subplots(figsize=(10, 8))\n", "ax.grid()\n", "\n", "# Density 1: the prior\n", "Z = gen_gaussian_plot_vals(x_hat, Σ)\n", "cs1 = ax.contour(X, Y, Z, 6, colors=\"black\")\n", "ax.clabel(cs1, inline=1, fontsize=10)\n", "\n", "# Density 2: the filtering distribution\n", "M = Σ * G.T * linalg.inv(G * Σ * G.T + R)\n", "x_hat_F = x_hat + M * (y - G * x_hat)\n", "Σ_F = Σ - M * G * Σ\n", "Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)\n", "cs2 = ax.contour(X, Y, Z_F, 6, colors=\"black\")\n", "ax.clabel(cs2, inline=1, fontsize=10)\n", "\n", "# Density 3: the predictive distribution\n", "new_x_hat = A * x_hat_F\n", "new_Σ = A * Σ_F * A.T + Q\n", "new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)\n", "cs3 = ax.contour(X, Y, new_Z, 6, colors=\"black\")\n", "ax.clabel(cs3, inline=1, fontsize=10)\n", "ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)\n", "ax.text(float(y[0, 0]), float(y[1, 0]), \"$y$\", fontsize=20, color=\"black\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Recursive Procedure\n", "\n", "\n", "\n", "Let’s look back at what we’ve done.\n", "\n", "We 
started the current period with a prior $p(x)$ for the location $x$ of the missile.\n", "\n", "We then used the current measurement $y$ to update to $p(x \\,|\\, y)$.\n", "\n", "Finally, we used the law of motion [(18.5)](#equation-kl-xdynam) for $\\{x_t\\}$ to update to $p_{new}(x)$.\n", "\n", "If we now step into the next period, we are ready to go round again, taking $p_{new}(x)$\n", "as the current prior.\n", "\n", "Swapping notation $p_t(x)$ for $p(x)$ and $p_{t+1}(x)$ for $p_{new}(x)$, the full recursive procedure is:\n", "\n", "1. Start the current period with prior $p_t(x) = N(\\hat x_t, \\Sigma_t)$. \n", "1. Observe current measurement $y_t$. \n", "1. Compute the filtering distribution $p_t(x \\,|\\, y) = N(\\hat x_t^F, \\Sigma_t^F)$ from $p_t(x)$ and $y_t$, applying Bayes rule and the conditional distribution [(18.3)](#equation-kl-measurement-model). \n", "1. Compute the predictive distribution $p_{t+1}(x) = N(\\hat x_{t+1}, \\Sigma_{t+1})$ from the filtering distribution and [(18.5)](#equation-kl-xdynam). \n", "1. Increment $t$ by one and go to step 1. 
\n", "\n", "\n", "Repeating [(18.6)](#equation-kl-mlom0), the dynamics for $\\hat x_t$ and $\\Sigma_t$ are as follows:\n", "\n", "$$\n", "\\begin{aligned}\n", " \\hat x_{t+1} &= A \\hat x_t + K_{\\Sigma_t} (y_t - G \\hat x_t) \\\\\n", " \\Sigma_{t+1} &= A \\Sigma_t A' - K_{\\Sigma_t} G \\Sigma_t A' + Q \\nonumber\n", "\\end{aligned} \\tag{18.7}\n", "$$\n", "\n", "These are the standard dynamic equations for the Kalman filter (see, for example, [[LS18](https://python.quantecon.org/zreferences.html#id143)], page 58).\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convergence\n", "\n", "The matrix $\\Sigma_t$ is a measure of the uncertainty of our prediction $\\hat x_t$ of $x_t$.\n", "\n", "Apart from special cases, this uncertainty will never be fully resolved, regardless of how much time elapses.\n", "\n", "One reason is that our prediction $\\hat x_t$ is made based on information available at $t-1$, not $t$.\n", "\n", "Even if we know the precise value of $x_{t-1}$ (which we don’t), the transition equation [(18.5)](#equation-kl-xdynam) implies that $x_t = A x_{t-1} + w_t$.\n", "\n", "Since the shock $w_t$ is not observable at $t-1$, any time $t-1$ prediction of $x_t$ will incur some error (unless $w_t$ is degenerate).\n", "\n", "However, it is certainly possible that $\\Sigma_t$ converges to a constant matrix as $t \\to \\infty$.\n", "\n", "To study this topic, let’s expand the second equation in [(18.7)](#equation-kalman-lom):\n", "\n", "\n", "\n", "$$\n", "\\Sigma_{t+1} = A \\Sigma_t A' - A \\Sigma_t G' (G \\Sigma_t G' + R)^{-1} G \\Sigma_t A' + Q \\tag{18.8}\n", "$$\n", "\n", "This is a nonlinear difference equation in $\\Sigma_t$.\n", "\n", "A fixed point of [(18.8)](#equation-kalman-sdy) is a constant matrix $\\Sigma$ such that\n", "\n", "\n", "\n", "$$\n", "\\Sigma = A \\Sigma A' - A \\Sigma G' (G \\Sigma G' + R)^{-1} G \\Sigma A' + Q \\tag{18.9}\n", "$$\n", "\n", "Equation [(18.8)](#equation-kalman-sdy) is known as 
a discrete-time Riccati difference equation.\n", "\n", "Equation [(18.9)](#equation-kalman-dare) is known as a [discrete-time algebraic Riccati equation](https://en.wikipedia.org/wiki/Algebraic_Riccati_equation).\n", "\n", "Conditions under which a fixed point exists and the sequence $\\{\\Sigma_t\\}$ converges to it are discussed in [[AHMS96](https://python.quantecon.org/zreferences.html#id105)] and [[AM05](https://python.quantecon.org/zreferences.html#id103)], chapter 4.\n", "\n", "A sufficient (but not necessary) condition is that all the eigenvalues $\\lambda_i$ of $A$ satisfy $|\\lambda_i| < 1$ (cf. e.g., [[AM05](https://python.quantecon.org/zreferences.html#id103)], p. 77).\n", "\n", "(This strong condition assures that the unconditional distribution of $x_t$ converges as $t \\rightarrow + \\infty$.)\n", "\n", "In this case, for any initial choice of $\\Sigma_0$ that is both non-negative and symmetric, the sequence $\\{\\Sigma_t\\}$ in [(18.8)](#equation-kalman-sdy) converges to a non-negative symmetric matrix $\\Sigma$ that solves [(18.9)](#equation-kalman-dare)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "\n", "\n", "The class Kalman from the [QuantEcon.py](http://quantecon.org/quantecon-py) package implements the Kalman filter\n", "\n", "- Instance data consists of: \n", " - the moments $(\\hat x_t, \\Sigma_t)$ of the current prior. \n", " - An instance of the [LinearStateSpace](https://github.com/QuantEcon/QuantEcon.py/blob/master/quantecon/lss.py) class from [QuantEcon.py](http://quantecon.org/quantecon-py). 
\n", "\n", "\n", "The latter represents a linear state space model of the form\n", "\n", "$$\n", "\\begin{aligned}\n", " x_{t+1} & = A x_t + C w_{t+1}\n", " \\\\\n", " y_t & = G x_t + H v_t\n", "\\end{aligned}\n", "$$\n", "\n", "where the shocks $w_t$ and $v_t$ are IID standard normals.\n", "\n", "To connect this with the notation of this lecture we set\n", "\n", "$$\n", "Q := CC' \\quad \\text{and} \\quad R := HH'\n", "$$\n", "\n", "- The class Kalman from the [QuantEcon.py](http://quantecon.org/quantecon-py) package has a number of methods, some of which we will not use until we study more advanced applications in subsequent lectures. \n", "- Methods pertinent for this lecture are: \n", "    - `prior_to_filtered`, which updates $(\\hat x_t, \\Sigma_t)$ to $(\\hat x_t^F, \\Sigma_t^F)$ \n", "    - `filtered_to_forecast`, which updates the filtering distribution to the predictive distribution – which becomes the new prior $(\\hat x_{t+1}, \\Sigma_{t+1})$ \n", "    - `update`, which combines the previous two methods \n", "    - `stationary_values`, which computes the solution to [(18.9)](#equation-kalman-dare) and the corresponding (stationary) Kalman gain \n", "\n", "\n", "You can view the program [on GitHub](https://github.com/QuantEcon/QuantEcon.py/blob/master/quantecon/kalman.py)."
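, "\n", "\n", "To make the update step described above concrete, here is a hand-worked scalar sketch. It is an illustration under stated assumptions rather than output from the package: we take $G = 1$, so that the recursion [(18.7)](#equation-kalman-lom) reduces to\n", "\n", "$$\n", "K_{\\Sigma_t} = \\frac{A \\Sigma_t}{\\Sigma_t + R},\n", "\\qquad\n", "\\hat x_{t+1} = A \\hat x_t + K_{\\Sigma_t} (y_t - \\hat x_t),\n", "\\qquad\n", "\\Sigma_{t+1} = \\frac{A^2 \\Sigma_t R}{\\Sigma_t + R} + Q\n", "$$\n", "\n", "Assuming, purely for illustration, $A = 1$, $Q = 0$, $R = 1$, prior moments $(\\hat x_0, \\Sigma_0) = (8, 1)$ and a hypothetical observation $y_0 = 10.4$, one step gives\n", "\n", "$$\n", "K_{\\Sigma_0} = \\frac{1}{2},\n", "\\qquad\n", "\\hat x_1 = 8 + \\frac{1}{2} (10.4 - 8) = 9.2,\n", "\\qquad\n", "\\Sigma_1 = \\frac{1 \\cdot 1}{1 + 1} = \\frac{1}{2}\n", "$$\n", "\n", "Iterating the variance recursion under the same assumptions yields $\\Sigma_{t+1} = \\Sigma_t / (\\Sigma_t + 1)$ and hence $\\Sigma_t = 1 / (t + 1)$, which shrinks toward $0$, the solution of [(18.9)](#equation-kalman-dare) in this special case."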
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "Consider the following simple application of the Kalman filter, loosely based\n", "on [[LS18](https://python.quantecon.org/zreferences.html#id143)], section 2.9.2.\n", "\n", "Suppose that\n", "\n", "- all variables are scalars \n", "- the hidden state $\\{x_t\\}$ is in fact constant, equal to some $\\theta \\in \\mathbb{R}$ unknown to the modeler \n", "\n", "\n", "State dynamics are therefore given by [(18.5)](#equation-kl-xdynam) with $A=1$, $Q=0$ and $x_0 = \\theta$.\n", "\n", "The measurement equation is $y_t = \\theta + v_t$ where $v_t$ is $N(0,1)$ and IID.\n", "\n", "The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities $p_t(x) = N(\\hat x_t, \\Sigma_t)$.\n", "\n", "As shown in [[LS18](https://python.quantecon.org/zreferences.html#id143)], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the unknown value $\\theta$.\n", "\n", "In the simulation, take $\\theta = 10$, $\\hat x_0 = 8$ and $\\Sigma_0 = 1$.\n", "\n", "Your figure should – modulo randomness – look something like this\n", "\n", "![https://python.quantecon.org/_static/lecture_specific/kalman/kl_ex1_fig.png](https://python.quantecon.org/_static/lecture_specific/kalman/kl_ex1_fig.png)\n", "\n", " \n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "The preceding figure gives some support to the idea that probability mass\n", "converges to $\\theta$.\n", "\n", "To get a better idea, choose a small $\\epsilon > 0$ and calculate\n", "\n", "$$\n", "z_t := 1 - \\int_{\\theta - \\epsilon}^{\\theta + \\epsilon} p_t(x) dx\n", "$$\n", "\n", "for $t = 0, 1, 2, \\ldots, T$.\n", "\n", "Plot $z_t$ against $t$, setting $\\epsilon = 0.1$ and $T = 600$.\n", "\n", "Your figure should show error 
erratically declining something like this\n", "\n", "![https://python.quantecon.org/_static/lecture_specific/kalman/kl_ex2_fig.png](https://python.quantecon.org/_static/lecture_specific/kalman/kl_ex2_fig.png)\n", "\n", " \n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "As discussed [above](#kalman-convergence), if the shock sequence $\\{w_t\\}$ is not degenerate, then it is not in general possible to predict $x_t$ without error at time $t-1$ (and this would be the case even if we could observe $x_{t-1}$).\n", "\n", "Let’s now compare the prediction $\\hat x_t$ made by the Kalman filter\n", "against a competitor who **is** allowed to observe $x_{t-1}$.\n", "\n", "This competitor will use the conditional expectation $\\mathbb E[ x_t\n", "\\,|\\, x_{t-1}]$, which in this case is $A x_{t-1}$.\n", "\n", "The conditional expectation is known to be the optimal prediction method in terms of minimizing mean squared error.\n", "\n", "(More precisely, the minimizer of $\\mathbb E \\, \\| x_t - g(x_{t-1}) \\|^2$ with respect to $g$ is $g^*(x_{t-1}) := \\mathbb E[ x_t \\,|\\, x_{t-1}]$)\n", "\n", "Thus we are comparing the Kalman filter against a competitor who has more\n", "information (in the sense of being able to observe the latent state) and\n", "behaves optimally in terms of minimizing squared error.\n", "\n", "Our horse race will be assessed in terms of squared error.\n", "\n", "In particular, your task is to generate a graph plotting observations of both $\\| x_t - A x_{t-1} \\|^2$ and $\\| x_t - \\hat x_t \\|^2$ against $t$ for $t = 1, \\ldots, 50$.\n", "\n", "For the parameters, set $G = I, R = 0.5 I$ and $Q = 0.3 I$, where $I$ is\n", "the $2 \\times 2$ identity.\n", "\n", "Set\n", "\n", "$$\n", "A\n", "= \\left(\n", "\\begin{array}{cc}\n", " 0.5 & 0.4 \\\\\n", " 0.6 & 0.3\n", "\\end{array}\n", " \\right)\n", "$$\n", "\n", "To initialize the prior density, set\n", "\n", "$$\n", "\\Sigma_0\n", "= \\left(\n", 
"\\begin{array}{cc}\n", " 0.9 & 0.3 \\\\\n", " 0.3 & 0.9\n", "\\end{array}\n", " \\right)\n", "$$\n", "\n", "and $\\hat x_0 = (8, 8)$.\n", "\n", "Finally, set $x_0 = (0, 0)$.\n", "\n", "You should end up with a figure similar to the following (modulo randomness)\n", "\n", "![https://python.quantecon.org/_static/lecture_specific/kalman/kalman_ex3.png](https://python.quantecon.org/_static/lecture_specific/kalman/kalman_ex3.png)\n", "\n", " \n", "Observe how, after an initial learning period, the Kalman filter performs quite well, even relative to the competitor who predicts optimally with knowledge of the latent state.\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4\n", "\n", "Try varying the coefficient $0.3$ in $Q = 0.3 I$ up and down.\n", "\n", "Observe how the diagonal values in the stationary solution $\\Sigma$ (see [(18.9)](#equation-kalman-dare)) increase and decrease in line with this coefficient.\n", "\n", "The interpretation is that more randomness in the law of motion for $x_t$ causes more (permanent) uncertainty in prediction." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solutions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Parameters\n", "θ = 10 # Constant value of state x_t\n", "A, C, G, H = 1, 0, 1, 1\n", "ss = LinearStateSpace(A, C, G, H, mu_0=θ)\n", "\n", "# Set prior, initialize kalman filter\n", "x_hat_0, Σ_0 = 8, 1\n", "kalman = Kalman(ss, x_hat_0, Σ_0)\n", "\n", "# Draw observations of y from state space model\n", "N = 5\n", "x, y = ss.simulate(N)\n", "y = y.flatten()\n", "\n", "# Set up plot\n", "fig, ax = plt.subplots(figsize=(10,8))\n", "xgrid = np.linspace(θ - 5, θ + 2, 200)\n", "\n", "for i in range(N):\n", " # Record the current predicted mean and variance\n", " m, v = [float(z) for z in (kalman.x_hat, kalman.Sigma)]\n", " # Plot, update filter\n", " ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')\n", " kalman.update(y[i])\n", "\n", "ax.set_title(f'First {N} densities when $\\\\theta = {θ:.1f}$')\n", "ax.legend(loc='upper left')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "ϵ = 0.1\n", "θ = 10 # Constant value of state x_t\n", "A, C, G, H = 1, 0, 1, 1\n", "ss = LinearStateSpace(A, C, G, H, mu_0=θ)\n", "\n", "x_hat_0, Σ_0 = 8, 1\n", "kalman = Kalman(ss, x_hat_0, Σ_0)\n", "\n", "T = 600\n", "z = np.empty(T)\n", "x, y = ss.simulate(T)\n", "y = y.flatten()\n", "\n", "for t in range(T):\n", " # Record the current predicted mean and variance and plot their densities\n", " m, v = [float(temp) for temp in (kalman.x_hat, kalman.Sigma)]\n", "\n", " f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))\n", " integral, error = quad(f, θ - ϵ, θ + ϵ)\n", " z[t] = 1 - integral\n", "\n", " kalman.update(y[t])\n", "\n", "fig, ax = 
plt.subplots(figsize=(9, 7))\n", "ax.set_ylim(0, 1)\n", "ax.set_xlim(0, T)\n", "ax.plot(range(T), z)\n", "ax.fill_between(range(T), np.zeros(T), z, color=\"blue\", alpha=0.2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "# Define A, C, G, H\n", "G = np.identity(2)\n", "H = np.sqrt(0.5) * np.identity(2)\n", "\n", "A = [[0.5, 0.4],\n", " [0.6, 0.3]]\n", "C = np.sqrt(0.3) * np.identity(2)\n", "\n", "# Set up state space model, initial value x_0 set to zero\n", "ss = LinearStateSpace(A, C, G, H, mu_0 = np.zeros(2))\n", "\n", "# Define the prior density\n", "Σ = [[0.9, 0.3],\n", " [0.3, 0.9]]\n", "Σ = np.array(Σ)\n", "x_hat = np.array([8, 8])\n", "\n", "# Initialize the Kalman filter\n", "kn = Kalman(ss, x_hat, Σ)\n", "\n", "# Print eigenvalues of A\n", "print(\"Eigenvalues of A:\")\n", "print(eigvals(A))\n", "\n", "# Print stationary Σ\n", "S, K = kn.stationary_values()\n", "print(\"Stationary prediction error variance:\")\n", "print(S)\n", "\n", "# Generate the plot\n", "T = 50\n", "x, y = ss.simulate(T)\n", "\n", "e1 = np.empty(T-1)\n", "e2 = np.empty(T-1)\n", "\n", "for t in range(1, T):\n", " kn.update(y[:,t])\n", " e1[t-1] = np.sum((x[:, t] - kn.x_hat.flatten())**2)\n", " e2[t-1] = np.sum((x[:, t] - A @ x[:, t-1])**2)\n", "\n", "fig, ax = plt.subplots(figsize=(9,6))\n", "ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6,\n", " label='Kalman filter error')\n", "ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6,\n", " label='Conditional expectation error')\n", "ax.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the filtering formulas in [(18.4)](#equation-kl-filter-exp), see, for example, page 93 of [[Bis06](https://python.quantecon.org/zreferences.html#id108)]. To get from his expressions to the ones used above, you will also need to apply the [Woodbury matrix identity](https://en.wikipedia.org/wiki/Woodbury_matrix_identity)." ] } ], "metadata": { "date": 1627535051.5552385, "filename": "kalman.md", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "title": "A First Look at the Kalman Filter" }, "nbformat": 4, "nbformat_minor": 4 }