{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A Problem that Stumped Milton Friedman\n", "\n", "(and that Abraham Wald solved by inventing sequential analysis)\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Contents\n", "\n", "- [A Problem that Stumped Milton Friedman](#A-Problem-that-Stumped-Milton-Friedman) \n", " - [Overview](#Overview) \n", " - [Origin of the Problem](#Origin-of-the-Problem) \n", " - [A Dynamic Programming Approach](#A-Dynamic-Programming-Approach) \n", " - [Implementation](#Implementation) \n", " - [Analysis](#Analysis) \n", " - [Comparison with Neyman-Pearson Formulation](#Comparison-with-Neyman-Pearson-Formulation) \n", " - [Sequels](#Sequels) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to what’s in Anaconda, this lecture will need the following libraries:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "!conda install -y quantecon\n", "!pip install interpolation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "This lecture describes a statistical decision problem encountered by Milton\n", "Friedman and W. Allen Wallis during World War II when they were analysts at\n", "the U.S. Government’s Statistical Research Group at Columbia University.\n", "\n", "This problem led Abraham Wald [[Wal47](https://python.quantecon.org/zreferences.html#id69)] to formulate **sequential analysis**,\n", "an approach to statistical decision problems intimately related to dynamic programming.\n", "\n", "In this lecture, we apply dynamic programming algorithms to Friedman and Wallis and Wald’s problem.\n", "\n", "Key ideas in play will be:\n", "\n", "- Bayes’ Law \n", "- Dynamic programming \n", "- Type I and type II statistical errors \n", " - a type I error occurs when you reject a null hypothesis that is true \n", " - a type II error is when you accept a null hypothesis that is false \n", "- Abraham Wald’s **sequential probability ratio test** \n", "- The **power** of a statistical test \n", "- The **critical region** of a statistical test \n", "- A **uniformly most powerful test** \n", "\n", "\n", "We’ll begin with some imports:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from numba import jit, prange, float64, int64\n", "from numba.experimental import jitclass\n", "from interpolation import interp\n", "from math import gamma" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lecture uses ideas studied in [this lecture](https://python.quantecon.org/likelihood_ratio_process.html), [this lecture](https://python.quantecon.org/likelihood_bayes.html).\n", "and [this lecture](https://python.quantecon.org/exchangeable.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Origin of the Problem\n", "\n", "On pages 137-139 of his 1998 book *Two Lucky People* with Rose Friedman [[FF98](https://python.quantecon.org/zreferences.html#id64)],\n", "Milton Friedman described a problem presented to him and Allen Wallis\n", "during World War II, when they worked at the US Government’s\n", "Statistical Research Group at Columbia University.\n", "\n", "Let’s listen to Milton Friedman tell us what happened\n", "\n", "> In order to understand the story, it is necessary to have an idea of a\n", "simple statistical problem, and of the standard procedure for dealing\n", "with it. The actual problem out of which sequential analysis grew will\n", "serve. The Navy has two alternative designs (say A and B) for a\n", "projectile. It wants to determine which is superior. To do so it\n", "undertakes a series of paired firings. On each round, it assigns the\n", "value 1 or 0 to A accordingly as its performance is superior or inferior\n", "to that of B and conversely 0 or 1 to B. The Navy asks the statistician\n", "how to conduct the test and how to analyze the results.\n", "\n", "\n", "> The standard statistical answer was to specify a number of firings (say\n", "1,000) and a pair of percentages (e.g., 53% and 47%) and tell the client\n", "that if A receives a 1 in more than 53% of the firings, it can be\n", "regarded as superior; if it receives a 1 in fewer than 47%, B can be\n", "regarded as superior; if the percentage is between 47% and 53%, neither\n", "can be so regarded.\n", "\n", "\n", "> When Allen Wallis was discussing such a problem with (Navy) Captain\n", "Garret L. Schyler, the captain objected that such a test, to quote from\n", "Allen’s account, may prove wasteful. If a wise and seasoned ordnance\n", "officer like Schyler were on the premises, he would see after the first\n", "few thousand or even few hundred [rounds] that the experiment need not\n", "be completed either because the new method is obviously inferior or\n", "because it is obviously superior beyond what was hoped for\n", "$\\ldots$.\n", "\n", "\n", "Friedman and Wallis struggled with the problem but, after realizing that\n", "they were not able to solve it, described the problem to Abraham Wald.\n", "\n", "That started Wald on the path that led him to *Sequential Analysis* [[Wal47](https://python.quantecon.org/zreferences.html#id69)].\n", "\n", "We’ll formulate the problem using dynamic programming." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## A Dynamic Programming Approach\n", "\n", "The following presentation of the problem closely follows Dmitri\n", "Berskekas’s treatment in **Dynamic Programming and Stochastic Control** [[Ber75](https://python.quantecon.org/zreferences.html#id68)].\n", "\n", "A decision-maker observes a sequence of draws of a random variable $z$.\n", "\n", "He (or she) wants to know which of two probability distributions $f_0$ or $f_1$ governs $z$.\n", "\n", "Conditional on knowing that successive observations are drawn from distribution $f_0$, the sequence of\n", "random variables is independently and identically distributed (IID).\n", "\n", "Conditional on knowing that successive observations are drawn from distribution $f_1$, the sequence of\n", "random variables is also independently and identically distributed (IID).\n", "\n", "But the observer does not know which of the two distributions generated the sequence.\n", "\n", "For reasons explained in [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html), this means that the sequence is not\n", "IID and that the observer has something to learn, even though he knows both $f_0$ and $f_1$.\n", "\n", "The decision maker chooses a number of draws (i.e., random samples from the unknown distribution) and uses them to decide\n", "which of the two distributions is generating outcomes.\n", "\n", "He starts with prior\n", "\n", "$$\n", "\\pi_{-1} =\n", "\\mathbb P \\{ f = f_0 \\mid \\textrm{ no observations} \\} \\in (0, 1)\n", "$$\n", "\n", "After observing $k+1$ observations $z_k, z_{k-1}, \\ldots, z_0$, he updates this value to\n", "\n", "$$\n", "\\pi_k = \\mathbb P \\{ f = f_0 \\mid z_k, z_{k-1}, \\ldots, z_0 \\}\n", "$$\n", "\n", "which is calculated recursively by applying Bayes’ law:\n", "\n", "$$\n", "\\pi_{k+1} = \\frac{ \\pi_k f_0(z_{k+1})}{ \\pi_k f_0(z_{k+1}) + (1-\\pi_k) f_1 (z_{k+1}) },\n", "\\quad k = -1, 0, 1, \\ldots\n", "$$\n", "\n", "After observing $z_k, z_{k-1}, \\ldots, z_0$, the decision-maker believes\n", "that $z_{k+1}$ has probability distribution\n", "\n", "$$\n", "f_{{\\pi}_k} (v) = \\pi_k f_0(v) + (1-\\pi_k) f_1 (v) ,\n", "$$\n", "\n", "which is a mixture of distributions $f_0$ and $f_1$, with the weight\n", "on $f_0$ being the posterior probability that $f = f_0$ .\n", "\n", "To illustrate such a distribution, let’s inspect some mixtures of beta distributions.\n", "\n", "The density of a beta probability distribution with parameters $a$ and $b$ is\n", "\n", "$$\n", "f(z; a, b) = \\frac{\\Gamma(a+b) z^{a-1} (1-z)^{b-1}}{\\Gamma(a) \\Gamma(b)}\n", "\\quad \\text{where} \\quad\n", "\\Gamma(t) := \\int_{0}^{\\infty} x^{t-1} e^{-x} dx\n", "$$\n", "\n", "The next figure shows two beta distributions in the top panel.\n", "\n", "The bottom panel presents mixtures of these distributions, with various mixing probabilities $\\pi_k$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@jit(nopython=True)\n", "def p(x, a, b):\n", " r = gamma(a + b) / (gamma(a) * gamma(b))\n", " return r * x**(a-1) * (1 - x)**(b-1)\n", "\n", "f0 = lambda x: p(x, 1, 1)\n", "f1 = lambda x: p(x, 9, 9)\n", "grid = np.linspace(0, 1, 50)\n", "\n", "fig, axes = plt.subplots(2, figsize=(10, 8))\n", "\n", "axes.set_title(\"Original Distributions\")\n", "axes.plot(grid, f0(grid), lw=2, label=\"$f_0$\")\n", "axes.plot(grid, f1(grid), lw=2, label=\"$f_1$\")\n", "\n", "axes.set_title(\"Mixtures\")\n", "for π in 0.25, 0.5, 0.75:\n", " y = π * f0(grid) + (1 - π) * f1(grid)\n", " axes.plot(y, lw=2, label=f\"$\\pi_k$ = {π}\")\n", "\n", "for ax in axes:\n", " ax.legend()\n", " ax.set(xlabel=\"$z$ values\", ylabel=\"probability of $z_k$\")\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Losses and Costs\n", "\n", "After observing $z_k, z_{k-1}, \\ldots, z_0$, the decision-maker\n", "chooses among three distinct actions:\n", "\n", "- He decides that $f = f_0$ and draws no more $z$’s \n", "- He decides that $f = f_1$ and draws no more $z$’s \n", "- He postpones deciding now and instead chooses to draw a\n", " $z_{k+1}$ \n", "\n", "\n", "Associated with these three actions, the decision-maker can suffer three\n", "kinds of losses:\n", "\n", "- A loss $L_0$ if he decides $f = f_0$ when actually\n", " $f=f_1$ \n", "- A loss $L_1$ if he decides $f = f_1$ when actually\n", " $f=f_0$ \n", "- A cost $c$ if he postpones deciding and chooses instead to draw\n", " another $z$ " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Digression on Type I and Type II Errors\n", "\n", "If we regard $f=f_0$ as a null hypothesis and $f=f_1$ as an alternative hypothesis,\n", "then $L_1$ and $L_0$ are losses associated with two types of statistical errors\n", "\n", "- a type I error is an incorrect rejection of a true null hypothesis (a “false positive”) \n", "- a type II error is a failure to reject a false null hypothesis (a “false negative”) \n", "\n", "\n", "So when we treat $f=f_0$ as the null hypothesis\n", "\n", "- We can think of $L_1$ as the loss associated with a type I\n", " error. \n", "- We can think of $L_0$ as the loss associated with a type II\n", " error. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intuition\n", "\n", "Let’s try to guess what an optimal decision rule might look like before we go further.\n", "\n", "Suppose at some given point in time that $\\pi$ is close to 1.\n", "\n", "Then our prior beliefs and the evidence so far point strongly to $f = f_0$.\n", "\n", "If, on the other hand, $\\pi$ is close to 0, then $f = f_1$ is strongly favored.\n", "\n", "Finally, if $\\pi$ is in the middle of the interval $[0, 1]$, then we have little information in either direction.\n", "\n", "This reasoning suggests a decision rule such as the one shown in the figure\n", "\n", "![https://python.quantecon.org/_static/lecture_specific/wald_friedman/wald_dec_rule.png](https://python.quantecon.org/_static/lecture_specific/wald_friedman/wald_dec_rule.png)\n", "\n", " \n", "As we’ll see, this is indeed the correct form of the decision rule.\n", "\n", "The key problem is to determine the threshold values $\\alpha, \\beta$,\n", "which will depend on the parameters listed above.\n", "\n", "You might like to pause at this point and try to predict the impact of a\n", "parameter such as $c$ or $L_0$ on $\\alpha$ or $\\beta$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Bellman Equation\n", "\n", "Let $J(\\pi)$ be the total loss for a decision-maker with current belief $\\pi$ who chooses optimally.\n", "\n", "With some thought, you will agree that $J$ should satisfy the Bellman equation\n", "\n", "\n", "\n", "$$\n", "J(\\pi) =\n", " \\min\n", " \\left\\{\n", " (1-\\pi) L_0, \\; \\pi L_1, \\;\n", " c + \\mathbb E [ J (\\pi') ]\n", " \\right\\} \\tag{1}\n", "$$\n", "\n", "where $\\pi'$ is the random variable defined by Bayes’ Law\n", "\n", "$$\n", "\\pi' = \\kappa(z', \\pi) = \\frac{ \\pi f_0(z')}{ \\pi f_0(z') + (1-\\pi) f_1 (z') }\n", "$$\n", "\n", "when $\\pi$ is fixed and $z'$ is drawn from the current best guess, which is the distribution $f$ defined by\n", "\n", "$$\n", "f_{\\pi}(v) = \\pi f_0(v) + (1-\\pi) f_1 (v)\n", "$$\n", "\n", "In the Bellman equation, minimization is over three actions:\n", "\n", "1. Accept the hypothesis that $f = f_0$ \n", "1. Accept the hypothesis that $f = f_1$ \n", "1. Postpone deciding and draw again \n", "\n", "\n", "We can represent the Bellman equation as\n", "\n", "\n", "\n", "$$\n", "J(\\pi) =\n", "\\min \\left\\{ (1-\\pi) L_0, \\; \\pi L_1, \\; h(\\pi) \\right\\} \\tag{2}\n", "$$\n", "\n", "where $\\pi \\in [0,1]$ and\n", "\n", "- $(1-\\pi) L_0$ is the expected loss associated with accepting\n", " $f_0$ (i.e., the cost of making a type II error). \n", "- $\\pi L_1$ is the expected loss associated with accepting\n", " $f_1$ (i.e., the cost of making a type I error). \n", "- $h(\\pi) := c + \\mathbb E [J(\\pi')]$ the continuation value; i.e.,\n", " the expected cost associated with drawing one more $z$. \n", "\n", "\n", "The optimal decision rule is characterized by two numbers $\\alpha, \\beta \\in (0,1) \\times (0,1)$ that satisfy\n", "\n", "$$\n", "(1- \\pi) L_0 < \\min \\{ \\pi L_1, c + \\mathbb E [J(\\pi')] \\} \\textrm { if } \\pi \\geq \\alpha\n", "$$\n", "\n", "and\n", "\n", "$$\n", "\\pi L_1 < \\min \\{ (1-\\pi) L_0, c + \\mathbb E [J(\\pi')] \\} \\textrm { if } \\pi \\leq \\beta\n", "$$\n", "\n", "The optimal decision rule is then\n", "\n", "\n", "\\begin{aligned}\n", "\\textrm { accept } f=f_0 \\textrm{ if } \\pi \\geq \\alpha \\\\\n", "\\textrm { accept } f=f_1 \\textrm{ if } \\pi \\leq \\beta \\\\\n", "\\textrm { draw another } z \\textrm{ if } \\beta \\leq \\pi \\leq \\alpha\n", "\\end{aligned}\n", "\n", "\n", "Our aim is to compute the value function $J$, and from it the associated cutoffs $\\alpha$\n", "and $\\beta$.\n", "\n", "To make our computations simpler, using [(38.2)](#equation-optdec), we can write the continuation value $h(\\pi)$ as\n", "\n", "\n", "\n", "\n", "\\begin{aligned}\n", "h(\\pi) &= c + \\mathbb E [J(\\pi')] \\\\\n", "&= c + \\mathbb E_{\\pi'} \\min \\{ (1 - \\pi') L_0, \\pi' L_1, h(\\pi') \\} \\\\\n", "&= c + \\int \\min \\{ (1 - \\kappa(z', \\pi) ) L_0, \\kappa(z', \\pi) L_1, h(\\kappa(z', \\pi) ) \\} f_\\pi (z') dz'\n", "\\end{aligned} \\tag{3}\n", "\n", "\n", "The equality\n", "\n", "\n", "\n", "$$\n", "h(\\pi) =\n", "c + \\int \\min \\{ (1 - \\kappa(z', \\pi) ) L_0, \\kappa(z', \\pi) L_1, h(\\kappa(z', \\pi) ) \\} f_\\pi (z') dz' \\tag{4}\n", "$$\n", "\n", "can be understood as a functional equation, where $h$ is the unknown.\n", "\n", "Using the functional equation, [(38.4)](#equation-funceq), for the continuation value, we can back out\n", "optimal choices using the right side of [(38.2)](#equation-optdec).\n", "\n", "This functional equation can be solved by taking an initial guess and iterating\n", "to find a fixed point.\n", "\n", "Thus, we iterate with an operator $Q$, where\n", "\n", "$$\n", "Q h(\\pi) =\n", "c + \\int \\min \\{ (1 - \\kappa(z', \\pi) ) L_0, \\kappa(z', \\pi) L_1, h(\\kappa(z', \\pi) ) \\} f_\\pi (z') dz'\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation\n", "\n", "First, we will construct a jitclass to store the parameters of the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "wf_data = [('a0', float64), # Parameters of beta distributions\n", " ('b0', float64),\n", " ('a1', float64),\n", " ('b1', float64),\n", " ('c', float64), # Cost of another draw\n", " ('π_grid_size', int64),\n", " ('L0', float64), # Cost of selecting f0 when f1 is true\n", " ('L1', float64), # Cost of selecting f1 when f0 is true\n", " ('π_grid', float64[:]),\n", " ('mc_size', int64),\n", " ('z0', float64[:]),\n", " ('z1', float64[:])]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@jitclass(wf_data)\n", "class WaldFriedman:\n", "\n", " def __init__(self,\n", " c=1.25,\n", " a0=1,\n", " b0=1,\n", " a1=3,\n", " b1=1.2,\n", " L0=25,\n", " L1=25,\n", " π_grid_size=200,\n", " mc_size=1000):\n", "\n", " self.a0, self.b0 = a0, b0\n", " self.a1, self.b1 = a1, b1\n", " self.c, self.π_grid_size = c, π_grid_size\n", " self.L0, self.L1 = L0, L1\n", " self.π_grid = np.linspace(0, 1, π_grid_size)\n", " self.mc_size = mc_size\n", "\n", " self.z0 = np.random.beta(a0, b0, mc_size)\n", " self.z1 = np.random.beta(a1, b1, mc_size)\n", "\n", " def f0(self, x):\n", "\n", " return p(x, self.a0, self.b0)\n", "\n", " def f1(self, x):\n", "\n", " return p(x, self.a1, self.b1)\n", "\n", " def f0_rvs(self):\n", " return np.random.beta(self.a0, self.b0)\n", "\n", " def f1_rvs(self):\n", " return np.random.beta(self.a1, self.b1)\n", "\n", " def κ(self, z, π):\n", " \"\"\"\n", " Updates π using Bayes' rule and the current observation z\n", " \"\"\"\n", "\n", " f0, f1 = self.f0, self.f1\n", "\n", " π_f0, π_f1 = π * f0(z), (1 - π) * f1(z)\n", " π_new = π_f0 / (π_f0 + π_f1)\n", "\n", " return π_new" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the [optimal growth lecture](https://python.quantecon.org/optgrowth.html), to approximate a continuous value function\n", "\n", "- We iterate at a finite grid of possible values of $\\pi$. \n", "- When we evaluate $\\mathbb E[J(\\pi')]$ between grid points, we use linear interpolation. \n", "\n", "\n", "We define the operator function Q below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@jit(nopython=True, parallel=True)\n", "def Q(h, wf):\n", "\n", " c, π_grid = wf.c, wf.π_grid\n", " L0, L1 = wf.L0, wf.L1\n", " z0, z1 = wf.z0, wf.z1\n", " mc_size = wf.mc_size\n", "\n", " κ = wf.κ\n", "\n", " h_new = np.empty_like(π_grid)\n", " h_func = lambda p: interp(π_grid, h, p)\n", "\n", " for i in prange(len(π_grid)):\n", " π = π_grid[i]\n", "\n", " # Find the expected value of J by integrating over z\n", " integral_f0, integral_f1 = 0, 0\n", " for m in range(mc_size):\n", " π_0 = κ(z0[m], π) # Draw z from f0 and update π\n", " integral_f0 += min((1 - π_0) * L0, π_0 * L1, h_func(π_0))\n", "\n", " π_1 = κ(z1[m], π) # Draw z from f1 and update π\n", " integral_f1 += min((1 - π_1) * L0, π_1 * L1, h_func(π_1))\n", "\n", " integral = (π * integral_f0 + (1 - π) * integral_f1) / mc_size\n", "\n", " h_new[i] = c + integral\n", "\n", " return h_new" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To solve the model, we will iterate using Q to find the fixed point" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@jit(nopython=True)\n", "def solve_model(wf, tol=1e-4, max_iter=1000):\n", " \"\"\"\n", " Compute the continuation value function\n", "\n", " * wf is an instance of WaldFriedman\n", " \"\"\"\n", "\n", " # Set up loop\n", " h = np.zeros(len(wf.π_grid))\n", " i = 0\n", " error = tol + 1\n", "\n", " while i < max_iter and error > tol:\n", " h_new = Q(h, wf)\n", " error = np.max(np.abs(h - h_new))\n", " i += 1\n", " h = h_new\n", "\n", " if i == max_iter:\n", " print(\"Failed to converge!\")\n", "\n", " return h_new" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis\n", "\n", "Let’s inspect outcomes.\n", "\n", "We will be using the default parameterization with distributions like so" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "wf = WaldFriedman()\n", "\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "ax.plot(wf.f0(wf.π_grid), label=\"$f_0$\")\n", "ax.plot(wf.f1(wf.π_grid), label=\"$f_1$\")\n", "ax.set(ylabel=\"probability of $z_k$\", xlabel=\"$z_k$\", title=\"Distributions\")\n", "ax.legend()\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Value Function\n", "\n", "To solve the model, we will call our solve_model function" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "h_star = solve_model(wf) # Solve the model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also set up a function to compute the cutoffs $\\alpha$ and $\\beta$\n", "and plot these on our value function plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "@jit(nopython=True)\n", "def find_cutoff_rule(wf, h):\n", "\n", " \"\"\"\n", " This function takes a continuation value function and returns the\n", " corresponding cutoffs of where you transition between continuing and\n", " choosing a specific model\n", " \"\"\"\n", "\n", " π_grid = wf.π_grid\n", " L0, L1 = wf.L0, wf.L1\n", "\n", " # Evaluate cost at all points on grid for choosing a model\n", " payoff_f0 = (1 - π_grid) * L0\n", " payoff_f1 = π_grid * L1\n", "\n", " # The cutoff points can be found by differencing these costs with\n", " # The Bellman equation (J is always less than or equal to p_c_i)\n", " β = π_grid[np.searchsorted(\n", " payoff_f1 - np.minimum(h, payoff_f0),\n", " 1e-10)\n", " - 1]\n", " α = π_grid[np.searchsorted(\n", " np.minimum(h, payoff_f1) - payoff_f0,\n", " 1e-10)\n", " - 1]\n", "\n", " return (β, α)\n", "\n", "β, α = find_cutoff_rule(wf, h_star)\n", "cost_L0 = (1 - wf.π_grid) * wf.L0\n", "cost_L1 = wf.π_grid * wf.L1\n", "\n", "fig, ax = plt.subplots(figsize=(10, 6))\n", "\n", "ax.plot(wf.π_grid, h_star, label='continuation value')\n", "ax.plot(wf.π_grid, cost_L1, label='choose f1')\n", "ax.plot(wf.π_grid, cost_L0, label='choose f0')\n", "ax.plot(wf.π_grid,\n", " np.amin(np.column_stack([h_star, cost_L0, cost_L1]),axis=1),\n", " lw=15, alpha=0.1, color='b', label='minimum cost')\n", "\n", "ax.annotate(r\"$\\beta$\", xy=(β + 0.01, 0.5), fontsize=14)\n", "ax.annotate(r\"$\\alpha$\", xy=(α + 0.01, 0.5), fontsize=14)\n", "\n", "plt.vlines(β, 0, β * wf.L0, linestyle=\"--\")\n", "plt.vlines(α, 0, (1 - α) * wf.L1, linestyle=\"--\")\n", "\n", "ax.set(xlim=(0, 1), ylim=(0, 0.5 * max(wf.L0, wf.L1)), ylabel=\"cost\",\n", " xlabel=\"$\\pi$\", title=\"Value function\")\n", "\n", "plt.legend(borderpad=1.1)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The value function equals $\\pi L_1$ for $\\pi \\leq \\beta$, and $(1-\\pi )L_0$ for $\\pi\n", "\\geq \\alpha$.\n", "\n", "The slopes of the two linear pieces of the value function are determined by $L_1$\n", "and $- L_0$.\n", "\n", "The value function is smooth in the interior region, where the posterior\n", "probability assigned to $f_0$ is in the indecisive region $\\pi \\in (\\beta, \\alpha)$.\n", "\n", "The decision-maker continues to sample until the probability that he attaches to\n", "model $f_0$ falls below $\\beta$ or above $\\alpha$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulations\n", "\n", "The next figure shows the outcomes of 500 simulations of the decision process.\n", "\n", "On the left is a histogram of the stopping times, which equal the number of draws of $z_k$ required to make a decision.\n", "\n", "The average number of draws is around 6.6.\n", "\n", "On the right is the fraction of correct decisions at the stopping time.\n", "\n", "In this case, the decision-maker is correct 80% of the time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "def simulate(wf, true_dist, h_star, π_0=0.5):\n", "\n", " \"\"\"\n", " This function takes an initial condition and simulates until it\n", " stops (when a decision is made)\n", " \"\"\"\n", "\n", " f0, f1 = wf.f0, wf.f1\n", " f0_rvs, f1_rvs = wf.f0_rvs, wf.f1_rvs\n", " π_grid = wf.π_grid\n", " κ = wf.κ\n", "\n", " if true_dist == \"f0\":\n", " f, f_rvs = wf.f0, wf.f0_rvs\n", " elif true_dist == \"f1\":\n", " f, f_rvs = wf.f1, wf.f1_rvs\n", "\n", " # Find cutoffs\n", " β, α = find_cutoff_rule(wf, h_star)\n", "\n", " # Initialize a couple of useful variables\n", " decision_made = False\n", " π = π_0\n", " t = 0\n", "\n", " while decision_made is False:\n", " # Maybe should specify which distribution is correct one so that\n", " # the draws come from the \"right\" distribution\n", " z = f_rvs()\n", " t = t + 1\n", " π = κ(z, π)\n", " if π < β:\n", " decision_made = True\n", " decision = 1\n", " elif π > α:\n", " decision_made = True\n", " decision = 0\n", "\n", " if true_dist == \"f0\":\n", " if decision == 0:\n", " correct = True\n", " else:\n", " correct = False\n", "\n", " elif true_dist == \"f1\":\n", " if decision == 1:\n", " correct = True\n", " else:\n", " correct = False\n", "\n", " return correct, π, t\n", "\n", "def stopping_dist(wf, h_star, ndraws=250, true_dist=\"f0\"):\n", "\n", " \"\"\"\n", " Simulates repeatedly to get distributions of time needed to make a\n", " decision and how often they are correct\n", " \"\"\"\n", "\n", " tdist = np.empty(ndraws, int)\n", " cdist = np.empty(ndraws, bool)\n", "\n", " for i in range(ndraws):\n", " correct, π, t = simulate(wf, true_dist, h_star)\n", " tdist[i] = t\n", " cdist[i] = correct\n", "\n", " return cdist, tdist\n", "\n", "def simulation_plot(wf):\n", " h_star = solve_model(wf)\n", " ndraws = 500\n", " cdist, tdist = stopping_dist(wf, h_star, ndraws)\n", "\n", " fig, ax = plt.subplots(1, 2, figsize=(16, 5))\n", "\n", " ax.hist(tdist, bins=np.max(tdist))\n", " ax.set_title(f\"Stopping times over {ndraws} replications\")\n", " ax.set(xlabel=\"time\", ylabel=\"number of stops\")\n", " ax.annotate(f\"mean = {np.mean(tdist)}\", xy=(max(tdist) / 2,\n", " max(np.histogram(tdist, bins=max(tdist))) / 2))\n", "\n", " ax.hist(cdist.astype(int), bins=2)\n", " ax.set_title(f\"Correct decisions over {ndraws} replications\")\n", " ax.annotate(f\"% correct = {np.mean(cdist)}\",\n", " xy=(0.05, ndraws / 2))\n", "\n", " plt.show()\n", "\n", "simulation_plot(wf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparative Statics\n", "\n", "Now let’s consider the following exercise.\n", "\n", "We double the cost of drawing an additional observation.\n", "\n", "Before you look, think about what will happen:\n", "\n", "- Will the decision-maker be correct more or less often? \n", "- Will he make decisions sooner or later? " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "hide-output": false }, "outputs": [], "source": [ "wf = WaldFriedman(c=2.5)\n", "simulation_plot(wf)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Increased cost per draw has induced the decision-maker to take fewer draws before deciding.\n", "\n", "Because he decides with fewer draws, the percentage of time he is correct drops.\n", "\n", "This leads to him having a higher expected loss when he puts equal weight on both models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Notebook Implementation\n", "\n", "To facilitate comparative statics, we provide\n", "a [Jupyter notebook](https://nbviewer.jupyter.org/github/QuantEcon/lecture-python-advanced.notebooks/blob/master/wald_friedman.ipynb) that\n", "generates the same plots, but with sliders.\n", "\n", "With these sliders, you can adjust parameters and immediately observe\n", "\n", "- effects on the smoothness of the value function in the indecisive middle range\n", " as we increase the number of grid points in the piecewise linear approximation. \n", "- effects of different settings for the cost parameters $L_0, L_1, c$, the\n", " parameters of two beta distributions $f_0$ and $f_1$, and the number\n", " of points and linear functions $m$ to use in the piece-wise continuous approximation to the value function. \n", "- various simulations from $f_0$ and associated distributions of waiting times to making a decision. \n", "- associated histograms of correct and incorrect decisions. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison with Neyman-Pearson Formulation\n", "\n", "For several reasons, it is useful to describe the theory underlying the test\n", "that Navy Captain G. S. Schuyler had been told to use and that led him\n", "to approach Milton Friedman and Allan Wallis to convey his conjecture\n", "that superior practical procedures existed.\n", "\n", "Evidently, the Navy had told\n", "Captail Schuyler to use what it knew to be a state-of-the-art\n", "Neyman-Pearson test.\n", "\n", "We’ll rely on Abraham Wald’s [[Wal47](https://python.quantecon.org/zreferences.html#id69)] elegant summary of Neyman-Pearson theory.\n", "\n", "For our purposes, watch for there features of the setup:\n", "\n", "- the assumption of a *fixed* sample size $n$ \n", "- the application of laws of large numbers, conditioned on alternative\n", " probability models, to interpret the probabilities $\\alpha$ and\n", " $\\beta$ defined in the Neyman-Pearson theory \n", "\n", "\n", "Recall that in the sequential analytic formulation above, that\n", "\n", "- The sample size $n$ is not fixed but rather an object to be\n", " chosen; technically $n$ is a random variable. \n", "- The parameters $\\beta$ and $\\alpha$ characterize cut-off\n", " rules used to determine $n$ as a random variable. \n", "- Laws of large numbers make no appearances in the sequential\n", " construction. \n", "\n", "\n", "In chapter 1 of **Sequential Analysis** [[Wal47](https://python.quantecon.org/zreferences.html#id69)] Abraham Wald summarizes the\n", "Neyman-Pearson approach to hypothesis testing.\n", "\n", "Wald frames the problem as making a decision about a probability\n", "distribution that is partially known.\n", "\n", "(You have to assume that *something* is already known in order to state a well-posed\n", "problem – usually, *something* means *a lot*)\n", "\n", "By limiting what is unknown, Wald uses the following simple structure\n", "to illustrate the main ideas:\n", "\n", "- A decision-maker wants to decide which of two distributions\n", " $f_0$, $f_1$ govern an IID random variable $z$. \n", "- The null hypothesis $H_0$ is the statement that $f_0$\n", " governs the data. \n", "- The alternative hypothesis $H_1$ is the statement that\n", " $f_1$ governs the data. \n", "- The problem is to devise and analyze a test of hypothesis\n", " $H_0$ against the alternative hypothesis $H_1$ on the\n", " basis of a sample of a fixed number $n$ independent\n", " observations $z_1, z_2, \\ldots, z_n$ of the random variable\n", " $z$. \n", "\n", "\n", "To quote Abraham Wald,\n", "\n", "> A test procedure leading to the acceptance or rejection of the [null]\n", "hypothesis in question is simply a rule specifying, for each possible\n", "sample of size $n$, whether the [null] hypothesis should be accepted\n", "or rejected on the basis of the sample. This may also be expressed as\n", "follows: A test procedure is simply a subdivision of the totality of\n", "all possible samples of size $n$ into two mutually exclusive\n", "parts, say part 1 and part 2, together with the application of the\n", "rule that the [null] hypothesis be accepted if the observed sample is\n", "contained in part 2. Part 1 is also called the critical region. Since\n", "part 2 is the totality of all samples of size $n$ which are not\n", "included in part 1, part 2 is uniquely determined by part 1. Thus,\n", "choosing a test procedure is equivalent to determining a critical\n", "region.\n", "\n", "\n", "Let’s listen to Wald longer:\n", "\n", "> As a basis for choosing among critical regions the following\n", "considerations have been advanced by Neyman and Pearson: In accepting\n", "or rejecting $H_0$ we may commit errors of two kinds. We commit\n", "an error of the first kind if we reject $H_0$ when it is true;\n", "we commit an error of the second kind if we accept $H_0$ when\n", "$H_1$ is true. After a particular critical region $W$ has\n", "been chosen, the probability of committing an error of the first\n", "kind, as well as the probability of committing an error of the second\n", "kind is uniquely determined. The probability of committing an error\n", "of the first kind is equal to the probability, determined by the\n", "assumption that $H_0$ is true, that the observed sample will be\n", "included in the critical region $W$. The probability of\n", "committing an error of the second kind is equal to the probability,\n", "determined on the assumption that $H_1$ is true, that the\n", "probability will fall outside the critical region $W$. For any\n", "given critical region $W$ we shall denote the probability of an\n", "error of the first kind by $\\alpha$ and the probability of an\n", "error of the second kind by $\\beta$.\n", "\n", "\n", "Let’s listen carefully to how Wald applies law of large numbers to\n", "interpret $\\alpha$ and $\\beta$:\n", "\n", "> The probabilities $\\alpha$ and $\\beta$ have the\n", "following important practical interpretation: Suppose that we draw a\n", "large number of samples of size $n$. Let $M$ be the\n", "number of such samples drawn. Suppose that for each of these\n", "$M$ samples we reject $H_0$ if the sample is included in\n", "$W$ and accept $H_0$ if the sample lies outside\n", "$W$. In this way we make $M$ statements of rejection or\n", "acceptance. Some of these statements will in general be wrong. If\n", "$H_0$ is true and if $M$ is large, the probability is\n", "nearly $1$ (i.e., it is practically certain) that the\n", "proportion of wrong statements (i.e., the number of wrong statements\n", "divided by $M$) will be approximately $\\alpha$. If\n", "$H_1$ is true, the probability is nearly $1$ that the\n", "proportion of wrong statements will be approximately $\\beta$.\n", "Thus, we can say that in the long run [ here Wald applies law of\n", "large numbers by driving $M \\rightarrow \\infty$ (our comment,\n", "not Wald’s) ] the proportion of wrong statements will be\n", "$\\alpha$ if $H_0$is true and $\\beta$ if\n", "$H_1$ is true.\n", "\n", "\n", "The quantity $\\alpha$ is called the *size* of the critical region,\n", "and the quantity $1-\\beta$ is called the *power* of the critical\n", "region.\n", "\n", "Wald notes that\n", "\n", "> one critical region $W$ is more desirable than another if it\n", "has smaller values of $\\alpha$ and $\\beta$. Although\n", "either $\\alpha$ or $\\beta$ can be made arbitrarily small\n", "by a proper choice of the critical region $W$, it is possible\n", "to make both $\\alpha$ and $\\beta$ arbitrarily small for a\n", "fixed value of $n$, i.e., a fixed sample size.\n", "\n", "\n", "Wald summarizes Neyman and Pearson’s setup as follows:\n", "\n", "> Neyman and Pearson show that a region consisting of all samples\n", "$(z_1, z_2, \\ldots, z_n)$ which satisfy the inequality\n", "\n", "$$\n", "\\frac{ f_1(z_1) \\cdots f_1(z_n)}{f_0(z_1) \\cdots f_0(z_n)} \\geq k\n", "$$\n", "\n", "is a most powerful critical region for testing the hypothesis\n", "$H_0$ against the alternative hypothesis $H_1$. The term\n", "$k$ on the right side is a constant chosen so that the region\n", "will have the required size $\\alpha$.\n", "\n", "\n", "Wald goes on to discuss Neyman and Pearson’s concept of *uniformly most\n", "powerful* test.\n", "\n", "Here is how Wald introduces the notion of a sequential test\n", "\n", "> A rule is given for making one of the following three decisions at any stage of\n", "the experiment (at the m th trial for each integral value of m ): (1) to\n", "accept the hypothesis H , (2) to reject the hypothesis H , (3) to\n", "continue the experiment by making an additional observation. Thus, such\n", "a test procedure is carried out sequentially. On the basis of the first\n", "observation, one of the aforementioned decision is made. If the first or\n", "second decision is made, the process is terminated. If the third\n", "decision is made, a second trial is performed. Again, on the basis of\n", "the first two observations, one of the three decision is made. If the\n", "third decision is made, a third trial is performed, and so on. The\n", "process is continued until either the first or the second decisions is\n", "made. The number n of observations required by such a test procedure is\n", "a random variable, since the value of n depends on the outcome of the\n", "observations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sequels\n", "\n", "We’ll dig deeper into some of the ideas used here in the following lectures:\n", "\n", "- [this lecture](https://python.quantecon.org/exchangeable.html) discusses the key concept of **exchangeability** that rationalizes statistical learning \n", "- [this lecture](https://python.quantecon.org/likelihood_ratio_process.html) describes **likelihood ratio processes** and their role in frequentist and Bayesian statistical theories \n", "- [this lecture](https://python.quantecon.org/likelihood_bayes.html) discusses the role of likelihood ratio processes in **Bayesian learning** \n", "- [this lecture](https://python.quantecon.org/navy_captain.html) returns to the subject of this lecture and studies whether the Captain’s hunch that the (frequentist) decision rule\n", " that the Navy had ordered him to use can be expected to be better or worse than the rule sequential rule that Abraham Wald designed \n", "\n", "\n", "

 The decision maker acts as if he believes that the sequence of random variables\n", "$[z_{0}, z_{1}, \\ldots]$ is *exchangeable*. See [Exchangeability and Bayesian Updating](https://python.quantecon.org/exchangeable.html) and\n", "[[Kre88](https://python.quantecon.org/zreferences.html#id67)] chapter 11, for discussions of exchangeability." ] } ], "metadata": { "date": 1619590837.9580243, "filename": "wald_friedman.md", "kernelspec": { "display_name": "Python", "language": "python3", "name": "python3" }, "title": "A Problem that Stumped Milton Friedman" }, "nbformat": 4, "nbformat_minor": 4 }