{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "c2198911",
   "metadata": {},
   "source": [
    "\n",
    "<a id='likelihood-ratio-process'></a>\n",
    "<div id=\"qe-notebook-header\" align=\"right\" style=\"text-align:right;\">\n",
    "        <a href=\"https://quantecon.org/\" title=\"quantecon.org\">\n",
    "                <img style=\"width:250px;display:inline;\" width=\"250px\" src=\"https://assets.quantecon.org/img/qe-menubar-logo.svg\" alt=\"QuantEcon\">\n",
    "        </a>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e8dd397f",
   "metadata": {},
   "source": [
    "# Likelihood Ratio Processes"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3cab188a",
   "metadata": {},
   "source": [
    "## Contents\n",
    "\n",
    "- [Likelihood Ratio Processes](#Likelihood-Ratio-Processes)  \n",
    "  - [Overview](#Overview)  \n",
    "  - [Likelihood Ratio Process](#Likelihood-Ratio-Process)  \n",
    "  - [Nature Permanently Draws from Density g](#Nature-Permanently-Draws-from-Density-g)  \n",
    "  - [Peculiar Property](#Peculiar-Property)  \n",
    "  - [Nature Permanently Draws from Density f](#Nature-Permanently-Draws-from-Density-f)  \n",
    "  - [Likelihood Ratio Test](#Likelihood-Ratio-Test)  \n",
    "  - [Kullback–Leibler Divergence](#Kullback–Leibler-Divergence)  \n",
    "  - [Sequels](#Sequels)  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e90bace",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This lecture describes likelihood ratio processes and some of their uses.\n",
    "\n",
    "We’ll use a setting described in [this lecture](https://python.quantecon.org/exchangeable.html).\n",
    "\n",
    "Among  things that we’ll learn  are\n",
    "\n",
    "- A peculiar property of likelihood ratio processes  \n",
    "- How a likelihood ratio process is a key ingredient in frequentist hypothesis testing  \n",
    "- How a **receiver operator characteristic curve** summarizes information about a false alarm probability and power in frequentist hypothesis testing  \n",
    "- How during World War II the United States Navy devised a decision rule that Captain Garret L. Schyler challenged, a topic to be studied in  [this lecture](https://python.quantecon.org/wald_friedman.html)  \n",
    "\n",
    "\n",
    "Let’s start  by importing some Python tools."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "770ef87d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "plt.rcParams[\"figure.figsize\"] = (11, 5)  #set default figure size\n",
    "import numpy as np\n",
    "from numba import vectorize, jit\n",
    "from math import gamma\n",
    "from scipy.integrate import quad"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "287d4e1f",
   "metadata": {},
   "source": [
    "## Likelihood Ratio Process\n",
    "\n",
    "A nonnegative random variable $ W $ has one of two probability density functions, either\n",
    "$ f $ or $ g $.\n",
    "\n",
    "Before the beginning of time, nature once and for all decides whether she will draw a sequence of IID draws from either\n",
    "$ f $ or $ g $.\n",
    "\n",
    "We will sometimes let $ q $ be the density that nature chose once and for all, so\n",
    "that $ q $ is either $ f $ or $ g $, permanently.\n",
    "\n",
    "Nature knows which density it permanently draws from, but we the observers do not.\n",
    "\n",
    "We know both $ f $ and $ g $ but we don’t know which density nature\n",
    "chose.\n",
    "\n",
    "But we want to know.\n",
    "\n",
    "To do that, we use observations.\n",
    "\n",
    "We observe a sequence $ \\{w_t\\}_{t=1}^T $ of $ T $ IID draws that we know came from either $ f $ or $ g $.\n",
    "\n",
    "We want to use these observations to infer whether nature chose $ f $ or $ g $.\n",
    "\n",
    "A **likelihood ratio process** is a useful tool for this task.\n",
    "\n",
    "To begin, we define key component of a likelihood ratio process, namely, the time $ t $ likelihood ratio  as the random variable\n",
    "\n",
    "$$\n",
    "\\ell (w_t)=\\frac{f\\left(w_t\\right)}{g\\left(w_t\\right)},\\quad t\\geq1.\n",
    "$$\n",
    "\n",
    "We assume that $ f $ and $ g $ both put positive probabilities on the\n",
    "same intervals of possible realizations of the random variable $ W $.\n",
    "\n",
    "That means that under the $ g $ density,  $ \\ell (w_t)=\n",
    "\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)} $\n",
    "is a nonnegative  random variable with mean $ 1 $.\n",
    "\n",
    "A **likelihood ratio process** for sequence\n",
    "$ \\left\\{ w_{t}\\right\\} _{t=1}^{\\infty} $ is defined as\n",
    "\n",
    "$$\n",
    "L\\left(w^{t}\\right)=\\prod_{i=1}^{t} \\ell (w_i),\n",
    "$$\n",
    "\n",
    "where $ w^t=\\{ w_1,\\dots,w_t\\} $ is a history of\n",
    "observations up to and including time $ t $.\n",
    "\n",
    "Sometimes for shorthand we’ll write $ L_t =  L(w^t) $.\n",
    "\n",
    "Notice that the likelihood process satisfies the *recursion* or\n",
    "*multiplicative decomposition*\n",
    "\n",
    "$$\n",
    "L(w^t) = \\ell (w_t) L (w^{t-1}) .\n",
    "$$\n",
    "\n",
    "The likelihood ratio and its logarithm are key tools for making\n",
    "inferences using a classic frequentist approach due to Neyman and\n",
    "Pearson [[Neyman and Pearson, 1933](https://python.quantecon.org/zreferences.html#id256)].\n",
    "\n",
    "To help us appreciate how things work, the following Python code evaluates $ f $ and $ g $ as two different\n",
    "beta distributions, then computes and simulates an associated likelihood\n",
    "ratio process by generating a sequence $ w^t $ from one of the two\n",
    "probability distributionss, for example, a sequence of  IID draws from $ g $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dce97bc1",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "# Parameters in the two beta distributions.\n",
    "F_a, F_b = 1, 1\n",
    "G_a, G_b = 3, 1.2\n",
    "\n",
    "@vectorize\n",
    "def p(x, a, b):\n",
    "    r = gamma(a + b) / (gamma(a) * gamma(b))\n",
    "    return r * x** (a-1) * (1 - x) ** (b-1)\n",
    "\n",
    "# The two density functions.\n",
    "f = jit(lambda x: p(x, F_a, F_b))\n",
    "g = jit(lambda x: p(x, G_a, G_b))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "303ed120",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "@jit\n",
    "def simulate(a, b, T=50, N=500):\n",
    "    '''\n",
    "    Generate N sets of T observations of the likelihood ratio,\n",
    "    return as N x T matrix.\n",
    "\n",
    "    '''\n",
    "\n",
    "    l_arr = np.empty((N, T))\n",
    "\n",
    "    for i in range(N):\n",
    "\n",
    "        for j in range(T):\n",
    "            w = np.random.beta(a, b)\n",
    "            l_arr[i, j] = f(w) / g(w)\n",
    "\n",
    "    return l_arr"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8e6876b1",
   "metadata": {},
   "source": [
    "## Nature Permanently Draws from Density g\n",
    "\n",
    "We first simulate the likelihood ratio process when nature permanently\n",
    "draws from $ g $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acb09640",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "l_arr_g = simulate(G_a, G_b)\n",
    "l_seq_g = np.cumprod(l_arr_g, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d5ad25e",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N, T = l_arr_g.shape\n",
    "\n",
    "for i in range(N):\n",
    "\n",
    "    plt.plot(range(T), l_seq_g[i, :], color='b', lw=0.8, alpha=0.5)\n",
    "\n",
    "plt.ylim([0, 3])\n",
    "plt.title(\"$L(w^{t})$ paths\");"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ac5bb98a",
   "metadata": {},
   "source": [
    "Evidently, as sample length $ T $ grows, most probability mass\n",
    "shifts toward zero\n",
    "\n",
    "To see it this more clearly clearly, we plot over time the fraction of\n",
    "paths $ L\\left(w^{t}\\right) $ that fall in the interval\n",
    "$ \\left[0, 0.01\\right] $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cebdb56",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plt.plot(range(T), np.sum(l_seq_g <= 0.01, axis=0) / N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "815e795b",
   "metadata": {},
   "source": [
    "Despite the evident convergence of most probability mass to a\n",
    "very small interval near $ 0 $,  the unconditional mean of\n",
    "$ L\\left(w^t\\right) $ under probability density $ g $ is\n",
    "identically $ 1 $ for all $ t $.\n",
    "\n",
    "To verify this assertion, first notice that as mentioned earlier the unconditional mean\n",
    "$ E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] $ is $ 1 $ for\n",
    "all $ t $:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]  &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n",
    "    &=\\int f\\left(w_{t}\\right)dw_{t} \\\\\n",
    "    &=1,\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "which immediately implies\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "E\\left[L\\left(w^{1}\\right)\\bigm|q=g\\right]  &=E\\left[\\ell \\left(w_{1}\\right)\\bigm|q=g\\right]\\\\\n",
    "    &=1.\\\\\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Because $ L(w^t) = \\ell(w_t) L(w^{t-1}) $ and\n",
    "$ \\{w_t\\}_{t=1}^t $ is an IID sequence, we have\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]  &=E\\left[L\\left(w^{t-1}\\right)\\ell \\left(w_{t}\\right)\\bigm|q=g\\right] \\\\\n",
    "    &=E\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g,w^{t-1}\\right]\\bigm|q=g\\right] \\\\\n",
    "    &=E\\left[L\\left(w^{t-1}\\right)E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=g\\right]\\bigm|q=g\\right] \\\\\n",
    "    &=E\\left[L\\left(w^{t-1}\\right)\\bigm|q=g\\right] \\\\\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "for any $ t \\geq 1 $.\n",
    "\n",
    "Mathematical induction implies\n",
    "$ E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ for all\n",
    "$ t \\geq 1 $."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ade1fa7",
   "metadata": {},
   "source": [
    "## Peculiar Property\n",
    "\n",
    "How can $ E\\left[L\\left(w^{t}\\right)\\bigm|q=g\\right]=1 $ possibly be true when most  probability mass of the likelihood\n",
    "ratio process is piling up near $ 0 $ as\n",
    "$ t \\rightarrow + \\infty $?\n",
    "\n",
    "The answer is that as $ t \\rightarrow + \\infty $, the\n",
    "distribution of $ L_t $ becomes more and more fat-tailed:\n",
    "enough  mass shifts to larger and larger values of $ L_t $ to make\n",
    "the mean of $ L_t $ continue to be one despite most of the probability mass piling up\n",
    "near $ 0 $.\n",
    "\n",
    "To illustrate this peculiar property, we simulate many paths and\n",
    "calculate the unconditional mean of $ L\\left(w^t\\right) $ by\n",
    "averaging across these many paths at each $ t $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8246f60d",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "l_arr_g = simulate(G_a, G_b, N=50000)\n",
    "l_seq_g = np.cumprod(l_arr_g, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "772259d4",
   "metadata": {},
   "source": [
    "It would be useful to use simulations to verify that  unconditional means\n",
    "$ E\\left[L\\left(w^{t}\\right)\\right] $ equal unity by averaging across sample\n",
    "paths.\n",
    "\n",
    "But it would be too computer-time-consuming for us to that  here simply by applying a standard Monte Carlo simulation approach.\n",
    "\n",
    "The reason is that the distribution of $ L\\left(w^{t}\\right) $ is extremely skewed for large values of  $ t $.\n",
    "\n",
    "Because the probability density in the right tail is close to $ 0 $,  it just takes too much computer time to sample enough points from the right tail.\n",
    "\n",
    "We explain the problem in more detail  in [this lecture](https://python.quantecon.org/imp_sample.html).\n",
    "\n",
    "There we describe a way to an alternative way to compute the mean of a likelihood ratio by computing the mean of a *different* random variable by sampling from  a *different* probability distribution."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2daa0f0",
   "metadata": {},
   "source": [
    "## Nature Permanently Draws from Density f\n",
    "\n",
    "Now suppose that before time $ 0 $ nature permanently decided to draw repeatedly from density $ f $.\n",
    "\n",
    "While the mean of the likelihood ratio $ \\ell \\left(w_{t}\\right) $ under density\n",
    "$ g $ is $ 1 $, its mean under the density $ f $ exceeds one.\n",
    "\n",
    "To see this, we compute\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "E\\left[\\ell \\left(w_{t}\\right)\\bigm|q=f\\right]  &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}f\\left(w_{t}\\right)dw_{t} \\\\\n",
    "    &=\\int\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}\\frac{f\\left(w_{t}\\right)}{g\\left(w_{t}\\right)}g\\left(w_{t}\\right)dw_{t} \\\\\n",
    "    &=\\int \\ell \\left(w_{t}\\right)^{2}g\\left(w_{t}\\right)dw_{t} \\\\\n",
    "    &=E\\left[\\ell \\left(w_{t}\\right)^{2}\\mid q=g\\right] \\\\\n",
    "    &=E\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2}+Var\\left(\\ell \\left(w_{t}\\right)\\mid q=g\\right) \\\\\n",
    "    &>E\\left[\\ell \\left(w_{t}\\right)\\mid q=g\\right]^{2} = 1 \\\\\n",
    "       \\end{aligned}\n",
    "$$\n",
    "\n",
    "This in turn implies that the unconditional mean of the likelihood ratio process $ L(w^t) $\n",
    "diverges toward $ + \\infty $.\n",
    "\n",
    "Simulations below confirm this conclusion.\n",
    "\n",
    "Please note the scale of the $ y $ axis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f401c8b3",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "l_arr_f = simulate(F_a, F_b, N=50000)\n",
    "l_seq_f = np.cumprod(l_arr_f, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1db9f6a",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N, T = l_arr_f.shape\n",
    "plt.plot(range(T), np.mean(l_seq_f, axis=0))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b02c188a",
   "metadata": {},
   "source": [
    "We also plot the probability that $ L\\left(w^t\\right) $ falls into\n",
    "the interval $ [10000, \\infty) $ as a function of time and watch how\n",
    "fast probability mass diverges  to $ +\\infty $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "48217517",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "plt.plot(range(T), np.sum(l_seq_f > 10000, axis=0) / N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c276953f",
   "metadata": {},
   "source": [
    "## Likelihood Ratio Test\n",
    "\n",
    "We now describe how to employ the machinery\n",
    "of Neyman and Pearson [[Neyman and Pearson, 1933](https://python.quantecon.org/zreferences.html#id256)] to test the hypothesis that  history $ w^t $ is generated by repeated\n",
    "IID draws from density $ g $.\n",
    "\n",
    "Denote $ q $ as the data generating process, so that\n",
    "$ q=f \\text{ or } g $.\n",
    "\n",
    "Upon observing a sample $ \\{W_i\\}_{i=1}^t $, we want to decide\n",
    "whether nature is drawing from $ g $ or from $ f $ by performing  a (frequentist)\n",
    "hypothesis test.\n",
    "\n",
    "We specify\n",
    "\n",
    "- Null hypothesis $ H_0 $: $ q=f $,  \n",
    "- Alternative hypothesis $ H_1 $: $ q=g $.  \n",
    "\n",
    "\n",
    "Neyman and Pearson proved that the best way to test this hypothesis is to use a **likelihood ratio test** that takes the\n",
    "form:\n",
    "\n",
    "- reject $ H_0 $ if $ L(W^t) < c $,  \n",
    "- accept $ H_0 $ otherwise.  \n",
    "\n",
    "\n",
    "where $ c $ is a given  discrimination threshold, to be chosen in a way we’ll soon describe.\n",
    "\n",
    "This test is *best* in the sense that it is a **uniformly most powerful** test.\n",
    "\n",
    "To understand what this means, we have to define probabilities of two important events that\n",
    "allow us to characterize a test associated with a given\n",
    "threshold $ c $.\n",
    "\n",
    "The two probabilities are:\n",
    "\n",
    "- Probability of detection (= power = 1 minus probability\n",
    "  of Type II error):  \n",
    "  $$\n",
    "  1-\\beta \\equiv \\Pr\\left\\{ L\\left(w^{t}\\right)<c\\mid q=g\\right\\}\n",
    "  $$\n",
    "- Probability of false alarm (= significance level = probability of\n",
    "  Type I error):  \n",
    "  $$\n",
    "  \\alpha \\equiv  \\Pr\\left\\{ L\\left(w^{t}\\right)<c\\mid q=f\\right\\}\n",
    "  $$\n",
    "\n",
    "\n",
    "The [Neyman-Pearson\n",
    "Lemma](https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)\n",
    "states that among all possible tests, a likelihood ratio test\n",
    "maximizes the probability of detection for a given probability of false\n",
    "alarm.\n",
    "\n",
    "Another way to say the same thing is that  among all possible tests, a likelihood ratio test\n",
    "maximizes **power** for a given **significance level**.\n",
    "\n",
    "To have made a good inference, we want a small probability of\n",
    "false alarm and a large probability of detection.\n",
    "\n",
    "With sample size $ t $ fixed, we can change our two probabilities by\n",
    "adjusting $ c $.\n",
    "\n",
    "A troublesome “that’s life” fact is that these two probabilities  move in the same direction as we vary the critical value\n",
    "$ c $.\n",
    "\n",
    "Without specifying quantitative losses from making Type I and Type II errors, there is little that we can say\n",
    "about how we *should*  trade off probabilities of the two types of mistakes.\n",
    "\n",
    "We do know that increasing sample size $ t $ improves\n",
    "statistical inference.\n",
    "\n",
    "Below we plot some informative figures that illustrate this.\n",
    "\n",
    "We also present a classical frequentist method for choosing a sample\n",
    "size $ t $.\n",
    "\n",
    "Let’s start with a case in which we fix the threshold $ c $ at\n",
    "$ 1 $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d697bb0",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "c = 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc94aef0",
   "metadata": {},
   "source": [
    "Below we plot empirical distributions of logarithms of the cumulative\n",
    "likelihood ratios simulated above, which are generated by either\n",
    "$ f $ or $ g $.\n",
    "\n",
    "Taking logarithms has no effect on calculating the probabilities because\n",
    "the log  is a monotonic transformation.\n",
    "\n",
    "As $ t $ increases, the probabilities of making Type I and Type II\n",
    "errors both decrease, which is good.\n",
    "\n",
    "This is because most of the probability mass of log$ (L(w^t)) $\n",
    "moves toward $ -\\infty $ when $ g $ is the data generating\n",
    "process, ; while log$ (L(w^t)) $ goes to\n",
    "$ \\infty $ when data are generated by $ f $.\n",
    "\n",
    "That disparate  behavior of log$ (L(w^t)) $ under $ f $ and $ q $\n",
    "is what makes it possible to distinguish\n",
    "$ q=f $ from $ q=g $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e88b095",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "fig, axs = plt.subplots(2, 2, figsize=(12, 8))\n",
    "fig.suptitle('distribution of $log(L(w^t))$ under f or g', fontsize=15)\n",
    "\n",
    "for i, t in enumerate([1, 7, 14, 21]):\n",
    "    nr = i // 2\n",
    "    nc = i % 2\n",
    "\n",
    "    axs[nr, nc].axvline(np.log(c), color=\"k\", ls=\"--\")\n",
    "\n",
    "    hist_f, x_f = np.histogram(np.log(l_seq_f[:, t]), 200, density=True)\n",
    "    hist_g, x_g = np.histogram(np.log(l_seq_g[:, t]), 200, density=True)\n",
    "\n",
    "    axs[nr, nc].plot(x_f[1:], hist_f, label=\"dist under f\")\n",
    "    axs[nr, nc].plot(x_g[1:], hist_g, label=\"dist under g\")\n",
    "\n",
    "    for i, (x, hist, label) in enumerate(zip([x_f, x_g], [hist_f, hist_g], [\"Type I error\", \"Type II error\"])):\n",
    "        ind = x[1:] <= np.log(c) if i == 0 else x[1:] > np.log(c)\n",
    "        axs[nr, nc].fill_between(x[1:][ind], hist[ind], alpha=0.5, label=label)\n",
    "\n",
    "    axs[nr, nc].legend()\n",
    "    axs[nr, nc].set_title(f\"t={t}\")\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec0d9b2d",
   "metadata": {},
   "source": [
    "When we hold $ c $ fixed at $ c=1 $, the following graph shows  that\n",
    "\n",
    "- the probability of detection monotonically increases with increases in\n",
    "  $ t $  \n",
    "- the probability of a false alarm monotonically decreases with increases in $ t $.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b1a9abb",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "PD = np.empty(T)\n",
    "PFA = np.empty(T)\n",
    "\n",
    "for t in range(T):\n",
    "    PD[t] = np.sum(l_seq_g[:, t] < c) / N\n",
    "    PFA[t] = np.sum(l_seq_f[:, t] < c) / N\n",
    "\n",
    "plt.plot(range(T), PD, label=\"Probability of detection\")\n",
    "plt.plot(range(T), PFA, label=\"Probability of false alarm\")\n",
    "plt.xlabel(\"t\")\n",
    "plt.title(\"$c=1$\")\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51fa81b5",
   "metadata": {},
   "source": [
    "For a given sample size $ t $,  the threshold $ c $ uniquely pins down  probabilities\n",
    "of both types of error.\n",
    "\n",
    "If for a fixed $ t $ we now free up and move $ c $, we will sweep out the probability\n",
    "of detection as a function of the probability of false alarm.\n",
    "\n",
    "This produces  a [receiver operating characteristic\n",
    "curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).\n",
    "\n",
    "Below, we plot receiver operating characteristic curves for different\n",
    "sample sizes $ t $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f37d33b9",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "PFA = np.arange(0, 100, 1)\n",
    "\n",
    "for t in range(1, 15, 4):\n",
    "    percentile = np.percentile(l_seq_f[:, t], PFA)\n",
    "    PD = [np.sum(l_seq_g[:, t] < p) / N for p in percentile]\n",
    "\n",
    "    plt.plot(PFA / 100, PD, label=f\"t={t}\")\n",
    "\n",
    "plt.scatter(0, 1, label=\"perfect detection\")\n",
    "plt.plot([0, 1], [0, 1], color='k', ls='--', label=\"random detection\")\n",
    "\n",
    "plt.arrow(0.5, 0.5, -0.15, 0.15, head_width=0.03)\n",
    "plt.text(0.35, 0.7, \"better\")\n",
    "plt.xlabel(\"Probability of false alarm\")\n",
    "plt.ylabel(\"Probability of detection\")\n",
    "plt.legend()\n",
    "plt.title(\"Receiver Operating Characteristic Curve\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ac71615",
   "metadata": {},
   "source": [
    "Notice that as $ t $ increases, we are assured a larger probability\n",
    "of detection and a smaller probability of false alarm associated with\n",
    "a given discrimination threshold $ c $.\n",
    "\n",
    "As $ t \\rightarrow + \\infty $, we approach the perfect detection\n",
    "curve that is indicated by a right angle hinging on the blue dot.\n",
    "\n",
    "For a given sample size $ t $, the discrimination threshold $ c $ determines a point on the receiver operating\n",
    "characteristic curve.\n",
    "\n",
    "It is up to the test designer to trade off probabilities of\n",
    "making the two types of errors.\n",
    "\n",
    "But we know how to choose the smallest sample size to achieve given targets for\n",
    "the probabilities.\n",
    "\n",
    "Typically, frequentists aim for a high probability of detection that\n",
    "respects an upper bound on the probability of false alarm.\n",
    "\n",
    "Below we show an example in which we fix the probability of false alarm at\n",
    "$ 0.05 $.\n",
    "\n",
    "The required sample size for making a decision is then determined by a\n",
    "target probability of detection, for example, $ 0.9 $, as depicted in the following graph."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2247cabf",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "PFA = 0.05\n",
    "PD = np.empty(T)\n",
    "\n",
    "for t in range(T):\n",
    "\n",
    "    c = np.percentile(l_seq_f[:, t], PFA * 100)\n",
    "    PD[t] = np.sum(l_seq_g[:, t] < c) / N\n",
    "\n",
    "plt.plot(range(T), PD)\n",
    "plt.axhline(0.9, color=\"k\", ls=\"--\")\n",
    "\n",
    "plt.xlabel(\"t\")\n",
    "plt.ylabel(\"Probability of detection\")\n",
    "plt.title(f\"Probability of false alarm={PFA}\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14c68844",
   "metadata": {},
   "source": [
    "The United States Navy evidently used a procedure like this to select a sample size $ t $ for doing quality\n",
    "control tests during World War II.\n",
    "\n",
    "A Navy Captain who had been ordered to perform tests of this kind had doubts about it that he\n",
    "presented to Milton Friedman, as we describe in  [this lecture](https://python.quantecon.org/wald_friedman.html).\n",
    "\n",
    "\n",
    "<a id='rel-entropy'></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56e49bf2",
   "metadata": {},
   "source": [
    "## Kullback–Leibler Divergence\n",
    "\n",
    "Now let’s consider a case in which neither $ g $ nor $ f $\n",
    "generates the data.\n",
    "\n",
    "Instead, a third distribution $ h $ does.\n",
    "\n",
    "Let’s watch how how the cumulated likelihood ratios $ f/g $ behave\n",
    "when $ h $ governs the data.\n",
    "\n",
    "A key tool here is called **Kullback–Leibler divergence**.\n",
    "\n",
    "It is also called **relative entropy**.\n",
    "\n",
    "It measures how one probability distribution differs from another.\n",
    "\n",
    "In our application, we want to measure how $ f $ or $ g $\n",
    "diverges from $ h $\n",
    "\n",
    "The two Kullback–Leibler divergences pertinent for us are $ K_f $\n",
    "and $ K_g $ defined as\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "K_{f}   &=E_{h}\\left[\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right] \\\\\n",
    "    &=\\int\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{f\\left(w\\right)}{h\\left(w\\right)}h\\left(w\\right)dw \\\\\n",
    "    &=\\int\\log\\left(\\frac{f\\left(w\\right)}{h\\left(w\\right)}\\right)f\\left(w\\right)dw\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "K_{g}   &=E_{h}\\left[\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right] \\\\\n",
    "    &=\\int\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)\\frac{g\\left(w\\right)}{h\\left(w\\right)}h\\left(w\\right)dw \\\\\n",
    "    &=\\int\\log\\left(\\frac{g\\left(w\\right)}{h\\left(w\\right)}\\right)g\\left(w\\right)dw\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "When $ K_g < K_f $, $ g $ is closer to $ h $ than $ f $\n",
    "is.\n",
    "\n",
    "- In that case we’ll find that $ L\\left(w^t\\right) \\rightarrow 0 $.  \n",
    "\n",
    "\n",
    "When $ K_g > K_f $, $ f $ is closer to $ h $ than $ g $\n",
    "is.\n",
    "\n",
    "- In that case we’ll find that\n",
    "  $ L\\left(w^t\\right) \\rightarrow + \\infty $  \n",
    "\n",
    "\n",
    "We’ll now experiment with an $ h $ is also a beta distribution\n",
    "\n",
    "We’ll start by setting parameters $ G_a $ and $ G_b $ so that\n",
    "$ h $ is closer to $ g $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cf612867",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "H_a, H_b = 3.5, 1.8\n",
    "\n",
    "h = jit(lambda x: p(x, H_a, H_b))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0e11dfc",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "x_range = np.linspace(0, 1, 100)\n",
    "plt.plot(x_range, f(x_range), label='f')\n",
    "plt.plot(x_range, g(x_range), label='g')\n",
    "plt.plot(x_range, h(x_range), label='h')\n",
    "\n",
    "plt.legend()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec893cb6",
   "metadata": {},
   "source": [
    "Let’s compute the Kullback–Leibler discrepancies by quadrature\n",
    "integration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af0a8760",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def KL_integrand(w, q, h):\n",
    "\n",
    "    m = q(w) / h(w)\n",
    "\n",
    "    return np.log(m) * q(w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "480d9182",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "def compute_KL(h, f, g):\n",
    "\n",
    "    Kf, _ = quad(KL_integrand, 0, 1, args=(f, h))\n",
    "    Kg, _ = quad(KL_integrand, 0, 1, args=(g, h))\n",
    "\n",
    "    return Kf, Kg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f9654b93",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Kf, Kg = compute_KL(h, f, g)\n",
    "Kf, Kg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6147a1d2",
   "metadata": {},
   "source": [
    "We have $ K_g < K_f $.\n",
    "\n",
    "Next, we can verify our conjecture about $ L\\left(w^t\\right) $ by\n",
    "simulation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba996a02",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "l_arr_h = simulate(H_a, H_b)\n",
    "l_seq_h = np.cumprod(l_arr_h, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e6a0f86",
   "metadata": {},
   "source": [
    "The figure below plots over time the fraction of paths\n",
    "$ L\\left(w^t\\right) $ that fall in the interval $ [0,0.01] $.\n",
    "\n",
    "Notice that it converges to 1 as expected when $ g $ is closer to\n",
    "$ h $ than $ f $ is."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20daed34",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N, T = l_arr_h.shape\n",
    "plt.plot(range(T), np.sum(l_seq_h <= 0.01, axis=0) / N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1edb1b56",
   "metadata": {},
   "source": [
    "We can also try an $ h $ that is closer to $ f $ than is\n",
    "$ g $ so that now $ K_g $ is larger than $ K_f $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ead2008",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "H_a, H_b = 1.2, 1.2\n",
    "h = jit(lambda x: p(x, H_a, H_b))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd72872f",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "Kf, Kg = compute_KL(h, f, g)\n",
    "Kf, Kg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f4080ab",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "l_arr_h = simulate(H_a, H_b)\n",
    "l_seq_h = np.cumprod(l_arr_h, axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "257b05a7",
   "metadata": {},
   "source": [
    "Now probability mass of $ L\\left(w^t\\right) $ falling above\n",
    "$ 10000 $ diverges to $ +\\infty $."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "41148cee",
   "metadata": {
    "hide-output": false
   },
   "outputs": [],
   "source": [
    "N, T = l_arr_h.shape\n",
    "plt.plot(range(T), np.sum(l_seq_h > 10000, axis=0) / N)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fa6d7074",
   "metadata": {},
   "source": [
    "## Sequels\n",
    "\n",
    "Likelihood processes play an important role in Bayesian learning, as described in [this lecture](https://python.quantecon.org/likelihood_bayes.html)\n",
    "and as applied in [this lecture](https://python.quantecon.org/odu.html).\n",
    "\n",
    "Likelihood ratio processes appear again in [this lecture](https://python-advanced.quantecon.org/additive_functionals.html), which contains another illustration\n",
    "of the **peculiar property** of likelihood ratio processes described above."
   ]
  }
 ],
 "metadata": {
  "date": 1752455386.4859102,
  "filename": "likelihood_ratio_process.md",
  "kernelspec": {
   "display_name": "Python",
   "language": "python3",
   "name": "python3"
  },
  "title": "Likelihood Ratio Processes"
 },
 "nbformat": 4,
 "nbformat_minor": 5
}