{
"cells": [
{
"cell_type": "markdown",
"id": "701044e0",
"metadata": {},
"source": [
"# Elementary Probability with Matrices\n",
"\n",
"This lecture uses matrix algebra to illustrate some basic ideas about probability theory.\n",
"\n",
"After providing somewhat informal definitions of the underlying objects, we’ll use matrices and vectors to describe probability distributions.\n",
"\n",
"Among concepts that we’ll be studying include\n",
"\n",
"- a joint probability distribution \n",
"- marginal distributions associated with a given joint distribution \n",
"- conditional probability distributions \n",
"- statistical independence of two random variables \n",
"- joint distributions associated with a prescribed set of marginal distributions \n",
" - couplings \n",
" - copulas \n",
"- the probability distribution of a sum of two independent random variables \n",
" - convolution of marginal distributions \n",
"- parameters that define a probability distribution \n",
"- sufficient statistics as data summaries \n",
"\n",
"\n",
"We’ll use a matrix to represent a bivariate probability distribution and a vector to represent a univariate probability distribution\n",
"\n",
"In addition to what’s in Anaconda, this lecture will need the following libraries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce832272",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"!pip install prettytable"
]
},
{
"cell_type": "markdown",
"id": "4198fa9d",
"metadata": {},
"source": [
"As usual, we’ll start with some imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf3706b6",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import prettytable as pt\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"from matplotlib_inline.backend_inline import set_matplotlib_formats\n",
"set_matplotlib_formats('retina')"
]
},
{
"cell_type": "markdown",
"id": "1c6c7e1c",
"metadata": {},
"source": [
"## Sketch of Basic Concepts\n",
"\n",
"We’ll briefly define what we mean by a **probability space**, a **probability measure**, and a **random variable**.\n",
"\n",
"For most of this lecture, we sweep these objects into the background, but they are there underlying the other objects that we’ll mainly focus on.\n",
"\n",
"Let $ \\Omega $ be a set of possible underlying outcomes and let $ \\omega \\in \\Omega $ be a particular underlying outcomes.\n",
"\n",
"Let $ \\mathcal{G} \\subset \\Omega $ be a subset of $ \\Omega $.\n",
"\n",
"Let $ \\mathcal{F} $ be a collection of such subsets $ \\mathcal{G} \\subset \\Omega $.\n",
"\n",
"The pair $ \\Omega,\\mathcal{F} $ forms our **probability space** on which we want to put a probability measure.\n",
"\n",
"A **probability measure** $ \\mu $ maps a set of possible underlying outcomes $ \\mathcal{G} \\in \\mathcal{F} $ into a scalar number between $ 0 $ and $ 1 $\n",
"\n",
"- this is the “probability” that $ X $ belongs to $ A $, denoted by $ \\textrm{Prob}\\{X\\in A\\} $. \n",
"\n",
"\n",
"A **random variable** $ X(\\omega) $ is a function of the underlying outcome $ \\omega \\in \\Omega $.\n",
"\n",
"The random variable $ X(\\omega) $ has a **probability distribution** that is induced by the underlying probability measure $ \\mu $ and the function\n",
"$ X(\\omega) $:\n",
"\n",
"\n",
"\n",
"$$\n",
"\\textrm{Prob} (X \\in A ) = \\int_{\\mathcal{G}} \\mu(\\omega) d \\omega \\tag{10.1}\n",
"$$\n",
"\n",
"where $ {\\mathcal G} $ is the subset of $ \\Omega $ for which $ X(\\omega) \\in A $.\n",
"\n",
"We call this the induced probability distribution of random variable $ X $."
]
},
{
"cell_type": "markdown",
"id": "b97b2746",
"metadata": {},
"source": [
"## What Does Probability Mean?\n",
"\n",
"Before diving in, we’ll say a few words about what probability theory means and how it connects to statistics.\n",
"\n",
"We also touch on these topics in the quantecon lectures [https://python.quantecon.org/prob_meaning.html](https://python.quantecon.org/prob_meaning.html) and [https://python.quantecon.org/navy_captain.html](https://python.quantecon.org/navy_captain.html).\n",
"\n",
"For much of this lecture we’ll be discussing fixed “population” probabilities.\n",
"\n",
"These are purely mathematical objects.\n",
"\n",
"To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts:\n",
"\n",
"- A single draw from a probability distribution \n",
"- Repeated independently and identically distributed (i.i.d.) draws of “samples” or “realizations” from the same probability distribution \n",
"- A **statistic** defined as a function of a sequence of samples \n",
"- An **empirical distribution** or **histogram** (a binned empirical distribution) that records observed **relative frequencies** \n",
"- The idea that a population probability distribution is what we anticipate **relative frequencies** will be in a long sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by **anticipated relative frequencies** \n",
" - **Law of Large Numbers (LLN)** \n",
" - **Central Limit Theorem (CLT)** \n",
"\n",
"\n",
"**Scalar example**\n",
"\n",
"Let $ X $ be a scalar random variable that takes on the $ I $ possible values\n",
"$ 0, 1, 2, \\ldots, I-1 $ with probabilities\n",
"\n",
"$$\n",
"{\\rm Prob}(X = i) = f_i, \\quad\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"f_i \\geqslant 0, \\quad \\sum_i f_i = 1 .\n",
"$$\n",
"\n",
"We sometimes write\n",
"\n",
"$$\n",
"X \\sim \\{{f_i}\\}_{i=0}^{I-1}\n",
"$$\n",
"\n",
"as a short-hand way of saying that the random variable $ X $ is described by the probability distribution $ \\{{f_i}\\}_{i=0}^{I-1} $.\n",
"\n",
"Consider drawing a sample $ x_0, x_1, \\dots , x_{N-1} $ of $ N $ independent and identically distributoed draws of $ X $.\n",
"\n",
"What do the “identical” and “independent” mean in IID or iid (“identically and independently distributed)?\n",
"\n",
"- “identical” means that each draw is from the same distribution. \n",
"- “independent” means that joint distribution equal products of marginal distributions, i.e., \n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob}\\{x_0 = i_0, x_1 = i_1, \\dots , x_{N-1} = i_{N-1}\\} &= \\textrm{Prob}\\{x_0 = i_0\\} \\cdot \\dots \\cdot \\textrm{Prob}\\{x_{I-1} = i_{I-1}\\}\\\\\n",
"&= f_{i_0} f_{i_1} \\cdot \\dots \\cdot f_{i_{N-1}}\\\\\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"We define an e **empirical distribution** as follows.\n",
"\n",
"For each $ i = 0,\\dots,I-1 $, let\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"N_i & = \\text{number of times} \\ X = i,\\\\\n",
"N & = \\sum^{I-1}_{i=0} N_i \\quad \\text{total number of draws},\\\\\n",
"\\tilde {f_i} & = \\frac{N_i}{N} \\sim \\ \\text{frequency of draws for which}\\ X=i\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Key ideas that justify connecting probability theory with statistics are laws of large numbers and central limit theorems\n",
"\n",
"**LLN:**\n",
"\n",
"- A Law of Large Numbers (LLN) states that $ \\tilde {f_i} \\to f_i \\text{ as } N \\to \\infty $ \n",
"\n",
"\n",
"**CLT:**\n",
"\n",
"- A Central Limit Theorem (CLT) describes a **rate** at which $ \\tilde {f_i} \\to f_i $ \n",
"\n",
"\n",
"**Remarks**\n",
"\n",
"- For “frequentist” statisticians, **anticipated relative frequency** is **all** that a probability distribution means. \n",
"- But for a Bayesian it means something more or different. "
]
},
{
"cell_type": "markdown",
"id": "bed04dae",
"metadata": {},
"source": [
"## Representing Probability Distributions\n",
"\n",
"A probability distribution $ \\textrm{Prob} (X \\in A) $ can be described by its **cumulative distribution function (CDF)**\n",
"\n",
"$$\n",
"F_{X}(x) = \\textrm{Prob}\\{X\\leq x\\}.\n",
"$$\n",
"\n",
"Sometimes, but not always, a random variable can also be described by **density function** $ f(x) $\n",
"that is related to its CDF by\n",
"\n",
"$$\n",
"\\textrm{Prob} \\{X\\in B\\} = \\int_{t\\in B}f(t)dt\n",
"$$\n",
"\n",
"$$\n",
"F(x) = \\int_{-\\infty}^{x}f(t)dt\n",
"$$\n",
"\n",
"Here $ B $ is a set of possible $ X $’s whose probability we want to compute.\n",
"\n",
"When a probability density exists, a probability distribution can be characterized either by its CDF or by its density.\n",
"\n",
"For a **discrete-valued** random variable\n",
"\n",
"- the number of possible values of $ X $ is finite or countably infinite \n",
"- we replace a **density** with a **probability mass function**, a non-negative sequence that sums to one \n",
"- we replace integration with summation in the formula like [(10.1)](#equation-eq-cdffromdensity) that relates a CDF to a probability mass function \n",
"\n",
"\n",
"In this lecture, we mostly discuss discrete random variables.\n",
"\n",
"Doing this enables us to confine our tool set basically to linear algebra.\n",
"\n",
"Later we’ll briefly discuss how to approximate a continuous random variable with a discrete random variable."
]
},
{
"cell_type": "markdown",
"id": "a7d995e3",
"metadata": {},
"source": [
"## Univariate Probability Distributions\n",
"\n",
"We’ll devote most of this lecture to discrete-valued random variables, but we’ll say a few things\n",
"about continuous-valued random variables."
]
},
{
"cell_type": "markdown",
"id": "128d0e9e",
"metadata": {},
"source": [
"### Discrete random variable\n",
"\n",
"Let $ X $ be a discrete random variable that takes possible values: $ i=0,1,\\ldots,I-1 = \\bar{X} $.\n",
"\n",
"Here, we choose the maximum index $ I-1 $ because of how this aligns nicely with Python’s index convention.\n",
"\n",
"Define $ f_i \\equiv \\textrm{Prob}\\{X=i\\} $\n",
"and assemble the non-negative vector\n",
"\n",
"\n",
"\n",
"$$\n",
"f=\\left[\\begin{array}{c}\n",
"f_{0}\\\\\n",
"f_{1}\\\\\n",
"\\vdots\\\\\n",
"f_{I-1}\n",
"\\end{array}\\right] \\tag{10.2}\n",
"$$\n",
"\n",
"for which $ f_{i} \\in [0,1] $ for each $ i $ and $ \\sum_{i=0}^{I-1}f_i=1 $.\n",
"\n",
"This vector defines a **probability mass function**.\n",
"\n",
"The distribution [(10.2)](#equation-eq-discretedist)\n",
"has **parameters** $ \\{f_{i}\\}_{i=0,1, \\cdots ,I-2} $ since $ f_{I-1} = 1-\\sum_{i=0}^{I-2}f_{i} $.\n",
"\n",
"These parameters pin down the shape of the distribution.\n",
"\n",
"(Sometimes $ I = \\infty $.)\n",
"\n",
"Such a “non-parametric” distribution has as many “parameters” as there are possible values of the random variable.\n",
"\n",
"We often work with special distributions that are characterized by a small number parameters.\n",
"\n",
"In these special parametric distributions,\n",
"\n",
"$$\n",
"f_i = g(i; \\theta)\n",
"$$\n",
"\n",
"where $ \\theta $ is a vector of parameters that is of much smaller dimension than $ I $.\n",
"\n",
"**Remarks:**\n",
"\n",
"- The concept of **parameter** is intimately related to the notion of **sufficient statistic**. \n",
"- Sufficient statistics are nonlinear functions of a data set. \n",
"- Sufficient statistics are designed to summarize all **information** about parameters that is contained in a data set. \n",
"- They are important tools that AI uses to summarize a **big data** set \n",
"- R. A. Fisher provided a rigorous definition of **information** – see [https://en.wikipedia.org/wiki/Fisher_information](https://en.wikipedia.org/wiki/Fisher_information) \n",
"\n",
"\n",
"An example of a parametric probability distribution is a **geometric distribution**.\n",
"\n",
"It is described by\n",
"\n",
"$$\n",
"f_{i} = \\textrm{Prob}\\{X=i\\} = (1-\\lambda)\\lambda^{i},\\quad \\lambda \\in [0,1], \\quad i = 0, 1, 2, \\ldots\n",
"$$\n",
"\n",
"Evidently, $ \\sum_{i=0}^{\\infty}f_i=1 $.\n",
"\n",
"Let $ \\theta $ be a vector of parameters of the distribution described by $ f $, then\n",
"\n",
"$$\n",
"f_i( \\theta)\\ge0, \\sum_{i=0}^{\\infty}f_i(\\theta)=1\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "585ff49c",
"metadata": {},
"source": [
"### Continuous random variable\n",
"\n",
"Let $ X $ be a continous random variable that takes values $ X \\in \\tilde{X}\\equiv[X_U,X_L] $ whose distributions have parameters $ \\theta $.\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X\\in A\\} = \\int_{x\\in A} f(x;\\theta)\\,dx; \\quad f(x;\\theta)\\ge0\n",
"$$\n",
"\n",
"where $ A $ is a subset of $ \\tilde{X} $ and\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X\\in \\tilde{X}\\} =1\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "a2ea3282",
"metadata": {},
"source": [
"## Bivariate Probability Distributions\n",
"\n",
"We’ll now discuss a bivariate **joint distribution**.\n",
"\n",
"To begin, we restrict ourselves to two discrete random variables.\n",
"\n",
"Let $ X,Y $ be two discrete random variables that take values:\n",
"\n",
"$$\n",
"X\\in\\{0,\\ldots,I-1\\}\n",
"$$\n",
"\n",
"$$\n",
"Y\\in\\{0,\\ldots,J-1\\}\n",
"$$\n",
"\n",
"Then their **joint distribution** is described by a matrix\n",
"\n",
"$$\n",
"F_{I\\times J}=[f_{ij}]_{i\\in\\{0,\\ldots,I-1\\}, j\\in\\{0,\\ldots,J-1\\}}\n",
"$$\n",
"\n",
"whose elements are\n",
"\n",
"$$\n",
"f_{ij}=\\textrm{Prob}\\{X=i,Y=j\\} \\geq 0\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\sum_{i}\\sum_{j}f_{ij}=1\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "811371c4",
"metadata": {},
"source": [
"## Marginal Probability Distributions\n",
"\n",
"The joint distribution induce marginal distributions\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i\\}= \\sum_{j=0}^{J-1}f_{ij} = \\mu_i, \\quad i=0,\\ldots,I-1\n",
"$$\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{Y=j\\}= \\sum_{i=0}^{I-1}f_{ij} = \\nu_j, \\quad j=0,\\ldots,J-1\n",
"$$\n",
"\n",
"For example, let a joint distribution over $ (X,Y) $ be\n",
"\n",
"\n",
"\n",
"$$\n",
"F = \\left[\n",
" \\begin{matrix}\n",
" .25 & .1\\\\\n",
" .15 & .5\n",
" \\end{matrix}\n",
"\\right] \\tag{10.3}\n",
"$$\n",
"\n",
"The implied marginal distributions are:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob} \\{X=0\\}&=.25+.1=.35\\\\\n",
"\\textrm{Prob}\\{X=1\\}& =.15+.5=.65\\\\\n",
"\\textrm{Prob}\\{Y=0\\}&=.25+.15=.4\\\\\n",
"\\textrm{Prob}\\{Y=1\\}&=.1+.5=.6\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"**Digression:** If two random variables $ X,Y $ are continuous and have joint density $ f(x,y) $, then marginal distributions can be computed by\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"f(x)& = \\int_{\\mathbb{R}} f(x,y) dy\\\\\n",
"f(y)& = \\int_{\\mathbb{R}} f(x,y) dx\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "56ca6ba4",
"metadata": {},
"source": [
"## Conditional Probability Distributions\n",
"\n",
"Conditional probabilities are defined according to\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{A \\mid B\\}=\\frac{\\textrm{Prob}\\{A \\cap B\\}}{\\textrm{Prob}\\{B\\}}\n",
"$$\n",
"\n",
"where $ A, B $ are two events.\n",
"\n",
"For a pair of discrete random variables, we have the **conditional distribution**\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i|Y=j\\}=\\frac{f_{ij}}{\\sum_{i}f_{ij}}\n",
"=\\frac{\\textrm{Prob} \\{X=i, Y=j\\} }{\\textrm{Prob} \\{Y=j\\} }\n",
"$$\n",
"\n",
"where $ i=0, \\ldots,I-1, \\quad j=0,\\ldots,J-1 $.\n",
"\n",
"Note that\n",
"\n",
"$$\n",
"\\sum_{i}\\textrm{Prob}\\{X_i=i|Y_j=j\\}\n",
"=\\frac{ \\sum_{i}f_{ij} }{ \\sum_{i}f_{ij}}=1\n",
"$$\n",
"\n",
"**Remark:** The mathematics of conditional probability implies **Bayes’ Law**:\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i|Y=j\\}\t=\\frac{\\textrm{Prob}\\{X=i,Y=j\\}}{\\textrm{Prob}\\{Y=j\\}}=\\frac{\\textrm{Prob}\\{Y=j|X=i\\}\\textrm{Prob}\\{X=i\\}}{\\textrm{Prob}\\{Y=j\\}}\n",
"$$\n",
"\n",
"For the joint distribution [(10.3)](#equation-eq-example101discrete)\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=0|Y=1\\} =\\frac{ .1}{.1+.5}=\\frac{.1}{.6}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "d7fea728",
"metadata": {},
"source": [
"## Statistical Independence\n",
"\n",
"Random variables X and Y are statistically **independent** if\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i,Y=j\\}={f_ig_j}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob}\\{X=i\\} &=f_i\\ge0, \\sum{f_i}=1 \\cr\n",
"\\textrm{Prob}\\{Y=j\\} & =g_j\\ge0, \\sum{g_j}=1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Conditional distributions are\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob}\\{X=i|Y=j\\} & =\\frac{f_ig_j}{\\sum_{i}f_ig_j}=\\frac{f_ig_j}{g_j}=f_i \\\\\n",
"\\textrm{Prob}\\{Y=j|X=i\\} & =\\frac{f_ig_j}{\\sum_{j}f_ig_j}=\\frac{f_ig_j}{f_i}=g_j\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "d354945a",
"metadata": {},
"source": [
"## Means and Variances\n",
"\n",
"The mean and variance of a discrete random variable $ X $ are\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mu_{X} & \\equiv\\mathbb{E}\\left[X\\right]\n",
"=\\sum_{k}k \\textrm{Prob}\\{X=k\\} \\\\\n",
"\\sigma_{X}^{2} & \\equiv\\mathbb{D}\\left[X\\right]=\\sum_{k}\\left(k-\\mathbb{E}\\left[X\\right]\\right)^{2}\\textrm{Prob}\\{X=k\\}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"A continuous random variable having density $ f_{X}(x) $) has mean and variance\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mu_{X} & \\equiv\\mathbb{E}\\left[X\\right]=\\int_{-\\infty}^{\\infty}xf_{X}(x)dx \\\\\n",
"\\sigma_{X}^{2}\\equiv\\mathbb{D}\\left[X\\right] & =\\mathrm{E}\\left[\\left(X-\\mu_{X}\\right)^{2}\\right]=\\int_{-\\infty}^{\\infty}\\left(x-\\mu_{X}\\right)^{2}f_{X}(x)dx\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "bf149e4a",
"metadata": {},
"source": [
"## Generating Random Numbers\n",
"\n",
"Suppose we have at our disposal a pseudo random number that draws a uniform random variable, i.e., one with probability distribution\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{\\tilde{X}=i\\}=\\frac{1}{I},\\quad i=0,\\ldots,I-1\n",
"$$\n",
"\n",
"How can we transform $ \\tilde{X} $ to get a random variable $ X $ for which $ \\textrm{Prob}\\{X=i\\}=f_i,\\quad i=0,\\ldots,I-1 $,\n",
"where $ f_i $ is an arbitary discrete probability distribution on $ i=0,1,\\dots,I-1 $?\n",
"\n",
"The key tool is the inverse of a cumulative distribution function (CDF).\n",
"\n",
"Observe that the CDF of a distribution is monotone and non-decreasing, taking values between $ 0 $ and $ 1 $.\n",
"\n",
"We can draw a sample of a random variable $ X $ with a known CDF as follows:\n",
"\n",
"- draw a random variable $ u $ from a uniform distribution on $ [0,1] $ \n",
"- pass the sample value of $ u $ into the **“inverse”** target CDF for $ X $ \n",
"- $ X $ has the target CDF \n",
"\n",
"\n",
"Thus, knowing the **“inverse”** CDF of a distribution is enough to simulate from this distribution.\n",
"\n",
">**Note**\n",
">\n",
">The “inverse” CDF needs to exist for this method to work.\n",
"\n",
"The inverse CDF is\n",
"\n",
"$$\n",
"F^{-1}(u)\\equiv\\inf \\{x\\in \\mathbb{R}: F(x) \\geq u\\} \\quad(00 $.\n",
"\n",
"Its density function is\n",
"\n",
"$$\n",
"\\quad f(x)=\\lambda e^{-\\lambda x}\n",
"$$\n",
"\n",
"Its CDF is\n",
"\n",
"$$\n",
"F(x)=\\int_{0}^{\\infty}\\lambda e^{-\\lambda x}=1-e^{-\\lambda x}\n",
"$$\n",
"\n",
"Let $ U $ follow a uniform distribution on $ [0,1] $.\n",
"\n",
"$ X $ is a random variable such that $ U=F(X) $.\n",
"\n",
"The distribution $ X $ can be deduced from\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"U& =F(X)=1-e^{-\\lambda X}\\qquad\\\\\n",
"\\implies & \\quad -U=e^{-\\lambda X}\\\\\n",
"\\implies& \\quad \\log(1-U)=-\\lambda X\\\\\n",
"\\implies & \\quad X=\\frac{(1-U)}{-\\lambda}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Let’s draw $ u $ from $ U[0,1] $ and calculate $ x=\\frac{log(1-U)}{-\\lambda} $.\n",
"\n",
"We’ll check whether $ X $ seems to follow a **continuous geometric** (exponential) distribution.\n",
"\n",
"Let’s check with `numpy`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9399b6f",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"n, λ = 1_000_000, 0.3\n",
"\n",
"# draw uniform numbers\n",
"u = np.random.rand(n)\n",
"\n",
"# transform\n",
"x = -np.log(1-u)/λ\n",
"\n",
"# draw geometric distributions\n",
"x_g = np.random.exponential(1 / λ, n)\n",
"\n",
"# plot and compare\n",
"plt.hist(x, bins=100, density=True)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20cbb17d",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.hist(x_g, bins=100, density=True, alpha=0.6)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "7f2fd4e5",
"metadata": {},
"source": [
"**Geometric distribution**\n",
"\n",
"Let $ X $ distributed geometrically, that is\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob}(X=i) & =(1-\\lambda)\\lambda^i,\\quad\\lambda\\in(0,1), \\quad i=0,1,\\dots \\\\\n",
" & \\sum_{i=0}^{\\infty}\\textrm{Prob}(X=i)=1\\longleftrightarrow(1- \\lambda)\\sum_{i=0}^{\\infty}\\lambda^i=\\frac{1-\\lambda}{1-\\lambda}=1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Its CDF is given by\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\textrm{Prob}(X\\le i)& =(1-\\lambda)\\sum_{j=0}^{i}\\lambda^i\\\\\n",
"& =(1-\\lambda)[\\frac{1-\\lambda^{i+1}}{1-\\lambda}]\\\\\n",
"& =1-\\lambda^{i+1}\\\\\n",
"& =F(X)=F_i \\quad\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Again, let $ \\tilde{U} $ follow a uniform distribution and we want to find $ X $ such that $ F(X)=\\tilde{U} $.\n",
"\n",
"Let’s deduce the distribution of $ X $ from\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\tilde{U} & =F(X)=1-\\lambda^{x+1}\\\\\n",
"1-\\tilde{U} & =\\lambda^{x+1}\\\\\n",
"\\log(1-\\tilde{U})& =(x+1)\\log\\lambda\\\\\n",
"\\frac{\\log(1-\\tilde{U})}{\\log\\lambda}& =x+1\\\\\n",
"\\frac{\\log(1-\\tilde{U})}{\\log\\lambda}-1 &=x\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"However, $ \\tilde{U}=F^{-1}(X) $ may not be an integer for any $ x\\geq0 $.\n",
"\n",
"So let\n",
"\n",
"$$\n",
"x=\\lceil\\frac{\\log(1-\\tilde{U})}{\\log\\lambda}-1\\rceil\n",
"$$\n",
"\n",
"where $ \\lceil . \\rceil $ is the ceiling function.\n",
"\n",
"Thus $ x $ is the smallest integer such that the discrete geometric CDF is greater than or equal to $ \\tilde{U} $.\n",
"\n",
"We can verify that $ x $ is indeed geometrically distributed by the following `numpy` program.\n",
"\n",
">**Note**\n",
">\n",
">The exponential distribution is the continuous analog of geometric distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2cb80016",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"n, λ = 1_000_000, 0.8\n",
"\n",
"# draw uniform numbers\n",
"u = np.random.rand(n)\n",
"\n",
"# transform\n",
"x = np.ceil(np.log(1-u)/np.log(λ) - 1)\n",
"\n",
"# draw geometric distributions\n",
"x_g = np.random.geometric(1-λ, n)\n",
"\n",
"# plot and compare\n",
"plt.hist(x, bins=150, density=True)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9f63ad34",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"np.random.geometric(1-λ, n).max()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e471c2d3",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"np.log(0.4)/np.log(0.3)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86808cac",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.hist(x_g, bins=150, density=True, alpha=0.6)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "7a88123a",
"metadata": {},
"source": [
"## Some Discrete Probability Distributions\n",
"\n",
"Let’s write some Python code to compute means and variances of some univariate random variables.\n",
"\n",
"We’ll use our code to\n",
"\n",
"- compute population means and variances from the probability distribution \n",
"- generate a sample of $ N $ independently and identically distributed draws and compute sample means and variances \n",
"- compare population and sample means and variances "
]
},
{
"cell_type": "markdown",
"id": "4b324898",
"metadata": {},
"source": [
"## Geometric distribution\n",
"\n",
"$$\n",
"\\textrm{Prob}(X=k)=(1-p)^{k-1}p,k=1,2, \\ldots\n",
"$$\n",
"\n",
"$ \\implies $\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbb{E}(X) & =\\frac{1}{p}\\\\\\mathbb{D}(X) & =\\frac{1-p}{p^2}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"We draw observations from the distribution and compare the sample mean and variance with the theoretical results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74c198e1",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# specify parameters\n",
"p, n = 0.3, 1_000_000\n",
"\n",
"# draw observations from the distribution\n",
"x = np.random.geometric(p, n)\n",
"\n",
"# compute sample mean and variance\n",
"μ_hat = np.mean(x)\n",
"σ2_hat = np.var(x)\n",
"\n",
"print(\"The sample mean is: \", μ_hat, \"\\nThe sample variance is: \", σ2_hat)\n",
"\n",
"# compare with theoretical results\n",
"print(\"\\nThe population mean is: \", 1/p)\n",
"print(\"The population variance is: \", (1-p)/(p**2))"
]
},
{
"cell_type": "markdown",
"id": "945dd6e0",
"metadata": {},
"source": [
"### Newcomb–Benford distribution\n",
"\n",
"The **Newcomb–Benford law** fits many data sets, e.g., reports of incomes to tax authorities, in which\n",
"the leading digit is more likely to be small than large.\n",
"\n",
"See [https://en.wikipedia.org/wiki/Benford’s_law](https://en.wikipedia.org/wiki/Benford%27s_law)\n",
"\n",
"A Benford probability distribution is\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=d\\}=\\log _{10}(d+1)-\\log _{10}(d)=\\log _{10}\\left(1+\\frac{1}{d}\\right)\n",
"$$\n",
"\n",
"where $ d\\in\\{1,2,\\cdots,9\\} $ can be thought of as a **first digit** in a sequence of digits.\n",
"\n",
"This is a well defined discrete distribution since we can verify that probabilities are nonnegative and sum to $ 1 $.\n",
"\n",
"$$\n",
"\\log_{10}\\left(1+\\frac{1}{d}\\right)\\geq0,\\quad\\sum_{d=1}^{9}\\log_{10}\\left(1+\\frac{1}{d}\\right)=1\n",
"$$\n",
"\n",
"The mean and variance of a Benford distribution are\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbb{E}\\left[X\\right]\t &=\\sum_{d=1}^{9}d\\log_{10}\\left(1+\\frac{1}{d}\\right)\\simeq3.4402 \\\\\n",
"\\mathbb{V}\\left[X\\right]\t & =\\sum_{d=1}^{9}\\left(d-\\mathbb{E}\\left[X\\right]\\right)^{2}\\log_{10}\\left(1+\\frac{1}{d}\\right)\\simeq6.0565\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"We verify the above and compute the mean and variance using `numpy`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77b67412",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"Benford_pmf = np.array([np.log10(1+1/d) for d in range(1,10)])\n",
"k = np.array(range(1,10))\n",
"\n",
"# mean\n",
"mean = np.sum(Benford_pmf * k)\n",
"\n",
"# variance\n",
"var = np.sum([(k-mean)**2 * Benford_pmf])\n",
"\n",
"# verify sum to 1\n",
"print(np.sum(Benford_pmf))\n",
"print(mean)\n",
"print(var)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ab523105",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# plot distribution\n",
"plt.plot(range(1,10), Benford_pmf, 'o')\n",
"plt.title('Benford\\'s distribution')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "18ad4725",
"metadata": {},
"source": [
"### Pascal (negative binomial) distribution\n",
"\n",
"Consider a sequence of independent Bernoulli trials.\n",
"\n",
"Let $ p $ be the probability of success.\n",
"\n",
"Let $ X $ be a random variable that represents the number of failures before we get $ r $ success.\n",
"\n",
"Its distribution is\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X & \\sim NB(r,p) \\\\\n",
"\\textrm{Prob}(X=k;r,p) & = \\begin{pmatrix}k+r-1 \\\\ r-1 \\end{pmatrix}p^r(1-p)^{k}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Here, we choose from among $ k+r-1 $ possible outcomes because the last draw is by definition a success.\n",
"\n",
"We compute the mean and variance to be\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbb{E}(X) & = \\frac{k(1-p)}{p} \\\\\n",
"\\mathbb{V}(X) & = \\frac{k(1-p)}{p^2}\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56422a9d",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# specify parameters\n",
"r, p, n = 10, 0.3, 1_000_000\n",
"\n",
"# draw observations from the distribution\n",
"x = np.random.negative_binomial(r, p, n)\n",
"\n",
"# compute sample mean and variance\n",
"μ_hat = np.mean(x)\n",
"σ2_hat = np.var(x)\n",
"\n",
"print(\"The sample mean is: \", μ_hat, \"\\nThe sample variance is: \", σ2_hat)\n",
"print(\"\\nThe population mean is: \", r*(1-p)/p)\n",
"print(\"The population variance is: \", r*(1-p)/p**2)"
]
},
{
"cell_type": "markdown",
"id": "315f46ed",
"metadata": {},
"source": [
"## Continuous Random Variables"
]
},
{
"cell_type": "markdown",
"id": "723beda4",
"metadata": {},
"source": [
"### Univariate Gaussian distribution\n",
"\n",
"We write\n",
"\n",
"$$\n",
"X \\sim N(\\mu,\\sigma^2)\n",
"$$\n",
"\n",
"to indicate the probability distribution\n",
"\n",
"$$\n",
"f(x|u,\\sigma^2)=\\frac{1}{\\sqrt{2\\pi \\sigma^2}}e^{[-\\frac{1}{2\\sigma^2}(x-u)^2]}\n",
"$$\n",
"\n",
"In the below example, we set $ \\mu = 0, \\sigma = 0.1 $."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5da2191",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# specify parameters\n",
"μ, σ = 0, 0.1\n",
"\n",
"# specify number of draws\n",
"n = 1_000_000\n",
"\n",
"# draw observations from the distribution\n",
"x = np.random.normal(μ, σ, n)\n",
"\n",
"# compute sample mean and variance\n",
"μ_hat = np.mean(x)\n",
"σ_hat = np.std(x)\n",
"\n",
"print(\"The sample mean is: \", μ_hat)\n",
"print(\"The sample standard deviation is: \", σ_hat)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "696e1ca3",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# compare\n",
"print(μ-μ_hat < 1e-3)\n",
"print(σ-σ_hat < 1e-3)"
]
},
{
"cell_type": "markdown",
"id": "7bfba136",
"metadata": {},
"source": [
"### Uniform Distribution\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X & \\sim U[a,b] \\\\\n",
"f(x)& = \\begin{cases} \\frac{1}{b-a}, & a \\leq x \\leq b \\\\ \\quad0, & \\text{otherwise} \\end{cases}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"The population mean and variance are\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbb{E}(X) & = \\frac{a+b}{2} \\\\\n",
"\\mathbb{V}(X) & = \\frac{(b-a)^2}{12}\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ef97f765",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# specify parameters\n",
"a, b = 10, 20\n",
"\n",
"# specify number of draws\n",
"n = 1_000_000\n",
"\n",
"# draw observations from the distribution\n",
"x = a + (b-a)*np.random.rand(n)\n",
"\n",
"# compute sample mean and variance\n",
"μ_hat = np.mean(x)\n",
"σ2_hat = np.var(x)\n",
"\n",
"print(\"The sample mean is: \", μ_hat, \"\\nThe sample variance is: \", σ2_hat)\n",
"print(\"\\nThe population mean is: \", (a+b)/2)\n",
"print(\"The population variance is: \", (b-a)**2/12)"
]
},
{
"cell_type": "markdown",
"id": "e7f7a884",
"metadata": {},
"source": [
"## A Mixed Discrete-Continuous Distribution\n",
"\n",
"We’ll motivate this example with a little story.\n",
"\n",
"Suppose that to apply for a job you take an interview and either pass or fail it.\n",
"\n",
"You have $ 5\\% $ chance to pass an interview and you know your salary will uniformly distributed in the interval 300~400 a day only if you pass.\n",
"\n",
"We can describe your daily salary as a discrete-continuous variable with the following probabilities:\n",
"\n",
"$$\n",
"P(X=0)=0.95\n",
"$$\n",
"\n",
"$$\n",
"P(300\\le X \\le 400)=\\int_{300}^{400} f(x)\\, dx=0.05\n",
"$$\n",
"\n",
"$$\n",
"f(x) = 0.0005\n",
"$$\n",
"\n",
"Let’s start by generating a random sample and computing sample moments."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f4454410",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"x = np.random.rand(1_000_000)\n",
"# x[x > 0.95] = 100*x[x > 0.95]+300\n",
"x[x > 0.95] = 100*np.random.rand(len(x[x > 0.95]))+300\n",
"x[x <= 0.95] = 0\n",
"\n",
"μ_hat = np.mean(x)\n",
"σ2_hat = np.var(x)\n",
"\n",
"print(\"The sample mean is: \", μ_hat, \"\\nThe sample variance is: \", σ2_hat)"
]
},
{
"cell_type": "markdown",
"id": "9e49a6e3",
"metadata": {},
"source": [
"The analytical mean and variance can be computed:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mu &= \\int_{300}^{400}xf(x)dx \\\\\n",
"&= 0.0005\\int_{300}^{400}xdx \\\\\n",
"&= 0.0005 \\times \\frac{1}{2}x^2\\bigg|_{300}^{400}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\sigma^2 &= 0.95\\times(0-17.5)^2+\\int_{300}^{400}(x-17.5)^2f(x)dx \\\\\n",
"&= 0.95\\times17.5^2+0.0005\\int_{300}^{400}(x-17.5)^2dx \\\\\n",
"&= 0.95\\times17.5^2+0.0005 \\times \\frac{1}{3}(x-17.5)^3 \\bigg|_{300}^{400}\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4ff6e16f",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"mean = 0.0005*0.5*(400**2 - 300**2)\n",
"var = 0.95*17.5**2+0.0005/3*((400-17.5)**3-(300-17.5)**3)\n",
"print(\"mean: \", mean)\n",
"print(\"variance: \", var)"
]
},
{
"cell_type": "markdown",
"id": "55026b0a",
"metadata": {},
"source": [
"## Matrix Representation of Some Bivariate Distributions\n",
"\n",
"Let’s use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and variance of a bivariate random variable.\n",
"\n",
"The table below illustrates a probability distribution for a bivariate random variable.\n",
"\n",
"$$\n",
"F=[f_{ij}]=\\left[\\begin{array}{cc}\n",
"0.3 & 0.2\\\\\n",
"0.1 & 0.4\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"Marginal distributions are\n",
"\n",
"$$\n",
"\\textrm{Prob}(X=i)=\\sum_j{f_{ij}}=u_i\n",
"$$\n",
"\n",
"$$\n",
"\\textrm{Prob}(Y=j)=\\sum_i{f_{ij}}=v_j\n",
"$$\n",
"\n",
"Below we draw some samples confirm that the “sampling” distribution agrees well with the “population” distribution.\n",
"\n",
"**Sample results:**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75013d97",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# specify parameters\n",
"xs = np.array([0, 1])\n",
"ys = np.array([10, 20])\n",
"f = np.array([[0.3, 0.2], [0.1, 0.4]])\n",
"f_cum = np.cumsum(f)\n",
"\n",
"# draw random numbers\n",
"p = np.random.rand(1_000_000)\n",
"x = np.vstack([xs[1]*np.ones(p.shape), ys[1]*np.ones(p.shape)])\n",
"# map to the bivariate distribution\n",
"\n",
"x[0, p < f_cum[2]] = xs[1]\n",
"x[1, p < f_cum[2]] = ys[0]\n",
"\n",
"x[0, p < f_cum[1]] = xs[0]\n",
"x[1, p < f_cum[1]] = ys[1]\n",
"\n",
"x[0, p < f_cum[0]] = xs[0]\n",
"x[1, p < f_cum[0]] = ys[0]\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"id": "a07c53d9",
"metadata": {},
"source": [
"Here, we use exactly the inverse CDF technique to generate sample from the joint distribution $ F $."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f85ef4ae",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# marginal distribution\n",
"xp = np.sum(x[0, :] == xs[0])/1_000_000\n",
"yp = np.sum(x[1, :] == ys[0])/1_000_000\n",
"\n",
"# print output\n",
"print(\"marginal distribution for x\")\n",
"xmtb = pt.PrettyTable()\n",
"xmtb.field_names = ['x_value', 'x_prob']\n",
"xmtb.add_row([xs[0], xp])\n",
"xmtb.add_row([xs[1], 1-xp])\n",
"print(xmtb)\n",
"\n",
"print(\"\\nmarginal distribution for y\")\n",
"ymtb = pt.PrettyTable()\n",
"ymtb.field_names = ['y_value', 'y_prob']\n",
"ymtb.add_row([ys[0], yp])\n",
"ymtb.add_row([ys[1], 1-yp])\n",
"print(ymtb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a9c31f0c",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# conditional distributions\n",
"xc1 = x[0, x[1, :] == ys[0]]\n",
"xc2 = x[0, x[1, :] == ys[1]]\n",
"yc1 = x[1, x[0, :] == xs[0]]\n",
"yc2 = x[1, x[0, :] == xs[1]]\n",
"\n",
"xc1p = np.sum(xc1 == xs[0])/len(xc1)\n",
"xc2p = np.sum(xc2 == xs[0])/len(xc2)\n",
"yc1p = np.sum(yc1 == ys[0])/len(yc1)\n",
"yc2p = np.sum(yc2 == ys[0])/len(yc2)\n",
"\n",
"# print output\n",
"print(\"conditional distribution for x\")\n",
"xctb = pt.PrettyTable()\n",
"xctb.field_names = ['y_value', 'prob(x=0)', 'prob(x=1)']\n",
"xctb.add_row([ys[0], xc1p, 1-xc1p])\n",
"xctb.add_row([ys[1], xc2p, 1-xc2p])\n",
"print(xctb)\n",
"\n",
"print(\"\\nconditional distribution for y\")\n",
"yctb = pt.PrettyTable()\n",
"yctb.field_names = ['x_value', 'prob(y=10)', 'prob(y=20)']\n",
"yctb.add_row([xs[0], yc1p, 1-yc1p])\n",
"yctb.add_row([xs[1], yc2p, 1-yc2p])\n",
"print(yctb)"
]
},
{
"cell_type": "markdown",
"id": "81ee43e1",
"metadata": {},
"source": [
"Let’s calculate population marginal and conditional probabilities using matrix algebra.\n",
"\n",
"$$\n",
"\\left[\\begin{array}{cccccc}\n",
"\\ & \\vdots & y_{1} & y_{2} & \\vdots & x\\\\\n",
"\\cdots & \\vdots & \\cdots & \\cdots & \\vdots & \\cdots\\\\\n",
"x_{1} & \\vdots & 0.3 & 0.2 & \\vdots & 0.5\\\\\n",
"x_{2} & \\vdots & 0.1 & 0.4 & \\vdots & 0.5\\\\\n",
"\\cdots & \\vdots & \\cdots & \\cdots & \\vdots & \\cdots\\\\\n",
"y & \\vdots & 0.4 & 0.6 & \\vdots & 1\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"$ \\implies $\n",
"\n",
"(1) Marginal distribution:\n",
"\n",
"$$\n",
"\\left[\\begin{array}{cccccc}\n",
"var & \\vdots & var_1 & var_2 \\\\\n",
"\\cdots & \\vdots & \\cdots & \\cdots \\\\\n",
"x & \\vdots & 0.5 & 0.5 \\\\\n",
"\\cdots & \\vdots & \\cdots & \\cdots \\\\\n",
"y & \\vdots & 0.4 & 0.6 \\\\\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"(2) Conditional distribution:\n",
"\n",
"$$\n",
"\\left[\\begin{array}{cccccc}\n",
"\\quad x & \\vdots & \\quad x_1 & \\quad x_2 \\\\\n",
"\\cdots\\cdots\\cdots & \\vdots & \\cdots\\cdots\\cdots & \\cdots\\cdots\\cdots \\\\\n",
"y=y_1 & \\vdots & \\frac{0.3}{0.4}=0.75 & \\frac{0.1}{0.4}=0.25 \\\\\n",
"\\cdots\\cdots\\cdots & \\vdots & \\cdots\\cdots\\cdots & \\cdots\\cdots\\cdots \\\\\n",
"y=y_2 & \\vdots & \\frac{0.2}{0.6}\\approx 0.33 & \\frac{0.4}{0.6}\\approx0.67 \\\\\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"$$\n",
"\\left[\\begin{array}{cccccc}\n",
"\\quad y & \\vdots & \\quad y_1 & \\quad y_2 \\\\\n",
"\\cdots\\cdots\\cdots & \\vdots & \\cdots\\cdots\\cdots & \\cdots\\cdots\\cdots \\\\\n",
"x=x_1 & \\vdots & \\frac{0.3}{0.5}=0.6 & \\frac{0.2}{0.5}=0.4 \\\\\n",
"\\cdots\\cdots\\cdots & \\vdots & \\cdots\\cdots\\cdots & \\cdots\\cdots\\cdots \\\\\n",
"x=x_2 & \\vdots & \\frac{0.1}{0.5}=0.2 & \\frac{0.4}{0.5}=0.8 \\\\\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"These population objects closely resemble sample counterparts computed above.\n",
"\n",
"Let’s wrap some of the functions we have used in a Python class for a general discrete bivariate joint distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "33add4d0",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"class discrete_bijoint:\n",
"\n",
" def __init__(self, f, xs, ys):\n",
" '''initialization\n",
" -----------------\n",
" parameters:\n",
" f: the bivariate joint probability matrix\n",
" xs: values of x vector\n",
" ys: values of y vector\n",
" '''\n",
" self.f, self.xs, self.ys = f, xs, ys\n",
"\n",
" def joint_tb(self):\n",
" '''print the joint distribution table'''\n",
" xs = self.xs\n",
" ys = self.ys\n",
" f = self.f\n",
" jtb = pt.PrettyTable()\n",
" jtb.field_names = ['x_value/y_value', *ys, 'marginal sum for x']\n",
" for i in range(len(xs)):\n",
" jtb.add_row([xs[i], *f[i, :], np.sum(f[i, :])])\n",
" jtb.add_row(['marginal_sum for y', *np.sum(f, 0), np.sum(f)])\n",
" print(\"\\nThe joint probability distribution for x and y\\n\", jtb)\n",
" self.jtb = jtb\n",
"\n",
" def draw(self, n):\n",
" '''draw random numbers\n",
" ----------------------\n",
" parameters:\n",
" n: number of random numbers to draw\n",
" '''\n",
" xs = self.xs\n",
" ys = self.ys\n",
" f_cum = np.cumsum(self.f)\n",
" p = np.random.rand(n)\n",
" x = np.empty([2, p.shape[0]])\n",
" lf = len(f_cum)\n",
" lx = len(xs)-1\n",
" ly = len(ys)-1\n",
" for i in range(lf):\n",
" x[0, p < f_cum[lf-1-i]] = xs[lx]\n",
" x[1, p < f_cum[lf-1-i]] = ys[ly]\n",
" if ly == 0:\n",
" lx -= 1\n",
" ly = len(ys)-1\n",
" else:\n",
" ly -= 1\n",
" self.x = x\n",
" self.n = n\n",
"\n",
" def marg_dist(self):\n",
" '''marginal distribution'''\n",
" x = self.x\n",
" xs = self.xs\n",
" ys = self.ys\n",
" n = self.n\n",
" xmp = [np.sum(x[0, :] == xs[i])/n for i in range(len(xs))]\n",
" ymp = [np.sum(x[1, :] == ys[i])/n for i in range(len(ys))]\n",
"\n",
" # print output\n",
" xmtb = pt.PrettyTable()\n",
" ymtb = pt.PrettyTable()\n",
" xmtb.field_names = ['x_value', 'x_prob']\n",
" ymtb.field_names = ['y_value', 'y_prob']\n",
" for i in range(max(len(xs), len(ys))):\n",
" if i < len(xs):\n",
" xmtb.add_row([xs[i], xmp[i]])\n",
" if i < len(ys):\n",
" ymtb.add_row([ys[i], ymp[i]])\n",
" xmtb.add_row(['sum', np.sum(xmp)])\n",
" ymtb.add_row(['sum', np.sum(ymp)])\n",
" print(\"\\nmarginal distribution for x\\n\", xmtb)\n",
" print(\"\\nmarginal distribution for y\\n\", ymtb)\n",
"\n",
" self.xmp = xmp\n",
" self.ymp = ymp\n",
"\n",
" def cond_dist(self):\n",
" '''conditional distribution'''\n",
" x = self.x\n",
" xs = self.xs\n",
" ys = self.ys\n",
" n = self.n\n",
" xcp = np.empty([len(ys), len(xs)])\n",
" ycp = np.empty([len(xs), len(ys)])\n",
" for i in range(max(len(ys), len(xs))):\n",
" if i < len(ys):\n",
" xi = x[0, x[1, :] == ys[i]]\n",
" idx = xi.reshape(len(xi), 1) == xs.reshape(1, len(xs))\n",
" xcp[i, :] = np.sum(idx, 0)/len(xi)\n",
" if i < len(xs):\n",
" yi = x[1, x[0, :] == xs[i]]\n",
" idy = yi.reshape(len(yi), 1) == ys.reshape(1, len(ys))\n",
" ycp[i, :] = np.sum(idy, 0)/len(yi)\n",
"\n",
" # print output\n",
" xctb = pt.PrettyTable()\n",
" yctb = pt.PrettyTable()\n",
" xctb.field_names = ['x_value', *xs, 'sum']\n",
" yctb.field_names = ['y_value', *ys, 'sum']\n",
" for i in range(max(len(xs), len(ys))):\n",
" if i < len(ys):\n",
" xctb.add_row([ys[i], *xcp[i], np.sum(xcp[i])])\n",
" if i < len(xs):\n",
" yctb.add_row([xs[i], *ycp[i], np.sum(ycp[i])])\n",
" print(\"\\nconditional distribution for x\\n\", xctb)\n",
" print(\"\\nconditional distribution for y\\n\", yctb)\n",
"\n",
" self.xcp = xcp\n",
" self.xyp = ycp"
]
},
{
"cell_type": "markdown",
"id": "b289ff26",
"metadata": {},
"source": [
"Let’s apply our code to some examples.\n",
"\n",
"**Example 1**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "259d349d",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# joint\n",
"d = discrete_bijoint(f, xs, ys)\n",
"d.joint_tb()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c5fca877",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# sample marginal\n",
"d.draw(1_000_000)\n",
"d.marg_dist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "00651c47",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# sample conditional\n",
"d.cond_dist()"
]
},
{
"cell_type": "markdown",
"id": "d37097f6",
"metadata": {},
"source": [
"**Example 2**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bf948b68",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"xs_new = np.array([10, 20, 30])\n",
"ys_new = np.array([1, 2])\n",
"f_new = np.array([[0.2, 0.1], [0.1, 0.3], [0.15, 0.15]])\n",
"d_new = discrete_bijoint(f_new, xs_new, ys_new)\n",
"d_new.joint_tb()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ede7104",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"d_new.draw(1_000_000)\n",
"d_new.marg_dist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cc71bcaf",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"d_new.cond_dist()"
]
},
{
"cell_type": "markdown",
"id": "6193f878",
"metadata": {},
"source": [
"## A Continuous Bivariate Random Vector\n",
"\n",
"A two-dimensional Gaussian distribution has joint density\n",
"\n",
"$$\n",
"f(x,y) =(2\\pi\\sigma_1\\sigma_2\\sqrt{1-\\rho^2})^{-1}\\exp\\left[-\\frac{1}{2(1-\\rho^2)}\\left(\\frac{(x-\\mu_1)^2}{\\sigma_1^2}-\\frac{2\\rho(x-\\mu_1)(y-\\mu_2)}{\\sigma_1\\sigma_2}+\\frac{(y-\\mu_2)^2}{\\sigma_2^2}\\right)\\right]\n",
"$$\n",
"\n",
"$$\n",
"\\frac{1}{2\\pi\\sigma_1\\sigma_2\\sqrt{1-\\rho^2}}\\exp\\left[-\\frac{1}{2(1-\\rho^2)}\\left(\\frac{(x-\\mu_1)^2}{\\sigma_1^2}-\\frac{2\\rho(x-\\mu_1)(y-\\mu_2)}{\\sigma_1\\sigma_2}+\\frac{(y-\\mu_2)^2}{\\sigma_2^2}\\right)\\right]\n",
"$$\n",
"\n",
"We start with a bivariate normal distribution pinned down by\n",
"\n",
"$$\n",
"\\mu=\\left[\\begin{array}{c}\n",
"0\\\\\n",
"5\n",
"\\end{array}\\right],\\quad\\Sigma=\\left[\\begin{array}{cc}\n",
"5 & .2\\\\\n",
".2 & 1\n",
"\\end{array}\\right]\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "671792c6",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# define the joint probability density function\n",
"def func(x, y, μ1=0, μ2=5, σ1=np.sqrt(5), σ2=np.sqrt(1), ρ=.2/np.sqrt(5*1)):\n",
" A = (2 * np.pi * σ1 * σ2 * np.sqrt(1 - ρ**2))**(-1)\n",
" B = -1 / 2 / (1 - ρ**2)\n",
" C1 = (x - μ1)**2 / σ1**2\n",
" C2 = 2 * ρ * (x - μ1) * (y - μ2) / σ1 / σ2\n",
" C3 = (y - μ2)**2 / σ2**2\n",
" return A * np.exp(B * (C1 - C2 + C3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "39728410",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"μ1 = 0\n",
"μ2 = 5\n",
"σ1 = np.sqrt(5)\n",
"σ2 = np.sqrt(1)\n",
"ρ = .2 / np.sqrt(5 * 1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abcb37e8",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"x = np.linspace(-10, 10, 1_000)\n",
"y = np.linspace(-10, 10, 1_000)\n",
"x_mesh, y_mesh = np.meshgrid(x, y, indexing=\"ij\")"
]
},
{
"cell_type": "markdown",
"id": "3180e32a",
"metadata": {},
"source": [
"**Joint Distribution**\n",
"\n",
"Let’s plot the **population** joint density."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f04e4b85",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# %matplotlib notebook\n",
"\n",
"fig = plt.figure()\n",
"ax = plt.axes(projection='3d')\n",
"\n",
"surf = ax.plot_surface(x_mesh, y_mesh, func(x_mesh, y_mesh), cmap='viridis')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "74e604f8",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# %matplotlib notebook\n",
"\n",
"fig = plt.figure()\n",
"ax = plt.axes(projection='3d')\n",
"\n",
"curve = ax.contour(x_mesh, y_mesh, func(x_mesh, y_mesh), zdir='x')\n",
"plt.ylabel('y')\n",
"ax.set_zlabel('f')\n",
"ax.set_xticks([])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f7398a46",
"metadata": {},
"source": [
"Next we can simulate from a built-in `numpy` function and calculate a **sample** marginal distribution from the sample mean and variance."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d681a02",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"μ= np.array([0, 5])\n",
"σ= np.array([[5, .2], [.2, 1]])\n",
"n = 1_000_000\n",
"data = np.random.multivariate_normal(μ, σ, n)\n",
"x = data[:, 0]\n",
"y = data[:, 1]"
]
},
{
"cell_type": "markdown",
"id": "584032e1",
"metadata": {},
"source": [
"**Marginal distribution**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1c00795e",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.hist(x, bins=1_000, alpha=0.6)\n",
"μx_hat, σx_hat = np.mean(x), np.std(x)\n",
"print(μx_hat, σx_hat)\n",
"x_sim = np.random.normal(μx_hat, σx_hat, 1_000_000)\n",
"plt.hist(x_sim, bins=1_000, alpha=0.4, histtype=\"step\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eae460e4",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"plt.hist(y, bins=1_000, density=True, alpha=0.6)\n",
"μy_hat, σy_hat = np.mean(y), np.std(y)\n",
"print(μy_hat, σy_hat)\n",
"y_sim = np.random.normal(μy_hat, σy_hat, 1_000_000)\n",
"plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype=\"step\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "5c994e8b",
"metadata": {},
"source": [
"**Conditional distribution**\n",
"\n",
"The population conditional distribution is\n",
"\n",
"$$\n",
"\\begin{aligned} \\\\\n",
"[X|Y &= y ]\\sim \\mathbb{N}\\bigg[\\mu_X+\\rho\\sigma_X\\frac{y-\\mu_Y}{\\sigma_Y},\\sigma_X^2(1-\\rho^2)\\bigg] \\\\\n",
"[Y|X &= x ]\\sim \\mathbb{N}\\bigg[\\mu_Y+\\rho\\sigma_Y\\frac{x-\\mu_X}{\\sigma_X},\\sigma_Y^2(1-\\rho^2)\\bigg]\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Let’s approximate the joint density by discretizing and mapping the approximating joint density into a matrix.\n",
"\n",
"We can compute the discretized marginal density by just using matrix algebra and noting that\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i|Y=j\\}=\\frac{f_{ij}}{\\sum_{i}f_{ij}}\n",
"$$\n",
"\n",
"Fix $ y=0 $."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "489afa25",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# discretized marginal density\n",
"x = np.linspace(-10, 10, 1_000_000)\n",
"z = func(x, y=0) / np.sum(func(x, y=0))\n",
"plt.plot(x, z)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "5901c87e",
"metadata": {},
"source": [
"The mean and variance are computed by\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\mathbb{E}\\left[X\\vert Y=j\\right] & =\\sum_{i}iProb\\{X=i\\vert Y=j\\}=\\sum_{i}i\\frac{f_{ij}}{\\sum_{i}f_{ij}} \\\\\n",
"\\mathbb{D}\\left[X\\vert Y=j\\right] &=\\sum_{i}\\left(i-\\mu_{X\\vert Y=j}\\right)^{2}\\frac{f_{ij}}{\\sum_{i}f_{ij}}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Let’s draw from a normal distribution with above mean and variance and check how accurate our approximation is."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "aded2197",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# discretized mean\n",
"μx = np.dot(x, z)\n",
"\n",
"# discretized standard deviation\n",
"σx = np.sqrt(np.dot((x - μx)**2, z))\n",
"\n",
"# sample\n",
"zz = np.random.normal(μx, σx, 1_000_000)\n",
"plt.hist(zz, bins=300, density=True, alpha=0.3, range=[-10, 10])\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "ca34f42f",
"metadata": {},
"source": [
"Fix $ x=1 $."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "98f27449",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"y = np.linspace(0, 10, 1_000_000)\n",
"z = func(x=1, y=y) / np.sum(func(x=1, y=y))\n",
"plt.plot(y,z)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5ef3a09",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# discretized mean and standard deviation\n",
"μy = np.dot(y,z)\n",
"σy = np.sqrt(np.dot((y - μy)**2, z))\n",
"\n",
"# sample\n",
"zz = np.random.normal(μy,σy,1_000_000)\n",
"plt.hist(zz, bins=100, density=True, alpha=0.3)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9b67f1bd",
"metadata": {},
"source": [
"We compare with the analytically computed parameters and note that they are close."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1f1e41f2",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"print(μx, σx)\n",
"print(μ1 + ρ * σ1 * (0 - μ2) / σ2, np.sqrt(σ1**2 * (1 - ρ**2)))\n",
"\n",
"print(μy, σy)\n",
"print(μ2 + ρ * σ2 * (1 - μ1) / σ1, np.sqrt(σ2**2 * (1 - ρ**2)))"
]
},
{
"cell_type": "markdown",
"id": "77651705",
"metadata": {},
"source": [
"## Sum of Two Independently Distributed Random Variables\n",
"\n",
"Let $ X, Y $ be two independent discrete random variables that take values in $ \\bar{X}, \\bar{Y} $, respectively.\n",
"\n",
"Define a new random variable $ Z=X+Y $.\n",
"\n",
"Evidently, $ Z $ takes values from $ \\bar{Z} $ defined as follows:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\bar{X} & =\\{0,1,\\ldots,I-1\\};\\qquad f_i= \\textrm{Prob} \\{X=i\\}\\\\\n",
"\\bar{Y} & =\\{0,1,\\ldots,J-1\\};\\qquad g_j= \\textrm{Prob}\\{Y=j\\}\\\\\n",
"\\bar{Z}& =\\{0,1,\\ldots,I+J-2\\};\\qquad h_k= \\textrm{Prob} \\{X+Y=k\\}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Independence of $ X $ and $ Y $ implies that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"h_k & =\\textrm{Prob}\\{X=0,Y=k\\}+\\textrm{Prob}\\{X=1,Y=k-1\\}+\\ldots+\\textrm{Prob}\\{X=k,Y=0\\}\\\\\n",
"h_k& =f_0g_k+f_1g_{k-1}+\\ldots+f_{k-1}g_1+f_kg_0 \\qquad \\text{for}\\quad k=0,1,\\ldots,I+J-2\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Thus, we have:\n",
"\n",
"$$\n",
"h_k=\\sum_{i=0}^{k} f_ig_{k-i} \\equiv f*g\n",
"$$\n",
"\n",
"where $ f * g $ denotes the **convolution** of the $ f $ and $ g $ sequences.\n",
"\n",
"Similarly, for two random variables $ X,Y $ with densities $ f_{X}, g_{Y} $, the density of $ Z=X+Y $ is\n",
"\n",
"$$\n",
"f_{Z}(z)=\\int_{-\\infty}^{\\infty} f_{X}(x) f_{Y}(z-x) dx \\equiv f_{X}*g_{Y}\n",
"$$\n",
"\n",
"where $ f_{X}*g_{Y} $ denotes the **convolution** of the $ f_X $ and $ g_Y $ functions."
]
},
{
"cell_type": "markdown",
"id": "8c6695c4",
"metadata": {},
"source": [
"## Transition Probability Matrix\n",
"\n",
"Consider the following joint probability distribution of two random variables.\n",
"\n",
"Let $ X,Y $ be discrete random variables with joint distribution\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{X=i,Y=j\\} = \\rho_{ij}\n",
"$$\n",
"\n",
"where $ i = 0,\\dots,I-1; j = 0,\\dots,J-1 $ and\n",
"\n",
"$$\n",
"\\sum_i\\sum_j \\rho_{ij} = 1, \\quad \\rho_{ij} \\geqslant 0.\n",
"$$\n",
"\n",
"An associated conditional distribution is\n",
"\n",
"$$\n",
"\\textrm{Prob}\\{Y=i\\vert X=j\\} = \\frac{\\rho_{ij}}{ \\sum_{i}\\rho_{ij}}\n",
"= \\frac{\\textrm{Prob}\\{Y=j, X=i\\}}{\\textrm{Prob}\\{ X=i\\}}\n",
"$$\n",
"\n",
"We can define a transition probability matrix\n",
"\n",
"$$\n",
"p_{ij}=\\textrm{Prob}\\{Y=j|X=i\\}= \\frac{\\rho_{ij}}{ \\sum_{j}\\rho_{ij}}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\left[\n",
" \\begin{matrix}\n",
" p_{11} & p_{12}\\\\\n",
" p_{21} & p_{22}\n",
" \\end{matrix}\n",
"\\right]\n",
"$$\n",
"\n",
"The first row is the probability of $ Y=j, j=0,1 $ conditional on $ X=0 $.\n",
"\n",
"The second row is the probability of $ Y=j, j=0,1 $ conditional on $ X=1 $.\n",
"\n",
"Note that\n",
"\n",
"- $ \\sum_{j}\\rho_{ij}= \\frac{ \\sum_{j}\\rho_{ij}}{ \\sum_{j}\\rho_{ij}}=1 $, so each row of $ \\rho $ is a probability distribution (not so for each column. "
]
},
{
"cell_type": "markdown",
"id": "6c9de071",
"metadata": {},
"source": [
"## Coupling\n",
"\n",
"Start with a joint distribution\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"f_{ij} & =\\textrm{Prob}\\{X=i,Y=j\\}\\\\\n",
"i& =0, \\cdots,I-1\\\\\n",
"j& =0, \\cdots,J-1\\\\\n",
"& \\text{stacked to an }I×J\\text{ matrix}\\\\\n",
"& e.g. \\quad I=1, J=1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\left[\n",
" \\begin{matrix}\n",
" f_{11} & f_{12}\\\\\n",
" f_{21} & f_{22}\n",
" \\end{matrix}\n",
"\\right]\n",
"$$\n",
"\n",
"From the joint distribution, we have shown above that we obtain **unique** marginal distributions.\n",
"\n",
"Now we’ll try to go in a reverse direction.\n",
"\n",
"We’ll find that from two marginal distributions, can we usually construct more than one joint distribution that verifies these marginals.\n",
"\n",
"Each of these joint distributions is called a **coupling** of the two marginal distributions.\n",
"\n",
"Let’s start with marginal distributions\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{Prob} \\{X=i\\} &= \\sum_{j}f_{ij}=\\mu_{i}, i=0, \\cdots, I-1\\\\\n",
"\\text{Prob} \\{Y=j\\}&= \\sum_{j}f_{ij}=\\nu_{j}, j=0, \\cdots, J-1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Given two marginal distribution, $ \\mu $ for $ X $ and $ \\nu $ for $ Y $, a joint distribution $ f_{ij} $ is said to be a **coupling** of $ \\mu $ and $ \\nu $.\n",
"\n",
"**Example:**\n",
"\n",
"Consider the following bivariate example.\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{Prob} \\{X=0\\}= & 1-q =\\mu_{0}\\\\\n",
"\\text{Prob} \\{X=1\\}=& q =\\mu_{1}\\\\\n",
"\\text{Prob} \\{Y=0\\}=& 1-r =\\nu_{0}\\\\\n",
"\\text{Prob} \\{Y=1\\}= & r =\\nu_{1}\\\\\n",
"\\text{where } 0 \\leq q < r \\leq 1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"We construct two couplings.\n",
"\n",
"The first coupling if our two marginal distributions is the joint distribution\n",
"\n",
"$$\n",
"f_{ij}=\n",
"\\left[\n",
" \\begin{matrix}\n",
" (1-q)(1-r)& (1-q)r\\\\\n",
" q(1-r) & qr\\\\\n",
" \\end{matrix}\n",
"\\right]\n",
"$$\n",
"\n",
"To verify that it is a coupling, we check that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"(1-q)(1-r)+(1-q)r+q(1-r)+qr &=1\\\\\n",
"\\mu_{0}= (1-q)(1-r)+(1-q)r & =1-q\\\\\n",
"\\mu_{1}= q(1-r)+qr & =q\\\\\n",
"\\nu_{0}= (1-q)(1-r)+(1-r)q& =1-r\\\\\n",
"\\mu_{1}= r(1-q)+qr& =r\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"A second coupling of our two marginal distributions is the joint distribution\n",
"\n",
"$$\n",
"f_{ij}=\n",
"\\left[\n",
" \\begin{matrix}\n",
"(1-r)&r-q\\\\\n",
"0 & q\\\\\n",
" \\end{matrix}\n",
"\\right]\n",
"$$\n",
"\n",
"The verify that this is a coupling, note that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"1-r+r-q+q &=1\\\\\n",
"\\mu_{0}& = 1-q\\\\\n",
"\\mu_{1}& = q\\\\\n",
"\\nu_{0}& = 1-r\\\\\n",
"\\nu_{1}& = r\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Thus, our two proposed joint distributions have the same marginal distributions.\n",
"\n",
"But the joint distributions differ.\n",
"\n",
"Thus, multiple joint distributions $ [f_{ij}] $ can have the same marginals.\n",
"\n",
"**Remark:**\n",
"\n",
"- Couplings are important in optimal transport problems and in Markov processes. "
]
},
{
"cell_type": "markdown",
"id": "8a81b8c6",
"metadata": {},
"source": [
"## Copula Functions\n",
"\n",
"Suppose that $ X_1, X_2, \\dots, X_n $ are $ N $ random variables and that\n",
"\n",
"- their marginal distributions are $ F_1(x_1), F_2(x_2),\\dots, F_N(x_N) $, and \n",
"- their joint distribution is $ H(x_1,x_2,\\dots,x_N) $ \n",
"\n",
"\n",
"Then there exists a **copula function** $ C(\\cdot) $ that verifies\n",
"\n",
"$$\n",
"H(x_1,x_2,\\dots,x_N) = C(F_1(x_1), F_2(x_2),\\dots,F_N(x_N)).\n",
"$$\n",
"\n",
"We can obtain\n",
"\n",
"$$\n",
"C(u_1,u_2,\\dots,u_n) = H[F^{-1}_1(u_1),F^{-1}_2(u_2),\\dots,F^{-1}_N(u_N)]\n",
"$$\n",
"\n",
"In a reverse direction of logic, given univariate **marginal distributions**\n",
"$ F_1(x_1), F_2(x_2),\\dots,F_N(x_N) $ and a copula function $ C(\\cdot) $, the function $ H(x_1,x_2,\\dots,x_N) = C(F_1(x_1), F_2(x_2),\\dots,F_N(x_N)) $ is a **coupling** of $ F_1(x_1), F_2(x_2),\\dots,F_N(x_N) $.\n",
"\n",
"Thus, for given marginal distributions, we can use a copula function to determine a joint distribution when the associated univariate random variables are not independent.\n",
"\n",
"Copula functions are often used to characterize **dependence** of random variables.\n",
"\n",
"**Discrete marginal distribution**\n",
"\n",
"As mentioned above, for two given marginal distributions there can be more than one coupling.\n",
"\n",
"For example, consider two random variables $ X, Y $ with distributions\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{Prob}(X = 0)& = 0.6,\\\\\n",
"\\text{Prob}(X = 1) &= 0.4,\\\\\n",
"\\text{Prob}(Y = 0)& = 0.3,\\\\\n",
"\\text{Prob}(Y = 1) &= 0.7,\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"For these two random variables there can be more than one coupling.\n",
"\n",
"Let’s first generate X and Y."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11c4288d",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# define parameters\n",
"mu = np.array([0.6, 0.4])\n",
"nu = np.array([0.3, 0.7])\n",
"\n",
"# number of draws\n",
"draws = 1_000_000\n",
"\n",
"# generate draws from uniform distribution\n",
"p = np.random.rand(draws)\n",
"\n",
"# generate draws of X and Y via uniform distribution\n",
"x = np.ones(draws)\n",
"y = np.ones(draws)\n",
"x[p <= mu[0]] = 0\n",
"x[p > mu[0]] = 1\n",
"y[p <= nu[0]] = 0\n",
"y[p > nu[0]] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ccb885e4",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# calculate parameters from draws\n",
"q_hat = sum(x[x == 1])/draws\n",
"r_hat = sum(y[y == 1])/draws\n",
"\n",
"# print output\n",
"print(\"distribution for x\")\n",
"xmtb = pt.PrettyTable()\n",
"xmtb.field_names = ['x_value', 'x_prob']\n",
"xmtb.add_row([0, 1-q_hat])\n",
"xmtb.add_row([1, q_hat])\n",
"print(xmtb)\n",
"\n",
"print(\"distribution for y\")\n",
"ymtb = pt.PrettyTable()\n",
"ymtb.field_names = ['y_value', 'y_prob']\n",
"ymtb.add_row([0, 1-r_hat])\n",
"ymtb.add_row([1, r_hat])\n",
"print(ymtb)"
]
},
{
"cell_type": "markdown",
"id": "1bd08ca9",
"metadata": {},
"source": [
"Let’s now take our two marginal distributions, one for $ X $, the other for $ Y $, and construct two distinct couplings.\n",
"\n",
"For the first joint distribution:\n",
"\n",
"$$\n",
"\\textrm{Prob}(X=i,Y=j) = f_{ij}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"[f_{ij}] = \\left[\\begin{array}{cc}\n",
"0.18 & 0.42\\\\\n",
"0.12 & 0.28\n",
"\\end{array}\\right]\n",
"$$\n",
"\n",
"Let’s use Python to construct this joint distribution and then verify that its marginal distributions are what we want."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff74ba96",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# define parameters\n",
"f1 = np.array([[0.18, 0.42], [0.12, 0.28]])\n",
"f1_cum = np.cumsum(f1)\n",
"\n",
"# number of draws\n",
"draws1 = 1_000_000\n",
"\n",
"# generate draws from uniform distribution\n",
"p = np.random.rand(draws1)\n",
"\n",
"# generate draws of first copuling via uniform distribution\n",
"c1 = np.vstack([np.ones(draws1), np.ones(draws1)])\n",
"# X=0, Y=0\n",
"c1[0, p <= f1_cum[0]] = 0\n",
"c1[1, p <= f1_cum[0]] = 0\n",
"# X=0, Y=1\n",
"c1[0, (p > f1_cum[0])*(p <= f1_cum[1])] = 0\n",
"c1[1, (p > f1_cum[0])*(p <= f1_cum[1])] = 1\n",
"# X=1, Y=0\n",
"c1[0, (p > f1_cum[1])*(p <= f1_cum[2])] = 1\n",
"c1[1, (p > f1_cum[1])*(p <= f1_cum[2])] = 0\n",
"# X=1, Y=1\n",
"c1[0, (p > f1_cum[2])*(p <= f1_cum[3])] = 1\n",
"c1[1, (p > f1_cum[2])*(p <= f1_cum[3])] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edc57ecc",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# calculate parameters from draws\n",
"f1_00 = sum((c1[0, :] == 0)*(c1[1, :] == 0))/draws1\n",
"f1_01 = sum((c1[0, :] == 0)*(c1[1, :] == 1))/draws1\n",
"f1_10 = sum((c1[0, :] == 1)*(c1[1, :] == 0))/draws1\n",
"f1_11 = sum((c1[0, :] == 1)*(c1[1, :] == 1))/draws1\n",
"\n",
"# print output of first joint distribution\n",
"print(\"first joint distribution for c1\")\n",
"c1_mtb = pt.PrettyTable()\n",
"c1_mtb.field_names = ['c1_x_value', 'c1_y_value', 'c1_prob']\n",
"c1_mtb.add_row([0, 0, f1_00])\n",
"c1_mtb.add_row([0, 1, f1_01])\n",
"c1_mtb.add_row([1, 0, f1_10])\n",
"c1_mtb.add_row([1, 1, f1_11])\n",
"print(c1_mtb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be2be6ea",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# calculate parameters from draws\n",
"c1_q_hat = sum(c1[0, :] == 1)/draws1\n",
"c1_r_hat = sum(c1[1, :] == 1)/draws1\n",
"\n",
"# print output\n",
"print(\"marginal distribution for x\")\n",
"c1_x_mtb = pt.PrettyTable()\n",
"c1_x_mtb.field_names = ['c1_x_value', 'c1_x_prob']\n",
"c1_x_mtb.add_row([0, 1-c1_q_hat])\n",
"c1_x_mtb.add_row([1, c1_q_hat])\n",
"print(c1_x_mtb)\n",
"\n",
"print(\"marginal distribution for y\")\n",
"c1_ymtb = pt.PrettyTable()\n",
"c1_ymtb.field_names = ['c1_y_value', 'c1_y_prob']\n",
"c1_ymtb.add_row([0, 1-c1_r_hat])\n",
"c1_ymtb.add_row([1, c1_r_hat])\n",
"print(c1_ymtb)"
]
},
{
"cell_type": "markdown",
"id": "faeb35f0",
"metadata": {},
"source": [
"Now, let’s construct another joint distribution that is also a coupling of $ X $ and $ Y $\n",
"\n",
"$$\n",
"[f_{ij}] = \\left[\\begin{array}{cc}\n",
"0.3 & 0.3\\\\\n",
"0 & 0.4\n",
"\\end{array}\\right]\n",
"$$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "37171614",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# define parameters\n",
"f2 = np.array([[0.3, 0.3], [0, 0.4]])\n",
"f2_cum = np.cumsum(f2)\n",
"\n",
"# number of draws\n",
"draws2 = 1_000_000\n",
"\n",
"# generate draws from uniform distribution\n",
"p = np.random.rand(draws2)\n",
"\n",
"# generate draws of first coupling via uniform distribution\n",
"c2 = np.vstack([np.ones(draws2), np.ones(draws2)])\n",
"# X=0, Y=0\n",
"c2[0, p <= f2_cum[0]] = 0\n",
"c2[1, p <= f2_cum[0]] = 0\n",
"# X=0, Y=1\n",
"c2[0, (p > f2_cum[0])*(p <= f2_cum[1])] = 0\n",
"c2[1, (p > f2_cum[0])*(p <= f2_cum[1])] = 1\n",
"# X=1, Y=0\n",
"c2[0, (p > f2_cum[1])*(p <= f2_cum[2])] = 1\n",
"c2[1, (p > f2_cum[1])*(p <= f2_cum[2])] = 0\n",
"# X=1, Y=1\n",
"c2[0, (p > f2_cum[2])*(p <= f2_cum[3])] = 1\n",
"c2[1, (p > f2_cum[2])*(p <= f2_cum[3])] = 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1ad6ebb0",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# calculate parameters from draws\n",
"f2_00 = sum((c2[0, :] == 0)*(c2[1, :] == 0))/draws2\n",
"f2_01 = sum((c2[0, :] == 0)*(c2[1, :] == 1))/draws2\n",
"f2_10 = sum((c2[0, :] == 1)*(c2[1, :] == 0))/draws2\n",
"f2_11 = sum((c2[0, :] == 1)*(c2[1, :] == 1))/draws2\n",
"\n",
"# print output of second joint distribution\n",
"print(\"first joint distribution for c2\")\n",
"c2_mtb = pt.PrettyTable()\n",
"c2_mtb.field_names = ['c2_x_value', 'c2_y_value', 'c2_prob']\n",
"c2_mtb.add_row([0, 0, f2_00])\n",
"c2_mtb.add_row([0, 1, f2_01])\n",
"c2_mtb.add_row([1, 0, f2_10])\n",
"c2_mtb.add_row([1, 1, f2_11])\n",
"print(c2_mtb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "369bf56b",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"# calculate parameters from draws\n",
"c2_q_hat = sum(c2[0, :] == 1)/draws2\n",
"c2_r_hat = sum(c2[1, :] == 1)/draws2\n",
"\n",
"# print output\n",
"print(\"marginal distribution for x\")\n",
"c2_x_mtb = pt.PrettyTable()\n",
"c2_x_mtb.field_names = ['c2_x_value', 'c2_x_prob']\n",
"c2_x_mtb.add_row([0, 1-c2_q_hat])\n",
"c2_x_mtb.add_row([1, c2_q_hat])\n",
"print(c2_x_mtb)\n",
"\n",
"print(\"marginal distribution for y\")\n",
"c2_ymtb = pt.PrettyTable()\n",
"c2_ymtb.field_names = ['c2_y_value', 'c2_y_prob']\n",
"c2_ymtb.add_row([0, 1-c2_r_hat])\n",
"c2_ymtb.add_row([1, c2_r_hat])\n",
"print(c2_ymtb)"
]
},
{
"cell_type": "markdown",
"id": "c2737267",
"metadata": {},
"source": [
"We have verified that both joint distributions, $ c_1 $ and $ c_2 $, have identical marginal distributions of $ X $ and $ Y $, respectively.\n",
"\n",
"So they are both couplings of $ X $ and $ Y $."
]
},
{
"cell_type": "markdown",
"id": "83561614",
"metadata": {},
"source": [
"## Time Series\n",
"\n",
"Suppose that there are two time periods.\n",
"\n",
"- $ t=0 $ “today” \n",
"- $ t=1 $ “tomorrow” \n",
"\n",
"\n",
"Let $ X(0) $ be a random variable to be realized at $ t=0 $, $ X(1) $ be a random variable to be realized at $ t=1 $.\n",
"\n",
"Suppose that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"\\text{Prob} \\{X(0)=i,X(1)=j\\} &=f_{ij}≥0,i=0,\\cdots,I-1\\\\\n",
"\\sum_{i}\\sum_{j}f_{ij}&=1\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"$ f_{ij} $ is a joint distribution over $ [X(0), X(1)] $.\n",
"\n",
"A conditional distribution is\n",
"\n",
"$$\n",
"\\text{Prob} \\{X(1)=j|X(0)=i\\}= \\frac{f_{ij}}{ \\sum_{j}f_{ij}}\n",
"$$\n",
"\n",
"**Remark:**\n",
"\n",
"- This is a key formula for a theory of optimally predicting a time series. "
]
}
],
"metadata": {
"date": 1710733975.5642238,
"filename": "prob_matrix.md",
"kernelspec": {
"display_name": "Python",
"language": "python3",
"name": "python3"
},
"title": "Elementary Probability with Matrices"
},
"nbformat": 4,
"nbformat_minor": 5
}