{
"cells": [
{
"cell_type": "markdown",
"id": "ff050ad3",
"metadata": {},
"source": [
"# Singular Value Decomposition (SVD)"
]
},
{
"cell_type": "markdown",
"id": "7566c7be",
"metadata": {},
"source": [
"## Overview\n",
"\n",
"The **singular value decomposition** (SVD) is a work-horse in applications of least squares projection that\n",
"form foundations for many statistical and machine learning methods.\n",
"\n",
"After defining the SVD, we’ll describe how it connects to\n",
"\n",
"- **four fundamental spaces** of linear algebra \n",
"- under-determined and over-determined **least squares regressions** \n",
"- **principal components analysis** (PCA) \n",
"\n",
"\n",
"Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure that represents salient patterns by projecting data onto a limited set of factors.\n",
"\n",
"In a sequel to this lecture about [Dynamic Mode Decompositions](https://python.quantecon.org/var_dmd.html), we’ll describe how SVD’s provide ways rapidly to compute reduced-order approximations to first-order Vector Autoregressions (VARs)."
]
},
{
"cell_type": "markdown",
"id": "82a26782",
"metadata": {},
"source": [
"## The Setting\n",
"\n",
"Let $ X $ be an $ m \\times n $ matrix of rank $ p $.\n",
"\n",
"Necessarily, $ p \\leq \\min(m,n) $.\n",
"\n",
"In much of this lecture, we’ll think of $ X $ as a matrix of **data** in which\n",
"\n",
"- each column is an **individual** – a time period or person, depending on the application \n",
"- each row is a **random variable** describing an attribute of a time period or a person, depending on the application \n",
"\n",
"\n",
"We’ll be interested in two situations\n",
"\n",
"- A **short and fat** case in which $ m << n $, so that there are many more columns (individuals) than rows (attributes). \n",
"- A **tall and skinny** case in which $ m >> n $, so that there are many more rows (attributes) than columns (individuals). \n",
"\n",
"\n",
"We’ll apply a **singular value decomposition** of $ X $ in both situations.\n",
"\n",
"In the $ m < < n $ case in which there are many more individuals $ n $ than attributes $ m $, we can calculate sample moments of a joint distribution by taking averages across observations of functions of the observations.\n",
"\n",
"In this $ m < < n $ case, we’ll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).\n",
"\n",
"In the $ m > > n $ case in which there are many more attributes $ m $ than individuals $ n $ and when we are in a time-series setting in which $ n $ equals the number of time periods covered in the data set $ X $, we’ll proceed in a different way.\n",
"\n",
"We’ll again use a **singular value decomposition**, but now to construct a **dynamic mode decomposition** (DMD)"
]
},
{
"cell_type": "markdown",
"id": "9ded7f91",
"metadata": {},
"source": [
"## Singular Value Decomposition\n",
"\n",
"A **singular value decomposition** of an $ m \\times n $ matrix $ X $ of rank $ p \\leq \\min(m,n) $ is\n",
"\n",
"\n",
"\n",
"$$\n",
"X = U \\Sigma V^\\top \\tag{5.1}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"UU^\\top & = I & \\quad U^\\top U = I \\cr\n",
"VV^\\top & = I & \\quad V^\\top V = I\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"and\n",
"\n",
"- $ U $ is an $ m \\times m $ orthogonal matrix of **left singular vectors** of $ X $ \n",
"- Columns of $ U $ are eigenvectors of $ X X^\\top $ \n",
"- $ V $ is an $ n \\times n $ orthogonal matrix of **right singular vectors** of $ X $ \n",
"- Columns of $ V $ are eigenvectors of $ X^\\top X $ \n",
"- $ \\Sigma $ is an $ m \\times n $ matrix in which the first $ p $ places on its main diagonal are positive numbers $ \\sigma_1, \\sigma_2, \\ldots, \\sigma_p $ called **singular values**; remaining entries of $ \\Sigma $ are all zero \n",
"- The $ p $ singular values are positive square roots of the eigenvalues of the $ m \\times m $ matrix $ X X^\\top $ and also of the $ n \\times n $ matrix $ X^\\top X $ \n",
"- We adopt a convention that when $ U $ is a complex valued matrix, $ U^\\top $ denotes the **conjugate-transpose** or **Hermitian-transpose** of $ U $, meaning that\n",
" $ U_{ij}^\\top $ is the complex conjugate of $ U_{ji} $. \n",
"- Similarly, when $ V $ is a complex valued matrix, $ V^\\top $ denotes the **conjugate-transpose** or **Hermitian-transpose** of $ V $ \n",
"\n",
"\n",
"The matrices $ U,\\Sigma,V $ entail linear transformations that reshape in vectors in the following ways:\n",
"\n",
"- multiplying vectors by the unitary matrices $ U $ and $ V $ **rotates** them, but leaves **angles between vectors** and **lengths of vectors** unchanged. \n",
"- multiplying vectors by the diagonal matrix $ \\Sigma $ leaves **angles between vectors** unchanged but **rescales** vectors. \n",
"\n",
"\n",
"Thus, representation [(5.1)](#equation-eq-svd101) asserts that multiplying an $ n \\times 1 $ vector $ y $ by the $ m \\times n $ matrix $ X $\n",
"amounts to performing the following three multiplications of $ y $ sequentially:\n",
"\n",
"- **rotating** $ y $ by computing $ V^\\top y $ \n",
"- **rescaling** $ V^\\top y $ by multiplying it by $ \\Sigma $ \n",
"- **rotating** $ \\Sigma V^\\top y $ by multiplying it by $ U $ \n",
"\n",
"\n",
"This structure of the $ m \\times n $ matrix $ X $ opens the door to constructing systems\n",
"of data **encoders** and **decoders**.\n",
"\n",
"Thus,\n",
"\n",
"- $ V^\\top y $ is an encoder \n",
"- $ \\Sigma $ is an operator to be applied to the encoded data \n",
"- $ U $ is a decoder to be applied to the output from applying operator $ \\Sigma $ to the encoded data \n",
"\n",
"\n",
"We’ll apply this circle of ideas later in this lecture when we study Dynamic Mode Decomposition.\n",
"\n",
"**Road Ahead**\n",
"\n",
"What we have described above is called a **full** SVD.\n",
"\n",
"In a **full** SVD, the shapes of $ U $, $ \\Sigma $, and $ V $ are $ \\left(m, m\\right) $, $ \\left(m, n\\right) $, $ \\left(n, n\\right) $, respectively.\n",
"\n",
"Later we’ll also describe an **economy** or **reduced** SVD.\n",
"\n",
"Before we study a **reduced** SVD we’ll say a little more about properties of a **full** SVD."
]
},
{
"cell_type": "markdown",
"id": "9d06d639",
"metadata": {},
"source": [
"## Four Fundamental Subspaces\n",
"\n",
"Let $ {\\mathcal C} $ denote a column space, $ {\\mathcal N} $ denote a null space, and $ {\\mathcal R} $ denote a row space.\n",
"\n",
"Let’s start by recalling the four fundamental subspaces of an $ m \\times n $\n",
"matrix $ X $ of rank $ p $.\n",
"\n",
"- The **column space** of $ X $, denoted $ {\\mathcal C}(X) $, is the span of the columns of $ X $, i.e., all vectors $ y $ that can be written as linear combinations of columns of $ X $. Its dimension is $ p $. \n",
"- The **null space** of $ X $, denoted $ {\\mathcal N}(X) $ consists of all vectors $ y $ that satisfy\n",
" $ X y = 0 $. Its dimension is $ n-p $. \n",
"- The **row space** of $ X $, denoted $ {\\mathcal R}(X) $ is the column space of $ X^\\top $. It consists of all\n",
" vectors $ z $ that can be written as linear combinations of rows of $ X $. Its dimension is $ p $. \n",
"- The **left null space** of $ X $, denoted $ {\\mathcal N}(X^\\top ) $, consist of all vectors $ z $ such that\n",
" $ X^\\top z =0 $. Its dimension is $ m-p $. \n",
"\n",
"\n",
"For a full SVD of a matrix $ X $, the matrix $ U $ of left singular vectors and the matrix $ V $ of right singular vectors contain orthogonal bases for all four subspaces.\n",
"\n",
"They form two pairs of orthogonal subspaces\n",
"that we’ll describe now.\n",
"\n",
"Let $ u_i, i = 1, \\ldots, m $ be the $ m $ column vectors of $ U $ and let\n",
"$ v_i, i = 1, \\ldots, n $ be the $ n $ column vectors of $ V $.\n",
"\n",
"Let’s write the full SVD of X as\n",
"\n",
"\n",
"\n",
"$$\n",
"X = \\begin{bmatrix} U_L & U_R \\end{bmatrix} \\begin{bmatrix} \\Sigma_p & 0 \\cr 0 & 0 \\end{bmatrix}\n",
" \\begin{bmatrix} V_L & V_R \\end{bmatrix}^\\top \\tag{5.2}\n",
"$$\n",
"\n",
"where $ \\Sigma_p $ is a $ p \\times p $ diagonal matrix with the $ p $ singular values on the diagonal and\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"U_L & = \\begin{bmatrix}u_1 & \\cdots & u_p \\end{bmatrix}, \\quad U_R = \\begin{bmatrix}u_{p+1} & \\cdots u_m \\end{bmatrix} \\cr\n",
"V_L & = \\begin{bmatrix}v_1 & \\cdots & v_p \\end{bmatrix} , \\quad U_R = \\begin{bmatrix}v_{p+1} & \\cdots u_n \\end{bmatrix}\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Representation [(5.2)](#equation-eq-fullsvdpartition) implies that\n",
"\n",
"$$\n",
"X \\begin{bmatrix} V_L & V_R \\end{bmatrix} = \\begin{bmatrix} U_L & U_R \\end{bmatrix} \\begin{bmatrix} \\Sigma_p & 0 \\cr 0 & 0 \\end{bmatrix}\n",
"$$\n",
"\n",
"or\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X V_L & = U_L \\Sigma_p \\cr\n",
"X V_R & = 0\n",
"\\end{aligned} \\tag{5.3}\n",
"$$\n",
"\n",
"or\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X v_i & = \\sigma_i u_i , \\quad i = 1, \\ldots, p \\cr\n",
"X v_i & = 0 , \\quad i = p+1, \\ldots, n\n",
"\\end{aligned} \\tag{5.4}\n",
"$$\n",
"\n",
"Equations [(5.4)](#equation-eq-orthoortho1) tell how the transformation $ X $ maps a pair of orthonormal vectors $ v_i, v_j $ for $ i $ and $ j $ both less than or equal to the rank $ p $ of $ X $ into a pair of orthonormal vectors $ u_i, u_j $.\n",
"\n",
"Equations [(5.3)](#equation-eq-xfour1a) assert that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"{\\mathcal C}(X) & = {\\mathcal C}(U_L) \\cr\n",
"{\\mathcal N}(X) & = {\\mathcal C} (V_R)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Taking transposes on both sides of representation [(5.2)](#equation-eq-fullsvdpartition) implies\n",
"\n",
"$$\n",
"X^\\top \\begin{bmatrix} U_L & U_R \\end{bmatrix} = \\begin{bmatrix} V_L & V_R \\end{bmatrix} \\begin{bmatrix} \\Sigma_p & 0 \\cr 0 & 0 \\end{bmatrix}\n",
"$$\n",
"\n",
"or\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X^\\top U_L & = V_L \\Sigma_p \\cr\n",
"X^\\top U_R & = 0\n",
"\\end{aligned} \\tag{5.5}\n",
"$$\n",
"\n",
"or\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X^\\top u_i & = \\sigma_i v_i, \\quad i=1, \\ldots, p \\cr\n",
"X^\\top u_i & = 0 \\quad i= p+1, \\ldots, m\n",
"\\end{aligned} \\tag{5.6}\n",
"$$\n",
"\n",
"Notice how equations [(5.6)](#equation-eq-orthoortho2) assert that the transformation $ X^\\top $ maps a pair of distinct orthonormal vectors $ u_i, u_j $ for $ i $ and $ j $ both less than or equal to the rank $ p $ of $ X $ into a pair of distinct orthonormal vectors $ v_i, v_j $ .\n",
"\n",
"Equations [(5.5)](#equation-eq-xfour1b) assert that\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"{\\mathcal R}(X) & \\equiv {\\mathcal C}(X^\\top ) = {\\mathcal C} (V_L) \\cr\n",
"{\\mathcal N}(X^\\top ) & = {\\mathcal C}(U_R)\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Thus, taken together, the systems of equations [(5.3)](#equation-eq-xfour1a) and [(5.5)](#equation-eq-xfour1b)\n",
"describe the four fundamental subspaces of $ X $ in the following ways:\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"{\\mathcal C}(X) & = {\\mathcal C}(U_L) \\cr\n",
"{\\mathcal N}(X^\\top ) & = {\\mathcal C}(U_R) \\cr\n",
"{\\mathcal R}(X) & \\equiv {\\mathcal C}(X^\\top ) = {\\mathcal C} (V_L) \\cr\n",
"{\\mathcal N}(X) & = {\\mathcal C} (V_R) \\cr\n",
"\n",
"\\end{aligned} \\tag{5.7}\n",
"$$\n",
"\n",
"Since $ U $ and $ V $ are both orthonormal matrices, collection [(5.7)](#equation-eq-fourspacesvd) asserts that\n",
"\n",
"- $ U_L $ is an orthonormal basis for the column space of $ X $ \n",
"- $ U_R $ is an orthonormal basis for the null space of $ X^\\top $ \n",
"- $ V_L $ is an orthonormal basis for the row space of $ X $ \n",
"- $ V_R $ is an orthonormal basis for the null space of $ X $ \n",
"\n",
"\n",
"We have verified the four claims in [(5.7)](#equation-eq-fourspacesvd) simply by performing the multiplications called for by the right side of [(5.2)](#equation-eq-fullsvdpartition) and reading them.\n",
"\n",
"The claims in [(5.7)](#equation-eq-fourspacesvd) and the fact that $ U $ and $ V $ are both unitary (i.e, orthonormal) matrices imply\n",
"that\n",
"\n",
"- the column space of $ X $ is orthogonal to the null space of $ X^\\top $ \n",
"- the null space of $ X $ is orthogonal to the row space of $ X $ \n",
"\n",
"\n",
"Sometimes these properties are described with the following two pairs of orthogonal complement subspaces:\n",
"\n",
"- $ {\\mathcal C}(X) $ is the orthogonal complement of $ {\\mathcal N}(X^\\top ) $ \n",
"- $ {\\mathcal R}(X) $ is the orthogonal complement $ {\\mathcal N}(X) $ \n",
"\n",
"\n",
"Let’s do an example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "05fede75",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import numpy.linalg as LA\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"id": "390047b3",
"metadata": {},
"source": [
"Having imported these modules, let’s do the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "881f9f73",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"np.set_printoptions(precision=2)\n",
"\n",
"# Define the matrix\n",
"A = np.array([[1, 2, 3, 4, 5],\n",
" [2, 3, 4, 5, 6],\n",
" [3, 4, 5, 6, 7],\n",
" [4, 5, 6, 7, 8],\n",
" [5, 6, 7, 8, 9]])\n",
"\n",
"# Compute the SVD of the matrix\n",
"U, S, V = np.linalg.svd(A,full_matrices=True)\n",
"\n",
"# Compute the rank of the matrix\n",
"rank = np.linalg.matrix_rank(A)\n",
"\n",
"# Print the rank of the matrix\n",
"print(\"Rank of matrix:\\n\", rank)\n",
"print(\"S: \\n\", S)\n",
"\n",
"# Compute the four fundamental subspaces\n",
"row_space = U[:, :rank]\n",
"col_space = V[:, :rank]\n",
"null_space = V[:, rank:]\n",
"left_null_space = U[:, rank:]\n",
"\n",
"\n",
"print(\"U:\\n\", U)\n",
"print(\"Column space:\\n\", col_space)\n",
"print(\"Left null space:\\n\", left_null_space)\n",
"print(\"V.T:\\n\", V.T)\n",
"print(\"Row space:\\n\", row_space.T)\n",
"print(\"Right null space:\\n\", null_space.T)"
]
},
{
"cell_type": "markdown",
"id": "f4bbf64d",
"metadata": {},
"source": [
"## Eckart-Young Theorem\n",
"\n",
"Suppose that we want to construct the best rank $ r $ approximation of an $ m \\times n $ matrix $ X $.\n",
"\n",
"By best, we mean a matrix $ X_r $ of rank $ r < p $ that, among all rank $ r $ matrices, minimizes\n",
"\n",
"$$\n",
"|| X - X_r ||\n",
"$$\n",
"\n",
"where $ || \\cdot || $ denotes a norm of a matrix $ X $ and where $ X_r $ belongs to the space of all rank $ r $ matrices\n",
"of dimension $ m \\times n $.\n",
"\n",
"Three popular **matrix norms** of an $ m \\times n $ matrix $ X $ can be expressed in terms of the singular values of $ X $\n",
"\n",
"- the **spectral** or $ l^2 $ norm $ || X ||_2 = \\max_{||y|| \\neq 0} \\frac{||X y ||}{||y||} = \\sigma_1 $ \n",
"- the **Frobenius** norm $ ||X ||_F = \\sqrt{\\sigma_1^2 + \\cdots + \\sigma_p^2} $ \n",
"- the **nuclear** norm $ || X ||_N = \\sigma_1 + \\cdots + \\sigma_p $ \n",
"\n",
"\n",
"The Eckart-Young theorem states that for each of these three norms, same rank $ r $ matrix is best and that it equals\n",
"\n",
"\n",
"\n",
"$$\n",
"\\hat X_r = \\sigma_1 U_1 V_1^\\top + \\sigma_2 U_2 V_2^\\top + \\cdots + \\sigma_r U_r V_r^\\top \\tag{5.8}\n",
"$$\n",
"\n",
"This is a very powerful theorem that says that we can take our $ m \\times n $ matrix $ X $ that in not full rank, and we can best approximate it by a full rank $ p \\times p $ matrix through the SVD.\n",
"\n",
"Moreover, if some of these $ p $ singular values carry more information than others, and if we want to have the most amount of information with the least amount of data, we can take $ r $ leading singular values ordered by magnitude.\n",
"\n",
"We’ll say more about this later when we present Principal Component Analysis.\n",
"\n",
"You can read about the Eckart-Young theorem and some of its uses [here](https://en.wikipedia.org/wiki/Low-rank_approximation).\n",
"\n",
"We’ll make use of this theorem when we discuss principal components analysis (PCA) and also dynamic mode decomposition (DMD)."
]
},
{
"cell_type": "markdown",
"id": "85cc98ef",
"metadata": {},
"source": [
"## Full and Reduced SVD’s\n",
"\n",
"Up to now we have described properties of a **full** SVD in which shapes of $ U $, $ \\Sigma $, and $ V $ are $ \\left(m, m\\right) $, $ \\left(m, n\\right) $, $ \\left(n, n\\right) $, respectively.\n",
"\n",
"There is an alternative bookkeeping convention called an **economy** or **reduced** SVD in which the shapes of $ U, \\Sigma $ and $ V $ are different from what they are in a full SVD.\n",
"\n",
"Thus, note that because we assume that $ X $ has rank $ p $, there are only $ p $ nonzero singular values, where $ p=\\textrm{rank}(X)\\leq\\min\\left(m, n\\right) $.\n",
"\n",
"A **reduced** SVD uses this fact to express $ U $, $ \\Sigma $, and $ V $ as matrices with shapes $ \\left(m, p\\right) $, $ \\left(p, p\\right) $, $ \\left( n, p\\right) $.\n",
"\n",
"You can read about reduced and full SVD here\n",
"[https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html](https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html)\n",
"\n",
"For a full SVD,\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"UU^\\top & = I & \\quad U^\\top U = I \\cr\n",
"VV^\\top & = I & \\quad V^\\top V = I\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"But not all these properties hold for a **reduced** SVD.\n",
"\n",
"Which properties hold depend on whether we are in a **tall-skinny** case or a **short-fat** case.\n",
"\n",
"- In a **tall-skinny** case in which $ m > > n $, for a **reduced** SVD \n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"UU^\\top & \\neq I & \\quad U^\\top U = I \\cr\n",
"VV^\\top & = I & \\quad V^\\top V = I\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"- In a **short-fat** case in which $ m < < n $, for a **reduced** SVD \n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"UU^\\top & = I & \\quad U^\\top U = I \\cr\n",
"VV^\\top & = I & \\quad V^\\top V \\neq I\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"When we study Dynamic Mode Decomposition below, we shall want to remember these properties when we use a reduced SVD to compute some DMD representations.\n",
"\n",
"Let’s do an exercise to compare **full** and **reduced** SVD’s.\n",
"\n",
"To review,\n",
"\n",
"- in a **full** SVD \n",
" - $ U $ is $ m \\times m $ \n",
" - $ \\Sigma $ is $ m \\times n $ \n",
" - $ V $ is $ n \\times n $ \n",
"- in a **reduced** SVD \n",
" - $ U $ is $ m \\times p $ \n",
" - $ \\Sigma $ is $ p\\times p $ \n",
" - $ V $ is $ n \\times p $ \n",
"\n",
"\n",
"First, let’s study a case in which $ m = 5 > n = 2 $.\n",
"\n",
"(This is a small example of the **tall-skinny** case that will concern us when we study **Dynamic Mode Decompositions** below.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1023e4d7",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"X = np.random.rand(5,2)\n",
"U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD\n",
"Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD\n",
"print('U, S, V =')\n",
"U, S, V"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ab427df",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"print('Uhat, Shat, Vhat = ')\n",
"Uhat, Shat, Vhat"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "763cc11b",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"rr = np.linalg.matrix_rank(X)\n",
"print(f'rank of X = {rr}')"
]
},
{
"cell_type": "markdown",
"id": "7d684fb5",
"metadata": {},
"source": [
"**Properties:**\n",
"\n",
"- Where $ U $ is constructed via a full SVD, $ U^\\top U = I_{m\\times m} $ and $ U U^\\top = I_{m \\times m} $ \n",
"- Where $ \\hat U $ is constructed via a reduced SVD, although $ \\hat U^\\top \\hat U = I_{p\\times p} $, it happens that $ \\hat U \\hat U^\\top \\neq I_{m \\times m} $ \n",
"\n",
"\n",
"We illustrate these properties for our example with the following code cells."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "64f7b92e",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"UTU = U.T@U\n",
"UUT = U@U.T\n",
"print('UUT, UTU = ')\n",
"UUT, UTU"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c52c9583",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"UhatUhatT = Uhat@Uhat.T\n",
"UhatTUhat = Uhat.T@Uhat\n",
"print('UhatUhatT, UhatTUhat= ')\n",
"UhatUhatT, UhatTUhat"
]
},
{
"cell_type": "markdown",
"id": "4e4db037",
"metadata": {},
"source": [
"**Remarks:**\n",
"\n",
"The cells above illustrate the application of the `full_matrices=True` and `full_matrices=False` options.\n",
"Using `full_matrices=False` returns a reduced singular value decomposition.\n",
"\n",
"The **full** and **reduced** SVD’s both accurately decompose an $ m \\times n $ matrix $ X $\n",
"\n",
"When we study Dynamic Mode Decompositions below, it will be important for us to remember the preceding properties of full and reduced SVD’s in such tall-skinny cases.\n",
"\n",
"Now let’s turn to a short-fat case.\n",
"\n",
"To illustrate this case, we’ll set $ m = 2 < 5 = n $ and compute both full and reduced SVD’s."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9360be2c",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"X = np.random.rand(2,5)\n",
"U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD\n",
"Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD\n",
"print('U, S, V = ')\n",
"U, S, V"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d5c044ea",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"print('Uhat, Shat, Vhat = ')\n",
"Uhat, Shat, Vhat"
]
},
{
"cell_type": "markdown",
"id": "0e42cf9e",
"metadata": {},
"source": [
"Let’s verify that our reduced SVD accurately represents $ X $"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86b32eb9",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"SShat=np.diag(Shat)\n",
"np.allclose(X, Uhat@SShat@Vhat)"
]
},
{
"cell_type": "markdown",
"id": "a29adb23",
"metadata": {},
"source": [
"## Polar Decomposition\n",
"\n",
"A **reduced** singular value decomposition (SVD) of $ X $ is related to a **polar decomposition** of $ X $\n",
"\n",
"$$\n",
"X = SQ\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
" S & = U\\Sigma U^\\top \\cr\n",
"Q & = U V^\\top\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Here\n",
"\n",
"- $ S $ is an $ m \\times m $ **symmetric** matrix \n",
"- $ Q $ is an $ m \\times n $ **orthogonal** matrix \n",
"\n",
"\n",
"and in our reduced SVD\n",
"\n",
"- $ U $ is an $ m \\times p $ orthonormal matrix \n",
"- $ \\Sigma $ is a $ p \\times p $ diagonal matrix \n",
"- $ V $ is an $ n \\times p $ orthonormal "
]
},
{
"cell_type": "markdown",
"id": "9325f66f",
"metadata": {},
"source": [
"## Application: Principal Components Analysis (PCA)\n",
"\n",
"Let’s begin with a case in which $ n >> m $, so that we have many more individuals $ n $ than attributes $ m $.\n",
"\n",
"The matrix $ X $ is **short and fat** in an $ n >> m $ case as opposed to a **tall and skinny** case with $ m > > n $ to be discussed later.\n",
"\n",
"We regard $ X $ as an $ m \\times n $ matrix of **data**:\n",
"\n",
"$$\n",
"X = \\begin{bmatrix} X_1 \\mid X_2 \\mid \\cdots \\mid X_n\\end{bmatrix}\n",
"$$\n",
"\n",
"where for $ j = 1, \\ldots, n $ the column vector $ X_j = \\begin{bmatrix}X_{1j}\\\\X_{2j}\\\\\\vdots\\\\X_{mj}\\end{bmatrix} $ is a vector of observations on variables $ \\begin{bmatrix}x_1\\\\x_2\\\\\\vdots\\\\x_m\\end{bmatrix} $.\n",
"\n",
"In a **time series** setting, we would think of columns $ j $ as indexing different **times** at which random variables are observed, while rows index different random variables.\n",
"\n",
"In a **cross-section** setting, we would think of columns $ j $ as indexing different **individuals** for which random variables are observed, while rows index different **attributes**.\n",
"\n",
"As we have seen before, the SVD is a way to decompose a matrix into useful components, just like polar decomposition, eigendecomposition, and many others.\n",
"\n",
"PCA, on the other hand, is a method that builds on the SVD to analyze data. The goal is to apply certain steps, to help better visualize patterns in data, using statistical tools to capture the most important patterns in data.\n",
"\n",
"**Step 1: Standardize the data:**\n",
"\n",
"Because our data matrix may hold variables of different units and scales, we first need to standardize the data.\n",
"\n",
"First by computing the average of each row of $ X $.\n",
"\n",
"$$\n",
"\\bar{X_j}= \\frac{1}{m} \\sum_{i = 1}^{m} x_{i,j}\n",
"$$\n",
"\n",
"We then create an average matrix out of these means:\n",
"\n",
"$$\n",
"\\bar{X} = \\begin{bmatrix}1 \\\\1 \\\\\\ldots\\\\1 \\end{bmatrix} \\begin{bmatrix} \\bar{X_1} \\mid \\bar{X_2} \\mid \\cdots \\mid \\bar{X_n}\\end{bmatrix}\n",
"$$\n",
"\n",
"And subtract out of the original matrix to create a mean centered matrix:\n",
"\n",
"$$\n",
"B = X - \\bar{X}\n",
"$$\n",
"\n",
"**Step 2: Compute the covariance matrix:**\n",
"\n",
"Then because we want to extract the relationships between variables rather than just their magnitude, in other words, we want to know how they can explain each other, we compute the covariance matrix of $ B $.\n",
"\n",
"$$\n",
"C = \\frac{1}{{n}} B^\\top B\n",
"$$\n",
"\n",
"**Step 3: Decompose the covariance matrix and arrange the singular values:**\n",
"\n",
"If the matrix $ C $ is diagonalizable, we can eigendecompose it, find its eigenvalues and rearrange the eigenvalue and eigenvector matrices in a decreasing other.\n",
"\n",
"If $ C $ is not diagonalizable, we can perform an SVD of $ C $:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"B^T B &= V \\Sigma^\\top U^\\top U \\Sigma V^\\top \\cr\n",
"&= V \\Sigma^\\top \\Sigma V^\\top\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"$$\n",
"C = \\frac{1}{{n}} V \\Sigma^\\top \\Sigma V^\\top\n",
"$$\n",
"\n",
"We can then rearrange the columns in the matrices $ V $ and $ \\Sigma $ so that the singular values are in decreasing order.\n",
"\n",
"**Step 4: Select singular values, (optional) truncate the rest:**\n",
"\n",
"We can now decide how many singular values to pick, based on how much variance you want to retain. (e.g., retaining 95% of the total variance).\n",
"\n",
"We can obtain the percentage by calculating the variance contained in the leading $ r $ factors divided by the variance in total:\n",
"\n",
"$$\n",
"\\frac{\\sum_{i = 1}^{r} \\sigma^2_{i}}{\\sum_{i = 1}^{p} \\sigma^2_{i}}\n",
"$$\n",
"\n",
"**Step 5: Create the Score Matrix:**\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"T&= BV \\cr\n",
"&= U\\Sigma V^\\top \\cr\n",
"&= U\\Sigma\n",
"\\end{aligned}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "4de32fa4",
"metadata": {},
"source": [
"## Relationship of PCA to SVD\n",
"\n",
"To relate an SVD to a PCA of data set $ X $, first construct the SVD of the data matrix $ X $:\n",
"\n",
"Let’s assume that sample means of all variables are zero, so we don’t need to standardize our matrix.\n",
"\n",
"\n",
"\n",
"$$\n",
"X = U \\Sigma V^\\top = \\sigma_1 U_1 V_1^\\top + \\sigma_2 U_2 V_2^\\top + \\cdots + \\sigma_p U_p V_p^\\top \\tag{5.9}\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"U=\\begin{bmatrix}U_1|U_2|\\ldots|U_m\\end{bmatrix}\n",
"$$\n",
"\n",
"$$\n",
"V^\\top = \\begin{bmatrix}V_1^\\top \\\\V_2^\\top \\\\\\ldots\\\\V_n^\\top \\end{bmatrix}\n",
"$$\n",
"\n",
"In equation [(5.9)](#equation-eq-pca1), each of the $ m \\times n $ matrices $ U_{j}V_{j}^\\top $ is evidently\n",
"of rank $ 1 $.\n",
"\n",
"Thus, we have\n",
"\n",
"\n",
"\n",
"$$\n",
"X = \\sigma_1 \\begin{pmatrix}U_{11}V_{1}^\\top \\\\U_{21}V_{1}^\\top \\\\\\cdots\\\\U_{m1}V_{1}^\\top \\\\\\end{pmatrix} + \\sigma_2\\begin{pmatrix}U_{12}V_{2}^\\top \\\\U_{22}V_{2}^\\top \\\\\\cdots\\\\U_{m2}V_{2}^\\top \\\\\\end{pmatrix}+\\ldots + \\sigma_p\\begin{pmatrix}U_{1p}V_{p}^\\top \\\\U_{2p}V_{p}^\\top \\\\\\cdots\\\\U_{mp}V_{p}^\\top \\\\\\end{pmatrix} \\tag{5.10}\n",
"$$\n",
"\n",
"Here is how we would interpret the objects in the matrix equation [(5.10)](#equation-eq-pca2) in\n",
"a time series context:\n",
"\n",
"- $ \\textrm{for each} \\ k=1, \\ldots, n $, the object $ \\lbrace V_{kj} \\rbrace_{j=1}^n $ is a time series for the $ k $th **principal component** \n",
"- $ U_j = \\begin{bmatrix}U_{1k}\\\\U_{2k}\\\\\\ldots\\\\U_{mk}\\end{bmatrix} \\ k=1, \\ldots, m $\n",
" is a vector of **loadings** of variables $ X_i $ on the $ k $th principal component, $ i=1, \\ldots, m $ \n",
"- $ \\sigma_k $ for each $ k=1, \\ldots, p $ is the strength of $ k $th **principal component**, where strength means contribution to the overall covariance of $ X $. "
]
},
{
"cell_type": "markdown",
"id": "57513e64",
"metadata": {},
"source": [
"## PCA with Eigenvalues and Eigenvectors\n",
"\n",
"We now use an eigen decomposition of a sample covariance matrix to do PCA.\n",
"\n",
"Let $ X_{m \\times n} $ be our $ m \\times n $ data matrix.\n",
"\n",
"Let’s assume that sample means of all variables are zero.\n",
"\n",
"We can assure this by **pre-processing** the data by subtracting sample means.\n",
"\n",
"Define a sample covariance matrix $ \\Omega $ as\n",
"\n",
"$$\n",
"\\Omega = XX^\\top\n",
"$$\n",
"\n",
"Then use an eigen decomposition to represent $ \\Omega $ as follows:\n",
"\n",
"$$\n",
"\\Omega =P\\Lambda P^\\top\n",
"$$\n",
"\n",
"Here\n",
"\n",
"- $ P $ is $ m×m $ matrix of eigenvectors of $ \\Omega $ \n",
"- $ \\Lambda $ is a diagonal matrix of eigenvalues of $ \\Omega $ \n",
"\n",
"\n",
"We can then represent $ X $ as\n",
"\n",
"$$\n",
"X=P\\epsilon\n",
"$$\n",
"\n",
"where\n",
"\n",
"$$\n",
"\\epsilon = P^{-1} X\n",
"$$\n",
"\n",
"and\n",
"\n",
"$$\n",
"\\epsilon\\epsilon^\\top =\\Lambda .\n",
"$$\n",
"\n",
"We can verify that\n",
"\n",
"\n",
"\n",
"$$\n",
"XX^\\top =P\\Lambda P^\\top . \\tag{5.11}\n",
"$$\n",
"\n",
"It follows that we can represent the data matrix $ X $ as\n",
"\n",
"$$\n",
"\\begin{equation*}\n",
"X=\\begin{bmatrix}X_1|X_2|\\ldots|X_m\\end{bmatrix} =\\begin{bmatrix}P_1|P_2|\\ldots|P_m\\end{bmatrix}\n",
"\\begin{bmatrix}\\epsilon_1\\\\\\epsilon_2\\\\\\ldots\\\\\\epsilon_m\\end{bmatrix}\n",
"= P_1\\epsilon_1+P_2\\epsilon_2+\\ldots+P_m\\epsilon_m\n",
"\\end{equation*}\n",
"$$\n",
"\n",
"To reconcile the preceding representation with the PCA that we had obtained earlier through the SVD, we first note that $ \\epsilon_j^2=\\lambda_j\\equiv\\sigma^2_j $.\n",
"\n",
"Now define $ \\tilde{\\epsilon_j} = \\frac{\\epsilon_j}{\\sqrt{\\lambda_j}} $,\n",
"which implies that $ \\tilde{\\epsilon}_j\\tilde{\\epsilon}_j^\\top =1 $.\n",
"\n",
"Therefore\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X&=\\sqrt{\\lambda_1}P_1\\tilde{\\epsilon_1}+\\sqrt{\\lambda_2}P_2\\tilde{\\epsilon_2}+\\ldots+\\sqrt{\\lambda_m}P_m\\tilde{\\epsilon_m}\\\\\n",
"&=\\sigma_1P_1\\tilde{\\epsilon_2}+\\sigma_2P_2\\tilde{\\epsilon_2}+\\ldots+\\sigma_mP_m\\tilde{\\epsilon_m} ,\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"which agrees with\n",
"\n",
"$$\n",
"X=\\sigma_1U_1{V_1}^{T}+\\sigma_2 U_2{V_2}^{T}+\\ldots+\\sigma_{r} U_{r}{V_{r}}^{T}\n",
"$$\n",
"\n",
"provided that we set\n",
"\n",
"- $ U_j=P_j $ (a vector of loadings of variables on principal component $ j $) \n",
"- $ {V_k}^{T}=\\tilde{\\epsilon_k} $ (the $ k $th principal component) \n",
"\n",
"\n",
"Because there are alternative algorithms for computing $ P $ and $ U $ for given a data matrix $ X $, depending on algorithms used, we might have sign differences or different orders of eigenvectors.\n",
"\n",
"We can resolve such ambiguities about $ U $ and $ P $ by\n",
"\n",
"1. sorting eigenvalues and singular values in descending order \n",
"1. imposing positive diagonals on $ P $ and $ U $ and adjusting signs in $ V^\\top $ accordingly "
]
},
{
"cell_type": "markdown",
"id": "55647856",
"metadata": {},
"source": [
"## Connections\n",
"\n",
"To pull things together, it is useful to assemble and compare some formulas presented above.\n",
"\n",
"First, consider an SVD of an $ m \\times n $ matrix:\n",
"\n",
"$$\n",
"X = U\\Sigma V^\\top\n",
"$$\n",
"\n",
"Compute:\n",
"\n",
"\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"XX^\\top &=U\\Sigma V^\\top V\\Sigma^\\top U^\\top \\cr\n",
"&\\equiv U\\Sigma\\Sigma^\\top U^\\top \\cr\n",
"&\\equiv U\\Lambda U^\\top\n",
"\\end{aligned} \\tag{5.12}\n",
"$$\n",
"\n",
"Compare representation [(5.12)](#equation-eq-xxcompare) with equation [(5.11)](#equation-eq-xxo) above.\n",
"\n",
"Evidently, $ U $ in the SVD is the matrix $ P $ of\n",
"eigenvectors of $ XX^\\top $ and $ \\Sigma \\Sigma^\\top $ is the matrix $ \\Lambda $ of eigenvalues.\n",
"\n",
"Second, let’s compute\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"X^\\top X &=V\\Sigma^\\top U^\\top U\\Sigma V^\\top \\\\\n",
"&=V\\Sigma^\\top {\\Sigma}V^\\top\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"Thus, the matrix $ V $ in the SVD is the matrix of eigenvectors of $ X^\\top X $\n",
"\n",
"Summarizing and fitting things together, we have the eigen decomposition of the sample\n",
"covariance matrix\n",
"\n",
"$$\n",
"X X^\\top = P \\Lambda P^\\top\n",
"$$\n",
"\n",
"where $ P $ is an orthogonal matrix.\n",
"\n",
"Further, from the SVD of $ X $, we know that\n",
"\n",
"$$\n",
"X X^\\top = U \\Sigma \\Sigma^\\top U^\\top\n",
"$$\n",
"\n",
"where $ U $ is an orthogonal matrix.\n",
"\n",
"Thus, $ P = U $ and we have the representation of $ X $\n",
"\n",
"$$\n",
"X = P \\epsilon = U \\Sigma V^\\top\n",
"$$\n",
"\n",
"It follows that\n",
"\n",
"$$\n",
"U^\\top X = \\Sigma V^\\top = \\epsilon\n",
"$$\n",
"\n",
"Note that the preceding implies that\n",
"\n",
"$$\n",
"\\epsilon \\epsilon^\\top = \\Sigma V^\\top V \\Sigma^\\top = \\Sigma \\Sigma^\\top = \\Lambda ,\n",
"$$\n",
"\n",
"so that everything fits together.\n",
"\n",
"Below we define a class `DecomAnalysis` that wraps PCA and SVD for a given a data matrix `X`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3017990c",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"class DecomAnalysis:\n",
" \"\"\"\n",
" A class for conducting PCA and SVD.\n",
" X: data matrix\n",
" r_component: chosen rank for best approximation\n",
" \"\"\"\n",
"\n",
" def __init__(self, X, r_component=None):\n",
"\n",
" self.X = X\n",
"\n",
" self.Ω = (X @ X.T)\n",
"\n",
" self.m, self.n = X.shape\n",
" self.r = LA.matrix_rank(X)\n",
"\n",
" if r_component:\n",
" self.r_component = r_component\n",
" else:\n",
" self.r_component = self.m\n",
"\n",
" def pca(self):\n",
"\n",
" 𝜆, P = LA.eigh(self.Ω) # columns of P are eigenvectors\n",
"\n",
" ind = sorted(range(𝜆.size), key=lambda x: 𝜆[x], reverse=True)\n",
"\n",
" # sort by eigenvalues\n",
" self.𝜆 = 𝜆[ind]\n",
" P = P[:, ind]\n",
" self.P = P @ diag_sign(P)\n",
"\n",
" self.Λ = np.diag(self.𝜆)\n",
"\n",
" self.explained_ratio_pca = np.cumsum(self.𝜆) / self.𝜆.sum()\n",
"\n",
" # compute the N by T matrix of principal components\n",
" self.𝜖 = self.P.T @ self.X\n",
"\n",
" P = self.P[:, :self.r_component]\n",
" 𝜖 = self.𝜖[:self.r_component, :]\n",
"\n",
" # transform data\n",
" self.X_pca = P @ 𝜖\n",
"\n",
" def svd(self):\n",
"\n",
" U, 𝜎, VT = LA.svd(self.X)\n",
"\n",
" ind = sorted(range(𝜎.size), key=lambda x: 𝜎[x], reverse=True)\n",
"\n",
" # sort by eigenvalues\n",
" d = min(self.m, self.n)\n",
"\n",
" self.𝜎 = 𝜎[ind]\n",
" U = U[:, ind]\n",
" D = diag_sign(U)\n",
" self.U = U @ D\n",
" VT[:d, :] = D @ VT[ind, :]\n",
" self.VT = VT\n",
"\n",
" self.Σ = np.zeros((self.m, self.n))\n",
" self.Σ[:d, :d] = np.diag(self.𝜎)\n",
"\n",
" 𝜎_sq = self.𝜎 ** 2\n",
" self.explained_ratio_svd = np.cumsum(𝜎_sq) / 𝜎_sq.sum()\n",
"\n",
" # slicing matrices by the number of components to use\n",
" U = self.U[:, :self.r_component]\n",
" Σ = self.Σ[:self.r_component, :self.r_component]\n",
" VT = self.VT[:self.r_component, :]\n",
"\n",
" # transform data\n",
" self.X_svd = U @ Σ @ VT\n",
"\n",
" def fit(self, r_component):\n",
"\n",
" # pca\n",
" P = self.P[:, :r_component]\n",
" 𝜖 = self.𝜖[:r_component, :]\n",
"\n",
" # transform data\n",
" self.X_pca = P @ 𝜖\n",
"\n",
" # svd\n",
" U = self.U[:, :r_component]\n",
" Σ = self.Σ[:r_component, :r_component]\n",
" VT = self.VT[:r_component, :]\n",
"\n",
" # transform data\n",
" self.X_svd = U @ Σ @ VT\n",
"\n",
"def diag_sign(A):\n",
" \"Compute the signs of the diagonal of matrix A\"\n",
"\n",
" D = np.diag(np.sign(np.diag(A)))\n",
"\n",
" return D"
]
},
{
"cell_type": "markdown",
"id": "61967f37",
"metadata": {},
"source": [
"We also define a function that prints out information so that we can compare decompositions\n",
"obtained by different algorithms."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7eeae72e",
"metadata": {
"hide-output": false
},
"outputs": [],
"source": [
"def compare_pca_svd(da):\n",
" \"\"\"\n",
" Compare the outcomes of PCA and SVD.\n",
" \"\"\"\n",
"\n",
" da.pca()\n",
" da.svd()\n",
"\n",
" print('Eigenvalues and Singular values\\n')\n",
" print(f'λ = {da.λ}\\n')\n",
" print(f'σ^2 = {da.σ**2}\\n')\n",
" print('\\n')\n",
"\n",
" # loading matrices\n",
" fig, axs = plt.subplots(1, 2, figsize=(14, 5))\n",
" plt.suptitle('loadings')\n",
" axs[0].plot(da.P.T)\n",
" axs[0].set_title('P')\n",
" axs[0].set_xlabel('m')\n",
" axs[1].plot(da.U.T)\n",
" axs[1].set_title('U')\n",
" axs[1].set_xlabel('m')\n",
" plt.show()\n",
"\n",
" # principal components\n",
" fig, axs = plt.subplots(1, 2, figsize=(14, 5))\n",
" plt.suptitle('principal components')\n",
" axs[0].plot(da.ε.T)\n",
" axs[0].set_title('ε')\n",
" axs[0].set_xlabel('n')\n",
" axs[1].plot(da.VT[:da.r, :].T * np.sqrt(da.λ))\n",
" axs[1].set_title('$V^\\top *\\sqrt{\\lambda}$')\n",
" axs[1].set_xlabel('n')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"id": "d758ad99",
"metadata": {},
"source": [
"## Exercises"
]
},
{
"cell_type": "markdown",
"id": "4ffa918e",
"metadata": {},
"source": [
"## Exercise 5.1\n",
"\n",
"In Ordinary Least Squares (OLS), we learn to compute $ \\hat{\\beta} = (X^\\top X)^{-1} X^\\top y $, but there are cases such as when we have colinearity or an underdetermined system: **short fat** matrix.\n",
"\n",
"In these cases, the $ (X^\\top X) $ matrix is not not invertible (its determinant is zero) or ill-conditioned (its determinant is very close to zero).\n",
"\n",
"What we can do instead is to create what is called a [pseudoinverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse), a full rank approximation of the inverted matrix so we can compute $ \\hat{\\beta} $ with it.\n",
"\n",
"Thinking in terms of the Eckart-Young theorem, build the pseudoinverse matrix $ X^{+} $ and use it to compute $ \\hat{\\beta} $."
]
},
{
"cell_type": "markdown",
"id": "88640e0b",
"metadata": {},
"source": [
"## Solution to[ Exercise 5.1](https://python.quantecon.org/#svd_ex1)\n",
"\n",
"We can use SVD to compute the pseudoinverse:\n",
"\n",
"$$\n",
"X = U \\Sigma V^\\top\n",
"$$\n",
"\n",
"inverting $ X $, we have:\n",
"\n",
"$$\n",
"X^{+} = V \\Sigma^{+} U^\\top\n",
"$$\n",
"\n",
"where:\n",
"\n",
"$$\n",
"\\Sigma^{+} = \\begin{bmatrix}\n",
"\\frac{1}{\\sigma_1} & 0 & \\cdots & 0 & 0 \\\\\n",
"0 & \\frac{1}{\\sigma_2} & \\cdots & 0 & 0 \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots & \\vdots \\\\\n",
"0 & 0 & \\cdots & \\frac{1}{\\sigma_p} & 0 \\\\\n",
"0 & 0 & \\cdots & 0 & 0 \\\\\n",
"\\end{bmatrix}\n",
"$$\n",
"\n",
"and finally:\n",
"\n",
"$$\n",
"\\hat{\\beta} = X^{+}y = V \\Sigma^{+} U^\\top y\n",
"$$\n",
"\n",
"For an example PCA applied to analyzing the structure of intelligence tests see this lecture [Multivariable Normal Distribution](https://python.quantecon.org/multivariate_normal.html).\n",
"\n",
"Look at parts of that lecture that describe and illustrate the classic factor analysis model.\n",
"\n",
"As mentioned earlier, in a sequel to this lecture about [Dynamic Mode Decompositions](https://python.quantecon.org/var_dmd.html), we’ll describe how SVD’s provide ways rapidly to compute reduced-order approximations to first-order Vector Autoregressions (VARs)."
]
}
],
"metadata": {
"date": 1727927328.9274595,
"filename": "svd_intro.md",
"kernelspec": {
"display_name": "Python",
"language": "python3",
"name": "python3"
},
"title": "Singular Value Decomposition (SVD)"
},
"nbformat": 4,
"nbformat_minor": 5
}