6.5 Least-Squares Problems

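Least-squares problems arise when a system $\mathbf{A}\mathbf{x} = \mathbf{b}$ is inconsistent: instead of an exact solution, we look for a vector $\hat{\mathbf{x}}$ such that $\mathbf{A}\hat{\mathbf{x}}$ approximates $\mathbf{b}$ as closely as possible, that is, such that the error $\|\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\|$ is minimized. Every such $\hat{\mathbf{x}}$ satisfies the normal equations $\mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{b}$. If the columns of $\mathbf{A}$ are linearly independent, the solution is unique and can be computed with a QR factorization.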

1. Overview of Least-Squares Problems

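In practice, incomplete or noisy data often lead to inconsistent systems of equations. Such a system has no exact solution, but we still want an approximate solution that makes the gap between the two sides of the equation as small as possible. The method of least squares does this: it chooses the approximation that minimizes the sum of the squared errors, and the result is called a least-squares solution.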

Definition (least-squares solution)

If $\mathbf{A}$ is an $m\times n$ matrix and $\mathbf{b} \in \mathbb{R}^m$, a least-squares solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$ is a vector $\hat{\mathbf{x}} \in \mathbb{R}^n$ such that, for all $\mathbf{x} \in \mathbb{R}^n$,

$$\|\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\| \leq \|\mathbf{b} - \mathbf{A}\mathbf{x}\|$$

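Geometrically, a least-squares solution can be understood through orthogonal projection: the point of $\text{Col}\,\mathbf{A}$ closest to $\mathbf{b}$ is the orthogonal projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$.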

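2. Solution of the General Least-Squares Problem

We now discuss how to solve least-squares problems and examine the properties their solutions satisfy. These properties explain the geometric meaning of a least-squares solution and show that it does minimize the error.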

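2.1 Orthogonal Projection and Best Approximation

To solve a least-squares problem we apply the Best Approximation Theorem. Let $\mathbf{A}$ be an $m\times n$ matrix and $\mathbf{b}$ a vector in $\mathbb{R}^m$. We want an $\mathbf{x} \in \mathbb{R}^n$ that minimizes the error $\|\mathbf{A}\mathbf{x} - \mathbf{b}\|$ between $\mathbf{A}\mathbf{x}$ and $\mathbf{b}$. Since $\mathbf{A}\mathbf{x}$ always lies in the column space $\text{Col}\,\mathbf{A}$, we take the orthogonal projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$:

$$\hat{\mathbf{b}} = \text{proj}_{\text{Col}\,\mathbf{A}} \mathbf{b}$$

Because $\hat{\mathbf{b}}$ is the vector in $\text{Col}\,\mathbf{A}$ closest to $\mathbf{b}$, a vector $\hat{\mathbf{x}}$ is a least-squares solution exactly when

$$\mathbf{A} \hat{\mathbf{x}} = \hat{\mathbf{b}}$$

This equation is consistent because $\hat{\mathbf{b}}$ lies in $\text{Col}\,\mathbf{A}$, and for any solution $\hat{\mathbf{x}}$ the vector $\mathbf{A}\hat{\mathbf{x}}$ is the vector in $\text{Col}\,\mathbf{A}$ closest to $\mathbf{b}$.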

2.2 Orthogonality of the Error Vector

The error vector $\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ is orthogonal to $\text{Col}\,\mathbf{A}$, which means it is orthogonal to every column of $\mathbf{A}$. If $\mathbf{a}_j$ is any column of $\mathbf{A}$, then $\mathbf{a}_j\cdot (\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = 0$, which in matrix-product form reads $\mathbf{a}_j^T(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = 0$. Stacking these $n$ equations gives

$$\begin{bmatrix}\mathbf{a}_1^T \\[1ex] \mathbf{a}_2^T \\[1ex] \vdots \\[1ex] \mathbf{a}_n^T \end{bmatrix}(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = \mathbf{A}^T(\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = \mathbf{0}\tag{2}$$

Expanding $(2)$:

$$\mathbf{A}^T \mathbf{b} - \mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{0}$$

and rearranging:

$$\colorbox{#F0F8FF}{$\mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{b}$}\tag{3}$$

These are the normal equations for $\mathbf{A}\mathbf{x} = \mathbf{b}$; their solutions are exactly the least-squares solutions.

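2.3 Least-Squares Solutions and the Normal Equations

The following theorem states that the solution set of the normal equations is exactly the set of least-squares solutions.

Theorem 13 (least-squares solutions and the normal equations)

The set of least-squares solutions of $\mathbf{A}\mathbf{x} = \mathbf{b}$ coincides with the nonempty set of solutions of the normal equations $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$.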

Proof, part 1: every least-squares solution satisfies the normal equations.

Suppose $\hat{\mathbf{x}}$ is a least-squares solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$, that is, it minimizes the norm of the error $\mathbf{r} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$:

$$\hat{\mathbf{x}} \in \arg\min_{\mathbf{x} \in \mathbb{R}^n} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|$$

By the Best Approximation Theorem, $\mathbf{A}\hat{\mathbf{x}}$ must be the orthogonal projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$:

$$\mathbf{A}\hat{\mathbf{x}} = \hat{\mathbf{b}}$$

The projection error $\mathbf{r} = \mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to $\text{Col}\,\mathbf{A}$, so for every column $\mathbf{a}_j$ of $\mathbf{A}$,

$$\mathbf{a}_j^T (\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = 0$$

In matrix form:

$$\mathbf{A}^T (\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) = \mathbf{0}$$

Expanding and rearranging gives

$$\mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{b}$$

Hence every least-squares solution satisfies the normal equations.

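Proof, part 2: every solution of the normal equations is a least-squares solution.

Conversely, suppose $\hat{\mathbf{x}}$ satisfies the normal equations:

$$\mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{b}$$

Then the error vector $\mathbf{r} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ satisfies

$$\mathbf{A}^T \mathbf{r} = \mathbf{A}^T (\mathbf{b} -\mathbf{A}\hat{\mathbf{x}}) = \mathbf{A}^T \mathbf{b} - \mathbf{A}^T \mathbf{A}\hat{\mathbf{x}} = \mathbf{0}$$

So $\mathbf{r}$ is orthogonal to every column of $\mathbf{A}$ and therefore to $\text{Col}\,\mathbf{A}$; in other words, $\mathbf{r}$ lies in the orthogonal complement of $\text{Col}\,\mathbf{A}$. The equation $\mathbf{b} = \mathbf{A}\hat{\mathbf{x}} + \mathbf{r}$ thus splits $\mathbf{b}$ into a vector in $\text{Col}\,\mathbf{A}$ plus a vector orthogonal to $\text{Col}\,\mathbf{A}$. Because this orthogonal decomposition is unique, $\mathbf{A}\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$:

$$\mathbf{A}\hat{\mathbf{x}} = \hat{\mathbf{b}}$$

By the Best Approximation Theorem, $\mathbf{A}\hat{\mathbf{x}}$ is then the point of $\text{Col}\,\mathbf{A}$ closest to $\mathbf{b}$, so $\hat{\mathbf{x}}$ is a least-squares solution.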
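The optimality of a solution of the normal equations can also be checked directly with the Pythagorean theorem. The identity below is a sketch added for illustration; it assumes only that $\hat{\mathbf{x}}$ satisfies the normal equations and that $\mathbf{x}$ is an arbitrary vector in $\mathbb{R}^n$.

```latex
\begin{align*}
\mathbf{b} - \mathbf{A}\mathbf{x}
  &= (\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}) + \mathbf{A}(\hat{\mathbf{x}} - \mathbf{x}) \\
\|\mathbf{b} - \mathbf{A}\mathbf{x}\|^2
  &= \|\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\|^2 + \|\mathbf{A}(\hat{\mathbf{x}} - \mathbf{x})\|^2
  \;\geq\; \|\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\|^2
\end{align*}
```

The cross term vanishes because $\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ is orthogonal to $\text{Col}\,\mathbf{A}$ while $\mathbf{A}(\hat{\mathbf{x}} - \mathbf{x})$ lies in $\text{Col}\,\mathbf{A}$. Equality holds exactly when $\mathbf{A}\mathbf{x} = \mathbf{A}\hat{\mathbf{x}}$.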

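3. Solving Least-Squares Problems

The normal equations $\mathbf{A}^T \mathbf{A} \hat{\mathbf{x}} = \mathbf{A}^T \mathbf{b}$ form an $n\times n$ linear system. If $\mathbf{A}^T\mathbf{A}$ is invertible (equivalently, if the columns of $\mathbf{A}$ are linearly independent), we can solve it directly:

$$\colorbox{#F0F8FF}{$\hat{\mathbf{x}} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b}$}\tag{4}$$

If $\mathbf{A}^T\mathbf{A}$ is not invertible, the least-squares solution is not unique. Both statements follow from the next theorem.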

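Theorem 14 (conditions for a unique least-squares solution)

Let $\mathbf{A}$ be an $m\times n$ matrix. The following statements are logically equivalent:

(a) The equation $\mathbf{A}\mathbf{x} = \mathbf{b}$ has a unique least-squares solution for every $\mathbf{b} \in \mathbb{R}^m$.
(b) The columns of $\mathbf{A}$ are linearly independent.
(c) The matrix $\mathbf{A}^T\mathbf{A}$ is invertible.

When these statements hold, the least-squares solution $\hat{\mathbf{x}}$ is given by

$$\hat{\mathbf{x}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{b}$$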
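The criterion of Theorem 14 can be tested mechanically: the columns of $\mathbf{A}$ are linearly independent exactly when the rank of $\mathbf{A}$ equals its number of columns. The following is a minimal sketch in plain Python with exact fractions; the helper names `rank` and `has_unique_least_squares_solution` are hypothetical and introduced only for illustration, and `A1`, `A2` are the coefficient matrices of the two examples in this section.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (given as a list of rows), by exact row reduction."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Look for a nonzero entry in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [v - factor * p for v, p in zip(rows[i], rows[r])]
        r += 1
    return r


def has_unique_least_squares_solution(A):
    """True when the columns of A are linearly independent (rank equals n)."""
    return rank(A) == len(A[0])


A1 = [[4, 0], [0, 2], [1, 1]]
A2 = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
      [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1]]

print(has_unique_least_squares_solution(A1))
print(has_unique_least_squares_solution(A2))
```

In `A2` the first column is the sum of the other three, so the rank falls short of the number of columns and the least-squares solution cannot be unique.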

The following two examples illustrate the two cases.

3.1 A Unique Least-Squares Solution

Find a least-squares solution of the inconsistent system $\mathbf{A}\mathbf{x} = \mathbf{b}$, where

$$\mathbf{A} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}$$

Compute

$$\mathbf{A}^T \mathbf{A} = \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}, \quad \mathbf{A}^T \mathbf{b} = \begin{bmatrix} 19 \\ 11 \end{bmatrix}$$

Since $\det(\mathbf{A}^T\mathbf{A}) = 84 \neq 0$, the matrix $\mathbf{A}^T\mathbf{A}$ is invertible and

$$\hat{\mathbf{x}} = (\mathbf{A}^T \mathbf{A})^{-1} \mathbf{A}^T \mathbf{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

So $\hat{\mathbf{x}} = \begin{bmatrix} 1 & 2 \end{bmatrix}^T$ is the unique least-squares solution; it makes $\mathbf{A}\hat{\mathbf{x}}$ as close as possible to $\mathbf{b}$.
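The computation in this example can be reproduced in a few lines. The sketch below forms the normal equations and solves them by Gauss-Jordan elimination with exact fractions; the helpers `transpose`, `matmul`, and `solve` are hypothetical names written for this illustration, not part of any library.

```python
from fractions import Fraction


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination; M must be invertible."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Pick a row at or below the diagonal with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


A = [[4, 0], [0, 2], [1, 1]]
b = [2, 0, 11]

At = transpose(A)
AtA = matmul(At, A)                                      # A^T A
Atb = [row[0] for row in matmul(At, [[v] for v in b])]   # A^T b
x_hat = solve(AtA, Atb)                                  # solution of the normal equations

print(AtA, Atb, x_hat)
```

Exact fractions keep the result free of rounding error, so it can be compared directly with the hand computation. For larger problems floating-point arithmetic is the usual choice.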

3.2 Non-Unique Least-Squares Solutions

Find the least-squares solutions of the inconsistent system $\mathbf{A}\mathbf{x} = \mathbf{b}$, where

$$\mathbf{A} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}$$

Compute

$$\mathbf{A}^T\mathbf{A} = \begin{bmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix},\quad \mathbf{A}^T\mathbf{b} = \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}$$

Since $\det(\mathbf{A}^T\mathbf{A}) = 0$ (the first column of $\mathbf{A}^T\mathbf{A}$ is the sum of the other three), $\mathbf{A}^T\mathbf{A}$ is not invertible. We therefore solve the normal equations $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$ by row reducing the augmented matrix:

$$\left[\begin{array}{cccc|c} 6 & 2 & 2 & 2 & 4 \\ 2 & 2 & 0 & 0 & -4 \\ 2 & 0 & 2 & 0 & 2 \\ 2 & 0 & 0 & 2 & 6 \end{array}\right] \sim \left[\begin{array}{cccc|c} 1 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & -1 & -5 \\ 0 & 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$

From the reduced matrix we read off

$$x_1 = 3-x_4,\quad x_2 = -5 + x_4,\quad x_3 = -2 + x_4, \quad x_4 \text{ is free}$$

So the least-squares solutions of $\mathbf{A}\mathbf{x} = \mathbf{b}$ have the general form

$$\hat{\mathbf{x}} = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}$$
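Although there are infinitely many least-squares solutions here, they all produce the same vector $\mathbf{A}\hat{\mathbf{x}}$, namely the projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$. The short check below illustrates this for a few values of the free variable; the function names `fitted` and `normal_residual` and the sampled values of `t` are assumptions made for this sketch.

```python
A = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0],
     [1, 0, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1]]
b = [-3, -1, 0, 2, 5, 1]


def fitted(x):
    """Return the product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def normal_residual(x):
    """Return A^T (b - A x); this is the zero vector for a least-squares solution."""
    r = [bi - fi for bi, fi in zip(b, fitted(x))]
    return [sum(row[j] * ri for row, ri in zip(A, r)) for j in range(len(x))]


# t plays the role of the free variable x_4 in the general solution.
for t in (-2, 0, 1, 5):
    x = [3 - t, -5 + t, -2 + t, t]
    print(t, fitted(x), normal_residual(x))
```

Every choice of `t` satisfies the normal equations and gives the same fitted vector, so all of these solutions have the same least-squares error.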

4. Least-Squares Error

When a least-squares solution $\hat{\mathbf{x}}$ is used to produce $\mathbf{A}\hat{\mathbf{x}}$ as an approximation of $\mathbf{b}$, the distance from $\mathbf{b}$ to $\mathbf{A}\hat{\mathbf{x}}$ is called the least-squares error. It is measured with the Euclidean ($\ell_2$) norm:

$$\|\mathbf{b} - \mathbf{A}\hat{\mathbf{x}}\|$$

Using the example from 3.1, we compute the least-squares error of its solution. We have

$$\mathbf{b} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} ,\quad \mathbf{A}\hat{\mathbf{x}} = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}$$

The vector $\mathbf{r} = \mathbf{b} - \mathbf{A}\hat{\mathbf{x}}$ is also called the residual vector:

$$\mathbf{b} - \mathbf{A}\hat{\mathbf{x}} = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} - \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix}$$

The least-squares error is therefore

$$\| \mathbf{b} - \mathbf{A}\hat{\mathbf{x}} \| = \sqrt{(-2)^2 + (-4)^2 + 8^2} = \sqrt{4 + 16 + 64} = \sqrt{84} \approx 9.17$$
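To see that this error is the smallest one available, it helps to compare it with the error of a few other candidate vectors. The sketch below does this for the same $\mathbf{A}$ and $\mathbf{b}$; the three comparison vectors are arbitrary choices made for illustration, and `residual` and `error` are hypothetical helper names.

```python
import math

A = [[4, 0], [0, 2], [1, 1]]
b = [2, 0, 11]


def residual(x):
    """Return the residual vector b - A x."""
    return [bi - sum(a * v for a, v in zip(row, x)) for row, bi in zip(A, b)]


def error(x):
    """Return the Euclidean norm of the residual, ||b - A x||."""
    return math.sqrt(sum(r * r for r in residual(x)))


x_hat = [1, 2]
print(residual(x_hat), error(x_hat))

# Any other choice of x gives a strictly larger error in this example.
for x in ([1, 1], [0, 2], [2, 3]):
    print(x, error(x))
```

Because the columns of this $\mathbf{A}$ are linearly independent, $\hat{\mathbf{x}}$ is the only vector that attains the minimum.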

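5. Other Ways to Solve Least-Squares Problems

Besides the normal equations $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$, some situations admit a more direct or more stable method. If the columns of $\mathbf{A}$ are orthogonal, the least-squares solution follows directly from the projection formula, with no linear system to solve. If the columns of $\mathbf{A}$ are linearly independent but not orthogonal, a QR factorization reduces the problem to an upper triangular system and avoids the numerical instability that the normal equations can introduce. The two methods are described in turn below.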

5.1 Least-Squares Solutions When the Columns of A Are Orthogonal

When the columns of $\mathbf{A}$ are orthogonal, solving the least-squares problem $\mathbf{A}\mathbf{x} = \mathbf{b}$ is simple. The key idea is still to find the orthogonal projection $\hat{\mathbf{b}}$ of $\mathbf{b}$ onto the column space $\text{Col}\,\mathbf{A}$:

$$\hat{\mathbf{b}} = \text{proj}_{\text{Col}\,\mathbf{A}} \mathbf{b}$$

If the columns of $\mathbf{A}$ are nonzero and orthogonal (that is, $\mathbf{a}_i\cdot \mathbf{a}_j = 0$ for $i\neq j$), the projection formula applies directly:

$$\hat{\mathbf{b}} = \sum_i \frac{\mathbf{b} \cdot \mathbf{a}_i}{\mathbf{a}_i \cdot \mathbf{a}_i} \mathbf{a}_i$$

In the following example the columns of $\mathbf{A}$ are orthogonal:

$$\mathbf{A} = \begin{bmatrix} 1 & -6 \\ 1 & -2 \\ 1 & 1 \\ 1 & 7 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 6 \end{bmatrix}$$

Because the columns $\mathbf{a}_1$ and $\mathbf{a}_2$ are orthogonal, the projection formula gives

$$\begin{aligned}\hat{\mathbf{b}} &= c_1\mathbf{a}_1 + c_2\mathbf{a}_2 \\[2ex] &= \frac{\mathbf{b} \cdot \mathbf{a}_1}{\mathbf{a}_1 \cdot \mathbf{a}_1} \mathbf{a}_1 + \frac{\mathbf{b} \cdot \mathbf{a}_2}{\mathbf{a}_2 \cdot \mathbf{a}_2} \mathbf{a}_2\\[3ex] &= \frac{8}{4}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} + \frac{45}{90}\begin{bmatrix} -6 \\ -2 \\ 1 \\ 7 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 5/2 \\ 11/2 \end{bmatrix} \end{aligned}$$

The entries of $\hat{\mathbf{x}}$ are the weights $c_1,~c_2$:

$$\hat{\mathbf{x}} = \begin{bmatrix}c_1 \\ c_2 \end{bmatrix} = \begin{bmatrix}2 \\ 1/2\end{bmatrix}$$
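The projection formula translates almost word for word into code. The following sketch computes the weights with exact fractions; the function name `least_squares_orthogonal_columns` is hypothetical, and the code assumes the caller supplies nonzero, mutually orthogonal columns (it does not verify this beyond the single check shown).

```python
from fractions import Fraction


def dot(u, v):
    """Exact dot product of two vectors."""
    return sum(Fraction(a) * b for a, b in zip(u, v))


def least_squares_orthogonal_columns(columns, b):
    """Weights c_i = (b . a_i) / (a_i . a_i); valid only for orthogonal columns."""
    return [dot(b, a) / dot(a, a) for a in columns]


a1 = [1, 1, 1, 1]
a2 = [-6, -2, 1, 7]
b = [-1, 2, 1, 6]

assert dot(a1, a2) == 0  # the columns are orthogonal, so the formula applies

x_hat = least_squares_orthogonal_columns([a1, a2], b)
# b_hat = c_1 a_1 + c_2 a_2, the projection of b onto Col A
b_hat = [sum(c * a[i] for c, a in zip(x_hat, (a1, a2))) for i in range(len(b))]

print(x_hat, b_hat)
```

No linear system is solved here: each weight depends only on its own column, which is what makes the orthogonal case so convenient.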

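5.2 Least-Squares Solutions by QR Factorization

When the columns of $\mathbf{A}$ are linearly independent, a QR factorization is usually the more reliable way to compute the least-squares solution. Unlike solving the normal equations $\mathbf{A}^T\mathbf{A}\mathbf{x} = \mathbf{A}^T\mathbf{b}$ directly, it never forms $\mathbf{A}^T\mathbf{A}$ or an inverse matrix, which makes it better suited to large or ill-conditioned matrices.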

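Theorem 15 (least-squares solution by QR factorization)

Let $\mathbf{A}$ be an $m\times n$ matrix with linearly independent columns, and let $\mathbf{A} = \mathbf{Q}\mathbf{R}$ be a QR factorization of $\mathbf{A}$, where $\mathbf{Q}$ is an $m\times n$ matrix with orthonormal columns and $\mathbf{R}$ is an invertible $n\times n$ upper triangular matrix. Then for every $\mathbf{b} \in \mathbb{R}^m$, the equation $\mathbf{A}\mathbf{x} = \mathbf{b}$ has a unique least-squares solution, given by

$$\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{b}\tag{6}$$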

Proof that $\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{b}$ is the unique least-squares solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$.

Let $\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{b}$. Then
$\mathbf{A}\hat{\mathbf{x}} = \mathbf{Q}\mathbf{R} \hat{\mathbf{x}} = \mathbf{Q} \mathbf{R} \mathbf{R}^{-1} \mathbf{Q}^T \mathbf{b}$
Since $\mathbf{R}\mathbf{R}^{-1} = \mathbf{I}$, this simplifies to
$\mathbf{A}\hat{\mathbf{x}} = \mathbf{Q} \mathbf{Q}^T \mathbf{b}$
By Theorem 12, the columns of $\mathbf{Q}$ form an orthonormal basis for $\text{Col}\,\mathbf{A}$, so $\mathbf{Q}\mathbf{Q}^T\mathbf{b}$ is the orthogonal projection of $\mathbf{b}$ onto $\text{Col}\,\mathbf{A}$:
$\hat{\mathbf{b}} = \mathbf{Q}\mathbf{Q}^T\mathbf{b}$
Therefore
$\mathbf{A}\hat{\mathbf{x}} = \hat{\mathbf{b}}$
which shows that $\hat{\mathbf{x}}$ is a least-squares solution.
Uniqueness follows from Theorem 14: the columns of $\mathbf{A}$ are linearly independent, so the least-squares solution is unique.
Conclusion: $\hat{\mathbf{x}} = \mathbf{R}^{-1}\mathbf{Q}^T\mathbf{b}$ is the unique least-squares solution of $\mathbf{A}\mathbf{x} = \mathbf{b}$.

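Avoiding the explicit inverse $\mathbf{R}^{-1}$

In practice we do not compute the inverse $\mathbf{R}^{-1}$ explicitly. Instead we solve the linear system

$$\mathbf{R}\hat{\mathbf{x}} = \mathbf{Q}^T\mathbf{b}\tag{7}$$

for $\hat{\mathbf{x}}$. Because $\mathbf{R}$ is upper triangular, solving $(7)$ by back-substitution is considerably more efficient than evaluating $(6)$. We met the same situation in the section on eigenvalue estimation.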

In the following example the columns of $\mathbf{A}$ are linearly independent. Given

$$\mathbf{A} = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}$$

QR factorization: we omit the steps of factoring $\mathbf{A}$; the result, with $\mathbf{Q}$ of size $4\times 3$ and $\mathbf{R}$ of size $3\times 3$ as in Theorem 15, is
$\mathbf{A} = \mathbf{Q} \mathbf{R} = \begin{bmatrix} 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & -1/2 \\ 1/2 & -1/2 & 1/2 \\ 1/2 & 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}$
Compute $\mathbf{Q}^T\mathbf{b}$:
$\mathbf{Q}^T \mathbf{b} = \begin{bmatrix} 1/2 & 1/2 & 1/2 & 1/2 \\ 1/2 & -1/2 & -1/2 & 1/2 \\ 1/2 & -1/2 & 1/2 & -1/2 \end{bmatrix} \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix}$
Solve the upper triangular system $\mathbf{R}\mathbf{x} = \mathbf{Q}^T\mathbf{b}$:
$\begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix}$
Back-substitution gives the least-squares solution:
$\hat{\mathbf{x}} = \begin{bmatrix} 10 \\ -6 \\ 2 \end{bmatrix}$
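The whole procedure, including the factorization step that the example skips, fits in a short program. The sketch below builds $\mathbf{Q}$ and $\mathbf{R}$ with the classical Gram-Schmidt process and then solves $\mathbf{R}\hat{\mathbf{x}} = \mathbf{Q}^T\mathbf{b}$ by back-substitution. Gram-Schmidt is chosen here only because it is easy to read (numerical libraries typically use Householder reflections instead), and the function names `gram_schmidt_qr` and `back_substitute` are hypothetical.

```python
import math


def gram_schmidt_qr(columns):
    """QR factorization of a matrix given by its linearly independent columns.

    Returns (q_cols, R): the orthonormal columns of Q and the n x n
    upper triangular matrix R, so that A = Q R.
    """
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = [float(x) for x in a]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))   # q_i . a_j
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]  # remove that component
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    return q_cols, R


def back_substitute(R, y):
    """Solve the upper triangular system R x = y, from the last row upward."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x


columns = [[1, 1, 1, 1], [3, 1, 1, 3], [5, 0, 2, 3]]  # the columns of A
b = [3, 5, 7, -3]

q_cols, R = gram_schmidt_qr(columns)
Qtb = [sum(qk * bk for qk, bk in zip(q, b)) for q in q_cols]  # Q^T b
x_hat = back_substitute(R, Qtb)

print(R, Qtb, x_hat)
```

Note that $\mathbf{R}^{-1}$ is never formed: `back_substitute` works on the triangular system directly, which is the point of equation $(7)$.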
