# Least Squares Problems
Let \(A\in\mathbb{R}^{m\times n}\) with \(m\geq n\) and \(b\in\mathbb{R}^m\). In the following we assume that \(\text{rank}(A) = n\).
We define the following least-squares problem:

$$
\min_{x\in\mathbb{R}^n} \|Ax - b\|_2.
$$
## The normal equation
Assume that \(\hat{x}\) is a minimizer. Let \(\epsilon\in\mathbb{R}^n\) be a small perturbation. Then

$$
\|A(\hat{x} + \epsilon) - b\|_2^2 = \|A\hat{x} - b\|_2^2 + 2\epsilon^TA^T(A\hat{x} - b) + \epsilon^TA^TA\epsilon.
$$
It follows that a necessary condition for \(\hat{x}\) to be a minimum is that the linear term vanishes for every perturbation \(\epsilon\), that is

$$
A^T(A\hat{x} - b) = 0.
$$
Since \(\epsilon^TA^TA\epsilon = \|A\epsilon\|_2^2 > 0\) for all \(\epsilon\neq 0\) (here we use that \(\text{rank}(A) = n\)), the necessary condition is also sufficient.
The system

$$
A^TA\hat{x} = A^Tb
$$
is called the normal equation for the least-squares problem.
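As a concrete illustration, the following minimal NumPy sketch forms and solves the normal equation directly; the matrix \(A\) and right-hand side \(b\) are randomly generated example data, not part of the notes above.

```python
import numpy as np

# Example data: a random overdetermined system (assumed for illustration).
rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# Form and solve the normal equation A^T A x = A^T b.
# Since rank(A) = n, the matrix A^T A is symmetric positive definite;
# np.linalg.solve factorises it with a general LU decomposition.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Optimality check: the residual A x - b is orthogonal to the columns of A.
print(np.linalg.norm(A.T @ (A @ x_hat - b)))  # close to machine precision
```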
## Properties of the normal equation
Let \(A = U\Sigma V^T\) be the SVD of \(A\). As a generalization of the relative condition number of a square matrix, we can define the relative condition number of a rectangular matrix as \(\kappa_{rel}(A):=\sigma_1/\sigma_n\).
We have

$$
A^TA = V\Sigma^TU^TU\Sigma V^T = V\left(\Sigma^T\Sigma\right)V^T
$$

as the SVD of \(A^TA\). Hence, \(\kappa_{rel}(A^TA) = \sigma_1^2 / \sigma_n^2 = \kappa_{rel}(A)^2\).
It follows that the linear system in the normal equation has the squared condition number of the original matrix \(A\). In the following, we discuss orthogonalization methods that avoid this squaring of the condition number.
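The squaring effect is easy to observe numerically. A short sketch, again on an assumed random example matrix:

```python
import numpy as np

# Demonstrate that cond(A^T A) = cond(A)^2 on a random example matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))

kappa_A = np.linalg.cond(A)          # sigma_1 / sigma_n, computed via the SVD
kappa_AtA = np.linalg.cond(A.T @ A)

print(kappa_A**2, kappa_AtA)  # the two numbers agree up to rounding
```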
## Solving least-squares problems with the QR decomposition
Let the full QR decomposition of \(A\) be given as

$$
A = Q\begin{bmatrix} R \\ 0 \end{bmatrix},
$$

where \(Q = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\in\mathbb{R}^{m\times m}\) is orthogonal, \(Q_1\) contains the first \(n\) columns of \(Q\), and \(R\in\mathbb{R}^{n\times n}\) is upper triangular.
Substitute into the least-squares problem to obtain

$$
\|Ax - b\|_2^2 = \left\|Q\begin{bmatrix} R \\ 0 \end{bmatrix}x - b\right\|_2^2 = \left\|\begin{bmatrix} R \\ 0 \end{bmatrix}x - Q^Tb\right\|_2^2,
$$

using that multiplication with the orthogonal matrix \(Q^T\) leaves the 2-norm unchanged.
The last term gives

$$
\left\|\begin{bmatrix} Rx - Q_1^Tb \\ -Q_2^Tb \end{bmatrix}\right\|_2^2 = \|Rx - Q_1^Tb\|_2^2 + \|Q_2^Tb\|_2^2.
$$
We see that we can only influence the first block row through \(x\). In the second block row \(x\) is multiplied by zero, so no choice of \(x\) can reduce its contribution. Hence, the minimizer is the solution of the problem

$$
R\hat{x} = Q_1^Tb.
$$
It is possible to show (see exercises) that \(R\) has the same condition number as \(A\). Hence, we avoid the squaring of the condition number that occurs with the normal equation.
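A minimal sketch of this approach with NumPy and SciPy, again on randomly generated example data:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

# Reduced QR decomposition: Q1 is m x n with orthonormal columns,
# R is n x n upper triangular.
Q1, R = np.linalg.qr(A, mode='reduced')

# Solve R x = Q1^T b by back substitution.
x_hat = solve_triangular(R, Q1.T @ b, lower=False)

# Compare against NumPy's built-in least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.linalg.norm(x_hat - x_ref))  # close to machine precision
```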
## Solving least-squares problems with the SVD
The SVD can also be used to solve least-squares problems. This will lead us to an important concept, namely the pseudo-inverse of a matrix.
Let \(A = U\Sigma V^T\) be the SVD of \(A\). Substitute into the least-squares problem to obtain

$$
\|Ax - b\|_2 = \|U\Sigma V^Tx - b\|_2.
$$
Let \(y=V^Tx\) and factorise out \(U\) to obtain

$$
\|U\Sigma V^Tx - b\|_2 = \left\|U\left(\Sigma y - U^Tb\right)\right\|_2 = \|\Sigma y - U^Tb\|_2 = \left(\|\tilde{\Sigma}y - U_1^Tb\|_2^2 + \|U_2^Tb\|_2^2\right)^{1/2},
$$

where we partition \(\Sigma = \begin{bmatrix}\tilde{\Sigma} \\ 0\end{bmatrix}\) with \(\tilde{\Sigma} = \text{diag}(\sigma_1,\dots,\sigma_n)\), and \(U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\) with \(U_1\) containing the first \(n\) columns of \(U\). Only the first term depends on \(y\).
Hence, \(y = \tilde{\Sigma}^{-1}U_1^Tb\) and therefore

$$
\hat{x} = Vy = V\tilde{\Sigma}^{-1}U_1^Tb.
$$
We define the pseudo-inverse of \(A\) as

$$
A^{\dagger} := V\tilde{\Sigma}^{-1}U_1^T\in\mathbb{R}^{n\times m},
$$

so that \(\hat{x} = A^{\dagger}b\).
The pseudo-inverse of \(A\) is a generalization of the inverse of \(A\). In the same way that \(A^{-1}b\) solves the linear system \(Ax=b\) for a square matrix \(A\), the pseudo-inverse \(A^{\dagger}\) applied to \(b\) yields the minimizer of \(\|Ax-b\|_2\) for rectangular \(A\).
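The following sketch assembles the pseudo-inverse from the reduced SVD and checks it against NumPy's built-in `np.linalg.pinv`; the data is again a random example:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

# Reduced SVD: U1 is m x n, sigma contains the n singular values.
U1, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Assemble the pseudo-inverse A^+ = V Sigma^{-1} U1^T explicitly.
A_pinv = Vt.T @ np.diag(1.0 / sigma) @ U1.T
x_hat = A_pinv @ b

# It agrees with NumPy's built-in pseudo-inverse ...
print(np.linalg.norm(A_pinv - np.linalg.pinv(A)))
# ... and x_hat minimises the residual norm.
print(np.linalg.norm(A @ x_hat - b))
```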