Implementing the LU decomposition in Python
Implementing the LU decomposition in Python#
In the following we demonstrate a simple LU decomposition in Python. We also demonstrate do timing comparisions against the Python implementation from Scipy to show that one should never use a self-implementation of the LU decomposition but always use existing Numpy/Scipy routines.
import numpy as np
from scipy.linalg import lu as scipy_lu
def lu(A):
"""Our cool LU implementation."""
if not A.shape[0] == A.shape[1]:
raise ValueError("Input matrix must be square.")
n = A.shape[0]
L = np.zeros((n, n), dtype=np.float64)
U = np.zeros((n, n), dtype=np.float64)
U[:] = A
np.fill_diagonal(L, 1)
for col in range(n-1):
for row in range(col + 1, n):
L[row, col] = U[row, col] / U[col, col]
U[row, col:] = U[row, col:] - L[row, col] * U[col, col:]
U[row, col] = 0
return (L, U)
n = 200
A = np.random.randn(n, n)
L, U = lu(A)
residual = np.linalg.norm(A - L @ U) / np.linalg.norm(A)
print(f"The residual is: {residual}")
runtime_my_lu = %timeit -o lu(A)
runtime_scipy_lu = %timeit -o scipy_lu(A)
print(f"Runtime of our LU (s): {runtime_my_lu.best}")
print(f"Runtime of Scipy LU (s): {runtime_scipy_lu.best}")
print(f"The speed ratio between the two implementations is: {runtime_my_lu.best / runtime_scipy_lu.best}")
The residual is: 2.285256427880218e-13
50.6 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.26 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Runtime of our LU (s): 0.05018079660003423
Runtime of Scipy LU (s): 0.0018813237100403057
The speed ratio between the two implementations is: 26.673132503581297