Skip to content
Extras/math-deep-dive/convolution-as-matrix
// companion content · math depth

Convolution as Matrix Multiplication

Convolution can be expressed as multiplication by a Toeplitz matrix — a structured matrix where each diagonal is constant.

Instructor

In the Signal Processing module, you applied convolution as a sliding window over data. That's the intuitive view. But mathematically, convolution is matrix multiplication in disguise. The kernel becomes a special structured matrix called a Toeplitz matrix, and the "sliding" becomes a single matrix-vector multiply. This is exactly why GPUs — which are optimized for matrix math — run convolutions so fast.

Learning Objectives

  • Understand the Toeplitz matrix and how it encodes a sliding window
  • Convert a 1D convolution into an equivalent matrix multiplication
  • Explain why convolution in time domain equals multiplication in frequency domain
  • Connect CSS transform matrices to the structured matrices used in convolution
  • Reason about computational cost of convolution vs. FFT-based approaches

The Sliding Window as a Matrix

When you convolve a kernel [a, b, c] with an input [x1, x2, x3, x4, x5], you slide the kernel across and compute dot products. But you can express this as a single matrix multiplication:

Frontend

CSS Transform Matrix
transform: matrix(a, b, c, d, tx, ty)

Machine Learning

Toeplitz Matrix
const output = tf.matMul(toeplitzMatrix, inputVector)
Structural Bridge
⚠ Where this breaks
CSS transform matrix is a fixed 3x3 you wrote. The Toeplitz matrix that represents a convolution is automatically derived from the kernel, sparse, structured, and used only as a mathematical equivalence — practical implementations exploit the structure to avoid the explicit matrix.
toeplitz-basic.tstypescript
import * as tf from '@tensorflow/tfjs';

// Kernel: [1, 0, -1] (edge detector, like a CSS filter)
// Input:  [3, 7, 2, 8, 4]

// Sliding window (the intuitive way):
// Position 0: 1*3 + 0*7 + (-1)*2 = 1
// Position 1: 1*7 + 0*2 + (-1)*8 = -1
// Position 2: 1*2 + 0*8 + (-1)*4 = -2

const kernel = tf.tensor1d([1, 0, -1]);
const input = tf.tensor1d([3, 7, 2, 8, 4]);

// The SAME operation as a Toeplitz matrix:
const toeplitz = tf.tensor2d([
[1,  0, -1,  0,  0],   // kernel at position 0
[0,  1,  0, -1,  0],   // kernel at position 1
[0,  0,  1,  0, -1],   // kernel at position 2
]);

const resultMatrix = tf.matMul(toeplitz, input.reshape([5, 1]));
console.log('Matrix result:', await resultMatrix.array());
// [[1], [-1], [-2]] — identical to the sliding window

// Using tf.conv1d for comparison
const resultConv = tf.conv1d(
input.reshape([1, 5, 1]),
kernel.reshape([3, 1, 1]),
1,    // stride
'valid'
);
console.log('Conv1d result:', await resultConv.array());
// [[[1], [-1], [-2]]] — same answer

Building a Toeplitz Matrix

The pattern is simple: each row of the Toeplitz matrix is the kernel shifted one position to the right, padded with zeros. The diagonals are constant — that's the defining property of a Toeplitz matrix.

build-toeplitz.tstypescript
import * as tf from '@tensorflow/tfjs';

function buildToeplitz(kernel: number[], inputLength: number): number[][] {
const kernelLength = kernel.length;
const outputLength = inputLength - kernelLength + 1; // 'valid' padding
const matrix: number[][] = [];

for (let i = 0; i < outputLength; i++) {
  const row = new Array(inputLength).fill(0);
  for (let j = 0; j < kernelLength; j++) {
    row[i + j] = kernel[j];
  }
  matrix.push(row);
}
return matrix;
}

// Build Toeplitz for a 3-element kernel on 7-element input
const kernel = [1, -2, 1]; // second-difference operator
const T = buildToeplitz(kernel, 7);

console.log('Toeplitz matrix:');
T.forEach(row => console.log(row.map(v => v.toString().padStart(3)).join(' ')));
// Each row is the kernel shifted right by one position:
//   1 -2  1  0  0  0  0
//   0  1 -2  1  0  0  0
//   0  0  1 -2  1  0  0
//   0  0  0  1 -2  1  0
//   0  0  0  0  1 -2  1

// Apply it
const input = tf.tensor1d([1, 4, 6, 4, 1, 0, 2]);
const toeplitzTensor = tf.tensor2d(T);
const result = tf.matMul(toeplitzTensor, input.reshape([7, 1]));
console.log('Result:', await result.flatten().array());

The Convolution Theorem

One of the most beautiful results in mathematics: convolution in the time domain equals element-wise multiplication in the frequency domain. This means you can compute convolution by:

  1. FFT the input
  2. FFT the kernel
  3. Multiply element-wise
  4. Inverse FFT

For large kernels, this is dramatically faster than sliding the window.

convolution-theorem.tstypescript
import * as tf from '@tensorflow/tfjs';

// Demonstrate the convolution theorem
// For simplicity, we'll work with compatible sizes

// The idea: conv(a, b) = ifft(fft(a) * fft(b))
// This is why audio processing uses FFT-based convolution
// for reverb effects — the impulse response kernel can be huge

// Direct convolution: O(n * k) where n=input size, k=kernel size
// FFT convolution:    O(n * log(n))
// When k is large (e.g., audio reverb with 48000-sample kernel),
// FFT wins massively

// In neural networks, kernels are small (3x3, 5x5)
// so direct convolution (via matrix multiply) is usually faster.
// But for signal processing with large kernels, FFT is essential.

// The connection to matrix multiplication:
// The Toeplitz matrix is diagonalized by the DFT matrix.
// T = F^(-1) * D * F, where D has the FFT of the kernel on diagonal.
// This is WHY the convolution theorem works!

const inputSize = 8;
const input = tf.randomNormal([inputSize]);
const kernel = tf.tensor1d([0.25, 0.5, 0.25]); // smoothing kernel

// Direct convolution
const directResult = tf.conv1d(
input.reshape([1, inputSize, 1]),
kernel.reshape([3, 1, 1]),
1,
'valid'
);

console.log('Direct convolution:', await directResult.flatten().array());
// FFT-based would give the same result, but is only faster for large kernels

Why This Matters for GPUs

GPUs have thousands of cores optimized for matrix multiplication. By expressing convolution as a matrix multiply (via Toeplitz or the related im2col approach), convolution layers run at near-peak GPU throughput. This is the fundamental reason deep learning took off when GPUs became accessible.

performance-insight.tstypescript
import * as tf from '@tensorflow/tfjs';

// In practice, frameworks use im2col (image to column):
// 1. Rearrange input patches into columns of a matrix
// 2. Reshape kernel into a matrix
// 3. Single matrix multiply = all convolution outputs
//
// This trades memory for speed — classic CS tradeoff

// Benchmark: sliding window vs matrix multiply
const inputLength = 1000;
const kernelSize = 5;
const input = tf.randomNormal([1, inputLength, 1]);
const filter = tf.randomNormal([kernelSize, 1, 1]);

// Time the convolution (internally uses matrix multiply on GPU)
const start = performance.now();
for (let i = 0; i < 100; i++) {
const result = tf.conv1d(input, filter, 1, 'valid');
result.dispose();
}
const elapsed = performance.now() - start;
console.log(`100 convolutions: ${elapsed.toFixed(1)}ms`);
// Fast because it's matrix multiplication under the hood

Challenge

Build a Toeplitz matrix from a kernel and verify it matches TensorFlow.js convolution output.

Exercise

AdvancedTensor Data~20 min

Toeplitz Convolution

Implement two functions: (1) `buildToeplitz` that takes a kernel array and input length, and returns a 2D array representing the Toeplitz matrix for 'valid' convolution (output length = inputLength - kernelLength + 1). Each row should be the kernel placed at the corresponding position, padded with zeros. (2) `convolveViaMatrix` that uses your Toeplitz matrix with TensorFlow.js matrix multiplication to compute the convolution result, returning a 1D tensor.

# bridge

CSS Transform MatrixToeplitz Matrix

Key Takeaways

  • Convolution is matrix multiplication using a Toeplitz matrix — each row is the kernel shifted by one position
  • The Toeplitz matrix turns a sliding window into a single matrix-vector multiply
  • The convolution theorem: convolution in time domain = multiplication in frequency domain
  • GPUs are fast at convolution because they're fast at matrix multiplication — it's the same operation
  • For small kernels (neural networks), direct/matrix convolution wins; for large kernels (audio), FFT wins

Need a hint?

🧭 Guidance
Solution
Report Issue
0/2000
Severity
Screenshot
+ Attach screenshot (optional)
page url + browser info captured automatically