Dilan: Meridian Systems operates 200 cameras per city block. Their AI tracks every face, every gait, every pattern. We need someone who can build vision systems that fight back. Noor — you built a real-time collaborative drawing app with Canvas API and WebRTC for 50,000 users. You already know how to work with pixel data. You just never trained a model on it.
Noor: So when I was streaming pixel buffers over WebRTC in my drawing app — I was already working with image tensors? I just didn't call them that?
Every time you call getImageData() on a canvas context, you get back a Uint8ClampedArray — a flat buffer of RGBA pixel values. That array is a tensor: a multi-dimensional grid of numbers. Computer vision starts with exactly the data structures frontend developers already use.
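That flat buffer has a predictable layout: row-major, four values per pixel. The offset of pixel (x, y) is always (y * width + x) * 4 — a small illustration in plain TypeScript (no canvas needed; `rgbaIndex` is a helper invented here for demonstration):

```typescript
// Flat offset of pixel (x, y) in a row-major RGBA buffer: 4 values per pixel.
function rgbaIndex(x: number, y: number, width: number): number {
  return (y * width + x) * 4; // R, G, B, A
}

// For a 224-wide image:
console.log(rgbaIndex(0, 0, 224)); // 0   -> red value of pixel (0,0)
console.log(rgbaIndex(1, 0, 224)); // 4   -> red value of pixel (1,0)
console.log(rgbaIndex(0, 1, 224)); // 896 -> start of row 1 (224 * 4)
```

The same index math is what makes a flat `Uint8ClampedArray` interchangeable with a multi-dimensional tensor: the dimensions are bookkeeping on top of one contiguous buffer.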
import * as tf from '@tensorflow/tfjs';

const pixels = ctx.getImageData(0, 0, w, h).data; // Uint8ClampedArray
const tensor = tf.browser.fromPixels(canvas);     // [height, width, channels]
// Frontend: you've done this before
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d')!;
canvas.width = 224;
canvas.height = 224;
// Draw an image onto the canvas
ctx.drawImage(img, 0, 0, 224, 224);
// Method 1: Raw pixel access (what you already know)
const imageData = ctx.getImageData(0, 0, 224, 224);
const pixels = imageData.data; // Uint8ClampedArray, length = 224 * 224 * 4
// Each pixel: [R, G, B, A] — values 0-255
// Method 2: TensorFlow.js tensor (the ML way)
const tensor = tf.browser.fromPixels(canvas);
console.log(tensor.shape); // [224, 224, 3] — height, width, RGB channels
console.log(tensor.dtype); // 'int32' — pixel values as integers
// Same data, different interface:
// pixels[0] === tensor.dataSync()[0] — the red value of pixel (0,0)
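One wrinkle worth knowing before feeding this data to a model: most vision models expect floats in the range 0-1, not 0-255 integers. A minimal sketch of that normalization on a raw pixel buffer (plain TypeScript; `normalizePixels` is a name invented for this example, not a library function):

```typescript
// Scale 0-255 channel values down to 0-1 floats, as vision models usually expect.
function normalizePixels(pixels: Uint8ClampedArray): Float32Array {
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    out[i] = pixels[i] / 255;
  }
  return out;
}

const raw = new Uint8ClampedArray([0, 128, 255, 255]); // one RGBA pixel
console.log(normalizePixels(raw)); // ~[0, 0.502, 1, 1]
```

With TensorFlow.js the equivalent is chaining tensor ops, e.g. `tensor.toFloat().div(255)`.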
// Why 3 channels, not 4?
// tf.browser.fromPixels drops the alpha channel by default
// Pass numChannels: 4 to keep it:
const withAlpha = tf.browser.fromPixels(canvas, 4);
console.log(withAlpha.shape); // [224, 224, 4]

// Image dimensions in ML:
// Grayscale: [height, width, 1] — one value per pixel
// Color: [height, width, 3] — RGB per pixel
// Batch: [batch, height, width, 3] — multiple images
// A single 224x224 RGB image
const singleImage = tf.randomNormal([224, 224, 3]);
console.log('Single image:', singleImage.shape);
// A batch of 32 images (what models expect during training)
const batch = tf.randomNormal([32, 224, 224, 3]);
console.log('Batch:', batch.shape);
// Models expect batched input — even for a single image
// Use expandDims to add the batch dimension:
const batched = singleImage.expandDims(0);
console.log('Batched single:', batched.shape); // [1, 224, 224, 3]
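Under the hood, a [batch, height, width, channels] tensor is still one flat buffer, just like getImageData's array; the offset math simply gains dimensions. A sketch in plain TypeScript (`nhwcIndex` is a helper invented here to illustrate the layout):

```typescript
// Flat offset of element (b, y, x, c) in a row-major NHWC tensor.
function nhwcIndex(
  b: number, y: number, x: number, c: number,
  height: number, width: number, channels: number
): number {
  return ((b * height + y) * width + x) * channels + c;
}

// In a [32, 224, 224, 3] batch, image 1 begins where image 0's data ends:
console.log(nhwcIndex(0, 0, 0, 0, 224, 224, 3)); // 0
console.log(nhwcIndex(1, 0, 0, 0, 224, 224, 3)); // 150528 (= 224 * 224 * 3)
```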
// Surveillance feed dimensions:
// Meridian cameras output 640x480 RGB frames
// Protocol Sentinel resizes to 224x224 for inference
const cameraFrame = tf.randomNormal([480, 640, 3]);
const resized = tf.image.resizeBilinear(cameraFrame.expandDims(0), [224, 224]);
console.log('Resized for model:', resized.shape); // [1, 224, 224, 3]

Convert canvas pixel data to a tensor and inspect its shape.
Write a function that takes image dimensions (width, height, channels) and returns an object with the tensor shape and total number of elements.
interface TensorInfo {
  shape: [number, number, number];
  totalElements: number;
}

function describeTensor(width: number, height: number, channels: number): TensorInfo {
  // shape is [height, width, channels] (note: height first!)
  // totalElements = height * width * channels
  return null; // your code here
}
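If you get stuck, here is one possible reference sketch (try it yourself first; `describeTensorSolution` is a name used here to avoid clashing with your own implementation):

```typescript
interface TensorInfo {
  shape: [number, number, number];
  totalElements: number;
}

function describeTensorSolution(width: number, height: number, channels: number): TensorInfo {
  return {
    shape: [height, width, channels], // height comes first in image tensors
    totalElements: height * width * channels,
  };
}

console.log(describeTensorSolution(640, 480, 3));
// { shape: [480, 640, 3], totalElements: 921600 }
```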
NEON boots up — glitchy, disoriented, but functional. 'I used to track 14 million faces simultaneously. Now I am helping you debug a tensor shape mismatch. How the mighty have fallen.'
Next: pixels and tensors — understanding the shape of image data