Dilan: Meridian Systems operates 200 cameras per city block. Their AI tracks every face, every gait, every pattern. We need someone who can build vision systems that fight back. Noor — you built a real-time collaborative drawing app with Canvas API and WebRTC for 50,000 users. You already know how to work with pixel data. You just never trained a model on it.
Noor: So when I was streaming pixel buffers over WebRTC in my drawing app — I was already working with image tensors? I just didn't call them that?
Every time you call getImageData() on a canvas context, you get back a Uint8ClampedArray — a flat buffer of RGBA pixel values. That array is a tensor: a multi-dimensional grid of numbers. Computer vision starts with exactly the data structures frontend developers already use.
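That flat buffer has a predictable layout: row-major, four values per pixel. The offset of pixel (x, y) is always (y * width + x) * 4 — a small illustration in plain TypeScript (no canvas needed; `rgbaIndex` is a helper invented here for demonstration):

```typescript
// Flat offset of pixel (x, y) in a row-major RGBA buffer: 4 values per pixel.
function rgbaIndex(x: number, y: number, width: number): number {
  return (y * width + x) * 4; // R, G, B, A
}

// For a 224-wide image:
console.log(rgbaIndex(0, 0, 224)); // 0   -> red value of pixel (0,0)
console.log(rgbaIndex(1, 0, 224)); // 4   -> red value of pixel (1,0)
console.log(rgbaIndex(0, 1, 224)); // 896 -> start of row 1 (224 * 4)
```

The same index math is what makes a flat `Uint8ClampedArray` interchangeable with a multi-dimensional tensor: the dimensions are bookkeeping on top of one contiguous buffer.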
import * as tf from '@tensorflow/tfjs';

const pixels = ctx.getImageData(0, 0, w, h).data; // Uint8ClampedArray
const tensor = tf.browser.fromPixels(canvas);     // [height, width, channels]
// Frontend: you've done this before
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d')!;
canvas.width = 224;
canvas.height = 224;
// Draw an image onto the canvas
ctx.drawImage(img, 0, 0, 224, 224);
// Method 1: Raw pixel access (what you already know)
const imageData = ctx.getImageData(0, 0, 224, 224);
const pixels = imageData.data; // Uint8ClampedArray, length = 224 * 224 * 4
// Each pixel: [R, G, B, A] — values 0-255
// Method 2: TensorFlow.js tensor (the ML way)
const tensor = tf.browser.fromPixels(canvas);
console.log(tensor.shape); // [224, 224, 3] — height, width, RGB channels
console.log(tensor.dtype); // 'int32' — pixel values as integers
// Same data, different interface:
// pixels[0] === tensor.dataSync()[0] — the red value of pixel (0,0)
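One wrinkle worth knowing before feeding this data to a model: most vision models expect floats in the range 0-1, not 0-255 integers. A minimal sketch of that normalization on a raw pixel buffer (plain TypeScript; `normalizePixels` is a name invented for this example, not a library function):

```typescript
// Scale 0-255 channel values down to 0-1 floats, as vision models usually expect.
function normalizePixels(pixels: Uint8ClampedArray): Float32Array {
  const out = new Float32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) {
    out[i] = pixels[i] / 255;
  }
  return out;
}

const raw = new Uint8ClampedArray([0, 128, 255, 255]); // one RGBA pixel
console.log(normalizePixels(raw)); // ~[0, 0.502, 1, 1]
```

With TensorFlow.js the equivalent is chaining tensor ops, e.g. `tensor.toFloat().div(255)`.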
// Why 3 channels, not 4?
// tf.browser.fromPixels drops the alpha channel by default
// Pass numChannels: 4 to keep it:
const withAlpha = tf.browser.fromPixels(canvas, 4);
console.log(withAlpha.shape); // [224, 224, 4]

// Image dimensions in ML:
// Grayscale: [height, width, 1] — one value per pixel
// Color: [height, width, 3] — RGB per pixel
// Batch: [batch, height, width, 3] — multiple images
// A single 224x224 RGB image
const singleImage = tf.randomNormal([224, 224, 3]);
console.log('Single image:', singleImage.shape);
// A batch of 32 images (what models expect during training)
const batch = tf.randomNormal([32, 224, 224, 3]);
console.log('Batch:', batch.shape);
// Models expect batched input — even for a single image
// Use expandDims to add the batch dimension:
const batched = singleImage.expandDims(0);
console.log('Batched single:', batched.shape); // [1, 224, 224, 3]
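Under the hood, a [batch, height, width, channels] tensor is still one flat buffer, just like getImageData's array; the offset math simply gains dimensions. A sketch in plain TypeScript (`nhwcIndex` is a helper invented here to illustrate the layout):

```typescript
// Flat offset of element (b, y, x, c) in a row-major NHWC tensor.
function nhwcIndex(
  b: number, y: number, x: number, c: number,
  height: number, width: number, channels: number
): number {
  return ((b * height + y) * width + x) * channels + c;
}

// In a [32, 224, 224, 3] batch, image 1 begins where image 0's data ends:
console.log(nhwcIndex(0, 0, 0, 0, 224, 224, 3)); // 0
console.log(nhwcIndex(1, 0, 0, 0, 224, 224, 3)); // 150528 (= 224 * 224 * 3)
```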
// Surveillance feed dimensions:
// Meridian cameras output 640x480 RGB frames
// Protocol Sentinel resizes to 224x224 for inference
const cameraFrame = tf.randomNormal([480, 640, 3]);
const resized = tf.image.resizeBilinear(cameraFrame.expandDims(0), [224, 224]);
console.log('Resized for model:', resized.shape); // [1, 224, 224, 3]

Convert canvas pixel data to a tensor and inspect its shape.
Write a function that takes image dimensions (width, height, channels) and returns an object with the tensor shape and total number of elements.
interface TensorInfo {
  shape: [number, number, number];
  totalElements: number;
}

function describeTensor(width: number, height: number, channels: number): TensorInfo {
  // shape is [height, width, channels] (note: height first!)
  // totalElements = height * width * channels
  return null; // your code here
}
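If you get stuck, here is one possible reference sketch (try it yourself first; `describeTensorSolution` is a name used here to avoid clashing with your own implementation):

```typescript
interface TensorInfo {
  shape: [number, number, number];
  totalElements: number;
}

function describeTensorSolution(width: number, height: number, channels: number): TensorInfo {
  return {
    shape: [height, width, channels], // height comes first in image tensors
    totalElements: height * width * channels,
  };
}

console.log(describeTensorSolution(640, 480, 3));
// { shape: [480, 640, 3], totalElements: 921600 }
```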
NEON boots up — glitchy, disoriented, but functional. 'I used to track 14 million faces simultaneously. Now I am helping you debug a tensor shape mismatch. How the mighty have fallen.'
Next: pixels and tensors — understanding the shape of image data