Zara: A grayscale image is a 2D grid — height by width. Each cell holds one brightness value. Add color and you get a third dimension: three channels for red, green, blue. Stack multiple images and you get a fourth dimension: the batch. You already think in these dimensions when you write CSS grid layouts.
Noor: So a 224x224 RGB image is like a CSS grid with 224 rows, 224 columns, and 3 layers stacked on top of each other? And a batch is like rendering that grid 32 times?
The flat Uint8ClampedArray from getImageData() stores pixels in row-major order: [R, G, B, A, R, G, B, A, ...]. TensorFlow.js tensors organize the same data as a shaped multi-dimensional array: [height, width, channels]. Different interface, same numbers underneath.
imageData.data[i * 4 + 0] // Red channel of pixel i (flat array)
tensor.slice([y, x, 0], [1, 1, 1]) // Red channel of pixel (y, x)

import * as tf from '@tensorflow/tfjs';
// Canvas: flat array, RGBA interleaved
// imageData.data = [R0, G0, B0, A0, R1, G1, B1, A1, ...]
// To get pixel (x, y) in a WxH image:
// index = (y * width + x) * 4;
// red = data[index + 0]
// green = data[index + 1]
// blue = data[index + 2]
// alpha = data[index + 3]
function getPixelCanvas(data: Uint8ClampedArray, width: number, x: number, y: number) {
  const i = (y * width + x) * 4;
  return { r: data[i], g: data[i + 1], b: data[i + 2], a: data[i + 3] };
}
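As a quick check of the index math, here is getPixelCanvas applied to a hand-built 2x2 RGBA array (a hypothetical example, with the function repeated so the snippet runs on its own):

```typescript
// A 2x2 image, 4 RGBA bytes per pixel, row-major:
// pixel (0,0)=red, (1,0)=green, (0,1)=blue, (1,1)=white
const data = new Uint8ClampedArray([
  255, 0, 0, 255,    0, 255, 0, 255,     // row 0
  0, 0, 255, 255,    255, 255, 255, 255, // row 1
]);

function getPixelCanvas(data: Uint8ClampedArray, width: number, x: number, y: number) {
  const i = (y * width + x) * 4;
  return { r: data[i], g: data[i + 1], b: data[i + 2], a: data[i + 3] };
}

console.log(getPixelCanvas(data, 2, 1, 0)); // { r: 0, g: 255, b: 0, a: 255 }
console.log(getPixelCanvas(data, 2, 0, 1)); // { r: 0, g: 0, b: 255, a: 255 }
```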
// Tensor: shaped array, [height, width, channels]
// Pixel (x, y) is at position [y, x, :]
// No manual index math needed — the shape handles it
function getPixelTensor(tensor: tf.Tensor3D, x: number, y: number) {
  const pixel = tensor.slice([y, x, 0], [1, 1, 3]);
  const [r, g, b] = Array.from(pixel.dataSync());
  pixel.dispose();
  return { r, g, b };
}
// Both approaches access the same underlying data
// Tensors are cleaner because you think in dimensions, not offsets// Converting between formats ML models need
// Channels-last (TF.js default): [height, width, channels]
// This is what tf.browser.fromPixels returns
const channelsLast = tf.randomNormal([224, 224, 3]);
// Channels-first (PyTorch convention): [channels, height, width]
// Some pre-trained models expect this
const channelsFirst = channelsLast.transpose([2, 0, 1]);
console.log(channelsFirst.shape); // [3, 224, 224]
// Grayscale from RGB — average the channels
const rgb = tf.browser.fromPixels(canvas); // [224, 224, 3]
const grayscale = rgb.mean(2, true); // [224, 224, 1]
console.log(grayscale.shape);
// Or use a weighted conversion (human perception)
// Luminance = 0.299*R + 0.587*G + 0.114*B
const weights = tf.tensor1d([0.299, 0.587, 0.114]);
const luminance = rgb.toFloat().mul(weights).sum(2, true);
console.log(luminance.shape); // [224, 224, 1]

Access specific pixel values from a tensor and reshape image data.
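To sanity-check the weighted conversion, here is the same luminance formula computed by hand for a single pixel — plain TypeScript with no TF.js, using a hypothetical helper named luminanceOf:

```typescript
// Luminance = 0.299*R + 0.587*G + 0.114*B, per the weights above
function luminanceOf(r: number, g: number, b: number): number {
  return 0.299 * r + 0.587 * g + 0.114 * b;
}

console.log(Math.round(luminanceOf(255, 0, 0)));     // 76  — pure red reads fairly dark
console.log(Math.round(luminanceOf(0, 255, 0)));     // 150 — green dominates perceived brightness
console.log(Math.round(luminanceOf(255, 255, 255))); // 255 — the weights sum to 1, so white is preserved
```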
Write a function that converts between flat pixel array index and (x, y) coordinates. Given width, convert a flat index to {x, y} and vice versa. Assume 4 channels (RGBA) per pixel.
interface Pixel { x: number; y: number; }

function flatToPixel(flatIndex: number, width: number): Pixel {
  // flatIndex is the byte offset in the RGBA array
  // Each pixel is 4 bytes: R, G, B, A
  // pixelIndex = floor(flatIndex / 4)
  // x = pixelIndex % width
  // y = floor(pixelIndex / width)
  return null; // your code here
}

function pixelToFlat(x: number, y: number, width: number): number {
  // Return the flat byte offset for pixel (x, y)
  return null; // your code here
}
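The channels-first transpose shown earlier is also just index math. Here is the same remapping written out in plain TypeScript for a tiny 2x2x3 array — a hypothetical helper named toChannelsFirst, shown only to make the permutation explicit:

```typescript
// Channels-last  [h, w, c] flat index: (y * width + x) * channels + c
// Channels-first [c, h, w] flat index: c * height * width + y * width + x
function toChannelsFirst(data: number[], height: number, width: number, channels: number): number[] {
  const out = new Array<number>(data.length);
  for (let c = 0; c < channels; c++) {
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        out[c * height * width + y * width + x] = data[(y * width + x) * channels + c];
      }
    }
  }
  return out;
}

// 2x2 RGB image, channels-last: one [R, G, B] triple per pixel
const hwc = [1, 2, 3,  4, 5, 6,  7, 8, 9,  10, 11, 12];
console.log(toChannelsFirst(hwc, 2, 2, 3));
// [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12] — all reds, then all greens, then all blues
```

This is exactly what transpose([2, 0, 1]) does on a tensor, just spelled out as offsets.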
Noor realizes the pixel patterns in Meridian's feeds are familiar — too familiar. NEON says: 'I have seen this encoding before. At Meridian.'
Next: color channels — RGB, grayscale, and hidden data in color spaces