Ensign Demir: Commander, the raw sensor data is coming in too fast for single-pass processing. We need a pipeline — something that feeds data to the analysis system in manageable chunks.
ARIA: Confirmed. Batch processing will allow me to analyze data incrementally while maintaining memory efficiency.
If you've ever implemented infinite scroll or pagination in a frontend app, you already understand the core concept of data batching. Instead of loading everything at once, you process data in chunks.
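Before reaching for an ML library, the chunking idea can be sketched in plain JavaScript. The `toBatches` helper below is illustrative (not from any library): it slices an array into fixed-size chunks, exactly the way a paginated API serves `?page=N&limit=32`.

```javascript
// Split an array into fixed-size chunks, like pages from an API.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const rows = Array.from({ length: 100 }, (_, i) => i);
const batches = toBatches(rows, 32);
console.log(batches.length);    // 4 batches: 32 + 32 + 32 + 4
console.log(batches[3].length); // the final partial batch has 4 items
```

Note the final batch is smaller than the rest; real data pipelines face the same "last page" edge case.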
In frontend development, you fetch data from APIs and render it. In ML, you fetch data and feed it to a model. The pattern is remarkably similar:
```javascript
import * as tf from '@tensorflow/tfjs';

// Frontend: paginated API fetch
// const page = await fetch('/api/data?page=1&limit=32');

// ML: data generator yields batches
function* sensorDataGenerator() {
  const rawData = [
    { input: [1, 2, 3], label: 0 },
    { input: [4, 5, 6], label: 1 },
    { input: [7, 8, 9], label: 0 },
    // ... thousands more rows
  ];
  for (const row of rawData) {
    yield {
      xs: tf.tensor(row.input),
      ys: tf.tensor([row.label]),
    };
  }
}

// Create a dataset pipeline
const dataset = tf.data.generator(sensorDataGenerator)
  .shuffle(100) // shuffle buffer of 100 items
  .batch(32);   // group into batches of 32

// Iterate through batches (like paginated API calls)
await dataset.forEachAsync((batch) => {
  console.log('Batch shape:', batch.xs.shape); // [32, 3] with a full dataset
});
```

The key difference from frontend pagination: ML shuffles the data randomly before batching. This prevents the model from memorizing the order of the examples instead of learning the underlying patterns.
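To make the shuffle step concrete, here is a minimal sketch of the mechanism behind a shuffle buffer like `.shuffle(100)`: keep up to `size` items in memory, yield a random one, and refill from the source stream. The `shuffleBuffer` generator is an illustration, not the library's actual implementation.

```javascript
// Minimal shuffle-buffer sketch: buffer `size` items, yield one at
// random, refill from the source. Memory stays bounded by `size`.
function* shuffleBuffer(source, size) {
  const buffer = [];
  for (const item of source) {
    buffer.push(item);
    if (buffer.length >= size) {
      const i = Math.floor(Math.random() * buffer.length);
      yield buffer.splice(i, 1)[0];
    }
  }
  // Drain whatever is left in the buffer.
  while (buffer.length > 0) {
    const i = Math.floor(Math.random() * buffer.length);
    yield buffer.splice(i, 1)[0];
  }
}

const shuffled = [...shuffleBuffer([1, 2, 3, 4, 5, 6, 7, 8], 4)];
console.log(shuffled.length); // 8 — same items, randomized order
```

Because the buffer holds only a window of the stream, a small buffer gives only a "local" shuffle; buffer size trades memory for randomness.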
Think of it like rendering a virtual list. You don't render 10,000 DOM nodes at once — you render a "window" of visible items. Similarly, ML models process a "window" (batch) of data at a time, updating their understanding incrementally.
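The "incremental update" part of that analogy can be shown with a statistic simpler than a model: a running mean computed batch by batch. The `runningMean` helper below is illustrative; the point is that each window updates the estimate without the full dataset ever being in memory at once.

```javascript
// Incremental processing: update a running mean one batch at a time,
// never materializing the full dataset.
function runningMean(batches) {
  let count = 0;
  let mean = 0;
  for (const batch of batches) {
    for (const x of batch) {
      count += 1;
      mean += (x - mean) / count; // incremental mean update
    }
  }
  return mean;
}

const batchedValues = [[1, 2, 3], [4, 5], [6]];
console.log(runningMean(batchedValues)); // 3.5, the mean of 1..6
```

A model's weight updates during training follow the same shape: see a batch, nudge the estimate, move on.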
```javascript
// Split data into training (80%) and validation (20%)
const TOTAL = 1000;
const TRAIN_SIZE = Math.floor(TOTAL * 0.8);

// Shuffle once, with a fixed seed and reshuffleEachIteration = false.
// Otherwise each pass re-shuffles, and take()/skip() would draw from a
// different order — leaking validation rows into the training set.
const allData = tf.data.generator(sensorDataGenerator)
  .shuffle(TOTAL, 'split-seed', false);
const trainData = allData.take(TRAIN_SIZE).batch(32);
const valData = allData.skip(TRAIN_SIZE).batch(32);
```

The data pipeline is operational. ARIA confirms: ready for analysis.
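As a sanity check on the split logic, the same 80/20 split can be sketched in plain JavaScript over an array of indices. `splitIndices` is an illustrative helper, not a library function: because the shuffle happens exactly once before slicing, the train and validation sets are guaranteed to be disjoint.

```javascript
// Leak-free 80/20 split: shuffle an index array once, then slice it.
function splitIndices(total, trainFraction) {
  const indices = Array.from({ length: total }, (_, i) => i);
  // Fisher–Yates shuffle
  for (let i = indices.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const cut = Math.floor(total * trainFraction);
  return { train: indices.slice(0, cut), val: indices.slice(cut) };
}

const { train, val } = splitIndices(1000, 0.8);
console.log(train.length, val.length); // 800 200
const overlap = train.filter((i) => val.includes(i));
console.log(overlap.length); // 0 — no index appears in both sets
```

Splitting indices rather than rows also means the raw data can stay on disk; only the index array lives in memory.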
Next: normalizing your data for optimal model performance