Both generate a population of plausible variations from a single base example by sampling from a distribution. A user factory emits 1000 different `{name, email, address}` records sampled from realistic distributions; data augmentation emits a differently rotated, cropped, or color-jittered version of each training image. The intent is the same: expose the consumer (a test or a model) to the variation it will see in production, rather than to a single hand-picked example.
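
To make the parallel concrete, here is a minimal sketch of both ideas using only the Python standard library. Everything in it is an assumption for illustration: the `User` type, the `user_factory` and `augment` helpers, the small name and street pools, and the tiny grayscale "image" stored as a nested list. It uses a random crop, horizontal flip, and brightness shift as stand-ins for the rotation and color jitter a real image pipeline would apply.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    email: str
    address: str


# Hypothetical value pools; a real factory would draw from richer distributions.
FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton"]
STREETS = ["Main St", "Oak Ave", "Elm Rd"]


def user_factory(rng: random.Random) -> User:
    """Sample one plausible user record from the pools above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = f"{first}.{last}{rng.randint(1, 9999)}@example.com".lower()
    address = f"{rng.randint(1, 999)} {rng.choice(STREETS)}"
    return User(name=f"{first} {last}", email=email, address=address)


def augment(image, rng: random.Random, crop: int = 2):
    """Return a randomly cropped, flipped, brightness-shifted copy of image.

    image is a nested list of grayscale pixel values in 0..255.
    """
    height, width = len(image), len(image[0])
    # Random crop: choose the top-left corner of a crop x crop window.
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]
    # Brightness jitter, clamped to the valid pixel range.
    shift = rng.randint(-20, 20)
    return [[min(255, max(0, pixel + shift)) for pixel in row] for row in patch]


rng = random.Random(0)  # seeded so a failing test can be reproduced

# Factory: 1000 sampled records instead of one hand-picked fixture.
users = [user_factory(rng) for _ in range(1000)]

# Augmentation: 1000 sampled variants of a single base image.
base_image = [[10, 50, 90], [20, 60, 100], [30, 70, 110]]
variants = [augment(base_image, rng) for _ in range(1000)]
```

Both functions take the random generator as an argument, so the population can be made reproducible with a fixed seed or left unseeded to explore new variation on each run.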