companion content · math depth

Information Theory for ML

Entropy quantifies uncertainty, cross-entropy measures prediction quality, and KL divergence measures how far one distribution is from another (it is not a true distance, because it is asymmetric).

Instructor

Every classification model you've trained used cross-entropy loss. But why cross-entropy? Why not mean squared error for classification? The answer comes from information theory, the field that quantifies uncertainty, surprise, and the cost of being wrong. Once you understand it, loss function selection becomes a principled decision, not a recipe to memorize.

Learning Objectives

○ Compute entropy as a measure of uncertainty in a distribution
○ Understand cross-entropy loss as measuring the information gap between predictions and truth
○ Calculate KL divergence and understand its asymmetry
○ Connect entropy to compression ratios, the bridge to frontend intuition
○ Choose appropriate loss functions based on information-theoretic principles

Entropy Measures Surprise

You've used gzip on web assets. Files with repetitive content compress well (low entropy); random data barely compresses at all (high entropy). Entropy is the theoretical minimum average number of bits per symbol needed to losslessly encode messages drawn from a distribution.

Frontend

Gzip Compression

const ratio = compressed.length / original.length

Machine Learning

Entropy

const H = -probs.reduce((s, p) => s + (p > 0 ? p * Math.log2(p) : 0), 0)

Structural Bridge

Where the analogy ends

Gzip compression measures redundancy in a single file. Entropy in ML measures expected information across a distribution and shows up in cross-entropy loss, mutual information, and KL divergence. Gzip is a useful intuition for compressibility but does not yield gradients.

entropy-basics.ts · typescript

// Entropy: H(X) = -sum(p(x) * log2(p(x)))
// It measures the average surprise of events from a distribution

function entropy(probs: number[]): number {
  return -probs.reduce((sum, p) => {
    if (p === 0) return sum;  // 0 * log(0) = 0 by convention
    return sum + p * Math.log2(p);
  }, 0);
}

// Fair coin: maximum uncertainty
console.log('Fair coin:', entropy([0.5, 0.5]).toFixed(4));
// 1.0000 bit: you need exactly 1 bit (0 or 1) to encode each flip

// Biased coin (90% heads)
console.log('Biased coin:', entropy([0.9, 0.1]).toFixed(4));
// 0.4690 bits: less surprise, more compressible

// Certain outcome (100% heads)
console.log('Certain:', entropy([1.0, 0.0]).toFixed(4));
// 0.0000 bits: no surprise at all, nothing to encode

// Uniform distribution over 8 classes
const uniform8 = Array(8).fill(1/8);
console.log('Uniform 8-class:', entropy(uniform8).toFixed(4));
// 3.0000 bits: need 3 bits to encode 8 equally likely outcomes

// Connection to gzip:
// High-entropy file (random bytes) -> poor compression ratio
// Low-entropy file (repeated patterns) -> great compression ratio
// Entropy IS the compression limit
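
To make the gzip connection concrete, here is a minimal sketch that estimates entropy from a byte histogram. `byteEntropy` is a hypothetical helper, not part of any library, and it treats every byte as an independent draw, which real files rarely satisfy.

```typescript
// Hypothetical helper: Shannon entropy of the byte histogram,
// in bits per byte (0 = one repeated byte, 8 = all 256 values equally common).
function byteEntropy(data: Uint8Array): number {
  if (data.length === 0) return 0;

  // Count how often each of the 256 possible byte values occurs.
  const counts = new Array<number>(256).fill(0);
  for (let i = 0; i < data.length; i++) {
    counts[data[i]] += 1;
  }

  // H = -sum(p * log2(p)) over the byte values that actually occur.
  let bits = 0;
  for (let v = 0; v < 256; v++) {
    if (counts[v] === 0) continue;
    const p = counts[v] / data.length;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// One byte value repeated: nothing to encode.
const repeated = new Uint8Array(1024).fill(65);
console.log('Repeated byte:', byteEntropy(repeated)); // 0

// Every byte value equally common: the histogram looks maximally random.
const allValues = new Uint8Array(1024);
for (let i = 0; i < allValues.length; i++) {
  allValues[i] = i % 256;
}
console.log('All values equally common:', byteEntropy(allValues)); // 8
```

Because the histogram ignores byte order, this is only a zeroth-order estimate: the cycling buffer scores 8 bits per byte even though a compressor like gzip could exploit its repeating pattern. That is one more place where the analogy ends.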

Cross-Entropy Prices Wrong Predictions

Cross-entropy H(p, q) asks: "If the true distribution is p, but I'm using distribution q to encode messages, how many bits do I spend per message on average?" When q matches p perfectly, cross-entropy equals entropy (no waste). When q is wrong, you pay extra bits on top of the entropy.

cross-entropy.ts · typescript

import * as tf from '@tensorflow/tfjs';

function crossEntropy(trueProbs: number[], predictedProbs: number[]): number {
  return -trueProbs.reduce((sum, p, i) => {
    if (p === 0) return sum;
    return sum + p * Math.log2(predictedProbs[i]);
  }, 0);
}

// True distribution: cat with 100% certainty
const trueLabel = [1.0, 0.0, 0.0]; // [cat, dog, bird]

// Good prediction
const goodPred = [0.9, 0.05, 0.05];
console.log('Good prediction CE:', crossEntropy(trueLabel, goodPred).toFixed(4));
// 0.1520 bits: low cost, model is close

// Bad prediction
const badPred = [0.1, 0.6, 0.3];
console.log('Bad prediction CE:', crossEntropy(trueLabel, badPred).toFixed(4));
// 3.3219 bits: high cost, model is very wrong

// Terrible prediction (almost certain it's NOT a cat)
const terriblePred = [0.01, 0.90, 0.09];
console.log('Terrible prediction CE:', crossEntropy(trueLabel, terriblePred).toFixed(4));
// 6.6439 bits: massive cost for confident wrong answer

// This is why cross-entropy penalizes confident wrong predictions
// so harshly: the log makes small probabilities very expensive

// TensorFlow.js built-in (uses natural log, not log2).
// softmaxCrossEntropy expects LOGITS, the raw pre-softmax scores,
// not probabilities. Log-probabilities work as logits here because
// softmax(ln(p)) recovers p exactly when p sums to 1.
const labels = tf.tensor2d([[1, 0, 0]]);
const logits = tf.tensor2d([
  [Math.log(0.9), Math.log(0.05), Math.log(0.05)]
]);
const ce = tf.losses.softmaxCrossEntropy(labels, logits);
console.log('TF cross-entropy:', await ce.array());
// 0.1054 nats. Divide by ln(2) and you get 0.1520 bits,
// the same number we computed by hand for goodPred above.
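
Training losses are usually reported as a mean over a batch, and in nats rather than bits. The sketch below averages cross-entropy over a two-example batch and converts the result to bits. `meanCrossEntropyNats` and `natsToBits` are illustrative names, not TensorFlow.js APIs, and the batch simply reuses the good and bad predictions from the listing above.

```typescript
// Illustrative helpers (not TensorFlow.js APIs).

// Convert a loss measured with the natural log (nats) into bits.
function natsToBits(nats: number): number {
  return nats / Math.LN2;
}

// Mean cross-entropy over a batch, using the natural log.
// labels[n] is the true distribution for example n, preds[n] the prediction.
function meanCrossEntropyNats(labels: number[][], preds: number[][]): number {
  let total = 0;
  for (let n = 0; n < labels.length; n++) {
    for (let i = 0; i < labels[n].length; i++) {
      if (labels[n][i] === 0) continue; // zero-probability classes cost nothing
      total -= labels[n][i] * Math.log(preds[n][i]);
    }
  }
  return total / labels.length;
}

// Two "cat" examples: one predicted well, one predicted badly.
const batchLabels = [
  [1, 0, 0],
  [1, 0, 0],
];
const batchPreds = [
  [0.9, 0.05, 0.05],
  [0.1, 0.6, 0.3],
];

const meanNats = meanCrossEntropyNats(batchLabels, batchPreds);
console.log('Mean CE (nats):', meanNats.toFixed(4));             // 1.2040
console.log('Mean CE (bits):', natsToBits(meanNats).toFixed(4)); // 1.7370
```

The value in bits is the average of the two hand-computed results above (0.1520 and 3.3219), so the unit changes the scale of the loss but not which predictions it prefers.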

KL Divergence: Distance Between Distributions

KL divergence measures how different distribution q is from distribution p. It's cross-entropy minus entropy: the extra bits wasted by using the wrong distribution.

kl-divergence.ts · typescript

function klDivergence(p: number[], q: number[]): number {
  return p.reduce((sum, pi, i) => {
    if (pi === 0) return sum;
    return sum + pi * Math.log2(pi / q[i]);
  }, 0);
}

const p = [0.7, 0.2, 0.1]; // true distribution
const q = [0.3, 0.4, 0.3]; // model's distribution

console.log('KL(p || q):', klDivergence(p, q).toFixed(4));
// 0.4972 bits. Positive: q is not a perfect model of p

console.log('KL(q || p):', klDivergence(q, p).toFixed(4));
// 0.5088 bits. Different value! KL divergence is ASYMMETRIC
// KL(p||q) != KL(q||p)
// This asymmetry has practical consequences:
// - Minimizing KL(p||q) = minimizing cross-entropy (what we do in training)
// - Minimizing KL(q||p) = mode-seeking (used in variational inference)

// KL divergence of p from itself
console.log('KL(p || p):', klDivergence(p, p).toFixed(4));
// 0.0000: zero when distributions match

// Why cross-entropy loss works for classification:
// Minimizing CE(true, predicted) is equivalent to minimizing
// KL(true || predicted), because the entropy of the true labels
// is constant. We're directly minimizing the information gap.
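
The claim that the entropy of the true labels is only a constant offset can be checked numerically. The sketch below uses a soft true distribution and three candidate predictions (arbitrary values chosen for illustration). For each candidate, cross-entropy minus KL divergence comes out to the same H(p). The helper names are local to this example.

```typescript
// Local helpers for this example, all in bits.
function entropyBits(p: number[]): number {
  return -p.reduce((sum, pi) => (pi === 0 ? sum : sum + pi * Math.log2(pi)), 0);
}

function crossEntropyBits(p: number[], q: number[]): number {
  return -p.reduce((sum, pi, i) => (pi === 0 ? sum : sum + pi * Math.log2(q[i])), 0);
}

function klBits(p: number[], q: number[]): number {
  return p.reduce((sum, pi, i) => (pi === 0 ? sum : sum + pi * Math.log2(pi / q[i])), 0);
}

const trueDist = [0.7, 0.2, 0.1];

// Arbitrary candidate predictions; the last one matches trueDist exactly.
const candidates = [
  [0.3, 0.4, 0.3],
  [0.6, 0.3, 0.1],
  [0.7, 0.2, 0.1],
];

console.log('H(p):', entropyBits(trueDist).toFixed(4)); // 1.1568

for (const q of candidates) {
  const ce = crossEntropyBits(trueDist, q);
  const kl = klBits(trueDist, q);
  // ce - kl equals H(p) for every q, up to floating-point rounding
  console.log('CE:', ce.toFixed(4), 'KL:', kl.toFixed(4), 'CE - KL:', (ce - kl).toFixed(4));
}
```

Since the offset never changes, whichever prediction minimizes cross-entropy also minimizes KL divergence. With one-hot labels H(p) is zero, so the two losses are numerically identical.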

Why Cross-Entropy and Not MSE for Classification?

ce-vs-mse.ts · typescript

// Consider a confident wrong prediction: true=[1,0], pred=[0.01, 0.99]
const trueLabel = 1.0;

// MSE gradient at pred=0.01:
// d/dp (1 - p)^2 = -2(1-p) = -2(0.99) = -1.98
const mseGrad = -2 * (trueLabel - 0.01);

// Cross-entropy gradient at pred=0.01:
// d/dp -log(p) = -1/p = -1/0.01 = -100
const ceGrad = -1 / 0.01;

console.log('MSE gradient:', mseGrad.toFixed(2));    // -1.98
console.log('CE gradient:', ceGrad.toFixed(2));        // -100.00

// Cross-entropy produces a MUCH stronger gradient for confident
// wrong predictions. This is why CE trains faster for classification:
// it screams "you're wrong!" when the model is confidently incorrect,
// while MSE just whispers.

// Information-theoretic reason: CE directly minimizes the information
// gap, KL(true || predicted). MSE is the maximum-likelihood loss for
// Gaussian noise on real-valued targets, which is a poor fit for
// categorical labels.
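
The comparison above differentiates with respect to the predicted probability. During training, gradients flow back to the logit, and the gap is even wider there. Here is a minimal sketch, assuming a single sigmoid output with target y = 1; the listing above does not specify an output layer, so the sigmoid is an assumption, and the function names are made up for this example.

```typescript
// Assumed setup: one sigmoid output s = sigmoid(z), one binary target y.
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Cross-entropy (natural log): L = -[y ln s + (1 - y) ln(1 - s)]
// Chain rule through the sigmoid gives dL/dz = s - y.
function ceGradWrtLogit(z: number, y: number): number {
  return sigmoid(z) - y;
}

// MSE: L = (y - s)^2
// Chain rule through the sigmoid gives dL/dz = 2 (s - y) s (1 - s).
function mseGradWrtLogit(z: number, y: number): number {
  const s = sigmoid(z);
  return 2 * (s - y) * s * (1 - s);
}

const target = 1;

// More negative logits mean the model is more confidently wrong.
for (const z of [-2, -5, -10]) {
  console.log(
    'z =', z,
    '| pred =', sigmoid(z).toExponential(2),
    '| CE grad =', ceGradWrtLogit(z, target).toFixed(4),
    '| MSE grad =', mseGradWrtLogit(z, target).toExponential(2)
  );
}
```

As z becomes more negative, the cross-entropy gradient approaches -1 while the MSE gradient shrinks toward 0, because the sigmoid's slope s(1 - s) vanishes. Under this assumption MSE gives the weakest signal exactly when the model is most confidently wrong.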

Challenge

Implement entropy, cross-entropy, and KL divergence from scratch.

Recall Prompt

Why does cross-entropy loss produce much stronger gradients than mean squared error when a model makes a confident wrong prediction?

Lesson Recap

What you learned

✓ Entropy measures the average surprise of a distribution: a perfectly predictable outcome has zero entropy, a uniform distribution has maximum entropy.
✓ Cross-entropy measures the average number of bits you spend encoding data from distribution p with a code built for distribution q. It is lowest when q matches p, which is why minimizing it is the natural training objective for classification.
✓ KL divergence is the extra cost above entropy. It is asymmetric, and the direction that cross-entropy training minimizes is KL(true || predicted).

The bridge

Just as a small compressed-to-original size ratio from gzip signals high redundancy (low entropy), a low cross-entropy loss signals that the model's predicted distribution is close to the true label distribution.

You can now

Implement entropy, cross-entropy, and KL divergence from scratch, and explain why cross-entropy is the correct loss function for classification tasks.
