companion content · math depth

The Optimization Landscape

The loss surface is a high-dimensional terrain, and SGD with momentum is a ball rolling across it: velocity carries it through flat regions and past shallow dips.

Instructor

In the Training Loop module, you adjusted learning rates and watched loss curves. But what's the geometry behind those curves? The loss function defines a surface in high-dimensional space, and training is the process of finding the lowest point. The shape of that surface determines everything: whether training converges, how fast, and to what solution.

Learning Objectives

○ Visualize loss functions as surfaces in weight space
○ Distinguish convex from non-convex optimization problems
○ Understand saddle points and why they're more common than local minima in high dimensions
○ Implement SGD with momentum using the ball-rolling-downhill analogy
○ Explain why learning rate schedules improve convergence

The Loss Surface

Imagine plotting the loss as a function of two weights. You get a 3D surface: hills, valleys, ridges. The goal of training is to find the lowest valley. In a real network with millions of weights, this surface exists in millions of dimensions, but the intuition from 3D holds.

Frontend

3D Game Terrain

player.velocity += gravity * dt

Machine Learning

SGD Momentum

velocity = momentum * velocity - lr * gradient

Structural Bridge

Where the analogy ends

3D game terrain is authored by a level designer. SGD navigates a million-dimensional non-convex loss landscape with no map; momentum helps escape some local minima but offers no guarantee of finding the global optimum.

In a game engine, a character walks on terrain defined by a heightmap. Gradient descent is the same idea: you're standing on the loss surface, you look for the direction that goes downhill (the negative gradient), and you take a step that way.
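To make the heightmap comparison concrete, here is a minimal sketch that samples a two-weight loss on a regular grid, the way a terrain heightmap stores one elevation per cell. The helper `sampleHeightmap`, the `Loss2D` type, and the grid bounds are illustrative assumptions rather than part of the lesson code; the loss is the same bowl used in loss-surface.ts.

```typescript
// Hypothetical helper: evaluate a 2D loss on a regular grid, like a terrain heightmap.
type Loss2D = (w1: number, w2: number) => number;

function sampleHeightmap(
  loss: Loss2D,
  min: number,
  max: number,
  cells: number
): number[][] {
  const grid: number[][] = [];
  const stepSize = (max - min) / (cells - 1);
  for (let row = 0; row < cells; row++) {
    const w2 = min + row * stepSize; // rows walk along the w2 axis
    const heights: number[] = [];
    for (let col = 0; col < cells; col++) {
      const w1 = min + col * stepSize; // columns walk along the w1 axis
      heights.push(loss(w1, w2));
    }
    grid.push(heights);
  }
  return grid;
}

// The convex bowl L(w1, w2) = w1^2 + 3*w2^2
const bowl: Loss2D = (w1, w2) => w1 * w1 + 3 * w2 * w2;

// 5x5 grid over [-2, 2] in both weights
const heightmap = sampleHeightmap(bowl, -2, 2, 5);
for (const heights of heightmap) {
  console.log(heights.map(h => h.toFixed(1)).join('\t'));
}
```

The center cell is the bottom of the bowl, and the values rise faster along the rows (w2) than along the columns (w1). A renderer could turn this grid into a surface plot, which only works for two weights at a time; real networks never get a picture like this.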

loss-surface.ts · typescript

// A simple 2D loss function: L(w1, w2) = w1^2 + 3*w2^2
// This is a bowl: convex, one global minimum at (0, 0)
function convexLoss(w1: number, w2: number): number {
  return w1 * w1 + 3 * w2 * w2;
}

// Gradient: [dL/dw1, dL/dw2] = [2*w1, 6*w2]
function convexGradient(w1: number, w2: number): [number, number] {
  return [2 * w1, 6 * w2];
}

// Vanilla gradient descent: step against the gradient, scaled by the learning rate
let w1 = 5.0, w2 = 3.0;
const lr = 0.1;

for (let step = 0; step < 20; step++) {
  const [g1, g2] = convexGradient(w1, w2);
  w1 -= lr * g1;
  w2 -= lr * g2;
  console.log(`Step ${step}: w=[${w1.toFixed(3)}, ${w2.toFixed(3)}] loss=${convexLoss(w1, w2).toFixed(4)}`);
}
// Converges smoothly toward (0, 0); w2 shrinks faster because the bowl is steeper in that direction
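A hand-derived gradient is easy to get wrong, so it can help to compare it against a numerical estimate. The sketch below uses a central finite difference; `numericGradient` and the step size `h` are illustrative choices, not part of the lesson code, and the loss and gradient are repeated under new names so the snippet runs on its own.

```typescript
// Loss and analytic gradient, repeated from loss-surface.ts so this runs standalone
function bowlLoss(w1: number, w2: number): number {
  return w1 * w1 + 3 * w2 * w2;
}

function bowlGradient(w1: number, w2: number): [number, number] {
  return [2 * w1, 6 * w2];
}

// Central finite difference per coordinate: (L(w + h) - L(w - h)) / (2h).
// Hypothetical helper; h = 1e-5 is an assumed step size.
function numericGradient(
  loss: (w1: number, w2: number) => number,
  w1: number,
  w2: number,
  h: number = 1e-5
): [number, number] {
  const d1 = (loss(w1 + h, w2) - loss(w1 - h, w2)) / (2 * h);
  const d2 = (loss(w1, w2 + h) - loss(w1, w2 - h)) / (2 * h);
  return [d1, d2];
}

// Compare both gradients at the starting point used above
const analytic = bowlGradient(5.0, 3.0);
const numeric = numericGradient(bowlLoss, 5.0, 3.0);
const maxError = Math.max(
  Math.abs(analytic[0] - numeric[0]),
  Math.abs(analytic[1] - numeric[1])
);
console.log('analytic:', analytic, 'numeric:', numeric, 'max error:', maxError);
```

If the two disagree by more than a small rounding error, the analytic gradient is the first thing to suspect. The numerical version costs two loss evaluations per weight, which is why it serves as a check and not as a training method.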

Convexity and Non-Convexity

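A convex function is bowl-shaped: the straight line segment between any two points on the surface never dips below the surface. As a result, every local minimum is also a global minimum, so there are no misleading valleys to get trapped in. Linear regression with squared-error loss is convex, so gradient descent with a suitably small learning rate converges to the best answer.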

Neural network loss functions are non-convex. They have multiple valleys, ridges, and saddle points, and there's no guarantee you'll find the global minimum.

non-convex.ts · typescript

// A non-convex 1D loss function with multiple minima
function nonConvexLoss(w: number): number {
  return Math.sin(3 * w) + 0.5 * w * w - w;
}

function nonConvexGradient(w: number): number {
  return 3 * Math.cos(3 * w) + w - 1;
}

// Starting from different points leads to different minima
for (const start of [-2.0, 0.0, 2.0, 4.0]) {
  let w = start;
  const lr = 0.05;
  for (let i = 0; i < 100; i++) {
    w -= lr * nonConvexGradient(w);
  }
  console.log(`Start=${start.toFixed(1)} -> converged to w=${w.toFixed(4)}, loss=${nonConvexLoss(w).toFixed(4)}`);
}
// Different starting points, different answers. This is non-convex optimization
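The line-segment definition of convexity can be tested numerically. A convex function satisfies f((a + b) / 2) <= (f(a) + f(b)) / 2 for every pair of points, so a single violating pair proves non-convexity. The sketch below searches a grid for such a pair; `findConvexityViolation`, the search range, and the sample count are assumptions made for illustration. Finding no violation on a finite grid is evidence of convexity, not a proof.

```typescript
// Hypothetical helper: look for a pair (a, b) where the midpoint of the curve
// sits above the midpoint of the chord, which a convex function never allows.
function findConvexityViolation(
  f: (w: number) => number,
  min: number,
  max: number,
  samples: number
): [number, number] | null {
  const stepSize = (max - min) / (samples - 1);
  for (let i = 0; i < samples; i++) {
    for (let j = i + 1; j < samples; j++) {
      const a = min + i * stepSize;
      const b = min + j * stepSize;
      const chord = (f(a) + f(b)) / 2; // height of the chord's midpoint
      const mid = f((a + b) / 2);      // height of the curve at the midpoint
      if (mid > chord + 1e-12) {
        return [a, b];
      }
    }
  }
  return null;
}

// A plain parabola (convex) and the wavy loss from non-convex.ts
const parabola = (w: number): number => 0.5 * w * w;
const wavy = (w: number): number => Math.sin(3 * w) + 0.5 * w * w - w;

const parabolaViolation = findConvexityViolation(parabola, -3, 5, 41);
const wavyViolation = findConvexityViolation(wavy, -3, 5, 41);
console.log('parabola violation:', parabolaViolation);
console.log('wavy violation:', wavyViolation);
```

The parabola produces no violating pair, while the wavy loss produces one as soon as the search reaches a stretch where the sine term bends the curve downward.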

Saddle Points

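In high dimensions, true local minima are rare. Saddle points are far more common: points where the gradient is zero but the surface curves up in some directions and down in others. For a point to be a local minimum, the surface has to curve upward along every one of millions of directions at once; if even one direction curves downward, the point is a saddle. Think of a mountain pass: it's the lowest point along the ridge line but the highest point on the path from one valley to the next.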

saddle-point.ts · typescript

// Saddle point example: f(x, y) = x^2 - y^2
// At (0, 0): gradient is [0, 0] but it's NOT a minimum
// It curves up in x, down in y: a saddle

function saddleLoss(x: number, y: number): number {
  return x * x - y * y;
}

function saddleGradient(x: number, y: number): [number, number] {
  return [2 * x, -2 * y];
}

// Exactly at (0, 0) the gradient is zero and plain gradient descent never moves.
// Start a tiny distance away to watch the dynamics near the saddle.
let x = 0.001, y = 0.001;
const lr = 0.1;
for (let i = 0; i < 10; i++) {
  const [gx, gy] = saddleGradient(x, y);
  x -= lr * gx;
  y -= lr * gy;
  console.log(`Step ${i}: (${x.toFixed(6)}, ${y.toFixed(6)}) loss=${saddleLoss(x, y).toFixed(6)}`);
}
// x shrinks by a factor of 0.8 per step and y grows by 1.2, so y does escape,
// but slowly: near the saddle the gradient is tiny and progress crawls
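"Curves up in some directions and down in others" has a precise test: the signs of the eigenvalues of the Hessian, the matrix of second derivatives. The sketch below classifies a critical point of a two-variable function from its symmetric 2x2 Hessian using the closed-form eigenvalues. The function `classifyCriticalPoint`, its return labels, and the tolerance are illustrative assumptions, not lesson code.

```typescript
type CriticalPoint = 'minimum' | 'maximum' | 'saddle' | 'degenerate';

// Hypothetical helper: classify a critical point from the Hessian [[a, b], [b, c]].
// Eigenvalues of a symmetric 2x2 matrix are mean +/- radius.
function classifyCriticalPoint(
  a: number,
  b: number,
  c: number,
  tolerance: number = 1e-9
): CriticalPoint {
  const mean = (a + c) / 2;
  const halfDiff = (a - c) / 2;
  const radius = Math.sqrt(halfDiff * halfDiff + b * b);
  const low = mean - radius;
  const high = mean + radius;

  if (low > tolerance) return 'minimum';                      // curves up everywhere
  if (high < -tolerance) return 'maximum';                    // curves down everywhere
  if (low < -tolerance && high > tolerance) return 'saddle';  // mixed curvature
  return 'degenerate';                                        // a flat direction: test is inconclusive
}

// f(x, y) = x^2 - y^2 has Hessian [[2, 0], [0, -2]] everywhere
const saddleKind = classifyCriticalPoint(2, 0, -2);
// L(w1, w2) = w1^2 + 3*w2^2 has Hessian [[2, 0], [0, 6]] everywhere
const bowlKind = classifyCriticalPoint(2, 0, 6);

console.log('x^2 - y^2 at (0, 0):', saddleKind);
console.log('w1^2 + 3*w2^2 at (0, 0):', bowlKind);
```

With two weights there are only two eigenvalues to check. With millions of weights, all of them must be positive for a minimum, which is one way to see why saddles dominate in high dimensions.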

SGD with Momentum

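Momentum addresses two problems: it helps the optimizer escape saddle points, and it accelerates progress through flat regions. The physics analogy is a ball rolling downhill: it accumulates velocity from past gradients, so it keeps moving even where the current gradient is small.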

sgd-momentum.ts · typescript

// SGD with momentum: the ball-rolling-downhill optimizer
function sgdMomentum(
  lossGradFn: (w: number[]) => number[],
  initialWeights: number[],
  lr: number,
  momentum: number,
  steps: number
) {
  const weights = [...initialWeights];
  const velocity = new Array(weights.length).fill(0);

  for (let step = 0; step < steps; step++) {
    const grads = lossGradFn(weights);

    for (let i = 0; i < weights.length; i++) {
      // Physics: v = momentum * v - lr * gradient
      velocity[i] = momentum * velocity[i] - lr * grads[i];
      // Physics: position += velocity
      weights[i] += velocity[i];
    }
  }
  return weights;
}

// Compare: plain SGD vs momentum on a narrow valley
// L(w1, w2) = 0.5 * w1^2 + 50 * w2^2
// This is like a long, narrow canyon (hard for plain SGD)
const lossGrad = (w: number[]): number[] => [w[0], 100 * w[1]];

// momentum = 0 reduces the update to plain SGD
const plainResult = sgdMomentum(lossGrad, [10, 1], 0.005, 0, 200);
const momentumResult = sgdMomentum(lossGrad, [10, 1], 0.005, 0.9, 200);

console.log('Plain SGD:', plainResult.map(v => v.toFixed(4)));
console.log('Momentum:', momentumResult.map(v => v.toFixed(4)));
// Momentum converges much faster in the narrow valley
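The claim that momentum accelerates through flat regions can be checked with a little arithmetic. If the gradient stays at a constant value g, the update v = momentum * v - lr * g settles at v = -lr * g / (1 - momentum), so a momentum of 0.9 makes each step about ten times longer than plain SGD along a steady slope. The sketch below simulates that; `terminalVelocity` and the chosen constants are assumptions for illustration.

```typescript
// Hypothetical helper: iterate the momentum update under a constant gradient
// and return the velocity it settles at.
function terminalVelocity(
  lr: number,
  momentum: number,
  gradient: number,
  steps: number
): number {
  let velocity = 0;
  for (let step = 0; step < steps; step++) {
    velocity = momentum * velocity - lr * gradient;
  }
  return velocity;
}

const lrFlat = 0.01;
const slope = 1.0; // constant gradient, standing in for a long gentle slope

const plainStep = terminalVelocity(lrFlat, 0, slope, 200);      // plain SGD: always -lr * g
const momentumStep = terminalVelocity(lrFlat, 0.9, slope, 200); // approaches -lr * g / (1 - 0.9)

console.log('plain step per iteration:   ', plainStep.toFixed(4));
console.log('momentum step per iteration:', momentumStep.toFixed(4));
console.log('amplification:', (momentumStep / plainStep).toFixed(2));
```

The amplification applies only where successive gradients agree. Across the steep walls of a narrow valley the gradient flips sign from step to step, so the velocity contributions partly cancel and the step does not grow in the same way.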

Challenge

Build a visualization of gradient descent on a loss surface and implement momentum.

Recall Prompt

Why does adding momentum to gradient descent help more than just increasing the learning rate?

Lesson Recap

What you learned

✓ The loss function defines a surface in weight space; gradient descent navigates this surface by stepping in the direction of steepest descent at each point.
✓ Neural network loss surfaces are non-convex, so gradient descent can converge to different valleys depending on where training starts.
✓ Saddle points, where the gradient is zero but the surface curves up in some directions and down in others, are more common than true local minima in high-dimensional spaces.

The bridge

A physics game applies `velocity += gravity * dt` to carry a character through dips in terrain; SGD momentum applies an analogous update, `velocity = momentum * velocity - lr * gradient`, to carry the optimizer through flat regions and past shallow dips on the loss surface.

You can now

Implement SGD with momentum and explain why the loss surface geometry determines when learning rate schedules are needed.
