[ build log / visual explainer ]

Building a neural network from scratch

A neural network is easier to understand when you can watch the numbers move. I built this one in TypeScript, drew the signals on canvas, and made each training step visible, from the first guess to the weight update.

TypeScript · Canvas API · Zero ML deps · ~10 min read

Why build it

Neural networks often arrive as a wall of notation. The formulas matter, but the picture has to come first. I wanted to see one small guess, one small correction, then another, until the shape of learning became visible.

So I built a playground. You choose the layers, press train, and watch the model learn at 60 frames per second. Every neuron, weight, and guess is drawn live. Signal enters on the left, passes through the hidden layers, and lands as a prediction on the right.

FIG. 00 · A network thinking

Input on the left, output on the right. Each pulse is a number being multiplied by a weight, shifted by a bias, and passed forward.

The idea to keep

A neural network is a stack of simple calculations that can change itself. The useful part is feedback, repeated many times.

The playground aims all of this at handwriting because handwriting is concrete. Draw a digit. Give the model a few examples. Then watch its guesses move from random to useful.

The forward pass

A neuron is a small calculation. It takes each input number, gives it a weight, adds a bias, and sends the result through an activation function.

Put many neurons in a layer, feed that layer's outputs into the next layer, and you have the forward pass. For layer l, the calculation can be written like this:

a^(l) = f( W^(l) · a^(l−1) + b^(l) )

That is the move: multiply, add, bend, repeat. The bend matters. Without the activation f, the whole stack would collapse into a single linear map, no matter how many layers you add. With it, the network can draw curved boundaries.
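To make the formula concrete, here is a minimal sketch of the forward pass in TypeScript. The names `Layer`, `forwardLayer`, and `forwardPass` are assumptions made for illustration, not the playground's actual code, and the weights are assumed to be stored as one row per neuron.

```typescript
// Hypothetical types and names, chosen for illustration only.
type Activation = (z: number) => number;

interface Layer {
  weights: number[][]; // one row per neuron, one column per input
  biases: number[];    // one bias per neuron
}

// a^(l) = f( W^(l) · a^(l−1) + b^(l) ) for a single layer
function forwardLayer(layer: Layer, input: number[], f: Activation): number[] {
  return layer.weights.map((row, j) => {
    // multiply and add: the weighted sum z for neuron j, starting from its bias
    const z = row.reduce((sum, w, k) => sum + w * input[k], layer.biases[j]);
    return f(z); // bend
  });
}

// Repeat: feed each layer's output into the next layer.
function forwardPass(layers: Layer[], input: number[], f: Activation): number[] {
  return layers.reduce((a, layer) => forwardLayer(layer, a, f), input);
}

// Tiny 2 → 2 → 1 network with hand-picked weights.
const tinyNet: Layer[] = [
  { weights: [[1, 0], [0, 1]], biases: [0, 0] },
  { weights: [[1, 1]], biases: [0] },
];
const tinyOutput = forwardPass(tinyNet, [0.5, -0.5], Math.tanh);
console.log(tinyOutput); // one value; the two hidden signals cancel, so it sits at 0
```

Passing the identity function as `f` turns every layer into a plain linear map, which is a quick way to see the collapse described above.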

Activation functions

An activation function decides how much of a signal gets through. It is the bend in the line. Drag across the input axis and watch a plain input turn into three different outputs.

FIG. 01 · Activation functions

tanh: zero-centred, bounded (−1, 1)

The x value goes in. The curve returns the y value. tanh and sigmoid flatten near their ends, which makes gradients small there. ReLU clips negative values and leaves positive values alone.

Sigmoid

σ(x) = 1 / ( 1 + e^(−x) )

Smooth and bounded between 0 and 1. Near either end it gets flat, so gradients get tiny. Good for intuition, touchy in deep stacks.

Tanh

tanh(x) = ( e^x − e^(−x) ) / ( e^x + e^(−x) )

Centered around zero, which keeps updates more balanced. A steady default for this demo.

ReLU

ReLU(x) = max(0, x)

Simple and fast. Values below zero become zero. Values above zero pass through. Common in larger deep learning models.
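The three curves fit in a few lines of TypeScript. This is a sketch with illustrative names, not the playground's source. It also includes each curve's slope, because the slope is what shrinks when a curve goes flat.

```typescript
// Illustrative helper names; the playground's own code may differ.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));
const tanhAct = (x: number): number => Math.tanh(x);
const relu = (x: number): number => Math.max(0, x);

// Slopes (derivatives) of each curve at x.
const sigmoidSlope = (x: number): number => sigmoid(x) * (1 - sigmoid(x));
const tanhSlope = (x: number): number => 1 - Math.tanh(x) ** 2;
const reluSlope = (x: number): number => (x > 0 ? 1 : 0);

for (const x of [-4, 0, 4]) {
  console.log(
    `x=${x}`,
    `sigmoid=${sigmoid(x).toFixed(3)} (slope ${sigmoidSlope(x).toFixed(4)})`,
    `tanh=${tanhAct(x).toFixed(3)} (slope ${tanhSlope(x).toFixed(4)})`,
    `relu=${relu(x)} (slope ${reluSlope(x)})`,
  );
}
```

At x = 0 the sigmoid slope is 0.25 and the tanh slope is 1. At x = 4 both slopes have dropped below 0.02, while the ReLU slope is still 1.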

Backpropagation

Training starts with a guess. The network compares that guess with the target, then sends an error signal backward. Each weight gets a small correction based on how much it contributed to the error.

We need a single number for the miss. This demo uses mean squared error across the outputs:

L = (1/n) Σ_i ( a_i − y_i )²
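As a small worked example, the loss can be computed like this. The function name `meanSquaredError` and the one-hot target are assumptions for illustration.

```typescript
// Mean squared error across the outputs: L = (1/n) Σ (a_i − y_i)²
// `meanSquaredError` is an illustrative name, not taken from the playground.
function meanSquaredError(outputs: number[], targets: number[]): number {
  if (outputs.length !== targets.length) {
    throw new Error("outputs and targets must have the same length");
  }
  const total = outputs.reduce((sum, a, i) => sum + (a - targets[i]) ** 2, 0);
  return total / outputs.length;
}

// Three outputs, one per digit. The target is assumed to be a one-hot "2".
const lossExample = meanSquaredError([0.2, 0.7, 0.1], [0, 1, 0]);
console.log(lossExample.toFixed(3)); // (0.04 + 0.09 + 0.01) / 3, about 0.047
```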

Backpropagation is the chain rule with bookkeeping. At each layer, we calculate how a tiny change in a weight would change the loss.

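∂L/∂W^(l) = ∂L/∂a^(l) · ∂a^(l)/∂z^(l) · ∂z^(l)/∂W^(l)

Here z^(l) = W^(l) · a^(l−1) + b^(l) is the weighted sum before the activation, so a^(l) = f( z^(l) ).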
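The three factors of the chain rule map onto three lines of code. The sketch below covers a single tanh layer trained with mean squared error, where z is the weighted sum before the activation. Every name in it is an assumption for illustration. A deeper network would also pass the error back through each layer's weights, which this sketch leaves out.

```typescript
// A sketch for one tanh layer trained with mean squared error.
// All names here are illustrative assumptions.
interface DenseLayer {
  weights: number[][]; // weights[j][k]: input k to neuron j
  biases: number[];
}

function updateLayer(
  layer: DenseLayer,
  input: number[],
  target: number[],
  learningRate: number,
): number {
  // Forward: z = W·x + b, a = tanh(z)
  const a = layer.weights.map((row, j) =>
    Math.tanh(row.reduce((sum, w, k) => sum + w * input[k], layer.biases[j])),
  );
  const n = a.length;
  const loss = a.reduce((sum, aj, j) => sum + (aj - target[j]) ** 2, 0) / n;

  a.forEach((aj, j) => {
    const dLda = (2 / n) * (aj - target[j]); // ∂L/∂a
    const dadz = 1 - aj * aj;                // ∂a/∂z for tanh
    const delta = dLda * dadz;
    layer.weights[j].forEach((_, k) => {
      // ∂z/∂W_jk is the input x_k
      layer.weights[j][k] -= learningRate * delta * input[k];
    });
    layer.biases[j] -= learningRate * delta; // ∂z/∂b is 1
  });
  return loss; // loss measured before this update
}

const demoLayer: DenseLayer = { weights: [[0.1, -0.2]], biases: [0] };
const lossBefore = updateLayer(demoLayer, [1, 0.5], [0.5], 0.1);
const lossAfter = updateLayer(demoLayer, [1, 0.5], [0.5], 0.1);
console.log(lossBefore, lossAfter); // the second loss is smaller than the first
```

With these hand-picked numbers the first guess is exactly 0, so the first loss is 0.25. One correction later the same input produces a smaller miss.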

The animation runs the same network backward. The error moves right to left. Connections flash as their weights change. A forward pass makes the guess. A backward pass edits the network.

FIG. 02 · The backward pass

The error starts at the output and moves back toward the input. Every lit edge is a weight being corrected. Training is thousands of these small corrections.

The learning rate

The learning rate sets the size of each correction. Small steps learn slowly. Large steps can jump past the answer. Too large, and the network starts chasing its own mistakes.

FIG. 03 · Gradient descent, four learning rates


A low rate crawls. A higher rate reaches the bottom sooner. Push too far and the steps overshoot, then grow unstable.

Moving the controls made the lesson obvious. A rate above the useful range shakes the model loose. Too few neurons and the model misses the pattern. Too many and it can memorize quirks in the samples.
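The effect of the rate is easy to reproduce on a toy problem. The sketch below is an assumption for illustration, not the figure's code. It minimises f(x) = x², whose gradient is 2x, with four rates chosen only to show the crawl, the quick descent, and the blow-up.

```typescript
// Toy problem (an assumption for illustration): minimise f(x) = x², gradient 2x.
function descend(learningRate: number, steps: number, start = 1): number {
  let x = start;
  for (let i = 0; i < steps; i++) {
    x -= learningRate * 2 * x; // step against the gradient
  }
  return x;
}

for (const rate of [0.01, 0.1, 0.45, 1.05]) {
  const distance = Math.abs(descend(rate, 20));
  console.log(`rate ${rate}: distance from the minimum after 20 steps = ${distance.toExponential(2)}`);
}
```

On this function each step multiplies x by (1 − 2 · rate), so 0.01 barely moves, 0.45 lands almost at once, and anything above 1 grows without bound. The thresholds for the digit network are different, but the three regimes are the same.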

What worked

· Learning rates around 0.01 to 0.1
· Three or four hidden layers for this small digit set
· Tanh for calmer gradients
· Real drawn samples
· Drawing every training step

Common pitfalls

· Rates above about 0.5
· Too few neurons, which underfit
· Sigmoid in deeper stacks, where gradients shrink
· Raw pixels with no normalization
· Judging only by training accuracy
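Two of these pitfalls have short fixes. The sketch below assumes 8-bit grayscale pixels (0 to 255) and an 80/20 split. Both are assumptions for illustration, as are the function names.

```typescript
// Scale 8-bit pixel values (0–255) into the 0–1 range before they reach the network.
// Assumes raw grayscale input; values already in 0–1 need no scaling.
function normalizePixels(raw: number[]): number[] {
  return raw.map((v) => v / 255);
}

// Hold back part of the samples so accuracy is also measured on examples
// the network was not trained on. The 80/20 ratio is an arbitrary assumption,
// and the samples should be shuffled before splitting.
function splitSamples<T>(samples: T[], trainFraction = 0.8): { train: T[]; held: T[] } {
  const cut = Math.floor(samples.length * trainFraction);
  return { train: samples.slice(0, cut), held: samples.slice(cut) };
}

const scaled = normalizePixels([0, 51, 255]);
const parts = splitSamples(["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]);
console.log(scaled, parts.train.length, parts.held.length);
```

If accuracy on `train` keeps climbing while accuracy on `held` stalls, the network is memorizing rather than learning.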

Pixels to prediction

A drawing has to become numbers before the network can read it. The canvas is a 28 by 28 grid. Read row by row, it becomes 784 inputs, one value for each pixel.

FIG. 04 · Flattening an image

The grid is read row by row into one tall column. That column is the input. The network does not receive the original 2D layout, so it has to learn which nearby pixels tend to matter together.
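Flattening is a one-liner, sketched here with illustrative names. Pixel (x, y) ends up at index y * 28 + x.

```typescript
const GRID_SIZE = 28; // the canvas is 28 by 28

// Read the grid row by row into one vector of 784 values.
function flattenGrid(grid: number[][]): number[] {
  return ([] as number[]).concat(...grid);
}

// Pixel (x, y) lands at index y * 28 + x in the flattened vector.
const flatIndex = (x: number, y: number): number => y * GRID_SIZE + x;

// A blank grid with one lit pixel at row 2, column 5.
const blankGrid = Array.from({ length: GRID_SIZE }, () =>
  new Array<number>(GRID_SIZE).fill(0),
);
blankGrid[2][5] = 1;

const inputVector = flattenGrid(blankGrid);
console.log(inputVector.length);      // 784
console.log(inputVector.indexOf(1));  // 61, which is 2 * 28 + 5
```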

Network shape

For the first version I kept the problem small: digits 1, 2, and 3. The model uses 784 inputs, three hidden layers of 16, and 3 outputs:

· Input, 784 neurons (28×28 pixels)
· Hidden, 3 layers of 16
· Output, 3 neurons, one per digit
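A quick count shows how many numbers this shape has to learn. The sketch assumes fully connected layers with one bias per neuron, and `countParameters` is an illustrative name.

```typescript
// Count weights and biases for a fully connected network.
function countParameters(layerSizes: number[]): number {
  let total = 0;
  for (let l = 1; l < layerSizes.length; l++) {
    total += layerSizes[l - 1] * layerSizes[l]; // weights between layer l−1 and layer l
    total += layerSizes[l];                     // one bias per neuron in layer l
  }
  return total;
}

const digitNetShape = [784, 16, 16, 16, 3];
console.log(countParameters(digitNetShape)); // 13155
```

Of those 13,155 parameters, 12,560 sit between the input and the first hidden layer. The rest of the network is small by comparison.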

Drawing like a real pen

The drawing canvas turns pointer events into those 784 pixel values. Anti-aliasing helped more than almost any model tweak. Real strokes spread into neighboring pixels, so the data should too:

HandwritingCanvas.tsx · anti-aliased strokes

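// assumes `pixels` is a 784-length numeric array defined elsewhere in the component
const drawPixel = (x: number, y: number, intensity = 1.0): void => {
  const i = y * 28 + x;
  pixels[i] = Math.min(1, pixels[i] + intensity); // the struck pixel

  // bleed a fraction into the 8 neighbours so strokes read smoothly
  const around = [
    { dx: -1, dy: 0, f: 0.3 }, { dx: 1, dy: 0, f: 0.3 },
    { dx: 0, dy: -1, f: 0.3 }, { dx: 0, dy: 1, f: 0.3 },
    { dx: -1, dy: -1, f: 0.15 }, { dx: 1, dy: -1, f: 0.15 },
    { dx: -1, dy: 1, f: 0.15 }, { dx: 1, dy: 1, f: 0.15 },
  ];
  for (const { dx, dy, f } of around) {
    const nx = x + dx;
    const ny = y + dy;
    // skip neighbours that fall off the grid, so strokes at the edge do not wrap
    if (nx < 0 || nx >= 28 || ny < 0 || ny >= 28) continue;
    const j = ny * 28 + nx;
    pixels[j] = Math.min(1, pixels[j] + intensity * f); // keep values in 0–1
  }
};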

Data beats cleverness

I started with synthetic digits. They looked clean, and the network learned the wrong thing. Real samples helped more. Fifty messy digits from people beat five hundred perfect-looking generated ones.

Keeping it live

The page trains and renders at the same time. The loop uses requestAnimationFrame: run one training step, repaint the canvases, then ask the browser for the next frame.
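The loop can be sketched in a few lines. The names below are assumptions for illustration, and the scheduler is passed in as a parameter so the sketch can also run outside a browser. In the page it would be `requestAnimationFrame`.

```typescript
// Hypothetical names; the scheduler parameter stands in for requestAnimationFrame.
type FrameScheduler = (callback: () => void) => unknown;

function startTrainingLoop(
  trainOneStep: () => void,
  repaint: () => void,
  schedule: FrameScheduler,
): () => void {
  let running = true;
  const frame = (): void => {
    if (!running) return;
    trainOneStep();  // run one training step
    repaint();       // repaint the canvases
    schedule(frame); // ask for the next frame
  };
  schedule(frame);
  return () => {
    running = false; // call the returned function to stop the loop
  };
}

// In the browser (trainOneStep and repaint are placeholders for the page's own functions):
// const stopLoop = startTrainingLoop(trainOneStep, repaint, (cb) => requestAnimationFrame(cb));
```

Doing one training step per frame ties training speed to the display's refresh rate, which is what keeps every step visible.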

Canvas resolution needs its own pass. The drawing buffer has to be sized in device pixels, that is, the CSS size multiplied by the device pixel ratio. Otherwise the browser stretches a smaller buffer across the element and the lines go soft:

NetworkCanvas.tsx · DPR-aware render

const render = () => {
  const rect = canvas.getBoundingClientRect();
  const dpr = window.devicePixelRatio || 1;
  canvas.width = rect.width * dpr;        // back the buffer at true resolution
  canvas.height = rect.height * dpr;
  ctx.setTransform(dpr, 0, 0, dpr, 0, 0); // then work in CSS pixels

  ctx.fillStyle = '#efe9de';
  ctx.fillRect(0, 0, rect.width, rect.height);
  // …draw weighted connections, then neurons on top
};

The expensive part

The slow part is drawing the network while training runs. Hundreds of connections need to be repainted at full resolution, and the model still has to make predictions underneath. The page keeps the detail level modest and pauses demos when they leave the viewport.

What I learned

Building it changed what I notice when I read about networks.

· Pictures expose costs. Real-time rendering made tradeoffs visible that a notebook would hide.
· Data shape matters. A few real examples taught more than many clean fake ones.
· Hyperparameters are control knobs. Small changes can turn learning into noise.
· Types catch math mistakes. TypeScript caught mismatched matrix dimensions before the browser did.

The takeaway

Watching the numbers move changes the subject from magic to mechanics. A network becomes easier to reason about when each guess and correction has a visible trace.

[ try it ]

Train the network yourself.

Pick the layers, draw a few digits, and watch the weights change as it learns.

Open the playground