Part 23 of 58

The Loop

By Madhav Kaushish · Ages 12+

Trviksha needed a velociraptor that remembered. Not one that received all the data at once, but one that processed it step by step, carrying forward what it had learned.

The Feedback Wire

She started with a single velociraptor — Drysska, as always. At each time step, Drysska received the current day's weather readings (five inputs: temperature, humidity, wind, cloud cover, rainfall) and produced an output.

The modification was simple in concept: Drysska's output from one time step would feed back as an additional input at the next time step.

Trviksha: On Day 1, you receive five weather inputs and produce an output. On Day 2, you receive five new weather inputs plus your own output from Day 1 — six inputs total. On Day 3, five inputs plus your Day 2 output. And so on.

Drysska: On Day 1, I have no previous output.

Trviksha: On Day 1, you start with a blank — a zero. You have seen nothing yet, so your carried-forward summary is empty.

Drysska processed Day 1's weather, producing a single output number. That number was a compressed summary of Day 1 — whatever Drysska's weights deemed important about that day's readings. On Day 2, Drysska received both the new weather readings and that summary. She processed the combination, producing a new output — a summary of Days 1 and 2, compressed through her weights.

By Day 7, Drysska's output encoded — in a single number — a compressed summary of the entire week, as filtered through her weights. If her weights were trained well, this summary would capture the weather patterns that mattered for prediction.

Blortz: The output at each step is a summary of everything up to that step. Day 7's output contains Day 6's output, which contains Day 5's, and so on. The entire history is folded into a single number, like a letter sealed inside a letter sealed inside a letter.

A velociraptor (Drysska) sitting at a workstation. On Day 1, she receives five pebbles from the left (today's weather) and produces one pebble on the right (her output). An arrow curves from the output pebble back around to the input side. On Day 2, the same five pebbles arrive from the left, plus the curved-back output pebble from Day 1 — six inputs total. The loop continues across several days, with the output always feeding back.
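Drysska's loop can be sketched in a few lines of Python. The particular weights and the tanh squashing below are illustrative stand-ins, not numbers from the story — the point is the shape of the loop: six effective inputs per day, one carried-forward summary.

```python
import math

def drysska_step(weather, prev_summary, weights, feedback_weight):
    # Six effective inputs: five weather readings plus yesterday's summary.
    total = sum(w * x for w, x in zip(weights, weather))
    total += feedback_weight * prev_summary
    return math.tanh(total)  # compress everything into one summary number

def run_week(week_of_weather, weights, feedback_weight):
    summary = 0.0  # Day 1 starts with a blank carried-forward summary
    for day in week_of_weather:
        summary = drysska_step(day, summary, weights, feedback_weight)
    return summary  # the whole week, folded step by step into one number

# Illustrative week: seven days of (temp, humidity, wind, cloud, rainfall)
week = [[0.5, 0.6, 0.1, 0.3, 0.0]] * 7
print(run_week(week, weights=[0.2, 0.4, -0.1, 0.3, 0.5], feedback_weight=0.7))
```

Note that `run_week` returns a single number, just as Blortz describes: Day 7's summary contains Day 6's, which contains Day 5's, and so on.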

Training the Loop

Training was more complex than before. The error at the final time step — the difference between the predicted and actual Day 8 weather — had to flow backward not just through the network's layers, but through time. The error at Day 7 flowed back to Day 6, then to Day 5, and so on. Each time step was like a layer in a very deep network, with the same weights reused at every step.

Trviksha: The same weights apply at every time step. Just as the convolutional filter used the same weights at every spatial position, this loop uses the same weights at every temporal position. The velociraptor processes Day 1 and Day 7 using the same set of weights.

Blortz: Parameter sharing again. But through time instead of space.

Trviksha: Exactly. And for the same reason. The weather patterns we are looking for — "rain follows humidity" — should work the same way regardless of whether they happen on Day 3 or Day 47. The pattern is time-invariant.

She trained the looping network on Vrothjelka's data. For each seven-day window in the training set, the network processed the days one at a time, carried the summary forward through the loop, and produced a prediction for Day 8. The error flowed backward through all seven time steps, adjusting the shared weights.
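One way to see "the same weights, adjusted by errors from all seven steps" without deriving backpropagation-through-time by hand is a finite-difference sketch: nudge each shared weight, rerun the whole week, and watch how the Day 8 error changes. The squared-error loss and the tiny network shape here are assumptions for illustration, not Trviksha's actual setup.

```python
import math

def forward(week, target, w_in, w_fb):
    h = 0.0
    for day in week:  # the SAME weights are reused at every time step
        h = math.tanh(sum(w * x for w, x in zip(w_in, day)) + w_fb * h)
    return (h - target) ** 2  # error against the actual Day 8 value

def train_step(week, target, w_in, w_fb, lr=0.1, eps=1e-6):
    # Each shared weight gets ONE gradient that sums its influence
    # across all seven time steps it was reused in.
    base = forward(week, target, w_in, w_fb)
    grads = []
    for i in range(len(w_in)):
        bumped = w_in[:i] + [w_in[i] + eps] + w_in[i + 1:]
        grads.append((forward(week, target, bumped, w_fb) - base) / eps)
    g_fb = (forward(week, target, w_in, w_fb + eps) - base) / eps
    new_w_in = [w - lr * g for w, g in zip(w_in, grads)]
    return new_w_in, w_fb - lr * g_fb
```

Repeated calls to `train_step` shrink the final-step error, which is the behavior the story describes: error from Day 8 flowing backward through all seven days to adjust one shared set of weights.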

The Hidden State

After training, Trviksha examined the carried-forward summary at each step. On a sequence where a storm was building — rising humidity, falling pressure, shifting winds — the summary changed gradually over the days, tracking the buildup. On a sequence where conditions were stable, the summary barely changed.

Trviksha: The carried-forward summary is the network's memory. It is not recording the raw data — it is recording a compressed, weighted version of whatever the network has learned matters. I am calling it the hidden state. It is hidden because it is internal to the network — the customer never sees it. They only see the final prediction.

Vrothjelka: How many numbers is this hidden state?

Trviksha: For a single velociraptor, it is one number. But I can use multiple velociraptors, each carrying its own hidden state. Eight velociraptors means an eight-number hidden state — eight different aspects of the weather history being tracked simultaneously.

She expanded the network to eight recurrent velociraptors. Each received the five weather inputs plus all eight hidden states from the previous step. Each produced a new hidden state. The eight hidden states at the final time step fed into an output layer that produced the rainfall prediction.
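The eight-raptor version looks like this in sketch form. The matrix sizes follow the story (five weather inputs, eight hidden states, one rainfall output); the random weights are placeholders standing in for trained values.

```python
import math
import random

H, IN = 8, 5  # eight recurrent units, five weather inputs
random.seed(0)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(IN)] for _ in range(H)]
W_h = [[random.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
W_out = [random.uniform(-0.5, 0.5) for _ in range(H)]

def step(weather, hidden):
    # Each unit sees the five weather inputs plus ALL eight previous
    # hidden states, and produces its own new hidden state.
    return [math.tanh(sum(wi * x for wi, x in zip(W_in[j], weather)) +
                      sum(wh * h for wh, h in zip(W_h[j], hidden)))
            for j in range(H)]

def predict_rainfall(week):
    hidden = [0.0] * H  # blank eight-number hidden state on Day 1
    for day in week:
        hidden = step(day, hidden)  # same shared weights every day
    # The final hidden states feed an output layer for the prediction.
    return sum(w * h for w, h in zip(W_out, hidden))
```

The customer never sees `hidden` — only the number `predict_rainfall` returns — which is exactly why Trviksha calls it the hidden state.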

Results: one-day-ahead predictions improved substantially over the flat network. The recurrent network captured day-to-day patterns — sequences of rising humidity leading to rain — that the non-sequential network had missed entirely.

Vrothjelka: Better. But I asked about weekly forecasts, not daily. Can it predict seven days ahead?

Trviksha: Let me try.