Part 19 of 58
The Same Eyes Everywhere
By Madhav Kaushish · Ages 12+
The sliding-window approach worked well. But Trviksha had not yet appreciated how dramatically it changed the economics of the network.
Counting the Pebbles
Blortz, ever the bookkeeper, did the arithmetic.
Blortz: Your old fully connected network took four hundred inputs and connected each one to each hidden velociraptor. With sixteen hidden velociraptors, that was four hundred times sixteen weights — six thousand four hundred weight arrangements. Plus sixteen weights from the hidden layer to the output. Six thousand four hundred and sixteen weights in total.
Trviksha: That sounds right.
Blortz: Your new sliding-window approach has nine weights.
Trviksha: Nine.
Blortz: Nine weights, applied at every position on a 20×20 grid. The same nine weights, reused three hundred and twenty-four times — eighteen by eighteen places the window can sit without hanging off the edge. Nine pebble arrangements on one shelf versus six thousand four hundred on sixty shelves. The storage difference is enormous.
Nine weights versus six thousand four hundred and sixteen. The convolutional approach used less than one-seventh of one percent of the fully connected approach's parameters. And it worked better.
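Blortz's bookkeeping can be checked in a few lines. This is a sketch of the arithmetic only; the variable names are illustrative, and the numbers come straight from the story.

```python
# Fully connected: every one of the 400 inputs connects to each of the
# 16 hidden units, plus one weight from each hidden unit to the output.
inputs = 20 * 20                              # 400 plots in the field
hidden = 16                                   # hidden velociraptors
fully_connected = inputs * hidden + hidden    # 400 * 16 + 16 = 6,416

# Sliding window: one shared 3x3 patch of weights, reused everywhere.
window = 3 * 3                                # 9

# Positions a 3x3 window can occupy inside a 20x20 grid.
positions = (20 - 3 + 1) ** 2                 # 18 * 18 = 324

print(fully_connected)                        # 6416
print(window)                                 # 9
print(positions)                              # 324
print(round(100 * window / fully_connected, 2))  # 0.14 percent
```

Nine weights out of 6,416 is about 0.14% — just under one-seventh of one percent, as the story says.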
Why Fewer Is Better
Trviksha: It is not just that fewer weights are cheaper to store and faster to adjust. Fewer weights actually produce a better model.
Glagalbagal: How can knowing less lead to better predictions?
Trviksha: The fully connected network had six thousand four hundred chances to memorize. Each weight could encode a position-specific quirk — "plot 17 tends to be blighted in the training fields." With that many free parameters and only forty training fields, the network had more than enough capacity to memorize every field individually.
The convolutional network had nine chances to encode knowledge. Nine weights had to capture everything the network knew about what blight looked like. There was no room to memorize specific fields or specific positions. The nine weights could only encode the general pattern — and the general pattern was what Kvrothja needed.
Blortz: Fewer weights force generalization. The network cannot memorize because it does not have enough capacity. It must learn the rule.
Trviksha: And the rule is the same everywhere in the field, which is exactly what the shared weights assume. The assumption matches the reality. A disease cluster genuinely does look the same regardless of position. So the constraint — "use the same weights everywhere" — is not a limitation. It is correct.
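The "same weights everywhere" idea Trviksha describes can be sketched directly: one shared 3×3 patch of weights is applied at every position of the grid. This is a minimal illustration, not the story's actual network — the field, the weights, and the function name are all made up for the example.

```python
def slide(field, weights):
    """Apply one shared 3x3 weight patch at every position of a 2D grid."""
    rows, cols = len(field), len(field[0])
    out = []
    for r in range(rows - 2):              # window's top-left row
        out_row = []
        for c in range(cols - 2):          # window's top-left column
            total = 0.0
            for i in range(3):
                for j in range(3):
                    # The SAME nine weights are reused at every position.
                    total += weights[i][j] * field[r + i][c + j]
            out_row.append(total)
        out.append(out_row)
    return out

# A 5x5 toy field with one bright 3x3 cluster in the top-left corner.
field = [[0] * 5 for _ in range(5)]
for i in range(3):
    for j in range(3):
        field[i][j] = 1

# Nine equal weights: the window simply sums whatever patch it covers.
weights = [[1] * 3 for _ in range(3)]

response = slide(field, weights)
print(response[0][0])   # 9 — the window fully overlaps the cluster
```

However large the field, the network still stores only those nine numbers; only the count of positions grows.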
The Comparison
To confirm, Trviksha trained both architectures on the same forty fields with the same train/test split and the same regularization:
| Architecture | Weights | Training accuracy | Test accuracy |
|---|---|---|---|
| Fully connected (16 hidden) | 6,416 | 94% | 76% |
| Sliding window (3×3) | 9 | 86% | 88% |
The fully connected network had higher training accuracy but lower test accuracy — classic overfitting. The sliding window had lower training accuracy (it could not memorize) but higher test accuracy (it had learned the right thing).

Kvrothja: The simpler system is better?
Trviksha: The simpler system is better because its simplicity matches the problem. The data has spatial structure. The sliding window respects that structure — same weights everywhere, local patches only. The fully connected network ignores the structure and wastes its capacity learning things that are not true in general.
Blortz: If the data did not have spatial structure — if the pattern were different at every position — the sliding window would fail, because its assumption would be wrong.
Trviksha: Correct. The sliding window works because the assumption of translation invariance holds for this data. For data where it does not hold — patient records, for instance, where each column means something different — you would not use this approach.
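Translation invariance — the assumption Trviksha says must hold for the data — can be seen in a toy check: move the same cluster to a different spot, and the window's best match does not change. Again a sketch with made-up helper names, not the story's network.

```python
def max_response(field, weights):
    """Strongest match of a shared 3x3 weight patch anywhere in the grid."""
    best = float("-inf")
    for r in range(len(field) - 2):
        for c in range(len(field[0]) - 2):
            score = sum(weights[i][j] * field[r + i][c + j]
                        for i in range(3) for j in range(3))
            best = max(best, score)
    return best

def field_with_cluster(top, left, size=6):
    """A size x size field of zeros with a 3x3 cluster of ones at (top, left)."""
    f = [[0] * size for _ in range(size)]
    for i in range(3):
        for j in range(3):
            f[top + i][left + j] = 1
    return f

weights = [[1] * 3 for _ in range(3)]

# The same cluster, placed in two different corners, scores identically:
# position does not matter, only the pattern does.
print(max_response(field_with_cluster(0, 0), weights))  # 9
print(max_response(field_with_cluster(3, 3), weights))  # 9
```

For data like patient records, where each column has its own meaning, shifting a pattern sideways changes what it means — and this check would be the wrong thing to want.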
The lesson was not that fewer weights are always better. It was that encoding the right assumptions into the architecture can dramatically reduce the number of weights needed, while simultaneously improving performance. Knowing the structure of the problem allowed Trviksha to build a network that was both smaller and smarter.