Part 5 of 58
Too Many Factors
By Madhav Kaushish · Ages 12+
The age-based model predicted sickness with modest accuracy. But age was only one of seven factors. Grothvik was not satisfied.
Grothvik: You have shown me that older patients tend to get sicker. I already knew that. What I need is a system that accounts for everything — age, diet, location, water proximity, prior illness, occupation, household size. All at once.
Trviksha: One weight per factor. The same idea, but bigger.
The Weighted Sum
The single-factor model had two weights: one for age, and a baseline. Extending to seven factors meant eight weights: one per factor, plus a baseline. For each patient, the prediction would be:
Baseline + (age weight times age) + (water weight times water proximity) + (diet weight times diet score) + ... and so on for all seven factors.
Each weight represented how much its factor contributed to the prediction. A large weight meant the factor mattered a lot. A weight near zero meant the factor barely mattered. A negative weight meant the factor was protective — it pushed the prediction down instead of up.
Trviksha initialised the weights with guesses based on what she had learned from her manual sorting. Age: weight 3. Water proximity: weight 2. Dried-fish diet: weight 1. Prior illness: weight 2. Household size: weight -1 (larger households seemed slightly protective). Occupation and location: weight 1 each.
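The weighted sum can be sketched in a few lines of Python. The weights below are Trviksha's starting guesses; the patient's factor values are invented for illustration.

```python
# A sketch of the weighted-sum model, using Trviksha's starting guesses.
# The patient's factor values are invented for illustration.

def predict(factors, weights, baseline):
    """Baseline plus each factor's value times its weight."""
    return baseline + sum(weights[name] * value for name, value in factors.items())

weights = {
    "age": 3, "water_proximity": 2, "dried_fish_diet": 1,
    "prior_illness": 2, "household_size": -1,
    "occupation": 1, "location": 1,
}
patient = {
    "age": 6, "water_proximity": 2, "dried_fish_diet": 1,
    "prior_illness": 0, "household_size": 4,
    "occupation": 1, "location": 0,
}

score = predict(patient, weights, baseline=1)
print(score)  # 1 + 18 + 4 + 1 + 0 - 4 + 1 + 0 = 21
```

A higher score means a higher predicted chance of sickness; the adjustment process then tunes these eight numbers.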
She ran the systematic adjustment — the same nudging process as before, but now with eight weights instead of two. For each patient, compute the prediction, measure the error, adjust each weight slightly in the direction that reduces the error.
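One pass of this adjustment can be sketched as follows. The exact nudge rule is an assumption: here each weight moves opposite the error, scaled by its factor's value, which is one standard way to "adjust slightly in the direction that reduces the error."

```python
# A sketch of one adjustment pass. The nudge rule (weight moves opposite
# the error, scaled by the factor's value) is an assumption; the story
# only says each weight is adjusted slightly to reduce the error.

def adjust_pass(patients, weights, baseline, step=0.01):
    for factors, observed in patients:
        predicted = baseline + sum(weights[k] * v for k, v in factors.items())
        error = predicted - observed
        for k, v in factors.items():
            weights[k] -= step * error * v
        baseline -= step * error
    return weights, baseline

# Toy records where sickness really is 2*age + 1: repeated passes
# should pull the age weight toward 2 and the baseline toward 1.
records = [({"age": a}, 2 * a + 1) for a in range(1, 6)]
weights, baseline = {"age": 0.0}, 0.0
for _ in range(500):
    weights, baseline = adjust_pass(records, weights, baseline)
```

On this toy data the age weight ends up close to 2 and the baseline close to 1, because the passes keep shrinking each patient's error.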
After one pass through all two hundred and fourteen patients, the accuracy was 58%. After ten passes, 74%. After thirty passes, 82%.
Grothvik: Eighty-two percent. Better than the age-only model. Not as much better as I had hoped.
The Problem of Noise
Grothvik, encouraged by the progress, collected additional data. Over two months she gathered records from three more village healers, bringing the patient count to six hundred and forty-one. She also added new factors: water source type (river, well, spring, cistern), number of meals per day, roof material (thatched or stone), and distance to the nearest waste pit.
The model now had twelve input factors.
Trviksha ran the adjustment on the expanded data. After thirty passes, the accuracy settled at 79%, lower than the 82% on the smaller dataset.
Trviksha: More patients and more factors, but the prediction is worse. How?
Blortz: Some of your new factors may be noise. The roof material, for instance — is a stone roof genuinely protective, or is it correlated with wealth, which is correlated with better nutrition, which is the actual protective factor?
This was the problem of confounded variables. Roof material, nutrition, and wealth moved together in the data, so the model could not separate their contributions. It assigned roof material a weight, but that weight captured a tangle of roofs, wealth, and nutrition, attributing to roofs what was really about food.
The Phantom Factor
Worse, one factor was entirely meaningless. The patient identifier — a sequential number assigned when each record was created — had been accidentally included in the factor list. The identifier was a number like 147 or 382 that had no medical significance whatsoever.
Yet the model had assigned it a small but nonzero weight. Patients from the newest village happened to have higher identifiers (they were added last) and a slightly higher sickness rate. The model found this spurious correlation and used it.
Trviksha: The model is finding patterns that are not real.
Blortz: The model is finding patterns that are real in the data but not real in the world. The distinction matters.
Trviksha removed the patient identifier and the roof material. She re-ran the adjustment. Accuracy climbed to 84%. The improvement came not from adding better factors, but from removing misleading ones.

Deployment
Trviksha assigned a team to automate the process. Drysska — a literal-minded velociraptor who did exactly what the instruction tablet said, no more, no less — managed three junior velociraptors. They divided the patients into batches. Each velociraptor processed a batch, computed weight adjustments, and reported to Drysska, who combined them.
After fifty passes, the eleven weights (one for each of the ten remaining factors, plus the baseline) stabilised. Trviksha stored them as pebble arrangements on a dedicated shelf.
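Drysska's batching can be sketched like this. How the reports were merged is an assumption: here each junior velociraptor totals the nudges for its own batch without applying them, and Drysska sums the reports before updating the weights.

```python
# A sketch of batched adjustment. The merge rule is an assumption: each
# helper totals its batch's nudges, and Drysska sums and applies them.

def batch_report(batch, weights, baseline, step=0.01):
    """One helper's work: total nudges for a batch, without applying them."""
    nudges = {k: 0.0 for k in weights}
    base_nudge = 0.0
    for factors, observed in batch:
        predicted = baseline + sum(weights[k] * v for k, v in factors.items())
        error = predicted - observed
        for k, v in factors.items():
            nudges[k] -= step * error * v
        base_nudge -= step * error
    return nudges, base_nudge

def combine(weights, baseline, reports):
    """Drysska's job: add up every helper's nudges, then apply them."""
    for nudges, base_nudge in reports:
        for k in weights:
            weights[k] += nudges[k]
        baseline += base_nudge
    return weights, baseline

# Toy run: six patients split into three batches, true rule 2*age + 1.
patients = [({"age": a}, 2 * a + 1) for a in range(1, 7)]
batches = [patients[0:2], patients[2:4], patients[4:6]]
weights, baseline = {"age": 0.0}, 0.0
for _ in range(1000):
    reports = [batch_report(batch, weights, baseline) for batch in batches]
    weights, baseline = combine(weights, baseline, reports)
```

Because every helper reads the same weights within a pass, the combined result is the same nudge the full dataset would produce; the split only divides the labour.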
Grothvik tested the model on fifty new patients who had not been in the training data. It correctly flagged thirty-nine of the forty-three who eventually fell ill and correctly cleared five of the seven who remained healthy.
Grothvik: Thirty-nine out of forty-three. Better than I do on my own.
Trviksha: The model still misses four.
Grothvik: Medicine misses patients. The question is whether this misses fewer than the alternative, which is me trying to remember six hundred patient histories while treating a queue out the door. It does.
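Grothvik's tally can be checked directly from the fifty held-out patients. "Sensitivity" and "specificity" are the standard names, not used in the story, for the two rates she is weighing.

```python
# Tallying the test on the fifty held-out patients (counts from the story).
flagged_sick, missed_sick = 39, 4        # of the 43 who fell ill
cleared_healthy, flagged_healthy = 5, 2  # of the 7 who stayed healthy

total = flagged_sick + missed_sick + cleared_healthy + flagged_healthy
accuracy = (flagged_sick + cleared_healthy) / total
sensitivity = flagged_sick / (flagged_sick + missed_sick)            # sick caught
specificity = cleared_healthy / (cleared_healthy + flagged_healthy)  # healthy cleared

print(f"accuracy={accuracy:.2f}")        # accuracy=0.88
print(f"sensitivity={sensitivity:.2f}")  # sensitivity=0.91
print(f"specificity={specificity:.2f}")  # specificity=0.71
```

The four missed patients show up as the gap between sensitivity and 1; the two healthy patients wrongly flagged are the cost of reviewing high scorers, which Grothvik accepts.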
The model was deployed. Each new patient's factors were encoded, run through the eleven weights, and scored. High scores triggered a flag. Grothvik reviewed the flagged patients personally. The model did not replace her judgment — it focused her attention.
Trviksha: The velociraptors are following rules they found. Rules embedded in the weights. The weights encode a pattern that no one explicitly programmed.
Glagalbagal: Is that not what you intended?
Trviksha: It is what I intended. It is also what unnerves me.