Part 6 of 58

The Threshold

By Madhav Kaushish · Ages 12+

Grothvik was satisfied with the model's accuracy. She was not satisfied with its output.

Grothvik: I do not need to know that patient 412 has a risk score of 73.4 and patient 413 has a risk score of 28.1. I need to know: should I examine this patient now, or can they wait? Yes or no.

Trviksha: The model produces a score. You can decide that anyone above a certain score gets examined.

Grothvik: Above what score?

Trviksha: That is up to you.

The Cutoff Problem

Trviksha tested different cutoffs on the training data.

A cutoff of 50 caught 84% of the patients who eventually got sick but flagged 31% of healthy patients: false alarms that would have cost Grothvik time examining people who were fine.

A cutoff of 70 reduced false alarms to 12% but missed 35% of the sick patients. More than a third of the people who needed attention slipped through.

A cutoff of 30 caught nearly every sick patient but flagged two-thirds of all patients as high-risk, which was useless — Grothvik might as well examine everyone.
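Trviksha's experiment can be sketched in a few lines of Python. This is a minimal sketch, assuming each patient has a risk score and a label recording whether they later got sick. The helper `rates_at_cutoff` and the ten patients below are hypothetical toy values invented for illustration, not the clinic's data.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: ten patients, their risk scores, and whether each later got sick.
scores: List[float] = [12, 25, 33, 41, 48, 55, 62, 71, 80, 93]
sick: List[bool] = [False, False, True, False, True, False, True, True, False, True]


def rates_at_cutoff(
    scores: Sequence[float], sick: Sequence[bool], cutoff: float
) -> Tuple[float, float]:
    """Return (share of sick patients caught, share of healthy patients flagged)."""
    # Assumption: a score exactly at the cutoff counts as flagged.
    flagged = [s >= cutoff for s in scores]
    n_sick = sum(sick)
    n_healthy = len(sick) - n_sick
    caught = sum(1 for f, y in zip(flagged, sick) if f and y)
    false_alarms = sum(1 for f, y in zip(flagged, sick) if f and not y)
    return caught / n_sick, false_alarms / n_healthy


for cutoff in (30, 50, 70):
    caught, false_alarm_rate = rates_at_cutoff(scores, sick, cutoff)
    print(f"cutoff {cutoff}: caught {caught:.0%} of sick, flagged {false_alarm_rate:.0%} of healthy")
```

On this toy data, raising the cutoff from 30 to 70 lowers the share of healthy patients flagged from 60% to 20%, but it also lowers the share of sick patients caught from 100% to 40%. It is the same trade Trviksha found, on a much smaller scale.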

Grothvik: Every cutoff is a compromise.

Trviksha: Yes. A low cutoff catches more sick patients but creates more false alarms. A high cutoff creates fewer false alarms but misses more sick patients. There is no cutoff that catches all the sick patients and none of the healthy ones. The data is too noisy for that.

Grothvik: Then I need to decide which kind of mistake I am willing to live with.

The Asymmetry

In some problems, the two kinds of mistakes are equally costly. In medicine, they are not.

Grothvik: A patient I examine unnecessarily wastes an hour of my time. A patient I miss might die. These are not equivalent mistakes.

Trviksha: Then the cutoff should be low. You accept more false alarms in exchange for catching more real cases.

Grothvik: Set it at 40.

At a cutoff of 40, the model caught 91% of sick patients and flagged 38% of healthy patients. Grothvik would spend extra time on unnecessary examinations, but she would miss far fewer of the patients who needed her.

A stone number line stretching from 0 to 100 with pebbles representing patients scattered along it. A movable wooden marker sits at 40. To the right of the marker, patients are labelled "examine now." To the left, "can wait." Some pebbles are coloured red (truly sick) and others grey (healthy). A few red pebbles sit to the left of the marker (missed cases), while several grey pebbles sit to the right (false alarms). Trviksha slides the marker while Grothvik watches.
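Grothvik's reasoning about unequal mistakes can be made explicit by giving each kind of mistake a cost and choosing the cutoff with the lowest total cost. This is a swapped-in formalization, not how Grothvik picked 40 in the story; she chose by judgment. The helpers `total_cost` and `best_cutoff`, the cost values, the candidate cutoffs, and the toy patients are all hypothetical.

```python
from typing import Sequence

# Hypothetical toy data, invented for illustration.
scores = [12, 25, 33, 41, 48, 55, 62, 71, 80, 93]
sick = [False, False, True, False, True, False, True, True, False, True]
candidates = [30, 40, 50, 60, 70]


def total_cost(
    scores: Sequence[float],
    sick: Sequence[bool],
    cutoff: float,
    miss_cost: float,
    false_alarm_cost: float,
) -> float:
    """Weighted count of mistakes when patients at or above the cutoff are examined."""
    misses = sum(1 for s, y in zip(scores, sick) if y and s < cutoff)
    false_alarms = sum(1 for s, y in zip(scores, sick) if not y and s >= cutoff)
    return misses * miss_cost + false_alarms * false_alarm_cost


def best_cutoff(
    scores: Sequence[float],
    sick: Sequence[bool],
    candidates: Sequence[float],
    miss_cost: float,
    false_alarm_cost: float,
) -> float:
    """Pick the candidate cutoff with the lowest total cost (the earliest one wins ties)."""
    return min(
        candidates,
        key=lambda c: total_cost(scores, sick, c, miss_cost, false_alarm_cost),
    )


# A clinic where a missed patient is ten times worse than a wasted examination.
print(best_cutoff(scores, sick, candidates, miss_cost=10, false_alarm_cost=1))
# A screener where a wasted examination is ten times worse than a miss.
print(best_cutoff(scores, sick, candidates, miss_cost=1, false_alarm_cost=10))
```

The scores and labels are identical in both calls. Only the costs differ, yet the chosen cutoff moves from 30 to 60: the data fixes the trade-off, and the costs decide where on it to stand.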

Blortz: The number 40 is not a property of the data. It is a property of your values. Someone who weighs false alarms more heavily, such as an employer screening for a minor condition where the cost of examination is high, would choose a higher cutoff.

Trviksha: The model gives the same scores regardless. The cutoff is a separate decision. The model predicts; the threshold decides.

This distinction mattered. The model's job was to assign accurate scores — to rank patients by risk as faithfully as possible. The threshold's job was to convert those scores into actions. The model was mathematical. The threshold was ethical.

Grothvik: I want 40. And I want to revisit it each season, as the data changes.

Trviksha: You can change the threshold at any time without retraining the model. The model does not know or care where the threshold is. It only produces scores.
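The separation Trviksha describes can be sketched as two independent pieces: a scoring object whose parameters stay fixed once learned, and a decision function that takes the threshold as an argument. This is a minimal sketch, assuming a simple weighted-sum score. `RiskModel`, `decide`, the feature names, the weights, and patients "A" and "B" are hypothetical stand-ins, not the model Trviksha actually trained.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RiskModel:
    """Hypothetical stand-in for a trained model: its parameters are fixed once learned."""
    weights: Dict[str, float]
    bias: float

    def score(self, features: Dict[str, float]) -> float:
        # The model only turns features into a score; it has no notion of a threshold.
        return self.bias + sum(self.weights[name] * value for name, value in features.items())


def decide(score: float, threshold: float) -> str:
    # The threshold is a separate, human-chosen setting applied after scoring.
    return "examine now" if score >= threshold else "can wait"


# Hypothetical weights and patients, invented for illustration.
model = RiskModel(weights={"age": 0.5, "fever": 20.0}, bias=5.0)
patients = {"A": {"age": 70, "fever": 1.0}, "B": {"age": 30, "fever": 0.0}}
scores = {pid: model.score(features) for pid, features in patients.items()}

# Changing the threshold changes the decisions, never the scores.
for threshold in (40, 65):
    print(threshold, {pid: decide(s, threshold) for pid, s in scores.items()})
```

Moving from 40 to a hypothetical 65 flips patient A's decision while neither score changes. That is the sense in which Grothvik can revisit her threshold each season without any retraining.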

The model and the threshold were separate components: one learned from data, the other set by a human making a value judgment about acceptable risk. Trviksha filed this separation away. It would matter again.