Part 37 of 58
The Dial
By Madhav Kaushish · Ages 12+
After the hallucination incident, Hjentova was cautious about the model's outputs. But she found a different use for it: drafting text. She asked the model to generate summaries of tablets, compose catalogue entries, and produce translations of dialect texts. The results were competent but dull.
The Repetition Problem
Hjentova: Every summary sounds the same. "This tablet describes the trade regulations of the province of..." followed by a mechanical list. It is technically correct, but I would be embarrassed to put these in the catalogue. They read like they were written by a very diligent clerk with no personality.
Trviksha examined the generation process. At each step, the model predicted probabilities for every token in the vocabulary. The token with the highest probability was selected. Then that token was added to the sequence, and the process repeated.
The problem was that "highest probability" was conservative. The most likely next word was always the safest choice — the most common, most expected, most generic continuation. A sequence of safest choices produced text that was correct, predictable, and colourless.
Trviksha: The model always picks the most likely next word. "The trade regulations of the province" is more likely than "The fascinating trade regulations of the province" or "The surprisingly strict trade regulations of the province." The safe choice, repeated at every step, produces safe text.
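The always-pick-the-most-likely rule Trviksha describes is greedy selection. A minimal sketch, using an invented five-word vocabulary and made-up probabilities (none of these numbers come from the story's actual model):

```python
import numpy as np

# A toy vocabulary and a hypothetical probability distribution over
# next tokens, standing in for what the model predicts at one step.
vocab = ["the", "trade", "regulations", "fascinating", "surprisingly"]
probs = np.array([0.45, 0.25, 0.15, 0.10, 0.05])

def greedy_pick(probs, vocab):
    """Always return the single most probable token."""
    return vocab[int(np.argmax(probs))]

print(greedy_pick(probs, vocab))  # the safest, most common choice
```

Run at every step of generation, this rule can never surprise: the same context always yields the same next word.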
The Randomness Setting
Trviksha added a parameter she called the dial. Instead of always picking the most likely token, the model could pick randomly from among the likely tokens, weighted by their probabilities. The dial controlled how much randomness to inject.
At the lowest setting — dial near zero — the model almost always picked the single most probable token. Safe, repetitive, predictable.
At a moderate setting, the model sometimes picked the second or third most likely token. This introduced variety — different word choices, unexpected phrasing, occasional creative turns.
At a high setting, the model picked broadly from many possible tokens, even ones with low probability. This produced diverse, surprising text — but also frequent nonsense and errors.
Trviksha: Low dial: "The province produced grain." Medium dial: "The province was known for its exceptional grain harvests." High dial: "The province harvested luminous grain from crystalline terraces under the singing moons."
Hjentova: The first is boring. The second is good. The third is poetry, or possibly insanity.
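The dial's effect on the model's choices can be sketched in Python. The logits below (the raw scores a model assigns before they become probabilities) are invented for illustration; the key move is dividing them by the dial setting before normalising:

```python
import numpy as np

# Hypothetical raw scores (logits) for five candidate next tokens.
logits = np.array([2.0, 1.2, 0.6, 0.1, -0.5])

def dial_probs(logits, dial):
    """Divide the logits by the dial setting, then renormalise.
    A low dial sharpens the distribution toward the top token;
    a high dial flattens it, spreading probability more evenly."""
    scaled = logits / dial
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

for dial in (0.3, 0.7, 1.2):
    print(dial, dial_probs(logits, dial).round(3))
```

At 0.3 nearly all the probability piles onto the top token (the boring grain); at 1.2 even the unlikely tokens (the singing moons) get a real chance of being picked.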

The Tradeoff
At dial setting 0.3, summaries were correct but dull — useful for factual catalogue entries where accuracy mattered more than style.
At dial setting 0.7, summaries were engaging and varied — useful for public-facing descriptions where readability mattered.
At dial setting 1.2, the model produced creative and occasionally brilliant text, but roughly one in five outputs contained fabricated details or grammatically mangled passages.
Hjentova: Different settings for different tasks?
Trviksha: Exactly. For factual work — answering questions about specific tablets, looking up dates and names — use a low dial. The model will be repetitive but accurate. For creative work — drafting catalogue entries, composing introductions — use a moderate dial. For generating novel text where accuracy is not critical — brainstorming, exploring ideas — use a high dial and expect some nonsense.
Blortz: The dial trades reliability for creativity. At one extreme, the model says only what it is most confident about. At the other extreme, it says whatever comes to mind, confident or not.
Trviksha: And the right setting depends on what you need. There is no universally correct setting. A legal document should be generated at a low dial. A poem should be generated at a high dial. The same model, the same training, the same architecture — but different behaviour depending on how much randomness you inject.
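Trviksha's point — same model, different behaviour — can be seen by sampling many tokens at two dial settings. The logits are the same invented numbers as before; only the dial changes:

```python
import numpy as np

rng = np.random.default_rng(0)

# The same hypothetical logits at both settings: only the dial differs.
logits = np.array([2.0, 1.2, 0.6, 0.1, -0.5])

def sample_tokens(logits, dial, n, rng):
    """Draw n tokens, weighted by the dial-adjusted probabilities."""
    scaled = logits / dial
    p = np.exp(scaled - scaled.max())
    p /= p.sum()
    return rng.choice(len(logits), size=n, p=p)

low = sample_tokens(logits, 0.3, 200, rng)   # conservative setting
high = sample_tokens(logits, 1.2, 200, rng)  # adventurous setting
print("distinct tokens at dial 0.3:", len(set(low.tolist())))
print("distinct tokens at dial 1.2:", len(set(high.tolist())))
```

The low dial draws the same one or two tokens over and over; the high dial ranges across the whole vocabulary, including the rare, risky choices.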
Glagalbagal: In my experience, the best writing is somewhere between the predictable and the wild. Too careful, and it bores. Too reckless, and it confuses. The right amount of surprise is what makes language interesting.
Trviksha: The model would agree — if it could agree with things. At dial 0.7, it produces text that is surprising enough to be interesting but predictable enough to be coherent. That is the sweet spot for most purposes.