Part 55 of 58
The Calculator
By Madhav Kaushish · Ages 12+
Chain-of-thought reasoning and step verification had improved the model's performance on complex questions. But certain types of errors persisted stubbornly.
The Arithmetic Problem
The model consistently made errors on large-number arithmetic. It could identify that a problem required multiplication, set up the calculation correctly in its reasoning chain, and then produce the wrong numerical result.
Trviksha: The model writes: "The total cost is 47,832 times 156. Let me calculate: 47,832 times 156 equals 7,425,592." The correct answer is 7,461,792. The model set up the right operation and got the wrong number.
Blortz: It is a language model, not an arithmetic engine. It learned to approximate arithmetic from patterns in text — seeing examples like "12 times 8 equals 96" — but it did not learn the actual algorithm for multiplication. For small numbers, the patterns are reliable. For large numbers, the approximation breaks down.
Trviksha: I can add more arithmetic examples to the training data. But the model will still be approximating. It will never be reliable on seven-digit multiplication because it is not actually multiplying — it is predicting what the result of multiplication looks like in text.
GlagalCloud had a dedicated arithmetic engine — a system of pebble arrangements designed specifically for precise computation. It was not a neural network. It was a rule-based system that executed the multiplication algorithm exactly, step by step. It never made arithmetic errors because it was not predicting — it was computing.
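The pebble engine itself is fictional, but the distinction it illustrates is real: a rule-based system executes the multiplication algorithm exactly instead of predicting what an answer looks like. A minimal sketch of the grade-school algorithm (digit-by-digit partial products):

```python
def long_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers the way a rule-based
    arithmetic engine would: by executing the grade-school algorithm
    step by step, with no approximation anywhere."""
    result = 0
    for position, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        # One partial product: a times a single digit of b,
        # shifted into that digit's place value.
        result += a * digit * (10 ** position)
    return result

print(long_multiply(47832, 156))  # 7461792 -- exact, every time
```

Because every step is a fixed rule, the result is exact for seven-digit operands just as it is for one-digit ones, which is precisely where the language model's pattern-matching breaks down.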
Trviksha: The language model is good at understanding questions, reasoning about relationships, and generating explanations. It is bad at precise arithmetic. The arithmetic engine is good at precise arithmetic and bad at everything else. What if the language model could call the arithmetic engine when it needs a calculation done?
The Tool Call
She modified the model's generation process. When the model encountered a computation it could not reliably perform — large multiplication, division, exponentiation — it generated a special token sequence: a request for the arithmetic engine.
Instead of writing "47,832 times 156 equals 7,425,592," the model would write: "47,832 times 156 equals [CALCULATE: 47832 * 156]." The system intercepted this token sequence, sent the calculation to the arithmetic engine, received the exact result (7,461,792), inserted it back into the text, and the model continued generating from that point.
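The interception step can be sketched in a few lines. This is a minimal illustration, not the story's actual system: the `[CALCULATE: ...]` marker comes from the text above, while the regex and the stand-in engine are assumptions for the sketch.

```python
import re

def multiply_engine(expr: str) -> str:
    # Stand-in for the exact arithmetic engine; handles "a * b" only.
    left, right = expr.split("*")
    return str(int(left) * int(right))

# Matches the special marker the model emits, capturing the expression inside.
TOOL_PATTERN = re.compile(r"\[CALCULATE:\s*([^\]]+)\]")

def intercept(model_output: str) -> str:
    """Replace every [CALCULATE: ...] request with the engine's exact result,
    so the model can continue generating from the corrected text."""
    return TOOL_PATTERN.sub(lambda m: multiply_engine(m.group(1)), model_output)

print(intercept("47,832 times 156 equals [CALCULATE: 47832 * 156]."))
# 47,832 times 156 equals 7461792.
```

The key design point is that the model never sees the wrong number: the marker is swapped for the exact result before generation resumes.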
Trviksha: The model decides when to call the tool and what to ask it. The tool performs the computation and returns the result. The model incorporates the result and continues its reasoning. The model is the thinker. The tool is the calculator.
Drysska: The model knows what it does not know?
Trviksha: Not exactly. The model has learned — through training on examples — that certain types of calculations benefit from the tool call. It has a rough sense of which computations are easy (small numbers, common multiplications) and which are error-prone (large numbers, multi-digit operations). But its calibration is imperfect — it sometimes attempts calculations it should delegate, and occasionally delegates calculations it could handle.
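Trviksha's point about imperfect calibration can be caricatured as a simple rule. The real model's sense of difficulty is learned and fuzzy; the digit threshold here is purely an assumption for illustration:

```python
def should_delegate(a: int, b: int, digit_threshold: int = 3) -> bool:
    """Crude stand-in for the model's learned calibration: delegate any
    multiplication whose operands exceed a few digits. A hard threshold
    like this is exactly the kind of rule the model does NOT have --
    its judgment is statistical, so it sometimes misclassifies."""
    return (len(str(abs(a))) > digit_threshold
            or len(str(abs(b))) > digit_threshold)

should_delegate(12, 8)       # False: small numbers, attempt in-model
should_delegate(47832, 156)  # True: error-prone, send to the engine
```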

The Agent
The arithmetic engine was just the first tool. Trviksha added more:
Archive search: When the model needed to verify a fact — "What was the grain output of Klomvaj province in Year 14?" — instead of generating an answer from memory (risking hallucination), it could issue a search query to Hjentova's archive index and receive the actual recorded value.
Weather lookup: When the model needed current conditions — "Is the Grintjak Pass currently clear?" — it could query Vrothjelka's weather system for real-time data.
Legal reference: When the model needed to cite a specific statute, it could query the legal code index rather than reconstructing the statute from memory.
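With several tools available, the system needs a way to route each request to the right one. A minimal dispatch sketch, with placeholder tools standing in for Hjentova's archive index, Vrothjelka's weather system, and the legal code index (all return values below are illustrative stubs, not real data):

```python
def archive_search(query: str) -> str:
    return f"archive record for: {query}"        # placeholder lookup

def weather_lookup(location: str) -> str:
    return f"current conditions at {location}"   # placeholder reading

def legal_reference(statute: str) -> str:
    return f"statute text for: {statute}"        # placeholder citation

# Registry mapping tool names (as the model would emit them) to handlers.
TOOLS = {
    "ARCHIVE": archive_search,
    "WEATHER": weather_lookup,
    "LEGAL": legal_reference,
}

def dispatch(tool_name: str, argument: str) -> str:
    """Route a model-issued tool request to the matching handler."""
    if tool_name not in TOOLS:
        return f"error: unknown tool {tool_name}"
    return TOOLS[tool_name](argument)
```

Registering tools in a table like this keeps the coordinator generic: adding a new capability means adding one entry, not changing the loop.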
With multiple tools, the model became something more than a text generator: a coordinator that understood the question, decided which tools to invoke, interpreted the results, and synthesised a final answer.
Trviksha: The model plans. It decides what information it needs. It calls the appropriate tool. It reads the result. It incorporates the result into its reasoning. It may call another tool if the first result raises new questions. It is not just generating text — it is acting in a loop of planning, acting, observing, and reasoning.
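The plan-act-observe loop Trviksha describes can be sketched abstractly. Everything here is a hypothetical scaffold: `model_step` stands in for the language model's decision at each turn, and `dispatch` for whatever tool router the system uses.

```python
def agent_loop(question, model_step, dispatch, max_turns=5):
    """Plan-act-observe loop. Each turn, the model either issues a
    tool call (which is executed and fed back as an observation) or
    commits to a final answer."""
    observations = []
    for _ in range(max_turns):
        action = model_step(question, observations)      # plan
        if action["type"] == "answer":
            return action["text"]                        # done reasoning
        result = dispatch(action["tool"], action["argument"])  # act
        observations.append(result)                      # observe
    return "no answer within turn limit"
```

The turn limit matters: because each tool result can prompt another tool call, an uncapped loop could run forever on a question the tools cannot settle.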
Glagalbagal: An agent. Not a document. An agent that takes actions in the world — even if the "world" is just a set of databases and calculators.
Blortz: A velociraptor that delegates. It knows what it is good at — reasoning, language, planning — and uses tools for what it is bad at — arithmetic, fact retrieval, current data. Each component does what it does best.
Zhrondvik: This is what I wanted from the beginning. Not a model that guesses at numbers and invents facts, but a system that looks up facts when it needs them and calculates precisely when it needs to. The language model provides the intelligence. The tools provide the reliability.
Trviksha: The model is the brain. The tools are the hands, the eyes, the reference books. Together, they are more capable than either alone.