How to write tool descriptions that don’t confuse your AI agents
The single highest-leverage piece of writing you’ll do when shipping AI agents isn’t the system prompt. It’s the tool description.
If you’re shipping AI agents that call real tools against real systems, the single highest-leverage piece of writing you will do is the tool description. Not the system prompt. Not the retrieval strategy. Not the model selection. The description that tells the model what this specific callable thing is and when to reach for it.
Most teams spend hours on the system prompt and about ninety seconds on each tool description. The ratio is inverted. The system prompt is read once per session. Every tool description is read again, silently, every time the model decides whether to call that tool. It compounds in a way a system prompt does not.
The four parts, plus one
The format I’ve been using comes from a heuristic practitioners keep reaching for when debugging agent reliability. It has four parts:
- What the tool does. Literal and narrow. One sentence.
- When to USE it. The triggers. Specific user intents or system conditions.
- When NOT to use it. The lookalikes. The tools that feel similar but are wrong.
- Input/output examples. One real call, one real response.
After living with this for a few months, I started adding a fifth:
- Failure modes. What the agent should do when the tool returns an error, rather than guessing.
The four parts tell the model how to pick the tool correctly. The fifth tells the model how to recover when the tool misbehaves. Both belong in the description — not in the system prompt, not in a separate error-handling layer, but right next to the call site where the decision is made.
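In practice, all five parts end up as a single string on the tool definition the model actually sees, right next to the input schema. Here is a minimal sketch of that packaging, assuming an OpenAI-style function-calling shape; ToolDefinition, describeTool, and the archiveUser details are illustrative, not any particular SDK's API.

interface ToolDefinition {
  name: string;
  description: string;                  // the five parts, as one string
  parameters: Record<string, unknown>;  // JSON Schema for the inputs
}

interface DescriptionParts {
  what: string;
  useWhen: string;
  doNotUseWhen: string;
  example: string;
  failureModes: string;
}

// Assemble the five parts so every tool description has the same skeleton.
function describeTool(parts: DescriptionParts): string {
  return [
    `What: ${parts.what}`,
    `Use when: ${parts.useWhen}`,
    `Do NOT use when: ${parts.doNotUseWhen}`,
    `Example: ${parts.example}`,
    `Failure modes: ${parts.failureModes}`,
  ].join("\n");
}

const archiveUserTool: ToolDefinition = {
  name: "archiveUser",
  description: describeTool({
    what: "Marks a user account as archived. Reversible.",
    useWhen: "The user asks to archive, deactivate, or pause an account.",
    doNotUseWhen: "The user asks for permanent deletion -- use deleteUser.",
    example: '{ userId: "u_123" } -> { archived: true }',
    failureModes: 'If { error: "not_found" }, tell the user; do not guess IDs.',
  }),
  parameters: {
    type: "object",
    properties: { userId: { type: "string" } },
    required: ["userId"],
  },
};

The helper is optional; the point is that the decision rules travel with the schema, not with the system prompt.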
Example 1 — a delete operation
Before:
deleteUser(userId: string): Promise<DeleteResult>
// Removes a user.

“Removes a user” is lexically close to “archive,” “deactivate,” “suspend,” “hide.” A model under time pressure will pattern-match. It will not ask. It will call deleteUser when the user said “archive this account,” and your support team will spend the next Tuesday apologizing.
After:
deleteUser(input: { userId: string, confirmedByUser: boolean }): Promise<DeleteResult>
What: Permanently removes a user record and all owned data from the
database. This is not recoverable.
Use when: The user has explicitly requested account deletion AND has
been told the action is permanent. confirmedByUser MUST be true.
Do NOT use when: The user asks to archive, suspend, deactivate, hide,
disable, pause, or close. Use archiveUser for
reversible lifecycle changes.
Example:
Input: { userId: "u_123", confirmedByUser: true }
Output: { deleted: true, deletedAt: "2026-04-21T14:03:00Z" }
Error: { error: "confirmation_required" } if confirmedByUser is false.
Failure modes: If { error: "confirmation_required" }, do NOT retry.
Ask the user for explicit confirmation. If { error: "not_found" },
tell the user -- do not attempt fuzzy matches.

The “do not retry on confirmation_required” line prevents the agent from interpreting the error as transient and replaying the deletion. “Do not attempt fuzzy matches on not_found” prevents the agent from looking up a similar account and deleting the wrong one.
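The description only holds if the tool's actual contract matches it. Below is a sketch of a handler that returns exactly the errors the description promises; findUser and deleteUserRecord are hypothetical data-layer stubs, not a real API.

type DeleteResult =
  | { deleted: true; deletedAt: string }
  | { error: "confirmation_required" | "not_found" };

// Hypothetical persistence stubs -- swap in your own data layer.
declare function findUser(userId: string): Promise<{ id: string } | null>;
declare function deleteUserRecord(userId: string): Promise<void>;

async function deleteUser(input: {
  userId: string;
  confirmedByUser: boolean;
}): Promise<DeleteResult> {
  // Refuse rather than assume when the agent skips confirmation.
  if (!input.confirmedByUser) {
    return { error: "confirmation_required" };
  }
  const user = await findUser(input.userId);
  if (!user) {
    return { error: "not_found" }; // no fuzzy matching on the server either
  }
  await deleteUserRecord(input.userId);
  return { deleted: true, deletedAt: new Date().toISOString() };
}

If the handler ever invents an error shape the description does not mention, the failure-modes section becomes a lie the model will act on.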
Example 2 — a search operation
Before:
searchOrders(query: string): Order[]
// Search orders by query string.

“Search” is the most overloaded word in enterprise software. The model does not know whether this searches by customer name, by SKU, by order ID, or by free-text. It will guess wrong.
After:
searchOrders(input: { query: string, limit?: number }): Order[]
What: Full-text search across order notes, customer name, and order ID.
Does NOT search by SKU, line-item text, or financial fields.
Use when: Finding orders by name, order ID, or a phrase in the notes.
Do NOT use when: Searching by SKU -- use searchBySku. By dollar amount
or status -- use filterOrders. Retrieving a specific
order by exact ID -- use getOrder (faster, full detail).
Example:
Input: { query: "Acme Corp", limit: 5 }
Output: [ { id: "ord_91", customer: "Acme Corp", ... } ]
Failure modes: Empty array is valid -- do not re-query with broader terms.
If { error: "query_too_short" }, inform the user; do not
pad the query yourself.

The “do NOT use when” section converts the ambiguous “search” into a narrow tool with explicit siblings. The agent now has a decision tree, not a hunt.
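The cross-references only help if the sibling tools still exist under those names. One cheap guard, sketched here against an assumed registry shape, is to scan each description for camelCase names after the word “use” and fail fast if any reference dangles.

const registry: Record<string, { description: string }> = {
  searchOrders: { description: "Do NOT use when: searching by SKU -- use searchBySku. ..." },
  searchBySku: { description: "..." },
  filterOrders: { description: "..." },
  getOrder: { description: "..." },
};

// Crude heuristic: "use <camelCaseName>" inside a description is treated as
// a cross-reference to another tool. Adjust the pattern to your own wording.
const crossRef = /\buse ([a-z]+[A-Z]\w*)/g;

for (const [name, tool] of Object.entries(registry)) {
  for (const match of tool.description.matchAll(crossRef)) {
    const referenced = match[1];
    if (!(referenced in registry)) {
      throw new Error(`${name} refers to unknown tool "${referenced}"`);
    }
  }
}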
Example 3 — a write operation with side effects
Before:
sendNotification(userId: string, message: string): Promise<void>
// Sends a notification to the user.

“Notification” is another overloaded word. Email? Push? In-app? The model will guess, and it will cost money.
After:
sendNotification(input: {
userId: string,
message: string,
channel: "email" | "push" | "in_app",
priority?: "low" | "normal" | "high",
}): Promise<{ notificationId: string }>
What: Queues one notification on exactly one channel for one user.
Does NOT retry across channels. Does NOT batch.
Use when: The user or system explicitly needs to alert a specific
person on a specific channel.
Do NOT use when: Broadcasting to multiple users -- use sendNotificationBatch.
Transactional email (receipts, resets) -- use
sendTransactionalEmail, which logs compliance metadata
this tool does not.
Failure modes: If { error: "channel_disabled" }, do NOT fall back to
another channel -- the user has opted out. If
{ error: "rate_limited" }, do not retry; tell the user
their notification is deferred.

The “do not fall back to another channel” line is worth its weight in compliance lawyers. Without it, a model that receives channel_disabled will try push because it pattern-matches to “still a notification.” That’s a consent violation.
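As with the delete example, the “no fallback” rule is only safe if the tool itself enforces the opt-out. A sketch, with getChannelOptIn and enqueueNotification as hypothetical stubs:

type Channel = "email" | "push" | "in_app";

// Hypothetical stubs for the preference store and the delivery queue.
declare function getChannelOptIn(userId: string, channel: Channel): Promise<boolean>;
declare function enqueueNotification(
  userId: string,
  channel: Channel,
  message: string,
): Promise<string>;

async function sendNotification(input: {
  userId: string;
  message: string;
  channel: Channel;
  priority?: "low" | "normal" | "high";
}): Promise<{ notificationId: string } | { error: "channel_disabled" }> {
  // The opt-out check lives in the tool, not in the agent: if this channel
  // is disabled, return the error and let the description forbid fallback.
  if (!(await getChannelOptIn(input.userId, input.channel))) {
    return { error: "channel_disabled" };
  }
  const notificationId = await enqueueNotification(
    input.userId,
    input.channel,
    input.message,
  );
  return { notificationId };
}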
The meta-insight
Here is what most people miss: you’re not writing these descriptions for the model. You’re writing for the model’s context window.
Every tool description lives alongside every other tool description, in a finite space, read top to bottom on every step. When the descriptions are vague, the model resolves ambiguity by interpolating across all of them — blending semantics, inventing commonalities, making overconfident picks. When they are precise, the model resolves ambiguity by reading the relevant one and making the right call.
Clarity compounds. A well-written description pays you back every time the tool is considered. A vague one costs you every time.
Write them like every caller is a first-time reader who will take one shot and never ask for clarification. Because that’s exactly what’s happening, every step of every agent, in every production run.