Adding tools to an agent is supposed to make it more capable. In practice, above a certain threshold, adding tools makes the agent less reliable at using any of them. The model must now select the correct tool from a larger set, formulate the correct arguments for it, and interpret its output correctly. Each additional tool increases the probability that the agent selects the wrong one, uses the right one incorrectly, or gets confused by tool outputs that don’t fit its expectations.
Analysis Briefing
- Topic: Tool selection reliability, tool set size, and agent capability limits
- Analyst: Mike D (@MrComputerScience)
- Context: A technical briefing developed with Claude Sonnet 4.6
- Source: Pithy Cyborg | AI News Made Simple
- Key Question: At what point does giving an agent more tools start making it worse at its job?
The Tool Selection Problem at Scale
When an agent has five tools, the tool selection decision is relatively easy. The model reads the descriptions, identifies which tool fits the current step, and calls it. The cognitive load of the selection process is low relative to the overall task.
With twenty tools, the selection decision is harder. Similar tools with overlapping descriptions create ambiguity. The model must distinguish between search_web, search_documents, and search_knowledge_base when the right choice depends on nuance the tool descriptions may not fully capture. Selection errors compound: a wrong tool selection at step three creates outputs the model must interpret, and that interpretation shapes the rest of the task.
Silent failures in tool calls make this worse. An agent that selects the wrong tool and gets a response it doesn’t understand will sometimes proceed as though the call succeeded, carrying a wrong premise forward through the rest of the task.
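One defense against silent failures is to validate every tool result before the agent interprets it. A minimal sketch, assuming a generic tool-call loop; the function name and result shape here are illustrative, not from any specific framework:

```python
# Sketch: guard against silent tool-call failures by checking the
# result shape before the agent proceeds. The "error"/"data" fields
# are assumed conventions, not a real framework's contract.

def validate_tool_result(tool_name: str, result: dict) -> dict:
    """Return the result if it looks well-formed, else raise so the
    agent can retry or surface the failure instead of carrying a
    wrong premise forward."""
    if "error" in result:
        raise RuntimeError(f"{tool_name} reported an error: {result['error']}")
    if "data" not in result:
        raise ValueError(f"{tool_name} returned no 'data' field: {result}")
    return result

# A well-formed response passes through unchanged.
ok = validate_tool_result("search_web", {"data": ["result 1", "result 2"]})

# A malformed response halts the step explicitly rather than being
# silently interpreted as success.
try:
    validate_tool_result("search_web", {"error": "rate limited"})
except RuntimeError as e:
    print(e)
```

The point of the check is placement, not sophistication: it runs between the tool and the model, so a confusing payload becomes an explicit failure the agent must handle.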
Tool Description Quality Degrades With Scale
Tool descriptions written carefully when you have five tools become inconsistent and ambiguous when you have twenty, because the fifth tool’s description was written without anticipating the fifteenth. As tool count grows, descriptions that seemed clear in isolation start overlapping with descriptions added later.
The model uses tool descriptions as its primary basis for selection. Ambiguous or overlapping descriptions mean the model is making selection decisions from poor information. The more tools, the harder it is to maintain description quality, and the more the selection reliability degrades.
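Description overlap can be audited mechanically. A rough sketch, using word-level Jaccard similarity as a cheap proxy for the ambiguity described above; the tool names, descriptions, and threshold are all illustrative:

```python
# Sketch: flag pairs of tool descriptions whose wording overlaps
# heavily. Word-set Jaccard similarity is a crude proxy for the
# semantic overlap a model would perceive; the 0.5 threshold is an
# assumption to tune, not a recommendation.
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def flag_overlaps(tools: dict, threshold: float = 0.5):
    """Return (name, name, similarity) for pairs above the threshold."""
    return [
        (n1, n2, round(jaccard(d1, d2), 2))
        for (n1, d1), (n2, d2) in combinations(tools.items(), 2)
        if jaccard(d1, d2) >= threshold
    ]

tools = {
    "search_web": "Search the public web for pages matching a query",
    "search_documents": "Search the internal document store for pages matching a query",
    "run_code": "Execute a Python snippet and return stdout",
}
print(flag_overlaps(tools))  # the two search tools exceed the threshold
```

Running a check like this every time a tool is added catches the case the article describes: a description that was clear in isolation becoming ambiguous once a similar tool lands next to it.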
The Right Architecture: Tool Sets, Not Tool Libraries
The solution is not to avoid adding tools but to avoid giving all tools to every agent all the time. Route tasks to specialist agents that have access to a small, coherent set of tools relevant to their role. A research agent has web search and document retrieval. A code agent has file read, file write, and code execution. An orchestrator routes between them but does not itself have access to all tools.
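The routing structure above can be sketched in a few lines. This is a toy under stated assumptions: the agent names, tool lists, and keyword routing are placeholders, and a real orchestrator would use the model itself to route rather than keywords.

```python
# Sketch of the specialist-routing architecture: each specialist
# holds a small, coherent tool set, and the orchestrator only
# routes. All names and routing keywords are illustrative.
AGENTS = {
    "research": {"tools": ["search_web", "retrieve_document"]},
    "code": {"tools": ["read_file", "write_file", "execute_code"]},
}

def route(task: str) -> str:
    """Pick a specialist; a real router would ask the model."""
    if any(k in task.lower() for k in ("find", "look up", "summarize")):
        return "research"
    return "code"

def tools_for(task: str) -> list[str]:
    """Only the chosen specialist's tools reach the model context."""
    return AGENTS[route(task)]["tools"]

print(tools_for("Find recent papers on tool selection"))
```

The key property is in `tools_for`: no matter how many specialists exist, any single model call sees only a handful of tools, never the full library.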
This approach preserves tool description quality per agent, reduces selection error rates, and makes each agent’s behavior easier to evaluate and debug. The attack surface for prompt injection across agent pipelines also shrinks when agents have narrower tool access and cannot be manipulated into misusing tools outside their defined roles.
What This Means For You
- Keep each agent’s tool set to five to eight tools maximum and route to specialist agents for tasks requiring different tool sets, rather than building one agent with access to everything.
- Audit tool descriptions for ambiguity and overlap every time you add a new tool, because the description that was clear when written becomes ambiguous when a similar tool is added later.
- Measure tool selection accuracy explicitly by logging which tool the agent selected and whether it was the optimal choice, because tool selection errors are invisible in output quality metrics until they produce a clearly wrong final result.
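The third recommendation can be made concrete with a small logging structure. A minimal sketch, assuming selections are labeled after the fact (for example, by human review of a sample); the record fields are hypothetical:

```python
# Sketch: log each tool selection against a later-labeled optimal
# choice, so selection accuracy is measured directly rather than
# inferred from final output quality. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class SelectionRecord:
    task_id: str
    selected_tool: str
    optimal_tool: str  # labeled after the fact, e.g. by review

def selection_accuracy(log: list[SelectionRecord]) -> float:
    """Fraction of calls where the agent picked the optimal tool."""
    if not log:
        return 0.0
    hits = sum(r.selected_tool == r.optimal_tool for r in log)
    return hits / len(log)

log = [
    SelectionRecord("t1", "search_web", "search_web"),
    SelectionRecord("t2", "search_web", "search_documents"),
    SelectionRecord("t3", "read_file", "read_file"),
    SelectionRecord("t4", "execute_code", "execute_code"),
]
print(selection_accuracy(log))  # 3 of 4 selections were optimal: 0.75
```

A metric like this surfaces the failure mode the article warns about: an agent can produce plausible-looking output for many steps while its per-step selection accuracy is quietly degrading.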
Enjoyed this? Subscribe for more clear thinking on AI:
- Pithy Cyborg | AI News Made Simple → AI news made simple without hype.
