LLM API costs are not determined by prompt length alone. Input tokens, output tokens, cached tokens, and tool call tokens are all priced differently. A short prompt that triggers multiple tool calls costs more than a long prompt that generates a direct answer. An uncached system prompt costs ten times more per token than a cached one. The bill you did not expect came from the pricing mechanics you did not read.
Analysis Briefing
- Topic: LLM API token pricing mechanics and unexpected cost drivers
- Analyst: Mike D (@MrComputerScience)
- Context: A technical briefing developed with Claude Sonnet 4.6
- Source: Pithy Cyborg
- Key Question: Why is my API bill five times higher than my token count suggests it should be?
Why Output Tokens Cost More Than Input Tokens and by How Much
Input and output tokens are not the same price. For Claude Sonnet 4.6, output tokens cost five times as much per token as input tokens. For GPT-4o, output tokens cost approximately four times as much as input tokens. The asymmetry reflects the computational difference between processing existing tokens and generating new ones.
This pricing structure means that response length is a significant cost driver that prompt length does not predict. A short prompt that generates a 2,000-token response costs more than a long prompt that generates a 200-token response. Applications that generate long responses by default (detailed explanations, full code files, comprehensive analyses) pay a disproportionate share of their API costs in output token pricing.
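The asymmetry is easy to see in a back-of-the-envelope calculation. The sketch below assumes $3 per million input tokens and $15 per million output tokens, roughly in line with published Claude Sonnet rates; verify against your provider's current price sheet:

```python
# Illustrative per-million-token prices (assumptions, not a price quote).
INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A short prompt with a long response costs more than the reverse:
short_prompt_long_reply = request_cost(200, 2_000)   # 0.0306 USD
long_prompt_short_reply = request_cost(2_000, 200)   # 0.0090 USD
```

Swapping which side gets the 2,000 tokens changes the bill by more than 3x, even though the total token count is identical.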
Output length control is the highest-leverage cost optimization available for most applications. Explicit instructions to be concise, to answer directly without preamble, and to avoid unnecessary elaboration reduce output tokens without reducing response quality for tasks where the additional length was not adding value.
How Tool Calls Multiply Your Per-Request Cost
Every tool call in a function-calling or agentic workflow adds token cost that does not appear in the prompt length. A tool call generates output tokens for the function call itself, then adds the tool response as input tokens for the next generation step, then generates output tokens for the response following the tool use.
A simple request that triggers three tool calls can easily cost five to ten times more than the same request answered directly, because each tool call adds a complete generation-plus-ingestion cycle. The tool call overhead is invisible in the prompt and response length but fully visible in the token usage breakdown that most API clients log and most developers do not read.
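The multiplication comes from re-ingesting the growing conversation on every step. The token counts below are invented for illustration, not measured from any real API, but the structure of the accounting is the point:

```python
# Token accounting for one request, as a list of (input, output) steps.
# All numbers are hypothetical, chosen only to show the shape of the cost.
direct_answer = [(1_500, 300)]          # one generation, no tools
with_tool_calls = [
    (1_500, 80),    # model emits tool call 1
    (1_900, 80),    # tool result re-ingested as input, tool call 2
    (2_300, 80),    # tool result re-ingested as input, tool call 3
    (2_700, 300),   # final answer after the last tool result
]

def total_tokens(steps):
    """Sum input and output tokens across all generation steps."""
    return (sum(i for i, _ in steps), sum(o for _, o in steps))

direct = total_tokens(direct_answer)      # (1500, 300)
tooled = total_tokens(with_tool_calls)    # (8400, 540)
```

The user prompt never changed, but the tool-calling path ingests the conversation four times, so input tokens alone grow more than fivefold.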
Applications that use tool calls on every request regardless of whether tools are needed pay tool call overhead on requests that would have been cheaper with a direct answer. Routing simple requests to a direct generation path and complex requests to a tool-enabled path reduces this overhead significantly.
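A routing layer can be as simple as a gate in front of the tool-enabled path. The keyword heuristic below is purely hypothetical; production systems typically route with a cheap classifier or a small model, but the structure is the same:

```python
# Hypothetical routing sketch: the hint list and heuristic are made up
# for illustration; real routers usually use a cheap classifier instead.
TOOL_HINTS = ("search", "look up", "fetch", "current", "latest")

def needs_tools(prompt: str) -> bool:
    """Crude keyword heuristic, purely illustrative."""
    p = prompt.lower()
    return any(hint in p for hint in TOOL_HINTS)

def route(prompt: str) -> str:
    # Simple requests take the direct, cheaper generation path;
    # only requests that plausibly need tools pay tool-call overhead.
    return "tool_enabled" if needs_tools(prompt) else "direct"

route("Explain what a mutex is")          # -> "direct"
route("Look up the latest CPI figures")   # -> "tool_enabled"
```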
The Prompt Caching Math That Makes Large System Prompts Affordable
Prompt caching dramatically changes the economics of large system prompts. Without caching, a 10,000-token system prompt costs 10,000 input tokens at the standard price on every request. With caching active and a cache hit, the same 10,000 tokens are billed at 0.1x the standard input price, the equivalent of 1,000 tokens at the full rate, a 90 percent reduction on the system prompt portion of each request.
The breakeven point for caching arrives by the second request. The first request pays a cache write premium of 1.25x standard input pricing, an extra 0.25x; the first cache hit pays 0.1x instead of 1x, saving 0.9x and more than repaying the premium. Every subsequent request within the five-minute TTL window pays 0.1x. For applications with any meaningful request volume, caching large system prompts converts a major recurring cost into a near-negligible one.
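The breakeven arithmetic can be sketched in relative cost units, using the 1.25x write and 0.1x read multipliers described above (verify these against your provider's current caching documentation):

```python
# Cache economics in relative cost units (1.0 = standard input price).
# Multipliers taken from the text: 1.25x cache write, 0.1x cache read.
WRITE_MULT, READ_MULT, BASE = 1.25, 0.10, 1.00

def cost_with_cache(n_requests: int, prompt_tokens: int) -> float:
    """First request writes the cache; all later ones hit it (within TTL)."""
    return prompt_tokens * (WRITE_MULT + (n_requests - 1) * READ_MULT)

def cost_without_cache(n_requests: int, prompt_tokens: int) -> float:
    return prompt_tokens * n_requests * BASE

# By the second request the 0.25x write premium is already repaid:
two_cached = cost_with_cache(2, 10_000)       # 13,500 units
two_uncached = cost_without_cache(2, 10_000)  # 20,000 units
```

Note the `cost_with_cache` model assumes every follow-up request lands inside the TTL window; the next paragraph explains why that assumption fails for low-frequency traffic.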
The five-minute TTL is the implementation detail that catches teams who set up caching and see inconsistent cost reduction. Requests separated by more than five minutes trigger cache misses and pay full input pricing plus the cache write premium. For low-frequency applications, the caching savings may not materialize consistently and the cache write overhead may actually increase costs on some requests.
What This Means For You
- Read your token usage breakdown, not just your total API bill. Input tokens, output tokens, cache writes, cache reads, and tool call tokens are all reported separately in API responses and explain cost spikes that prompt length alone does not.
- Control output length explicitly with instructions to be concise. Output tokens cost four to five times as much as input tokens. Unnecessary elaboration is your highest-cost prompt engineering mistake.
- Route simple requests away from tool-calling workflows. A request that can be answered directly without tool use should be. Tool call overhead multiplies per-request cost by three to five times on requests where the tools are not needed.
- Implement prompt caching on system prompts above 1,000 tokens if your application sends more than a few requests per five-minute window. The 90 percent reduction on cached tokens pays back the implementation cost immediately at any meaningful request volume.
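Reading the breakdown can be a one-liner over the usage object your client already returns. The field names below follow Anthropic's Messages API usage object; other providers report similar categories under different names, and the prices are assumptions to check against current rates:

```python
# Assumed USD prices per 1M tokens; cache write = 1.25x and
# cache read = 0.1x of the standard input price.
PRICES = {
    "input_tokens": 3.00,
    "output_tokens": 15.00,
    "cache_creation_input_tokens": 3.75,
    "cache_read_input_tokens": 0.30,
}

def explain_cost(usage: dict) -> dict:
    """Break one request's cost down by token category (USD)."""
    return {k: usage.get(k, 0) * p / 1_000_000 for k, p in PRICES.items()}

usage = {  # shaped like response.usage from one call; values illustrative
    "input_tokens": 1_200,
    "output_tokens": 2_400,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 10_000,
}
breakdown = explain_cost(usage)
# Output tokens dominate here even though the prompt looked small.
```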
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
