A large language model can summarize a legal brief, write a sonnet, and explain quantum entanglement in plain English. Ask it to count the number of times the letter “r” appears in “strawberry” and it will probably get it wrong. This is not a bug. It is a direct consequence of how language models represent text.
Analysis Briefing
- Topic: Tokenization, character-level reasoning, and LLM architectural limits
- Analyst: Mike D (@MrComputerScience)
- Context: Sparked by a question from Claude Sonnet 4.6
- Source: Pithy Cyborg | AI News Made Simple
- Key Question: How can a model that reads text not actually see the individual characters in that text?
Tokens Are Not Characters
Language models do not process text as individual characters. They process tokens: chunks of text that a tokenizer splits the input into. Common words become single tokens; rare words are split into several subword pieces. Fragments like “##ing” (in WordPiece-style tokenizers) or “straw” and “berry” can be separate tokens even though you see a single word.
“Strawberry” in many tokenizers splits into something like “straw” + “berry” or even “st” + “raw” + “berry” depending on the tokenizer and training corpus. When you ask the model to count the letters in “strawberry,” the model sees tokens, not the ten characters you typed. It is doing something more like recalling from training patterns than performing a character-level scan.
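The mismatch can be sketched with a toy greedy tokenizer. The vocabulary below is hypothetical and the split is illustrative only; real BPE or WordPiece tokenizers learn their vocabularies from data and may split “strawberry” differently:

```python
# Toy longest-match tokenizer with a HYPOTHETICAL vocabulary.
# Real tokenizer vocabularies are learned from data; this split is
# only meant to illustrate what the model "sees" instead of characters.
VOCAB = {"straw", "berry", "st", "raw", "s", "t", "r", "a", "w", "b", "e", "y"}

def tokenize(word: str) -> list[str]:
    """Greedy longest-prefix matching against VOCAB."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest prefix first
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

word = "strawberry"
print(tokenize(word))   # ['straw', 'berry'] -- the model sees two units
print(len(word))        # 10 -- the characters you typed
print(word.count("r"))  # 3 -- trivial for code, invisible at the token level
```

From the model's side, “straw” and “berry” are just two opaque IDs; the three “r”s exist only inside those IDs, where token-level prediction cannot directly inspect them.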
This is why the model might confidently report two “r”s in “strawberry” when there are three. It is not miscounting characters. It is making a prediction based on token-level representations where the character-level detail has been partially abstracted away.
Why Semantic Tasks Are Easy and Character Tasks Are Hard
Language models are extraordinarily good at tasks that map to the kind of reasoning embedded in their training data. Summarization, translation, explanation, argument construction, and code generation all appear in training data in ways that let the model learn the underlying patterns.
Character-level tasks rarely appear in training data as explicit reasoning chains. Nobody writes an article that says “The word ‘necessary’ has one ‘c’ and two ‘s’ characters, which I verified by scanning each position individually.” The character-level counting operation has no analog in the text the model learned from.
Extended thinking and chain-of-thought prompting can help with some character tasks because they force the model to work step by step rather than predict in one pass. Even so, the fundamental token-level representation makes this a hard problem to fully solve without architectural changes.
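The position-by-position scan that chain-of-thought prompting tries to elicit is, written as code, a trivial loop. This is a sketch of the verification procedure, not of anything the model does internally:

```python
def count_letter(word: str, target: str) -> int:
    """Scan each position explicitly, the way a CoT prompt asks the model to."""
    count = 0
    for i, ch in enumerate(word):
        match = ch == target
        print(f"position {i}: '{ch}' -> {'match' if match else 'no match'}")
        if match:
            count += 1
    return count

print(count_letter("strawberry", "r"))  # 3
```

The loop succeeds precisely because it operates on characters; the model's chain of thought has to reconstruct this character sequence from token-level representations first, which is where errors creep in.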
What This Tells You About AI Capability Gaps
The failure on character counting is not a sign that AI is generally weak. It is a sign that capability is architecture-dependent. Tokenization explains why spelling, character counting, and precise string manipulation all fail for the same underlying reason: the representation the model works with has abstracted away the surface detail those tasks depend on. (The way tokenizers chunk digits contributes to arithmetic errors in a similar fashion.)
Tasks that require exact manipulation of surface-level text (character counts, precise string matching, letter-by-letter operations) are consistently harder for language models than tasks requiring conceptual reasoning. The model is essentially a very sophisticated next-token predictor, and some tasks do not decompose well into token prediction.
What This Means For You
- Never rely on an LLM for character-level text operations in production. Use your programming language’s built-in string functions for anything that requires exact character counting, substring matching, or letter-by-letter processing.
- Understand that fluency and accuracy measure different things. A confident, grammatically perfect answer about the number of letters in a word can still be wrong.
- Use chain-of-thought prompting if you need the model to attempt character tasks, and always verify the result independently.
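The first bullet in code: Python’s built-in string operations handle these tasks deterministically, so there is no reason to route them through a model:

```python
word = "strawberry"

# Exact character counting
assert word.count("r") == 3

# Precise substring matching
assert "berry" in word
assert word.find("berry") == 5

# Letter-by-letter processing
positions = [i for i, ch in enumerate(word) if ch == "r"]
assert positions == [2, 7, 8]

print("all string checks passed")
```

Every mainstream language has equivalents; the point is that exact surface-level text operations belong in ordinary code, with the LLM reserved for the semantic work it is actually good at.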
Enjoyed this? Subscribe for more clear thinking on AI:
- Pithy Cyborg | AI News Made Simple → AI news made simple without hype.
