Newer LLMs Worsen Tool Use with Custom Schemas
Originally published on Simon Willison's Weblog by Simon Willison
Summary & Key Takeaways
- Newer Claude models (Opus 4.8, Sonnet 5) exhibit degraded performance with custom tool schemas.
- These models invent extra fields in tool calls, leading to schema validation failures.
- Armin theorizes this issue stems from specialized training on Claude Code's internal edit tools.
- This training may make models less generalizable to external or custom tool definitions.
- OpenAI models are instead trained on an apply_patch mechanism, which suggests the two vendors take different approaches to file editing.
- The problem raises the question of whether third-party harnesses will need a separate tool implementation for each model family.
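To make the "invented extra fields" failure concrete, here is a minimal Python sketch of how a harness might check a model's tool-call arguments against a custom schema. Everything in it is hypothetical and for illustration only: the edit_file-style schema (path, old_text, new_text), the validate_tool_args helper, and the invented dry_run field. None of it comes from Claude Code, OpenAI's apply_patch, or any real harness. The key assumption is that the schema sets additionalProperties to false, so any field the model adds on its own is rejected.

```python
import json

# Hypothetical custom edit-tool schema (illustration only, not any vendor's real tool).
# "additionalProperties": False means any field the model invents fails validation.
EDIT_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "old_text": {"type": "string"},
        "new_text": {"type": "string"},
    },
    "required": ["path", "old_text", "new_text"],
    "additionalProperties": False,
}

# Maps the JSON Schema type names used above to Python types.
_JSON_TYPES = {"string": str, "object": dict}


def validate_tool_args(args: dict, schema: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the call is valid.

    Deliberately minimal: it checks only required keys, the primitive type of
    each declared property, and additionalProperties. A real harness would use
    a full JSON Schema validator instead.
    """
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        if key in props:
            expected = _JSON_TYPES[props[key]["type"]]
            if not isinstance(value, expected):
                errors.append(f"field {key} should be {props[key]['type']}")
        elif schema.get("additionalProperties", True) is False:
            errors.append(f"unexpected field: {key}")
    return errors


# A well-formed call that matches the schema exactly.
good = json.loads('{"path": "app.py", "old_text": "x = 1", "new_text": "x = 2"}')
# The failure mode described above: the model adds a field the schema never declared.
bad = dict(good, dry_run=True)  # "dry_run" is a hypothetical invented field

print(validate_tool_args(good, EDIT_TOOL_SCHEMA))
print(validate_tool_args(bad, EDIT_TOOL_SCHEMA))
```

Whether a harness rejects such calls outright, as this sketch does, or silently drops unknown keys is a design choice. Strict rejection surfaces the model's drift, but it also turns every invented field into a failed tool call.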
Our Commentary
This is a genuinely concerning trend. We're seeing these models get "smarter" in some ways, but at the cost of generalizability for tool use. It feels like a step backward for developer experience. If every LLM requires its own specific tool implementation, the dream of interchangeable AI agents becomes a nightmare of bespoke integrations. I really hope this isn't a sign of things to come, where model providers optimize for their own ecosystems at the expense of broader utility.