From Answering to Acting
An AI that can only talk and an AI that can actually do things are not separated by a feature flag. They are separated by a whole stack of system design.
The picture from the previous part (probabilistic prediction, token-level concatenation, a finite window) describes a machine that can only "speak." But the AI coding tools you use every day clearly do more than that: they read your files, search your codebase, run commands in a terminal, and even kick off tests to verify their own output. Going from "answering questions" to "executing tasks" forces a long list of problems into the open:
- How does the model decide what to do next, and how does it know which tools are even available?
- What happens when a tool call fails halfway through?
- Some abilities cannot really be expressed as "tools" at all — how do you handle those?
- And when one agent is not enough for the task, how do multiple agents actually work together?
The answers to these questions are what give concepts like Agent, Function Calling, MCP, and Skill their underlying logic. None of them was invented because some product wanted a new feature. They are different responses to the same pressure, letting the model act on its own, each shaped by different constraints. What you need to understand is not the trendy names, but what problem each one really solves, what cost it introduces, and where its limits lie.
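To make the first two questions concrete, here is a minimal sketch of the loop an acting system has to run. It is an illustration under loud assumptions, not how any particular product works: `TOOLS`, `ToolCall`, `fake_model`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stub standing in for a real language model. The point is only the shape: the system keeps a registry of available tools, asks the model for the next action, and has to decide what to do when a call fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ToolCall:
    """A request from the model to run one named tool with one argument."""
    name: str
    argument: str


# Hypothetical tool registry: the only actions this sketch can take.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
    "shout": lambda text: text.upper(),
}


def fake_model(history: List[str]) -> Optional[ToolCall]:
    """Stub standing in for a real model (an assumption of this sketch).

    It picks the next action from the history so far, or returns None
    when it considers the task finished.
    """
    if len(history) == 1:
        # First attempt asks for a tool that is not registered.
        return ToolCall("search_web", "agents")
    if len(history) == 2:
        return ToolCall("word_count", "read search run test")
    return None


def run_agent(task: str, max_steps: int = 5) -> List[str]:
    """Run the decide-act-observe loop and return the full history."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        call = fake_model(history)
        if call is None:
            break  # the model decided it is done
        tool = TOOLS.get(call.name)
        if tool is None:
            # Failure is recorded so the next decision can react to it.
            history.append(f"error: unknown tool {call.name}")
            continue
        try:
            history.append(f"{call.name} -> {tool(call.argument)}")
        except Exception as exc:
            history.append(f"error: {call.name} failed: {exc}")
    return history


if __name__ == "__main__":
    for line in run_agent("demo"):
        print(line)
```

Even in this toy version, the failed call is not fatal: the error is appended to the history, and the next decision is made with that failure in view. The `max_steps` cap is a design choice of the sketch, there so that a model that never returns `None` cannot loop forever.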