A language model, by itself, is a sophisticated text transformer: it reads text and produces text. Everything we call an "agent" — a system that can read files, query databases, call APIs, run code, and interact with the world — is a language model equipped with tools. Function calling (Anthropic calls it "tool use"; OpenAI calls it "function calling"; they are the same concept) is the mechanism that bridges the gap between a model that can describe an action and one that can actually take it.

Understanding function calling deeply — how schemas work, how the execution loop is structured, what parallel calling looks like, and how it relates to structured output — is foundational for anyone building LLM-powered applications. Then there is a newer layer on top: the Model Context Protocol (MCP), an open standard from Anthropic that standardizes how agents connect to tools across a diverse ecosystem. MCP is to agents what REST is to web services: a common interface that makes components interoperable without bespoke integrations. This article covers both in full depth.

⚡ Quick Takeaways
  • Function calling is the execution layer. The model doesn't run code — it emits a structured tool call (name + arguments); your controller executes the actual function and feeds the result back. The model only sees inputs and outputs, never the implementation.
  • JSON Schema defines the contract. Every tool is described by a name, a description the model uses to decide when to call it, and a JSON Schema that constrains the arguments. Good descriptions are critical — the model decides which tool to call based on them.
  • The tool-use loop is multi-turn by design. A single task may involve dozens of tool calls, each observing the previous result. The loop ends when the model emits a text response instead of a tool call.
  • Parallel tool calls cut latency. Models can emit multiple tool calls in a single turn; you fan them out concurrently and return all results at once, reducing wall-clock time for independent reads.
  • MCP standardizes the tool surface. Instead of writing a custom integration for every tool in every agent, MCP servers expose tools via a standard protocol — any MCP-compatible host (Claude Code, Cursor, etc.) can connect to any MCP server without custom glue code.
  • MCP and RAG solve different problems. RAG retrieves read-only context; MCP provides stateful, actionable tools. They are complementary, not alternatives.
tldr

Function calling is the mechanism: model outputs a tool call → your code executes it → result goes back → model continues. MCP is the standardization layer: a protocol that makes tools (as MCP servers) reusable across any compliant agent host. Master both to build agents that are both powerful and maintainable.

How Function Calling Works: The Mechanism

The key mental model: the model does not execute functions. It never touches your database, filesystem, or any external API. What the model does is produce a structured output that says "I would like to call function X with arguments Y." Your application code receives that output, validates it, runs the actual function, and returns the result to the model as a new message. The model then decides what to do next.

This design is deliberate. Execution happens in your code, under your control, with your permissions and your error handling. The model is sandboxed to producing intents; you are responsible for acting on them. This separation makes function-calling systems auditable, testable, and safe — you can mock tool responses in tests, log every call, rate-limit expensive tools, and require confirmation before side-effecting operations, all without touching the model at all.

The Three-Phase Interaction

  1. Model produces a tool call. When the model decides it needs a tool, it emits a structured tool-use block instead of (or in addition to) text. The block contains the tool name and a JSON object of arguments that conform to the tool's schema.
  2. Controller executes the tool. Your code deserializes the tool-use block, validates the arguments, calls the actual function, and collects the result (success or error).
  3. Result is fed back. You append the tool result as a new message and call the API again. On Anthropic, the result goes in a user message containing a tool_result block that references the tool_use id; on OpenAI, it is a message with role: tool and a tool_call_id. The model sees the result and decides what to do next: either make another tool call or produce a final text response.
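As a rough illustration of phase 3, the two helpers below build the result message in the shape each API expects. The helper names and the sample ids are invented for this sketch; the field names follow the providers' public message formats.

```python
import json

def anthropic_tool_result(tool_use_id: str, output: str, is_error: bool = False) -> dict:
    """Anthropic Messages API: results travel in a user message as tool_result blocks."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id on the model's tool_use block
            "content": output,
            "is_error": is_error,
        }],
    }

def openai_tool_result(tool_call_id: str, output: str) -> dict:
    """OpenAI Chat Completions: results use a dedicated 'tool' role."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": output}

# Hypothetical ids, standing in for the ones the model would have emitted
a = anthropic_tool_result("toolu_123", "2 passed, 1 failed")
o = openai_tool_result("call_123", "2 passed, 1 failed")
print(json.dumps(a, indent=2))
print(json.dumps(o, indent=2))
```

Either way, the id is what lets the model pair each result with the call that produced it, which matters once several calls are in flight in one turn.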

Defining Tools with JSON Schema

Every tool is defined by three things: a name the model uses to identify it, a description the model reads to decide when to use it, and an input_schema (JSON Schema) that defines what arguments are valid. All three matter.

python — tool definition with JSON Schema
tools = [
    {
        "name": "search_codebase",
        "description": """Search the repository for files or code matching a query.
Use this to find relevant files before reading them. Returns a list of
file paths and matching line snippets. Prefer this over read_file when
you don't know which file contains what you need.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query — function name, variable, error message, or concept.",
                },
                "file_pattern": {
                    "type": "string",
                    "description": "Optional glob pattern to restrict search, e.g. '*.py' or 'src/**/*.ts'.",
                },
                "max_results": {
                    "type": "integer",
                    "description": "Maximum number of results to return. Default 10.",
                    "default": 10,
                },
            },
            "required": ["query"],
        },
    },
    {
        "name": "run_tests",
        "description": """Run the test suite and return results. Use after making code changes
to verify correctness. Returns pass/fail status and any error output.""",
        "input_schema": {
            "type": "object",
            "properties": {
                "test_path": {
                    "type": "string",
                    "description": "Specific test file or directory. Omit to run all tests.",
                },
                "flags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Additional pytest flags, e.g. ['-x', '--tb=short'].",
                },
            },
            "required": [],
        },
    },
]
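The schema is also what the controller can check arguments against before executing anything. The function below is a minimal sketch of that step, assuming only `required` and top-level primitive `type` checks matter; `validate_args` is a hypothetical helper, and a real project would more likely rely on a full JSON Schema validator.

```python
def validate_args(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid.

    Covers only `required` and top-level primitive `type` checks, a small
    subset of JSON Schema chosen for illustration.
    """
    type_map = {"string": str, "integer": int, "number": (int, float),
                "boolean": bool, "array": list, "object": dict}
    problems = []
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    properties = schema.get("properties", {})
    for key, value in args.items():
        if key not in properties:
            # Stricter than JSON Schema's default, which allows extra properties
            problems.append(f"unexpected argument: {key}")
            continue
        expected = type_map.get(properties[key].get("type"))
        # bool is a subclass of int in Python, so reject it explicitly for numbers
        if expected and (not isinstance(value, expected)
                         or (isinstance(value, bool) and expected is not bool)):
            problems.append(f"{key}: expected {properties[key]['type']}")
    return problems

search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_results": {"type": "integer"},
    },
    "required": ["query"],
}

print(validate_args(search_schema, {"query": "login", "max_results": 5}))
print(validate_args(search_schema, {"max_results": "ten"}))
```

Returning the problems as text, rather than raising, makes it easy to send them back to the model as an error result so it can retry with corrected arguments.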

Writing Descriptions the Model Will Use Correctly

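The model relies on your description, not the schema, to decide which tool to call. Descriptions are prompt text, and prompt quality matters here as much as anywhere. Weak descriptions lead to wrong tool choices: a description like "Searches things" gives the model nothing to distinguish one tool from another, while a strong one says what the tool returns and when to prefer it over its neighbors.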

Similarly, property descriptions in the schema matter. "query" with no description is underspecified; "The search query — function name, variable, error message, or concept" tells the model how to formulate the input.
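To make the contrast concrete, here is a sketch with one thin definition and one fuller one, plus a small lint pass over both. The `weak_tool` definition, the `lint_tool` helper, and the 12-word threshold are all illustrative assumptions, not an established rule.

```python
weak_tool = {
    "name": "search",
    "description": "Searches things.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

strong_tool = {
    "name": "search_codebase",
    "description": (
        "Search the repository for files or code matching a query. "
        "Use this to find relevant files before reading them. "
        "Prefer this over read_file when you don't know which file contains what you need."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "The search query: function name, variable, error message, or concept.",
            },
        },
        "required": ["query"],
    },
}

def lint_tool(tool: dict, min_words: int = 12) -> list[str]:
    """Flag descriptions that are probably too thin for reliable tool selection."""
    warnings = []
    if len(tool.get("description", "").split()) < min_words:
        warnings.append(f"{tool['name']}: description is shorter than {min_words} words")
    for prop, spec in tool["input_schema"].get("properties", {}).items():
        if not spec.get("description"):
            warnings.append(f"{tool['name']}.{prop}: property has no description")
    return warnings

print(lint_tool(weak_tool))
print(lint_tool(strong_tool))
```

A word count is a crude proxy for quality, but a check like this in CI catches the most common failure: a tool added in a hurry with no guidance on when to use it.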

The Tool-Use Loop in Detail

Let us trace a complete multi-tool interaction to make the mechanics concrete. The task: "Find and fix the bug causing test_auth to fail."

python — full tool-use loop with result routing
import subprocess
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

def execute_tool(name: str, args: dict) -> str:
    # Dispatch to actual implementations. read_file and write_file are assumed
    # to be declared in `tools` alongside the definitions shown earlier.
    if name == "run_tests":
        cmd = ["python", "-m", "pytest"] + args.get("flags", [])
        if args.get("test_path"):
            cmd.append(args["test_path"])
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.stdout + result.stderr
    elif name == "read_file":
        return Path(args["path"]).read_text()
    elif name == "write_file":
        Path(args["path"]).write_text(args["content"])
        return f"Written {len(args['content'])} chars to {args['path']}"
    raise ValueError(f"Unknown tool: {name}")

messages = [{"role": "user", "content": "Find and fix the bug causing test_auth to fail."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5", max_tokens=4096,
        tools=tools, messages=messages,
    )

    # Append assistant turn to conversation
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason != "tool_use":
        # No tool call requested (end_turn, max_tokens, ...): print any text and stop
        print("".join(b.text for b in response.content if b.type == "text"))
        break

    # Collect all tool calls in this turn (may be multiple)
    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            try:
                result, is_error = execute_tool(block.name, block.input), False
            except Exception as exc:
                # Report the failure to the model instead of crashing the loop
                result, is_error = f"{type(exc).__name__}: {exc}", True
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
                "is_error": is_error,
            })

    # Feed results back to the model as a user message
    messages.append({"role": "user", "content": tool_results})

Notice that the loop is completely generic. It does not know anything about the specific task — that knowledge lives in the model. The controller only knows how to: (1) call the API, (2) check whether the model is done, (3) execute any tool calls it finds, and (4) return the results. This is the standard pattern for every function-calling application.
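Because the loop is generic, it can be exercised without any network call by swapping in a scripted stand-in for the model and a mocked tool. The sketch below assumes response objects shaped like the Anthropic SDK's (a `stop_reason` plus a list of content blocks); `fake_model`, `run_agent`, and `mock_run_tests` are hypothetical names, and the `max_turns` cap is an added safeguard rather than part of the pattern described above.

```python
from types import SimpleNamespace as NS

def fake_model(messages: list) -> NS:
    """Stand-in for the API: asks for one tool call, then answers in text."""
    already_called = any(
        isinstance(m["content"], list)
        and any(isinstance(c, dict) and c.get("type") == "tool_result" for c in m["content"])
        for m in messages
    )
    if not already_called:
        block = NS(type="tool_use", id="toolu_1", name="run_tests", input={"flags": ["-x"]})
        return NS(stop_reason="tool_use", content=[block])
    return NS(stop_reason="end_turn", content=[NS(type="text", text="All tests pass.")])

def run_agent(model, tool_impls: dict, task: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):  # hard cap so a confused model cannot loop forever
        response = model(messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = tool_impls[block.name](**block.input)
                results.append({"type": "tool_result", "tool_use_id": block.id, "content": output})
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer after {max_turns} turns")

calls = []  # records what the mocked tool was asked to do

def mock_run_tests(flags=None, test_path=None):
    calls.append(flags)
    return "3 passed"

answer = run_agent(fake_model, {"run_tests": mock_run_tests}, "Fix test_auth.")
print(answer)
```

The same harness works for regression tests: script a sequence of model turns, then assert on which tools were called and with what arguments.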

Parallel Tool Calls: Eliminating Unnecessary Latency

In the basic loop above, if the model decides to read three files, it does so in three sequential turns: request file A, get A, request file B, get B, request file C, get C. That is 3× the round-trip time needed. Modern models support emitting multiple tool calls in a single turn, which your controller should fan out concurrently.

python — parallel tool execution with asyncio
import asyncio

import aiofiles  # third-party: pip install aiofiles

async def execute_tool_async(name: str, args: dict) -> str:
    # Async versions of tool implementations
    if name == "read_file":
        async with aiofiles.open(args["path"]) as f:
            return await f.read()
    # ... other tools
    raise ValueError(f"Unknown tool: {name}")

async def handle_tool_calls(tool_use_blocks):
    # Keep only the tool calls; the turn may also contain text blocks
    calls = [b for b in tool_use_blocks if b.type == "tool_use"]

    # Fan out ALL tool calls in this turn concurrently. return_exceptions=True
    # keeps one failing tool from discarding the results of the others.
    results = await asyncio.gather(
        *(execute_tool_async(b.name, b.input) for b in calls),
        return_exceptions=True,
    )

    # gather preserves input order, so results[i] belongs to calls[i]
    return [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": str(result),
            "is_error": isinstance(result, Exception),
        }
        for block, result in zip(calls, results)
    ]
# If model requests read_file("auth.py"), read_file("utils.py"), read_file("tests/test_auth.py")
# → all three reads happen concurrently; latency = max(read_time) not sum(read_time)

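The speedup is real: for I/O-bound tools, reading 5 files concurrently takes roughly as long as the slowest single read, and batching the requests into one turn also saves the model round trips that sequential requests would need. For agents that do extensive exploration (reading many files to understand a codebase), parallel tool calls can cut exploration time severalfold, on the order of 3–5× when each turn batches that many independent reads.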
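The max-versus-sum claim is easy to check in isolation. In this sketch, `fake_read` is a hypothetical tool whose `asyncio.sleep` stands in for disk or network latency; it measures only tool execution time, not the model round trips that batching also saves.

```python
import asyncio
import time

async def fake_read(path: str, delay: float = 0.1) -> str:
    """Simulated I/O-bound tool; the sleep stands in for real latency."""
    await asyncio.sleep(delay)
    return f"contents of {path}"

async def timed(coro):
    """Await a coroutine and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

async def sequential(paths):
    # One read at a time: total latency is the sum of the delays
    return [await fake_read(p) for p in paths]

async def concurrent(paths):
    # All reads in flight at once: total latency is the slowest delay
    return list(await asyncio.gather(*(fake_read(p) for p in paths)))

async def main():
    paths = ["auth.py", "utils.py", "tests/test_auth.py"]
    seq_result, seq_time = await timed(sequential(paths))
    par_result, par_time = await timed(concurrent(paths))
    return seq_result, seq_time, par_result, par_time

seq_result, seq_time, par_result, par_time = asyncio.run(main())
print(f"sequential: {seq_time:.2f}s, concurrent: {par_time:.2f}s")
```

The gain only appears for tools that wait on I/O; CPU-bound tools running on a single event loop will not overlap this way.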

When Parallel Calls Are Not Safe

Fan-out is only correct for independent operations. If tool call B depends on the result of tool call A (e.g., "search for a file, then read the result"), they must be sequential — B uses information from A's result. The model usually handles this correctly by emitting sequential turns for dependent calls. Your controller's job is to execute whatever the model emits in a single turn concurrently, without trying to impose its own ordering on independent calls.

Tool Use as Structured Output

Tool use is not just for agents — it is also the most reliable way to extract structured output from a model. When you want the model to return a specific data shape (for parsing by downstream code), defining that shape as a tool schema and setting tool_choice to force that tool is far more reliable than asking the model to "return JSON."

| Method | Syntax reliability | Schema reliability | Use when |
|---|---|---|---|
| Prompt: "return JSON" | 80–95% | Low | Quick experiments only |
| JSON mode | 100% | Low (valid JSON, wrong shape) | When any valid JSON is acceptable |
| Forced tool call | 100% | High (schema-constrained) | Production extraction pipelines |
The forced tool call approach works because the model is trained to produce arguments that conform to the schema. Conformance is very reliable in practice, but it is not guaranteed unless the provider offers a strict, schema-enforced mode, and a response cut off by the token limit can leave the arguments incomplete. Keep a lightweight validation step before treating the result as a typed dict.
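A minimal sketch of the forced-tool pattern against the Anthropic Messages API follows. The `record_invoice` tool and its fields are hypothetical, the response is a hand-built stand-in so the example runs offline, and the required-field check in `extract` is a cautious extra rather than something the API demands.

```python
from types import SimpleNamespace as NS

invoice_tool = {
    "name": "record_invoice",  # hypothetical tool, used only as an output schema
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string", "enum": ["USD", "EUR", "GBP"]},
        },
        "required": ["vendor", "total", "currency"],
    },
}

def build_request(document_text: str) -> dict:
    """Keyword arguments intended for client.messages.create(**kwargs)."""
    return {
        "model": "claude-opus-4-5",
        "max_tokens": 1024,
        "tools": [invoice_tool],
        # Force this specific tool, so the reply is a tool_use block rather than prose
        "tool_choice": {"type": "tool", "name": "record_invoice"},
        "messages": [{"role": "user", "content": document_text}],
    }

def extract(response) -> dict:
    """Pull the forced tool call's arguments out of a response and sanity-check them."""
    block = next(b for b in response.content if b.type == "tool_use")
    missing = [k for k in invoice_tool["input_schema"]["required"] if k not in block.input]
    if missing:
        raise ValueError(f"model omitted required fields: {missing}")
    return block.input

# Stand-in for an API response, so the sketch runs without a network call
fake_response = NS(content=[NS(type="tool_use", name="record_invoice",
                               input={"vendor": "Acme", "total": 120.5, "currency": "USD"})])
request = build_request("Invoice from Acme, total USD 120.50")
print(extract(fake_response))
```

Note that the tool is never executed here: its schema exists purely to shape the output, and the "result" is the arguments themselves.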

What Is MCP?

Function calling as described above requires you to write the tool definitions (schemas), implement the execution logic, and wire everything together in your application. Every agent application reimplements this wheel independently. If you want your agent to talk to GitHub, you write a GitHub tool. If you want it to talk to Postgres, you write a Postgres tool. If someone else builds a different agent, they write their own GitHub tool. There is no reuse.

The Model Context Protocol (MCP), released by Anthropic in late 2024, addresses this directly. MCP is an open standard that defines a common protocol for exposing tools (and other capabilities) that any compliant host can consume without custom integration work. It is to agents what USB is to peripherals: a standard interface that makes components interoperable.

The Core Idea in One Sentence

MCP separates the tool implementation (an MCP server) from the tool consumer (an MCP host), with a standard protocol in between, so that any server works with any host.

MCP Architecture: Host, Client, Server

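MCP defines three roles. The host is the application that runs the model and the agent loop. Inside the host, each client maintains a one-to-one connection to a single server. Each server is a separate process or service that exposes tools and other capabilities: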

text — MCP architecture diagram
┌─────────────────────────────────────────────┐
│                  MCP HOST                   │
│  (Claude Code, Cursor, custom agent app)    │
│                                             │
│  ┌──────────┐   ┌──────────┐   ┌─────────┐  │
│  │ LLM      │   │ MCP      │   │ MCP     │  │
│  │ (Claude  │◄──│ Client   │   │ Client  │  │
│  │  /GPT)   │   │    A     │   │    B    │  │
│  └──────────┘   └────┬─────┘   └────┬────┘  │
└──────────────────────┼──────────────┼───────┘
                       │              │
                 MCP Protocol   MCP Protocol
                (JSON-RPC 2.0) (JSON-RPC 2.0)
                       │              │
        ┌──────────────▼─┐  ┌─────────▼──────┐
        │  MCP Server A  │  │  MCP Server B  │
        │  (GitHub API)  │  │  (PostgreSQL)  │
        │                │  │                │
        │  tools:        │  │  tools:        │
        │  - list_repos  │  │  - run_query   │
        │  - create_pr   │  │  - list_tables │
        │  - get_issues  │  │  - describe_db │
        └────────────────┘  └────────────────┘

Messages are encoded as JSON-RPC 2.0 and typically travel over stdio (for local servers) or HTTP (for remote servers; the original HTTP with Server-Sent Events transport has since been superseded in the specification by Streamable HTTP). The framing is standardized; what varies is the tools each server exposes and their schemas, which the client discovers after an initialization handshake that negotiates capabilities.
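On the wire, these exchanges are ordinary JSON-RPC 2.0 requests and responses. The sketch below builds a `tools/list` and a `tools/call` request and frames them for stdio as one JSON message per line. The `request` and `frame` helpers and the `sql` argument are illustrative assumptions; `run_query` is borrowed from the diagram above, and the response is a hand-written example of what a server might return.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC ids let each response be matched to its request

def request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

def frame(msg: dict) -> bytes:
    """stdio framing: one JSON message per line, with no embedded newlines."""
    return (json.dumps(msg, separators=(",", ":")) + "\n").encode("utf-8")

list_tools = request("tools/list")
call_tool = request("tools/call", {"name": "run_query", "arguments": {"sql": "SELECT 1"}})

# Illustrative response a server might send back for the call above
response = {
    "jsonrpc": "2.0",
    "id": call_tool["id"],
    "result": {"content": [{"type": "text", "text": "1"}], "isError": False},
}

print(frame(list_tools))
print(frame(call_tool))
```

In practice an SDK handles this layer for you; seeing the raw messages is mainly useful when debugging a server that a host refuses to talk to.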

The Three Primitives MCP Servers Expose

| Primitive | What it is | Analogy |
|---|---|---|
| Tools | Functions the model can invoke; take arguments, have side effects, return results | REST API endpoints (POST/PUT/DELETE) |
| Resources | Read-only data the host can load into context; identified by URI | REST API endpoints (GET) or files |
| Prompts | Pre-written prompt templates the server exposes for common tasks | Canned SQL queries / saved searches |

Tools are the most important primitive and what most MCP servers focus on. Resources provide a lightweight way for a server to surface data (e.g., a database schema, a document index) without requiring the model to invoke a tool. Prompts allow servers to package up task-specific prompt templates that hosts can inject into the conversation.

Writing a Minimal MCP Server

python — minimal MCP server with mcp SDK
from mcp.server.fastmcp import FastMCP
import subprocess, pathlib

mcp = FastMCP("dev-tools")

@mcp.tool()
def run_tests(test_path: str = "", flags: list[str] | None = None) -> str:
    """Run the project test suite. Pass test_path to run a specific file.
    Returns combined stdout+stderr from pytest."""
    # None instead of [] avoids sharing one mutable default list across calls
    cmd = ["python", "-m", "pytest"] + (flags or [])
    if test_path:
        cmd.append(test_path)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr

@mcp.tool()
def list_changed_files() -> str:
    """Return files changed since the last git commit. Useful for scoping reviews."""
    result = subprocess.run(["git", "diff", "--name-only", "HEAD"],
                            capture_output=True, text=True)
    return result.stdout or "No changed files."

@mcp.resource("file://{path}")
def read_project_file(path: str) -> str:
    """Expose project files as resources for context loading."""
    return pathlib.Path(path).read_text()

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default

This server exposes two tools (run_tests and list_changed_files) and one resource (file://). Any MCP host — Claude Code, a custom agent, Cursor — can connect to this server over stdio and discover these capabilities automatically via the protocol handshake. No custom integration code needed on the host side.

MCP vs. Traditional Function Calling

Traditional function calling (as covered earlier in this article) is per-application: you define tools in your agent's code, implement the execution logic inline, and wire everything together. It works, but it does not compose across applications.

| Dimension | Traditional function calling | MCP |
|---|---|---|
| Reusability | Per-application; each agent reimplements the same tools | Servers are reusable across any MCP-compatible host |
| Discovery | Tools hardcoded into agent code | Dynamic: host discovers tools at runtime via capability negotiation |
| Deployment | Tool logic is inside the agent process | Servers are separate processes, independently deployable and scalable |
| Ecosystem | Proprietary per-team | Open standard; growing library of pre-built MCP servers |
| Complexity | Simpler for small, self-contained agents | More setup; pays off at scale and when reuse matters |

For a small, single-purpose agent where you control both the LLM loop and all the tools, traditional function calling is simpler. MCP's value compounds as you have more agents, more tools, or want to share tool implementations across teams. Think of it like the REST vs. direct database access tradeoff: direct is simpler for one use case; a standard interface wins when you have many clients.
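The two approaches meet inside the host: tools discovered over MCP still reach the model through ordinary function calling. A rough sketch of that translation step, assuming tool descriptors shaped like a `tools/list` result (`name`, `description`, `inputSchema`); the converter names and the sample descriptor are hypothetical.

```python
def mcp_to_anthropic(mcp_tool: dict) -> dict:
    """Convert an MCP tool descriptor to an Anthropic tool definition."""
    return {
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "input_schema": mcp_tool["inputSchema"],  # MCP uses camelCase here
    }

def mcp_to_openai(mcp_tool: dict) -> dict:
    """Convert an MCP tool descriptor to an OpenAI Chat Completions tool definition."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool["inputSchema"],
        },
    }

# Sample descriptor, as a server might advertise it
advertised = [{
    "name": "list_tables",
    "description": "List the tables in the connected database.",
    "inputSchema": {"type": "object", "properties": {}, "required": []},
}]

anthropic_tools = [mcp_to_anthropic(t) for t in advertised]
openai_tools = [mcp_to_openai(t) for t in advertised]
print(anthropic_tools[0]["name"], openai_tools[0]["function"]["name"])
```

The JSON Schema passes through unchanged in both directions, which is why one server can serve hosts built on different model providers.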

MCP vs. RAG: Complementary, Not Competing

Retrieval-Augmented Generation (RAG) fetches relevant documents and inserts them into the model's context before generation. It is read-only and passive: the model does not request retrieval; the application retrieves for the model. MCP tools are active and on-demand: the model decides when to call a tool and what to ask for.

| Dimension | RAG | MCP Tools |
|---|---|---|
| Initiation | Application pre-fetches before model turn | Model requests at runtime during its turn |
| Side effects | None (read-only) | Can write, delete, call APIs, run commands |
| Specificity | Approximate: retrieval by semantic similarity | Exact: model specifies precise arguments |
| Latency | Adds to pre-generation latency | Adds to mid-generation latency (each tool call) |
| Best for | Providing background knowledge the model didn't have | Taking actions, fetching precise data, verifying output |

In practice, sophisticated agents use both: RAG to seed the context with background knowledge (codebase overview, documentation, recent issues), and MCP tools for precise runtime operations (reading a specific file, running a specific test, querying a specific database row). RAG reduces the number of tool calls needed for exploration; tools handle everything that requires precision or side effects.
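The division of labor can be sketched in a few lines. Everything here is a toy: `DOCS`, `FILES`, and the word-overlap `retrieve` function stand in for a real document store and an embedding-based retriever, and `read_file` stands in for a tool served over MCP.

```python
DOCS = {
    "auth-overview": "Login tokens are issued by auth.py and verified in middleware.py.",
    "billing-overview": "Invoices are generated nightly by billing/jobs.py.",
}
FILES = {"auth.py": "def issue_token(user): ..."}

READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read one file by exact path.",
    "input_schema": {"type": "object",
                     "properties": {"path": {"type": "string"}},
                     "required": ["path"]},
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(DOCS.values(),
                    key=lambda text: len(words & set(text.lower().split())),
                    reverse=True)
    return ranked[:k]

def read_file(path: str) -> str:
    """Tool: exact lookup by path, requested by the model at runtime."""
    return FILES[path]

def build_request(question: str) -> dict:
    # RAG step: the application fetches background before the model's turn
    background = "\n".join(retrieve(question))
    return {
        "system": "Background retrieved before the turn:\n" + background,
        "tools": [READ_FILE_TOOL],  # tool step: available on demand during the turn
        "messages": [{"role": "user", "content": question}],
    }

request = build_request("Why do login tokens fail verification?")
print(request["system"])
print(read_file("auth.py"))
```

The retrieved overview tells the model that auth.py is the place to look, so its first tool call can be a precise read rather than a search.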

The MCP Ecosystem in Practice

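A key benefit of MCP being an open standard is that pre-built servers accumulate into a reusable ecosystem. As of 2026, pre-built servers are commonly available for widely used developer services, such as the GitHub and PostgreSQL integrations sketched in the architecture diagram above.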

The practical implication: for common tools, you can pull an existing MCP server off the shelf rather than implementing it yourself. For internal tools, you implement an MCP server once and every agent in your stack gains access to it automatically.

Security Considerations for Tool Use and MCP

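Giving a model the ability to take actions in the world is powerful, and dangerous if done carelessly. The principles already touched on in this article apply: validate arguments before executing, run tools in a sandbox with only the permissions they need, log every call, rate-limit expensive tools, and require confirmation before side-effecting operations.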
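One way to apply such principles is a policy check that runs before any tool executes. The sketch below assumes a controller that calls `check_call` first; the allowlist, the workspace path, and every name in it are hypothetical choices for illustration, not a complete security model.

```python
from pathlib import Path

ALLOWED_TOOLS = {"read_file", "write_file"}
SIDE_EFFECTING = {"write_file"}
WORKSPACE = Path("/srv/agent-workspace")  # hypothetical sandbox root

class ToolDenied(Exception):
    """Raised when a requested tool call violates policy."""

def check_call(name: str, args: dict, approved: bool = False) -> None:
    """Raise ToolDenied unless the call passes every policy check."""
    if name not in ALLOWED_TOOLS:
        raise ToolDenied(f"tool not on allowlist: {name}")
    if "path" in args:
        # Resolve '..' segments and absolute paths before comparing to the root
        target = (WORKSPACE / args["path"]).resolve()
        if not target.is_relative_to(WORKSPACE.resolve()):
            raise ToolDenied(f"path escapes workspace: {args['path']}")
    if name in SIDE_EFFECTING and not approved:
        raise ToolDenied(f"{name} needs explicit approval")

def is_denied(name: str, args: dict, approved: bool = False) -> bool:
    """Convenience wrapper for demos and tests."""
    try:
        check_call(name, args, approved)
        return False
    except ToolDenied:
        return True

print(is_denied("run_shell", {}))                            # not on the allowlist
print(is_denied("read_file", {"path": "../../etc/passwd"}))  # escapes the workspace
print(is_denied("read_file", {"path": "src/auth.py"}))       # allowed
print(is_denied("write_file", {"path": "notes.txt"}))        # needs approval
```

Keeping the policy in the controller, not in the prompt, means it holds even when the model is confused or has been fed adversarial tool output.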

takeaway

Function calling is what turns a text model into an actor. The pattern is simple — define tool schemas, run the loop, fan out parallel calls — but the details matter: descriptions determine correct tool choice, schema constraints ensure parseable output, and execution must be validated and sandboxed. MCP lifts this pattern to a standard protocol, enabling a reusable ecosystem of tools that any compliant agent can consume without custom integration. Together, they form the core architecture of every production AI agent.

🎯 interview hot-takes

Explain the function calling loop at the API level. Model emits a tool_use block (name + arguments JSON) → your code executes the actual function → you append a tool_result message → model continues. The model never executes code directly; it only emits intents that your controller acts on.

Why is the tool description more important than the schema? The schema constrains the arguments once the model has decided to call a tool. The description is what the model reads to decide whether to call the tool at all. A vague description leads to wrong tool selection, which no schema can fix — the model will call the wrong tool with perfectly valid arguments.

What problem does MCP solve that function calling doesn't? Function calling is per-application: every agent reimplements the same tools independently. MCP standardizes the interface between tool implementations (servers) and agent hosts so a tool is written once and reusable across any MCP-compatible agent — the same composability benefit that REST APIs gave to web services.
