OpenAI Function Calling: The Complete Guide for Agent Builders (2026)

OpenAI function calling lets models invoke external tools via JSON schema. This guide covers the Responses API, strict mode, parallel tool calls, structured outputs, MCP integration, and how to wire code execution into your agent.

April 5, 2026 · 1 min read

How Function Calling Works

Function calling is a conversation protocol between your application and an OpenAI model. You tell the model what functions exist and what arguments they accept. The model decides whether to call a function based on the user's message. If it does, your application executes the function and feeds the result back. The model then uses that result to generate its final response.

The model never executes code itself. It generates a JSON object containing the function name and arguments. Your application parses that JSON, runs the actual function, and returns the output. This separation is what makes function calling safe and controllable: you decide which functions to expose, validate every argument, and control execution.
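That separation can be sketched in a few lines. Here the `function_call` item is hand-written to stand in for the model's output, and `list_files` is a hypothetical implementation on your side:

```typescript
// The model's side of the protocol: a JSON description of a call.
// In a real run this item comes back from the API; here it is hand-written.
const modelOutput = {
  type: "function_call",
  call_id: "call_abc123",
  name: "list_files",
  arguments: '{"path": "src", "recursive": false}',
};

// Your side of the protocol: look up the real implementation and run it.
const implementations: Record<string, (args: any) => string> = {
  list_files: (args) => `listing ${args.path}`, // hypothetical stand-in
};

function execute(item: typeof modelOutput): string {
  const args = JSON.parse(item.arguments); // model output is JSON text, not code
  return implementations[item.name](args);
}

const result = execute(modelOutput);
// `result` is what you send back as a function_call_output item
```

The model only ever produces the JSON in `modelOutput`; everything after `JSON.parse` is code you wrote and control.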

- 5 steps in the function calling loop
- 100% schema compliance in strict mode
- N+1 API calls for N tool invocations

Terminology

OpenAI uses "function calling" and "tool calling" interchangeably. The API parameter is called tools, and each tool has a type: "function". The older functions parameter is deprecated. This guide uses "function calling" because that's what developers search for, but the concepts are identical.

The Five-Step Loop

Every function calling interaction follows the same pattern, whether you're building a simple weather bot or a full coding agent.

1. Define tools with JSON Schema

You describe each function the model can call: its name, a description of what it does, and a JSON schema for its parameters. The description matters. The model uses it to decide when a function is relevant. Vague descriptions lead to wrong tool selection.

2. Send the request with tool definitions

Your API call includes the user's message and the array of available tools. The model reads the message, evaluates which tools (if any) are relevant, and decides whether to call one.

3. Model returns tool_calls

Instead of a text response, the model returns one or more tool calls, each with a unique ID, the function name, and JSON-encoded arguments. In the Responses API these arrive as function_call items in the output array; in the Chat Completions API, the finish_reason is 'tool_calls' instead of 'stop'.

4. Execute and return results

Your application parses the arguments, calls the actual function, and sends the result back as a tool message with the matching tool_call_id. The model needs this ID to associate results with the correct call.

5. Model generates final response

With the function results in context, the model produces its final answer. This might be a direct response to the user, or it might be another round of tool calls if the task requires multiple steps.

Defining a function tool

const response = await openai.responses.create({
  model: "gpt-5",
  input: [{ role: "user", content: "What files are in the src directory?" }],
  tools: [{
    type: "function",
    name: "list_files",
    description: "List files in a directory",
    parameters: {
      type: "object",
      properties: {
        path: {
          type: "string",
          description: "The directory path to list"
        },
        recursive: {
          type: "boolean",
          description: "Whether to list files recursively"
        }
      },
      required: ["path"],
      additionalProperties: false
    },
    strict: true
  }]
});

Handling the tool call and returning results

// Model returns a function_call output item
const toolCall = response.output.find(
  item => item.type === "function_call"
);

if (toolCall) {
  const args = JSON.parse(toolCall.arguments);
  const result = listFiles(args.path, args.recursive); // your own implementation

  // Send the result back
  const finalResponse = await openai.responses.create({
    model: "gpt-5",
    input: [
      { role: "user", content: "What files are in the src directory?" },
      toolCall,  // the function_call from the model
      {
        type: "function_call_output",
        call_id: toolCall.call_id,
        output: JSON.stringify(result)
      }
    ],
    tools: [/* same tools array */]
  });
}

Strict Mode and Schema Enforcement

Without strict mode, OpenAI models generate function arguments on a best-effort basis. They usually get the schema right, but not always. A model might omit a required field, use a string where you expected a number, or add properties you didn't define. In production, "usually right" isn't good enough.

Setting strict: true in your function definition enables constrained decoding. The model's token generation is restricted to outputs that are valid against your schema. This guarantees 100% schema compliance, not as a post-hoc validation, but at the generation level.

Requirements for Strict Mode

- additionalProperties: must be false on every object in the schema
- required: every property must be listed in the required array
- Optional fields: use a type union with null (e.g., ["string", "null"]) instead of omitting the property from required
- First-request latency: slightly higher on the first call while OpenAI compiles the schema; cached afterward
- Unsupported features: some JSON Schema features, such as pattern, minItems, and conditional schemas, are not supported

Always enable strict mode

OpenAI recommends strict mode for all production deployments. The reliability gain far outweighs the minor first-request latency cost. Without it, you need defensive parsing code for every function call. With it, you can trust the schema and focus on business logic.
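The main adjustment strict mode forces is how you express optional parameters. A sketch, reusing the list_files schema from above with `recursive` made optional the strict-mode way:

```typescript
// Under strict mode every property must appear in `required`, so
// optionality is expressed as a nullable type instead of omission.
const strictParameters = {
  type: "object",
  properties: {
    path: { type: "string", description: "The directory path to list" },
    recursive: {
      type: ["boolean", "null"], // optional: nullable, but still listed in required
      description: "Whether to list files recursively; null means use the default",
    },
  },
  required: ["path", "recursive"], // strict mode: list every property
  additionalProperties: false,     // strict mode: mandatory on every object
};
```

Your executor then treats a null argument the same way it would treat an omitted one.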

Parallel Tool Calls

A model can return multiple tool calls in a single response. If a user asks "What's the weather in Tokyo and New York?", the model returns two function_call items rather than calling one, waiting for the result, and calling the other. Your application executes both in parallel and sends both results back in a single request.

This matters for agents. A coding agent that needs to read three files doesn't need three round trips. It issues three read_file calls simultaneously. The reduction in latency scales linearly with the number of parallel calls.

When to enable parallel calls

Independent operations: reading multiple files, querying multiple APIs, checking multiple conditions. If the calls don't depend on each other, parallel execution cuts latency proportionally.

When to disable parallel calls

Dependent operations: writing a file then reading it, creating a resource then querying it. Set parallel_tool_calls: false when ordering matters. Reasoning models (o3, o4-mini) naturally produce sequential calls for dependent steps.

Handling parallel tool calls

// Model returns multiple function_call items
const toolCalls = response.output.filter(
  item => item.type === "function_call"
);

// Execute all calls in parallel
const results = await Promise.all(
  toolCalls.map(async (call) => {
    const args = JSON.parse(call.arguments);
    const result = await executeFunction(call.name, args);
    return {
      type: "function_call_output" as const,
      call_id: call.call_id,
      output: JSON.stringify(result)
    };
  })
);

// Send all results back in one request
const finalResponse = await openai.responses.create({
  model: "gpt-5",
  input: [
    ...previousMessages,
    ...toolCalls,
    ...results
  ],
  tools: [/* same tools */]
});

Structured Outputs vs Function Calling

These are two different features that solve two different problems. The confusion exists because both involve JSON schemas, and structured outputs can be used inside function calling (via strict mode). Here's the distinction.

- Purpose. Function calling: the model invokes tools with arguments. Structured outputs: the model formats its response as JSON.
- Who acts. Function calling: your application executes the function. Structured outputs: nobody; the JSON is the final output.
- Schema location. Function calling: tools[].parameters. Structured outputs: response_format.json_schema.
- Use case. Function calling: API calls, file ops, code execution, data retrieval. Structured outputs: extracting entities, classification, form filling.
- Strict mode. Function calling: set strict: true per function. Structured outputs: enforced by default in the json_schema format.
- Multiple outputs. Function calling: the model can call multiple tools per turn. Structured outputs: one JSON response per turn.

If you need the model to do something (call an API, run a command, edit a file), use function calling. If you need the model to format something (extract structured data from text, return a classification), use structured outputs. Many agent architectures use both: function calling for tool invocation, structured outputs for parsing tool results into a consistent format.
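For contrast with the function tool definitions above, here is what a structured outputs request looks like. This is only the request payload (no network call), sketched in the Chat Completions response_format shape; the field names inside the schema are illustrative:

```typescript
// Structured outputs: the schema describes the model's final answer,
// not a tool for your application to execute.
const extractionRequest = {
  model: "gpt-5",
  messages: [{ role: "user", content: "Alice filed bug #42 on March 3." }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "bug_report",
      strict: true, // schema enforced at generation time, like strict mode for tools
      schema: {
        type: "object",
        properties: {
          reporter: { type: "string" },
          bug_id: { type: "integer" },
        },
        required: ["reporter", "bug_id"],
        additionalProperties: false,
      },
    },
  },
};
```

Note there is no tools array: nothing gets executed, and the JSON the model emits is the end of the exchange.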

Function Calling vs MCP

Function calling and MCP (Model Context Protocol) operate at different layers. Function calling is the mechanism a model uses to invoke a tool. MCP is the protocol for discovering, connecting to, and managing tools across providers.

Function calling: the invocation layer

Vendor-specific. OpenAI's format (tools array with JSON schema) differs from Anthropic's. Each provider has its own wire format for defining tools and receiving tool calls. You define functions inline with each API request.

MCP: the discovery and transport layer

Vendor-neutral. MCP servers expose tools through a standardized interface. Any MCP client can connect to any MCP server, regardless of which model it's using. Define a tool once, use it across Claude, GPT, Gemini, or any MCP-compatible agent.

MCP uses function calling under the hood. When an MCP client connects to a server, it discovers available tools and translates them into the model's native function calling format. The model calls tools using its own API (OpenAI's tools parameter, Anthropic's tool_use), and the MCP client routes the invocation to the correct server.

- Quick prototype with one model: native function calling. Less infrastructure, faster to ship.
- Tools shared across multiple models or agents: MCP. Define once, connect from any compatible client.
- Third-party tool marketplace: MCP. A standardized interface means any agent can use any tool.
- Tight integration with OpenAI-specific features: native function calling. Access to strict mode, parallel calls, and the Responses API.
- Enterprise with many models and vendors: MCP. Avoid vendor lock-in; swap models without rewriting tool integrations.

They're complementary, not competing

You don't choose between function calling and MCP. MCP is a transport layer; function calling is the invocation mechanism. An agent that supports MCP still uses function calling to communicate with its underlying model. The OpenAI Responses API now supports remote MCP servers directly, connecting the two layers natively.

Building a Coding Agent with Function Calling

A coding agent is a loop: the model reads context, decides which tool to call, your application executes the tool, and the result goes back to the model. The quality of the agent depends on three things: the model's ability to select the right tool, the quality of your tool implementations, and how efficiently you manage context.

The typical coding agent exposes 5-15 tools. More tools means more decisions for the model, and accuracy drops as the tool count rises. OpenAI recommends fewer than 20 tools available at any given time.

Core tools for a coding agent

File operations

read_file, write_file, list_directory. The foundation. Every coding agent needs to read code, write edits, and navigate the file system. Keep path arguments relative to the project root.

Code search

grep, semantic_search, find_references. The model needs to locate relevant code before editing it. Agents that search well edit accurately. Agents that search poorly waste tokens re-reading files.

Shell execution

run_command, run_tests. Let the model run build commands, install dependencies, and execute tests. Sandbox appropriately. The shell tool is what turns a code editor into an engineer.

Code application

apply_diff, apply_patch. Instead of rewriting entire files, let the model describe targeted edits. This reduces token usage and makes changes more precise. Morph's Fast Apply API is purpose-built for this.

Minimal coding agent loop

const tools = [
  { type: "function", name: "read_file", strict: true,
    description: "Read the contents of a file",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path relative to project root" }
      },
      required: ["path"], additionalProperties: false
    }},
  { type: "function", name: "apply_edit", strict: true,
    description: "Apply a targeted edit to a file using Morph Fast Apply",
    parameters: {
      type: "object",
      properties: {
        path: { type: "string", description: "File path to edit" },
        original: { type: "string", description: "The original code to be replaced" },
        updated: { type: "string", description: "The new code to insert" }
      },
      required: ["path", "original", "updated"], additionalProperties: false
    }},
  { type: "function", name: "run_command", strict: true,
    description: "Execute a shell command and return stdout/stderr",
    parameters: {
      type: "object",
      properties: {
        command: { type: "string", description: "The shell command to run" }
      },
      required: ["command"], additionalProperties: false
    }}
];

async function agentLoop(task: string) {
  let messages = [{ role: "user", content: task }];

  while (true) {
    const response = await openai.responses.create({
      model: "gpt-5", input: messages, tools
    });

    const toolCalls = response.output.filter(
      item => item.type === "function_call"
    );

    if (toolCalls.length === 0) return response.output_text; // Model is done

    // Execute tools in parallel
    const results = await Promise.all(
      toolCalls.map(call => executeAndFormat(call))
    );

    messages = [...messages, ...toolCalls, ...results];
  }
}

Wiring Morph Tools into Function Calling

Morph's APIs are designed as function-callable tools. They solve three problems coding agents face: applying edits accurately to existing files, searching large codebases without exhausting context, and executing code in sandboxed environments.

- 10,500 tokens/sec (Fast Apply, morph-v3-fast)
- 0.73 F1 score (WarpGrep, SWE-Bench)
- <50ms sandbox cold start

Fast Apply

When a coding model generates an edit, it typically produces either a full-file rewrite or a diff. Full rewrites waste tokens. Diffs are fragile: a single wrong line number breaks the patch. Morph Fast Apply takes the original file and a description of the change, then produces the edited file at 10,500 tok/s. It handles fuzzy matching, so the model doesn't need perfect line-level precision.

Fast Apply as a function tool

{
  type: "function",
  name: "fast_apply",
  strict: true,
  description: "Apply a code edit to a file. Takes the original file content and a description or snippet of the desired change. Returns the complete updated file.",
  parameters: {
    type: "object",
    properties: {
      original_code: {
        type: "string",
        description: "The current contents of the file"
      },
      edit_snippet: {
        type: "string",
        description: "The edit to apply. Can be a diff, a code snippet with changes, or a natural language description of the change."
      },
      filename: {
        type: "string",
        description: "The filename, used for language detection"
      }
    },
    required: ["original_code", "edit_snippet", "filename"],
    additionalProperties: false
  }
}

WarpGrep

Cognition measured that their Devin agent spends 60% of its time on code search. Inefficient search means wasted tokens and stale context. WarpGrep is an agentic code search tool that runs 8 parallel searches per turn over 4 turns, finding relevant code in sub-6 seconds. It works as an MCP server or a direct function tool.

Sandboxes

Code execution needs isolation. Morph Sandboxes provide ephemeral Linux environments with sub-50ms cold starts. Your agent's run_command tool routes to a sandbox instead of your host machine. Each sandbox gets its own filesystem, network namespace, and resource limits.
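A sketch of that routing, using an in-memory stand-in for the sandbox so the shape is visible without real infrastructure. The `Sandbox` interface and `exec` method here are illustrative, not the actual Morph SDK surface:

```typescript
// Hypothetical sandbox interface; the real client would talk to a remote VM.
interface Sandbox {
  exec(command: string): { stdout: string; stderr: string; exitCode: number };
}

// In-memory stand-in so the routing logic runs locally.
const sandbox: Sandbox = {
  exec: (command) => ({ stdout: `ran: ${command}`, stderr: "", exitCode: 0 }),
};

// The agent's run_command tool targets the sandbox, never the host shell.
function runCommand(command: string): string {
  const { stdout, stderr, exitCode } = sandbox.exec(command);
  return JSON.stringify({ stdout, stderr, exitCode }); // structured result for the model
}
```

The important property is that the function-calling layer never changes: run_command keeps the same schema whether it executes locally or in a sandbox.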

Best Practices

Write precise descriptions

The model selects tools based on their descriptions, not their names. 'Search for code using regex patterns across the repository' is better than 'search'. Include what the tool does, when to use it, and what it returns.

Enable strict mode everywhere

There is no good reason to leave strict mode off in production. The schema compliance guarantee eliminates an entire class of runtime errors. The first-request latency penalty is negligible after the schema is cached.

Keep tool count under 20

More tools means more potential for wrong selection. If you need 30+ tools, organize them into tool groups and load relevant groups based on the task phase. A planning phase might get search tools. An editing phase might get file tools.
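Phase-based loading can be as simple as a lookup. The group names and tool sets below are hypothetical examples of how you might slice a coding agent's toolbox:

```typescript
type Tool = { type: "function"; name: string };

// Hypothetical tool groups keyed by task phase.
const toolGroups: Record<string, Tool[]> = {
  planning: [
    { type: "function", name: "grep" },
    { type: "function", name: "semantic_search" },
  ],
  editing: [
    { type: "function", name: "read_file" },
    { type: "function", name: "apply_edit" },
  ],
  testing: [{ type: "function", name: "run_tests" }],
};

// Only the current phase's tools go into the request's tools array,
// keeping the model's choice set small.
function toolsForPhase(phase: keyof typeof toolGroups): Tool[] {
  return toolGroups[phase];
}
```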

Don't make the model fill known values

If your application already knows the project root, the current branch, or the user's name, don't define those as function parameters. Inject them at execution time. Every parameter the model must generate is a chance for it to hallucinate.
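A minimal sketch of injecting a known value at execution time, with a hypothetical project-root context object; the model's schema only declares the relative path:

```typescript
// App-known context the model should never have to generate.
const context = { projectRoot: "/home/user/project", branch: "main" };

// The model supplies only what it actually knows: the relative path.
// The root is injected by your executor, not a function parameter.
function resolveReadPath(modelArgs: { path: string }): string {
  return `${context.projectRoot}/${modelArgs.path}`;
}
```

Fewer model-generated parameters means fewer opportunities for hallucinated values, and smaller schemas for the model to reason about.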

Validate before executing

Even with strict mode, validate arguments against your application's security constraints. A path parameter that's valid JSON might still be a path traversal attack. Apply the principle of least privilege: each function should have only the access it needs.
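The path-traversal case can be guarded with a normalization check. A sketch assuming a POSIX filesystem and a hypothetical project root:

```typescript
import path from "node:path";

const PROJECT_ROOT = "/home/user/project"; // hypothetical root

// "../../etc/passwd" is perfectly valid JSON per the schema,
// so resolve and verify the path stays inside the project root.
function resolveSafe(userPath: string): string {
  const resolved = path.resolve(PROJECT_ROOT, userPath);
  if (resolved !== PROJECT_ROOT && !resolved.startsWith(PROJECT_ROOT + path.sep)) {
    throw new Error(`path escapes project root: ${userPath}`);
  }
  return resolved;
}
```

Strict mode guarantees the argument is a string; checks like this guarantee the string is safe to act on.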

Return structured results

Function outputs should be consistent and parseable. Return JSON with predictable fields. Include error information in a structured format, not as prose. The model processes structured results more reliably than free-text output.
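One way to enforce this is a single result envelope shared by every tool, so the model always sees the same fields on success and on failure. The `ToolResult` shape below is an illustrative convention, not an OpenAI requirement:

```typescript
// One envelope for every tool, success or failure.
type ToolResult =
  | { ok: true; data: unknown }
  | { ok: false; error: { code: string; message: string } };

// Wrap any tool implementation so errors come back structured, not as prose.
function formatResult(fn: () => unknown): string {
  let result: ToolResult;
  try {
    result = { ok: true, data: fn() };
  } catch (e) {
    result = { ok: false, error: { code: "TOOL_ERROR", message: String(e) } };
  }
  return JSON.stringify(result); // goes into function_call_output.output
}
```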

The Responses API is the current standard

The Assistants API is deprecated and shuts down August 26, 2026. The Chat Completions API still works but doesn't support newer features like built-in web search, file search, and MCP integration. If you're starting a new project, use the Responses API. It unifies all tool-calling capabilities in a single interface.

Frequently Asked Questions

What is OpenAI function calling?

Function calling lets OpenAI models invoke external functions you define. You provide a JSON schema describing available functions, the model decides when to call them and with what arguments, and your application executes the function and returns the result. It's the mechanism that turns a language model into an agent capable of reading files, calling APIs, and executing code.

What is the difference between function calling and structured outputs?

Function calling makes the model invoke a tool. Your code executes the tool and returns results. Structured outputs make the model format its text response as JSON. No execution happens. Use function calling when the model needs to take actions. Use structured outputs when you need predictable response formatting for downstream parsing.

What is strict mode in OpenAI function calling?

Setting strict: true guarantees the model's function arguments exactly match your JSON schema. It uses constrained decoding at the token level, not post-hoc validation. This requires additionalProperties: false on every object and all fields in the required array. OpenAI recommends it for all production use.

How does function calling relate to MCP?

Function calling is the invocation mechanism. MCP is the discovery and transport protocol. MCP servers expose tools through a standardized interface. The model's function calling format is used under the hood to invoke those tools. You don't choose between them. MCP uses function calling, and adds a vendor-neutral layer for tool portability across models and providers.

Which OpenAI models support function calling?

GPT-5, GPT-5.1, GPT-4o, o3, o4-mini, and GPT-4 Turbo all support function calling. The o3 and o4-mini reasoning models call tools within their chain of thought, which means tool results inform the reasoning process directly. GPT-5.1 is the strongest model for coding agent workloads as of April 2026.

Can OpenAI models call multiple functions at once?

Yes. The model can return multiple function_call items in a single response. Your application executes them in parallel and sends all results back together. Disable this with parallel_tool_calls: false when calls have dependencies. Reasoning models sometimes sequence calls naturally when step order matters.

Related Guides

Add code execution tools to your agent

Morph's Fast Apply, WarpGrep, and Sandbox APIs are built for function calling. Define them as tools, wire them into your agent loop, and let your model read, search, edit, and execute code.