phidea
Reference · page 2 / 6

# 2. Architecture


## The flow

```
User types in ChatGPT
  │
  ▼
ChatGPT model decides to call a tool ── (MCP tools/list, tools/call over HTTP)
  │
  ▼
Your MCP server (HTTPS, Streamable HTTP or SSE transport)
  │ returns:
  │  - structuredContent      (JSON the model sees)
  │  - content                (optional markdown narration)
  │  - _meta.ui.resourceUri   (pointer to a UI template you registered)
  │  - _meta                  (extra payload that ONLY the widget sees, never the model)
  ▼
ChatGPT loads the HTML/JS bundle in a sandboxed iframe
  │ rendered under <yourdomain>.web-sandbox.oaiusercontent.com
  ▼
Widget talks back via JSON-RPC over postMessage ("MCP Apps bridge")
   - reads toolInput / toolOutput / widgetState
   - can call more tools (tools/call), post follow-up messages (ui/message),
     update model context (ui/update-model-context), request fullscreen, etc.
```

## Three things to internalise

1. MCP is the wire format; Apps SDK is the extension

The protocol is open (modelcontextprotocol.io). Apps SDK layers on:

  • _meta.ui.* fields (widget resource URI, CSP, domain)
  • A text/html;profile=mcp-app MIME type
  • The window.openai bridge (JSON-RPC over postMessage between the iframe and ChatGPT)

Everything else — tool definitions, resource registration, transport — is vanilla MCP.
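To make the layering concrete, here is a sketch of the two pieces as plain data: a widget template registered as an ordinary MCP resource, and the `_meta.ui.resourceUri` field that points a tool result at it. The URI scheme, template name, and HTML body are made up for illustration; only the MIME type and the `_meta.ui` field names come from the text above.

```typescript
// A UI template registered via vanilla MCP resource registration.
// The Apps SDK-specific part is only the MIME type's profile.
const widgetResource = {
  uri: "ui://widgets/ticket-card", // hypothetical URI
  mimeType: "text/html;profile=mcp-app",
  text: '<div id="root"></div><script src="widget.js"></script>',
};

// The _meta block a tool result would carry so ChatGPT knows
// which registered template to load into the sandboxed iframe.
const toolResultMeta = {
  ui: { resourceUri: widgetResource.uri },
};
```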

2. Two payloads from every tool call

Each tool response carries two separate payloads that go to different readers:

| Field | Who sees it | What to put there |
| --- | --- | --- |
| `structuredContent` | The model (next turn, counts against context) | Task-relevant JSON only: IDs, titles, statuses |
| `content` | The model (as narration) | Short markdown the model can echo or reason over |
| `_meta` | The widget (via `window.openai`) | Large UI-only data: image URLs, full transcripts, rich render data |

Leaking diagnostics or full payloads into structuredContent is one of the most common causes of review rejection and unnecessary token burn.
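A minimal sketch of the split, assuming a hypothetical support-ticket tool. The field names `structuredContent`, `content`, and `_meta` are from the protocol; the ticket shape and helper are illustrative only.

```typescript
// Build one tool response with the two payloads kept apart:
// small model-visible JSON vs. bulky widget-only render data.
function buildTicketResponse(ticket: {
  id: string;
  title: string;
  status: string;
  transcript: string[]; // long, UI-only
  avatarUrl: string;    // UI-only
}) {
  return {
    // Model-visible: only what the next turn needs.
    structuredContent: { id: ticket.id, title: ticket.title, status: ticket.status },
    // Model-visible narration the model can echo.
    content: [{ type: "text" as const, text: `Ticket ${ticket.id} is ${ticket.status}.` }],
    // Widget-only: never enters the model's context.
    _meta: { transcript: ticket.transcript, avatarUrl: ticket.avatarUrl },
  };
}
```

Note that the transcript and avatar URL never appear in `structuredContent`, which is exactly what keeps the model's context small.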

3. Transport

  • Streamable HTTP is the recommended transport today.
  • SSE still works.
  • Whichever you pick must be reachable on HTTPS by ChatGPT.

The MCP Inspector's `--transport http` / `--transport sse` flags must match your server's choice.
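Whichever transport you choose, the envelope on the wire is plain JSON-RPC 2.0 POSTed over HTTPS. A sketch of the request ChatGPT's side sends, assuming a hypothetical `/mcp` endpoint path; only the JSON-RPC shape and the method names (`tools/list`, `tools/call`) come from the protocol.

```typescript
// Build a JSON-RPC 2.0 request body for an MCP method call.
function jsonRpcRequest(method: string, params: object, id: number): string {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params });
}

const listTools = jsonRpcRequest("tools/list", {}, 1);
// e.g. await fetch("https://yourdomain.example/mcp", {
//   method: "POST",
//   headers: { "content-type": "application/json" },
//   body: listTools,
// });
```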

## Runtime surfaces

  • Tool handler (server-side, Node/Python). Privileged work. Holds auth tokens. Validates arguments. Returns both payloads.
  • Widget (iframe, browser). Renders UI, reads window.openai.toolOutput and window.openai.toolInput, can call more tools via window.openai.callTool(...).
  • ChatGPT host. Routes tool calls, passes bearer tokens, enforces CSP, mediates between widget ↔ server.
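A widget-side sketch of the second surface. In the real iframe you would read `window.openai.toolOutput` directly; here the bridge is passed in as a parameter so the render logic stays pure and testable. The `Bridge` interface and the ticket fields are assumptions, not the SDK's actual types.

```typescript
// Minimal view of the host-provided bridge the widget sees.
interface Bridge {
  toolOutput: { id: string; status: string } | null;
  callTool(name: string, args: object): Promise<unknown>;
}

// Render from tool output only -- no secrets, no privileged work:
// this code runs in the user's browser.
function renderStatusLine(bridge: Bridge): string {
  if (!bridge.toolOutput) return "Loading…";
  return `Ticket ${bridge.toolOutput.id}: ${bridge.toolOutput.status}`;
}
```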

Never blur these:

  • Widget ≠ privileged. It runs in the user's browser. Don't put secrets there.
  • Server ≠ UI code. Don't try to stream DOM to ChatGPT; return resource URIs.
  • Model-visible data ≠ UI data. Keep them separate or the model's context gets polluted.