phidea
Reference · page 2 / 6

2. Architecture


The flow

User types in ChatGPT
        │
        ▼
ChatGPT model decides to call a tool ── (MCP tools/list, tools/call over HTTP)
        │
        ▼
Your MCP server (HTTPS, Streamable HTTP or SSE transport)
        │  returns:
        │   - structuredContent (JSON the model sees)
        │   - content (optional markdown narration)
        │   - _meta.ui.resourceUri (pointer to a UI template you registered)
        │   - _meta (extra payload that ONLY the widget sees, never the model)
        ▼
ChatGPT loads the HTML/JS bundle in a sandboxed iframe
        │  rendered under <yourdomain>.web-sandbox.oaiusercontent.com
        ▼
Widget talks back via JSON-RPC over postMessage ("MCP Apps bridge")
   - reads toolInput / toolOutput / widgetState
   - can call more tools (tools/call), post follow-up messages (ui/message),
     update model context (ui/update-model-context), request fullscreen, etc.
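
To make the first hop of the flow concrete, here is a minimal sketch of the JSON-RPC 2.0 envelope that a tools/call request uses. The tool name search_notes and its query argument are hypothetical, chosen only for illustration; the envelope fields (jsonrpc, id, method, params.name, params.arguments) follow the MCP specification.

```typescript
// JSON-RPC 2.0 envelope for an MCP tools/call request.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request for the given tool name and arguments.
function buildToolsCall(
  name: string,
  args: Record<string, unknown>,
): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "search_notes" and "query" are hypothetical names used for illustration.
const request = buildToolsCall("search_notes", { query: "architecture" });
console.log(JSON.stringify(request, null, 2));
```

The same envelope shape applies whether the call originates from the model or from the widget calling back through the bridge; only the sender differs.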

Three things to internalise

1. MCP is the wire format; Apps SDK is the extension

The protocol is open (modelcontextprotocol.io). Apps SDK layers on:

  • _meta.ui.* fields (widget resource URI, CSP, domain)
  • A text/html;profile=mcp-app MIME type
  • The window.openai bridge (JSON-RPC over postMessage between the iframe and ChatGPT)

Everything else — tool definitions, resource registration, transport — is vanilla MCP.
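
As a rough sketch of how these extensions sit on top of plain MCP objects, the snippet below pairs a widget resource with a tool definition that points at it. It is written as plain data rather than against any SDK. The ui:// URI, the names, and the HTML body are hypothetical, and attaching _meta.ui.resourceUri to the tool definition is an assumption about placement; check the Apps SDK reference for the exact shape.

```typescript
// Hypothetical widget URI; the scheme and path are illustrative assumptions.
const WIDGET_URI = "ui://widget/notes.html";

// Vanilla MCP resource, marked as an app widget by its MIME type.
const widgetResource = {
  uri: WIDGET_URI,
  name: "notes-widget", // hypothetical name
  mimeType: "text/html;profile=mcp-app",
  text: '<div id="root"></div>', // placeholder bundle
};

// Vanilla MCP tool definition plus the _meta.ui pointer to the widget.
const searchNotesTool = {
  name: "search_notes", // hypothetical tool
  description: "Search the user's notes by keyword.",
  inputSchema: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
  _meta: { ui: { resourceUri: WIDGET_URI } },
};

console.log(searchNotesTool._meta.ui.resourceUri === widgetResource.uri);
```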

2. Two payloads from every tool call

Each tool response carries two payloads for different readers, spread across three fields: structuredContent and content go to the model, while _meta goes only to the widget.

Field             | Who sees it                                   | What to put there
------------------+-----------------------------------------------+-------------------------------------------------------------------
structuredContent | The model (next turn, counts against context) | Task-relevant JSON only: IDs, titles, statuses
content           | The model (as narration)                      | Short markdown the model can echo or reason over
_meta             | The widget (via window.openai)                | Large UI-only data: image URLs, full transcripts, rich render data

Leaking diagnostics or full payloads into structuredContent is one of the most common causes of review rejection and unnecessary token burn.
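
One way to keep the split honest is to build the result through a single function that decides, field by field, which reader gets what. The sketch below assumes a hypothetical Note record and a hypothetical notesById layout inside _meta; only the three top-level field names come from the tool result shape described above.

```typescript
// Hypothetical domain record: small fields plus large UI-only fields.
interface Note {
  id: string;
  title: string;
  status: string;
  transcript: string; // large, UI-only
  imageUrl: string; // UI-only
}

interface ToolResult {
  structuredContent: {
    notes: Array<{ id: string; title: string; status: string }>;
  };
  content: Array<{ type: "text"; text: string }>;
  _meta: {
    notesById: Record<string, { transcript: string; imageUrl: string }>;
  };
}

// Split each note into a model-visible summary and widget-only details.
function toToolResult(notes: Note[]): ToolResult {
  const notesById: Record<string, { transcript: string; imageUrl: string }> = {};
  for (const n of notes) {
    notesById[n.id] = { transcript: n.transcript, imageUrl: n.imageUrl };
  }
  return {
    structuredContent: {
      notes: notes.map(({ id, title, status }) => ({ id, title, status })),
    },
    content: [{ type: "text", text: `Found ${notes.length} note(s).` }],
    _meta: { notesById },
  };
}

const result = toToolResult([
  {
    id: "n1",
    title: "Kickoff",
    status: "open",
    transcript: "Full meeting transcript ...",
    imageUrl: "https://example.com/n1.png",
  },
]);
console.log(JSON.stringify(result.structuredContent));
```

Because structuredContent is assembled from an explicit allow-list of keys, adding a new large field to Note cannot leak into the model's context by accident.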

3. Transport

  • Streamable HTTP is the recommended transport today.
  • SSE still works.
  • Whichever you pick must be reachable on HTTPS by ChatGPT.

If you test with the MCP Inspector, its --transport http / --transport sse flag must match the transport your server exposes.

Runtime surfaces

  • Tool handler (server-side, Node/Python). Privileged work. Holds auth tokens. Validates arguments. Returns both payloads.
  • Widget (iframe, browser). Renders UI, reads window.openai.toolOutput and window.openai.toolInput, can call more tools via window.openai.callTool(...).
  • ChatGPT host. Routes tool calls, passes bearer tokens, enforces CSP, mediates between widget ↔ server.
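
A minimal sketch of the widget side follows, assuming a trimmed-down view of the bridge. The OpenAiBridge interface below is a hypothetical subset that covers only toolInput, toolOutput, and callTool as named above; the real window.openai object exposes more. A stub stands in for the host so the logic can run outside ChatGPT.

```typescript
// Hypothetical, trimmed-down view of the bridge used by this sketch.
interface OpenAiBridge {
  toolInput: Record<string, unknown>;
  toolOutput: { notes: Array<{ id: string; title: string }> } | null;
  callTool(name: string, args: Record<string, unknown>): Promise<unknown>;
}

// Pure render step: derive what to display from the tool output.
function renderTitles(bridge: OpenAiBridge): string[] {
  return (bridge.toolOutput?.notes ?? []).map((n) => n.title);
}

// Re-run the (hypothetical) search_notes tool with the original input.
function refresh(bridge: OpenAiBridge): Promise<unknown> {
  return bridge.callTool("search_notes", bridge.toolInput);
}

// Stub host for local runs; in ChatGPT the widget would use window.openai.
const calls: Array<{ name: string; args: Record<string, unknown> }> = [];
const stubBridge: OpenAiBridge = {
  toolInput: { query: "architecture" },
  toolOutput: { notes: [{ id: "n1", title: "Kickoff" }] },
  callTool(name, args) {
    calls.push({ name, args });
    return Promise.resolve({ ok: true });
  },
};

const titles = renderTitles(stubBridge);
refresh(stubBridge).then(() => console.log(titles, calls.length));
```

Passing the bridge in as a parameter keeps the widget logic testable and makes it obvious that nothing privileged lives on this side: the widget only reads what the host hands it and asks the server to do the rest.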

Never blur these:

  • Widget ≠ privileged. It runs in the user's browser. Don't put secrets there.
  • Server ≠ UI code. Don't try to stream DOM to ChatGPT; return resource URIs.
  • Model-visible data ≠ UI data. Keep them separate or the model's context gets polluted.