The first time you stream LLM output to a browser, the transport choice is obvious, and it is SSE. The model emits tokens, you push them to the browser, and the cursor types itself across the screen. Server-Sent Events have been a web standard for well over a decade and fit this pattern exactly. Every major provider (OpenAI, Anthropic, Google) streams completions over SSE, and you can wire it up in an afternoon. Case closed.
Then you turn the chat into an agent, and the clean answer cracks. The model stops mid-sentence to call a tool. It searches a database, reads a file, waits on an API, and resumes. Suddenly the thing you are streaming is not just text. This is where many teams reach for WebSockets, conclude they need bidirectionality, and rebuild their whole transport. That is the wrong lesson.
An agent doesn't emit tokens. It emits events.
The mistake hides in the data model. Naive streaming treats the output as one growing string: read delta.content, append, render. That works right up until the model does something other than speak. A tool call is not text. A tool result is not text. A state update — "the plan now has four steps" — is not text. An error is not text.
If your stream only carries a string, you have nowhere to put any of that. The "calling search…" affordance, the half-formed JSON of a tool argument, the live plan — they have no slot, so they get dropped, or smuggled into the text and parsed back out with a regex you will hate. The bug isn't the transport. The bug is that you modeled an agent's output as prose when it is a stream of typed events: text deltas, tool-call start/delta/end, tool results, state deltas, lifecycle signals, errors.
Design that envelope first — a type on every chunk and a small JSON payload — and the transport question mostly dissolves.
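A minimal sketch of such an envelope, assuming a TypeScript client: a discriminated union of event types plus a reducer that folds each event into UI state. The event names, payload fields, and helpers (`AgentEvent`, `applyEvent`, `initialView`, `updateCall`) are hypothetical, chosen for illustration rather than taken from AG-UI or any SDK, and `state_delta` is applied as a shallow merge for simplicity where a production protocol might use JSON Patch.

```typescript
// Hypothetical event envelope: every chunk carries a `type` discriminant and
// a small JSON payload. Names are illustrative, not any protocol's schema.
type AgentEvent =
  | { type: "run_started"; runId: string }
  | { type: "text_delta"; delta: string }
  | { type: "tool_call_start"; toolCallId: string; name: string }
  | { type: "tool_call_delta"; toolCallId: string; argsDelta: string }
  | { type: "tool_call_end"; toolCallId: string }
  | { type: "tool_result"; toolCallId: string; result: unknown }
  | { type: "state_delta"; patch: Record<string, unknown> }
  | { type: "run_finished"; runId: string }
  | { type: "error"; message: string };

interface ToolCallView {
  name: string;
  argsText: string; // raw argument JSON, possibly half-formed mid-stream
  done: boolean;
  result?: unknown;
}

interface AgentView {
  status: "idle" | "running" | "finished" | "failed";
  text: string;
  toolCalls: Map<string, ToolCallView>;
  state: Record<string, unknown>;
  error?: string;
}

function initialView(): AgentView {
  return { status: "idle", text: "", toolCalls: new Map(), state: {} };
}

// Copy-on-write update of one tool call; events for unknown IDs are ignored.
function updateCall(
  view: AgentView,
  id: string,
  f: (call: ToolCallView) => ToolCallView,
): AgentView {
  const call = view.toolCalls.get(id);
  if (!call) return view;
  const toolCalls = new Map(view.toolCalls);
  toolCalls.set(id, f(call));
  return { ...view, toolCalls };
}

// Fold one event into the view. The `never` check makes the switch
// exhaustive: adding an event type without handling it fails to compile.
function applyEvent(view: AgentView, ev: AgentEvent): AgentView {
  switch (ev.type) {
    case "run_started":
      return { ...view, status: "running" };
    case "text_delta":
      return { ...view, text: view.text + ev.delta };
    case "tool_call_start": {
      const toolCalls = new Map(view.toolCalls);
      toolCalls.set(ev.toolCallId, { name: ev.name, argsText: "", done: false });
      return { ...view, toolCalls };
    }
    case "tool_call_delta": {
      const delta = ev.argsDelta;
      return updateCall(view, ev.toolCallId, (c) => ({ ...c, argsText: c.argsText + delta }));
    }
    case "tool_call_end":
      return updateCall(view, ev.toolCallId, (c) => ({ ...c, done: true }));
    case "tool_result": {
      const result = ev.result;
      return updateCall(view, ev.toolCallId, (c) => ({ ...c, result }));
    }
    case "state_delta":
      // Shallow merge for simplicity; a real protocol might use JSON Patch.
      return { ...view, state: { ...view.state, ...ev.patch } };
    case "run_finished":
      return { ...view, status: "finished" };
    case "error":
      return { ...view, status: "failed", error: ev.message };
    default: {
      const unreachable: never = ev;
      return unreachable;
    }
  }
}
```

Folding a whole stream is then `events.reduce(applyEvent, initialView())`. Because `argsText` accumulates before the tool call ends, the UI can show a "calling search" affordance while the arguments are still half-formed JSON, with no regex over the message text.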
SSE was never just for text
Here is the fact the "SSE is one-way and simple" framing buries: SSE has carried named event types since the day it shipped. Each SSE message can set an event: field alongside its data: payload, and nothing stops that payload from being JSON, so a single connection can emit TEXT_MESSAGE_CONTENT, then TOOL_CALL_START, then STATE_DELTA, then RUN_ERROR, and your client switches on the type. This is not a hack. It is exactly the model of the AG-UI protocol, the emerging standard for agent-to-frontend communication: a typed JSON event stream of roughly 16 event types, commonly delivered over SSE. The Vercel AI SDK's data-stream protocol is the same idea with a different spelling.
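The framing itself is small enough to sketch in full. Below, a hypothetical `encodeSSE` helper writes one typed event as an SSE frame (`id:`, `event:`, and `data:` lines followed by a blank line), and a minimal incremental parser follows the SSE field rules: `data:` lines accumulate, `event:` sets the type (defaulting to `message`), lines starting with a colon are comments, and a blank line dispatches. It assumes LF or CRLF line endings (not bare CR) and ignores `retry:`. In the browser, `EventSource` does this parsing for you; a hand-rolled parser matters when you read the stream through `fetch`, for example because `EventSource` can only issue GET requests and cannot set custom headers.

```typescript
// One dispatched SSE message. `event` defaults to "message" per the SSE spec.
interface SSEMessage {
  id?: string;
  event: string;
  data: string;
}

// Encode one typed event as an SSE frame (hypothetical helper). Assumes
// `type` contains no newlines; JSON.stringify escapes any in the payload,
// so a single data: line always suffices.
function encodeSSE(id: number, type: string, payload: unknown): string {
  return `id: ${id}\nevent: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Minimal incremental parser. Network chunks can split a frame anywhere, so
// complete lines are processed as they arrive and a blank line dispatches.
class SSEParser {
  private buffer = "";
  private dataLines: string[] = [];
  private eventType = "";
  private lastEventId: string | undefined = undefined;
  private onMessage: (msg: SSEMessage) => void;

  constructor(onMessage: (msg: SSEMessage) => void) {
    this.onMessage = onMessage;
  }

  push(chunk: string): void {
    this.buffer += chunk;
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      // Drop a trailing "\r" so CRLF streams parse like LF streams.
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      this.processLine(line);
      newline = this.buffer.indexOf("\n");
    }
  }

  private processLine(line: string): void {
    if (line === "") {
      // Blank line ends a message; dispatch only if it carried data.
      if (this.dataLines.length > 0) {
        this.onMessage({
          id: this.lastEventId,
          event: this.eventType || "message",
          data: this.dataLines.join("\n"),
        });
      }
      this.dataLines = [];
      this.eventType = "";
      return;
    }
    if (line.startsWith(":")) return; // comment, often used as a keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "data") this.dataLines.push(value);
    else if (field === "event") this.eventType = value;
    else if (field === "id") this.lastEventId = value; // persists across messages
    // "retry" and unknown fields are ignored in this sketch.
  }
}
```

Because the parser buffers partial lines, it yields the same messages whether the network delivers the stream in one chunk or splits it mid-field, and the client then switches on `event` exactly as it would on a `type` property.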
So SSE handles the part everyone assumes it can't. What it genuinely can't do is let the client talk back mid-stream.
The agent's tokens want a firehose. The user's interrupts want a doorbell. Don't run a firehose backwards to ring a bell.
The one real argument for WebSockets — and why it usually loses
The legitimate case for WebSockets is the back-channel: the user wants to cancel a run, approve a risky tool call, or inject a correction while the agent is still working. SSE is strictly server-to-client, so it cannot carry that upstream signal.
But look at the shape of that traffic. The downstream stream is high-volume and continuous: hundreds to thousands of events per response. The upstream signal is rare and tiny: a cancel, an approval, maybe a typed nudge. Folding both onto one stateful, full-duplex WebSocket means every token now rides a connection that demands sticky sessions, a WebSocket-aware load balancer or broker, and a reconnection scheme you write yourself. You pay a statefulness tax on the firehose to support the doorbell.
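The cheaper factoring keeps them apart. Stream the output over SSE: it is plain HTTP with no protocol upgrade, it works behind any HTTP load balancer once you disable proxy buffering, and the browser's EventSource reconnects automatically, sending a Last-Event-ID header so a server that keeps a short event log can resume where the client left off. Send the upstream signals as separate, stateless HTTP POSTs keyed to the run ID: POST /runs/{id}/cancel, POST /runs/{id}/input. The server correlates each POST with the in-flight run; if the POST lands on a different replica than the one holding the stream, that correlation goes through a shared store or pub/sub channel rather than a sticky connection. This hybrid, an SSE data plane plus an HTTP control plane, is not a compromise between two transports. It is the correct decomposition of two genuinely different workloads, and it keeps your expensive path stateless and horizontally scalable.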
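The split between the two planes can be sketched as a registry where both meet by run ID. Everything here is hypothetical (`RunRegistry`, `start`, `emit`, `subscribe`, `handleControl`, `drainInputs`), the state lives in an in-process `Map` so it only covers a single server, and the wiring into a real HTTP framework is omitted. The per-run event log is what makes Last-Event-ID resumption possible, and the `AbortSignal` returned by `start` is what the agent loop would pass to its model and tool calls so that a cancel POST stops work in flight.

```typescript
// Hypothetical in-process registry where the SSE data plane and the HTTP
// control plane meet by run ID. Single-process only: with several replicas,
// this Map would need to become a shared store plus a pub/sub channel.
interface LoggedEvent {
  id: number; // monotonically increasing; sent as the SSE `id:` field
  type: string;
  data: unknown;
}

interface Run {
  log: LoggedEvent[]; // retained so a reconnecting client can replay
  abort: AbortController; // fired by POST /runs/{id}/cancel
  inputs: string[]; // corrections from POST /runs/{id}/input
  subscribers: Set<(e: LoggedEvent) => void>;
}

class RunRegistry {
  private runs = new Map<string, Run>();

  // Register a run and hand its abort signal to the agent loop.
  start(runId: string): AbortSignal {
    const run: Run = { log: [], abort: new AbortController(), inputs: [], subscribers: new Set() };
    this.runs.set(runId, run);
    return run.abort.signal;
  }

  // Data plane: append to the log and fan out to every open SSE connection.
  emit(runId: string, type: string, data: unknown): void {
    const run = this.runs.get(runId);
    if (!run) return;
    const ev: LoggedEvent = { id: run.log.length + 1, type, data };
    run.log.push(ev);
    run.subscribers.forEach((send) => send(ev));
  }

  // Called when an SSE GET arrives. lastEventId is the Last-Event-ID header
  // (absent on first connect): replay what was missed, then go live.
  subscribe(runId: string, lastEventId: string | undefined, send: (e: LoggedEvent) => void): () => void {
    const run = this.runs.get(runId);
    if (!run) throw new Error(`unknown run ${runId}`);
    const parsed = Number(lastEventId);
    const after = Number.isFinite(parsed) ? parsed : 0; // missing/invalid: replay all
    run.log.filter((e) => e.id > after).forEach(send);
    const subs = run.subscribers;
    subs.add(send);
    return () => {
      subs.delete(send);
    };
  }

  // Control plane: POST /runs/{id}/cancel and POST /runs/{id}/input.
  // Returns the HTTP status the route would send.
  handleControl(method: string, path: string, body: string): number {
    const match = /^\/runs\/([^/]+)\/(cancel|input)$/.exec(path);
    if (!match) return 404;
    if (method !== "POST") return 405;
    const run = this.runs.get(match[1]);
    if (!run) return 404;
    if (match[2] === "cancel") run.abort.abort();
    else run.inputs.push(body);
    return 202;
  }

  // The agent loop drains pending corrections between steps.
  drainInputs(runId: string): string[] {
    return this.runs.get(runId)?.inputs.splice(0) ?? [];
  }
}
```

Replaying the log and registering the subscriber happen in one synchronous call, so in a single-threaded runtime no event can slip between them. Note that the control POSTs never touch the streaming connection; they only need to find the run.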
When to actually open the socket
WebSockets earn their complexity when the upstream channel is itself rich and high-frequency: real-time voice agents streaming audio frames both ways, multiplayer collaboration on a shared agent canvas, or anything where round-trip latency on a per-message HTTP POST would be felt. There the full-duplex connection is load-bearing, not decorative, and AG-UI will happily run over a WebSocket transport when you ask it to.
For most agent UIs (one user watching one agent think, act, and answer), the right choice is the one you started with, just with a better data model behind it. Stream typed events over SSE. Add a thin POST for interrupts. Save the WebSocket for the day you have a reason that isn't "the agent called a tool."



