What is SSE in LLM APIs
Server-Sent Events is a one-way streaming protocol in which the server pushes a sequence of plain-text events to the client over a long-running HTTP response. Each event is one or more `field: value` lines terminated by a blank line. LLM streaming endpoints encode each delta as one event, with the JSON payload in the `data:` field.
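As a minimal, hypothetical illustration of the wire format (the payload shape here borrows OpenAI-style deltas; exact fields vary by provider), two events and the terminating sentinel look like this:

```
data: {"choices":[{"delta":{"content":"Hel"}}]}

data: {"choices":[{"delta":{"content":"lo"}}]}

data: [DONE]
```

Note the blank line after each event: that, not the newline, is the event boundary.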
Compared to WebSockets, SSE is simpler: it uses regular HTTP, works through proxies, and supports auto-reconnect via Last-Event-ID, at the cost of being uni-directional. That makes it a perfect fit for "model talks, client listens".
FAQ
- Where do I get the raw SSE bytes?
- `curl -N https://api.openai.com/v1/chat/completions -d '{...}'`, or copy from your browser DevTools Network panel (right-click → Copy response). Paste into the textarea here.
- How does it know to "reconstruct" the streamed text?
- It probes for three known shapes: OpenAI `choices[0].delta.content`, Anthropic `content_block_delta.delta.text`, and Gemini `candidates[0].content.parts[0].text`. If your response uses a different shape, the events render fine but reconstruction stays empty.
- Why does the `[DONE]` line cause no error?
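The shape-probing described above can be sketched like this (a minimal version with assumed field paths; real payloads may differ by API version, and the viewer's actual code is not shown here):

```javascript
// Probe a parsed event for the three known delta shapes and return the
// text fragment it carries, or "" if the shape is unrecognized.
function extractDelta(evt) {
  // OpenAI: choices[0].delta.content
  const openai = evt?.choices?.[0]?.delta?.content;
  if (typeof openai === "string") return openai;
  // Anthropic: content_block_delta events carry delta.text
  if (evt?.type === "content_block_delta" && typeof evt?.delta?.text === "string") {
    return evt.delta.text;
  }
  // Gemini: candidates[0].content.parts[0].text
  const gemini = evt?.candidates?.[0]?.content?.parts?.[0]?.text;
  if (typeof gemini === "string") return gemini;
  return ""; // unknown shape: contributes nothing to the reconstruction
}
```

Concatenating `extractDelta` over every event in order yields the reconstructed message; an unrecognized shape simply yields an empty string, which is why reconstruction "stays empty" rather than erroring.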
- Every parser handles it specially: `[DONE]` is the OpenAI sentinel for "stream complete". The viewer shows it as a non-JSON event without flagging it.
- Are partial / truncated streams handled?
- Yes. Events are split on blank-line boundaries; an incomplete final event without a trailing blank line is parsed best-effort. Trailing partial JSON is shown as a parse error.
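A sketch of that splitting step, assuming the whole stream is in memory as one string (illustrative names, not the viewer's actual code):

```javascript
// Split a raw SSE buffer into the data payload of each event.
// Blank lines delimit events; a truncated final event without a
// trailing blank line is still emitted (best-effort).
function splitEvents(raw) {
  const blocks = raw.replace(/\r\n/g, "\n").split(/\n\n+/);
  const events = [];
  for (const block of blocks) {
    if (!block.trim()) continue; // ignore empty trailing chunk
    // Repeated data: lines within one event are joined with newlines.
    const data = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart())
      .join("\n");
    if (data) events.push(data);
  }
  return events;
}
```

For example, `splitEvents('data: {"a":1}\n\ndata: part')` returns both payloads even though the second event was cut off mid-stream; whether that trailing fragment survives `JSON.parse` is then a separate question.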
Common pitfalls
- Splitting on single newlines instead of blank-line boundaries.
- Forgetting that `data:` lines can repeat within one event; concatenate their values with newlines.
- Passing the literal string `[DONE]` through `JSON.parse` and erroring out.
- Not handling backpressure on the client; long streams can balloon memory if you store every chunk.
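The `[DONE]` and `JSON.parse` pitfalls above can be avoided with a small guard like this (a sketch with illustrative names; it assumes event payloads have already been split out of the stream):

```javascript
// Handle one event's data payload. Returns false when the stream is done.
function handleEventData(data, out) {
  if (data === "[DONE]") return false; // sentinel: never JSON.parse it
  try {
    // Caution: pushing every chunk grows memory without bound on long
    // streams; real clients should cap this or process incrementally.
    out.push(JSON.parse(data));
  } catch {
    // Truncated JSON at stream end: record a parse error, don't crash.
    out.push({ parseError: data });
  }
  return true;
}
```

Calling this per event keeps malformed tails and the sentinel from throwing, which mirrors how the viewer reports a trailing partial payload as a parse error instead of failing.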