How We Implement Streaming

Server-Sent Events, real-time responses, and progressive rendering

Users hate waiting. Streaming shows AI responses as they're generated, making interactions feel instant. This guide covers the principles behind effective streaming.

Core principle: Show progress immediately. Users should never stare at a blank screen.

Why Streaming Matters

| Without Streaming | With Streaming |
|---|---|
| Wait 10 seconds for full response | See first words in 100ms |
| No idea if it's working | Visible progress |
| Feels slow even if fast | Feels instant even if same duration |

The psychological impact is significant: users perceive streamed responses as faster even when total time is identical.

Streaming Technologies

Server-Sent Events (SSE)

One-way stream from server to client. Our default choice for LLM responses.

Pros: Simple, built into browsers, reconnects automatically, works through most proxies

Cons: Server to client only, limited to text data

WebSockets

Two-way real-time communication.

Pros: Bidirectional, supports binary data, lower latency for back-and-forth

Cons: More complex to implement, requires manual reconnection logic

Choosing Between Them

| Use Case | Recommended | Why |
|---|---|---|
| LLM response streaming | SSE | One-way is sufficient, simpler |
| Chat with typing indicators | WebSocket | Need bidirectional updates |
| Progress updates | SSE | Server pushing to client |
| Real-time collaboration | WebSocket | Multiple parties sending data |
| File upload progress | WebSocket | Client sending, server acknowledging |

Rule of thumb: If only the server needs to push data, use SSE. If both sides need to send data in real-time, use WebSocket.

SSE Design Principles

Message Format

SSE uses a simple text-based format. Each message has:

  • A data: prefix
  • JSON payload (for structured data)
  • Double newline to separate messages
  • A completion signal (we use [DONE])
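Putting these pieces together, one message can be produced with a small helper. This is a sketch; `format_sse` is an illustrative name, not a library function:

```python
import json

def format_sse(payload) -> str:
    """Format one SSE message: a 'data:' line terminated by a blank line."""
    if payload == "[DONE]":
        data = "[DONE]"               # completion sentinel, sent as plain text
    else:
        data = json.dumps(payload)    # structured payloads go out as JSON
    return f"data: {data}\n\n"
```

For example, `format_sse({"type": "text", "content": "Hi"})` produces a `data:` line with the JSON payload, followed by the blank line that terminates the message.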

Event Types for Agents

When agents use tools, communicate different event types:

| Event Type | Purpose | What to Show |
|---|---|---|
| text | Streamed content | Append to response |
| tool_start | Tool execution beginning | "Searching..." indicator |
| tool_end | Tool execution complete | Hide indicator, maybe show result |
| error | Something went wrong | Error message |
| done | Stream complete | Remove streaming indicator |
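A client-side dispatcher for these event types might look like the following sketch. The payload field names (`content`, `tool`, `message`) are assumptions about the message schema, and UI state is modeled as a plain dict for illustration:

```python
def handle_event(event: dict, ui: dict) -> None:
    """Dispatch one parsed agent event to UI state."""
    kind = event.get("type")
    if kind == "text":
        ui["response"] += event["content"]        # append streamed content
    elif kind == "tool_start":
        ui["indicator"] = f"{event['tool']}..."   # e.g. "Searching..."
    elif kind == "tool_end":
        ui["indicator"] = None                    # hide the indicator
    elif kind == "error":
        ui["error"] = event["message"]            # show error message
    elif kind == "done":
        ui["streaming"] = False                   # remove streaming indicator
```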

Headers

Three headers are essential for SSE:

  1. Content-Type: text/event-stream — tells the browser it's SSE
  2. Cache-Control: no-cache — prevents caching of the stream
  3. Connection: keep-alive — keeps the connection open
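As a sketch, the three headers grouped for reuse. How you attach them to a response is framework-specific; the header names and values themselves are standard for SSE:

```python
SSE_HEADERS = {
    "Content-Type": "text/event-stream",  # tells the browser it's SSE
    "Cache-Control": "no-cache",          # prevents caching of the stream
    "Connection": "keep-alive",           # keeps the connection open
}
```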

Handling Edge Cases

Connection Drops

Connections will drop. Plan for it.

With the EventSource API: Reconnection is automatic. The browser handles it.

With fetch(): You need manual retry logic with exponential backoff.

Principle: Never assume a stream will complete successfully. Always have fallback behavior.
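The manual retry logic can be sketched generically. Here `connect` is a stand-in for whatever function opens one streaming attempt and raises on a dropped connection; the retry count and base delay are illustrative defaults:

```python
import random
import time

def stream_with_retry(connect, max_retries: int = 5, base_delay: float = 0.5):
    """Run one streaming attempt via connect(), retrying on connection
    failure with exponential backoff plus a little jitter."""
    for attempt in range(max_retries):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the UI
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Jitter prevents many clients from retrying in lockstep after a shared outage.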

Cancellation

Users should be able to cancel streaming requests.

Why it matters:

  • User realizes they asked the wrong question
  • Response is going in the wrong direction
  • User wants to try a different approach

Implementation principle: Use AbortController on the client side. The server should detect client disconnection and stop processing.
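On the server side, one common pattern in Python web frameworks: a client disconnect closes the response generator, which raises GeneratorExit inside it, so cleanup in a finally block runs and generation stops. A minimal sketch:

```python
def token_stream(tokens):
    """Server-side SSE generator. If the client disconnects, most Python
    frameworks close this generator, so the finally block runs."""
    try:
        for token in tokens:
            yield f"data: {token}\n\n"
        yield "data: [DONE]\n\n"
    finally:
        # Runs on normal completion AND on client disconnect:
        # cancel the upstream model call, release resources, log, etc.
        pass
```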

Rate Limiting

Multiple concurrent streams can overwhelm your server or hit API limits.

Approach:

  • Limit concurrent streams per user (2-3 is reasonable)
  • Return 429 Too Many Requests if limit exceeded
  • Client should queue or fail gracefully
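A minimal per-user concurrency limiter, as a sketch (the limit of 3 is illustrative; in a multi-process deployment the counts would need shared storage):

```python
from collections import defaultdict

class StreamLimiter:
    """Track concurrent streams per user; reject requests over the limit."""
    def __init__(self, limit: int = 3):
        self.limit = limit
        self.active = defaultdict(int)

    def try_acquire(self, user_id: str) -> bool:
        if self.active[user_id] >= self.limit:
            return False  # caller should respond 429 Too Many Requests
        self.active[user_id] += 1
        return True

    def release(self, user_id: str) -> None:
        """Call when a stream ends, including on error or disconnect."""
        self.active[user_id] = max(0, self.active[user_id] - 1)
```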

Partial Chunk Handling

SSE data can arrive in incomplete chunks, especially with slow connections.

The problem: A JSON message might be split across multiple network packets.

The solution: Buffer incoming data. Only parse when you have complete SSE messages (ending with \n\n).
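The buffering can be sketched as a small accumulator that emits only the payloads of complete messages:

```python
class SSEBuffer:
    """Accumulate raw network chunks; emit only complete message payloads."""
    def __init__(self):
        self.buf = ""

    def feed(self, chunk: str) -> list:
        """Add a chunk; return payloads of any now-complete messages."""
        self.buf += chunk
        out = []
        while "\n\n" in self.buf:           # a complete message is buffered
            message, self.buf = self.buf.split("\n\n", 1)
            if message.startswith("data: "):
                out.append(message[len("data: "):])
        return out
```

A JSON payload split across two packets stays in the buffer until its terminating blank line arrives, so the parser never sees half a message.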

Frontend Considerations

Streaming State

Track three states:

  1. Idle: No request in progress
  2. Streaming: Receiving data
  3. Complete: Stream finished (success or error)
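These states can be made explicit with a small state machine; the transition table below is one reasonable choice (submit, done/error, reset), not the only one:

```python
from enum import Enum, auto

class StreamState(Enum):
    IDLE = auto()        # no request in progress
    STREAMING = auto()   # receiving data
    COMPLETE = auto()    # stream finished (success or error)

TRANSITIONS = {
    StreamState.IDLE: {StreamState.STREAMING},       # on submit
    StreamState.STREAMING: {StreamState.COMPLETE},   # on done or error
    StreamState.COMPLETE: {StreamState.IDLE},        # on reset
}

def transition(current: StreamState, target: StreamState) -> StreamState:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```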

Visual Feedback

| State | What to Show |
|---|---|
| Streaming started | Cursor or typing indicator |
| Receiving text | Append text progressively |
| Tool in progress | Tool-specific indicator ("Searching...", "Analyzing...") |
| Error | Clear error message with retry option |
| Complete | Remove indicators, enable input |

Memory Considerations

Long streams can accumulate significant DOM content.

Watch for:

  • Too many re-renders as text appends
  • Growing memory from stored messages
  • Performance degradation on long conversations

Mitigations:

  • Batch DOM updates
  • Virtualize long message lists
  • Truncate or summarize old messages
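Batching can be sketched as a coalescing buffer that renders once per N chunks instead of once per chunk. (Real frontends often batch by time instead, e.g. once per animation frame; the count-based version below keeps the sketch simple.)

```python
class ChunkBatcher:
    """Coalesce incoming text chunks; flush to the UI once per batch."""
    def __init__(self, render, batch_size: int = 8):
        self.render = render          # callback that actually updates the UI
        self.batch_size = batch_size
        self.pending = []

    def add(self, chunk: str) -> None:
        self.pending.append(chunk)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Call once more when the stream ends to render any remainder."""
        if self.pending:
            self.render("".join(self.pending))
            self.pending = []
```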

Common Mistakes

Not handling partial chunks

Signs: Garbled text, JSON parse errors, missing content.

Fix: Buffer chunks, only parse complete SSE messages.

Blocking UI during stream

Signs: Can't cancel, can't navigate away, frozen interface.

Fix: Keep UI responsive. Streaming should be non-blocking.

Memory leaks

Signs: Memory grows with each stream, performance degrades over time.

Fix: Clean up event listeners. Close streams properly. Don't hold references to completed streams.

No loading state

Signs: User doesn't know request is in progress, may submit again.

Fix: Show streaming indicator immediately on submit. Disable input while streaming.

Ignoring errors

Signs: Stream fails silently, user stares at incomplete response.

Fix: Handle errors explicitly. Show clear error messages. Offer retry.

Evaluation Checklist

Your streaming is working if:

  • First content appears within 500ms of request
  • UI remains responsive during streaming
  • User can cancel in-progress streams
  • Connection drops are handled gracefully (auto-retry or clear error)
  • Tool usage is visible to users (not a black box)
  • Errors display with helpful messages

Your streaming needs work if:

  • Long delay before first content appears
  • UI freezes during streaming
  • No way to cancel
  • Connection drops cause crashes or hang
  • Users don't know something is happening
  • Errors fail silently

Quick Reference

Technology Selection

| Need | Use |
|---|---|
| LLM responses | SSE |
| Real-time bidirectional | WebSocket |
| Simple progress updates | SSE |
| Complex multi-party updates | WebSocket |

Event Type Pattern

For agent responses, use consistent event types:

  • text — content to display
  • tool_start — beginning tool execution
  • tool_end — tool execution complete
  • error — error occurred
  • done — stream complete

Performance Targets

| Metric | Target |
|---|---|
| Time to first byte | < 500ms |
| UI responsiveness | Never block |
| Reconnection | Automatic or < 3s |
| Error visibility | Immediate and clear |