How We Implement Streaming
Server-Sent Events, real-time responses, and progressive rendering
Users hate waiting. Streaming shows AI responses as they're generated, making interactions feel instant. This guide covers the principles behind effective streaming.
Core principle: Show progress immediately. Users should never stare at a blank screen.
Why Streaming Matters
| Without Streaming | With Streaming |
|---|---|
| Wait 10 seconds for full response | See first words in 100ms |
| No idea if it's working | Visible progress |
| Feels slow even if fast | Feels instant even if same duration |
The psychological impact is significant: users perceive streamed responses as faster even when total time is identical.
Streaming Technologies
Server-Sent Events (SSE)
One-way stream from server to client. Our default choice for LLM responses.
Pros: Simple, built into browsers, reconnects automatically, works through most proxies
Cons: Server to client only, limited to text data
WebSockets
Two-way real-time communication.
Pros: Bidirectional, supports binary data, lower latency for back-and-forth
Cons: More complex to implement, requires manual reconnection logic
Choosing Between Them
| Use Case | Recommended | Why |
|---|---|---|
| LLM response streaming | SSE | One-way is sufficient, simpler |
| Chat with typing indicators | WebSocket | Need bidirectional updates |
| Progress updates | SSE | Server pushing to client |
| Real-time collaboration | WebSocket | Multiple parties sending data |
| File upload progress | WebSocket | Client sending, server acknowledging |
Rule of thumb: If only the server needs to push data, use SSE. If both sides need to send data in real-time, use WebSocket.
SSE Design Principles
Message Format
SSE uses a simple text-based format. Each message has:
- A `data:` prefix
- A JSON payload (for structured data)
- A double newline to separate messages
- A completion signal (we use `[DONE]`)
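As a sketch of this format, a server might assemble each message like so (the `formatSSE` helper and `DONE` constant are illustrative names, not a standard API):

```typescript
// Sketch: assemble one SSE message from a structured payload.
// `formatSSE` is an illustrative helper, not part of any standard API.
function formatSSE(payload: unknown): string {
  // `data:` prefix, JSON body, blank line to terminate the message.
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// The completion signal is a literal sentinel, not JSON.
const DONE = "data: [DONE]\n\n";

formatSSE({ type: "text", content: "Hello" });
```

The client splits the incoming stream on `\n\n`, strips the `data: ` prefix, and parses the remainder as JSON (or stops on the `[DONE]` sentinel).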
Event Types for Agents
When agents use tools, communicate different event types:
| Event Type | Purpose | What to Show |
|---|---|---|
| `text` | Streamed content | Append to response |
| `tool_start` | Tool execution beginning | "Searching..." indicator |
| `tool_end` | Tool execution complete | Hide indicator, maybe show result |
| `error` | Something went wrong | Error message |
| `done` | Stream complete | Remove streaming indicator |
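The event types above can be modeled as a discriminated union, so the client handles every case exhaustively. This is a sketch; the field names (`type`, `content`, `tool`, `message`) are our assumptions:

```typescript
// Sketch: a discriminated union mirroring the agent event types.
// Field names are illustrative assumptions, not a fixed schema.
type AgentEvent =
  | { type: "text"; content: string }
  | { type: "tool_start"; tool: string }
  | { type: "tool_end"; tool: string }
  | { type: "error"; message: string }
  | { type: "done" };

// One exhaustive switch: forgetting a case is a compile-time error.
function renderEvent(event: AgentEvent): string {
  switch (event.type) {
    case "text": return event.content;             // append to response
    case "tool_start": return `${event.tool}...`;  // show indicator
    case "tool_end": return "";                    // hide indicator
    case "error": return `Error: ${event.message}`;
    case "done": return "";                        // remove streaming indicator
  }
}
```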
Headers
Three headers are essential for SSE:
- `Content-Type: text/event-stream` tells the browser it's SSE
- `Cache-Control: no-cache` prevents caching of the stream
- `Connection: keep-alive` keeps the connection open
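As a minimal sketch, the three headers can live in one constant that a Node/Express-style handler passes to `writeHead` (the surrounding handler is assumed, not shown):

```typescript
// Sketch: the three essential SSE response headers as a plain object.
// How you attach them depends on your server framework.
const SSE_HEADERS = {
  "Content-Type": "text/event-stream", // tells the browser it's SSE
  "Cache-Control": "no-cache",         // prevents caching of the stream
  "Connection": "keep-alive",          // keeps the connection open
} as const;

// e.g. res.writeHead(200, SSE_HEADERS); then res.write() each message.
```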
Handling Edge Cases
Connection Drops
Connections will drop. Plan for it.
With the `EventSource` API: reconnection is automatic; the browser handles it.
With `fetch()`: you need manual retry logic with exponential backoff.
Principle: Never assume a stream will complete successfully. Always have fallback behavior.
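For the manual `fetch()` path, the backoff delay can be sketched like this. The base and cap values are illustrative defaults, not a recommendation from this guide:

```typescript
// Sketch: capped exponential backoff with full jitter for retrying
// a dropped stream. Base and cap values are illustrative.
function backoffMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  const exp = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter spreads reconnect storms across many clients.
  return Math.random() * exp;
}
```

Each retry calls `backoffMs(attempt)` with an incrementing attempt counter, waits that long, then reissues the request.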
Cancellation
Users should be able to cancel streaming requests.
Why it matters:
- User realizes they asked the wrong question
- Response is going in the wrong direction
- User wants to try a different approach
Implementation principle: Use `AbortController` on the client side. The server should detect client disconnection and stop processing.
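A minimal client-side sketch, assuming a hypothetical `/chat` endpoint:

```typescript
// Sketch: client-side cancellation with AbortController.
// The /chat endpoint is hypothetical.
async function streamChat(prompt: string, signal: AbortSignal): Promise<void> {
  const res = await fetch("/chat", {
    method: "POST",
    body: JSON.stringify({ prompt }),
    signal, // aborting rejects the fetch and closes the connection
  });
  // ...read res.body chunk by chunk; each read also rejects on abort.
}

const controller = new AbortController();
// Wire the cancel button to controller.abort(); on the server side,
// detect the disconnect and stop generating.
```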
Rate Limiting
Multiple concurrent streams can overwhelm your server or hit API limits.
Approach:
- Limit concurrent streams per user (2-3 is reasonable)
- Return 429 Too Many Requests if limit exceeded
- Client should queue or fail gracefully
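A per-user cap can be sketched as a simple counter map. Names are illustrative, and a real service must also release slots when clients disconnect:

```typescript
// Sketch: cap concurrent streams per user. A production version would
// also release slots on client disconnect and persist across instances.
const MAX_STREAMS_PER_USER = 3;
const active = new Map<string, number>();

function tryAcquire(userId: string): boolean {
  const n = active.get(userId) ?? 0;
  if (n >= MAX_STREAMS_PER_USER) return false; // caller responds 429
  active.set(userId, n + 1);
  return true;
}

function release(userId: string): void {
  const n = active.get(userId) ?? 0;
  if (n > 0) active.set(userId, n - 1);
}
```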
Partial Chunk Handling
SSE data can arrive in incomplete chunks, especially with slow connections.
The problem: A JSON message might be split across multiple network packets.
The solution: Buffer incoming data. Only parse when you have complete SSE messages (ending with `\n\n`).
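The buffering step can be sketched as a pure function, assuming the single-line `data: ...` message format described above (the `drainSSE` name is ours):

```typescript
// Sketch: buffer partial chunks and emit only complete SSE messages.
// Assumes each message is one `data: ...` line terminated by "\n\n".
function drainSSE(buffer: string): { messages: string[]; rest: string } {
  const messages: string[] = [];
  let rest = buffer;
  let i: number;
  while ((i = rest.indexOf("\n\n")) !== -1) {
    const raw = rest.slice(0, i);
    rest = rest.slice(i + 2);
    if (raw.startsWith("data: ")) messages.push(raw.slice(6));
  }
  return { messages, rest }; // `rest` is carried into the next chunk
}
```

As chunks arrive: append to the buffer, call `drainSSE`, process each complete message (JSON or the `[DONE]` sentinel), and keep `rest` for the next chunk.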
Frontend Considerations
Streaming State
Track three states:
- Idle: No request in progress
- Streaming: Receiving data
- Complete: Stream finished (success or error)
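One way to sketch these states is a discriminated union, which keeps impossible combinations (such as "idle but receiving text") unrepresentable. The shape below is an assumption, not a prescribed store design:

```typescript
// Sketch: the three streaming states as a union type.
type StreamState =
  | { kind: "idle" }
  | { kind: "streaming"; text: string }
  | { kind: "complete"; text: string; error?: string };

// Chunks only apply while streaming; late chunks after cancel are ignored.
function appendChunk(state: StreamState, chunk: string): StreamState {
  if (state.kind !== "streaming") return state;
  return { kind: "streaming", text: state.text + chunk };
}
```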
Visual Feedback
| State | What to Show |
|---|---|
| Streaming started | Cursor or typing indicator |
| Receiving text | Append text progressively |
| Tool in progress | Tool-specific indicator ("Searching...", "Analyzing...") |
| Error | Clear error message with retry option |
| Complete | Remove indicators, enable input |
Memory Considerations
Long streams can accumulate significant DOM content.
Watch for:
- Too many re-renders as text appends
- Growing memory from stored messages
- Performance degradation on long conversations
Mitigations:
- Batch DOM updates
- Virtualize long message lists
- Truncate or summarize old messages
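Batching can be sketched by coalescing chunks and committing once per flush. Here `commit` stands in for the real DOM write (typically scheduled inside `requestAnimationFrame`); the factory name is illustrative:

```typescript
// Sketch: coalesce many small text chunks into one DOM write per flush.
// `commit` stands in for the real DOM update.
function makeBatcher(commit: (text: string) => void) {
  let pending = "";
  return {
    push(chunk: string) { pending += chunk; },
    flush() {
      if (pending) { commit(pending); pending = ""; }
    },
  };
}
```

Instead of re-rendering on every chunk, push chunks as they arrive and flush on a frame boundary, turning dozens of tiny updates into one.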
Common Mistakes
Not handling partial chunks
Signs: Garbled text, JSON parse errors, missing content.
Fix: Buffer chunks, only parse complete SSE messages.
Blocking UI during stream
Signs: Can't cancel, can't navigate away, frozen interface.
Fix: Keep UI responsive. Streaming should be non-blocking.
Memory leaks
Signs: Memory grows with each stream, performance degrades over time.
Fix: Clean up event listeners. Close streams properly. Don't hold references to completed streams.
No loading state
Signs: User doesn't know request is in progress, may submit again.
Fix: Show streaming indicator immediately on submit. Disable input while streaming.
Ignoring errors
Signs: Stream fails silently, user stares at incomplete response.
Fix: Handle errors explicitly. Show clear error messages. Offer retry.
Evaluation Checklist
Your streaming is working if:
- First content appears within 500ms of request
- UI remains responsive during streaming
- User can cancel in-progress streams
- Connection drops are handled gracefully (auto-retry or clear error)
- Tool usage is visible to users (not a black box)
- Errors display with helpful messages
Your streaming needs work if:
- Long delay before first content appears
- UI freezes during streaming
- No way to cancel
- Connection drops cause crashes or hang
- Users don't know something is happening
- Errors fail silently
Quick Reference
Technology Selection
| Need | Use |
|---|---|
| LLM responses | SSE |
| Real-time bidirectional | WebSocket |
| Simple progress updates | SSE |
| Complex multi-party updates | WebSocket |
Event Type Pattern
For agent responses, use consistent event types:
- `text`: content to display
- `tool_start`: beginning tool execution
- `tool_end`: tool execution complete
- `error`: error occurred
- `done`: stream complete
Performance Targets
| Metric | Target |
|---|---|
| Time to first byte | < 500ms |
| UI responsiveness | Never block |
| Reconnection | Automatic or < 3s |
| Error visibility | Immediate and clear |