How We Manage Memory
Conversation history, context windows, and persistent agent memory
How We Manage Memory
Agents need memory to maintain context across conversations. Without it, every message starts fresh. This guide covers how we think about agent memory.
Core principle: Remember what matters, forget what doesn't. Context windows are finite.
Types of Memory
| Type | Scope | Storage | Use Case |
|---|---|---|---|
| Conversation | Single chat | In-memory or DB | Current chat context |
| Session | User session | Database | Multi-conversation context |
| Long-term | Permanent | Database | User preferences, facts |
Each type requires different strategies.
Conversation Memory
The most common type: remembering what was said in this conversation.
The Core Challenge
LLMs have limited context windows. You can't include every message from a long conversation.
Memory Strategies
| Strategy | When to Use | Trade-off |
|---|---|---|
| Keep all | Short conversations | Runs out of space |
| Keep recent N | Most conversations | Loses old context |
| Summarize old | Long conversations | Loses detail |
| Selective | Important conversations | Complex to implement |
Token Budgeting
Plan your context window usage:
| Component | Typical Budget |
|---|---|
| System prompt | 1-2K tokens |
| Tools | 2-3K tokens |
| History | 40-60K tokens |
| Current message | 2-4K tokens |
| Response reserve | 4K tokens |
When history exceeds budget, you must trim or summarize.
Trimming Approaches
Simple: Remove oldest messages when over budget.
Smarter: Summarize older messages, keep recent ones verbatim.
Selective: Keep messages marked as important, summarize the rest.
When to Summarize
Summarization is useful when:
- Conversation exceeds 30+ messages
- Important context is in older messages
- You need to preserve facts but not exact wording
Summarization adds latency (LLM call) and loses detail. Use only when needed.
Session Memory
Remembering context across multiple conversations with the same user.
What to Remember
| Remember | Don't Remember |
|---|---|
| User preferences (language, style) | Temporary task details |
| Key facts (company, role) | One-off instructions |
| Recent topics | Everything |
| Explicit corrections | Guesses about preferences |
Memory-Enhanced Prompts
Include relevant memory in the system prompt:
- "User prefers Vietnamese"
- "User works at [Company] doing SEO"
- "Recent topics: keyword clustering, content optimization"
This gives the agent context without cluttering conversation history.
Extracting Memory
Automatically identify what to remember from conversations:
Look for:
- Explicit statements: "I prefer..." "I work at..."
- Corrections: "No, I meant..." "Actually..."
- Repeated patterns: User always asks for X
Avoid:
- Guessing preferences from single interactions
- Storing temporary task context
- Keeping everything
Long-term Memory
Persistent facts and preferences that last beyond sessions.
Storage Structure
Two tables typically:
- Conversations table: Conversation metadata
- Messages table: Individual messages with conversation_id
- User memory table: Key-value pairs for persistent facts
Memory Keys
Organize long-term memory by type:
| Key | Example Value |
|---|---|
| preferences.language | "vi" |
| preferences.style | "professional" |
| facts | ["Works at SEO agency", "Focus on Vietnamese market"] |
| recent_topics | ["keyword clustering", "content optimization"] |
Memory Hygiene
Long-term memory needs maintenance:
- Expiry: Facts can become stale
- Limits: Don't store unlimited facts
- Updates: New information should replace old
- User control: Users should be able to clear memory
Context Window Management
The Problem
A 10-message conversation might fit. A 100-message conversation won't.
The Solution
- Budget: Know how much space you have
- Prioritize: Recent messages > old messages
- Summarize: Compress old context when needed
- Truncate: Remove if summarization isn't worth it
Prioritization
What to keep when space is limited:
| Priority | Content |
|---|---|
| Always keep | System prompt, tools, current message |
| High priority | Last 10 messages |
| Medium priority | User facts, summarized history |
| Low priority | Detailed old messages |
Common Mistakes
Storing everything
Signs: Database grows huge, context windows overflow, costs spike.
Fix: Be selective. Not every message needs permanent storage.
No memory limits
Signs: Old conversations slow down, token limits hit.
Fix: Implement summarization, set max history length.
Ignoring memory in prompts
Signs: Agent keeps asking same questions, ignores user context.
Fix: Include relevant memory in system prompt.
Memory without cleanup
Signs: Stale facts, outdated preferences, conflicting information.
Fix: Add expiry, allow users to reset, periodic cleanup.
Forgetting mid-conversation
Signs: Agent loses track of what was discussed.
Fix: Check if trimming is too aggressive. Consider summarization.
Evaluation Checklist
Your memory system is working if:
- Agent remembers context within conversation
- User preferences persist across sessions
- Old conversations don't slow things down
- Memory stays within token budgets
- Users can clear/reset memory
Your memory system needs work if:
- Agent forgets mid-conversation
- Every conversation starts fresh (when it shouldn't)
- Database size growing unbounded
- Context window errors appearing
- Stale information being used
Quick Reference
Memory Types
| Type | Persistence | Strategy |
|---|---|---|
| Conversation | Session | Trim or summarize old |
| Session | User lifetime | Key-value storage |
| Long-term | Permanent | Explicit extraction |
Token Budget Template
| Component | Tokens |
|---|---|
| System prompt | 1,500 |
| Tools | 2,500 |
| User memory | 500 |
| Conversation history | 40,000 |
| Current message | 3,000 |
| Response reserve | 4,000 |
| Total | ~51,500 |
What to Extract for Long-term Memory
Extract:
- Explicit preferences
- Stated facts about user/company
- Corrections and clarifications
- Repeated patterns
Don't extract:
- Temporary task context
- Single-interaction guesses
- Everything