How We Manage Memory

Agents need memory to maintain context across conversations. Without it, every message starts fresh. This guide covers how we think about agent memory.

Core principle: Remember what matters, forget what doesn't. Context windows are finite.

Types of Memory

Type	Scope	Storage	Use Case
Conversation	Single chat	In-memory or DB	Current chat context
Session	User session	Database	Multi-conversation context
Long-term	Permanent	Database	User preferences, facts

Each type requires different strategies.

Conversation Memory

The most common type: remembering what was said in this conversation.

The Core Challenge

LLMs have limited context windows. You can't include every message from a long conversation.

Memory Strategies

Strategy	When to Use	Trade-off
Keep all	Short conversations	Runs out of space
Keep recent N	Most conversations	Loses old context
Summarize old	Long conversations	Loses detail
Selective	Important conversations	Complex to implement

Token Budgeting

Plan your context window usage:

Component	Typical Budget
System prompt	1-2K tokens
Tools	2-3K tokens
History	40-60K tokens
Current message	2-4K tokens
Response reserve	4K tokens

When history exceeds budget, you must trim or summarize.

Trimming Approaches

Simple: Remove oldest messages when over budget.

Smarter: Summarize older messages, keep recent ones verbatim.

Selective: Keep messages marked as important, summarize the rest.

When to Summarize

Summarization is useful when:

Conversation exceeds 30+ messages
Important context is in older messages
You need to preserve facts but not exact wording

Summarization adds latency (LLM call) and loses detail. Use only when needed.

Session Memory

Remembering context across multiple conversations with the same user.

What to Remember

Remember	Don't Remember
User preferences (language, style)	Temporary task details
Key facts (company, role)	One-off instructions
Recent topics	Everything
Explicit corrections	Guesses about preferences

Memory-Enhanced Prompts

Include relevant memory in the system prompt:

"User prefers Vietnamese"
"User works at [Company] doing SEO"
"Recent topics: keyword clustering, content optimization"

This gives the agent context without cluttering conversation history.

Extracting Memory

Automatically identify what to remember from conversations:

Look for:

Explicit statements: "I prefer..." "I work at..."
Corrections: "No, I meant..." "Actually..."
Repeated patterns: User always asks for X

Avoid:

Guessing preferences from single interactions
Storing temporary task context
Keeping everything

Long-term Memory

Persistent facts and preferences that last beyond sessions.

Storage Structure

Two tables typically:

Conversations table: Conversation metadata
Messages table: Individual messages with conversation_id
User memory table: Key-value pairs for persistent facts

Memory Keys

Organize long-term memory by type:

Key	Example Value
preferences.language	"vi"
preferences.style	"professional"
facts	["Works at SEO agency", "Focus on Vietnamese market"]
recent_topics	["keyword clustering", "content optimization"]

Memory Hygiene

Long-term memory needs maintenance:

Expiry: Facts can become stale
Limits: Don't store unlimited facts
Updates: New information should replace old
User control: Users should be able to clear memory

Context Window Management

The Problem

A 10-message conversation might fit. A 100-message conversation won't.

The Solution

Budget: Know how much space you have
Prioritize: Recent messages > old messages
Summarize: Compress old context when needed
Truncate: Remove if summarization isn't worth it

Prioritization

What to keep when space is limited:

Priority	Content
Always keep	System prompt, tools, current message
High priority	Last 10 messages
Medium priority	User facts, summarized history
Low priority	Detailed old messages

Common Mistakes

Storing everything

Signs: Database grows huge, context windows overflow, costs spike.

Fix: Be selective. Not every message needs permanent storage.

No memory limits

Signs: Old conversations slow down, token limits hit.

Fix: Implement summarization, set max history length.

Ignoring memory in prompts

Signs: Agent keeps asking same questions, ignores user context.

Fix: Include relevant memory in system prompt.

Memory without cleanup

Signs: Stale facts, outdated preferences, conflicting information.

Fix: Add expiry, allow users to reset, periodic cleanup.

Forgetting mid-conversation

Signs: Agent loses track of what was discussed.

Fix: Check if trimming is too aggressive. Consider summarization.

Evaluation Checklist

Your memory system is working if:

Agent remembers context within conversation
User preferences persist across sessions
Old conversations don't slow things down
Memory stays within token budgets
Users can clear/reset memory

Your memory system needs work if:

Agent forgets mid-conversation
Every conversation starts fresh (when it shouldn't)
Database size growing unbounded
Context window errors appearing
Stale information being used

Quick Reference

Memory Types

Type	Persistence	Strategy
Conversation	Session	Trim or summarize old
Session	User lifetime	Key-value storage
Long-term	Permanent	Explicit extraction

Token Budget Template

Component	Tokens
System prompt	1,500
Tools	2,500
User memory	500
Conversation history	40,000
Current message	3,000
Response reserve	4,000
Total	~51,500

What to Extract for Long-term Memory

Extract:

Explicit preferences
Stated facts about user/company
Corrections and clarifications
Repeated patterns

Don't extract:

Temporary task context
Single-interaction guesses
Everything