Building AI Agents with Long-Term Memory: From Design Patterns to Production

In 2025, we got used to ChatGPT remembering conversation context. In 2026, AI Agents need more than “remembering what was just said” — they need to remember decisions made three months ago, understand a user’s long-term preferences, and learn from mistakes.

This isn’t just chat-log storage. It’s a full memory system design.

Why Do Agents Need Memory?

The Current Problem: A Forgetful Assistant

Picture this:

Day 1:
You: Deploy my blog for me — remember to use rsync, not scp
Agent: Got it, noted. Deploying with rsync.

Day 30:
You: Deploy my blog for me
Agent: Sure, deploying with scp... (your preference is forgotten)

This happens because most Agents only have “short-term memory” (the current conversation window) and lack “long-term memory.”

Three Types of Memory

Cognitive science divides human memory into three categories, and AI Agents need all three:

1. Episodic Memory

Definition: Events that happened at a specific time and place
Example: “On February 13, 2026, the deployment failed because the dist folder wasn’t committed”
Purpose: Reviewing past conversations, avoiding repeated mistakes

2. Semantic Memory

Definition: General knowledge and concepts
Example: “The user prefers rsync over scp”, “The blog framework is Astro”
Purpose: Understanding user preferences, project configuration, domain knowledge

3. Procedural Memory

Definition: Skills — how to do something
Example: “The correct steps to deploy the blog: build → commit dist → push to GitHub → pull on the server”
Purpose: Optimizing workflows, learning best practices

A complete Agent memory system needs to support all three at once.

Technical Architecture

The Big Picture

┌─────────────────────────────────────────────────────────────┐
│                        AI Agent                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │ Short-term   │  │ Long-term    │  │ Memory       │      │
│  │ (session ctx)│  │ (vector DB)  │  │ Manager      │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└─────────────────────────────────────────────────────────────┘
                            │
        ┌───────────────────┼───────────────────┐
        ▼                   ▼                   ▼
  ┌──────────┐       ┌──────────┐       ┌──────────┐
  │ Episodic │       │ Semantic │       │Procedural│
  │ (dialog) │       │(knowledge)│      │ (skills) │
  └──────────┘       └──────────┘       └──────────┘
        │                   │                   │
        └───────────────────┴───────────────────┘
                            │
                    ┌───────▼───────┐
                    │  Vector DB    │
                    │  (Chroma)     │
                    └───────────────┘

Choosing a Vector Database

Comparing three common options:

Option	Strengths	Weaknesses	Best for
Pinecone	Managed service, works out of the box, strong performance	Costs money, external dependency	Production, large-scale data
Chroma	Open source and free, runs locally, easy to integrate	Slower than Pinecone	Development and testing, small to mid scale
Qdrant	Excellent performance, supports complex filtering	You deploy and maintain it yourself	Enterprise, high-performance needs

Recommendations:

Development: Chroma (runs locally, zero config)
Production: Pinecone (stable and reliable) or Qdrant (self-hosted)

Choosing an Embedding Model

Embedding converts text into vectors and directly determines retrieval quality:

// OpenAI Embedding（推荐用于生产）
{
  model: "text-embedding-3-small",
  dimensions: 1536,
  cost: "$0.02 / 1M tokens",
  quality: "优秀",
  speed: "快"
}

// 本地 Embedding（适合隐私敏感场景）
{
  model: "sentence-transformers/all-MiniLM-L6-v2",
  dimensions: 384,
  cost: "免费（本地运行）",
  quality: "良好",
  speed: "中等"
}

Cost comparison:

Assume 100 memories stored per day, 50 retrievals per day, running for 1 year:

OpenAI Embeddings + Chroma (local):
- Storage embeddings: 36,500 entries × 50 tokens = 1.825M tokens
  → $0.02 / 1M × 1.825 = $0.0365
- Retrieval embeddings: 18,250 queries × 20 tokens = 365K tokens
  → $0.02 / 1M × 0.365 = $0.0073
- Vector storage (Chroma, local): free (disk usage ~224 MB)
- Total cost: ~$0.044/year (essentially free)

OpenAI Embeddings + Pinecone (managed):
- Embedding API: $0.044/year (same as above)
- Vector storage: 0.037M vectors × $70/M ≈ $2.59/month ≈ $31/year
- Total cost: ~$31/year

Local model (fully free):
- Embedding: free (but needs a GPU; sentence-transformers recommended)
- Vector storage: free (Chroma, local)
- Cost: $0, but you maintain the model yourself

Conclusion:
- Lowest cost: OpenAI + local Chroma ($0.044/year)
- Zero ops: OpenAI + Pinecone ($31/year)
- Fully self-hosted: local model + Chroma ($0, requires the skills to run it)

Memory Retrieval Strategy

Retrieval isn’t just “find the most similar.” Time, importance, and other factors matter too:

interface RetrievalStrategy {
  // 1. 混合检索：向量相似度 + 关键词匹配
  hybrid: {
    vectorWeight: 0.7,    // 语义相似度
    keywordWeight: 0.3    // 精确匹配
  },
  
  // 2. 时间衰减：最近的记忆权重更高
  temporalDecay: {
    enabled: true,
    halfLifeDays: 30      // 30 天后权重减半
  },
  
  // 3. 重要性加权：关键事件优先
  importanceBoost: {
    error: 1.5,           // 错误记录 × 1.5
    decision: 1.3,        // 决策记录 × 1.3
    preference: 1.2       // 用户偏好 × 1.2
  },
  
  // 4. 去重：避免返回相似内容
  mmr: {                  // Maximal Marginal Relevance
    enabled: true,
    lambda: 0.7           // 相似度与多样性的平衡
  }
}

Implementation

Step 1: Initialize the Memory System

// memory-system.ts
import { Chroma } from "@langchain/community/vectorstores/chroma";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Document } from "@langchain/core/documents";

interface MemoryConfig {
  collectionName: string;
  embeddingModel?: string;
  chromaUrl?: string;
}

class LongTermMemory {
  private vectorStore: Chroma;
  private embeddings: OpenAIEmbeddings;
  
  constructor(config: MemoryConfig) {
    // 初始化 Embedding 模型
    this.embeddings = new OpenAIEmbeddings({
      modelName: config.embeddingModel || "text-embedding-3-small",
      dimensions: 1536
    });
    
    // 初始化向量数据库
    this.vectorStore = new Chroma(this.embeddings, {
      collectionName: config.collectionName,
      url: config.chromaUrl || "http://localhost:8000"
    });
  }
  
  // 存储记忆
  async store(memory: Memory): Promise<void> {
    const doc = new Document({
      pageContent: memory.content,
      metadata: {
        type: memory.type,        // episodic | semantic | procedural
        timestamp: Date.now(),
        importance: memory.importance || 1.0,
        tags: memory.tags || []
      }
    });
    
    await this.vectorStore.addDocuments([doc]);
  }
  
  // 检索记忆
  async retrieve(query: string, options?: RetrievalOptions): Promise<Memory[]> {
    const results = await this.vectorStore.similaritySearchWithScore(
      query,
      options?.limit || 5
    );
    
    // 应用时间衰减
    const withDecay = this.applyTemporalDecay(results);
    
    // 应用重要性加权
    const withBoost = this.applyImportanceBoost(withDecay);
    
    // 去重
    const deduplicated = this.applyMMR(withBoost);
    
    return deduplicated.map(r => ({
      content: r.document.pageContent,
      type: r.document.metadata.type,
      score: r.score,
      timestamp: r.document.metadata.timestamp,
      tags: r.document.metadata.tags
    }));
  }
  
  private applyTemporalDecay(
    results: Array<{ document: Document; score: number }>
  ): Array<{ document: Document; score: number }> {
    const now = Date.now();
    const halfLife = 30 * 24 * 60 * 60 * 1000; // 30 天
    
    return results.map(r => {
      const age = now - r.document.metadata.timestamp;
      const decay = Math.pow(0.5, age / halfLife);
      return {
        ...r,
        score: r.score * decay
      };
    });
  }
  
  private applyImportanceBoost(
    results: Array<{ document: Document; score: number }>
  ): Array<{ document: Document; score: number }> {
    const boostMap = {
      error: 1.5,
      decision: 1.3,
      preference: 1.2
    };
    
    return results.map(r => {
      const importance = r.document.metadata.importance || 1.0;
      const tags = r.document.metadata.tags || [];
      
      let boost = 1.0;
      for (const tag of tags) {
        boost = Math.max(boost, boostMap[tag as keyof typeof boostMap] || 1.0);
      }
      
      return {
        ...r,
        score: r.score * importance * boost
      };
    });
  }
  
  private applyMMR(
    results: Array<{ document: Document; score: number }>
  ): Array<{ document: Document; score: number }> {
    // Maximal Marginal Relevance 去重算法
    const lambda = 0.7; // 相似度与多样性的平衡
    const selected: typeof results = [];
    const remaining = [...results];
    
    // 选择得分最高的作为第一个
    if (remaining.length > 0) {
      selected.push(remaining.shift()!);
    }
    
    while (remaining.length > 0 && selected.length < 5) {
      let maxScore = -Infinity;
      let maxIndex = 0;
      
      for (let i = 0; i < remaining.length; i++) {
        const candidate = remaining[i];
        
        // 计算与已选择项的最大相似度
        const maxSimilarity = Math.max(
          ...selected.map(s => 
            this.jaccardSimilarity(
              candidate.document.pageContent,
              s.document.pageContent
            )
          )
        );
        
        // MMR 得分 = λ × 相关性 - (1-λ) × 相似度
        const mmrScore = lambda * candidate.score - (1 - lambda) * maxSimilarity;
        
        if (mmrScore > maxScore) {
          maxScore = mmrScore;
          maxIndex = i;
        }
      }
      
      selected.push(remaining.splice(maxIndex, 1)[0]);
    }
    
    return selected;
  }
  
  private jaccardSimilarity(text1: string, text2: string): number {
    // Jaccard 相似度：交集 / 并集
    const set1 = new Set(text1.toLowerCase().split(""));
    const set2 = new Set(text2.toLowerCase().split(""));
    const intersection = new Set([...set1].filter(x => set2.has(x)));
    const union = new Set([...set1, ...set2]);
    return intersection.size / union.size;
  }
}

interface Memory {
  content: string;
  type: "episodic" | "semantic" | "procedural";
  importance?: number;
  tags?: string[];
  timestamp?: number;
  score?: number;
}

interface RetrievalOptions {
  limit?: number;
  type?: Memory["type"];
  tags?: string[];
}

export { LongTermMemory, Memory };

Step 2: Implement Episodic Memory

Episodic memory stores concrete conversation events:

// episodic-memory.ts
import { LongTermMemory, Memory } from "./memory-system";

class EpisodicMemory {
  private memory: LongTermMemory;
  
  constructor(memory: LongTermMemory) {
    this.memory = memory;
  }
  
  // 存储对话
  async storeConversation(
    userMessage: string,
    agentResponse: string,
    context?: Record<string, any>
  ): Promise<void> {
    const memory: Memory = {
      content: `User: ${userMessage}\nAgent: ${agentResponse}`,
      type: "episodic",
      importance: this.calculateImportance(userMessage, agentResponse),
      tags: this.extractTags(userMessage, agentResponse, context),
      timestamp: Date.now()
    };
    
    await this.memory.store(memory);
  }
  
  // 检索相关对话
  async recallSimilarConversations(
    query: string,
    limit: number = 3
  ): Promise<Memory[]> {
    return await this.memory.retrieve(query, {
      limit,
      type: "episodic"
    });
  }
  
  private calculateImportance(user: string, agent: string): number {
    let importance = 1.0;
    
    // 包含错误信息的对话更重要
    if (agent.includes("错误") || agent.includes("失败")) {
      importance *= 1.5;
    }
    
    // 包含决策的对话更重要
    if (user.includes("记住") || user.includes("偏好")) {
      importance *= 1.3;
    }
    
    // 长对话可能包含更多信息
    const totalLength = user.length + agent.length;
    if (totalLength > 500) {
      importance *= 1.2;
    }
    
    return importance;
  }
  
  private extractTags(
    user: string,
    agent: string,
    context?: Record<string, any>
  ): string[] {
    const tags: string[] = [];
    
    // 从内容提取标签
    if (agent.includes("错误") || agent.includes("失败")) {
      tags.push("error");
    }
    
    if (user.includes("记住") || user.includes("偏好")) {
      tags.push("preference");
    }
    
    if (agent.includes("部署") || agent.includes("发布")) {
      tags.push("deployment");
    }
    
    // 从上下文提取标签
    if (context?.action) {
      tags.push(context.action);
    }
    
    return tags;
  }
}

export { EpisodicMemory };

Step 3: Implement Semantic Memory

Semantic memory extracts and stores general knowledge:

// semantic-memory.ts
import { LongTermMemory, Memory } from "./memory-system";

class SemanticMemory {
  private memory: LongTermMemory;
  private knowledgeGraph: Map<string, Set<string>> = new Map();
  
  constructor(memory: LongTermMemory) {
    this.memory = memory;
  }
  
  // 存储知识
  async storeKnowledge(
    subject: string,
    predicate: string,
    object: string
  ): Promise<void> {
    // 存储为三元组（Subject-Predicate-Object）
    const memory: Memory = {
      content: `${subject} ${predicate} ${object}`,
      type: "semantic",
      importance: 1.2, // 知识默认较重要
      tags: ["knowledge", subject.toLowerCase()],
      timestamp: Date.now()
    };
    
    await this.memory.store(memory);
    
    // 更新知识图谱
    this.updateKnowledgeGraph(subject, predicate, object);
  }
  
  // 存储用户偏好
  async storePreference(
    preference: string,
    context?: string
  ): Promise<void> {
    const memory: Memory = {
      content: context 
        ? `Preference: ${preference} (Context: ${context})`
        : `Preference: ${preference}`,
      type: "semantic",
      importance: 1.3,
      tags: ["preference"],
      timestamp: Date.now()
    };
    
    await this.memory.store(memory);
  }
  
  // 检索知识
  async recallKnowledge(query: string): Promise<Memory[]> {
    return await this.memory.retrieve(query, {
      limit: 5,
      type: "semantic"
    });
  }
  
  // 更新知识图谱
  private updateKnowledgeGraph(
    subject: string,
    predicate: string,
    object: string
  ): void {
    const key = `${subject}:${predicate}`;
    
    if (!this.knowledgeGraph.has(key)) {
      this.knowledgeGraph.set(key, new Set());
    }
    
    this.knowledgeGraph.get(key)!.add(object);
  }
  
  // 查询知识图谱
  queryGraph(subject: string, predicate: string): string[] {
    const key = `${subject}:${predicate}`;
    return Array.from(this.knowledgeGraph.get(key) || []);
  }
}

export { SemanticMemory };

Step 4: Implement Procedural Memory

Procedural memory learns and refines skills:

// procedural-memory.ts
import { LongTermMemory, Memory } from "./memory-system";

class ProceduralMemory {
  private memory: LongTermMemory;
  private skillRegistry: Map<string, Skill> = new Map();
  
  constructor(memory: LongTermMemory) {
    this.memory = memory;
  }
  
  // 存储技能
  async storeSkill(
    name: string,
    steps: string[],
    successRate?: number
  ): Promise<void> {
    const skill: Skill = {
      name,
      steps,
      successRate: successRate || 0,
      attempts: 0,
      lastUsed: Date.now()
    };
    
    this.skillRegistry.set(name, skill);
    
    const memory: Memory = {
      content: `Skill: ${name}\nSteps:\n${steps.map((s, i) => `${i + 1}. ${s}`).join("\n")}`,
      type: "procedural",
      importance: 1.0 + (successRate || 0),
      tags: ["skill", name.toLowerCase()],
      timestamp: Date.now()
    };
    
    await this.memory.store(memory);
  }
  
  // 更新技能（从反馈中学习）
  async updateSkill(
    name: string,
    success: boolean,
    feedback?: string
  ): Promise<void> {
    const skill = this.skillRegistry.get(name);
    if (!skill) return;
    
    skill.attempts += 1;
    skill.successRate = (
      skill.successRate * (skill.attempts - 1) + (success ? 1 : 0)
    ) / skill.attempts;
    skill.lastUsed = Date.now();
    
    // 如果失败，记录反馈以供改进
    if (!success && feedback) {
      const memory: Memory = {
        content: `Skill "${name}" failed: ${feedback}`,
        type: "procedural",
        importance: 1.5, // 失败记录很重要
        tags: ["skill", "error", name.toLowerCase()],
        timestamp: Date.now()
      };
      
      await this.memory.store(memory);
    }
  }
  
  // 检索最佳技能
  async getBestSkill(task: string): Promise<Skill | null> {
    const memories = await this.memory.retrieve(task, {
      limit: 3,
      type: "procedural",
      tags: ["skill"]
    });
    
    if (memories.length === 0) return null;
    
    // 从记忆中解析技能名称
    const skillName = this.extractSkillName(memories[0].content);
    return this.skillRegistry.get(skillName) || null;
  }
  
  private extractSkillName(content: string): string {
    const match = content.match(/^Skill: (.+)$/m);
    return match ? match[1] : "";
  }
}

interface Skill {
  name: string;
  steps: string[];
  successRate: number;
  attempts: number;
  lastUsed: number;
}

export { ProceduralMemory, Skill };

Step 5: Wire It into the Agent

Bring all three memory types together inside the Agent:

// agent-with-memory.ts
import { LongTermMemory } from "./memory-system";
import { EpisodicMemory } from "./episodic-memory";
import { SemanticMemory } from "./semantic-memory";
import { ProceduralMemory } from "./procedural-memory";

class AgentWithMemory {
  private longTermMemory: LongTermMemory;
  private episodic: EpisodicMemory;
  private semantic: SemanticMemory;
  private procedural: ProceduralMemory;
  
  constructor() {
    // 初始化记忆系统
    this.longTermMemory = new LongTermMemory({
      collectionName: "agent-memory",
      embeddingModel: "text-embedding-3-small"
    });
    
    this.episodic = new EpisodicMemory(this.longTermMemory);
    this.semantic = new SemanticMemory(this.longTermMemory);
    this.procedural = new ProceduralMemory(this.longTermMemory);
  }
  
  // 处理用户消息
  async chat(userMessage: string): Promise<string> {
    // 1. 检索相关记忆
    const relevantMemories = await this.recallRelevantContext(userMessage);
    
    // 2. 构建增强上下文
    const contextPrompt = this.buildContextPrompt(relevantMemories);
    
    // 3. 调用 LLM（这里省略实际调用）
    const agentResponse = await this.callLLM(userMessage, contextPrompt);
    
    // 4. 存储对话
    await this.episodic.storeConversation(userMessage, agentResponse);
    
    // 5. 提取并存储知识
    await this.extractAndStoreKnowledge(userMessage, agentResponse);
    
    return agentResponse;
  }
  
  private async recallRelevantContext(query: string): Promise<string[]> {
    // 混合检索：同时查询三种记忆
    const [episodicMemories, semanticMemories, proceduralMemories] = await Promise.all([
      this.episodic.recallSimilarConversations(query, 2),
      this.semantic.recallKnowledge(query),
      this.procedural.getBestSkill(query)
    ]);
    
    const context: string[] = [];
    
    // 添加相关对话
    if (episodicMemories.length > 0) {
      context.push("## Recent Conversations:");
      context.push(...episodicMemories.map(m => m.content));
    }
    
    // 添加相关知识
    if (semanticMemories.length > 0) {
      context.push("## Relevant Knowledge:");
      context.push(...semanticMemories.map(m => m.content));
    }
    
    // 添加相关技能
    if (proceduralMemories) {
      context.push("## Best Practice:");
      context.push(
        proceduralMemories.steps.map((s, i) => `${i + 1}. ${s}`).join("\n")
      );
    }
    
    return context;
  }
  
  private buildContextPrompt(memories: string[]): string {
    if (memories.length === 0) return "";
    
    return [
      "# Context from Long-Term Memory",
      "",
      ...memories,
      "",
      "Use the above context to provide a more personalized and informed response."
    ].join("\n");
  }
  
  private async callLLM(
    userMessage: string,
    context: string
  ): Promise<string> {
    // 实际项目中调用 OpenAI/Anthropic API
    // 这里返回模拟响应
    return `Response to: ${userMessage} (with context: ${context.length} chars)`;
  }
  
  private async extractAndStoreKnowledge(
    userMessage: string,
    agentResponse: string
  ): Promise<void> {
    // 检测用户偏好
    const preferenceMatch = userMessage.match(/记住|偏好|喜欢|使用(.+)而不是(.+)/);
    if (preferenceMatch) {
      await this.semantic.storePreference(userMessage);
    }
    
    // 检测新技能
    const skillMatch = userMessage.match(/如何|怎么|步骤/);
    if (skillMatch && agentResponse.includes("1.")) {
      const steps = agentResponse.match(/\d+\.\s+(.+)/g) || [];
      if (steps.length > 0) {
        await this.procedural.storeSkill(
          "Extracted Skill",
          steps.map(s => s.replace(/^\d+\.\s+/, ""))
        );
      }
    }
  }
}

export { AgentWithMemory };

Production Optimization

Performance

1. Index Optimization

// 为常用查询字段创建索引
await vectorStore.createIndex({
  field: "metadata.type",
  indexType: "exact"
});

await vectorStore.createIndex({
  field: "metadata.timestamp",
  indexType: "range"
});

2. Caching Strategy

class MemoryCacheLayer {
  private cache: Map<string, { result: Memory[]; expiry: number }> = new Map();
  private cacheTTL: number = 5 * 60 * 1000; // 5 分钟
  
  async retrieve(
    query: string,
    fetcher: () => Promise<Memory[]>
  ): Promise<Memory[]> {
    const cached = this.cache.get(query);
    
    if (cached && cached.expiry > Date.now()) {
      return cached.result;
    }
    
    const result = await fetcher();
    this.cache.set(query, {
      result,
      expiry: Date.now() + this.cacheTTL
    });
    
    return result;
  }
  
  invalidate(query?: string): void {
    if (query) {
      this.cache.delete(query);
    } else {
      this.cache.clear();
    }
  }
}

3. Batch Processing

// 批量存储记忆，减少 API 调用
async function batchStoreMemories(memories: Memory[]): Promise<void> {
  const docs = memories.map(m => new Document({
    pageContent: m.content,
    metadata: {
      type: m.type,
      timestamp: m.timestamp || Date.now(),
      importance: m.importance || 1.0,
      tags: m.tags || []
    }
  }));
  
  // 一次性批量插入
  await vectorStore.addDocuments(docs);
}

Cost Control

Optimizing Embedding API calls:

class EmbeddingCache {
  private cache: Map<string, number[]> = new Map();
  
  async getEmbedding(text: string): Promise<number[]> {
    // 1. 检查缓存
    const hash = this.hashText(text);
    if (this.cache.has(hash)) {
      return this.cache.get(hash)!;
    }
    
    // 2. 调用 API
    const embedding = await this.embeddings.embedQuery(text);
    
    // 3. 存入缓存
    this.cache.set(hash, embedding);
    
    return embedding;
  }
  
  private hashText(text: string): string {
    // 使用简单哈希（生产环境使用 crypto）
    return text.toLowerCase().replace(/\s+/g, " ");
  }
}

// 成本节省示例：
// 假设每天 1000 次查询，其中 30% 重复
// - 无缓存：1000 × $0.02 / 1M = $0.00002
// - 有缓存：700 × $0.02 / 1M = $0.000014
// 节省：30% API 调用

Error Handling and Graceful Degradation

Production systems must handle all kinds of failures: vector database connection errors, Embedding API rate limits, and so on.

1. A Complete Error-Handling Implementation

class RobustLongTermMemory {
  private vectorStore: Chroma;
  private fallbackMode: boolean = false;
  private localCache: Map<string, Memory[]> = new Map();
  private retryQueue: Memory[] = [];
  
  async store(memory: Memory): Promise<void> {
    try {
      // 尝试存储到向量数据库
      const doc = new Document({
        pageContent: memory.content,
        metadata: {
          type: memory.type,
          timestamp: Date.now(),
          importance: memory.importance || 1.0,
          tags: memory.tags || []
        }
      });
      
      await this.vectorStore.addDocuments([doc]);
      
      // 成功后退出 fallback 模式
      if (this.fallbackMode) {
        console.log("向量存储已恢复，退出 fallback 模式");
        this.fallbackMode = false;
        await this.processRetryQueue();
      }
    } catch (error) {
      console.error("向量存储失败，启用 fallback 模式:", error);
      
      // 降级策略：存入本地缓存
      this.fallbackMode = true;
      const key = `fallback_${Date.now()}`;
      this.localCache.set(key, [memory]);
      
      // 加入重试队列
      this.retryQueue.push(memory);
      
      // 异步重试（5 秒后）
      setTimeout(() => this.retryStore(memory), 5000);
    }
  }
  
  async retrieve(query: string, options?: RetrievalOptions): Promise<Memory[]> {
    // 如果在 fallback 模式，使用本地缓存
    if (this.fallbackMode) {
      console.warn("使用 fallback 缓存检索");
      return this.fallbackRetrieve(query, options?.limit || 5);
    }
    
    try {
      const results = await this.vectorStore.similaritySearchWithScore(
        query,
        options?.limit || 5
      );
      
      return this.processResults(results);
    } catch (error) {
      console.error("向量检索失败，降级到本地缓存:", error);
      this.fallbackMode = true;
      return this.fallbackRetrieve(query, options?.limit || 5);
    }
  }
  
  private fallbackRetrieve(query: string, limit: number): Memory[] {
    // 简单的关键词匹配（降级方案）
    const allMemories = Array.from(this.localCache.values()).flat();
    const keywords = query.toLowerCase().split(/\s+/);
    
    return allMemories
      .filter(m => 
        keywords.some(kw => m.content.toLowerCase().includes(kw))
      )
      .slice(0, limit);
  }
  
  private async retryStore(memory: Memory): Promise<void> {
    try {
      await this.store(memory);
      console.log("重试存储成功");
      
      // 从重试队列移除
      const index = this.retryQueue.indexOf(memory);
      if (index > -1) {
        this.retryQueue.splice(index, 1);
      }
    } catch (error) {
      console.error("重试存储失败，保留在缓存中");
    }
  }
  
  private async processRetryQueue(): Promise<void> {
    console.log(`处理重试队列，共 ${this.retryQueue.length} 条记忆`);
    
    for (const memory of this.retryQueue) {
      await this.retryStore(memory);
    }
  }
  
  private processResults(
    results: Array<{ document: Document; score: number }>
  ): Memory[] {
    return results.map(r => ({
      content: r.document.pageContent,
      type: r.document.metadata.type,
      score: r.score,
      timestamp: r.document.metadata.timestamp,
      tags: r.document.metadata.tags
    }));
  }
}

2. Health Checks and Auto-Recovery

class MemoryHealthMonitor {
  private isHealthy: boolean = true;
  private lastCheckTime: number = 0;
  private checkInterval: number = 60 * 1000; // 1 分钟
  
  async checkHealth(memory: RobustLongTermMemory): Promise<boolean> {
    const now = Date.now();
    
    if (now - this.lastCheckTime < this.checkInterval) {
      return this.isHealthy;
    }
    
    try {
      // 测试写入
      await memory.store({
        content: "health_check_test",
        type: "episodic",
        timestamp: now
      });
      
      // 测试读取
      await memory.retrieve("health_check_test", { limit: 1 });
      
      this.isHealthy = true;
      this.lastCheckTime = now;
      return true;
    } catch (error) {
      console.error("健康检查失败:", error);
      this.isHealthy = false;
      this.lastCheckTime = now;
      return false;
    }
  }
}

Degradation strategies at a glance:

Failure type	Fallback	Recovery mechanism
Vector database unavailable	Local cache (keyword matching)	Periodic retry + automatic sync
Embedding API rate limited	Use cached embeddings	Retry with exponential backoff
Disk space exhausted	Automatically prune old memories	Monitoring + alerting
Retrieval timeout	Return cached results	Narrow the search scope

Privacy and Security

1. Sensitive Data Filtering

class PrivacyFilter {
  private sensitivePatterns = [
    /\b\d{3}-\d{2}-\d{4}\b/g,           // SSN
    /\b\d{16}\b/g,                      // Credit card
    /[a-zA-Z0-9]{32,}/g,                // API keys
    /password[:=]\s*\S+/gi              // Passwords
  ];
  
  sanitize(text: string): string {
    let sanitized = text;
    
    for (const pattern of this.sensitivePatterns) {
      sanitized = sanitized.replace(pattern, "[REDACTED]");
    }
    
    return sanitized;
  }
}

// 在存储前过滤
async function storeMemory(content: string): Promise<void> {
  const filter = new PrivacyFilter();
  const sanitized = filter.sanitize(content);
  
  await memory.store({
    content: sanitized,
    type: "episodic",
    timestamp: Date.now()
  });
}

2. Access Control

class MemoryAccessControl {
  private userPermissions: Map<string, Set<string>> = new Map();
  
  async retrieve(
    userId: string,
    query: string
  ): Promise<Memory[]> {
    // 只检索用户有权限的记忆
    const permissions = this.userPermissions.get(userId) || new Set();
    
    const results = await memory.retrieve(query);
    
    return results.filter(m => 
      !m.tags || m.tags.some(tag => permissions.has(tag))
    );
  }
}

Monitoring and Debugging

class MemoryMonitor {
  private metrics = {
    storeCount: 0,
    retrieveCount: 0,
    averageRetrievalTime: 0,
    cacheHitRate: 0
  };
  
  async trackRetrieval<T>(
    operation: () => Promise<T>
  ): Promise<T> {
    const start = Date.now();
    
    try {
      const result = await operation();
      
      this.metrics.retrieveCount += 1;
      const duration = Date.now() - start;
      
      // 更新平均检索时间
      this.metrics.averageRetrievalTime = (
        this.metrics.averageRetrievalTime * (this.metrics.retrieveCount - 1) +
        duration
      ) / this.metrics.retrieveCount;
      
      return result;
    } catch (error) {
      console.error("Memory retrieval error:", error);
      throw error;
    }
  }
  
  getMetrics() {
    return this.metrics;
  }
}

A Real-World Case: A Personal Assistant Agent

Let’s build a complete example: a personal assistant that remembers the user’s preferences.

Scenario Design

Day 1:
User: I like VS Code, not Vim
Agent: Got it, noted. You prefer VS Code.

Day 7:
User: Recommend a code editor for me
Agent: Based on your earlier preference, I recommend VS Code. You mentioned you like it better than Vim.

Day 30:
User: What went wrong with the last deployment?
Agent: On February 13, the deployment failed because the dist folder wasn't committed. We later updated the deployment script to add an automatic check.

Full Code

// personal-assistant.ts
import { AgentWithMemory } from "./agent-with-memory";

class PersonalAssistant extends AgentWithMemory {
  async handleUserMessage(message: string): Promise<string> {
    // 特殊处理：偏好设置
    if (this.isPreferenceSetting(message)) {
      return await this.handlePreference(message);
    }
    
    // 特殊处理：回忆请求
    if (this.isRecallRequest(message)) {
      return await this.handleRecall(message);
    }
    
    // 普通对话
    return await this.chat(message);
  }
  
  private isPreferenceSetting(message: string): boolean {
    return /喜欢|偏好|使用.+而不是/.test(message);
  }
  
  private async handlePreference(message: string): Promise<string> {
    // 提取偏好
    const match = message.match(/喜欢|偏好|使用(.+)而不是(.+)/);
    if (!match) {
      await this.semantic.storePreference(message);
      return "好的，已记住你的偏好。";
    }
    
    const [_, preferred, notPreferred] = match;
    
    await this.semantic.storeKnowledge(
      "User",
      "prefers",
      preferred.trim()
    );
    
    await this.semantic.storeKnowledge(
      "User",
      "dislikes",
      notPreferred.trim()
    );
    
    return `好的，已记住。你偏好 ${preferred.trim()}，不喜欢 ${notPreferred.trim()}。`;
  }
  
  private isRecallRequest(message: string): boolean {
    return /上次|之前|记得|还记得/.test(message);
  }
  
  private async handleRecall(message: string): Promise<string> {
    const memories = await this.episodic.recallSimilarConversations(message, 5);
    
    if (memories.length === 0) {
      return "抱歉，我没有找到相关的记忆。";
    }
    
    // 格式化记忆输出
    const formatted = memories.map((m, i) => {
      const date = new Date(m.timestamp!);
      return `${i + 1}. ${date.toLocaleDateString()} - ${m.content}`;
    }).join("\n\n");
    
    return `我找到了这些相关记忆：\n\n${formatted}`;
  }
}

// 使用示例
const assistant = new PersonalAssistant();

// 第 1 天
await assistant.handleUserMessage("我喜欢用 VS Code，不喜欢 Vim");
// → "好的，已记住。你偏好 VS Code，不喜欢 Vim。"

// 第 7 天
await assistant.handleUserMessage("帮我推荐一个代码编辑器");
// → "基于你之前的偏好，我推荐 VS Code..."

// 第 30 天
await assistant.handleUserMessage("上次部署出了什么问题？");
// → "我找到了这些相关记忆：1. 2月13日 - ..."

Where This Is Heading

1. Adaptive Forgetting

Not every memory deserves to live forever. Low-value memories need periodic cleanup:

class AdaptiveForgetfulness {
  private memory: LongTermMemory;
  private retentionPolicy = {
    minImportance: 1.5,        // 重要度 < 1.5 可删除
    maxAge: 90,                // 最多保留 90 天
    maxCount: 10000,           // 最多保留 10000 条
    accessThreshold: 30        // 30 天内未访问可删除
  };
  
  async pruneMemories(): Promise<void> {
    console.log("开始记忆清理...");
    
    // 1. 获取所有记忆（分批获取避免内存溢出）
    const allMemories = await this.getAllMemories();
    console.log(`总记忆数：${allMemories.length}`);
    
    // 2. 筛选待删除的记忆
    const now = Date.now();
    const ageThreshold = now - this.retentionPolicy.maxAge * 24 * 60 * 60 * 1000;
    const accessThreshold = now - this.retentionPolicy.accessThreshold * 24 * 60 * 60 * 1000;
    
    const toDelete = allMemories.filter(m => {
      // 重要记忆不删除
      if (m.importance && m.importance >= this.retentionPolicy.minImportance) {
        return false;
      }
      
      // 过于陈旧（超过 90 天）
      if (m.timestamp && m.timestamp < ageThreshold) {
        return true;
      }
      
      // 长时间未访问（30 天内未访问）
      if (m.lastAccessed && m.lastAccessed < accessThreshold) {
        return true;
      }
      
      return false;
    });
    
    // 3. 如果总数超过限制，删除最旧的
    if (allMemories.length > this.retentionPolicy.maxCount) {
      const excess = allMemories.length - this.retentionPolicy.maxCount;
      const sorted = allMemories
        .filter(m => !toDelete.includes(m))
        .sort((a, b) => (a.timestamp || 0) - (b.timestamp || 0));
      
      toDelete.push(...sorted.slice(0, excess));
    }
    
    // 4. 执行删除
    console.log(`待删除记忆数：${toDelete.length}`);
    for (const memory of toDelete) {
      await this.deleteMemory(memory.id);
    }
    
    console.log(`清理完成，删除了 ${toDelete.length} 条记忆`);
    console.log(`剩余记忆数：${allMemories.length - toDelete.length}`);
  }
  
  private async getAllMemories(): Promise<Memory[]> {
    // 分批获取所有记忆（避免一次性加载过多）
    const batchSize = 1000;
    const allMemories: Memory[] = [];
    let offset = 0;
    
    while (true) {
      const batch = await this.memory.vectorStore.getDocuments({
        offset,
        limit: batchSize
      });
      
      if (batch.length === 0) break;
      
      allMemories.push(...batch.map(doc => ({
        id: doc.id,
        content: doc.pageContent,
        type: doc.metadata.type,
        timestamp: doc.metadata.timestamp,
        importance: doc.metadata.importance,
        lastAccessed: doc.metadata.lastAccessed,
        tags: doc.metadata.tags
      })));
      
      offset += batchSize;
      
      if (batch.length < batchSize) break;
    }
    
    return allMemories;
  }
  
  private async deleteMemory(id: string): Promise<void> {
    await this.memory.vectorStore.delete({ ids: [id] });
  }
}

// 使用示例：定时清理
setInterval(async () => {
  const forgetter = new AdaptiveForgetfulness();
  await forgetter.pruneMemories();
}, 7 * 24 * 60 * 60 * 1000); // 每周清理一次

Multiple Agents sharing one memory pool:

class SharedMemoryPool {
  private pools: Map<string, LongTermMemory> = new Map();
  
  async shareMemory(
    fromAgent: string,
    toAgent: string,
    memoryId: string
  ): Promise<void> {
    const sourceMemory = this.pools.get(fromAgent);
    const targetMemory = this.pools.get(toAgent);
    
    if (!sourceMemory || !targetMemory) return;
    
    // 复制记忆到目标 Agent
    const memory = await sourceMemory.retrieve(memoryId);
    await targetMemory.store(memory[0]);
  }
}

3. Memory Visualization

Helping users understand what the Agent has remembered:

interface MemoryVisualization {
  timeline: {
    date: string;
    events: Memory[];
  }[];
  
  knowledgeGraph: {
    nodes: { id: string; label: string }[];
    edges: { from: string; to: string; label: string }[];
  };
  
  skills: {
    name: string;
    successRate: number;
    lastUsed: Date;
  }[];
}

Closing Thoughts

Long-term memory isn’t an optional feature for AI Agents — it’s the prerequisite for upgrading from a “tool” to a “partner.”

An Agent with memory can:

Understand context: no need to re-explain the project background every time
Learn preferences: automatically adapt to how you work
Avoid mistakes: remember past failures and not repeat them
Keep improving: learn from every interaction and get better

When your Agent can accurately recall your preferences three months later, you have a true “AI partner.”

Related reading:

AI Agent Memory Systems in Practice: OpenClaw Memory Best Practices - If you’re using the OpenClaw framework, this article covers configuring and tuning its built-in memory tools.

Further reading: