Architecture Overview — Epitome Docs

System design, key decisions, and data flow in Epitome.

System Overview

Epitome follows a monolithic server architecture. A single Hono application serves the REST API, MCP server, OAuth endpoints, and static dashboard assets. PostgreSQL is the sole database, handling structured data, vector embeddings (via pgvector), graph relationships, metadata, and audit logs.

This deliberate simplicity reduces operational overhead, eliminates inter-service communication complexity, and makes self-hosting straightforward. The entire system can run on a single server or container.

┌──────────────────────────────────────────────────────┐
│                      AI Agents                       │
│         (Claude, ChatGPT, custom bots, etc.)         │
└────────────────┬────────────────────┬────────────────┘
                 │ MCP                │ REST API
                 │ (Streamable HTTP)  │
                 ▼                    ▼
┌──────────────────────────────────────────────────────┐
│               Hono Server (Node.js 22)               │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌──────────────────┐    │
│  │ MCP      │  │ REST     │  │ Auth (OAuth,     │    │
│  │ Server   │  │ Routes   │  │ Sessions, Keys)  │    │
│  └────┬─────┘  └────┬─────┘  └────────┬─────────┘    │
│       │             │                 │              │
│  ┌────▼─────────────▼─────────────────▼──────────┐   │
│  │                 Service Layer                 │   │
│  │  (Profile, Tables, Vectors, Graph, Activity)  │   │
│  └───────────────────┬───────────────────────────┘   │
│                      │                               │
│  ┌───────────────────▼───────────────────────────┐   │
│  │           Drizzle ORM + postgres.js           │   │
│  └───────────────────┬───────────────────────────┘   │
└──────────────────────┼───────────────────────────────┘
                       │
┌──────────────────────▼───────────────────────────────┐
│             PostgreSQL 17 + pgvector 0.8             │
│                                                      │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────┐   │
│  │ shared   │  │ user_abc123  │  │ user_def456   │   │
│  │ schema   │  │ schema       │  │ schema        │   │
│  │ (users,  │  │ (profile,    │  │ (profile,     │   │
│  │ accounts │  │  vectors,    │  │  vectors,     │   │
│  │ sessions)│  │  graph,      │  │  graph,       │   │
│  │          │  │  tables,     │  │  tables,      │   │
│  │          │  │  activity)   │  │  activity)    │   │
│  └──────────┘  └──────────────┘  └───────────────┘   │
└──────────────────────────────────────────────────────┘

Key Architectural Decisions

The following decisions shape the system and are documented in the tradeoff register in the tech spec. Understanding them is important for contributors and self-hosters.

Hono over Express / Fastify

Hono is lightweight, built on web standards (Request/Response), and has first-class TypeScript support. It runs on Node.js, Deno, Bun, and Cloudflare Workers, giving us deployment flexibility. Its middleware system is simpler than Express's, and it carries no legacy baggage. The framework is fast and has zero dependencies.

PostgreSQL for Everything (No Redis, No Mongo, No Pinecone)

Using a single database eliminates operational complexity. PostgreSQL handles structured data natively. pgvector provides vector similarity search that is fast enough for personal-scale data (thousands, not billions, of vectors). PostgreSQL's JSONB columns handle schema-flexible data. The graph is modeled with entities and edges tables, queried with recursive CTEs. One backup covers everything.
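
The recursive-CTE traversal over the edges table computes the same set a breadth-first walk computes over an adjacency list. A minimal in-memory sketch of that equivalence (the edge shape is hypothetical, not Epitome's actual schema):

```typescript
// Hypothetical edge shape mirroring a row in an edges table.
type Edge = { source: string; target: string };

// Breadth-first traversal over an in-memory edge list: the same set a
// recursive CTE would compute by repeatedly joining edges against the
// previously reached frontier, e.g.
//
//   WITH RECURSIVE reachable(node) AS (
//     SELECT target FROM edges WHERE source = $1
//     UNION
//     SELECT e.target FROM edges e JOIN reachable r ON e.source = r.node
//   )
//   SELECT node FROM reachable;
function reachableFrom(edges: Edge[], start: string): Set<string> {
  const seen = new Set<string>();
  let frontier = [start];
  while (frontier.length > 0) {
    const next: string[] = [];
    for (const edge of edges) {
      if (frontier.includes(edge.source) && !seen.has(edge.target)) {
        seen.add(edge.target);
        next.push(edge.target);
      }
    }
    frontier = next;
  }
  return seen;
}
```

Because PostgreSQL can express this traversal as a single query, no separate graph database is needed.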

Per-User Schemas over Row-Level Security (RLS)

Each user gets their own PostgreSQL schema (e.g., user_abc123). This provides hard data isolation — a bug in one query cannot leak another user's data. It also simplifies indexing (no composite indexes with user_id), makes per-user backups trivial, and allows clean data deletion. The tradeoff is slightly more complex connection handling (SET LOCAL search_path in transactions) and an upper bound of roughly 10,000 users per database before schema management overhead becomes noticeable.
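
Because the schema name is interpolated into the SET LOCAL search_path statement, it must never contain anything beyond the expected characters. A hypothetical guard illustrating the idea (not Epitome's actual helper):

```typescript
// Hypothetical helper: derive a schema name like "user_abc123" and reject
// anything that could smuggle SQL into a SET LOCAL search_path statement.
function userSchemaName(userId: string): string {
  // Allow only lowercase alphanumerics, matching the user_abc123 convention.
  if (!/^[a-z0-9]+$/.test(userId)) {
    throw new Error(`invalid user id: ${userId}`);
  }
  return `user_${userId}`;
}
```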

Streamable HTTP for MCP Transport

The hosted MCP service uses Streamable HTTP transport rather than stdio. This allows agents to connect over the network without needing a local process. The transport supports both request-response patterns and server-sent events for streaming. Agents authenticate via their MCP URL token.
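
On the wire, a Streamable HTTP tool call is a JSON-RPC 2.0 message POSTed to the /mcp endpoint. A sketch of the request body an agent would send (the field values here are hypothetical):

```typescript
// Build a JSON-RPC 2.0 request for an MCP tools/call, sent as the body of an
// HTTP POST to /mcp (the MCP URL token travels in the Authorization header).
function buildToolCall(id: number, tool: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

const request = buildToolCall(1, "store_memory", {
  content: "Prefers morning meetings",
  collection: "facts",
});
```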

Drizzle over Prisma

Drizzle provides type-safe SQL with a minimal abstraction layer and, critically, a raw SQL escape hatch via tagged template literals. This is essential because some queries (vector similarity search, graph traversals, dynamic user-table queries) cannot be expressed cleanly in any ORM's query builder. Prisma's $queryRaw exists but loses type safety. With Drizzle + postgres.js, we get the best of both worlds.
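
The safety of the escape hatch comes from how tagged templates work: the SQL text and the interpolated values reach the tag function separately, so values can be bound as parameters instead of concatenated into the string. A toy tag function illustrating the mechanism (not the real postgres.js implementation):

```typescript
// Toy illustration of how a tagged template separates SQL text from values.
// postgres.js does this internally, binding each value as $1, $2, ...
function toySql(strings: TemplateStringsArray, ...values: unknown[]) {
  const text = strings.reduce(
    (acc, part, i) => acc + (i > 0 ? `$${i}` : "") + part,
    "",
  );
  return { text, params: values };
}

const limit = 5;
const query = toySql`SELECT * FROM profile ORDER BY version DESC LIMIT ${limit}`;
// query.text   → "SELECT * FROM profile ORDER BY version DESC LIMIT $1"
// query.params → [5]
```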

React SPA over Next.js

The dashboard is a client-only React SPA. There is no need for server-side rendering — the dashboard is behind authentication, not indexed by search engines, and communicates entirely via the REST API. A Vite-built SPA is simpler to deploy (static files + API) and avoids the complexity of Next.js's server components, caching, and build system.

Service Layer

The codebase follows a clean separation between routes, services, and database access. Routes handle HTTP concerns (parsing requests, sending responses). Services contain business logic. Database access goes through Drizzle ORM and raw postgres.js queries.

text
Route Handler (Hono)
  │
  ├── Validates request with Zod
  ├── Extracts user context from auth middleware
  │
  ▼
Service Layer
  │
  ├── Implements business logic
  ├── Calls database via withUserSchema(userId, async (tx) => {...})
  ├── Triggers async side effects (entity extraction, etc.)
  │
  ▼
Database (postgres.js + Drizzle)
  │
  ├── SET LOCAL search_path = 'user_abc123', public;
  ├── Execute queries within transaction
  └── Return typed results

The withUserSchema() utility wraps all per-user database operations in a transaction that sets the PostgreSQL search path to the user's schema. This ensures queries automatically resolve to the correct tables without explicit schema prefixes.

typescript
// Example service function
async function getProfile(userId: string) {
  return withUserSchema(userId, async (tx) => {
    const [profile] = await tx`
      SELECT version, confidence, data, updated_at
      FROM profile
      ORDER BY version DESC
      LIMIT 1
    `;
    return profile;
  });
}

Important pattern: Never nest withUserSchema() calls. If a service function needs to be called from within an existing transaction, use the *Internal(tx, ...) variant that accepts an existing transaction object.

Entity Extraction Pipeline

When a memory is stored via store_memory or the vectors API, the system triggers an asynchronous entity extraction pipeline. The API returns immediately — extraction happens in the background.
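
A fire-and-forget helper is enough to get this behavior: start the promise, never await it in the request path, and contain its errors. A minimal sketch (the helper name is hypothetical):

```typescript
// Hypothetical helper: start background work without delaying the response.
// The promise is deliberately not awaited; failures are logged, never thrown
// back into the request handler.
function triggerBackground(task: () => Promise<void>): void {
  task().catch((err) => {
    console.error("background task failed:", err);
  });
}
```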

text
1. Memory Stored (store_memory / POST /v1/vectors/:collection)
   │
   ▼
2. Content embedded (text-embedding-3-small) → vector stored in pgvector
   │
   ▼
3. Async: Entity extraction triggered (background)
   │
   ├── Content sent to gpt-5-mini with structured output schema
   ├── Response parsed: entities[] with {name, type, properties}
   │                    and edges[] with {source, target, type, properties}
   │
   ▼
4. Deduplication
   │
   ├── For each extracted entity:
   │   ├── Fuzzy match against existing entities (pg_trgm similarity)
   │   ├── If match > 0.8: merge with existing entity
   │   └── If no match: create new entity
   │
   ├── For each extracted edge:
   │   ├── Check for existing edge with same source, target, type
   │   └── If exists: update properties; else: create new edge
   │
   ▼
5. Entity mentions recorded (links entity ↔ source vector entry)
   │
   ▼
6. Graph statistics updated

The extraction uses OpenAI's Responses API with structured output (JSON schema with strict: true) to ensure the model returns well-formed entity and edge data. The deduplication step uses PostgreSQL's pg_trgm extension for fuzzy string matching, preventing duplicate entities for slight name variations (e.g., "Bob" vs "Bobby" vs "Robert").
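
pg_trgm scores similarity as the number of shared trigrams divided by the total number of distinct trigrams. A TypeScript approximation of that metric (pg_trgm lowercases and pads strings before splitting, as sketched here; exact PostgreSQL scores may differ slightly):

```typescript
// Approximation of pg_trgm's similarity(): shared trigrams / union of trigrams.
function trigrams(s: string): Set<string> {
  const padded = `  ${s.toLowerCase()} `; // pg_trgm pads before splitting
  const grams = new Set<string>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    grams.add(padded.slice(i, i + 3));
  }
  return grams;
}

function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  return shared / (ta.size + tb.size - shared);
}

const score = similarity("Bob", "Bobby"); // ≈ 0.43 with this approximation
```

Trigram similarity catches spelling variants ("Bob" vs "Bobby") but scores unrelated aliases ("Bob" vs "Robert") near zero, so the extraction model still has to normalize such names before deduplication can link them.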

Data Flow

Here is the complete lifecycle of a request from an AI agent through the MCP server:

text
AI Agent (e.g., Claude Desktop)
  │
  │  MCP tool call: store_memory({content: "...", collection: "facts"})
  │  Transport: Streamable HTTP POST to /mcp
  │
  ▼
Hono Server
  │
  ├── 1. Parse MCP request, extract tool name + arguments
  ├── 2. Authenticate: validate MCP token → resolve user_id + agent_id
  ├── 3. Consent check: does this agent have 'vectors' write permission?
  │      └── If no: return CONSENT_REQUIRED error
  ├── 4. Rate limit check: is this agent under its request quota?
  │      └── If no: return RATE_LIMITED error
  │
  ▼
MCP Tool Handler (store_memory)
  │
  ├── 5. Validate arguments with Zod schema
  ├── 6. Call VectorService.store(userId, content, collection, metadata)
  │
  ▼
Vector Service
  │
  ├── 7. Generate embedding via OpenAI text-embedding-3-small
  ├── 8. withUserSchema(userId) → INSERT into vector_entries
  ├── 9. Log activity: {agent_id, action: "store_memory", details: {...}}
  ├── 10. Trigger async entity extraction (non-blocking)
  │
  ▼
Response
  │
  ├── 11. Return MCP tool result: {id, content, collection, confidence, ...}
  └── 12. Agent receives result, continues conversation
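
Steps 2-4 above form an ordered gate: authentication resolves the caller, then consent is checked before the rate limit, and the first failure short-circuits. A sketch of that ordering (the error codes come from the diagram; the types and helper are hypothetical):

```typescript
// Hypothetical context produced by authentication (step 2).
type AgentContext = {
  userId: string;
  agentId: string;
  permissions: string[]; // e.g., ["vectors"]
  underQuota: boolean;
};

// Gate a tool call the way steps 3-4 in the flow do: consent first, then
// rate limit; the first failing check short-circuits with its error code.
function gateToolCall(ctx: AgentContext, requiredPermission: string): string {
  if (!ctx.permissions.includes(requiredPermission)) return "CONSENT_REQUIRED";
  if (!ctx.underQuota) return "RATE_LIMITED";
  return "OK";
}
```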

The dashboard follows a similar flow but uses REST endpoints with session authentication, which bypasses the consent check (since the user is interacting directly with their own data).

Performance note: Steps 1-9 and 11-12 are synchronous. The embedding generation (step 7) adds roughly 200-400ms of latency per request. Entity extraction (step 10) is fully asynchronous and does not affect response time.

Integrations

Epitome is designed to integrate with agent orchestration platforms. The recommended compute/orchestration layer for local-first deployments is OpenClaw.

OpenClaw + Epitome

OpenClaw runs AI agents locally on your hardware — handling task execution, home automation, and messaging. Agents connect to Epitome via MCP for shared, persistent memory. This means every local agent shares the same knowledge of the user without requiring cloud services.

text
OpenClaw (local compute)
  │
  ├── Agent A (task execution)  ──┐
  ├── Agent B (home automation) ──┼── MCP ──→ Epitome (shared memory)
  └── Agent C (messaging)       ──┘
                                        ├── Profile
                                        ├── Memories
                                        ├── Knowledge Graph
                                        └── Activity Log

For more information, visit openclaw.ai.