Knowledge Base (RAG)
A hybrid retrieval-augmented generation pipeline powered by the memcity component — combining vector search, BM25 keyword matching, knowledge graphs, and episodic memory.
Ingestion Pipeline
Document Intake
Files, text, URLs, or skills are submitted via the dashboard or AI tools.
Retrieval-Aware Chunking
Content is split using a retrieval-optimized strategy that preserves semantic coherence.
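This page does not describe memcity's chunking algorithm. The TypeScript sketch below shows the general idea. It packs whole paragraphs into chunks so that a paragraph is never cut in half. It also repeats a configurable number of trailing paragraphs at the start of the next chunk to preserve context. `chunkText`, `maxChars`, and `overlapParagraphs` are hypothetical names, not memcity's API.

```typescript
// Hypothetical options; memcity's real chunking parameters are not documented here.
interface ChunkOptions {
  maxChars: number; // soft upper bound: a single oversized paragraph stays whole
  overlapParagraphs: number; // trailing paragraphs repeated at the start of the next chunk
}

// Split on blank lines, then pack whole paragraphs into chunks up to maxChars.
function chunkText(text: string, opts: ChunkOptions): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0; // counts paragraph characters only, not separators

  for (const para of paragraphs) {
    if (current.length > 0 && currentLen + para.length > opts.maxChars) {
      chunks.push(current.join("\n\n"));
      // Carry the last few paragraphs forward (slice(-0) would copy everything, so guard it).
      current = opts.overlapParagraphs > 0 ? current.slice(-opts.overlapParagraphs) : [];
      currentLen = current.reduce((n, p) => n + p.length, 0);
    }
    current.push(para);
    currentLen += para.length;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```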
Jina Embeddings v4
Each chunk receives a 1024-dimensional contextual embedding and is added to a BM25 keyword index.
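On the keyword side, a standard Okapi BM25 scorer looks roughly like the sketch below. The tokenizer and the common defaults k1 = 1.2 and b = 0.75 are assumptions, because the page does not say how memcity tokenizes text or tunes BM25. The embeddings are scored separately, for example by cosine similarity, and are not shown here.

```typescript
// Lowercase and split on non-alphanumeric characters (an assumed, simplistic tokenizer).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Okapi BM25 score of `query` against each document in `docs`.
function bm25Scores(query: string, docs: string[], k1 = 1.2, b = 0.75): number[] {
  const docTokens = docs.map(tokenize);
  const N = docTokens.length;
  const avgdl = docTokens.reduce((s, d) => s + d.length, 0) / Math.max(N, 1);

  // Document frequency: in how many documents each term appears.
  const df = new Map<string, number>();
  for (const tokens of docTokens) {
    for (const term of Array.from(new Set(tokens))) df.set(term, (df.get(term) ?? 0) + 1);
  }

  const queryTerms = tokenize(query);
  return docTokens.map((tokens) => {
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);

    let score = 0;
    for (const term of queryTerms) {
      const f = tf.get(term) ?? 0;
      if (f === 0) continue;
      const n = df.get(term) ?? 0;
      // IDF variant with +1 inside the log so it never goes negative.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Term-frequency saturation (k1) and document-length normalization (b).
      score += (idf * (f * (k1 + 1))) / (f + k1 * (1 - b + (b * tokens.length) / avgdl));
    }
    return score;
  });
}
```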
Graph + Summaries
Entities and their relationships are extracted into a knowledge graph, and RAPTOR summaries (a tree of recursive summaries built over the chunks) are generated.
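RAPTOR builds its tree bottom-up. Chunks are the leaves, and each higher level summarizes groups of nodes from the level below until a single root remains. Full RAPTOR forms those groups by clustering chunk embeddings, for example with Gaussian mixture models. This simplified, RAPTOR-inspired sketch groups adjacent nodes instead to keep it short. `buildSummaryTree`, `fanOut`, and the `summarize` callback (a stand-in for an LLM call) are hypothetical.

```typescript
// A node in the summary tree: leaves (level 0) are raw chunks.
interface SummaryNode {
  text: string;
  level: number;
  children: SummaryNode[];
}

// Stand-in for an LLM summarization call.
type Summarize = (texts: string[]) => string;

// Build levels bottom-up by summarizing groups of `fanOut` adjacent nodes.
function buildSummaryTree(chunks: string[], fanOut: number, summarize: Summarize): SummaryNode | null {
  if (chunks.length === 0) return null;
  if (fanOut < 2) throw new Error("fanOut must be at least 2");

  let nodes: SummaryNode[] = chunks.map((text) => ({ text, level: 0, children: [] }));
  let depth = 0;
  while (nodes.length > 1) {
    depth++;
    const parents: SummaryNode[] = [];
    for (let i = 0; i < nodes.length; i += fanOut) {
      const group = nodes.slice(i, i + fanOut);
      parents.push({ text: summarize(group.map((n) => n.text)), level: depth, children: group });
    }
    nodes = parents;
  }
  return nodes[0];
}
```

At query time, matching against both leaves and higher-level summaries lets broad questions hit a summary while narrow ones hit a chunk.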
Search Pipeline (searchContext)
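When the AI agent needs information, the searchContext tool triggers a multi-stage retrieval pipeline that combines vector search, BM25 keyword matching, knowledge-graph traversal, and the user's episodic memories.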
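The individual stages are not spelled out here. One common way to merge a vector-search ranking with a BM25 ranking is Reciprocal Rank Fusion (RRF), sketched below. Whether searchContext uses RRF or another fusion method is an assumption. Each ranked list contributes 1 / (k + rank) for every chunk it contains, with k = 60 as a conventional default. Chunks that rank well in both lists therefore rise to the top.

```typescript
// Reciprocal Rank Fusion over any number of ranked lists of chunk IDs.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

For example, fusing a vector ranking `["c1", "c2", "c3"]` with a BM25 ranking `["c2", "c4", "c1"]` puts `c2` first, because it is near the top of both lists.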
Supported Formats
| Format | Category | Notes |
|---|---|---|
| .txt / .md | Text | Best performance. Clean semantic chunks with no formatting overhead. |
| .pdf | Document | Extracted via document processing. Text-first PDFs work best. |
| .docx / .pptx / .xlsx | Office | Microsoft Office formats supported. Content extracted and chunked. |
| .csv / .json | Data | Structured data formats. Great for product catalogs and records. |
| .html | Web | HTML content parsed and cleaned before chunking. |
| URLs | Web | Web pages fetched, content extracted, and ingested automatically. |
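The table above is enough to route an incoming source to the right extractor. The sketch below is a hypothetical `classifySource` helper that maps a URL or file extension to its category. Anything not listed returns null. It is an illustration of the table, not memcity's actual dispatch logic.

```typescript
type FormatCategory = "Text" | "Document" | "Office" | "Data" | "Web";

// Extension-to-category mapping taken from the Supported Formats table.
const CATEGORY_BY_EXTENSION: Record<string, FormatCategory> = {
  ".txt": "Text",
  ".md": "Text",
  ".pdf": "Document",
  ".docx": "Office",
  ".pptx": "Office",
  ".xlsx": "Office",
  ".csv": "Data",
  ".json": "Data",
  ".html": "Web",
};

// URLs are fetched as web pages; files are classified by their (case-insensitive) extension.
function classifySource(source: string): FormatCategory | null {
  if (/^https?:\/\//i.test(source)) return "Web";
  const dot = source.lastIndexOf(".");
  if (dot === -1) return null;
  const ext = source.slice(dot).toLowerCase();
  return CATEGORY_BY_EXTENSION[ext] ?? null;
}
```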
Episodic Memory
Beyond document retrieval, the system maintains per-user episodic memories — facts, preferences, and interaction patterns extracted from conversations.
How It Works
A daily cron job analyzes recent conversations and extracts user-specific information (names, preferences, past requests, interaction styles). These memories are stored per-user in memcity and automatically included in search results for that user, enabling personalized responses that remember context across conversations.
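Below is a minimal sketch of per-user storage and inclusion. It assumes the memories have already been extracted; that extraction would be an LLM step run by the daily cron job and is not shown. `EpisodicMemoryStore` and `withMemories` are hypothetical names. Placing memories ahead of document hits is an illustrative choice, not documented behavior.

```typescript
interface Memory {
  userId: string;
  fact: string; // e.g. a name, preference, or past request
}

interface SearchHit {
  text: string;
  source: "document" | "memory";
}

// In-memory stand-in for per-user memory storage.
class EpisodicMemoryStore {
  private byUser = new Map<string, Memory[]>();

  add(memory: Memory): void {
    const list = this.byUser.get(memory.userId) ?? [];
    list.push(memory);
    this.byUser.set(memory.userId, list);
  }

  forUser(userId: string): Memory[] {
    return this.byUser.get(userId) ?? [];
  }
}

// Merge the caller's own memories with document hits; other users' memories are never read.
function withMemories(store: EpisodicMemoryStore, userId: string, docHits: string[]): SearchHit[] {
  const memoryHits = store.forUser(userId).map((m): SearchHit => ({ text: m.fact, source: "memory" }));
  const documentHits = docHits.map((text): SearchHit => ({ text, source: "document" }));
  return [...memoryHits, ...documentHits];
}
```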
Knowledge Graph
During ingestion, entities (people, products, concepts) and their relationships are automatically extracted and stored in a knowledge graph. During search, the graph is traversed to find connections that pure vector similarity might miss.
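Here is why traversal helps. Suppose one chunk says a product uses a particular sensor, and another chunk says who supplies that sensor. A query about the product can then reach the supplier in two hops, even though no single chunk links the two. The sketch below does a breadth-first expansion from the entities mentioned in a query. The `Edge` shape, undirected traversal, and hop limit are assumptions rather than memcity's documented behavior.

```typescript
interface Edge {
  from: string;
  relation: string;
  to: string;
}

// Breadth-first expansion from seed entities; returns each reached entity with its hop distance.
function expandEntities(edges: Edge[], seeds: string[], maxHops: number): Map<string, number> {
  // Build an undirected adjacency list so relations can be followed in either direction.
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a) ?? [];
    list.push(b);
    adjacency.set(a, list);
  };
  for (const e of edges) {
    link(e.from, e.to);
    link(e.to, e.from);
  }

  const hops = new Map<string, number>();
  seeds.forEach((s) => hops.set(s, 0));
  let frontier = seeds.slice();
  for (let depth = 1; depth <= maxHops && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of adjacency.get(node) ?? []) {
        if (!hops.has(neighbor)) {
          hops.set(neighbor, depth); // first visit is the shortest distance in BFS
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return hops;
}
```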
Architecture
Instance Isolation
Each WhatsApp instance has its own isolated knowledge base. Documents uploaded to one instance are never visible to another. Each contact (WhatsApp user) gets their own memory space for episodic memories.
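One simple way to enforce this isolation is to tag every stored record with a namespace and filter on that namespace at query time. The helpers below are hypothetical, since the page does not describe how memcity scopes data internally. Identifiers are URI-encoded so that an ID containing the delimiter cannot spill into another namespace.

```typescript
// Documents are scoped per instance.
function documentNamespace(instanceId: string): string {
  return `instance:${encodeURIComponent(instanceId)}`;
}

// Episodic memories are scoped per instance and per contact.
function memoryNamespace(instanceId: string, contactId: string): string {
  return `${documentNamespace(instanceId)}/contact:${encodeURIComponent(contactId)}`;
}

interface StoredDoc {
  namespace: string;
  text: string;
}

// Only documents tagged with the caller's instance namespace are visible.
function visibleDocs(all: StoredDoc[], instanceId: string): StoredDoc[] {
  const ns = documentNamespace(instanceId);
  return all.filter((d) => d.namespace === ns);
}
```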
Processing Time
Document processing (chunking, embedding, entity extraction, summarization) runs asynchronously. Small documents complete in seconds; large PDFs may take a few minutes. A document's status changes from pending to complete once all of these steps have finished.
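The asynchronous flow can be modeled as a queue. Submitting a document returns immediately with status pending, and a separate worker step later runs the processing and marks the document complete. This synchronous `IngestionQueue` is a hypothetical sketch. Its `process` callback stands in for chunking, embedding, entity extraction, and summarization.

```typescript
type DocumentStatus = "pending" | "complete";

interface DocumentRecord {
  id: string;
  text: string;
  status: DocumentStatus;
  chunkCount: number;
}

class IngestionQueue {
  private docs = new Map<string, DocumentRecord>();
  private queue: string[] = [];

  // Upload path: record the document as pending and defer the heavy work.
  submit(id: string, text: string): DocumentRecord {
    const record: DocumentRecord = { id, text, status: "pending", chunkCount: 0 };
    this.docs.set(id, record);
    this.queue.push(id);
    return record;
  }

  // One worker tick: process the oldest queued document, then mark it complete.
  runNext(process: (text: string) => string[]): void {
    const id = this.queue.shift();
    if (id === undefined) return;
    const record = this.docs.get(id);
    if (!record) return;
    record.chunkCount = process(record.text).length;
    record.status = "complete";
  }

  status(id: string): DocumentStatus | undefined {
    return this.docs.get(id)?.status;
  }
}
```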