Metric Memory (RAG)

Extending the LLM beyond its static training cutoff via dynamic vector retrieval.

The RAG Pipeline

01 INPUT: Ingestion

PDF/TXT/MD files are parsed and chunked into 500-token segments.
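A minimal sketch of the chunking step, assuming tokens are approximated by whitespace-separated words; the actual parser and tokenizer may differ:

```typescript
// Chunking sketch: split parsed document text into ~500-token segments.
// Tokens are approximated by whitespace-separated words here; the real
// pipeline may use a model-specific tokenizer.
function chunkText(text: string, maxTokens = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```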

02 PROCESS: Embedding

Chunks are converted to embedding vectors (arrays of floats) via `text-embedding-3-small`.
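A sketch of the embedding call using the OpenAI Node SDK; the client setup and batching shown here are assumptions, not the exact implementation:

```typescript
import OpenAI from "openai";

// Embedding sketch: convert text chunks into float vectors with
// text-embedding-3-small (1536 dimensions). Assumes OPENAI_API_KEY is set.
const openai = new OpenAI();

async function embedChunks(chunks: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  return response.data.map((item) => item.embedding);
}
```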

03 STORE: Indexing

Vectors are stored in a Convex vector index for cosine-similarity search.
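A schema sketch of how such a table might declare a Convex vector index; the table, field, and `by_embedding` index names are illustrative (only `by_instance` is referenced elsewhere on this page):

```typescript
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

// Schema sketch: a "chunks" table with a vector index for cosine-similarity
// search. Field names and the "by_embedding" index name are assumptions.
export default defineSchema({
  chunks: defineTable({
    instanceId: v.string(),
    text: v.string(),
    embedding: v.array(v.float64()),
  })
    .index("by_instance", ["instanceId"])
    .vectorIndex("by_embedding", {
      vectorField: "embedding",
      dimensions: 1536, // matches text-embedding-3-small's default output size
      filterFields: ["instanceId"],
    }),
});
```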

04 RETRIEVE: Injection

Relevant chunks are prepended to the user query before LLM inference.
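A sketch of the injection step; the prompt framing below is illustrative, not the exact template used:

```typescript
// Injection sketch: prepend retrieved chunk text to the user query before
// sending it to the LLM. The wrapper text is an assumption.
function buildPrompt(retrievedChunks: string[], userQuery: string): string {
  const context = retrievedChunks
    .map((chunk, i) => `[Context ${i + 1}]\n${chunk}`)
    .join("\n\n");
  return `Use the following context to answer the question.\n\n${context}\n\nQuestion: ${userQuery}`;
}
```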

Supported Data Structures

| Format | Parser | Optimization Strategy |
| --- | --- | --- |
| `.md` / `.txt` | Raw Text | Best performance. No formatting overhead. Clean semantic chunks. |
| `.pdf` | PDFParser | High noise risk (headers/footers). Use OCR-ready, text-first PDFs only. |
| `.csv` | Papaparse | Ideal for product catalogs. Converted to "Row: [Key]: [Value]" string format (see the sketch below). |
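A sketch of the CSV flattening with Papaparse; the exact bracket and separator formatting is an assumption based on the table above:

```typescript
import Papa from "papaparse";

// CSV flattening sketch: each parsed row becomes a "Row: [Key]: [Value]"
// string before chunking and embedding.
function csvToRowStrings(csvText: string): string[] {
  const parsed = Papa.parse<Record<string, string>>(csvText, {
    header: true,
    skipEmptyLines: true,
  });
  return parsed.data.map(
    (row) =>
      "Row: " +
      Object.entries(row)
        .map(([key, value]) => `[${key}]: [${value}]`)
        .join(", ")
  );
}
```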

Usage & Limitations

Context Window Budget

Retrieval is capped at 5 chunks per query to preserve the token budget. For large documents (more than 20 pages), semantic search may miss nuances when the query is vague.
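A sketch of how the 5-chunk cap might look in a Convex action using `ctx.vectorSearch`; the table, index, and argument names follow the schema sketch above and are assumptions:

```typescript
import { v } from "convex/values";
import { action } from "./_generated/server";

// Retrieval sketch: vector search capped at 5 results per query to preserve
// the context-window budget. Names mirror the schema sketch above.
export const retrieveChunks = action({
  args: { embedding: v.array(v.float64()), instanceId: v.string() },
  handler: async (ctx, args) => {
    const results = await ctx.vectorSearch("chunks", "by_embedding", {
      vector: args.embedding,
      limit: 5, // hard cap on chunks per query
      filter: (q) => q.eq("instanceId", args.instanceId),
    });
    return results; // [{ _id, _score }] ranked by cosine similarity
  },
});
```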

Re-Indexing Lag

When you update or delete a file, the `by_instance` index may take 10-60 seconds to propagate the change. During this window, the AI might serve stale data.