Knowledge Base (RAG)

A hybrid retrieval-augmented generation pipeline powered by the memcity component — combining vector search, BM25 keyword matching, knowledge graphs, and episodic memory.

Ingestion Pipeline

01 UPLOAD (Document Intake): Files, text, URLs, or skills are submitted via the dashboard or AI tools.

02 CHUNK (Retrieval-Aware Chunking): Content is split using a retrieval-optimized strategy that preserves semantic coherence.

03 EMBED (Jina Embeddings v4): Each chunk receives a 1024-dimensional contextual embedding plus a BM25 keyword index entry.

04 ENRICH (Graph + Summaries): Entities and relationships are extracted into a knowledge graph, and RAPTOR summaries are built.
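The chunking step (02) can be illustrated with a minimal sketch: split on sentence boundaries, pack sentences into a chunk up to a size budget, and carry the last sentence of each chunk into the next as overlap so context survives the cut. The `maxChars` budget and one-sentence overlap are illustrative parameters, not the product's actual strategy.

```typescript
// A minimal retrieval-aware chunking sketch: sentence-boundary splits,
// a character budget per chunk, and one sentence of overlap between
// neighboring chunks. Parameters are illustrative.
function chunkText(text: string, maxChars = 500): string[] {
  // Greedy sentence split; falls back to the whole text if nothing matches.
  const sentences = text.match(/[^.!?]+[.!?]+(\s|$)/g) ?? [text];
  const chunks: string[] = [];
  let current: string[] = [];
  let length = 0;
  for (const sentence of sentences) {
    if (length + sentence.length > maxChars && current.length > 0) {
      chunks.push(current.join("").trim());
      // Carry the last sentence forward so the next chunk keeps context.
      current = [current[current.length - 1]];
      length = current[0].length;
    }
    current.push(sentence);
    length += sentence.length;
  }
  if (current.length > 0) chunks.push(current.join("").trim());
  return chunks;
}
```

Sentence-boundary packing is one common way to "preserve semantic coherence"; production chunkers typically also respect headings and paragraph structure.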

Search Pipeline (searchContext)

When the AI agent needs information, the searchContext tool triggers a multi-stage retrieval pipeline:

01 Query Routing: classifies query complexity (simple / moderate / complex)
02 Query Expansion: decomposes complex queries and generates search variants
03 HyDE Generation: creates hypothetical document embeddings for better retrieval
04 Hybrid Search: parallel vector search (Jina v4) plus BM25 keyword matching
05 RRF Fusion: Reciprocal Rank Fusion merges vector and keyword results
06 Graph Traversal: entity search plus knowledge graph relationship traversal
07 Reranking: Jina Reranker v3 scores and reorders all candidates
08 Chunk Expansion: neighbor stitching restores surrounding context
09 Memory Search: episodic memories retrieved per user (facts, preferences)
10 Format & Return: context string assembled and returned to the LLM
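The fusion step can be sketched as standard Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) across the ranked lists it appears in, so documents near the top of both the vector and BM25 lists win. The constant k = 60 comes from the original RRF formulation; the value used by this pipeline is not documented.

```typescript
// Reciprocal Rank Fusion: merge ranked result lists into one ordering.
// k dampens the advantage of the very top ranks (k = 60 is conventional).
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      // rank is 0-based here, so the first result contributes 1 / (k + 1).
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Example: vector search and BM25 partially disagree; fusion rewards
// documents that appear near the top of both lists.
const vectorHits = ["doc-a", "doc-b", "doc-c"];
const bm25Hits = ["doc-c", "doc-a", "doc-d"];
console.log(rrfFuse([vectorHits, bm25Hits]));
// doc-a and doc-c, present in both lists, outrank single-list hits
```

Because RRF uses only ranks, it needs no score normalization between the vector and keyword retrievers, which is why it is a popular fusion choice for hybrid search.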

Supported Formats

| Format | Category | Notes |
| --- | --- | --- |
| .txt / .md | Text | Best performance. Clean semantic chunks with no formatting overhead. |
| .pdf | Document | Extracted via document processing. Text-first PDFs work best. |
| .docx / .pptx / .xlsx | Office | Microsoft Office formats supported. Content extracted and chunked. |
| .csv / .json | Data | Structured data formats. Great for product catalogs and records. |
| .html | Web | HTML content parsed and cleaned before chunking. |
| URLs | Web | Web pages fetched, content extracted, and ingested automatically. |

Episodic Memory

Beyond document retrieval, the system maintains per-user episodic memories — facts, preferences, and interaction patterns extracted from conversations.

How It Works

A daily cron job analyzes recent conversations and extracts user-specific information (names, preferences, past requests, interaction styles). These memories are stored per-user in memcity and automatically included in search results for that user, enabling personalized responses that remember context across conversations.
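The per-user flow described above can be sketched as a store keyed by contact, where extracted facts are written under one contact and retrieved only for that contact. The names here (`MemoryStore`, `addMemory`, `searchMemories`) and the naive keyword match are illustrative, not memcity's real API.

```typescript
// Hypothetical sketch of per-user episodic memory: facts are stored
// under a contact id and only that contact's memories are searchable.
type Memory = { fact: string; extractedAt: Date };

class MemoryStore {
  private byContact = new Map<string, Memory[]>();

  addMemory(contactId: string, fact: string): void {
    const list = this.byContact.get(contactId) ?? [];
    list.push({ fact, extractedAt: new Date() });
    this.byContact.set(contactId, list);
  }

  // A naive keyword match stands in for the real retrieval step.
  searchMemories(contactId: string, query: string): string[] {
    const terms = query.toLowerCase().split(/\s+/);
    return (this.byContact.get(contactId) ?? [])
      .filter((m) => terms.some((t) => m.fact.toLowerCase().includes(t)))
      .map((m) => m.fact);
  }
}
```

Keying the store by contact is what makes the isolation property automatic: a query for one user can never touch another user's memory list.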

Knowledge Graph

During ingestion, entities (people, products, concepts) and their relationships are automatically extracted and stored in a knowledge graph. During search, the graph is traversed to find connections that pure vector similarity might miss.
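Graph-assisted retrieval can be sketched as a bounded breadth-first walk: start from entities matched in the query and follow relationship edges up to a small depth, collecting connected entities that pure vector similarity might miss. The edge representation and depth limit below are illustrative assumptions.

```typescript
// Sketch of knowledge graph traversal: from seed entities, follow
// directed relationship edges up to maxDepth hops.
type Edge = { from: string; relation: string; to: string };

function traverse(edges: Edge[], seeds: string[], maxDepth = 2): Set<string> {
  const found = new Set(seeds);
  let frontier = seeds;
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const edge of edges) {
      if (frontier.includes(edge.from) && !found.has(edge.to)) {
        found.add(edge.to);
        next.push(edge.to);
      }
    }
    frontier = next;
  }
  return found;
}
```

Capping the depth keeps traversal cheap and relevant: one or two hops surfaces "customer → ordered → product → belongs-to → category" style connections without pulling in the whole graph.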

Architecture

Instance Isolation

Each WhatsApp instance has its own isolated knowledge base. Documents uploaded to one instance are never visible to another. Each contact (WhatsApp user) gets their own memory space for episodic memories.
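One common way to enforce this kind of isolation is key namespacing: every stored object is addressed under its instance (and, for memories, its contact), so lookups cannot cross boundaries by construction. The key layout below is a hypothetical illustration, not memcity's actual storage scheme.

```typescript
// Illustrative key namespacing for instance and contact isolation.
const docKey = (instanceId: string, docId: string) =>
  `instance:${instanceId}:doc:${docId}`;

const memoryKey = (instanceId: string, contactId: string) =>
  `instance:${instanceId}:contact:${contactId}:memories`;

// Two instances storing the same document id produce distinct keys,
// so neither can read the other's data.
console.log(docKey("inst-1", "doc-42"));
console.log(docKey("inst-2", "doc-42"));
```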

Processing Time

Document processing (chunking, embedding, entity extraction, summarization) runs asynchronously. Small documents complete in seconds; large PDFs may take a few minutes. The document status transitions from pending to complete when ready.