Metric Memory (RAG)

Extending the LLM beyond its static training cutoff via dynamic vector retrieval.

The RAG Pipeline

01 INPUT: Ingestion

PDF/TXT/MD files are parsed and chunked into 500-token segments.
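A minimal sketch of the chunking step, assuming tokens are approximated by whitespace-separated words; the actual parser and tokenizer may differ:

```typescript
// Chunking sketch: split parsed document text into ~500-token segments.
// Tokens are approximated by whitespace-separated words here; the real
// pipeline may use a model-specific tokenizer.
function chunkText(text: string, maxTokens = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```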

02 PROCESS: Embedding

Chunks are converted to embedding vectors (arrays of floats) via `text-embedding-3-small`.
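A sketch of the embedding call using the OpenAI Node SDK; the client setup and batching shown here are assumptions, not the exact implementation:

```typescript
import OpenAI from "openai";

// Embedding sketch: convert text chunks into float vectors with
// text-embedding-3-small (1536 dimensions). Assumes OPENAI_API_KEY is set.
const openai = new OpenAI();

async function embedChunks(chunks: string[]): Promise<number[][]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  return response.data.map((item) => item.embedding);
}
```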

03 STORE: Indexing

Vectors are stored in a Convex vector index for cosine-similarity search.
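A schema sketch of how such a table might declare a Convex vector index; the table, field, and `by_embedding` index names are illustrative (only `by_instance` is referenced elsewhere on this page):

```typescript
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

// Schema sketch: a "chunks" table with a vector index for cosine-similarity
// search. Field names and the "by_embedding" index name are assumptions.
export default defineSchema({
  chunks: defineTable({
    instanceId: v.string(),
    text: v.string(),
    embedding: v.array(v.float64()),
  })
    .index("by_instance", ["instanceId"])
    .vectorIndex("by_embedding", {
      vectorField: "embedding",
      dimensions: 1536, // matches text-embedding-3-small's default output size
      filterFields: ["instanceId"],
    }),
});
```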

04 RETRIEVE: Injection

Relevant chunks are prepended to the user query before LLM inference.
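A sketch of the injection step; the prompt framing below is illustrative, not the exact template used:

```typescript
// Injection sketch: prepend retrieved chunk text to the user query before
// sending it to the LLM. The wrapper text is an assumption.
function buildPrompt(retrievedChunks: string[], userQuery: string): string {
  const context = retrievedChunks
    .map((chunk, i) => `[Context ${i + 1}]\n${chunk}`)
    .join("\n\n");
  return `Use the following context to answer the question.\n\n${context}\n\nQuestion: ${userQuery}`;
}
```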

Supported Data Structures

| Format | Parser | Optimization Strategy |
| --- | --- | --- |
| `.md` / `.txt` | Raw Text | Best performance. No formatting overhead. Clean semantic chunks. |
| `.pdf` | PDFParser | High noise risk (headers/footers). Use OCR-ready, text-first PDFs only. |
| `.csv` | Papaparse | Ideal for product catalogs. Converted to "Row: [Key]: [Value]" string format (see the sketch below). |
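A sketch of the CSV flattening with Papaparse; the exact bracket and separator formatting is an assumption based on the table above:

```typescript
import Papa from "papaparse";

// CSV flattening sketch: each parsed row becomes a "Row: [Key]: [Value]"
// string before chunking and embedding.
function csvToRowStrings(csvText: string): string[] {
  const parsed = Papa.parse<Record<string, string>>(csvText, {
    header: true,
    skipEmptyLines: true,
  });
  return parsed.data.map(
    (row) =>
      "Row: " +
      Object.entries(row)
        .map(([key, value]) => `[${key}]: [${value}]`)
        .join(", ")
  );
}
```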

Usage & Limitations

Context Window Budget

Retrieval is capped at 5 chunks per query to preserve the token budget. For large documents (more than 20 pages), semantic search may miss nuances when the query is vague.
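A sketch of how the 5-chunk cap might look in a Convex action using `ctx.vectorSearch`; the table, index, and argument names follow the schema sketch above and are assumptions:

```typescript
import { v } from "convex/values";
import { action } from "./_generated/server";

// Retrieval sketch: vector search capped at 5 results per query to preserve
// the context-window budget. Names mirror the schema sketch above.
export const retrieveChunks = action({
  args: { embedding: v.array(v.float64()), instanceId: v.string() },
  handler: async (ctx, args) => {
    const results = await ctx.vectorSearch("chunks", "by_embedding", {
      vector: args.embedding,
      limit: 5, // hard cap on chunks per query
      filter: (q) => q.eq("instanceId", args.instanceId),
    });
    return results; // [{ _id, _score }] ranked by cosine similarity
  },
});
```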

Re-Indexing Lag

When you update or delete a file, the `by_instance` index may take 10-60 seconds to propagate the change. During this window, the AI might serve stale data.