Metric Memory (RAG)
Extending the LLM beyond its static training cutoff with dynamic vector retrieval.
The RAG Pipeline
1. **Input: Ingestion.** PDF/TXT/MD files are parsed and chunked into 500-token segments.
2. **Process: Embedding.** Chunks are converted to floating-point vectors via `text-embedding-3-small`.
3. **Store: Indexing.** Vectors are stored in the Convex Vector Index for cosine similarity search.
4. **Retrieve: Injection.** Relevant chunks are prepended to the user query before LLM inference.
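As a rough illustration of how the first three stages could be wired together with Convex and the OpenAI SDK, here is a minimal sketch. The table, field, index, and function names (`chunks`, `by_embedding`, `insertChunk`) are assumptions made for the example, not the actual implementation.

```ts
// convex/schema.ts -- minimal sketch of the storage side (names are assumptions).
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  chunks: defineTable({
    instanceId: v.string(),          // which workspace/file collection owns the chunk
    text: v.string(),                // one ~500-token segment
    embedding: v.array(v.float64()), // 1536-dim output of text-embedding-3-small
  })
    .index("by_instance", ["instanceId"])
    .vectorIndex("by_embedding", {
      vectorField: "embedding",
      dimensions: 1536,
      filterFields: ["instanceId"],
    }),
});
```

Ingestion then becomes an action that embeds the parsed segments in one batched call and writes them behind the vector index. The `insertChunk` mutation is assumed to be a plain `ctx.db.insert("chunks", ...)` wrapper and is not shown.

```ts
// convex/ingest.ts -- embed parsed chunks and store them (illustrative only).
import { v } from "convex/values";
import { action } from "./_generated/server";
import { api } from "./_generated/api";
import OpenAI from "openai";

const openai = new OpenAI();

export const ingestChunks = action({
  args: { instanceId: v.string(), chunks: v.array(v.string()) },
  handler: async (ctx, { instanceId, chunks }) => {
    // Process: one batched embedding call for all segments.
    const { data } = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: chunks,
    });
    // Store: write each chunk plus its vector into the indexed table.
    for (let i = 0; i < chunks.length; i++) {
      await ctx.runMutation(api.ingest.insertChunk, {
        instanceId,
        text: chunks[i],
        embedding: data[i].embedding,
      });
    }
  },
});
```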
Supported File Formats
| Format | Parser | Optimization Strategy |
|---|---|---|
| .md / .txt | Raw Text | Best performance. No formatting overhead. Clean semantic chunks. |
| .pdf | PDFParser | High noise risk (headers/footers). Use OCR-ready, text-first PDFs only. |
| .csv | Papaparse | Ideal for product catalogs. Converted to "Row: [Key]: [Value]" string format. |
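For CSV ingestion, the row-to-string conversion could look roughly like the sketch below. The helper name and exact separators are illustrative; only the "Row: [Key]: [Value]" shape is specified above.

```ts
// Sketch of the CSV-to-text step, assuming Papaparse with a header row.
import Papa from "papaparse";

/** Turn each CSV row into a "Row: [Key]: [Value]" string ready for chunking. */
export function csvToRowStrings(csvText: string): string[] {
  const { data } = Papa.parse<Record<string, string>>(csvText, {
    header: true,        // use the first line as keys
    skipEmptyLines: true,
  });
  return data.map(
    (row) =>
      "Row: " +
      Object.entries(row)
        .map(([key, value]) => `${key}: ${value}`)
        .join(", ")
  );
}

// csvToRowStrings("sku,price\nA-1,19.99") -> ["Row: sku: A-1, price: 19.99"]
```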
Usage & Limitations
Context Window Budget
Retrieval is capped at 5 chunks per query to preserve the token budget. For large documents (more than roughly 20 pages), semantic search may miss nuances if the query is vague.
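A hedged sketch of what the capped retrieval step might look like in a Convex action; the `fetchChunks` query and the table/index names are assumptions carried over from the pipeline sketch above.

```ts
// convex/search.ts -- retrieval sketch showing the 5-chunk cap and prompt injection.
import { v } from "convex/values";
import { action } from "./_generated/server";
import { api } from "./_generated/api";
import OpenAI from "openai";

const openai = new OpenAI();

export const retrieveContext = action({
  args: { instanceId: v.string(), query: v.string() },
  handler: async (ctx, { instanceId, query }) => {
    // Embed the user query with the same model used at ingestion time.
    const { data } = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: query,
    });

    // Cosine-similarity search, hard-capped at 5 results to protect the token budget.
    const hits = await ctx.vectorSearch("chunks", "by_embedding", {
      vector: data[0].embedding,
      limit: 5,
      filter: (q) => q.eq("instanceId", instanceId),
    });

    // vectorSearch returns only ids and scores, so load the chunk text via a
    // query (fetchChunks is assumed), then prepend it to the user question.
    const chunks = await ctx.runQuery(api.search.fetchChunks, {
      ids: hits.map((h) => h._id),
    });
    return (
      "Use the following context to answer.\n\n" +
      chunks.map((c: { text: string }) => c.text).join("\n---\n") +
      `\n\nQuestion: ${query}`
    );
  },
});
```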
Re-Indexing Lag
When you update or delete a file, the `by_instance` index may take 10-60 seconds to propagate the change. During this window, the AI may serve stale data.
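One way to work with this constraint is to clear a file's old chunks through the `by_instance` index before re-ingesting it, so stale rows disappear as soon as the index catches up. A sketch under the same assumed schema:

```ts
// convex/reindex.ts -- clear stale chunks before re-ingesting (names are assumptions).
import { v } from "convex/values";
import { mutation } from "./_generated/server";

export const clearInstanceChunks = mutation({
  args: { instanceId: v.string() },
  handler: async (ctx, { instanceId }) => {
    // Fetch every chunk for this instance via the by_instance index...
    const stale = await ctx.db
      .query("chunks")
      .withIndex("by_instance", (q) => q.eq("instanceId", instanceId))
      .collect();
    // ...and delete them so re-ingestion starts from a clean slate. Per the note
    // above, the vector index may still serve old rows for a short window.
    for (const chunk of stale) {
      await ctx.db.delete(chunk._id);
    }
  },
});
```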