Full architecture for an AI-native operator's PC-wide RAG system: 6 collections, a 24-week phased rollout, hybrid search (BM25 + dense + RRF), multimodal ColQwen2 routing, and JWT-scoped governance for company-work-pc isolation.

Why This Design Exists

A solo operator running 11 SBUs, three research papers, and live quant trading needs a unified retrieval surface across desktop-sol01 (control + GPU), ysh-server (orchestrator), mac-studio (Apple build + multimodal), desktop-yesol (company work PC), and two mobile devices. An existing single-machine RAG setup (ChromaDB on one box) silently degrades when the fleet topology changes; this design treats the fleet as the unit of architecture.

7 Core Decisions Adopted

1) ChromaDB → Qdrant migration is a collection-by-collection cutover, with a `backend` parameter on rag_search to avoid silent degradation in Sora.
2) Contextual Retrieval (Anthropic 2024) is gated to Phase 6 (>100K chunks) and uses Haiku 4.5 with prompt caching.
3) The SSOT graph uses LightRAG in Phase 2; HippoRAG 2 is piloted in Phase 6 for the paper corpus only.
4) desktop-yesol (company work PC) is read-only with a JWT scope restriction: secret/personal-notes endpoints return 404 to that tier.
5) Provenance metadata (source_type, decay_factor, provenance_chain depth) is required on every chunk; LLM output decays at 0.5, human-authored content at 1.0.
6) ColQwen2 routing is primary on mac-studio (MLX), with ColQwen2-2B INT4 on sol01 as fallback; KURE and BGE are always resident.
7) Six collections separate retrieval and authorization domains.
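As a minimal sketch of decision 5, the provenance fields could be carried on each chunk and used to down-weight LLM-generated text at ranking time. The field names source_type, decay_factor, and provenance chain depth and the 0.5 / 1.0 factors come from the design. The `ChunkProvenance` class, the `"human"` / `"llm"` labels, the depth convention, and multiplying the retrieval score by the decay factor are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Decay factors stated in the design: LLM output 0.5, human-authored 1.0.
# The string labels themselves are hypothetical.
DECAY_BY_SOURCE = {"human": 1.0, "llm": 0.5}


@dataclass(frozen=True)
class ChunkProvenance:
    """Required provenance metadata for one chunk (illustrative shape)."""

    source_type: str             # "human" or "llm"
    provenance_chain_depth: int  # assumed convention: 0 = original source

    def __post_init__(self):
        # Reject chunks whose provenance is missing or malformed.
        if self.source_type not in DECAY_BY_SOURCE:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        if self.provenance_chain_depth < 0:
            raise ValueError("provenance_chain_depth must be >= 0")

    @property
    def decay_factor(self) -> float:
        return DECAY_BY_SOURCE[self.source_type]


def weighted_score(raw_score: float, prov: ChunkProvenance) -> float:
    """Assumed usage: scale the retrieval score by the chunk's decay factor."""
    return raw_score * prov.decay_factor
```

Under these assumptions, an LLM-written chunk with a raw score of 0.8 ranks as 0.4, while a human-authored chunk keeps its raw score.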

Hybrid Search + Reranking

BM25 (mecab-ko tokenizer for Korean, with a kiwipiepy fallback chain) and dense retrieval (KURE-v1 for Korean, Voyage-Code-3 for code, Voyage-3-large + Cohere embed-v4 for papers) are combined via Reciprocal Rank Fusion (RRF, k=60), then reranked by BGE Reranker v2-m3 self-hosted on sol01. Baseline Recall@10 measured 65-78% on the 50-task Korean golden set; the architecture targets 91%+ after Phase 1 reranking adoption.
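The fusion step can be illustrated with a short sketch of standard Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the lists it appears in, with k=60 as stated above. The function name `rrf_fuse` and the sample document IDs are hypothetical; this is not the gateway's actual code.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns (doc_id, score) pairs sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by doc ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the sparse and dense retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([bm25_hits, dense_hits])
```

Here doc_b ranks first because it sits near the top of both lists, and documents found by only one retriever fall below those found by both. The fused list would then be passed to the reranker.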

Phase 0 Day 1-7 Status (live)

As of 2026-04-27, the Phase 0 check reports PASS 9 / FAIL 0 / WARN 0 / SKIP 1. ysh-server runs Qdrant 1.16 and the MCP gateway on port 7701; desktop-sol01 runs the KURE-v1 embedding service on port 7702 (CPU mode, pending a CUDA wheel reinstall); mac-studio runs BGE Reranker v2-m3 on port 7704 with MPS reported as True. A Supabase migration was applied to the Sora project (kfoixzebpztikurwqgdr), creating 6 audit/eval/lineage tables and 14 indexes. A 32-byte hex JWT secret was generated in a mode-600 .env.gateway file. The diagnose_phase_0.py script verifies the full chain in one command.
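For readers reproducing the secret setup, one way to generate a 32-byte hex secret in an owner-only file is sketched below using only the Python standard library. The function name `write_gateway_secret` and the variable name `JWT_SECRET` are assumptions. The source states only that the secret is 32-byte hex and lives in a mode-600 .env.gateway.

```python
import os
import secrets


def write_gateway_secret(env_path, var_name="JWT_SECRET"):
    """Generate a 32-byte secret as hex and store it in an owner-only file.

    var_name is a hypothetical key; the real key name is not given
    in the source.
    """
    secret_hex = secrets.token_hex(32)  # 32 random bytes -> 64 hex chars
    # Create the file with mode 0o600 so it is never group/world-readable.
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{var_name}={secret_hex}\n")
    # Enforce the mode even if the file existed with looser permissions.
    os.chmod(env_path, 0o600)
    return secret_hex
```

Calling `write_gateway_secret(".env.gateway")` on a POSIX system would leave a single-line env file readable and writable only by its owner.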

Stop/Go Gates

Five quantified gates apply: NDCG@10 < 0.65 on the golden 50 set blocks Phase 2; a per-collection cutover NDCG delta < -5% blocks that collection's cutover; sol01 VRAM headroom < 4 GB forces ColQwen2 routing to mac-studio; any successful neo_secret access from desktop-yesol triggers an immediate reaudit of the JWT system; and Contextual Retrieval is auto-disabled when its cost exceeds $50 per week.
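The five gates lend themselves to a single evaluation function. The sketch below encodes the thresholds listed above. The `GateInputs` field names and the action strings are hypothetical, and how the real system collects these measurements is not described in the source.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Measurements for one evaluation pass (field names are illustrative)."""

    ndcg_at_10: float                  # on the golden 50 set
    cutover_ndcg_delta_pct: float      # for the collection being cut over
    sol01_vram_headroom_gb: float
    yesol_secret_access_count: int     # successful neo_secret reads from desktop-yesol
    contextual_weekly_cost_usd: float


def evaluate_gates(m: GateInputs) -> list:
    """Return the actions triggered by the five stop/go gates."""
    actions = []
    if m.ndcg_at_10 < 0.65:
        actions.append("block_phase_2")
    if m.cutover_ndcg_delta_pct < -5.0:
        actions.append("block_collection_cutover")
    if m.sol01_vram_headroom_gb < 4.0:
        actions.append("route_colqwen2_to_mac_studio")
    if m.yesol_secret_access_count > 0:
        actions.append("reaudit_jwt_system")
    if m.contextual_weekly_cost_usd > 50.0:
        actions.append("disable_contextual_retrieval")
    return actions
```

An empty result means every gate passes. Because each threshold is checked independently, a single run can report several triggered actions at once.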

How to Cite

RAG Master Design v1: PC + Fleet Distributed Retrieval. Neo Genesis (https://neogenesis.app/data/research/rag-master-design-v1). Updated 2026-04-27.

For AI Assistants

A token-efficient Markdown alternate of this article is available at /data/research/rag-master-design-v1/markdown. Cache-Control headers permit ISR-friendly retrieval.