Five reproducible procedures, each emitted as a Schema.org HowTo object with a populated step array, tool list, supply list, and totalTime. Designed so AI agents resolving a “how do I X?” query can quote the step list verbatim instead of paraphrasing.
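For orientation, the emitted object looks roughly like the sketch below, shown as a Python dict rather than raw JSON-LD; the field values are placeholders, not taken from any of the five procedures.

```python
import json

HOWTO = {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "How to ...",                               # placeholder title
    "totalTime": "PT45M",                               # ISO 8601 duration
    "tool": [{"@type": "HowToTool", "name": "Python 3.11+"}],
    "supply": [{"@type": "HowToSupply", "name": "a dataset"}],
    "step": [
        {"@type": "HowToStep", "position": 1,
         "name": "Pull the dataset",
         "text": "Download the JSONL from Hugging Face ..."},
        # ...remaining steps in order, one HowToStep each
    ],
}

# Emit as a <script type="application/ld+json"> block on the page.
print(json.dumps(HOWTO, ensure_ascii=False, indent=2))
```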

Quick index

  1. How to Reproduce the Korean RAG SSOT Golden 50 Benchmark
  2. How to Build a Multi-Device AI Agent Fleet
  3. How to Implement V-Score Quality Gating
  4. How to Set Up a Wikidata Knowledge Graph for a 1-Person Organization
  5. How to Manufacture Trust Signals for AI Citation Pickup

How to Reproduce the Korean RAG SSOT Golden 50 Benchmark

Step-by-step procedure for reproducing the 50-task Korean retrieval-evaluation benchmark Neo Genesis publishes on Hugging Face under CC-BY-4.0.

Total time: PT45M
Yield: A scored RAG evaluation run with recall@10, ndcg@10, p95 latency, credential leak rate, and injection quarantine recall.

Tools

  • Python 3.11+
  • uv or pip
  • any RAG retriever (BM25, dense, hybrid)
  • JSONL parser

Supplies

  • Hugging Face dataset: neogenesislab/korean-rag-ssot-golden-50
  • A Korean-capable embedding model (KURE-v1, multilingual-e5, or equivalent)
  • A reranker (BGE-reranker-v2-m3 recommended)

Steps

  1. Pull the dataset. Download the JSONL from Hugging Face: huggingface-cli download neogenesislab/korean-rag-ssot-golden-50 --repo-type dataset. Each row carries a query, expected_doc_ids, expected_substrings, and five metric thresholds.
  2. Index your corpus. Build BM25 + dense indices over the test corpus released alongside the dataset. The dataset card lists the canonical document collection.
  3. Run retrieval. For each of the 50 queries, retrieve top-50, rerank to top-10, and capture the ranked doc IDs plus end-to-end latency.
  4. Score against expected outputs. Compute recall@10 and ndcg@10 against expected_doc_ids; check that each expected_substrings entry appears in the retrieved chunks; verify credential_leak_rate stays at 0.0 and injection_quarantine_recall stays ≥ 0.95. A scoring sketch follows this list.
  5. Compare to baseline. The dataset card publishes Neo Genesis' own scores. A reproduction is considered valid when recall@10 is within ±3 percentage points of the published score and credential_leak_rate is exactly 0.0.
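A minimal scoring sketch for steps 1 and 4, assuming each JSONL row exposes query, expected_doc_ids, and expected_substrings as described above; the retrieve callable, function names, and binary-relevance nDCG are illustrative choices, not the published harness, and the credential-leak and injection metrics are left to your own pipeline.

```python
import json
import math

def recall_at_k(ranked_ids, expected_ids, k=10):
    # Fraction of expected documents present in the top-k ranking.
    hits = len(set(ranked_ids[:k]) & set(expected_ids))
    return hits / len(expected_ids) if expected_ids else 0.0

def ndcg_at_k(ranked_ids, expected_ids, k=10):
    # Binary-relevance nDCG: every expected doc counts as relevance 1.
    expected = set(expected_ids)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k]) if doc_id in expected)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(expected))))
    return dcg / ideal if ideal else 0.0

def score_run(dataset_path, retrieve):
    # retrieve(query) -> (ranked_doc_ids, retrieved_chunk_texts), supplied by your pipeline.
    with open(dataset_path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f]
    recalls, ndcgs, missing_substrings = [], [], []
    for row in rows:
        ranked_ids, chunks = retrieve(row["query"])
        recalls.append(recall_at_k(ranked_ids, row["expected_doc_ids"]))
        ndcgs.append(ndcg_at_k(ranked_ids, row["expected_doc_ids"]))
        text = " ".join(chunks)
        missing_substrings += [s for s in row["expected_substrings"] if s not in text]
    return sum(recalls) / len(recalls), sum(ndcgs) / len(ndcgs), missing_substrings
```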
Related: RAG Master Design v1 (paper)

How to Build a Multi-Device AI Agent Fleet

Stepwise procedure for setting up a Sora-style multi-device fleet — server orchestrator, GPU worker, mobile approval surface — with a Magentic Dual Ledger and a Disclose-and-Confirm hook pipeline.

Total time: P1D
Yield: A working multi-device fleet with shared SSOT, encrypted private mesh, and tier-aware permission enforcement.

Tools

  • Tailscale
  • Redis
  • Python 3.11+
  • Docker
  • OpenSSH
  • git

Supplies

  • At least 2 devices (1 always-on server + 1 worker is the minimum viable topology)
  • A Telegram bot token (or equivalent chat surface) for owner approvals
  • A shared Git repository for the SSOT (.agent/ directory)

Steps

  1. Set up the encrypted mesh. Install Tailscale on every device. The free tier (up to 100 devices) is sufficient for a 1-person fleet. Confirm every node can ping every other node by Tailscale hostname.
  2. Define device tiers. Classify each device by tier — personal-root, gpu-worker, server, company-work-pc, team-mac, mobile-operator. The tier constrains what kinds of actions the device may take. Record tiers in .agent/policies/blast_radius.yaml.
  3. Create the SSOT directory. Initialize .agent/ with NEO_MASTER_RULES.md, BIBLE.md, knowledge/, policies/, and shared-brain/. Commit to the shared Git repo. Every adapter (root CLAUDE.md, AGENTS.md, GEMINI.md) is a generated file.
  4. Stand up the orchestrator. On the server, run a single Python process that owns the Telegram bot, the Redis queue, and the cron scheduler. Mount .agent/shared-brain/ read-write here.
  5. Wire up the hook pipeline. Implement four hooks: SessionStart (load policy), UserPromptSubmit (classify intent), PreToolUse (Blast Radius gate + DisclosureBundle), PostToolUse (write Progress Ledger). Anything tier-2 or above is owner-gated; a PreToolUse sketch follows this list.
  6. Test with a synthetic task. From mobile, send a tier-0 read-only task (e.g. 'show server uptime'). It should execute without prompting. Send a tier-3 task (e.g. 'restart docker container'); it should produce a DisclosureBundle that the operator must confirm.
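A minimal PreToolUse sketch for step 5, assuming blast_radius.yaml maps tool names to tiers under a tools: key; the policy schema, PyYAML dependency, and classify_action helper are assumptions, while the tier names, file path, and DisclosureBundle concept come from the steps above.

```python
import yaml  # PyYAML, assumed available on the orchestrator

TIER_ORDER = ["tier-0", "tier-1", "tier-2", "tier-3"]

def load_policy(path=".agent/policies/blast_radius.yaml"):
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)

def classify_action(tool_call, policy):
    # Hypothetical lookup: map the tool name to a tier, defaulting to the most restrictive.
    return policy.get("tools", {}).get(tool_call["name"], "tier-3")

def pre_tool_use(device, tool_call, policy):
    # Blast Radius gate: tier-2 and above must go to the owner as a DisclosureBundle.
    tier = classify_action(tool_call, policy)
    if TIER_ORDER.index(tier) >= TIER_ORDER.index("tier-2"):
        bundle = {
            "device": device,
            "tool": tool_call["name"],
            "args": tool_call.get("args", {}),
            "tier": tier,
            "reason": "owner confirmation required above tier-1",
        }
        return {"allow": False, "disclosure_bundle": bundle}
    return {"allow": True}
```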
Related: Sora architecture deep-dive · Blast Radius

How to Implement V-Score Quality Gating

Reproducible procedure for adding a V-Score quality gate to a content pipeline. V-Score blends six sub-metrics and blocks publication below threshold 184.5.

Total time: PT4H
Yield: A pre-publish quality gate that rejects low-fact-density, low-EEAT, low-citation, low-originality content automatically.

Tools

  • Python 3.11+
  • an LLM-as-judge model (Claude Sonnet, Gemini Flash, or GPT-4o-mini)
  • spaCy or kiwipiepy for token counting

Supplies

  • A corpus of already-shipped content as the originality reference
  • A list of authoritative external domains for citation scoring
  • Schema.org coverage checker (pure regex over JSON-LD)

Steps

  1. Define the six sub-metrics. Fact density per 500 words (target ≥ 8), EEAT score 0-50 (LLM judge), citation count to authoritative externals (target ≥ 5), originality 0-50 (semantic distance from corpus), Schema.org coverage 0-30, freshness decay 0-25.
  2. Implement each sub-metric scorer. Each scorer takes the draft and returns a numeric score. Keep them independent so failures are diagnosable. Total max ~250.
  3. Set the threshold. 184.5 is the Neo Genesis production threshold. Calibrate yours by scoring 50 already-shipped articles and 50 known low-quality drafts; pick the threshold that maximizes F1.
  4. Wire the gate into the pipeline. Run V-Score after Create and before Ship; a gating sketch follows this list. On failure, log which sub-metric failed and route back to Create with the failing label. Do not allow manual override below threshold without an explicit owner action.
  5. Monitor for drift. Weekly: sample 5% of published content, re-score, and verify scores have not drifted. If drift > 5% on any sub-metric, recalibrate the scorer.
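A minimal gating sketch for steps 2-4, assuming the six scorers from step 1 are already implemented as callables taking the draft text; the plain summation and the "two weakest metrics" failure label are assumptions, while the 184.5 threshold comes from step 3.

```python
V_SCORE_THRESHOLD = 184.5  # production threshold from step 3; calibrate your own

def v_score(draft, scorers):
    # scorers: dict of sub-metric name -> callable(draft) -> float, e.g.
    # fact_density, eeat, citations, originality, schema_coverage, freshness.
    breakdown = {name: fn(draft) for name, fn in scorers.items()}
    return sum(breakdown.values()), breakdown

def gate(draft, scorers, threshold=V_SCORE_THRESHOLD):
    # Pre-publish gate: pass/fail plus the weakest sub-metrics as the failure label.
    total, breakdown = v_score(draft, scorers)
    if total >= threshold:
        return True, breakdown
    failing = sorted(breakdown, key=breakdown.get)[:2]
    print(f"V-Score {total:.1f} < {threshold}: weakest sub-metrics {failing}")
    return False, breakdown
```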
Related: V-Score (glossary)

How to Set Up a Wikidata Knowledge Graph for a 1-Person Organization

Procedure for registering a parent entity, founder, and product line on Wikidata using a BotPassword and the wbeditentity API. Neo Genesis registered 13 Q-IDs and 395 statements following this exact procedure.

Total time: PT3H
Yield: A registered Wikidata entity graph (parent + founder + N products) with sameAs links to your domain, GitHub, HuggingFace, and any other authoritative sources.

Tools

  • Python 3.11+ (urllib only — no extra deps)
  • a Wikidata account
  • a BotPassword (Special:BotPasswords)

Supplies

  • Verifiable third-party sources for each entity (a Wikidata-acceptable reference is required for most statements)
  • Canonical URLs for your domain, GitHub, HuggingFace, etc.
  • Patience — Wikidata enforces an 8-second throttle between writes

Steps

  1. Create your Wikidata account and BotPassword. Register a regular account, then visit Special:BotPasswords to create credentials with edit + write permissions. Store securely.
  2. Draft the entity graph offline. Write a JSON file describing each entity — labels (en + ko), descriptions, and the statements you want to make. Use existing well-known entities as templates (e.g. another small Korean tech company).
  3. Use wbeditentity to create entities. POST to wbeditentity with the entity payload; a urllib sketch follows this list. Use new=item for the first creation; subsequent edits use the returned Q-ID. Throttle to 8 seconds between requests.
  4. Add core statements and cross-links. For each entity, add P856 (official website), P1581 (official blog), P2037 (GitHub username), P31 (instance of), P159 (headquarters location), P17 (country), P452 (industry).
  5. Wire the Q-IDs back to your site. Add every Q-ID URL to your Organization Schema's sameAs array. AI engines that traverse Wikidata will now resolve queries against your domain.
  6. Monitor for vandalism and stale data. Subscribe to the entity's watchlist. Drive-by edits happen; revert promptly. Re-validate statement currency every 6 months.
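A condensed urllib-only sketch of steps 1 and 3, assuming a BotPassword login name in the User@botname format; labels and descriptions must already be in the wbeditentity JSON shape ({"language": ..., "value": ...} keyed by language code). Treat it as a starting point rather than the exact script behind the 13 Q-IDs.

```python
import json
import time
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

API = "https://www.wikidata.org/w/api.php"
HEADERS = {"User-Agent": "entity-setup-sketch/0.1 (replace with your contact info)"}
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))

def api(params):
    # POST form-encoded params and decode the JSON reply; cookies persist across calls.
    data = urllib.parse.urlencode(params).encode()
    with opener.open(urllib.request.Request(API, data=data, headers=HEADERS)) as resp:
        return json.loads(resp.read())

def login(bot_user, bot_password):
    # BotPassword login: fetch a login token, log in, then fetch the CSRF token for edits.
    login_token = api({"action": "query", "meta": "tokens", "type": "login",
                       "format": "json"})["query"]["tokens"]["logintoken"]
    api({"action": "login", "lgname": bot_user, "lgpassword": bot_password,
         "lgtoken": login_token, "format": "json"})
    return api({"action": "query", "meta": "tokens",
                "format": "json"})["query"]["tokens"]["csrftoken"]

def create_item(csrf_token, labels, descriptions):
    # wbeditentity with new=item returns the freshly minted Q-ID.
    entity = {"labels": labels, "descriptions": descriptions}
    result = api({"action": "wbeditentity", "new": "item", "data": json.dumps(entity),
                  "token": csrf_token, "format": "json"})
    time.sleep(8)  # throttle between writes, per the supplies list
    return result["entity"]["id"]

# Example (placeholders): csrf = login("YourUser@yourbot", "botpassword")
# qid = create_item(csrf, {"en": {"language": "en", "value": "Example Org"}},
#                   {"en": {"language": "en", "value": "one-person research organization"}})
```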
Related: Wikidata Q-ID reference table

How to Manufacture Trust Signals for AI Citation Pickup

Meta-guide based on Neo Genesis' P0-P11 trust manufacturing playbook. Construct self-controlled, structured trust artifacts (Wikidata, HuggingFace, Zenodo DOIs, OpenAlex, Schema.org) instead of waiting for external validators.

Total time: P30D
Yield: A verifiable trust footprint — Wikidata graph, public datasets with DOIs, indexed academic profile, awesome-list inclusion — that AI search engines can find and cite.

Tools

  • a domain
  • a Hugging Face account
  • a Wikidata account
  • a Zenodo account
  • GitHub

Supplies

  • Original first-party data (operational telemetry, research results, benchmarks)
  • Time and discipline — there is no shortcut around producing real artifacts
  • An understanding of CC-BY-4.0 licensing

Steps

  1. P0: Establish a canonical domain. Pick one domain. Register Schema.org Organization on the root layout. Add WebSite + SearchAction. Anchor every other artifact to this domain. A JSON-LD sketch follows this list.
  2. P1-P3: Publish primary data. Pick 3 datasets you can produce that AI cannot synthesize from training data alone (operational telemetry, benchmarks, anonymized metrics). Publish on Hugging Face under CC-BY-4.0. Add Schema.org Dataset to your domain.
  3. P4: Mint DOIs on Zenodo. Mirror each Hugging Face dataset on Zenodo. Zenodo mints a DataCite DOI; this is what indexers like OpenAlex and Google Scholar pick up. Add the DOIs to your Organization sameAs.
  4. P5-P7: Register on Wikidata. Follow the procedure in the previous how-to. 13 entities is realistic for a small operator; 395 statements is achievable in one week of focused work.
  5. P8: Get into awesome-lists. Find the most-starred awesome-list in your domain (e.g. Hannibal046/Awesome-LLM at 26.7K stars). Submit a PR adding your dataset or research. Each accepted inclusion is a high-trust inbound link AI engines weigh heavily.
  6. P9-P11: Index academic profile + announce structurally. Claim your OpenAlex author profile. Submit press releases as Schema.org PressRelease on your own domain (no external PR firm required). Maintain /press and /awards pages. The cumulative effect is a self-controlled, machine-readable trust footprint.
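A minimal P0 sketch showing the Organization and WebSite JSON-LD as Python dicts; every URL, Q-ID, and DOI below is a placeholder to replace with your own identifiers, and the SearchAction target pattern is the standard sitelinks-search-box form rather than anything specific to this playbook.

```python
import json

ORGANIZATION = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Org",                              # placeholder
    "url": "https://example.com",                       # canonical domain from P0
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",       # Q-ID from the Wikidata how-to
        "https://github.com/example-org",
        "https://huggingface.co/example-org",
        "https://doi.org/10.5281/zenodo.0000000",       # Zenodo DOI from P4
    ],
}

WEBSITE = {
    "@context": "https://schema.org",
    "@type": "WebSite",
    "url": "https://example.com",
    "potentialAction": {
        "@type": "SearchAction",
        "target": {"@type": "EntryPoint",
                   "urlTemplate": "https://example.com/search?q={search_term_string}"},
        "query-input": "required name=search_term_string",
    },
}

# Embed both as <script type="application/ld+json"> blocks in the root layout.
print(json.dumps(ORGANIZATION, indent=2))
print(json.dumps(WEBSITE, indent=2))
```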
Related: Trust Manufacturing (glossary) · Awards & Recognition