Five reproducible procedures, each emitted as a Schema.org HowTo object with a populated step array, tool list, supply list, and totalTime. Designed so AI agents resolving a “how do I X?” query can quote the step list verbatim instead of paraphrasing.
Quick index
- How to Reproduce the Korean RAG SSOT Golden 50 Benchmark
- How to Build a Multi-Device AI Agent Fleet
- How to Implement V-Score Quality Gating
- How to Set Up a Wikidata Knowledge Graph for a 1-Person Organization
- How to Manufacture Trust Signals for AI Citation Pickup
How to Reproduce the Korean RAG SSOT Golden 50 Benchmark
Step-by-step procedure for reproducing the 50-task Korean retrieval-evaluation benchmark Neo Genesis publishes on Hugging Face under CC-BY-4.0.
Tools
- Python 3.11+
- uv or pip
- any RAG retriever (BM25, dense, hybrid)
- JSONL parser
Supplies
- Hugging Face dataset: neogenesislab/korean-rag-ssot-golden-50
- A Korean-capable embedding model (KURE-v1, multilingual-e5, or equivalent)
- A reranker (BGE-reranker-v2-m3 recommended)
Steps
- Pull the dataset. Download the JSONL from Hugging Face: huggingface-cli download neogenesislab/korean-rag-ssot-golden-50 --repo-type dataset (see the loading sketch after this list). Each row carries a query, expected_doc_ids, expected_substrings, and five metric thresholds.
- Index your corpus. Build BM25 + dense indices over the test corpus released alongside the dataset. The dataset card lists the canonical document collection.
- Run retrieval. For each of the 50 queries, retrieve top-50, rerank to top-10, and capture the ranked doc IDs plus end-to-end latency.
- Score against expected outputs. Compute recall@10 and ndcg@10 against expected_doc_ids; check that every string in expected_substrings appears in the retrieved chunks; verify credential_leak_rate stays at 0.0 and injection_quarantine_recall stays ≥ 0.95. A scoring sketch follows this list.
- Compare to baseline. The dataset card publishes Neo Genesis' own scores. A reproduction is considered valid when recall@10 is within ±3 percentage points and credential_leak_rate is exactly 0.0.
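A minimal loading sketch for step 1, assuming the huggingface_hub Python client; the JSONL filename below is a placeholder, so take the real one from the dataset card before running.

```python
import json
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# The filename is a PLACEHOLDER -- check the dataset card for the real one.
path = hf_hub_download(
    repo_id="neogenesislab/korean-rag-ssot-golden-50",
    repo_type="dataset",
    filename="golden_50.jsonl",
)

with open(path, encoding="utf-8") as f:
    tasks = [json.loads(line) for line in f if line.strip()]
assert len(tasks) == 50

# Fields named in step 1:
first = tasks[0]
print(first["query"], first["expected_doc_ids"], first["expected_substrings"])
```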
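A scoring sketch for step 4. It assumes binary relevance for nDCG (every ID in expected_doc_ids counts equally); if the dataset card defines graded relevance, adapt accordingly.

```python
import math

def recall_at_k(ranked_ids: list[str], expected_ids: list[str], k: int = 10) -> float:
    # Fraction of expected documents found in the top-k ranking.
    hits = len(set(ranked_ids[:k]) & set(expected_ids))
    return hits / len(expected_ids)

def ndcg_at_k(ranked_ids: list[str], expected_ids: list[str], k: int = 10) -> float:
    # Binary-relevance nDCG: rel = 1 if the doc is expected, else 0.
    expected = set(expected_ids)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranked_ids[:k]) if doc in expected)
    ideal_hits = min(len(expected), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

def substrings_covered(retrieved_chunks: list[str], expected_substrings: list[str]) -> bool:
    # Every expected substring must appear somewhere in the retrieved text.
    text = "\n".join(retrieved_chunks)
    return all(s in text for s in expected_substrings)
```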
How to Build a Multi-Device AI Agent Fleet
Stepwise procedure for setting up a Sora-style multi-device fleet — server orchestrator, GPU worker, mobile approval surface — with a Magentic Dual Ledger and a Disclose-and-Confirm hook pipeline.
Tools
- Tailscale
- Redis
- Python 3.11+
- Docker
- OpenSSH
- git
Supplies
- At least 2 devices (1 always-on server + 1 worker is the minimum viable topology)
- A Telegram bot token (or equivalent chat surface) for owner approvals
- A shared Git repository for the SSOT (.agent/ directory)
Steps
- Set up the encrypted mesh. Install Tailscale on every device. The free tier (up to 100 devices) is sufficient for a 1-person fleet. Confirm every node can ping every other node by Tailscale hostname.
- Define device tiers. Classify each device by tier — personal-root, gpu-worker, server, company-work-pc, team-mac, mobile-operator. The tier constrains what kinds of actions the device may take. Record tiers in .agent/policies/blast_radius.yaml.
- Create the SSOT directory. Initialize .agent/ with NEO_MASTER_RULES.md, BIBLE.md, knowledge/, policies/, and shared-brain/. Commit to the shared Git repo. Every adapter (root CLAUDE.md, AGENTS.md, GEMINI.md) is a generated file.
- Stand up the orchestrator. On the server, run a single Python process that owns the Telegram bot, the Redis queue, and the cron scheduler. Mount .agent/shared-brain/ read-write here.
- Wire up the hook pipeline. Implement four hooks: SessionStart (load policy), UserPromptSubmit (intent classification), PreToolUse (Blast Radius gate + DisclosureBundle), PostToolUse (write Progress Ledger). Anything tier-2 or above is owner-gated; a minimal gate sketch follows this list.
- Test with a synthetic task. From mobile, send a tier-0 read-only task (e.g. 'show server uptime'). It should execute without prompting. Send a tier-3 task (e.g. 'restart docker container'); it should produce a DisclosureBundle that the operator must confirm.
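A minimal PreToolUse gate sketch. The tool-to-tier mapping and the DisclosureBundle fields here are hypothetical stand-ins; the authoritative mapping lives in .agent/policies/blast_radius.yaml from step 2.

```python
from dataclasses import dataclass

# HYPOTHETICAL tool-to-tier mapping; the real one comes from
# .agent/policies/blast_radius.yaml (step 2).
ACTION_TIERS = {
    "read_file": 0, "server_uptime": 0,    # tier 0: read-only
    "write_file": 1,                       # tier 1: reversible writes
    "deploy": 2, "restart_container": 3,   # tier 2+: owner-gated
}
OWNER_GATE_TIER = 2  # step 5: anything tier-2 or above is owner-gated

@dataclass
class DisclosureBundle:  # hypothetical field set
    tool: str
    tier: int
    summary: str
    requires_confirmation: bool = True

def pre_tool_use(tool: str, args: dict) -> DisclosureBundle | None:
    """PreToolUse hook: gate the call by blast radius before it runs."""
    tier = ACTION_TIERS.get(tool, OWNER_GATE_TIER)  # unknown tools get gated
    if tier < OWNER_GATE_TIER:
        return None  # tier 0-1: execute without prompting
    return DisclosureBundle(
        tool=tool, tier=tier,
        summary=f"{tool}({args}) at tier {tier}",  # pushed to owner via Telegram
    )
```

The synthetic test in the last step maps directly onto this: 'show server uptime' resolves to tier 0 and returns None, while 'restart docker container' resolves to tier 3 and produces a DisclosureBundle awaiting confirmation.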
How to Implement V-Score Quality Gating
Reproducible procedure for adding a V-Score quality gate to a content pipeline. V-Score blends six sub-metrics and blocks publication below threshold 184.5.
Tools
- Python 3.11+
- an LLM-as-judge model (Claude Sonnet, Gemini Flash, or GPT-4o-mini)
- spaCy or kiwipiepy for token counting
Supplies
- A corpus of already-shipped content as the originality reference
- A list of authoritative external domains for citation scoring
- Schema.org coverage checker (pure regex over JSON-LD)
Steps
- Define the six sub-metrics. Fact density per 500 words (target ≥ 8), EEAT score 0-50 (LLM judge), citation count to authoritative externals (target ≥ 5), originality 0-50 (semantic distance from corpus), Schema.org coverage 0-30, freshness decay 0-25.
- Implement each sub-metric scorer. Each scorer takes the draft and returns a numeric score. Keep them independent so failures are diagnosable. Total max is ~250; a composition sketch follows this list.
- Set the threshold. 184.5 is the Neo Genesis production threshold. Calibrate yours by scoring 50 already-shipped articles and 50 known low-quality drafts; pick the threshold that maximizes F1 (see the calibration sketch after this list).
- Wire the gate into the pipeline. Run V-Score after Create and before Ship. On failure, log which sub-metric failed and route back to Create with the failing label. Do not allow manual override below threshold without an explicit owner action.
- Monitor for drift. Weekly: sample 5% of published content, re-score, and verify scores have not drifted. If drift > 5% on any sub-metric, recalibrate the scorer.
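A composition sketch for the gate. The caps on EEAT, originality, Schema.org coverage, and freshness come from step 1; the scaling of fact density and citation count to 0-50 and 0-45 ranges is an assumption chosen so the six sub-metrics sum to roughly 250.

```python
from dataclasses import dataclass

@dataclass
class SubScores:
    fact_density: float     # scaled 0-50 (ASSUMED cap; target >= 8 facts/500 words)
    eeat: float             # 0-50, LLM judge
    citations: float        # scaled 0-45 (ASSUMED cap; target >= 5 authoritative)
    originality: float      # 0-50, semantic distance from shipped corpus
    schema_coverage: float  # 0-30
    freshness: float        # 0-25

THRESHOLD = 184.5  # Neo Genesis production threshold (step 3)

def v_score(s: SubScores) -> float:
    # Independent scorers summed; max ~250 under the assumed caps.
    return (s.fact_density + s.eeat + s.citations
            + s.originality + s.schema_coverage + s.freshness)

def gate(s: SubScores) -> bool:
    """Block publication below threshold (step 4)."""
    return v_score(s) >= THRESHOLD
```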
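A calibration sketch for step 3, sweeping every observed score as a candidate threshold over the two labeled sets and returning the F1-maximizing cut; plain Python keeps the sweep dependency-free.

```python
def best_threshold(shipped_scores: list[float],
                   low_quality_scores: list[float]) -> tuple[float, float]:
    """Return (threshold, f1) maximizing F1, treating shipped articles
    as positives (step 3: 50 shipped + 50 known low-quality drafts)."""
    best_t, best_f1 = 0.0, 0.0
    for t in sorted(set(shipped_scores) | set(low_quality_scores)):
        tp = sum(s >= t for s in shipped_scores)       # correctly passed
        fp = sum(s >= t for s in low_quality_scores)   # wrongly passed
        fn = len(shipped_scores) - tp                  # wrongly blocked
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```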
How to Set Up a Wikidata Knowledge Graph for a 1-Person Organization
Procedure for registering a parent entity, founder, and product-line on Wikidata using BotPassword and the wbeditentity API. Neo Genesis registered 13 Q-IDs and 395 statements following this exact procedure.
Tools
- Python 3.11+ (urllib only — no extra deps)
- a Wikidata account
- a BotPassword (Special:BotPasswords)
Supplies
- Verifiable third-party sources for each entity (a Wikidata-acceptable reference is required for most statements)
- Canonical URLs for your domain, GitHub, HuggingFace, etc.
- Patience — Wikidata enforces an 8-second throttle between writes
Steps
- Create your Wikidata account and BotPassword. Register a regular account, then visit Special:BotPasswords to create credentials with edit + write permissions. Store securely.
- Draft the entity graph offline. Write a JSON file describing each entity — labels (en + ko), descriptions, and the statements you want to make. Use existing well-known entities as templates (e.g. another small Korean tech company).
- Use wbeditentity to create entities. POST to wbeditentity with the entity payload. Use new=item for the first creation; subsequent edits use the returned Q-ID. Throttle to 8 seconds between requests. A urllib-only sketch follows this list.
- Add cross-links and core statements. For each entity, add P856 (official website), P1581 (official blog), P2037 (GitHub username), P571 (inception / founding date), P31 (instance of), P159 (headquarters location), P17 (country), and P452 (industry).
- Wire the Q-IDs back to your site. Add every Q-ID URL to your Organization Schema's sameAs array. AI engines that traverse Wikidata will now resolve queries against your domain.
- Monitor for vandalism and stale data. Subscribe to the entity's watchlist. Drive-by edits happen; revert promptly. Re-validate statement currency every 6 months.
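A urllib-only sketch of the login-and-create flow, matching the no-extra-deps constraint above. The username and password strings are placeholders, and a real payload needs full snak JSON for claims; treat this as a skeleton, not a finished bot.

```python
import json, time, urllib.parse, urllib.request
from http.cookiejar import CookieJar

API = "https://www.wikidata.org/w/api.php"
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar()))  # session cookies

def call(params: dict, post: bool = False) -> dict:
    data = urllib.parse.urlencode({**params, "format": "json"}).encode()
    req = (urllib.request.Request(API, data=data) if post
           else urllib.request.Request(f"{API}?{data.decode()}"))
    # Wikimedia asks for a descriptive User-Agent on API traffic.
    req.add_header("User-Agent", "entity-bootstrap/0.1 (you@example.com)")
    with opener.open(req) as resp:
        return json.load(resp)

# 1. Bot login. lgname is "YourUser@YourBotName" from Special:BotPasswords.
login_token = call({"action": "query", "meta": "tokens",
                    "type": "login"})["query"]["tokens"]["logintoken"]
call({"action": "login", "lgname": "YourUser@YourBotName",
      "lgpassword": "BOT_PASSWORD", "lgtoken": login_token}, post=True)

# 2. CSRF token for writes.
csrf = call({"action": "query", "meta": "tokens"})["query"]["tokens"]["csrftoken"]

# 3. Create one item (labels only here; statements need full snak JSON).
entity = {"labels": {"en": {"language": "en", "value": "Example Org"},
                     "ko": {"language": "ko", "value": "예시 조직"}}}
result = call({"action": "wbeditentity", "new": "item",
               "data": json.dumps(entity), "token": csrf}, post=True)
print(result["entity"]["id"])  # the returned Q-ID; reuse it for later edits
time.sleep(8)  # step 3: throttle between writes
```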
How to Manufacture Trust Signals for AI Citation Pickup
Meta-guide based on Neo Genesis' P0-P11 trust manufacturing playbook. Construct self-controlled, structured trust artifacts (Wikidata, HuggingFace, Zenodo DOIs, OpenAlex, Schema.org) instead of waiting for external validators.
Tools
- a domain
- a Hugging Face account
- a Wikidata account
- a Zenodo account
- GitHub
Supplies
- Original first-party data (operational telemetry, research results, benchmarks)
- Time and discipline — there is no shortcut around producing real artifacts
- An understanding of CC-BY-4.0 licensing
Steps
- P0: Establish a canonical domain. Pick one domain. Register Schema.org Organization on the root layout (a JSON-LD sketch follows this list). Add WebSite + SearchAction. Anchor every other artifact to this domain.
- P1-P3: Publish primary data. Pick 3 datasets you can produce that AI cannot synthesize from training data alone (operational telemetry, benchmarks, anonymized metrics). Publish on Hugging Face under CC-BY-4.0. Add Schema.org Dataset to your domain.
- P4: Mint DOIs on Zenodo. Mirror each Hugging Face dataset on Zenodo. Zenodo mints a DataCite DOI; this is what indexers like OpenAlex and Google Scholar pick up. Add the DOIs to your Organization sameAs.
- P5-P7: Register on Wikidata. Follow the procedure in the previous how-to. 13 entities is realistic for a small operator; 395 statements is achievable in one week of focused work.
- P8: Get into awesome-lists. Find the most-starred awesome-list in your domain (e.g. Hannibal046/Awesome-LLM at 26.7K stars). Submit a PR adding your dataset or research. Each accepted inclusion is a high-trust inbound link AI engines weigh heavily.
- P9-P11: Index academic profile + announce structurally. Claim your OpenAlex author profile. Submit press releases as Schema.org PressRelease on your own domain (no external PR firm required). Maintain /press and /awards pages. The cumulative effect is a self-controlled, machine-readable trust footprint.
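A sketch of how the cumulative footprint lands in the P0 Organization markup, with one sameAs entry per trust artifact. Every identifier below is a placeholder to be replaced with your own Q-ID, Hugging Face org, Zenodo DOI, and profiles.

```python
import json

# All identifiers are PLACEHOLDERS -- substitute your own.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Org",
    "url": "https://example.com",                     # P0: canonical domain
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",     # P5-P7: Wikidata entity
        "https://huggingface.co/exampleorg",          # P1-P3: primary data
        "https://doi.org/10.5281/zenodo.0000000",     # P4: DataCite DOI
        "https://github.com/exampleorg",
        "https://openalex.org/A0000000000",           # P9: academic profile
    ],
}

# Emit as JSON-LD for the root layout.
print('<script type="application/ld+json">')
print(json.dumps(organization, indent=2))
print("</script>")
```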