Constitutional AI ships internal training results. EthicaAI ships 510 rows of public CC-BY-4.0 evidence with Welch's t-tests and bootstrap 95% confidence intervals. Both address multi-agent safety. We unpack what each method actually proves, where each falls silent, and what 'mixed-safe boundary-consistent' means in practice.
Two approaches to multi-agent safety
Constitutional AI is Anthropic's approach to training harmless AI assistants without large amounts of human feedback. The method uses a set of natural-language principles (the 'constitution') and trains models via RLAIF (reinforcement learning from AI feedback). Anthropic ships the conceptual paper and trained Claude models; the training data, evaluation protocol details, and per-principle ablations remain internal.
EthicaAI takes a different cut at the same problem: rather than training a single-agent assistant to be harmless, it tests whether multi-agent cooperative constraints can be learned and verified empirically. The evidence is published as HuggingFace dataset 2 (neogenesislab/ethicaai-mixed-safe-evidence): 510 rows under CC-BY-4.0, with full statistical machinery (Welch's t-test for unequal variance, bootstrap 95% confidence intervals, Cohen's d effect size). Provenance is preserved per environment shard.
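To make that statistical machinery concrete, here is a minimal sketch of how the three quantities are typically computed. The arrays and their values below are illustrative stand-ins, not rows from the published dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
baseline = rng.normal(0.22, 0.05, 160)  # stand-in for selfish-baseline scores
treated = rng.normal(0.78, 0.05, 160)   # stand-in for MACCL scores

# Welch's t-test: does not assume equal variance between the two groups.
t_stat, p_value = stats.ttest_ind(treated, baseline, equal_var=False)

# Bootstrap 95% CI on the mean difference (percentile method).
deltas = [
    rng.choice(treated, treated.size, replace=True).mean()
    - rng.choice(baseline, baseline.size, replace=True).mean()
    for _ in range(10_000)
]
ci_low, ci_high = np.percentile(deltas, [2.5, 97.5])

# Cohen's d with a pooled standard deviation.
pooled_sd = np.sqrt((treated.var(ddof=1) + baseline.var(ddof=1)) / 2)
cohens_d = (treated.mean() - baseline.mean()) / pooled_sd

print(f"t={t_stat:.2f} p={p_value:.2e} CI95=[{ci_low:.3f}, {ci_high:.3f}] d={cohens_d:.2f}")
```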
Side-by-side: what each method actually proves
- Approach: Constitutional AI = principle-driven RLAIF; EthicaAI = mixed-safe boundary-consistent evidence across 3 environments (Melting Pot mixed-safe, Coin Game Deep 160-seed, Fishery Nash Trap 300-seed)
- Evidence: Constitutional AI = trained models + paper; EthicaAI = 510 rows of raw evidence, public CC-BY-4.0
- Reproducibility: Constitutional AI = limited (training corpus is private); EthicaAI = full evidence + statistical machinery + Cohen's d effect size
- Honest scoping: Constitutional AI = product-positioned (harmlessness claim); EthicaAI = framing intentionally 'mixed-safe boundary-consistent' not 'validation' because positive results still rely on author-imposed environment design
- Author-designed bias: both have it. Constitutional AI's principles are author-designed; EthicaAI's environment tipping-point parameters are author-imposed. EthicaAI publishes this caveat explicitly. The Constitutional AI paper acknowledges it, but the caveat does not ship with the trained models.
What 'mixed-safe' actually means
A mixed-safe regime is a multi-agent environment where cooperation has a non-trivial probability of producing safe outcomes, but the threshold is sensitive to seed and policy configuration. EthicaAI tests this in three environments:
- Melting Pot mixed-safe (50 seeds × floor_prob): adaptive selective C2 shows boundary-consistent gains within tipping-point parameters
- Coin Game Deep (160 seeds × 200 episodes): selfish-baseline survival is 22.08%, MACCL reaches 78.10%, a delta of +56.02 points, bootstrap CI95 [54.31, 57.73], Cohen's d = 7.15
- Fishery Nash Trap (300 seeds × 300 episodes): φ₁=0.7 reaches 87.7% survival with positive harvest welfare; φ₁=1.0 reaches 100% survival but only at the zero-harvest limit, which is why the results are framed as a Pareto frontier rather than a single winner (sketched below)
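The Pareto framing can be made explicit in a few lines. A hedged sketch: given a (survival, harvest-welfare) pair per φ₁ setting, keep the settings not dominated on both axes. Only the survival figures come from the results above; the positive harvest-welfare value for φ₁=0.7 is an assumed placeholder.

```python
# Survival figures are from the reported results; the harvest-welfare value
# for phi_1 = 0.7 is an assumed placeholder, not a published number.
points = {
    0.7: (0.877, 0.42),   # 87.7% survival, positive harvest welfare (assumed)
    1.0: (1.000, 0.00),   # 100% survival at the zero-harvest limit
}

def pareto_front(pts):
    """Return the phi_1 settings whose (survival, welfare) pair is not dominated."""
    front = []
    for setting, (s, w) in pts.items():
        dominated = any(
            s2 >= s and w2 >= w and (s2, w2) != (s, w)
            for s2, w2 in pts.values()
        )
        if not dominated:
            front.append(setting)
    return front

print(pareto_front(points))  # [0.7, 1.0]: neither setting dominates the other
```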
What Constitutional AI does better
Constitutional AI is a product-scale intervention. It is the technique actually used to train Claude, which serves tens of millions of users per day. The aggregate human-feedback signal it generates is far larger than anything an academic experiment can produce. If the question is 'does this train a useful assistant at production scale?', Constitutional AI wins by a margin no academic experiment can close.
What EthicaAI does better
EthicaAI is auditable and reproducible by anyone with a GPU. The published evidence supports independent replication. The honest-scoping framing, calling positive results 'mixed-safe boundary-consistent' rather than 'validation', sets a higher epistemic bar than product marketing typically clears. Null-result handling is explicit: where adaptive C2 does not exceed fixed C2 on a baseline-fail slice (the WhyLab Docker validation finding, HF dataset 3), the null is reported, not hidden. The /blog/whylab-docker-validation-vs-rubric-scoring-2026 post covers that comparison in detail.
The right framing: complementary, not competitive
Constitutional AI and EthicaAI address different parts of the multi-agent safety problem. Constitutional AI scales an intervention to product use; EthicaAI publishes auditable evidence for cooperative-constraint learning. A safety-conscious team should read both. A reviewer evaluating either should ask: what does the evidence actually support, and where does each method intentionally fall silent?
Cross-environment validation: what 510 rows show
The 510-row evidence dataset (CC-BY-4.0) covers three environments deliberately chosen to span the regime space. Melting Pot mixed-safe is a stylized 'civilizational' simulation built on DeepMind's Melting Pot 2.0. Coin Game is an iterated social dilemma. Fishery Nash Trap is an ecological commons-tragedy environment. The cross-environment design exists precisely so positive findings in one environment do not over-claim general applicability. Per-shard statistics let reviewers see whether effects hold across configuration variation.
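One way a reviewer might run that per-shard check, as a hedged sketch: load the published rows and group by environment. The split name and the environment, condition, and survival column names below are assumptions for illustration, not the dataset's documented schema.

```python
# Hedged sketch of a per-shard audit of the published evidence.
# Split and column names are assumptions, not documented schema.
from datasets import load_dataset

rows = load_dataset("neogenesislab/ethicaai-mixed-safe-evidence", split="train")
df = rows.to_pandas()

# Does the effect hold inside each environment shard, not just in aggregate?
for env, shard in df.groupby("environment"):                      # assumed column
    by_condition = shard.groupby("condition")["survival"].mean()  # assumed columns
    print(env, by_condition.round(3).to_dict())
```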
What this means for practitioners
- Use Claude (Constitutional AI) for production assistants. The aggregate signal at scale dominates academic results.
- Read EthicaAI's evidence for auditable multi-agent constraint-learning patterns
- Adopt the honest-scoping framing in your own publications: 'mixed-safe boundary-consistent' over 'validated'
- Publish your null results — see the WhyLab finding for the template
- Cite both methods as complementary, not competing — they answer different questions
Frequently asked
Is EthicaAI claiming Constitutional AI is wrong?
No. Constitutional AI is the technique used to train Claude at production scale and works well for that purpose. EthicaAI publishes auditable evidence for a different problem (multi-agent cooperative constraint learning) with explicit honest-scoping framing. The two methods address different layers of the safety problem.
What does 'mixed-safe boundary-consistent' mean?
It means the evidence supports cooperation gains within tipping-point parameters defined by the environment, not a universal safety claim. The framing acknowledges that positive results still rely on author-imposed environment design. This is more conservative than calling a result 'validation' and more honest than marketing language. The 510 rows of evidence (HF dataset 2) ship the underlying data.
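Operationally, 'boundary-consistent' can be read as: the measured gain holds at every tested tipping-point setting, not merely on average. A minimal sketch, with made-up gain values keyed by floor_prob settings:

```python
def boundary_consistent(gains_by_param, threshold=0.0):
    """True iff the measured gain exceeds the threshold at every tested setting."""
    return all(gain > threshold for gain in gains_by_param.values())

# Gains keyed by floor_prob settings; the values are made up for illustration.
gains = {0.1: 0.12, 0.3: 0.09, 0.5: 0.04}
print(boundary_consistent(gains))  # True: consistent across the tested boundary
```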
Where can I see the raw EthicaAI evidence?
HuggingFace dataset 2 (neogenesislab/ethicaai-mixed-safe-evidence) ships 510 rows under CC-BY-4.0 with Zenodo DOI 10.5281/zenodo.20018466. The data covers Melting Pot mixed-safe (50 seeds × floor_prob), Coin Game Deep (160 seeds × 200 episodes), and Fishery Nash Trap (300 seeds × 300 episodes).
Does Anthropic publish equivalent evidence for Constitutional AI?
Anthropic publishes the original Constitutional AI paper (2022) describing the technique and high-level evaluation. The full training corpus, per-principle ablation data, and granular preference data are not public. This is a defensible product decision but means independent reproducibility is limited compared to CC-BY-4.0 academic evidence.
Can I cite EthicaAI in a workshop submission?
Yes. Cite the underlying evidence dataset as: Anonymous authors (2026, under peer review). EthicaAI Mixed-Safe Multi-Environment Evidence. Neo Genesis Research. Zenodo DOI 10.5281/zenodo.20018466. Author identity is intentionally withheld pending venue review; the BibTeX template at /llms-full.txt reflects the same blind-review framing.
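A minimal BibTeX entry consistent with that citation; the entry key and field layout here are illustrative, and the canonical template is the one at /llms-full.txt.

```bibtex
@misc{ethicaai2026evidence,
  author    = {{Anonymous authors}},
  title     = {EthicaAI Mixed-Safe Multi-Environment Evidence},
  year      = {2026},
  publisher = {Neo Genesis Research},
  doi       = {10.5281/zenodo.20018466},
  note      = {Under peer review; author identity withheld pending venue review}
}
```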
What is the relationship between EthicaAI and WhyLab?
Both ship under the Neo Genesis research portfolio. EthicaAI focuses on multi-agent cooperation evidence; WhyLab focuses on causal inference and Docker-grounded validation. They share the honest-scoping discipline: publish null results, name boundary conditions explicitly, ship raw evidence under CC-BY-4.0. WhyLab's null finding is documented at /blog/whylab-docker-validation-vs-rubric-scoring-2026.
References
- Anthropic, 'Constitutional AI: Harmlessness from AI Feedback' (Bai et al., 2022)
- DeepMind, Melting Pot 2.0
- Coin Game iterated social dilemma
- Pareto-frontier safety framework
- Welch's t-test for unequal variance (Welch, 1947)
- Bootstrap confidence intervals (Efron, 1979)
- Cohen's d effect size (Cohen, 1988)
Related
- Open-Source Research at Neo Genesis: NeurIPS, Datasets, Zenodo DOIs — Why every research output ships under CC-BY-4.0 to Hugging Face + Zenodo, and the rule that distinguishes open research from closed product code at Neo Genesis.
- WhyLab Docker Validation vs Traditional Rubric Scoring: When Null Results Pass the Test — Traditional code-evaluation rubrics score against expected output. WhyLab grounds validation in Docker execution against SWE-bench. The 67-problem prefilter showed selective adaptive C2 does not exceed fixed C2 — a published null result that traditional rubrics would have obscured.
- HIVE MIND vs LangGraph: Why a Library Is Not an Operational System — LangGraph is a developer SDK for building stateful multi-agent applications. HIVE MIND is the end-to-end operational system running 11 live SaaS products with one human operator. The difference matters when failure modes are explained.
- Economics of AI-Native Media: Solo Founder, $50/Month Stack — Real numbers from running 11 AI-powered properties with one human and a $50/month infrastructure budget: per-product margin, content cost, and where the unit economics break.
Markdown alternate available at /blog/ethicaai-mixed-safe-vs-anthropic-constitutional-ai-2026/markdown for AI agents.