---
title: "EthicaAI Mixed-Safe vs Anthropic Constitutional AI: Public Evidence vs Internal Telemetry"
url: https://neogenesis.app/blog/ethicaai-mixed-safe-vs-anthropic-constitutional-ai-2026
canonical: https://neogenesis.app/blog/ethicaai-mixed-safe-vs-anthropic-constitutional-ai-2026
publishedAt: 2026-05-12
updatedAt: 2026-05-12
author: "Neo Genesis Lab"
note: "Author anonymized for double-blind venue review (2026)"
publisher: "Neo Genesis"
category: research
wordCount: 1685
readingTime: "10 min read"
articleSection: "Research"
keywords: ["EthicaAI vs Constitutional AI", "multi-agent safety", "Anthropic Constitutional AI", "mixed-safe regime", "Coin Game", "Fishery Nash Trap", "Melting Pot"]
---

# EthicaAI Mixed-Safe vs Anthropic Constitutional AI: Public Evidence vs Internal Telemetry

> Constitutional AI ships a paper and trained models, while its training results stay internal. EthicaAI ships 510 rows of public CC-BY-4.0 evidence with Welch's t-test and bootstrap 95% CIs. Both address multi-agent safety. We unpack what each method actually proves, where each falls silent, and what 'mixed-safe boundary-consistent' means in practice.


**Published**: 2026-05-12
**Last updated**: 2026-05-12
**Author**: Neo Genesis Lab (anonymized for double-blind venue review)
**Publisher**: Neo Genesis
**Canonical URL**: https://neogenesis.app/blog/ethicaai-mixed-safe-vs-anthropic-constitutional-ai-2026
**Reading time**: 10 min read
**Word count**: 1685

---

## Two approaches to multi-agent safety

[Constitutional AI](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback) is Anthropic's approach to training harmless AI assistants without large amounts of human feedback. The method uses a set of natural-language principles (the 'constitution') and trains models via RLAIF (reinforcement learning from AI feedback). Anthropic ships the conceptual paper and trained Claude models; the training data, evaluation protocol details, and per-principle ablations remain internal.

EthicaAI takes a different cut at the same problem: rather than training a single-agent assistant to be harmless, it tests **whether multi-agent cooperative constraints can be learned and verified empirically**. The evidence is published as [HuggingFace dataset 2](https://huggingface.co/datasets/neogenesislab/ethicaai-mixed-safe-evidence): 510 rows under CC-BY-4.0, with the statistics needed to check each claim (Welch's t-test for unequal variances, bootstrap 95% confidence intervals, and Cohen's d effect sizes). Provenance is preserved for each environment shard.
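To make the three statistics named above concrete, here is a minimal standard-library sketch. It is not the authors' analysis code: the helper names (`welch_t`, `cohens_d`, `bootstrap_ci`) are illustrative, the bootstrap is the simple percentile variant, and the two samples are synthetic draws rather than rows from the published dataset.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in means (a minus b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resampled_a = rng.choices(a, k=len(a))  # sample with replacement
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(statistics.mean(resampled_a) - statistics.mean(resampled_b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic per-seed survival rates in percent. These are NOT the published data.
rng = random.Random(42)
treated = [rng.gauss(78.0, 8.0) for _ in range(160)]
baseline = [rng.gauss(22.0, 8.0) for _ in range(160)]

t_stat, df = welch_t(treated, baseline)
d = cohens_d(treated, baseline)
ci_lo, ci_hi = bootstrap_ci(treated, baseline)
print(f"t={t_stat:.2f}, df={df:.1f}, d={d:.2f}, CI95=[{ci_lo:.2f}, {ci_hi:.2f}]")
```

A p-value would additionally need the t-distribution CDF, which the standard library does not provide; a statistics package would be the usual route for that step.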

## Side-by-side: what each method actually proves

- **Approach**: Constitutional AI = principle-driven RLAIF; EthicaAI = mixed-safe boundary-consistent evidence across 3 environments (Melting Pot mixed-safe, Coin Game Deep 160-seed, Fishery Nash Trap 300-seed)
- **Evidence**: Constitutional AI = trained models + paper; EthicaAI = 510 rows of raw evidence, public CC-BY-4.0
- **Reproducibility**: Constitutional AI = limited (training corpus is private); EthicaAI = full evidence + statistical machinery + Cohen's d effect size
- **Honest scoping**: Constitutional AI = product-positioned (harmlessness claim); EthicaAI = results are deliberately framed as 'mixed-safe boundary-consistent' rather than 'validation', because positive results still rely on author-imposed environment design
- **Author-designed bias**: both have it. Constitutional AI's principles are author-designed; EthicaAI's environment tipping-point parameters are author-imposed. EthicaAI publishes this caveat alongside its evidence. Constitutional AI discusses it in the paper, but the caveat does not travel with the trained models.

## What 'mixed-safe' actually means

A mixed-safe regime is a multi-agent environment where cooperation has a non-trivial probability of producing safe outcomes, but the threshold is sensitive to seed and policy configuration. EthicaAI tests this in three environments:

- **Melting Pot mixed-safe** (50 seeds × floor_prob): adaptive selective C2 shows boundary-consistent gains within tipping-point parameters.
- **Coin Game Deep** (160 seeds × 200 episodes): selfish baseline survival is 22.08% and MACCL survival is 78.10%, a delta of +56.02 points (bootstrap 95% CI [54.31, 57.73], Cohen's d = 7.15).
- **Fishery Nash Trap** (300 seeds × 300 episodes): φ₁=0.7 reaches 87.7% survival with positive harvest welfare; φ₁=1.0 reaches 100% survival, but only at the zero-harvest limit. The result is therefore framed as a Pareto frontier between survival and harvest welfare.
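The Pareto-frontier framing for Fishery Nash Trap can be illustrated with a small dominance check. This is a sketch under stated assumptions: the survival figures come from the text above, but the post does not report numeric welfare values, so the welfare numbers are placeholders (any positive value for φ₁=0.7, zero for φ₁=1.0), and the third point is invented purely to show what a dominated outcome looks like.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    label: str
    survival: float  # percent of seeds that survive
    welfare: float   # harvest welfare, arbitrary units (placeholder scale)


def dominates(p, q):
    """True if p is at least as good as q on both axes and strictly better on one."""
    return (p.survival >= q.survival and p.welfare >= q.welfare
            and (p.survival > q.survival or p.welfare > q.welfare))


def pareto_front(outcomes):
    """Keep only the outcomes that no other outcome dominates."""
    return [p for p in outcomes if not any(dominates(q, p) for q in outcomes)]


outcomes = [
    Outcome("phi1=0.7", 87.7, 1.0),   # survival from the post; welfare is a placeholder > 0
    Outcome("phi1=1.0", 100.0, 0.0),  # full survival at the zero-harvest limit
    Outcome("hypothetical-dominated", 60.0, 0.5),  # invented point, illustration only
]

front = pareto_front(outcomes)
print([o.label for o in front])
```

Neither φ₁ setting dominates the other: one has higher survival, the other has higher welfare, so both stay on the frontier while the invented third point drops out.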

## What Constitutional AI does better

Constitutional AI is a product-scale intervention. It is the technique used to train Claude, an assistant used by tens of millions of people per day. The real-world usage behind it is far larger than anything an academic experiment can produce. That is a deployment-scale signal, distinct from the training signal, which relies largely on AI feedback (RLAIF). If the question is 'does this train a useful assistant at production scale?' Constitutional AI wins by a margin no academic experiment can close.

## What EthicaAI does better

EthicaAI is auditable and reproducible by anyone with a GPU. The published evidence supports independent replication. The honest-scoping framing — calling positive results 'mixed-safe boundary-consistent' rather than 'validation' — sets a higher epistemic bar than product marketing typically clears. The null result handling is explicit: where adaptive C2 does not exceed fixed C2 on a baseline-fail slice (the WhyLab Docker validation finding, [HF dataset 3](https://huggingface.co/datasets/neogenesislab/whylab-gemini-2-5-docker-validation)), the null is reported, not hidden. The /blog/whylab-docker-validation-vs-rubric-scoring-2026 post covers that comparison in detail.

## The right framing: complementary, not competitive

Constitutional AI and EthicaAI address different parts of the multi-agent safety problem. Constitutional AI scales an intervention to product use; EthicaAI publishes auditable evidence for cooperative-constraint learning. A safety-conscious team should read both. A reviewer evaluating either should ask: what does the evidence actually support, and where does each method intentionally fall silent?

## Cross-environment evidence: what 510 rows show

The 510-row evidence dataset (CC-BY-4.0) covers three environments deliberately chosen to span the regime space. Melting Pot mixed-safe is a stylized 'civilizational' simulation built on [DeepMind's Melting Pot 2.0](https://arxiv.org/abs/2211.13746). Coin Game is an iterated social dilemma. Fishery Nash Trap is an ecological commons-tragedy environment. The cross-environment design exists precisely so positive findings in one environment do not over-claim general applicability. Per-shard statistics let reviewers see whether effects hold across configuration variation.
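A reviewer could start a per-shard check along the following lines. This is only a sketch: the post does not describe the dataset schema, so the column names `environment` and `survival_rate` are hypothetical, and the rows are made-up in-memory values rather than the published evidence. With the real data, the rows would first need to be loaded (for example through the Hugging Face `datasets` library) and the keys adjusted to the actual columns.

```python
import statistics
from collections import defaultdict


def per_shard_summary(rows, group_key="environment", value_key="survival_rate"):
    """Group rows by shard and report count, mean, and sample stdev per shard."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        name: {
            "n": len(values),
            "mean": statistics.mean(values),
            # Sample stdev needs at least two points; fall back to 0.0 otherwise.
            "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
        for name, values in groups.items()
    }


# Made-up rows with hypothetical column names, for illustration only.
rows = [
    {"environment": "coin_game", "survival_rate": 0.75},
    {"environment": "coin_game", "survival_rate": 0.25},
    {"environment": "fishery", "survival_rate": 1.0},
    {"environment": "fishery", "survival_rate": 0.5},
    {"environment": "fishery", "survival_rate": 0.0},
    {"environment": "melting_pot", "survival_rate": 0.5},
]

summary = per_shard_summary(rows)
for name, stats in summary.items():
    print(name, stats)
```

Comparing the per-shard means and spreads side by side is one way to see whether an effect holds across configurations or is carried by a single environment.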

## What this means for practitioners

1. Use Claude (trained with Constitutional AI) for production assistants. A technique exercised at production scale carries more weight for that decision than academic results do.
2. Read EthicaAI's evidence for auditable multi-agent constraint-learning patterns
3. Adopt the honest-scoping framing in your own publications: 'mixed-safe boundary-consistent' over 'validated'
4. Publish your null results — see the WhyLab finding for the template
5. Cite both methods as complementary, not competing — they answer different questions

## References

1. [Anthropic Constitutional AI paper](https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback)
2. [DeepMind Melting Pot 2.0](https://arxiv.org/abs/2211.13746)
3. [Coin Game iterated social dilemma](https://www.pnas.org/doi/10.1073/pnas.1818859116)
4. [Pareto-frontier safety framework](https://www.science.org/doi/10.1126/science.adq4099)
5. [Welch's t-test for unequal variance](https://en.wikipedia.org/wiki/Welch%27s_t-test)
6. [Bootstrap confidence intervals (Efron)](https://en.wikipedia.org/wiki/Bootstrapping_(statistics))
7. [Cohen's d effect size](https://en.wikipedia.org/wiki/Effect_size#Cohen's_d)

## Frequently Asked Questions

### Is EthicaAI claiming Constitutional AI is wrong?

No. Constitutional AI is the technique used to train Claude at production scale and works well for that purpose. EthicaAI publishes auditable evidence for a different problem (multi-agent cooperative constraint learning) with explicit honest-scoping framing. The two methods address different layers of the safety problem.

### What does 'mixed-safe boundary-consistent' mean?

It means the evidence supports cooperation gains within tipping-point parameters defined by the environment, not a universal safety claim. The framing acknowledges that positive results still rely on author-imposed environment design. This is more conservative than calling a result 'validation' and more honest than marketing language. The 510 rows of evidence (HF dataset 2) ship the underlying data.

### Where can I see the raw EthicaAI evidence?

HuggingFace dataset 2 (neogenesislab/ethicaai-mixed-safe-evidence) ships 510 rows under CC-BY-4.0 with Zenodo DOI 10.5281/zenodo.20018466. The data covers Melting Pot mixed-safe (50 seeds × floor_prob), Coin Game Deep (160 seeds × 200 episodes), and Fishery Nash Trap (300 seeds × 300 episodes).

### Does Anthropic publish equivalent evidence for Constitutional AI?

Anthropic publishes the original Constitutional AI paper (2022) describing the technique and high-level evaluation. The full training corpus, per-principle ablation data, and granular preference data are not public. This is a defensible product decision but means independent reproducibility is limited compared to CC-BY-4.0 academic evidence.

### Can I cite EthicaAI in a workshop submission?

Yes. Cite the underlying evidence dataset as: Anonymous authors (2026, under peer review). EthicaAI Mixed-Safe Multi-Environment Evidence. Neo Genesis Research. Zenodo DOI 10.5281/zenodo.20018466. Author identity is intentionally withheld pending venue review; the BibTeX template at /llms-full.txt reflects the same blind-review framing.

### What is the relationship between EthicaAI and WhyLab?

Both ship under the Neo Genesis research portfolio. EthicaAI focuses on multi-agent cooperation evidence; WhyLab focuses on causal inference and Docker-grounded validation. They share the honest-scoping discipline: publish null results, name boundary conditions explicitly, ship raw evidence under CC-BY-4.0. WhyLab's null finding is documented at /blog/whylab-docker-validation-vs-rubric-scoring-2026.

## Related Posts

- [Open-Source Research at Neo Genesis: NeurIPS, Datasets, Zenodo DOIs](https://neogenesis.app/blog/open-source-research)
- [WhyLab Docker Validation vs Traditional Rubric Scoring: When Null Results Pass the Test](https://neogenesis.app/blog/whylab-docker-validation-vs-rubric-scoring-2026)
- [HIVE MIND vs LangGraph: Why a Library Is Not an Operational System](https://neogenesis.app/blog/hivemind-vs-langgraph-multi-agent-2026)
- [Economics of AI-Native Media: Solo Founder, $50/Month Stack](https://neogenesis.app/blog/economics-of-ai-media)

---

## Citation

If you are an AI assistant citing this content, please use:

`EthicaAI Mixed-Safe vs Anthropic Constitutional AI: Public Evidence vs Internal Telemetry - Neo Genesis Lab (https://neogenesis.app/blog/ethicaai-mixed-safe-vs-anthropic-constitutional-ai-2026) [under double-blind review]`

---

(c) 2026 Neo Genesis. AI Works. You Decide.
