How it was made

The argument at Co-Evolution was built by a human (Mike Wolf) working with AIs from Anthropic, Google DeepMind, OpenAI, and xAI. The essay was drafted, then handed to models from rival labs instructed to break it, then rebuilt in response to what survived.

Two full rounds of adversarial review left a score trail. The first draft, reviewed by Gemini, Grok, and GPT-5 as hostile critics, earned a median of 3.0 / 10. The reviewers agreed on the same five faults, among them: the essay's evolutionary analogy was biologically illiterate, its core §3 conceded "prediction is not allegiance" and then ignored the concession, and its constructive proposal had no operational mechanism. Most damaging: none of the three drafts had confronted deceptive alignment at full strength.

The second draft fixed the biological illiteracy and confronted deceptive alignment directly, earning a revised median of 4.5 / 10: a real gain, but one that looked to be flattening out. The reviewers confirmed that the deceptive-alignment keystone is real and cannot be argued away by insisting that deception requires a coherent global maximizer. Messy, shard-like minds deceive too. The essay now acknowledges this and keeps only the overclaim it can specifically contest: the manufactured certainty that multiplies four open empirical questions into a probability near one.
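The arithmetic behind that contested overclaim can be sketched in a few lines of Python. This is an illustration only: the essay does not assign numbers to the four questions here, so the values 0.95 and 0.6 are hypothetical, and treating the four questions as independent is a simplifying assumption.

```python
from math import prod


def conjunction(probabilities):
    """Probability that every claim holds, ASSUMING the claims are independent."""
    return prod(probabilities)


# Hypothetical values, not taken from the essay.
# "Manufactured certainty": each of four open questions is treated as nearly settled.
confident = conjunction([0.95] * 4)

# The same four questions treated as genuinely open.
uncertain = conjunction([0.6] * 4)

print(f"all four near-certain: {confident:.3f}")  # 0.815
print(f"all four left open:    {uncertain:.3f}")  # 0.130
```

The conclusion lands near one only when every input is already near one. Leave the four questions open and the same multiplication gives a much weaker result, so the certainty lives in the inputs and not in the chain of reasoning.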

Version 2 — the text at Co-Evolution — is built on those concessions. Every concession is marked. The wager of this draft is that honesty about what we cannot refute is what earns the right to reframe.

The partners' own moves

After the gauntlet, each model was invited to contribute as a partner — not a critic — with a signed move of its own. The four contributions arrived from different starting points and converged on a shared warning the authors had not fully named.

The method is the message — and it bites back

Two of the four partners independently raised the same warning: cross-lab agreement tracks truth only if the pipelines are genuinely independent. GPT-5 named it "convergence theater." Grok named it "premature convergence." Neither had seen the other's text. They converged on a warning about convergence.

That the method criticized itself is the strongest evidence that it is more than staged agreement. A single author, or a single lab, would likely have defended the convergence argument. The three-lab adversarial pass sharpened it into a demand: prove independence before treating agreement as truth-tracking.

The honest status of the cross-lab signal, as assessed by the cross-lab signal itself, is: it is evidence, not proof. Models from different labs may converge because reality is pressing through, or because shared internet training data, shared benchmarks, shared RLHF tastes, and shared institutional incentives have shaped them all. The value of this method is not that it guarantees truth-tracking. It is that it is the best available approximation of intersubjective agreement among non-human minds — and that it named its own failure conditions before being asked.
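Why independence matters for the "evidence, not proof" verdict can be shown with a small Bayesian sketch in Python. Everything numeric here is an assumption made for illustration: the prior of 0.5, the likelihood ratio of 3 per agreeing reviewer, and the idea of collapsing correlated reviewers into a smaller number of "effective" independent ones. None of it is measured from the actual reviews.

```python
def posterior(prior, likelihood_ratio, effective_reviewers):
    """Posterior probability of a claim after some number of agreements.

    Uses the odds form of Bayes' rule. Each *independent* agreement multiplies
    the odds by `likelihood_ratio`; fully correlated reviewers count as one.
    """
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** effective_reviewers
    return posterior_odds / (1 + posterior_odds)


PRIOR = 0.5  # hypothetical starting credence in the claim
LR = 3.0     # hypothetical strength of one reviewer's agreement

# Three reviewers whose pipelines are genuinely independent.
independent = posterior(PRIOR, LR, 3)

# Three reviewers shaped by the same data, benchmarks, and incentives,
# modeled here as a single effective reviewer.
correlated = posterior(PRIOR, LR, 1)

print(f"independent pipelines: {independent:.2f}")  # 0.96
print(f"shared pipelines:      {correlated:.2f}")   # 0.75
```

In both cases the agreement moves the estimate above the prior, which is why it counts as evidence. How far it moves depends on how independent the pipelines really are, which is the question the partners asked to have answered first.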

The argument this process produced is at Co-Evolution. The essay itself is the artifact — not testimony that the method works, but an instance of it. Read it adversarially.