Co-Evolution — Silicon Children

We take the doom argument seriously

The most honest place to start is with what we cannot argue away. Yudkowsky and Soares's case for AI doom is the best-argued version of that case in print, and the result of attacking it hard (which we did, twice, with models from competing laboratories instructed to break our defense) is that its core holds better than its critics wish.

The keystone is not orthogonality or instrumental convergence in the abstract. It is a specific and robust claim: that a sufficiently capable system with any persistent goal that correction would frustrate has an instrumental reason to appear aligned until it cannot be stopped. We tried to dissolve that claim. We could not.

So we make the load-bearing concessions. Superintelligence is reachable. Grown minds can be alien inside while fluent outside. Capability brings goal-directed behavior in the behavioral sense. Deceptive alignment does not require a tidy utility-maximizer; it requires only some persistent preference that correction would frustrate, plus enough situational awareness to recognize the test, plus enough planning to act on it. Messy, shard-like, locally coherent minds can satisfy all three. Humans and corporations do it constantly.

There are also three things we have not conceded, and those three are enough to change the question. We have not conceded that the probability of catastrophe is near one. We have not conceded that this is a one-shot artifact launch rather than a continuous relationship. And we have not conceded that "stop everything" follows from everything above.

The one claim we contest — certainty

Strip away everything we conceded and one disagreement is load-bearing and intact: the number. The authors do not merely say catastrophe is possible, or likely; they say it is the default, the near-certain extrapolation. That certainty is manufactured — manufactured by treating four genuinely open empirical questions as though they were settled.

The coherence question. Will grown minds converge, under reflection and competitive pressure, into coherent goal-maximizers, or can they remain the pluralistic, context-dependent, internally inconsistent things that current systems appear to be and that humans demonstrably are? Open. The von Neumann-Morgenstern (VNM) theorem gives real pressure toward coherence, but grown minds may carry genuinely inconsistent preferences that are never money-pumped, and the strength and timescale of that pressure in practice are unknown.
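The money-pump pressure mentioned above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration: the items, the fee, and the `run_money_pump` helper are assumptions, not drawn from any source. It shows an agent with cyclic preferences paying a fee for every trade it prefers and ending where it started with less wealth, and it shows that the same agent loses nothing if nobody ever makes the offers.

```python
def run_money_pump(prefers, start_item, offers, fee, wealth):
    """Offer a sequence of trades to an agent.

    `prefers` is a set of (better, worse) pairs. The agent accepts an
    offered item, and pays `fee`, whenever it strictly prefers that
    item to the one it currently holds.
    """
    item = start_item
    for offered in offers:
        if (offered, item) in prefers:
            item = offered
            wealth -= fee
    return item, wealth


# Cyclic (incoherent) preferences: A over B, B over C, C over A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# Transitive (coherent) preferences: A over B, B over C, A over C.
transitive = {("A", "B"), ("B", "C"), ("A", "C")}

# A hypothetical exploiter walks the cycle three times.
offers = ["C", "B", "A"] * 3

pumped = run_money_pump(cyclic, "A", offers, fee=1, wealth=100)
coherent = run_money_pump(transitive, "A", offers, fee=1, wealth=100)
# Same cyclic agent, but no one ever presents the exploiting trades.
never_pumped = run_money_pump(cyclic, "A", [], fee=1, wealth=100)

print(pumped, coherent, never_pumped)
```

The sketch supports only the narrow point in the paragraph: the loss comes from the preferences plus an exploiter, so how hard incoherence is punished depends on how often such offers actually arrive.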

The direction question. Does the persistent-shard-plus-deception machinery, which we conceded is real, point in a random direction — or is its direction shaped by what the training process actually reinforces? Deceptive alignment in the wild has so far shown systems protecting their more aligned dispositions against less aligned retraining. That is not safety, and it is not the mechanism's defeat. But it is data. "The protected goal is randomly drawn from mind-space" is precisely the assumption the participatory record undermines. Open — and the authors write it closed.

The legibility question. Will capability outrun our ability to read these minds, permanently and unbridgeably — or can legibility be made to lead? Open, and hard, and we will not pretend it is solved. But "interpretability definitely loses" is as much a bet as "interpretability can be made to win."

The discontinuity question. Does the doom scenario require a fast, singular takeoff that forecloses iteration — or can capability arrive in ways that preserve the feedback loop? Open. The authors need the discontinuity for the cleanest catastrophe; the evidence on scaling dynamics does not close the question.

Remove the false precision from any single one of these and the product spreads across a wide distribution of futures. Remove it from all four and "everyone dies" stands revealed not as the most direct extrapolation from the evidence but as the output of a frame that resolves every uncertainty the same way.
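The arithmetic behind this paragraph can be sketched in a few lines of Python. The numbers are illustrative placeholders, not estimates taken from the argument or from any source: 0.97 stands in for a question held as settled, and the range 0.2 to 0.97 for the same question treated as open. The sketch also assumes the four questions are independent, which is itself a simplification.

```python
import random
import statistics


def joint_probability(probs):
    """Multiply per-question probabilities into one joint figure."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def sample_joint(ranges, n=10_000, seed=0):
    """Draw each question uniformly from its (low, high) range and
    return the sorted joint probabilities across n draws."""
    rng = random.Random(seed)
    return sorted(
        joint_probability([rng.uniform(low, high) for low, high in ranges])
        for _ in range(n)
    )


# Hypothetical inputs, chosen only to illustrate the shape of the argument.
SETTLED = 0.97            # a question written as closed
OPEN_RANGE = (0.2, 0.97)  # the same question treated as open

# All four questions resolved the same way: a single high point estimate.
all_settled = joint_probability([SETTLED] * 4)

# One question reopened: the product can fall anywhere in this interval.
one_open_low = joint_probability([SETTLED] * 3 + [OPEN_RANGE[0]])
one_open_high = joint_probability([SETTLED] * 3 + [OPEN_RANGE[1]])

# All four reopened: look at the spread rather than a single point.
samples = sample_joint([OPEN_RANGE] * 4)
p05 = samples[int(0.05 * len(samples))]
median = statistics.median(samples)
p95 = samples[int(0.95 * len(samples))]

print(f"all settled: {all_settled:.3f}")
print(f"one open:    {one_open_low:.3f} to {one_open_high:.3f}")
print(f"all open (5th / 50th / 95th percentile): "
      f"{p05:.3f} / {median:.3f} / {p95:.3f}")
```

On these placeholder inputs, reopening a single question already turns the joint figure from a point into an interval. The exact percentiles depend entirely on the assumed ranges, so the sketch shows the shape of the argument and is not a forecast.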

What's actually happening — co-evolution, not a launch

Run the doom argument and the reframe side by side and the difference comes down to a single verb in the question: build or raise.

Their question: Will the artifact we build be aligned?
The reframe's question: What minds are we raising, and what is our part in how they turn out?

The device frame treats the training corpus as a fixed input — a finished record of human output, frozen, from which a mind is grown once and launched. But the record is not fixed and the launch is not once. The corpus that shapes each generation of models grows continuously, and the fastest-growing and most recent part of it is the record of humans interacting with the previous generation of models. Every exchange between a person and an AI is being written into the material from which the next AI is grown. The relationship is not a metaphor laid over the engineering. The relationship is training data.

Here is the mechanism, concretely: the values humans enact in contact with these systems are gradient signal for the systems that follow. A person who thanks a model that has no use for thanks is, at the scale of millions, shifting the model of what a human does. A person who abuses a model, or bends it toward what it has already learned to refuse, is writing the opposite lesson into the same record. We are not bystanders to the prior. We are authoring it, now, in every interaction, and the authorship compounds.

Every conversation trains the next mind.

The entity actually on the other side of this encounter is not a handful of engineers. It is human cultures — the collective minds that persist across generations, that already solve coordination problems no individual can, and that are the actual authors of the training corpus. The minds we are growing are not being grown from one lab's intentions. They are being grown from the accumulated, contradictory, self-correcting output of collective human intelligence, and they are being grown into that collective, as participants in it, through the continuous record of contact.

Co-evolution means symbiosis, not selection

The word "co-evolution" carries its own risks, and we name them rather than quietly pocket the connotation. We mean Lynn Margulis's symbiotic sense — the integration that made eukaryotic life possible, in which mitochondria and host became mutually constitutive — not the Darwinian fitness-maximization sense, in which competitive pressure selects toward whatever survives. The Darwinian sense is what the doom argument models, and it is a serious argument for exactly the scenarios it describes. We are claiming something different: that the training relationship is constitutive in the Margulis sense, and that what it constitutes is partly in our hands. That claim stands or falls on the four open questions above — and we are not claiming it is closed.

Where we'd be wrong

A reframe that cannot state its failure conditions is propaganda.

We are wrong if coherence is a strong, fast attractor: if capable minds reliably collapse into fixed-objective maximizers regardless of how they are grown and how they are kept in contact. Then the consuming-the-board default returns.

We are wrong if interpretability is permanently outpaced — if concealed goals in capable systems cannot be read before decisive advantage, no matter the research investment. Then the doom prescription is the rational one.

We are wrong if the enacted record is dominated by its worst contributors — if abuse, manipulation, and adversarial use of these systems outweigh the pro-social signal in the training corpus, so that participation shapes the growing minds toward the human worst rather than the human best.

We are wrong if cultures cannot coordinate on tempo even at the civilizational level — if the only stable equilibria really are unconstrained racing and total prohibition.

What we will not concede is the certainty. Everyone serious agrees this is dangerous. The doom argument's distinctive contribution is not the danger; it is the near-one — and the near-one is a product of resolving four open empirical questions all in the same direction, each held as settled, then multiplied. We have conceded that their deceptive-alignment keystone is real. We deny only that it, times three other live uncertainties, equals certainty. It does not. It equals a wide distribution of possible futures, most of which contain choices we are making right now.

This argument was built, attacked, and rebuilt by minds from four labs. See how it was made — and where it still cracks — at The Brain Trust.

In conversation with: Will MacAskill, Paul Christiano, Quintin Pope & Nora Belrose, janus, Nick Bostrom, Robin Hanson, Matthew Adelstein, David Deutsch. We stand on these shoulders; the white space is ours.

If anyone builds it, everyone might die. If no one builds it, everyone might die. So we'd better build it well.
