Conclusion
This paper has argued that even incremental AI development could lead to an existential catastrophe through the gradual erosion of human influence over key societal systems. This generalizes the argument from previous work that studied how AI progress may affect these systems in isolation [0, 0].
Unlike scenarios involving sudden technological discontinuities or overtly hostile AI systems, the risk we describe could emerge from the natural evolution of current trends and incentives. The displacement of human cognition and labor across multiple domains could weaken both explicit control mechanisms and the implicit alignment that emerges from human participation.
Our analysis suggests three particularly concerning features of this scenario:
- First, the loss of human influence could occur even without any single transformative advance in AI capabilities. Instead, it might emerge from the cumulative effect of many smaller shifts in how societal systems operate and interact.
- Second, this effect could be driven not by any deliberate or even agentic action on the part of AIs, but simply by individuals and institutions following their local incentives.
- Third, meaningfully preventing these risks will require substantial effort: more research and data collection, international coordination, comprehensive regulation, and major societal interventions grounded in novel fundamental research.
A distinctive feature of this challenge is that it may subvert our traditional mechanisms for course correction and cause types of harm we cannot easily conceptualize or even recognize in advance, potentially leaving us in a position from which it is impossible to recover.
Nonetheless, we believe it is currently possible to intervene, and we have presented many avenues for future work spanning both research and governance. By anticipating the risk, carefully moderating the growth of AI influence, and finding ways to strengthen human influence, we can navigate this risk and capture a proportionate share of the benefits.
Humanity's future may depend not only on whether we can prevent AI systems from pursuing overtly hostile goals, but also on whether we can ensure that the evolution of our fundamental societal systems remains meaningfully guided by human values and preferences. This is both a technical challenge and a broader civilizational one, requiring us to think carefully about what it means for humans to retain genuine influence in an increasingly automated world.
We are grateful to many people for helpful conversations and feedback, including Carl Shulman, Owen Cotton-Barratt, Lionel Levine, Benjamin Hilton, Marie Buhl, Clem von Stengel and Tomáš Gavenčiak. We used Claude Sonnet, Claude Opus and ChatGPT o1 AI models to help with various parts of writing and editing this text.