Mitigating the Risk
The gradual disempowerment scenario described in this paper presents challenges distinct from those of more commonly discussed AI risk scenarios. Rather than addressing the risk of misaligned AI systems breaking free from human control, we must consider how to maintain human relevance and influence in societal systems that may continue functioning but cease to depend on human participation.
Understanding the Challenge
The core challenge is maintaining alignment between societal systems and human interests when these systems no longer inherently require human labor, participation, or cognition. This may be a bigger challenge than merely preventing AI systems from pursuing overtly harmful goals, as the systems may continue to function as requested locally, while the overall civilizational incentives become increasingly detached from human welfare.
Understanding these risks, and developing potential mitigation strategies, is a highly interdisciplinary endeavor, as the risks may emerge from complex interactions between multiple societal systems, each individually moving away from human influence and control. Solutions need to address multiple domains, and be robust to the problem of mutual reinforcement we describe in Mutual reinforcement. As such, it will likely be necessary to draw on many disparate yet relevant fields: economics, political science, sociology, cultural studies, complex systems, anthropology, and institutional theory, for example.
Instead of merely(!) aligning a single, powerful AI system, we need to align one or several complex systems that are at risk of collectively drifting away from human interests. This drift can occur even while each individual AI system successfully follows the local specification of its goals.
Below, we identify four broad categories of intervention: measuring and monitoring the extent of the problem, preventing excessive accumulation of AI influence, strengthening human control over key societal systems, and system-wide alignment. A robust response will require progress in each category.
Estimating Human Disempowerment
To effectively address gradual disempowerment, we need to be able to detect and quantify it. This is challenging partly because, for many of the systems we would hope to measure, we lack external reference points to measure their degree of alignment. Nonetheless, several approaches warrant investigation.
System-Specific Metrics
For each of the major societal systems we have described, we can develop metrics tracking human influence:
- Economic metrics: Beyond traditional measures like labor share of GDP, we should also measure the AI share of GDP as a category distinct from both labor and capital. We also need metrics capturing human control over economic decisions. These could include tracking the fraction of major corporate decisions made primarily by AI systems, the scale of unsupervised AI spending, and patterns in wealth distribution between AI-heavy and human-centric industries.
- Cultural metrics: We can measure the proportion of widely-consumed content created primarily by humans versus AI, track the prevalence and depth of human-AI interpersonal relationships, and analyze how cultural transmission patterns change as AI becomes more prevalent. While most machine learning benchmarks and evaluations focus on quantifiable STEM tasks, we should develop a broad spectrum of evaluations focused on the ability of frontier AI systems to influence humans on an emotional level, write persuasive prose, or create new ideologies. We should also strengthen runtime monitoring of deployed AI systems and of the influence they have on their users.
- Political metrics: Key indicators might include the complexity of legislation (as a proxy for human comprehensibility); the role of AI systems in legal processes, policy formation, and security apparatuses; and the effectiveness of traditional democratic mechanisms in influencing outcomes.
Similar metrics should be developed for narrower but still significant societal systems, like research and education. A simple illustrative sketch of how such domain-level metrics might be combined into an aggregate index is given below.
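As a minimal sketch, assuming hypothetical indicator names, weights, and example values rather than any existing dataset or methodology, domain-level indicators of this kind could be combined into a single human-influence index along the following lines:

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class DomainSnapshot:
    """Indicators of human influence in one societal domain, each in [0, 1]."""
    human_decision_share: float   # fraction of major decisions made primarily by humans
    human_output_share: float     # fraction of domain output produced primarily by humans
    human_oversight_share: float  # fraction of AI activity subject to meaningful human review


def domain_influence_index(s: DomainSnapshot) -> float:
    """Unweighted average of the indicators for a single domain."""
    return (s.human_decision_share + s.human_output_share + s.human_oversight_share) / 3


def aggregate_influence_index(domains: dict[str, DomainSnapshot],
                              weights: dict[str, float]) -> float:
    """Weighted aggregate across domains; weights are assumed to sum to 1."""
    return sum(weights[name] * domain_influence_index(snap) for name, snap in domains.items())


# Hypothetical example values, purely for illustration.
snapshots = {
    "economy": DomainSnapshot(0.7, 0.6, 0.8),
    "culture": DomainSnapshot(0.8, 0.5, 0.4),
    "politics": DomainSnapshot(0.9, 0.8, 0.7),
}
weights = {"economy": 0.4, "culture": 0.3, "politics": 0.3}
print(f"aggregate human-influence index: {aggregate_influence_index(snapshots, weights):.2f}")
```

The aggregation itself is trivial; the substantive difficulty lies in choosing defensible indicators, measuring them reliably, and justifying the weights.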
Interaction Effects
Given the mutual reinforcement dynamics we describe in Section Mutual reinforcement, it is crucial to track how changes in one domain affect others. This might involve:
- Early warning indicators for concerning feedback loops (see the illustrative sketch after this list)
- Analysis of AI participation in methods for translating power between societal systems, like lobbying and financial regulation
- Historical analysis of similar dynamics in past technological transitions
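As one crude sketch of such an early-warning indicator, assuming hypothetical yearly human-influence series and an arbitrary threshold, a monitor might flag cases where declines in one domain's index are consistently followed by declines in another's, in both directions. Lagged correlation is used here only as a rough stand-in for more careful causal or Granger-style analysis:

```python
import numpy as np


def lagged_correlation(x: np.ndarray, y: np.ndarray, lag: int) -> float:
    """Pearson correlation between x[t] and y[t + lag]; a crude proxy for 'x leads y'."""
    if lag > 0:
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def flag_feedback_loop(a: np.ndarray, b: np.ndarray, lag: int = 1, threshold: float = 0.6) -> bool:
    """Flag a potential reinforcing loop if year-over-year declines in each domain's
    human-influence index strongly co-move with subsequent declines in the other."""
    da, db = np.diff(a), np.diff(b)  # year-over-year changes
    return (lagged_correlation(da, db, lag) > threshold and
            lagged_correlation(db, da, lag) > threshold)


# Hypothetical yearly human-influence indices for two domains.
econ = np.array([0.80, 0.78, 0.74, 0.69, 0.63, 0.56])
culture = np.array([0.82, 0.81, 0.77, 0.72, 0.65, 0.57])
print("possible mutual-reinforcement loop:", flag_feedback_loop(econ, culture))
```

Any serious version of this would need richer data and methods that can distinguish genuine cross-domain feedback from common causes; the point is only that such coupling is, in principle, measurable.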
Research Priorities
Several fundamental research questions need to be addressed. For example:
- How can we distinguish between beneficial AI augmentation of human capabilities and problematic displacement of human influence?
- What are the key thresholds or tipping points in these systems beyond which human influence becomes critically compromised?
- How can we measure the effectiveness of various intervention strategies?
Preventing Excessive AI Influence
While measurement can help us understand the problem, we also need to consider what direct interventions could be effective in preventing the accumulation of excessive AI influence, including:
- Regulatory frameworks mandating human oversight for critical decisions, limiting AI autonomy in specific domains, and restricting AI ownership of assets or participation in markets
- Progressive taxation of AI-generated revenues both to redistribute resources to humans and to subsidize human participation in key sectors
- Cultural norms supporting human agency and influence, and opposing AI that is overly autonomous or insufficiently accountable
Crucially, these interventions will often involve sacrificing potential value. Furthermore, the more value they sacrifice, the greater the incentive to circumvent them: for example, companies may face strong economic incentives to delegate authority to AIs regardless of the spirit, or letter, of the law.
Similarly, they will be much less effective if they are not widely adopted: if some countries choose to forego the economic benefits of AI to preserve their own alignment with human values, we may find ourselves in a world where the most powerful economies are those of states whose populations are most disempowered. The success of these interventions will therefore depend on international coordination in the face of increasing pressures.
As such, interventions that seek to limit AI influence will likely serve mostly as stopgaps. Nonetheless, they may be important intermediary steps towards more robust solutions.
Strengthening Human Influence
Beyond preventing excessive AI influence, we need to actively strengthen human control over key societal systems. This will involve both enhancing existing mechanisms, and developing new ones, which may in turn require fundamental research. Approaches in this direction include:
- Developing faster, more representative, and more robust democratic processes
- Requiring AI systems or their outputs to meet high levels of human understandability in order to ensure that humans continue to be able to autonomously navigate domains such as law, institutional processes or science
- Developing AI delegates that can advocate for people's interests with high fidelity, while also being better able to keep up with the competitive dynamics that are driving human replacement
- Making institutions more robust to human obsolescence
- Investing in tools for forecasting future outcomes (such as conditional prediction markets, and tools for collective cooperation and bargaining) in order to increase humanity's ability to anticipate and proactively steer the course
- Research into the relationship between humans and larger multi-agent systems
Importantly, to mitigate the problem effectively, we need to go beyond simply making it easier for humans to influence societal systems: it is unclear, for instance, whether a direct democracy would actually do a better job of satisfying citizen preferences in the long term, since it might, for example, leave the state more vulnerable to cultural misalignment. A key part of the challenge is clarifying what it even means for large, complex systems to serve the interests of individuals who are accustomed to thinking on smaller scales.
System-wide Alignment
While the previous approaches focus on specific interventions and measurements, they ultimately depend on having a clearer understanding of what we're trying to achieve. Currently, we lack a compelling positive vision of how highly capable AI systems could be integrated into societal systems while maintaining meaningful human influence. This is not just a matter of technical AI alignment or institutional design, but requires understanding how to align complex, interconnected systems that include both human and artificial components.

It seems likely we need fundamental research into what might be called "ecosystem alignment": understanding how to maintain human values and agency within complex socio-technical systems. This goes beyond traditional approaches to AI alignment focused on individual systems, and beyond traditional institutional design focused purely on human actors. We need new frameworks for thinking about the alignment of an entire civilization of interacting human and artificial components, potentially drawing on fields like systems ecology, institutional economics, and complexity science.