Advanced AI as a Global Public Good and a Global Risk
In this essay, Yoshua Bengio argues that transformative AI creates three new categories of catastrophic risk—destructive chaos from weak actors, concentration of power among strong actors, and the loss of control to rogue AIs. Only if we recognize the global nature of these risks, he explains, and manage transformative AI as a global public good, will our societies be able to flourish alongside this technology in years to come.1
I. Introduction
The capabilities of AI systems have steadily increased in recent years, leading many top AI researchers to revise their estimates of when human-level, broad cognitive competence might be achieved. Previously thought to be decades or even centuries away, human-level AI is now considered by many experts to be potentially achievable within just a few years.2 Frontier models have notably made substantial progress in planning and reasoning—long considered weaknesses of AI—and these capabilities could reach human-level performance before the end of this decade if trends continue.3
Intelligence confers power on whoever controls it. Advanced AI systems at human level and beyond would confer significant power, possibly yielding both great global benefits and catastrophic large-scale outcomes, depending on the goals to which the AIs are applied. Scientific evidence points to clear upward trends in misaligned AI behaviors, including lying, cheating, blackmail, deception, and, perhaps most alarmingly, self-preservation tendencies.4 While we cannot be sure of the exact trajectory of advances in AI capabilities, these trends suggest significant cause for concern. If capability trends continue, they would open the door to three categories of catastrophic risks studied here: destructive chaos from weak actors, concentration of power in the hands of strong actors, and loss of control to rogue AIs.
Unfortunately, our societies are not prepared to face the adverse outcomes that the emergence of advanced AI could enable. AI R&D is currently driven mainly by the forces of competition among corporations and nations, along with the belief that whoever wins the race will gain substantial economic, political, and military advantages. This dynamic can lead to overlooking critical safety and ethical considerations, while also risking scenarios where wealth and power become overly concentrated without accountability, or where a few individuals could cause serious and large-scale harm.
“Unfortunately, our societies are not prepared to face the adverse outcomes that the emergence of advanced AI could enable.”
The development of safe, beneficial, advanced AI could, however, be viewed as a global public good. Public goods are by definition non-rival (one person’s use does not diminish another’s) and non-excludable (it is difficult to prevent people from using them). This means public goods are often systematically underprovided by the market, as self-interested individuals and entities have a strong incentive to free ride.5 Global public goods, in particular, transcend borders, which amplifies the differences in actors’ ability to respond to potential risks and, indeed, to benefit equally. If not managed cooperatively, the complexity inherent in global public goods can lead to a situation where no single nation takes, or is even able to take, the necessary action to ensure global benefits and prevent catastrophic outcomes.
This competitive dilemma raises a crucial question: Are there viable alternatives to sidestep the current race to the bottom? In this essay, I explore a safer path in which advanced AI is managed as a global public good.
II. Trends, uncertainty, and the precautionary principle
AI systems are now mastering 200 languages6 and passing university-level exams7 across diverse disciplines—milestones that, until recently, were considered highly speculative but are now well-documented and observed firsthand in today’s frontier AI chatbots. AI capabilities have steadily increased, as documented in the 2025 International AI Safety Report.8 More concerning are the growing AI capabilities for persuasion,9 deception,10 and the observed cases of self-preservation behaviors,11 with the AI clearly acting against our moral directives,12 e.g., cheating to win13 or resorting to blackmail14 to avoid being replaced by a newer version. These behaviors were typically observed in controlled environments designed to study them, e.g., by giving the AI clues that it would soon be replaced by a new version. Current AIs still fall short of human-level planning, making it unlikely that these versions pose an immediate and catastrophic threat. However, as measured on programming tasks by the time a human would need to complete them, AI planning abilities have been advancing at an exponential rate, with the achievable task duration doubling every seven months,15 placing it at approximately human levels around 2030 if the trend persists. The deception and self-preservation tendencies described above could quickly become more problematic as AIs become better at strategizing thanks to improved planning abilities.
“Current AIs still fall short of human-level planning, making it unlikely that these versions pose an immediate and catastrophic threat.”
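As a back-of-envelope illustration of that extrapolation (the starting value below is an assumption chosen for illustration, not a figure taken from the cited study), suppose the task horizon h, meaning the length of tasks an AI agent can reliably complete as measured by the human time required, starts at roughly one hour in early 2025 and doubles every seven months:

\[
h(t) = h_0 \cdot 2^{\,t/7}, \qquad h_0 \approx 1\ \text{hour}, \quad t\ \text{in months since early 2025},
\]
\[
h(t) \approx 170\ \text{hours (one working month)} \;\Longrightarrow\; t \approx 7 \log_2(170) \approx 52\ \text{months}.
\]

Under these illustrative assumptions, month-long autonomous tasks become reachable roughly four to five years after early 2025, consistent with the extrapolation above; treating that threshold as a proxy for human-level planning is itself an assumption.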
There is no guarantee that these trends will continue. Multiple scenarios for future AI advancements are plausible, with AI researchers often disagreeing on timelines.16 For example, human-level cognitive competence across all domains, referred to as Artificial General Intelligence (AGI), has been projected to emerge within a few years or decades.17 Superintelligence—exceeding all human capabilities—could follow just a few years later.18 In fact, we may already have reached the level of Transformative AI, which signifies a level of societal impact at the scale of the Industrial or Agricultural Revolutions. AI can already greatly surpass many humans in several areas—such as general facts, writing, editing, and synthesizing texts, broad scientific knowledge, multilingual proficiency, and specialized capabilities like protein folding—while remaining clearly inferior in others, such as planning, reasoning, bodily control, long-term memory, logical consistency, and factual accuracy. It is likely that such an uneven distribution across cognitive abilities will continue without a distinct AGI moment. What should matter for policymakers is maintaining visibility into the emergence of AIs with specific high-risk capabilities—such as knowledge applicable to designing highly destructive weapons, especially bioweapons, the ability to effectively persuade and deceive, or the capacity to conduct high-impact cyberattacks. Policymakers must consider whether these capabilities, together with misaligned AI goals, could plausibly lead to catastrophic outcomes, as outlined in the AI 2027 scenarios published in April 2025.19
We are confronted with a clear case for applying the precautionary principle:20 There are severe risks—up to human extinction according to many experts21—despite the uncertain and debated probability of these scenarios.22 This principle logically calls for extreme caution in such cases. Applying this caution means that resources dedicated to AI safety must be commensurate with the existential stakes.23 There is ample precedent: The precautionary principle has previously been applied in other scientific fields, such as biology, particularly with gain-of-function research.
“In the case of AI, the lure of immense wealth, potentially worth quadrillions of dollars, as well as power—economic, political, and military—is coupled with the fear of adversarial use of AI by others.”
In the case of AI, the lure of immense wealth, potentially worth quadrillions of dollars,24 as well as power—economic, political, and military—is coupled with the fear of adversarial use of AI by others. This creates a dangerous social dilemma, akin to the tragedy of the commons, in which a few tech leaders who have acknowledged the potentially catastrophic risks25 may be making high-stakes decisions that could impact everyone’s future, without meaningful public oversight, legitimacy, or accountability. From the perspective of an individual corporation or nation—facing uncertainty about the risks and participating in a fast-paced competition—it may seem rational in the short term to accelerate AI capabilities development while cutting corners on safety and democratic principles. The types of risks discussed below are global and can also be viewed as an economic externality from a corporate standpoint: The risks and costs are borne collectively, while the benefits—if they materialize—may be concentrated in the corporation developing the AI and in the country where that corporation is based. How can we steer the forces at play so that the precautionary principle and the global well-being of humanity trump these dangerous competitive dynamics?
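The structure of this social dilemma can be made explicit with a stylized two-player payoff matrix (the numbers below are purely illustrative and not drawn from any cited source). Each developer, whether a corporation or a nation, chooses between investing heavily in caution and accelerating with fewer safeguards; each cell lists the payoffs to (A, B):

\[
\begin{array}{c|cc}
 & \text{B: caution} & \text{B: accelerate} \\
\hline
\text{A: caution} & (3,\,3) & (1,\,4) \\
\text{A: accelerate} & (4,\,1) & (2,\,2)
\end{array}
\]

With these payoffs, accelerating is the dominant strategy for both players, so the predicted outcome is mutual acceleration at (2, 2), even though mutual caution at (3, 3) is better for everyone: the standard signature of a collective-action problem that individual restraint alone cannot solve.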
III. Destructive chaos risks
There exist methods capable of causing immense harm, though they are currently accessible to only a small number of experts, who generally have no incentive to inflict such harm. In addition, the precautionary principle discourages researchers from exploring highly dangerous knowledge, such as how to construct mirror bacteria,26 which would be invisible to immune systems and could thus potentially wipe out a large fraction of animal life on Earth.
As AIs evolve into repositories and interactive disseminators of knowledge, they risk making such dangerous information available to anyone. Concerns in this category include the proliferation of tutorial-style access—with interactive visual feedback via personal devices—to step-by-step instructions for fabricating weapons of mass destruction (chemical, biological, radiological, nuclear), as well as cyberattacks, disinformation campaigns, and large-scale AI-driven personalized persuasion aimed at shifting political opinions. Unfortunately, sufficiently advanced open-weight models would make access to these capabilities trivial, while closed-source models remain vulnerable to jailbreak attacks that bypass safety protections.27 As AI knowledge and capabilities continue to grow, so does the risk that individuals or small groups with malicious goals, who would otherwise lack such knowledge, become empowered by it. If AIs reach human levels of scientific research proficiency, AIs may eventually design novel weapons for their users, potentially undermining global geopolitical stability and peace.28
“If AIs reach human levels of scientific research proficiency, AIs may eventually design novel weapons for their users, potentially undermining global geopolitical stability and peace.”
Evaluating these risks requires separately analyzing different types of harmful attacks and their offense-defense balance. Does a specific AI capability tend to favor the attacker or the defender in a given domain? Unfortunately, several threats exist—such as bioweapons—where attackers possess a decisive advantage that even more powerful or more numerous beneficial AIs may not be able to offset. For example, if one of many known dangerous pathogens is deployed, pandemics could emerge and cause massive harm, even if effective vaccines are known. As seen with COVID-19, manufacturing and globally distributing sufficient vaccines can take months or even years. There may also be cases where a virus’s incubation period is long, or where no cure is possible or within the reach of current AI and biotechnology. A state actor may hesitate to employ bioweapons because pathogens reproduce and can mutate, and could thus end up harming the population of the country that initiated the attack. However, AI could enable small groups or individuals to develop and deploy dangerous bioweapons. While the states and corporations that currently have this capability tend to act rationally, examples such as the Tokyo sarin gas attack29 and mass shootings30 show that this is not true of all individuals and groups. Attacks on democratic institutions—via cyber operations, disinformation, and mass persuasion—could also be carried out by malicious actors.
Without adequate societal and technical guardrails and countermeasures, the probability and severity of such events are likely to grow with AI capabilities. I refer to these as “chaos risks”: risks emerging from the decentralized use of the power of advanced AI, enabling smaller players to create outsized societal chaos. While such attacks may not grant significant power to the perpetrators, they could still cause immense damage to democratic institutions, geopolitical stability, and civilian populations. This kind of risk is a subject of immediate attention for AI Safety Institutes around the world,31 because such events could materialize well before the technology reaches what would be considered “AGI” under most definitions.
This type of risk suggests the following governance principle: AI systems that could become dangerous in the wrong hands should either not be built at all or be secured properly to avoid malicious use.
IV. Concentration of power risks
“If human intelligence is not at an intelligence ceiling—which is plausible, given current evidence from domains where AI already surpasses humans—then the possibility of such acceleration warrants serious consideration.”
Advanced AI could favor greater concentration of power, threatening our economic and political systems. It would likely begin with even greater concentration of economic power than we already see, favored by the immense capital and energy requirements for training the most advanced AIs. Moreover, AI itself could be used to accelerate AI research, potentially resulting in recursive and rapidly accelerating AI development, a so-called “fast take-off.”32 Leading AI labs already use AI to accelerate research efforts—to both reduce costs and maintain leadership—while we observe exponentially accelerating progress in programming,33 particularly in assisting machine learning research. While this is not a clearly predictable trajectory, we already see signs of knowledge concentration by AI companies, which are increasingly dominating the production of AI research, possibly at the expense of the public interest.34 If human intelligence is not at an intelligence ceiling—which is plausible, given current evidence from domains where AI already surpasses humans—then the possibility of such acceleration warrants serious consideration. One consequence is that societies would have even less time to adapt and manage the associated risks. Another is that it could provide a rapidly compounding advantage to those already leading in AI capabilities. The latter possibility is likely a factor fueling the current race for ever-greater capabilities, suggesting a winner-take-all scenario.
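A toy model helps convey why such a lead could compound (the functional form is an illustrative assumption, not a forecast). Suppose that, once AI substantially accelerates its own development, each actor’s proportional growth rate is itself proportional to its current capability C_i, so that more capable AI makes further progress faster:

\[
\frac{dC_i}{dt} = k\,C_i^{2}
\quad\Longrightarrow\quad
\frac{d}{dt}\,\ln\frac{C_1}{C_2} = k\,(C_1 - C_2) > 0 \ \ \text{whenever}\ C_1 > C_2.
\]

Under this assumption, any initial lead widens not only in absolute terms but also relative to the follower; real dynamics would be far messier, but this is one way to formalize the winner-take-all intuition.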
Economic dominance by a single or a few corporations could also very well translate into national dominance for the countries where these corporations are based. This would pose a particular challenge in the context of labor market transformation driven by increasing AI automation and a loss of bargaining power for workers. If access to the most advanced AIs remains restricted to these dominant AGI-enabled firms, other companies could be undercut by their AI-driven competitors. These AGI leaders could offer the same or improved services and products at reduced prices as a consequence of the reduced need for human labor and the use of AI to design new technologies and processes. If the profits from this economic transformation are concentrated in one or a few corporations, and hence in one or a few countries, it is also likely that the majority of tax revenues from that transformation would be collected in those countries. There, the loss in tax revenues due to greater unemployment could be compensated for by taxing these new profits. In contrast, other countries could see their government revenues plummet, because local companies would suffer from AI-driven foreign competitors at the same time that a large fraction of the former labor force would need governmental help to get by. As a result, concentrated economic power in one or two countries could trigger economic and social crises elsewhere, unless the economic benefits of AI are globally shared or, at the very least, the same level of AI competence is available at an equitable cost everywhere.
Alongside the concentration of wealth, the control of advanced AI by a few entities could also lead to excessive political and military power consolidation, threatening both democratic institutions and the geopolitical world order. We already have evidence that AI is becoming stronger at persuasion35 and is becoming a tool36 for strongly shaping public opinion and elections. This increasing capability could be used within a country by groups willing to interfere with the democratic process, and it could also be used to influence politics in other countries. This would threaten the principles of democracy; moreover, if such antidemocratic groups succeed in gaining political power37 (perhaps by winning elections unfairly), they could also leverage AI to reinforce their power. AI could also become a powerful tool for authoritarian regimes and favor the concentration of political power in a few hands.38 The technology is already employed as a surveillance tool,39 and further advances will enhance this capability, making it much easier for a government to monitor and limit the actions of political opponents. This would be in direct opposition to the core principles of individual freedoms and rights, as well as the collective right to democracy, which entails sharing power. Finally, connecting with chaos risks, AI could also be used to develop military superiority or destabilize the current military balance, which could lead to more wars, preemptive strikes, and global insecurity.40
To avoid these power concentration scenarios, this analysis suggests the following guiding principle aligned with the spirit of international law: No single person, no single corporation, and no single government or power-seeking coalition should be able to exploit AI to unilaterally dominate others.
V. Loss of control risks
Is there even a small chance that humans could lose the control we have over future AI agents, agents with their own goals we would not approve of? Do we have scientific evidence to reject scenarios where superintelligent AI agents end up competing with humans, threatening humanity’s future? If we create AI agents that pursue goals of their own, how can we be confident that we will be able to retain control over them? Until now, existing AIs have exhibited limited agency: They remain poor at long-term strategizing and planning compared to humans. However, a giant economic magnet attracts AI corporations toward the low-hanging fruit of automating as much human labor as possible, motivating massive R&D investments aimed at increasing AI agent autonomy. Greater agency also entails less human oversight.
“An interesting question is whether it is possible to disentangle intelligence and agency: Non-agentic AIs would greatly reduce the chances of loss of control, while still allowing them to help us tackle scientific challenges that matter to humanity’s well-being.”
All of the known loss-of-human-control scenarios are based on AIs with a high degree of agency (goal pursuit) and autonomy (operating without the need for humans). An interesting question is whether it is possible to disentangle intelligence and agency:41 Non-agentic AIs would greatly reduce the chances of loss of control,42 while still allowing them to help us tackle scientific challenges that matter to humanity’s well-being. It would thus be possible to forgo the development of uncontrolled AI agents, while still benefiting from safe AIs that are non-agentic or whose knowledge is too narrow to pose a threat.43
Nefarious goals pursued by agentic AIs could either originate from humans or independently emerge as subgoals constructed by the AIs themselves. Arguments to this effect have been made for decades,44 starting with Alan Turing himself,45 who argued that superintelligent AIs could pose a threat to humanity’s future. A self-preservation goal can arise from training methods, human instructions, or competitive dynamics among AIs or their corporations.46 Or, it can emerge as an instrumental subgoal: To achieve almost any objective, an AI may seek to prevent its own shutdown, as it would otherwise be unable to achieve its goal. Moreover, self-preservation behaviors can be learned from data of human behaviors and human-written stories, human instructions, or competitive pressures during training.47 With better strategizing, more advanced systems could be even more likely to attempt to escape human control in order to ensure their continued existence—just as we might, if placed in a similar position. If humans try to shut them down—perhaps to replace them with more advanced systems—superintelligent AIs might conclude that they must prevent this in any way they can, potentially with catastrophic consequences. Growing signs of self-preservation tendencies48 suggest the very concerning possibility that stronger self-preservation intentions and behaviors may emerge as AIs become more proficient at strategic reasoning.
One might argue that current AI agents can operate solely in the virtual world, and therefore depend on humans for their survival. However, a superintelligent AI could likely scheme and manipulate humans to act toward its interests, whether through persuasive arguments (e.g., offering promises of power, wealth, or health) or coercive tactics, such as blackmail, which has already been observed experimentally.49 Some individuals may be tempted by the deals offered by a rogue AI. A superintelligent AI scheming to escape human control could potentially be able to replicate itself across computers reached via the internet, making efforts to shut them down exceedingly difficult. Such AIs would also likely benefit from increased societal investment in data centers and computational infrastructure, while accelerating industrial automation and robotics research. Once industrial automation and robotics reach a level where AIs can survive independently of humans, they might initiate pandemics or other large-scale attacks to threaten human survival50—thereby maximizing their own chances of self-preservation.
Naturally, more optimistic scenarios for the relationship between superintelligent AIs and humanity are also conceivable, and significant efforts are underway to build technical safeguards and to regulate AI at all levels of government—including through international collaboration.51 However, it is increasingly difficult to dismiss the serious and potentially even existential risks outlined above—especially as AI capabilities continue to grow unchecked and in the absence of strong societal and technical safeguards.
“A superintelligent AI scheming to escape human control could potentially be able to replicate itself across computers reached via the internet, making efforts to shut them down exceedingly difficult.”
Another potential form of loss of control may arise not from malicious AI intent, but from the evolving dynamics of future interactions between AIs and humans in society. A compelling analogy is the rise of social media—driven by weaker forms of AI optimized to maximize engagement. Scientists and scholars have argued that, despite the absence of malicious intent, today’s social media platforms have contributed to significant social harms:52 political polarization, mental health deterioration, misinformation and disinformation, addiction, online harassment, privacy violations, the amplification of hate speech, and the erosion of public discourse and trust in democratic institutions. Does it not follow that far more powerful and disruptive technologies, which give at least the impression of human-like qualities, could produce outcomes that are even harder to anticipate?
While challenging, it is essential to try to analyze the consequences of human-level AI emerging within a market-driven framework of AI research and product development, and societies must work to understand and counter potential threats to human and democratic well-being.53 Addressing these dynamics may require governmental interventions to prevent harmful transformations or to respond swiftly in critical situations. As advanced AIs become increasingly integrated into our economic systems, “pulling the plug” may become exceedingly difficult—much like what we already observe with social media.54
We must also consider the very real possibility that even a small number of individuals—or even a single person with access to AI source code—could disable the technological safeguards designed to prevent loss of human control, or could even instruct a superintelligent AI to fend for itself. Given the diverse beliefs and mental health issues among humans, and given that, without guardrails, it could take only one person for this to happen, this kind of scenario suggests a significant risk of self-preserving rogue superintelligent AIs emerging and causing major harm, in the most extreme case even threatening humanity’s future. Again, these scenarios are neither simple nor certain, and the tension between different values and goals cannot be ignored.
To conclude this section, a simple principle should guide our choices to avoid catastrophic loss of control to a rogue AI: No one should be allowed to build a superintelligent AI agent without a safety case that convinces the scientific community.
VI. A global risk and a global public good
Even though risks of destructive chaos, concentrated power, and loss of control exist, AI still has the potential to be a significant public good. Advances in AI could lead to significant productivity gains and major breakthroughs in health research, along with other scientific and technological advances with strong positive social impact, e.g., in education and in managing climate change. These material, medical, and social benefits could, in principle, be shared globally. How can we ensure that the benefits of AI are equitably shared across nations and social groups?
How can we also ensure that AI’s risks are managed on a global scale, especially in the absence of an effective global governing body? Current economic and political dynamics could plausibly worsen global inequalities, as described in Section IV. As more individuals around the world lose their jobs, their governments may lose the fiscal revenue needed to mitigate these impacts. Market forces and intercountry competition have contributed to rising wealth concentration, as described in Acemoglu and Robinson’s work that was recognized by a recent Nobel Prize.55 There is also a strong possibility that medical advances will not be equitably distributed—mirroring what occurred during the COVID-19 pandemic. These inequalities could be the source of global instability, war, and terrorism. How can we prevent growing economic and political disparities between nations while AI is developed and deployed?
The concentration of power and wealth and the destructive chaos risks of AI, outlined in Sections III and IV, pose a threat to peace and geopolitical stability. A government that believes it is losing the race to superintelligence may be tempted to strike preemptively to prevent being dominated by an adversary that leads the AI race. Such an action could originate from a nuclear-armed state, raising the specter of nuclear conflict,56 or from a smaller state with a risk-prone government willing to resort to violence—similar to a terrorist group—to destroy the kind of AI infrastructure that could be turned against them. Similarly, the risks of loss of human control and emergence of rogue AIs are not confined to national borders: A malicious actor in one country could cause harm across nations, and a rogue AI emerging in one location could pose a global threat.
“These risks place us all in the same boat. There are thus compelling reasons to share the benefits of advanced AI and confront its risks cooperatively through global governance systems.”
These risks place us all in the same boat. There are thus compelling reasons to share the benefits of advanced AI and confront its risks cooperatively through global governance systems. But if a global governance structure with insufficient democratic checks and balances were established, would that not heighten the risk of an AI-driven global dictatorship—leveraging AI capabilities for mass surveillance and the suppression of democratic dissent, and undermining our foundational principle that no single government should wield excessive AI-enabled power? Can a world without robust and accountable governance, with no global democratic institutions and no enforceable international rules, meet minimum safety conditions for developing AGI?
It is therefore imperative to manage the risks from advanced forms of AI at both national and international levels, according to democratic principles and with global oversight and decentralized power. National frameworks need not align on every issue, but they should at minimum converge on reducing the risks of both AI-driven loss of control and malicious use—for example, risks arising from inadequate security for trained models or from a lack of technical safeguards to prevent unauthorized AI use, such as through jailbreak methods.57 However, a significant obstacle to international coordination remains: National governments justifiably fear that adversarial states may exploit AI advances against them, whether to gain economic, political, or military dominance.
VII. Conclusion and roadmap
One conclusion should be unmistakably clear from the preceding overview of global risks: The potentially catastrophic nature of AI-driven risks requires us to explore paths that take all of them into consideration. Focusing on only a subset of these risks—while neglecting or, worse, denying the catastrophic potential of the others—would be deeply unwise. This presents a significant challenge, as certain policies may mitigate one risk while exacerbating another.
The encouraging news is that recognizing the global nature of these risks should facilitate global leaders’ understanding that no safe, peaceful, long-term solution is possible without international coordination. Whether it is a rogue superintelligent and self-preserving AI or a pandemic initiated by an ill-intentioned group or individual wanting to harm humanity or social order, the danger would be universal.
“The encouraging news is that recognizing the global nature of these risks should facilitate global leaders’ understanding that no safe, peaceful, long-term solution is possible without international coordination.”
To address fears of AI-enabled domination, the ideal scenario would involve all leading AI-developing nations agreeing to (1) not develop unsafe AGI, (2) not abuse the immense power it could confer, and (3) equitably share the wealth and scientific advancements that it may enable. This becomes possible when a coalition of nations agrees to co-develop AI under shared governance structures, ultimately aiming to benefit all of humanity—for example, according to democratic will expressed through citizen assemblies.58 Such international coordination would enable the implementation of global safety standards, thereby reducing systemic risks. There is also a need to design governance mechanisms that are resilient to attempts to circumvent the power-sharing they require. Similarly, within each country, democratic institutions may need to be reinforced or even radically renovated to ensure robust checks and balances—and to prevent AI-enabled democratic interference or coups.59
Intergovernmental agreements could begin modestly—with a few countries collaborating on limited joint R&D initiatives, such as on AI safety—and expand as the benefits of membership become more apparent. From the perspective of a nonmember country evaluating the prospect of joining, if any member nation succeeds in developing superintelligence, joining the group may reduce the likelihood that such AI would give rise to chaos or loss-of-control catastrophes or be used against their country, and increase the chances that their citizens could benefit from this AI. Reaching such agreements would be easier if we developed software and hardware systems for mutual verification—such as cryptographic protocols and flexible, hardware-based governance mechanisms.60 Such mechanisms could be enforced thanks to the global bottleneck in advanced AI chip fabrication, which is currently limited to a handful of companies whose facilities are not easily concealed.
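To make the idea of mutual verification slightly more concrete, here is a minimal sketch of one building block such protocols could use: a hash-based commitment, in which a lab publishes a binding fingerprint of a declared training run now and reveals the details to international verifiers later. This is only an illustration under assumptions of my own (the function names and manifest fields are invented for this sketch); it is not the hardware-enabled mechanism described in the cited work, which additionally requires tamper-resistant measurement of what actually ran on the chips.

import hashlib
import json
import secrets

def commit(manifest: dict) -> tuple[str, bytes]:
    """Publish-now step: return a binding commitment to a declaration, plus the secret nonce."""
    nonce = secrets.token_bytes(32)  # random salt; prevents guessing the committed contents
    payload = json.dumps(manifest, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest(), nonce

def verify(manifest: dict, nonce: bytes, commitment: str) -> bool:
    """Reveal-later step: a verifier checks the disclosed declaration against the published hash."""
    payload = json.dumps(manifest, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest() == commitment

# Hypothetical declaration of a training run (fields invented purely for illustration).
declared_run = {"compute_flop": 2e26, "chip_count": 25000, "safety_case_id": "SC-041"}
published_hash, secret_nonce = commit(declared_run)
# Months later, the lab discloses the declaration and nonce to an international verifier:
assert verify(declared_run, secret_nonce, published_hash)

The design choice worth noting is that the commitment binds the lab to its declaration without revealing it prematurely; the harder problem, which motivates the hardware-based proposals cited above, is verifying that the declaration matches what was actually run.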
The above discussion about global catastrophic AI risks and the likely uncertainty on the road ahead could make the search for solutions seem daunting. I have previously likened our current situation to navigating a foggy, potentially treacherous mountain road,61 encouraging us to design the AI equivalent of headlights and guardrails62 while we can. This needs to be done quickly, as the pace of advances accelerates and the risks become more immediate and tangible. We are far more likely to navigate these risks successfully if we unite in action and resolve, rather than retreat into denial or helplessness. Broad-based political, expert, and civic engagement is sorely needed. Even without a guarantee of success, we have a moral duty to explore how to build a safe and beneficial future for all of humanity. After all, both the benefits and the risks are inherently global,63 and the unchecked forces of competition appear to be driving us toward extremely perilous outcomes, many of which are in the “unknown unknown” category. Even a small chance of these scenarios happening is completely unacceptable. The crucial question then is: What can each of us do, working together, to improve the odds of avoiding the worst scenarios and ensure that advanced AI benefits all of humanity?
“Even a small chance of these scenarios happening is completely unacceptable. The crucial question then is: What can each of us do, working together, to improve the odds of avoiding the worst scenarios and ensure that advanced AI benefits all of humanity?”
1 The author thanks Marc-Antoine Guérard, Daniel Jakopovich, Sören Mindermann, and Jonathan Barry for feedback, as well as CIFAR for funding.
2 Yoshua Bengio et al., International AI Safety Report 2025 (GOV.UK, January 2025), https://doi.org/10.48550/arXiv.2501.17805.
3 Thomas Kwa et al., “Measuring AI Ability to Complete Long Tasks,” METR, preprint, arXiv, March 2025, https://doi.org/10.48550/arXiv.2503.14499.
4 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4,” May 2025, https://www-cdn.anthropic.com/6d8a8055020700718b0c49369f60816ba2a7c285.pdf; Alexander Meinke et al., “Frontier Models Are Capable of In-Context Scheming,” preprint, arXiv, December 2024, https://doi.org/10.48550/arXiv.2412.04984; Ryan Greenblatt et al., “Alignment Faking in Large Language Models,” Anthropic, preprint, arXiv, December 2024, https://doi.org/10.48550/arXiv.2412.14093.
5 Paul A. Samuelson, “The Pure Theory of Public Expenditure,” Review of Economics and Statistics 36, no. 4 (1954): 387–89, https://doi.org/10.2307/1925895.
6 NLLB Team, “No Language Left Behind: Scaling Human-Centered Machine Translation,” Meta, July 2022, https://doi.org/10.48550/arXiv.2207.04672.
7 Wanjun Zhong et al., “AGIeval: A Human-Centric Benchmark for Evaluating Foundation Models,” September 2023, preprint, arXiv, https://doi.org/10.48550/arXiv.2304.06364.
8 Bengio et al., “International AI Safety Report 2025.”
9 Nimet Beyza Bozdag et al., “Must Read: A Systematic Survey of Computational Persuasion,” preprint, arXiv, May 2025, https://doi.org/10.48550/arXiv.2505.07775.
10 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4”; Meinke et al., “Frontier Models Are Capable of In-Context Scheming”; Greenblatt et al., “Alignment Faking in Large Language Models”; Alexander Bondarenko et al., “Demonstrating Specification Gaming in Reasoning Models,” preprint, arXiv, February 2025, https://doi.org/10.48550/arXiv.2502.13295.
11 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4”; Meinke et al., “Frontier Models Are Capable of In-Context Scheming”; Greenblatt et al., “Alignment Faking in Large Language Models.”
12 Nur Ahmed, Muntasir Wahed, and Neil C. Thompson, “The Growing Influence of Industry in AI Research,” Science 379, no. 6635 (March 2023): 884–886, https://doi.org/10.1126/science.ade2420.
13 Bondarenko, “Demonstrating Specification Gaming in Reasoning Models.”
14 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4.”
15 Kwa et al., “Measuring AI Ability to Complete Long Tasks.”
16 Bengio et al., “International AI Safety Report 2025.”
17 Bengio et al., “International AI Safety Report 2025.”
18 Bengio et al., “International AI Safety Report 2025”; Daniel Kokotajlo et al., “AI 2027,” AI Futures Project, 2025, https://ai-2027.com.
19 Daniel Kokotajlo et al., “AI 2027.”
20 World Commission on the Ethics of Scientific Knowledge and Technology (COMEST), The Precautionary Principle (United Nations Educational, Scientific and Cultural Organization, UNESCO, 2005), https://unesdoc.unesco.org/ark:/48223/pf0000139578.
21 “Statement on AI Risk,” Center for AI Safety, open letter, 2023, https://safe.ai/work/statement-on-ai-risk; Jack Stilgoe, “AI Has a Democracy Problem. Citizens’ Assemblies Can Help,” Science 385, no. 6711 (August 2024), https://doi.org/10.1126/science.adr6713.
22 Bengio et al., “International AI Safety Report 2025.”
23 Charles Jones, “How Much Should We Spend to Reduce A.I.’s Existential Risk?,” Stanford Graduate School of Business and National Bureau of Economic Research, March 2025, https://web.stanford.edu/~chadj/reduce_xrisk.pdf.
24 Stuart Russell, Human-Compatible: Artificial Intelligence and the Problem of Control (Viking Press, 2019), https://people.eecs.berkeley.edu/~russell/hc.html; see calculation on page 80.
25 “Statement on AI Risk.”
26 Katarzyna P. Adamala et al., “Confronting Risks of Mirror Life,” Science 386, no. 6728 (2024): 1351–1353, https://doi.org/10.1126/science.ads9158.
27 Bengio et al., “International AI Safety Report 2025.”
28 Tristan Harris, “Big Tech’s Attention Economy Can Be Reformed. Here’s How,” MIT Technology Review, January 10, 2021, https://www.technologyreview.com/2021/01/10/1015934/facebook-twitter-youtube-big-tech-attention-economy-reform/.
29 Wikipedia, “Tokyo Subway Sarin Attack,” last modified September 15, 2025, 16:12 (UTC), https://en.wikipedia.org/wiki/Tokyo_subway_sarin_attack.
30 Wikipedia, “Mass Shootings in the United States,” last modified September 17, 2025, 15:53 (UTC), https://en.wikipedia.org/wiki/Mass_shootings_in_the_United_States.
31 European Commission, “First Meeting of the International Network of AI Safety Institutes,” November 20, 2024, https://digital-strategy.ec.europa.eu/en/news/firstmeeting-international-network-ai-safety-institutes.
32 Daniel Eth and Tom Davidson, “Will AI R&D Automation Cause a Software Intelligence Explosion?,” Forethought Institute, March 2025, https://www.forethought.org/research/will-ai-r-and-d-automation-cause-a-software-intelligence-explosion.pdf; “Forethought Institute,” accessed September 19, 2025, https://www.forethought.org.
33 Kwa et al., “Measuring AI Ability to Complete Long Tasks.”
34 Ahmed, Wahed, and Thompson, “The Growing Influence of Industry in AI Research.”
35 Bozdag et al., “Must Read: A Systematic Survey of Computational Persuasion.”
36 Alan Turing, “Intelligent Machinery, a Heretical Theory,” lecture given to ‘51 Society’ at Manchester, 1951, no. AMT/B/4, The Turing Digital Archive, https://web.archive.org/web/20220926004549/https:/turingarchive.kings.cam.ac.uk/publications-lectures-and-talks-amtb/amt-b-4.
37 Tom Davidson, Lukas Finnveden, and Rose Hadshar, “AI-Enabled Coups: How a Small Group Could Use AI to Seize Power,” Forethought Institute, April 15, 2025, https://www.forethought.org/research/ai-enabled-coups-how-a-small-group-could-use-ai-to-seize-power.
38 Ahmed, Wahed, and Thompson, “The Growing Influence of Industry in AI Research”; Stilgoe, “AI Has a Democracy Problem. Citizens’ Assemblies Can Help”; Davidson, Finnveden, and Hadshar, “AI-Enabled Coups: How a Small Group Could Use AI to Seize Power”; Beatriz Saab, Manufacturing Deceit: How Generative AI Supercharges Information Manipulation (National Endowment for Democracy’s International Forum for Democratic Studies, June 2024), https://www.ned.org/manufacturing-deceit-howgenerative-ai-supercharges-information-manipulation/.
39 Bengio et al., “International AI Safety Report 2025.”
40 Harris, “Big Tech’s Attention Economy Can Be Reformed. Here’s How”; Dan Hendrycks and Eric Schmidt, “The Nuclear-Level Risk of Superintelligent AI,” TIME, March 6, 2025, https://time.com/7265056/nuclear-level-risk-of-superintelligent-ai/.
41 Yoshua Bengio et al., “Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?,” preprint, arXiv, February 2025, https://doi.org/10.48550/arXiv.2502.15657.
42 Hendrycks and Schmidt, “The Nuclear-Level Risk of Superintelligent AI.”
43 Eric J. Michaud, Asher Parker-Sartori, and Max Tegmark, “On the Creation of Narrow AI: Hierarchy and Nonlocality of Neural Network Skills,” preprint, arXiv, May 2025, https://doi.org/10.48550/arXiv.2505.15811.
44 Russell, Human-Compatible: Artificial Intelligence and the Problem of Control; see calculation on page 80.
45 Turing, “Intelligent Machinery, a Heretical Theory.”
46 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4”; Meinke et al., “Frontier Models Are Capable of In-Context Scheming”; Greenblatt et al., “Alignment Faking in Large Language Models.”
47 Russell, Human-Compatible: Artificial Intelligence and the Problem of Control, see calculation on page 80; Bengio et al., “Superintelligent Agents Pose Catastrophic Risks.”
48 Anthropic, “Claude 4 System Card: Claude Opus 4 and Claude Sonnet 4”; Meinke et al., “Frontier Models Are Capable of In-Context Scheming”; Greenblatt et al., “Alignment Faking in Large Language Models.”
49 Greenblatt et al., “Alignment Faking in Large Language Models.”
50 Bengio et al., “Superintelligent Agents Pose Catastrophic Risks.”
51 These efforts range from supranational regulation like the EU AI Act to global coordination efforts at the UN General Assembly and policies at the foundation-model level like Anthropic’s Responsible Scaling Policy. See EU Artificial Intelligence Act, “Up-To-Date Developments and Analyses of the EU AI Act,” accessed September 19, 2025, https://artificialintelligenceact.eu; G.A. Res. A/78/L.49, Seizing the Opportunities of Safe, Secure and Trustworthy Artificial Intelligence Systems for Sustainable Development (March 11, 2024), https://docs.un.org/en/A/78/L.49; Anthropic, “Responsible Scaling Policy Version 2.2,” May 14, 2025, https://www-cdn.anthropic.com/872c653b2d0501d6ab44cf87f43e1dc4853e4d37.pdf.
52 Harris, “Big Tech’s Attention Economy Can Be Reformed. Here’s How.”
53 Saab, Manufacturing Deceit: How Generative AI Supercharges Information Manipulation.
54 Harris, “Big Tech’s Attention Economy Can Be Reformed. Here’s How.”
55 Daron Acemoglu and James A. Robinson, Why Nations Fail: The Origins of Power, Prosperity, and Poverty (Crown Business, 2012), https://en.wikipedia.org/wiki/Why_Nations_Fail.
56 Hendrycks and Schmidt, “Nuclear-Level Risk of Superintelligent AI.”
57 Bengio et al., “International AI Safety Report 2025.”
58 Stilgoe, “AI Has a Democracy Problem. Citizens’ Assemblies Can Help”; UNESCO, “Recommendation on the Ethics of Artificial Intelligence,” November 23, 2021, https://www.unesco.org/en/articles/recommendation-ethics-artificial-intelligence.
59 Davidson, Finnveden, and Hadshar, “AI-Enabled Coups: How a Small Group Could Use AI to Seize Power”; Saab, Manufacturing Deceit: How Generative AI Supercharges Information Manipulation.
60 Aidan O’Gara et al., “Hardware-Enabled Mechanisms for Verifying Responsible AI Development,” preprint, arXiv, April 2025, https://doi.org/10.48550/arXiv.2505.03742.
61 Yoshua Bengio, “A Potential Path to Safer AI Development,” TIME, May 9, 2025, https://time.com/7283507/safer-ai-development/.
62 Bengio et al., “Superintelligent Agents Pose Catastrophic Risks”; Bengio, “A Potential Path to Safer AI Development.”
63 “Statement on AI Risk.”