Update (5-15-2024): I wrote that “it appears that not all of the leading AI labs are honoring the voluntary agreements they made at [AI Safety Summit],” citing a Politico article. However, after seeing more discussion about it (e.g. here), I am now highly uncertain about whether the labs made specific commitments, what those commitments were, and whether commitments were broken. These seem like important questions, so I hope that we can get more clarity.

MIRI updates:

  • MIRI is shutting down the Visible Thoughts Project.
    • We originally announced the project in November of 2021. At the time we were hoping we could build a new type of data set for training models to exhibit more of their inner workings. MIRI leadership is pessimistic about humanity’s ability to solve the alignment problem in time, but this was an idea that seemed relatively promising to us, albeit still a longshot.
    • We also hoped that the $1+ million bounty on the project might attract someone who could build an organization to build the data set. Many of MIRI’s ambitions are bottlenecked on executive capacity, and we hoped that we might find individuals (and/or a process) that could help us spin up more projects without requiring a large amount of oversight from MIRI leadership.
    • Neither hope played out, and in the intervening time, the ML field has moved on. (ML is a fast-moving field, and alignment researchers are working on a deadline; a data set we’d find useful if we could start working with it in 2022 isn’t necessarily still useful if it would only become available 2+ years later.) We would like to thank the many writers and other support staff who contributed over the last two and a half years.
  • Mitchell Howe and Joe Rogero joined the comms team as writers. Mitch is a longtime MIRI supporter with a background in education, and Joe is a former reliability engineer who has facilitated courses for BlueDot Impact. We’re excited to have their help in transmitting MIRI’s views to a broad audience.
  • Additionally, Daniel Filan will soon begin working with MIRI’s new Technical Governance Team part-time as a technical writer. Daniel is the host of two podcasts: AXRP, and The Filan Cabinet. As a technical writer, Daniel will help to scale up our research output and make the Technical Governance Team’s research legible to key audiences.
  • The Technical Governance Team submitted responses to the NTIA’s request for comment on open-weight AI models, the United Nations’ request for feedback on the Governing AI for Humanity interim report, and the Office of Management and Budget’s request for information on AI procurement in government.
  • Eliezer Yudkowsky spoke with Semafor for a piece about the risks of expanding the definition of “AI safety”: “You want different names for the project of ‘having AIs not kill everyone’ and ‘have AIs used by banks make fair loans.’”

A number of important developments in the larger world occurred during the MIRI Newsletter’s hiatus from July 2022 to April 2024. To recap just a few of these:

  • In November of 2022, OpenAI released ChatGPT, a chatbot application that reportedly gained 100 million users within 2 months of its launch. As we mentioned in our 2024 strategy update, GPT-3.5 and GPT-4 were more impressive than some of the MIRI team expected, representing a pessimistic update for some of us “about how plausible it is that humanity could build world-destroying AGI with relatively few (or no) additional algorithmic advances”. ChatGPT’s success significantly increased public awareness of AI and sparked much of the post-2022 conversation about AI risk.
  • In March of 2023, the Future of Life Institute released an open letter calling for a six-month moratorium on training runs for AI systems stronger than GPT-4. Following the letter’s release, Eliezer wrote in TIME that a six-month pause is not enough and that an indefinite worldwide moratorium is needed to avert catastrophe.
  • In May of 2023, the Center for AI Safety released a one-sentence statement, “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” We were especially pleased with this statement, because it focused attention on existential risk in particular, and did so in a way that would be maximally understandable to policymakers and the general public. The list of signatories included the three most cited researchers in AI (Bengio, Hinton, and Sutskever) and leadership at all three of the leading AI labs (Anthropic, DeepMind, and OpenAI).
  • In October of 2023, President Biden signed an executive order on AI. The order’s provisions include reporting requirements for some large models, rules for federal procurement of AI products, and a directive for the NIST to develop safety standards for generative AI.
  • In November of 2023, the UK’s AI Safety Summit brought experts and world leaders together to discuss risks from AI. The summit showed some promise, but its outcomes so far have seemed limited. Six months later, it appears that not all of the leading AI labs are honoring the voluntary agreements they made at the summit.
  • In March of 2024, the European Union passed the AI Act, a broad regulatory framework for the use of all AI systems, organized into risk categories. The act includes evaluation and reporting requirements for “general-purpose AI” systems trained with more than 10^25 FLOP.
  • Over the past year and a half, AI systems have exhibited many new capabilities, including generating high-quality images, expert-level Stratego, expert-level Diplomacy, writing code, generating music, generating video, acing AP exams, solving Olympiad-level geometry problems, and winning drone races against human world champions.

You can subscribe to the MIRI Newsletter here.
