LESSWRONG
LW

Justin Bullock — LessWrong

This is an 8-page comprehensive summary of the results from Threshold 2030: a recent expert conference on economic impacts hosted by Convergence Analysis, Metaculus, and the Future of Life Institute. Please see the linkpost for the full end-to-end report, which is 80 pages of analysis and 100+ pages of raw writing and results from our attendees during the 2-day conference.

Comprehensive Summary

The Threshold 2030 conference brought together 30 leading economists, AI policy experts, and professional forecasters to evaluate the potential economic impacts of artificial intelligence by the year 2030. Held on October 30-31st, 2024 in Boston, Massachusetts, it spanned two full days and was hosted by Convergence Analysis and Metaculus, with financial support... (read 2852 more words →)

Training Data Attribution: Examining Its Adoption & Use Cases

Deric Cheng

Deric Cheng, Justin Bullock, David_Kristoffersson

Note: This report was conducted in June 2024 and is based on research originally commissioned by the Future of Life Foundation (FLF). The views and opinions expressed in this document are those of the authors and do not represent the positions of FLF.

This report investigates Training Data Attribution (TDA) and its potential importance to and tractability for reducing extreme risks from AI. TDA techniques aim to identify training data points that are especially influential on the behavior of specific model outputs. They are motivated by the question: how would the model's behavior change if one or more data points were removed from or added to the training dataset?

Report structure:

First, we discuss the

... (read 782 more words →)

Training Data Attribution (TDA): Examining Its Adoption & Use Cases

Deric Cheng

Deric Cheng, Justin Bullock, David_Kristoffersson

Report structure:

First, we discuss the

... (read 782 more words →)

Analysis of Global AI Governance Strategies

Sammy Martin

Sammy Martin, Justin Bullock, Corin Katzke

We analyze three prominent strategies for governing transformative AI (TAI) development: Cooperative Development, Strategic Advantage, and Global Moratorium. We evaluate these strategies across varying levels of alignment difficulty and development timelines, examining their effectiveness in preventing catastrophic risks while preserving beneficial AI development.

Our analysis reveals that strategy preferences shift significantly based on these key variables. Cooperative Development proves most effective with longer timelines and easier alignment challenges, offering maximum flexibility and minimal intrinsic risk. Strategic Advantage becomes more viable under shorter timelines or moderate alignment difficulty, particularly when international cooperation appears unlikely, but carries the most intrinsic risk. Global Moratorium emerges as potentially necessary in scenarios with very hard alignment or extremely... (read 10705 more words →)

AI governance needs a theory of victory

Corin Katzke

Corin Katzke, Justin Bullock

This post is part of a series by Convergence Analysis. An earlier post introduced scenario planning for AI risk. In this post, we argue for the importance of a related concept: theories of victory.

In order to improve your game you must study the endgame before everything else; for, whereas the endings can be studied and mastered by themselves, the middlegame and the opening must be studied in relation to the endgame." — José Raúl Capablanca, Last Lectures, 1966

Overview

The central goal of AI governance should be achieving existential security: a state in which existential risk from AI is negligible either indefinitely or for long enough that humanity can carefully plan its future. We can call... (read 5893 more words →)

AI Clarity: An Initial Research Agenda

Justin Bullock

Justin Bullock, Corin Katzke, Zershaaneh Qureshi, David_Kristoffersson

Cross-posted on our website: https://www.convergenceanalysis.org/publications/ai-clarity-an-initial-research-agenda

Cross-posted on the EA Forum: https://forum.effectivealtruism.org/posts/JyhoTRXxYvLfFycXi/ai-clarity-an-initial-research-agenda

Executive Summary

Transformative AI (TAI) has the potential to solve many of humanity's most pressing problems, but it may also pose an existential threat to our future. This significant potential of TAI warrants careful study of the AI’s possible trajectories and their corresponding consequences. Scenarios in which TAI emerges within the next decade are likely among the most treacherous, since society will not have much time to prepare for and adapt to advanced AI.

In response to this need, Convergence Analysis has developed a research program we call AI Clarity. AI Clarity’s research method centers on scenario planning. Scenario planning is an analytical tool used... (read 2333 more words →)

Transformative AI and Scenario Planning for AI X-risk

Elliot Mckernon

Elliot Mckernon, Justin Bullock

This post is part of a series by the AI Clarity team at Convergence Analysis. In our previous post, Corin Katzke reviewed methods for applying scenario planning methods to AI existential risk strategy. In this post, we want to provide the motivation for our focus on transformative AI.

Overview

We argue that “Transformative AI” (TAI) is a useful key milestone to consider for AI scenario analysis; it places the focus on the socio-technical impact of AI and is both widely used and well-defined within the existing AI literature. We briefly explore the literature and provide a definition of TAI. From here we examine TAI as a revolutionary, general purpose technology that could likely be achieved with “competent” AGI. We... (read 2207 more words →)

Replying toTo open-source or to not open-source, that is (an oversimplification of) the question.

Justin Bullock2y

To open-source or to not open-source, that is (an oversimplification of) the question.

Thanks for this comment. I agree there is some ambiguity here on the types of risks that are being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term "extreme risks" which is defined as "risk of significant physical harm or disruption to key societal functions." I believe they avoid the terms of "extinction risk" and "existential risk," but are implying something not too different with their choice of extreme risks.

For me, I pose the question above as:

"How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment

Justin Bullock2y

To open-source or to not open-source, that is (an oversimplification of) the question.

Thank you for this comment!

I think your point that "The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)." is spot on and maps to my intuitions about the weaknesses of fine-tuning and one of strongest points in favor of the significant risks to open-sourcing foundation models.

I appreciate your suggestions for other methods of auditing that could possibly work such as a model being run within a protected framework and open-sourcing encrypted weights. I think these allow for something like risk mitigations for partial open-sourcing but would be less feasible for fully open sourced models where weights... (read more)

Replying toMachine Evolution

Justin Bullock2y

Machine Evolution

Thanks for the comment!

I think your observation that biological evolution is a slow, blind, and undirected process is fair. We try to make this point explicit in our section on natural selection (as a main evolutionary selection pressure for biological evolution) where we say "The natural processes for succeeding or failing in survival and reproduction – natural and sexual selection – are both blind and slow."

For our contribution here we are not trying to dispute this. Instead we're seeking to find analogies to the ways in which machine evolution, which we define as "the process by which machines change over successive generations," may have some underlying similar mechanisms that we can apply... (read more)

To open-source or to not open-source, that is (an oversimplification of) the question.

Justin Bullock

*** This an edited and expanded version of a post I made on X in response to GovAI’s new report“Open-Sourcing Highly Capable Foundation Models” I think the report points in the right direction, but also leaves me with some additional questions. Also, thanks for significant feedback from @David_Kristoffersson, @Elliot_Mckernon, @Corin Katzke, and @cwdicarlo ***

From my vantage point the debate around open-sourcing foundation models became heated as Yann LeCun began advocating for open-sourcing (in particular) Meta's foundation models. This prompted a knee-jerk reaction in the AI Safety community.

The arguments went something like "of course open-sourcing foundation models is a good idea, just LOOK at all the BENEFITS open-sourcing has given us!" for the "pro" crowd,... (read 1283 more words →)

Replying toAI Alignment Breakthroughs this Week [new substack]

Justin Bullock2y

AI Alignment Breakthroughs this Week [new substack]

This is great! Thanks for sharing. I hope you continue to do these.

Machine Evolution

Justin Bullock

Justin Bullock, Elliot Mckernon, cwdicarlo

Epistemic Status: This post explores the relationship between biological evolution and the evolution of our machines. We examine parallels between the evolutionary pressures shaping organisms and machines. We are still exploring this analogy and our findings are tentative. We hope to explore these relationships further in future posts. Feedback and criticism are very much appreciated.

Introduction

After all then it comes to this, that the difference between the life of a man and that of a machine is one rather of degree than of kind, though differences in kind are not wanting. An animal has more provision for emergency than a machine. The machine is less versatile; its range of action is narrow; its

... (read 6481 more words →)

Replying to“Reframing Superintelligence” + LLMs + 4 years

Justin Bullock3y

“Reframing Superintelligence” + LLMs + 4 years

This discussion considers a relatively “flat”, dynamic organization of systems. The open-agency model^[13] considers flexible yet relatively stable patterns of delegation that more closely correspond to current developments.

I have a questions here that I'm curious about:

I wonder if you have any additional thoughts about the "structure" of the open agencies that you imagine here. Flexible and relatively stable patterns of delegation seem to be important dimensions. You mention here that the discussion focuses on "flat" organization of systems, but I'm wondering if we might expect more "hierarchical" relationships if we incorporate things like proposer/critic models as part of the role architecture.

Replying toRole Architectures: Applying LLMs to consequential tasks

Justin Bullock3y

Role Architectures: Applying LLMs to consequential tasks

We want work flows that divide tasks and roles because of the inherent structure of problems, and because we want legible solutions. Simple architectures and broad training facilitate applying structured roles and workflows to complex tasks. If the models themselves can propose the structures (think of chain-of-thought prompting), so much the better. Planning a workflow is an aspect of the workflow itself.

I think this has particular promise, and it's an area I would be excited to explore further. As I mentioned in a previous comment on your The Open Agency Model piece, I think this is a rich area of exploration for the different role architectures, roles, and tasks that would need... (read more)

Replying toThe Open Agency Model

Justin Bullock3y

The Open Agency Model

Thanks for this post, and really, this series of posts. I had not been following along, so I started with the "“Reframing Superintelligence” + LLMs + 4 years" and worked my way back to here.

I found your initial Reframing Superintelligence report very compelling back when I first came across it, and still do. I also appreciate your update post referenced above.

The thought I'd like to offer here is that it strikes me that your ideas here are somewhat similar to what both Max Weber and Herbert Simon proposed we should do with human agents. After reading your Reframing Superintelligence report, I wrote a post here that noted that it led me to... (read 460 more words →)

Replying toKeep humans in the loop

Justin Bullock3y

Keep humans in the loop

Thanks for this post. As I mentioned to both of you, it feels a little bit like we have been ships passing one another in the night. I really like your idea here of loops and the importance of keeping humans within these loops, particularly at key nodes in the loop or system, to keep Moloch at bay.

I have a couple scattered points for you to consider:

In my work in this direction, I've tried to distinguish between roles and tasks. You do something similar here, which I like. To me, the question often should be about what specific tasks should be automated as opposed to what roles. As you suggest, people within

Justin Bullock3y

GPT-7: The Tale of the Big Computer (An Experimental Story)

I was interested in seeing what the co-writing process would create. I also wanted to tell a story about technology in a different way, which I hope compliments the other stories in this part of the sequence. I also just think it’s fun to retell a story that was originally told from the point of view of future intelligent machines back in 1968, and then to use a modern intelligent machine to write that story. I think it makes a few additional points about how stable our fears have been, how much the technology has changed, and the plausibility of the story itself.

Replying toGPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock3y

GPT-7: The Tale of the Big Computer (An Experimental Story)

I love that response! I’ll be interested to see how quickly it strikes others. All the actual text that appears within the story is generated by ChatGPT with the 4.0 model. Basically, I asked ChatGPT to co-write a brief story. I had it pause throughout and ask for feedback in revisions. Then, at the end of the story it generated with my feedback along the way, I asked it to fill in some more details and examples, which it did. I asked for minor changes in these in style and specific type as well.

I’d be happy to directly send you screenshots of the chat as well.

Thanks for reading!

GPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock

In the not-too-distant future, a remarkable transformation took place. The world had seen the rise and fall of many technologies, but none as impactful as the data processing machines. These machines, born from the marriage of silicon and code, were not just tools; they were partners in our quest for knowledge and prosperity. And they were surprisingly good at winning trivia nights, which was both amusing and slightly unsettling.

GPT-7, the first of these machines to truly change the world, was a marvel of its time. It was like a digital Sherlock Holmes, with an insatiable appetite for data and an uncanny ability to generate human-like text. It was the brainchild of a... (read 1307 more words →)

LESSWRONG
LW

LESSWRONG
LW

Justin Bullock

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Analysis of Global AI Governance Strategies

AI governance needs a theory of victory

AI Clarity: An Initial Research Agenda

Justin Bullock

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Training Data Attribution: Examining Its Adoption & Use Cases

Training Data Attribution (TDA): Examining Its Adoption & Use Cases

Analysis of Global AI Governance Strategies

AI governance needs a theory of victory

AI Clarity: An Initial Research Agenda

Transformative AI and Scenario Planning for AI X-risk

Justin Bullock

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Analysis of Global AI Governance Strategies

AI governance needs a theory of victory

AI Clarity: An Initial Research Agenda

Justin Bullock

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Training Data Attribution: Examining Its Adoption & Use Cases

Training Data Attribution (TDA): Examining Its Adoption & Use Cases

Analysis of Global AI Governance Strategies

AI governance needs a theory of victory

AI Clarity: An Initial Research Agenda

Transformative AI and Scenario Planning for AI X-risk

Comprehensive Summary

Overview

Executive Summary

Overview

Introduction