Thanks for this comment. I agree there is some ambiguity here about the types of risks being considered with respect to the question of open-sourcing foundation models. I believe the report favors the term "extreme risks," which it defines as "risk of significant physical harm or disruption to key societal functions." I believe the authors avoid the terms "extinction risk" and "existential risk," but imply something not too different with their choice of "extreme risks."

I would pose the question above as:

"How large are the risks from fully open-sourced foundation models? More specifically, how significant are these risks compared to the overall risks inherent in the development and deployment of foundation models?"

What I'm looking for is something like "total risk" versus "total benefit." In other words, if we take all the risks together, just how large are they in this context? In part, I'm not sure whether the more extreme risks really come from open-sourcing the models or simply from the development and deployment of increasingly capable foundation models.

I hope this helps clarify!

To open-source or to not open-source, that is (an oversimplification of) the question.

Justin Bullock · 6mo

Thank you for this comment!

I think your point that "The problem here is that fine-tuning easily strips any safety changes and easily adds all kinds of dangerous things (as long as capability is there)" is spot on. It maps to my intuitions about the weaknesses of fine-tuning, and it is one of the strongest arguments that open-sourcing foundation models carries significant risks.

I appreciate your suggestions for other auditing methods that could possibly work, such as running a model within a protected framework and open-sourcing encrypted weights. I think these allow for something like risk mitigation for partial open-sourcing, but they would be less feasible for fully open-sourced models, where the weights, as plain tensors, would more likely be available.

Your comment is helpful and gave me some additional ideas to consider. Thanks!

One thing I would add is that the idea I had in mind for auditing was more of a broad process than a specific tool. The paper I mention to support this idea of a healthy ecosystem for auditing foundation models is “Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing.” There, the authors describe an auditing process that would guide the decision of whether or not to release a specific model, along with the types of decision points, stakeholders, and review processes that might aid in making this decision. At the most abstract level, the process includes scoping, mapping, artifact collection, testing, reflection, and a post-audit decision about whether or not to release the model.

Machine Evolution

Justin Bullock · 6mo

Thanks for the comment!

I think your observation that biological evolution is a slow, blind, and undirected process is fair. We try to make this point explicit in our section on natural selection (as a main evolutionary selection pressure for biological evolution) where we say "The natural processes for succeeding or failing in survival and reproduction – natural and sexual selection – are both blind and slow."

For our contribution here, we are not trying to dispute this. Instead, we're looking for analogies: ways in which machine evolution, which we define as "the process by which machines change over successive generations," may share underlying mechanisms with biological evolution that we can apply to understand how machines change.

To your point that "Machine learning algorithms, which are the relevant machines here, aren't progressing in this pattern of dumb experiments which occasionally get lucky," I agree. To understand this process better, and as distinct from biological evolution and natural selection, we propose the notion of artificial selection. The idea of artificial selection is that machines respond in part to natural selection pressures, but that the evolutionary pressures here are different, which is why we give them a different name. We describe artificial selection in a way that I think corresponds closely to your concern. We say:

"For an analogy to natural selection we have chosen the term artificial selection which is driven in large part by human culture, human artifacts, and individual humans.... Artificial selection also highlights the ways in which this selection pressure applies more generally to human artifacts. Human intention and human design have shifted the pace of evolution of artifacts, including machines, rocketing forward by comparison to biological evolution."

All of this to say, I agree that the comparison is pretty inexact. We were not going for an exact comparison. We were attempting to make clear that machines and machine learning are influenced by a very different evolutionary selection process, which should lead to different expectations about how machines change over successive generations. We did not intend the analogy to biological evolution to be exact; rather, we hoped to use components of biological evolution, such as natural selection, inheritance, mutation, and recombination, as familiar processes for exploring potential parallels in machine evolution.

AI Alignment Breakthroughs this Week [new substack]

Justin Bullock · 7mo

This is great! Thanks for sharing. I hope you continue to do these.

“Reframing Superintelligence” + LLMs + 4 years

Justin Bullock · 8mo

"This discussion considers a relatively “flat”, dynamic organization of systems. The open-agency model^[13] considers flexible yet relatively stable patterns of delegation that more closely correspond to current developments."

I have a question here that I'm curious about:

I wonder if you have any additional thoughts about the "structure" of the open agencies that you imagine here. Flexible and relatively stable patterns of delegation seem to be important dimensions. You mention here that the discussion focuses on "flat" organization of systems, but I'm wondering if we might expect more "hierarchical" relationships if we incorporate things like proposer/critic models as part of the role architecture.

Role Architectures: Applying LLMs to consequential tasks

Justin Bullock · 8mo

"We want work flows that divide tasks and roles because of the inherent structure of problems, and because we want legible solutions. Simple architectures and broad training facilitate applying structured roles and workflows to complex tasks. If the models themselves can propose the structures (think of chain-of-thought prompting), so much the better. Planning a workflow is an aspect of the workflow itself."

I think this has particular promise, and it's an area I would be excited to explore further. As I mentioned in a previous comment on your "The Open Agency Model" piece, I think this is a rich area of exploration for the different role architectures, roles, and tasks that would need to be organized to ensure both alignment and capabilities. As I mentioned there, I think there are specific areas of study that may contribute meaningfully to how we might do that. However, these fields have their own limitations, and the analogy to human agents fulfilling these role architectures (organizations in the traditional sense of human coordination) is not perfect. On this note, I'm quite interested to see how capable LLMs are at creating structured roles and workflows for complex tasks, roles that other LLMs could then be prompted to simulate and fulfill.

The Open Agency Model

Justin Bullock · 8mo

Thanks for this post, and really, this series of posts. I had not been following along, so I started with the post “Reframing Superintelligence” + LLMs + 4 years and worked my way back to here.

I found your initial Reframing Superintelligence report very compelling back when I first came across it, and still do. I also appreciate your update post referenced above.

The thought I'd like to offer here is that your ideas strike me as somewhat similar to what both Max Weber and Herbert Simon proposed we should do with human agents. After reading your Reframing Superintelligence report, I wrote a post here noting that it led me to think more about the idea that human "bureaucrats" have specific roles they play that are directed at a somewhat stable set of tasks. To me, this is similar to what you're suggesting here with the Open Agency Model.

Here's that post, from 2021: Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS).

In that post I also note some of Andrew Critch's work that I think points in this direction. In particular, I think this piece may contribute to the ideas here: What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs).

All of this to say, I think there may be some lessons here for your Open Agency Model that build on studies of public agencies, organization studies, public administration, and governance. One of the key questions across these fields is how to align human agents to perform roles and bounded tasks in line with the general goals of an agency.

There are of course limitations to the human agent analogy, but given LLMs' capacity to simulate agents, defining roles and task structures within agencies for an agent to accomplish may benefit from what we've learned about managing this task with human agents.

For this task, I think Weber's notion of creating a "Beamte" to fulfill specialized roles within the bureaucracy is a nice starting point for how to prompt or craft bounded agents that might fulfill specific roles as part of an open agency. And to highlight these specific elements, I include them below as a direct quote from the Controlling Intelligent Agents The Only Way We Know How: Ideal Bureaucratic Structure (IBS) piece:

"Weber provides 6 specific features of the IBS (he calls it the Modern Bureaucracy) including:

The principle of fixed competencies
The principle of hierarchically organized positions
Actions and rules are written and recorded
In-depth specialist training needed for agents undertaking their position
The position is full time and occupies all the professional energy of the agent in that position.
The duties of the position are based on general learnable rules and regulation, which are more or less firm and more or less comprehensive

Weber goes on to argue that a particular type of agent, a beamte, is needed to fulfill the various positions that specialization demands for processing information and executing actions. So what does the position or role of the beamte demand?

The position is seen as a calling and a profession
The beamte (the agent) aims to gain and enjoy a high appreciation by people in power
The beamte is nominated by a higher authority
The beamte is a lifetime position
The beamte receives a regular remuneration
The beamte are organized into a professional track."

Anyway, this is just a potential starting point for ideas around how to create an open agency of role architectures that might be populated by LLM simulations to accomplish concrete tasks.

Keep humans in the loop

Justin Bullock · 10mo

Thanks for this post. As I mentioned to both of you, it feels a little bit like we have been ships passing one another in the night. I really like your idea here of loops and the importance of keeping humans within these loops, particularly at key nodes in the loop or system, to keep Moloch at bay.

I have a couple of scattered points for you to consider:

In my work in this direction, I've tried to distinguish between roles and tasks. You do something similar here, which I like. To me, the question should often be which specific tasks should be automated, as opposed to which roles. As you suggest, people within specific roles bring their humanity with them to the role. (See: "Artificial Intelligence, Discretion, and Bureaucracy")
One term I've used to help think about this within the context of organizations is discretion: the way individuals use their decision-making capacity within a defined role. It is this discretion that often allows individuals holding those roles to shape their decisions in a humane and contextualized way. (See: "Artificial discretion as a tool of governance: a framework for understanding the impact of artificial intelligence on public administration")
Elsewhere, coauthors and I have used the term administrative evil to examine how substituting machine decision making for human decision making dehumanizes the decision-making process, exacerbating the risk of administrative evil being perpetuated by an organization. (See: "Artificial Intelligence and Administrative Evil")
Another line of work has looked at how introducing algorithms or machine intelligence into the loop changes the shape of the loop, potentially in unexpected ways, leading to changes in inputs to decision making throughout the loop. That is, machine evolution influences organization (loop) evolution. (See: "Machine Intelligence, Bureaucracy, and Human Control" & "Artificial Intelligence, bureaucratic form, and discretion in public service")
I like the inclusion of the work on Cyborgism. It seems to me that in some ways we've already become Cyborgs to match the complexity of the loops in which we work and play together, loops that have already evolved in response to machine evolution. In theory at least, a Cyborg approach does seem like it could help overcome some of the challenges presented by Moloch and failed attempts at coordination.

Finally, your focus on loops reminded me of Hofstadter's focus on them in "Gödel, Escher, Bach" and in "I Am a Strange Loop." I like how you apply the notion to human organizations here. It would be interesting to think about different types of persistent loops as a way of describing different organizational structures, goals, resources, etc.

I'm hoping we can discuss together sometime soon. I think we have a lot of interest overlap here.

Thanks for this post! Hope the comments are helpful.

GPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock · 10mo

I was interested in seeing what the co-writing process would create. I also wanted to tell a story about technology in a different way, which I hope complements the other stories in this part of the sequence. I also just think it’s fun to retell a story that was originally told from the point of view of future intelligent machines back in 1968, and then to use a modern intelligent machine to write that story. I think it makes a few additional points about how stable our fears have been, how much the technology has changed, and the plausibility of the story itself.

GPT-7: The Tale of the Big Computer (An Experimental Story)

Justin Bullock · 10mo

I love that response! I’ll be interested to see how quickly it strikes others. All the actual text that appears within the story was generated by ChatGPT with the GPT-4 model. Basically, I asked ChatGPT to co-write a brief story. I had it pause throughout to ask for feedback on revisions. Then, at the end of the story it generated with my feedback along the way, I asked it to fill in some more details and examples, which it did. I also asked for minor changes to these in style and specific type.

I’d be happy to directly send you screenshots of the chat as well.

Thanks for reading!