ryan_b

Comments

portraying Sam Bankman-Fried as the Luke Skywalker to CZ’s Darth Vader? Presumably that will change a bit.

I feel like Han Solo and Jabba the Hutt is the new Star Wars narrative, because it looks like SBF is going to owe some people.

Although I would also entertain just moving SBF to Lando for a sympathetic portrayal, in the name of making the Robot Chicken version real.

I agree the gradient-of-physical-systems isn't the most natural way to think about it; I note that it didn't occur to me until this very conversation despite acausal trade being old hat here.

What I am thinking now is that a more natural way to think about it is overlapping abstraction space. My claim is that one necessary condition for acausal coordination is that all parties have access to the same chunk of abstraction space, somewhere in their timeline. This seems to cover the similar-physical-systems intuition we were talking about: two rocks with "cooperate" painted on them are abstractly identical, so check; two superrational AIs each need the abstractions to model another superrational AI, so check. This is terribly fuzzy, but seems to allow in all the candidates for success.

The binary distinction makes sense, but I am a little confused about the work the counterfactual modeling is doing. Suppose I were to choose between two places to go to dinner, conditional on counterfactual modeling of each choice. Would this be acausal in your view?

I agree two ants in an anthill are not doing acausal coordination; they are following the pheromone trails laid down by each other. This is the ant version of explicit coordination.

But I think the crux between us is this:

It seems to stretch the original meaning

I agree, it does seem to stretch the original meaning. I think this is because the original meaning was surprising and weird; it seemed to be counterintuitive and I had to put quite a few cycles in to work through the examples of AIs negotiating without coexisting.

But consider for a moment that we had begun from the opposite end: if we accept two rocks with "cooperate" painted on them as counting for coordination, then starting from there we can make a series of deliberate extensions. By this I mean stuff like: if we can have rocks with cooperate painted on, surely we can have agents with cooperate painted on (which is what I think voting mostly is); if we can have agents with cooperate painted on, we can have agents with decision rules about whether to cooperate; if we can have decision rules about whether to cooperate, they can use information about other decision rules; and so on until we encompass the original case of superrational AGIs trading acausally with AGIs in the future.
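To make that last step concrete, here is a toy sketch of the progression (all names are mine and hypothetical, not anyone's actual proposal): a rock, an agent with cooperate painted on, and an agent whose decision rule uses information about the other party's decision rule.

```python
# Toy sketch of the rock-to-decision-rule progression (names hypothetical).

def rock(_other):
    # Step 1: a rock with "cooperate" painted on it; zero computation, zero information.
    return "cooperate"

def painted_agent(_other):
    # Step 2: an agent with cooperate painted on; it acts, but ignores the other party.
    return "cooperate"

def rule_agent(other):
    # Step 3+: a decision rule that uses information about the other's decision rule:
    # cooperate only if the other party would cooperate with an unconditional cooperator.
    return "cooperate" if other(rock) == "cooperate" else "defect"

if __name__ == "__main__":
    for a, b in [(rock, rock), (painted_agent, rule_agent), (rule_agent, rule_agent)]:
        print(a.__name__, b.__name__, "->", a(b), b(a))
```

The point is only that each step is a small extension of the previous one, not that the last agent is doing anything exotic.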

I feel like this progression from cooperating rocks to superrational AGIs just recognizes a gradient whereby progressively less-similar physical systems can still accomplish the same thing as the zero-computation, zero-information systems, which are very similar to one another.

I am happy with longer explanations, if you have the time. To be more specific about the kind of things I'm interested in:

  • Do you think Ukrainian forces are able to launch a campaign into Crimea?
  • Do you think Russian forces are able to respond in time?
  • If Russian forces do respond in time, do you think they will provide effective resistance?

In my model these kinds of questions tend to have a much bigger impact on diplomatic decisions than rhetorical or propaganda ones, and the recent history of the war has generated a lot more uncertainty about them through Russia's surprising underperformance.

The voting example is one of those interesting cases where I disagree with the reasoning but come to a similar conclusion anyway.

I claim the population of people who justify voting on any formal reasoning basis is at best a rounding error in the general population, and is probably indistinguishable from zero. Instead, the population in general believes one of three things:

  1. There is an election, so I vote because I'm a voter.
  2. Voting is meaningless anyway, so I don't.
  3. Election? What? Who cares?

But it looks to me like this is still coordination without the parties sharing any explicit reasoning with each other. The central difference is that group 1 are all rocks with the word "Vote" painted on them, group 2 are all rocks with the word "Don't vote" painted on them, and group 3 are all rocks scattered in the field somewhere rather than being in the game.

As I write this it occurs to me that when discussing acausal coordination or trade we are always showing isolated agents doing explicit computation about each other; does the zero-computation case still qualify? This feels sort of like it would be trivial, in the same way they might "coordinate" on not breaking the speed of light or falling at the acceleration of gravity.

On the other hand, there remains the question of how people came to be divided into groups with different cached answers in the first place. There's definitely a causal explanation for that; it just happens prior to whatever event we are considering. Yet going back to the first hand, the causal circumstances giving rise to differing sets of cached answers can't be different in any fundamental sense from the ones that give rise to differing decision procedures.

Following from that, I feel like the zero-computation case for acausal coordination is real and counts, which appears to me to make the statement much stronger.

What are your thoughts about the object level of the conflict in Ukraine and Russia, and what bearing do you think they have on the Crimea question?

Thinking about ways in which this safety margin could break: is it possible to have a thin mapping layer on top of your physics simulator that somehow subverts or obfuscates it?

I suppose that a mapping task might fall under the heading of a mesa-optimizer, where what it is doing is optimizing for fidelity between the outputs of the language layer and the inputs of the physics layer. This would be in addition to the mesa-optimization going on just in the physics simulator. Working title:

CAIS: The Case For Maximum Mesa Optimizers

I don't think so, no - the way I understand it, any kind of separation into separate systems falls into the CAIS sphere of thought. This is because we are measuring things from the capability side, rather than the generation-of-capability side: for example, I think it is still CAIS even if we take the exact same ML architecture and train copies of it into different specializations which then use each other as services.

There are a couple of things worth distinguishing, though:

  1. That a sufficiently integrated CAIS is, to us, indistinguishable from a single general agent is what tells us CAIS isn't safe either.
  2. The arguments that a single general agent represents a different or more severe danger profile still motivate tracking the two paths differently.

I will also say I think this type of work could easily be part of the creation of a single general agent. If we consider Gato as the anchor for a general agent: many different tasks were captured in a single transformer, but as far as I understand it, Gato kept the input-to-task associations, which is to say the language inputs for the language tasks and the physics inputs for the physics tasks. But if the language model fed to Gato-2 uses this Mind's Eye technique, it would be possible to do physics tasks from a language input and maybe also explain the physics tasks as a language output.

So before, it could respond to sentences with other sentences and to equations with other equations; now it can certainly process ancient geometry books and Newton's Principia, which use words to describe equations, and maybe even compose outputs of a similar kind.
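For what it's worth, here is a rough sketch of how I picture a Mind's Eye-style mapping layer working, purely as an illustration: the function names and the toy simulator are mine, not the paper's actual implementation.

```python
# Rough sketch of a Mind's Eye-style grounding pipeline (all names hypothetical).

def text_to_simulation(question: str) -> dict:
    """Map a natural-language physics question to simulator parameters."""
    # e.g. "Which falls faster in a vacuum, a feather or a hammer?" ->
    # two bodies, zero drag, standard gravity.
    return {"bodies": ["feather", "hammer"], "drag": 0.0, "gravity": 9.81}

def run_simulation(params: dict) -> str:
    """Stand-in for a physics engine; returns a description of the outcome."""
    return "Both objects hit the ground at the same time."

def grounded_prompt(question: str) -> str:
    """Inject the simulation outcome into the prompt the language model sees."""
    outcome = run_simulation(text_to_simulation(question))
    return f"Simulation result: {outcome}\nQuestion: {question}\nAnswer:"

print(grounded_prompt("Which falls faster in a vacuum, a feather or a hammer?"))
```

The relevant feature for the Gato-2 point is just that the physics grounding arrives through the language input, so a single model never needs separate input channels for the two kinds of task.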

That doesn't appear to be explained specifically, but what I think they are reporting is the equivalent larger model size. That is to say, the 350M-parameter language model with Mind's Eye is about as good as a 2.5B-parameter language model, and so on.

Yeah, I shoulda linked that. Fixing shortly, thanks to niplav in the meantime!
