johnswentworth

Comments

On my understanding, the push for centralization came from a specific faction whose pitch was basically:

  • here's the scaling laws for tokamaks
  • here's how much money we'd need
  • ... so let's make one real big tokamak rather than spending money on lots of little research devices.

... and that faction mostly won the competition for government funding for about half a century.
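
(For reference, the rough shape of that scaling-law pitch, with numbers from memory, so treat them as approximate rather than authoritative: D-T ignition needs a triple product of roughly

$$ n \, T \, \tau_E \;\gtrsim\; 3\times 10^{21}\ \mathrm{keV\,s\,m^{-3}}, $$

while the empirical tokamak confinement fits have the energy confinement time $\tau_E$ improving steeply with machine size, roughly $\tau_E \propto R^{2}$ in the ITER-style scalings. If you buy those fits, the cheapest way to clear the ignition threshold is one very large machine.)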

The people behind the current boom accepted that faction's story at face value, but then noticed that new materials allowed the same "scale up the tokamaks" strategy to be executed on a budget achievable with private funding, and therefore they could fund projects without having to fight the faction which won the battle for government funding.

The counterfactual which I think is probably correct is that there exist entirely different designs far superior to tokamaks, which don't require that much scale in the first place, but which were never discovered because the "scale up the tokamaks" faction basically won the competition for funding and stopped most research on alternative designs from happening.

If you mean the Manhattan Project: no. IIUC there were basically zero Western groups and zero dollars working toward the bomb before that, so the Manhattan Project clearly sped things up. That's not really a case of "centralization" so much as doing-the-thing-at-all vs not-doing-the-thing-at-all.

If you mean fusion: yes. There were many fusion projects in the sixties, and people were learning quickly. Then the field centralized, and progress slowed to a crawl.

I think this is missing the most important consideration: centralization would likely massively slow down capabilities progress.

the number one spontaneous conversation is "what are you working on" or "what have you done so far", which forces you to re-explain what you're doing & the reasons for doing it to a skeptical & ignorant audience

I'm very curious if others also find this to be the biggest value-contributor amongst spontaneous conversations. (Also, more generally, I'm curious what kinds of spontaneous conversations people are getting so much value out of.)

I have heard people say this so many times, and it is consistently the opposite of my experience. The random spontaneous conversations at conferences are disproportionately shallow and tend toward the same things which have been discussed to death online already, or toward the things which seem simple enough that everyone thinks they have something to say on the topic. When doing an activity with friends, it's usually the activity which is novel and/or interesting, while the conversation tends to be shallow and playful and fun but not as substantive as the activity. At work, spontaneous conversations generally had little relevance to the actual things we were/are working on (there are some exceptions, but they're rarely as high-value as ordinary work).

The ice cream snippets were good, but they felt too much like they were trying to be a relatively obvious not-very-controversial example of the problems you're pointing at, rather than a central/prototypical example. Which is good as an intro, but then I want to see it backed up by more central examples.

The dishes example was IMO the best in the post; more like that would be great.

Unfiltered criticism was discussed in the abstract, so it wasn't really concrete enough to be an example. Walking through an example conversation (like the ice cream thing) would help.

Mono vs open vs poly would be a great example, but it needs an actual example conversation (like the ice cream thing), not just a brief mention. Same with career choice. I want to see how specifically the issues you're pointing to come up in those contexts.

(Also TBC it's an important post, and I'm glad you wrote it.)

This post would be a LOT better with about half-a-dozen representative real examples you've run into.

One example, to add a little concreteness: suppose that the path to AGI is to scale up o1-style inference-time compute, but it requires multiple OOMs of scaling. So it no longer has a relatively-short stream of "internal" thought; it's more like the natural-language record of an entire simulated society.

Then:

  • There is no hope of a human reviewing the whole thing, or any significant fraction of the whole thing. Even spot checks don't help much, because it's all so context-dependent.
  • Accurate summarization would itself be a big difficult research problem.
  • There's likely some part of the simulated society explicitly thinking about intentional deception, even if the system as a whole is well aligned.
  • ... but that's largely irrelevant, because in the context of a big complex system like a whole society, the effects of words are very decoupled from their content. Think of e.g. a charity which produces lots of internal discussion about reducing poverty, but frequently has effects entirely different from reducing poverty. The simulated society as a whole might be superintelligent, but its constituent simulated subagents are still pretty stupid (like humans), so their words decouple from effects (like humans' words).

... and that's how the proposal breaks down, for this example.
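
To put rough numbers on the "no hope of a human reviewing the whole thing" point, here's a back-of-the-envelope sketch. Every number in it is an assumption (trace length, scaling factor, reading speed, review budget), not something we actually know about future systems.

```python
# Back-of-the-envelope: how much of a multi-OOM-scaled reasoning trace
# could a human reviewer even skim? Every number below is an assumption.

tokens_per_trace_today = 10_000      # assumed length of a current o1-style trace
scaling_ooms = 4                     # assumed: four more OOMs of inference-time compute
tokens_per_trace_scaled = tokens_per_trace_today * 10**scaling_ooms  # 100M tokens

human_tokens_per_hour = 10_000       # assumed fast skim-reading speed
review_budget_hours = 10             # assumed review budget per trace

fraction_skimmed = (human_tokens_per_hour * review_budget_hours) / tokens_per_trace_scaled
print(f"fraction of one trace a reviewer can skim: {fraction_skimmed:.1%}")  # ~0.1%
```

And even that fraction overstates what spot checks buy you, since random snippets of such a context-dependent record mostly can't be evaluated in isolation.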

I haven't decided yet whether to write up a proper "Why Not Just..." for the post's proposal, but here's an overcompressed summary. (Note that I'm intentionally playing devil's advocate here, not giving an all-things-considered reflectively-endorsed take, but the object-level part of my reflectively-endorsed take would be pretty close to this.)

Charlie's concern isn't the only thing it doesn't handle. The only thing this proposal does handle is an AI extremely similar to today's, thinking very explicitly about intentional deception, and even then the proposal only detects it (as opposed to e.g. providing a way to solve the problem, or even a way to safely iterate without selecting against detectability). And that's an extremely narrow chunk of the X-risk probability mass - any significant variation in the AI breaks it, any significant variation in the threat model breaks it. The proposal does not generalize to anything.

Charlie's concern is just one specific example of a way in which the proposal does not generalize. A proper "Why Not Just..." post would list a bunch more such examples.

And as with Charlie's concern, the meta-level problem is that the proposal also probably wouldn't get us any closer to handling those more-general situations. Sure, we could make some very toy setups (like the chess thing), and see what the shoggoth+face AI does on those very toy setups, but we get very few bits, and the connection is very tenuous to both other threat models and AIs with any significant differences from the shoggoth+face. Accounting for the inevitable failure to measure what we think we're measuring (with probability close to 1), such experiments would not actually get us any closer to solving any of the problems which constitute the bulk of the X-risk probability mass.

It's not "a start", because "a start" would imply that the experiment gets us closer, i.e. that the problem gets easier after doing the experiment. If you try to think about the You Are Not Measuring What You Think You Are Measuring problem as "well, we got at least some tiny epsilon of evidence, right?", then you will shoot yourself in the foot; such reasoning is technically correct, but the correct value of epsilon is small enough that the correct update from it is not distinguishable from zero in practice.
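
To make the "tiny epsilon" point concrete with explicitly made-up numbers: suppose the experiment's outcome is only barely more likely under "the approach generalizes" than under "it doesn't", precisely because we're probably not measuring what we think we're measuring. Then the update looks like this:

```python
import math

# Illustrative Bayes update with made-up numbers: a barely-diagnostic experiment.
prior = 0.10              # assumed prior that the approach generalizes
likelihood_ratio = 1.02   # assumed: outcome is only 2% more likely if it generalizes

prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)

print(f"posterior:      {posterior:.4f}")                         # ~0.1018
print(f"absolute shift: {posterior - prior:.4f}")                  # ~0.0018
print(f"evidence:       {math.log2(likelihood_ratio):.3f} bits")   # ~0.029 bits
```

Technically nonzero, but not the kind of number that should change any decision.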

The problem with that sort of attitude is that when the "experiment" yields so few bits and has such a tenuous connection to the thing we actually care about (as in Charlie's concern), that's exactly when You Are Not Measuring What You Think You Are Measuring bites real hard. Like, sure, you'll see this system do something in the toy chess experiment, but that's just not going to be particularly relevant to the things an actual smarter-than-human AI does in the situations Charlie's concerned about. If anything, the experimenter is far more likely to fool themselves into thinking their results are relevant to Charlie's concern than they are to correctly learn anything relevant to Charlie's concern.
