Mateusz Bagiński

Primarily interested in agent foundations and AI macrostrategy.

I endorse and operate by Crocker's rules.

I have not signed any agreements whose existence I cannot mention.

Comments

I too have reservations about points 1 and 3, but not providing sufficient references or justifications doesn't imply they're not on SL1.

(I see what podcasts you listen to.)

My notion of progress is roughly: something that is either a building block for The Theory (i.e. marginally advancing our understanding) or a component of some solution/intervention/whatever that can be used to move probability mass from bad futures to good futures.

Re the three you pointed out: simulators I consider a useful insight; gradient hacking probably not (10% < p < 20%); and activation vectors I put in the same bin as RLHF, whatever the appropriate label for that bin is.

Also, I'm curious what it is that you consider(ed) AI safety progress/innovation. Can you give a few representative examples?

  • the approaches that have been attracting the most attention and funding are dead ends

I'd love to try it, mainly thinking about research (agent foundations and AI safety macrostrategy).

I propose "token surprise" (as in the type-token distinction). You expected this general type of thing, but not that Ivanka would be one of the tokens instantiating it.

It's better, but still not quite right. When you play on two levels, sometimes the best strategy involves a pair of (level-1 and level-2) substrategies that are seemingly opposites of each other. I don't think there's anything hypocritical about that.

Similarly, hedging is not hypocrisy.
