Sam Clarke

Comments

When reporting AI timelines, be clear who you're deferring to
Sam Clarke · 2y

Finally posted: https://www.lesswrong.com/posts/qccxb3uzwFDsRuJuP/deference-on-ai-timelines-survey-results

Deference on AI timelines: survey results
Sam Clarke · 2y

> Did people say why they deferred to these people?

No, we only asked respondents to give names.

> I think another interesting question to correlate this would be "If you believe AI x-risk is a severely important issue, what year did you come to believe that?".

Agree, that would have been interesting to ask.

Deference on AI timelines: survey results
Sam Clarke · 2y

Things that surprised me about the results:

  • There’s more variety than I expected in the group of people who are deferred to
    • I suspect that some of the people in the “everyone else” cluster defer to people in one of the other clusters—in which case there is more deference happening than these results suggest.
  • There were more “inside view” responses than I expected (maybe partly because people who have inside views were incentivised to respond, since it’s cool to say you have inside views or something). It might be interesting to think about whether it’s good (at the community level) for this many people to have inside views on this topic.
  • Metaculus was given less weight than I expected (but as per Eli (see footnote 2), I think that’s a good thing).
  • The Grace et al. AI expert surveys (1, 2) were deferred to less than I expected (but again, I think that’s good—many respondents to those surveys seem to have inconsistent views; see here for more details. Also, there’s not much reason to expect AI experts to be excellent at forecasting things like AGI—it’s not their job, and it’s probably not a skill they spend time training).
  • It seems that if you go around talking to lots of people about AI timelines, you could move the needle on community beliefs more than I expected.
When reporting AI timelines, be clear who you're deferring to
Sam Clarke · 3y

Sorry for the delay; it will be out this month!

Will Capabilities Generalise More?
Sam Clarke · 3y

Just wanted to say this is the single most useful thing I've read for improving my understanding of alignment difficulty. Thanks for taking the time to write it!

Inner Alignment: Explain like I'm 12 Edition
Sam Clarke · 3y

Part of me thinks: I was trying to push on whether it has a world model or rather has just memorised loads of stuff on the internet and learned a bunch of heuristics for how to produce compelling internet-like text. For me, "world model" evokes some object that has a map-territory relationship with the world. It's not clear to me that GPT-3 has that.

Another part of me thinks: I'm confused. It seems just as reasonable to claim that it obviously has a world model that's just not very smart. I'm probably using bad concepts and should think about this more.

Inner Alignment: Explain like I'm 12 Edition
Sam Clarke · 3y

It looks good to me!

> This is already true for GPT-3

Idk, maybe...?

Inner Alignment: Explain like I'm 12 Edition
Sam Clarke · 3y

Re the argument for "Why internalization might be difficult", I asked Evan Hubinger for his take on your rendition of the argument, and he thinks it's not right.

Rather, the argument that Risks from Learned Optimization makes for why internalization would be difficult is that:

  • ~all models with good performance on a diverse training set probably have to have a complex world model already, which likely includes a model of the base objective,
  • so having the base objective re-encoded in a separate part of the model that represents its objective is just a waste of space/complexity.

Especially since this post is now (rightly!) cited in several introductory AI risk syllabi, it might be worth correcting this, if you agree it's an error.

Inner Alignment: Explain like I'm 12 Edition
Sam Clarke · 4y

> Edit: or do you just mean that even though you take the same steps, the two feel different because retreating =/= going further along the wall

Yeah, this — I now see what you were getting at!

Late 2021 MIRI Conversations: AMA / Discussion
Sam Clarke · 4y

One argument for alignment difficulty is that corrigibility is "anti-natural" in a certain sense. I've tried to write out my understanding of this argument, and would be curious if anyone could add or improve anything about it.

I'd be equally interested in any attempts at succinctly stating other arguments for/against alignment difficulty.

Posts

  • Deference on AI timelines: survey results (25 karma · 2y · 4 comments)
  • When reporting AI timelines, be clear who you're deferring to (38 karma · 3y · 6 comments)
  • Sam Clarke's Shortform (4 karma · 4y · 6 comments)
  • Collection of arguments to expect (outer and inner) alignment failure? [Question, Ω] (21 karma · 4y · 10 comments)
  • Distinguishing AI takeover scenarios [Ω] (74 karma · 4y · 11 comments)
  • Survey on AI existential risk scenarios [Ω] (65 karma · 4y · 11 comments)
  • What are the biggest current impacts of AI? [Question] (15 karma · 5y · 5 comments)
  • Clarifying “What failure looks like” [Ω] (97 karma · 5y · 14 comments)