1 min read · 13th Jan 2023 · 5 comments
This is a special post for quick takes by devansh. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.



Epistemic Status: Rant. Written very rapidly, and on reflection I'm not sure I fully endorse it; Cunningham's Law says that this is the best way to get good takes quickly.

 

Rationalists should win. If you have contorted yourself into alternative decision theories that leave you vulnerable to Roko's Basilisk or whatever, and normal CDT or whatever actual humans implement in real life wouldn't leave you vulnerable to stuff like this, then you have failed and you need to go back to trying to be a normal person using normal decision procedures instead of mathing your way into being "forever acausally tortured by a powerful intelligent robot."


If the average Joe on the street would not succumb to their mind being hacked by Eliezer Yudkowsky, or hell, by a late 2022 chatbot, and you potentially would (by virtue of being a part of the reference class of LessWrong users or whatever)—then you have failed, and it is not obvious you can make an expected positive contribution to the field of AI risk reduction at all without becoming far more, for lack of a better word, normal. I don't understand how people think that spending their time working on increasingly elaborate pseudophilosophical things that they then call "AI alignment" works if they are also the type of people who are highly vulnerable to getting mindhacked by ChatGPT—perhaps this is a bucket error or I'm attacking a strawman? I don't think Eliezer or Nate or whoever would fall to this failure mode, but in general the more philosophical parts of alignment feel worrying to me (and specifically I mean the MIRI-CFAR-sphere, although again I may be attacking a strawman), because of the potential negatives of having people close to alignment solutions be unusually vulnerable to being hacked by AI.

This is a conditional with a premise that I think is almost entirely false, which makes it technically (vacuously) true but not useful.

Do you believe that the premise is substantially true?

CDT gives in to blackmail (such as the basilisk), whereas timeless decision theories do not.
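As a minimal sketch of the claim above, consider a toy blackmail game (the payoffs and structure here are illustrative assumptions, not anything from the thread): a blackmailer with an accurate predictor only sends the threat if it predicts the target will give in. A CDT agent, reasoning causally from the moment the threat arrives, pays; a timeless/functional-style agent commits to refusing, so it is never threatened in the first place.

```python
# Toy blackmail game (hypothetical payoffs): a blackmailer with an
# accurate predictor only sends the threat if it predicts the target
# would pay once threatened.
PAY_COST = -1     # cost of giving in to the threat
PUNISHMENT = -10  # cost of the threat being carried out

def cdt_policy(blackmailed: bool) -> str:
    # CDT reasons causally from the current node: once the threat has
    # arrived, paying (-1) beats being punished (-10), so it pays.
    return "pay" if blackmailed else "n/a"

def timeless_policy(blackmailed: bool) -> str:
    # A timeless agent fixes its policy up front: refuse no matter
    # what, so the predicted refusal deters the threat entirely.
    return "refuse"

def outcome(policy) -> int:
    # The blackmailer simulates the policy and only sends the threat
    # if the simulation says the target would pay.
    blackmailer_threatens = policy(True) == "pay"
    if not blackmailer_threatens:
        return 0  # no threat is ever made
    return PAY_COST if policy(True) == "pay" else PUNISHMENT

print(outcome(cdt_policy))       # -1: CDT is blackmailed and pays
print(outcome(timeless_policy))  #  0: the refuser is never blackmailed
```

The point of the toy model is that the "give in" policy is exactly what makes the threat profitable to issue, while the precommitted refuser removes the blackmailer's incentive.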

If the average Joe on the street would not succumb to their mind being hacked by Eliezer Yudkowsky, or hell, by a late 2022 chatbot, and you potentially would (by virtue of being a part of the reference class of LessWrong users or whatever)—then you have failed, and it is not obvious you can make an expected positive contribution to the field of AI risk reduction at all without becoming far more, for lack of a better word, normal. I don't understand how people think that spending their time working on increasingly elaborate pseudophilosophical things that they then call "AI alignment" works if they are also the type of people who are highly vulnerable to getting mindhacked by ChatGPT—perhaps this is a bucket error or I'm attacking a strawman? I don't think Eliezer or Nate or whoever would fall to this failure mode, but in general the more philosophical parts of alignment feel worrying to me (and specifically I mean the MIRI-CFAR-sphere, although again I may be attacking a strawman), because of the potential negatives of having people close to alignment solutions be unusually vulnerable to being hacked by AI.

IMO, I don't agree with this take, since I think a common problem here is that people falsely believe they wouldn't fall for ChatGPT or some other nonsense. In general, people way overrate how well they would do in this situation.

(I promised I'd publish this last night no matter what state it was in, and then didn't get very far before the deadline. I will go back and edit and improve it later.)

 

I feel like I keep, over and over, hearing a complaint from people who get most of their information about college admissions from WhatsApp groups or their parents' friends or a certain extraordinarily pervasive subreddit (you all know what I'm talking about). Something like "College admissions is ridiculous! Look at this person, who was top of his math class and took 10 AP classes and started lots of clubs, and he didn't get into a single Ivy—he's going to UCLA!" I think the closest analogy I can find for this is something like "look at this guy, he's 7 feet tall, and he didn't even make it to the NBA!" There's something important that both complaints are missing: some fundamental confusion between a tiny part of the overall metric and reality itself.