Fair enough. If/when you get any empirical data on how well this post works, writing it up would be pretty valuable and would likely resolve any remaining disagreements we have.
Then I'd lean away from the "this is for people who've awoken ChatGPT" framing. E.g. change your title to something along the lines of "so you think LLMs make you smarter".
This essay seems like it's trying to address two different audiences: LW, and the people who get mind-hacked by AIs. That's to its detriment, IMO.
E.g. the questions in the Corollary FAQ don't sound like the questions you'd expect from someone who's been mind-hacked by AI. Like, why expect someone in a sycophancy doom loop to ask whether it's OK to use AI for translation? Also, texts produced by sycophancy doom loops look pretty different from AI-translated texts. Both share a resemblance to low-quality LLM-assisted posts, yes. But you're addressing people who think they've awoken ChatGPT, not low-quality posters who use LLM assistance.
But reasoning models don't get reward during deployment. In what sense are they "optimizing for reward"?
Yep, and getting a solid proof of those theorems would provide insight IMO. So it isn't what I'd call "trivial". Though I suppose it's possible that raising the theorem statement from entropy would require many bits of info, while finding a proof would require almost none.
have some threaded chat component bolted on (I have takes on best threading system).
I wish to hear these takes.
Man, that's depressing. It gives too low an estimate for a good outcome, IMO. Very cool tool, though.
A key question is whether the typical goal-directed superintelligence would assign any significant value to humans. If it does, that greatly reduces the threat from superintelligence. We have a somewhat relevant article earlier in the sequence: AI's goals may not match ours.
BTW, if you're up for helping us improve the article, would you mind answering some questions? Like: do you feel like our article was "epistemically co-operative"? That is, do you think it helps readers orient themselves in the discussion on AI safety, makes the assumptions clear, and generally tries to explain rather than persuade? And what's your general level of familiarity with AI safety?
Personally, I liked "it's always been the floor". Feels real. I've certainly said/heard people say things like that in strained relationships. Perhaps "it's always the floor" would have been better. Or "it always is". Yes, that sounds right.
Strong upvoted, not because I agree with your take on alignment (though there is perhaps some merit to it, despite how hard it is to buy results in alignment), but because I think LW needs to focus more on the practice of rationality, and instrumental rationality in particular.