LESSWRONG
Dakara · 1452

Comments (sorted by Newest)
Agentic Misalignment: How LLMs Could be Insider Threats
Dakara · 3mo · 20

Fair enough, but is the broader trend of "Models won't take unethical actions unless they're the only options" still holding?

That was my takeaway from the experiments done in the aftermath of the alignment faking paper, so it's good to see that it's still holding.

No77e's Shortform
Dakara · 3mo · 10

I think the general population doesn't know all that much about the singularity, so adding that to the part would just unnecessarily dilute it.

Escaping the Jungles of Norwood: A Rationalist’s Guide to Male Pattern Baldness
Dakara · 3mo · 42

I have read the entire piece and it didn't feel like AI slop at all. In fact, if I hadn't been told, I wouldn't have suspected that AI was involved here, so well done!

Knight Lee's Shortform
Dakara · 3mo · 31

A lot of splits happen because some employees think that the company is headed in the wrong direction (lackluster safety would be one example).

Angela's Shortform
Dakara · 3mo · 10

Test successful, it worked :)

Vladimir_Nesov's Shortform
Dakara · 3mo · 90

He probably doesn't have much influence on the public opinion of LessWrong, but as a person in charge of a major AI company, he is obviously a big player.

Making deals with early schemers
Dakara · 3mo* · 1-1

It looks to me like a promising approach. Great results!

Debate experiments at The Curve, LessOnline and Manifest
Dakara · 3mo · 30

I've noticed that whenever a debate touches on a very personal topic, it tends to be heated and pretty unpleasant to listen to. In contrast, debates about things that are low-stakes for the people debating tend to be much more productive, sometimes even involving steelmanning.

Every Major LLM Endorses Newcomb One-Boxing
Dakara · 3mo · 10

That's certainly an interesting result. Have you tried running the same prompt again and seeing if the response changes? I've noticed that some LLMs give different answers to the same prompt. For example, when I quizzed DeepSeek R1 on whether a priori knowledge exists, it answered in the affirmative the first time and in the negative the second time.

deep's Shortform
Dakara · 3mo* · 11

If alignment by default is not the majority opinion, then what is (pardon my ignorance as someone who mostly interacts with the alignment community via LessWrong)? Is it 1) that we are all ~doomed, 2) that alignment is hard but we have a decent shot at solving it, or 3) something else entirely?

I get the feeling that people used to be a lot more pessimistic about our chances of survival in 2023 than in 2024 or 2025 (in other words, pessimism seems to be going down somewhat), but I could be completely wrong about this.
