LESSWRONG

Matrice Jacobine

Student in fundamental and applied mathematics, interested in theoretical computer science and AI alignment


Comments

The best simple argument for Pausing AI?
Matrice Jacobine, 9d

FTR: You can choose your own commenting guidelines when writing or editing a post in the section "Moderation Guidelines".

[This comment is no longer endorsed by its author]
evhub's Shortform
Matrice Jacobine, 16d

I think your tentative position is correct and public-facing chatbots like Claude should lean toward harmlessness in the harmlessness-helpfulness trade-off, but (post-adaptation buffer) open-source models with no harmlessness training should be available as well.

Futarchy's fundamental flaw
Matrice Jacobine, 18d

This seems related to the 5-and-10 problem? Especially @Scott Garrabrant's version, considering logical induction is based on prediction markets.

the void
Matrice Jacobine, 24d

You seem to smuggle in an unjustified assumption: that white collar workers avoid thinking about taking over the world because they're unable to take over the world. Maybe they avoid thinking about it because that's just not the role they're playing in society.

White-collar workers avoid thinking about taking over the world because they're unable to take over the world, and they're unable to take over the world because their role in society doesn't involve that kind of thing. If a white-collar worker were somehow drafted as president of the United States, you would assume their propensity to think about world hegemony would increase. (Also, white-collar workers engage in scheming, sandbagging, and deception all the time? The average person lies 1-2 times per day.)

the void
Matrice Jacobine, 24d

Human white-collar workers are unarguably agents in the relevant sense here (intelligent beings with desires who take actions to fulfil those desires). The fact that they have no ability to take over the world has no bearing on this.

the void
Matrice Jacobine, 24d

... do you deny human white-collar workers are agents?

the void
Matrice Jacobine, 25d

LLMs are agent simulators. Why would they contemplate takeover more frequently than the kind of agent they are induced to simulate? You don't expect a human white-collar worker, even one who makes mistakes all the time, to contemplate world domination plans, let alone attempt one. You could, however, expect the head of state of a world power to do so.

the void
Matrice Jacobine, 1mo

https://www.lesswrong.com/users/janus-1 ?

the void
Matrice Jacobine, 1mo

I think Janus is closer to "AI safety mainstream" than nostalgebraist?

the void
Matrice Jacobine, 1mo

Uh? The OpenAssistant dataset would qualify as supervised learning/fine-tuning, not RLHF, no?

5-and-10 problem
Posts

Energy-Based Transformers are Scalable Learners and Thinkers (7 karma, 3d, 5 comments)
NYT article about the Zizians including quotes from Eliezer, Anna, Ozy, Jessica, Zvi (7 karma, 3d, 0 comments)
Hydra (24 karma, 1mo, 0 comments)
The Decreasing Value of Chain of Thought in Prompting (11 karma, 1mo, 0 comments)
Priming effects are fake, but framing effects are real (32 karma, 2mo, 0 comments)
Absolute Zero: Reinforced Self-play Reasoning with Zero Data (6 karma, 2mo, 4 comments)
Is there a Half-Life for the Success Rates of AI Agents? (8 karma, 2mo, 0 comments)
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (12 karma, 3mo, 4 comments)
"Long" timelines to advanced AI have gotten crazy short (21 karma, 3mo, 0 comments)
Large Language Models Pass the Turing Test (6 karma, 3mo, 0 comments)