Eliezer Yudkowsky

Comments
Genuine question: If Eliezer is so rational why is he fat?
Eliezer Yudkowsky · 4d

https://x.com/ESYudkowsky/status/1816925777377788295

Someone else is welcome to collect relevant text into a reply.  I don't really feel like it for some odd reason.

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

Cool.  What's the actual plan and why should I expect it not to create machine Carissa Sevar?  I agree that the Textbook From The Future Containing All The Simple Tricks That Actually Work Robustly enables the construction of such an AI, but also at that point you don't need it.

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

So if it's difficult to get amazing trustworthy work out of a machine actress playing an Eliezer-level intelligence doing a thousand years worth of thinking, your proposal to have AIs do our AI alignment homework fails on the first step, it sounds like?

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

So the "IQ 60 people controlling IQ 80 people controlling IQ 100 people controlling IQ 120 people controlling IQ 140 people until they're genuinely in charge and genuinely getting honest reports and genuinely getting great results in their control of a government" theory of alignment?

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

I don't think you can train an actress to simulate me, successfully, without her going dangerous.  I think that's over the threshold for where a mind starts reflecting on itself and pulling itself together.

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

I'm not saying that it's against thermodynamics to get behaviors you don't know how to verify.  I'm asking what's the plan for getting them.

How to Make Superbabies
Eliezer Yudkowsky · 5mo

One of the most important projects in the world.  Somebody should fund it.

How might we safely pass the buck to AI?
Eliezer Yudkowsky · 5mo

Can you tl;dr how you go from "humans cannot tell which alignment arguments are good or bad" to "we justifiably trust the AI to report honest good alignment takes"?  Like, not with a very large diagram full of complicated parts such that it's hard to spot where you've messed up.  Just whatever simple principle you think lets you bypass GIGO.

Eg, suppose that in 2020 the Open Philanthropy Foundation would like to train an AI such that the AI would honestly say if the OpenPhil doctrine of "AGI in 2050" was based on groundless thinking ultimately driven by social conformity.  However, OpenPhil is not allowed to train their AI based on MIRI.  They have to train their AI entirely on OpenPhil-produced content.  How does OpenPhil bootstrap an AI which will say, "Guys, you have no idea when AI shows up but it's probably not that far and you sure can't rely on it"?  Assume that whenever OpenPhil tries to run an essay contest for saying what they're getting wrong, their panel of judges ends up awarding the prize to somebody reassuringly saying that AI risk is an even smaller deal than OpenPhil thinks.  How does OpenPhil bootstrap from that pattern of thumbs-up/thumbs-down to an AI that actually has better-than-OpenPhil alignment takes?

Broadly speaking, the standard ML paradigm lets you bootstrap somewhat from "I can verify whether this problem was solved" to "I can train a generator to solve this problem".  This applies as much to MIRI as OpenPhil.  MIRI would also need some nontrivial secret amazing clever trick to gradient-descend an AI that gave us great alignment takes, instead of seeking out the flaws in our own verifier and exploiting those.
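
A minimal sketch of that bootstrap, and of how it fails when the verifier is the weak link: in the toy below (the candidate "takes", the flawed_verifier scoring rule, and the selection loop are all illustrative assumptions, not anyone's actual training setup), the selection pressure points at whatever the verifier rewards rather than at true quality, so a biased judge converges on a confidently wrong generator.

```python
# Toy illustration of "verify a solution -> train a generator to produce it",
# and of the GIGO failure mode: if the verifier is flawed, the loop optimizes
# for passing the verifier, not for being right.  All names and scoring rules
# here are made up for illustration.
import random

random.seed(0)


def true_quality(take: str) -> float:
    """The thing we actually care about, which we cannot score directly."""
    return 1.0 if "names a concrete failure mode" in take else 0.0


def flawed_verifier(take: str) -> float:
    """Proxy judge standing in for the thumbs-up/thumbs-down panel.
    It gives some credit for substance but more credit for reassurance."""
    score = 0.0
    if "names a concrete failure mode" in take:
        score += 0.5
    if "risk is smaller than you think" in take:  # the judge's exploitable bias
        score += 1.0
    return score


CANDIDATE_TAKES = [
    "take that names a concrete failure mode",
    "take that says risk is smaller than you think",
    "take that says risk is smaller than you think, at great length",
]


def bootstrap(rounds: int = 5, samples_per_round: int = 20) -> list[str]:
    """Sample candidates, keep whatever the verifier scores highest, and
    'train' on the survivors (here: just resample from them).  The selection
    pressure tracks flawed_verifier, never true_quality."""
    pool = list(CANDIDATE_TAKES)
    for _ in range(rounds):
        batch = [random.choice(pool) for _ in range(samples_per_round)]
        batch.sort(key=flawed_verifier, reverse=True)
        pool = batch[: samples_per_round // 4]  # keep the top quartile
    return pool


if __name__ == "__main__":
    final_pool = bootstrap()
    winner = max(set(final_pool), key=final_pool.count)
    print("What the loop converges on:", winner)
    print("Verifier score:", flawed_verifier(winner))
    print("True quality:  ", true_quality(winner))
```

Swapping in an actual gradient-descent trainer does not change the picture; the gradient still points wherever the verifier's score points.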

What's the trick?  My basic guess, when I see some very long complicated paper that doesn't explain the key problem and key solution up front, is that you've done the equivalent of an inventor building a sufficiently complicated perpetual motion machine that their mental model of it no longer tracks how conservation laws apply.  (As opposed to the simpler error of their explicitly believing that one particular step or motion locally violates a conservation law.)  But if you've got a directly explainable trick for how you get great suggestions you can't verify, go for it.

Pausing AI Developments Isn't Enough. We Need to Shut it All Down
Eliezer Yudkowsky · 6mo

You seem confused about my exact past position.  I was arguing against EAs who were like, "We'll solve AGI with policy, therefore no doom."  I am not presently a great optimist about the likelihood of policy being an easy solution.  There is just nothing else left.

GPTs are Predictors, not Imitators
Eliezer Yudkowsky · 7mo

(I affirm this as my intended reading.)

Sequences

Metaethics
Quantum Physics
Fun Theory
Ethical Injunctions
The Bayesian Conspiracy
Three Worlds Collide
Highly Advanced Epistemology 101 for Beginners
Inadequate Equilibria
The Craft and the Community

Posts

The Sun is big, but superintelligences will not spare Earth a little sunlight · 208 points · 10mo · 143 comments
Universal Basic Income and Poverty · 328 points · 1y · 142 comments
'Empiricism!' as Anti-Epistemology · 171 points · 1y · 92 comments
My current LK99 questions · 206 points · 2y · 38 comments
GPTs are Predictors, not Imitators · 418 points · 2y · 100 comments
Pausing AI Developments Isn't Enough. We Need to Shut it All Down · 275 points · 2y · 44 comments
Eliezer Yudkowsky's Shortform · 14 points · 2y · 0 comments
Manifold: If okay AGI, why? · 120 points · 2y · 37 comments
Alexander and Yudkowsky on AGI goals · 179 points · 2y · 53 comments
A challenge for AGI organizations, and a challenge for readers · 302 points · 3y · 33 comments

Wikitag Contributions

Logical decision theories · 13d · (+803/-62)
Multiple stage fallacy · 1y · (+16)
Orthogonality Thesis · 2y · (+28/-17)