Recent interviews with Eliezer:
The bug patches / epiphanies / tortoises / wizardry square from Small, Consistent Effort: Uncharted Waters In the Art of Rationality
The nanobots, from the bloodstream, in the parlor, Professor Plum.
You could have written Colonel Mustard!
Figure out why it's important to you that your romantic partner agree with you on this. Does your relationship require agreement on all factual questions? Are you contemplating any big life changes because of x-risk that she won't be on board with?
Would you be happy if your partner fully understood your worries but didn't share them? If so, maybe focus on sharing your thoughts, feelings, and uncertainties around x-risk in addition to your reasoning.
I have to click twice on the Reply link, which is unintuitive. (Safari on iOS.)
I tried a couple other debates with GPT-4, and they both ended up at "A, nevertheless B" vs. "B, nevertheless A".
I expressed some disagreement in my comment, but I didn't disagree-vote.
I like your upper bound. The way I'd put it is: If you buy $1 of Microsoft stock, the most impact that can have is if Microsoft sells it to you, in which case Microsoft gets one more dollar to invest in AI today.
And Microsoft won't spend the whole dollar on AI, though they'd plausibly spend most of a marginal dollar on AI even if they don't spend most of the average dollar on AI.
I'm not sure what to make of the fact that Microsoft is buying back stock. I'd guess it doesn't make a difference either way? Perhaps if they were going to buy back $X worth of ...
That could be, but also maybe there won't be a period of increased strategic clarity. Especially if the emergence of new capabilities with scale remains unpredictable, or if progress depends on finding new insights.
I can't think of many games that don't have an endgame. These examples don't seem that fun:
I don't think this is a good argument. A low probability of impact does not imply the expected impact is negligible. If you have an argument that the expected impact is negligible, I'd be happy to see it.
Is there a transcript available?
We had the model for ChatGPT in the API for, I don't know, 10 months or something before we made ChatGPT. And I sort of thought someone was going to just build it or whatever, and that enough people had played around with it.
I assume he's talking about text-davinci-002, a GPT 3.5 model supervised-finetuned on InstructGPT data. And he was expecting someone to finetune it on dialog data with OpenAI's API. I wonder how that would have compared to ChatGPT, which was finetuned with RL and can't be replicated through the API.
I agree that institutional inertia is a problem, and more generally there's the problem of getting principals to do the thing. But it's more dignified to make alignment/cooperation technology available than not to make it.
I'm a bit more optimistic about loopholes because I feel like if agents are determined to build trust, they can find a way.
I agree those nice-to-haves would be nice to have. One could probably think of more.
I have basically no idea how to make these happen, so I'm not opinionated on what we should do to achieve these goals. We need some combination of basic research, building tools people find useful, and stuff in-between.
Your poster talks about "catastrophic outcomes" from "more-powerful-than-human" AI. Does that not count as alarmism and x-risk? This isn't meant to be a gotcha, I just want to know what counts as too alarmist for you.
Setting aside tgb's comment, shouldn't it be ? The formula in the post would have positive growth even if , which doesn't seem right.
It only took 7 years to make substantial progress on this problem: Logical Induction by Garrabrant et al.
Taking on a 60-hour/week job to see if you burn out seems unwise to me. Some better plans:
Hi Bob, I noticed you have some boxes of stuff stacked up in the laundry room. I can't open the washing machine door all the way because the boxes are in the way. Could you please move them somewhere else?
Some of the boxes in that stack belong to my partner Carol, and I'd have to ask her if she's okay with them being moved.
In theory I could ask Carol if she's all right with the idea of moving the boxes. If Carol were to agree to the idea, I would need to find a new place for the boxes, then develop a plan for how to actually move the
Thanks for sharing your reasoning. For what it's worth, I worked on OpenAI's alignment team for two years and think they do good work :) I can't speak objectively, but I'd be happy to see talented people continue to join their team.
I think they're reducing AI x-risk in expectation because of the alignment research they publish (1 2 3 4). If anyone thinks that research or that kind of research is bad for the world, I'm happy to discuss.
Why do you think the alignment team at OpenAI is contributing on net to AI danger?
Maybe I don't know enough about OpenAI's alignment team to criticize it in public? I wanted to name one alignment outfit because I like to be as specific as possible in my writing. OpenAI popped into my head because of the reasons I describe below. I would be interested in your opinion. Maybe you'll change my mind.
I had severe doubts about the alignment project (the plan of creating an aligned superintelligence before any group manages an unaligned one) even before Eliezer went public with his grave doubts in the fall of last year. It's not that I consider...
Also, chess usually ends in a draw, which is lame. Go rarely if ever ends in a draw.
CFAR used to have an awesome class called "Be specific!" that was mostly about concreteness. Exercises included:
Agents who model each other can be modeled as programs with access to reflective oracles. I used to think the agents have to use the same oracle. But actually the agents can use different oracles, as long as each oracle can predict all the other oracles. This feels more realistic somehow.
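For reference, here's the shape of the definition I'm leaning on, paraphrased from the reflective-oracle literature (my paraphrase; boundary-case conventions may differ from the original formulation). An oracle $O$ answers queries $(M, p)$, where $M$ is a probabilistic oracle machine and $p$ is a rational in $[0, 1]$:

```latex
\Pr\!\left[M^{O} = 1\right] > p \implies O(M, p) = 1,
\qquad
\Pr\!\left[M^{O} = 1\right] < p \implies O(M, p) = 0.
```

When $\Pr[M^{O} = 1] = p$, the oracle may answer either way. The multi-oracle condition is then: oracles $O_1, \dots, O_n$ are mutually compatible when each $O_i$ correctly answers such queries even for machines with access to some other $O_j$, so every oracle can predict agents running on every other oracle.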
Ok, I think in the OP you were using the word "secrecy" to refer to a narrower concept than I realized. If I understand correctly, if Alice tells Carol "please don't tell Bob", and then five years later when Alice is dead or definitely no longer interested or it's otherwise clear that there won't be negative consequences, Carol tells Bob, and Alice finds out and doesn't feel betrayed — then you wouldn't call that a "secret". I guess for it to be a "secret" Carol would have to promise to carry it to her grave, even if circumstances changed, or something.
In that case I don't have strong opinions about the OP.
Become unpersuadable by bad arguments. Seek the best arguments both for and against a proposition. And accept that you'll never be epistemically self-sufficient in all domains.
Suppose Alice has a crush on Bob and wants to sort out her feelings with Carol's help. Is it bad for Alice to inform Carol about the crush on condition of confidentiality?
Your Boycott-itarianism could work just through market signals. As long as your diet makes you purchase less high-cruelty food and more low-cruelty food, you'll increase the average welfare of farm animals, right? Choosing a simple threshold and telling everyone about it is additionally useful for coordination and maybe sending farmers non-market signals, if you believe those work.
If you really want the diet to be robustly good with respect to the question of whether farm animals' lives are net-positive, you'd want to tune the threshold so as not to change...
Yep, I skimmed it by looking at the colorful plots that look like Ising models and reading the captions. Those are always fun.
No, I just took a look. The spin glass stuff looks interesting!
I think you're saying , right? In that case, since embeds into , we'd have embedding into . So not really a step up.
If you want to play ordinal games, you could drop the requirement that agents are computable / Scott-continuous. Then you get the whole ordinal hierarchy. But then we aren't guaranteed equilibria in games between agents of the same order.
I suppose you could have a hybrid approach: Order is allowed to be discontinuous in its order- beliefs, but higher orders have to be continuous? Maybe that would get you to ....
And as a matter of scope, your reaction here is incorrect. [...] Reacting to it as a synecdoche of the agricultural system does not seem useful.
On my reading, the OP is legit saddened by that individual turkey. One could argue that scope demands she be a billion times sadder all the time about poultry farming in general, but that's infeasible. And I don't think that's a reductio against feeling sad about an individual turkey.
Sometimes, sadness and crying are about integrating one's beliefs. There's an intuitive part of your mind that doesn't understand ...
I apologize, I shouldn't have leapt to that conclusion.
it legitimately takes the whole 4 years after that to develop real AGI that ends the world. FINE. SO WHAT. EVERYONE STILL DIES.
By Gricean implicature, "everyone still dies" is relevant to the post's thesis. Which implies that the post's thesis is that humanity will not go extinct. But the post is about the rate of AI progress, not human extinction.
This seems like a bucket error, where "will takeoff be fast or slow?" and "will AI cause human extinction?" are put in the same bucket.
The central hypothesis of "takeoff speeds" is that at the time of serious AGI being developed, it is perfectly anti-Thielian in that it is devoid of secrets
No, the slow takeoff model just precludes there being one big secret that unlocks both 30%/year growth and dyson spheres. It's totally compatible with a bunch of medium-sized $1B secrets that different actors discover, adding up to hyperbolic economic growth in the years leading up to "rising out of the atmosphere".
Rounding off the slow takeoff hypothesis to "lots and lots of little innovations addin...
"Takeoff Speeds" has become kinda "required reading" in discussions on takeoff speeds. It seems like Eliezer hadn't read it until September of this year? He may have other "required reading" from the past four years to catch up on.
I read "Takeoff Speeds" at the time. I did not liveblog my reaction to it at the time. I've read the first two other items.
I flag your weirdly uncharitable inference.
Ah, great! To fill in some of the details:
Given agents and numbers such that , there is an aggregate agent called which means "agents and acting together as a group, in which the relative power of versus is the ratio of to ". The group does not make decisions by combining their utility functions, but instead by negotiating or fighting or something.
Aggregation should be associative, so .
If you spell out all the associativity relations, you'll find that
I like that you glossed the phrase "have your cake and eat it too":
It's like a toddler thinking that they can eat their slice of cake, and still have that very same slice of cake available to eat again the next morning.
I also like that you explained the snowclone "lies, damned lies, and statistics". I'm familiar with both of these cliches, but they're generally overused to the point of meaninglessness. It's clear you used them with purpose.
Interestingly, if my research is not mistaken, "eat your cake and have it too" was the original form of the phrase and is much clearer imo; I was always confused by "have your cake and eat it too" because that seemed to be just ... describing the normal order of operations?
The psychotic break you describe sounds very scary and unpleasant, and I'm sorry you experienced that.
Typo: "common, share, agreed-on" should be "...shared...".
People are fond of using the neologism "cruxy", but there's already a word for that: "crucial". Apparently this sense of "crucial" can be traced back to Francis Bacon.
This story originally had a few more italicized words, and they make a big difference:
"Don't," Jeffreyssai said. There was real pain in it.
"I do not know," said Jeffreyssai, from which the overall state of the evidence was obvious enough.
Some of the formatting must have been lost when it was imported to LessWrong 2.0. You can see the original formatting at readthesequences.com and in Rationality: AI to Zombies.
It seems to me that if I make some reasonable-ish assumptions, then 2 micromorts is equivalent to needing to drive for an hour at a random time in my life. I expect the value of my time to change over my life, but I'm not sure in which direction. So equating 2 micromorts with driving for an hour tonight is probably not a great estimate.
How do you deal with this? Have you thought about it and concluded that the value of your time today is a good estimate of the average value over your life? Or are you assuming that the value of your time won't change by more than, say, a factor of 2 over your life?
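Here's the back-of-the-envelope version of the equivalence I had in mind, under my own assumptions (50 years of remaining life; a micromort costs you one millionth of your remaining life in expectation). The numbers are mine, not anything from your post:

```python
# Treat N micromorts as losing N millionths of remaining life expectancy.
# All figures below are assumed for illustration.
remaining_years = 50
hours_per_year = 365.25 * 24          # about 8766 hours
remaining_hours = remaining_years * hours_per_year

micromorts = 2
expected_hours_lost = micromorts * 1e-6 * remaining_hours
print(f"{expected_hours_lost:.2f} hours")  # → 0.88 hours
```

So under these assumptions 2 micromorts costs roughly an hour in expectation, which is where the "driving for an hour" equivalence comes from. But as above, the hour it costs you is an average hour over your whole remaining life, not necessarily an hour tonight.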
What's more, even selfish agents with de dicto identical utility functions can trade: If I have two right shoes and you have two left shoes, we'd trade one shoe for another because of decreasing marginal utility.
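A toy version of the shoe trade, with a made-up utility function where only complete pairs matter (an extreme case of decreasing marginal utility: the second unmatched shoe has zero marginal value). The function and numbers are mine, just to make the gains-from-trade concrete:

```python
def utility(left_shoes: int, right_shoes: int) -> int:
    """Utility = number of complete pairs; unmatched shoes add nothing."""
    return min(left_shoes, right_shoes)

# Before trade: I hold two right shoes, you hold two left shoes.
my_before = utility(left_shoes=0, right_shoes=2)    # 0 pairs
your_before = utility(left_shoes=2, right_shoes=0)  # 0 pairs

# Trade one of my right shoes for one of your left shoes.
my_after = utility(left_shoes=1, right_shoes=1)     # 1 pair
your_after = utility(left_shoes=1, right_shoes=1)   # 1 pair

# Both agents strictly gain, despite identical selfish utility functions.
assert my_after > my_before and your_after > your_before
```

Both of us end up strictly better off even though we want exactly the same things de dicto; the trade works because our endowments differ, not our preferences.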