All of Eliezer Yudkowsky's Comments + Replies

More Christiano, Cotra, and Yudkowsky on AI progress

Want to +1 that a vaguer version of this was my own rough sense of RNNs vs. CNNs vs. Transformers.

4paulfchristiano10hI think transformers are a big deal, but I think this comment is a bad guess at the counterfactual and it reaffirms my desire to bet with you about either history or the future. One bet down, handful to go?
Biology-Inspired AGI Timelines: The Trick That Never Works

Relatedly, do you consider [function approximators for basically everything becoming better with time] to also fail to be a good predictor of AGI timelines for the same reasons that compute-based estimates fail?

Obviously yes, unless you can take the metrics on which your graphs show steady progress and really actually locate AGI on them instead of just tossing out a shot-in-the-dark biological analogy to locate AGI on them.

Biology-Inspired AGI Timelines: The Trick That Never Works

As much as Moravec-1988 and Moravec-1998 sound like they should be basically the same people, a decade passed between them, and I'd like to note that Moravec may legit have been making an updated version of his wrong argument in 1998 compared to 1988 after he had a chance to watch 10 more years pass and make his earlier prediction look less likely.

2paulfchristiano8hI think this is uncharitable and most likely based on a misreading of Moravec. (And generally with gwern on this one.) As far as I can tell, the source for your attribution of this "prediction" is: As far as I could tell it sounds from the surrounding text like his "prediction" for transformative impacts from AI was something like "between 2010 and 2030" with broad error bars.
"Infohazard" is a predominantly conflict-theoretic concept

You're basically just failing at modeling rational agents with utility functions different from yours, I'm sorry to say.  If the Puritans value pleasure, they can pursue it even after learning the true facts of the matter.  If they don't value pleasure, but you do, you're unhappy they learned the secret because now they'll do things you don't want, but they do want to do those things under their own utility functions.

1RedMan3dOh I understand, I'm trying to apply an external measure of utility, but it doesn't necessarily match up to an internal measure, so this example fails. Thank you! Edit: you've written before about your experiences growing up in an insular religious environment. Can you in retrospect identify any pieces of true widely known information that would qualify as an infohazard to that group using your definition? Obviously I wouldn't ask you to actually state the true fact or the reason it's an infohazard.
Biology-Inspired AGI Timelines: The Trick That Never Works

A lot of the advantage of human technology is due to human technology figuring out how to use covalent bonds and metallic bonds, where biology sticks to ionic bonds and proteins held together by van der Waals forces (static cling, basically).  This doesn't fit into your paradigm; it's just biology mucking around in a part of the design space easily accessible to mutation error, while humans work in a much more powerful design space because they can move around using abstract cognition.

2jacob_cannell4dCovalent/metallic vs ionic bonds implements the high energy density vs wetware constrained distinction I was referring to, so we are mostly in agreement; that is my paradigm. But the evidence is pretty clear that "ionic bond and protein" tech does approach the Landauer limit - at least for protein computation. As for the brain, end of Moore's Law high end chip research is very much neuromorphic (memristor crossbars, etc), and some designs do claim perhaps 10x or so greater synop/J than the brain (roughly), but they aren't built yet. So if you had wider uncertainty in your claim, with most mass in the region of the brain being 1 to 3 OOMs from the limit, I probably wouldn't have commented, but for me that one claim distracted from your larger valid points.
"Infohazard" is a predominantly conflict-theoretic concept

Nope.  You're evaluating their strategies using your utility function.  Infohazards occur when individuals or groups create strategies using their own utility functions and then do worse under their own utility functions when knowledge of true facts is added to them.

1RedMan4dSo, the true fact 'the female orgasm is not necessary for reproduction' would not qualify as an infohazard to a colonial Puritan, who believes that reproduction is good, and that the female orgasm is necessary for its' accomplishment? In order to turn it into an infohazard to that Puritan, do I have to add the (unstated in previous) assertion that 'experiencing orgasmic joy is utility positive'? Is there a way to fix this example or am I just completely off base here? Edit: I'm trying to define an edge case, hope I'm not offending anyone.
"Infohazard" is a predominantly conflict-theoretic concept

The idea of Transfiguring antimatter (assuming it works) is something that collectively harms all wizards if all wizards know it; it's a group infohazard.  The group infohazards seem worth distinguishing from the individual infohazards, but both seem much more worth distinguishing from secrets.  Secrets exist among rational agents; individual and group infohazards only exist among causal decision theorists, humans, and other such weird creatures.

"Infohazard" is a predominantly conflict-theoretic concept

We already have a word for information that agent A would rather have B not know, because B's knowledge of it benefits B but harms A; that word is 'secret'.

As this is a very common and ordinary state of affairs, we need a larger and more technical word to describe that rarer and more interesting case where B's veridical knowledge of a true fact X harms B, or when a group's collective knowledge of a true fact X harms the group collectively.


Bostrom's original paper defines "infohazard" so as to be inclusive of what you term "secrets". I define "self-infohazard" to describe a specific case of an individual being harmed by themselves knowing something. Perhaps you would like to propose a different taxonomy that disagrees with Bostrom's and/or my definitions?

EDIT: At MIRI, Nate Soares frequently used the term "infohazard" to refer to potentially dangerous technological secrets, in line with Bostrom's usage. I have no reason to believe that anyone at the organization would personally be harmed... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works

It does fit well there, but I think it was more inspired by the person I met who thought I was being way too arrogant by not updating in the direction of OpenPhil's timeline estimates to the extent I was uncertain.

Visible Thoughts Project and Bounty Announcement

I initially tried doing post-hoc annotation and found it much more difficult than thinking my own actual thoughts, putting them down, and writing the prompt that resulted.  Most of the work is in writing the thoughts, not the prompts, so adding pregenerated prompts at expense of making the thoughts more difficult is a loss.

Visible Thoughts Project and Bounty Announcement

<non-binding handwave, ask again and more formally if serious>I'd say we'd pay $2000/each for the first 50, but after that we might also want 5 longer runs to train on in order to have the option of training for longer-range coherence too.  I suppose if somebody has a system to produce only 100-step runs, and nobody offers us 1000-step runs, we'd take what we could get.</non-binding>

Visible Thoughts Project and Bounty Announcement

My coauthor and myself generated the sample run by taking turns on Action, Thought, Prompt.  That is, I wrote an Action, she wrote a Thought, I wrote a Prompt, she wrote an Action, I wrote a Thought, she wrote a Prompt.  This also helped show up immediately when a Thought underspecified a Prompt, because it meant the Thought and Prompt were never written by the same person.

More coherent overall plot is better - that current systems are terrible at this is all the more reason to try to show a dataset of it being done better.  There doesn't ne... (read more)

Visible Thoughts Project and Bounty Announcement

I state: we'd be happy, nay, ecstatic, to get nice coherent complete shorter runs, thereby disproving my concern that short runs won't be possible to complete, and to pay for them proportionally.

5Tapatakt6dSo, hypothetically, if you receive only nice coherent complete 100-steps runs, will you pay $2000 for the first 100?
Visible Thoughts Project and Bounty Announcement

We pay out $20,000 per run for the first 10 runs, as quality runs are received, not necessarily all to one group.  If more than one group demonstrates the ability to scale, we might ask more than one group to contribute to the $1M 100-run dataset.  Them cooperating with each other would hardly be a problem.  That said, a lot of the purpose of the 10-run trial is exactly to locate executives or groups that can scale - and maybe be employed by us again, after the prize ends - so everybody getting together to produce the first 10 runs, and then disbanding, in a process that doesn't scale to produce 100 runs, is not quite what we are hoping for here!

Visible Thoughts Project and Bounty Announcement
  • 1:  I expect that it's easier for authors to write longer thoughtful things that make sense;
  • 2:  MIRI doesn't just target the AI we have, it targets the AI we're afraid we'll get;
  • 3:  Present-day use-cases for dungeons are a long-range problem even if they're currently addressed with short-range technology.

Answer 1:  Longer is easier to write per-step.

Fitting a coherent story with interesting stuff going on into 100 steps, is something I expect to be much harder for a human author than fitting that story into 1000 steps.  Novels are ... (read more)

-1Padure19hYou are completely missing that it turns into lottery from perspective of potential writer. You are asking people to spend enormous amount of work on writing 600 pages and hope that what they and what you consider as high-quality will align. AND that 10 slots will not be used up before they will complete. This way only people willing to take big risks and with plenty of spare time will remain. I would strongly suggest to start from something shorter. BTW, is 60 000 pages sufficient to train some pattern matching like GPT-3?

1:  I expect that it's easier for authors to write longer thoughtful things that make sense;

I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don't want one person to write you 100 different 600 page stories, you one person to organize 100 people to write you one 600 page story each. And it's a lot easier to scale if you set the barrier of entry lower. There are many more people who can write 60 page stories than 600 page stories, and it's easier to find 1,000 people to write 60 pages each than it is to find 10... (read more)

9plex7dThese are reasonable points, but I am curious about whether you would accept a high-quality run of shorter (but still considerable) length for a payout of <steps>/1000 of $20,000, and approximately the lower bound of run length which seems likely to be valuable? Producing 600 pages of text is an extremely big commitment for uncertain gains, especially with the potential to run out of early slots and no guarantee that it will be included in the 100 later, giving people the option to do even modestly smaller chunks may mean much greater uptake and more high quality work to chose from.
Visible Thoughts Project and Bounty Announcement

We're guessing 1000 steps per reasonably-completed run (more or less, doesn't have to be exact) and guessing maybe 300 words per step, mostly 'thought'.  Where 'thoughts' can be relatively stream-of-consciousness once accustomed (we hope) and the dungeon run doesn't have to be Hugo quality in its plotting, so it's not like we're asking for a 300,000-word edited novel.

9WilliamKiely7dThe sample [] Nate linked is 30 pages and 12,267 words. So that works out to ~730 pages for a run. $20,000/300,000 words = $1 per 15 words. If an author writing it manually could average 15 wpm, that would be $60/hour.
Ngo and Yudkowsky on alignment difficulty

Singapore probably looks a lot less attractive to threaten if it's allied with another world power that can find and melt arbitrary objects.

Ngo and Yudkowsky on alignment difficulty

"Melt all GPUs" is indeed an unrealistic pivotal act - which is why I talk about it, since like any pivotal act it is outside the Overton Window, and then if any children get indignant about the prospect of doing something other than letting the world end miserably, I get to explain the child-reassuring reasons why you would never do the particular thing of "melt all GPUs" in real life.  In this case, the reassuring reason is that deploying open-air nanomachines to operate over Earth is a huge alignment problem, that is, relatively huger than the leas... (read more)

6Wei_Dai9dDo you have a plan to communicate the content of this to people whom it would be beneficial to communicate to? E.g., write about it in some deniable way, or should such people just ask you about it privately? Or more generally, how do you think that discussions / intellectual progress on this topic should go? Do you think the least difficult pivotal act you currently see has sociopolitical problems that are similar to "melt all GPUs"? Thanks for the clarification. I suggest mentioning this more often (like in the Arbital page), as I previously didn't think that your version of "pivotal act" had a significant sociopolitical component. If this kind of pivotal act is indeed how the world gets saved (conditional on the world being saved), one of my concerns is that "a miracle occurs" and the alignment problem gets solved, but the sociopolitical problem doesn't because nobody was working on it (even if it's easier in some sense). (Not a high priority to discuss this here and now, but) I'm skeptical that backing by a small government like Singapore is sufficient, since any number of major governments would be very tempted to grab the AGI(+team) from the small government, and the small government will be under tremendous legal and diplomatic stress from having nonconsensually destroyed a lot of very valuable other people's property. Having a partially aligned/alignable AGI in the hands of a small, geopolitically weak government seems like a pretty precarious state.
Yudkowsky and Christiano discuss "Takeoff Speeds"

Maybe another way of phrasing this - how much warning do you expect to get, how far out does your Nope Vision extend?  Do you expect to be able to say "We're now in the 'for all I know the IMO challenge could be won in 4 years' regime" more than 4 years before it happens, in general?  Would it be fair to ask you again at the end of 2022 and every year thereafter if we've entered the 'for all I know, within 4 years' regime?

Added:  This question fits into a larger concern I have about AI soberskeptics in general (not you, the soberskeptics wou... (read more)

I think I'll get less confident as our accomplishments get closer to the IMO grand challenge. Or maybe I'll get much more confident if we scale up from $1M -> $1B and pick the low hanging fruit without getting fairly close, since at that point further progress gets a lot easier to predict

There's not really a constant time horizon for my pessimism, it depends on how long and robust a trend you are extrapolating from. 4 years feels like a relatively short horizon, because theorem-proving has not had much investment so compute can be scaled up several orde... (read more)

Christiano, Cotra, and Yudkowsky on AI progress

I also think human brains are better than elephant brains at most things - what did I say that sounded otherwise?

2paulfchristiano11dOops, this was in reference to the later part of the discussion where you disagreed with "a human in a big animal body, with brain adapted to operate that body instead of our own, would beat a big animal [without using tools]".
Yudkowsky and Christiano discuss "Takeoff Speeds"

Okay, then we've got at least one Eliezerverse item, because I've said below that I think I'm at least 16% for IMO theorem-proving by end of 2025.  The drastic difference here causes me to feel nervous, and my second-order estimate has probably shifted some in your direction just from hearing you put 1% on 2024, but that's irrelevant because it's first-order estimates we should be comparing here.

So we've got huge GDP increases for before-End-days signs of Paulverse and quick IMO proving for before-End-days signs of Eliezerverse?  Pretty bare port... (read more)

I think IMO gold medal could be well before massive economic impact, I'm just surprised if it happens in the next 3 years. After a bit more thinking (but not actually looking at IMO problems or the state of theorem proving) I probably want to bump that up a bit, maybe 2%, it's hard reasoning about the tails. 

I'd say <4% on end of 2025.

I think this is the flipside of me having an intuition where I say things like "AlphaGo and GPT-3 aren't that surprising"---I have a sense for what things are and aren't surprising, and not many things happen that are... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I expect it to be hella difficult to pick anything where I'm at 75% that it happens in the next 5 years and Paul is at 25%.  Heck, it's not easy to find things where I'm at over 75% that aren't just obvious slam dunks; the Future isn't that easy to predict.  Let's get up to a nice crawl first, and then maybe a small portfolio of crawlings, before we start trying to make single runs that pierce the sound barrier.

I frame no prediction about whether Paul is under 16%.  That's a separate matter.  I think a little progress is made toward eventual epistemic virtue if you hand me a Metaculus forecast and I'm like "lol wut" and double their probability, even if it turns out that Paul agrees with me about it.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Ha!  Okay then.  My probability is at least 16%, though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more.  Paul?

EDIT:  I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists.  I'll stand by a >16% probabilit... (read more)

6paulfchristiano21hBased on the other thread I now want to revise this prediction, both because 4% was too low and "IMO gold" has a lot of noise in it based on test difficulty. I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.) Maybe I'll go 8% on "gets gold" instead of "solves hardest problem." Would be good to get your updated view on this so that we can treat it as staked out predictions.

I don't care about whether the AI is open-sourced (I don't expect anyone to publish the weights even if they describe their method) and I'm not that worried about our ability to arbitrate overfitting.

Ajeya suggested that I clarify: I'm significantly more impressed by an AI getting a gold medal than getting a bronze, and my 4% probability is for getting a gold in particular (as described in the IMO grand challenge). There are some categories of problems that can be solved using easy automation (I'd guess about 5-10% could be done with no deep learning and m... (read more)

4Matthew Barnett12dIf this task is bad for operationalization reasons, there are other theorem proving benchmarks []. Unfortunately it looks like there aren't a lot of people that are currently trying to improve on the known benchmarks, as far as I'm aware. The code generation benchmarks [] are slightly more active. I'm personally partial to Hendrycks et al.'s APPS benchmark [], which includes problems that "range in difficulty from introductory to collegiate competition level and measure coding and problem-solving ability." (Github link []).
7Matthew Barnett12dIt feels like this bet would look a lot better if it were about something that you predict at well over 50% (with people in Paul's camp still maintaining less than 50%). So, we could perhaps modify the terms such that the bot would only need to surpass a certain rank or percentile-equivalent in the competition (and not necessarily receive the equivalent of a Gold medal). The relevant question is which rank/percentile you think is likely to be attained by 2025 under your model but you predict would be implausible under Paul's model. This may be a daunting task, but one way to get started is to put a probability distribution over what you think the state-of-the-art will look like by 2025, and then compare to Paul's. Edit: Here are, for example, the individual rankings for 2021: []
Christiano, Cotra, and Yudkowsky on AI progress

Mostly, I think the Future is not very predictable in some ways, and this extends to, for example, it being the possible that 2022 is the year where we start Final Descent and by 2024 it's over, because it so happened that although all the warning signs were Very Obvious In Retrospect they were not obvious in antecedent and so stuff just started happening one day.  The places where I dare to extend out small tendrils of prediction are the rare exception to this rule; other times, people go about saying, "Oh, no, it definitely couldn't start in 2022" a... (read more)

I'm mostly not looking for virtue points, I'm looking for: (i) if your view is right then I get some kind of indication of that so that I can take it more seriously, (ii) if your view is wrong then you get some indication feedback to help snap you out of it.

I don't think it's surprising if a GPT-3 sized model can do relatively good translation. If talking about this prediction, and if you aren't happy just predicting numbers for overall value added from machine translation, I'd kind of like to get some concrete examples of mediocre translations or concrete problems with existing NMT that you are predicting can be improved.

Christiano, Cotra, and Yudkowsky on AI progress

If they've found some way to put a lot more compute into GPT-4 without making the model bigger, that's a very different - and unnerving - development.

2RomanS11dOne way they could do that, is by pitting the model against modified versions of itself, like they did in OpenAI Five (for Dota). From the minimizing-X-risk perspective, it might be the worst possible way to train AIs. As Jeff Clune (Uber AI) put it: Additionally, if you train a language model to outsmart millions of increasingly more intelligent copies of itself, you might end up with the perfect AI-box escape artist.

I believe Sam Altman implied they’re simply training a GPT-3-variant for significantly longer for “GPT-4”. The GPT-3 model in prod is nowhere near converged on its training data.

Edit: changed to be less certain, pretty sure this follows from public comments by Sam, but he has not said this exactly

Yudkowsky and Christiano discuss "Takeoff Speeds"

(I'm currently slightly hopeful about the theorem-proving thread, elsewhere and upthread.)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I have a sense that there's a lot of latent potential for theorem-proving to advance if more energy gets thrown at it, in part because current algorithms seem a bit weird to me - that we are waiting on the equivalent of neural MCTS as an enabler for AlphaGo, not just a bigger investment, though of course the key trick could already have been published in any of a thousand papers I haven't read.  I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO chal... (read more)

Yes, IMO challenge falling in 2024 is surprising to me at something like the 1% level or maybe even more extreme (though could also go down if I thought about it a lot or if commenters brought up relevant considerations, e.g. I'd look at IMO problems and gold medal cutoffs and think about what tasks ought to be easy or hard; I'm also happy to make more concrete per-question predictions). I do think that there could be huge amounts of progress from picking the low hanging fruit and scaling up spending by a few orders of magnitude, but I still don't expect i... (read more)

I feel like I "would not be surprised at all" if we get a bunch of shocking headlines in 2023 about theorem-proving problems falling, after which the IMO challenge falls in 2024

Possibly helpful: Metaculus currently puts the chances of the IMO grand challenge falling by 2025 at about 8%. Their median is 2039.

I think this would make a great bet, as it would definitely show that your model can strongly outperform a lot of people (and potentially Paul too). And the operationalization for the bet is already there -- so little work will be needed to do that part.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I kind of want to see you fight this out with Gwern (not least for social reasons, so that people would perhaps see that it wasn't just me, if it wasn't just me).

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."  We want to know when humans land on the moon, not whether their brain sizes continued on a smooth trend extrapolated over the last million years.

I think there's a very real sense in which, yes... (read more)

But it seems to me that the very obvious GPT-5 continuation of Gwern would say, "Gradualists can predict meaningless benchmarks, but they can't predict the jumpy surface phenomena we see in real life."

Don't you think you're making a falsifiable prediction here?

Name something that you consider part of the "jumpy surface phenomena" that will show up substantially before the world ends (that you think Paul doesn't expect). Predict a discontinuity. Operationalize everything and then propose the bet.

Christiano, Cotra, and Yudkowsky on AI progress

I don't necessarily expect GPT-4 to do better on perplexity than would be predicted by a linear model fit to neuron count plus algorithmic progress over time; my guess for why they're not scaling it bigger would be that Stack More Layers just basically stopped scaling in real output quality at the GPT-3 level.  They can afford to scale up an OOM to 1.75 trillion weights, easily, given their funding, so if they're not doing that, an obvious guess is that it's because they're not getting a big win from that.  As for their ability to then make algor... (read more)

While GPT-4 wouldn't be a lot bigger than GPT-3, Sam Altman did indicate that it'd use a lot more compute. That's consistent with Stack More Layers still working; they might just have found an even better use for compute.

(The increased compute-usage also makes me think that a Paul-esque view would allow for GPT-4 to be a lot more impressive than GPT-3, beyond just modest algorithmic improvements.)

Christiano, Cotra, and Yudkowsky on AI progress

My memory of the past is not great in general, but considering that I bet sums of my own money and advised others to do so, I am surprised that my memory here would be that bad, if it was.

Neither GJO nor Metaculus are restricted to only past superforecasters, as I understand it; and my recollection is that superforecasters in particular, not all participants at GJO or Metaculus, were saying in the range of 20%.  Here's an example of one such, which I have a potentially false memory of having maybe read at the time:

3Matthew Barnett12dThanks for clarifying. That makes sense that you may have been referring to a specific subset of forecasters. I do think that some forecasters tend to be much more reliable than others (and maybe there was/is a way to restrict to "superforecasters" in the UI). I will add the following piece of evidence, which I don't think counts much for or against your memory, but which still seems relevant. Metaculus shows a histogram of predictions. On the relevant question [] , a relatively high fraction of people put a 20% chance, but it also looks like over 80% of forecasters put higher credences.
Christiano, Cotra, and Yudkowsky on AI progress

Somebody tries to measure the human brain using instruments that can only detect numbers of neurons and energy expenditure, but not detect any difference of how the fine circuitry is wired; and concludes the human brain is remarkable only in its size and not in its algorithms.  You see the problem here?  The failure of large dinosaurs to quickly scale is a measuring instrument that detects how their algorithms scaled with more compute (namely: poorly), while measuring the number of neurons in a human brain tells you nothing about that at all.

8RomanS12dJeff Hawkins provided a rather interesting argument on the topic: The scaling of the human brain has happened too fast to implement any deep changes in how the circuitry works. The entire scaling process was mostly done by the favorite trick of biological evolution: copy and paste existing units (in this case - cortical columns). Jeff argues that there is no change in the basic algorithm between earlier primates and humans. It's the same reference-frames processing algo distributed across columns. The main difference is, humans have much more columns. I've found his arguments convincing for two reasons: * his neurobiological arguments are surprisingly good (to the point of being surprisingly obvious in hindsight) * It's the same "just add more layers" trick we reinvented in ML Are we sure about the low intelligence of dinosaurs? Judging by the living dinos (e.g. crows), they are able to pack a chimp-like intelligence into a 0.016 kg brain. And some of the dinos have had x60 more of it (e.g. the brain of Tyrannosaurus rex weighted about 1 kg, which is comparable to Homo erectus). And some of the dinos have had a surprisingly large encephalization quotient, combined with bipedalism, gripping hands, forward-facing eyes, omnivorism, nest building, parental care, and living in groups (e.g. troodontids []). Maybe it was not an asteroid after all... (Very unlikely, of course. But I find the idea rather amusing)
Yudkowsky and Christiano discuss "Takeoff Speeds"

I find it valuable to know what impressions other people had themselves; it only becomes tone-policing when you worry loudly about what impressions other people 'might' have.  (If one is worried about how it looks to say so publicly, one could always just DM me (though I might not respond).)

Christiano, Cotra, and Yudkowsky on AI progress

Furthermore 2/3 doom is straightforwardly the wrong thing to infer from the 1:1 betting odds, even taking those at face value and even before taking interest rates into account; Bryan gave me $100 which gets returned as $200 later.

(I do consider this a noteworthy example of 'People seem systematically to make the mistake in the direction that interprets Eliezer's stuff as more weird and extreme' because it's a clear arithmetical error and because I saw a recorded transcript of it apparently passing the notice of several people I considered usually epistemi... (read more)

Yes, Rob is right about the inference coming from the bet and Eliezer is right that the bet was actually 1:1 odds but due to the somewhat unusual bet format I misread it as 2:1 odds.

3Rob Bensinger12dMaybe I'm wrong about her deriving this from the Caplan bet? Ajeya hasn't actually confirmed that, it was just an inference I drew. I'll poke her to double-check.
Christiano, Cotra, and Yudkowsky on AI progress

I feel like the biggest subjective thing is that I don't feel like there is a "core of generality" that GPT-3 is missing

I just expect it to gracefully glide up to a human-level foom-ing intelligence

This is a place where I suspect we have a large difference of underlying models.  What sort of surface-level capabilities do you, Paul, predict that we might get (or should not get) in the next 5 years from Stack More Layers?  Particularly if you have an answer to anything that sounds like it's in the style of Gwern's questions, because I think those a... (read more)

9paulfchristiano11dI agree we seem to have some kind of deeper disagreement here. I think stack more layers + known training strategies (nothing clever) + simple strategies for using test-time compute (nothing clever, nothing that doesn't use the ML as a black box) can get continuous improvements in tasks like reasoning (e.g. theorem-proving), meta-learning (e.g. learning to learn new motor skills), automating R&D (including automating executing ML experiments, or proposing new ML experiments), or basically whatever. I think these won't get to human level in the next 5 years. We'll have crappy versions of all of them. So it seems like we basically have to get quantitative. If you want to talk about something we aren't currently measuring, then that probably takes effort, and so it would probably be good if you picked some capability where you won't just say "the Future is hard to predict." (Though separately I expect to make somewhat better predictions than you in most of these domains.) A plausible example is that I think it's pretty likely that in 5 years, with mere stack more layers + known techniques (nothing clever), you can have a system which is clearly (by your+my judgment) "on track" to improve itself and eventually foom, e.g. that can propose and evaluate improvements to itself, whose ability to evaluate proposals is good enough that it will actually move in the right direction and eventually get better at the process, etc., but that it will just take a long time for it to make progress. I'd guess that it looks a lot like a dumb kid in terms of the kind of stuff it proposes and its bad judgment (but radically more focused on the task and conscientious and wise than any kid would be). Maybe I think that's 10% unconditionally, but much higher given a serious effort. My impression is that you think this is unlikely without adding in some missing secret sauce to GPT, and that my picture is generally quite different from your criticallity-flavored model of takeoff.

If you give me 1 or 10 examples of surface capabilities I'm happy to opine. If you want me to name industries or benchmarks, I'm happy to opine on rates of progress. I don't like the game where you say "Hey, say some stuff. I'm not going to predict anything and I probably won't engage quantitatively with it since I don't think much about benchmarks or economic impacts or anything else that we can even talk about precisely in hindsight for GPT-3."

I don't even know which of Gwern's questions you think are interesting/meaningful. "Good meta-learning"--I don't... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world).

Would you say that this is a good description of Suddenly Hominids but you don't expect that to happen again, or that this is a bad description of hominids?

6paulfchristiano12dIt's not a description of hominids at all, no one spent any money on R&D. I think there are analogies where this would be analogous to hominids (which I think are silly, as we discuss in the next part of this transcript). And there are analogies where this is a bad description of hominids (which I prefer).
Yudkowsky and Christiano discuss "Takeoff Speeds"

Thanks for continuing to try on this!  Without having spent a lot of labor myself on looking into self-driving cars, I think my sheer impression would be that we'll get $1B/yr waifutech before we get AI freedom-of-the-road; though I do note again that current self-driving tech would be more than sufficient for $10B/yr revenue if people built new cities around the AI tech level, so I worry a bit about some restricted use-case of self-driving tech that is basically possible with current tech finding some less regulated niche worth a trivial $10B/yr. &nb... (read more)

4paulfchristiano12dYes, I think that value added by automated translation will follow a similar pattern. Number of words translated is more sensitive to how you count and random nonsense, as is number of "users" which has even more definitional issues. You can state a prediction about self-driving cars in any way you want. The obvious thing is to talk about programs similar to the existing self-driving taxi pilots (e.g. Waymo One) and ask when they do $X of revenue per year, or when $X of self-driving trucking is done per year. (I don't know what AI freedom-of-the-road means, do you mean something significantly more ambitious than self-driving trucks or taxis?)
Yudkowsky and Christiano discuss "Takeoff Speeds"

Once you can buy a self-driving car, the thing that Paul predicts with surety and that I shrug about has already happened. If it does happen, my model says very little about remaining timeline from there one way or another. It shrugs again and says, "Guess that's how difficult the AI problem and regulatory problem were."

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think you are underconfident about the fact that almost all AI profits will come from areas that had almost-as-much profit in recent years. So we could bet about where AI profits are in the near term, or try to generalize this.

I wouldn't be especially surprised by waifutechnology or machine translation jumping to newly accessible domains (the thing I care about and you shrug about (until the world ends)), but is that likely to exhibit a visible economic discontinuity in profits (which you care about and I shrug about (until the world ends))?  There'... (read more)

7paulfchristiano12dMan, the problem is that you say the "jump to newly accessible domains" will be the thing that lets you take over the world. So what's up for dispute is the prototype being enough to take over the world rather than years of progress by a giant lab on top of the prototype. It doesn't help if you say "I expect new things to sometimes become possible" if you don't further say something about the impact of the very early versions of the product. If e.g. people were spending $1B/year developing a technology, and then after a while it jumps from 0/year to $1B/year of profit, I'm not that surprised. (Note that machine translation is radically smaller than this, I don't know the numbers.) I do suspect they could have rolled out a crappy version earlier, perhaps by significantly changing their project. But why would they necessarily bother doing that? For me this isn't violating any of the principles that make your stories sound so crazy. The crazy part is someone spending $1B and then generating $100B/year in revenue (much less $100M and then taking over the world). (Note: it is surprising if an industry is spending $10T/year on R&D and then jumps from $1T --> $10T of revenue in one year in a world that isn't yet growing crazily. The surprising depends a lot on the numbers involved, and in particular on how valuable it would have been to deploy a worse version earlier and how hard it is to raise money at different scales.)

I'd be happy to disagree about romantic chatbots or machine translation. I'd have to look into it more to get a detailed sense in either, but I can guess. I'm not sure what "wouldn't be especially surprised" means, I think to actually get disagreements we need way more resolution than that so one question is whether you are willing to play ball (since presumably you'd also have to looking into to get a more detailed sense). Maybe we could save labor if people would point out the empirical facts we're missing and we can revise in light of that, but we'd sti... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

And to say it also explicitly, I think this is part of why I have trouble betting with Paul.  I have a lot of ? marks on the questions that the Gwern voice is asking above, regarding them as potentially important breaks from trend that just get dumped into my generalized inbox one day.  If a gradualist thinks that there ought to be a smooth graph of perplexity with respect to computing power spent, in the future, that's something I don't care very much about except insofar as it relates in any known way whatsoever to questions like those the Gwer... (read more)

This seems totally bogus to me.

It feels to me like you mostly don't have views about the actual impact of AI as measured by jobs that it does or the $s people pay for them, or performance on any benchmarks that we are currently measuring, while I'm saying I'm totally happy to use gradualist metrics to predict any of those things. If you want to say "what does it mean to be a gradualist" I can just give you predictions on them. 

To you this seems reasonable, because e.g. $ and benchmarks are not the right way to measure the kinds of impacts we care abou... (read more)

What does it even mean to be a gradualist about any of the important questions like those of the Gwern-voice, when they don't relate in known ways to the trend lines that are smooth?

Perplexity is one general “intrinsic” measure of language models, but there are many task-specific measures too. Studying the relationship between perplexity and task-specific measures is an important part of the research process. We shouldn’t speak as if people do not actively try to uncover these relationships.

I would generally be surprised if there were many highly non-li... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I predict that people will explicitly collect much larger datasets of human behavior as the economic stakes rise. This is in contrast to e.g. theorem-proving working well, although I think that theorem-proving may end up being an important bellwether because it allows you to assess the capabilities of large models without multi-billion-dollar investments in training infrastructure.

Well, it sounds like I might be more bullish than you on theorem-proving, possibly.  Not on it being useful or profitable, but in terms of underlying technology making progr... (read more)

I'm going to make predictions by drawing straight-ish lines through metrics like the ones in the gpt-f paper. Big unknowns are then (i) how many orders of magnitude of "low-hanging fruit" are there before theorem-proving even catches up to the rest of NLP? (ii) how hard their benchmarks are compared to other tasks we care about. On (i) my guess is maybe 2? On (ii) my guess is "they are pretty easy" / "humans are pretty bad at these tasks," but it's somewhat harder to quantify. If you think your methodology is different from that then we will probably end u... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I feel a bit confused about where you think we meta-disagree here, meta-policy-wise.  If you have a thesis about the sort of things I'm liable to disagree with you about, because you think you're more familiar with the facts on the ground, can't you write up Paul's View of the Next Five Years and then if I disagree with it better yet, but if not, you still get to be right and collect Bayes points for the Next Five Years?

I mean, it feels to me like this should be a case similar to where, for example, I think I know more about macroeconomics than your t... (read more)

I think you think there's a particular thing I said which implies that the ball should be in my court to already know a topic where I make a different prediction from what you do.

I've said I'm happy to bet about anything, and listed some particular questions I'd bet about where I expect you to be wronger. If you had issued the same challenge to me, I would have picked one of the things and we would have already made some bets. So that's why I feel like the ball is in your court to say what things you're willing to make forecasts about.

That said, I don't kn... (read more)

Inevitably, you can go back afterwards and claim it wasn't really a surprise in terms of the abstractions that seem so clear and obvious now, but I think it was surprised then

It seems like you are saying that there is some measure that was continuous all along, but that it's not obvious in advance which measure was continuous. That seems to suggest that there are a bunch of plausible measures you could suggest in advance, and lots of interesting action will be from changes that are discontinuous changes on some of those measures. Is that right?

If so, don't... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I wish to acknowledge this frustration, and state generally that I think Paul Christiano occupies a distinct and more clueful class than a lot of, like, early EAs who mm-hmmmed along with Robin Hanson on AI - I wouldn't put, eg, Dario Amodei in that class either, though we disagree about other things.

But again, Paul, it's not enough to say that you weren't surprised by GPT-2/3 in retrospect, it kinda is important to say it in advance, ideally where other people can see?  Dario picks up some credit for GPT-2/3 because he clearly called it in advance. &... (read more)

Suppose your view is "crazy stuff happens all the time" and my view is "crazy stuff happens rarely." (Of course "crazy" is my word, to you it's just normal stuff.) Then what am I supposed to do, in your game?

More broadly: if you aren't making bold predictions about the future, why do you think that other people will? (My predictions all feel boring to me.) And if you do have bold predictions, can we talk about some of them instead?

It seems to me like I want you to say "well I think 20% chance something crazy happens here" and I say "nah, that's more like 5... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I do wish to note that we spent a fair amount of time on Discord trying to nail down what earlier points we might disagree on, before the world started to end, and these Discord logs should be going up later.

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of us have ever heard of superforecasting and both of us are liable to predict near-term initial seg... (read more)

8Jotto99912dI disagree that this is a meaningful forecasting track record. Massive degrees of freedom, and the mentioned events seem unresolvable, and it's highly ambiguous how these things particularly prove the degree of error unless they were properly disambiguated in advance. Log score or it didn't happen. (Slightly edited to try and sound less snarky)

From my perspective, the basic problem is that Eliezer's story looks a lot like "business as usual until the world starts to end sharply", and Paul's story looks like "things continue smoothly until their smooth growth ends the world smoothly", and both of have ever heard of superforecasting and both of us are liable to predict near-term initial segments by extrapolating straight lines while those are available.

I agree that it's plausible that we both make the same predictions about the near future. I think we probably don't, and there are plenty of disagr... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

That was a pretty good Eliezer model; for a second I was trying to remember if and where I'd said that.

Yudkowsky and Christiano discuss "Takeoff Speeds"

The "weirdly uncharitable" part is saying that it "seemed like" I hadn't read it vs. asking.  Uncertainty is one thing, leaping to the wrong guess another.

Yudkowsky and Christiano discuss "Takeoff Speeds"

I read "Takeoff Speeds" at the time.  I did not liveblog my reaction to it at the time.  I've read the first two other items.

I flag your weirdly uncharitable inference.

FWIW, I did not find this weirdly uncharitable, only mildly uncharitable. I have extremely wide error bars on what you have and have not read, and "Eliezer has not read any of the things on that list" was within those error bars. It is really quite difficult to guess your epistemic state w.r.t. specific work when you haven't been writing about it for a while.

(Though I guess you might have been writing about it on Twitter? I have no idea, I generally do not use Twitter myself, so I might have just completely missed anything there.)


I apologize, I shouldn't have leapt to that conclusion.

Ngo and Yudkowsky on alignment difficulty

My reply to your distinction between 'consequentialists' and 'outcome pumps' would be, "Please forget entirely about any such thing as a 'consequentialist' as you defined it; I would now like to talk entirely about powerful outcome pumps.  All understanding begins there, and we should only introduce the notion of how outcomes are pumped later in the game.  Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine."

(Modulo that lots... (read more)

5Ramana Kumar12dA couple of direct questions I'm stuck on: * Do you agree that Flint's optimizing systems are a good model (or even definition) of outcome pumps? * Are black holes and fires reasonable examples of outcome pumps? I'm asking these to understand the work better. Currently my answers are: * Yes. Flint's notion is one I came to independently when thinking about "goal-directedness". It could be missing some details, but I find it hard to snap out of the framework entirely. * Yes. But maybe not the most informative examples. They're highly non-retargetable.
2Daniel Kokotajlo12dI don't know the relevant history of science, but I wouldn't be surprised if something like the opposite was true: Our modern, very useful understanding of work is an abstraction that grew out of many people thinking concretely about various engines. Thinking about engines was like the homework exercises that helped people to reach and understand the concept of work. Similarly, perhaps it is pedagogically (and conceptually) helpful to begin with the notion of a consequentialist and then generalize to outcome pumps.
Ngo and Yudkowsky on AI capability gains

I think some of your confusion may be that you're putting "probability theory" and "Newtonian gravity" into the same bucket.  You've been raised to believe that powerful theories ought to meet certain standards, like successful bold advance experimental predictions, such as Newtonian gravity made about the existence of Neptune (quite a while after the theory was first put forth, though).  "Probability theory" also sounds like a powerful theory, and the people around you believe it, so you think you ought to be able to produce a powerful advance p... (read more)

it seems to me that you want properly to be asking "How do we know this empirical thing ends up looking like it's close to the abstraction?" and not "Can you show me that this abstraction is a very powerful one?"

I agree that "powerful" is probably not the best term here, so I'll stop using it going forward (note, though, that I didn't use it in my previous comment, which I endorse more than my claims in the original debate).

But before I ask "How do we know this empirical thing ends up looking like it's close to the abstraction?", I need to ask "Does the ab... (read more)

4adamShimi17dThat's a really helpful comment (at least for me)! I'm guessing that a lot of the hidden work here and in the next steps would come from asking stuff like: * so I need to alter the bucket for each new idea, or does it instead fit in its current form each time? * does the mental act of finding that an idea fit into the bucket removes some confusion and clarifies, or is it just a mysterious answer [] ? * Does the bucket become more simple and more elegant with each new idea that fit in it? Is there some truth in this, or am I completely off the mark? You obviously can do whatever you want, but I find myself confused at this idea being discarded. Like, it sounds exactly like the antidote to so much confusion around these discussions and your position, such that if that was clarified, more people could contribute helpfully to the discussion, and either come to your side or point out non-trivial issues with your perspective. Which sounds really valuable for both you and the field! So I'm left wondering: * Do you disagree with my impression of the value of such a subsequence? * Do you think it would have this value but are spending your time doing something more valuable? * Do you think it would be valuable but really don't want to write it? * Do you think it would be valuable, you could in principle write it, but probably no one would get it even if you did? * Something else I'm failing to imagine? Once again, you do what you want, but I feel like this would be super valuable if there was anyway of making that possible. That's also completely relevant to my own focus on the different epistemic strategies [] used in alignment research, especially because we don't have access to empirical evidence or trial and error at all for AGI-type problems. (I'm also quite curious if you think
Load More