All of Daniel Kokotajlo's Comments + Replies

Daniel Kokotajlo's Shortform

Right. So, what do you think about the AI-timelines-related claim then? Will we need medium or long-horizon training for a number of episodes within an OOM or three of parameter count to get something x-risky?

ETA: To put it more provocatively: If EfficientZero can beat humans at Atari using less game experience, starting from a completely blank slate whereas humans have decades of pre-training, then shouldn't a human-brain-sized EfficientZero beat humans at any intellectual task, given decades of experience at those tasks + decades of pre-training similar to human pre-training?

Conor Sullivan: Can EfficientZero beat Montezuma's Revenge?
gwern: I have no good argument that a human-sized EfficientZero would somehow need to be much slower than humans. Arguing otherwise sounds suspiciously like moving the goalposts after an AI effect: "look how stupid DL agents are, they need tons of data to few-shot stuff like challenging text tasks or image classifications, and they need OOMs more data on even something as simple as ALE games! So inefficient! So un-human-like! This should deeply concern any naive DL enthusiast, that the archs are so bad & inefficient." [later] "Oh no. Well... 'the curves cross', you know, this merely shows that DL agents can get good performance on uninteresting tasks, but human brains will surely continue showing their tremendous sample-efficiency in any real problem domain, no matter how you scale your little toys."

As I've said before, I continue to ask myself what it is that the human brain does with all the resources it uses, particularly given the estimates that put it at like 7 OOMs more than models like GPT-3, or other wackily high FLOPS-equivalences. It does not seem like those models do '0.0000001% of human performance', in some sense.
Discussion with Eliezer Yudkowsky on AGI interventions

It's not consensus. Ajeya, Richard, Paul, and Rohin are prominent examples of people widely considered to have expertise on this topic who think it's not true. (I think they'd say something more like 10% chance? IDK)

Daniel Kokotajlo's Shortform

I used to think that current AI methods just aren't nearly as sample/data-efficient as humans. For example, GPT-3 had to read 300B tokens of text whereas humans encounter 2-3 OOMs less, various game-playing AIs had to play hundreds of years' worth of games to get gud, etc.

Plus various people with 20-40 year AI timelines seem to think it's plausible -- in fact, probable -- that unless we get radically new and better architectures, this will continue for decades, meaning that we'll get AGI only when we can actually train AIs on medium or long-horizon ta... (read more)

gwern: The 'poverty of stimulus' argument proves too much, and is just a rehash of the problem of induction, IMO. Everything that humans learn is ill-posed/underdetermined/vulnerable to skeptical arguments and problems like Duhem-Quine or the grue paradox. There's nothing special about language. And so - it all adds up to normality - since we solve those other inferential problems, why shouldn't we solve language equally easily and for the same reasons? If we are not surprised that lasso can fit a good linear model by having an informative prior about coefficients being sparse/simple, we shouldn't be surprised if human children can learn a language without seeing an infinity of every possible instance of a language or if a deep neural net can do similar things.
Christiano, Cotra, and Yudkowsky on AI progress

Is that one dense or sparse/MoE? How many data points was it trained for? Does it set SOTA on anything? (I'm skeptical; I'm wondering if they only trained it for a tiny amount, for example.)

EfficientZero: How It Works

Thank you so much for writing this! Strong-upvoted.

Yudkowsky and Christiano discuss "Takeoff Speeds"

That's helpful, thanks!

To be clear, I think that if EY put more effort into it (and perhaps had some help from other people as RAs) he could write a book or sequence rebutting Paul & Katja much more thoroughly and convincingly than this post did. [ETA: I.e. I'm much more on Team Yud than Team Paul here.] The stuff said here felt like a rehashing of stuff from IEM and the Hanson-Yudkowsky AI foom debate to me. [ETA: Lots of these points were good! Just not surprising to me, and not presented as succinctly and compellingly (to an audience of me) as they... (read more)

This is my take: if I had been very epistemically self-aware, and carefully distinguished my own impressions/models from my all-things-considered beliefs, before I started reading, then this would've updated my models towards Eliezer (because hey, I heard new not-entirely-uncompelling arguments) but my all-things-considered beliefs away from Eliezer (because I would have expected it to be even more convincing).

I'm not that surprised by the survey results. Most people don't obey conservation of expected evidence, because they don't take into account arguments... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

I think I was expecting somewhat better from EY; I was expecting more solid, well-explained arguments/rebuttals to Paul's points from "Takeoff Speeds." Also EY seemed to be angry and uncharitable, as opposed to calm and rational. I was imagining an audience that mostly already agrees with Paul encountering this and being like "Yeah this confirms what we already thought."

FWIW "yeah this confirms what we already thought" makes no sense to me. I heard someone say this the other day, and I was a bit floored. Who knew that Eliezer would respond with a long list of examples that didn't look like continuous progress at the time, and said so more than 3 days ago?

I feel like I got a much better sense of Eliezer's perspective reading this. One key element is whether AI progress is surprising, which it often is even if you can make trend-line arguments after-the-fact, people basically don't, and when they do they often get i... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

My prediction was mainly about polarization rather than direction, but I would have expected the median or average to not move much probably, and to be slightly more likely to move towards Paul than towards Yudkowsky. I think. I don't think I was very surprised.

Why would it move toward Paul? He made almost no arguments, and Eliezer made lots. When Paul entered the chat it was focused on describing what each of them believe in order to find a bet, not communicating why they believe it.

Ngo and Yudkowsky on alignment difficulty
"Understand the work before understanding the engines; nearly every key concept here is implicit in the notion of work rather than in the notion of a particular kind of engine."

I don't know the relevant history of science, but I wouldn't be surprised if something like the opposite was true: Our modern, very useful understanding of work is an abstraction that grew out of many people thinking concretely about various engines. Thinking about engines was like the homework exercises that helped people to reach and understand the concept of work.

Similarly, perhaps it is pedagogically (and conceptually) helpful to begin with the notion of a consequentialist and then generalize to outcome pumps.

RS: Were you surprised by the direction of the change or the amount?

I wonder what effect there is from selecting for reading the third post in a sequence of MIRI conversations from start to end and also looking at the comments and clicking links in them.

Coordinating the Unequal Treaties

Huh, that's interesting & good to know. Seems that Most Favored Nation is very much still a thing today:

Does it perhaps have an advantage for the Japanese, namely that the four powers will be less motivated to demand concessions because said concessions would also go to their rivals?

lsusr: That's a good question. I think the answer is "no" because each Western power had lots of rivals. The Cold War was a different story. In the Cold War, there were (in theory) only two opposing sides. The USA would fund basically anyone who opposed the USSR (and vice versa).
Discussion with Eliezer Yudkowsky on AGI interventions

I don't think they'd even need to be raised to think that; they'd figure it out on their own. Unfortunately we don't have enough time.

johnlawrenceaspden: So, is it now the consensus opinion round here that we're all dead in less than twenty years? (Sounds about right to me, but I've always been a pessimist...)

we don't have enough time

Setting aside this proposal's, ah, logistical difficulties, I certainly don't think we should ignore interventions that target only the (say) 10% of the probability space in which superintelligence takes longest to appear.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Hot damn, where can I see these preliminary results?

The results were presented at a workshop by the project organizers. The video from the workshop is available here (the most relevant presentation starts at 5:05:00).

It's one of those innocent presentations that, after you understand the implications, keep you awake at night. 

Yudkowsky and Christiano discuss "Takeoff Speeds"

Sorry! I'll go back and insert links + reference your comment.

What exactly is GPT-3's base objective?

Ahhh, OK. Then perhaps I just was using inappropriate words; it sounds like what I meant to refer to by 4 was the same as what you meant to refer to by 3.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Fair enough! I too dislike premature meta, and feel bad that I engaged in it. However... I do still feel like my comment probably did more to prevent polarization than cause it? That's my independent impression at any rate. (For the reasons you mention).

I certainly don't want to give up! In light of your pushback I'll edit to add something at the top.

Yudkowsky and Christiano discuss "Takeoff Speeds"

Yes, though I'm much more comfortable explaining and arguing for my own position than EY's. It's just that my position turns out to be pretty similar. (Partly this is independent convergence, but of course partly this is causal influence since I've read a lot of his stuff.)

There's a lot to talk about, I'm not sure where to begin, and also a proper response would be a whole research project in itself. Fortunately I've already written a bunch of it; see these two sequences.

Here are some quick high-level thoughts:

1. Begin with timelines. The best way to forec... (read more)

The core part of Ajeya's model is a probability distribution over how many OOMs of compute we'd need with today's ideas to get to TAI / AGI / APS-AI / AI-PONR / etc.

I didn't know the last two acronyms despite reading a decent amount of this literature, so thought I'd leave this note for other readers. Listing all of them for completeness (readers will of course know the first two):

TAI: transformative AI

AGI: artificial general intelligence

APS-AI: Advanced, Planning, Strategically aware AI [1]

AI-PONR: AI point of no return [2]

[1] from Carlsmith, which Dan... (read more)
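The "probability distribution over OOMs" framing in the parent comment can be turned into a toy calculation (a purely illustrative sketch: the distribution and the effective-compute growth rate below are made-up numbers, not Ajeya's actual estimates):

```python
# Toy bio-anchors-style model: P(enough effective compute for TAI by a given year).
# Every number here is an illustrative assumption, not Ajeya's actual estimate.
ooms_needed_dist = {2: 0.1, 4: 0.2, 6: 0.3, 8: 0.2, 10: 0.2}  # P(need +X OOMs over 2020)
OOMS_PER_YEAR = 0.5  # assumed growth of effective training compute (hardware + algorithms + spending)

def p_tai_by(year, base_year=2020):
    """Probability that the required OOMs of compute are available by `year`."""
    available = OOMS_PER_YEAR * (year - base_year)
    return sum(p for ooms, p in ooms_needed_dist.items() if ooms <= available)

print(round(p_tai_by(2030), 2))  # → 0.3  (+5 OOMs available under these assumptions)
print(round(p_tai_by(2040), 2))  # → 1.0
```

The model's output is only as good as the distribution you feed it; the point is just that once you commit to a distribution over required OOMs and a compute trajectory, a timeline falls out mechanically.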

I feel like the debate between EY and Paul (and the broader debate about fast vs. slow takeoff) has been frustratingly much reference class tennis and frustratingly little gears-level modelling.

So, there's this inherent problem with deep gearsy models, where you have to convey a bunch of upstream gears (and the evidence supporting them) before talking about the downstream questions of interest, because if you work backwards then peoples' brains run out of stack space and they lose track of the whole multi-step path. But if you just go explaining upstream g... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

[ETA: In light of pushback from Rob: I really don't want this to become a self-fulfilling prophecy. My hope in making this post was to make the prediction less likely to come true, not more! I'm glad that MIRI & Eliezer are publicly engaging with the rest of the community more again, I want that to continue, and I want to do my part to help everybody to understand each other.]

And I know, before anyone bothers to say, that all of this reply is not written in the calm way that is right and proper for such arguments. I am tired. I have lost a lot of hope.
... (read more)

I grimly predict that the effect of this dialogue on the community will be polarization

Beware of self-fulfilling prophecies (and other premature meta)! If both sides in a dispute expect the other side to just entrench, then they're less likely to invest the effort to try to bridge the gap.

This very comment section is one of the main things that will determine the community's reaction, and diverting our focus to 'what will our reaction be?' before we've talked about the object-level claims can prematurely lock in a certain reaction.

(That said, I think you'r... (read more)

adamShimi: Strongly agree with that. Since you agree with Yudkowsky, do you think you could steelman his position?

While this may not be the ideal format for it, I thought Eliezer’s voicing of despair was a useful update to publish to the LW community about the current state of his AI beliefs.

Ngo and Yudkowsky on alignment difficulty

For (a): Deception is a convergent instrumental goal; you get it “for free” when you succeed in making an effective system, in the sense that the simplest, most-likely-to-be-randomly-generated effective systems are deceptive. Corrigibility by contrast is complex and involves making various nuanced decisions between good and bad sorts of influence on human behavior.

For (b): If you take an effective system and modify it to be corrigible, this will tend to make it less effective. By contrast, deceptiveness (insofar as it arises “naturally” as a byproduct of p... (read more)

rohinmshah: Yeah, that's right. Adapted to the language here, it would be:
1. Why would we have a "full and complete" outcome pump, rather than domain-specific outcome pumps that primarily use plans using actions from a certain domain rather than "all possible actions", and
2. Why are the outcomes being pumped incompatible with human survival?
Discussion with Eliezer Yudkowsky on AGI interventions

EY knows more neuroscience than me (I know very little) but here's a 5-min brainstorm of ideas:

--For a fixed compute budget, spend more of it on neurons associated with higher-level thought (the neocortex?) and less of it on neurons associated with e.g. motor control or vision.

--Assuming we are an upload of some sort rather than a physical brain, tinker with the rules a bit so that e.g. neuron waste products get magically deleted instead of having to be pumped out, neurons never run out of energy/oxygen and need to rest, etc. Study situations where you are... (read more)

LCDT, A Myopic Decision Theory
Myopia is the property of a system to not plan ahead, to not think too far about the consequences of its actions, and to do the obvious best thing in the moment instead of biding its time.

This seems inconsistent with how you later use the term. Don't you nowadays say that we could have a myopic imitator of HCH, or even a myopic Evan-imitator? But such a system would need to think about the long-term consequences of its actions in order to imitate HCH or Evan, since HCH / Evan would be thinking about those things.

adamShimi: Yeah, that's a subtle point. Here we're stressing the difference between the simulator's action and the simulation's (HCH or Evan in your example) action. Obviously, if the simulation is non-myopic, then the simulation's action will depend on the long-term consequences of this action (for the goals of the simulation). But the simulator itself only cares about answering the question "what would the simulation do next?". Once again, that might mean that the simulator will think about the long-term consequences of the simulation's action on the simulation's goals, but the simulator doesn't have this goal: such reasoning is completely instrumental to its task of simulation. And more generally, the simulator isn't choosing its next action to make it easier to predict the future actions (like a predict-o-matic would do).

That might sound like nitpicking, but it means something important: the simulator itself has no reason to be deceptive. It might output actions (as its best guess of what the simulation would do) that are deceptive, but only if the simulation itself is deceptive. What does that give us?

* If we manage to point the simulation at something that is non-deceptive yet powerful, the myopic simulator will not introduce deception into the mix. Whereas doing IRL on the simulation and then optimizing for the reward would probably lead to goodhart and deception because of mesa-optimizers.
* Here Evan would probably say that HCH sounds like the right non-deceptive simulation; I'm less convinced that HCH will not be deceptive.
* An obvious question is to ask why not do imitation learning? Well, I expect (and I believe Evan expects too) that simulation is strictly more powerful than imitation, because it can make models of non-observed or ideal processes that we point out to.
* If instead of having a single simulation
What exactly is GPT-3's base objective?

Why do you choose answer 3 instead of answer 4? In some sense answer 3 is the random weights that the developers intended, but answer 4 is what actually happened.

Stella Biderman: I think that 4 confuses what people mean when they talk about "the GPT-3 training data." If someone said "there are strings of words found in the GPT-3 training data that GPT-3 never saw," I would tell them that they don't know what the words in that sentence mean. When an AI researcher speaks of "the GPT-3 training data" they are talking about the data that GPT-3 actually saw. There's data that OpenAI collected which GPT-3 didn't see, but that's not what the words "the GPT-3 training data" refer to.
Ngo and Yudkowsky on alignment difficulty

To be clear I think I agree with your overall position. I just don't think the argument you gave for it (about bureaucracies etc.) was compelling.

Ngo and Yudkowsky on alignment difficulty

[Notes mostly to myself, not important, feel free to skip]

My hot take overall is that Yudkowsky is basically right but doing a poor job of arguing for the position. Ngo is very patient and understanding.

"it doesn't seem implausible to me that we build AIs that are significantly more intelligent (in the sense of being able to understand the world) than humans, but significantly less agentic." --Ngo

"It is likely that, before the point where AGIs are strongly superhuman at seeking power, they will already be strongly superhuman at understanding the world, and... (read more)

The idea is not that humans are perfect consquentialists, but that they are able to work at all to produce future-steering outputs, insofar as humans actually do work at all, by an inner overlap of the shape of inner parts which has a shape resembling consequentialism, and the resemblance is what does the work.  That is, your objection has the same flavor as "But humans aren't Bayesian!  So how can you say that updating on evidence is what's doing their work of mapmaking?"

Charlie Steiner: Perhaps... too patient and understanding. Richard! Blink twice if you're being held against your will! (I too would like you to write more about agency :P)
Attempted Gears Analysis of AGI Intervention Discussion With Eliezer

There are fates worse than 1. Fortunately they aren't particularly likely, but they are scary nonetheless.

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Ah right, thanks!

How well do you think it would generalize? Like, say we made it 1000x bigger and trained it on 100x more training data, but instead of 1 game for 100x longer it was 100 games? Would it be able to do all the games? Would it be better or worse than models specialized to particular games, of similar size and architecture and training data length?

EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised
They train for 220k steps for each agent and mention that 100k steps takes 7 hours on 4 GPUs (no mention of which GPUs, but maybe RTX3090 would be a good guess?)

Holy cow, am I reading that right? RTX3090 costs, like, $2000. So they were able to train this whole thing for about one day's worth of effort using equipment that cost less than $10K in total? That means there's loads of room to scale this up... It means that they could (say) train a version of this architecture with 1000x more parameters and 100x more training data for about $10M and 100 days. Right?

Razied: You're missing a factor for the number of agents trained (one for each Atari game), so in fact this should correspond to about one month of training for the whole game library. More if you want to run each game with multiple random seeds to get good statistics, as you would if you're publishing a paper. But yeah, for a single task like protein folding or some other crucial RL task that only runs once, this could easily be scaled up a lot with GPT-3-scale money.
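The back-of-envelope above can be made explicit (a rough sketch; the $1/GPU-hour effective rate and the "compute scales as parameters × data" assumption are mine, not from the paper):

```python
# Rough scaling estimate for the EfficientZero training run.
GPUS = 4                   # GPUs per agent, from the paper
HOURS_PER_100K_STEPS = 7   # reported wall-clock time on those 4 GPUs
TOTAL_STEPS = 220_000      # training steps per agent

base_gpu_hours = GPUS * HOURS_PER_100K_STEPS * (TOTAL_STEPS / 100_000)
print(f"Base run: ~{base_gpu_hours:.0f} GPU-hours per agent")  # ~62 GPU-hours

# Naive scale-up: 1000x parameters and 100x data -> ~100,000x compute,
# assuming compute scales roughly as parameters * data seen.
scaled_gpu_hours = base_gpu_hours * 1000 * 100
cost_usd = scaled_gpu_hours * 1.0  # assume ~$1/GPU-hour (amortized hardware + power)
days_on_2500_gpus = scaled_gpu_hours / (2500 * 24)
print(f"Scaled run: ~${cost_usd / 1e6:.1f}M, ~{days_on_2500_gpus:.0f} days on 2500 GPUs")
```

Under these assumptions the scaled run comes out around $6M and ~100 days, the same ballpark as the ~$10M / 100-day figure in the comment above (before Razied's per-game multiplier).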
What would we do if alignment were futile?

When I look at the world today, it really doesn't seem like a ship steered by evolution. (Instead it is a ship steered by no one, chaotically drifting.) Maybe if there is economic and technological stagnation for ten thousand years, then evolution will get back in the driver's seat and continue the long slow process of aligning humans... but I think that's very much not the most probable outcome.

Comments on Carlsmith's “Is power-seeking AI an existential risk?”

Thanks for putting this stuff online!

FWIW I agree with Nate (and my opinions were largely independent, having read the report and written a response before seeing this). Happy to discuss with anyone interested.

Why I'm excited about Redwood Research's current project

Nice. I'm tentatively excited about this... are there any backfire risks? My impression was that the AI governance people didn't know what to push for because of massive strategic uncertainty. But this seems like a good candidate for something they can do that is pretty likely to be non-negative? Maybe the idea is that if we think more we'll find even better interventions and political capital should be conserved until then?

Why I'm excited about Redwood Research's current project

This is helpful, thanks!

In my ideal world those labs would have large “adversarial evaluation departments” that try extremely hard to find inputs (or random seeds, or “pseudo” inputs) where a powerful model attempts to deliberately cause harm, or do anything that even vaguely smells like causing harm or deliberately undermining safety measures, or trying to deceptively hide their capabilities, or etc. ... This won’t be enough on its own to be confident that models don’t do anything bad, and ideally this would be just one piece of a machine that created muc
... (read more)

I think it's pretty realistic to have large-ish (say 20+ FTE at leading labs?) adversarial evaluation teams within 10 years, and much larger seems possible if it actually looks useful. Part of why it's unrealistic is just that this is a kind of random and specific story and it would more likely be mixed in a complicated way with other roles etc.

If AI is exciting as you are forecasting then it's pretty likely that labs are receptive to building those teams and hiring a lot of people, so the main question is whether safety-concerned people do a good enough j... (read more)

What exactly is GPT-3's base objective?

Yes. I have the intuition that training stories will make this problem worse. But I don't think my intuition on this matter is trustworthy (what experience do I have to base it on?) so don't worry about it. We'll try it and see what happens.

(to explain the intuition a little bit: With inner/outer alignment, any would-be AGI creator will have to face up to the fact that they haven't solved outer alignment, because it'll be easy for a philosopher to find differences between the base objective they've programmed and True Human Values. With training stories, I expect lots of people to be saying more sophisticated versions of "It just does what I meant it to do, no funny business.")

What if memes are common in highly capable minds?

I don't understand, can you elaborate / unpack that?

M. Y. Zuo: A stag hunt is a game theory term for a pattern of coordination that commonly emerges in multi-party interactions. AIs have coordination problems with other AIs and with humans. AGIs exponentially more so, as well discussed on LW. In attempting to compete and solve such coordination problems, memes will almost certainly be utilized, in both AI-AI and AI-human interaction. The dynamics will induce memetic evolution.
What exactly is GPT-3's base objective?

I was wondering if that was the case, haha. Thanks!

This is unfortunate, no? The AI safety community had this whole thing going with mesa-optimization and whatnot... now you propose to abandon the terminology and shift to this new frame? But what about all the people using the old terminology? Is the old terminology unsalvageable?

I do like your new thing and it seems better to me in some ways, but worse in others. I feel like I expect a failure mode where people exploit ambiguity and norm-laden concepts to convince themselves of happy fairy tales. I should ... (read more)

adamShimi: Just wanted to point out that this is already something we need to worry about all the time in alignment. Calling them training stories doesn't create such a failure mode; it makes them obvious to people like you and me who are wary of narrative explanations in science.

This is unfortunate, no? The AI safety community had this whole thing going with mesa-optimization and whatnot... now you propose to abandon the terminology and shift to this new frame? But what about all the people using the old terminology? Is the old terminology unsalvageable?

To be clear, that's definitely not what I'm arguing. I continue to think that the Risks from Learned Optimization terminology is really good, for the specific case that it's talking about. The problem is just that it's not general enough to handle all possible ways of training a... (read more)

Persuasion Tools: AI takeover without AGI or agency?

Thanks for this! Re: it's not really about AI, it's about memetics & ideologies: Yep, totally agree. (The OP puts the emphasis on the memetic ecosystem & thinks of persuasion tools as a change in the fitness landscape. Also, I wrote this story a while back.) What follows is a point-by-point response:

The most attractive values given a new technological/social situation are likely to be similar to those given the immediately preceding situation, so I'd generally expect the most attractive values to generally be endemic anyway or close enough to endem
... (read more)
Persuasion Tools: AI takeover without AGI or agency?

To elaborate on this idea a bit more:

If a very persuasive agent AGI were to take over the world by persuading humans to do its bidding (e.g. maximize paperclips), this would count as an AI takeover scenario. The boots on the ground, the "muscle," would be human. And the brains behind the steering wheels and control panels would be human. And even the brains behind the tech R&D, the financial management, etc. -- even they would be human! The world would look very human and it would look like it was just one group of humans conquering the others. Yet it ... (read more)

Cortés, Pizarro, and Afonso as Precedents for Takeover

I forgot to give an update: Now I have read a handful of real history books on the subject, and I think the original post still stands strong.

Rob B's Shortform Feed

I don't think so? It's possible that it did and I forgot.

Speaking of Stag Hunts

It's only a yellow flag if you are spending the money. If you are uninvolved and e.g. the Lightcone team is running the show, then it's fine.

(But I have no problem with you doing it either)

Speaking of Stag Hunts
Hire a team of well-paid moderators for a three-month high-effort experiment of responding to every bad comment with a fixed version of what a good comment making the same point would have looked like.  Flood the site with training data.

What's so terrible about this idea? I imagine the main way it could go wrong is not being able to find enough people willing to do it / accidentally having too low a bar and being overwhelmed by moderators who don't know what they are doing and promote the wrong norms. But I feel like there are probably enough people o... (read more)

Duncan_Sabien: On reflection, it's of a slightly different character than other items on the list. (Each item on the list is "terrible" for somewhat different reasons/has a somewhat different failure mode.) For that one, the main reason I felt I should disclaim it is "here's the part where I try to spend tens of thousands of someone else's money," and it feels like that should be something of a yellow flag.
Daniel Kokotajlo's Shortform

Yeah, this is a map of how philosophy fits together, so it's about ideal agents/minds not actual ones. Though obviously there's some connection between the two.

Transcript: "You Should Read HPMOR"

How did the audience react? Did you get any feedback? Do you think many of them went and read HPMOR? Did they like it?

TurnTrout: Good reception. Got lots of questions about EA. No questions about HPMOR. There were 8 audience members. I'd imagine that half of them at least loaded
EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised

Some basic questions in case anyone knows and wants to help me out:

1. Is this a single neural net that can play all the Atari games well, or a different net for each game?

2. How much compute was spent on training?

3. How many parameters?

4. Would something like this work for e.g. controlling a robot using only a few hundred hours of training data? If not, why not?

5. What is the update / implication of this, in your opinion?

(I did skim the paper and use the search bar, but was unable to answer these questions myself, probably due to lack of expertise)

5. What is the update / implication of this, in your opinion?

Personal opinion: 

Progress in model-based RL is far more relevant to getting us closer to AGI than other fields like NLP or image recognition or neuroscience or ML hardware. I worry that once the research community shifts its focus towards RL, the AGI timeline will collapse - not necessarily because there are no more critical insights left to be discovered, but because it's fundamentally the right path to work on and whatever obstacles remain will buckle quickly once we throw enough warm bod... (read more)

(1) Same architecture and hyperparameters, trained separately on every game.

(4) It might work. In fact they also tested it on a benchmark that involves controlling a robot in a simulation, and showed it beats state-of-the-art on the same amount of training data (but there is no "human performance" to compare to).

(5) The poor sample complexity was one of the strongest arguments for why deep learning is not enough for AGI. So, this is a significant update in the direction of "we don't need that many more new ideas to reach AGI". Another implication is that model-based RL seems to be pulling way ahead of model-free RL.

  1. Different networks for each game
  2. They train for 220k steps for each agent and mention that 100k steps takes 7 hours on 4 GPUs (no mention of which GPUs, but maybe RTX3090 would be a good guess?)
  3. They don't mention it
  4. They are explicitly motivated by robotics control, so yes, they expect this to help in that direction. I think the main problem is that robotics requires more complicated reward-shaping to obtain desired behaviour. In Atari the reward is already computed for you and you just need to maximise it, when designing a robot to put dishes in a dishwash
... (read more)
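As a quick extrapolation from the numbers in point 2 (my own arithmetic, assuming wall-clock time scales linearly with training steps, which the comment doesn't state):

```python
# Extrapolating the reported training cost (my own back-of-the-envelope;
# the reported figure is 7 hours on 4 GPUs per 100k steps, and 220k steps
# are used per agent).

HOURS_PER_100K_STEPS = 7
NUM_GPUS = 4
TOTAL_STEPS = 220_000

wall_clock_hours = HOURS_PER_100K_STEPS * TOTAL_STEPS / 100_000  # ≈ 15.4 hours
gpu_hours = wall_clock_hours * NUM_GPUS                          # ≈ 61.6 GPU-hours

print(f"≈{wall_clock_hours:.1f} wall-clock hours, ≈{gpu_hours:.1f} GPU-hours per game")
```

So a full training run per game is on the order of tens of GPU-hours, which is modest by modern deep RL standards.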
4 · Teja Prabhu · 1mo: 1. It is a different net for each game. That is why they compare with DQN, not Agent57. 2. Training an Atari agent for 100k steps takes only 7 hours on 4 GPUs. 3. The entire architecture is described in Appendix A.1, Models and Hyper-parameters. 4. Yes. 5. This algorithm is more sample-efficient than humans, so it learned a specific game faster than a human could. This is definitely a huge breakthrough.
Fun with +12 OOMs of Compute

Sorry, somehow I missed this. Basically, the answer is that we definitely shouldn't just extrapolate the AI-and-compute trend into the future, and Ajeya's and my predictions are not doing that. Instead we are assuming something more like the historic 2 OOMs a decade trend, combined with some amount of increased spending conditional on us being close to AGI/TAI/etc. Hence my conditional claim above:

Conditional on +6 OOMs being enough with 2020's ideas, it'll happen by 2030. Indeed, conditional on +8 OOMs being enough with 2020's ideas, I think it'll probably happen by 2030.

If you want to discuss this more with me, I'd love to, how bout we book a call?

A very crude deception eval is already passed

Somewhat related thread (which I think was super valuable, for me at least, independently): Experimentally evaluating whether honesty generalizes

Daniel Kokotajlo's Shortform

I made this a while back to organize my thoughts about how all philosophy fits together:

3 · Samuel Shadrach · 1mo: I think it's important to mention that this map is only useful if you are meta-reasoning about humans as if they were ideal rational agents, not modeling how reasoning actually happens in the brain. A System 1 / System 2 mapping would be more realistic, as would studying based on lobes.
5 · Measure · 1mo: I find the bright green text on a white background difficult to read even on a large screen. I would recommend black or dark gray text instead.
AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

This is very helpful, thanks! I now have a better understanding of what you are doing and basically endorse it. (FWIW, this is what I thought/hoped you were doing.)

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

This is helpful, thanks!

I agree that we should expect regulation by default. And so then maybe the question is: Is the regulation that would be inspired by Truthful AI better or worse than the default? Seems plausibly better to me, but mostly it seems not that different to me. What sort of regulation are you imagining would happen by default, and why would it be significantly worse?

I also totally agree with your points 1 - 4.

Is GPT-3 already sample-efficient?

Yeah, I'm counting things as correct if it gets in the right ballpark. Like, I myself didn't know where you worked exactly, but CFAR sounded plausible, especially as a place you may have worked in the past. The fact that GPT-3 said you work at CFAR means it thinks you are part of the rationalist community, which is pretty impressive IMO.

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

What do you think about the failure mode described here? In particular, (a) how would you go about deciding whether politicians and CEOs inspired by your paper will decrease, or increase, the pressure on AIs to believe false things (and thus become epistemically broken fanatics/ideologues) and/or lie about what they believe? And (b) how would you go about weighing the benefits and costs of increased trust in AI systems?

4 · owencb · 1mo: To add to what Owain said:

* I think you're pointing to a real and harmful possible dynamic.
* However, I'm generally a bit sceptical of arguments of the form "we shouldn't try to fix problem X because then people will get complacent".
* I think that the burden of proof lies squarely with the "don't fix problem X" side, and that usually it's good to fix the problem and then also give attention to the secondary problem that's come up.
* I note that I don't think of politicians and CEOs to be the primary audience of our paper.
* Rather, I think in the next several years such people will naturally start having more of their attention drawn to AI falsehoods (as these become a real-world issue), and start looking for what to do about it.
* I think that at that point it would be good if the people they turn to are better informed about the possible dynamics and tradeoffs. I would like these people to have read work which builds upon what's in our paper. It's these further researchers (across a few fields) that I regard as the primary audience for our paper.
4 · Owain_Evans · 1mo: (This won't address all parts of your questions.)

You suggest that the default outcome is for governments and tech platforms to not regulate whether AI needs to be truthful. I think it's plausible that the default outcome is some kind of regulation.

Why expect regulation? Suppose an AI system produces false statements that deceive a group of humans. Suppose also that the deception is novel in some way: e.g. the falsehoods are personalized to individuals, the content/style is novel, or the humans behind the AI didn't intend any kind of deception. I think if this happens repeatedly, there will be some kind of regulation. This could be voluntary self-regulation by tech companies or normal regulation by governments. Regulation may be more likely if it's harder to litigate using existing laws relating to (human) deception.

Why expect AI to cause deception? You also suggest that in the default scenario AI systems say lots of obviously false things, so most humans would learn to distrust them and there's little deception in the first place. I'm uncertain about this, but your position seems overconfident. Some considerations:

1. AI systems that generate wild and blatant falsehoods all the time are not very useful. For most applications, it's more useful to have systems that are fairly truthful in discussing non-controversial topics. Even for controversial or uncertain topics, there's pressure for systems to not stray far from the beliefs of the intended audience.
2. I expect some people will find text/chat by AI systems compelling based on stylistic features. Style can be personalized to individual humans. For example, texts could be easy to read ("I understood every word without pausing once!") and entertaining ("It was so witty that I just didn't want to stop reading"). Texts can also use style to signal intelligence and expertise ("This writer is obviously a genius and so I took their views seriously").
3. Sometimes people won't know whether it was an AI or a human…