All of 1a3orn's Comments + Replies

I'm quite unsure as well.

On one hand, I have the same feeling that it has a lot of weirdly specific, surely-not-universalizing optimizations when I look at it.

But on the other -- it does seem to do quite well on different envs, and if this wasn't hyper-parameter-tuned then that performance seems like the ultimate arbiter. And I don't trust my intuitions about what qualifies as robust engineering v. non-robust tweaks in this domain. (Supervised learning is easier than RL in many ways, but LR warm-up still seems like a weird hack to me, even though it's vit... (read more)

It's working for me? I disabled the cache in devtools and am still seeing it. It looks like it's hitting a LW-specific CDN also.

Thanks for this, this was a fun review of a topic that is both intrinsically and instrumentally interesting to me!

I remain pretty happy with most of this, looking back -- I think this remains clear, accessible, and about as truthful as possible without getting too technical.

I do want to grade my conclusions / predictions, though.

(1). I predicted that this work would quickly be exceeded in sample efficiency. This was wrong -- it's been a bit over a year and EfficientZero is still SOTA on Atari. My 3-to-24-month timeframe hasn't run out, but I said that I expected "at least a 25% gain" towards the start of the time, which hasn't happened.

(2). There has been a shift to... (read more)

Thermodynamics is the deep theory behind steam engine design (and many other things) -- it doesn't tell you how to build a steam engine, but to design a good one you probably need to draw on it somewhat.

This post feels like a gesture at a deep theory behind truth-oriented forum / community design (and many other things) -- it certainly doesn't help tell you how to build one, but you have to think at least around what it talks about to design a good one. Also applicable to many other things, of course.

It also has the virtue of being very short. Per-word, one of my favorite posts.

I like this post because it: -- Focuses on a machine which is usually non-central to accounts of the industrial revolution (at least in others which I've read), which makes it novel and interesting to those interested in the roots of progress -- And has a high ratio of specific empirical detail to speculation -- Furthermore separates speculation from historical claims pretty cleanly

This post is a good review of a book, an introduction to a space where small regulatory reform could result in great gains, and it also changed my mind about LNT. As an introduction to the topic, more focus on economic details would be great, but you can't be all things to all men.

There's a scarcity of stories about how things could go wrong with AI which are not centered on the "single advanced misaligned research project" scenario. This post (and the mentioned RAAP post by Critch) helps partially fill that gap.

It definitely helped me picture / feel some of what some potential worlds look like, to the degree I currently think something like this -- albeit probably slower, as mentioned in the story -- is more likely than the misaligned research project disaster.

It also is a (1) pretty good / fun story and (2) mentions the elements within the story which the author feels are unlikely, which is virtuous and helps prevent higher detail from being mistaken for plausibility.

I like this post in part because of the dual nature of the conclusion, aimed at two different audiences. Focusing on the cost of implementing various coordination schemes seems... relatively unexamined on LW, I think. The list of life-lessons is intelligible, actionable, and short.

On the other hand, I think you could probably push it even further in the "Secret of Our Success" tradition / culture direction. Because there's... a somewhat false claim in it: "Once upon a time, someone had to be the first person to invent each of these concepts."

This seems false ... (read more)

Yeah, I think a recurring wrong thing throughout the Coordination Frontier sequence is me thinking in terms of people inventing mechanisms. I think this is mostly not cruxy for what the Coordination Frontier sequence is for, which is a guide for people who are optimizing for experimenting and pushing forward coordination theory/practice. Slow Cultural Accumulation is probably how many coordination mechanisms first happened, but it ain't gonna compound fast enough to navigate the 21st century.

That's 100% true about the quote above being false for environments for which the optimal strategy is stochastic, and a very good catch. I'd expect naive action-value methods to have a lot of trouble in multi agent scenarios.

The ease with which other optimization methods (i.e., policy optimization, which directly adjusts likelihood of different actions, rather than using an estimate of the action-value function to choose actions) represent stochastic policies is one of their advantages over q-learning, which can't really do so. That's probably one reason ... (read more)
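To make the point about stochastic policies concrete, here is a minimal sketch using rock-paper-scissors as a stand-in multi-agent environment. The payoff table, the helper names (`greedy`, `softmax`, `worst_case_payoff`), and the all-zero value estimates are illustrative assumptions, not anything from the original discussion. It compares a greedy policy derived from action values with a softmax policy over action preferences, each evaluated against an opponent who best-responds.

```python
import math

# Our payoff in rock-paper-scissors: PAYOFF[a][b] is the reward for
# playing action a against opponent action b (0=rock, 1=paper, 2=scissors).
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def greedy(q_values):
    """Deterministic policy: all probability mass on the argmax action."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def softmax(prefs):
    """Stochastic policy: probabilities proportional to exp(preference)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def worst_case_payoff(policy):
    """Expected payoff against an opponent who best-responds to the policy."""
    return min(
        sum(policy[a] * PAYOFF[a][b] for a in range(3))
        for b in range(3)
    )


# Hypothetical estimates: every action looks equally good.
greedy_policy = greedy([0.0, 0.0, 0.0])   # tie broken toward rock
mixed_policy = softmax([0.0, 0.0, 0.0])   # equal preferences give a uniform mix

print("greedy worst case:", worst_case_payoff(greedy_policy))
print("softmax worst case:", worst_case_payoff(mixed_policy))
```

In this toy setup the greedy policy always commits to one action and so can be fully exploited, while the softmax parameterization can represent the uniform mix that cannot be.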

I'm sleep deprived as I wrote that/am writing this, so I may be making some technical errors. The list was supposed to be conditions under which there (is guaranteed to) exist(s) an optimal policy that assigns a pure strategy to every state. This doesn't rule out the existence of environments that don't meet all these criteria and nonetheless have optimal policies that assign pure strategies to some or all states. Such an optimal policy just isn't guaranteed to exist. (Some games have pure Nash equilibria/but pure Nash equilibria are not guaranteed to exist in general.) That said, knowing the laws of physics/transition rules was meant to cover the class of non stochastic environments with multiple possible state transitions from a given state and action. (Maybe one could say that such environments are non deterministic, but the state transitions could probably be modelled as fully deterministic if one added appropriate hidden state variables and/or allowed a state's transition to be path dependent.) It's in this sense that the agent needs to know the transition rules of the environment for pure strategies to be optimal in general.
Answer by 1a3orn, Jan 10, 2023

The two broad paths to general intelligence -- RL and LLMs -- both had started to stall by the beginning of 2023.

As Chinchilla had shown, data is just as important as compute for training smarter models. The massive increase in performance in the behavior of LLMs in prior years occurred because of a one-time increase of data -- namely, training on nearly everything interesting that humans have ever written. Unless the amount of high quality human text could be increased by 10x, this leap in performance would never happen again. Attempts to improve the beh... (read more)

Gerald Monroe, 1mo
"As the nuke began to detonate, an incredible coincidence happened. All the neutrons missed hitting further atoms of plutonium and the core fizzled out, leaving a few glowing masses of plutonium". It could happen, but no one has even tried to give LLMs the APIs to even access a Jira.

Generally, I don't think it's good to gate "is subquestion X, related to great cause Y, true?" with questions about "does addressing this subquestion contribute to great cause Y?" Like I don't think it's good in general, and don't think it's good here.

I can't justify this in a paragraph, but I'm basing this mostly off "Huh, that's funny" being far more likely to lead to insight than "I must have insight!" Which means it's a better way of contributing to great causes, generally.

(And honestly, at another level entirely, I think that saying true things, which... (read more)

Yes, and to expand only slightly: Coordinating against dishonest agents or practices is an extremely important part of coordination in general; if you cannot agree on removing dishonest agents or practices from your own group, the group will likely be worse at accomplishing goals; groups that cannot remove dishonest instances will be correctly distrusted by other groups and individuals.

All of these are important and worth coordinating on, which I think sometimes means "Let's condemn X" makes sense even though the outside view suggests that many instances of "Let's condemn X" are bad. Some inside view is allowed.

if you cannot agree on removing dishonest agents or practices from your own group

What group, though? I'm not aware of Sam Bankman-Fried having posted on Less Wrong (a website for hosting blog posts on the subject matter of human rationality). If he did write misleading posts or comments on this website, we should definitely downvote them! If he didn't, why is this our problem?

(That is to say less rhetorically, why should this be our problem? Why can't we just be a website where anyone can post articles about probability theory or cognitive biases, rather than an enforcement arm of the branded "EA" movement, accountable for all its sins?)

It's not a counter-argument to the post in its entirety, though -- it's a counter-argument to the recommendation that we de-escalate, from the Twitter post, no? Specifically, it's not a counter-argument to the odds of nuclear war if we don't de-escalate.

Two things can be true at once:

  1. Not seeking a complete Russian defeat runs a 1-in-6 chance of Nuclear War -- or say 1-in-N for the general case.
  2. Not seeking a complete Russian defeat means that we've responded partially to blackmail in a game-theoretically nonoptimal fashion, which means we have M% increa
... (read more)
Important clarification: Neither here nor in the twitter post did I advocate appeasement or giving in to blackmail. In the Venn diagram of possible actions, there's certainly a non-empty intersection of "de-escalation" and "appeasement", but they're not the same set, and there are de-escalation strategies that don't involve appeasement but might nonetheless reduce nuclear war risk. I'm curious: do you agree that halting (and condemning) the following strategies can reduce escalation and help cool things down without giving in to blackmail?

  1. nuclear threats
  2. atrocities
  3. misleading atrocity propaganda
  4. assassinations lacking military value
  5. infrastructure attacks lacking military value (e.g. Nordstream sabotage)
  6. shelling the Zaporizhzhya nuclear plant
  7. disparaging de-escalation supporters as unpatriotic

I think it would reduce nuclear war risk if the international community strongly condemned 1-7 regardless of which side did it, and I'd like to see this type of de-escalation immediately.

I don't know if you're intentionally recapitulating this line of argument, but C.S. Lewis makes this argument in Miracles. There's a long history of the back and forth on Wikipedia.

I don't think it works, mostly because the fact that a belief is the result of a physical process doesn't tell me anything at all about the rationality / irrationality of the belief. Different physical processes should be judged differently; some are entangled with the resulting state of belief and others aren't.

Not intentional, but didn't expect it to be a novel argument either. I suspect everyone has thought about it sometime during their life, likely while learning physics in secondary school. I just think "cognitive instability" is a nice handle for the discussion.

One slightly counterintuitive thing about this paper is how little it improves on the GSM8K dataset, given that it does very well on relatively advanced test sets.

Grade School Math 8K (GSM8K) is a bundle of problems suitable for middle-schoolers. It has problems like:

"Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"

"Randy has 60 mango trees on his farm. He also has 5 less than half as many coconut trees as mango trees. How many trees does Randy have i... (read more)

The previous SOTA for MATH is a fine-tuned GPT-2 (1.5b params), whereas the previous SOTA for GSM8K is PaLM (540b params), using a similar "majority voting" method as Minerva (query each question ~40 times, take the most common answer).
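As a rough illustration of the "majority voting" idea described above, here is a minimal sketch. The function name `majority_vote` and the sample answers are hypothetical, and real pipelines would also need to extract and normalize the final answer from each sampled completion, which is not shown.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer among the sampled answers.

    Ties go to the answer that was seen first.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


# Hypothetical final answers from several samples of the same question;
# the comment mentions roughly 40 samples per question.
samples = ["72", "72", "68", "72", "70", "68", "72"]
print(majority_vote(samples))
```

The design choice is that individual samples may be wrong in many different ways, while correct reasoning paths tend to converge on the same final answer.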

I'm curious what kind of blueprint / design docs / notes you have for the voluntarist global government. Do you have a website for this? Is there a governmental-design discord discussing this? What stage is this at? etc.

The article title here is hyperbolic.

The title is misleading in the same way that calling AlphaStar "a Western AI optimized for strategic warfare" is misleading. Should we also say that the earlier western work on Doom -- see VizDoom -- was about creating "agents optimized for killing"? That was work on an FPS as well. This is just more of the same -- researchers trying to find interesting video games to work on.

This work transfers with just as much ease / difficulty to real-world scenarios as AI work on entirely non-military-skinned video games -... (read more)

That's a fair description of AlphaStar. For example, see this NATO report (pdf): From the Game Map to the Battlefield – Using DeepMind's Advanced AlphaStar Techniques to Support Military Decision-Makers. Obviously, military people of both NATO and China are trying to apply any promising AI research that they deem relevant for the battlefield. And if your promising research is military-themed, it is much more likely to get their attention. Especially if you're working at a university that does AI research for the military (like the aforementioned Tsinghua University).

There is a qualitative difference between the primitive pixelated Doom and the realistic CS. The second one is much easier to transfer to the battlefield, because of the much more realistic graphics, physics, military tactics, weaponry.

Not sure about that. Clearly, CS is much more similar to the real battlefield than, say, Super Mario. Thus, the transfer should be much easier.

Also not sure about that. For example, in the article, one of the simple scenarios they have is a gun turret-like scenario, where the agent is fixed in one place, and is shooting moving targets (that look like real humans). I can imagine that one can put the exact same agent in a real automated turret, and with a suitable middleware it will be capable of shooting down moving targets at decent rates. The main issue is that once you have a mid-quality agent that can shoot at people, it is trivial to improve its skill, and get it to superhuman levels. The task is much easier than, say, self-driving cars, as the agent's only goal is to maximize the damage, and the agent's body is expendable.

For investigation of the kind of thing you suggest, take a look at Anthropic's "A General Language Assistant as a Laboratory for Alignment" and more importantly "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback".

They focus on training a helpful / harmless assistant rather than good short stories, but using human-filtered model-output to improve behavior is the basic paradigm.

Thanks for the pointer, I will check that out!

I'd also be interested in someone doing this; I tend towards seeing it as good, but haven't seen a compilation of arguments for and against.

That's entirely fair about the first case.

But the generator for the ideas is the problem that minimizing the harm an AI can do is more or less the same as minimizing its usefulness. If you had a superintelligent AI in a box, you could go further than letting it only emit strings. You could ask it questions, and restrict it to giving you "YES" | "NO" | "NOT_SURE" as answers. It's even more safe then! But even less useful.

But people want their tools to be useful! Gwern has a good essay on this where he points out that th... (read more)

Correct. It means that if you want a very powerful language model, having compute & having data is pretty much the bottleneck, rather than having compute & being able to extend an incredibly massive model over it.

Hey, look at the job listing.

"Tell me by means of text how to make a perfect battery," you tell the AI, and wait a week.

"I cannot make a perfect battery without more information about the world," the AI tells you. "I'm superintelligent, but I'm not omniscient; I can't figure out everything about the world from this shitty copy of Wikipedia you loaded me up with. Hook me up with actuators meeting these specifications, and I can make a perfect battery."

"No, of course not," you say. "I can't trust you for that. Tell me what experiments to do, and I'll do them myself."

The AI gives you ... (read more)

It feels like you're using a bit of ghost in the machine reasoning to come up with some of these answers. In the first case, the AI would not ask for more computing power. It does not have utilities that extend beyond one week. Its only goal is to create a message that can communicate how to make a really good battery. If it had insufficient computing power, it would not output a message telling me so, because that would be in direct opposition to the goal. The outcome I would expect in that case would be for it to communicate a really shitty or expensive battery or else just copy and paste the answer from Wikipedia. And this wouldn't be a ploy for more computing power, it would just be the AI actually making its best effort to fulfill its goal.

The second and third cases point out legitimate security concerns, but they're not ones that are impossible to address, and I don't see how aligned AI wouldn't also suffer from those risks. An oracular AI has some safety features, and an aligned AI has some safety features, but both could be misused if those limits were removed. Another stupid intro question, could you use an oracular AI to build an aligned one?

Right now a model I'm considering is that the C19 vac, at least for a particular class of people (males under 30? 40?) has zero or negative EV, and mostly shifts risk from the legible (death from c19) to the illegible (brain fog? general systematic problems the medical system does not know how to interpret!) Where "legible" is legible in the seeing-like-a-state sense.

I'm mostly motivated, again, by the same thing as you. It seems like there's an incredible disproportion between the bad side effects among my friend group, and the bad side effects I should... (read more)

This is very anecdotal, but I know a number of young people who had COVID and a number of young people who got the vaccine. Of the ones who had COVID, about half report continuing brain fog and thinking issues - one person went from a 99th percentile score on the PSAT to an 80th percentile score on the SAT. Of the ones who got the vaccine (including one who got the vaccine and booster shot after previously having COVID) there have been no noticed issues with thinking/intelligence.

I found this especially grating because he used it to criticize engineering. Peer review is only very dubiously an important part of science; but it's just plain confused to look at a plan to build a bridge, to build a spaceship, or to prevent a comet from destroying Earth and say "Oh, no, it hasn't been peer reviewed."

Hard agree, it felt iffy to me.

"Also, here’s a thread pointing," etc should probably contain a link.

Regarding the maturity of a field, and whether we can expect progress in a mature field to take place in relatively slow / continuous steps:

Suppose you zoom into ML and don't treat it like a single field. Two things seem likely to be true:

  1. (Pretty likely): Supervised / semi-supervised techniques are far, far more mature than techniques for RL / acting in the world. So smaller groups, with fewer resources, can come up with bigger developments / more impactful architectural innovation in the second than in the first.

  2. (Kinda likely): Developments in RL

... (read more)

Ah, that does make sense, thanks. And yeah, it would be interesting to know what the curve / crossover point would look like for the impact from the consistency loss.

Agreed, I added an extra paragraph emphasizing ReAnalyse. And thanks a ton for pointing out that ablation, I had totally missed that.

I meant a relative Pareto frontier, vis-a-vis the LW team's knowledge and resources. I think your posts on how to expand the frontier are absolutely great, and I think they (might) add to the available area within the frontier.

"If you want to suggest that OP is part of a "genre of rhetoric": make the case that it is, name it explicitly."

I mean, most of OP is about evoking emotion about community standards; deliberately evoking emotions is a standard part of rhetoric. (I don't know what genre -- ethos if you want to invoke Aristotle -- but I don't think i... (read more)

LW is likely currently on something like a Pareto frontier of several values, where it is difficult to promote one value better without sacrificing others. I think that this is true, and also think that this is probably what OP believes.

The above post renders one axis of that frontier particularly emotionally salient, then expresses willingness to sacrifice other axes for it.

I appreciate that the post explicitly points out that it is willing to sacrifice these other axes. It nevertheless skims a little bit over what precisely might be sacrificed.

Let's name ... (read more)

I don't think you've made a convincing case that LW is on a Pareto frontier of these values, and I don't know what such a case would look like, either. I've personally made several suggestions here in the comments (for LW feature improvements) that would make some things better without necessarily making anybody worse off. Feature suggestions would take resources to implement, but as far as I can tell the LW team has sufficient resources to act on whatever it considers its highest-EV actions.

As for the rest of your post: I appreciate that you mention other values to consider, and that you don't want them to be traded off for one another. In particular, I strongly agree that I do not want to increase barriers to entry for newcomers.

But I strongly disapprove of your imputing motives into the OP that aren't explicitly there, or that aren't there without ridiculous numbers of caveats (like the suggestions OP himself flagged as "terrible ideas"). OP even ends with a disclaimer that "this essay is not as good as I wished it would be". In contrast, this entire section of yours reads to me as remarkably uncharitable and in bad faith:

If you want to suggest that OP is part of a "genre of rhetoric": make the case that it is, name it explicitly. Make your own words vulnerable, put your own neck out there. Instead of making your own object-level arguments, you're imputing bad motives into the OP, insinuating things without pointing to specific quotes, and suggesting that arguments for your case could be made, but that you won't make the effort to make them. You even end on an applause light ffs.

--------------------------------------------------------------------------------

Circling back to the object level of the essay, namely improving the culture here: As I mention in my comment on th

A model is a thing that gives predictions of what will happen.

For instance, your brain has an (implicit) model of physics, which it uses to predict what it will see when you toss a ball. Generally, the brain is believed to do some form of predictive modeling by pretty much all theories about the brain.

You can also form models explicitly, outside of your brain. If I look at median house prices every year in my area for the last five years, draw a line through the points, and predict next year's prices will continue to go up, that's a model too. It isn't ... (read more)
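The house-price example can be sketched in a few lines. This is a minimal illustration under stated assumptions: the years and prices below are made-up numbers, and `fit_line` is a hypothetical helper that does an ordinary least-squares fit of a straight line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up median house prices (in thousands) for the last five years.
years = [2017, 2018, 2019, 2020, 2021]
prices = [300, 310, 325, 335, 350]

slope, intercept = fit_line(years, prices)

# The "model" is just the fitted line; its prediction is the line's
# value at next year's x-coordinate.
prediction = slope * 2022 + intercept
print(f"predicted median price for 2022: {prediction:.1f}k")
```

The fitted line is a model in exactly the sense used above: it takes past observations and gives a prediction of what will happen next.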

Answer by 1a3orn, Nov 03, 2021
Promoted by Raemon

I want to do a big, long, detailed explainer on the lineage of EfficientZero, which is fascinating, and the mutations it makes in that lineage. This is not that. But here's my attempt at a quick ELI5, or maybe ELI12.

There are two broad flavors of reinforcement learning -- where reinforcement learning is simply "learning to act in an environment to maximize a reward / learning to act to make a number go up."

Model Free RL: This is the kind of algorithm you (sort of) execute when you're keeping a bike upright.

When keeping a bike upright, you don't f... (read more)


I'm looking forward to that big, long, detailed explainer :)

Thanks! This was super helpful.

I haven't explicitly modeled out odds of war with China in the coming years, in any particular timeframe. Some rationalist-adjacent spheres on Twitter are talking about it, though. In terms of certainty, it definitely isn't in the "China has shut down transportation out of Wuhan" levels of alarm; but it might be "mysterious disease in Wuhan, WHO claims not airborne" levels of alarm.

I'd expect our government to be approximately as competent in preparing for and succeeding at this task as they were at preparing for and eliminating COVID. (A look at our go... (read more)

Can you give some examples of who in the "rationalist-adjacent spheres" are discussing it?

Will all user-submitted species be entered into a single environment at the end? I.e., does the biodiversity depend on the number of submissions?

From the code, yes.

I'm still unsure about whether jittering / random action would generally reflect pathology in trained policy or value functions. You've convinced me that it reveals pathology in exploration though.

So vis-a-vis policies: in some states, even the optimal policy is indifferent between actions. For such states, we would want a great number of hypotheses about those states to be easily available to the function approximator, because we would have hopefully maintained such a state of easily-available hypotheses from the agent's untrained state. This probably ... (read more)

Thanks! That's definitely a consequence of the argument.

It looks to me like that prediction is generally true, from what I remember about RL videos I've seen -- i.e., the breakout paddle moves much more smoothly when the ball is near, DeepMind's agents move more smoothly when being chased in tag, and so on. I should definitely make a mental note to be alert to possible exceptions to this, though. I'm not aware of anywhere it's been treated systematically.

I feel like I once saw RL agents trained with and without energy costs, where the agents trained with energy costs acted a lot less jittery. But I can't remember where I saw it. 

Yeah, I said that badly. It isn't precisely the lack of expressiveness that bugs me. You're 100% right about the equivalencies.

Instead, it's that the grammar for OR is built into the system at a deep level; that the goal-attention module has separate copies of itself getting as input however many As, Bs, and Cs are in "A or B or C".

Like -- it makes sense to think of the agent as receiving the goals, given how they've set it up. But it doesn't make sense to think of the agent as receiving the goals in language, because language implies a greater disconne... (read more)

Yeah I definitely wouldn't want to say that this framing is the whole answer -- just that I found it seemed interesting / suggestive / productive of interesting analysis.  To be clear: I'm 100% unsure of just what I think.

But I like that chess analogy a lot.  You can't hire a let expert and a const expert to write your JS for you.

There's probably a useful sense in which the bundle of related romantic-relationship-benefits are difficult to disentangle because of human psychology (which your framing leans on?), which in turn occurs because of evolu... (read more)

Yeah. I don't think those two things (evolved psych priors, and functional connections) are necessarily sensibly separated, even in the limit of freely considered self-modification. Like, it's not a coincidence that the bonds formed; maybe childrearing was the longest time scale selective pressure towards interpersonal goal alignment. That sort of cooperation is a convergent instrumental goal of life in general (in particular, of childrearing, and of life in modernity). Not obvious that we'd want to draw the boundary of what counts as "our mind updating" to exclude "evolution discovered that faith in a relationship is useful for key goals".
Answer by 1a3orn, Feb 08, 2021

The thing I've found most interesting by far to track is reaction time.  

There's a lot of research showing that reaction time correlates with intelligence. Unfortunately, most of this is on the scale of individuals rather than individual-days; but that is of course in part because it's hard to give someone an intelligence test and a reaction-time test every day for a while, and (relatively) easy to give someone an intelligence test and a reaction-time test just once.

I track simple reaction time (how fast I can hit a button after a visual sti... (read more)

Brilliant! Sounds like exactly what I was looking for, thanks!