LESSWRONG

ryan_greenblatt

I'm the chief scientist at Redwood Research.

Comments (sorted by newest)

Habryka's Shortform Feed
ryan_greenblatt · 6h

One lesson you should maybe take away is that if you want your predictions to be robust to different interpretations (including interpretations that you think are uncharitable), it could be worthwhile to make them more precise (in the case of a tweet, this could be done in a linked blog post which explains things in more detail). E.g., in the case of "No massive advance (no GPT-5, or disappointing GPT-5)", you could have said "Within 2024, no AI system will be publicly released which is as much of a qualitative advance over GPT-4 in broad capabilities as GPT-4 is over GPT-3, and where this increase in capabilities appears to be due to scaling up LLM pretraining". This prediction would have been relatively clearly correct (though I think also relatively uncontroversial, at least among people I know, as we probably should only have expected to get to ~GPT-4.65 in terms of compute scaling and algorithmic progress by the end of 2024). You could try to operationalize this further in terms of benchmarks or downstream tasks.

To the extent that you can make predictions in terms of concrete numbers or metrics (which, to be clear, is not always possible), this avoids ~any issues due to interpretation. You could also make predictions about Metaculus questions when applicable, as these have relatively solid and well-understood resolution criteria.

AI Task Length Horizons in Offensive Cybersecurity
ryan_greenblatt · 10h

First blood times represent the time of first successful submission in the originally published competition. While there are some limitations (participants usually compete in teams, and may solve problems in parallel or in sequence), this still provides a useful proxy.

Another limitation is that first blood times represent the fastest time across the field rather than the typical time an expert would take to complete the task. This makes Cybench times less comparable to other human completion times.
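
As a rough illustration of this point (my own toy simulation, not from the post): if many teams attempt a task in parallel, the first blood time is roughly the minimum of many solve times, which can sit well below the median time a single expert team would take. A minimal sketch in Python, with purely hypothetical parameters:

```python
import random
import statistics

random.seed(0)

# Hypothetical solve-time distribution for a single expert team (hours):
# lognormal with median ~4h. Parameters are illustrative only.
def solve_time() -> float:
    return random.lognormvariate(1.4, 0.8)  # median = e^1.4 ≈ 4.1 hours

n_teams = 30       # teams attacking the challenge in parallel
n_trials = 10_000  # Monte Carlo repetitions

first_bloods = [min(solve_time() for _ in range(n_teams)) for _ in range(n_trials)]
typical = [solve_time() for _ in range(n_trials)]

print(f"median single-team solve time: {statistics.median(typical):.1f}h")
print(f"median first blood time:       {statistics.median(first_bloods):.1f}h")
# The first blood time comes out several times shorter than the typical time,
# so horizons calibrated to first blood will overstate how quickly humans
# typically complete these tasks.
```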

Don't Eat Honey
ryan_greenblatt · 1d

Oops, I meant "except". My terrible spelling strikes again.

Don't Eat Honey
ryan_greenblatt · 1d

Hmm, I guess I think "something basically like hedonic utilitarianism, at least for downside" is pretty plausible.

Maybe a big difference is that I feel like I've generally updated away from putting much weight on moral intuitions / heuristics, except with respect to forbidding some actions because they violate norms, are otherwise uncooperative, seem like the sort of thing which would make for bad societal policy, are bad for decision theory reasons, etc. So, relatively weak cases can swing me far, because I started off quite unopinionated, without putting much weight on moral intuitions (which feel like they often come from a source mostly unrelated to what I ultimately terminally care about).

I do agree that just directly using "Rethink Priorities says 15%" without flagging relevant caveats is bad.

A shitty summary of the case I would give would be something like:

  • It seems plausible we should be worried about suffering in a way which doesn't scale (that much) with the size/complexity of brains in practice. Maybe the thing which is bad about suffering is pretty simple. E.g., I don't notice that the complexity of my thought has huge effects on my suffering as far as I can tell.
    • I think there is a case for some asymmetry between downside and upside with respect to complexity, at least in the regime of the biological brains we see in front of us.
  • If so, then maybe bees have the core suffering circuitry which causes the badness and this is pretty similar to humans.
  • Then, we have to aggregate this with other arguments for humans being much more important. The aggregation is super non-obvious (and naive averaging isn't valid due to two envelope problems), but I feel like an intuition for being conservative about suffering points in favor of worrying about bee suffering if there is a chance it matters comparably to human suffering.
  • Overall, this doesn't get me to 15%, more like 1% (with a bunch of the discount occurring in aggregation over different views), but 1% is still a lot. (This is all within the frame of the argument.)
  • I can imagine different moral intuitions (e.g. intuitions more like those of Tomasik) that get to more like 15% by having somewhat different weighting. These seem a bit strong to me, but not totally insane.

In practice, the part of my moral views which is compelled by this sort of thing ends up focused on longtermism rather than insect welfare.

(I'm not currently planning on engaging further and I'm extremely sympathetic to you doing the same.)

Don't Eat Honey
ryan_greenblatt · 1d

I think we both agree that the underlying question is probably pretty confused, and importantly and relatedly, both probably agree that what we ultimately care about probably will not be grounded in the kind of analysis where you assign moral weights to entities and then sum up their experiences.

I think I narrowly agree with respect to my moral views, which are strongly influenced by longtermist-style thinking, though I think "assign weights and add experiences" isn't way off from a perspective I might end up putting a bunch of weight on[1]. However, I do think "what moral weight should we assign bees" isn't a notably more confused question in the context of animal welfare than "how should we prioritize between chicken welfare interventions and pig welfare interventions". So, I think there at least exists a pretty common and broadly reasonable-ish perspective in which this question is sane.


The thing that creates a strong feeling of "I feel like people are just being crazy here" in me is the following chain of logic:

This feels a bit like a motte and bailey to me. Your original claim was "If anyone remotely thinks a bee suffering is 15% (!!!!!!!!) as important as a human suffering, you do not sound like someone who has thought about this reasonably at all. It is so many orders of magnitude away from what sounds reasonable to me". That feels very different from claiming that the chain of logic you point out is crazy. One can totally arrive at a conclusion similar to "bee suffering is 15% as important as a human suffering" via epistemic routes different from the one you outline, and it would be bad to dismiss a claim in the way you did (in particular, calling the specific claim crazy) just because the person making it also appears to be exhibiting a bunch of bad epistemic practices and you think they followed a specific chain of logic which is problematic. (I'm not necessarily saying this is what you did, just that this justification would have been bad.)

Maybe you think both that "the claim in isolation is crazy" (what you originally said and what I disagree with) and that "the process used to reach that claim here seems particularly crazy". Or maybe you want to partially walk back your original statement and focus on the process (if so, it seems good to make this more explicit).

Separately, it's worth noting that while Bentham's Bulldog emphasizes the takeaway of "don't eat honey", they also seem to be aware of and endorse other extreme conclusions of putting high moral weight on insects. (I wish they would also note in the post that this obviously has other, more important implications than "don't eat honey"!) So, I'm not sure that point (4) is much evidence about a bad epistemic process in this particular case.


  1. Considerations like an arbitrarily large multiverse make questions around diversity of cognitive experience more complex and make literally linear population ethics incoherent due to infinities. But, I think you pretty plausibly end up with something that roughly resembles linear aggregation via something like UDASSA. ↩︎

leogao's Shortform
ryan_greenblatt · 2d

IMO, part of the solution to endless scrolling is to not implement the feature where you can endlessly scroll. Instead, have an explicit next-page button after some moderate amount of scrolling. (Also, having the pop-up is good; you could even let people configure the pop-up to be more frequent, etc.)

Don't Eat Honey
ryan_greenblatt · 2d

I think that it's pretty reasonable to think that bee suffering is plausibly similarly bad to human suffering. (Though I'll give some important caveats to this in the discussion below.)

More precisely, I think it's plausible that I (and others) would conclude on reflection[1] that the "bad" part of suffering is present in roughly the same "amount" in bees as in humans, such that suffering in both is very comparable. (It's also plausible I'd end up thinking that bee suffering is worse due to e.g. higher clock speed.) This is mostly because I don't strongly expect that on reflection I would care about the complex aspects of the suffering or end up caring in a way which is more proportional to neuron count (though these are also plausible).

See also Luke Muehlhauser's post on moral weights, which discusses a way of computing moral weights that implies it's plausible that bees have similar moral weight to humans.[2]

I find the idea that we should be radically uncertain about moral-weight-upon-reflection-for-bees pretty intuitive: I feel extremely uncertain about core questions in morality and philosophy which leaves extremely wide intervals. Upon hearing that some people put substantial moral weight on insects, my initial thought was that this was maybe reasonable but not very action relevant. I haven't engaged with the Rethink Priorities work on moral weights and this isn't shaping my perspective; my perspective is driven by mostly simpler and earlier views. I don't feel very sympathetic to perspectives which are extremely confident in low moral weights (like this one) due to general skepticism about extreme confidence in most salient questions in morality.

Just because I think it's plausible that I'll end up with a high moral-weight-upon-reflection for bees relative to humans doesn't mean that I necessarily think the aggregated moral weight should be high; this is because of two envelope problems. But, I think moral aggregation approaches that end up aggregating our current uncertainty in a way that assigns high overall moral weight to bees (e.g. a 15% weight like in the post) aren't unreasonable. My off-the-cuff guess would be more like 1% if it was important to give an estimate now, but this isn't very decision relevant from my perspective as I don't put much moral weight on perspectives that care about this sort of thing. (To oversimplify: I put most terminal weight on longtermism, which doesn't care about current bees, and then a bit of weight on something like common sense ethics which doesn't care about this sort of calculation.) And, to be clear, I have a hard time imagining reasonable perspectives which put something like a >1% weight on bees without focusing on stuff other than getting people to eat less honey given that they are riding the crazy train this far.
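
To illustrate the two envelope issue mentioned here with my own toy numbers (nothing from the post or from Rethink Priorities): averaging uncertain moral weights gives different answers depending on whether you normalize to the human or to the bee, so naive averaging over views isn't well defined. A minimal sketch:

```python
# Two hypothetical views, weighted equally:
#   view A: a bee matters 1% as much as a human
#   view B: a bee matters as much as a human
views = [0.01, 1.0]  # bee weight measured in human units under each view

# Averaging in human units: expected bee weight per human.
bee_human_normalized = sum(views) / len(views)

# Averaging in bee units: expected human weight per bee, then inverted.
human_per_bee = sum(1 / w for w in views) / len(views)
bee_bee_normalized = 1 / human_per_bee

print(f"bee weight, human-normalized: {bee_human_normalized:.3f}")  # 0.505
print(f"bee weight, bee-normalized:   {bee_bee_normalized:.3f}")    # 0.020
# The two normalizations disagree by ~25x even though they aggregate the same
# two views, which is why naive averaging over moral weights isn't valid.
```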

Overall, I'm surprised by extreme confidence that a view which puts high moral weight on bees is unreasonable. It seems to me like a very uncertain and tricky question at a minimum. And, I'm sympathetic to something more like 1% (which isn't orders of magnitude below 15%), though this mostly doesn't seem decision relevant for me due to longtermism.

(Also, I appreciate the discussion of the norm of seriously entertaining ideas before dismissing them as crazy. But then I find myself surprised you'd dismiss this idea as crazy when I feel like we're so radically uncertain about the domain, and plausible views about moral weights (and plausible aggregations over those views) end up putting a bunch of weight on the bees.)


Separately, I don't particularly like this post for several reasons, so don't take this comment as an endorsement of the post overall. I'm not saying that this post argues effectively for its claims, just that these claims aren't totally crazy.


  1. As in, if I followed my preferred high effort (e.g. takes vast amounts of computational resources and probably at least thousands of subjective years) reflection procedure with access to an obedient powerful AI and other affordances. ↩︎

  2. Somewhat interestingly, you curated this post. The perspective expressed in the post is very similar to one that gets you substantial moral weight on bees, though two envelope problems are of course tricky. ↩︎

Substack and Other Blog Recommendations
ryan_greenblatt · 2d

Pitching the Redwood Research Substack: We write a lot of content about technical AI safety focused on AI control.[1] This ranges from stuff like "what are the returns to compute/algorithms once AIs already beat top human experts" to "Making deals with early schemers" to "Comparing risk from internally-deployed AI to insider and outsider threats from humans".


  1. We also cross-post this to LessWrong, but subscribing on Substack is an easy way to guarantee you see our content. ↩︎

Consider chilling out in 2028
ryan_greenblatt · 4d

If someone did a detailed literature review or had relatively serious evidence, I'd be interested. By default, I'm quite skeptical of your level of confidence in these claims given that they directly contradict my experience and the experience of people I know. (E.g., I've done similar things for way longer than 12 weeks.)

To be clear, I think I currently work more like 60 hours a week depending on how you do the accounting, I was just defending 70 hours as reasonable and I think it makes sense to work up to this.

The Industrial Explosion
ryan_greenblatt · 4d

I expect fusion will outperform solar and is reasonably likely to be viable if there is an abundance of extremely superhuman AIs.

Notably, there is no hard physical reason why the payback time for solar panels has to be a year rather than, e.g., a day or two. For instance, there exist plants which can double this quickly (see e.g. duckweed), and the limits of technology could allow for much faster doubling times. So, I think your analysis holds for current solar technology (which may be relevant to parts of this post), but it certainly doesn't hold in the limit of technology, and it may or may not be applicable at various points in a takeoff depending on how quickly AIs can advance the relevant tech.
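
As a rough illustration of why payback time matters so much here (my own toy arithmetic, not from the post): if all output is reinvested and the installed base doubles once per payback period, a one-year payback and a two-day payback give wildly different trajectories over a single year. A minimal sketch:

```python
def growth_factor(payback_days: float, horizon_days: float = 365.0) -> float:
    """Idealized capacity multiplier if all output is reinvested and the
    installed base doubles once per payback period (ignores other limits)."""
    return 2.0 ** (horizon_days / payback_days)

for payback in (365.0, 30.0, 2.0):
    print(f"payback of {payback:>5.0f} days -> ~{growth_factor(payback):.3g}x capacity after a year")

# Approximate output:
#   payback of   365 days -> ~2x capacity after a year
#   payback of    30 days -> ~4.6e+03x capacity after a year
#   payback of     2 days -> ~8.67e+54x capacity after a year (in practice,
#   other constraints would bind long before this)
```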

Posts (sorted by new; karma · title · age · comment count; Ω indicates a crosspost to the Alignment Forum)

  • 14 · ryan_greenblatt's Shortform · Ω · 2y · 246 comments
  • 68 · Jankily controlling superintelligence · Ω · 6d · 4 comments
  • 46 · What does 10x-ing effective compute get you? · 8d · 4 comments
  • 29 · Prefix cache untrusted monitors: a method to apply after you catch your AI · Ω · 13d · 1 comment
  • 61 · AI safety techniques leveraging distillation · Ω · 14d · 0 comments
  • 70 · When does training a model change its goals? · Ω · 20d · 2 comments
  • 43 · OpenAI now has an RL API which is broadly accessible · Ω · 21d · 1 comment
  • 63 · When is it important that open-weight models aren't released? My thoughts on the benefits and dangers of open-weight models in response to developments in CBRN capabilities. · Ω · 23d · 11 comments
  • 71 · The best approaches for mitigating "the intelligence curse" (or gradual disempowerment); my quick guesses at the best object-level interventions · Ω · 1mo · 19 comments
  • 81 · AIs at the current capability level may be important for future safety work · Ω · 2mo · 2 comments
  • 91 · Slow corporations as an intuition pump for AI R&D automation · Ω · 2mo · 23 comments
Wikitag Contributions

  • Anthropic (org) · 6mo · (+17/-146) — "I don't think it's good practice"
  • Frontier AI Companies · 8mo
  • Frontier AI Companies · 8mo · (+119/-44)
  • Deceptive Alignment · 1y · (+15/-10)
  • Deceptive Alignment · 1y · (+53)
  • Vote Strength · 2y · (+35)
  • Holden Karnofsky · 2y · (+151/-7)
  • Squiggle Maximizer (formerly "Paperclip maximizer") · 2y · (+316/-20)