Cody Rushing — LessWrong

Why you should eat meat - even if you hate factory farming

Oh I think I see what you are arguing (that you should only care about whether or not eating meat is net good or net bad, theres no reason to factor in this other action of the donation offset)

Specifically then the two complaints may be:

You specify that $0 < E$ in your graph, where you are using E represent -1 * amount of badness. While in reality people are modelling $E$ as negative (where eating meat is instead being net good for the world)
People might instead think that doing E is 'net evil' but also desirable for them for another reason unrelated to that (maybe for some reason like 'i also enjoy eating meat'). So here, if they only want to take net good actions while also eating meat, then they would offset it with donations. The story you outlined above arguing that 'The concept of offsetting evil with good does not make sense' misses that people might be willing to make such a tradeoff

I think I agree with what you are saying, and might be missing other reasons people are disagree voting

Why you should eat meat - even if you hate factory farming

Cody Rushing1mo24

Responding to your confusion about disagreement votes: I think your model isn't correctly describing how people are modelling this situation. People may believe that they can do more good from [choosing to eat meat + offsetting with donations] vs [not eating meat + offsetting with donations] because of the benefits described in the post. So you are failing to include a +I (or -I) term that factors in peoples abilities to do good (or maybe even the terminal effects of eating meat on themself).

Four ways learning Econ makes people dumber re: future AI

Cody Rushing2moΩ-100

Regardless of the whether these anti-pedagogy's are correct, I'm confused about why you think you've shown that learning econ made the economists dumber. It seems like the majority of your tweets you linked, excluding maybe Tweet 4, are actually just the economists discussing narrow AI and failing to consider general intelligence?

If you meant to say something like 'econ pedagogy makes it hard for economists to view AGI as something that could actually be intelligent in a way similar to humans', then I may be more inclined to agree with you.

The (Unofficial) Rationality: A-Z Anki Deck

Cody Rushing5mo51

The deck appears to have no notes in it

Shortform

Cody Rushing7mo-10

My strong downvote just gave +3

The Case Against AI Control Research

Cody Rushing9mo21

Hmm, when I imagine "Scheming AI that is not easy to shut down with concerted nation-state effort, are attacking you with bioweapons, but are weak enough such that you can bargain/negotiate with them" I can imagine this outcome inspiring a lot more caution relative to many other worlds where control techniques work well but we can't get any convincing demos/evidence to inspire caution (especially if control techniques inspire overconfidence).

But the 'is currently working on becoming more powerful' part of your statement does carry a lot of weight.

The Case Against AI Control Research

Cody Rushing9mo42

Control research exclusively cares about intentional deception/scheming; it does not aim to solve any other failure mode.

(nitpick, doesn't address main point of article) I think this is incomplete. Though control research does indeed care a lot about scheming, control can be used more broadly to handle any worst-case deployment behavior. See Josh Clymer's post about Extending control evaluations to non-scheming threats.

How do you deal w/ Super Stimuli?

Cody Rushing10mo20

This might not work well for others, but a thing that's worked well for me has been to (basically) block cheap access to it with anticharities. Introducing friction in general is good

Anthropic: Three Sketches of ASL-4 Safety Case Components

Cody Rushing1y30

I'm glad to see this. Some initial thoughts about the control safety case:

I don't get how the "monitoring game" is different from the red-team game? Is it because the blue team is limited to only considering protocols that use just monitoring? Or do you not expect to play a full red-team simulation (this would be worrisome)
Option 2 (honeypotting) feels doomed (in isolation). This primarily stems from my belief that making situations that are indistinguishable from deployment is probably hard - the model has control over state in very broad and hard-to-predict ways, and this can implicitly encode that it is in deployment (for instance, the model can see code that it itself has written and would've taken a long time to write, and realize it has likely been deployed for some time). You can still run honeypotting tests using this newer distribution, but then you have to hope that you are running honeypotting frequently enough to catch when the model is confident it is in deployment.
The elicitation problem, and the broader problem of justifying that the red-blue teaming has been sufficient, indeed feel quite hard. This is one reason as to why you might expect that control cases can only work for human-level AI - if your 'elicitation protocol' just ends up being 'use attack strategies made primarily by humans'.
- There are related problems in justifying how many affordances to give to the red team. For instance, decisions related to if the red team should be able to give/train into the model 'coordination signals' are likely important, but might not be easy to make. Past research has been maximally conservative, but this might not be realistic.
The limitation that you are measuring P(catastrophe | misaligned) rather than just P(catastrophe) is fairly large, and I would be interested in discussion on how important this is to others. More broadly, I don't feel great about our chances of bounding P(catastrophe) for reasons such as this.

A Narrow Path: a plan to deal with AI extinction risk

Cody Rushing1y52

Am I correct in interpreting that your definition of "found system" would apply nearly all useful AI systems today such as ChatGPT, as these are algorithms which run on weights that are found with optimization methods such as gradient descent? If so, it is still fairly onerous.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments