williawa

Posts

williawa's Shortform (7mo)
Thoughts About how RLHF and Related "Prosaic" Approaches Could be Used to Create Robustly Aligned AIs. (2mo)

Comments

Which meat to eat: CO₂ vs Animal suffering
williawa · 3d

I've eaten 150+ grams of raw nutritional yeast in a day without much issue. I'd worry more about getting excessive amounts of certain vitamins. I agree, though, that this is not a feasible thing to do for most people. Also, it tastes terrible, so I would not recommend it.

Reminder: Morality is unsolved
williawa · 8d

The point I was trying to make was that, in my opinion, morality is not a thing that can be "solved".

If I prefer Chinese and you prefer Greek, I'll want to get Chinese and you'll want to get Greek. There's not that much more to be said. The best we can hope for is reaching some Pareto frontier so we're not deliberately screwing ourselves over, but along that Pareto frontier we'll be pulling in opposite directions.

Perhaps a better example would've been music: only one genre of music can be played from now on.

Reminder: Morality is unsolved
williawa · 8d

Here is a game you can play with yourself, or others:

a) You have to decide on five dishes, and a recipe for each dish that can be cooked by any reasonably competent chef.

b) From tomorrow onwards, everyone on Earth can only ever eat food if it is one of those dishes, prepared according to the recipes you decided on.

c) Tomorrow, every single human on Earth, including you and everyone you know, will also have their tastebuds (and related neural circuitry) randomly swapped with someone else's.

This means that you are operating under the veil of ignorance. You should make sure that the dishes you decide on are tasty to whoever you are, once the law takes effect.

Multiplayer: The first player to convince all the other players of their choice of dishes wins.

Single player: If you play alone, you just need to convince yourself.

Good luck!

Any corrigibility naysayers outside of MIRI?
williawa · 9d

Okay, sorry about this. You are right. I have thought up a somewhat nuanced view about how prosaic corrigibility could work, and I kind of just assumed it was the same as what Max had, because he uses a lot of the same keywords I use when I think about this. But after actually reading the CAST article (well, parts 0 and 1), I realize we have really quite different views.

Any corrigibility naysayers outside of MIRI?
williawa · 9d

I think I've independently arrived at a fairly similar view. I haven't read your post, but I think the corrigibility basin thing is one of the more plausible and practical ideas for aligning ASIs. The core problem is that you can't just train your ASI for corrigibility, because it will sit and do nothing; you have to train it to do stuff. And then these two training schemes will grate against each other, which leads to tons of bad stuff happening, e.g. it's a great way to make your AI a lot more situationally aware. This is an important facet of the "anti-naturality" thing, I think.

How Well Does RL Scale?
williawa · 10d

I'm getting somewhat confused about information-theoretic arguments around RL scaling. What makes sense to me is this: information density per token is constant in pre-training, no matter how long you make contexts, but decreases as 1/n as you make RL trajectories longer. This means that if you look at just scaling up context length, RL should get asymptotically less efficient.

What's not clear to me is the relationship between "bits getting into the weights" and capabilities. Using the information-theoretic argument above, you'd probably get that in o3, one millionth of the information in the weights comes from RL, or something like that; I'm not sure. But o3's advance in capabilities over 4o seems clearly far more than a one-millionth-factor improvement. I think this would be true even if you work to disentangle inference-time scaling and RL scaling, e.g. by looking at the ratio of bits in o1 vs o3. The number of additional bits in o3 over o1 is very small, but thinking for the same amount of time, the difference is very noticeable.
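
As a rough back-of-envelope of what I mean (every number below is a made-up placeholder for illustration, not a real figure for any model):

```python
# Back-of-envelope: bits absorbed into the weights from pre-training vs. RL.
# Every number here is an assumed placeholder, not a real o3/4o/o1 figure.

# Pre-training: roughly O(1) bits of supervision per token, since every
# position gets a next-token target.
pretrain_tokens = 10e12          # assumption: ~10T pre-training tokens
bits_per_pretrain_token = 1.0    # assumption: ~1 bit of new information per token
pretrain_bits = pretrain_tokens * bits_per_pretrain_token

# RL: roughly O(1) bits per trajectory (a scalar or pass/fail reward), so the
# per-token information rate falls off as 1/n with trajectory length n.
rl_trajectories = 1e7            # assumption: ~10M RL trajectories
bits_per_trajectory = 1.0        # assumption: ~1 bit per trajectory
trajectory_length = 10_000       # assumption: tokens per trajectory
rl_bits = rl_trajectories * bits_per_trajectory

print(f"pre-training bits into weights: {pretrain_bits:.1e}")
print(f"RL bits into weights:           {rl_bits:.1e}")
print(f"RL share of total bits:         {rl_bits / (pretrain_bits + rl_bits):.1e}")
print(f"RL bits per trajectory token:   {bits_per_trajectory / trajectory_length:.1e}")
```

Under those made-up numbers RL contributes on the order of a millionth of the bits, which is why the capability jump looks so disproportionate to me.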

Towards a scale-free theory of intelligent agency
williawa · 10d

I think this is the wrong view, but it's possible I'm misunderstanding. In my way of viewing things, the incentives of the bargainers and of the coalition are where the analysis should start.

Meaning, you get a list of conditions more like:

  1. Either person will only join a coalition if their expected utility is higher after joining than before joining.
  2. The coalition should be "fair", meaning both sides have equal expected increases in utility from before to after joining.

I think what you end up with is that the utility function of the coalition is something like min(a(E1(x) − E1(x0)), b(E2(x) − E2(x0))), where E1 and E2 are the expected utility functions of the two agents, Ei(x) is the raw expected utility of outcome x, and Ei(x0) is an honest estimate of current (status quo) utility.

And a, b should be "normalizing" constants to make the utilities comparable, just to avoid, say, me scaling down my utilities 100x to get a better deal. (I don't actually know if this construction makes sense. Finding a fair way to compare across utility functions seems hard. I'm interested if there's good writing on this.)
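
Here's a minimal sketch of the construction I have in mind (the agents, scales, and candidate outcomes are hypothetical placeholders, just to make the min(...) concrete):

```python
# Sketch of the coalition utility: min(a*(E1(x)-E1(x0)), b*(E2(x)-E2(x0))).
# Agents, scales, and outcomes below are hypothetical placeholders.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    expected_utility: Callable[[str], float]  # E_i(x): honest expected utility of outcome x
    status_quo: str                           # x0: the no-coalition outcome
    scale: float                              # normalizing constant a or b

    def gain(self, outcome: str) -> float:
        """Normalized gain over the status quo: scale * (E(x) - E(x0))."""
        return self.scale * (self.expected_utility(outcome) - self.expected_utility(self.status_quo))


def coalition_utility(outcome: str, agent_1: Agent, agent_2: Agent) -> float:
    """The coalition only values an outcome to the extent that *both* members
    gain relative to not joining (condition 1), and it maximizes the smaller
    gain, which pushes toward equal gains (condition 2)."""
    return min(agent_1.gain(outcome), agent_2.gain(outcome))


# Hypothetical example: two ways of splitting a surplus.
e1 = {"no_deal": 0.0, "favors_agent_1": 12.0, "even_split": 6.0}
e2 = {"no_deal": 0.0, "favors_agent_1": 1.0, "even_split": 6.0}

a1 = Agent(expected_utility=lambda x: e1[x], status_quo="no_deal", scale=1.0)
a2 = Agent(expected_utility=lambda x: e2[x], status_quo="no_deal", scale=1.0)

for x in ["favors_agent_1", "even_split"]:
    print(x, coalition_utility(x, a1, a2))
# The coalition prefers "even_split" (min gain 6.0) over "favors_agent_1"
# (min gain 1.0), even though total raw utility is higher under "favors_agent_1".
```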

1a3orn's Shortform
williawa · 12d

FYI (in case it wasn't you, or was by accident), you answered, but then the comment was deleted for some reason.

If you had an answer, I'm interested.

The IABIED statement is not literally true
williawa · 13d

This is maybe the most contentious point in my argument, and I agree this is not at all guaranteed to be true, but I have not seen MIRI arguing that it’s overwhelmingly likely to be false.

 

Hm? I feel this is basically the single argument they make in the whole first third of the book, "You don't get what you train for" et cetera. I think they'd disagree that current LLMs are aligned, like at all, and that getting ASIs "about as aligned as current LLMs" would get us all killed instantly.

I think this is what you should argue against in a post like this. The brain emulations and collective intelligence do none of the heavy lifting. Ironically, I've heard Eliezer on several occasions literally propose "getting the smartest humans, uploading them, running them for a thousand subjective years" as a good idea.

For the record: I think their argument is coherent, but it doesn't justify the level of confidence they display. I'd put like ~50% on "If anyone builds it, with anything remotely like current techniques, everyone dies". Maybe 75% if a random lab does it under intense time pressure, and like 25% if a safety-conscious international project does it, with enough time to properly, thoroughly, and carefully implement all the best prosaic safety techniques, but without enough time to make any really fundamental new approaches or changes to how the AIs are created.

Meditation is dangerous
williawa · 14d

Have you done any research on the prevalence of meditation-related mental problems in countries where it is a tradition, like Thailand or Myanmar?

 

That seems like it should be instructive in disentangling

  1. Meditation is not dangerous at all, and it's just people going crazy for normal reasons while meditating, at around the rates that base rates would suggest.
  2. Meditation is dangerous if done incorrectly, or without the right context and cultural support.
  3. Meditation is just always dangerous, at least if you are at high risk for mental problems.

 

I've done a fair bit of sutta-reading, and I can't remember reading that much about mental problems coming with meditation (or phenomena described in a way where it seems like they'd be describing something we'd refer to as mental problems). But they are obviously a biased source.

Anecdotally, I've done a fair bit of serious meditation, and know many others who have, and I haven't had any experiences with people going crazy. I've had crazy experiences and know many people who've had crazy experiences, but no one who permanently lowered their functioning, or even lowered their functioning for more than a day. I've had one experience with a guy who meditated a lot and kind of lost motivation to do anything career-wise. This seems like it could be quite bad, but he seems super happy, and it's hard to say whether this is actually bad or not. In the cases I see or hear of where this happens, the person in question was often pursuing their career for dumb reasons: they weren't doing it for authentic reasons and didn't really enjoy it, but were trying to appease their parents, or had some complex where they think their life is worthless if they can't become a multimillionaire, and they're currently not a multimillionaire and feel extreme anguish about it.
