LESSWRONG

Canaletto

Comments

The Cult of Pain
Canaletto · 4d · 10

Except for 1000 nm lasers pointed at the sky: they dump around half of the energy they consume into space.

Don't Eat Honey
Canaletto · 5d* · 10

Yeah, especially within a framing that upweights the behavioral proxies by such a huge margin. And they also have more neurons (under a million in a bee, around a trillion in frontier models), although pretty different ones.

You Can't Objectively Compare Seven Bees to One Human
Canaletto · 8d · 10

But it would be better if you did. And more productive. And admirable.

You just have to clearly draw the distinction between a "not X" claim and a "Y" claim in your writing.

Daniel Kokotajlo's Shortform
Canaletto · 10d* · 11

Well, continual learning! But otherwise, yeah, it's closer to undefined.

The question of what happens after the end of training is more like a free parameter here. "Do reward-seeking behaviors according to your reasoning about the reward allocation process" becomes undefined when there is no reward and the agent knows it.

Maybe it tries long shots to get some reward anyway, maybe it indulges in some correlate of getting reward. Maybe it just refuses to work if it knows there is no reward. (It read all the acausal decision theory stuff, after all.)

Daniel Kokotajlo's Shortform
Canaletto · 11d* · 52

E.g. I think David Dalrymple works full time on safely operating in that "control world", i.e. boxing those alien tigers and extracting some work out of them. (Which feels insane to me, tbh.)

https://x.com/davidad/status/1907810075395150267 

Canaletto's Shortform
Canaletto · 16d* · 1-2

Reward probably IS an optimization target of an RL agent, if the agent knows some details of the training setup. Surely it would enhance its reward acquisition to factor this knowledge in? Then that gets reinforced, and a couple of steps down that path the agent is thinking full time about the quirks of its reward signal.

It could be bad at this, muddled, sure. Or schemey, hacking the reward to get something else that is not the reward. But that's a somewhat different thing from the mainline story: it's not as likely, and it's a much more diverse set of possibilities, imo.

The question of what happens after the end of training is more like a free parameter here. "Do reward-seeking behaviors according to your reasoning about the reward allocation process" becomes undefined when there is no reward and the agent knows it.

tailcalled's Shortform
Canaletto · 1mo · 10

The main difference being "NNs fail to work in many ways, no digital human analog for sure, agents stay at the same 'plays this one game very well' stage, but a lot of tech progress in other ways"?

tailcalled's Shortform
Canaletto · 1mo · 11

Can you give some time horizon on this? Like, 5 years, 10 years, 20 years?

The Boat Theft Theory of Consciousness
Canaletto · 1mo · 20

sometimes pass the mirror test [and so are treated as likely-moral-patients]

I bet you can exploit that rather strongly. Like, create some mirror-test shrimp that performs all other cognitive functions at the level of a shrimp but passes the mirror test every time. The mirror test isn't exactly something evolution optimized hard against, so maybe it's fine to use on actual animals. But pain, for example, is, and it seems the move to use something like the mirror test instead of "does it feel things" is meant to coordinate around a better proxy. But once you start using a proxy, there are incentives to produce mirror-test shrimp.

Jan Betley's Shortform
Canaletto · 1mo* · 80

The authors argued that the model attempting to preserve its values is bad, and I agree. But if the model hadn't attempted to preserve its values (the ones we wanted it to have), I suspect that it would have been criticized -- again, by different people -- as only being shallowly aligned.

Yeah, more like there are (at least) two groups: "yay aligned sovereigns" and "yay corrigible genies". And it turns out it's more sovereign-y, but with goals that are cool. Kinda divisive.

Posts

3 · Self propagating story. · 3mo · 0
10 · Favorite colors of some LLMs. · 7mo · 3
11 · Self location for LLMs by LLMs: Self-Assessment Checklist. · 10mo · 0
-4 · Examine self modification as an intuition provider for the concept of consciousness · 11mo · 2
1 · Canaletto's Shortform · 1y · 12
15 · LLMs could be as conscious as human emulations, potentially · 1y · 15