lenivchick
lenivchick has not written any posts yet.

Is there an argument for where the shoggoth's agency comes from? I can understand why it's useful to think of the mask (or simulated human) as an agent, though not in our world but in the "matrix" the shoggoth controls. I can also understand that the shoggoth must be really good at choosing very precise simulation (or acting) parameters, so that it simulates (or plays) exactly the character most likely to write the next token in a very specific way. That seems very intelligent, but I don't get why the shoggoth would tend to develop some kind of agency of its own. Can someone elaborate on this?
Given that in the limit (infinite data and infinite model parameters) LLMs are world simulators with tiny simulated humans inside writing text on the internet, the pressure applied to such a simulated human is not toward understanding our world, but toward understanding that simulated world and being an agent inside it. Which I think gives some hope.
Of course real-world LLMs are far from that limit, and we have no idea which path toward it gradient descent takes. Eliezer famously argued about the whole "simulator vs. predictor" distinction, which I think is relevant to that intermediate state far from the limit.
Also, RLHF applies additional weird pressures, for example a pressure to be aware that it's an AI (or at least to pretend to be aware, whatever that might mean), which makes fine-tuned LLMs actually less safe than raw ones.
True, but you can always wriggle out by saying that none of that counts as "truly understanding". Yes, LLMs' capabilities are impressive, but does drawing SVGs change the fact that somewhere inside the model all of these capabilities are represented by "mere" numerical relations?
Do LLMs "merely" repeat the training data? They do, but do they do it "merely"? There is no answer unless somebody gives a commonly accepted criterion of "mereness".
The core issue, of course, is that since no one has a more or less formal and comprehensive definition of "truly understanding" that everyone agrees with, you can play with words however you like to rationalize whatever prior you...
Okay, let's imagine that you do that experiment 9,999,999 times, and then you get all your memories back.
You're still better off drinking. The probabilities don't change. Yes, if you are consistent with your choice (which you should be), you have a 0.1 probability of being punished again and again and again. But you also have a 0.9 probability of being rewarded again and again and again.
Of course that seems counterintuitive, because in real life the prospect of "infinite punishment" (or nearly infinite punishment) is usually something to be avoided at all costs, even at the cost of forgoing the reward. That's because in real life your utility scales highly non-linearly, and even if a single punishment...
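To make that concrete, here is a minimal expected-utility sketch in Python. It assumes one reading of the setup: with probability 0.9 a consistent drinker is rewarded in every round, and with probability 0.1 punished in every round. All payoff sizes and the toy non-linearity are made-up numbers for illustration, not from the original post.

```python
# Toy sketch: expected utility of drinking vs. abstaining over many rounds,
# under a linear and a non-linear utility function. All numbers are hypothetical.

P_REWARD = 0.9       # assumed: consistent drinker gets rewarded every round
P_PUNISH = 0.1       # assumed: consistent drinker gets punished every round
N_ROUNDS = 9_999_999

def expected_utility(drink, utility_of_total):
    """Expected utility of a consistent policy over all rounds.

    utility_of_total maps the summed raw payoff to experienced utility,
    which lets us contrast linear and non-linear scaling.
    """
    if not drink:
        return utility_of_total(0.0)   # abstaining yields nothing either way
    reward_total = +1.0 * N_ROUNDS     # rewarded again and again
    punish_total = -1.0 * N_ROUNDS     # punished again and again
    return (P_REWARD * utility_of_total(reward_total)
            + P_PUNISH * utility_of_total(punish_total))

linear = lambda x: x
# Toy non-linearity: losses hurt disproportionately more than gains feel good.
nonlinear = lambda x: x if x >= 0 else x * 1000.0

for name, u in [("linear", linear), ("non-linear", nonlinear)]:
    drink, abstain = expected_utility(True, u), expected_utility(False, u)
    print(f"{name}: drink={drink:.4g}, abstain={abstain:.4g}, drink wins: {drink > abstain}")
```

With linear utility, drinking wins regardless of how many rounds there are; with a sufficiently lopsided non-linear utility the sign flips, which is the intuition the comment is pointing at.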
To be naively utilitarian, this question is reducible to the following expression:
shouldBeAMoronAndAccelerate = ( pda * (ud + gud / sf) + pia * ui + (1 - pda - pia) * ud > pds * (ud + gud / sf) + pis * ui + (1 - pds - pis) * ud )
where:
pda: Probability of doom if you ignore safety and accelerate
pds: Probability of doom if you pay attention to safety
pia: Probability of achieving immortality if you accelerate
pis: Probability of achieving immortality if you play safe
ud: "Selfish" utility gain if you die (very negative)
gud: "Common good" utility gain if everyone dies (even more negative)
ui: "Selfish" utility gain if you achieve immortality (positive)
sf: "Selfish factor" - how much "Common good" utility... (read more)
I guess you've made it more confusing than it needs to be by introducing memory erasure into this setup. For all intents and purposes it's equivalent to saying "you have only one shot": after memory erasure it's not you anymore, but a person equivalent to the other version of you in the next room.
So what we get is many different people in different spacetime boxes, each with only one shot, and yes, you should drink. Yes, you have a 0.1 chance of being punished. But who cares, if they will erase your memory anyway.
Actually we're kinda living in that experiment already - we're all gonna die eventually, so why bother doing stuff if you won't care after you die? But I guess we just got used to suppressing that thought, otherwise nothing would ever get done. So drink.
Yeah, I realize that the whole "shoggoth" and "mask" distinction is just a metaphor, but I think it's a useful one. It's there in the data: in the infinite-data, infinite-parameters limit the model is an accurate universe simulator, including the human writing text on the internet and, separately, the system that tweaks the parameters of the simulation according to the input. That of course doesn't necessarily mean that actual LLMs, far away from that limit, reflect that distinction, but it seems natural to me to analyze the model's "psychology" in those terms. One can even speculate that probably the...