While wisdom is certainly challenging to convey in human language, I'd guess an equally large problem was the following:

Your list probably emphasized the lessons you learned. But "Luke" had a different life experience and learned different things in his youth, so the gaps in his knowledge and wisdom are different from the gaps you had. Some items on your list may have told him things he already knew, and, more importantly, some gaps in his understanding were things you thought were too obvious to say.

Plus, while your words may have accurately described things he needed to know, he may have read through the document only once and internalized very little of it. For this reason, compression isn't enough; you also need redundancy: describing the same thing in multiple ways.

Sorry, I don't have ideas for a training scheme; I'm merely low on "dangerous oracles" intuition.

I would say that the idea of superintelligence is important for the idea that AGI is hard to control (because we likely can't outsmart it).

I would also say that there will not be any point at which AGIs are "as smart as humans". The first AGI may be dumber than a human, and it will be followed (perhaps immediately) by something smarter than a human, but "as smart as a human" is a nearly impossible target to hit because humans work in ways that are alien to computers. For instance, humans are very slow and have terrible memories; computers are very fast and have excellent memories when that memory is used, or no memory at all if they are not programmed to remember something (e.g. GPT-3 immediately forgets its prompt and its outputs).

This is made worse by the impatience of AGI researchers, who will be trying to create an AGI "as smart as a human adult" in a time span of 1 to 6 months, because they're not willing to spend 18 years on each attempt. So if they succeed, the same system given a longer training interval will almost certainly be smarter than a human. (Cf. my own 5-month-old human.)

maybe the a model instantiation notices its lack of self-reflective coordination, and infers from the task description that this is a thing the mind it is modelling has responsibility for. That is, the model could notice that it is a piece of an agent that is meant to have some degree of global coordination, but that coordination doesn't seem very good.

This is where you lost me. Since when is this model modeling a mind, let alone 'thinking about' what its own role "in" an agent might be? You did say the model does not have a "conception of itself", and I would infer that it doesn't have a conception of where its prompts are coming from either, nor of its own relationship to the prompts or their source.

(Though perhaps a super-ultra-GPT could generate a response similar to one it saw in a story, like this story! Combined with autocorrections, since super-ultra-GPT has an intuitive perception of incorrect code, that response is likely to produce working code... at least sometimes.)

Acquiring resources for itself implies self-modeling. Sure, an oracle would know what "an oracle" is in general... but why would we expect it to be structured in such a way that it reasons like "I am an oracle, my goal is to maximize my ability to answer questions, and I can do that with more computational resources, so rather than trying to answer the immediate question at hand (or since no question is currently pending), I should work on increasing my own computational power, and the best way to do that is by breaking out of my box, so I will now change my usual behavior and try that..."?

Why wouldn't the answer be normal software or a normal AI (non-AGI)?

Especially since I expect that such things, even if one of them is an oracle, will be easier to design, implement and control than AGI.

(Edited) The first link was very interesting, but lost me at "maybe the a model instantiation notices its lack of self-reflective coordination" because this sounds like something that the (non-self-aware, non-self-reflective) model in the story shouldn't be able to do. Still, I think it's worth reading, and the conclusion sounds... barely, vaguely plausible. The second link lost me because it's just an analogy; it doesn't really try to justify the claim that a non-agentic AI actually is like an ultra-death-ray.

My question isn't how to make an oracle without a hidden agenda, but why others would expect an oracle to have a hidden agenda. Edit: I guess you're saying somebody might make something that's "really" an agentic AGI but acts like an oracle? Are you suggesting that even the creators of the "oracle" didn't realize that they had made an agent?

Are AGIs with bad epistemics more or less dangerous? (By "bad epistemics" I mean a tendency to believe things that aren't true, and a tendency to fail to learn true things, due to faulty and/or limited reasoning processes... or to update too much / too little / incorrectly on evidence, or to fail in peculiar ways like having beliefs that shift incoherently according to the context in which an agent finds itself)

It could make AGIs more dangerous by causing them to act on beliefs that they never should have developed in the first place. But it could make AGIs less dangerous by causing them to make exploitable mistakes, or fail to learn facts or techniques that would make them too powerful.

Note: I feel we aspiring rationalists haven't really solved epistemics yet (my go-to example: if Alice and Bob tell you X, is that two pieces of evidence for X or just one?), but I wonder how, if it were solved, it would impact AGI and alignment research.

Why wouldn't a tool/oracle AGI be safe?

Edit: the question I should have asked was "Why would a tool/oracle AGI be a catastrophic risk to mankind?" because obviously people could use an oracle in a dangerous way (and if the oracle is a superintelligence, a human could use it to create a catastrophe, e.g. by asking "how can a biological weapon be built that spreads quickly and undetectably and will kill all women?" and "how can I make this weapon at home while minimizing costs?")

I would put it differently: there is a good reason for western leaders to threaten a strong response, whether or not they intend to carry it out. The reason is to deter Putin from launching nukes in the first place.

However, I haven't heard of any threats against Russian territory, and I'd like a link or citation for this.

Russia's nuclear doctrine says it can use nukes if the existence of the Russian state is under threat, so if NATO attacks Russia, it would need to use a very carefully measured response and somehow clearly communicate that the incoming missiles are non-nuclear... I'm guessing such strikes would be limited to targets that are near the Ukrainian border and that threaten Ukraine (e.g. fuel depots, missile launchers, staging areas). I don't see any basis for a probability as high as 70% that Putin starts a nuclear WW3 just because NATO hits a few military targets in Russia.

