for QACI, i intend to use pieces of data (constant-length raw bitstrings, or equivalently bounded natural numbers) to act kind of as "coordinates" or "pointers" around things we care about in the physical world, not just in space-time-timelines but also in encoding: a "location" for a blob of data would describe how that piece of data is written on the physical elementary particle structure of a hard drive or bar of memory, in a physics-world being run by a hypothesis in solomonoff induction, or simply on the universal program.

for my purposes, i need to be able to:

  • locate three things in a world-hypothesis: the question q, the answer r, and the AI G.
  • filter for locations of q,r,G where q<r<G, with < being a partial order expressing causality — a pretty natural notion in turing machines.
  • be able to run counterfactual q's and get the resulting counterfactual r's.
  • possibly, locate G's (single?) action a and test for counterfactuals a' for the purpose of embedded agency, if we need to solve that.

to put this together, it seems like what we need is a way to locate pieces of data, and to be able to filter for precedence, and to hypothesize counterfactuals.

the format that i expect this to look like is a set of weighted hypotheses:

  • 40% chance that the blob is here and this would be how to insert a counterfactual in its place
  • 25% chance that it's there instead, and that would be how to insert a counterfactual in its place
  • etc…

"filtering for causality" would mean that, given one specific weighed location hypothesis for x and another for y, we can get an answer — either a boolean or a degree of confidence — about whether x<y or not. thus, if we require that x<y, we can either rule out or reduce the confidence in pairs of weighed location hypotheses that don't verify x<y.

i tentatively draw up such a potential blob-locating scheme here using "carving functions" C, but they might not be super well explained, they seem wonky, and they don't filter for causality.

we would want the signal from the set of correct hypotheses to outweigh incorrect hypotheses. there are two different shapes for failure to correctly locate blobs:

  • "naive" mislocations, where the hypotheses don't really capture what we want and inserting a counterfactual would result in garbled results
  • adversarial mislocations, where some agent (such as the AI trying to make its own job easier, or remote aliens) is arranging their future lightcone to grab as much as they can of the mass of blob location hypotheses, and through those manipulate the behavior of our AI that uses them.

the latter case is particularly important: we need the AI to be unable to hijack the question-answer interval even if it fills almost its entire lightcone with fake but maximally-easily-locatable question-answer intervals. we need to go: "well, in most of the probability mass, the first instance of the AI is here, and so we'll only look at question-answer intervals before that".
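
a toy sketch of that cut, under heavy assumptions: locations are reduced to integer time steps, each AI-location hypothesis is a hypothetical `(steps, weight)` pair listing where copies of the AI appear in that world, and the function names are made up for this example. it only shows the shape of the rule "take the first instance of the AI in most of the probability mass, then keep question-answer intervals before it"; it says nothing about how to get trustworthy weights in the first place.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]             # (question step, answer step)
AiHypothesis = Tuple[List[int], float]  # (steps where the AI appears, weight)


def heaviest_first_ai_step(ai_hyps: List[AiHypothesis]) -> int:
    """Return the first-instance step that holds the most probability mass."""
    mass: Dict[int, float] = {}
    for steps, weight in ai_hyps:
        first = min(steps)  # only the earliest instance in each hypothesis counts
        mass[first] = mass.get(first, 0.0) + weight
    return max(mass, key=mass.get)


def intervals_before_ai(intervals: List[Interval], ai_step: int) -> List[Interval]:
    """Keep only intervals with q < r < G, using the chosen AI step as G."""
    return [(q, r) for q, r in intervals if q < r < ai_step]
```

with this rule, fake intervals planted after the AI's first instance are discarded no matter how many of them there are, since they fail `r < ai_step`.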

because we care about "the first instance of the AI", sticking to solomonoff induction (where each world is its own program) rather than taking the entire universal program seems like it might make sense.

when it comes to inserting counterfactuals, it starts to seem like it would matter a whole lot whether the pattern being injected is (from the point of view of the AI being launched) more (quantum-)random like q or more deterministic like G, because if the question q is quantum-random, then "injecting the counterfactual" might simply look like "find the other many-worlds timeline where q was a different string instead".

it even seems like it might be possible to experiment with such a blob-locating framework to some extent.



Can you make this a bit less abstract by providing some examples for q, a, and G and how they are grounded in measurable things?
