Counterfactual Civilization Simulation Version -1.0 aka my application to Johannes Mayer's SPAR project

Pi Rogers

This is my "object level output" submission for Johannes Mayer's 2024 SPAR Application (the linked doc seems to be reused from the 2023 AISC application). Unless otherwise noted, all quote blocks in this post are from the application question doc.

For those of you reading this who aren't Johannes Mayer: I don't think this is the best use of your time, but your judgement on that is likely better than mine, especially once it's conditioned on mine, so if you still want to, read on!

0. The Problem

Make the following assumptions:
Reality can be perfectly modelled by a discrete model (including time).
You can compute everything that can be computed using finite memory and compute instantly.
You know the current state of the world perfectly.
You know the laws of physics perfectly.

Using these assumptions, come up with a high-level plan that when executed saves the world with very high probability. Be careful not to generate a missing steps plan.

If any constraints listed here are holding you back, and you think you could do better without them, ignore them! If there is a change you can make to the instructions such that you can get better outputs, make that change. If you made any changes (including ignoring instructions), briefly list them in the beginning and briefly explain how each change you made is an improvement over the original version.

Here are the changes I have made:

ADDITIONAL CONSTRAINT: You are unable to communicate to anyone in real life the fact that you have this unbounded compute and complete world model, nor ask them hypotheticals to this effect or do anything else that attempts to circumvent this constraint.
- Reason: To prevent me from answering with what I would actually do if I had this power, which would be to talk to a bunch of really smart alignment researchers (e.g. Eliezer Yudkowsky, John S. Wentworth, Tamsin Leake) and ask them what to do with my instant unbounded compute and perfect world model. They can probably do better than I can, but this "solution" is not very good for serving the actual purpose of this exercise.

1. The Plan

Short Summary

Grab Eliezer Yudkowsky, Nate Soares, and a bunch of other really smart, really sane people from our world model. Make sure to grab enough people to healthily propagate the species if necessary. Also grab some offices, labs, farms, etc.; everything they would need to survive on a mostly lifeless planet. Then, simulate all of that on ancient Earth, as early as there was enough oxygen in the atmosphere, so probably around 2 billion years ago. Set up the simulation so that it terminates once a predetermined "Signal Event" occurs, and then outputs the contents of a predetermined output channel. Since the result of this simulation can be computed with finite (albeit large) amounts of memory and compute, I can compute it instantly. If the plan worked, the output should contain instructions for a pivotal act that saves the world with very high probability (e.g. code for an aligned one-shot AGI). Execute those instructions.

Long Summary

Vessel Location

Assuming our complete knowledge of the world state is only a low-level physics model, it is nontrivial to "grab" high-level structures like people and farms. Also, our world model presumably contains the whole universe, so it is nontrivial to even locate the Earth within it. The method I found for getting around this is to put all of the people and things we want to grab into a designated area that we will call "the vessel". Then we will mark the vessel with something easily identifiable in the low-level model. Call this thing "the flag". The flag will be a pattern of tiles on a square grid, with some tiles made of Chromium, and others made of Zirconium (metals rare enough that it would be really weird for the pattern to come up naturally, but not so rare that I'm unable to buy the necessary amounts with the vast amount of money I can make using my unbounded computational power). The pattern will be a binary sequence encoding a bunch of data entangled with our planet e.g. the entirety of Wikipedia (this is probably overkill. Just a few bytes of data should be more than enough). Then we will specify this pattern and tell our god-computer to locate the flag and "grab" a certain volume (specified in natural units) below it (this volume will contain the vessel). Things necessary for this part of the plan:

Specification of the flag as something our god-computer can locate within a low level model of the universe
A flag location algorithm that is robust against faraway aliens/unaligned superintelligences trying to hijack it
Specification of the vessel relative to the flag
Actually building the vessel and the flag in real life
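Here is a minimal Python sketch of how a few bytes could be laid out as a square grid of Chromium and Zirconium tiles and then read back. It is meant only to make the flag idea more concrete. Several details are my own illustrative assumptions and are not part of the plan as stated: the bit order (most significant bit first, filled row by row), the zero padding, and the function names `bytes_to_grid`, `grid_to_bytes` and `render`. The 0 → Zirconium, 1 → Chromium convention matches the bit $b$ used in the formalization in part 2.

```python
import math
from typing import List

# Convention from part 2: bit 0 -> Zirconium tile, bit 1 -> Chromium tile.
TILE = {0: "Zr", 1: "Cr"}

def bytes_to_grid(data: bytes) -> List[List[int]]:
    """Hypothetical encoder: spread the bits of `data` (MSB first) row by row
    over the smallest n x n grid that holds them, padding the tail with 0-bits."""
    bits = [(byte >> (7 - k)) & 1 for byte in data for k in range(8)]
    n = math.isqrt(len(bits) - 1) + 1 if bits else 1  # smallest n with n*n >= len(bits)
    bits += [0] * (n * n - len(bits))
    return [bits[row * n:(row + 1) * n] for row in range(n)]

def grid_to_bytes(grid: List[List[int]], num_bytes: int) -> bytes:
    """Inverse of bytes_to_grid. The reader must know num_bytes in advance,
    because trailing zero padding is otherwise indistinguishable from data."""
    bits = [bit for row in grid for bit in row][: 8 * num_bytes]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))

def render(grid: List[List[int]]) -> str:
    """Show which metal goes on each tile."""
    return "\n".join(" ".join(TILE[bit] for bit in row) for row in grid)
```

Reading an Output Vessel flag would need the same decoding step, with the byte count agreed in advance.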

Ancient Earth Simulation

After grabbing the vessel, we run the Earth back 2 billion years (with perfect knowledge of physics, we can run a simulation backwards in time). Then we place the vessel on the surface of this ancient Earth and run the simulation forward! If the plan is successful, the researchers will likely start by repopulating the Earth and building a civilization that has a higher sanity waterline and is better at coordinating than ours. In particular, this civilization will take AI risk seriously and not rush headfirst into AGI. This civilization will also know that it is being simulated, and it will know how to send a message back to us once it solves alignment. Likely the message will actually end up being sent by the friendly AGI that the civilization builds, which will be able to reason about our world very well (especially given all of the data we put in the vessel) and know the best message to send to us. The flip side of this, though, is that if this civilization dies or goes insane, the message we receive might instead be built by a different intelligent civilization that evolves later, or perhaps by an unfriendly AGI. To prevent this, we will have a specified "check-in" condition that the civilization will use to signal to us that it is still alive and well. An idea I had for such a condition is "if a hundred years go by without n photons in this specified frequency range exiting the Earth's atmosphere (defined in terms of distance from the center of gravity), then terminate the simulation and output an error message with a few snapshots of the vessel destination along its timeline up until that point", where the frequency range is one used for cellular or radio communications (so that it's really easy for our civilization to confirm aliveness), and n is large enough that black-body radiation from the Earth and reflected sunlight aren't enough.
Also, we'll only simulate our solar system (which we'll define as a sphere of a specified radius with the sun's center of gravity as its center) to prevent grabby aliens and faraway superintelligences from hijacking the simulation. Things necessary for this part:

Specification of "center of gravity" and the like.
Figuring out the best frequency range and n
Coming up with more and better safety checks to prevent weirder problems and become more robust against normal problems
Making sure that simulating only the solar system doesn't mess things up in ways that I don't realize because I don't know much astrophysics. Like, is the gravity from our galaxy's black hole important for some reason? Do random cosmic rays from outside stop our sun from exploding? I have no clue.
Ideally, include a way for this simulation to proceed for arbitrary lengths of time while preserving our civilization's ability to survive. Maybe throw in an artificial negentropy generator?
Figure out who and what to bring in the vessel to maximize the chances that they create a good, stable civilization
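The sketch below is a toy version, in Python, of the control flow described above. It has two parts: a reversible discrete "physics" that can be run backwards and then forwards, and a forward loop that stops either at the Signal Event or when the check-in condition lapses. Everything in it is a hypothetical stand-in:

- The world is a ring of bits.
- `rule` is an arbitrary made-up local law.
- `signal` and `alive` are predicates standing in for the Signal Event and for "at least n photons in the chosen band this step".
- `max_steps` is a budget that exists only because a real computer is not the god-computer.

The second-order update (next = rule(current) XOR previous) is a standard construction that makes any discrete rule exactly reversible. Placing the vessel between the backward and forward runs is omitted.

```python
from typing import Callable, Tuple

State = Tuple[int, ...]  # hypothetical toy world: a ring of bits

def rule(cells: State) -> State:
    """Hypothetical local law: each cell becomes the XOR of its two neighbours."""
    size = len(cells)
    return tuple(cells[(i - 1) % size] ^ cells[(i + 1) % size] for i in range(size))

def step(prev: State, cur: State) -> Tuple[State, State]:
    """Second-order update next = rule(cur) XOR prev; invertible for any rule."""
    nxt = tuple(r ^ p for r, p in zip(rule(cur), prev))
    return cur, nxt

def run_forwards(prev: State, cur: State, steps: int) -> Tuple[State, State]:
    for _ in range(steps):
        prev, cur = step(prev, cur)
    return prev, cur

def run_backwards(prev: State, cur: State, steps: int) -> Tuple[State, State]:
    """Feeding the pair to `step` in swapped order computes earlier states."""
    later, earlier = cur, prev
    for _ in range(steps):
        later, earlier = step(later, earlier)
    return earlier, later

def simulate(prev: State, cur: State,
             signal: Callable[[State], bool],
             alive: Callable[[State], bool],
             checkin_window: int, max_steps: int) -> Tuple[str, int, State]:
    """Run forward until the Signal Event fires, the check-in lapses, or the
    (toy-only) step budget runs out. Returns (status, step, snapshot)."""
    last_checkin = 0
    for t in range(1, max_steps + 1):
        prev, cur = step(prev, cur)
        if signal(cur):
            return "output", t, cur  # read the output channel from this state
        if alive(cur):
            last_checkin = t
        elif t - last_checkin >= checkin_window:
            return "error", t, cur  # terminate and report a snapshot instead
    return "timeout", max_steps, cur
```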

The Output

Most of the paragraph below is just speculation about the contents of the output, not technical details of the plan itself. Feel free to skip everything after the first sentence. Once our simulated beings produce a good plan, they will show it to us by building a new vessel, this time with different (pre-defined) data in the flag, but with the same general structure. Inside the Output Vessel, they will put all of the information that they want to give us. At this point, whoever (or whatever) is doing this is way smarter than I am, so whatever I can think of to put in the Output Vessel is likely worse than what they will actually end up putting, but my speculation will at least provide a lower bound. I think at the very least, they'll give us code for a self-improving one-shot AGI that executes a pivotal act that results in the world being saved with super high probability. I'd also guess that they might put brainscans of themselves into the Output Vessel, so that they can join us in the utopia we build. Unless, of course, friendly AGI is impossible, in which case they will probably just send us a more mundane but still very smart plan for preventing risks from unaligned AGI, or, if that turns out to also be impossible, preventing whatever other x-risks we face (e.g. biorisk). I'd guess they'll do this by simply telling us how to build a civilization like theirs on our world. Things necessary for this part:

Pretty much the same stuff as for vessel location

Bonus: Adapting this into a real-life plan for outer-alignment of an inner-aligned formal-goal AGI

At some point while coming up with this plan, I realized that it is actually very similar to the QACI alignment plan, and that perhaps we can turn this into a formal goal like QACI. I call this tentative alignment plan "CCS", which stands for Counterfactual Civilization Simulation. The big obstacle to this, of course, is that it would require a specification of actual physics precise enough to simulate the Earth with people on it for possibly billions of years. We of course don't expect the AGI to actually run this simulation; it's just there to make a formal goal that reliably points to our values.

Comparison of CCS vs QACI:
- QACI requires a true name of "counterfactual", but that's about it. It just needs to ask, "If we replace this blob with a question, what will most likely replace the answer blob?". Physics and everything else is expected to be inferred from the existence of this "question" blob. CCS, on the other hand, requires a prior specification of an approximation of physics at least good enough to simulate an Earth with humans for billions of years.
- QACI is a function that must be called recursively (since we aren't expecting anyone to solve alignment fully within the short interval), creating a big complicated graph. There are lots of clever tricks for preventing this from causing a memetic catastrophe, but there are lots of places these tricks can fail. CCS, on the other hand, only needs to be called once. The simulacra solving alignment have a LOT more time than we do, and they can build an entire civilization optimized around our/their goal.
- QACI is vulnerable to Solomonoff daemons and superintelligences launched within the simulated world (since it is the modern world with all of its AI development, and there might be a bunch of timelines dying during the QACI interval without us realizing). CCS immediately selects a single world without going through the universal prior, and that world is one where they can delay AI development for as long as they want!
- The output is easier to "grab" from QACI, since it's just a file on a computer that can straightforwardly be interpreted as a math expression. Though, it actually shouldn't be too hard to rig up something similar for CCS. Maybe have the Output Vessel filled with more Chromium-Zirconium checkerboards with the math expression encoded, or something like that.
- In general, CCS seems safer but also harder than QACI.

Full todo list

Roughly listed in the chronological order in which we should do these. Things marked with a (*) are things that we would need for an actual real-life alignment plan, but not in the hypothetical scenario.

Figure out formal math for the following:
- Locating the vessel flag in a way that is robust against hijacking from afar
- Simulating the past solar system 2 billion years ago
- Finding a place on the Earth's surface to put the vessel (alternatively, make the vessel very strong or attach a gigantic parachute and let the simulation just drop it from anywhere in the atmosphere)
- Identifying the check-in signal
- Detecting the output
- Reading data off of the output and giving it to us ((*) interpreting it as a utility function)
- (*) The actual physics simulation
(*) Figure out how to make an inner-aligned AGI with embedded agency whose goal is to maximize a mathematical function that we give it (obviously very infohazardous. Don't publish)
Decide who and what to put in the real-life vessel
Make the plan super-robust against any sorts of attack vectors
Figure out safe tests to ensure that the plan will work as intended, and execute those tests
(*) Make sure that no unaligned AGI is built in the meantime
Actually build the vessel and put the people inside
Run the simulation on our omegacomputer! ((*) Run our AGI with the CCS formal goal!)

Note: Unless you are Johannes Mayer evaluating me, you probably won't get much value out of reading the rest of this post, aside from maybe part 5. Anything down there that might be remotely worth reading, I'll write up much better in a separate post at some point.

2. Progress on a subproblem

Summary

I focused on the problem of formalizing flag location so that we can program it into our supercomputer. I started with neutron and proton location, then individual elements, then finding atom densities of regions of space, then seeing how close defined rectangular prisms are to being tiles of the desired type. Finally, this all culminated in a "distance" function that, when minimized, should give us a point in spacetime and some orienting vectors that correspond to the top-left corner of a correctly-built vessel flag! Todo: Add measures to defend against flag impersonations created by aliens or alien superintelligences.

The Formalizations

We'll ignore quantum physics and assume quarks are native in our physics model.
The goal: find $q = (x, τ, n, m)$, where $x$ is supposed to be the point in spacetime at the top left corner of the flag, $τ$ is the direction of time in the reference frame of Earth (since that might not be the default reference frame of our model), $n$ is a spacelike unit vector orthogonal to the plane containing the flag, pointing "upwards" out of the flag, and $m$ is a spacelike unit vector pointing in the direction that is "right" on the flag, so that $m \times n$ points "down" on the flag.
Now, to achieve this goal, we will start with the very basics: protons and neutrons, and work up from there. Unless otherwise specified, from now on assume we are in the Earth's relativistic reference frame given by $τ$.
A point $p$ in 4-spacetime is defined to be a "neutron" if there are exactly 2 down quarks and 1 up quark and no other quarks within [neutron radius] of $p$ in space, and if no point within [neutron radius] of $p$ in space has already been designated as a "neutron". We'll define proton similarly, and do the same thing for whole atoms except with protons and neutrons instead of quarks, and nucleus radius instead of neutron radius.
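Here is a minimal Python sketch of the neutron rule above, working on a single time slice. It assumes quark positions and flavours are given as a list, and `label_neutrons` is a hypothetical helper. Writing the rule out makes one thing explicit. A point is rejected if an already-designated neutron lies within the radius, so the result depends on the order in which candidate points are visited. The real definition would therefore need to fix that order; here it is simply the order of the `candidates` list.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Quark = Tuple[Point, str]  # (position, flavour): "u" = up, "d" = down, ...

NEUTRON_CONTENT = ["d", "d", "u"]  # sorted flavours of exactly 2 down + 1 up

def label_neutrons(candidates: Sequence[Point], quarks: Sequence[Quark],
                   radius: float) -> List[Point]:
    """Greedy, order-dependent version of the neutron rule on one time slice."""
    neutrons: List[Point] = []
    for p in candidates:
        nearby = sorted(flavour for q, flavour in quarks if math.dist(p, q) <= radius)
        if nearby != NEUTRON_CONTENT:
            continue  # wrong quark content, or extra quarks within the radius
        if any(math.dist(p, other) <= radius for other in neutrons):
            continue  # too close to a point already designated as a neutron
        neutrons.append(p)
    return neutrons
```

A proton version would be identical with `["d", "u", "u"]`.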
To get the density of a certain element in a certain volume, count the number of atoms of that element and divide it by the total volume. Let $\mathrm{den}$ be a function from spacelike volumes to 118-dimensional vectors that gives the density of each element in the volume.
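Here is a sketch of $\mathrm{den}$ in Python. It makes two simplifying assumptions: atoms have already been located (as (position, atomic number) pairs), and the volume is an axis-aligned box rather than an arbitrarily oriented `RectPrism`. Index $k-1$ of the returned list holds the density of the element with atomic number $k$.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[Point, int]  # (position, atomic number Z in 1..118)

NUM_ELEMENTS = 118

def den(atoms: Sequence[Atom], box_min: Point, box_max: Point) -> List[float]:
    """Number density of each element inside an axis-aligned box
    (a simplification of the oriented prism used in the text)."""
    volume = math.prod(hi - lo for lo, hi in zip(box_min, box_max))
    if volume <= 0:
        raise ValueError("box must have positive volume")
    counts = [0] * NUM_ELEMENTS
    for pos, z in atoms:
        if all(lo <= c < hi for c, lo, hi in zip(pos, box_min, box_max)):
            counts[z - 1] += 1
    return [count / volume for count in counts]
```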
Now we will define $TD_{b}$, which takes a point in spacetime (the top left corner of the tile), a time direction, two spacelike vectors for orientation, and a bit $b$ that tells whether we're looking for a Zirconium or a Chromium tile. It outputs a nonnegative number that is lower the "closer" the point is to being the desired tile. This will also use constants $w$ for the tile thickness and $l$ for the tile side length. The dimensions of the tile are predetermined constants, as are the desired densities of all elements involved (they won't all be 0 except Zirconium/Chromium, since we can expect some impurities). We will subtract the 118-vector of desired densities $μ_{b, k}$ from the 118-vector of actual element densities in the tile space defined by our $q$. Then we will take the dot product of this with the vector ${(\frac{1}{σ_{b, k}})}_{k = 1}^{118}$, where $σ_{b, k}$ is the standard deviation of the density of element $k$, measured in real life on a sample of a bunch of (Zirconium if $b = 0$, Chromium if $b = 1$) tiles and put into our program as constants. That way, variance in element concentrations that are supposed to vary won't matter as much.

$$TD_{b}(x, \tau, n, m) = \left| \left(\mathrm{den}(\mathrm{RectPrism}(x, \tau, l, l, w, n, m)) - (\mu_{b,k})_{k=1}^{118}\right) \cdot \left(\frac{1}{\sigma_{b,k}}\right)_{k=1}^{118} \right|$$

(There are supposed to be absolute value signs around all of that, but for some reason those aren't rendering in the editor, so I'm not sure if you'll see them.)
Error that I noticed 7 hours after posting: The absolute value signs should be around the difference $\mathrm{den}(\dots) - (μ_{b, k})_{k = 1}^{118}$, and it should be evaluated componentwise, so that positive and negative differences don't cancel each other out.
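A short Python sketch shows the difference the erratum makes. It implements both the formula as originally written (absolute value around the whole weighted sum) and the corrected componentwise version, $\sum_k |\mathrm{den}_k - μ_{b,k}| / σ_{b,k}$. Plain lists stand in for the 118-vectors. Requiring every $σ_{b,k}$ to be positive is my own assumption, since an element with zero measured variance would otherwise cause a division by zero.

```python
from typing import Sequence

def _validate(den: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> None:
    if not len(den) == len(mu) == len(sigma):
        raise ValueError("den, mu and sigma must have the same length")
    if any(s <= 0 for s in sigma):
        raise ValueError("every sigma must be positive (assumption of this sketch)")

def td_as_written(den: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Absolute value around the whole weighted sum (before the erratum)."""
    _validate(den, mu, sigma)
    return abs(sum((d - m) / s for d, m, s in zip(den, mu, sigma)))

def td(den: Sequence[float], mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Erratum version: componentwise |den_k - mu_bk|, weighted by 1/sigma_bk, then summed."""
    _validate(den, mu, sigma)
    return sum(abs(d - m) / s for d, m, s in zip(den, mu, sigma))
```

Consider a tile whose density is too high in one element and too low in another by the same weighted amount. The uncorrected version returns 0 even though the tile is wrong. The corrected version does not.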
Now if $q = (x, τ, n, m)$ and $p$ is an $n \times n$ bit (0-indexed) matrix with the desired pattern, with $x = (x, y, z, t)$ , we can let

$$d_{0}(q, p) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} TD_{p_{ij}}\left(x + i l \bar{m} + j l (\bar{m} \times \bar{n}), \tau, n, m\right),$$

where $\bar{n}$ and $\bar{m}$ are $n$ and $m$ converted to 4-vectors from the reference frame defined by $τ$ to the "global" reference frame, and cross products are defined in the 3-space orthogonal to $τ$.
We're calling it $d_{0}$ rather than $d$ since we still need to add terms for false-flag prevention (maybe distances to nearby galaxies?).
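Here is a minimal Python sketch of $d_0$. It assumes the conversion to $\bar{n}$ and $\bar{m}$ has already been done and drops the time component, so all vectors are plain 3-vectors. `tile_score(corner, bit)` is a hypothetical stand-in for $TD_{b}(\cdot, τ, n, m)$. The index convention follows the formula: $i$ steps along $m$ and $j$ steps along $m \times n$.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]

def cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    """Ordinary 3-D cross product (spatial part only, frame conversion assumed done)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def d0(x: Sequence[float], n: Sequence[float], m: Sequence[float],
       pattern: List[List[int]], l: float,
       tile_score: Callable[[Vec, int], float]) -> float:
    """Sum the per-tile scores over the n x n pattern, stepping l along m and m x n."""
    down = cross(m, n)  # m x n points "down" on the flag
    size = len(pattern)
    total = 0.0
    for i in range(size):
        for j in range(size):
            corner = tuple(x[k] + i * l * m[k] + j * l * down[k] for k in range(3))
            total += tile_score(corner, pattern[i][j])
    return total
```

Take a toy world where each tile corner simply stores its bit. There, $d_0$ is 0 at the true corner and orientation, and it becomes positive if $m$ is flipped. The real search would minimize $d_0$ over $q$.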
Here's a whiteboard pic summarizing this part (apologies for terrible handwriting and layout. I will make a better fully digital version of this at some point if I decide to develop this plan further):

3. Retrospective

For a "summary" of this section, just jump to the "summary of failures and how I'll prevent them in the future" subsection.

Notes on thought process

I started by asking myself "what would I do with this power?"
This led to a "plan" that was very much cheating,^[1] so I instead asked "what would I do with this power if no one else on Earth were nearly as smart as me?"
However, something that came to mind before I came up with the cheating "plan" seemed promising: use the unlimited compute to simulate smart people solving alignment with a lot more {something} than us.
The first idea for that {something} was time. Right now, we probably have 2-30 years. What if we had a lot longer?
Then, I pretty much immediately came up with the idea to put these smart people on past Earth
The fact that my final product grew out of the first thing I came up with is suspicious. I think perhaps I should have spent longer doing a breadth-first search of ideas before settling on one.
However, I think beyond this mistake, I developed the idea well, throwing out and replacing bad sub-ideas when necessary.
I also think my final product is something that could become a genuine outer-alignment/value-extrapolation solution.

Notes on time

A majority of my time was spent in the "ideation" process, i.e. the stuff that went into my documentation (see section 4), as opposed to "output-generation" which is mostly just writing this document, including this retrospective.
Of my "output generation" time, most of that was spent on parts 1 and 2.
I.e. I don't think I'm spending nearly enough time on this retrospective; it's nowhere near the suggested 15%.
The failure here was that I did not predict and plan in advance how long each part of this would take, and so I fell prey to Hofstadter's Law and ended up spending too long on early parts and not enough on later parts, as well as finishing this whole thing a week later than I meant to (sorry Johannes).
- In my defense, schoolwork this quarter ended up being a lot more time-consuming than I had predicted. I will take on a much lighter courseload next quarter if I'll be doing SPAR at the same time.
But also, spending 15% of my time on this seems weird. It seems like there's no way I could spend that much time retrospecting. I notice I am confused. Is my idea of a retrospective missing something? Did I end up spending much more time than Johannes intended on part 1 and/or 2 of this?

Notes on desired output

The subproblem didn't end up involving "some hard problem of alignment". That sort of ended up being covered by the plan more broadly (delegating all of the "hard problems" to the simulated civilization, or, for the real-life version of the plan, delegating the "hard problem" of human value extrapolation to them).
Does this mean I still cheated? Nah, alignment is a super hard, super complicated problem. Thinking of a sort of out-of-the-box solution is not cheating. (Though, it was not that out of the box. It's in the same box as QACI, after all.)
The lengths of the parts of the output are roughly proportional to the amount of time I spent on each part. Any failure there was mostly caused by the failure in time partitioning.
My documentation also ended up pretty weird. It was meant to be a sort of stream-of-consciousness as I was thinking about this problem, but I think much faster than I type, so a lot of thoughts were missed.
- I think the solution here is just to get faster at typing? Maybe switch to colemak or something? idk

Summary of failures and how I'll prevent them in the future

Problem: I immediately went with the first large-scale idea I came up with.
- Solution: Commit to spend a predetermined amount of time (maybe 5 or 10 minutes?) thinking of and listing large-scale ideas without delving deeper into them.
Problem: Distribution of time spent on each section was way off.
- Solution: Predict the amount of total time I spend on the project and partition it intentionally, accounting for Hofstadter's Law.
- Sub-problem: Confusion wrt the "spend 15% on retrospective" thing.
  - Solution: Get clarification from Johannes Mayer. Johannes, is my retrospective missing things and/or did I end up spending more time on parts 1 and 2 than you intended, such that your 15% advice no longer applies?
Problem: Documentation didn't fully capture my thought process.
- Solution: Maybe that's ok? I can still improve though by getting faster at typing, or thinking of other ways to more efficiently document my thoughts.

4. Documentation

For now I am omitting this from the public post, as it is a little embarrassing. To the best of my knowledge, it does not contain infohazards or private personal information, and the random number generator I used for glomarization did not roll a 1 (it was a d6), so if I do end up publishing it, it will be completely uncensored. I sent this uncensored documentation to Johannes Mayer, along with a link to this post, as my application to his SPAR team.

5. What now?

Well, I'll continue to develop this plan whenever I think it's the best use of my time. You can track my forecasted probability of pursuing this plan full-time here. If others make different enough predictions, I might subsidize a Manifold market on it. My forecast is quite low right now, since there are a whole bunch of other things I could be doing in the near future (including, hopefully, working on Johannes Mayer's Science Algorithm project :)). Anyone else is of course welcome to work on this as well if they want to. Message me on LW if you come up with anything cool!

^[1] But as they say in dath ilan, cheating is technique!
