Diffractor

Belief Functions And Decision Theory

So, first off, I should probably say that a lot of the formalism overhead involved in *this post in particular* feels like the sort of thing that will get a whole lot more elegant as we work more things out, but "Basic inframeasure theory" still looks pretty good at this point and worth reading, and the basic results (ability to translate from pseudocausal to causal, dynamic consistency, capturing most of UDT, definition of learning) will still hold up.

Yes, your current understanding is correct: it's rebuilding probability theory in more generality, to be suitable for RL in nonrealizable environments and to capture a much broader range of decision-theoretic problems, as well as whatever spin-off applications may come from having the basic theory worked out, like our infradistribution logic stuff.

It copes with unrealizability because its hypotheses are not probability distributions, but sets of probability distributions (actually more general than that, but it's a good mental starting point), corresponding to properties that reality may have, without fully specifying everything. In particular, if a class of belief functions (read: properties the environment may fulfill) is learned, this implies that, for every property in that class that the true environment fulfills (you don't know the true environment exactly), the infra-Bayes agent will match or exceed the expected utility lower bound that can be guaranteed if you know reality has that property (in the low-time-discount limit).
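As a toy illustration of the "sets of probability distributions" starting point (not the full belief-function machinery), here's a minimal Python sketch. It assumes a property is represented by finitely many distributions over two outcomes, with the property being their convex hull; the coin, the actions, and the utilities are all made up for illustration. The lower bound an action can guarantee is its worst-case expected utility over that set.

```python
from fractions import Fraction


def expectation(dist, utility):
    """Expected utility under one distribution (dict: outcome -> probability)."""
    return sum(p * utility[o] for o, p in dist.items())


def guaranteed_lower_bound(extreme_points, utility):
    """Worst-case expected utility over the convex hull of the listed distributions.

    Expectation is linear in the distribution, so the minimum over the hull
    is attained at one of the extreme points.
    """
    return min(expectation(d, utility) for d in extreme_points)


# Hypothetical property: "the coin lands heads with probability between 0.3 and 0.6",
# without specifying the bias any further.
prop = [
    {"H": Fraction(3, 10), "T": Fraction(7, 10)},
    {"H": Fraction(6, 10), "T": Fraction(4, 10)},
]
actions = {
    "bet_heads": {"H": 1, "T": -1},
    "bet_tails": {"H": -1, "T": 1},
    "abstain": {"H": 0, "T": 0},
}

bounds = {a: guaranteed_lower_bound(prop, u) for a, u in actions.items()}
best = max(bounds, key=bounds.get)
print(bounds, best)
```

If the agent knew the bias were exactly 0.6, betting heads would be better; the lower bound is only what the property by itself guarantees, which is the quantity the learning guarantee above talks about.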

There's another key consideration which Vanessa was telling me to put in which I'll post in another comment once I fully work it out again.

Also, thank you for noticing that it took a lot of work to write all this up, the proofs took a while. n_n

Less Basic Inframeasure Theory

So, we've also got an analogue of KL-divergence for crisp infradistributions.

We'll be using and for crisp infradistributions, and and for probability distributions associated with them. will be used for the KL-divergence of infradistributions, and will be used for the KL-divergence of probability distributions. For crisp infradistributions, the KL-divergence is defined as

I'm not entirely sure why it's like this, but it has the basic properties you would expect of the KL-divergence, like convexity in both arguments and interacting well with continuous pushforwards and semidirect products.
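For probability distributions, the classical version of the first property is joint convexity of KL-divergence. A quick numerical spot-check in Python, assuming finite distributions stored as lists (all helper names are hypothetical); it only checks the classical fact, not the infradistribution version, and also shows that the concave inequality fails.

```python
import math
import random


def kl(p, q):
    """KL(p || q) in nats for finite distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def random_dist(n, rng):
    """A random full-support distribution on n points."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def mix(lam, p, q):
    """The mixture lam * p + (1 - lam) * q."""
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]


def check_joint_convexity(trials=10_000, n=4, seed=0):
    """Spot-check KL(mix(p1, p2) || mix(q1, q2)) <= lam*KL(p1||q1) + (1-lam)*KL(p2||q2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        p1, p2, q1, q2 = (random_dist(n, rng) for _ in range(4))
        lam = rng.random()
        lhs = kl(mix(lam, p1, p2), mix(lam, q1, q2))
        rhs = lam * kl(p1, q1) + (1 - lam) * kl(p2, q2)
        if lhs > rhs + 1e-12:
            return False
    return True


# Concavity, by contrast, fails: mixing two far-from-uniform distributions gives the uniform one.
u = [0.5, 0.5]
mixed = kl(mix(0.5, [0.9, 0.1], [0.1, 0.9]), u)
averaged = 0.5 * kl([0.9, 0.1], u) + 0.5 * kl([0.1, 0.9], u)

print(check_joint_convexity(), mixed < averaged)
```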

Straight off the bat, we have:

**Proposition 1:**

Proof: KL-divergence between probability distributions is always nonnegative, by Gibbs' inequality.
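The classical facts used in these first few proofs are easy to spot-check numerically. A minimal Python sketch on random finite distributions (helper names are hypothetical): it checks Gibbs' inequality, that the KL-divergence of a distribution with itself is 0, and the cross-entropy fact invoked below, namely that the cross-entropy of any distribution with the uniform distribution on n points is log n.

```python
import math
import random


def kl(p, q):
    """KL(p || q) in nats; p and q are lists of probabilities over the same finite space."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p, q):
    """-sum_i p_i log q_i, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def random_dist(n, rng):
    """A random full-support distribution on n points."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def check_gibbs(trials=10_000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 6)
        p, q = random_dist(n, rng), random_dist(n, rng)
        if kl(p, q) < -1e-12:  # Gibbs' inequality: never negative
            return False
        if abs(kl(p, p)) > 1e-12:  # zero when the two distributions coincide
            return False
        uniform = [1.0 / n] * n
        if abs(cross_entropy(p, uniform) - math.log(n)) > 1e-9:  # always log n
            return False
    return True


print(check_gibbs())
```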

**Proposition 2:**

And now, because KL-divergence between probability distributions is 0 only when they're equal, we have:

**Proposition 3:** *If ** is the uniform distribution on **, then *

And the cross-entropy of any distribution with the uniform distribution is always the logarithm of the number of points in the space, so:

**Proposition 4:** *is a convex function over* .

Proof: Let's use as our number in in order to talk about mixtures. Then,

Then we apply convexity of the KL-divergence for probability distributions to get:

**Proposition 5:**

At this point we can abbreviate the KL-divergence, and observe that we have a multiplication by 1, to get:

And then pack up the expectation:

Then, with the choice of and fixed, we can move the choice of the all the way inside, to get:

Now, there's something else we can notice. When choosing , it doesn't matter what is selected: you want to take every and maximize the quantity inside the expectation, and that consideration selects your . So, then we can get:

And pack up the KL-divergence to get:

And distribute the min to get:

And then, we can pull out that fixed quantity and get:

And pack up the KL-divergence to get:

**Proposition 6:**

To do this, we'll go through the proof of Proposition 5 up to the first place where we have an inequality. The last step before the inequality was:

Now, for a direct product, it's like a semidirect product, but all the and are the same infradistribution, so we have:

Now, this is a constant, so we can pull it out of the expectation to get:

**Proposition 7:**

For this, we'll need the Disintegration Theorem (the classical version for probability distributions), and we'll adapt some results from Proposition 5. Let's get as far as we can before we need it.

Now, hypothetically, if we had

then we could use that result to get

and we'd be done. So, our task is to show

for any pair of probability distributions and . Now, here's what we'll do. The and give us probability distributions over , and the and are probability distributions over . So, let's take the joint distribution over given by selecting a point from according to the relevant distribution and applying . By the classical version of the disintegration theorem, we can write it either way: by starting with the marginal distribution over and taking a semidirect product to , or by starting with the marginal distribution over and taking a semidirect product with some Markov kernel to to get the joint distribution. So, we have:

for some Markov kernels . Why? Well, the joint distribution over is given by or respectively (you have a starting distribution, and lets you take an input in and get an output in ). But, breaking it down the other way, we start with the marginal distribution of those joint distributions on (the pushforward w.r.t. ), and can write the joint distribution as semidirect product going the other way. Basically, it's just two different ways of writing the same distributions, so that's why KL-divergence doesn't vary at all.
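To make the "two different ways of writing the same distributions" point concrete, here's a minimal Python sketch, assuming finite spaces and a deterministic map f (parity on {0, 1, 2, 3} here; the example distributions and all helper names are hypothetical). It builds the joint distribution from a starting distribution and f, disintegrates it into the marginal on Y plus a kernel back to X, rebuilds the joint from those pieces, and checks that the KL-divergence of the joints is the same either way. It also equals the KL-divergence of the starting distributions, since y is determined by x.

```python
import math


def kl(p, q):
    """KL(p || q) in nats for finite distributions stored as dicts (key -> probability)."""
    return sum(pv * math.log(pv / q[key]) for key, pv in p.items() if pv > 0)


def joint_from_map(h, f):
    """h semidirect (x -> point mass at f(x)): a joint over pairs (x, y) supported on y = f(x)."""
    return {(x, f(x)): px for x, px in h.items()}


def disintegrate(h, f):
    """Split that joint the other way: a marginal on Y plus a kernel from Y back to X."""
    marginal = {}
    for x, px in h.items():
        marginal[f(x)] = marginal.get(f(x), 0.0) + px  # pushforward of h along f
    kernel = {
        y: {x: px / marginal[y] for x, px in h.items() if f(x) == y}
        for y in marginal
        if marginal[y] > 0
    }
    return marginal, kernel


def rebuild_joint(marginal, kernel):
    """marginal semidirect kernel, keyed as (x, y) so it lines up with joint_from_map."""
    return {(x, y): marginal[y] * pxy for y, dist in kernel.items() for x, pxy in dist.items()}


def close(a, b, tol=1e-12):
    return a.keys() == b.keys() and all(abs(a[k] - b[k]) <= tol for k in a)


def parity(x):
    return x % 2


h = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
g = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

joint_h, joint_g = joint_from_map(h, parity), joint_from_map(g, parity)
rebuilt_h = rebuild_joint(*disintegrate(h, parity))
rebuilt_g = rebuild_joint(*disintegrate(g, parity))

print(close(joint_h, rebuilt_h), close(joint_g, rebuilt_g))
print(kl(joint_h, joint_g), kl(rebuilt_h, rebuilt_g), kl(h, g))
```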

Now, it is also a fact that, for semidirect products (sorry, we're gonna let be arbitrary here and unconnected to the fixed ones we were looking at earlier, this is just a general property of semidirect products), we have:

To see this, run through the proof of Proposition 5, because probability distributions are special cases of infradistributions. Running right up to just before the inequality, we had:

But when we're dealing with probability distributions, there's only one possible choice of probability distribution to select, so we just have:

Applying this, we have:

The first equality is our expansion of semidirect product for probability distributions, second equality is the probability distributions being equal, and third equality is, again, expansion of semidirect product for probability distributions. Contracting the two sides of this, we have:

Now, the KL-divergence between a distribution and itself is 0, so the expectation on the left-hand side is 0, and we have

And bam, we have which is what we needed to carry the proof through.
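The argument above leans on a few classical facts about probability distributions. Here's a minimal Python spot-check of three of them on random finite distributions and kernels (all names are hypothetical, and nothing here tests the infradistribution versions): the chain rule KL(h⋉k || g⋉k') = KL(h || g) + E_{x∼h}[KL(k(x) || k'(x))] for semidirect products, the direct-product case where the kernels are constant so the expectation is a constant, and the pushforward inequality KL(f_*h || f_*g) ≤ KL(h || g) that the disintegration argument arrives at.

```python
import math
import random


def kl(p, q):
    """KL(p || q) in nats for finite distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def random_dist(n, rng):
    """A random full-support distribution on n points."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def semidirect(h, k):
    """h semidirect k, flattened over pairs (x, y); k[x] is a distribution over Y."""
    return [h[x] * k[x][y] for x in range(len(h)) for y in range(len(k[x]))]


def pushforward(h, f, m):
    """Pushforward of h along a map f: {0..n-1} -> {0..m-1}, given as a list."""
    out = [0.0] * m
    for x, px in enumerate(h):
        out[f[x]] += px
    return out


def run_checks(trials=2000, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        n, m = rng.randint(2, 5), rng.randint(2, 5)
        h, g = random_dist(n, rng), random_dist(n, rng)
        k1 = [random_dist(m, rng) for _ in range(n)]
        k2 = [random_dist(m, rng) for _ in range(n)]
        # Chain rule for semidirect products.
        lhs = kl(semidirect(h, k1), semidirect(g, k2))
        rhs = kl(h, g) + sum(h[x] * kl(k1[x], k2[x]) for x in range(n))
        if abs(lhs - rhs) > 1e-9:
            return False
        # Direct product: constant kernels, so the expectation term is just KL(a || b).
        a, b = random_dist(m, rng), random_dist(m, rng)
        prod = kl(semidirect(h, [a] * n), semidirect(g, [b] * n))
        if abs(prod - (kl(h, g) + kl(a, b))) > 1e-9:
            return False
        # Pushforward along a (possibly non-injective) map can only shrink KL.
        f = [rng.randrange(m) for _ in range(n)]
        if kl(pushforward(h, f, m), pushforward(g, f, m)) > kl(h, g) + 1e-12:
            return False
    return True


print(run_checks())
```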

CO2 Stripper Postmortem Thoughts

It's currently disassembled in my garage and will be fully tested when the 2.0 version is built, but construction of the 2.0 version has stalled this year because I've been working on other projects. The 1.0 version did remove CO2 from a room, as measured by a CO2 meter, but its size and volume made it not worthwhile.

John_Maxwell's Shortform

Potential counterargument: Second-strike capabilities are still relevant in the interstellar setting. You could build a bunch of hidden ships in the Oort cloud to ram the foe and do equal devastation if the other party strikes first, deterring a first strike even with tensions and an absence of communication. Further, while the "ram with high-relativistic objects" idea works pretty well for preemptively ending a civilization confined to a handful of planets, AIs would be able to colonize a bunch of little asteroids, KBOs, and comets in the Oort cloud, and that higher level of dispersal would make preemptive total elimination less viable.

Introduction to Cartesian Frames

I will be hosting a readthrough of this sequence on MIRIxDiscord again, PM for a link.

The rationalist community's location problem

Reno has 90°F daily highs during summer. Knocking 10 degrees off is a nonnegligible improvement over Las Vegas, though.

Needed: AI infohazard policy

So, here are some considerations (not an actual policy):

It's instructive to look at the case of nuclear weapons, and the key analogies or disanalogies to math work. For nuclear weapons, the basic theory is pretty simple and building the hardware is the hard part, while for AI, the situation seems reversed. The hard part there is knowing what to do in the first place, not scrounging up the hardware to do it.

First, a chunk from Wikipedia:

Most of the current ideas of the Teller–Ulam design came into public awareness after the DOE attempted to censor a magazine article by U.S. anti-weapons activist Howard Morland in 1979 on the "secret of the hydrogen bomb". In 1978, Morland had decided that discovering and exposing this "last remaining secret" would focus attention onto the arms race and allow citizens to feel empowered to question official statements on the importance of nuclear weapons and nuclear secrecy. Most of Morland's ideas about how the weapon worked were compiled from highly accessible sources—the drawings which most inspired his approach came from the *Encyclopedia Americana*. Morland also interviewed (often informally) many former Los Alamos scientists (including Teller and Ulam, though neither gave him any useful information), and used a variety of interpersonal strategies to encourage informational responses from them (i.e., asking questions such as "Do they still use sparkplugs?" even if he wasn't aware what the latter term specifically referred to)....

When an early draft of the article, to be published in *The Progressive* magazine, was sent to the DOE after falling into the hands of a professor who was opposed to Morland's goal, the DOE requested that the article not be published, and pressed for a temporary injunction. After a short court hearing in which the DOE argued that Morland's information was (1). likely derived from classified sources, (2). if not derived from classified sources, itself counted as "secret" information under the "born secret" clause of the 1954 Atomic Energy Act, and (3). dangerous and would encourage nuclear proliferation... Through a variety of more complicated circumstances, the DOE case began to wane, as it became clear that some of the data they were attempting to claim as "secret" had been published in a students' encyclopedia a few years earlier....

Because the DOE sought to censor Morland's work—one of the few times they violated their usual approach of not acknowledging "secret" material which had been released—it is interpreted as being at least partially correct, though to what degree it lacks information or has incorrect information is not known with any great confidence.

So, broad takeaways from this: The Streisand effect is real. A huge part of keeping something secret is just having nobody suspect that there *is* a secret there to find. This is much trickier for nuclear weapons, which are of high interest to the state, while it's more doable for AI stuff (and I don't know how biosecurity has managed to stay so low-profile). This doesn't mean you can just wander around giving the rough sketch of the insight; in math, it's not too hard to reinvent things once you know what you're looking for. But AI math does have a huge advantage here: it's a really broad field and hard to search through (I think my roommate said that so many papers get submitted to NeurIPS that you couldn't read through them all in time for the next NeurIPS conference). And, in order to reinvent something from scratch without having the fundamental insight, you need to be pointed in the *exact* right direction, and even then you've got a good shot at missing it (see: the time lag between the earliest neural net papers and the development of backpropagation, or, in the process of making the Infra-Bayes post, stumbling across concepts that could have been found months earlier if some time-traveler had said the right three sentences at the time).

Also, secrets can get out through *really* dumb channels. Putting important parts of the H-bomb structure in a student's encyclopedia? Why would you do that? Well, probably because there's a lot of people in the government and people in different parts have different memories of which stuff is secret and which stuff isn't.

So, due to AI work being insight/math-based, security would be based a lot more on just... not telling people things. Or alluding to them. Although, there is an interesting possibility raised by the presence of so much other work in the field. For nuclear weapons work, things seem to be either secret or well-known among those interested in nuclear weapons. But AI has a big intermediate range between "secret" and "well-known". See all those arXiv papers with like, 5 citations. So, for something that's kinda iffy (not serious enough (given the costs of the slowdown in research with full secrecy) to apply full secrecy, not benign enough to be comfortable giving a big presentation at NeurIPS about it), it might be possible to intentionally target that range. I don't think it's a binary between "full secret" and "full publish"; there are probably intermediate options available.

Of course, if it's *known* that an organization is trying to fly under the radar with a result, you get the Streisand effect in full force. But, just as well-known authors may have pseudonyms, it's probably possible to just publish a paper on arXiv (or something similar) under a pseudonym and not have it referenced anywhere by the organization as an official piece of research they funded. And it would be available for viewing and discussion and collaborative work in that form, while also (with high probability) remaining pretty low-profile.

Anyways, I'm gonna set a 10-minute timer to have thoughts about the guidelines:

Ok, the first thought I'm having is that this is probably a case where Inside View is just strictly better than Outside View. Making a policy ahead of time that can just be followed requires whoever comes up with the policy to have classified, in advance, all the relevant categories of result and what to do with them, and that seems pretty dang hard to do, especially because novel insights, almost by definition, are not something you expect to see ahead of time.

The next thought is that working something out for a while and then going "oh, this is roughly adjacent to something I wouldn't want to publish, when developed further" isn't *quite* as strong of an argument for secrecy as it looks like, because, as previously mentioned, even fairly basic additional insights (in retrospect) are pretty dang tricky to find ahead of time if you don't know what you're looking for. Roughly, the odds of someone finding the thing you want to hide scale with the number of people actively working on it, so that case seems to weigh in favor of publishing the result, but not actively publicizing it to the point where you can't befriend everyone else working on it. If one of the papers published by an organization could be built on to develop a serious result... well, you'd still have the problem of not knowing which paper it is, or what unremarked-on direction to go in to develop the result, if it was published as normal and not flagged as anything special. But if the paper got a whole bunch of publicity, the odds go up that someone puts the pieces together spontaneously. And, if you know everyone working on the paper, you've got a saving throw if someone runs across the thing.

There *is* a *very* strong argument for talking to several other people if you're unsure whether it'd be good to publish/publicize, because it reduces the problem of "person with laxest safety standards publicizes" to "organization with the laxest safety standards publicizes". This isn't a full solution, because there's still a coordination problem at the organization level, and it gives incentives for organizations to be really defensive about sharing their stuff, including safety-relevant stuff. Further work on the inter-organization level of "secrecy standards" is very much needed. But within an organization, "have personal conversation with senior personnel" sounds like the obvious thing to do.

So, current thoughts: There are some intermediate options available instead of just "full secret" or "full publish" (publish under a pseudonym and don't list it as research; publish as normal but don't make efforts to advertise it broadly). I haven't seen anyone mention them, and they seem preferable for results that would benefit from more eyes on them but could also be developed in bad directions. I'd be skeptical of attempts to make a comprehensive policy ahead of time; this seems like a case where inside view on the details of the result would outperform an ahead-of-time policy. But one essential aspect that *would* be critical on a policy level is "talk it out with a few senior people first to make the decision, instead of going straight for personal judgement", as that tamps down on the coordination problem considerably.

CO2 Stripper Postmortem Thoughts

Person in a room: -35 g of O2/hr from room

Person in a room with a CO2 stripper: -35 g of O2/hr from room

How does the presence of a CO2 stripper do *anything at all* to the oxygen amount in the air?

Introduction To The Infra-Bayesianism Sequence

Do you think this problem is essentially different from "Suppose Omega asks you for 10 bucks. You say no. Then Omega says, 'Actually, I flipped a fair coin that came up tails; if it had come up heads, I would have given you 100 dollars if I predicted you'd give me 10 dollars on tails.'"?

(I think I can motivate "reconsider choosing heads" if you're like "yeah, this is just counterfactual mugging with belated notification of what situation you're in, and I'd pay up in that circumstance")
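For the counterfactual-mugging framing, the relevant comparison is between policies evaluated before the coin flip. A minimal Python sketch of that arithmetic, assuming Omega predicts your policy perfectly (the comment doesn't specify the predictor's accuracy) and using the dollar amounts from the example above; the function name is hypothetical.

```python
from fractions import Fraction

P_HEADS = Fraction(1, 2)  # fair coin
REWARD = 100  # paid on heads, if Omega predicts you'd pay on tails
COST = 10  # what Omega asks for on tails


def policy_value(pays_on_tails: bool) -> Fraction:
    """Ex-ante expected dollars of a policy, assuming Omega predicts it perfectly."""
    heads_payoff = REWARD if pays_on_tails else 0
    tails_payoff = -COST if pays_on_tails else 0
    return P_HEADS * heads_payoff + (1 - P_HEADS) * tails_payoff


print(policy_value(True), policy_value(False))
```

Once you already know the coin came up tails, paying just loses 10 dollars; the case for paying rests entirely on this ex-ante comparison, which is the same whether the notification about the coin arrives before or after Omega's first request.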

I'd go with number 2, because my snap reaction was "ooh, there's a 'show personal blogposts' button?"

EDIT: Ok, I found the button. The problem with that button is that it looks identical to the other tags and sits on the right side of the screen, while the structure of "Latest" draws your eyes to the left side. I'd make it a bit bigger and put it on the left side of the screen.