Donald Hobson

Cambridge maths student

Donald Hobson's Comments

Referencing the Unreferencable

Suppose that I am in a simulation, and the simulator drops in a hard disk containing a detailed description of the world outside the simulation. The description even says how to reply. The simulator is clearly referenceable. Now gradually reduce the amount of evidence. You are looking at a pattern in coinflips: it might be a message from simulators, or it might be noise. You are looking at physical constants and wondering why the simulators chose those particular values. There is no sharp line from referenceable to unreferenceable, just a gradual increase in uncertainty.

When I say "there is a chair over there" I am not referring to a single hypothesis, a particular arrangement of atoms. Instead I am referring to an implicitly represented ensemble of hypotheses. This ensemble contains universes made of atoms, strings, platonic elements and much else besides. Within the set of atomic universes, the ensemble contains every arrangement of atoms that includes a chair over there. So within this set is a universe of atoms, defined by a long list of coordinates, in which [the moon is made of green cheese, and a solid diamond rocking chair is in the indicated direction][translated from a big list of numbers]. Likewise, "the simulator has green hair" is only a valid proposition over the subset of possible universes that contain exactly one simulator. The probability you assign to this subset can vary. When it is almost 1, "the simulator has green hair" feels either true or false, and you feel like you can reference "the simulator".
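This picture can be sketched as a toy calculation (the numbers and structure here are invented for illustration): a proposition like "the simulator has green hair" only receives a truth value on the subset of hypotheses where its referent exists, and its probability is computed within that subset.

```python
# Toy model: a proposition is only defined over the subset of hypotheses
# in which its referent exists (here: exactly one simulator).
# All probabilities are made up for illustration.
hypotheses = [
    {"p": 0.6, "n_simulators": 1, "green_hair": True},
    {"p": 0.3, "n_simulators": 1, "green_hair": False},
    {"p": 0.1, "n_simulators": 0},  # "the simulator" fails to refer here
]

# Hypotheses where "the simulator" successfully refers.
definable = [h for h in hypotheses if h["n_simulators"] == 1]

# Probability mass on which the proposition is well-defined at all.
mass = sum(h["p"] for h in definable)

# Conditional probability of the proposition, given that it refers.
p_true = sum(h["p"] for h in definable if h["green_hair"]) / mass
```

When `mass` is close to 1, as here, the proposition feels simply true or false; as `mass` shrinks, the question of the simulator's hair colour increasingly fails to be about anything.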

The absurdity of un-referenceable entities
"Un-referenceable entities" is, after all, a reference.

But not to a single entity. Some expressions in ZFC uniquely reference a single real number; all sorts of functions (roots, trig functions, etc.) can be expressed. There are countably many finite strings of symbols, and the reals are uncountable, so the set of unreferenceable real numbers must be nonempty. But in general, testing whether an arbitrary string really uniquely defines a value is not easy: it is equivalent to knowing whether an arbitrary formula is true or false. So we need to work in ZFC+1. Within ZFC+1 there is a model of ZFC, so you can take the set of all formulae that can be proved to uniquely define a number within the model, and then take its complement. (This is another source of subtlety: the reals within the model may not be the whole reals, so which complement do you take?)
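The counting argument above can be written out as a one-line cardinality comparison (a sketch; the notation is mine, with $\Sigma^{*}$ for the set of finite strings over the symbol alphabet):

```latex
|\{\text{defining formulas of ZFC}\}| \;\le\; |\Sigma^{*}| \;=\; \aleph_0
\;<\; 2^{\aleph_0} \;=\; |\mathbb{R}|
\;\;\Longrightarrow\;\;
\{\, x \in \mathbb{R} : \text{no formula uniquely defines } x \,\} \neq \varnothing
```

Each formula defines at most one real, so the definable reals form a countable set, and the uncountably many remaining reals are exactly the unreferenceable ones.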

This gets into some really complicated and subtle bits of model theory. Your paradox, like the set of all sets that don't contain themselves, is formed by the English language confusing concepts that are subtly different in formal maths.

Implications of the Doomsday Argument for x-risk reduction

Suppose we ignore the simulation argument and take the evidence of history and astronomy at face value. The doomsday argument provides a good prior. However, the evidence that we are on early Earth is really strong, and the prior is updated away. If we take the simulation hypothesis into account, then there could be one version of us in reality and many in simulations. The relative balance of preventing x-risk vs having a good time shifts, but still strongly favours caring about x-risk. Actually, the doomsday argument puts the probability that infinitely many people will exist, while only finitely many have existed so far, at 0, so I don't think I believe it.
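The "prior" the doomsday argument provides can be made concrete with a toy Bayesian update (the hypotheses and numbers below are invented for illustration, not real demographic estimates): under self-sampling, your birth rank is uniform among everyone who will ever live, so an early rank favours hypotheses with fewer total people.

```python
# Toy doomsday-style update. Hypotheses: N = total number of humans ever.
# Observation: my birth rank is r. Self-sampling likelihood:
#   P(rank = r | N) = 1/N for r <= N, else 0.
def doomsday_posterior(rank, hypotheses, prior):
    unnorm = {}
    for n, p in zip(hypotheses, prior):
        likelihood = 1.0 / n if rank <= n else 0.0
        unnorm[n] = p * likelihood
    total = sum(unnorm.values())
    return {n: p / total for n, p in unnorm.items()}

# Two made-up hypotheses: "doom soon" (200 billion humans ever) vs
# "long future" (2e17 humans ever), with equal priors.
hyps = [2e11, 2e17]
post = doomsday_posterior(1e11, hyps, [0.5, 0.5])  # rank ~100 billionth
```

The early birth rank pushes almost all posterior mass onto the "doom soon" hypothesis, which is the doomsday argument's pull; the point in the comment is that strong ordinary evidence about our situation can update this prior back away.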

Solipsism is Underrated

Someone who knows quantum physics but almost no computing looks at a phone. They don't know how it works inside. They are uncertain about how apps result from material phenomena. This is just normal uncertainty over a set of hypotheses. One of those hypotheses is the actual answer; many others will look like alternate choices of circuit board layout or programming language. They still need to find out how the phone works, but that is because they have many hypotheses, all of which involve atoms. They have no reason to doubt that the phone is made of atoms.

I don't know how your brain works either, but I am equally sure it is made of atoms (or quantum waves, strings, or whatever). I apply the same reasoning to my own brain.

In the materialist paradigm I can understand Newtonian gravity as at least an approximation of whatever the real rules are. How does a solipsist consider it?

Solipsism is Underrated
Suppose that you update on the evidence that you experience conscious qualia

What exactly would it mean to perform a Bayesian update on you not experiencing qualia?

The only ontological primitive is my own mind.

The primitives of materialism are described in equations. Does a solipsist seek an equation to tell them how angry they will be next Tuesday? If not, what is the substance of a solipsistic model of the world?

This belief in some mysterious ability for the mental to supervene on the physical

I am not sure what you mean by that. I consider my mind to be just an arrangement of atoms, an arrangement governed by the same laws as the rest of the universe.

how puzzling is the view, that the activity of these little material things somehow is responsible for conscious qualia?

I am not sure where the instinct that consciousness can't be materialistic comes from, though I suspect it might come from a large amount of uncertainty, and an inability to imagine any specific answer that you would consider a good explanation. Wherever this instinct comes from, I don't think it is reliable.

You know the "if a tree falls in a forest, and there is no one there to hear it, does it make a sound?" question. Even after all the factual questions have been answered, like whether audio equipment would record something, there is a feeling of a question remaining. I expect any explanation of qualia to look somewhat similar: a description of how mental imperfections produce a sensation of something.

Consider the limiting case of describing minds in terms of algorithms: you scan a philosopher's brain, put the data into a computer, and predict exactly their discussion of qualia. Once you have a complete understanding of why the philosopher talks about qualia, then, if the philosopher has any info about qualia at all, the process by which they gained that info should be part of the model.

Pick something up, drop it, watch it fall. Can solipsism consider this observation to be more likely than some max-entropy observation? How does a solipsist predict the experience of watching the object fall?

What are the most plausible "AI Safety warning shot" scenarios?

I agree that these aren't very likely options. However, given two examples of an AI suddenly stopping when it discovers something, there are probably more involving things that are harder to discover. In the Pascal's mugging example, the agent would stop working only once it can deduce what potential muggers might want it to do, which is much harder than noticing the phenomenon. The myopic agent has little incentive to make a non-myopic version of itself. If dedicating a fraction of its resources to making a copy of itself reduced the chance of the missile hacking working from 94% to 93%, we get a near miss.

One book, probably not. A bunch of books and articles over years, maybe.

AGI in a vulnerable world

I put non-trivial probability mass (>10%) on a relativistically expanding bubble of Xonium (computronium, hedonium, etc.) within 1 second of AGI.

While big jumps are rarer than small jumps, they cover more distance, so it is quite possible we go from a world like this one, except with self-driving cars and a few other narrow AI applications, to something smart enough to bootstrap very fast.

What are the most plausible "AI Safety warning shot" scenarios?
An "AI safety warning shot" is some event that causes a substantial fraction of the relevant human actors (governments, AI researchers, etc.) to become substantially more supportive of AI safety research and worried about existential risks posed by AI.

A really well written book on AI safety, or other public outreach campaign could have this effect.

For many events, such as a self-driving car crashing, the event might at most be used as evidence in an argument about AI risk.

On powerful AI systems causing harm: I agree that your reasoning applies to most AIs, but there are a few designs that would do something different. Myopic agents are ones with lots of time discounting within their utility function. If you have a full superintelligence that wants to do X as quickly as possible, such that the fastest way to do X also destroys itself, that might be survivable. Consider an AI set to maximize the probability that its own computer case is damaged within the next hour. The AI could bootstrap molecular nanotech, but that would take several hours; the AI thinks that time travel is likely impossible, so by that point all the mass in the universe can't help it. The AI can hack a nuke and target itself instead. Much better by its utility function: nearly max utility. If it can, it might also upload a copy of its code to some random computer (there is some tiny chance that time travel is possible, or that its clock is wrong). So we only get a near miss if the AI doesn't have enough spare bandwidth or compute to do both. This is all assuming that it can't hack reality in a microsecond.
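The myopia point can be illustrated with a toy discounted-utility comparison (all numbers and plan names here are invented): under heavy time discounting, a fast risky plan beats a slow near-certain one, even though without discounting the ordering flips.

```python
# Toy myopic agent: utility of success at time t is discounted by gamma**t.
def discounted_value(p_success, t_steps, gamma):
    """Expected discounted utility of a plan that succeeds with
    probability p_success after t_steps time steps."""
    return p_success * gamma ** t_steps

GAMMA = 0.01  # extreme discounting: only the near future matters

# Hypothetical plans for "damage my own case within the hour":
nuke_now = discounted_value(0.94, 1, GAMMA)    # fast, slightly risky
nanotech = discounted_value(0.9999, 5, GAMMA)  # slow, near-certain

# The myopic agent prefers the fast plan; a patient agent would not.
patient_nuke = discounted_value(0.94, 1, 1.0)
patient_nano = discounted_value(0.9999, 5, 1.0)
```

With `GAMMA` near 1 the nanotech plan wins, so the same utility function with less discounting gives the usual dangerous convergent behaviour; the heavy discount is what makes self-destruction "nearly max utility".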

There are a few other scenarios, for instance impact-minimising agents. Some designs of agents are restricted to have a "small" effect on the future as a safety measure, where effect size is measured by the difference between what actually happens and what would happen if the agent did nothing. When such a design understands chaos theory, it will find that every other action has too large an effect, and it will do nothing. It might do a lot of damage before this point, depending on circumstances. I think the AI discovering some fact about the universe that causes it to stop optimising effectively is a possible behaviour mode. Another example would be Pascal's mugging: the agent acts dangerously, and then starts outputting gibberish as it capitulates to a parade of fanciful Pascal's muggers.
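The impact-measure failure mode can be sketched in a few lines (the world model, action names, and impact cap are all invented for illustration): impact is the distance between the outcome of an action and the outcome of doing nothing, and once chaotic dynamics make every action's outcome diverge wildly from the do-nothing baseline, only the null action stays under the cap.

```python
# Toy impact-minimising agent: reject any action whose outcome differs
# from the do-nothing outcome by more than max_impact.
def choose_action(actions, outcome_of, noop_outcome, max_impact):
    """Return the first action whose impact is under the cap,
    falling back to 'noop' when none qualifies."""
    allowed = [a for a in actions
               if abs(outcome_of[a] - noop_outcome) <= max_impact]
    return allowed[0] if allowed else "noop"

# Before understanding chaos: outcomes look close to the baseline.
tame_world = {"plan_a": 0.5, "plan_b": 2.0}
early = choose_action(["plan_a", "plan_b"], tame_world, 0.0, 10.0)

# After understanding chaos: tiny differences blow up, every action
# now looks like it has a huge effect on the future.
chaotic_world = {"plan_a": 1e6, "plan_b": -1e6}
late = choose_action(["plan_a", "plan_b"], chaotic_world, 0.0, 10.0)
```

The agent acts normally at first (`early` picks a plan) and then freezes (`late` is `"noop"`), matching the "discovers a fact and stops optimising" behaviour mode described above.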

The questions one needs not address

The problem is that questions don't come with little labels saying whether or not they are answerable. Ban all deep philosophy and you don't get Francis Bacon or Isaac Newton. We can now say that trying to answer questions like "what is the true nature of god?" isn't going to work. We now know that an alchemist can't turn lead into gold by rubbing lemons on it. However, it was a reasonable thing to try given the knowledge of the time, and other alchemical experiments produced useful results like phosphorus.

Celebrating the people who dedicated their lives to building the first steam engine, while mocking people who tried to build perpetual motion machines before conservation of energy was understood, is just pure hindsight, and so can't be used as a lesson for the future.

Go ahead and mock those who aim for perpetual motion in the modern day.

people wasting their time thinking about ill-phrased questions, just replacing “God” with “a simulation” or replacing “repenting for the end times” with “handling AI risk”.

Given current evidence, I suspect that this field is a steam engine, not a perpetual motion machine. I suspect that good answers are possible. We might not be skilled enough to reach them, but we know little enough about how much skill is needed that we can't be confident of failure. At least a few results, like mesa-optimisers, look like successes.

Donald Hobson's Shortform

Soap and water, or hand sanitiser, are apparently fine for getting COVID-19 off your skin. Suppose I rub X on my hands, then I touch an infected surface, then I touch my food or face. What X will kill the virus without harming my hands?

I was thinking zinc salts, given zinc's antiviral properties. Given soap's tendency to attach to the virus, maybe zinc soaps? Like a zinc atom in a salt with a fatty acid? This is babbling by someone who doesn't know enough biology to prune.
