Donald Hobson

Cambridge maths student dph39@cam.ac.uk

Donald Hobson's Comments

Many Turing Machines

I think that you are putting forward example hypotheses that you don't really believe in order to prove your point. Unfortunately it isn't clear which hypotheses you do believe, and this makes your point opaque.

From a mathematical perspective, quantum collapse is about as bad as insisting that the universe will suddenly cease to exist in some fixed number of years' time. Quantum collapse introduces a nontrivial complexity penalty; in particular, you need to pick a space of simultaneity.

The different Turing machines don't interact at all. Physicists can split the universe into a pair of universes in the quantum multiverse, and then merge them back together in a way that lets them detect that both had an independent existence. In the quantum bomb test, without a bomb, the universes in which the photon took each path are identical, allowing interference. If the bomb does exist, no interference. Many worlds just says that these branches carry on existing whether or not scientists manage to make them interact again.
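
To make the interference point concrete, here is a minimal two-mode amplitude calculation of the bomb test (my own sketch, not part of the original comment; the Hadamard beam-splitter convention and the port labels are arbitrary choices):

```python
import numpy as np

# 50/50 beam splitter acting on the two path amplitudes (Hadamard convention).
BS = np.array([[1, 1],
               [1, -1]]) / np.sqrt(2)

photon_in = np.array([1.0, 0.0])  # photon enters in the upper path

# No bomb: both paths recombine at the second beam splitter and interfere.
out_no_bomb = BS @ (BS @ photon_in)
print(np.abs(out_no_bomb) ** 2)   # ~[1, 0]: the "dark" detector never fires

# Bomb in the lower path: it absorbs (measures) the lower-path amplitude.
after_first_bs = BS @ photon_in
p_explode = np.abs(after_first_bs[1]) ** 2       # 0.5
surviving = np.array([after_first_bs[0], 0.0])   # un-normalised amplitude that got past the bomb
out_bomb = BS @ surviving
print(p_explode, np.abs(out_bomb) ** 2)          # 0.5, [0.25, 0.25]
# The dark detector now fires 25% of the time, revealing a live bomb that the
# photon never touched -- the bomb's presence destroys the interference.
```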

The Paradox of Robustness

Consider a self-driving car. Call the human utility function U, and call the space of all possible worlds W. In normal operation, the car only makes decisions over a restricted space S ⊂ W, say S = {crash, don't crash}. In practice W will contain a whole bunch of other things the car could do. Suppose that the programmers only know the restriction of U to S. This is enough to make a self-driving car that behaves correctly in the crash-or-don't-crash dilemma.

However, suppose that the self-driving car is faced with an off-distribution situation, one from outside S. Three things it could do are:

1) Recognise the problem and shut down.

2) Fail to coherently optimise at all.

3) Coherently optimise some extrapolation of U beyond S.

The behavior we want is to optimise U, but the information about what U is just isn't there.

Options (1) and (2) make the system brittle, tending to fail the moment anything goes slightly differently.

Option (3) leads to reasoning like, "I know not to crash into x, y and z, so maybe I shouldn't crash into anything." In other words, the extrapolation is often quite good when slightly off distribution. However, when far off distribution, you can get traffic light maximizer behavior.

In short, the paradox of robustness exists because, when you don't know what to optimize for, you can fail to optimize, or you can guess at something and optimize that.
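
As a toy sketch of the dilemma (the option names, the numbers, and the NotImplementedError stand-in are all invented for illustration):

```python
# Toy illustration: U is only known on the restricted space S, and the agent
# must decide what to do when handed an option outside S.
known_utility = {"don't crash": 1.0, "crash": -100.0}   # restriction of U to S

def choose(options):
    if all(o in known_utility for o in options):
        return max(options, key=known_utility.get)      # normal operation within S
    # Off-distribution: the true U simply isn't available here.
    # Option 1: recognise the problem and shut down.
    # Option 2: behave incoherently (e.g. pick at random).
    # Option 3: coherently optimise some extrapolation of U, e.g. a model
    #           fitted to the known points -- which may generalise well
    #           slightly outside S and absurdly far outside it.
    raise NotImplementedError("U is unknown outside S")

print(choose(["crash", "don't crash"]))   # "don't crash"
```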

What is Abstraction?

I think that there are some abstractions that aren't predictively useful, but are still useful in deciding your actions.

Suppose my friend and I both have the goal of maximising the number of DNA strings whose MD5 hash is prime.

I call sequences with this property "ana" and those without it "kata". Saying that "the DNA over there is ana" does tell me something about the world: there is an experiment I can do to determine whether it is true or false, namely sequencing the DNA and taking the hash. The concept of "ana" isn't useful in a world where no agents care about it and no detectors have been built. If your utility function cares about the difference, it is a useful concept. If someone has connected an ana detector to the trigger of something important, then it's a useful concept. If you're a crime scene investigator, and all you know about the perpetrator's DNA is that it's ana, then finding out whether Joe Bloggs has ana DNA could be important, so the concept of ana is useful. If you know the perpetrator's entire genome, the concept stops being useful.
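
For concreteness, the "ana" test really is a mechanical experiment. A minimal sketch of it (my own code, using sympy's primality test on the 128-bit MD5 value):

```python
import hashlib
from sympy import isprime  # handles primality testing of arbitrary-size integers

def is_ana(dna: str) -> bool:
    """True iff the MD5 hash of the DNA string, read as a 128-bit integer, is prime."""
    digest = hashlib.md5(dna.encode("ascii")).hexdigest()
    return isprime(int(digest, 16))

# The experiment: sequence the DNA, hash it, test the hash for primality.
print(is_ana("ACGTACGTTTGACC"))
```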

A general abstraction is consistent with several, but not all, universe states. There are many different universe states in which the gas has a pressure of 37 Pa, but also many in which it doesn't. So all abstractions are subsets of the possible universe states. Usually, we use subsets that are suitable for reasoning about in some way.

Suppose you were literally omniscient, knowing every detail of the universe, but you had to give humans a 1TB summary. Unable to include all the info you might want, you can only include a summary of the important points; you are now engaged in lossy compression.

Sensor data is also an abstraction: for instance, you might have temperature and pressure sensors. Cameras record roughly how many photons hit them without tracking every one. So real world agents are translating one lossy approximation of the world into another, without ever being able to express the whole thing explicitly.

How you do lossy compression depends on what you want. Music is compressed in a way that is specific to defects in human ears. Abstractions are much the same.
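
A toy illustration of that goal-dependence (the readings, the 30°C threshold, and the two imagined users are made up for the example):

```python
# The same raw temperature trace, compressed differently for two different goals.
readings = [19.8, 20.1, 20.4, 35.2, 20.0, 19.9]  # one sample per minute

# A safety monitor only cares whether the temperature ever exceeded 30 C.
safety_summary = any(t > 30 for t in readings)           # a single bit

# A climate logger cares about the typical level, not individual spikes.
trend_summary = round(sum(readings) / len(readings), 1)  # one coarse number

print(safety_summary, trend_summary)  # True 22.6
```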

What are some non-purely-sampling ways to do deep RL?

The r vs r' problem can be reduced if you can find a way to sample points of high uncertainty.
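
One concrete way to sample points of high uncertainty, sketched under my own assumptions (a random linear ensemble standing in for reward models trained from different initialisations), is to query wherever the ensemble disagrees most:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in reward models: an ensemble of linear functions playing the role of
# networks trained on the same labelled data from different initialisations.
n_models, dim = 5, 8
ensemble = rng.normal(size=(n_models, dim))

def disagreement(x):
    """Std-dev of the ensemble's reward predictions at x: a cheap uncertainty proxy."""
    preds = ensemble @ x
    return preds.std()

# Candidate states/trajectories the agent could ask for labels on.
candidates = rng.normal(size=(100, dim))
scores = np.array([disagreement(x) for x in candidates])
query = candidates[scores.argmax()]   # label the point the models disagree on most
print(scores.max())
```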

On decision-prediction fixed points

I'm modeling humans as two agents that share a skull. One of those agents wants to do stuff and writes blog posts; the other likes lying in bed and has at least partial control of your actions. The part of you that does the talking can quite sincerely say that it wants to do X, but it isn't in control.

Even if you can predict this whole thing, that still doesn't stop it happening.

On decision-prediction fixed points

Akrasia is the name we give to the fact that the part of ourselves that communicates about X and the part that actually does X have slightly different goals. The communicating part is always whinging about how the other part is being lazy.

CO2 Stripper Postmortem Thoughts

If the whole reason you didn't want to open the window was the energy put into heating/cooling the air, why not use a heat exchanger? I reckon it could be done using a desktop fan, a stack of thin aluminium plates, and a few pieces of cardboard or plastic to block air flow.
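
As a rough sanity check on feasibility (every number below is a guess, not a measurement), the standard effectiveness-NTU relation for a balanced counter-flow exchanger suggests a small fan and about a square metre of plate would recover a noticeable fraction of the heat:

```python
# Rough effectiveness estimate for a balanced counter-flow air-to-air exchanger,
# using the standard relation eps = NTU / (1 + NTU) for equal flow rates.
airflow = 0.03          # m^3/s moved by a small desktop fan (guess)
rho, cp = 1.2, 1005     # air density (kg/m^3) and heat capacity (J/kg/K)
U = 15.0                # overall heat transfer coefficient, W/m^2/K (air-side limited, guess)
area = 1.0              # total plate area, m^2 (guess)

C = airflow * rho * cp              # heat capacity rate of each stream, W/K
NTU = U * area / C
effectiveness = NTU / (1 + NTU)     # fraction of the temperature difference recovered
print(round(effectiveness, 2))      # ~0.29 with these guesses
```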

Open-Box Newcomb's Problem and the limitations of the Erasure framing

Imagine sitting outside the universe, and being given an exact description of everything that happened within the universe. From this perspective you can see who signed what.

You can also see whether your thoughts are happening in biology or silicon or whatever.

My point isn't "you can't tell whether or not you're in a simulation, so there is no difference"; my point is that there is no sharp cut-off point between simulation and not-simulation. We have a "know it when you see it" definition with ambiguous edge cases. Decision theory can't have different rules for dealing with dogs and not-dogs, because some things are on the ambiguous edge of dogginess. Likewise, decision theory can't have different rules for you, copies of you, and simulations of you, as there is no sharp cut-off. If you want to propose a continuous "simulatedness" parameter, and explain where that gets added to decision theory, go ahead. (Or propose some sharp cutoff.)

Open-Box Newcomb's Problem and the limitations of the Erasure framing
in fact, it could be an anti-rational agent with the opposite utility function.

These two people might look the same, they might be identical on a quantum level, but one of them is a largely rational agent, and the other is an anti-rational agent with the opposite utility function.

I think that calling something an anti-rational agent with the opposite utility function is a weird description that doesn't cut reality at its joints. There is a simple notion of a perfect sphere. There is also a simple notion of a perfect optimizer. Real world objects aren't perfect spheres, but some are pretty close, so "sphere" is a useful approximation and "sphere + error term" is a useful description. Real agents aren't perfect optimisers (ignoring contrived goals like "1 for doing whatever you were going to do anyway, 0 otherwise"), but some are pretty close, hence "utility function + biases" is a useful description. This makes the notion of an anti-rational agent with the opposite utility function like an inside-out sphere with its surface offset inwards by twice the radius. It's a cack-handed description of a simple object in terms of a totally different simple object and a huge error term.

This is one of those circumstances where it is important to differentiate between you being in a situation and a simulation of you being in a situation.

I actually don't think that there is a general procedure to tell what is you and what is a simulation of you. The standard arguments apply: slowly replacing neurons with nanomachines, slowly porting the result to software, slowly abstracting it and proving theorems about it rather than running it directly.

It is an entirely meaningful utility function to only care about copies of your algorithm that are running on certain kinds of hardware. That makes you a "biochemical brains running this algorithm" maximizer. The paperclip maximizer doesn't care about any copy of its algorithm. Humans worrying about whether the predictor's simulation is detailed enough to really suffer is down to specific features of human morality. From the perspective of the paperclip maximizer doing decision theory, what we care about is logical correlation.


Metaphilosophical competence can't be disentangled from alignment

I think it's hard to distinguish a lack of metaphilosophical sophistication from having different values. The (hypothetical) angsty teen says that they want to kill everyone; if they had the power to, they would. How do we tell whether they are mistaken about their utility function, or just have killing everyone as their utility function? If they clearly state some utility function that depends on a real-world parameter, and they are mistaken about that parameter, then we could know. For instance, they want to kill everyone if and only if the moon is made of green cheese, and they are confident that the moon is made of green cheese, so they don't even bother checking before killing everyone.

Alternatively, we could look at whether they could be persuaded not to kill everyone; but some people could be persuaded of all sorts of things. The fact that you could be persuaded to do X says more about the persuasive ability of the persuader, and the vulnerabilities of your brain, than about whether you wanted X.

Alternatively, we could look at whether they will regret it later. If I self-modify into a paperclip maximiser, I won't regret it, because that action maximised paperclips. However, a hypothetical self who hadn't been modified would regret it.

Suppose there are some nanobots in my brain that will slowly rewire me into a paperclip maximiser, and I decide to remove them. The real me doesn't regret this decision; the hypothetical me who wasn't modified, and so became a paperclip maximiser, does. Now suppose there is a part of my brain that will make me power hungry and self centered once I become sufficiently powerful, and I remove it. Which case is this? Am I damaging my alignment or preventing it from being damaged?

We don't understand the concept of a philosophical mistake well enough to say if someone is making one. It seems likely that, to the extent that humans have a utility function, some humans have utility functions that want to kill most humans.

who almost certainly care about the future well-being of humanity.

is mistaken. I think that a relatively small proportion of humans care about the future well-being of humanity in any way similar to what those words mean to a modern rationalist.

To a rationalist, "future wellbeing of humanity" might mean a superintelligent AI filling the universe with simulated human minds.

To a random modern first-world person, it might mean a fairly utopian "sustainable" future, full of renewable energy, electric cars etc.

To a North Sentinel Islander, they might have little idea that any humans beyond their tribe exist, and might hope for several years of good weather and rich harvests.

To a 10th century monk, they might hope that judgement day comes soon, and that all the righteous souls go to heaven.

To a barbarian warlord, they might hope that their tribe conquers many other tribes.

The only sensible definition of "care about the future of humanity" that covers all these cases is that their utility function has some term relating to things happening to some humans. Their terminal values reference some humans in some way. As opposed to a paperclip maximiser that sees humans as entirely instrumental.
