Donald Hobson

MMath Cambridge. Currently studying postgrad at Edinburgh. D.P.Hobson@sms.ed.ac.uk

Sequences

Logical Counterfactuals and Proposition graphs
Assorted Maths

Comments

We got what's needed for COVID-19 vaccination completely wrong

Another technique would be to remove a vial of blood, add COVID, keep the blood alive, and wait until it beats the virus. Then re-inject the blood, now full of trained immune cells.

Is this a thing? Why not? 

Donald Hobson's Shortform

There seems to be a consensus here towards throwing money at getting more vaccines. I think I agree with the reasoning, except possibly for the way that letting vaccine companies make large profits during pandemics encourages them to start and spread pandemics.

How confident should we be that no company would do anything that evil?

I don't think they would, but ...

Don't encourage prisoners dilemmas

This adds a lot of complication, and in the real world, nothing is cleanly and perfectly opposite. Imagine two art museums bidding on a painting. Are they putting their money in opposite places, or are they working together to increase the amount of art displayed in public? (Bidding also increases production.)

Also, political donations are actually quite small, comparatively speaking. A different prisoner's dilemma operates. I'm not saying the other effect is absent, just that it's smaller.

A set of Democrats would each like the Democrats to win, but they also each want a bigger TV. Each would prefer that everyone donate, but their own donation won't make a big difference on its own, so they buy a new TV instead.
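A toy version of that arithmetic (all the numbers are invented), just to show why each individual donation doesn't pay even though universal donation would:

```python
# Free-rider arithmetic with made-up numbers.
n_supporters = 1000
win_value = 2000.0        # how much each supporter values their side winning
donation_cost = 300.0     # price of donating (or of the new TV)
p_per_donation = 0.0005   # marginal boost to win probability per donation

# My own donation, considered in isolation:
my_gain = p_per_donation * win_value
print(my_gain, "<", donation_cost)          # 1.0 < 300: buy the TV

# If everyone donates, what each person gets out of it:
everyone_gain = n_supporters * p_per_donation * win_value
print(everyone_gain, ">", donation_cost)    # 1000.0 > 300: all-donate beats all-defect
```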

The Median is Less than the Average

However, it depends on what you are measuring, and what scale you are using. Use a scale like Busy Beaver IQ and almost everyone is below average, because you are stretching out differences at the top of the scale.
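A quick sketch of that scale-dependence (toy numbers, a normal distribution as a stand-in for ordinary IQ): apply a convex rescaling that stretches the top, and most people drop below the new average.

```python
import numpy as np

rng = np.random.default_rng(0)

# On a roughly symmetric scale, about half of people are above the mean.
iq = rng.normal(100, 15, size=1_000_000)
print((iq > iq.mean()).mean())               # ~0.50

# A scale that stretches out the top end (a stand-in for something like
# "Busy Beaver IQ"): same people, same ordering, but the mean is now
# dragged upwards by the extreme top, so most people fall below it.
stretched = np.exp((iq - 100) / 15)
print((stretched > stretched.mean()).mean()) # ~0.31
```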

There is also a difference between biological intelligence and optimization power. A caveman with the genes for being unusually smart still won't know much science. It is at least plausible that the median is less than the mean here, but again, it depends on the scale used.

The Median is Less than the Average

Intelligence is capped on the bottom but it is long-tailed at the top.

This seems wrong to me. From an evolutionary perspective, there shouldn't be complex new structures that are only occasionally present. 

There is a modal human brain. Most particular brains have a few small errors that make them slightly worse. Some have a small mutation that makes them slightly better. The long tail stretches downwards, into the people with severe learning disabilities. 
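A toy simulation of that picture (all the effect sizes are invented): most brains sit near the modal design minus a few small errors, a small fraction have something badly wrong, and the result is a tail stretching downwards, with the median above the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 1_000_000

# Score 0 = the modal brain.  Most people carry a few small errors,
# a few carry a small beneficial mutation, and a small fraction have
# something severely wrong (the long downward tail).
score = -1.0 * rng.poisson(3.0, n_people)                 # small errors
score += 0.5 * rng.binomial(1, 0.05, n_people)            # rare small improvements
severe = rng.random(n_people) < 0.01
score[severe] -= rng.exponential(20.0, n_people)[severe]  # rare severe problems

print("mean:  ", score.mean())      # dragged down by the tail
print("median:", np.median(score))  # sits above the mean
```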

Massive consequences

This would be true in a chaotic classical world. But we live in a quantum world. 

The weather is chaotic. Consider a decision of whether or not to sneeze. And suppose you are standing next to a bank of huge fans, each being flipped on and off by a quantum randomness source. 

Whether or not you sneeze, the fans ensure you get a quantum superposition over a huge number of possible future weathers. The particular futures would be different, but with so many samples, most futures where you do sneeze would have a corresponding almost identical one where you didn't.
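A rough classical cartoon of this (the logistic map standing in for chaotic weather, the fan on/off pattern standing in for the quantum randomness): each individual branch is completely scrambled by the sneeze, but the distribution over branches barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def weather(fans, sneeze):
    # Logistic map as a crude chaotic "weather": the initial condition is
    # set by the fan on/off pattern, plus a tiny perturbation if you sneeze.
    x = 0.2 + 0.6 * fans.dot(0.5 ** np.arange(1, len(fans) + 1))
    if sneeze:
        x += 1e-9
    for _ in range(200):              # long enough for chaos to amplify the sneeze
        x = 3.9 * x * (1 - x)
    return x

n_branches = 20_000
fan_states = rng.integers(0, 2, size=(n_branches, 20))   # 20 randomly flipped fans

no_sneeze = np.array([weather(f, False) for f in fan_states])
sneeze    = np.array([weather(f, True)  for f in fan_states])

bins = np.linspace(0, 1, 11)
# Branch by branch the outcomes differ completely, but the distribution
# over branches is nearly identical either way.
print(np.mean(np.isclose(no_sneeze, sneeze)))   # ~0: essentially every branch changed
print(abs(no_sneeze.mean() - sneeze.mean()))    # small
print(np.histogram(no_sneeze, bins)[0])
print(np.histogram(sneeze, bins)[0])            # almost the same counts
```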

Of course, whether this picture or the chaotic picture is more correct depends on the scale of your decisions relative to the scale of the quantum randomness. It also depends on how much you consider things to be in superposition. If you don't know the name of the vice president of Bolivia, are they in a superposition over all names until you look it up?

In this view, addressing a letter to Vice President Alfronzo or to Vice President Bernardo has similar consequences: a superposition of the letter arriving or not (but mostly not).

Non-Obstruction: A Simple Concept Motivating Corrigibility

This definition of a non-obstructionist AI takes what would happen if it wasn't switched on as the base case. 

This can give weird infinite hall-of-mirrors effects if another very similar non-obstructionist AI would have been switched on, and another behind that (i.e. a human whose counterfactual behaviour on AI failure is to reboot and try again). This would tend to lead to a kind of fixed-point effect, where the attainable utility landscape is almost identical with the AI on and off. At some point it bottoms out, when the hypothetical humans with utility function U give up and do something else. If we assume that the AI is at least weakly trying to maximize attainable utility, then several hundred levels of counterfactuals in, the only hypothetical humans that haven't given up are the ones that really like trying again and again at rebooting the non-obstructionist AI. Suppose the AI would be able to satisfy that value really well. So the AI will focus on the utility functions that are easy to satisfy in other ways, and those that would obstinately keep rebooting in the hypothetical where the AI kept not turning on. (This might be complete nonsense. It seems to make sense to me.)

A Critique of Non-Obstruction

What if, the moment the AI boots up, a bunch of humans tell it "our goals aren't on a spike"? (It could technically realize this based on anthropic reasoning: if humans really wanted to maximize paperclips, and it's easy to build a paperclip maximizer, we wouldn't have built a non-obstructive AI.)

We are talking about policies here. If the humans' goals were on a spike, they wouldn't have said that. So if the AI takes the policy of giving us a smoother attainable utility function in this case, this still fits the bill.

Actually I think that this definition is pushing much of the work off onto the human policy function. This is a function that can take any utility function and say how a human would behave if maximising it. Flip that around: it takes the human policy and produces a set of possible utility functions (presumably very similar functions). Over these indistinguishable utility functions, the AI tries to make sure that none of them end up lower than they would have been if the AI didn't exist. Whether this is better or worse than maximizing the minimum or the average would be sensitively dependent on exactly what this magic set of utility functions, generated by the human policy function, is.

Optimal play in human-judged Debate usually won't answer your question

Neural nets have adversarial examples. Adversarial optimization of part of the input can make the network do all sorts of things, including computations.

If you optimise the inputs to a buggy program hard enough, you get something that crashes the program in a way that happens to score highly. 
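A minimal sketch of what that adversarial optimisation of the input looks like, against a toy linear classifier rather than a real net (the fast-gradient-sign idea): nudge every coordinate a little in the direction that increases the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a fixed logistic-regression classifier with random weights.
dim = 100
w = rng.normal(size=dim)

def prob_class_1(x):
    return 1.0 / (1.0 + np.exp(-w @ x))

x = np.zeros(dim)                    # a perfectly ambiguous input
print("before:", prob_class_1(x))    # 0.5

# Adversarial optimisation of the input: for a linear model the gradient
# direction with respect to the input is just sign(w), so step along it.
eps = 0.25
x_adv = x + eps * np.sign(w)
print("after: ", prob_class_1(x_adv))   # ~1.0: a small per-coordinate change flips the verdict
```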

I suspect that optimal play on most adversarial computer games looks like a game of core wars. https://en.wikipedia.org/wiki/Core_War

Of course, if we really have myopic debate, without any mesa-optimisers, then neither AI is optimizing to have a long-term effect or to avoid long-term effects. They are optimising for a short-term action, and to defend against their adversary.

If AI1 manages to "persuade" the human not to look at AI2's "arguments" early on, then AI1 has free rein to optimise, and the human could end up as a UFAI mesa-optimiser. Suppose AI1 is rewarded by the pressing of a red button. The human could walk out of there still human-level smart, but with the sole terminal goal of maximizing the number of red buttons pressed universe-wide. If the human is an AI researcher, this could potentially end with them making a button-pressing ASI.

Another option I consider fairly likely is that the debater ends up permanently non-functional. Possibly dead, possibly a twitching mess. After all, any fragment of behavioural pattern resembling "functional human" is attack surface. Suppose there is a computer running buggy and insecure code. You are given first go at hacking it. After your go, someone else will try to hack it, and your code has to repel them. You are both optimising for a simple formal objective, like the average pixel colour of the screen.

You can get your virus into the system; now you want to make it as hard as possible for your opponent to follow after you. Your virus will probably wipe the OS, cut all network connections, and blindly output your preferred output.

That's plausibly a good strategy, a simple cognitive pattern that repeats itself and blindly outputs your preferred action, wiping away any complex dependence on observation that could be used to break the cycle. 

A third possibility is just semi-nonsense that creates a short-term compulsion or temporary trance state.

The human brain can recover fairly well from the unusual states caused by psychedelics. I don't know how that compares to recovering from unusual states caused by strong optimization pressure. In the ancestral environment, there would be some psychoactive substances, and some weak adversarial optimization from humans.

I would be more surprised if optimal play gave an answer that was like an actual plan. (This would be like a plan for a perpetual motion machine, with a detailed technical physics argument for why it should work, that just has one small 0/0 hidden somewhere in the proof.)
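For a feel of what "one small 0/0 hidden somewhere" means, the standard toy example is the algebra "proof" that 2 = 1, where every step looks fine until you notice the division by zero:

```latex
a = b
\;\Rightarrow\; a^2 = ab
\;\Rightarrow\; a^2 - b^2 = ab - b^2
\;\Rightarrow\; (a+b)(a-b) = b(a-b)
\;\Rightarrow\; a + b = b \quad \text{(dividing by } a - b = 0\text{)}
\;\Rightarrow\; 2 = 1.
```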

I would be even more surprised if the plan actually worked, if the optimal debating AIs actually outputted highly useful info.

For AIs strongly restricted in output length, I think it might produce useful info on the level of maths hints, like "renormalize the vector, then differentiate": something that a human can follow and check quickly. If you don't have the bandwidth to hack the brain, you can't send complex novel plans, just a little hint towards a problem the human was on the verge of solving themselves. Of course, the humans might well follow the rules in this situation.
