Donald Hobson

MMath Cambridge. Currently studying postgrad at Edinburgh.


Logical Counterfactuals and Proposition graphs
Assorted Maths


Two Stupid AI Alignment Ideas

A couple more problems with extreme discounting. 

In contexts where the AI is doing AI coding, the extreme discounting is only weakly conserved. I.e., the original AI doesn't care if it makes an AI that doesn't have super high discount rates, so long as that AI does the right things in the first 5 minutes of being switched on.

The theoretical possibility of time travel.
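To put a rough number on the first point above, here is a minimal sketch assuming continuous exponential discounting. The 30-second half-life is an illustrative assumption, not a figure from the original comment; only the 5-minute horizon comes from the text. It computes what fraction of the original AI's total discounted weight falls after the horizon, which bounds how much it cares about its successor's later behaviour.

```python
def tail_weight(horizon_s, half_life_s):
    """Fraction of total discounted weight that falls after horizon_s,
    for continuous exponential discounting with the given half-life."""
    return 0.5 ** (horizon_s / half_life_s)


# Assumed half-life of 30 s; horizon of 5 minutes = 300 s.
print(tail_weight(300, 30))  # 0.0009765625
```

Under these assumed numbers, less than 0.1% of the original AI's utility depends on what the successor does after the first 5 minutes, so it has almost no incentive to pass on its own discount rate.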


Also, the strong incentive to pay in Parfit's hitchhiker only exists if Parfit can reliably predict you. If humans have the ability to look at any AI code and reliably predict what it will do, then alignment is a lot easier: you just don't run any code you predict will do bad things.

Also FAI != enslaved AI.

In a successful FAI project, the AI has terminal goals carefully shaped by the programmers, and achieves those goals.

In a typical UFAI, the terminal goals are set by the random seed of network initialization, or arbitrary details in the training data.

A positive case for how we might succeed at prosaic AI alignment

I think you might be able to design advanced nanosystems without AI doing long-term real-world optimization.

Well, a sufficiently large team of smart humans could probably design nanotech. The question is how much an AI could help.

Suppose unlimited compute. You program a simulation of quantum field theory. Add a GUI to see visualizations and move atoms around. Designing nanosystems is already quite a bit easier.

Now suppose you brute force search over all arrangements of 100 atoms within a 1nm box, searching for the configuration that most efficiently transfers torque. 

You do similar searches for the smallest arrangement of atoms needed to make a functioning logic gate.

Then you download an existing microprocessor design, and copy it (but smaller) using your nanologic gates.

I know that if you start brute forcing over a trillion atoms, you might find a mesaoptimizer. (Although even then I would suspect that inspecting the visualization shouldn't result in anything brain-hacky. It would only be actually synthesizing such a thing that was dangerous, or maybe possibly simulating it, if the mesaoptimizer realizes it's in a simulation and there are general simulation escape strategies.)

So look at the static output of your brute forcing. If you see anything that looks computational, delete it. Don't brute force anything too big. 

(Obviously you need human engineers here; any long-term real-world planning is coming from them.)

Quantilizer ≡ Optimizer with a Bounded Amount of Output

You can quantilize any distribution. For the uniformly random distribution over actions, you need to use 0.000...01% quantilization to get anything useful at all. (In non-trivial environments, the vast majority of random actions are useless junk. If you have a humanoid coffee-making robot, almost all random motor inputs will result in twitching on the floor.)

However, you can also quantilize over a model trained to imitate humans. Suppose you give some people a joystick and ask them to remote control the robot to make coffee. They manage this about half the time. You train the robot to be a 25% quantilizer. Actions drawn from the top 25% of the human action distribution will do better than the average human.

This is technically equivalent to an imitator of humans combined with an ASI that can only output 2 bits (which are used as a random seed for the human).

Though these descriptions imply different internal workings, which may indicate different probabilities of the AI using rowhammer. 
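As a minimal sketch of the idea above, a q-quantilizer can be approximated by sampling from the base distribution, keeping the top q fraction by utility, and picking uniformly among those. The names `quantilize`, `selection_bits` and `imitate_human`, and the toy "quality score" model of the coffee task, are all assumptions made for illustration; a real imitator would be a learned policy, not a random number.

```python
import math
import random


def quantilize(sample_action, utility, q, n_samples=1000, rng=None):
    """Approximate a q-quantilizer by sampling.

    Draws n_samples actions from the base distribution, keeps the top
    q fraction as ranked by utility, and returns one of them uniformly
    at random.
    """
    rng = rng or random.Random()
    actions = [sample_action(rng) for _ in range(n_samples)]
    actions.sort(key=utility, reverse=True)
    keep = max(1, int(q * n_samples))
    return rng.choice(actions[:keep])


def selection_bits(q):
    """Bits needed to pick out a q fraction of the base distribution."""
    return math.log2(1 / q)


# Hypothetical stand-in for the human imitator: each action is summarized
# by a quality score in [0, 1], and coffee gets made when quality > 0.5,
# so the imitated humans succeed about half the time.
def imitate_human(rng):
    return rng.random()


rng = random.Random(0)
action = quantilize(imitate_human, utility=lambda a: a, q=0.25, rng=rng)
print(action > 0.5)          # whether the chosen action makes coffee
print(selection_bits(0.25))  # 2.0
```

The `selection_bits` helper is where the "2 bits" figure comes from: choosing one quarter of the base distribution takes log2(1/0.25) = 2 bits of optimization on top of pure imitation.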

What would we do if alignment were futile?

There may be a nanotech critical point. Getting to full advanced nanotech probably involves many stages of bootstrapping. If lots of nanobots have been designed on a computer, then an early stage of the bootstrapping process might be the last to be designed. (Building a great nanobot with a mediocre nanobot might be easier than building the mediocre nanobot from something even worse.) This would mean a sudden transition where one group suddenly has usable nanotech.

So, can a team of 100 very smart humans, working together, with hand-coded nanotech, stop an ASI being created?

I would be unsurprised if blindly scanning and duplicating a human, to the resolution where memories and personality were preserved, was not that hard with hand-coded nanotech. (Like a few researcher-months of effort.)

Making nanomachines that destroy GPUs seems not that hard either.

Nor does making enough money to just buy all the GPUs and top AI talent available.

Actually, finding everyone capable of doing AI research and paying them 2x as much to do whatever they like (as long as they don't publish and don't run their code on non-toy problems) sounds like a good plan in general.

"For only $10 000 a year in fine whisky, we can keep this researcher too drunk to do any dangerous AI research." But thousands more researchers like him still spend their nights sober adjusting hyperparameters. That's why we're asking you to help in this charity appeal.

(Idea meant more as interesting wild speculation. Unintended incentives exist. That isn't to say it would be totally useless.) 

Discussion with Eliezer Yudkowsky on AGI interventions

Under the Eliezerian view (the pessimistic view that is producing <10% chances of success), these approaches are basically doomed. (See logistic success curve.)

Now I can't give overwhelming evidence for this position. Wisps of evidence maybe, but not an overwhelming mountain of it.

Under these sorts of assumptions, building a container for an arbitrary superintelligence such that it has only an 80% chance of being immediately lethal, and a 5% chance of being marginally useful, is an achievement.

(and all possible steelmannings, that's a huge space)

Discussion with Eliezer Yudkowsky on AGI interventions

Let's say you use all these filtering tricks. I have no strong intuitions about whether these are actually sufficient to stop those kinds of human manipulation attacks. (Of course, if your computer security isn't flawless, it can hack whatever computer system it's on and bypass all these filters to show the humans arbitrary images and probably access the internet.)

But maybe you can, at quite significant expense, make a Faraday cage sandbox, and then use these tricks. This is beyond what most companies will do in the name of safety. But MIRI or whoever could do it. Then they ask the superintelligence about nanosystems, and very carefully read the results. Then presumably they go and actually try to build nanosystems. Of course you didn't expect the superintelligence's advice to be correct, did you? And not wrong in an easily detectable fail-safe way either. Your concepts and paradigms are all subtly malicious. Not clear, testable and factually wrong statements, but nasty tricks hidden in the invisible background assumptions.

Consequentialism may cost you

I feel you are taking some concepts that you think aren't very well defined, and throwing them away, replacing them with nothing. 

I admit that the intuitive notions of "morality" are not fully rigorous, but they are still far from total gibberish. Some smart philosopher may come along and find a good formal definition.

"Survival" is the closest we have to an objective moral or rational determinant. 

Whether or not a human survives is an objective question. The amount of hair they have is similarly objective. So is the amount of laughing they have done, or the number of mathematical facts they know.

All of these have ambiguity of definition: has a brain-dead body with a beating heart "survived"? This is a question of how you define "survive". And once you define that, it's objective.

There is nothing special about survival, except to the extent that some part of ourselves already cares about it.

And evolution as a force typically acts on collectives, not individuals.

Evolution doesn't affect any individual in particular. There is no individual moth who evolved to be dark. It acts on the population of moths as a whole. But evolution selects for the individuals that put themselves ahead. Often this means individuals that cheat to benefit themselves at the expense of the species. (Cooperative behaviour is favoured when creatures have long memories and a good reputation is a big survival advantage. Stab your hunting partner in the back to double your share once, and no one will ever hunt with you again.)

Intelligence or Evolution?

Firstly, this would be AIs looking at their own version of the AI alignment problem. This is not random mutation or anything like it. Secondly, I would expect there to be only a few rounds at most of self-modification that puts goals at risk. (Likely 0 rounds.) Damaging goals loses a lot of utility. You would only do it if it's a small change in goals for a big increase in intelligence, and if you really need to be smarter and you can't make yourself smarter while preserving your goals.

You don't have millions of AIs all with goals different from each other. The self-upgrading step happens once, before the AI starts to spread across star systems.

Why is multi worlds not a good explanation for abiogenesis

I'm not sure what you mean by objectivity or why superimposed states don't have it.

If a pair of superimposed states differ by a couple of atom positions, they can interact and merge. If they differ by "cat alive" to "cat dead" then there is practically no interaction. The world stays superimposed forever. So that gives causal isolation and permanence.

I mean, maybe some of the connotations of "worlds" aren't quite accurate. It's a reasonably good word to use.
