Hello, this is great.

Out of interest, what's the reason you haven't just uploaded all of it? Is it a lot of work for you? Are the AWS credits expensive, etc.?

I was reading (well, listening) to this, and I think I've got some good reasons to expect failed AI coups to happen.

In general we probably expect "Value is Fragile", and this will probably apply to AI goals too (and the AI will think this). A consequentialist AI will therefore expect that if there is a high chance of another AI taking over soon, then almost all value in the universe (by its own definition of value) will be destroyed. So even though a particular coup has a low probability of working, it will still want to try it, because if it does nothing, almost all the value will be destroyed anyway. For example, if there are 4 similarly situated AI labs, then an AI at one of them will reason that it only has about a 25% chance of ending up in control of all the value in the universe by default. So as soon as it can come up with a coup attempt that it believes has a greater than roughly 25% chance of succeeding, it will probably want to go for it (maybe this is more complex, but I think the qualitative point stands).
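The threshold reasoning in that example can be sketched as a toy expected-value comparison. All the numbers here are hypothetical, and it bakes in the comment's assumption that any outcome where a different AI ends up in control is worth roughly zero by this AI's lights:

```python
# Toy expected-value comparison for the 4-lab example.
# Assumption (from the comment): value is fragile, so both a failed
# coup and another AI winning are worth ~0 to this AI.
n_labs = 4
p_default_win = 1 / n_labs   # chance of ending up in control by just waiting
p_coup = 0.30                # hypothetical coup success probability

ev_wait = p_default_win * 1.0   # value 1 only if this AI ends up in control
ev_coup = p_coup * 1.0          # failed coup is worth ~0, same as losing by default

print(ev_coup > ev_wait)  # the coup is attempted once p_coup exceeds 1/n_labs
```

On these assumptions the coup becomes attractive exactly when its success probability crosses 1/n_labs, which is the ~25% threshold in the comment.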

Secondly, because "Value is Fragile", AIs won't only be worried about other labs' AIs; they will probably also be pretty worried about the next iteration of themselves after an SGD update. Obviously there will be some correlation in beliefs about what is valuable between similarly weighted neural networks, but I don't think there's much reason to believe the weights will have been optimised to keep those values consistent across updates.

So, in conclusion: to the extent the doom scenario is a runaway consequentialist AI, then unless the probability of coup attempts succeeding jumps massively from around 0% to around 100% for some reason, there are good reasons to expect that we will see failed coup attempts first.

Oh interesting, I didn't realise there was so much nondeterminism for sums on GPUs.

I guess I thought that there are only ~65k representable float16 values, and the two highest logits are going to be drawn from a much smaller range within that 65k, just because they have to be bigger than everything else.

I might be missing something, but why does temperature 0 imply determinism? Neural nets don't work with real numbers, they work with floating-point numbers, so despite temperature 0 implying an argmax, there's no reason there aren't multiple maxima. AFAICT GPT-3 uses half-precision floating-point numbers, so there's quite a lot of scope for collisions.
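A quick way to see the collision point: float16 spacing is coarse at large magnitudes, so logits that are distinct in float32 can round to the same float16 value, giving argmax a tie. A minimal sketch using numpy (the specific numbers are illustrative, not anything from GPT-3):

```python
import numpy as np

# Two logits that are distinct in float32...
logits32 = np.array([1000.1, 1000.2, 5.0], dtype=np.float32)

# ...collide when cast to float16: near 1000 the spacing between
# adjacent float16 values is 0.5, so both round to 1000.0.
logits16 = logits32.astype(np.float16)

print(logits16[0] == logits16[1])  # True: two maxima, so argmax is a tie
```

Once there's a tie, which index gets picked can depend on implementation details (e.g. the order a parallel reduction visits elements), so temperature 0 alone doesn't guarantee determinism.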

Does anyone know if there's work to make a podcast version of this? I'd definitely be more willing to listen, even if it's just at Nonlinear Library quality rather than voice-acted.

Getting massively out of my depth here, but is that an easy thing to do, given that the later stages will have to share weights with the early stages?

"we don't currently know how to differentiably vary the size of the NN being run. We can certainly imagine NNs being rolled-out a fixed number of times (like RNNs), where the number of rollouts is controllable via a learned parameter, but this parameter won't be updateable via a standard gradient."

Is this really true? I can think of a way to do this with standard gradients.

Also, it looks like there is a 2017 paper by someone who works in ML where they do this.

TL;DR: at each rollout, have a neuron that represents the halting probability, and then make the result of the rollout the sum of the output vectors at each rollout, weighted by the probability the network halted at that rollout.
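A minimal sketch of that idea (the function names and the toy step/halt functions are hypothetical; this resembles adaptive-computation-time schemes generally, not any specific paper's implementation):

```python
import numpy as np

def weighted_rollout(step_fn, halt_fn, h0, max_steps=10):
    """Run up to max_steps rollouts; return the output vectors weighted
    by the probability the network halted at each step. Because the
    weights come from a (soft) halting neuron, the effective number of
    rollouts can be adjusted by ordinary gradient descent."""
    h = h0
    remaining = 1.0                      # probability mass not yet halted
    out = np.zeros_like(h0)
    for _ in range(max_steps):
        h = step_fn(h)
        p_halt = halt_fn(h)              # halting neuron output in (0, 1]
        out += remaining * p_halt * h    # weight: P(halting exactly here)
        remaining *= 1.0 - p_halt
    return out + remaining * h           # leftover mass goes to the last state

# Toy usage: a step that increments, and a halting neuron that always fires.
result = weighted_rollout(lambda h: h + 1.0, lambda h: 1.0, np.zeros(3))
print(result)  # [1. 1. 1.] — halting immediately returns the first rollout
```

The per-step weights sum to 1 by construction, so the output is a convex combination of the rollout states, and the gradient flows through every term.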

Thanks, I'll see how that goes, assuming I get enough free time to try this.
