In a word, yes. Very unappealing.
Cart-pole balancing seems like a good toy case
Is it relevant whether you knew about the apples before the apple man told you about them? If you didn't know, then the least exploitable response to a message that looks adversarial is to pretend you didn't hear it, which would mean not eating the apples.
Also, Pascal's mugging is worth coordinating against: if everyone gives the five dollars, the stranger rapidly accumulates wealth via dishonesty. If no one eats the apples, the stranger is just left with the same tree of apples going less and less eaten, which is less corrosive.
One way I could write a computer program that e.g. lands a rocket ship is to simulate many landings that could happen after possible control inputs, pick the simulated landing that has properties I like (such as not exploding and staying far from actuator limits), and then run a low-latency loop that locally makes reality track that simulation, counting on the simulation to reach a globally pleasing end.
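For concreteness, here is a minimal sketch of that simulate-then-track approach (the 1-D dynamics, the scoring function, and every name here are invented for illustration, not taken from any real flight stack):

```
import numpy as np

rng = np.random.default_rng(0)

def simulate(state, thrusts, dt=0.1):
    """Toy 1-D lander: state = (altitude m, velocity m/s), control = thrust accel m/s^2."""
    alt, vel = state
    traj = [(alt, vel)]
    for u in thrusts:
        vel += (u - 9.81) * dt   # net acceleration: thrust minus gravity
        alt += vel * dt
        traj.append((alt, vel))
    return traj

def score(traj, thrusts, u_max=20.0):
    """Prefer landings that end near the ground, slowly, far from actuator limits."""
    alt, vel = traj[-1]
    return -abs(alt) - abs(vel) - 0.1 * max(abs(u) for u in thrusts) / u_max

# 1. Simulate many landings that could happen after possible control inputs,
#    and pick the simulated landing with the properties I like.
state = (100.0, -5.0)  # 100 m up, descending at 5 m/s
plans = [rng.uniform(0.0, 20.0, size=60) for _ in range(500)]
best_plan = max(plans, key=lambda p: score(simulate(state, p), p))

# 2. Low-latency loop: locally make reality track that simulation,
#    counting on the simulation to reach a globally pleasing end.
reference = simulate(state, best_plan)
for u_planned, (alt_ref, vel_ref) in zip(best_plan, reference[1:]):
    # pretend-measure the state, then add a feedback term pulling it toward the plan
    alt_meas, vel_meas = alt_ref + rng.normal(0, 0.1), vel_ref + rng.normal(0, 0.05)
    u = u_planned + 0.5 * (alt_ref - alt_meas) + 1.0 * (vel_ref - vel_meas)
    # ...send u to the actuator here...
```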
Is this what you mean by loading something into your pseudo prediction?
No law of physics stops the first AI in an RSI cascade from having its values completely destroyed by RSI. I think this is the default outcome?
It's a shame language model decoding isn't deterministic, or I could make a snarky but unhelpful comment that the information content is provably identical, by some sort of pigeonhole argument.
Epistemic status: 11 pages into “The Lathe of Heaven” and dismayed by Orr
Are alignment methods that rely on the core intelligence being pre-trained on webtext sufficient to prevent ASI catastrophe?
What are the odds that, 40 years after the first AGI, the smartest intelligence is pretrained on webtext?
What are the odds that the best possible way to build an intelligent reasoning core is to pretrain on webtext?
What are the odds that we can stay in a local maximum for 40 years of everyone striving to create the smartest thing they can?
My mental model of the sequelae of AGI in ~10 years without an intentional global slowdown is that within my natural lifespan, there will be 4-40 transitions in the architecture of the current smartest intelligence, where the architecture undergoes changes in overall approach at least as large as the jump from evolution -> human brain or from human brain -> RL'd language model. Alignment means building programs that are themselves benevolent, but are also both wise and mentally tough enough to only build benevolent and wise successors, even when put under crazy pressure to build carelessly. When I say crazy pressure, I mean "the entity trying to get you to build carelessly is dumber than you, but it gets to RL you into agreeing to help" levels of pressure. This is hard.
A successfully trained perceptron with one hidden layer of 500 activations has at absolute minimum 500! possible successful parameter settings: permuting the hidden units, together with their incoming and outgoing weights, leaves the computed function unchanged.
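That minimum is just the hidden-unit permutation symmetry; a quick numpy check (layer sizes arbitrary, names mine):

```
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer perceptron: y = relu(x @ W1 + b1) @ W2 + b2
W1, b1 = rng.normal(size=(10, 500)), rng.normal(size=500)
W2, b2 = rng.normal(size=(500, 1)), rng.normal(size=1)

def forward(x, W1, b1, W2, b2):
    h = np.maximum(x @ W1 + b1, 0.0)  # hidden activations
    return h @ W2 + b2

x = rng.normal(size=(3, 10))
perm = rng.permutation(500)  # one of the 500! orderings of the hidden units

# Permuting each hidden unit's incoming weights, bias, and outgoing weights
# gives a different parameter setting that computes the identical function.
assert np.allclose(
    forward(x, W1, b1, W2, b2),
    forward(x, W1[:, perm], b1[perm], W2[perm, :], b2),
)
```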
Thanks for sharing this. I think I need to be more appreciative that my university experience may have been good because of exceptional effort on the university's part, not by default. That can be true even though the best parts centered on the university getting out of the students' way.
Second on Tux Paint
Tux Racer (penguin sledding) and SuperTux (platformer) are games with level editors. My three-year-old loves SuperTux and its level editor, but it is a well-put-together enough game that it's starting to be addictive for him.
Whenever he sees me working I'm in a terminal, so he wanted to learn how to use one. I taught him how to type
```
sl            # a steam locomotive drives across the terminal
sl -a         # the accident variant
sl; sl        # two trains, one after the other
sl | lolcat   # piped through lolcat's rainbow colorizer
cowsay hi     # an ASCII cow says "hi"
```
etc., and he found this very amusing. He will often demand to "make a train" if I get the laptop out where he can see me.