No, it does not say that either. I’m assuming you’re referring to “choose our words carefully”, but stating something imprecisely is a far cry from not telling the truth.
Yeah so this seems like what I was missing.
But it seems to me that in these types of models, where the utility function is based on the state of the world rather than on the AI's inputs, aligning the AI not to kill humanity is easier. If an AI gets a reward every time it sees a paperclip, for example, it seems hard to punish it for killing humans, because "a human died" is hard for an AI with only sensory input to recognize explicitly. If, however, the AI is trained on a bunch of runs where the utility function is the number of paperclips actually created, then we can also penalize the model for the number of people who actually die.
I'm not very familiar with these forms of training, so I could be off here.
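To make the distinction concrete, here's a minimal Python sketch - the `WorldState` class, the function names, and the `death_penalty` constant are all my own hypothetical illustration, not any real training setup:

```python
from dataclasses import dataclass

@dataclass
class WorldState:
    """Ground-truth state of the training environment (hypothetical)."""
    num_paperclips: int
    num_human_deaths: int

def sensory_reward(paperclips_seen: int) -> float:
    # Reward computed from observations only: the agent is paid per
    # paperclip it *sees*. There is no term for off-camera harms,
    # because the observation stream can't express them.
    return float(paperclips_seen)

def world_state_reward(state: WorldState, death_penalty: float = 1e6) -> float:
    # Reward computed from the actual world state of each training run:
    # we can subtract a large penalty per person who actually dies,
    # whether or not the agent ever observes it.
    return state.num_paperclips - death_penalty * state.num_human_deaths
```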
Steelmanning is useful as a technique because often the intuition behind somebody’s argument is true even if the precise argument they are using is not. If the other person is a rationalist, then you can point out the argument’s flaws and expect them to update the argument to more precisely explore their intuition. If not, you likely have to do some of the heavy lifting for them by steelmanning their argument and seeing where its underlying intuition might be correct.
This post seems only focused on the rationalist case.
As with most things in life: this seems like it could be a real improvement, and it's great that we're testing it and finding out!
For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is that it’s inherently unreliable - we don’t know how the inner algorithm works, so it breaks at unpredictable points, especially because most of the tasks it handles are genuinely hard (which is why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point where it can actually be called reliable.
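Here's a toy calculation of why this is so punishing (my own illustration, assuming each step of a task fails independently): a product that chains ten 90%-reliable steps succeeds only about a third of the time.

```python
# Toy illustration of how per-step unreliability compounds, assuming
# each step of a task fails independently (a simplification).

def end_to_end_success(per_step_reliability: float, num_steps: int) -> float:
    """Probability the whole task succeeds when every step must succeed."""
    return per_step_reliability ** num_steps

print(end_to_end_success(0.90, 1))   # 0.9   - a single step
print(end_to_end_success(0.90, 10))  # ~0.35 - ten chained 90% steps
print(end_to_end_success(0.99, 10))  # ~0.90 - "really good" per step
```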
The things AI is already used for are things where reliability doesn’t matter as much. Advertising algorithms only need to perform well on average to maximize revenue - a single bad recommendation costs almost nothing. People currently use machine translation just to get the message across, not for formal purposes, making AI algorithms sufficient (if they were better, maybe we could use them for more formal purposes!). The list goes on.
I honestly think AI won’t become super practical until we reach AGI, at which point (if we ever get there) its usage will explode due to massive applicability and solid reliability (if it doesn’t take over the world, that is).
I'm going to quote this from an EA Forum post I just made about why mere repeated exposure to AI Safety (through e.g. media coverage) will probably do a lot to persuade people: