

I'm going to quote this from an EA Forum post I just made on why simple repeated exposure to AI Safety (e.g. through media coverage) will probably do a lot to persuade people:

[T]he more people hear about AI Safety, the more seriously people will take the issue. This seems to be true even if the coverage is purporting to debunk the issue (which as I will discuss later I think will be fairly rare) - a phenomenon called the illusory truth effect. I also think this effect will be especially strong for AI Safety. Right now, in EA-adjacent circles, the argument over AI Safety is mostly a war of vibes. There is very little object-level discussion - it's all just "these people are relying way too much on their obsession with tech/rationality" or "oh my god these really smart people think the world could end within my lifetime". The way we (AI Safety) win this war of vibes, which will hopefully bleed out beyond the EA-adjacent sphere, is just by giving people more exposure to our side.  

No, it does not say that either. I’m assuming you’re referring to “choose our words carefully”, but stating something imprecisely is a long way from not telling the truth.

Nowhere in that quote does it say we should not speak the truth.

Yeah, so this seems like what I was missing.

But it seems to me that in these types of models, where the utility function is based on the state of the world rather than on input to the AI, aligning the AI not to kill humanity is easier. Like if an AI gets a reward every time it sees a paperclip, then it seems hard to punish the AI for killing humans because "human dies" is a hard thing for an AI with just sensory input to explicitly recognize. If however the AI is trained on a bunch of runs where the utility function is the number of paperclips actually created, then we can also penalize the model for the number of people who actually die.
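To make the contrast concrete, here is a minimal sketch of the two reward setups. Everything in it is a hypothetical illustration: the class names, the field names, and the `death_penalty` weight are assumptions made up for this example, not taken from any real training setup.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # What the agent's sensors report (hypothetical field).
    paperclips_seen: int


@dataclass
class WorldState:
    # Ground-truth facts about the world (hypothetical fields).
    paperclips_made: int
    humans_dead: int


def observation_reward(obs: Observation) -> float:
    # Reward based only on sensory input: "human dies" never appears
    # in the observation, so there is nothing here to penalize.
    return float(obs.paperclips_seen)


def state_reward(state: WorldState, death_penalty: float = 1000.0) -> float:
    # Reward based on the actual world state: deaths are an explicit
    # quantity, so they can be subtracted directly.
    # The penalty weight is an arbitrary assumption.
    return state.paperclips_made - death_penalty * state.humans_dead
```

In the second version, a run that makes 5 paperclips but kills one person scores far worse than a run that makes 5 paperclips and kills nobody, whereas the first version cannot tell those two runs apart.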

I'm not very familiar with these forms of training so I could be off here.

Steelmanning is useful as a technique because often the intuition behind somebody’s argument is true even if the precise argument they are using is not. If the other person is a rationalist, then you can point out the argument’s flaws and expect them to update the argument to more precisely explore their intuition. If not, you likely have to do some of the heavy lifting for them by steelmanning their argument and seeing where its underlying intuition might be correct.

This post seems only focused on the rationalist case.

As with most things in life: this seems like it could be a real improvement, it's great that we're testing it and finding out!

For most products to be useful, they must be (perhaps not perfectly, but near-perfectly) reliable. A fridge that works 90% of the time is useless, as is a car that breaks down 1 out of every 10 times you try to go to work. The problem with AI is that it’s inherently unreliable - we don’t know how the inner algorithm works, so it just breaks at random points, especially because most of the tasks it handles are really hard (which is why we can’t just use classical algorithms). This makes it really hard to integrate AI until it gets really good, to the point where it can actually be called reliable.

The things AI is already used for are things where reliability doesn’t matter as much. Advertising algorithms just need to be as good as possible to make the company as much revenue as possible. People currently use machine translation just to get the message across and not for formal purposes, so today’s AI algorithms are sufficient (if they were better, maybe we could use them for more formal purposes!). The list goes on.

I honestly think AI won’t become super practical until we reach AGI, at which point (if we ever get there) its usage will explode due to massive applicability and solid reliability (if it doesn’t take over the world, that is).

Yeah, maybe we could show the ratio of strong upvotes to upvotes.