Here's the link ^
It is hard for me to phrase this question in a way that is accurate and also not offensive, but research like this feels "not real" for some reason. It feels fuzzy, like made-up philosophy that won't ever have any real impact on the world. It doesn't feel /real/ the way a paper does when it says "look, this neural net is doing this bad thing, watch" or "look, this model always lies, and it's impossible to come up with a model that doesn't lie if it's in this format."
This just feels like pretend, made-up research with math equations put in so that it seems formal and rigorous. You just make up rules, pretend they describe reality, and then mathematically prove some results from there?
Does anyone else see what I'm saying about this sort of research feeling fake? And how would I convince myself that it isn't just useless random thoughts written down that kind of dance around the topic? At the end of all those questions, I feel no closer to knowing if a machine would stop you from pressing a button to shut it off.