Noumero

I'm a bit confused about how it'd work in practice. Could you provide an example of a concrete machine-learning setup, and how its inputs/outputs would be defined in terms of your variables?
I see. I have a specific counterexample that feels like it had to have been considered already, but I haven't seen it mentioned...
Strategies such as penalizing inconsistencies seem to rely on our ability to isolate the AI within the context of training, or to somehow make it “buy” into the setup, rather than quickly realizing what's happening and worming its way out of the proverbial box. It feels particularly strange to me when we're talking about AIs that can think better than the smartest human or handily beat specialized science AIs at the useful-ontology game.
Specific example: Once the AI figures out that it's being checked for consistency in parallel with other...
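To make the kind of check I'm describing concrete, here's a minimal sketch of one way a cross-input inconsistency penalty could be implemented, assuming the check compares the model's output distributions across inputs that are supposed to elicit the same answer. Everything here (`model`, the paraphrase batch, the weight 0.1) is a hypothetical placeholder, not the actual mechanism from the post:

```python
import torch
import torch.nn.functional as F

def consistency_penalty(model, x_variants):
    """Mean pairwise KL divergence between the model's output
    distributions on inputs that should elicit the same answer."""
    log_ps = [F.log_softmax(model(x), dim=-1) for x in x_variants]
    penalty, pairs = 0.0, 0
    for i in range(len(log_ps)):
        for j in range(len(log_ps)):
            if i != j:
                # KL(p_i || p_j); both arguments are log-probabilities.
                penalty = penalty + F.kl_div(
                    log_ps[j], log_ps[i], reduction="batchmean", log_target=True
                )
                pairs += 1
    return penalty / max(pairs, 1)

# Toy usage: a linear stand-in "model" and three stand-in paraphrases.
model = torch.nn.Linear(16, 4)
paraphrases = [torch.randn(8, 16) for _ in range(3)]
task_loss = model(paraphrases[0]).square().mean()  # placeholder task loss
loss = task_loss + 0.1 * consistency_penalty(model, paraphrases)
loss.backward()
```

The worry above applies directly to anything of this shape: an AI that infers it is being run on several variants of the same input in parallel can simply coordinate its answers across them, driving the penalty to zero without becoming any more honest.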
Are there any additional articles exploring the strategy of penalizing inconsistencies across different inputs? It seems both really promising to me and like something that should be trivially breakable. I'd like to get a more detailed understanding of it.
And that raises the question: even as we live through a rise in AI capabilities that is keeping Eliezer's concerns very topical, why did Drexler's nano-futurism fade...
One view I've seen is that perverse incentives did it. Widespread interest in nanotechnology led to government funding of the relevant research, which sparked competition within academic circles over that funding; discrediting certain avenues of research turned out to be an easier way to win that competition than actually making progress. To quote:
Hall blames public funding for science. Not just for nanotech, but for actually hurting progress in general. (I’ve never heard anyone before say government-funded science was bad for science!) “[The] great innovations that made the...
Yeah, this is the part I'm confused about as well. I think this proposal involves training a neural network that emulates a human? Otherwise I'm not sure how $\mathrm{Eval}_H(F(s_m), o_h)$ is supposed to work. It requires a human to predict the next step from the observations $o_h$ together with the direct translation $F(s_m)$ of the machine state, which requires some way to describe the full state in a form the "human" we're using can understand. That precludes using actual humans to label the data, because I don't think we actually have any way to provide such a description. We'd need to train up a human simulator specifically adapted to parsing this sort of output.
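For concreteness, here's roughly the pipeline I'd expect if we did train such a simulator. This is only a sketch of my reading, assuming $F$ is a learned translator and $H$ a learned human simulator; every name and dimension below is a hypothetical placeholder, not anything from the proposal:

```python
import torch
import torch.nn as nn

STATE_DIM, TRANSLATED_DIM, OBS_DIM, PRED_DIM = 512, 128, 64, 32  # placeholders

# F: direct translator from the machine's internal state s_m into a
# human-legible description (here just a linear map, for illustration).
translator_F = nn.Linear(STATE_DIM, TRANSLATED_DIM)

# H: the "human", which in practice must be a learned human simulator,
# since no real annotator can parse a raw description of the full state.
class HumanSimulator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TRANSLATED_DIM + OBS_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, PRED_DIM),
        )

    def forward(self, translated_state, o_h):
        # Predict the next step from the translation F(s_m) plus the
        # human-visible observations o_h.
        return self.net(torch.cat([translated_state, o_h], dim=-1))

human_H = HumanSimulator()

def eval_H(s_m, o_h):
    """Eval_H(F(s_m), o_h): the simulator's next-step prediction."""
    return human_H(translator_F(s_m), o_h)

# Toy usage with placeholder tensors:
s_m = torch.randn(1, STATE_DIM)  # raw machine state
o_h = torch.randn(1, OBS_DIM)    # human-visible observations
prediction = eval_H(s_m, o_h)
```

The point being: $H$ here is itself a trained model, so everything downstream of $\mathrm{Eval}_H$ inherits whatever errors the human simulator makes.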