I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at:

steve2152's Comments

Will AI undergo discontinuous progress?

That's a good point; if a research group develops a more promising approach to AI, recursive self-improvement / capability enhancement might be one of the first things they do, before going for immediate money-making applications, because the programmers know that application area already, and they can just do it internally without going through the rigmarole of marketing, product design, etc.

Curiosity Killed the Cat and the Asymptotically Optimal Agent

Hmm, I think I get it. Correct me if I'm wrong.

Your paper is about an agent which can perform well in any possible universe. (That's the "for all ν in ℳ"). That includes universes where the laws of physics suddenly change tomorrow. But in real life, I know that the laws of physics are not going to change tomorrow. Thus, I can get optimal results without doing the kind of exhaustive exploration that your paper is talking about. Agree or disagree?

On unfixably unsafe AGI architectures

Hmm, interesting. I think human cloning is an imperfect analogy because the only real reason to do it is to impress your friends, so if everyone coordinates on being scornful towards the first person to do human cloning (rather than being impressed), then there's no more personal benefit to cheating. By contrast, with an AGI, there would be the hope that you'll actually solve the safety problems, and then get tons of money and power and respect.

Biological weapons are maybe a better example, but not an especially encouraging one: as many as 8 countries may have secret bio-weapons programs, including North Korea. Maybe one could make an argument that there's a taboo against using bio-weapons, as opposed to merely stockpiling them? Likewise, the taboo against using nuclear weapons was not successfully turned into a taboo against countries starting new nuclear weapons programs. Maybe it's hard to get riled up against someone doing something that is not purposely aggressive? I don't know. I can't think of a great historical analogy.

There's also the issue that there aren't many actors who have both a reason to start a bio-weapons program and the ability to do so without getting shut down. Really just secret military labs. Whereas in the worst case, many orders of magnitude more people would be willing and able to start doing illegal AGI experiments without the authorities realizing it.

Curiosity Killed the Cat and the Asymptotically Optimal Agent

This question is probably stupid, and also kinda generic (it applies to many other papers besides this one), but forgive me for asking it anyway.

So, I'm trying to think through how this kind of result generalizes beyond MDPs. In my own life, I don't go wandering around an environment looking for piles of cash that got randomly left on the sidewalk. My rewards aren't random. Instead, I have goals (or more generally, self-knowledge of what I find rewarding), and I have abstract knowledge constraining the ways in which those goals will or won't happen.

Yes, I do still have to do exploration (try new foods, meet new people, ponder new ideas, etc.), but because of my general prior knowledge about the world, this exploration kinda feels different than the kind of exploration that they talk about in MDPs. It's not really rolling the dice; I generally have a pretty good idea of what to expect, even if it's still a bit uncertain along some axes.

So, how do you think about the generalizability of these kinds of MDP results?

(I like the paper, by the way!)

Morality vs related concepts

Not a philosopher, but common-sensically, I understand utilitarianism as saying that actions that create more good for more people are progressively more praiseworthy. It's something else to label the one very best possible action as "moral / permitted" and label every other action as "immoral / forbidden". That seems like a weird and counterproductive way to talk about things. Do utilitarians actually do that?

Attainable Utility Preservation: Concepts

Just trying to think this through ... at the risk of proving I haven't carefully read all your posts ... :-)

I program my AI to invent a better solar cell. So it starts by reading a materials science textbook. OK, now it knows materials science ... it didn't before ... Is that a disallowed AU increase? (As the saying goes, "knowledge is power"...?)

The Catastrophic Convergence Conjecture

Cool. We're probably on the same page then.
