Nathan Helm-Burger

AI alignment researcher, ML engineer. Master's in Neuroscience.


Comments

Downvoted because I think it misses the main point: avoid mindcrimes by not creating digital beings capable of suffering in circumstances where they are predictably doomed to suffer (e.g. slavery). I expect there will be digital beings capable of suffering, but they should be created only very thoughtfully, carefully, respectfully, and only after alignment is solved. Humorous related art: https://www.reddit.com/gallery/10j6w0i

I'm definitely feeling like I sacrificed both income and career capital by deciding to do alignment research full time. I don't feel like I'm being 'hurt' by the world though, I feel like the world is hurting. In a saner world, there would be more resources devoted to this, and it is to the world's detriment that this is not the case. I could go back to doing mainstream machine learning if I weren't overwhelmed by a sense of impending doom and compelled by a feeling of duty to do what I can to help. I'm going to keep trying my best, but I would be a lot more effective if I were working as part of a group. Even small things would speed me up a lot, like sharing the burden of the boilerplate code I need to write for my experiments, or having a reviewer to help point out my mistakes.

Seems like directly investing in the nascent AGI, if such an option were available, would be a better bet. I do think some amount of real estate investment makes sense if the land is directly giving you value currently (e.g. a place to live), since a mortgage is a loan over a longer time period than the likely time to AGI, and thus is a good way to bet against a 'nothing interesting happens' future.

Yeah, I think I am somewhat unusual in having tried to research timelines in depth and run experiments to support my research. My inside view continues to suggest we have less than 5 years. I've been struggling with how to write convincingly about this without divulging sociohazards. I feel apologetic about being in the position of arguing for a point while refusing to cite my evidence for it.

Probably worth mentioning that this isn't an isolated incident, but a growing phenomenon. It hits first for people who are in an emotionally vulnerable state and thus have a reason to want to believe, but we can see that technological progress is enabling more convincing and persuasive versions each year. I wish we had some kind of population metrics on this phenomenon so we could analyze trends...

Yeah, I think that we will need to be careful not to create AIs capable of suffering and then commit mindcrimes against them. I also think confinement is much safer if the AI doesn't know it is being confined. I endorse Jacob Cannell's idea of training entirely within a simulation that has carefully censored information, such that the sim appears to be the entire universe and doesn't mention computers or technology. https://www.lesswrong.com/posts/KLS3pADk4S9MSkbqB/review-love-in-a-simbox

Yeah, I don't think we know enough to be sure how it would work out one way or another. There's lots of different ways to wire up neurons to computers. I think it would be worth experimenting with if we had the time. We super don't though.

Oh, wonderful, all we have to do is make sure that nobody in charge of the dangerously powerful future AI is ever... lonely or otherwise emotionally vulnerable enough to be temporarily deceived and thus make a terrible error that can't be taken back. Um, I hope your comment was just sarcasm in poor taste and not actually a statement about why you are hopeful that nothing is going to go wrong.

Thanks for this! I continue to be excited about the possibility of making large black-box models safer by breaking them down into smaller modular components, and, where possible, making those components more interpretable distillations of the learned logic, such as synthesized programs. Related post with further links in comments: https://www.lesswrong.com/posts/xERh9dkBkHLHp7Lg6/making-it-harder-for-an-agi-to-trick-us-with-stvs
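As a rough illustration of what that kind of distillation could look like (a minimal sketch of my own, not from the linked post: the small scikit-learn MLP, the XOR task, and the decision-tree surrogate are all stand-ins I chose), one can query an opaque learned component and fit a transparent model that reproduces its behavior:

```python
# Sketch: distilling one learned component of a larger model into a more
# interpretable surrogate. A small MLP stands in for the black-box sub-module;
# a shallow decision tree plays the role of the interpretable distillation.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# Toy task: the "learned logic" is x0 XOR x1.
X = rng.integers(0, 2, size=(2000, 2)).astype(float)
y = X[:, 0].astype(int) ^ X[:, 1].astype(int)

# 1. The opaque component: a small neural network trained on the task.
black_box = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
black_box.fit(X, y)

# 2. Distillation: fit a transparent model to imitate the component's outputs,
#    querying the component on fresh inputs rather than using the original labels.
X_probe = rng.integers(0, 2, size=(2000, 2)).astype(float)
y_probe = black_box.predict(X_probe)
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_probe, y_probe)

# 3. Check how faithfully the surrogate reproduces the component, then inspect
#    the recovered logic directly.
agreement = (surrogate.predict(X_probe) == y_probe).mean()
print(f"surrogate agrees with black box on {agreement:.1%} of probe inputs")
print(export_text(surrogate, feature_names=["x0", "x1"]))
```

The point of the toy example is only the workflow: isolate a component, imitate it with something human-readable, and measure how much behavior the readable version actually captures before trusting it as an explanation.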
