Bay Area Winter Solstice 2019
Catalyst: a collaborative biosecurity summit
Saturday, November 30th 2019
CO2 Stripper Postmortem Thoughts
Useful Does Not Mean Secure
What's been written about the nature of "son-of-CDT"?
Counterfactuals as a matter of Social Convention
What attempts have been made at global coordination around AI safety?
How To Change a Dance
Warsaw December Meetup
Zielna 39, Warszawa
Experimental Design Club
Seattle, WA, USA
Prediction Party 2020
Seattle, WA, USA
Portland SSC Meetup 12/07/19
Eliezer has the Sequences, Scott has the Codex; what does Robin Hanson have? Can someone point me toward a place to start reading his posts in an order that makes sense? I found this post: https://www.lesswrong.com/posts/SSkYeEpTrYMErtsfa/what-are-some-of-robin-hanson-s-best-posts, which may be helpful. Does anyone have an opinion on it?
I've been thinking about interpretable models. If some system is making decisions for us, it seems good if we can ask it "Why did you suggest action X?" and get back something intelligible. So I read up on what other people have come up with.

One idea that seemed cool was tree regularization [http://www.shallowmind.co/jekyll/pixyll/2017/12/30/tree-regularization/]. Decision trees are sort of the standard for interpretable models because they typically make splits along individual features. The trick is to train a regularizer (itself a neural net) that proxies average tree path length, i.e. the complexity of a decision tree comparable to the actual model you're training. Then, when you're done, you train a new decision tree that mimics the final neural net (the one you trained with the regularizer). The author pointed out that, in the process of doing so, you can see which features the model thinks are relevant. Sometimes they don't make sense, but the whole point is that you can at least tell that they don't make sense (from a human perspective), because the model is less opaque. You know more than just "well, it's a linear combination of the inputs, followed by some nonlinear transformations, repeated a bunch of times."

But if the features don't seem to make sense, I'd still like to know why they were selected. If the system tells us "I suggested decision X because of factors A, B, and C" and C seems really surprising to us, I'd like to know what value C is providing to the prediction. I'm not sure what sort of justification we could expect from the model, though. Something like "well, there was this regularity I observed in all of the data you gave me, concerning factor C" seems like what's happening behind the scenes. Maybe that's a sign for us to investigate more in the world, and the responsibility shouldn't be on the system. But, still, food for thought.
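To make the mimic-tree step concrete, here's a minimal pure-Python sketch (not the paper's implementation; the "black box" function, the greedy splitter, and all names are illustrative stand-ins). We label data with a black-box model's own predictions, fit a shallow decision tree to those labels, and read off the average decision-path length, which is the complexity proxy the regularizer targets, along with how faithfully the tree mimics the black box.

```python
import random

random.seed(0)

def black_box(x):
    # Stand-in for a trained neural net: a nonlinear decision rule
    # over features A (x[0]) and B (x[1]).
    return 1 if x[0] * 0.8 + x[1] ** 2 > 0.5 else 0

# Sample inputs and label them with the black box's own outputs.
X = [[random.random(), random.random()] for _ in range(400)]
y = [black_box(x) for x in X]

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(X, y):
    # Greedy search over (feature, threshold) pairs for the split
    # that most reduces weighted Gini impurity.
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted(set(x[f] for x in X)):
            left = [yi for x, yi in zip(X, y) if x[f] <= t]
            right = [yi for x, yi in zip(X, y) if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score - 1e-9:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3):
    # Leaf: majority class. Internal node: (feature, threshold, left, right).
    if depth == max_depth or len(set(y)) == 1:
        return round(sum(y) / len(y))
    split = best_split(X, y)
    if split is None:
        return round(sum(y) / len(y))
    f, t = split
    L = [(x, yi) for x, yi in zip(X, y) if x[f] <= t]
    R = [(x, yi) for x, yi in zip(X, y) if x[f] > t]
    return (f, t,
            build_tree([x for x, _ in L], [yi for _, yi in L], depth + 1, max_depth),
            build_tree([x for x, _ in R], [yi for _, yi in R], depth + 1, max_depth))

def predict(tree, x):
    if not isinstance(tree, tuple):
        return tree
    f, t, left, right = tree
    return predict(left if x[f] <= t else right, x)

def path_length(tree, x, depth=0):
    # Number of splits traversed for input x: the per-example
    # quantity whose average the tree regularizer proxies.
    if not isinstance(tree, tuple):
        return depth
    f, t, left, right = tree
    return path_length(left if x[f] <= t else right, x, depth + 1)

mimic = build_tree(X, y)
agreement = sum(predict(mimic, x) == yi for x, yi in zip(X, y)) / len(y)
avg_path = sum(path_length(mimic, x) for x in X) / len(X)
print(f"fidelity to black box: {agreement:.2f}")
print(f"average path length (complexity proxy): {avg_path:.2f}")
```

Inspecting which features appear in the mimic tree's splits is the interpretability payoff the post describes; in the full method, a small neural net trained to predict `avg_path` from the model's parameters serves as a differentiable regularizer during training.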