My bad, I read you as disagreeing with Neel's point that it's good to gain experience in the field or otherwise become very competent at the type of thing your org is tackling before founding an AI safety org.
That is, I read "I think that founding, like research, is best learned by doing" as "go straight into founding and learn as you go along".
I naively expect the process of startup ideation and experimentation, aided by VC money
It's very difficult to come up with AI safety startup ideas that are VC-fundable. This seems like a recipe for coming up with nice-sounding but ultimately useless ideas, or for wasting a lot of effort on stuff that looks good to VCs but doesn't advance AI safety in any way.
I disagree with this frame. Founders should deeply understand the area their organization is meant to address; it's not enough to be "good at founding".
This makes sense as a strategic choice, and thank you for explaining it clearly, but I think it’s bad for discussion norms because readers won’t automatically understand your intent as you’ve explained it here. Would it work to substitute the term “alignment target” or “developer’s goal”?
When I say "human values" without a reference, I mean "the types of things that a human-like mind can want, and their extrapolations".
This is a reasonable concept, but it should have a different handle than “human values”, because that usage makes common phrases like “we should optimize for human values” nonsensical. For example, human-like minds can want chocolate cake, but that tells us nothing about the relative importance of chocolate cake versus avoiding disease, which is what matters for decision making.
What "human values" gesture at is distinction from values-in-general, while "preferences" might be about arbitrary values.
I don’t understand what this means.
Taking current wishes/wants/beliefs as the meaning of "preferences" or "values" (denying further development of values/preferences as part of the concept) is misleading in the same way as taking "moral goodness" to mean anything in particular that's currently legible, because the things that are currently legible are not where the potential development of values/preferences would end up in the limit.
Is your point here that “values” and “preferences” are based on what you would decide to prefer after some amount of thinking/reflection? If yes, my point is that this should be stated explicitly in discussions, for example: “here I am discussing the preferences you, the reader, would have after thinking for many hours.”
If you want to additionally claim that these preferences are tied to moral obligation, this should also be stated explicitly.
Yeah that's fair. I didn't follow the "In other words" sentence (it doesn't seem to be restating the rest of the comment in other words, but rather making a whole new (flawed) point).
Has this train of thought caused you to update away from "Human Values" as a useful construct?
I was curious, so I read this comment thread, and I am genuinely confused about why Tsvi is so annoyed by the interaction (maybe I am being dumb and missing something). My interpretation of Wei Dai's point is the following:
(If this is indeed the point Wei Dai is making, I happen to think Tsvi is more correct, but I don't think WD's contribution is meaningless or in bad faith.)
I think those other types of startups also benefit from expertise and a deep understanding of the relevant topics (for example, for advocacy: what are you advocating for and why, and how well do you understand the surrounding arguments and thinking?). You don't want someone who doesn't understand the "field" working on "field-building".