habryka

Coding day in and out on LessWrong 2.0. You can reach me at habryka@lesswrong.com

Sequences

Concepts in formal epistemology


Comments

My guess is that in the case of AI warning shots there will also be alternative explanations available, like "Oh, the problem was just that this company's CEO was evil, nothing more general about AI systems".

Filtering out the AI tag should roughly do that.

Oops, sorry, I let our SSL certificate expire for like 20 minutes. Sorry to everyone who got a non-secure warning on the frontpage in the last 15 minutes or so; it should all be fixed now. (I was tracking it as a thing to fix today, but didn't think about timezones when deciding when to deal with it.)

For what it's worth, I think I am actually in favor of downvoting content of which you think there is too much. The general rule for voting is "upvote this if you want to see more like this" and "downvote this if you want to see less like this". I think it's too easy to end up in a world where the site is filled with content that nobody likes, but everyone thinks someone else might like. I think it's better for people to just vote based on their preferences, and we will get it right in the aggregate.


I appreciate the effort and strong-upvoted this post, because I think it follows a good methodology of trying to build concrete gears-level models and concretely imagining what will happen. But I also think this is really very much not what I expect to happen, and in my model of the world it is quite deeply confused about how this will go (mostly by vastly overestimating the naturalness of the diamond abstraction, underestimating convergent instrumental goals and associated behaviors, and relying too much on the shard abstraction). I don't have time to write a whole response, but in the absence of a "disagree vote" on posts, I am leaving this comment.

It currently gives fixed-size karma bonuses or penalties. I think we should likely change it to use multipliers instead, but either should get the basic job done.
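To illustrate the distinction (a hypothetical sketch, not the actual LessWrong code; `TagFilter` and `adjustedScore` are made-up names): a fixed-size adjustment adds or subtracts a constant amount of karma for each boosted or penalized tag, while a multiplier scales the post's score, so it has a proportionally larger effect on high-karma posts.

```typescript
// Hypothetical sketch of a per-tag frontpage filter adjusting a post's
// ranking score, either with a fixed-size karma bonus/penalty or with a
// multiplier. Not the real implementation.

interface TagFilter {
  tagId: string;
  karmaAdjustment: number; // e.g. +25 for a boost, -25 for a penalty
  multiplier: number;      // e.g. 1.5 for a boost, 0.5 for a penalty
}

function adjustedScore(
  baseScore: number,
  postTagIds: string[],
  filters: TagFilter[],
  useMultipliers: boolean
): number {
  let score = baseScore;
  for (const filter of filters) {
    if (!postTagIds.includes(filter.tagId)) continue;
    score = useMultipliers
      ? score * filter.multiplier        // scales with the post's existing score
      : score + filter.karmaAdjustment;  // flat adjustment regardless of score
  }
  return score;
}

// Example: a 100-karma post with one boosted tag.
// Fixed bonus:  adjustedScore(100, ["ai"], [{ tagId: "ai", karmaAdjustment: 25, multiplier: 1.5 }], false) === 125
// Multiplier:   adjustedScore(100, ["ai"], [{ tagId: "ai", karmaAdjustment: 25, multiplier: 1.5 }], true)  === 150
```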

Right now, there are ~100 capabilities researchers vs ~30 alignment researchers at OpenAI.

I don't want to derail this thread, but I do really want to express my disbelief at this number before people keep quoting it. I definitely don't know of 30 people at OpenAI who are working on making AI not kill everyone, and it seems kind of crazy to assert that there are (I think such assertions are the result of some pretty adversarial dynamics I am sad about).

I think a warning shot would dramatically update them towards worry about accident risk, and therefore I anticipate that OpenAI would drastically shift most of their resources to alignment research. I would guess P(B|A) ~= 80%.

I would like to take bets here. We are likely to run into doomsday-market problems, though there are ways around those.

but it could be useful to have something like per user ranking adjustments based on tags, so that people could more configure/personalize their experience.

Just to be clear, this does indeed exist. You can give a penalty or boost to any tag on your frontpage, and so shift the content in the direction of topics you are most interested in.

Tagging is crowdsourced. If something seems wrong, just vote down the relevance, and if it's 0 or lower, the tag gets removed.
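As a rough illustration of that mechanism (a hypothetical sketch, not the actual LessWrong implementation; `voteOnTagRelevance` and `visibleTags` are made-up names): each relevance vote adjusts the tag's score on the post, and the tag stops being shown once its score drops to 0 or below.

```typescript
// Hypothetical sketch of crowdsourced tag relevance: user votes sum into a
// per-post relevance score, and a tag is hidden when that score is <= 0.

interface TagRelevance {
  tagId: string;
  score: number; // sum of user relevance votes
}

// Apply a relevance vote (e.g. +1 to raise relevance, -1 to lower it).
function voteOnTagRelevance(
  tags: TagRelevance[],
  tagId: string,
  vote: number
): TagRelevance[] {
  return tags.map(tag =>
    tag.tagId === tagId ? { ...tag, score: tag.score + vote } : tag
  );
}

// Only tags with positive relevance remain visible on the post.
function visibleTags(tags: TagRelevance[]): TagRelevance[] {
  return tags.filter(tag => tag.score > 0);
}
```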

I also don't understand it. I would have understood if we didn't have disagreement voting, but if you just disagree with something (and don't think the author should be disincentivized from saying it), use disagreement votes, not approval votes.
