Computer science master's student interested in AI and AI safety.
I suspect the desire for kids/lineage is really basic for a lot of people (almost everyone?)
This seems like an important point. One of the arguments for the inner alignment problem is that evolution selected humans for inclusive genetic fitness (IGF), but humans were instead motivated by other goals (e.g. seeking sex) that were strongly correlated with IGF in the ancestral environment.
Then, when humans' environment changed (e.g. with the invention of birth control), the correlation between these proxy goals and IGF broke down, resulting in low fitness and inner misalignment.
However, this statement seems to suggest that modern humans really have internalized IGF as one of their primary objectives, and that they're inner-aligned with evolution's outer objective.
I think the Zotero PDF reader has a lot of similar features that make the experience of reading papers much better:
I was thinking of doing this, but the ChatGPT web app has many features that are only available there and add a lot of value, such as Code Interpreter, PDF uploads, DALL-E, and custom GPTs, so I still use ChatGPT Plus.
Thank you for the blog post. I thought it was very informative regarding the risk of autonomous replication in AIs.
It seems like the Centre for AI Security is a new organization.
I've seen the announcement post on its website. Maybe it would be a good idea to cross-post it to LessWrong as well.
Is MIRI still doing technical alignment research as well?
This is a brilliant post, thanks. I appreciate the breakdown of different types of contributors and how orgs have expressed a greater need for some types of contributors than others.
Thanks for the table; it provides a good summary of the post's findings. It might also be worthwhile to add it to the EA Forum post.
I think the table should include the $10 million in OpenAI Superalignment fast grants as well.
I think there are some great points in this comment, but it's overly negative about the LessWrong community. Sure, maybe there is a vocal and influential minority of individuals who are not receptive to or appreciative of your work and related work. But a better measure of the overall community's culture than opinions or personal interactions is upvotes and downvotes, which are much more frequent and cheap actions and therefore more representative. For example, your posts such as "Reward is not the optimization target" have received hundreds of upvotes, so apparently they are positively received.
LessWrong these days is huge, with probably over 100,000 monthly readers, so I think it's challenging to summarize its culture in any particular way (e.g. probably most users on LessWrong live outside the Bay Area and maybe even outside the US). I personally find that LessWrong as a whole is fairly meritocratic and not that dogmatic, and that a wide variety of views are supported provided that they are sufficiently well-argued.
In addition to LessWrong, I use some other related sites such as Twitter, Reddit, and Hacker News, and although there may be problems with the discourse on LessWrong, I think it's generally significantly worse on those other sites. Even today, I'm sure you can find people on Twitter saying things like AIs can't have goals or that wanting paperclips is stupid. These kinds of comments wouldn't be tolerated on LessWrong because they're ignorant and a waste of time. Human nature can be prone to ignorance, rigidity of opinion, and so on, but I think the LessWrong walled garden has been able to counteract these negative tendencies better than most other sites.
State-of-the-art models such as Gemini aren't text-only LLMs anymore. They are natively multimodal or omni-modal transformer models that can process text, images, speech, and video. These models seem to me like a huge jump in capabilities over text-only LLMs like GPT-3.
Nice paper! I found it quite insightful. Here are some key extracts:
Improving adversarial robustness by classifying several down-sampled noisy images at once:
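I read this as averaging predictions over several down-sampled, noise-perturbed copies of the input. Here's a minimal PyTorch sketch of that general idea, not the paper's actual code; `model`, the scales, the noise level, and the number of noisy copies are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def robust_predict(model, image, scales=(1.0, 0.75, 0.5), noise_std=0.1, n_noisy=4):
    """Average logits over several down-sampled, noise-perturbed copies of the input.

    Rough intuition: a perturbation crafted against one resolution/noise draw
    tends not to fool every copy, so the averaged prediction is harder to attack.
    """
    _, _, h, w = image.shape
    logits = []
    for s in scales:
        # Down-sample and resize back so the classifier sees a fixed input size.
        small = F.interpolate(image, scale_factor=s, mode="bilinear", align_corners=False)
        resized = F.interpolate(small, size=(h, w), mode="bilinear", align_corners=False)
        for _ in range(n_noisy):
            noisy = resized + noise_std * torch.randn_like(resized)
            logits.append(model(noisy))
    return torch.stack(logits).mean(dim=0)
```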
Improving adversarial robustness by using an ensemble of intermediate layer predictions:
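And a rough sketch of the second idea, again not taken from the paper: attach a small linear head to each intermediate feature map and aggregate the per-layer logits (a plain mean here; the paper's aggregation scheme may differ). `backbone_stages`, `feature_dims`, and the pooling choice are assumptions for illustration:

```python
import torch
import torch.nn as nn

class IntermediateLayerEnsemble(nn.Module):
    """Attach a classification head to several intermediate feature maps
    and average their logits, so an adversarial perturbation has to fool
    predictions made at several depths at once, not just the final layer."""

    def __init__(self, backbone_stages, feature_dims, num_classes):
        super().__init__()
        self.stages = nn.ModuleList(backbone_stages)  # sequential feature stages
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in feature_dims)

    def forward(self, x):
        logits = []
        for stage, head in zip(self.stages, self.heads):
            x = stage(x)
            # Global-average-pool each intermediate feature map before its head.
            pooled = x.mean(dim=(2, 3))
            logits.append(head(pooled))
        return torch.stack(logits).mean(dim=0)
```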