EU AI Policy / Mechatronics Engineer - Co-Lead @AI Standards Lab
Do you think there's some initial evidence for that? E.g. Voyager, or others from DeepMind. Self-play gets thrown around a lot, but I'm not sure we've concretely seen much of it for LLMs yet.
But yes, agreed. Good point regarding strategy games being a domain that could be verifiable.
I was fairly on board with control before; I think my main remaining concern is the trusted models not being good enough. But with more elaborate control protocols (assuming policymakers/AI labs actually make an effort to implement them), catching an escape attempt seems more likely if the model's performance is heavily skewed toward specific domains. Though yeah, I agree that some of what you mentioned might not have changed, and could still be an issue.
Good post, thanks for writing!
With o1, and now o3, it seems fairly plausible that there will be a split between "verifiable" capabilities and general capabilities. Sure, there will be some cross-pollination (transfer), but this might have limits.
What then? Can a superhuman math + coding AI also just reason through political strategy, or will it struggle, make errors, and fall back on somewhat generic ideas from its training data?
Can we get a "seed-AI style" consequentialist in some domains while it fails to perform above human level in others? I'd like to believe reasoning would transfer (as it should be universal), but I don't think reasoning is sufficient for fuzzier domains - the model also needs good heuristics.
The AI Control agenda seems more promising now (for both this reason and some others).
Signal boost for the homepage "username hiding" feature in settings - it seems cool; I'll see if it changes how I use LW.
I also wonder about a "hide karma by default" option, though I'm less sure it would achieve the intended purpose, since karma can be a good filter when just skimming comments rather than reading in detail.
LW feature request/idea: in posts that have lots of in-text links to other posts, perhaps add an LLM-generated 1-2 sentence (context-informed) summary to the hover preview?
I assume that for someone who has been around the forum for many years, various posts are familiar enough that name-dropping them in a link is sufficient to give context. But if I have to click a link and read 4+ other posts as I go through one post, perhaps the LW UI could fairly easily build in that feature.
(Suggesting it as a feature since LW does seem like a place that experiments with various features not too different from this - of course, I can always ask an LLM for a summary manually if I need to.)
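To make the idea concrete, here's a minimal sketch of what the summary step could look like, assuming the OpenAI Python client; the function signature, helper inputs, and model choice are all illustrative, not a claim about how LW's backend would actually do it:

```python
# Hypothetical sketch of a context-informed hover-preview summary.
# Assumes the OpenAI Python client (v1); names here are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def hover_summary(linked_post_text: str, citing_paragraph: str) -> str:
    """Return a 1-2 sentence summary of the linked post, informed by
    the paragraph that links to it."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Summarize the linked post in 1-2 sentences, "
                        "focusing on what is relevant to the citing paragraph."},
            {"role": "user",
             "content": f"Citing paragraph:\n{citing_paragraph}\n\n"
                        f"Linked post:\n{linked_post_text[:8000]}"},  # truncate long posts
        ],
    )
    return response.choices[0].message.content.strip()
```

Presumably the result would be cached per (citing post, linked post) pair, so the LLM call happens once rather than on every hover.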
I've had the Boox Nova Air (7-inch) for nearly two years now - a bit small for reading papers, but great for books and blog posts. You can run Google Play apps, and even set up a Google Drive sync to automatically transfer PDFs/EPUBs onto it. At some point I might get the 10-inch version (the Note Air).
Another useful feature is taking notes inside PDFs, by highlighting and then handwriting the note into the Gboard handwrite-to-text keyboard. Not as smooth as on an iPad, but a pretty good way to annotate a paper.
This was very interesting; looking forward to the follow-up!
Regarding the "AIs messing with your evaluations" bit (and checking whether the AI is capable of/likely to do so), I'm curious whether there is any published research on this.
I had a great time at AISC8. Perhaps I would still have found my way into a full-time AI Safety position without it, but I'd guess at least a year later, and into a significantly less neglected opportunity. My AI Safety Camp project later became the AI Standards Lab.
I know several others who benefitted quite a bit from it.