I'd speculate that being in silico gives you a large advantage with practical partial solutions to alignment. Some of the standard AI advantages for capability improvements may also be significant advantages for alignment (auto-alignment?). For example:
You said "immediately solve the alignment problem, in an ambitious way...", but you could instead have a smoother, takeoff-paced series of alignment solutions. Maybe.
Could be helpful, yeah. I'd caution you not to put too much burden on yourself. This is one of those things where, at least at first, I'd want to do a minimal, stripped-down version that requires as little extra effort as possible--because it's the sort of thing that you might not do at all due to cognitive costs. I do sometimes write something down during or after, but when I do it's usually a tiny snippet of just a couple of words, to prompt myself to maybe unpack later.
then corrections are spaghetti-coded on top to prevent particular failures, using data from real experiments
My guess would be that the failures would be quite systematic, and would reflect the absence of substantial algorithms. That would suggest that you either have to come up with more algorithms, or learn them from data, or both. But to learn them from data, without coming up with the algorithms yourself or with algorithmic search spaces that sufficiently promote the relevant pieces, you need a lot of data; and brain algorithms that work on a timescale of an hour or a day have correspondingly less data feasibly available compared to ~second-long events.
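To make the data-scarcity point concrete, here's a back-of-the-envelope count of how many non-overlapping "events" are available at each timescale over a fixed stretch of experience. The specific numbers (20 years, 16 waking hours/day) are illustrative assumptions of mine, not figures from the comment:

```python
# Rough count of non-overlapping training "events" at several timescales.
# All parameters below are illustrative assumptions.

WAKING_HOURS_PER_DAY = 16
years = 20
total_seconds = years * 365 * WAKING_HOURS_PER_DAY * 3600

for label, event_length_s in [
    ("~1 second", 1),
    ("~1 hour", 3600),
    ("~1 day", WAKING_HOURS_PER_DAY * 3600),
]:
    n_events = total_seconds // event_length_s
    print(f"{label:>10}: about {n_events:,} events")
```

The gap is several orders of magnitude: hundreds of millions of second-scale events versus only thousands of day-scale ones, which is the sense in which slow algorithms have far less data feasibly available.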
(Just noting that I agree with your footnote that the learning part is the hard part; that's the part that seems necessary for real minds, and that, when I ask neuro people about it, they're like "oh yeah, we don't know how to do that".)
Would you please listen for two minutes to the beginning of the interview, around the 1-minute mark, here:
https://www.youtube.com/watch?v=02YLwsCKUww&t=59s
Over the course of the interview, it's clear that Dario's stated timelines are both shorter and more confident than Demis's; e.g. see here https://youtu.be/02YLwsCKUww?t=1030
Then, if you listen here for a couple minutes:
https://youtu.be/02YLwsCKUww?t=1326
it's clear that Dario's stated reason for NOT slowing down HIS research (seemingly referring to the short-timelines research, i.e. the RSI research) is something about China.
@Zac Hatfield-Dodds @evhub @Dave Orr @Ethan Perez @Carson Denison @Drake Thomas @gasteigerjo @Aram Ebtekar Can you comment on this? Is that what they are planning to work on? Were you aware of this? Do you think that's a good thing to do?
As a piece of info about the current real-life situation: there was an episode recently where some members of Congress made a PSA saying that military personnel should not follow unlawful orders, and the president then called for them to be locked up (or executed). https://www.deseret.com/politics/2025/11/20/trump-responds-as-democratic-lawmakers-direct-video-message-at-troops/?utm_source=chatgpt.com
Have you written up somewhere what you've taken away from the Dario episode regarding "our" credulity? Would be curious to hear. (I could try to generate more specific questions/prompts if you like.)
How should debaters be incentivized? Maybe we could have the audience vote on "did person X present position Y clearly / strongly / ideally?". Everyone can vote on each X and Y separately, but people who say they agree with Y both before and after the debate get more weight or prominence in judging whether either debater did a good job of presenting Y; and the person who was there to present Y can still be lauded if they do a good job at that, even if they actually switch sides halfway through the debate. In theory.
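The judging scheme above can be sketched in a few lines. This is a minimal toy aggregation, not a worked-out proposal; the function name, weight value, and vote format are all my illustrative assumptions:

```python
# Toy aggregation for the scheme above: everyone votes on whether position Y
# was presented well, but votes from people who agreed with Y both before
# and after the debate count with extra weight (agree_weight is arbitrary).

def score_presentation(votes, agree_weight=2.0):
    """votes: list of (presented_well, agreed_before, agreed_after) booleans.
    Returns the weighted fraction of 'presented well' votes."""
    total_weight = 0.0
    weighted_yes = 0.0
    for presented_well, agreed_before, agreed_after in votes:
        w = agree_weight if (agreed_before and agreed_after) else 1.0
        total_weight += w
        if presented_well:
            weighted_yes += w
    return weighted_yes / total_weight if total_weight else 0.0

votes = [
    (True, True, True),    # persistent supporter of Y: presented well
    (True, False, False),  # non-supporter: also says presented well
    (False, True, False),  # switched away from Y: says presented poorly
]
print(score_presentation(votes))  # 0.75
```

Note the incentive property this is trying to capture: the supporters' votes dominate the "was Y presented well?" question, so the debater is rewarded for representing Y to the satisfaction of the people who actually hold it.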
Well, depending on the targeter, it counts against being targeted, because there's relatively less to expropriate, and it counts towards being targeted, because you have fewer defenses and are more desperate / have a worse negotiating position.