
Michaël Trazzi

More generally, there's a difference between things being true and things being useful. Believing that sometimes you should not update isn't a very useful habit, as it encourages the rationalizations you mentioned.

Another example is believing "willpower is a limited quantity" vs. "it's a muscle and the more I use it the stronger I get". The first belief will push you towards not doing anything, which is similar to the default mode of not updating in your story.

--It seems pretty likely that it will be for humans (something that works for mice wouldn't be impressive enough for an announcement). In last year's white paper they were already inserting electrode arrays into the brain. But maybe you mean something that lives inside the brain independently? (90%)

--If by "significative damage" you mean "not altering basic human capabilities" then it sounds plausible. From the white paper they seem to focus on damage to "the blood-brain barrier" and the "brain’s inflammatory response to foreign objects". My intuition is that the brain would react pretty strongly to something inside it for 10 years though. (20%)

--Other BCI companies have done similar demos, so given that the presentation is long, this might happen at some point. But Neuralink might also want to show they're different from mainstream companies. (35%)

--Seems plausible. Assigning lower credence because it's really specific. (15%)

Matt Botvinick on the spontaneous emergence of learning algorithms

Funnily enough, I wrote a blog post distilling what I learned from reproducing the experiments of that 2018 Nature paper, adding some animations and diagrams. I especially look at the two-step task and the Harlow task (the one with monkeys looking at a screen), and also try to explain some brain things (e.g. how dopamine interacts with the prefrontal cortex) at the end.

OpenAI announces GPT-3

An HN commenter, unsure about the meta-learning generalization claims, argues that OpenAI has a "serious duty [...] to frame their results more carefully"

Raemon's Shortform

Re: working memory: I never thought about it during conversations, interesting. It seems that we sometimes hold the nodes of the conversation tree so we can go back to them afterward. And maybe if you're introducing new concepts while you're talking, people need to hold those definitions in working memory as well.

What would flourishing look like in Conway's Game of Life?

Some friends tried (inconclusively) to apply AlphaZero to a two-player GoL. I can put you in touch if you want their feedback.

Michaël Trazzi's Shortform

Thanks for the tutorial on downloading documentation, I've never done that myself so I'll check it out next time I go offline for a while!

I usually just run Python to look at docs: I import the library, then do help(lib.module.function). If I don't really know what a class can do, I do dir(class_instance) to find the available methods/attributes, then call help on those.

This only works if you already know roughly where to look. If I were you I would try loading the "Read the Docs" HTML build offline in your browser (it might be searchable that way), but then you still have a browser open (so you would really need to turn off wifi).
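As a concrete sketch of that workflow (using the standard-library json module purely as an illustration, not a library from the original comment):

```python
import json

# dir() lists the methods/attributes available on a module, class,
# or instance -- works entirely offline.
print([name for name in dir(json) if not name.startswith("_")])

# help() prints the full docstring for a specific function, also offline.
help(json.loads)
```

The same two calls work on any importable object, so the whole loop (import, dir, help) never leaves the terminal.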

How to do remote co-working

Thanks for writing this up!

I've personally tried Complice coworking rooms, where people synchronize on pomodoros and chat during breaks, especially EA France's study room (plus a Discord for voice chat during breaks). There's also a LW study hall: https://complice.co/rooms

Michaël Trazzi's Shortform

I've been experimenting with offline coding recently, sharing some of my conclusions.

Why I started
1) Most of the programming I do at the moment only needs a terminal and a text editor. I'm implementing things from scratch without needing libraries, and I noticed I could just read the docs offline.
2) I came to the conclusion that googling things wasn't worth the cost of having a web browser open. Using the outside view, when I look back at all the instances of coding with the internet within easy reach, I always ended up distracted, and even when I code my mind keeps thinking about what else I could be doing.

How to go offline
(Computer) 1) Turn off wi-fi. 2) Forget the network.
(Phone) If you're at home, put it out of reach. I turn it off and then throw it on top of a closet, far enough that I need to grab a chair from the living room to reach it. If you have an office, do the same thing and go to your office without your phone.

When
My general rule in January was that I could only check the internet between 11pm and 12am. The rest of the "no work + no internet" time was for deep relaxation, meditation, journaling, eating, etc. In April I went without any internet connection for a week. I was amazed at how much free time I had, but the lack of social interactions was a bit counter-productive. Currently, I'm going offline from the moment I wake up until 7pm. This seems like a good balance where I'm not too tired but still productive throughout the day.

Let me know if you have any questions about the process, or similar experiences to share.

The Epistemology of AI risk

Thanks for all the references! I don't have much time to read all of it right now, so I can't really engage with the specific arguments for rejecting the use of utility functions or the study of recursive self-improvement.

I essentially agree with most of what you wrote. There is maybe a slight disagreement in how you framed (not what you meant) the shift in research focus since 2014.

I see Superintelligence as essentially saying "hey, there is problem A. And even if we solve A, then we might also have B. And given C and D, there might be E." Now that the field is more mature and we have many more researchers getting paid to work on these problems, the arguments have become much more goal-focused. Now people are saying "I'm going to make progress on sub-problem X by publishing a paper on Y. And working on Z is not cost-effective, so I'm not going to work on it given humanity's current time constraints."

These approaches are often grouped as "focused on long-term problems" and "focused on making tractable progress now". In the first group you have Yudkowsky 2010, Bostrom 2014, MIRI's current research and maybe CAIS. In the second one you have current CHAI/FHI/OpenAI/DeepMind/Ought papers.

Your original framing can be interpreted as "after proving some mathematical theorems, people rejected the main arguments of Superintelligence, and now most of the community agrees that working on X, Y and Z is tractable but A, B and C are more controversial".

I think a more nuanced and precise framing would be: "In Superintelligence Bostrom exhaustively lays out the risks associated with advanced AI. A short portion of the book is dedicated to the problems we are working on right now. Indeed, people stopped working on the other problems (the largest portion of the book) because 1) work on them hasn't been very productive, 2) some rebuttals have been written online giving convincing arguments that those problems are not tractable anyway, and 3) there are now well-funded research organizations with incentives to make tangible progress on those problems."

In your last framing, you presented precise papers/rebuttals (thanks again!) for 2), and I think rebuttals are a great reason to stop working on a problem, but I think they're not the only reason, and not the real reason, people stopped working on those problems. To be fair, I think 1) can be explained by many more factors than "it's theoretically impossible to make progress on those problems". It could be that the research mindset required to work on them is less socially/intellectually validating, or requires much more theoretical approaches, and so will be off-putting/tiresome to most recent grads entering the field. I also think that AI Safety is now much more intertwined with evidence-based approaches such as Effective Altruism than it was in 2014, which explains 3), so people start presenting their research as "partial solutions to the problem of AI Safety" or as a "research agenda".

To be clear, I'm not criticizing the current shift in research. I think it's productive for the field, both in the short term and the long term. To give a bit more personal context, I started getting interested in AI Safety after reading Bostrom, and have always been more interested in the "finding problems" approach. I went to FHI to work on AI Safety because I was super interested in finding new problems related to the treacherous turn. It's now almost taboo to say that we're working on problems that are sub-optimal for minimizing AI risk, but the real reason that pushed me to think about those problems was that they were both important and interesting. The problem with the current "shift in framing" is that it's making it socially unacceptable for people to think/work on more long-term problems where there is more variance in research productivity.

I don't quite understand the question?

Sorry about that. I thought there was some link to our discussion about utility functions but I misunderstood.

EDIT: I also wanted to mention that the number of pages in a book doesn't reflect how important the author thinks a problem is (Bostrom even comments on this in the postface of his book). Again, the book is mostly about saying "here are all the problems", not "these are the tractable problems we should start working on, and we should dedicate research resources proportionally to the number of pages I spend on each in the book".