Thanks for all the references! I don't currently have much time to read all of it right now so I can't really engage with the specific arguments for the rejection of using utility functions/studying recursive self-improvement.
I essentially agree with most of what you wrote. There is maybe a slight disagreement in how you framed (not what you meant) how research focus shifted since 2014.
I see Superintelligence as essentially saying "hey, there is pb A. And even if we solve A, then we might also have B. And given C and D, there might be E." Now that the field is more mature and we have many more researchers getting paid to work on these problems, the arguments became much more goal focused. Now people are saying "I'm going to make progress on sub-problem X, by publishing a paper on Y. And working on Z is not cost-effective given so I'm not going to work on it given humanity's current time constraints."
These approaches are often grouped as "long-term problems-focused" and "making tractable progress now focused". In the first group you have Yudkowsky 2010, Bostrom 2014, MIRI's current research and maybe CAIS. In the second one you have current CHAI/FHI/OpenAI/DeepMind/Ought papers.
Your original framing can be interpreted as "after proving some mathematical theorems, people rejected the main arguments of Superintelligence and now most of the community agrees that working on X, Y and Z are tractable but A, B and C are more controversials".
I think a more nuanced and precise framing would be: "In Superintelligence Bostrom exposes exhaustively the risks associated with advanced AI. A short portion of the book is dedicated to the problems are working on right now. Indeed, people stopped working on the other problems (largest portion of the book) because 1) there hasn't been really productive working on them 2) some rebuttals have been written online giving convincing arguments that those pbs are not tractable anyway 3) there are now well-funded research organizations with incentives to make tangible progress on those pbs."
In your last framing, you presented precise papers/rebuttals (thanks again!) for 2), and I think rebuttals are a great reason to stop working on a pb, but I think they're not the only reason and not the real reason people stopped working on those pb. To be fair, I think 1) can be explained by many more factors than "it's theoretically impossible to make progress on those pbs". It can be that the research mindset required to work on these pbs is less socially/intellectually validating or requires much more theoretical approaches, so will be off-putting/tiresome to most recent grads that enter the field. I also think that AI Safety is now much more intertwined with evidence-based approaches such as Effective Altruism than it was in 2014, which explains 3), so people start presenting their research as "partial solutions to the pb. of AI Safety" or "research agenda".
To be clear, I'm not criticizing the current shift in research. I think it's productive for the field, both in the short term and long term. To give a bit more personal context, I started getting interested in AI Safety after reading Bostrom and have always been more interested in the "finding problems" approach. I went to FHI to work on AI Safety because I was super interested in finding new pbs related to the treacherous turn. It's now almost taboo to say that we're working on pbs that are sub-optimally minimizing AI risk, but the real reason that pushed me to think about those pbs was because they were both important and interesting. The pb. with the current "shift in framing" is that it's making it socially unacceptable for people to think/work on more long-term pbs where there is more variance in research productivity.
I don't quite understand the question?
Sorry about that. I thought there was some link to our discussion about utility functions but I misunderstood.
EDIT: I also wanted to mention that the number of pages in a book doesn't account for how important the author think the pb. is (Bostrom even comments on this in the postface of its book). Again, the book is mostly about saying "here are all the pbs", not "these are the tractable pbs we should start working on, and we should dedicate research ressources proportionally to the amount of pages I talk about it in the book".
This framing really helped me think about gradual self-improvement, thanks for writing it down!
I agree with most of what you wrote. I still feel that in the case of an AGI re-writing its own code there's some sense of intent that hasn't been explicitly happening for the past thousand years.
Agreed, you could still model Humanity as some kind of self-improving Human + Computer Colossus (cf. Tim Urban's framing) that somehow has some agency. But it's much less effective at self-improving itself, and it's not thinking "yep, I need to invent this new science to optimize this utility function". I agree that the threshold is "when all the relevant action is from a single system improving itself".
there would also be warning signs before it was too late
And what happens then? Will we reach some kind of global consensus to stop any research in this area? How long will it take to build a safe "single system improving itself"? How will all the relevant actors behave in the meantime?
My intuition is that in the best scenario we reach some kind of AGI Cold War situation for long periods of time.
I get the sense that the crux here is more between fast / slow takeoffs than unipolar / multipolar scenarios.
In the case of a gradual transition into more powerful technology, what happens when the children of your analogy discovers recursive self improvement?
When you say "the last few years has seen many people here" for your 2nd/3rd paragraph, do you have any posts / authors in mind to illustrate?
I agree that there has been a shift in what people write about because the field grew (as Daniel Filan pointed out). However, I don't remember reading anyone dismiss convergent instrumental goals such as increasing your own intelligence or utility functions as an useful abstraction to think about agency.
In your thread with ofer, he asked what was the difference between using loss functions in neural nets vs. objective function / utility functions and I haven't fully catched your opinion on that.
the ones you mentioned
the ones you mentioned
To be clear, this is a linkpost for Philip Trammell's blogpost. I'm not involved in the writing.
As you say
As you say
To be clear, the author is Philip Trammell, not me. Added quotes to make it clearer.
Having printed and read the full version, this ultra-simplified version was an useful summary.
Happy to read a (not-so-)simplified version (like 20-30 paragraphs).
A comprehensive AI alignment introductory web hub
RAISE and Robert Miles provide introductory content. You can think of LW->alignment forum as "web hubs" for AI Alignment research.
There was a course on AGI Safety last fall in Berkeley.
A department or even a single outspokenly sympathetic official in any government of any industrialized nation
You can find a list of institutions/donors here.
A list of concrete and detailed policy proposals related to AI alignment
I would recommend reports from FHI/GovAI as a starting point.
Would this be valuable, and which resource would it be most useful to create?
Please give more detailed information about the project to receive feedback.
You can find AGI predictions, including Starcraft forecasts, in "When Will AI Exceed Human Performance? Evidence from AI Experts". Projects for having "all forecasts on AGI in one place" include ai.metaculus.com & foretold.io.