If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
GreaterWrong calls the same API on the LW server and serves the resulting data to you as HTML. As a result it has the same limitations: if you keep going to the next page on Recent Comments, you'll eventually reach https://www.greaterwrong.com/recentcomments?offset=2020 and get the error "Exceeded maximum value for skip".
My attempt to resurrect the old LW Power Reader has hit an obstacle just before the finish line, due to a limitation in the current LW API. So this is a public appeal to the site admins/devs to relax the limit.
Specifically, my old code relied on LW1 allowing it to fetch all comments posted after a given comment ID, but I can't find anything similar in the current API. I tried reproducing this by using the allRecentComments endpoint in GraphQL, but because the offset parameter is limited to <2000, I can't fetch comments older than a few weeks. The Power Reader is in part designed to allow someone to catch up on or skim weeks' or months' worth of LW comments, hence the need for this functionality.
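For concreteness, here's a minimal sketch of the kind of query I'm making (in Python, with query/field names taken from the AI-generated documentation mentioned below, so they may not exactly match the current schema):

```python
import requests

LW_GRAPHQL_URL = "https://www.lesswrong.com/graphql"

# Fetch one page of recent comments via the allRecentComments view.
COMMENTS_QUERY = """
{
  comments(input: {terms: {view: "allRecentComments", limit: 50, offset: %d}}) {
    results {
      _id
      postedAt
      postId
      htmlBody
    }
  }
}
"""

def fetch_recent_comments(offset):
    # Offsets >= 2000 are rejected by the server ("Exceeded maximum value
    # for skip"), which is the limit I'm asking to have relaxed.
    resp = requests.post(LW_GRAPHQL_URL, json={"query": COMMENTS_QUERY % offset})
    resp.raise_for_status()
    return resp.json()["data"]["comments"]["results"]
```

What I'd ideally want instead is something equivalent to LW1's "all comments after a given comment ID", or failing that a much higher cap on offset.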
As a side effect of this project, my AI agents produced documentation of LW's GraphQL API from LW's source code. (I was unable to find another API reference for it.) I believe it's fairly accurate, as the code written based on it seems to work well aside from the comment-loading limit.
What I had in mind is that they're relatively more esoteric than "AI could kill us all" and yet it's pretty hard to get people to take even that seriously! "Low-propensity-to-persuade-people" maybe?
Yeah, that makes sense. I guess I've been using "illegible" for a similar purpose, but maybe that's not a great word either, because that also seems to imply "hard to understand" but again it seems like these problems I've been writing about are not that hard to understand.
I wish I knew what is causing people to ignore these issues, including people in rationality/EA (e.g., the most famous rationalists have said little about them). I may be slowly growing an audience, e.g., Will MacAskill invited me to do a podcast with his org, and Jan Kulveit just tweeted "@weidai11 is completely right about the risk we won't be philosophically competent enough in time", but it's inexplicable to me how slow it has been, compared to something like UDT, which instantly became "the talk of the town" among rationalists.
Pretty plausible that the same underlying mechanism is also causing the general public to not take "AI could kill us all" very seriously, and I wish I understood that better as well.
I appreciate the attention this brings to the subject, but from my perspective it doesn't sufficiently emphasize the difficulties, or address existing concerns:
(These are of course closely related issues, not independent ones. E.g., much is downstream of the fact that we don't have a good explicit understanding of what philosophy is or should be.)
See some of my earlier writings where I talk about these (and related) difficulties in more detail. (Except for 5, which I perhaps need to write a post about.)
I'd wondered why you wrote so many pieces advising people to be cautious about more esoteric problems arising from AI,
Interesting that you have this impression, whereas I've been thinking of myself recently as doing a "breadth first search" to uncover high level problems that others seem to have missed or haven't bothered to write down. I feel like my writings in the last few years are pretty easy to understand without any specialized knowledge (whereas Google says "esoteric" is defined as "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest").
If on reflection you still think "esoteric" is right, I'd be interested in an expansion on this, e.g. which of the problems I've discussed seem esoteric to you and why.
to an extent that seemed extremely unlikely to be implemented in the real world
It doesn't look like humanity is on track to handle these problems, but "extremely unlikely" seems like an overstatement. I think there are still some paths where we handle these problems better, including 1) warning shots or a shift in the political winds cause an AI pause/stop to be implemented, during which some of these problems/ideas are popularized or rediscovered, and 2) future AI advisors are influenced by my writings, or are strategically competent enough to realize these same problems and help warn/convince their principals.
I also have other motivations including:
I think all sufficiently competent/reflective civilizations (including sovereign AIs) may want to do this, because it seems hard to be certain enough of one's philosophical competence to not do this as an additional check. The cost of running thousands or even millions of such simulations seems very small compared to potentially wasting the resources of an entire universe/lightcone due to philosophical mistakes. Also, they may be running such simulations anyway for other purposes, so it may be essentially free to also gather some philosophical ideas from them, to make sure you didn't miss something important or get stuck in some cognitive trap.
The classic idea from Yudkowsky, Christiano, etc. for what to do in a situation like this is to go meta: Ask the AI to predict what you'd conclude if you were a bit smarter, had more time to think, etc. Insofar as you'd conclude different things depending on the initial conditions, the AI should explain what and why.
Yeah, I might be too corrupted or biased to be a starting point for this. It seems like a lot of people or whole societies might not do well if placed in this kind of situation (of having something like CEV being extrapolated from them by AI), so I shouldn't trust myself either.
You, Wei, are proposing another plan: Ask the AI to simulate thousands of civilizations, and then search over those civilizations for examples of people doing philosophical reasoning of the sort that might appeal to you, and then present it all to you in a big list for you to peruse?
Not a big list to peruse, but more like, to start with, put the whole unfiltered distribution of philosophical outcomes in some secure database, then run relatively dumb/secure algorithms over it to gather statistics/patterns. (Looking at it directly by myself, or using any advanced algorithms/AIs to do so, might expose me/us to infohazards.) For example, I'd want to know what percent of civilizations think they've solved various problems like decision theory, ethics, and metaphilosophy, how many clusters of solutions there are for each problem, and whether there are any patterns/correlations between types/features of intelligence/civilization and what conclusions they ended up with.
This might give me some clues as to which clusters are more interesting/promising/safer to look at, and then I have to figure out what precautions to take before looking at the actual ideas/arguments (TBD, maybe get ideas about this from the simulations too). It doesn't seem like I can get something similar to this by just asking my AI to "do philosophy", without running simulations.
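Purely as an illustration of what I mean by "relatively dumb/secure algorithms" (the record type and fields here are hypothetical, just for the sake of the example):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class CivOutcome:
    """One simulated civilization's (hypothetical) philosophical track record."""
    civ_id: str
    features: dict       # e.g. type/features of intelligence, social structure
    claims_solved: set   # problems the civilization believes it has solved
    clusters: dict       # problem -> label of the solution cluster it landed in

def solved_fraction(outcomes, problem):
    """Fraction of civilizations that think they've solved a given problem."""
    return sum(problem in o.claims_solved for o in outcomes) / len(outcomes)

def cluster_counts(outcomes, problem):
    """How many distinct solution clusters there are for a problem, and their sizes."""
    return Counter(o.clusters[problem] for o in outcomes if problem in o.clusters)
```

The point being that these are simple aggregate queries whose outputs (fractions, cluster counts, correlations) can be inspected without reading any of the actual ideas/arguments.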
I see the application as indirect at this point, basically showing that decision theory is hard and we're unlikely to get it right without an AI pause/stop. See these two posts to get a better sense of what I mean:
Thanks. This sounds like a more peripheral interest/concern, compared to Eliezer/LW's, which was more like: we have to fully solve DT before building AGI/ASI, otherwise it could be catastrophic due to something like the AI falling prey to an acausal threat, losing a commitment race, or being unable to cooperate with other AIs.
Do you have any examples of such ideas and techniques? Are any of the ideas and techniques in your paper potentially applicable to general AI alignment?