If anyone wants to have a voice chat with me about a topic that I'm interested in (see my recent post/comment history to get a sense), please contact me via PM.
My main "claims to fame":
I wonder how Eliezer would describe his "moat", i.e., what cognitive trait or combination of traits does he have that is rarest or hardest to cultivate in others? (Would also be interested in anyone else's take on this.)
I'm curious what you say about "which are the specific problems (if any) where you specifically think 'we really need to have solved philosophy / improved-a-lot-at-metaphilosophy' to have a decent shot at solving this?"
Assuming by "solving this" you mean solving AI x-safety or navigating the AI transition well, I just posted a draft about this. Or if you've already read that and are asking for an even more concrete example, a scenario I often think about involves an otherwise aligned ASI, some time into the AI transition, when things are moving very fast (from a human perspective) and many highly consequential decisions need to be made (e.g., what alliances to join, how to bargain with others, how to self-modify or take advantage of the latest AI advances, how to think about AI welfare and other near-term ethical issues, what to do about commitment races and threats, how to protect the user against manipulation or value drift, whether to satisfy some user request that might be harmful according to their real values), many of which involve philosophical problems. And the AI can't just ask its user (or alignment target), or even predict "what would the user say if they thought about this for a long time", because the user themselves may not be philosophically very competent, and/or making such predictions with high accuracy (over a long enough time frame) is still outside the AI's range of capabilities.
So the specific problem is how to make sure this AI doesn't make wrong decisions that cause a lot of waste or harm, decisions that quickly or over time cause most of the potential value of the universe to be lost. That in turn seems to involve figuring out how the AI should think about philosophical problems, or how to make the AI philosophically competent even if its alignment target isn't.
Does this help / is this the kind of answer you're asking for?
Some of Eliezer's founder effects on the AI alignment/x-safety field that seem detrimental and persist to this day:
I've repeatedly argued against 1 from the beginning, and also somewhat against 2 and 3, but perhaps not hard enough, because I personally benefitted from them, i.e., having pre-existing interest/ideas in decision theory that became validated as centrally important for AI x-safety, and generally finding a community that was interested in philosophy and took my own ideas seriously.
Eliezer himself is now trying hard to change 1, and I think we should also try harder to correct 2 and 3. On the latter, I think academic philosophy suffers from various issues, but also that the problems are genuinely hard, and alignment researchers seem to have inherited Eliezer's gung-ho attitude towards solving these problems, without adequate reflection. Humanity having few competent professional philosophers should be seen as (yet another) sign that our civilization isn't ready to undergo the AI transition, not a license to wing it based on one's own philosophical beliefs or knowledge!
In this recent EAF comment, I analogize AI companies trying to build aligned AGI with no professional philosophers on staff (the only exception I know of is Amanda Askell) to a company trying to build a fusion reactor with no physicists on staff, only engineers. I wonder if that analogy resonates with anyone.
To try to explain how I see the difference between philosophy and metaphilosophy:
My definition of philosophy is similar to @MichaelDickens', but I would use "have serviceable explicitly understood methods" instead of "formally studied" or "formalized" to define what isn't philosophy, as the latter might be, or could be interpreted as, too high a bar, e.g., in the sense of formal systems.
So in my view, philosophy is directly working on various confusing problems (such as "what is the right decision theory") using whatever poorly understood methods we have or can implicitly apply, and then metaphilosophy is trying to help solve these problems on a meta level, by better understanding the nature of philosophy, for example:
Does this make sense?
One way to see that philosophy is exceptional is that we have serviceable explicit understandings of math and natural science, even formalizations in the forms of axiomatic set theory and Solomonoff Induction, but nothing comparable in the case of philosophy. (Those formalizations are far from ideal or complete, but still represent a much higher level of understanding than what we have for philosophy.)
If you say that philosophy is a (non-natural) science, then I challenge you: come up with something like Solomonoff Induction, but for philosophy.
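To make the comparison concrete, here is the standard statement of the Solomonoff prior (nothing novel, just the usual definition): the prior probability of an observation sequence $x$ is the total weight of all programs that make a universal prefix Turing machine $U$ output something beginning with $x$,

$$M(x) = \sum_{p\,:\,U(p)=x*} 2^{-\ell(p)},$$

where $\ell(p)$ is the length of program $p$ in bits and $x*$ means any output extending $x$. The challenge is to write down anything of comparable precision that captures idealized philosophical reasoning.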
- Trading is a zero sum game inside a larger positive sum game. Though every trade has a winner and offsetting losers,
This isn't true. Sometimes you're trading against someone with non-valuation motives, i.e., someone buying or selling for a reason besides thinking that the current market price is too low or too high, for example, someone being liquidated due to a margin violation, or the founder of a company wanting to sell in order to diversify. In that case, it makes more sense to think of yourself as providing a service for the other side of the trade, instead of there being a winner and a loser.
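A hypothetical worked example, with made-up numbers: suppose a founder's shares are worth 100 each to a diversified buyer, but (because of the idiosyncratic risk the founder is stuck bearing) holding them is only worth the equivalent of 90 each to the founder, and the trade happens at 95. Then

$$\text{buyer's gain} = 100 - 95 = 5, \qquad \text{founder's gain} = 95 - 90 = 5,$$

so both sides are better off than if the trade hadn't happened, and there is no offsetting loser.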
markets as a whole direct resources across space and time and help civilizations grow.
Unpriced externalities imply that markets sometimes harm civilizations. I think investment into AGI/ASI is a prime example of this, with x-risks being the unpriced externality.
Figuring out the underlying substance behind "philosophy" is a central project of metaphilosophy, which is far from solved, but my usual starting point is "trying to solve confusing problems which we don't have established methodologies for solving" (where "methodologies" means explicitly understood methods). I think this bakes in the fewest assumptions about what philosophy is or could be, while still capturing the usual meaning of "philosophy", and it explains why certain fields started off as part of philosophy (e.g., science starting off as natural philosophy) and then became "not philosophy" once we figured out methodologies for solving them.
I think "figure out what are the right concepts to be use, and, use those concepts correctly, across all of relevant-Applied-conceptspace" is the expanded version of what I meant, which maybe feels more likely to be what you mean.
This bakes in "concepts" being the most important thing, but is that right? Must AIs necessarily think about philosophy using "concepts"? Is that really the best way to formulate how idealized philosophical reasoning should work?
Is "concepts" even what distinguishes philosophy from non-philosophical problems, or is "concepts" just part of how humans reason about everything, which we latch onto when trying to define or taboo philosophy, because we have nothing else better to latch onto? My current perspective is that what uniquely distinguishes philosophy is their confusing nature and the fact that we have no well-understood methods for solving them (but would of course be happy to hear any other perspectives on this).
Regarding good philosophical taste (or judgment): that is another central mystery of metaphilosophy, which I've been thinking a lot about but don't have any good handles on. It seems like a thing that exists (and is crucial), but it's very hard to see how or why it could exist, or what kind of thing it could be.
So anyway, I'm not sure how much help any of this is, when trying to talk to the type of person you mentioned. The above are mostly some cached thoughts I have on this, originally for other purposes.
BTW, good philosophical taste being rare definitely seems like a very important part of the strategic picture, which potentially makes the overall problem insurmountable. My main hopes are 1) someone makes an unexpected metaphilosophical breakthrough (kind of like Satoshi coming out of nowhere to totally solve distributed currency), and there's enough good philosophical taste in the AI safety community (including at the major labs) to recognize it and incorporate it into AI design, or 2) there's an AI pause during which human intelligence enhancement comes online, and selecting for IQ increases the prevalence of good philosophical taste as a side effect (as it seems too much to hope that good philosophical taste would be directly selected for), and/or there's substantial metaphilosophical progress during the pause.
Unless you can abstract out the "alignment reasoning and judgement" part of a human's entire brain process (and philosophical reasoning and judgement as part of that) into some kind of explicit understanding of how it works, how do you actually build that into an AI without solving uploading (which we're obviously not on track to solve in 2-4 years either)?
put a bunch of smart thoughtful humans in a sim and run it for a long time
Alignment researchers have had this thought for a long time (see e.g. Paul Christiano's A formalization of indirect normativity), but I think the practical alignment research programs that this line of thought led to, such as IDA and Debate, are all still bottlenecked by a lack of metaphilosophical understanding: without the kind of understanding that lets you build an "alignment/philosophical reasoning checker" (analogous to a proof checker for mathematical reasoning), they're stuck trying to do ML of alignment/philosophical reasoning from human data, which I think is unlikely to work out well.
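To gesture at what the "checker" analogy means on the mathematical side, here is a minimal toy sketch (entirely illustrative; the names `Implies` and `check_proof` are mine): a propositional proof checker that accepts a derivation only if every step is a stated premise or follows from earlier steps by modus ponens. Every acceptance decision bottoms out in explicitly stated rules, with no appeal to judgment; nothing like that explicit rule set currently exists for alignment or philosophical reasoning.

```python
# Toy proof checker for a propositional system (illustrative only).
# Each proof line must be a premise or follow from two earlier lines by modus ponens.

from dataclasses import dataclass

@dataclass(frozen=True)
class Implies:
    antecedent: object  # a formula: either a str atom or another Implies
    consequent: object

def check_proof(premises, proof):
    """Return True iff every line is a premise or follows by modus ponens
    from two earlier lines."""
    derived = []
    for formula in proof:
        ok = formula in premises or any(
            isinstance(f, Implies) and f.antecedent == g and f.consequent == formula
            for f in derived for g in derived
        )
        if not ok:
            return False
        derived.append(formula)
    return True

# Usage: from premises {P, P -> Q, Q -> R}, derive R.
P, Q, R = "P", "Q", "R"
premises = {P, Implies(P, Q), Implies(Q, R)}
assert check_proof(premises, [P, Implies(P, Q), Q, Implies(Q, R), R])  # valid derivation
assert not check_proof(premises, [P, R])  # R doesn't follow in one unjustified step
```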
b-money: I guess most people working on crypto-based payments were trying to integrate with the traditional banking system, and didn't have the insight/intuition that money is just a way for everyone to "keep tabs" on how much society as a whole owes to each person (e.g. for previous services rendered), and that therefore a new form of money (i.e. not fiat or commodity) could be created and implemented as a public/distributed database or ledger.
UDT: I initially became interested in decision theory for a very different reason than Eliezer. I was trying to solve anthropic reasoning, and tried a lot of different ideas but couldn't find one that was satisfactory. Eventually I decided to look into decision theory (as the "source" of probability theory) and had the insight/intuition that if the decision theory didn't do any updating then we could sidestep the entire problem of anthropic reasoning. Hal Finney was the only one to seriously try to understand this idea, but couldn't or didn't appreciate it (in fairness, my proto-UDT was way more complicated than EDT, CDT, or the later UDT, because I noticed that it would cooperate with its twin in one-shot PD, and added complications to make it defect instead, not questioning the conventional wisdom that that's what's rational).
Eventually I got the idea/hint from Eliezer that it can be rational to cooperate in one-shot PD (a toy illustration of this point is sketched below), and also realized that my old idea seemed to fit well with what Nesov was discussing (counterfactual mugging), and this caused me to search for a formulation that was simple/elegant and could solve all of the problems known at the time, which became known as UDT.
I think Eliezer was also interested in anthropic reasoning, so what he was missing was my move of looking into decision theory for inspiration/understanding and then making the radical call that maybe anthropic reasoning is unsolvable as posed and should be side-stepped via a change to decision theory.
need for an AI pause/slowdown: I think I found Eliezer convincing when he started talking about the difficulty of making AI Friendly and why others likely wouldn't try hard enough to succeed, but just found it implausible that he could, with a small team, win a race against the entire world, which was spending much less effort/resources on trying to make their AIs Friendly. Plus I had my own worries early on that we needed to either solve all the important philosophical problems before building AGI/ASI, or figure out how to make sure the AI itself is philosophically competent, and that both were unlikely to happen without a pause/slowdown (partly because nobody else seemed to share this concern or talked about it).
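Footnote to the UDT item above (the toy sketch referenced there): a minimal, purely illustrative way to see the "cooperate with your twin in one-shot PD" point is to compare choosing an action while holding the opponent's action fixed against choosing a policy while knowing that an exact copy of you will necessarily run the same policy. This is my own toy code, not a formalization of UDT.

```python
# Toy illustration of "choose a policy, not an action" in a one-shot
# Prisoner's Dilemma against an exact copy of yourself.

# Standard PD payoffs to the row player: (my_move, their_move) -> my_utility.
PAYOFF = {
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def causal_best_response(their_move):
    """Holding the opponent's move fixed (the usual dominance reasoning),
    defection always scores higher."""
    return max(["C", "D"], key=lambda my_move: PAYOFF[(my_move, their_move)])

def updateless_policy_choice():
    """Choose the policy that scores best given that an exact twin
    necessarily runs the same policy."""
    return max(["C", "D"], key=lambda policy: PAYOFF[(policy, policy)])

assert causal_best_response("C") == "D"   # action-level reasoning says defect
assert causal_best_response("D") == "D"
assert updateless_policy_choice() == "C"  # policy-level reasoning says cooperate
```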