Sure, though if you're just going to say "I know how to do it! Also I won't tell you!" then it doesn't seem very pointful?
@Nate Showell @P. @Tetraspace @Joseph Miller @Lorxus
I genuinely don't know what you want elaboration of. Reacts are nice for what they are, but saying something out loud about what you want to hear more about / what's confusing / what you did and didn't understand/agree with, is more helpful.
Re/ "to whom not...", I'm asking Wei: what groups of people would not be described by the list of 6 "underestimating the difficult of philosophy" things? It seems to me that broadly, EAs and "AI alignment" people tend to favor somewhat too concrete touchpoints like "well, suppressing revolts in the past has gone like such and such, so we should try to do similar for AGI". And broadly they don't credit an abstract argument about why something won't work, or would only work given substantial further philosophical insight.
Re/ "don't think thinking ...", well, if I say "LLMs basically don't think", they're like "sure it does, I can keep prompting it and it says more things, and I can even put that in a scaffold" or "what concrete behavior can you point to that it can't do". Like, bro, I'm saying it can't think. That's the tweet. What thinking is, isn't clear, but That thinking is should be presumed, pending a forceful philosophical conceptual replacement!
If you say to someone
Ok, so, there's this thing about AGI killing everyone. And there's this idea of avoiding that by making AGI that's useful like an AGI but doesn't kill everyone and does stuff we like. And you say you're working on that, or want to work on that. And what you're doing day to day is {some math thing, some programming thing, something about decision theory, ...}. What is the connection between these things?
and then you listen to what they say, and reask the question and interrogate their answers, IME what it very often grounds out into is something like:
Well, I don't know what to do to make aligned AI. But it seems like X ∈ {ontology, decision, preference function, NN latent space, logical uncertainty, reasoning under uncertainty, training procedures, negotiation, coordination, interoperability, planning, ...} is somehow relevant.
And, I have a formalized version of some small aspect of X which is mathematically interesting / philosophically intriguing / amenable to testing with a program, and which seems like it's kinda related to X writ large. So what I'm going to do, is I'm going to tinker with this formalized version for a week/month/year, and then I'm going to zoom out and think about how this relates to X, and what I have and haven't learned, and so on.
This is a good strategy because this is how all mathematical / scientific / technological progress is made: you start with stuff you know; you expand outwards by following veins of interest, tractability, and generality/power; you keep an eye roughly towards broader goals by selecting the broad region you're in; and you build outward. What we see historically is that this process tends to lead us to think about the central / key / important / difficult / general problems--such problems show up everywhere, so we convergently will come to address them in due time. By mostly sticking, in our day-to-day work, to things that are relatively more concrete and tractable--though continually pushing and building toward difficult things--we make forward progress, sharpen our skills, and become familiar with the landscape of concepts and questions.
So I would summarize that position as endorsing streetlighting, in a very broad sense that encompasses most math / science / technology. And this position is largely correct! My claim is that alignment is an exception: its core difficulties are the sort that streetlighting doesn't reach.
I discuss the problem more here: https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html
(But note that, while that essay frames things as "a proposed solution", the solution is barely anything--more like a few guesses at pieces of methodology--and the main point is the discussion of the problem; maybe a writing mistake.)
An underemphasized point that I should maybe elaborate more on: a main claim is that there's untapped guidance to be gotten from our partial understanding--at the philosophical level and for the philosophical level. In other words, our preliminary concepts and intuitions and propositions are, I think, already enough that there's a lot of progress to be made by having them talk to each other, so to speak.
The type of fundamental problem that proper speculative philosophy is supposed to solve is the sort where streetlighting doesn't work (or isn't working, or isn't working fast enough). But nearly all of the alignment field after like 2004 was still basically streetlighting. It was maybe a reasonable thing to have some hope in prospectively, but retrospectively it was too much investment in streetlighting, and retrospectively I can make arguments about why one should have maybe guessed that at the time. By 2018 IIRC, or certainly by 2019, I was vociferously arguing for that in AF team meetings--but the rest of the team either disagreed with me or didn't understand me, and on my own I'm just not that good a thinker, and I didn't find anyone else to try it with. I think they have good thoughts, but are nevertheless mostly streetlighting--i.e. not trying to take step after step of thinking at the level of speculative philosophy AND aimed at getting the understanding needed for alignment.
Yeah that was not my reaction. (More like "that's going to be the most beautiful thing ever" and "I want to be that too".)
more cautious/modest/self-critical about proposing new philosophical solutions
No, if anything the job loss resulted from not doing so much more, much more intently, and much sooner.
To whom does this not apply? Most people who "work on AI alignment" don't even think that thinking is a thing.
True (but obvious) taken literally. But if you also mean it's good to show sympathy by changing your stance in the discourse, such as by reallocating private or shared attention, it's not always true. In particular, many responses you implement could be exploited.
For example, say I'm ongoingly doing something bad, and whenever you try to talk to me about it, I "get upset". In this case, I'm probably actually upset, probably for multiple reasons; and probably a deep full empathic understanding of the various things going on with me would reveal that, in some real ways, I have good reason to be upset / there's something actually going wrong for me. But now say that your response to me "getting upset" is to allocate our shared attention away from the bad thing I'm doing. That may indeed be a suitable thing to do; e.g., maybe we can work together to understand what I'm upset about, and get the good versions of everything involved. However, hopefully it's clear how this could be taken advantage of--sometimes even catastrophically, if, say, you are for some reason very committed to the sort of cooperativeness that keeps reallocating attention this way, even to the ongoing abandonment of your original concern about the bad thing I was doing and am ongoingly doing. (This is a nonfictional though intentionally vague example.)
(I won't reply more, by default.)
various facts about Anthropic mean that them-making-powerful-AI is likely better than the counterfactual, and evaluating a lab in a vacuum or disregarding inaction risk is a mistake
Look, if Anthropic was honestly and publicly saying
We do not have a credible plan for how to make AGI, and we have no credible reason to think we can come up with a plan later. Neither does anyone else. But--on the off chance there's something that could be done with a nascent AGI that makes a non-omnicide outcome marginally more likely, if the nascent AGI is created and observed by people who are at least thinking about the problem--on that off chance, we're going to keep up with the other leading labs. But again, given that no one has a credible plan or a credible credible-plan plan, better would be if everyone including us stopped. Please stop this industry.
If they were saying and doing that, then I would still raise my eyebrows a lot and wouldn't really trust it. But at least it would be plausibly consistent with doing good.
But that doesn't sound like either what they're saying or doing. IIUC they lobbied to remove protection for AI capabilities whistleblowers from SB 1047! That happened! Wow! And it seems like Zac feels he has to pretend to have a credible credible-plan plan.
From scratch but not from scratch. https://www.lesswrong.com/posts/noxHoo3XKkzPG6s7E/most-smart-and-skilled-people-are-outside-of-the-ea?commentId=DNvmP9BAR3eNPWGBa
https://tsvibt.blogspot.com/2023/09/a-hermeneutic-net-for-agency.html