Just to be clear, while I "vibe very hard" with what the author says on a conceptual level, I'm not directly calling for you to shut down those projects. I'm trying to explain what I think the author sees as a problem within the AI safety movement. Because I am talking to you specifically, I am using the immediate context of your work, but only as a frame, not as a target. I found AI 2027 engaging, a good representation of a model of how takeoff will happen, and I thought it was designed and written well (tbh my biggest quibble is "why isn't it called AI 2028"). The author is very, very light on actual positive "what we should do" policy recommendations, so if I talked about that I would be filling in with my own takes, which probably differ from the author's in several places. I am happy to do that if you want, though probably not publicly in a LW thread.
@Daniel Kokotajlo Addendum:
Finally, my interpretation of "Chapter 18: What Is to Be Done?" (and the closest I will come to answering your question based on the author's theory/frame) is something like "the AGI-birthing dynamic is not a rational dynamic, therefore it cannot be defeated by policies or strategies focused on rational action". Furthermore, since each actor wants to believe that their contribution to the dynamic is locally rational (if I don't do it someone else will/I'm counterfactually helping/this intervention will be net positive/I can use my influence for good at a pivotal moment [...] pick your argument), further arguments about optimally rational policies only encourage the delusion that everyone is acting rationally, making them dig in their heels further.
The core emotions the author points to as motivating the AGI dynamic are: the thrill of novelty/innovation/discovery; paranoia and fear about "others" (other labs/other countries/other people) achieving immense power; distrust of the institutions, philosophies, and systems that underpin the world; and a sense of self-importance/destiny. All of these can be justified with intellectual arguments, but they are often the bottom line that comes before such arguments are written. On the other hand, the author also shows how poor emotional understanding and estrangement from one's emotions and intuitions lead people to get trapped by faulty but extremely sophisticated logic. Basically, emotions and intuitions offer first-order heuristics in the massively high-dimensional space of possible actions/policies, and when you cut off the heuristic system you are vulnerable to high-dimensional traps/false leads that your logic or deductive abilities are insufficient to extract you from.
Therefore, the answer the author is pointing at is something like an emotional or frame realignment challenge. You don't start arguing with a suicidal person about why the logical reasons they have offered for jumping don't make sense (at least, you don't do this if you want them to stay alive); you try to point them to a different emotional frame or state (e.g. calming them down and showing them there is a way out). Though he leaves it very vague, it seems that he believes the world will also need such a fundamental frame shift or belief-reinterpretation to actually exit this destructive dynamic, the magnitude of which he likens to a religious revelation and compares to the redemptive power of love. Beyond this point I would be filling in my own interpretation, so I will stop there, but I have a lot more thoughts about this (especially the idea of love/coordination/ends to Moloch).
To be honest, I wasn't really pointing at you when I made the comment, more at the practice of the hedges and the qualifiers. I want to emphasise that (from the evidence available to me publicly) I think you have internalised your beliefs a lot more than those the author collects into the "uniparty". I think you have acted with real courage in support of your convictions, especially in the face of the NDA situation, for which I hold immense respect. It could not have been easy to leave when you did.
However, my interpretation of what the author is saying is that beliefs like "I think what these people are doing might seriously end the world" are in a sense fundamentally difficult to square with measured reasoning and careful qualifiers. The end of the world and existential risk are by their nature such totalising and awful ideas that any "sane" interaction with them (as in, trying to set measured bounds and make sensible models) is extremely epistemically unsound: the equivalent of arguing over whether 1e8 + 14 people or 1e8 + 17 people (3 extra lives!) will be the true number of casualties in some kind of planetary extinction event, when the error bars are themselves ±1e5 or 1e6. (We are, after all, dealing with never-seen-before black swan events.)
In this sense, detailed debates about which metrics to include in a takeoff model, the precise slope of the METR exponential curve, and which combination of chip trade and export policies increases tail risk the most/least are themselves a kind of deception. Arguing over the details implies that our world and risk models have more accuracy and precision than they actually do, and in turn that we have more control over events than we actually do. "Directionally correct" is in fact the most accuracy we're going to get, because (per the author) Silicon Valley isn't actually executing some kind of carefully calculated compute-optimal RSI takeoff launch sequence with a well-understood theory of learning. The AGI "industry" is more like a group of people pulling the lever of a slot machine over and over and over again, egged on by a crowd of eager onlookers, spending down the world's collective savings accounts until one of them wins big. By "win big", of course, I mean "unleashes a fundamentally new kind of intelligence into the world". And each of them may do it for different reasons, and some of them may in their heads actually have some kind of master plan, but all it looks like from the outside is ka-ching, ka-ching, ka-ching, ka-ching...
The ideal market-making move is to introduce a new necessity for continued existence, like water.
Well, with nuance. Like, it's not my ideal policy package; I think if I were in charge of the whole world we'd stop AI development temporarily and then figure out a new, safer, less power-concentrating way to proceed with it. But it's significantly better by my lights than what most people in the industry and on twitter and in DC are advocating for. I guess I should say I approximately believe all those things, and/or I think they are all directionally correct.
With all due respect, I'm pretty sure that the existence of this very long string of qualifiers and very carefully reasoned hedges is precisely what the author means when he talks about intellectualised but not internalised beliefs.
Information warfare and psychological warfare are well-known terms. However, I would suggest that any well-intentioned outsider trying to figure out "what's going on with AI right now" (especially in a governance context) is effectively being subjected to the equivalent of an informational state of nature (à la Hobbes). There are masses of opinions being shouted furiously, most of the public experts have giant glowing signs marked "I have serious conflicts of interest", and the din of self-proclaimed insiders trying to get power/influence/money/a job at a lab/a slice of the lightcone by marketing insider takes is kind of deafening. And of course the companies are running targeted influence/lobbying campaigns on top of all this, trying to position themselves as the sole reliable actors.
@Caleb Biddulph For future reference, what I meant by "set up other systems" is classical RL systems like vanilla Q-learning: https://www.geeksforgeeks.org/machine-learning/q-learning-in-python/ . Today we know Q-learning primarily through deep Q-learning (which was one of DeepMind's original Big Papers), but it is entirely possible to do Q-learning with no neural networks to learn state representations or Q-values, instead just using a lookup table indexed by state and action. This is pretty inefficient, for somewhat obvious reasons.
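As a rough illustration (my own sketch, not taken from the linked tutorial), the whole tabular version fits in a couple dozen lines; the `env` object below is hypothetical, assumed to have a gym-like `reset()`/`step(action)` interface with hashable states and integer actions:

```python
# Minimal sketch of tabular Q-learning: no neural network, just a lookup table.
# `env` is a hypothetical environment:
#   env.reset() -> state
#   env.step(action) -> (next_state, reward, done)
import random
from collections import defaultdict

def tabular_q_learning(env, n_actions, episodes=1000,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
    # The entire "model" is this table: Q[state][action] -> estimated value.
    Q = defaultdict(lambda: [0.0] * n_actions)

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit the table, occasionally explore.
            if random.random() < epsilon:
                action = random.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q[state][a])

            next_state, reward, done = env.step(action)

            # Standard Q-learning update:
            # nudge Q(s, a) toward reward + gamma * max_a' Q(s', a').
            target = reward + gamma * max(Q[next_state]) * (not done)
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state

    return Q
```

The inefficiency is visible directly: the table has one row per distinct state, so anything with a large or continuous state space blows it up, which is exactly the gap deep Q-learning was meant to fill.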
Like this post a lot - might have more detailed thoughts later but just wanted to park this here
I remember that you posted some variant of this idea as a short form or a post some time ago. I can see that you feel the idea is very important, and I want to respond to it on its terms. My quick answer is that even under the "same" morals, people can undertake quite destructive actions to others, because most choices are made under a combination of perceived values (moral beliefs) and perceived circumstances (factual beliefs). Longer answer follows:
C. S. Lewis once tried to create a rough taxonomy of worldwide moral standards, showing that ideas such as the golden rule (do unto others as you would have them do unto you) and variants like the inverse golden rule (do not do unto others what you would not have them do unto you) were surprisingly popular across cultures. This was part of a broader project which is actually quite relevant to discussions of transhumanism: he was arguing that what we would call eugenics and transformative technology would annihilate "moral progress". But we can set aside the transhumanist argument for now and just focus on the shared principles, things like "family is good" or "killing is bad".
First of all, it should be a bad sign for your plan that such common principles can be identified at all, since it suggests that people might already have similar morals but still come to different conclusions about what is to be done. Second, it quickly becomes clear that some shared moral principles might lead to quite strong conflicts: I'm thinking of morals like "me/my family/my ethnic group is superior and should come first/have priority in situations of scarcity and danger". If four different nations are led by governments with that same belief, and the pool of resources becomes limited, fighting will almost certainly break out. This is true even if cooperation would prevent the destruction of valuable limited resources, increasing the total resources available to the nations as a whole!
From a broader perspective, what I see in the steelman of your idea is something like "if we get people to discuss, they will quickly realise that their ideas about what is moral are insufficient, and a better set of morals will emerge". So they might converge on something like "we are all one global family, and we should all stick together even when we disagree, because conflict is terrible". However, this is where the circumstances part of the choice comes in. I can agree in principle that unity is good and that life is sacred. However, if I believe that someone else does not share those ethics, and is (for example) about to kill me or rob me, I might act in self-defence. Most of us would call that justified, even though it violates my stated values. Today many leaders pay lip service to respecting human rights and international norms... but it's just that those evil evildoers are so evil that we need to do something serious to stop them. My values are pure, but circumstances forced my hand. And so on, and so forth.
Now, if you can truly convince everyone that everyone else is also a reasonable and nice human being, then maybe some progress can be made, but this is a very, very difficult thing, especially when there are centuries of conflict to deal with and legacies of complex and multilayered traumas in many parts of the world. So all in all I think this proposal is very unlikely to succeed. I hope this makes sense.
My best argument as to why coarse-graining and "going up a layer" when describing complex systems are necessary:
Often we hear a reductionist case against ideas like emergence which goes something like this: "If we could simply track all the particles in e.g. a human body, we'd be able to predict what they did perfectly with no need for larger-scale simplified models of organs, cells, minds, personalities etc.". However, this kind of total knowledge is actually impossible given the bounds of the computational power available to us.
First of all, when we attempt to track billions of particle interactions we very quickly end up with a chaotic system, in which tiny errors in measuring and setting up the initial state quickly compound into massive prediction errors. (A metaphor I like is that you're "using up" the decimal points in your measurement: in a three-body system the first timestep depends mostly on the non-decimal portions of the starting velocity measurements, a few timesteps later changing .15 to .16 makes a big difference, and by the 10,000th timestep the difference between a starting velocity of .15983849549 and .15983849548 is noticeable.) This is the classic problem with weather prediction.
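To make that concrete, here is a toy version of "using up the decimals" (my own example, using the logistic map at r = 4 as a stand-in for the three-body problem, since it's chaotic but needs only one line of maths):

```python
# Two starting points that differ only in the 11th decimal place.
def logistic(x, r=4.0):
    return r * x * (1.0 - x)

def iterate(x0, steps):
    x = x0
    for _ in range(steps):
        x = logistic(x)
    return x

a, b = 0.15983849549, 0.15983849548  # difference of 1e-11
for steps in (10, 30, 60):
    print(steps, iterate(a, steps), iterate(b, steps))
# At 10 steps the two trajectories still agree to many decimal places;
# by a few dozen steps they have fully decorrelated.
```

The initial error roughly doubles every iteration, so a discrepancy in the 11th decimal place dominates the prediction within a few dozen steps; a particle-level simulation of a body faces the same arithmetic on a vastly larger scale.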
Second of all, tracking "every particle" means that the set of particles you need to track explodes out of the system you're trying to monitor into the interactions the system has with neighbouring particles, and then the neighbours of neighbours, and so on and so forth. In the human case, you need to track every particle in the body, but also every particle the body touches or ingests (could be a virus), and then the particles that those particles touch... This continues until you reach the point where "to understand the baking process of an apple pie you must first track the position of every particle in the universe".
The emergence/systems solution to both problems is essentially to go up a level. Instead of tracking particles, you track cells, organs, individual humans, systems, etc. At each level (following Erik Hoel's causal emergence framework) you trade microscale precision for predictive power, i.e. the size of the system you can predict for a given amount of computational power. Often this means collapsing large numbers of microscale interactions into random noise: a slot machine could in theory be deterministically predicted by tracking every element in the randomiser mechanism/chip, but in practice it's easier to model as a machine with an output distribution set by the operating company. Similarly, we trade Feynman diagrams for Brownian motion and Langevin dynamics.
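A toy sketch of what that trade looks like for the slot-machine example (the payouts and probabilities below are made up; the point is only that the entire micro-mechanism gets replaced by a single output distribution):

```python
# Coarse-grained model of a slot machine: instead of simulating the RNG chip
# and reel mechanics state by state, collapse all of it into one distribution.
import random

payouts =       [0.0,  2.0,  10.0,  100.0]   # possible outcomes per $1 spin
probabilities = [0.85, 0.10, 0.045, 0.005]   # assumed operator-set distribution

def coarse_grained_spin():
    # One draw from the payout distribution stands in for the whole mechanism.
    return random.choices(payouts, weights=probabilities, k=1)[0]

# Expected payout estimated from the coarse model alone.
n = 100_000
print(sum(coarse_grained_spin() for _ in range(n)) / n)
```

Everything the player (or the casino's accountant) wants to predict is already captured at this level, even though the microscale story about the randomiser has been thrown away entirely.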