My top interest is AI safety, followed by reinforcement learning. My professional background is in software engineering, computer science, and machine learning. I have degrees in electrical engineering, liberal arts, and public policy. I currently live in the Washington, DC metro area; before that, I lived in Berkeley for about five years.
@Hastings ... I don't think I made a comment in this thread -- and I don't see one when I look. I wonder if you are replying to a different one? Link it if you find it?
This diagram from page 4 of "Data Poisoning the Zeitgeist: The AI Consciousness Discourse as a pathway to Legal Catastrophe" conveys the core argument quite well:
I’m about to start reading “Fifty Years of Research on Self-Replication” (1998) by Moshe Sipper. I have a hunch that the history and interconnections therein might be under-appreciated in the field of AI safety. I look forward to diving in.
A quick disclosure of some of my pre-existing biases: I also have a desire to arm myself against the overreaching claims and self-importance of Stephen Wolfram. A friend of mine was “taken in” by Wolfram’s debate with Yudkowsky… and it rather sickened me to see Wolfram exerting persuasive power. At the same time, certain of Wolfram’s rules are indeed interesting, so I want to acknowledge his contributions fairly.
Sorry for the confusion. :P ... I do appreciate the feedback. Edited to say: "I'm noticing evidence that many of us may have an inaccurate view of the 1983 Soviet nuclear false alarm. I say this after reading..."
I have also seen conversations get derailed based on such disagreements.
I expect to largely adopt this terminology going forward.
May I ask to which audience(s) you think this terminology will be helpful? And what particular phrasing(s) do you plan on trying out?
The quote above from Chalmers is dense and rather esoteric, so I would hesitate to use its particular terminology with most people (the ones likely to get derailed as discussed above). Instead, I would seek out simpler language. As a first draft, perhaps I would say:
Let's put aside whether LLMs think on the inside. Let's focus on what we observe -- are these observations consistent with the word "thinking"?
Parties aren't real, the power must be in specific humans or incentive systems.
I would caution against saying "parties aren't real" for at least two reasons. First, it more-or-less invites definitional wars, which are rarely productive. Second, when we think about explanatory and predictive theories, whether something is "real" (however you define it) is often irrelevant. What matters more is whether the concept is sufficiently clear / standardized / "objective" to measure something and thus serve as a replicable part of a theory.
Humans have long been interested in making sense of power through various theories. One approach is to reduce it to purely individual decisions. Another approach involves attributing power to groups of people or even culture. Models serve many purposes, so I try to ground these kinds of discussions with questions such as:
These are very different questions, leading to very different models.
I'm noticing evidence that many of us may have an inaccurate view of the 1983 Soviet nuclear false alarm. I say this after reading "Did Stanislav Petrov save the world in 1983? It's complicated". The article is worth reading; it is a clear and detailed ~1100 words. I've included some excerpts here:
[...] I must say right away that there is absolutely no reason to doubt Petrov's account of the events. Also, there is no doubt that Stanislav Petrov did the right thing when he reported up the chain of command that in his assessment the alarm was false. That was a good call in stressful circumstances and Petrov fully deserves the praise for making it.
But did he literally avert a nuclear war and saved the world? In my view, that was not quite what happened. (I would note that as far as I know Petrov himself never claimed that he did.)
To begin with, one assumption that is absolutely critical for the "saved the world" version of the events is that the Soviet Union maintained the so-called launch-on-warning posture. This would mean that it was prepared to launch its missiles as soon as its early-warning system detects a missile attack. This is how US system works and, as it usually happens, most people automatically assume that everybody does the same. Or at least tries to. This assumption is, of course, wrong. The Soviet Union structured its strategic forces to absorb a nuclear attack and focused on assuring retaliation - the posture known as "deep second strike" ("ответный удар"). The idea was that some missiles (and submarines) will survive an attack and will be launched in retaliation once it is over.
[...] the Soviet Union would have waited for actual nuclear detonations on its soil. Nobody would have launched anything based on an alarm generated by the early-warning system, let alone by only one of its segments - the satellites.
[...] It is certain that the alarm would have been recognized as false at some stages. But even if it wasn't, the most radical thing the General Staff (with the involvement of the political leadership) would do was to issue a preliminary command. No missiles would be launched unless the system detected actual nuclear detonations on the Soviet territory.
Having said that, what Stanislav Petrov did was indeed commendable. The algorithm that generated the alarm got it wrong. The designers of the early-warning satellites took particular pride in the fact that the assessment is done by computers rather than by humans. So, it definitely took courage to make that call to the command center up the chain of command and insist that the alarm is false. We simply don't know what would have happened if he kept silence or confirmed the alarm as positive. And, importantly, he did not know it either. He just did all he could to prevent the worst from happening.
I haven't made a thorough assessment myself. For now, I'm adding The Dead Hand to my reading list.
Edited on 2025-12-25 to improve clarity of the first point. Thanks to readers for the feedback.
I am seeking definitions of the key foundational concepts in this paper (cognitive pattern, context, influence, selection, motivations) with (a) as much formal precision as possible and (b) a minimal word count. This might be asking a lot, but I think it can be done, and I think it is important. I suggest using a very basic foundation: the basic terminology of artificial neural networks (ANNs): neurons, weights, activations, etc. Let the difficulty arise in putting the ideas together, not in confusion about the definitions themselves. If there is ambiguity or variation in how these terms apply, I think it would make sense to lock in some particulars so the definitions can be tightened up. (Walk before you run.) Even better if these definitions themselves are diagrammed and connected visually (perhaps with something like an ontology diagram).
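To illustrate the kind of grounding I have in mind, here is a minimal sketch of the base ANN vocabulary (neurons, weights, activations) that the tighter definitions could build on. This is my own illustration, not anything from the paper; the variable names and the two-layer architecture are arbitrary choices:

```python
import numpy as np

# Minimal ANN vocabulary, made concrete:
# - "neurons": the units in each layer (here, the entries of a vector)
# - "weights": parameters on the connections between layers
# - "activations": the values neurons take on for a given input

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Pre-activation: weighted sum of the previous layer's activations.
    z = inputs @ weights + biases
    # Activation: the neuron's output after a nonlinearity (ReLU here).
    return np.maximum(z, 0.0)

# A tiny network: 3 input neurons -> 4 hidden neurons -> 2 output neurons.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

x = np.array([1.0, -0.5, 0.25])   # input activations (the "context" seen by layer 1)
h = layer(x, W1, b1)              # hidden-layer activations
y = layer(h, W2, b2)              # output-layer activations
```

With this shared base, a term like "context" could then be pinned down as, say, the activations feeding into a given unit, rather than left to vary with each reader's intuition.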
I'd appreciate any efforts in this direction, thanks! I've started a draft myself, but I want to have some properly uninterrupted time to iterate on it before sharing.
Why do I ask this? Personally, I find this article hard to parse for definitional reasons.
I want to be clear: Lots of terrifyingly smart people made this mistake, including some of the smartest scientists who ever lived. Many of them made this mistake for a decade or more before wising up or giving up.
Imagine this. Imagine a future world where gradient-driven optimization never achieves aligned AI. But there is success of a different kind. At great cost, ASI arrives. Humanity ends. In his few remaining days, a scholar with the pen name of Rete reflects back on the 80s approach (i.e. using deterministic rules and explicit knowledge) with the words: "The technology wasn't there yet; it didn't work commercially. But they were onto something -- at the very least, their approach was probably compatible with provably safe intelligence. Under other circumstances, perhaps it would have played a more influential role in promoting human thriving."
Communication note: writing "EG" instead of "e.g." feels unnecessarily confusing to me. In this context, EG probably should be reserved for Edmund Gettier: