What I don’t understand is, either in my model or Critch’s, where we find more hope by declining a pivotal act, once one becomes feasible?
Part of the reason for more hope is that people are more trustworthy if they commit to avoiding the worst forms of unilateralist curses and world conquest. So by having committed to avoiding the pivotal act, leading actors will have become more likely to cooperate in ways that avoid the need for a pivotal act.
If a single pivotal act becomes possible, then it seems likely that it will also be possible to find friendlier pivotal processes that include persuading most governments to take appropriate actions. An AI that can melt nearly all GPUs will be powerful enough to scare governments into doing lots of things that are currently way outside the Overton window.
Cheap printing was likely a nontrivial factor, but its availability was influenced by much more than just the character sets. Printing presses weren't very reliable or affordable until a bunch of component technologies reached certain levels of sophistication. Even after they became practical, most cultures had limited interest in them.
Filtering out entire sites seems too broad and too crude to have much benefit.
I see plenty of room to turn this into a somewhat good proposal by having GPT-4 look through the dataset for a narrow set of topics. Something close to "how we will test AIs for deception".
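To make the narrow-filter idea concrete, here is a rough sketch under loud assumptions. The `keyword_judge` function is a hypothetical stand-in for the GPT-4 judgment, not a real API call, and `filter_dataset` and the example documents are made up for illustration. The point is only that the filter drops documents on one narrow topic and leaves everything else alone, rather than discarding whole sites.

```python
from typing import Callable, Iterable, List


def keyword_judge(text: str) -> bool:
    """Hypothetical stand-in for an LLM judgment (e.g. asking GPT-4 whether
    the text discusses how AIs will be tested for deception).
    Here it is just a crude keyword check."""
    lowered = text.lower()
    return "deception" in lowered and ("test" in lowered or "evaluat" in lowered)


def filter_dataset(
    docs: Iterable[str],
    is_on_topic: Callable[[str], bool] = keyword_judge,
) -> List[str]:
    """Keep every document the judge does not flag, so only the narrow
    topic is removed from the training data."""
    return [doc for doc in docs if not is_on_topic(doc)]


docs = [
    "A recipe for sourdough bread.",
    "Our plan to test models for deception uses honeypots.",
    "Deception in poker is a classic strategy.",
]
kept = filter_dataset(docs)
```

In practice the judge would be replaced by an actual model call, and the hard part is deciding how narrow the topic description should be.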
A good deal of this post is correct. But the goals of language models are more complex than you admit, and not fully specified by natural language. LLMs do something that's approximately a simulation of a human. Those simulated quasi-humans are likely to have quasi-human goals that are unstated and tricky to observe, for much the same reasons that humans have such goals.
LLMs also have goals that influence what kind of human they simulate. We'll know approximately what those goals are, due to our knowledge of what generated those goals. But how do we tell whether approximately is good enough?
Many rationalists do follow something resembling the book's advice.
CFAR started out with too much emphasis on lecturing people, but quickly noticed that wasn't working, and pivoted to more emphasis on listening to people and making them feel comfortable. This is somewhat hard to see if you only know the rationalist movement via its online presence.
Eliezer is far from being the world's best listener, and that likely contributed to some failures in promoting rationality. But he did attract and encourage people who overcame his shortcomings for CFAR's in-person promotion of rationality.
I consider it pretty likely that CFAR's influence has caused OpenAI to act more reasonably than it otherwise would have, due to several OpenAI employees having attended CFAR workshops.
It seems premature to conclude that rationalists have failed, or that OpenAI's existence is bad.
Sorry, it doesn’t look like the conservatives have caught on to this kind of approach yet.
That's not consistent with my experiences interacting with conservatives. (If you're evaluating conservatives via broadcast online messages, I wouldn't expect you to see anything more than tribal signaling.)
It may be uncommon for conservatives to use effective approaches to explicitly changing political beliefs. That's partly because politics are less central to conservative lives. You'd likely reach a more nuanced conclusion if you looked at how Mormons persuade people to join their religion, which incidentally persuades people to become more conservative.
I haven't thought this out very carefully. I'm imagining a transformer trained both to predict text, and to predict the next frame of video.
Train it on all available videos that show realistic human body language.
Then ask the transformer to rate on a numeric scale how positively or negatively a human would feel in any particular situation.
This does not seem sufficient for a safe result, but implies that LeCun is less nutty than your model of him suggests.
Why assume LeCun would use only supervised learning to create the IC module?
If I were trying to make this model work, I'd use mainly self-supervised learning that's aimed at getting the module to predict what a typical human would feel. (I'd also pray for a highly multipolar scenario if I were making this module immutable when deployed.)
Verified safe software means the battle shifts to vulnerabilities in any human who has authority over the system.