David Chapman actually uses social media recommendation algorithms as a central example of AI that is already dangerous:

I shared a review in some private channels; might as well share it here:

The book positions itself as a middle ground between optimistic capabilities researchers striding blithely into near-certain catastrophe and pessimistic alignment researchers too concerned with dramatic abstract doom scenarios to address more realistic harms that can still be averted. When addressing the latter, Chapman constructs a hypothetical "AI goes FOOM and unleashes nanomachine death" scenario. He argues that while alignment researchers are correct that we have no capacity to prevent this awful scenario, it relies on many leaps (very fast bootstrapped self-optimization, solving physics in seconds, nanomachines) that provoke skepticism. I'm inclined to agree: I know that the common line is that "nanomachines are just one example of how TAI can accomplish its goals; FOOM doom scenarios still work if you substitute a more plausible technology", but I'm not sure that they do! "Superdangerous virus synthesis" is the best substitute I've heard, but I'm skeptical of even that causing total human extinction (though the mass suffering it would cause is grounds enough for extreme concern).

Chapman also suggests a doom scenario based on a mild extrapolation of current capabilities, where generative models optimized for engagement provoke humans into political activism that leads to world war. Preventing this scenario is a more tractable problem than preventing the former. Instead of crafting complex game-theoretic frameworks, we can disincentivize actors at the forefront of capabilities research from developing and deploying general models. Chapman suggests strengthening data collection regulation and framing generative content as a consumer hazard that deserves both legal and social penalties, akin to putting carcinogens or slave-labor-derived substances in products.

I think that he's too quick to dismiss alignment theory work as overly abstract and unconcerned with plausibility. This dismissal is rhetorically useful in selling AI safety to readers hesitant to accept extreme pessimism based on heavily deductive arguments, but it doesn't win points with me because I'm not a fan of strategic distortion of fact. On the other hand, I really like that he proposes an overlooked strategy for addressing AI risk that not only addresses current harms but is also accessible to people with skills disjoint from those required for theoretical alignment work. Consumer protection is a well-established field with a number of historical wins, and adopting its techniques sounds promising.

It is possible that the outlier dimensions are related to the LayerNorms, since the LayerNorm gain and bias parameters often also have outlier dimensions and depart quite strongly from Gaussian statistics.


This reminds me of a LessWrong comment that I saw a few months ago:

I think at least some GPT2 models have a really high-magnitude direction in their residual stream that might be used to preserve some scale information after LayerNorm.
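A toy sketch of why a single high-magnitude direction could let scale information survive LayerNorm. It assumes a plain LayerNorm with no learned gain or bias, and the vector `v` and the outlier value of 1000 are made-up illustrative numbers, not values from any GPT-2 checkpoint. LayerNorm divides by the standard deviation across dimensions, so on its own it discards the overall scale of its input. If one dimension is far larger than the rest, that dimension dominates the standard deviation, and the spread of the remaining normalized dimensions then tracks how large they were before normalization.

```python
import math


def layer_norm(x, eps=1e-5):
    """Plain LayerNorm with no learned gain/bias: subtract the mean, divide by the std."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / n
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]


def spread(y):
    """Population standard deviation, used to measure how large a set of outputs is."""
    m = sum(y) / len(y)
    return math.sqrt(sum((yi - m) ** 2 for yi in y) / len(y))


# Hypothetical small residual-stream components (illustrative values only).
v = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9, -0.4]
v_doubled = [2 * vi for vi in v]

# Without an outlier, LayerNorm is (nearly) scale-invariant: v and 2*v normalize
# to essentially the same vector, so the factor of 2 is lost.
plain_ratio = spread(layer_norm(v_doubled)) / spread(layer_norm(v))

# With one hypothetical high-magnitude dimension prepended, the std is dominated
# by that dimension, so doubling the small components roughly doubles their
# spread after normalization (we drop the outlier dimension before measuring).
OUTLIER = 1000.0
outlier_ratio = (
    spread(layer_norm([OUTLIER] + v_doubled)[1:])
    / spread(layer_norm([OUTLIER] + v)[1:])
)

print(f"without outlier: {plain_ratio:.3f}")  # about 1.0: scale information lost
print(f"with outlier:    {outlier_ratio:.3f}")  # about 2.0: scale information kept
```

This only shows that the arithmetic permits the mechanism; whether real GPT-2 models actually use their outlier direction this way is the open question the quoted comment raises.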

I am surprised that these issues would apply to, say, Google Translate. Google appears unconstrained by cost or by a shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.

Why haven't they switched to newer models?

I thoroughly enjoyed this paper, and would very much like to see the same experiment performed on language models in the billion-parameter range. Would you expect the results to change, and how?

AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for


Something that I’m really confused about: what is the state of machine translation? It seems like there is massive incentive to create flawless translation models. Yet when I interact with Google Translate or Twitter’s translation feature, the results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?

Thank you for clarifying your intended point. I agree with the argument that playful thinking is intrinsically valuable, but I still hold that the point would have been better reinforced by including some non-mathematical examples.

I literally don’t believe this

Here are two personal examples of playful thinking without obvious applications to working on alignment:

  1. A half-remembered quote, attributed to the French comics artist Moebius: “There are two ways to impress your audience: do a good drawing, or pack your drawing full of so much detail that they can’t help but be impressed.” Are there cases where less detail is more impressive than more detail? Well, impressive drawings of beautiful girls frequently have strikingly sparse detail on the face (see drawings by Junji Ito, John Singer Sargent, and Studio Kyoto). Why are these drawings appealing? Maybe because the sparse detail, mostly concentrated in the eyes (though Sargent reduces the eyes to simplified block-shadows as well), implies smooth, flawless skin. Maybe because the sparsity lets viewers fill the empty space with their own idealized image: the Junji Ito girl’s nasal bridge is left undefined, so you can imagine her having a straight or button nose according to preference. Maybe there’s something inherently appealing about leaving shapes implied; doesn’t that Junji Ito girl’s nasal bridge remind you of the Kanizsa Triangle? Are people intrinsically drawn to having the eye fooled by abstracted drawings?
  2. A number of romantic Beatles songs have sinister undertones: the final verse of “Norwegian Wood” refers to committing arson, and even the sentimental “Something” has this bizarre protest, “I don't want to leave her now” (ok dude, then why are you bringing it up?). Is this consistent throughout their discography? Is this unique to the Beatles, or were sinister love songs a well-established pop genre at the time? Did these sinister implications go over audiences’ heads, or were they key to the masses’ enjoyment of the songs?

If you can think of ways that these lines of thought apply to alignment, please let me know. This isn’t a “gotcha”, if you actually came up with something it’d be pretty dope.

I agree. It seems awfully convenient that all of the “fun” described in this post involves the legibly impressive topics of physics and mathematics. Most people, even highly technically competent people, aren’t intrinsically drawn to play with intellectually prestigious tasks. They find fun in sports, drawing, dancing, etc. Even when they adopt an attitude of intellectual inquiry toward their play, the insights generated from drawing techniques or dance moves are far less obviously applicable to working on alignment than the insights generated from studying physics. My concern is that presenting such a safely “useful” picture of fun undercuts the message to “follow your playful impulses, even if they’re silly”, because the implicit signal is “and btw, this is how smart people like us should be having fun B)”