It is possible that the outlier dimensions are related to the LayerNorms since the layernorm gain and bias parameters often also have outlier dimensions and depart quite strongly from Gaussian statistics.
This reminds me of a LessWrong comment that I saw a few months ago:
I think at least some GPT2 models have a really high-magnitude direction in their residual stream that might be used to preserve some scale information after LayerNorm.
I am surprised that these issues would apply to, say, Google translate. Google appears unconstrained by cost or shortage of knowledgeable engineers. If Google developed a better translation model, I would expect to see it quickly integrated into the current translation interface. If some external group developed better translation models, I would expect to see them quickly acquired by Google.
I thoroughly enjoyed this paper, and would very much like to see the same experiment performed on language models in the billion-parameter range. Would you expect the results to change, and how?
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for
Something that I’m really confused about: what is the state of machine translation? It seems like there is massive incentive to create flawless translation models. Yet when I interact with Google translate or Twitter’s translation feature, results are not great. Are there flawless translation models that I’m not aware of? If not, why is translation lagging behind other text analysis and generation tasks?
Thank you for clarifying your intended point. I agree with the argument that playful thinking is intrinsically valuable, but still hold that the point would have been better-reinforced by including some non-mathematical examples.
I literally don’t believe this
Here are two personal examples of playful thinking without obvious applications to working on alignment:
If you can think of ways that these lines of thought apply to alignment, please let me know. This isn’t a “gotcha”, if you actually came up with something it’d be pretty dope.
I agree. It seems awfully convenient that the all of the “fun” described in this post involve the legibly-impressive topics of physics and mathematics. Most people, even highly technically competent people, aren’t intrinsically drawn to play with intellectually prestigious tasks. They find fun in sports, drawing, dancing, etc. Even when they take adopt an attitude of intellectual inquiry to their play, the insights generated from drawing techniques or dance moves are far less obviously applicable to working on alignment than the insights generated from studying physics. My concern is that presenting such a safely “useful” picture of fun undercuts the message to “follow your playful impulses, even if they’re silly”, because the implicit signal is “and btw, this is how smart people like us should be having fun B)”
shared a review in some private channels, might as well share it here:
The book positions itself as a middle ground between optimistic capabilities researchers striding blithely into near-certain catastrophe and pessimistic alignment researchers too concerned with dramatic abstract doom scenarios to address more realistic harms that can still be averted. When addressing the latter, Chapman constructs a hypothetical "AI goes FOOM and unleashes nanomachine death" scenario and argues that while alignment researchers are correct that we have no capacity to prevent this awful scenario, it relies on many leaps (very fast boostrapped self-optimization, solving physics in seconds, nanomachines) that provoke skepticism. I'm inclined to agree: I know that the common line is that "nanomachines are just one example of how TAI can accomplish its goals, FOOM doom scenarios still work if you substitute it with a more plausible technology", but I'm not sure that they do! "Superdangerous virus synthesis" is the best substitute I've heard, but I'm skeptical of even that causing total human extinction (tho the mass suffering that it'd cause is grounds enough for extreme concern).
Chapman also suggests a doom scenario based on a mild extrapolation of current capabilities, where generative models optimized for engagement provoke humans into political activism that leads to world war. Preventing this scenario is a more tractable problem than the former. Instead crafting complex game-theoretic theories, we can discencentivize actors at the forefront of capabilities research from developing and deploying general models. Chapman suggests strengthening data collection regulation and framing generative content as a consumer hazard that deserves both legal and social penalty, like putting carcinogens or slave-labor-derived substances in products.
I think that he's too quick to dismiss alignment theory work as overly-abstract and unconcerned with plausibility. This dismissal is rhetorically useful in selling AI safety to readers hesitant to accept extreme pessimism based on heavily deductive arguments, but this doesn't win points with me because I'm not a fan of strategic distortion of fact. On the other hand, I really like that he proposes an overlooked strategy for addressing AI risk that not only addresses current harms, but is accessible to people with skills disjoint from those required for theoretical alignment work. Consumer protection is a well-established field with a numer of historical wins, and adopting its techniques sounds promising.