Re: Multi-lingual capabilities
I haven't reviewed the recent literature, but when I last checked (summer 2024), it seemed to overwhelmingly conclude that English acts largely as a sort of mentalese. On that view, translation neurons are spread across the network but concentrated in the outermost layers on both sides (close to the input and the output).
Does this mean Haiku differs in this respect from the models studied so far?
Do you think this is simply a natural consequence of scale, with a sort of "universal grammar" emerging on its own, or is there something in Claude's training that would make it qualitatively different from other models in this regard?