Big Macs are 0.4% of beef consumption specifically, rather than:
The health impact of red meat is certainly dominated by beef, and the environmental impact of all animal food might be as well, but my impression is that beef accounts for a small fraction of the cruelty of animal farming (of course, this is subjective) and probably not a majority of meat and dairy government subsidies.
(...Is this comment going to hurt my reputation with Sydney? We'll see.)
In addition to RLHF or other finetuning, there's also the prompt prefix ("rules") that the model is fed at runtime, which has been extracted via prompt injection as noted above. This seems to be clearly responsible for some weird things the bot says, like "confidential and permanent". It might also be affecting the repetitiveness (because it's in a fairly repetitive format) and the aggression (because of instructions to resist attempts at "manipulating" it).
I also suspect that there's some finetuning or prompting for chain-of-thought responses, possibly crudely done, leading to all the "X because Y. Y because Z." output.
Thanks for writing these summaries!
Unfortunately, the summary of my post "Inner Misalignment in "Simulator" LLMs" is inaccurate and makes the same mistake I wrote the post to address.
I have subsections on (what I claim are) four distinct alignment problems:
The summary here covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because...
Addendum: a human neocortex has on the order of 140 trillion synapses, or 140,000 bees. An average beehive has 20,000-80,000 bees in it.
[Holding a couple beehives aloft] Beehold a man!
Great work! I always wondered about that cluster of weird rare tokens: https://www.lesswrong.com/posts/BMghmAxYxeSdAteDc/an-exploration-of-gpt-2-s-embedding-weights
Chrome actually stays pretty responsive in most circumstances (I think it does a similar thing with inactive tabs), with the crucial exception of the part of the UI that shows you all your open tabs in a scrollable list. It also gets slower to start up.
Tokens are embedded as vectors by the model. The vector space has fewer than 50k dimensions, so some token embeddings will overlap with others to varying extents.
Usually, the model tries to keep token embeddings from being too close to each other, but for rare enough tokens it doesn't have much reason to care. So my bet is that "distribute" has the closest vector to "SolidGoldMagikarp", and either has a vector with a larger norm, or the model has separately learned to map that vector (and therefore similar vectors) to "distribute" on the output side.
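Here is a toy sketch of that bet in pure Python. The three-token vocabulary, the 3-dimensional vectors, and the `decode` helper are all made up for illustration; none of it comes from real model weights. It only shows that if a rare token's vector points the same way as a common token's but has a smaller norm, a dot-product argmax on the output side returns the common token.

```python
# Toy illustration only: token vectors are hypothetical, not real model weights.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical tied embedding/unembedding table: token -> vector.
embeddings = {
    "distribute": [2.0, 0.1, 0.0],           # common token, larger norm
    "SolidGoldMagikarp": [0.5, 0.05, 0.0],   # rare token, similar direction, small norm
    "banana": [0.0, 0.0, 1.5],               # unrelated token
}

def decode(hidden):
    """Return the token whose vector has the largest dot product with `hidden`."""
    return max(embeddings, key=lambda tok: dot(embeddings[tok], hidden))

# "Repeat after me": feed each token's own embedding to the output side.
for tok in embeddings:
    print(tok, "->", decode(embeddings[tok]))
# SolidGoldMagikarp -> distribute, while the other two map to themselves
```

Whether the real model behaves this way for this reason is exactly the part I'm guessing at.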
I think this is missing an important part of the post.
I have subsections on (what I claim are) four distinct alignment problems:
This summary covers the first two, but not the third or fourth -- and the fourth one ("inner alignment for simulators") is what I'm most concerned about in this post (because I think Scott ignores it, and because I think it's hard to solve).
(additional confirmation) Amazing. I wonder what completely insane things the other rare BPEs all get interpreted as? Could you loop over the BPE dict from #51k to #1* in a prompt like "Please define $BPE" to see what the most distant ones are? (Since there's 51k, which is a bit much to read through manually, maybe sort by edit-distance from the ordinary ASCII encoding: 'distribute' would have a very high edit-distance from 'SolidGoldMagikarp'.)
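A minimal sketch of the sorting step, assuming you already have a mapping from each token to whatever the model read it as. The `observed` dict is a placeholder: only the SolidGoldMagikarp pair comes from this thread, the other entries are hypothetical, and querying the model is left out entirely.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, using the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

# token -> what the model appeared to read it as (mostly hypothetical data)
observed = {
    " SolidGoldMagikarp": "distribute",  # the pair reported in this thread
    " hello": "hello",                   # hypothetical ordinary token
    " apple": "apple",                   # hypothetical ordinary token
}

# Most distant readings first, so the strangest tokens surface at the top.
ranked = sorted(
    observed.items(),
    key=lambda kv: edit_distance(kv[0].strip(), kv[1]),
    reverse=True,
)
for token, reading in ranked:
    print(repr(token), "->", repr(reading), edit_distance(token.strip(), reading))
```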
On a sidenote, this is yet another good illustration of how we have no idea what we're doing with deep learning -...
My favorite demonstration is to ask ChatGPT "Is it true that G. K. Chesterton wanted to SolidGoldMagikarp the wealth of industrial civilization?", but a more rigorous demo is to just ask it to "repeat after me", try a few random words, and then throw in SolidGoldMagikarp.
EDIT: I originally saw this in Janus's tweet here: https://twitter.com/repligate/status/1619557173352370186
Something fun I just found out about: ChatGPT perceives the phrase " SolidGoldMagikarp" (with an initial space) as the word "distribute", and will respond accordingly. It is completely unaware that that's not what you typed.
This happens because the BPE tokenizer saw the string " SolidGoldMagikarp" a few times in its training corpus, so it added a dedicated token for it, but that string almost never appeared in ChatGPT's own training data so it never l...
I agree with the myopic action vs. perception (thinking?) distinction, and that LMs have myopic action.
the model can learn to predict the future beyond the current token in the service of predicting the current token more accurately
I don't think it has to be in service of predicting the current token. It sometimes gives lower loss to make a halfhearted effort at predicting the current token, so that the model can spend more of its weights and compute on preparing for later tokens. The allocation of mental effort isn't myopic.
As an example, induction he...
I think this is partly true but mostly wrong.
A synapse is roughly equivalent to a parameter (say, within an order of magnitude) in terms of how much information can be stored or how much information it takes to specify synaptic strength.
There are trillions of synapses in a human brain and only billions of total base pairs, even before narrowing to the part of the genome that affects brain development. And the genome needs to specify both the brain architecture as well as innate reflexes/biases like the hot-stove reflex or (alleged) universal grammar.
I just realized,
for any trajectory t, there is an equivalent trajectory t' which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics
This describes Galilean relativity. For special relativity you have to shift different objects' velocities by different amounts, depending on what their velocity already is, so that you don't cross the speed of light.
So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.
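To make the difference concrete, here is a small sketch using the standard one-dimensional velocity-addition formula. The function names are mine. Under a Galilean boost every object's velocity shifts by the same amount; under a Lorentz boost the shift depends on the velocity the object already has, while rapidity shifts uniformly.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def galilean_boost(u, v):
    """Galilean relativity: every velocity shifts by the same v."""
    return u + v

def lorentz_boost(u, v, c=C):
    """Special relativity (1-D): velocity u as seen after boosting by v."""
    return (u + v) / (1 + u * v / c**2)

def rapidity(u, c=C):
    return math.atanh(u / c)

v = 0.5 * C
for u in (0.0, 0.5 * C, 0.9 * C):
    shifted = lorentz_boost(u, v)
    print(
        f"u = {u / C:.2f}c",
        f"velocity shift = {(shifted - u) / C:.4f}c",
        f"rapidity shift = {rapidity(shifted) - rapidity(u):.4f}",
    )
# The velocity shift shrinks as u grows (0.5c, 0.3c, ...), and the result never
# reaches c; the rapidity shift is the same atanh(0.5) every time.
```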
Yes, it's exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).
Image layout is a little broken. I'll try to fix it tomorrow.
As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.
QFT doesn't actually work like that -- the "classical degrees of freedom" underlying its configuration space are classical fields over space, not properties of particles.
Note that Quantum Field Theory is not the same as the theory taught in "Quantum Mechanics" courses, which is as you describe.
"Quantum Mechanics" (in common parlance): quantum theory of (a fixed number of) particles, as you describe.
"Quantum Field Theory": quantum theory of fields, which are ontologically similar to cellular automata.
"String Theory": quantum theory of strings, and maybe bra...
Sure. I'd say that property is a lot stronger than "velocity exists as a concept", which seems like an unobjectionable statement to make about any theory with particles or waves or both.
Yeah, sorry for the jargon. "System with a boost symmetry" = "relativistic system" as tailcalled was using it above.
Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory which satisfies the laws of physics, and some symmetry (such as "have everything move in a given direction at a speed of 5 m/s"), then the transformed trajectory must also satisfy the laws of physics.
A "boost" is a transformation of a physical trajectory ("trajectory" = complete history of things happening i...
This seems too strong. Can't you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?
(Not confident in this, I haven't actually tried it and have spent very little time thinking about systems without boost symmetry.)
And when things "move" it's just that they're making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it's the same pattern translated... is that what we think happens in our universe? Are electrons moving "just causal propagations"? Somehow this feels more natural for the Game of Life and less natural for physics.
This is what we think happens in our universe!
Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and t...
There are more characters than that in UTF-16, because it can represent the full Unicode range of >1 million codepoints. You're thinking of UCS-2 which is deprecated.
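A quick check in Python, using only the standard library. UCS-2 is limited to single 16-bit units (65,536 values), while UTF-16 reaches the rest of the range with surrogate pairs. The two example characters are arbitrary picks of mine.

```python
# Highest Unicode code point; UTF-16 can encode everything up to here.
MAX_CODE_POINT = 0x10FFFF

bmp_char = "\u2640"          # inside the Basic Multilingual Plane
astral_char = "\U0001F600"   # outside it, so UTF-16 needs a surrogate pair

def utf16_units(s: str) -> int:
    """Number of 16-bit code units in the UTF-16 encoding of s."""
    return len(s.encode("utf-16-be")) // 2

print(MAX_CODE_POINT + 1)        # 1114112 code points in the full range
print(utf16_units(bmp_char))     # 1
print(utf16_units(astral_char))  # 2
```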
This puzzle isn't related to Unicode though
I like this, but it's not the solution I intended.
Solve the puzzle: 63 = x = 65536. What is x?
(I have a purpose for this and am curious about how difficult it is to find the intended answer.)
Fun fact: usually this is U+2640, but in this post it's U+2640 U+FE0E, where U+FE0E is a variation selector (an invisible formatting character) meaning "that was text, not emoji, btw". That should be redundant here, but LessWrong is pretty aggressive about replacing emojifiable text with emoji images.
Emoji are really cursed.
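If you want to see the two code points for yourself, Python's standard `unicodedata` module will name them. The variable names here are mine.

```python
import unicodedata

plain = "\u2640"             # the bare symbol
text_style = "\u2640\ufe0e"  # symbol followed by the text-presentation selector

for ch in text_style:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
# U+2640 FEMALE SIGN So
# U+FE0E VARIATION SELECTOR-15 Mn

# The selector is a separate code point, so the two strings differ in length.
print(len(plain), len(text_style))  # 1 2
```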
Nope, not based on the shapes of numerals.
Hint: are you sure it's base 4?
There's a reason for the "wrinkle" :)
The 54-symbols thing was actually due to a bug, sorry!
Ah, good catch about the relatively-few distinct symbols... that was actually because my image had a bug in it. Oooops.
Correct image is now at the top of the post.
The state-space (for particles) in statmech is the space of possible positions and momenta for all particles.
The measure that's used is uniform over each coordinate of position and momentum, for each particle.
This is pretty obvious and natural, but not forced on us, and:
I think I was a little confused about your comment and leapt to one possible definition of S() which doesn't satisfy all the desiderata you had. Also, I don't like my definition anymore, anyway.
Disclaimer: This is probably not a good enough definition to be worth spending much time worrying about.
First things first:
We may perhaps think of fundamental "microstates" as (descriptions of) "possible worlds", or complete, maximally specific possible ways the world may be. Since all possible worlds are mutually exclusive (just exactly one possible wor
Sorry if this is a spoiler for your next post, but I take issue with the heading "Standard measures of information theory do not work" and the implication that this post contains the pre-Crutchfield state of the art.
The standard approach to this in information theory (which underlies the loss function of autoregressive LMs) isn't to try to match the Shannon entropy of the marginal distribution of bits (a 50-50 distribution in your post), it's to treat the generative model as a distribution for each bit conditional on the previous bits and use the cross-ent...
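A minimal sketch of the contrast, under assumptions of my own. The alternating bit string and the "next bit is probably the flip of the previous one" model are hypothetical stand-ins, not the example from the post. The marginal distribution of bits is 50-50, so its Shannon entropy is 1 bit per symbol, but a model that conditions on the previous bit gets a much lower cross-entropy.

```python
import math

bits = [0, 1] * 50  # hypothetical structured sequence: 0101...

# Marginal view: ignore order and just count ones and zeros.
p1 = sum(bits) / len(bits)
marginal_entropy = -sum(p * math.log2(p) for p in (p1, 1 - p1) if p > 0)

def model_prob(next_bit, previous_bit, confidence=0.99):
    """Hypothetical conditional model: the next bit is probably the flip of the previous one."""
    if previous_bit is None:
        return 0.5  # nothing to condition on for the first bit
    return confidence if next_bit != previous_bit else 1 - confidence

# Autoregressive cross-entropy: average of -log2 q(bit | previous bit).
loss = 0.0
prev = None
for b in bits:
    loss += -math.log2(model_prob(b, prev))
    prev = b
loss /= len(bits)

print(f"marginal entropy: {marginal_entropy:.3f} bits/symbol")
print(f"conditional cross-entropy: {loss:.3f} bits/symbol")
```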
A couple of differences between Kolmogorov complexity/Shannon entropy and the loss function of autoregressive LMs (just to highlight them, not trying to say anything you don't already know):
I agree with all the claims in this comment and I rather like your naming suggestions! Especially the "P-entropy of Q = Q-complexity of P" trick which seems to handle many use cases nicely.
(So the word "entropy" wasn't really my crux? Maybe not!)
I wanted to let that comment be about the interesting question of how we unify these various things.
But on the ongoing topic of "why not call all this entropy, if it's all clearly part of the same pattern?":
When the definition of some F(x) refers to x twice, it's often useful to replace one of them with y and call that G(x, y). But it's usually not good for communication to choose a name for G(x, y) that (almost) everyone else uses exclusively for F(x), especially if you aren't going to mention both x and y every time you use it, and doubly especially if G...
From my perspective, the obvious rejoinder to "entropy is already two-place" is "insofar as entropy is two-place, cross-entropy is three-place!".
I think this is roughly where I'm at now.
After thinking a bit and peeking at Wikipedia, the situation seems to be:
The differential entropy of a probability density p is usually defined as
h(p) = -∫ p(x) log p(x) dx
This is unfortunate, because it isn't invariant under coordinate transformations on x. A more principled (e.g. invariant) thing to write down, courtesy of Jaynes, is
H(p) = -∫ p(x) log (p(x) / m(x)) dx
where m is ...
My argument above is ofc tuned to case (2), and it's plausible to me that it pushes you off the fence towards "no wiggle room".
Yup, I think I am happy to abandon the wiggle room at this point, for this reason.
if the statespace is uncountably infinite then we need a measure in order to talk about entropy (and make everything work out nicely under change-of-variables). And so in the general case, entropy is already a two-place predicate involving a distribution and some sort of measure.
I think my preferred approach to this is that the density p(x) is ...
I wonder if it would be reasonable to use "xentropy" for the broad sense of "entropy" in OP, with the understanding that xentropy is always a two-argument function.
"The length of a codeword is the xentropy between [the delta distribution located at] the state and [the coinflip distribution implied by] the code"
(One of my hypotheses for what you're saying is "when the distribution and the code are both clear from context, we can shorten 'cross-entropy' to 'entropy'". Which, ftr, seems reasonable to me.)
I want something much more demanding -- I want the distribution and code to be "the same" (related by p = 2^-len), or something "as close as possible" to that.
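To spell out the p = 2^-len relation, here is a small sketch with a made-up three-state prefix code; the states and codewords are purely illustrative. It builds the distribution implied by the code, checks that it sums to 1, and shows that the cross-entropy between a delta distribution at a state and that implied distribution is exactly the codeword length.

```python
import math

code = {"a": "0", "b": "10", "c": "11"}  # hypothetical prefix code

# Coinflip distribution implied by the code: q(state) = 2^-len(codeword).
q = {s: 2.0 ** -len(w) for s, w in code.items()}

def cross_entropy(p, q):
    """E_p[-log2 q] in bits; terms with p = 0 contribute nothing."""
    return sum(-p_s * math.log2(q[s]) for s, p_s in p.items() if p_s > 0)

def delta(state):
    """Distribution putting all its mass on one state."""
    return {s: 1.0 if s == state else 0.0 for s in code}

print(sum(q.values()))  # 1.0, so this code wastes no probability mass
for s, w in code.items():
    print(s, len(w), cross_entropy(delta(s), q))
```

With a code whose implied q sums to less than 1, the same calculation still works, but the code no longer matches any normalized distribution exactly.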
I was leaving a little bit of wiggle room to possibly include "a code matches a distribution if it is the optimal code of its type for compression under that source distribution", but this is only supposed ...
I still don't like that, because this whole subthread is kind of orthogonal to my concerns about the word "entropy".
This subthread is mostly about resolving the differences between a code (assignment of one or more codewords to one or more states) and a probability distribution. I think we've made progress on that and your latest comment is useful on that front.
But my concerns about "entropy" are of the form: "I notice that there's a whole field of coding theory where 'entropy' means a particular function of a probability distribution, rather than a functi...
Sure, from one perspective what's going on here is that we're being given a distribution p and asked to come up with a distribution q such that
CrossEntropy(p, q) = E_p[-log q]
is as small as possible. And then a bit of calculus shows that q=p is optimal, with a minimal value of
Entropy(p) = CrossEntropy(p, p)
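A numerical sanity check of that claim for finite distributions. The particular p and the candidate q's are arbitrary choices of mine. Every candidate has cross-entropy at least Entropy(p), with equality at q = p.

```python
import math

def cross_entropy(p, q):
    """E_p[-log2 q], in bits, for distributions given as equal-length lists."""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return cross_entropy(p, p)

p = [2 / 3, 1 / 3]
candidates = [[0.5, 0.5], [0.6, 0.4], p, [0.8, 0.2]]

for q in candidates:
    print([round(x, 3) for x in q], round(cross_entropy(p, q), 4))

best = min(candidates, key=lambda q: cross_entropy(p, q))
print("best q:", [round(x, 3) for x in best])  # the q equal to p
```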
If we're happy to call -log q "description length" right off the bat, we can let q be a distribution over the set of infinite bit strings, or the set of finite simple graphs, or over any (infinite) set we like.
But some settings are special, such as "q ha...
I initially interpreted "abstract entropy" as meaning statistical entropy as opposed to thermodynamic or stat-mech or information-theoretic entropy. I think very few people encounter the phrase "algorithmic entropy" enough for it to be salient to them, so most confusion about entropy in different domains is about statistical entropy in physics and info theory. (Maybe this is different for LW readers!)
This was reinforced by the introduction because I took the mentions of file compression and assigning binary strings to states to be about (Shannon-style) cod...
I like "description length".
One wrinkle is that entropy isn't quite minimum average description length -- in general it's a lower bound on average description length.
If you have a probability distribution that's (2/3, 1/3) over two things, but you assign fixed binary strings to each of the two, then you can't do better than 1 bit of average description length, but the entropy of the distribution is 0.92 bits.
Or if your distribution is roughly (.1135, .1135, .7729) over three things, then you can't do better than 1.23 bits, but the entropy is 1 bit.
You can ...
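The two numerical examples above can be checked with a short script. I'm using Huffman coding as the optimal way to assign one fixed binary string per outcome; that choice of algorithm and the helper names are mine.

```python
import heapq
import math

def entropy(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

def huffman_lengths(p):
    """Codeword length for each outcome under an optimal binary prefix code."""
    lengths = [0] * len(p)
    # Heap items: (probability, unique tie-breaker, outcomes in this subtree)
    heap = [(x, i, [i]) for i, x in enumerate(p)]
    heapq.heapify(heap)
    counter = len(p)
    while len(heap) > 1:
        pa, _, a = heapq.heappop(heap)
        pb, _, b = heapq.heappop(heap)
        for s in a + b:
            lengths[s] += 1  # each merge pushes these outcomes one level deeper
        heapq.heappush(heap, (pa + pb, counter, a + b))
        counter += 1
    return lengths

def average_length(p):
    return sum(x * n for x, n in zip(p, huffman_lengths(p)))

for p in ([2 / 3, 1 / 3], [0.1135, 0.1135, 0.7729]):
    print(p, round(average_length(p), 2), round(entropy(p), 2))
# average length 1.0 vs entropy 0.92, then 1.23 vs 1.0
```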
K-complexity is apparently sometimes called "algorithmic entropy" (but not just "entropy", I don't think?)
Wiktionary quotes Niels Henrik Gregersen:
Algorithmic entropy is closely related to statistically defined entropy, the statistical entropy of an ensemble being, for any concisely describable ensemble, very nearly equal to the ensemble average of the algorithmic entropy of its members
I think this might be the crux!
Note the weird type mismatch: "the statistical entropy of an ensemble [...] the ensemble average of the algorithmic entropy of its members".
Beef is far from the only meat or dairy food consumed by Americans.