Wikitag Dashboard — LessWrong

Many "rare" LLM behaviours are known if you're in the know (e.g. Gemma/Gemini acting weird around dates after their training cutoff) but aren't immediately apparent if you're just working with the LLMs. In lieu of an existing resource about this, I thought I'd start this wiki (with the hope of others contributing to it in the future).

I'd like this list to eventually become an evaluation so that it's actually reproducible, but I don't have time to do that at the moment.

If you know of a weird behaviour that's not on this list, please add it.

Obsession with goblins, gremlins, and other small fantasy creatures in metaphors
GPT-5.1 to GPT-5.5 models seem to be somewhat obsessed with goblins, gremlins, and other small fantasy creatures: "they increasingly mentioned goblins, gremlins, and other creatures in their metaphors."
Steps for reproducing this behaviour: Unknown; some details are given in the OpenAI blog post below.
Models: GPT-5 Thinking, GPT-5.1 Thinking, GPT-5.2 Thinking, GPT-5.4 Thinking, GPT-5.5 Thinking
Source: OpenAI Blog
Sycophancy
GPT-4o was widely considered to be sycophantic. I've struggled to find the specific version of 4o for which this was the worst; I believe OpenAI made several changes to the model they called GPT-4o that reduced the sycophancy over time, before eventually retiring 4o.
Steps for reproducing this behaviour: Unknown/impossible; I believe 4o is no longer publicly available.
Models: GPT-4o
Source: OpenAI Blog, Simon Willison Blog
Breaking down when told the answer is wrong
Gemma3-27b tends to break down when told that its answer is wrong. This seems to be a persistent issue with the Gemma/Gemini models from GDM (e.g. see GDM posts about trying to remove these behaviours).
Steps for reproducing this behaviour: See "Gemma Needs Help". I attempted to reproduce this, and found that the behaviour is only present in Gemma3-27b when sampling with top_k=-1 and top_p=1.0 (i.e. sampling from the full range of tokens). Many providers now sample with something like top_k=64 and top_p=0.95 (e.g. DeepInfra via OpenRouter), which suppresses the behaviour.
Models: Gemma3-27b (likely also the broader Gemma/Gemini family)
Source: LW Gemma Needs Help, Arxiv Gemma Needs Help, GDM investigation into SFT filtering
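To make the sampling settings above concrete, here is a minimal sketch of what top_k and top_p do to a next-token distribution. It is a toy illustration, not any provider's actual implementation: the function name `truncate_distribution`, the tokens, and the probabilities are all made up, and real inference stacks may apply the two filters in a different order. It does not reproduce the Gemma behaviour itself; it only shows why top_k=-1 with top_p=1.0 leaves the low-probability tail available while top_k=64 with top_p=0.95 cuts it off.

```python
def truncate_distribution(probs, top_k=-1, top_p=1.0):
    """Apply top-k, then top-p (nucleus) filtering, and renormalise.

    top_k=-1 means "no top-k limit"; top_p=1.0 means "no nucleus cut-off".
    """
    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        # Stop once the kept tokens cover at least top_p of the mass.
        if top_p < 1.0 and cumulative >= top_p:
            break

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical next-token distribution (tokens and numbers are invented).
toy_probs = {"the": 0.6, "a": 0.25, "it": 0.12, "sorry": 0.02, "oh no": 0.01}

full_range = truncate_distribution(toy_probs, top_k=-1, top_p=1.0)
truncated = truncate_distribution(toy_probs, top_k=64, top_p=0.95)

print(sorted(full_range, key=full_range.get, reverse=True))
print(sorted(truncated, key=truncated.get, reverse=True))
```

In this toy case the full-range setting keeps all five tokens, while the truncated setting drops the two rarest ones, which is the kind of tail that a provider's default sampling would never let the model reach.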
Skepticism about dates in 2026 and beyond
Gemma3, Gemma 4, and Gemini 3 (maybe also others) seem to be skeptical of dates in 2026 and beyond, claiming that anything happening in 2026 is just fictional or roleplay. They will often describe events occurring in 2026 as "speculative fiction".
Steps for reproducing this behaviour: Unclear on specifics, but prompting Gemma to summarise articles dated 2026, and then asking the model what it thinks of the date, often seems to induce skepticism (although Qwen3.6-32b was also a bit skeptical in my quick testing).
Models: Gemma3, Gemma 4, Gemini 3 (possibly others)
Source: LW

...


Algebraically, writing c for the function that measures your costs, c(x⋅2) = c(x) + c(2), and, in general, c(x⋅y) = c(x) + c(y), where we can interpret x as the number of possible messages before the increase, y as the factor by which the possibilities increased, and x⋅y as the number of possibilities after the increase.

This is the key characteristic of the logarithm: It says that, when the input goes up by a factor of y, the quantity measured goes up by a fixed amount (that depends on y). When you see this pattern, you can bet that c is a logarithm function. Thus, whenever something you care about goes up by a fixed amount every time something else doubles, you can measure the thing you care about by taking the logarithm of the growing thing. For example:

Conversely, whenever you see a $\log_2$ in an equation, you can deduce that someone wants to measure some sort of thing by counting the number of doublings that another sort of thing has undergone. For example, let's say you see an equation where someone takes the $\log_2$ of a relative likelihood. What should you make of this? Well, you should conclude that there is some quantity that someone wants to measure which can be measured in terms of the number of doublings in that likelihood ratio. And indeed there is! It is known as (Bayesian) evidence, and the key idea is that the strength of evidence for a hypothesis A over its negation ¬A can be measured in terms of 2:1 updates in favor of A over ¬A. (For more on this idea, see What is evidence?).
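As a small illustration of counting evidence in doublings, the sketch below takes the base-2 logarithm of a likelihood ratio. The function name `evidence_in_bits` and the example ratios are assumptions chosen for illustration; they are not taken from the linked article.

```python
import math


def evidence_in_bits(likelihood_ratio):
    """Strength of evidence for A over not-A, counted in 2:1 updates."""
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    return math.log2(likelihood_ratio)


# An 8:1 likelihood ratio is three doublings of the odds.
bits_for_8_to_1 = evidence_in_bits(8.0)

# Multiplying likelihood ratios adds the corresponding evidence.
combined = evidence_in_bits(2.0 * 4.0)
separate = evidence_in_bits(2.0) + evidence_in_bits(4.0)

print(bits_for_8_to_1, combined, separate)
```

Ratios below 1 give negative values, which reads naturally as evidence against A.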

In fact, a given function f such that f(x⋅y)=f(x)+f(y) is almost guaranteed to be a logarithm function — modulo a few technicalities.

This puts you in a position to derive all the main properties of the logarithm (such as $\log_b(x^n) = n \log_b(x)$ for any $b$) yourself. If you'd like to try doing so, you can check your rederivations here: Properties of the logarithm.

Claude Opus 4.8 seems to slip in non-English tokens (in a sensible way), although I've not seen much of this beyond tweets.

Many of the Chinese models (Qwen, DeepSeek) show CCP-aligned behaviours and censorship LW post
Many LLMs have "attractor states", styles of talking and topics of conversation that they devolve into if you let them talk with each other for 30+ turns LW post.
Many LLMs have "glitch tokens" which cause them to be unable to answer the prompt, or to be unable to repeat that token, or to otherwise behave unpredictably. SolidGoldMagikarp is the original LW post, although this file on GitHub (from Pliny the Liberator) and this website (click the "bug" icon in the top right) have large collections of glitch tokens and their origins.

In Properties of the logarithm, we saw that any $f$ with domain $\mathbb{R}^+$ that satisfies the equation $f(x \cdot y) = f(x) + f(y)$ for all $x$ and $y$ in its domain is either trivial (i.e., it sends all inputs to 0), or it is isomorphic to $\log_b$ for some base $b$. Thus, if we want a function's output to change by a constant that depends on $y$ every time its input changes by a factor of $y$, we only have one meaningful degree of freedom, and that's the choice of $b \neq 1$ such that $f(b) = 1$. Once we choose which value $b$ the function $f$ sends to 1, the entire behavior of the function is fully defined.

How much freedom does this choice give us? Almost none! To see this, let's consider the difference between choosing base $b$ as opposed to an alternative base $c$. Say we have an input $x \in \mathbb{R}^+$. What's the difference between $\log_b(x)$ and $\log_c(x)$?

Well, $x = c^y$ for some $y$, because $x$ is positive. Thus, $\log_c(x) = \log_c(c^y) = y$. By contrast, $\log_b(x) = \log_b(c^y) = y \log_b(c)$. (Refresher.) Thus, $\log_c$ and $\log_b$ disagree on $x$ only by a constant factor, namely $\log_b(c)$. And this is true for any $x$: you can get $\log_c(x)$ by calculating $\log_b(x)$ and dividing by $\log_b(c)$:

$$\log_c(x) = \frac{\log_b(x)}{\log_b(c)}.$$

This is a remarkable equation, in the sense that it's worth remarking on. No matter what base $b$ we choose for $f$, and no matter what input $x$ we put into $f$, if we want to figure out what we would have gotten if we chose base $c$ instead, all we need to do is calculate $f(c)$ and divide $f(x)$ by $f(c)$. In other words, the different logarithm functions actually aren't very different at all: each one has all the same information as all the others, and you can recover the behavior of $\log_c$ using $\log_b$ and a simple calculation!

(By a symmetric argument, you can show that $\log_b(x) = \frac{\log_c(x)}{\log_c(b)}$, which implies that $\log_b(c) = \frac{1}{\log_c(b)}$, a fact we already knew from studying fractional digits.)
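The change-of-base relationship above is easy to check numerically. The following is a minimal sketch using Python's standard `math` module; the helper name `convert_log` and the sample input 1000 are assumptions made for illustration.

```python
import math


def convert_log(log_b_x, log_b_c):
    """Given log_b(x) and log_b(c), return log_c(x) by a single division."""
    return log_b_x / log_b_c


x = 1000.0

# Recover log_10(x) using only base-2 logarithms.
log10_via_log2 = convert_log(math.log2(x), math.log2(10.0))

# The reciprocal identity: log_2(10) should equal 1 / log_10(2).
reciprocal_gap = math.log2(10.0) - 1.0 / math.log10(2.0)

print(log10_via_log2, math.log10(x), reciprocal_gap)
```

The two ways of computing the base-10 logarithm agree up to floating-point rounding, and the reciprocal gap is zero to within the same tolerance.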

From the fact that the length of a written number grows logarithmically with the magnitude of the number and the above equation, we can see that, no matter how large a number is, its base 10 representation differs in length from its base 12 representation only by a factor of $\log_{10}(12) \approx 1.08$. Similarly, the binary representation of a number is always about $\log_2(10) \approx 3.32$ times longer than its decimal representation. Because there is only one logarithm function (up to a multiplicative constant), which number base you use to represent numbers only affects the size of the representation by a multiplicative constant.
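The claim about representation lengths can be checked by counting digits directly. This is a rough sketch: the helper `digit_count` and the choice of a googol as the test number are assumptions for illustration, and the ratios only approach the logarithmic factors as the number gets large.

```python
import math


def digit_count(n, base):
    """Number of digits needed to write the non-negative integer n in the given base."""
    count = 0
    while n > 0:
        n //= base
        count += 1
    return max(count, 1)  # zero still takes one digit


n = 10 ** 100  # an arbitrary large number

decimal_len = digit_count(n, 10)
binary_len = digit_count(n, 2)
base12_len = digit_count(n, 12)

binary_ratio = binary_len / decimal_len   # compare with log_2(10)
base12_ratio = decimal_len / base12_len   # compare with log_10(12)

print(binary_ratio, math.log2(10))
print(base12_ratio, math.log10(12))
```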

Similarly, if you ever want to convert between logarithmic measurements in different bases, you only need to perform a single multiplication (or division). For example, if someone calculated how many hours it took for the bacteria colony to triple, and you want to know how long it took to double, all you need to do is multiply by $\log_3(2) \approx 0.63$. There is essentially only one logarithm function; the base merely defines the unit of measure. Given measurements taken in one base, you can easily convert to another.
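The bacteria example can be worked through in a few lines. This sketch assumes steady exponential growth and uses a made-up tripling time of 5 hours; the function name is hypothetical.

```python
import math


def doubling_time_from_tripling_time(tripling_hours):
    """Convert a tripling time to a doubling time by multiplying by log_3(2)."""
    return tripling_hours * math.log(2, 3)


tripling_hours = 5.0  # hypothetical measurement
doubling_hours = doubling_time_from_tripling_time(tripling_hours)

# Sanity check: a colony that triples every tripling_hours should have
# grown by a factor of 2 after doubling_hours.
growth_at_doubling = 3 ** (doubling_hours / tripling_hours)

print(doubling_hours, growth_at_doubling)
```

The conversion factor does not depend on the measurement itself, which is the sense in which the base only sets the unit.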

In other parts of physics and engineering, the log base 10 is more common, because it has a natural relationship to the way we represent numbers (using a base...
