The funniest GPT2 neuron I've found: I am presently exploring GPT2-XL layer 24 neurons on OpenAI's neuron viewer for an interpretability project. Last night I found a particularly funny neuron, 23:3024 (note layer indexing starts at 0).
It activates on phrases from cooking instructions like "whip the double cream, vanilla seeds and icing sugar together" and on phrases from erotic stories like "He wiped an abundance of Trent's still-warm sperm off." It is literally a food-porn neuron.
I believe the concepts are united primarily through the word "cream," as you may note from the high-activation phrases I just quoted. It is something like quantifiable evidence that thoughts about food and thoughts about sex are highly intertwined in a generalized hedonistic concept.
You need to click "show more" under the high activations to see the erotic-story segments; the topmost are mostly cooking related. Incidentally, in the low-dimensional embedding I made, there is another erotic-stories neuron nearby in the embedding space. The two are so similar that, if you look carefully, you may even notice a repeated high-activation phrase between them on the neuron viewer.
I am considering a longer write-up of my neuron-exploration project. I made a similarity embedding for the deepest layer of neurons, which I am "walking" through in order to aid interpretability. With the embedding, an investigator can survey the region of embedded space around a neuron and get clues about a difficult neuron from its easier-to-interpret neighbors.
These two erotic-stories neurons are a good example of how the similarity space can inform interpretation. 4120 is more fully about erotic stories, and the polysemanticity of 3024 might have thrown an investigator off (they could have decided the neuron was only about cooking, with some weird unexplained aspect) if the erotic content weren't corroborated by 4120 being nearby. Ultimately I am working on developing an intuition about directionality in the embedding; I want to know the shape of conceptual space.
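The mechanics of a similarity map like this are simple enough to sketch. Below is a minimal, hypothetical version using random numbers in place of real GPT2-XL layer-24 activation profiles, a plain PCA projection in place of whatever embedding method the project actually uses, and the neuron index 3024 from above as the lookup target:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_samples = 6400, 256  # GPT2-XL has 6400 MLP neurons per layer
# Placeholder data: each row stands in for one neuron's activations
# recorded over a text corpus.
profiles = rng.normal(size=(n_neurons, n_samples))

# Project every neuron's profile down to 2-D "map" coordinates via PCA
# (subtract the per-sample mean, then keep the top two right singular vectors).
centered = profiles - profiles.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T  # shape (6400, 2)

# To interpret a difficult neuron, inspect its nearest neighbors on the map.
target = 3024
dists = np.linalg.norm(coords - coords[target], axis=1)
neighbors = np.argsort(dists)[1:6]  # five closest, excluding the neuron itself
```

The payoff is the last step: neighbors that are easy to label ("erotic stories," "cooking") become interpretation hints for a polysemantic neuron like 3024.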
It is fairly grueling work. Any encouragement or signs of interest would be appreciated.
Reddit Mod is a "job" AI will likely replace.
There is currently a kerfuffle over the actions of a moderator of r/art, which seems a shining example of the petty and arbitrary cruelty Reddit mods are free to indulge in as unpaid volunteers. It made me think that Reddit mods are very similar to Dungeon Masters in Hasbro's Dungeons and Dragons game, though the reason may not be obvious.
Hasbro's DnD seems to be on a mission to alienate DMs in order to access players more directly, since Hasbro likely views players as its real customers. It sees DMs as unpaid volunteers who intermediate consumers' use of its product. In the past (under TSR) this was seen as a beneficial use of free labor, but under Hasbro's corporate ownership DMs came to be seen as extracting value from the player-intermediary role, value which could theoretically go to Hasbro's bottom line. Until recently, Hasbro had been pursuing an online strategy that would move the pen-and-paper RPG onto an online platform, which would likely eventually obviate the need for DMs. On an online platform directly controlled by Hasbro, not a single dollar of Hasbro's money would be lost to a DM's horny depiction of a female character, a DM's unique approach to personal hygiene, or the arbitrary and unfair cruelty of a power-tripping DM.
Outside of the online platform, Hasbro has also moved to shift power away from DMs and toward players. Mechanically, there are changes like the new death-save system, which makes a player-character death much more difficult than in past editions of the game. The caveats in old editions that the written rules are just suggestions and that the rulings of the DM are final have been removed. Recently there have been rumors of a move to disallow the DM screen, a folding barrier that allowed the DM to conceal things like maps of areas yet unseen by players or detailed character sheets for NPCs. All of this seems designed to diminish the role of the DM.
And a similar effort to control the experience of end users more directly will likely happen at Reddit. The latest controversy is over a mod who not only permanently banned a professional artist from r/art for an offhand comment that prints of the art he posted were available, but also erased his entire r/art post history simply for offering an apology and asking to be reinstated. This is pretty representative of the arbitrary nature of Reddit mod power, and it has likely been repeated many times in cases where the user lacked the large audience on another platform that this artist had.
Reddit will likely recognize that an AI-facilitated system could let them hire a handful of paid moderators committed to acting in a fair and professional manner, disintermediating the existing volunteer moderators. Reddit already makes up a large share of the training data for most LLMs, and Reddit itself already has a massive dataset of moderator actions that could be used for training. All it would likely take at this point is a major controversy, or perhaps a string of moderate controversies, to motivate them to implement a new system.
" Recently there has been rumors of a move to disallow the DM screen, a folding barrier which allowed the DM to conceal things like maps of areas yet unseen by players or detailed character sheets for NPCs. "
Interestingly, D&D co-creator Dave Arneson experimented with a DM screen so big he was invisible to the players, to see if that increased their immersion. He saw it as a tool for the players. He also didn't always give players their character sheets, for the same reason; he was always looking for ways to give the player a better experience. It is interesting that everything Hasbro is exploring, they see as having the opposite effect the creators intended.
An interesting overlap between your two positions is archived on the page listing the most-downvoted comments of all time.
https://www.reddit.com/r/ListOfComments/wiki/downvoted/
#3 is Roll20, the largest VTT (virtual tabletop) for playing games online. The short version of how it became the largest scandal in Reddit history, as ranked by downvotes: the owner of Roll20 was a mod of his own company's subreddit, in violation of Reddit policy, and banned people for criticizing his products. When someone who was actually trying to help him got banned, the owner made a post insulting him for asking for help. Reddit cleaned up the situation after it got this big. But the wider community never heard about it and didn't care. Nothing changed long term.
Based on this, I really don't think users care how Reddit is managed in any meaningful way. A small, loud minority maybe, but I give it >67% that the average user will not care about, or even learn of, AI taking the mod jobs. Reddit will just continue on like nothing happened.
Maybe, but it suffers from both ends of the legitimacy problem. At one extreme, some people will never accept a judgement from an LLM as legitimate. At the other extreme, people will perceive LLMs as being "more than impartial" when, in truth, they are a different kind of arbitrary.
Valentine's Day kills New Years Resolutions. Don't be a cliche.
While I saw a couple of articles this year talking about how January 10th is "quitting day," and it is true that many people quit their New Year's resolutions in January, I don't think these early quits are really representative of what we are talking about. Those people never really got started. According to some analysis of Reddit activity I did a few years ago, New Year's resolutions stop when people start making plans for Valentine's Day.
In particular, I showed that activity in fitness-related subreddits stays within a narrow band for most of the year, with two notable periods of elevated activity: July, and January through early February. Valentine's Day is the day activity drops from elevated back into the normal range.
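The band test described above can be sketched with toy numbers. Everything here is synthetic and illustrative: made-up daily post counts stand in for the real subreddit data, the "normal band" is defined as two standard deviations above a quiet spring baseline (my assumption, not necessarily the original analysis), and days are 0-indexed from January 1, so day 44 is February 14:

```python
import numpy as np

days = np.arange(365)
# Synthetic daily post counts: a flat baseline with mild wobble, plus a
# New Year's surge that lasts through about February 14 (day 44).
counts = 100 + 5 * np.sin(days)
counts[:45] += 60

# "Normal band": mean + 2 standard deviations over a quiet window (Mar-Jun).
base = counts[60:180]
upper = base.mean() + 2 * base.std()

# First day of the year on which activity falls back inside the band.
first_normal = int(np.argmax(counts <= upper))
```

With this toy data, `first_normal` lands on day 45, i.e. the day after Valentine's Day; the claim in the post is that the real fitness-subreddit data shows the same shape.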
And this makes a ton of intuitive sense. Valentine's Day is an emotional rollercoaster for many people, whether single or in a relationship. It is a huge distraction, probably the ideal regularly scheduled distraction (a massive public holiday centered on possibly the most distracting element of human existence). It seems pretty natural that many people get disappointed or distracted from slow progress on long-term goals when this holiday comes around.
One of the more interesting GPT2-XL neurons is this one for magic and fantasy. I'm exploring GPT2-XL layer-24 neurons using OpenAI's neuron viewer and a low-dimensional embedding as a "map" of the conceptual space. 23:2721 is an interesting neuron that seems to trigger on situations involving fictional supernatural powers, particularly with a fantasy flavor. "Lose yourself in a tale of magic and wonder" is a good example of a high-activation phrase; some phrases directly mention figures from historical myth, like "Grendel and his mom come busting in, forcing Beowulf to retaliate. Odysseus is just trying to get home."
The top activation phrase for 23:2721 seems to be a discussion of the troll in Ernest Scared Stupid, which I find pretty funny.
The embedding I am using helps inform interpretation via nearby neurons in the embedding space. 23:2721 is near 23:1714, which activates on less fantastical fictional properties like Lena Dunham's Girls or The Big Lebowski, and 23:6163, which activates on real historical events and figures like the Mayflower or Eisenhower.
GPT2's "devotion" neuron may be pertinent to safety issues. I'm exploring GPT2-XL in OpenAI's neuron viewer and 23:5243 is a neuron that basically captures the actions and hopes a person would have toward a loved one, likely a spouse. Activation on phrases like "stand up for," "look out for," "protect and enable," "give comfort to," and individual words like "strengthen," "validating," "reward," or "benefit" seem to capture the ideal relationship between two mutually respecting individuals.
In particular, the incorporation of the word "enable" alongside "protect" seems a hopeful indication that the machine has a human-like, holistic conception of this idea rather than the unbalanced over-protectiveness feared in some pessimistic scenarios. A neuron like this may also play some role in sycophancy; validating and strengthening the user is pretty precisely sycophantic behavior.
This is part of a project where I am using a low-dimensional embedding of GPT2-XL's 24th layer to assist in interpreting the neurons. The embedding aids interpretation of more difficult neurons, but this one was pretty straightforward. That said, the embedding does make me suspect this neuron is at least partially capturing romantic relationships, because it is near the erotic-story neurons I discussed in a quick take a couple of days ago.
I am trying to gauge interest in a longer frontpage post discussing larger findings and methodological details, so any encouragement or expression of interest would be appreciated.