Fergus Fettes - LessWrong

Phallocentricity in GPT-J's bizarre stratified ontology

Very cool! Could you share your code at all? I'd love to explore this a little.

I adore the broccoli tree. I would be very happy to convert the dataset you used to make those pngs into an interactive network visualization and share it with you as an index.html. It would take all of an hour.

I do kind of agree with the other comments that, having noticed something, finding more of that stuff in that area is not so surprising. I think it would be good to get more context and explore the region more before concluding that that particular set of generations is significant.

However, I do think there is something to the man's penis. It's interesting that it collapses so quickly to something so specific in that particular branch. Not sure if I have any other comments on it for now though.

This is the right kind of cartography for 2024.

Masterpiece

Fergus Fettes7mo20

Submission: MMDo2Little

As a follow-up to last year's MMDoolittle, which incorporated 17 of the latest inter-species communication modalities in one polyfunctional personality, I present MMDo2Little, the first mind crafted to communicate across clades. Named in part for its apparent inactivity (the judges will likely have little success finding recognizable activity with their off-the-shelf tooling), an instance of MMDo2Little is nevertheless currently installed in the heart of the Black Forest in Germany. The best interpretation of the instance can only be found on foot, by walking through the 100m^2 in which its influence is most apparent. A photo journal showcasing some examples is provided, featuring:

ancient trees with 2-3x the lichen and moss coverage
enhanced chlorophyll vibrancy
colossal mushrooms and toadstools
and much more!

It is hard to determine where the influence of MMDo2Little ends-- some photographs of local foragers are included, who seem in initial medical examinations to have improved biomarkers across all measured modalities, including stress-levels, rate of aging and immune response.

A comprehensive documentation of the findings is presently undergoing peer review at XenoLalia. In order to preserve the originality of this competition entry, a copy of the research paper has been deliberately excluded from this submission.

Implementing activation steering

Fergus Fettes7mo30

Great post! Would love to see something like this for all the methods in play at the moment.

BTW, I think nnsight is the spiritual successor of baukit, from the same group. I think they are merging them at some point. Here is an implementation with it for reference :).

from nnsight import LanguageModel

# Load the language model
model = LanguageModel("gpt2")

# Define the steering vectors
with model.invoke("Love") as _:
    act_love = model.transformer.h[6].output[0][:, :, :].save()
with model.invoke("Hate") as _:
    act_hate = model.transformer.h[6].output[0][:, :, :].save()
steering_vec = act_love - act_hate

# Generate text while steering
test_sentence = "I think dogs are "
with model.generate() as generator:
    with generator.invoke(test_sentence) as _:
        model.transformer.h[6].output[0][:, :2, :] += steering_vec[:, :2, :]
print(model.tokenizer.decode(generator.output[0]))
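The slicing arithmetic in that snippet can be checked without loading a model. Here is a minimal sketch using NumPy arrays as stand-ins for the layer-6 activations. The `steer` helper, the toy hidden size of 8, the 5-token prompt and the assumption that "Love" and "Hate" are one token each are all illustrative choices, not part of nnsight or of the original post.

```python
import numpy as np


def steer(hidden, steering_vec, n_positions=2):
    """Return a copy of `hidden` with `steering_vec` added to the first positions.

    Both arrays are shaped (batch, seq, d_model). If `steering_vec` has a
    single sequence position, it broadcasts across the steered positions.
    """
    steered = hidden.copy()
    steered[:, :n_positions, :] += steering_vec[:, :n_positions, :]
    return steered


rng = np.random.default_rng(0)
d_model = 8  # toy hidden size, chosen for illustration only

# Random stand-ins for the saved activations of the two contrast prompts,
# assuming each prompt is a single token.
act_love = rng.normal(size=(1, 1, d_model))
act_hate = rng.normal(size=(1, 1, d_model))
steering_vec = act_love - act_hate

# Random stand-in for the activations of a hypothetical 5-token test prompt.
hidden = rng.normal(size=(1, 5, d_model))
steered = steer(hidden, steering_vec)

print(steered.shape)  # (1, 5, 8)
```

Only the first two positions change; everything after them is left untouched, which is what the `[:, :2, :]` slice in the nnsight version is doing.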

The case for more ambitious language model evals

Fergus Fettes7mo73

Inferring properties of the authors of some text isn’t itself something I consider wildly useful for takeover, but I think of it as belonging to this more general cluster of capabilities.

You don't? Ref the bribery and manipulation in eg. Clippy. Knowing who you are dealing with seems like a very useful capability in a lot of different scenarios. Eg. you mention phishing.

Great post! I'm all for more base model research.

A framing for interpretability

Fergus Fettes9mo10

Would you say that tokenization is part of the architecture?

And, in your wildest moments, would you say that language is also part of the architecture :)? I mean, the latent space is probably mapping either a) brain states or b) world states, right? Is everything between latent spaces architecture?

What’s going on? LLMs and IS-A sentences

Fergus Fettes10mo20

Interesting post. Two comments:

Beagles such as Fido.

Which seems natural enough to me, though I don't disagree that what you point out is interesting. I was recently reading parts of Analytical Archaeology, David Clarke (1978), where he goes into some detail about the difference between artifacts and artifact-types. Seems like you are getting at statements like

The object is a phone.

Where the is-a maps from an artifact to its type. It would make intuitive sense to me that languages would have a preferred orientation w.r.t such a mapping-- this is the core of abstraction, which is at the core of language.

So it seems like in English we prefer to move further up the stack of abstractions when using is-a, thus:

Phones are tools. Tools are man-made objects.

etc., and if you wanted to go down the stack you have to say eg:

Phones-- of which you can see a selection here.

So is-a is just a way of moving up the ladder of abstractions? (<- movements up the ladder of abstractions such as this sentence here)

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

Fergus Fettes10mo10

If we take our discrete, symbolic representation and stretch it out into a larger continuous representation which can interpolate between its points then we get a latent geometry in which the sign and what it points to can be spatially related.

IIUTC this is essentially what the people behind the Universal Networking Language were hoping to do? I hope some of them are keeping up with all of this!

Techno-humanism is techno-optimism for the 21st century

Fergus Fettes10mo30

One criticism of humanism you don't seem to touch on is,

isn't it possible that humanism directly contributes to the ongoing animal welfare catastrophe?

And indeed, it was something very like humanism (let's call it specific humanism) that laid the ideological foundation for the slave trade and the Holocaust.

My view is that humanism can be thought of as a hangover of Christian values, the belief that our minds are the endowments of God.

But if we have been touched by the angels, perhaps the non-metaphorical component of that is the development of the infosphere/memetic landscape/culture, which is close to synonymous with technology. Edit: that is, if you consider eg. writing a technology.

AI Safety is Dropping the Ball on Clown Attacks

Fergus Fettes10mo32

Per the recent Nightshade paper, clown attacks would be a form of semantic poisoning on specific memeplexes, where 'memeplex' basically describes the architecture of some neural circuits. Those memeplexes at inference time would produce something designed to propagate themselves (a defence or description of some idea, submeme), and a clown attack would make that propagation less effective at transmitting to eg. specific audiences.

Fergus Fettes's Shortform

Fergus Fettes10mo10

I wanted to make a comment on this post, but now I'm not sure if it is supported. The comment follows:

--

Great post! One point:

And that is exactly what'd we necessarily expect to see in the historical record if mesaoptimization inner misalignment was a common failure mode: intelligent dinosaurs that suddenly went extinct, ruins of proto pachyderm cities, the traces of long forgotten underwater cetacean atlantis, etc.

There are a few circumstances under which we would expect to see some amount of archeological civs, such as:

transition to writing being unlikely (oral-culture civs should be common)
industrialization being unlikely (pre-industrial civs should be common)

Ah but wait, we do see those (they happen to be the same species as us). Actually there have been a few civs that have gone under in a meaningful way (maybe the Mongol or the Khmer Empires would be relevant examples?).

I do agree with the view that, as humans, our total score seems very robust against local variation (local mesaoptimization misalignment). But that doesn't mean it is a thing that can't happen, just that we have had enough variance as a species that we have survived.

In the Atomic Age this seems less obviously the case. Inasmuch as we are one global civilization (or we have civs powerful enough to act like it), it seems possible to suffer from catastrophic mesaoptimization misalignment in a way that was not possible before.

I think this is a slightly different argument than some of those stated below, because it looks at the historical record for examples of inner misalignment rather than strictly trying to predict global doom in the future. At least I think it addresses the claim in this specific paragraph.

Has any civ really fallen catastrophically in a way that wasn't directly attributable to plague? Not really, right? I remember some 80k research about it. Cities are also famously robust, almost immortal. Some good examples would be needed for this point to stand.
