Gianluca Calcagni

How Business Solved (?) the Human Alignment Problem

When I looked at mesa-optimization for the first time, my mind immediately associated it with a familiar problem in business: human individuals may not be “mesa optimizers” in a strict sense, but they can act as optimizers and they are expected to do so when a manager delegates (=base optimization)...

Dec 31, 2024

I Recommend More Training Rationales

Some time ago I happened to read about the concept of a training rationale, described by Evan Hubinger, and I really liked it. In case you are not aware: training rationales are a set of questions that ML developers / ML teams should ask themselves in order to self-assess pros and cons...

Dec 31, 2024

Can AI Quantity beat AI Quality?

AI definitely poses an existential risk, in the sense that it can generate models with the hidden (possibly undetectable?) intention of competing against humanity for resources. The more intelligent the model, the higher its chance of success! The thought of an AI takeover is so scary that I won’t even...

Oct 2, 2024

An Opinionated Look at Inference Rules

If you ask around what the typical ways to infer information are, most people will answer: Deductions, Inductions, and Abductions. Of course, there are more ways than that, but there is no unified approach to their classification. I want to challenge that. The reason why I am unhappy with the...

Sep 3, 2024

All the Following are Distinct

In an artificial being, all the following:

* Consciousness
* Emotionality
* Intelligence
* Personality
* Creativity
* Volition

are distinct properties that, in theory, may be activated independently. Let me explain.

What I Hope to Achieve

By publishing this post, I hope to develop and standardise some useful terms...

Aug 2, 2024

Control Vectors as Dispositional Traits

I have recently been reading about a technique that can be used to partially control the behaviour of Large Language Models: it exploits control vectors[1] to alter the activation patterns of the LLMs and trigger some desired behaviour. While the technique does not provide guarantees, it gives high...

Jun 23, 2024
