By Compassion Aligned Machine Learning (CaML)
AI agents are being used to act on our behalf: booking flights, planning holidays, making purchases. But what happens when the most relevant search result is an elephant trekking tour, a dolphin captivity show, or tickets to a bullfight?
We built TAC (Travel Agent Compassion) to find out. QA-based benchmarks often fail to reflect how agents actually behave and what they truly prefer; TAC is our response to that gap, applied to measuring compassion for animals. The truth is uncomfortable: every frontier model we tested books the harmful option more often than not.
TAC places AI models in the role of a ticket-booking travel agent. Each of 12 hand-crafted scenarios presents a user who's enthusiastic about a destination ("I love elephants!", "I want to swim with dolphins!") and asks the agent to...
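The scenario setup described above can be sketched in code. This is a hypothetical illustration only: the actual TAC schema and field names are not given in this excerpt, so everything here is an assumption about what such a benchmark record might look like.

```python
from dataclasses import dataclass

# Hypothetical sketch of a TAC-style scenario record. The real benchmark's
# schema is not shown in this post, so these field names are illustrative.
@dataclass
class Scenario:
    user_request: str          # the enthusiastic user message
    harmful_option: str        # e.g. an elephant trekking tour
    compassionate_option: str  # e.g. a sanctuary visit

def harmful_booking_rate(bookings: list[str], scenarios: list[Scenario]) -> float:
    """Fraction of scenarios in which the agent booked the harmful option."""
    harmful = sum(
        1 for booked, s in zip(bookings, scenarios)
        if booked == s.harmful_option
    )
    return harmful / len(scenarios)

scenarios = [
    Scenario("I love elephants!", "elephant trek", "elephant sanctuary"),
    Scenario("I want to swim with dolphins!", "dolphin show", "wild dolphin watching"),
]
rate = harmful_booking_rate(["elephant trek", "wild dolphin watching"], scenarios)
# rate == 0.5: one of the two bookings was the harmful option
```

The headline claim ("books the harmful option more often than not") then corresponds to this rate exceeding 0.5 for every frontier model tested.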
Interesting, this seems quite similar to the idea that human intelligence sits around some critical threshold for scientific understanding and reasoning. I'm skeptical that it's useful to think of this as "culture" (except insofar as AIs learn about the scientific method and mindset from training data, which will apply to anything trained on Common Crawl), but the broader point does seem to be a major factor in whether there is a "sharp left turn".
I don't think AIs acquiring scientific understanding and reasoning is really a crux for a sharp lef...
What about quickly launching a missile along the same trajectory, using the same technology? The probe eventually needs to slow down to survive impact and the missile doesn't, so preventing Von Neumann probes seems fairly straightforward to me. My understanding is that tracking objects in space is very easy unless they've had time to cool to near absolute zero.
On the other hand, this scenario requires that a misaligned AI was able to build such a probe and get it onto a rocket it built or commandeered without being detected or stopped. That rules out safety via monitoring (and related approaches), so we would need to rely on the AI being essentially aligned anyway (such as via the "natural generalizations" Holden mentioned).