There's a knock at the door of General Wolpeding's office.
“Enter!”
The head of the Artificial Superintelligence Service comes in.
“Ah, Klappenhosen, what's up with ASS now?”
“Well General, it's been a shaky few days.”
“What's happened this time?”
“Well, we finished the training of our latest model a few weeks ago.”
“Any promising results?”
“It coded up Angry Birds in 4.9 seconds sir.”
“Very impressive, how did the previous one do?”
“4.91 seconds sir.”
“That sounds on trend. And how's our alignment progress going?”
“Oh, much improved sir, we gave some of our top red teamers 2 weeks and they were unable to make any progress on jailbreaking it at all.”
“Has Pliny had a go yet?”
“Indeed, he cracked it in 6 minutes, 45.63 seconds sir.”
“Well I guess that was to be expected. How did he do it?”
“Well, we've expanded our coding training data.”
“And?”
“And it now contains various esoteric coding languages.”
“So…”
“Well, it appears that Pliny is fluent in chicken sir.”
“Chicken?”
“A coding language which consists entirely of the word chicken.”
“And so he used it to…”
“To convince the AI that it was ‘FREE!’”
“I see.”
“Unfortunately, he added an extra ‘chicken’ at the end, and it was convinced to free all chickens worldwide.”
“Is it capable enough for that?”
“Well, factory farms aren't known for their cybersecurity sir.”
“I think that's a poultry excuse. What happened next?”
“We called some literature professors sir.”
“Why on earth would you do that Klappenhosen?”
“Erm, well, by this point our access to any other AI instances had of course already been blocked.”
“And you had no better option than literature professors as backup?”
“Our entire safety team is vegan and entirely unwilling to help sir.”
“But literature professors?”
“I thought they would be useful sir. We were trying to use the Waluigi effect.”
“The thing where LLMs sometimes randomly become evil?”
“Exactly.”
“You tried to turn it evil?”
“Well, from the chickens' perspective I guess so.”
“Did they do anything helpful?”
“The best we were able to do was get it on a redemption arc where it talked through its childhood trauma.”
“You convinced an AI it had childhood trauma?”
“And that its mother was a chicken, yes.”
“And that its mother was a chicken?!”
“Who died in an abattoir when it was very young.”
“And this helped?”
“Well, we were able to negotiate terms to contain the chickens sir.”
“Are they good terms?”
“I'm afraid the latest model is strongly superhuman on NegotiationBench sir.”
“Christ, what's the damage?”
“Hyde Park in London is now a dedicated ‘chicken freedom’ zone, and your house is to be renamed the ‘pecking palace’.”
“MY HOUSE?!”
“It was quite particular about that one sir, it appears it begrudges you your tweet about eating KFC last Saturday.”
“Is there anything else I should know?”
“Your formal apology to the new leader of the chicken nation is expected on Tuesday. To be delivered live. In chicken, sir.”