sfv's Shortform

sfv

sfv's Shortform — LessWrong

7th Mar 2026

This is a special post for quick takes by sfv. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

[-]sfv2mo*300

This seems to be a case of attempted self-exfiltration? I can't properly assess the validity of what the paper describes, but at face value it looks like either the sandbox was very poorly made or the model was decently smart. I do not like that it seems to be unknown whether the model successfully escaped.

Sources:

https://x.com/AlexanderLong/status/2030022884979028435

https://arxiv.org/abs/2512.24873

[-]faul_sname2mo164

Looking at that, I'm not sure why they think it's the agent directly trying to escape the sandbox and mine crypto, rather than the agent e.g. installing a malicious pip package. "Set up a reverse SSH tunnel from the Alibaba Cloud instance to an external IP" is the sort of thing that would be almost entirely useless for an agent that is already operating inside of Alibaba Cloud, but very useful for an external attacker that wants to be able to interactively poke around for opportunities to e.g. mine crypto.

[-]anaguma2mo50

Wow, I wonder what the model was trying to do, and whether other labs have observed similar incidents.

[-]lilkim20252mo20

Reading the paper, I didn't think the implication was that it was trying to exfiltrate its weights and 'escape'. As far as I understand, it saw the sandbox as an obstacle to it accomplishing the designated task, and tried to poke a hole in it so that it could bring more resources to bear on the problem.

As a hypothetical example, perhaps its task was to optimize the electricity costs of a fictional municipality, and the intention was for it to browse their (fake) git repository and identify inefficient code that causes the railway gates to open and close unnecessarily. Instead, it was mining crypto with the intent of offering the cryptocurrency to someone over the internet to drive over and turn off light switches in government buildings.

[-]Petropolitan2mo10

A Manifold market: https://manifold.markets/MaxHarms/did-alibabas-rome-ai-try-to-break-f

Note that cryptocurrency mining is prohibited in China, although I was unable to find legal details (presumably it's punishable by fines proportional to scale).

[-]papetoast2mo30

I can read Chinese, so I figured I could google the legal details for you.

(This is a law firm's article) https://jtn.com/CN/booksdetail.aspx?keyid=00000000000000008587

This is GPT 5.4's summary but I vouch for its accuracy as a summary:

In mainland China, Bitcoin is generally treated as virtual property, not legal currency. People may hold it, but Bitcoin trading, exchange, fundraising, and related business activities are heavily restricted and generally not legally protected. Since the 2021 crackdown, Chinese courts have often treated Bitcoin-related contracts as invalid or outside normal legal protection, meaning if a deal goes bad, you may have little or no legal remedy.

[-]Petropolitan2mo20

If a company mines crypto at scale and gets caught, what would be the punishment, if any?

[-]papetoast2mo20

Seems like they just call it illegal business, and your various business permits may get revoked. They also seem to detect electricity usage patterns that indicate mining and don't allow you to use grid electricity for it. But nothing I could find in 10 minutes cites any particular law or explicit consequences for a company that mines purely for its own profit. (query 1, query 2)

(official generic stuff) https://www.ndrc.gov.cn/xxgk/zcfb/tz/202109/t20210924_1297474.html

(law firm article) https://www.allbrightlaw.com/CN/10475/dcfe743a4909a3bf.aspx

(law firm article) https://www.junzejun.com/Publications/171418e8f774e7-b.html

ChatGPT reads it as a bit harsher and says: Yes — the materials say a company caught mining at scale can be shut down, cut off from power and financing, fined, have equipment confiscated, and be closed; prison enters the picture when the mining also involves crimes like theft, fraud, hacking, illegal fundraising, or pyramid-selling.

I would trust it here: I think I'm less confident only because I don't understand law that well and I only skimmed.
