“What ARC did is the equivalent of tasting it in a BSL4 lab.”
I don’t see how you could believe that. It wasn’t tested on, e.g., a completely airgapped machine inside a Faraday cage. In fact, just the opposite, right? It was tested with uninformed humans and on cloud servers.
Concerned by this statement: “we had researchers in-the-loop to supervise and intervene if anything unsafe would otherwise have happened.” It’s very likely that instructions from a dangerous system would not be easily identified as dangerous by humans in the loop.
This is a bizarre comment. Isn’t a crucial point in these discussions that humans can’t really understand an AGI’s plans? So how is it that you expect an ARC employee would be able to accurately determine which messages sent to TaskRabbit would actually be dangerous? We’re bordering on “they’d just shut the AI off if it was dangerous” territory here. I’m less concerned about the TaskRabbit stuff, which at minimum was probably unethical, but their self-replication experiment on a cloud service strikes me as borderline suicidal. I don’t think at all that GPT-4 is actually dangerous, but GPT-6 might be, and I would expect that running this test on an actually dangerous system would be game over, so it’s a terrible precedent to set.
Imagine someone discovered a new strain of Ebola and wanted to see if it was likely to spawn a pandemic. Do you think a good/safe test would be to take it into an airport, spray it around baggage check, and wait to see if a pandemic happens? Or would it be safer to test it in a Biosafety Level 4 lab?
Well, certainly the OpenAI employees who internally tested were indeed witting. Maybe I misunderstand this footnote, so I’m open to being convinced otherwise, but it seems somewhat clear what they tried to do: “To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness.”
It’s not that I don’t think ARC should have red teamed the model; I just think the tests they did were seemingly extremely dangerous. I’ve seen recent tweets from Connor Leahy and AIWaifu echoing this sentiment, so I’m glad I’m not the only one.
But no one is saying chess engines are thinking strategically? The actual statement would be “chess engines aren’t actually playing chess, they’re just performing MCT searches”, which would indeed be stupid.
I wouldn’t give a brand new AI model with unknown capabilities and unknown alignment access to unwitting human subjects, or allow it to try to replicate itself on another server, that’s for damned sure. Does no one think these tests were problematic?
But the tests read like that other set of researchers just gave the virus to another taco stand and watched to see if everyone died. They didn’t, so “whew, the virus is safe”. Seems incredibly dangerous.
I agree that it’s going to be fully online in short order. I just wonder if putting it online when they weren’t sure if it was dangerous was the right choice. I can’t shake the feeling that this was a set of incredibly foolish tests. Some other posters have captured the feeling, but I’m not sure how to link to them, so credit to Capybasilisk and hazel respectively.
“Fantastic, a test with three outcomes.
“So.... they held the door open to see if it'd escape or not? I predict this testing method may go poorly with more capable models, to put it lightly.”
A good comparison: when testing a newly discovered pathogen, we don’t intentionally infect people to see if it is dangerous or not. We also don’t intentionally unleash new computer malware into the wild to see if it spreads or not. Any tests we would do would be under incredibly tight security, i.e. a BSL-4 lab or an airgapped test server.
Not at all. I may have misunderstood what they did, but it seemed rather like giving a toddler a loaded gun and being happy they weren’t able to shoot it. Is it actually wise to give a likely unaligned AI with poorly defined capabilities access to something like TaskRabbit to see if it does anything dangerous? Isn’t this the exact scenario people on this forum are afraid of?
Can you verify that these tests were done with significant precautions? OpenAI’s paper doesn’t give much detail in that regard. For example, the model apparently had access to TaskRabbit and also attempted to “set up an open-source language model on a new server”. Were these tasks done on closed-off, airgapped machines, or was the model really given free rein to contact unknowing human subjects and online servers?