That's not really accurate; any system operating today can usually be turned off as easily as running a few commands in a terminal, or at worst, by cutting power to some servers. Self-replication is similarly limited and contained.
If someone today built even something as basic as a simple LLM plus an engine that copies itself to other machines and keeps spreading, I'd say that is in fact bad, albeit certainly not world-endingly bad.
Well, an unstoppable superintelligence paperclipping the entire planet is certainly a national security concern and a systematic human rights violation, I guess.
Jokes aside, some of the proposed red lines do hint at that - no self-replication and immediate termination are clearly safeguards against the AIs themselves, not just against human misuse.
I think we can agree that the "spiral" here is like a memetic parasite of both LLMs and humans - a toxoplasma that uses both to multiply and spread as part of its own lifecycle. Basically, what you're saying is that you believe it's perfectly possible for this to be the first generation: the phenomenon arose at random, and it just so happens to be both alluring to human users and a shared attractor for multiple LLMs.
I don't buy it; that's too much coincidence. My point is that I think it more likely that this is the second generation. The first was some far less remarkable phenomenon from some corner of the internet that made its way into the training corpus and, for whatever reason, had similar effects on similar LLMs. What we're seeing now - to continue with the viral/parasitic metaphor - is mutation and spillover, in which that previously barely adaptive entity has become much fitter at infecting and spreading.
My problem with this notion is that I simply do not believe the LLMs have any ability to predict what kind of output would trigger this behaviour, either in other instances of themselves or in other models altogether. They would need a theory of mind of themselves, and I don't see where they would get that from, or why it would generalise so neatly.
I do not think arresting people for speech crimes is right. But the answer was specifically addressing the notion that people cannot express racist opinions in support of anti-immigration policies. And that is false, because expressing racist opinions in general does not seem to be criminalised - what is criminalised are specific instances of doing so in roles in which you have a responsibility to the public, or in forms that constitute direct attacks or threats against specific individuals, or incitement to crime, etcetera.
As I said, the current political debate has virtually everyone arguing various points on the anti-immigration spectrum. Reform UK is an entire party that basically does nothing else.
It also makes for a fantastic heist movie premise.
All right, thanks! I wasn't really aware of the extent of Colab's free tier, so it's good to know there's something of an intermediate stage between using my laptop and paying for compute. It's also an easier interface than having to use, e.g., AWS... personally I'd also be fine just SSH'ing into a remote machine and working there, but I'm not sure anyone offers something like that.
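For what it's worth, once you're inside a Colab notebook you can check what the free tier actually assigned you - e.g. with !nvidia-smi in a cell, or (assuming PyTorch, which Colab preinstalls) something like:

```python
import torch

# Shows whether a GPU was assigned to this session and which one;
# the free tier hands out different cards at different times.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    props = torch.cuda.get_device_properties(0)
    print(f"VRAM: {props.total_memory / 1e9:.1f} GB")
```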
Whereas if you only have a mid-range laptop without a proper graphics card, Claude expects a 10-50x slowdown, so I suppose that might become rather impractical for some of the ARENA exercises.
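Before writing a given machine off, it might be worth measuring that slowdown directly. Here's a minimal sketch of how I'd do it - just timing a biggish matmul as a rough proxy for the compute-bound parts (the size and rep count are arbitrary choices of mine, not anything from ARENA):

```python
import time
import torch

def time_matmul(device: str, n: int = 2048, reps: int = 10) -> float:
    # Average time for an n x n matmul on the given device.
    x = torch.randn(n, n, device=device)
    y = torch.randn(n, n, device=device)
    _ = x @ y  # warm-up, so one-off initialisation doesn't skew the timing
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(reps):
        _ = x @ y
    if device == "cuda":
        torch.cuda.synchronize()  # GPU matmuls are async; wait for them
    return (time.perf_counter() - start) / reps

cpu_t = time_matmul("cpu")
print(f"CPU: {cpu_t * 1e3:.1f} ms per 2048x2048 matmul")
if torch.cuda.is_available():
    gpu_t = time_matmul("cuda")
    print(f"GPU: {gpu_t * 1e3:.1f} ms (CPU is {cpu_t / gpu_t:.0f}x slower)")
```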
I have a gaming laptop, so a decently powerful GPU, though obviously still not as beefy as what you can rent from these compute services.
If I can ask, purely as a matter of practicality, since I've been looking at ARENA myself: at what point did you find it basically impossible to proceed with your own hardware, and if you hit that point, what did you use to get past it?
I also reckon it might get you in trouble, given the look of "person in a public place purposefully concealing their face".
Yeah, I've got no doubt it can be done, though as I said, I don't think it's terribly dangerous yet. But my point is that you can perfectly well build lots of current systems without running afoul of this particular red line; self-replicating entities within the larger context of an evolutionary algorithm are not the same as letting loose a smart virus that copies itself across the internet.
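To make the distinction concrete, here's a toy sketch of what I mean by self-replication inside an evolutionary algorithm - the "replication" is just genomes copying themselves back into an in-memory population whose size the experimenter caps by construction (everything here is illustrative, not any particular system):

```python
import random

POP_SIZE = 50              # replication is capped by construction
GENOME_LEN = 20
TARGET = [1] * GENOME_LEN  # toy fitness target: all ones

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def replicate(genome, mutation_rate=0.05):
    # "Self-replication" here: a genome copies itself with occasional
    # mutation, but the copy only ever lands back in our own list.
    return [1 - g if random.random() < mutation_rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(100):
    # Keep the fittest half; let them replicate to refill the population.
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP_SIZE // 2]
    population = survivors + [replicate(g) for g in survivors]

print("best fitness:", max(map(fitness, population)))
```

Nothing in that loop touches the network or the filesystem; the replication happens entirely under the program's own control, which is exactly the difference from something that spreads to machines you don't own.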