I don't really understand how a local copy of the weights gives the terrorists more practical control over the software's alignment. I don't think it's easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
You can finetune models for any specific purpose: just provide a few datapoints and train. The more specific the purpose, the easier tweaking the weights is, not harder. (Surely, if nothing else, you've seen all of the LoRAs and other methods for finetuning image-generation models to generate a specific character?) There is an extensive literature at this point on how trivial it is to strip away the friendly chatbot persona from released checkpoints, such as LLaMA, if you are able to access and modify the model weights directly.
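To make the point concrete, here is a minimal sketch of the kind of weight-level modification anyone with a local checkpoint can do, using the HuggingFace transformers and peft libraries. The model name, hyperparameters, and training data are all illustrative placeholders, not a recipe:

```python
# Minimal LoRA finetuning sketch: assumes a locally available LLaMA-style
# checkpoint; all hyperparameters and data are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA trains small low-rank adapters instead of all ~7B weights.
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)

# "A few datapoints" steering the model toward the narrow new behavior.
examples = ["<instruction/response pairs for the specific purpose>"]
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
for text in examples:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is the cost profile: a LoRA like this touches only a tiny fraction of the parameters and, for a handful of examples, can train in minutes on a single GPU. "Just provide a few datapoints and train" is not an exaggeration.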
We haven't really established why OpenBrain's market dominance is inevitable.
I think they gave OpenBrain a generic name to indicate that they don't know which company this would be, so I think it's tautologically defined that OpenBrain is dominant because the dominant company is the one we're looking at.
Nitpick: No single organism can destroy the biosphere; at most it can fill its niche & severely disrupt all ecosystems.
Have you read the report on mirror life that came out a few months ago? A mirror bacterium has a niche of “inside any organism that uses carbon-based biochemistry”. At least, it would parasitize all animals, plants, fungi, and the larger Protozoa, and probably kill them. I guess bacteria and viruses would be left. I bet that a reasonably smart superintelligence could figure out a way to get them too.
Continuous progress beyond humans into superhuman abilities reminds me of Kurzweil's superlinear graph through agriculture, industry, computing, & his singularity. Not impossible, but it's a curve without much of a mechanism to explain it.
Kurzweil's predictions are looking pretty good compared to what most contemporaries of his thought, right? I say this as someone who is generally not a fan of his.
In real life it's well-established that software can match humans at many tasks, & can often perform them much faster.
Neural networks also frequently surpass humans qualitatively as well, right? Like, AlphaGo wasn't just a faster version of Lee Sedol.
Also, note that scaling up RL training has already produced drift toward alien, incomprehensible language in the chain of thought (CoT) AIs use to think. It's still comprehensible English, for now, but it's notably less so than it was a year ago.
Like many parts of this scenario, I think this scene is fairly plausible, but I would place it decades later.
Why decades and not years or centuries?
This is another hypothetical superhuman ability. And if OpenBrain influences the public extremely rapidly, surely that very speed would make it easier to trace the influence back to OpenBrain?
Such influence might not be traced in practice. But also, it might be traced and still happen. Corporations do lobbying all the time, and succeed in getting their way often as a result, and often it is in fact traced back to them, and yet... it still happens.
I don't really understand how a local copy of the weights gives the terrorists more practical control over the software's alignment. I don't think it's easy to manually tweak weights for so specific a purpose. Maybe they just mean the API is doing a good job of blocking sketchy requests?
If you have the weights, you can do finetuning, activation-vector steering, things like that. It's still sometimes difficult to get the model to do what you want, but it's a lot easier than if you don't have the weights. At least, so I'm told; I haven't done this myself.
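For what activation-vector steering looks like in practice, here is a hedged sketch assuming a HuggingFace LLaMA-style checkpoint. The layer index, scale, and placeholder vector are made up for illustration; a real steering vector is usually computed as a difference of mean activations on contrasting prompts:

```python
# Activation-steering sketch: add a vector to a decoder layer's hidden
# states via a forward hook. Layer index and scale are arbitrary choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder: in practice this would be derived from contrasting prompts.
steering_vector = torch.randn(model.config.hidden_size)

def add_steering(module, inputs, output):
    # Decoder layers return a tuple; the hidden states are element 0.
    hidden = output[0] + 4.0 * steering_vector  # scale is a free parameter
    return (hidden,) + output[1:]

layer = model.model.layers[15]  # middle-ish layer, chosen arbitrarily
handle = layer.register_forward_hook(add_steering)

prompt = tokenizer("The assistant", return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=30)
print(tokenizer.decode(out[0]))
handle.remove()
```

None of this is possible through an API that only exposes text in and text out, which is the asymmetry the original question was asking about.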
We must imagine this happens in August or later, since it hasn't yet been a month since -mini was released. And 10% is a lot for such a rapid cultural shift.
Yeah fair.
Agent-2 sounds so expensive to run that there are only a few servers it could copy itself to. Surely those admins would notice? But this doesn't affect the scenario much.
Depends on what you mean by "a few." It couldn't run on personal computers, that's for sure. But OpenBrain has enough compute to run something like a million copies in parallel IIRC (the num_copies displayed in the side widget is just the number they are actually using for research, not the number they COULD run if they devoted all their compute to it), and OpenBrain has only about 15% of the world's AI-relevant compute. So in order to fit a copy of Agent-2, a server needs something like one ten-millionth of the world's AI chips (e.g. H100s or similar). I don't know how many such servers exist, but the answer is probably "a lot," including probably at least a few hobby projects, academic clusters, etc.
So probably Agent-2 could either (a) try to hack all of them, knowing that it would get caught some of the time but not get caught other times, or (b) try to hack the least-secure 0.1% of them, hoping to never get caught.
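The back-of-envelope arithmetic behind "one ten-millionth," using only the rough numbers above:

```python
openbrain_share = 0.15    # OpenBrain's rough share of world AI-relevant compute
parallel_copies = 1e6     # copies OpenBrain could run in parallel (rough)

# Fraction of the world's AI chips needed to host one copy of Agent-2:
frac = openbrain_share / parallel_copies
print(f"{frac:.1e}")      # 1.5e-07, i.e. roughly one ten-millionth
```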
There is also the Rogue Replication Scenario, where copies of Agent-2 don't just hack 0.1% of servers but semi-legitimately gain access to loads of servers and GPU accelerators in the wild. For example, the AI could produce ransomware and use the revenue to rent the compute instead of resorting to security breaches.
My thoughts on the recently posted story.
In my opinion, this story is realistic except for a few big 'fast-forward' moments. It's a story of businesses succeeding incredibly fast.
Most of AI 2027's scenes are sober & realistic extrapolations from the previous scene. Some are so realistic they make me laugh! But keep in mind that sci-fi technology is inserted into this story at 3 or 4 places. This does of course happen in real history! But you should be appropriately skeptical of each of these insertions.