gauche affirmation of allegiance to Microsoft

I was actually surprised that the new board ended up with members who might reasonably be expected, under the right circumstances, to resist something Microsoft wanted. I wouldn't have been surprised if it had ended up all Microsoft employees and obvious Microsoft proxies.

Probably that was a concession in return for the old board agreeing to the whole thing. But it's also convenient for Altman. It doesn't matter if he pledges allegiance. The question is what actual leverage Microsoft has over him should he choose to do something Microsoft doesn't like. This makes a nonzero difference in his favor.

Alternatively, the board could choose once again not to fire Altman, watch as Altman finished taking control of OpenAI and turned it into a personal empire, and hope this turns out well for the world.

I think it's pretty clear that Altman had already basically consolidated de facto control.

If you've arranged things so that 90+ percent of the staff will threaten to quit if you're thrown out against your will, and a major funding source will enable you to instantly rehire many or most of those people elsewhere, and you'll have access to almost every bit of the existing work, and you have massive influence with outside players the organization needs to work with, and your view of how the organization should run is the one more in line with those outside players' actual interests, and you have a big PR machine on standby, and you're better at this game than anybody else in the place, then the organization needs you more than you need it. You have the ability to destroy it if need be.

If it's also true that your swing board member is unwilling to destroy the organization[1], then you have control.

I read somewhere that like half the OpenAI staff, probably constituting the committed core of the "safety" faction, left in 2019-2020. That's probably when his control became effectively absolute. Maybe they could have undone that by expanding the board, but presumably he'd have fought expansion, too, if it had meant more directors who might go against him. Maybe there were some trickier moves they could have come up with, but at a minimum Altman was immune to direct confrontation.

The surprising thing is that the board members apparently didn't realize they'd lost control for a couple of years. I mean, I know they're not expert corporate power players (nor am I, for that matter), but that's a long time to stay ignorant of something like that.

In fact, if Altman's really everything he's cracked up to be, it's also surprising that he didn't realize Sutskever could be swayed to fire him. He could probably have prevented that just by getting him in a room and laying out what would happen if something like this were done. And since he's better at this than I am, he could also probably have found a way to prevent it without that kind of crude threat. It's actually a huge slip on his part that the whole thing broke out into public drama. A dangerous slip; he might have actually had to destroy OpenAI and rebuild elsewhere.

None of this should be taken to mean that I think that it's good that Altman has "won", by the way. I think OpenAI would be dangerous even with the other faction in control, and Altman's even more dangerous.

  1. The only reason Sutskever voted for the firing to begin with seems to be that he didn't realize that Altman could or would take OpenAI down with him (or, if you want to phrase it more charitably and politely, that Altman had overwhelmingly staffed it with people who shared his idea of how it should be run). ↩︎

I think that would help. I think the existing title primed me to expect something else, more in the line of it being impossible for an "aligned" program to exist because it couldn't figure out what to do.

Or perhaps the direct-statement style "Aligned status of software is undecidable" or something like that.

... but the inability to solve the halting problem doesn't imply that you can't construct a program that you can prove will or won't halt, only that there are programs for which you can't determine that by examination.

I originally wrote "You wouldn't try to build an 'aligned' agent by creating arbitrary programs at random and then checking to see if they happened to meet your definition of alignment"... but on reflection that's more or less what a lot of people do seem to be trying to do. I'm not sure a mere proof of impossibility is going to deter somebody like that, though.
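A minimal sketch of the halting-problem point: undecidability only rules out a general procedure for arbitrary programs. For a restricted class, such as this hypothetical toy language in which every loop has a literal bound, halting is provable by construction, no examination needed:

```python
# Undecidability of halting is about ARBITRARY programs. For restricted
# classes you can decide halting by construction. Toy example (made up
# for illustration): programs in a mini-language built only from bounded
# repeat-blocks always halt, provable by structural induction.

def run_bounded(program):
    """Interpret a toy program: a tuple that is either ('step',) or
    ('repeat', n, [children...]) where every bound n is a literal."""
    steps = 0
    def run(node):
        nonlocal steps
        if node[0] == 'step':
            steps += 1
        else:  # ('repeat', n, body)
            _, n, body = node
            for _ in range(n):
                for child in body:
                    run(child)
    run(program)
    return steps

# This specific program provably halts -- no general halting oracle needed.
prog = ('repeat', 3, [('step',), ('repeat', 2, [('step',)])])
print(run_bounded(prog))  # 3 * (1 + 2) = 9 steps
```

The checker for this class is trivial (everything halts); the theorem only says no such checker exists for all programs.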

The board has backed down after Altman rallied staff into a mass exodus.

How would that be bad if you were trying to shut it down? [On edit: how would the exodus be bad, not how would backing down be bad]

Especially because the people most likely to quit would be the ones driving the risky behavior?

The big problem would seem to be that they might (probably would/will) go off and recreate the danger elsewhere, but that's probably not avoidable anyway. If you don't act, they'll continue to do it under your roof. If you force them to go set up elsewhere, then at least you've slowed them down a bit.

And you might even be able to use the optics of the whole mess to improve the "you can do whatever you want as long as you're big enough" regulatory framework that seems to be taking shape, partly under OpenAI's own influence. Probably not, but at least you can cause policymakers to perceive chaos and dissent, and perhaps think twice about whether it's a good idea to give the chaotic organizations a lot of rope.

The stereotype of a good and upstanding person is incompatible with the stereotype of [dark matter], and rather than make a complicated and effortful update to a more nuanced stereotype, people often simply snap to “well, I guess they were Secretly Horrible All Along, all of my direct evidence to the contrary notwithstanding.”

Maybe people really do change their assessments of people they know well. But maybe they decide that they're not willing to take the punishment risk from appearing to defend (or even conceal) One Of Them. The best way to avoid that is to pretend to suddenly decide that this person is horrible. With or without applying motivated cognition to intentionally convince yourself of it.

I'm not even sure which one would be the majority.

On the main point, I don't think you can make those optimizations safely unless you really understand a huge amount of detail about what's going on. Just being able to scan brains doesn't give you any understanding, but at the same time it's probably a prerequisite to getting a complete understanding. So you have to do the two more or less serially.

You might need help from superhuman AGI to even figure it out, and you might even have to be superhuman AGI to understand the result. Even if you don't, it's going to take you a long time, and the tests you'll need to do if you want to "optimize stuff out" aren't exactly risk free.

Basically, the more you deviate from just emulating the synapses you've found[1], and the more simplifications you let yourself make, the less it's like an upload and the more it's like a biology-inspired nonhuman AGI.

Also, I'm not so sure I see a reason to believe that those multicellular gadgets actually exist, except in the same way that you can find little motifs and subsystems that emerge, and even repeat, in plain old neural networks. If there are a vast number of them and they're hard-coded, then you have to ask where they're encoded. Your whole genome is only what, 4GB? Most of it is used for other stuff. And it seems a lot easier, from a developmental point of view, to code for minor variations on "build these 1000 gross functional areas, and within them more or less just have every cell send out dendrites all over the place and learn which connections work" than for "put a this-machine here and a that-machine there within this functional area".
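A back-of-envelope version of the genome-budget argument (all figures are rough, commonly cited estimates, not precise measurements):

```python
# Rough estimate: the genome is far too small to hardcode individual
# synapses, let alone per-location "machines". All numbers are ballpark
# assumptions for illustration.

base_pairs = 3.1e9        # human genome, ~3.1 billion base pairs
bits_per_base = 2         # 4 nucleotides -> 2 bits each
genome_bits = base_pairs * bits_per_base
genome_bytes = genome_bits / 8

synapses = 1e14           # low-end estimate of synapse count

print(f"raw genome: ~{genome_bytes / 1e9:.2f} GB")
print(f"bits available per synapse if the whole genome "
      f"specified wiring: {genome_bits / synapses:.5f}")
```

Even spending the entire genome on wiring gives a tiny fraction of a bit per synapse, which is the usual argument that development must rely on compressed recipes like "grow areas, then learn connections".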

“Human brains have probably more than 1000 times as many synapses as current LLMs have weights.” → Can you elaborate? I thought the ratio was more like 100-200. (180-320T ÷ 1.7T)

I'm sorry; I was just plain off by a factor of 10 because apparently I can't do even approximate division right.
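For what it's worth, the arithmetic behind the corrected ratio is easy to check (the synapse and parameter counts are the rough estimates quoted above, not established figures):

```python
# Ratio of estimated human synapse count to an LLM's weight count.
# Both figures are the rough estimates from the thread, not measurements.
synapses_low, synapses_high = 180e12, 320e12   # ~180-320 trillion synapses
llm_weights = 1.7e12                           # ~1.7T parameters (assumed)

print(synapses_low / llm_weights)    # roughly 106
print(synapses_high / llm_weights)   # roughly 188
```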

Humans can get injuries where they can’t move around or feel almost any of their body, and they sure aren’t happy about it, but they are neither insane nor unable to communicate.

A fair point, with a few limitations. Not a lot of people are completely locked in with no high-bandwidth sensory experience, and I don't think anybody's quite sure what's going on with the people who are. Vision and/or hearing are already going to be pretty hard to provide. But maybe not as hard as I'm making them out to be, if you're willing to trace the connections all the way back to the sensory cells. Maybe you do just have to do the head. I am not gonna volunteer, though.

In the end, I'm still not buying that uploads have enough of a chance of being practical to run in a pre-FOOM timeframe to be worth spending time on, and I'm pretty pessimistic about anything produced by any number of uploaded-or-not "alignment researchers" actually having much of a real impact on outcomes anyway. And I'm still very worried about a bunch of issues about ethics and values of all concerned.

... and all of that's assuming you could get the enormous resources to even try any of it.

By the way, I would have responded to these sooner, but apparently my algorithm for detecting them has bugs...

  1. ... which may already be really hard to do correctly... ↩︎

I think that's likely correct. What I mean is that it's not running all the way to the end of a network, computing a loss function at the end of a well defined inference cycle, computing a bunch of derivatives, etc... and also not doing anything like any of that mid-cycle. If you're willing to accept a large class of feedback systems as "essentially back propagation", then it depends on what's in your class. And I surely don't know what it's actually doing.
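For concreteness, here's a minimal contrast between a purely local update rule and backprop's global error signal. It's a toy illustration of the distinction being drawn, not a claim about what brains actually do:

```python
# Contrast: a Hebbian-style update uses only activity local to one
# connection; backprop needs a loss computed at the output and a
# derivative carried back to the weight. Toy numbers throughout.

def hebbian_update(w, pre, post, lr=0.1):
    """Local rule: dw depends only on pre/post activity at this synapse."""
    return w + lr * pre * post

def backprop_update(w, pre, target, lr=0.1):
    """Global rule: for a single linear unit, post = w * pre and
    loss = (post - target)**2 / 2, so dL/dw = (post - target) * pre."""
    post = w * pre
    grad = (post - target) * pre
    return w - lr * grad

w = 0.5
print(hebbian_update(w, pre=1.0, post=2.0))    # 0.5 + 0.1*2.0 = 0.7
print(backprop_update(w, pre=1.0, target=2.0)) # 0.5 - 0.1*(-1.5) = 0.65
```

The point of the contrast: the Hebbian rule never needs to know what the network's output or loss was, which is one reason "is the brain doing backprop?" hinges so much on how broad a class of feedback systems you're willing to count.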

Any competent virologist could make a vaccine resistant, contagious, highly lethal to humans virus.

This is constantly repeated on here, and it's wrong.

Virologists can't do that. Not quickly, not confidently, and even less if they want it to be universally lethal.

Biology is messy, and strange, unexpected things happen. You don't find out about those things until you test, and sometimes you don't find out until you test at scale. You cannot predict them with computer simulations, at least not unless you have already converted the entire planet to computronium. You can't model everything that's going on with the virus in one host, let alone if you have to care about interactions with the rest of the world... which you do. And anything you do won't necessarily play out the same on repeated tries.

You can sometimes say "I expect that tweaking this amino acid will probably make the thing more infectious", and be right. You can't be sure you're right, nor know how much more infectious, unless you try it. And you can't make a whole suite of changes to get a whole suite of properties, all at the same time, with no intermediate steps.

You can throw in some manual tweaks, and also let it randomly mutate, and try to evolve it by hothouse methods... but that takes a lot of time and a significant number of hosts.

90 percent lethality is much harder than 50. 99 is much harder than 90.

The more of the population you wipe out, the less contact there is to spread your plague... which means that 100 percent is basically impossible. Not to mention that if it's really lethal, people tend to resort to drastic measures like shutting down all travel. If you want an animal vector or something to get around that sort of thing, you've added another very difficult constraint.

Vaccine resistance, and even natural immunity resistance, tend to depend on mutations. The virus isn't going to feel any obligation to evolve in ways that are convenient for you, and your preferred strains can get outcompeted. In fact, too much lethality is actually bad for a virus in terms of reproductive fitness... which is really the only metric that matters.
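One standard way to see the lethality-vs-spread tradeoff is the SIR model with disease-induced mortality, where the basic reproduction number is R0 = β/(γ+μ): the faster the virus kills its hosts, the shorter the infectious window and the fewer secondary infections. A toy sketch with made-up rates:

```python
# Standard SIR-with-mortality result: R0 = beta / (gamma + mu), where
# beta is the transmission rate, gamma the recovery rate, and mu the
# disease-induced death rate. All rate values below are made up.

def r0(beta, gamma, mu):
    """Basic reproduction number when death removes hosts at rate mu."""
    return beta / (gamma + mu)

beta, gamma = 0.5, 0.1
for mu in (0.0, 0.1, 0.4):
    print(f"mu={mu:.1f}  R0={r0(beta, gamma, mu):.2f}")
```

This only captures one mechanism (dead hosts don't transmit); it ignores the behavioral responses mentioned above, which cut spread even harder.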

PSA: The Anarchist Cookbook is notorious for having bogus and/or dangerous recipes. For lots of things, not just bombs. Apparently that was intentional.

US Army TM 31-210 is free on the Web with a Google search, though.
