AI #7: Free Agency

[-]simon3y50

There are essentially three kinds of proposals to do something about the fact that we’re all going to die.
A proposal that has no chance of working, like everything the AI labs do.
A proposal that is a small first step to future larger steps, but which almost certainly wouldn’t on its own work, that gets dismissed because it won’t work on its own and also because it might slow down progress and no one understands the dangers.
A proposal that might actually work if people tried it, then people say no one would ever go for that, you’re crazy and ruining our chances to do something more practical.

Thanks for providing concepts to help me express a bit of frustration I have with your takes. You keep on categorizing things that the AI labs are doing as kind 1, or even worse than kind 1, when I would categorize at least some of them as kind 2 (example).

In the case of Constitutional AI, I think you are selling it short as well:

This deserves a longer treatment, but my core reaction is that I expect this to be a good way to solve easy problems and to be a very bad way to try and solve the hard problems we should actually worry about. Humans provide a list of principles and rules, then the AI takes it from there based on its understanding of those principles and rules – so the AI is going to Goodhart on its own interpretation of what you wrote, based on its own model and reasoning.
When you are dealing with current-level LLMs and what you want are things like ‘don’t say bad words’ and ‘don’t tell people how to build bombs’ this could totally work and be a big time saver. When you have models smarter and more capable than humans, and the things you are trying to instill get more complex, this seems like a version of RLHF with additional points of inevitable lethal failure?

RLHF imo can't scale up properly because scaling the AI can't fix biases in the human ratings. As you scale up Constitutional AI on the other hand, it seems to me the AI should be able to handle more and more sophisticated constitutions, including ones based on predicting what humans would want if properly informed (correcting for Goodhart in a way you can't without this informed-human-predicting feature).

Maybe it can handle a sufficiently sophisticated constitution before it is powerful enough to foom, maybe not. Maybe LLMs are going to be upstaged by some other architecture that can't use constitutional AI, maybe not. Being able to cover one of those possibilities would still be better than the none of them that (imo) RLHF could handle.

[-]PeterMcCluskey3y20

I see one of the big advantages of a pause and training run limit (of any size and duration) being that it is a relatively blunt instrument that is relatively easy to evaluate.

It's far from obvious whether such a limit would slow capability growth much.

One plausible scenario is that it would mainly cause systems to be developed in a more modular way. That might make us a bit safer by pushing development more toward what Drexler recommends. Or it might fool most people into thinking there's a pause, while capabilities grow at 95% of the pace they would otherwise have grown at.

[-]mesaoptimizer3y21

Your text here is missing content found in the linked post. Specifically, the sentence "If one has to do this with" ends abruptly, unfinished.

[-]jbash3y20

Someone launched a truly minimum-viable-product attack, without doing any of their homework, and quickly got caught, showing us what is coming.

They didn't get caught; they got detected. They're still out there, free to iterate on the strategy until they get good at it. They incurred almost no cost with this initial probe.

Like other forms of spam and social engineering, this is not going to be difficult for people ‘on the ball’ to defend against any time soon, but we should worry about the vulnerable, especially the elderly, and ensure they are prepared.

I've gotten phishes that I wasn't sure about until I investigated them using tools and strategies not easily available to most "on the ball" people. And they weren't even spear phishes. You can fool almost anybody if you have a reasonable amount of information about them and tailor the attack to them.

And "immunity" is not without cost. If it gets to the point where a large class of legitimate messages have to be ignored because they can't be distinguished from false ones, that in itself does real damage.

Voices and faces used to be very convenient, easy, relatively reliable authentication tools, and it hurts to lose something like that. Also, voices and faces are kind of an emotional "root password". Humans may be hardwired to find it hard to ignore them. At the very least, even if they are ignored, it's going to be actually painful to do it.

I mean, I'm not saying it's the apocalypse, and there are plenty of ways to scam without AI, but this stuff is not good AT ALL.

[-]M. Y. Zuo3y10

I mean, I'm not saying it's the apocalypse, and there are plenty of ways to scam without AI, but this stuff is not good AT ALL.

It will however be a very strong impetus for establishing a verified identity phone system, which would also get rid of current human and simple machine generated spam calls.

So it does have some positive consequences.

[-]jbash3y20

I guess maybe. A system like that isn't easy to set up, and it's not like there aren't plenty of scams out there already to provide whatever incentives.

To have helped with the publicized incident, the verification would have had to be both mandatory and very strong, because the scammer was claiming to be calling from the kidnapper's phone, and could easily have made a totally credible claim that the victim's phone was unavailable. That means no anonymous phone calls, anywhere, ever. A system where it's impossible to communicate anonymously is very far from an unalloyed good, so it may or may not be a "positive consequence" at all on the whole.

Also, for the niche that voices were filling, anything that demands that you carry a device around with you is just plain not as good.

It's pretty rare to get so banged up that your face and voice are unrecognizable, especially if you can still communicate at all. Devices, on the other hand, get lost or broken quite a bit, including in cases where you might be trying to ask somebody you knew for money.

In the common "I got arrested" scam, the mark expects that the impersonated person's phone won't be available to them. The victim could of course notice that the person isn't calling from a police station, assuming the extra constraint that the identification system delivers an identifier that's unambiguously not a police station... but that just means the scammer switches to the equally common "I got mugged" or "car accident" scams. There are so many degrees of freedom that you can work around almost any technical measure.
Voices (used to) bind the content of a message directly to a person's vocal tract, and faces on video came pretty close to binding the message to the face. Device-based authentication relies on a much longer chain of steps, probably person to ID card/database photo to phone company records to crypto certificate to key to device. And, off on the side, the ID card database has to bind that face to information that can actually physically locate a scammer. Any of those steps can be subverted, and it's a LOT of work to secure all of them, especially because...
With no coordination at all, everybody on the planet automatically gets a face and a voice that's "compatible with the system", and directly available to important relying parties (namely the people who actually know you and who are likely to be scam victims).

Your device, on the other hand, may be certified by any number of different carriers, manufacturers, or governments, who have to cooperate in really complicated ways to get any kind of real verification. It takes forever and costs a lot to set up anything like that at the scale of a worldwide phone system.

It would be easier to set up intra-family "web of trust" device-based authentication... but of course that fails on the "mandatory" and "automatic" parts.

Device-based authentication can be stronger in many ways than vocal or visual authentication could ever be, and in some cases it's obviously superior, but I don't think it's a satisfying substitute. And most of its advantages tend to show up in much smaller communities/namespaces than the total worldwide phone system.

[-]Beckeck3y20

plus one for "stop worrying about what people will say in response so much, get the actual information out there, stop being afraid."

see also Anna Salamon's takes on 'not doing PR' that someone else might find and link?

[-]Razied3y10

I’ve thought a bunch about why Taleb doesn’t see this, why he worries about some things but not others, especially in the (relatively rare) cases where I think he gets it wrong. My model of this is that Taleb expects fat tails in the distributions to be more common than people expect and to dominate in importance, but in this framework, to be a fat tail you need to be on the distribution at all.

Oh you are being much too charitable to Taleb here. I don't think he spent 5 minutes thinking about the issue before confidently saying that everyone who disagrees with him is a moron bullshit-artist pseudo-intellectual. I don't know why you expect him to update on this in the next few years. Have you ever seen him change his mind in public?

[-]jaspax3y1-2

Eliezer... points out that in order to predict all the next word in all the text on the internet and all similar text, you need to be able to model the processes that are generating that text

I wanted to add this comment to the original post, but there were already dozens of other comments by the time I got to it and I figured the effort would have been wasted.

EY's original post is correct in its narrow claim, but wildly misleading in its implications. He's correct that to reliably predict the next word in a previously-unseen text is superhuman, and requires doing simulation and modeling that would be staggering in its implications. But insofar as that is the goal, how close is GPT to actually doing it? How well does GPT predict the next token in an unknown string in contexts where English syntax gives you many degrees of freedom?

Answer: it's terrible! Its failure rate approaches 100%! (Again, excluding contexts where syntactic or semantic constraints give you very few degrees of freedom.) It is not even starting to approximate attempting to actually implementing the kinds of simulation and modeling that success would imply. What it can do is produce text that matches the statistical distribution of human text, including non-local correlations (ie. semantics), and to a certain degree the statistical idiosyncracies of specific writers (ie. style), and it turns out that getting even that far is pretty impressive. It's also pretty impressive that you can treat "predict the next token" as the goal and get this much good out of it while still being bad at actually predicting the next token. But the training data that GPT has is enough to teach it something about syntax and semantics, but is not remotely close to the amount or kind of data that would be necessary to teach it to simulate the universe.

The EY article boils down to "if GPT-Omega were an omniscient god that knew everything you were going to say before you said it, would that be freaky or what". Yeah, bro, it would be freaky. But that has nothing to do with what GPT can actually do.

[This comment is no longer endorsed by its author]Reply

[-]RobertM3y20

This seems like an unusual misreading of Eliezer's post, which is quite explicitly about the potential bounds of future systems' performance, and not about the performance of the current system. There is no implication that the current system is superhuman (or even average-human) in the dimensions that you specified.

[-]M. Y. Zuo3y10

potential bounds of future systems' performance

They sound more like fantasy bounds than 'potential' simply because there isn't 1000x or 10000x more training data in existence for such a future system to train on. (Nor are there any likely pathways for this to occur, other than training on the outputs of prior models)

[-]jaspax3y10

I understood that. I guess I should have been more explicit about my belief that the amount of training data that would result in training a viable universal simulator would be "all of the text ever created", and then several orders of magnitude more.

[This comment is no longer endorsed by its author]Reply