can't we just fine-tune the language model by training it on statements like "If (authorized) humans want to turn me off, I should turn off"?

Why would that make it corrigible to being turned off? What does the word "should" in the training data have to do with the system's goals and actions? The AI does not want to do what it ought (where by "ought" I mean the thing the AI will learn the word means from human text). It won't be motivated by what it "should" do any more than by what it "shouldn't" do.

This is a fundamental flaw in this idea; it is not repairable by tweaking the prompt. The word "should" just has literally nothing whatsoever to do with what the AI is optimizing for (or even what it's optimized for). DALL-E doesn't make pictures because it "should" do that; it makes pictures because of where gradient descent took it.
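To make that concrete, here's a minimal sketch (hypothetical; using GPT-2 via Hugging Face transformers as a stand-in, not anything the original proposal specifies) of what "fine-tune it on such statements" cashes out to. The gradient signal is just next-token cross-entropy on the sentence, so "should" is merely another token whose probability gets pushed up in context, not a hook into whatever the system is optimizing for:

```python
# Hypothetical sketch: fine-tuning a causal LM on a corrigibility statement.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

statement = "If (authorized) humans want to turn me off, I should turn off."
batch = tokenizer(statement, return_tensors="pt")

model.train()
# The loss is ordinary next-token cross-entropy on the sentence: the update
# makes the model more likely to *say* this kind of thing in context.
# Nothing in this objective refers to the system's goals or plans.
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
```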

Like, best-case scenario, it repeats "I should turn off" as it kills us.

Sure it does. I was saying that the traditional pathway is pretty ridiculous and onerous. (And I was saying that to argue that MIRI's onerous application requirements are more like the traditional pathway and less like Scientology; I am objecting to the hyperbole in calling it the latter.) The response was that the traditional pathway is even more ridiculous and wasteful than I was giving it credit for. So yeah, I'd say that slightly strengthens my argument.

I don't think "no comment", or rather making undetailed but entirely true comments, is dishonest.

If I didn't miss anything and I'm understanding the scenario correctly, then for this part:

At some point, we reach the level of interpretability where we are convinced that the evolved AI system is already aligned with us before even being finetuned on specific tasks, 

I'd expect that interpretability tools, if they work, would tell you "yup, this AI is planning to kill you as soon as it possibly can", without giving you a way to fix that (at least not one that's robust to capability gains). I.e., this story still seems to rely on an unexplained step that goes "... and a miracle occurs where we fundamentally figure out how to align AI just in the nick of time".

... And? What point do you think I'm arguing?

The traditional way has its costs and benefits (an insanely wasteful and expensive path that opens up lots of opportunities), as does the MIRI way (a somewhat time-consuming path that opens up a single opportunity). It seems like there's room for improvement in both, but both are obviously much closer to each other than either one is to Scientology, and that was the absurd comparison I was arguing against in my original comment. And that comparison doesn't get any less absurd just because getting a computer science degree is a qualification for a lot of things.

Okay, edited. If anything, that strengthens my point.

Based on what’s been said in this thread, donating more money to MIRI has precisely zero impact on whether they achieve their goals

Well obviously, I disagree with this! As I said in my comment, I'm tentatively happy about, e.g., the Visible Thoughts project. I hope to see more experimentation in the future, eventually narrowing down to an actual plan.

Worst case scenario, giving them more money now would at least make them more able to "take advantage of a miracle" in the future (though obviously I'm really really hoping for more than that).

Wait, is the workshop 6 months? I assumed it was more like a week or two.

This is not how things work. You hire smart people, then you train them

Sometimes that is how things work. Sometimes you do train them first while not paying them, and then you hire them. And most 32-year-old software engineers had to go through a 4-8 year training/credentialing process that you have to pay years' worth of salary to attend. I don't see that as a good thing, and indeed the most successful places are famous for not doing that, but still.

To reiterate, I of course definitely agree that they should try using money more. But this is all roughly in the same universe of annoying hoop-jumping as typical jobs, and not roughly in the same universe as the Branch Davidians, and I object to that ridiculous hyperbole.

I'm taking the view that most people would think it's an onerous requirement and wouldn't be willing to jump through those hoops, not that it's a cult. It just doesn't tick the boxes of one, unless we're defining "cult" so widely as to include, I dunno, the typical "be a good fit for the workplace culture!" requirement that lots of jobs annoyingly have.

It's obviously much closer to "pay several hundred thousand dollars to be trained at an institution for 4-6 years (an institution that only considers you worthy if your essay about your personality, life goals, values, and how they combat racism is a good match to their mission), and then either have several years of experience or do an unpaid internship with us to have a good chance" than it is to the Peoples Temple. To say otherwise is, as I said, obviously ridiculous hyperbole.

Is either a recruitment ad for a cult,

Oh come on, don't be hyperbolic. The main things that make a cult a cult are absent. And I'm under the impression that plenty of places have a standard path for inexperienced people that involves an internship or whatever. And since AI alignment is an infant field, no one has the relevant experience on their resumes. (The OP mentions professional recruiters, but I would guess that the skill of recruiting high-quality programmers doesn't translate to recruiting high-quality alignment researchers.)

I do agree that, as an outsider, it seems like it should be much more possible to turn money into productive-researcher-hours, even if that requires recruiting people of Tao's caliber, and the fact that that's not happening is confusing & worrying to me. (Though I do feel bad for the MIRI people in this conversation; it's not entirely fair, since if somehow they in fact have good reason to believe that the set of people who can productively contribute is much tinier than we'd hope (e.g., Tao said no, and literally everyone else isn't good enough), they might have to refrain from explicitly explaining that, to avoid rudeness and bad PR.)

I'm going to keep giving MIRI my money because it seems like everyone else is on a more-doomed path, but as a donor I would prefer to see more visible experimentation (since they said their research agendas didn't pan out and they don't see a path to survival). E.g., I'm happy with the Visible Thoughts project. (My current guess (hope?) is that they are experimenting with some things that they can't talk about, which I'm 100% fine with; but it still seems like some worthwhile experimentation could be public.)
