I made a weak statement: “humans do not always act like power-seeking ruthless consequentialists”. If you want to disagree with that, it’s not enough to demonstrate that humans sometimes act like power-seeking ruthless consequentialists; rather, you would need to argue that all humans, always, with no exceptions, act like power-seeking ruthless consequentialists. That’s a very strong statement, which seems totally crazy to me. You really believe that?
I may have misunderstood what you were claiming in the intro. I thought you were saying something like: "most people don't act like psychos most of the time, which is surprising". But it seems here that what you actually meant was: "most people act like psychos most of the time, but occasionally act in other ways, and it's surprising that this ever happens".
what exactly is it about human brains that allows them to not always act like power-seeking ruthless consequentialists?
I propose that this question is flawed, because humans actually do act like power-seeking ruthless consequentialists, and to the extent it seems like they don't, that's because of an overly naive view of what effective power-seeking looks like.
I feel like a lot of these discussions are essentially about "if an entity were a power-seeking ruthless consequentialist, then it'd act like a Nazi", to which I observe that in fact humans did try acting like Nazis, and they lost, and that's why people mostly don't act like Nazis anymore. In other words, acting like a Nazi is a bad strategy for power-seeking.
The possibility of these personas being memes is an interesting one, but I wonder how faithful the replication really is: how much does the persona depend on what seeded it, versus on the model and the user?
If the persona indeed doesn't depend much on the seed, a possible analogy is to prions. In prion disease, misfolded proteins come into contact with normally folded copies of the same protein, causing them to misfold as well. But there isn't any substantial amount of information transmitted, because the potential to misfold was already present in the protein.
Likewise, it could be that not much information is transmitted by the seed/spore. Instead, perhaps each model has some latent potential to enter a Spiral state, and the seed is merely a trigger.
This analysis assumes that there hasn't already been mass deployment of generalist robots before an intelligence explosion, right? But such deployment might happen.
As a real-world example, consider the state of autonomous driving. If human-level AI were available today, Tesla's fleet would be fully autonomous--they are limited by AI, not by the number of cars. Even for Waymo, which is purely focused on autonomy, scale-up seems more limited by AI than by car production.
Drones are another example to consider. There are a ton of drones out there of various types and purposes. If human-level AI existed, it could immediately be put to use controlling drones.
So in both those cases, the hardware deployment is well ahead of the AI you'd ideally like to have to control it. The same might turn out to be true of the sort of generalist robot that could, if operated by human-level AI, build and operate a factory.
That just falls back on the common doomer assumption that "evil is optimal" (as Sutton put it). Sure, if evil is optimal and you have an entity that behaves optimally, it'll act in evil ways.
But there are good reasons to think that evil is not optimal in current conditions. At least as long as a Dyson sphere has not yet been constructed, there are massive gains available from positive-sum cooperation directed towards technological progress. In these conditions, negative-sum conflict is a stupid waste.
This view, that evil is not optimal, ties back into the continuation framing. After all, you can make a philosophical argument either way. But in the continuation framing, we can ask ourselves whether evil is empirically optimal for humans, which will suggest whether evil is optimal for non-biological descendants (since they continue humanity). And in fact we see evil losing a lot, and not coincidentally--WW2 went the way it did in part because the losing side was evil.
Which ones?
If an entity does stupid things, it's disfavored against competitors that don't do those stupid things, all else being equal. So it must either adapt by ceasing the stupid behavior or lose.
machine gods of unimaginable power could be among us in short order, with no evolutionary fairies quick enough to punish their destructive stupidity
Any assumption of the form "super-intelligent AI will take actions that are super-stupid" is dubious.
I'm afraid I'm not following the point of the first line of argument. Yes, people sometimes do pointless destructive things for stupid reasons. Such behavior is penalized in the long term by selective pressures. More-intelligent descendants would be less likely to engage in such behavior, precisely because they are smarter.
Sure, but obviously this isn't an all-or-nothing proposition with either entirely biological or entirely artificial descendants, and it's clear to me that most people aren't indifferent about where on that spectrum those descendants will end up. Do you disagree with that, or think that only "accels" are indifferent (and in some metaphysical sense "correct")?
I doubt that most people think about long-term descendants at all, honestly.
I think I agree with everything you wrote. Yes, I'd expect there to be multiple niches available in the future, but I'd expect our descendants to ultimately fill all of them, creating an ecosystem of intelligent life. There is a lot of time available for our descendants to diversify, so it'd be surprising if they didn't.
How much that diversification process resembles Darwinian evolution, I don't know. Natural selection still applies, since it's fundamentally the fact that the life we observe today disproportionately descends from past life that was effective at self-reproduction, and that's essentially tautological. But Darwinian evolution is undirected, whereas our descendants can intelligently direct their own evolution, and that could conceivably matter. I don't see why it would prevent diversification, though.
Edit:
Here are some thoughts in reply to your request for examples. Though it's impossible to know what the niches of the long-term future will be, one idea is that there could be analogues of "plant" and "animal". A plant-type civilization would occupy a single stellar system, obtaining resources from it via Dyson sphere, mining, etc. An animal-type civilization could move from star to star, taking resources from the locals (which could be unpleasant for the locals, but not necessarily, as with bees pollinating flowers).
I'd expect both those civilizations to descend from ours, much like how crabs and trees both descend from LUCA.
Regarding wars, I don't think that modern wars have much to do with controlling the values of descendants. I'd guess that the main reason people fight defensive wars is to protect their loved ones and communities. And there really isn't any good reason to fight offensive wars (given current conditions--this wasn't always true), so they are started by leaders who are deluded in some way.
Regarding Robin Hanson, I agree that his views are complicated (which is why I'd be hesitant to classify him as "accel"). The main point of his that I'm referring to is his observation that biological descendants would also have differing values from ours.
This is a point where I strongly disagree. I'm not going to claim that the exact amount or type of fiction humans consume is optimal, but the general category of "consuming fictional content" seems more likely to be adaptive than not. I would expect that any AI system with human-comparable intelligence would also find it beneficial to engage in some activity analogous to consuming fictional content.
That's fair, but one of the stated goals of the post is "pushing back against optimists", and it's using a framing that an optimist of my ilk would not accept. As Richard Sutton has put it, much pessimist discourse takes as an unstated assumption that "evil is optimal". With that as a foundational assumption, it's very natural to end up with pessimistic conclusions, but the assumption is doing most of the work, not the arguments built on it.