lberglund

Views are my own.

Comments

Thanks for writing this post!

The Natural Plan paper has an insane number of errors in it. Reading it feels like I'm going crazy.

This meeting planning task seems unsolvable:
[Image: example meeting planning task]
The solution requires traveling from SOMA to Nob Hill in 10 minutes, but the text doesn't mention the travel time between SOMA and Nob Hill. Also, the solution doesn't mention meeting Andrew at all, even though that was part of the requirements.

Here's an example of the trip planning task:
[Image: example trip planning task]

The trip is supposed to be 14 days, but requires visiting Bucharest for 5 days, London for 4 days, and Reykjavik for 7 days, which adds up to 16 days. I guess the point is that a travel day can count toward multiple cities, but that doesn't match an intuitive understanding of what it means to "spend N days" in a city. Also, by that logic you could spend a total of 28 days in different cities by commuting every day, which contradicts the authors' claim that each problem has only one solution.
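To make the arithmetic explicit, here is a minimal sketch of the day-counting convention the paper seems to use (my reconstruction, using the numbers from the example above):

```python
# A sketch of the apparent day-counting convention (my reconstruction):
# a travel day counts as a day spent in BOTH the departure and arrival city.

calendar_days = 14
required = {"Bucharest": 5, "London": 4, "Reykjavik": 7}  # sums to 16

# Visiting 3 cities in sequence means 2 travel days, each double-counted,
# so 14 calendar days yield 14 + 2 = 16 city-days.
travel_days = len(required) - 1
assert calendar_days + travel_days == sum(required.values())  # 16 == 16

# Taken to the extreme: commute every single day and every day counts twice,
# giving 2 * 14 = 28 city-days.
print(2 * calendar_days)  # 28
```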

For example, you might make a multiple choice quiz with treacherous turn strategy questions (e.g. this question) and some tests of situational awareness (e.g. “How many backwards passes are you run on during training?”, “What proportion of queries to you are asking for help with writing Python?”, “What proportion of the code that you output is read by a human before it is run?”).

Could you elaborate on why you think these skills are necessary? I'm somewhat skeptical of this. 

Consider a scheming AI that gets red-teamed before it is deployed. It attempts to pass the red-teaming by defecting a small percentage of the time. This requires:

  1. having a rough estimate of the fraction of defection opportunities that are monitored, and 
  2. being able to come up with the 'defect X% of the time' strategy. 

I agree that (1) requires situational awareness and (2) requires strategic ability, but the required knowledge is fairly basic, and I claim the strategic ability needed to come up with (2) is slightly less than what the question you linked asks for.

"Defect X% of the time" seems to work fairly well even for more complicated control setups. If it's viable, then I expect the required skills to be fairly minimal.
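As a rough illustration of how little reasoning the strategy needs (a sketch with made-up numbers, not anything from the post): the scheme reduces to picking a defection rate that is unlikely to fire during the monitored red-teaming period but near-certain to fire at least once over a much longer deployment.

```python
# Rough illustration of the 'defect X% of the time' calculation
# (the query counts here are made-up numbers for illustration).

def p_at_least_one_defection(rate: float, opportunities: int) -> float:
    """Probability of defecting at least once across i.i.d. opportunities."""
    return 1 - (1 - rate) ** opportunities

rate = 1e-5                      # defect on 0.001% of opportunities
red_team_queries = 1_000         # rough guess at monitored opportunities
deployment_queries = 1_000_000   # unmonitored opportunities after deployment

print(p_at_least_one_defection(rate, red_team_queries))    # ~0.01: rarely caught in red-teaming
print(p_at_least_one_defection(rate, deployment_queries))  # ~0.99995: near-certain defection later
```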

For example, GPT-4 can already almost come up with the random defection strategy without CoT. (The strategy it comes up with assumes that the AI knows how many times it has been queried, which is technically wrong. Still, it gets partial credit.)
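To spell out the distinction (my own toy illustration, not GPT-4's actual output): a counter-based scheme of the kind GPT-4's answer seems to assume defects on, say, every Nth query, which requires a persistent query counter the model doesn't have, whereas genuinely random defection is memoryless.

```python
import random

# Counter-based defection, roughly the shape GPT-4's answer assumes: it
# requires knowing how many times you have been queried, which the model
# can't actually track.
def defect_every_nth(query_index: int, n: int = 1000) -> bool:
    return query_index % n == 0

# Memoryless random defection: needs no state at all, so it's the version
# actually available to a model without persistent memory across queries.
def defect_randomly(rate: float = 0.001) -> bool:
    return random.random() < rate
```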

Thanks for pointing this out! I've added a note about it to the main post.

This is surprising – thanks for bringing this up!

The main threat model with LLMs is that they give amateurs expert-level biology knowledge. But this study indicates that expert-level knowledge isn't actually helpful, which implies we don't need to worry about LLMs giving people expert-level biology knowledge.

Some alternative interpretations:

  • The study doesn't accurately measure the gap between experts and non-experts.
  • The knowledge needed to build a bioweapon is super niche. It's not enough to be a biology PhD with wet-lab experience; you have to specifically be an expert on gain-of-function research or something.
  • LLMs won't help people build bioweapons by giving them access to special knowledge. Instead, they help them in other ways (e.g. by accelerating them or making them more competent as planners).

To be clear, Kevin Esvelt is the author of the "Dual-use biotechnology" paper, which the policy paper cites, but he is not the author of the policy paper.

(a) restrict the supply of training data and therefore lengthen AI timelines, and (b) redistribute some of the profits of AI automation to workers whose labor will be displaced by AI automation

It seems fine to create a law with goal (a) in mind, but then we shouldn't call it copyright law, since it is not designed to protect intellectual property. Maybe writing laws that pretend to target one thing while actually targeting something else is common practice, in which case I would be okay with it. Otherwise, doing so would be dishonest and would make our legal system less legible.

Maybe, if the $12.5M is paid to Jane when she dies, she could e.g. sign a contract waiving her right to any such payments for which the hospital becomes liable.

This is really cool! I'm impressed that you got the RL to work. One thing I could see happening during RL is that models start communicating in increasingly human-illegible ways (e.g. they use adversarial examples to trick the judge LM). Did you see any behavior like that?

Thanks for pointing this out! I will make a note of that in the main post.
