Tom Shlomi's Shortform

Tom Shlomi


1 min read, 21st Feb 2023, 2 comments

This is a special post for quick takes by Tom Shlomi. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.


Tom Shlomi (1y)

Talking about what a language model "knows" feels confused. There's a big distinction between what a language model can tell you if you ask it directly, what it can tell you if you ask it with some clever prompting, and what a smart alien could tell you when its only source of information is interacting with that model. A moderately smart alien that could interact with GPT-3 could correctly answer far more questions than GPT-3 itself can, no matter how cleverly it is prompted.

Tom Shlomi (1y)

The Constitutional AI paper, in a sense, shows that a smart alien with access to an RLHF-trained helpful language model can figure out how to write text according to a set of human-defined rules. It scares me a bit that this works well, and I worry that this sort of self-improvement is going to be a major source of capabilities progress going forward.
