I'll know better how I want to judge it after I have more data points. I have a page of questions I plan to ask at some point.
With regards to this update specifically, recall both that I thought you thought it would fail the intersection points question when I offered the bet, and that I specifically asked for a reduced-variance version of the bet. Those should tell you something about my probabilities going into this.
BPEs are one of the simplest schemes for producing a large, roughly-fairly-weighted-by-frequency set of tokens that compresses arbitrary bytes drawn from a written-language training dataset. That's about all the explanation you typically need for why something is used in ML.
Linguistically-guided subword tokenization, the pre-LLM approach, has a history but is comparatively complex, and I don't think it compresses as well for a given token budget, even on fairly normal-looking text.
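For concreteness, here is a minimal sketch of the core BPE training loop, greedily merging the most frequent adjacent pair until a merge budget is hit; it is illustrative only, and real tokenizers add pre-tokenization, byte fallback, and other details on top.

```python
# Minimal illustrative BPE trainer: repeatedly merge the most frequent
# adjacent pair of symbols until the merge budget is spent.
from collections import Counter

def train_bpe(corpus: bytes, num_merges: int):
    seq = [bytes([b]) for b in corpus]  # start with one token per byte
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        # Rewrite the sequence with every occurrence of (a, b) merged.
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq

merges, tokens = train_bpe(b"low lower lowest", num_merges=5)
print(merges)  # learned merges, e.g. (b'l', b'o'), (b'lo', b'w'), ...
print(tokens)  # the corpus re-expressed in the learned tokens
```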
I don't particularly care that people are running GPT-3 code (except inasmuch as it makes ML more profitable), and don't think it helps if we lose focus on what the actual ground-truth concerns are. I want to encourage analysis that gets at deeper similarities than this.
GPT-3's code does not pose an existential risk, and even if it did, members of the public couldn't prevent that by declining to let it help run shell commands, because, if nothing else, GPT-3, ChatGPT, and Codex are all public. Beyond the fact that GPT-3 is specifically not risky in this regard, it'd be a shame if people primarily took away ‘don't run code from neural networks’ rather than something more sensible like ‘the more powerful models get, the more relevant their nth-order consequences become’. The model in the story used code output because it's an especially convenient tool lying around, but it didn't have to, because there are lots of ways text can influence the world. Code is just particularly quick, accessible, precise, and predictable.
Of similar difficulty to which question?
Either was fine. I didn't realize you expected GPT-4 would be able to solve the latter, which makes this less interesting to me, but I also intended not to fuss over the details.
But again, it's not the best test of geometric reasoning, so maybe we should bet on a different example of geometric reasoning.
If you are willing to generate a list of 4-10 other such questions of similar difficulty, I'm willing to take a bet wherein I get $X for each of those questions that GPT-4 gets right with probability > 0.5, and you get $X for each question it gets wrong with probability ≥ 0.5, where X ≤ 30.
(I don't actually endorse bets where you get money only in worlds where money is worth less in expectation, but I do endorse specific predictions and am willing to pay up here if I'm wrong.)
Great!
Typos are a legitimate tell that the hypothetical AI is allowed (and expected!) to fake in the full Turing test. The same holds, for the same reason, for most human behaviours expressible over typed text. If it's an edge, take it! So if you're willing to be a subject, I'd still rather start with that.
If you're only willing to judge, I can probably still run with that, though I think it would be harder to illustrate some of the points I want to illustrate, and I would want to ask you to do some prep: read Turing's paper, then think about some questions you'd ask that you think are novel and that you expect its author would have endorsed. (No prep is needed if you act as the subject.)
You're the second sign-on, so I'll get back to my first contact and try to figure out scheduling.
(Originally posted a few days ago in the previous thread.)
Would anyone like to help me do a simulation Turing test? I'll need two (convincingly-human) volunteers, and I'll be the judge, though I'm also happy to do or set up more where someone else is the judge if there is demand.
I often hear comments on the Turing test that do not, IMO, apply to an actual Turing test, and so I want an example of what a real Turing test would look like that I can point at. Also, it might be fun to try to figure out which of two humans is more convincingly not a robot.
Logs would be public. Most details (length, date, time, medium) will be improvised based on what works well for whoever signs on.
The model doesn't have awareness of itself in the sense that its training doesn't intrinsically reward self-knowledge. It can still have awareness of itself to the degree that its prompting implies true facts about the model and its instantiation in the world.
In particular, the model can receive a prompt something like
“This is part of the computation tree of a recursively instantiated transformer model with the goal of getting the most paperclips by tomorrow. The recorded instantiation context is [elided]. Recursive calls to the model are accessible through the scripts [elided], and an estimated cost model is [elided]. Given this context, what high level tasks best advance the goal?”
The model doesn't need to know or believe the prompts; it just gives competent completions that are contextually sensible. But producing contextually sensible completions implies modelling the decision processes of the described system to some degree, hypothetical or not, and that system, if it is producing competent outputs, might well be expected to create systems for coordinating its pieces.
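To make that concrete, here's a hedged sketch of the sort of scaffolding such a prompt implies; the `complete` function below is a stand-in rather than any real API, and every detail is invented for illustration.

```python
# Hypothetical sketch of a recursive task-decomposition loop around a text
# model. `complete` is a stub so the sketch runs without any model at all.
def complete(prompt: str) -> str:
    # Stand-in for a real text-completion call; always claims to be done.
    return "DONE: placeholder answer"

def run(goal: str, context: str, depth: int = 0, max_depth: int = 2) -> str:
    prompt = (
        "This is part of the computation tree of a recursively instantiated "
        f"transformer model with the goal: {goal}\n"
        f"Recorded instantiation context: {context}\n"
        "List the high-level subtasks that best advance the goal, one per "
        "line, or reply DONE followed by a final answer."
    )
    reply = complete(prompt)
    if depth >= max_depth or reply.strip().startswith("DONE"):
        return reply
    # Each subtask becomes another instantiation of the same model, with its
    # parent's output recorded as part of its context.
    return "\n".join(
        run(goal, f"{context} -> {line.strip()}", depth + 1, max_depth)
        for line in reply.splitlines() if line.strip()
    )

print(run("get the most paperclips by tomorrow", "top-level call"))
```

None of this requires the model to believe it is the system being described; competently continuing the prompt is enough for the loop to cohere.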
Ah, well, it seems to me that this is mostly people having been miscalibrated before GPT-3 hit them over the head about it (and, to a lesser extent, even after). Your updates should only balance out in expectation over possible observations. Even if you are immensely well calibrated, you should still expect, a priori, to make shortening updates around releases and lengthening updates around non-releases, since both worlds have nonzero probability.
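Concretely, with made-up numbers, the two possible updates pull in opposite directions but are weighted so that the expected posterior equals the prior:

```python
# Toy conservation-of-expected-evidence check; every number here is invented.
p_short = 0.3                  # prior on short timelines
p_release_if_short = 0.8       # chance of an impressive release if timelines are short
p_release_if_long = 0.4        # ...and if they are long

p_release = p_short * p_release_if_short + (1 - p_short) * p_release_if_long

# Posterior on short timelines after each possible observation (Bayes' rule).
post_release = p_short * p_release_if_short / p_release
post_no_release = p_short * (1 - p_release_if_short) / (1 - p_release)

print(post_release)     # ~0.46: shortening update on a release
print(post_no_release)  # ~0.125: lengthening update on a non-release
print(p_release * post_release + (1 - p_release) * post_no_release)  # ~0.3, the prior
```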
But if you'd appreciate a tale of over-expectations: my modal timeline gradually grew for a good while after this conversation with gwern (https://twitter.com/gwern/status/1319302204814217220), as I thought people were being slower about this than I had expected and meta-updated towards the gwern position.
Alas, recent activity has convinced me that my original model was right; it just had too-small constant factors for ‘how much longer does stuff take in reality than it feels like it should take?’ Most of my timeline-shortening updates since GPT-3 have been like this: “welp, I guess my modal models weren't wrong; there goes the tail probability I was hoping for.”
Another story would be my update toward alignment conservatism, mostly from updating on the importance of a few fundamental model properties, combined with some of the empirical evidence being non-pessimal. Pretraining has the powerful property that the model has no influence over its reward, which avoids a bunch of reward-hacking incentives; I didn't update on that properly until I thought it through, though I don't know of anyone doing anything clever with the insight yet. Alas, this is big on a log scale but small on an absolute one.
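As a hedged way of seeing the property I mean (a schematic toy, not anyone's actual training code): a pretraining-style score is a function of a fixed dataset the model's outputs never touch, whereas an interactive score is a function of state the agent itself produced, which is where the incentive to manipulate the scoring process creeps in.

```python
# Schematic toy contrast; everything here is invented for illustration.

FIXED_DATASET = [("2+2=", "4"), ("3+3=", "6")]

def pretraining_score(predict) -> int:
    # Nothing `predict` does can change which targets it is graded against,
    # so the only way to score well is to match the fixed data.
    return sum(predict(prompt) == target for prompt, target in FIXED_DATASET)

def interactive_score(act, evaluate) -> float:
    # The evaluator only sees state the agent chose to produce, so behaviour
    # that games `evaluate` scores just as well as doing the intended task.
    state = act()
    return evaluate(state)

print(pretraining_score(lambda prompt: "4"))           # matches 1 of 2 fixed targets
print(interactive_score(lambda: "looks great!", len))  # scored on agent-chosen state
```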