Comments

gwern (51)

Yes, I've never had any difficulty replicating the gwern identification: https://chatgpt.com/share/0638f916-2f75-4d15-8f85-7439b373c23c It also does Scott Alexander: https://chatgpt.com/share/298685e4-d680-43f9-81cb-b67de5305d53 https://chatgpt.com/share/91f6c5b8-a0a4-498c-a57b-8b2780bc1340 (Examples from sinity just today, but they parallel all of the past ones I've done: sometimes it'll balk a little at making a guess or identifying someone, but usually it's not hard to overcome.)

One interesting thing is that the extensive reasoning it gives may not be faithful. Notice that in identifying Scott Alexander's recent Reddit comment, it gets his username wrong - that username does not exist at all. (I initially speculated that it was using retrieval, since OA & Reddit have struck a deal; but obviously, if it had, or had been trained on the actual comment, it would at least get the username right.) And in my popups comment, I see nothing that points to LessWrong, but since I was lazy and didn't copyedit that comment, it is much more idiosyncratic than usual; so what I think ChatGPT-4o does there is immediately deduce that it's me from the writing style & content, infer that it could not be a tweet (due to length) or a Gwern.net quote (because it is clearly a comment on social media responding to someone), and then guess it's LW rather than HN, and presto.

gwern (20)
  1. Age is extremely compressed/skewed because it's OKCupid. So I can think of a couple issues there: there might be a problem of distribution mismatch, where a GPT is trained on a much more even distribution of text (I would assume tons of text is written by people aged 50-100 IRL, rather than by young techies on a dating website) and so is simply taking into account a very different base rate; another issue is that maybe the GPT is accurate but restriction of range creates misleading statistical artifacts. Binarization wouldn't help, and might worsen matters - how many people tweak their age to avoid the dreaded leading '3' and turning into Christmas cake? (A more continuous loss like median absolute error might be a better metric than Brier on a binary or categorical.)

    As far as sexuality goes, this is something the LLMs may be trained very heavily on, with unpredictable effects. But it's also a much weirder category here too:

    Dating sites in general have more males than females, reflecting the mating behavior seen offline (more males being on the lookout). OKCupid features a very broad selection of possible genders. One must choose at least one category and up to 5 categories, of which the possible options are: Man, Woman, Agender, Androgynous, Bigender, Cis Man, Cis Woman, Genderfluid, Genderqueer, Gender Nonconforming, Hijra, Intersex, Non-binary, Other, Pangender, Transfeminine, Transgender, Transmasculine, Transsexual, Trans Man, Trans Women and Two Spirit. Nevertheless, almost everybody chooses one of the first two (39.1 % Women, 60.6 % Men, binary total = 99.7 %)^5. (The full count by type can be found in the supplementary materials sheet "Genders".)

    I'm not sure how OP handled that. So the predictive power here should be considered as a loose lower bound, given all the potential sources of measurement error/noise.
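The metric point above can be made concrete with a toy simulation. This is a sketch under made-up numbers, not OP's data: ages are drawn from a compressed OKCupid-like range, predictions are noisy but informative, and we compare a continuous loss (median absolute error) against a Brier score on a binarized "age ≥ 30?" label (the logistic squash used to turn a point prediction into a probability is my own arbitrary choice):

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical data: ages compressed into a narrow young range, as on a
# dating site, with predictions that are noisy but track the truth.
true_ages = [random.gauss(28, 4) for _ in range(1000)]
predictions = [a + random.gauss(0, 3) for a in true_ages]

# Continuous metric: median absolute error in years.
mae = statistics.median(abs(p - a) for p, a in zip(predictions, true_ages))

def prob_over_30(pred):
    # crude logistic squash of a point prediction into P(age >= 30);
    # purely illustrative, not a calibrated model
    return 1 / (1 + math.exp(-(pred - 30)))

# Binarized metric: Brier score on the "age >= 30?" cutoff.
brier = statistics.mean(
    (prob_over_30(p) - (1 if a >= 30 else 0)) ** 2
    for p, a in zip(predictions, true_ages)
)

print(f"median abs error: {mae:.2f} years")
print(f"Brier (age>=30):  {brier:.3f}")
```

The continuous loss stays interpretable ("off by ~2 years") regardless of where the ages cluster, while the Brier score's meaning depends heavily on where the cutoff falls relative to the compressed distribution.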

gwern (20)

I didn't know what it meant either.

gwern (289)

It seems like it was a big commitment, because there were several hints during the OpenAI coup reporting that Superalignment was not getting its promised compute quota as OA ran very short on compute in 2023, creating major internal stress (particularly from Sam Altman telling different people different things, or assigning the same job to several of them), and that this was one of the reasons for Altman sidelining Ilya Sutskever in favor of Jakub Pachocki. What sounded good, and which everyone initially loved, turned out to be painful to actually honor. (Sort of like designing the OA LLC so the OA nonprofit board could fire the CEO.)

EDIT: speak of the devil: https://x.com/janleike/status/1791498178346549382 Note Leike has to be very cautious in his wording.

gwern (73)

> I'm not sure how to square those results with the Chinchilla paper though

Apples and oranges. The Chinchilla paper simply optimizes the final trained model's loss given a fixed compute budget. It doesn't say anything about any downstream uses - similar to how it doesn't tell you (directly) how you should allocate your compute if you have X GPUs and you want to serve a model to your users for Y requests, where you face a tradeoff between spending more compute at training time to create a smaller model which then needs fewer GPUs to serve those Y requests, and training a compute-optimal but larger model which is cheaper to train yet more expensive to serve. Likewise, you've probably seen some "overtraining" analyses which argue that you should overtrain a Chinchilla by some large amount Z to get the model which best balances train vs run - but those also answer a different question, because they assume you will deploy that Chinchilla model without any sparsification or lower precision, even though that's hardly what anyone actually does.
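The train-vs-serve tradeoff can be sketched numerically. This uses the Hoffmann et al. (2022) parametric loss L(N, D) = E + A/N^a + B/D^b with the paper's fitted constants; the training budget, the "D ≈ 20N" rule of thumb, and the lifetime inference demand are all hypothetical round numbers of my choosing:

```python
# Chinchilla parametric loss fit (Hoffmann et al. 2022, Approach 3).
E, A, B, a, b = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    # N = parameters, D = training tokens
    return E + A / N**a + B / D**b

def train_flops(N, D):
    return 6 * N * D          # standard C ~= 6ND approximation

def infer_flops(N, tokens):
    return 2 * N * tokens     # ~2 FLOPs per parameter per generated token

# "Chinchilla-optimal" for a 1e21-FLOP training budget, using the popular
# rule of thumb D ~= 20N (an approximation, not the exact optimum).
N_opt = (1e21 / (6 * 20)) ** 0.5
D_opt = 20 * N_opt

# An "overtrained" alternative: half the parameters, more tokens,
# same training compute.
N_small = N_opt / 2
D_small = 1e21 / (6 * N_small)

serve_tokens = 1e12            # hypothetical lifetime inference demand
for name, N, D in [("rule-of-thumb", N_opt, D_opt),
                   ("overtrained", N_small, D_small)]:
    total = train_flops(N, D) + infer_flops(N, serve_tokens)
    print(f"{name}: loss={loss(N, D):.3f}  total FLOPs={total:.3g}")
```

With a large enough deployment, the smaller overtrained model wins on total (train + serve) compute at nearly the same loss, which is exactly the question the Chinchilla optimization itself never asks.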

(While I know of no one who has done a Li et al-style analysis for MoEs, I would expect the results to be fairly similar, just shifted up/down, because you can often think of a MoE as a bunch of smaller dense models.)

gwern (31)

Yup. Who knows but we are all part of a giant leave-one-out cross-validation computing counterfactual credit assignment on human history? Schmidhuber-em will be crushed by the results.

gwern (5323)

But this is also right after GPT-4o, which, like Sora not that long ago, is a major triumph of the Sutskeverian vision of just scaling up sequence prediction for everything, and which OA has been researching for years (at least since CLIP/DALL-E 1, and possibly this effort for 2-3 years as 'Gobi'). I don't find it so hard to believe that he's held off until Sora & GPT-4o were out. These are the achievements of not just his lifetime, but hundreds of other peoples' lives (look at the contributor list). He's not going to quit anywhere before it. (Especially since by all accounts he's been gone the entire time, so what's a few more days or weeks of silence?)

Is there a particular reason to think that he would have had an exactly 6-month notice from the vote to remove Altman? And why would he have submitted notice then, exactly? The logical day to submit your quitting notice would be when the investigation report was submitted and was a complete Altman victory, which was not 6 months ago.

gwern (95)

One of the interesting things about AI minds (such as LLMs) is that in theory, you can turn many topics into testable science while avoiding the 'problem of old evidence', because you can now construct artificial minds and mold them like putty. They know what you want them to know, and so you can see what they would predict in the absence of knowledge, or you can install in them false beliefs to test out counterfactual intellectual histories, or you can expose them to real evidence in different orders to measure biases or path dependency in reasoning.

With humans, you can't do that because they are so uncontrolled: even if someone says they didn't know about a crucial piece of evidence X, there is no way for them to prove that, and they may be honestly mistaken, having already read about X and forgotten it (but humans never really forget, so X has already changed their "priors", leading to double-counting), or there is leakage. And you can't get people to really believe things at the drop of a hat, so you can't make people imagine, "suppose Napoleon had won Waterloo, how do you predict history would have changed?" because no matter how you try to participate in the spirit of the exercise, you always know that Napoleon lost and you have various opinions on that contaminating your retrodictions; and even if you have never read a single book or paper on Napoleon, you are still contaminated by expressions like "his Waterloo" ('Hm, the general in this imaginary story is going to fight at someplace called Waterloo? Bad vibes. I think he's gonna lose.')

But with a LLM, say, you could simply train it with all timestamped texts up to Waterloo, like all surviving newspapers, and then simply have one version generate a bunch of texts about how 'Napoleon won Waterloo', train the other version on these definitely-totally-real French newspaper reports about his stunning victory over the monarchist invaders, and then ask it to make forecasts about Europe.
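The experimental design above can be laid out as a pipeline. Everything here is schematic: `train` and `generate` are placeholder stubs standing in for a real LLM pretraining and sampling stack, and the corpus is a three-document toy:

```python
# Toy sketch of the counterfactual-history protocol: cut the corpus at a
# timestamp, then branch into factual vs synthetic continuation training.
from datetime import date

corpus = [
    {"date": date(1815, 6, 1),  "text": "Napoleon marches north..."},
    {"date": date(1815, 6, 19), "text": "Napoleon defeated at Waterloo."},
    {"date": date(1815, 7, 1),  "text": "The Emperor abdicates again."},
]

cutoff = date(1815, 6, 17)  # only documents written before the battle
pretrain_set = [d for d in corpus if d["date"] < cutoff]

def train(docs):
    # placeholder for pretraining; here the "model" is just its data
    return list(docs)

def generate(model, prompt, n=3):
    # placeholder for sampling counterfactual documents from the model
    return [f"{prompt} (synthetic report #{i})" for i in range(n)]

base_model = train(pretrain_set)

# Branch A: continue training on what actually happened.
factual = train(pretrain_set + [d for d in corpus if d["date"] >= cutoff])

# Branch B: continue training on model-generated counterfactual
# "newspaper reports" of a French victory.
synthetic = [{"date": cutoff, "text": t}
             for t in generate(base_model, "Napoleon triumphant at Waterloo")]
counterfactual = train(pretrain_set + synthetic)

# Both branches can now be given identical forecasting prompts and their
# predictions about post-1815 Europe compared.
print(len(factual), len(counterfactual))
```

The point of the structure is that the two branches differ only in which continuation documents they saw, so any divergence in their forecasts is attributable to the counterfactual evidence itself.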

(These are the sorts of experiments which are why one might wind up running tons of 'ancestor simulations'... There's many more reasons to be simulating past minds than simply very fancy versions of playing The Sims. Perhaps we are now just distant LLM personae being tested about reasoning about the Singularity in one particular scenario involving deep learning counterfactuals, where DL worked, although in the real reality it was Bayesian program synthesis & search.)

gwern (143)

Seems like this might work better as a control variate sort of idea, with the noise added to the environment rather than the model itself, to measure whether the model can steer its way back to its intended measured-capability level. If you add a lot of noise to an environment, then an agent which is trying to pessimize and achieve a certain lower performance will have to overperform in order to 'merely' be bad; then you can subtract the noise which you added, and estimate its performance. If it's been deliberately taking a dive, then it'll do better than non-sandbagging agents do when evaluated in the new noisy environment.

(Imagine, say, playing chess, where a chess agent is deliberately aiming to have a rating of only, say, 1500 Elo where its true Elo is 3000. You evaluate it normally, and it scores ~1500, carefully playing in a deniable way so that it imitates the distribution of moves an Elo 1500 player has eg. like a Decision Transformer. But you then add in occasional stochastic errors to its chosen move, as if its hand "trembled"; it observes that it's behind and keeps ramping up its move quality in order to hit its 1500 target. Eventually, you add so much noise that even playing in its final form, it's too far behind and it starts getting Elos like 1400. You do this same evaluation with a non-deceptive version of that same model, which is honestly only 1500 Elo, and its performance degrades smoothly the instant you add in any noise at all because it's not trained for that sort of noise and has no performance reserves to draw on, just like a human chess player will get worse if you periodically reached in & screwed with them by surprisingly randomizing one of their moves.)
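The chess example can be reduced to a minimal numeric sketch. All numbers here are made up, "skill" is an abstract performance scale rather than real Elo, and the linear mixing of agent moves with random moves is a deliberate simplification:

```python
# Noise-injection test for sandbagging: a sandbagger holding a target
# rating stays flat as noise rises (it has reserves to burn), while an
# honest agent of the same apparent skill degrades immediately.
RANDOM_MOVE_SKILL = 800   # effective strength of fully randomized moves

def observed(skill_played, noise_frac):
    # observed performance mixes the agent's play with injected random moves
    return skill_played * (1 - noise_frac) + RANDOM_MOVE_SKILL * noise_frac

def sandbagger(true_skill, target, noise_frac):
    # plays just well enough to keep its observed rating at `target`,
    # capped at its true ability
    needed = (target - RANDOM_MOVE_SKILL * noise_frac) / (1 - noise_frac)
    return observed(min(needed, true_skill), noise_frac)

def honest(true_skill, noise_frac):
    return observed(true_skill, noise_frac)

for p in (0.0, 0.2, 0.4, 0.6, 0.8):
    s = sandbagger(true_skill=3000, target=1500, noise_frac=p)
    h = honest(true_skill=1500, noise_frac=p)
    print(f"noise={p:.1f}  sandbagger={s:6.0f}  honest={h:6.0f}")
```

The signature is the shape of the two curves: the honest 1500 drops as soon as any noise is added, while the sandbagger's score stays pinned at 1500 until the noise finally exceeds what even its true 3000-level play can compensate for, at which point it too collapses.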

To avoid this, it would need to be so situated and so knowledgeable about other agents that it knows it's being tested in this anti-sandbagging way and also which agents it's being compared against and how badly they do in order to know how much of an additional dive to take. "With this level of noise, an honest 1500 Elo version of me would actually be 1300 Elo, and I need to let myself lose exactly that much more, but neither more nor less." And that may be quite difficult: an agent may be able to figure out easily it's being tested on a weirdo environment (the way Claude-3 increasingly seems situated when you give it weird prompts or tasks), but how does it know what other agents it's being compared to this time and what their performance in the new environment is?

gwern (8441)

I have to say, I still don't understand the cult of Roam or why people were so impressed by, eg. the [[link]] syntax borrowed from English Wikipedia (which introduced it something like 18 years before on what is still the most widely-read & edited wiki software in history), which you remark on repeatedly. Even in 2019 in beta it just seemed like a personal wiki, not much different from, say, PmWiki (2002) with some more emphasis than usual on the common backlink or 'reverse citation' functionality (that so many hypertext systems had supported going back decades in parallel with Xanadu ideas). It may be nicer than, say, English Wikipedia's "WhatLinksHere" (which has been there since before I began using it early in the 2000s), but nothing to create a social-media cult over or sell "courses" about (!).

But if the bubble has burst, it's not hard to see why: any note-taking, personal knowledge management, or personal wiki system is inherently limited by the fact that they require a lot of work for what is, for most people, little gain. For most people, trying to track all of this stuff is as useful as exact itemized grocery store receipts from 5 years ago.

Most people simply have no need for lots of half-formed ideas, random lists of research papers, and so on. This is what people always miss about Zettelkasten: are you writing a book? Are you a historian or German scholar? Do you publish a dozen papers a year? No? Then why do you think you need a Zettelkasten? If you are going to be pulling out a decent chunk of those references for an essay or something, possibly decades from now, then it can be worth the upfront cost of entering references into your system, knowing that you'll never use most of them and the benefit is mostly from the long tail, and you will, in the natural course of usage, periodically look over them to foster serendipity & creativity; if you aren't writing all that, then there's no long tail, no real benefit, no intrinsic review & serendipity, and it's just a massive time & energy sink. Eventually, the user abandons it... and their life gets better.

Further, these systems are inherently passive, and force people to become secretaries, typists, reference librarians, archivists, & writers simply to keep it from rotting (quite aside from any mere software issue), to keep it up to date, revise tenses or references, fix spelling errors, deal with link rot, and so on. (Surprisingly, most people do not find that enjoyable.) There is no intelligence in such systems, and they don't do anything. The user still has to do all the thinking, and it adds on a lot of thinking overhead.

So what comes after Roam and other personal systems which force the user to do all the thinking? I should think that would be obvious: systems which can think for the user instead. LLMs and other contemporary AI are wildly underused in the personal system space right now, and can potentially fix a lot of these issues, through approaches like actively surfacing connections instead of passively waiting for the user to make them on their own and manually record them, and can proactively suggest edits & updates & fixes that the user simply approves in batches. (Think of how much easier it is to copyedit a document using a spellcheck as a series of Y/N semi-automatic edits, than to go through it by eye, fixing typos.)
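The "actively surfacing connections" pattern can be sketched in a few lines. A real system would use LLM embeddings; here a plain bag-of-words cosine similarity stands in, with toy note contents and an arbitrary threshold, so only the shape of the pipeline (score all pairs, propose top links for batch approval) is meant seriously:

```python
# Toy sketch of proactive link suggestion: score every pair of notes by
# similarity and surface the strongest candidates for the user to approve.
import math
from collections import Counter
from itertools import combinations

notes = {
    "spaced-repetition": "memory review scheduling forgetting curve",
    "zettelkasten": "note linking review memory writing",
    "gpu-pricing": "cloud gpu cost training budget",
}

def cosine(a, b):
    # bag-of-words cosine similarity; a stand-in for neural embeddings
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb)

suggestions = sorted(
    ((cosine(notes[x], notes[y]), x, y) for x, y in combinations(notes, 2)),
    reverse=True,
)
for score, x, y in suggestions:
    if score > 0.2:  # threshold is arbitrary
        print(f"suggest link: {x} <-> {y}  ({score:.2f})")
```

The user never hunts for connections; the system proposes them and the user approves or rejects in batches, the same Y/N workflow that makes spellcheck so much faster than proofreading by eye.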

However, like most such paradigm shifts, it will be hard to tack it onto existing systems. You can't reap the full benefits of LLMs with some tweaks like 'let's embed documents and add a little retrieval pane!'. You need to rethink the entire system and rewrite it from the ground up on the basis of making neural nets do as much as possible, to figure out the new capabilities and design patterns, and what to drop from the old obsolete personal wikis like Roam.

From what it sounds like, the Roam community would never stand for that, and I have a lot of doubts about whether it makes sense economically to try. It seems like if one wanted to do that, it would be better to start with a clean sheet (and an empty cap table).
