Best of LessWrong 2022

This post explores the concept of simulators in AI, particularly self-supervised models like GPT. Janus argues that GPT and similar models are best understood as simulators that can generate various simulacra, not as agents themselves. This framing helps explain many counterintuitive properties of language models. Powerful simulators could have major implications for AI capabilities and alignment.

Akash310
22
Suppose the US government pursued a "Manhattan Project for AGI". At its onset, it's primarily fuelled by a desire to beat China to AGI. However, there's some chance that its motivation shifts over time (e.g., if the government ends up thinking that misalignment risks are a big deal, its approach to AGI might change.) Do you think this would be (a) better than the current situation, (b) worse than the current situation, or (c) it depends on XYZ factors?
StefanHex370
7
Collection of some mech interp knowledge about transformers:

Writing up folk wisdom & recent results, mostly for mentees and as a link to send to people. Aimed at people who are already a bit familiar with mech interp. I've just quickly written down what came to my head, and may have missed or misrepresented some things. In particular, the last point is very brief and deserves a much more expanded comment at some point. The opinions expressed here are my own and do not necessarily reflect the views of Apollo Research.

Transformers take in a sequence of tokens, and return logprob predictions for the next token. We think it works like this:

1. Activations represent a sum of feature directions, each direction corresponding to some semantic concept. The magnitude of a direction corresponds to the strength or importance of the concept.
   1. These features may be 1-dimensional, but maybe multi-dimensional features make sense too. We can either allow for multi-dimensional features (e.g. circle of days of the week), acknowledge that the relative directions of feature embeddings matter (e.g. considering days of the week individual features that span a circle), or both. See also Jake Mendel's post.
   2. The concepts may be "linearly" encoded, in the sense that two concepts A and B being present (say with strengths α and β) are represented as α*vector_A + β*vector_B. This is the key assumption of the linear representation hypothesis. See Chris Olah & Adam Jermyn but also Lewis Smith.
2. The residual stream of a transformer stores information the model needs later. Attention and MLP layers read from and write to this residual stream. Think of it as a kind of "shared memory", with this picture in your head, from Anthropic's famous AMFTC.
   1. This residual stream seems to slowly accumulate information throughout the forward pass, as suggested by LogitLens.
   2. Additionally, we expect there to be internally-relevant information inside the residual stream, such as whether
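The linear-encoding assumption in the take above (two concepts with strengths α and β represented as α*vector_A + β*vector_B) can be illustrated with a minimal sketch. Everything here is hypothetical: the 4-dimensional space, the two direction vectors, and the helper names are made up for illustration, and the directions are chosen to be orthonormal so that a dot product recovers each strength exactly. Real model features need not be orthogonal.

```python
# Hypothetical feature directions in a toy 4-dimensional activation space.
# They are orthonormal by construction, which is an assumption of this sketch.
vector_a = [1.0, 0.0, 0.0, 0.0]  # direction standing in for concept A
vector_b = [0.0, 1.0, 0.0, 0.0]  # direction standing in for concept B


def linear_combination(alpha, beta):
    """Build an activation as alpha * vector_A + beta * vector_B."""
    return [alpha * a + beta * b for a, b in zip(vector_a, vector_b)]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


# Both concepts present, with strengths alpha = 2.0 and beta = 0.5.
activation = linear_combination(alpha=2.0, beta=0.5)

# With orthonormal directions, projecting onto each direction
# reads the corresponding strength back out.
strength_a = dot(activation, vector_a)
strength_b = dot(activation, vector_b)
print(activation, strength_a, strength_b)
```

If the directions were not orthogonal, the dot products would mix the two strengths, which is one reason reading features out of real activations is harder than this toy case suggests.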
Every time I have an application form for some event, the pattern is always the same: a steady trickle of applications, and then a doubling on the last day. And for some reason it still surprises me how accurate this model is. The trickle can be a bit uneven, but the doubling on the last day is usually close to spot on.

This means that once I have a good estimate of the average number of applications per day, I can predict what the final number will be. This is very useful for knowing whether I need to advertise more.

For the upcoming AISC, the trickle was a bit late-skewed, which meant that an early estimate had me at around 200 applicants, but the final number of on-time applications is 356. I think this is because we were a bit slow at advertising early on, but Remmelt did a good job sending out reminders towards the end.

Application deadline was Nov 17.
At midnight GMT before Nov 17 we had 172 applications.
At noon GMT Nov 18 (end of Nov 17 anywhere-on-Earth) we had 356 applications.
The doubling rule predicted 344, which is only 3% off.

Yes, I count the last 36 hours as "the last day". This is not cheating, since that's what I've always done (approximately [1]) since starting to observe this pattern. It's the natural thing to do when you live at or close to GMT, or at least if your brain works like mine.

1. ^ I've always used my local midnight as the divider. Sometimes that has been Central European Time, and sometimes there is daylight saving time. But it's all pretty close.
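The doubling rule in the take above amounts to a one-line prediction plus an error check. A minimal sketch using the numbers quoted there (172 applications before the last day, 356 at the end); the function names are hypothetical, and measuring the error relative to the actual final count is an assumption about how "3% off" was computed.

```python
def predict_final_count(count_before_last_day, multiplier=2):
    """Doubling rule: the final count is about twice the count before the last day."""
    return count_before_last_day * multiplier


def relative_error(predicted, actual):
    """Absolute prediction error as a fraction of the actual final count."""
    return abs(actual - predicted) / actual


predicted = predict_final_count(172)    # 2 * 172 = 344
error = relative_error(predicted, 356)  # 12 / 356
print(predicted, f"{error:.1%}")
```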
'🚨 The annual report of the US-China Economic and Security Review Commission is now live. 🚨 Its top recommendation is for Congress and the DoD to fund a Manhattan Project-like program to race to AGI. Buckle up...'  https://x.com/hamandcheese/status/1858897287268725080

Recent Discussion

DeepSeek-R1-Lite-Preview was announced today. It's available via chatbot. (Post; translation of Chinese post.)

DeepSeek says it will release the weights and publish a report.

The model appears to be stronger than o1-preview on math, similar on coding, and weaker on other tasks.

DeepSeek is Chinese. I'm not really familiar with the company. I thought Chinese companies were at least a year behind the frontier; now I don't know what to think and hope people do more evals and play with this model. Chinese companies tend to game benchmarks more than the frontier Western companies, but I think DeepSeek hasn't gamed benchmarks much historically.

The post also shows inference-time scaling, like o1:

Note that o1 is substantially stronger than o1-preview; see the o1 post:

(Parts of this post and some of my comments are stolen from various people-who-are-not-me.)

4Zach Stein-Perlman
Claim by SemiAnalysis guy: DeepSeek has over 50,000 H100s. (And presumably they don't spend much compute on inference.)
2Zach Stein-Perlman
Some context: June FT article on DeepSeek; parent hedge fund Wikipedia page.
8nikola
One weird detail I noticed is that in DeepSeek's results, they claim GPT-4o's pass@1 accuracy on MATH is 76.6%, but OpenAI claims it's 60.3% in their o1 blog post.  This is quite confusing as it's a large difference that seems hard to explain with different training checkpoints of 4o.

It seems that 76.6% originally came from the GPT-4o announcement blog post. I'm not sure why it dropped to 60.3% by the time of o1's blog post.

Many of you readers may instinctively know that this is wrong. If you flip a coin (50% chance) twice, you are not guaranteed to get heads. The probability of getting at least one heads is 75%. However, you may be surprised to learn that there is some truth to this statement; modifying the statement just slightly will yield not just a true statement, but a useful and interesting one.

It's a spoiler, though. If you want to figure this out as you read this article yourself, you should skip this and then come back. Ok, ready? Here it is:

It's a 1/n chance and I did it n times, so the probability should be...
Almost always.
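The 75% figure for two coin flips, and the behaviour of "a 1/n chance tried n times", can be checked numerically. This is a minimal sketch assuming independent trials; the function name is hypothetical, and the comparison against 1 - 1/e is the standard limit of 1 - (1 - 1/n)^n, stated here as a known result, not quoted from the post preview.

```python
import math


def prob_at_least_one(p, n):
    """Chance of at least one success in n independent trials, each with chance p."""
    return 1 - (1 - p) ** n


# Two fair coin flips: 1 - 0.5**2
two_flips = prob_at_least_one(0.5, 2)
print(two_flips)  # 0.75

# A 1/n chance repeated n times, for growing n.
for n in (2, 10, 100, 1000):
    print(n, round(prob_at_least_one(1 / n, n), 4))

# Standard limit of 1 - (1 - 1/n)**n as n grows.
limit = 1 - 1 / math.e
print(round(limit, 4))
```

The printed values fall as n grows and settle near the limit, so the chance of at least one success stays well below 100% however large n gets.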

 

The math:

Suppose you're flipping a coin and you want to find the probability of NOT flipping a single heads in a...

jmh20

Years ago when I was hanging out with day traders there was a heuristic they all seemed to hold. If their trading model was producing winning trades two out of three times they thought the model was good and could be used. No one ever suggested why that particular rate was the shared meme/norm -- why not 4 out of 5 or 3 out of 5. I wonder if empirically (or just intuitively over time) they simply approximated the results in this post.

Or maybe just a coincidence, but generally when money is at stake I think the common practices will tend to reflect some fundamental fact of the environment. 

1transhumanist_atom_understander
My guesses at what the spoiler was going to be:

* Ten non-independent trials, a 10% chance each (in the prior state of knowledge, not conditional on previous results), and only one trial can succeed. You satisfy these conditions with something like "I hid a ball in one of ten boxes", and the chance really is 100% that one is a "success".
* Regardless of whether the trials are independent, the maximum probability that at least one is a success is the sum of the probabilities per trial. In this case that doesn't yield a useful bound because we already know probabilities are below 100%, but in general it's useful.

Yeah, it's cool that "I did n trials, with a 1/n chance each, so the probability of at least one success is... " does have a general answer, even if it's not 100%. Just noting that it's not the only small modification of the title yielding a useful and interesting correct statement. The ones that came to my mind still involved the sum of the per-trial probabilities. If it was clear that we were looking for something preserving the "n trials with 1/n chance", rather than the summation, I think it would have been more obvious where you're going with this.
2Dweomite
Is that error common?  I can only recall encountering one instance of it with surety, and I only know about that particular example because it was signal-boosted by people who were mocking it.
1noggin-scratcher
I know someone who taught math to low-ability kids, and reported finding it difficult to persuade them otherwise. I assume some number of them carried on into adulthood still doing it.

TL;DR version

In the course of my life, there have been a handful of times I discovered an idea that changed the way I thought about where our species is headed. The first occurred when I picked up Nick Bostrom’s book “Superintelligence” and realized that AI would utterly transform the world. The second was when I learned about embryo selection and how it could change future generations. And the third happened a few months ago when I read a message from a friend of mine on Discord about editing the genome of a living person.

We’ve had gene therapy to treat cancer and single gene disorders for decades. But the process involved in making such changes to the cells of a living person is excruciating and extremely expensive. CAR T-cell therapy,...

1ASingh21112
Will you be working on enhancement of other cognitive abilities besides intelligence, such as memory?

I think it is an obvious yes. I tend to think of intelligence as being efficient with energy, fast, accounting for any possibly useful stimuli or information, and creative. Creativity relies very much on memory. One of the functions of the nervous system is to navigate and basically keep the body from dying, keeping the system on. If that has to be done, then it is useful to have a nervous system that properly, or barely, or properly enough maps the environment, collects information, and keeps a record of it too, so when that encounter happens ...

This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race for Machine Superintelligence. Consider subscribing to stay up to date with my work.

An influential congressional commission is calling for a militarized race to build superintelligent AI based on threadbare evidence

The US-China AI rivalry is entering a dangerous new phase. 

Earlier today, the US-China Economic and Security Review Commission (USCC) released its annual report, with the following as its top recommendation: 

Congress establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence (AGI) capability. AGI is generally defined as

...
2garrison
As mentioned in another reply, I'm planning to do a lot more research and interviews on this topic, especially with people who are more hawkish on China. I also think it's important that unsupported claims with large stakes get timely pushback, which is in tension with the type of information gathering you're recommending (which is also really important, TBC!).
Raemon20

Oh to be clear I don’t think it was bad for you to post this as-is. Just that I’d like to see more followup

1garrison
Claiming that China as a country is racing toward AGI != Chinese AI companies aren't fast following US AI companies, which are explicitly trying to build AGI. This is a big distinction!
1garrison
Hey Seth, appreciate the detailed engagement. I don't think the 2017 report is the best way to understand what China's intentions are WRT AI, but there was nothing in the report to support Helberg's claim to Reuters. I also cite multiple other sources discussing more recent developments (with the caveat in the piece that they should be taken with a grain of salt). I think the fact that this commission was not able to find evidence for the "China is racing to AGI" claim is actually pretty convincing evidence in itself.

I'm very interested in better understanding China's intentions here and plan to deep dive into it over the next few months, but I didn't want to wait until I could exhaustively search for the evidence that the report should have offered while an extremely dangerous and unsupported narrative takes off.

I also really don't get the error pushback. These really were less technical errors than basic factual errors and incoherent statements. They speak to a sloppiness that should affect how seriously the report is taken. I'm not one to gatekeep AI expertise, but I don't think it's too much to expect a congressional commission whose top recommendation is to commence a militaristic AI arms race to have SOMEONE read a draft who knows that chatgpt-3 isn't a thing.

Nobody designing a financial system today would invent credit cards. The Western world uses credit cards because replacing legacy systems is expensive. China doesn't use credit cards. They skipped straight from cash to WeChat Pay. Skipping straight to the newest technology when you're playing catch-up is called leapfrogging.

A world-class military takes decades to create. The United States' oldest active aircraft carrier was commissioned in 1975. For reference, the Microsoft Windows operating system was released in 1985. The backbone of NATO's armed forces was designed for a world before autonomous drones and machine learning.

The United States dominates at modern warfare. Developed in WWII, modern warfare combines tanks, aircraft, artillery and mechanized[1] infantry to advance faster than the enemy can coordinate a response.

Modern warfare is expensive—and not just because...

lsusr20

You're right. I just like the phrase "postmodern warfare" because I think it's funny.

NB: This week there is a film-watching event afterwards. Vote in the comments on what film we watch. Yes, you have to read the sequences in order to join the film-watching.

Come get old-fashioned with us, and let's read the sequences at Lighthaven! We'll show up, mingle, do intros, and then split off into randomized groups for some sequences discussion. Please do the reading beforehand - it should be no more than 20 minutes of reading.

This group is aimed at people who are new to the sequences and would enjoy a group experience, but also at people who've been around LessWrong and LessWrong meetups for a while and would like a refresher.

This meetup will also have dinner provided! We'll be ordering pizza-of-the-day from Sliver (including 2 vegan pizzas).

...
2trevor
Screen arrangement suggestion: Rather than everyone sitting in a single crowd and commenting on the film, we split into two clusters, one closer to the screen and one further. The people in the front cluster hope to watch the film quietly; the people in the back cluster aim to comment/converse/socialize during the film, with the common knowledge that they should aim to not be audible to the people in the front group. People can form clusters and move between them freely. The value of this depends on what film is chosen; e.g. "A Space Odyssey" is not watchable without discussing historical context and "Tenet" ought to have some viewers wanting to better understand the details of what time travelly thing just happened.
5Said Achmiz
… why? I’ve watched this movie, and I… don’t think I’m aware of any special “historical context” that was relevant to it. (Or, at any rate, I don’t know what you mean by this.) It seemed to work out fine…
trevor40

The content/minute rate is too low, it follows 1960s film standards where audiences weren't interested in science fiction films unless concepts were introduced to them very very slowly (at the time they were quite satisfied by this due to lower standards, similar to Shakespeare).

As a result it is not enjoyable (people will be on their phones) unless you spend much of the film either thinking or talking with friends about how it might have affected the course of science fiction as a foundational work in the genre (almost every sci-fi fan and writer at the time watched it).

1PaulBecon
Rashomon (Kurosawa in Japanese) epistemics of 5 people who share an experience but have disjoint recollections
Daniel Kokotajlo

Here's a fairly concrete AGI safety proposal:

 

Default AGI design: Let's suppose we are starting with a pretrained LLM 'base model' and then we are going to do a ton of additional RL ('agency training') to turn it into a general-purpose autonomous agent. So, during training it'll do lots of CoT of 'reasoning' (think like how o1 does it) and then it'll output some text that the user or some external interface sees (e.g. typing into a browser, or a chat window), and then maybe it'll get some external input (the user's reply, etc.) and then the process repeats many times, and then some process evaluates overall performance (by looking at the entire trajectory as well as the final result) and doles out reinforcement.

Proposal part 1: Shoggoth/Face

...
6johnswentworth
The problem with that sort of attitude is that, when the "experiment" yields so few bits and has such a tenuous connection to the thing we actually care about (as in Charlie's concern), that's exactly when You Are Not Measuring What You Think You Are Measuring bites real hard. Like, sure, you'll see this system do something in the toy chess experiment, but that's just not going to be particularly relevant to the things an actual smarter-than-human AI does in the situations Charlie's concerned about. If anything, the experimenter is far more likely to fool themselves into thinking their results are relevant to Charlie's concern than they are to correctly learn anything relevant to Charlie's concern.
4Daniel Kokotajlo
That's a reasonable point and a good cautionary note. Nevertheless, I think someone should do the experiment I described. It feels like a good start to me, even though it doesn't solve Charlie's concern.

I haven't decided yet whether to write up a proper "Why Not Just..." for the post's proposal, but here's an overcompressed summary. (Note that I'm intentionally playing devil's advocate here, not giving an all-things-considered reflectively-endorsed take, but the object-level part of my reflectively-endorsed take would be pretty close to this.)

Charlie's concern isn't the only thing it doesn't handle. The only thing this proposal does handle is an AI extremely similar to today's, thinking very explicitly about intentional deception, and even then the propos...

2Bogdan Ionut Cirstea
Here's a somewhat wild idea to have a 'canary in a coalmine' when it comes to steganography and non-human (linguistic) representations: monitor for very sharp drops in BrainScores (linear correlations between LM activations and brain measurements, on the same inputs) - e.g. like those calculated in Scaling laws for language encoding models in fMRI. (Ideally using larger, more diverse, higher-resolution brain data.)  

Trump and the Republican party will wield broad governmental control during what will almost certainly be a critical period for AGI development. In this post, we want to briefly share various frames and ideas we’ve been thinking through and actively pitching to Republican lawmakers over the past months in preparation for the possibility of a Trump win.

Why are we sharing this here? Given that >98% of the EAs and alignment researchers we surveyed earlier this year identified as everything-other-than-conservative, we consider thinking through these questions to be another strategically worthwhile neglected direction. 

(Along these lines, we also want to proactively emphasize that politics is the mind-killer, and that, regardless of one’s ideological convictions, those who earnestly care about alignment must take seriously the possibility that Trump will be the US president...

Thanks for clarifying. By "policy" and "standards" and "compelled speech" I thought you meant something more than community norms and customs. This is traditionally an important distinction to libertarians and free speech advocates. I think the distinction carves reality at the joints, and I hope you agree. I agree that community norms and customs can be unwelcoming.

3xpym
My biggest problem with the trans discourse is that it's a giant tower of motte-and-baileys, and there's no point where it's socially acceptable to get off the crazy train. Sure, at this point it seems likely that gender dysphoria isn't an entirely empty notion. Implying that this condition might be in any way undesirable is already a red line though, with discussions of how much of it is due to social contagion being very taboo, naturally. And that only people experiencing bad enough dysphoria to require hormones and/or surgery could claim to be legitimately trans is a battle lost long ago. Moving past that, there is non-binary, genderfluid, neo-genders, otherkin, etc, concepts that don't seem to be plausibly based in some currently known crippling biological glitch, and yet those identities are apparently just as legitimate. Where does it stop? Should society be entirely reorganized every time a new fad gains traction? Should everybody questioning that be ostracized? Then there's the "passing" issue. I accept the argument that nowadays in most social situations we have no strong reasons to care about chromosomes/etc, people can successfully play many roles traditionally associated with the opposite sex. But sexual dimorphism is the entire reason for having different pronouns in the first place, and yet apparently you don't even have to try (at all, let alone very hard) to "pass" as your chosen gender for your claim to be legitimate. What is the point? Here the unresolved tension between gender-critical and gender-affirming feminism is the most glaring.
1xpym
I'd say that atheism had already set the "conservatives not welcome" baseline way back when, and this resulted in the community norms evolving accordingly. Granted, these days the trans stuff is more salient, but the reason it flourished here even more than in other tech-adjacent spaces has much to do with that early baseline. Sure, but somebody admitting that certainly isn't the modal conservative.
1Sting
I wouldn't call the tone back then "conservatives not welcome". Conservatism is correlated with religiosity, but it's not the same thing. And I wouldn't even call the tone "religious people are unwelcome" -- people were perfectly civil with religious community members.  The community back then were willing to call irrational beliefs irrational, but they didn't go beyond that. Filtering out people who are militantly opposed to rational conclusions seems fine. 
3Oliver Daniels
I wish there was a bibTeX functionality for alignment forum posts...

Yeah, IMO we should just add a bunch of functionality for integrating alignment forum stuff more with academic things. It’s been on my to do list for a long time.