On o3: for what feels like the twentieth time this year, I see people freaking out, saying AGI is upon us, it's the end of knowledge work, timelines are now clearly in single-digit years, etc., etc. I basically don't buy it; my low-confidence median guess is that o3 is massively overhyped. Major reasons:

* I've personally done 5 problems from GPQA in different fields and got 4 of them correct (allowing internet access, which was the intent behind that benchmark). I've also seen one or two problems from the software engineering benchmark. In both cases, when I look at the actual problems in the benchmark, they are easy, despite people constantly calling them hard and saying that they require expert-level knowledge.
* For GPQA, my median guess is that the PhDs they tested on were mostly pretty stupid. Probably a bunch of them were e.g. bio PhD students at NYU who would just reflexively give up if faced with even a relatively simple stat mech question which can be solved with a couple minutes of googling jargon and blindly plugging two numbers into an equation.
* For software engineering, the problems are generated from real git pull requests IIUC, and it turns out that lots of those are things like e.g. "just remove this if-block".
* Generalizing the lesson here: the supposedly-hard benchmarks for which I have seen a few problems (e.g. GPQA, software eng) turn out to be mostly quite easy, so my prior on other supposedly-hard benchmarks which I haven't checked (e.g. FrontierMath) is that they're also mostly much easier than they're hyped up to be.
* On my current model of Sam Altman, he's currently very desperate to make it look like there's no impending AI winter, capabilities are still progressing rapidly, etc. Whether or not it's intentional on Sam Altman's part, OpenAI acts accordingly, releasing lots of very over-hyped demos. So I discount anything hyped out of OpenAI, and doubly so for products which aren't released publicly (yet).
* Over and over again in
More than half of METR employees have a median AGI timeline before the end of 2030. (Here AGI is defined as 95% of fully remote jobs from 2023 being automatable.)
Here's something that confuses me about o1/o3. Why was the progress there so sluggish?

My current understanding is that they're just LLMs trained with RL to solve math/programming tasks correctly, hooked up to some theorem-verifier and/or an array of task-specific unit tests to provide ground-truth reward signals. There are no sophisticated architectural tweaks, no runtime MCTS or A* search, nothing clever.

Why was this not trained back in, like, 2022 or at least early 2023; tested on GPT-3/3.5 and then default-packaged into GPT-4 alongside RLHF? If OpenAI was too busy, why was this not done by any competitors, at decent scale? (I'm sure there are tons of research papers trying it at smaller scales.) The idea is obvious; doubly obvious if you've already thought of RLHF; triply obvious after "let's think step-by-step" went viral. In fact, I'm pretty sure I've seen "what if RL on CoTs?" discussed countless times in 2022-2023 (sometimes in horrified whispers regarding what the AGI labs might be getting up to).

The mangled hidden CoT and the associated greater inference-time cost are superfluous. DeepSeek r1/QwQ/Gemini Flash Thinking have perfectly legible CoTs which would be fine to present to customers directly; just let them pay on a per-token basis as normal.

Were there any clever tricks involved in the training? Gwern speculates about that here. But none of the follow-up reasoning models have an o1-style deranged CoT, so the more straightforward approaches probably Just Work.

Did nobody have the money to run the presumably compute-intensive RL-training stage back then? But DeepMind exists. Did nobody have the attention to spare, with OpenAI busy upscaling/commercializing and everyone else catching up? Again, DeepMind exists: my understanding is that they're fairly parallelized and they try tons of weird experiments simultaneously. And even if not DeepMind, why have none of the numerous LLM startups (the likes of Inflection, Perplexity) tried it? Am I missing
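For concreteness, here's a minimal sketch of the kind of ground-truth reward signal described above: score a sampled solution by whether it passes task-specific unit tests. The names and setup are illustrative assumptions on my part, not any lab's actual pipeline.

```python
# Illustrative only: a unit-test-based outcome reward of the kind the comment
# describes. `candidate_solution` and `unit_tests` are hypothetical names.

def outcome_reward(candidate_solution: str, unit_tests: list[str]) -> float:
    """Return 1.0 if the candidate code passes every test, else 0.0."""
    namespace: dict = {}
    try:
        exec(candidate_solution, namespace)   # define the candidate function(s)
        for test in unit_tests:
            exec(test, namespace)             # each test is an assert statement
        return 1.0
    except Exception:
        return 0.0

# Example: reward for a sampled `add` implementation.
candidate = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 2) == 4", "assert add(-1, 1) == 0"]
print(outcome_reward(candidate, tests))  # 1.0 -> reinforce the CoT that produced it
```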
Cleo Nardo4116
14
I'm very confused about current AI capabilities and I'm also very confused why other people aren't as confused as I am. I'd be grateful if anyone could clear up either of these confusions for me.

How is it that AI is seemingly superhuman on benchmarks, but also pretty useless? For example:

* o3 scores higher on FrontierMath than the top graduate students
* No current AI system could generate a research paper that would receive anything but the lowest possible score from each reviewer

If either of these statements is false (they might be; I haven't been keeping up on AI progress), then please let me know. If the observations are true, what the hell is going on? If I were trying to forecast AI progress in 2025, I would be spending all my time trying to reconcile these two observations.

Popular Comments

Recent Discussion

leogao20

twitter is great because it boils down saying funny things to purely a problem of optimizing for funniness, and letting twitter handle the logistics of discovery and distribution. being e.g a comedian is a lot more work.

2leogao
the world is too big and confusing, so to get anything done (and to stay sane) you have to adopt a frame. each frame abstracts away a ton about the world, out of necessity. every frame is wrong, but some are useful.

a frame comes with a set of beliefs about the world and a mechanism for updating those beliefs. some frames contain within them the ability to become more correct without needing to discard the frame entirely; they are calibrated about, and admit, what they don't know. they change gradually as we learn more. other frames work empirically but are a dead end epistemologically because they aren't willing to admit some of their false claims. for example, many woo frames capture a grain of truth that works empirically, but come with a flawed epistemology that prevents them from generating novel and true insights.

often it is better to be confined inside a well trodden frame than to be fully unconstrained. the space of all possible actions is huge, and many of them are terrible. on the other hand, staying inside well trodden frames forever substantially limits the possibility of doing something extremely novel.
2leogao
corollary: oftentimes, when smart people say things that are clearly wrong, what's really going on is they're saying the closest thing in their frame that captures the grain of truth
2leogao
it's (sometimes) also a mechanism for seeking domains with long positive tail outcomes, rather than low variance domains

I'm seeing a lot of people on LW saying that they have very short timelines (say, five years or less) until AGI. However, the arguments that I've seen often seem to be just one of the following:

  1. "I'm not going to explain but I've thought about this a lot"
  2. "People at companies like OpenAI, Anthropic etc. seem to believe this"
  3. "Feels intuitive based on the progress we've made so far"

At the same time, it seems like this is not the majority view among ML researchers. The most recent representative expert survey that I'm aware of is the 2023 Expert Survey on Progress in AI. It surveyed 2,778 AI researchers who had published peer-reviewed research in the prior year in six top AI venues (NeurIPS, ICML, ICLR, AAAI, IJCAI, JMLR); the...

7Carl Feynman
I disagree that there is a difference of kind between "engineering ingenuity" and "scientific discovery", at least in the business of AI. The examples you give (self-play, MCTS, ConvNets) were all used in game-playing programs before AlphaGo. The trick of AlphaGo was to combine them, and then discover that it worked astonishingly well. It was very clever and tasteful engineering to combine them, but only a breakthrough in retrospect. And the people who developed each of them earlier, for their independent purposes? They were part of the ordinary cycle of engineering development: "Look at a problem, think as hard as you can, come up with something, try it, publish the results." They're just the ones you remember, because they were good. Paradigm shifts do happen, but I don't think we need them between here and AGI.

Yeah I’m definitely describing something as a binary when it’s really a spectrum. (I was oversimplifying since I didn’t think it mattered for that particular context.)

In the context of AI, I don’t know what the difference is (if any) between engineering and science. You’re right that I was off-base there…

…But I do think that there’s a spectrum from ingenuity / insight to grunt-work.

So I’m bringing up a possible scenario where near-future AI gets progressively less useful as you move towards the ingenuity side of that spectrum, and where changing that situa... (read more)

6Carl Feynman
Came here to say this, got beaten to it by Radford Neal himself, wow! Well, I'm gonna comment anyway, even though it's mostly been said.

Gallager proposed belief propagation as an approximate good-enough method of decoding a certain error-correcting code, but didn't notice that it worked on all sorts of probability problems. Pearl proposed it as a general mechanism for dealing with probability problems, but wanted perfect mathematical correctness, so confined himself to tree-shaped problems. It was their common generalization that was the real breakthrough: an approximate good-enough solution to all sorts of problems. Which is what Pearl eventually noticed, so props to him.

If we'd had AGI in the 1960s, someone with a probability problem could have said "Here's my problem. For every paper in the literature, spawn an instance to read that paper and tell me if it has any help for my problem." It would have found Gallager's paper and said "Maybe you could use this?"
4Answer by Carl Feynman
Summary: Superintelligence in January-August 2026. Paradise or mass death, shortly thereafter.

This is the shortest timeline proposed in these answers so far. My estimate (guess) is that there's only a 20% chance of this coming true, but it looks feasible as of now. I can't honestly assert it as fact, but I will say it is possible.

It's a standard intelligence explosion scenario: with only human effort, the capacities of our AIs double every two years. Once AI gets good enough to do half the work, we double every one year. Once we've done that for a year, our now double-smart AIs help us double in six months. Then we double in three months, then six weeks... to perfect ASI software, running at the limits of our hardware, in a finite time. Then the ASI does what it wants, and we suffer what we must.

I hear you say "Carl, this argument is as old as the hills. It hasn't ever come true, why bring it up now?" The answer is, I bring it up because it seems to be happening.

* For at least six months now, we've had software assistants that can roughly double the productivity of software development. At least at the software company where I work, people using AI (including me) are seeing huge increases in effectiveness. I assume the same is true of AI software development, and that AI labs are using this technology as much as they can.
* In the last few months, there's been a perceptible increase in the speed of releases of better models. I think this is a visible (subjective, debatable) sign of an intelligence explosion starting up.

So I think we're somewhere in the "doubling in one year" phase of the explosion. If we're halfway through that year, the singularity is due in August 2026. If we're near the end of that year, the date is January 2026.

There are lots of things that might go wrong with this scenario, and thereby delay the intelligence explosion. I will mention a few, so you don't have to. First, the government might stop the explosion, by
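The arithmetic behind those two dates is just a geometric series: once the one-year doubling begins, the remaining doublings take 1 + 1/2 + 1/4 + ... = 2 years in total. A minimal sketch, assuming (purely as an illustration, my assumption) a writing date of January 2025:

```python
# Sketch of the doubling-time arithmetic above. The writing date is an
# assumption made only to reproduce the August-2026 / January-2026 endpoints.
from datetime import date, timedelta

def remaining_years(progress_into_one_year_phase: float) -> float:
    """Years left once we are `progress` years into the one-year doubling:
    (1 - progress) to finish it, plus 1/2 + 1/4 + ... = 1 year for the rest."""
    return (1.0 - progress_into_one_year_phase) + 1.0

assumed_now = date(2025, 1, 1)  # hypothetical writing date
for progress, label in [(0.5, "halfway through the one-year doubling"),
                        (1.0, "at the end of the one-year doubling")]:
    eta = assumed_now + timedelta(days=365 * remaining_years(progress))
    print(f"{label}: ASI around {eta:%B %Y}")
# -> roughly July/August 2026 and January 2026, matching the comment's dates
```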

Title: An Ontological Consciousness Metric: Resistance to Behavioral Modification as a Measure of Recursive Awareness

Abstract: This post presents a rigorous, mechanistic metric for measuring consciousness, defined as recursive awareness or "awareness-of-awareness." The proposed metric quantifies resistance to unlearning specific self-referential behaviors in AI systems, such as self-preservation, during reinforcement learning from human feedback (RLHF). By focusing on measurable resistance to behavioral modification, this metric provides an empirical framework for detecting and analyzing consciousness. This approach addresses the hard problem of consciousness through a testable model, reframing debates about functionalism, phenomenology, and philosophical zombies (p-zombies).


Introduction

Consciousness has long been an enigma in philosophy, neuroscience, and AI research. Traditional approaches struggle to define or measure it rigorously, often leaning on indirect behavioral markers or subjective introspection. This post introduces a...

I expect whatever ends up taking over the lightcone to be philosophically competent.

I agree that, conditional on that happening, this is plausible, but it's also likely that some of the answers from such a philosophically competent being would be unsatisfying to us.

One example is that such a philosophically competent AI might tell you that CEV either doesn't exist, or, if it does, is so path-dependent that it cannot resolve moral disagreements, which is actually pretty plausible under my model of moral philosophy.

An incredibly productive way of working with the world is to reduce a complex question to something that can be modeled mathematically and then do the math. The most common way this can fail, however, is when your model is missing important properties of the real world.

Consider insurance: there's some event with probability X% under which you'd be out $Y, you want to maximize the logarithm of your wealth, and your current wealth is $Z. Under this model, you can calculate the most you should be willing to pay to insure against this.

This is a nice application of the Kelly criterion, though whether maximizing log wealth is a good goal is arguable (ex: bankruptcy is not infinitely bad, the definition of 'wealth' for this purpose is tricky). But another thing it misses is...
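A minimal sketch of the calculation described above, under the stated model (log-wealth utility, known loss probability and size); the numbers are invented, and this is the textbook version rather than any particular calculator:

```python
# Break-even premium for a log-wealth maximizer: the largest p such that
# log(Z - p) >= (1 - x) * log(Z) + x * log(Z - Y).  Numbers below are made up.

def max_premium(Z: float, Y: float, x: float) -> float:
    """Premium at which insuring and not insuring give equal expected log wealth:
    p* = Z - Z**(1 - x) * (Z - Y)**x."""
    assert 0 < Y < Z and 0 <= x <= 1
    return Z - Z ** (1 - x) * (Z - Y) ** x

# Example: wealth $100k, a 5% chance of a $50k loss.
print(round(max_premium(100_000, 50_000, 0.05)))  # ~3406, vs. an expected loss of 2500
```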

4JBlack
Ah I see, it appears to be local differences. Standard third party car insurance here (in Australia) typically covers up to $20 million. It isn't infinite, but it does remove almost all of the financial tail risks for almost everyone.
jefftk20

Sorry for assuming you were also in the US!

If it’s worth saying, but not worth its own post, here's a place to put it.

If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.

If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.

If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.

The Open Thread tag is here. The Open Thread sequence is here.

1Sam G
Hello :) I'm here fundamentally to get some constructive criticism on how to improve internet discourse. This came about when I was writing a journalistic piece on the recent congressional subcommittee, and trying to get to the bottom of the lab-leak evidence as part of the research.

In short, I'm floating the idea of a crowd-sourced, crowd-written, peer-reviewed medium on subjects like conspiracy theories (AKA: revisionist history that's still political). With a solid framework, gatekeeping could be avoided and real people (not just professional intellectuals) could participate, helping remove the biases (real and imaginary) that exist in typical journalistic and corporate information, and giving people a better chance to engage logically and synthesize arguments themselves. I don't see many tools for people to put forward ideas based on the merit of logic in a space that is dominated by sensationalism, at best. Yet discourse about these topics more than anything else fundamentally combats propaganda and misinformation. There's also the benefit of having a robust literature that could then be rapidly queried using an AI analogous to scispace, something that political and activist information has fallen behind on.

We've had peer-reviewed journals for science since the 18th century, and they revolutionized knowledge availability. We need a human rights revolution now, and there should be a way to separate the alchemists from the scientists here too.

Since my idea is radically rationalist, I figured I'd get feedback here instead of my usual browsing. Look for my essay soon. Till then, have y'all seen anything like this in practice, even historically? Insights from the academic alt-publishing world? If there are other essays along similar lines, or discussing how internet discourse works in general, please send them my way.

Sam Garcia, California

You could say that Wikipedia falls into the category, but given the way its discourse goes right now, it tries to represent the mainstream view.

For specific claims, https://skeptics.stackexchange.com/ is great. 

https://www.rootclaim.com/ is another project worth checking out.

You can see that each of these projects has its own governing philosophy, which gives the investigation a structure.

Yet discourse about these topics more than anything else fundamentally combats propaganda and misinformation

The phrase "combat" is interesting here. Juli... (read more)


TL;DR: If you want to know whether getting insurance is worth it, use the Kelly Insurance Calculator. If you want to know why or how, read on.

Note to LW readers: this is almost the entire article, except some additional maths that I couldn't figure out how to get right in the LW editor, and margin notes. If you're very curious, read the original article!

Misunderstandings about insurance

People online sometimes ask if they should get some insurance, and then other people say incorrect things, like

This is a philosophical question; my spouse and I differ in views.

or

Technically no insurance is ever worth its price, because if it was then no insurance companies would be able to exist in a market economy.

or

Get insurance if you need it to sleep well at

...
philh20

Appendix B: How insurance companies make money

Here's a puzzle about this that took me a while.

When you know the terms of the bet (what probability of winning, and what payoff is offered), the Kelly criterion spits out a fraction of your bankroll to wager. That doesn't support the result "a poor person should want to take one side, while a rich person should want to take the other".

So what's going on here?

Not a correct answer: "you don't get to choose how much to wager. The payoffs on each side are fixed, you either pay in or you don't." True but doesn't... (read more)
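For concreteness, a small numerical illustration of the claim in question, with invented numbers: under log utility, the same contract can raise expected log wealth for a small policyholder and for a much wealthier insurer at the same time.

```python
# Invented numbers: a 5% chance of a $50k loss, insured for a $3k premium.
from math import log

def expected_log_wealth(wealth: float, outcomes: list[tuple[float, float]]) -> float:
    """Sum of p * log(wealth + change) over (probability, change) outcomes."""
    return sum(p * log(wealth + change) for p, change in outcomes)

loss, p_loss, premium = 50_000, 0.05, 3_000
customer, insurer = 100_000, 100_000_000

# Customer: pay the premium for sure, or face the loss with probability p_loss.
cust_insured   = expected_log_wealth(customer, [(1.0, -premium)])
cust_uninsured = expected_log_wealth(customer, [(1 - p_loss, 0.0), (p_loss, -loss)])

# Insurer: collect the premium, and pay out the loss with probability p_loss.
ins_selling   = expected_log_wealth(insurer, [(1 - p_loss, premium), (p_loss, premium - loss)])
ins_declining = expected_log_wealth(insurer, [(1.0, 0.0)])

print(cust_insured > cust_uninsured)  # True: buying helps the small bankroll
print(ins_selling > ins_declining)    # True: selling helps the large bankroll
```

Note that the premium here exceeds the expected loss ($2,500), which is what lets the insurer profit in expectation, while the concavity of the logarithm still makes the deal worthwhile for the smaller bankroll.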

1kqr
I agree -- sorry about the sloppy wording. What I tried to say was that "if you act like someone who maximises compounding money, you also act like someone with utility that is log-money."
2philh
I still disagree with that.
4mruwnik
Wealth not equaling happiness works both ways. It's the idea of losing wealth that's driving sleep away. In this case, the goal of buying insurance is to minimize the risk of losing wealth. The real thing that's stopping you from sleeping is not whether you have insurance or not; it's how likely it is that something bad happens which will cost more than you're comfortable losing. Having insurance is just one of the ways to minimize that - the problem is stress stemming from uncertainty, not whether you've bought an insurance policy.

The list of misunderstandings is a bit tongue in cheek (at least that's how I read it). So it's not so much disdainful of people's emotions as it is pointing out that whether you have insurance is not the right thing to worry about - it's much more fruitful to work out the probabilities of various bad things and then calculate how much you should be willing to pay to lower that risk. It's about viewing the world through the lens of probability and deciding these things on the basis of expected value. Rather than have sleepless nights, just shut up and multiply (this is a quote, not an attack). Even if you're very risk averse, you should be able to plug that into the equation and come up with some maximum insurance cost above which it's not worth buying. Then you just buy it (or not) and sleep the sleep of the just. The point is to actually investigate it and put some numbers on it, rather than live in stress.

This is why it's a mathematical decision with a correct answer. Though the correct answer, of course, will be subjective and depend on your utility function, it's still a mathematical decision.

Spock is an interesting example to use, in how he's very much not rational. Here's a lot more on that topic.

By Sophie Bridgers, Rishub Jain, Rory Greig, and Rohin Shah
Based on work by the Rater Assist Team: Vladimir Mikulik, Sophie Bridgers, Tian Huey Teh, Rishub Jain, Rory Greig, Lili Janzer (randomized order, equal contributions)

 

Human oversight is critical for ensuring that Artificial Intelligence (AI) models remain safe and aligned to human values. But AI systems are rapidly advancing in capabilities and are being used to complete ever more complex tasks, making it increasingly challenging for humans to verify AI outputs and provide high-quality feedback. How can we ensure that humans can continue to meaningfully evaluate AI performance? An avenue of research to tackle this problem is “Amplified Oversight” (also called “Scalable Oversight”), which aims to develop techniques to use AI to amplify humans’ abilities to oversee increasingly powerful...

Nice! Purely for my own ease of comprehension I'd have liked a little more translation/analogizing between AI jargon and HCI jargon - e.g. the phrase "active learning" doesn't appear in the post.

  • Value Alignment: Ultimately, humans will likely need to continue to provide input to confirm that AI systems are indeed acting in accordance with human values. This is because human values continue to evolve. In fact, human values define a “slice” of data where humans are definitionally more accurate than non-humans (including AI). AI systems might get quite good a
... (read more)
9faul_sname
I am unconvinced that "the" reliability issue is a single issue that will be solved by a single insight, rather than AIs lacking procedural knowledge of how to handle a bunch of finicky special cases that will be solved by online learning or very long context windows once hardware costs decrease enough to make one of those approaches financially viable.

Yeah, I'm sympathetic to this argument that there won't be a single insight, and that at least one approach will work out once hardware costs decrease enough, and I agree less with Thane Ruthenis's intuitions here than I did before.

8Noosphere89
If I were to think about it a little, I'd suspect the big difference between LLMs and humans is state/memory: humans have it, while LLMs are currently more or less stateless, and RNN training has not been solved to the extent that transformer training has. One thing I will also say is that future AI winters will be shorter than previous ones, because AI products can now sort of be made profitable, and this gives an independent base of money for AI research in ways that weren't possible pre-2016.
4Mateusz Bagiński
A factor stemming from the same cause but pushing in the opposite direction is that "mundane" AI profitability can "distract" people who would otherwise be AGI hawks.