This is a great post, thanks for writing it. I agree that, when it comes to creative endeavours, there's just no "there" there with current AI systems. They just don't "get it". I'm reminded of this tweet:

Mark Cummins: After using Deep Research for a while, I finally get the “it’s just slop” complaint people have about AI art.

Because I don’t care much about art, most AI art seems pretty good to me. But information is something where I’m much closer to a connoisseur, and Deep Research is just nowhere near a good human output. It’s not useless, I think maybe ~20% of the time I get something I’m satisfied with. Even then, there’s this kind of hall-of-mirrors quality to the output, I can’t fully trust it, it’s subtly distorted. I feel like I’m wading through epistemic pollution.

Obviously it’s going to improve, and probably quite rapidly. If it read 10x more sources, thought 100x longer, and had 1000x lower error rate, I think that would do it. So no huge leap required, just turning some knobs, it’s definitely going to get there. But at the same time, it’s quite jarring to me that a large fraction of people already find the outputs compelling.

As someone who does care about art, and has, I think, discerning taste, I have always kind of felt this, and only when I read the above tweet did I realise that not everyone felt what I felt. When Sam Altman tweeted that story, which seemed to impress some people and inspire disgust/ridicule from others, the division became even clearer.

I think with Deep Research the slop is actually not as much of a problem -- you can just treat it as a web search on steroids and can always jump into the cited sources to verify things. And for similar reasons, it seems true that if DR "read 10x more sources, thought 100x longer, and had 1000x lower error rate", it could be as good as me at doing bounded investigations. For the hardest bits needed for AI to generate genuinely good creative fiction, it feels less obvious whether the same type of predictable progress will happen.

I think I'm less sure than you that the problem has to do with attractor basins, though. That does feel part of or related to the problem, but I think a larger issue is that chatbots are not coherent enough. Good art has a sort of underlying internal logic to it, which, even if you do not notice it, contributes to making the artwork feel like a unified whole. Chatbots don't do that; they are too all over the place.

The Failed Strategy of Artificial Intelligence Doomers

Erich_Grunewald2mo61

I think this article far overstates the extent to which these AI policy orgs (maybe with the exception of MIRI? but I don’t think so) are working towards an AI pause, or see the goal of policy/regulation as slowing AI development. (I mean policy orgs, not advocacy orgs.) I see other policy objectives as much more common: creating transparency around AI development, directing R&D towards safety research, laying groundwork for international agreements, slowing Chinese AI development, etc. These are things that (the hope is) are useful on their own, not because of any effect on timelines.

Erich_Grunewald's Shortform

Erich_Grunewald2mo10

On the advice of @adamShimi, I recently read Hasok Chang's Inventing Temperature. The book is terrific and full of deep ideas, many of which relate in interesting ways to AI safety. What follows are some thoughts on that relationship, from someone who is not an AI safety researcher and only somewhat follows developments there, and who probably got one or two things wrong.

(Definitions: By "operationalizing", I mean "giving a concept meaning by describing it in terms of measurable or closer-to-measurable operations", whereas "abstracting" means "removing properties in the description of an object".)

There has been discussion on LessWrong about the relative value of abstract work on AI safety (e.g., agent foundations) versus concrete work on AI safety (e.g., mechanistic interpretability, prosaic alignment). Proponents of abstract work argue roughly that general mathematical models of AI systems are useful or essential for understanding risks, especially coming from not-yet-existing systems like superintelligences. Proponents of concrete work argue roughly that safety work is more relevant when empirically grounded and subjected to rapid feedback loops. (Note: The abstract-concrete distinction is similar to, but different from, the distinction between applied and basic safety research.)

As someone who has done neither, I think we need both. We need abstract work because we need to build safety mechanisms using generalizable concepts, so that we can be confident that the mechanisms apply to new AI systems and new situations. We need concrete work because we must operationalize the abstract concepts in order to measure them and apply them to actually existing systems. And finally we need work that connects the abstract concepts to the concrete concepts, to see that they are coherent and for each to justify the other.

Chang writes:

The dichotomy between the abstract and the concrete has been enormously helpful in clarifying my thinking at the earlier stages, but I can now afford to be more sophisticated. What we really have is a continuum, or at least a stepwise sequence, between the most abstract and the most concrete. This means that the operationalization of a very abstract concept can proceed step by step, and so can the building-up of a concept from concrete operations. And it may be beneficial to move only a little bit at a time up and down the ladder of abstraction.

Take for example the concept of (capacity for) corrigibility, i.e., the degree to which an AI system can be corrected or shut down. The recent alignment faking paper showed that, in experiments, Claude would sometimes "pretend" to change its behavior when it was ostensibly being trained with new alignment criteria, while not actually changing its behavior. That's an interesting and important result. But (channeling Bridgman) we can only be confident that it applies to the concrete concept of corrigibility measured by the operations used in the experiments -- we have no guarantees that it holds for some abstract corrigibility, or when corrigibility is measured using another set of operations or under other circumstances.

An interesting case study discussed in the book is the development of the abstract concept of temperature by Lord Kelvin (the artist formerly known as William Thomson) in collaboration with James Prescott Joule (of conservation of energy fame). Thomson defined his abstract temperature in terms of work and pressure (which were themselves abstract and needed to be operationalized). He based his definition on the Carnot cycle, an idealized process performed by the theoretical Carnot heat engine. The Carnot heat engine was inspired by actual heat engines, but was fully theoretical -- there was no physical Carnot heat engine that could be used in experiments. In other words, the operationalization of temperature that Thomson invented using the Carnot cycle was an intermediate step that required further operationalization before Thomson's abstract temperature could be connected with experimental data. Chang suggests that, while the Carnot engine was never necessary for developing an abstract concept of temperature, it did help Thomson achieve that feat.

Ok, back to AI safety. So above I said that, for the whole AI thing to go well, we probably need progress on both abstract and concrete AI safety concepts, as well as work to bridge the two. But where should research effort be spent on the margin?

You may think abstract work is useless because it has no error-correcting mechanism when it is not trying to, or is not close to being able to, operationalize its abstract concepts. If it is not grounded in any measurable quantities, it can't be empirically validated. On the other hand, many abstract concepts (such as corrigibility) still make sense today and are currently being studied in the concrete (though they have not yet been connected to fully abstract concepts) despite being formulated before AI systems looked much like they do today.

You may think concrete work is useless because AI changes so quickly that the operations used to measure things today will soon be irrelevant, or more pertinently perhaps, because the superintelligent systems we truly need to align are presumably vastly different from today's AI systems, in their behavior if not in their architecture. In that way, AI is quite different from temperature. The physical nature of temperature is constant in space and time -- if you measure temperature with a specific set of operations (measurement tools and procedures), you would expect the same outcomes regardless of which century or country you do it in -- whereas the properties of AI change rapidly over time and across architectures. On the other hand, timelines seem short, such that AGI may share many similarities with today's AI systems, and it is possible to build abstractions gradually on top of concrete operations.

There is in fact an example from the history of thermometry of extending concrete concepts to new environments without recourse to abstract concepts. In the 18th century, scientists realized that the mercury and air thermometers used then behaved very differently, or could not be used at all due to freezing and melting, for very low and very high temperatures. While they had an intuitive notion that some abstract temperature ought to apply across all degrees of heat or cold, their operationalized temperatures clearly only applied to a limited range of heat and cold. To solve this, they eventually developed different sets of operations for the measurement of temperatures in extreme ranges. For example, Josiah Wedgwood measured very high temperatures in ovens by baking standardized clay cylinders and measuring how much they'd shrunk. These different operations, which yielded measurements of temperature on different scales, were then connected by measuring temperature for both scales (using different operations) in an overlapping range and lining those up. All this was done without an abstract theory of temperature, and while the resulting scale was not on very solid theoretical ground, it was good enough to provide practical value.
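A minimal sketch of that lining-up step, assuming the two scales are related linearly within the overlapping range (an assumption made here for illustration, not something the book is quoted as saying). All readings are made-up numbers, and the names `fit_linear` and `make_converter` are hypothetical.

```python
def fit_linear(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def make_converter(slope, intercept):
    """Return a function mapping the high-range scale onto the other scale."""
    def convert(reading):
        return slope * reading + intercept
    return convert


# Made-up paired readings of the same heat sources, taken in the range
# where both sets of operations can be carried out.
overlap_high_range_scale = [0.0, 1.0, 2.0, 3.0]
overlap_low_range_scale = [400.0, 500.0, 600.0, 700.0]

slope, intercept = fit_linear(overlap_high_range_scale, overlap_low_range_scale)
to_low_range_scale = make_converter(slope, intercept)

# Extend the low-range scale to a reading only the high-range method can reach.
extrapolated = to_low_range_scale(10.0)
```

The extrapolated value is only as trustworthy as the assumption that the relationship found in the overlap keeps holding outside it, which is the "not on very solid theoretical ground" caveat in miniature.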

Of course, the issue with superintelligence is that, because of, e.g., deceptive alignment and gradient hacking, we want trustworthy safety mechanisms and alignment techniques in place well before the system has finished training. That's why we want to tie those techniques to abstract concepts which we are confident will generalize well. But I have no idea what the appropriate resource allocation is across these different levels of abstraction.^[1] Maybe what I want to suggest is that abstract and concrete work are complementary and should strive towards one another. But maybe that's what people have been doing all along?

[1] The most upvoted dialogue topic on an October 2023 post by Ben Pace was "Prosaic Alignment is currently more important to work on than Agent Foundations work", which received 40 agree and 32 disagree votes, suggesting that the general opinion on LessWrong at that time was that the current balance was about right, or that prosaic alignment should get some more resources on the margin.

Jesse Hoogland's Shortform

Erich_Grunewald2mo30

Hmm, if the Taiwan tariff announcement caused the NVIDIA stock crash, then why did Apple stock (which should be similarly impacted by those tariffs) go up that day? I think DeepSeek -- as illogical as it is -- is the better explanation.

AI #100: Meet the New Boss

Erich_Grunewald2mo30

Those diffusion regulations were projected by Nvidia to not have a substantive impact on their bottom line in their official financial statement.

I don't think that's true? AFAIK there's no requirement for companies to report material impact on an 8-K form. In a sense, the fact that NVIDIA even filed an 8-K form is a signal that the diffusion rule is significant for their business -- which it obviously is, though it's not clear whether the impact will be substantially material. I think we have to wait for their 10-Q/10-K filings to see what NVIDIA signals to investors, since there I do think it is the case that they'd need to report expected material impact.

Human takeover might be worse than AI takeover

Erich_Grunewald3mo40

You talk later about evolution to be selfish; not only is the story for humans far more complicated (why do humans often offer an even split in the ultimatum game?), but also humans talk a nicer game than they act (see construal level theory, or social-desirability bias). Once you start looking at AI agents who have similar affordances and incentives that humans have, I think you'll see a lot of the same behaviors.

Some people have looked at this, sorta:

I think I'd guess roughly that "Claude is probably more altruistic and cooperative than the median Western human, most other models are probably about the same, or a bit worse, in these simulated scenarios". But of course a major difference here is that the LLMs don't actually have anything on the line -- they don't stand to earn or lose any money, for example, and if they did, they would have nothing to do with the money. So you might expect them to be more altruistic and cooperative than they would be under the conditions in which humans are tested.

Disagreement on AGI Suggests It’s Near

Erich_Grunewald3mo*32

In the New York example, it could be that when someone says “Guys, we should really buy those Broadway tickets. The trip to New York is next month already.” they prompt the response “What? I thought we were going the month after!”, hence the disagreement. If this detail had been discussed earlier, there might have been the “February trip” and the “March trip” in order to disambiguate the trip(s) to New York.

I guess I don't understand what focusing on disagreements adds. Sure, in this situation, the disagreement stems from some people thinking the trip is near (and others thinking it's farther away). But we already knew that some people think AGI is near and others think it's farther away! What does observing that people disagree about that stuff add?

What seems to have happened is that people at one point latched on to the concept of AGI, thinking that their interpretation was virtually the same as those of others because of its lack of definition. Again, if they had disagreed with the definition to begin with, they would have used a different word altogether. Now that some people are claiming that AGI is here or here soon, it turns out that the interpretations do in fact differ.

Yeah, I would say that as those early benchmarks ("can beat anyone at chess", etc.) are achieved without producing what "feels like" AGI, people are forced to make their intuitions concrete, or anyway reckon with their old bad operationalizations of AGI. And that naturally leads to lots of discussion around what actually constitutes AGI. But again, all this is evidence of is that those early benchmarks have been achieved without producing what "feels like" AGI. But we already knew that.

Disagreement on AGI Suggests It’s Near

Erich_Grunewald3mo65

I think that, in your New York example, the increasing disagreement is driven by people spending more time thinking about the concrete details of the trip. They do so because it is obviously more urgent, because they know the trip is happening soon. The disagreements were presumably already there in the form of differing expectations/preferences, and were only surfaced later on as they started discussing things more concretely. So the increasing disagreements are driven by increasing attention to concrete details.

It seems likely to me that the increasing disagreement around AGI is also driven by people spending more time thinking about the concrete details of what constitutes AGI. But where in the New York example we can assume people pay more attention to the details because they know the trip is upcoming, with AGI we know that people don't know when AGI will happen, so there must be some other reason.

One reason could be "a bunch of people think/feel AGI is near", but we already knew that before noticing disagreement around AGI. Another reason could be that there's currently a lot of hype and activity around AI and AGI. But the fact that there's lots of hype around AI/AGI doesn't seem like much evidence that AGI is near, or if it is, that can also be stated more directly than through a detour via disagreements.

What Indicators Should We Watch to Disambiguate AGI Timelines?

Erich_Grunewald3mo10

Thanks for writing this -- it's very concrete and interesting.

Have you thought about using company market caps as an indicator of AGI nearness? I would guess that an AI index -- maybe NVIDIA, Alphabet, Meta, and Microsoft -- would look significantly different in the two scenarios you paint. To control for general economic conditions, you could look at those companies relative to the NASDAQ-100 (minus AI companies). An advantage of this is that it tracks a lot of different indicators, including ones that are really fuzzy or hard to discover, through the medium of market speculators. Another advantage is that it is a clear and easily measurable quantity. That makes it easy to make bets and create prediction markets around it.
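One way the relative-index idea could be computed, as a rough sketch only. It assumes an equal-weighted basket (market-cap weighting would be another reasonable choice) and a benchmark series that already excludes the AI companies. The tickers and prices are placeholders, not real data, and the function names are hypothetical.

```python
def total_return(prices):
    """Simple return from the first to the last price in a series."""
    return prices[-1] / prices[0] - 1.0


def basket_return(price_series_by_ticker):
    """Equal-weighted average of each constituent's total return (an assumption)."""
    returns = [total_return(p) for p in price_series_by_ticker.values()]
    return sum(returns) / len(returns)


def relative_performance(ai_basket, benchmark_prices):
    """Growth of the AI basket divided by growth of the benchmark, minus 1.

    Positive values mean the AI basket outperformed the benchmark.
    """
    basket_growth = 1.0 + basket_return(ai_basket)
    benchmark_growth = 1.0 + total_return(benchmark_prices)
    return basket_growth / benchmark_growth - 1.0


# Placeholder start/end prices, not real market data.
ai_basket = {
    "AI_A": [100.0, 150.0],
    "AI_B": [50.0, 60.0],
}
benchmark_ex_ai = [200.0, 220.0]

excess = relative_performance(ai_basket, benchmark_ex_ai)
```

Dividing growth factors rather than subtracting returns keeps the measure meaningful over long horizons, where both series may have grown a lot.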

Of course, there is weirdness around how we should expect the market to behave in the run-up to AGI, where wealth may become less relevant, etc. But I'd still expect the market to be significantly more bullish on AI stocks in the run-up to AGI, than in an AI fizzle/winter scenario.

artemium's Shortform

Erich_Grunewald3mo50

The jury is still out, but it's currently available even in Direct Chat on Chatbot Arena, there will be more data on this soon.

Fyi, it's also available on https://chat.deepseek.com/, as is their reasoning model DeepSeek-R1-Lite-Preview ("DeepThink"). (I suggest signing up with a throwaway email and not inputting any sensitive queries.) From quickly throwing it a few requests I recently asked 3.5 Sonnet, DeepSeek-V3 seems slightly worse, but nonetheless solid.
