LESSWRONG

devrandom
Comments

We are headed into an extreme compute overhang
devrandom · 8mo

This scenario now seems less likely given OpenAI's "o" series of models. It seems we might first reach AGI with heavy inference-compute costs, which would mean much less overhang.

Orienting to 3 year AGI timelines
devrandom · 8mo

The post doesn't seem to consider the effect that open-weights models will have on takeoff dynamics. For example, the DeepSeek V3 release seems to show that whatever performance is achieved at the frontier is later achieved by open-weights models at a much lower cost.

Given that, centralizing forces might not dominate.

Is "superhuman" AI forecasting BS? Some experiments on the "539" bot from the Centre for AI Safety
devrandom · 1y

> There seem to be substantial problems with low probability events, coherent predictions over time, short term events, probabilities adding up to more than 100%, etc.

A probabilistic oracle being inconsistent is completely beside the point. If I have a probabilistic oracle that is highly accurate but sometimes inconsistent, I can just post-process its predictions into a consistent form. For example, I can normalize the probabilities so they sum to 100%.
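A minimal sketch of the normalization step, assuming the forecasts cover mutually exclusive, exhaustive outcomes. Proportional rescaling is only one possible repair; it keeps the oracle's relative odds between outcomes while forcing the total to 100%. The outcome labels and numbers are hypothetical.

```python
def normalize(probs: dict[str, float]) -> dict[str, float]:
    """Rescale forecasts for mutually exclusive, exhaustive outcomes so they sum to 1."""
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("forecasts must have a positive sum")
    # Dividing every forecast by the same total preserves their ratios.
    return {outcome: p / total for outcome, p in probs.items()}


# Hypothetical oracle output that sums to 120%.
raw = {"A": 0.5, "B": 0.4, "C": 0.3}
print(normalize(raw))  # roughly {'A': 0.417, 'B': 0.333, 'C': 0.25}
```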

The economic value is in the overall accuracy. Being consistent is a cosmetic consideration.

We are headed into an extreme compute overhang
devrandom · 1y

New Transformer-specific chips from Etched are in the works. These might make inference even cheaper relative to training compute.

We are headed into an extreme compute overhang
devrandom · 1y

A post from Epoch AI discusses trading off training compute against inference compute.

We are headed into an extreme compute overhang
devrandom · 1y

These are good points.

But don't the additional GPU requirements apply equally to training and inference? If so, the number of inference instances that can run on the training hardware once training is finished will still be on the order of 1e6.
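The reasoning step here is that a common factor cancels in a ratio. The sketch below assumes, hypothetically, that the extra requirement multiplies both the training cluster size and the per-instance inference footprint by the same factor k, so the instance count N = H/h = (kH)/(kh) is unchanged. The baseline of 1e6 comes from the comment; k = 8 and the unit sizes are made-up illustration values.

```python
def concurrent_instances(training_gpus: float, gpus_per_instance: float) -> float:
    """Number of inference copies that fit on the hardware used for training."""
    return training_gpus / gpus_per_instance


# Hypothetical baseline: a training cluster that can host ~1e6 inference copies.
base = concurrent_instances(training_gpus=1e6, gpus_per_instance=1.0)

# Hypothetical overhead factor k applied to both training and inference.
k = 8.0
scaled = concurrent_instances(training_gpus=1e6 * k, gpus_per_instance=1.0 * k)

print(base, scaled)  # both 1000000.0: the shared factor k cancels
```

If the overhead hit inference harder than training, the ratio would shrink, which is the case the question is probing.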

We are headed into an extreme compute overhang
devrandom · 1y

This post also seems relevant to this discussion: https://www.lesswrong.com/posts/aH9R8amREaDSwFc97/rapid-capability-gain-around-supergenius-level-seems

We are headed into an extreme compute overhang
devrandom · 1y

The main advantage is that you can immediately distribute fine-tunes to all of the copies. This is far higher bandwidth than our own low-bandwidth, high-effort methods of disseminating knowledge.

The monolithic nature of the population may be a disadvantage, but there are a couple of mitigations:

  • AGIs are, by definition, generalists
  • you can segment the population into specialists (see also this comment about MoE)
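The two ideas above, instant distribution of a fine-tune and segmenting the population into specialists, can be sketched together. This is a toy model, not how any real serving stack works: in practice an updated checkpoint would be copied to each machine, while here a shared object reference stands in for that copy. The `Weights` and `Instance` classes and the segment names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Hypothetical stand-in for a checkpoint; `version` counts fine-tunes applied."""
    segment: str
    version: int = 0


@dataclass
class Instance:
    weights: Weights  # a shared reference, not a private copy


def build_population(sizes: dict[str, int]) -> tuple[dict[str, Weights], list[Instance]]:
    """Create one set of weights per specialist segment, shared by all its instances."""
    shared = {segment: Weights(segment) for segment in sizes}
    population = [Instance(shared[segment]) for segment, n in sizes.items() for _ in range(n)]
    return shared, population


def distribute_fine_tune(shared: dict[str, Weights], segment: str) -> None:
    """Apply one fine-tune; every instance in that segment sees it immediately."""
    shared[segment].version += 1


shared, population = build_population({"generalist": 3, "chemistry": 2})
distribute_fine_tune(shared, "chemistry")
print([inst.weights.version for inst in population])  # [0, 0, 0, 1, 1]
```

One update per segment reaches every copy in that segment, while other segments are untouched.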
We are headed into an extreme compute overhang
devrandom · 1y

> I think this only holds if fine tunes are composable [...] you probably can't take a million independently-fine-tuned models and merge them [...]

The purpose of a fine-tune is to "internalize" some knowledge, either because it is important to have implicit knowledge of it or because you want to develop a skill.

Although you may have a million instances executing tasks, the knowledge you want to internalize is likely much sparser. For example, if an instance is tasked with exploring a portion of a search space and doesn't find a solution there, it can just summarize its findings in a few words. There might not even be a reason to internalize this summary; it might instead be merged with other summaries into a more global view of the search landscape.
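A minimal sketch of the search example, under stated assumptions: each "instance" is simulated by a function call, the search space is a hypothetical integer range, and the solution predicate is made up. Each instance returns only a compact `Summary` (its region and any hits), and `merge` folds those summaries into a global view without any fine-tuning.

```python
from typing import Callable, NamedTuple


class Summary(NamedTuple):
    """Compact report from one instance: its region and any solutions found there."""
    lo: int
    hi: int
    hits: tuple[int, ...]  # empty tuple means "no solution in this region"


def partition(n: int, k: int) -> list[tuple[int, int]]:
    """Split range(n) into at most k contiguous regions."""
    step = -(-n // k)  # ceiling division
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def explore(lo: int, hi: int, is_solution: Callable[[int], bool]) -> Summary:
    """One instance scans [lo, hi) and returns only a short summary, not its full trace."""
    return Summary(lo, hi, tuple(x for x in range(lo, hi) if is_solution(x)))


def merge(summaries: list[Summary]) -> dict[str, list]:
    """Fold per-instance summaries into a global view of the search landscape."""
    exhausted = [(s.lo, s.hi) for s in summaries if not s.hits]
    found = sorted(x for s in summaries for x in s.hits)
    return {"exhausted": exhausted, "found": found}


# Hypothetical problem: find positive multiples of 37 below 100, split across 4 instances.
summaries = [explore(lo, hi, lambda x: x > 0 and x % 37 == 0) for lo, hi in partition(100, 4)]
print(merge(summaries))
```

This prints `{'exhausted': [(0, 25), (75, 100)], 'found': [37, 74]}`: two instances report "nothing here" in a few words, and that is all the global view needs from them.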

So I don't see the need for millions of fine-tunes. It seems more likely that you'd have periodic fine-tunes to internalize recent progress, perhaps once an hour.

The main point is that each periodic fine-tune can be copied to all instances. This ability to copy the fine-tune is the main advantage of instances being identical clones.
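To make the counting concrete, here is a toy simulation under hypothetical assumptions: every instance emits one short summary per hour, and at the end of each period all pending summaries are folded into a single fine-tune that is copied to every instance. The training step itself is only counted, not performed. The point it illustrates is that the number of fine-tunes tracks the number of periods, not the number of instances.

```python
from collections import Counter


def run(n_instances: int, hours: int, period_hours: int = 1) -> Counter:
    """Count fine-tunes and copies when summaries are batched into periodic fine-tunes."""
    stats: Counter = Counter()
    pending: list[str] = []
    for hour in range(1, hours + 1):
        # Hypothetical cadence: each instance contributes one short summary per hour.
        pending.extend(f"instance {i}, hour {hour}" for i in range(n_instances))
        if hour % period_hours == 0:
            stats["fine_tunes"] += 1  # stand-in for training one fine-tune on all pending summaries
            stats["summaries_internalized"] += len(pending)
            stats["copies_distributed"] += n_instances  # the same weights go to every instance
            pending.clear()
    return stats


# Scaled down to 1,000 instances so it runs quickly; one day with hourly fine-tunes.
print(run(n_instances=1000, hours=24))
```

Only 24 fine-tunes are trained for 1,000 instances; each one is simply copied.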

We are headed into an extreme compute overhang
devrandom · 1y

> On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24/7, exchange information electronically, etc.), will be able to significantly "outcompete" (in some fashion) 8 billion humans? This seems worth further exploration / justification.

Good point, but a couple of thoughts:

  • the operational definition of AGI referred to in the article is significantly more capable than the average human
  • the humans are poorly organized
  • the 8 billion humans are supporting a civilization, while the AGIs can focus on AI research and self-improvement
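A back-of-the-envelope sketch of the throughput side of this comparison, assuming (hypothetically) that research output scales linearly with instance count, speed, and working hours. The inputs echo the quoted "a few million" instances at "several times human speed" working 24/7; 2,000,000 and 5.0 are illustrative picks, not figures from the post. Per the third bullet, the relevant human comparison is the number of people working on AI research rather than all 8 billion; no such figure is given here, so none is assumed. The sketch also ignores the capability gap from the first bullet.

```python
def researcher_equivalents(n_instances: int, speedup: float,
                           agi_hours_per_day: float = 24.0,
                           human_hours_per_day: float = 8.0) -> float:
    """Full-time human-researcher equivalents supplied by a population of AGI instances."""
    # Linear scaling assumption: more copies, more speed, more hours -> proportionally more work.
    return n_instances * speedup * (agi_hours_per_day / human_hours_per_day)


# Hypothetical inputs: "a few million" instances at "several times" human speed.
print(researcher_equivalents(2_000_000, 5.0))  # 30000000.0
```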
Posts

We are headed into an extreme compute overhang (54 karma · 1y · 34 comments)