Davidmanheim

Comments

Strongly agree. I was making a narrower point, but the metric is clearly different from the goal - if anything, it's more surprising that we see as much correlation as we do, given how heavily the metric has been optimized.

Toby Ord writes that “the required resources [for LLM training] grow polynomially with the desired level of accuracy [measured by log-loss].” He then concludes that this shows “very poor returns to scale,” and christens it the “Scaling Paradox.” (He does go on to point out that this doesn't imply scaling can't produce superintelligence, and I agree with him about that.)
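
For concreteness, here is what that relationship looks like if you assume the standard empirical power-law form of loss-versus-compute curves; the form, the irreducible floor, and the exponent are my own illustrative assumptions, not anything from Ord's piece. Under that assumption, compute grows polynomially (with degree 1/α) in the inverse of the reducible loss, which is exactly the relationship he describes.

```latex
% A minimal sketch, assuming loss follows a power law in compute with an
% irreducible floor L_infinity (an empirical fit, not a derived law).
\[
  L(C) \;\approx\; L_{\infty} + A\,C^{-\alpha}
  \quad\Longrightarrow\quad
  C \;\approx\; \left(\frac{A}{\,L(C) - L_{\infty}\,}\right)^{1/\alpha}
\]
% Halving the reducible loss L(C) - L_infinity multiplies the required
% compute by 2^{1/alpha}: polynomial growth in accuracy, as Ord describes.
```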

But what would it look like if this were untrue? That is, what would be the conceptual alternative, where required resources grow more slowly? I think the answer is that such an alternative is conceptually impossible.

To start, there is a fundamental bound on loss at zero, since the best possible model predicts everything perfectly - it exactly learns the distribution. This can happen when overfitting a model, but it can also happen when there is a learnable ground truth; a model trained to learn a polynomial function can learn it exactly.
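
As a toy illustration of that learnable-ground-truth case (my own sketch - the cubic target and the least-squares fit are just stand-ins for any model with enough capacity to represent the true function):

```python
import numpy as np

# When the target is a learnable function (here an exact cubic, no noise),
# a model of the right capacity can drive training loss essentially to zero.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x**3 - 0.5 * x + 1.0        # exact, noiseless ground truth

coeffs = np.polyfit(x, y, deg=3)      # fit a degree-3 polynomial
mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
print(mse)                            # ~1e-30: the zero-loss bound is reached
```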

But there is strong reason to expect the bound to be significantly above zero loss. The training data for LLMs contains lots of aleatory randomness - things that are fundamentally, conceptually unpredictable. I think it's likely that things like RAND's random number book are in the training data, and it is fundamentally impossible to predict randomness. I think something similar is generally true for many other things - predicting word choice among semantically equivalent words, predicting where typos occur, and so on.
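
A back-of-the-envelope way to see the floor this puts on loss (my own sketch, using uniformly random decimal digits as a stand-in for something like the RAND tables): no predictor can have expected cross-entropy below the entropy of the source, so the best achievable loss on such spans is log 10 per digit, no matter how large the model.

```python
import math

# Uniformly random decimal digits: any predictor's expected cross-entropy per
# digit is at least the source entropy, achieved only by predicting the
# uniform distribution itself. Scale cannot move this floor.
entropy_nats = math.log(10)    # ~2.303 nats per digit of irreducible loss
entropy_bits = math.log2(10)   # ~3.322 bits per digit
print(entropy_nats, entropy_bits)
```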

Aside from the loss being bounded well above zero, there's a strong reason to expect that scaling is required to reduce loss on some tasks. In fact, for many tasks that are in the training data, getting near that bound is effectively guaranteed to require significant computation. Eliezer has pointed out that GPTs are predictors, and gives the example of a list of numbers followed by their two prime factors. It's easy to generate such a list by picking pairs of primes, multiplying them, and then writing the answer first - but decreasing the loss when predicting the primes from the product, token by token, is going to require far more computation to perform better for larger primes - as far as anyone knows, super-polynomially more.
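
Here is a minimal sketch of the kind of training example being described (my own illustration, using sympy's randprime for convenience; the exact formatting is arbitrary). Generating the data is cheap, but predicting the factor tokens from the product is integer factorization, for which no polynomial-time classical algorithm is known.

```python
from sympy import randprime

# Cheap to generate: pick two random primes, multiply them, and write the
# product first, so the "answer" tokens (the factors) come afterwards.
def make_example(bits: int) -> str:
    p = randprime(2 ** (bits - 1), 2 ** bits)
    q = randprime(2 ** (bits - 1), 2 ** bits)
    return f"{p * q} = {p} * {q}"

# Predicting the text after "=" from the product alone is factoring, which
# gets rapidly harder as the primes grow, even though generation stays cheap.
for bits in (8, 16, 32):
    print(make_example(bits))
```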

And I don't think this is the exception; I think it's at least often the rule. The training data for LLMs contains lots of text whose order doesn't follow the computational order in which it was produced. When I write an essay, I sometimes arrive at conclusions and then edit the beginning to make sense. When I write code, the functions placed earlier often don't make sense until you see how they get used later. Mathematical proofs are another example where this is often true.

An obvious response is that we've been using exponentially more compute to get better at tasks that aren't impossible in this way - but I'm unsure whether that is true. Benchmarks keep getting saturated, and there's no natural scale for intelligence. So I'm left wondering whether there's any actual content in the “Scaling Paradox.”

(Edit: now also posted to my substack.)

True, and even more: if optimizing for impact or magnitude has Goodhart effects of various types, then even otherwise-good directions are likely to be ruined by pushing on them too hard. (In large part because the space we care about seems unlikely to divide linearly into good and bad; the regions will be much more complex, so even when pointed in a direction that is locally better, pushing too far is possible, and it is very hard to predict from local features even when people try - which they mostly don't.)

I think the point wasn't having a unit norm; it was that impact wasn't defined as directional, so we'd need to remove the dimensionality from a multidimensionally defined direction.

So to continue the nitpicking, I'd argue impact = || Magnitude * Direction ||, or better, ||Impact|| = Magnitude * Direction, so that we can talk about the size of impact. And that makes my point in a different comment even clearer - because almost by assumption, the vast majority of those with large impact are pointed in net-negative directions, unless you think either that a significant proportion of directions are positive, or that people are selecting for positive directions very strongly, which seems not to be the case.

I think some of this is on target, but I also think there's insufficient attention to a couple of factors.

First, in the short and intermediate term, I think you're overestimating how much most people will actually update their personal feelings about AI systems. I agree that there is a fundamental reason that fairly near-term AI will be able to function as a better companion and assistant than humans - but as a useful parallel, we know that nuclear power is fundamentally better than most other power sources that were available in the 1960s, yet people's semi-irrational yuck reaction to "dirty" or "unclean" radiation - far more than the actual risks - made it publicly unacceptable. Similarly, I think the public perception of artificial minds will be generally pretty negative, especially judging by current public views of AI. (Regardless of how appropriate or good this is in relation to loss-of-control and misalignment concerns, it seems pretty clearly maladaptive toward generally friendly near-AGI and AGI systems.)

Second, I think there is a paperclip-maximizer aspect to status competition, in the sense Eliezer uses the concept. Specifically, given massively increased wealth, abilities, and capacity, even if an implausibly large 99% of humans find great ways to enhance their lives that don't devolve into status competition, there are few other domains where an indefinite amount of wealth and optimization power can be applied usefully. Obviously, status competition is at best zero-sum, but I think there aren't lots of obvious alternative places for positive-sum indefinite investment. And even where such positive-sum options exist, they are often harder to arrive at as equilibria. (We see a similar dynamic with education, housing, and healthcare, where increasing wealth leads to competition over often artificially-constrained resources rather than expansion of useful capacity.)

Finally, and more specifically, your idea that we'd see intelligence enhancement as a new (instrumental) goal in the intermediate term seems possible and even likely, but not a strong competitor for, nor inhibitor of, status competition. (That is even ignoring the fact that intelligence itself is often an instrumental goal for status competition!) Even aside from the instrumental nature of the goal, I will posit that returns to investment in intelligence will at some point diminish sharply - even though it's unlikely on priors that those limits are near current levels. Once that point is reached, further indefinite investment of resources will trade off between more direct status competition and further intelligence increases, and as the latter shows decreasing returns, the former becomes the metaphorical paperclip into which individuals can pour resources indefinitely.

"my uninformed intuition is that the people with the biggest positive impact on the world have prioritized the Magnitude"

That's probably true, but it's selecting on the outcome variable. And I'll bet that the people with the biggest negative impact are even more overwhelmingly also those who prioritized magnitude.

"If you already know that an adverse event is highly likely for your specific circumstances, then it is likely that the insurer will refuse to pay out for not disclosing "material information" - a breach of contract."

Having worked in insurance, I can say that's not what the companies usually do. Explicitly denying claims for reasons that are clear but legally hard to defend - especially ones a jury would likely rule against - isn't a good way to reduce costs and losses. (They usually will just say no and wait to see if you bother following up. Anyone determined enough to push a reasonable claim is going to be cheaper to pay out than to fight.)

Yes - the word 'global' is a minimum necessary qualification when referring to catastrophes of the type we plausibly care about - and even then, it is not always clear whether something like COVID-19 was too small an event to qualify.

I definitely appreciate that confusion. I think it's a good reason to read the sequence and think through the questions clearly; https://www.lesswrong.com/s/p3TndjYbdYaiWwm9x - I think this resolves the vast majority of the confusion people have, even if it doesn't "answer" the questions.

The math is good, the point is useful, and the explanations are fine, but embracing a straw-Vulcan version of rationality and dismissing any notion of people legitimately wanting things other than money seems really quite bad, which leaves me wishing this weren't being highlighted for visitors to the site.
