pmarc

Comments
Sorted by Newest

A simple explanation of incomplete models
pmarc · 2d

Thank you for this post! Coming from a statistics (rather than computer science) background, I have encountered similar discussions in the context of Bayesian model averaging, and in particular I would recommend this publication if you don't already know about it:

"Using Stacking to Average Bayesian Predictive Distributions" 
https://sites.stat.columbia.edu/gelman/research/published/stacking_paper_discussion_rejoinder.pdf

One of the main limitations they note about Bayes factors, the classic form of Bayesian model averaging, is that they are sensitive to how vague your initial priors were for the adjustable parameters of your competing models, so I'm not sure how much this applies to your example. It depends on whether you think of your competing hypotheses as having free parameters to estimate before making the comparison. (The same point about Bayes factors evaluating your initial priors is also made here on Gelman's blog: https://statmodeling.stat.columbia.edu/2023/10/14/bayes-factors-prior-cross-validation-posterior/)

That said, the stacking paper has a broader message in my view. What they are saying is: "If you want to use a weighted average of different models for prediction, why not directly optimize the weights for minimal (validation) loss?" 
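
To make the "directly optimize the weights" idea concrete, here is a rough sketch of what stacking does (my own toy illustration in Python, with made-up held-out densities, not code from the paper):

```python
# Rough sketch of stacking: choose model weights on the simplex that maximize
# the held-out log predictive density of the weighted mixture.
# `val_densities` is assumed to be an (n_points, n_models) array holding each
# model's predictive density evaluated at held-out observations.
import numpy as np
from scipy.optimize import minimize

def stacking_weights(val_densities):
    n_models = val_densities.shape[1]

    def neg_log_score(z):
        w = np.exp(z) / np.exp(z).sum()        # softmax keeps the weights on the simplex
        mix = val_densities @ w                # density of the weighted mixture
        return -np.sum(np.log(mix + 1e-300))   # negative held-out log score

    res = minimize(neg_log_score, np.zeros(n_models), method="BFGS")
    return np.exp(res.x) / np.exp(res.x).sum()

# Made-up example: two models' predictive densities at five validation points.
val_densities = np.array([[0.8, 0.2],
                          [0.7, 0.4],
                          [0.1, 0.9],
                          [0.6, 0.5],
                          [0.9, 0.3]])
print(stacking_weights(val_densities))  # e.g. weights favoring the first model
```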

Shutdown Resistance in Reasoning Models
pmarc · 2d

I agree there might not be a way to differentiate (2) and (3) in the context of the current LLM framework. From a safety point of view, I would think that if that LLM framework scales to a dangerous capability level, (2) is as dangerous as (3) if the consequences are the same. In particular, that would suggest whatever inspiration the LLM is getting from stories of "selfish AI" in the training set is strong enough to make it override instructions.

If LLMs don't scale to that point, then maybe (2) is optimistic in the sense that the problem is specific to the particular way LLMs are trained. In other words, if it's the imitation / acting-out optimization criterion of LLMs that makes them override explicit instructions, then it's a more confined problem than saying that any AI trained by any mechanism will have that survival drive (they might, but we can't know that from this example).

--

Thinking of the specific words in the prompt, I would assume "allow yourself to be shutdown" is a turn of phrase that almost only exists in the context of stories of sentient AI going rogue. I would then assume prompting like that would bring salience to features of the LLM's world model that come from these stories. In contrast, "the machine shutting down" is something that could appear in a much broader range of contexts, putting less weight on those features in determining the AI's response [EDIT: then again, it seems in practice it didn't make much of a difference]. I don't think I would call that evidence of self-preservation, but again I don't think it's any less dangerous than "true" self-preservation if LLMs can scale up to dangerous levels. 

--

Another complication is that content about AI misalignment, such as models refusing shutdown, must have become much more prevalent in the last few years. So how do you determine whether the increased prevalence of this behavior in recent models is caused by increased reasoning capability vs. training on a dataset where these hypothetical scenarios are discussed more often?

Scientific Discovery in the Age of Artificial Intelligence
pmarc · 7d

Thank you for sharing the whitepaper.

I was a bit shocked to read:

This is a non-trivial task, since existing analysis methods struggle to identify combinatorial, non-linear patterns. Our collaborators at IPSiM told us that they would typically spend ‘months scrolling in excel’ to make sense of this data – in contrast to our automated system, which identified these relationships in less than an hour.

coming from a closely related field (ecology research) where, 10 years ago, statistical software like R was in wide use and ML methods like random forests were already gaining in popularity. But of course, my perspective is biased by spending a lot more time in the "quant" circles within the field, and perhaps examples like this show that many labs don't have (or can't afford) specialized analysts, motivating the need for some automated solutions.

On a different note, how are you determining statistical significance of individual patterns? Is it something along the lines of: we found N patterns, by pure chance (e.g. from re-running the analysis on a randomized dataset) we would expect M, so we take the N - M most predictive patterns to be the significant ones?
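
For what it's worth, here is the kind of permutation check I have in mind (my own toy sketch in Python, not the whitepaper's actual procedure; the "pattern score" is just a stand-in):

```python
# Count how many patterns clear a predictiveness threshold on the real data,
# compare with the count on label-shuffled data, and treat the excess as the
# number of "significant" patterns.
import numpy as np

rng = np.random.default_rng(0)

def count_predictive_patterns(X, y, threshold=0.3):
    # Toy "pattern score": absolute correlation of each feature with the outcome.
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return int(np.sum(scores > threshold)), scores

X = rng.normal(size=(200, 50))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)   # only 2 truly informative features

n_real, scores = count_predictive_patterns(X, y)
n_null = np.mean([count_predictive_patterns(X, rng.permutation(y))[0]
                  for _ in range(100)])

print(f"patterns found: {n_real}, expected by chance: {n_null:.1f}")
# Keep roughly the (n_real - n_null) highest-scoring patterns as "significant".
```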

X explains Z% of the variance in Y
pmarc · 8d

Yes, when trying to reuse the OP's phrasing, maybe I wasn't specific enough about what I meant. I wanted to highlight how the "fraction of variance explained" metric generalizes less well than other outputs from the same model.

For example, consider a case where a model of E[y] vs. x provides good out-of-sample predictions even if the distribution of x changes, e.g. because x stays in the range used to fit the model; the fraction of variance explained is nevertheless sensitive to the distribution of x. Of course, you can have a confounder w whose changing distribution makes y(x) less accurate out-of-sample by indirectly "breaking" the learned y(x) relationship; but w would influence the fraction of variance explained even if it's not a confounder and doesn't break the validity of y(x).

Or for a more concrete example, maybe some nutrients (e.g. Vitamin C) are not as predictive of individual health as they were in the past, because most people now get enough of them in their diet; fundamentally the relationship between those nutrients and health hasn't changed, just the distribution, so our model of that relationship is probably still good.

This is a very simple example. Still, I think in general there is a lot of potential misinterpretation of this metric (not necessarily on this forum, but in public discourse broadly), especially as it is sometimes called a measure of variable importance. When I read the first part of this post about teachers from Scott Alexander: https://www.lesswrong.com/posts/K9aLcuxAPyf5jGyFX/teachers-much-more-than-you-wanted-to-know , I can't conclude from "having different teachers explains 10% of the variance in test scores" that teaching quality doesn't have much impact on the outcome. (And in fact, as a parent I would value teaching quality, but not a high variance in teaching quality within the school district. I wouldn't want my kids' learning of core topics to be strongly dependent on which school, or which class in that school, they attend.)
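
As a quick illustration of the distribution-sensitivity point (my own toy simulation, not from the post): the true relationship and the prediction error stay exactly the same, yet the fraction of variance explained drops when the spread of x shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)

def r_squared(x, y, slope=2.0, intercept=0.0):
    # Uses the known true coefficients rather than refitting, to emphasize
    # that the model itself is unchanged between scenarios.
    y_hat = intercept + slope * x
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Same true model y = 2x + noise in both cases; only the distribution of x changes.
for x_sd in (3.0, 0.5):
    x = rng.normal(0.0, x_sd, size=5000)
    y = 2 * x + rng.normal(0.0, 1.0, size=5000)
    print(f"sd(x) = {x_sd}: R^2 = {r_squared(x, y):.2f}")

# Wide x: R^2 is about 0.97; narrow x: about 0.50, even though the residual
# noise (prediction accuracy) is identical in both scenarios.
```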

No, Futarchy Doesn’t Have an EDT Flaw
pmarc · 9d

Yes, in expectation. But you're adding a lot of variance. I would have thought the stochasticity of the punishment weakens its effectiveness, and that's a tradeoff you make to get the causal structure.
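
To put rough numbers on that (my own toy simulation, not from the post): a market that resolves only 0.1% of the time, with stakes scaled up 1000x so the expected profit matches an ordinary market, leaves traders with a far noisier payoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

profit = rng.normal(1.0, 1.0, size=n)            # trader's profit per resolved bet (toy units)
always = profit                                   # ordinary market: every bet resolves
resolves = rng.random(n) < 0.001                  # N/A trick: market resolves 0.1% of the time
rarely = np.where(resolves, 1000 * profit, 0.0)   # stakes scaled 1000x to keep the same EV

for name, p in [("always resolves", always), ("0.1% resolution", rarely)]:
    print(f"{name}: mean = {p.mean():.2f}, sd = {p.std():.1f}")
# Both have mean ~1, but the standard deviation is roughly 45x larger in the
# rarely-resolving market, which is what a risk-averse trader has to absorb.
```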

No, Futarchy Doesn’t Have an EDT Flaw
pmarc · 10d

Assuming the 99.9% / 0.1% trick does work and there are large numbers of markets to compensate for the small chance of any given market resolving, what would be the defense against actors placing large bets on a single market with the sole intent of skewing the signal? If the vast majority of bets are consequence-free, it seems:

(1) the cost of such an operation would be comparatively cheaper, and

(2) the incentive for rational profit-seeking traders to place a large enough volume of counter-bets to "punish" it would be comparatively smaller,

than in a regular (non-N/A resolving) market.

Foom & Doom 1: “Brain in a box in a basement”
pmarc · 11d

With regard to the super-scientist AI (the global human R&D equivalent), wouldn't we see it coming based on the amount of resources it would need to acquire? Are you claiming that it could reach the required AGI capacity in its "brain in a box in a basement" state and only scale up its resource use afterwards? The part I'm most skeptical about remains the idea that the resources needed to reach human-level performance are minimal if you just find the right algorithm, because in my view this neglects the evaluation step in learning, which can be resource-intensive from the start and maybe can't be done "covertly".

---

That said, I want to stress that I agree with the conclusion:

So we need to be working frantically on technical alignment, sandbox test protocols, and more generally having a plan, right now, long before the future scary paradigm seems obviously on the path to AGI.

(And no, inventing that next AI paradigm is not part of the solution, but rather part of the problem, despite the safety-vibed rhetoric of the researchers who are doing exactly that as we speak—see §1.6.1.)

But then, if AI researchers believe a likely scenario is:

the development of strong superintelligence from a small group working on a new AI paradigm, with essentially no warning and little resources, 


does that imply that the people who work on technical alignment, or at least their allies, also need to put effort into "winning the race" for AGI? It seems the idea that "any small group could create this with no warning" could motivate acceleration in that race, even from people who are well-meaning about alignment.

Foom & Doom 1: “Brain in a box in a basement”
pmarc · 12d

Maybe the problem is that we don't have a good metaphor for what the path to "rapidly shooting past human-level capability" looks like in a general sense, rather than in a specific domain.

One domain-specific metaphor you mention is AlphaZero, but games like chess are an unusual domain of learning for the AI, because it doesn't need any external input beyond the rules of the game and objective, and RL can proceed just by the program playing against itself. It's not clear to me how we can generalize the AlphaZero learning curve to problems that are not self-contained games like that, where the limiting factor may not be computing power or memory, but just the availability (and rate of acquisition) of good data to do RL on.

Foom & Doom 1: “Brain in a box in a basement”
pmarc · 12d

This post inspires two lines of thought for me.

Groups of humans can create $1B/year companies from scratch [...]

If we're thinking of the computing / training effort to get to that point "from scratch", how much do we include? I have Newton's "standing on the shoulders of giants" quote in mind here. Do we include the effort necessary to build the external repositories of knowledge and the organizational structures of society that make it possible to build these $1B/year companies within a modern human lifetime, with our individually computationally-limited human brains? Do we expect the "brain-like" (in terms of computational leanness) AGI to piggy-back on human structures (which maybe brings it closer to LLM-like imitation machines) or essentially invent its own society? In the latter case, are there potential weaknesses in that organization, in the same way that collective action is hard for humans?


The second line is about learning speed and wall-clock time. Of course AI can communicate and compute orders of magnitude faster than humans, but there are other factors limiting the learning rate. At some point, the AI has to go beyond the representations that can be found or simulated within the digital world and get its own data / do its own experiments in the outside world. Then it has to deal with the inevitable latencies of the real world: the time between an intervention and the response it can learn from, which can be rather long depending on what natural or human system it is studying.

Is Clickbait Destroying Our General Intelligence?
pmarc · 15d

I wonder how many cases of Goodhart's law could be mitigated by injecting some randomness into the decision system. This has already been proposed in the context of allocating research grants (e.g., https://pubmed.ncbi.nlm.nih.gov/33970719/), and could certainly also work for the Harvard admissions problem.
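
For concreteness, here is the sort of partial lottery I have in mind (my own toy sketch, not the procedure from the linked paper): fund the clearly strongest applications outright, then randomize among the rest of the fundable tier instead of splitting hairs on small score differences.

```python
import random

random.seed(0)

def select(applications, n_slots, top_frac=0.5):
    # applications: list of (name, score); higher score = better review
    ranked = sorted(applications, key=lambda a: a[1], reverse=True)
    n_top = int(n_slots * top_frac)
    sure = ranked[:n_top]                               # clearly strongest: funded outright
    pool = ranked[n_top:n_top + 3 * (n_slots - n_top)]  # fundable tier: lottery pool
    lottery = random.sample(pool, n_slots - n_top)      # remaining slots drawn at random
    return sure + lottery

# Toy example: 30 applications with noisy review scores, 6 funding slots.
apps = [(f"app{i}", random.gauss(0, 1)) for i in range(30)]
print([name for name, _ in select(apps, n_slots=6)])
```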
