I work at Redwood Research.
Oh, by "as qualitatively smart as humans" I meant "as qualitatively smart as the best human experts".
I also maybe disagree with:
In terms of "fast and cheap and comparable to the average human": well, for a number of roles and niches we're already there.
Or at least the % of economic activity covered by this still seems low to me.
Superintelligence
To me, superintelligence implies qualitatively much smarter than the best humans. I don't think this is needed for AI to be transformative. Fast and cheap-to-run AIs which are as qualitatively smart as humans would likely be transformative.
I propose that LLMs cannot do things in this category at human level, as of today—e.g. AutoGPT basically doesn’t work, last I heard. And this category of capability isn’t just a random cherrypicked task, but rather central to human capabilities, I claim.
What would you claim is a central example of a task which requires this type of learning? ARA type tasks? Agency tasks? Novel ML research? Do you think these tasks certainly require something qualitatively different than a scaled up version of what we have now (pretraining, in-context learning, RL, maybe training on synthetic domain specific datasets)? If so, why? (Feel free to not answer this or just link me what you've written on the topic. I'm more just reacting than making a bid for you to answer these questions here.)
Separately, I think it's non-obvious that you can't make human-competitive, sample-efficient learning happen in many domains where LLMs are already competitive with humans in other, non-learning ways by spending massive amounts of compute on training (with SGD) and synthetic data generation. (See e.g. EfficientZero.) It's just that the amount of compute/spend is such that you're effectively doing a bunch more pretraining, and thus it's not really an interestingly different concept. (See also the discussion here, which is mildly relevant.)
In domains where LLMs are much worse than typical humans in non-learning ways, it's harder to do the comparison, but it's still non-obvious that the learning speed is worse given massive computational resources and some investment.
I think this mostly just reveals that "AGI" and "human-level" are bad terms.
Under your proposed usage, modern transformers are (IMO) brutally non-central with respect to the terms "AGI" and "human-level" from the perspective of most people.
Unfortunately, I don't think there is any definition of "AGI" and "human-level" which:
I prefer the term "transformative AI", ideally paired with a definition.
(E.g. in The case for ensuring that powerful AIs are controlled, we use the terms "transformatively useful AI" and "early transformatively useful AI", both of which we define. We were initially planning on some term like "human-level", but we ran into a bunch of issues with using this term due to wanting a more precise concept, and thus instead used a concept like not-wildly-qualitatively-superhuman-in-dangerous-domains or not-wildly-qualitatively-superhuman-in-general-relevant-capabilities.)
I should probably taboo "human-level" more than I currently do; this term is problematic.
I see the intuition here, but I think the actual answer on how convex agents behave is pretty messy and complicated for a few reasons:
I think the core confusion is that outer/inner (mis)-alignment have different (reasonable) meanings which are often mixed up:
The key thing is that threat models are not necessarily problems that need to be solved directly. For instance, AI control aims to address the threat model of inner misalignment without solving inner alignment.
Here are these terms defined as threat models:
This seems like a pretty reasonable decomposition of problems to me, but again note that these problems don't have to respectively be solved by inner/outer alignment "solutions".
This proposal has two necessary desiderata:
This seems like a generally reasonable overall proposal, though there are alternatives. And the caveats around outer alignment only needing to be locally adequate are important.
This content is copied out of this draft, which I've never gotten around to cleaning up and publishing.
The outer misalignment threat model covers cases where problematic feedback results in training a misaligned AI, even if the oversight process used for training would have caught the catastrophically bad behavior had it been applied to the action in question.
For AIs which aren’t well described as pursuing goals, it’s sufficient for the AI to just be reasonably well optimized to perform well according to this reward provision process. However, note that AIs which aren't well described as pursuing goals also likely pose no misalignment risk.
Prior work hasn't been clear that outer alignment solutions only need to be robust to a particular AI produced by some training process, but this seems extremely key to a reasonable definition of the problem from my perspective. This is both because we don't need to be arbitrarily robust (the key AIs will only be so smart, perhaps not even smarter than humans) and because approaches might depend on utilizing the AI itself in the oversight process (recursive oversight), such that they're only robust to that AI but not others. For an example of recursive oversight being robust to a specific AI, consider ELK-type approaches: ELK could be applicable to outer alignment via ensuring that a human overseer is well informed about everything a given AI knows (but not necessarily well informed about everything any AI could know).
Again, this is only important insofar as AIs are doing anything well described as "trying" in any cases.
In some circumstances, it’s unclear exactly what it would even mean to optimize a given reward provision process as the process is totally inapplicable to the novel circumstances. We’ll ignore this issue.
Shapley seems like quite an arbitrary choice (why uniform over all coalitions?).
I think the actually mathematically correct thing is just EDT/UDT, though this doesn't imply a clear notion of credit. (Maximizing Shapley value yields crazy results.)
Unfortunately, I don't think there is a correct notion of credit.
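For concreteness, here is a minimal sketch (a toy example of mine, not from the discussion above) of the standard Shapley computation: each player's credit is their marginal contribution averaged uniformly over all orderings of the players, which induces the particular weighting over coalitions being questioned. The "glove game" players and values below are illustrative assumptions.

```python
from itertools import permutations


def shapley_values(players, v):
    """Shapley value of each player for characteristic function v.

    Averages each player's marginal contribution v(S | {p}) - v(S)
    uniformly over all orderings of the players (exponential-time;
    fine for toy games).
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += v(with_p) - v(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


# Toy "glove game": one right glove, two left gloves; a coalition is
# worth 1 iff it can form at least one pair.
def glove(S):
    return 1.0 if "R" in S and ({"L1", "L2"} & S) else 0.0


vals = shapley_values(["R", "L1", "L2"], glove)
# The scarce right glove gets 2/3; each left glove gets 1/6.
```

Note that the uniform average over orderings is exactly where the "why uniform over all coalitions?" objection bites: any other distribution over orderings would yield a different (and arguably no less principled) credit assignment.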
IMO Robin is quite repetitive (even relative to other blogs like Scott Alexander's). So the quality is maybe the same, but the marginal value added seems to me to be degrading substantially.
In this particular case, I don't really see any transparency benefits. If it was the case that there was important public information attached to Scott's full name, then this argument would make sense to me.
(E.g., if Scott Alexander were actually Mark Zuckerberg or some other public figure with information attached to their real full name, then this argument would go through.)
Fair enough if the NYT needs to have an extremely coarse-grained policy where they always dox influential people consistently and can't do cost-benefit analysis on particular cases.