Yeah, I think that's a sensible concern to have. On the other hand, at that point we'd be relatively few bits-of-optimization away from a much better situation than today: adjusting liability laws to better target actual safety, in a world where liability is already a de-facto decision driver at AI companies, is a much easier problem than causing AI companies to de-novo adopt decision-driving-processes which can actually block deployments.

I indeed mean to imply that AI governance minus AI policy is not a priority. Before the recent events at OpenAI, I would have assigned minority-but-not-negligible probability to the possibility that lab governance might have any meaningful effect. After the recent events at OpenAI... the key question is "what exactly is the mechanism by which lab governance will result in a dangerous model not being built/deployed?", and the answer sure seems to be "it won't". (Note that I will likely update back toward minority-but-not-negligible probability if the eventual outcome at OpenAI involves a board which will clearly say "no" sometimes, in ways which meaningfully impact the bottom line, and actually get their way when they do so.)

Things like structured access and RSPs are nice-to-have, but I do not see any plausible trajectory on which those successfully address major bottlenecks to humanity's survival.

I mean, those are all decent projects, but I would call zero of them "great". Like, the whole appeal of governance as an approach to AI safety is that it's (supposed to be) bottlenecked mainly on execution, not on research. None of the projects you list sound like they're addressing an actual rate-limiting step to useful AI governance.

Among the people who do outreach/policymaker engagement, my impression is that there has been more focus on the executive branch (and less on Congress/congressional staffers).

That makes sense, at least pre-ChatGPT.

I started asking other folks in AI Governance. The vast majority had not talked to congressional staffers (at all).

??? WTF do people "in AI governance" do?

It's of course possible that this is because the methods are bad, though my guess is that at the 99% standard this is reflecting non-sparsity / messiness in the territory (and so isn't going to go away with better methods).

I have the opposite expectation there; I think it's just that current methods are pretty primitive.

+1 to Cannell's answer, and I'll also add pipelining.

Let's say (one instance of) the system is distributed across 10 GPUs, arranged in series: to do a forward pass, the first GPU does some stuff, passes its result to the second GPU, which passes to the third, etc. If only one user at a time were being serviced, then 90% of those GPUs would be idle at any given time. But pipelining means that, once the first GPU in line has finished one request (or, realistically, batch of requests), it can immediately start on another batch of requests.
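A back-of-the-envelope sketch of the effect, with entirely made-up numbers (10 stages, 10 ms per stage):

```python
# Hypothetical illustration of pipelining across 10 GPUs in series.
# All numbers are invented for the sketch, not measurements of any real system.

NUM_STAGES = 10   # GPUs arranged in series
STAGE_MS = 10     # assumed time per pipeline stage, in milliseconds

# Latency of one forward pass: the request traverses every stage.
latency_ms = NUM_STAGES * STAGE_MS  # 100 ms per request

# Without pipelining: the next request only starts after the previous finishes,
# so 9 of the 10 GPUs sit idle at any moment.
serial_throughput = 1000 / latency_ms  # requests per second

# With pipelining: once the pipeline is full, the first GPU accepts a new
# request (or batch) every STAGE_MS, so completions also arrive every STAGE_MS.
pipelined_throughput = 1000 / STAGE_MS  # requests per second

print(latency_ms, serial_throughput, pipelined_throughput)  # 100, 10.0, 100.0
```

So under these toy assumptions, pipelining buys a 10x throughput improvement at the same per-request latency.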

More generally: the rough estimate in the post above tries to estimate throughput from latency, which doesn't really work. Parallelism/pipelining mean that latency isn't a good way to measure throughput, unless we also know how many requests are processed in parallel at a time.
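The relationship is just Little's law: throughput equals requests-in-flight divided by latency, so latency alone pins down nothing. A sketch with hypothetical numbers:

```python
# Little's law sketch: throughput = (requests in flight) / latency.
# Both numbers below are made up for illustration.

latency_s = 0.5     # assumed end-to-end latency per request
in_flight = 256     # assumed concurrent requests (batch size x pipeline depth)

throughput_rps = in_flight / latency_s
print(throughput_rps)  # 512.0
```

The same 0.5 s latency is consistent with 2 requests/sec or 512 requests/sec, depending entirely on `in_flight`.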

(Also I have been operating under the assumption that OpenAI is not profitable at-the-margin, and I'm curious to see an estimate.)

But certain details there are still somewhat sketchy; in particular, we don't have a detailed understanding of the attention circuit, and replacing the query with "the projection onto the subspace we thought was all that mattered" harmed performance significantly (down to 30-40%).

@Neel Nanda FYI my first thought when reading that was "did you try adding random normal noise along the directions orthogonal to the subspace to match the typical variance along those directions?". Mentioning in case that's a different kind of thing than you'd already thought of.
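To make the suggestion concrete, here is a toy sketch of that intervention (all shapes and data are invented; a real version would operate on actual model activations): keep the component in the chosen subspace, and replace the orthogonal component with Gaussian noise scaled to match the variance the data actually has off-subspace.

```python
import numpy as np

# Toy sketch: project activations onto a k-dim subspace, then add random
# normal noise along the orthogonal directions, scaled so its overall
# variance matches the data's typical off-subspace variance.
# Dimensions and "activations" below are stand-ins, not from any real model.

rng = np.random.default_rng(0)
d, k, n = 16, 4, 1000                         # full dim, subspace dim, samples
X = rng.normal(size=(n, d))                    # stand-in activations

U = np.linalg.qr(rng.normal(size=(d, k)))[0]   # orthonormal basis, shape (d, k)
P = U @ U.T                                    # projector onto the subspace
Q = np.eye(d) - P                              # projector onto the complement

X_orth = X @ Q                                 # data's off-subspace component
raw = rng.normal(size=(n, d)) @ Q              # Gaussian noise in the complement
scale = np.linalg.norm(X_orth) / np.linalg.norm(raw)  # match overall variance

X_patched = X @ P + scale * raw                # keep subspace, randomize the rest
```

If performance recovers under this patch but not under plain projection, that would suggest downstream components care about the off-subspace variance level rather than its specific content.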

One piece missing here, insofar as current methods don't get to 99% of loss recovered, is repeatedly drilling into the residual until they do get to 99%. That's a pretty core part of what makes science work, in general. And yeah, that's hard (at least in the sense of being a lot of work; more arguable whether it's hard in a stronger sense than that).

This dialogue mostly makes me want to rant about how all y'all are doing mech interp wrong. So, rant time. This-is-a-rant-so-not-necessarily-reflectively-endorsed, etc.

Starting point: Science In A High-Dimensional World. Quoting from that post:

In a high-dimensional world like ours, there are billions of variables which could influence an outcome. The great challenge is to figure out which variables are directly relevant - i.e. which variables mediate the influence of everything else. In practice, this looks like finding mediators and hunting down sources of randomness. Once we have a set of control variables which is sufficient to (approximately) determine the outcome, we can (approximately) rule out the relevance of any other variables in the rest of the universe, given the control variables.

A remarkable empirical finding across many scientific fields, at many different scales and levels of abstraction, is that a small set of control variables usually suffices. Most of the universe is not directly relevant to most outcomes most of the time.

Ultimately, this is a picture of “gears-level science”: look for mediation, hunt down sources of randomness, rule out the influence of all the other variables in the universe.

This applies to interpretability just like any other scientific field. The real gold-standard thing to look for is some relatively-small set of variables which determine some other variables, basically-deterministically. Or, slightly weaker: a relatively-small Markov blanket which screens off some chunk of the system from everything else.

In order for this to be useful, the determinism/screening does need pretty high precision - e.g. Ryan's 99% number sounds like a reasonable day-to-day heuristic, many nines might be needed if there's a lot of bits involved, etc.
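A toy sketch of what testing such a screening claim looks like in practice (the data-generating process here is invented for illustration): check whether a small candidate set of mediators explains the output to within the chosen precision threshold.

```python
import numpy as np

# Toy screening check: do a few candidate mediator variables (approximately)
# determine the output, at e.g. a 99%-variance-explained threshold?
# The synthetic setup below is purely illustrative.

rng = np.random.default_rng(1)
n = 5000
mediators = rng.normal(size=(n, 3))            # candidate control variables
w = np.array([1.0, -2.0, 0.5])                 # hypothetical true weights
output = mediators @ w + 0.05 * rng.normal(size=n)  # small unexplained residual

# Linear probe: fraction of output variance explained by the mediators alone.
coef, _, _, _ = np.linalg.lstsq(mediators, output, rcond=None)
resid = output - mediators @ coef
frac_explained = 1 - resid.var() / output.var()

print(frac_explained > 0.99)
```

If `frac_explained` falls short of the threshold, that residual is exactly the thing to drill into: either the mediator set is incomplete or there's a genuine source of randomness left to hunt down.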

On the flip side, this does not necessarily need to look like a complete mechanistic explanation. Ideally, findings of screening are the building blocks from which a complete mechanistic model is built. The key point is that findings of screening provide an intermediate unit of progress, in between "no clue what's going on" and "full mechanistic interpretation". Those intermediate units of progress can be directly valuable in their own right, because they allow us to rule things out: (one way to frame) the whole point of screening is that lots of interactions are ruled out. And they directly steer the search for mechanistic explanations, by ruling out broad classes of models.

That is the sort of approach to mech interp which would be able to provide valuable incremental progress on large models, not just toy models, because it doesn't require understanding everything about a piece before something useful is produced.

(Side note: yet another framing of all this would be in terms of modules/modularity.)
