Hoagy

Wiki Contributions

Comments

Do you have a writeup of the other ways of performing these edits that you tried and why you chose the one you did?

In particular, I'm surprised by the method of adding the activations that was chosen because the tokens of the different prompts don't line up with each other in a way that I would have thought would be necessary for this approach to work, super interesting to me that it does.

If I were to try and reinvent the system after just reading the first paragraph or two I would have done something like:

  • Take multiple pairs of prompts that differ primarily in the property we're trying to capture.
  • Take the difference in the residual stream at the next token.
  • Take the average difference vector, and add that to every position in the new generated text.

I'd love to know which parts were chosen among many as the ones which worked best and which were just the first/only things tried.

Yeah I agree it's not in human brains, not really disagreeing with the bulk of the argument re brains but just about whether it does much to reduce foom %. Maybe it constrains the ultra fast scenarios a bit but not much more imo.

"Small" (ie << 6 OOM) jump in underlying brain function from current paradigm AI -> Gigantic shift in tech frontier rate of change -> Exotic tech becomes quickly reachable -> YudFoom

The key thing I disagree with is:

In some sense the Foom already occurred - it was us. But it wasn't the result of any new feature in the brain - our brains are just standard primate brains, scaled up a bit[14] and trained for longer. Human intelligence is the result of a complex one time meta-systems transition: brains networking together and organizing into families, tribes, nations, and civilizations through language. ... That transition only happens once - there are not ever more and more levels of universality or linguistic programmability. AGI does not FOOM again in the same way.

Although I think agree the 'meta-systems transition' is a super important shift, which can lead us to overestimate the level of difference between us and previous apes, it also doesn't seem like it was just a one time shift. We had fire, stone tools and probably language for literally millions of years before the Neolithic revolution. For the industrial revolution it seems that a few bits of cognitive technology (not even genes, just memes!) in renaissance Europe sent the world suddenly off on a whole new exponential.

The lesson, for me, is that the capability level of the meta-system/technology frontier is a very sensitive function of the kind of intelligences which are operating, and we therefore shouldn't feel at all confident generalising out of distribution. Then, once we start to incorporate feedback loops from the technology frontier back into the underlying intelligences which are developing that technology, all modelling goes out the window.

From a technical modelling perspective, I understand that the Roodman model that you reference below (hard singularity at median 2047) has both hyperbolic growth and random shocks, and so even within that model, we shouldn't be too surprised to see a sudden shift in gears and a much sooner singularity, even without accounting for RSI taking us somehow off-script.

I think strategically, only automated and black-box approaches to interpretability make practical sense to develop now.

Just on this, I (not part of SERI MATS but working from their office) had a go at a basic 'make ChatGPT interpret this neuron' system for the interpretability hackathon over the weekend. (GitHub)

While it's fun, and managed to find meaningful correlations for 1-2 neurons / 50, the strongest takeaway for me was the inadequacy of the paradigm 'what concept does neuron X correspond to'. It's clear (no surprise, but I'd never had it shoved in my face) that we need a lot of improved theory before we can automate. Maybe AI will automate that theoretical progress but it feels harder, and further from automation, than learning how to handoff solidly paradigmatic interpretability approaches to AI. ManualMechInterp combined with mathematical theory and toy examples seems like the right mix of strategies to me, tho ManualMechInterp shouldn't be the largest component imo.

FWIW, I agree with learning history/philosophy of science as a good source of models and healthy experimental thought patterns. I was recommended Hasok Chang's books (Inventing Temperature, Is Water H20) by folks at Conjecture and I'd heartily recommend them in turn. 

I know the SERI MATS technical lead @Joe_Collman spends a lot of his time thinking about how they can improve feedback loops, he might be interested in a chat.

You also might be interested in Mike Webb's project to set up programs to pass quality decision-making from top researchers to students, being tested on SERI MATS people at the moment.

Agree that it's super important, would be better if these things didn't exist but since they do and are probably here to stay, working out how to leverage their own capability to stay aligned rather than failing to even try seems better (and if anyone will attempt a pivotal act I imagine it will be with systems such as these).

Only downside I suppose is that these things seem quite likely to cause an impactful but not fatal warning shot which could be net positive, v unsure how to evaluate this consideration.

I've not noticed this but it'd be interesting if true as it seems that the tuning/RLHF has managed to remove most of the behaviour where it talks down to the level of the person writing as evidenced by e.g. spelling mistakes. Should be easily testable too.

Moore's law is a doubling every 2 years, while this proposes doubling every 18 months, so pretty much what you suggest (not sure if you were disagreeing tbh but seemed like you might be?)

0.2 OOMs/year is equivalent to a doubling time of 8 months.

I think this is wrong, that's nearly 8 doublings in 5 years, should instead be doubling every 5 years, should instead be doubling every 5 / log2(10) = 1.5.. years

I think pushing GPT-4 out to 2029 would be a good level of slowdown from 2022, but assuming that we could achieve that level of impact, what's the case for having a fixed exponential increase? Is it to let of some level of 'steam' in the AI industry? So that we can still get AGI in our lifetimes? To make it seem more reasonable to policymakers?

I would still rather have a moratorium until some measure of progress of understanding personally. We don't have a fixed temperature increase per decade built into our climate targets.

Load More