Joseph Miller

Comments

Computing the exact layer-truncated residual streams on GPT-2 Small, it seems that the effective layer horizon is quite large:

I'm mean-ablating every edge whose source node is more than n layers back and calculating the loss on 100 samples from The Pile.

Source code: https://gist.github.com/UFO-101/7b5e27291424029d092d8798ee1a1161
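
For concreteness, here is a rough sketch of this kind of layer-horizon ablation using TransformerLens hooks (not the edge-ablation code in the gist above). Component outputs from more than `horizon` layers back are replaced by their means over a reference batch; the horizon value, prompts, and the choice to keep the embedding contributions untouched are all placeholder assumptions.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # GPT-2 Small
horizon = 4  # hypothetical horizon; the plot sweeps this value

# 1. Mean attention / MLP outputs over a reference batch.
#    (Placeholder prompt here; the actual experiment uses samples from The Pile.)
ref_tokens = model.to_tokens("Placeholder reference text standing in for The Pile samples.")
_, ref_cache = model.run_with_cache(ref_tokens)
mean_out = {
    (l, comp): ref_cache[f"blocks.{l}.hook_{comp}"].mean(dim=(0, 1))
    for l in range(model.cfg.n_layers)
    for comp in ("attn_out", "mlp_out")
}

# 2. During the ablated forward pass, record each component's actual output and
#    rebuild every layer's input from in-horizon contributions plus means for the rest.
current = {}

def embed_hook(act, hook):
    current[hook.name] = act  # token and positional embeddings
    return act

def record_hook(act, hook):
    layer = int(hook.name.split(".")[1])
    comp = hook.name.split("hook_")[-1]
    current[(layer, comp)] = act
    return act

def truncate_hook(resid_pre, hook):
    layer = int(hook.name.split(".")[1])
    # Embedding contributions are kept as-is here for simplicity (an assumption).
    new_resid = current["hook_embed"] + current["hook_pos_embed"]
    for src in range(layer):
        for comp in ("attn_out", "mlp_out"):
            if layer - src > horizon:   # source more than `horizon` layers back
                new_resid = new_resid + mean_out[(src, comp)]
            else:                       # source within the horizon
                new_resid = new_resid + current[(src, comp)]
    return new_resid

fwd_hooks = [("hook_embed", embed_hook), ("hook_pos_embed", embed_hook)]
for l in range(model.cfg.n_layers):
    fwd_hooks += [
        (f"blocks.{l}.hook_attn_out", record_hook),
        (f"blocks.{l}.hook_mlp_out", record_hook),
        (f"blocks.{l}.hook_resid_pre", truncate_hook),
    ]

tokens = model.to_tokens("Another placeholder prompt for evaluating the loss.")
loss = model.run_with_hooks(tokens, return_type="loss", fwd_hooks=fwd_hooks)
print(f"Loss with layer horizon {horizon}: {loss.item():.3f}")
```

Note that each layer's attention and MLP still run on the truncated input, so approximation errors can compound downstream, which is the effect discussed below.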

I believe the horizon may be large because, even if the approximation is fairly good at any particular layer, the errors compound as you go through the layers. If we instead apply the horizon only at the final output, the effective horizon is smaller.


However, if we apply it at just the middle layer (6), the horizon is surprisingly small, so we would expect relatively little error to propagate.

But this appears to be an outlier. Compare to layers 5 and 7.

Source: https://gist.github.com/UFO-101/5ba35d88428beb1dab0a254dec07c33b
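
Reusing the model, hooks, means, and tokens from the sketch above, the single-layer variant just applies the truncation hook at one layer's input (layer 6 here, to match the experiment) and leaves every other layer untouched. This is only a sketch of the idea, not the gist's code.

```python
target_layer = 6  # the middle layer discussed above

single_layer_hooks = [("hook_embed", embed_hook), ("hook_pos_embed", embed_hook)]
for l in range(target_layer):
    single_layer_hooks += [
        (f"blocks.{l}.hook_attn_out", record_hook),
        (f"blocks.{l}.hook_mlp_out", record_hook),
    ]
single_layer_hooks.append((f"blocks.{target_layer}.hook_resid_pre", truncate_hook))

loss = model.run_with_hooks(tokens, return_type="loss", fwd_hooks=single_layer_hooks)
print(f"Loss with horizon {horizon} applied only at layer {target_layer}: {loss.item():.3f}")
```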

In this piece, we want to paint a picture of the possible benefits of AI, without ignoring the risks or shying away from radical visions.

Thanks for this piece! In my opinion you are still shying away from discussing radical (although quite plausible) visions. I expect the median good outcome from superintelligence involves everyone being mind uploaded / living in simulations experiencing things that are hard to imagine currently.

Even short of that, in the first year after a singularity, I would want to:

  • Use brain computer interfaces to play videogames / simulations that feel 100% real to all senses, but which are not constrained by physics.
  • Go to Hogwarts (in a 100% realistic simulation) and learn magic and make real (AI) friends with Ron and Hermione.
  • Visit ancient Greece or view all the most important events of history based on superhuman AI archeology and historical reconstruction.
  • Take medication that makes you always feel wide awake, focused etc. with no side effects.
  • Engineer your body / use cybernetics to make yourself never have to eat, sleep, wash, etc. and be able to jump very high, run very fast, climb up walls, etc.
  • Use AI as the best teacher ever to learn maths, physics and every subject and language and musical instruments to super-expert level.
  • Visit other planets. Geoengineer them to have crazy landscapes and climates.
    • Play God and oversee the evolution of life on other planets.
  • Design buildings in new architectural styles and have AI build them.
  • Genetically modify cats to play catch.
  • Listen to new types of music, perfectly designed to sound good to you.
  • Design the biggest roller coaster ever and have AI build it.
  • Modify your brain to have better short term memory, eidetic memory, be able to calculate any arithmetic super fast, be super charismatic.
  • Bring back Dinosaurs and create new creatures.
  • Ask AI for way better ideas for this list.

I expect UBI, curing aging etc. to be solved within a few days of a friendly intelligence explosion.

Although I think we will also plausibly see a new type of scarcity. There is a limited amount of compute you can create using the materials / energy in the universe. And if in fact most humans are mind-uploaded / brains in vats living in simulations, we will have to divide this among ourselves in order to run the simulations. If you have twice as much compute, you can simulate your brain twice as fast (or run two of you in parallel?), and thus experience twice as much subjective time, and so live twice as long until the heat death of the universe.

Note that the group I was in only played on the app. I expect this makes it significantly harder to understand what's going on.

Yes that's correct, this wording was imprecise.

Would you ever really want mean ablation except as a cheaper approximation to resample ablation?


Resample ablation is not more expensive than mean ablation (they both just replace activations with different values). But to answer the question, I think you would: resample ablation biases the model toward some particular corrupt output.
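
To illustrate the cost point, here is a minimal sketch (my own illustration using TransformerLens, with a hypothetical attention head and IOI-style prompts): both ablations are a single activation overwrite inside one forward pass, and only the value written differs.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
layer, head = 5, 3  # hypothetical attention head to ablate

clean_tokens = model.to_tokens("When John and Mary went to the store, John gave a drink to")
corrupt_tokens = model.to_tokens("When John and Mary went to the store, Mary gave a drink to")

_, clean_cache = model.run_with_cache(clean_tokens)
_, corrupt_cache = model.run_with_cache(corrupt_tokens)

hook_name = f"blocks.{layer}.attn.hook_z"  # per-head outputs: [batch, pos, head, d_head]
# Mean ablation value (a real experiment would average over a reference dataset;
# here one prompt's positions stand in for that).
mean_value = clean_cache[hook_name][:, :, head].mean(dim=(0, 1))
# Resample ablation value: the same head's output on a corrupt prompt.
resample_value = corrupt_cache[hook_name][:, :, head]

def ablate_with(value):
    def hook_fn(z, hook):
        z[:, :, head] = value  # overwrite one head's output
        return z
    return hook_fn

# Either ablation is one overwrite inside one forward pass; the cost is identical.
mean_loss = model.run_with_hooks(
    clean_tokens, return_type="loss", fwd_hooks=[(hook_name, ablate_with(mean_value))]
)
resample_loss = model.run_with_hooks(
    clean_tokens, return_type="loss", fwd_hooks=[(hook_name, ablate_with(resample_value))]
)
print(mean_loss.item(), resample_loss.item())
```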

It seems to me that if you ask the question clearly enough, there's a correct kind of ablation. For example, if the question is "how do we reproduce this behavior from scratch", you want zero ablation.

Yes I agree. That's the point we were trying to communicate with "the ablation determines the task."

  • direct effect vs indirect effect corresponds to whether you ablate the complement of the circuit (direct effect) vs restoring the circuit itself (indirect effect, mediated by the rest of the model)
  • necessity vs sufficiency corresponds to whether you ablate the circuit (direct effect necessary) / restore the complement of the circuit (indirect effect necessary) vs restoring the circuit (indirect effect sufficient) / ablating the complement of the circuit (direct effect sufficient)

Thanks! That's great perspective. We probably should have done more to connect ablations back to the causality literature.

  • "all tokens vs specific tokens" should be absorbed into the more general category of "what's the reference dataset distribution under consideration" / "what's the null hypothesis over",
  • mean ablation is an approximation to resample ablation which itself is an approximation to computing the expected/typical behavior over some distribution

These don't seem correct to me; could you explain further? "Specific tokens" means "we specify the token positions at which each edge in the circuit exists".

I think so. Mostly we learned about trading and the price discovery mechanism that is a core mechanic of the game. We started with minimal explanation of the rules, so I expect these things can be grokked faster by just saying them when introducing the game.

We just played Figgie at MATS 6.0, with most players playing for the first time. I think we made lots of clearly bad decisions for the first 6 or 7 games, and reached a barely acceptable standard by about 10-15 games (but I say this as someone who was also playing for the first time).

(crossposted to the EA Forum)

Nonetheless, the piece exhibited some patterns that gave me a pretty strong allergic reaction. It made or implied claims like:
* a small circle of the smartest people believe this
* i will give you a view into this small elite group who are the only who are situationally aware
* the inner circle longed tsmc way before you
* if you believe me; you can get 100x richer -- there's still alpha, you can still be early
* This geopolitical outcome is "inevitable" (sic!)
* in the future the coolest and most elite group will work on The Project. "see you in the desert" (sic)
* Etc.

These are not just vibes; they are all empirical claims (except maybe the last). If you think they are wrong, you should say so and explain why. It's not epistemically poor to say these things if they're actually true.

If this were the case, wouldn't you expect the mean of the code steering vectors to also be a good code steering vector? But in fact, Jacob says that this is not the case. Edit: Actually it does work when scaled; see nostalgebraist's comment.
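
A toy illustration of the scaling point (with hypothetical stand-in vectors, not Jacob's actual steering vectors): averaging many partially cancelling vectors shrinks the norm, so the raw mean is injected too weakly unless it is rescaled.

```python
import torch

torch.manual_seed(0)
d_model, n_vectors = 768, 100

# Hypothetical stand-ins for individually trained code steering vectors:
# a shared direction plus per-vector noise (purely illustrative numbers).
shared = torch.randn(d_model)
steering_vectors = shared + 3.0 * torch.randn(n_vectors, d_model)

mean_vec = steering_vectors.mean(dim=0)
avg_norm = steering_vectors.norm(dim=-1).mean()

print(f"Average individual norm: {avg_norm.item():.1f}")
print(f"Norm of the raw mean:    {mean_vec.norm().item():.1f}")  # much smaller: noise cancels

# Rescaling the mean restores a comparable injection magnitude, which is roughly
# the fix nostalgebraist points out.
scaled_mean = mean_vec * (avg_norm / mean_vec.norm())
print(f"Norm after rescaling:    {scaled_mean.norm().item():.1f}")
```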

Thanks. So will the building foundations be going through several meters of foam glass to the ice below?
