elriggs

Comments

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

I’m expecting either (1) a future GPT’s meta-learning will be able to learn the correct distribution and better prompt engineering will be able to find it, or (2) curating enough examples will be good enough (though I’m not sure if GPT-3 could do it even then).

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

I also expect it to be harder, and a 10-30% chance that it will require some new insight seems reasonable.

What's a Decomposable Alignment Topic?

(b) seems right. I'm unsure what (a) could mean (not much overhead?).

I find it confusing to think about decomposability w/o considering the capabilities of the people I'm handing the tasks off to. I would only add:

By "smart", assume they can notice confusion, google, and program

since that makes the capabilities explicit.

What's a Decomposable Alignment Topic?

If you only had access to people who can google, program, and notice confusion, how could you utilize that to make conceptual progress on a topic you care about?

Decomposable: Make a simple first person shooter. Could be decomposed into creating asset models, and various parts of the actual code can be decomposed (input-mapping, getting/dealing damage).

Non-decomposable: Help me write an awesome piano song. Although this can be decomposed, I don't expect anyone to have the skills required (and acquiring the skills requires too much overhead).

Let's operationalize "too much overhead" to mean "takes more than 10 hours to do useful, meaningful tasks".

What's a Decomposable Alignment Topic?

The first one. As long as you can decompose the open problem into tractable, bite-sized pieces, it's good.

Vanessa mentioned some strategies that might generalize to other open problems: group decomposition (we decide how to break a problem up), programming to empirically verify X, and literature reviews.

What's a Decomposable Alignment Topic?

I don't know (partially because I'm unsure who would stay and leave).

If you didn't take math background into consideration and wrote a proposal (saying "requires background in real analysis" or ...), then that may push out people w/o that background but also attract people with that background.

As long as pre-reqs are explicit, you should go for it.

Writing Piano Songs: A Journey

I tend to write melodies in multiple different ways:

1. Hearing it in my head, then playing it out. It's very easy to generate (like GPT but with melodies), but transcribing is very hard! The common advice is to sing it out, and then match it with the instrument. This is exactly what you did with whistling. If I don't record it, I will very often not remember it at all later; very similar to forgetting a dream. When I hear someone else's piano piece (or a recording of my own), I will often think "I would've played that part differently", which is the same as my brain predicting a different melody.

2. "Asemic playing" (thanks for the phrase!) - I've improv-ed for hundreds of hours, and I very often run into playing similar patterns when I'm in similar "areas" such as playing the same chord progression. I'll often have (1) melodies playing in my head while improvising, but I will often play the "wrong" note and it still sounds good. Over the years, I've gotten much better at remembering melodies I just played (because my brain predicts that the melody will repeat) and playing the "correct" note in my head on the fly.

3. Smashing "concepts" into a melody:

  • What if I played this melody backwards?
  • Pressed every note twice?
  • Held every other note a half-note longer?
  • Used a different chord progression (so specific notes of the melody need to change to harmonize)?
  • Taking a specific pattern of a melody, like which notes it uses, and playing new patterns there.
  • Taking a specific pattern of a melody, like the rhythm between the notes (how long you hold each note, including rests) and applying it to other melodies.
  • Taking a specific pattern of a melody, like the exact rhythm and relative notes, and starting on a different note (then continuing to play the same notes, relatively)
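
A few of these "concepts" can be sketched as small functions over a melody. This is only an illustration, not how I actually work at the piano. It assumes a melody is a list of (MIDI pitch, duration in beats) pairs, that a half note is 2 beats, and that "every other note" means the 2nd, 4th, and so on. The function names are made up for the example, and rests and chord changes are left out.

```python
from typing import List, Tuple

# Assumed encoding: (MIDI pitch number, duration in beats), quarter note = 1 beat.
Note = Tuple[int, float]
Melody = List[Note]


def backwards(melody: Melody) -> Melody:
    """Play the melody backwards (last note first)."""
    return list(reversed(melody))


def press_twice(melody: Melody) -> Melody:
    """Press every note twice in a row."""
    return [note for note in melody for _ in range(2)]


def hold_every_other(melody: Melody, extra: float = 2.0) -> Melody:
    """Hold every other note longer; extra=2.0 assumes a half note is 2 beats.

    'Every other' is taken to mean the 2nd, 4th, ... notes (an assumption).
    """
    return [
        (pitch, dur + extra) if i % 2 == 1 else (pitch, dur)
        for i, (pitch, dur) in enumerate(melody)
    ]


def start_on(melody: Melody, new_first_pitch: int) -> Melody:
    """Keep the rhythm and relative notes, but start on a different note.

    This shifts every pitch by the same number of semitones, which is one
    reading of 'the same notes, relatively'.
    """
    if not melody:
        return []
    shift = new_first_pitch - melody[0][0]
    return [(pitch + shift, dur) for pitch, dur in melody]


def apply_rhythm(source: Melody, target: Melody) -> Melody:
    """Take the rhythm of `source` and apply it to the pitches of `target`.

    Stops at the shorter of the two melodies.
    """
    return [(pitch, dur) for (pitch, _), (_, dur) in zip(target, source)]


# C, D, E with the last note held for two beats.
melody: Melody = [(60, 1.0), (62, 1.0), (64, 2.0)]
print(backwards(melody))
print(start_on(melody, 67))
```

Chaining these (say, `start_on(backwards(melody), 67)`) is the code version of smashing two concepts into the same melody.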
Solving Key Alignment Problems Group

Thanks for reaching out. I've sent you the links in a DM.

I would like to be listed in the list of various AI Safety initiatives.

I'm looking forward to this month's AI Safety discussion day (I saw yours and Vanessa's post about it in Diffractor's Discord).

I'll start reading others' maps of Alignment in a couple days, so I would appreciate the link from FLI; thank you. Gyrodiot's post has several links related to "mapping AI", including one from FLI (Benefits and Risks of AI), but it seems like a different link than the one you meant.

Solving Key Alignment Problems Group

It's not clear in the OP, but I'm planning on a depth-first search as opposed to breadth-first. Week 2-XX will focus on a single topic (like turntrout's impact measures or johnswentworth's abstractions).

I am looking forward to disjunctive maps though!

No Ultimate Goal and a Small Existential Crisis

But how do you verify that? What does it mean (to you) to become more conscious of it?
