Jan

phd student in comp neuroscience @ mpi brain research frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/


Comments

The Greedy Doctor Problem... turns out to be relevant to the ELK problem?

Huh, thanks for spotting that! Yes, should totally be ELK 😀 Fixed it.

Formal Philosophy and Alignment Possible Projects

This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"

And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"
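To make the idea concrete, here is a minimal sketch of how a scalar utility can be fit to pairwise preferences that contain a cycle. It is not the code from the linked post: the toy data, the item names, and the function `fit_utilities` are all hypothetical, and the model is a plain Bradley-Terry fit by gradient ascent, which is one common way to set up reward modeling.

```python
import math

# Hypothetical toy data: wins[(x, y)] = number of times x was preferred
# over y, out of 10 comparisons per pair.
wins = {
    ("a", "b"): 6, ("b", "a"): 4,
    ("b", "c"): 6, ("c", "b"): 4,
    ("c", "a"): 6, ("a", "c"): 4,  # a > b > c > a: a non-transitive cycle
    ("a", "d"): 9, ("d", "a"): 1,
    ("b", "d"): 9, ("d", "b"): 1,
    ("c", "d"): 9, ("d", "c"): 1,
}
items = sorted({x for pair in wins for x in pair})


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_utilities(wins, items, lr=0.01, steps=5000):
    """Gradient ascent on the Bradley-Terry log-likelihood
    sum_{x,y} wins[x, y] * log(sigmoid(u[x] - u[y]))."""
    u = {i: 0.0 for i in items}
    for _ in range(steps):
        grad = {i: 0.0 for i in items}
        for (x, y), w in wins.items():
            p = sigmoid(u[x] - u[y])
            grad[x] += w * (1.0 - p)
            grad[y] -= w * (1.0 - p)
        for i in items:
            u[i] += lr * grad[i]
    # Utilities are only defined up to an additive constant, so center them.
    mean_u = sum(u.values()) / len(u)
    return {i: u[i] - mean_u for i in items}


u = fit_utilities(wins, items)
for name in items:
    print(name, round(u[name], 3))
```

In this toy setup the cyclic items a, b, c end up with (essentially) the same utility, while d is clearly ranked below them: the fit averages out the local cycle and keeps the globally consistent part of the ordering.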

(Extremely cool project ideas btw)

A descriptive, not prescriptive, overview of current AI Alignment Research

Hey Ben! :) Thanks for the comment and the careful reading!

Yes, we only added the missing arXiv papers after clustering, but we then repeated the dimensionality reduction and showed that the original clustering still holds up even with the new papers (Figure 4, bottom right). I think that's pretty neat (especially since the dimensionality reduction doesn't "know" about the clustering), but of course the clusters might look slightly different if we also re-ran k-means on the extended dataset.

[Link] Adversarially trained neural representations may already be as robust as corresponding biological neural representations

There's an important caveat here:

The visual stimuli are presented 8 degrees over the visual field for 100ms followed by a 100ms grey mask as in a standard rapid serial visual presentation (RSVP) task.

I'd be willing to bet that if you give the macaques more than 100ms, they'll get it right. That's at least how it is for humans!

(Not trying to shift the goalpost, it's a cool result! Just pointing at the next step.)

"Brain enthusiasts" in AI Safety

Great points, thanks for the comment! :) I agree that there is potentially some very low-hanging fruit. I could even imagine that some of these methods work better in artificial networks than in biological networks (less noise, more controlled environment).

But I believe one of the major bottlenecks might be that the weights and activations of an artificial neural network are just so difficult to access? Putting the weights and activations of a large model like GPT-3 under the microscope requires impressive hardware (running forward passes, storing the activations, transforming everything into a useful form, ...) and then there are so many parameters to look at. 

Giving researchers structured access to the model via a research API could solve a lot of those difficulties and seems like something that totally should exist (although there is of course the danger of also accelerating progress on the capabilities side).

"Brain enthusiasts" in AI Safety

Great point! And thanks for the references :) 

I'll change your background to Computational Cognitive Science in the table! (unless you object or think a different field is even more appropriate)

A descriptive, not prescriptive, overview of current AI Alignment Research

Thank you for the comment and the questions! :)

This is not clear from how we wrote the paper, but we actually do the clustering in the full 768-dimensional space! If you look closely at the clustering plot, you can see that the clusters are slightly overlapping. That would be impossible with k-means in 2D, since in that setting membership is determined by distance from the 2D centroid.
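A small sketch of why that overlap is a giveaway, under loud assumptions: 8 dimensions stand in for the 768, the data are synthetic, and "keep the first two coordinates" stands in for whatever dimensionality reduction produces the plot. None of this is the paper's pipeline. It runs k-means in the full space and then checks how many points would be labeled differently by a nearest-centroid rule in the 2D view.

```python
import random

DIM = 8  # stand-in for the 768-dimensional embedding space (assumption)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the closest centroid: the k-means membership rule."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=25, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = mean(members)
    # Final assignment so the labels match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Synthetic data: two groups that differ only along the LAST coordinate.
rng = random.Random(1)
points = []
for offset in (-5.0, 5.0):
    for _ in range(100):
        p = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
        p[-1] += offset
        points.append(p)

# Cluster in the full space.
labels, _ = kmeans(points, k=2)

# Toy "dimensionality reduction": keep only the first two coordinates.
projected = [p[:2] for p in points]
centroids_2d = [
    mean([q for q, l in zip(projected, labels) if l == j]) for j in range(2)
]

# What a nearest-centroid rule in the 2D view would have assigned instead.
labels_2d = [nearest(q, centroids_2d) for q in projected]
mismatches = sum(a != b for a, b in zip(labels, labels_2d))
print(f"{mismatches} of {len(points)} points sit closer to the other 2D centroid")
```

Any point counted in `mismatches` is drawn inside the "wrong" region of the 2D plot, which is exactly the kind of overlap that clustering directly in 2D could not produce.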

The Brain That Builds Itself

Oh true, I completely overlooked that! (if I keep collecting mistakes like this I'll soon have enough for a "My mistakes" page)

The Brain That Builds Itself

Yes, good point! I had that in an earlier draft and then removed it for simplicity and for the other argument you're making!

Adversarial attacks and optimal control

This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts, and if I find the time to look more into this, that work is probably the right direction to go in.
