Chris_Leong

Sequences

Linguistic Freedom: Map and Territory Revisted
INVESTIGATIONS INTO INFINITY


Comments

What is the Rose Garden Inn used for these days?

I’m confused about the backdoor attack detection task even after reading it a few times:

The article says: “The key difference in the attack detection task is that you are given the backdoor input along with the backdoored model, and merely need to recognize the input as an attack”.

When I read that, I find myself wondering why that isn’t trivially solved by a model that memorises which input(s) are known to be an attack.

My best interpretation is that there are a bunch of possible inputs that cause an attack and you are given one of them and just have to recognise that one plus the others you don’t see. Is this interpretation correct?

This is a good post, so I’d definitely encourage you to write up a few more posts.

I know very little about you, so it’d be hard for me to make good suggestions, but here are two possibilities for your consideration:

  • Help other people figure out how they can contribute, particularly those looking to contribute in a non-technical way. If this is something you’d be interested in doing, I’d probably invest some more time in understanding the strategic landscape first (before someone starts advising, it’s important to have a robust model of what potential downside risks exist).
  • If you run out of post ideas, find others with things they’d like to write up if they had time and help them write it up.

Out of curiosity, what role do you see yourself playing?

Excellent post. One part I disagree with though:

“If you know anybody in politics anywhere, it might be a good idea to try and convince them to pay attention to this AGI thing” - It wouldn’t surprise me if this was net-negative and the default outcome of informing actors about AGI is for them to attempt to accelerate it.

Another part I’d disagree with is lionising technical researchers over everyone else.

Would be very curious to know who’s ultimately funding this.

gotten only 


This should read gotten only d

What does the word swap add? Isn't the human just going to swap the words back as part of the reconstruction? Or are you betting on the rare cases where words can be written in any order, i.e. "black round" instead of "round black"?

It would also be nice to have a better idea of how the humans are supposed to be rewriting the plan. I suspect the best way to do this would be to provide an actual example of a plan being reconstructed by a human. One particular aspect I would like clarity on: how do you see the details of a plan being changed when the worry is that the plan is subtly off? Do you see this as occurring accidentally during reconstruction or is the idea that humans should intentionally change the details?

So this project was something along the lines of ARC Evals?

Omega has more processing power than you
