I often like to cite [this music video](https://www.youtube.com/watch?v=Njk2YAgNMnE) as an example of something that was made possible by AI, where the AI was used as just one building block in a complex artistic process (for my part, I couldn't imagine how I would auto-generate a video like this, or even encode the movement of the camera as a constraint, without some substantial effort, and it was made in 2022!)
forgive my ignorance, but is there any reason you can't have multi-layer sparse autoencoders, even ones that stay interpretable and compatible with the linear representation hypothesis? like, what would their drawbacks be (other than more required compute)?
no matter how you're computing the latents, you still have a reconstruction loss;
it seems to me like this still constructs a set of latents that sparsely activate, and which are linearly represented in activation space
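concretely, something like this is what i have in mind (a minimal sketch in pytorch; the dimensions and the l1 coefficient are placeholders i made up): the encoder gets an extra layer, but the decoder stays a single linear map, so each latent is still read off as one direction in activation space, and the loss is the usual reconstruction + sparsity objective.

```python
import torch
import torch.nn as nn


class MultiLayerSAE(nn.Module):
    """A deeper encoder, but still a single linear decoder, so each latent
    corresponds to one direction in activation space."""

    def __init__(self, d_model: int, d_hidden: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_latent),
            nn.ReLU(),  # nonnegative latents, as in a standard SAE
        )
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)      # sparse-ish latent code
        x_hat = self.decoder(z)  # linear reconstruction of the activation
        return x_hat, z


def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # same objective as a one-layer SAE: reconstruction + L1 sparsity
    recon = (x - x_hat).pow(2).mean()
    sparsity = z.abs().sum(dim=-1).mean()
    return recon + l1_coeff * sparsity
```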
FYI, I heard that Oliver Sacks fabricated/embellished a lot of the anecdotal accounts in his books. This was a fairly public controversy, so evidence for it can be found on Google.
i would like to know where this question leads, since i in principle like children and animals and yet have no idea what to do with them
I think I see the logic. Were you thinking of making the model good at answering questions whose correct answers depend on the model itself, like "When asked a question of the form x, what proportion of the time would you tend to answer y?"
The previous remark about being a microscope into its dataset seemed benign to me, e.g., if the model were already good at answering questions like "What proportion of datapoints satisfying predicates X satisfy predicate Y?"
But perhaps you also argue that the latter induces some small amount of self-awareness -> situational awareness?
While the particulars of your argument seem to me to have some holes, I actually very much agree with your observation that we don't know what the upper limit of properly orchestrated Claude instances is, and that targeted engineering of Claude-compatible cognitive tools could vastly increase its capabilities.
One idea I've been playing with for a really long time is that the Claudes aren't the actual agents, but instead just small nodes or subprocesses in a higher-functioning mind. If I loosely imagine a hierarchy of Claudes, each corresponding roughly to system-1 or subconscious deliberative processes, with the ability to read from and write to files as a form of "long term memory/processing space" for the...
(paraphrasing would be a markov kernel here, and with the transitivity property I mentioned earlier, I'm asking that it achieves its stationary distribution in one iteration)
if you also want symmetry, this is a very strong condition; you'd only accept "lossless" paraphrasings. i think not only are you achieving the stationary distribution in one iteration, but the distribution cannot change at all, so this is either a markov kernel for every semantically different phrase, or not markov at all.
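spelling that out (a sketch: treating the paraphraser as a row-stochastic matrix $P$ over a finite set of phrases, writing $J_n$ for the all-ones $n \times n$ matrix, and reading "reaches its stationary distribution in one iteration" as idempotence):

```latex
% sketch: P is a row-stochastic paraphrase kernel over a finite set of phrases
\begin{align*}
  P^2 &= P        && \text{(one application already reaches stationarity)} \\
  P   &= P^{\top} && \text{(the symmetry condition)} \\
  \Rightarrow\quad P &\cong \bigoplus_k \tfrac{1}{n_k} J_{n_k}
                     && \text{(up to relabeling phrases)}
\end{align*}
```

if i have the linear algebra right, a nonnegative symmetric idempotent stochastic matrix has to decompose this way, i.e. the kernel can only average uniformly within classes of mutually interchangeable phrases and acts as the identity at the level of those classes, which is the "lossless paraphrasing" claim above.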
There is some danger in this suggestion: it can improve the situational awareness of the LLM.
Why?
i think compute and networking speeds are honestly already enough that most people struggle to take advantage of more of them (streaming video is about the most data-intensive thing a lot of people do, and what's above that is mostly actual computational tasks), so it would take (significant) additional innovations in figuring out how to convert these things into better experiences for this to be tenable. it seems like the line is usually drawn somewhere around gaming enthusiasts (e.g. there is a cohort of people who will buy a more powerful smartphone so it can render graphics better so they can game on their phones more...
Does anyone have a rigorous reference or primer on computer ergonomics, or ergonomics in general? It's hard to find a reference that explains with authority/solid reasoning what good ergonomics looks like and why, and what the solutions to common problems are.
Does anyone have a sense of whether, qualitatively, RL stability has been solved for any practical domains?
This question is at least in part asking for qualitative speculation about how the post-training RL works at big labs, but I'm interested in any partial answer people can come up with.
My impression of RL is that there are a lot of tricks to "improve stability", but performance is path-dependent in pretty much any realistic/practical setting (where state space is huge and action space may be huge or continuous). Even for larger toy problems my sense is that various RL algorithms really only work like up to 70% of the time, and 30% of the time...
it's hard to find definitive information about this basic aspect of how modern RL on LLMs works:
are there any particularly clever ways of doing credit assignment for the tokens in a sequence S that resulted in high reward?
moreover, if you adopt the naive strategy of asserting that all of the tokens are equally responsible for the reward, is the actual gradient update to the model parameters mathematically equivalent to the one you'd get from SFTing the model on S (possibly weighted by the reward, and possibly adjusted by GRPO)?
the followup is this: in this paper they claim that SFT'd models perform badly at something and RL'd models don't. i can't imagine what the difference between these things would even be, except that the RL'd models are affected by samples which are on-policy for them.
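for what it's worth, here's how i'd spell out the naive uniform-credit case (a sketch that ignores KL penalties, clipping, and baselines): the REINFORCE-style gradient on a sampled sequence is exactly the reward-weighted SFT gradient on that sequence, and the remaining differences are that the weight can be an advantage (e.g. group-normalized, as in GRPO) and that the sequences are sampled on-policy.

```latex
% sketch: S = (t_1, \dots, t_n) sampled from the current policy \pi_\theta, scalar reward R(S)
\begin{align*}
  \nabla_\theta J(\theta)
    &= \mathbb{E}_{S \sim \pi_\theta}\big[ R(S)\, \nabla_\theta \log \pi_\theta(S) \big] \\
    &= \mathbb{E}_{S \sim \pi_\theta}\Big[ R(S) \sum_{i=1}^{n} \nabla_\theta \log \pi_\theta(t_i \mid t_{<i}) \Big]
\end{align*}
% the inner sum \sum_i \nabla_\theta \log \pi_\theta(t_i \mid t_{<i}) is the SFT (cross-entropy)
% gradient on S, so uniform credit assignment is reward-weighted SFT on on-policy samples
% (GRPO swaps R(S) for a group-normalized advantage \hat{A}(S))
```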
at the end of the somewhat famous recent blogpost about llm nondeterminism (https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/), they assert that the determinism is enough to make an rlvr run more stable without importance sampling.
is there something i'm missing here? my strong impression is that the scale of the nondeterminism in the result is quite small, and random in direction, so that it isn't likely to affect an aggregate-scale thing like the qualitative effect of an entire gradient update. (i can imagine that the accumulation of many random errors does bias the policy towards being generally less stable, which would imply qualitatively worse behavior, yes...)
without something that addresses the point above, my prior is instead that the graph is cherry-picked, intentionally or not, to increase the perceived importance of llm determinism.
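(for reference, my understanding of the importance-sampling point they're gesturing at, as a sketch: if rollouts are sampled under the inference engine's numerics $\pi_{\mathrm{inf}}$ while gradients are computed under the trainer's numerics $\pi_\theta$, the unbiased estimator carries a per-sequence ratio, and exact determinism makes that ratio identically 1.)

```latex
% sketch: rollouts sampled from the inference engine's policy \pi_{\mathrm{inf}},
% gradients computed under the trainer's policy \pi_\theta
\[
  \nabla_\theta J(\theta)
    = \mathbb{E}_{S \sim \pi_{\mathrm{inf}}}\!\left[
        \frac{\pi_\theta(S)}{\pi_{\mathrm{inf}}(S)}\, R(S)\, \nabla_\theta \log \pi_\theta(S)
      \right]
\]
% with bitwise-identical sampling and training numerics, \pi_{\mathrm{inf}} = \pi_\theta exactly,
% so the ratio is identically 1 and dropping it introduces no bias
```

that part i buy; what i'm unsure about is whether the bias from dropping a ratio that is merely very close to 1 is large enough to explain the graph.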
I tend to think of myself as immune to rage-baiting/click-baiting/general toxicity from social media and politics. I generally don't engage in arguments on classic culture war topics on the internet, and I knowingly avoid consuming much news on the grounds that it will make me feel worse without inducing meaningful change.
But I recently realized that the phenomenon has slightly broader implications: presumably in any medium, outrage is just more attractive to the human brain, and conflicts are entertaining, especially ones where you can take a side or criticize both sides.
This made me realize that this issue isn't constrained to just the forms of media I'm more explicitly cynical about. In particular, some...
Is anyone else noticing that Claude (Sonnet 3.5 new, the default on claude.ai) is a lot worse at reasoning recently? In the past five days or so its rate of completely elementary reasoning mistakes, which persist despite repeated clarification in different ways, seems to have skyrocketed for me.
This post was written under Evan Hubinger’s direct guidance and mentorship, as a part of the Stanford Existential Risks Institute ML Alignment Theory Scholars (MATS) program.
Additional thanks to Ameya Prabhu and Callum McDougall for their thoughts and feedback on this post.
Introduction
I’ve seen that in various posts people will make an offhanded reference to “strategy-stealing” or “the strategy-stealing assumption” without very clearly defining what they mean by “strategy-stealing”. Part of the trouble with this is that these posts are often of wildly different flavors, and it’s often not clear what their connection to each other is, or perhaps it’s unclear under what conditions it might be feasible to “steal the strategy” of an...
hmm, i'd thought of lemon markets ruining basic economic activities in modern life, and i'd also thought of urbanization being the root cause of social isolation, and i've even thought it was better socially when people had economic excuses to form communities, but i've never made the particular connection written about here (that functionally, this makes modern socializing a lemon market). thanks!