Vika

Research scientist at DeepMind working on AI safety, and cofounder of the Future of Life Institute. Website and blog: vkrakovna.wordpress.com

Comments

Optimization Concepts in the Game of Life

Ah I see, thanks for the clarification! The 'bottle cap' (block) example is robust to removing any one cell but not robust to adding cells next to it (as mentioned in Oscar's comment). So most random perturbations that overlap with the block will probably destroy it. 
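A minimal sketch of both claims (assuming numpy, the standard Life rules, and a toroidal grid; the board size and coordinates are arbitrary): removing any one cell leaves an L-tromino that grows back into the block in one generation, while adding a live cell next to the block can destroy the pattern entirely.

```python
import numpy as np

def step(board):
    # Count live neighbors of every cell via the eight toroidal shifts.
    nbrs = sum(np.roll(np.roll(board, dr, 0), dc, 1)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0))
    # Standard Life rule: birth on 3 neighbors, survival on 2 or 3.
    return ((nbrs == 3) | ((board == 1) & (nbrs == 2))).astype(np.uint8)

def run(board, n):
    for _ in range(n):
        board = step(board)
    return board

N = 16
block = np.zeros((N, N), dtype=np.uint8)
block[7:9, 7:9] = 1  # the 2x2 block still life

# Remove any one cell: the remaining L-tromino grows back into the
# block in a single generation.
damaged = block.copy()
damaged[7, 7] = 0
print(np.array_equal(run(damaged, 1), block))  # True

# Add one live cell next to the block: with this particular placement
# the whole pattern dies out within a few generations.
perturbed = block.copy()
perturbed[7, 9] = 1
print(int(run(perturbed, 4).sum()))  # 0 -- the board is empty
```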

Optimization Concepts in the Game of Life

Thanks for pointing this out! We realized that if we consider an empty board an optimizing system, then any finite pattern is an optimizing system (because it's similarly robust to adding non-viable collections of live cells), which is not very interesting. We have updated the post to reflect this.

The 'bottle cap' example would be an optimizing system if it were robust to cells colliding or interacting with it, e.g. being hit by a glider (similar to the eater). 
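One way to make that concrete is to drop a glider on a collision course with the block and check whether an intact block remains afterwards. A rough sketch, again assuming numpy and a toroidal grid: the board size, positions, and generation count are arbitrary, and whether the block survives depends on the exact phase and alignment of the collision (and on wrap-around debris on the torus), so treat the printed answer as a qualitative check rather than a general fact.

```python
import numpy as np

def step(board):
    # Same toroidal Life update as in the sketch above.
    nbrs = sum(np.roll(np.roll(board, dr, 0), dc, 1)
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0))
    return ((nbrs == 3) | ((board == 1) & (nbrs == 2))).astype(np.uint8)

N = 40
board = np.zeros((N, N), dtype=np.uint8)
board[24:26, 24:26] = 1                        # the block

glider = np.array([[0, 1, 0],                  # this orientation travels
                   [0, 0, 1],                  # one cell down-right every
                   [1, 1, 1]], dtype=np.uint8) # four generations
board[2:5, 2:5] = glider                       # on a collision course

for _ in range(160):                           # let the collision play out
    board = step(board)

# Is there still an intact, isolated block where we left it?
intact = bool(board[24:26, 24:26].all()) and int(board[23:27, 23:27].sum()) == 4
print("block survived the glider:", intact)
```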

List of good AI safety project ideas?

Thanks Aryeh for collecting these! I added them to a new Project Ideas section in my AI Safety Resources list.

AI Safety Reading Group

Is this reading group still running? I'm wondering whether to point people to it.

MIRI location optimization (and related topics) discussion

+1 to everything Jacob said about living near London, plus the advantages of being near existing AI safety hubs (DeepMind, FHI, etc.). 

Takeaways from one year of lockdown

As a data point, I found it a net positive to live in a smallish group house (~5 people) during the pandemic. The negotiations around covid protocols were time-consuming and annoying at times, but still manageable because of the small number of people, and seemed worth it for the mental well-being benefits of socializing in person. It also helped that we had been living together for a few years and knew each other pretty well. I can see how this would quickly become overwhelming with more people involved, and result in nothing being allowed if anyone could veto any given activity. 

Classifying specification problems as variants of Goodhart's Law

Writing this post helped clarify my understanding of the concepts in both taxonomies - the different levels of specification and types of Goodhart effects. The parts of the taxonomies that I was not sure how to match up usually corresponded to the concepts I was most confused about. For example, I initially thought that adversarial Goodhart is an emergent specification problem, but upon further reflection this didn't seem right. Looking back, I think I still endorse the mapping described in this post.

I hoped to get more comments on this post proposing other ways to match up these concepts, and I think the post would have had more impact if there were more discussion of its claims. The low level of engagement with this post was an update for me that the exercise of connecting different maps of safety problems is less valuable than I thought. 

"Do Nothing" utility function, 3½ years later?
Answer by Vika, Jul 20, 2020

Hi there! If you'd like to get up to speed on impact measures, I would recommend these papers and the Reframing Impact sequence.

Tradeoff between desirable properties for baseline choices in impact measures

It was not my intention to imply that semantic structure is never needed - I was just saying that the pedestrian example does not indicate the need for semantic structure. I would generally like to minimize the use of semantic structure in impact measures, but I agree it's unlikely we can get away without it. 

There are some kinds of semantic structure that the agent can learn without explicit human input, e.g. by observing how humans have arranged the world (as in the RLSP paper). I think it's plausible that agents can learn the semantic structure needed for impact measures through unsupervised learning about the world, without relying on human input. This information could be incorporated into the weights that the deviation measure assigns to reaching different states or satisfying different utility functions (e.g. states where pigeons / cats are alive). 
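As a toy illustration of that last point (all names, features, and numbers below are hypothetical, not taken from any particular impact measure), a deviation measure could score the difference between the current state and a baseline as a weighted sum over auxiliary value functions, with larger learned weights on semantically important features:

```python
import numpy as np

# Hypothetical names throughout: `aux_values` stands for value estimates
# V_i(s) under auxiliary utility functions, and `weights` for semantic
# weights learned without explicit human input (e.g. RLSP-style inference
# from how humans have arranged the world).

def deviation(state, baseline, aux_values, weights):
    """Weighted sum of |V_i(state) - V_i(baseline)| over auxiliary utilities."""
    diffs = np.array([abs(v(state) - v(baseline)) for v in aux_values])
    return float(weights @ diffs)

# Toy state: (cat alive?, chair position).
cat_alive = lambda s: float(s[0])
chair_pos = lambda s: float(s[1])
weights = np.array([10.0, 0.1])  # learned: cats matter a lot, chairs barely

s_baseline = (1, 3)  # cat alive, chair at position 3
s_after    = (0, 4)  # cat dead, chair moved one step

print(deviation(s_after, s_baseline, [cat_alive, chair_pos], weights))
# -> 10.1, dominated by the semantically important change
```

The interesting part is where the weights come from: the hope described above is that they could be inferred from the world itself rather than specified by hand.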
