Senior research scholar at FHI. My current research interests are mainly the behaviour and interactions of boundedly rational agents, complex interacting systems, and strategies to influence the long-term future, with a focus on AI alignment.

Previously I was a researcher in physics, studying phase transitions, network science and complex systems.


On Destroying the World

There seem to be multiple meta-games

  1. press the button or not
  2. take it as a game | as a serious ritual | as a serious experiment
  3. cooperate or defect on the implicit rule allowing play behaviour ~ "you are allowed to play and experiment in games and this is safe. it is understood actions you take within the game will not be used as evidence of intent outside of the game". (imagine I play a game of chess with someone and interpret my opponent taking my pieces as literally trying to harm me)
  4. the meta-game of making the game interesting; cf munchkin
  5. the meta-game of making the experiment valuable for learning
  6. coordination about which of these games we are playing

To me Chris's story reads like [ press | game | cooperate | ? | ? | ]

Overall while I think there is a lot of value in having a community of people who do not press big red buttons, I also see a lot of value in noticing these other games and "cooperating" in them. 

The rationalist community's location problem

Overall I think the rationalist community is concentrated too much in one hub and the secondary and tertiary hubs are weaker than they should be.

The main negatives are
- this creates a bit of a single-point-of-failure dynamic; imagine the single hub becomes infected by some particularly dangerous meme, or bad community norms
- the single hub is still embedded in the wider society of the place where it is located, introducing some systematic bias (the epistemic climate of the contemporary US seems increasingly scary; Bay rationalists sometimes seem to be overcompensating for the insanities of the broader society)
- the single hub would be vulnerable to a coordinated attack originating from the environment

There are also advantages of a single hub
- in theory in a single hub it is easy to visit people and form connections; in practice it seems this is true in Berkeley, less true in the whole Bay where travel distances are comparable to flight times between European cities

And there is the huge advantage of the Bay
- being close to the nexus of power and the most future-shaping place is extremely important (as explained by Scott and others)

Advantages of more hubs are
- in my view, more hubs could support more strains of thought / more experiments with community / more opportunities where people can lead things
- less fragility
- more of the total available talent used; some people will just not move to the Bay (will not get visas / cannot bear the culture / ...)

Instead of thinking "should we find the location X and move The hub" I would suggest thinking about optimal allocation of people in a structure of networked places

- which secondary hubs should grow / grow faster / be founded
- how to create links; people should consider moving temporarily between the hubs (e.g. for half a year or a year), even in the direction "Bay -> elsewhere" - this is often the best way to form links

What should be avoided
- some "holier-than-thou" dynamic where people who made the sacrifice of moving to the Bay and living there, even if they think it is a terrible place with low quality of life, assume that people who did not make the sacrifice are not sufficiently dedicated to the mission or similar; hence the rest of the world can be ignored


The case for C19 being widespread

Of the many problems of this theory...

Many places need ~10 PCR tests to find one infection, while the group tested is often highly pre-selected, such as "symptomatic people with known contacts". For such a group you should have a much higher prior of infection than for the general population. Some of the numbers proposed in the "tip of the iceberg" framework would actually mean the prior probability of being infected in the "tested group" is lower than in the general population.

With this hypothesis it's very hard to make sense of China. Outside of Hubei, China managed to contain the outbreak in large part by contact tracing & testing. However, if you assume there is some very high number of cases you don't know about, it is difficult to explain why contact tracing can influence anything.

March Coronavirus Open Thread

We are looking for forecasters/"estimators" to help with estimating various COVID-19 parameters, such as number of infected cases, which will go into epidemic modelling, augmenting unreliable reported data. Ideally the end product should be the results of the modelling presented in a good web UI. If you would be interested in helping, reply privately.

Q&A: How does it compare to Metaculus? In a few important ways.

1. the estimates are not the end product, but an input to epidemic modelling software

2. in our UX, we want to clearly communicate the results of the epidemic are not pre-determined, but depend on actions humanity will take

3. we want to expose more of the uncertainties and underlying dynamics, as opposed to static forecasts

Becoming Unusually Truth-Oriented

My best guess gears-level model of what's going on here

  • the "predictive processing engine" has a quite rich model of the world / people / histories / ... whatever
  • a somewhat special domain into which it is "predicting" is thoughts / concepts / language / "the voice in your head" (somewhat overlapping with "S2")
  • with "words on the tip of your tongue", the PP system is trying to find a structure in the "thinking/verbal" domain which would fit the "PP" structure (many people have a pretty specific sense of prediction error if they are missing the right word, which drops when they find it / the word "fits")
  • generally directing attention toward such an interface can greatly increase its throughput/precision

And so here are some caveats

  • This isn't as directly grounded in reality as it may seem
  • The nature of PP is such that model adjustment will be going on both sides (e.g. if I'll be looking at cloud shapes in the sky, and some cloud will start resonating with the concept/word Stegosaurus, my perception will change all the way down toward noticing plate-resembling parts of the cloud, etc.)
  • In particular, with probing in more detail, the PP machinery will generally be able to generate more details; in the case of memories, as you note, the problem is that they are mostly the output of the generative world model inside your head, not of the external world; if your generative world model is precise enough and your attention was focused on something while experiencing it, the recall could be quite reliable
  • The relation of the language/concept space with reality is somewhat complicated... notice that in the above given example with clouds the concept of Stegosaurus is a result of pretty impressive and big cultural computation which happened almost entirely outside of your head

So... while I generally like most of the specific advice, I don't think truth-oriented thinking is a good label. In my view a necessary ingredient for truth orientation, missing here, is strong links between anything happening inside the brain and "the rest of the reality".

Disincentives for participating on LW/AF


1) From the LW user perspective, AF is integrated in a way which signals there are two classes of users, where the AF members are something like "the officially approved experts" (specialists, etc.), together with omega badges, special karma, application process, etc. In such a setup it is hard for the status-tracking subsystem which humans generally have not to care about what is "high status". At the same time: I went through the list of AF users, and it seems a much better representation of something which Rohin called "viewpoint X" than of the field of AI alignment in general. I would expect some subtle distortion as a result.

2) The LW team seems quite keen on e.g. karma, cash prizes on questions, omegas, daily karma updates, and similar technical measures which in S2-centric views bring clear benefits (sorting of comments, credible signalling of interest in questions, creating a high-context environment for experts, ...). Often these likely have some important effects on S1 motivations / social interactions / etc. I've discussed karma and omegas before; creating an environment driven by prizes risks eroding the spirit of cooperativeness and sharing of ideas which is one of the virtues of the AI safety community, and so on. "Herding elephants with small electric jolts" is a poetic description of the effects people's S1 gets from downvotes and strong downvotes.

Disincentives for participating on LW/AF

As a datapoint - my reasons for mostly not participating in discussion here:

  • The karma system messes with my S1 motivations and research taste; I do not want to update toward "LW average taste" - I don't think LW average taste is that great. Also IMO on the margin it is better for the field to add people who are trying to orient themselves in AI alignment independently, in contrast to people guided by "what's popular on LW"
  • Commenting seems costly; feels like comments are expected to be written very clearly and reader-friendly, which is time costly
  • Posting seems super-costly; my impression is many readers are calibrated on the quality of writing of Eliezer, Scott & the like, not on informal research conversation
  • Quality of debate on topics I find interesting is much worse than in person
  • Not the top reason, but still... System of AF members vs. hoi polloi, omegas, etc. creates some subtle corruption/distortion field. My overall vague impression is the LW team generally tends to like solutions which look theoretically nice, and tends to not see subtler impacts on the elephants. Where my approach would be to try to move much of the elephants-playing-status-game out of the way, what's attempted here sometimes feels a bit like herding elephants with small electric jolts.

Epistea Summer Experiment

No. It's planned so you can attend both events.

Habryka's Shortform Feed

FWIW I also think it's quite possible the current equilibrium is decent (which is part of the reason why I did not post something like "How I turned karma off" with simple instructions about how to do it on the forum, which I did consider). On the other hand I'd be curious about more people trying it and reporting their experiences.

I suspect many people kind of don't have this action in the space of things they usually consider - I'd expect what most people would do is 1) just stop posting 2) write about their negative experience 3) complain privately.

Habryka's Shortform Feed

Actually I turned the karma off for all comments, not just mine. The bold claim is that my individual taste in what's good on the EA forum is in important ways better than the karma system, and the karma signal is similar to sounds made by a noisy mob. If I want I can actually predict reasonably well what sounds the crowd will make on average, so it is not any new source of information. But it still messes with your S1 processing and motivations.

Continuing with the party metaphor, I think it is generally not that difficult to understand what sort of behaviour will make you popular at a party, and what sort of behaviours, even when they are quite good in a broader scheme of things, will make you unpopular at parties. Also personally I often feel something like "I actually want to have good conversations about juicy topics in a quiet place, unfortunately all you people are congregating at this super loud space, with all these status games, social signals, and ethically problematic norms about how to treat other people" toward most parties.

Overall I posted this here because it seemed like an interesting datapoint. Generally I think it would be great if people moved toward writing information-rich feedback instead of voting, so such a shift seems good. From what I've seen on the EA forum it's quite rarely "many people" doing anything. More often it is like 6 users upvote a comment, 1 user strongly downvotes it, and something like karma 2 is the result. I would guess you may be at greater risk of the distorted perception that this represents some meaningful opinion of the community. (Also I see some important practical cases where people are misled by "noises of the crowd" and it influences them in a harmful way.)
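To make the "6 upvotes, 1 strong downvote, karma 2" example concrete, here is a rough sketch. The vote weights are hypothetical (each normal upvote counted as +1 and the strong downvote as -4, chosen only so the total matches the example); the forum's actual weights are not stated here.

```python
def net_karma(votes: list[int]) -> int:
    """Displayed score: the plain sum of signed vote weights."""
    return sum(votes)


# Hypothetical weights: six normal upvotes (+1 each), one strong downvote (-4).
votes = [1] * 6 + [-4]

score = net_karma(votes)
upvoters = sum(1 for v in votes if v > 0)
downvoters = sum(1 for v in votes if v < 0)

print(f"displayed karma: {score}")
print(f"voters in favour: {upvoters}, voters against: {downvoters}")
```

The single displayed number hides that six of the seven voters liked the comment, which is the sense in which a low score need not reflect a meaningful community opinion.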
