I'm Ben Weinstein-Raun; I work at MIRI, and am originally from Blacksburg, Virginia.

If you have feedback for me, you can fill out the form at .

Or you can email me, at [the second letter of the alphabet]@[my username].net


benwr's unpolished thoughts

What feels especially good about this way of thinking is that it looks like the kind of problem that has straightforward engineering- or cryptography-style solutions.

benwr's unpolished thoughts

I'm interested in concrete ways for humans to evaluate and verify complex facts about the world. I'm especially interested in a set of things that might be described as "bootstrapping trust".

For example:

Say I want to compute some expensive function f on an input x. I have access to a computer C that can compute f; it gives me a result r. But I don't fully trust C - it might be maliciously programmed to tell me a wrong answer. In some cases, I can require that C produce a proof that f(x) = r that I can easily check. In others, I can't. Which cases are which?

A partial answer to this question is "the complexity class NP". But in practice this isn't really satisfying. I have to make some assumptions about what tools are available that I do trust.
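To make the NP case concrete, here is a minimal sketch of a problem where checking is cheap even if finding the answer is not. It uses subset-sum as a stand-in for f, which is my choice of example and not something from the discussion above; `untrusted_solver` and `verify_subset_sum` are hypothetical names. The untrusted computer C returns a certificate (a list of indices), and I only need to trust the short verifier.

```python
from itertools import combinations


def untrusted_solver(numbers, target):
    """Stand-in for computer C: brute-force search for a subset summing to target.

    Returns a list of indices (the certificate), or None if it claims there is none.
    We do not need to trust this function.
    """
    for size in range(len(numbers) + 1):
        for indices in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in indices) == target:
                return list(indices)
    return None


def verify_subset_sum(numbers, target, certificate):
    """Cheap check of a claimed solution; this is the only part we must trust."""
    if certificate is None:
        return False  # a "no solution" claim comes with nothing we can check here
    if len(set(certificate)) != len(certificate):
        return False  # indices must be distinct
    if any(not isinstance(i, int) or not 0 <= i < len(numbers) for i in certificate):
        return False  # indices must point into the list
    return sum(numbers[i] for i in certificate) == target


numbers = [3, 34, 4, 12, 5, 2]
certificate = untrusted_solver(numbers, 9)
print(verify_subset_sum(numbers, 9, certificate))
```

Note the asymmetry: a "yes" answer comes with a certificate I can check in linear time, but if C says "no subset exists", this scheme gives me nothing to verify. That is one small instance of the "which cases are which?" question.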

Maybe I trust simple mathematical facts (and I think I even trust that serious mathematics and theoretical computer science track truth really well). I also trust my own senses and memory, to a nontrivial extent. Reaching much beyond that is starting to feel iffy. For example, I might not (yet) have a computer of my own that I trust to help me with the verification. What kinds of proof can I accept with the limitations I've chosen? And how can I use those trustworthy proofs to bootstrap other trusted tools?

Other problems in this bucket include "How can we have trustworthy evidence - say videos - in a world with nearly perfect generative models?" and a bunch of subquestions of "Does debate scale as an AI alignment strategy?"

This class of questions feels like an interesting lens on some things that are relevant to some sorts of AI alignment work such as debate and interpretability. It's also obviously related to some parts of information security and cryptography.

"Bootstrapping trust" is basically just a restatement of the whole problem. It's not exactly that I think this is a good way to decide how to direct AI alignment effort; I just notice that it seems somehow like a "fresh" way of viewing things.

Prize: Interesting Examples of Evaluations

"Postmortem culture" from the Google SRE book:

This book has some other sections that are also about evaluation, but this chapter is possibly my favorite chapter from any corporate handbook.

Prize: Interesting Examples of Evaluations
Answer by benwr, Nov 28, 2020

Two that are focused on critique rather than evaluation per se:

benwr's unpolished thoughts

If I got to pick the moral of today's Petrov day incident, it would be something like "being trustworthy requires that you be more difficult to trick than it would be worth", and I think very few people reliably live up to this standard.

benwr's unpolished thoughts

Beth Barnes notices: Rationalists seem to use the word "actually" a lot more than the typical English speaker; it seems like the word "really" means basically the same thing.

We wrote a quick script and found that the words "actually" and "really" occur about equally often on LessWrong, while Google Trends suggests that "really" is ~3x more common in search volume. SSC has ~2/3 as many "actually"s as "really"s.

What's up with this? Should we stop?
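For anyone who wants to repeat the count on their own corpus, here is a minimal sketch of that kind of script. It is not the script we used; `count_words` is a hypothetical helper, and it assumes the text is already available as a plain string (fetching posts is left out).

```python
import re
from collections import Counter


def count_words(text, words):
    """Count case-insensitive whole-word occurrences of each word in `words`."""
    # Tokenize into runs of letters and apostrophes, so "Actually," matches "actually".
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {word: counts[word] for word in words}


sample = "Actually, I really think it's actually fine. Really!"
print(count_words(sample, ["actually", "really"]))
```

Comparing the two counts within the same corpus sidesteps differences in corpus size, which is why a ratio is the natural thing to report.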

Did any US politician react appropriately to COVID-19 early on?

San Francisco's mayor, London Breed, declared a state of emergency in the city on February 25th, and it seems like she was concerned about the disease (and specifically ICU capacity) as early as January.

I don't know what actions the mayor's office actually took during this time, but it seems like she was at least aware and concerned well ahead of most other politicians.

benwr's unpolished thoughts

Darn - I've been playing it on my old iPad for a long time

benwr's unpolished thoughts

Recently I tried to use Google to learn about the structure of the human nasal cavity & sinuses, and it seems to me that somehow medical illustrators haven't talked much to mechanical draftspeople. Just about every medical illustration I could find tried to use colors to indicate structure, and only gave a side-view (or occasionally a front view) of the region. In almost none of the illustrations was it clear which parts of your nasal cavity and sinuses are split down the middle of your head, vs joined together. I still feel pretty in-the-dark about it.

In drafting, you express 3D figures by drawing a set of multiple projections: typically, you give a top view, a front view, and a side view (though other views, including cross-sections and arbitrary isometric perspectives, may be useful or necessary). This lets you give enough detail that a (practiced) viewer can reconstruct a good mental model of the object, so that they can (for example) use their machine shop to produce the object out of raw material.

There's a pretty fun puzzle game called ".projekt" that lets you practice this skill; there are probably lots more.
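As a toy illustration of why a side view alone can't answer the split-vs-joined question, here is a small sketch. The two voxel shapes and the axis convention (x left-right, y front-back, z up-down) are made up for the example and are not a model of real anatomy.

```python
def projections(voxels):
    """Return (top, front, side) orthographic silhouettes of a set of (x, y, z) cells.

    Assumed axes: x is left-right, y is front-back, z is up-down.
    """
    top = {(x, y) for x, y, z in voxels}    # looking down: drop z
    front = {(x, z) for x, y, z in voxels}  # looking from the front: drop y
    side = {(y, z) for x, y, z in voxels}   # looking from the side: drop x
    return top, front, side


# Two chambers separated by a gap at the midline (x = 0) ...
split = {(x, y, 0) for x in (-1, 1) for y in range(3)}
# ... versus one chamber that spans the midline.
joined = {(x, y, 0) for x in (-1, 0, 1) for y in range(3)}

split_top, split_front, split_side = projections(split)
joined_top, joined_front, joined_side = projections(joined)

print(split_side == joined_side)    # the side views are identical
print(split_front == joined_front)  # the front views differ
```

The side silhouettes match exactly, so no amount of coloring on a side view can show the gap; the front or top view is what carries that information.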
