Someone asked me about this, so here are my quick thoughts.

Although I've learned a lot of math over the last year and a half, it still isn't my comparative advantage. What I do instead is,

Find a problem

that seems plausibly important to AI safety (low impact), or a phenomenon that's secretly confusing but not really explored (instrumental convergence). If you're looking for a problem, corrigibility strikes me as another thing that meets these criteria, and is still mysterious.

Think about the problem

Stare at the problem on my own, ignoring any existing thinking as much as possible. Just think about what the problem is, what's confusing about it, what a solution would look like. In retrospect, this has helped me avoid anchoring myself. Also, my prior for existing work is that it's confused and unhelpful, and I can do better by just thinking hard. I think this is pretty reasonable for a field as young as AI alignment, but I wouldn't expect this to be true at all for e.g. physics or abstract algebra. I also think this is likely to be true in any field where philosophy is required, where you need to find the right formalisms instead of working from axioms.

Therefore, when thinking about whether "responsibility for outcomes" has a simple core concept, I nearly instantly concluded it didn't, without spending a second glancing over the surely countless philosophy papers wringing their hands (yup, papers have hands) over this debate. This was the right move. I just trusted my own thinking. Lit reviews are just proxy signals of your having gained comprehension and coming to a well-considered conclusion.

Concrete examples are helpful: at first, thinking about vases in the context of impact measurement was helpful for getting a grip on low impact, even though it was secretly a red herring. I like to be concrete because we actually need solutions - I want to learn more about the relationship between solution specifications and the task at hand.

Make simplifying assumptions wherever possible. Assume a ridiculous amount of stuff, and then pare it down.

Don't formalize your thoughts too early - you'll just get useless mathy sludge out on the other side, the product of your confusion. Don't think for a second that having math representing your thoughts means you've necessarily made progress - for the kind of problems I'm thinking about right now, the math has to sing with the elegance of the philosophical insight you're formalizing.

Forget all about whether you have the license or background to come up with a solution. When I was starting out, I was too busy being fascinated by the problem to remember that I, you know, wasn't allowed to solve it.

Obviously, there are common-sense exceptions to this, mostly revolving around trying to run without any feet. It would be pretty silly to think about logical uncertainty without even knowing propositional logic. One of the advantages of immersing myself in a lot of math isn't just knowing more, but knowing what I don't know. However, I think it's rare to secretly lack the basic skills to even start on the problem at hand. You'll probably know if you are, because all your thoughts keep coming back to the same kind of confusions about a formalism, or something. Then, you look for ways to resolve the confusion (possibly by asking a question on LW or in the MIRIx Discord), find the thing, and get back to work.

Stress-test thoughts

So you've had some novel thoughts, and an insight or two, and the outlines of a solution are coming into focus. It's important not to become enamored with what you have, because it stops you from finding the truth and winning. Therefore, think about ways in which you could be wrong, situations in which the insights don't apply or in which the solution breaks. Maybe you realize the problem is a bit ill-defined, so you refactor it.

The process here is: break the solution, deeply understand why it breaks, and repeat. Don't get stuck with patches; there's a rhythm you pick up on in AI alignment, where good solutions have a certain flavor of integrity and compactness. It's OK if you don't find it right away. The key thing to keep in mind is that you aren't trying to pass the test cases, but rather to find brick after brick of insight to build a firm foundation of deep comprehension. You aren't trying to find the right equation, you're trying to find the state of mind that makes the right equation obvious. You want to understand new pieces of the world, and maybe one day, those pieces will make the difference.

ETA: I think a lot of these skills apply more broadly. Emotional trust in one's own ability to think seems important for taking actions that aren't e.g. prescribed by an authority figure. Letting myself just think lets me be light on my mental feet, and bold in where those feet lead me.

ETA 2: Apparently simulating drop-caps:

𝕃ike this

isn't the greatest idea. Formatting edit.



These are great suggestions for the thinking part of doing research.

For people who have difficulty with the first part – finding a good problem – I recommend the classic The Craft of Research. It also has practical guidance about writing down your results.

𝕀 like the creative use of blackboard bold to simulate drop caps!

I, on the other hand, consider it a crime against typography. :(

Agreed this attempt was kind of criminal, but still inspiring in its own way. Modernism has been making me tired. The internet could really use some drop caps, flourishes, flower-filled margins, and random weird drawings in the middle of text.

Yes, I agree. But this attempt at “drop caps” is an insult to drop caps.

Compare these drop caps on

Yeah, he's doing it right :-)

(I should note that I am not at all unbiased in recommending the given examples as “the right way”, as I did the current site design… but, on that note, thank you for the endorsement! :)

Congrats on the great work then! Maybe ask him about flowers in the margins?

ℍow can 𝕀 make amends, 𝕊aid?

ETA: the special formatting apparently doesn't show up on the front page's comment feed. Weird.

For what it's worth, I'm with Said rather than Zack on this one.

(It would make more sense if these initial letters were associated with a mnemonic or something; then there would be a reason for emphasizing a bunch of first letters. But it seems to have been done just for, I dunno, fun.)

Bullets wouldn’t work because some tips had several paragraphs, and it would have been awkward to make a new subsubsection with e.g. two sentences (make simplifying assumptions). So, I did this, and I, like Zack, liked how it looks.

Instead you should have used the “first several words are in small caps” technique (example).

FYI, I'm told that normal users are not allowed to use this kind of formatting, but the admins can edit it in.

ETA maybe you could do , though.

Boldface the first few words.

That's a good, simple idea, thanks! I'll consider doing that.

If you’re stooping to the use of Unicode alternates for your “drop-caps”, then there’s no reason not to do the same for small caps, yes?

(But note that I do not actually advocate doing any of this. None of this—not Unicode “blackboard bold”, not Unicode small caps, not anything along such lines—is an appropriate use of Unicode; it is harmful to accessibility, searchability, archivability, etc., and is generally a gross violation of separation of concerns.)
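Said's point about searchability is easy to demonstrate concretely (a small illustration of my own, not from the thread): a Unicode "blackboard bold" glyph is a different code point from the letter it imitates, so naive text search fails on it, and a search engine or archiver has to apply compatibility normalization to recover the plain text.

```python
# Why Unicode pseudo-formatting hurts searchability: the styled glyph is a
# different code point from the letter it imitates.
import unicodedata

styled = "𝕀 like the creative use of blackboard bold"
plain = "I like the creative use of blackboard bold"

# Naive substring search does not match the styled text.
assert "I like" not in styled
assert "I like" in plain

# NFKC compatibility normalization folds the styled glyph back to a plain "I",
# which is the repair step a search engine or screen reader would need.
assert unicodedata.normalize("NFKC", styled) == plain
```

The same applies to Unicode small caps and similar alternates: the visual styling is baked into the code points rather than kept in a separate presentation layer, which is the separation-of-concerns violation Said describes.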

Meta: at first, I found the tone of this thread to be fun and nerdy, but I'm quickly changing my mind. In fairness to you and the other commenters, I specified no moderation guidelines on this post. However, here are my thoughts on this kind of thing going forward:

I don't at all mind people debating my formatting choices, or saying they didn't like something I did. Here's a good comment:

As an aside, I see that you tried to employ makeshift drop caps, but I don't think it works well like that. Personally, I prefer using [technique X], so you might consider that instead. Additionally, it has the benefits of Y and Z. ✔️

Here's a bad comment:

this attempt at “drop caps” is an insult to drop caps. ❌

Here's another bad comment:

If you're stooping to the use of Unicode alternates for your "drop-caps"... ❌

This is needlessly abrasive, and I won't tolerate it on my posts in the future.

Your moderation norms are, of course, yours to declare and enforce, but I must note that this:

As an aside, I see that you tried to employ makeshift drop caps, but I don’t think it works well like that.

… does not, actually, have the same meaning as this:

this attempt at “drop caps” is an insult to drop caps.

I am happy to be as civil as you like, but what you propose sacrifices communication for civility. Once again, enforcing such a sacrifice is your right, by Less Wrong’s rules, but then you should know that the given norms will mean that certain information will simply not reach you.

Some additional thoughts on this (don’t feel that you need to respond if you don’t want to):

It has been said (and we generally take it to be true) that if you say “I’ll try to do X”, then what you really mean is that you’re going to try to try to do X; you might not, in the end, actually try to do X.

It has also been said (and we generally take it to be true) that if you say “I believe X”, or “I believe in X”, then what you’re expressing isn’t a belief in X, but rather a belief that you believe X. Perhaps you actually believe X; perhaps not.

The pattern here is similar to (though not identical with) those patterns. The following two statements are, in fact, expressing different things:

  1. This doesn’t work well.

  2. I don’t think this works well.

Likewise, these two statements are expressing different things:

  1. X is better than Y.

  2. I prefer X to Y.

And, again, these two statements are expressing different things:

  1. X is terrible.

  2. X is sub-optimal.

Consider a norm that you should never make object-level claims; you must only state explicitly that you believe so-and-so. What effect does this have, on our ability to discern and discuss the above distinctions?

I see that you're trying to enforce civility norms. Personally, I prefer the equilibrium where authors develop a slightly thicker skin to arguably-abrasively-phrased substantive criticisms, rather than the one where we all have to tiptoe on eggshells around each other and expend effort trying to figure out how to rephrase criticisms in a way that doesn't threaten anyone's ego.

I claim that abrasiveness-tolerant "thick skin" norms are more efficient for intellectual progress. Think of it from an AI design perspective: if you design a reinforcement learner that can only accept positive feedback, or only small-magnitude negative feedback, that AI is going to be a less efficient learner than one that can take proportionately larger punishments when it makes proportionately larger mistakes. So too with human discourse norms. If you're only allowed to say "Personally I prefer ... you might consider", then you have been deprived of the expressive language that you could have used to communicate the distinction between cases where you personally prefer something and cases where your interlocutor's idea is actually stupid and an insult to the good version of whatever they were trying to do.
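Zack's reinforcement-learning analogy can be made concrete with a toy sketch (my own illustration, not from the thread): a learner whose negative feedback is clipped to zero cannot distinguish a bad option from a merely neutral one.

```python
# Toy two-armed bandit with deterministic rewards (+1.0 and -1.0),
# illustrating the claim that a learner deprived of proportionate
# negative feedback learns less.

def run_bandit(clip_negative, steps=100, lr=0.1):
    """Track running value estimates for two arms, pulled alternately."""
    true_rewards = [1.0, -1.0]
    estimates = [0.0, 0.0]
    for t in range(steps):
        arm = t % 2
        reward = true_rewards[arm]
        if clip_negative:
            reward = max(reward, 0.0)  # the punishment signal is suppressed
        estimates[arm] += lr * (reward - estimates[arm])
    return estimates

full = run_bandit(clip_negative=False)
clipped = run_bandit(clip_negative=True)
# The full-feedback learner drives its estimate of the bad arm toward -1.0;
# the clipped learner leaves it at 0.0, as if the arm were merely neutral.
```

Under clipping, "this arm is terrible" and "this arm is nothing special" produce identical feedback, which is exactly the lost distinction Zack is pointing at.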

(Additionally, I think the "thick skin" norms also have the potential to be more fun, but that's more subjective and harder to establish.)

It's interesting that your template example of a good comment includes reasons for the stated opinion ("it has the benefits of Y and Z"), but your examples of bad comments don't: someone who just read your comment out of context might walk away with the impression that you were reacting to content-free abrasiveness.

But I think Said is actually being pretty specific as well as concise (which could read as brusque, which could be read as abrasive). "If you're stooping to Unicode pseudo-drop-caps, there's no reason not to use Unicode small caps" is an appeal to argumentative local validity: the point is that the same rationale supports Unicode blackboard bold pseudo-drop-caps and Unicode small caps, so that the choice of one over the other (or, as Said would prefer, neither) should be made on typographical grounds; it's not consistent to appeal to technical limitations only in the case of small caps.

As I see it, Said is helping you out by offering you this argument (and possibly informing or reminding you of the existence of Unicode small caps). Are you going to reject his gift because you don't like the word "stooping"? Isn't that kind of rude and ungrateful of you?!

Obviously Turntrout gets to decide whether he wants to host this subthread on his post. 

But, noting: part of the point of having author moderation norms is that authors get to enforce them without having to defend their right to every single time. I do think there's plenty worth discussing here but suspect it makes more sense to do so on a separate thread directly addressing the meta-norm, which wasn't Turntrout's decision.

I see that you're trying to enforce civility norms.

Very clever.

You're right that Said's criticism was substantive, and I didn't mean to downplay that in my comment. I do, in fact, think that Said is right: my formatting messes with archiving and search, and there are better alternatives. He has successfully persuaded me of this. In fact, I'll update the post after writing this comment!

The reason I made that comment is that I notice his tone makes it harder for me to update on and accept his argumentation. Although an ideal reasoner might not mind, I do. The additional difficulty of updating is a real cost, and the tone just seemed consistently unreasonable for the situation.

I don't think we should just prioritize authors' simply getting thicker skin, although I agree it's good for authors to strive for individually. Here is some of my reasoning.

  • Suppose I were a newcomer to the site, I wrote a post about my research habits, and then I received this comment thread in return. Would I write more posts? 2017-me would not have done so. Suppose I even saw this happening on someone else's post. Would I write posts? No. I, like many people I have anecdotally heard of, was already intimidated by the perceived harshness of this site's comments. I think you might not currently be appreciating how brutal this site can look at first. If there are tradeoffs we can make along the lines of saying "if you're resorting to X" instead of "if you're stooping to X", tradeoffs that don't really lose much informational content but do significantly reduce potential for abrasion, it seems sensible to make them.

  • Truly thickening one's skin seems pretty difficult. Maybe I can just sit down and Internal Double Crux this, but even if so, do we just expect authors to do this in general?

  • Microhedonics. If one has reasonably but imperfectly thick skin, then the author might be slightly discouraged from engaging with the community. Obviously there is a balance to be struck here, but the line I drew does not seem unreasonable to me.

ETA: My comment also wasn't saying that people have to specifically follow the scripted example. They don't need to say they just prefer X, or whatever. The "good" example is probably overly flowery. Just avoid being needlessly abrasive.

a newcomer to the site [...] engaging with the community

Recruiting newcomers and maximizing engagement have costs (in the form of making it harder to maintain the culture that made the community valuable in the first place) as well as benefits. See Ben Hoffman's "On the Construction of Beacons" for a longer argument along these lines.

the tone just seemed consistently unreasonable for the situation.

I see how it could be read as unreasonably hostile, but I read it as unreasonably passionate about typography. The principle of charity recommends the latter reading.

If there are tradeoffs we can make along the lines of saying "if you're resorting to X" instead of "if you're stooping to X", tradeoffs that don't really lose much informational content but do significantly reduce potential for abrasion

I can think of two ways to reply to this.

The first reply. While individual writers would do well to carefully weigh the shades of meaning between "stoop" and "resort" when composing their comments, I fear such fine distinctions aren't well suited for intersubjectively applicable norms, which need to be robust over the vagaries of many authors' goals, talents, and writing styles. A key feature that makes "No direct personal attacks" a great norm is that it's a stable Schelling point: it's not hard to get broad agreement, and therefore common knowledge, that "You're stupid" is a personal attack, and "The idea that you expressed in this post is wrong and harmful" is not. Common knowledge makes norm-enforcement a lot easier by circumventing the possibility of plausible-deniability games: if I know that "You're stupid" is a forbidden personal attack, and the mods know that I know, &c. then the elephant in my brain won't be tempted to say it and then plead ignorance of the law, because no one's going to buy that. When we get into the weeds of litigating which of various near-synonyms have which effects on reader microhedonics, this is no longer true.

The second reply. Don't lose "much" informational content? Alex, you're a goddamned alignment researcher. The human species is facing an impossible problem. Lives per second depend on you thinking more clearly than the heroes of scientific legend, seeing more deeply into the true secrets of mind and cognition than all who have come before you! Ten years from now, a fellow researcher finds a flaw in your proof that the AI your institute is deploying next week is low-impact. Do you want them to spend time fretting about exactly how to phrase their comment so as to not hurt your feelings? Or do you want them to tell you about the flaw as soon as possible and as clearly as possible so that you can not destroy the world?! This "backslide from [philosophy club] norms towards more diplomatic norms" will be the death of us all!

Don't lose "much" informational content? Alex, you're a goddamned alignment researcher... Do you want them to spend time fretting about exactly how to phrase their comment so as to not hurt your feelings?

This is a straw interpretation of what I'm trying to communicate. This argument isn't addressing the actual norm I plan on enforcing, and seems to instead cast me as walling myself off from anything I might not like.

The norm I'm actually proposing is that if you see an easy way to make an abrasive thing less abrasive, you take it. If the thing still has to be abrasive, that's fine. Remember, I said

needlessly abrasive

Am I to believe that if people can't say the thing that first comes to mind in the heat of the moment, there isn't any way to express the same information? What am I plausibly losing out on by not hearing "stooping to X" instead of "resorting to X"? That Said first thought of "stooping"? That he's passionate about typography?

I don't see many reasonable situations where this kind of statement

The thing you're doing really doesn't work, because [reason X]. You should do [alternative] because [reason Y].

doesn't suffice (again, this isn't a "template"; it's just an example of a reasonable level of frankness).

I've been actively engaged with LW for a good while, and this issue hasn't come up until now (which makes me think it's uncommon, thankfully). Additionally, I've worked with people in the alignment research community. No one seems to have had trouble communicating efficiently in a minimally abrasive fashion, and I don't see why that will change.

I don't currently plan on commenting further on this thread, although I thank both you and Said for your contributions and certainly hope you keep commenting on my posts!

The principle of charity recommends the latter reading.

If someone has a thin skin, that almost by definition means that their subconscious is running on something opposite to the principle of charity: the principle of erring-on-the-side-of-assuming-that-people-who-say-potentially-hostile-things-are-in-fact-hostile. You can't just tell someone to apply the principle of charity in that case; that's applying a counteractive strategy to a deeply felt emotional experience, and as such ineffective.

But this is not the actual test, is it? I mean, it's all well and good, and lets you generate ideas and discard obviously unfortunate ones beforehand - but it isn't itself the thing that slays or upholds. (I know nothing about AI research apart from the fact that it is a thing.)
