All of strawberry calm's Comments + Replies

I agree that Pinker's advice is moderate — e.g. he doesn't prohibit authors from self-reference.

But this isn't because classic style is moderate — actually classic style is very strict — e.g. it does prohibit authors from self-reference.

Rather, Pinker's advice is moderate because he only weakly endorses classic style. His advice is "use classic style, except in rare situations where doing so would be bad on other metrics."

If I've read him correctly, then he might agree with all the limitations of classic style I've mentioned.

(But maybe I've misread Pinker. Maybe he endorses classic style absolutely but uses "classic style" to refer to a moderate set of rules.)

Richard Korzekwa · 5d
I agree that classic style as described by Thomas and Turner is a less moderate and more epistemically dubious way of writing than what Pinker endorses. For example, from chapter 1 of Clear and Simple as the Truth:

I also agree that it is a bad idea to write in a maximally classic style in many contexts. But I think that many central examples of classic style writing are:

  1. Not in compliance with the list of rules given in this post
  2. Better writing than most of what is written on LW

It is easy to find samples of writing used to demonstrate characteristics of classic style in Clear and Simple as the Truth that use the first person, hedge, mention the document or the reader, or use the words listed in the "concepts about concepts" section. (To this post's credit, it is easy to get the impression that classic style does outright exclude these things, because Thomas and Turner, writing in classic style, do not hedge their explicit statements about what is or is not classic style, presumably because they expect the reader to see this clearly through examples and elaboration.)

Getting back to my initial comment, it is not clear to me what kind of writing this post is actually about. It is hard to identify without examples, especially when the referenced books on style do not seem to agree with what the post is describing.
As I understand the linked text, EURISKO just played the game; it did not compare the spirit of the game with the rules as written. The latter would require general knowledge about the world at the level of current language models.

When reading an academic paper, you don't find it useful when the author points out their contributions? I definitely do. I like to know whether the author asserts X because it's the consensus in the field, or whether the author asserts X because that's the conclusion of their data. If I later encounter strong evidence against X, then this difference matters — it determines whether I update against that particular author or against the whole field.

It's a matter of taste, maybe. Honestly, I don't think I have ever found it useful (outside refereeing). I was recently reading about how to quantify a particular thing. Instead of providing the equation in a self-contained way (which would have taken 3 lines of maths and 2 sentences), the paper explained it sideways: first giving someone else's (wrong) suggestion, then explaining how they had modified it. I really just wanted the right method stated clearly. Providing the whole apparatus of a wrong method, followed by text explaining what changes will make it right, makes it clearer who discovered what, but it's really bad for the usability of the paper.

Writing can definitely be overly "self-aware" sometimes (trust me, I know!), but "classic style" is waaaayyy too restrictive.

My rule of thumb would be:

Write sentences that are maximally informative to your reader.

If you know that X, and you expect that the reader's beliefs about the subject matter would significantly change if they also knew X, then write that X.

This will include sentences about the document and the author — rather than just the subject.

I don’t think people do all the hedging because they think it would be informative. That’s the problem. The case for advocating classic style is that people are either bad at using meta-discourse well, or they’re unconsciously using it in pursuit of a goal other than being maximally informative. By eliminating it and sticking with classic style, the writer comes closer to being maximally informative in most cases.

Avoiding hedging is only one aspect of classic style. I too would recommend against hedging, but I would replace it with more precise expressions of uncertainty.

I think classic style is bad in all the situations where Pinker endorses it:

  • Academic papers
  • Non-fiction books
  • Textbooks
  • Blog posts
  • Manuals

This is because I can't think of any situations where the five limitations I mention would be appropriate.

I think academic papers could benefit from more of this classic style (not taken the whole way). Often I see "in section III we address the impact of the RW approximation." I scroll down; the title of Section III is "Impact of the RW approximation." So that was pointless.

Often I see "In contrast to the analysis of Whoever et al., we here account for foo via a blah blah blah" and similar. These serve to neatly partition the credit, drawing a line in the sand around what is new in this paper. As a reviewer these statements are useful, but as an ordinary reader of the paper they are a waste of space. I came to learn about apples, not about the division of novelty points between apple studiers. Ideally this kind of metadata could be contained in a "letter to the referees" linked to the paper online.

Although recommending full-on classic style for anything seems nuts. "This manual describes the operation of the widget type A27; it is not suitable for other models." - Forbidden.

Quick remarks and questions:

  1. AI developers have been competing to solve purely-adversarial, zero-sum games like Chess and Go. Diplomacy, in contrast, is semi-cooperative. Will it be safer if AGI emerges from semi-cooperative games than from purely-adversarial games?
  2. Is it safer if AGI can be negotiated with?
  3. No-Press Diplomacy was solved by DeepMind in 2020. Meta AI has just solved Full-Press Diplomacy. The difference is that in No-Press Diplomacy the players can't communicate, whereas in Full-Press Diplomacy the players can chat for 5 minutes between r
... (read more)
Re 3: The Cicero team concedes they haven't overcome the challenge of maintaining coherency in chatting agents. They think they got away with it because 5 minutes is too short, and they expect games with longer negotiation periods to be more challenging.
Daniel Kokotajlo · 13d
My brief opinions:

  1. Slightly lower probability of s-risk, approximately the same probability of x-risk.
  2. Slightly lower probability of s-risk, approximately the same probability of x-risk.
  3. Prior to Cicero, I thought full-press diplomacy was significantly more difficult, due to the politics aspect. Now I guess it wasn't actually significantly more difficult.
  4. Not sure.
  5. No.
  6. No.

EA is constrained by the following formula:

Number of Donors x Average Donation = Number of Grants x Average Grant
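The formula is just a conservation identity: money in from donors must equal money out in grants. A minimal sketch of what losing a big donor forces, with entirely made-up numbers (donor counts, donation sizes, and grant counts are all hypothetical):

```python
def total_funding(num_donors, avg_donation):
    """Money in. By the identity, num_grants * avg_grant must equal this."""
    return num_donors * avg_donation

# Hypothetical: 10,000 donors giving $5,000 on average.
budget = total_funding(10_000, 5_000)  # $50,000,000

# Losing a single donor who gave $40M (far above average) shrinks the pool:
new_budget = budget - 40_000_000  # $10,000,000

# With the number of grants held fixed, the average grant must shrink
# to keep the identity balanced:
num_grants = 1_000
new_avg_grant = new_budget / num_grants  # $10,000 per grant
```

The point of the sketch is that the four responses listed below are the only levers: each side of the identity has exactly two factors to adjust.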

If we lose a big donor, there are four things EA can do:

  1. Increase the number of donors:
    1. Outreach. Community growth. Might be difficult right now for reputation reasons, though fortunately, EA was very quick to denounce SBF.
    2. Maybe lobby the government for cash?
    3. Maybe lobby OpenAI, DeepMind, etc for cash?
  2. Increase average donation:
    1. Get another billionaire donor. Presumably, this is hard because otherwise EA would've done it already, but there might be f
... (read more)

Are you saying that it's too early to claim "SBF committed fraud", or "SBF did something unethical", or "if SBF committed fraud, then he did something unethical"?

I think we have enough evidence to assert all three.

Alex Flint · 25d
The direct information I'm aware of is (1) CZ's tweets about not acquiring, (2) SBF's own tweets yesterday, (3) the leaked P&L doc from Alameda. I don't think any of these are sufficient to decide "SBF committed fraud" or "SBF did something unethical". Perhaps there is additional information that I haven't seen, though. (I do think that if SBF committed fraud, then he did something unethical.)

Thanks for the comments. I've made two edits:

There is a spectrum between two types of people, K-types and T-types.


I've tried to include views I endorse in both columns; however, most of my own views are in the right-hand column because I am more K-type than T-type.

You're correct that this is a spectrum rather than a strict binary. I should've clarified this. But I think it's quite common to describe spectra by their extrema, for example:

Also, in the correct contrarian cluster, atheism is listed twice.

You could still be doing perfect Bayesian reasoning regardless of your prior credences. Bayesian reasoning (at least as I've seen the term used) is agnostic about the prior, so there's nothing defective about assigning a low prior to programs with high time-complexity.
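A minimal sketch of the point that Bayesian updating is well-defined for any prior, including one that penalizes time-complexity (the two "programs" and all the numbers here are hypothetical):

```python
def bayes_update(priors, likelihoods):
    """Posterior ∝ prior × likelihood; coherent for any normalized prior."""
    unnorm = {h: priors[h] * likelihoods[h] for h in priors}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Hypothetical hypotheses: a 'fast' program (low time-complexity)
# and a 'slow' one. Penalizing the slow program in the prior is
# just as legitimate as any other prior choice:
time_penalizing_prior = {"fast": 0.9, "slow": 0.1}
likelihoods = {"fast": 0.2, "slow": 0.8}  # how well each predicts the data

posterior = bayes_update(time_penalizing_prior, likelihoods)
```

Nothing in the update rule breaks when the prior encodes time-complexity rather than description length; the posterior is still a proper probability distribution.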

This is true in the abstract, but the physical world seems to be such that difficult computations are done for free in the physical substrate (e.g., when you throw a ball, this seems to happen instantaneously, rather than having to wait for a lengthy derivation of the path it traces). This suggests a correct bias in favor of low-complexity theories regardless of their computational cost, at least in physics.

when translating between proof theory and computer science:

(computer program, computational steps, output) is mapped to (axioms, deductive steps, theorems) respectively.

kolmogorov-complexity maps to "total length of the axioms" and time-complexity maps to "number of deductive steps".

I see, with that mapping your original paragraph makes sense. Just want to note though that such a mapping is quite weird and I don't really see a mathematical justification behind it. I only know of the Curry-Howard isomorphism as a way to translate between proof theory and computer science, and it maps programs to proofs, not to axioms.

what do you mean "the solomonoff prior is correct"? do you mean that you assign high prior likelihood to theories with low kolmogorov complexity?

this post claims: many people assign high prior likelihood to theories with low time complexity. and this is somewhat rational for them to do if they think that they would otherwise be susceptible to fallacious reasoning.

I mean it is so fundamentally correct that it is just how statistical learning works — all statistical learning systems that actually function well approximate Bayesian learning (which uses a Solomonoff/complexity prior). This includes the brain and modern DL systems, which implement various forms of P(M|E) ∝ P(E|M) P(M) — i.e., they find approximate models which 'compress' the data by balancing predictive capability against model complexity.
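The fit-vs-complexity balance described above can be sketched as a MAP comparison with a description-length prior P(M) = 2^(-K(M)); the candidate models, their log-likelihoods, and their complexities in bits are all made up for illustration:

```python
import math

def log_posterior(log_likelihood, complexity_bits):
    """log P(M|E) ∝ log P(E|M) + log P(M), with P(M) = 2^(-complexity_bits)."""
    return log_likelihood + complexity_bits * math.log(0.5)

# Hypothetical: a simple model fits the data slightly worse,
# but is much shorter to describe.
simple = log_posterior(log_likelihood=-10.0, complexity_bits=5)
complex_ = log_posterior(log_likelihood=-8.0, complexity_bits=50)

# The complexity prior dominates the small gap in fit:
best = max([("simple", simple), ("complex", complex_)], key=lambda t: t[1])
```

The extra 45 bits of description length cost far more posterior mass than the 2 nats of improved fit buy back, so the shorter model wins — the "compression" tradeoff in miniature.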

To me it seems that depending on algorithmic innovations, as opposed to the improvements in compute that would increase parameter counts, might just as well make timelines longer.

I'll give you an analogy:

Suppose your friend is running a marathon. You hear that at the halfway point she has a time of 1 hour 30 minutes. You think "okay I estimate she'll finish the race in 4 hours". Now you hear she has been running with her shoelaces untied. Should you increase or decrease your estimate?

Well, decrease. The time of 1:30 is more impressive if you learn her shoe... (read more)

This analogy is misleading because it pumps the intuition that we know how to generate the algorithmic innovations that would improve future performance, much as we know how to tie our shoelaces once we notice they are untied. This is not the case. Research programmes can and do stagnate for long periods because crucial insights are hard to come by and hard to implement correctly at scale. Predicting the timescale on which algorithmic innovations occur is a very different proposition from predicting the timescale on which it will be feasible to increase parameter count.
Matt Goldenberg · 3mo
It's not clear to me that this is the case. You have found both evidence that there are large increases available, AND evidence that there is one less large increase than previously. It seems to depend on your priors which way you should update about the expectation of finding similar increases in the future.

Google owns DeepMind, but it seems that there is little flow of information back and forth.

Example 1: Google Brain spent approximately $12M to train PaLM, and $9M was wasted on suboptimal training because DeepMind didn't share the Hoffmann et al. 2022 results with them.

Example 2: I'm not a lawyer, but I think it would be illegal for Google to share any of its non-public data with DeepMind.