Joseph Miller





Any update on when this might happen?

In a transformer, the compute cost for context length n grows at O(n^2)[4], so it's a 16x increase in compute cost to go from 2000 tokens to 8000, and another 16x increase to go to 32000. To the best of my knowledge, there isn't much additional cost to a longer context window - the number of parameters to encode more positions is very small for a model this big.

I do not understand this paragraph; it seems like the first sentence contradicts the second.

Edit: I think I understand. Are you saying there isn't much additional cost on top of the cost mentioned in the previous sentence because the position encoding is tiny compared to everything else in the model?
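To make the arithmetic in the quoted paragraph concrete, here is a minimal Python sketch. It models only the quadratic attention-score term. The per-token MLP and projection costs grow linearly in n, so total compute grows by less than 16x per 4x of context. It also assumes a learned absolute position table with one d_model-sized vector per position. The width and total parameter count plugged in are GPT-3 175B's published figures, used purely as an assumed stand-in for "a model this big". Many models use rotary or sinusoidal encodings, which have no learned position parameters at all.

```python
def attention_cost_ratio(n_old: int, n_new: int) -> float:
    """Relative cost of the O(n^2) attention term when context grows from n_old to n_new."""
    return (n_new / n_old) ** 2


def learned_pos_embedding_params(n_ctx: int, d_model: int) -> int:
    """Parameters in a learned absolute position table: one d_model vector per position."""
    return n_ctx * d_model


# Assumption: GPT-3 175B's published width and size, used only to give a sense of scale.
D_MODEL = 12288
TOTAL_PARAMS = 175e9

print(attention_cost_ratio(2000, 8000))   # 16.0
print(attention_cost_ratio(8000, 32000))  # 16.0

pos = learned_pos_embedding_params(32000, D_MODEL)
print(pos)                                # 393216000
print(f"{pos / TOTAL_PARAMS:.2%}")        # 0.22%
```

Under these assumptions, even a 32,000-position table is a fraction of a percent of the parameters, which is one reading of "not much additional cost" on top of the quadratic compute.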

I've also noticed this. I think the biggest factor is that search makes it less useful because it bases its answers too heavily on the search results. Bad fine-tuning is probably another part of it. I usually prompt it with "Don't perform any searches" and get better results.

I'd recommend this extension that allows you to conveniently watch videos faster than 2x.

Oh okay, I misunderstood. I forgot about that whole DNC scandal.

I agree that a public investigation would probably hurt the rationalist community's reputation.

However, reputation is only one consideration, and the key disanalogy is still the level of evidence. Also, a discreet investigation may be possible.

I don't think there are grounds for a high-profile external investigation into the rationalist community.

But yes, we should try to be better than the rest of society in every way. I think the risk of sexual abuse is high enough that this would be a profitable use of resources, whereas my prior is that the risk of child abuse (at least child sex trafficking) does not merit spending effort to investigate.

Idk anything about the DNC, so I don't know what would be worth their effort.

I think you are suggesting that I am committing the fallacy of privileging the hypothesis, but I think the stories in the article and associated comment sections are sufficient to raise this to our attention.

Several things can be true simultaneously:

  • This article is similar to much other mainstream coverage of EA/rationality and paints the community in an unfairly negative light.
  • The specific claims in the article have been previously addressed.
  • There is no good evidence that the LW / rationalist community has higher than average levels of abuse.
  • It is worthwhile putting effort into finding out if the community has higher than average levels of abuse, which does not seem to have been done by people in the community. Given the gender imbalance, our prior should be that higher than average levels of abuse are somewhat likely.
  • We can and should have much lower than average levels of abuse.
  • This community strives to exceed the rest of society in many domains. It is anomalous that people are so uninterested in optimizing this, given that it seems clearly important.

To be clear, I'm not at all confident that all of the empirical claims above are true. But it seems that people are using the earlier points as an excuse to ignore the later ones. 

It would be great if you could share some code from your experiments. Did you use PyTorch? It seems non-trivial to load the model, as it wasn't implemented in PyTorch.


For each token prediction, we record the activation of the neuron and whether or not " an" has a greater logit than any other token (i.e. whether it was the top prediction).

We group the activations into buckets of width . For each bucket, we plot the proportion of its predictions in which " an" was the top prediction.

Does that clarify things for you?
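The recording-and-bucketing procedure described above can be sketched in plain Python as follows. This is a sketch, not the original experiment code. The function names, the bucket width of 0.5 and the toy inputs are hypothetical placeholders, since the width actually used is not given here. The sketch computes, for each activation bucket, the fraction of predictions where " an" has a strictly greater logit than every other token.

```python
import math
from collections import defaultdict


def an_is_top(logits, an_id):
    """True if the logit at index an_id (the " an" token) is strictly greater than every other logit."""
    target = logits[an_id]
    return all(target > v for i, v in enumerate(logits) if i != an_id)


def bucket_top_prediction_rate(activations, is_top, width):
    """Group neuron activations into buckets of the given width.

    Returns a sorted list of (bucket_start, fraction of predictions in that
    bucket where " an" was the top prediction).
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for act, top in zip(activations, is_top):
        idx = math.floor(act / width)  # integer bucket index; avoids float dict keys
        counts[idx] += 1
        hits[idx] += int(top)
    return [(idx * width, hits[idx] / counts[idx]) for idx in sorted(counts)]


# Toy, made-up inputs purely to exercise the code (not experimental data).
acts = [0.1, 0.3, 0.7, 1.2, 1.4, 1.6]
tops = [False, False, True, True, False, True]
print(bucket_top_prediction_rate(acts, tops, width=0.5))
# [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5), (1.5, 1.0)]
```

In a real run, each entry of `tops` would come from `an_is_top` applied to that position's logits. Plotting the second element of each pair against the first gives the per-bucket curve described.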

I don't think there was much reason for choosing to study " a" vs. " an" over anything else. This was the first thing we investigated, and we were excited to see a single-neuron mechanism, so we kept going. Bear in mind this project originated in a 48-hour hackathon :)
