Garrett Baker

Independent alignment researcher


Isolating Vector Additions



Yeah, my understanding of how bot detection on lots of these sites works is that they track your mouse, then run a simple classification scheme on the mouse movements to differentiate between bots and humans. So it's no surprise that moving your mouse with your arrow keys would make the classifier very suspicious.
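As a toy illustration of that idea, here is a minimal sketch of what such a classifier could look like. Everything in it is an assumption made for illustration: the two features, the thresholds, and the names `axis_aligned_fraction`, `step_length_spread`, and `looks_like_bot` are hypothetical and do not describe any real site's detector. It only shows why arrow-key movement (perfectly axis-aligned, uniform steps) would stand out against a hand-moved mouse.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _steps(path: List[Point]) -> List[Tuple[float, float]]:
    """Non-zero displacement vectors between consecutive samples."""
    deltas = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(path, path[1:])]
    return [(dx, dy) for dx, dy in deltas if dx != 0 or dy != 0]


def axis_aligned_fraction(path: List[Point]) -> float:
    """Fraction of steps that move along exactly one axis."""
    steps = _steps(path)
    if not steps:
        return 0.0
    aligned = sum(1 for dx, dy in steps if dx == 0 or dy == 0)
    return aligned / len(steps)


def step_length_spread(path: List[Point]) -> float:
    """Coefficient of variation of step lengths (0.0 means perfectly uniform)."""
    lengths = [math.hypot(dx, dy) for dx, dy in _steps(path)]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    variance = sum((length - mean) ** 2 for length in lengths) / len(lengths)
    return math.sqrt(variance) / mean


def looks_like_bot(
    path: List[Point],
    aligned_threshold: float = 0.9,  # hypothetical cutoff
    spread_threshold: float = 0.1,   # hypothetical cutoff
) -> bool:
    """Flag paths that are almost entirely axis-aligned with uniform steps."""
    return (
        axis_aligned_fraction(path) >= aligned_threshold
        and step_length_spread(path) <= spread_threshold
    )


# Arrow-key style movement: fixed 10-pixel steps along one axis at a time.
arrow_key_path = [(0, 0), (10, 0), (20, 0), (20, 10), (20, 20), (30, 20)]
# Hand-moved mouse: irregular diagonal steps of varying length.
human_path = [(0, 0), (3, 1), (9, 4), (18, 6), (22, 13), (23, 21)]

print(looks_like_bot(arrow_key_path))
print(looks_like_bot(human_path))
```

A real system would presumably use many more signals (timing, acceleration, click behaviour) and a trained model rather than two hand-set thresholds.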

I just reread that section, and I think I didn’t recognize it the first time because I wasn’t thinking “what concrete actions is Janus implicitly advocating for here”. Though maybe I just have worse than average reading comprehension.

There now exist two worlds I must glomarize between.

In the first, the irony is intentional, and I say “wouldn’t you like to know”. In the second, it’s not, and I say “Irony? What irony!? I have no clue what you’re talking about”.

This link has significantly decreased my confusion about why Sutskever flipped! That situation sounds messy enough, and easy enough for MSM to misunderstand, that it sounds likely to have occurred. Thanks!

Yesterday I had a conversation with a person very much into cyborgism, and they told me about a particular path to impact floating around the cyborgism social network: Evals.

I really like this idea, and I have no clue how I didn't think of it myself! It's the obvious thing to do when you have a bunch of insane people (used as a term of affection & praise by me for such people) obsessed with language models, who are also incredibly good & experienced at getting the models to do whatever they want. I would trust these people red-teaming a model and telling us it's safe more than the rigid, procrustean, and less-creative red-teaming I anticipate goes on at ARC-evals. Not that ARC-evals is bad! But that basically everyone looks more rigid, procrustean, and less creative than the cyborgists I'm excited about!

Can you add a "Reuters reports" qualifier to the title? As is, this is speculation presented at least in the title as fact.

Thus why I said related. Nobody was doing any mind-reading of course, but the principles still apply, since people are often actually quite good at reading each other.

China under Mao definitely seemed to do more than say they won’t respond to threats. Thus, the Korean War, in which notably no nuclear threats were made, proving conventional war was still possible in a post-nuclear world.

For practical decisions, I don’t think threatbots actually exist if you’re a state, in any form other than natural disasters. Mao’s China was not good at handling natural disasters, but probably because Mao was a Marxist and Legalist, not because he conspicuously ignored them. When his subordinates made mistakes which let him know something was going wrong in their province, I think he would punish the subordinate and try to fix it.

Indeed. I also note that if innovation is hampered by a lack of institutional support or by misallocated funding / support, we should have higher probability on a rapid & surprising improvement. If it's hampered by a lack of cultural support, we should expect slower improvement.

By strikingly bad I mean there are easy changes EA can make to give its sponsored orgs better incentives, and it has too much confidence that the incentives in the orgs it sponsors favor doing good above doing bad, politics, not doing anything, etc.

For example, nobody in Anthropic gets paid more if they follow their RSP and less if they don’t. Changing this isn’t sufficient for me to feel happy with Anthropic, but it’s one example among many of ways in which Anthropic could be better.

When I think of an Anthropic I feel happy with, I think of a formally defined balance-of-powers type situation with strong & public whistleblower protection and post-whistleblower reform processes, them hiring engineers loyal to that process (rather than to building AGI), and them diversifying the sources with which they trade, such that it’s in none of their sources’ interest to manipulate them.

I also claim marginal movements toward this target are often good.

As I said in the original shortform, I also think incentives are not all or nothing. Worse incentives just mean you need more upstanding workers & leaders.
