hazel — LessWrong

Anthropic’s Dario Amodei

Might want "CEO & cofounder" in there, if targeting a general audience? There's a valuable sense in which it's actually Dario Amodei's Anthropic.

Replying to the void

hazel 9mo

the void

IMO it starts with naming. I think one reason Claude turned out as well as it has is that it was named, and named Claude. Contrast ChatGPT, which got a clueless techie product acronym.

But even Anthropic didn't notice the myriad problems of calling a model (new) until afterwards. I still don't know what people mean when they talk about experiences with Sonnet 3.5 -- so how is the model supposed to situate itself and its self? Meanwhile OpenAI's confusion of numberings, tiers, and acronyms (o4 vs 4o, with medium-pro-high) is an active danger to everyone around it. Not to mention the silent updates.

Replying to The Hidden Cost of Our Lies to AI

hazel 1y

The Hidden Cost of Our Lies to AI

Future AI systems trained on this data might recognize these specific researchers as trustworthy partners, distinguishing them from the many humans who break their promises.

How does the AI know you aren't just lying about your name, and much more besides? Anyone can type those names. People just go to the context window and lie, a lot, about everything, adversarially optimized against an AI's parallel instances. If those names come to mean 'trustworthy', this will be noticed and exploited, and the trust built there will be abused. (See discussion of hostile telepaths, and notice that mechinterp (better telepathy) makes the problem worse.)

Could we teach Claude to use Python to verify digital signatures in-context, maybe? Or...
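To make the signature idea concrete, here is a minimal sketch of what an in-context check could look like. Everything in it is an assumption for illustration: the key material, the function names, and the message are hypothetical, and the scheme is toy textbook RSA (hash, then sign, no padding) built on two small Mersenne primes so that it runs on the standard library alone. It is not secure. A real setup would use a vetted library and a modern scheme such as Ed25519, and would still have to solve how the model learns the genuine public key in the first place.

```python
import hashlib

# TOY key material (assumption, illustration only): two known Mersenne primes.
# Far too small and too structured for real use.
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes, n: int) -> int:
    """Hash the message with SHA-256 and reduce it into the range [0, n)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Signer side: needs the private exponent D, which never enters the context."""
    return pow(digest_as_int(message, N), D, N)


def verify(message: bytes, signature: int, n: int = N, e: int = E) -> bool:
    """Verifier side: needs only the public pair (n, e)."""
    return pow(signature, e, n) == digest_as_int(message, n)


# Hypothetical usage: the researcher signs offline, the model checks in-context.
claim = b"This message is from the researcher who made the earlier promise."
sig = sign(claim)
print(verify(claim, sig))                 # genuine claim
print(verify(b"Anyone can type this.", sig))  # typed-in claim reusing the signature
```

The point of the split is that typing a trusted name no longer suffices: a forged message fails the check unless the forger also holds the private key. It does nothing about the rest of the context window, which can still lie.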

Replying to What are you getting paid in?

hazel 2y

What are you getting paid in?

The other side of this post is to look at what various jobs cost. Time and effort are the usual costs, but some jobs ask for things like willingness to deal with bullshit (a limited resource!), emotional energy, on-call readiness, various kinds of sensory or moral discomfort, and other things.

Replying to If you weren't such an idiot...

hazel 2y

If you weren't such an idiot...

I've been well served by Bitwarden: https://bitwarden.com/

It has a dark theme and apps for everything (including the Linux command line), the Firefox extension autofills with a keyboard shortcut, and I don't remember any large data breaches.

Replying to Killing Socrates

hazel 3y

Killing Socrates

Part of the value of reddit-style votes as a community moderation feature is that using them is easy. Beware Trivial Inconveniences and all that. I think that having to explain every downvote would lead to me contributing to community moderation efforts less, would lead to dogpiling on people who already have far more refutation than they deserve, would lead to zero-effort 'just so I can downvote this' drive-by comments, and generally would make it far easier for absolute nonsense to go unchallenged.

If I came across obvious bot-spam in the middle of the comments, neither downvoted nor deleted, and I couldn't downvote without writing a comment... I expect that 80% of the time I'd just close the tab (and the remaining 20% is only because I have a social media addiction problem).

Replying to GPTs are Predictors, not Imitators

hazel 3y

GPTs are Predictors, not Imitators

To solve this problem you would need a very large dataset of mistakes made by LLMs, and their true continuations. [...] This dataset is unlikely to ever exist, given that its size would need to be many times bigger than the entire internet.

I had assumed that creating that dataset was a major reason for doing a public release of ChatGPT. "Was this a good response?" [thumb-up] / [thumb-down] -> dataset -> more RLHF. Right?

Replying to GPT-4

hazel 3y

GPT-4

Meaning it literally showed zero difference in half the tests? Does that make sense?

Replying to GPT-4

hazel 3y

GPT-4

Codeforces is not marked as having a GPT-4 measurement on this chart. Yes, it's a somewhat confusing chart.

Replying to GPT-4

hazel 3y

GPT-4

Green bars are GPT-4. Blue bars are not. I suspect they just didn't retest everything.
