LESSWRONG

dirk

See also my EA Forum account (https://forum.effectivealtruism.org/users/dirk) and my Tumblr (https://d-i-r-k-s-t-r-i-d-e-r.tumblr.com/).

Posts

Wikitag Contributions

Comments

Learning information which is full of spiders
dirk 1h 10

There are actually quite a few more, though most of them feature her being isekaied elsewhere; https://glowfic.com/characters/12823?view=posts should show you ~all of them.

LessWrong Feed [new, now in beta]
dirk 14h 10

When I scroll down a little bit, the text on the feed preview for 2025 Prediction Thread gets wider so it clips off the background:

Nicolas Lupinski's Shortform
dirk 1d 10

Asserting nociception as fact when that's the very thing under question is poor argumentative behavior.

Does your model account for Models Don't "Get Reward"? If so, how?

dirk's Shortform
dirk 1d 32

I interrogated Claude further (full conversation), and it claims that it's using a system which attaches citations to its statements, rather than inserting them manually. This seems corroborated by the fact that when I asked it to try writing them manually, all the citations pointed to sources that directly related to the paragraphs they were attached to. Consequently, I think this is not Sonnet 4.5 hallucinating, but rather an issue with some intervening layer.

Just Make a New Rule!
dirk 1d 30

You've gotten what normal people think almost precisely backwards. Normal people understand that it's infeasible to explicitly spell out all desired and undesired behaviors, which is why the fact that it's important to follow the spirit as well as the letter of the law is so commonly repeated as to be cliché, "rules-lawyering" is pejorative, and work-to-rule strikes are even possible to conceptualize. Why, Wikipedia tells me that even the Bible has an insult for that! It's, well, autists who, being impaired in the understanding-the-spirit-of-people's-words department, want all appropriate behaviors exhaustively spelled out in written law. (Since we're on LessWrong it is indeed appropriate to err in that direction—certainly many recent problems could've been prevented with an explicit rule against giving people shit for comment-blocking you :P—but let's not pretend that's what normies believe in).

You also paint an overly-rosy picture of how normal people react to having to add a new rule; typically, it's seen as quite negative if one behaves in such a way as to require the creation of a rule specifically banning that behavior. This is because the response to rules normal people expect is for the would-be rulebreaker to use those rules as information about the intentions of the rulemaker, and then refrain from behavior the rulemaker would be displeased with in expectation. I think your characterization of rules as freeing people from the burden of socially modeling others is more fantasy than fact.

With regard to lead paint: paint manufacturers aren't lead-maximizers, sure: they're profit-maximizers. They are in fact optimizing in a direction orthogonal to human values! (Consider e.g. FTX: they weren't trying to financially harm customers; it's just that customer accounts were made of dollars they could use for something else.) It's for this reason that financial regulations especially have to be so tightly specified and nitpicky; when they aren't, companies generally do in fact use the nearest unblocked strategy. (And even when they are, companies sometimes just ignore the law! It's not universal, but it's not as unheard-of as you seem to imply.)

(And for that matter, how do you think rules are enforced? It's by authorities making judgement calls about whether the fine details of people's behavior broke the rules!)

I think this post would be stronger if it focused on arguing for your position, which is a reasonable one, rather than on drawing these dubious associations.

Did you know you can just buy blackbelts?
dirk 2d 10

Really struck a nerve with the misaligned optimizer squad, huh.

Kongo Landwalker's Shortform
dirk 2d 20

Brian Eno quote (here on Goodreads) which this reminded me of; might be an interesting counterpoint:

Whatever you now find weird, ugly, uncomfortable and nasty about a new medium will surely become its signature. CD distortion, the jitteriness of digital video, the crap sound of 8-bit - all of these will be cherished and emulated as soon as they can be avoided. It’s the sound of failure: so much modern art is the sound of things going out of control, of a medium pushing to its limits and breaking apart. The distorted guitar sound is the sound of something too loud for the medium supposed to carry it. The blues singer with the cracked voice is the sound of an emotional cry too powerful for the throat that releases it. The excitement of grainy film, of bleached-out black and white, is the excitement of witnessing events too momentous for the medium assigned to record them.

dirk's Shortform
dirk 2d 20

Sonnet 4.5 hallucinates citations. See for instance this chat I was just having with it; if you follow the citations in its third message, you'll find that the majority of them don't relate at all to the claims they're attached to. For example, its citation for "a 19th-century guide to diary keeping" goes to the paper "Gender identity better than sex explains individual differences in episodic and semantic components of autobiographical memory: An fMRI study". (It also did this with some local politics questions I had the other day.)

When I've looked up the mis-cited information I've often found it is in fact correct—for instance, the diary entry it attributes to that fMRI study really does exist on the web—but the incorrect citations are nonetheless an issue worth being aware of.

Curing PMDD with Hair Loss Pills
dirk 3d 10

I think if you have this little information it's better to refrain from mentioning it. Without details like the friends' level of domain expertise, their basis for dismissal, your level of trust in your acquaintance's judgement, &c., there's no way to judge the credibility of this thirdhand report, and thus no good reason to update on it. However, most people involuntarily privilege the hypotheses they're presented with, so by sharing this kind of underspecified information as though it were something concrete enough to give weight to, you risk causing people to update in epistemically unjustified directions, either because they trust you too much, or because the low standards of evidence shown here cause them to erroneously update against the position you express (à la The main way I've seen people turn ideologically crazy).

Did you know you can just buy blackbelts?
dirk 3d* -30

The rules have intent as well as literal text; the point of calibration trivia is to get better at calibration, not to find the person who's best at Goodharting the rules to get a high score without being well-calibrated. (Personally, I don't find it fun to play with such people.)

18 Feature request: comment bookmarks
10mo
2
12 GPT-4o Can In Some Cases Solve Moderately Complicated Captchas
1y
2
26 Just because an LLM said it doesn't mean it's true: an illustrative example
1y
12
2 dirk's Shortform
2y
64
Guide to the LessWrong Editor
5 months ago