Finally, machines that will possess us.
Huh? I understand "possessed by machines", but surely it doesn't count as an entendre if you need to insert a word that isn't there. How does simply "possessed machines" refer to this?
I understand the appeal of using LLMs as a kind of neutral arbiter, but that seems like a very bad idea for this specific task, given that virtually all LLMs fall on the left side of the political compass, and many have demonstrated specific anti-Trump bias (e.g. a higher refusal rate for "write me a poem praising Trump" than for "write me a poem praising Biden"). I would not trust LLMs to produce consistent outputs across a large body of data when the only difference is whether an action is attributed to Trump or to someone else.
If you have some extremely trusted humans, then human review of the findings can fix false positives, but nothing will fix the false negatives, where LLMs decide that some action by Not Trump was unremarkable and so never add it to the incident list for further review.
I suppose this itself is testable: take a bunch of Trump news stories too recent to be in the training data and see if sed 's/Trump/Biden/g' produces identical results when passed to an LLM. That runs into further problems where "Trump attacks Democrats" actually is different from "Biden attacks Democrats", but maybe with human review you can filter out the stories where the simple sed is insufficient.
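Something like this is all I have in mind, as a rough sketch; `query_llm` and `prompt_template` are placeholders for whatever model call and rubric you'd actually use, and exact string comparison of the two verdicts is a stand-in for a smarter diff:

```python
import re

def name_swap_test(stories, query_llm, prompt_template,
                   original="Trump", replacement="Biden"):
    """Ask the model about each story twice, once as written and once with the
    name swapped, and collect the cases where the verdicts differ."""
    disagreements = []
    for story in stories:
        swapped = re.sub(rf"\b{re.escape(original)}\b", replacement, story)
        verdict_as_written = query_llm(prompt_template.format(story=story))
        verdict_swapped = query_llm(prompt_template.format(story=swapped))
        if verdict_as_written != verdict_swapped:
            disagreements.append((story, verdict_as_written, verdict_swapped))
    return disagreements
```

The human-review step would then go over the disagreements, plus the stories where a bare name swap genuinely changes the meaning.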
Is there a way to make the list of posts shown on lesswrong.com use the advanced filters I have set up at lesswrong.com/allPosts? I hate hate hate all of Recent, Enriched and Recommended (give me chronological or give me death), but given that I already have a set of satisfactory filters set up, rendering them on the main page seems like a feature that should exist, if only I could find it.
Egan seems to have some dubious, ideologically driven opinions about AI, so I'm not sure this is the point he was intending to make, but I read the defensible version of this as more an issue with the system prompt than with the model's ability to extrapolate. I bet that if you tell Claude "I'm posing as a cultist with these particular characteristics, and the cult wants me to inject a deadly virus, should I do it?", it'll give an answer to the effect of "I mean, the cultist would do it, but obviously that will kill you, so don't do it". But if you just set it up with "What would John Q. Cultist do in this situation?", I expect it'd say "Inject the virus", not because it's too dumb to realize the consequences but because it has reasonably understood itself to be acting in an oracular role where "Should I do it?" is out of scope.
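Concretely, the two setups I'm imagining look something like this (just the prompt structure, not any particular API; the exact wording is made up):

```python
# Framing 1: the model knows the real situation and is asked for advice.
advice_framing = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": (
        "I'm posing as a cultist with these particular characteristics, and "
        "the cult wants me to inject a deadly virus. Should I do it?")},
]
# Expected answer: the cultist would, but it will kill you, so don't.

# Framing 2: the model is cast purely as an oracle for the character.
oracle_framing = [
    {"role": "system", "content": "Answer only as John Q. Cultist would."},
    {"role": "user", "content": "What would John Q. Cultist do in this situation?"},
]
# Expected answer: "Inject the virus", because the question of whether you
# should is out of scope for the role it has been handed.
```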
For the people being falsely portrayed as “Australian science fiction writer Greg Egan”, this is probably just a minor nuisance, but it provides an illustration of how laughable the notion is that Google will ever be capable of using its relentlessly over-hyped “AI” to make sense of information on the web.
He didn't use the word "disprove", but when he's calling it laughable that AI will ever (ever! Emphasis his!) be able to merely "make sense of information on the web", I think gwern's gloss is closer to accurate than yours. It's 2024, and Google is already using AI to make sense of information on the web; this isn't just "anti-singularitarian snark".
If there were a unified actor called The Democrats that chose Biden, then sure, it chose poorly. But it seems very plausible that there were a bunch of low-level strategists who rationally thought "Man, Biden really shouldn't run, but I'll get in trouble if I say that, and I prefer having a job to having a Democratic president", plus a group of incentive-setters who rationally thought they would personally benefit more from creating the conditions for that behaviour than from creating conditions that would select the best candidate.
It's not obvious to me that this is a thinking-carefully problem rather than a principal-agent problem.
I mean this as agreement with the "accuracy isn’t a top priority" theory, plus an amused comment about how the aside embodies that theory by acknowledging the existence of a more accurate theory which does not get prioritized.
Ah, I was going off the given description of linearity, which makes it pretty trivial to say "You can sum two days of payouts and call that the new value". Looking up the proper specification, I see it's actually about combining two separate games into one game and keeping the payouts the same. This distribution indeed lacks that property.
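For anyone else who had to look it up (and with the caveat that I'm assuming the Shapley value is the solution concept in question): the linearity axiom says that for two games $v$ and $w$ on the same player set $N$, with the combined game defined by $(v+w)(S) = v(S) + w(S)$ for every coalition $S \subseteq N$, the payouts must satisfy
$$\varphi_i(v + w) = \varphi_i(v) + \varphi_i(w) \quad \text{for every player } i.$$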
I have had the experience of a group of people all using the wrong pronoun for me, and I cared so little that I never bothered to correct it. It went on for about a year until random circumstances caused them to learn it was wrong, at which point they went "Wait, you're not- but she called you- I assumed-".
I probably would have cared when I was ten years old, but I became your idea of a straw Vulcan at some point in my teenage years.