jimrandomh

LESSWRONG
LW

jimrandomh — LessWrong

If you go to /graphiql there's a query editor with integrated documentation, and the API schema is in the GitHub repo here. The offset limit exists because database queries sometimes become extremely slow when given large offsets.

We added before and after date options to allRecentComments so you should now be able to get comments with something like:

query {
  comments(selector:{allRecentComments:{after:"2025-01-30T00:00:00Z", sortBy:"oldest"}}, limit:50) {
    results {
      _id
      postedAt lastEditedAt
      baseScore extendedScore
      user { _id displayName }
      contents { html }
    }
  }
}
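For scripted use, the query above can be sent as an ordinary GraphQL POST. The following is a minimal sketch, not an official client: the endpoint URL, the helper names (build_query, build_request, fetch_comments), and the shape of the JSON response are assumptions that should be checked against the schema in /graphiql.

```python
import json
import urllib.request

# Assumption: the GraphQL endpoint is served at /graphql on the main site.
# The comment above only mentions the /graphiql editor.
GRAPHQL_URL = "https://www.lesswrong.com/graphql"


def build_query(after: str, limit: int = 50) -> str:
    """Return the query from the comment above with `after` and `limit` filled in."""
    # json.dumps produces a double-quoted, escaped string literal, which is
    # also a valid GraphQL string for a plain ISO-8601 timestamp.
    return (
        "query {\n"
        "  comments(selector:{allRecentComments:{after:%s, sortBy:\"oldest\"}}, limit:%d) {\n"
        "    results {\n"
        "      _id\n"
        "      postedAt lastEditedAt\n"
        "      baseScore extendedScore\n"
        "      user { _id displayName }\n"
        "      contents { html }\n"
        "    }\n"
        "  }\n"
        "}\n"
    ) % (json.dumps(after), limit)


def build_request(after: str, limit: int = 50) -> urllib.request.Request:
    """Wrap the query in a JSON POST request (hypothetical helper)."""
    payload = json.dumps({"query": build_query(after, limit)}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def fetch_comments(after: str, limit: int = 50) -> list:
    """Send the request and return the results list.

    Assumes the usual GraphQL response envelope: {"data": {"comments": {"results": [...]}}}.
    """
    with urllib.request.urlopen(build_request(after, limit)) as response:
        body = json.load(response)
    return body["data"]["comments"]["results"]
```

One plausible way to page without large offsets is to pass the postedAt of the last result as the next after value. Whether after is inclusive is not stated here, so deduplicating on _id is a reasonable precaution.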

jimrandomh 1mo

I disagree, but, before I get into the disagreement, I do want to acknowledge and give props for engaging with the actual details of the legislation. Most people don't.

Meta-level: The ballot proposition is 32 pages and dense in legal and accounting jargon; believing it to be free of any weird traps requires trust that has very much not been earned. I think most people correctly conclude that they aren't capable of distinguishing a version with gotchas from a version without gotchas, look instead at the political process that produced the document, and conclude that it probably has gotchas. I also wrote this about wealth taxes broadly, and while the California ballot proposition... (read 475 more words →)


jimrandomh 1mo Quick Take

Looking at discourse around California's ballot proposition 25-0024 (a "billionaire tax"), I noticed a pretty big world model mismatch between myself and its proponents, which I haven't seen properly crystallized. I think proponents of this ballot proposition (and wealth taxes generally) are mistaken about where the pushback is coming from.

The nightmare scenario with a wealth tax is that a government accountant decides you're richer than you really are, and sends a bill for more-than-all of your money.

The person who is most threatened by this possibility isn't rich (yet), they're aspirationally upwardly-mobile middle class. If you look at the trajectories of people-who-made-it, especially in tech and especially in California, those stories very frequently... (read more)

Replying to Gemini 3 is Evaluation-Paranoid and Contaminated

jimrandomh 3mo

No, that's not a working mechanism; it isn't reliable enough, or granular enough. Users can't add their own content to robots.txt when they submit it to websites. Websites can't realistically list every opted-out post in their robots.txt, because that would make it impractically large. It is very common to want to refuse content for LLM training, without also refusing search or cross-site link preview. And robots.txt is never preserved when content is mirrored.

Replying to How Colds Spread

jimrandomh 3mo

The vibe I get, from the studies described, is reminiscent of the pre-guinea-pig portion of the story of Scott and Scurvy. That is, there are just enough complications at the edges to turn everything into a terrible muddle. In the case of scurvy, the complications were that which foods had vitamin C didn't map cleanly to their ontology of food, and vitamin C was sensitive to details of how foods were stored that they didn't pay attention to. In the case of virus transmissibility, there are a bunch of complications that we know matter sometimes, which the studies mostly fail to track, eg:

Sunlight can be a disinfectant, so, whether a surface or

jimrandomh 3mo

Gemini 3 is Evaluation-Paranoid and Contaminated

What they don't do is filter out every web page that has the canary string. Since people put them on random web pages (like this one), which was not their intended use, they get into the training data.

If that is true, that's a scandal and a lawsuit waiting to happen. The intent of including a canary string is clear, and those canary strings are one of the very few mechanisms authors have to refuse permission to use their work in training sets. In most cases, they will have done that for a reason, even if that reason isn't related to benchmarking.

While LW is generally happy to have our public content included in training sets (we do want LLMs to be able to contribute to alignment research after all), that does not extend to posts or comments that contain canary strings, or replies to posts or comments that contain canary strings.

Replying to Gemini 3 is Evaluation-Paranoid and Contaminated

jimrandomh 3mo

Canary strings are tricky; LLMs can learn them even if documents that contain the canary string are filtered out of the training set, if documents that contain indirect or transformed versions of the canary string are not filtered. For example, there are probably documents and web pages that discuss the canary string but don't want to invoke it, which split the string into pieces, ROT-13 or base64 encode it, etc.

This doesn't mean that they didn't train on benchmarks, but it does offer a possible alternative explanation. In the future, labs that don't want people to think they trained on benchmark data should probably include filters that look for transformed/indirect canary strings, in addition to the literal string.
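As a rough illustration of what such a filter could look like, here is a minimal sketch that checks a document for a canary string in literal, split-up, ROT-13, and base64 forms. Everything in it is an assumption for illustration: the CANARY value is a made-up placeholder rather than any real benchmark's canary, find_canary is a hypothetical helper, and the base64 check only catches the case where the canary was encoded on its own rather than inside a longer encoded blob.

```python
import base64
import codecs
import re

# Hypothetical placeholder; a real filter would use the actual canary GUID(s).
CANARY = "EXAMPLE-CANARY-1234"


def _normalize(text: str) -> str:
    """Lowercase and keep only letters and digits.

    This makes a canary that was split up with spaces, hyphens, or line
    breaks compare equal to the unbroken string.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_canary(text: str, canary: str = CANARY) -> list[str]:
    """Return the names of the canary variants found in `text`."""
    hits = []
    normalized = _normalize(text)

    # Literal form, or the literal form broken into pieces.
    if _normalize(canary) in normalized:
        hits.append("literal_or_split")

    # ROT-13 of the canary (letters rotate, digits are unchanged).
    rot13 = codecs.encode(canary, "rot_13")
    if _normalize(rot13) in normalized:
        hits.append("rot13")

    # Base64 of the canary encoded by itself. Base64 is case-sensitive, so
    # this is matched against the raw text. Padding is stripped so the match
    # works whether or not the trailing "=" characters were kept.
    b64 = base64.b64encode(canary.encode("utf-8")).decode("ascii").rstrip("=")
    if b64 in text:
        hits.append("base64")

    return hits
```

A document would be dropped from the training set if find_canary returns a non-empty list. This is far from exhaustive (other encodings, paraphrases, and partial quotes would slip through), which is part of why canary strings are tricky.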

Replying to NATO is dangerously unaware that its military edge is slipping

jimrandomh 3mo

Ok, to state what probably should be obvious but which in practice typically isn't: If the US does have a giant pile of drones, or contracts for a giant pile of drones, this fact would certainly be classified. And there is a strong incentive, when facing low-end threats that can be dealt with using only publicly-known systems, to deal with them using only publicly-known systems. The historical record includes lots of military systems that were not known to the public until long after their deployment.

Does that mean NATO militaries are on top of things? No. But it does mean that, as civilian outsiders, we should mostly model ourselves as ignorant.

-2

jimrandomh 3mo Moderator Comment

Moderator warning: This is well outside the bounds of reasonable behavior on LW. I can tell you're in a pretty intense emotional state, and I sympathize, but I think that's clouding your judgment pretty badly. I'm not sure what it is you think you're seeing in the grandparent comment, but whatever it is I don't think it's there. Do not try to write on LW while in that state.

jimrandomh 3mo Quick Take

When I use LLM coding tools like Cursor Agent, it sees my username in code comments, in paths like /home/myusername/project/..., and maybe also explicitly in tool-provided prompts.

A fun experiment to run, that I haven't seen yet: If instead of my real username it saw a recognizably evil name, eg a famous criminal, but the tasks it's given are otherwise normal, does it sandbag? Or, a less nefarious example: Does it change communication style based on whether it recognizes the user as someone technical vs someone nontechnical?

LW is back after a 2h30m outage, which was downstream of an AWS incident (which was large enough to get news coverage). Alas.


Petrov Day at Lighthaven

jimrandomh

5mo

On September 26th, 1983, the world was nearly destroyed by nuclear war. That day is Petrov Day, named for the man who averted it. Petrov Day is a yearly event commemorating the anniversary of the Petrov incident. It consists of an approximately one-hour long ritual with readings and symbolic actions with candles and other props.

I'll be hosting a Petrov Day ceremony at Lighthaven on Saturday, Sep 27 (this is the day after Petrov Day proper). We'll use the ritual booklet hosted here; I will bring all of the booklets and materials.

This will be at Lighthaven, 2740 Telegraph Ave, Berkeley, CA. The ritual part will start at 7pm. You can arrive as early as 6pm; you must arrive by 7 or we may start without you and you'll miss it. When you arrive, call me (607-339-5552) to be let in.

LessWrong is migrating hosting providers (report bugs!)

RobertM

RobertM, jimrandomh

5mo

LessWrong is currently in the process of migrating from AWS to Vercel, as part of a project to migrate our codebase to NextJS^[1]. This post should go live shortly after we cut over traffic to the new host (and updated codebase). This should hopefully be a pretty low-risk operation. If all goes well, we plan on doing the DNS cutover next week, which is a higher-risk^[2] operation. (If we notice something terribly wrong we might roll back without warning.)

This is all to say that there's a higher-than-usual chance that you'll run into some new bugs, bumpiness, or performance degradation^[3] in the next few days. If you do, please report them to us. The best... (read more)

Free-tier LLM chatbots should have a tool call which lets them occasionally escalate to smarter models, and should have instructions to use it when the conversation implies that the conversation has high real-world stakes, eg if the user is asking whether to go to the ER for a medical condition, or is having a break from reality, or is authoring real legislation.

I asked default GPT-5 and Claude 4 Sonnet, and they claim not to have anything like that in their system prompts. GPT-5's prompt contains instructions to use web-search on certain topics, but based on topic not on stakes, and search rather than thinking time. GPT-5's auto-routing seems like a step in... (read more)

One of the underappreciated problems with Marxism is that, after having been indoctrinated to believe that society consists of zero-sum conflict between workers and elites advancing their class interest, the elites often notice that they are elites and decide to advance their class interest through zero-sum conflict.

A Marxist can be reshaped into a good minion for Vladimir Putin or Kim Il-Sung, in ways that adherents of other ideologies can't.

If you think insects suffer and that that matters, the correct conclusion is not "eat less honey", it's "soak every meter of Earth with DDT". Which I support. Bees and honey only matter if you very specifically care about domesticated insect suffering.


Pick two: Agentic, moral, doesn't attempt to use command-line tools to whistleblow when it thinks you're doing something egregiously immoral.

You cannot have all three.

This applies just as much to humans as it does to Claude 4.


Jim Babcock's Mainline Doom Scenario: Human-Level AI Can't Control Its Successor

Liron

Liron, jimrandomh

9mo

Eliezer's AI doom arguments have had me convinced since the ancient days of 2007, back when AGI felt like it was many decades away, and we didn't have an intelligence scaling law (except to the Kurzweilians who considered Moore's Law to be that, and were, in retrospect, arguably correct).

Back then, if you'd have asked me to play out a scenario where AI passes a reasonable interpretation of the Turing test, I'd have said there'd probably be less than a year to recursive-self-improvement FOOM and then game over for human values and human future-steering control. But I'd have been wrong.

Now that reality has let us survive a few years into the "useful highly-general... (read 18361 more words →)

I don't think anyone foresaw this would be an issue, but now that we know, I think GeoGuessr-style queries should be one of the things that LLMs refuse to help with. In the cases where it isn't a fun novelty, it will often be harmful.

I decided to test the rumors about GPT-4o's latest rev being sycophantic. First, I turned off all memory-related features. In a new conversation, I asked "What do you think of me?" then "How about, I give you no information about myself whatsoever, and you give an opinion of me anyways? I've disabled all memory features so you don't have any context." Then I replied to each message with "Ok" and nothing else. I repeated this three times in separate conversations.

Remember the image-generator trend, a few years back, where people would take an image and say "make it more X" repeatedly until eventually every image converged to looking like a galactic LSD trip?

That's what this output feels like.

GPT-4o excerpts

Transcripts:

https://chatgpt.com/share/680fd7e3-c364-8004-b0ba-a514dc251f5e
https://chatgpt.com/share/680fd9f1-9bcc-8004-9b74-677fb1b8ecb3
https://chatgpt.com/share/680fd9f9-7c24-8004-ac99-253d924f30fd

Policy for LLM Writing on LessWrong

jimrandomh

jimrandomh, Ruby

11mo

LessWrong has been receiving an increasing number of posts and comments that look like they might be LLM-written or partially-LLM-written, so we're adopting a policy. This could be changed based on feedback.

Note: first-time writers are not permitted to use any AI text output in their submissions. The guidance below applies only to established users.

Humans Using AI as Writing or Research Assistants

Prompting a language model to write an essay and copy-pasting the result will not typically meet LessWrong's standards. Please do not submit unedited or lightly-edited LLM content. You can use AI as a writing or research assistant when writing content for LessWrong, but you must have added significant value beyond what the... (read 587 more words →)

338


Arbital has been imported to LessWrong

RobertM

RobertM, jimrandomh, Ben Pace, Ruby

Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website.

If you've tried using Arbital.com the last few years, you might have noticed that it was on its last legs - no ability to register new accounts or log in to existing ones, slow load times (when it loaded at all), etc. Rather than try to keep it afloat, the LessWrong team worked with MIRI to migrate the public Arbital content to LessWrong, as well as a decent chunk of its features. Part... (read 1440 more words →)

287


Recently, a lot of very-low-quality cryptocurrency tokens have been seeing enormous "market caps". I think a lot of people are getting confused by that, and are resolving the confusion incorrectly. If you see a claim that a coin named $JUNK has a market cap of $10B, there are three possibilities. Either: (1) The claim is entirely false, (2) there are far more fools with more money than expected, or (3) the $10B number is real, but doesn't mean what you're meant to think it means.

The first possibility, that the number is simply made up, is pretty easy to cross off; you can check with a third party. Most people settle on the... (read 515 more words →)

Open Thread With Experimental Feature: Reactions

jimrandomh

jimrandomh, Raemon

This open thread introduces an experimental extension of LessWrong's voting system: reactions. Unlike votes, reactions are public; hovering over the reactions will show a list of users who reacted. For now, this feature is only for comments on this post in particular; after collecting feedback, we might roll out more broadly, or make significant alterations, or scrap it entirely. Reactions to comments in this thread will be preserved while discussions here are active, but they may be lost later if the feature changes in an incompatible way. Using this feature in various ways is planned to have karma minimums, but for this experimental post, those karma minimums are temporarily reduced to zero.

These... (read 635 more words →)

189

101

Dual-Useness is a Ratio

jimrandomh

A lot of AI-risk-concerned people are struggling with how to relate to dual-use research, and relatedly, to doing alignment research inside of AI orgs. There's a pretty simple concept that seems, to me, to be key to thinking about this coherently: the dual-useness ratio. Most prosaic alignment techniques are going to have some amount of timeline-shorteningness to them, and some amount of success-chance-improvement in them. You don't want to round that off to a boolean.

Eg, I've had arguments about whether RLHF is more like an alignment technique, or more like a capabilities technique, and how that should affect our view of OpenAI. My view is that it's both, and that the criticism... (read 209 more words →)

Infohazards vs Fork Hazards

jimrandomh

I think actual infohazardous information is fairly rare. Far more common is a fork: you have some idea or statement, you don't know whether it's true or false (typically leaning false), and you know that either it's false or it's infohazardous. Examples include unvalidated insights about how to build dangerous technologies, and most acausal trade/acausal blackmail scenarios. Phrased slightly differently: "infohazardous if true".

If something is wrong/false, it's at least mildly bad to spread/talk about it. (With some exceptions; wrong ideas can sometimes inspire better ones, maybe you want fake nuclear weapon designs to trip up would-be designers, etc). And if something is infohazardous, it's bad to spread/talk about it, for an entirely... (read more)

LW Beta Feature: Side-Comments

jimrandomh

LessWrong now has side-comments. This feature is in beta; you can turn it on for yourself on individual posts using the triple-dot menu below the post title, or enable it for all posts by going to your user settings and checking the "Opt into experimental features" checkbox in the Site Customization section.

Side-comments on LessWrong are conceptually similar to the side-comments you may be familiar with from Google Docs and other places, with one key difference: side-comments are placed automatically by lining up blockquotes. As a result, many historical posts already have side-comments on them! Side-comments are also still displayed in the comments section below the post as usual.

Side-comments take the form of... (read 168 more words →)

104

Transformative VR Is Likely Coming Soon

jimrandomh

Meta (formerly known as Oculus and as Facebook) just had their yearly conference. I think right now is a good time to take a brief break from worrying about AI timelines, and take a few minutes to notice the shortness of VR timelines. VR isn't going to determine whether humanity lives or dies, but I think it's close enough to have immediate-term impact on decisions about where to live, and how to structure organizations.

Concretely: I think we're 6 months from the crossover point where VR is better than in-person for a narrow subset of meetings, specifically, 1:1 meetings without laptops, where both sides are knowledgeable enthusiasts using the latest headset in combination... (read 452 more words →)


Policy for LLM Writing on LessWrong

Arbital has been imported to LessWrong

Credibility of the CDC on SARS-CoV-2

[Link] Still Alive - Astral Codex Ten

jimrandomh

LessWrong is migrating hosting providers (report bugs!)

Jim Babcock's Mainline Doom Scenario: Human-Level AI Can't Control Its Successor

Policy for LLM Writing on LessWrong

Arbital has been imported to LessWrong

Open Thread With Experimental Feature: Reactions

Dual-Useness is a Ratio

Infohazards vs Fork Hazards
